Policy space response oracle (PSRO) is a population-based algorithm for solving two-player zero-sum games. Within the PSRO framework, optimizing policy diversity is crucial for nontransitive games, as it helps the agent population avoid exploitation by unfamiliar opponents. In addition, although deep reinforcement learning is highly effective in complex game environments, its integration with PSRO remains fragmented and lacks effective coordination. In this study, we propose distributed PSRO to solve complex game scenarios efficiently. To enhance diversity while controlling optimization cost, we introduce TOP-K truncation, which prioritizes high-quality opponents and limits the size of the policy pool used during sampling. This approach reduces interference from weaker strategies and preserves computational efficiency by integrating seamlessly with our distributed training framework. We also design the distributed training framework to incorporate diversity estimation directly into the sampling process, so diversity is optimized without additional computational overhead. Furthermore, we introduce the opponent-first (OF) method, which improves decision-making by leveraging opponent information during interaction sampling. We validate the effectiveness of TOP-K truncation experimentally on a nontransitive mixture model and AlphaStar888, and we demonstrate the feasibility and efficiency of the distributed training framework and the OF method in the Google Research Football 11-versus-11 scenario.
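As a rough illustration of the truncation idea summarized above, the sketch below keeps only the K highest-weight opponents in the mixture returned by the meta-solver and renormalizes it; this is a minimal sketch under the assumption that the meta-strategy is a probability vector over the opponent pool, and the function name and interface are illustrative rather than the paper's actual implementation.

```python
import numpy as np

def top_k_truncate(meta_strategy, k):
    """Zero out all but the k largest opponent weights and renormalize.

    `meta_strategy` is assumed to be the mixture over the opponent policy
    pool produced by the meta-solver (e.g., Nash on the empirical payoff
    matrix). Opponents outside the top k receive zero sampling probability.
    """
    meta_strategy = np.asarray(meta_strategy, dtype=float)
    k = min(k, meta_strategy.size)
    keep = np.argsort(meta_strategy)[-k:]      # indices of the k highest-weight opponents
    truncated = np.zeros_like(meta_strategy)
    truncated[keep] = meta_strategy[keep]
    if truncated.sum() == 0.0:                 # degenerate case: fall back to uniform over kept opponents
        truncated[keep] = 1.0
    return truncated / truncated.sum()

# Example: a five-policy pool where only the two strongest opponents are sampled.
sigma = np.array([0.45, 0.30, 0.15, 0.07, 0.03])
print(top_k_truncate(sigma, k=2))              # -> [0.6, 0.4, 0.0, 0.0, 0.0]
```

In this form, the truncated mixture can be passed directly to whatever sampler selects opponents for best-response training, which is how the truncation bounds the effective policy pool during sampling.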