PAIR Lab: PKU Alignment and Interaction Research Lab
ProAgent: Building Proactive Cooperative Agents with Large Language Models
Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems. Current …
Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang
STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning
Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent …
Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in …
Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu
Maximum Entropy Heterogeneous-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years. However, existing …
Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang
BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). …
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces …
Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-Aware Curriculum and Iterative Generalist-Specialist Learning
We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud …
Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning …
Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
In order for agents in multi-agent systems (MAS) to be safe, they need to take into account the risks posed by the actions of other …
Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang
Regret-Minimizing Double Oracle for Extensive-Form Games
By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form …
Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang