PAIR Lab: PKU Alignment and Interaction Research Lab
Benchmarking Multi-national Value Alignment for Large Language Models
Do Large Language Models (LLMs) hold positions that conflict with your country’s values? Occasionally they do! However, existing works …
Chengyi Ju, Weijie Shi, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, Ruiyuan Zhang, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo

PKU-SafeRLHF: Towards Multi-level Safety Alignment for LLMs with Human Preference
In this study, we introduce the safety human preference dataset, PKU-SafeRLHF, designed to promote research on safety alignment in …
Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Juntao Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

Differentiable Information Enhanced Model-Based Reinforcement Learning
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information …
Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang

Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games
Policy space response oracle (PSRO) is a population-based algorithm that can be used to solve two-player zero-sum games. In the PSRO …
Hongsong Tang, Yingzhuo Liu, Letian Ni, Liuyu Xiang, Yaodong Yang, Ke Bi, Zhaofeng He

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of …
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhan, Jing Huo, Weijie J Su, Yaodong Yang

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These …
Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang

Towards Efficient Collaboration Via Graph Modeling In Reinforcement Learning
In multi-agent reinforcement learning, a commonly considered paradigm is centralized training with decentralized execution. However, in …
Wenzhe Fan, Zishun Yu, Chengdong Ma, Changye Li, Yaodong Yang, Xinhua Zhang

Falcon: Fast Visuomotor Policies via Partial Denoising
Diffusion policies are widely adopted in complex visuomotor tasks for their ability to capture multimodal action distributions. …
Haojun Chen, Minghao Liu, Chengdong Ma, Xiaojian Ma, Zailin Ma, Huimin Wu, Yuanpei Chen, Yifan Zhong, Mingzhi Wang, Qing Li, Yaodong Yang

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Aligning the behavior of Large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement …
Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang