Benchmarking Multi-national Value Alignment for Large Language Models

Do Large Language Models (LLMs) hold positions that conflict with your country’s values? Occasionally they do! However, existing works …

Chengyi Ju, Weijie Shi, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, Ruiyuan Zhang, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo

PKU-SafeRLHF: Towards Multi-level Safety Alignment for LLMs with Human Preference

In this study, we introduce the safety human preference dataset, PKU-SafeRLHF, designed to promote research on safety alignment in …

Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Juntao Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

Differentiable Information Enhanced Model-Based Reinforcement Learning

Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information …

Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang

Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games

Policy space response oracle (PSRO) is a population-based algorithm that can be used to solve two-player zero-sum games. In the PSRO …

Hongsong Tang, Yingzhuo Liu, Letian Ni, Liuyu Xiang, Yaodong Yang, Ke Bi, Zhaofeng He

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of …

Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhan, Jing Huo, Weijie J Su, Yaodong Yang

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These …

Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang

Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning

In multi-agent reinforcement learning, a commonly considered paradigm is centralized training with decentralized execution. However, in …

Wenzhe Fan, Zishun Yu, Chengdong Ma, Changye Li, Yaodong Yang, Xinhua Zhang

Falcon: Fast Visuomotor Policies via Partial Denoising

Diffusion policies are widely adopted in complex visuomotor tasks for their ability to capture multimodal action distributions. …

Haojun Chen, Minghao Liu, Chengdong Ma, Xiaojian Ma, Zailin Ma, Huimin Wu, Yuanpei Chen, Yifan Zhong, Mingzhi Wang, Qing Li, Yaodong Yang

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

Aligning the behavior of large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement …

Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang