PAIR Lab: PKU Alignment and Interaction Research Lab
Publications
Zhaowei Zhang
,
Fengshuo Bai
,
Mingzhi Wang
,
Haoyang Ye
,
Chengdong Ma
,
Yaodong Yang
(2025).
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems
. Artificial General Intelligence.
Cite
DOI
Yiran Geng
,
Jiaming Ji
,
Yuanpei Chen
,
Haoran Geng
,
Fangwei Zhong
,
Yaodong Yang
(2025).
ReDMan: Reliable Dexterous Manipulation with Safe Reinforcement Learning
. Machine Learning.
PDF
Cite
Jiaming Ji
,
Donghai Hong
,
Borong Zhang
,
Boyuan Chen
,
Juntao Dai
,
Boren Zheng
,
Tianyi Qiu
,
Boxun Li
,
Yaodong Yang
(2025).
PKU-SafeRLHF: Towards Multi-level Safety Alignment for LLMs with Human Preference
. ACL 2025.
PDF
Cite
Code
DOI
Jun Gao
,
Junlin Cui
,
Huijia Wu
,
Liuyu Xiang
,
Han Zhao
,
Xiangang Li
,
Meng Fang
,
Yaodong Yang
,
Zhaofeng He
(2025).
Can Large Language Models Independently Complete Tasks? A Dynamic Evaluation Framework for Multi-turn Task Planning and Completion
. Neurocomputing.
Cite
Chengyi Ju
,
Weijie Shi
,
Chengzhong Liu
,
Jiaming Ji
,
Jipeng Zhang
,
Ruiyuan Zhang
,
Jiajie Xu
,
Yaodong Yang
,
Sirui Han
,
Yike Guo
(2025).
Benchmarking Multi-national Value Alignment for Large Language Models
. ACL 2025.
Cite
Xiangbin Meng
,
Jiaming Ji
,
Xiangyu Yan
,
Juntao Dai
,
Boyuan Chen
,
Guan Wang
,
Hua Xu
,
Jingjia Wang
,
Xuliang Wang
,
Da Liu
,
Mingqi Zheng
,
Rongzhou Wu
,
Chuanjie Wu
,
Yuwei Wu
,
Wenyao Wang
,
Zhen Song
,
Yaodong Yang
(2025).
Med-Aligner Empowers LLM Medical Applications for Complex Medical Scenarios
. The Innovation.
PDF
Cite
Erlan Yu
,
Xuehong Chu
,
Wanwan Zhang
,
Xiangbin Meng
,
Yaodong Yang
,
Xunming Ji
,
Chuanjie Wu
(2025).
Large Language Models in Medicine: Applications, Challenges, and Future Directions
. International Journal of Medical Sciences.
Cite
Zhixun Chen
,
Zijing Shi
,
Yaodong Yang
,
Meng Fang
,
Yali Du
(2025).
Hierarchical Multi-Agent Framework for Dynamic Macroeconomic Modeling Using Large Language Models
. Autonomous Agents and Multiagent Systems.
Cite
Wenzhe Fan
,
Zishun Yu
,
Chengdong Ma
,
Changye Li
,
Yaodong Yang
,
Xinhua Zhang
(2025).
Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning
. Proceedings of the AAAI Conference on Artificial Intelligence.
Cite
Mingxiao Feng
,
Yaodong Yang
,
Wengang Zhou
,
Houqiang Li
(2025).
TIMAR: Transition-Informed Representation for Sample-Efficient Multi-agent Reinforcement Learning
. Neural Networks.
Cite
Fengshuo Bai
,
Runze Liu
,
Yali Du
,
Ying Wen
,
Yaodong Yang
(2025).
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
. Proceedings of the AAAI Conference on Artificial Intelligence.
Cite
Mingzhi Wang
,
Chengdong Ma
,
Qizhi Chen
,
Linjian Meng
,
Yang Han
,
Jiancong Xiao
,
Zhaowei Zhang
,
Jing Huo
,
Weijie J Su
,
Yaodong Yang
(2025).
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
. ICLR 2025.
PDF
Cite
Hongsong Tang
,
Yingzhuo Liu
,
Letian Ni
,
Liuyu Xiang
,
Yaodong Yang
,
Ke Bi
,
Zhaofeng He
(2025).
Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games
. IEEE Transactions on Neural Networks and Learning Systems.
PDF
Cite
Jiayi Zhou
,
Jiaming Ji
,
Juntao Dai
,
Yaodong Yang
(2025).
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
. 39th AAAI Conference on Artificial Intelligence (AAAI 2025).
PDF
Cite
Haojun Chen
,
Minghao Liu
,
Chengdong Ma
,
Xiaojian Ma
,
Zailin Ma
,
Huimin Wu
,
Yuanpei Chen
,
Yifan Zhong
,
Mingzhi Wang
,
Qing Li
,
Yaodong Yang
(2025).
Falcon: Fast Visuomotor Policies via Partial Denoising
. ICML 2025.
PDF
Cite
Xiaoyuan Zhang
,
Xinyan Cai
,
Bo Liu
,
Weidong Huang
,
Song-Chun Zhu
,
Siyuan Qi
,
Yaodong Yang
(2025).
Differentiable Information Enhanced Model-Based Reinforcement Learning
. 39th AAAI Conference on Artificial Intelligence (AAAI 2025).
PDF
Cite
Zhaowei Zhang
,
Fengshuo Bai
,
Qizhi Chen
,
Chengdong Ma
,
Mingzhi Wang
,
Haoran Sun
,
Zilong Zheng
,
Yaodong Yang
(2025).
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
. The 13th International Conference on Learning Representations (ICLR 2025).
PDF
Cite
Code
Hantao Lou
,
Changye Li
,
Jiaming Ji
,
Yaodong Yang
(2025).
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
. ICML 2025.
PDF
Cite
Jingrui Pan
,
Shancun Liu
,
Qiang Zhang
,
Yaodong Yang
(2025).
Discrete Information Acquisition in Financial Markets
. Mathematics.
Cite
DOI
Hantao Lou
,
Jiaming Ji
,
Kaile Wang
,
Yaodong Yang
(2025).
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
. AAAI Alignment Track 2025.
PDF
Cite
Code
Yue Li
,
Shurui Wang
,
Zhou Lv
,
Zhaoji Wang
,
Yunbiao Zhao
,
Ying Xie
,
Yang Xu
,
Yaodong Yang
,
et al.
(2025).
Transforming the Synthesis of Carbon Nanotubes with Machine Learning Models and Automation
. Matter.
PDF
Cite
Lijun Zhang
,
Lin Li
,
Wei Wei
,
Huizhong Song
,
Yaodong Yang
,
Jiye Liang
(2025).
Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning
. Advances in Neural Information Processing Systems.
PDF
Cite
Juntao Dai
,
Tianle Chen
,
Xuyao Wang
,
Ziran Yang
,
Taiye Chen
,
Jiaming Ji
,
Yaodong Yang
(2025).
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
. Advances in Neural Information Processing Systems.
PDF
Cite
Juntao Dai
,
Taiye Chen
,
Yaodong Yang
,
Qian Zheng
,
Gang Pan
(2025).
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
. ICLR 2025.
PDF
Cite
Dongxiang Chen
,
Yaodong Yang
,
Ying Wen
(2025).
DSR: Reinforcement Learning with Dynamical Skill Refinement
. Frontiers of Computer Science.
PDF
Cite
Zhiyu Zhao
,
Ning Yang
,
Xue Yan
,
Haifeng Zhang
,
Jun Wang
,
Yaodong Yang
(2025).
Correlated Mean Field Imitation Learning
. AAMAS 2025.
Cite
Zihao Wang
,
Shaofei Cai
,
Anji Liu
,
Yonggang Jin
,
Jinbing Hou
,
Bowei Zhang
,
Haowei Lin
,
Yaodong Yang
,
et al.
(2024).
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
PDF
Cite
Jie Liu
,
Yinmin Zhang
,
Chuming Li
,
Yaodong Yang
,
Yu Liu
,
Wanli Ouyang
(2024).
Adaptive Pessimism via Target Q-Value for Offline Reinforcement Learning
. Neural Networks.
Cite
Xiaotian Liu
,
Ming Hu
,
Yijie Peng
,
Yaodong Yang
(2024).
Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management
. Production and Operations Management.
PDF
Cite
Yifan Zhong
,
Chengdong Ma
,
Xiaoyuan Zhang
,
Ziran Yang
,
Haojun Chen
,
Qingfu Zhang
,
Siyuan Qi
,
Yaodong Yang
(2024).
Panacea: Pareto Alignment via Preference Adaptation for LLMs
. Advances in Neural Information Processing Systems, 2024.
PDF
Cite
Jiaming Ji
,
Boyuan Chen
,
Hantao Lou
,
Donghai Hong
,
Borong Zhang
,
Xuehai Pan
,
Tianyi Qiu
,
Juntao Dai
,
Yaodong Yang
(2024).
Aligner: Efficient Alignment by Learning to Correct
. Advances in Neural Information Processing Systems, 2024.
PDF
Cite
Qianxu Wang
,
Congyue Deng
,
Tyler Ga Wei Lum
,
Yuanpei Chen
,
Yaodong Yang
,
Jeannette Bohg
,
Yixin Zhu
,
Leonidas Guibas
(2024).
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping
. 2024 Conference on Robot Learning.
PDF
Cite
Jiaming Ji
,
Jiayi Zhou
,
Borong Zhang
,
Juntao Dai
,
Xuehai Pan
,
Ruiyang Sun
,
Weidong Huang
,
Yiran Geng
,
Mickel Liu
,
Yaodong Yang
(2024).
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
. Journal of Machine Learning Research.
PDF
Cite
Code
Chengdong Ma
,
Aming Li
,
Yali Du
,
Hao Dong
,
Yaodong Yang
(2024).
Efficient and Scalable Reinforcement Learning for Large-Scale Network Control
. Nature Machine Intelligence.
PDF
Cite
Qinghao Wang
,
Yaodong Yang
(2024).
Carbon Trading Supply Chain Management Based on Constrained Deep Reinforcement Learning
. Autonomous Agents and Multi-Agent Systems.
Cite
Tianyi Qiu
,
Yang Zhang
,
Xuchuan Huang
,
Jasmine Xinze Li
,
Jiaming Ji
,
Yaodong Yang
(2024).
ProgressGym: Alignment with a Millennium of Moral Progress
. NeurIPS 2024 Track on Datasets and Benchmarks (Spotlight).
PDF
Cite
Code
Ruiqing Chen
,
Xiaoyuan Zhang
,
Yali Du
,
Yifan Zhong
,
Zheng Tian
,
Fanglei Sun
,
Yaodong Yang
(2024).
Off-Agent Trust Region Policy Optimization
. International Joint Conference on Artificial Intelligence (IJCAI 2024).
PDF
Cite
Yizhe Huang
,
Anji Liu
,
Fanqi Kong
,
Yaodong Yang
,
Song-Chun Zhu
,
Xue Feng
(2024).
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
. 41st International Conference on Machine Learning 2024.
PDF
Cite
Jiaming Ji
,
Tianyi Qiu
,
Boyuan Chen
,
Yaodong Yang
(2024).
对齐的理论, 技术与评估 (Theories, Techniques, and Evaluation of AI Alignment)
. CCL 2024.
PDF
Cite
Dongzi Wang
,
Fangwei Zhong
,
Minglong Li
,
Muning Wen
,
Yuanxi Peng
,
Teng Li
,
Yaodong Yang
(2024).
RoMAT: Role-Based Multi-agent Transformer for Generalizable Heterogeneous Cooperation
. Neural Networks.
Cite
Yue Zhang
,
Yaodong Yang
,
Zhenbo Lu
,
Wengang Zhou
,
Houqiang Li
(2024).
Remember the Past for Better Future: Memory-Augmented Offline RL
. IJCNN 2024.
Cite
DOI
Jiaming Ji
,
Kaile Wang
,
Tianyi Qiu
,
Boyuan Chen
,
Jiayi Zhou
,
Changye Li
,
Hantao Lou
,
Juntao Dai
,
Yunhuai Liu
,
Yaodong Yang
(2024).
Language Models Resist Alignment: Evidence from Data Compression
. ACL 2025.
PDF
Cite
Code
DOI
Siyuan Qi
,
Bangcheng Yang
,
Kailin Jiang
,
Xiaobo Wang
,
Jiaqi Li
,
Yifan Zhong
,
Yaodong Yang
,
Zilong Zheng
(2024).
In-Context Editing: Learning Knowledge from Self-Induced Distributions
. ICLR 2025.
PDF
Cite
Code
Jieming Cui
,
Tengyu Liu
,
Nian Liu
,
Yaodong Yang
,
Yixin Zhu
,
Siyuan Huang
(2024).
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
. Computer Vision and Pattern Recognition (CVPR 2024).
PDF
Cite
Xiangbin Meng
,
Xiangyu Yan
,
Kuo Zhang
,
Da Liu
,
Xiaojuan Cui
,
Yaodong Yang
,
Muhan Zhang
,
Chunxia Cao
,
Jingjia Wang
,
Xuliang Wang
,
Jun Gao
,
Jiaming Ji
,
Zifeng Qiu
,
Muzi Li
,
Cheng Qian
,
Tianze Guo
,
Shuangquan Ma
,
Zeying Wang
,
Zexuan Guo
,
Youlan Lei
,
Chunli Shao
,
Wenyao Wang
,
Haojun Fan
,
Yi-Da Tang
(2024).
The Application of Large Language Models in Medicine: A Scoping Review
. iScience.
Cite
Lirui Luo
,
Guoxi Zhang
,
Hongming Xu
,
Yaodong Yang
,
Cong Fang
,
Qing Li
(2024).
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
. Proceedings of the 41st International Conference on Machine Learning 2024.
PDF
Cite
Yi-Da Tang
,
Kuo Zhang
,
Xiangyu Yan
,
Xiangbin Meng
,
Jiaming Ji
,
Hua Xu
,
Jingqian Liu
,
Jingjia Wang
,
Xuliang Wang
,
Jun Gao
,
Da Liu
,
Yuan-Geng-Shuo Wang
,
Chunli Shao
,
Wenyao Wang
,
Yaodong Yang
(2024).
Revolutionizing Healthcare: The Transformative Impact of LLMs in Medicine
. Journal of Medical Internet Research.
Cite
Yuyang Li
,
Bo Liu
,
Yiran Geng
,
Puhao Li
,
Yaodong Yang
,
Yixin Zhu
,
Tengyu Liu
,
Siyuan Huang
(2024).
Grasp Multiple Objects with One Hand
. IEEE Robotics and Automation Letters (RA-L) & International Conference on Intelligent Robots and Systems (IROS).
PDF
Cite
Yinmin Zhang
,
Jie Liu
,
Chuming Li
,
Yazhe Niu
,
Yaodong Yang
,
Yu Liu
,
Wanli Ouyang
(2024).
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
. AAAI 2024.
Cite
DOI
Ceyao Zhang
,
Kaijie Yang
,
Siyi Hu
,
Zihao Wang
,
Guanghe Li
,
Yihang Sun
,
Cheng Zhang
,
Zhaowei Zhang
,
Anji Liu
,
Song-Chun Zhu
,
Xiaojun Chang
,
Junge Zhang
,
Feng Yin
,
Yitao Liang
,
Yaodong Yang
(2024).
ProAgent: Building Proactive Cooperative Agents with Large Language Models
. AAAI 2024 Oral.
Cite
Code
Sirui Chen
,
Zhaowei Zhang
,
Yaodong Yang
,
Yali Du
(2024).
STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning
. 38th Conference on Artificial Intelligence (AAAI 2024).
PDF
Cite
Yifan Zhong
,
Jakub Grudzien Kuba
,
Xidong Feng
,
Siyi Hu
,
Jiaming Ji
,
Yaodong Yang
(2024).
Heterogeneous-Agent Reinforcement Learning
. Journal of Machine Learning Research.
PDF
Cite
Chenguang Wang
,
Zhouliang Yu
,
Stephen McAleer
,
Tianshu Yu
,
Yaodong Yang
(2024).
ASP: Learn a Universal Neural Solver!
. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Cite
Jiarong Liu
,
Yifan Zhong
,
Siyi Hu
,
Haobo Fu
,
Qiang Fu
,
Xiaojun Chang
,
Yaodong Yang
(2024).
Maximum Entropy Heterogeneous-Agent Reinforcement Learning
. International Conference on Learning Representations (ICLR 2024).
PDF
Cite
Siyuan Qi
,
Shuo Chen
,
Yexin Li
,
Xiangyu Kong
,
Junqi Wang
,
Bangcheng Yang
,
Pring Wong
,
Yifan Zhong
,
Xiaoyuan Zhang
,
Zhaowei Zhang
,
Nian Liu
,
Wei Wang
,
Yaodong Yang
,
Song-Chun Zhu
(2024).
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
. ICLR 2024.
PDF
Cite
Code
Poster
Shangding Gu
,
Jakub Grudzien Kuba
,
Yuanpei Chen
,
Yali Du
,
Long Yang
,
Alois Knoll
,
Yaodong Yang
(2023).
Safe Multi-agent Reinforcement Learning for Multi-robot Control
. Artificial Intelligence (AIJ).
PDF
Cite
Jiaming Ji
,
Borong Zhang
,
Jiayi Zhou
,
Xuehai Pan
,
Weidong Huang
,
Ruiyang Sun
,
Yiran Geng
,
Yifan Zhong
,
Juntao Dai
,
Yaodong Yang
(2023).
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
. Neural Information Processing Systems, 2023.
PDF
Cite
Zhijian Duan
,
Wenhan Huang
,
Dinghuai Zhang
,
Yali Du
,
Jun Wang
,
Yaodong Yang
,
Xiaotie Deng
(2023).
Is Nash Equilibrium Approximator Learnable?
. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023).
PDF
Cite
Jiaming Ji
,
Mickel Liu
,
Juntao Dai
,
Xuehai Pan
,
Chi Zhang
,
Ce Bian
,
Boyuan Chen
,
Ruiyang Sun
,
Yizhou Wang
,
Yaodong Yang
(2023).
BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
. Neural Information Processing Systems, 2023.
PDF
Cite
Weikang Wan
,
Haoran Geng
,
Yun Liu
,
Zikang Shan
,
Yaodong Yang
,
Li Yi
,
He Wang
(2023).
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-Aware Curriculum and Iterative Generalist-Specialist Learning
. International Conference on Computer Vision (ICCV 2023).
PDF
Cite
Muning Wen
,
Runji Lin
,
Hanjing Wang
,
Yaodong Yang
,
Ying Wen
,
Luo Mai
,
Jun Wang
,
Haifeng Zhang
,
Weinan Zhang
(2023).
Large Sequence Models for Sequential Decision-Making: A Survey
. Frontiers of Computer Science (FCS).
PDF
Cite
Hanjing Wang
,
Man-Kit Sit
,
Congjie He
,
Ying Wen
,
Weinan Zhang
,
Jun Wang
,
Yaodong Yang
,
Luo Mai
(2023).
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
. The Fortieth International Conference on Machine Learning (ICML 2023).
PDF
Cite
Oliver Slumbers
,
David Henry Mguni
,
Stephen Marcus McAleer
,
Stefano B. Blumberg
,
Jun Wang
,
Yaodong Yang
(2023).
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
. The Fortieth International Conference on Machine Learning (ICML 2023).
PDF
Cite
Xiaohang Tang
,
Le Cong Dinh
,
Stephen Marcus McAleer
,
Yaodong Yang
(2023).
Regret-Minimizing Double Oracle for Extensive-Form Games
. The Fortieth International Conference on Machine Learning (ICML 2023).
PDF
Cite
Qinghao Wang
,
Yanling Peng
,
Yijie Peng
,
Yaodong Yang
(2023).
A Deep Reinforcement Learning-driven Vine Copula Method for Correlation Structure Analysis of Mortgage
. China Journal of Econometrics.
PDF
Cite
Ming Zhou
,
Ziyu Wan
,
Hanjing Wang
,
Muning Wen
,
Runzhe Wu
,
Ying Wen
,
Yaodong Yang
,
Yong Yu
,
Jun Wang
,
Weinan Zhang
(2023).
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
. Journal of Machine Learning Research (JMLR).
PDF
Cite
David Mguni
,
Haojun Chen
,
Taher Jafferjee
,
Jianhong Wang
,
Long Fei
,
Xidong Feng
,
Stephen McAleer
,
Feifei Tong
,
Jun Wang
,
Yaodong Yang
(2023).
MANSA: Learning Fast and Slow in Multi-Agent Systems
. The Fortieth International Conference on Machine Learning (ICML 2023).
PDF
Cite
David Mguni
,
Taher Jafferjee
,
Jianhong Wang
,
Nicolas Perez Nieves
,
Tianpei Yang
,
Matthew Taylor
,
Wenbin Song
,
Feifei Tong
,
Hui Chen
,
Jiangcheng Zhu
,
Jun Wang
,
Yaodong Yang
(2023).
Learning to Shape Rewards using a Game of Two Partners
. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).
PDF
Cite
Shuang Wu
,
Jian Yao
,
Haobo Fu
,
Ye Tian
,
Chao Qian
,
Yaodong Yang
,
Qiang Fu
,
Yang Wei
(2023).
Quality-Similar Diversity via Population Based Reinforcement Learning
. The Eleventh International Conference on Learning Representations (ICLR 2023).
PDF
Cite
Xiaotie Deng
,
Ningyuan Li
,
David Mguni
,
Jun Wang
,
Yaodong Yang
(2023).
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games
. National Science Review (NSR).
PDF
Cite
Ying Wen
,
Hui Chen
,
Yaodong Yang
,
Minne Li
,
Zheng Tian
,
Xu Chen
,
Jun Wang
(2022).
A Game-Theoretic Approach to Multi-agent Trust Region Optimization
. International Conference on Distributed Artificial Intelligence (DAI 2022).
PDF
Cite
Qinghao Wang
,
Yijie Peng
,
Yaodong Yang
(2022).
Solving Inventory Management Problems through Deep Reinforcement Learning
. Journal of Systems Science and Systems Engineering.
PDF
Cite
Chuming Li
,
Jie Liu
,
Yinmin Zhang
,
Yuhong Wei
,
Yazhe Niu
,
Yaodong Yang
,
Yu Liu
,
Wanli Ouyang
(2022).
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency
. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).
PDF
Cite
Runji Lin
,
Ye Li
,
Xidong Feng
,
Zhaowei Zhang
,
Xian Hong Wu Fung
,
Haifeng Zhang
,
Jun Wang
,
Yali Du
,
Yaodong Yang
(2022).
Contextual Transformer for Offline Meta Reinforcement Learning
. NeurIPS 2022 Foundation Models for Decision Making Workshop.
PDF
Cite
Jie Ren
,
Xidong Feng
,
Bo Liu
,
Xuehai Pan
,
Yao Fu
,
Luo Mai
,
Yaodong Yang
(2022).
TorchOpt: An Efficient Library for Differentiable Optimization
. OPT2022: 14th Annual Workshop on Optimization for Machine Learning.
PDF
Cite
Yali Du
,
Chengdong Ma
,
Yuchen Liu
,
Runji Lin
,
Hao Dong
,
Jun Wang
,
Yaodong Yang
(2022).
Scalable Model-based Policy Optimization for Decentralized Networked Systems
. The 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).
PDF
Cite
Huanzhou Zhu
,
Bo Zhao
,
Gang Chen
,
Weifeng Chen
,
Yijie Chen
,
Liang Shi
,
Yaodong Yang
,
Peter Pietzuch
,
Lei Chen
(2022).
MSRL: Distributed Reinforcement Learning with Dataflow Fragments
. USENIX Annual Technical Conference (ATC).
PDF
Cite
Puhao Li
,
Tengyu Liu
,
Yuyang Li
,
Yiran Geng
,
Yixin Zhu
,
Yaodong Yang
,
Siyuan Huang
(2022).
GenDexGrasp: Generalizable Dexterous Grasping
. 2023 IEEE International Conference on Robotics and Automation (ICRA 2023).
PDF
Cite
Le Cong Dinh
,
Yaodong Yang
,
Stephen McAleer
,
Zheng Tian
,
Nicolas Perez Nieves
,
Oliver Slumbers
,
David Henry Mguni
,
Haitham Bou Ammar
,
Jun Wang
(2022).
Online Double Oracle
. Transactions on Machine Learning Research (TMLR).
PDF
Cite
Yuanpei Chen
,
Tianhao Wu
,
Shengjie Wang
,
Xidong Feng
,
Jiechuang Jiang
,
Stephen Marcus McAleer
,
Hao Dong
,
Zongqing Lu
,
Song-Chun Zhu
,
Yaodong Yang
(2022).
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.
PDF
Cite
Runze Liu
,
Fengshuo Bai
,
Yali Du
,
Yaodong Yang
(2022).
Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
PDF
Cite
Xuehai Pan
,
Mickel Liu
,
Fangwei Zhong
,
Yaodong Yang
,
Song-Chun Zhu
,
Yizhou Wang
(2022).
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.
PDF
Cite
Long Yang
,
Jiaming Ji
,
Juntao Dai
,
Linrui Zhang
,
Binbin Zhou
,
Pengfei Li
,
Yaodong Yang
,
Gang Pan
(2022).
Constrained Update Projection Approach to Safe Policy Optimization
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
PDF
Cite
Zongkai Liu
,
Chao Yu
,
Yaodong Yang
,
Peng Sun
,
Zifan Wu
,
Yuan Li
(2022).
A Unified Diversity Measure for Multiagent Reinforcement Learning
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
PDF
Cite
Bo Liu
,
Xidong Feng
,
Jie Ren
,
Luo Mai
,
Rui Zhu
,
Haifeng Zhang
,
Jun Wang
,
Yaodong Yang
(2022).
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
PDF
Cite
Yiran Geng
,
Boshi An
,
Haoran Geng
,
Yuanpei Chen
,
Yaodong Yang
,
Hao Dong
(2022).
End-to-End Affordance Learning for Robotic Manipulation
. 2023 IEEE International Conference on Robotics and Automation (ICRA 2023).
PDF
Cite
Zhitao Zhu
,
Shijing Si
,
Jianzong Wang
,
Yaodong Yang
,
Jing Xiao
(2022).
Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation
. Web Information Systems Engineering–WISE 2022: 23rd International Conference.
PDF
Cite
Linghui Meng
,
Muning Wen
,
Chenyang Le
,
Xiyun Li
,
Dengpeng Xing
,
Weinan Zhang
,
Ying Wen
,
Haifeng Zhang
,
Jun Wang
,
Yaodong Yang
,
Bo Xu
(2022).
Offline Pre-trained Multi-agent Decision Transformer
. Machine Intelligence Research.
PDF
Cite
Muning Wen
,
Jakub Grudzien Kuba
,
Runji Lin
,
Weinan Zhang
,
Ying Wen
,
Jun Wang
,
Yaodong Yang
(2022).
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
. The 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
PDF
Cite
Yurong Chen
,
Xiaotie Deng
,
Chenchen Li
,
David Mguni
,
Jun Wang
,
Xiang Yan
,
Yaodong Yang
(2022).
On the Convergence of Fictitious Play: A Decomposition Approach
. The 31st International Joint Conference on Artificial Intelligence (IJCAI 2022).
PDF
Cite
Ricky Sanjaya
,
Jun Wang
,
Yaodong Yang
(2022).
Measuring the Non-Transitivity in Chess
. Algorithms 2022.
PDF
Cite
Xidong Feng
,
Oliver Slumbers
,
Ziyu Wan
,
Bo Liu
,
Stephen McAleer
,
Ying Wen
,
Jun Wang
,
Yaodong Yang
(2021).
Neural Auto-Curricula in Two-Player Zero-Sum Games
. The 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
PDF
Cite
David Henry Mguni
,
Taher Jafferjee
,
Jianhong Wang
,
Oliver Slumbers
,
Nicolas Perez Nieves
,
Feifei Tong
,
Li Yang
,
Jiangcheng Zhu
,
Yaodong Yang
,
Jun Wang
(2021).
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
. Tenth International Conference on Learning Representations (ICLR 2022).
PDF
Cite
Le Cong Dinh
,
David Henry Mguni
,
Long Tran-Thanh
,
Jun Wang
,
Yaodong Yang
(2021).
Online Markov Decision Processes with Non-oblivious Strategic Adversary
. Autonomous Agents and Multi-Agent Systems (2023).
PDF
Cite
Jakub Grudzien Kuba
,
Ruiqing Chen
,
Muning Wen
,
Ying Wen
,
Fanglei Sun
,
Jun Wang
,
Yaodong Yang
(2021).
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
. Tenth International Conference on Learning Representations (ICLR 2022).
PDF
Cite
Jakub Grudzien Kuba
,
Muning Wen
,
Linghui Meng
,
Shangding Gu
,
Haifeng Zhang
,
David Henry Mguni
,
Jun Wang
,
Yaodong Yang
(2021).
Settling the Variance of Multi-Agent Policy Gradients
. The 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
PDF
Cite
Xiangyu Liu
,
Hangtian Jia
,
Ying Wen
,
Yujing Hu
,
Yingfeng Chen
,
Changjie Fan
,
Zhipeng Hu
,
Yaodong Yang
(2021).
Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
. The 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
PDF
Cite