PAIR Lab: PKU Alignment and Interaction Research Lab
Reinforcement Learning From Human Feedback
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Reinforcement learning from human feedback (RLHF) is an effective method for aligning large language models (LLMs) with human values. …
Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan
Aligner: Efficient Alignment by Learning to Correct
With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective …
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Qiu, Juntao Dai, Yaodong Yang