PAIR Lab: PKU Alignment and Interaction Research Lab
Large Language Model
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Reinforcement learning from human feedback (RLHF) is an effective method for aligning large language models (LLMs) with human values. …
Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan
PDF · Cite