PAIR Lab: PKU Alignment and Interaction Research Lab
Mirror Descent
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of …
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhan, Jing Huo, Weijie J Su, Yaodong Yang