PAIR Lab: PKU Alignment and Interaction Research Lab
Reinforcement Learning from Human Feedback (RLHF)
BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). …
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang
PDF
Cite