PAIR Lab: PKU Alignment and Interaction Research Lab
Publications
Differentiable Information Enhanced Model-Based Reinforcement Learning
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information …
Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang
PDF · Cite
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Aligning the behavior of large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement …
Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang
PDF · Cite
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
How to align large language models (LLMs) with user preferences from a static, general dataset has been studied extensively. However, …
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
PDF · Cite · Code
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased …
Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
PDF · Cite · Code
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional …
Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Yaodong Yang, et al.
PDF · Cite
Aligner: Efficient Alignment by Learning to Correct
With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective …
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Qiu, Juntao Dai, Yaodong Yang
PDF · Cite
Panacea: Pareto Alignment via Preference Adaptation for LLMs
Current methods for large language model alignment typically use scalar human preference labels. However, this convention tends to …
Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang
PDF · Cite
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping
One-shot transfer of dexterous grasps to novel scenes with object and context variations has been a challenging problem. While …
Qianxu Wang, Congyue Deng, Tyler Ga Wei Lum, Yuanpei Chen, Yaodong Yang, Jeannette Bohg, Yixin Zhu, Leonidas Guibas
PDF · Cite
Off-Agent Trust Region Policy Optimization
Leveraging the experiences of other agents offers a powerful mechanism to enhance policy optimization in multi-agent reinforcement …
Ruiqing Chen, Xiaoyuan Zhang, Yali Du, Yifan Zhong, Zheng Tian, Fanglei Sun, Yaodong Yang
PDF · Cite