PAIR Lab: PKU Alignment and Interaction Research Lab
Publications
Differentiable Information Enhanced Model-Based Reinforcement Learning
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information …
Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang
PDF · Cite
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Aligning the behavior of large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement …
Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang
PDF · Cite
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
How to align large language models (LLMs) with user preferences from a static, general dataset has been studied extensively. However, …
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
PDF · Cite · Code
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased …
Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
PDF · Cite · Code
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional …
Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Yaodong Yang, et al.
PDF · Cite
Aligner: Efficient Alignment by Learning to Correct
With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective …
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Qiu, Juntao Dai, Yaodong Yang
PDF · Cite
Panacea: Pareto Alignment via Preference Adaptation for LLMs
Current methods for large language model alignment typically use scalar human preference labels. However, this convention tends to …
Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang
PDF · Cite
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping
One-shot transfer of dexterous grasps to novel scenes with object and context variations has been a challenging problem. While …
Qianxu Wang, Congyue Deng, Tyler Ga Wei Lum, Yuanpei Chen, Yaodong Yang, Jeannette Bohg, Yixin Zhu, Leonidas Guibas
PDF · Cite
Off-Agent Trust Region Policy Optimization
Leveraging the experiences of other agents offers a powerful mechanism to enhance policy optimization in multi-agent reinforcement …
Ruiqing Chen, Xiaoyuan Zhang, Yali Du, Yifan Zhong, Zheng Tian, Fanglei Sun, Yaodong Yang
PDF · Cite