PAIR Lab: PKU Alignment and Interaction Research Lab
Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation
Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many …
Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Large sequence models (SM) such as GPT series and BERT have displayed outstanding performance and generalization capabilities in …
Muning WEN, Jakub Grudzien Kuba, Runji LIN, Weinan ZHANG, Ying Wen, Jun Wang, Yaodong Yang
On the Convergence of Fictitious Play: A Decomposition Approach
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player …
Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang
Neural Auto-Curricula in Two-Player Zero-Sum Games
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, …
Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agent systems, coordinated exploration …
David Henry Mguni, Taher Jafferjee, Jianhong Wang, Oliver Slumbers, Nicolas Perez-Nieves, Feifei Tong, Li Yang, Jiangcheng Zhu, Yaodong Yang, Jun Wang
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to …
Jakub Grudzien Kuba, Ruiqing Chen, Muning WEN, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang
Settling the Variance of Multi-Agent Policy Gradients
Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance …
Jakub Grudzien Kuba, Muning WEN, Linghui Meng, Shangding Gu, Haifeng ZHANG, David Henry Mguni, Jun Wang, Yaodong Yang
Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics where strategic cycles …
Xiangyu Liu, Hangtian Jia, Ying Wen, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu, Yaodong Yang