Publications

* equal contribution, corresponding author

2026

  1. AAAI'26

    Safe Multi-agent Reinforcement Learning with Natural Language Constraints

    Ziyan Wang, Meng Fang, Tristan Tomilin, Fei Fang and Yali Du
    Alignment Track of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI) 2026
  2. Under Review

    BazaarBench: Open-Ended Multi-Agent Safety Evaluation in C2C Marketplaces

    Ziyan Wang, Shuqing Shi, James Oldfield, Samuele Marro, Jialin Yu, Francesco Pinto, Philip Torr, Yali Du and Adel Bibi
    Under Review 2026
  3. arXiv

    Fisher Decorator: Refining Flow Policy via a Local Transport Map

    Xiaoyuan Cheng, Haoyu Wang, Wenxuan Yuan, Ziyan Wang, Zonghao Chen, Li Zeng and Zhuo Sun
    arXiv preprint arXiv:2604.17919 2026

2025

  1. ICML'25

    M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

    Ziyan Wang, Zhicheng Zhang, Fei Fang and Yali Du
    Forty-Second International Conference on Machine Learning (ICML) 2025
  2. TMLR

    MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

    Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang and Biwei Huang
    Transactions on Machine Learning Research (TMLR) 2025
  3. NeurIPS'25

    Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

    Chandler Smith, Marwa Abdulhai, Manfred Diaz, Ziyan Wang and others
    The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025
  4. PRICAI'25

    Bridging Confidence and Competence: Evaluating Self-Assessment Alignment in LLM Mathematical Reasoning

    Mingze Zhong, Zijing Shi, Ziyan Wang and others
    The Pacific Rim International Conference on Artificial Intelligence (PRICAI) 2025
  5. Under Review

    Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models

    Zhicheng Zhang*, Ziyan Wang*, Yali Du and Fei Fang
    Under Review 2025

2024

  1. NeurIPS'24 (Oral)

    Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

    Xiong-Hui Chen*, Ziyan Wang*, Yali Du, Shengyi Jiang, Meng Fang, Yang Yu and Jun Wang
    The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS) 2024
  2. NeurIPS'24

    Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

    Xuanfa Jin*, Ziyan Wang*, Yali Du, Meng Fang, Haifeng Zhang and Jun Wang
    The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS) 2024
  3. AAMAS'24

    Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

    Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang and Yali Du
    The 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2024

2023

  1. NeurIPS'23

    ChessGPT: Bridging Policy Learning and Language Modeling

    Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du and Jun Wang
    The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023
  2. NeurIPS'23

    Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

    Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang and Mykola Pechenizkiy
    The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023

2022

  1. ICML'22 (Spotlight)

    Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

    Aivar Sootla, Alexander Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David H Mguni, Jun Wang and Haitham Ammar
    Proceedings of the 39th International Conference on Machine Learning (ICML) 2022

2021

  1. Preprint

    Multi-Agent Constrained Policy Optimisation

    Shangding Gu, Jakub Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll and Yaodong Yang
    arXiv preprint 2021

Additional Manuscripts

  1. Under Review

    VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training: A Chess Case Study

    Zhicheng Zhang*, Ziyan Wang*, Yali Du and Fei Fang
    Under Review 2026
  2. Under Review

    Memento: Teaching LLMs to Manage Their Own Context

    Vasilis Kontonis, Yuchen Zeng, Shivam Garg, Lingjiao Chen, Hao Tang, Ziyan Wang, Ahmed Awadallah, Eric Horvitz, John Langford and Dimitris Papailiopoulos
    Under Review 2026