Hello, I am Yidong Wang [i:doʊn wɑ:n] (王一栋). My research interests lie in language modeling, semi-supervised learning, transfer learning, and imbalanced learning. I have published several papers at top international AI conferences and journals.
🔥 News
- 2023.09: 🎉🎉 I started my Ph.D. at Peking University.
- 2023.08: 🎉🎉 I finished my internship at Westlake University.
- 2022.10: 🎉🎉 I finished my internship at MSRA.
📖 Education
- 2023.09 - 2027.06, doctoral student at the National Engineering Research Center for Software Engineering, Peking University, advised by Prof. Shikun Zhang and Prof. Wei Ye.
- 2020.09 - 2022.10, master's student in the Department of Information and Communications Engineering of Tokyo Institute of Technology, advised by Prof. Takahiro Shinozaki.
- 2015.09 - 2019.06, undergraduate student in the Department of Computer Science and Technology of Nanjing University, advised by Prof. Xinyu Dai.
💼 Internships
- 2024.01 - Now, Squirrel AI, advised by Dr. Qingsong Wen.
- 2022.05 - 2022.10, Microsoft Research Asia, advised by Dr. Jindong Wang.
- 2022.02 - 2022.05, Westlake University, advised by Prof. Yue Zhang.
- 2021.11 - 2022.02, Microsoft Research Asia, advised by Dr. Jindong Wang.
🔖 Selected Publications
- (9) AutoSurvey: Large Language Models Can Automatically Write Surveys. [paper];
Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang.
Advances in Neural Information Processing Systems 2024 (``NeurIPS 2024``).
- (8) How do Large Language Models understand Genes and Cells. [paper];
Chen Fang, Yidong Wang (co-first author), Yunze Song, Qingqing Long, Wang Lu, Linghui Chen, Pengfei Wang, Guihai Feng, Yuanchun Zhou, Xin Li.
Transactions on Intelligent Systems and Technology (``TIST 2024``).
- (7) PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization. [paper];
Yidong Wang, Zhuohao Yu, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, Yue Zhang.
International Conference on Learning Representations 2024 (``ICLR 2024``).
- (6) Exploring Vision-Language Models for Imbalanced Learning. [paper];
Yidong Wang, Zhuohao Yu, Jindong Wang, Qiang Heng, Hao Chen, Wei Ye, Rui Xie, Xing Xie, Shikun Zhang.
International Journal of Computer Vision 2023 (``IJCV 2023``).
- (5) FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning. [paper];
Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie.
International Conference on Learning Representations 2023 (``ICLR 2023``).
- (4) USB: A Unified Semi-supervised Learning Benchmark for Classification. [paper];
Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang.
Advances in Neural Information Processing Systems 2022 (``NeurIPS 2022``).
- (3) Margin Calibration for Long-Tailed Visual Recognition. [paper];
Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki.
Asian Conference on Machine Learning 2022 (``ACML 2022``).
- (2) Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction. [paper];
Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang.
International Conference on Computational Linguistics 2022 (``COLING 2022``).
- (1) FlexMatch: Boosting Semi-supervised Learning with Curriculum Pseudo Labeling. [paper];
Bowen Zhang, Yidong Wang (co-first author), Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki.
Advances in Neural Information Processing Systems 2021 (``NeurIPS 2021``).
📝 Preprints
- (3) A Survey on Evaluating Large Language Models in Code Generation Tasks. [paper].
Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang.
- (2) Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity. [paper];
Cunxiang Wang, Xiaoze Liu, Yuanhao Yue, Xiangru Tang, Tianhang Zhang, Cheng Jiayang, Yunzhi Yao, Wenyang Gao, Xuming Hu, Zehan Qi, Yidong Wang, Linyi Yang, Jindong Wang, Xing Xie, Zheng Zhang and Yue Zhang.
- (1) PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts. [paper];
Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie.
📝 Publications
- (32) Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application. [paper].
Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen.
Transactions on Intelligent Systems and Technology (``TIST 2024``).
- (31) RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation. [paper];
Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen.
The 2024 Conference on Empirical Methods in Natural Language Processing System Demonstration Track (``EMNLP 2024 Demo Track``).
- (30) FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models. [paper];
Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Zhengran Zeng, Wei Ye, Jindong Wang, Yue Zhang, Shikun Zhang.
The 2024 Conference on Empirical Methods in Natural Language Processing System Demonstration Track (``EMNLP 2024 Demo Track``).
- (29) PURE: Aligning LLM via Pluggable Query Reformulation for Enhanced Helpfulness.
Wenjin Yao, Yidong Wang, Zhuohao Yu, Rui Xie, Shikun Zhang, Wei Ye.
Findings of The 2024 Conference on Empirical Methods in Natural Language Processing (``EMNLP 2024 Findings``).
- (28) Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations. [paper].
Hao Chen, Ankit Shah, Jindong Wang, Ran Tao, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj.
Advances in Neural Information Processing Systems 2024 (``NeurIPS 2024``).
- (27) AutoSurvey: Large Language Models Can Automatically Write Surveys. [paper];
Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang.
Advances in Neural Information Processing Systems 2024 (``NeurIPS 2024``).
- (26) How do Large Language Models understand Genes and Cells. [paper];
Chen Fang, Yidong Wang (co-first author), Yunze Song, Qingqing Long, Wang Lu, Linghui Chen, Pengfei Wang, Guihai Feng, Yuanchun Zhou, Xin Li.
Transactions on Intelligent Systems and Technology (``TIST 2024``).
- (25) PIXEL: Prompt-based Zero-shot Hashing via Visual and Textual Semantic Alignment.
Zeyu Dong, Qingqing Long, Yihang Zhou, Zhihong Zhu, Yidong Wang, Xiao Luo, Pengyang Wang, Pengfei Wang, Yuanchun Zhou.
The 33rd ACM International Conference on Information and Knowledge Management (``CIKM 2024``).
- (24) Enhancing In-Context Learning via Implicit Demonstration Augmentation. [paper].
Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang.
Annual Meeting of the Association for Computational Linguistics 2024 (``ACL 2024``).
- (23) KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. [paper].
Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Wei Ye, Jindong Wang, Xing Xie, Yue Zhang, Shikun Zhang.
Annual Meeting of the Association for Computational Linguistics 2024 (``ACL 2024``).
- (22) What Makes a Good Order of Examples in In-Context Learning. [paper].
Qi Guo, Leiyu Wang, Yidong Wang, Wei Ye, Shikun Zhang.
Findings of Annual Meeting of the Association for Computational Linguistics 2024 (``ACL 2024 Findings``).
- (21) A General Framework for Learning from Weak Supervision. [paper].
Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj.
The Forty-first International Conference on Machine Learning (``ICML 2024``).
- (20) Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets. [paper].
Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides.
Conference on Computer Vision and Pattern Recognition 2024 Workshop Prompting in Vision (``CVPR 2024 Workshop``).
- (19) CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios. [paper];
Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye, Shikun Zhang.
The ACM SIGSOFT International Symposium on Software Testing and Analysis 2024 (``ISSTA 2024``).
- (18) PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization. [paper];
Yidong Wang, Zhuohao Yu, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, Yue Zhang.
International Conference on Learning Representations 2024 (``ICLR 2024``).
- (17) Supervised Knowledge Makes Large Language Models Better In-context Learners. [paper].
Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang.
International Conference on Learning Representations 2024 (``ICLR 2024``).
- (16) A Survey on Evaluation of Large Language Models. [paper];
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie.
Transactions on Intelligent Systems and Technology (``TIST 2024``).
- (15) Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution. [paper].
Wang Lu, Jindong Wang, Yidong Wang, Kan Ren, Yiqiang Chen, Xing Xie.
SIAM Conference on Data Mining 2024 (``SDM 2024``).
- (14) Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future. [paper].
Linyi Yang, Yaoxiao Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang.
The 2023 Conference on Empirical Methods in Natural Language Processing (``EMNLP 2023``).
- (13) Evaluating open question answering evaluation. [paper];
Cunxiang Wang, Sirui Cheng, Zhikun Xu, Bowen Ding, Yidong Wang, Yue Zhang.
Advances in Neural Information Processing Systems 2023 (``NeurIPS 2023``).
- (12) Non-IID always Bad? Semi-Supervised Heterogeneous Federated Learning with Local Knowledge Enhancement. [paper];
Chao Zhang, Fangzhao Wu, Jingwei Yi, Derong Xu, Yang Yu, Jingdong Wang, Yidong Wang, Tong Xu, Xing Xie, Enhong Chen.
The Conference on Information and Knowledge Management 2023 (``CIKM 2023``).
- (11) Exploring Vision-Language Models for Imbalanced Learning. [paper];
Yidong Wang, Zhuohao Yu, Jindong Wang, Qiang Heng, Hao Chen, Wei Ye, Rui Xie, Xing Xie, Shikun Zhang.
International Journal of Computer Vision 2023 (``IJCV 2023``).
- (10) GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective. [paper];
Linyi Yang, Shuibai Zhang, Libo Qin, Yafu Li, Yidong Wang, Hanmeng Liu, Jindong Wang, Xing Xie, Yue Zhang.
Findings of Annual Meeting of the Association for Computational Linguistics 2023 (``ACL 2023 Findings``).
- (9) On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective. [paper];
Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie.
Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models at ICLR 2023 (``RTML Workshop 2023``).
- (8) FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning. [paper];
Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie.
International Conference on Learning Representations 2023 (``ICLR 2023``).
- (7) SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning. [paper];
Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Marios Savvides, Jindong Wang, Bhiksha Raj, Xing Xie, Bernt Schiele.
International Conference on Learning Representations 2023 (``ICLR 2023``).
- (6) USB: A Unified Semi-supervised Learning Benchmark for Classification. [paper];
Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang.
Advances in Neural Information Processing Systems 2022 (``NeurIPS 2022``).
- (5) Margin Calibration for Long-Tailed Visual Recognition. [paper];
Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki.
Asian Conference on Machine Learning 2022 (``ACML 2022``).
- (4) Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction. [paper];
Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang.
International Conference on Computational Linguistics 2022 (``COLING 2022``).
- (3) Exploiting Adapters for Cross-lingual Low-resource Speech Recognition. [paper];
Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki.
IEEE/ACM Transactions on Audio, Speech and Language Processing 2022 (``TASLP 2022``).
- (2) FlexMatch: Boosting Semi-supervised Learning with Curriculum Pseudo Labeling. [paper];
Bowen Zhang, Yidong Wang (co-first author), Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki.
Advances in Neural Information Processing Systems 2021 (``NeurIPS 2021``).
- (1) Meta-Adapter: Efficient Cross-Lingual Adaptation With Meta-Learning. [paper];
Wenxin Hou, Yidong Wang, Shengzhou Gao, Takahiro Shinozaki.
IEEE International Conference on Acoustics, Speech, and Signal Processing 2021 (``ICASSP 2021``).
💻 Selected Projects
- PandaLM stands for ReProducible and Automated Language Model Assessment. PandaLM aims to provide reproducible and automated comparisons between different large language models (LLMs). Given the same context, PandaLM compares the responses of different LLMs, gives a reason for its decision, and provides a reference answer. I am the main contributor to this repo and currently lead the PandaLM team.
- USB is a PyTorch-based Python package for Semi-Supervised Learning (SSL). It is easy to use and extend, affordable, and comprehensive for developing and evaluating SSL algorithms. USB implements 14 SSL algorithms based on consistency regularization and provides 15 evaluation tasks from the CV, NLP, and audio domains. I am the main contributor to this repo and currently lead the USB team.
- TorchSSL is an all-in-one PyTorch-based toolkit for semi-supervised learning (SSL). It currently implements 9 popular SSL algorithms to enable fair comparison and accelerate the development of SSL algorithms. I am the main contributor to this repo and currently lead the TorchSSL team.
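USB and TorchSSL collect SSL algorithms built on consistency regularization. As a rough illustration of that idea only, here is a minimal, dependency-free Python sketch of FixMatch-style confidence-thresholded pseudo-labeling, one common consistency-regularization recipe. It is not code from either repository and does not use their APIs; the function names `softmax` and `unsup_consistency_loss` and the fixed 0.95 threshold are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unsup_consistency_loss(
    weak_logits: Sequence[Sequence[float]],
    strong_logits: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical fixed confidence threshold
) -> float:
    """Hypothetical helper: FixMatch-style consistency loss on unlabeled data.

    Predictions on a weakly augmented view become hard pseudo-labels, kept
    only when the model's confidence reaches `threshold`. The strongly
    augmented view of the same example is trained to match the pseudo-label
    with cross-entropy. The loss is averaged over the whole unlabeled batch,
    so masked-out (low-confidence) examples contribute zero.
    """
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        pseudo_label = probs.index(confidence)
        if confidence >= threshold:
            # Cross-entropy of the strong view against the hard pseudo-label.
            strong_probs = softmax(strong)
            total += -math.log(strong_probs[pseudo_label])
    return total / len(weak_logits)


if __name__ == "__main__":
    # Toy batch: the first example is confident on its weak view, the second is not.
    weak = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
    strong = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
    print(unsup_consistency_loss(weak, strong))
```

Adaptive methods such as FlexMatch and FreeMatch (listed above) replace the single fixed threshold with class-wise or training-state-dependent thresholds; this sketch keeps one constant value for simplicity.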
🎖 Honors and Awards
- First Place in the Entrance Examination for PhD at the School of Software and Microelectronics, Peking University, 2023.
- Outstanding Student Award, Tokyo Institute of Technology, 2022.
- Stars of Tomorrow, Microsoft Research Asia, 2021 & 2022.
- JASSO Scholarship, Tokyo Institute of Technology, 2020.
- Excellence in Nanjing University Training Program of Innovation for Undergraduates, 2019.
- Honorable Mention, Interdisciplinary Contest in Modeling (ICM), 2018.
- Renmin Scholarship, Nanjing University, 2017 & 2018.
📄 Academic Services
- Reviewer for Conferences: NeurIPS 2022, CVPR 2023, ICML 2023, ICCV 2023, NeurIPS 2023, AAAI 2024, ICLR 2024, CVPR 2024, ICML 2024, ECCV 2024, NAACL 2024, ACL 2024, COLM 2024, NeurIPS 2024, ICLR 2025.
- Reviewer for Journals: IJCV, TIP, ACM TIST, JCST.
🏫 Teaching Experience
- 2024 Spring, Teaching Assistant for Natural Language Processing (自然语言处理), taught by Prof. Di He, Peking University.
🎤 Invited Talks
- 2023, Microsoft Research Asia, Sharing Internship Experience.
- 2023, The AI Talks, Advancing Semi-Supervised Learning: Methods and Benchmarks.
- 2023, East China Normal University, Introduction to Semi-supervised Learning.
- 2024, Nanjing University, Addressing Low-Resource Challenges in Machine Learning: Strategies from the Early Algorithm Era to the Age of Large Models.