
Ph.D. student at Utrecht University
Email: y [dot] du [at] uu [dot] nl
Address: BBG 5.05, 3584CC Utrecht, the Netherlands
Short Bio
I am a final-year Ph.D. student at Utrecht University, advised by Dr. Dong Nguyen and Prof. Albert Gatt. I work on NLP and ML, and I am now interested in the attribution of language models, with a focus on reasoning tasks.
Before joining UU, I received both my bachelor’s (Psychology, 2017) and my master’s (Computer Science, 2020, advised by Dr. Yuanbin Wu) degrees from East China Normal University.
I will join Saarland University as a postdoc with Prof. Alexander Koller in Sep. 2025.
Selected Publications
-
Yupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, and Dong Nguyen.
Disentangling the Roles of Representation and Selection in Data Pruning. ACL 2025.
TL;DR: We disentangle and systematically study the influence of data representations and selection algorithms in data pruning.
-
Yuqian Li, Yupei Du, Yufang Liu, Feifei Feng, Mou Xiao Feng, and Yuanbin Wu.
On Support Samples of Next Word Prediction. ACL 2025.
TL;DR: We study the training instances that support the predictions of language models, and reveal that supporting is likely an intrinsic property of data.
-
Yingjin Song, Yupei Du, Denis Paperno, and Albert Gatt.
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? Findings of ACL 2025.
TL;DR: We propose a vision-language benchmark for multi-event temporal grounding and reasoning in image sequences.
-
Yupei Du, Albert Gatt, and Dong Nguyen.
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics. COLING 2025. Previously DMLR Workshop @ ICLR 2024. [pdf]
TL;DR: We show that the training dynamics of an efficient but weak model can be transferred to much more capable models to achieve better robustness and efficiency.
-
Goya van Boven, Yupei Du, and Dong Nguyen.
Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns. FAccT 2024. [pdf]
TL;DR: We show that few-shot counterfactual data augmentation can effectively debias Dutch coreference resolution systems for non-binary pronouns.
-
Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, Meirong Ma.
Understanding Gender Bias in Knowledge Base Embeddings. ACL 2022. [pdf]
TL;DR: We propose methods to both quantify and trace the origins of gender biases in knowledge base (embeddings), using a closed-form approximation of influence functions.
-
Yupei Du, Yuanbin Wu, Man Lan.
Exploring Human Gender Stereotypes with Word Association Test. EMNLP 2019. [pdf]
TL;DR: We use label propagation to quantify and visualize how gender biases are transferred and reinforced through word associations, and therefore offer a large-scale dataset of word-level gender bias scores.
Experience
-
Visiting PhD researcher at LMU Munich, Munich, Germany. Jan. 2025 – Mar. 2025.
Advised by Prof. Barbara Plank at MaiNLP.
-
Applied Scientist Intern at Amazon.com Inc., Berlin, Germany. Nov. 2022 – May 2023.
Mentored by Prof. Ziawasch Abedjan.
-
Research Intern at Sogou Inc., Hangzhou, China. Jul. 2019 – Jan. 2020.
Services
-
Program Committee (2021–): ACL, EMNLP, NAACL, EACL. (2024–): COLM.
-
Outstanding Reviewer: ACL 2022.