Yupei Du

Ph.D. student at Utrecht University

Email: y [dot] du [at] uu [dot] nl

Address: BBG 5.05, 3584CC Utrecht, the Netherlands

Short Bio

I am a final-year Ph.D. student at Utrecht University, advised by Dr. Dong Nguyen and Prof. Albert Gatt. I work on NLP and ML, and I am now interested in the attribution of language models, with a focus on reasoning tasks.

Before joining UU, I received both my bachelor’s (Psychology, 2017) and my master’s (Computer Science, 2020, advised by Dr. Yuanbin Wu) degrees from East China Normal University.

I am joining Saarland University as a postdoc with Prof. Alexander Koller from Sep. 2025.

Selected Publications

Yupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, and Dong Nguyen.

Disentangling the Roles of Representation and Selection in Data Pruning. ACL 2025.
TL;DR: We disentangled and systematically studied the influence of data representation and selection algorithm in data pruning.
Yuqian Li, Yupei Du, Yufang Liu, Feifei Feng, Mou Xiao Feng, and Yuanbin Wu.

On Support Samples of Next Word Prediction. ACL 2025.
TL;DR: We study the training instances that support the predictions of language models, and reveal that supporting is likely an intrinsic property of data.
Yingjin Song, Yupei Du, Denis Paperno, and Albert Gatt.

Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? Findings of ACL 2025.
TL;DR: We propose a vision-language benchmark for multi-event temporal grounding and reasoning in image sequences.
Yupei Du, Albert Gatt, and Dong Nguyen.

FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics. COLING 2025. Previously DMLR Workshop @ ICLR 2024. [pdf]
TL;DR: We show that the training dynamics of an efficient but weak model can be transferred to much more capable models to achieve better robustness and efficiency.
Goya van Boven, Yupei Du, and Dong Nguyen.

Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns. FAccT 2024. [pdf]
TL;DR: We show that few-shot counterfactual data augmentation can effectively debias Dutch coreference resolution systems for non-binary pronouns.
Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, and Meirong Ma.

Understanding Gender Bias in Knowledge Base Embeddings. ACL 2022. [pdf]
TL;DR: We propose methods to both quantify and trace the origins of gender biases in knowledge base (embeddings), using a closed-form approximation of influence functions.
Yupei Du, Yuanbin Wu, and Man Lan.

Exploring Human Gender Stereotypes with Word Association Test. EMNLP 2019. [pdf]
TL;DR: We use label propagation to quantify and visualize how gender biases are transferred and reinforced through word associations, and therefore offer a large-scale dataset of word-level gender bias scores.

Experience

Visiting PhD researcher at LMU Munich, Munich, Germany. Jan. 2025 – Mar. 2025.

Advised by Prof. Barbara Plank at MaiNLP.
Applied Scientist Intern at Amazon.com Inc., Berlin, Germany. Nov. 2022 – May 2023.

Mentored by Prof. Ziawasch Abedjan.
Research Intern at Sogou Inc., Hangzhou, China. Jul. 2019 – Jan. 2020.

Services

Program Committee (2021–): ACL, EMNLP, NAACL, EACL. (2024–): COLM.
Outstanding Reviewer: ACL 2022.