I am a third-year Ph.D. candidate (since September 2023) at Mila and Université de Montréal, advised by Prof. Aishwarya Agrawal. My research centers on faithful and efficient multimodal foundation models. My current work focuses on visual reasoning for spatial understanding with unified multimodal modeling. I have also contributed work on online data selection for MLLM training (CVPR 2026), modality alignment for resource-efficient MLLMs (CVPR 2025 Highlight), benchmarking code hallucination (AAAI 2025), and task decomposition for faithful multimodal reasoning (EMNLP 2024). I co-organized the VLMs4All workshop at CVPR 2025, centered on diversity in vision-language models. Before that, I obtained my Master's degree in Computer Science from Harbin Institute of Technology, Shenzhen, China (2020–2023), under the supervision of Prof. Baotian Hu, and my Bachelor's degree in Computer Science from the University of Electronic Science and Technology of China (2016–2020). I was also a research intern at Alibaba DAMO Academy.