I am a Senior Algorithm Expert at Alibaba Cloud’s Apsara Lab and Tongyi Lab. I received my Ph.D. and B.S. degrees from Zhejiang University in 2018 and 2012, respectively.
My current research focuses on large language models for reasoning and content safety. Prior to this, I contributed to the nationwide implementation of the City Brain Project in multiple cities across China, with a focus on applying artificial intelligence to urban governance to enhance administrative efficiency. I have published over 20 papers in top-tier conferences such as NeurIPS / ICLR / ICML / CVPR / ECCV / ACL / NAACL / KDD / ACM MM and leading journals such as TIP / TCSVT / TMM / TKDE / TOMM.
We are recruiting interns who are interested in large language model reasoning. Please feel free to contact me.
Aug 21, 2025 | Two papers were accepted by EMNLP 2025. |
May 15, 2025 | One paper was accepted by ACL 2025 main conference. |
Jan 23, 2025 | Two papers were accepted by ICLR 2025. |
Jan 23, 2025 | Two papers were accepted by NAACL 2025 with one oral presentation. |
Sep 26, 2024 | One paper was accepted by NeurIPS 2024. |
May 17, 2024 | One paper was accepted by KDD 2024 - Research Track. |
May 02, 2024 | One paper was accepted by ICML 2024. |
(*Corresponding Author, †Project Lead)
-
Don’t Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
Shaotian Yan, Chen Shen*, Wenxiao Wang, Liang Xie, Junjie Liu, and Jieping Ye
In The Thirteenth International Conference on Learning Representations, 2025
Few-shot Chain-of-Thought (CoT) prompting significantly enhances the reasoning capabilities of large language models (LLMs), functioning as a whole to guide these models in generating reasoning steps toward final answers. However, we observe that isolated segments, words, or tokens within CoT demonstrations can unexpectedly disrupt the generation process of LLMs. The model may overly concentrate on certain local information present in the demonstration, introducing irrelevant noise into the reasoning process and potentially leading to incorrect answers. In this paper, we investigate the underlying mechanism of CoT by dynamically tracing and manipulating the inner workings of LLMs at each output step. This analysis demonstrates that tokens exhibiting specific attention characteristics are more likely to induce the model to take things out of context: such tokens attend directly to the hidden states tied to prediction, without substantial integration of non-local information. Building on these insights, we propose a Few-shot Attention Intervention method (FAI) that dynamically analyzes the attention patterns of demonstrations to accurately identify these tokens and then makes targeted adjustments to the attention weights, effectively suppressing their distracting effect on LLMs. Comprehensive experiments across multiple benchmarks demonstrate consistent improvements over baseline methods, including a notable 5.91% gain on the AQuA dataset, further highlighting the effectiveness of FAI.
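The intervention idea above can be sketched in a few lines. This is an illustrative toy, not the paper's actual FAI implementation: the thresholds, the local-window heuristic for "little non-local integration", and the function name are all assumptions made for the sketch.

```python
import numpy as np

def attention_intervention(attn, local_window=2, attend_thresh=0.2,
                           local_mass_thresh=0.8, scale=0.1):
    """Toy sketch of attention intervention (hypothetical, not the real FAI).

    attn: (L, L) lower-triangular row-stochastic matrix; row i is token i's
    attention distribution over tokens 0..i. A demonstration token t is
    flagged when (a) the final query position attends to it strongly and
    (b) t's own attention mass is mostly local, i.e. it integrated little
    non-local information. Flagged tokens' weights in the final row are
    scaled down and the row is renormalized.
    """
    out = attn.copy()
    q = attn.shape[0] - 1  # the prediction (final query) position
    flagged = []
    for t in range(q):
        lo = max(0, t - local_window)
        local_mass = attn[t, lo:t + 1].sum()  # mass on t's local window
        if attn[q, t] > attend_thresh and local_mass > local_mass_thresh:
            flagged.append(t)
    out[q, flagged] *= scale          # suppress distracting tokens
    out[q] /= out[q].sum()            # renormalize the final row
    return out, flagged
```

In a real model this adjustment would be applied inside the attention layers during decoding; the sketch only shows the flag-and-rescale step on a single attention matrix.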
-
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
Sinan Fan, Liang Xie, Chen Shen*†, Ge Teng, Xiaosong Yuan, Xiaofeng Zhang, Chenxi Huang, and 3 more authors
In The Thirteenth International Conference on Learning Representations, 2025
-
Concise and Organized Perception Facilitates Reasoning in Large Language Models
Junjie Liu, Shaotian Yan, Chen Shen*, Zhengdong Xiao, Liang Xie, Wenxiao Wang, and Jieping Ye
In Findings of the Association for Computational Linguistics: NAACL 2025, 2025
-
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
Xiaofeng Zhang, Yihao Quan, Chen Shen†, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, and 3 more authors
In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025
-
Instance-adaptive Zero-shot Chain-of-Thought Prompting
Xiaosong Yuan, Chen Shen†, Shaotian Yan, Xiaofeng Zhang, Liang Xie, Wenxiao Wang, Renchu Guan, and 2 more authors
In Advances in Neural Information Processing Systems, 2024
Please feel free to contact me via email or WeChat.