Pingchuan Ma
Ph.D. Student, ASU
Email: pingchua (at) asu.edu
Biography
Pingchuan Ma (馬平川) is a second-year Ph.D. student at ASU, advised by Prof. Huan Liu in the DMML Lab. His current research interests include causal machine learning. Previously, he obtained his Master of Science degree from USC and worked as an analog engineer at Renesas. Before that, he received his undergraduate degree from ShanghaiTech University. He previously had the privilege of working with Prof. Jiaqi Gu. [...]
Name

Family name: 馬 (Mǎ) means horse.

Given name: 平 (Píng) 川 (chuān) means flat and open land without geographical barriers.

Quoted from the Chinese idiom 一馬平川, which depicts a lone horse galloping freely across a vast, unobstructed plain. It symbolizes unimpeded progress and a life journey free of obstacles.

眾峰來自天目山,勢若駿馬奔平川

The many peaks descend from Tianmu Mountain, with the might of fine horses galloping across the open plain.

Research
(* indicates equal contribution; highlighted entries are representative papers)

Abstract: Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more superficial than it appears, motivating us to explore further. In this paper, we study CoT reasoning via a data distribution lens and investigate if CoT reasoning reflects a structured inductive bias learned from in-distribution data, allowing the model to conditionally generate reasoning paths that approximate those seen during training. Thus, its effectiveness is fundamentally bounded by the degree of distribution discrepancy between the training data and the test queries. With this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To investigate each dimension, we design DataAlchemy, an isolated and controlled environment to train LLMs from scratch and systematically probe them under various distribution conditions. Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.

To be released upon acceptance.
Education
Ph.D. in Computer Engineering, Arizona State University
Jan. 2024 - Present
Tempe, AZ
M.Sc. in Electrical Engineering, University of Southern California
Jan. 2021 - Dec. 2022
Los Angeles, CA
B.Eng. in Electrical Engineering, ShanghaiTech University
Sep. 2016 - Jun. 2020
Shanghai, China
Experience
Graduate Research Assistant, Arizona State University
Aug. 2025 - Present
Tempe, AZ
Research Assistant, Arizona State University | With Prof. Jiaqi Gu
Jan. 2024 - Aug. 2025
Tempe, AZ
Analog Engineer, Renesas
Jan. 2023 - Dec. 2023
San Jose, CA
Research Assistant, ShanghaiTech University | With Prof. Baile Chen
Sep. 2018 - Jun. 2020
Shanghai, China