About Me

Hello! I am Kaiyue Wen. I am a second-year PhD student at Stanford University, where I am grateful to be advised by Tengyu Ma and Percy Liang. I graduated from Tsinghua University, where I was a member of Yao's pilot class. Here are my CV and Publications. During my undergraduate studies, I was fortunate to be advised by Tengyu Ma, Zhiyuan Liu, Andrej Risteski, Jingzhao Zhang, Yuhao Wang, and Zhiyuan Li.

My research interests span deep learning broadly. My long-term goal is to understand the physics behind deep learning, and I believe a combination of theoretical analysis and empirical study is essential to reaching it.

Recently, I’ve become fascinated by two fundamental axes of scaling in deep learning.

  1. Demystifying pretraining: Pretraining has been the driving force behind the evolution of large language models, yet many foundational algorithmic choices remain poorly understood. Key aspects such as optimizers, architectures, and hyperparameter scaling strategies still lack consensus. My goal is to clarify these choices through rigorous benchmarking (e.g., benchmarking modern optimizers) and theoretical analysis (e.g., exploring the representational limitations of RNNs, architectures beyond $\mathrm{TC}^0$, and the river-valley loss landscape). Most of my research in this direction is carried out in the open-source project Marin.

  2. New algorithmic paradigms in reasoning: With recent progress in reasoning-focused reinforcement learning (RL), particularly innovations like long chain-of-thought (CoT) RL, there is growing potential to push the limits of model reasoning. While I am new to this field, my aim is to design end-to-end trainable multi-agent RL systems that build upon and extend the capabilities of current long-CoT RL paradigms.

Recent News

One More Thing

I keep a firm faith in analytical thinking, hard work, and consistent self-improvement. Any advice or feedback is welcome: you can use this Anonymous Form or talk with me in person.