PhD, Tsinghua (16~21); Postdoc, Alibaba (21~23); Staff Engineer, Alibaba (23~present)
Starred repositories: 3 stars written in Cuda
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.





