mHC kernels implemented in CUDA

Cuda 122 7 Updated Jan 4, 2026

This project aims to replicate mainstream open-source model architectures with limited computational resources, implementing mini models with 100-200M parameters.

Python 48 3 Updated Dec 29, 2025

Official style files for papers submitted to venues of the Association for Computational Linguistics

BibTeX Style 1,461 305 Updated Nov 13, 2025

Official implementation of Log-linear Sparse Attention (LLSA).

Python 42 1 Updated Jan 2, 2026
Python 98 14 Updated Nov 28, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,742 273 Updated Dec 30, 2025

Learning about CUDA by writing PTX code.

Python 151 6 Updated Feb 27, 2024

Code from team O_o, 14th place in the TAAC2025 preliminary round

Python 52 12 Updated Oct 27, 2025
Python 37 4 Updated Oct 31, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 505 24 Updated Dec 23, 2025

A collection of GPU experiments and benchmarks for my personal understanding and research.

Cuda 18 4 Updated Dec 8, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 173 18 Updated Jan 3, 2026

https://bbuf.github.io/gpu-glossary-zh/

Python 24 Updated Nov 7, 2025

GRID: Generative Recommendation with Semantic IDs

Python 520 90 Updated Oct 15, 2025

Minimal reproduction of OneRec

Python 810 117 Updated Dec 17, 2025

Triton implementation of FlashAttention2 that adds Custom Masks.

Python 158 15 Updated Aug 14, 2024

Source-code walkthrough of the lightweight large language model MiniMind, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more

548 49 Updated Jun 16, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 343 68 Updated Dec 3, 2025

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python 4,510 659 Updated Aug 30, 2025

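A large-model training framework supporting the full modern LLM training workflow: base pretraining, long-context pretraining, chain-of-thought reasoning fine-tuning, reinforcement learning training, mixed instruction fine-tuning, direct preference optimization, model weight merging, and a web inference service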

Python 7 Updated Oct 21, 2025

Efficient Triton Kernels for LLM Training

Python 6,001 458 Updated Jan 2, 2026

Benchmarking code for running quantized kernels from vLLM and other libraries

Python 12 1 Updated Dec 3, 2024

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook 2,341 325 Updated Jan 3, 2026

🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime

Python 109 34 Updated Dec 23, 2025

[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"

Python 203 22 Updated Nov 25, 2025

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 153 21 Updated Aug 21, 2025

Reduce kernel based on CUTLASS CuTe and TMA.

Cuda 9 Updated Sep 25, 2025

The first W4A4KV4 quantized + 50% sparse LLMs!

Python 20 1 Updated Nov 13, 2025

A minimal PyTorch re-implementation of Qwen3 VL with a fancy CLI

Python 297 17 Updated Dec 2, 2025

VAE from Scratch in Pure C

C 1 Updated Sep 13, 2025