ipengx1029

7 followers · 180 following

Stars

AndreSlavescu / mHC.cu

mHC kernels implemented in CUDA

Cuda 122 7 Updated Jan 4, 2026

WKQ9411 / Mini-LLM

This project aims to replicate mainstream open-source model architectures with limited computational resources, implementing mini models with 100-200M parameters.

Python 48 3 Updated Dec 29, 2025

acl-org / acl-style-files

Official style files for papers submitted to venues of the Association for Computational Linguistics

BibTeX Style 1,461 305 Updated Nov 13, 2025

SingleZombie / LLSA

Official implementation of Log-linear Sparse Attention (LLSA).

Python 42 1 Updated Jan 2, 2026

gogongxt / nano-sglang

Python 98 14 Updated Nov 28, 2025

sgl-project / mini-sglang

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,742 273 Updated Dec 30, 2025

unixpickle / learn-ptx

Learning about CUDA by writing PTX code.

Python 151 6 Updated Feb 27, 2024

salmon1802 / O_o

Code from team O_o, which placed 14th in the TAAC2025 preliminary round

Python 52 12 Updated Oct 27, 2025

KaiyangLi1992 / Uni-LoRA

Python 37 4 Updated Oct 31, 2025

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 505 24 Updated Dec 23, 2025

StuartSul / gpu-experiments

A collection of GPU experiments and benchmarks for my personal understanding and research.

Cuda 18 4 Updated Dec 8, 2025

osayamenja / FlashMoE

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 173 18 Updated Jan 3, 2026

BBuf / gpu-glossary-zh

https://bbuf.github.io/gpu-glossary-zh/

Python 24 Updated Nov 7, 2025

snap-research / GRID

GRID: Generative Recommendation with Semantic IDs

Python 520 90 Updated Oct 15, 2025

AkaliKong / MiniOneRec

Minimal reproduction of OneRec

Python 810 117 Updated Dec 17, 2025

alexzhang13 / flashattention2-custom-mask

Triton implementation of FlashAttention2 that adds Custom Masks.

Python 158 15 Updated Aug 14, 2024

hans0809 / MiniMind-in-Depth

A source-code walkthrough of the lightweight large language model MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more

548 49 Updated Jun 16, 2025

FZJ-JSC / tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 343 68 Updated Dec 3, 2025

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python 4,510 659 Updated Aug 30, 2025

hemengfei2014-stack / stepone

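A large-model training framework that supports the full workflow of modern large-model training, including base pretraining, long-context pretraining, chain-of-thought reasoning fine-tuning, reinforcement learning training, mixed instruction fine-tuning, direct preference optimization, model weight merging, and a web inference service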

Python 7 Updated Oct 21, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 6,001 458 Updated Jan 2, 2026

neuralmagic / quant_kernel_benchmarks

Benchmarking code for running quantized kernels from vLLM and other libraries

Python 12 1 Updated Dec 3, 2024

rasbt / reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook 2,341 325 Updated Jan 3, 2026

huggingface / optimum-onnx

🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime

Python 109 34 Updated Dec 23, 2025

ruikangliu / FlatQuant

[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"

Python 203 22 Updated Nov 25, 2025

HandH1998 / QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 153 21 Updated Aug 21, 2025

HydraQYH / cute_reduce

Reduce kernel based on CUTLASS CuTe and TMA.

Cuda 9 Updated Sep 25, 2025

csguoh / OBR

The first W4A4KV4 quantized + 50% sparse LLMs!

Python 20 1 Updated Nov 13, 2025

Emericen / tiny-qwen

A minimal PyTorch re-implementation of Qwen3 VL with a fancy CLI

Python 297 17 Updated Dec 2, 2025

Phoenix8215 / VAE-C

VAE from Scratch in Pure C

C 1 Updated Sep 13, 2025

Lists (19)

C++/Linux

course

go

NLP-specific repos

NLP and development interview experiences

NLP casual browsing

scratch

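Distributed systems

Large models

Development

Reinforcement learning

Emotion cause

Inference optimization

Datasets

Interesting open-source software

Web crawlers

Special favorites

Training/inference engines

Paper repos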
