stmatengss
♨️ Focusing

Organizations

@tuna

Starred repositories

My development experience + prompt dictionary = Vibecoding workstation…

Python 5,458 566 Updated Dec 30, 2025

Official inference library for pre-processing of Mistral models

Python 835 121 Updated Dec 27, 2025

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 834 46 Updated Dec 30, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Python 1,213 49 Updated Jun 8, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,646 253 Updated Dec 29, 2025

Light Image Video Generation Inference Framework

Python 1,604 114 Updated Dec 30, 2025

A book for Learning the Foundations of LLMs

15,121 1,399 Updated Dec 12, 2025

Roo Code gives you a whole dev team of AI agents in your code editor.

TypeScript 21,491 2,720 Updated Dec 30, 2025

High-performance safetensors model loader

Python 89 16 Updated Dec 17, 2025

A framework for efficient model inference with omni-modality models

Python 1,859 234 Updated Dec 30, 2025

Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

TypeScript 141,122 18,694 Updated Dec 30, 2025

High Performance KV Cache Store for LLM

C 44 4 Updated Nov 27, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 478 27 Updated Nov 19, 2025

(No description)

Python 645 65 Updated Dec 29, 2025

ModelScope: bring the notion of Model-as-a-Service to life.

Python 8,604 895 Updated Dec 23, 2025

A unified inference and post-training framework for accelerated video generation.

Python 2,878 230 Updated Dec 30, 2025

Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS

Python 407 68 Updated Dec 30, 2025

The Intelligent Inference Scheduler for Large-scale Inference Services.

Go 50 13 Updated Dec 28, 2025

Modular RDMA Interface

C++ 69 15 Updated Dec 28, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 730 75 Updated Nov 30, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,364 92 Updated Dec 30, 2025

This is the code repository of the paper "LightDSA: Enabling Efficient DSA Through Hardware-Aware Transparent Optimization"

C++ 5 Updated Oct 20, 2025

Offline optimization of your disaggregated Dynamo graph

Python 136 40 Updated Dec 23, 2025

Materials for learning SGLang

703 51 Updated Dec 15, 2025

The best ChatGPT that $100 can buy.

Python 39,472 5,022 Updated Dec 28, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,297 115 Updated Dec 27, 2025

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 760 74 Updated Nov 28, 2025

Memory engine and app that is extremely fast and scalable. The Memory API for the AI era.

TypeScript 13,912 1,477 Updated Dec 30, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,383 3,251 Updated Dec 30, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,336 364 Updated Dec 29, 2025