Starred repositories
tukuaiai / vibe-coding-cn
Forked from EnzeD/vibe-coding. My development experience + prompt dictionary = Vibecoding workstation.
Official inference library for pre-processing of Mistral models
🤗A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Light Image Video Generation Inference Framework
A book for Learning the Foundations of LLMs
Roo Code gives you a whole dev team of AI agents in your code editor.
High-performance safetensors model loader
A framework for efficient model inference with omni-modality models
Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.
An early-research-stage expert-parallel load balancer for MoE models based on linear programming.
ModelScope: bring the notion of Model-as-a-Service to life.
A unified inference and post-training framework for accelerated video generation.
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS
The Intelligent Inference Scheduler for Large-scale Inference Services.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
This is the code repository of paper "LightDSA: Enabling Efficient DSA Through Hardware-Aware Transparent Optimization"
Offline optimization of your disaggregated Dynamo graph
Distributed Compiler based on Triton for Parallel Systems
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
An extremely fast, scalable memory engine and app. The Memory API for the AI era.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.