Stars
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle ("飞桨") core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
Optimized primitives for collective multi-GPU communication
"Hello Algorithms" (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. Simplified and Traditional Chinese editions are updated in sync; English version in translation
An easy-to-understand TensorOp Matmul tutorial
Distributed Compiler based on Triton for Parallel Systems
AKG (Auto Kernel Generator) is an optimizer for operators in deep learning networks; it can automatically fuse ops that match specific patterns.
vincentloechner / polylib
Forked from harenome/polylib. PolyLib official git.
An open-source AI agent that brings the power of Gemini directly into your terminal.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
Shared Middle-Layer for Triton Compilation
SGLang is a fast serving framework for large language models and vision language models.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
A model compilation solution for various hardware
Adlik: Toolkit for Accelerating Deep Learning Inference
Tile primitives for speedy kernels
Efficient Triton Kernels for LLM Training
Building blocks for foundation models.
GPU programming related news and material links