Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
A generative speech model for daily dialogue.
OpenMMLab Detection Toolbox and Benchmark
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Chat with your documents on your local device using GPT models. No data leaves your device, so it stays 100% private.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
OCR, layout analysis, reading order, table recognition in 90+ languages
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
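[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction.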
Chinese version of GPT2 training code, using BERT tokenizer.
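A summary of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness
A state-of-the-art open visual language model | multimodal pretrained model
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models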
[AAAI 2025] Official implementation of "OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on"
A synthetic data generator for text recognition
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
VideoSys: An easy and efficient system for video generation
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis
🩹Editing large language models within 10 seconds⚡
WACV 2020 "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison"
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A series of Chinese-English bilingual multimodal large models based on CPM foundation models
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"