50 stars written in Python

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,582 31,631 Updated Jan 5, 2026

Inference code for Llama models

Python 59,019 9,812 Updated Jan 26, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 51,542 4,281 Updated Jan 4, 2026

Grok open release

Python 50,573 8,369 Updated Aug 30, 2024

A generative speech model for daily dialogue.

Python 38,466 4,183 Updated Dec 3, 2025

OpenMMLab Detection Toolbox and Benchmark

Python 32,243 9,835 Updated Aug 21, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,254 2,694 Updated Aug 12, 2024

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

Python 22,044 2,453 Updated Oct 2, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 20,396 2,148 Updated Dec 18, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 19,067 1,309 Updated Oct 21, 2025

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 14,303 1,487 Updated Jan 4, 2026

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,290 1,238 Updated Nov 4, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 9,668 750 Updated Sep 22, 2025

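A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.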

Python 8,254 850 Updated Aug 21, 2025

Chinese version of GPT2 training code, using BERT tokenizer.

Python 7,607 1,699 Updated Apr 25, 2024

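A summary of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, and engineering skills, to improve core competitiveness.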

Python 7,438 1,208 Updated Aug 24, 2022
Python 6,828 1,155 Updated Dec 21, 2025

A state-of-the-art open visual language model | Multimodal pretrained model

Python 6,713 450 Updated May 29, 2024

Firefly: a large model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 6,626 588 Updated Oct 24, 2024

[AAAI 2025] Official implementation of "OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on"

Python 6,500 941 Updated May 13, 2024

A synthetic data generator for text recognition

Python 3,619 1,019 Updated Jul 18, 2024

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Python 3,148 272 Updated Jan 10, 2025

VideoSys: An easy and efficient system for video generation

Python 2,014 133 Updated Aug 27, 2025

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,857 139 Updated Jul 5, 2024

Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis

Python 1,538 150 Updated Jul 29, 2024

🩹Editing large language models within 10 seconds⚡

Python 1,357 101 Updated Aug 13, 2023

WACV 2020 "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison"

Python 1,121 150 Updated Mar 18, 2023

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on CPM foundation models

Python 1,072 89 Updated Jun 13, 2024

Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Python 864 53 Updated May 8, 2025