Stars
JiaVMode / Qwen3-SmVL
Forked from ShaohonChen/Qwen3-SmVL
Splices the SmolVLM2 vision head onto the Qwen3-0.6B model and fine-tunes the combined model
Everything about the SmolLM and SmolVLM family of models
AI-Powered Photos App for the Decentralized Web 🌈💎✨
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
Align Anything: Training All-modality Model with Feedback
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
YOLOv5 and DeepSORT for pedestrian and vehicle tracking, detection, and counting
Optimizing inference proxy for LLMs
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
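This project shares the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)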
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Ongoing research training transformer models at scale
Retrieval and Retrieval-augmented LLMs
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Scalable data preprocessing and curation toolkit for LLMs
The state-of-the-art image restoration model without nonlinear activation functions.
A tool for manually annotating and ranking response data in the RLHF stage of large-model training.