yuyhao

2 followers · 1 following

Stars

shell-nlp / gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image editing, and text-to-video models.

Python 242 21 Updated Dec 25, 2025

hiyouga / EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,356 334 Updated Dec 29, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,804 1,369 Updated Dec 30, 2025

XiaoMi / subllm

This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Python 68 4 Updated Aug 13, 2024

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

Python 3,053 528 Updated Jan 3, 2026

netease-youdao / QAnything

Question and Answer based on Anything.

Python 13,803 1,328 Updated Mar 24, 2025

langgenius / dify

Production-ready platform for agentic workflow development.

Python 124,432 19,350 Updated Jan 2, 2026

datafaker-net / datafaker

Generating fake data for the JVM (Java, Kotlin, Groovy) has never been easier!

Java 1,719 226 Updated Dec 31, 2025

Tongjilibo / build_MiniLLM_from_scratch

Building a MiniLLM from scratch (pretrain + SFT + DPO, work in progress)

Python 511 60 Updated Mar 23, 2025

shibing624 / ChatPilot

ChatPilot: Chat Agent Web UI. Implements a chat front end with support for Google search, chat over files and URLs (RAG), and a code interpreter; reproduces Kimi Chat (drag in a file, send a URL).

Svelte 588 60 Updated Jun 23, 2025

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 53,738 5,889 Updated Dec 30, 2025

FreedomIntelligence / HuatuoGPT-II

HuatuoGPT2, One-stage Training for Medical Adaption of LLMs. (An Open Medical GPT)

Python 397 60 Updated Aug 30, 2024

dvlab-research / LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,696 294 Updated Aug 14, 2024

hellonlp / sentence-similarity

Text similarity, semantic vectors, text vectors, text-similarity, similarity, sentence-similarity, BERT, SimCSE, BERT-Whitening, Sentence-BERT, PromCSE, SBERT

Python 75 13 Updated Nov 26, 2024

dongrixinyu / JioNLP

A Chinese NLP Preprocessing & Parsing Package: accurate, efficient, and easy to use. www.jionlp.com

Python 3,790 446 Updated Nov 27, 2025

shibing624 / ChatPDF

RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on local LLM, embedding, and reranker models; supports GraphRAG and requires no third-party agent libraries.

Python 824 143 Updated Apr 2, 2025

yuyhao / ChatPDF

Forked from shibing624/ChatPDF

RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF

Python 1 Updated Jan 9, 2024

run-llama / llama_index

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Python 46,121 6,677 Updated Jan 2, 2026

xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3 etc.…

Python 2,462 280 Updated Sep 26, 2024

bojone / CoSENT

A sentence-embedding approach more effective than Sentence-BERT

Python 375 25 Updated Nov 9, 2022

liucongg / ChatGPTBook

"ChatGPT Principles and Practice: Algorithms, Techniques, and Private Deployment of Large Language Models"

Python 369 77 Updated Dec 9, 2023

google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Python 6,467 789 Updated Nov 7, 2025

FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Python 11,083 820 Updated Dec 15, 2025

shibing624 / similarities

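Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of items; developed in Python 3 and works out of the box.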

Python 891 90 Updated Oct 29, 2024

openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 16,915 1,359 Updated Oct 6, 2025

LlamaFamily / Llama-Chinese

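Llama Chinese community: aggregates the latest Llama learning resources in real time and builds the best open-source ecosystem for Chinese Llama large models; fully open source and commercially usable.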

Python 14,752 1,304 Updated Apr 6, 2025

baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Python 4,123 292 Updated Nov 8, 2024

yangjianxin1 / Firefly

Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models.

Python 6,625 588 Updated Oct 24, 2024

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python 4,508 659 Updated Aug 30, 2025

DLLXW / baby-llama2-chinese

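A repository for pretraining from scratch and then fine-tuning (SFT) a small-parameter Chinese LLaMa2; runs on a single 24 GB GPU and yields a chat-llama2 with basic Chinese question-answering ability.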

Python 2,887 348 Updated May 21, 2024
