Stars
gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image editing, and text-to-video models.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Tongyi Deep Research, the Leading Open-source Deep Research Agent
This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
MTEB: Massive Text Embedding Benchmark
Question and Answer based on Anything.
Production-ready platform for agentic workflow development.
Generating fake data for the JVM (Java, Kotlin, Groovy) has never been easier!
Building a MiniLLM from scratch (pretrain + SFT + DPO, work in progress)
ChatPilot: Chat Agent Web UI. Implements a chat front end with Google search, chat over files and URLs (RAG), and a code interpreter; reproduces Kimi Chat (drag in a file, send a URL).
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
HuatuoGPT2, One-stage Training for Medical Adaption of LLMs. (An Open Medical GPT)
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Text similarity, semantic vectors, text vectors: text-similarity, similarity, sentence-similarity, BERT, SimCSE, BERT-Whitening, Sentence-BERT, PromCSE, SBERT
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on a local LLM, an embedding model, and a reranker model; supports GraphRAG, with no third-party agent library required.
yuyhao / ChatPDF
Forked from shibing624/ChatPDF
RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF
LlamaIndex is the leading framework for building LLM-powered agents over your data.
OpenAI-style API for open large language models, using LLMs just like ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3 etc.…
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Retrieval and Retrieval-augmented LLMs
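Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records; written in Python 3 and ready to use out of the box.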
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
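Llama Chinese Community: continuously collects the latest Llama learning resources and aims to build the best open-source ecosystem for Chinese Llama models; fully open source and available for commercial use.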
A series of large language models developed by Baichuan Intelligent Technology
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models.
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
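A repository for pretraining from scratch and then SFT of a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to obtain a chat-llama2 with basic Chinese question-answering ability.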