Stars
🎙️ Arabic TTS models (Tacotron2, FastPitch)
Real time interactive streaming digital human
WebRTC and ORTC implementation for Python using asyncio
Event Driven Orchestration & Scheduling Platform for Mission Critical Applications
This is the GitHub page for publicly available emotional speech data.
B-Llama3o: a Llama 3 model with vision and audio understanding, plus text, audio, and animation data output.
GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of Chinese text
A generative speech model for daily dialogue.
Foundational model for human-like, expressive TTS
A powerful framework for building realtime voice AI agents 🤖🎙️📹
Recognize speech from an audio file and convert it into animation FBX
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
upbit / GPT-SoVITS
Forked from RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
🤖 Build voice-based LLM agents. Modular + open source.
Manipulate audio with a simple and easy high level interface
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
An Open Source text-to-speech system built by inverting Whisper.
Library for fast text representation and classification.
SGLang is a high-performance serving framework for large language models and multimodal models.
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!