
🎙️ Arabic TTS models (Tacotron2, FastPitch)

Jupyter Notebook 135 33 Updated Dec 13, 2025

Code and Data for Tau-Bench

Python 1,039 166 Updated Aug 28, 2025

Real time interactive streaming digital human

Python 6,956 1,082 Updated Jan 1, 2026

WebRTC and ORTC implementation for Python using asyncio

Python 4,967 860 Updated Nov 29, 2025

Event Driven Orchestration & Scheduling Platform for Mission Critical Applications

Java 26,213 2,426 Updated Jan 2, 2026

This is the GitHub page for publicly available emotional speech data.

378 27 Updated Jan 6, 2022

B-Llama3o: a Llama 3 model with vision and audio understanding, as well as text, audio, and animation data output.

Python 26 4 Updated Jun 3, 2024

GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of Chinese text

Python 105 12 Updated Mar 8, 2024

A generative speech model for daily dialogue.

Python 38,454 4,178 Updated Dec 3, 2025

Foundational model for human-like, expressive TTS

Python 4,196 694 Updated Jul 30, 2024

A powerful framework for building realtime voice AI agents 🤖🎙️📹

Python 8,924 2,329 Updated Jan 3, 2026

Recognize speech from an audio file and convert it into animation FBX

Python 24 3 Updated Mar 7, 2022

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

Python 70,820 7,731 Updated Jan 4, 2026

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 17 1 Updated Apr 27, 2024

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 2,266 132 Updated May 30, 2025
Python 354 36 Updated May 17, 2024

🤖 Build voice-based LLM agents. Modular + open source.

Python 3,673 646 Updated Nov 15, 2024

A PyTorch-based Speech Toolkit

Python 10,997 1,620 Updated Jan 1, 2026

Manipulate audio with a simple and easy high level interface

Python 9,695 1,124 Updated Jul 26, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 44,100 5,884 Updated Aug 16, 2024

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 4,550 263 Updated Dec 14, 2025

KenLM: Faster and Smaller Language Model Queries

C++ 2,714 534 Updated Mar 30, 2025

Library for fast text representation and classification.

HTML 26,466 4,815 Updated Mar 22, 2024

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,114 3,933 Updated Jan 4, 2026

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Python 411 61 Updated Nov 20, 2025

Best-practice TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft; supports ONNX streaming output.

Python 1,224 177 Updated Feb 5, 2024

Emotion-controllable speech synthesis model that requires no emotion labels, based on VITS

Jupyter Notebook 1,397 169 Updated Mar 30, 2023

Everyone Can Use English

TypeScript 33,172 4,681 Updated Nov 25, 2025