Speech-to-text, text-to-speech, and speaker recognition
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Interface for OuteTTS models
Open-source multi-speaker long-form text-to-speech model
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Official PyTorch Implementation
A LaTeX class for producing presentations and slides
Towards Human-Level Text-to-Speech through Style Diffusion
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
The HTML Presentation Framework
The ioquake3 community effort to continue supporting/developing id's
The official KotlinConf application
macOS system-wide audio equalizer & volume mixer
A private, local meeting notes assistant
A generative speech model for daily dialogue
One-click deployment (including offline integration package)
A PyTorch-based Speech Toolkit
Offline speech recognition API for Android, iOS, Raspberry Pi
Synchronized Translation for Videos
Multi-modal large language model designed for audio understanding
Jekyll template for a conference website containing program
Instant voice cloning by MIT and MyShell. Audio foundation model
A deep learning toolkit for Text-to-Speech, battle-tested in research
A library for development of single-page full-stack web applications
Foundational model for human-like, expressive TTS