alanliu14/CosyVoice-api

CosyVoice-Enhanced 🚀

Enhanced CosyVoice with Advanced API Services, WebUI, and Pretrained Voice Support

✨ Key Enhancements

This enhanced version adds production-ready features on top of the original CosyVoice project:

🎯 Pretrained Voice Support

  • 8 Built-in Voices: Keira, 步非烟, 营销号-女声, 嘉然, 钟离, etc.
  • Instant Loading: No need for audio samples or prompts
  • Optimized Performance: RTF ~1.4-1.6 with smart caching
  • SFT Pathway: Proper routing for pretrained vs zero-shot voices
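
The routing between the SFT pathway and the zero-shot pathway could be sketched as follows. This is only an illustration: `choose_pathway`, the `PRETRAINED_VOICES` set, and the returned labels are hypothetical names, and the project's actual dispatch logic is not shown in this README.

```python
from typing import Optional

# Hypothetical registry of pretrained voice names; extend it with the
# remaining built-in voices.
PRETRAINED_VOICES = {"Keira"}


def choose_pathway(voice_name: Optional[str], prompt_audio: Optional[bytes]) -> str:
    """Return "sft" for a known pretrained voice, "zero_shot" when a prompt sample is given."""
    if voice_name in PRETRAINED_VOICES:
        return "sft"  # pretrained voice: no audio sample or prompt needed
    if prompt_audio:
        return "zero_shot"  # custom voice: features come from the uploaded sample
    raise ValueError(f"unknown voice {voice_name!r} and no prompt audio supplied")
```

Checking the pretrained registry first means a request that names a built-in voice never pays for prompt processing, even if a sample happens to be attached.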

🌐 Production API Service

  • FastAPI Backend: RESTful endpoints with OpenAPI documentation
  • Multiple Output Modes: Standard, SSE streaming, WAV streaming
  • Zero-Latency Features: Smart caching eliminates 1-2s delays
  • GPU Optimization: FP16 support with 3-10x speed improvements
  • Performance Monitoring: Real-time RTF tracking and benchmarks

🎨 Modern WebUI

  • Gradio 3.x Compatible: Fixed compatibility issues
  • Voice Management: Easy selection between zero-shot and pretrained
  • Real-time Preview: Instant audio generation and playback
  • Responsive Design: Works on desktop and mobile

⚡ Smart Caching System

  • Feature Caching: Pre-extract and cache voice features
  • Pretrained Embedding Cache: Instant voice switching
  • Performance Boost: RTF improvements from 2.3 → 1.6
  • Memory Efficient: Intelligent cache management
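
One way such a feature cache could work is a bounded least-recently-used (LRU) store keyed by voice name. The sketch below is an assumption made for illustration: `FeatureCache` and `get_or_compute` are hypothetical names and do not describe the contents of `cached_voice_manager.py`.

```python
from collections import OrderedDict
from typing import Any, Callable


class FeatureCache:
    """Bounded LRU cache keyed by voice name (hypothetical helper)."""

    def __init__(self, max_size: int = 100) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get_or_compute(self, voice: str, extract: Callable[[], Any]) -> Any:
        if voice in self._items:
            self._items.move_to_end(voice)  # mark as most recently used
            return self._items[voice]
        features = extract()  # slow path: runs only on a cache miss
        self._items[voice] = features
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
        return features

    def __contains__(self, voice: str) -> bool:
        return voice in self._items

    def __len__(self) -> int:
        return len(self._items)
```

With this shape, the first request for a voice pays the extraction cost and later requests reuse the stored features, which is the effect the first-call and cached RTF figures describe.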

🏗️ Architecture

CosyVoice-Enhanced/
├── api_service.py          # FastAPI production server
├── webui_service.py        # Modern Gradio WebUI
├── cached_voice_manager.py # Smart caching system
├── audio_utils.py          # Audio processing utilities
├── voice/                  # Pretrained voice files (*.pt)
├── cosyvoice/              # Core CosyVoice engine
└── pretrained_models/      # Model checkpoints
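
Given this layout, the available pretrained voices could be discovered by scanning the `voice/` folder for `*.pt` files. The sketch below assumes that a voice's name is the file name without its extension; that convention and the `discover_voices` helper are assumptions, not something this README confirms.

```python
from pathlib import Path
from typing import Dict


def discover_voices(voice_dir: str = "voice") -> Dict[str, Path]:
    """Map each voice name (file stem) to its .pt file; empty if the folder is missing."""
    root = Path(voice_dir)
    if not root.is_dir():
        return {}
    # Sorted so the listing is stable across runs and platforms.
    return {path.stem: path for path in sorted(root.glob("*.pt"))}
```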

🚀 Quick Start

1. Installation

git clone https://github.com/yourusername/CosyVoice-Enhanced.git
cd CosyVoice-Enhanced

# Create conda environment
conda create -n cosyvoice-enhanced python=3.10
conda activate cosyvoice-enhanced

# Install dependencies
pip install -r requirements.txt

# For RTX 30/40 series GPUs
pip install -r requirements-rtx30-40.txt

2. Model Setup

# Download models via ModelScope
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')

3. Start Services

# Start API service (production)
python api_service.py --port 8080

# Start WebUI (development/demo)
python webui_service.py --port 7861

📡 API Usage

REST API Examples

# Standard TTS with pretrained voice
curl -X POST http://localhost:8080/v1/audio/speech \
  -F "voice_name=Keira" \
  -F "text=Hello, this is CosyVoice Enhanced!" \
  -o output.wav

# SSE streaming (real-time); the sample text means "real-time streaming speech synthesis test"
curl -N -X POST http://localhost:8080/v1/audio/speech/sse \
  -F "voice_name=嘉然" \
  -F "text=实时流式语音合成测试"

# List available voices
curl http://localhost:8080/voices | jq .

# Performance monitoring
curl http://localhost:8080/performance | jq .
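
A Python client for the SSE endpoint needs to group the streamed lines into events. The sketch below follows the generic Server-Sent Events line format (`data:` fields, blank line as terminator, `:` for comments). `iter_sse_events` is a hypothetical helper, and what the service places in each `data` field (for example encoded audio chunks) is not documented here, so decoding the payload is left out.

```python
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group decoded text lines into events following the Server-Sent Events format."""
    event: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates the current event
            if event:
                yield event
                event = {}
            continue
        if line.startswith(":"):  # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional space after the colon
        if field == "data" and "data" in event:
            event["data"] += "\n" + value  # repeated data lines are joined by newlines
        else:
            event[field] = value
```

With `requests`, the input could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`.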

Python SDK

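import requests

# TTS generation with a pretrained voice
response = requests.post(
    "http://localhost:8080/v1/audio/speech",
    data={
        "voice_name": "Keira",
        "text": "Enhanced CosyVoice is amazing!"
    },
    timeout=120,  # CPU-mode synthesis can take tens of seconds
)
response.raise_for_status()  # fail early instead of saving an error body as audio

# Save the returned audio
with open("output.wav", "wb") as f:
    f.write(response.content)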
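
Before saving, it can be useful to confirm that the response body really is WAV audio and to see how long it is. The sketch below uses only the standard-library `wave` module. `wav_duration_seconds` is a hypothetical helper, and it assumes the standard (non-streaming) endpoint returns a complete PCM WAV file with a correct header.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback duration of an in-memory PCM WAV file."""
    # wave.open raises wave.Error if the bytes are not a valid WAV file.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `wav_duration_seconds(response.content)` after the request gives the audio length in seconds.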

🎭 Available Voices

Pretrained Voices (Instant)

  • Keira: English female voice
  • 步非烟: Chinese elegant female
  • 营销号-女声: Chinese marketing-style female
  • 嘉然: Chinese cute female
  • 钟离: Chinese deep male
  • 叶内法: Chinese sophisticated female

Zero-shot Voices

  • Upload your own audio samples for custom voices
  • Automatic feature extraction and caching

📊 Performance

Benchmark Results

| Voice Type | RTF (First Call) | RTF (Cached) | Latency |
|------------|------------------|--------------|---------|
| Pretrained | 1.8-2.9          | 1.4-1.6      | ~6-8s   |
| Zero-shot  | 2.5-3.2          | 1.5-1.8      | ~7-10s  |
| CPU Mode   | 8-12             | 6-10         | ~30-45s |
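
RTF (real-time factor) is the time spent synthesizing divided by the duration of the audio produced, so values above 1 mean synthesis is slower than playback. A minimal sketch of the calculation follows; the function name and the sample numbers are illustrative only and are not measurements from this project.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Illustrative numbers: 7 s of compute for a 5 s clip.
print(real_time_factor(7.0, 5.0))  # 1.4
```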

GPU Optimization

  • FP16 Support: 2-3x speed improvement
  • CUDA Acceleration: Automatic GPU detection
  • Memory Efficient: Smart memory management
  • Batch Processing: Handles multiple requests

🔧 Configuration

Environment Variables

export COSYVOICE_MODEL_DIR="pretrained_models/CosyVoice2-0.5B"
export COSYVOICE_CACHE_SIZE="100"
export COSYVOICE_GPU_MEMORY_FRACTION="0.8"
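
How the service consumes these variables is not shown in this README. A minimal sketch of reading and validating them might look like the following, where `load_settings`, the returned keys, and the validation rules are assumptions; the variable names and default values are taken from the exports above.

```python
import os
from typing import Dict, Mapping, Union

Setting = Union[str, int, float]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Setting]:
    """Read the three COSYVOICE_* variables, falling back to the values shown above."""
    cache_size = int(env.get("COSYVOICE_CACHE_SIZE", "100"))
    gpu_fraction = float(env.get("COSYVOICE_GPU_MEMORY_FRACTION", "0.8"))
    if cache_size < 1:
        raise ValueError("COSYVOICE_CACHE_SIZE must be at least 1")
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("COSYVOICE_GPU_MEMORY_FRACTION must be in (0, 1]")
    return {
        "model_dir": env.get("COSYVOICE_MODEL_DIR", "pretrained_models/CosyVoice2-0.5B"),
        "cache_size": cache_size,
        "gpu_memory_fraction": gpu_fraction,
    }
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.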

API Configuration

# api_service.py
api_service = CosyVoiceAPIService(
    model_dir="pretrained_models/CosyVoice2-0.5B",
    use_gpu=True,
    fp16=True,
    cache_size=100
)

🐳 Docker Deployment

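FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

# The CUDA base image ships without Python; Ubuntu 22.04 provides Python 3.10
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY . /app
WORKDIR /app

RUN pip3 install -r requirements.txt
EXPOSE 8080

CMD ["python3", "api_service.py", "--host", "0.0.0.0", "--port", "8080"]

# Build the image and run it with GPU access
docker build -t cosyvoice-enhanced .
docker run -d --gpus all -p 8080:8080 cosyvoice-enhanced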

🤝 Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests.

Development Setup

git clone https://github.com/yourusername/CosyVoice-Enhanced.git
cd CosyVoice-Enhanced
pip install -e .

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

This enhanced version maintains the same Apache 2.0 license as the original CosyVoice project to ensure compatibility and proper attribution.

🙏 Acknowledgments

Built on top of the amazing CosyVoice project by the FunAudioLLM team.

Core Dependencies

  • CosyVoice: Base TTS engine
  • FastAPI: Modern API framework
  • Gradio: WebUI framework
  • PyTorch: Deep learning framework
  • ModelScope: Model hosting platform

📞 Support

🗺️ Roadmap

  • Real-time Streaming: WebSocket support for real-time TTS
  • Voice Cloning: Advanced zero-shot voice cloning
  • Multi-language: Enhanced multilingual support
  • Mobile SDK: iOS/Android SDK development
  • Cloud Deployment: Kubernetes helm charts
  • Voice Studio: Advanced voice management interface

⭐ Star this repo if you find it useful! ⭐
