MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows and exposes both a Python API and a command-line interface for generating audio from text. Because it runs on MLX on Apple Silicon, inference is fast and local, and it can take advantage of hardware acceleration and quantization for efficient on-device performance.

The CLI entry point is mlx_audio.tts.generate, and the Python API supports programmatic generation with parameters for voice, speed, language hints, output format, and sample rate. Bundled examples such as audiobook generation demonstrate long-form synthesis with joined audio segments.

MLX-Audio also provides a web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.
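The Python API described above could be wrapped along the following lines. This is a minimal sketch, not the project's documented interface: the import path `mlx_audio.tts.generate.generate_audio` and the keyword names (`model_path`, `voice`, `lang_code`, `audio_format`, `join_audio`, and so on) are assumptions that may differ between releases, the default values are purely illustrative, and `build_tts_options` and `synthesize` are hypothetical helper names introduced here.

```python
def build_tts_options(text, voice="af_heart", speed=1.0, lang_code="a",
                      audio_format="wav", sample_rate=24000,
                      file_prefix="output", join_audio=True):
    """Collect and sanity-check generation options in one place.

    All keyword names and defaults are assumptions for illustration;
    check them against the installed MLX-Audio version.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "text": text,
        "voice": voice,                # voice choice
        "speed": speed,                # speaking-rate multiplier
        "lang_code": lang_code,        # language hint
        "audio_format": audio_format,  # output format
        "sample_rate": sample_rate,    # output sample rate in Hz
        "file_prefix": file_prefix,    # prefix for generated files
        "join_audio": join_audio,      # merge segments into one file
    }


def synthesize(text, model_path, **overrides):
    """Generate audio for `text` with the model named by `model_path`."""
    # Imported lazily so this module still loads where MLX-Audio is absent.
    # Assumed entry point; verify against your installed version.
    from mlx_audio.tts.generate import generate_audio

    options = build_tts_options(text, **overrides)
    generate_audio(model_path=model_path, **options)
```

Calling `synthesize("Hello, world", model_path=..., speed=1.2)` with a model identifier of your choice would then forward the validated options to the library. Keeping validation in a separate function makes the option handling testable on machines without Apple Silicon.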
Features
- Apple-Silicon-optimized speech library built on MLX for fast local inference
- Command-line and Python APIs for TTS with controls for voice, speed, language, and output format
- Web interface with live waveform and 3D audio visualization plus file upload and playback
- Support for multilingual text-to-speech and transcription
- Quantization options and efficient model loading for better performance on resource-limited devices
- Built-in examples for tasks like audiobook creation, and an integrated FastAPI server for HTTP access
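Long-form tasks such as audiobook creation rely on joining separately generated audio segments. The sketch below illustrates that joining step using only Python's standard `wave` module; it is not MLX-Audio's own implementation, and `join_wav_segments` is a hypothetical helper name. It assumes every segment is an uncompressed WAV file with the same channel count, sample width, and sample rate.

```python
import wave


def join_wav_segments(segment_paths, output_path):
    """Concatenate WAV files that share one format into a single file."""
    if not segment_paths:
        raise ValueError("need at least one segment")

    params = None  # (channels, sample width in bytes, frame rate)
    chunks = []
    for path in segment_paths:
        with wave.open(path, "rb") as seg:
            fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                # Mixing formats would produce corrupted audio, so refuse.
                raise ValueError(f"{path}: format {fmt} differs from {params}")
            chunks.append(seg.readframes(seg.getnframes()))

    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for chunk in chunks:
            out.writeframes(chunk)
    return output_path
```

For example, `join_wav_segments(["chapter1_000.wav", "chapter1_001.wav"], "chapter1.wav")` would write one continuous file (the file names here are placeholders). Reading all segments into memory keeps the sketch short; a streaming variant would be preferable for very long recordings.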