A High-Quality Open-Source TTS

迷途小书童's Note

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/djstavaV/article/details/141178396

Hi everyone, I'm 小书童.

In this post I'll introduce parler-tts, an open-source library for inference and training of high-quality TTS (Text To Speech) models.

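Parler-TTS is a lightweight text-to-speech (TTS) model. It generates high-quality, natural-sounding speech, and you control the speaker's style (gender, pitch, speaking style, and so on) with a plain-text description. It is a reproduction of the paper Natural language guidance of high-fidelity text-to-speech with synthetic annotations by Dan Lyth and Simon King of Stability AI and the University of Edinburgh.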
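Because the voice is steered by free text, it can help to build descriptions from a few fixed attributes. The example later in this post uses the same kind of description string. The helper below is hypothetical and is not part of parler-tts. Its attribute names and sentence template are illustrative assumptions, modeled on that example description.

```python
def build_description(
    gender: str = "female",
    expressiveness: str = "slightly expressive and animated",
    speed: str = "moderate",
    pitch: str = "moderate",
    quality: str = "very high",
) -> str:
    """Build a speaker description string from a few style attributes.

    Hypothetical helper: the parameters and template are illustrative,
    not an official parler-tts API.
    """
    # Choose "A" or "An" so the sentence reads naturally ("An elderly ...")
    article = "An" if gender and gender[0].lower() in "aeiou" else "A"
    return (
        f"{article} {gender} speaker delivers a {expressiveness} speech "
        f"with a {speed} speed and {pitch} pitch. "
        f"The recording is of {quality} quality, "
        "with the speaker's voice sounding clear and very close up."
    )


# Example: a male voice with a low pitch, speaking slightly fast
print(build_description(gender="male", speed="slightly fast", pitch="low"))
```

The resulting string is what you would pass to the tokenizer as `description`. The text to be spoken goes in separately as `prompt`.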

Hands-on

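Unlike many other TTS models, Parler-TTS is a fully open-source release. All datasets, preprocessing, training code, and weights are publicly released under permissive licenses, and the repository contains the complete inference and training code. Many thanks to the authors for their work.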

parler-tts has just released two new models, with 880M and 2.3B parameters respectively:

Parler-TTS Mini, an 880M-parameter model
Parler-TTS Large, a 2.3B-parameter model
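Parameter count largely determines how much memory the weights need. Here is a back-of-the-envelope sketch: weight memory = parameters × bytes per parameter. It ignores activations and framework overhead. The Mini repo ID comes from the example in this post. The Large repo ID is an assumption that follows the same naming pattern.

```python
# Checkpoint names: "mini" is the one used in this post's example;
# "large" assumes the same naming pattern.
CHECKPOINTS = {
    "mini": ("parler-tts/parler-tts-mini-v1", 880e6),
    "large": ("parler-tts/parler-tts-large-v1", 2.3e9),
}

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float32") -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, caches and framework overhead, so actual
    usage during generation will be higher.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, (repo_id, n_params) in CHECKPOINTS.items():
    fp32 = weight_memory_gb(n_params, "float32")
    fp16 = weight_memory_gb(n_params, "float16")
    print(f"{name:<5} {repo_id:<32} fp32 ~{fp32:.2f} GB  fp16 ~{fp16:.2f} GB")
```

By this estimate, Mini needs about 3.52 GB in float32 (1.76 GB in half precision). Large needs about 9.20 GB in float32 (4.60 GB in half precision). Treat these figures as lower bounds when choosing a GPU.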

Now let's install it.

```bash
# Create a fresh Python environment using Python 3.9
conda create -n tts python=3.9

# Activate the environment
conda activate tts

# Install parler-tts
pip install git+https://github.com/huggingface/parler-tts.git

# Or install from source
git clone https://github.com/huggingface/parler-tts.git
cd parler-tts
pip install .

# Pin numpy to a specific 1.x release
pip install numpy==1.26.4
```
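Before running the example, a quick sanity check can confirm three things. The interpreter is at least 3.9, the installed numpy is a 1.x release (matching the pin above), and `parler_tts` can be imported. This is a sketch built on the standard library only. It assumes the numpy distribution name `numpy` and the import name `parler_tts` used in the example. The printed labels are arbitrary.

```python
import importlib.util
import sys
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def numpy_is_1x(version: Optional[str]) -> bool:
    """True if a numpy version string belongs to the 1.x series."""
    return version is not None and version.split(".")[0] == "1"


def check_environment() -> Dict[str, bool]:
    """Collect pass/fail results for the setup steps above."""
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "numpy 1.x": numpy_is_1x(installed_version("numpy")),
        # find_spec locates the module without actually importing it
        "parler_tts importable": importlib.util.find_spec("parler_tts") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"[{'OK' if ok else 'FAIL'}] {check}")
```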

Once installed, let's look at an example.

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU if available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the Parler-TTS Mini v1 checkpoint and its tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

# prompt: the text to be spoken; description: how the voice should sound
prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."

# Tokenize the description (voice/style conditioning) and the prompt (content)
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file at the model's sampling rate
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

Play parler_tts_out.wav to hear the generated speech.
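The output file is just a mono waveform sampled at `model.config.sampling_rate`. Its length in seconds is therefore the number of samples divided by the sampling rate. The sketch below shows that relationship using only the standard library. It converts float samples in [-1, 1] to 16-bit PCM and writes them with the `wave` module. It is an illustrative alternative to `sf.write`, not a claim about how soundfile works internally. The demo tone and its 16 kHz rate are made-up inputs. With Parler-TTS you would pass `audio_arr.tolist()` and `model.config.sampling_rate` instead.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_wav_pcm16(path, samples, sampling_rate):
    """Write mono float samples (expected in [-1.0, 1.0]) as 16-bit PCM WAV.

    Returns the duration in seconds (number of samples / sampling rate).
    """
    # Clip each sample to [-1, 1], then scale it to the int16 range
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wf.setframerate(sampling_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm) / sampling_rate


def wav_duration(path):
    """Read back a WAV file's duration in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Demo: half a second of a 440 Hz tone at a hypothetical 16 kHz rate
demo_rate = 16000
demo_tone = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate // 2)]
demo_path = os.path.join(tempfile.gettempdir(), "parler_demo_tone.wav")
demo_seconds = write_wav_pcm16(demo_path, demo_tone, demo_rate)
print(f"wrote {demo_path}: {demo_seconds:.2f} s")
```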

Unfortunately, all of the models released so far were trained on English data, so languages other than English are not supported yet.
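Since the released checkpoints are English-only, an application may want to catch Chinese (or other CJK) input before calling `model.generate`. Below is a rough heuristic sketch. The function names are hypothetical, and the Unicode ranges are an assumption that is not exhaustive: CJK punctuation and kana, the basic and Extension A ideograph blocks, and fullwidth forms.

```python
import re

# Rough heuristic: CJK symbols/punctuation and kana (U+3000-U+30FF),
# CJK Extension A (U+3400-U+4DBF), basic CJK ideographs (U+4E00-U+9FFF),
# and halfwidth/fullwidth forms (U+FF00-U+FFEF). Not an exhaustive list.
_CJK_PATTERN = re.compile(r"[\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]")


def contains_cjk(text: str) -> bool:
    """True if the text contains at least one character in the ranges above."""
    return _CJK_PATTERN.search(text) is not None


def check_prompt(prompt: str) -> str:
    """Hypothetical guard: reject prompts the English-only checkpoints can't handle."""
    if contains_cjk(prompt):
        raise ValueError(
            "prompt contains CJK characters; the released Parler-TTS "
            "checkpoints were trained on English data only"
        )
    return prompt


print(contains_cjk("Hey, how are you doing today?"))  # False
print(contains_cjk("你好，世界"))                      # True
```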

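That said, the project also provides training instructions, so you can train a model yourself by following the documentation if you need one. The training recipe is language-agnostic: you can feed in your own Chinese training data, or fine-tune starting from one of the current English models. Training documentation: https://github.com/huggingface/parler-tts/blob/main/training/README.md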
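Training data for a description-conditioned model pairs each audio clip with its transcript and a natural-language description of the voice. As a starting point for organizing your own Chinese data, here is a minimal sketch that writes those triples to a JSON Lines manifest. The column names `audio`, `text`, and `description`, and the file paths, are hypothetical. Check the training README for the dataset format and column options the official scripts actually expect.

```python
import json
import os
import tempfile
from typing import Iterable, Tuple


def write_manifest(rows: Iterable[Tuple[str, str, str]], out_path: str) -> int:
    """Write (audio_path, transcript, description) triples as JSON Lines.

    Column names are hypothetical; rename them to whatever the training
    configuration reads. Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for audio, text, description in rows:
            if not text.strip() or not description.strip():
                continue  # skip rows missing a transcript or a description
            record = {"audio": audio, "text": text, "description": description}
            # ensure_ascii=False keeps Chinese characters human-readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo with made-up rows; the second row is skipped (empty transcript)
demo_rows = [
    ("clips/0001.wav", "今天天气很好。", "A female speaker speaks calmly at a moderate speed."),
    ("clips/0002.wav", "", "A male speaker talks quickly."),
]
demo_out = os.path.join(tempfile.gettempdir(), "parler_manifest_demo.jsonl")
demo_written = write_manifest(demo_rows, demo_out)
print(f"{demo_written} row(s) written to {demo_out}")
```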