A C++ ASR inference project that also runs smoothly on ARM platforms such as the Raspberry Pi 4B, with models optimized from the Transformer.zip

456 files in total: 116 `.h`, 101 `.cpp`, 68 `.py`

Tag: dataset

Uploaded 2024-01-02 15:13:10 · 24.21 MB · ZIP

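C++ ASR Inference in Embedded Systems: Application and Practice

Automatic Speech Recognition (ASR) has become an important bridge for human-computer interaction. In embedded systems in particular, the demand for lightweight, efficient implementations is especially pronounced. This article looks at a C++ ASR inference project that has been optimized to run smoothly on ARM platforms such as the Raspberry Pi 4B; its core is an improved Transformer model.

I. ASR overview

ASR converts human speech into text and draws on signal processing, pattern recognition, and natural language processing. On an embedded device its value is that recognition can run locally, without an external server, which improves the user experience and removes the latency of sending audio over the network.

II. C++ and embedded systems

C++ is widely used in embedded development because of its performance and its rich library support. For an ASR project, C++ gives tighter control over memory and better runtime efficiency, which helps the software run stably on resource-constrained boards such as the Raspberry Pi 4B.

III. The Transformer model and its optimization

The Transformer is a milestone in natural language processing, known for its self-attention mechanism and its suitability for parallel computation. In ASR it can capture long-range dependencies in the speech signal and so improve recognition accuracy. The original Transformer, however, is computationally expensive and poorly suited to resource-constrained embedded environments. Optimization work in a project like this may include making the model lighter, quantizing it, and using hardware acceleration to suit the ARM architecture of the Raspberry Pi 4B.

IV. Datasets for embedded ASR

Datasets are the key to training and validating an ASR model. For embedded use a dataset needs to satisfy two conditions. First, diversity: speech samples covering different contexts, accents, and noise conditions, so that the model generalizes. Second, moderate size: it must be loadable and processable within limited storage. The project may include speech data collected for specific scenarios, giving the model a targeted training basis.

V. Structure of the FastASR-main project

"FastASR-main" is probably the project's main code base and contains the following key components:

1. Preprocessing module: denoises the input audio, splits it into frames, and otherwise prepares it as model input.
2. Model module: contains the optimized Transformer model, probably together with model weight files and an inference interface.
3. Decoder module: converts the model's output features into readable text.
4. Application interface: the API that users or other systems call to perform speech recognition.

VI. Application scenarios

Embedded ASR can be used in smart homes, in-vehicle systems, security systems, and similar areas. Users can control smart devices by voice command, or use voice interaction for navigation and entertainment while driving, which makes human-computer interaction both more convenient and safer.

Summary

This C++ ASR inference project shows how a Transformer model can be used efficiently on an embedded system: with optimization and customization, ASR runs smoothly on ARM platforms such as the Raspberry Pi 4B. Studying the project helps developers learn how to achieve high-performance speech recognition in a resource-constrained environment.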
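To make the framing step of the preprocessing module (section V, item 1) concrete, here is a minimal sketch in Python. It is not taken from the project: `frame_signal` is a hypothetical helper, and the 16 kHz sample rate, 25 ms window, and 10 ms hop are common ASR defaults assumed purely for illustration.

```python
def frame_signal(samples, sample_rate=16000, win_ms=25, hop_ms=10):
    """Split a 1-D sequence of samples into overlapping fixed-length frames.

    Hypothetical helper: window and hop sizes are assumed defaults,
    not values read from the FastASR sources.
    """
    win = sample_rate * win_ms // 1000  # samples per frame (400 at 16 kHz)
    hop = sample_rate * hop_ms // 1000  # samples between frame starts (160)
    if len(samples) < win:
        return []  # not enough audio for even one full frame
    n_frames = 1 + (len(samples) - win) // hop
    # Trailing samples that do not fill a whole frame are dropped.
    return [samples[i * hop : i * hop + win] for i in range(n_frames)]


if __name__ == "__main__":
    one_second = [0.0] * 16000  # one second of silence at 16 kHz
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))  # prints: 98 400
```

Each frame would then be passed to feature extraction; a real front end typically also applies a window function and pads the tail, which this sketch leaves out.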


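Archive contents: A C++ ASR inference project that also runs smoothly on ARM platforms such as the Raspberry Pi 4B, with models optimized from the Transformer.zip (456 files)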

fastasr.cpp.backup 2KB

fastasr_stream_c.c.backup 1KB

fastasr_cli_c.c.backup 1KB

fastasr.h.backup 800B

vad_core.c 26KB

resample_by_2_internal.c 20KB

vad_filterbank.c 14KB

complex_fft.c 10KB

resample_fractional.c 8KB

vad_sp.c 6KB

resample_48khz.c 6KB

min_max_operations.c 5KB

spl_sqrt.c 5KB

vector_scaling_operations.c 5KB

spl_init.c 5KB

complex_bit_reverse.c 4KB

division_operations.c 3KB

webrtc_vad.c 3KB

vad_gmm.c 3KB

downsample_fast.c 2KB

spl_sqrt_floor.c 2KB

get_scaling_square.c 1KB

cross_correlation.c 1KB

energy.c 1KB

spl_inl.c 1KB

checks.cc 5KB

dot_product_with_scale.cc 1KB

setup.cfg 2KB

.clang-format 996B

.clang-tidy 2KB

pybind11Common.cmake 13KB

FindPythonLibsNew.cmake 11KB

pybind11NewTools.cmake 9KB

pybind11Tools.cmake 8KB

FindEigen3.cmake 3KB

FindCatch.cmake 2KB

CODEOWNERS 182B

test_pytypes.cpp 28KB

test_class.cpp 23KB

test_virtual_functions.cpp 22KB

test_methods_and_attributes.cpp 21KB

test_stl.cpp 21KB

FeatureExtract.cpp 21KB

test_numpy_dtypes.cpp 21KB

test_sequences_and_iterators.cpp 20KB

test_numpy_array.cpp 19KB

test_smart_ptr.cpp 18KB

test_eigen.cpp 18KB

test_factory_constructors.cpp 18KB

test_builtin_casters.cpp 15KB

test_interpreter.cpp 14KB

test_multiple_inheritance.cpp 12KB

ModelImp.cpp 12KB

test_exceptions.cpp 11KB

test_copy_move.cpp 11KB

test_operator_overloading.cpp 9KB

test_callbacks.cpp 9KB

test_kwargs_and_defaults.cpp 9KB

CTCDecode.cpp 9KB

test_buffers.cpp 8KB

test_custom_type_casters.cpp 7KB

Audio.cpp 7KB

EmbedLayer.cpp 7KB

test_pickling.cpp 7KB

pybind11_cross_module_tests.cpp 6KB

test_constants_and_functions.cpp 6KB

test_enum.cpp 6KB

EmbedLayer.cpp 5KB

ConvModule.cpp 5KB

test_stl_binders.cpp 5KB

test_tagbased_polymorphic.cpp 4KB

test_numpy_vectorize.cpp 4KB

test_local_bindings.cpp 4KB

pybind11_tests.cpp 4KB

Predictor.cpp 4KB

test_modules.cpp 4KB

test_call_policies.cpp 4KB

test_iostream.cpp 4KB

Vocab.cpp 4KB

test_const_name.cpp 4KB

util.cpp 4KB

ModelParams.cpp 4KB

EncSelfAttn.cpp 4KB

test_chrono.cpp 3KB

test_eval.cpp 3KB

ModelParams.cpp 3KB

ModelImp.cpp 3KB

SubEncoder.cpp 3KB

test_docstring_options.cpp 3KB

test_opaque_types.cpp 3KB

DecSelfAttn.cpp 3KB

Encoder.cpp 3KB

ModelImp.cpp 3KB

Decoder.cpp 2KB

DecoderSrcAttn.cpp 2KB

test_thread.cpp 2KB

456 entries in total

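# FastASR

This is an ASR inference project implemented in C++. It has few dependencies, is easy to install, and runs inference fast enough to work smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized derivatives of Google's Transformer model, trained on the open-source WenetSpeech dataset (10,000+ hours) or on Alibaba's private dataset (60,000+ hours), so recognition quality is good and comparable to many commercial ASR products.

## Project overview

The project currently implements four models: three non-streaming and one streaming, as listed below.

| Name | Source | Dataset | Model | Language |
|:----------------:|:------------:|:-------------------:|:-----------------------------------:|:-----:|
| paraformer | [Alibaba DAMO Academy](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | Private dataset (60000h) | Paraformer-large | zh+en |
| k2_rnnt2 | [kaldi2](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR) | WenetSpeech(10000h) | pruned_transducer_stateless2 | zh |
| conformer | [paddlespeech](https://github.com/PaddlePaddle/PaddleSpeech/releases/tag/r1.0.1) | WenetSpeech(10000h) | conformer_wenetspeech-zh-16k | zh |
| conformer_online | [paddlespeech](https://github.com/PaddlePaddle/PaddleSpeech/releases/tag/r1.0.1) | WenetSpeech(10000h) | conformer_online_wenetspeech-zh-16k | zh |

* **Non-streaming models**: recognition works one sentence at a time, so latency is higher but accuracy is better.
* **Streaming models**: the input is an audio stream and recognition results are returned in real time, at some cost in accuracy.

conformer_online is the streaming model; the others are non-streaming. With the help of VAD, the non-streaming models now also support recognition of long recordings.

All of these models were originally implemented in a deep learning framework (paddlepaddle or pytorch) and already perform well: even on a personal computer without a GPU they meet the real-time requirement (for example, 10 s of speech is processed in less than 10 s). Deploying a deep learning model on an ARM platform, however, runs into two difficulties.

* Installation is not easy; some components have to be compiled by hand.
* Execution is slow and cannot meet the real-time requirement.

That is the reason for this project. It is written in pure C++ and implements only the inference pass of the models.

* **Language advantage**: unlike Python, C++ is a compiled language. The compiler optimizes for the target CPU according to the build options, which makes the code better suited to deployment across CPU platforms and lets it make full use of the CPU.
* **Standalone**: the implementation does not depend on an existing deep learning framework such as pytorch, paddle, or tensorflow.
* **Few dependencies**: the project uses only two third-party libraries, libfftw3 and libopenblas, so it is portable across platforms.
* **Efficient**: the algorithms make heavy use of pointers to avoid the reshape and permute operations of the original implementations and to cut unnecessary data copies, which improves performance.

For C++ users and Python users the project builds the static library libfastasr.a and the PyFastASR.XXX module respectively. See the example directory for how to call them.

### Unfinished work

* Model quantization and compression

## Python installation

The platforms currently supported by fastasr are listed below. For any other platform, a whl package can be built from source.

| Python version | macOS Intel | Windows 64bit | Windows 32bit | Linux x86 | Linux x64 | Linux aarch64 |
|---------------|----|-----|-----|----|-----|----|
| CPython 3.6 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPython 3.7 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPython 3.8 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPython 3.9 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPython 3.10 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPython 3.11 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

Install directly with pip:

```
pip install fastasr
```

## Building and installing from source

### Installing dependencies on Ubuntu

Install the libfftw3 library:

```shell
sudo apt-get install libfftw3-dev libfftw3-single3
```

Install the libopenblas library:

```shell
sudo apt-get install libopenblas-dev
```

Install the Python environment:

```shell
sudo apt-get install python3 python3-dev
```

### Installing dependencies on MacOS

Install the fftw library (Homebrew refuses to run under `sudo`, so run it as a normal user):

```shell
brew install fftw
```

Install the openblas library:

```shell
brew install openblas
```

### Building the source

#### Build for Linux

Download the latest source:

```shell
git clone https://github.com/chenkui164/FastASR.git
```

Build it:

```shell
cd FastASR/
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```

Build the Python whl package:

```shell
cd FastASR/
python -m build
```

#### Build for Windows

[Windows build guide](win/readme.md)

Open CMakeLists.txt with Visual Studio 2022 and build the Release configuration. The Linux development component must be installed in VS2022.

### Downloading the pre-trained models

#### Downloading the paraformer pre-trained model

Enter the FastASR/models/paraformer_cli folder, which holds the downloaded pre-trained model.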
```shell
cd ../models/paraformer_cli
```

Download the pre-trained model from the modelscope website. It lives in this [repository](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files) and can also be fetched from the command line:

```shell
wget --user-agent="Mozilla/5.0" -c "https://www.modelscope.cn/api/v1/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/repo?Revision=v1.0.4&FilePath=model.pb"
mv repo\?Revision\=v1.0.4\&FilePath\=model.pb model.pb
```

Convert the Python model to the C++ format. The converted file can be read directly through memory mapping, which speeds up model loading.

```shell
../scripts/paraformer_convert.py model.pb
```

Check the md5 of the converted parameter file wenet_params.bin. A value of c77bc27e5758ebdc28a9024460e48602 means the conversion is correct.

```
md5sum -b wenet_params.bin
```

#### Downloading the k2_rnnt2 pre-trained model

Enter the FastASR/models/k2_rnnt2_cli folder, which holds the downloaded pre-trained model.

```shell
cd ../models/k2_rnnt2_cli
```

Download the pre-trained model from the huggingface website. It lives in this [repository](https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2) and can also be fetched from the command line:

```shell
wget -c https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2/resolve/main/exp/pretrained_epoch_10_avg_2.pt
```

Convert the Python model to the C++ format. The converted file can be read directly through memory mapping, which speeds up model loading.

```shell
../scripts/k2_rnnt2_convert.py pretrained_epoch_10_avg_2.pt
```

Check the md5 of the converted parameter file wenet_params.bin. A value of 33a941f3c1a20a5adfb6f18006c11513 means the conversion is correct.

```
md5sum -b wenet_params.bin
```

#### Downloading the conformer_wenetspeech-zh-16k pre-trained model

Enter the FastASR/models/paddlespeech_cli folder, which holds the downloaded pre-trained model.
```shell
cd ../models/paddlespeech_cli
```

Download the pre-trained model from the PaddleSpeech website. If PaddleSpeech has already been run on this machine, the download can be skipped: the model is already in `~/.paddlespeech/models/conformer_wenetspeech-zh-16k`.

```shell
wget -c https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar.gz
```

Extract the archive into the wenetspeech directory:

```
mkdir wenetspeech
tar -xzvf asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar.gz -C wenetspeech
```

Convert the Python model to the C++ format. The converted file can be read directly through memory mapping, which speeds up model loading.

```shell
../scripts/paddlespeech_convert.py wenetspeech/exp/conformer/checkpoints/
```
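The paraformer and k2_rnnt2 sections verify the converted `wenet_params.bin` with `md5sum`. On platforms where `md5sum` is not available, the same check could be done in Python, as in the sketch below. The two digests are the ones quoted in the README; the helper names `file_md5` and `check_params` are hypothetical and not part of FastASR. No digest is listed for the conformer model because the text above does not give one.

```python
import hashlib

# Expected digests of wenet_params.bin, as quoted in the README sections above.
EXPECTED_MD5 = {
    "paraformer": "c77bc27e5758ebdc28a9024460e48602",
    "k2_rnnt2": "33a941f3c1a20a5adfb6f18006c11513",
}


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_params(path, model):
    """Return True if the file at `path` matches the digest quoted for `model`."""
    return file_md5(path) == EXPECTED_MD5[model]
```

For example, `check_params("wenet_params.bin", "paraformer")` would be run from inside `FastASR/models/paraformer_cli` after the conversion step.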
