【RV1126】移植sherpa实时语音识别和TTS文字转语音功能

2024-02-28 08:55:49 浏览数 (1)

参考:【RV1126】移植kaldi实时语音识别 https://blog.csdn.net/qq_28877125/article/details/130376397

交叉编译sherpa

1、下载arm-gcc,要求gcc大于10.0;刚开始用瑞芯微的gcc库,一直编译不过。

代码语言:javascript复制
wget -q https://huggingface.co/csukuangfj/sherpa-ncnn-toolchains/resolve/main/gcc-arm-10.3-2021.07-x86_64-arm-none-linux-gnueabihf.tar.xz
tar xf gcc-arm-10.3-2021.07-x86_64-arm-none-linux-gnueabihf.tar.xz


arm-none-linux-gnueabihf-gcc --version

arm-none-linux-gnueabihf-gcc (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)) 10.3.1 20210621Copyright (C) 2020 Free Software Foundation, Inc.This is free software; see the source for copying conditions.  There is NOwarranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

其实,这里还是有问题,还是建议用瑞芯微提供的gcc编译工具包,要不会因为gcc的版本不一样,生成的可执行文件还是不能运行。

2、编译

代码语言:javascript复制
sh -x ./build-arm-linux-gnueabihf.sh

编译成功后,

3、下载模型库,参考:

代码语言:javascript复制
使用的模型是小模型:
https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/zipformer-transucer-models.html#sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16

https://huggingface.co/csukuangfj/sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16

直接通过浏览器下载好,然后拷贝到开发板上

4、在正点原子的rv1126开发板上测试

代码语言:javascript复制
[root@ATK-DLRV1126:/userdata/rv1126]# ./sherpa-ncnn-alsa 
> ./tokens.txt 
> ./encoder_jit_trace-pnnx.ncnn.param 
> ./encoder_jit_trace-pnnx.ncnn.bin 
> ./decoder_jit_trace-pnnx.ncnn.param 
> ./decoder_jit_trace-pnnx.ncnn.bin 
> ./joiner_jit_trace-pnnx.ncnn.param 
> ./joiner_jit_trace-pnnx.ncnn.bin 
> "default" 
> 4 
> greedy_search
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./joiner_jit_trace-pnnx.ncnn.bin", tokens="./tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=300)), enable_endpoint=True, hotwords_file="", hotwrods_score=1.5)
Current sample rate: 16000
Recording started!
Use recording device: default
0:床前明月光
1:疑似地上霜
2:举头望明月
3:低头思故乡
4:你好^Z[1]   Stopped                    ./sherpa-ncnn-alsa ./tokens.txt ./encoder_jit_trace-pnnx.ncnn.param ./encoder_jit_trace-pnnx.ncnn.bin ./decoder_jit_trace-pnnx.ncnn.param ./decoder_jit_trace-pnnx.ncnn.bin ./joiner_jit_trace-pnnx.ncnn.param ./joiner_jit_trace-pnnx.ncnn.bin "default" 4 greedy_search
[root@ATK-DLRV1126:/userdata/rv1126]#

0 人点赞