Speech Recognition Overview
Data / Corpora
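English data
• TIMIT: phoneme recognition, LDC-licensed
• WSJ: read news, LDC-licensed
• Switchboard: telephone conversations, LDC-licensed
• Aurora4: robust speech recognition (WSJ with added noise) (http://aurora.hsnr.de/aurora-4.html)
• LibriSpeech: audiobooks, 1000 hours, open source (http://openslr.org/12/)
• AMI: meetings, open source (http://openslr.org/16/)
• TED-LIUM: talks, open source (http://openslr.org/19/)
• CHiME-4: far-field speech recorded with a tablet, access on request
• CHiME-5/6: dinner-party conversations, access on request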
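Chinese data
• THCHS-30: 30 hours, open source (http://openslr.org/18/)
• HKUST: 150 hours, telephone conversations, LDC-licensed
• AISHELL-1: 178 hours, open source (http://openslr.org/33/)
• AISHELL-2: 1000 hours, open source, application required (http://www.aishelltech.com/aishell_2)
• aidatatang_200zh: 200 hours, open source (http://openslr.org/62/)
• MAGICDATA: 755 hours, open source (http://openslr.org/68/)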
Toolkits
• HTK: http://htk.eng.cam.ac.uk/ (C)
• Kaldi: http://kaldi-asr.org/ (C++, Python) [recommended]
• ESPnet: https://github.com/espnet/ (PyTorch-based)
• Lingvo: https://github.com/tensorflow/lingvo.git (TensorFlow-based)
Recommended reading
• Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, Prentice Hall, 2008 (or the third edition)
• Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall, 2001
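• Han Jiqing, Zhang Lei, Zheng Tieran, Speech Signal Processing (《语音信号处理》), Tsinghua University Press
• Zhao Li, Speech Signal Processing (《语音信号处理》), China Machine Press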
• Lawrence Rabiner, Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993
• Dong Yu and Li Deng, Automatic Speech Recognition - A Deep Learning Approach, Springer, 2014
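• Dong Yu and Li Deng, translated by Kai Yu and Yanmin Qian, 《解析深度学习:语音识别实践》 (Chinese edition of Automatic Speech Recognition - A Deep Learning Approach), Publishing House of Electronics Industry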
• Hang Li, Statistical Learning Methods (《统计学习方法》), Second Edition, Tsinghua University Press
• Automatic Speech Recognition (ASR) 2018-2019 Lectures, School of Informatics, University of Edinburgh, https://www.inf.ed.ac.uk/teaching/courses/asr/lectures-2019.html
• Speech Recognition, EECS E6870 – Spring 2016, Columbia University, http://www.ee.columbia.edu/~stanchen/spring16/e6870/outline.html