LAS: Listen, Attend and Spell (Google). Idea: a sequence-to-sequence approach; the model is split into an encoder and a dec ...
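Paper: SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Idea: SpecAugment is a data augmentation method applied at the log mel spectrogram level. It turns the overfitting problem in model training into an underfitting problem, which can then be mitigated with larger networks and longer training schedules, improving speech recognition performance. Model: input features: log mel spectrogram. Spectrogram augmentation: the log ...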
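As a rough illustration of augmentation at the log mel spectrogram level, the sketch below applies the frequency masking and time masking described in the SpecAugment paper to a (frames, mel bins) array. The function name `spec_augment`, its parameters and default values, and the fill value of 0.0 are assumptions made for this example and do not come from the post. The paper's time warping step is omitted here.

```python
import numpy as np


def spec_augment(log_mel, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Return a masked copy of a (num_frames, num_bins) log mel spectrogram.

    Hypothetical helper: names, defaults and the 0.0 fill value are
    illustrative assumptions, not values taken from the original post.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(log_mel, dtype=float)  # copy; the input is left untouched
    num_frames, num_bins = out.shape

    # Frequency masking: blank a random band of consecutive mel bins.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), num_bins)
        start = int(rng.integers(0, num_bins - width + 1))
        out[:, start:start + width] = 0.0

    # Time masking: blank a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), num_frames)
        start = int(rng.integers(0, num_frames - width + 1))
        out[start:start + width, :] = 0.0

    return out
```

Filling with 0.0 matches the mean only if the features were normalized to zero mean beforehand; with unnormalized features, filling with the utterance mean may be the better choice.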
2020-09-16 23:09 0 1603 Recommended:
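Paper: Deep-FSMN for Large Vocabulary Continuous Speech Recognition. Idea: large-vocabulary speech recognition often calls for deeper network structures, but when an FSMN [1] or cFSMN [2] is made very deep it easily runs into vanishing and exploding gradients ...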
Paper: SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION ...
Paper: EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING ...
Paper: CTC: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Idea: speech recognition generally involves speech ...
Paper: A time delay neural network architecture for efficient modeling of long temporal contexts ...
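the temporal length, which improves computational efficiency when training on large-scale speech data; 2) a decoder input sampling strategy: if, during training, ...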
Paper: TRANSFORMER-TRANSDUCER: END-TO-END SPEECH RECOGNITION WITH SELF-ATTENTION. Idea: 1) building on the strengths of RNN-T in speech recognition, the RNN structures in RNN-T are replaced with Transformers to achieve ...