[Repost] Deep Neural Networks in Kaldi


Reposted from: http://blog.csdn.net/wbgxx333/article/details/41019453

Deep neural networks are now the hottest topic in speech recognition. Since 2010, many papers on DNNs have been published in this field, and major technology companies (Google and Microsoft) have begun deploying DNNs in their production systems. (Translator's note: Google's is presumably Google Now; Microsoft's, the speech recognition in Windows 7/8 and its SDK.)

However, no toolkit has supported DNNs as well as Kaldi does. Because the state of the art is advancing constantly, the code must keep pace with it, and the code's architecture must be rethought accordingly.

Kaldi currently provides two separate DNN codebases. One lives under nnet/ and nnetbin/ in the source tree and is maintained by Karel Vesely. The other lives under nnet-cpu/ and nnet-cpubin/ and is maintained by Daniel Povey (it began as a modification of Karel's earlier code and was then rewritten). Both codebases are officially supported and will continue to be developed.

Neural-network example scripts can be found in the recipe directories, e.g. egs/wsj/s5/, egs/rm/s5, egs/swbd/s5 and egs/hkust/s5b. Karel's example scripts are local/run_dnn.sh or local/run_nnet.sh, while Dan's is local/run_nnet_cpu.sh. Before running these scripts, you must first run run.sh to build the baseline system.

Detailed documentation for both setups will be published soon. For now, the most important differences are:

1. Karel's code uses single-threaded SGD training accelerated on a GPU, while Dan's code uses multi-threaded training across multiple CPUs;

2. Karel's code supports discriminative training, while Dan's does not.

Beyond these, there are many smaller architectural differences.

We hope to add more documentation for these libraries. Karel's version has some slightly outdated documentation under "Karel's DNN training implementation".

A Chinese translation is available at: http://blog.csdn.net/wbgxx333/article/details/24438405

---------------------------------------------------------------------------------------------------------------------------------------------------

I stumbled on this last night: thanks to a contribution by Dr. Yajie Miao of CMU, we can now use deep learning modules in Kaldi. I have not yet managed to get DBNs working in HTK, and hope to do so soon. Below is the introduction to Kaldi+PDNN from Dr. Miao's homepage. I hope everyone will contribute their own efforts as well, so that we students can learn more.

 

    
Kaldi+PDNN -- Implementing DNN-based ASR Systems with Kaldi and PDNN
 
Overview
     
Kaldi+PDNN contains a set of fully-fledged Kaldi ASR recipes, which realize DNN-based acoustic modeling using the PDNN toolkit. The overall pipeline has 3 stages: 
    
1. The initial GMM model is built with the existing Kaldi recipes
      
2. DNN acoustic models are trained by PDNN
      
3. The trained DNN model is ported back to Kaldi for hybrid decoding or further tandem system building



Highlights
     
Model diversity. Deep Neural Networks (DNNs); Deep Bottleneck Features (DBNFs); Deep Convolutional Networks (DCNs)
     
PDNN toolkit. Easy and fast to implement new DNN ideas
    
Open license. All the code is released under Apache 2.0, the same license as Kaldi
    
Consistency with Kaldi. Recipes follow the Kaldi style and can be integrated seamlessly with the existing setups
     
 
Release Log  
     
Dec 2013  ---  version 1.0 (the initial release)
Feb 2014  ---  version 1.1 (clean up the scripts, add the dnn+fbank recipe run-dnn-fbank.sh, enrich PDNN) 
    
Requirements
     
1. A GPU card should be available on your computing machine.
      
2. Initial model building should be run, ideally up to train_sat and align_fmllr
     
3. Software Requirements:
     
Theano. For information about Theano installation on Ubuntu Linux, refer to this document edited by Wonkyum Lee from CMU.
pfile_utils. This script (that is, kaldi-trunk/tools/install_pfile_utils.sh) installs pfile_utils automatically. 
     
Download
   
Kaldi+PDNN is hosted on SourceForge. You can go into your Kaldi Switchboard setup (such as egs/swbd/s5b) and download the latest version via svn:
    
svn co svn://svn.code.sf.net/p/kaldipdnn/code-0/trunk/pdnn pdnn
svn co svn://svn.code.sf.net/p/kaldipdnn/code-0/trunk/steps_pdnn steps_pdnn
svn co svn://svn.code.sf.net/p/kaldipdnn/code-0/trunk/run_swbd run_swbd
ln -s run_swbd/* ./
     
Now the new run-*.sh scripts appear in your setup. You can run them directly.
    
Recipes
   
run-dnn.sh -- DNN hybrid system over fMLLR features
  Targets: context-dependent states from the SAT model exp/tri4a
  Input: spliced fMLLR features
  Network: 360:1024:1024:1024:1024:1024:${target_num}
  Pretraining: pre-training with stacked denoising autoencoders
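The 360 at the input of the network above can be sanity-checked from the feature setup. A minimal sketch, assuming the standard Kaldi configuration of 40-dimensional fMLLR features spliced with ±4 frames of context (the page states only the 360 total; the 40-dim/±4-frame split is an assumption):

```shell
# Hypothetical breakdown of the 360-dim input layer:
# 40-dim fMLLR features, spliced with +/-4 frames of context.
feat_dim=40
context=4
frames=$(( 2 * context + 1 ))        # 9 frames in the spliced window
input_dim=$(( feat_dim * frames ))
echo "$input_dim"
```

Under these assumptions the product is 40 x 9 = 360, matching the first entry of the network string.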
       
run-dnn-fbank.sh -- DNN hybrid system over filterbank features
  Targets: context-dependent states from the SAT model exp/tri4a
  Input: spliced log-scale filterbank features with cepstral mean and variance normalization
  Network: 330:1024:1024:1024:1024:1024:${target_num}
  Pretraining: pre-training with stacked denoising autoencoders
      
run-bnf-tandem.sh -- GMM tandem system over Deep Bottleneck features   [ reference paper ]
  Targets: BNF network training uses context-dependent states from the SAT model exp/tri4a
  Input: spliced fMLLR features
  BNF Network: 360:1024:1024:1024:1024:42:1024:${target_num}
  Pretraining: pre-training the prior-to-bottleneck layers (360:1024:1024:1024:1024) with stacked denoising autoencoders
      
run-bnf-dnn.sh -- DNN hybrid system over Deep Bottleneck features   [ reference paper ]
  BNF network: trained in the same manner as in run-bnf-tandem.sh
  Hybrid input: spliced BNF features
  Network: 378:1024:1024:1024:1024:${target_num}
  Pretraining: pre-training with stacked denoising autoencoders
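The 378-dimensional input of this hybrid network is consistent with splicing the 42-dimensional bottleneck output of the BNF network above (the 42 appears in the BNF topology; the ±4-frame splicing window is an assumption, since the page gives only the total):

```shell
# Hypothetical breakdown of the 378-dim hybrid input:
# 42-dim bottleneck features, spliced with +/-4 frames of context.
bnf_dim=42
context=4
input_dim=$(( bnf_dim * (2 * context + 1) ))
echo "$input_dim"
```

42 x 9 = 378, matching the first entry of the network string.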
     
run-cnn.sh -- Hybrid system based on deep convolutional networks (DCNs)   [ reference paper ]
  The CNN recipe is not yet stable and needs more investigation.
  Targets: context-dependent states from the SAT model exp/tri4a
  Input: spliced log-scale filterbank features with cepstral mean and variance normalization; each frame is taken as an input feature map
  Network: two convolution layers followed by three fully-connected layers. See this page for how to configure the network structure.
  Pretraining: no pre-training is performed for DCNs

Experiments & Results
    
The recipes are developed on top of the Kaldi 110-hour Switchboard setup, the standard system you get by running egs/swbd/s5b/run.sh. Our experiments follow similar configurations to those described in this paper. We use the following data partitions; the "validation" set is used to measure frame accuracy and determine termination in DNN fine-tuning.
     
training -- train_100k_nodup (110 hours)         validation -- train_dev_nodup        testing -- eval2000 (HUB5'00)

Recipe               WER% on HUB5'00-SWB    WER% on HUB5'00
run-dnn.sh                  19.3                  25.7
run-dnn-fbank.sh            21.4                  28.4
run-bnf-tandem.sh           TBA                   TBA
run-bnf-dnn.sh              TBA                   TBA
run-cnn.sh                  TBA                   TBA

Our hybrid recipe run-dnn.sh gives WERs comparable to this paper (Table 5, fMLLR features). We are therefore confident that our recipes perform comparably with the Kaldi internal DNN setups.

Want to Contribute?
  
We look forward to your contributions. Improvements can be made in the following areas (among others):
    
1. Optimization to the above recipes
2. New recipes
3. Porting the recipes to other datasets
4. Experiments and results
5. Contributions to the PDNN toolkit
 
Contact Yajie Miao (ymiao@cs.cmu.edu) if you have any questions or suggestions.

 

 

 

That is Dr. Miao's introduction. For details, see: http://www.cs.cmu.edu/~ymiao/kaldipdnn.html.

 

It is somewhat involved; I will look into it more deeply when I have time.

 

