Data preparation
Audio data
Create your own dataset:
10 different speakers
each speaker says 10 utterances
each utterance contains 3 words
300 words in total (the digits 0 to 9)
Task
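In kaldi-trunk/egs/digits create a folder digits_audio, and inside digits_audio create two folders, train and test.
Name each speaker's folder after the speaker ID and store that speaker's recordings in it. Pick 1 speaker's data as test data and use the other 9 speakers as training data.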
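If your recordings already sit in one folder per speaker, a small script can do the train/test split for you. This is only a sketch: it assumes a hypothetical staging folder (raw_dir) holding one subfolder per speaker ID, and split_speakers is an illustrative helper, not part of Kaldi.

```python
import shutil
from pathlib import Path


def split_speakers(raw_dir: Path, digits_audio: Path, test_speaker: str) -> None:
    """Copy each speaker folder from raw_dir into digits_audio/train or digits_audio/test.

    raw_dir is a hypothetical staging folder with one subfolder per speaker ID
    (e.g. raw_dir/dad/4_4_2.wav); test_speaker is the single held-out speaker.
    """
    speakers = sorted(p.name for p in raw_dir.iterdir() if p.is_dir())
    if test_speaker not in speakers:
        raise ValueError(f"unknown speaker: {test_speaker}")
    for spk in speakers:
        subset = "test" if spk == test_speaker else "train"
        dest = digits_audio / subset / spk
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Keep the speaker ID as the folder name; the data files below rely on it.
        shutil.copytree(raw_dir / spk, dest, dirs_exist_ok=True)
```

For example, split_speakers(Path("raw"), Path("kaldi-trunk/egs/digits/digits_audio"), "july") would put july under test and the other nine speakers under train.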
Acoustic data
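Now create some text files that connect your audio data to Kaldi. Each file contains many lines, and these lines must be sorted. If you run into sorting issues, use the Kaldi scripts for checking (utils/validate_data_dir.sh) and fixing (utils/fix_data_dir.sh) the data order. Note that the utils folder has to be added to your project directory (see Tools attachment).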
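Kaldi expects these files in C-locale order, i.e. the order that `LC_ALL=C sort` produces, which compares raw bytes. The sketch below shows what that means for a single file; it assumes plain UTF-8 text files, the helper names are hypothetical, and it only checks ordering, so it does not replace utils/validate_data_dir.sh or utils/fix_data_dir.sh, which check more than that.

```python
from pathlib import Path
from typing import List, Optional


def c_locale_key(line: str) -> bytes:
    # LC_ALL=C sort compares raw bytes; UTF-8 bytes sort in code-point order.
    return line.encode("utf-8")


def first_unsorted_line(lines: List[str]) -> Optional[int]:
    """Return the 1-based number of the first out-of-order line, or None if sorted."""
    for i in range(1, len(lines)):
        if c_locale_key(lines[i - 1]) > c_locale_key(lines[i]):
            return i + 1
    return None


def sort_data_file(path: Path) -> None:
    """Rewrite a data file such as data/train/utt2spk in C-locale line order."""
    lines = path.read_text(encoding="utf-8").splitlines()
    lines.sort(key=c_locale_key)
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```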
Task
In kaldi-trunk/egs/digits, create a data folder, and inside data create two subfolders, test and train. Create the following files in each of them.
a.) spk2gender
Pattern: <speakerID> <gender>
cristine f
dad m
josh m
july f
# and so on...
b.) wav.scp
Pattern: <utteranceID> <full_path_to_audio_file>
dad_4_4_2 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/dad/4_4_2.wav
july_1_2_5 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/1_2_5.wav
july_6_8_3 /home/{user}/kaldi-trunk/egs/digits/digits_audio/train/july/6_8_3.wav
# and so on...
c.) text
Pattern: <utteranceID> <text_transcription>
dad_4_4_2 four four two
july_1_2_5 one two five
july_6_8_3 six eight three
# and so on...
d.) utt2spk
Pattern: <utteranceID> <speakerID>
dad_4_4_2 dad
july_1_2_5 july
july_6_8_3 july
# and so on...
e.) corpus.txt
Pattern: <text_transcription>
one two five
six eight three
four four two
# and so on...
Each audio file holds one utterance of 3 digits, so 100 utterances give 100 lines.
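Since the file names already spell out the digits, the four per-subset files can be generated instead of typed by hand. This is a minimal sketch assuming the layout described above (digits_audio/<subset>/<speakerID>/<d>_<d>_<d>.wav) and a hand-written speaker-to-gender mapping; write_data_files and transcriptions are hypothetical helpers. Utterance IDs are built with the speaker ID as a prefix (dad_4_4_2), which keeps utt2spk sorted consistently with the speaker order.

```python
from pathlib import Path
from typing import Dict, List

# Word for each digit; the file name 4_4_2.wav is read as "four four two".
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def _write_sorted(path: Path, lines: List[str]) -> None:
    # Kaldi expects C-locale (byte-order) sorting, like LC_ALL=C sort.
    lines = sorted(lines, key=lambda s: s.encode("utf-8"))
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


def write_data_files(audio_dir: Path, data_dir: Path, genders: Dict[str, str]) -> None:
    """Write spk2gender, wav.scp, text and utt2spk for one subset (train or test).

    Assumes audio_dir/<speakerID>/<d>_<d>_<d>.wav, i.e. the file name spells the digits.
    """
    spk2gender, wav_scp, text, utt2spk = [], [], [], []
    for spk_dir in sorted(p for p in audio_dir.iterdir() if p.is_dir()):
        spk = spk_dir.name
        spk2gender.append(f"{spk} {genders[spk]}")
        for wav in sorted(spk_dir.glob("*.wav")):
            utt = f"{spk}_{wav.stem}"  # speaker ID as prefix, e.g. dad_4_4_2
            words = " ".join(DIGIT_WORDS[int(d)] for d in wav.stem.split("_"))
            wav_scp.append(f"{utt} {wav.resolve()}")  # full path to the audio file
            text.append(f"{utt} {words}")
            utt2spk.append(f"{utt} {spk}")
    data_dir.mkdir(parents=True, exist_ok=True)
    _write_sorted(data_dir / "spk2gender", spk2gender)
    _write_sorted(data_dir / "wav.scp", wav_scp)
    _write_sorted(data_dir / "text", text)
    _write_sorted(data_dir / "utt2spk", utt2spk)


def transcriptions(data_dir: Path) -> List[str]:
    """Drop the utterance ID from each line of data_dir/text (one corpus.txt line each)."""
    lines = (data_dir / "text").read_text(encoding="utf-8").splitlines()
    return [line.split(" ", 1)[1] for line in lines if line.strip()]
```

Under these assumptions, corpus.txt is simply transcriptions(train_dir) followed by transcriptions(test_dir), one per line, which gives the 100 lines mentioned above.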
Language data
Task
In kaldi-trunk/egs/digits/data/local, create a dict folder and put the following files in it.
a.) lexicon.txt
The pronunciation dictionary: every word with its 'phone transcriptions' (taken from /egs/voxforge).
Pattern: <word> <phone 1> <phone 2> ...
!SIL sil
<UNK> spn
eight ey t
five f ay v
four f ao r
nine n ay n
one hh w ah n
one w ah n
seven s eh v ah n
six s ih k s
three th r iy
two t uw
zero z ih r ow
zero z iy r ow
b.) nonsilence_phones.txt
This file lists nonsilence phones that are present in your project.
Pattern: <phone>
ah
ao
ay
eh
ey
f
hh
ih
iy
k
n
ow
r
s
t
th
uw
w
v
z
c.) silence_phones.txt
This file lists silence phones.
Pattern: <phone>
sil
spn
d.) optional_silence.txt
This file lists optional silence phones.
Pattern: <phone>
sil
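The nonsilence phone list follows directly from lexicon.txt: it is every phone used there except the silence phones. A minimal sketch, assuming the silence phones and optional silence listed above; the helper names are hypothetical. It writes the phones in sorted order, so its nonsilence_phones.txt differs from the listing above only in the order of w and v.

```python
from pathlib import Path
from typing import List, Set

SILENCE_PHONES = ["sil", "spn"]  # as in silence_phones.txt above
OPTIONAL_SILENCE = "sil"         # as in optional_silence.txt above


def nonsilence_phones(lexicon_lines: List[str]) -> List[str]:
    """Collect every phone used in lexicon.txt that is not a silence phone."""
    phones: Set[str] = set()
    for line in lexicon_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones - set(SILENCE_PHONES))


def write_dict_files(dict_dir: Path) -> None:
    """Derive the three phone lists from dict_dir/lexicon.txt."""
    lexicon = (dict_dir / "lexicon.txt").read_text(encoding="utf-8").splitlines()
    (dict_dir / "nonsilence_phones.txt").write_text(
        "".join(p + "\n" for p in nonsilence_phones(lexicon)), encoding="utf-8")
    (dict_dir / "silence_phones.txt").write_text(
        "".join(p + "\n" for p in SILENCE_PHONES), encoding="utf-8")
    (dict_dir / "optional_silence.txt").write_text(OPTIONAL_SILENCE + "\n", encoding="utf-8")
```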
Project finalization
Tools attachment
Task
From kaldi-trunk/egs/wsj/s5 copy two folders, utils and steps (with their whole content), and put them in your kaldi-trunk/egs/digits directory. You can also create links to these directories instead; you will find such links in, for example, kaldi-trunk/egs/voxforge/s5.
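If you prefer links over copies, relative symbolic links keep working when the whole kaldi-trunk tree is moved. A minimal sketch, assuming the digits project sits directly in egs/digits as above; link_tools is a hypothetical helper, and running ln -s by hand achieves the same thing.

```python
import os
from pathlib import Path


def link_tools(digits_dir: Path, wsj_s5: Path) -> None:
    """Create digits_dir/utils and digits_dir/steps as relative symlinks into wsj/s5."""
    for name in ("utils", "steps"):
        link = digits_dir / name
        if link.exists() or link.is_symlink():
            continue  # leave an existing copy or link untouched
        # e.g. egs/digits/utils -> ../wsj/s5/utils
        target = os.path.relpath(wsj_s5 / name, start=digits_dir)
        os.symlink(target, link, target_is_directory=True)
```

For example, link_tools(Path("kaldi-trunk/egs/digits"), Path("kaldi-trunk/egs/wsj/s5")).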
Scoring script
This script will help you to get decoding results.
SRILM installation
You also need to install the language modelling toolkit used in this example, the SRI Language Modeling Toolkit (SRILM).
Task
For detailed installation instructions, see kaldi-trunk/tools/install_srilm.sh (read all the comments inside).
Configuration files
Configuration files are not strictly necessary, but creating them is a good habit for the future.
Task
In kaldi-trunk/egs/digits create a folder conf. Inside kaldi-trunk/egs/digits/conf create two files that adjust the decoding and MFCC feature extraction settings (taken from /egs/voxforge):
a.) decode.config
first_beam=10.0
beam=13.0
lattice_beam=6.0
b.) mfcc.conf
--use-energy=false
Running scripts creation
Two training methods are used: 1. monophone training, 2. simple triphone training. The decoding results will show the difference between them.
Task
In the kaldi-trunk/egs/digits directory create 3 scripts:
a.) cmd.sh
Runs jobs on the local machine.

b.) path.sh
Sets up the paths.

c.) run.sh

Getting results
Now all you have to do is run the run.sh script.
Then go to the newly created kaldi-trunk/egs/digits/exp directory. There you will find folders with mono and tri1 results; both have the same directory structure.
Go to the mono/decode directory. Here you will find the result files, named wer_{number} (one file per language-model weight tried during scoring).
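Each wer_{number} file reports the word error rate (WER) on the test data: the number of substituted, deleted and inserted words divided by the number of words in the reference transcriptions. The sketch below only illustrates that calculation with a word-level edit distance; it is not Kaldi's scoring code (Kaldi uses its own compute-wer tool for this), and word_errors and wer are hypothetical names.

```python
from typing import List, Tuple


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref into hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or a match
        prev = cur
    return prev[-1]


def wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus WER in percent over (reference, hypothesis) transcription pairs."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("no reference words")
    return 100.0 * errors / words
```

For example, with the references "four four two" and "one two five" decoded as "four two" and "one two nine", there is one deletion and one substitution over 6 reference words, so wer(...) gives 2/6, about 33.3%.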
Summary
This is just an example. The point of this short tutorial is to show you how to create 'anything' in Kaldi and to give you a better understanding of how to think while using this toolkit. Personally, I started by looking for tutorials made by the Kaldi authors/developers. After a successful Kaldi installation I launched some example scripts (Yesno, Voxforge, LibriSpeech: they are relatively easy and have free acoustic/language data to download; I used these three as a base for my own scripts).
Make sure you follow http://kaldi-asr.org/, the official project website. It has two very useful sections for beginners:
a.) Kaldi tutorial: an almost 'step by step' tutorial on how to set up an ASR system; up to a point, this can be done without the RM dataset. It is well worth reading.
b.) Data preparation: a very detailed explanation of how to use your own data in Kaldi.
Other useful links about Kaldi that I found:
https://sites.google.com/site/dpovey/kaldi-lectures - Kaldi lectures created by the main author
http://www.superlectures.com/icassp2011/category.php?lang=en&id=131 - similar; video version
http://www.diplomovaprace.cz/133/thesis_oplatek.pdf - a master's thesis about speech recognition using Kaldi
This is all from my side. Good luck!