准備工作
1.安裝MFA庫,參考官方文檔
2.拼音詞典可使用MFA中自帶的mandarin字典,或下載普通話詞典mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon
3.普通話模型,可使用MFA自帶的mandarin模型,或下載普通話模型,或自行訓練模型(參考官方文檔在語料庫上訓練新的聲學模型)。
4.音頻數據,該目錄下每個文件下包含.wav文件和.lab文件,.lab文件中存放的是.wav的拼音。
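Because MFA pairs each recording with its transcript by file name, it can help to check for unpaired files before running anything. The function below is a minimal sketch, not part of MFA; `find_unpaired` is a hypothetical helper that assumes the layout described in step 4 (each .wav and its .lab share a base name and a folder).

```python
import os
from collections import defaultdict


def find_unpaired(corpus_dir):
    """Return (wav files without a .lab, .lab files without a .wav) as sorted relative paths.

    Hypothetical helper: assumes each .wav sits in the same folder as a .lab
    with the same base name, as described above.
    """
    missing_lab, missing_wav = [], []
    for root, _dirs, files in os.walk(corpus_dir):
        # Collect the extensions seen for each base name in this folder
        stems = defaultdict(set)
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() in (".wav", ".lab"):
                stems[stem].add(ext.lower())
        for stem, exts in stems.items():
            rel = os.path.relpath(os.path.join(root, stem), corpus_dir)
            if ".lab" not in exts:
                missing_lab.append(rel + ".wav")
            if ".wav" not in exts:
                missing_wav.append(rel + ".lab")
    return sorted(missing_lab), sorted(missing_wav)
```

For example, `print(find_unpaired(os.path.expanduser("~/mfa_data/my_corpus/test/input/")))` prints two lists that should both be empty for a complete corpus.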
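Forced alignment of Mandarin audio
1. Generate the .lab files
Run text_pinyin.py to convert the transcript (.txt) of each audio file into a .lab file containing tone-numbered pinyin.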
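import os
import re

from pypinyin import lazy_pinyin, Style

# Corpus root: each folder holds <name>.wav and its transcript <name>.txt
root_dir = "/Users/mfa_data/my_corpus/synthesis_audio/"
pattern = re.compile(r"(.*)\.txt$")
# Punctuation to strip before conversion (ASCII and full-width); otherwise
# lazy_pinyin keeps it as separate tokens in the .lab file
punct = re.compile(r"[,，.。、;；:：!！?？]")

for root, dirs, files in os.walk(root_dir):
    for filename in files:
        match = pattern.match(filename)
        if match is None:
            continue
        print(root, filename)
        with open(os.path.join(root, filename), encoding="utf-8") as text_file:
            line = text_file.read().strip()
        line = punct.sub("", line)
        # TONE3 appends the tone digit to each syllable (zhang1);
        # neutral_tone_with_five=True writes the neutral tone as 5 (me5)
        syllables = lazy_pinyin(line, style=Style.TONE3, neutral_tone_with_five=True)
        pinyin_line = " ".join(syllables)
        print(line)
        # Write <name>.lab next to <name>.txt
        lab_path = os.path.join(root, match.group(1) + ".lab")
        with open(lab_path, "w", encoding="utf-8") as lab_file:
            lab_file.write(pinyin_line)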
A .txt file looks like this:
張大千國畫有什么
The corresponding .lab file:
zhang1 da4 qian1 guo2 hua4 you3 shen2 me5
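Stray punctuation or Latin text left in a transcript ends up as a token that is not a pinyin syllable, and MFA then treats it as an unknown word. A quick sanity check is sketched below; it assumes the token shape produced by Style.TONE3 with neutral_tone_with_five=True (lowercase letters, with ü written as "v" by pypinyin's default, followed by a tone digit 1 to 5). `bad_tokens` and `scan` are hypothetical helpers.

```python
import os
import re

# Assumed token shape: lowercase letters followed by one tone digit 1-5 (e.g. "zhang1", "lv4")
SYLLABLE = re.compile(r"[a-z]+[1-5]")


def bad_tokens(lab_text):
    """Return the tokens of a .lab line that do not look like toned pinyin syllables."""
    return [tok for tok in lab_text.split() if not SYLLABLE.fullmatch(tok)]


def scan(corpus_dir):
    """Print every .lab file under corpus_dir that contains suspicious tokens."""
    for root, _dirs, files in os.walk(corpus_dir):
        for name in files:
            if name.endswith(".lab"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as f:
                    bad = bad_tokens(f.read())
                if bad:
                    print(path, bad)
```

For the example above, `bad_tokens("zhang1 da4 qian1 guo2 hua4 you3 shen2 me5")` returns an empty list.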
2. Generate the TextGrid files
- Activate the MFA environment:
conda activate aligner
- Check that the audio data is in a format MFA accepts:
mfa validate <audio_data_dir> <mandarin_dictionary>
mfa validate ~/mfa_data/my_corpus/test/input/ ~/mfa_data/my_corpus/mandarin-for-montreal-forced-aligner-pre-trained-model.txt
- Align the audio:
mfa align <audio_data_dir> <mandarin_dictionary> <mandarin_model> <output_dir>
mfa align ~/mfa_data/my_corpus/test/input/ ~/mfa_data/my_corpus/mandarin-for-montreal-forced-aligner-pre-trained-model.txt mandarin ~/mfa_data/my_corpus/test/output
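If MFA reports an error, check the data against the error message or switch to a different model. When alignment completes, MFA writes one .TextGrid file per audio file to the output directory.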
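If you rerun these steps often, the two commands can be driven from Python. This is a sketch under assumptions: it uses only the positional arguments shown above, the paths are copied from the example commands, `mfa` must be on PATH (i.e. the `aligner` conda environment is active), and the helper names are hypothetical.

```python
import os
import subprocess

# Paths copied from the example commands above; adjust to your own corpus
CORPUS = os.path.expanduser("~/mfa_data/my_corpus/test/input/")
DICTIONARY = os.path.expanduser(
    "~/mfa_data/my_corpus/mandarin-for-montreal-forced-aligner-pre-trained-model.txt")
MODEL = "mandarin"
OUTPUT = os.path.expanduser("~/mfa_data/my_corpus/test/output")


def validate_cmd(corpus, dictionary):
    # mfa validate <audio_data_dir> <mandarin_dictionary>
    return ["mfa", "validate", corpus, dictionary]


def align_cmd(corpus, dictionary, model, output):
    # mfa align <audio_data_dir> <mandarin_dictionary> <mandarin_model> <output_dir>
    return ["mfa", "align", corpus, dictionary, model, output]


def run_all():
    # check=True raises CalledProcessError if a command exits with a non-zero
    # status, so alignment is not attempted when validate itself errors out
    subprocess.run(validate_cmd(CORPUS, DICTIONARY), check=True)
    subprocess.run(align_cmd(CORPUS, DICTIONARY, MODEL, OUTPUT), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.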
A TextGrid file looks like this (excerpt: only the first four intervals of the words tier are shown):
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 2.1875
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.1875
        intervals: size = 10
        intervals [1]:
            xmin = 0
            xmax = 0.22
            text = "zhang1"
        intervals [2]:
            xmin = 0.22
            xmax = 0.37
            text = "da4"
        intervals [3]:
            xmin = 0.37
            xmax = 0.69
            text = "qian1"
        intervals [4]:
            xmin = 0.69
            xmax = 0.84
            text = "guo2"
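The structure above (items containing tiers, tiers containing intervals with xmin, xmax and text) can be read without any third-party package. The parser below is a minimal sketch for Praat's long text format only: it collects intervals per tier name, ignores point-tier contents, and assumes labels contain no escaped quotes. `parse_textgrid` is a hypothetical helper, not the textgrid package's API.

```python
import re


def parse_textgrid(text):
    """Minimal parser for Praat's long ("ooTextFile") TextGrid format.

    Returns {tier_name: [(xmin, xmax, label), ...]} built from the intervals of each tier.
    """
    tiers = {}
    tier = None       # interval list of the tier currently being read
    interval = None   # partially read interval, or None between intervals
    for raw in text.splitlines():
        line = raw.strip()
        if re.match(r"item \[\d+\]:", line):
            # A new tier starts; its name follows on a later line
            tier, interval = None, None
        elif line.startswith("name = ") and tier is None:
            tier = tiers.setdefault(line.split('"')[1], [])
        elif re.match(r"intervals \[\d+\]:", line):
            interval = {}
        elif interval is not None:
            key, _, value = line.partition(" = ")
            if key in ("xmin", "xmax"):
                interval[key] = float(value)
            elif key == "text":
                # text is the last field of an interval, so the interval is complete
                tier.append((interval["xmin"], interval["xmax"], value.strip('"')))
                interval = None
    return tiers
```

Usage: `with open(path, encoding="utf-8") as f: tiers = parse_textgrid(f.read())`, then `tiers["words"]` holds tuples such as `(0.0, 0.22, "zhang1")`. Open the file with whatever encoding it was actually written in.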
To inspect the result, open the .wav file and its .TextGrid file together in Praat; the editor shows the waveform with the aligned tiers underneath it.
Splitting the audio into phonemes with the TextGrid
1. Run read_textgrid.py to cut each audio file into phoneme clips and save the clip boundaries to an .xls file.
import datetime
import os

import textgrid
import xlwt
from pydub import AudioSegment


def fmt_time(seconds):
    """Format a time in seconds as HH:MM:SS.ffffff."""
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc).strftime("%H:%M:%S.%f")


def read_textgrid(file_name):
    """
    The top-level size in a TextGrid file is the number of items (tiers). Each item has
    class, name, xmin, xmax and intervals entries; the item's intervals size is the number
    of intervals, and every interval has xmin, xmax and text.
    Returns [start, end, label] for each non-empty interval of the second tier
    (tiers[1]: the phones tier in MFA output; tiers[0] is the words tier).
    """
    datas = []
    tg = textgrid.TextGrid()
    tg.read(file_name)
    phone_tier = tg.tiers[1]
    for interval in phone_tier:
        # Empty labels mark silence; skip them
        if interval.mark != "":
            datas.append([interval.minTime, interval.maxTime, interval.mark])
    print(len(datas))
    return datas


def cut_wav(datas, file_name, out_dir):
    # Load the audio once, then slice it; pydub indexes in milliseconds
    sound = AudioSegment.from_file(file_name, "wav")
    stem = os.path.splitext(os.path.basename(file_name))[0]
    for start, end, text in datas:
        # Clip names start with the phoneme label, which step 2 uses for grouping
        save_name = text + "_" + stem + "_" + fmt_time(start) + "-" + fmt_time(end) + ".wav"
        print(save_name)
        phon = sound[start * 1000: end * 1000]
        phon.export(os.path.join(out_dir, save_name), format="wav")


def write_xls(datas, filename):
    workbook = xlwt.Workbook(encoding="utf-8")
    # Create a worksheet
    worksheet = workbook.add_sheet("sheet1", cell_overwrite_ok=True)
    for i, (start, end, text) in enumerate(datas):
        # Arguments: row, column, value
        worksheet.write(i, 0, text)
        worksheet.write(i, 1, fmt_time(start))
        worksheet.write(i, 2, fmt_time(end))
    workbook.save(filename)
    print(filename + " saved")


# TextGrids written by `mfa align` (its output directory) and the original recordings
textgrid_dir = r"/Users/zhangxiao/mfa_data/my_corpus/test/output/"
wav_dir = r"/Users/zhangxiao/mfa_data/my_corpus/test/input/"
# Phoneme clips are written here; step 2 reads this folder
phoneme_dir = r"/Users/zhangxiao/mfa_data/my_corpus/test/phoneme/"
os.makedirs(phoneme_dir, exist_ok=True)

for fileName in os.listdir(textgrid_dir):
    stem, ext = os.path.splitext(fileName)
    if ext == ".TextGrid":
        datas = read_textgrid(os.path.join(textgrid_dir, fileName))
        cut_wav(datas, os.path.join(wav_dir, stem + ".wav"), phoneme_dir)
        write_xls(datas, os.path.join(textgrid_dir, stem + ".xls"))
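If pydub (and the ffmpeg it may rely on) is not available, a clip can also be cut with Python's standard `wave` module. This is a swapped-in, stdlib-only alternative to the pydub slicing above, sketched under the assumption that the input is an uncompressed PCM .wav; `cut_segment` is a hypothetical helper.

```python
import wave


def cut_segment(src_path, dst_path, start, end):
    """Copy the audio between start and end (in seconds) from src_path to dst_path.

    Assumes an uncompressed PCM .wav file.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert seconds to frame indices, clamping the end to the file length
        first = round(start * rate)
        last = min(round(end * rate), src.getnframes())
        src.setpos(first)
        frames = src.readframes(last - first)
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed
        dst.setparams(params)
        dst.writeframes(frames)
```

For example, `cut_segment("a.wav", "zhang1_a.wav", 0.0, 0.22)` writes the first 0.22 s of a.wav to a new file.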
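2. Group the phoneme clips by phoneme
The script above writes every phoneme clip into a single folder; the script below moves the clips of each phoneme into a subfolder named after that phoneme.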
import os
import shutil

# Move the phoneme clips in one folder into one subfolder per phoneme
phoneme_dir = r"/Users/zhangxiao/mfa_data/my_corpus/test/phoneme/"
files_list = sorted(os.listdir(phoneme_dir))
for file in files_list:
    filename, suffix = os.path.splitext(file)
    if suffix == ".wav":
        # Clip names look like <phoneme>_<recording>_<start>-<end>.wav
        phoneme = filename.split("_")[0]
        dst = os.path.join(phoneme_dir, phoneme)
        os.makedirs(dst, exist_ok=True)
        shutil.move(os.path.join(phoneme_dir, file), dst)
print("Files grouped")
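After grouping, it is useful to see how many clips were collected for each phoneme, for instance to spot phonemes with too few samples. A minimal sketch, assuming the one-subfolder-per-phoneme layout produced above; `count_clips` is a hypothetical helper.

```python
import os


def count_clips(phoneme_dir):
    """Return {phoneme: number of .wav clips} for each per-phoneme subfolder."""
    counts = {}
    for name in sorted(os.listdir(phoneme_dir)):
        sub = os.path.join(phoneme_dir, name)
        # Only subfolders are phoneme groups; loose files are ignored
        if os.path.isdir(sub):
            counts[name] = sum(1 for f in os.listdir(sub) if f.endswith(".wav"))
    return counts
```

Sorting the result by value, e.g. `sorted(count_clips(d).items(), key=lambda kv: kv[1])`, lists the rarest phonemes first.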