The pyttsx Chinese speech problem and the road to tracking it down

Reposted article, 2017-05-18 18:53. Tags: pyttsx / python / encoding conversion

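While learning pyttsx recently, I found that Chinese text was always read out wrong. Judging by the sound, it looked like a character-encoding problem, but searching turned up no solution. I worked it out on my own. The root cause turned out to be laughably simple, and experts would probably have seen through it at a glance, but it is still worth writing down. The point is less the fix than the journey of finding it.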

1. Versions (talking about encoding and decoding in Python 2 without naming versions is just fooling around)

    python: 2.7

    pyttsx: 1.2

    OS: Windows 10, Chinese edition

2. Character encodings reported by the system

    sys.getdefaultencoding()         ascii

    sys.getfilesystemencoding()      mbcs

    locale.getdefaultlocale()        ('zh_CN', 'cp936')

    locale.getpreferredencoding()    cp936

    sys.stdin.encoding               UTF-8

    sys.stdout.encoding              UTF-8
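
The values above can be collected with a few standard-library calls. The following is a minimal sketch written for Python 3 (the post itself uses Python 2.7); the helper name `report_encodings` is made up for illustration, and `locale.getdefaultlocale()` is left out to keep the sketch short.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python reports for this interpreter and system."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding()": locale.getpreferredencoding(False),
        # stdin/stdout can be replaced or missing (e.g. under some IDEs or services)
        "sys.stdin.encoding": getattr(sys.stdin, "encoding", None),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print("%-32s %s" % (name, value))
```

Running this in the same environment as the failing script (same IDE or console) matters, because the stdin and stdout encodings depend on where the interpreter is started.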

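3. The investigation

(1) First attempt:

Following the documentation at http://pyttsx.readthedocs.io/en/latest/engine.html, I passed in Chinese text as a unicode object from a UTF-8 encoded source file. What came out of the speakers was not the text I had put in.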

#-*- coding: UTF-8 -*-
import sys
import pyttsx

reload(sys)
sys.setdefaultencoding("utf-8")

text = u'你好，中文測試'
engine = pyttsx.init()
engine.say(text)
engine.runAndWait()

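(2) Second attempt:

Perhaps the problem is a conversion inside pyttsx? I changed the argument to a str, still UTF-8 encoded, but the pronunciation was just as wrong as before.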

#-*- coding: UTF-8 -*-
import sys
import pyttsx

reload(sys)
sys.setdefaultencoding("utf-8")

text = '你好，中文測試'
engine = pyttsx.init()
engine.say(text)
engine.runAndWait()

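(3) Confusion:

I took turns with Google and Baidu and found nobody reporting a similar problem. Could it be the default voice? Let's read the property and see.

    voice = engine.getProperty('voice')

The voice returned by this statement is TTS_MS_ZH-CN_HUIHUI_11.0. Under Control Panel > Speech Recognition, Huihui is listed as a Chinese voice, so that is not the cause.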
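
If the default voice is in doubt, the installed voices can also be listed and one chosen explicitly. A minimal sketch: `getProperty('voices')` and `setProperty('voice', ...)` are part of the pyttsx engine API, while the helper name `find_voice_id` and the substring match on the voice id and name are assumptions made for illustration.

```python
def find_voice_id(engine, keyword):
    """Return the id of the first installed voice whose id or name contains keyword.

    `engine` is expected to behave like the object returned by pyttsx.init().
    Returns None when no voice matches.
    """
    keyword = keyword.lower()
    for voice in engine.getProperty('voices'):
        haystack = "%s %s" % (voice.id, voice.name)
        if keyword in haystack.lower():
            return voice.id
    return None


# Usage on a machine with pyttsx installed (not executed here):
#   import pyttsx
#   engine = pyttsx.init()
#   voice_id = find_voice_id(engine, 'ZH-CN')
#   if voice_id is not None:
#       engine.setProperty('voice', voice_id)
```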

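(4) Deeper and deeper, and lost:

Since the system is fine, let's see how the pyttsx source is written. That is the nice thing about open source: when something is wrong, you can go straight to the code.

In the __init__ function in pyttsx\driver.py, you can see that sapi5 is the default driver on Windows: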

    def __init__(self, engine, driverName, debug):
        '''
        Constructor.

        @param engine: Reference to the engine that owns the driver
        @type engine: L{engine.Engine}
        @param driverName: Name of the driver module to use under drivers/ or
            None to select the default for the platform
        @type driverName: str
        @param debug: Debugging output enabled or not
        @type debug: bool
        '''
        if driverName is None:
            # pick default driver for common platforms
            if sys.platform == 'darwin':
                driverName = 'nsss'
            elif sys.platform == 'win32':
                driverName = 'sapi5'
            else:
                driverName = 'espeak'
        # import driver module
        name = 'pyttsx.drivers.%s' % driverName
        self._module = importlib.import_module(name)
        # build driver instance
        self._driver = self._module.buildDriver(weakref.proxy(self))
        # initialize refs
        self._engine = engine
        self._queue = []
        self._busy = True
        self._name = None
        self._iterator = None
        self._debug = debug

In pyttsx\drivers\sapi5.py there is the say function. When it calls Speak, the text goes through a toUtf8 conversion. So is what finally gets passed in UTF-8 encoded?

    def say(self, text):
        self._proxy.setBusy(True)
        self._proxy.notify('started-utterance')
        self._speaking = True
        self._tts.Speak(toUtf8(text), 19)

Digging further, toUtf8 is defined in pyttsx\drivers\__init__.py, and its body is a single line:

def toUtf8(value):
    '''
    Takes in a value and converts it to a text (unicode) type.  Then decodes that
    type to a byte array encoded in utf-8.  In 2.X the resulting object will be a
    str and in 3.X the resulting object will be bytes.  In both 2.X and 3.X any
    object can be passed in and the object's __str__ will be used (or __repr__ if
    __str__ is not defined) if the object is not already a text type.
    '''
    return six.text_type(value).encode('utf-8')

Going deeper still, text_type is defined in pyttsx\six.py:

# Useful for very coarse version differentiation.
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str,
    integer_types = int,
    class_types = type,
    text_type = str
    binary_type = bytes

    MAXSIZE = sys.maxsize
else:
    string_types = basestring,
    integer_types = (int, long)
    class_types = (type, types.ClassType)
    text_type = unicode
    binary_type = str

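As you can see, under PY2 text_type is unicode, and that is the bottom of the chain. According to the code, the text really does end up converted to UTF-8. But what I passed in was UTF-8 in the first place!

The problem was still unsolved, and I was more lost than before...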

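(5) A turn in the road

Since pyttsx seemed fine, could the problem be in sapi5? I switched to searching for sapi5.

sapi5 (the Microsoft Speech API) is the speech API provided by Microsoft. Windows 10 ships the latest version, 5.4. What say in pyttsx finally calls is its ISpVoice::Speak interface, which is documented in detail on MSDN (https://msdn.microsoft.com/en-us/library/ee125024(v=vs.85).aspx).

According to MSDN, the input can be a string, a file name, or XML. The input type is LPCWSTR, a pointer to a Unicode string.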

ISpVoice::Speak speaks the contents of a text string or file.

HRESULT Speak(
   LPCWSTR       *pwcs,
   DWORD          dwFlags,
   ULONG         *pulStreamNumber
);
Parameters

pwcs
[in, string] Pointer to the null-terminated text string (possibly containing XML markup) to be synthesized. This value can be NULL when dwFlags is set to SPF_PURGEBEFORESPEAK indicating that any remaining data to be synthesized should be discarded. If dwFlags is set to SPF_IS_FILENAME, this value should point to a null-terminated, fully qualified path to a file.
dwFlags
[in] Flags used to control the rendering process for this call. The flag values are contained in the SPEAKFLAGS enumeration.
pulStreamNumber
[out] Pointer to a ULONG which receives the current input stream number associated with this Speak request. Each time a string is spoken, an associated stream number is returned. Events queued back to the application related to this string will contain this number. If NULL, no value is passed back.

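That does not seem to tell us much. Windows projects in Visual Studio, though, can be built for either of two character-set models: Unicode or MBCS (multi-byte character set). The UTF-8 input is clearly being read incorrectly, which suggests that somewhere on the way to the COM component the bytes are not being interpreted as Unicode text, so it seemed worth looking at MBCS instead. On a Chinese edition of Windows, the MBCS code page is essentially GBK. Hmm, perhaps I should try GBK?

First I verified this another way. Borrowing code found online, I wrote a small TTS script in JavaScript. The core reading part is simple: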

function j2()
{
    var fso=new ActiveXObject("SAPI.SpVoice");
    fso.Speak(arr[i]);
    i=i+1;
    setTimeout('j1()',100);
    return i;
}

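The result: when the text file was saved as UTF-8, what was read aloud matched the Python version; when the text file was saved as "Simplified Chinese", it was read correctly. And when a typical text editor offers "Simplified Chinese", the encoding it uses is GBK.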
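
One way to see why the two files behave differently is to look at the bytes. The sketch below (Python 3, standard library only) encodes the same text as UTF-8 and as GBK, then decodes the UTF-8 bytes as if they were GBK, which is roughly what happens when a UTF-8 byte string is handed to a layer that assumes the system code page. That this is the exact mechanism inside SAPI or the COM bridge is an assumption, not something the post verifies; the helper name `misread_as_gbk` is hypothetical.

```python
def misread_as_gbk(text):
    """Encode text as UTF-8, then decode those bytes as if they were GBK."""
    utf8_bytes = text.encode('utf-8')
    # errors='replace' keeps the demo from raising on bytes GBK cannot map
    return utf8_bytes.decode('gbk', errors='replace')


if __name__ == "__main__":
    text = u'你好，中文測試'
    utf8_bytes = text.encode('utf-8')   # what a UTF-8 file would contain
    gbk_bytes = text.encode('gbk')      # what a GBK file would contain
    print(len(utf8_bytes), len(gbk_bytes))   # 21 14
    print(misread_as_gbk(text) == text)      # False
```

The same seven characters are 21 bytes in UTF-8 and 14 bytes in GBK, so a reader that assumes GBK pairs up the UTF-8 bytes differently and ends up with unrelated characters.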

(6) Dawn breaks

I changed the code again, declaring the file encoding as gb18030, and ran it. The result was still wrong...

#-*- coding: gb18030  -*-
import sys
import pyttsx
import chardet

reload(sys)
sys.setdefaultencoding("gb18030")

text = '你好，中文測試'
engine = pyttsx.init()
engine.say(text)
engine.runAndWait()

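Well, attentive readers have already guessed it. Fine, I admit my memory is bad!

As found earlier while digging through pyttsx, the text actually goes through the following conversion chain:

input str (gb18030) --> unicode (decoded with the system default encoding, which I had set to "gb18030") --> str (utf8)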
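
The chain can be replayed with explicit byte strings. The sketch below uses Python 3, where `bytes` plays the role of Python 2's `str` and `str` the role of `unicode`; the function name `replay_chain` and its parameters are made up for illustration. It suggests that whatever encoding the input bytes are in, as long as the default encoding matches, what finally comes out is UTF-8.

```python
def replay_chain(raw, default_encoding):
    """Mimic toUtf8 on a Python 2 str: decode with the default encoding, re-encode as UTF-8."""
    as_unicode = raw.decode(default_encoding)   # unicode(value) in Python 2
    return as_unicode.encode('utf-8')           # .encode('utf-8') in toUtf8


if __name__ == "__main__":
    text = u'你好，中文測試'
    from_gb = replay_chain(text.encode('gb18030'), 'gb18030')
    from_utf8 = replay_chain(text.encode('utf-8'), 'utf-8')
    print(from_gb == from_utf8 == text.encode('utf-8'))  # True
```

So changing the source file encoding alone cannot change what reaches Speak in this version of the driver.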

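So to use GBK, the only option seemed to be changing the pyttsx code. Could the pyttsx source itself be wrong? I went to GitHub to look, and... well, all I can do is cover my face. See the code for yourselves.

The say function in sapi5.py: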

    def say(self, text):
        self._proxy.setBusy(True)
        self._proxy.notify('started-utterance')
        self._speaking = True
        self._tts.Speak(str(text), 19)

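The last line, the Speak call, now uses str instead of toUtf8, and the change dates from 16/5. So what on earth was the version I had downloaded? I downloaded the code again from GitHub, installed the latest version, ran it, and it worked.

So it was my own mistake. Time to reflect.

Hold on, hold on. I seem to have downloaded from GitHub the first time as well. Opening Chrome's download history:

First time: https://codeload.github.com/westonpace/pyttsx/zip/master

Second time: https://codeload.github.com/RapidWareTech/pyttsx/zip/master

A case of Li Kui meeting Li Gui, the real thing running into its impostor? Reopening the page of the first download, https://github.com/westonpace/pyttsx, I suddenly noticed: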

westonpace/pyttsx

forked from RapidWareTech/pyttsx

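Oh, so I was careless after all. Everyone, be sure to use the genuine article! One more gripe: https://pypi.python.org/pypi/pyttsx is supposedly the designated download location, yet it is still at version 1.1.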
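
A quick way to check what is actually installed, before blaming the library, is to ask the packaging metadata. This is a sketch for Python 3.8 or later, where `importlib.metadata` is in the standard library (the Python 2.7 used in this post does not have it); the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version('pyttsx'))
```

This reports only the version string, so it cannot tell a fork from the upstream project when both install under the same package name; for that, the download URL still has to be checked by hand.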

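(7) Review

The problem was solved, but a doubt remained: is Chinese supported only through GBK (or rather the MBCS family)? Working backwards from the result, that seems obvious, but it is still worth investigating.

As we know, Python cannot call COM components directly; pywin32 has to be installed first. After installation, pywin32 creates pythonwin, pywin32_system32 (I use the 32-bit build), win32, win32com, win32comext, adodbapi and other packages under /Lib/site-packages/. The one mainly concerned with COM components is win32com.

Under /win32com/client there is makepy.py, which generates Python files for the requested COM components on demand. The generated files go into /win32com/gen_py/, and sure enough, in /win32com/gen_py/ we find

the file C866CA3A-32F7-11D2-9602-00C04F8EE628x0x5x4.py, where C866CA3A-32F7-11D2-9602-00C04F8EE628 is the GUID (genpy calls it the clsid) and x0x5x4 carries the LCID and the major and minor version numbers.

(Note: this can be seen in genpy.py: self.base_mod_name = "win32com.gen_py." + str(clsid)[1:-1] + "x%sx%sx%s" % (lcid, major, minor))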
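
The file name can be reproduced from the expression quoted from genpy.py. A small sketch, with the function name `gen_py_file_name` made up for illustration; the format string is the one quoted above, and the GUID is passed as a plain string including its braces.

```python
def gen_py_file_name(clsid, lcid, major, minor):
    """Build the gen_py file name the same way genpy.py builds base_mod_name."""
    # str(clsid)[1:-1] strips the surrounding braces from the GUID
    return str(clsid)[1:-1] + "x%sx%sx%s" % (lcid, major, minor) + ".py"


if __name__ == "__main__":
    print(gen_py_file_name('{C866CA3A-32F7-11D2-9602-00C04F8EE628}', 0, 5, 4))
    # C866CA3A-32F7-11D2-9602-00C04F8EE628x0x5x4.py
```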

# -*- coding: mbcs -*-
# Created by makepy.py version 0.5.01
# By python version 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
# From type library '{C866CA3A-32F7-11D2-9602-00C04F8EE628}'
# On Thu May 11 15:31:19 2017
'Microsoft Speech Object Library'
makepy_version = '0.5.01'
python_version = 0x2070af0

import win32com.client.CLSIDToClass, pythoncom, pywintypes
import win32com.client.util
from pywintypes import IID
from win32com.client import Dispatch

# The following 3 lines may need tweaking for the particular server
# Candidates are pythoncom.Missing, .Empty and .ArgNotFound
defaultNamedOptArg=pythoncom.Empty
defaultNamedNotOptArg=pythoncom.Empty
defaultUnnamedArg=pythoncom.Empty

CLSID = IID('{C866CA3A-32F7-11D2-9602-00C04F8EE628}')

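The coding declared on the first line is mbcs. Where does that come from?

In genpy.py, the encoding written into the file is taken from the encoding of the file object that is passed in.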

  def do_gen_file_header(self):
    la = self.typelib.GetLibAttr()
    moduleDoc = self.typelib.GetDocumentation(-1)
    docDesc = ""
    if moduleDoc[1]:
      docDesc = moduleDoc[1]

    # Reset all the 'per file' state
    self.bHaveWrittenDispatchBaseClass = 0
    self.bHaveWrittenCoClassBaseClass = 0
    self.bHaveWrittenEventBaseClass = 0
    # You must provide a file correctly configured for writing unicode.
    # We assert this is it may indicate somewhere in pywin32 that needs
    # upgrading.
    assert self.file.encoding, self.file
    encoding = self.file.encoding # or "mbcs"

    print >> self.file, '# -*- coding: %s -*-' % (encoding,)
    print >> self.file, '# Created by makepy.py version %s' % (makepy_version,)
    print >> self.file, '# By python version %s' % \
                        (sys.version.replace("\n", "-"),)
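
The effect of that header logic can be tried in isolation: whatever encoding the output file was opened with ends up in the coding line. A minimal sketch in Python 3; `write_coding_header` is a made-up name, and `gbk` stands in for `mbcs`, which exists only on Windows (on the Chinese Windows system described in this post the locale reports cp936).

```python
import os
import tempfile


def write_coding_header(f):
    """Write a PEP 263 coding line that names the encoding the file was opened with."""
    assert f.encoding, f
    f.write('# -*- coding: %s -*-\n' % (f.encoding,))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), 'generated.py')
    with open(path, 'w', encoding='gbk') as f:
        write_coding_header(f)
    with open(path, encoding='gbk') as f:
        print(f.readline().strip())  # # -*- coding: gbk -*-
```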

Tracing back through the code, this file object comes from the main function of makepy.py:

def main():
    import getopt
    hiddenSpec = 1
    outputName = None
    verboseLevel = 1
    doit = 1
    bForDemand = bForDemandDefault
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'vo:huiqd')
        for o,v in opts:
            if o=='-h':
                hiddenSpec = 0
            elif o=='-o':
                outputName = v
            elif o=='-v':
                verboseLevel = verboseLevel + 1
            elif o=='-q':
                verboseLevel = verboseLevel - 1
            elif o=='-i':
                if len(args)==0:
                    ShowInfo(None)
                else:
                    for arg in args:
                        ShowInfo(arg)
                doit = 0
            elif o=='-d':
                bForDemand = not bForDemand

    except (getopt.error, error), msg:
        sys.stderr.write (str(msg) + "\n")
        usage()

    if bForDemand and outputName is not None:
        sys.stderr.write("Can not use -d and -o together\n")
        usage()

    if not doit:
        return 0
    if len(args)==0:
        rc = selecttlb.SelectTlb()
        if rc is None:
            sys.exit(1)
        args = [ rc ]

    if outputName is not None:
        path = os.path.dirname(outputName)
        if path is not '' and not os.path.exists(path):
            os.makedirs(path)
        if sys.version_info > (3,0):
            f = open(outputName, "wt", encoding="mbcs")
        else:
            import codecs # not available in py3k.
            f = codecs.open(outputName, "w", "mbcs")
    else:
        f = None

    for arg in args:
        GenerateFromTypeLibSpec(arg, f, verboseLevel = verboseLevel, bForDemand = bForDemand, bBuildHidden = hiddenSpec)

    if f:
        f.close()

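As can be seen, when an output file is specified it is opened with mbcs; when none is specified, mbcs is again used by default further down the line. So in the current version of win32com, the generated file always ends up encoded as mbcs.

OK, at this point it is clear that win32com uses the mbcs encoding, and the problem is finally settled.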

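4. Summary

The problem is finally solved. To sum up:

(1) Before using a library, check that you have the latest version, and check it on GitHub.

(2) The latest version is not enough; go back to the source and make sure it is the original project. Don't repeat my mistake of downloading right after a search without looking closely.

(3) In the open-source era, when a problem comes up, read the source code.

(4) Encoding conversion in Python 2 is complicated; when something goes wrong, each case has to be analysed on its own terms.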
