Implementing Speech Recognition with Microsoft Cognitive Services
I've wanted to build speech recognition for a long time and have attempted it many times, failing for all sorts of reasons: disappointing recognition accuracy, my own technical dead ends, and so on, and I've sunk a lot of time into it. I've tried iFlytek, Baidu Speech, and the Microsoft stack; in the end I like Microsoft's simplicity and efficiency best. (No flames please, purely personal taste.)
My initial idea was simple: I say a sentence (just a console demo for now), the program recognizes what I said, prints it, and performs an action based on it. (A lovely idea; reality was less kind.) As a newcomer to speech recognition I hit error after error, kept second-guessing which company's API to pick, and searched Baidu for every speech recognition demo I could find to learn from. Very few of them actually ran on the .NET platform, though; maybe I was searching in the wrong direction. After falling into pit after pit without a single success I was thoroughly discouraged and nearly quit several times, but in the end I couldn't resist coming back to it.
Here are the speech demos sitting in my Visual Studio:
The first one is today's main topic; more on it later.
The second and third are Microsoft's: the built-in System.Speech.dll, and Microsoft.Speech.dll, which I tried after reading an article on a Microsoft blog. The article was well written, but my attempt failed, and I ran into an odd asymmetry: the English-side speech recognition didn't work for me (Microsoft.Speech.Recognition), while the Chinese-side speech synthesis didn't work (Microsoft.Speech.Synthesis). So I had to mix the two DLLs to get the effect I wanted. It did work in the end, but only for very simple cases: as soon as the vocabulary grew, the recognition rate dropped sharply. I've always suspected the sampling rate, but I could never find a property or field for it; if anyone knows, please share and let me fly too, haha.
The fourth is a Baidu speech recognition demo. The code is much more concise and not hard to implement, but it has lots of small details to watch and plenty of landmines, with very little documentation to guide you around them. I stepped on a few; it hurt.
First, let's look at the two mainstream speech recognition designs on the market today:
1. Offline speech recognition
Offline recognition is easy to understand: the recognition vocabulary lives locally or on the LAN, so no remote connection is needed. This was my original plan: build my own vocabulary and map its entries to the actions I wanted, using the recognition and synthesis features in Microsoft's System.Speech.dll. I got simple Chinese speech recognition working, but as I grew the vocabulary the recognition rate kept dropping; I don't know whether my microphone was to blame or something else. Discouraged, I gave up. Later, while studying Baidu Speech, I noticed it also offers an offline recognition library, but the official docs give no concrete workflow or design guidance, and I haven't dug into it yet; I'd like to when I have time.
using System;
//using Microsoft.Speech.Synthesis;  // the Chinese-language TTS in Microsoft.Speech produces no sound
using Microsoft.Speech.Recognition;
using System.Speech.Synthesis;
//using System.Speech.Recognition;

namespace SAssassin.SpeechDemo
{
    /// <summary>
    /// Microsoft speech recognition, Chinese recognizer; this combination seems to work a bit better.
    /// </summary>
    class Program
    {
        static SpeechSynthesizer sy = new SpeechSynthesizer();

        static void Main(string[] args)
        {
            // Create a zh-CN recognizer
            using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("zh-CN")))
            {
                foreach (var config in SpeechRecognitionEngine.InstalledRecognizers())
                {
                    Console.WriteLine(config.Id);
                }

                // Build the command vocabulary
                Choices commands = new Choices();
                string[] commands1 = new string[] { "一", "二", "三", "四", "五", "六", "七", "八", "九" };
                string[] commands2 = new string[] { "很高興見到你", "識別率", "assassin", "長沙", "湖南", "實習" };
                string[] commands3 = new string[] { "開燈", "關燈", "播放音樂", "關閉音樂", "澆水", "停止澆水", "打開背景燈", "關閉背景燈" };
                commands.Add(commands1);
                commands.Add(commands2);
                commands.Add(commands3);

                // Wrap the commands in a grammar
                GrammarBuilder gBuilder = new GrammarBuilder();
                gBuilder.Append(commands);
                Grammar grammar = new Grammar(gBuilder);

                // Load the grammar (a fixed command vocabulary recognizes much more accurately than free dictation)
                recognizer.LoadGrammarAsync(grammar);
                // Handle recognition results
                recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Recognizer_SpeechRecognized);
                // Route the default audio device into the recognizer
                recognizer.SetInputToDefaultAudioDevice();
                // Start asynchronous, continuous recognition
                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                // Keep the console window open
                Console.WriteLine("你好");
                sy.Speak("你好");
                Console.ReadLine();
            }
        }

        // SpeechRecognized event handler: print and speak the recognized text
        static void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("識別結果:" + e.Result.Text + " " + e.Result.Confidence + " " + DateTime.Now);
            sy.Speak(e.Result.Text);
        }
    }
}
2. Online speech recognition
Online recognition means our program sends the audio to a remote service center, which matches it and returns the result. This is typically done in a RESTful style, with recognition results traveling back and forth as JSON.
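To make that round trip concrete, here is a minimal C# sketch of posting a WAV file to a recognition endpoint and reading back the JSON. The endpoint URL, the token query parameter, and the content-type details are placeholders of my own, not any particular vendor's real API:

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class OnlineRecognitionSketch
{
    // Hypothetical endpoint and token, for illustration only.
    const string Endpoint = "https://speech.example.com/recognize?lang=zh-CN&token=YOUR_TOKEN";

    static async Task<string> RecognizeAsync(string wavPath)
    {
        using (var http = new HttpClient())
        using (var content = new ByteArrayContent(File.ReadAllBytes(wavPath)))
        {
            // Most services want the raw audio bytes plus a content type describing the format.
            content.Headers.TryAddWithoutValidation("Content-Type", "audio/wav; rate=16000");
            HttpResponseMessage response = await http.PostAsync(Endpoint, content);
            // The service answers with JSON containing the transcript.
            return await response.Content.ReadAsStringAsync();
        }
    }
}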
I started with iFlytek. At first I knew nothing; friends recommended it and everything I found on Baidu said iFlytek was excellent, so I went in with an open mind, only to discover the Windows demo was C++ only. I'm on C#; languages aren't that far apart, but I didn't want the extra hassle. I found what was claimed to be the most complete C# iFlytek demo online, and my heart sank at the tangled source inside, which calls the C++ functions directly through interop. I got the demo running: it could record and recognize simple speech, but some parts were broken, I couldn't find fixes, and I reluctantly gave up.
Then Baidu Speech caught my eye. In July I picked up its demo again; the official demo is fairly simple. First you create an application on the Baidu Speech open platform to get an App Key and a Secret Key, then download the demo and put those two keys in the constructor, a field, or a config file; the program uses them to authenticate its requests. As mentioned at the start, this is online recognition: a RESTful call uploads the audio file to Baidu's recognition service, and the response data comes back to our program. At first my configuration kept failing (my skills weren't great), and the mines started going off, but eventually I got the demo's test files recognized. A small step for me, at least. (If you've stepped on the same mines, feel free to discuss them with me; I may not know the answer, but I do understand the ones I've stepped on, haha.)
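For reference, Baidu's REST APIs exchange those two keys for an access token using an OAuth client-credentials request, roughly like the sketch below; verify the exact URL and response fields against the current official docs before relying on it:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class BaiduTokenSketch
{
    static async Task<string> GetTokenJsonAsync(string apiKey, string secretKey)
    {
        // Client-credentials grant: trade App Key + Secret Key for an access_token.
        string url = "https://openapi.baidu.com/oauth/2.0/token"
                   + "?grant_type=client_credentials"
                   + "&client_id=" + apiKey
                   + "&client_secret=" + secretKey;

        using (var http = new HttpClient())
        {
            // The response is JSON; its "access_token" field is what later
            // recognition/synthesis requests must carry.
            return await http.GetStringAsync(url);
        }
    }
}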
Next came the design question. Recognition worked and synthesis worked, but note: recognition and synthesis must each be enabled separately, and each has its own App Key and Secret Key. They happen to be the same values, but pay attention anyway, or synthesis will break. The next problem: Baidu's design recognizes from a file, while what we usually want is to speak into the microphone and have that recognized, which was my goal too. My solution: save the microphone input to a file, and once I finish speaking, call the recognition method on that file. That does work, but it led straight into another minefield. I used NAudio.dll for the recording; the package is available on NuGet.
using NAudio.Wave;
using System;

namespace SAssassin.VOC
{
    /// <summary>
    /// Records microphone input to a WAV file.
    /// </summary>
    public class RecordWaveToFile
    {
        private WaveFileWriter waveFileWriter = null;
        private WaveInEvent myWaveIn = null;  // WaveInEvent records on a background thread (see below)

        public void StartRecord()
        {
            ConfigWave();
            myWaveIn.StartRecording();
        }

        public void StopRecord()
        {
            myWaveIn.StopRecording();
        }

        private void ConfigWave()
        {
            string filePath = AppDomain.CurrentDomain.BaseDirectory + "Temp.wav";
            myWaveIn = new WaveInEvent()
            {
                WaveFormat = new WaveFormat(16000, 16, 1)  // 16 kHz, 16-bit, mono
                //WaveFormat = new WaveFormat()  // default format: much clearer audio, but not what the service accepts
            };
            myWaveIn.DataAvailable += new EventHandler<WaveInEventArgs>(WaveIn_DataAvailable);
            myWaveIn.RecordingStopped += new EventHandler<StoppedEventArgs>(WaveIn_RecordingStopped);
            waveFileWriter = new WaveFileWriter(filePath, myWaveIn.WaveFormat);
        }

        private void WaveIn_DataAvailable(object sender, WaveInEventArgs e)
        {
            if (waveFileWriter != null)
            {
                waveFileWriter.Write(e.Buffer, 0, e.BytesRecorded);
                waveFileWriter.Flush();
            }
        }

        private void WaveIn_RecordingStopped(object sender, StoppedEventArgs e)
        {
            // Finish writing the WAV header and release the file and the device.
            waveFileWriter?.Dispose();
            waveFileWriter = null;
            myWaveIn.Dispose();
        }
    }
}
Using WaveInEvent here in the console program doesn't throw, but just before this I had used the WaveIn class and got an immediate error:
System.InvalidOperationException: "Use WaveInEvent to record on a background thread"
I found the solution on Stack Overflow: replace the WaveIn class with WaveInEvent. Peeking inside, both classes implement the same interface and their structures are practically identical; one is meant for the GUI thread, the other for background threads. With everything in place, recording worked, but when I checked my recordings they were full of noise, the quality was muddy and at times outright distorted, useless; Baidu failed to recognize them. Raising the sample rate to 44k produced excellent recordings, but here's the problem: Baidu's recognition spec only accepts PCM files at 8k/16-bit. Annoying. I considered switching to another format and saving in compressed form, but as soon as the sample rate comes down the quality collapses and recognition becomes a problem again. This one I'll have to keep chipping away at.
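One idea I haven't fully verified: record at 44.1 kHz for quality, then downsample the finished file to what the service accepts. A sketch using NAudio's ACM conversion stream follows; note that ACM sometimes refuses a large rate jump in a single step, so treat this as something to test rather than a guaranteed fix:

using NAudio.Wave;

class ResampleSketch
{
    // Downsample a high-quality recording to 16 kHz / 16-bit / mono.
    // Swap in new WaveFormat(8000, 16, 1) if the service really demands 8 kHz.
    static void Downsample(string inputPath, string outputPath)
    {
        using (var reader = new WaveFileReader(inputPath))
        using (var converted = new WaveFormatConversionStream(new WaveFormat(16000, 16, 1), reader))
        {
            WaveFileWriter.CreateWaveFile(outputPath, converted);
        }
    }
}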
Now for today's main event, and my first post on cnblogs, so time for the good stuff: Microsoft Cognitive Services. In mid-July I ran into Bing's speech recognition API on the official Bing site, and the recognition quality there made me gasp; the accuracy was remarkably high. I went looking for the API, but the documentation was all in English (annoying). After reading it through, the usage model looked great, again a remote-call approach, but I searched the site forever and found only docs; back then I didn't notice the product pages or the trial offering, so I could only look, not touch. Then a few days ago, rereading the Bing speech docs, I finally met the term "Microsoft Cognitive Services". Forgive my ignorance for never having heard of it: a quick search showed what a gem it is. It bundles many APIs, and speech recognition is just a side dish next to face recognition, language understanding, and other impressive capabilities. I found the APIs, found the free trial, signed in to get the app's secret key, and was ready to go. I downloaded a demo, dropped in the secret key, ran a test, and wow: the recognition quality is simply outstanding. Baidu searches also turned up very few results on using Cognitive Services speech recognition, which is partly why I wanted to write something up.
I extracted much of the demo directly into a console program; the source is below.
using System;
using System.ComponentModel;
using System.Configuration;
using System.IO;
using System.IO.IsolatedStorage;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
// Client library namespace; check the exact name against the NuGet package version you installed.
using Microsoft.CognitiveServices.SpeechRecognition;

public class SpeechConfig : INotifyPropertyChanged
{
    #region Fields
    /// <summary>
    /// The isolated storage subscription key file name.
    /// </summary>
    private const string IsolatedStorageSubscriptionKeyFileName = "Subscription.txt";

    /// <summary>
    /// The default subscription key prompt message.
    /// </summary>
    private const string DefaultSubscriptionKeyPromptMessage = "Secret key";

    /// <summary>
    /// You can also put the primary key in app.config instead of using the UI:
    /// string subscriptionKey = ConfigurationManager.AppSettings["primaryKey"];
    /// </summary>
    private string subscriptionKey = ConfigurationManager.AppSettings["primaryKey"];

    /// <summary>
    /// Gets or sets the subscription key.
    /// </summary>
    public string SubscriptionKey
    {
        get
        {
            return this.subscriptionKey;
        }

        set
        {
            this.subscriptionKey = value;
            this.OnPropertyChanged<string>();
        }
    }

    /// <summary>
    /// The data recognition client.
    /// </summary>
    private DataRecognitionClient dataClient;

    /// <summary>
    /// The microphone client.
    /// </summary>
    private MicrophoneRecognitionClient micClient;

    #endregion Fields

    #region Event
    /// <summary>
    /// Implements the INotifyPropertyChanged interface.
    /// </summary>
    public event PropertyChangedEventHandler PropertyChanged;

    /// <summary>
    /// Helper function for the INotifyPropertyChanged interface.
    /// </summary>
    /// <typeparam name="T">Property type</typeparam>
    /// <param name="caller">Property name</param>
    private void OnPropertyChanged<T>([CallerMemberName]string caller = null)
    {
        this.PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(caller));
    }
    #endregion Event

    #region Properties
    /// <summary>
    /// Gets the current speech recognition mode.
    /// </summary>
    private SpeechRecognitionMode Mode
    {
        get
        {
            if (this.IsMicrophoneClientDictation ||
                this.IsDataClientDictation)
            {
                return SpeechRecognitionMode.LongDictation;
            }

            return SpeechRecognitionMode.ShortPhrase;
        }
    }

    /// <summary>
    /// Gets the default locale.
    /// </summary>
    private string DefaultLocale
    {
        //get { return "en-US"; }
        get { return "zh-CN"; }
    }

    /// <summary>
    /// Gets the Cognitive Services authentication URI.
    /// Empty if the global default is to be used.
    /// </summary>
    private string AuthenticationUri
    {
        get
        {
            return ConfigurationManager.AppSettings["AuthenticationUri"];
        }
    }

    /// <summary>
    /// Gets a value indicating whether or not to use the microphone.
    /// </summary>
    private bool UseMicrophone
    {
        get
        {
            return this.IsMicrophoneClientWithIntent ||
                   this.IsMicrophoneClientDictation ||
                   this.IsMicrophoneClientShortPhrase;
        }
    }

    /// <summary>
    /// Gets the short wave file path.
    /// </summary>
    private string ShortWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["ShortWaveFile"];
        }
    }

    /// <summary>
    /// Gets the long wave file path.
    /// </summary>
    private string LongWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["LongWaveFile"];
        }
    }
    #endregion Properties

    #region Mode selection switches
    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client short phrase.
    /// </summary>
    public bool IsMicrophoneClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client dictation.
    /// </summary>
    public bool IsMicrophoneClientDictation { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client with intent.
    /// </summary>
    public bool IsMicrophoneClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client short phrase.
    /// </summary>
    public bool IsDataClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client with intent.
    /// </summary>
    public bool IsDataClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client dictation.
    /// </summary>
    public bool IsDataClientDictation { get; set; }

    #endregion

    #region Event handlers
    /// <summary>
    /// Called when the microphone status has changed.
    /// </summary>
    private void OnMicrophoneStatus(object sender, MicrophoneEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- Microphone status change received by OnMicrophoneStatus() ---");
            Console.WriteLine("********* Microphone status: {0} *********", e.Recording);
            if (e.Recording)
            {
                Console.WriteLine("Please start speaking.");
            }

            Console.WriteLine();
        });
        task.Start();
    }

    /// <summary>
    /// Called when a partial response is received.
    /// </summary>
    private void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
    {
        Console.WriteLine("--- Partial result received by OnPartialResponseReceivedHandler() ---");
        Console.WriteLine("{0}", e.PartialResult);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when an error is received.
    /// </summary>
    private void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
    {
        Console.WriteLine("--- Error received by OnConversationErrorHandler() ---");
        Console.WriteLine("Error code: {0}", e.SpeechErrorCode.ToString());
        Console.WriteLine("Error text: {0}", e.SpeechErrorText);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when a final response is received (microphone, short phrase).
    /// </summary>
    private void OnMicShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- OnMicShortPhraseResponseReceivedHandler ---");

            // We got the final result, so we can end the mic reco. No need to do this
            // for dataReco, since we already called EndAudio() on it as soon as we were
            // done sending all the data.
            this.micClient.EndMicAndRecognition();

            this.WriteResponseResult(e);
        });
        task.Start();
    }

    /// <summary>
    /// Called when a final response is received (data, short phrase).
    /// </summary>
    private void OnDataShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- OnDataShortPhraseResponseReceivedHandler ---");
            this.WriteResponseResult(e);
        });
        task.Start();
    }

    /// <summary>
    /// Called when a final response is received (microphone, dictation).
    /// </summary>
    private void OnMicDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnMicDictationResponseReceivedHandler ---");
        if (e.PhraseResponse.RecognitionStatus == RecognitionStatus.EndOfDictation ||
            e.PhraseResponse.RecognitionStatus == RecognitionStatus.DictationEndSilenceTimeout)
        {
            Task task = new Task(() =>
            {
                // Dictation is over, so end the mic recognition.
                this.micClient.EndMicAndRecognition();
            });
            task.Start();
        }

        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Called when a final response is received (data, dictation).
    /// No mic to shut down here; EndAudio() was already called when the data was sent.
    /// </summary>
    private void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnDataDictationResponseReceivedHandler ---");
        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Streams a wave file to the data client.
    /// </summary>
    /// <param name="wavFileName">Name of the wav file.</param>
    private void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            // Note: for wave files we can send data from the file straight to the server.
            // If you do not have an audio file in wave format and instead have raw data
            // (for example audio coming over Bluetooth), then before sending any audio
            // you must first send a SpeechAudioFormat descriptor describing the layout
            // and format of your raw audio via DataRecognitionClient's SendAudioFormat() method.
            int bytesRead = 0;
            byte[] buffer = new byte[1024];

            try
            {
                do
                {
                    // Read the next chunk of audio into the buffer.
                    bytesRead = fileStream.Read(buffer, 0, buffer.Length);

                    // Send the audio data to the service.
                    this.dataClient.SendAudio(buffer, bytesRead);
                }
                while (bytesRead > 0);
            }
            finally
            {
                // We are done sending audio; the final recognition results
                // will arrive in the OnResponseReceived event.
                this.dataClient.EndAudio();
            }
        }
    }
    #endregion Event handlers

    #region Helper methods
    /// <summary>
    /// Gets the subscription key from isolated storage.
    /// </summary>
    /// <returns>The subscription key.</returns>
    private string GetSubscriptionKeyFromIsolatedStorage()
    {
        string subscriptionKey = null;

        using (IsolatedStorageFile isoStore = IsolatedStorageFile.GetStore(IsolatedStorageScope.User | IsolatedStorageScope.Assembly, null, null))
        {
            try
            {
                using (var iStream = new IsolatedStorageFileStream(IsolatedStorageSubscriptionKeyFileName, FileMode.Open, isoStore))
                {
                    using (var reader = new StreamReader(iStream))
                    {
                        subscriptionKey = reader.ReadLine();
                    }
                }
            }
            catch (FileNotFoundException)
            {
                subscriptionKey = null;
            }
        }

        if (string.IsNullOrEmpty(subscriptionKey))
        {
            subscriptionKey = DefaultSubscriptionKeyPromptMessage;
        }

        return subscriptionKey;
    }

    /// <summary>
    /// Creates a new microphone reco client without LUIS intent support.
    /// </summary>
    private void CreateMicrophoneRecoClient()
    {
        this.micClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
            this.Mode, this.DefaultLocale, this.SubscriptionKey);

        this.micClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        this.micClient.OnMicrophoneStatus += this.OnMicrophoneStatus;
        this.micClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.micClient.OnResponseReceived += this.OnMicShortPhraseResponseReceivedHandler;
        }
        else if (this.Mode == SpeechRecognitionMode.LongDictation)
        {
            this.micClient.OnResponseReceived += this.OnMicDictationResponseReceivedHandler;
        }

        this.micClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Creates a data client without LUIS intent support.
    /// Speech recognition with data (for example from a file or other audio source).
    /// The data is broken up into buffers and each buffer is sent to the Speech Recognition Service.
    /// No modification is done to the buffers, so the user can apply their
    /// own silence detection if desired.
    /// </summary>
    private void CreateDataRecoClient()
    {
        this.dataClient = SpeechRecognitionServiceFactory.CreateDataClient(
            this.Mode,
            this.DefaultLocale,
            this.SubscriptionKey);
        this.dataClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.dataClient.OnResponseReceived += this.OnDataShortPhraseResponseReceivedHandler;
        }
        else
        {
            this.dataClient.OnResponseReceived += this.OnDataDictationResponseReceivedHandler;
        }

        this.dataClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        this.dataClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Writes the response result.
    /// </summary>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void WriteResponseResult(SpeechResponseEventArgs e)
    {
        if (e.PhraseResponse.Results.Length == 0)
        {
            Console.WriteLine("No phrase response is available.");
        }
        else
        {
            Console.WriteLine("********* Final n-BEST Results *********");
            for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
            {
                Console.WriteLine(
                    "[{0}] Confidence={1}, Text=\"{2}\"",
                    i,
                    e.PhraseResponse.Results[i].Confidence,
                    e.PhraseResponse.Results[i].DisplayText);
                if (e.PhraseResponse.Results[i].DisplayText == "關閉。")  // the recognized Chinese phrase for "close"
                {
                    Console.WriteLine("Command received, closing now");
                }
            }

            Console.WriteLine();
        }
    }
    #endregion Helper methods

    #region Init
    public SpeechConfig()
    {
        this.IsMicrophoneClientShortPhrase = true;
        this.IsMicrophoneClientWithIntent = false;
        this.IsMicrophoneClientDictation = false;
        this.IsDataClientShortPhrase = false;
        this.IsDataClientWithIntent = false;
        this.IsDataClientDictation = false;

        this.SubscriptionKey = this.GetSubscriptionKeyFromIsolatedStorage();
    }

    /// <summary>
    /// Starts speech recognition.
    /// </summary>
    public void SpeechRecognize()
    {
        if (this.UseMicrophone)
        {
            if (this.micClient == null)
            {
                this.CreateMicrophoneRecoClient();
            }

            this.micClient.StartMicAndRecognition();
        }
        else
        {
            if (null == this.dataClient)
            {
                this.CreateDataRecoClient();
            }

            this.SendAudioHelper((this.Mode == SpeechRecognitionMode.ShortPhrase) ? this.ShortWaveFile : this.LongWaveFile);
        }
    }
    #endregion Init
}
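Driving it from Main is then trivial; the constructor defaults to microphone + short-phrase mode, so an entry point like this (my own minimal wrapper, not part of the demo) is enough:

using System;

class Program
{
    static void Main(string[] args)
    {
        // Default mode: microphone + short phrase (see the constructor above).
        SpeechConfig speech = new SpeechConfig();
        speech.SpeechRecognize();

        // Keep the console alive while results arrive on background threads.
        Console.ReadLine();
    }
}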
Several of the referenced assemblies can be pulled in as NuGet packages; that part caused basically no trouble.
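Since the class pulls its settings from ConfigurationManager.AppSettings, the app.config needs entries along these lines; all the values here are placeholders to replace with your own:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- Placeholders: substitute your own key, endpoint, and test files -->
    <add key="primaryKey" value="YOUR_SECRET_KEY" />
    <add key="AuthenticationUri" value="" />
    <add key="ShortWaveFile" value="short-test.wav" />
    <add key="LongWaveFile" value="long-test.wav" />
  </appSettings>
</configuration>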
Oh, and one thing to note: when downloading Microsoft.Speech you must install both packages, otherwise you'll get errors, and the project must target .NET 4.5 or above.
Just swap your own key in for the default and the program runs. The results are seriously impressive.
The recognition accuracy is excellent and I'm very happy with it. The catch is that Microsoft's free trial only lasts a month, so I'd better make it bear as much fruit as possible while it lasts, haha.