Implementing Speech Recognition with Microsoft Cognitive Services


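I have wanted to get speech recognition working for a long time and tried many times, but every attempt failed. There were many reasons: the recognition quality was poor, and I could not get a working implementation myself, so I spent a great deal of time on it. I have tried iFlytek (科大訊飛), Baidu Speech, and Microsoft's offerings. In the end I prefer Microsoft's for being simple and efficient. (Purely personal opinion; please be kind.)

My original idea was this: I say a sentence (to a console demo for now), the console program recognizes what I said, displays it, and carries out an action based on it. (The idea was lovely; reality was painful.) As a newcomer to speech recognition I ran into every kind of error, wavered over which company's API to pick, and searched Baidu for speech recognition demos to learn from. Very few of them actually ran on .NET; maybe I was searching in the wrong direction. I fell into pit after pit without a single success, lost heart and nearly gave up several times, but I could never quite resist wanting to play with speech recognition.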

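Here are the speech demos in my Visual Studio solution:

The first one is today's main subject; more on it later.

The second and third use the System.Speech.dll that ships with Windows and Microsoft.Speech.dll, which I tried after reading an article on a Microsoft blog. The article was well written, but my attempt failed. I also found a problem: the English version's speech recognition (Microsoft.Speech.Recognition) did not work, and the Chinese version's speech synthesis (Microsoft.Speech.Synthesis) did not work. So I had to mix the two DLLs to get the effect I wanted. It did work in the end, but only for very simple cases: once the vocabulary grew, the recognition rate dropped sharply. I have always suspected the sampling rate, but I could not find any property or field that controls it. If anyone knows how, please give me a hint so I can take off too, haha.

The fourth is the Baidu speech recognition demo. The code is much simpler and not hard to implement, but there are many small details to watch and plenty of landmines, and very little documentation to guide you out of the minefield. I was one of those who stepped on the mines, painfully.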

 

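First, let's look at the mainstream approaches to speech recognition on the market today:

1. Offline speech recognition

Offline speech recognition is easy to understand: the recognition library lives locally or on the LAN, and no remote connection is needed. This was my original idea: set up my own recognition vocabulary and design the desired actions around its contents, using the recognition and synthesis features of Microsoft's System.Speech.dll. I got simple Chinese speech recognition working, but as I kept enlarging the vocabulary, the recognition rate got lower and lower; I do not know whether my computer's microphone was to blame or something else. Discouraged, I gave up. When I later tried Baidu Speech, I found that it also has an offline recognition library, but the official documentation gave no concrete workflow or design guidance, and I did not dig deeper. I want to look into it properly when I have time.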

 

```csharp
using System;
using System.Globalization;
// using Microsoft.Speech.Synthesis; // the Chinese TTS in Microsoft.Speech produced no sound
using Microsoft.Speech.Recognition;
using System.Speech.Synthesis;
// using System.Speech.Recognition;

namespace SAssassin.SpeechDemo
{
    /// <summary>
    /// Microsoft speech recognition, Chinese version; it seems to work somewhat better.
    /// </summary>
    class Program
    {
        static readonly SpeechSynthesizer sy = new SpeechSynthesizer();

        static void Main(string[] args)
        {
            // Create a recognizer for Chinese (zh-CN).
            using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("zh-CN")))
            {
                // List the recognizers installed on this machine.
                foreach (var config in SpeechRecognitionEngine.InstalledRecognizers())
                {
                    Console.WriteLine(config.Id);
                }

                // Command words. They stay in Chinese because the recognizer is zh-CN.
                Choices commands = new Choices();
                // Digits one to nine.
                string[] command1 = new string[] { "一", "二", "三", "四", "五", "六", "七", "八", "九" };
                // "nice to meet you", "recognition rate", "assassin", "Changsha", "Hunan", "internship".
                string[] command2 = new string[] { "很高興見到你", "識別率", "assassin", "長沙", "湖南", "實習" };
                // Lights on/off, play/stop music, start/stop watering, background light on/off.
                string[] command3 = new string[] { "開燈", "關燈", "播放音樂", "關閉音樂", "澆水", "停止澆水", "打開背景燈", "關閉背景燈" };
                commands.Add(command1);
                commands.Add(command2);
                commands.Add(command3);

                // Build a grammar from the command words. The grammar culture should
                // match the recognizer culture, so set it explicitly.
                GrammarBuilder gBuilder = new GrammarBuilder();
                gBuilder.Culture = recognizer.RecognizerInfo.Culture;
                gBuilder.Append(commands);
                Grammar grammar = new Grammar(gBuilder);

                // Load the command grammar (a fixed command list is recognized more accurately).
                recognizer.LoadGrammarAsync(grammar);
                // Handle recognition results.
                recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Recognizer_SpeechRecognized);
                // Use the default microphone as input.
                recognizer.SetInputToDefaultAudioDevice();
                // Start asynchronous, continuous recognition.
                recognizer.RecognizeAsync(RecognizeMode.Multiple);

                // Greet the user ("你好" means "hello") and keep the console window open.
                Console.WriteLine("你好");
                sy.Speak("你好");
                Console.ReadLine();
            }
        }

        // SpeechRecognized event handler: print the result and speak it back.
        static void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Recognized: " + e.Result.Text + " " + e.Result.Confidence + " " + DateTime.Now);
            sy.Speak(e.Result.Text);
        }
    }
}
```
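
The original goal was to run an action based on what was said, and the handler above only echoes the text. A minimal sketch of that missing step follows. The `CommandDispatcher` class and the 0.6 threshold in the usage note are illustrative assumptions, not part of the original demo: it maps recognized phrases to actions and ignores results whose confidence is below a chosen minimum.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical helper: maps recognized phrases to actions and
/// skips results whose confidence is below a minimum.
/// </summary>
public class CommandDispatcher
{
    private readonly Dictionary<string, Action> handlers = new Dictionary<string, Action>();
    private readonly float minConfidence;

    public CommandDispatcher(float minConfidence)
    {
        this.minConfidence = minConfidence;
    }

    /// <summary>Registers (or replaces) the action for a phrase.</summary>
    public void Register(string phrase, Action handler)
    {
        handlers[phrase] = handler;
    }

    /// <summary>Runs the action for the phrase; returns true when one ran.</summary>
    public bool Dispatch(string text, float confidence)
    {
        if (text == null || confidence < minConfidence)
        {
            return false;
        }

        Action handler;
        if (!handlers.TryGetValue(text, out handler))
        {
            return false;
        }

        handler();
        return true;
    }
}
```

Inside the SpeechRecognized handler this could be called as `dispatcher.Dispatch(e.Result.Text, e.Result.Confidence)`, with the dispatcher created once, for example `new CommandDispatcher(0.6f)`, and phrases such as "開燈" registered at startup. The right threshold depends on the microphone and vocabulary and has to be found by experiment.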

 

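2. Online speech recognition

With online speech recognition, the program sends the audio to a remote service, which performs the matching and returns the result. These services are generally RESTful and exchange requests and recognition results as JSON.

I first studied iFlytek's speech recognition. I knew nothing at the start; a friend recommended it, and my own Baidu searches all said iFlytek was very good, so I went in ready to learn. But on Windows there was only a C++ demo, and I work in C#. Languages have a lot in common, but I did not want too much hassle, so I found a demo online that was said to be the most complete C# version of the iFlytek demo. When I saw the tangled source code, my heart sank: it called the C++ functions directly through some interop mechanism. I ran the demo and it worked; it could record briefly and then recognize. But some parts had problems for which I could not find solutions, so I had to give up.

Later, Baidu Speech caught my attention. In July I started looking at the Baidu demo again. The official demo is fairly simple. First you create an application on the Baidu Speech open platform to obtain an App Key and a Secret Key, then download the demo and put the two keys in the constructor, in fields, or in a configuration file; the program uses them to issue requests. As described above, this is online recognition: the audio file is uploaded RESTfully to Baidu's recognition service, and the result is returned to our program. At first my skills were lacking and the configuration went wrong in all sorts of ways. The landmines started going off, and a few explosions were inevitable, but in the end I could recognize the test file in the demo, a small step for me. (If you hit the same landmines, feel free to discuss them with me. I may not know the answer, but I at least understand the ones I stepped on, haha.)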
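
To make the "upload audio as JSON" step concrete, here is a rough sketch of building the request body. It is not taken from the Baidu demo. The field names (`format`, `rate`, `channel`, `cuid`, `token`, `speech`, `len`) follow my reading of Baidu's REST documentation and should be checked against the current docs; the `BaiduAsrRequest` class is hypothetical, and the sketch assumes `System.Text.Json` is available (.NET Core 3.0 or later, or the NuGet package). The access token is assumed to have been obtained beforehand from the App Key and Secret Key.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

/// <summary>
/// Hypothetical helper that builds the JSON body for an online
/// recognition request. Field names are assumptions to verify.
/// </summary>
public static class BaiduAsrRequest
{
    public static string BuildJson(byte[] pcm, int sampleRate, string token, string deviceId)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));

        var payload = new Dictionary<string, object>
        {
            ["format"] = "pcm",                        // raw PCM audio
            ["rate"] = sampleRate,                     // sampling rate in Hz
            ["channel"] = 1,                           // mono
            ["cuid"] = deviceId,                       // any unique client id
            ["token"] = token,                         // access token from the two keys
            ["speech"] = Convert.ToBase64String(pcm),  // audio bytes, base64-encoded
            ["len"] = pcm.Length                       // byte count before encoding
        };

        return JsonSerializer.Serialize(payload);
    }
}
```

The resulting string would then be sent with an HTTP POST (for example through `HttpClient` with a `Content-Type` of `application/json`) to the recognition endpoint given in Baidu's documentation, which is deliberately not hard-coded here.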

  

 

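Next comes the design. Recognition works and synthesis works. Note that recognition and synthesis have to be enabled separately; each has an App Key and a Secret Key, and although they are the same, you still need to pay attention to this or synthesis will fail. The next problem is that Baidu Speech is designed around recognizing files, whereas what we usually want is to speak directly into the microphone and have that recognized, which is what I wanted too. My approach: save the microphone input to a file, and once I have finished speaking, call the recognition method. That does work, but here the minefield begins again. I used NAudio.dll for recording; the package is available on NuGet.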

```csharp
using NAudio.Wave;
using System;
using System.IO;

namespace SAssassin.VOC
{
    /// <summary>
    /// Records microphone input to a WAV file.
    /// </summary>
    public class RecordWaveToFile
    {
        private WaveFileWriter waveFileWriter = null;
        // WaveInEvent (not WaveIn) so that recording works without a GUI message loop.
        private WaveInEvent myWaveIn = null;

        public void StartRecord()
        {
            ConfigWave();
            myWaveIn.StartRecording();
        }

        /// <summary>
        /// Requests the end of recording; cleanup happens in WaveIn_RecordingStopped.
        /// </summary>
        public void StopRecord()
        {
            if (myWaveIn != null)
            {
                myWaveIn.StopRecording();
            }
        }

        private void ConfigWave()
        {
            string filePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Temp.wav");
            myWaveIn = new WaveInEvent()
            {
                WaveFormat = new WaveFormat(16000, 16, 1) // 16 kHz, 16-bit, mono
                //WaveFormat = new WaveFormat() // default format; the recording sounds clear
            };
            myWaveIn.DataAvailable += new EventHandler<WaveInEventArgs>(WaveIn_DataAvailable);
            myWaveIn.RecordingStopped += new EventHandler<StoppedEventArgs>(WaveIn_RecordingStopped);
            waveFileWriter = new WaveFileWriter(filePath, myWaveIn.WaveFormat);
        }

        // Append each captured buffer to the file.
        private void WaveIn_DataAvailable(object sender, WaveInEventArgs e)
        {
            if (waveFileWriter != null)
            {
                waveFileWriter.Write(e.Buffer, 0, e.BytesRecorded);
                waveFileWriter.Flush();
            }
        }

        // Recording has already stopped here, so release the writer and the device
        // instead of calling StopRecording() again.
        private void WaveIn_RecordingStopped(object sender, StoppedEventArgs e)
        {
            if (waveFileWriter != null)
            {
                waveFileWriter.Dispose();
                waveFileWriter = null;
            }

            if (myWaveIn != null)
            {
                myWaveIn.Dispose();
                myWaveIn = null;
            }
        }
    }
}
```

 

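In a console program, WaveInEvent works without error. Before that I was using the WaveIn class, which threw immediately:

System.InvalidOperationException: "Use WaveInEvent to record on a background thread"

I found the solution on Stack Overflow: replace the WaveIn class with WaveInEvent. Looking inside the two classes, they implement the same interface and have practically the same structure; one is meant for the GUI thread and the other for background threads. With everything in place, recording worked, but when I checked the recorded file it was full of noise, the sound was not clear, and it was even distorted outright. It was useless, and Baidu failed to recognize it as well. When I raised the sampling rate to 44 kHz, the result was very good and the recording was fine. But then came the problem: Baidu's speech recognition only accepts PCM at 8 kHz or 16 kHz, 16-bit, mono. Frustrating. I considered switching to another file format and saving in compressed form, but as soon as the sampling rate comes down, the quality becomes terrible and recognition becomes a problem too. I have to admit this one still needs to be solved step by step.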
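
Since a recording in the wrong format simply fails on the server, it can help to check the WAV header locally before uploading. The sketch below is an illustration rather than part of the original project: the `WavInspector` and `WavFormatInfo` names are hypothetical, and the accepted formats in `IsSuitableForUpload` (uncompressed PCM, mono, 16-bit, 8 kHz or 16 kHz) are an assumption to verify against the service's current documentation. It reads the `fmt ` chunk of a RIFF/WAVE stream using only the standard library.

```csharp
using System;
using System.IO;
using System.Text;

/// <summary>Fields read from the "fmt " chunk of a WAV file.</summary>
public struct WavFormatInfo
{
    public short AudioFormat;   // 1 means uncompressed PCM
    public short Channels;
    public int SampleRate;
    public short BitsPerSample;
}

/// <summary>Hypothetical helper for checking a recording before upload.</summary>
public static class WavInspector
{
    public static WavFormatInfo ReadFormat(Stream stream)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
            reader.ReadInt32(); // total size, not needed here
            if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

            // Walk the chunks until the format chunk is found.
            while (true)
            {
                string chunkId = ReadTag(reader);
                int chunkSize = reader.ReadInt32();
                if (chunkId == "fmt ")
                {
                    var info = new WavFormatInfo();
                    info.AudioFormat = reader.ReadInt16();
                    info.Channels = reader.ReadInt16();
                    info.SampleRate = reader.ReadInt32();
                    reader.ReadInt32(); // byte rate
                    reader.ReadInt16(); // block align
                    info.BitsPerSample = reader.ReadInt16();
                    return info;
                }

                // Skip other chunks; chunk data is padded to an even length.
                reader.ReadBytes(chunkSize + (chunkSize & 1));
            }
        }
    }

    /// <summary>Assumed upload requirement: PCM, mono, 16-bit, 8 or 16 kHz.</summary>
    public static bool IsSuitableForUpload(WavFormatInfo f)
    {
        return f.AudioFormat == 1 && f.Channels == 1 && f.BitsPerSample == 16
            && (f.SampleRate == 8000 || f.SampleRate == 16000);
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new InvalidDataException("Unexpected end of WAV data.");
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Opening the recorded file with `File.OpenRead` and passing the stream to `WavInspector.ReadFormat` shows at a glance whether a 44.1 kHz recording slipped through and needs to be resampled before it is sent.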


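Now for today's main event. This is also my first post on cnblogs (博客園), so it is time to get to the point: Microsoft Cognitive Services. In mid-July I came across Bing's speech recognition API on the Microsoft Bing site, and the recognition quality made me gasp; the recognition rate was remarkably high. I went looking for the API and found that the documentation was all in English. Frustrating. After reading it through, the usage looked very good, and it is also a remote-call design. But as for the API itself, I searched the site for a long time and found only documentation. At the time I did not look at the products or trial offers at the top of the page, so I could only look and not use it, which was discouraging.

Only in the last few days, rereading the Bing speech recognition documentation, did I come across the term "Microsoft Cognitive Services". Forgive my ignorance; I had never heard of this good thing. A quick Baidu search showed how impressive it is: it contains many APIs, and speech recognition is only a small dish next to face recognition, language understanding, and other powerful features. Find the API, find the free trial, sign in to get the app's secret key, and you can start using it. I downloaded a demo, entered the secret key, and tested it. Wow, the recognition quality was simply excellent. Judging from Baidu search results, very few people write about using the speech recognition in Microsoft Cognitive Services, which is why I decided to write something.

I pulled many parts out of the demo into a single console program. The source is as follows: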

```csharp
using System;
using System.ComponentModel;
using System.Configuration;
using System.IO;
using System.IO.IsolatedStorage;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.SpeechRecognition;

public class SpeechConfig : INotifyPropertyChanged
{
    #region Fields
    /// <summary>
    /// The isolated storage subscription key file name.
    /// </summary>
    private const string IsolatedStorageSubscriptionKeyFileName = "Subscription.txt";

    /// <summary>
    /// The default subscription key prompt message.
    /// </summary>
    private const string DefaultSubscriptionKeyPromptMessage = "Secret key";

    /// <summary>
    /// You can also put the primary key in app.config, instead of using UI.
    /// </summary>
    private string subscriptionKey = ConfigurationManager.AppSettings["primaryKey"];

    /// <summary>
    /// Gets or sets subscription key
    /// </summary>
    public string SubscriptionKey
    {
        get
        {
            return this.subscriptionKey;
        }

        set
        {
            this.subscriptionKey = value;
            this.OnPropertyChanged<string>();
        }
    }

    /// <summary>
    /// The data recognition client
    /// </summary>
    private DataRecognitionClient dataClient;

    /// <summary>
    /// The microphone client
    /// </summary>
    private MicrophoneRecognitionClient micClient;

    #endregion Fields

    #region event
    /// <summary>
    /// Implement INotifyPropertyChanged interface
    /// </summary>
    public event PropertyChangedEventHandler PropertyChanged;

    /// <summary>
    /// Helper function for INotifyPropertyChanged interface
    /// </summary>
    /// <typeparam name="T">Property type</typeparam>
    /// <param name="caller">Property name</param>
    private void OnPropertyChanged<T>([CallerMemberName]string caller = null)
    {
        this.PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(caller));
    }
    #endregion event

    #region Properties
    /// <summary>
    /// Gets the current speech recognition mode.
    /// </summary>
    private SpeechRecognitionMode Mode
    {
        get
        {
            if (this.IsMicrophoneClientDictation ||
                this.IsDataClientDictation)
            {
                return SpeechRecognitionMode.LongDictation;
            }

            return SpeechRecognitionMode.ShortPhrase;
        }
    }

    /// <summary>
    /// Gets the default locale.
    /// </summary>
    private string DefaultLocale
    {
        //get { return "en-US"; }
        get { return "zh-CN"; }
    }

    /// <summary>
    /// Gets the Cognitive Service Authentication Uri.
    /// Empty if the global default is to be used.
    /// </summary>
    private string AuthenticationUri
    {
        get
        {
            return ConfigurationManager.AppSettings["AuthenticationUri"];
        }
    }

    /// <summary>
    /// Gets a value indicating whether or not to use the microphone.
    /// </summary>
    private bool UseMicrophone
    {
        get
        {
            return this.IsMicrophoneClientWithIntent ||
                this.IsMicrophoneClientDictation ||
                this.IsMicrophoneClientShortPhrase;
        }
    }

    /// <summary>
    /// Gets the short wave file path.
    /// </summary>
    private string ShortWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["ShortWaveFile"];
        }
    }

    /// <summary>
    /// Gets the long wave file path.
    /// </summary>
    private string LongWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["LongWaveFile"];
        }
    }
    #endregion Properties

    #region Mode selection settings
    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client short phrase.
    /// </summary>
    public bool IsMicrophoneClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client dictation.
    /// </summary>
    public bool IsMicrophoneClientDictation { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client with intent.
    /// </summary>
    public bool IsMicrophoneClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client short phrase.
    /// </summary>
    public bool IsDataClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client with intent.
    /// </summary>
    public bool IsDataClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client dictation.
    /// </summary>
    public bool IsDataClientDictation { get; set; }

    #endregion

    #region Event handlers
    /// <summary>
    /// Called when the microphone status has changed.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="MicrophoneEventArgs"/> instance containing the event data.</param>
    private void OnMicrophoneStatus(object sender, MicrophoneEventArgs e)
    {
        Task.Run(() =>
        {
            Console.WriteLine("--- Microphone status change received by OnMicrophoneStatus() ---");
            Console.WriteLine("********* Microphone status: {0} *********", e.Recording);
            if (e.Recording)
            {
                Console.WriteLine("Please start speaking.");
            }

            Console.WriteLine();
        });
    }

    /// <summary>
    /// Called when a partial response is received.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="PartialSpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
    {
        Console.WriteLine("--- Partial result received by OnPartialResponseReceivedHandler() ---");
        Console.WriteLine("{0}", e.PartialResult);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when an error is received.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechErrorEventArgs"/> instance containing the event data.</param>
    private void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
    {
        Console.WriteLine("--- Error received by OnConversationErrorHandler() ---");
        Console.WriteLine("Error code: {0}", e.SpeechErrorCode.ToString());
        Console.WriteLine("Error text: {0}", e.SpeechErrorText);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when a final response is received (microphone, short phrase).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnMicShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task.Run(() =>
        {
            Console.WriteLine("--- OnMicShortPhraseResponseReceivedHandler ---");

            // We got the final result, so we can end the mic reco. No need to do this
            // for dataReco, since we already called EndAudio() on it as soon as we were
            // done sending all the data.
            this.micClient.EndMicAndRecognition();

            this.WriteResponseResult(e);
        });
    }

    /// <summary>
    /// Called when a final response is received (data client, short phrase).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnDataShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task.Run(() =>
        {
            Console.WriteLine("--- OnDataShortPhraseResponseReceivedHandler ---");

            // Nothing to stop here: EndAudio() was already called on the data client
            // once all the data had been sent.
            this.WriteResponseResult(e);
        });
    }

    /// <summary>
    /// Called when a final response is received (microphone, dictation).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnMicDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnMicDictationResponseReceivedHandler ---");
        if (e.PhraseResponse.RecognitionStatus == RecognitionStatus.EndOfDictation ||
            e.PhraseResponse.RecognitionStatus == RecognitionStatus.DictationEndSilenceTimeout)
        {
            // Dictation has ended, so we can end the mic reco.
            Task.Run(() => this.micClient.EndMicAndRecognition());
        }

        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Called when a final response is received (data client, dictation).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnDataDictationResponseReceivedHandler ---");

        // Nothing to stop here, even at the end of dictation: EndAudio() was already
        // called on the data client once all the data had been sent.
        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Sends the audio helper.
    /// </summary>
    /// <param name="wavFileName">Name of the wav file.</param>
    private void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            // Note for wave files, we can just send data from the file right to the server.
            // If you do not have an audio file in wave format, and instead have just raw
            // data (for example audio coming over bluetooth), then before sending up any
            // audio data, you must first send up a SpeechAudioFormat descriptor to describe
            // the layout and format of your raw audio data via DataRecognitionClient's
            // SendAudioFormat() method.
            int bytesRead = 0;
            byte[] buffer = new byte[1024];

            try
            {
                // Read the file in chunks and send each non-empty chunk to the service.
                while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    this.dataClient.SendAudio(buffer, bytesRead);
                }
            }
            finally
            {
                // We are done sending audio. Final recognition results will arrive in
                // the OnResponseReceived event call.
                this.dataClient.EndAudio();
            }
        }
    }
    #endregion Event handlers

    #region Helper methods
    /// <summary>
    /// Gets the subscription key from isolated storage.
    /// </summary>
    /// <returns>The subscription key.</returns>
    private string GetSubscriptionKeyFromIsolatedStorage()
    {
        string subscriptionKey = null;

        using (IsolatedStorageFile isoStore = IsolatedStorageFile.GetStore(IsolatedStorageScope.User | IsolatedStorageScope.Assembly, null, null))
        {
            try
            {
                using (var iStream = new IsolatedStorageFileStream(IsolatedStorageSubscriptionKeyFileName, FileMode.Open, isoStore))
                {
                    using (var reader = new StreamReader(iStream))
                    {
                        subscriptionKey = reader.ReadLine();
                    }
                }
            }
            catch (FileNotFoundException)
            {
                subscriptionKey = null;
            }
        }

        if (string.IsNullOrEmpty(subscriptionKey))
        {
            subscriptionKey = DefaultSubscriptionKeyPromptMessage;
        }

        return subscriptionKey;
    }

    /// <summary>
    /// Creates a new microphone reco client without LUIS intent support.
    /// </summary>
    private void CreateMicrophoneRecoClient()
    {
        this.micClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
            this.Mode,
            this.DefaultLocale,
            this.SubscriptionKey);

        this.micClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        this.micClient.OnMicrophoneStatus += this.OnMicrophoneStatus;
        this.micClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.micClient.OnResponseReceived += this.OnMicShortPhraseResponseReceivedHandler;
        }
        else if (this.Mode == SpeechRecognitionMode.LongDictation)
        {
            this.micClient.OnResponseReceived += this.OnMicDictationResponseReceivedHandler;
        }

        this.micClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Creates a data client without LUIS intent support.
    /// Speech recognition with data (for example from a file or audio source).
    /// The data is broken up into buffers and each buffer is sent to the Speech Recognition Service.
    /// No modification is done to the buffers, so the user can apply their
    /// own Silence Detection if desired.
    /// </summary>
    private void CreateDataRecoClient()
    {
        this.dataClient = SpeechRecognitionServiceFactory.CreateDataClient(
            this.Mode,
            this.DefaultLocale,
            this.SubscriptionKey);
        this.dataClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.dataClient.OnResponseReceived += this.OnDataShortPhraseResponseReceivedHandler;
        }
        else
        {
            this.dataClient.OnResponseReceived += this.OnDataDictationResponseReceivedHandler;
        }

        this.dataClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        this.dataClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Writes the response result.
    /// </summary>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void WriteResponseResult(SpeechResponseEventArgs e)
    {
        if (e.PhraseResponse.Results.Length == 0)
        {
            Console.WriteLine("No phrase response is available.");
        }
        else
        {
            Console.WriteLine("********* Final n-BEST Results *********");
            for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
            {
                Console.WriteLine(
                    "[{0}] Confidence={1}, Text=\"{2}\"",
                    i,
                    e.PhraseResponse.Results[i].Confidence,
                    e.PhraseResponse.Results[i].DisplayText);

                // "關閉。" is the recognized text for the spoken command "close".
                if (e.PhraseResponse.Results[i].DisplayText == "關閉。")
                {
                    Console.WriteLine("Command received, closing now.");
                }
            }

            Console.WriteLine();
        }
    }
    #endregion Helper methods

    #region Init
    public SpeechConfig()
    {
        this.IsMicrophoneClientShortPhrase = true;
        this.IsMicrophoneClientWithIntent = false;
        this.IsMicrophoneClientDictation = false;
        this.IsDataClientShortPhrase = false;
        this.IsDataClientWithIntent = false;
        this.IsDataClientDictation = false;

        // Keep a key configured in app.config; otherwise fall back to isolated
        // storage (and finally to the default key constant).
        if (string.IsNullOrEmpty(this.subscriptionKey))
        {
            this.SubscriptionKey = this.GetSubscriptionKeyFromIsolatedStorage();
        }
    }

    /// <summary>
    /// Starts speech recognition.
    /// </summary>
    public void SpeechRecognize()
    {
        if (this.UseMicrophone)
        {
            if (this.micClient == null)
            {
                this.CreateMicrophoneRecoClient();
            }

            this.micClient.StartMicAndRecognition();
        }
        else
        {
            if (null == this.dataClient)
            {
                this.CreateDataRecoClient();
            }

            this.SendAudioHelper((this.Mode == SpeechRecognitionMode.ShortPhrase) ? this.ShortWaveFile : this.LongWaveFile);
        }
    }
    #endregion Init
}
```
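
One fragile spot in the listing is the command check in WriteResponseResult, which compares the display text with the exact string "關閉。", full stop included. A small sketch of a more tolerant comparison follows. The `CommandText` helper is hypothetical and not part of the demo, and the list of punctuation marks to strip is an assumption that may need extending.

```csharp
using System;

/// <summary>
/// Hypothetical helper: compares recognized display text with a
/// command word while ignoring surrounding spaces and trailing punctuation.
/// </summary>
public static class CommandText
{
    // Punctuation that a recognizer may append to a short phrase (assumed list).
    private static readonly char[] TrailingMarks =
    {
        '。', '.', '!', '!', '?', '?', ',', ',', ' ', '\u3000'
    };

    /// <summary>Trims whitespace and trailing punctuation.</summary>
    public static string Normalize(string displayText)
    {
        if (string.IsNullOrEmpty(displayText))
        {
            return string.Empty;
        }

        return displayText.Trim().TrimEnd(TrailingMarks);
    }

    /// <summary>True when the normalized text equals the command exactly.</summary>
    public static bool IsCommand(string displayText, string command)
    {
        return string.Equals(Normalize(displayText), command, StringComparison.Ordinal);
    }
}
```

With this, the check could read `CommandText.IsCommand(result.DisplayText, "關閉")`, which matches both "關閉。" and "關閉" but not a longer phrase such as "關閉音樂。".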

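Several of the referenced assemblies can be downloaded as NuGet packages without much trouble.

One thing to note: when downloading Microsoft.Speech, you must download both packages, otherwise you will get errors, and the version must be 4.5 or later.

Just replace the default key and the program runs. The results are really impressive.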

 

 

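The recognition rate is very, very good and I am very satisfied. However, Microsoft's free trial lasts only one month, so I will have to get as much out of it as I can this month, haha.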

 

