Real-time speech recognition on the web with ASP.NET Core, the Web Audio API, SignalR, and Microsoft Speech Services

Reprinted article, 2020-03-07 01:58

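For a project, I looked into implementing speech recognition on the web. Speech services on the market are already very mature: iFLYTEK in China and Microsoft abroad both offer services of sufficient quality, so for engineering purposes we only need to pay and call their APIs. The hard part is developing the web application as a whole. At first I built a simple demo in which the web page records the audio and then uploads it to the server for recognition, but that structure is too simplistic: it puts too much load on the browser, responds slowly, and makes for poor interaction. After some research I found that Microsoft's speech service API supports continuous recognition on streamed input, so the focus of development became streaming between the front end and the back end.
Following the blog post Continuous speech to text on the server with Cognitive Services, the new demo achieves the desired effect: clicking "Start recording" on the page starts a recording, you speak freely into the microphone, the page passes the audio data to the back end in real time, the back end recognizes it in real time and returns the transcription, and clicking "Stop recording" stops.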

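1 Overall structure
The overall structure is as follows. On the web side, the HTML5 Web Audio API receives the audio stream from the microphone, processes it lightly, and passes it to the server in real time; the audio stream between the web page and the server is carried over SignalR; the actual speech recognition is done by calling the Microsoft speech service.
The real-time web speech recognition demo provides the following features:

Microphone audio can be captured through the web page
The page displays recognition results in real time, both intermediate and final
Every recording can be saved, and a recording can be very long
Multiple web clients can connect at the same time, and the server manages the connections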
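2 Technology stack
ASP.NET Core development
The server is built with the fairly new ASP.NET Core. Unlike classic ASP.NET, ASP.NET Core is better suited to web applications with separate front and back ends, and we use it to build a REST API that serves the front end. If you do not dwell on the framework's internals, day-to-day development is much the same as earlier .NET application development; the differences are mostly in the underlying structure.
JavaScript or TypeScript development
The front-end logic is written in JavaScript, and the framework does not matter. I used Angular, so strictly speaking the language is TypeScript. For the page itself, familiarity with HTML and CSS is enough.
Using the Web Audio API, plus basic audio processing knowledge
Capturing and uploading microphone audio both happen in the browser. This requires the HTML5 Web APIs to obtain the audio stream and an AudioContext to process it in real time. Processing the stream requires some basic audio knowledge, such as sample rate, channel count, and the WAV file format.
Related material:
HTML5 web page recording and compression
Building an audio spectrum visualization with HTML5 getUserMedia/AudioContext
Capturing Audio & Video in HTML5
Microsoft speech cognitive service
Microsoft's speech recognition is part of Microsoft's cloud services. Compared with iFLYTEK, which is better known in China, its advantages are that it fits the .NET Core stack and is very convenient to develop against, supports continuous recognition, supports custom training, and supports container deployment, which is good news for users with security concerns about uploading data to a cloud service. Pricing depends on your situation, but if you have an Azure account you can enable the speech recognition service on the free tier, which is limited only in usage time; without an account, you can use the trial account Microsoft provides for one month.
Official documentation: Speech Services Documentation
Using SignalR
Streaming between the web page and the server requires a long-lived connection technology such as WebSocket. Microsoft's SignalR is a real-time communication technology built on WebSocket (with fallback transports when WebSocket is unavailable). If the web page needs to keep a long-lived connection to the server or receive messages pushed by the server, SignalR makes this easy. Note that ASP.NET SignalR and ASP.NET Core SignalR are different; in a .NET Core environment you need SignalR Core.
Official documentation: Introduction to ASP.NET Core SignalR
Calling SignalR from Angular: How to Use SignalR with .NET Core and Angular
Byte streams and asynchronous programming
Following the blog post Continuous speech to text on the server with Cognitive Services, the server has to implement a special byte stream as the data source for the speech service, because on a default byte stream the speech service ends automatically as soon as a read returns no data. The implementation uses signaling primitives to control the reads.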
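3 Back-end details
3.1 Getting the Microsoft speech recognition service
If you do not have an Azure account, you can use the trial account Microsoft provides.

If you have an Azure account, enable the speech recognition service in the Azure portal.

When creating it you can choose the F0 pricing tier, which is free.

Once the service is created you get an API key, which is passed in when calling the service. The other parameter is the region, which depends on the region chosen when the service was created: if you chose East Asia, the parameter is "eastasia"; with the trial account, always use "westus".

3.2 Creating and configuring the ASP.NET Core API project
Create a new ASP.NET Core API project.

Add the speech service and SignalR packages through the NuGet package manager. Microsoft.CognitiveServices.Speech is the Microsoft speech service package, and Microsoft.AspNetCore.SignalR.Protocols.MessagePack is used for MessagePack protocol communication in SignalR.

In Startup.cs, register and configure the SignalR service for the ASP.NET Core project: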

public void ConfigureServices(IServiceCollection services)
{
    services.AddCors(options =>
    {
        options.AddPolicy("CorsPolicy",
            builder => builder.WithOrigins("http://localhost:4200")
                .AllowAnyMethod()
                .AllowAnyHeader()
                .AllowCredentials());
    }); // cross-origin request settings
    services.AddSignalR().AddMessagePackProtocol(options =>
    {
        options.FormatterResolvers = new List<MessagePack.IFormatterResolver>()
        {
            MessagePack.Resolvers.StandardResolver.Instance
        };
    }); // allow SignalR to communicate using MessagePack messages
    services.AddMvc().SetCompatibilityVersion(CompatibilityVersion.Version_2_2);
}

// This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseDeveloperExceptionPage();
    }
    else
    {
        // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
        app.UseHsts();
    }

    app.UseHttpsRedirection();
    app.UseCors("CorsPolicy"); // apply the cross-origin policy
    app.UseSignalR(routes =>
    {
        routes.MapHub<ContinuousS2TAPI.S2THub.S2THub>("/s2thub");
    }); // add SignalR and configure routing; requests to '/s2thub' are mapped to S2THub
    app.UseMvc();
}
3.3 SignalR interface
Create an S2THub folder and put the SignalR interface in it. First create a Connection class, which represents one client connection:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using ContinuousS2TAPI.Speech;

namespace ContinuousS2TAPI.S2THub
{
    public class Connection
    {
        public string SessionId;              // session ID
        public SpeechRecognizer SpeechClient; // a speech service object
        public VoiceAudioStream AudioStream;  // the audio stream of this connection
        public List<byte> VoiceData;          // audio data received during this session
    }
}
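Then create the S2THub class, which inherits from Hub; it manages the client connections. Because SignalR creates a new hub instance for each invocation, the shared state is kept in static fields.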

public class S2THub : Hub
{
    private static IConfiguration _config;
    private static IHubContext<S2THub> _hubContext; // context of the S2THub
    private static Dictionary<string, Connection> _connections; // tracks the client connections

    public S2THub(IConfiguration configuration, IHubContext<S2THub> ctx)
    {
        if (_config == null)
            _config = configuration;

        if (_connections == null)
            _connections = new Dictionary<string, Connection>();

        if (_hubContext == null)
            _hubContext = ctx;
    }
    ...
    ...
}
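The S2THub class defines two methods for the client to call, AudioStart and ReceiveAudio. The client first calls AudioStart to tell the server to start a speech recognition session, then calls ReceiveAudio to send each chunk of real-time binary audio data to the server as it arrives. This uses a stream object named VoiceAudioStream, a custom stream whose implementation is given later.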

public async Task AudioStart() // returns Task rather than async void so that SignalR can observe completion and errors
{
    Console.WriteLine($"Connection {Context.ConnectionId} starting audio.");
    var audioStream = new VoiceAudioStream(); // the stream that the speech recognizer reads from
    var audioFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
    var audioConfig = AudioConfig.FromStreamInput(audioStream, audioFormat);
    var speechConfig = SpeechConfig.FromSubscription("your key", "your region"); // use your own key and region
    speechConfig.SpeechRecognitionLanguage = "zh-CN"; // Chinese

    var speechClient = new SpeechRecognizer(speechConfig, audioConfig);

    speechClient.Recognized += _speechClient_Recognized; // continuous recognition raises Recognized and Recognizing events
    speechClient.Recognizing += _speechClient_Recognizing;
    speechClient.Canceled += _speechClient_Canceled;

    string sessionId = speechClient.Properties.GetProperty(PropertyId.Speech_SessionId);

    var conn = new Connection()
    {
        SessionId = sessionId,
        AudioStream = audioStream,
        SpeechClient = speechClient,
        VoiceData = new List<byte>()
    };

    _connections.Add(Context.ConnectionId, conn); // record the new connection

    await speechClient.StartContinuousRecognitionAsync(); // start continuous recognition

    Console.WriteLine("Audio start message.");
}

public void ReceiveAudio(byte[] audioChunk)
{
    //Console.WriteLine("Got chunk: " + audioChunk.Length);
    _connections[Context.ConnectionId].VoiceData.AddRange(audioChunk); // keep a copy of the received audio
    _connections[Context.ConnectionId].AudioStream.Write(audioChunk, 0, audioChunk.Length); // and write it into the stream in real time
}
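To return the recognition results to the client, the server has to push them by calling a method on the client. We therefore define a SendTranscript method, which calls the client-side method named IncomingTranscript to push the message: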

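public async Task SendTranscript(string text, string sessionId)
{
    var connection = _connections.Where(c => c.Value.SessionId == sessionId).FirstOrDefault();
    if (connection.Key == null)
        return; // no connection is registered for this session, for example because it already disconnected
    await _hubContext.Clients.Client(connection.Key).SendAsync("IncomingTranscript", text); // call IncomingTranscript on that client
}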
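The bookkeeping behind AudioStart, ReceiveAudio, and the session lookup in SendTranscript can be modelled without SignalR or the speech SDK. The TypeScript sketch below is only an illustration under that assumption: SessionRegistry, its method names, and the generated session IDs are hypothetical and not part of the demo.

```typescript
// Hypothetical in-memory model of the hub's per-connection bookkeeping.
// It talks to no real service; session IDs are generated locally.
interface Session {
  sessionId: string;
  chunks: Uint8Array[]; // audio received so far, in arrival order
}

class SessionRegistry {
  private sessions: { [connectionId: string]: Session } = {};
  private nextId = 1;

  // Counterpart of AudioStart: register a new session for a connection.
  audioStart(connectionId: string): string {
    const sessionId = "session-" + this.nextId++;
    this.sessions[connectionId] = { sessionId: sessionId, chunks: [] };
    return sessionId;
  }

  // Counterpart of ReceiveAudio: append a chunk and return the running total in bytes.
  receiveAudio(connectionId: string, chunk: Uint8Array): number {
    const session = this.sessions[connectionId];
    if (!session) {
      throw new Error("AudioStart was not called for " + connectionId);
    }
    session.chunks.push(chunk);
    return session.chunks.reduce((sum, c) => sum + c.length, 0);
  }

  // Counterpart of the lookup in SendTranscript: results are tagged with a
  // session ID, so the owning connection has to be found by searching.
  findConnection(sessionId: string): string | undefined {
    const ids = Object.keys(this.sessions);
    for (let i = 0; i < ids.length; i++) {
      if (this.sessions[ids[i]].sessionId === sessionId) {
        return ids[i];
      }
    }
    return undefined;
  }

  // Forget a connection; returns false if it was never registered.
  remove(connectionId: string): boolean {
    if (!this.sessions[connectionId]) {
      return false;
    }
    delete this.sessions[connectionId];
    return true;
  }
}
```

The reverse lookup is linear in the number of connections, which is fine for a demo; a second map keyed by session ID would avoid the scan if many clients were connected.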
In the speech service's recognition event handlers we call SendTranscript to return the results:

private async void _speechClient_Canceled(object sender, SpeechRecognitionCanceledEventArgs e)
{
    Console.WriteLine("Recognition was cancelled.");
    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you update the subscription info?");
        await SendTranscript("Recognition failed!", e.SessionId);
    }
}

private async void _speechClient_Recognizing(object sender, SpeechRecognitionEventArgs e)
{
    Console.WriteLine($"{e.SessionId} > Intermediate result: {e.Result.Text}");
    await SendTranscript(e.Result.Text, e.SessionId);
}

private async void _speechClient_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    Console.WriteLine($"{e.SessionId} > Final result: {e.Result.Text}");
    await SendTranscript(e.Result.Text, e.SessionId);
}
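Both handlers push plain text through the same IncomingTranscript call, so the client cannot tell an intermediate result from a final one. If the server also sent a flag, the page could overwrite the current line while results are intermediate and commit it once the final result arrives. The TypeScript sketch below assumes such a flag; isFinal and the TranscriptView class are hypothetical and do not exist in the demo.

```typescript
// Hypothetical client-side view model. Assumes each message carries an
// isFinal flag, which the demo's IncomingTranscript call does not send.
class TranscriptView {
  private finalLines: string[] = [];
  private interim = "";

  update(text: string, isFinal: boolean): void {
    if (isFinal) {
      if (text.length > 0) {
        this.finalLines.push(text); // commit the finished utterance
      }
      this.interim = "";
    } else {
      this.interim = text; // each intermediate result replaces the previous one
    }
  }

  // Lines to display: committed utterances followed by the current guess, if any.
  render(): string[] {
    return this.interim.length > 0
      ? this.finalLines.concat([this.interim])
      : this.finalLines.slice();
  }
}
```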
Finally, we override the Hub's OnDisconnectedAsync method, which is called when the hub connection is closed; recognition can be stopped there:

public async override Task OnDisconnectedAsync(Exception exception)
{
    // A client may disconnect without ever having called AudioStart
    if (!_connections.TryGetValue(Context.ConnectionId, out var connection))
    {
        await base.OnDisconnectedAsync(exception);
        return;
    }

    Console.WriteLine($"Voice list length : {connection.VoiceData.Count}");
    string rootDir = AppContext.BaseDirectory;
    System.IO.DirectoryInfo directoryInfo = System.IO.Directory.GetParent(rootDir);
    string root = directoryInfo.Parent.Parent.FullName;
    var savePath = System.IO.Path.Combine(root, $"voice{connection.SessionId}.wav");
    byte[] bytes = connection.VoiceData.ToArray();
    if (bytes.Length >= 44) // the first 44 bytes are the WAV header sent by the client
    {
        // Fill in the lengths: RIFF chunk size = file length - 8, data chunk size = file length - 44
        byte[] riffSize = System.BitConverter.GetBytes(bytes.Length - 8);
        byte[] dataSize = System.BitConverter.GetBytes(bytes.Length - 44);
        Array.Copy(riffSize, 0, bytes, 4, 4);
        Array.Copy(dataSize, 0, bytes, 40, 4);
        System.IO.File.WriteAllBytes(savePath, bytes); // save the audio file
    }

    await connection.SpeechClient.StopContinuousRecognitionAsync(); // stop recognition
    connection.SpeechClient.Dispose();
    connection.AudioStream.Dispose();
    _connections.Remove(Context.ConnectionId); // remove the connection record
    Console.WriteLine($"connection : {Context.ConnectionId} closed");
    await base.OnDisconnectedAsync(exception);
}
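In a canonical 44-byte WAV header, the size field at offset 4 holds the file length minus 8, and the field at offset 40 holds the length of the audio data alone, both as little-endian 32-bit integers. The TypeScript sketch below shows that patching step on a byte array, assuming the header is exactly 44 bytes as in this demo; patchWavSizes is a hypothetical helper name.

```typescript
// Patch the two size fields of a recording that starts with a 44-byte WAV header.
// Assumes the canonical layout: "RIFF" size "WAVE" "fmt " ... "data" size.
function patchWavSizes(bytes: Uint8Array): void {
  if (bytes.length < 44) {
    throw new Error("buffer is shorter than a 44-byte WAV header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  view.setUint32(4, bytes.length - 8, true);   // RIFF chunk size: everything after the first 8 bytes
  view.setUint32(40, bytes.length - 44, true); // data chunk size: everything after the header
}
```

For a recording of 144 bytes in total (44 header bytes plus 100 bytes of samples), the two fields become 136 and 100.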
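3.4 Custom audio stream
SpeechRecognizer can take a special stream object, PullAudioInputStreamCallback, as its data source; when one is passed in, SpeechRecognizer actively pulls data from it. This is an abstract class. If you only need to recognize an audio file, a simple combination of MemoryStream and BinaryStreamReader is enough (see Microsoft's official demo). However, SpeechRecognizer stops recognizing once it reads 0 bytes from the stream, and the default stream types cannot meet our needs because they do not block when no data is available. What PullAudioInputStreamCallback expects is that Read() returns 0 only when the stream has definitely ended. We therefore need to define our own audio stream.
First define EchoStream, which inherits from MemoryStream; when no data has arrived, it waits instead of returning 0 immediately.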

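using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public class EchoStream : MemoryStream
{
    private readonly ManualResetEvent _DataReady = new ManualResetEvent(false);
    private readonly ConcurrentQueue<byte[]> _Buffers = new ConcurrentQueue<byte[]>();
    private byte[] _current;  // chunk that has been dequeued but not fully read yet
    private int _currentPos;  // read position inside _current

    public bool DataAvailable { get { return _current != null || !_Buffers.IsEmpty; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0)
            return; // an empty chunk would look like end of stream to the reader

        // Copy exactly the requested range (honoring offset)
        var chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        _Buffers.Enqueue(chunk);
        _DataReady.Set();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        while (_current == null)
        {
            _DataReady.WaitOne(); // block until a writer signals that data has arrived

            if (_Buffers.TryDequeue(out _current))
                _currentPos = 0;

            if (_Buffers.IsEmpty)
            {
                _DataReady.Reset();
                if (!_Buffers.IsEmpty)
                    _DataReady.Set(); // a writer enqueued data while the event was being reset
            }
        }

        // Hand out at most count bytes and keep the rest for the next call
        int n = Math.Min(count, _current.Length - _currentPos);
        Buffer.BlockCopy(_current, _currentPos, buffer, offset, n);
        _currentPos += n;
        if (_currentPos == _current.Length)
            _current = null;
        return n;
    }
}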
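Then define the PullAudioInputStreamCallback subclass that serves as the input source for the speech service. The server writes the byte[] data uploaded by the client into the stream through Write(), and the speech service calls Read() to read it. As the code shows, a ManualResetEvent ensures that Read() returns 0 only after Close() has been called.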

using System.Threading;
using Microsoft.CognitiveServices.Speech.Audio;

public class VoiceAudioStream : PullAudioInputStreamCallback
{
    private readonly EchoStream _dataStream = new EchoStream();
    private ManualResetEvent _waitForEmptyDataStream = null;

    // The speech service pulls data through this method; running out of data does not by itself end the stream
    public override int Read(byte[] dataBuffer, uint size)
    {
        if (_waitForEmptyDataStream != null && !_dataStream.DataAvailable)
        {
            _waitForEmptyDataStream.Set();
            return 0; // closed and drained: signal end of stream
        }

        return _dataStream.Read(dataBuffer, 0, (int)size);
    }

    // The client's audio is written into the stream through this method
    public void Write(byte[] buffer, int offset, int count)
    {
        _dataStream.Write(buffer, offset, count);
    }

    public override void Close()
    {
        if (_dataStream.DataAvailable)
        {
            // The ManualResetEvent forces the user of the stream to call Close to end it;
            // wait here until the reader has drained the remaining data
            _waitForEmptyDataStream = new ManualResetEvent(false);
            _waitForEmptyDataStream.WaitOne();
        }

        _waitForEmptyDataStream?.Close(); // still null when the stream was already empty
        _dataStream.Dispose();
        base.Close();
    }
}
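The contract that matters here is that a read with no data available must wait, and only a closed and drained stream may yield 0 bytes. JavaScript cannot block a thread, so the TypeScript sketch below expresses the same contract with a parked callback instead of a ManualResetEvent. It is an illustration of the semantics, not a port of the C# classes; PendingReadQueue and its members are hypothetical names, and unlike the C# version its close() also releases a reader that is already waiting.

```typescript
type ReadCallback = (chunk: Uint8Array) => void;

// Hypothetical single-reader queue: read() never reports 0 bytes unless the
// queue has been closed and every queued chunk has been consumed.
class PendingReadQueue {
  private chunks: Uint8Array[] = [];
  private waiting: ReadCallback | null = null;
  private closed = false;

  write(chunk: Uint8Array): void {
    if (this.closed) {
      throw new Error("stream is closed");
    }
    if (this.waiting) {
      const cb = this.waiting;
      this.waiting = null;
      cb(chunk); // a reader was parked: hand the data over directly
    } else {
      this.chunks.push(chunk);
    }
  }

  read(cb: ReadCallback): void {
    if (this.chunks.length > 0) {
      cb(this.chunks.shift() as Uint8Array);
    } else if (this.closed) {
      cb(new Uint8Array(0)); // closed and drained: the only case that yields 0 bytes
    } else if (this.waiting) {
      throw new Error("only one pending read is supported");
    } else {
      this.waiting = cb; // no data yet: park the reader instead of returning 0
    }
  }

  close(): void {
    this.closed = true;
    if (this.waiting) {
      const cb = this.waiting;
      this.waiting = null;
      cb(new Uint8Array(0)); // release a parked reader with end of stream
    }
  }
}
```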
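4 Front-end details
The front end is written in Angular, and this demo needs the SignalR libraries. Install the two packages with npm install @aspnet/signalr and npm install @aspnet/signalr-protocol-msgpack, then import them into the component (the code below refers to them as signalR and signalRmsgpack).

Some additional code also has to be added to polyfills.ts, otherwise there may be browser compatibility problems.

In the component's initialization code, create the Hub connection object and check whether the current browser supports real-time audio capture; at the time of writing it is mainly Firefox and Chrome that support getting media streams in the browser. The line this.s2tHub.on('IncomingTranscript', (message) => { this.addMessage(message); }); registers an IncomingTranscript method on the client for the server to call.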

constructor(private http: HttpClient) {
  this.s2tHub = new signalR.HubConnectionBuilder()
    .withUrl(this.apiurl) // apiurl is the address of the SignalR Hub; here it is 'https://localhost:5001/s2thub'
    .withHubProtocol(new signalRmsgpack.MessagePackHubProtocol())
    .configureLogging(signalR.LogLevel.Information)
    .build(); // create the Hub connection object
  this.s2tHub.on('IncomingTranscript', (message) => {
    this.addMessage(message);
  }); // register the IncomingTranscript method on the client
}

ngOnInit() {
  // Check whether the browser supports media capture (mediaDevices itself may be undefined)
  if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
    this.addMessage('This browser supports getUserMedia');
    this.supportS2T = true;
  } else {
    this.addMessage('This browser doesn\'t support getUserMedia');
    this.supportS2T = false;
  }
}
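After clicking "Start", the Hub connection is started and data begins to flow to the server. First the server's AudioStart method is called to tell it to start recognition. Then ReceiveAudio is called to send the 44-byte WAV header; the audio the speech recognition service currently supports is PCM/WAV at a 16000 Hz sample rate, mono, 16 bits per sample. Finally, startStreaming begins processing the real-time audio stream.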

async startRecord() {
  if (!this.supportS2T) {
    return;
  }
  let connectSuccess = true;
  if (this.s2tHub.state !== signalR.HubConnectionState.Connected) {
    await this.s2tHub.start().catch(err => { this.addMessage(err); connectSuccess = false; });
  }
  if (!connectSuccess) {
    return;
  }

  this.s2tHub.send('AudioStart');
  this.s2tHub.send('ReceiveAudio', new Uint8Array(this.createStreamRiffHeader()));
  this.startStreaming();
}

private createStreamRiffHeader() {
  // create data buffer
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);

  /* RIFF identifier */
  view.setUint8(0, 'R'.charCodeAt(0));
  view.setUint8(1, 'I'.charCodeAt(0));
  view.setUint8(2, 'F'.charCodeAt(0));
  view.setUint8(3, 'F'.charCodeAt(0));

  /* file length */
  view.setUint32(4, 0, true); // the final length is not known yet, so set it to 0 for now
  /* RIFF type & Format */
  view.setUint8(8, 'W'.charCodeAt(0));
  view.setUint8(9, 'A'.charCodeAt(0));
  view.setUint8(10, 'V'.charCodeAt(0));
  view.setUint8(11, 'E'.charCodeAt(0));
  view.setUint8(12, 'f'.charCodeAt(0));
  view.setUint8(13, 'm'.charCodeAt(0));
  view.setUint8(14, 't'.charCodeAt(0));
  view.setUint8(15, ' '.charCodeAt(0));

  /* format chunk length */
  view.setUint32(16, 16, true); // the fmt chunk body is 16 bytes long for PCM
  /* sample format (raw) */
  view.setUint16(20, 1, true);
  /* channel count */
  view.setUint16(22, 1, true); // mono
  /* sample rate */
  view.setUint32(24, 16000, true); // 16000 Hz sample rate
  /* byte rate (sample rate * block align) */
  view.setUint32(28, 32000, true);
  /* block align (channel count * bytes per sample) */
  view.setUint16(32, 2, true);
  /* bits per sample */
  view.setUint16(34, 16, true);
  /* data chunk identifier */
  view.setUint8(36, 'd'.charCodeAt(0));
  view.setUint8(37, 'a'.charCodeAt(0));
  view.setUint8(38, 't'.charCodeAt(0));
  view.setUint8(39, 'a'.charCodeAt(0));

  /* data chunk length */
  view.setUint32(40, 0, true); // the final length is not known yet, so set it to 0 for now

  return buffer;
}
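A header like this is easy to get subtly wrong, so it can help to read the fields back and check that they agree with each other (block align = channels × bytes per sample, byte rate = sample rate × block align). The TypeScript sketch below does that for the canonical 44-byte layout used here; parseWavHeader and WavHeaderInfo are hypothetical names introduced for illustration.

```typescript
interface WavHeaderInfo {
  channels: number;
  sampleRate: number;
  byteRate: number;
  blockAlign: number;
  bitsPerSample: number;
}

// Read a 4-character ASCII tag such as "RIFF" at the given offset.
function readTag(view: DataView, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i));
  }
  return tag;
}

// Parse and sanity-check a canonical 44-byte PCM WAV header.
function parseWavHeader(buffer: ArrayBuffer): WavHeaderInfo {
  if (buffer.byteLength < 44) {
    throw new Error("header must be at least 44 bytes");
  }
  const view = new DataView(buffer);
  if (readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE" ||
      readTag(view, 12) !== "fmt " || readTag(view, 36) !== "data") {
    throw new Error("not a canonical 44-byte PCM WAV header");
  }
  const info: WavHeaderInfo = {
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    byteRate: view.getUint32(28, true),
    blockAlign: view.getUint16(32, true),
    bitsPerSample: view.getUint16(34, true)
  };
  const expectedAlign = info.channels * info.bitsPerSample / 8;
  if (info.blockAlign !== expectedAlign) {
    throw new Error("block align does not match channels * bytes per sample");
  }
  if (info.byteRate !== info.sampleRate * expectedAlign) {
    throw new Error("byte rate does not match sample rate * block align");
  }
  return info;
}
```

For the header built above, the parsed values would be 1 channel, 16000 Hz, 16 bits per sample, a block align of 2, and a byte rate of 32000, which means one minute of audio is about 1.9 MB.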
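In startStreaming, window.navigator.mediaDevices.getUserMedia is called first to get the microphone input stream; then an AudioContext is created and used to create several audio processing nodes, which are connected in sequence:

audioInput is the audio source node, which takes the media stream as input; lowpassFilter is a filter node that applies a low-pass filter to the input as simple noise reduction; jsScriptNode is the main processing node, to which an event handler can be attached so that data is processed and uploaded whenever it enters the node; last is the destination node, the end point of the Web Audio context, which by default is connected to the local speakers.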

private startStreaming() {
  window.navigator.mediaDevices.getUserMedia({
    audio: true
  }).then(mediaStream => {
    this.addMessage('get media stream successfully');
    this.audioStream = mediaStream;
    this.context = new AudioContext();

    this.audioInput = this.context.createMediaStreamSource(this.audioStream); // source node

    this.lowpassFilter = this.context.createBiquadFilter();
    this.lowpassFilter.type = 'lowpass';
    this.lowpassFilter.frequency.setValueAtTime(8000, this.context.currentTime); // filter node

    this.jsScriptNode = this.context.createScriptProcessor(4096, 1, 1);
    this.jsScriptNode.addEventListener('audioprocess', event => {
      this.processAudio(event);
    }); // processing event

    this.audioInput.connect(this.lowpassFilter);
    this.lowpassFilter.connect(this.jsScriptNode);
    this.jsScriptNode.connect(this.context.destination);

  }).catch(err => {
    this.addMessage('get media stream failed');
  });
}
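createScriptProcessor(4096, 1, 1) delivers audio in blocks of 4096 frames, so how often audioprocess fires, and how large each upload is once the block has been converted to 16000 Hz 16-bit mono, depends on the sample rate of the AudioContext. The arithmetic can be sketched in TypeScript as follows; chunkStats is a hypothetical helper, and the rounding assumes the output length is the input length divided by the rate ratio, rounded to the nearest sample.

```typescript
interface ChunkStats {
  durationMs: number;    // how much audio one block covers
  outputSamples: number; // samples left after resampling to the target rate
  outputBytes: number;   // upload size at 16 bits (2 bytes) per mono sample
}

function chunkStats(bufferSize: number, inputRate: number, outputRate: number): ChunkStats {
  const durationMs = (bufferSize / inputRate) * 1000;
  const outputSamples = Math.round(bufferSize / (inputRate / outputRate));
  return {
    durationMs: durationMs,
    outputSamples: outputSamples,
    outputBytes: outputSamples * 2
  };
}
```

With a 48000 Hz context, chunkStats(4096, 48000, 16000) gives blocks of about 85.3 ms that shrink to 1365 samples, or 2730 bytes per upload; at 44100 Hz the same block covers about 92.9 ms and yields 1486 samples, or 2972 bytes.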
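In the audioprocess event of the jsScriptNode node, we downsample and take a single channel to obtain audio data in the right format, then call the server's ReceiveAudio method to upload the chunk: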

private processAudio(audioProcessingEvent: any) {
  const inputBuffer = audioProcessingEvent.inputBuffer;
  // The output buffer contains the samples that will be modified and played
  const outputBuffer = audioProcessingEvent.outputBuffer;
  const isampleRate = inputBuffer.sampleRate;
  const osampleRate = 16000;
  const inputData = inputBuffer.getChannelData(0); // take a single channel
  const outputData = outputBuffer.getChannelData(0);
  const output = this.downsampleArray(isampleRate, osampleRate, inputData);

  // Pass the input through to the output unchanged
  for (let i = 0; i < outputBuffer.length; i++) {
    outputData[i] = inputData[i];
  }

  this.s2tHub.send('ReceiveAudio', new Uint8Array(output.buffer)).catch(err => { this.addMessage(err); });
}

private downsampleArray(irate: number, orate: number, input: Float32Array): Int16Array { // downsampling
  const ratio = irate / orate;
  const olength = Math.round(input.length / ratio);
  const output = new Int16Array(olength);

  let iidx = 0;

  for (let oidx = 0; oidx < output.length; oidx++) {
    const nextiidx = Math.round((oidx + 1) * ratio);

    // average the input samples that fall into this output sample
    let sum = 0;
    let cnt = 0;

    for (; iidx < nextiidx && iidx < input.length; iidx++) {
      sum += input[iidx];
      cnt++;
    }

    // saturate output between -1 and 1 (0 if no input sample fell into this slot)
    const newfval = cnt > 0 ? Math.max(-1, Math.min(sum / cnt, 1)) : 0;

    // multiply negative values by 2^15 and positive by 2^15 -1 (range of short)
    const newsval = newfval < 0 ? newfval * 0x8000 : newfval * 0x7FFF;

    output[oidx] = Math.round(newsval);
  }

  return output;
}
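Wrapping the Int16Array's buffer in a Uint8Array exposes the samples in the platform's native byte order. WAV PCM data is little-endian, which matches the platforms browsers commonly run on, so the upload works as written. Where the byte order should not depend on the platform, the samples can be serialized explicitly with a DataView. The TypeScript sketch below shows one way to do that; pcm16ToLittleEndianBytes is a hypothetical helper, not part of the demo.

```typescript
// Serialize 16-bit PCM samples as little-endian bytes regardless of platform byte order.
function pcm16ToLittleEndianBytes(samples: Int16Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true); // true selects little-endian
  }
  return bytes;
}
```

For example, the sample value -2 (0xFFFE) is written as the byte pair 0xFE, 0xFF.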
Finally, define the stopRecord method, which disconnects the AudioContext nodes and the Hub connection:

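async stopRecord() {
  this.jsScriptNode.disconnect(this.context.destination);
  this.lowpassFilter.disconnect(this.jsScriptNode);
  this.audioInput.disconnect(this.lowpassFilter);
  this.audioStream.getTracks().forEach(track => track.stop()); // release the microphone
  await this.context.close(); // free the audio context
  await this.s2tHub.stop();
}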
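5 Extensions
Custom speech models
You can upload training data to train a custom model; the resulting custom speech service can fit more specific business scenarios. Custom models, however, require the paid standard tier.
Container deployment
Microsoft now also supports deploying the service in containers. Running the service locally can speed up data transfer and reduce concerns about the security of private information.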

————————————————
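Copyright notice: This is an original article by CSDN blogger 「皇家園林巡游者」, licensed under CC 4.0 BY-SA. When reprinting, please include the link to the original and this notice.
Original link: https://blog.csdn.net/wangzhenyang2/article/details/96020000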
