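Building a speech recognition service for WeChat Mini Programs, step by step (open HTTPS API; supports mp3 recordings from the new API and silk recordings from the old one)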

Reposted from the original article. 2017-11-09 15:26. Tags: natural language understanding / WeChat Mini Program / wx.getRecorderManager / speech recognition / silk to wav

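An API service that converts silk v3 recordings (or mp3 recordings from the new recording API) and passes them to OLAMI speech recognition and semantic processing, implemented on an Ubuntu 16.04 server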

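Important notes first

Note 1:
I post all updates to my personal blog first. Copies elsewhere were reposted by me or by others and may not be kept in sync. The original article is at: http://www.happycxz.com/m/?p=32

Note 2:
The API described in this article currently supports two WeChat Mini Program recording formats: silk v3 and mp3.
Caveat: recordings made in the WeChat Mini Program developer tool are webm/base64, regardless of whether the new or the old recording API is used. The file extension is silk (or mp3), but the content is not real silk v3 (or mp3); open the file and you will see that it starts with data:audio/webm;base64,
To make debugging easier I added support for this format in September 2017. However, in November 2017 I found that these webm/base64 files were no longer handled.
The API server code had not changed. As far as I could trace it, files recorded with the developer tool before October 2017 still convert, while files recorded after October do not. Only the people at Tencent could say why.
The upshot: the two endpoints described in this article can only be tested with recordings from a real device. Recordings made on a computer no longer work for speech recognition.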

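Note 3:
To use this API you must first apply for an appKey and appSecret at cn.olami.ai and then tell me your appKey so that I can add it to the allow list. Both steps are required. Links to an example that calls this API, and to the article sharing its source code, are at the end of this article.

Example caller: the "遙知之" smart assistant. Scan the code to try it:
[Image: Mini Program QR code]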

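Note 4:
You are welcome to repost this article. The only requirement is to keep the following:
Original article: http://www.happycxz.com/m/?p=32
All source code for this article on Gitee: https://gitee.com/happycxz/silk2asr
All source code for this article on GitHub: https://github.com/happycxz/silk2asr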

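Why build this?

I recently released "遙知之", a small assistant for looking up everyday information. Unfortunately it only accepted typed input, which made for a poor experience: WeChat Mini Program recordings are in silk format, and none of the mainstream speech recognition APIs accept it.

Searching online turned up only one open-source server implementation of silk-to-wav conversion, written in PHP. I am not familiar with PHP and could not tell whether it was still maintained, so I decided to write a Tomcat + Java version myself as a learning exercise.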

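How?

Preparing the environment

First you need a server that supports HTTPS. Mine is a free, lowest-spec Alibaba Cloud instance from a flash sale, with Ubuntu 16.04 LTS preinstalled. I set up HTTPS on it myself, serving the API through nginx + Let's Encrypt + Tomcat. I will not go into the details here; look into it yourself if you are interested.

You also need a silk decoder. In early 2015 an expert started a forum thread on exactly this topic: "silk v3 編碼的音頻怎么轉換成 wav 或 mp3 之類的？" (How do you convert silk v3 encoded audio to wav, mp3, and the like?)

He kept working on it and published the open-source silk_v3_decoder project; see kn007's silk_v3_decoder.

The project is hosted on GitHub, so install git on the server. That should need no further explanation.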

Steps to build the service

Downloading silk-v3-decoder

Pick a directory on the server and clone kn007's project into it.

root@alijod:/home/jod/wechat_app# mkdir download
root@alijod:/home/jod/wechat_app# cd download/
root@alijod:/home/jod/wechat_app/download# git clone https://github.com/kn007/silk-v3-decoder.git
Cloning into 'silk-v3-decoder'...
remote: Counting objects: 634, done.
remote: Total 634 (delta 0), reused 0 (delta 0), pack-reused 634
Receiving objects: 100% (634/634), 72.79 MiB | 9.50 MiB/s, done.
Resolving deltas: 100% (352/352), done.
Checking connectivity... done.
root@alijod:/home/jod/wechat_app/download# ll
total 12
drwxr-xr-x 3 root root 4096 Sep 18 10:11 ./
drwxr-xr-x 7 root root 4096 Sep 18 10:11 ../
drwxr-xr-x 5 root root 4096 Sep 18 10:11 silk-v3-decoder/
root@alijod:/home/jod/wechat_app/download# ls silk-v3-decoder/
converter_beta.sh  converter.sh  LICENSE  README.md  silk  windows

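Of everything in that directory, only the silk directory and the converter.sh script are used. The C code under silk has to be compiled with gcc, and converter.sh needs a small change; both are covered below.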

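Compiling silk_v3_decoder

According to the README at https://github.com/kn007/silk-v3-decoder, the tool needs gcc and ffmpeg. gcc is used when running make to build silk (a note for beginners), while ffmpeg is used by the script and has nothing to do with compilation. Strictly speaking, ffmpeg is not essential to the service as a whole, as a later section explains; I was simply too lazy to pursue that route any further for now.

If gcc is not installed, search for how to install it; I will not cover that here. On to the build: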

root@alijod:/home/jod/wechat_app/download# cd silk-v3-decoder/silk/
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder/silk# ll
total 32
drwxr-xr-x 5 root root  4096 Sep 18 10:11 ./
drwxr-xr-x 5 root root  4096 Sep 18 10:11 ../
drwxr-xr-x 2 root root  4096 Sep 18 10:11 interface/
-rw-r--r-- 1 root root  3278 Sep 18 10:11 Makefile
drwxr-xr-x 2 root root 12288 Sep 18 10:11 src/
drwxr-xr-x 2 root root  4096 Sep 18 10:11 test/
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder/silk# make
...
... (a long stretch of build log omitted here)
...
a - src/SKP_Silk_scale_vector.o
gcc -c -Wall -enable-threads -O3   -Iinterface -Isrc -Itest  -o test/Decoder.o test/Decoder.c
test/Decoder.c: In function ‘main’:
test/Decoder.c:187:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
         fread(header_buf, sizeof(char), 1, bitInFile);
         ^
g++  -L./ test/Decoder.o -lSKP_SILK_SDK -o decoder
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder/silk# ls
decoder  interface  libSKP_SILK_SDK.a  Makefile  src  test
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder/silk#

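As shown above, the build ends with one warning, which is harmless. Run ls: the first entry, "decoder", is the binary we need, and its presence means the build succeeded.

Testing silk_v3_decoder

Next, check that the freshly built binary actually works.
The following is taken from the README at https://github.com/kn007/silk-v3-decoder: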

sh converter.sh silk_v3_file/input_folder output_format/output_folder flag(format)

For example, to convert a single file:

sh converter.sh 33921FF3774A773BB193B6FD4AD7C33E.slk mp3

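Note: 33921FF3774A773BB193B6FD4AD7C33E.slk is the file to convert, and mp3 is the output format.

Just follow that example. In this usage the script takes two arguments: the relative or absolute path of the source file, and the target format.
In other words, the command above transcodes 33921FF3774A773BB193B6FD4AD7C33E.slk to 33921FF3774A773BB193B6FD4AD7C33E.mp3. (The example uses the .slk extension. If you named your Mini Program recording .silk, do not worry: file extensions carry no real meaning on Linux. Beginners can look this up.)

No silk source file at hand? I put a silk_v3 recording and the mp3 converted from it on my server for download (right-click and save; the mp3 can be played online, while the silk file cannot, and clicking it directly returns "403"):
Original WeChat Mini Program recording: sample.silk
File transcoded by converter.sh: sample.mp3

Here is my conversion session: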

root@alijod:/home/jod/wechat_app/download/silk-v3-decoder# ll
total 48
drwxr-xr-x 5 root root 4096 Sep 18 10:43 ./
drwxr-xr-x 3 root root 4096 Sep 18 10:11 ../
-rw-r--r-- 1 root root 4131 Sep 18 10:11 converter_beta.sh
-rw-r--r-- 1 root root 3639 Sep 18 10:11 converter.sh
drwxr-xr-x 8 root root 4096 Sep 18 10:11 .git/
-rw-r--r-- 1 root root 1076 Sep 18 10:11 LICENSE
-rw-r--r-- 1 root root 3582 Sep 18 10:11 README.md
-rw-r----- 1 root root 6188 Sep 18 10:43 sample.silk
drwxr-xr-x 5 root root 4096 Sep 18 10:26 silk/
drwxr-xr-x 3 root root 4096 Sep 18 10:11 windows/
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder# 
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder# 
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder# sh converter.sh sample.silk mp3
-e [OK] Convert sample.silk To sample.mp3 Finish.
root@alijod:/home/jod/wechat_app/download/silk-v3-decoder# ll
total 68
drwxr-xr-x 5 root root  4096 Sep 18 10:43 ./
drwxr-xr-x 3 root root  4096 Sep 18 10:11 ../
-rw-r--r-- 1 root root  4131 Sep 18 10:11 converter_beta.sh
-rw-r--r-- 1 root root  3639 Sep 18 10:11 converter.sh
drwxr-xr-x 8 root root  4096 Sep 18 10:11 .git/
-rw-r--r-- 1 root root  1076 Sep 18 10:11 LICENSE
-rw-r--r-- 1 root root  3582 Sep 18 10:11 README.md
-rw-r--r-- 1 root root 17709 Sep 18 10:43 sample.mp3
-rw-r----- 1 root root  6188 Sep 18 10:43 sample.silk
drwxr-xr-x 5 root root  4096 Sep 18 10:26 silk/
drwxr-xr-x 3 root root  4096 Sep 18 10:11 windows/

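About the converter.sh script

Open converter.sh in vim and turn on line numbers (type ":set nu" in vim and press Enter; I do worry about beginners). For simple use, only the final section shown below matters. If you want to dig deeper, it is best to understand the whole script.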

 82 $cur_dir/silk/decoder "$1" "$1.pcm" > /dev/null 2>&1
 83 if [ ! -f "$1.pcm" ]; then
 84         ffmpeg -y -i "$1" "${1%.*}.$2" > /dev/null 2>&1 &
 85         ffmpeg_pid=$!
 86         while kill -0 "$ffmpeg_pid"; do sleep 1; done > /dev/null 2>&1
 87         [ -f "${1%.*}.$2" ]&&echo -e "${GREEN}[OK]${RESET} Convert $1 to ${1%.*}.$2 success, ${YELLOW}but not a silk v3 encoded file.${RESET}"&&exit
 88         echo -e "${YELLOW}[Warning]${RESET} Convert $1 false, maybe not a silk v3 encoded file."&&exit
 89 fi
 90 ##ffmpeg -y -f s16le -ar 24000 -ac 1 -i "$1.pcm" "${1%.*}.$2" > /dev/null 2>&1
 91 ffmpeg -y -f s16le -ar 12000 -ac 2 -i "$1.pcm" -f wav -ar 16000 -ac 1 "${1%.*}.$2" > /dev/null 2>&1
 92 ffmpeg_pid=$!
 93 while kill -0 "$ffmpeg_pid"; do sleep 1; done > /dev/null 2>&1
 94 rm "$1.pcm"
 95 [ ! -f "${1%.*}.$2" ]&&echo -e "${YELLOW}[Warning]${RESET} Convert $1 false, maybe ffmpeg no format handler for $2."&&exit
 96 echo -e "${GREEN}[OK]${RESET} Convert $1 To ${1%.*}.$2 Finish."
 97 exit

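The two key lines are line 82 and line 90. Line 82 calls the decoder built above to decode the silk_v3 file, and line 90 converts the raw PCM data it produces into the target format. In the listing above, the original line 90 is commented out and my replacement sits on line 91.

A few detours I took around these two lines: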

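Detour 1: speex compression

My original goal for this silk speech recognition service was to give "遙知之" voice input. The OLAMI API that "遙知之" uses also offers speech recognition. I studied their Java SDK and online documentation; according to the documentation (OLAMI Documentation Center -> Speech Recognition API -> "Supported audio formats"), it accepts wav and also supports speex compression.

wav files take a lot of space (essentially uncompressed raw PCM samples plus a file header), as the figure below shows (real speex compression may do somewhat better):
[Figure: file size comparison of pcm, silk, and speex formats]

If the data were speex-compressed, only line 82 of the script would be needed: the audio could be uploaded to the OLAMI speech recognition server with little traffic, without relying on ffmpeg for transcoding. That is why I said earlier that ffmpeg is not strictly required for this service.

Uploading without speex compression would be much less efficient, so I spent quite some time on converting PCM data to speex. I tried calling the C implementation of the speex encoder from Java through JNI, got nowhere, and dropped that approach. I then looked at jspeex (a Java speex codec). Because an easier option turned up, I stopped before finishing the jspeex route, although it should work.

I mentioned this in a QQ group (group number: 656580961), and the helpful group owner "黃眉毛" told me that the OLAMI Java SDK already compresses wav or pcm with speex by default before sending. So I only need to feed wav or pcm to the OLAMI Java SDK to get a "low-traffic" upload to the OLAMI speech recognition server. That is the easy option I ended up using.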

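Detour 2: sample rate mismatch

When a silk v3 file recorded in the Mini Program was converted to wav by kn007's converter.sh and sent to the OLAMI speech recognition API, recognition quality was very poor, even though the wav file sounded normal when played back.

Then I remembered that the script converts PCM to wav at 24 kHz, so the resulting wav is 24 kHz, whereas OLAMI speech recognition expects 16 kHz (iFLYTEK also supports 8 kHz). The mismatch was the likely cause of the poor recognition. A web search showed that someone had hit the same problem; see the "誤打誤撞" (stumbled upon it) paragraph of the article "從微信中提取語音文件，並轉換成文字的全自動化解決方案". His accidental fix probably works because Mini Program recordings are 12 kHz two-channel audio, and the extra ffmpeg parameters convert that 12 kHz two-channel stream into a 16 kHz wav.

So ffmpeg is needed after all, to convert the sample rate; speex compression does not take care of resampling.
This is important enough to say three times: starting from the original script, change line 90 to: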

ffmpeg -y -f s16le -ar 12000 -ac 2 -i "$1.pcm" -f wav -ar 16000 -ac 1 "${1%.*}.$2" > /dev/null 2>&1
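
Since the wav file sounded fine while its sample rate was wrong, it helps to check the header rather than trust your ears. The following is a minimal sketch, not part of the original project: the class name WavHeaderCheck is hypothetical, and it assumes the plain layout in which the "fmt " chunk directly follows "RIFF....WAVE", so that the channel count sits at byte offset 22 and the sample rate at byte offset 24.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reads channel count and sample rate from a WAV header.
final class WavHeaderCheck {

    /** Returns {channels, sampleRate}; assumes "fmt " directly follows "RIFF....WAVE". */
    static int[] channelsAndRate(byte[] header) {
        if (header.length < 28) {
            throw new IllegalArgumentException("header too short");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            throw new IllegalArgumentException("not a plain RIFF/WAVE header");
        }
        // WAV stores multi-byte integers in little-endian order.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = buf.getShort(22) & 0xFFFF; // channel count at offset 22
        int sampleRate = buf.getInt(24);          // sample rate at offset 24
        return new int[] {channels, sampleRate};
    }

    /** True when the header describes 16 kHz mono audio. */
    static boolean isMono16k(byte[] header) {
        int[] info = channelsAndRate(header);
        return info[0] == 1 && info[1] == 16000;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WavHeaderCheck <file.wav>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int[] info = channelsAndRate(data);
        System.out.println("channels=" + info[0] + ", sampleRate=" + info[1]);
    }
}
```

Running it against the wav produced by the modified line should report one channel at 16000 Hz; a file produced by the unmodified script would be expected to report 24000 instead.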

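Detour 3: fake silk (or mp3) that is really webm/base64

When debugging with the phone simulator in the WeChat Mini Program developer tool, the recording could not be converted by either the silk decoder or ffmpeg. Opening it in vim showed that it starts with "data:audio/webm;base64,".

This reveals that Mini Program recordings are not always silk v3 (or mp3). Besides the webm/base64 format just mentioned, there also seem to be AMR files, and according to kn007 there are mixed files too, where one file combines several formats. I do not know why that happens.

For webm/base64, kn007's advice was to Base64-decode the file and then convert it directly with ffmpeg. I implemented this in two steps:
Step 1: Base64-decode in Java and write the result to xxx.webm. This part is simple; the article "微信小程序錄音文件格式silk 坑" shows how.
Step 2: call ffmpeg to transcode straight to wav, using the following command to produce a 16 kHz WAV: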

ffmpeg -i "$1" -f wav -ar 16000 -ac 1 "${1%.*}.$2" > /dev/null 2>&1
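
Step 1 above can be sketched as follows. This is an illustration under assumptions, not the project's code: the class name WebmDataUri is hypothetical, and it uses the standard java.util.Base64 (Java 8+) rather than the internal com.sun Base64 class that the controller listing imports.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: turns "data:audio/webm;base64,...." text into raw webm bytes.
final class WebmDataUri {

    static final String PREFIX = "data:audio/webm;base64,";

    /** Returns the decoded bytes, or null when the content is not a webm data URI. */
    static byte[] decode(String content) {
        if (content == null || !content.startsWith(PREFIX)) {
            return null; // caller should fall back to the silk v3 / mp3 path
        }
        String payload = content.substring(PREFIX.length());
        // The MIME decoder ignores line breaks and other non-Base64 characters.
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        String sample = PREFIX
                + Base64.getEncoder().encodeToString("demo".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = decode(sample);
        System.out.println(bytes == null ? "not webm/base64" : bytes.length + " bytes decoded");
    }
}
```

Returning null for anything without the prefix mirrors the controller's behaviour of handing non-webm files on to the regular conversion path.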

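Calling ffmpeg this way fails easily. One possible cause is insufficient file read/write permission. Another is that the caller has to wait for the ffmpeg process to exit, that is, for the transcoding job to finish, before moving on. I believe the second cause was my problem: conversion still failed even after I set /usr/bin/ffmpeg to mode 777. Once I moved the ffmpeg call into the script and added the same wait-for-ffmpeg logic that kn007 uses in converter.sh, it worked.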
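
If the command is launched from Java instead, the same "wait until the process is gone" idea can be expressed with the standard ProcessBuilder API. This is only a sketch under assumptions: ConverterRunner and runAndWait are hypothetical names (the project's own Util.RunShell2Wav is not shown in this article), and discarding output to /dev/null assumes a Linux host like the Ubuntu server used here.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs an external command and blocks until it exits.
final class ConverterRunner {

    /** Returns the exit code, or -1 if the command did not finish within the timeout. */
    static int runAndWait(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        // Discard output so a chatty child cannot stall on a full pipe (Linux path).
        pb.redirectOutput(ProcessBuilder.Redirect.to(new File("/dev/null")));
        Process process = pb.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a stuck conversion
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command; a real call would pass the converter script and the file path.
        int code = runAndWait(Arrays.asList("sh", "-c", "exit 0"), 30);
        System.out.println("exit code: " + code);
    }
}
```

Only after runAndWait returns 0 would it make sense to look for the generated wav file; checking for the file before the process has exited is one way to get the intermittent failures described above.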

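To make the script more general, I took the version already modified for the sample rate fix and added a separate path that sends webm files straight to ffmpeg. It checks whether the extension of the first argument is webm to decide whether to run ffmpeg directly and exit: crude but effective. Add something like the following near the top of the script: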

SOURCE_FILE_SUFFIX=${1##*.}
echo -e "XXXX SOURCE_FILE_SUFFIX:${SOURCE_FILE_SUFFIX}"
## "case" replaces the bash-only "[[ ]]" test: the script is run with "sh", which is dash on Ubuntu.
case "${SOURCE_FILE_SUFFIX}" in
	webm|mp3)
		## webm: already Base64-decoded on api.happycxz.com, so ffmpeg can read it directly.
		## mp3: needs no decoding and can be converted to wav directly.
		echo -e "begin to ffmpeg $2 from ${SOURCE_FILE_SUFFIX} now..."
		## Run ffmpeg in the background so that $! holds its pid, then wait for it to exit.
		ffmpeg -i "$1" -f wav -ar 16000 -ac 1 "${1%.*}.$2" > /dev/null 2>&1 &
		ffmpeg_pid=$!
		while kill -0 "$ffmpeg_pid"; do sleep 1; done > /dev/null 2>&1
		[ ! -f "${1%.*}.$2" ]&&echo -e "${YELLOW}[Warning]${RESET} Convert $1 false, maybe ffmpeg no format handler for $2."&&exit
		echo -e "${GREEN}[OK]${RESET} Convert $1 To ${1%.*}.$2 Finish."
		exit
		;;
	*)
		## Anything else follows the default silk decoder path.
		echo -e "begin to silk decoder flow..."
		;;
esac

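(Note: the snippet above was updated on 2017-11-09 to add mp3 support. The new Mini Program recording API can record mp3, and those recordings go straight through ffmpeg to wav.)

That completes the changes in converter_cxz.sh.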

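Building the web service and walking through the main code

Everything so far was a feasibility check: it confirmed that a Mini Program recording xx.silk can be turned into data or a file format that the speech recognition API accepts. With that path proven, what remains is writing the Java code for the details.

Creating the Spring MVC project

The project directory structure looks roughly like this:
[Image: project directory structure]

The com.happycxz.controller package contains two controllers:
First, AdditionalController.java checks server status and updates data online; it can be ignored.
Second, OlamiController.java is the API endpoint that receives silk files uploaded from the Mini Program. Its code follows: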

package com.happycxz.controller;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.security.NoSuchAlgorithmException;
import java.util.Map;  
  
import javax.servlet.ServletException;  
import javax.servlet.http.HttpServletRequest;  
import javax.servlet.http.HttpServletResponse;  
import javax.servlet.http.Part;

import org.springframework.stereotype.Controller;  
import org.springframework.util.StringUtils;  
import org.springframework.web.bind.annotation.RequestMapping;  
import org.springframework.web.bind.annotation.RequestParam;  
import org.springframework.web.bind.annotation.ResponseBody;

import com.happycxz.olami.AsrAdditionInfo;
import com.happycxz.olami.OlamiEntityFactory;
import com.happycxz.olami.SdkEntity;
import com.happycxz.utils.Configuration;
import com.happycxz.utils.Util;
import com.sun.org.apache.xml.internal.security.utils.Base64;  

/** 
 * Integration between the OLAMI API and WeChat Mini Program uploads
 * @author Jod
 */
@Controller  
@RequestMapping("/olami")  
public class OlamiController {
	
	// Linux shell command template used to run the conversion script
	private static final String SHELL_CMD = Configuration.getInstance().getValue("local.shell.cmd", "sh /YOUR_PATH/silk-v3-decoder/converter_cxz.sh %s wav");

    // Directory for silk_v3, mp3 and wav files: either under the web directory or a given absolute path
    private static final String localFilePath = Configuration.getInstance().getValue("local.file.path", "/YOUR/LOCAL/VOICE/PATH/");;  
    
    static {
    	Util.p("OlamiController base SHELL_CMD:" + SHELL_CMD);
    	Util.p("OlamiController base localFilePath:" + localFilePath);
    }

    @RequestMapping(value="/asr", produces="plain/text; charset=UTF-8")  
    public @ResponseBody String asrUploadFile(HttpServletRequest request, HttpServletResponse response, @RequestParam Map<String, Object> p)  
            throws ServletException, IOException {  
    	return processBase(request, p, false);
    }  
   
    @RequestMapping(value="/mp3asr", produces="plain/text; charset=UTF-8")  
    public @ResponseBody String asrUploadFileMp3(HttpServletRequest request, HttpServletResponse response, @RequestParam Map<String, Object> p)  
            throws ServletException, IOException {
    	return processBase(request, p, true);
    }
    
    
    public String processBase(HttpServletRequest request, @RequestParam Map<String, Object> p, boolean isMp3)  
            throws ServletException, IOException {  

    	AsrAdditionInfo additionInfo = new AsrAdditionInfo(p);
    	if (additionInfo.getErrCode() != 0) {
    		// Invalid parameters, or the appKey is not registered in the allow list
    		return Util.JsonResult(String.valueOf(additionInfo.getErrCode()), additionInfo.getErrMsg());  
    	}
    	
    	String localPathToday = localFilePath + getSrcFmt(isMp3) + File.separator + Util.getDateStr() + File.separator;
        // Create the storage directory if it does not exist yet
        File fileSaveDir = new File(localPathToday);  
        if (!fileSaveDir.exists()) {  
            fileSaveDir.mkdirs();  
        }
  
        int count = 1;
        String asrResult = "";
        for (Part part : request.getParts()) {  
            String fileName_origin = extractFileName(part);
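            // The original file name must be checked for emptiness here: the part list holds all uploaded data, the first three parts are form fields, and their file name is empty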
            if(!StringUtils.isEmpty(fileName_origin)) {
            	Util.p("originFileName[" + count + "]:" + fileName_origin);
            	String fileName = additionInfo.getVoiceFileName(isMp3);
            	//DEBUG on windows, add temp path preffix to local D: to preserve part.write exception.
            	//String recFile = "D:" + localPathToday + fileName;
            	String recFile = localPathToday + fileName;
            	Util.p("recFileName[" + count + "]:" + recFile);

            	part.write(recFile);

            	if (webmBase64Decode2Wav(recFile)) {
            		//support webm/base64 in webmBase64Decode2Wav(), wxapp develop IDE record format. 
            		//even if the suffix is xx.silk(wx.startRecord generate) or xx.mp3(wx.getRecorderManager generate)
            		//if webm base64 format , and xxxx.webm file is temporary created, xxxx.wav was last be converted.
            	} else {
            		// run script to convert silk_v3 or mp3 to wav
                    Util.RunShell2Wav(SHELL_CMD, recFile);
            	}
            	
                // get wave file path and name, prepare for olami asr
                String waveFile = DotMp3OrDotSilk2DotOther(recFile, "wav");
                Util.p("OlamiController.asrUploadFile() waveFile:" + waveFile);
                
                if (new File(waveFile).exists() == false) {
                	Util.w("OlamiController.asrUploadFile() wav file[" + waveFile + "] not exist!", null);
					return Util.JsonResult("80", "convert " + getSrcFmt(isMp3) + " to wav failed, NOW NOT SUPPORT WXAPP DEVELOP RECORD because it is not " + getSrcFmt(isMp3) + " format. anyother reason please tell QQ:404499164."); 
                }
                
                try {
                	SdkEntity entity = OlamiEntityFactory.createEntity(additionInfo.getAppKey(), additionInfo.getAppSecret(), additionInfo.getUserId());
					asrResult = entity.getSpeechResult(waveFile);
					Util.p("OlamiController.asrUploadFile() asrResult:" + asrResult);
				} catch (NoSuchAlgorithmException | InterruptedException e) {
					Util.w("OlamiController.asrUploadFile() asr NoSuchAlgorithmException or InterruptedException", e);
				} catch (FileNotFoundException e) {
					Util.w("OlamiController.asrUploadFile() asr FileNotFoundException", e);
					return Util.JsonResult("80", "convert " + getSrcFmt(isMp3) + " to wav failed, NOW NOT SUPPORT WXAPP DEVELOP RECORD because it is not " + getSrcFmt(isMp3) + " format. anyother reason please tell QQ:404499164."); 
				} catch (Exception e) {
					Util.w("OlamiController.asrUploadFile() asr Exception", e);
				}
            }
            count++;
        }
        
        // Avoid garbled characters in the response
        //response.setContentType("application/json;charset=UTF-8");

        return Util.JsonResult("0", "olami asr success!", asrResult);  
    }
    
    private static String getSrcFmt(boolean isMp3) {
    	return (isMp3 ? "mp3":"silk_v3");
    }
    
    /**
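     * Converts a recording file name such as xxxxx.silk or xxxxx.mp3 to xxxxx.wav (or another suffix)
     * @param recName path of the recording file
     * @param otherSubFix new suffix without the dot, e.g. "wav"
     * @return the path with its suffix replaced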
     */
    private static String DotMp3OrDotSilk2DotOther(String recName, String otherSubFix) {
    	int removeByte = 4;
    	if (recName.endsWith("silk")) {
    		removeByte = 4;
    	} else if (recName.endsWith("slk") || recName.endsWith("mp3")) {
    		removeByte = 3;
    	}
    	return recName.substring(0, recName.length()-removeByte) + otherSubFix;
    }
    
    /** 
     * Extracts the original file name from the content-disposition header.
     *  
     * The content-disposition header has the following format:
     * form-data; name="dataFile"; filename="PHOTO.JPG" 
     *  
     * @param part 
     * @return 
     */
	private String extractFileName(Part part) {  
        String contentDisp = part.getHeader("content-disposition");  
        String[] items = contentDisp.split(";");  
        for (String s : items) {  
            if (s.trim().startsWith("filename")) {  
                return s.substring(s.indexOf("=") + 2, s.length()-1);  
            }  
        }  
        return "";  
    }


    /**
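     * Checks from the content of filePath whether the file is webm/base64. If so, Base64-decodes it and converts it to wav directly with ffmpeg.
     * If not, returns false so that the caller goes on to treat it as silk v3.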
     * @param filePath
     * @return
     */
	public static boolean webmBase64Decode2Wav(String filePath) {
		boolean isWebm = false;
		try {
			String encoding = "utf-8";
			File file = new File(filePath);
			// Check that the file exists
			if ((file.isFile() == false) || (file.exists() == false)) {
				Util.w("webmBase64Decode2Wav() no file[" + filePath + "] exist.", null);
			}
			
			StringBuilder lineTxt = new StringBuilder();
			String line = null;
			try (
			InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);
			BufferedReader bufferedReader = new BufferedReader(read);) {
				while ((line = bufferedReader.readLine()) != null) {
					lineTxt.append(line);
				}
				read.close();
			} catch (Exception e) {
				Util.w("webmBase64Decode2Wav() exception0:", e);
				return isWebm;
			}
			
			String oldData = lineTxt.toString();
			if (oldData.startsWith("data:audio/webm;base64,") == false) {
				Util.d("webmBase64Decode2Wav() file[" + filePath + "] is not webm, or already decoded." );
				return isWebm;
			}
			
			isWebm = true;
			oldData = oldData.replace("data:audio/webm;base64,", "");
			String webmFileName = DotMp3OrDotSilk2DotOther(filePath, "webm");
			try {

				File webmFile = new File(webmFileName);
				byte[] bt = Base64.decode(oldData);
				FileOutputStream in = new FileOutputStream(webmFile);
				try {
					in.write(bt, 0, bt.length);
					in.close();
				} catch (IOException e) {
					Util.w("webmBase64Decode2Wav() exception1:", e);
					return isWebm;
				}
			} catch (FileNotFoundException e) {
				Util.w("webmBase64Decode2Wav() exception2:", e);
				return isWebm;
			}
			
			// run cmd to convert webm to wav
    		Util.RunShell2Wav(SHELL_CMD, webmFileName);
		} catch (Exception e) {
			Util.w("webmBase64Decode2Wav() exception3:", e);
			return isWebm;
		}
		
		return isWebm;
	}
	
	public static void main(String[] args) {
		webmBase64Decode2Wav("D:\\secureCRT_RZSZ\\1505716415538_f7d98081-4d21-3b40-a7df-e56c046a784d_b4118cd178064b45b7c8f1242bcde31f.silk");
	}
}

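Spring MVC annotations make it easy to expose the API. The key method is asrUploadFile. Besides request and response, it takes a Map named p, which receives the form data, that is, the extra fields sent along with the uploaded recording.
I require appKey, appSecret, and userId to be uploaded because the service calls the OLAMI open platform API directly.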
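
Because form fields and the recording arrive in the same list of parts, the controller tells them apart by whether the Content-Disposition header carries a filename. A slightly more defensive version of that parsing might look like the sketch below. The class name ContentDisposition is hypothetical, and splitting on ";" is a simplification that would mishandle file names containing a semicolon.

```java
// Hypothetical helper: pulls the filename parameter out of a Content-Disposition header.
final class ContentDisposition {

    /** Returns the filename value, or "" when the part is a plain form field. */
    static String fileName(String header) {
        if (header == null) {
            return "";
        }
        for (String item : header.split(";")) {
            String s = item.trim();
            if (s.startsWith("filename=")) {
                String value = s.substring("filename=".length()).trim();
                // Strip surrounding double quotes only when both are present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return "";
    }

    /** A part counts as an uploaded file only when it has a non-empty filename. */
    static boolean isFilePart(String header) {
        return !fileName(header).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(fileName("form-data; name=\"file\"; filename=\"wx-file.silk\""));
        System.out.println(isFilePart("form-data; name=\"appKey\""));
    }
}
```

On containers that implement Servlet 3.1 or later, Part.getSubmittedFileName() provides the same information without manual parsing.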

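The overall flow is as follows (I did not draw a flowchart; the code above is easy to follow):

1. Read the three required parameters appKey, appSecret, and userId from p.
2. Read the Parts from request to get the original silk file and its uploaded file name.
   The parts include both the file and the form fields. I fell into a trap here: I thought I could skip extractFileName, take whatever arrived, and save it under a randomly generated name. In fact, when extractFileName returns "" or null, the part is a form field, not a file, and forcing it into a file causes trouble. (While debugging I found some "recordings" that contained only a short alphanumeric string; this was the reason.)
3. Save the file to the designated server directory under a new name.
   Why rename: recordings uploaded from a real Mini Program are all named wx-file.silk, unlike the developer tool, which generates random file names.
4. Check whether the xxx.silk saved in step 3 is actually webm/base64. If so, Base64-decode it to xxx.webm and call converter_cxz.sh to transcode the webm file to xxx.wav; whether that succeeds or fails, skip the next step and go to step 6. If it is not webm/base64, the check returns false and the flow continues with the next step.
5. Call the script from silk_v3_decoder (the modified one described above, which I renamed converter_cxz.sh) to produce xxx.wav.
6. Derive the full path of the wav file from the full path of the original silk file.
7. Using the wav path from the previous step and the appKey, appSecret, and userId parameters, create an SdkEntity and call getSpeechResult to obtain the speech recognition and semantic processing result.
8. Assemble the response and return it.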
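
Deriving the wav path from the recording path depends on the length of the extension (silk, slk, mp3, webm). One way to avoid counting characters is to cut at the last dot of the file name. A minimal sketch, with the hypothetical names AudioPaths and withExtension that are not part of the original project:

```java
// Hypothetical helper: swaps the extension of a path by cutting at the last dot.
final class AudioPaths {

    /** Replaces the extension of the last path segment; appends one if there is none. */
    static String withExtension(String path, String newExt) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int lastDot = path.lastIndexOf('.');
        // Only treat the dot as an extension marker if it belongs to the file name itself.
        String base = (lastDot > lastSeparator) ? path.substring(0, lastDot) : path;
        return base + "." + newExt;
    }

    public static void main(String[] args) {
        System.out.println(withExtension("/voice/silk_v3/20171109/rec_001.silk", "wav"));
        System.out.println(withExtension("/voice/mp3/20171109/rec_002.mp3", "wav"));
    }
}
```

With this shape the same call serves both silk-to-wav and silk-to-webm renaming, whatever the length of the source extension.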

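The com.happycxz.olami package contains four files:
First, AsrAdditionInfo.java checks that the three required form fields are present in the HTTPS request and valid.
I added one extra restriction here: besides applying for an appKey and appSecret on the OLAMI platform, you must also tell me your appKey so that I can add it to the allow list. This keeps the service from being knocked out for everyone by abuse; my little server has limited bandwidth.

Second, OlamiEntityFactory.java caches SdkEntity instances. If the userId in the form data differs between requests, the cache does not help :(

Third, OlamiKeyManager.java works with the first file to enforce the appKey restriction.

Fourth, SdkEntity.java talks to the OLAMI API. It is mostly adapted from the OLAMI Java SDK sample code. Its code follows: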

package com.happycxz.olami;


import java.io.IOException;
import java.security.NoSuchAlgorithmException;

import com.google.gson.Gson;
import com.happycxz.utils.Util;

import ai.olami.cloudService.APIConfiguration;
import ai.olami.cloudService.APIResponse;
import ai.olami.cloudService.CookieSet;
import ai.olami.cloudService.SpeechRecognizer;
import ai.olami.cloudService.SpeechResult;
import ai.olami.nli.NLIResult;
import ai.olami.util.GsonFactory;

public class SdkEntity {
	
	//indicate simplified input
	private static int localizeOption = APIConfiguration.LOCALIZE_OPTION_SIMPLIFIED_CHINESE;
	// * Replace the audio type you want to analyze with this variable.
	
	private static int audioType = SpeechRecognizer.AUDIO_TYPE_PCM_WAVE;
	//private static int audioType = SpeechRecognizer.AUDIO_TYPE_PCM_RAW;

	// * Replace FALSE with this variable if your test file is not final audio. 
	private static boolean isTheLastAudio = true;
	
	private APIConfiguration config = null;
	
	//configure text recognizer
	SpeechRecognizer recoginzer = null;	
	// * Prepare to send audio by a new task identifier.
	//CookieSet cookie = new CookieSet();
	
	// json string for print pretty
	private static Gson jsonDump = GsonFactory.getDebugGson(false);
	// normal json string
	private static Gson mGson = GsonFactory.getNormalGson();

	public SdkEntity(String appKey, String appSecret, String userId) {
		Util.d("new SdkEntity() start.  appKey:" + appKey + ", appSecret: " + appSecret + ", userId: " + userId);
		try {
			config = new APIConfiguration(appKey, appSecret, localizeOption);
			recoginzer = new SpeechRecognizer(config);
	    	recoginzer.setEndUserIdentifier(userId);
	    	recoginzer.setTimeout(10000);
	    	recoginzer.setAudioType(audioType);
		} catch (Exception e) {
			Util.w("new SdkEntity() exception", e);
		}
		Util.d("new SdkEntity() done");
	}
	
	public String getSpeechResult(String inputFilePath) throws NoSuchAlgorithmException, IOException, InterruptedException {
		String lastResult = "";
		
		Util.d("SdkEntity.getSpeechResult() inputFilePath:" + inputFilePath);
		
		CookieSet cookie = new CookieSet();
		
		// * Start sending audio.
		APIResponse response = recoginzer.uploadAudio(cookie, inputFilePath, audioType, isTheLastAudio);
		//
		// You can also send audio data from a buffer (in bytes).
		//
		// For Example :
		// ===================================================================
		// byte[] audioBuffer = Files.readAllBytes(Paths.get(inputFilePath));
		// APIResponse response = recoginzer.uploadAudio(cookie, audioBuffer, audioType, isTheLastAudio);
		// ===================================================================
		//
		Util.d("\nOriginal Response : " + response.toString());
		Util.d("\n---------- dump ----------\n");
		Util.d(jsonDump.toJson(response));
		Util.d("\n--------------------------\n");

		// Four results: full is the complete one; seg, nli, and asr each hold only their own part
		String full = "", seg = "", nli = "", asr = "";
		// Check request status.
		if (response.ok()) {
			// Now we can try to get recognition result.
			Util.d("\n[Get Speech Result] =====================");
			while (true) {
				Thread.sleep(500);
				// * Get result by the task identifier you used for audio upload.
				Util.d("\nRequest CookieSet[" + cookie.getUniqueID() + "] speech result...");
				response = recoginzer.requestRecognitionWithAll(cookie);
				Util.d("\nOriginal Response : " + response.toString());
				Util.d("\n---------- dump ----------\n");
				Util.d(jsonDump.toJson(response));
				Util.d("\n--------------------------\n");
				// Check request status.
				if (response.ok() && response.hasData()) {
					full = mGson.toJson(response.getData());
					// * Check to see if the recognition has been completed.
					SpeechResult sttResult = response.getData().getSpeechResult();
					if (sttResult.complete()) {
						// * Get speech-to-text result
						Util.p("* STT Result : " + sttResult.getResult());
						asr = mGson.toJson(sttResult);
						// * Check to see if the recognition has be
						// Because we used requestRecognitionWithAll()
						// So we should be able to get more results.
						// --- Like the Word Segmentation.
						if (response.getData().hasWordSegmentation()) {
							String[] ws = response.getData().getWordSegmentation();
							for (int i = 0; i < ws.length; i++) {
								Util.d("* Word[" + i + "] " + ws[i]);
							}
							seg = response.getData().getWordSegmentationSingleString();
						}
						// --- Or the NLI results.
						if (response.getData().hasNLIResults()) {
							NLIResult[] nliResults = response.getData().getNLIResults();
							nli = mGson.toJson(nliResults);
						}
						// * Done.
						break;
					} else {
						// The recognition is still in progress.
						// But we can still get immediate recognition results.
						Util.d("* STT Result [Not yet completed] ");
						Util.d(" --> " + sttResult.getResult());
					}
				}
			}
		} else {
			// Error
			Util.w("* Error! Code : " + response.getErrorCode(), null);
			Util.w(response.getErrorMessage(), null);
		}
		
		lastResult = full;
		
		Util.d("\n===========================================\n");
		return lastResult;
	}
	
	public static void main(String[] args) throws NoSuchAlgorithmException, IOException, InterruptedException {
		Util.p("SdkEntity.main() start...");
    	int argLen = args.length;
    	
    	Util.d("SdkEntity.main() args.length[" + argLen + "]:");
    	for (String arg : args) {
    		Util.d("SpeexPcm.main() arg[" + arg + "]");
    	}

		new SdkEntity("b4118cd178064b45b7c8f1242bcde31f", "7908028332a64e47b8336d71ad3ce9ab", "abdd").getSpeechResult(args[0]);
    	Util.p("SdkEntity.main() end...");
	}
}

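The com.happycxz.utils package contains two files with utilities for reading the configuration file, system logging, and so on.

The OLAMI Java SDK is loaded from WEB-INF/lib, as shown:
[Image: OLAMI Java SDK jar under WEB-INF/lib]

Also attached is a screenshot of the source in olami-java-client-1.0.1-source.jar showing that speex compression is used by default:
[Image: source excerpt showing default speex compression]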

How to use it

Old API (silk recordings made with wx.startRecord): https://api.happycxz.com/wxapp/silk2asr

New API (mp3 recordings made with wx.getRecorderManager): https://api.happycxz.com/wxapp/mp32asr
(Note: for the new API, configure recording in the Mini Program as: sampleRate: 16000, numberOfChannels: 1, encodeBitRate: 48000, format: 'mp3')

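Required form-data parameters (for both endpoints):

Parameter	Required	Description
appKey	Yes	key obtained from olami.cn
appSecret	Yes	secret obtained from olami.cn
userId	Yes	unique identifier of the user, such as a phone number, a unique ID, or an IMEI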
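
For callers other than a Mini Program, the request is an ordinary multipart/form-data POST carrying the three fields above plus the recording. The sketch below only assembles such a body; it is an illustration under assumptions, not a tested client for this service. AsrUploadBody, the boundary string, and the file field name "file" are all hypothetical (the controller shown earlier treats any part with a non-empty filename as the recording).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a multipart/form-data body with text fields and one file.
final class AsrUploadBody {

    static byte[] build(String boundary, Map<String, String> fields,
                        String fileField, String fileName, byte[] audio) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(audio, 0, audio.length);           // raw recording bytes
        write(out, "\r\n--" + boundary + "--\r\n");  // closing boundary
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("appKey", "YOUR_APP_KEY");       // placeholder values
        fields.put("appSecret", "YOUR_APP_SECRET");
        fields.put("userId", "demo-user");
        byte[] body = build("----demoBoundary", fields, "file", "wx-file.silk", new byte[] {1, 2, 3});
        System.out.println(body.length + " bytes; send with Content-Type: "
                + "multipart/form-data; boundary=----demoBoundary");
    }
}
```

The body would then be POSTed to one of the two URLs above with the matching Content-Type header, using whatever HTTP client is at hand.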

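The returned res.data is exactly what the OLAMI open platform returns, unmodified. See their online documentation for details:
Response data format of the OLAMI open platform API

Roughly, the seg field is the word segmentation of the recognized speech, asr is the speech recognition result, and nli is the semantic processing result. Because the Mini Program developer tool cannot be used for debugging here, I cannot include a captured example.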

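Example caller: the "遙知之" smart assistant

[Image: Mini Program QR code]
Scan the code to try it. This version supports speech recognition. The blog has not been updated yet; I will share the related code later in the article "我的微信小程序支持語音識別啦！“遙知之”不再裝聾", mainly how the Mini Program uploads silk recordings and how it parses the speech recognition and semantic processing results returned by OLAMI.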

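Closing remarks

You are welcome to repost this article. Original article: http://www.happycxz.com/m/?p=32

Server-side project source code:
All source code for this article on Gitee: https://gitee.com/happycxz/silk2asr
All source code for this article on GitHub: https://github.com/happycxz/silk2asr

If anything is unclear, leave a comment under this blog post. Corrections to anything I got wrong or described poorly are also welcome; I will fix them promptly so that nobody else falls into the same traps.

If you need this API, send me your appKey in a comment or private message on my personal blog; you can only use it after I add it to my allow list.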
