WeChat Mini Program Voice and the iFlytek Speech Recognition API (Java): Kronopath/SILKCodec, Handling SILK with ffmpeg, PCM and WAV Conversion

Reposted article. Originally published 2017-12-15 18:42. Tags: silk, Kronopath/SILKCodec, WeChat mini program upload API, pcm, Java iFlytek speech recognition, WAV conversion, ffmpeg processing

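The project requirement: use the iFlytek speech recognition API to turn recordings uploaded from a WeChat mini program into text and return the result.

First, apply for the speech recognition service on the iFlytek Open Platform.

Download the SDK there and unzip it. Note that the appid is tied to the SDK you download, and the appid is required when initializing the API.

Because the service runs on Linux, the SDK's native libraries (the .so and .dll files; on Linux only the .so files are actually loaded) must be uploaded into the jdk/lib/amd64 directory of the JDK installed on the Linux server, otherwise the SDK reports an engine error. On Windows, placing them in the project root directory is enough.

The WeChat mini program uploads recordings in SILK format, while the iFlytek API recognizes WAV files, so the uploaded SILK file has to be converted to WAV.

The SILK file uploaded by the mini program is a variant of SILK (it has one extra byte in front of the encoding header), so it first has to be turned into a standard SILK file.

Because the project runs on Linux, I wrote a simple shell script for the Java program to call.

The script deletes the first byte of the line that contains #!SILK_V3 in the input file.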
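The shell script itself is not shown in the article. As a rough in-process alternative, the sketch below strips the extra leading byte in Java. It assumes the uploaded file consists of exactly one extra byte followed by the `#!SILK_V3` marker, as described above; the class and method names (`SilkHeaderFix`, `stripLeadingByte`, `fixInPlace`) are hypothetical and not part of the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: turns a mini-program SILK upload into a standard SILK file. */
final class SilkHeaderFix {
    private static final byte[] MAGIC = "#!SILK_V3".getBytes(StandardCharsets.US_ASCII);

    /** Returns the data without the extra leading byte; standard input is returned unchanged. */
    static byte[] stripLeadingByte(byte[] data) {
        if (hasMagicAt(data, 0)) {
            return data; // already a standard SILK file
        }
        if (hasMagicAt(data, 1)) {
            return Arrays.copyOfRange(data, 1, data.length); // drop the extra byte
        }
        throw new IllegalArgumentException("SILK_V3 marker not found at offset 0 or 1");
    }

    /** True if the SILK marker starts at the given offset. */
    private static boolean hasMagicAt(byte[] data, int offset) {
        if (data.length < offset + MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[offset + i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Rewrites the file at the given path in place. */
    static void fixInPlace(Path file) throws IOException {
        Files.write(file, stripLeadingByte(Files.readAllBytes(file)));
    }
}
```

Doing this in Java would avoid spawning a process for a one-byte change, at the cost of reading the whole recording into memory, which should be acceptable for short voice clips.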

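With the file cleaned up, the next step is format conversion.

After some research, the usual approach is to convert the SILK file to PCM first. Here I use Kronopath/SILKCodec: download it to the Linux server, then run the following inside SILK_SDK_SRC_ARM: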

make lib
make decoder

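This builds the command-line tool decoder.

Usage:

./decoder input.silk output.pcm

Running the command above produces the .pcm file. The PCM file is then converted to WAV with ffmpeg. If ffmpeg is not installed yet, the following guides may help: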

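Installing ffmpeg on Ubuntu 14.04: http://blog.csdn.net/leezha/article/details/77849286

Installing ffmpeg on Alibaba Cloud Linux: http://blog.csdn.net/baijinwen/article/details/77235725

Problems you may hit when installing ffmpeg: http://blog.51cto.com/zlyang/1709508

To keep recognition accurate, generate the WAV file with the following command; in my tests these settings gave the best results with the iFlytek API.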

ffmpeg -f s16le -ar 12k -ac 2 -i /path/to/pcm -f wav -ar 16k -ac 1 /path/to/wav
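Before sending the result to iFlytek, it can be worth checking that the WAV file really is 16 kHz, mono, 16-bit, which is what the command above asks ffmpeg to produce. The sketch below reads those fields from the file header. It assumes the simple layout in which the `fmt ` chunk directly follows the 12-byte RIFF/WAVE header; that is the common case but not guaranteed for every WAV file. `WavHeaderCheck` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: inspects the format fields of a simple WAV header. */
final class WavHeaderCheck {

    /** Returns {channels, sampleRateHz, bitsPerSample} from the first 36+ bytes of a WAV file. */
    static int[] readFormat(byte[] header) {
        if (header.length < 36) {
            throw new IllegalArgumentException("header too short");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            throw new IllegalArgumentException("unexpected WAV header layout");
        }
        // All multi-byte WAV header fields are little-endian
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = buf.getShort(22);
        int sampleRate = buf.getInt(24);
        int bitsPerSample = buf.getShort(34);
        return new int[] {channels, sampleRate, bitsPerSample};
    }

    /** True when the header describes 16 kHz, mono, 16-bit audio. */
    static boolean is16kMono16Bit(byte[] header) {
        int[] f = readFormat(header);
        return f[0] == 1 && f[1] == 16000 && f[2] == 16;
    }
}
```

In practice the first 44 bytes of the generated file could be passed to `is16kMono16Bit`, and the recognition call skipped when it returns false.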

Next comes the Java side of the iFlytek speech API. Here is the code.

package com.example.utils;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

/**
 * Created by songzs on 2017/12/12.
 * Wrapper around the external transcoding tools.
 */
public class FFMPEGUtil {

    /** Decodes a standard SILK file to raw PCM with the SILKCodec decoder. */
    public static String silk2Pcm(String inputfile, String outputfile) {
        exec("decoder", Arrays.asList(
                "/usr/local/silk2pcm_tool/SILKCodec/SILK_SDK_SRC_ARM/decoder",
                inputfile, outputfile));
        return outputfile;
    }

    /** Converts raw PCM to a 16 kHz mono WAV file with ffmpeg. */
    public static String pcm2Wav(String inputfile, String outputfile) {
        // ffmpeg -f s16le -ar 12k -ac 2 -i /path/to/pcm -f wav -ar 16k -ac 1 /path/to/wav
        exec("ffmpeg", Arrays.asList(
                "ffmpeg", "-f", "s16le", "-ar", "12k", "-ac", "2", "-i", inputfile,
                "-f", "wav", "-ar", "16k", "-ac", "1", outputfile));
        return outputfile;
    }

    /** Runs test.sh, which removes the extra leading byte from the uploaded SILK file. */
    public static String silk_remove_word(String filepath) {
        exec("test", Arrays.asList("/home/workspace/test.sh", filepath));
        return filepath;
    }

    /** Runs the command, drains its output so it cannot block, and waits for it to exit. */
    private static void exec(String label, List<String> command) {
        System.out.println(label + " command: " + String.join(" ", command));
        try {
            // Passing the arguments as a list keeps paths containing spaces intact
            Process proc = new ProcessBuilder(command).redirectErrorStream(true).start();
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(proc.getInputStream()))) {
                while (br.readLine() != null) {
                    // discard the tool's output
                }
            }
            int exitCode = proc.waitFor();
            if (exitCode != 0) {
                System.err.println(label + " exited with code " + exitCode);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Utility class for processing the recognition result (the code is rough, please bear with it)

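package com.example.utils;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import java.util.regex.Pattern;

/**
 * Created by songzs on 2017/12/15.
 */
public class SR2Words {

    // The recognizer returns several JSON objects back to back; each one ends with this suffix
    private static final String FRAGMENT_END = "}]}]}";

    public static String sr2words(String jsonString) {
        StringBuilder sb = new StringBuilder();
        for (String part : jsonString.split(Pattern.quote(FRAGMENT_END))) {
            if (part.trim().isEmpty()) {
                continue;
            }
            // Restore the suffix removed by split() so the fragment is valid JSON again
            String s = part + FRAGMENT_END;
            System.out.println(s);
            JSONObject fragment = JSON.parseObject(s);
            JSONArray ws = fragment.getJSONArray("ws");
            for (int i = 0; i < ws.size(); i++) {
                // Take the first candidate word of each entry
                JSONArray cw = ws.getJSONObject(i).getJSONArray("cw");
                sb.append(cw.getJSONObject(0).getString("w"));
            }
        }
        return sb.toString();
    }
}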
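The splitting step above exists because the recognizer's callback fires several times and the JSON objects are appended back to back. The sketch below isolates just that step using only the standard library, so it can be tried without fastjson. The sample strings contain only the `ws`, `cw`, and `w` fields that the parser above reads; real iFlytek results carry additional fields. `FragmentSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits concatenated recognition results into single JSON objects. */
final class FragmentSplitter {
    // Closing sequence of every result object: word, cw array, ws entry, ws array, root
    private static final String FRAGMENT_END = "}]}]}";

    static List<String> split(String concatenated) {
        List<String> fragments = new ArrayList<>();
        for (String part : concatenated.split(Pattern.quote(FRAGMENT_END))) {
            if (!part.trim().isEmpty()) {
                fragments.add(part + FRAGMENT_END); // put the suffix back
            }
        }
        return fragments;
    }
}
```

For example, feeding it `{"ws":[{"cw":[{"w":"A"}]}]}` immediately followed by `{"ws":[{"cw":[{"w":"B"}]},{"cw":[{"w":"C"}]}]}` yields two fragments. The comma between the `ws` entries of the second object keeps the suffix from matching in the middle of an object.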

Mini program recording upload and iFlytek speech recognition

package com.example.service.impl;

import com.example.service.XunFeiService;
import com.example.utils.FFMPEGUtil;
import com.example.utils.SR2Words;
import com.example.utils.SRTool;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Created by songzs on 2017/12/12.
 */
@Service
public class XunFeiServiceImpl implements XunFeiService {

    @Override
    public Map<String,String> speechRecognition(MultipartFile multi) {
        Map<String,String> map = new HashMap<>();
        String path = "/home/workspace/audio";
        String baseName = UUID.randomUUID().toString();
        // Temporary SILK file as uploaded by the mini program
        String tempFile = path + "/" + baseName + ".silk";
        // Intermediate PCM file
        String pcmFile = path + "/" + baseName + ".pcm";
        // WAV file that iFlytek can recognize
        String wavFile = path + "/" + baseName + ".wav";
        File file = new File(tempFile);
        try {
            multi.transferTo(file);
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Remove the extra leading byte to get a standard SILK file
        String silkFile = FFMPEGUtil.silk_remove_word(tempFile);
        // Convert the SILK file to PCM
        String silk2Pcm = FFMPEGUtil.silk2Pcm(silkFile, pcmFile);
        // Convert the PCM file to WAV
        String pcm2Wav = FFMPEGUtil.pcm2Wav(silk2Pcm, wavFile);
        // Have iFlytek recognize the WAV file and return the text
        SRTool sr = new SRTool();
        String words = null;
        try {
            words = sr.voice2words(pcm2Wav);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Raw iFlytek recognition result: " + words);
        // words stays null when recognition threw, so check for null as well as empty
        if (words == null || words.isEmpty()) {
            System.out.println("iFlytek recognition result: null");
            map.put("status","error");
            map.put("content","Sorry, please describe it again!");
            return map;
        }
        String result = SR2Words.sr2words(words);
        System.out.println("iFlytek recognition result: " + result);
        map.put("status","success");
        map.put("content",result);
        return map;
    }
}

iFlytek speech recognition utility class

package com.example.utils;

import com.iflytek.cloud.speech.*;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;

/**
 * Created by songzs on 2017/12/4.
 */
public class SRTool {

    private int perWaitTime = 100;

    private StringBuffer mResult = new StringBuffer();

    static {
        SpeechUtility.createUtility("appid=********"); // the appid you applied for
    }

    public String voice2words(String fileName) throws InterruptedException, IOException {
        return to(fileName);
    }

    public String to(String fileName) throws InterruptedException, IOException {

        File file = new File(fileName);
        if (!file.exists()) {
            throw new RuntimeException("The file to read does not exist");
        }
        // Read the whole file; available() plus a single read() is not guaranteed to return every byte
        byte[] buf = Files.readAllBytes(file.toPath());

        // 1. Create the SpeechRecognizer object
        SpeechRecognizer mIat = SpeechRecognizer.createRecognizer();
        // 2. Set the dictation parameters; see the SpeechConstant class in the "MSC Reference Manual"
        mIat.setParameter(SpeechConstant.DOMAIN, "iat");
        mIat.setParameter(SpeechConstant.LANGUAGE, "zh_cn");
        mIat.setParameter(SpeechConstant.ACCENT, "mandarin");
        mIat.setParameter(SpeechConstant.AUDIO_SOURCE, "-1");
        // 3. Start dictation
        mIat.startListening(mRecoListener);

        // buf holds the audio data; splitBuffer is a custom helper that cuts it into 4,800-byte chunks
        ArrayList<byte[]> buffers = splitBuffer(buf, buf.length, 4800);
        for (int i = 0; i < buffers.size(); i++) {
            // Each write sends 4,800 bytes to MSC, roughly 150 ms of 16 kHz 16-bit mono audio
            mIat.writeAudio(buffers.get(i), 0, buffers.get(i).length);
        }
        mIat.stopListening();

        // Wait until the recognizer has finished delivering results
        while (mIat.isListening()) {
            Thread.sleep(perWaitTime);
        }
        return mResult.toString();
    }

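    /**
     * Splits a byte buffer into an array of fixed-size chunks.
     *
     * @param buffer the buffer
     * @param length the number of bytes in the buffer to split
     * @param spsize the chunk size
     * @return the chunks in order; the last one may be shorter than spsize
     */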
    private ArrayList<byte[]> splitBuffer(byte[] buffer, int length, int spsize) {
        ArrayList<byte[]> array = new ArrayList<byte[]>();
        if (spsize <= 0 || length <= 0 || buffer == null
                || buffer.length < length)
            return array;
        int size = 0;
        while (size < length) {
            int left = length - size;
            if (spsize < left) {
                byte[] sdata = new byte[spsize];
                System.arraycopy(buffer, size, sdata, 0, spsize);
                array.add(sdata);
                size += spsize;
            } else {
                byte[] sdata = new byte[left];
                System.arraycopy(buffer, size, sdata, 0, left);
                array.add(sdata);
                size += left;
            }
        }
        return array;
    }

    // Dictation listener
    private RecognizerListener mRecoListener = new RecognizerListener() {
        public void onResult(RecognizerResult results, boolean isLast) {
            System.out.println("Result:" + results.getResultString());
            mResult.append(results.getResultString());
        }

        // Callback invoked when the session hits an error
        public void onError(SpeechError error) {
            System.out.println(error.getErrorCode()+"=========="+error.getErrorDesc());
            System.out.println(error);
        }

        // Recording started
        public void onBeginOfSpeech() {
        }

        // Volume level, 0~30
        public void onVolumeChange(int volume) {
        }

        @Override
        public void onVolumeChanged(int i) {

        }

        @Override
        public void onEndOfSpeech() {

        }

        @Override
        public void onEvent(int i, int i1, int i2, String s) {

        }
    };
}
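The 4,800-byte chunk size and the "150 ms" figure in the comments follow from the audio format produced earlier (16 kHz, 16-bit, mono). The short sketch below spells out that arithmetic and the number of chunks `splitBuffer` produces for a given buffer size. `ChunkMath` is a hypothetical helper used only for illustration.

```java
/** Hypothetical helper: arithmetic behind the 4,800-byte write size. */
final class ChunkMath {

    /** Bytes of uncompressed PCM audio per second. */
    static int bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return sampleRateHz * (bitsPerSample / 8) * channels;
    }

    /** Duration in milliseconds covered by one chunk of the given size. */
    static double chunkMillis(int chunkBytes, int sampleRateHz, int bitsPerSample, int channels) {
        return 1000.0 * chunkBytes / bytesPerSecond(sampleRateHz, bitsPerSample, channels);
    }

    /** Number of chunks when the last chunk is allowed to be shorter (ceiling division). */
    static int chunkCount(int totalBytes, int chunkBytes) {
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }
}
```

With 16,000 samples per second at 2 bytes each, one second of audio is 32,000 bytes, so 4,800 bytes correspond to 150 ms. A 10,000-byte buffer is therefore sent as three writes: 4,800, 4,800, and 400 bytes.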

* The mini program upload API must be requested over HTTPS, so you may need to set up HTTPS first; see my previous article for details.
