Giving Your Application “Ears”

Reposted article. Originally published 2016-04-16 14:27. Tags: Java / Speech Recognition

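Background:

Every year the company sets aside two days for everyone to propose, prepare, implement, and present an idea of their own. For this year's event I looked into speech recognition and tried out several tools and APIs hands-on. In the end I used a neat little JS library to implement speech recognition and voice commands, and integrated it into our project.

Demo video: http://pan.baidu.com/s/1c1OsQYk (extraction code: 229p)

At first I planned to use the Baidu speech API for recognition, but ran into the following problems:

1. REST: Baidu offers a cross-platform, REST-based speech recognition service, but the API documentation states explicitly that it only accepts uploaded audio files.

2. Android: once the Android environment was set up and the project imported, the demo ran, and the exported APK also ran on a real Android device. Our product is a web application, though, and there clearly was not enough time to extract the Android code and fit it into the product.

Later we found a JS library: annyang.

annyang is a tiny JavaScript library that lets your visitors control your site with voice commands. It supports multiple languages, has no dependencies, weighs only 2 KB, and is free to use.

Today's post is therefore divided into the following three parts.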

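Baidu Speech on Android

1. Importing the project code

In the previous post, "Baidu Speech Recognition Made Simple" (《玩轉百度語音識別，就是這么簡單》), we set up the Android environment and imported Baidu's Android demo code. At that point only an app named "Speech Recorder" showed up. When I started the emulator again later, a second app, "百度語音示例(2.x)" (Baidu Speech Sample 2.x), appeared alongside it.

The emulator does not seem to have access to the microphone, so I installed the app on an Android phone for testing. You first need to export the code as an APK file; instructions for that are easy to find online.

2. Installation and testing

Installation complete

App screen:

Voice input and recognition result

(Rather shameless of me, but every word of it is true ^^)

With that, I have verified first-hand that the Android platform works, and judging by the results the recognition is quite accurate.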

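Baidu Speech REST API

The previous post also introduced the REST API and confirmed that uploading a pre-recorded audio file and then calling the API returns a result. The REST API is cross-platform and lightweight, but the audio has to be recorded and uploaded before any recognition result comes back, which is not very convenient.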
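Because the REST interface works on a finished audio file, it can help to check the file's format before uploading it. The following is a minimal sketch using only javax.sound.sampled. The class name AudioFileCheckSketch and both method names are made up for illustration, and the 16-bit signed mono PCM check is an assumption that mirrors the capture settings used in this post's code, not a statement of what Baidu's service accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class AudioFileCheckSketch {

    // True if the file holds 16-bit signed mono PCM at the expected sample rate.
    // The expected values are assumptions taken from this post's capture code.
    static boolean hasExpectedFormat(File file, float expectedRate) {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat f = in.getFormat();
            return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                    && f.getSampleSizeInBits() == 16
                    && f.getChannels() == 1
                    && f.getSampleRate() == expectedRate;
        } catch (UnsupportedAudioFileException | IOException e) {
            return false; // unreadable, or not a recognized audio file
        }
    }

    // Helper for trying the check without a microphone: writes silence as a WAVE file.
    static void writeSilence(File file, float rate, int frames) {
        AudioFormat f = new AudioFormat(rate, 16, 1, true, false);
        byte[] pcm = new byte[frames * f.getFrameSize()];
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), f, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("silence", ".wav");
        writeSilence(tmp, 8000f, 800); // 0.1 s of silence at 8 kHz
        System.out.println(hasExpectedFormat(tmp, 8000f));
        tmp.delete();
    }
}
```

A check like this could run right before the upload step, so that a file recorded with the wrong settings is caught locally instead of producing a confusing response from the service.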

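1. Design approach

Draw a UI with Start and Stop buttons to control voice input
When Start is clicked, open the microphone
Add a thread that monitors the microphone and stores its input as an audio file in the chosen format
When Stop is clicked, call the speech recognition REST API, reading the audio file just generated
Send the request to the remote recognition service, which returns the recognized result for the audio file

With this approach in mind, the implementation is as follows: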

package com.baidu.speech.serviceapi;

import javax.swing.*;
import javax.xml.bind.DatatypeConverter;

import org.json.JSONObject;

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.sound.sampled.*;

public class AudioUI extends JFrame {

	AudioFormat audioFormat;
	TargetDataLine targetDataLine;

	final JButton captureBtn = new JButton("Capture");
	final JButton stopBtn = new JButton("Stop");

	final JPanel btnPanel = new JPanel();
	final ButtonGroup btnGroup = new ButtonGroup();
	final JRadioButton aifcBtn = new JRadioButton("AIFC");
	final JRadioButton aiffBtn = new JRadioButton("AIFF");
	final JRadioButton auBtn = // selected at startup
	new JRadioButton("AU", true);
	final JRadioButton sndBtn = new JRadioButton("SND");
	final JRadioButton waveBtn = new JRadioButton("WAVE");
	
	// Variables for the REST call
	private static final String serverURL = "http://vop.baidu.com/server_api";
	private static String token = "";
	private static final String testFileName = "C:\\Users\\Administrator\\workspace\\speechrecognition\\output.wav";
	// put your own params here
	private static final String apiKey = "***";// the apiKey shown on the application card you applied for earlier
	private static final String secretKey = "***";// the secretKey shown on the same application card
	private static final String cuid = "***";// cuid uniquely identifies the device; on my PC I used the network card's MAC address

	public static void main(String args[]) {
		new AudioUI();
	}// end main

	public AudioUI() {// constructor
		captureBtn.setEnabled(true);
		stopBtn.setEnabled(false);

		// Register anonymous listeners
		captureBtn.addActionListener(new ActionListener() {
			public void actionPerformed(ActionEvent e) {
				captureBtn.setEnabled(false);
				stopBtn.setEnabled(true);
				// Capture input data from the
				// microphone until the Stop button is
				// clicked.
				captureAudio();
			}// end actionPerformed
		}// end ActionListener
		);// end addActionListener()

		stopBtn.addActionListener(new ActionListener() {
			public void actionPerformed(ActionEvent e) {
				captureBtn.setEnabled(true);
				stopBtn.setEnabled(false);
				// Terminate the capturing of input data
				// from the microphone.
				targetDataLine.stop();
				targetDataLine.close();
				try {
					getToken();
					method1();
			        method2();
				} catch (Exception e1) {
					// TODO Auto-generated catch block
					e1.printStackTrace();
				}
		        
			}// end actionPerformed
		}// end ActionListener
		);// end addActionListener()

		// Put the buttons in the JFrame
		getContentPane().add(captureBtn);
		getContentPane().add(stopBtn);

		// Include the radio buttons in a group
		btnGroup.add(aifcBtn);
		btnGroup.add(aiffBtn);
		btnGroup.add(auBtn);
		btnGroup.add(sndBtn);
		btnGroup.add(waveBtn);

		// Add the radio buttons to the JPanel
		btnPanel.add(aifcBtn);
		btnPanel.add(aiffBtn);
		btnPanel.add(auBtn);
		btnPanel.add(sndBtn);
		btnPanel.add(waveBtn);

		// Put the JPanel in the JFrame
		getContentPane().add(btnPanel);

		// Finish the GUI and make visible
		getContentPane().setLayout(new FlowLayout());
		setTitle("Copyright 2003, R.G.Baldwin");
		setDefaultCloseOperation(EXIT_ON_CLOSE);
		setSize(300, 120);
		setVisible(true);
	}// end constructor

	// This method captures audio input from a
	// microphone and saves it in an audio file.
	private void captureAudio() {
		try {
			// Get things set up for capture
			audioFormat = getAudioFormat();
			DataLine.Info dataLineInfo = new DataLine.Info(TargetDataLine.class, audioFormat);
			targetDataLine = (TargetDataLine) AudioSystem.getLine(dataLineInfo);

			// Create a thread to capture the microphone
			// data into an audio file and start the
			// thread running. It will run until the
			// Stop button is clicked. This method
			// will return after starting the thread.
			new CaptureThread().start();
		} catch (Exception e) {
			e.printStackTrace();
			System.exit(0);
		} // end catch
	}// end captureAudio method

	// This method creates and returns an
	// AudioFormat object for a given set of format
	// parameters. If these parameters don't work
	// well for you, try some of the other
	// allowable parameter values, which are shown
	// in comments following the declarations.
	private AudioFormat getAudioFormat() {
		float sampleRate = 8000.0F;
		// 8000,11025,16000,22050,44100
		int sampleSizeInBits = 16;
		// 8,16
		int channels = 1;
		// 1,2
		boolean signed = true;
		// true,false
		boolean bigEndian = false;
		// true,false
		return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
	}// end getAudioFormat
	// =============================================//

	// Inner class to capture data from microphone
	// and write it to an output audio file.
	class CaptureThread extends Thread {
		public void run() {
			AudioFileFormat.Type fileType = null;
			File audioFile = null;

			// Set the file type and the file extension
			// based on the selected radio button.
			if (aifcBtn.isSelected()) {
				fileType = AudioFileFormat.Type.AIFC;
				audioFile = new File("output.aifc");
			} else if (aiffBtn.isSelected()) {
				fileType = AudioFileFormat.Type.AIFF;
				audioFile = new File("output.aif");
			} else if (auBtn.isSelected()) {
				fileType = AudioFileFormat.Type.AU;
				audioFile = new File("output.au");
			} else if (sndBtn.isSelected()) {
				fileType = AudioFileFormat.Type.SND;
				audioFile = new File("output.snd");
			} else if (waveBtn.isSelected()) {
				fileType = AudioFileFormat.Type.WAVE;
				audioFile = new File("output.wav");
			} // end if

			try {
				targetDataLine.open(audioFormat);
				targetDataLine.start();
				AudioSystem.write(new AudioInputStream(targetDataLine), fileType, audioFile);
			} catch (Exception e) {
				e.printStackTrace();
			} // end catch

		}// end run
	}// end inner class CaptureThread
	// =============================================//
	
	private static void getToken() throws Exception {
        String getTokenURL = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials" + 
            "&client_id=" + apiKey + "&client_secret=" + secretKey;
        HttpURLConnection conn = (HttpURLConnection) new URL(getTokenURL).openConnection();
        token = new JSONObject(printResponse(conn)).getString("access_token");
    }

    private static void method1() throws Exception {
        File pcmFile = new File(testFileName);
        HttpURLConnection conn = (HttpURLConnection) new URL(serverURL).openConnection();

        // construct params
        JSONObject params = new JSONObject();
        params.put("format", "pcm");
        params.put("rate", 8000);
        params.put("lan", "en");
        params.put("channel", "1");
        params.put("token", token);
        params.put("cuid", cuid);
        params.put("len", pcmFile.length());
        params.put("speech", DatatypeConverter.printBase64Binary(loadFile(pcmFile)));

        // add request header
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");

        conn.setDoInput(true);
        conn.setDoOutput(true);

        // send request
        DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
        wr.writeBytes(params.toString());
        wr.flush();
        wr.close();

        printResponse(conn);
    }

    private static void method2() throws Exception {
        File pcmFile = new File(testFileName);
        HttpURLConnection conn = (HttpURLConnection) new URL(serverURL
                + "?cuid=" + cuid + "&token=" + token).openConnection();

        // add request header
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "audio/pcm; rate=8000");

        conn.setDoInput(true);
        conn.setDoOutput(true);

        // send request
        DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
        wr.write(loadFile(pcmFile));
        wr.flush();
        wr.close();

        printResponse(conn);
    }

    private static String printResponse(HttpURLConnection conn) throws Exception {
        if (conn.getResponseCode() != 200) {
            // request error
            return "";
        }
        InputStream is = conn.getInputStream();
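        // Decode explicitly as UTF-8 so Chinese results do not depend on the platform default charset
        BufferedReader rd = new BufferedReader(new InputStreamReader(is, "UTF-8"));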
        String line;
        StringBuffer response = new StringBuffer();
        while ((line = rd.readLine()) != null) {
            response.append(line);
            response.append('\r');
        }
        rd.close();
        System.out.println(new JSONObject(response.toString()).toString(4));
        return response.toString();
    }

    private static byte[] loadFile(File file) throws IOException {
        InputStream is = new FileInputStream(file);

        long length = file.length();
        byte[] bytes = new byte[(int) length];

        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }

        if (offset < bytes.length) {
            is.close();
            throw new IOException("Could not completely read file " + file.getName());
        }

        is.close();
        return bytes;
    }

}

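Note that apiKey, secretKey, and cuid above must all be filled in with the values you obtained when applying. testFileName should likewise point to the location where output.wav is written on your machine.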
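One portability note on method1: javax.xml.bind.DatatypeConverter is no longer bundled with the JDK from Java 11 onward. A possible replacement is java.util.Base64, available since Java 8. The sketch below shows how the speech and len fields could be prepared that way. SpeechPayloadSketch, encodeSpeech, and speechFields are hypothetical names, not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class SpeechPayloadSketch {

    // Base64-encode the raw audio bytes, as method1 does for the "speech" field.
    static String encodeSpeech(byte[] audio) {
        return Base64.getEncoder().encodeToString(audio);
    }

    // Collect the two audio-dependent fields. "len" is the size of the raw
    // audio in bytes (method1 uses the file length), not the Base64 length.
    static Map<String, Object> speechFields(byte[] audio) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("len", audio.length);
        fields.put("speech", encodeSpeech(audio));
        return fields;
    }

    public static void main(String[] args) {
        byte[] sample = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(speechFields(sample)); // prints {len=5, speech=aGVsbG8=}
    }
}
```

In method1, each entry of the returned map could then be copied into params with params.put(key, value), leaving the rest of the request unchanged.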

2. Results

The UI that pops up

Select "WAVE", click Capture to start voice input, then click Stop to get the result

{
    "access_token": "24.***a82646fffd31.259**2335-7980222",
    "refresh_token": "25.18***6f.315360000.1***-7980222",
    "scope": "public audio_voice_assistant_get wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian",
    "session_key": "***URC38LpHQ+crR5n6hQ***zVZRBK/rpVGeNviJXnmJpFIwpsT97C4xvsD",
    "session_secret": "***db82f505cba***",
    "expires_in": 2592000
}
{
    "result": [
        "hello how are you, ",
        "hello oh how are you, ",
        "hello how are u, ",
        "halo how are you, ",
        "hollow how are you, "
    ],
    "err_msg": "success.",
    "sn": "776663153181460775167",
    "corpus_no": "6273981573367938505",
    "err_no": 0
}
{
    "result": ["哈嘍好玩喲，"],
    "err_msg": "success.",
    "sn": "496395116711460775168",
    "corpus_no": "6273981574058301747",
    "err_no": 0
}

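Here I spoke "hello, how are you", and the recognition results look very good. Baidu speech recognition supports three languages (Mandarin Chinese, English, and Cantonese). Of the two recognition responses above (after the token response), the first is the English version, returned by method1, which sets lan to en; the second is the Chinese version, returned by method2, which does not specify a language.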
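To use a recognition response programmatically, an application would typically check err_no and then take the first entry of result. The post's code could do this with org.json; the sketch below instead uses plain regular expressions so that it runs without extra libraries. It is an illustration only: RecognitionResultSketch and bestCandidate are invented names, it assumes that err_no 0 means success as in the responses above, and it does not decode JSON escape sequences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecognitionResultSketch {

    private static final Pattern ERR_NO =
        Pattern.compile("\"err_no\"\\s*:\\s*(-?\\d+)");
    // First string inside the "result" array (escape sequences are left undecoded).
    private static final Pattern FIRST_RESULT =
        Pattern.compile("\"result\"\\s*:\\s*\\[\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the top candidate without trailing commas/whitespace,
    // or null if err_no is missing or non-zero, or no result is present.
    static String bestCandidate(String json) {
        Matcher err = ERR_NO.matcher(json);
        if (!err.find() || Integer.parseInt(err.group(1)) != 0) {
            return null;
        }
        Matcher first = FIRST_RESULT.matcher(json);
        if (!first.find()) {
            return null;
        }
        // Strip the trailing ASCII or full-width comma the service appends.
        return first.group(1).replaceAll("[,\\uFF0C\\s]+$", "");
    }

    public static void main(String[] args) {
        String json = "{\"result\":[\"hello how are you, \",\"halo how are you, \"],"
                    + "\"err_msg\":\"success.\",\"err_no\":0}";
        System.out.println(bestCandidate(json)); // prints hello how are you
    }
}
```

In a real project a proper JSON parser is the safer choice; the regular expressions here only cover the response shape shown in this post.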

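With microphone support, the REST API becomes much more practical: you can integrate this approach into your own application to implement live voice input, voice search, and similar features.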

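annyang

1. Introducing annyang

annyang is a JS library that can be integrated into your application and is only 2 KB in size. The easiest way to see how well it works is to try its demo sites:

http://www.jq22.com/yanshi216    (annyang 1.0.0)

https://www.talater.com/annyang/    (annyang 2.3.0)

The console shows that the site has loaded four commands: hello (there), show me *search, show :type report, and let's get started.

Here I said "hello there". The console shows the recognized text as hello there, which matches the loaded command hello (there), so the site automatically jumps to the "hello" section.

Here I said "show me voice search", which matches the show me *search command, and the page jumps to the "show me" section.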
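The three command styles seen here (an optional word in parentheses, a :named single-word parameter, and a *splat that captures the rest of the phrase) can be thought of as shorthand for regular expressions. The Java sketch below illustrates that idea. It is a simplified illustration and not annyang's actual source; the class and method names are invented, and it assumes that optional parts contain only literal words and that commands contain no other regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandPatternSketch {

    // Translate a command such as "show :type report" into a regular expression.
    static Pattern toRegex(String command) {
        String regex = command
            // "(there)" -> optional group, swallowing the spaces around it
            .replaceAll("\\s*\\((.*?)\\)\\s*", "(?:$1)?")
            // ":type" -> one word; the lookbehind skips the "?:" added above
            .replaceAll("(?<!\\?):\\w+", "(\\\\S+)")
            // "*search" -> everything that remains
            .replaceAll("\\*\\w+", "(.*?)")
            // allow optional whitespace around each optional group
            .replaceAll("(\\(\\?:[^)]+\\))\\?", "\\\\s*$1?\\\\s*");
        return Pattern.compile("^" + regex + "$", Pattern.CASE_INSENSITIVE);
    }

    // Returns the captured parameters if the phrase matches, otherwise null.
    static List<String> match(String command, String spoken) {
        Matcher m = toRegex(command).matcher(spoken.trim());
        if (!m.matches()) {
            return null;
        }
        List<String> params = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            params.add(m.group(i));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(match("hello (there)", "hello there"));          // prints []
        System.out.println(match("show me *search", "show me voice search")); // prints [voice search]
        System.out.println(match("show :type report", "show sales report")); // prints [sales]
        System.out.println(match("show :type report", "let's get started")); // prints null
    }
}
```

Seen this way, "hello there" matches hello (there) with no captured parameters, while "show me voice search" matches show me *search and hands "voice search" to the command's callback.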

2. Integrating annyang into your own application

Adding annyang support to my own GitHub project, angelloExtend, takes only two steps:

(1) Include annyang.js

Add the following to boot.js:

{ file: '//cdnjs.cloudflare.com/ajax/libs/annyang/2.3.0/annyang.min.js'},

(2) In DataCtroller.js, add a command and start annyang

if (annyang) {
	// Let's define our first command. First the text we expect, and then the function it should call
	var commands = {
		'show bar chart': function() {
			alert("hahahha~~~~");
			myUser.show('bar');
		}
	};

	// Add our commands to annyang
	annyang.addCommands(commands);

	// Start listening. You can call this here, or attach this call to an event, button, etc.
	annyang.start();
}

For details, see the project: https://github.com/DMinerJackie/angelloExtend

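To recap, this post covered:

1. Testing and verifying Baidu speech recognition on Android

2. Speech recognition through the REST API, with microphone capture added for live voice input and recognition

3. An introduction to annyang, with a demonstration of how to use and integrate it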

