Reposted from: http://www.voidcn.com/relative/p-fwdkigvh-bro.html
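A PCM file stores the raw audio waveform as a binary stream. It has no file header.
(1) First, determine the sample size (bit depth) of each sample in the PCM file, usually 8 bits or 16 bits.
(2) Next, determine whether the audio is mono or stereo. In stereo, the samples of the two channels are interleaved, so each channel's data has to be extracted separately.
(3) Then determine whether the samples are signed. For example, a signed 16-bit sample ranges from -32768 to 32767.
(4) Determine whether the sample bytes are stored big-endian or little-endian. This usually follows the byte order of the machine that recorded the file. For background, see http://blog.csdn.net/u013378306/article/details/78904238
(5) Using the four parameters above, parse the PCM file and convert it to a file of decimal values.
Note: for items 1 to 3, on Windows you can use the Cool Edit tool to play the PCM file with different settings until it sounds right. You can also test with the Java code below.
The sample recording used in this example is 1 second of silence, then the spoken words "你好" (hello), then 2 seconds of silence. The PCM file can be downloaded from: http://download.csdn.net/download/u013378306/10175068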
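Steps (1) to (4) can be sketched in a few lines of Java. This is only a minimal sketch under stated assumptions, not a definitive implementation: the class name `PcmDecodeSketch` and both method names are made up for illustration, `decode16` assumes signed 16-bit samples, and `decodeUnsigned8` assumes unsigned 8-bit samples centered on 128.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of the original post.
final class PcmDecodeSketch {

    /**
     * Decodes signed 16-bit PCM into one array per channel.
     * result[c][n] is sample n of channel c.
     */
    static short[][] decode16(byte[] raw, int channels, boolean bigEndian) {
        ByteBuffer buf = ByteBuffer.wrap(raw)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        // One frame holds one sample per channel; a trailing partial frame is ignored.
        int frames = raw.length / (2 * channels);
        short[][] out = new short[channels][frames];
        for (int n = 0; n < frames; n++) {
            for (int c = 0; c < channels; c++) {
                // Channels are interleaved: ch0, ch1, ch0, ch1, ...
                out[c][n] = buf.getShort();
            }
        }
        return out;
    }

    /** Maps an unsigned 8-bit sample (0..255) to a centered value (-128..127). */
    static int decodeUnsigned8(byte b) {
        return (b & 0xFF) - 128;
    }
}
```

For the mono, little-endian file used in this post, the call would be `PcmDecodeSketch.decode16(bytes, 1, false)[0]`.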
package test;

import java.nio.file.Files;
import java.nio.file.Paths;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;

public class test {

    public static void main(String[] args) throws Exception {
        // Read the whole file; a single InputStream.read() call may return early.
        byte[] audioData = Files.readAllBytes(Paths.get("3.pcm"));
        System.out.println(audioData.length);

        float sampleRate = 20000;   // samples per second
        int sampleSizeInBits = 16;  // bits per sample
        int channels = 1;           // number of channels (1 = mono, 2 = stereo)
        boolean signed = true;      // whether the samples are signed
        boolean bigEndian = false;  // byte order within one sample (false = little-endian)

        AudioFormat af = new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
        SourceDataLine sdl = (SourceDataLine) AudioSystem.getLine(info);
        sdl.open(af);
        sdl.start();

        // write() only accepts a whole number of frames, so drop any trailing partial frame.
        int usable = audioData.length - audioData.length % af.getFrameSize();
        int offset = 0;
        while (offset < usable) {
            offset += sdl.write(audioData, offset, usable - offset);
        }
        sdl.drain(); // wait until everything queued has been played
        sdl.close();
    }
}
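Once the test playback sounds right and the parameters are confirmed, the PCM file can be parsed. The Java code below converts a mono PCM file with 16-bit, little-endian samples into a file of decimal values.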
package test;

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ffff {

    /**
     * Parses mono PCM with 16-bit little-endian samples into a file of decimal values.
     */
    public static void main(String[] args) {
        try {
            byte[] buffers = Files.readAllBytes(Paths.get("3.pcm"));
            System.out.println(buffers.length);

            StringBuilder rs = new StringBuilder();
            // Two bytes per sample; stop before a trailing odd byte.
            for (int i = 0; i + 1 < buffers.length; i += 2) {
                // Little-endian: low byte first, high byte (with the sign) second.
                int s = (short) ((buffers[i + 1] << 8) | (buffers[i] & 0xFF));
                rs.append(' ').append(s);
            }
            writeFile(rs.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String s) {
        try (FileWriter fw = new FileWriter("hello3.txt")) {
            fw.write(s);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
After it runs, open hello3.txt. At the start the amplitude is very small, mostly under 100, as shown here:
-15 -12 -18 -24 -17 -8 -8 -17 -22 -14 -5 -18 -47 -67 -60 -41 -28 -28 -23 -12 -6 -9 -13 -8 0 6 21 49 68 48 -2 -43 -47 -32 -22 -10 22 56
But while "你好" is being spoken, the amplitude becomes much larger:
-2507 -2585 -2600 -2596 -2620 -2670 -2703 -2674 -2581 -2468 -2378 -2305 -2200 -2018 -1774 -1523 -1307 -1127 -962 -806 -652 -505 -384 -313 -281 -241 -163
Then, during the 2 seconds of silence, the amplitude becomes small again:
5 3 0 -4 -5 -6 -6 -7 -7 -8 -9 -8 -10 -10 -11 -10 -11 -11 -11 -11 -11 -11 -10 -9 -7 -6 -3 -2 -2 -3 -3 -3 -1 2 4 4
The waveform itself can be plotted with the following Python code:
import pylab as pl

# The original code used open("hello3.txt", "rb"). The "b" means binary mode,
# which is wrong here, so the file is read as text instead.
with open("hello3.txt", "r") as f:
    text = f.read()

# split() with no argument skips the empty tokens between spaces
ays = [int(token) for token in text.split()]
axs = list(range(1, len(ays) + 1))

pl.plot(axs, ays, "ro")  # use pylab to plot x and y
pl.show()                # show the plot on the screen
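This produces the waveform plot.
The amplitudes match what Audacity shows. The plot here just scales the amplitude up so that it is easier to see by eye.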
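Update, 2019-11-20:
In practice, it works to take a unit of time and check whether the data inside that window contains any significant amplitude.
(My math is not great, so I will just pick some letters to explain.)
Let the time unit be t and the audio sample rate be S. If the amplitude (or the computed decibel level) stays very small for a continuous period of length t, the period can be treated as silence (no sound was recorded).
The length of the data to check is L = S*t, so the target is an array of length L. If the amplitude (or decibel) data in this window is below a threshold, it is treated as approximately silent.
Example: with a sample rate of 16000, treat 2 seconds without sound as no input. That means every value in an array of length 2*16000 = 32000 is below the threshold.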
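The window check described above can be sketched in Java as follows. This is one possible reading of the idea, not a definitive implementation: the class `SilenceSketch`, the method `findSilence`, and the use of the per-sample absolute value (rather than a decibel value) are assumptions made for illustration.

```java
// Hypothetical helper for illustration; not part of the original post.
final class SilenceSketch {

    /**
     * Finds the first run of windowLen consecutive samples whose absolute
     * value is below threshold.
     *
     * @return the index of the first sample of that run, or -1 if there is none
     */
    static int findSilence(short[] samples, int windowLen, int threshold) {
        if (windowLen <= 0) {
            throw new IllegalArgumentException("windowLen must be positive");
        }
        int quiet = 0; // length of the current run of quiet samples
        for (int i = 0; i < samples.length; i++) {
            // The short is promoted to int, so Math.abs(-32768) is safe here.
            if (Math.abs(samples[i]) < threshold) {
                quiet++;
                if (quiet >= windowLen) {
                    return i - windowLen + 1;
                }
            } else {
                quiet = 0; // a loud sample breaks the run
            }
        }
        return -1;
    }
}
```

For the example above (S = 16000, t = 2 seconds), the call would be `SilenceSketch.findSilence(samples, 2 * 16000, 100)`. The threshold of 100 is only a guess based on the quiet sections of the sample file shown earlier and would need tuning for other recordings.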
Stack Overflow answer:
Reference: https://stackoverflow.com/questions/5800649/detect-silence-when-recording
How can I detect silence when recording operation is started in Java?
Calculate the dB or RMS value for a group of sound frames and decide at what level it is considered to be 'silence'.
What is PCM data?
Data that is in Pulse-code modulation format.
How can I calculate PCM data in Java?
I do not understand that question. But guessing it has something to do with the speech-recognition tag, I have some bad news.
This might theoretically be done using the Java Speech API. But there are apparently no 'speech to text' implementations available for the API (only 'text to speech').
I have to calculate rms for speech-recognition project. But I do not know how can I calculate in Java.
For a single channel that is represented by signal sizes in a double ranging from -1 to 1, you might use this method.
/** Computes the RMS volume of a group of signal sizes ranging from -1 to 1. */
public double volumeRMS(double[] raw) {
    double sum = 0d;
    if (raw.length == 0) {
        return sum;
    } else {
        for (int ii = 0; ii < raw.length; ii++) {
            sum += raw[ii];
        }
    }
    double average = sum / raw.length;

    double sumMeanSquare = 0d;
    for (int ii = 0; ii < raw.length; ii++) {
        sumMeanSquare += Math.pow(raw[ii] - average, 2d);
    }
    double averageMeanSquare = sumMeanSquare / raw.length;
    double rootMeanSquare = Math.sqrt(averageMeanSquare);

    return rootMeanSquare;
}
There is a byte buffer to save input values from the line, and what I should have to do with this buffer?
If using the volumeRMS(double[]) method, convert the byte values to an array of double values ranging from -1 to 1. ;)
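The answer leaves the byte-to-double conversion to the reader. A minimal sketch of that step might look like the following. It assumes mono, signed, 16-bit, little-endian PCM (the format used earlier in this post), and the class and method names are made up for illustration.

```java
// Hypothetical helper for illustration; not part of the quoted answer.
final class PcmToDoubleSketch {

    /**
     * Converts signed 16-bit little-endian PCM bytes to doubles.
     * Values fall in the range -1.0 (inclusive) to 1.0 (exclusive).
     */
    static double[] toDoubles(byte[] raw) {
        int n = raw.length / 2; // two bytes per sample; a trailing odd byte is ignored
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = raw[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = raw[2 * i + 1];    // high byte, keeps the sign
            short sample = (short) ((hi << 8) | lo);
            out[i] = sample / 32768.0;
        }
        return out;
    }
}
```

The result can then be passed on, for example `volumeRMS(PcmToDoubleSketch.toDoubles(buffer))`, where `buffer` stands for the bytes read from the line.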
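My own approach is to compute the decibel level of the audio. For reference, see "Computing decibels from PCM audio data".
The code below is reposted from: https://blog.csdn.net/balijinyi/article/details/80284520
There are many situations where the decibel level of live audio needs to be displayed dynamically. Three ways of computing it are listed below. (The examples use 16-bit samples, that is, one short per sample, with a maximum value of 32767.)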
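1: Compute a level from the audio data and its size
First, add up the absolute value of every sample and divide by the number of samples to get the average energy of the sound.
Then scale that value proportionally from the range 0 to 32767 down to the range 0 to 100 to get a quantized value.
Human speech usually sits in the lower part of the energy range, so the quantized values mostly fall in the small interval from 1 to 20 and do not respond very visibly to changes.
For that reason the value is multiplied by 5. Any result above 100 is clamped to 100.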
#include <stdlib.h>

#define VOLUMEMAX 32767

// Parameters: the sample data and the number of samples
// Returns: a level from 0 to 100
int SimpleCalculate_DB(short* pcmData, int sample)
{
    int ret = 0;
    if (sample > 0) {
        long long sum = 0; // wide accumulator so that long buffers cannot overflow
        for (int i = 0; i < sample; i++) {
            sum += abs(pcmData[i]);
        }
        // average / VOLUMEMAX * 100, amplified 5 times
        ret = (int)(sum * 500.0 / ((double)sample * VOLUMEMAX));
        if (ret >= 100) {
            ret = 100;
        }
    }
    return ret;
}
2: Compute the root mean square (RMS), that is, the energy
#include <cmath>
#include <cstddef>
#include <cstdint>

static const float kMaxSquaredLevel = 32768 * 32768;
constexpr float kMinLevel = 30.f;

// Returns the level as a positive number of dB below full scale:
// 0 is the loudest, kMinLevel is the quietest that is reported.
int Process(const int16_t* data, size_t length)
{
    if (length == 0) {
        return static_cast<int>(kMinLevel); // nothing to measure, report the minimum
    }
    float sum_square_ = 0;
    for (size_t i = 0; i < length; ++i) {
        sum_square_ += data[i] * data[i];
    }
    float rms = sum_square_ / (length * kMaxSquaredLevel);
    // 20log_10(x^0.5) = 10log_10(x)
    rms = 10 * std::log10(rms);
    if (rms < -kMinLevel)
        rms = -kMinLevel;
    rms = -rms;
    return static_cast<int>(rms + 0.5);
}
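3: Take the largest amplitude in the audio data (the largest absolute value, 0 to 32767) and divide it by 1000 to get an index (0 to 32). Then look up the level stored at that index in a table. (Extracted from WebRTC.)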
#include <cstddef>
#include <cstdint>
#include <cstdlib>

const int8_t permutation[33] =
    {0,1,2,3,4,4,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,9,9,9,9,9,9,9,9,9};

int16_t WebRtcSpl_MaxAbsValueW16C(const int16_t* vector, size_t length)
{
    size_t i = 0;
    int absolute = 0, maximum = 0;
    for (i = 0; i < length; i++) {
        absolute = abs((int)vector[i]);
        if (absolute > maximum) {
            maximum = absolute;
        }
    }
    if (maximum > 32767) {
        maximum = 32767;
    }
    return (int16_t)maximum;
}

// This state has to survive between calls. As locals, the counter would
// never reach 10 and the level would never be updated.
static int16_t _absMax = 0;
static int16_t _count = 0;
static int8_t _currentLevel = 0;

// Call once per audio frame; returns the most recently computed level (0-9).
int8_t ComputeLevel(const int16_t* data, size_t length)
{
    int16_t absValue = WebRtcSpl_MaxAbsValueW16C(data, length);
    if (absValue > _absMax)
        _absMax = absValue;
    if (_count++ == 10) {
        _count = 0;
        int32_t position = _absMax / 1000;
        if ((position == 0) && (_absMax > 250)) {
            position = 1;
        }
        _currentLevel = permutation[position];
        _absMax >>= 2; // decay the peak so the level can fall again
    }
    return _currentLevel;
}
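Since the playback and parsing code earlier in this post is written in Java, here is a rough Java sketch of method 2 for comparison. It is not a port of the WebRTC code: the class `DbfsSketch` and the method `rmsDbfs` are made-up names, and unlike `Process` it returns the unclamped, negative dBFS value (0 for a full-scale signal, more negative for quieter audio).

```java
// Hypothetical helper for illustration; not part of the reposted article.
final class DbfsSketch {

    /**
     * Returns the RMS level of 16-bit samples in dB relative to full scale.
     * An empty or all-zero buffer yields negative infinity.
     */
    static double rmsDbfs(short[] samples) {
        if (samples.length == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sumSquares = 0;
        for (short s : samples) {
            sumSquares += (double) s * s;
        }
        double meanSquare = sumSquares / samples.length;
        // 20*log10(sqrt(x) / 32768) == 10*log10(x / 32768^2)
        return 10.0 * Math.log10(meanSquare / (32768.0 * 32768.0));
    }
}
```

A caller could then treat a window as silent when `DbfsSketch.rmsDbfs(window)` falls below some chosen level. The level itself is a tuning choice that the source does not specify.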