Parsing and Processing the WAV File Format

This article is a reprint of the original. Posted 2019-03-07 14:35, in Android advanced / Java advanced.

RIFF file format

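RIFF stands for Resource Interchange File Format. It is the file structure that most multimedia files on Windows follow. The type of data a RIFF file contains is indicated by its file extension (for example .wav or .avi) and, inside the file, by the form type in the RIFF header.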

Chunk

A RIFF file can be viewed as a tree. Its basic unit is the "chunk", and each chunk consists of an identifier, a data size, and the data itself.

    public static class Chunk {
        // 4 bytes: the chunk identifier (FourCC)
        public String chunkId;
        // 4 bytes: the length of data, in bytes
        public int dataSize;
        // the chunk's data
        public byte[] data;
    }

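chunkId
4 bytes identifying the data the chunk contains, for example RIFF, LIST, fmt, data, WAVE or AVI. RIFF files are written in little-endian byte order.
dataSize
The length in bytes of the data stored in the data field. It does not include the 8-byte chunk header or any pad byte.
data
The chunk's data. Data is stored word-aligned: if its length in bytes is odd, a single zero pad byte is appended after it.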
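Reading the header layout above takes only a few lines of Java. This is a minimal sketch, assuming the chunk bytes are already in a `ByteBuffer`. `ChunkHeaderReader` and `Header` are illustrative names and not part of the original code. The sketch also applies the pad-byte rule: the number of bytes to skip on disk is the size rounded up to an even number.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: parses the 8-byte header of a RIFF chunk.
class ChunkHeaderReader {
    // Holds the ID, the declared data size and the padded size on disk.
    static final class Header {
        final String id;
        final int dataSize;
        final int paddedSize; // dataSize rounded up to an even number

        Header(String id, int dataSize) {
            this.id = id;
            this.dataSize = dataSize;
            this.paddedSize = dataSize + (dataSize & 1);
        }
    }

    // Reads the 4-byte ID and the 4-byte size at the buffer's current position.
    static Header readHeader(ByteBuffer buf) {
        byte[] idBytes = new byte[4];
        buf.get(idBytes);
        // RIFF sizes are little-endian regardless of the platform
        int size = buf.order(ByteOrder.LITTLE_ENDIAN).getInt();
        return new Header(new String(idBytes, StandardCharsets.US_ASCII), size);
    }
}
```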

Chunks can be nested, but only chunks whose ID is RIFF or LIST can contain other chunks.

RIFF chunk

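The chunk with ID RIFF is special. Every RIFF file must begin with a RIFF chunk, and a file may contain only this one RIFF chunk. The data field of the RIFF chunk begins with a 4-byte code (a FOURCC, the form type) that identifies the type of data held in the chunks of its data field. The rest of the data field consists of sub-chunks, as shown in the figure below.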

[Figure: RIFF chunk]

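The figure shows a RIFF chunk that contains two sub-chunks. The data field of the RIFF chunk begins with the 4-byte Form Type, followed by the two sub-chunks. Each sub-chunk has its own identifier, data size and data.
Apart from the RIFF chunk, the only other chunk that can contain sub-chunks is the LIST chunk. As with RIFF, the data field of a LIST chunk begins with a 4-byte list type (for example INFO), followed by its sub-chunks.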

[Figure: RIFF chunk containing a LIST chunk]

In the figure above, the file starts with the mandatory RIFF chunk, and its data field contains two sub-chunks. One of these sub-chunks is a LIST chunk, which in turn contains two sub-chunks.
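The nesting rule lends itself to a recursive walk: only RIFF and LIST are descended into, and every other chunk is skipped using its size plus the pad byte. The sketch below assumes the whole file fits in memory. `RiffTree` is a hypothetical name, and the output format (one indented line per chunk) is only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lists the chunk tree of a RIFF file held in memory.
class RiffTree {
    // Returns one line per chunk, indented two spaces per nesting level.
    static List<String> list(byte[] file) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        walk(buf, 0, file.length, 0, out);
        return out;
    }

    private static void walk(ByteBuffer buf, int start, int end, int depth, List<String> out) {
        int pos = start;
        while (pos + 8 <= end) {
            String id = fourCC(buf, pos);
            int size = buf.getInt(pos + 4); // absolute little-endian read
            if (size < 0) {
                break; // corrupt (or > 2 GB): stop instead of looping
            }
            String indent = "  ".repeat(depth);
            if ((id.equals("RIFF") || id.equals("LIST")) && size >= 4) {
                // container: a 4-byte form/list type, then nested sub-chunks
                out.add(indent + id + " (" + fourCC(buf, pos + 8) + ")");
                walk(buf, pos + 12, Math.min(pos + 8 + size, end), depth + 1, out);
            } else {
                out.add(indent + id + " " + size);
            }
            pos += 8 + size + (size & 1); // header + data + pad byte for odd sizes
        }
    }

    private static String fourCC(ByteBuffer buf, int offset) {
        byte[] b = new byte[4];
        for (int i = 0; i < 4; i++) {
            b[i] = buf.get(offset + i);
        }
        return new String(b, StandardCharsets.US_ASCII);
    }
}
```

For a WAV file laid out like the figure, the result would show `RIFF (WAVE)` at the top level, with `fmt `, `LIST (INFO)` and `data` indented below it.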

FourCC

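FourCC stands for Four-Character Code. It is a 4-byte (32-bit) identifier, typically used to identify a file's data format. For example, a media player can use a file's FourCC to decide which codec to use for decoding. Examples for video include DIV3, DIV4, DIVX and H264; examples for audio include WAV and MP3. In the RIFF files above, the FourCCs include RIFF, WAVE, fmt and data. A FourCC consists of 4 ASCII characters, and codes shorter than four characters are padded at the end with spaces (not NUL characters). For example, the FourCC fmt is actually 'f' 'm' 't' ' '.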
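The padding rule can be captured in a small helper. This sketch assumes plain ASCII codes, and `FourCC` is a hypothetical class rather than a library API. Comparing against the padded form means a lookup for "fmt" matches the on-disk bytes `'f' 'm' 't' ' '` but not `'f' 'm' 't' '\0'`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds and compares FourCC codes.
class FourCC {
    // Pads codes shorter than 4 characters with trailing spaces (not NUL).
    static byte[] of(String code) {
        if (code.length() > 4) {
            throw new IllegalArgumentException("FourCC longer than 4 characters: " + code);
        }
        StringBuilder sb = new StringBuilder(code);
        while (sb.length() < 4) {
            sb.append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // True if the 4 bytes at offset equal the (padded) code.
    static boolean matches(byte[] data, int offset, String code) {
        byte[] expected = of(code);
        for (int i = 0; i < 4; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```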

WAV

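WAV is an audio file format developed by Microsoft. It follows the RIFF standard described above and can be seen as a concrete instance of a RIFF file, so its basic building block is also the chunk. A WAV file usually has three chunks plus one optional chunk, arranged in this order: RIFF chunk, Format chunk, Fact chunk (optional), Data chunk.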

[Figure: chunk layout of a WAV file]

A WAV file starts with a RIFF chunk, which contains the Format chunk, the Data chunk and the optional Fact chunk. The fields of each chunk have the following meanings:

RIFF chunk

id	size	data
'R' 'I' 'F' 'F'	Size in bytes of its data field (file length - 8)	The form type 'WAVE', followed by the other chunks

Format chunk

id	size	data
'f' 'm' 't' ' '	See Chunk size below	See Chunk data below

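Chunk size

The size of the data field. Without an extension, the value is 16. With an extension, the value is 16 + 2 (the 2-byte extension length field) + the extension length. The value is 18 when only the 2-byte length field is present and its value is 0.

Chunk data

Stores the audio format, number of channels, sample rate and related information.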

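format_tag
2 bytes. The format of the audio data; a value of 1 means PCM.
channels
2 bytes. Number of channels: 1 for mono, 2 for stereo.
samples_per_sec
4 bytes. Sample rate; common values are 22.05 kHz, 44.1 kHz and 48 kHz.
bytes_per_sec
4 bytes. Byte rate, the number of bytes played per second: samples_per_sec * channels * bits_per_sample / 8. It can be used to estimate the required buffer size.
block_align
2 bytes. Block alignment unit: the size of one sample frame (one sample for every channel), channels * bits_per_sample / 8. Playback processes data in multiples of this many bytes.
bits_per_sample
2 bytes. Quantization bits per sample, for example 8, 16, 24 or 32.
cbSize
2 bytes. Length of the extension that follows.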
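Java's `ByteBuffer` in little-endian mode can decode these fields. This is a minimal sketch, assuming `data` holds only the chunk data (the bytes after the 8-byte 'fmt ' header). `FmtChunk` is an illustrative name. The 2-byte fields are unsigned, hence the `& 0xFFFF` masks. The `isConsistent` method checks block_align and bytes_per_sec against the formulas above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: decodes the data of a "fmt " chunk.
class FmtChunk {
    int formatTag;     // 2 bytes, 1 = PCM, 0xFFFE = extensible
    int channels;      // 2 bytes
    int samplesPerSec; // 4 bytes
    int bytesPerSec;   // 4 bytes
    int blockAlign;    // 2 bytes
    int bitsPerSample; // 2 bytes
    int cbSize;        // 2 bytes, present only when the chunk is at least 18 bytes

    static FmtChunk parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        FmtChunk f = new FmtChunk();
        // 16-bit fields are unsigned, so mask the sign-extended short
        f.formatTag = buf.getShort() & 0xFFFF;
        f.channels = buf.getShort() & 0xFFFF;
        f.samplesPerSec = buf.getInt();
        f.bytesPerSec = buf.getInt();
        f.blockAlign = buf.getShort() & 0xFFFF;
        f.bitsPerSample = buf.getShort() & 0xFFFF;
        f.cbSize = data.length >= 18 ? buf.getShort() & 0xFFFF : 0;
        return f;
    }

    // Checks the two derived fields against channels and bits_per_sample.
    boolean isConsistent() {
        int expectedAlign = channels * bitsPerSample / 8;
        return blockAlign == expectedAlign && bytesPerSec == samplesPerSec * expectedAlign;
    }
}
```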

Extension contents

22 bytes; the details are covered in the section on the extended format block below.

Fact chunk (optional)

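id	size	sample count
'f' 'a' 'c' 't'	Length of the data field: 4 (minimum 4)	Total number of samples (per channel)

WAV files that use a compressed encoding must have a Fact chunk. It holds a single value: the total number of samples per channel.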

Data chunk

id	size	data
'd' 'a' 't' 'a'	Length of the data field	The actual audio data

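Supplementary notes

Encoding formats in the Format chunk

Besides audio properties such as sample rate and number of channels, the Format chunk has another important field, format_tag, which specifies how the audio data is encoded. Its possible values include: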

Format code	Format name	fmt chunk length	fact chunk
1 (0x0001)	PCM / uncompressed	16
2 (0x0002)	Microsoft ADPCM	18	√
3 (0x0003)	IEEE float	18	√
6 (0x0006)	ITU G.711 a-law	18	√
7 (0x0007)	ITU G.711 μ-law	18	√
49 (0x0031)	GSM 6.10	20	√
64 (0x0040)	ITU G.721 ADPCM		√
65534 (0xFFFE)	See the sub-format code in the extension	40

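About the extended format block

When a WAV file does not use PCM encoding, it needs an extended format block. This block is extra data appended to the basic Format chunk. The first two bytes of this data give the length of the extension. The extension data follows, holding extended format information, and its length depends on the compression type. When an encoding (such as ITU G.711 a-law) has a zero-length extension, the length field must still be present, with its value set to 0.
The bytes of the extension have the following meanings: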

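size, 2 bytes
Length of the extension data, either 0 or 22.
valid_bits_per_sample, 2 bytes
Number of valid bits per sample, at most the sample size in bytes * 8. This field allows more flexible precision. Normally the bits per sample is a multiple of 8, but with WAVE_FORMAT_EXTENSIBLE the actual precision is given by valid_bits_per_sample in the extension and may be smaller than bits_per_sample in the Format chunk.
channel_mask, 4 bytes
Channel mask, mapping channels to speaker positions.
sub_format, 16 bytes
A GUID that includes the data format code.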
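The following sketch decodes the 22-byte extension described above. It assumes `ext` holds the bytes that follow the 2-byte size field; `ExtensibleExt` is a hypothetical name. The sketch relies on a property of the standard KSDATAFORMAT_SUBTYPE_* GUIDs: the first GUID field, stored little-endian, carries the ordinary format code (for example 1 for PCM and 3 for IEEE float). Other GUIDs would need a full 16-byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: decodes the WAVE_FORMAT_EXTENSIBLE extension
// (the 22 bytes after the 2-byte extension length field).
class ExtensibleExt {
    int validBitsPerSample; // 2 bytes
    long channelMask;       // 4 bytes, unsigned bit mask
    int subFormatCode;      // leading 2 bytes of the sub_format GUID

    static ExtensibleExt parse(byte[] ext) {
        if (ext.length < 22) {
            throw new IllegalArgumentException("extension must be 22 bytes, got " + ext.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(ext).order(ByteOrder.LITTLE_ENDIAN);
        ExtensibleExt e = new ExtensibleExt();
        e.validBitsPerSample = buf.getShort() & 0xFFFF;
        e.channelMask = buf.getInt() & 0xFFFFFFFFL;
        // Standard sub-format GUIDs start with the format code (1 = PCM, 3 = IEEE float)
        e.subFormatCode = buf.getShort() & 0xFFFF;
        return e;
    }
}
```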

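When format_tag in the Format chunk is 0xFFFE (WAVE_FORMAT_EXTENSIBLE), the encoding of the audio data is determined by sub_format in the extension. WAVE_FORMAT_EXTENSIBLE must be used in the following cases:

PCM data has more than 16 bits per sample
There are more than 2 channels
The actual number of valid bits per sample differs from the container size (for example, it is not a multiple of 8)
The storage order differs from the playback order, so the mapping from channels to sound card speaker positions must be specified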
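The four rules can be written as one predicate. This is a hypothetical helper (`ExtensibleRule.needsExtensible`). The caller must state whether a custom speaker mapping is needed, because the sample format alone cannot reveal that.

```java
// Hypothetical helper: true if a format must be written as WAVE_FORMAT_EXTENSIBLE.
class ExtensibleRule {
    static boolean needsExtensible(int bitsPerSample, int validBitsPerSample,
                                   int channels, boolean customSpeakerMapping) {
        return bitsPerSample > 16                       // PCM deeper than 16 bits
                || channels > 2                         // more than two channels
                || validBitsPerSample != bitsPerSample  // valid bits differ from the container
                || customSpeakerMapping;                // explicit channel-to-speaker mapping
    }
}
```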

Data chunk

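The Data chunk holds the audio sample data. Samples are written in chronological order. Samples that use more than one byte are stored little-endian (low byte at the lower address, high byte at the higher address). Multichannel samples are interleaved. For example, stereo samples are stored in this order: the first sample of channel 1, the first sample of channel 2; the second sample of channel 1, the second sample of channel 2; and so on. PCM data is stored in one of two ways:

8-bit samples (mono or multichannel) use offset binary: unsigned values from 0 to 255, with 128 as silence
All other bit depths use two's complement (signed) values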
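The following sketch illustrates interleaving and the two PCM encodings by splitting data chunk bytes into per-channel integer samples. It supports only 8-bit and 16-bit PCM, and `PcmDecoder` is an illustrative name. The byte index of channel `c` in frame `f` is `f * block_align + c * bytesPerSample`.

```java
// Hypothetical sketch: de-interleaves PCM bytes into one int array per channel.
class PcmDecoder {
    static int[][] decode(byte[] data, int channels, int bitsPerSample) {
        int bytesPerSample = bitsPerSample / 8;
        int frameSize = bytesPerSample * channels; // block_align
        int frames = data.length / frameSize;
        int[][] out = new int[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                int i = f * frameSize + c * bytesPerSample;
                if (bitsPerSample == 8) {
                    // offset binary: 0..255 with 128 as silence, re-centred on 0
                    out[c][f] = (data[i] & 0xFF) - 128;
                } else if (bitsPerSample == 16) {
                    // little-endian two's complement; the short cast restores the sign
                    out[c][f] = (short) ((data[i] & 0xFF) | (data[i + 1] << 8));
                } else {
                    throw new IllegalArgumentException("unsupported bits per sample: " + bitsPerSample);
                }
            }
        }
        return out;
    }
}
```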

Worked example

A typical WAV

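[Figure: hex dump of a typical WAV header]

RIFF chunk

As described above, the RIFF chunk has a fixed layout: RIFF, Size and the FOURCC.

RIFF

[Figure: the 'RIFF' identifier]
Size

[Figure: the RIFF Size field]

Because the bytes are little-endian, the actual hexadecimal value is 0x00077090, which is 487568. Adding 8 to this value (for the 'RIFF' ID and the Size field themselves) gives the file length.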
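Java's `ByteBuffer` can verify the little-endian reading. In this file, the four Size bytes appear on disk as 90 70 07 00, the reverse of 00077090. `LittleEndianDemo` is a hypothetical name. The mask keeps the value unsigned, because a RIFF size can in principle exceed `Integer.MAX_VALUE`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: reads an unsigned 32-bit little-endian value.
class LittleEndianDemo {
    static long readUInt32LE(byte[] b, int offset) {
        // getInt() returns a signed int; the mask keeps the full unsigned range
        return ByteBuffer.wrap(b, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
    }
}
```

Reading offset 4 of the file header this way yields 487568, and adding 8 gives the file length of 487576 bytes.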
WAVE

[Figure: the 'WAVE' form type]

Format chunk

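ChunkId
"fmt ", as described above: 4 bytes, padded with a trailing space ' '.

[Figure: fmt chunk ID]
Chunk Size

[Figure: fmt chunk size]

Because the bytes are little-endian, the actual hexadecimal value is 0x00000010, which is 16: the length of the chunk data that follows.
Chunk Data
The chunk data of the fmt chunk holds the format information of the audio.

[Figure: fmt chunk data]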

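Name	Offset (within the fmt chunk)	Size	Byte order	Meaning	Value in this file
AudioFormat	0x08	2 bytes	little-endian	Audio format	1 (PCM audio data has the value 1, so this file has no Fact chunk)
NumChannels	0x0A	2 bytes	little-endian	Number of channels	2 (1: mono, 2: stereo)
SampleRate	0x0C	4 bytes	little-endian	Sample rate	44100
ByteRate	0x10	4 bytes	little-endian	Bytes per second	176400 = SampleRate * NumChannels * BitsPerSample / 8
BlockAlign	0x14	2 bytes	little-endian	Block alignment	4 = NumChannels * BitsPerSample / 8
BitsPerSample	0x16	2 bytes	little-endian	Bits per sample	16 (bit depth; 8: 8-bit, 16: 16-bit, 32: 32-bit)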

Data

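Because the data is PCM, the data chunk follows the fmt chunk directly.

Identifier 'data'

[Figure: data chunk ID]
Size: length of the audio data

[Figure: data chunk size]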

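Name	Offset (within the data chunk)	Size	Byte order	Content	Value in this file
ID	0x00	4 bytes	big-endian	'data' (0x64617461)	'data'
Size	0x04	4 bytes	little-endian	N	0x00077000 = 487424, which equals ByteRate * duration, so the audio is about 2.76 seconds long (487424 / 176400)
Data	0x08	N bytes	little-endian	Audio data	...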
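The duration follows directly from the data size and the byte rate in the fmt chunk. Here is a minimal sketch (hypothetical `WavDuration` class) that is valid only for uncompressed PCM:

```java
// Hypothetical sketch: duration of PCM data = data chunk size / bytes per second.
class WavDuration {
    static double seconds(long dataSize, int sampleRate, int channels, int bitsPerSample) {
        long byteRate = (long) sampleRate * channels * bitsPerSample / 8;
        return (double) dataSize / byteRate;
    }
}
```

With the values from this file, `seconds(487424, 44100, 2, 16)` gives about 2.763.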

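Summary

Header size
A typical WAV with PCM data has exactly the header structure shown above, and its header size is a fixed 44 bytes.
Code that processes WAV audio therefore often hard-codes this header offset.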

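Tracking down noise in WAV processing

In practice, however, I ran into the WAV header below. Its header length was different, and this caused noise in the later audio processing. Only after investigating did I find that the different header size was the cause.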

A less typical WAV

A WAV created by Adobe Premiere Pro CC.

[Figure: header of the WAV created by Adobe Premiere Pro CC]

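It contains a LIST chunk, and its fmt chunk size is 18.

[Figure: the LIST chunk]

Because of the LIST chunk (and the 18-byte fmt chunk), the usual hard-coded HEAD_SIZE of 44 is wrong.
The fix is to compute HEAD_SIZE from the file itself.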

LIST CHUNK

CHUNK ID
The CHUNK ID is "LIST".
CHUNK SIZE
The size is 0x58, which is 88 in decimal.
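The header size can be worked out by hand as a sum. The RIFF header takes 12 bytes ('RIFF', size and 'WAVE'). Each chunk before 'data' adds 8 bytes of header plus its padded body. The data chunk's own ID and size add a final 8 bytes. This sketch assumes the LIST chunk sits between fmt and data, which it must if it shifts the audio data. `HeadSizeDemo` is an illustrative name.

```java
// Hypothetical sketch: header size = bytes before the first audio sample.
class HeadSizeDemo {
    // chunkSizes: declared sizes of the chunks that precede "data", in file order
    static int headSize(int... chunkSizes) {
        int offset = 12; // "RIFF" + size + "WAVE"
        for (int size : chunkSizes) {
            offset += 8 + size + (size & 1); // ID + size field + body (+ pad byte)
        }
        return offset + 8; // "data" ID + its size field
    }
}
```

For a plain PCM file, `headSize(16)` gives 44. With the 18-byte fmt chunk and the 88-byte LIST chunk of the Premiere file, `headSize(18, 88)` gives 142, assuming no other chunks precede data.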

Computing HEAD_SIZE

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    private static int getHeadSize(RandomAccessFile srcFis) throws IOException {
        srcFis.seek(0);
        int offset = 0;
        getChunkId(srcFis);   // "RIFF"
        offset += 4;
        getChunkSize(srcFis); // RIFF chunk size
        offset += 4;
        getChunkId(srcFis);   // form type "WAVE"
        offset += 4;
        // skip every chunk before "data" ("fmt ", LIST, fact, ...), whatever its size
        String chunkId = getChunkId(srcFis);
        offset += 4;
        while (!chunkId.equals("data")) {
            int skipLength = getChunkSize(srcFis);
            offset += 4;
            // chunk bodies are word-aligned: an odd size is followed by one pad byte
            skipLength += skipLength & 1;
            srcFis.seek(srcFis.getFilePointer() + skipLength);
            offset += skipLength;
            chunkId = getChunkId(srcFis); // throws EOFException if there is no data chunk
            offset += 4;
        }
        offset += 4; // the data chunk's size field
        System.out.println("headSize=" + offset);
        return offset;
    }

    // Reads a 4-byte little-endian chunk size
    private static int getChunkSize(RandomAccessFile srcFis) throws IOException {
        byte[] b = new byte[4];
        srcFis.readFully(b);
        int chunkSize = (b[0] & 0xFF) | ((b[1] & 0xFF) << 8)
                | ((b[2] & 0xFF) << 16) | ((b[3] & 0xFF) << 24);
        System.out.println("ChunkSize=" + chunkSize);
        return chunkSize;
    }

    // Reads a 4-character chunk ID
    private static String getChunkId(RandomAccessFile srcFis) throws IOException {
        byte[] bytes = new byte[4];
        srcFis.readFully(bytes);
        String chunkId = new String(bytes, StandardCharsets.US_ASCII);
        System.out.println("ChunkId=" + chunkId);
        return chunkId;
    }

Only a HEAD_SIZE computed this way lets the file be processed correctly and avoids the noise caused by a wrong header offset.

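Some WAV operations

Getting the data position for a given time in a WAV file

    private static int getPositionFromWave(float time, int sampleRate, int channels, int bitNum) {
        int byteNum = bitNum / 8;
        // time (s) * bytes per second = byte offset of that time
        int position = (int) ((double) time * sampleRate * channels * byteNum);
        // divide by the frame size (bytes per sample * channels) and multiply back,
        // rounding down to the start of a complete sample frame
        position = position / (byteNum * channels) * (byteNum * channels);
        return position;
    }

Bytes per second
sampleRate * channels * byteNum (the byte offset for time t is t times this value)
Aligning to the start of a complete sample frame
position = position / (byteNum * channels) * (byteNum * channels);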

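Cutting audio

The procedure for cutting audio is simple:

Compute the byte positions of the two time points, offset them by the header size, and copy the data between them.
Write a new header. Because the data length has changed, update the RIFF chunk's ChunkSize and the data chunk's length to the new length.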

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.charset.StandardCharsets;

    // Audio, AudioEncodeUtil, FileUtil, copyHeadData and copyData come from the author's project and are not shown here.
    public static void cutAudio(Audio audio, Audio audioOut, float cutStartTime, float cutEndTime) {
        if (cutStartTime == 0 && cutEndTime == audio.getTimeMillis() / 1000f) {
            return; // the range covers the whole file: nothing to cut
        }
        if (cutStartTime >= cutEndTime) {
            return; // empty or reversed range
        }
        String srcWavePath = audio.getPath();
        int sampleRate = audio.getSampleRate();
        int channels = audio.getChannel();
        int bitNum = audio.getBitNum();
        String tempOutPath = srcWavePath + ".temp";
        // try-with-resources closes both files, even if an exception is thrown
        try (RandomAccessFile srcFis = new RandomAccessFile(srcWavePath, "r");
             RandomAccessFile newFos = new RandomAccessFile(tempOutPath, "rw")) {
            newFos.setLength(0); // "rw" does not truncate an existing temp file
            // byte positions of the start and end times, and the number of bytes to copy
            final int cutStartPos = getPositionFromWave(cutStartTime, sampleRate, channels, bitNum);
            final int cutEndPos = getPositionFromWave(cutEndTime, sampleRate, channels, bitNum);
            final int contentSize = cutEndPos - cutStartPos;
            // write a new 44-byte header whose sizes match the cut length
            byte[] headerData = AudioEncodeUtil.getWaveHeader(contentSize, sampleRate, channels, bitNum);
            copyHeadData(headerData, newFos);
            // the real header size of the source, which may differ from 44
            int srcHeadSize = getHeadSize(srcFis);
            // move to the first byte to keep, then copy the cut range
            srcFis.seek(srcHeadSize + cutStartPos);
            copyData(srcFis, newFos, contentSize);
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        // move the temporary file to the output path
        FileUtil.renameFile(new File(tempOutPath), audioOut.getPath());
    }

    public static byte[] getWaveHeader(long totalAudioLen, int sampleRate, int channels, int bitNum) throws IOException {
        // the RIFF size excludes "RIFF" and the size field: 44 - 8 = 36, plus the PCM data size
        long totalDataLen = totalAudioLen + 36;
        int blockAlign = channels * bitNum / 8;         // bytes per sample frame
        long byteRate = (long) sampleRate * blockAlign; // sampleRate * channels * bitNum / 8
        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        // RIFF chunk
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        header.putInt((int) totalDataLen);
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        // fmt chunk: ID padded with a space, size 16 (plain PCM, no extension)
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        header.putInt(16);
        header.putShort((short) 1);          // audio format: 1 = PCM
        header.putShort((short) channels);   // number of channels
        header.putInt(sampleRate);           // samples per second, per channel
        header.putInt((int) byteRate);       // bytes per second
        header.putShort((short) blockAlign); // block align: channels * bitNum / 8
        header.putShort((short) bitNum);     // bits per sample
        // data chunk header
        header.put("data".getBytes(StandardCharsets.US_ASCII));
        header.putInt((int) totalAudioLen);
        return header.array();
    }

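Replacing and inserting audio

Compute the byte positions of the two time points, offset them by the header size, and replace the data between them with the desired audio.
Write a new header. Because the data length has changed, update the RIFF chunk's ChunkSize and the data chunk's length to the new length.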

    import java.io.File;
    import java.io.RandomAccessFile;

    // Audio, AudioEncodeUtil and copyData come from the author's project and are not shown here.
    public static void replaceAudioWithSame(Audio srcAudio, Audio coverAudio, Audio outAudio, float srcStartTime) {
        String srcWavePath = srcAudio.getPath();
        String coverWavePath = coverAudio.getPath();
        int sampleRate = srcAudio.getSampleRate();
        int channels = srcAudio.getChannel();
        int bitNum = srcAudio.getBitNum();
        String tempOutPcmPath = srcWavePath + ".tempPcm";
        // try-with-resources closes all three files, even if an exception is thrown
        try (RandomAccessFile srcFis = new RandomAccessFile(srcWavePath, "r");
             RandomAccessFile coverFis = new RandomAccessFile(coverWavePath, "r");
             RandomAccessFile newFos = new RandomAccessFile(tempOutPcmPath, "rw")) {
            newFos.setLength(0); // "rw" does not truncate an existing temp file
            int srcHeadSize = getHeadSize(srcFis);
            int coverHeadSize = getHeadSize(coverFis);
            final int srcStartPos = getPositionFromWave(srcStartTime, sampleRate, channels, bitNum);
            final int coverStartPos = 0;
            final int coverEndPos = (int) coverFis.length() - coverHeadSize;
            // 1. copy the source data before srcStartTime, skipping the source header
            srcFis.seek(srcHeadSize);
            copyData(srcFis, newFos, srcStartPos);
            // 2. copy the cover audio data, skipping the cover header
            coverFis.seek(coverHeadSize + coverStartPos);
            copyData(coverFis, newFos, coverEndPos - coverStartPos);
            // 3. copy the source data after srcStartTime + the cover duration
            final int srcStartAddCoverPosition = getPositionFromWave(
                    srcStartTime + coverAudio.getTimeMillis() / 1000f, sampleRate, channels, bitNum);
            final long srcFileSize = srcFis.length() - srcHeadSize;
            int remainSize = (int) (srcFileSize - srcStartAddCoverPosition);
            if (remainSize > 0) {
                srcFis.seek(srcHeadSize + srcStartAddCoverPosition);
                copyData(srcFis, newFos, remainSize);
            }
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        // wrap the raw PCM temp file in a WAV header, then delete the temp file
        AudioEncodeUtil.convertPcm2Wav(tempOutPcmPath, outAudio.getPath(), sampleRate, channels, bitNum);
        new File(tempOutPcmPath).delete();
    }

