Packing AAC-encoded data into FLV is straightforward.
1. FLV audio tag format
Byte values // byte position, meaning
0x08, // 0, TagType
0xzz, 0xzz, 0xzz, // 1-3, DataSize,
0xzz, 0xzz, 0xzz, 0xzz, // 4-6, 7 TimeStamp | TimeStampExtend
0x00, 0x00, 0x00, // 8-10, StreamID
0xzz, // 11, AudioTag Header
0xzz, // 12, AACPacketType, 0x00 or 0x01 (this byte is absent when the codec is not AAC)
0xzz ... 0xzz // audio data
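2. AudioTagHeader
The audio tag header is normally a single byte (two bytes for AAC). The first byte is laid out as follows:
Sound format 4 bits | Sampling rate 2 bits | Sample size 1 bit | Channels 1 bit |
Sound format, 4 bits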
0x00 = Linear PCM, platform endian
0x01 = ADPCM
0x02 = MP3
0x03 = Linear PCM, little endian
0x04 = Nellymoser 16-kHz mono
0x05 = Nellymoser 8-kHz mono
0x06 = Nellymoser
0x07 = G.711 A-law logarithmic PCM
0x08 = G.711 mu-law logarithmic PCM
0x09 = reserved
0x0A = AAC
0x0B = Speex
0x0E = MP3 8-kHz
0x0F = Device-specific sound
Sampling rate, 2 bits
0 = 5.5-kHz
1 = 11-kHz
2 = 22-kHz
3 = 44-kHz
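For AAC this field is always 3. That makes it look as if FLV cannot carry 48 kHz AAC, but this is not so: the real sampling rate, including 48 kHz, is declared later in the AAC sequence header.
Sample size, 1 bit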
0 = snd8Bit
1 = snd16Bit
Compressed audio is always 16 bit.
Channels, 1 bit
0 = sndMono
1 = sndStereo
For AAC this field is always 1.
In summary, for AAC at 48 kHz, 16-bit sample size, stereo, this byte is 0b1010 1111 = 0xAF.
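The bit packing above can be sketched in C as follows. The function name flv_audio_header_byte is made up for this illustration; it simply shifts the four fields into place and is not taken from any library.

```c
#include <stdint.h>

/* Hypothetical helper: packs the first AudioTagHeader byte as
 * SoundFormat (4 bits) | SoundRate (2 bits) | SoundSize (1 bit) | SoundType (1 bit). */
static uint8_t flv_audio_header_byte(unsigned sound_format, unsigned sound_rate,
                                     unsigned sound_size, unsigned sound_type)
{
    return (uint8_t)(((sound_format & 0x0F) << 4) |
                     ((sound_rate   & 0x03) << 2) |
                     ((sound_size   & 0x01) << 1) |
                      (sound_type   & 0x01));
}
```

Calling flv_audio_header_byte(0x0A, 3, 1, 1) reproduces the 0xAF value derived above for AAC.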
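Now the second byte. When the sound format is AAC (0x0A), the AudioTagHeader carries one extra byte, AACPacketType, which gives the type of the AACAUDIODATA that follows:
0x00 = AAC sequence header. It plays the same role as the SPS/PPS of H.264 and appears once, near the start of the FLV file.
0x01 = AAC raw, i.e. AAC frame data.
3. AAC sequence header
The AAC sequence header carries an AudioSpecificConfig, which holds more detailed audio information. It is defined in ISO 14496-3, section 1.6.2.1.
A simplified 2-byte AudioSpecificConfig is laid out as follows:
AAC profile 5 bits | Sampling rate index 4 bits | Channel configuration 4 bits | Other 3 bits |
AAC profile, 5 bits; see the object profiles table in ISO 14496-3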
AAC Main 0x01
AAC LC 0x02
AAC SSR 0x03
...
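(Some documents describe the profile field as 4 bits, but in practice it was verified to be 5 bits.)
Sampling rate index, 4 bits
samplingFrequencyIndex | Sampling frequency (Hz)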
0x00 96000
0x01 88200
0x02 64000
0x03 48000
0x04 44100
0x05 32000
0x06 24000
0x07 22050
0x08 16000
0x09 12000
0x0A 11025
0x0B 8000
0x0C 7350
0x0D reserved
0x0E reserved
0x0F escape value
Channel configuration, 4 bits
0x00 - defined in the audio object type specific config
0x01 mono (center front speaker)
0x02 stereo (left, right front speakers)
0x03 3 channels (center, left, right front speakers)
0x04 4 channels (center, left, right front speakers, rear surround speakers)
0x05 5 channels (center, left, right front speakers, left surround, right surround rear speakers)
0x06 5.1 channels (center, left, right front speakers, left surround, right surround rear speakers, front low frequency effects speaker)
0x07 7.1 channels (center, left, right center front speakers, left, right outside front speakers, left surround, right surround rear speakers, front low frequency effects speaker)
0x08-0x0F - reserved
The remaining 3 bits can simply be set to 0.
For AAC-LC, 48000 Hz, stereo, the sequence header is 0b 00010 0011 0010 000 = 0x11 0x90.
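As a check on the arithmetic, the simplified 2-byte AudioSpecificConfig can be assembled with a few shifts. This is only a sketch: aac_make_asc is a hypothetical name, and it covers just the 5 + 4 + 4 + 3 bit layout described above (no escape values, no extensions).

```c
#include <stdint.h>

/* Hypothetical helper: builds the simplified 2-byte AudioSpecificConfig
 * profile (5 bits) | samplingFrequencyIndex (4 bits) | channels (4 bits) | 000. */
static void aac_make_asc(unsigned profile, unsigned freq_index,
                         unsigned channels, uint8_t out[2])
{
    unsigned v = ((profile    & 0x1F) << 11) |
                 ((freq_index & 0x0F) << 7)  |
                 ((channels   & 0x0F) << 3);   /* low 3 bits stay 0 */
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}
```

With profile 2 (AAC LC), index 3 (48000) and 2 channels this yields 0x11 0x90, matching the value worked out by hand.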
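So the complete audio tag for the AAC sequence header is 0x08, 00 00 04, 00 00 00 00, 00 00 00, AF 00 11 90 | 00 00 00 0F
Some FLV files lack the AAC sequence header packet and still decode correctly. For RTMP playback, however, this header packet must be sent before the first audio data packet.
4. AAC audio packet
Structure: 0x08, 3-byte data size, 4-byte timestamp, 00 00 00, AF 01, N bytes of AAC data | previous tag size
where N is the length of the encoded raw AAC data, and the 3-byte data size = N + 2.
Previous tag size = 11 + 3-byte data size = 11 + N + 2 = 13 + N.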
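The size bookkeeping for an AAC tag can be put together as in the sketch below. The function flv_write_aac_tag is hypothetical, it hard-codes the 0xAF header byte derived earlier, and it assumes the caller supplies a buffer of at least 11 + 2 + n + 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes one FLV audio tag for AAC plus the trailing
 * previous-tag-size field. packet_type is 0 (sequence header) or 1 (raw AAC).
 * Returns the number of bytes written. */
static size_t flv_write_aac_tag(uint8_t *out, uint8_t packet_type,
                                const uint8_t *payload, uint32_t n,
                                uint32_t timestamp_ms)
{
    uint32_t data_size = n + 2;           /* header byte + AACPacketType + payload */
    uint32_t prev_size = 11 + data_size;  /* tag header (11 bytes) + data */
    size_t i = 0;

    out[i++] = 0x08;                            /* TagType: audio */
    out[i++] = (uint8_t)(data_size >> 16);      /* DataSize, 24-bit big-endian */
    out[i++] = (uint8_t)(data_size >> 8);
    out[i++] = (uint8_t)data_size;
    out[i++] = (uint8_t)(timestamp_ms >> 16);   /* TimeStamp: low 24 bits */
    out[i++] = (uint8_t)(timestamp_ms >> 8);
    out[i++] = (uint8_t)timestamp_ms;
    out[i++] = (uint8_t)(timestamp_ms >> 24);   /* TimeStampExtended: high 8 bits */
    out[i++] = 0; out[i++] = 0; out[i++] = 0;   /* StreamID */
    out[i++] = 0xAF;                            /* AudioTagHeader byte for AAC */
    out[i++] = packet_type;                     /* AACPacketType */
    memcpy(out + i, payload, n);
    i += n;
    out[i++] = (uint8_t)(prev_size >> 24);      /* previous tag size, 32-bit big-endian */
    out[i++] = (uint8_t)(prev_size >> 16);
    out[i++] = (uint8_t)(prev_size >> 8);
    out[i++] = (uint8_t)prev_size;
    return i;
}
```

Passing the two bytes 0x11 0x90 with packet_type 0 and timestamp 0 reproduces the 19-byte sequence header tag listed in section 3.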
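The FLV format is very simple and its headers add very little data, which makes it well suited to network transmission and explains why it is so widely used.
1. H.264 NALU structure
H.264 NALU: 0x00 00 00 01 | nalu_type (1 byte) | nalu_data (N bytes) | 0x00 00 00 01 | ...
start code (4 bytes) | type | data | start code of the next NALU
Each H.264 NALU here starts with the start code 0x00 00 00 01 (Annex B also allows a 3-byte form, 0x00 00 01, which this article does not handle); the start code never appears inside NALU_data.
The length of the current NALU is unknown until the next start code is found.
NALU_type is 1 byte, defined as: 1 forbidden bit | 2-bit importance indicator | 5-bit type
Forbidden bit: always 0.
Importance indicator: 11 = important, must not be dropped; 00 = unimportant, may be discarded.
Type: values 1-12 are used by H.264.
Some common NALU_type values:
0x67 (0 11 00111) SPS, very important, type = 7
0x68 (0 11 01000) PPS, very important, type = 8
0x65 (0 11 00101) IDR frame (key frame), very important, type = 5
0x41 (0 10 00001) P frame, important, type = 1
0x01 (0 00 00001) B frame, unimportant, type = 1
0x06 (0 00 00110) SEI, unimportant, type = 6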
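Splitting the NALU_type byte into its three fields takes only shifts and masks. The struct and function names below are invented for this example.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of the 1-byte NALU header. */
typedef struct {
    unsigned forbidden_bit;  /* 1 bit, always 0 in a valid stream */
    unsigned importance;     /* 2 bits, 0 = may be discarded */
    unsigned type;           /* 5 bits, e.g. 5 = IDR, 7 = SPS, 8 = PPS */
} NaluHeader;

static NaluHeader h264_parse_nalu_header(uint8_t b)
{
    NaluHeader h;
    h.forbidden_bit = (b >> 7) & 0x01;
    h.importance    = (b >> 5) & 0x03;
    h.type          =  b       & 0x1F;
    return h;
}
```

For 0x67 this gives importance 3 and type 7 (SPS); for 0x41 it gives importance 2 and type 1.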
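2. FLV tag
As mentioned earlier, an FLV file is made up of a long sequence of tags: video tags, audio tags, and script tags.
A/V tags hold the encoded audio and video data; script tags hold descriptive information about the stream.
In principle the A/V tags can be fully decoded without parsing the script tag. The fixed tag format is:
Tag Type (1 byte) | DataSize (3 bytes) | TimeStamp (3 bytes) | TimeStampExtended (1 byte) | StreamID (3 bytes) | ...
The following sections describe how each kind of NALU is wrapped in a tag.
2. Ordinary video tag
Byte values // byte position, meaning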
0x09, // 0, TagType
0xzz, 0xzz, 0xzz, // 1-3, DataSize,
0xzz, 0xzz, 0xzz, 0xzz, // 4-6, 7 TimeStamp | TimeStampExtend
0x00, 0x00, 0x00, // 8-10, StreamID
0xz7, // 11, FrameType | CodecID
0x01, // 12, AVCPacketType
0x00, 0x00, 0x00, // 13-15, CompositionTime
0xzz, 0xzz, 0xzz, 0xzz, // 16-19, NaluLength NBytes
0xzz, ...., ...., 0xzz, // NBytes, NaluData
0xzz, 0xzz, 0xzz, 0xzz. // N+1-N+3, PreviousTagSize
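Here 0xzz means the byte takes a value that depends on the actual data.
2.1 DataSize[0,1,2] = NaluLength + 5 + 4;
5 is the 5-byte AVC packet header (FrameType+CodecID | AVCPacketType | CompositionTime)
4 is the 4-byte NaluLength field
2.2 A raw H.264 stream has no notion of timestamps, so a default of 25 fps, i.e. one frame every 40 ms, can be assumed.
uint32_t ts = 0; // timestamp in milliseconds
TimeStamp[0] = (ts >> 16) & 0xFF; // TimeStamp holds the low 24 bits, big-endian
TimeStamp[1] = (ts >> 8) & 0xFF;
TimeStamp[2] = ts & 0xFF;
TimeStampExtend[0] = (ts >> 24) & 0xFF; // upper 8 bits
ts += 40; // advance to the next frame
2.3 if(nalu_type == IDR) FrameType | CodecID = 0x17;
else FrameType | CodecID = 0x27;
2.4 NaluLength is the length of the NALU; the N bytes of NALU data follow it immediately.
2.5 PreviousTagSize is most convenient to compute at this point: PreviousTagSize = 11 + 5 + 4 + NaluLength
11 is the video tag header (TagType through StreamID)
The NALUs of IDR, I, P and B frames all use this structure.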
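Rules 2.1 to 2.5 can be combined into one routine, sketched below. The names flv_write_avc_nalu_tag, put_be24 and put_be32 are hypothetical; CompositionTime is left at 0 as in the layout above, and the caller is assumed to provide a buffer of at least 11 + 9 + nalu_len + 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t *put_be24(uint8_t *p, uint32_t v)
{
    *p++ = (uint8_t)(v >> 16); *p++ = (uint8_t)(v >> 8); *p++ = (uint8_t)v;
    return p;
}

static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
    *p++ = (uint8_t)(v >> 24);
    return put_be24(p, v);
}

/* Hypothetical helper: wraps one NALU (without start code) in a video tag
 * and appends PreviousTagSize. Returns the number of bytes written. */
static size_t flv_write_avc_nalu_tag(uint8_t *out, const uint8_t *nalu,
                                     uint32_t nalu_len, uint32_t timestamp_ms)
{
    uint32_t data_size = nalu_len + 5 + 4;               /* rule 2.1 */
    int is_idr = nalu_len > 0 && (nalu[0] & 0x1F) == 5;  /* rule 2.3 */
    uint8_t *p = out;

    *p++ = 0x09;                               /* TagType: video */
    p = put_be24(p, data_size);                /* DataSize */
    p = put_be24(p, timestamp_ms & 0xFFFFFF);  /* TimeStamp, rule 2.2 */
    *p++ = (uint8_t)(timestamp_ms >> 24);      /* TimeStampExtend */
    p = put_be24(p, 0);                        /* StreamID */
    *p++ = is_idr ? 0x17 : 0x27;               /* FrameType | CodecID */
    *p++ = 0x01;                               /* AVCPacketType: NALU */
    p = put_be24(p, 0);                        /* CompositionTime */
    p = put_be32(p, nalu_len);                 /* NaluLength, rule 2.4 */
    memcpy(p, nalu, nalu_len);
    p += nalu_len;
    p = put_be32(p, 11 + data_size);           /* PreviousTagSize, rule 2.5 */
    return (size_t)(p - out);
}
```

A 3-byte IDR NALU therefore produces a 27-byte result: DataSize 12 and PreviousTagSize 23.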
3. SPS/PPS NALU
In FLV the SPS and PPS are called the sequence header; its AVCPacketType is 0x00.
Byte values // byte position, meaning
0x09, // 0, TagType
0xzz, 0xzz, 0xzz, // 1-3, DataSize,
0x00, 0x00, 0x00, 0x00, // 4-6, 7; TimeStamp | TimeStampExtend
0x00, 0x00, 0x00, // 8-10, StreamID
// AVC video tag header 5 Bytes
0x17, // 11, FrameType | CodecID
0x00, // 12, AVCPacketType
0x00, 0x00, 0x00, // 13-15, CompositionTime
// AVCDecoderConfigurationRecord 6 Bytes
0x01, // 16, ConfigurationVersion
0xzz, // 17, AVC Profile
0xzz, // 18, profile_compatibility (SPS byte 2, often 0x00)
0xzz, // 19, AVC Level
0xFF, // 20, lengthSizeMinusOne,
// reserved 6 bits | NALU length field size - 1, commonly 3
0xzz, // 21, numOfSequenceParameterSets,
// reserved 3 bits | SPS count, commonly 1
0xzz, 0xzz, // 22-23, SPS0 Length N0 Byte
0xzz, ...., 0xzz, // N0 Byte SPS0 Data
0xzz, 0xzz, // SPSm Length Nm Byte (if present); repeated for up to 31 SPS
0xzz, ...., 0xzz, // Nm Byte SPSm Data
0xzz, // PPS count
0xzz, 0xzz, // PPS0 Length
0xzz, ...., 0xzz, // N0 Byte PPS0 Data
0xzz, 0xzz, // PPSm Length Nm Byte (if present); repeated for up to 255 PPS
0xzz, ...., 0xzz, // Nm Byte PPSm Data
0xzz, 0xzz, 0xzz, 0xzz. // N+1-N+3, PreviousTagSize
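3.1 In an H.264 bitstream reserved bits are normally 0, whereas in the FLV stream the reserved bits are defined as 1.
3.2 In H.264 the SPS and PPS are independent NALUs, but FLV writes them together into a single video tag.
This tag must be the first video tag in the FLV; otherwise the video tags received after it cannot be decoded.
To guard against loss of the SPS and PPS, some encoders resend them before every IDR frame. These SPS are normally identical.
It cannot be ruled out, though, that an unusual encoder emits differing SPS over time; after all, the standard allows it.
In that case the H.264 stream has to be scanned once first, collecting each distinct SPS and PPS, and only then are they all written into the FLV.
A bolder shortcut is to stop scanning after the first SPS and the first PPS, and assume the stream contains only one of each.
3.3 DataSize=5 + // AVC video tag header (FrameType + CodecID | .. CompositionTime)
6 + // AVCDecoderConfigurationRecord
SPSCount*2 + // 2-byte length field of each SPS
sum of SPSDataLength + // total length of all SPS data
1 + // PPS count
PPSCount*2 + // 2-byte length field of each PPS
sum of PPSDataLength; // total length of all PPS data
3.4 AVC Profile and AVC Level are bytes 1 and 3 of the SPS NALU (byte 0 is the NaluType); byte 2 is profile_compatibility.
3.5 lengthSizeMinusOne: the low 2 bits hold the size in bytes of the NaluLength field, minus 1. Many documents set them to 0b11, meaning a 4-byte length, which matches the 4-byte NaluLength used in section 2. With the 6 reserved bits set to 1 the whole byte is 0xFF.
3.6 numOfSequenceParameterSets: the low 5 bits hold the SPS count, so at most 31 SPS fit here. H.264 allows seq_parameter_set_id values from 0 to 31, so this limit is unlikely to matter in practice.
Normally there is exactly one SPS, and the byte is 0xE1 (0b111 00001).
3.7 The length of each SPS and PPS is expressed in two bytes.
3.8 For this tag, PreviousTagSize = 11 + DataSize. 11 is the video tag header (TagType through StreamID).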
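For the common case of exactly one SPS and one PPS, the AVCDecoderConfigurationRecord can be built as in this sketch. The function name is hypothetical; it copies profile, profile_compatibility and level from SPS bytes 1 to 3 (ISO 14496-15 defines profile_compatibility as the byte between the profile and the level) and assumes the SPS is at least 4 bytes long.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes an AVCDecoderConfigurationRecord holding one
 * SPS and one PPS (both without start codes). out needs
 * 6 + 2 + sps_len + 1 + 2 + pps_len bytes. Returns the record length. */
static size_t avc_make_config_record(uint8_t *out,
                                     const uint8_t *sps, uint16_t sps_len,
                                     const uint8_t *pps, uint16_t pps_len)
{
    uint8_t *p = out;
    *p++ = 0x01;                      /* ConfigurationVersion */
    *p++ = sps[1];                    /* AVC Profile */
    *p++ = sps[2];                    /* profile_compatibility */
    *p++ = sps[3];                    /* AVC Level */
    *p++ = 0xFF;                      /* reserved 6 bits | lengthSizeMinusOne = 3 */
    *p++ = 0xE1;                      /* reserved 3 bits | SPS count = 1 */
    *p++ = (uint8_t)(sps_len >> 8);   /* SPS length, 2 bytes */
    *p++ = (uint8_t)sps_len;
    memcpy(p, sps, sps_len);
    p += sps_len;
    *p++ = 0x01;                      /* PPS count = 1 */
    *p++ = (uint8_t)(pps_len >> 8);   /* PPS length, 2 bytes */
    *p++ = (uint8_t)pps_len;
    memcpy(p, pps, pps_len);
    p += pps_len;
    return (size_t)(p - out);
}
```

The tag's DataSize is then 5 plus the returned length, in line with the formula in 3.3.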
4. FLV header
'F', 'L', 'V', // 0-2 FLV file signature
0x01, // FLV version
0x0z, // A/V tag flags. 0x05 audio and video, 0x04 audio only, 0x01 video only
0x00, 0x00, 0x00, 0x09, // Length of this header.
0x00, 0x00, 0x00, 0x00. // PreviousTagSize0, always 0.
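The 9-byte header plus the first 4-byte previous-tag-size field can be emitted as in this sketch (flv_write_header is a hypothetical name). The flags byte places the audio flag in bit 2 and the video flag in bit 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the FLV file header and the zero-valued
 * previous-tag-size field that follows it (13 bytes in total). */
static size_t flv_write_header(uint8_t out[13], int has_audio, int has_video)
{
    size_t i = 0;
    out[i++] = 'F';
    out[i++] = 'L';
    out[i++] = 'V';
    out[i++] = 0x01;                                    /* version */
    out[i++] = (uint8_t)((has_audio ? 0x04 : 0x00) |    /* bit 2: audio present */
                         (has_video ? 0x01 : 0x00));    /* bit 0: video present */
    out[i++] = 0x00; out[i++] = 0x00;                   /* header length = 9 */
    out[i++] = 0x00; out[i++] = 0x09;
    out[i++] = 0x00; out[i++] = 0x00;                   /* previous tag size = 0 */
    out[i++] = 0x00; out[i++] = 0x00;
    return i;
}
```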
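5. SEI NALU
SEI is the supplemental enhancement information NALU of H.264. It is not needed for parsing or decoding, but it carries information such as encoder control parameters.
FLV has no dedicated tag for SEI data; the SEI is packed into the same video tag as the video NALU that immediately follows it.
A video tag containing SEI data has the following structure:
Byte values // byte position, meaning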
0x09, // 0, TagType
0xzz, 0xzz, 0xzz, // 1-3, DataSize,
0xzz, 0xzz, 0xzz, 0xzz, // 4-6, 7; TimeStamp | TimeStampExtend
0x00, 0x00, 0x00, // 8-10, StreamID
0x27, // 11, FrameType | CodecID
0x01, // 12, AVCPacketType
0x00, 0x00, 0x00, // 13-15, CompositionTime
0xzz, 0xzz, 0xzz, 0xzz, // 16-19, SEILength NBytes
0xzz, ...., ...., 0xzz, // NBytes, SEIData
0xzz, 0xzz, 0xzz, 0xzz, // NaluLength NBytes
0xzz, ...., ...., 0xzz, // NBytes, NaluData
0xzz, 0xzz, 0xzz, 0xzz. // PreviousTagSize
5.1 DataSize[0,1,2] = (NaluLength + 5 + 4) + (SEILength + 4);
6. Code for extracting a NALU
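#include <stdio.h>
#include <stdint.h>
// Input: h264_fp, file pointer to the H.264 stream
// Output: returns the length of the NALU found (0 if none);
// *nalu_type receives the type byte of that NALU.
// On return the file position is at the first byte of the NALU.
int h264_get_nalu(FILE *h264_fp, uint8_t *nalu_type) {
    long start_pos = -1;
    int nalu_size = 0;
    int zero_num = 0;
    int found_next = 0;
    int c;
    // Test the read result instead of feof(), which only turns true after a failed read.
    while ((c = fgetc(h264_fp)) != EOF) {
        if (c == 0) { zero_num++; continue; }
        if (c == 1 && zero_num >= 3) {
            if (start_pos == -1) {
                start_pos = ftell(h264_fp); // first byte after the start code
            } else {
                // start code of the next NALU: exclude its 4 bytes
                nalu_size = (int)(ftell(h264_fp) - start_pos - 4);
                found_next = 1;
                break;
            }
        }
        zero_num = 0; // any non-zero byte ends the current run of zeros
    }
    if (start_pos == -1) return 0;
    if (!found_next) // last NALU: it runs to the end of the file
        nalu_size = (int)(ftell(h264_fp) - start_pos);
    fseek(h264_fp, start_pos, SEEK_SET);
    if (nalu_size > 0) {
        *nalu_type = (uint8_t)fgetc(h264_fp);
        fseek(h264_fp, start_pos, SEEK_SET);
    }
    return nalu_size;
}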
Overview
Flash Video (FLV for short) is a popular format for video on the web. Most video-sharing sites, in China and elsewhere, currently use it.
File Structure
Viewed as a whole, an FLV file consists of the FLV header and the FLV file body.
1.The FLV header
Field | Type | Comment |
Signature | UI8 | Signature byte always 'F' (0x46) |
Signature | UI8 | Signature byte always 'L' (0x4C) |
Signature | UI8 | Signature byte always 'V' (0x56) |
Version | UI8 | File version (for example, 0x01 for FLV version 1) |
TypeFlagsReserved | UB [5] | Shall be 0 |
TypeFlagsAudio | UB [1] | 1 = Audio tags are present |
TypeFlagsReserved | UB [1] | Shall be 0 |
TypeFlagsVideo | UB [1] | 1 = Video tags are present |
DataOffset | UI32 | The length of this header in bytes |
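Signature: the first 3 bytes of an FLV file are always 'F' 'L' 'V' and identify the file as FLV. During format probing,
a file whose first 3 bytes are "FLV" is treated as an FLV file.
Version: the 4th byte is the FLV version number.
Flags: bit 0 and bit 2 of the 5th byte indicate whether video and audio, respectively, are present (1 = present, 0 = absent).
DataOffset: the last 4 bytes give the length of the FLV header.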
2.The FLV File Body
Field | Type | Comment |
PreviousTagSize0 | UI32 | Always 0 |
Tag1 | FLVTAG | First tag |
PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version1, this value is 11 plus the DataSize of the previous tag. |
Tag2 | FLVTAG | Second tag |
... | ... | ... |
PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes. |
TagN | FLVTAG | Last tag |
PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes |
The FLV file body follows the FLV header.
The FLV file body is a series of back-pointers and tags. A back-pointer is 4 bytes of data giving the size of the previous tag.
FLV Tag Definition
All data in an FLV file is organized into tags; a tag may contain video, audio, or script data.
The table below shows the tag structure:
1.FLVTAG
Field | Type | Comment |
Reserved | UB [2] | Reserved for FMS, should be 0 |
Filter | UB [1] | Indicates if packets are filtered. 0 = No pre-processing required. 1 = Pre-processing (such as decryption) of the packet is required before it can be rendered. Shall be 0 in unencrypted files, and 1 for encrypted tags. See Annex F. FLV Encryption for the use of filters. |
TagType | UB [5] | Type of contents in this tag. The following types are defined: 8 = audio 9 = video 18 = script data |
DataSize | UI24 | Length of the message. Number of bytes after StreamID to end of tag (Equal to length of the tag – 11) |
Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0. |
TimestampExtended | UI8 | Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds. |
StreamID | UI24 | Always 0. |
AudioTagHeader | IF TagType == 8 AudioTagHeader | |
VideoTagHeader | IF TagType == 9 VideoTagHeader | |
EncryptionHeader | IF Filter == 1 EncryptionTagHeader | |
FilterParams | IF Filter == 1 FilterParams | |
Data | IF TagType == 8 AUDIODATA IF TagType == 9 VIDEODATA IF TagType == 18 SCRIPTDATA | Data specific for each media type. |
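TagType: the low 5 bits of the tag's first byte give the type of data in the tag: 8 = audio, 9 = video, 18 = script data.
DataSize: the length of the data that follows StreamID.
Timestamp and TimestampExtended together form the PTS of the data in this tag. When I first wrote an FLV demuxer I ignored TimestampExtended and used Timestamp directly as the PTS, and the result was skipped frames during playback. Reading the documentation more carefully showed that the real PTS is PTS = Timestamp | TimestampExtended<<24.
The data after StreamID differs for each tag type; the formats are described in detail below.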
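Reading the 11-byte tag header, including the PTS formula above, could look like the following sketch. FlvTagHeader and flv_parse_tag_header are names made up for this example.

```c
#include <stdint.h>

/* Hypothetical container for the fixed part of an FLV tag. */
typedef struct {
    uint8_t  tag_type;      /* 8 = audio, 9 = video, 18 = script data */
    uint32_t data_size;     /* bytes following StreamID */
    uint32_t timestamp_ms;  /* Timestamp | TimestampExtended << 24 */
} FlvTagHeader;

static FlvTagHeader flv_parse_tag_header(const uint8_t b[11])
{
    FlvTagHeader h;
    h.tag_type  = b[0] & 0x1F;  /* low 5 bits of the first byte */
    h.data_size = ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | b[3];
    h.timestamp_ms = ((uint32_t)b[4] << 16) | ((uint32_t)b[5] << 8) | b[6];
    h.timestamp_ms |= (uint32_t)b[7] << 24;  /* TimestampExtended: upper 8 bits */
    return h;  /* b[8..10] is StreamID, always 0 */
}
```

Ignoring b[7] makes timestamps wrap after 0xFFFFFF ms (about 4.6 hours), which is consistent with the skipped frames described above.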
Audio Tags
When TagType == 8, the tag is an audio tag.
The data after StreamID is the AudioTagHeader, which has the following structure:
Field | Type | Comment |
SoundFormat | UB [4] | Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian 1 = ADPCM 2 = MP3 3 = Linear PCM, little endian 4 = Nellymoser 16 kHz mono 5 = Nellymoser 8 kHz mono 6 = Nellymoser 7 = G.711 A-law logarithmic PCM 8 = G.711 mu-law logarithmic PCM 9 = reserved 10 = AAC 11 = Speex 14 = MP3 8 kHz 15 = Device-specific sound Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher. |
SoundRate | UB [2] | Sampling rate. The following values are defined: 0 = 5.5 kHz 1 = 11 kHz 2 = 22 kHz 3 = 44 kHz |
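SoundSize | UB [1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples 1 = 16-bit samples |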
SoundType | UB [1] | Mono or stereo sound 0 = Mono sound 1 = Stereo sound |
AACPacketType | IF SoundFormat == 10 UI8 | The following values are defined: 0 = AAC sequence header 1 = AAC raw |
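The first byte of the AudioTagHeader, i.e. the byte immediately after StreamID, carries basic information such as the audio format and sampling rate, as the table shows.
After the AudioTagHeader comes the AUDIODATA, i.e. the audio payload. There is one special case: when the sound format (SoundFormat) is 10 = AAC, the AudioTagHeader carries one extra byte, AACPacketType, which gives the type of the AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.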
Field | Type | Comment |
Data | IF AACPacketType == 0 AudioSpecificConfig | The AudioSpecificConfig is defined in ISO14496-3. Note that this is not the same as the contents of the esds box from an MP4/F4V file. |
 | ELSE IF AACPacketType == 1 Raw AAC frame data in UI8 [ ] | audio payload |
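The AAC sequence header contains the AudioSpecificConfig, which holds more detailed audio information. It is defined in ISO14496-3, section 1.6.2.1 AudioSpecificConfig, and is not reproduced here. ffmpeg has a function that parses AudioSpecificConfig, ff_mpeg4audio_get_config(); reading it alongside the spec gives a deeper understanding.
AAC raw packets contain the audio ES itself, i.e. the audio payload.
In an FLV file the AAC sequence header packet normally appears only once, as the first audio tag. The reason for bringing it up: when I wrote the FLV demuxer, AAC audio required a 7-byte ADTS header in front of every AAC ES frame. ADTS, which is explained in detail in material on audio formats, is the form decoders commonly accept; the raw AAC ES has to be packed into ADTS-format AAC before the decoder can play it. Building the ADTS header needs samplingFrequencyIndex, and the most reliable source for it is the AudioSpecificConfig, so I parsed the AudioSpecificConfig to obtain samplingFrequencyIndex.
At this point the audio information and data can be fully extracted from the FLV file and handed to an audio decoder for normal playback.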
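The ADTS step can be sketched as follows. This is an illustration under stated assumptions rather than the demuxer's actual code: the function name is hypothetical, the AudioSpecificConfig is assumed to use the simple 2-byte layout (object type 1 to 4, no escape values), the channel configuration is assumed to be at most 7, and no CRC is written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds the 7-byte ADTS header for one raw AAC frame
 * of aac_len bytes, using fields parsed from the AudioSpecificConfig. */
static void aac_make_adts_header(const uint8_t asc[2], size_t aac_len,
                                 uint8_t adts[7])
{
    unsigned object_type = asc[0] >> 3;                             /* 5 bits */
    unsigned freq_index  = ((asc[0] & 0x07) << 1) | (asc[1] >> 7);  /* 4 bits */
    unsigned channels    = (asc[1] >> 3) & 0x0F;                    /* 4 bits */
    unsigned frame_len   = (unsigned)aac_len + 7;  /* header included, 13 bits */

    adts[0] = 0xFF;  /* syncword, high 8 bits */
    adts[1] = 0xF1;  /* syncword low 4 bits, MPEG-4, layer 0, no CRC */
    adts[2] = (uint8_t)(((object_type - 1) << 6) |  /* ADTS profile = object type - 1 */
                        (freq_index << 2) |
                        ((channels >> 2) & 0x01));
    adts[3] = (uint8_t)(((channels & 0x03) << 6) | ((frame_len >> 11) & 0x03));
    adts[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    adts[5] = (uint8_t)(((frame_len & 0x07) << 5) | 0x1F);  /* buffer fullness 0x7FF */
    adts[6] = 0xFC;  /* rest of buffer fullness, one raw data block */
}
```

The header is written in front of each AAC raw payload, with aac_len set to the payload length of that tag.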
Video Tags
When TagType == 9, the tag is a video tag.
The data after StreamID is the VideoTagHeader, which has the following structure:
Field | Type | Comment |
Frame Type | UB [4] | Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame) 2 = inter frame (for AVC, a non-seekable frame) 3 = disposable inter frame (H.263 only) 4 = generated key frame (reserved for server use only) 5 = video info/command frame |
CodecID | UB [4] | Codec Identifier. The following values are defined: 2 = Sorenson H.263 3 = Screen video 4 = On2 VP6 5 = On2 VP6 with alpha channel 6 = Screen video version 2 7 = AVC |
AVCPacketType | IF CodecID == 7 UI8 | The following values are defined: 0 = AVC sequence header 1 = AVC NALU 2 = AVC end of sequence |
CompositionTime | IF CodecID == 7 SI24 | IF AVCPacketType == 1 Composition time offset ELSE 0 See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds. |
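The first byte of the VideoTagHeader, i.e. the byte immediately after StreamID, carries the most basic information: the video frame type and the video CodecID, as the table shows.
After the VideoTagHeader comes the VIDEODATA, i.e. the video payload. As with AAC audio there is a special case: when the video format is AVC (H.264), the VideoTagHeader carries 4 extra bytes,
AVCPacketType and CompositionTime. AVCPacketType describes the content of the VIDEODATA (AVCVIDEOPACKET) that follows:
IF AVCPacketType == 0 AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1 One or more NALUs (Full frames are required)
The AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. They must be sent to the AVC decoder before any stream data, otherwise the decoder cannot decode correctly. They also have to be sent again whenever the decoder is restarted after a stop, for example on seek or when switching between fast-forward and rewind. In an FLV file the AVCDecoderConfigurationRecord normally appears only once, as the first video tag.
The AVCDecoderConfigurationRecord is defined in ISO 14496-15, 5.2.4.1 and is not reproduced here.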
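A demuxer might decode the VideoTagHeader roughly as below. The struct and function names are hypothetical, and the sketch assumes the caller has at least 5 bytes available when the codec is AVC. CompositionTime is a signed 24-bit value, so it is sign-extended.

```c
#include <stdint.h>

/* Hypothetical container for the parsed VideoTagHeader. */
typedef struct {
    unsigned frame_type;          /* 1 = key frame, 2 = inter frame, ... */
    unsigned codec_id;            /* 7 = AVC */
    unsigned avc_packet_type;     /* valid only when codec_id == 7 */
    int32_t  composition_time_ms; /* valid only when codec_id == 7 */
} FlvVideoHeader;

static FlvVideoHeader flv_parse_video_header(const uint8_t *b)
{
    FlvVideoHeader h = {0, 0, 0, 0};
    h.frame_type = b[0] >> 4;
    h.codec_id   = b[0] & 0x0F;
    if (h.codec_id == 7) {
        int32_t cts = ((int32_t)b[2] << 16) | ((int32_t)b[3] << 8) | b[4];
        if (cts & 0x800000)       /* sign bit of the 24-bit field */
            cts -= 0x1000000;
        h.avc_packet_type = b[1];
        h.composition_time_ms = cts;
    }
    return h;
}
```

An avc_packet_type of 0 tells the caller that the bytes after the 5-byte header are an AVCDecoderConfigurationRecord rather than NALUs.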
SCRIPTDATA
When TagType == 18, the tag is a script tag.
The SCRIPTDATA structure is quite complex: it defines many value types, each with its own structure.
Field | Type | Comment |
Type | UI8 | Type of the ScriptDataValue. The following types are defined: 0 = Number 1 = Boolean 2 = String 3 = Object 4 = MovieClip (reserved, not supported) 5 = Null 6 = Undefined 7 = Reference 8 = ECMA array 9 = Object end marker 10 = Strict array 11 = Date 12 = Long string |
ScriptDataValue | IF Type == 0 DOUBLE IF Type == 1 UI8 IF Type == 2 SCRIPTDATASTRING IF Type == 3 SCRIPTDATAOBJECT IF Type == 7 UI16 IF Type == 8 SCRIPTDATAECMAARRAY IF Type == 10 SCRIPTDATASTRICTARRAY IF Type == 11 SCRIPTDATADATE IF Type == 12 SCRIPTDATALONGSTRING | Script data value. The Boolean value is (ScriptDataValue ≠ 0). |
Each of these types is described in detail in the official FLV documentation.
onMetaData
onMetaData is the part of SCRIPTDATA that matters most to us. Its structure is shown in the table below:
Property Name | Type | Comment |
audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values) |
audiodatarate | Number | Audio bit rate in kilobits per second |
audiodelay | Number | Delay introduced by the audio codec in seconds |
audiosamplerate | Number | Frequency at which the audio stream is replayed |
audiosamplesize | Number | Resolution of a single audio sample |
canSeekToEnd | Boolean | Indicating the last video frame is a key frame |
creationdate | String | Creation date and time |
duration | Number | Total duration of the file in seconds |
filesize | Number | Total size of the file in bytes |
framerate | Number | Number of frames per second |
height | Number | Height of the video in pixels |
stereo | Boolean | Indicating stereo audio |
videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values) |
videodatarate | Number | Video bit rate in kilobits per second |
width | Number | Width of the video in pixels |
Of these, duration, filesize, and the video width and height are the most useful to us.
keyframes
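When I wrote the FLV demuxer I found that the official documentation does not describe a keyframes index. Unlike TS, FLV tags have no sync header, so without a keyframes index seeking, fast-forward and rewind perform very poorly, because the tags have to be read sequentially one by one. After some searching online I found that keyframes information is stored inside SCRIPTDATA.
keyframes is practically an unofficial, de facto standard. It is now hard to find an FLV video online whose metadata lacks a keyframes entry. Two commonly used metadata tools, flvtool2 and FLVMDI, both treat keyframes as a default metadata item. The FLVMDI home page (http://www.buraks.com/flvmdi/) describes it as follows: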
keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).
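In other words, keyframes contains two arrays, 'filepositions' and 'times', which give the file position and the PTS of each key frame. With them you can build your own index and then, during seek, fast-forward and rewind, jump quickly and efficiently to the key frame you want.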
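Once the two arrays have been read from the metadata, a seek reduces to finding the last key frame at or before the target time. The sketch below uses a binary search and assumes the 'times' array is sorted in ascending order; flv_find_keyframe is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the last key frame whose time is
 * <= target_s, or -1 if the target precedes the first key frame.
 * times[] (in seconds) is assumed to be sorted in ascending order. */
static long flv_find_keyframe(const double *times, size_t count, double target_s)
{
    size_t lo = 0, hi = count;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] <= target_s)
            lo = mid + 1;       /* candidate found, look for a later one */
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

The demuxer would then fseek to filepositions[index] and resume reading tags from there.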
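To process the audio or the video of a stream separately, each has to be extracted according to the FLV format.
This can be done with a few simple changes to the rtmpdump code.
1 Extracting audio:
The rtmpdump program downloads in a loop inside its Download function: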
....
do
{
....
nRead = RTMP_Read(rtmp, buffer, bufferSize);
....
}while(!RTMP_ctrlC && nRead > -1 && RTMP_IsConnected(rtmp) && !RTMP_IsTimedout(rtmp));
....
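The original program writes the received data to a file, producing an FLV.
Here the audio and video are extracted before that write. Extracting the audio is fairly simple: parse the buffer directly (following the approach in the RTMP_Write function).
Note that rtmpdump receives with RTMP_Read, so pay attention to its parameters. For convenience RTMP_ReadPacket can be used directly instead; the video part below uses RTMP_ReadPacket to receive and process packets.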
int RTMP_Write2(RTMP *r, const char *buf, int size)
{
RTMPPacket *pkt = &r->m_write;
char *pend, *enc;
int s2 = size, ret, num;
if (size < 11) {
/* FLV pkt too small */
return 0;
}
if (buf[0] == 'F' && buf[1] == 'L' && buf[2] == 'V')
{
buf += 13;
s2 -= 13;
}
pkt->m_packetType = *buf++;
pkt->m_nBodySize = AMF_DecodeInt24(buf);
buf += 3;
pkt->m_nTimeStamp = AMF_DecodeInt24(buf);
buf += 3;
pkt->m_nTimeStamp |= *buf++ << 24;
buf += 3;
s2 -= 11;
if (((pkt->m_packetType == RTMP_PACKET_TYPE_AUDIO
|| pkt->m_packetType == RTMP_PACKET_TYPE_VIDEO) &&
!pkt->m_nTimeStamp) || pkt->m_packetType == RTMP_PACKET_TYPE_INFO)
{
pkt->m_headerType = RTMP_PACKET_SIZE_LARGE;
if (pkt->m_packetType == RTMP_PACKET_TYPE_INFO)
pkt->m_nBodySize += 16;
}
else
{
pkt->m_headerType = RTMP_PACKET_SIZE_MEDIUM;
}
BYTE outbuf2[640];
int nLen2 = 640;
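// NOTE: the steps of RTMP_Write that allocate pkt->m_body and copy the tag payload into it appear to be omitted from this excerpt.
AVManager::GetInstance()->Decode((BYTE*)(pkt->m_body+1), pkt->m_nBodySize-1, outbuf2, nLen2);
// The actual audio payload starts at pkt->m_body+1 and is pkt->m_nBodySize-1 bytes long. The audio here is Speex-encoded.
// The skipped first byte is the AudioTagHeader byte; see also: http://bbs.rosoo.net/thread-16488-1-1.html
evt_OnReceivePacket((char*)outbuf2, nLen2); // hand the decoded audio to the callback
RTMPPacket_Free(pkt);
pkt->m_nBytesRead = 0;
....
}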
2 Video processing
rtmpsrv.c can be used as a reference.
Replace nRead = RTMP_Read(rtmp, buffer, bufferSize); with:
// pf is assumed to be a FILE * that the caller has opened for the output .h264 file.
RTMPPacket pc = { 0 };
bool bFirst = true; // the first video packet is assumed to be the AVC sequence header
static unsigned char const start_code[4] = {0x00, 0x00, 0x00, 0x01};
while (RTMP_ReadPacket(rtmp, &pc))
{
    if (RTMPPacket_IsReady(&pc))
    {
        if (pc.m_packetType == RTMP_PACKET_TYPE_VIDEO && RTMP_ClientPacket(rtmp, &pc))
        {
            const unsigned char *data = (const unsigned char *)pc.m_body;
            unsigned char result = data[0]; // FrameType | CodecID
            bool bIsKeyFrame = (result == 0x17); // 0x17 = key frame, 0x27 = inter frame
            if (bFirst) {
                // AVC sequence header: 5-byte video tag header, then AVCDecoderConfigurationRecord
                // extract the SPS entries
                int spsnum = data[10] & 0x1f;
                int number_sps = 11;
                int count_sps = 1;
                while (count_sps <= spsnum) {
                    int spslen = (data[number_sps] << 8) | data[number_sps + 1];
                    number_sps += 2;
                    fwrite(start_code, 1, 4, pf); // one start code in front of each NALU
                    fwrite(data + number_sps, 1, spslen, pf);
                    number_sps += spslen;
                    count_sps++;
                }
                // extract the PPS entries (the PPS count uses the full byte)
                int ppsnum = data[number_sps];
                int number_pps = number_sps + 1;
                int count_pps = 1;
                while (count_pps <= ppsnum) {
                    int ppslen = (data[number_pps] << 8) | data[number_pps + 1];
                    number_pps += 2;
                    fwrite(start_code, 1, 4, pf);
                    fwrite(data + number_pps, 1, ppslen, pf);
                    number_pps += ppslen;
                    count_pps++;
                }
                bFirst = false;
            } else {
                // AVC NALUs: each is prefixed with a 4-byte big-endian length
                int len = 0;
                int num = 5; // skip the 5-byte video tag header
                while (num + 4 <= (int)pc.m_nBodySize)
                {
                    len = (data[num] << 24) | (data[num + 1] << 16) | (data[num + 2] << 8) | data[num + 3];
                    num = num + 4;
                    fwrite(start_code, 1, 4, pf);
                    fwrite(data + num, 1, len, pf);
                    num = num + len;
                }
            }
        }
        RTMPPacket_Free(&pc); // release the body before reading the next packet
    }
}