[Repost] Packaging an AAC Stream into an FLV File


Packaging AAC-encoded data into FLV is straightforward.


1. FLV audio tag format
                              Byte offset  Meaning
0x08,                        // 0,        TagType
0xzz, 0xzz, 0xzz,            // 1-3,      DataSize
0xzz, 0xzz, 0xzz, 0xzz,      // 4-6, 7    TimeStamp | TimeStampExtend
0x00, 0x00, 0x00,            // 8-10,     StreamID

0xzz,                        // 11,       AudioTagHeader
0xzz,                        // 12,       AACPacketType (0x00 or 0x01; absent for non-AAC formats)
0xzz ... 0xzz                //           audio data



2. AudioTagHeader
The audio tag header is normally one byte (two bytes for AAC). The first byte is laid out as follows:
SoundFormat 4 bits | SoundRate 2 bits | SoundSize 1 bit | SoundType 1 bit |

Sound format, 4 bits
0x00 = Linear PCM, platform endian
0x01 = ADPCM
0x02 = MP3
0x03 = Linear PCM, little endian
0x04 = Nellymoser 16-kHz mono
0x05 = Nellymoser 8-kHz mono
0x06 = Nellymoser
0x07 = G.711 A-law logarithmic PCM
0x08 = G.711 mu-law logarithmic PCM
0x09 = reserved
0x0A = AAC
0x0B = Speex
0x0E = MP3 8-kHz
0x0F = Device-specific sound

Sample rate, 2 bits
0 = 5.5-kHz
1 = 11-kHz
2 = 22-kHz
3 = 44-kHz
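For AAC this is always 3. That makes it look as if FLV cannot carry 48 kHz AAC, but it can: the actual rate is specified later, in the AudioSpecificConfig.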

Sample size, 1 bit
0 = snd8Bit
1 = snd16Bit
Compressed audio always decodes to 16 bits.

Channels, 1 bit
0 = sndMono
1 = sndStereo
For AAC this is always 1.


Putting this together: for AAC at 48 kHz, 16-bit, stereo, this byte is 0b1010 1111 = 0xAF.
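The following minimal sketch packs the first AudioTagHeader byte from the four fields above. The helper name `flv_audio_tag_header` is hypothetical. For AAC, SoundRate = 3 and SoundType = 1 regardless of the real stream, which gives 0xAF.

```c
#include <stdint.h>

/* Pack the first AudioTagHeader byte:
 * SoundFormat(4 bits) | SoundRate(2) | SoundSize(1) | SoundType(1). */
uint8_t flv_audio_tag_header(unsigned sound_format, unsigned sound_rate,
                             unsigned sound_size, unsigned sound_type)
{
    return (uint8_t)(((sound_format & 0x0F) << 4) |
                     ((sound_rate & 0x03) << 2) |
                     ((sound_size & 0x01) << 1) |
                     (sound_type & 0x01));
}
```

For example, `flv_audio_tag_header(10, 3, 1, 1)` yields the 0xAF derived above.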

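Now the second byte. If the sound format is AAC (0x0A), the AudioTagHeader carries one extra byte, AACPacketType, which identifies the type of AACAUDIODATA:
0x00 = AAC sequence header, analogous to the SPS/PPS of H.264; it appears once, at the start of the FLV file.
0x01 = AAC raw, i.e. AAC frame data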



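3. AAC Sequence header
The AAC sequence header carries the AudioSpecificConfig, which holds more detailed audio information. It is defined in ISO 14496-3, section 1.6.2.1.
A simplified 2-byte AudioSpecificConfig is laid out as follows:
AAC Profile 5 bits | sample rate index 4 bits | channel configuration 4 bits | other 3 bits |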

AAC Profile, 5 bits; see the Object Profiles table in ISO 14496-3
AAC Main  0x01
AAC LC    0x02
AAC SSR   0x03
...

(Some documents describe the profile as 4 bits, but testing confirms it is 5 bits.)


Sample rate index, 4 bits
Value  samplingFrequencyIndex (Hz)
0x00   96000
0x01   88200
0x02   64000
0x03   48000
0x04   44100
0x05   32000
0x06   24000
0x07   22050
0x08   16000
0x09   12000
0x0A   11025
0x0B   8000
0x0C   reserved
0x0D   reserved
0x0E   reserved
0x0F   escape value


Channel configuration, 4 bits
0x00 - defined in AudioDecoderSpecificConfig
0x01 mono (center front speaker)
0x02 stereo (left, right front speakers)
0x03 3 channels (center, left, right front speakers)
0x04 4 channels (center, left, right front speakers, rear surround speakers)
0x05 5 channels (center, left, right front speakers, left surround, right surround rear speakers)
0x06 5.1 channels (center, left, right front speakers, left surround, right surround rear speakers, front low frequency effects speaker)
0x07 7.1 channels (center, left, right center front speakers, left, right outside front speakers, left surround, right surround rear speakers, front low frequency effects speaker)
0x08-0x0F - reserved


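Set the other 3 bits to 0.

For AAC-LC, 48000 Hz, stereo, the sequence header is 0b00010 0011 0010 000 = 0x11 0x90.
The complete audio tag for this AAC sequence header is therefore 0x08, 00 00 04, 00 00 00 00, 00 00 00, AF 00 11 90 | 00 00 00 0F
(DataSize is 4, and the trailing PreviousTagSize is 11 + 4 = 15 = 0x0F).

Some FLV files lack this AAC sequence header tag and still decode correctly. For RTMP playback, however, the header must be sent before the first audio data packet.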
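The following minimal sketch builds the simplified 2-byte AudioSpecificConfig described above. The function name is hypothetical. It assumes an object type below 31 and a sampling-frequency index other than 15, because escape encodings are not handled.

```c
#include <stdint.h>

/* Build the simplified 2-byte AudioSpecificConfig:
 * audioObjectType(5) | samplingFrequencyIndex(4) | channelConfiguration(4) |
 * 3 zero bits (frameLengthFlag, dependsOnCoreCoder, extensionFlag). */
void aac_audio_specific_config(uint8_t out[2], unsigned object_type,
                               unsigned freq_index, unsigned channels)
{
    uint16_t v = (uint16_t)(((object_type & 0x1F) << 11) |
                            ((freq_index & 0x0F) << 7) |
                            ((channels & 0x0F) << 3));
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}
```

With AAC-LC (2), index 3 (48000 Hz) and 2 channels this produces 0x11 0x90, as computed above.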

 

 

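4. AAC audio packets
Structure: 0x08, 3-byte DataSize, 4-byte timestamp, 00 00 00, AF 01, N bytes of AAC data | PreviousTagSize
If the encoded raw AAC data is N bytes long, DataSize = N + 2.

PreviousTagSize = 11 + DataSize = 11 + N + 2 = 13 + N.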
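The layout above can be written out with a small helper. This is a sketch under some assumptions. The name `flv_write_aac_tag` is hypothetical. The AudioTagHeader is hard-coded to 0xAF (AAC, as derived earlier). The caller must supply a buffer of at least N + 17 bytes. With packet type 0 and the payload 0x11 0x90, the helper reproduces the sequence-header tag from section 3.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

static void put_be24(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 16); p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); put_be24(p + 1, v);
}

/* Write one FLV audio tag carrying AAC data, followed by its PreviousTagSize.
 * packet_type: 0 = sequence header (payload = AudioSpecificConfig),
 *              1 = raw AAC frame. Returns bytes written: 11 + (n + 2) + 4. */
size_t flv_write_aac_tag(uint8_t *out, uint32_t timestamp_ms,
                         uint8_t packet_type, const uint8_t *payload, size_t n)
{
    uint32_t data_size = (uint32_t)n + 2;        /* 0xAF + AACPacketType + N */
    out[0] = 0x08;                               /* TagType: audio */
    put_be24(out + 1, data_size);                /* DataSize */
    put_be24(out + 4, timestamp_ms & 0xFFFFFF);  /* Timestamp, low 24 bits */
    out[7] = (uint8_t)(timestamp_ms >> 24);      /* TimestampExtended */
    put_be24(out + 8, 0);                        /* StreamID */
    out[11] = 0xAF;                              /* AAC, 44 kHz flag, 16-bit, stereo */
    out[12] = packet_type;                       /* AACPacketType */
    memcpy(out + 13, payload, n);
    put_be32(out + 13 + n, 11 + data_size);      /* PreviousTagSize = 13 + N */
    return 11 + data_size + 4;
}
```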

 

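The FLV format is very simple and carries little header overhead, which makes it well suited to network transmission; that is why it is so widely used.
1. H.264 NALU structure
    H.264 NALU: 0x00 00 00 01 | nalu_type (1 byte) | nalu_data (N bytes) | 0x00 00 00 01 | ...
                start code (4 bytes)   type                 data                 next NALU's start code
             An Annex B H.264 NALU begins with the start code 0x00 00 00 01 (a 3-byte form, 0x00 00 01, is also allowed); the start code never appears inside nalu_data;
             the length of the current NALU is not known until the next start code is found;
             nalu_type is 1 byte: 1-bit forbidden bit | 2-bit importance (nal_ref_idc) | 5-bit type
                                  always 0              11 = important, must keep       1-12 used by H.264
                                                        00 = unimportant, droppable
             Common nalu_type values:
                               0x67 (0 11 00111) SPS        very important                  type = 7
                               0x68 (0 11 01000) PPS        very important                  type = 8
                               0x65 (0 11 00101) IDR frame  key frame, very important       type = 5
                               0x41 (0 10 00001) P frame    important                       type = 1
                               0x01 (0 00 00001) B frame    unimportant (non-reference)     type = 1
                               0x06 (0 00 00110) SEI        unimportant                     type = 6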
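The following small C sketch splits the NALU header byte into the three fields listed above. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the 1-byte H.264 NALU header. */
typedef struct {
    unsigned forbidden_zero_bit; /* must be 0 */
    unsigned nal_ref_idc;        /* 0 = droppable, non-zero = reference data */
    unsigned nal_unit_type;      /* 1 = non-IDR slice, 5 = IDR, 6 = SEI, 7 = SPS, 8 = PPS */
} nalu_header;

nalu_header parse_nalu_header(uint8_t b)
{
    nalu_header h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc = (b >> 5) & 0x03;
    h.nal_unit_type = b & 0x1F;
    return h;
}
```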
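2. FLV tag
    As described earlier, an FLV file is made up of a sequence of tags: video tags, audio tags and script tags.
    A/V tags hold the encoded audio and video data; script tags hold metadata describing the stream.
    In principle, the A/V tags can be fully decoded without parsing the script tag. Every tag starts with this fixed header:
     Tag Type (1 byte) | DataSize (3 bytes) | TimeStamp (3 bytes) | TimeStampExtended (1 byte) | StreamID (3 bytes) | ...
     The following subsections describe how each kind of NALU is wrapped into a tag.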
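The following sketch writes the fixed 11-byte tag header. The helper name is hypothetical. The 32-bit millisecond timestamp is split: its lower 24 bits go into TimeStamp in big-endian order, and its upper byte goes into TimeStampExtended.

```c
#include <stdint.h>

/* Write the 11-byte FLV tag header. StreamID is always 0. */
void flv_write_tag_header(uint8_t p[11], uint8_t tag_type,
                          uint32_t data_size, uint32_t timestamp_ms)
{
    p[0] = tag_type;                        /* 8 = audio, 9 = video, 18 = script */
    p[1] = (uint8_t)(data_size >> 16);      /* DataSize, 24-bit big-endian */
    p[2] = (uint8_t)(data_size >> 8);
    p[3] = (uint8_t)data_size;
    p[4] = (uint8_t)(timestamp_ms >> 16);   /* TimeStamp: lower 24 bits */
    p[5] = (uint8_t)(timestamp_ms >> 8);
    p[6] = (uint8_t)timestamp_ms;
    p[7] = (uint8_t)(timestamp_ms >> 24);   /* TimeStampExtended: upper 8 bits */
    p[8] = p[9] = p[10] = 0;                /* StreamID */
}
```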

2. Ordinary video tag
                                   Byte offset   Meaning
0x09,                              // 0,         TagType
0xzz, 0xzz, 0xzz,                  // 1-3,       DataSize
0xzz, 0xzz, 0xzz, 0xzz,            // 4-6, 7     TimeStamp | TimeStampExtend
0x00, 0x00, 0x00,                  // 8-10,      StreamID

0xz7,                              // 11,        FrameType | CodecID
0x01,                              // 12,        AVCPacketType
0x00, 0x00, 0x00,                  // 13-15,     CompositionTime

0xzz, 0xzz, 0xzz, 0xzz,            // 16-19,     NaluLength (N bytes)
0xzz, ...., ...., 0xzz,            // N bytes,   NaluData

0xzz, 0xzz, 0xzz, 0xzz,            // 4 bytes,   PreviousTagSize

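    Here 0xzz means the byte takes whatever value the actual data requires.
    2.1 DataSize[0,1,2] = NaluLength + 5 + 4;
           5 is the 5-byte AVC packet header (FrameType+CodecID | AVCPacketType | CompositionTime)
           4 is the NaluLength field itself

    2.2 A raw H.264 stream has no timestamps, so assume 25 fps by default, i.e. 40 ms per frame.
        The lower 24 bits of the millisecond timestamp go into TimeStamp (big-endian), the upper 8 bits into TimeStampExtend:
        uint32_t ts = 0;                          // timestamp of the current frame, in ms
        TimeStamp[0] = (ts >> 16) & 0xFF;
        TimeStamp[1] = (ts >> 8) & 0xFF;
        TimeStamp[2] = ts & 0xFF;
        TimeStampExtend[0] = (ts >> 24) & 0xFF;
        ts += 40;                                 // next frame at 25 fps

    2.3 if (nalu_type == IDR)  FrameType | CodecID = 0x17;
        else                   FrameType | CodecID = 0x27;

    2.4 NaluLength is the NALU length (4 bytes, big-endian), followed immediately by the N bytes of NALU data.
    2.5 PreviousTagSize is easiest to compute at this point: PreviousTagSize = 11 + 5 + 4 + NaluLength,
        where 11 is the video tag header (TagType through StreamID).
     IDR, I, P and B frame NALUs all use this structure.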
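Putting 2.1 to 2.5 together, the following minimal sketch writes one NALU, without its start code, as a complete video tag followed by its PreviousTagSize. The function name is hypothetical. CompositionTime is fixed at 0, which assumes no B-frame reordering. As in 2.3, only an IDR NALU is marked as a key frame. The caller must supply at least N + 24 bytes and a non-empty NALU.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Returns the total bytes written: 11 + 5 + 4 + n + 4. */
size_t flv_write_avc_nalu_tag(uint8_t *out, uint32_t timestamp_ms,
                              const uint8_t *nalu, uint32_t n)
{
    uint32_t data_size = n + 5 + 4;          /* 2.1 */
    uint32_t prev_tag_size = 11 + data_size; /* 2.5 */
    int is_idr = (nalu[0] & 0x1F) == 5;
    uint8_t *p = out;

    *p++ = 0x09;                                  /* TagType: video */
    *p++ = (uint8_t)(data_size >> 16);
    *p++ = (uint8_t)(data_size >> 8);
    *p++ = (uint8_t)data_size;
    *p++ = (uint8_t)(timestamp_ms >> 16);         /* 2.2: lower 24 bits */
    *p++ = (uint8_t)(timestamp_ms >> 8);
    *p++ = (uint8_t)timestamp_ms;
    *p++ = (uint8_t)(timestamp_ms >> 24);         /* TimeStampExtend */
    *p++ = 0; *p++ = 0; *p++ = 0;                 /* StreamID */
    *p++ = is_idr ? 0x17 : 0x27;                  /* 2.3: FrameType | CodecID */
    *p++ = 0x01;                                  /* AVCPacketType: NALU */
    *p++ = 0; *p++ = 0; *p++ = 0;                 /* CompositionTime = 0 */
    *p++ = (uint8_t)(n >> 24);                    /* 2.4: NaluLength */
    *p++ = (uint8_t)(n >> 16);
    *p++ = (uint8_t)(n >> 8);
    *p++ = (uint8_t)n;
    memcpy(p, nalu, n);
    p += n;
    *p++ = (uint8_t)(prev_tag_size >> 24);        /* PreviousTagSize */
    *p++ = (uint8_t)(prev_tag_size >> 16);
    *p++ = (uint8_t)(prev_tag_size >> 8);
    *p++ = (uint8_t)prev_tag_size;
    return (size_t)(p - out);
}
```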

3. SPS/PPS NALU
   In FLV, the SPS and PPS together form the sequence header, which is carried with AVCPacketType 0x00.
                                   Byte offset   Meaning
0x09,                              // 0,         TagType
0xzz, 0xzz, 0xzz,                  // 1-3,       DataSize
0x00, 0x00, 0x00, 0x00,            // 4-6, 7     TimeStamp | TimeStampExtend
0x00, 0x00, 0x00,                  // 8-10,      StreamID

                                   // AVC video tag header, 5 bytes
0x17,                              // 11,        FrameType | CodecID
0x00,                              // 12,        AVCPacketType
0x00, 0x00, 0x00,                  // 13-15,     CompositionTime

                                   // AVCDecoderConfigurationRecord, 6 bytes
0x01,                              // 16,        ConfigurationVersion
0xzz,                              // 17,        AVC Profile (SPS byte 1)
0xzz,                              // 18,        profile_compatibility (SPS byte 2)
0xzz,                              // 19,        AVC Level (SPS byte 3)
0xFF,                              // 20,        lengthSizeMinusOne:
                                   //            reserved 6 bits | NALU length size - 1, usually 3
0xzz,                              // 21,        numOfSequenceParameterSets:
                                   //            reserved 3 bits | SPS count, usually 1

0xzz, 0xzz,                        // 22-23,     SPS0 length N0
0xzz, ...., 0xzz,                  // N0 bytes,  SPS0 data
0xzz, 0xzz,                        //            SPSm length Nm (if present); up to 31 SPSs in a loop
0xzz, ...., 0xzz,                  // Nm bytes,  SPSm data

0xzz,                              //            PPS count
0xzz, 0xzz,                        //            PPS0 length N0
0xzz, ...., 0xzz,                  // N0 bytes,  PPS0 data
0xzz, 0xzz,                        //            PPSm length Nm (if present); up to 255 PPSs in a loop
0xzz, ...., 0xzz,                  // Nm bytes,  PPSm data

0xzz, 0xzz, 0xzz, 0xzz,            // 4 bytes,   PreviousTagSize

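  3.1 In an H.264 bitstream, reserved bits are generally 0, whereas in FLV (the AVCDecoderConfigurationRecord) the reserved bits are defined as 1.
  3.2 In H.264 the SPS and PPS are separate NALUs, but FLV writes them together in a single video tag.
         This tag must be the first video tag in the FLV; otherwise the other video tags received cannot be decoded.
         To guard against lost SPS/PPS data, some encoders repeat the SPS and PPS before every IDR frame. These repeated SPSs are normally identical,
         but some unusual encoders may emit SPSs that differ, which the standard does allow.
         In that case, first scan the whole H.264 stream, extract and record every distinct SPS and PPS, and then write them all to the FLV together.
         Alternatively, take the bolder approach: stop scanning after the first SPS and the first PPS, and treat the stream as having exactly one of each.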
  
  3.3 DataSize = 5 +                     // AVC video tag header (FrameType + CodecID | ... | CompositionTime)
                 6 +                     // AVCDecoderConfigurationRecord fixed fields
                 SPSCount * 2 +          // 2-byte length per SPS
                 sum of SPSDataLength +  // total SPS data
                 1 +                     // PPS count
                 PPSCount * 2 +          // 2-byte length per PPS
                 sum of PPSDataLength;   // total PPS data

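  3.4 AVC Profile and AVC Level equal bytes 1 and 3 of the SPS NALU (byte 0 is the NALU header); profile_compatibility is byte 2.

  3.5 lengthSizeMinusOne: the low 2 bits are the size, in bytes, of each NALU length prefix minus one. Most documents simply set them to 0b11, i.e. 4-byte lengths (matching the 4-byte NaluLength in ordinary video tags), so this byte is 0xFF.

  3.6 numOfSequenceParameterSets: the low 5 bits are the SPS count, so at most 31 SPSs fit here. H.264 allows seq_parameter_set_id values 0 to 31 (up to 32 SPSs), so in rare cases this could be a limitation.

        Normally there is a single SPS, and the byte is 0xE1 (0b111 00001).

  3.7 The length of each SPS and PPS is stored in 2 bytes.

  3.8 This tag's PreviousTagSize = 11 + DataSize, where 11 is the video tag header (TagType through StreamID).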
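The following minimal sketch assembles the AVCDecoderConfigurationRecord for the common case of exactly one SPS and one PPS. The function name is hypothetical. It assumes the SPS is at least 4 bytes long and that neither input includes a start code. The result is the payload that follows the 5-byte header 0x17 0x00 0x00 0x00 0x00, so the tag's DataSize is 5 plus the returned length, matching the formula in 3.3.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Returns the record length: 6 + 2 + sps_len + 1 + 2 + pps_len. */
size_t avc_decoder_config_record(uint8_t *out,
                                 const uint8_t *sps, uint16_t sps_len,
                                 const uint8_t *pps, uint16_t pps_len)
{
    uint8_t *p = out;
    *p++ = 0x01;                    /* configurationVersion */
    *p++ = sps[1];                  /* AVCProfileIndication */
    *p++ = sps[2];                  /* profile_compatibility */
    *p++ = sps[3];                  /* AVCLevelIndication */
    *p++ = 0xFF;                    /* 111111 | lengthSizeMinusOne = 3 */
    *p++ = 0xE1;                    /* 111 | numOfSequenceParameterSets = 1 */
    *p++ = (uint8_t)(sps_len >> 8); /* 2-byte SPS length */
    *p++ = (uint8_t)sps_len;
    memcpy(p, sps, sps_len);
    p += sps_len;
    *p++ = 0x01;                    /* numOfPictureParameterSets = 1 */
    *p++ = (uint8_t)(pps_len >> 8); /* 2-byte PPS length */
    *p++ = (uint8_t)pps_len;
    memcpy(p, pps, pps_len);
    p += pps_len;
    return (size_t)(p - out);
}
```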

4. FLV header
       'F', 'L', 'V',                // 0-2,  FLV file signature
       0x01,                         // 3,    FLV version
       0x0z,                         // 4,    A/V flags: 0x05 audio + video, 0x04 audio only, 0x01 video only
       0x00, 0x00, 0x00, 0x09,       // 5-8,  length of this header
       0x00, 0x00, 0x00, 0x00,       // 9-12, PreviousTagSize0
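The following sketch writes these 13 bytes. The function name is hypothetical. In the flag byte, bit 2 marks audio and bit 0 marks video.

```c
#include <stdint.h>

/* Write the 9-byte FLV header followed by the 4-byte PreviousTagSize0. */
void flv_write_file_header(uint8_t out[13], int has_audio, int has_video)
{
    out[0] = 'F'; out[1] = 'L'; out[2] = 'V';
    out[3] = 0x01;                                           /* version 1 */
    out[4] = (uint8_t)((has_audio ? 0x04 : 0) | (has_video ? 0x01 : 0));
    out[5] = 0; out[6] = 0; out[7] = 0; out[8] = 9;          /* DataOffset = 9 */
    out[9] = out[10] = out[11] = out[12] = 0;                /* PreviousTagSize0 */
}
```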

5. SEI NALU
   SEI (supplemental enhancement information) is an H.264 NALU carrying extra data, such as encoder control parameters; it is not needed for parsing or decoding.
   FLV has no tag dedicated to SEI: the SEI data is packed into the same video tag as the video NALU that immediately follows it.
   A video tag containing SEI data is structured as follows:
                                   Byte offset   Meaning
0x09,                              // 0,         TagType
0xzz, 0xzz, 0xzz,                  // 1-3,       DataSize
0xzz, 0xzz, 0xzz, 0xzz,            // 4-6, 7     TimeStamp | TimeStampExtend
0x00, 0x00, 0x00,                  // 8-10,      StreamID

0x27,                              // 11,        FrameType | CodecID (0x17 if the following NALU is an IDR)
0x01,                              // 12,        AVCPacketType
0x00, 0x00, 0x00,                  // 13-15,     CompositionTime

0xzz, 0xzz, 0xzz, 0xzz,            // 16-19,     SEILength (N bytes)
0xzz, ...., ...., 0xzz,            // N bytes,   SEIData

0xzz, 0xzz, 0xzz, 0xzz,            //            NaluLength (M bytes)
0xzz, ...., ...., 0xzz,            // M bytes,   NaluData

0xzz, 0xzz, 0xzz, 0xzz,            // 4 bytes,   PreviousTagSize

   5.1 DataSize[0,1,2] = (NaluLength + 5 + 4) + (SEILength + 4); 
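The formula in 5.1 generalizes to any number of NALUs packed into one tag: a 5-byte AVC header plus, for each NALU, a 4-byte length and the NALU bytes. The following helper computing this is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* DataSize of an AVC video tag (AVCPacketType 1) that carries several
 * NALUs, e.g. an SEI followed by a slice. */
uint32_t avc_tag_data_size(const uint32_t *nalu_lens, size_t count)
{
    uint32_t size = 5;                 /* FrameType|CodecID, AVCPacketType, CompositionTime */
    for (size_t i = 0; i < count; i++)
        size += 4 + nalu_lens[i];      /* length prefix + NALU data */
    return size;
}
```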

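6. Code to find the next NALU
#include <stdio.h>
#include <stdint.h>

// Input:  h264_fp, an Annex B .264 file opened in binary mode.
// Output: the length of the NALU found (0 if none is left);
//         *nalu_type receives the NALU header byte.
// On return the file is positioned at the NALU header byte, so the caller
// can fread() exactly the returned number of bytes next.
// Both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes are accepted.
int h264_get_nalu(FILE *h264_fp, uint8_t *nalu_type) {
    long start_pos = -1;   // offset of the current NALU's header byte
    int zero_num = 0;      // consecutive 0x00 bytes just read
    int c;
    while ((c = fgetc(h264_fp)) != EOF) {
        if (c == 0x00) {
            zero_num++;
            continue;
        }
        if (c == 0x01 && zero_num >= 2) {          // start code found
            long here = ftell(h264_fp);            // just past the 0x01
            if (start_pos == -1) {                 // start of our NALU
                start_pos = here;
                if ((c = fgetc(h264_fp)) == EOF)
                    return 0;
                *nalu_type = (uint8_t)c;
                zero_num = (c == 0x00);            // a zero header byte still counts
                continue;
            }
            // Next start code: our NALU ends before its leading zero bytes.
            fseek(h264_fp, start_pos, SEEK_SET);
            return (int)(here - 1 - zero_num - start_pos);
        }
        zero_num = 0;                              // any other byte breaks the run
    }
    if (start_pos == -1)
        return 0;                                  // no NALU found
    // EOF: the last NALU runs to the end of the file (minus trailing zeros).
    long nalu_size = ftell(h264_fp) - zero_num - start_pos;
    fseek(h264_fp, start_pos, SEEK_SET);
    return (int)nalu_size;
}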

 

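Overview

Flash Video (FLV) is a popular web video format. At the time of writing, most video-sharing sites in China and abroad used it.


File Structure

At the file level, an FLV file consists of the FLV header and the FLV file body.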

1. The FLV header

Field              Type     Comment
Signature          UI8      Signature byte always 'F' (0x46)
Signature          UI8      Signature byte always 'L' (0x4C)
Signature          UI8      Signature byte always 'V' (0x56)
Version            UI8      File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved  UB [5]   Shall be 0
TypeFlagsAudio     UB [1]   1 = Audio tags are present
TypeFlagsReserved  UB [1]   Shall be 0
TypeFlagsVideo     UB [1]   1 = Video tags are present
DataOffset         UI32     The length of this header in bytes

 

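Signature: The first 3 bytes of an FLV file are always 'F' 'L' 'V', which identify the file as FLV. When probing a file's format, a file whose first 3 bytes are "FLV" is treated as an FLV file.

Version: The 4th byte is the FLV version.

Flags: Bits 0 and 2 of the 5th byte indicate whether video and audio, respectively, are present (1 = present, 0 = absent).

DataOffset: The last 4 bytes give the length of the FLV header.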

2. The FLV File Body

Field               Type    Comment
PreviousTagSize0    UI32    Always 0
Tag1                FLVTAG  First tag
PreviousTagSize1    UI32    Size of previous tag, including its header, in bytes. For FLV version 1,
                            this value is 11 plus the DataSize of the previous tag.
Tag2                FLVTAG  Second tag
...                 ...     ...
PreviousTagSizeN-1  UI32    Size of second-to-last tag, including its header, in bytes.
TagN                FLVTAG  Last tag
PreviousTagSizeN    UI32    Size of last tag, including its header, in bytes

 

The FLV file body follows the FLV header.

The FLV file body is a sequence of back-pointers and tags. Each back-pointer is 4 bytes long and gives the size of the preceding tag.
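Every back-pointer must equal 11 + DataSize of the preceding tag, so the structure can be checked by walking the file. The following minimal sketch works over an in-memory buffer. The function name is hypothetical, and the whole file is assumed to be loaded.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t be24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | be24(p + 1);
}

/* Count the tags in an FLV file held in memory, checking that each
 * PreviousTagSize equals 11 + DataSize of the tag before it.
 * Returns the tag count, or -1 if the data is not well-formed. */
long flv_count_tags(const uint8_t *buf, size_t len)
{
    if (len < 13 || buf[0] != 'F' || buf[1] != 'L' || buf[2] != 'V')
        return -1;
    size_t pos = be32(buf + 5);                 /* DataOffset, 9 for version 1 */
    if (pos + 4 > len || be32(buf + pos) != 0)  /* PreviousTagSize0 is always 0 */
        return -1;
    pos += 4;
    long count = 0;
    while (pos < len) {
        if (len - pos < 11)                     /* truncated tag header */
            return -1;
        uint32_t data_size = be24(buf + pos + 1);
        size_t tag_end = pos + 11 + data_size;
        if (tag_end + 4 > len || be32(buf + tag_end) != 11 + data_size)
            return -1;                          /* bad back-pointer */
        pos = tag_end + 4;
        count++;
    }
    return count;
}
```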

 


FLV Tag Definition

All data in an FLV file is stored in tags; the data in a tag may be video, audio or script data.

The table below shows the tag structure:

1. FLVTAG

Field              Type              Comment
Reserved           UB [2]            Reserved for FMS, should be 0
Filter             UB [1]            Indicates if packets are filtered.
                                     0 = No pre-processing required.
                                     1 = Pre-processing (such as decryption) of the packet is
                                     required before it can be rendered.
                                     Shall be 0 in unencrypted files, and 1 for encrypted tags.
                                     See Annex F. FLV Encryption for the use of filters.
TagType            UB [5]            Type of contents in this tag. The following types are
                                     defined:
                                     8 = audio
                                     9 = video
                                     18 = script data
DataSize           UI24              Length of the message. Number of bytes after StreamID to
                                     end of tag (Equal to length of the tag - 11)
Timestamp          UI24              Time in milliseconds at which the data in this tag applies.
                                     This value is relative to the first tag in the FLV file, which
                                     always has a timestamp of 0.
TimestampExtended  UI8               Extension of the Timestamp field to form a SI32 value. This
                                     field represents the upper 8 bits, while the previous
                                     Timestamp field represents the lower 24 bits of the time in
                                     milliseconds.
StreamID           UI24              Always 0.
AudioTagHeader     IF TagType == 8   AudioTagHeader
VideoTagHeader     IF TagType == 9   VideoTagHeader
EncryptionHeader   IF Filter == 1    EncryptionTagHeader
FilterParams       IF Filter == 1    FilterParams
Data               IF TagType == 8   AUDIODATA
                   IF TagType == 9   VIDEODATA
                   IF TagType == 18  SCRIPTDATA
                                     Data specific for each media type.

 

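TagType: The low 5 bits of the tag's first byte give the type of data in the tag: 8 = audio, 9 = video, 18 = script data.

DataSize: The length of the data after StreamID.

Timestamp and TimestampExtended together form the tag's timestamp (for AVC video this is the decode time; the presentation time adds the CompositionTime offset). When I first wrote an FLV demuxer, I ignored TimestampExtended and took Timestamp alone as the PTS. The result was frame skipping. Only after rereading the specification did I find that the real value is Timestamp | TimestampExtended << 24.

The data after StreamID differs by format; each format is described in detail below.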
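The following one-function sketch reads the full 32-bit timestamp from an 11-byte tag header. The function name is hypothetical. If byte 7 is ignored, timestamps wrap after 0xFFFFFF ms, about 4.66 hours, which matches the frame skipping described above.

```c
#include <stdint.h>

/* Full tag timestamp in ms: Timestamp (bytes 4-6) holds the low 24 bits,
 * TimestampExtended (byte 7) holds the high 8 bits. */
uint32_t flv_tag_timestamp(const uint8_t *tag)
{
    return ((uint32_t)tag[7] << 24) | ((uint32_t)tag[4] << 16) |
           ((uint32_t)tag[5] << 8) | tag[6];
}
```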


Audio Tags

A tag whose TagType == 8 is an audio tag.

The data after StreamID begins with the AudioTagHeader, structured as follows:

Field          Type                  Comment
SoundFormat    UB [4]                Format of SoundData. The following values are defined:
                                     0 = Linear PCM, platform endian
                                     1 = ADPCM
                                     2 = MP3
                                     3 = Linear PCM, little endian
                                     4 = Nellymoser 16 kHz mono
                                     5 = Nellymoser 8 kHz mono
                                     6 = Nellymoser
                                     7 = G.711 A-law logarithmic PCM
                                     8 = G.711 mu-law logarithmic PCM
                                     9 = reserved
                                     10 = AAC
                                     11 = Speex
                                     14 = MP3 8 kHz
                                     15 = Device-specific sound
                                     Formats 7, 8, 14, and 15 are reserved.
                                     AAC is supported in Flash Player 9,0,115,0 and higher.
                                     Speex is supported in Flash Player 10 and higher.
SoundRate      UB [2]                Sampling rate. The following values are defined:
                                     0 = 5.5 kHz
                                     1 = 11 kHz
                                     2 = 22 kHz
                                     3 = 44 kHz
SoundSize      UB [1]                Size of each audio sample. This parameter only pertains to
                                     uncompressed formats. Compressed formats always decode
                                     to 16 bits internally.
                                     0 = 8-bit samples
                                     1 = 16-bit samples
SoundType      UB [1]                Mono or stereo sound
                                     0 = Mono sound
                                     1 = Stereo sound
AACPacketType  IF SoundFormat == 10  The following values are defined:
               UI8                   0 = AAC sequence header
                                     1 = AAC raw

 

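The first byte of the AudioTagHeader, the byte immediately after StreamID, carries the basic audio information, such as the audio format and sample rate, as the table shows.

The AUDIODATA, i.e. the audio payload, follows the AudioTagHeader. There is one special case: if SoundFormat is 10 = AAC, the AudioTagHeader carries one extra byte, AACPacketType, which identifies the type of AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.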

Field  Type                                Comment
Data   IF AACPacketType == 0               The AudioSpecificConfig is defined in ISO14496-3. Note that this
           AudioSpecificConfig             is not the same as the contents of the esds box from an MP4/F4V file.
       ELSE IF AACPacketType == 1          audio payload
           Raw AAC frame data in UI8 [ ]
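The following minimal sketch parses the AudioTagHeader described above. The struct and function names are hypothetical. The parser also reports how many header bytes precede the payload: 2 for AAC and 1 otherwise.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    unsigned sound_format;  /* 10 = AAC, 11 = Speex, 2 = MP3, ... */
    unsigned sound_rate;    /* 0..3 = 5.5 / 11 / 22 / 44 kHz */
    unsigned sound_size;    /* 0 = 8-bit, 1 = 16-bit */
    unsigned sound_type;    /* 0 = mono, 1 = stereo */
    int aac_packet_type;    /* 0 or 1 for AAC, -1 otherwise */
    size_t header_len;      /* bytes before the audio payload */
} flv_audio_header;

/* Parse the start of an audio tag's data. Returns 0, or -1 if too short. */
int flv_parse_audio_header(const uint8_t *data, size_t len, flv_audio_header *h)
{
    if (len < 1)
        return -1;
    h->sound_format = data[0] >> 4;
    h->sound_rate = (data[0] >> 2) & 0x03;
    h->sound_size = (data[0] >> 1) & 0x01;
    h->sound_type = data[0] & 0x01;
    h->aac_packet_type = -1;
    h->header_len = 1;
    if (h->sound_format == 10) {      /* AAC: extra AACPacketType byte */
        if (len < 2)
            return -1;
        h->aac_packet_type = data[1];
        h->header_len = 2;
    }
    return 0;
}
```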

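The AAC sequence header carries the AudioSpecificConfig, which holds more detailed audio information. The AudioSpecificConfig is defined in ISO 14496-3, section 1.6.2.1 (AudioSpecificConfig), and is not reproduced here. FFmpeg has a function that parses it, ff_mpeg4audio_get_config(); reading it side by side with the specification deepens understanding.

An AAC raw packet contains the audio elementary stream (ES), i.e. the audio payload.

In an FLV file the AAC sequence header normally appears only once, as the first audio tag. I mention this tag because, when I wrote an FLV demuxer for AAC audio, each AAC ES frame needed a 7-byte ADTS header prepended. ADTS (covered in detail in the discussion of audio formats) is the format decoders commonly accept: the raw AAC ES must be wrapped as ADTS AAC before a decoder can play it. Building the ADTS header requires samplingFrequencyIndex, and the most reliable source of that value is the AudioSpecificConfig, so I parsed the AudioSpecificConfig to obtain it.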
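The following sketch implements that step. It assumes that only the first two AudioSpecificConfig bytes matter, meaning no escape values and an object type from 1 to 4 such as AAC-LC, and it builds an ADTS header without CRC. The function name is hypothetical. The bit layout follows the ADTS fixed and variable headers, where aac_frame_length includes the 7 header bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Build a 7-byte ADTS header (no CRC) for a raw AAC frame of payload_len
 * bytes, using the first two AudioSpecificConfig bytes. Returns 0 or -1. */
int adts_header_from_asc(const uint8_t asc[2], size_t payload_len, uint8_t adts[7])
{
    unsigned object_type = asc[0] >> 3;                            /* 5 bits */
    unsigned freq_index = ((asc[0] & 0x07) << 1) | (asc[1] >> 7);  /* 4 bits */
    unsigned channels = (asc[1] >> 3) & 0x0F;                      /* 4 bits */
    size_t frame_len = payload_len + 7;      /* aac_frame_length includes header */

    if (object_type < 1 || object_type > 4 || freq_index > 12 ||
        channels > 7 || frame_len > 0x1FFF)
        return -1;
    adts[0] = 0xFF;                          /* syncword 0xFFF ... */
    adts[1] = 0xF1;                          /* ... MPEG-4, layer 0, no CRC */
    adts[2] = (uint8_t)(((object_type - 1) << 6) | (freq_index << 2) | (channels >> 2));
    adts[3] = (uint8_t)(((channels & 0x03) << 6) | (frame_len >> 11));
    adts[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    adts[5] = (uint8_t)(((frame_len & 0x07) << 5) | 0x1F); /* buffer fullness 0x7FF */
    adts[6] = 0xFC;                          /* rest of fullness, 1 raw data block */
    return 0;
}
```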

At this point you can extract all the audio information and data from an FLV file and feed it to an audio decoder for normal playback.


Video Tags

A tag whose TagType == 9 is a video tag.

The data after StreamID begins with the VideoTagHeader, structured as follows:

Field            Type             Comment
Frame Type       UB [4]           Type of video frame. The following values are defined:
                                  1 = key frame (for AVC, a seekable frame)
                                  2 = inter frame (for AVC, a non-seekable frame)
                                  3 = disposable inter frame (H.263 only)
                                  4 = generated key frame (reserved for server use only)
                                  5 = video info/command frame
CodecID          UB [4]           Codec Identifier. The following values are defined:
                                  2 = Sorenson H.263
                                  3 = Screen video
                                  4 = On2 VP6
                                  5 = On2 VP6 with alpha channel
                                  6 = Screen video version 2
                                  7 = AVC
AVCPacketType    IF CodecID == 7  The following values are defined:
                 UI8              0 = AVC sequence header
                                  1 = AVC NALU
                                  2 = AVC end of sequence (lower level NALU sequence ender is
                                  not required or supported)
CompositionTime  IF CodecID == 7  IF AVCPacketType == 1
                 SI24                 Composition time offset
                                  ELSE
                                      0
                                  See ISO 14496-12, 8.15.3 for an explanation of composition
                                  times. The offset in an FLV file is always in milliseconds.

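The first byte of the VideoTagHeader, the byte immediately after StreamID, carries the most basic video information: the frame type and the video CodecID, as the table shows.

The VIDEODATA, i.e. the video payload, follows the VideoTagHeader. As with AAC audio, there is a special case: if the video format is AVC (H.264), the VideoTagHeader carries 4 extra bytes, AVCPacketType (1 byte) and CompositionTime (3 bytes). AVCPacketType identifies the content of the following VIDEODATA (AVCVIDEOPACKET):

IF AVCPacketType == 0 AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1 One or more NALUs (Full frames are required)

The AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. They must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode. They must also be resent whenever the decoder is stopped and started again, for example on a seek or when switching into or out of fast-forward/rewind. In an FLV file the AVCDecoderConfigurationRecord normally appears only once, as the first video tag.

The AVCDecoderConfigurationRecord is defined in ISO 14496-15, section 5.2.4.1, and is not reproduced here.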
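The following minimal sketch parses the VideoTagHeader, including sign extension of the SI24 CompositionTime. The names are hypothetical. For AVC, the tag timestamp is the decode time, and PTS = DTS + CompositionTime.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    unsigned frame_type;      /* 1 = key frame, 2 = inter frame, ... */
    unsigned codec_id;        /* 7 = AVC */
    int avc_packet_type;      /* 0 = sequence header, 1 = NALUs, 2 = end; -1 if not AVC */
    int32_t composition_time; /* signed offset in ms (0 if not AVC) */
    size_t header_len;        /* 1, or 5 for AVC */
} flv_video_header;

/* Parse the start of a video tag's data. Returns 0, or -1 if too short. */
int flv_parse_video_header(const uint8_t *data, size_t len, flv_video_header *h)
{
    if (len < 1)
        return -1;
    h->frame_type = data[0] >> 4;
    h->codec_id = data[0] & 0x0F;
    h->avc_packet_type = -1;
    h->composition_time = 0;
    h->header_len = 1;
    if (h->codec_id == 7) {
        if (len < 5)
            return -1;
        h->avc_packet_type = data[1];
        uint32_t u = ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 8) | data[4];
        /* sign-extend the 24-bit value */
        h->composition_time = (u & 0x800000) ? (int32_t)u - 0x1000000 : (int32_t)u;
        h->header_len = 5;
    }
    return 0;
}
```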


SCRIPTDATA

A tag whose TagType == 18 is a script tag.

SCRIPTDATA is quite complex: it defines many value types, each with its own structure.

Field            Type                          Comment
Type             UI8                           Type of the ScriptDataValue.
                                               The following types are defined:
                                               0 = Number
                                               1 = Boolean
                                               2 = String
                                               3 = Object
                                               4 = MovieClip (reserved, not supported)
                                               5 = Null
                                               6 = Undefined
                                               7 = Reference
                                               8 = ECMA array
                                               9 = Object end marker
                                               10 = Strict array
                                               11 = Date
                                               12 = Long string
ScriptDataValue  IF Type == 0:  DOUBLE         Script data value.
                 IF Type == 1:  UI8            The Boolean value is (ScriptDataValue ≠ 0).
                 IF Type == 2:  SCRIPTDATASTRING
                 IF Type == 3:  SCRIPTDATAOBJECT
                 IF Type == 7:  UI16
                 IF Type == 8:  SCRIPTDATAECMAARRAY
                 IF Type == 10: SCRIPTDATASTRICTARRAY
                 IF Type == 11: SCRIPTDATADATE
                 IF Type == 12: SCRIPTDATALONGSTRING

All of these types are described in detail in the official FLV specification.
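Two of these types are enough to read a script tag's name and the numeric onMetaData fields. The following sketch uses hypothetical function names. It assumes the host stores doubles in IEEE-754 format, as common platforms do. A Type 0 value is a marker byte followed by an 8-byte big-endian double, and a Type 2 string is a marker byte, a UI16 length, and the characters.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Decode a Type 0 (Number) value. Returns bytes consumed, or 0 on error. */
size_t amf0_read_number(const uint8_t *p, size_t len, double *out)
{
    if (len < 9 || p[0] != 0x00)
        return 0;
    uint64_t bits = 0;
    for (int i = 1; i <= 8; i++)       /* big-endian to host integer */
        bits = (bits << 8) | p[i];
    memcpy(out, &bits, sizeof *out);   /* reinterpret as IEEE-754 double */
    return 9;
}

/* Decode a Type 2 (String) value into buf as a C string.
 * Returns bytes consumed, or 0 on error or if buf is too small. */
size_t amf0_read_string(const uint8_t *p, size_t len, char *buf, size_t buf_size)
{
    if (len < 3 || p[0] != 0x02)
        return 0;
    size_t n = ((size_t)p[1] << 8) | p[2];
    if (len < 3 + n || n + 1 > buf_size)
        return 0;
    memcpy(buf, p + 3, n);
    buf[n] = '\0';
    return 3 + n;
}
```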

onMetaData

onMetaData is the SCRIPTDATA entry that matters most to us; its structure is shown in the table below:

Property Name    Type     Comment
audiocodecid     Number   Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate    Number   Audio bit rate in kilobits per second
audiodelay       Number   Delay introduced by the audio codec in seconds
audiosamplerate  Number   Frequency at which the audio stream is replayed
audiosamplesize  Number   Resolution of a single audio sample
canSeekToEnd     Boolean  Indicating the last video frame is a key frame
creationdate     String   Creation date and time
duration         Number   Total duration of the file in seconds
filesize         Number   Total size of the file in bytes
framerate        Number   Number of frames per second
height           Number   Height of the video in pixels
stereo           Boolean  Indicating stereo audio
videocodecid     Number   Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate    Number   Video bit rate in kilobits per second
width            Number   Width of the video in pixels

Of these, duration, filesize and the video width and height are particularly useful to us.

keyframes

When I wrote my FLV demuxer, I found that the official specification does not describe a keyframes index. Unlike TS, however, FLV tags have no sync bytes, so without a keyframes index seeking and fast-forward/rewind perform very poorly: the tags must be read one by one in order. After searching online, I found that keyframe information is hidden in SCRIPTDATA.

keyframes is practically an unofficial, de facto standard. It is now hard to find FLV videos online whose metadata lacks a keyframes entry. The two common metadata tools, flvtool2 and FLVMDI, both treat keyframes as a default metadata item. The FLVMDI home page (http://www.buraks.com/flvmdi/) describes it as follows:

keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).

In other words, keyframes holds two arrays, 'filepositions' and 'times': the file positions and the timestamps of the key frames. From keyframes you can build your own index and, when seeking or fast-forwarding/rewinding, jump quickly and efficiently to the key frame you want.
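After the 'times' and 'filepositions' arrays have been read (as AMF Numbers, hence doubles), finding the seek target is a binary search. The following sketch uses a hypothetical function name. It picks the last key frame at or before the target time, or the first key frame if the target precedes it.

```c
#include <stddef.h>

/* times: key-frame timestamps in seconds, ascending.
 * filepositions: byte offsets of the matching video tags.
 * Returns the file position to seek to, or -1 if there are no key frames. */
double keyframe_seek_position(const double *times, const double *filepositions,
                              size_t count, double target_sec)
{
    if (count == 0)
        return -1;
    size_t lo = 0, hi = count;     /* find first index with times[i] > target */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] <= target_sec)
            lo = mid + 1;
        else
            hi = mid;
    }
    size_t idx = (lo == 0) ? 0 : lo - 1;
    return filepositions[idx];
}
```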

 
 
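rtmpdump can download an RTMP stream and save it as an FLV file.
To process the audio or video in the stream separately, you have to extract each according to the FLV format.
A few simple changes to the rtmpdump code add this capability.
1 Extracting audio:
rtmpdump downloads in a loop in its Download function:
....
do
{
    ....
    nRead = RTMP_Read(rtmp, buffer, bufferSize);
    ....
} while (!RTMP_ctrlC && nRead > -1 && RTMP_IsConnected(rtmp) && !RTMP_IsTimedout(rtmp));
....

The original program writes whatever it receives to a file, producing an FLV.
Here, the audio and video are extracted separately before writing. Extracting audio is the simpler of the two: parse the buffer directly (following the approach in the RTMP_Write function).
Note that rtmpdump receives with RTMP_Read; pay attention to its parameters. For convenience you can also call RTMP_ReadPacket directly; the video code below receives and processes packets with RTMP_ReadPacket.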

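#include "librtmp/rtmp.h"  // RTMP, RTMP_PACKET_TYPE_AUDIO
#include "librtmp/amf.h"   // AMF_DecodeInt24

// Parse one FLV tag from a buffer returned by RTMP_Read() and, if it is an
// audio tag, hand the encoded audio to the decoder.
// Assumes the buffer holds exactly one complete tag (RTMP_Read can also
// return partial data or several tags from an aggregate packet).
// AVManager, BYTE and evt_OnReceivePacket belong to this application.
int RTMP_Write2(RTMP *r, const char *buf, int size)
{
  (void)r;  // kept for the RTMP_Write-style signature

  // The first buffer from RTMP_Read() starts with the 9-byte FLV header
  // plus the 4-byte PreviousTagSize0; skip them.
  if (size >= 13 && buf[0] == 'F' && buf[1] == 'L' && buf[2] == 'V')
  {
    buf += 13;
    size -= 13;
  }
  if (size < 11)
    return 0;  // FLV tag header is 11 bytes: packet too small

  unsigned char tagType = (unsigned char)buf[0];
  unsigned int bodySize = AMF_DecodeInt24(buf + 1);         // DataSize
  unsigned int timeStamp = AMF_DecodeInt24(buf + 4)          // Timestamp
      | ((unsigned int)(unsigned char)buf[7] << 24);        // TimestampExtended
  const char *body = buf + 11;                              // skip StreamID too
  (void)timeStamp;

  if (tagType != RTMP_PACKET_TYPE_AUDIO || bodySize < 2 ||
      bodySize > (unsigned int)(size - 11))
    return 0;

  // body[0] is the AudioTagHeader. The audio here is Speex-encoded, which
  // has no AACPacketType byte, so the encoded audio is body + 1,
  // bodySize - 1 bytes.
  // Why the first byte is skipped: http://bbs.rosoo.net/thread-16488-1-1.html
  BYTE outbuf2[640];
  int nLen2 = sizeof(outbuf2);
  AVManager::GetInstance()->Decode((BYTE *)(body + 1), bodySize - 1, outbuf2, nLen2);

  evt_OnReceivePacket((char *)outbuf2, nLen2);  // deliver decoded audio via callback
  return 1;
}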


Video processing
See rtmpsrv.c for reference.
Replace the nRead = RTMP_Read(rtmp, buffer, bufferSize); loop with packet-level reading, as follows:
#include <stdio.h>
#include "librtmp/rtmp.h"  // RTMP_ReadPacket, RTMP_ClientPacket, RTMPPacket_*

static const unsigned char start_code[4] = {0x00, 0x00, 0x00, 0x01};

// Write one NALU in Annex B form: start code first, then the NALU bytes.
static void write_nalu(FILE *pf, const unsigned char *p, unsigned int len)
{
    fwrite(start_code, 1, 4, pf);
    fwrite(p, 1, len, pf);
}

// Write `count` parameter sets (SPS or PPS), each preceded by a 2-byte
// length, starting at data[*pos]; *pos is advanced past them.
static void write_param_sets(FILE *pf, const unsigned char *data,
                             unsigned int size, unsigned int *pos, int count)
{
    for (int i = 0; i < count && *pos + 2 <= size; i++) {
        unsigned int len = (data[*pos] << 8) | data[*pos + 1];
        *pos += 2;
        if (len > size - *pos)
            break;
        write_nalu(pf, data + *pos, len);
        *pos += len;
    }
}

// Call dump_h264(rtmp, pf) in place of the RTMP_Read() loop; pf is an
// opened output .h264 file. Aggregate packets (type 0x16) are not unpacked.
static void dump_h264(RTMP *rtmp, FILE *pf)
{
    RTMPPacket pc = { 0 };
    while (RTMP_IsConnected(rtmp) && RTMP_ReadPacket(rtmp, &pc)) {
        if (!RTMPPacket_IsReady(&pc))
            continue;                              // wait for the remaining chunks
        // Let librtmp handle every packet (chunk size, ping, ...);
        // it returns non-zero for media packets.
        int media = RTMP_ClientPacket(rtmp, &pc);
        const unsigned char *data = (const unsigned char *)pc.m_body;
        unsigned int size = pc.m_nBodySize;

        if (media && pc.m_packetType == RTMP_PACKET_TYPE_VIDEO &&
            size >= 5 && (data[0] & 0x0F) == 7) {  // CodecID 7 = AVC
            int bIsKeyFrame = (data[0] >> 4) == 1; // 0x17 = key frame, 0x27 = inter frame
            (void)bIsKeyFrame;
            if (data[1] == 0 && size > 10) {
                // AVC sequence header: the AVCDecoderConfigurationRecord starts
                // at data[5]; data[10] holds reserved bits | SPS count.
                unsigned int pos = 11;
                write_param_sets(pf, data, size, &pos, data[10] & 0x1F);
                if (pos < size) {
                    int ppsnum = data[pos++];      // PPS count is a full byte
                    write_param_sets(pf, data, size, &pos, ppsnum);
                }
            } else if (data[1] == 1) {
                // One or more NALUs, each preceded by a 4-byte big-endian length.
                unsigned int pos = 5;
                while (pos + 4 <= size) {
                    unsigned int len = ((unsigned int)data[pos] << 24) |
                                       (data[pos + 1] << 16) |
                                       (data[pos + 2] << 8) | data[pos + 3];
                    pos += 4;
                    if (len > size - pos)
                        break;
                    write_nalu(pf, data + pos, len);
                    pos += len;
                }
            }
        }
        RTMPPacket_Free(&pc);
    }
}





 
Guangdong ICP No. 18138465   © 2018-2025 CODEPRJ.COM