http://blog.csdn.net/pirateleo/article/details/7061452
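I. Basic Concepts
1. File: made up of a series of Boxes and FullBoxes.
2. Box: every Box consists of a Header and Data.
3. FullBox: an extension of Box that adds an 8-bit version and 24-bit flags to the Header.
4. Header: holds the length (size) and the type of the whole Box. When size == 0, this is the last Box in the file and it runs to the end of the file. When size == 1, the length needs more bits than the 32-bit size field offers, so a 64-bit largesize follows to give the Box length. When type is uuid, the data in the Box is a user-defined extension type.
5. Data: the actual payload of the Box, which can be raw data or further child Boxes.
6. A Box whose Data is a series of child Boxes is also called a Container Box.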
The structure is shown in the figure below:

[Figure: basic file structure]
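The header rules listed above (32-bit size, four-character type, the size == 1 and size == 0 special cases, and the uuid extended type) can be sketched in C roughly as follows. This is only an illustration and not code from the original post: the `BoxHeader` struct and the function names are hypothetical, and the sketch assumes the header bytes are already in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch (not part of any library). */
typedef struct {
    uint64_t size;        /* total box length in bytes, header included */
    int      to_eof;      /* 1 if size == 0 in the file: box runs to end of file */
    char     type[5];     /* four-character code, NUL-terminated */
    size_t   header_len;  /* 8, or 16 with largesize, plus 16 more for 'uuid' */
} BoxHeader;

/* All multi-byte integers in a box are stored big-endian. */
uint32_t box_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint64_t box_be64(const uint8_t *p)
{
    return ((uint64_t)box_be32(p) << 32) | box_be32(p + 4);
}

/* Parse one box header from buf[0..len).
 * Returns 0 on success, -1 if the buffer is too short or the declared
 * size is smaller than the header itself. */
int parse_box_header(const uint8_t *buf, size_t len, BoxHeader *out)
{
    uint32_t size32;

    if (len < 8)
        return -1;
    size32 = box_be32(buf);
    memcpy(out->type, buf + 4, 4);
    out->type[4] = '\0';
    out->header_len = 8;
    out->to_eof = (size32 == 0);
    out->size = size32;

    if (size32 == 1) {                        /* 64-bit largesize follows the type */
        if (len < 16)
            return -1;
        out->size = box_be64(buf + 8);
        out->header_len = 16;
    }
    if (memcmp(out->type, "uuid", 4) == 0) {  /* 16-byte user type follows */
        if (len < out->header_len + 16)
            return -1;
        out->header_len += 16;
    }
    if (!out->to_eof && out->size < out->header_len)
        return -1;
    return 0;
}
```

For a FullBox, the first four bytes after this header would then hold the 8-bit version and the 24-bit flags.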
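1. ftyp box: sits at the start of the file and describes the file's version, compatible specifications and so on.
2. moov box: contains no media data itself, but holds the high-level description of all the media data in the file. Under the moov box are the mvhd and trak boxes.
>> mvhd records the creation time, modification time, time scale, playable duration and similar information.
>> The child boxes of trak describe each media track in detail.
3. moof box: describes a video fragment. It is not a required part of an MP4 file, but it is essential in the MP4 files we commonly play online (for example the ismv files of Silverlight Smooth Streaming).
4. mdat box: the actual media data. Everything we finally decode and play lives here.
5. mfra box: usually at the end of the file; it is the index of the media and lets you look up and jump straight to the media data for a given point in time.
Note: in a Smooth Streaming ismv file, the file is split into multiple Fragments, and each Fragment contains a moof and an mdat. This structure suits progressive playback: the mdat and its description arrive piece by piece, and as soon as one Fragment is complete its mdat can be played.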
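To make the top-level layout above concrete, here is a small sketch that walks the top-level boxes of a file image and checks whether any moof box is present. It is an illustration under assumptions, not code from the original post: the function names are made up, the whole file is assumed to be in memory, and "has at least one moof" is used as a simple stand-in for "is fragmented".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t walk_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint64_t walk_be64(const uint8_t *p)
{
    return ((uint64_t)walk_be32(p) << 32) | walk_be32(p + 4);
}

/* Count the top-level boxes of the given four-character type in
 * data[0..len). Returns -1 if a size field does not fit the buffer. */
int count_top_level_boxes(const uint8_t *data, size_t len, const char *fourcc)
{
    size_t pos = 0;
    int count = 0;

    while (len - pos >= 8) {
        uint64_t size = walk_be32(data + pos);
        size_t header_len = 8;

        if (size == 1) {                 /* 64-bit largesize */
            if (len - pos < 16)
                return -1;
            size = walk_be64(data + pos + 8);
            header_len = 16;
        } else if (size == 0) {          /* last box: runs to end of data */
            size = len - pos;
        }
        if (size < header_len || size > len - pos)
            return -1;
        if (memcmp(data + pos + 4, fourcc, 4) == 0)
            count++;
        pos += (size_t)size;             /* jump to the next sibling box */
    }
    return count;
}

/* Treat the file as fragmented if it has at least one top-level moof. */
int has_movie_fragments(const uint8_t *data, size_t len)
{
    return count_top_level_boxes(data, len, "moof") > 0;
}
```

In an ismv-style file one would expect the moof count and the mdat count to match, since each Fragment carries one of each.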
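FFmpeg's MOV/MP4 muxer (libavformat/movenc.c) turns on the fragmentation flags implicitly when writing in ISM (Smooth Streaming) mode:

    /* Set other implicit flags immediately */
    if (mov->mode == MODE_ISM)
        mov->flags |= FF_MOV_FLAG_EMPTY_MOOV | FF_MOV_FLAG_SEPARATE_MOOF |
                      FF_MOV_FLAG_FRAGMENT;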
On the reading side, FFmpeg's demuxer (mov_read_ftyp in libavformat/mov.c) handles the ftyp box like this:

    static int mov_read_ftyp(MOVContext *c, AVIOContext *pb, MOVAtom atom)
    {
        uint32_t minor_ver;
        int comp_brand_size;
        char minor_ver_str[11]; /* 32 bit integer -> 10 digits + null */
        char* comp_brands_str;
        uint8_t type[5] = {0};

        /* major brand: "qt  " means QuickTime, anything else is ISO media */
        avio_read(pb, type, 4);
        if (strcmp(type, "qt  "))
            c->isom = 1;
        av_log(c->fc, AV_LOG_DEBUG, "ISO: File Type Major Brand: %.4s\n",(char *)&type);
        av_dict_set(&c->fc->metadata, "major_brand", type, 0);
        minor_ver = avio_rb32(pb); /* minor version */
        snprintf(minor_ver_str, sizeof(minor_ver_str), "%"PRIu32"", minor_ver);
        av_dict_set(&c->fc->metadata, "minor_version", minor_ver_str, 0);

        /* the rest of the box (after the 8 bytes read above) is the brand list */
        comp_brand_size = atom.size - 8;
        if (comp_brand_size < 0)
            return AVERROR_INVALIDDATA;
        comp_brands_str = av_malloc(comp_brand_size + 1); /* Add null terminator */
        if (!comp_brands_str)
            return AVERROR(ENOMEM);
        avio_read(pb, comp_brands_str, comp_brand_size);
        comp_brands_str[comp_brand_size] = 0;
        av_dict_set(&c->fc->metadata, "compatible_brands", comp_brands_str, 0);
        av_freep(&comp_brands_str);

        return 0;
    }
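II. MP4 File Format (ISO/IEC 14496-12/14)
Overview of the MP4 file
An MP4 file is made up of Boxes of many kinds. The table below lists all the mandatory and optional Box types; √ marks a mandatory Box.

The full list (columns L1 to L6 give the nesting level of each Box):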
| L1 | L2 | L3 | L4 | L5 | L6 | Mandatory | Description |
| ftyp | | | | | | √ | file type and compatibility |
| pdin | | | | | | | progressive download information |
| moov | | | | | | √ | container for all the metadata |
| | mvhd | | | | | √ | movie header, overall declarations |
| | trak | | | | | √ | container for an individual track or stream |
| | | tkhd | | | | √ | track header, overall information about the track |
| | | tref | | | | | track reference container |
| | | edts | | | | | edit list container |
| | | | elst | | | | an edit list |
| | | mdia | | | | √ | container for the media information in a track |
| | | | mdhd | | | √ | media header, overall information about the media |
| | | | hdlr | | | √ | handler, declares the media (handler) type |
| | | | minf | | | √ | media information container |
| | | | | vmhd | | | video media header, overall information (video track only) |
| | | | | smhd | | | sound media header, overall information (sound track only) |
| | | | | hmhd | | | hint media header, overall information (hint track only) |
| | | | | nmhd | | | Null media header, overall information (some tracks only) |
| | | | | dinf | | √ | data information box, container |
| | | | | | dref | √ | data reference box, declares source(s) of media data in track |
| | | | | stbl | | √ | sample table box, container for the time/space map |
| | | | | | stsd | √ | sample descriptions (codec types, initialization etc.) |
| | | | | | stts | √ | (decoding) time-to-sample |
| | | | | | ctts | | (composition) time to sample |
| | | | | | stsc | √ | sample-to-chunk, partial data-offset information |
| | | | | | stsz | | sample sizes (framing) |
| | | | | | stz2 | | compact sample sizes (framing) |
| | | | | | stco | √ | chunk offset, partial data-offset information |
| | | | | | co64 | | 64-bit chunk offset |
| | | | | | stss | | sync sample table (random access points) |
| | | | | | stsh | | shadow sync sample table |
| | | | | | padb | | sample padding bits |
| | | | | | stdp | | sample degradation priority |
| | | | | | sdtp | | independent and disposable samples |
| | | | | | sbgp | | sample-to-group |
| | | | | | sgpd | | sample group description |
| | | | | | subs | | sub-sample information |
| | mvex | | | | | | movie extends box |
| | | mehd | | | | | movie extends header box |
| | | trex | | | | √ | track extends defaults |
| | ipmc | | | | | | IPMP Control Box |
| moof | | | | | | | movie fragment |
| | mfhd | | | | | √ | movie fragment header |
| | traf | | | | | | track fragment |
| | | tfhd | | | | √ | track fragment header |
| | | trun | | | | | track fragment run |
| | | sdtp | | | | | independent and disposable samples |
| | | sbgp | | | | | sample-to-group |
| | | subs | | | | | sub-sample information |
| mfra | | | | | | | movie fragment random access |
| | tfra | | | | | | track fragment random access |
| | mfro | | | | | √ | movie fragment random access offset |
| mdat | | | | | | | media data container |
| free | | | | | | | free space |
| skip | | | | | | | free space |
| | udta | | | | | | user-data |
| | | cprt | | | | | copyright etc. |
| meta | | | | | | | metadata |
| | hdlr | | | | | √ | handler, declares the metadata (handler) type |
| | dinf | | | | | | data information box, container |
| | | dref | | | | | data reference box, declares source(s) of metadata items |
| | ipmc | | | | | | IPMP Control Box |
| | iloc | | | | | | item location |
| | ipro | | | | | | item protection |
| | | sinf | | | | | protection scheme information box |
| | | | frma | | | | original format box |
| | | | imif | | | | IPMP Information box |
| | | | schm | | | | scheme type box |
| | | | schi | | | | scheme information box |
| | iinf | | | | | | item information |
| | xml | | | | | | XML container |
| | bxml | | | | | | binary XML container |
| | pitm | | | | | | primary item reference |
| | fiin | | | | | | file delivery item information |
| | | paen | | | | | partition entry |
| | | | fpar | | | | file partition |
| | | | fecr | | | | FEC reservoir |
| | | segr | | | | | file delivery session group |
| | | gitn | | | | | group id to name |
| | | tsel | | | | | track selection |
| meco | | | | | | | additional metadata container |
| | mere | | | | | | metabox relation |
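Because container boxes simply hold more boxes, the nesting in the table (for example moov, then trak, then tkhd) can be followed with a small recursive-descent style lookup. The sketch below is a hypothetical helper, not code from the original post. It assumes the data is in memory, handles only 32-bit sizes (a box using largesize is reported as not found), and only descends through plain container boxes such as moov, trak, mdia, minf and stbl; a FullBox container such as meta carries four extra version/flags bytes that this sketch does not skip.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t path_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the first box of type fourcc among the sibling boxes in
 * data[0..len). On success, returns a pointer to that box's payload and
 * stores the payload length in *out_len. Returns NULL if the box is not
 * found or a size field is inconsistent. */
const uint8_t *find_child_box(const uint8_t *data, size_t len,
                              const char *fourcc, size_t *out_len)
{
    size_t pos = 0;

    while (len - pos >= 8) {
        uint64_t size = path_be32(data + pos);

        if (size == 0)                        /* runs to end of parent */
            size = len - pos;
        if (size < 8 || size > len - pos)     /* also rejects size == 1 */
            return NULL;
        if (memcmp(data + pos + 4, fourcc, 4) == 0) {
            *out_len = (size_t)size - 8;
            return data + pos + 8;
        }
        pos += (size_t)size;
    }
    return NULL;
}

/* Follow a '/'-separated path of four-character codes, e.g.
 * "moov/trak/tkhd", descending one container level per component. */
const uint8_t *find_box_path(const uint8_t *data, size_t len,
                             const char *path, size_t *out_len)
{
    while (*path) {
        size_t child_len = 0;

        if (strlen(path) < 4)
            return NULL;
        data = find_child_box(data, len, path, &child_len);
        if (!data)
            return NULL;
        len = child_len;
        path += 4;
        if (*path == '/')
            path++;
        else if (*path != '\0')
            return NULL;
    }
    *out_len = len;
    return data;
}
```

Note that this returns only the first match at each level, so in a file with several trak boxes it would always land in the first track.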
http://blog.csdn.net/tx3344/article/details/8476669
MP4 (MPEG-4 Part 14) is a common multimedia container format, defined in the "ISO/IEC 14496-14" standard.
1. The smallest building block: the Box
2. Overall structure of an MP4 file
| Code | Abstract | Defined in/by |
| ainf | Asset information to identify, license and play | DECE |
| albm | Album title and track number (user-data) | 3GPP |
| auth | Media author name (user-data) | 3GPP |
| avcn | AVC NAL Unit Storage Box | DECE |
| bloc | Base location and purchase location for license acquisition | DECE |
| bpcc | Bits per component | JP2 |
| buff | Buffering information | AVC |
| bxml | binary XML container | ISO |
| ccid | OMA DRM Content ID | OMA DRM 2.1 |
| cdef | type and ordering of the components within the codestream | JP2 |
| clsf | Media classification (user-data) | 3GPP |
| cmap | mapping between a palette and codestream components | JP2 |
| co64 | 64-bit chunk offset | ISO |
| colr | specifies the colourspace of the image | JP2 |
| cprt | copyright etc. (user-data) | ISO |
| crhd | reserved for ClockReferenceStream header | MP4V1 |
| cslg | composition to decode timeline mapping | ISO |
| ctts | (composition) time to sample | ISO |
| cvru | OMA DRM Cover URI | OMA DRM 2.1 |
| dcfD | Marlin DCF Duration, user-data atom type | OMArlin |
| dinf | data information box, container | ISO |
| dref | data reference box, declares source(s) of media data in track | ISO |
| dscp | Media description (user-data) | 3GPP |
| dsgd | DVB Sample Group Description Box | DVB |
| dstg | DVB Sample to Group Box | DVB |
| edts | edit list container | ISO |
| elst | an edit list | ISO |
| feci | FEC Information | ISO |
| fecr | FEC Reservoir | ISO |
| fiin | FD Item Information | ISO |
| fire | File Reservoir | ISO |
| fpar | File Partition | ISO |
| free | free space | ISO |
| frma | original format box | ISO |
| ftyp | file type and compatibility | JP2, ISO |
| gitn | Group ID to name | ISO |
| gnre | Media genre (user-data) | 3GPP |
| grpi | OMA DRM Group ID | OMA DRM 2.0 |
| hdlr | handler, declares the media (handler) type | ISO |
| hmhd | hint media header, overall information (hint track only) | ISO |
| hpix | Hipix Rich Picture (user-data or meta-data) | HIPIX |
| icnu | OMA DRM Icon URI | OMA DRM 2.0 |
| ID32 | ID3 version 2 container | inline |
| idat | Item data | ISO |
| ihdr | Image Header | JP2 |
| iinf | item information | ISO |
| iloc | item location | ISO |
| imif | IPMP Information box | ISO |
| infu | OMA DRM Info URL | OMA DRM 2.0 |
| iods | Object Descriptor container box | MP4V1 |
| iphd | reserved for IPMP Stream header | MP4V1 |
| ipmc | IPMP Control Box | ISO |
| ipro | item protection | ISO |
| iref | Item reference | ISO |
| jP$20$20 | JPEG 2000 Signature | JP2 |
| jp2c | JPEG 2000 contiguous codestream | JP2 |
| jp2h | Header | JP2 |
| jp2i | intellectual property information | JP2 |
| kywd | Media keywords (user-data) | 3GPP |
| loci | Media location information (user-data) | 3GPP |
| lrcu | OMA DRM Lyrics URI | OMA DRM 2.1 |
| m7hd | reserved for MPEG7Stream header | MP4V1 |
| mdat | media data container | ISO |
| mdhd | media header, overall information about the media | ISO |
| mdia | container for the media information in a track | ISO |
| mdri | Mutable DRM information | OMA DRM 2.0 |
| meco | additional metadata container | ISO |
| mehd | movie extends header box | ISO |
| mere | metabox relation | ISO |
| meta | Metadata container | ISO |
| mfhd | movie fragment header | ISO |
| mfra | Movie fragment random access | ISO |
| mfro | Movie fragment random access offset | ISO |
| minf | media information container | ISO |
| mjhd | reserved for MPEG-J Stream header | MP4V1 |
| moof | movie fragment | ISO |
| moov | container for all the meta-data | ISO |
| mvcg | Multiview group | AVC |
| mvci | Multiview Information | AVC |
| mvex | movie extends box | ISO |
| mvhd | movie header, overall declarations | ISO |
| mvra | Multiview Relation Attribute | AVC |
| nmhd | Null media header, overall information (some tracks only) | ISO |
| ochd | reserved for ObjectContentInfoStream header | MP4V1 |
| odaf | OMA DRM Access Unit Format | OMA DRM 2.0 |
| odda | OMA DRM Content Object | OMA DRM 2.0 |
| odhd | reserved for ObjectDescriptorStream header | MP4V1 |
| odhe | OMA DRM Discrete Media Headers | OMA DRM 2.0 |
| odrb | OMA DRM Rights Object | OMA DRM 2.0 |
| odrm | OMA DRM Container | OMA DRM 2.0 |
| odtt | OMA DRM Transaction Tracking | OMA DRM 2.0 |
| ohdr | OMA DRM Common headers | OMA DRM 2.0 |
| padb | sample padding bits | ISO |
| paen | Partition Entry | ISO |
| pclr | palette which maps a single component in index space to a multiple-component image | JP2 |
| pdin | Progressive download information | ISO |
| perf | Media performer name (user-data) | 3GPP |
| pitm | primary item reference | ISO |
| res$20 | grid resolution | JP2 |
| resc | grid resolution at which the image was captured | JP2 |
| resd | default grid resolution at which the image should be displayed | JP2 |
| rtng | Media rating (user-data) | 3GPP |
| sbgp | Sample to Group box | AVC, ISO |
| schi | scheme information box | ISO |
| schm | scheme type box | ISO |
| sdep | Sample dependency | AVC |
| sdhd | reserved for SceneDescriptionStream header | MP4V1 |
| sdtp | Independent and Disposable Samples Box | AVC, ISO |
| sdvp | SD Profile Box | SDV |
| segr | file delivery session group | ISO |
| senc | Sample specific encryption data | DECE |
| sgpd | Sample group definition box | AVC, ISO |
| sidx | Segment Index Box | 3GPP |
| sinf | protection scheme information box | ISO |
| skip | free space | ISO |
| smhd | sound media header, overall information (sound track only) | ISO |
| srmb | System Renewability Message | DVB |
| srmc | System Renewability Message container | DVB |
| srpp | STRP Process | ISO |
| stbl | sample table box, container for the time/space map | ISO |
| stco | chunk offset, partial data-offset information | ISO |
| stdp | sample degradation priority | ISO |
| sthd | Subtitle Media Header Box | DECE |
| stsc | sample-to-chunk, partial data-offset information | ISO |
| stsd | sample descriptions (codec types, initialization etc.) | ISO |
| stsh | shadow sync sample table | ISO |
| stss | sync sample table (random access points) | ISO |
| stsz | sample sizes (framing) | ISO |
| stts | (decoding) time-to-sample | ISO |
| styp | Segment Type Box | 3GPP |
| stz2 | compact sample sizes (framing) | ISO |
| subs | Sub-sample information | ISO |
| swtc | Multiview Group Relation | AVC |
| tfad | Track fragment adjustment box | 3GPP |
| tfhd | Track fragment header | ISO |
| tfma | Track fragment media adjustment box | 3GPP |
| tfra | Track fragment random access | ISO |
| tibr | Tier Bit rate | AVC |
| tiri | Tier Information | AVC |
| titl | Media title (user-data) | 3GPP |
| tkhd | Track header, overall information about the track | ISO |
| traf | Track fragment | ISO |
| trak | container for an individual track or stream | ISO |
| tref | track reference container | ISO |
| trex | track extends defaults | ISO |
| trgr | Track grouping information | ISO |
| trik | Facilitates random access and trick play modes | DECE |
| trun | track fragment run | ISO |
| tsel | Track selection (user-data) | 3GPP |
| udta | user-data | ISO |
| uinf | a tool by which a vendor may provide access to additional information associated with a UUID | JP2 |
| UITS | Unique Identifier Technology Solution | Universal Music |
| ulst | a list of UUID’s | JP2 |
| url$20 | a URL | JP2 |
| uuid | user-extension box | ISO, JP2 |
| vmhd | video media header, overall information (video track only) | ISO |
| vwdi | Multiview Scene Information | AVC |
| xml$20 | a tool by which vendors can add XML formatted information | JP2 |
| xml$20 | XML container | ISO |
| yrrc | Year when media was recorded (user-data) | 3GPP |
| Code | Abstract | Defined in/by |
| clip | Visual clipping region container | QT |
| crgn | Visual clipping region definition | QT |
| ctab | Track color-table | QT |
| elng | Extended Language Tag | QT |
| imap | Track input map definition | QT |
| kmat | Compressed visual track matte | QT |
| load | Track pre-load definitions | QT |
| matt | Visual track matte for compositing | QT |
| pnot | Preview container | QT |
| wide | Expansion space reservation | QT |
1.File Type Box
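Box Type: 'ftyp'
This box normally appears at the very start of an MP4 file and serves as the identifying signature of the MP4 container format, much as the 3-byte 'F' 'L' 'V' header identifies FLV, the leading bytes 1A 45 DF A3 identify MKV, and ASF_Header_Object identifies the ASF container format.
The ftyp box is laid out as follows:

    aligned(8) class FileTypeBox
        extends Box('ftyp') {
        unsigned int(32) major_brand;
        unsigned int(32) minor_version;
        unsigned int(32) compatible_brands[]; // to end of the box
    }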
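As a rough standalone counterpart to the layout above, the following sketch reads the three ftyp fields from a payload buffer (the bytes after the 8-byte box header). The `FtypInfo` struct and both function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of an ftyp box for this sketch. */
typedef struct {
    char     major_brand[5];   /* four-character code, NUL-terminated */
    uint32_t minor_version;
    size_t   n_compatible;     /* number of entries in compatible_brands[] */
} FtypInfo;

uint32_t ftyp_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* payload points just past the box header; len is the payload length.
 * Returns 0 on success, -1 if the payload cannot be a valid ftyp body. */
int parse_ftyp(const uint8_t *payload, size_t len, FtypInfo *out)
{
    if (len < 8 || (len - 8) % 4 != 0)
        return -1;
    memcpy(out->major_brand, payload, 4);
    out->major_brand[4] = '\0';
    out->minor_version = ftyp_be32(payload + 4);
    out->n_compatible = (len - 8) / 4;   /* brands run to the end of the box */
    return 0;
}

/* Return 1 if brand is listed in compatible_brands[], else 0. */
int ftyp_has_compatible_brand(const uint8_t *payload, size_t len,
                              const char *brand)
{
    size_t off;

    for (off = 8; off + 4 <= len; off += 4)
        if (memcmp(payload + off, brand, 4) == 0)
            return 1;
    return 0;
}
```

A reader would typically check both major_brand and the compatible list, since a file may declare a brand it supports only in the compatible_brands array.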
2.Movie Box
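The moov box contains many child boxes, as marked on the figure in the previous article. moov usually comes right after ftyp. It holds the metadata of the MP4 file, that is, the basic information about the audio and video. Let's look at the important boxes inside moov.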
2.1 Movie Header Box
    aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0) {
        if (version==1) {
            unsigned int(64) creation_time;
            unsigned int(64) modification_time;
            unsigned int(32) timescale;
            unsigned int(64) duration;
        } else { // version==0
            unsigned int(32) creation_time;
            unsigned int(32) modification_time;
            unsigned int(32) timescale;
            unsigned int(32) duration;
        }
        template int(32) rate = 0x00010000; // typically 1.0
        template int(16) volume = 0x0100; // typically, full volume
        const bit(16) reserved = 0;
        const unsigned int(32)[2] reserved = 0;
        template int(32)[9] matrix =
            { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
            // Unity matrix
        bit(32)[6] pre_defined = 0;
        unsigned int(32) next_track_ID;
    }
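| Field | Size (bytes) | Description |
| box size | 4 | Size of the box |
| box type | 4 | Type of the box |
| version | 1 | Box version, 0 or 1; usually 0 (the sizes in this table are for version 0) |
| flags | 3 | flags |
| creation time | 4 | Creation time, in seconds since 1904-01-01 00:00:00 UTC |
| modification time | 4 | Modification time |
| time scale | 4 | Number of time units in one second of the file's media, i.e. how many ticks make up one second. Video commonly uses 90000 |
| duration | 4 | Duration in time-scale units. Dividing duration by time scale gives the length in seconds: for example, an audio track with time scale = 8000 and duration = 560128 lasts 70.016 s, and a video track with time scale = 600 and duration = 42000 lasts 70 s |
| rate | 4 | Recommended playback rate in [16.16] fixed-point format: the high 16 bits are the integer part and the low 16 bits the fractional part. 1.0 (0x00010000) means normal forward playback |
| volume | 2 | Like rate, but in [8.8] format; 1.0 (0x0100) means full volume |
| reserved | 10 | Reserved |
| matrix | 36 | Video transformation matrix |
| pre-defined | 24 | |
| next track id | 4 | ID to be used by the next track |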
Parsing this box therefore yields the key values such as duration and rate. For example:
[Figure: example of an mvhd box]
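The field layout above can also be turned into a small parser. The sketch below is a hypothetical illustration, not the author's code: `MvhdInfo` and the function names are invented here, the input is assumed to be the mvhd payload starting at the version byte, and the constant 2082844800 is the number of seconds between 1904-01-01 and 1970-01-01 (the Unix epoch).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of an mvhd box for this sketch. */
typedef struct {
    uint8_t  version;
    uint64_t creation_time;      /* seconds since 1904-01-01 00:00:00 UTC */
    uint64_t modification_time;
    uint32_t timescale;          /* time units per second */
    uint64_t duration;           /* in timescale units */
    double   rate;               /* decoded from 16.16 fixed point */
    double   volume;             /* decoded from 8.8 fixed point */
    uint32_t next_track_id;
} MvhdInfo;

uint16_t mvhd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t mvhd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint64_t mvhd_be64(const uint8_t *p)
{
    return ((uint64_t)mvhd_be32(p) << 32) | mvhd_be32(p + 4);
}

/* p points at the version byte (just past the 8-byte box header).
 * Returns 0 on success, -1 on unknown version or short payload. */
int parse_mvhd(const uint8_t *p, size_t len, MvhdInfo *out)
{
    size_t times_len;

    if (len < 4 || p[0] > 1)
        return -1;
    out->version = p[0];
    times_len = (out->version == 1) ? 28 : 16;
    if (len < 4 + times_len + 80)     /* 80 bytes follow the duration */
        return -1;
    p += 4;                           /* skip version + flags */
    if (out->version == 1) {
        out->creation_time     = mvhd_be64(p);
        out->modification_time = mvhd_be64(p + 8);
        out->timescale         = mvhd_be32(p + 16);
        out->duration          = mvhd_be64(p + 20);
    } else {
        out->creation_time     = mvhd_be32(p);
        out->modification_time = mvhd_be32(p + 4);
        out->timescale         = mvhd_be32(p + 8);
        out->duration          = mvhd_be32(p + 12);
    }
    p += times_len;
    out->rate   = (int32_t)mvhd_be32(p) / 65536.0;
    out->volume = (int16_t)mvhd_be16(p + 4) / 256.0;
    /* rate(4) volume(2) reserved(10) matrix(36) pre_defined(24) = 76 */
    out->next_track_id = mvhd_be32(p + 76);
    return 0;
}

/* Duration in seconds = duration / timescale. */
double mvhd_seconds(const MvhdInfo *m)
{
    return m->timescale ? (double)m->duration / m->timescale : 0.0;
}

/* Convert a 1904-based MP4 timestamp to seconds since the Unix epoch. */
int64_t mp4_time_to_unix(uint64_t t)
{
    return (int64_t)t - 2082844800LL;
}
```

With time scale = 600 and duration = 42000, as in the table's video example, `mvhd_seconds` gives 42000 / 600 = 70 seconds.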


2.2.1 Track Header Box
    aligned(8) class TrackHeaderBox
        extends FullBox('tkhd', version, flags){
        if (version==1) {
            unsigned int(64) creation_time;
            unsigned int(64) modification_time;
            unsigned int(32) track_ID;
            const unsigned int(32) reserved = 0;
            unsigned int(64) duration;
        } else { // version==0
            unsigned int(32) creation_time;
            unsigned int(32) modification_time;
            unsigned int(32) track_ID;
            const unsigned int(32) reserved = 0;
            unsigned int(32) duration;
        }
        const unsigned int(32)[2] reserved = 0;
        template int(16) layer = 0;
        template int(16) alternate_group = 0;
        template int(16) volume = {if track_is_audio 0x0100 else 0};
        const unsigned int(16) reserved = 0;
        template int(32)[9] matrix=
            { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
            // unity matrix
        unsigned int(32) width;
        unsigned int(32) height;
    }
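| Field | Size (bytes) | Description |
| box size | 4 | Size of the box |
| box type | 4 | Type of the box |
| version | 1 | Box version, 0 or 1; usually 0 (the sizes in this table are for version 0) |
| flags | 3 | Bitwise OR of flag values; ISO/IEC 14496-12 defines 0x000001 (track_enabled), 0x000002 (track_in_movie) and 0x000004 (track_in_preview) |
| creation time | 4 | Creation time, in seconds since 1904-01-01 00:00:00 UTC |
| modification time | 4 | Modification time |
| track id | 4 | Track ID; must be unique and must not be 0 |
| reserved | 4 | Reserved |
| duration | 4 | Duration of the track |
| reserved | 8 | Reserved |
| layer | 2 | Video layer; default 0, lower values are on top |
| alternate group | 2 | Track grouping information; the default 0 means the track has no group relation with other tracks |
| volume | 2 | [8.8] format; for an audio track, 1.0 (0x0100) means full volume; otherwise 0 |
| reserved | 2 | Reserved |
| matrix | 36 | Video transformation matrix |
| width | 4 | Width |
| height | 4 | Height. Width and height are both [16.16] fixed-point values; they are relative to the actual picture size in the sample description and give the display width and height used at playback |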
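The tkhd fields can be decoded in the same way. The sketch below is again hypothetical (the `TkhdInfo` struct, the `TKHD_*` macro names and the function name are invented for illustration); the flag values themselves are the ones defined in ISO/IEC 14496-12. It takes the payload starting at the version byte and converts the [16.16] width and height into plain numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the 24-bit tkhd flags field (values from ISO/IEC 14496-12;
 * the macro names are this sketch's own). */
#define TKHD_TRACK_ENABLED    0x000001u
#define TKHD_TRACK_IN_MOVIE   0x000002u
#define TKHD_TRACK_IN_PREVIEW 0x000004u

/* Hypothetical decoded view of a tkhd box for this sketch. */
typedef struct {
    uint8_t  version;
    uint32_t flags;      /* 24-bit value */
    uint32_t track_id;
    uint64_t duration;
    int16_t  layer;
    double   volume;     /* decoded from 8.8 fixed point */
    double   width;      /* decoded from 16.16 fixed point */
    double   height;     /* decoded from 16.16 fixed point */
} TkhdInfo;

uint16_t tkhd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t tkhd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint64_t tkhd_be64(const uint8_t *p)
{
    return ((uint64_t)tkhd_be32(p) << 32) | tkhd_be32(p + 4);
}

/* p points at the version byte (just past the 8-byte box header).
 * Returns 0 on success, -1 on unknown version or short payload. */
int parse_tkhd(const uint8_t *p, size_t len, TkhdInfo *out)
{
    size_t times_len;

    if (len < 4 || p[0] > 1)
        return -1;
    out->version = p[0];
    out->flags = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    times_len = (out->version == 1) ? 32 : 20;
    if (len < 4 + times_len + 60)     /* 60 bytes follow the duration */
        return -1;
    p += 4;                           /* skip version + flags */
    if (out->version == 1) {
        out->track_id = tkhd_be32(p + 16);
        out->duration = tkhd_be64(p + 24);
    } else {
        out->track_id = tkhd_be32(p + 8);
        out->duration = tkhd_be32(p + 16);
    }
    p += times_len;
    /* reserved(8) layer(2) alternate_group(2) volume(2) reserved(2)
     * matrix(36) width(4) height(4) */
    out->layer  = (int16_t)tkhd_be16(p + 8);
    out->volume = (int16_t)tkhd_be16(p + 12) / 256.0;
    out->width  = tkhd_be32(p + 52) / 65536.0;
    out->height = tkhd_be32(p + 56) / 65536.0;
    return 0;
}
```

A stored width of 0x07800000, for instance, decodes to 0x0780 = 1920 with a zero fractional part.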
