[Date: 2016-07] [Status: Open]
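MKV is an open-source multimedia container format and one of the more widely used Matroska file types. The common file extensions are .mkv (video, optionally with audio and subtitles), .mka (audio only), .mks (subtitles only), and .mk3d (stereoscopic 3D video, optionally with audio and subtitles).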
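0. Why study a multimedia container format
The goal is to be able to answer the following questions:
- How is the data in the container organized?
- Which codecs can the container carry, and how is that data stored?
- What metadata does the container hold? What program information does it hold?
- For containers that support multiple programs, how do you locate the matching audio, video, and subtitle streams?
- How do you determine the playback duration of a program in the container?
- How do you extract audio, video, and subtitle data from the container and hand it to a decoder? Does the data carry timestamps?
- Does the container support seeking? What auxiliary information does it provide for that?
- Can it be streamed directly?
- Where is the authoritative documentation for the container format?
- Which tools are available for diagnosing anomalies or errors in a container file?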
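1. Overall structure of an MKV file
MKV is built on EBML (Extensible Binary Meta Language), a format for storing binary data that is modeled on XML. So before describing MKV itself, here is a brief look at EBML.
EBML
The full details are in the EBML specifications.
Because EBML is modeled on XML, its elements are heavily nested, much like this: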
<root>
<header value="123"/>
</root>
So how is an EBML document put together?
The basic building block is the EBML Element, and a Document is made up of multiple EBML Elements. An EBML Element is defined as follows:
typedef struct {
vint ID; // EBML-ID
vint size; // size of element
char data[size]; // data
} EBML_ELEMENT;
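The data can be raw binary data or other, nested EBML Elements.
vint (Unsigned Integer Values of Variable Length) is a variable-length unsigned integer, which is more compact than a fixed 32- or 64-bit integer. A vint has three parts: VINT_WIDTH, VINT_MARKER, and VINT_DATA. VINT_MARKER is the first 1 bit in the binary data; VINT_WIDTH is the number of 0 bits before VINT_MARKER (possibly zero), and VINT_WIDTH+1 is the number of bytes the vint occupies; VINT_DATA is made up of the bits that follow the marker. Take these classic bytes from near the start of an MKV file: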
42 82 88 6d 61 74 72 6f 73 6b 61
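These bytes form a complete DocType Element: 0x4282 is the EBML ID, 8 is the element size, and the 8 bytes that follow are the string "matroska".
In binary, 0x42 is 0100 0010. There is one 0 bit before the marker, so the ID vint is 1+1=2 bytes long and the ID is 0x4282. By convention an EBML ID is written with its marker bit included, which matches the [42][82] listed for DocType in the EBML Header table; with the marker stripped, the data bits are 0x0282. Next comes the size vint, 0x88, which is 1000 1000 in binary: with no leading 0 bits it is 0+1=1 byte long, and with the marker removed its value is 8. All that remains is to read the 8-byte string that follows.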
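The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full EBML reader: the function name `read_vint` and its `keep_marker` parameter are made up for this example, and the sketch assumes the vint is at most 8 bytes long.

```python
def read_vint(buf: bytes, pos: int, keep_marker: bool = False):
    """Decode one EBML vint starting at buf[pos].

    Returns (value, length_in_bytes). IDs are conventionally quoted with the
    marker bit kept (keep_marker=True); sizes have the marker stripped.
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("vint wider than 8 bytes is not handled here")
    # VINT_WIDTH is the number of leading zero bits in the first byte.
    width = 8 - first.bit_length()
    length = width + 1
    if pos + length > len(buf):
        raise ValueError("truncated vint")
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        # Clear VINT_MARKER, the highest set bit of the whole field.
        value &= (1 << (7 * length)) - 1
    return value, length


# The DocType bytes quoted in the text.
raw = bytes.fromhex("42 82 88 6d 61 74 72 6f 73 6b 61")
elem_id, id_len = read_vint(raw, 0, keep_marker=True)
size, size_len = read_vint(raw, id_len)
start = id_len + size_len
doc_type = raw[start:start + size].decode("ascii")
print(hex(elem_id), size, doc_type)
```

Keeping the marker for IDs and stripping it for sizes is the only difference between the two calls, which is why a single helper with a flag is enough.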
MKV overview
In overall structure, MKV resembles the AVI, ASF, and MP4 file formats. It consists mainly of the following parts:
Header |
Meta Seek Information |
Segment Information |
Track |
Chapters |
Clusters |
Cueing Data |
Attachment |
Tagging |
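Note that this is only a simplified picture of the file structure. The parts of an MKV file are not necessarily stored in exactly this order, so parsing must follow the specification. The role of each part is summarized below:
- The Header contains the EBML version information and the EBML document type (which identifies the file as Matroska).
- Meta Seek Information contains index information for locating the other parts of the file (such as Track Info, Chapters, Tags, Cues, and Attachments). This part is optional; if it is absent, the same information can be obtained by scanning the rest of the file.
- Segment Information contains basic information about the whole file, such as the title, along with a unique ID. If the file is one of a series of linked files, it also contains the ID of the next file.
- Track contains per-track information: whether a track is audio, video, or subtitles, plus the video resolution, audio sampling rate, codec, and so on.
- Chapters lists all the chapters. Each chapter is a way of presetting a playback position in the audio or video.
- Clusters hold the audio and video frames of every track.
- Cueing Data contains all the Cue information. Cues are an index for each track, similar to Meta Seek Information, but used mainly to seek to a specific time during playback.
- Attachment allows files of any type to be attached to an MKV file, including images, web pages, and programs.
- Tagging contains the tags for the file and for its individual tracks. These tags are similar to the ID3 tags in MP3 files and carry information such as writer, singer, and actor.
The structure above is a high-level overview. EBML elements are actually organized into levels: according to the specification, an element at Level n can directly contain only elements at Level n+1. The top of a Matroska file consists of Level 0 elements, of which there are two main ones: the EBML Header and the Segment.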
2. EBML Header
The EBML Header sits at the very start of an MKV file. It is one of the Level 0 elements and contains the Level 1 elements listed in the table below.
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
EBML Header | |||||||||||||
EBML | 0 | [1A][45][DF][A3] | * | * | - | - | m | * | * | * | * | * | Set the EBML characteristics of the data to follow. Each EBML document has to start with this. |
EBMLVersion | 1 | [42][86] | * | - | - | 1 | u | * | * | * | * | * | The version of EBML parser used to create the file. |
EBMLReadVersion | 1 | [42][F7] | * | - | - | 1 | u | * | * | * | * | * | The minimum EBML version a parser has to support to read this file. |
EBMLMaxIDLength | 1 | [42][F2] | * | - | - | 4 | u | * | * | * | * | * | The maximum length of the IDs you'll find in this file (4 or less in Matroska). |
EBMLMaxSizeLength | 1 | [42][F3] | * | - | - | 8 | u | * | * | * | * | * | The maximum length of the sizes you'll find in this file (8 or less in Matroska). This does not override the element size indicated at the beginning of an element. Elements that have an indicated size which is larger than what is allowed by EBMLMaxSizeLength shall be considered invalid. |
DocType | 1 | [42][82] | * | - | - | matroska | s | * | * | * | * | * | A string that describes the type of document that follows this EBML header. 'matroska' in our case or 'webm' for webm files. |
DocTypeVersion | 1 | [42][87] | * | - | - | 1 | u | * | * | * | * | * | The version of DocType interpreter used to create the file. |
DocTypeReadVersion | 1 | [42][85] | * | - | - | 1 | u | * | * | * | * | * | The minimum DocType version an interpreter has to support to read this file. |
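The columns of the table have the following meanings:
- Element Name: the name of the element being described.
- L: the Level at which the element appears in EBML. A + means the element can be nested recursively; g means a global element that can appear at any level.
- EBML ID: the bytes of the element ID.
- Ma: the mandatory flag. A * in this column means the element is mandatory; the specification abbreviates it as »mand.«.
- Mu: the multiplicity flag. A * in this column means the element can appear more than once; the specification abbreviates it as »mult.«.
- Rng: the valid range of the stored value, normally given for integer or floating-point types.
- Default: the default value of the element's payload.
- T: the data type of the element. The values are m: Master (variable length, may contain one or more other elements), u: unsigned integer, i: signed integer, s: string, 8: UTF-8 string, b: binary, f: float, d: date.
- 1: the element is part of Matroska version 1.
- 2: the element is part of Matroska version 2.
- 3: the element is part of Matroska version 3.
- 4: the element is part of Matroska version 4.
- W: the element is used in WebM.
- Description: a short description of what the element does.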
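EBML Element IDs are themselves vints with fixed, well-known values, so an element can be recognized directly from its leading bytes, such as the 0x1A45DFA3 above. The EBML Header ID marks a file as an EBML document, but on its own it does not distinguish MKV from other EBML-based formats such as WebM.
For that reason, use DocType when parsing the EBML Header to determine the actual container format; for a regular MKV file this field must be "matroska".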
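As a rough sketch of that check, the following Python reads the EBML Header and returns its DocType. The helper names (`read_vint`, `iter_elements`, `read_doctype`) and the hand-built sample header are illustrative assumptions, not part of any library; the sketch also assumes every element has a known size.

```python
EBML_HEADER_ID = 0x1A45DFA3
DOCTYPE_ID = 0x4282


def read_vint(buf, pos, keep_marker=False):
    """Decode one EBML vint at buf[pos]; return (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint wider than 8 bytes is not handled here")
    length = 8 - first.bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def iter_elements(buf, start, end):
    """Yield (element_id, payload) for each element in buf[start:end]."""
    pos = start
    while pos < end:
        elem_id, n = read_vint(buf, pos, keep_marker=True)
        size, m = read_vint(buf, pos + n)
        data_start = pos + n + m
        yield elem_id, buf[data_start:data_start + size]
        pos = data_start + size


def read_doctype(buf):
    """Return the DocType string of an EBML document, or raise ValueError."""
    elem_id, n = read_vint(buf, 0, keep_marker=True)
    if elem_id != EBML_HEADER_ID:
        raise ValueError("not an EBML document")
    size, m = read_vint(buf, n)
    start = n + m
    for child_id, payload in iter_elements(buf, start, start + size):
        if child_id == DOCTYPE_ID:
            return payload.decode("ascii")
    return "matroska"  # default value given in the table


# Hand-built sample header: EBMLVersion=1, DocType, DocTypeVersion=4,
# DocTypeReadVersion=2. The values are made up for illustration.
children = (bytes.fromhex("42 86 81 01") + bytes.fromhex("42 82 88") + b"matroska"
            + bytes.fromhex("42 87 81 04") + bytes.fromhex("42 85 81 02"))
header = bytes.fromhex("1A 45 DF A3") + bytes([0x80 | len(children)]) + children
print(read_doctype(header))
```

A real demuxer would also check EBMLReadVersion and DocTypeReadVersion against what it supports before going further.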
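3. Segment
Apart from the EBML Header, everything else in an MKV file belongs to the Segment, which holds both the audio/video data and the information describing it. The Segment ID is [18][53][80][67]. The Segment sits at Level 0 and contains all of the Level 1 elements, which are described in turn below.
Meta Seek Info
Meta Seek Info is a fast lookup index. It is optional, and when present there is usually only one; if it is absent, the information has to be rebuilt by scanning the file sequentially. Meta Seek Info consists of a SeekHead holding one or more Seek entries, and each Seek entry holds two elements: SeekID and SeekPosition. The specification defines the format as follows: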
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SeekHead | 1 | [11][4D][9B][74] | - | * | - | - | m | * | * | * | * | * | Contains the position of other Top-Level Elements. |
Seek | 2 | [4D][BB] | * | * | - | - | m | * | * | * | * | * | Contains a single seek entry to an EBML Element. |
SeekID | 3 | [53][AB] | * | - | - | - | b | * | * | * | * | * | The binary ID corresponding to the Element name. |
SeekPosition | 3 | [53][AC] | * | - | - | - | u | * | * | * | * | * | The position of the Element in the Segment in octets (0 = first level 1 Element). |
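Note that SeekID holds the ID of a Level 1 element, and the position is an offset relative to the start of the Segment's data (offset 0 is the first Level 1 element). For example, the following bytes are the start of a SeekHead, up to the end of its first Seek entry: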
11 4d 9b 74 bb 4d bb 8b 53 ab 84 15 49 a9 66 53 ac 81 40
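Parsing these bytes gives SeekID=0x1549a966 and SeekPosition=0x40. The Matroska specification shows that this ID belongs to Segment Info, and adding the offset to the start position of the Segment's data gives exactly where the Segment Info section is stored.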
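The parse just described can be reproduced with a short Python sketch. The helper names and the `segment_data_start` value are hypothetical, chosen only for this example; the input is the byte string quoted above, which stops at the end of the first Seek entry.

```python
SEEK_ID = 0x53AB        # SeekID element
SEEK_POSITION = 0x53AC  # SeekPosition element


def read_vint(buf, pos, keep_marker=False):
    """Decode one EBML vint at buf[pos]; return (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint wider than 8 bytes is not handled here")
    length = 8 - first.bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def parse_first_seek(buf):
    """Parse the SeekHead header and its first Seek entry."""
    head_id, n = read_vint(buf, 0, keep_marker=True)
    head_size, m = read_vint(buf, n)
    pos = n + m
    seek_elem_id, n = read_vint(buf, pos, keep_marker=True)
    seek_size, m = read_vint(buf, pos + n)
    pos += n + m
    end = pos + seek_size
    entry = {}
    while pos < end:
        child_id, n = read_vint(buf, pos, keep_marker=True)
        child_size, m = read_vint(buf, pos + n)
        payload = buf[pos + n + m:pos + n + m + child_size]
        entry[child_id] = int.from_bytes(payload, "big")
        pos += n + m + child_size
    return head_id, head_size, seek_elem_id, entry


raw = bytes.fromhex("11 4d 9b 74 bb 4d bb 8b 53 ab 84 15 49 a9 66 53 ac 81 40")
head_id, head_size, seek_elem_id, entry = parse_first_seek(raw)

# Hypothetical file offset of the first byte after the Segment ID and size.
segment_data_start = 0x34
target_offset = segment_data_start + entry[SEEK_POSITION]
print(hex(entry[SEEK_ID]), hex(entry[SEEK_POSITION]), hex(target_offset))
```

The SeekHead size decodes to 59 bytes, so in the real file more Seek entries follow the one shown here.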
Segment Info
Segment Info contains the information used to identify the file (SegmentUID) as well as the Duration field. The specification defines its structure as follows:
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Info | 1 | [15][49][A9][66] | * | * | - | - | m | * | * | * | * | * | Contains miscellaneous general information and statistics on the file. |
SegmentUID | 2 | [73][A4] | - | - | not 0 | - | b | * | * | * | * | A randomly generated unique ID to identify the current segment between many others (128 bits). | |
SegmentFilename | 2 | [73][84] | - | - | - | - | 8 | * | * | * | * | A filename corresponding to this segment. | |
PrevUID | 2 | [3C][B9][23] | - | - | - | - | b | * | * | * | * | A unique ID to identify the previous chained segment (128 bits). | |
PrevFilename | 2 | [3C][83][AB] | - | - | - | - | 8 | * | * | * | * | An escaped filename corresponding to the previous segment. | |
NextUID | 2 | [3E][B9][23] | - | - | - | - | b | * | * | * | * | A unique ID to identify the next chained segment (128 bits). | |
NextFilename | 2 | [3E][83][BB] | - | - | - | - | 8 | * | * | * | * | An escaped filename corresponding to the next segment. | |
SegmentFamily | 2 | [44][44] | - | * | - | - | b | * | * | * | * | A randomly generated unique ID that all segments related to each other must use (128 bits). | |
ChapterTranslate | 2 | [69][24] | - | * | - | - | m | * | * | * | * | A tuple of corresponding ID used by chapter codecs to represent this segment. | |
ChapterTranslateEditionUID | 3 | [69][FC] | - | * | - | - | u | * | * | * | * | Specify an edition UID on which this correspondence applies. When not specified, it means for all editions found in the segment. | |
ChapterTranslateCodec | 3 | [69][BF] | * | - | - | - | u | * | * | * | * | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu). | |
ChapterTranslateID | 3 | [69][A5] | * | - | - | - | b | * | * | * | * | The binary value used to represent this segment in the chapter codec data. The format depends on the ChapProcessCodecID used. | |
TimecodeScale | 2 | [2A][D7][B1] | * | - | - | 1000000 | u | * | * | * | * | * | Timestamp scale in nanoseconds (1.000.000 means all timestamps in the Segment are expressed in milliseconds). |
Duration | 2 | [44][89] | - | - | > 0 | - | f | * | * | * | * | * | Duration of the segment (based on TimecodeScale). |
DateUTC | 2 | [44][61] | - | - | - | - | d | * | * | * | * | * | Date of the origin of timecode (value 0), i.e. production date. |
Title | 2 | [7B][A9] | - | - | - | - | 8 | * | * | * | * | General name of the segment. | |
MuxingApp | 2 | [4D][80] | * | - | - | - | 8 | * | * | * | * | * | Muxing application or library ("libmatroska-0.4.3"). |
WritingApp | 2 | [57][41] | * | - | - | - | 8 | * | * | * | * | * | Writing application ("mkvmerge-0.3.3"). |
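Track
Track holds the basic information about the audio and video streams, such as the codec type, video resolution, and audio sampling rate. Parsing the Track section yields this information, which is what a player needs to select the right decoders and initialize them. Tracks contains at least one TrackEntry, and each TrackEntry describes one track; a TrackEntry contains fields such as Name, TrackNumber, and TrackType. The relevant fields defined by the specification are: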
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tracks | 1 | [16][54][AE][6B] | - | * | - | - | m | * | * | * | * | * | A Top-Level Element of information with many tracks described. |
TrackEntry | 2 | [AE] | * | * | - | - | m | * | * | * | * | * | Describes a track with all Elements. |
TrackNumber | 3 | [D7] | * | - | not 0 | - | u | * | * | * | * | * | The track number as used in the Block Header (using more than 127 tracks is not encouraged, though the design allows an unlimited number). |
TrackUID | 3 | [73][C5] | * | - | not 0 | - | u | * | * | * | * | * | A unique ID to identify the Track. This should be kept the same when making a direct stream copy of the Track to another file. |
TrackType | 3 | [83] | * | - | 1-254 | - | u | * | * | * | * | * | A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control). |
FlagEnabled | 3 | [B9] | * | - | 0-1 | 1 | u | * | * | * | * | Set if the track is usable. (1 bit) | |
FlagDefault | 3 | [88] | * | - | 0-1 | 1 | u | * | * | * | * | * | Set if that track (audio, video or subs) SHOULD be active if no language found matches the user preference. (1 bit) |
FlagForced | 3 | [55][AA] | * | - | 0-1 | 0 | u | * | * | * | * | * | Set if that track MUST be active during playback. There can be many forced track for a kind (audio, video or subs), the player should select the one which language matches the user preference or the default + forced track. Overlay MAY happen between a forced and non-forced track of the same kind. (1 bit) |
FlagLacing | 3 | [9C] | * | - | 0-1 | 1 | u | * | * | * | * | * | Set if the track may contain blocks using lacing. (1 bit) |
MinCache | 3 | [6D][E7] | * | - | - | 0 | u | * | * | * | * | The minimum number of frames a player should be able to cache during playback. If set to 0, the reference pseudo-cache system is not used. | |
MaxCache | 3 | [6D][F8] | - | - | - | - | u | * | * | * | * | The maximum cache size required to store referenced frames in and the current frame. 0 means no cache is needed. | |
DefaultDuration | 3 | [23][E3][83] | - | - | not 0 | - | u | * | * | * | * | * | Number of nanoseconds (not scaled via TimecodeScale) per frame ('frame' in the Matroska sense -- one Element put into a (Simple)Block). |
DefaultDecodedFieldDuration | 3 | [23][4E][7A] | - | - | not 0 | - | u | * | The period in nanoseconds (not scaled by TimecodeScale) between two successive fields at the output of the decoding process (see the notes) | ||||
TrackTimecodeScale | 3 | [23][31][4F] | * | - | > 0 | 1.0 | f | * | * | * | DEPRECATED, DO NOT USE. The scale to apply on this track to work at normal speed in relation with other tracks (mostly used to adjust video speed when the audio length differs). | ||
TrackOffset | 3 | [53][7F] | - | - | - | 0 | i | A value to add to the Block's Timestamp. This can be used to adjust the playback offset of a track. | |||||
MaxBlockAdditionID | 3 | [55][EE] | * | - | - | 0 | u | * | * | * | * | The maximum value of BlockAddID. A value 0 means there is no BlockAdditions for this track. | |
Name | 3 | [53][6E] | - | - | - | - | 8 | * | * | * | * | * | A human-readable track name. |
Language | 3 | [22][B5][9C] | - | - | - | eng | s | * | * | * | * | * | Specifies the language of the track in the Matroska languages form. |
CodecID | 3 | [86] | * | - | - | - | s | * | * | * | * | * | An ID corresponding to the codec, see the codec page for more info. |
CodecPrivate | 3 | [63][A2] | - | - | - | - | b | * | * | * | * | * | Private data only known to the codec. |
CodecName | 3 | [25][86][88] | - | - | - | - | 8 | * | * | * | * | * | A human-readable string specifying the codec. |
AttachmentLink | 3 | [74][46] | - | - | not 0 | - | u | * | * | * | * | The UID of an attachment that is used by this codec. | |
CodecDecodeAll | 3 | [AA] | * | - | 0-1 | 1 | u | * | * | * | The codec can decode potentially damaged data (1 bit). | ||
TrackOverlay | 3 | [6F][AB] | - | * | - | - | u | * | * | * | * | Specify that this track is an overlay track for the Track specified (in the u-integer). That means when this track has a gap (see SilentTracks) the overlay track should be used instead. The order of multiple TrackOverlay matters, the first one is the one that should be used. If not found it should be the second, etc. | |
CodecDelay | 3 | [56][AA] | - | - | - | 0 | u | * | |||||
SeekPreRoll | 3 | [56][BB] | * | - | - | 0 | u | * | |||||
TrackTranslate | 3 | [66][24] | - | * | - | - | m | * | * | * | * | The track identification for the given Chapter Codec. | |
TrackTranslateEditionUID | 4 | [66][FC] | - | * | - | - | u | * | * | * | * | Specify an edition UID on which this translation applies. When not specified, it means for all editions found in the Segment. | |
TrackTranslateCodec | 4 | [66][BF] | * | - | - | - | u | * | * | * | * | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu). | |
TrackTranslateTrackID | 4 | [66][A5] | * | - | - | - | b | * | * | * | * | The binary value used to represent this track in the chapter codec data. The format depends on the ChapProcessCodecID used. | |
Video | 3 | [E0] | - | - | - | - | m | * | * | * | * | * | Video settings. |
FlagInterlaced | 4 | [9A] | * | - | 0-2 | 0 | u | * | * | * | * | A flag to declare is the video is known to be progressive or interlaced and if applicable to declare details about the interlacement. (0: undetermined, 1: interlaced, 2: progressive) | |
FieldOrder | 4 | [9D] | * | - | 0-14 | 2 | u | * | Declare the field ordering of the video. If FlagInterlaced is not set to 1, this Element MUST be ignored. (0: Progressive, 1: Interlaced with top field display first and top field stored first, 2: Undetermined field order, 6: Interlaced with bottom field displayed first and bottom field stored first, 9: Interlaced with bottom field displayed first and top field stored first, 14: Interlaced with top field displayed first and bottom field stored first) | ||||
StereoMode | 4 | [53][B8] | - | - | - | 0 | u | * | * | * | Stereo-3D video mode (0: mono, 1: side by side (left eye is first), 2: top-bottom (right eye is first), 3: top-bottom (left eye is first), 4: checkboard (right is first), 5: checkboard (left is first), 6: row interleaved (right is first), 7: row interleaved (left is first), 8: column interleaved (right is first), 9: column interleaved (left is first), 10: anaglyph (cyan/red), 11: side by side (right eye is first), 12: anaglyph (green/magenta), 13 both eyes laced in one Block (left eye is first), 14 both eyes laced in one Block (right eye is first)) . There are some more details on 3D support in the Specification Notes. | ||
AlphaMode | 4 | [53][C0] | - | - | - | 0 | u | * | * | * | Alpha Video Mode. Presence of this Element indicates that the BlockAdditional Element could contain Alpha data. | ||
PixelWidth | 4 | [B0] | * | - | not 0 | - | u | * | * | * | * | * | Width of the encoded video frames in pixels. |
PixelHeight | 4 | [BA] | * | - | not 0 | - | u | * | * | * | * | * | Height of the encoded video frames in pixels. |
PixelCropBottom | 4 | [54][AA] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove at the bottom of the image (for HDTV content). |
PixelCropTop | 4 | [54][BB] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove at the top of the image. |
PixelCropLeft | 4 | [54][CC] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove on the left of the image. |
PixelCropRight | 4 | [54][DD] | - | - | - | 0 | u | * | * | * | * | * | The number of video pixels to remove on the right of the image. |
DisplayWidth | 4 | [54][B0] | - | - | not 0 | PixelWidth - PixelCropLeft - Pi | u | * | * | * | * | * | Width of the video frames to display. Applies to the video frame after cropping (PixelCrop* Elements). The default value is only valid when DisplayUnit is 0. |
DisplayHeight | 4 | [54][BA] | - | - | not 0 | PixelHeight - PixelCropTop - Pi | u | * | * | * | * | * | Height of the video frames to display. Applies to the video frame after cropping (PixelCrop* Elements). The default value is only valid when DisplayUnit is 0. |
DisplayUnit | 4 | [54][B2] | - | - | - | 0 | u | * | * | * | * | * | How DisplayWidth & DisplayHeight should be interpreted (0: pixels, 1: centimeters, 2: inches, 3: Display Aspect Ratio). |
AspectRatioType | 4 | [54][B3] | - | - | - | 0 | u | * | * | * | * | * | Specify the possible modifications to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed). |
ColourSpace | 4 | [2E][B5][24] | - | - | - | - | b | * | * | * | * | Same value as in AVI (32 bits). | |
Audio | 3 | [E1] | - | - | - | - | m | * | * | * | * | * | Audio settings. |
SamplingFrequency | 4 | [B5] | * | - | > 0 | 8000.0 | f | * | * | * | * | * | Sampling frequency in Hz. |
OutputSamplingFrequency | 4 | [78][B5] | - | - | > 0 | SamplingFrequency | f | * | * | * | * | * | Real output sampling frequency in Hz (used for SBR techniques). |
Channels | 4 | [9F] | * | - | not 0 | 1 | u | * | * | * | * | * | Numbers of channels in the track. |
BitDepth | 4 | [62][64] | - | - | not 0 | - | u | * | * | * | * | * | Bits per sample, mostly used for PCM. |
ContentEncodings | 3 | [6D][80] | - | - | - | - | m | * | * | * | * | Settings for several content encoding mechanisms like compression or encryption. | |
ContentEncoding | 4 | [62][40] | * | * | - | - | m | * | * | * | * | Settings for one content encoding like compression or encryption. | |
ContentEncodingOrder | 5 | [50][31] | * | - | - | 0 | u | * | * | * | * | Tells when this modification was used during encoding/muxing starting with 0 and counting upwards. The decoder/demuxer has to start with the highest order number it finds and work its way down. This value has to be unique over all ContentEncodingOrder Elements in the Segment. | |
ContentEncodingScope | 5 | [50][32] | * | - | not 0 | 1 | u | * | * | * | * | A bit field that describes which Elements have been modified in this way. Values (big endian) can be OR'ed. Possible values:1 - all frame contents, 2 - the track's private data, 4 - the next ContentEncoding (next ContentEncodingOrder. Either the data inside ContentCompression and/or ContentEncryption) | |
ContentEncodingType | 5 | [50][33] | * | - | - | 0 | u | * | * | * | * | A value describing what kind of transformation has been done. Possible values: 0 - compression, 1 - encryption | |
ContentCompression | 5 | [50][34] | - | - | - | - | m | * | * | * | * | Settings describing the compression used. Must be present if the value of ContentEncodingType is 0 and absent otherwise. Each block must be decompressable even if no previous block is available in order not to prevent seeking. | |
ContentEncryption | 5 | [50][35] | - | - | - | - | m | * | * | * | * | Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise. |
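
To show how the table translates into parsing code, here is a minimal Python sketch that walks one TrackEntry payload and picks out a few fields by their IDs from the table (TrackNumber [D7], TrackType [83], CodecID [86], Video [E0], PixelWidth [B0], PixelHeight [BA]). The helper names, the `elem` builder, and the sample track are assumptions made up for this example; real files carry many more fields.

```python
TRACK_TYPES = {1: "video", 2: "audio", 0x11: "subtitle"}


def read_vint(buf, pos, keep_marker=False):
    """Decode one EBML vint at buf[pos]; return (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint wider than 8 bytes is not handled here")
    length = 8 - first.bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def iter_elements(buf):
    """Yield (element_id, payload) for each element in buf."""
    pos = 0
    while pos < len(buf):
        elem_id, n = read_vint(buf, pos, keep_marker=True)
        size, m = read_vint(buf, pos + n)
        start = pos + n + m
        yield elem_id, buf[start:start + size]
        pos = start + size


def parse_track_entry(payload):
    """Extract a few fields from the payload of one TrackEntry."""
    info = {}
    for elem_id, data in iter_elements(payload):
        if elem_id == 0xD7:    # TrackNumber
            info["number"] = int.from_bytes(data, "big")
        elif elem_id == 0x83:  # TrackType
            info["type"] = TRACK_TYPES.get(int.from_bytes(data, "big"), "other")
        elif elem_id == 0x86:  # CodecID
            info["codec"] = data.decode("ascii")
        elif elem_id == 0xE0:  # Video (master element, parse its children)
            for vid, vdata in iter_elements(data):
                if vid == 0xB0:
                    info["width"] = int.from_bytes(vdata, "big")
                elif vid == 0xBA:
                    info["height"] = int.from_bytes(vdata, "big")
    return info


def elem(id_hex, payload):
    """Sample-data helper: element with a 1-byte size (payload < 127 bytes)."""
    return bytes.fromhex(id_hex) + bytes([0x80 | len(payload)]) + payload


# Made-up 1920x1080 H.264 video track.
video = elem("B0", (1920).to_bytes(2, "big")) + elem("BA", (1080).to_bytes(2, "big"))
entry = (elem("D7", b"\x01") + elem("83", b"\x01")
         + elem("86", b"V_MPEG4/ISO/AVC") + elem("E0", video))
info = parse_track_entry(entry)
print(info)
```

Unknown IDs are simply skipped, which is what makes EBML forward compatible: a parser only needs the size to step over an element it does not understand.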
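Chapters
Chapters work somewhat like a table of contents for the media file, marking sections such as the opening, the ending, and the lead-in. If this part interests you, see the Chapter Specifications. The Chapters ID is [10][43][A7][70].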
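Clusters
The Clusters part holds all the audio and video data and is made up of multiple Cluster elements.
A Cluster can contain multiple BlockGroups (or SimpleBlocks). Each BlockGroup holds a single Block together with information specific to that Block, such as ReferenceBlock. Audio and video data can be interleaved across Blocks, but any one Block must carry data of only one kind: audio, video, or subtitles.
The specification defines this part as follows: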
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cluster | 1 | [1F][43][B6][75] | - | * | - | - | m | * | * | * | * | * | The Top-Level Element containing the (monolithic) Block structure. |
Timecode | 2 | [E7] | * | - | - | - | u | * | * | * | * | * | Absolute timestamp of the cluster (based on TimecodeScale). |
SilentTracks | 2 | [58][54] | - | - | - | - | m | * | * | * | * | The list of tracks that are not used in that part of the stream. It is useful when using overlay tracks on seeking. Then you should decide what track to use. | |
SilentTrackNumber | 3 | [58][D7] | - | * | - | - | u | * | * | * | * | One of the track number that are not used from now on in the stream. It could change later if not specified as silent in a further Cluster. | |
Position | 2 | [A7] | - | - | - | - | u | * | * | * | * | The Position of the Cluster in the Segment (0 in live broadcast streams). It might help to resynchronise offset on damaged streams. | |
PrevSize | 2 | [AB] | - | - | - | - | u | * | * | * | * | * | Size of the previous Cluster, in octets. Can be useful for backward playing. |
SimpleBlock | 2 | [A3] | - | * | - | - | b | * | * | * | * | Similar to Block but without all the extra information, mostly used to reduced overhead when no extra feature is needed. (see SimpleBlock Structure) | |
BlockGroup | 2 | [A0] | - | * | - | - | m | * | * | * | * | * | Basic container of information containing a single Block and information specific to that Block. |
Block | 3 | [A1] | * | - | - | - | b | * | * | * | * | * | Block containing the actual data to be rendered and a timestamp relative to the Cluster Timecode. (see Block Structure) |
BlockAdditions | 3 | [75][A1] | - | - | - | - | m | * | * | * | * | Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data. | |
BlockMore | 4 | [A6] | * | * | - | - | m | * | * | * | * | Contain the BlockAdditional and some parameters. | |
BlockAddID | 5 | [EE] | * | - | not 0 | 1 | u | * | * | * | * | An ID to identify the BlockAdditional level. | |
BlockAdditional | 5 | [A5] | * | - | - | - | b | * | * | * | * | Interpreted by the codec as it wishes (using the BlockAddID). | |
BlockDuration | 3 | [9B] | - | - | - | DefaultDuration | u | * | * | * | * | * | The duration of the Block (based on TimecodeScale). This Element is mandatory when DefaultDuration is set for the track (but can be omitted as other default values). When not written and with no DefaultDuration, the value is assumed to be the difference between the timestamp of this Block and the timestamp of the next Block in "display" order (not coding order). This Element can be useful at the end of a Track (as there is not other Block available), or when there is a break in a track like for subtitle tracks. When set to 0 that means the frame is not a keyframe. |
ReferencePriority | 3 | [FA] | * | - | - | 0 | u | * | * | * | * | This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced. | |
ReferenceBlock | 3 | [FB] | - | * | - | - | i | * | * | * | * | * | Timestamp of another frame used as a reference (ie: B or P frame). The timestamp is relative to the block it's attached to. |
Slices | 3 | [8E] | - | - | - | - | m | * | * | * | * | * | Contains slices description. |
TimeSlice | 4 | [E8] | - | * | - | - | m | * | * | * | * | * | Contains extra time information about the data contained in the Block. While there are a few files in the wild with this Element, it is no longer in use and has been deprecated. Being able to interpret this Element is not required for playback. |
LaceNumber | 5 | [CC] | - | - | - | 0 | u | * | * | * | * | * | The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this Element, it is no longer in use and has been deprecated. Being able to interpret this Element is not required for playback. |
For the details of parsing this data, consult the specification or compare against the output of an MKV analysis tool.
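As one concrete case, the sketch below decodes the header of a SimpleBlock payload and derives an absolute timestamp. The byte layout used here (track number as a vint, then a signed 16-bit timestamp relative to the Cluster, then one flags byte) comes from the Block Structure notes of the Matroska specification that the table refers to, not from this article; the function name and sample bytes are made up for illustration.

```python
import struct


def read_vint(buf, pos, keep_marker=False):
    """Decode one EBML vint at buf[pos]; return (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint wider than 8 bytes is not handled here")
    length = 8 - first.bit_length() + 1
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, length


def parse_simple_block(payload, cluster_timecode, timecode_scale=1_000_000):
    """Decode the SimpleBlock header and compute its timestamp in nanoseconds."""
    track, n = read_vint(payload, 0)
    # Signed 16-bit big-endian relative timestamp, then one flags byte.
    rel_time, flags = struct.unpack_from(">hB", payload, n)
    return {
        "track": track,
        "timestamp_ns": (cluster_timecode + rel_time) * timecode_scale,
        "keyframe": bool(flags & 0x80),
        "lacing": (flags >> 1) & 0x03,
        "frame_data": payload[n + 3:],
    }


# Made-up block: track 1, 40 ticks after the Cluster Timecode, keyframe.
payload = bytes.fromhex("81 00 28 80") + b"\x00\x01"
block = parse_simple_block(payload, cluster_timecode=1000)
print(block)
```

With the default TimecodeScale of 1000000, one tick is one millisecond, so this block lands at 1.04 seconds.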
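Cueing Data
Cueing Data is essentially an index table of keyframes. Without such a table, seeking and fast forward or rewind are very difficult, because the player has to search packet by packet. As mentioned earlier, the official FLV format does not define a keyframe index table, although the community has added one unofficially. Matroska, by contrast, officially specifies an index table: Cueing Data. The specification defines it as follows: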
Element Name | L | EBML ID | Ma | Mu | Rng | Default | T | 1 | 2 | 3 | 4 | W | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cues | 1 | [1C][53][BB][6B] | - | - | - | - | m | * | * | * | * | * | A Top-Level Element to speed seeking access. All entries are local to the Segment. Should be mandatory for non "live" streams. |
CuePoint | 2 | [BB] | * | * | - | - | m | * | * | * | * | * | Contains all information relative to a seek point in the Segment. |
CueTime | 3 | [B3] | * | - | - | - | u | * | * | * | * | * | Absolute timestamp according to the Segment time base. |
CueTrackPositions | 3 | [B7] | * | * | - | - | m | * | * | * | * | * | Contain positions for different tracks corresponding to the timestamp. |
CueTrack | 4 | [F7] | * | - | not 0 | - | u | * | * | * | * | * | The track for which a position is given. |
CueClusterPosition | 4 | [F1] | * | - | - | - | u | * | * | * | * | * | The position of the Cluster containing the required Block. |
CueRelativePosition | 4 | [F0] | - | - | - | - | u | * | The relative position of the referenced block inside the cluster with 0 being the first possible position for an Element inside that cluster. | ||||
CueDuration | 4 | [B2] | - | - | - | - | u | * | The duration of the block according to the Segment time base. If missing the track's DefaultDuration does not apply and no duration information is available in terms of the cues. | ||||
CueBlockNumber | 4 | [53][78] | - | - | not 0 | 1 | u | * | * | * | * | * | Number of the Block in the specified Cluster. |
CueCodecState | 4 | [EA] | - | - | - | 0 | u | * | * | * | The position of the Codec State corresponding to this Cue Element. 0 means that the data is taken from the initial Track Entry. | ||
CueReference | 4 | [DB] | - | * | - | - | m | * | * | * | The Clusters containing the required referenced Blocks. | ||
CueRefTime | 5 | [96] | * | - | - | - | u | * | * | * | Timestamp of the referenced Block. |
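
Once the CuePoints have been parsed, seeking reduces to finding the last cue at or before the target time. The sketch below assumes the cues for one track are already available as a sorted list of (CueTime, CueClusterPosition) pairs; the function name, the sample values, and `segment_data_start` are hypothetical.

```python
from bisect import bisect_right


def find_cue(cues, target_time):
    """Return the (cue_time, cluster_position) pair to start decoding from.

    cues must be sorted by cue_time. Picks the last cue whose time is not
    after target_time, falling back to the first cue.
    """
    if not cues:
        raise ValueError("no cues available, fall back to a linear scan")
    times = [t for t, _ in cues]
    index = bisect_right(times, target_time) - 1
    return cues[max(index, 0)]


# Made-up cues, times in TimecodeScale units (milliseconds by default).
cues = [(0, 4096), (5000, 812345), (10000, 1650210)]
segment_data_start = 52  # hypothetical offset of the Segment's data

cue_time, cluster_pos = find_cue(cues, 7300)
file_offset = segment_data_start + cluster_pos
print(cue_time, file_offset)
```

After jumping to the Cluster, a player still has to decode forward from the keyframe at `cue_time` until it reaches the requested position.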
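As for the last two parts, Attachment and Tagging, see their descriptions in the specification. They hold a great deal of metadata-related information and can also carry many kinds of user-defined information.
4. Overview of the remaining questions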
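- For containers that support multiple programs, how do you locate the matching audio, video, and subtitle streams?
  In the Track part of an MKV file, each TrackEntry is an independent audio, video, or subtitle stream. From these you can tell which media formats the container holds.
- How do you determine the playback duration of a program in the container?
  Segment Info has a Duration field, expressed in TimecodeScale units, from which the duration can be read directly.
- Does the MKV container support seeking? What auxiliary information does it provide?
  MKV keeps its index table in the Cues part, and the keyframe index there makes fast seeking possible.
- Where is the authoritative documentation for the container format?
  Matroska is open source; visit https://www.matroska.org/ directly, or consult the specification documents published there.
- Which tools are available for diagnosing anomalies or errors in a container file?
  The most commonly used tool is mkvtoolnix. Many others are listed under Matroska-download; choose according to your needs.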
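For the duration question, the arithmetic is worth spelling out: Duration is a float counted in TimecodeScale units, and TimecodeScale is in nanoseconds, so seconds = Duration × TimecodeScale / 10^9. A small Python sketch follows, with function names and sample values chosen for illustration.

```python
def duration_seconds(duration, timecode_scale=1_000_000):
    """Convert the Segment Info Duration field to seconds.

    duration is in TimecodeScale units; timecode_scale is in nanoseconds
    (the default 1000000 makes one unit equal to one millisecond).
    """
    return duration * timecode_scale / 1_000_000_000


def format_hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up values: Duration = 5400000.0 with the default TimecodeScale.
seconds = duration_seconds(5_400_000.0)
print(seconds, format_hms(seconds))
```

Because Duration is optional in the table, a player should be prepared to fall back to the last Cluster or Cue timestamp when the field is missing.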
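5. Summary and references
MKV is a relatively complex container format, but once you understand the basic principles and read the specification with them in mind, the overall design is quite clear, and a file can be parsed in a single sequential pass.
Main references: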