Kafka: File Storage Format
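Kafka stores messages and their offsets in files. The on-disk data format is the same as the format of the messages sent by producers and delivered to consumers. Because the same message format is used for disk storage and for network transfer, Kafka can use zero-copy to send messages to consumers, and it avoids decompressing and recompressing messages that the producer has already compressed.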
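The zero-copy idea can be illustrated with a minimal Python sketch. It is not how the broker is implemented (Kafka runs on the JVM); it only shows a byte range of a file being handed to a socket without being rebuilt in application code. The function names and the 124-byte "batches" are assumptions made for the demo.

```python
import os
import socket
import tempfile


def send_segment_slice(sock, path, position, size):
    """Send `size` bytes of the file at `path`, starting at byte `position`.

    socket.sendfile() uses os.sendfile() where the platform supports it, so
    the bytes are not copied through user space; elsewhere it falls back to
    ordinary send() calls.
    """
    with open(path, "rb") as segment:
        return sock.sendfile(segment, offset=position, count=size)


def demo():
    # Hypothetical stand-in for a log segment: two 124-byte "batches".
    data = b"A" * 124 + b"B" * 124
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)

    broker_side, consumer_side = socket.socketpair()
    try:
        # Ship only the second batch, exactly as it sits on disk.
        sent = send_segment_slice(broker_side, tmp.name, position=124, size=124)
        broker_side.close()  # signals end of stream to the reader

        received = b""
        while True:
            chunk = consumer_side.recv(4096)
            if not chunk:
                break
            received += chunk
    finally:
        consumer_side.close()
        broker_side.close()
        os.remove(tmp.name)
    return sent, received
```

Because the stored bytes are already in the format the consumer expects, the sender only needs a file position and a length.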
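In addition to the key, value, and offset, each message carries its size, a checksum, the message format version number (the magic byte), the compression codec (Snappy, GZip, or LZ4), and a timestamp. The timestamp is either the time the producer sent the message or the time the message arrived at the broker; this is configurable.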
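The timestamp choice can be sketched as a small function. The topic-level setting that controls it is `message.timestamp.type`, with the values `CreateTime` and `LogAppendTime`; the function name and its parameters below are hypothetical and only model the decision, not the broker's code.

```python
def effective_timestamp(create_time_ms, append_time_ms, timestamp_type="CreateTime"):
    """Return the timestamp that would be stored with a message.

    create_time_ms: when the producer sent the message (illustrative input)
    append_time_ms: when the broker received the message (illustrative input)
    timestamp_type: mirrors the values of message.timestamp.type
    """
    if timestamp_type == "CreateTime":
        return create_time_ms
    if timestamp_type == "LogAppendTime":
        return append_time_ms
    raise ValueError(f"unknown timestamp type: {timestamp_type}")
```

For example, `effective_timestamp(1593354030017, 1593354030020, "LogAppendTime")` returns the broker-side value.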
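If the producer sends compressed messages, all the messages in one batch are compressed together and sent as a "wrapper message". The broker receives this single message and later sends it to consumers as is. After decompressing it, the consumer sees every message in the batch, each with its own timestamp and offset.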
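With compression enabled on the producer, larger batches compress better, both over the network and on disk. Sharing one format between disk and network also has a cost: if the message format used by consumers changes, the network format and the on-disk format must change with it, and the broker must know how to handle files that contain messages in two formats.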
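A quick way to see why batching helps is to compress the same messages one by one and then as a single block. The sketch below uses Python's `zlib` (the DEFLATE algorithm behind GZip) on made-up messages; it illustrates the effect and is not Kafka's producer code.

```python
import zlib


def compressed_sizes(messages):
    """Return (total size when each message is compressed alone,
    size when all messages are compressed together)."""
    individually = sum(len(zlib.compress(m)) for m in messages)
    together = len(zlib.compress(b"".join(messages)))
    return individually, together


# Hypothetical batch of 100 similar messages.
batch = [f"key=Banana{i} value=我是測試分段".encode("utf-8") for i in range(100)]
individually, together = compressed_sizes(batch)
print(individually, together)
```

Compressed together, the repeated substrings are encoded once and referenced afterwards, and the per-stream overhead is paid only once, so the second number is much smaller than the first.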
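Kafka ships with a tool called DumpLogSegments for inspecting the contents of a log segment. It shows each message's offset, checksum, magic byte, size, and compression codec. Run it as follows: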
./kafka-run-class.sh kafka.tools.DumpLogSegments
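--deep-iteration: shows the messages that were compressed into a wrapper message.
--files: specifies the partition segment file to inspect.
--print-data-log: prints the message contents in detail.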
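When the tool is called from a script, it can help to assemble the command line in one place. The helper below is a hypothetical convenience wrapper; it uses only the class name and the three options described above, and it assumes the script is run from Kafka's bin directory.

```python
def dump_log_command(segment_path, print_data=False, deep_iteration=False,
                     script="./kafka-run-class.sh"):
    """Build the argument list for running DumpLogSegments on one segment file."""
    cmd = [script, "kafka.tools.DumpLogSegments", "--files", segment_path]
    if print_data:
        cmd.append("--print-data-log")
    if deep_iteration:
        cmd.append("--deep-iteration")
    return cmd


cmd = dump_log_command(
    "/tmp/kafka-logs/test_partition1-0/00000000000000000000.log",
    print_data=True,
)
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the log directory contains spaces.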
FengZhendeMacBook-Pro:bin FengZhen$ ./kafka-run-class.sh kafka.tools.DumpLogSegments --print-data-log --files /tmp/kafka-logs/test_partition1-0/00000000000000000000.log
Dumping /tmp/kafka-logs/test_partition1-0/00000000000000000000.log
Starting offset: 0
baseOffset: 0 lastOffset: 2 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 0 CreateTime: 1593354030017 size: 124 magic: 2 compresscodec: NONE crc: 286907760 isvalid: true
| offset: 0 CreateTime: 1593354030017 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana2 payload: 我是2
| offset: 1 CreateTime: 1593354030017 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana4 payload: 我是4
| offset: 2 CreateTime: 1593354030017 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana7 payload: 我是7
baseOffset: 3 lastOffset: 5 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 124 CreateTime: 1593354171689 size: 124 magic: 2 compresscodec: NONE crc: 1670561994 isvalid: true
| offset: 3 CreateTime: 1593354171688 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana2 payload: 我是2
| offset: 4 CreateTime: 1593354171689 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana4 payload: 我是4
| offset: 5 CreateTime: 1593354171689 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana7 payload: 我是7
baseOffset: 6 lastOffset: 8 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 248 CreateTime: 1593354604994 size: 124 magic: 2 compresscodec: NONE crc: 4132128811 isvalid: true
| offset: 6 CreateTime: 1593354604993 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana2 payload: 我是2
| offset: 7 CreateTime: 1593354604994 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana4 payload: 我是4
| offset: 8 CreateTime: 1593354604994 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana7 payload: 我是7
baseOffset: 9 lastOffset: 11 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 372 CreateTime: 1593355312853 size: 124 magic: 2 compresscodec: NONE crc: 4126984011 isvalid: true
| offset: 9 CreateTime: 1593355312852 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana2 payload: 我是2
| offset: 10 CreateTime: 1593355312853 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana4 payload: 我是4
| offset: 11 CreateTime: 1593355312853 keysize: 7 valuesize: 7 sequence: -1 headerKeys: [] key: Banana7 payload: 我是7
baseOffset: 12 lastOffset: 14 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 496 CreateTime: 1593355487628 size: 1588 magic: 2 compresscodec: NONE crc: 2183807263 isvalid: true
| offset: 12 CreateTime: 1593355487628 keysize: 7 valuesize: 493 sequence: -1 headerKeys: [] key: Banana2 payload: 【我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段】2
| offset: 13 CreateTime: 1593355487628 keysize: 7 valuesize: 493 sequence: -1 headerKeys: [] key: Banana4 payload: 【我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段】4
| offset: 14 CreateTime: 1593355487628 keysize: 7 valuesize: 493 sequence: -1 headerKeys: [] key: Banana7 payload: 【我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段我是測試分段】7
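Two things in this output are worth checking by hand: each batch starts at the byte position where the previous one ends (position + size), and keysize and valuesize are counted in bytes, not characters. The Python sketch below parses two of the batch header lines shown above to confirm this; the parser is an illustrative helper written for this output layout, not part of Kafka.

```python
import re

FIELD = re.compile(r"(\w+): (\S+)")
INTEGER = re.compile(r"-?\d+")


def parse_batch_header(line):
    """Turn the 'name: value' pairs of a batch header line into a dict,
    converting integer values to int."""
    fields = {}
    for name, value in FIELD.findall(line):
        fields[name] = int(value) if INTEGER.fullmatch(value) else value
    return fields


def is_contiguous(batches):
    """True if every batch begins where the previous one ends."""
    return all(a["position"] + a["size"] == b["position"]
               for a, b in zip(batches, batches[1:]))


# The first two batch header lines, copied from the dump.
HEADERS = [
    "baseOffset: 0 lastOffset: 2 count: 3 baseSequence: -1 lastSequence: -1 "
    "producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false "
    "isControl: false position: 0 CreateTime: 1593354030017 size: 124 magic: 2 "
    "compresscodec: NONE crc: 286907760 isvalid: true",
    "baseOffset: 3 lastOffset: 5 count: 3 baseSequence: -1 lastSequence: -1 "
    "producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false "
    "isControl: false position: 124 CreateTime: 1593354171689 size: 124 magic: 2 "
    "compresscodec: NONE crc: 1670561994 isvalid: true",
]

batches = [parse_batch_header(h) for h in HEADERS]
print(is_contiguous(batches))          # positions 0 and 124, sizes 124 each
print(len("我是2".encode("utf-8")))     # byte length of the first payload
```

The payload 我是2 is three characters but seven bytes in UTF-8 (3 + 3 + 1), which matches the valuesize of 7 reported for it.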