利用Flink消費Kafka數據保證全局有序


Kafka 是現在大數據中流行的消息中間件,其中 kafka 中由 topic 組成,而 topic 下又可以由多個 partition 構成。有時候我們在消費 kafka 中的數據想要保證消費 kafka 中的所有的分區下數據是全局有序的,這種情況下就需要將 topic 下的 partition 的數量設置為一個這樣才會保證全局有序,但是這種情況消費數據並沒有多並發,也就影響效率。

在 Flink 中則可以即保證消費 kafka 中的數據全局有序,又可以構成多並發,這就是 flink 中的時間特性帶來的效果。Flink 在創建 kafka 的數據源時可以將其中的所有數據都存有時間並設置對應的 watermark,這樣利用 event time 對 kafka 中的數據已經形成了時間概念上的全局有序性,當 flink 在消費其中的數據時則根據時間處理即可保證 kafka 中數據的全局有序性。

下面一部分內容來至於 flink 的官網,鏈接:

https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_timestamps_watermarks.html

Timestamps per Kafka Partition

When using Apache Kafka as a data source, each Kafka partition may have a simple event time pattern (ascending timestamps or bounded out-of-orderness). However, when consuming streams from Kafka, multiple partitions often get consumed in parallel, interleaving the events from the partitions and destroying the per-partition patterns (this is inherent in how Kafka’s consumer clients work).

In that case, you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.

For example, if event timestamps are strictly ascending per Kafka partition, generating per-partition watermarks with the ascending timestamps watermark generator will result in perfect overall watermarks.

The illustrations below show how to use the per-Kafka-partition watermark generation, and how watermarks propagate through the streaming dataflow in that case.

FlinkKafkaConsumer09<MyType> kafkaSource = new FlinkKafkaConsumer09<>("myTopic", schema, props);
kafkaSource.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyType>() {

    @Override
    public long extractAscendingTimestamp(MyType element) {
        return element.eventTimestamp();
    }
});
DataStream<MyType> stream = env.addSource(kafkaSource);

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM