kafka的JavaAPI操作

本文轉載自查看原文 2021-12-23 13:01 1069

一、創建maven工程並添加jar包
創建maven工程並添加以下依賴jar包的坐標到pom.xml

<dependencies>

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>1.0.0</version>
</dependency>

</dependencies>

<build>
<plugins>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>

二、生產者代碼
1、使用生產者，生產數據
/**
* 訂單的生產者代碼，
*/
public class OrderProducer {
public static void main(String[] args) throws InterruptedException {
/* 1、連接集群，通過配置文件的方式
* 2、發送數據-topic:order，value
*/
Properties props = new Properties(); props.put("bootstrap.servers", "node01:9092"); props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer",
"org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
"org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>
(props);
for (int i = 0; i < 1000; i++) {
// 發送數據 ,需要一個producerRecord對象,最少參數 String topic, V value kafkaProducer.send(new ProducerRecord<String, String>("order", "訂單信
息！"+i));
Thread.sleep(100);
}
}
}

2、kafka當中的數據分區
kafka生產者發送的消息，都是保存在broker當中，我們可以自定義分區規則，決定消息發送到哪個partition里面去進行保存
查看ProducerRecord這個類的源碼，就可以看到kafka的各種不同分區策略。
kafka當中支持以下四種數據的分區方式：

第一種分區策略，如果既沒有指定分區號，也沒有指定數據key，那么就會使用輪詢的方式將數據均勻的發送到不同的分區里面去
//ProducerRecord<String, String> producerRecord1 = new ProducerRecord<>("mypartition", "mymessage" + i);
//kafkaProducer.send(producerRecord1);

第二種分區策略如果沒有指定分區號，指定了數據key，通過key.hashCode % numPartitions來計算數據究竟會保存在哪一個分區里面
//注意：如果數據key，沒有變化 key.hashCode % numPartitions = 固定值所有的數據都會寫入到某一個分區里面去
//ProducerRecord<String, String> producerRecord2 = new ProducerRecord<>("mypartition", "mykey", "mymessage" + i);
//kafkaProducer.send(producerRecord2);

第三種分區策略：如果指定了分區號，那么就會將數據直接寫入到對應的分區里面去
// ProducerRecord<String, String> producerRecord3 = new ProducerRecord<>("mypartition", 0, "mykey", "mymessage" + i);
// kafkaProducer.send(producerRecord3);

第四種分區策略：自定義分區策略。如果不自定義分區規則，那么會將數據使用輪詢的方式均勻的發送到各個分區里面去
kafkaProducer.send(new ProducerRecord<String, String>("mypartition","mymessage"+i));

自定義分區策略

public class KafkaCustomPartitioner implements Partitioner {
  @Override
public void configure(Map<String, ?> configs) {
}

  @Override
public int partition(String topic, Object arg1, byte[] keyBytes, Object arg3, byte[] arg4, Cluster cluster) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
int partitionNum = partitions.size();
Random random = new Random();
int partition = random.nextInt(partitionNum);
return partition;
}

  @Override
public void close() {
}
}
主代碼中添加配置

@Test
public void kafkaProducer() throws Exception {
//1、准備配置文件
Properties props = new Properties();
props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("partitioner.class", "cn.itcast.kafka.partitioner.KafkaCustomPartitioner");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
//2、創建KafkaProducer
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);
for (int i=0;i<100;i++){
//3、發送數據
kafkaProducer.send(new ProducerRecord<String, String>("testpart","0","value"+i));
}

kafkaProducer.close();
}

三、消費者代碼
消費必要條件
消費者要從kafka Cluster進行消費數據，必要條件有以下四個

#1、地址
bootstrap.servers=node01:9092
#2、序列化
key.serializer=org.apache.kafka.common.serialization.StringSerializer value.serializer=org.apache.kafka.common.serialization.StringSerializer
#3、主題（topic）需要制定具體的某個topic（order）即可。
#4、消費者組 group.id=test
1、自動提交offset
消費完成之后，自動提交offset

/**
* 消費訂單數據--- javaben.tojson
*/
public class OrderConsumer {
public static void main(String[] args) {
// 1\連接集群
Properties props = new Properties(); props.put("bootstrap.servers", "hadoop-01:9092"); props.put("group.id", "test");
//以下兩行代碼 ---消費者自動提交offset值
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>
(props);
// 2、發送數據發送數據需要，訂閱下要消費的topic。 order kafkaConsumer.subscribe(Arrays.asList("order"));
while (true) {
ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);// jdk queue offer插入、poll獲取元素。 blockingqueue put插入原生， take獲取元素
for (ConsumerRecord<String, String> record : consumerRecords) { System.out.println("消費的數據為：" + record.value());
}
}
}
}
2、手動提交offset
如果Consumer在獲取數據后，大數據培訓需要加入處理，數據完畢后才確認offset，需要程序來控制offset的確認？關閉自動提交確認選項

props.put("enable.auto.commit", "false");
手動提交o?set值
kafkaConsumer.commitSync();
完整代碼如下所示：
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
//關閉自動提交確認選項
props.put("enable.auto.commit", "false");
props.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); consumer.subscribe(Arrays.asList("test"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
buffer.add(record);
}
if (buffer.size() >= minBatchSize) {
insertIntoDb(buffer);
// 手動提交offset值
consumer.commitSync();
buffer.clear();
}
}

3、消費完每個分區之后手動提交offset
上面的示例使用commitSync將所有已接收的記錄標記為已提交。在某些情況下，您可能希望通過明確指定偏移量來更好地控制已提交的記錄。在下面的示例中，在完成處理每個分區中的記錄后提交偏移量。

try {
while(running) {
ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
for (TopicPartition partition : records.partitions()) {
List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
for (ConsumerRecord<String, String> record : partitionRecords) { System.out.println(record.offset() + ": " + record.value());
}
long lastOffset = partitionRecords.get(partitionRecords.size() -1).offset();
consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
}
}
}
finally { consumer.close();
注意事項：
提交的偏移量應始終是應用程序將讀取的下一條消息的偏移量。因此，在調用commitSync（偏移量）時，應該在最后處理的消息的偏移量中添加一個

4、指定分區數據進行消費
1、如果進程正在維護與該分區關聯的某種本地狀態（如本地磁盤上的鍵值存儲），那么它應該只獲取它在磁盤上維護的分區的記錄。
2、如果進程本身具有高可用性，並且如果失敗則將重新啟動（可能使用YARN，Mesos或AWS工具等集群管理框架，或作為流處理框架的一部分）。在這種情況下，Kafka不需要檢測故障並重新分配分區，因為消耗過程將在另一台機器上重新啟動。

Properties props = new Properties(); props.put("bootstrap.servers", "localhost:9092"); props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
//consumer.subscribe(Arrays.asList("foo", "bar"));
//手動指定消費指定分區的數據---start
String topic = "foo";
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1); consumer.assign(Arrays.asList(partition0, partition1));
//手動指定消費指定分區的數據---end
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
注意事項：
1、要使用此模式，您只需使用要使用的分區的完整列表調用assign（Collection），而不是使用subscribe訂閱主題。
2、主題與分區訂閱只能二選一

5、重復消費與數據丟失
已經消費的數據對於kafka來說，會將消費組里面的oﬀset值進行修改，那什么時候進行修改了？是在數據消費完成之后，比如在控制台打印完后自動提交；
提交過程：是通過kafka將oﬀset進行移動到下個message所處的oﬀset的位置。
拿到數據后，存儲到hbase中或者mysql中，如果hbase或者mysql在這個時候連接不上，就會拋出異常，如果在處理數據的時候已經進行了提交，那么kafka傷的oﬀset值已經進行了修改了，但是hbase或者mysql中沒有數據，這個時候就會出現數據丟失。
什么時候提交oﬀset值？在Consumer將數據處理完成之后，再來進行oﬀset的修改提交。默認情況下oﬀset是自動提交，需要修改為手動提交oﬀset值。
如果在處理代碼中正常處理了，但是在提交oﬀset請求的時候，沒有連接到kafka或者出現了故障，那么該次修改oﬀset的請求是失敗的，那么下次在進行讀取同一個分區中的數據時，會從已經處理掉的oﬀset值再進行處理一次，那么在hbase中或者mysql中就會產生兩條一樣的數據，也就是數據重復
6、consumer消費者消費數據流程

流程描述
Consumer連接指定的Topic partition所在leader broker，采用pull方式從kafkalogs中獲取消息。對於不同的消費模式，會將offset保存在不同的地方
官網關於high level API 以及low level API的簡介
http://kafka.apache.org/0100/documentation.html#impl_consumer

高階API（High Level API）
kafka消費者高階API簡單；隱藏Consumer與Broker細節；相關信息保存在zookeeper中。

/* create a connection to the cluster */
ConsumerConnector connector = Consumer.create(consumerConfig);
interface ConsumerConnector {
/**
This method is used to get a list of KafkaStreams, which are iterators over
MessageAndMetadata objects from which you can obtain messages and their
associated metadata (currently only topic).
Input: a map of <topic, #streams>
Output: a map of <topic, list of message streams>
*/
public Map<String,List<KafkaStream>> createMessageStreams(Map<String,Int> topicCountMap);
/**
You can also obtain a list of KafkaStreams, that iterate over messages
from topics that match a TopicFilter. (A TopicFilter encapsulates a
whitelist or a blacklist which is a standard Java regex.)
*/
public List<KafkaStream> createMessageStreamsByFilter( TopicFilter topicFilter, int numStreams);
/* Commit the offsets of all messages consumed so far. */ public commitOffsets()
/* Shut down the connector */ public shutdown()
}
說明：大部分的操作都已經封裝好了，比如：當前消費到哪個位置下了，但是不夠靈活（工作過程推薦使用）

低級API(Low Level API)
kafka消費者低級API非常靈活；需要自己負責維護連接Controller Broker。保存offset，Consumer Partition對應關系。

class SimpleConsumer {
/* Send fetch request to a broker and get back a set of messages. */
public ByteBufferMessageSet fetch(FetchRequest request);
/* Send a list of fetch requests to a broker and get back a response set. */ public MultiFetchResponse multifetch(List<FetchRequest> fetches);
/**
Get a list of valid offsets (up to maxSize) before the given time.
The result is a list of offsets, in descending order.
@param time: time in millisecs,
if set to OffsetRequest$.MODULE$.LATEST_TIME(), get from the latest
offset available. if set to OffsetRequest$.MODULE$.EARLIEST_TIME(), get from the earliest
available. public long[] getOffsetsBefore(String topic, int partition, long time, int maxNumOffsets);
* offset
*/
說明：沒有進行包裝，所有的操作有用戶決定，如自己的保存某一個分區下的記錄，你當前消費到哪個位置。

四、kafka Streams API開發
需求：使用StreamAPI獲取test這個topic當中的數據，然后將數據全部轉為大寫，寫入到test2這個topic當中去

第一步：創建一個topic
node01服務器使用以下命令來常見一個topic 名稱為test2
cd /export/servers/kafka_2.11-1.0.0/
bin/kafka-topics.sh --create --partitions 3 --replication-factor 2 --topic test2 --zookeeper node01:2181,node02:2181,node03:2181

第二步：開發StreamAPI
public class StreamAPI {
public static void main(String[] args) {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "node01:9092");
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
KStreamBuilder builder = new KStreamBuilder();
builder.stream("test").mapValues(line -> line.toString().toUpperCase()).to("test2");
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
}
}
第三步：生產數據
node01執行以下命令，向test這個topic當中生產數據
cd /export/servers/kafka_2.11-1.0.0
bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic test

第四步：消費數據
node02執行一下命令消費test2這個topic當中的數據
cd /export/servers/kafka_2.11-1.0.0
bin/kafka-console-consumer.sh --from-beginning --topic test2 --zookeeper node01:2181,node02:2181,node03:2181

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 【Kafka】JavaAPI操作 ES javaAPI操作 Kafka-JavaAPI(Producer And Consumer) 使用javaAPI操作hdfs javaAPI操作Hbase hadoop的hdfs中的javaAPI操作 javaAPI操作ES分組聚合十三、 ELASTICSEARCH JAVAAPI-查詢操作 hbase javaAPI創建，刪除，查詢表操作 Hbase調用JavaAPI實現批量導入操作