Straight to the code:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// From the Spark examples package; quiets the noisy default INFO logging
StreamingExamples.setStreamingLogLevels()
val Array(brokers, topics) = args
// Create the conf. Spark Streaming needs at least two threads:
// one to receive data and one to process it
val conf = new SparkConf().setMaster("local[4]").setAppName("NetworkWordCount")
// Create the StreamingContext; a new batch is produced every 2 seconds
val ssc = new StreamingContext(conf, Seconds(2))
val topicsSet = topics.split(",").toSet
// Kafka parameters. The 0-10 integration requires bootstrap.servers
// (not the legacy metadata.broker.list), explicit deserializers and a group id;
// auto-commit is disabled so offsets can be committed manually (see below)
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "wordcount-group",
  "enable.auto.commit" -> (false: java.lang.Boolean))
// Read from Kafka with the direct approach; offsets are tracked in Kafka itself
val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  // Location strategy: PreferConsistent distributes partitions evenly across
  // executors (use PreferBrokers when executors run on the Kafka broker hosts)
  LocationStrategies.PreferConsistent,
  // Consumer strategy: Subscribe takes a fixed set of topic names;
  // use SubscribePattern to match topics by regex (e.g. topic-*)
  ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))
// ==================== business logic goes below ====================
val lines = messages.map(_.value())
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
wordCounts.print()
// ==================== business logic goes above ====================
ssc.start()
ssc.awaitTermination()
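
Since the direct approach keeps offsets in Kafka itself, here is a minimal sketch of reading them back and committing them manually, using the 0-10 API's HasOffsetRanges / CanCommitOffsets pattern (it would have to be registered before ssc.start(); the println format is just for illustration):

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

messages.foreachRDD { rdd =>
  // RDDs from the direct stream carry the Kafka offset range of each partition
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    println(s"${o.topic} partition ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
  }
  // Commit back to Kafka once this batch has been processed
  messages.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}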
Packaging the job fails with:
Error:(44, 49) overloaded method value createDirectStream with alternatives:
(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String],perPartitionConfig: org.apache.spark.streaming.kafka010.PerPartitionConfig)org.apache.spark.streaming.api.java.JavaInputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String])org.apache.spark.streaming.api.java.JavaInputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(ssc: org.apache.spark.streaming.StreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String],perPartitionConfig: org.apache.spark.streaming.kafka010.PerPartitionConfig)org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(ssc: org.apache.spark.streaming.StreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String])org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]]
cannot be applied to (org.apache.spark.streaming.StreamingContext, org.apache.spark.streaming.kafka010.LocationStrategy, org.apache.spark.streaming.kafka010.ConsumerStrategy[Nothing,Nothing])
val messages = KafkaUtils.createDirectStream[String, String](
This long message is saying that topics needs to be a Set[String], not a Set[Char].
The best way I can see to resolve this is:
val topicsSet = topics.toString.split(",").toSet
But if you really only have a single topic, just wrap it as Set(topics); calling .toSet directly on the string splits it into a set of individual characters, which is exactly the Set[Char] the compiler is rejecting.
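
To make the type difference concrete, a quick sketch (the topic names are made up):

val topics = "topicA,topicB"
topics.split(",").toSet // Set[String] = Set(topicA, topicB) -- what Subscribe expects
Set(topics)             // Set[String] = Set(topicA,topicB)  -- fine for a single topic
topics.toSet            // Set[Char]: a String is an Iterable[Char], so .toSet yields
                        // Set(t, o, p, i, c, A, ',', B) -- presumably what left the
                        // compiler inferring ConsumerStrategy[Nothing,Nothing] above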
