大部分的 DataStream API 的算子的输出是单一输出,也就是某种数据类型的流。除了 split 算子,可以将一条流分成多条流,这些流的数据类型也都相同。processfunction 的 side outputs 功能可以产生多条流,并且这些流的数据类型可以不一样。一个 sideoutput 可以定义为 OutputTag[X]对象,X 是输出流的数据类型。processfunction 可以通过 Context 对象发射一个事件到一个或者多个 side outputs。
下面的代码演示了低于32F的温度信息进入到测输出流"freezing alert"中。
object SideOutputTest {
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val socketStream = env.socketTextStream("hadoop102", 7777)
val dataStream: DataStream[SensorReading] = socketStream.map(d => {
val arr = d.split(",")
SensorReading(arr(0).trim, arr(1).trim.toLong, arr(2).toDouble)
})
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
override def extractTimestamp(t: SensorReading): Long = t.timestamp * 1000
})
//低温报警处理
val processStream = dataStream.process(new FreezingAlert)
//打印主输出流
processStream.print("process stream")
//打印侧输出流。先得到某个测输出流。
processStream.getSideOutput(new OutputTag[String]("freezing alert")).print("freezing alert")
env.execute("window test")
}
}
class FreezingAlert extends ProcessFunction[SensorReading, SensorReading] {
lazy val tag = new OutputTag[String]("freezing alert")
override def processElement(value: SensorReading,
ctx: ProcessFunction[SensorReading, SensorReading]#Context,
collector: Collector[SensorReading]): Unit = {
if (value.temperature<32){
//侧输出流
ctx.output(tag,"freezing alert for " + value.temperature)
}else{
//主输出流
collector.collect(value)
}
}
}
端口数据
[atguigu@hadoop102 ~]$ nc -lk 7777 sensor_1, 1547718200, 30 sensor_1, 1547718200, 25 sensor_1, 1547718200, 35
控制台打印
freezing alert> freezing alert for 30.0 freezing alert> freezing alert for 25.0 process stream> SensorReading(sensor_1,1547718200,35.0)
