1. Model Level
### 1. DataStream API
Use a Data Source:
    environment.fromSource(
        Source<OUT, ?, ?> source,
        WatermarkStrategy<OUT> timestampsAndWatermarks,
        String sourceName)
or the legacy StreamExecutionEnvironment.addSource(sourceFunction).
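A minimal sketch of the fromSource path, assuming Flink 1.12+ (NumberSequenceSource is a built-in bounded source in flink-core; the job and source names are illustrative):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FromSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // New unified Source API: source + watermark strategy + source name
        DataStream<Long> numbers = env.fromSource(
                new NumberSequenceSource(1, 100),    // bounded built-in source
                WatermarkStrategy.noWatermarks(),    // no event-time watermarks needed here
                "number-sequence-source");

        numbers.print();
        env.execute("fromSource example");
    }
}
```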
### 2. DataSet API
DataSet Transformations
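A minimal sketch of a few DataSet transformations (the classic word count, assuming the flink-java DataSet API of this Flink generation):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class DataSetWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> text = env.fromElements("to be", "or not to be");

        DataSet<Tuple2<String, Integer>> counts = text
                // split each line into (word, 1) pairs
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : line.split("\\s+")) {
                        out.collect(Tuple2.of(word, 1));
                    }
                })
                // lambdas lose generic type info, so declare the result type explicitly
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .groupBy(0)   // group by the word field
                .sum(1);      // sum the 1s per word

        counts.print();
    }
}
```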
### 3. Table API & SQL
Dependencies for developing with Java:
    flink-table-common
    flink-table-api-java-bridge
    flink-table-planner-blink
    flink-table-runtime-blink
Imports:
    org.apache.flink.table.api.Table
    org.apache.flink.table.api.bridge.java.StreamTableEnvironment
    org.apache.flink.table.api.bridge.java.BatchTableEnvironment
Flink 1.11 introduced new table source and sink interfaces (DynamicTableSource and DynamicTableSink) in
    org.apache.flink.table.connector.source
    org.apache.flink.table.connector.sink
These interfaces unify batch jobs and streaming jobs.
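A minimal sketch of bootstrapping the Table API on these dependencies with the Blink planner (the table contents and field names are illustrative):

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableApiBootstrap {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Blink planner in streaming mode (the default from Flink 1.11 on)
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        DataStream<Tuple2<String, Integer>> orders =
                env.fromElements(Tuple2.of("beer", 12), Tuple2.of("diaper", 5));

        // Bridge a DataStream into the Table API and query it
        Table table = tableEnv.fromDataStream(orders, $("product"), $("amount"));
        Table result = table.filter($("amount").isGreater(10));

        tableEnv.toAppendStream(result, Row.class).print();
        env.execute("table-api example");
    }
}
```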
2. Data Types
Supported Data Types
Type handling
Creating a TypeInformation or TypeSerializer
Data Types in the Table API:
    org.apache.flink.table.types.DataType is used within the Table API
    and when defining connectors, catalogs, or user-defined functions.
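A brief sketch contrasting the two type systems, assuming Flink 1.11+ (the field names are illustrative):

```java
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class TypeExamples {
    public static void main(String[] args) {
        // DataStream/DataSet side: TypeInformation, created via a TypeHint
        TypeInformation<Tuple2<String, Integer>> typeInfo =
                TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {});

        // Table API side: DataType, used by connectors, catalogs, and UDFs
        DataType rowType = DataTypes.ROW(
                DataTypes.FIELD("name", DataTypes.STRING()),
                DataTypes.FIELD("count", DataTypes.INT()));

        System.out.println(typeInfo);
        System.out.println(rowType);
    }
}
```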
3. Connectors
From the data perspective, there are three kinds of connectors:
    DataStream Connectors
    DataSet Connectors
    Table & SQL Connectors
Roles:
01. DataStream Connectors
    Predefined Sources and Sinks
    Bundled Connectors
    Connectors in Apache Bahir
    Other Ways to Connect to Flink:
        Data Enrichment via Async I/O
        Queryable State
02. DataSet Connectors
    file systems
    other systems, via Input/OutputFormat wrappers for Hadoop
03. Table & SQL Connectors: register table sources and table sinks
    Flink's table connectors
    User-defined Sources & Sinks: develop a custom, user-defined connector
    A connector passes through three stages: Metadata, Planning, Runtime
Implementation:
    Dynamic Table Source
    Dynamic Table Sink
    Dynamic Table Factories
    Encoding / Decoding Formats
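A sketch of the factory piece, assuming Flink 1.11+; the class name and connector identifier are hypothetical, and the actual source construction is omitted:

```java
import java.util.Collections;
import java.util.Set;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.factories.DynamicTableSourceFactory;

// Hypothetical factory skeleton; discovered via Java SPI
// (META-INF/services/org.apache.flink.table.factories.Factory).
public class MySourceFactory implements DynamicTableSourceFactory {

    @Override
    public String factoryIdentifier() {
        return "my-connector"; // matches WITH ('connector' = 'my-connector') in DDL
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return Collections.emptySet(); // declare mandatory options here
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return Collections.emptySet(); // declare optional options here
    }

    @Override
    public DynamicTableSource createDynamicTableSource(Context context) {
        // A real implementation would validate options and build a ScanTableSource here
        throw new UnsupportedOperationException("source construction omitted in this sketch");
    }
}
```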
Predefined Sources and Sinks
1. Pre-defined source connectors
For custom Sources and SourceOperators, flink-core provides:
org.apache.flink.api.connector.source.SourceSplit
org.apache.flink.api.connector.source.SourceReader
org.apache.flink.api.connector.source.SplitEnumerator
org.apache.flink.api.connector.source.event.NoMoreSplitsEvent
These help when implementing a new custom data source or when studying how Flink's data sources work.
Sources and sinks are often summarized under the term connector.
4. Refactor Source Interface (FLIP-27)
1. Data Source API (the new Source abstraction provided by Flink)
01. A Data Source has three core components:
    Splits, the SplitEnumerator, and the SourceReader.
In the bounded/batch case,
    the enumerator generates a fixed set of splits, and each split is necessarily finite.
    Once all splits have been read, the enumerator responds with NoMoreSplits: a finite set of splits, each of which is bounded.
In the unbounded streaming case,
    one of the two does not hold (splits are not finite, or the enumerator keeps generating new splits).
For example:
    Bounded File Source
    Unbounded Streaming File Source:
        the SplitEnumerator never responds with NoMoreSplits, and it periodically scans for new content.
02. The Source API is a factory-style interface for creating the following components:
    Split Serializer
    Split Enumerator
    Enumerator Checkpoint Serializer
    Source Reader, which consumes records from the splits
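A compilable skeleton of this factory role, assuming the Flink 1.12+ Source interface; MySource, MySplit, and MyEnumState are hypothetical names, and the component bodies are omitted:

```java
import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

// Hypothetical source skeleton showing the factory methods the Source interface demands.
public class MySource implements Source<String, MySource.MySplit, MySource.MyEnumState> {

    public static class MySplit implements SourceSplit {
        @Override
        public String splitId() { return "split-0"; }
    }

    public static class MyEnumState { /* enumerator checkpoint state */ }

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED; // or CONTINUOUS_UNBOUNDED for a streaming source
    }

    @Override
    public SourceReader<String, MySplit> createReader(SourceReaderContext context) {
        throw new UnsupportedOperationException("reader creation omitted in this sketch");
    }

    @Override
    public SplitEnumerator<MySplit, MyEnumState> createEnumerator(
            SplitEnumeratorContext<MySplit> context) {
        throw new UnsupportedOperationException("enumerator creation omitted in this sketch");
    }

    @Override
    public SplitEnumerator<MySplit, MyEnumState> restoreEnumerator(
            SplitEnumeratorContext<MySplit> context, MyEnumState checkpoint) {
        throw new UnsupportedOperationException("restore omitted in this sketch");
    }

    @Override
    public SimpleVersionedSerializer<MySplit> getSplitSerializer() {
        throw new UnsupportedOperationException("split serializer omitted in this sketch");
    }

    @Override
    public SimpleVersionedSerializer<MyEnumState> getEnumeratorCheckpointSerializer() {
        throw new UnsupportedOperationException("checkpoint serializer omitted in this sketch");
    }
}
```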
03. The SplitReader is the high-level API for simple synchronous reading/polling-based source implementations;
    SourceReaderBase and SplitFetcherManager are the support classes for building such readers.
Event Time and Watermarks at the source: do not use the legacy assigner functions anymore,
since the new data sources already assign timestamps and watermarks via the WatermarkStrategy passed to fromSource.
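A minimal sketch of such a WatermarkStrategy (MyEvent and its timestamp field are hypothetical):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class WatermarkExample {
    // Hypothetical event type carrying an event-time field
    public static class MyEvent {
        public long timestampMillis;
    }

    public static void main(String[] args) {
        // Tolerate up to 5 seconds of out-of-orderness and extract the event timestamp;
        // this strategy is handed to env.fromSource(...) instead of a legacy assigner.
        WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
                .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, recordTimestamp) -> event.timestampMillis);

        System.out.println(strategy);
    }
}
```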
2. Data Source Function
01. Predefined Sources and Sinks
    (built into Flink and directly usable, typically for debugging and validation; no external dependencies needed)
    Pre-implemented source functions:
    File-based
    Socket-based
    Collection-based
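A short sketch of these built-in sources (host, port, and file path are placeholders):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PredefinedSources {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Collection-based: handy for tests and debugging
        DataStream<String> fromCollection = env.fromElements("a", "b", "c");

        // Socket-based: reads newline-delimited text from a socket (placeholder host/port)
        DataStream<String> fromSocket = env.socketTextStream("localhost", 9999);

        // File-based: reads a text file line by line (placeholder path)
        DataStream<String> fromFile = env.readTextFile("file:///tmp/input.txt");

        fromCollection.print();
        env.execute("predefined sources");
    }
}
```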
02. Connectors provide code for interfacing with various third-party systems.
001. Flink already ships with some bundled connectors (the relevant connector classes must be packaged into the job JAR), e.g.:
    public abstract class KafkaDynamicSinkBase implements DynamicTableSink
    public interface ScanTableSource extends DynamicTableSource
    org.apache.flink.table.connector.sink.DynamicTableSink
    org.apache.flink.table.connector.source.DynamicTableSource
002. Connectors in Apache Bahir
03. Flink provides an Async I/O API as another way to connect external systems, typically used to access external databases.
    Async I/O handles multiple requests concurrently, increasing throughput and reducing latency.
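A minimal sketch of async enrichment, assuming Flink's AsyncDataStream API; the lookup is a stand-in for a real asynchronous database client:

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncEnrichmentExample {

    // Enriches each key with a value fetched asynchronously (pretend remote lookup)
    public static class AsyncEnrich extends RichAsyncFunction<String, Tuple2<String, String>> {
        @Override
        public void asyncInvoke(String key, ResultFuture<Tuple2<String, String>> resultFuture) {
            CompletableFuture
                    .supplyAsync(() -> "value-for-" + key)  // stand-in for a DB client call
                    .thenAccept(value -> resultFuture.complete(
                            Collections.singleton(Tuple2.of(key, value))));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> keys = env.fromElements("k1", "k2", "k3");

        // unorderedWait: results may be emitted out of order; 1s timeout, at most 100 in flight
        DataStream<Tuple2<String, String>> enriched = AsyncDataStream.unorderedWait(
                keys, new AsyncEnrich(), 1000, TimeUnit.MILLISECONDS, 100);

        enriched.print();
        env.execute("async-io example");
    }
}
```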
04. Queryable State