Flink源码阅读（二）——checkpoint源码分析

本文转载自查看原文 2019-10-30 02:33 574 Flink

前言

　　在Flink原理——容错机制一文中，已对checkpoint的机制有了较为基础的介绍，本文着重从源码方面去分析checkpoint的过程。当然本文只是分析做checkpoint的调度过程，只是尽量弄清楚整体的逻辑，没有弄清楚其实现细节，还是有遗憾的，后期还是努力去分析实现细节。文中若是有误，欢迎大伙留言指出！

　　本文基于Flink1.9。

1、参数设置

　　1.1 有关checkpoint常见的参数如下：

1 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 2 env.enableCheckpointing(10000); //默认是不开启的　　 3 env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE); //默认为EXACTLY_ONCE 4 env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);　　//默认为0，最大值为1年 5 env.getCheckpointConfig().setCheckpointTimeout(150000);　　//默认为10min 6 env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);　　//默认为1

　　上述参数的默认值可见flink-streaming-java*.jar中的CheckpointConfig.java，配置值是通过该类中私有configureCheckpointing()的jobGraph.setSnapshotSettings(settings)传递给runtime层的，更多设置也可以参见该类。

　　1.2 参数分析

　　这里着重分析enableCheckpointing()设置的baseInterval和minPauseBetweenCheckpoint之间的关系。为分析两者的关系，这里先给出源码中定义

1     /** The base checkpoint interval. Actual trigger time may be affected by the 2  * max concurrent checkpoints and minimum-pause values */
3     //checkpoint触发周期，时间触发时间还受maxConcurrentCheckpointAttempts和minPauseBetweenCheckpointsNanos影响
4     private final long baseInterval; 5     
6     /** The min time(in ns) to delay after a checkpoint could be triggered. Allows to 7  * enforce minimum processing time between checkpoint attempts */
8     //在可以触发checkpoint的时，两次checkpoint之间的时间间隔
9     private final long minPauseBetweenCheckpointsNanos;

　　当baseInterval<minPauseBetweenCheckpoint时，在CheckpointCoordinator.java源码中定义如下：

1     // it does not make sense to schedule checkpoints more often then the desired 2     // time between checkpoints
3     long baseInterval = chkConfig.getCheckpointInterval(); 4     if (baseInterval < minPauseBetweenCheckpoints) { 5         baseInterval = minPauseBetweenCheckpoints; 6     }

　　从此可以看出，checkpoint的触发虽然设置为周期性的，但是实际触发情况，还得考虑minPauseBetweenCheckpoint和maxConcurrentCheckpointAttempts，若maxConcurrentCheckpointAttempts为1，就算满足触发时间也需等待正在执行的checkpoint结束。

2、checkpoint调用过程

　　将JobGraph提交到Dispatcher后，会createJobManagerRunner和startJobManagerRunner，可以关注Dispatcher类中的createJobManagerRunner(...)方法。

　　2.1 createJobManagerRunner阶段

　　该阶段会创建一个JobManagerRunner实例，在该过程和checkpoint有关的是会启动listener去监听job的状态。

 1 　　#JobManagerRunner.java  2     public JobManagerRunner(...) throws Exception {  3 
 4         //..........  5 
 6         // make sure we cleanly shut down out JobManager services if initialization fails
 7         try {  8             //..........  9             //加载JobGraph、library、leader选举等 10 
11             // now start the JobManager 12             //启动JobManager
13             this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader); 14  } 15         catch (Throwable t) { 16             //......
17  } 18  } 19     
20     //在DefaultJobMasterServiceFactory类的createJobMasterService()中新建一个JobMaster对象 21     //#JobMaster.java
22     public JobMaster(...) throws Exception { 23 
24         //........ 25         //该方法中主要做了参数检查，slotPool的创建、slotPool的schedul的创建等一系列的事情 26         
27         //创建一个调度器
28         this.schedulerNG = createScheduler(jobManagerJobMetricGroup); 29         //......
30     }

　　在创建调度器中核心的语句如下：

 1 　　//#LegacyScheduler.java中的LegacyScheduler()  2     //创建ExecutionGraph
 3     this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup, checkNotNull(shuffleMaster), checkNotNull(partitionTracker));  4 　　
 5 
 6     private ExecutionGraph createAndRestoreExecutionGraph(  7  JobManagerJobMetricGroup currentJobManagerJobMetricGroup,  8         ShuffleMaster<?> shuffleMaster,  9         PartitionTracker partitionTracker) throws Exception { 10 
11         
12         ExecutionGraph newExecutionGraph = createExecutionGraph(currentJobManagerJobMetricGroup, shuffleMaster, partitionTracker); 13 
14         final CheckpointCoordinator checkpointCoordinator = newExecutionGraph.getCheckpointCoordinator(); 15 
16         if (checkpointCoordinator != null) { 17             // check whether we find a valid checkpoint 18             //若state没有被恢复是否可以通过savepoint恢复 19             //......
20  } 21  } 22 
23         return newExecutionGraph; 24     }

　　通过调用到达生成ExecutionGraph的核心类ExecutionGraphBuilder的在buildGraph()方法，其中该方法主要是生成ExecutionGraph和设置checkpoint，下面给出其中的核心代码：

 1     //..............  2     //生成ExecutionGraph的核心方法，这里后期会详细分析
 3  executionGraph.attachJobGraph(sortedTopology);  4     
 5     //.......................  6         
 7     //在enableCheckpointing中设置CheckpointCoordinator
 8  executionGraph.enableCheckpointing(  9  chkConfig, 10  triggerVertices, 11  ackVertices, 12  confirmVertices, 13  hooks, 14  checkpointIdCounter, 15  completedCheckpoints, 16  rootBackend, 17         checkpointStatsTracker);

　　在enableCheckpointing()方法中主要是创建了checkpoint失败是的manager、设置了checkpoint的核心类CheckpointCoordinator。

 1     //#ExecutionGraph.java
 2     public void enableCheckpointing(  3  CheckpointCoordinatorConfiguration chkConfig,  4             List<ExecutionJobVertex> verticesToTrigger,  5             List<ExecutionJobVertex> verticesToWaitFor,  6             List<ExecutionJobVertex> verticesToCommitTo,  7             List<MasterTriggerRestoreHook<?>> masterHooks,  8  CheckpointIDCounter checkpointIDCounter,  9  CompletedCheckpointStore checkpointStore, 10  StateBackend checkpointStateBackend, 11  CheckpointStatsTracker statsTracker) { 12         //Job的状态必须为Created，
13         checkState(state == JobStatus.CREATED, "Job must be in CREATED state"); 14         checkState(checkpointCoordinator == null, "checkpointing already enabled"); 15         //checkpointing的不同状态
16         ExecutionVertex[] tasksToTrigger = collectExecutionVertices(verticesToTrigger); 17         ExecutionVertex[] tasksToWaitFor = collectExecutionVertices(verticesToWaitFor); 18         ExecutionVertex[] tasksToCommitTo = collectExecutionVertices(verticesToCommitTo); 19 
20         checkpointStatsTracker = checkNotNull(statsTracker, "CheckpointStatsTracker"); 21         //checkpoint失败manager，若是checkpoint失败会根据设置来决定下一步
22         CheckpointFailureManager failureManager = new CheckpointFailureManager( 23  chkConfig.getTolerableCheckpointFailureNumber(), 24             new CheckpointFailureManager.FailJobCallback() { 25  @Override 26                 public void failJob(Throwable cause) { 27                     getJobMasterMainThreadExecutor().execute(() -> failGlobal(cause)); 28  } 29 
30  @Override 31                 public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingTask) { 32                     getJobMasterMainThreadExecutor().execute(() -> failGlobalIfExecutionIsStillRunning(cause, failingTask)); 33  } 34  } 35  ); 36 
37         // create the coordinator that triggers and commits checkpoints and holds the state 38         //checkpoint的核心类CheckpointCoordinator
39         checkpointCoordinator = new CheckpointCoordinator( 40  jobInformation.getJobId(), 41  chkConfig, 42  tasksToTrigger, 43  tasksToWaitFor, 44  tasksToCommitTo, 45  checkpointIDCounter, 46  checkpointStore, 47  checkpointStateBackend, 48  ioExecutor, 49  SharedStateRegistry.DEFAULT_FACTORY, 50  failureManager); 51 
52         // register the master hooks on the checkpoint coordinator
53         for (MasterTriggerRestoreHook<?> hook : masterHooks) { 54             if (!checkpointCoordinator.addMasterHook(hook)) { 55                 LOG.warn("Trying to register multiple checkpoint hooks with the name: {}", hook.getIdentifier()); 56  } 57  } 58         //checkpoint统计
59  checkpointCoordinator.setCheckpointStatsTracker(checkpointStatsTracker); 60 
61         // interval of max long value indicates disable periodic checkpoint, 62         // the CheckpointActivatorDeactivator should be created only if the interval is not max value 63         //设置为Long.MAX_VALUE标识关闭周期性的checkpoint
64         if (chkConfig.getCheckpointInterval() != Long.MAX_VALUE) { 65             // the periodic checkpoint scheduler is activated and deactivated as a result of 66             // job status changes (running -> on, all other states -> off) 67             //只有在job的状态为running时，才会开启checkpoint的scheduler 68             //createActivatorDeactivator()创建一个listener监听器 69             //registerJobStatusListener()将listener加入监听器集合jobStatusListeners中
70  registerJobStatusListener(checkpointCoordinator.createActivatorDeactivator()); 71  } 72  } 73     
74     
75     //#CheckpointCoordinator.java
76     / ------------------------------------------------------------------------
77     // job status listener that schedules / cancels periodic checkpoints 78     // ------------------------------------------------------------------------ 79     //创建一个listener监听器checkpointCoordinator.createActivatorDeactivator()
80     public JobStatusListener createActivatorDeactivator() { 81         synchronized (lock) { 82             if (shutdown) { 83                 throw new IllegalArgumentException("Checkpoint coordinator is shut down"); 84  } 85 
86             if (jobStatusListener == null) { 87                 jobStatusListener = new CheckpointCoordinatorDeActivator(this); 88  } 89 
90             return jobStatusListener; 91  } 92     }

　　至此，createJobManagerRunner阶段结束了，ExecutionGraph中checkpoint的配置就设置好了。

　　2.2 startJobManagerRunner阶段

　　在该阶段中，在获得leaderShip之后，就会启动startJobExecution，这里只给出调用涉及的类和方法：

1     //#JobManagerRunner.java类中 2     //grantLeadership(...)==>verifyJobSchedulingStatusAndStartJobManager(...) 3     //==>startJobMaster(...)，该方法中核心代码为
4     startFuture = jobMasterService.start(new JobMasterId(leaderSessionId)); 5     
6     //进一步调用#JobMaster.java类中的start()==>startJobExecution(...)

　　startJobExecution()方法是JobMaster类中的私有方法，具体代码分析如下：

 1 　　//----------------------------------------------------------------------------------------------  2     // Internal methods  3     //----------------------------------------------------------------------------------------------  4 
 5     //-- job starting and stopping -----------------------------------------------------------------
 6 
 7     private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {  8 
 9  validateRunsInMainThread(); 10 
11         checkNotNull(newJobMasterId, "The new JobMasterId must not be null."); 12 
13         if (Objects.equals(getFencingToken(), newJobMasterId)) { 14             log.info("Already started the job execution with JobMasterId {}.", newJobMasterId); 15 
16             return Acknowledge.get(); 17  } 18 
19  setNewFencingToken(newJobMasterId); 20         //启动slotPool并申请资源，该方法可以具体看看申请资源的过程
21  startJobMasterServices(); 22 
23         log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId); 24         //执行ExecuteGraph的切入口，先判断job的状态是否为created的，后调执行executionGraph.scheduleForExecution();
25  resetAndStartScheduler(); 26 
27         return Acknowledge.get(); 28     }

　　在LegacyScheduler类中的方法scheduleForExecution()调度过程如下：

 1     public void scheduleForExecution() throws JobException {
 2 
 3         assertRunningInJobMasterMainThread();
 4 
 5         final long currentGlobalModVersion = globalModVersion;
 6         //任务执行之前进行状态切换从CREATED到RUNNING，
 7         //transitionState(...)方法中会通过notifyJobStatusChange(newState, error)通知jobStatusListeners集合中listeners状态改变
 8         if (transitionState(JobStatus.CREATED, JobStatus.RUNNING)) {
 9             //根据启动算子调度模式不同，采用不同的调度方案
10             final CompletableFuture<Void> newSchedulingFuture = SchedulingUtils.schedule(
11                 scheduleMode,
12                 getAllExecutionVertices(),
13                 this);
14             
15             //..............
16         }
17         else {
18             throw new IllegalStateException("Job may only be scheduled from state " + JobStatus.CREATED);
19         }
20     }
21     
22     private void notifyJobStatusChange(JobStatus newState, Throwable error) {
23         if (jobStatusListeners.size() > 0) {
24             final long timestamp = System.currentTimeMillis();
25             final Throwable serializedError = error == null ? null : new SerializedThrowable(error);
26 
27             for (JobStatusListener listener : jobStatusListeners) {
28                 try {
29                     listener.jobStatusChanges(getJobID(), newState, timestamp, serializedError);
30                 } catch (Throwable t) {
31                     LOG.warn("Error while notifying JobStatusListener", t);
32                 }
33             }
34         }
35     }
36     
37     
38     //#CheckpointCoordinatorDeActivator.java
39     public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
40         if (newJobStatus == JobStatus.RUNNING) {
41             // start the checkpoint scheduler
42             //触发checkpoint的核心方法
43             coordinator.startCheckpointScheduler();
44         } else {
45             // anything else should stop the trigger for now
46             coordinator.stopCheckpointScheduler();
47         }
48     }

　　下面具体分析触发checkpoint的核心方法startCheckpointScheduler()。

　　startCheckpointScheduler()方法结合注释还是比较好理解的，但由于方法太长这里就不全部贴出来了，先分析一下大致做什么了，然后给出其核心代码：

　　1）检查触发checkpoint的条件。如coordinator被关闭、周期性checkpoint被禁止、在没有开启强制checkpoint的情况下没有达到最小的checkpoint间隔以及超过并发的checkpoint个数等；

　　2）检查是否所有需要checkpoint和需要响应checkpoint的ACK（的task都处于running状态，否则抛出异常；

　　3）若均符合，执行checkpointID = checkpointIdCounter.getAndIncrement();以生成一个新的checkpointID，然后生成一个PendingCheckpoint。其中，PendingCheckpoint仅是一个启动了的checkpoint，但是还没有被确认，直到所有的task都确认了本次checkpoint，该checkpoint对象才转化为一个CompletedCheckpoint；

　　4）调度timer清理失败的checkpoint；

　　5）定义一个超时callback，如果checkpoint执行了很久还没完成，就把它取消；

　　6）触发MasterHooks，用户可以定义一些额外的操作，用以增强checkpoint的功能（如准备和清理外部资源）；

　　核心代码如下：

1     // send the messages to the tasks that trigger their checkpoint 2     //遍历ExecutionVertex，是否异步触发checkpoint
3     for (Execution execution: executions) { 4         if (props.isSynchronous()) { 5  execution.triggerSynchronousSavepoint(checkpointID, timestamp, checkpointOptions, advanceToEndOfTime); 6         } else { 7  execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions); 8  } 9     }

　　不管是否以异步的方式触发checkpoint，最终调用的方法是Execution类中的私有方法triggerCheckpointHelper(...)，具体代码如下：

 1 　　//Execution.java  2     private void triggerCheckpointHelper(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {  3 
 4         final CheckpointType checkpointType = checkpointOptions.getCheckpointType();  5         if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {  6             throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");  7  }  8 
 9         final LogicalSlot slot = assignedResource; 10 
11         if (slot != null) { 12             //TaskManagerGateway是用于与taskManager通信的组件
13             final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway(); 14 
15  taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions, advanceToEndOfEventTime); 16         } else { 17             LOG.debug("The execution has no slot assigned. This indicates that the execution is no longer running."); 18  } 19     }

　　至此，checkpointCoordinator就将做checkpoint的命令发送到TaskManager去了，下面着重分析TM中checkpoint的执行过程。

　　2.3 TaskManager中checkpoint

　　TaskManager 接收到触发checkpoint的RPC后，会触发生成checkpoint barrier。RpcTaskManagerGateway作为消息入口，其triggerCheckpoint(...)会调用TaskExecutor的triggerCheckpoint(...)，具体过程如下：

 1 　　//RpcTaskManagerGateway.java
 2     public void triggerCheckpoint(ExecutionAttemptID executionAttemptID, JobID jobId, long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {  3  taskExecutorGateway.triggerCheckpoint(  4  executionAttemptID,  5  checkpointId,  6  timestamp,  7  checkpointOptions,  8  advanceToEndOfEventTime);  9  } 10     
11     //TaskExecutor.java
12  @Override 13     public CompletableFuture<Acknowledge> triggerCheckpoint( 14  ExecutionAttemptID executionAttemptID, 15             long checkpointId, 16             long checkpointTimestamp, 17  CheckpointOptions checkpointOptions, 18             boolean advanceToEndOfEventTime) { 19         log.debug("Trigger checkpoint {}@{} for {}.", checkpointId, checkpointTimestamp, executionAttemptID); 20 
21         //...........
22 
23         if (task != null) { 24             //核心方法，触发生成barrier
25  task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions, advanceToEndOfEventTime); 26 
27             return CompletableFuture.completedFuture(Acknowledge.get()); 28         } else { 29             final String message = "TaskManager received a checkpoint request for unknown task " + executionAttemptID + '.'; 30 
31             //.........
32  } 33     }

　　在Task类的triggerCheckpointBarrier(...)方法中生成了一个Runable匿名类用于执行checkpoint，然后以异步的方式触发了该Runable，具体代码如下：

 1 　　　　public void triggerCheckpointBarrier(  2             final long checkpointID,  3             final long checkpointTimestamp,  4             final CheckpointOptions checkpointOptions,  5             final boolean advanceToEndOfEventTime) {  6 
 7         final AbstractInvokable invokable = this.invokable;  8         //创建一个CheckpointMetaData，该对象仅有checkpointID、checkpointTimestamp两个属性
 9         final CheckpointMetaData checkpointMetaData = new CheckpointMetaData(checkpointID, checkpointTimestamp); 10 
11         if (executionState == ExecutionState.RUNNING && invokable != null) { 12 
13             //..............
14 
15             Runnable runnable = new Runnable() { 16  @Override 17                 public void run() { 18                     // set safety net from the task's context for checkpointing thread
19                     LOG.debug("Creating FileSystem stream leak safety net for {}", Thread.currentThread().getName()); 20  FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(safetyNetCloseableRegistry); 21 
22                     try { 23                         //根据SourceStreamTask和StreamTask调用不同的方法
24                         boolean success = invokable.triggerCheckpoint(checkpointMetaData, checkpointOptions, advanceToEndOfEventTime); 25                         if (!success) { 26  checkpointResponder.declineCheckpoint( 27  getJobID(), getExecutionId(), checkpointID, 28                                     new CheckpointException("Task Name" + taskName, CheckpointFailureReason.CHECKPOINT_DECLINED_TASK_NOT_READY)); 29  } 30  } 31                     catch (Throwable t) { 32                         if (getExecutionState() == ExecutionState.RUNNING) { 33                             failExternally(new Exception( 34                                 "Error while triggering checkpoint " + checkpointID + " for " +
35  taskNameWithSubtask, t)); 36                         } else { 37                             LOG.debug("Encountered error while triggering checkpoint {} for " +
38                                 "{} ({}) while being not in state running.", checkpointID, 39  taskNameWithSubtask, executionId, t); 40  } 41                     } finally { 42                         FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(null); 43  } 44  } 45  }; 46             //以异步的方式触发Runnable
47  executeAsyncCallRunnable( 48  runnable, 49                     String.format("Checkpoint Trigger for %s (%s).", taskNameWithSubtask, executionId)); 50  } 51         else { 52             LOG.debug("Declining checkpoint request for non-running task {} ({}).", taskNameWithSubtask, executionId); 53 
54             // send back a message that we did not do the checkpoint
55  checkpointResponder.declineCheckpoint(jobId, executionId, checkpointID, 56                     new CheckpointException("Task name with subtask : " + taskNameWithSubtask, CheckpointFailureReason.CHECKPOINT_DECLINED_TASK_NOT_READY)); 57  } 58     }

　　SourceStreamTask和StreamTask调用triggerCheckpoint最终都是调用StreamTask类中的triggerCheckpoint(...)方法，其核心代码为：

1 　　//#StreamTask.java
2     return performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics, advanceToEndOfEventTime);

　　在performCheckpoint(...)方法中，主要有以下两件事：

　　1、若task是running，则可以进行checkpoint，主要有以下三件事：

　　　　1）为checkpoint做准备，一般是什么不做的，直接接受checkpoint；

　　　　2）生成barrier，并以广播的形式发射到下游去；

　　　　3）触发本task保存state；

　　2、若不是running，通知下游取消本次checkpoint，方法是发送一个CancelCheckpointMarker，这是类似于Barrier的另一种消息。

　　具体代码如下：

 1 　　//#StreamTask.java
 2     private boolean performCheckpoint(  3  CheckpointMetaData checkpointMetaData,  4  CheckpointOptions checkpointOptions,  5  CheckpointMetrics checkpointMetrics,  6             boolean advanceToEndOfTime) throws Exception {  7         //......
 8 
 9         synchronized (lock) { 10             if (isRunning) { 11 
12                 if (checkpointOptions.getCheckpointType().isSynchronous()) { 13  syncSavepointLatch.setCheckpointId(checkpointId); 14 
15                     if (advanceToEndOfTime) { 16  advanceToEndOfEventTime(); 17  } 18  } 19 
20                 // All of the following steps happen as an atomic step from the perspective of barriers and 21                 // records/watermarks/timers/callbacks. 22                 // We generally try to emit the checkpoint barrier as soon as possible to not affect downstream 23                 // checkpoint alignments 24 
25                 // Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work. 26                 // The pre-barrier work should be nothing or minimal in the common case.
27  operatorChain.prepareSnapshotPreBarrier(checkpointId); 28 
29                 // Step (2): Send the checkpoint barrier downstream
30  operatorChain.broadcastCheckpointBarrier( 31  checkpointId, 32  checkpointMetaData.getTimestamp(), 33  checkpointOptions); 34 
35                 // Step (3): Take the state snapshot. This should be largely asynchronous, to not 36                 // impact progress of the streaming topology
37  checkpointState(checkpointMetaData, checkpointOptions, checkpointMetrics); 38 
39                 return true; 40  } 41             else { 42                 //.......
43  } 44  } 45     }

　　接下来分析checkpointState(...)过程。

　　checkpointState(...)方法最终会调用StreamTask类中executeCheckpointing()，其中会创建一个异步对象AsyncCheckpointRunnable，用以报告该检查点已完成，关键代码如下：

 1 　　//#StreamTask.java类中executeCheckpointing()
 2     public void executeCheckpointing() throws Exception {  3             startSyncPartNano = System.nanoTime();  4 
 5             try {  6                 //调用StreamOperator进行snapshotState的入口方法，依算子不同而变
 7                 for (StreamOperator<?> op : allOperators) {  8  checkpointStreamOperator(op);  9  } 10                 //......... 11 
12                 // we are transferring ownership over snapshotInProgressList for cleanup to the thread, active on submit
13                 AsyncCheckpointRunnable asyncCheckpointRunnable = new AsyncCheckpointRunnable( 14  owner, 15  operatorSnapshotsInProgress, 16  checkpointMetaData, 17  checkpointMetrics, 18  startAsyncPartNano); 19 
20  owner.cancelables.registerCloseable(asyncCheckpointRunnable); 21  owner.asyncOperationsThreadPool.execute(asyncCheckpointRunnable); 22 
23                 //.........
24             } catch (Exception ex) { 25                 //.......
26  } 27         }

　　进入AsyncCheckpointRunnable(...)中的run()方法，其中会调用StreamTask类中reportCompletedSnapshotStates(...)（对于一个无状态的job返回的null），进而调用TaskStateManagerImpl类中的reportTaskStateSnapshots(...)将TM的checkpoint汇报给JM，关键代码如下：

1     //TaskStateManagerImpl.java
2     checkpointResponder.acknowledgeCheckpoint(
3             jobId,
4             executionAttemptID,
5             checkpointId,
6             checkpointMetrics,
7             acknowledgedState);

　　其逻辑是逻辑是通过rpc的方式远程调JobManager的相关方法完成报告事件。

　　2.4 JobManager处理checkpoint

　　通过RpcCheckpointResponder类中acknowledgeCheckpoint(...)来响应checkpoint返回的消息，该方法之后的调度过程和涉及的核心方法如下：

 1 　　 //#JobMaster类中acknowledgeCheckpoint==>  2     //#LegacyScheduler类中acknowledgeCheckpoint==>  3     //#CheckpointCoordinator类中receiveAcknowledgeMessage(...)==>  4     //completePendingCheckpoint(checkpoint);  5     
 6     //<p>Important: This method should only be called in the checkpoint lock scope
 7     private void completePendingCheckpoint(PendingCheckpoint pendingCheckpoint) throws CheckpointException {  8         final long checkpointId = pendingCheckpoint.getCheckpointId();  9         final CompletedCheckpoint completedCheckpoint; 10 
11         // As a first step to complete the checkpoint, we register its state with the registry
12         Map<OperatorID, OperatorState> operatorStates = pendingCheckpoint.getOperatorStates(); 13  sharedStateRegistry.registerAll(operatorStates.values()); 14 
15         try { 16             try { 17                 //完成checkpoint
18                 completedCheckpoint = pendingCheckpoint.finalizeCheckpoint(); 19  failureManager.handleCheckpointSuccess(pendingCheckpoint.getCheckpointId()); 20  } 21             catch (Exception e1) { 22                 // abort the current pending checkpoint if we fails to finalize the pending checkpoint.
23                 if (!pendingCheckpoint.isDiscarded()) { 24  failPendingCheckpoint(pendingCheckpoint, CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, e1); 25  } 26 
27                 throw new CheckpointException("Could not finalize the pending checkpoint " + checkpointId + '.', 28  CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, e1); 29  } 30 
31             // the pending checkpoint must be discarded after the finalization
32             Preconditions.checkState(pendingCheckpoint.isDiscarded() && completedCheckpoint != null); 33 
34             try { 35                 //添加新的checkpoints，若有必要（completedCheckpoints.size() > maxNumberOfCheckpointsToRetain）删除旧的
36  completedCheckpointStore.addCheckpoint(completedCheckpoint); 37             } catch (Exception exception) { 38                 // we failed to store the completed checkpoint. Let's clean up
39                 executor.execute(new Runnable() { 40  @Override 41                     public void run() { 42                         try { 43  completedCheckpoint.discardOnFailedStoring(); 44                         } catch (Throwable t) { 45                             LOG.warn("Could not properly discard completed checkpoint {}.", completedCheckpoint.getCheckpointID(), t); 46  } 47  } 48  }); 49 
50                 throw new CheckpointException("Could not complete the pending checkpoint " + checkpointId + '.', 51  CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, exception); 52  } 53         } finally { 54  pendingCheckpoints.remove(checkpointId); 55 
56  triggerQueuedRequests(); 57  } 58 
59  rememberRecentCheckpointId(checkpointId); 60 
61         // drop those pending checkpoints that are at prior to the completed one 62         //删除在其之前未完成的checkpoint（优先级高的）
63  dropSubsumedCheckpoints(checkpointId); 64 
65         // record the time when this was completed, to calculate 66         // the 'min delay between checkpoints'
67         lastCheckpointCompletionNanos = System.nanoTime(); 68 
69         LOG.info("Completed checkpoint {} for job {} ({} bytes in {} ms).", checkpointId, job, 70  completedCheckpoint.getStateSize(), completedCheckpoint.getDuration()); 71 
72         if (LOG.isDebugEnabled()) { 73             StringBuilder builder = new StringBuilder(); 74             builder.append("Checkpoint state: "); 75             for (OperatorState state : completedCheckpoint.getOperatorStates().values()) { 76  builder.append(state); 77                 builder.append(", "); 78  } 79             // Remove last two chars ", "
80             builder.setLength(builder.length() - 2); 81 
82  LOG.debug(builder.toString()); 83  } 84 
85         // send the "notify complete" call to all vertices
86         final long timestamp = completedCheckpoint.getTimestamp(); 87         
88         //通知所有（TM中）operator该checkpoint已完成
89         for (ExecutionVertex ev : tasksToCommitTo) { 90             Execution ee = ev.getCurrentExecutionAttempt(); 91             if (ee != null) { 92  ee.notifyCheckpointComplete(checkpointId, timestamp); 93  } 94  } 95     }

　　至此，checkpoint的整体流程分析完毕建议结合原理去理解，参考的三篇文献都是写的很好的，有时间建议看看。

Ref：

[1]https://www.jianshu.com/p/a40a1b92f6a2

[2]https://www.cnblogs.com/bethunebtj/p/9168274.html

[3] https://blog.csdn.net/qq475781638/article/details/92698301

免责声明！

本站转载的文章为个人学习借鉴使用，本站对版权不负任何法律责任。如果侵犯了您的隐私权益，请联系本站邮箱yoyou2525@163.com删除。

猜您在找 flink checkpoint 源码分析（一） flink checkpoint 源码分析（二） Flink源码阅读（一）--Checkpoint触发机制 Flink 非对齐Unaligned的checkpoint（源码分析） flink-connector-kafka consumer checkpoint源码分析 Spark源码分析 – Checkpoint flink源码阅读（概览） flink源码阅读（2） Flink源码阅读(1.7.2) Spark Streaming源码分析 – Checkpoint