class SparkContext extends Logging with ExecutorAllocationClient
Main entry point for Spark functionality.
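For orientation, a SparkContext is typically built from a SparkConf. The sketch below is illustrative only; the app name and master URL are placeholders, and the later examples reuse this sc:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: the app name and master URL are placeholder values.
val conf = new SparkConf().setAppName("demo").setMaster("local[*]")
val sc = new SparkContext(conf)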
def parallelize[T](seq: Seq[T], numSlices: Int = defaultParallelism)(implicit arg0: ClassTag[T]): RDD[T]
Distribute a local Scala collection to form an RDD.
Note
parallelize acts lazily. If seq is a mutable collection and is altered after the call to parallelize and before the first action on the RDD, the resultant RDD will reflect the modified collection. Pass a copy of the argument to avoid this.
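A small sketch of this pitfall, reusing the sc from above (the collection contents are illustrative):

import scala.collection.mutable.ArrayBuffer

val data = ArrayBuffer(1, 2, 3)

// parallelize is lazy: no snapshot of `data` is taken here.
val rdd = sc.parallelize(data)

// Mutating the collection before the first action can change the result:
// collect() may now return Array(1, 2, 3, 4) rather than Array(1, 2, 3).
data += 4
rdd.collect()

// Passing an immutable copy fixes the contents at call time.
val safeRdd = sc.parallelize(data.toList)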
python:
checkpoint(self)
Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir() and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD is persisted in memory, otherwise saving it on a file will require recomputation.
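A minimal sketch of the intended call order in Scala, assuming the sc from above; the checkpoint directory is a placeholder:

// On a cluster this must be an HDFS path; /tmp is for local runs only.
sc.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = sc.parallelize(1 to 100).map(_ * 2)

// Persist first, as recommended: the checkpoint is written by a separate job
// after the first action, and a cached copy avoids recomputing the lineage.
rdd.persist()

// Must be called before any job has been executed on this RDD.
rdd.checkpoint()

// The first action computes the RDD and triggers the checkpoint write;
// afterwards the RDD's lineage is truncated to the checkpoint file.
rdd.count()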
scala:
def setCheckpointDir(directory: String): Unit
Set the directory under which RDDs are going to be checkpointed. The directory must be an HDFS path if running on a cluster.
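For instance (both directories and the namenode host/port below are hypothetical):

// Local run: any writable local directory is fine.
sc.setCheckpointDir("/tmp/spark-checkpoints")

// Cluster run: the directory must be an HDFS path.
sc.setCheckpointDir("hdfs://namenode:8020/user/spark/checkpoints")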
