Reading the source of RDD.scala makes the difference between cache and persist clear:
def persist(newLevel: StorageLevel): this.type = {
  // ...
  sc.cleaner.foreach(_.registerRDDForCleanup(this))
  // ...
}

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): this.type = persist()
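From this we can see:
1) An RDD's cache() method simply calls persist(), so both default to the MEMORY_ONLY storage level;
2) persist(newLevel) lets you set the StorageLevel by hand to get whatever storage level the project needs;
3) Neither cache nor persist is an action: they only mark the RDD for persistence, and nothing is computed or stored until an action runs.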
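
The three points above can be checked with a small local job. This is only a sketch under some assumptions: Spark 2.0 or later is on the classpath (for `longAccumulator`), the job runs in local mode, and the names `CacheVsPersistDemo`, `run`, and `evaluated` are made up for illustration. The accumulator counts how many elements the map function has actually processed, so it should stay at zero until the first action.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; not part of Spark.
object CacheVsPersistDemo {

  /** Returns how many elements were evaluated (before, after) the first action. */
  def run(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-vs-persist-demo")
    val sc = new SparkContext(conf)
    try {
      // Counts how many elements the map function has actually processed.
      val evaluated = sc.longAccumulator("evaluated")

      val cached = sc.parallelize(1 to 1000).map { x => evaluated.add(1L); x * 2 }.cache()
      val onDisk = sc.parallelize(1 to 1000).map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)

      // 1) cache() is persist() with the default level.
      assert(cached.getStorageLevel == StorageLevel.MEMORY_ONLY)
      // 2) persist(level) records whichever level was requested.
      assert(onDisk.getStorageLevel == StorageLevel.MEMORY_AND_DISK)

      // 3) Neither call is an action, so the map function has not run yet.
      val before = evaluated.sum
      cached.count() // first action: computes the RDD and fills the cache
      val after = evaluated.sum
      (before, after)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"evaluated before action: $before, after action: $after")
  }
}
```

Accumulator updates made inside a transformation are not guaranteed to be applied exactly once if tasks are retried, so treat the counter as a diagnostic for this local run and not as something to rely on in production code.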