RDD中cache和persist的區別


通過觀察RDD.scala源代碼即可知道cache和persist的區別:

def persist(newLevel: StorageLevel): this.type = {
  if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
    throw new UnsupportedOperationException( "Cannot change storage level of an RDD after it was already assigned a level")
  }
  sc.persistRDD(this)

  sc.cleaner.foreach(_.registerRDDForCleanup(this))
  storageLevel = newLevel
  this
}

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

 

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): this.type = persist()

 

 

 

 

 

 

 

 

 

 

 

可知:

1)RDD的cache()方法其實調用的就是persist方法,緩存策略均為MEMORY_ONLY;

2)可以通過persist方法手工設定StorageLevel來滿足工程需要的存儲級別;

3)cache或者persist並不是action;

 

 

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM