Spark ML: Fixing Out-of-Memory Errors in ALS

Reposted on 2020-10-23 11:32. Category: Spark ML

Original post: https://blog.csdn.net/Damonhaus/article/details/76572971

Problem: while testing collaborative filtering with the ALS algorithm, the job failed with an out-of-memory error.

Solution 1: reduce the number of iterations from 20 to 10.

val model = new ALS().setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false).run(alldata)

In the line above, change .setIterations(20) to .setIterations(10).
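The snippet above uses the RDD-based `org.apache.spark.mllib` API. On the DataFrame-based `org.apache.spark.ml` API, the same setting goes through `setMaxIter`. What follows is a minimal sketch, assuming a local Spark session and a made-up toy ratings set; the object name, the `buildAls` helper, the column names and the data are illustrative and do not come from the original post.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object AlsFewerIterations {
  // Same hyperparameters as the post, with the iteration count halved to 10.
  def buildAls(): ALS = new ALS()
    .setRank(10)
    .setMaxIter(10)          // was 20 in the failing run
    .setRegParam(0.01)       // plays the role of setLambda in the mllib API
    .setImplicitPrefs(false)
    .setUserCol("user")
    .setItemCol("item")
    .setRatingCol("rating")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("als-fewer-iterations")
      .master("local[*]")    // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy ratings: (user, item, rating)
    val ratings = Seq(
      (1, 10, 4.0f), (1, 20, 2.0f),
      (2, 10, 5.0f), (2, 30, 3.0f),
      (3, 20, 1.0f), (3, 30, 4.0f)
    ).toDF("user", "item", "rating")

    val model = buildAls().fit(ratings)
    println(s"trained ALS model with rank ${model.rank}")

    spark.stop()
  }
}
```

Halving the iteration count trades some model quality for a shorter job, so it is worth comparing the error of the 10-iteration model against the 20-iteration one before settling on it.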

Solution 2: use the checkpoint mechanism.

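    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    /**
     * Delete the intermediate data left behind by earlier checkpoints.
     */
    // HDFS path to delete. HDFSConnection.paramMap appears to be the original author's own configuration helper.
    val path = new Path(HDFSConnection.paramMap("hadoop_url") + "/checkpoint")
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val hdfs = FileSystem.get(new URI(HDFSConnection.paramMap("hadoop_url") + "/checkpoint"), hadoopConf)
    if (hdfs.exists(path)) {
      // Second argument: true deletes recursively, false does not.
      // This is only intermediate data, so recursive deletion is fine here.
      hdfs.delete(path, true)
    }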

    /**
     * Set the checkpoint directory.
     */
    spark.sparkContext.setCheckpointDir(HDFSConnection.paramMap("hadoop_url") + "/checkpoint")

 /**
   * Set period (in iterations) between checkpoints (default = 10). Checkpointing helps with
   * recovery (when nodes fail) and StackOverflow exceptions caused by long lineage. It also helps
   * with eliminating temporary shuffle files on disk, which can be important when there are many
   * ALS iterations. If the checkpoint directory is not set in [[org.apache.spark.SparkContext]],
   * this setting is ignored.
   */

val model = new ALS().setCheckpointInterval(2).setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false)
      .run(alldata)
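Put together, the checkpoint approach can be sketched end to end as follows. This is only an illustration under stated assumptions: it runs in local mode, writes checkpoints to a local temporary directory instead of the HDFS path used in the post, and trains on a made-up toy ratings set. The object name `AlsWithCheckpoint` and the `train` helper are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.sql.SparkSession

object AlsWithCheckpoint {
  def train(sc: SparkContext): MatrixFactorizationModel = {
    // Assumption: a local temp directory stands in for the HDFS checkpoint path.
    val checkpointDir = Files.createTempDirectory("als-checkpoint").toString
    // Must be set before training, otherwise setCheckpointInterval is ignored.
    sc.setCheckpointDir(checkpointDir)

    // Hypothetical toy ratings: Rating(user, product, rating)
    val alldata = sc.parallelize(Seq(
      Rating(1, 10, 4.0), Rating(1, 20, 2.0),
      Rating(2, 10, 5.0), Rating(2, 30, 3.0),
      Rating(3, 20, 1.0), Rating(3, 30, 4.0)
    ))

    new ALS()
      .setCheckpointInterval(2) // checkpoint every 2 iterations to cut the lineage
      .setRank(10)
      .setIterations(20)
      .setLambda(0.01)
      .setImplicitPrefs(false)
      .run(alldata)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("als-with-checkpoint")
      .master("local[*]")       // assumption: local run, for illustration only
      .getOrCreate()

    val model = train(spark.sparkContext)
    println(s"predicted rating for user 1, product 30: ${model.predict(1, 30)}")

    spark.stop()
  }
}
```

A smaller checkpoint interval truncates the lineage more often at the cost of extra writes to the checkpoint directory, so the value 2 used here is a starting point to tune rather than a recommendation.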
