Problem:
Recently, field operations reported that after a Spark application is submitted in yarn-cluster mode, a YARN client process is left running on the submit node and never exits. Because these applications are all Spark Structured Streaming programs (applications that keep running for months on end), the resources of the submit-node server eventually get exhausted, and other operations on that machine fail with errors like the following:
[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes to committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$
Analysis on the Spark application submit node showed that the resources were occupied mainly by the accumulated YARN client processes:
[dx@my-linux-01 bin]$ top
   PID USER PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
122236 dx   20  0 20.629g 1.347g 3520 S  0.3  2.1 7:02.42 java
122246 dx   20  0 20.629g 1.311g 3520 S  0.3  2.0 7:03.42 java
122236 dx   20  0 20.629g 1.288g 3520 S  0.3  2.2 7:05.83 java
122346 dx   20  0 20.629g 1.344g 3520 S  0.3  2.1 7:10.42 java
121246 dx   20  0 20.629g 1.343g 3520 S  0.3  2.3 7:01.42 java
122346 dx   20  0 20.629g 1.341g 3520 S  0.3  2.4 7:03.39 java
112246 dx   20  0 20.629g 1.344g 3520 S  0.3  2.0 7:02.42 java
............
112260 dx   20  0 20.629g 1.344g 3520 S  0.3  2.0 7:02.02 java
112260 dx   20  0  113116    200    0 S  0.0  0.0 0:00.00 sh
............
Analysis of Spark job submission via YARN:
There are two ways to submit a Spark application to YARN:
1) yarn-client (spark-submit --master yarn --deploy-mode client ...):
In this mode, after the application is submitted the driver runs on the submit node, inside the YARN client process. Killing the client process on the submit node therefore kills the driver and, with it, the application.
2) yarn-cluster (spark-submit --master yarn --deploy-mode cluster ...):
In this mode, after the application is submitted the driver runs inside a container allocated by YARN: an AM (ApplicationMaster) process is started in that container, and the SparkContext (driver) runs inside the AM. During submission a YARN client process is still started on the submit node, and by default this client process waits for the application to end (failed, finished, etc.) and keeps running until then, as sketched below.
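The "wait until the job ends" behavior comes from the fireAndForget flag inside the Spark YARN client. The following is a simplified sketch of how that flag is derived, reconstructed from memory from the Spark 2.x source of org.apache.spark.deploy.yarn.Client, so treat the exact field names as approximate:

// org.apache.spark.deploy.yarn.Client (Spark 2.x, reconstructed from memory)

// true when the application was submitted with --deploy-mode cluster
private val isClusterMode = sparkConf.get("spark.submit.deployMode", "client") == "cluster"

// WAIT_FOR_APP_COMPLETION is the config entry backing
// spark.yarn.submit.waitAppCompletion, which defaults to true; fireAndForget
// therefore stays false unless the property is explicitly set to false.
private val fireAndForget = isClusterMode && !sparkConf.get(WAIT_FOR_APP_COMPLETION)

When fireAndForget is false, the client calls monitorApplication() (see the run() method quoted further below) and blocks until the application terminates, which is why one client process per long-running streaming job piles up on the submit node.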
Solution:
The controlling parameter of the YARN client is
spark.yarn.submit.waitAppCompletion
If this parameter is set to true, the client stays alive and keeps reporting the application's status until the application exits (for whatever reason); if it is set to false, the client process exits as soon as the application has been submitted. In yarn-cluster mode it defaults to true, which is exactly the lingering-process behavior observed above.
Add the parameter to the spark-submit arguments:
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  ....
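If applications are submitted from another JVM process rather than from a shell script, the same property can be set through the SparkLauncher API that ships with Spark. A minimal sketch, where the jar path and main class are placeholders:

import org.apache.spark.launcher.SparkLauncher

object SubmitWithoutWaiting {
  def main(args: Array[String]): Unit = {
    // Submit in cluster mode and let the local client exit right after
    // the application has been handed over to the ResourceManager.
    val handle = new SparkLauncher()
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setConf("spark.yarn.submit.waitAppCompletion", "false")
      .setAppResource("/path/to/my-streaming-app.jar") // placeholder
      .setMainClass("com.example.MyStreamingApp")      // placeholder
      .startApplication()

    // The returned SparkAppHandle reports state transitions (SUBMITTED,
    // RUNNING, ...) only while this launcher process itself stays alive.
    println(s"State after submission: ${handle.getState}")
  }
}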
The corresponding code in the org.apache.spark.deploy.yarn.Client class:
/**
 * Submit an application to the ResourceManager.
 * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
 * reporting the application's status until the application has exited for any reason.
 * Otherwise, the client process will exit after submission.
 * If the application finishes with a failed, killed, or undefined status,
 * throw an appropriate SparkException.
 */
def run(): Unit = {
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
    if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}
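With spark.yarn.submit.waitAppCompletion=false, run() takes the fireAndForget branch (provided no SparkLauncher backend is connected): it logs one application report right after submitApplication() and returns, so the continuous status reporting done by monitorApplication() is gone. If the status is still needed later, it can be polled on demand with yarn application -status <appId>, or programmatically through the Hadoop YarnClient API. A minimal sketch, assuming a Hadoop 2.8+ client on the classpath and a placeholder application id:

import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object CheckAppStatus {
  def main(args: Array[String]): Unit = {
    // Connects to the ResourceManager configured in yarn-site.xml.
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Placeholder id; use the one printed by the client at submission time.
    val appId = ApplicationId.fromString("application_0000000000000_0000")
    val report = yarnClient.getApplicationReport(appId)
    println(s"State: ${report.getYarnApplicationState}, " +
      s"final status: ${report.getFinalApplicationStatus}")

    yarnClient.stop()
  }
}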