Spark version: 1.6.0
Scala version: 2.10
Error log:
Application application_1562341921664_2123 failed 2 times due to AM Container for appattempt_1562341921664_2123_000002 exited with exitCode: -104
For more detailed output, check the application tracking page: http://winner-offline-namenode:8088/cluster/app/application_1562341921664_2123 Then click on links to logs of each attempt.
Diagnostics: Container [pid=8891,containerID=container_e12_1562341921664_2123_02_000001] is running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 5.0 GB of 6.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_e12_1562341921664_2123_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 8891 8889 8891 8891 (bash) 0 0 112881664 367 /bin/bash -c LD_LIBRARY_PATH=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64: /usr/java/jdk1.8.0_162/bin/java -server -Xmx2048m -Djava.io.tmpdir=/disk1/hadoop/yarn/local/usercache/hadoop/appcache/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/tmp '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' -Dhdp.version=2.6.4.0-91 -Dspark.yarn.app.container.log.dir=/disk1/hadoop/yarn/log/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.winner.wifi.WinWifiAPIncre' --jar file:/hadoop/datadir/guozy/jar/winnercloud-assembly.jar --arg '2019-07-02' --arg 'bg' --arg 'St_P00094#St_P00087#Xsj_00001P00016#Cqhm_P00091#Langold_00004P00001#Xcjt_00004P00010#Djzx_P00001#Xsj_00001P00035#Xsj_00001P00013#Whc_P00081#Sygc_00001P00001#Szc_P00017#Lyyysc_P00001#Longfor_00001P00013#Lgjt_P00040#Wfjjt_P00028#Qzkyss_00001P00001#Hfjt_P00002#Xxhjgjgc_P00001#Xsj_00001P00024#Wdl_P00103#Hualian_00008P00020#Xl_P00067#Bfc_P00114#Snjt_00001P00001#Lfjt_P00002#Xuhui_P00050#Bailian_00002P00051#Xinyuan_P00077#Wdl_P00105#Hualian_P00004#Wfjjt_P00095#Fxhyjt_00001P00001#Xsj_00001P00023#Xsj_00001P00020#Xhbh_P00089#Szc_P00001#Yxgw_00001P00001#Wfjjt_P00035#Mybh_P00001#Yhjt_00002P00001#Wdl_P00106#Nydtjt_00001P00001#Xsj_00001P00002#Xuhui_P00038#Xsj_00001P00017#Xsj_00001P00012#Xsj_00001P00027#Lfjt_P00001#Xsj_00001P00034#Xhbh_P00108#Wjjt_P00022#Hjgc_P00001#Zxtfgc_P00001#Xhzb_P00001#Txjt_P00001#Mybh_P00003#Hdjt_00002P00018#Bailian_00002P00026#Hrzd_00002P00001#Bailian_00002P00052#Hxmkl_P00007#Ayjt_P00002#Wdl_P00107#Bailian_00002P00004#Ayjt_00004P00013#Hxmkl_P00009#Xsj_00001P00006#Wfjjt_P00027' --arg '1' --arg '150' --executor-memory 2048m --executor-cores 1 --properties-file 
/disk1/hadoop/yarn/local/usercache/hadoop/appcache/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/__spark_conf__/__spark_conf__.properties 1> /disk1/hadoop/yarn/log/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/stdout 2> /disk1/hadoop/yarn/log/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/stderr
|- 8906 8891 8891 8891 (java) 51299 2993 5230977024 787116 /usr/java/jdk1.8.0_162/bin/java -server -Xmx2048m -Djava.io.tmpdir=/disk1/hadoop/yarn/local/usercache/hadoop/appcache/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/tmp -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dhdp.version=2.6.4.0-91 -Dspark.yarn.app.container.log.dir=/disk1/hadoop/yarn/log/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class com.winner.wifi.WinWifiAPIncre --jar file:/hadoop/datadir/guozy/jar/winnercloud-assembly.jar --arg 2019-07-02 --arg bg --arg St_P00094#St_P00087#Xsj_00001P00016#Cqhm_P00091#Langold_00004P00001#Xcjt_00004P00010#Djzx_P00001#Xsj_00001P00035#Xsj_00001P00013#Whc_P00081#Sygc_00001P00001#Szc_P00017#Lyyysc_P00001#Longfor_00001P00013#Lgjt_P00040#Wfjjt_P00028#Qzkyss_00001P00001#Hfjt_P00002#Xxhjgjgc_P00001#Xsj_00001P00024#Wdl_P00103#Hualian_00008P00020#Xl_P00067#Bfc_P00114#Snjt_00001P00001#Lfjt_P00002#Xuhui_P00050#Bailian_00002P00051#Xinyuan_P00077#Wdl_P00105#Hualian_P00004#Wfjjt_P00095#Fxhyjt_00001P00001#Xsj_00001P00023#Xsj_00001P00020#Xhbh_P00089#Szc_P00001#Yxgw_00001P00001#Wfjjt_P00035#Mybh_P00001#Yhjt_00002P00001#Wdl_P00106#Nydtjt_00001P00001#Xsj_00001P00002#Xuhui_P00038#Xsj_00001P00017#Xsj_00001P00012#Xsj_00001P00027#Lfjt_P00001#Xsj_00001P00034#Xhbh_P00108#Wjjt_P00022#Hjgc_P00001#Zxtfgc_P00001#Xhzb_P00001#Txjt_P00001#Mybh_P00003#Hdjt_00002P00018#Bailian_00002P00026#Hrzd_00002P00001#Bailian_00002P00052#Hxmkl_P00007#Ayjt_P00002#Wdl_P00107#Bailian_00002P00004#Ayjt_00004P00013#Hxmkl_P00009#Xsj_00001P00006#Wfjjt_P00027 --arg 1 --arg 150 --executor-memory 2048m --executor-cores 1 --properties-file /disk1/hadoop/yarn/local/usercache/hadoop/appcache/application_1562341921664_2123/container_e12_1562341921664_2123_02_000001/__spark_conf__/__spark_conf__.properties
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
Failing this attempt. Failing the application.
Main resource settings:
--driver-memory 2g
--executor-memory 2g
Temporary workaround:
Increase the driver-side memory:
--driver-memory 3g
--executor-memory 2g
This made the error go away.
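For reference, the adjusted submit command looks roughly like this (a sketch: the master, deploy mode, and argument layout are assumptions; the class name and jar path come from the log above, and the long mall-ID argument is elided):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.winner.wifi.WinWifiAPIncre \
  --driver-memory 3g \
  --executor-memory 2g \
  --executor-cores 1 \
  /hadoop/datadir/guozy/jar/winnercloud-assembly.jar \
  2019-07-02 bg ... 1 150
```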
Addendum (root cause):
I had searched the web for many solutions to this problem, and almost all of them suggested tuning parameters. After trying all kinds of parameter adjustments with no effect, I figured the problem must be in the code. Going back and reviewing the code carefully, I found the cause.
1. The code used a for loop and created a large number of broadcast variables inside that loop. These broadcast variables were essentially configuration data, so there was no need to re-broadcast them on every iteration; broadcasting once would have sufficed. When Spark broadcasts a variable, the driver first materializes the value on the driver side and then distributes it to the worker nodes. So every iteration allocated another copy of the variable on the driver, and as the iterations accumulated, driver memory grew tighter and tighter until it was finally exhausted.
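The fix can be sketched as follows (a minimal illustration, not the original job: `configMap`, the date list, and the input paths are hypothetical; the pattern of hoisting the broadcast out of the loop is the point):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastOutsideLoop {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-demo"))

    // Hypothetical configuration data; in the original job this came from config files.
    val configMap: Map[String, String] = Map("key" -> "value")

    // Problematic pattern: calling sc.broadcast(configMap) INSIDE the loop creates a
    // fresh driver-side copy on every iteration, which accumulates until the driver OOMs.
    //
    // Fix: broadcast once, before the loop, and reuse the same handle.
    val configBc = sc.broadcast(configMap)

    for (date <- Seq("2019-07-01", "2019-07-02")) {
      val count = sc.textFile(s"/data/$date")
        .filter(line => configBc.value.contains(line.take(3))) // executors read via .value
        .count()
      println(s"$date -> $count")
    }

    configBc.unpersist() // release executor copies once the loop is done
    sc.stop()
  }
}
```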
2. Although unpersist was called to release each broadcast variable after use, driver-side cleanup still depends on the JVM's garbage collector: Spark's ContextCleaner relies on a periodic System.gc(), which by default runs only once every half hour. So even though we called unpersist, the stale broadcast data was not actually cleaned up right away.
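When eager release is needed instead of waiting for the next GC cycle, the broadcast can be torn down explicitly. A sketch, assuming `bc` is any broadcast handle (note `destroy` makes the variable unusable afterwards, unlike `unpersist`):

```scala
import org.apache.spark.broadcast.Broadcast

// Hypothetical helper: release a broadcast variable eagerly rather than
// waiting for the driver's periodic GC to collect the dead reference.
def release[T](bc: Broadcast[T]): Unit = {
  bc.unpersist(blocking = true) // synchronously delete the cached copies on the executors
  bc.destroy()                  // also remove the driver-side copy; bc cannot be used again
}
```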
3. In addition, I had changed the GC algorithm to G1 in the submit command, but I don't believe that was the cause of the driver running out of memory.
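For completeness, the GC-related submit options looked roughly like this (illustrative values only; `spark.cleaner.periodicGC.interval` controls how often the driver triggers the periodic System.gc() mentioned above, and lowering it can make unpersisted broadcasts get cleaned up sooner):

```shell
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --conf spark.cleaner.periodicGC.interval=10min \
  ...
```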
4. Reflection: when hitting a problem I still think things through too little. In my view, parameter tuning is only a supporting measure; the root cause usually lies in the code we write. Once the code itself is tuned to its best form, problems like this largely go away.