While testing Spark on YARN, I found that executors failed to start on several machines. The log looked like this:
2018-11-22 14:32:24 WARN YarnAllocator:66 - Container marked as failed: container_1516236189600_0229_01_000011 on host: cloud3. Exit status: 127. Diagnostics: Exception from container-launch.
Container id: container_1516236189600_0229_01_000011
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 127
2018-11-22 14:32:26 INFO YarnAllocator:54 - Driver requested a total number of 0 executor(s).
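I searched Google for a long time without solving the problem, until I saw someone suggest checking the logs with yarn logs -applicationId <APP_ID>. That turned up the following: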
Container: container_1516236189600_0231_01_000001 on cloud3_22681
===================================================================
LogType:stderr
Log Upload Time:星期四 十一月 22 14:48:13 +0800 2018
LogLength:75
Log Contents:
/bin/bash: /data/platform/jdk1.8.0_144/bin/java: No such file or directory
LogType:stdout
Log Upload Time:星期四 十一月 22 14:48:13 +0800 2018
LogLength:0
Log Contents:
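When an application has many containers, the aggregated output of yarn logs can be long, and the one useful stderr line is easy to miss. The following is a minimal sketch, assuming the log layout shown above (a "Container:" header, then "LogType:" and "Log Contents:" markers); the function name stderr_by_container is hypothetical, and other Hadoop versions may format the output differently.

```python
from typing import Dict


def stderr_by_container(aggregated_log: str) -> Dict[str, str]:
    """Collect the stderr text of each container from `yarn logs` output.

    Assumes the layout shown in this post: a "Container: <id> on <node>"
    header, then sections introduced by "LogType:<name>" whose body
    follows a "Log Contents:" line.
    """
    sections: Dict[str, str] = {}
    container = None      # id of the container currently being read
    in_stderr = False     # True while inside a LogType:stderr section
    collecting = False    # True once "Log Contents:" has been seen

    for line in aggregated_log.splitlines():
        if line.startswith("Container: "):
            # Second token is the container id.
            container = line.split()[1]
            in_stderr = collecting = False
        elif line.startswith("LogType:"):
            # A new section begins; only stderr is of interest here.
            in_stderr = line.strip() == "LogType:stderr"
            collecting = False
        elif in_stderr and line.startswith("Log Contents:"):
            collecting = True
            sections.setdefault(container, "")
        elif collecting:
            sections[container] += line + "\n"

    return sections
```

Feeding it the text captured from yarn logs -applicationId <APP_ID> gives one entry per container, so a message such as "No such file or directory" stands out immediately.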
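The cause was now obvious. I logged in to the affected machines and confirmed that jdk1.8.0_144 was indeed missing. After I installed jdk1.8.0_144 there and restarted the job, the problem was gone.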
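Exit status 127 is what bash returns when the command it was asked to run cannot be found, which matches the "No such file or directory" message for the java binary. To catch this before submitting a job, a small check like the one below could be run on each node. This is only a sketch: the helper name java_is_runnable is hypothetical, and the path passed to it should be whatever Java home the cluster is configured to use (here, /data/platform/jdk1.8.0_144 from the log).

```python
import os


def java_is_runnable(java_home: str) -> bool:
    """Return True if <java_home>/bin/java exists and is executable."""
    java = os.path.join(java_home, "bin", "java")
    # isfile() rejects missing paths and directories;
    # access(..., X_OK) checks the execute permission for this user.
    return os.path.isfile(java) and os.access(java, os.X_OK)


def report(java_home: str) -> str:
    """Build a one-line status message suitable for printing on a node."""
    status = "OK" if java_is_runnable(java_home) else "MISSING"
    return f"{status}: {os.path.join(java_home, 'bin', 'java')}"
```

For example, calling report("/data/platform/jdk1.8.0_144") on every NodeManager host would show which machines lack the JDK that the containers try to launch.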