The error log is shown below. (Of course, error messages are not always accurate, so they cannot always pinpoint exactly where the problem occurred.)
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException: sleep interrupted
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:348)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:568)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
    at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:352)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:300)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:244)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:438)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
    ... 17 more
Total MapReduce CPU Time Spent: -2 msec
Job Submission failed with exception 'org.apache.hadoop.yarn.exceptions.YarnRuntimeException(java.lang.InterruptedException: sleep interrupted)'
Or, alternatively:
2021-10-31 09:00:11,340 [Thread-72] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load native gpl library
java.lang.UnsatisfiedLinkError: /home/pirate/dev/disk-5/tmp/yarn-local/usercache/pirate/appcache/application_1635150008466_34289/container_1635150008466_34289_01_000001/tmp/unpacked-3959672880919352106-libgplcompression.so: /home/pirate/dev/disk-5/tmp/yarn-local/usercache/pirate/appcache/application_1635150008466_34289/container_1635150008466_34289_01_000001/tmp/unpacked-3959672880919352106-libgplcompression.so: failed to map segment from shared object: Operation not permitted
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
    at java.lang.Runtime.load0(Runtime.java:809)
    at java.lang.System.load(System.java:1086)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:51)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:159)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:151)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:239)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:309)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:470)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:328)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:432)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
2021-10-31 09:00:11,341 [Thread-72] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
Inspecting the Hive script showed that it set the following optimization parameters:
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapred.output.compression.type=BLOCK;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.auto.convert.join=true;
set mapreduce.map.memory.mb=40960;
set mapreduce.reduce.memory.mb=40960;
set mapred.child.java.opts=-Xmx1536m;
set mapreduce.job.reduce.slowstart.completedmaps=0.8;
set hive.exec.parallel=true;
We suspected that mapreduce.map.memory.mb or mapreduce.reduce.memory.mb was configured too large. These two parameters specify how much memory each task requests for its YARN container. Checking the Hadoop yarn-site.xml configuration file turned up the following setting:
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>30720</value>
</property>
We then lowered the two parameters to values within this limit and resubmitted the script, and it ran successfully.
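For example, the corrected submission might look like the sketch below. The 28672 MB container size and the heap values are illustrative, not the exact figures used in the original fix; the point is simply that both memory.mb values must fall under the 30720 MB scheduler cap:
-- Illustrative values only: request containers within yarn.scheduler.maximum-allocation-mb (30720 MB)
set mapreduce.map.memory.mb=28672;
set mapreduce.reduce.memory.mb=28672;
-- Keep the JVM heap below the container size (roughly 0.75x, see the summary below)
set mapreduce.map.java.opts=-Xmx21504m;
set mapreduce.reduce.java.opts=-Xmx21504m;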
Summary:
Tuning parameters for Mapper/Reducer-stage JVM heap memory overflow
MapReduce currently controls memory mainly through two groups of parameters (increase the following as needed):
Mapper:
mapreduce.map.java.opts=-Xmx2048m (default; the JVM heap size; note the prefix is mapreduce, not mapred)
mapreduce.map.memory.mb=2304 (the container's memory)
Reducer:
mapreduce.reduce.java.opts=-Xmx2048m (default; the JVM heap size)
mapreduce.reduce.memory.mb=2304 (the container's memory)
Note: because under YARN the map/reduce tasks run inside containers,
the mapreduce.{map|reduce}.memory.mb value must be larger than the corresponding mapreduce.{map|reduce}.java.opts value.
mapreduce.{map|reduce}.java.opts sets the JVM's maximum heap via -Xmx; it is generally set to about 0.75 times the memory.mb value, since some space must be reserved for non-heap usage such as Java code.
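As a concrete illustration of that 0.75 ratio (the values here are hypothetical, chosen only to show the arithmetic):
-- Container requests 4096 MB from YARN; the heap gets 0.75 * 4096 = 3072 MB,
-- leaving roughly 1024 MB of headroom for non-heap/native usage inside the container.
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3072m;
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3072m;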