2020-07-30 14:19:34,034 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: RECORDS_OUT_INTERMEDIATE:50,
2020-07-30 14:19:34,037 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.orc.impl.DynamicByteArray.get(DynamicByteArray.java:283)
	at org.apache.orc.impl.TreeReaderFactory$StringDictionaryTreeReader.nextVector(TreeReaderFactory.java:1584)
	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1277)
	at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
2020-07-30 14:19:34,049 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2020-07-30 14:19:34,050 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2020-07-30 14:19:34,050 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
This is a Java heap space OutOfMemoryError. Judging by the stack trace, the map task ran out of heap while the ORC reader was materializing a string dictionary column (StringDictionaryTreeReader.nextVector).

First, check the memory configuration in yarn-site.xml:
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24576</value>
    <!-- Memory available on each node (24 GB); caps how much memory
         the NodeManager may claim from the host -->
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
    <!-- Maximum memory a single task may request (16 GB); this is only
         an upper bound, and a task that exceeds it is killed -->
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
    <!-- Minimum memory a single task may request (2 GB); again only a
         bound, and the actual map task size is still set separately in
         mapred-site.xml -->
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <!-- Disable the virtual-memory usage check -->
</property>
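A quick sanity check of these numbers (my own arithmetic, not from the original logs): with 24576 MB offered to YARN per node and map containers sized at 4096 MB (see mapred-site.xml below), at most

    24576 MB / 4096 MB = 6 map containers per node

can run concurrently, and since 4096 MB is already a multiple of the 2048 MB minimum allocation the scheduler does not round the request up. So memory is not scarce at the container level; the problem must sit inside the container.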
Next, mapred-site.xml:
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
    <!-- Maximum memory for a map task's container -->
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
    <!-- Maximum memory for a reduce task's container -->
</property>
<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
    <!-- JVM launch options for task child processes -->
</property>
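In hindsight, two things about this block are worth spelling out. First, mapred.child.java.opts is the old pre-YARN property; when the newer per-task properties mapreduce.map.java.opts / mapreduce.reduce.java.opts are set anywhere (cluster defaults, job config, or a Hive session), they take precedence for their task type. Second, the container is 4096 MB but the heap inside it is capped at 2048 MB, and the ORC string dictionary from the stack trace must fit in that heap. A common rule of thumb is to give the heap roughly 80% of the container, leaving the rest for metaspace, thread stacks, and native buffers. A hedged sketch of what that would look like here (my suggested values, not the original cluster's config):

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3276m</value>
    <!-- assumption: ~0.8 * the 4096 MB map container -->
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
    <!-- assumption: same ratio on the reduce side -->
</property>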
For a long time I could not find the cause, and then began to wonder whether the problem was in Hive itself.
I finally found a solution: apply the following setting before executing the HiveQL.
set mapreduce.map.java.opts=-Xmx2048m;
-- then execute the SQL
select * from xxx;
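Why this works (my reading, based on Hadoop's documented precedence): once mapreduce.map.java.opts is set in the session, it overrides both mapred.child.java.opts and any cluster-side default, so the map JVM is guaranteed the full 2 GB heap. A hedged follow-up, my addition rather than part of the original fix: reducers have a matching per-session knob if they hit the same error.

-- if reduce tasks also fail with java.lang.OutOfMemoryError, bump their heap too:
set mapreduce.reduce.java.opts=-Xmx2048m;

To make the fix permanent instead of per-session, the same properties can be set once in mapred-site.xml.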