[Hadoop] - Hadoop MapReduce Error: GC overhead limit exceeded


While running a MapReduce job, the task failed with Error: GC overhead limit exceeded. Checking the task logs, the exception was:

2015-12-11 11:48:44,716 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.io.DataInputStream.readUTF(DataInputStream.java:661)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at xxxx.readFields(DateDimension.java:186)
    at xxxx.readFields(StatsUserDimension.java:67)
    at xxxx.readFields(StatsBrowserDimension.java:68)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:239)
    at xxx.reduce(BrowserReducer.java:37)
    at xxx.reduce(BrowserReducer.java:16)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

The stack trace shows that the reducer ran out of memory while reading the next key/value pair. Looking at the code, the reduce side keeps several Map collections in memory, which is what exhausted the heap. In Hadoop 2.x the default heap size of the YARN child JVM in a container is 200 MB, controlled by the mapred.child.java.opts parameter. This is a client-side parameter: it can be supplied when the job is submitted, or configured in mapred-site.xml. Changing it to -Xms200m -Xmx1000m enlarged the JVM heap and resolved the exception.
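
As a hedged sketch (the driver class name, job name, and omitted job setup below are placeholders, not taken from the original job), the parameter can also be set on the job Configuration at submission time instead of editing mapred-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithLargerHeap {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side parameter: must be set before the job is submitted.
        // -Xms200m -Xmx1000m raises the child JVM heap from the 200 MB default.
        conf.set("mapred.child.java.opts", "-Xms200m -Xmx1000m");
        Job job = Job.getInstance(conf, "browser-stats");
        // ... set jar, mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}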

Parameter | Default | Description
mapred.child.java.opts | -Xmx200m | JVM options for the MapReduce task container (both map and reduce)
mapred.map.child.java.opts | (none) | JVM options for map tasks only
mapred.reduce.child.java.opts | (none) | JVM options for reduce tasks only
mapreduce.admin.map.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | administrator-specified JVM options for map tasks
mapreduce.admin.reduce.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | administrator-specified JVM options for reduce tasks

 

The five parameters above take effect with the following precedence (lowest to highest):

  Map phase: mapreduce.admin.map.child.java.opts < mapred.child.java.opts < mapred.map.child.java.opts. In other words, when the settings conflict, the JVM options defined by mapred.map.child.java.opts win: the phase-specific key overrides the generic one at lookup time, and the admin options are placed before the user options on the child command line, so for a repeated flag such as -Xmx the later, user-supplied value is the one the JVM honors.

  Reduce phase: mapreduce.admin.reduce.child.java.opts < mapred.child.java.opts < mapred.reduce.child.java.opts.
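
For illustration only (continuing the same placeholder driver as the earlier sketch, not the original job), the phase-specific keys from the table above can be set alongside the generic one; for each phase the more specific key is the value actually used:

Configuration conf = new Configuration();
conf.set("mapred.child.java.opts", "-Xmx200m");          // generic fallback
conf.set("mapred.map.child.java.opts", "-Xmx512m");      // used by map tasks
conf.set("mapred.reduce.child.java.opts", "-Xmx1000m");  // used by reduce tasks
// Build and submit the job exactly as in the earlier sketch.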

For reference, see the Hadoop source: the org.apache.hadoop.mapred.MapReduceChildJVM.getChildJavaOpts method.

private static String getChildJavaOpts(JobConf jobConf, boolean isMapTask) {
    String userClasspath = "";
    String adminClasspath = "";
    if (isMapTask) {
        userClasspath = jobConf.get(JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    } else {
        userClasspath = jobConf.get(JobConf.MAPRED_REDUCE_TASK_JAVA_OPTS,
                jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    }

    // Add admin classpath first so it can be overridden by user.
    return adminClasspath + " " + userClasspath;
}

 

