Today I tried running a Hive SQL job like the one below, which computes each user's average steps and calories over the past 30 days.
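#!/bin/bash
cur_date=$(date +%Y%m%d)

# Build a comma-separated list of the last 30 days (yesterday back to 30 days ago) in yyyymmdd form
pasts=""
for i in $(seq 30)
do
    iday=$(date -d "$i days ago" +%Y%m%d)
    if [ "$i" -eq 1 ]
    then
        pasts=$iday
    else
        pasts=$pasts","$iday
    fi
done
# echo $pasts

# Run the query as the hdfs user and write the result to a file named after today's date
sudo -su hdfs hive -e "select uid, avg(steps), avg(calories) from dailystats where day in ($pasts) group by uid" > /ad/tongji/output/getAvgStats/$cur_date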
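The loop in the script builds the comma-separated day list by hand. A more compact way to produce the same list, assuming GNU date (which the script already relies on for -d "N days ago") and GNU paste, is to print one date per line and join the lines with paste. The function name last_n_days is hypothetical.

```shell
#!/bin/bash
# Print the last N days (yesterday back to N days ago) as one
# comma-separated line of yyyymmdd values, for a Hive "day in (...)" filter.
# Assumes GNU date (-d "N days ago") and GNU paste.
last_n_days() {
  local n=$1
  local i
  for i in $(seq "$n"); do
    date -d "$i days ago" +%Y%m%d
  done | paste -sd, -   # join the lines with commas
}
```

With this helper, pasts=$(last_n_days 30) would replace the loop while producing the same list.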
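Watching the job in the Web Tracker (the web service on the default port 8088), I saw that Hive launched 2 map tasks; the maps then failed, were retried, and in the end all of them failed.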
The Web Tracker reported the same diagnostic as the Hive shell output below: map attempt attempt_1395208369821_0011_m_000004_0 timed out after 600 seconds, and the subsequent cleanup of container container_1395208369821_0011_01_000006 failed with java.net.ConnectException: Connection refused.
The Hive shell printed:
Error during job, obtaining debugging information...
Job Tracking URL: http://AY130105124528d0c2393:8088/proxy/application_1395208369821_0011/
Examining task ID: task_1395208369821_0011_m_000003 (and more) from job job_1395208369821_0011
Task with the most failures(1):
-----
Task ID:
task_1395208369821_0011_m_000004
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1395208369821_0011&tipid=task_1395208369821_0011_m_000004
-----
Diagnostic Messages for this Task:
AttemptID:attempt_1395208369821_0011_m_000004_0 Timed out after 600 secs
cleanup failed for container container_1395208369821_0011_01_000006 : java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:122)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:208)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:400)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
at com.sun.proxy.$Proxy29.stopContainer(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:119)
... 5 more
Caused by: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
... 7 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
... 8 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
That told me nothing about the actual cause. In the end, I used the Hive shell output to find the log directory of the corresponding application:
/var/log/hadoop-yarn/containers/application_1395208369821_0011/container_1395208369821_0011_01_000006 # ll
total 16
drwx--x--- 2 yarn yarn 4096 Mar 19 14:44 ./
drwx--x--- 8 yarn yarn 4096 Mar 19 14:44 ../
-rw-rw-r-- 1 yarn yarn 0 Mar 19 14:44 stderr
-rw-rw-r-- 1 yarn yarn 544 Mar 19 14:44 stdout
-rw-rw-r-- 1 yarn yarn 3852 Mar 19 14:44 syslog
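The container ID in the diagnostics is enough to locate this directory, because a YARN container ID embeds its application ID. A small sketch of that mapping, assuming the Hadoop 2.x format container_<clusterTimestamp>_<appNumber>_<attemptNumber>_<containerNumber> (later versions can add an epoch, container_e…, which this sketch does not handle) and the <log root>/<application ID>/<container ID> layout shown above; the function names are hypothetical:

```shell
#!/bin/bash
# Derive the application ID from a YARN container ID, e.g.
#   container_1395208369821_0011_01_000006 -> application_1395208369821_0011
# Assumes the Hadoop 2.x container ID format without an epoch component.
container_to_app() {
  local cid=$1
  local rest=${cid#container_}   # strip the "container_" prefix
  local ts=${rest%%_*}           # cluster timestamp
  rest=${rest#*_}
  local appnum=${rest%%_*}       # application number
  echo "application_${ts}_${appnum}"
}

# Build the container's local log directory under a given log root.
# Assumes the <log_root>/<application_id>/<container_id> layout.
container_log_dir() {
  local log_root=$1 cid=$2
  echo "${log_root}/$(container_to_app "$cid")/${cid}"
}
```

For example, container_log_dir /var/log/hadoop-yarn/containers container_1395208369821_0011_01_000006 prints the directory listed above.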
Looking at stdout:
/var/log/hadoop-yarn/containers/application_1395208369821_0011/container_1395208369821_0011_01_000006 # more stdout
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f5e80000, 99090432, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 99090432 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /ad/hadoop-yarn/cache/yarn/nm-local-dir/usercache/hdfs/appcache/application_1395208369821_0011/container_1395208369821_0011_01_000006/hs_err_pid2286.log
So the map tasks got stuck because of Cannot allocate memory: the JVM could not obtain the native memory it needed.
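When an application has many containers, opening each stdout by hand is tedious. A minimal sketch that lists every container of one application whose stdout contains the JVM's Cannot allocate memory message, assuming the same <log root>/<application ID>/<container ID>/stdout layout; find_oom_containers is a hypothetical helper name:

```shell
#!/bin/bash
# Print the log directory of each container of one application whose
# stdout contains the JVM native-memory failure message.
# Assumes the <log_root>/<application_id>/<container_id>/stdout layout.
find_oom_containers() {
  local log_root=$1 app_id=$2
  local f
  for f in "$log_root/$app_id"/*/stdout; do
    [ -f "$f" ] || continue   # skip if the glob matched nothing
    if grep -q 'Cannot allocate memory' "$f"; then
      dirname "$f"
    fi
  done
}
```

On the machine above, find_oom_containers /var/log/hadoop-yarn/containers application_1395208369821_0011 would list the affected containers.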
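The memory used by map tasks can be configured. In fact, the only thing I changed was mapred-site.xml, where I added this property:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>800</value>
</property>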
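The machine has 4 GB of RAM, so I set it to 800 MB. I also tried 900 and other values; 800 is the one that worked. The 30 days of data come to roughly 4.5 million records, and the query finished in 5 minutes. Yay!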
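The fix above only sets mapreduce.map.memory.mb, which in Hadoop 2 is the amount of memory requested from the scheduler for each map container; the heap of the map JVM is set separately through mapreduce.map.java.opts (for example with -Xmx). If you also want to size the heap, a frequently cited rule of thumb (an assumption here, not a Hadoop default) is to give the heap about 80% of the container and leave the rest for non-heap JVM memory. A minimal sketch of that calculation, with the hypothetical function name heap_opts_for:

```shell
#!/bin/bash
# Suggest a -Xmx flag for a map task given its container size in MB.
# Assumption: heap is about 80% of mapreduce.map.memory.mb (rule of thumb).
heap_opts_for() {
  local container_mb=$1
  # Integer arithmetic, rounds down: 800 * 8 / 10 = 640
  echo "-Xmx$(( container_mb * 8 / 10 ))m"
}
```

For the 800 MB container above, heap_opts_for 800 prints -Xmx640m, which could then be the value of a mapreduce.map.java.opts property in mapred-site.xml.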
References:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
http://woopisy.hatenablog.com/entry/2013/11/19/131033