The problem of hiveserver using too much memory


To get to the bottom of hiveserver's excessive memory usage, I joined the Apache Hive mailing list today and discussed it there for quite a while. It's worth saying that the people on the list are genuinely helpful, and they take things seriously: the replies in the thread were all very detailed.

 

Some excerpts from the thread:

 

 

Did you update your JDK recently? A Java dev told me that could be
an issue in JDK 6 update 26
(https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
devs report a memory decrease when they use GC flags. I'm not quite
sure, it sounds a bit far-fetched to me.

The stack traces show a lot of waiting, but I see nothing special.

- Alex

2011/12/12 王鋒 <wfeng1982@163.com>:
>
> The hive log:
>
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
> real=0.08 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
> real=0.07 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>
> Now we have 3 hiveservers and I set the concurrent job num to 4, but the memory
> still stays so large. I'm going mad, God.
>
> Do you have any other suggestions?
>
> On 2011-12-12 17:59:52, "alo alt" <wget.null@googlemail.com> wrote:
>>When you start a high-load Hive query, can you watch the stack traces?
>>It's possible via the web interface:
>>http://jobtracker:50030/stacks
>>
>>- Alex
>>
>>
>>2011/12/12 王鋒 <wfeng1982@163.com>
>>>
>>> hiveserver will throw an OOM after several hours.
>>>
>>>
>>> At 2011-12-12 17:39:21,"alo alt" <wget.null@googlemail.com> wrote:
>>>
>>> What happens when you set -Xmx2048m or similar? Does that have any negative effects on running queries?
>>>
>>> 2011/12/12 王鋒 <wfeng1982@163.com>
>>>>
>>>> I have modified the hive JVM args.
>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>
>>>> But the memory used by hiveserver is still large.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>
>>>> Not from the running jobs; what I am saying is that the heap size of Hadoop really depends on the number of files and directories on HDFS. Removing old files periodically or merging small files would bring some performance boost.
>>>>
>>>> On the Hive end, the memory consumed also depends on the queries that are executed. Monitor the reducers of the Hadoop job; my experience is that the reduce part could be the bottleneck here.
>>>>
>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>
>>>> 2011/12/12 王鋒 <wfeng1982@163.com>
>>>>>
>>>>> Are the files you mentioned the files from jobs our system has run? Those can't be that large.
>>>>>
>>>>> Why would the namenode be the cause? What is hiveserver doing when it uses so much memory?
>>>>>
>>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>
>>>>> Not sure if this is because of the number of files, since the namenode tracks every file, directory, and block.
>>>>> See this one. http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>
>>>>> Please correct me if I am wrong, because this seems to be more of an HDFS problem, which is actually irrelevant to Hive.
>>>>>
>>>>> Thanks
>>>>> Aaron
>>>>>
>>>>> 2011/12/11 王鋒 <wfeng1982@163.com>
>>>>>>
>>>>>>
>>>>>> I want to know why hiveserver uses so much memory, and where the memory has gone.
>>>>>>
>>>>>> On 2011-12-12 10:02:44, "王鋒" <wfeng1982@163.com> wrote:
>>>>>>
>>>>>>
>>>>>> The namenode summary: [screenshot not preserved]
>>>>>>
>>>>>> The MR summary: [screenshot not preserved]
>>>>>>
>>>>>> And hiveserver: [screenshot not preserved]
>>>>>>
>>>>>> hiveserver jvm args:
>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>>
>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>
>>>>>>
>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>
>>>>>> What does the data look like, and what's the size of the cluster?
>>>>>>
>>>>>> 2011/12/11 王鋒 <wfeng1982@163.com>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>     I'm an engineer at sina.com. We have used Hive and hiveserver for several months. We have our own task scheduling system, which can schedule tasks to run against hiveserver over JDBC.
>>>>>>>
>>>>>>>     But hiveserver's memory usage is very large, usually more than 10 GB. We have 5-minute tasks that run every 5 minutes, as well as hourly tasks; the total number of tasks is 40. We start 3 hiveservers on one Linux server and connect to them in a round-robin cycle.
>>>>>>>
>>>>>>>     So why is hiveserver using so much memory, and what should we do? Any suggestions?
>>>>>>>
>>>>>>> Thanks and Best Regards!
>>>>>>>
>>>>>>> Royce Wang
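The GC lines in the hive log quoted above can be decoded mechanically. A minimal sketch (Python; the function name is mine) that parses the `PSYoungGen` records produced by `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` and shows how much live data survives in the heap after each young collection:

```python
import re

# Matches lines like:
# 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 9102425K->7176256K(9867648K), 0.0765670 secs]
GC_LINE = re.compile(
    r"(?P<ts>[\d.]+): \[GC \[PSYoungGen: "
    r"(?P<y_before>\d+)K->(?P<y_after>\d+)K\((?P<y_cap>\d+)K\)\] "
    r"(?P<h_before>\d+)K->(?P<h_after>\d+)K\((?P<h_cap>\d+)K\), "
    r"(?P<secs>[\d.]+) secs\]"
)

def parse_gc_line(line):
    """Return a dict of heap figures in MB, or None if the line is not a GC record."""
    m = GC_LINE.search(line)
    if not m:
        return None
    d = {k: float(v) for k, v in m.groupdict().items()}
    # Convert the kilobyte fields to megabytes for readability.
    for k in ("y_before", "y_after", "y_cap", "h_before", "h_after", "h_cap"):
        d[k] /= 1024.0
    return d

line = ("8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] "
        "9102425K->7176256K(9867648K), 0.0765670 secs]")
rec = parse_gc_line(line)
# Whole heap minus young gen after the collection = data sitting in the old gen.
print("old-gen live data ~= %.0f MB" % (rec["h_after"] - rec["y_after"]))
```

Run on the log lines from the thread, this shows roughly 7 GB surviving every young collection, i.e. the bulk of the 10 GB footprint is long-lived data in the old generation rather than young-gen churn.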

 

 

 

At the top of the thread, Alex found a post,

https://forums.oracle.com/forums/thread.jspa?threadID=2309872, which says JDK 1.6.0_26 carries a leak risk, and that happens to be exactly the version we are running. Reading the discussion at that URL, nobody there could confirm it either, and Oracle naturally said it was not responsible. One reply reads:

I tried with java6u29 and java7 and they work great. Actually on the production server we are running for almost 4 days with java7 and it's stable, no crash, no slowdown, no restart in this period, and with less maximum memory. If it's going to last for a week then I trust it will go on fine.

In the end, people reported that java6u29 and java7 ran stably, especially java7. Tomorrow I'll try switching the hiveserver machine to java7.
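Switching the hiveserver machine to another JDK only requires pointing the environment at the new install before restarting the server. A sketch; the install path is an assumption, adjust it to wherever JDK 7 actually lives:

```shell
# Point hiveserver's environment at the JDK 7 install (path is an assumption).
export JAVA_HOME=/usr/java/jdk1.7.0
export PATH="$JAVA_HOME/bin:$PATH"

# Confirm which JDK will be picked up before restarting hiveserver.
java -version
```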
 
        

Update:

 

Today I tested with JDK 7 and the behavior was basically the same, so it seems the problem is not a JVM issue after all.

 

Using jmap -heap, I found that hiveserver's young generation was not sized according to the NewRatio setting: its maximum capacity was still the default ~800 MB, which is far too small for data-analysis workloads. I configured the young generation explicitly with -Xmn, set a maximum young-generation size, and switched the collector to CMS. Memory usage is now stable at around 2.3 GB.
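The mismatch is easy to sanity-check with arithmetic: with the earlier flags (-Xms15000m, -XX:NewRatio=1) the young generation should have come out near half the heap, yet jmap -heap showed the ~800 MB default cap, so pinning the size with -Xmn removes the ambiguity. A small sketch (the helper name is mine; the flag values come from this post):

```python
def expected_young_gen_mb(heap_mb, new_ratio=None, xmn_mb=None):
    """Young-generation size a given flag combination implies.

    -Xmn / -XX:MaxNewSize pin the size explicitly; otherwise
    -XX:NewRatio=N splits the heap as old:young = N:1.
    """
    if xmn_mb is not None:
        return float(xmn_mb)
    if new_ratio is not None:
        return heap_mb / (new_ratio + 1.0)
    return None

# Original flags: a 15000 MB heap with NewRatio=1 implies a ~7500 MB young gen,
# but jmap -heap reported the ~800 MB default, so the ratio was not honored.
print(expected_young_gen_mb(15000, new_ratio=1))  # 7500.0

# Final flags: -Xmn4000m pins the young generation explicitly.
print(expected_young_gen_mb(15000, xmn_mb=4000))  # 4000.0
```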

The final parameters:

 

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

