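Our production Hadoop cluster was recently upgraded from MRv1 to MRv2, so the monitoring templates had to change as well. The cluster has about 200 machines, and monitoring modules are added by linking: one template links a large number of module templates, and hosts are then added to that template.
That works out to roughly 215 items per machine.
To add NodeManager (NM) monitoring, I again tried to link a template, but linking through the web page kept returning a blank page.
To get it online quickly, I changed approach and used the host.update API to link the hosts directly to the NM template.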
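The host.update workaround can be sketched as a JSON-RPC payload like the one below. This is a minimal illustration in Python 3, not the script that was actually used: the function name build_host_update and the sample IDs and token are hypothetical. As far as I know, the templates parameter of host.update replaces the host's currently linked templates, so the list should include the templates that are already linked; check the API reference for your Zabbix version.

```python
import json


def build_host_update(hostid, template_ids, auth, request_id=1):
    """Build a host.update request that links the given templates to one host.

    Assumption: `templates` replaces the host's current template list,
    so `template_ids` must contain the existing templates plus the new one.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.update",
        "params": {
            "hostid": hostid,
            "templates": [{"templateid": t} for t in template_ids],
        },
        "auth": auth,
        "id": request_id,
    }


# Hypothetical IDs and token, for illustration only.
payload = build_host_update("10084", ["10464", "10500"], "example-auth-token")
print(json.dumps(payload, indent=2))
```

Because each call touches a single host, the PHP side has far less to load per request than one template.update that fans out to every host using the template.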
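Looking back at the problem:
Linking a template through the web page also goes through the Zabbix template API (specifically, the template.update method).
So I called the API directly from a script to test it.
Test script: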
#!/usr/bin/env python
# Python 2 script (uses urllib2 and print statements).
import urllib2
import sys
import json


def requestJason(url, values):
    # Send one JSON-RPC request and return the decoded response.
    data = json.dumps(values)
    print data
    req = urllib2.Request(url, data, {'Content-Type': 'application/json-rpc'})
    response = urllib2.urlopen(req)
    data_get = response.read()
    output = json.loads(data_get)
    print output
    if 'result' not in output:
        # The API reported an error: show its detail and stop.
        print json.dumps(output['error']['data'])
        sys.exit(1)
    print json.dumps(output['result'])
    return output


def authenticate(url, username, password):
    # Log in and return the auth token used by later calls.
    values = {'jsonrpc': '2.0',
              'method': 'user.login',
              'params': {
                  'user': username,
                  'password': password
              },
              'id': '0'
              }
    idvalue = requestJason(url, values)
    return idvalue['result']


def getTemplate(hostname, url, auth):
    # Look up a template by name and return its ID.
    values = {'jsonrpc': '2.0',
              'method': 'template.get',
              'params': {
                  'output': "extend",
                  'filter': {
                      'host': hostname
                  }
              },
              'auth': auth,
              'id': '2'
              }
    output = requestJason(url, values)
    print output['result'][0]['hostid']
    return output['result'][0]['hostid']


def changeTemplate(idx, id_list, url, auth):
    # Link the templates in id_list to the template with ID idx.
    values = {'jsonrpc': '2.0',
              'method': 'template.update',
              'params': {
                  "templateid": idx,
                  "templates": id_list
              },
              'auth': auth,
              'id': '2'
              }
    output = requestJason(url, values)
    print output


def main():
    id_list = []
    hostname = "Vipshop_Template_OS_Linux_Hadoop_Datanode_Pro"
    url = 'xxxx'
    username = 'admin'
    password = 'xxxx'
    auth = authenticate(url, username, password)
    idx = getTemplate(hostname, url, auth)
    temlist = ['Vipshop_Template_LB_Tengine_8090', 'Vipshop_Template_Redis_6379', 'Vipshop_Template_Redis_6380', 'Vipshop_Template_Redis_6381', 'Vipshop_Template_Redis_6382', 'Vipshop_Template_Redis_6383']
    for tem in temlist:
        idtemp = getTemplate(tem, url, auth)
        id_list.append({"templateid": idtemp})
    print id_list
    #id_list = [{"templateid":'10843'},{"templateid":"10554"},{"templateid":"10467"},{"templateid":"10560"},{"templateid":"10566"},{"templateid":"10105"}]
    changeTemplate(idx, id_list, url, auth)


if __name__ == '__main__':
    main()
Script result:
urllib2.HTTPError: HTTP Error 500: Internal Server Error
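An uncaught HTTPError like this hides the status code and any response body behind a traceback. The sketch below shows one way to surface both. It is written for Python 3 (urllib.request rather than urllib2), and the function name post_json_rpc and its opener parameter are my own illustration, not part of the original script; opener exists only so the function can be exercised without a live server.

```python
import json
import urllib.error
import urllib.request


def post_json_rpc(url, payload, timeout=30, opener=urllib.request.urlopen):
    """POST a JSON-RPC payload and return (status, body_text).

    HTTP error responses such as 500 or 502 are returned instead of raised,
    so the caller can log the status and whatever body the server sent.
    """
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data, {"Content-Type": "application/json-rpc"}
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.getcode(), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it has a code and a body.
        return err.code, err.read().decode("utf-8", "replace")
```

A caller could then treat a status of 500 as a hint to check the PHP error log and 502 as a hint to check the php-fpm and web server timeouts, which is the path this investigation ended up following.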
Since an API call is really just a POST request with a JSON body, I verified it manually with curl:
curl -vvv -i -X POST -H 'Content-Type:application/json' \
  -d '{"params": {"templates": [{"templateid": "10117"}, {"templateid": "10132"}, {"templateid": "10133"}, {"templateid": "10134"}, {"templateid": "10135"}, {"templateid": "10136"}], "templateid": "10464"}, "jsonrpc": "2.0", "method": "template.update", "auth": "421a04b400e859834357b5681a586a5f", "id": "2"}' \
  http://zabbix.idc.vipshop.com/api_jsonrpc.php
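This also returned a 500 error, meaning the PHP backend hit an error while processing the request. I adjusted the PHP configuration and set the log level to debug: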
php-fpm.conf:
log_level = debug
The error log then showed the following:
[04-May-2014 14:04:32.115189] WARNING: pid 6270, fpm_request_check_timed_out(), line 271: [pool www] child 6294, script '/apps/svr/zabbix/wwwroot/api_jsonrpc.php' (request: "POST /api_jsonrpc.php") executing too slow (1.269946 sec), logging
[04-May-2014 14:04:32.115327] DEBUG: pid 6270, fpm_got_signal(), line 72: received SIGCHLD
[04-May-2014 14:04:32.115371] NOTICE: pid 6270, fpm_children_bury(), line 227: child 6294 stopped for tracing
[04-May-2014 14:04:32.115385] NOTICE: pid 6270, fpm_php_trace(), line 142: about to trace 6294
[04-May-2014 14:04:32.115835] NOTICE: pid 6270, fpm_php_trace(), line 170: finished trace of 6294
[04-May-2014 14:04:32.115874] DEBUG: pid 6270, fpm_event_loop(), line 409: event module triggered 1 events
[04-May-2014 14:04:35.318614] WARNING: pid 6270, fpm_stdio_child_said(), line 166: [pool www] child 6294 said into stderr: "NOTICE: sapi_cgi_log_message(), line 663: PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 512 bytes) in /apps/svr/zabbix/wwwroot/api/classes/CItem.php on line 1088"
[04-May-2014 14:04:35.318665] DEBUG: pid 6270, fpm_event_loop(), line 409: event module triggered 1 events
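With debug logging on, the fatal error is easy to miss among the DEBUG and NOTICE lines. A small filter such as the following can pull it out. This is only a sketch in Python 3: the function name find_memory_exhaustion is hypothetical, and the regular expression assumes the standard wording of PHP's memory-exhausted message as it appears in the log above.

```python
import re

# Matches the text of PHP's "memory exhausted" fatal error.
MEMORY_RE = re.compile(
    r"Allowed memory size of (?P<limit>\d+) bytes exhausted "
    r"\(tried to allocate (?P<wanted>\d+) bytes\) "
    r"in (?P<file>\S+) on line (?P<line>\d+)"
)


def find_memory_exhaustion(log_lines):
    """Yield (limit_in_mb, php_file, line_number) for each matching log line."""
    for text in log_lines:
        m = MEMORY_RE.search(text)
        if m:
            limit_mb = int(m.group("limit")) / (1024.0 * 1024.0)
            yield limit_mb, m.group("file"), int(m.group("line"))


sample = [
    'PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes '
    'exhausted (tried to allocate 512 bytes) in '
    '/apps/svr/zabbix/wwwroot/api/classes/CItem.php on line 1088"',
]
for limit_mb, path, lineno in find_memory_exhaustion(sample):
    print("limit=%.0fM at %s:%d" % (limit_mb, path, lineno))
```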
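In other words, linking templates requires the related data to be held in PHP memory, and the default memory_limit is 128M (the 134217728 bytes in the log above). With many items and hosts, that limit is easily exceeded.
I changed it to: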
memory_limit = 1280M
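The log reports the limit in bytes, while the configuration uses PHP's shorthand notation, so it helps to convert between the two when checking that a new setting has actually taken effect. The helper below is a minimal sketch in Python 3 with a hypothetical name, php_size_to_bytes; it handles only the K, M and G suffixes, which PHP treats as powers of 1024.

```python
def php_size_to_bytes(value):
    """Convert a PHP shorthand size such as '128M' to a byte count."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No suffix: the value is already in bytes.
    return int(value)


print(php_size_to_bytes("128M"))   # 134217728, the figure in the fatal error
print(php_size_to_bytes("1280M"))  # 1342177280, the new limit
```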
Testing again returned a 502 Bad Gateway error, which means the backend timed out while executing.
Error log:
[04-May-2014 14:50:21.318071] WARNING: pid 4131, fpm_request_check_timed_out(), line 281: [pool www] child 4147, script '/apps/svr/zabbix/wwwroot/api_jsonrpc.php' (request: "POST /api_jsonrpc.php") execution timed out (10.030883 sec), terminating
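The execution time exceeded the request_terminate_timeout setting, so php-fpm killed the child process and the web server returned 502.
I changed request_terminate_timeout to 1800 (it was set to 10s here; php-fpm's own default is 0, meaning disabled) and max_execution_time to 0 (the default is 30s), then tested again. This time the call succeeded.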
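To confirm from the log which requests were being terminated and after how long, the "execution timed out" warnings can be extracted in the same way. Again this is a sketch in Python 3 with a hypothetical function name, parse_fpm_timeout, and the pattern assumes the php-fpm message format shown in the log line above.

```python
import re

# Matches php-fpm's warning for a request killed by request_terminate_timeout.
TIMEOUT_RE = re.compile(
    r"script '(?P<script>[^']+)'.*?"
    r"execution timed out \((?P<secs>[\d.]+) sec\)"
)


def parse_fpm_timeout(line):
    """Return (script_path, elapsed_seconds), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return m.group("script"), float(m.group("secs"))


line = (
    "[pool www] child 4147, script '/apps/svr/zabbix/wwwroot/api_jsonrpc.php' "
    '(request: "POST /api_jsonrpc.php") execution timed out (10.030883 sec), terminating'
)
print(parse_fpm_timeout(line))
```

An elapsed time just above the configured timeout, as in this case, suggests the request was cut off by php-fpm rather than failing on its own.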
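Summary
Zabbix differs from a typical online application: an update made through the API is a batch operation, so it places real demands on memory and execution time.
The relevant PHP parameters therefore need to be set sensibly. When debugging, lower the log level (for example to debug) and enable the slow log to make the problem easier to locate.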