1. Cloudera Management Service: none of the services can start
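Observed symptoms:
(1) The CM service components fail to start; the start request times out and is aborted.
(2) Host information cannot be retrieved either, and the UI keeps reporting "unable to contact the server".
(3) The cm-server log reports "Authentication failure for user: '__cloudera_internal_user__mgmt-EVENTSERVER-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82", i.e. the component fails authentication.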
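Suspected causes:
(1) The connection between scm-agent and scm-server is broken.
(2) The connection to the MySQL database is broken, or user authentication against it fails.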
cloudera-scm-server log:
2019-01-29 08:44:10,188 INFO 780911426@scm-web-776:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-EVENTSERVER-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
2019-01-29 08:44:10,194 INFO 416547936@scm-web-773:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-HOSTMONITOR-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
2019-01-29 08:44:11,181 INFO 416547936@scm-web-773:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-SERVICEMONITOR-95d257fb4b0322939118ac4012bb8d4e' from 10.21.48.82
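When the server log is long, it can help to summarize which management roles are failing authentication and from which address. The following is a minimal sketch, assuming the log lines keep the format shown in the excerpt above; the function name count_auth_failures and the regular expression are illustrative and not part of Cloudera Manager.

```python
import re
from collections import Counter

# Pattern for the internal management-role user seen in the log excerpt, e.g.
# '__cloudera_internal_user__mgmt-EVENTSERVER-<hex id>' from 10.21.48.82
# (assumption: role names are upper-case letters and the id is lower-case hex).
AUTH_FAILURE = re.compile(
    r"Authentication failure for user: "
    r"'__cloudera_internal_user__mgmt-(?P<role>[A-Z]+)-[0-9a-f]+' "
    r"from (?P<ip>\S+)"
)


def count_auth_failures(log_text):
    """Count authentication failures per (role, source IP) pair."""
    return Counter(
        (m.group("role"), m.group("ip"))
        for m in AUTH_FAILURE.finditer(log_text)
    )
```

Feeding the server log text to count_auth_failures returns a Counter keyed by (role, IP). If every management role fails from the same host, that points at a shared cause rather than a problem with one role.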
cloudera-scm-agent log:
[02/Jan/2019 16:20:21 +0000] 28617 MainThread agent ERROR Heartbeating to 10.21.48.82:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.0-py2.6.egg/cmf/agent.py", line 1419, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 742, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
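"Connection refused" on the heartbeat means the TCP connection from the agent host to the server's address and port never gets established. A quick way to confirm this independently of the agent is to attempt the same connection by hand. This is a minimal sketch using only the Python standard library; the helper name can_connect and the 3-second default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from the agent host, can_connect("10.21.48.82", 7182) checks the address and port from the log above. A False result suggests the agent is pointed at the wrong host or port, or that nothing is listening there.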
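The problem was eventually traced to the scm-agent's connection settings for scm-server. They had been changed earlier, so scm-agent could never reach scm-server. The fix is to correct the agent's connection settings, and both server_host and server_port need to be checked (an earlier attempt that corrected only server_host still did not restore the connection).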
Edit the configuration file on the scm-agent host, /etc/cloudera-scm-agent/config.ini:
[General]
# Hostname of the CM server.
server_host=10.21.48.82
# Port that the CM server is listening on.
server_port=7182
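Since both values have to be right, it can be useful to read them back from the file instead of trusting a visual check. The sketch below parses ini text with Python's standard configparser and returns the two settings from the [General] section; the function name read_server_endpoint is hypothetical, and it assumes the file is ordinary ini syntax as in the fragment above.

```python
import configparser


def read_server_endpoint(config_text):
    """Return (server_host, server_port) from the [General] section."""
    # interpolation=None so that any '%' in other values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    host = parser.get("General", "server_host")
    port = parser.getint("General", "server_port")  # raises ValueError if not an integer
    return host, port
```

On the agent host, passing the contents of /etc/cloudera-scm-agent/config.ini to read_server_endpoint shows the exact host and port the agent will use, which can then be compared with the address the CM server actually listens on.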
After this change the problem was resolved and the CM service started normally.
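Tips: When tracking down a problem, think at the level of the whole system architecture. Understand how the system runs as a whole and work out at which stage the problem could arise, rather than narrowing in on one component too early. Above all, learn to read the logs.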