Problem 1:
Error description:
/opt/cm-5.7.0/etc/init.d/cloudera-scm-agent status
cloudera-scm-agent dead but pid file exists
The log /opt/cm-5.7.0/log/cloudera-scm-agent/cloudera-scm-agent.log shows:
No socket could be created on ('testintf.novalocal', 9000) -- [Errno 99] Cannot assign requested address
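This is mainly a network configuration problem. The agent tries to bind to the address that its hostname (testintf.novalocal) resolves to, and Errno 99 means that address is not assigned to any local network interface.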
1. Run
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
(Python 2 syntax) to get the hostname and IP that the system resolves from /etc/hosts.
A correct /etc/hosts looks like this:
127.0.0.1 localhost.xxxx localhost
111.222.333.444 aa.aa aa
555.666.777.888 bb.bb bb
On host aa, the command above should then print: aa.aa 111.222.333.444
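This IP must be the same as the one shown by ifconfig (if the host has both a public and a private IP, use the private one), and the hostname must be the same as the output of the hostname command.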
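2. Servers in the same private network can reach each other, but not through their public IPs. A CDH cluster communicates over many ports, so when setting server_host in the agent's config.ini, use the private IP of the Cloudera Manager Server.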
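The two checks above can be scripted. The sketch below is a hypothetical Python 3 helper, not part of Cloudera Manager: check_local_host compares socket.getfqdn() with socket.gethostname() (the same name the hostname command prints) and tries to bind to the IP the FQDN resolves to, where a failure with EADDRNOTAVAIL is the Errno 99 from the agent log; server_host_is_private checks whether server_host in a config.ini resolves to a private address. The [General] section name and all function names are assumptions for illustration.

```python
import configparser
import errno
import ipaddress
import socket
from typing import List


def can_bind(ip: str) -> bool:
    """Return True if a TCP socket can bind to `ip` on this host.

    Port 0 lets the kernel pick any free port, so only the address is tested.
    EADDRNOTAVAIL (Errno 99 on Linux) means `ip` is not assigned to any local
    interface, which matches the error in the agent log.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True


def server_host_is_private(config_text: str) -> bool:
    """Return True if server_host in the (assumed) [General] section is a private IP."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    host = parser.get("General", "server_host")
    # gethostbyname returns a literal IPv4 address unchanged, or resolves a name.
    return ipaddress.ip_address(socket.gethostbyname(host)).is_private


def check_local_host() -> List[str]:
    """Return the hostname/IP problems found on this host (empty list if none)."""
    problems = []
    fqdn = socket.getfqdn()
    hostname = socket.gethostname()  # same value the `hostname` command prints
    if fqdn != hostname:
        problems.append(f"getfqdn() {fqdn!r} differs from hostname {hostname!r}")
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError as e:  # gaierror is a subclass of OSError
        problems.append(f"cannot resolve {fqdn!r}: {e}")
        return problems
    if not can_bind(ip):
        problems.append(f"{fqdn} resolves to {ip}, which is not a local address")
    return problems
```

On each agent host, print(check_local_host()) lists whatever is still wrong; an empty list means the FQDN matches hostname and resolves to a local address.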
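Problem 2: During installation from the CM web UI, the agent service does not start and the host is not managed (the agent in this case was installed through the UI). The UI installation shows the following error: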
Solution:
1. The hostname printed by
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
differs from the output of
hostname
Make the two match (see Problem 1).
2. Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
telnet 112.35.23.45 7182
ps -ef |grep PID?
3. Ensure that ports 9000 and 9001 are free on the host being added.
netstat -tlnp | grep 9000
netstat -tlnp | grep 9001
4. Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details)
This directory is only created after the agent service has started; if the agent fails to start, it does not exist.
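The port checks in items 2 and 3 can also be done from Python when telnet or netstat is not installed. A minimal Python 3 sketch: port_reachable opens a TCP connection the way telnet does, and port_free tries to bind the port, which fails while another process is listening on it. The function names and the preflight helper are hypothetical.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (what telnet tests)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or unresolvable
        return False


def port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if no local process is listening on TCP `port`.

    SO_REUSEADDR lets the bind succeed past leftover TIME_WAIT connections,
    but on Linux it still fails while another socket is listening on the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def preflight(cm_server: str) -> dict:
    """Run the port checks from the CM installer hints for one agent host."""
    return {
        "7182 reachable on CM server": port_reachable(cm_server, 7182),
        "9000 free locally": port_free(9000),
        "9001 free locally": port_free(9001),
    }
```

For example, preflight("112.35.23.45") run on the host being added reports all three conditions at once.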
Problem 3: the agent exits with the following error in its log:
[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2161, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2183, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
verbose=self.__verbose
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/xmlrpc.py", line 470, in request
'' )
ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit
Solution:
Kill the supervisord process and restart the agent; it may take a few attempts before the agent stays up.
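The kill, restart and retry workaround can be wrapped in a loop. This is a hypothetical Python 3 sketch for Linux: it finds supervisord by scanning /proc/<pid>/cmdline, sends SIGTERM, restarts the agent through its init script and then checks status. It assumes the init script accepts restart and status, and that status exits 0 only when the agent is running (the LSB convention; the "dead but pid file exists" message in Problem 1 is the non-zero case). Run it as root, and check which PIDs find_pids("supervisord") matches before using it on a real host.

```python
import os
import signal
import subprocess
import time
from typing import List


def find_pids(pattern: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command line contains `pattern` (Linux /proc only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue  # skip non-process entries and this script itself
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def kill_matching(pattern: str, sig: int = signal.SIGTERM) -> List[int]:
    """Send `sig` to every process whose command line contains `pattern`."""
    killed = []
    for pid in find_pids(pattern):
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return killed


def restart_agent(init_script: str, attempts: int = 3, delay: float = 5.0) -> bool:
    """Kill supervisord and restart the agent until `status` reports it running."""
    for attempt in range(1, attempts + 1):
        killed = kill_matching("supervisord")
        print(f"attempt {attempt}: sent SIGTERM to supervisord pids {killed}")
        time.sleep(delay)  # give supervisord time to exit
        subprocess.run([init_script, "restart"], check=False)
        time.sleep(delay)  # the agent may exit shortly after starting
        if subprocess.run([init_script, "status"], check=False).returncode == 0:
            return True
    return False
```

Usage, assuming the tarball layout from Problem 1 (adjust the version directory to the installed one, e.g. cm-5.7.1 as in the log above): restart_agent("/opt/cm-5.7.0/etc/init.d/cloudera-scm-agent").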