Problem 1:
Error description:
/opt/cm-5.7.0/etc/init.d/cloudera-scm-agent status
cloudera-scm-agent dead but pid file exists
Check the log at /opt/cm-5.7.0/log/cloudera-scm-agent/cloudera-scm-agent.log:
No socket could be created on ('testintf.novalocal', 9000) -- [Errno 99] Cannot assign requested address
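This is mainly a network problem: Errno 99 means the address that the hostname resolves to is not assigned to any interface on this host.
1. Run the following command to get the hostname and IP as resolved through /etc/hosts:
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
A well-formed hosts file looks like this: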
127.0.0.1 localhost.xxxx localhost
111.222.333.444 aa.aa aa
555.666.777.888 bb.bb bb
With the hosts file above, the command should print: aa.aa 111.222.333.444
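This IP must be the same as the one reported by ifconfig (if the host has both a public and an internal IP, use the internal one).
The hostname must be the same as the name printed by the hostname command.
2. Servers on the same internal network can reach each other directly, but not through their public IPs. A CDH cluster communicates over a large number of ports, so set server_host in config.ini to the internal IP.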
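The checks in steps 1 and 2 can be sketched in Python 3 as follows. This is only a rough sketch and not part of Cloudera Manager; the helper names resolve_local, can_bind and is_private are made up for illustration. can_bind attempts the same kind of bind that fails with Errno 99 in the log above when the resolved address does not belong to a local interface.

```python
import ipaddress
import socket


def resolve_local():
    """Return (hostname, fqdn, ip) as this host resolves itself."""
    fqdn = socket.getfqdn()
    return socket.gethostname(), fqdn, socket.gethostbyname(fqdn)


def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to ip on this host.

    Port 0 lets the OS pick any free port, so a failure points at the
    address itself (for example Errno 99) and not at a busy port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


def is_private(ip):
    """Return True for non-public addresses (RFC 1918 ranges, loopback, ...)."""
    return ipaddress.ip_address(ip).is_private


if __name__ == "__main__":
    try:
        hostname, fqdn, ip = resolve_local()
    except OSError as exc:
        print("hostname does not resolve:", exc)
    else:
        print("hostname:", hostname)
        print("fqdn:    ", fqdn)
        print("ip:      ", ip)
        print("can bind to resolved ip:", can_bind(ip))
        print("resolved ip is private: ", is_private(ip))
```

If "can bind to resolved ip" prints False, the entry for this host in /etc/hosts most likely points at an address that ifconfig does not list.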
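Problem 2: When installing through the CM web UI, the agent service does not start and the host is not brought under management, so the agent then has to be installed through the UI. The UI installation shows the following error hints: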
Solution:
1. The failure is caused by the following two commands returning different hostnames; make them consistent:
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
hostname
2. Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
telnet 112.35.23.45 7182
ps -ef |grep PID?
3. Ensure that ports 9000 and 9001 are free on the host being added.
netstat -an | grep -w 9000
netstat -an | grep -w 9001
4. Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details)
This directory only exists after the agent service has started; if the agent failed to start, it will not be there.
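The port checks in steps 2 and 3 can also be scripted. The following is a minimal Python 3 sketch under a few assumptions: port_reachable and port_free are hypothetical helper names, and the Cloudera Manager Server address is passed as the first command-line argument. port_free is conservative: it reports a port as busy whenever a plain bind fails.

```python
import socket
import sys


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True


def port_free(port, ip="0.0.0.0"):
    """Return True if nothing prevents binding a TCP socket to ip:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # Step 3: ports the agent needs on the host being added.
    for port in (9000, 9001):
        print("port", port, "free on this host:", port_free(port))
    # Step 2: only checked when a server address is given as an argument.
    if len(sys.argv) > 1:
        print("7182 reachable on", sys.argv[1], ":",
              port_reachable(sys.argv[1], 7182))
```

Run it on the host being added, passing the server's internal IP as the argument. It does the same job as the telnet and netstat commands above.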
Problem 3:
[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2161, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2183, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
verbose=self.__verbose
File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/xmlrpc.py", line 470, in request
'' )
ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit
Solution:
Kill the leftover supervisord process and restart the agent. It may take a few attempts before the agent comes up.
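To find the leftover supervisord process before killing it, a small Python 3 sketch like the one below can be used in place of ps. It assumes a Linux host with /proc mounted; find_pids and its proc_root parameter are hypothetical names introduced here for illustration. It matches on the command line only, so confirm that a PID really belongs to the agent's supervisord before killing it.

```python
import os


def find_pids(pattern, proc_root="/proc"):
    """Return sorted PIDs whose command line contains pattern."""
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited meanwhile or is not readable
        # Arguments in cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace")
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_pids("supervisord"):
        if pid != os.getpid():  # skip this script itself
            print(pid)
```

After killing the printed PIDs, start the agent again with the cloudera-scm-agent init script and check its status.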