cloudstack下libvirtd服務無響應問題


      在cloudstack4.5.2版本下,偶爾出現libvirtd服務無響應的情況,導致virsh命令無法使用,同時伴隨cloudstack master丟失該slave主機連接的情況。懷疑是libvirtd服務或版本的問題,但是在官網上並沒有找到類似的bug提交,該問題可能還存在於更高的版本,需要時間進一步從根本上分析。下面是該問題的處理過程,在此記錄下,關注和使用cloudstack的朋友可以參考。

眾所周知,cloudstack的社區熱度遠不如openstack,為什么還要選擇clcoudstack?這個問題以后有機會再和大家聊。言歸正傳。

環境交代

宿主機操作系統:centos6.5x64(2.6.32-431.el6.x86_64)
cloudstack版本:4.5.2
libvirt版本:libvirt-0.10.2-54.el6_7.2.x86_64

問題描述

通過cloudstackapi listHosts報警信息顯示:
node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
#有關cloudstackapi的使用方法在其它文章中總結,不在此處說明。

登陸問題宿主服務器檢查:
[root@node5 log]#virsh list --all
沒有響應ctrl^c退出
這時的vm可以正常工作,但處於失控狀態

嘗試重啟啟動libvirtd服務:
[root@node5 log]# service libvirtd stop
正在關閉 libvirtd 守護進程:                               [失敗]  #無法關閉libvirtd服務

嘗試重啟啟動cloudstack-agent服務:
[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
libvirtd故障依舊

簡單維護

[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd:錯誤:Unable to initialize network sockets。查看 /var/log/messages 或者運行不帶 --daemon 的命令查看更多信息。

[root@node5 log]# libvirtd -d
可以執行成功,這時執行virsh list --all 可以查看和操作vm

[root@node5 log]#virsh list --all
Id    名稱                         狀態
----------------------------------------------------
 2     i-4-185-VM                     running

雖然vm運行正常,現在也可以通過命令正常管理了。但是對於cloudstack平台而言,宿主機處於down狀態,vm處於失控狀態。

臨時解決辦法是在其它大的升級和維護過程中重啟服務器解決,根本解決還要具體問題具體分析。

分析與排查

檢查進程

[root@node5 log]# ps ax |grep libvirtd
  6485 ?        R    863:37 libvirtd --daemon -l  #該服務始終處於run狀態

[root@node5 log]# top -p 6485
top -p 6485
top - 09:19:41 up 12 days, 22:27,  1 user,  load average: 3.05, 5.07, 6.64
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.8%us,  1.4%sy,  0.0%ni, 93.1%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  264420148k total, 182040780k used, 82379368k free,   834232k buffers
Swap:  8388600k total,       92k used,  8388508k free, 100453708k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  6485 root      20   0  984m  12m 4440 R  100.2  0.0 844:22.68 libvirtd        #cpu占用100%,無法釋放,影響系統穩定性


殺進程

[root@node5 log]# kill -9 6485
[root@node5 log]# kill -9 6485
[root@master log]# ps ax |grep libvirtd  #檢查進程依然存在
  6485 ?        R    863:37 libvirtd --daemon -l
[root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd:錯誤:Unable to initialize network sockets。查看 /var/log/messages 或者運行不帶 --daemon 的命令查看更多信息。
[root@node5 ~]# netstat -antp |grep 16509
tcp        0      0 0.0.0.0:16509               0.0.0.0:*                   LISTEN      3658/libvirtd       
tcp        1      0 192.168.14.25:16509         192.168.14.22:8717          CLOSE_WAIT  -                   
tcp        1      0 192.168.14.25:16509         192.168.14.20:5152          CLOSE_WAIT  -                   
tcp        1      0 192.168.14.25:16509         192.168.14.10:39359         CLOSE_WAIT  -                   
tcp        0      0 :::16509                    :::*                        LISTEN      3658/libvirtd       
tcp       39      0 ::1:16509                   ::1:19715                   CLOSE_WAIT  -  

經過上述操作,初步判斷libvirtd陷入了hang死狀態。

追蹤進程

[root@node5 log]#strace -f libvirtd      #后來發現這樣strace是不對的,正確的方法是strace -f -p 3658 針對pid進行strace
[pid 107570] close(23058)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23059)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23060)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23061)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23062)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23063)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23064)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23065)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23066)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23067)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23068)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23069)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23070)               = -1 EBADF (Bad file descriptor)
[pid 107570] close(23071)               = -1 EBADF (Bad file descriptor)
^C[pid 107570] close(23072 <unfinished ...>
Process 107559 detached
Process 107560 detached
Process 107561 detached
Process 107562 detached
Process 107563 detached
Process 107564 detached
Process 107565 detached
Process 107566 detached
Process 107567 detached
Process 107568 detached
Process 107569 detached
Process 107570 detached

父進程6485在不斷的產生和關閉子進程,並返回錯誤信息。Bad file descriptor的原因(如何觸發的,誰觸發的)? 循環為何無法退出?問題如何再現?

獲得更多的線索
官方文檔(libvirtd各種故障診斷記錄和解決辦法非常詳盡)
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start

開啟系統日志
Change libvirt's logging in /etc/libvirt/libvirtd.conf by enabling the line below. To enable the setting the line, open the /etc/libvirt/libvirtd.conf file in a text editor, remove the hash (or #) symbol from the beginning of the following line, and save the change:
log_outputs="3:syslog:libvirtd"

參照配置,重啟服務器等待下次故障觀察日志

......

Jun  1 12:42:26 node5 abrtd: New client connected
Jun  1 12:42:26 node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
Jun  1 12:42:26 node5 abrt-server[70066]: Saved Python crash dump of pid 70065 to /var/spool/abrt/pyhook-2016-06-01-12:42:26-70065
Jun  1 12:42:26 node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
Jun  1 12:42:26 node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with 1
Jun  1 12:42:26 node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
Jun  1 12:43:26 node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
......

Jun  6 10:36:21 node5 libvirtd: 102840: warning : qemuDomainObjBeginJobInternal:878 : Cannot start job (modify, none) for domain i-4-30-VM; current job is (modify, none) owned by (102925, 0)
Jun  6 10:36:21 node5 libvirtd: 102840: error : qemuDomainObjBeginJobInternal:883 : Timed out during operation: cannot acquire state change lock
Jun  6 10:39:59 node5 libvirtd: 114071: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun  6 10:39:59 node5 libvirtd: 114071: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun  6 10:40:46 node5 libvirtd: 114147: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun  6 10:40:46 node5 libvirtd: 114147: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun  6 10:42:15 node5 libvirtd: 114204: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun  6 10:42:15 node5 libvirtd: 114204: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun  6 10:47:05 node5 libvirtd: 114375: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun  6 10:47:05 node5 libvirtd: 114375: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun  6 10:47:23 node5 libvirtd: 114412: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun  6 10:47:23 node5 libvirtd: 114412: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
......

Jun 12 03:08:02 node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3111" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jun 12 09:20:40 node5 libvirtd: 72575: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 12 09:20:40 node5 libvirtd: 72575: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': 資源暫時不可用

並未獲得致命錯誤和更多線索。(該日志配置選項還是很有必要打開的,很多問題都可以通過它來定位)

解決過程

解決思路

  • 嘗試和找到終止進程、重啟服務的方法
  • 提交bug,等待補丁升級
  • 分析源代碼,再現問題,解決問題(投入研發和時間)

由於不能再現問題,還是從簡入繁吧。觸發這些子進程的元凶是誰?還是cloudstack-agent的嫌疑最大,但之前重啟過該服務並沒有解決問題,那么agent服務是怎么一回事呢?

看下啟動腳本可以基本了解,

[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent

#!/bin/bash

# chkconfig: 35 99 10
# description: Cloud Agent

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well

. /etc/rc.d/init.d/functions

# set environment variables

SHORTNAME=$(basename $0 | sed -e 's/^[SK][0-9][0-9]//')
PIDFILE=/var/run/"$SHORTNAME".pid
LOCKFILE=/var/lock/subsys/"$SHORTNAME"
LOGDIR=/var/log/cloudstack/agent
LOGFILE=${LOGDIR}/agent.log
PROGNAME="Cloud Agent"
CLASS="com.cloud.agent.AgentShell"
JSVC=`which jsvc 2>/dev/null`;

# exit if we don't find jsvc
if [ -z "$JSVC" ]; then
    echo no jsvc found in path;
    exit 1;
fi

unset OPTIONS
[ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"

for jdir in $JDK_DIRS; do
    if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
        JAVA_HOME="$jdir"
    fi
done
export JAVA_HOME

ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
PCP=`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`

# We need to append the JSVC daemon JAR to the classpath
# AgentShell implements the JSVC daemon methods
export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$PCP:/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"

start() {
    echo -n $"Starting $PROGNAME: "
    if hostname --fqdn >/dev/null 2>&1 ; then
        $JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
            -errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
        RETVAL=$?
        echo
    else
        failure
        echo
        echo The host name does not resolve properly to an IP address.  Cannot start "$PROGNAME". > /dev/stderr
        RETVAL=9
    fi
    [ $RETVAL = 0 ] && touch ${LOCKFILE}
    return $RETVAL
}

stop() {
    echo -n $"Stopping $PROGNAME: "
    $JSVC -pidfile "$PIDFILE" -stop $CLASS
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status -p ${PIDFILE} $SHORTNAME
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
    echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
    RETVAL=3
esac

exit $RETVAL
View Code

[root@node5 libvirt]# ps ax |grep jsvc.exec

[root@node5 libvirt]# ps ax |grep jsvc.exec
 6655 ?        Ss     0:00 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
 6657 ?        Sl     0:05 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
View Code

重啟服務

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid  6657) 正在運行...
[root@node5 bin]# service cloudstack-agent stop
Stopping Cloud Agent:

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid  6657) 正在運行..

ps ax |grep jsvc.exec 也驗證了進程依然存在

眼前一亮的同時,也發現了之前使用restart帶來的問題,stop不成功的問題被掩蓋了~~~有沒有懊惱? 不過來不及反思,接下來的問題還遠不是這么簡單......

[root@node5 bin]# kill -9 6655 6657
[root@node5 bin]# kill -9 6655 6657
-bash: kill: (6655) - 沒有那個進程
-bash: kill: (6657) - 沒有那個進程
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent 已死,但 pid 文件仍存
[root@node5 bin]# rm /var/run/cloudstack-agent.pid
rm:是否刪除普通文件 "/var/run/cloudstack-agent.pid"?y
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent 已死,但是 subsys 被鎖
[root@node5 bin]# service cloudstack-agent start
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid  109382) 正在運行...
[root@node5 bin]# netstat -antp |grep 8250
tcp        0      0 192.168.14.20:22220         192.168.14.10:8250          ESTABLISHED 109382/jsvc.exec 

處理后狀態恢復正常,但是libvirtd仍然無法殺掉, 很快netstat -antp |grep 8250 狀態再次消失,cloudstack master平台監控主機記錄由Up狀態轉為disconnect狀態。不過畢竟不是down狀態,較之前已經有了進步。

啟動一個libvirtd -d看下,

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
    6485 ?        R    863:37 libvirtd --daemon -l
  130057 ?        Sl     0:38 libvirtd -d
  28904 pts/0     S+     0:00 grep libvirtd

然后在cloudstack master平台上手工點擊強制重新連接該主機,成功了。主機監控狀態由disconnect轉為Up,這時再次嘗試殺掉6485仍然是不成功的,於是又在cloudstack master管理平台上嘗試着點擊操作了一下暫停vm命令,vm成功暫停。再返回服務器上觀察原來hung死的libvirtd進程已經消失。

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
  130057 ?        Sl     0:38 libvirtd -d
  28904 pts/0     S+     0:00 grep libvirtd

至此既恢復了平台對該主機的管控,也終止了libvirtd異常進程。問題初步歸於cloudstack-agent在處理發送個libvirtd的信號上存在些小問題。以后再單獨分析下jsvc進程,再現問題和根本解決。

 

問題反思

在處理服務異常的問題上,命令行參數不要用restart,用stop和kill來調試。說起來都是淚!

 

相關鏈接

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start

 

-本文完-

2016-06-13 15:20:22


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM