Checking Whether a Service Port Is Alive and Sending Alerts with Python


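I recently noticed that the port of a Socket service in our company's test environment kept going down for no apparent reason, even though the service itself was still running. It looked like the service had hung.

Even though this is only a test environment, it could not be left like that, so I wrote a simple monitoring script overnight. Because the server runs Windows, the first version relied on the wmi module. The logic is as follows: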

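1. Use the CMD command "net start" to get the services currently running on the system, and build a list of their names.

2. Check whether the monitored service is in that list. If it is not, the service has stopped, so try to start it and send an alert email.

3. Attempt a connect to the local Socket service port. If an exception is caught, try to restart the service and send an alert email.

4. On each run, the script repeats the steps above up to two times, 10 seconds apart, to make sure the service is in a normal state.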
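Steps 3 and 4 can be sketched on their own as follows. This is a minimal illustration rather than the script's actual code: the helper names `port_is_open` and `check_repeatedly` are hypothetical, and the injectable `sleep` parameter is an assumption added only to make the loop easy to try out without waiting.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection opens the socket; the with block closes it again.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_repeatedly(check, rounds=2, interval=10.0, sleep=time.sleep):
    """Call check() `rounds` times, pausing `interval` seconds between calls.

    Returns the list of boolean results so the caller can decide
    whether a restart or an alert is needed.
    """
    results = []
    for i in range(rounds):
        results.append(bool(check()))
        if i < rounds - 1:
            sleep(interval)
    return results
```

For the service described here, the call would look like `check_repeatedly(lambda: port_is_open('127.0.0.1', 20000))`, where port 20000 is the one used in the source code below.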

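While running it I ran into a problem: operating on Windows from Python through the wmi module is exceptionally slow. I don't know whether there is an alternative; if anyone knows a better way, please point it out.

Update: the Windows CMD command "net start" now replaces the wmi module for getting the list of running service names.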
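The "net start" approach can be sketched as below, assuming Python 3 and the standard `subprocess` module instead of `os.popen`. The function names are hypothetical. The parsing is kept separate from the command call so it can be tried on any platform; only `running_services` needs Windows.

```python
import subprocess


def parse_running_services(net_start_output):
    """Return the set of non-empty, whitespace-stripped lines of `net start` output.

    Like the script in this post, this keeps the header and footer lines
    as well; in practice they do not collide with a service name.
    """
    return {line.strip() for line in net_start_output.splitlines() if line.strip()}


def running_services():
    """Run `net start` (Windows only) and return the parsed names."""
    result = subprocess.run(['net', 'start'], capture_output=True, text=True)
    return parse_running_services(result.stdout)


def is_running(service_name):
    """Return True if service_name is listed by `net start`."""
    return service_name in running_services()
```

One caveat worth checking on your own machine: as far as I know, "net start" lists display names while "sc start" and "sc stop" expect the service key name, so a single name works for both only when the two are identical.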

 

The source code is as follows:

#!/usr/bin/env python

import os
import time
import socket
import smtplib
import logging
from email.mime.text import MIMEText


def get_stop_service(designation):
    """Return True if the monitored service appears in the list of
    running services reported by "net start"; otherwise log and return False.
    """
    # Each service name is printed on its own indented line, so strip whitespace.
    running = [line.strip() for line in os.popen('net start').readlines()]
    if designation in running:
        return True
    logging.error('Service [%s] is down, try to restart the service. \r\n' % designation)
    return False

def monitor(sname):
    """Open a TCP connection to port 20000 on the local machine.
    Return False if the connection attempt raises an exception.
    """
    s = socket.socket()
    s.settimeout(3)  # Give up after 3 seconds
    host = ('127.0.0.1', 20000)
    try:  # Try to connect to the host
        s.connect(host)
    except socket.error as e:
        logging.warning('[%s] service connection failed: %s \r\n' % (sname, e))
        return False
    finally:
        s.close()  # Always release the socket
    return True


def restart_service(rstname, conn, run):
    """First check whether the service is stopped;
    if it is, start it directly.
    Then check whether it is hung (running but not accepting connections);
    if it is, stop and start the service.
    Return True on success, False if the restart failed.
    """
    flag = False
    try:
        # run is the return value of get_stop_service(), conn that of monitor()
        if not run:
            ret = os.system('sc start "%s"' % rstname)
            if ret != 0:
                raise Exception('[Errno %s]' % ret)
            flag = True
        elif not conn:
            ret_stop = os.system('sc stop "%s"' % rstname)
            ret_start = os.system('sc start "%s"' % rstname)
            if ret_start != 0:
                raise Exception('sc stop [Status code %s] '
                                'sc start [Status code %s] ' % (ret_stop, ret_start))
            flag = True
        else:
            logging.info('[%s] service running status is normal' % rstname)
            return True
    except Exception as e:
        logging.warning('[%s] service restart failed: %s \r\n' % (rstname, e))
    # Returned on both the success and the failure path
    return flag


def send_mail(to_list, sub, contents):
    """
    Send an alarm mail. Return True if it was sent successfully.
    """
    mail_server = 'mail.stmp.com'  # SMTP server
    mail_user = 'YouAccount'  # Mail account
    mail_pass = 'Password'  # Password
    mail_postfix = 'smtp.com'  # Domain name

    me = 'Monitor alarm<%s@%s>' % (mail_user, mail_postfix)
    message = MIMEText(contents, _subtype='html', _charset='utf-8')

    message['Subject'] = sub
    message['From'] = me
    message['To'] = ', '.join(to_list)  # Address lists are comma-separated

    flag = False  # Whether the mail was sent successfully
    try:
        s = smtplib.SMTP()
        s.connect(mail_server)
        s.login(mail_user, mail_pass)
        s.sendmail(me, to_list, message.as_string())
        s.close()
        flag = True
    except Exception as e:
        logging.warning('Send mail failed, exception: [%s]. \r\n' % e)

    return flag


def main(sname):
    """Take the name of the service to monitor, run the checks defined
    above in turn, and return True if the socket connection is normal.
    Each run tests up to two times, 10 seconds apart.
    """
    retry = 2
    count = 0
    retval = False  # State of the socket connection
    while count < retry:
        ret = monitor(sname)
        if ret:  # If the socket connection is normal, return right away
            retval = ret
            return retval
        # True if the service is listed as running, False if it has stopped
        is_running = get_stop_service(sname)
        restart_service(rstname=sname, conn=ret, run=is_running)

        host = socket.gethostname()
        address = socket.gethostbyname(host)
        mailto_list = ['mail@smtp.com', ]  # Alarm contacts
        send_mail(mailto_list,
                  'Alarm',
                  ' <h4>Level: <u>ERROR</u><br> Host name: %s<br>'
                  ' IP Address: %s<br>'
                  ' Service name:</h4> <h5>%s</h5>'
                  % (host, address, sname))
        count += 1
        time.sleep(10)
    else:
        logging.error('[%s] service still unreachable after %d restart attempts \r\n' % (sname, retry))

    return retval


if __name__ == '__main__':

    # Text append mode; a binary mode such as 'ab' breaks logging on Python 3
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(message)s',
                        datefmt='%Y/%m/%d %H:%M:%S',
                        filename='D:\\logs\\Monitor.log',
                        filemode='a')

    name = 'Service Name'
    response = main(name)
    if response:
        logging.info('The [%s] service connection is normal \r\n' % name)

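The code above still has room for improvement: the names of several services could be written to a file, and the program could read that file and check each service in turn.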
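A rough sketch of that improvement might look like this. The file format (one service name per line, blank lines and lines starting with `#` ignored) and the names `load_service_names` and `check_all` are assumptions made for illustration, not part of the original script.

```python
def load_service_names(path):
    """Read service names from a text file, one name per line.

    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    names = []
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            name = line.strip()
            if name and not name.startswith('#'):
                names.append(name)
    return names


def check_all(names, check):
    """Call check(name) for every service and map each name to its result."""
    return {name: check(name) for name in names}
```

With the script above, `check_all(load_service_names('services.txt'), main)` would run the existing `main` once per service, where `services.txt` is a hypothetical file name. Since `monitor` hard-codes port 20000, checking several services this way would also require storing a port per service.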

