How to Safely Implement Multiprocess Logging and Log Rotation in Python


  In Python we often turn to multiple processes to improve a program's performance, but multiprocess programming comes with all sorts of problems: child processes hanging when multiprocessing and multithreading are mixed, inter-process communication, and so on. There is another problem we may not notice as often: logging. For one-off tasks it hardly matters, but it does matter for long-running Python tasks that log heavily, for example services built on web frameworks such as Flask, Tornado, or Django.

  1. We usually record logs with Python's built-in logging module, but the official documentation states clearly that the module supports multithreading, not multiprocessing (in particular, logging to a single file from multiple processes is not supported). So a multiprocess-safe implementation is up to us.
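  To see the failure concretely, here is a minimal sketch of the naive setup that goes wrong (the file name app.log, the size limit, and the worker count are made up for illustration): each worker process builds its own RotatingFileHandler on the same file, so every process rotates independently and records end up lost or scattered across renamed files.

import logging
import multiprocessing
from logging.handlers import RotatingFileHandler


def worker(n):
    # each process builds its own handler on the same file -- this is the unsafe pattern
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(RotatingFileHandler("app.log", maxBytes=1024, backupCount=3))
    for i in range(1000):
        logger.info("worker %s message %s", n, i)


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()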


  2. There are already plenty of multiprocess adaptations of the logging module online, essentially all built on the fcntl file lock, so we take the same approach. In the logging module, every record is ultimately written out by a handler, so all we need to implement is a Handler that can safely write logs from multiple processes.

  3. Analyzing the BaseRotatingHandler source together with the official documentation, we find that the emit method is where a record is finally written out and where log rotation is triggered, so that is the method we wrap with the lock.

  

def emit(self, record):
    """
    Emit a record.

    Output the record to the file, catering for rollover as described
    in doRollover().
    """
    try:
        if self.shouldRollover(record):
            self.doRollover()
        logging.FileHandler.emit(self, record)
    except Exception:
        self.handleError(record)

 

  4. Using the fcntl file lock

  This lock is only available on Linux, but since our services generally run on Linux that is rarely a problem in practice. fcntl offers several lock types; here we only need to serialize log writes across processes, so an exclusive lock (LOCK_EX) is enough.

import fcntl

file_path = "/home/ubuntu/aaa.json"
f = open(file_path, 'w')
fcntl.flock(f.fileno(), fcntl.LOCK_EX)    # acquire the exclusive lock; other processes block here
f.write("something")
fcntl.flock(f.fileno(), fcntl.LOCK_UN)    # release the lock
f.close()
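  Since the lock/unlock pair is easy to get wrong (an exception between the two calls leaves the lock held until the file is closed), one option is to wrap the pattern in a small context manager. This is just a convenience sketch; the name flocked and the paths are made up:

import fcntl
from contextlib import contextmanager


@contextmanager
def flocked(path):
    # open (and create if needed) the lock file, hold LOCK_EX for the body
    with open(path, 'wb') as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


# usage:
with flocked("/tmp/.loglockfile"):
    with open("/home/ubuntu/aaa.json", 'a') as f:
        f.write("something\n")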

  5. The concrete implementation

  

import logging
import os
import fcntl

from logging.handlers import RotatingFileHandler


class MyRotatingFileHandler(RotatingFileHandler):

    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build this lock-file path once in __init__ instead of
            # joining it on every call; it is done inline here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up, a pool of already-opened lock files could be
            # kept, but it must be keyed by the current process id so that the
            # descriptor belongs to this process and was not inherited from a parent.
            f = open(lock_path, 'wb')
            # acquire the exclusive lock
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            if self.shouldRollover(record):
                self.doRollover()
            logging.FileHandler.emit(self, record)
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()

  6. You might think that's the end of it, but it isn't. Over long-term use we found that while writing individual log lines was now fine, rotating the log files still frequently went wrong. Worse, log output would often go missing until only one process was still writing, with the other processes' logs simply disappearing. Where is the problem? Read on.

class RotatingFileHandler(BaseRotatingHandler):
    """
    Handler for logging to a set of files, which switches from one file
    to the next when the current file reaches a certain size.
    """
    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        """
        Open the specified file and use it as the stream for logging.

        By default, the file grows indefinitely. You can specify particular
        values of maxBytes and backupCount to allow the file to rollover at
        a predetermined size.

        Rollover occurs whenever the current log file is nearly maxBytes in
        length. If backupCount is >= 1, the system will successively create
        new files with the same pathname as the base file, but with extensions
        ".1", ".2" etc. appended to it. For example, with a backupCount of 5
        and a base file name of "app.log", you would get "app.log",
        "app.log.1", "app.log.2", ... through to "app.log.5". The file being
        written to is always "app.log" - when it gets filled up, it is closed
        and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
        exist, then they are renamed to "app.log.2", "app.log.3" etc.
        respectively.

        If maxBytes is zero, rollover never occurs.
        """
        # If rotation/rollover is wanted, it doesn't make sense to use another
        # mode. If for example 'w' were specified, then if there were multiple
        # runs of the calling application, the logs from previous runs would be
        # lost if the 'w' is respected, because the log file would be truncated
        # on each run.
        if maxBytes > 0:
            mode = 'a'
        BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
        self.maxBytes = maxBytes
        self.backupCount = backupCount

    def doRollover(self):
        """
        Do a rollover, as described in __init__().
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        if self.backupCount > 0:
            for i in range(self.backupCount - 1, 0, -1):
                sfn = self.rotation_filename("%s.%d" % (self.baseFilename, i))
                dfn = self.rotation_filename("%s.%d" % (self.baseFilename,
                                                        i + 1))
                if os.path.exists(sfn):
                    if os.path.exists(dfn):
                        os.remove(dfn)
                    os.rename(sfn, dfn)
            dfn = self.rotation_filename(self.baseFilename + ".1")
            if os.path.exists(dfn):
                os.remove(dfn)
            self.rotate(self.baseFilename, dfn)
        if not self.delay:
            self.stream = self._open()

    def shouldRollover(self, record):
        """
        Determine if rollover should occur.

        Basically, see if the supplied record would cause the file to exceed
        the size limit we have.
        """
        if self.stream is None:                 # delay was set...
            self.stream = self._open()
        if self.maxBytes > 0:                   # are we rolling over?
            msg = "%s\n" % self.format(record)
            self.stream.seek(0, 2)  #due to non-posix-compliant Windows feature
            if self.stream.tell() + len(msg) >= self.maxBytes:
                return 1
        return 0

  This is the logging module source code that implements size-based log rotation. The size check adds the length of the message about to be written to the file position returned by tell(); once the file would exceed the configured limit, the log is rotated. The rotation itself happens in the line self.rotate(self.baseFilename, dfn), and this call is the key to how the file currently being written gets swapped out. Let's look at its implementation:

def rotate(self, source, dest):
    """
    When rotating, rotate the current log.

    The default implementation calls the 'rotator' attribute of the
    handler, if it's callable, passing the source and dest arguments to
    it. If the attribute isn't callable (the default is None), the source
    is simply renamed to the destination.

    :param source: The source filename. This is normally the base
                   filename, e.g. 'test.log'
    :param dest:   The destination filename. This is normally
                   what the source is rotated to, e.g. 'test.log.1'.
    """
    if not callable(self.rotator):
        # Issue 18940: A file may not have been created if delay is True.
        if os.path.exists(source):
            os.rename(source, dest)
    else:
        self.rotator(source, dest)

This method simply renames the file. With multiple processes that causes trouble: one process rotates and switches to a fresh log file, but the other processes still hold descriptors pointing at the renamed file, keep writing into it, and then try to rotate it themselves, so the file sizes go wrong. We can fix this by overriding rotate: instead of renaming, copy the current file (test.log) to test.log.1 and then truncate test.log, after which we can happily keep writing. But can the other processes pick up this change?

  They can. Because we open the file in 'a' (append) mode, the tell() call in shouldRollover returns the current size of the file: the system keeps track of the file's real size for us. That seemed to solve it, but our testing showed otherwise: we only get the file's true size after first writing something to it. So we adjust the order: write the record first, then check whether the log needs to be rotated. That settles it.
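  A quick way to observe the append-mode behaviour this relies on (assuming /tmp/demo.log starts out empty; the path is just a scratch file for illustration):

# two independent append-mode handles on the same file: every write lands
# at the real end of the file, and seek(0, 2) + tell() reflects the size
# after the write
a = open("/tmp/demo.log", "a")
b = open("/tmp/demo.log", "a")

a.write("x" * 100)
a.flush()

b.write("y" * 50)
b.flush()

b.seek(0, 2)
print(b.tell())   # 150: handle b sees the bytes written through handle a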

 
 
import logging
import os
import fcntl

from logging.handlers import RotatingFileHandler
from shutil import copyfile


class MyRotatingFileHandler(RotatingFileHandler):

    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build this lock-file path once in __init__ instead of
            # joining it on every call; it is done inline here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up, a pool of already-opened lock files could be
            # kept, but it must be keyed by the current process id so that the
            # descriptor belongs to this process and was not inherited from a parent.
            f = open(lock_path, 'wb')
            # acquire the exclusive lock before touching the log file
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            # write the record first, so the file reports its real size
            logging.FileHandler.emit(self, record)
            if self.shouldRollover(record):
                self.doRollover()
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()

    def rotate(self, source, dest):
        # copy instead of renaming, so other processes' descriptors
        # still point at the live log file
        copyfile(source, dest)
        # clear the log file
        with open(source, 'wb'):
            pass
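
  Finally, a sketch of wiring the handler up, assuming the MyRotatingFileHandler class above is in scope; the file name, size limit, and worker count are made up. Every worker process configures its own logger, and the shared .loglockfile next to app.log serializes writes and rotation across all of them:

import logging
import multiprocessing


def get_logger():
    logger = logging.getLogger("app")
    if not logger.handlers:
        handler = MyRotatingFileHandler("app.log", maxBytes=10 * 1024 * 1024, backupCount=5)
        handler.setFormatter(logging.Formatter("%(asctime)s %(process)d %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def worker(n):
    logger = get_logger()
    for i in range(10000):
        logger.info("worker %s message %s", n, i)


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()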
