In Python we often turn to multiple processes to improve program performance, but multiprocess programming brings problems of its own: child processes hanging when multiprocessing and multithreading are mixed, inter-process communication, and so on. One problem that is easy to overlook is logging. For one-off tasks it barely matters, but it does matter for long-running Python services that log heavily, such as applications built on the web frameworks we use every day: Flask, Tornado, Django and the like.
1. We usually record logs with Python's built-in logging module, but the official documentation states clearly that the module is thread-safe, not process-safe. Making it safe across processes is therefore up to us.
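The thread-safety claim is easy to inspect: in CPython every Handler guards emit() with a threading.RLock created by Handler.createLock, and a threading lock is invisible to other processes. A quick look (this relies on a CPython implementation detail, not a documented API guarantee):

```python
import logging
import threading

# Handler.createLock() gives every handler a threading.RLock; it serializes
# threads inside one process but is invisible to other processes, so two
# processes can still interleave writes or rotate the same file twice.
handler = logging.StreamHandler()
print(type(handler.lock))
```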
2. There are already many multiprocess implementations of the logging module floating around, and most of them rely on the fcntl file lock, so we follow the same approach. In the logging module every record is ultimately written out by a handler, so all we need to implement is a Handler that can write logs safely from multiple processes.
3. Reading the BaseRotatingHandler source together with the official documentation, we find that the emit method is where the record is finally written out and where log rotation is triggered, so that is the method we put the lock around.
```python
def emit(self, record):
    """
    Emit a record.

    Output the record to the file, catering for rollover as described
    in doRollover().
    """
    try:
        if self.shouldRollover(record):
            self.doRollover()
        logging.FileHandler.emit(self, record)
    except Exception:
        self.handleError(record)
```
4. Using the file lock: fcntl
This lock is only available on Linux, but since our services generally run on Linux anyway, that is rarely a problem. fcntl offers several lock types; here we only need to serialize log writes between processes, so an exclusive lock is enough.
```python
import fcntl

file_path = "/home/ubuntu/aaa.json"
f = open(file_path, 'w')
fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # acquire the lock; other processes' operations on the file now block
f.write("something")
fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # release the lock
f.close()
```
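Since the acquire/release pair is easy to get wrong under exceptions, it can be convenient to wrap it in a context manager. A small sketch; file_lock and the lock path below are our own names, not part of any library:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def file_lock(lock_path):
    """Hold an exclusive flock() for the duration of the with-block.

    The lock is advisory: it only excludes processes that also call
    flock() on the same path, which is exactly the logging use case.
    """
    f = open(lock_path, 'wb')
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        yield f
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        f.close()

# usage: everything inside the block is serialized across processes
with file_lock('/tmp/.loglockfile'):
    pass  # write or rotate the log here
```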
5. Concrete implementation
```python
import logging
import os
import fcntl
from logging.handlers import RotatingFileHandler

class MyRotatingFileHandler(RotatingFileHandler):
    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build the lock-file path once in __init__ instead of
            # joining it on every call; it is inlined here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up we could also keep a pool of already-opened
            # lock files, keyed by the current process id, to make sure the
            # descriptor was opened by this process and not inherited from
            # the parent.
            f = open(lock_path, 'wb')
            # acquire the lock
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            if self.shouldRollover(record):
                self.doRollover()
            logging.FileHandler.emit(self, record)
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()
```
6. You might think we are done here, but we are not. After running this in production for a while, writing individual log records works fine, yet log rotation still misbehaves regularly: files are cut at the wrong size, and sometimes only one process keeps logging while the output of all the other processes disappears. Where does the problem come from? Read on.
```python
class RotatingFileHandler(BaseRotatingHandler):
    """
    Handler for logging to a set of files, which switches from one file
    to the next when the current file reaches a certain size.
    """
    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0,
                 encoding=None, delay=False):
        """
        Open the specified file and use it as the stream for logging.

        By default, the file grows indefinitely. You can specify particular
        values of maxBytes and backupCount to allow the file to rollover at
        a predetermined size.

        Rollover occurs whenever the current log file is nearly maxBytes in
        length. If backupCount is >= 1, the system will successively create
        new files with the same pathname as the base file, but with extensions
        ".1", ".2" etc. appended to it. For example, with a backupCount of 5
        and a base file name of "app.log", you would get "app.log",
        "app.log.1", "app.log.2", ... through to "app.log.5". The file being
        written to is always "app.log" - when it gets filled up, it is closed
        and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
        exist, then they are renamed to "app.log.2", "app.log.3" etc.
        respectively.

        If maxBytes is zero, rollover never occurs.
        """
        # If rotation/rollover is wanted, it doesn't make sense to use another
        # mode. If for example 'w' were specified, then if there were multiple
        # runs of the calling application, the logs from previous runs would be
        # lost if the 'w' is respected, because the log file would be truncated
        # on each run.
        if maxBytes > 0:
            mode = 'a'
        BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
        self.maxBytes = maxBytes
        self.backupCount = backupCount

    def doRollover(self):
        """
        Do a rollover, as described in __init__().
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        if self.backupCount > 0:
            for i in range(self.backupCount - 1, 0, -1):
                sfn = self.rotation_filename("%s.%d" % (self.baseFilename, i))
                dfn = self.rotation_filename("%s.%d" % (self.baseFilename, i + 1))
                if os.path.exists(sfn):
                    if os.path.exists(dfn):
                        os.remove(dfn)
                    os.rename(sfn, dfn)
            dfn = self.rotation_filename(self.baseFilename + ".1")
            if os.path.exists(dfn):
                os.remove(dfn)
            self.rotate(self.baseFilename, dfn)
        if not self.delay:
            self.stream = self._open()

    def shouldRollover(self, record):
        """
        Determine if rollover should occur.

        Basically, see if the supplied record would cause the file to exceed
        the size limit we have.
        """
        if self.stream is None:                 # delay was set...
            self.stream = self._open()
        if self.maxBytes > 0:                   # are we rolling over?
            msg = "%s\n" % self.format(record)
            self.stream.seek(0, 2)  # due to non-posix-compliant Windows feature
            if self.stream.tell() + len(msg) >= self.maxBytes:
                return 1
        return 0
```
This is the logging module's own source code, and it is where size-based rotation is actually implemented. The file size is judged from the stream's tell() plus the length of the message about to be written; once the file exceeds the limit, the log is rotated. The rotation itself happens in the line self.rotate(self.baseFilename, dfn), and this call is the key to how the file currently being written gets cut over. Let's look at its implementation:
```python
def rotate(self, source, dest):
    """
    When rotating, rotate the current log.

    The default implementation calls the 'rotator' attribute of the
    handler, if it's callable, passing the source and dest arguments to
    it. If the attribute isn't callable (the default is None), the source
    is simply renamed to the destination.

    :param source: The source filename. This is normally the base
                   filename, e.g. 'test.log'
    :param dest:   The destination filename. This is normally
                   what the source is rotated to, e.g. 'test.log.1'.
    """
    if not callable(self.rotator):
        # Issue 18940: A file may not have been created if delay is True.
        if os.path.exists(source):
            os.rename(source, dest)
    else:
        self.rotator(source, dest)
```
By default this method simply renames the file, and with multiple processes that is exactly the problem: one process rotates the log, but the other processes still hold handles to the renamed file, so their output lands in test.log.1, after which they too decide to rotate, and the file sizes go wrong. The simple fix is to override rotate: instead of renaming, copy the current file (test.log) to test.log.1, then truncate test.log, and happily carry on logging. But can the other processes actually notice this change?
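The rename failure mode described above is easy to reproduce outside of logging. An open descriptor follows the inode, not the file name, so a writer that misses the rotation keeps appending to the renamed file (the paths below are illustrative):

```python
import os
import tempfile

# An open descriptor follows the inode, not the file name: after os.rename()
# (what the default rotate() does) a writer that still holds the old file
# object keeps appending to the renamed file.
d = tempfile.mkdtemp()
path = os.path.join(d, 'test.log')

writer = open(path, 'a')
writer.write('before rotate\n')
writer.flush()

os.rename(path, path + '.1')    # the rotation performed by another process

writer.write('after rotate\n')  # lands in test.log.1, not in a fresh test.log
writer.flush()
writer.close()

print(open(path + '.1').read())
```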
They can. Because we open the log in append mode ('a'), the tell() call inside shouldRollover reports the file's current size, and the operating system keeps that in sync for us. That seems to solve everything, but our testing showed otherwise: a process only sees the file's real size after it has written something, because in append mode the descriptor's offset is re-synchronised with the end of the file on each write, not when another process truncates the file. So we reorder the steps: write the record first, then check whether to rotate. That solves it.
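The behaviour behind the write-first rule can be seen in isolation: truncating a file from outside does not move an append-mode descriptor's offset, but the next write does. Binary mode is used here so that tell() is a plain byte offset; logging's text mode behaves the same way on Linux:

```python
import os
import tempfile

# In append mode the kernel repositions the descriptor to the end of the file
# on each write (O_APPEND semantics), but NOT when another process truncates
# the file out from under us.
d = tempfile.mkdtemp()
path = os.path.join(d, 'size.log')

f = open(path, 'ab')
f.write(b'x' * 100)
f.flush()

open(path, 'wb').close()  # another process truncates the log (our rotate())

stale = f.tell()          # still the old offset: a size check here is fooled
f.write(b'y')
f.flush()
real = f.tell()           # the write re-synchronised the offset with the true end
f.close()

print(stale, real, os.path.getsize(path))
```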
```python
import logging
import os
import fcntl
from logging.handlers import RotatingFileHandler
from shutil import copyfile

class MyRotatingFileHandler(RotatingFileHandler):
    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build the lock-file path once in __init__ instead of
            # joining it on every call; it is inlined here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up we could also keep a pool of already-opened
            # lock files, keyed by the current process id, to make sure the
            # descriptor was opened by this process and not inherited from
            # the parent.
            f = open(lock_path, 'wb')
            # write first, so that tell() reflects the file's real size
            logging.FileHandler.emit(self, record)
            # acquire the lock
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            if self.shouldRollover(record):
                self.doRollover()
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()

    def rotate(self, source, dest):
        copyfile(source, dest)
        # clear the live log file
        with open(source, 'wb'):
            pass
```
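Putting it all together, here is a sketch of how the handler might be exercised from several processes. The handler class is repeated so the snippet is self-contained; logger names, sizes, and paths are illustrative, and the fork start method is used because the whole approach is Linux-only anyway:

```python
import fcntl
import logging
import multiprocessing as mp
import os
import tempfile
from logging.handlers import RotatingFileHandler
from shutil import copyfile

class MyRotatingFileHandler(RotatingFileHandler):
    """The handler from this article: write first, then rotate under an fcntl lock."""

    def emit(self, record):
        f = None
        try:
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            f = open(lock_path, 'wb')
            logging.FileHandler.emit(self, record)    # write first: refreshes tell()
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)    # serialize the rollover check
            if self.shouldRollover(record):
                self.doRollover()
        except Exception:
            self.handleError(record)
        finally:
            if f:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()

    def rotate(self, source, dest):
        copyfile(source, dest)      # copy instead of rename: open handles stay valid
        with open(source, 'wb'):
            pass                    # truncate the live log file

def worker(log_path, n):
    # each process builds its own logger and handler after the fork
    logger = logging.getLogger('mp-demo-%d' % os.getpid())
    logger.addHandler(MyRotatingFileHandler(log_path, maxBytes=4096, backupCount=3))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    for i in range(200):
        logger.info('process %d message %d', n, i)

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, 'app.log')
ctx = mp.get_context('fork')  # Linux-only target, so fork is fine here
procs = [ctx.Process(target=worker, args=(log_path, n)) for n in range(4)]
for p in procs:
    p.start()
for p in procs:
    p.join()
print(sorted(os.listdir(log_dir)))
```

With four processes writing about 18 KB against a 4 KB limit, several rollovers occur and app.log.1 through app.log.3 appear next to the live file.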