In Python we often turn to multiprocessing to improve program performance, but multiprocess code brings its own recurring problems: child processes hanging when processes and threads are mixed, inter-process communication, and so on. One problem that is easy to overlook is logging. For one-off tasks it hardly matters, but it matters a great deal for long-running Python services that log heavily, such as applications built on the web frameworks we use all the time: Flask, Tornado, Django, and the like.
1. We usually record logs with Python's built-in logging module, but the official documentation is explicit: the module is thread-safe, not process-safe. So safe logging across multiple processes is something we have to build ourselves.
2. There are already many multiprocess adaptations of the logging module on the web, and almost all of them rely on the file lock fcntl; we take the same approach here. In the logging module, the actual writing of a record is done by the final handler, so all we need to implement is a Handler that can print logs safely from multiple processes.
3. Reading the BaseRotatingHandler source together with the official documentation shows that the emit method is where a record is finally written out and where log rotation is triggered, so that is the method we wrap with the lock.
def emit(self, record):
    """
    Emit a record.

    Output the record to the file, catering for rollover as described
    in doRollover().
    """
    try:
        if self.shouldRollover(record):
            self.doRollover()
        logging.FileHandler.emit(self, record)
    except Exception:
        self.handleError(record)
4. Using file locks: the fcntl module
This lock is only available on Linux, but since our services generally run on Linux that is not much of a limitation in practice. fcntl offers several kinds of locks; all we need here is mutual exclusion between processes while writing a log line, so we use an exclusive lock (LOCK_EX).
import fcntl

file_path = "/home/ubuntu/aaa.json"
f = open(file_path, 'w')
fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # acquire the lock; other processes cannot operate on the file
f.write("something")
fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # release the lock
f.close()
5. The concrete implementation
import logging
import os
import fcntl
from logging.handlers import RotatingFileHandler

class MyRotatingFileHandler(RotatingFileHandler):
    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build this lock-file path once in __init__ instead of
            # joining it on every call; it is inlined here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up we could also keep a pool of already-opened
            # lock files, keyed by the current process id, to make sure the
            # file was opened by this process and not inherited from a parent.
            f = open(lock_path, 'wb')
            # acquire the lock
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            if self.shouldRollover(record):
                self.doRollover()
            logging.FileHandler.emit(self, record)
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the lock file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()
6. Think we are done? Not quite. Over long-term use we found that although individual log writes were now fine, log rotation still frequently went wrong: log lines would get lost until only one process was still writing, while the other processes' output disappeared. Where is the problem? Read on.
class RotatingFileHandler(BaseRotatingHandler):
    """
    Handler for logging to a set of files, which switches from one file
    to the next when the current file reaches a certain size.
    """
    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0,
                 encoding=None, delay=False):
        """
        Open the specified file and use it as the stream for logging.

        By default, the file grows indefinitely. You can specify particular
        values of maxBytes and backupCount to allow the file to rollover at
        a predetermined size.

        Rollover occurs whenever the current log file is nearly maxBytes in
        length. If backupCount is >= 1, the system will successively create
        new files with the same pathname as the base file, but with extensions
        ".1", ".2" etc. appended to it. For example, with a backupCount of 5
        and a base file name of "app.log", you would get "app.log",
        "app.log.1", "app.log.2", ... through to "app.log.5". The file being
        written to is always "app.log" - when it gets filled up, it is closed
        and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
        exist, then they are renamed to "app.log.2", "app.log.3" etc.
        respectively.

        If maxBytes is zero, rollover never occurs.
        """
        # If rotation/rollover is wanted, it doesn't make sense to use another
        # mode. If for example 'w' were specified, then if there were multiple
        # runs of the calling application, the logs from previous runs would be
        # lost if the 'w' is respected, because the log file would be truncated
        # on each run.
        if maxBytes > 0:
            mode = 'a'
        BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
        self.maxBytes = maxBytes
        self.backupCount = backupCount

    def doRollover(self):
        """
        Do a rollover, as described in __init__().
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        if self.backupCount > 0:
            for i in range(self.backupCount - 1, 0, -1):
                sfn = self.rotation_filename("%s.%d" % (self.baseFilename, i))
                dfn = self.rotation_filename("%s.%d" % (self.baseFilename, i + 1))
                if os.path.exists(sfn):
                    if os.path.exists(dfn):
                        os.remove(dfn)
                    os.rename(sfn, dfn)
            dfn = self.rotation_filename(self.baseFilename + ".1")
            if os.path.exists(dfn):
                os.remove(dfn)
            self.rotate(self.baseFilename, dfn)
        if not self.delay:
            self.stream = self._open()

    def shouldRollover(self, record):
        """
        Determine if rollover should occur.

        Basically, see if the supplied record would cause the file to exceed
        the size limit we have.
        """
        if self.stream is None:                 # delay was set...
            self.stream = self._open()
        if self.maxBytes > 0:                   # are we rolling over?
            msg = "%s\n" % self.format(record)
            self.stream.seek(0, 2)  # due to non-posix-compliant Windows feature
            if self.stream.tell() + len(msg) >= self.maxBytes:
                return 1
        return 0
This is the logging module's own source, and it contains the full logic for size-based rotation. Whether the file is too large is decided from the stream's tell() position plus the length of the message about to be written; once the file would exceed the limit, the log is rotated. Rotation comes down to one line, self.rotate(self.baseFilename, dfn), and this call is the key to what happens to the file currently being written. Let's look at its implementation:
def rotate(self, source, dest):
    """
    When rotating, rotate the current log.

    The default implementation calls the 'rotator' attribute of the
    handler, if it's callable, passing the source and dest arguments to
    it. If the attribute isn't callable (the default is None), the source
    is simply renamed to the destination.

    :param source: The source filename. This is normally the base
                   filename, e.g. 'test.log'
    :param dest:   The destination filename. This is normally
                   what the source is rotated to, e.g. 'test.log.1'.
    """
    if not callable(self.rotator):
        # Issue 18940: A file may not have been created if delay is True.
        if os.path.exists(source):
            os.rename(source, dest)
    else:
        self.rotator(source, dest)
So by default, rotation is just a rename. With multiple processes this means that when one process rotates, the other processes' still-open streams keep writing into the renamed file, and then they try to rotate again themselves, so file sizes and contents go wrong. The fix is to override rotate: instead of renaming, we copy the current file (test.log) to test.log.1 and then truncate test.log in place, after which we can happily keep writing. But can the other processes notice this change?
They can. Because we open the log file in 'a' (append) mode, the tell() call in shouldRollover reports the file's current size, and the operating system keeps that size up to date for us. That seems to settle it, but our testing showed otherwise: the stream only reflects the file's real size after we have written something to it. So we reorder the steps: write the record first, then check whether to rotate. With that change it works.
import logging
import os
import fcntl
from logging.handlers import RotatingFileHandler
from shutil import copyfile
class MyRotatingFileHandler(RotatingFileHandler):
    def emit(self, record):
        """
        Emit a record.

        Output the record to the file, catering for rollover as described
        in doRollover().
        """
        f = None
        try:
            # Better to build this lock-file path once in __init__ instead of
            # joining it on every call; it is inlined here for brevity.
            lock_path = os.path.join(os.path.dirname(self.baseFilename), '.loglockfile')
            # To speed things up we could also keep a pool of already-opened
            # lock files, keyed by the current process id, to make sure the
            # file was opened by this process and not inherited from a parent.
            f = open(lock_path, 'wb')
            # acquire the lock before touching the log file, so no other
            # process can rotate underneath us mid-write
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            # write the record first, so tell() reflects the file's real size
            logging.FileHandler.emit(self, record)
            if self.shouldRollover(record):
                self.doRollover()
        except Exception:
            self.handleError(record)
        finally:
            if f:
                # release the lock and close the lock file
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
                f.close()

    def rotate(self, source, dest):
        # copy instead of renaming, so other processes' open streams stay valid
        copyfile(source, dest)
        # clear the live log file in place
        with open(source, 'wb'):
            pass