I've recently been using Python Tornado to build the server side of an app. A production system obviously needs logging, and the natural first thought was Python's built-in logging library.
In logging, actually writing records out is delegated to Handlers, but none of the bundled Handlers meet our needs.
We want logs split by time period. With FileHandler everything goes into a single file, which in time grows so large that both reading and writing become a problem. TimedRotatingFileHandler can split logs, but with multiple processes the rotated files may overwrite one another and log records get lost.
And so I set off in search of a suitable Handler.
First attempt: use FileHandler, plus a script (managed by, say, supervisord) that splits the log on a schedule. The catch is that the log file is held open by the running FileHandler; it either cannot be renamed at all (on Windows) or, where the rename does succeed, the handler keeps writing through its old file handle, so everything still lands in the same file. Attempt failed.
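The problem is easy to demonstrate (a minimal sketch of mine, assuming Linux semantics, where the rename succeeds but the open handle stays bound to the old inode):

```python
import logging
import os

logging.basicConfig(filename="app.log", level=logging.INFO)
logging.info("before rotation")

# Simulate what an external rotation script would do.
os.rename("app.log", "app.log.1")   # on Windows this raises an error instead

# The FileHandler still holds the old file descriptor, so this record
# ends up in app.log.1, not in a fresh app.log.
logging.info("after rotation")
```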
Next attempt: subclass logging's own time-based rotation handler, TimedRotatingFileHandler, and override the method that performs the rollover.
Here I used a class written by someone online:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2012 Ethan Zhang<http://github.com/Ethan-Zhang>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import time
import os
from logging.handlers import TimedRotatingFileHandler


class MultiProcessTimedRotatingFileHandler(TimedRotatingFileHandler):

    def doRollover(self):
        """
        do a rollover; in this case, a date/time stamp is appended to the filename
        when the rollover happens.  However, you want the file to be named for the
        start of the interval, not the current time.  If there is a backup count,
        then we have to get a list of matching filenames, sort them and remove
        the one with the oldest suffix.
        """
        if self.stream:
            self.stream.close()
        # get the time that this sequence started at and make it a TimeTuple
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
        dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
        #if os.path.exists(dfn):
        #    os.remove(dfn)
        if not os.path.exists(dfn):
            os.rename(self.baseFilename, dfn)
        if self.backupCount > 0:
            # find the oldest log file and delete it
            #s = glob.glob(self.baseFilename + ".20*")
            #if len(s) > self.backupCount:
            #    s.sort()
            #    os.remove(s[0])
            for s in self.getFilesToDelete():
                os.remove(s)
        #print "%s -> %s" % (self.baseFilename, dfn)
        self.mode = 'a'
        self.stream = self._open()
        currentTime = int(time.time())
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval
        # If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstNow = time.localtime(currentTime)[-1]
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    newRolloverAt = newRolloverAt - 3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    newRolloverAt = newRolloverAt + 3600
        self.rolloverAt = newRolloverAt
```
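It wires up exactly like TimedRotatingFileHandler (an illustrative example; the file name and rotation settings here are my own):

```python
import logging

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
# Rotate at midnight, keeping the last 7 files.
handler = MultiProcessTimedRotatingFileHandler("app.log", when="midnight",
                                               backupCount=7)
logger.addHandler(handler)
logger.info("hello")
```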
Testing showed that log files could still be overwritten, presumably because the existence check and the rename are not atomic: two processes rolling over at the same moment can still race each other. Another plan self-rejected.
"If file-based Handlers can't do the job, let's try keeping the logs in a database." After the failures above, that thought surfaced.
This approach takes two steps: first, write a custom Handler; second, choose a database to hold the logs.
How do you write a custom Handler? The answer can be traced straight through the source: open the Python logging package, survey its bundled handlers, and find the simplest one, which reveals the minimal contract a custom Handler must satisfy.
In handlers.py the simplest turned out to be SMTPHandler.
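Its skeleton, condensed from the Python 2 standard library source (with the mail-assembly details elided), looks roughly like this:

```python
class SMTPHandler(logging.Handler):
    """A handler which sends an SMTP email for each logging event (condensed)."""

    def __init__(self, mailhost, fromaddr, toaddrs, subject, credentials=None):
        logging.Handler.__init__(self)
        # ... store the SMTP settings on self ...

    def getSubject(self, record):
        """Determine the subject for the email."""
        return self.subject

    def emit(self, record):
        """Emit a record: format it and email it to the specified addresses."""
        try:
            import smtplib
            # ... build the message from self.format(record) and getSubject(),
            #     then send it with smtplib.SMTP ...
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)
```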

getSubject is only called internally by emit, so the recipe for a custom Handler falls out easily: subclass logging.Handler and implement emit, which is invoked each time a record is written.
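In other words, a minimal custom handler can be this small (my own illustrative sketch, not code from the hunt itself):

```python
import logging
import sys

class EchoHandler(logging.Handler):
    """The smallest useful custom handler: just override emit()."""
    def emit(self, record):
        try:
            msg = self.format(record)     # applies whatever Formatter is attached
            sys.stdout.write(msg + "\n")
        except Exception:
            self.handleError(record)      # never let logging crash the app
```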
With step one solved, step two was choosing the database.
My first thought was sqlite, but a little research suggested its performance wouldn't hold up, so I turned to mongodb instead.
Just before knuckling down to write one myself, I couldn't resist a quick web search, and sure enough a mongodb handler already existed (judging by the test paths in its docstring, the code below comes from the log4mongo-python project). Sigh...

```python
import logging

from bson.timestamp import Timestamp
from pymongo import Connection
from pymongo.collection import Collection
from pymongo.errors import OperationFailure, PyMongoError

"""
Example format of generated bson document:
{
    'thread': -1216977216,
    'threadName': 'MainThread',
    'level': 'ERROR',
    'timestamp': Timestamp(1290895671, 63),
    'message': 'test message',
    'module': 'test_module',
    'fileName': '/var/projects/python/log4mongo-python/tests/test_handlers.py',
    'lineNumber': 38,
    'method': 'test_emit_exception',
    'loggerName': 'testLogger',
    'exception': {
        'stackTrace': 'Traceback (most recent call last):
                         File "/var/projects/python/log4mongo-python/tests/test_handlers.py", line 36, in test_emit_exception
                           raise Exception(\'exc1\')
                       Exception: exc1',
        'message': 'exc1',
        'code': 0
    }
}
"""


class MongoFormatter(logging.Formatter):

    DEFAULT_PROPERTIES = logging.LogRecord('', '', '', '', '', '', '', '').__dict__.keys()

    def format(self, record):
        """Formats LogRecord into python dictionary."""
        # Standard document
        document = {
            'timestamp': Timestamp(int(record.created), int(record.msecs)),
            'level': record.levelname,
            'thread': record.thread,
            'threadName': record.threadName,
            'message': record.getMessage(),
            'loggerName': record.name,
            'fileName': record.pathname,
            'module': record.module,
            'method': record.funcName,
            'lineNumber': record.lineno
        }
        # Standard document decorated with exception info
        if record.exc_info is not None:
            document.update({
                'exception': {
                    'message': str(record.exc_info[1]),
                    'code': 0,
                    'stackTrace': self.formatException(record.exc_info)
                }
            })
        # Standard document decorated with extra contextual information
        if len(self.DEFAULT_PROPERTIES) != len(record.__dict__):
            contextual_extra = set(record.__dict__).difference(set(self.DEFAULT_PROPERTIES))
            if contextual_extra:
                for key in contextual_extra:
                    document[key] = record.__dict__[key]
        return document


class MongoHandler(logging.Handler):

    def __init__(self, level=logging.NOTSET, host='localhost', port=27017,
                 database_name='logs', collection='logs',
                 username=None, password=None, fail_silently=False,
                 formatter=None, capped=False,
                 capped_max=1000, capped_size=1000000, **options):
        """Setting up mongo handler, initializing mongo database connection via pymongo."""
        logging.Handler.__init__(self, level)
        self.host = host
        self.port = port
        self.database_name = database_name
        self.collection_name = collection
        self.username = username
        self.password = password
        self.fail_silently = fail_silently
        self.connection = None
        self.db = None
        self.collection = None
        self.authenticated = False
        self.formatter = formatter or MongoFormatter()
        self.capped = capped
        self.capped_max = capped_max
        self.capped_size = capped_size
        self.options = options
        self._connect()

    def _connect(self):
        """Connecting to mongo database."""
        try:
            self.connection = Connection(host=self.host, port=self.port, **self.options)
        except PyMongoError:
            if self.fail_silently:
                return
            else:
                raise

        self.db = self.connection[self.database_name]
        if self.username is not None and self.password is not None:
            self.authenticated = self.db.authenticate(self.username, self.password)

        if self.capped:
            try:
                # We don't want to override the capped collection (and it throws an error anyway)
                self.collection = Collection(self.db, self.collection_name, capped=True,
                                             max=self.capped_max, size=self.capped_size)
            except OperationFailure:
                # Capped collection exists, so get it.
                self.collection = self.db[self.collection_name]
        else:
            self.collection = self.db[self.collection_name]

    def close(self):
        """If authenticated, logging out and closing mongo database connection."""
        if self.authenticated:
            self.db.logout()
        if self.connection is not None:
            self.connection.disconnect()

    def emit(self, record):
        """Inserting new logging record to mongo database."""
        if self.collection is not None:
            try:
                self.collection.save(self.format(record))
            except Exception:
                if not self.fail_silently:
                    self.handleError(record)
```
Luckily I noticed that its connection code doesn't support mongodb's master-slave or replica-set modes. Ha, at last an outlet for my urge to write code (even if it amounted to only two or three lines).
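The post doesn't show the patch itself, but judging from the usage below, it boils down to accepting a connection string through a new url keyword and handing it straight to pymongo, whose Connection understands multi-host mongodb:// URIs. Roughly (my reconstruction; the url attribute is the addition):

```python
def _connect(self):
    """Connect via a mongodb:// URI when one is given, else host/port."""
    try:
        if getattr(self, 'url', None):
            # 'mongodb://h1:p1,h2:p2' covers master-slave and
            # replica-set style multi-host deployments.
            self.connection = Connection(self.url, **self.options)
        else:
            self.connection = Connection(host=self.host, port=self.port,
                                         **self.options)
    except PyMongoError:
        if self.fail_silently:
            return
        raise
```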

```python
mongodb_url = 'mongodb://192.168.10.200:10001,192.168.10.201:10001'
handler = MongoHandler(url=mongodb_url)
logger.addHandler(handler)
```
Add these few lines and it's ready to use.
That should have been the end of the story, but then the scheme's weakness dawned on me: it makes logging depend on a database, and logging is a foundational module of the whole system. So this approach doesn't really fit either.
Back to the original dream, then: write logs straight to files, and since logging's bundled handlers can't split them reliably, write one ourselves.
The result is this fully hand-rolled Handler that splits logs by time period:
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import logging
import os, os.path
import datetime

_filefmt = os.path.join("logs", "%Y-%m-%d", "%H.log")

class MyLoggerHandler(logging.Handler):
    def __init__(self, filefmt=None):
        self.filefmt = filefmt
        if filefmt is None:
            self.filefmt = _filefmt
        logging.Handler.__init__(self)

    def emit(self, record):
        msg = self.format(record)
        _filePath = datetime.datetime.now().strftime(self.filefmt)
        _dir = os.path.dirname(_filePath)
        try:
            if os.path.exists(_dir) is False:
                os.makedirs(_dir)
        except Exception:
            print "can not make dirs"
            print "filepath is " + _filePath
            pass
        try:
            _fobj = open(_filePath, 'a')
            _fobj.write(msg)
            _fobj.write("\n")
            _fobj.flush()
            _fobj.close()
        except Exception:
            print "can not write to file"
            print "filepath is " + _filePath
            pass
```
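Hooking it up is ordinary logging wiring (an illustrative example of mine; the format string is arbitrary):

```python
import logging

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

handler = MyLoggerHandler()   # defaults to logs/<YYYY-MM-DD>/<HH>.log
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("hello")          # opens, appends to and closes the file per record
```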
The idea is to release the file handle as soon as each record is written, so multiple processes no longer conflict. An obvious optimization would be to buffer some records, or a time window's worth, and write them out in one batch; that can wait until performance actually proves insufficient.
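If that day comes, the standard library's logging.handlers.MemoryHandler offers part of the idea for free (a sketch; note it replays buffered records to the target one at a time, so it batches in time rather than into a single write):

```python
import logging
from logging.handlers import MemoryHandler

logger = logging.getLogger("app")
# Hold up to 100 records in memory; flush them to MyLoggerHandler when the
# buffer fills or a record at ERROR level or above arrives.  A true
# single-write batch would need a custom flush().
buffered = MemoryHandler(capacity=100, flushLevel=logging.ERROR,
                         target=MyLoggerHandler())
logger.addHandler(buffered)
```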
Looking back, the hunt for a suitable Handler came down to three steps: first, confirming that Python's bundled handlers wouldn't do; second, working out how a custom Handler is implemented; third, choosing a storage medium for the logs (a step where my thinking wandered rather far from the original plan). The consolation is that the search can finally be declared over.