Problem
Earlier, when I used Pool from the multiprocessing module to run a method defined inside a class across multiple processes, I got the following error:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
According to the official documentation, Python can by default pickle only the following types:
- None, True, and False
- integers, floating point numbers, complex numbers
- strings, bytes, bytearrays
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module (using def, not lambda)
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
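The list above can be checked directly with the standard pickle module. The following is a minimal sketch, not part of the original post: square and can_pickle are hypothetical names introduced only for illustration.

```python
import pickle


def square(x):
    """A plain top-level function: pickled by reference to its module and name."""
    return x * x


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(square))         # function defined with def at the top level
    print(can_pickle(lambda x: x))    # lambda: has no importable name
    print(can_pickle([1, 2.0, "a"]))  # container holding only picklable objects
```

Run as a script, this should print True, False, True: the lambda is rejected because pickle stores functions by name and cannot look up a lambda by name.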
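So the only functions that can be pickled are those defined at the top level of a module. A method defined inside a class is not one of them (at least under Python 2, which this example uses), so passing it to the pool raises the error.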
import multiprocessing

def work():  # top-level function
    print("work!")

class Foo(object):
    def work(self):  # method, not a top-level function
        print("work")

pool1 = multiprocessing.Pool(processes=4)
foo = Foo()
pool1.apply_async(foo.work)
pool1.close()
pool1.join()
# this call fails with the PicklingError shown above

pool2 = multiprocessing.Pool(processes=4)
pool2.apply_async(work)
pool2.close()
pool2.join()
# this call works as expected
Solution
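Use the multiprocessing module from the pathos package in place of the standard-library multiprocessing. The multiprocessing in pathos is rewritten on top of the dill package, which can serialize almost every Python type, so those objects can be pickled. Alternatively, you could write your own version with dill (though that feels a bit like reinventing the wheel).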
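A minimal sketch of the pathos approach might look like the following, assuming pathos is installed. The helper run_with_pathos and the x argument on Foo.work are hypothetical, added only so the example returns something visible. They are not from the original code.

```python
class Foo(object):
    def work(self, x):
        """Method executed in the worker processes (hypothetical: squares x)."""
        return x * x


def run_with_pathos(foo, inputs, nodes=4):
    """Map the bound method foo.work over inputs using a pathos process pool."""
    # Imported here so this module still loads when pathos is not installed.
    from pathos.multiprocessing import ProcessingPool

    pool = ProcessingPool(nodes=nodes)
    try:
        # pathos serializes foo.work with dill, so a bound method is accepted.
        return pool.map(foo.work, inputs)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_with_pathos(Foo(), [1, 2, 3]))
```

Compared with the failing pool1 example, the only real change is the pool class. The method itself stays inside Foo.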