A jieba tokenizer singleton, and setting a custom tmp_dir when Linux permissions are insufficient


On Linux, when you do not have root privileges, you may sometimes run into the following problem:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Dumping model to file cache /tmp/jieba.cache
Dump cache file failed.
Traceback (most recent call last):
  File "/home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py", line 153, in initialize
    _replace_file(fpath, cache_file)
OSError: [Errno 1] Operation not permitted

 

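This happens because jieba stores its cache file under /tmp by default, and a non-root user does not have sufficient permission to replace the file there. The fix is to change the default cache directory so that the cache file is placed under the user's own directory. The jieba documentation mentions that tmp_dir and cache_file can be changed, so we took a look at the source code.

Lines 52-66 of /home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py read as follows: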
class Tokenizer(object):

    def __init__(self, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = None
        # self.tmp_dir = '/'
        self.cache_file = None

One option is to edit the source: the self.tmp_dir assignment on line 64 is where a custom cache path can be set.
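Whichever way tmp_dir is set, jieba creates its cache file inside that directory, so the directory should already exist and be writable by the current user. The following is a minimal sketch of preparing such a directory using only the standard library. The helper name `ensure_cache_dir` is hypothetical and is not part of jieba.

```python
import os
import tempfile


def ensure_cache_dir(path):
    """Create `path` if needed and confirm the current user can write to it.

    Returns the absolute path; raises OSError if the directory is not usable.
    (Hypothetical helper for illustration, not a jieba API.)
    """
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(path):
        os.makedirs(path)
    # Creating and deleting a throwaway file is a direct test of write access.
    fd, probe = tempfile.mkstemp(dir=path)
    os.close(fd)
    os.remove(probe)
    return path
```

The returned path can then be used as the tmp_dir value. For example, `jieba.dt.tmp_dir = ensure_cache_dir('~/.cache/jieba')` sets it on jieba's module-level default tokenizer, provided this runs before the first call that loads the dictionary. The location `~/.cache/jieba` is only an example, not a jieba default.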

 

Another option is to set it in your own code. Below is a demo that wraps jieba in a singleton:

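# -*- coding: utf-8 -*-
from __future__ import print_function  # keeps the demo runnable on Python 2.7 and 3

import jieba

# Placeholders: the original demo does not define these two names.
dirpath = '/path/to/writable/cache/dir'   # should exist and be writable by the current user
user_dict_path = '/path/to/userdict.txt'  # a custom dictionary in jieba's user-dict format


class Singleton(object):
    """
    Singleton base class for the jieba utilities
    """
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # object.__new__ accepts no extra arguments, so pass only cls
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance


class JiebaUtil(Singleton):
    """
    jiebautil helper
    """
    _jieba_instance = None

    def get_instance(self):
        """
        get the global jieba instance
        """
        if self._jieba_instance is not None:
            return self._jieba_instance
        print('initialize...')
        obj = jieba.Tokenizer()
        obj.tmp_dir = dirpath  # custom cache directory, set before any dictionary is loaded
        obj.load_userdict(user_dict_path)
        obj.initialize()
        self._jieba_instance = obj
        return obj


if __name__ == '__main__':

    one = JiebaUtil()
    two = JiebaUtil()

    print(one == two)

    tkn = one.get_instance()
    tkn2 = one.get_instance()
    print(tkn == tkn2)

    print(id(one), id(two))

    print(id(tkn), id(tkn2))

The `obj.tmp_dir = dirpath` line in get_instance is where the custom tmp_dir cache path is set.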
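The demo above creates the tokenizer lazily but does not guard against two threads calling get_instance at the same moment. As a rough sketch of one way to handle that, the helper below wraps any factory function and uses a lock so that the object is built only once. `LazyShared` and `make_tokenizer` are hypothetical names introduced here for illustration; only `jieba.Tokenizer`, its `tmp_dir` attribute and `initialize()` come from jieba itself.

```python
import threading


class LazyShared(object):
    """Build one object on first use and hand the same object to every caller."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._obj = None

    def get(self):
        if self._obj is None:
            with self._lock:
                # Re-check inside the lock: another thread may have built it already.
                if self._obj is None:
                    self._obj = self._factory()
        return self._obj


def make_tokenizer(tmp_dir):
    """Hypothetical factory: a jieba tokenizer whose cache file goes to tmp_dir."""
    import jieba  # imported here so LazyShared itself has no jieba dependency
    tok = jieba.Tokenizer()
    tok.tmp_dir = tmp_dir  # assumed to be an existing, writable directory
    tok.initialize()
    return tok
```

A module could then keep a single `shared = LazyShared(lambda: make_tokenizer(dirpath))` and call `shared.get()` wherever a tokenizer is needed, where `dirpath` is the same placeholder cache directory as in the demo.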

 

References:

http://funhacks.net/2017/01/17/singleton/

https://blog.csdn.net/sijiaqi11/article/details/78601258

 

