Running a script that uses jieba for word segmentation produced an error.
The error output was:
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Dumping model to file cache /tmp/jieba.cache
Dump cache file failed.
Traceback (most recent call last):
  File "/home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py", line 154, in initialize
    _replace_file(fpath, cache_file)
PermissionError: [Errno 1] Operation not permitted: '/tmp/tmpg255ml7f' -> '/tmp/jieba.cache'
Loading model cost 0.900 seconds.
Prefix dict has been built successfully.
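Cause
jieba stores its model in a cache file, /tmp/jieba.cache, in the system temporary directory, and the current user does not have permission to replace that file.
This usually happens on a server where the code is not run as root. A likely trigger on a shared machine is that another user already created /tmp/jieba.cache, so it cannot be overwritten. As the last two lines of the output show, jieba still builds the dictionary; only writing the cache fails.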
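To check whether this is what is happening, the cache file's owner and the directory's sticky bit can be inspected. The following is a minimal sketch using only the standard library; the helper name describe_cache is hypothetical, it assumes a Unix-like system, and it assumes the cache is at the default location shown in the log.

```python
import os
import stat
import tempfile


def describe_cache(path):
    """Collect facts about a cache file that explain a failed overwrite.

    Hypothetical helper, not part of jieba.
    """
    parent = os.path.dirname(path) or "."
    info = {
        "exists": os.path.exists(path),
        # With the sticky bit set (as is usual for /tmp), only a file's owner,
        # the directory's owner, or root may rename over or delete the file.
        "dir_sticky": bool(os.stat(parent).st_mode & stat.S_ISVTX),
    }
    if info["exists"]:
        info["owned_by_me"] = os.stat(path).st_uid == os.getuid()
    return info


# Assumed default cache location: <system temp dir>/jieba.cache
print(describe_cache(os.path.join(tempfile.gettempdir(), "jieba.cache")))
```

If the file exists, the directory is sticky, and owned_by_me is False, the PermissionError above is the expected result.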
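Solution
Change the default cache directory so that the cache file is stored under the user's own directory.
At line 64 of the source (the exact line number may differ between jieba versions), set self.tmp_dir to any directory under the user's home, for example "/home1/yanghan". self.cache_file does not need to be changed.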
The traceback above shows where the source file is:
File "/home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py", line 154, in initialize
_replace_file(fpath, cache_file)
Go to that path and edit the jieba source:
vi /home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py
Change
class Tokenizer(object):

    def __init__(self, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = None
        self.cache_file = None
to:
class Tokenizer(object):

    def __init__(self, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        # Store the cache under a directory the current user can write to
        self.tmp_dir = "/home1/yanghan/"
        self.cache_file = None
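The same effect can probably be had without editing the installed package, because tmp_dir is a plain attribute on the default tokenizer object, jieba.dt. The sketch below sets it from the calling script before the first segmentation call. The helper pick_cache_dir and the ~/.cache/jieba location are assumptions made for illustration, not part of jieba, and the import is guarded so the snippet still runs where jieba is not installed.

```python
import os
import tempfile


def pick_cache_dir(preferred):
    """Return `preferred` if it can be created and written to,
    otherwise a fresh private temporary directory.

    Hypothetical helper, not part of jieba.
    """
    try:
        os.makedirs(preferred, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        pass
    return tempfile.mkdtemp(prefix="jieba_cache_")


# Assumed location; any directory owned by the current user would do.
cache_dir = pick_cache_dir(os.path.join(os.path.expanduser("~"), ".cache", "jieba"))

try:
    import jieba
except ImportError:
    jieba = None  # jieba is not installed; nothing to configure

if jieba is not None:
    # Must be set before the dictionary is first loaded (first cut/lcut call).
    jieba.dt.tmp_dir = cache_dir
    print(jieba.lcut("我来到北京清华大学"))
```

Unlike the source edit, this setting is not overwritten when jieba is upgraded or reinstalled, and it does not require write access to site-packages.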
Reference:
https://blog.csdn.net/u013421629/article/details/91393781