A First Look at word2vec (a Simple Python Implementation)

This article is a repost. 2017-12-27 09:44. Tags: synonyms / gensim / NLP (natural language processing) / python / jieba / model / word2vec / word segmentation

Why use it?

Because it turns up all the time in papers and blog posts, whatever the topic. Since it is this popular, it is worth a try.

How to do it

Crawl the data from the web
Filter the data and segment it into words
Use word2vec for synonym lookup and similar operations
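
For the filtering and segmentation step, one variation worth considering is to keep one comment per line. As far as I know, `word2vec.Text8Corpus` treats the file as one long stream of whitespace-separated tokens and cuts it into fixed-length chunks, so comment boundaries are lost, whereas `word2vec.LineSentence` treats each line as one sentence. The sketch below assumes the crawled corpus holds one comment per line, which the article does not state. `segment_lines` and its `cut` parameter are hypothetical names; `cut` would be `jieba.cut` in practice.

```python
def segment_lines(lines, cut):
    """Yield one space-joined, segmented sentence per non-empty input line.

    `cut` is any callable that turns a string into an iterable of tokens
    (for example jieba.cut); it is passed in so the sketch has no
    hard dependency on jieba.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop tokens that are pure whitespace
        tokens = [tok for tok in cut(line) if tok.strip()]
        if tokens:
            yield " ".join(tokens)


# Hypothetical usage with the article's paths (assumes UTF-8 files):
#
# import jieba
# with open("./corpus/comment.txt", encoding="utf-8") as src, \
#         open("./corpus/afterSeg.txt", "w", encoding="utf-8") as dst:
#     for sentence in segment_lines(src, jieba.cut):
#         dst.write(sentence + "\n")
# sentences = word2vec.LineSentence("./corpus/afterSeg.txt")
```

Whether this helps depends on the corpus: for short comments it keeps the context window from spanning unrelated comments.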

The complete project is on my GitHub: https://github.com/n2meetu/word2vec.git

Run output (shown as a screenshot in the original post):

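The required packages can be installed through PyCharm's Preferences:

Click the "+" button

Likewise, click the "+" button. After a moment it will tell you whether the installation succeeded.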
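
Outside PyCharm, the same two packages can be installed with `pip install jieba gensim`. Either way, a quick check like the following confirms that the interpreter can actually find them. This is a minimal sketch; `missing_packages` is a hypothetical helper name, not part of the project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that are not importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party packages the main script imports
    missing = missing_packages(["jieba", "gensim"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("jieba and gensim are both available")
```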

The overall file structure:

-Corpus (crawled from the web)
-Custom dictionary
-Main Python file
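
The main script expects the corpus and the custom dictionary at fixed relative paths, so it can be handy to verify the layout before running it. The sketch below takes the two paths from the `open()` and `load_userdict()` calls in the script; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names.

```python
from pathlib import Path

# Relative paths read by the main script
REQUIRED_INPUTS = ("corpus/comment.txt", "user_dict/userdict_food.txt")


def missing_inputs(project_root="."):
    """Return the required input files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_inputs():
        print("Missing input file:", rel)
```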

The main .py file:

# -*- coding: utf-8 -*-
import jieba
from gensim.models import word2vec

# Strip noise characters from the raw comments.
# Note: ordinary commas and full stops are not actually removed here.
def clearSen(comment):
    comment = comment.strip(' ')
    comment = comment.replace('、','')
    comment = comment.replace('~','。')   # ASCII tilde becomes a full stop
    comment = comment.replace('～','')    # full-width tilde is dropped
    comment = comment.replace('{"error_message": "EMPTY SENTENCE"}','')
    comment = comment.replace('…','')
    comment = comment.replace('\r', '')
    comment = comment.replace('\t', ' ')
    comment = comment.replace('\f', ' ')
    comment = comment.replace('/', '')
    comment = comment.replace(' ', '')    # removes every space, including those added above
    comment = comment.replace('_', '')
    comment = comment.replace('?', ' ')   # question marks become word breaks
    comment = comment.replace('？', ' ')
    comment = comment.replace('了', '')
    comment = comment.replace('➕', '')
    return comment

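# Segment the text into words with jieba (assumes the corpus file is UTF-8)
with open('./corpus/comment.txt', encoding='utf-8') as f:
    comment = f.read()
comment = clearSen(comment)
jieba.load_userdict('./user_dict/userdict_food.txt')  # custom dictionary
comment = ' '.join(jieba.cut(comment))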

# Save the segmented text to a new txt file
with open("./corpus/afterSeg.txt", "w", encoding="utf-8") as fo:
    fo.write(comment)
print("finished!")

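# Train with word2vec
sentences = word2vec.Text8Corpus('./corpus/afterSeg.txt')
# sentences: the training corpus. min_count: words that occur fewer times than
# this are dropped (default 5). vector_size: number of hidden-layer units, i.e.
# the vector dimensionality (default 100; named `size` before gensim 4.0).
model = word2vec.Word2Vec(sentences, min_count=3, vector_size=50, window=5, workers=4)

y2 = model.wv.similarity("不錯", "好吃")  # cosine similarity between the two words
print(y2)

for i in model.wv.most_similar("好吃"):  # the 10 words closest to "好吃" by cosine similarity
    print(i[0], i[1])

# The two parameters passed in when training the word vectors also have a large
# effect on the result and should be chosen to suit the corpus. Good word vectors
# matter for NLP tasks such as classification, clustering and similarity judgment.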
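
The similarity score printed above is the cosine of the angle between the two word vectors, so a larger value means the words are closer. To make that concrete, here is a minimal pure-Python sketch of the computation on toy vectors. It is an illustration only, not gensim's implementation, and it does not handle zero-length vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```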

Please don't laugh at the data-cleaning clearSen(). This is how a beginner does it: dumb and brute-force...
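
For what it's worth, the same cleanup could be written as a single lookup table instead of a chain of `replace()` calls. The sketch below reads the deleted and swapped characters off the calls in clearSen(); `clean_comment`, `DELETED`, `SWAPPED` and `TABLE` are hypothetical names. It should behave the same on ordinary text, though it is not guaranteed identical in odd cases where deleting a character would complete the error string.

```python
# Literal string that clearSen() also strips
ERROR_PAYLOAD = '{"error_message": "EMPTY SENTENCE"}'

# Characters deleted outright, and characters swapped for something else
DELETED = "、～…\r\t\f/ _了➕"
SWAPPED = {"~": "。", "?": " ", "？": " "}

# One table: deleted characters map to None, swapped ones to their replacement
TABLE = str.maketrans({**{ch: None for ch in DELETED}, **SWAPPED})


def clean_comment(comment):
    """Table-driven sketch of clearSen()."""
    # Remove the error string first, while its spaces and underscore are intact
    comment = comment.replace(ERROR_PAYLOAD, "")
    return comment.translate(TABLE)


print(clean_comment("好吃~ 不错?"))
```

A table makes it easier to see at a glance which characters are dropped and which are replaced, and it avoids accidental duplicate rules.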
