Deep Learning for NLP: Obtaining Word Vectors

This article is a repost. 2019-06-17 17:37. Category: Natural Language Processing (NLP)

1. Code

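import re

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

# Assumed to be defined elsewhere: `eng_stopwords` (a collection of English
# stopwords) and `model` (pretrained 300-dimensional word vectors that
# support `word in model` and `model[word]`).

def clean_text(text, remove_stopwords=False):
    """
    Clean the data: strip HTML, keep letters only, lowercase, and tokenize.
    """
    text = BeautifulSoup(text, 'html.parser').get_text()
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    words = text.lower().split()
    if remove_stopwords:
        words = [w for w in words if w not in eng_stopwords]
    return words

def to_review_vector(review):
    """
    Get the review vector: the mean of the vectors of its known words.
    """
    words = clean_text(review, remove_stopwords=True)
    word_vec = np.zeros(300)
    count = 0
    for word in words:
        # Skip words that are missing from the model's vocabulary
        if word in model:
            word_vec += model[word]
            count += 1
    # Divide the running sum by the number of known words to get the average;
    # a review with no known words keeps the all-zero vector.
    if count:
        word_vec /= count
    return pd.Series(word_vec)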
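The averaging idea behind `to_review_vector` can be tried out without a pretrained model. The sketch below is only an illustration under stated assumptions: `toy_model` is a hypothetical 3-dimensional lookup table standing in for the 300-dimensional `model`, and `average_vector` is a hypothetical helper name that does not appear in the original code. HTML stripping and stopword removal are left out to keep it self-contained.

```python
import re

import numpy as np

# Hypothetical toy embedding table standing in for a pretrained model
# (the article's code assumes 300 dimensions; 3 are used here for brevity).
toy_model = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}


def average_vector(text, vectors, dim):
    """Average the vectors of all in-vocabulary words in `text`."""
    # Same cleaning idea as clean_text: letters only, lowercase, split
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        # No known words: fall back to a zero vector
        return np.zeros(dim)
    return np.mean(known, axis=0)


vec = average_vector("A good movie!", toy_model, dim=3)
print(vec)
```

Because `to_review_vector` returns a `pd.Series`, applying it to a text column with something like `df["review"].apply(to_review_vector)` (where `df` and the column name are assumptions) would yield a DataFrame with one column per vector dimension, which can be fed to a classifier as features.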
