NLP Part-of-Speech Tagging in Python


1. Key Points

Covers part-of-speech (POS) tagging for both Chinese and English.
The main libraries used are nltk and jieba.
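Both libraries are available from PyPI (for example, pip install nltk jieba); the NLTK resources mentioned in the FAQ below are downloaded separately with nltk.download.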

2. Code

# coding=utf-8

import nltk
from nltk.corpus import stopwords
"""
標注步驟:
    1、清洗,分詞
    2、標注
    
FAQ:
    1、 Resource punkt not found.
        請安裝punkt模塊 
    2、安裝average_perceptron tagger
    3、Resource sinica_treebank not found
        請安裝sinica_treebank模塊
"""
def english_label():
    """
    English POS tagging
    :return:
    """
    # Tokenize the sample text
    text = "Sentiment analysis is a challenging subject in machine learning.\
     People express their emotions in language that is often obscured by sarcasm,\
      ambiguity, and plays on words, all of which could be very misleading for \
      both humans and computers.".lower()
    text_list = nltk.word_tokenize(text)
    # Remove punctuation tokens
    english_punctuations = [',', '.', ':', ';', '?', '(', ')', '[', ']', '&', '!', '*', '@', '#', '$', '%']
    text_list = [word for word in text_list if word not in english_punctuations]
    # Remove English stopwords
    stops = set(stopwords.words("english"))
    text_list = [word for word in text_list if word not in stops]

    tagged = nltk.pos_tag(text_list)  # POS-tag: returns a list of (token, Penn Treebank tag) pairs
    print(tagged)


def chinese_label():
    """
    Chinese POS tagging.
    fool (foolnltk) can also be used for Chinese POS tagging, and HanLP has
    its own POS tagset; this example uses jieba.
    :return:
    """
    import jieba.posseg as pseg
    import re

    text = "我愛你,是粉色,舒服 ,舒服,士大夫"
    # Replace fullwidth commas with spaces before tagging
    cleaned = re.sub(r'[,]', " ", text)
    posseg_list = pseg.cut(cleaned)  # generator of (word, flag) pairs
    print(' '.join('%s/%s' % (word, tag) for (word, tag) in posseg_list))
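
To run both examples as a script, a standard entry point can be appended (a minimal sketch; the original code defines the functions but never calls them):

if __name__ == '__main__':
    english_label()
    chinese_label()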

 

