NLP (12): Coreference Resolution


Original link: http://www.one2know.cn/nlp12/

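  • Pronouns stand in for nouns that would otherwise be repeated.
    Examples:
    1. Ravi is a boy. He often donates money to the poor.
    The subject appears first and the pronoun follows, so the reference runs from left to right. Sentences like this are called anaphora.
    2. He was already on his way to airport. Realized Ravi.
    Here the order is reversed: the pronoun comes before the noun it refers to. Sentences like this are called cataphora.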
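To make the two directions concrete, here is a minimal sketch that labels a sentence as anaphora or cataphora by comparing token positions. The helper `reference_direction` and the `PRONOUNS` set are hypothetical names introduced for illustration, and the sketch assumes the antecedent is already known and the text is whitespace-tokenized.

```python
# Hypothetical pronoun list for illustration only; it is not exhaustive.
PRONOUNS = {"he", "she", "him", "her", "they", "them", "it"}

def reference_direction(tokens, antecedent):
    """Return 'anaphora' if the first pronoun follows the antecedent,
    'cataphora' if it precedes it, or None if either one is missing."""
    lowered = [t.lower() for t in tokens]
    if antecedent.lower() not in lowered:
        return None
    name_pos = lowered.index(antecedent.lower())
    # Position of the first token that is a known pronoun, if any
    pronoun_pos = next((i for i, t in enumerate(lowered) if t in PRONOUNS), None)
    if pronoun_pos is None:
        return None
    return "anaphora" if pronoun_pos > name_pos else "cataphora"

s1 = "Ravi is a boy . He often donates money to the poor .".split()
s2 = "He was already on his way to airport . Realized Ravi .".split()
print(reference_direction(s1, "Ravi"))  # anaphora
print(reference_direction(s2, "Ravi"))  # cataphora
```

This only captures the ordering idea. Deciding which noun a pronoun actually refers to is the harder part.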
  • Code
import nltk
from nltk.chunk import tree2conlltags
from nltk.corpus import names  # corpus of first names labeled by gender
import random

class AnaphoraExample:
    def __init__(self):  # no arguments are needed to construct the object
        males = [(name, 'male') for name in names.words('male.txt')]
        females = [(name, 'female') for name in names.words('female.txt')]
        combined = males + females  # list of (name, gender) tuples
        random.shuffle(combined)
        training = [(self.feature(name), gender) for (name, gender) in combined]
        self._classifier = nltk.NaiveBayesClassifier.train(training)  # gender classifier

    def feature(self, word):  # use the last letter of the word as the only feature
        return {'last(1)': word[-1]}

    def gender(self, word):  # return the gender label the classifier predicts for the word
        return self._classifier.classify(self.feature(word))

    def learnAnaphora(self):
        sentences = [
            "John is a man. He walks",
            "John and Mary are married. They have two kids",
            "In order for Ravi to be successful, he should follow John",
            "John met Mary in Barista. She asked him to order a Pizza",
        ]

        for sent in sentences:
            # Tokenize, POS-tag, and run named-entity chunking; the result is a chunk tree
            chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)), binary=False)
            stack = []
            print(sent)
            items = tree2conlltags(chunks)  # flatten the tree into (word, POS tag, IOB tag) triples
            for item in items:
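                # Proper noun that is a person's name; 'O' (the letter) is the IOB tag for
                # tokens outside any chunk, so names the chunker missed are still collected
                if item[1] == 'NNP' and (item[2] == 'B-PERSON' or item[2] == 'O'):
                    stack.append((item[0], self.gender(item[0])))  # (name, predicted gender)
                elif item[1] == 'CC':  # coordinating conjunction
                    stack.append(item[0])
                elif item[1] == 'PRP':  # personal pronoun
                    stack.append(item[0])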
            print('\t{}'.format(stack))

if __name__ == "__main__":
    anaphora = AnaphoraExample()
    anaphora.learnAnaphora()

Output:

John is a man. He walks
	[('John', 'male'), 'He']
John and Mary are married. They have two kids
	[('John', 'male'), 'and', ('Mary', 'female'), 'They']
In order for Ravi to be successful, he should follow John
	[('Ravi', 'female'), 'he', ('John', 'male')]
John met Mary in Barista. She asked him to order a Pizza
	[('John', 'male'), ('Mary', 'female'), 'She', 'him']
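
The code above only collects names, conjunctions, and pronouns; it does not link a pronoun to a name. One possible next step is sketched here under simple assumptions: a singular pronoun is linked to the nearest preceding name of the same gender, falling back to the nearest following one, and a plural pronoun is linked to all preceding names. The `PRONOUN_INFO` table and the `resolve` function are hypothetical and are not part of the original code.

```python
# Hypothetical pronoun table; gender and number values are assumptions for illustration.
PRONOUN_INFO = {
    "he": "male", "him": "male",
    "she": "female", "her": "female",
    "they": "plural", "them": "plural",
}

def resolve(stack):
    """Pair each pronoun in `stack` with a list of candidate names.

    `stack` mixes (name, gender) tuples with plain strings
    (conjunctions and pronouns), in the shape printed above.
    """
    names = [(i, item) for i, item in enumerate(stack) if isinstance(item, tuple)]
    resolved = []
    for i, item in enumerate(stack):
        if isinstance(item, tuple):
            continue
        kind = PRONOUN_INFO.get(item.lower())
        if kind is None:  # conjunctions such as 'and' are skipped
            continue
        before = [n for pos, n in names if pos < i]
        after = [n for pos, n in names if pos > i]
        if kind == "plural":
            matches = [name for name, _ in before]
        else:
            # Nearest preceding name with the same gender (anaphora) ...
            same = [name for name, g in reversed(before) if g == kind]
            # ... otherwise the nearest following one (cataphora)
            if not same:
                same = [name for name, g in after if g == kind]
            matches = same[:1]
        resolved.append((item, matches))
    return resolved

stacks = [
    [('John', 'male'), 'He'],
    [('John', 'male'), 'and', ('Mary', 'female'), 'They'],
    [('Ravi', 'female'), 'he', ('John', 'male')],
    [('John', 'male'), ('Mary', 'female'), 'She', 'him'],
]
for stack in stacks:
    print(resolve(stack))
```

With the third stack, 'he' is linked to John rather than Ravi, because the last-letter classifier labeled Ravi as female. An error in the gender step carries straight through to the resolution step.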

