HanLP Study Notes 7: Dependency Parsing

Reposted article. 2020-01-27 13:02. Category: Natural Language Processing

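1. Concepts:

Dependent: a word that modifies another word.

Head: the word being modified.

Dependency relation: the grammatical relation between a dependent and its head.

Dependency tree: drawing every dependency relation in a sentence as a directed edge yields a tree.

Dependency treebank: a corpus made up of a large number of manually annotated dependency trees.

Dependency parsing: a mid-to-high-level NLP task that analyzes the dependency structure of a sentence. Its input is usually the words and their part-of-speech tags, and its output is a dependency tree.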
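
To make these definitions concrete, here is a minimal sketch of one common way to store a dependency tree: a list of head indices with a virtual ROOT at position 0. The three-word sentence 人 吃 魚 comes from the training script later in this post; the relation labels (`nsubj`, `dobj`, `root`) and the helper names are assumptions chosen for illustration, not HanLP output.

```python
# Hand-annotated toy tree for "人 吃 魚"; index 0 is a virtual ROOT node.
words = ["ROOT", "人", "吃", "魚"]
heads = [None, 2, 0, 2]                      # heads[i] = index of the head of word i
deprels = [None, "nsubj", "root", "dobj"]    # assumed labels, for illustration only


def is_tree(heads):
    """Return True if every word reaches ROOT (index 0) without a cycle."""
    for start in range(1, len(heads)):
        seen = set()
        node = start
        while node != 0:
            if node in seen:        # we came back to a visited word: cycle
                return False
            seen.add(node)
            node = heads[node]
    return True


def dependents(heads, head_index):
    """Return the indices of all words whose head is head_index."""
    return [i for i in range(1, len(heads)) if heads[i] == head_index]


# Print each dependency as "dependent --relation--> head".
for i in range(1, len(words)):
    print(f"{words[i]} --{deprels[i]}--> {words[heads[i]]}")
```

Each word has exactly one head, so a single array is enough to describe the whole tree; a treebank is then just many such annotated sentences.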

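2. Transition-based dependency parsing workflow:

The construction of a dependency tree can be expressed as a sequence of actions. If a machine learning model can accurately predict these actions from features of the sentence, the computer can assemble the correct dependency tree by following them. Each such assembly action is called a transition.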

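a. Define the transition system

The transition system (a virtual machine) predicts the next transition action to execute based on its own state and the input words, and finally assembles the parse tree from the sequence of transition actions.

The transition system is mainly responsible for defining all executable actions and the conditions under which each may be applied.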
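
As an illustration of "actions plus conditions", the following is a minimal sketch of the textbook arc-eager transition system (the parser class used later in this post is named `KBeamArcEagerDependencyParser`). It is unlabeled and simplified, and the names `State`, `can_apply`, and `apply` are hypothetical; it is not HanLP's implementation.

```python
class State:
    """Parser configuration: a stack, a buffer, and the arcs built so far."""

    def __init__(self, n_words):
        self.stack = [0]                            # 0 is the virtual ROOT
        self.buffer = list(range(1, n_words + 1))   # word indices, left to right
        self.heads = {}                             # dependent index -> head index


def can_apply(state, action):
    """Precondition of each action."""
    top = state.stack[-1] if state.stack else None
    if action == "SHIFT":       # move the buffer front onto the stack
        return bool(state.buffer)
    if action == "LEFT-ARC":    # stack top becomes a dependent of the buffer front
        return bool(state.buffer) and top not in (None, 0) and top not in state.heads
    if action == "RIGHT-ARC":   # buffer front becomes a dependent of the stack top
        return bool(state.buffer) and top is not None
    if action == "REDUCE":      # pop a word that already has its head
        return top in state.heads
    return False


def apply(state, action):
    """Execute one action, changing the state in place."""
    if not can_apply(state, action):
        raise ValueError(f"{action} is not allowed in this state")
    if action == "SHIFT":
        state.stack.append(state.buffer.pop(0))
    elif action == "LEFT-ARC":
        dep = state.stack.pop()
        state.heads[dep] = state.buffer[0]
    elif action == "RIGHT-ARC":
        dep = state.buffer.pop(0)
        state.heads[dep] = state.stack[-1]
        state.stack.append(dep)
    else:  # REDUCE
        state.stack.pop()


# Assemble the tree of "人 吃 魚" from a hand-written action sequence.
words = ["ROOT", "人", "吃", "魚"]
state = State(3)
for action in ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]:
    apply(state, action)
for dep, head in sorted(state.heads.items()):
    print(f"{words[head]} -> {words[dep]}")
```

The preconditions are what keep the result a tree: LEFT-ARC refuses to give a word a second head, and ROOT can never become a dependent.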

b. Feature extraction

Once features are defined, a state of the transition system is represented as a sparse binary vector.
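
The sketch below shows what "sparse binary vector" can mean in practice: each feature template that fires in a state switches on one dimension, and only the indices of the active dimensions are stored. The five templates, the part-of-speech tags, and the names `extract_features` and `FeatureIndex` are assumptions for illustration; real parsers use far richer templates.

```python
def extract_features(words, tags, stack, buffer):
    """Return the names of the features that fire in this state."""
    s0 = stack[-1] if stack else None     # top of the stack
    b0 = buffer[0] if buffer else None    # front of the buffer
    s0_w = words[s0] if s0 is not None else "<none>"
    s0_t = tags[s0] if s0 is not None else "<none>"
    b0_w = words[b0] if b0 is not None else "<none>"
    b0_t = tags[b0] if b0 is not None else "<none>"
    return [
        f"s0.w={s0_w}",
        f"s0.t={s0_t}",
        f"b0.w={b0_w}",
        f"b0.t={b0_t}",
        f"s0.t+b0.t={s0_t}+{b0_t}",
    ]


class FeatureIndex:
    """Maps feature names to integer dimensions of the vector."""

    def __init__(self):
        self.ids = {}

    def encode(self, names):
        # setdefault assigns the next free dimension to an unseen feature name.
        return sorted(self.ids.setdefault(name, len(self.ids)) for name in names)


words = ["ROOT", "人", "吃", "魚"]
tags = ["ROOT", "NN", "VV", "NN"]          # assumed tags for the toy sentence

index = FeatureIndex()
active1 = index.encode(extract_features(words, tags, [0, 1], [2, 3]))
active2 = index.encode(extract_features(words, tags, [0], [2, 3]))

# The sparse form keeps only the active indices; the dense form is mostly zeros.
dense2 = [1 if i in active2 else 0 for i in range(len(index.ids))]
print(active1)
print(active2)
print(dense2)
```

With a realistic vocabulary the vector has a very large number of dimensions but only a handful are 1 in any state, which is why the index-list representation is preferred.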

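c. Oracle:

Convert the dependency trees in the corpus into correct sequences of transition actions for the machine learning model to learn from.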
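
A minimal sketch of this conversion, assuming the arc-eager transition system and a projective gold tree, is the standard static oracle below. The function name `oracle_actions` and the dictionary format for gold heads are hypothetical; this is not the routine HanLP uses internally.

```python
def oracle_actions(gold_heads):
    """Derive an arc-eager action sequence from a gold tree.

    gold_heads maps each word index (1..n) to its head index; 0 is ROOT.
    Returns the action list and the heads rebuilt by replaying it.
    """
    n = len(gold_heads)
    stack, buffer = [0], list(range(1, n + 1))
    heads, actions = {}, []
    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and gold_heads[s] == b:
            # The buffer front is the gold head of the stack top.
            actions.append("LEFT-ARC")
            heads[s] = b
            stack.pop()
        elif gold_heads[b] == s:
            # The stack top is the gold head of the buffer front.
            actions.append("RIGHT-ARC")
            heads[b] = s
            stack.append(buffer.pop(0))
        elif s in heads and any(
            gold_heads[b] == k or gold_heads.get(k) == b for k in range(s)
        ):
            # The buffer front still needs an arc with a word below the stack top.
            actions.append("REDUCE")
            stack.pop()
        else:
            actions.append("SHIFT")
            stack.append(buffer.pop(0))
    return actions, heads


# Gold tree of "人 吃 魚": 人 <- 吃, ROOT -> 吃, 吃 -> 魚.
gold = {1: 2, 2: 0, 3: 2}
actions, rebuilt = oracle_actions(gold)
print(actions)

# A four-word tree in which words 3 and 4 both depend on word 2 needs a REDUCE.
gold4 = {1: 2, 2: 0, 3: 2, 4: 2}
actions4, rebuilt4 = oracle_actions(gold4)
print(actions4)
```

Replaying the returned actions rebuilds the gold tree, so each (state, action) pair along the way can serve as a training example for the classifier.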

d. Classifier predicts the transition actions
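
To show how a classifier can learn to predict transition actions from sparse binary features, here is a minimal multiclass perceptron sketch. It is a deliberate simplification (greedy, no weight averaging, no beam search), and the class name, feature strings, and four training pairs are assumptions made up for the toy sentence 人 吃 魚.

```python
ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "REDUCE"]


class Perceptron:
    """Multiclass perceptron over sparse binary features."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.weights = {}   # (feature name, action) -> weight

    def score(self, feats, action):
        # Binary features: the score is the sum of the weights of active features.
        return sum(self.weights.get((f, action), 0.0) for f in feats)

    def predict(self, feats):
        # Ties are broken in favour of the action listed first.
        return max(self.actions, key=lambda a: self.score(feats, a))

    def update(self, feats, gold):
        """Standard perceptron update: reward gold, penalize the wrong guess."""
        guess = self.predict(feats)
        if guess != gold:
            for f in feats:
                self.weights[(f, gold)] = self.weights.get((f, gold), 0.0) + 1.0
                self.weights[(f, guess)] = self.weights.get((f, guess), 0.0) - 1.0
        return guess


# (active features of a state, gold action) pairs for the toy sentence.
data = [
    (["s0.t=ROOT", "b0.t=NN"], "SHIFT"),
    (["s0.t=NN", "b0.t=VV"], "LEFT-ARC"),
    (["s0.t=ROOT", "b0.t=VV"], "RIGHT-ARC"),
    (["s0.t=VV", "b0.t=NN"], "RIGHT-ARC"),
]

model = Perceptron(ACTIONS)
for epoch in range(5):
    for feats, gold in data:
        model.update(feats, gold)

predictions = [model.predict(feats) for feats, _ in data]
print(predictions)
```

At parsing time the predicted action would additionally be restricted to those the transition system allows in the current state.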

3. Code:

Training the model

# -*- coding:utf-8 -*-
# Author: hankcs
# Date: 2019-02-11 23:18
# 《自然語言處理入門》 (Introduction to Natural Language Processing) 12.5.1 Training the model
# Companion book: http://nlp.hankcs.com/book.php
# Discussion and Q&A: https://bbs.hankcs.com/

from pyhanlp import *
import zipfile
import os

from pyhanlp.static import download, remove_file, HANLP_DATA_PATH


def test_data_path():
    """
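    Return the test data path, located at $root/data/test; the root directory is set in the configuration file.
    :return: path to the test data directory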
    """
    data_path = os.path.join(HANLP_DATA_PATH, 'test')
    if not os.path.isdir(data_path):
        os.mkdir(data_path)
    return data_path


def ensure_data(data_name, data_url):
    root_path = test_data_path()
    dest_path = os.path.join(root_path, data_name)
    if os.path.exists(dest_path):
        return dest_path
    if data_url.endswith('.zip'):
        dest_path += '.zip'
    download(data_url, dest_path)
    if data_url.endswith('.zip'):
        with zipfile.ZipFile(dest_path, "r") as archive:
            archive.extractall(root_path)
        remove_file(dest_path)
        dest_path = dest_path[:-len('.zip')]
    return dest_path

KBeamArcEagerDependencyParser = JClass('com.hankcs.hanlp.dependency.perceptron.parser.KBeamArcEagerDependencyParser')
CTB_ROOT = ensure_data("ctb8.0-dep", "http://file.hankcs.com/corpus/ctb8.0-dep.zip")
CTB_TRAIN = CTB_ROOT + "/train.conll"  # training set
CTB_DEV = CTB_ROOT + "/dev.conll"  # development set
CTB_TEST = CTB_ROOT + "/test.conll"  # test set
CTB_MODEL = CTB_ROOT + "/ctb.bin"  # model
BROWN_CLUSTER = ensure_data("wiki-cn-cluster.txt", "http://file.hankcs.com/corpus/wiki-cn-cluster.zip")  # word cluster file

if __name__ == '__main__':
    parser = KBeamArcEagerDependencyParser.train(CTB_TRAIN, CTB_DEV, BROWN_CLUSTER, CTB_MODEL)
    print(parser.parse("人吃魚"))
    score = parser.evaluate(CTB_TEST)
    print("UAS=%.1f LAS=%.1f\n" % (score[0], score[1]))
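
The script above prints UAS and LAS. These are the unlabeled and labeled attachment scores: the percentage of words that receive the correct head, and the percentage that receive both the correct head and the correct relation label. The sketch below computes them for a toy prediction; the function name and the data are hypothetical, and whether HanLP's `evaluate` applies extra conventions (for example skipping punctuation) is not stated in this post, so treat this as the plain definition only.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent.

    gold and pred are equal-length lists of (head, deprel) pairs, one per word.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    total = len(gold)
    # UAS counts a word as correct if its head matches, ignoring the label.
    uas_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, pred) if g_head == p_head)
    # LAS additionally requires the relation label to match.
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Toy example: all heads are right, but the label of the third word is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj")]
uas, las = attachment_scores(gold, pred)
print("UAS=%.1f LAS=%.1f" % (uas, las))
```

LAS can never exceed UAS, because a word with the wrong head counts as wrong for both.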

Opinion extraction example

# -*- coding:utf-8 -*-
# Author: hankcs
# Date: 2019-06-02 18:03
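# 《自然語言處理入門》 (Introduction to Natural Language Processing) 12.6 Case study: opinion extraction based on dependency trees
# Companion book: http://nlp.hankcs.com/book.php
# Discussion and Q&A: https://bbs.hankcs.com/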

from pyhanlp import *

CoNLLSentence = JClass('com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLSentence')
CoNLLWord = JClass('com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord')
IDependencyParser = JClass('com.hankcs.hanlp.dependency.IDependencyParser')
KBeamArcEagerDependencyParser = JClass('com.hankcs.hanlp.dependency.perceptron.parser.KBeamArcEagerDependencyParser')


def main():
    parser = KBeamArcEagerDependencyParser()
    tree = parser.parse("電池非常棒，機身不長，長的是待機，但是屏幕分辨率不高。")
    print(tree)
    print("Version 1")
    extactOpinion1(tree)
    print("Version 2")
    extactOpinion2(tree)
    print("Version 3")
    extactOpinion3(tree)


def extactOpinion1(tree):
    # Version 1: pair every noun subject with its head, the word that describes it.
    for word in tree.iterator():
        if word.POSTAG == "NN" and word.DEPREL == "nsubj":
            print("%s = %s" % (word.LEMMA, word.HEAD.LEMMA))


def extactOpinion2(tree):
    # Version 2: as version 1, but prefix the opinion with 不 when the head has a "neg" child.
    for word in tree.iterator():
        if word.POSTAG == "NN" and word.DEPREL == "nsubj":
            if tree.findChildren(word.HEAD, "neg").isEmpty():
                print("%s = %s" % (word.LEMMA, word.HEAD.LEMMA))
            else:
                print("%s = 不%s" % (word.LEMMA, word.HEAD.LEMMA))


def extactOpinion3(tree):
    # Version 3: as version 2, and also pair an "attr" noun with the "top" (topic) child of its head.
    for word in tree.iterator():
        if word.POSTAG == "NN":
            if word.DEPREL == "nsubj":  # ① attribute
                if tree.findChildren(word.HEAD, "neg").isEmpty():
                    print("%s = %s" % (word.LEMMA, word.HEAD.LEMMA))
                else:
                    print("%s = 不%s" % (word.LEMMA, word.HEAD.LEMMA))
            elif word.DEPREL == "attr":
                top = tree.findChildren(word.HEAD, "top")  # ② topic
                if not top.isEmpty():
                    print("%s = %s" % (word.LEMMA, top.get(0).LEMMA))


if __name__ == '__main__':
    main()
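
For readers without a trained model at hand, the negation rule of the second version can be tried on a hand-written tree. The sketch below is pure Python and does not call HanLP; the `Word` tuple, the helper names, and above all the tree itself (tags, heads, and labels for 電池 非常 棒 ， 機身 不 長) are a hypothetical analysis written by hand, not actual parser output.

```python
from collections import namedtuple

Word = namedtuple("Word", "id form postag head deprel")

# Hand-written, hypothetical analysis; a real parser may label this differently.
tree = [
    Word(1, "電池", "NN", 3, "nsubj"),
    Word(2, "非常", "AD", 3, "advmod"),
    Word(3, "棒", "VA", 0, "root"),
    Word(4, "，", "PU", 3, "punct"),
    Word(5, "機身", "NN", 7, "nsubj"),
    Word(6, "不", "AD", 7, "neg"),
    Word(7, "長", "VA", 3, "conj"),
]


def find_children(tree, head_id, deprel):
    """Return the words attached to head_id with the given relation."""
    return [w for w in tree if w.head == head_id and w.deprel == deprel]


def extract_opinions(tree):
    """Pair each noun subject with its head, adding 不 if the head is negated."""
    by_id = {w.id: w for w in tree}
    opinions = []
    for w in tree:
        if w.postag == "NN" and w.deprel == "nsubj":
            head = by_id[w.head]
            negated = bool(find_children(tree, head.id, "neg"))
            opinions.append((w.form, ("不" if negated else "") + head.form))
    return opinions


opinions = extract_opinions(tree)
for attribute, opinion in opinions:
    print("%s = %s" % (attribute, opinion))
```

The rule only looks at the subject, its head, and the head's `neg` child, which is why the adverb 非常 does not appear in the extracted opinion.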
