Natural-language understanding (NLU)


Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension.

Natural-language understanding is considered an AI-hard problem.

There is considerable commercial interest in the field because of its application to automated reasoning, machine translation, question answering, news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis.

History

The program STUDENT, written in 1964 by Daniel Bobrow for his PhD dissertation at MIT, is one of the earliest known attempts at natural-language understanding by a computer.

Eight years after John McCarthy coined the term artificial intelligence, Bobrow's dissertation (titled Natural Language Input for a Computer Problem Solving System) showed how a computer could understand simple natural language input to solve algebra word problems.

A year later, in 1965, Joseph Weizenbaum at MIT wrote ELIZA, an interactive program that carried on a dialogue in English on any topic, the most popular being psychotherapy.

ELIZA worked by simple parsing and substitution of key words into canned phrases, and Weizenbaum sidestepped the problem of giving the program a database of real-world knowledge or a rich lexicon.

Yet ELIZA gained surprising popularity as a toy project and can be seen as a very early precursor to current commercial systems such as those used by Ask.com.

In 1969 Roger Schank at Stanford University introduced the conceptual dependency theory for natural-language understanding.

This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner.

In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input.

Instead of phrase structure rules, ATNs used an equivalent set of finite state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years.

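To make the mechanism concrete, the following is a minimal Python sketch of a recursive-transition-network recognizer in the spirit of ATNs (full ATNs also attach registers and tests to arcs, which are omitted here); the toy lexicon and the two networks are invented for illustration.

```python
# A minimal recursive-transition-network recognizer in the spirit of ATNs.
# Only the recursive "push" into sub-networks that replaces phrase
# structure rules is modeled; registers and tests are omitted.

LEXICON = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
           "sees": "V", "chases": "V"}

# Each network maps a state to its arcs. An arc label is either a word
# category (consume one word), the name of another network (recursive
# call), or "POP" (accept and return to the caller).
NETWORKS = {
    "S":  {0: [("NP", 1)], 1: [("V", 2)], 2: [("NP", 3)], 3: [("POP", None)]},
    "NP": {0: [("DET", 1)], 1: [("N", 2)], 2: [("POP", None)]},
}

def traverse(net, state, words, pos):
    """Return every word position reachable after traversing `net`."""
    reachable = []
    for label, nxt in NETWORKS[net][state]:
        if label == "POP":
            reachable.append(pos)
        elif label in NETWORKS:  # push: recursively call the sub-network
            for p in traverse(label, 0, words, pos):
                reachable.extend(traverse(net, nxt, words, p))
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            reachable.extend(traverse(net, nxt, words, pos + 1))
    return reachable

sentence = "the dog chases a cat".split()
print(len(sentence) in traverse("S", 0, sentence, 0))  # True: accepted
```
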
In 1971 Terry Winograd finished writing SHRDLU for his PhD thesis at MIT.

SHRDLU could understand simple English sentences in a restricted world of children's blocks to direct a robotic arm to move items. The successful demonstration of SHRDLU provided significant momentum for continued research in the field.

Winograd continued to be a major influence in the field with the publication of his book Language as a Cognitive Process.

At Stanford, Winograd would later advise Larry Page, who co-founded Google.

In the 1970s and 1980s the natural language processing group at SRI International continued research and development in the field.

A number of commercial efforts based on the research were undertaken, e.g., in 1982 Gary Hendrix formed Symantec Corporation originally as a company for developing a natural language interface for database queries on personal computers.

However, with the advent of mouse-driven graphical user interfaces, Symantec changed direction.

A number of other commercial efforts were started around the same time, e.g., Larry R. Harris at the Artificial Intelligence Corporation and Roger Schank and his students at Cognitive Systems Corp.

In 1983, Michael Dyer developed the BORIS system at Yale which bore similarities to the work of Roger Schank and W. G. Lehnert.

The third millennium saw the introduction of systems using machine learning for text classification, such as IBM Watson.

However, experts debate how much "understanding" such systems demonstrate: e.g., according to John Searle, Watson did not even understand the questions.

John Ball, cognitive scientist and inventor of Patom Theory, supports this assessment.

Natural language processing has made inroads into applications that support human productivity in service and e-commerce, but this has largely been made possible by narrowing the scope of the application.

There are thousands of ways to request something in a human language that still defy conventional natural language processing.

"To have a meaningful conversation with machines is only possible when we match every word to the correct meaning based on the meanings of the other words in the sentence – just like a 3-year-old does without guesswork."

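As a toy illustration of matching a word to a sense via the surrounding words, here is a simplified Lesk-style gloss-overlap heuristic in Python; the two-sense inventory for "bank" is invented, and this is not a rendering of Ball's Patom Theory.

```python
# Toy word-sense disambiguation by gloss overlap (a simplified Lesk-style
# heuristic). The sense inventory below is invented for illustration.

SENSES = {
    "bank": {
        "financial": "institution that accepts deposits money loan account",
        "river": "sloping land beside a river water edge shore",
    }
}

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = set(sentence.lower().split())
    overlap = lambda gloss: len(context & set(gloss.split()))
    return max(SENSES[word], key=lambda s: overlap(SENSES[word][s]))

print(disambiguate("bank", "she opened an account at the bank"))      # financial
print(disambiguate("bank", "they fished from the bank of the river")) # river
```
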
Scope and context

The umbrella term "natural-language understanding" can be applied to a diverse set of computer applications, ranging from small, relatively simple tasks such as short commands issued to robots, to highly complex endeavors such as the full comprehension of newspaper articles or poetry passages.

Many real-world applications fall between the two extremes. For instance, text classification for the automatic analysis of emails and their routing to a suitable department in a corporation does not require an in-depth understanding of the text, but it needs to deal with a much larger vocabulary and more diverse syntax than the management of simple queries to database tables with fixed schemata.

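As a sketch of such shallow-but-broad processing, the following routes emails with a bag-of-words classifier using scikit-learn (assumed installed); the four training emails and the two department labels are invented, and a production system would of course need far more data.

```python
# Shallow email routing by text classification: no deep understanding,
# just word statistics. Training data below is invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "invoice attached please process the payment",
    "refund for my last order has not arrived",
    "the app crashes when I open the settings page",
    "cannot log in after installing the latest update",
]
departments = ["billing", "billing", "support", "support"]

router = make_pipeline(TfidfVectorizer(), MultinomialNB())
router.fit(emails, departments)

print(router.predict(["my payment was charged twice"]))  # likely ['billing']
```
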
Throughout the years, various attempts at processing natural language or English-like sentences presented to computers have taken place at varying degrees of complexity.

Some attempts have not resulted in systems with deep understanding, but have helped overall system usability.

For example, Wayne Ratliff originally developed the Vulcan program with an English-like syntax to mimic the English speaking computer in Star Trek.

Vulcan later became the dBase system whose easy-to-use syntax effectively launched the personal computer database industry.

Systems with an easy-to-use or English-like syntax are, however, quite distinct from systems that use a rich lexicon and include an internal representation (often as first-order logic) of the semantics of natural language sentences.

Hence the breadth and depth of "understanding" aimed at by a system determine both the complexity of the system (and the implied challenges) and the types of applications it can deal with.

The "breadth" of a system is measured by the sizes of its vocabulary and grammar.

The "depth" is measured by the degree to which its understanding approximates that of a fluent native speaker.

At the narrowest and shallowest, English-like command interpreters require minimal complexity, but have a small range of applications.

Narrow but deep systems explore and model mechanisms of understanding, but they still have limited application.

Systems that attempt to understand the contents of a document such as a news release beyond simple keyword matching and to judge its suitability for a user are broader and require significant complexity, but they are still somewhat shallow.

Systems that are both very broad and very deep are beyond the current state of the art.

Components and architecture

Regardless of the approach used, most natural-language-understanding systems share some common components.

The system needs a lexicon of the language and a parser and grammar rules to break sentences into an internal representation.

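A minimal sketch of this pipeline, using NLTK (assumed installed) and a toy grammar invented for illustration, breaks a sentence into a parse tree as its internal representation:

```python
# Parse a sentence into a tree (an internal representation) with a toy
# context-free grammar. Grammar and lexicon are invented for illustration.

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N -> 'robot' | 'block'
    V -> 'moves'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the robot moves a block".split()):
    tree.pretty_print()  # S over NP (the robot) and VP (moves a block)
```
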
The construction of a rich lexicon with a suitable ontology requires significant effort, e.g., the WordNet lexicon required many person-years of effort.

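For a sense of what such a lexicon provides, WordNet can be queried through NLTK; this assumes the nltk package is installed and the wordnet corpus has been downloaded with nltk.download('wordnet'):

```python
# Look up senses of "bank" in the WordNet lexicon via NLTK.
# Requires: pip install nltk, then nltk.download('wordnet') once.

from nltk.corpus import wordnet

for synset in wordnet.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())
```
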
The system also needs a theory of semantics to guide the comprehension.

The interpretation capabilities of a language-understanding system depend on the semantic theory it uses.

Competing semantic theories of language have specific trade-offs in their suitability as the basis of computer-automated semantic interpretation.

These range from naive semantics or stochastic semantic analysis to the use of pragmatics to derive meaning from context.

Semantic parsers convert natural-language texts into formal meaning representations.

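The following toy pattern-based parser sketches the idea on two invented sentence shapes, emitting predicate-logic-style strings; real semantic parsers are far more general.

```python
# A toy semantic parser: map two invented sentence patterns to
# predicate-logic-style meaning representations.

import re

def semantic_parse(sentence):
    m = re.fullmatch(r"every (\w+) (\w+)s", sentence)
    if m:  # universally quantified statement, e.g. "every dog barks"
        noun, verb = m.groups()
        return f"all x. {noun}(x) -> {verb}(x)"
    m = re.fullmatch(r"(\w+) is a (\w+)", sentence)
    if m:  # atomic assertion, e.g. "fido is a dog"
        name, noun = m.groups()
        return f"{noun}({name})"
    return None  # pattern not covered by this toy grammar

print(semantic_parse("every dog barks"))  # all x. dog(x) -> bark(x)
print(semantic_parse("fido is a dog"))    # dog(fido)
```
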
Advanced applications of natural-language understanding also attempt to incorporate logical inference within their framework.

This is generally achieved by mapping the derived meaning into a set of assertions in predicate logic, then using logical deduction to arrive at conclusions.

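A minimal sketch of the deduction step, with invented facts and Horn-style rules and a naive forward-chaining loop in Python (single uppercase letters act as variables):

```python
# Naive forward chaining over predicate-logic assertions. Facts and rules
# are invented for illustration; single uppercase letters are variables.

facts = {("dog", "fido"), ("owns", "mary", "fido")}
rules = [
    ([("dog", "X")], ("animal", "X")),
    ([("owns", "P", "X"), ("animal", "X")], ("pet_owner", "P")),
]

def unify(pattern, fact, bindings):
    """Match one premise against one fact, extending the bindings."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    b = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.isupper():               # variable: bind consistently
            if b.setdefault(p, f) != f:
                return None
        elif p != f:                  # constant: must match exactly
            return None
    return b

def match_all(premises, facts, bindings):
    """Yield every binding that satisfies all premises at once."""
    if not premises:
        yield bindings
        return
    for fact in list(facts):
        b = unify(premises[0], fact, bindings)
        if b is not None:
            yield from match_all(premises[1:], facts, b)

def forward_chain(facts, rules):
    """Apply rules until no new assertion can be derived."""
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for b in list(match_all(premises, facts, {})):
                new = (conclusion[0],) + tuple(
                    b[t] if t.isupper() else t for t in conclusion[1:])
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

print(forward_chain(facts, rules))
# derives ('animal', 'fido') and ('pet_owner', 'mary')
```
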
Therefore, systems based on functional languages such as Lisp need to include a subsystem to represent logical assertions, while logic-oriented systems such as those using the language Prolog generally rely on an extension of the built-in logical representation framework.

The management of context in natural-language understanding can present special challenges.

A large variety of examples and counter examples have resulted in multiple approaches to the formal modeling of context, each with specific strengths and weaknesses.
