
CSE 415 -- (c) S. Tanimoto, 2008 Natural Language Understanding

Natural Language Understanding

Outline:

• Motivation
• Structural vs Statistical Approaches
• Syntax
• Semantics
• Semantic grammars
• Augmented Transition Nets
• NLU in Closed Worlds: Operational Semantics
• The STONEWORLD program
• Statistical NLP

Motivation

Make it easier for people to give commands to computers.

Allow computers to perform language translation.

Allow computers to listen to lectures and read books, in order to alleviate the knowledge acquisition bottleneck.

Improve information retrieval services including search engines such as Google.

Integrate robots into human society.

Better understand human communication and linguistics.

Structural vs Statistical Approaches

Structural Approach:
Analytical approach based on the linguistic structure of language, esp. syntax as studied by Chomsky. Encompasses handcrafted lexical analyzers, parsers, semantic interpreters, and knowledge bases.
Example technique: Augmented Transition Nets based on semantic grammars.

Statistical Approach:
Grows out of the availability of large language corpora via the Internet, and improvements in machine learning technology.
Example technique: Latent Semantic Analysis

Levels of Analysis for NLU
(for both structural and statistical approaches)

(Read up from the acoustic level to the pragmatic level)

Pragmatic level (goals, intents, dialog, rhetorical structure, speech acts)

Semantic level (meaning, representation)

Syntactic level (grammar, phrase structure)

Lexical, Morphological level (words, inflections)

Phonological level (acoustic features -- phonemes)

Acoustic level (sensing, signal processing)

Syntax, Semantics, Pragmatics

By taking a more systematic approach to NLU at these levels (than was done in programs like ELIZA), we will be able to create more useful and reliable natural language interfaces.

Issues to resolve:

What is the ultimate purpose of language, and how does that influence NLU?

How can the phrase structure of natural language be captured in a grammar?

How can meaning be interpreted and represented?

How can the syntax and semantics of a system be designed to match the needs of an application?

Communicating with Language

Language is for communication.

Communication usually means sending and receiving information.

Sentences describe events, states of the world, objects and ideas, feelings and attitudes, and hypothetical situations.

Phrase-structure grammars provide a method of organizing the components of messages, allowing for a great variety of possible meanings.

Syntax

Describes the form, not meaning, of sentences in a language.

Syntax is traditionally described with formal systems called grammars.

A context-free grammar can be specified with 4 components:

G = (Σ, V, S, P)

where
Σ is a finite set of terminal symbols called the alphabet.
V is a finite set of nonterminal symbols (“syntactic categories,” e.g., noun, noun-phrase, clause, etc.)
S is a distinguished member of V called the start symbol (or “the initial sentential form”).
P is a finite set of productions (rewrite rules). Each production has the form A → b_0 b_1 ... b_{n-1}, where A is a nonterminal symbol and each b_i is either a terminal or nonterminal.

Example Grammar from a Formal Languages Context

G = ({0, 1}, {S, A, B}, S, P), where

P = {S → A
     S → B
     A → 0A0
     A → 1
     B → 1B1
     B → 0}

Example Grammar from a Computational Linguistics context

G = ({symbols, are, tools}, {S, N, V}, S, P), where

P = {S → NVN
     N → symbols
     N → tools
     V → are}

A derivation of a sentence from S:

S ⇒ NVN ⇒ tools VN ⇒ tools are N ⇒ tools are symbols

Each item in the sequence is a sentential form.
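The derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the dictionary encoding of P and the function name `leftmost_derivation` are assumptions made for illustration. It rewrites the leftmost nonterminal at each step and records every sentential form.

```python
# Grammar G = (SIGMA, NONTERMINALS, START, PRODUCTIONS) from the slide.
# The Python encoding (dict of lists) is an illustrative assumption.
SIGMA = {"symbols", "are", "tools"}
NONTERMINALS = {"S", "N", "V"}
START = "S"
PRODUCTIONS = {
    "S": [["N", "V", "N"]],
    "N": [["symbols"], ["tools"]],
    "V": [["are"]],
}


def leftmost_derivation(choices):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the production to apply to the leftmost
    nonterminal. Returns the list of sentential forms visited.
    """
    form = [START]
    forms = [form]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:i] + PRODUCTIONS[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms


# Replays S => NVN => tools VN => tools are N => tools are symbols
forms = leftmost_derivation([0, 1, 0, 0])
print(" => ".join(" ".join(f) for f in forms))
```

The last entry of `forms` contains only terminal symbols, so it is a sentence of L(G); every earlier entry is a sentential form that still contains at least one nonterminal.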

Exercise

For each of the strings below, determine whether or not it is in L(G), the language generated by G. If it’s in the language, give a derivation.

01λ01100101S10101S101

G = ({0, 1}, {S}, S, P), where
P = {S → 01S, S → 10S, S → 0S1, S → 1S0, S → 01, S → 10}
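One way to check answers to an exercise like this is a small recognizer that mirrors the productions directly. The sketch below is an illustration only: the function name `in_language` is an assumption, and the strings suggested in the usage note are examples chosen here, not the exercise's own list.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def in_language(s):
    """Return True if the string s is derivable from S under the productions of G."""
    # S -> 01 and S -> 10
    if s in ("01", "10"):
        return True
    # S -> 01S and S -> 10S
    if s[:2] in ("01", "10") and in_language(s[2:]):
        return True
    # S -> 0S1 and S -> 1S0
    if len(s) >= 2 and (s[0], s[-1]) in (("0", "1"), ("1", "0")):
        return in_language(s[1:-1])
    return False
```

For instance, `in_language("0011")` succeeds via S → 0S1 followed by S → 01, while the empty string and any string containing the nonterminal symbol S are rejected, because every production adds at least one 0 and one 1 and a sentence contains terminals only.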

Semantics

The job of semantic analysis is to construct a representation of the meaning of a piece of NL text.

Meaning representations can be
• descriptive: like definitions of words in a dictionary
• operational: e.g., executable program code
• anything in-between

Semantic primitives: Often the meaning of a word or small phrase consists of a reference to a node in a semantic network, such as WordNet.

Semantic compounds: More complex meanings may be represented as case frames, or (relatively) small semantic networks whose nodes in turn reference nodes in a large semantic network or dictionary.

Semantics (cont.)

One approach: Representation of meaning using case frames. A frame is an attribute-value structure. In a case frame, the frame has a type that usually corresponds to a verb. The particular kinds of attributes in the frame depend on the type.

“Alexander took an exam.”

Action: take (write, submit to)
Agent: Alexander
Object: examination
Time: past
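A case frame like the one above maps naturally onto a small record type. The sketch below is a hedged illustration: the `CaseFrame` class, the `VERBS` and `NOUNS` lookup tables, and the function `frame_from_svo` are all hypothetical names invented here, and the input is assumed to be already split into subject, verb, and object.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon: surface verb form -> (base action, time).
VERBS = {"took": ("take", "past"), "takes": ("take", "present")}
# Hypothetical noun normalization table.
NOUNS = {"exam": "examination"}


@dataclass(frozen=True)
class CaseFrame:
    action: str  # the frame's type, usually corresponding to a verb
    agent: str
    obj: str     # the "Object" slot; named obj to avoid clashing with Python's object
    time: str


def frame_from_svo(subject, verb, direct_object):
    """Build a case frame from a pre-segmented subject/verb/object triple."""
    action, time = VERBS[verb]
    return CaseFrame(
        action=action,
        agent=subject,
        obj=NOUNS.get(direct_object, direct_object),
        time=time,
    )


frame = frame_from_svo("Alexander", "took", "exam")
print(frame)
```

A different verb type would generally call for different slots (for example, a source or a destination), which is why the set of attributes depends on the frame's type.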

Semantic Analysis: Interpretation

The process of semantic analysis starts with either NL text or a parse (e.g., parse tree). It produces a representation of the meaning of the text. This process is also called “semantic interpretation” or simply “interpretation”.

One successful approach to interpretation for some computer applications involves coordinating parsing and interpretation (similar to syntax-directed translation in some programming language compilers).

For this approach, we usually need a “semantic grammar” ...

Semantic Grammar

A semantic grammar is a grammar whose syntactic categories correspond directly to groups of words whose meanings can be largely inferred from the parse.

<command> → <do-word> the <job-word>

<do-word> → do | perform | start | finish

<job-word> → job | task | command | activity | operation

“start the activity”
“do the operation”
“finish the job”
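Because this grammar is finite and non-recursive, its whole language can be enumerated and a parse reduces to a few membership tests. The following Python sketch is an illustration under that assumption; the names `DO_WORDS`, `JOB_WORDS`, `generate`, and `parse` are invented here.

```python
from itertools import product

DO_WORDS = ["do", "perform", "start", "finish"]
JOB_WORDS = ["job", "task", "command", "activity", "operation"]


def generate():
    """Enumerate every sentence of <command> -> <do-word> the <job-word>."""
    return [f"{d} the {j}" for d, j in product(DO_WORDS, JOB_WORDS)]


def parse(sentence):
    """Return the category bindings for a command, or None if it does not parse."""
    words = sentence.lower().split()
    if (len(words) == 3 and words[0] in DO_WORDS
            and words[1] == "the" and words[2] in JOB_WORDS):
        return {"do-word": words[0], "job-word": words[2]}
    return None
```

The dictionary returned by `parse` already names which kind of action and which kind of job were mentioned, which is the sense in which the meaning can be largely inferred from the parse.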

Controlled Language

A controlled language is a subset of a natural language specified in a computer-based representation or formal system for the purpose of facilitating analysis or understanding by computer.

The language generated by a semantic grammar is one type of controlled language.

Augmented Transition Nets

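An ATN is a language processor that combines parsing and translation. It is based on a collection of transition diagrams.

[Figure: transition diagrams for <command>, <do-word>, and <job-word>. The <command> diagram has arcs labeled <do-word>, “the”, and <job-word>; the <do-word> diagram has an arc labeled “do, etc.”; the <job-word> diagram has an arc labeled “job, etc.”]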
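To make the idea concrete, here is a minimal Python sketch of an ATN-style interpreter for the three diagrams. It is an illustration, not the course's implementation: the state names (C1 to C4, D1, D2, J1, J2), the arc kinds `push`, `word`, and `cat`, and the register convention are all assumptions. A `push` arc calls a sub-network, a `word` arc matches a literal word, and a `cat` arc matches any word in a category and stores it in a register, which is where translation happens alongside parsing.

```python
# Each network: start state, final states, and arcs of the form
# (kind, label, next_state). All state names are illustrative assumptions.
NETWORKS = {
    "command": {
        "start": "C1", "final": {"C4"},
        "arcs": {
            "C1": [("push", "do-word", "C2")],
            "C2": [("word", "the", "C3")],
            "C3": [("push", "job-word", "C4")],
        },
    },
    "do-word": {
        "start": "D1", "final": {"D2"},
        "arcs": {"D1": [("cat", {"do", "perform", "start", "finish"}, "D2")]},
    },
    "job-word": {
        "start": "J1", "final": {"J2"},
        "arcs": {"J1": [("cat", {"job", "task", "command", "activity",
                                 "operation"}, "J2")]},
    },
}


def traverse(net_name, words, pos, regs):
    """Traverse one network from word index pos.

    Returns (new_pos, registers) on success, or None if no path succeeds.
    """
    net = NETWORKS[net_name]

    def step(state, pos, regs):
        if state in net["final"]:
            return pos, regs
        for kind, label, nxt in net["arcs"].get(state, []):
            result = None
            if kind == "push":
                # Call the sub-network, then continue from where it stopped.
                sub = traverse(label, words, pos, regs)
                if sub is not None:
                    result = step(nxt, *sub)
            elif pos < len(words):
                word = words[pos]
                if kind == "word" and word == label:
                    result = step(nxt, pos + 1, regs)
                elif kind == "cat" and word in label:
                    # Record the matched word in a register named after the network.
                    result = step(nxt, pos + 1, {**regs, net_name: word})
            if result is not None:
                return result
        return None

    return step(net["start"], pos, regs)


def interpret(sentence):
    """Return the filled registers for a command, or None if it is rejected."""
    words = sentence.lower().split()
    result = traverse("command", words, 0, {})
    if result is None or result[0] != len(words):
        return None
    return result[1]
```

Calling `interpret("Start the activity")` returns the registers filled during the traversal; a sentence with missing or extra words yields `None`. Registers are copied rather than mutated so that a failed path cannot leave stale values behind.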

Stone World

A microworld: 2-D cellular space in which various objects can be placed.

An agent “Mace” that takes commands from the user, and which inhabits the microworld.

Stationary objects: pillars, wells, quarries.

Portable objects: stones, gems.

Actions: Mace can move and can carry objects.

A natural-language interface: Augmented transition network based on a semantic grammar.

Stone World Motivation

Demonstrates a full combination of syntax, semantics, actions, and responses.

An artificial, closed world permits unambiguous interpretation.

Stone World offers a substrate upon which experiments and games can be constructed.

Stone World, while simple by comparison, shares these features with the well-known research system SHRDLU, developed by Terry Winograd at MIT.

Stone World’s ATN

[Figure: Stone World’s ATN. State labels: G1, G2, G3, T2, T3, T4, P2, P3, LAST. Arc labels: SHOW, (GO-VERB), (TAKE-VERB), (PUT-VERB), UP, DOWN, IT, TO (DNP1), TOWARD (DNP1), (NP1), *.]

Stone World’s ATN (Cont)

[Figure: noun-phrase subnetworks. State labels: NP1, NP2, DNP1, DNP2. Arc labels: (ARTICLE), (OBJ-NOUN), (DIRECTION-NOUN).]

Demonstration of Stone World

The Python Implementation of Stone World consists of two parts:

1. representation and methods for accessing and transforming the state of the microworld;

2. the Augmented Transition Network and other support for the natural-language interface.
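As a rough illustration of the first part, the sketch below shows one way a microworld state and its transforming methods could be represented. It is not the actual Stone World code: the class name `StoneWorld`, the grid coordinates, the method names, and the rule that objects block movement are all assumptions made for this example.

```python
# (row, column) offsets for the four directions; an assumed convention.
DIRECTIONS = {"NORTH": (-1, 0), "SOUTH": (1, 0), "EAST": (0, 1), "WEST": (0, -1)}
PORTABLE = {"stone", "gem"}  # pillars, wells, and quarries stay put


class StoneWorld:
    """Hypothetical 2-D cellular world holding one agent, Mace."""

    def __init__(self, rows, cols, mace, objects):
        self.rows, self.cols = rows, cols
        self.mace = mace              # (row, col) of the agent
        self.objects = dict(objects)  # (row, col) -> object name
        self.carrying = None          # name of the carried object, if any

    def _neighbor(self, direction):
        dr, dc = DIRECTIONS[direction]
        return (self.mace[0] + dr, self.mace[1] + dc)

    def _in_bounds(self, cell):
        return 0 <= cell[0] < self.rows and 0 <= cell[1] < self.cols

    def go(self, direction):
        """Move Mace one cell unless the target is off-grid or occupied."""
        target = self._neighbor(direction)
        if not self._in_bounds(target) or target in self.objects:
            return False
        self.mace = target
        return True

    def take(self, direction):
        """Pick up a portable object from the adjacent cell."""
        target = self._neighbor(direction)
        if self.carrying is None and self.objects.get(target) in PORTABLE:
            self.carrying = self.objects.pop(target)
            return True
        return False

    def drop(self, direction):
        """Put the carried object into an empty adjacent cell."""
        target = self._neighbor(direction)
        if self.carrying and self._in_bounds(target) and target not in self.objects:
            self.objects[target] = self.carrying
            self.carrying = None
            return True
        return False


world = StoneWorld(3, 3, mace=(1, 1), objects={(1, 2): "stone", (0, 0): "pillar"})
results = [
    world.take("EAST"),   # pick up the stone next to Mace
    world.drop("NORTH"),  # put it down one cell to the north
    world.go("NORTH"),    # blocked: the stone now occupies that cell
    world.go("EAST"),     # the cell the stone came from is free
]
print(results, world.mace, world.objects)
```

Each method returns whether the action succeeded, so a natural-language front end could turn a successful call into a response such as OK and a failed one into an explanation.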

Sample Conversation

WALK NORTH *

I UNDERSTAND YOU. OK

GO TO THE WEST *

I UNDERSTAND YOU. OK

GO WEST *

I UNDERSTAND YOU. OK

TAKE A STONE FROM THE QUARRY *

I UNDERSTAND YOU. OK

DROP THE STONE TOWARD THE EAST *

I UNDERSTAND YOU. OK

TAKE A STONE *

I UNDERSTAND YOU. OK

DROP IT TO THE NORTH *

I UNDERSTAND YOU. OK

GO SOUTH *

I UNDERSTAND YOU. OK

Statistical NLP

Statistics has long been a part of computational linguistics.

However, interest in the approach grew rapidly during the 1990s as the Internet grew.

Subareas include corpus-based language description, applications in improving search-engine indexing and retrieval, question answering, and data mining.

Statistical NLP (cont)

Latent Semantic Analysis: use of singular-value decomposition of large term-document matrices to create “semantic spaces” in which semantically related words and documents tend to be close together (to be presented later).
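The core computation can be sketched on a toy example. The code below is an illustration only: it assumes NumPy is available, and the five terms, four documents, and their counts are invented for this example. It builds a term-document matrix, takes its singular-value decomposition, keeps the two largest singular values, and compares terms by cosine similarity in the reduced space.

```python
import numpy as np

terms = ["car", "automobile", "engine", "flower", "petal"]
# Rows are terms, columns are four toy documents (invented term counts).
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 1],   # petal
], dtype=float)


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Singular-value decomposition: A = U @ diag(s) @ Vt, with s sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                            # dimensions kept in the "semantic space"
term_vectors = U[:, :k] * s[:k]  # one k-dimensional vector per term

car = term_vectors[terms.index("car")]
automobile = term_vectors[terms.index("automobile")]
flower = term_vectors[terms.index("flower")]

raw_sim = cosine(A[0], A[1])           # similarity of the original rows
lsa_sim = cosine(car, automobile)      # similarity in the reduced space
unrelated_sim = cosine(car, flower)
print(raw_sim, lsa_sim, unrelated_sim)
```

In this toy matrix "car" and "automobile" never appear in the same document, so their raw similarity is zero, yet both co-occur with "engine". After truncation to two dimensions their vectors point in nearly the same direction, while "car" and "flower" remain nearly orthogonal.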