START: Natural Language Access to Information


Boris Katz, Gary Borchardt, Sue Felshin, Jimmy Lin, Jerome McFarland, Ali Ibrahim, Luciano Castagnola, Baris Temelkuran, Aaron Fernandes, Alp Simsek, Jonathan Wolfe, Matthew Bilotti

MIT Artificial Intelligence Lab

http://www.ai.mit.edu/projects/infolab/

I had a dream...

Library of Congress

?

Reality

What we can do:

Understand ordinary sentences and questions

What we can’t do (yet):

1. Full-text NL understanding still beyond reach

• Common sense implication

• Intersentential reference

• Summarization

2. Not all information is language; most Web resources are not textual

• Maps and Images

• Sound and Video

• Multimedia

• Web resources are distributed across numerous non-traditional databases

Bridging the Gap

Library of Congress +

• In 1492, Columbus sailed the ocean blue.

• An object at rest tends to remain at rest.

• Four score and seven years ago our forefathers brought forth

The Solution: Natural Language Annotations

Annotations bridge the gap between our ability to analyze natural language sentences and our desire to access the huge amount of data available in our libraries and on the Web.

Annotations are collections of natural language sentences and phrases that describe the content of various information segments.

START

• analyzes these annotations

• creates the necessary representational structures

• produces special pointers to the information segments summarized by the annotations

Natural Language Annotations

Annotation (from the annotator): “Mars’s year is long.”

Information segment attached to the annotation (+): “... one Mars year lasts 687 Earth days.”

START knowledge base:

<year is long>
<year related-to Mars>

Questions (from the user):

• “How long is the Martian year?”

• “How long is a year on Mars?”

• “How many days are in a Martian year?”

• …

Response: “... one Mars year lasts 687 Earth days.”

Parsing

A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.

[Figure: parse tree of the sentence, with nodes labeled S, NP, VP, PP, N, V, det, prep, noun, and quantity]

Ternary expressions (T-expressions)

A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.

<chain-1 related-to reactions-1>
<molecules-5 related-to pyruvate-1>
<molecules-5 quantity 2>
<molecules-5 is smaller>
<molecule-1 related-to glucose-1>
<molecule-1 quantifier each>
<<chain-1 convert molecule-1> into molecules-5>

[Figure: the same T-expressions drawn as a graph of nodes and labeled links]

T-expression Representation

• List of node-link-node triples

• Nouns, adjectives are nodes

• Links cover:

• relationships between verbs and their arguments

• fundamental semantic relationships: “is-a” (for equality, membership, and subclass relationships), “related-to” (for possessives, etc.)

• modification of nouns: “quantifier”, “quantity”, “is” (for adjectives)

• prepositions
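The node-link-node structure listed above can be sketched in code. What follows is a minimal Python illustration, not START's actual data structure: the class name `TExp`, its field names, and the helper `relations_used` are assumptions made for this example. The triples themselves are the ones the slides give for the glucose sentence.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TExp:
    """One ternary expression: <subject relation object>.

    Hypothetical representation; either end may itself be a TExp,
    which is how <<chain-1 convert molecule-1> into molecules-5> nests.
    """
    subject: Union[str, "TExp"]
    relation: str
    obj: Union[str, "TExp"]

    def __str__(self) -> str:
        return f"<{self.subject} {self.relation} {self.obj}>"


# T-expressions for: "A chain of reactions converts each molecule of
# glucose into two smaller molecules of pyruvate."
assertion = [
    TExp("chain-1", "related-to", "reactions-1"),
    TExp("molecules-5", "related-to", "pyruvate-1"),
    TExp("molecules-5", "quantity", "2"),
    TExp("molecules-5", "is", "smaller"),
    TExp("molecule-1", "related-to", "glucose-1"),
    TExp("molecule-1", "quantifier", "each"),
    TExp(TExp("chain-1", "convert", "molecule-1"), "into", "molecules-5"),
]


def relations_used(texps):
    """Collect every link label, descending into nested expressions."""
    found = set()

    def visit(item):
        if isinstance(item, TExp):
            found.add(item.relation)
            visit(item.subject)
            visit(item.obj)

    for texp in texps:
        visit(texp)
    return found


for texp in assertion:
    print(texp)
```

In this sketch the verb link ("convert"), the preposition link ("into"), the semantic link ("related-to"), and the noun-modification links ("quantifier", "quantity", "is") all share one shape, which is the point of the list-of-triples design.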

S-rules for Structural Variation

S-rule for the Property Factoring alternation:

someone1 emotional-reaction-verb someone2 with something

<<someone1 emotional-reaction-verb someone2> with something>
<something related-to someone1>

someone1’s something emotional-reaction-verb someone2

<something1 emotional-reaction-verb someone2>
<something1 related-to someone1>

The president impressed the country with his determination.

The president’s determination impressed the country.

Emotional reaction verbs:

surprise, stun, amaze, startle, impress, please, embarrass, annoy, etc.
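The Property Factoring alternation can be illustrated as a rewrite over triples. This is a rough Python sketch under stated assumptions, not START's S-rule machinery: T-expressions are plain tuples, the function name `property_factoring` is made up for the example, the verb set holds only the verbs named on the slide, and the rule is applied in one direction only (from the "with" form to the possessive form).

```python
# Verbs listed on the slide; the slide's "etc." means this set is incomplete.
EMOTIONAL_REACTION_VERBS = {
    "surprise", "stun", "amaze", "startle",
    "impress", "please", "embarrass", "annoy",
}


def property_factoring(texps):
    """Rewrite <<s1 verb s2> with x> + <x related-to s1>
    into <x verb s2> + <x related-to s1>.

    Triples are (subject, relation, object) tuples; a nested triple
    appears as a tuple in the subject position.
    """
    facts = set(texps)
    derived = []
    for subj, rel, obj in texps:
        if rel != "with" or not isinstance(subj, tuple):
            continue
        someone1, verb, someone2 = subj
        owned = (obj, "related-to", someone1)
        if verb in EMOTIONAL_REACTION_VERBS and owned in facts:
            derived.append((obj, verb, someone2))
            derived.append(owned)
    return derived


# "The president impressed the country with his determination."
with_form = [
    (("president", "impress", "country"), "with", "determination"),
    ("determination", "related-to", "president"),
]

# Triples for "The president's determination impressed the country."
print(property_factoring(with_form))
```

Restricting the rule to a verb class keeps it from firing on sentences such as "The president signed the bill with his pen", where the "with" phrase is an instrument rather than a property.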

Sample Assertion

A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.

<chain-1 related-to reactions-1>
<molecules-5 related-to pyruvate-1>
<molecules-5 quantity 2>
<molecules-5 is smaller>
<molecule-1 related-to glucose-1>
<molecule-1 quantifier each>
<<chain-1 convert molecule-1> into molecules-5>

Sample Query

How are the glucose molecules converted into pyruvate molecules?

<molecules-5 related-to pyruvate-1>
<molecules-1 related-to glucose-1>
<<something convert molecules-1> into molecules-5>

Matching

[Figure: the Matcher takes T-expressions from the query and T-expressions from the assertion; the query's “something” node is shown against the assertion graph. Key: Input Processing, Query Processing]
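One way to picture the matching step is as pattern matching in which the query's "something" acts as a variable. The Python sketch below is an illustration under assumptions, not START's matcher: T-expressions are tuples, the names `unify` and `match` are invented here, and the query identifiers have been rewritten to the assertion's (`molecule-1` rather than `molecules-1`), a normalization step the slides do not show.

```python
WILDCARD = "something"


def unify(pattern, fact, bindings):
    """Return bindings extended so that pattern equals fact, or None."""
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            bindings = unify(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    if pattern == WILDCARD:
        # A wildcard may bind once; later uses must agree with it.
        if pattern in bindings and bindings[pattern] != fact:
            return None
        return {**bindings, pattern: fact}
    return bindings if pattern == fact else None


def match(query, assertion):
    """Every query triple must match some assertion triple,
    with one consistent binding for the wildcard."""
    def solve(remaining, bindings):
        if not remaining:
            return bindings
        first, rest = remaining[0], remaining[1:]
        for fact in assertion:
            extended = unify(first, fact, bindings)
            if extended is not None:
                result = solve(rest, extended)
                if result is not None:
                    return result
        return None

    return solve(list(query), {})


assertion = [
    ("chain-1", "related-to", "reactions-1"),
    ("molecules-5", "related-to", "pyruvate-1"),
    ("molecules-5", "quantity", "2"),
    ("molecules-5", "is", "smaller"),
    ("molecule-1", "related-to", "glucose-1"),
    ("molecule-1", "quantifier", "each"),
    (("chain-1", "convert", "molecule-1"), "into", "molecules-5"),
]

# "How are the glucose molecules converted into pyruvate molecules?"
query = [
    ("molecules-5", "related-to", "pyruvate-1"),
    ("molecule-1", "related-to", "glucose-1"),
    (("something", "convert", "molecule-1"), "into", "molecules-5"),
]

print(match(query, assertion))
```

The query is satisfied by a subset of the assertion's triples; the extra facts ("each", "two", "smaller") do not block the match.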

A. Reply by Generating

Ternary expressions → Generator → Displayed answer

Reply by Generating: Example

Query: How are the glucose molecules converted into pyruvate molecules?

Answer: A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.

B. Reply from annotation

Ternary expressions → Find resource → Displayed answer

Reply from annotation: Example

Query: Show me a picture of Cog.

Ternary expression: <picture related-to Cog> + annotated resource

C. Reply from annotation with script

T-exps → Find resource → Run script → Displayed answer

Reply from annotation with script: Example

Query: Who directed Gone with the Wind?

Annotation: <any-person directs any-IMDb-movie> + script

Script (IMDb):

• get http://us.imdb.com/Details?0031381

• match regexp...

Answer: Gone with the Wind (1939) was directed by George Cukor, Victor Fleming, and Sam Wood.

Source: The Internet Movie Database
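The "get, then match regexp" script can be sketched as follows. This is a hedged Python illustration, not the script START runs: the slide elides the actual regular expression, so the pattern, the page fragment `SAMPLE_PAGE`, and the function names `fetch`, `directed_by`, and `answer` are all hypothetical. `fetch` returns canned text instead of issuing an HTTP request so the example runs offline.

```python
import re
from typing import List

# Hypothetical page fragment; the real page layout is not given in the slides.
SAMPLE_PAGE = "<b>Directed by</b> George Cukor, Victor Fleming, Sam Wood<br>"


def fetch(url: str) -> str:
    """Stand-in for the script's 'get' step (no network access here)."""
    return SAMPLE_PAGE


def directed_by(url: str) -> List[str]:
    """The 'match regexp' step: pull director names out of the page text."""
    found = re.search(r"Directed by</b>\s*([^<]+)<", fetch(url))
    if not found:
        return []
    return [name.strip() for name in found.group(1).split(",")]


def join_names(names: List[str]) -> str:
    """Join names as English prose: 'A', 'A and B', or 'A, B, and C'."""
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


def answer(title: str, year: int, url: str) -> str:
    """Fill the annotation's sentence pattern with the extracted value."""
    return f"{title} ({year}) was directed by {join_names(directed_by(url))}."


print(answer("Gone with the Wind", 1939, "http://us.imdb.com/Details?0031381"))
```

Because the annotation is written over the classes any-person and any-IMDb-movie, one annotation plus one script of this kind can serve every movie the source covers.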

Uniform Access

[Figure: NL questions go to START and multimedia responses come back; START sends queries to Omnibase and receives data; the sources shown are NASA, POTUS, Webster, U.S. Census, and IMDb]

START

• Local knowledge base of ternary expressions

• Core vocabulary

Omnibase

• Uniform interface to multiple database formats (Web, text, etc.)

• Integration time independent of size of database

• Extended lexicon
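The idea of a uniform interface over differently formatted sources can be sketched with one wrapper function per source. The Python below is an illustration only and does not reflect Omnibase's real API: the class `Omnibase` as written here, its `register` and `get` methods, the source name "planets", and the property names are assumptions, and both data stores are tiny stand-ins filled with facts quoted elsewhere in these slides.

```python
from typing import Callable, Dict

# A wrapper hides one source's format behind the same (object, property) call.
Wrapper = Callable[[str, str], str]

# Stand-in for a structured source.
MOVIES = {
    "Gone with the Wind": {
        "director": "George Cukor, Victor Fleming, and Sam Wood",
    },
}

# Stand-in for a plain-text source with one delimited record per line.
PLANET_LINES = "Mars|year length|687 Earth days"


def movie_wrapper(obj: str, prop: str) -> str:
    return MOVIES[obj][prop]


def planet_wrapper(obj: str, prop: str) -> str:
    for line in PLANET_LINES.splitlines():
        name, key, value = line.split("|")
        if (name, key) == (obj, prop):
            return value
    raise KeyError((obj, prop))


class Omnibase:
    """Hypothetical facade: one query form, many source formats."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}

    def register(self, source: str, wrapper: Wrapper) -> None:
        self._wrappers[source] = wrapper

    def get(self, source: str, obj: str, prop: str) -> str:
        return self._wrappers[source](obj, prop)


base = Omnibase()
base.register("IMDb", movie_wrapper)
base.register("planets", planet_wrapper)

print(base.get("IMDb", "Gone with the Wind", "director"))
print(base.get("planets", "Mars", "year length"))
```

In a design like this, adding a source means writing one wrapper, however many records sit behind it, which is one plausible reading of "integration time independent of size of database".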

How START works

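[Figure: architecture diagram with the following labeled parts]

• Web browser: sends English to START, receives HTML

• START: Parser (English to input T-exps), Matcher (input T-exps against T-exps from the KB), Generator (English), database of T-exps

• Native knowledge: annotations, scripts

• Omnibase (external knowledge): scripts

• WWW sources: POTUS, IMDb, World Factbook, U.S. Census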

Multi-Modal Interaction

Q. "I'd like to speak to Trevor."

Q. "Is Trevor in his office?"

A. "Trevor is in his office but he is on the phone."

A. "Trevor is in his office but he is talking to Boris now."

A. "Trevor is in his office; however, he doesn't want to be disturbed until 2pm."