Sharing the How (and not the What) - uml.edubsun/tlworkshop2015_talks/percy... · would just...

85
Sharing the ”How” (and not the ”What”) Percy Liang NIPS Workshop on Transfer Learning December 12, 2015

Transcript of Sharing the How (and not the What) - uml.edubsun/tlworkshop2015_talks/percy... · would just...

Sharing the ”How” (and not the ”What”)

Percy Liang

NIPS Workshop on Transfer Learning

December 12, 2015

What is shared between tasks?

Task A Task B

1

What is shared between tasks?

Task A Task B

1

detecting trucks detecting cars

2

detecting trucks detecting cars

sharing low-level representations (”what”)

2

tallest mountain

in Quebec

most popular restaurant

in Palo Alto

3

tallest mountain

in Quebec

most popular restaurant

in Palo Alto

sharing abstractions (”how”)

3

Natural language interfaces

4

Natural language interfaces

Tasks parametrized by natural language:

User: Find me the best restaurants in Montreal.

Computer: Toque, Gibby’s restaurant, Le Local, ...

4

Natural language interfaces

Tasks parametrized by natural language:

User: Find me the best restaurants in Montreal.

Computer: Toque, Gibby’s restaurant, Le Local, ...

User: Find me speakers at the transfer learning workshop.

Computer: Bengio, Ben-David, Darrell, Liang, Mohri, Pontil, Salakhut-dinov, Yang

4

Natural language interfaces

Tasks parametrized by natural language:

User: Find me the best restaurants in Montreal.

Computer: Toque, Gibby’s restaurant, Le Local, ...

User: Find me speakers at the transfer learning workshop.

Computer: Bengio, Ben-David, Darrell, Liang, Mohri, Pontil, Salakhut-dinov, Yang

Background: semantic parsing in NLP [Mooney, Artzi, Kwiatkowski,Zettlemoyer, Collins, Roth, Berant, Liang, etc.]

4

Outline

Program induction

Zero-shot entity extraction

Question answering on new tables

5

Program induction

I like programs, but I wish programs

would just program themselves since

I don’t like programming.

I like <i>programs</i>, but I wish <i>programs</i>

would just <i>program</i> themselves since

I don’t like <i>programming</i>.

[Liang et al., 2010, ICML]

6

Program induction

I like programs, but I wish programs

would just program themselves since

I don’t like programming.

I like <i>programs</i>, but I wish <i>programs</i>

would just <i>program</i> themselves since

I don’t like <i>programming</i>.

Goal: Programming by Demonstration

[Lau et al. 2003; Liang et al. 2010; Gulwani et al. 2011]

If the user demonstrates italicizing the first occurrence, can we generalizeto the remaining?

[Liang et al., 2010, ICML]

6

Program induction

I like programs, but I wish programs

would just program themselves since

I don’t like programming.

I like <i>programs</i>, but I wish <i>programs</i>

would just <i>program</i> themselves since

I don’t like <i>programming</i>.

Goal: Programming by Demonstration

[Lau et al. 2003; Liang et al. 2010; Gulwani et al. 2011]

If the user demonstrates italicizing the first occurrence, can we generalizeto the remaining?

Solution: represent task by a program to be learned

1. Move to next occurrence of word with prefix program

2. Insert <i>

3. Move to end of word

4. Insert </i>

[Liang et al., 2010, ICML]

6

Program induction

I like programs, but I wish programs

would just program themselves since

I don’t like programming.

I like <i>programs</i>, but I wish <i>programs</i>

would just <i>program</i> themselves since

I don’t like <i>programming</i>.

Goal: Programming by Demonstration

[Lau et al. 2003; Liang et al. 2010; Gulwani et al. 2011]

If the user demonstrates italicizing the first occurrence, can we generalizeto the remaining?

Solution: represent task by a program to be learned

1. Move to next occurrence of word with prefix program

2. Insert <i>

3. Move to end of word

4. Insert </i>

Challenge: learn from very few examples

[Liang et al., 2010, ICML]

6

Sharing programs

min(·, ·) max(·, ·)

[Liang et al., 2010, ICML]

7

Sharing programs

min(·, ·) max(·, ·)

Find programs that share common subprograms.

• Programs do tend to share common components.

• Bayesian nonparametric prior over all programs

[Liang et al., 2010, ICML]

7

Refactoring[Liang et al., 2010, ICML]

8

Outline

Program induction

Zero-shot entity extraction

Question answering on new tables

9

Zero-shot entity extraction [ACL 2014]

Panupong (Ice) Pasupat

10

Semantic parsing on the web

Input:

• query x

hiking trails near Baltimore

• web page w

11

Semantic parsing on the web

Input:

• query x

hiking trails near Baltimore

• web page w

11

Semantic parsing on the web

Input:

• query x

hiking trails near Baltimore

• web page w

11

Semantic parsing on the web

Input:

• query x

hiking trails near Baltimore

• web page w

Output:

• list of entities y

[Avalon Super Loop, Patapsco Valley State Park, ...]

11

Logical forms: XPath expressions

html

head body

table

tr

td td td td

h1 table

tr

th th

tr

td td

... tr

td td

z = /html[1]/body[1]/table[2]/tr/td[1]

[Sahuguet and Azavant, 1999; Liu et al., 2000; Crescenzi et al., 2001]

12

Framework

x whiking trails

near Baltimore

html

head

...

body

...

13

Framework

x w

Generation

Z

hiking trails

near Baltimore

html

head

...

body

...

(|Z| ≈ 8500)

13

Framework

x w

Generation

Z

Model

z

hiking trails

near Baltimore

html

head

...

body

...

(|Z| ≈ 8500)

/html[1]/body[1]/table[2]/tr/td[1]

13

Framework

x w

Generation

Z

Model

z Execution

y

hiking trails

near Baltimore

html

head

...

body

...

(|Z| ≈ 8500)

/html[1]/body[1]/table[2]/tr/td[1]

[Avalon Super Loop, Patapsco Valley State Park, ...]

13

Features

pθ(z | x,w) ∝ exp{θ>φ(x,w, z)}

14

Features

pθ(z | x,w) ∝ exp{θ>φ(x,w, z)}

Structural Features: captures context

>

14

Features

pθ(z | x,w) ∝ exp{θ>φ(x,w, z)}

Denotation Features: captures content

hiking trails near Baltimore

Avalon Super Loop

Patapsco Valley State Park

Gunpowder Falls State Park

Rachel Carson Conservation Park

Union Mills Hike

...

>

hiking trails near Baltimore

Home

About Baltimore Tour

Pricing

Contact

Online Support

...

14

Dataset

airlines of italy

natural causes of global warming

lsu football coaches

bf3 submachine guns

badminton tournaments

foods high in dha

technical colleges in south carolina

songs on glee season 5

singers who use auto tune

san francisco radio stations

15

Dataset

airlines of italy natural causes of global warming lsu football coaches

15

Dataset statistics

2773 examples

2269 unique queries

894 unique headwords ← long tail!

1483 unique web domains ← long tail!

(6= wrapper induction)

16

Results

Baseline

(Most frequent

extraction

predicates)

Accuracy Accuracy @ 50

10

20

30

40

50

60

70

10.3

40.5

55.8

17

Correct prediction

Query: disney channel movies

/html[1]/body/div[2]/div/div/div[3]/div[1]/div/div/div/div/b

18

Ranking error

Query: doctors at emory

/html/body/div[3]/div[4]/table/tbody/tr/td[2]

Need better understanding of entities and categories!19

Coverage error

Query: hedge funds in new york

/html/body/div[3]/div[3]/div[4]/.../table/tbody/tr/td[2]/a

Need compositionality!

20

Outline

Program induction

Zero-shot entity extraction

Question answering on new tables

21

Semantic parsing on tables [ACL 2015]

Panupong (Ice) Pasupat

22

In what city did Piotr’s last 1st place finish occur?

23

How long did it take this competitor to finish the 4x400 meter relay atUniversiade in 2005?

23

Where was the competition held immediately before the one in Turkey?

23

How many times has this competitor placed 5th or better in competition?

23

Dataset

WikiTableQuestions (2108 tables, 22033 questions)

• 3929 unique column headers = relations

• Breadth: Freebase can answer only ≈ 20% of the questions

24

Dataset

WikiTableQuestions (2108 tables, 22033 questions)

• 3929 unique column headers = relations

• Breadth: Freebase can answer only ≈ 20% of the questions

• Tables in test data are not seen during training

24

Dataset

WikiTableQuestions (2108 tables, 22033 questions)

• 3929 unique column headers = relations

• Breadth: Freebase can answer only ≈ 20% of the questions

• Tables in test data are not seen during training

• Depth: crowdsourced complex questions

13.5% 20.5% 24.5% 21.5% 20%

table lookup

+ next, after, ...

+ count, total, ...

+ first, last, ...

+ top, most, ...

+ arithmetic, comparison,

alternative, multiple filters

other phenomena

24

Graph representation

Add normalization / auxiliary edges (custom functions), push resolutionto semantic parsing

25

Lambda DCS logical forms

Entity

Chicago

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

Intersect

Type.PersonuPlaceOfBirth.Chicago

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

Intersect

Type.PersonuPlaceOfBirth.Chicago

Aggregation

count(Type.Person u PlaceOfBirth.Chicago)

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

Intersect

Type.PersonuPlaceOfBirth.Chicago

Aggregation

count(Type.Person u PlaceOfBirth.Chicago)

Superlative

argmin(Type.Person u PlaceOfBirth.Chicago,DateOfBirth)

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

Intersect

Type.PersonuPlaceOfBirth.Chicago

Aggregation

count(Type.Person u PlaceOfBirth.Chicago)

Superlative

argmin(Type.Person u PlaceOfBirth.Chicago,DateOfBirth)

Anaphora

µx.Type.Person u Children.Influence.x

26

Lambda DCS logical forms

Entity

Chicago

Join

PlaceOfBirth.Chicago

Intersect

Type.PersonuPlaceOfBirth.Chicago

Aggregation

count(Type.Person u PlaceOfBirth.Chicago)

Superlative

argmin(Type.Person u PlaceOfBirth.Chicago,DateOfBirth)

Anaphora

µx.Type.Person u Children.Influence.x

Variable

argmax(Type.Person,R[λx.count(Parent.Parent.x)])26

Parsing/generation

Greece held its last Summer Olympics in which year?

?2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

R[Index].Country.Greece

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

R[Nations].Country.Greece

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

argmax(Country.Greece,Nations)

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

argmax(Country.Greece, Index)

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

... (hundreds of logical forms later) ...

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

R[Date].R[Year].argmax(Country.Greece, Index)

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

Parsing/generation

Greece held its last Summer Olympics in which year?

R[Date].R[Year].R[Prev].R[Prev].argmax(Type.Row, Index)

2004

Year City Country Nations

1896 Athens Greece 14

1900 Paris France 24

1904 St. Louis USA 12

... ... ... ...

2004 Athens Greece 201

2008 Beijing China 204

2012 London UK 204

27

x: Greece held its last Summer Olympics in which year?

z: R[Date].R[Year].argmax(Country.Greece, Index)

y: 2004

Feature vector φ(x, z) ∈ RF :

28

x: Greece held its last Summer Olympics in which year?

z: R[Date].R[Year].argmax(Country.Greece, Index)

y: 2004

Feature vector φ(x, z) ∈ RF :

Feature template Feature Value

(word, predicate) (Greece,Greece) 1

(held ,Greece) 1

(its,Greece) 1

(Greece,argmax) 1

phrase=predicate 1

... ...

(missing predicates) missing 1

... ...

(denotation size) size=1 1

(phrase,denotation type) (which,Date) 1

(which year ,Date) 1

(in which,Date) 1

... ...

28

Results

IR: Train classifer to pick answer directly from table.

WQ: Use logical complexity of Freebase work.

IR WQ Full0

10

20

30

40

50

12.7

24.3

37.1

29

Examples of correct predictions

What train was developed after the erlangener erprobungstrager?

According to the table, what is the last title that spicy horse produced?

Who finished directly after the driver who finished in 1:28.745?

Which album has the highest number of sales but doesn’t have a desig-nated artist?

How many districts have a population density of at lest 1000.0?

30

Examples of errors

Unhandled operations (19%):

• Was there more gold medals won than silver?

• Which movies were number 1 for at least two consecutive weeks?

• How many titles had the same author listed as the illustrator?

31

Examples of errors

Unhandled operations (19%):

• Was there more gold medals won than silver?

• Which movies were number 1 for at least two consecutive weeks?

• How many titles had the same author listed as the illustrator?

Table normalization:

• In what city did Piotr’s last 1st place finish occur? ...[Bangkok,Thailand]...

• How long does the show defcon 3 last? ...[2pm-3pm]...

31

Examples of errors

Unhandled operations (19%):

• Was there more gold medals won than silver?

• Which movies were number 1 for at least two consecutive weeks?

• How many titles had the same author listed as the illustrator?

Table normalization:

• In what city did Piotr’s last 1st place finish occur? ...[Bangkok,Thailand]...

• How long does the show defcon 3 last? ...[2pm-3pm]...

Lexical mismatch:

• Mexican ⇒ Mexico, airplane ⇒ Model

31

Starting from zero examples...

32

Zero inputs!

Traditional approach:

Ad-hoc (incomplete)

papers published in 2015

who wrote the origin of species?

[Wang/Berant/Liang, ACL 2015]

33

Zero inputs!

Traditional approach:

Ad-hoc (incomplete), Annotation: understand + answer (expensive)

papers published in 2015

⇒ Type.Article u PublicationDate.2015

who wrote the origin of species?

⇒ R[Author].OriginOfSpecies

[Wang/Berant/Liang, ACL 2015]

33

Zero inputs!

Traditional approach:

Ad-hoc (incomplete), Annotation: understand + answer (expensive)

papers published in 2015

⇒ Type.Article u PublicationDate.2015

who wrote the origin of species?

⇒ R[Author].OriginOfSpecies

New approach:

Grammar (complete)

argmax(Type.Article,PublicationDate)

article with the largest publication date

argmax(Type.Person,R[λx.Count(Type.Article u Author.x)])

person that is author of the most number of article

[Wang/Berant/Liang, ACL 2015]

33

Zero inputs!

Traditional approach:

Ad-hoc (incomplete), Annotation: understand + answer (expensive)

papers published in 2015

⇒ Type.Article u PublicationDate.2015

who wrote the origin of species?

⇒ R[Author].OriginOfSpecies

New approach:

Grammar (complete), Annotation: understand (cheap)

argmax(Type.Article,PublicationDate)

article with the largest publication date

⇒ what is the newest published article?

argmax(Type.Person,R[λx.Count(Type.Article u Author.x)])

person that is author of the most number of article

⇒ who has published the most articles?

[Wang/Berant/Liang, ACL 2015]

33

On-the-job learning [NIPS 2015]

34

On-the-job learning [NIPS 2015]

Maintain high accuracy Diminishing costs

35

Final remarks...

36

What is shared between tasks?

Task A Task B

37

What is shared between tasks?

Task A Task Bprograms

37

Task 1:

input representation abstraction output

Task 2:

input representation abstraction output

38

Parting words

• Programs allow transferring abstractions (think Chess/Go, analo-gies)

• Task parametrized by natural language (and examples)

• More general problem: starting from zero examples

39

Code and data

http://www-nlp.stanford.edu/software/sempre/

https://worksheets.codalab.org

Funding

Google

Microsoft

DARPA

Sloan Foundation

Thank you!

40