3/7/2009 - Clemson Universityjzwang/ustc11/photo-based-QA.pdf · 3/7/2009 1 Photo-based Question...



Photo-based Question Answering

Tom Yeh (1), John J. Lee (1), Trevor Darrell (2)

(1) EECS & CSAIL, MIT

(2) EECS & ICSI, UC Berkeley

Is there any problem?

How many floors?

Who is the architect?

How many stories?

What labs are here?

How tall?

How tall?

Who wrote this book?

When was it written?

Is this book good?

What is the rating of this book?

Is there a sequel?

Who lives here?

Application

Approach

Evaluation

Text-based QA has become increasingly popular.

Yahoo! Answers

Many people found text-based QA useful.

Many of our questions have been asked already.

Yahoo! Answers


Text-based QA can sometimes be difficult.

Photo-based QA can sometimes be more desirable.

The community is receptive to photo-based QA.

Some photo-based QA can be automated.


Prototype 1: Adding photos to a text-based QA system.

Prototype 2: Adding QA to a photo-album system.


Prototype 3: Applying photo-based QA to mobile devices.

Application

Approach

Evaluation

Here are some photo-based questions!

Is there any problem?

How many floors?

Who is the architect?

How many stories?

What labs are here?

How tall?

How tall?

Who wrote this book?

When was it written?

Is this book good?

What is the rating of this book?

Is there a sequel?

Who lives here?

Three-layer architecture for Photo-based QA

Layer 1: Template-based QA (sources: WWW, including IMDB, Amazon, Wiki, …)

Layer 2: IR-based QA (source: Resolved Questions)

Layer 3: Human-based QA (source: Community)

Example questions: Is there any problem? How many floors? Who is the architect?
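The three layers can be read as a cascade in which each question falls through to the next layer only when the current one has no answer. The following is a minimal sketch of that reading, not the authors' implementation: the function names, the lookup tables, and the fall-through rule are all assumptions made for illustration.

```python
from typing import Callable, Optional

# A layer takes a question and returns an answer, or None to pass it on.
Layer = Callable[[str], Optional[str]]

def template_layer(question: str) -> Optional[str]:
    # Hypothetical template table; a real system would query web sources.
    templates = {"who is the architect?": "Frank Gehry"}
    return templates.get(question.strip().lower())

def ir_layer(question: str) -> Optional[str]:
    # Hypothetical store of previously resolved questions.
    resolved = {"how many stories?": "9 floors"}
    return resolved.get(question.strip().lower())

def human_layer(question: str) -> Optional[str]:
    # Stand-in for posting the question to a community of human answerers.
    return "forwarded to community"

LAYERS = [template_layer, ir_layer, human_layer]

def answer(question: str, layers: list[Layer]) -> tuple[int, str]:
    """Return (layer number, answer) from the first layer that responds."""
    for number, layer in enumerate(layers, start=1):
        result = layer(question)
        if result is not None:
            return number, result
    raise LookupError("no layer produced an answer")
```

Calling `answer("Is there any problem?", LAYERS)` falls through both automated layers and ends at layer 3, which is the behaviour the architecture slide suggests for hard questions.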

Easiest questions are handled by Template-based QA.

Layer 1: Template-based QA

Categories: Buildings, Books

Sources: WWW (IMDB, Amazon, Wiki, …)

Questions: How many floors? Who is the architect? Is there any problem?

Example answer: Who is the architect? Frank Gehry
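One way to picture template-based QA is a per-category table that maps question patterns to attributes of the recognized object. The sketch below assumes the photo has already been matched to an object with a known category and a dictionary of facts; the `TEMPLATES` table, the attribute names, and `answer_from_template` are hypothetical and are not taken from the slides.

```python
import re
from typing import Optional

# Hypothetical templates per category: (question pattern, attribute name).
TEMPLATES = {
    "Buildings": [
        (re.compile(r"who is the architect", re.IGNORECASE), "architect"),
        (re.compile(r"how many floors", re.IGNORECASE), "floors"),
    ],
    "Books": [
        (re.compile(r"who wrote", re.IGNORECASE), "author"),
    ],
}

def answer_from_template(category: str, question: str, facts: dict) -> Optional[str]:
    """Return the fact selected by the first matching template, else None."""
    for pattern, attribute in TEMPLATES.get(category, []):
        if pattern.search(question) and attribute in facts:
            return str(facts[attribute])
    return None  # no template applies; the question needs another layer
```

A question with no matching template, such as "Is there any problem?", returns `None` here and would have to be handled elsewhere.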

Frequent questions are handled by IR-based QA.

Layer 2: IR-based QA (source: Resolved Questions)

New questions: How many floors? Who is the architect?

Resolved questions: How many stories? What labs are here? How tall?

Answers shown: 9 floors; CSAIL; 3 floors
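IR-based QA can be sketched as retrieving the resolved question most similar to the new one and reusing its answer. The slides do not say which similarity measure is used, so the example below substitutes plain Jaccard overlap on word sets; the threshold and the question-answer pairing in the usage note are assumptions for illustration.

```python
import re
from typing import Optional

def tokens(text: str) -> set[str]:
    """Lowercase word set of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(question: str,
             resolved: list[tuple[str, str]],
             threshold: float = 0.4) -> Optional[tuple[str, str]]:
    """Return the best-matching (resolved question, answer) pair, or None."""
    query = tokens(question)
    best = max(resolved, key=lambda qa: jaccard(query, tokens(qa[0])), default=None)
    if best is None or jaccard(query, tokens(best[0])) < threshold:
        return None  # nothing similar enough; defer to another layer
    return best
```

With `resolved = [("How many stories?", "9 floors"), ("What labs are here?", "CSAIL")]`, the new question "How many floors?" shares two of four distinct words with "How many stories?" (overlap 0.5) and so reuses that entry, while "Is there any problem?" matches nothing.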

Hard questions are handled by Human-based QA.

Layer 3: Human-based QA (source: Community)

Question: Is there any problem?

Community answers: People are getting lost a lot. Lots of leaks.

Images can be indexed based on visual properties.

Categories: Cars, Books, Landmarks, Grocery Items, Fashion Items

Image Index

It is often advantageous to partition the index.

Categories: Cars, Books, Landmarks, Grocery Items, Fashion Items

Partitions: Image Index 1, Image Index 2, Image Index 3, Image Index 4, Image Index 5

But choosing the right partition is tricky.

Categories: Cars, Books, Landmarks, Grocery Items, Fashion Items

Query: ?????

Partitions: Image Index 1, Image Index 2, Image Index 3, Image Index 4, Image Index 5

The question can help select the partition.

Categories: Cars, Books, Landmarks, Grocery Items, Fashion Items

Question: Is this granola bar delicious?

Partitions: Image Index 1, Image Index 2, Image Index 3, Image Index 4, Image Index 5
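Selecting a partition from the question text could look roughly like the sketch below, which scores each category by how many cue words it shares with the question. The cue-word lists and the scoring rule are invented for illustration; the slides state only that the question helps choose the partition.

```python
import re
from typing import Optional

# Hypothetical cue words for each partition of the image index.
PARTITION_KEYWORDS = {
    "Cars": {"car", "mileage", "engine"},
    "Books": {"book", "wrote", "written", "sequel"},
    "Landmarks": {"floors", "stories", "architect", "tall"},
    "Grocery Items": {"granola", "bar", "delicious", "soup", "sauce"},
    "Fashion Items": {"shirt", "dress", "shoes"},
}

def select_partition(question: str) -> Optional[str]:
    """Pick the partition whose cue words overlap the question the most."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scores = {name: len(words & cues) for name, cues in PARTITION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue word the question gives no hint at all.
    return best if scores[best] > 0 else None
```

For "Is this granola bar delicious?" three cue words match the Grocery Items partition, so image matching would only need to search that one index. A question with no recognizable cue returns `None`, which corresponds to the case where the right partition cannot be chosen from the text.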

The question can help filter within the partition.

Categories: Cars, Books, Grocery Items, Fashion Items, Landmarks

Question: Is this granola bar delicious?

Items shown:

Maggi Cream of Asparagus Soup

Quaker Chewy Granola Bar

Native Forest Artichoke Heart

Nature Valley Granola Bar, Vanilla Nut

Betty Crocker Complete Meals

Tylenol Cold Multi-Symptom

Hot Sauce for Chilly Noodle

Japanese Rice Crackers

Japanese Green Tea

Pasta Sauce Blend

Kellogg's Frosted Flakes

Partitions: Image Index 1, Image Index 2, Image Index 3, Image Index 4, Image Index 5
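Filtering within a partition can be sketched as keeping only the indexed items whose names share a content word with the question, so that image matching compares the photo against fewer candidates. The stopword list, the word-overlap rule, and the fallback to the full list are assumptions made for this example, not details given in the slides.

```python
import re

# Hypothetical stopword list; only the remaining words act as filter cues.
STOPWORDS = {"is", "this", "a", "the", "of", "for", "how", "what", "who"}

def content_words(text: str) -> set[str]:
    """Lowercase words of the text with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def filter_candidates(question: str, item_names: list[str]) -> list[str]:
    """Keep items whose name shares a content word with the question."""
    cues = content_words(question)
    matches = [name for name in item_names if cues & content_words(name)]
    # If nothing matches, fall back to the whole partition.
    return matches or list(item_names)

GROCERY_ITEMS = [
    "Maggi Cream of Asparagus Soup",
    "Quaker Chewy Granola Bar",
    "Nature Valley Granola Bar, Vanilla Nut",
    "Betty Crocker Complete Meals",
    "Japanese Rice Crackers",
]
```

For the question "Is this granola bar delicious?", only the two granola bars remain as candidates, and the visual matcher would then decide between them.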

Application

Approach

Evaluation

Evaluation is based on a dataset of 30,000+ images.

Sample match results

Image matching may perform poorly without any filtering.


Category-based filtering can improve performance.

Application Approach Evaluation

Questions?