Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework

Li-Jia Li, Richard Socher, Li Fei-Fei


Page 1: Li-Jia Li, Richard Socher, Li Fei-Fei

Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework

Li-Jia Li, Richard Socher, Li Fei-Fei

Page 2: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

2

City Travel

Pagoda

Sunrise Sunshine

Sun

Page 3: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

3

City Travel

Pagoda

Sunrise Sunshine

Sun

Weber et al 00, Fergus et al 03, Felzenszwalb et al 04, Fei-Fei et al 05, Sivic et al 05, Bosch et al 06, Oliva et al 01, Lazebnik et al 06

Shi et al 00, Felzenszwalb et al 04, Sali et al 99, Winn et al 05, Kumar et al 05, Cao et al 07, Russell et al 06, Todorovic et al 06

Duygulu et al 02, Barnard et al 03, Blei et al 03, Gupta et al 08, Alipr (Li et al 03), Sudderth et al 05

Segmentation

Classification

Annotation

Remark: Approaches in yellow will be compared with our model in the later experiments.

Page 4: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

4


Segmentation

Classification

Annotation

Total Scene Understanding

Page 5: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

Application

5

Page 6: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

6

Classification Annotation Segmentation

Mutually beneficial!

Page 7: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

7

Athlete, Horse, Grass, Trees, Sky, Saddle

Classification Annotation Segmentation

[Figure: image regions labeled Horse]

class: Polo

Page 8: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

8

[Figure: image segmented into regions labeled Horse, Sky, Tree, Grass, Athlete]

Athlete, Horse, Grass, Trees, Sky, Saddle

Classification Annotation Segmentation

class: Polo

Page 9: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

9

class: Polo

[Figure: image regions labeled Horse]

Athlete, Horse, Grass, Trees, Sky, Saddle

Classification Annotation Segmentation

Page 10: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

10

Related Work:

Tu et al 03: Annotation, Segmentation

Li & Fei-Fei 07: Annotation, Classification

Heitz et al 08: Classification, Segmentation

[Figure: polo example (Class: Polo) with image regions labeled Horse, Sky, Tree, Grass, Athlete]

Page 11: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

Outline

Model

Learning

Recognition & Experiment

Classification Annotation Segmentation

Page 12: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

12

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Athlete, Horse, Grass, Trees, Sky, Saddle

Page 13: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

13

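[Figure: graphical model under construction with Visual and Text components; node C; plate label D]

class: Polo

Athlete, Horse, Grass, Trees, Sky, Saddle

Joint distribution of the random variables in the Visual Component and the Text Component.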

Page 14: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

14

[Figure: graphical model under construction with Visual and Text components; nodes C, O; plate label D]

class: Polo

Page 15: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

15

[Figure: graphical model under construction with Visual and Text components; nodes C, O, R; plate labels D, NF]

Color, Location, Texture, Shape

class: Polo

Page 16: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

16

[Figure: graphical model under construction with Visual and Text components; nodes C, O, R, X; plate labels D, NF, Ar]

class: Polo

Page 17: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

[Figure: graphical model under construction with Visual and Text components; nodes C, O, R, X, Z; plate labels D, NF, Ar, Nr, Nt]

Z: “Connector variable”

Athlete, Horse, Grass, Trees, Sky, Saddle

class: Polo

Page 18: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

[Figure: graphical model under construction with Visual and Text components; nodes C, O, R, X, Z, S; plate labels D, NF, Ar, Nr, Nt]

Z: “Connector variable”

S: “Switch variable” (Visible / Not visible)

Athlete, Horse, Grass, Trees, Sky, Saddle

[Figure: image regions labeled Horse, Athlete]

class: Polo
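The role of the connector and switch variables can be illustrated with a minimal sketch. It assumes that Z picks one region uniformly, that S decides whether the tag is visually grounded in that region's object, and that non-visible tags come from a shared distribution. The function name, the tag tables, and every probability below are hypothetical and are not taken from the slides.

```python
import random

# Hypothetical tag distributions; words and probabilities are illustrative only.
VISIBLE_TAGS = {          # p(tag | object, S = visible)
    "horse": {"Horse": 0.8, "Saddle": 0.2},
    "grass": {"Grass": 1.0},
}
NON_VISIBLE_TAGS = {"Wind": 0.5, "Travel": 0.5}  # p(tag | S = not visible)
P_VISIBLE = {"horse": 0.9, "grass": 0.7}         # p(S = visible | object)


def draw(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def generate_tag(region_objects, rng):
    """Generate one tag: pick a region (Z), flip the switch (S), emit a tag (T)."""
    z = rng.randrange(len(region_objects))       # connector: which region
    obj = region_objects[z]
    visible = rng.random() < P_VISIBLE[obj]      # switch: visible or not
    tag = draw(VISIBLE_TAGS[obj] if visible else NON_VISIBLE_TAGS, rng)
    return z, visible, tag
```

Calling `generate_tag(["horse", "grass"], random.Random(0))` returns the chosen region index, the switch value, and the emitted tag.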

Page 19: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

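[Figure: graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels D, NF, Ar, Nr, Nt]

Z: “Connector variable”

S: “Switch variable” (Visible / Not visible)

Athlete, Horse, Grass, Trees, Sky, Saddle

[Figure: image region labeled Horse]

class: Polo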
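One plausible way to write the joint distribution for a single image, following only the dependencies suggested by the diagram, is sketched below. The slides do not show the conditionals, so the exact parents of each variable are assumptions, in particular that S and T depend on the object of the region selected by Z. Hyperparameters are omitted.

```latex
p(C, O, R, X, Z, S, T)
  = p(C)\,
    \prod_{r=1}^{N_r} \Big[ p(O_r \mid C)
        \prod_{f=1}^{N_F} p(R_{rf} \mid O_r)
        \prod_{a=1}^{A_r} p(X_{ra} \mid O_r) \Big]
    \prod_{m=1}^{N_t} p(Z_m)\, p(S_m \mid O_{Z_m})\, p(T_m \mid S_m, O_{Z_m})
```

Here r indexes the N_r regions, f the N_F region feature types, a the A_r patches in region r, and m the N_t tags.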

Page 20: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

[Figure: full graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels Nr, NF, Ar, Nt]

Outline

Model

Learning

Recognition & Experiment

Page 21: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

21

Learning

Exact inference is intractable!

Relationship of the random variables

[Figure: graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels Nr, NF, Ar, Nt]

Page 22: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

22

Relationship of the random variables

[Figure: graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels Nr, NF, Ar, Nt]

Top-down force

Bottom-up force from visual information

Bottom-up force from text information

Collapsed Gibbs Sampling

(R. Neal, 2000)
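As a generic illustration of collapsed Gibbs sampling, the sketch below samples one label per observation in a toy Dirichlet-multinomial mixture, with the multinomial parameters integrated out so that only counts are kept. It is not the authors' sampler: the function name, the arguments, and the single discrete "word" per region are assumptions made for the example.

```python
import random


def collapsed_gibbs(words, num_objects, vocab_size, alpha=1.0, beta=0.1,
                    sweeps=50, init=None, seed=0):
    """Sample a label for each word with the multinomial parameters integrated out."""
    rng = random.Random(seed)
    labels = list(init) if init is not None else [
        rng.randrange(num_objects) for _ in words]
    label_count = [0] * num_objects                        # items per label
    word_count = [[0] * vocab_size for _ in range(num_objects)]
    for w, k in zip(words, labels):
        label_count[k] += 1
        word_count[k][w] += 1

    for _ in range(sweeps):
        for i, w in enumerate(words):
            k = labels[i]
            label_count[k] -= 1            # remove item i from the counts
            word_count[k][w] -= 1
            # Conditional of label j given all other labels (parameters collapsed).
            weights = [
                (label_count[j] + alpha)
                * (word_count[j][w] + beta)
                / (label_count[j] + vocab_size * beta)
                for j in range(num_objects)
            ]
            k = rng.choices(range(num_objects), weights=weights, k=1)[0]
            labels[i] = k
            label_count[k] += 1            # add item i back under its new label
            word_count[k][w] += 1
    return labels
```

The optional `init` argument lets the chain start from a non-random state, for example labels proposed by an external detector.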

Page 23: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

23

Scene/Event images from the Internet

There is no object-text correspondence…

Athlete, Horse, Grass, Tree, Saddle

Page 24: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

24

Scene/Event images from the Internet

Our model builds the correspondence…

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Athlete, Horse, Grass, Tree, Saddle

Page 25: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

25

Athlete, Horse, Grass, Trees, Sky, Saddle

Athlete, Horse, Grass, Ball

However, a big obstacle is that many objects always co-occur.

Scene/Event images from the Internet

Page 26: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

26

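[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels Nr, NF, Ar, Nt]

One solution: a good initialization of O

[Figure: image regions labeled Grass, Athlete, Horse]

Athlete, Horse, Grass, Trees, Sky, Saddle

Scene/Event images from the Internet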

Page 27: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

27

Scene/Event images from the Internet

Initializing O: obtain Internet images for each O

Object images

Page 28: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

28

Scene/Event images

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Any object detection & segmentation algorithm

Initializing O: train an object detector for each O

Object images

Event/Scene images

Page 29: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

29

Scene/Event images

… Black box object detection & segmentation

Black box object detection & segmentation

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Initialize O in the scene image by the trained object detectors

Object images

Event/Scene images

Any object detection & segmentation algorithm

Page 30: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

30

Scene/Event images

… Black box object detection & segmentation

Black box object detection & segmentation

Black box object detection & segmentation

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Initialize O in the scene image by the trained object detectors

Cao & Fei-Fei, 2007

[Figure: graphical model with nodes θ, C, X, R, O and plate labels Nr, Ar]

Our Model

Object images

Event/Scene images

Page 31: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Auto-semi-supervised learning: Small # of initialized images + Large # of uninitialized images

Our Model

Small # of initialized images

Large # of uninitialized images

Athlete, Horse, Grass, Tree, Saddle, Wind

Athlete, Rock, Grass, Tree, Sky, Rope

Athlete, Snow, Tree, Sky, Snowboard

Scene/Event images
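A minimal sketch of how a starting state could mix the two kinds of images: regions in the small initialized subset take the labels proposed by an object detector, and regions in all other images start from a random label. The random fallback, the function name, and the data layout are assumptions for illustration; the slides do not say how uninitialized images start.

```python
import random


def initial_object_labels(images, detector_labels, num_objects, seed=0):
    """Return {image_id: [label per region]} as a starting state for sampling.

    images: {image_id: number_of_regions}
    detector_labels: {image_id: [label per region]} for the initialized subset
    """
    rng = random.Random(seed)
    state = {}
    for image_id, num_regions in images.items():
        if image_id in detector_labels:
            # Initialized image: keep the detector's proposal.
            state[image_id] = list(detector_labels[image_id])
        else:
            # Uninitialized image: assumed random start.
            state[image_id] = [rng.randrange(num_objects)
                               for _ in range(num_regions)]
    return state
```

For example, `initial_object_labels({"a": 2, "b": 3}, {"a": [1, 0]}, num_objects=4)` keeps the detector labels for image "a" and draws three random labels for image "b".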

Page 32: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

Athlete, Horse, Grass, Tree, Saddle, Wind

Athlete, Rock, Grass, Tree, Sky, Rope

Athlete, Snow, Tree, Sky, Snowboard

Large # of uninitialized images

Small # of automatically initialized images

[Figure: graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels Nr, NF, Ar, Nt]

Outline

Model

Learning

Recognition & Experiment
• Dataset
• Learned Model
• Results

Page 33: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

Badminton

Bocce

Croquet

Polo

33

8 Event/Scene Classes

Remark: Tags are not used during testing

Page 34: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

Rockclimbing

Rowing

Sailing

Snowboarding

34

8 Event/Scene Classes

Page 35: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

35

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Learned model: O

Page 36: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

36

Athlete, Grass, Horse

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Learned model: R

Page 37: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

37

[Figure: graphical model with nodes C, O, R, X, Z, S, T and plate labels D, Nr, NF, Ar, Nt]

Learned model: S

Page 38: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

38

8-way classification: 54%

Classification Annotation Segmentation
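For context, the reported 8-way accuracy can be compared with uniform guessing. The sketch below assumes the test set is balanced across the 8 classes, which the slides do not state.

```python
num_classes = 8
accuracy = 0.54                   # reported on the slide
chance = 1 / num_classes          # uniform guessing over 8 classes: 0.125
ratio = accuracy / chance         # how many times better than chance
print(f"chance = {chance:.3f}, accuracy / chance = {ratio:.2f}")
```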

Page 39: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

39

Classification Annotation Segmentation

Alipr: Li et al 03; Corr LDA: Blei et al 03

Page 40: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

40

Classification Annotation Segmentation

Page 41: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

41

Effect of top-down class context

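[Figure: image region labeled Horse, compared under two graphical models: one with nodes O, R, X, Z, T, S and one with nodes C, O, R, X, Z, T, S]

Model w/o top-down class vs. Full Model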
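The effect of top-down class context can be sketched as a prior p(O | C) that reweights a bottom-up region likelihood. All numbers and names below are hypothetical; they only show how a class such as Polo could change the most likely object for an ambiguous region.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to one."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def object_posterior(likelihood, class_prior=None):
    """Combine a bottom-up likelihood with an optional top-down prior p(O | C)."""
    if class_prior is None:                     # model w/o top-down class
        return normalize(likelihood)
    return normalize({o: likelihood[o] * class_prior.get(o, 0.0)
                      for o in likelihood})


# Hypothetical values for one brown, textured region.
likelihood = {"horse": 0.40, "rock": 0.45, "grass": 0.15}
polo_prior = {"horse": 0.50, "rock": 0.02, "grass": 0.48}

without_class = object_posterior(likelihood)
with_class = object_posterior(likelihood, polo_prior)
```

With these made-up numbers the bottom-up evidence alone slightly favors "rock", while adding the Polo prior makes "horse" the most probable label.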

Page 42: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

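Athlete, Horse, Grass, Tree, Saddle, Wind

Athlete, Rock, Grass, Tree, Sky, Rope

Athlete, Snow, Tree, Sky, Snowboard

Large # of uninitialized images

Small # of automatically initialized images

[Figure: graphical model with Visual and Text components; nodes C, O, R, X, Z, S, T; plate labels Nr, NF, Ar, Nt]

Model

Learning

Recognition & Experiment

[Figure: image regions labeled Sky, Athlete, Tree, Mountain, Rock]

Class: Rock climbing

Athlete, Mountain, Tree, Rock, Sky, Ascent

[Figure: image regions labeled Sky, Athlete, Water, Tree, Sailboat]

Class: Sailing

Athlete, Sailboat, Tree, Water, Sky, Wind

[Figure: image regions labeled Tree, Athlete, Snowboard, Snow]

Class: Snowboarding

Athlete, Snowboard, Tree, Snow, Sky, Powder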

Page 43: Li- Jia  Li, Richard  Socher , Li  Fei-Fei

43

Thank

Prof. Silvio Savarese, Juan Carlos Niebles, Chong Wang, Barry Chai, Min Sun, Bangpeng Yao, Hao Su, Jia Deng, anonymous reviewers

And You