President :Sophie TisonUniversité Lille 1 Reviewers :Philippe MulhemLaboratoire d'Informatique de...


Page 1: A Higher-Level Visual Representation For Semantic Learning In Image Databases

Ismail EL SAYAD

18/07/2011

President: Sophie Tison, Université Lille 1

Reviewers: Philippe Mulhem, Laboratoire d'Informatique de Grenoble; Zhongfei Zhang, State University of New York

Examiner: Bernard Merialdo, Eurecom Sophia-Antipolis

Advisor: Chabane Djeraba, Université Lille 1

Co-advisor: Jean Martinet, Université Lille 1

Page 2: Overview

Introduction

Related works

Our approach
▪ Enhanced Bag of Visual Words (E-BOW)
▪ Multilayer Semantically Significant Analysis Model (MSSA)
▪ Semantically Significant Invariant Visual Glossary (SSIVG)

Experiments
▪ Image retrieval
▪ Image classification
▪ Object recognition

Conclusion and perspectives

Page 3: Motivation

Digital content grows rapidly
▪ Personal acquisition devices
▪ Broadcast TV
▪ Surveillance

Such content is relatively easy to store, but useless without automatic processing, classification, and retrieval.

The usual way to solve this problem is to describe images by keywords.

This method suffers from subjectivity, text ambiguity, and the lack of automatic annotation.

Page 4: Visual representations

Visual representations
▪ Image-based representations
▪ Part-based representations

Image-based representations are based on global visual features extracted over the whole image, such as color, color moments, shape, or texture.

Page 5: Visual representations

The main drawbacks of image-based representations:
▪ High sensitivity to scale, pose, lighting condition changes, and occlusions
▪ Cannot capture the local information of an image

Part-based representations are based on the statistics of features extracted from segmented image regions.

Page 6: Visual representations: part-based representations (bag of visual words)

[Figure: bag-of-visual-words pipeline. Local descriptors are computed and clustered in the feature space to form the visual word vocabulary (VW1, VW2, VW3, VW4, ...). Each image is then represented by the frequency of each visual word (2, 1, 1, 1, ... in the example, where the detected words are VW1, VW3, VW2, VW4, VW1).]
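The pipeline on this slide can be sketched in a few lines. This is a minimal illustration, assuming plain k-means for the clustering step and random vectors as stand-ins for real local descriptors; the function names `build_vocabulary` and `bow_histogram` are hypothetical and not taken from the presentation.

```python
import numpy as np

def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain k-means; the k centroids act as visual words."""
    rng = np.random.default_rng(seed)
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each descriptor to its nearest centroid.
        dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (left unchanged if empty).
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Count how often each visual word occurs among one image's descriptors."""
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(centroids))

rng = np.random.default_rng(0)
all_descriptors = rng.normal(size=(200, 8))   # toy stand-in for local descriptors
vocabulary = build_vocabulary(all_descriptors, k=4)
image_descriptors = all_descriptors[:5]       # descriptors of one toy image
hist = bow_histogram(image_descriptors, vocabulary)
```

The histogram keeps only occurrence counts, which is exactly why the position of each visual word is lost in the plain BOW representation.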

Page 7: Visual representations: bag of visual words (BOW) drawbacks

Spatial information loss
▪ Records the number of occurrences
▪ Ignores the position

Using only keypoint-based intensity descriptors
▪ Neither shape nor color information is used

Feature quantization noisiness
▪ Unnecessary and insignificant visual words are generated

Page 8: Visual representations: bag of visual words (BOW) drawbacks

Low discrimination power
▪ Different image semantics are represented by the same visual words

Low invariance for visual diversity
▪ One image semantic is represented by different visual words

[Figure: example image patches labeled VW330, VW480, VW148, VW263 and VW1364, VW1364.]

Page 9: Objectives

Enhanced BOW representation
▪ Different local information (intensity, color, shape…)
▪ Spatial constitution of the image
▪ Efficient visual word vocabulary structure

Higher-level visual representation
▪ Less noisy
▪ More discriminative
▪ More invariant to the visual diversity

Page 10: Overview of the proposed higher-level visual representation

[Figure: processing pipeline in three stages.]
▪ E-BOW: set of images, visual word vocabulary building, E-BOW representation
▪ MSSA model: learning the MSSA model
▪ SSIVG: SSIVWs & SSIVPs generation, SSIVG representation

Page 11: Overview

Introduction

Related works
▪ Spatial Pyramid Matching Kernel (SPM) & sparse coding
▪ Visual phrase & descriptive visual phrase
▪ Visual phrase pattern & visual synset

Our approach

Experiments

Conclusion and perspectives

Page 12: Spatial Pyramid Matching Kernel (SPM) & sparse coding

Lazebnik et al. [CVPR06]
▪ Spatial Pyramid Matching Kernel (SPM): exploiting the spatial information of location regions

Yang et al. [CVPR09]
▪ SPM + sparse coding: replacing k-means in the SPM

Page 13: Visual phrase & descriptive visual phrase

Zheng and Gao [TOMCCAP08]
▪ Visual phrase: pair of spatially adjacent local image patches

Zhang et al. [ACM MM09]
▪ Descriptive visual phrase: selected according to the frequencies of its constituent visual word pairs

Page 14: Visual phrase pattern & visual synset

Yuan et al. [CVPR07]
▪ Visual phrase pattern: spatially co-occurring group of visual words

Zheng et al. [CVPR08]
▪ Visual synset: relevance-consistent group of visual words or phrases, in the spirit of the text synset

Page 15: Comparison of the different enhancements of the BOW

Columns: (1) SPM, (2) SPM + sparse coding, (3) Visual phrase, (4) Descriptive visual phrase, (5) Visual phrase pattern, (6) Visual synset, (7) Our approach

Criterion                                            (1) (2) (3) (4) (5) (6) (7)
Considering the spatial location                      +   +   -   -   -   -   +
Describing different local information                -   -   -   -   -   -   +
Eliminating ambiguous visual words semantically       -   -   -   -   -   -   +
Efficient structure for storing visual vocabulary     -   -   -   +   +   -   +
Enhancing low discrimination power                    -   -   +   +   +   +   +
Tackling low invariance for visual diversity          -   -   -   -   +   +   +

Page 16: Overview

Introduction

Related works

Our approach
▪ Enhanced Bag of Visual Words (E-BOW)
▪ Multilayer Semantically Significant Analysis Model (MSSA)
▪ Semantically Significant Invariant Visual Glossary (SSIVG)

Experiments

Conclusion and perspectives

Page 17: Enhanced Bag of Visual Words (E-BOW)

[Figure: pipeline overview (E-BOW, MSSA model, SSIVG) with the E-BOW stage expanded: set of images, SURF & Edge Context extraction, features fusion, hierarchical features quantization, E-BOW representation.]

Page 18: Enhanced Bag of Visual Words (E-BOW): feature extraction

[Figure: feature extraction flowchart with the following steps.]
▪ Interest points detection
▪ Edge points detection
▪ Color filtering using vector median filter (VMF)
▪ Color feature extraction at each interest and edge point
▪ Color and position vector clustering using a Gaussian mixture model (components with parameters Σ1 µ1 Pi1, Σ2 µ2 Pi2, Σ3 µ3 Pi3)
▪ SURF feature vector extraction at each interest point
▪ Edge Context feature vector extraction at each interest point
▪ Fusion of the SURF and edge context feature vectors
▪ Collection of all vectors for the whole image set
▪ HAC and Divisive Hierarchical K-Means clustering
▪ VW vocabulary

Page 19: Enhanced Bag of Visual Words (E-BOW): feature extraction (SURF)

SURF is a low-level feature descriptor
▪ Describes how the pixel intensities are distributed within a scale-dependent neighborhood of each interest point

Good at
▪ Handling serious blurring
▪ Handling image rotation

Poor at
▪ Handling illumination change

Efficient

Page 20: Enhanced Bag of Visual Words (E-BOW): feature extraction (Edge Context descriptor)

The edge context descriptor is represented at each interest point as a histogram:
▪ 6 bins for the magnitude of the vectors drawn to the edge points
▪ 4 bins for the orientation angle

Page 21: Enhanced Bag of Visual Words (E-BOW): feature extraction (Edge Context descriptor)

This descriptor is invariant to:
▪ Translation: the distribution of the edge points is measured with respect to fixed points
▪ Scale: the radial distance is normalized by the mean distance between the whole set of points within the same Gaussian
▪ Rotation: all angles are measured relative to the tangent angle of each interest point
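A 6 x 4 histogram of this kind can be sketched as follows. This is only an illustration under stated assumptions: the bins are taken to be uniform, distances are normalized by the mean distance from the interest point to the supplied edge points (a simplification of the per-Gaussian normalization on the slide), and the reference orientation is passed in as a plain angle. The function name `edge_context` and the parameter `max_radius` are hypothetical.

```python
import numpy as np

def edge_context(interest_point, edge_points, reference_angle=0.0,
                 n_radial=6, n_angular=4, max_radius=2.0):
    """6 x 4 histogram of the vectors drawn from an interest point to edge points."""
    vectors = np.asarray(edge_points, dtype=float) - np.asarray(interest_point, dtype=float)
    radii = np.linalg.norm(vectors, axis=1)
    # Scale invariance: normalize distances by their mean (simplifying assumption).
    radii = radii / radii.mean()
    # Rotation invariance: measure angles relative to a reference orientation.
    angles = (np.arctan2(vectors[:, 1], vectors[:, 0]) - reference_angle) % (2 * np.pi)
    # Uniform binning; values past the last bin edge are clamped into the last bin.
    r_bins = np.minimum((radii / max_radius * n_radial).astype(int), n_radial - 1)
    a_bins = np.minimum((angles / (2 * np.pi) * n_angular).astype(int), n_angular - 1)
    hist = np.zeros((n_radial, n_angular))
    np.add.at(hist, (r_bins, a_bins), 1)
    return hist.ravel() / hist.sum()

rng = np.random.default_rng(1)
edges = rng.uniform(-10, 10, size=(50, 2))    # toy edge points
ec = edge_context((0.0, 0.0), edges)
ec_scaled = edge_context((0.0, 0.0), edges * 2)
surf = rng.normal(size=64)                    # placeholder for a 64-D SURF vector
merged = np.concatenate([surf, ec])           # 64 + 24 = 88 dimensions
```

Assuming the standard 64-dimensional SURF vector, concatenating it with the 24-bin histogram gives the 88-dimensional merged feature that the quantization step works on.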

Page 22: Enhanced Bag of Visual Words (E-BOW): hierarchical feature quantization

The visual word vocabulary is created by clustering the observed merged features (SURF + Edge Context, 88-D) in 2 clustering steps:

Hierarchical Agglomerative Clustering (HAC)
▪ Applied to the merged features in the feature space
▪ Clustering stops at the desired level k (the figure shows a cluster at k = 4)

Divisive Hierarchical K-Means Clustering
▪ Starts from the k clusters from HAC
▪ The tree is determined level by level, down to some maximum number of levels L, with each division into k parts
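The divisive step can be sketched as a small recursive tree builder. This is a rough illustration, not the presented implementation: for brevity the top-level groups are produced by one k-means pass as a stand-in for the HAC output, and the helper names `kmeans`, `divisive_tree`, and `leaves` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=10, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

def divisive_tree(points, k, max_levels):
    """Split a cluster into k parts, level by level, down to max_levels."""
    node = {"centroid": points.mean(axis=0), "size": len(points), "children": []}
    if max_levels == 0 or len(points) < k:
        return node
    labels = kmeans(points, k)
    for j in range(k):
        members = points[labels == j]
        if len(members):
            node["children"].append(divisive_tree(members, k, max_levels - 1))
    return node

def leaves(node):
    """Collect leaf centroids; they play the role of visual words."""
    if not node["children"]:
        return [node["centroid"]]
    return [c for child in node["children"] for c in leaves(child)]

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 88))         # toy merged features
# Stand-in for the k top-level clusters that HAC would provide (assumption).
top_labels = kmeans(features, k=4)
trees = [divisive_tree(features[top_labels == j], k=3, max_levels=2)
         for j in range(4) if np.any(top_labels == j)]
vocabulary = np.array([c for t in trees for c in leaves(t)])
```

Storing the vocabulary as a tree means a new feature can be quantized by descending from the root and comparing against only k centroids per level, rather than against every visual word.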

Page 23: Multilayer Semantically Significant Analysis (MSSA) model

[Figure: pipeline overview (E-BOW, MSSA model, SSIVG) with the MSSA model stage expanded: generative process, parameters estimation, number of latent topics estimation, VWs semantic inference estimation.]

Page 24: Multilayer Semantically Significant Analysis (MSSA) model: generative process

[Figure: example images showing different visual aspects that share one higher-level aspect: People.]

A topic model that considers this hierarchical structure is needed.

Page 25: Multilayer Semantically Significant Analysis (MSSA) model: generative process

[Figure: graphical model with node labels V, W, v, h, im, parameters φ, Θ, Ψ, and plates M, N.]

In the MSSA, there are two different latent (hidden) topics:
▪ High latent topic, which represents the high aspects
▪ Visual latent topic, which represents the visual aspects

Page 26: Multilayer Semantically Significant Analysis (MSSA) model: parameter estimation

Probability distribution function: [Equation]

Log-likelihood function: [Equation]

Gaussier et al. [ACM SIGIR05]: maximizing the likelihood can be seen as a Nonnegative Matrix Factorization (NMF) problem under the generalized KL divergence

Objective function: [Equation]

Page 27: Multilayer Semantically Significant Analysis (MSSA) model: parameter estimation

KKT conditions are used to derive the multiplicative update rules for minimizing the objective function.

This leads to the following multiplicative update rules: [Equations]
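To make the idea of multiplicative updates concrete, here is a minimal sketch of the standard two-factor NMF under the generalized KL divergence (the Lee and Seung update rules). It is not the MSSA update, whose equations involve an additional layer and are not reproduced here; the function names `kl_divergence` and `nmf_kl` and the toy count matrix are assumptions for illustration.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Two-factor NMF, V ~ W @ H, with multiplicative updates for the KL objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    history = []
    for _ in range(n_iter):
        # Update W: ratio-weighted sum over columns, divided by the row sums of H.
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        # Update H: ratio-weighted sum over rows, divided by the column sums of W.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history

rng = np.random.default_rng(0)
V = rng.integers(0, 5, size=(6, 8)).astype(float)   # toy word-by-image count matrix
W, H, history = nmf_kl(V, rank=3)
```

Because each update multiplies by a nonnegative ratio, the factors stay nonnegative without any projection step, and the divergence does not increase from one iteration to the next.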

Page 28: Multilayer Semantically Significant Analysis (MSSA) model: number of latent topics estimation

Minimum Description Length (MDL) is used as a model selection criterion for:
▪ Number of the high latent topics (L)
▪ Number of the visual latent topics (K)

[Equation: MDL criterion, expressed in terms of the log-likelihood and the number of free parameters.]
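Model selection with MDL can be sketched as below, assuming the commonly used form of the criterion (negative log-likelihood plus half the number of free parameters times the log of the number of observations). Whether the presentation uses exactly this form is an assumption, and the candidate values are made-up toy numbers, not results from the thesis.

```python
import math

def mdl_score(log_likelihood, n_free_params, n_observations):
    """Common MDL form: negative log-likelihood plus a complexity penalty."""
    return -log_likelihood + 0.5 * n_free_params * math.log(n_observations)

def select_model(candidates, n_observations):
    """Return the label of the candidate with the lowest MDL score."""
    return min(candidates, key=lambda c: mdl_score(c[1], c[2], n_observations))[0]

# Hypothetical candidates: ((L, K), log-likelihood, number of free parameters).
candidates = [
    ((2, 5), -1250.0, 40),
    ((3, 10), -1100.0, 90),
    ((4, 20), -1080.0, 200),
]
best = select_model(candidates, n_observations=1000)
```

In this toy run the smallest model wins: the larger models fit better, but their gain in log-likelihood does not pay for the extra parameters.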

Page 29: Semantically Significant Invariant Visual Glossary (SSIVG) representation

[Figure: pipeline overview (E-BOW, MSSA model, SSIVG) with the SSIVG stage expanded: SSVWs selection, SSVW representation, SSVPs generation, SSVP representation, divisive theoretic clustering, SSIVW representation, SSIVP representation, SSIVG representation.]

Page 30: Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Word (SSVW)

[Figure: SSVW selection, with labels: set of VWs, estimating using MSSA, set of relevant visual topics, estimating using MSSA, set of SSVWs.]

Page 31: Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Phrase (SSVP)

SSVP: higher-level and more discriminative representation
▪ SSVWs + their inter-relationships

SSVPs are formed from SSVW sets that satisfy all the following conditions:
▪ Occur in the same spatial context
▪ Are involved in strong association rules (high support and confidence)
▪ Have the same semantic meaning (high probability related to at least one common visual latent topic)
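The association-rule condition can be illustrated with a short support and confidence computation over word pairs. This is a sketch under assumptions: each "transaction" is taken to be the set of visual words found in one spatial context, a pair is kept only if the rule is strong in both directions, and the thresholds and word labels are made up. The function name `pair_rules` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return word pairs whose association rules are strong in both directions."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    strong = {}
    for (a, b), c in pair_count.items():
        support = c / n                 # fraction of contexts containing both words
        conf_ab = c / item_count[a]     # confidence of the rule a -> b
        conf_ba = c / item_count[b]     # confidence of the rule b -> a
        if support >= min_support and min(conf_ab, conf_ba) >= min_confidence:
            strong[(a, b)] = (support, conf_ab, conf_ba)
    return strong

# Toy spatial contexts, each listing the visual words that occur together.
transactions = [
    {"w1", "w2", "w3"},
    {"w1", "w2"},
    {"w1", "w2", "w4"},
    {"w3", "w4"},
    {"w1", "w2"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.8)
```

Here only the pair (w1, w2) survives: it appears in 4 of 5 contexts and each word always co-occurs with the other. The semantic condition (a shared visual latent topic) would be an additional filter on top of this.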

Page 32: Semantically Significant Invariant Visual Glossary (SSIVG) representation: Semantically Significant Visual Phrase (SSVP)

[Figure: two example images, each annotated with the phrases SSIVP126, SSIVP326, and SSIVP304.]

Page 33: Semantically Significant Invariant Visual Glossary (SSIVG) representation: invariance problem

Studying the co-occurrence and spatial scatter information makes the image representation more discriminative.

The invariance power of SSVWs and SSVPs is still low.

Text documents
▪ Synonymous words can be clustered into one synonymy set to improve the document categorization performance

Page 34: Semantically Significant Invariant Visual Glossary (SSIVG) representation

SSIVG: higher-level visual representation composed of two different layers of representation
▪ Semantically Significant Invariant Visual Word (SSIVW): re-indexed SSVWs after a distributional clustering
▪ Semantically Significant Invariant Visual Phrase (SSIVP): re-indexed SSVPs after a distributional clustering

[Figure: SSIVG generation, with labels: set of SSVWs and SSVPs, estimating using MSSA, set of relevant visual topics, divisive theoretic clustering, set of SSIVWs, set of SSIVPs, set of SSIVGs.]

Page 35: Experiments

Introduction

Related works

Our approach

Experiments
▪ Image retrieval
▪ Image classification
▪ Object recognition

Conclusion and perspectives

Page 36: Assessment of the SSIVG representation performance in image retrieval

Dataset     Total nr. of images   Nr. of training images   Nr. of test images   Nr. of image categories
NUS-WIDE    269,648               161,789                  107,859              81

Evaluation criterion: Mean Average Precision (MAP)

The traditional Vector Space Model of Information Retrieval is adapted
▪ The weighting for the SSIVP
▪ Spatial weighting for the SSIVW

The inverted file structure
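For reference, Mean Average Precision can be computed as in the following sketch. It uses the usual definition (precision averaged over the ranks of the relevant items, then averaged over queries); the toy rankings and the function names are made up for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of the per-query average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)

# Two toy queries with their ranked results and ground-truth relevant items.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),             # AP = 1/2
]
ap_first = average_precision(*runs[0])
map_score = mean_average_precision(runs)
```

Because the precision is sampled only at relevant ranks, a system is rewarded for placing relevant images early in the list, not just for retrieving them somewhere.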

Page 37: Assessment of the SSIVG representation performance in image retrieval

Page 38: Assessment of the SSIVG representation performance in image retrieval

Page 39: Evaluation of the SSIVG representation in image classification

Dataset      # images   # training images   # test images   # image categories
MIRFLICKR    25,000     15,000              10,000          11

Evaluation criterion: Classification Average Precision over each class

Classifiers:
▪ SVM with linear kernel
▪ Multiclass Vote-Based Classifier (MVBC)

Page 40: Evaluation of the SSIVG representation in image classification: Multiclass Vote-Based Classifier (MVBC)

For each [symbol], we detect the high latent topic that maximizes: [Equation]

The final voting score for a high latent topic is: [Equation]

Each image is categorized according to the dominant high latent topic.
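The voting idea can be sketched as follows. The exact quantity each item maximizes and the exact voting score are not recoverable from this transcript, so this example assumes that every glossary item votes for the high latent topic with the largest conditional probability and that votes are weighted by the item's occurrence count in the image. The function name `mvbc_predict` and the toy numbers are hypothetical.

```python
import numpy as np

def mvbc_predict(item_counts, p_topic_given_item):
    """Vote-based prediction.

    item_counts: (n_items,) occurrences of each glossary item in the image.
    p_topic_given_item: (n_topics, n_items) conditional probabilities.
    """
    n_topics = p_topic_given_item.shape[0]
    # Each glossary item votes for the high latent topic it is most related to.
    winners = p_topic_given_item.argmax(axis=0)
    scores = np.zeros(n_topics)
    # Assumption: a vote is weighted by how often the item occurs in the image.
    np.add.at(scores, winners, item_counts)
    # The image is assigned to the dominant high latent topic.
    return int(scores.argmax()), scores

# Toy example with 2 high latent topics and 3 glossary items.
p = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.8, 0.9]])
counts = np.array([3, 1, 1])
predicted, scores = mvbc_predict(counts, p)
```

In this toy case the first item occurs three times and votes for topic 0, which outweighs the two single votes for topic 1.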

Page 41: Evaluation of the SSIVG representation performance in classification

Page 42: Assessment of the SSIVG representation performance in object recognition

Dataset      # images   # training images   # test images   # image categories
Caltech101   8,707      7,697               1,010           101

Each test image is recognized by predicting the object class using the SSIVG representation and the MVBC.

Evaluation criterion: Classification Average Precision (AP) over each object class

Page 43: Assessment of the SSIVG representation performance in object recognition

Page 44: Experiments

Introduction

Related works

Our approach

Experiments

Conclusion and perspectives

Page 45: Conclusion

Enhanced BOW (E-BOW) representation
▪ Modeling the spatial-color image constitution using GMM
▪ New local feature descriptor (Edge Context)
▪ Efficient visual word vocabulary structure

New Multilayer Semantically Significant Analysis (MSSA) model
▪ Semantic inferences of different layers of representation

Semantically Significant Invariant Visual Glossary (SSIVG)
▪ More discriminative
▪ More invariant to visual diversity

Experimental validation
▪ Outperforms other state-of-the-art works

Page 46: Perspectives

MSSA parameters update
▪ On-line algorithms to continuously (re-)learn the parameters

Invariance issue
▪ Context of large-scale databases where large intra-class variations can occur

Cross-modality extension to video content
▪ Cross-modal data (visual content and textual closed captions)

New generic framework of video summarization
▪ Study the semantic coherence between visual contents and textual captions

Page 47: Questions?

Thank you for your attention

Page 48: Parameter Settings