ACM Multimedia 2007
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei and Hong-Jiang Zhang
Microsoft Research Asia
September 25, 2007
Motivation
Correlative Multi-Label Annotation
Modeling correlations
Learning the classifier
Connections to Gibbs Random Field
Experiments
Live Demo
How many images and videos in the world?
May 2007: 500 million
Aug. 2007: 1 billion
2000 images/minute
Sep. 2007: 84 million
1970s-80s: Manual Labeling
1990s: Pure Content Based (QBE)
Now: Automated Annotation (Learning-Based)
[Figure: timeline from 1970 to 2000 and beyond, moving from manual to automatic annotation]
Learning-based video annotation schemes
[Figure: features are extracted from training samples labeled with concepts (Person, Grass, Tree, Building, Road, Face); modeling and learning produce a classifier, which is applied to a new sample ("Lake?")]
A typical strategy – Individual Concept Detection
Annotate multiple concepts separately
[Figure: low-level features feed six independent detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running), each outputting -1 / 1]
√ Person, √ Street, √ Building
× Beach, × Mountain
√ Crowd, √ Outdoor, √ Walking/Running
√ Marching, ? Marching
Another typical strategy – Fusion-Based
Context Based Concept Fusion (CBCF)
[Figure: low-level features feed independent detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running); their scores form a concept model vector, and a concept fusion step produces the final -1 / 1 outputs]
Our strategy – Integrated Concept Detection
Correlative Multi-Label Learning (CML)
[Figure: low-level features feed a single integrated model that outputs -1 / 1 for all concepts (Outdoor, Face, Person, People-Marching, Road, Walking-Running) together]
Multi-Label Annotation
1st Paradigm: Individual Detectors. No correlation.
2nd Paradigm: Fusion Based. Has correlations, but uses a second step.
3rd Paradigm: Integrated. Models concepts and correlations in one step.
How to model concepts and the correlations among concepts in a single step
Notations
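Modeling concept and correlations simultaneously
Feature-concept terms: $\theta^{l}_{d,p}(x, y) = x_d \cdot \delta[y_p = l]$, with $l \in \{+1, -1\}$ (Yes/No), feature index $1 \le d \le D$ and concept index $1 \le p \le K$
Example: $x = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)$, with $y$ assigning a label in $\{+1, -1\}$ to each of person, road, beach, car and tree; each term equals $x_d$ when concept $p$ carries label $l$ and 0 otherwise (for instance, $\theta^{+1}_{1,1} = 0.1$ and $\theta^{-1}_{1,1} = 0$ when person is labeled $+1$)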
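The feature-concept terms can be sketched in a few lines of Python. This is only an illustration: the helper name `feature_concept_terms`, the zero-based indices and the label values chosen for `y` are assumptions, not taken from the slides.

```python
def feature_concept_terms(x, y):
    """Map (label l, feature index d, concept index p) -> x[d] if y[p] == l else 0.0.

    Indices d and p are zero-based here (an assumption of this sketch).
    """
    theta = {}
    for l in (+1, -1):
        for d, x_d in enumerate(x):
            for p, y_p in enumerate(y):
                theta[(l, d, p)] = x_d if y_p == l else 0.0
    return theta


x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y = [+1, +1, -1, +1, -1]  # assumed labels for person, road, beach, car, tree
theta = feature_concept_terms(x, y)

print(len(theta))          # 60, i.e. 2 labels * 6 features * 5 concepts
print(theta[(+1, 0, 0)])   # 0.1
print(theta[(-1, 0, 0)])   # 0.0
```

For every (feature, concept) pair exactly one of the two label slots is non-zero, so each concept effectively gets one weight vector for "Yes" and another for "No".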
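Modeling concept and correlations simultaneously
Concept-concept (correlation) terms: $\theta^{m,n}_{p,q}(x, y) = \delta[y_p = m] \cdot \delta[y_q = n]$, with $m, n \in \{+1, -1\}$ (Y/N) and concept indices $1 \le p < q \le K$
Example: $x = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)$, with $y$ assigning a label in $\{+1, -1\}$ to each of person, road, beach, car and tree; for each concept pair $(p, q)$ exactly one of the four terms $\theta^{+1,+1}_{p,q}$, $\theta^{+1,-1}_{p,q}$, $\theta^{-1,+1}_{p,q}$, $\theta^{-1,-1}_{p,q}$ equals 1 and the other three equal 0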
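The pairwise correlation terms can be sketched the same way. The helper name `correlation_terms`, the zero-based concept indices and the label values in `y` are assumptions made for illustration.

```python
from itertools import combinations


def correlation_terms(y):
    """Map (m, n, p, q) -> 1 if y[p] == m and y[q] == n else 0, for concept pairs p < q."""
    theta = {}
    for p, q in combinations(range(len(y)), 2):
        for m in (+1, -1):
            for n in (+1, -1):
                theta[(m, n, p, q)] = 1 if (y[p] == m and y[q] == n) else 0
    return theta


y = [+1, +1, -1, +1, -1]  # assumed labels for person, road, beach, car, tree
theta = correlation_terms(y)

print(len(theta))              # 40, i.e. 10 concept pairs * 4 label combinations
print(theta[(+1, +1, 0, 1)])   # 1
print(theta[(+1, -1, 0, 1)])   # 0
```

These terms do not depend on the low-level features; they only record which label combination each concept pair takes, which is what lets a weight on them encode co-occurrence or mutual exclusion.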
Modeling concept and correlations
Dimension of the feature vector $\theta(x, y)$: $2K(D + K - 1)$
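To make the joint model concrete, the sketch below stacks the $2KD$ feature-concept terms and the $4 \cdot K(K-1)/2$ correlation terms into one vector of length $2K(D + K - 1)$, scores a label vector with a linear function, and predicts by exhaustive search. The function names, the random weights and the brute-force search over all $2^K$ label vectors are assumptions for illustration; exhaustive search is only viable for small $K$ and is not presented here as the inference method used in the talk.

```python
from itertools import combinations, product
import random


def joint_features(x, y):
    """Concatenate feature-concept terms and pairwise correlation terms."""
    vec = []
    for l in (+1, -1):
        for x_d in x:
            for y_p in y:
                vec.append(x_d if y_p == l else 0.0)
    for p, q in combinations(range(len(y)), 2):
        for m in (+1, -1):
            for n in (+1, -1):
                vec.append(1.0 if (y[p] == m and y[q] == n) else 0.0)
    return vec


def score(w, x, y):
    """Linear score: inner product of the weights with the joint feature vector."""
    return sum(w_i * t_i for w_i, t_i in zip(w, joint_features(x, y)))


def predict(w, x, K):
    """Exhaustive argmax over all 2**K label vectors (small K only)."""
    return max(product((+1, -1), repeat=K), key=lambda y: score(w, x, y))


D, K = 6, 3
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2 * K * (D + K - 1))]  # placeholder weights

assert len(joint_features(x, (+1,) * K)) == len(w)  # 2K(D + K - 1) = 48
y_star = predict(w, x, K)
print(y_star)
```

Because all concepts are scored together, changing one label changes the correlation terms it shares with every other concept, which is the difference from running K independent detectors.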
Learning the classifier
Misclassification Error
Loss function
Empirical risk
Regularization
Introduce slack variables
Lagrange dual
Find solution by Sequential Minimal Optimization (SMO)
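The steps listed above follow the usual soft-margin recipe. As a hedged sketch only, assuming a joint feature map $\theta(x, y)$, training pairs $(x_i, y_i)$, slack variables $\xi_i$ and a trade-off constant $\lambda$ (all notation assumed here rather than copied from the slides), the regularized problem could take a form like:

```latex
\min_{w,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + \lambda \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\langle w,\ \theta(x_i, y_i) - \theta(x_i, y) \rangle \ge 1 - \xi_i,
\quad \xi_i \ge 0,
\quad \forall i,\ \forall y \ne y_i
```

Each constraint asks the true label vector $y_i$ to outscore every competing label vector by a margin, with $\xi_i$ absorbing violations; taking the Lagrange dual of a problem of this shape yields a quadratic program of the kind SMO-style solvers handle.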
Connection to Gibbs Random Field
Define a random field
The label vector $y = (y_1, \dots, y_K)$ is a random field
The neighborhood system consists of all adjacent sites, that is, this RF is fully connected
Rewrite the classifier
Define energy function
Define GRF
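One way to write this connection down, as a sketch under assumed notation ($F(x, y; w)$ for the classifier score, $D_p$ for the per-concept potential, $V_{p,q}$ for the pairwise potential, $H$ for the energy and $Z$ for the partition function), is:

```latex
F(x, y; w) = \sum_{p=1}^{K} D_p(y_p; x) + \sum_{1 \le p < q \le K} V_{p,q}(y_p, y_q)

H(y \mid x, w) = -F(x, y; w)

P(y \mid x, w) = \frac{1}{Z(x, w)} \exp\{-H(y \mid x, w)\},
\qquad
Z(x, w) = \sum_{y' \in \{+1, -1\}^K} \exp\{-H(y' \mid x, w)\}
```

Under these definitions the label vector that maximizes $F$ is also the one with the lowest energy and the highest Gibbs probability, so the per-concept potentials play the role of individual detectors while the pairwise potentials carry the concept correlations.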
Intuitive explanation of CML
Experiments
TRECVID 2005 dataset (170 hours)
39 concepts (LSCOM-Lite)
Training (65%), Validation (16%), Testing (19%)
CML (MAP=0.290) improves on IndSVM (MAP=0.246) by 17% and on CBCF (MAP=0.253) by 14%
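The relative improvements follow directly from the three MAP values. A quick check, where `relative_gain` is a hypothetical helper introduced only for this arithmetic:

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


map_cml, map_indsvm, map_cbcf = 0.290, 0.246, 0.253

gain_over_svm = relative_gain(map_cml, map_indsvm)
gain_over_cbcf = relative_gain(map_cml, map_cbcf)

print(f"{gain_over_svm:.1f}% {gain_over_cbcf:.1f}%")  # 17.9% 14.6%
```

The quoted 17% and 14% correspond to these values with the fractional part dropped.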
[Figure: MAP comparison of CML, CBCF and SVM. SVM → CML: ↑17%; CBCF → CML: ↑14%]
[Figure: comparison of CML, CBCF and SVM. SVM → CML: ↑131%; CBCF → CML: ↑128%]
[Figure: three panels comparing CML, CBCF and SVM]
Correlative Multi-Label Video Annotation
A new paradigm for multi-label annotation
Models correlations and concepts simultaneously
Has a close connection to Gibbs Random Field
Multi-Instance Multi-Label Annotation
Exploit correlations among concepts and among instances at the same time
Yields not only image/frame-level annotation but also region-level annotation
[Figure: an image annotated as Scenery, with region labels Sky, Mountain, Water and Sands]