Presentation: 17 May morning keynote, Cees Snoek
22-05-13
A Rosetta Stone for Image Understanding
Cees Snoek
University of Amsterdam
The Netherlands
Euvision Technologies
The Netherlands
A classical problem
Understanding was lost from 394 CE to 1822
Rosetta Stone discovery in 1799
A decree by King Ptolemy V – Hieroglyphs – Demotic script – Ancient Greek
Key to decipherment in 1822
JF Champollion
RECOGNIZING WORDS Understanding images
Mazloom et al., ICMR 2013
How difficult is the problem?
Human vision consumes 50% brain power…
Van Essen, Science 1992
Visual labeling in a nutshell
Visualization by Jasper Schulte
Visual labeling by machine
Encode Reduce
Learn
Label
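The four stages named on this slide can be sketched as a toy pipeline. This is only a minimal illustration, not the MediaMill system: the histogram encoding, the dimension subset standing in for reduction, the nearest-centroid learner, and every name and value below are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy pipeline; all functions and data are illustrative assumptions.

def encode(codeword_ids, codebook_size):
    """Encode: histogram of quantized local descriptors, normalized to sum to 1."""
    counts = Counter(codeword_ids)
    return [counts[i] / len(codeword_ids) for i in range(codebook_size)]

def reduce_dims(vector, keep):
    """Reduce: keep a subset of dimensions (a stand-in for PCA or similar)."""
    return [vector[i] for i in keep]

def learn(examples):
    """Learn: nearest-centroid model, one mean feature vector per label."""
    sums, counts = {}, Counter()
    for vec, lab in examples:
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def assign_label(model, vec):
    """Label: return the label whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(model, key=lambda lab: squared_distance(model[lab]))

CODEBOOK_SIZE = 4   # assumed number of visual codewords
KEEP = [0, 3]       # assumed dimensions that survive the reduction step

# Each training image is a list of codeword ids plus a label (made-up data).
training_images = [
    ([0, 0, 1, 0], "flower"),
    ([0, 1, 0, 0], "flower"),
    ([2, 3, 3, 2], "car"),
    ([3, 3, 2, 3], "car"),
]

def featurize(codeword_ids):
    return reduce_dims(encode(codeword_ids, CODEBOOK_SIZE), KEEP)

model = learn([(featurize(ids), lab) for ids, lab in training_images])

def predict(codeword_ids):
    return assign_label(model, featurize(codeword_ids))
```

Calling `predict` on a new list of codeword ids runs the same encode and reduce steps used in training before assigning a label, which is the property the diagram emphasizes.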
International competition
NIST TRECVID Benchmark
Promote progress in video retrieval research
Open data, tasks, evaluation and innovation
http://trecvid.nist.gov/
Are we making progress?
• 1000+ others
x MediaMill team
MediaMill team, TRECVID 2004-2012
Performance doubled in just 3 years
Snoek & Smeulders, IEEE Computer 2010
Software licensed by Euvision Technologies
MediaMill video search engines
Learning from social-tagged images
Xirong Li et al., TMM 2009
Exploit consistency in tagging behavior of different users for visually similar images
Tag relevance
Objective tags are identified and reinforced
Based on 3.5 million images downloaded from Flickr
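One way to exploit tagging consistency among visually similar images is neighbor voting: a tag gains relevance for an image when it also appears on many of that image's visual neighbors, more often than chance would predict. The sketch below assumes the neighbors have already been retrieved and come from different users; the function name, the chance-level prior, and all counts are illustrative assumptions, not figures from the talk.

```python
def tag_relevance(tag, neighbor_tags, tag_frequency, collection_size):
    """Votes for `tag` among the visual neighbors, minus the votes expected by chance.

    neighbor_tags   : list of tag sets, one per visually similar image
    tag_frequency   : number of images in the whole collection carrying each tag
    collection_size : total number of images in the collection
    """
    k = len(neighbor_tags)
    votes = sum(1 for tags in neighbor_tags if tag in tags)
    expected_by_chance = k * tag_frequency[tag] / collection_size
    return votes - expected_by_chance

# Made-up example: four visual neighbors of a beach photo, each from a different user.
neighbors = [
    {"beach", "sea"},
    {"beach", "holiday"},
    {"beach", "2008"},
    {"sea", "sunset"},
]
frequency = {"beach": 1000, "2008": 20000}   # assumed collection-wide tag counts
total_images = 100000                        # assumed collection size

beach_score = tag_relevance("beach", neighbors, frequency, total_images)
year_score = tag_relevance("2008", neighbors, frequency, total_images)
```

In this toy setting the objective tag "beach" scores well above the subjective, very frequent tag "2008", because "2008" receives few votes and is discounted further by its high chance level.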
RECOGNIZING SENTENCES Understanding images
Mazloom et al., ICMR 2013
Human event description on web video
We analyze 13K web videos and their descriptions
People competing in a sand sculpting competition and children playing on the beach.
A woman folds and packages a scarf she has made.
Habibian et al., ICMR 2013
Human concept-vocabulary
Consists of 5K distinct and mostly rare concepts
Includes general and specialized concepts
It is composed of various concept types
[Figure: bar chart of the portions (in %) of concept types: Object, Action, Scene, Attribute, Non Visual; the labels Animal and People also appear in the chart]
Concepts categorized by type
Object
People
Animal
Scene
Action
Attribute
From concepts to sentences
Input Video
Event Models
Concept 1
Concept 2
Concept K
…
Concept Vocabulary
Train SVM
Creating the concept vocabulary is critical
Sadanand, CVPR12 Merler, TMM12 Althoff, MM12
Attempting a board trick
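The diagram on this slide can be read as: run the K concept detectors of the vocabulary on a video, collect their scores into a K-dimensional vector, and train an SVM event model on those vectors. A minimal sketch follows, assuming a linear SVM fitted by plain subgradient descent on the hinge loss; the three concept names, the scores, and the hyperparameters are hypothetical and chosen only for illustration.

```python
# Hypothetical concept vocabulary (K = 3); a real vocabulary would be far larger.
CONCEPTS = ["skateboard", "ramp", "kitchen"]

def event_score(x, w, b):
    """Linear event model: weighted sum of concept detector scores plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM with subgradient descent on the L2-regularized hinge loss.

    X: list of concept-score vectors, y: labels in {+1, -1}.
    The epochs, lr and reg values are assumptions for this toy example.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * event_score(x, w, b)
            # The regularizer shrinks the weights on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # The hinge loss is active: move the boundary toward this example.
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

# Made-up detector scores for four training videos.
X_train = [
    [0.9, 0.8, 0.1],   # positive: attempting a board trick
    [0.8, 0.6, 0.0],   # positive
    [0.1, 0.0, 0.9],   # negative
    [0.0, 0.2, 0.8],   # negative
]
y_train = [+1, +1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
```

A new video is then scored with `event_score(concept_scores, w, b)`; a positive score suggests the event. Because the model only sees concept scores, its quality depends directly on which concepts the vocabulary contains.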
Video sentence examples
Attempting a board trick
Working on a woodworking project
Changing a vehicle tire
Are more concepts better?
In general, more is better. But a vocabulary of 500 concepts exists that outperforms all others.
Mazloom et al., ICMR 2013
Results for “Landing a fish in”
A vocabulary of 100 concepts is the best performer
Informative concepts vs. all concepts
The 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
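Selecting a subset of informative concepts can be illustrated with a simple ranking. The slides do not say how informativeness is measured, so the criterion used here (the gap between a concept's mean detector score on positive and on negative videos) is a stand-in chosen for this sketch, and the concept names and scores are made up.

```python
def select_informative(X, y, fraction):
    """Return the indices of the top `fraction` of concepts, ranked by how far apart
    their mean detector scores are on positive and negative videos.

    This criterion is an illustrative assumption, not the measure used in the talk.
    """
    n_concepts = len(X[0])
    positives = [x for x, t in zip(X, y) if t == 1]
    negatives = [x for x, t in zip(X, y) if t != 1]

    def mean(rows, j):
        return sum(row[j] for row in rows) / len(rows)

    gaps = [abs(mean(positives, j) - mean(negatives, j)) for j in range(n_concepts)]
    n_keep = max(1, round(fraction * n_concepts))
    ranked = sorted(range(n_concepts), key=lambda j: gaps[j], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical concepts and detector scores for a "Landing a fish" style event.
concepts = ["fish", "boat", "sky", "person"]
X = [
    [0.9, 0.7, 0.5, 0.6],   # positive video
    [0.8, 0.8, 0.5, 0.5],   # positive video
    [0.1, 0.2, 0.5, 0.6],   # negative video
    [0.2, 0.1, 0.6, 0.5],   # negative video
]
y = [1, 1, 0, 0]

selected = select_informative(X, y, fraction=0.5)
selected_names = [concepts[j] for j in selected]
```

Only the selected columns would then be passed to the event classifier, so concepts that score alike on positive and negative videos no longer add noise to the representation.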
What concepts are informative?
Font size correlates with informativeness
Wedding Ceremony Landing a Fish
Visual translation
Represent images and text in unified semantic space
C1
Cn
C2
The 18th-largest country in the world in terms of area at 1,648,195 km², Iran has a population of around 75 million. It is a country of particular geo..
Concept Detectors (Textual) Concept Detectors (Visual)
Semantic space
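Once images and text are both described by scores over the same concepts C1 ... Cn, they can be compared directly. A minimal sketch, assuming cosine similarity as the comparison (the slides do not name a measure) and using made-up three-concept vectors and document names:

```python
import math

def cosine(u, v):
    """Cosine similarity between two concept-score vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_texts(image_vector, text_vectors):
    """Order text items by similarity to the image in the shared concept space."""
    return sorted(
        text_vectors,
        key=lambda name: cosine(image_vector, text_vectors[name]),
        reverse=True,
    )

# Hypothetical scores over three shared concepts (C1, C2, C3).
image_vector = [0.9, 0.1, 0.3]     # output of the visual concept detectors
text_vectors = {
    "doc_a": [1.0, 0.0, 0.2],      # output of the textual concept detectors
    "doc_b": [0.0, 1.0, 0.1],
}

ranking = rank_texts(image_vector, text_vectors)
```

The same function works in the other direction (text query, image collection), since both modalities live in one space.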
Example: query by a video
Video translation
Summary of most likely translations
Habibian et al., submitted
Conclusion
AI progress and human descriptions on the web act as a ‘Rosetta Stone’ for image understanding.
Automatic metadata generation jumps from words to sentences.
www.ceessnoek.info