Presentation 17 May morning keynote Cees Snoek

22-05-13


A Rosetta Stone for Image Understanding

Cees Snoek

University of Amsterdam

The Netherlands

Euvision Technologies

The Netherlands

A classical problem

Understanding was lost from 394 CE to 1822


Rosetta Stone discovery in 1799

A decree by King Ptolemy V – Hieroglyphs – Demotic script – Ancient Greek

Key to decipherment in 1822

JF Champollion

RECOGNIZING WORDS Understanding images

Mazloom et al., ICMR 2013


How difficult is the problem?

Human vision consumes 50% brain power…

Van Essen, Science 1992

Visual labeling in a nutshell

Visualization by Jasper Schulte


Visual labeling by machine

Encode Reduce

Encode Reduce

Learn

Label
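The four stages named on the slide (encode, reduce, learn, label) can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions: the intensity-histogram encoding, the bin-merging reduction, and the nearest-centroid learner are stand-ins chosen for brevity, not the methods used in the talk, and all function names are hypothetical.

```python
from collections import defaultdict

def encode(pixels, n_bins=8):
    """Encode: quantize intensities (0-255) into an n_bins histogram (assumed encoding)."""
    hist = [0] * n_bins
    for p in pixels:
        hist[min(p * n_bins // 256, n_bins - 1)] += 1
    return hist

def reduce_dims(hist, factor=2):
    """Reduce: merge adjacent bins, then L1-normalize (assumed reduction)."""
    merged = [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]
    total = sum(merged) or 1
    return [v / total for v in merged]

def learn(examples):
    """Learn: store one mean feature vector (centroid) per label (stand-in learner)."""
    grouped = defaultdict(list)
    for features, lab in examples:
        grouped[lab].append(features)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in grouped.items()}

def label(model, features):
    """Label: return the label whose centroid is closest in squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist(model[lab], features))

# Toy "images" given as short lists of pixel intensities (illustrative data only).
train = [([10, 20, 30, 40], "dark"), ([5, 15, 25, 60], "dark"),
         ([200, 220, 240, 250], "bright"), ([180, 210, 230, 255], "bright")]
model = learn([(reduce_dims(encode(px)), lab) for px, lab in train])
prediction = label(model, reduce_dims(encode([190, 205, 225, 245])))
```

The same encode and reduce steps run on both training and test images, which matches the two "Encode Reduce" rows on the slide.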

International competition

NIST TRECVID Benchmark

Promote progress in video retrieval research

Open data, tasks, evaluation and innovation

http://trecvid.nist.gov/


Are we making progress?

• 1000+ others

x MediaMill team

MediaMill team, TRECVID 2004-2012

Performance doubled in just 3 years Snoek & Smeulders, IEEE Computer 2010

Software licensed by Euvision Technologies


MediaMill video search engines

Learning from social-tagged images Xirong Li et al., TMM 2009

Exploit consistency in tagging behavior of different users for visually similar images


Tag relevance

Objective tags are identified and reinforced

Based on 3.5 Million images downloaded from Flickr
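The idea of exploiting consistent tagging among visually similar images can be sketched as a neighbor vote: a tag counts as relevant to an image when it occurs among that image's visual neighbors more often than its overall frequency would predict. The sketch below is an assumption-laden illustration of that idea, not the published implementation; the 2-D features, the tags, and the function name `tag_relevance` are all hypothetical.

```python
def tag_relevance(query_features, tag, collection, k=3):
    """Votes for `tag` among the k visual neighbors, minus the expected votes by chance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    neighbors = sorted(collection, key=lambda item: dist(item[0], query_features))[:k]
    votes = sum(1 for _, tags in neighbors if tag in tags)
    # Prior: votes expected if the k neighbors were drawn at random from the collection.
    prior = k * sum(1 for _, tags in collection if tag in tags) / len(collection)
    return votes - prior

# Toy collection of (visual feature vector, user tags); illustrative data only.
collection = [
    ([0.90, 0.10], {"beach", "sea"}),
    ([0.80, 0.20], {"beach", "holiday"}),
    ([0.85, 0.15], {"beach", "2009"}),
    ([0.10, 0.90], {"city", "2009"}),
    ([0.20, 0.80], {"city", "holiday"}),
    ([0.15, 0.85], {"city", "2009"}),
]
query = [0.88, 0.12]
beach_score = tag_relevance(query, "beach", collection)  # objective tag, reinforced
year_score = tag_relevance(query, "2009", collection)    # subjective tag, suppressed
```

In this toy data the objective tag "beach" scores above zero because all three neighbors carry it, while "2009" scores below zero because it is no more common near the query than elsewhere.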

RECOGNIZING SENTENCES Understanding images



Human event description on web video

We analyze 13K web videos and their descriptions

People competing in a sand sculpting competition and children playing on the beach.

A woman folds and packages a scarf she has made.

Habibian et al., ICMR 2013

Human concept-vocabulary

Consists of 5K distinct and mostly rare concepts

Includes general and specialized concepts

It is composed of various concept types

[Bar chart: portions (in %) of concepts per type, axis from 0 to 50. Labels: Object, Action, Scene, Attribute, Non Visual, Animal, People]


Concepts categorized by type

Object

People

Animal

Scene

Action

Attribute

From concepts to sentences

Input Video

Event Models

Concept 1

Concept 2

Concept K

…

Concept Vocabulary

Train SVM

Creating the concept vocabulary is critical
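The flow on this slide (input video, concept vocabulary, SVM training, event models) can be sketched as follows: each video becomes a vector of concept-detector scores, and a linear SVM trained on those vectors serves as the event model. This is a minimal sketch under assumptions: the four-concept vocabulary, the scores, and the helper names are hypothetical, and the tiny subgradient-descent trainer stands in for whatever SVM solver was actually used.

```python
VOCABULARY = ["skateboard", "ramp", "wood", "saw"]  # hypothetical concepts

def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM trained by stochastic subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin: step along the hinge subgradient and shrink w.
                w = [wj + lr * (yi * xj - reg * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer contributes.
                w = [wj - lr * reg * wj for wj in w]
    return w, b

def event_score(model, concept_scores):
    """Signed distance-like score; positive means the event is detected."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, concept_scores)) + b

# Concept-detector scores per video, in VOCABULARY order (illustrative data only).
X = [[0.9, 0.8, 0.1, 0.0], [0.8, 0.6, 0.2, 0.1],   # "Attempting a board trick"
     [0.1, 0.0, 0.9, 0.8], [0.2, 0.1, 0.7, 0.9]]   # other videos
y = [+1, +1, -1, -1]
board_trick_model = train_linear_svm(X, y)

board_video = event_score(board_trick_model, [0.85, 0.70, 0.15, 0.05])
wood_video = event_score(board_trick_model, [0.15, 0.05, 0.80, 0.85])
```

Because the event model only sees concept scores, its quality depends directly on which concepts are in the vocabulary.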

Sadanand, CVPR12 Merler, TMM12 Althoff, MM12

Attempting a board trick


Video sentence examples

Attempting a board trick

Working on a woodworking project

Changing a vehicle tire

Are more concepts better?

In general, more is better. But, a vocabulary of 500 concepts exists that outperforms all others



Results for “Landing a fish in”

A vocabulary of 100 concepts is the best performer

Informative concepts vs All concepts

The 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
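Keeping only the most informative fraction of a vocabulary can be sketched as a rank-and-cut step. The slide does not say how informativeness is measured, so the criterion below (the gap between a concept's mean detector score on positive and on negative videos) is purely an assumption, as are the concept names and all numbers. The second helper spells out what a relative increase means.

```python
def select_informative(pos, neg, vocabulary, fraction):
    """Keep the top `fraction` of concepts, ranked by an assumed informativeness score."""
    def mean(rows, c):
        return sum(r[c] for r in rows) / len(rows)
    gaps = [abs(mean(pos, c) - mean(neg, c)) for c in range(len(vocabulary))]
    keep = max(1, int(fraction * len(vocabulary)))
    ranked = sorted(range(len(vocabulary)), key=lambda c: gaps[c], reverse=True)
    return [vocabulary[c] for c in sorted(ranked[:keep])]

def relative_increase(baseline, improved):
    """(improved - baseline) / baseline; 0.65 corresponds to a 65% relative increase."""
    return (improved - baseline) / baseline

vocabulary = ["fish", "water", "sky", "person"]      # hypothetical concepts
pos = [[0.9, 0.8, 0.5, 0.6], [0.8, 0.7, 0.4, 0.5]]   # videos showing the event
neg = [[0.1, 0.3, 0.5, 0.6], [0.2, 0.2, 0.4, 0.5]]   # other videos
selected = select_informative(pos, neg, vocabulary, fraction=0.5)

# Illustrative accuracies only, not results from the talk.
gain = relative_increase(0.20, 0.33)
```

Here "sky" and "person" score the same on both sets of videos, so they carry no information about the event and are dropped.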


What concepts are informative

Font size correlates with informativeness

Wedding Ceremony Landing a Fish

Visual translation

Represent images and text in unified semantic space

C1

Cn

C2

The 18th-largest country in the world in terms of area at 1,648,195, Iran has a population of around 75 million. It is a country of particular geo..

Concept Detectors (Textual) Concept Detectors (Visual)

Semantic Space
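A unified semantic space can be pictured as one vector per item, indexed by the same concepts C1 ... Cn, whether the item is text or video. The sketch below assumes a five-concept vocabulary, uses plain word matching as a stand-in for the textual concept detectors, and uses made-up scores for the visual ones; retrieval is then a cosine-similarity ranking. All names and values are hypothetical.

```python
import math

VOCAB = ["beach", "sand", "children", "scarf", "woman"]  # hypothetical concepts

def text_to_concepts(text):
    """Textual 'detectors': count occurrences of each vocabulary word (assumed stand-in)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [float(words.count(concept)) for concept in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Visual concept-detector scores per video, in VOCAB order (illustrative data only).
videos = {
    "sand_sculpting": [0.9, 0.8, 0.7, 0.0, 0.1],
    "scarf_folding":  [0.0, 0.1, 0.0, 0.9, 0.8],
}

query = text_to_concepts("A woman folds and packages a scarf she has made.")
ranked = sorted(videos, key=lambda v: cosine(query, videos[v]), reverse=True)
```

Because both modalities live in the same space, the ranking also works in the other direction: a video's concept vector can be compared against the vectors of candidate sentences.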


Example: query by a video

Video translation

Summary of most likely translations

Habibian et al., submitted


Conclusion

AI progress and human descriptions on the web act as a ‘Rosetta Stone’ for image understanding.

Automatic metadata generation jumps from words to sentences.

www.ceessnoek.info
