Page 1:

Content-based image and video analysis

Tools and Libraries

19.07.2010

Hazım Kemal Ekenel, [email protected]

Rainer Stiefelhagen, [email protected]

Page 2:

Lecture overview

Labeled data is needed in almost all discussed approaches
But: labeling data is tedious and expensive work
We will discuss different approaches to solve this problem
Software libraries for standard tasks can drastically reduce development time
We will discuss software libraries for image processing and machine learning

Page 3:

Labels

Diverse types of data annotations are needed
  Face recognition: face bounding box, facial landmarks, identity
  Object recognition: bounding box, object class
  High-level features: images depicting a concept
  Genre classification: videos of a certain genre
The only way to get the labels is to have people manually label the data
  It is a very boring task
  People need to be compensated somehow
  Most common way: just pay them! Can be quite expensive for large datasets

Page 4:

LabelMe

Collaborative labeling

To download the database, a certain number of images have to be annotated
Label quality is probably good, since the annotators will use the labels themselves
Public (everyone can join)
A large database (460,000 labeled objects of all kinds)

http://labelme.csail.mit.edu/

Credit: Franziska Kraus

Page 5:

Annotation Tool

Draw polygons and name the labeled object

Page 6:

Annotation Tool

Quality of polygons and names is not ensured by supervision → still sufficient
A lot of images have more than 80% of their pixels labeled (a computation sketched below)
Several different object categories in many images
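
As an aside, the pixel-coverage statistic is easy to compute from polygon annotations. Below is a minimal sketch (not LabelMe's own tooling) that rasterizes the polygons with Pillow (PIL) and measures the covered fraction; the input format is an assumption.

```python
from PIL import Image, ImageDraw

def labeled_fraction(size, polygons):
    """size = (width, height); polygons = list of [(x, y), ...] vertex lists."""
    mask = Image.new("1", size, 0)                 # binary coverage mask
    draw = ImageDraw.Draw(mask)
    for poly in polygons:
        draw.polygon(poly, fill=1)                 # mark each labeled region
    covered = sum(1 for v in mask.getdata() if v)  # pixels covered at least once
    return covered / float(size[0] * size[1])

# A polygon covering the left half of a 640x480 image yields roughly 0.5:
# labeled_fraction((640, 480), [[(0, 0), (319, 0), (319, 479), (0, 479)]])
```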

Page 7:

Annotation Tool

Objects are mostly labeled completely, despite partial occlusions

Object-parts hierarchies

Depth-ordering

Page 8:

Browse Database

Page 9:

Query Database

Page 10:

Query Database

Label names are up to the user

→ no consistency

Use WordNet (a dictionary) to group categories (a sketch follows below)
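
A hedged sketch of one way to do this grouping with NLTK's WordNet interface; LabelMe's actual procedure may differ in detail.

```python
from collections import defaultdict
from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

def canonical(label):
    """First noun synset for the label, or the raw label if none is found."""
    synsets = wn.synsets(label, pos=wn.NOUN)
    return synsets[0].name() if synsets else label

groups = defaultdict(list)
for label in ["car", "auto", "automobile", "bike", "bicycle"]:
    groups[canonical(label)].append(label)

# "car", "auto" and "automobile" all map to the synset 'car.n.01',
# and "bike" and "bicycle" both map to 'bicycle.n.01'.
```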

Page 11:

Object-parts hierarchies

Polygons with high overlap indicate either
  an object-part hierarchy (e.g. head and body), or
  occlusion
For a given query (e.g. “car”), check for polygons that often have a high overlap with it (e.g. “wheel”)
Compute a score as the percentage of images where the part (“wheel”) has high overlap with the object (“car”)
→ List of object-part candidates (scoring sketched below)
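
The scoring idea fits in a few lines. The sketch below uses the shapely geometry package; the per-image data structure (a dict mapping label to polygons) is an assumption for illustration, not LabelMe's format.

```python
from shapely.geometry import Polygon

def overlap_ratio(part, obj):
    """Fraction of the part polygon lying inside the object polygon."""
    return part.intersection(obj).area / part.area if part.area else 0.0

def part_score(images, obj_label, part_label, thresh=0.5):
    """Percentage of images containing obj_label in which some part_label
    polygon overlaps an obj_label polygon by more than thresh."""
    hits = total = 0
    for polygons in images:                 # polygons: {label: [Polygon, ...]}
        objs = polygons.get(obj_label, [])
        if not objs:
            continue
        total += 1
        parts = polygons.get(part_label, [])
        if any(overlap_ratio(p, o) > thresh for p in parts for o in objs):
            hits += 1
    return 100.0 * hits / total if total else 0.0
```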

Page 12:

Occlusion / Depth-ordering

Simple heuristics

Some things can never occlude other objects (e.g. sky)
An object completely contained in another is on top
  May be wrong if the containing object is transparent, etc.
The polygon with more control points in the intersecting area is on top
Use color histograms: compare the histogram of the overlapping region to those of the two other regions
  The more similar region is on top (sketched below)
The combined heuristic achieves 2.9% error
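
The histogram heuristic might look like the following NumPy sketch; the bin count and the use of histogram intersection as the similarity measure are assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def color_hist(pixels, bins=8):
    """Coarse, normalized RGB histogram; pixels is an (N, 3) array in [0, 255]."""
    h, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    h = h.ravel()
    return h / max(h.sum(), 1)

def on_top(overlap_px, a_px, b_px):
    """Return 'a' if region a's colors better match the overlapping region,
    else 'b'; the better-matching region is assumed to be on top."""
    h_o, h_a, h_b = color_hist(overlap_px), color_hist(a_px), color_hist(b_px)
    sim_a = np.minimum(h_o, h_a).sum()      # histogram intersection
    sim_b = np.minimum(h_o, h_b).sum()
    return 'a' if sim_a >= sim_b else 'b'
```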

Page 13:

Semi-automatic labeling

Use available labels to train detectors

Run detector on unlabeled data

Let the user verify the generated labels (loop sketched below)
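
A minimal sketch of this loop; train_detector, detect, and ask_user are hypothetical placeholders (standing in for, e.g., Haar-cascade training, detection, and a verification UI), not an API from the lecture.

```python
def semi_automatic_labeling(labeled, unlabeled, rounds=3):
    """Grow a labeled set by alternating detector training and human checks."""
    for _ in range(rounds):
        detector = train_detector(labeled)        # e.g. train a Haar cascade
        still_unlabeled = []
        for image in unlabeled:
            proposals = detect(detector, image)   # [(bounding_box, label), ...]
            accepted = [(b, l) for b, l in proposals if ask_user(image, b, l)]
            if accepted:                          # the human confirmed a proposal
                labeled.extend((image, b, l) for b, l in accepted)
            else:
                still_unlabeled.append(image)
        unlabeled = still_unlabeled
    return labeled
```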

Page 14:

Summary of LabelMe

Advantages
  Motivation to label images is given: to download the database, a user first has to label a certain number of images
  → Data quality is probably good
Disadvantages
  Only a few people are interested in the labels, and they are the only ones providing them
Is it possible to get people with no interest in the labels to annotate the images (without paying them)?

Page 15:

Human Computation

Humans can solve problems that computers can't solve yet
  Simple example: CAPTCHAs
Humans are far better than computers at labeling and tagging images

How do we get them to do that?

Do people on the Internet have enough time?

Page 16:

Human Computation

“People all over the world spent 9 billion hours playing solitaire in 2003” (Luis von Ahn, 2006)

Construction of the Empire State Building took ~6.8 hours of solitaire play at that rate (7 million human-hours)
Building the Panama Canal took less than 1 day of solitaire play (20 million human-hours)

Page 17:

Human Computation

Humans have a lot of time

Humans can easily label and tag images

BUT you'll have to pay them to do that for you

Solution: create a game that encourages people to label images
  Or any other task you want them to do

Page 18:

Games With A Purpose

http://www.gwap.com/

Page 19:

ESP-Game

Game for tagging images from the Web
Users see a picture and have to agree on a tag for what the picture contains
→ A lot of image-tag pairs, but no information about the location or size of objects in the image

Page 20:

ESP-Game

Page 21:

User statistics

In the first four months after release
  13,630 people played the game
  80% of them played more than once
  1,271,451 labels for 293,760 images were generated
  33 people played more than 1,000 games (>50 h each)
Extrapolating
  5,000 people playing 24 h/day could label all images in Google image search in one month!
  On popular online gaming sites, many more players are online at a time (>100,000 players)

Page 22:

Label quality

Search for labels
  For 10 random labels, images carrying that label were displayed
  For all returned images, the label made sense
  Very high precision
Manually labeled images
  For all images, at least 83% of the ESP game tags were also used by the manual annotators
  For all images, the three most common tags used by the manual annotators were also among the ESP game tags
Manual quality assessment
  People would use 85% of the ESP game tags to describe the corresponding image
  Only 1.7% of the tags don’t make sense as a description of the image

Page 23:

Peekaboom

Game for locating objects in images

Improves data collected by the ESP Game

Two random players are paired up

Boom (Player 1) reveals parts of the image

Peek (Player 2) guesses the associated word

Role switch on successful guess

Playing “bots” fill in when the number of players is uneven or when people quit the game

Page 24:

Game Overview

Page 25:

Pings

Boom can “ping” parts of the revealed object to point out particular parts

Page 26:

Word – Image Relation

Boom can give hints about how the word relates to the image

Page 27:

Image Metadata

For each image-word pair, metadata is collected
  How the word relates to the image: hints
  Pixels necessary to guess the word: the area that is revealed
  Pixels inside the specified object: pings
  Most salient aspects of objects in the image: the sequence of Boom's clicks
  Elimination of poor image-word pairs: many pairs of players click “pass”

Page 28:

Cheating

People could try to cheat by logging in at the same time and telling each other which words to type
→ reveal the wrong parts and still type the right word

Multiple anti-cheating mechanisms

Page 29:

Anti-Cheating Mechanisms

Player queue
  A player has to wait n seconds until being paired up
IP address checks
Seed images
  Images with hand-verified metadata are mixed into the game
Limited freedom to enter guesses
  The guess field could otherwise be used for communication
  → only letters, and only words in the dictionary (sketched below)
Aggregating data from multiple players
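
The letters-and-dictionary restriction is simple to express; this sketch uses a stand-in word list, since Peekaboom's actual dictionary is not public.

```python
import re

DICTIONARY = {"car", "dog", "tree", "house"}   # stand-in word list

def valid_guess(guess):
    """Accept only alphabetic words that appear in the dictionary."""
    word = guess.lower()
    return bool(re.fullmatch(r"[a-z]+", word)) and word in DICTIONARY
```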

Page 30:

Implementation

Spelling check
  Incorrectly spelled words are displayed in a different color
Inappropriate word replacement
  Substituted with words like “ILuvPeekaboom”
Top scores list and ranks
  Top scores of the day and of all time
  Users get a rank based on their total number of points

Page 31:

Additional Applications

Improving image-search results

Peekaboom gives an estimate of the fraction of the image that is related to the word
→ use these fractions to order image results

Use ping data for pointing to objects

Object bounding boxes

Image search engine with highlighted results

Page 32:

Ping Accuracy

Use ping data for pointing
  Arrow lines pointing to the objects
  Pings selected at random
  100% accuracy shown in the experiment

Page 33:

Bounding Box Accuracy

Bounding boxes created with Peekaboom
The lowest overlap with a user-created box was 50% (one common overlap measure is sketched below)
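
For reference, this sketch computes intersection-over-union, a standard box-overlap measure; whether the Peekaboom evaluation used exactly this ratio is an assumption.

```python
def box_overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```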

Page 34:

User Statistics

How enjoyable is the game?

14,000 different people and 1.1 million pieces of data during the first month

User comments:


“One unfortunate side effect of playing so much in such a short time was a mild case of carpal tunnel syndrome in my right hand and forearm, but that dissipated quickly.”

“This game is like crack. I've been Peekaboom-free for 32 hours. Unlike other games, Peekaboom is cooperative.”

“[...] I would say that it gives the same gut feeling as combining gambling with charades while riding on a roller coaster. The good points are that you increase and stimulate your intelligence, you don't lose all your money and you don't fall off the ride. The bad point is that you look at your watch and eight hours have just disappeared!”

Page 35:

Summary of Peekaboom

Peekaboom works because people like to play games
Experiments show that the results are sufficiently accurate
In addition to the ESP data, information about the location and size of objects in the image is retrieved

Page 36:

Other “Games with a purpose”

Tag a Tune
  A piece of music is played to you and a partner
  You must describe the music with words
  Based on your partner's description, you have to decide whether you two are listening to the same song
Verbosity
  Describe a word to your partner using other words
  The partner must guess the secret word
Squigl
  Trace the outline of an object in the same way as your partner
  Very limited time (5-10 seconds)
Matchin
  Decide which one of two images you like best
  Points when your partner agrees
Popvideo
  You and your partner are shown a video clip
  You have to enter tags describing the video (and audio!)
  Points when tags match

Page 37:

Publicly available labeled datasets

Some datasets are available for free in order to advance research
  Caltech 101/256, AR, …
Some evaluation campaigns generate a lot of labeled data and provide it
  for everybody: PASCAL VOC challenge
  for participants only: TRECVID, ImageCLEF

Page 38:

Caltech 101/256

Freely available
  http://vision.caltech.edu/Image_Datasets/Caltech101
  http://vision.caltech.edu/Image_Datasets/Caltech256

Pictures of 101/256 object categories

Labels and outlines of the objects

Page 39:

PASCAL

Collection of object recognition databases with ground truth

VOC challenge uses it

Freely available
  http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html

Page 40:

Software libraries

Many systems use standard algorithms
  Linear algebra
  Image processing: image filters, image features, etc.
  Machine learning: SVMs, PCA, LDA, etc.
Advantages of using standard libraries
  Avoid errors in your own implementation; leverage the know-how of other people
  The implementation details of even simple algorithms can be quite tricky!
  Save a lot of development time: more time to work on your own algorithm

Page 41:

OpenCV

Open Computer Vision library
  http://sourceforge.net/projects/opencvlibrary/
  http://opencv.willowgarage.com/
Features
  C-like interface (version 2.0 is more C++)
  Linear algebra
  Image processing functions: filters, FFT, DCT, etc.
  Image features: SURF, HOG, Haar-like features (mainly in 2.0/SVN)
  Detectors: Haar cascades (training & detection)
  Machine learning: SVMs, neural nets, decision trees, GMMs, …
  Much, much more…
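
As a taste of the library, here is face detection with a pretrained Haar cascade via OpenCV's newer Python (cv2) bindings; the cascade path and image file names are placeholders for your installation.

```python
import cv2

# Pretrained frontal-face cascade shipped with OpenCV (path varies by install)
cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.jpg")              # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

for (x, y, w, h) in faces:                       # one box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", img)
```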

Page 42:

OKAPI

Open Karlsruhe library for processing of images
  http://cvhci.anthropomatik.kit.edu/okapi
Features
  C++
  Cameras (V4L, FireWire, VfW)
  Videos (frame-accurate random access)
  Image features (DCT, Gabor, LBP, MCT, …)
  Linear projections (PCA, LDA, RCA)
  Detector (using MCT features, very fast)
  SVMs (libsvm, liblinear)
  3D geometry functions
  Simple GUI for prototypes
  Many utility functions (timing, XML, in-memory image IO, …)

Page 43:

SVMs

libsvm
  http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  Simple and standard
liblinear (linear SVMs)
  http://www.csie.ntu.edu.tw/~cjlin/liblinear/
  Much faster for linear SVMs
SVMlight
  http://svmlight.joachims.org/
  Also very popular
Shogun toolbox
  http://www.shogun-toolbox.org/
  Many kernel functions and Multiple Kernel Learning (MKL)
  Uses libsvm or SVMlight in the background
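
For example, libsvm ships with a small Python interface (svmutil); a toy run with an RBF kernel might look like this (the data is made up for illustration).

```python
from svmutil import svm_train, svm_predict

# Two classes in 2-D, in libsvm's sparse dict format {feature_index: value}
y = [1, 1, -1, -1]
x = [{1: 0.9, 2: 1.0}, {1: 1.1, 2: 0.8},
     {1: -1.0, 2: -0.9}, {1: -0.8, 2: -1.1}]

model = svm_train(y, x, '-t 2 -c 1')             # RBF kernel (-t 2), C = 1
labels, accuracy, values = svm_predict(y, x, model)
```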

Page 44:

Machine Learning

Weka (Java)
  http://www.cs.waikato.ac.nz/ml/weka/
Java-ML (Java)
  http://java-ml.sourceforge.net/
Torch (C/Lua)
  http://torch5.sourceforge.net/
MLC++ (C++)
  http://www.sgi.com/tech/mlc/
MLPACK / FASTlib
  http://mloss.org/software/view/152/
  http://fastlib.analytics1305.com/
Spider (Matlab)
  http://www.kyb.tuebingen.mpg.de/bs/people/spider/
FLANN (approximate k-nearest neighbors)
  http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
List of open source machine learning software
  http://mloss.org/
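
As an example from the list, FLANN's Python bindings (pyflann) do approximate k-nearest-neighbor search in a few lines; treat the exact keyword arguments as an assumption to be checked against the pyflann documentation.

```python
import numpy as np
from pyflann import FLANN

dataset = np.random.rand(10000, 128).astype(np.float32)  # e.g. SIFT-like descriptors
queries = np.random.rand(5, 128).astype(np.float32)

flann = FLANN()
neighbors, dists = flann.nn(dataset, queries, num_neighbors=3,
                            algorithm='kdtree', trees=4)
# neighbors has shape (5, 3): indices of the 3 nearest dataset rows per query
```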

Page 45:

References

B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman: LabelMe: A Database and Web-Based Tool for Image Annotation. International Journal of Computer Vision, vol. 77, issue 1, May 2008.

L. von Ahn, L. Dabbish: Labeling Images with a Computer Game. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2004.

L. von Ahn, R. Liu, M. Blum: Peekaboom: A Game for Locating Objects in Images. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2006.

Page 46:

Lecture Overview

Introduction

Visual Descriptors

Image Segmentation

Classification

Shot Boundary Detection & Genre Classification

High-level Feature Detection

High-level Feature Detection II

Semantics

Event Recognition

Person Identification

Copy Detection

Search

Tools and Libraries


Computer Vision & Machine Learning Overview
Indexing & General concept detection
Matching
Special topics in concept detection
Automatic and interactive search
Annotation tools, databases, and software libraries
Intro. to Video processing