Words & Pictures: Clustering and Bag-of-Words Representations
Many slides adapted from Svetlana Lazebnik, Fei-Fei Li, Rob Fergus, and Antonio Torralba
Slide 2
Announcements
HW1 due Thurs, Sept 27 @ 12pm, by email to [email protected]. No need to include the shopping image. [email protected]
The write-up can be a webpage or a PDF.
Slide 3
Document Vectors: represent a document as a bag of words
Slide 4
Origin: Bag-of-words models
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
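As a minimal sketch of this idea, assuming a fixed dictionary and a simple lowercase tokenizer (both our own choices; the function name `bag_of_words` is hypothetical), a document vector is just the count of each dictionary word, with word order thrown away:

```python
import re
from collections import Counter

def bag_of_words(text, dictionary):
    """Count each dictionary word in `text`, returned in dictionary order.

    Word order is discarded: only frequencies survive.
    """
    # Assumed tokenizer: lowercase, then take runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Words that are not in the dictionary are simply ignored.
    return [counts[w] for w in dictionary]

dictionary = ["freedom", "nation", "war", "peace"]
doc = "Peace and freedom for the nation; peace, not war."
print(bag_of_words(doc, dictionary))  # [1, 1, 1, 2]
```

Because order is discarded, "peace not war" and "war not peace" map to the same vector.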
Slide 5
Origin: Bag-of-words models
US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
Slide 8
Bag-of-features models
Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Slide 9
Bags of features for image classification
1. Extract features
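The slides do not pin down what a "feature" is at this step; real systems often use SIFT-like descriptors. As a deliberately simple stand-in, the hypothetical `extract_dense_patches` below samples raw pixel patches on a regular grid of a grayscale image stored as a list of rows:

```python
def extract_dense_patches(image, patch_size=4, stride=4):
    """Cut a grayscale image (list of rows) into flattened square patches on a grid."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            # Flatten the patch row by row into one feature vector.
            patch = [image[top + r][left + c]
                     for r in range(patch_size) for c in range(patch_size)]
            features.append(patch)
    return features

# A synthetic 8x8 "image" for illustration only.
image = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
patches = extract_dense_patches(image)
print(len(patches), len(patches[0]))  # 4 16
```

Each patch becomes one feature vector; these vectors are what the later steps cluster and quantize.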
Slide 10
Bags of features for image classification
2. Learn visual vocabulary
Slide 11
Bags of features for image classification
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
Slide 12
Bags of features for image classification
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
4. Represent images by frequencies of visual words
2. Learning the visual vocabulary: Clustering (slide credit: Josef Sivic)
Visual vocabulary
Slide 21
Clustering
The assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters. Similarity is often assessed according to a distance measure. Clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.
Slide 24
Similarity can be measured with any of the metrics we discussed before (e.g., SSD or the angle between vectors).
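For reference, both metrics on plain Python vectors; `ssd` and `angle_between` are illustrative names, not a library API:

```python
import math

def ssd(a, b):
    """Sum of squared differences: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def angle_between(a, b):
    """Angle in radians between two nonzero vectors: smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp into [-1, 1] so floating-point drift cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

u, v = [1.0, 0.0], [2.0, 0.0]
print(ssd(u, v))            # 1.0
print(angle_between(u, v))  # 0.0
```

SSD depends on vector length, while the angle does not: u and v point the same way, so their angle is zero even though their SSD is not.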
Slide 25
Feature Clustering
Clustering is the process of grouping a set of features into clusters of similar features. Features within a cluster should be similar; features from different clusters should be dissimilar.
Slide 26
source: Dan Klein
Slide 27
K-means clustering
Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k.
source: Svetlana Lazebnik
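Written out with the slide's symbols (calling the objective D(X, M) and the number of clusters K is our own notation), the quantity to minimize is:

```latex
D(X, M) = \sum_{k=1}^{K} \; \sum_{i \,\in\, \text{cluster } k} \lVert x_i - m_k \rVert^2
```

Each point x_i counts only against the center m_k of the cluster it is assigned to, so the objective has to be minimized jointly over the assignments and the centers.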
Slide 40
source: Dan Klein
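The usual way to minimize the k-means objective is Lloyd's algorithm, which alternates between assigning each point to its nearest center and moving each center to the mean of its points. A minimal pure-Python sketch (the function names are our own; initializing from k random data points is an assumed, common choice):

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for the k-means objective; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialize the centers at k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the centers are already the means
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return centers, labels

def kmeans_objective(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(labels, kmeans_objective(points, centers, labels))
```

Neither step can increase the objective, so the loop settles, but only at a local minimum; a different seed can give a different clustering on harder data.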
Slide 42
Source: Hinrich Schütze
Slide 44
Hierarchical clustering strategies
Agglomerative clustering: start with each point in a separate cluster; at each iteration, merge the two closest clusters.
Divisive clustering: start with all points grouped into a single cluster; at each iteration, split the largest cluster.
source: Svetlana Lazebnik
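A naive sketch of the agglomerative strategy. "Closest clusters" needs a cluster-to-cluster distance; single linkage (the closest pair of points) is assumed here, and average or complete linkage would be equally valid choices. The sketch recomputes every pairwise linkage on each merge, so it is meant only to show the loop:

```python
import math
from itertools import combinations

def single_linkage(c1, c2):
    """Cluster distance = distance between the closest pair of points, one from each."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, num_clusters):
    """Bottom-up clustering: start from singletons and merge the closest pair each round."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Search every pair of clusters for the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i is still valid
    return clusters

print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], num_clusters=3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(20, 20)]]
```

Recording the order of merges instead of stopping at num_clusters would give the full dendrogram.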
Slide 45
source: Dan Klein
Slide 47
Divisive Clustering
Top-down (instead of bottom-up as in agglomerative clustering): start with all docs in one big cluster, then recursively split clusters. Eventually each node forms a cluster on its own.
Source: Hinrich Schütze
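A sketch of the divisive strategy, assuming the common bisecting approach: pick the largest cluster and split it in two with 2-means. The seeding rule (first point plus the point farthest from it) and the function names are assumptions made to keep the example deterministic, and the loop stops at a requested number of clusters rather than running down to singletons:

```python
import math

def two_means_split(cluster, iterations=20):
    """Split one cluster in two with 2-means; returns None if no split is possible."""
    # Deterministic seeding (an assumption): the first point and the point farthest from it.
    first = cluster[0]
    farthest = max(cluster, key=lambda p: math.dist(p, first))
    centers = (first, farthest)
    halves = None
    for _ in range(iterations):
        new_halves = ([], [])
        for p in cluster:
            side = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            new_halves[side].append(p)
        if not new_halves[0] or not new_halves[1] or new_halves == halves:
            break  # degenerate split, or converged: keep the previous halves
        halves = new_halves
        centers = tuple(tuple(sum(c) / len(h) for c in zip(*h)) for h in halves)
    return halves

def divisive(points, num_clusters):
    """Top-down clustering: start with one cluster and repeatedly bisect the largest."""
    clusters = [list(points)]
    while len(clusters) < num_clusters:
        clusters.sort(key=len)
        largest = clusters.pop()
        halves = two_means_split(largest)
        if halves is None:  # every point in the largest cluster is identical
            clusters.append(largest)
            break
        clusters.extend(halves)
    return clusters

print(divisive([(0, 0), (0, 1), (10, 0), (10, 1), (30, 0)], num_clusters=3))
```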
Slide 48
Flat or hierarchical clustering?
For high efficiency, use flat clustering (e.g., k-means).
For deterministic results, use hierarchical clustering.
When a hierarchical structure is desired, use a hierarchical algorithm.
Hierarchical clustering can also be applied if K cannot be predetermined (it can start without knowing K).
Source: Hinrich Schütze
Slide 49
2. Learning the visual vocabulary: Clustering (slide credit: Josef Sivic)
Slide 51
From clustering to vector quantization
Clustering is a common method for learning a visual vocabulary or codebook.
It is an unsupervised learning process: each cluster center produced by k-means becomes a codebook entry.
The codebook can be learned on a separate training set; provided the training set is sufficiently representative, the codebook will be universal.
The codebook is used for quantizing features: a vector quantizer takes a feature vector and maps it to the index of the nearest entry in the codebook.
Codebook = visual vocabulary; codebook entry = visual word.
Slide 52
Example visual vocabulary (Fei-Fei et al., 2005)
Slide 53
Image patch examples of visual words (Sivic et al., 2005)
Slide 54
Visual vocabularies: Issues
How to choose the vocabulary size?
Too small: visual words are not representative of all patches.
Too large: quantization artifacts, overfitting.
Computational efficiency: vocabulary trees (Nister & Stewenius, 2006)
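To make the efficiency point concrete, here is a small sketch of the vocabulary-tree idea, assuming Euclidean features: hierarchical k-means with branching factor b and depth L yields up to b^L visual words, yet quantizing a feature compares it against only b centers per level (b·L in total) instead of against every word. The class and function names are hypothetical, and the sketch covers only building and descending the tree:

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, points grouped by center)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to its group's mean; keep the old center if a group is empty.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, groups

class VocabNode:
    def __init__(self, center):
        self.center = center  # center compared against when descending (None at the root)
        self.children = []
        self.word_id = None   # set only on leaves: the visual-word index

def build_tree(points, branching, depth, center=None, counter=None):
    """Hierarchical k-means: split into `branching` clusters, then recurse on each."""
    if counter is None:
        counter = [0]  # shared running count used to number the leaves
    node = VocabNode(center)
    if depth == 0 or len(points) < branching:
        node.word_id = counter[0]
        counter[0] += 1
        return node
    centers, groups = kmeans(points, branching)
    for c, g in zip(centers, groups):
        node.children.append(build_tree(g, branching, depth - 1, c, counter))
    return node

def lookup(tree, feature):
    """Quantize by descending the tree, comparing against `branching` centers per level."""
    node = tree
    while node.children:
        node = min(node.children, key=lambda child: math.dist(feature, child.center))
    return node.word_id

points = [(x, y) for x in (0, 1, 20, 21) for y in (0, 1, 20, 21)]
tree = build_tree(points, branching=2, depth=2)
print(lookup(tree, (0.2, 0.3)))
```

The trade-off is that the greedy descent may not reach the single nearest leaf overall, which is the price paid for the much cheaper lookup.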
Slide 55
3. Image representation: histogram of codeword frequencies (x-axis: codewords; y-axis: frequency)
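Given the visual-word index of every feature in an image, the representation is a histogram over the vocabulary. Normalizing by the number of features (an assumed but common choice) keeps images with different numbers of features comparable; `bow_histogram` is a hypothetical name:

```python
from collections import Counter

def bow_histogram(word_ids, vocab_size, normalize=True):
    """Represent an image by how often each visual word occurs in it."""
    counts = Counter(word_ids)
    hist = [counts[w] for w in range(vocab_size)]
    if normalize and word_ids:
        # Divide by the feature count so the entries sum to 1.
        total = len(word_ids)
        hist = [h / total for h in hist]
    return hist

print(bow_histogram([0, 2, 2, 1, 2], vocab_size=4))  # [0.2, 0.2, 0.6, 0.0]
```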
Slide 56
Image classification (next)
Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Slide 57
Clustering in Action
Slide 58
Names and Faces: Who's in the picture?
T.L. Berg, A.C. Berg, J. Edwards, D.A. Forsyth
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Slide 59
Intuition: George Bush
Slide 60
500k News Corpora
Producer and director Bruce Paltrow has died at the age of 58 in Rome, Italy, the U.S. Consulate said on October 3, 2002. Paltrow had suffered from throat cancer for several years, but the cause of his death was not immediately known. He is seen with his daughter actress Gwyneth Paltrow after the Academy Awards in Los Angles in March 21, 1999 file photo. (Fred Prouser/Reuters)
Actress Winona Ryder (news) reacts to remarks by prosecutor Ann Rundle during the sentencing hearing in her felony shoplifting case Friday, Dec. 6, 2002 at the Beverly Hills, Calif., courthouse. At right is Ryder's attorney Mark Geragos. Ryder was sentenced to three years of probation and was ordered to perform 480 hours of community service. (AP Photo/Steve Grayson, POOL)
Slide 61
Name & Face Extraction
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Detected Faces
Slide 62
Name & Face Extraction
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Detected Names: President George W. Bush, Defense Donald Rumsfeld, Saddam Hussein.
Detected Faces
Slide 63
Goal
Each name in the dataset is a potential cluster. We want to simultaneously:
1.) Learn an image model for each person.
2.) Learn a depiction model across names.
We achieve both by treating them as one big assignment (clustering) problem.
Slide 64
Assignment Problem
Slide 65
Language indicates Depiction
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters
Cues: POS tags before and after the name; location in the caption; distance to the closest of "( )", "(L)", "(C)", "(R)", "left", "right", "center", "shown", "pictured", "above".
P(Depicted | Context): Yes/No, from multiple independent cues.
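If the cues are treated as independent given whether the name is depicted, a naive Bayes classifier is one natural way to combine them into P(Depicted | Context). The sketch below only illustrates that assumption and is not necessarily the authors' exact model; the cue names (`near_marker`, `position`) and the six training examples are made up:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (cues_dict, depicted_bool); cues assumed independent given the label."""
    label_counts = Counter(label for _, label in examples)
    cue_counts = defaultdict(Counter)  # (label, cue_name) -> Counter of cue values
    values = defaultdict(set)          # cue_name -> set of observed values
    for cues, label in examples:
        for name, value in cues.items():
            cue_counts[(label, name)][value] += 1
            values[name].add(value)
    return label_counts, cue_counts, values

def p_depicted(model, cues):
    """Posterior P(depicted | cues), with add-one smoothing (one extra slot for unseen values)."""
    label_counts, cue_counts, values = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in (True, False):
        score = math.log((label_counts[label] + 1) / (total + 2))
        for name, value in cues.items():
            count = cue_counts[(label, name)][value]
            score += math.log((count + 1) / (label_counts[label] + len(values[name]) + 1))
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])

# Hypothetical toy training data, for illustration only.
examples = [
    ({"near_marker": "(L)", "position": "first_sentence"}, True),
    ({"near_marker": "(R)", "position": "first_sentence"}, True),
    ({"near_marker": "none", "position": "first_sentence"}, True),
    ({"near_marker": "none", "position": "later"}, False),
    ({"near_marker": "none", "position": "later"}, False),
    ({"near_marker": "none", "position": "first_sentence"}, False),
]
model = train_naive_bayes(examples)
print(p_depicted(model, {"near_marker": "(L)", "position": "first_sentence"}))
```

Working in log space avoids underflow once many cues are multiplied together.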
Slide 66
Method
1.) Update assignments.
2.) Update the appearance model for each person and the language model of depiction across names.
Iterate steps 1-2.
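A stripped-down sketch of this alternation, assuming that each person's appearance model is just the mean of the face vectors currently assigned to that name and that a face may only take a name from its own caption. It omits the language model of depiction entirely; the function name and the toy vectors are hypothetical:

```python
import math

def assign_faces(faces, candidates, iterations=10):
    """Alternate between updating each name's appearance model and reassigning faces.

    faces: list of feature vectors; candidates: one list of caption names per face.
    """
    # Initial assignment: the first name in each caption (an arbitrary choice).
    labels = [names[0] for names in candidates]
    for _ in range(iterations):
        # Appearance model update: mean vector of the faces assigned to each name.
        members = {}
        for face, name in zip(faces, labels):
            members.setdefault(name, []).append(face)
        means = {name: [sum(c) / len(fs) for c in zip(*fs)] for name, fs in members.items()}
        # Assignment update: each face takes the closest modeled name from its own caption.
        # Its current name always has a model, so `modeled` is never empty.
        new_labels = []
        for face, names in zip(faces, candidates):
            modeled = [n for n in names if n in means]
            new_labels.append(min(modeled, key=lambda n: math.dist(face, means[n])))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

faces = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.0), (10.0, 9.5), (0.0, 0.5)]
candidates = [["Bush", "Rumsfeld"], ["Bush", "Rumsfeld"], ["Bush"], ["Rumsfeld"], ["Bush", "Powell"]]
print(assign_faces(faces, candidates))  # ['Bush', 'Rumsfeld', 'Bush', 'Rumsfeld', 'Bush']
```

Faces whose caption has only one name anchor that name's model, which then helps resolve the ambiguous captions.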
Slide 67
Results
British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of 'The Road to Perdition', September 18, 2002. The films stars Tom Hanks as a Chicago hit man who has a separate family life and co-stars Paul Newman and Jude Law. REUTERS/Dan Chung
World number one Lleyton Hewitt of Australia hits a return to Nicolas Massu of Chile at the Japan Open tennis championships in Tokyo October 3, 2002. REUTERS/Eriko Sugita
Slide 68
Results
US President George W. Bush (L) makes remarks while Secretary of State Colin Powell (R) listens before signing the US Leadership Against HIV/AIDS, Tuberculosis and Malaria Act of 2003 at the Department of State in Washington, DC. The five-year plan is designed to help prevent and treat AIDS, especially in more than a dozen African and Caribbean nations (AFP/Luke Frazza)
German supermodel Claudia Schiffer gave birth to a baby boy by Caesarian section January 30, 2003, her spokeswoman said. The baby is the first child for both Schiffer, 32, and her husband, British film producer Matthew Vaughn, who was at her side for the birth. Schiffer is seen on the German television show 'Bet It...?!' ('Wetten Dass...?!') in Braunschweig, on January 26, 2002. (Alexandra Winkler/Reuters)
Slide 69
Example labels without / with the language model: CEO Summit / Martha Stewart; James Bond / Pierce Brosnan; Dick Cheney / George W. Bush
Model                                Accuracy of labeling
Vision model, no language model      67%
Vision model + language model        78%
Slide 70
Face Dictionary
http://tamaraberg.com/faces/faceDict/NIPSdict/index.html
Slide 71
Results: Depiction classifier
Classifier                    % correct
Baseline (all pictured)       67%
Learned language model        86%
IN = pictured, OUT = not pictured