![Page 1: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/1.jpg)
Inferring What’s Important in Image Search
Kristen Grauman
University of Texas at Austin
With Adriana Kovashka, Devi Parikh, and Sung Ju Hwang
![Page 2: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/2.jpg)
“Visual” search 1.0
• Associate images by keywords and meta-data
![Page 3: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/3.jpg)
Visual search 2.0
• Auto-annotate images with relevant keywords:
objects, attributes, scenes, visual concepts…
cow
furry
black
outdoors
[Kumar et al. 2008, Snoek et al. 2006, Naphade et al. 2006, Chang et al.
2006, Vaquero et al. 2009, Berg et al. 2010, and many others…]
Kristen Grauman, UT-Austin
![Page 4: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/4.jpg)
Problem
• Fine-grained visual differences beyond keyword
composition influence image search relevance.
Similar object distributions, yet are they equally relevant?
![Page 5: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/5.jpg)
Problem
• Fine-grained visual differences beyond keyword
composition influence image search relevance.
How to capture target with a single description?
≠ brown strappy heels
![Page 6: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/6.jpg)
Goal
• Fine-grained visual differences beyond keyword
composition influence image search relevance.
• Goal: Account for subtleties in visual relevance
– Implicit importance:
Infer which objects most define the scene
– Explicit importance:
Comparative feedback about which properties are
(ir)relevant
![Page 7: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/7.jpg)
Related work
• Region-noun correspondence [Duygulu et al. 2002,
Barnard et al. 2003, Berg et al. 2004, Gupta & Davis
2008, Li et al. 2009, Hwang & Grauman 2010,…]
• Dual-view image-text representations [Monay & Gatica-
Perez 2003, Hardoon & Shawe-Taylor 2003, Quattoni et
al. 2007, Bekkerman & Jeon 2007, Quack et al. 2008,
Blaschko & Lampert 2008, Qi et al. 2009,…]
• Image description and memorability [Spain & Perona
2008, Farhadi et al. 2010, Berg et al. 2011, Parikh &
Grauman 2011, Isola et al. 2011, Berg et al. 2012]
![Page 8: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/8.jpg)
Capturing relative importance
• Object presence ≠ importance
[Figure: a query versus retrieved images]
Can we infer what human viewers find most important?
![Page 9: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/9.jpg)
Capturing relative importance
• Intuition: Human-provided tags give useful cues beyond just which objects are present.
Based on tags alone, what can you say about the mug in each image?
Image 1 tags: Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it
Image 2 tags: Computer, Poster, Desk, Bookshelf, Screen, Keyboard, Screen, Mug, Poster
![Page 10: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/10.jpg)
![Page 11: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/11.jpg)
Our idea: Learning implicit importance
• Learn cross-modal representation that accounts for “what to mention” using implicit cues from text
Training: human-given descriptions
Visual: texture, scene, color…
Textual: frequency, relative order, mutual proximity
TAGS: Cow, Birds, Architecture, Water, Sky
![Page 12: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/12.jpg)
Our idea: Learning implicit importance
Importance = how likely an object is named early on by a human describing an image.
![Page 13: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/13.jpg)
Implicit tag features
• Presence or absence of other objects affects the scene layout → record bag-of-words frequency.
• People tag the “important” objects earlier → record rank of each tag compared to its typical rank.
• People tend to move eyes to nearby objects after first fixation → record proximity of all tag pairs.
Example tag lists, numbered by tag order:
Image 1: 1 Mug, 2 Key, 3 Keyboard, 4 Toothbrush, 5 Pen, 6 Photo, 7 Post-it
Image 2: 1 Computer, 2 Poster, 3 Desk, 4 Bookshelf, 5 Screen, 6 Keyboard, 7 Screen, 8 Mug, 9 Poster
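The three tag cues on this slide (frequency, rank relative to typical rank, and pairwise proximity) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `typical_ranks` and `tag_features` are hypothetical, "typical rank" is taken to be the mean first-mention position over the training tag lists, and proximity is encoded as the inverse of the smallest gap between two tags. The encodings in the cited paper may differ.

```python
from collections import Counter
from itertools import combinations


def typical_ranks(tag_lists):
    """Mean first-mention position (1-based) of each word over all tag lists."""
    positions = {}
    for tags in tag_lists:
        seen = set()
        for pos, word in enumerate(tags, start=1):
            if word not in seen:
                seen.add(word)
                positions.setdefault(word, []).append(pos)
    return {word: sum(p) / len(p) for word, p in positions.items()}


def tag_features(tags, vocab, typical_rank):
    """Return (frequency, relative_rank, proximity) for one ordered tag list.

    vocab is assumed to contain unique words.
    """
    # 1. Frequency: bag-of-words count for every vocabulary word.
    counts = Counter(tags)
    frequency = [counts[word] for word in vocab]

    # 2. Relative rank: typical position minus observed first position.
    #    Positive means the word was named earlier than usual; 0.0 if absent.
    first_pos = {}
    for pos, word in enumerate(tags, start=1):
        first_pos.setdefault(word, pos)
    relative_rank = [
        typical_rank[word] - first_pos[word] if word in first_pos else 0.0
        for word in vocab
    ]

    # 3. Proximity: inverse of the smallest gap between the two tags,
    #    0.0 when either tag is missing (an assumed encoding).
    proximity = {}
    for a, b in combinations(vocab, 2):
        gaps = [
            abs(i - j)
            for i, x in enumerate(tags) if x == a
            for j, y in enumerate(tags) if y == b
        ]
        proximity[(a, b)] = 1.0 / min(gaps) if gaps else 0.0
    return frequency, relative_rank, proximity


image1 = ["Mug", "Key", "Keyboard", "Toothbrush", "Pen", "Photo", "Post-it"]
image2 = ["Computer", "Poster", "Desk", "Bookshelf", "Screen",
          "Keyboard", "Screen", "Mug", "Poster"]
vocab = ["Mug", "Keyboard", "Screen"]
typical = typical_ranks([image1, image2])
print(tag_features(image1, vocab, typical))
print(tag_features(image2, vocab, typical))
```

With the two tag lists from the slide, "Mug" gets a positive relative rank in the first image (named first) and a negative one in the second (named eighth), which is the kind of signal the slide attributes to tag order.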
![Page 14: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/14.jpg)
Learning an importance-aware semantic space
[Figure labels: View x, View y, importance-aware semantic space, untagged query image]
[Hwang & Grauman, IJCV 2011]
![Page 15: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/15.jpg)
Learning an importance-aware semantic space
Linear CCA: Given paired data, select projection bases for the two views.
Kernel CCA: Given a pair of kernel functions, same objective, but projections in kernel space.
[Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]
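The slide's equations did not survive extraction. For orientation, the standard textbook form of the CCA objective is sketched below. The symbols are assumptions chosen here, not recovered from the slide: $C_{xx}$, $C_{yy}$, $C_{xy}$ are the covariance and cross-covariance matrices of the two views, $K_x$, $K_y$ are the kernel matrices, and $\alpha$, $\beta$ are the kernel-space coefficient vectors. Regularization, which practical kernel CCA requires, is omitted.

```latex
% Linear CCA: choose projection bases w_x, w_y that maximize the
% correlation between the two projected views.
\max_{w_x,\, w_y} \;
  \frac{w_x^\top C_{xy}\, w_y}
       {\sqrt{w_x^\top C_{xx}\, w_x}\;\sqrt{w_y^\top C_{yy}\, w_y}}

% Kernel CCA: same objective with w_x = \sum_i \alpha_i \phi_x(x_i) and
% w_y = \sum_i \beta_i \phi_y(y_i), expressed through the kernel matrices.
\max_{\alpha,\, \beta} \;
  \frac{\alpha^\top K_x K_y\, \beta}
       {\sqrt{\alpha^\top K_x^2\, \alpha}\;\sqrt{\beta^\top K_y^2\, \beta}}
```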
![Page 16: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/16.jpg)
Assumptions
1. People tend to agree about which objects most
define a scene.
2. Significance of those objects in turn influences
the order in which they are mentioned.
Evidence from previous studies that these hold:
[von Ahn & Dabbish 2004, Tatler et al. 2005, Spain
& Perona 2008, Einhauser et al. 2008, Elazary &
Itti 2008, Berg et al. 2012]
![Page 17: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/17.jpg)
Image+text datasets
• PASCAL VOC 2007 with tags ~10K images
• LabelMe images with tags ~4K images
• PASCAL VOC 2007 with sentences ~500 images
Text data collected on MTurk (~750 unique workers)
[Figure panels: Tags, Sentences]
![Page 18: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/18.jpg)
Results: Accounting for importance in image search
[Figure: query image with results retrieved by Visual only, Words + Visual, and Our method]
[Hwang & Grauman, IJCV 2011]
![Page 19: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/19.jpg)
![Page 20: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/20.jpg)
![Page 21: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/21.jpg)
Results: Accounting for
importance in image search
Our method better retrieves images that
share the query’s important objects
[Hwang & Grauman, IJCV 2011]
![Page 22: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/22.jpg)
Auto-tagging
We can also predict descriptions for novel images
[Figure: untagged query image mapped into the importance-aware semantic space; example tags shown include Cow, Tree, Grass, Field, and Fence]
[Hwang & Grauman, IJCV 2011]
![Page 23: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/23.jpg)
Results: Accounting for importance in auto-tagging
We can also predict descriptions for novel images
Example tag lists:
• Person, Tree, Car, Chair, Window
• Bottle, Knife, Napkin, Light, Fork
• Tree, Boat, Grass, Water, Person
• Boat, Person, Water, Sky, Rock
[Hwang & Grauman, IJCV 2011]
![Page 24: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/24.jpg)
What do human judges think?
Select those images below that contain the “most important” objects seen in the query.
![Page 25: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/25.jpg)
What do human judges think?
• Subjects are 323 MTurk workers
• Require unanimous vote among 5 for image to be considered relevant
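The relevance criterion above (a unanimous vote among 5 judges) is simple to state in code. The sketch below is an illustration only: the function name `relevant_images` and the input layout (a dict from image id to a list of boolean votes) are assumptions, not the format used in the study.

```python
def relevant_images(votes_per_image, judges=5):
    """Return ids of images that every one of the judges marked relevant."""
    relevant = []
    for image_id, votes in votes_per_image.items():
        if len(votes) != judges:
            raise ValueError(f"{image_id}: expected {judges} votes, got {len(votes)}")
        # Unanimity: a single dissenting vote makes the image non-relevant.
        if all(votes):
            relevant.append(image_id)
    return relevant


votes = {
    "img_a": [True, True, True, True, True],
    "img_b": [True, True, True, True, False],
}
print(relevant_images(votes))
```

Requiring unanimity is a strict criterion: it trades recall of relevant images for confidence that every image counted as relevant really is one.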
![Page 26: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/26.jpg)
Goal
• Fine-grained visual differences beyond keyword
composition influence image search relevance.
• Goal: Account for subtleties in visual relevance
– Implicit importance:
Infer which objects most define the scene
– Explicit importance:
Comparative feedback about which properties are
(ir)relevant
![Page 27: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/27.jpg)
Problem with one-shot visual search
• Keywords (including attributes) can be
insufficient to capture target in one shot.
≠ brown strappy heels
![Page 28: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/28.jpg)
Interactive visual search
[Figure: loop between Feedback and Results]
• Iteratively refine the set of retrieved images based on user feedback on results so far
• Potential to communicate more precisely the desired visual content
![Page 29: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/29.jpg)
Limitations of traditional interactive methods
• Tuning system parameters difficult for user [Flickner et al. 1995, Ma & Manjunath 1997, Iqbal & Aggarwal 2002]
[Figure: feature weights for color, texture, shape, … with example values 0.2, 0.2, 0.6, …]
![Page 30: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/30.jpg)
Limitations of traditional interactive methods
• Tuning system parameters difficult for user [Flickner et al. 1995, Ma & Manjunath 1997, Iqbal & Aggarwal 2002]
• Traditional binary feedback imprecise [Rui et al. 1998, Zhou et al. 2003, …]
[Figure: results for “white high heels”, each marked relevant or irrelevant]
![Page 31: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/31.jpg)
WhittleSearch: Relative attribute feedback
Whittle away irrelevant images via precise semantic feedback
Query: “white high-heeled shoes”
Initial top search results …
Feedback: “shinier than these”
Feedback: “more formal than these”
Refined top search results …
Kovashka, Parikh, and Grauman, CVPR 2012
![Page 32: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/32.jpg)
WhittleSearch: Relative attribute feedback
Whittle away irrelevant images via precise semantic feedback
Initial reference images …
Feedback: “broader nose”
Feedback: “similar hair style”
Refined top search results …
Kovashka, Parikh, and Grauman, CVPR 2012
![Page 33: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/33.jpg)
Visual attributes
• High-level semantic properties shared by objects
• Human-understandable and machine-detectable
Examples: brown, indoors, outdoors, flat, four-legged, high heel, red, has-ornaments, metallic
[Farhadi et al. 2009, Lampert et al. 2009, Kumar et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]
![Page 34: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/34.jpg)
Relative attributes
• Represent comparative relationships between classes, images, and their properties.
[Figure labels: Concept, Properties, “Bright”, “Brighter than”]
[Parikh & Grauman, ICCV 2011]
![Page 35: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/35.jpg)
Learning relative attributes
• We want to learn a spectrum (ranking model) for an attribute, e.g. “brightness”.
• Supervision consists of:
– Ordered pairs
– Similar pairs
Parikh and Grauman, ICCV 2011
![Page 36: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/36.jpg)
Learning relative attributes
Learn a ranking function that best satisfies the constraints:
[Equation labels: image features, learned parameters]
Parikh and Grauman, ICCV 2011
![Page 37: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/37.jpg)
Learning relative attributes
Max-margin learning to rank formulation
[Figure labels: image, relative attribute score, rank margin, w_m]
Joachims, KDD 2002; Parikh and Grauman, ICCV 2011
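A minimal sketch of the max-margin ranking idea follows, assuming a linear scoring function (learned parameters `w` applied to image features `x`, score = w·x). Ordered pairs are asked to differ in score by a margin of at least 1, and similar pairs are asked to score alike. The quadratic penalties, the plain gradient-descent solver, and the function name `learn_ranker` with its defaults are assumptions made for illustration; this is not the solver used in the cited papers, and the step size may need tuning on other data.

```python
import numpy as np


def learn_ranker(X, ordered, similar=(), C=10.0, lr=0.01, steps=2000):
    """Learn w so that w @ X[i] > w @ X[j] for each ordered pair (i, j).

    Minimizes 0.5*||w||^2
              + C * sum over ordered pairs of max(0, 1 - w.(x_i - x_j))^2
              + C * sum over similar pairs of (w.(x_i - x_j))^2
    by gradient descent.
    """
    X = np.asarray(X, dtype=float)
    dim = X.shape[1]
    # Difference vectors x_i - x_j for every supervised pair.
    D_o = np.array([X[i] - X[j] for i, j in ordered]).reshape(-1, dim)
    D_s = np.array([X[i] - X[j] for i, j in similar]).reshape(-1, dim)
    w = np.zeros(dim)
    for _ in range(steps):
        slack = np.maximum(0.0, 1.0 - D_o @ w)  # margin violations
        grad = w - 2.0 * C * (slack @ D_o) + 2.0 * C * ((D_s @ w) @ D_s)
        w -= lr * grad
    return w


# Toy data: the first feature loosely plays the role of "brightness".
X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 0.8], [3.0, 0.2]])
ordered = [(1, 0), (2, 1), (3, 2)]  # (i, j): image i is brighter than image j
w = learn_ranker(X, ordered)
scores = X @ w  # relative attribute score of each image
print(np.argsort(scores))  # image indices from least to most "bright"
```

The learned scores place every image on a single spectrum for the attribute, which is what lets a search system later compare any database image against a user's reference image.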
![Page 38: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/38.jpg)
Relating images
• Rank images according to attribute presence
bright
formal
natural
![Page 39: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/39.jpg)
WhittleSearch with relative attribute feedback
Offline:
We learn a spectrum for each attribute
During search:
1. User selects some reference images and marks how they differ from the desired target
2. We update the scores for each database image
Example for the attribute “natural”, with the feedback “I want something less natural than this.”: images that satisfy the feedback get scores = scores + 1; all others get scores = scores + 0.
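The score update described on this slide can be sketched as follows: each feedback statement adds 1 to the score of every database image that satisfies it and 0 to the rest, and images are then ranked by total score. The function name `whittle_scores`, the feedback tuple layout, and the toy attribute values are hypothetical. The attribute values stand in for the outputs of the per-attribute ranking functions learned offline.

```python
import numpy as np


def whittle_scores(attr_values, feedback):
    """Count, per database image, how many feedback constraints it satisfies.

    attr_values: dict mapping attribute name to predicted strengths,
                 one value per database image.
    feedback:    list of (attribute, direction, reference_index), where
                 direction is "more" or "less" relative to the reference.
    """
    n_images = len(next(iter(attr_values.values())))
    scores = np.zeros(n_images, dtype=int)
    for attribute, direction, ref in feedback:
        values = np.asarray(attr_values[attribute], dtype=float)
        if direction == "more":
            satisfied = values > values[ref]
        elif direction == "less":
            satisfied = values < values[ref]
        else:
            raise ValueError(f"unknown direction: {direction}")
        scores += satisfied.astype(int)  # +1 if satisfied, +0 otherwise
    return scores


attr_values = {
    "natural": [0.1, 0.4, 0.6, 0.9],
    "perspective": [0.8, 0.2, 0.5, 0.3],
}
feedback = [
    ("natural", "more", 0),      # "more natural than image 0"
    ("natural", "less", 3),      # "less natural than image 3"
    ("perspective", "more", 1),  # "more perspective than image 1"
]
scores = whittle_scores(attr_values, feedback)
ranking = np.argsort(-scores, kind="stable")  # highest score first
print(scores, ranking)
```

Images that satisfy every constraint rise to the top, so each additional statement whittles the candidate set down further.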
![Page 40: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/40.jpg)
WhittleSearch with relative attribute feedback
Feedback on the attributes “natural” and “perspective”:
• “I want something more natural than this.”
• “I want something less natural than this.”
• “I want something with more perspective than this.”
[Figure: regions of the natural/perspective space labeled with the number of feedback constraints satisfied, from score = 0 to score = 3]
![Page 41: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/41.jpg)
Datasets
• Shoes: [Berg; Kovashka] 14,658 shoe images; 10 attributes: “pointy”, “bright”, “high-heeled”, “feminine”, etc.
• OSR: [Oliva & Torralba] 2,688 scene images; 6 attributes: “natural”, “perspective”, “open-air”, “close-depth”, etc.
• PubFig: [Kumar et al.] 772 face images; 11 attributes: “masculine”, “young”, “smiling”, “round-face”, etc.
![Page 42: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/42.jpg)
Experimental setup
• Give the user the target image to look for
• Pair each target image with 16 reference images
• Get judgments on pairs from users on MTurk
Binary feedback baseline: “Is [image A] similar to or dissimilar from [image B]?”
Relative attribute feedback: “Is [image A] more or less <attribute> than [image B]?”
Attributes: pointy, open, bright, ornamented, shiny, high-heeled, long on the leg, formal, sporty, feminine
![Page 43: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/43.jpg)
WhittleSearch Results
We more rapidly converge on the envisioned visual content.
[Kovashka et al., CVPR 2012]
![Page 44: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/44.jpg)
WhittleSearch Results
We more rapidly converge on the envisioned visual content.
Richer feedback → faster gains per unit of user effort.
[Kovashka et al., CVPR 2012]
![Page 45: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/45.jpg)
Example WhittleSearch
Query: “I want a bright, open shoe that is short on the leg.”
Selected feedback: “More open than”, “More open than”, “Less ornaments than”
[Figure: results for Round 1, Round 2, and Round 3, with the match marked]
[Kovashka et al., CVPR 2012]
![Page 46: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/46.jpg)
Failure case (?)
Is the user searching for a specific person (identity), or a person meeting the description?
![Page 47: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/47.jpg)
Hybrid relevance feedback
• We integrate relative attribute and binary feedback by learning a relevance ranking function.
Feedback: “more shiny than these”, “similar to these”, “dissimilar from these”
[Figure labels: image database, “shininess”, relevance constraints, more relevant, less relevant]
[Kovashka et al., CVPR 2012]
![Page 48: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/48.jpg)
Example hybrid WhittleSearch
Query: “I want a non-open shoe that is long on the leg and covered in ornaments.”
Selected feedback: “Dissimilar from”, “Less open than”, “Similar to”, “More bright than”
[Figure: results for Round 1 and Round 2, with the match marked]
[Kovashka et al., CVPR 2012]
![Page 49: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/49.jpg)
Summary
• Fine-grained visual relevance is essential for next steps in image search
• Beyond tags when learning from text+images: model implied importance cues
• Beyond clicks as feedback: visual comparisons to refine search
![Page 50: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/50.jpg)
Looking forward
• What is implied by natural language description beyond ordering? (tags vs. sentences)
• How to ensure that the feedback a user gives is useful (e.g., not redundant)?
• What attributes should be in the vocabulary?
• How to align user’s attribute language with the visual attribute models?
![Page 51: Inferring What’s Important in Image Searchgrauman/slides/grauman-vabs-workshop...Inferring What’s Important in Image Search Kristen Grauman University of Texas at Austin With Adriana](https://reader035.fdocuments.us/reader035/viewer/2022071010/5fc82a6fa6e3f2553844bea8/html5/thumbnails/51.jpg)