Enhancing Human-Machine Communication via Visual Attributes Devi Parikh Virginia Tech.
Interacting with Vision Systems
User, Supervisor
Interacting with Vision Systems
Semantic Gap
Mode of communication is important
Interacting with Vision Systems
• Necessary for communication:
  – Language that humans understand (semantic)
  – Language that machines understand (visual)
• Attributes
  – Example: furry, natural, chubby, shiny, etc.
  – Better features, deeper image understanding, etc. (Farhadi et al., Kumar et al., Lampert et al., etc.)
  – Human-machine communication
Role of the human: user or supervisor
Communicator: human or machine
• Image Search: “My missing brother is fuller-faced than this boy.”
• Instilling Domain Knowledge: “Polar bears are white and larger than rabbits.”
• Active and Interactive Learning
• Characterizing Failure Modes: “If the image is blurry or the face is not frontal, I may fail.”
• Interpretable Models: “I think this is a polar bear because this is a white and furry animal.”
• Reading Between the Lines
Image Search
Query: “black shoes”
…
Binary Relevance Feedback
Image Search
Query: “black shoes”
…
“shinier than these”
“more formal than these”
…
Relative Attributes
Openness
Linear ranking function: $r_{\text{open}}(x_i) = w_{\text{open}}^\top x_i$
Training
Testing
[Parikh and Grauman, ICCV 2011]
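The slide gives only the form of the model: one linear ranking function per attribute, learned from training pairs ordered by attribute strength and then used to score new images at test time. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it fits the weight vector with a RankSVM-style hinge loss and plain subgradient descent, leaves out any constraints for pairs of similar strength, and the names `train_ranker` and `attribute_strength` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(
    features: List[Vector],
    ordered_pairs: List[Tuple[int, int]],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Hypothetical helper: learn w so that w.x_i > w.x_j for each (i, j).

    A pair (i, j) means image i shows MORE of the attribute than image j.
    Objective (RankSVM-style, an assumption of this sketch):
        lam * ||w||^2 + mean over pairs of max(0, 1 - w.(x_i - x_j))
    minimized with full-batch subgradient descent.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * lam * wk for wk in w]  # gradient of the L2 penalty
        for i, j in ordered_pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            if dot(w, diff) < 1.0:  # pair inside the margin: hinge is active
                for k in range(dim):
                    grad[k] -= diff[k] / len(ordered_pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def attribute_strength(w: Vector, x: Vector) -> float:
    """Linear ranking function r(x) = w^T x (e.g. how open a scene is)."""
    return dot(w, x)


# Toy usage: two-dimensional image descriptors, ordered by openness.
features = [[0.9, 0.2], [0.7, 0.4], [0.4, 0.3], [0.1, 0.6]]
pairs = [(0, 1), (1, 2), (2, 3)]  # image 0 more open than 1, 1 than 2, ...
w = train_ranker(features, pairs)
scores = [attribute_strength(w, x) for x in features]
```

Training needs only relative comparisons ("this scene is more open than that one"), and testing needs only a dot product, so any new image gets a real-valued openness score that can be compared with any other image's score.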
Image Search
• System has pre-trained relative attribute predictors
• Relevance of an image = number of feedback constraints it satisfies
…
“shinier” “more formal”
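The relevance rule on this slide can be sketched directly. Assumptions not taken from the slides: each pre-trained relative attribute predictor has already produced one real-valued score per database image, and each piece of feedback is stored as an attribute name, the index of the reference image the user pointed to, and a direction. The names `Feedback`, `relevance` and `rank_images` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Feedback(NamedTuple):
    attribute: str  # e.g. "shiny" or "formal"
    reference: int  # index of the image the user compared against
    more: bool      # True for "shinier than these", False for "less shiny"


def relevance(scores: Dict[str, List[float]], image: int,
              feedback: List[Feedback]) -> int:
    """Number of feedback constraints the image satisfies."""
    satisfied = 0
    for fb in feedback:
        s = scores[fb.attribute]
        if fb.more and s[image] > s[fb.reference]:
            satisfied += 1
        elif not fb.more and s[image] < s[fb.reference]:
            satisfied += 1
    return satisfied


def rank_images(scores: Dict[str, List[float]],
                feedback: List[Feedback]) -> List[int]:
    """Image indices, most relevant first (ties keep database order)."""
    n_images = len(next(iter(scores.values())))
    return sorted(range(n_images),
                  key=lambda i: relevance(scores, i, feedback),
                  reverse=True)


# Toy predicted attribute strengths for four shoe images.
scores = {"shiny": [0.2, 0.9, 0.5, 0.8], "formal": [0.1, 0.3, 0.9, 0.7]}
feedback = [Feedback("shiny", 0, True),   # "shinier than image 0"
            Feedback("formal", 1, True)]  # "more formal than image 1"
ranking = rank_images(scores, feedback)
```

Because relevance is a count, every image that satisfies all constraints ties at the top; this sketch simply keeps database order among ties, and the slides do not say how a real system breaks them.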
WhittleSearch
shiny
formal
…
“shinier” “more formal”
WhittleSearch
[Kovashka, Parikh and Grauman, CVPR 2012] (Patent pending)
WhittleSearch: Demo (Online)
[Prepared by Naman Agrawal, Demo at CVPR 2013]
(Patent pending)
Traditional Active Learning
Is this a forest? No, this is not a forest.
[Parkash and Parikh, ECCV 2012]
Classifier Feedback
I think this is a forest. What do you think?
No, this is too open to be a forest.
…
Ah! These images must not be forests either then.
[Images more open than query]
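The inference in this exchange can be written as a small label-propagation step. A minimal sketch, assuming a relative attribute model has already scored every image for openness; `propagate_too_much_feedback` is a hypothetical name, and leaving earlier labels untouched is a design choice of this sketch rather than something the slides specify.

```python
from typing import Dict, List


def propagate_too_much_feedback(attribute_scores: List[float], query: int,
                                labels: Dict[int, bool]) -> Dict[int, bool]:
    """Apply 'No, this is too <attribute> to be <category>'.

    The query becomes a negative, and so does every unlabeled image whose
    predicted attribute strength exceeds the query's: anything more open than
    an image that is already too open to be a forest is not a forest either.
    """
    threshold = attribute_scores[query]
    updated = dict(labels)  # do not mutate the caller's labels
    updated[query] = False
    for i, score in enumerate(attribute_scores):
        if i not in labels and score > threshold:
            updated[i] = False
    return updated


openness = [0.2, 0.5, 0.7, 0.9, 0.4]  # toy predicted openness per image
labels = {0: True}                     # image 0 was earlier labeled "forest"
labels = propagate_too_much_feedback(openness, query=2, labels=labels)
```

One answer from the supervisor thus labels the query and every image scored as more open (here image 3), which is where the extra information per question comes from.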
Identifying the images that are more open than the query:
– Pre-trained relative attributes [Parkash and Parikh, ECCV 2012]
– Learn attributes on the fly [Biswas and Parikh, CVPR 2013]
Classifier Feedback
I think this is a forest. What do you think?
No, this is too open to be a forest.
Ah! These images must be less open than the query.
…
[images labeled as forest]
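When the attributes are learned on the fly, the same answer can also be read the other way around: every image labeled as forest must be less open than the query, which yields ordered training pairs for the openness ranking function. A hedged sketch of that bookkeeping; `pairs_from_feedback` is a hypothetical name, and how the CVPR 2013 work weights or filters such pairs is not shown on the slides.

```python
from typing import Dict, List, Tuple


def pairs_from_feedback(query: int,
                        labels: Dict[int, bool]) -> List[Tuple[int, int]]:
    """Ordered (more, less) pairs implied by 'too open to be a forest'.

    Every image labeled as a forest must be less open than the query, so each
    one contributes the pair (query, image) to the attribute's training set.
    """
    return [(query, i) for i, is_forest in sorted(labels.items())
            if is_forest and i != query]


labels = {0: True, 1: False, 4: True}  # current forest / not-forest labels
new_pairs = pairs_from_feedback(query=2, labels=labels)
```

Each pair states that its first image is more open than its second, which is exactly the kind of ordered pair a linear ranking function is trained on, so the attribute model can grow from the supervisor's answers without a separate attribute-labeling step.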
[Biswas and Parikh, CVPR 2013]
Classifier Feedback
• Learning attributes on the fly
  – Start only with unlabeled images (+ a supervisor)
  – Categories and attributes learnt from scratch
• Confidence in instances
• Active learning for learning with attributes-based classifier feedback
Classifier Feedback
[Plot: accuracy vs. number of iterations (0 to 300), comparing no attributes-based feedback, Parkash et al. ECCV 2012, and the proposed approach]
[Parkash and Parikh, ECCV 2012; Biswas and Parikh, CVPR 2013]
WhittleSearch
Query: “black shoes”
…
“shinier than these”
“more formal than these”
…
Image Search
[Parikh and Grauman, ICCV 2013]
Saying the Right Thing
“Smiling more than” (relative) vs. “Not smiling” (binary)
[Sadovnik, Gallagher, Parikh and Chen, ICCV 2013]
• Improved image search, description
Saliency of Attributes
• Improved image search, zero-shot learning, description
“White, furry” vs. “Scary, sharp teeth”
[Turakhia and Parikh, ICCV 2013]
• Accessing the user’s intentions for mental image search
• More usable computer vision systems, even with their imperfections
• Trustworthy systems: key for effective human-machine teams
• Integrating AI with today’s machine learning tools
• Getting more from what the human says without added human effort
Enhanced human-machine communication via attributes for improved visual recognition
Thank you!