Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public...
Visual Relationship Detection track
Open Images Challenge: Visual Relationships Detection track
Outline
● Visual relationship detection track overview
● Dataset: data collection and statistics
● Metrics
● Result analysis
Visual relationship detection
[Figure: two images, each labeled DOG, BICYCLE and PERSON]
Both images have the same set of objects and layout but very different semantics
Task:
● Locations and classes of two objects
● Relationship between the two objects
Participation and winning requirements
● Additional annotations on top of Open Images V4
● External data/pre-trained models are allowed but must be disclosed
● Evaluation server is hosted by Kaggle
● Full prize: 20K USD split between 3 winners
● Winner obligations:
○ Detailed, minimum 2-page description of method
● Winners encouraged:
○ Open-source their framework
Dataset: data collection
1. Existing works
● VRD dataset [1]
● Visual Genome dataset [2]
2. Generate label co-occurrence statistics of Open Images V4 + pick interesting relationships
3. Generate candidate triplets for annotation
Relationships:
● <Pets> under {Table, Chair, etc ...}
● <Object> on top of <Object>
● <Object> inside of <Object>
● <Human> holds <Object>
● <Human> on top of <Object>
● <Human> hits {Football, Tennis ball, ...}
● <Human> plays {Drums, Guitar, ...}
● <Object> is {attribute}
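The co-occurrence step in the data collection pipeline can be illustrated with a small sketch. This is not the organizers' code: the function name `label_cooccurrence`, the toy annotations and the threshold of 2 are all assumptions chosen for illustration. It counts how often two box labels appear in the same image and keeps the frequent ordered pairs as candidates for a relationship.

```python
from collections import Counter
from itertools import permutations


def label_cooccurrence(image_labels):
    """Count ordered (subject, object) label pairs that share an image."""
    counts = Counter()
    for labels in image_labels:
        # Deduplicate labels so each pair is counted once per image.
        for subj, obj in permutations(sorted(set(labels)), 2):
            counts[(subj, obj)] += 1
    return counts


# Toy, made-up annotations: one list of box labels per image.
images = [
    ["Man", "Microphone", "Guitar"],
    ["Man", "Guitar"],
    ["Dog", "Table"],
]

counts = label_cooccurrence(images)

# Hypothetical threshold: keep pairs seen together in at least 2 images.
candidates = [pair for pair, n in counts.items() if n >= 2]
print(candidates)
```

In this toy data only Man and Guitar co-occur often enough, so they would be the pair passed on for a relationship such as "plays".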
[1] Lu, C., Krishna, R., Bernstein, M., Fei-Fei, L., “Visual Relationship Detection with Language Priors”, ECCV 2016
[2] Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., Bernstein, M., Fei-Fei, L., “Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations”, 2016
Dataset: annotation
[Figure: image with three boxes labeled Man]
Example triplet: Man holds Microphone
Dataset: annotation
[Figure: image with a box labeled Microphone]
Example triplet: Man holds Microphone
Dataset: annotation
Please verify that the relation “holds” connects the man and the microphone on the image: man holds microphone
[Figure: image with boxes labeled Man and Microphone]
Dataset: statistics
Train set:
● 1,743,042 images
● 374,768 relationship annotations
● 3,290,070 bounding boxes
● 329 distinct triplets
● 100K subset for validation
Test set:
● 100K images
● 30% in public split
● 70% in private split
Evaluation
There is no standard metric for evaluating visual relationship detection.
Evaluation server is hosted by Kaggle
Public metric implementation is available as part of the TensorFlow Object Detection API
Evaluation: metrics
Three metrics used in literature [1, 2]:
● AP relationships detection (but reported values are low)
● AP phrase detection
● Recall@50, Recall@100 for both relationship detection and phrase detection
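As a rough illustration of Recall@K, the sketch below follows one common convention: keep the K highest-scoring predictions per image, count how many of them are true positives, and divide by the total number of ground-truth relationships. The function name `recall_at_k` and the input layout are assumptions, and each prediction is assumed to already carry a true-positive flag with every ground-truth triplet matched at most once. The official implementation may differ in details.

```python
def recall_at_k(predictions_per_image, num_gt_per_image, k=50):
    """Recall@K over a dataset.

    predictions_per_image: one list per image of (score, is_true_positive).
    num_gt_per_image: number of ground-truth relationships per image.
    """
    true_positives = 0
    for preds in predictions_per_image:
        # Keep only the k highest-scoring predictions of this image.
        top_k = sorted(preds, key=lambda p: p[0], reverse=True)[:k]
        true_positives += sum(1 for _, hit in top_k if hit)
    total_gt = sum(num_gt_per_image)
    return true_positives / total_gt if total_gt else 0.0


# Toy, made-up predictions for two images.
preds = [
    [(0.9, True), (0.8, False), (0.7, True)],  # image 1, 2 ground truths
    [(0.6, False)],                            # image 2, 1 ground truth
]
gts = [2, 1]

print(recall_at_k(preds, gts, k=2))   # only one hit survives the cut-off
print(recall_at_k(preds, gts, k=50))  # both hits of image 1 are counted
```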
Final score: 0.4*mAP(relationships) + 0.4*mAP(phrase) + 0.2*Recall@50(relationships)
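The weighted sum above translates directly into code. The weights come from the slide; the function name `final_score` and the input values are made up for illustration and are not actual leaderboard numbers.

```python
def final_score(map_relationships, map_phrase, recall_at_50):
    """Challenge score as the weighted sum of its three components."""
    return 0.4 * map_relationships + 0.4 * map_phrase + 0.2 * recall_at_50


# Made-up component values, only to show the arithmetic:
# 0.4 * 0.10 + 0.4 * 0.30 + 0.2 * 0.50 = 0.04 + 0.12 + 0.10 = 0.26
score = final_score(0.10, 0.30, 0.50)
print(round(score, 5))
```

Because Recall@50 carries only 20% of the weight, a team needs strong mAP on both relationship and phrase detection to reach a high total.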
Evaluation: metrics
AP per relationship (e.g. holds)
● mean AP(relationships)
● Recall@50
True Positive:
● IoU > 0.5 for each box
● Object labels and relationship label match
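The true-positive rule for relationship detection can be sketched as follows. This is an illustrative reading of the two bullet points, not the official evaluation code: the dictionary keys, the `(xmin, ymin, xmax, ymax)` box layout and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_relationship_match(pred, gt, threshold=0.5):
    """True if all three labels match and BOTH boxes exceed the IoU threshold."""
    labels_match = (
        pred["subject_label"] == gt["subject_label"]
        and pred["object_label"] == gt["object_label"]
        and pred["relationship"] == gt["relationship"]
    )
    return (
        labels_match
        and iou(pred["subject_box"], gt["subject_box"]) > threshold
        and iou(pred["object_box"], gt["object_box"]) > threshold
    )


gt = {"subject_label": "Man", "object_label": "Microphone", "relationship": "holds",
      "subject_box": (0, 0, 10, 10), "object_box": (4, 4, 6, 6)}
# Both boxes overlap the ground truth well (IoU 0.9 and about 0.67).
good_pred = dict(gt, subject_box=(1, 0, 10, 10), object_box=(4, 4, 6, 7))
# The microphone box is poorly localized (IoU about 0.14).
bad_pred = dict(gt, subject_box=(1, 0, 10, 10), object_box=(5, 5, 7, 7))

print(is_relationship_match(good_pred, gt))
print(is_relationship_match(bad_pred, gt))
```

Requiring both boxes to pass the threshold makes small objects such as the microphone the usual point of failure.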
Evaluation: metrics
AP per relationship (e.g. holds)
● mean AP(phrase)
True Positive:
● IoU > 0.5 for box union
● Object labels and relationship label match
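For phrase detection, a minimal sketch is given below. It assumes that "box union" means the smallest box enclosing both the subject and the object box, which is compared as a single box against the ground truth. The dictionary keys, box layout and function names are again hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def enclosing_box(a, b):
    """Smallest box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def is_phrase_match(pred, gt, threshold=0.5):
    """True if all labels match and the enclosing boxes overlap enough."""
    labels_match = (
        pred["subject_label"] == gt["subject_label"]
        and pred["object_label"] == gt["object_label"]
        and pred["relationship"] == gt["relationship"]
    )
    pred_box = enclosing_box(pred["subject_box"], pred["object_box"])
    gt_box = enclosing_box(gt["subject_box"], gt["object_box"])
    return labels_match and iou(pred_box, gt_box) > threshold


gt = {"subject_label": "Man", "object_label": "Microphone", "relationship": "holds",
      "subject_box": (0, 0, 10, 10), "object_box": (4, 4, 6, 6)}
# The microphone box alone overlaps its ground truth poorly (IoU about 0.14),
# but the enclosing box is dominated by the man box and still matches.
pred = dict(gt, subject_box=(1, 0, 10, 10), object_box=(5, 5, 7, 7))

print(is_phrase_match(pred, gt))
```

Under this reading, phrase detection is more forgiving about the localization of a small object that lies inside a much larger one.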
Results analysis: overview
Number of teams with at least one submission: 232
Evaluation server days: 51
External datasets/pre-trained models used:
● Open Images V4
● ImageNet
● COCO
● Visual Genome
Base model architectures:
● ResNets, YOLO, Darknet, SENet, RetinaNet ...
Deep learning frameworks:
● TensorFlow Object Detection API, Detectron, Cadene (PyTorch), fastai library, ImageAI, ChainerCV, TensorFlow-Slim, Keras, MXNet
Results analysis: teams
Number of teams: 232
Results analysis: components of the final score (weighted)
● Some teams with a lower overall score had higher mAP scores
● Some teams scored well mostly because of Recall@50
Results analysis: number of submissions per day
Results analysis: evolution of scores
Dots: winners entering the competition
Results analysis: evolution of scores (winning teams)
Results analysis: winners breakdown by score components (unweighted)
2nd, 3rd and 4th place teams showed approximately the same performance on mAP(relationships)
Winning models: final result
Team      Public leaderboard score    Private leaderboard score
Seiji     0.33213                     0.28544
tito      0.25571                     0.23709
Kyle      0.28043                     0.23491
toshif    0.25621                     0.22832
Undisclosed
Winning models
Commonalities:
● Different models for attributes (“is” relationship) and relationships between two objects
● The models combine a detector with a module on top for relationship prediction
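The "detector plus relationship module" pattern can be pictured with a deliberately simplified sketch. It does not reproduce any winning team's method: the function `predict_triplets`, the product-of-scores ranking and the toy `toy_scorer` are all assumptions made for illustration. Detections are paired up, a separate scorer rates each candidate relationship, and the triplets are ranked by a combined score.

```python
from itertools import permutations


def predict_triplets(detections, relation_scorer, relationships):
    """Rank (subject, relationship, object) triplets built from detections.

    detections: list of dicts with "label" and "score" (detector output).
    relation_scorer: callable(subject, object, relationship) -> probability.
    """
    triplets = []
    for subj, obj in permutations(detections, 2):
        for rel in relationships:
            # Hypothetical combination rule: product of all three scores.
            score = subj["score"] * obj["score"] * relation_scorer(subj, obj, rel)
            triplets.append((score, subj["label"], rel, obj["label"]))
    return sorted(triplets, reverse=True)


def toy_scorer(subj, obj, rel):
    """Stand-in for a learned relationship module (made-up numbers)."""
    likely = (subj["label"], rel, obj["label"]) == ("Man", "holds", "Microphone")
    return 0.9 if likely else 0.05


detections = [{"label": "Man", "score": 0.9}, {"label": "Microphone", "score": 0.8}]
ranked = predict_triplets(detections, toy_scorer, ["holds", "plays"])
print(ranked[0])
```

In a real system the scorer would be a trained model using visual and spatial features of the box pair; here it is a lookup so the example stays self-contained.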
Questions?
Next - presentations by winning teams
Materials below this slide
Candidates generation
[Figure: image with three boxes labeled Man]
Example triplet: Man holds Microphone
Candidates generation
[Figure: image with a box labeled Microphone]
Example triplet: Man holds Microphone
Annotation process
[Figure: image with boxes labeled Man and Microphone]
Please verify that the relation “holds” connects the man and the microphone on the image: man holds microphone
More about annotation process: go/oi_triplet_annotation