Learning from the Uncertain: Leveraging Social Communities to generate reliable Training Data for Visual Concept Detection Tasks
i-KNOW 2015
Christian Hentschel, Harald Sack
Hasso Plattner Institute, University of Potsdam, Germany


Agenda

● Visual Concept Detection
  ○ Problem: Insufficient Training Data
  ○ Approach: Leveraging Social Photo Communities

● Relevant Image Retrieval
  ○ Improved Dataset
  ○ Community Language Model
  ○ Visual Re-ranking

● Results
● Conclusions & Outlook

Learning from the Uncertain
Harald Sack, 10-21-2015


Visual Concept Detection

● Ability to learn visual categories in order to automatically identify new, unseen images of these categories based only on their visual content


● Supervised Machine Learning Task:
  ○ Positive images (that depict a concept)
  ○ Negative images (that don’t)
  ○ Classification/Prediction:
    ■ Test whether an image depicts the concept (or not)

Visual Concept Detection


● Approach: Convolutional Neural Networks (CNN)
  ○ outperformed all other approaches
  ○ variation of the multi-layer perceptron
  ○ deep (i.e. many hidden layers)
  ○ many parameters to train → prone to overfitting
    ■ important: (very) large training datasets

Visual Concept Detection - Training Data

[Figure panels: Bag of Visual Words; Convolutional Neural Networks]

● Training Data for Convolutional Neural Networks
  ○ ImageNet
    ■ widely used
    ■ benchmarking initiative
    ■ > 14m photos
    ■ > 21k classes
    ■ goal: 40k categories, each covered by 10k photos evaluated as relevant by the majority of 10 human annotators
    ■ estimated annotation time: 63 years (2s per image)

Visual Concept Detection - Training Data


http://www.image-net.org/

● Training Data for Convolutional Neural Networks (cont.)
  ○ Flickr
    ■ > 8b photos (as of 12-12-2012)
    ■ user-generated annotations
    ■ potentially unlimited visual concepts and photos per concept
    ■ problems:
      ● incomplete (not severe as there are so many)
        ○ missing annotations
      ● highly subjective!
      ● often not related to visual content!
        ○ e.g. describe viewpoint rather than object

Visual Concept Detection - Leveraging Social Photo Communities


● Task: identify images relevant for a given visual concept
  ○ scene/object should be clearly visible, no major occlusions

Visual Concept Detection - Leveraging Social Photo Communities


Relevant Image Retrieval - The Dataset

● MIRFLICKR-1M
  ○ 1 million Flickr images (selection based on interestingness score)
  ○ published under Creative Commons Attribution Licence
  ○ provides EXIF data and user tags only

● our s16a extension:
  ○ authoritative metadata: title, description, geo information
  ○ social metadata: user comments, notes, album and group memberships
  ○ available for > 90% of the original data
  ○ publicly available: www.s16a.org/mirflickr

● query extension
  ○ based on the language used by Flickr users, generate query terms similar/related to the visual concept query
  ○ example: → …
  ○ learn these annotation relationships based on metadata corpus

● language model: word2vec
  ○ make predictions about a word’s meaning based on learned contextual appearances
  ○ unsupervised training of a neural network: given a corpus
    ■ predict a word given its context (Continuous Bag-of-Words)
    ■ predict context given a word (skip-gram)
  ○ skip-gram better suited to represent infrequent words
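
The difference between the two training objectives can be illustrated by the training examples each one derives from a tag list. The following is a minimal sketch, not the word2vec implementation itself: the function names, the toy tag list and the window size of 1 are assumptions chosen for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) pair per neighbour inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


# Hypothetical tag list of a single photo.
tags = ["sunset", "beach", "ocean", "sky"]
sg = skipgram_pairs(tags, window=1)
cb = cbow_examples(tags, window=1)
print(sg)
print(cb)
```

Skip-gram produces one training pair per (word, neighbour) combination, so even a rare tag yields several separate examples, which is one common explanation for its better handling of infrequent words.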

Relevant Image Retrieval - Community Language Model


https://code.google.com/p/word2vec/

● currently: model is based on user tags only
  ○ titles, descriptions, user comments use HTML encoded full-text strings
  ○ pre-processing: lemmatization, stop word removal (English only)

● skip-gram model
  ○ context window size: 6 (average # of tags per image: 12)
  ○ ignore words with total frequency f < 5
  ○ 300-dimensional feature vector

● compute k=20 most similar terms per query concept
  ○ extend initial visual concept query (e.g. ‘sunset’) by additional, related terms
  ○ rank images based on number of query terms found in metadata
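
The query extension and ranking steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the 2-dimensional vectors stand in for the 300-dimensional word2vec embeddings, k is 2 instead of 20, and all names (`most_similar`, `rank_images`, the image ids and tags) are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query: str, vectors: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k terms whose embeddings are closest to the query term."""
    q = vectors[query]
    scored = [(term, cosine(q, vec)) for term, vec in vectors.items() if term != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [term for term, _ in scored[:k]]


def rank_images(query_terms: Set[str], image_tags: Dict[str, Set[str]]) -> List[str]:
    """Rank images by the number of query terms found in their tags."""
    scores = {img: len(query_terms & tags) for img, tags in image_tags.items()}
    # Sort by descending score; ties are broken by image id for determinism.
    return sorted(scores, key=lambda img: (-scores[img], img))


# Toy embeddings (assumed values, 2 dimensions instead of 300).
vectors = {
    "sunset": [1.0, 0.0],
    "dusk": [0.9, 0.1],
    "sundown": [0.8, 0.3],
    "car": [0.0, 1.0],
}
related = most_similar("sunset", vectors, k=2)
extended_query = {"sunset", *related}

# Hypothetical images with their user tags.
image_tags = {
    "img_a": {"sunset", "dusk", "beach"},
    "img_b": {"car", "ford"},
    "img_c": {"sunset", "sundown", "dusk"},
}
ranking = rank_images(extended_query, image_tags)
print(related, ranking)
```

In this toy setup an image tagged only with related terms still receives a non-zero score, which is the effect the extended query is meant to achieve.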

Relevant Image Retrieval - Community language model (cont.)


● Advantages of corpus-specific language model:
  ○ retrieves synonyms:
    ■ airplane: [aircraft, aeroplane, plane]
  ○ retrieves related concepts:
    ■ beach: [sand, ocean, shore]
  ○ retrieves frequent instances:
    ■ flower: [dahlia, spiderwort, tulip], car: [ford, porsche]
  ○ multilingual:
    ■ dog: [chien, perro, hund]
  ○ captures sub- and superclasses:
    ■ boat: [fisherboat, sailboat, sailingships], tiger: [flickrbigcats]
  ○ no external knowledge base, dictionary or thesaurus required

Relevant Image Retrieval - Community language model (cont.)


● Pre-trained CNN as feature extractor
  ○ trained on ImageNet Large Scale Visual Recognition Challenge 2012 data
  ○ deep feature encoding
  ○ penultimate (fully connected) layer
  ○ 4,096-dimensional feature vector
  ○ extracted for all images of MIRFLICKR-1M collection
    ■ 3 hours on NVIDIA Tesla K20X GPU
  ○ publicly available: www.s16a.org/mirflickr

● Re-ranking: highest ranked image from extended query
  ○ assumption: high relevance for visual concept
  ○ compute cosine similarity using deep feature representations
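
The re-ranking step can be sketched as below. This is a minimal sketch under assumptions, not the authors' implementation: the top-ranked image of the text-based ranking serves as the reference, all other candidates are sorted by cosine similarity to it, and 3-dimensional toy vectors stand in for the 4,096-dimensional CNN features. The function and image names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def visual_rerank(candidates: List[str], features: Dict[str, List[float]]) -> List[str]:
    """Re-rank candidates by visual similarity to the top-ranked image.

    `candidates` is the text-based ranking, best first. The first entry is
    assumed to be highly relevant and is kept at rank 1.
    """
    reference = candidates[0]
    ref_vec = features[reference]
    rest = sorted(
        candidates[1:],
        key=lambda img: cosine(ref_vec, features[img]),
        reverse=True,
    )
    return [reference] + rest


# Toy deep features (assumed values, 3 dimensions instead of 4,096).
features = {
    "img_c": [1.0, 0.0, 0.0],
    "img_a": [0.0, 1.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
}
text_ranking = ["img_c", "img_a", "img_b"]
reranked = visual_rerank(text_ranking, features)
print(reranked)
```

Because everything depends on a single reference image, an unrepresentative top-1 result propagates to the whole ranking.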

Relevant Image Retrieval - Further Improvement by Visual Re-ranking


● baseline:
  ○ select photos based on presence of concept term in tagset
  ○ no query extension
  ○ large number of candidate images: randomly select n=200

● compare against
  ○ extended query using community language model
  ○ extended query + visual re-ranking

● evaluation using average precision:
  ○ R: number of relevant photos; Rk: number of relevant photos among the top k ranked instances; rel(k) = 1 if the photo at rank k is relevant, 0 otherwise
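
The formula itself is not preserved in this transcript. With the symbols defined above, the standard definition of average precision would be AP = (1/R) · Σ_k (Rk / k) · rel(k); that this is the exact form used on the slide is an assumption. A minimal sketch, taking R as the number of relevant photos within the evaluated ranking (function names and the example relevance lists are hypothetical):

```python
from typing import List


def average_precision(relevance: List[int]) -> float:
    """AP = (1/R) * sum over k of (Rk / k) * rel(k).

    `relevance` lists rel(k) for k = 1..n in ranked order (1 = relevant).
    R is taken as the number of relevant photos in this list (assumption).
    """
    total_relevant = sum(relevance)  # R
    if total_relevant == 0:
        return 0.0
    hits = 0  # Rk, updated while walking down the ranking
    ap = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            ap += hits / k
    return ap / total_relevant


def mean_average_precision(rankings: List[List[int]]) -> float:
    """Mean of the per-concept AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical relevance judgements for two concepts.
ap_example = average_precision([1, 0, 1, 1])
map_example = mean_average_precision([[1, 0, 1, 1], [1, 1, 0, 0]])
print(ap_example, map_example)
```

For the first list the relevant photos sit at ranks 1, 3 and 4, so AP = (1/1 + 2/3 + 3/4) / 3 = 29/36.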

Results


● tested for 10 visual concepts

● manual assessment of relevance:
  ○ object/scene clearly visible
  ○ no major occlusions

● visual re-ranking of the results obtained from the language model is superior
  ○ exceptions: ‘airplane’, ‘car’

Results (cont.)


● top ranked candidate images according to community language model

airplane: passenger, vliegtuig, aéroport, jetplane, traveller, airliner, lesavions, economysection, traveler, vacation, fuselage, motor, jetliner, transportation, airplane, legroom, transport, sky, passengerjet, jet, travel, flying, avion, airport, inflight, rudder, flugzeug, tail, bin, ptvs, aerial, cabin, passengerplane, aircraftpicture, schipholairport, flap, nederland, aviao, amsterdam, landinggear, cockpit, luggagebins, aircraft, plane, aviation, seat, economyclass, airship, aisle, netherlands, nosegear, holland, luggage, aircraftcabin, engine, aeroport, ptv, aeroplane, aeroplano, wing, paysbas

car: harney, ford, illinois, myoldpostcards, leeannharney, il, owner, coupe, chromeengine, backend, taillight, automobile, route66, custom, tail, motorvehicle, vintagecar, fomoco, international, fin, collectiblecar, 2012, custombuilt, september21232012, auto, dougthompson, motherroadfestival, 1950, 9212312, convertible, fordmotorcompany, classiccar, 2door, worldcars, car, ghostflames, vonliski, rearend, sidepipes, antiquecar, springfield, frankharney, deluxe, carsonconvertibletop, oldcar

Results (cont.)


● Problem:
  ○ visual re-ranking based on airport/rear light image fails to capture essential features of the visual concept:
    ■ airplane top-5 ranked images according to language model:
    ■ airplane top-5 ranked images after visual re-ranking:

● Automatically retrieve training images for visual concept detection:
  ○ extend MIRFLICKR-1M dataset by additional authoritative and social metadata
    ■ www.s16a.org/mirflickr
  ○ community language model improves over tag-based ranking
    ■ mAP: 56% → 77%
  ○ visual re-ranking improves in most cases

● Future Work:
  ○ ground truth relevance estimation based on single evaluator
  ○ include further annotation data
    ■ title, description, Flickr group information …
  ○ improve visual re-ranking:
    ■ outlier detection (e.g. single class SVM)
    ■ on top-n results (instead of top-1)

Conclusions & Outlook


Thank you for your attention!

Christian Hentschel, Harald Sack
Hasso Plattner Institute, University of Potsdam, Germany