Learning Models for Object Recognition from Natural Language Descriptions
Presenters: Sagardeep Mahapatra – 108771077, Keerti Korrapati – 108694316
Goal
• Learning models for visual object recognition from natural language descriptions alone
Why learn model from natural language?
• Manually collecting and labeling large image sets is difficult
• New training set needs to be created for each new category
• Finding images for fine-grained object categories is tough
• Ex- species of plants and animals
• But detailed visual descriptions may be readily available
Outline
• Datasets for training and testing
• Natural Language Processing methods
• Template Filling
• Extraction of visual attributes from test images
• Score an image against the learnt template models
• Results
• Observations
Dataset
• Text descriptions associated with ten species of butterflies from the eNature guide are used to construct the template models
• Butterflies, because they have distinctive visual features like wing colors, spots, etc.
• Images downloaded from Google for each of the ten butterfly categories form the testing set
Danaus plexippus Heliconius charitonius Heliconius erato Junonia coenia Lycaena phlaeas
Nymphalis antiopa Papilio cresphontes Pieris rapae Vanessa atalanta Vanessa cardui
Natural Language Processing
• Goal: Convert unstructured data in descriptions into structured templates
[Diagram: Information Extraction turns factual but unstructured data in text into structured templates]
Template Filling
• Text is tokenized into words
• Tokens are tagged with parts of speech (using C&C tagger)
• Custom transformations are performed to correct known mistakes
• Required because the eNature guide tends to suppress some information
• Chunks of text matching pre-defined tag sequences are extracted
• Ex- noun phrases (‘wings have blue spots’), adjective phrases (‘wings are black’)
• Extracted phrases are filtered through a list of colors, patterns and positions to fill the template slots
Tokenization → Part-of-Speech Tagging → Custom Transformation → Chunking → Template Filling
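The chunk-and-filter pipeline above can be sketched in a few lines. This is a minimal stand-in, not the actual system: it uses regular expressions in place of a real POS tagger and chunker (the talk uses the C&C tagger), and the tiny lexicons below are illustrative examples rather than the real lists distilled from the eNature guide.

```python
import re

# Hypothetical mini-lexicons; the actual system filters against larger
# lists of colors, patterns and positions.
COLORS = {"black", "blue", "orange", "white", "red", "yellow"}
PATTERNS = {"spots", "bands", "stripes", "patches"}
POSITIONS = {"upperside", "underside", "forewing", "hindwing", "wings"}

def fill_template(description):
    """Rough stand-in for tokenize -> tag -> chunk -> filter:
    pull out '<position> are <color>' chunks and '<color> <pattern>'
    chunks, keeping only words found in the lexicons."""
    template = {"dominant_color": None, "spots": []}
    text = description.lower()
    # Adjective phrases like 'wings are black' (cue word instead of POS tags).
    for subject, attr in re.findall(r"(\w+)\s+(?:are|is)\s+(\w+)", text):
        if subject in POSITIONS and attr in COLORS:
            template["dominant_color"] = attr
    # Noun phrases like 'blue spots'.
    for color, pattern in re.findall(r"(\w+)\s+(\w+)", text):
        if color in COLORS and pattern in PATTERNS:
            template["spots"].append(color)
    return template

print(fill_template("Wings are black. Blue spots near the margin."))
# → {'dominant_color': 'black', 'spots': ['blue']}
```

The filtering step is what makes the approach robust: any chunk whose words fall outside the color/pattern/position lists simply never reaches a template slot.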
Visual Processing
Performed based on two attributes of butterflies:
• Dominant Wing Color
• Colored Spots
1) Image Segmentation
• Variation in the background can pose challenges during image classification
• Hence, the butterfly image was segmented from the background using the ‘star shape’ graph cut approach
2) Spot Detection (Using a spot classifier)
• Hand-marked butterfly images with no prior class information form the training set for the spot classifier
• Candidate regions likely to be spots are extracted by using Difference-of-Gaussians interest point operator
• Image descriptors (SIFT features) are extracted around the candidate spot to classify it as a spot or non-spot
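The Difference-of-Gaussians step can be sketched as follows. This is a simplified single-scale sketch: it blurs at two scales, subtracts, and keeps strong local maxima as candidate spots. The SIFT description and spot/non-spot classification that follow in the actual system are omitted, and all parameter values here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_candidates(image, sigma=2.0, k=1.6, thresh=0.05):
    """Difference-of-Gaussians interest points on a grayscale image:
    candidate spot locations are local maxima of the DoG response
    above a threshold. Returns a list of (row, col) coordinates."""
    dog = gaussian_filter(image, sigma) - gaussian_filter(image, k * sigma)
    # A pixel is a local maximum if it equals the max of its 5x5 neighborhood.
    local_max = maximum_filter(dog, size=5) == dog
    ys, xs = np.nonzero(local_max & (dog > thresh))
    return list(zip(ys.tolist(), xs.tolist()))
```

In the talk, each candidate returned here would then get a SIFT descriptor and be passed to the trained spot classifier.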
3) Color Modelling
• Required to connect color names of dominant wing colors and spot colors in learnt templates to image observations
• For each color name ci, a probability distribution p(z|ci) was learnt from training butterfly images, where z is a pixel color observation in the L*a*b* color space
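A simple way to realise p(z|ci) is one Gaussian per color name in L*a*b* space. The slide does not specify the distribution family, so this single-Gaussian model is an assumption for illustration; all class and method names below are hypothetical.

```python
import numpy as np

class ColorModel:
    """One Gaussian per color name in L*a*b* space - a simple stand-in
    for the distributions p(z|ci) learnt from training butterfly pixels."""

    def __init__(self):
        self.params = {}

    def fit(self, name, pixels):
        """Estimate mean and covariance from an (N, 3) array of Lab pixels."""
        pixels = np.asarray(pixels, dtype=float)
        mu = pixels.mean(axis=0)
        cov = np.cov(pixels, rowvar=False) + 1e-3 * np.eye(3)  # regularised
        self.params[name] = (mu, cov)

    def likelihood(self, name, z):
        """Gaussian density p(z | color name) for one Lab observation z."""
        mu, cov = self.params[name]
        d = np.asarray(z, dtype=float) - mu
        norm = np.sqrt((2 * np.pi) ** 3 * np.linalg.det(cov))
        return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)
```

Given such a model, an observed pixel near the "orange" cluster scores a much higher likelihood under p(z|orange) than under p(z|black), which is all the generative scoring step needs.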
Generative Model
Given an input image I, the probability of the image given a butterfly category Bi is modelled as a product over the spot and wing observations:
• Spot color name prior – equal priors to all spot colors
• Dominant color name prior
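The product over observations can be sketched as below. This is a hedged reading of the slide, not the paper's exact formula: each spot observation is marginalised over the template's spot color names with equal priors, and the dominant wing color term uses its own prior. Inputs are precomputed p(z|c) values; all names are illustrative.

```python
def score_image(spot_likelihoods, dominant_likelihoods, dominant_prior):
    """Sketch of p(I | B): a product over spot observations, each
    marginalised over color names with equal priors, times the
    dominant-wing-color term weighted by its prior.

    spot_likelihoods: list with one dict {color: p(z|color)} per detected spot
    dominant_likelihoods: dict {color: p(z_dom|color)} for the dominant color
    dominant_prior: dict {color: p(color | B)} from the learnt template
    """
    score = 1.0
    for per_color in spot_likelihoods:
        prior = 1.0 / len(per_color)  # equal priors to all spot colors
        score *= sum(prior * lik for lik in per_color.values())
    score *= sum(dominant_prior[c] * lik
                 for c, lik in dominant_likelihoods.items())
    return score
```

For example, one spot with p(z|blue)=0.8 and p(z|white)=0.2 under equal priors contributes 0.5, so with a dominant term of 0.9 the image scores 0.45 against that category.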
Experimental Results
Two sets of experiments were performed:
• Performance of human beings in recognizing butterflies from textual descriptions
• Because this may reasonably be considered an upper bound
• Performance of the proposed method
Human Performance
Performance of proposed method
Observations
• Accuracy of the proposed method was comparable to that of non-native English speakers
• Accuracy of proposed method was more than 80 percent for four categories
• ‘Heliconius charitonius’ was the toughest category to classify, both for humans and with the ground-truth and learnt templates
• Performance with ground-truth templates was comparable to that with the learnt templates
• Errors in the learnt templates due to the NLP methods did not have much impact
Thank You