Final Year Major Project Report ( Year 2010-2014 Batch )
Uploaded by parul6792. Category: Engineering.
Outline
– Introduction
  • Why image retrieval is hard
  • How images are represented
  • Current approaches
– Indexing and Retrieving Images
  • Navigational approaches
  • Relevance Feedback
  • Automatic Keywording
– Advanced Topics, Futures and Conclusion
  • Video and music retrieval
  • Towards practical systems
  • Conclusions and Feedback
Scope
• General Digital Still Photographic Image Retrieval
  – Generally colour
• Some different issues arise
  – Narrower domains
    • E.g. medical images, especially where part of body and/or specific disorder is suspected
  – Video
  – Image Understanding - object recognition
Thanks to
Chih-Fong Tsai, Sharon McDonald, Ken McGarry, Simon Farrand, and members of the University of Sunderland Information Retrieval Group
Why is Image Retrieval Hard?
• What is the topic of this image?
• What are the right keywords to index this image?
• What words would you use to retrieve this image?
The Semantic Gap
Problems with Image Retrieval
• A picture is worth a thousand words
• The meaning of an image is highly individual and subjective
Compression
• In practice images are stored as compressed raster
  – JPEG
  – MPEG
• Cf. vector …
• Not relevant to retrieval
Image Processing for Retrieval
• Representing the Images
  – Segmentation
  – Low Level Features
    • Colour
    • Texture
    • Shape
Image Features
• Information about colour, texture or shape which is extracted from an image is known as image features
  – Also called low-level features
    • Red, sandy
  – As opposed to high-level features or concepts
    • Beaches, mountains, happy, serene, George Bush
Image Segmentation
• Do we consider the whole image or just part?
  – Whole image - global features
  – Parts of image - local features
Global features
• Averages across whole image
• Tends to lose the distinction between foreground and background
• Poorly reflects human understanding of images
• Computationally simple
• A number of successful systems have been built using global image features, including Sunderland’s CHROMA
Tiling
• Break image down into simple geometric shapes
• Similar problems to global features
• Plus the danger of breaking up significant objects
• Computationally simple
• Some schemes seem to work well in practice
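The difference between global and tiled features can be illustrated with a minimal sketch. It assumes an image is held as a list of rows of (r, g, b) tuples and that mean colour is the feature; the function name `tile_means` and the toy image are hypothetical and are not taken from any system named in these slides.

```python
def tile_means(image, rows, cols):
    """Split an image into a rows x cols grid and return the mean colour of each tile.

    `image` is a list of pixel rows, each pixel an (r, g, b) tuple.
    Assumes the image has at least `rows` pixel rows and `cols` pixel columns.
    """
    height, width = len(image), len(image[0])
    means = []
    for ti in range(rows):
        y0, y1 = ti * height // rows, (ti + 1) * height // rows
        for tj in range(cols):
            x0, x1 = tj * width // cols, (tj + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            means.append(tuple(sum(p[c] for p in pixels) / n for c in range(3)))
    return means


# Toy 2x4 image: left half red, right half blue.
RED, BLUE = (255, 0, 0), (0, 0, 255)
toy = [[RED, RED, BLUE, BLUE], [RED, RED, BLUE, BLUE]]

global_feature = tile_means(toy, 1, 1)  # whole image treated as one tile
local_features = tile_means(toy, 1, 2)  # left tile and right tile
```

In this toy case the global feature averages red and blue into a single purple value, while the two tiles keep the colours apart, which is the loss of distinction the global features slide describes.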
Regioning
• Break image down into visually coherent areas
• Can identify meaningful areas and objects
• Computationally intensive
• Unreliable
Colour
• Produce a colour signature for region/whole image
• Typically done using colour correlograms or colour histograms
Colour Histograms
• Identify a number of buckets in which to sort the available colours (e.g. red, green and blue, or up to ten or so colours)
• Allocate each pixel in an image to a bucket and count the number of pixels in each bucket
• Use the figure produced (bucket id plus count, normalised for image size and resolution) as the index key (signature) for each image
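The bucket-and-count procedure can be sketched as follows. This is only an illustration under stated assumptions: buckets are formed by quantising each RGB channel uniformly (the slide does not say how buckets are chosen), and histogram intersection is used to compare two signatures, which is a common choice but is not named in the slides. Function names are hypothetical.

```python
def colour_histogram(pixels, bins_per_channel=2):
    """Return a normalised histogram: bucket id -> fraction of pixels in that bucket.

    `pixels` is a flat list of (r, g, b) tuples with channel values 0..255.
    Dividing by the pixel count normalises for image size.
    """
    counts = {}
    for r, g, b in pixels:
        bucket = (r * bins_per_channel // 256,
                  g * bins_per_channel // 256,
                  b * bins_per_channel // 256)
        counts[bucket] = counts.get(bucket, 0) + 1
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between two normalised histograms (1.0 means identical)."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


mostly_red = colour_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
all_red = colour_histogram([(250, 10, 10)] * 4)
similarity = histogram_intersection(mostly_red, all_red)
```

Because the signature is a small dictionary per image, it can be computed once at index time and compared cheaply at query time.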
Other Colour Issues
• Many Colour Models
  – RGB (red green blue)
  – HSV (Hue Saturation Value)
  – Lab, etc.
• Problem is getting something like human vision
  – Individual differences
Texture
• Produce a mathematical characterisation of a repeating pattern in the image
  – Smooth
  – Sandy
  – Grainy
  – Stripey
Texture
• Reduces an area/region to a (small - 15 ?) set of numbers which can be used as a signature for that region
• Proven to work well in practice
• Hard for people to understand
Ducks again
• All objects have closed boundaries
• Shape interacts in a rather vicious way with segmentation
• Find the duck shapes
Summary of Image Representation
• Pixels and Raster
• Image Segmentation
  – Tiles
  – Regions
• Low-level Image Features
  – Colour
  – Texture
  – Shape
Overview of Section 2
• Quick Reprise on IR
• Navigational Approaches
• Relevance Feedback
• Automatic Keyword Annotation
Reprise on Key Interactive IR ideas
• Index Time vs Query Time Processing
  – Query Time
    • Must be fast enough to be interactive
  – Index (Crawl) Time
    • Can be slow(ish)
    • There to support retrieval
An Index
• A data structure which stores data in a suitably abstracted and compressed form in order to facilitate rapid processing by an application
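One familiar instance of such a data structure is an inverted index from keywords to image identifiers. The sketch below is an assumption-laden illustration, not a description of any system in these slides: the image ids, keywords and function names are made up, and queries are treated as a simple AND of keywords.

```python
from collections import defaultdict


def build_index(annotations):
    """Index time: map each keyword to the set of image ids carrying it (can be slow)."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for kw in keywords:
            index[kw.lower()].add(image_id)
    return index


def query(index, keywords):
    """Query time: intersect the posting sets of all query keywords (must be fast)."""
    sets = [index.get(kw.lower(), set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()


# Hypothetical annotations used only for illustration.
annotations = {
    "img1": ["Beach", "Sunset"],
    "img2": ["beach", "dogs"],
    "img3": ["mountain"],
}
index = build_index(annotations)
beach_sunsets = query(index, ["beach", "sunset"])
```

The expensive pass over the collection happens once in `build_index`; each query then touches only the small sets for its own keywords.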
Essential Idea
• Lay out images in a virtual space in an arrangement which will make some sense to the user
• Project this onto the screen in a comprehensible form
• Allow them to navigate around this projected space (scrolling, zooming in and out)
Notes
• Typically colour is used
  – Texture has proved difficult for people to understand
  – Shape possibly the same, and also user interface - most people can’t draw!
• Alternatives include time (Canon’s Time Tunnel) and recently location (GPS Cameras)
• Need some means of knowing where you are
Observation
• It appears people can take in and will inspect many more images than texts when searching
CHROMA
• Development in Sunderland
  – Mainly by Ting Sheng Lai, now of National Palace Museum, Taipei, Taiwan
• Structure Navigation System
• Thumbnail Viewer
• Similarity Searching
• Sketch Tool
The CHROMA System
• General Photographic Images
• Global Colour is the Primary Indexing Key
• Images organised in a hierarchical classification using 10 colour descriptors and colour histograms
Technical Issues
• Fairly easy to arrange image signatures so they support rapid browsing in this space
Relevance Feedback
• Well established technique in text retrieval
  – Experimental results have always shown it to work well in practice
• Unfortunately, experience with search engines has shown it is difficult to get real searchers to adopt it - too much interaction
Essential Idea
• User performs an initial query
• Selects some relevant results
• System then extracts terms from these to augment the initial query
• Requeries
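The feedback loop above can be sketched for the text case. This is a minimal illustration, assuming documents are already tokenised into lists of terms and that the most frequent new terms in the selected results are the ones added; real systems typically weight terms more carefully. The function name and example data are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, relevant_docs, extra_terms=2):
    """Augment a query with the most frequent new terms from user-selected results.

    `relevant_docs` is a list of documents, each a list of terms.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for doc in relevant_docs:
        counts.update(t for t in doc if t not in query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:extra_terms]]


# The user searched for "beach" and marked two results as relevant.
selected = [["beach", "sand", "sea"], ["sea", "sand", "sky"]]
new_query = expand_query(["beach"], selected)
```

The expanded query is then run again, which corresponds to the final requery step.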
Many Variants
• Pseudo
  – Just assume high ranked documents are relevant
• Ask users about terms to use
• Include negative evidence
• Etc.
Why useful in Image Retrieval?
1. Provides a bridge between the user’s understanding of images and the low level features (colour, texture etc.) with which the system is actually operating
2. Is relatively easy to interface to
Observations
• Most image searchers prefer to use key words to formulate initial queries
  – Eakins et al, Enser et al
• First generation systems all operated using low level features only
  – Colour, texture, shape etc.
  – Smeulders et al
Ideal Image Retrieval Process
[Diagram; labels as extracted: Thumbnail Browsing, Need, Keyword Query, More Like This]
Image Retrieval as Text Retrieval
• What we really want to do is turn the image retrieval problem into a text retrieval problem
Three Ways to go
• Manually assign keywords to each image
• Use text associated with the images (captions, web pages)
• Analyse the image content to automatically assign keywords
Manual Keywording
• Expensive
  – Can only really be justified for high value collections – advertising
• Unreliable
  – Do the indexers and searchers see the images in the same way?
• Feasible
Associated Text
• Cheap
• Powerful
  – Famous names/incidents
• Tends to be “one dimensional”
  – Does not reflect the content rich nature of images
• Currently Operational - Google
Possible Sources of Associated Text
• Filenames
• Anchor Text
• Web Page Text around the anchor/where the image is embedded
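As a rough illustration of how these sources could be turned into index terms, the sketch below pulls words out of an image URL's filename and out of any anchor or nearby text. Everything here is an assumption for illustration: the tokenisation (letters only, lower-cased), the function names, and the example URL are not from the slides, and a real crawler would need an HTML parser to locate the anchor and surrounding text first.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filename_terms(image_url):
    """Split the filename (without extension) of an image URL into lower-case words."""
    stem = PurePosixPath(urlparse(image_url).path).stem
    return [t.lower() for t in re.split(r"[^A-Za-z]+", stem) if t]


def associated_terms(image_url, anchor_text="", nearby_text=""):
    """Combine filename words with words from the anchor text and surrounding page text."""
    words = re.findall(r"[A-Za-z]+", anchor_text + " " + nearby_text)
    return filename_terms(image_url) + [w.lower() for w in words]


# Hypothetical URL and anchor text.
url = "http://example.com/photos/golden_gate-bridge01.jpg"
terms = associated_terms(url, anchor_text="Sunset view")
```

The resulting terms could be fed to an ordinary text index, which is what makes this route cheap.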
Automatic Keyword Assignment
• A form of Content Based Image Retrieval
• Cheap (ish)
• Predictable (if not always “right”)
• No operational system demonstrated
  – Although considerable progress has been made recently
Basic Approach
• Learn a mapping from the low level image features to the words or concepts
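The simplest learned mapping of this kind is a nearest-neighbour lookup: label a new image with the keyword of the most similar training image. The sketch below assumes features are short numeric vectors compared by Euclidean distance; the two-dimensional "warm, green" features and the keywords are invented for illustration, and this is a deliberately simplified stand-in rather than the two-stage learner described in the CLAIRE slides.

```python
import math


def nearest_keyword(features, training):
    """Return the keyword of the training example closest to `features`.

    `training` is a list of (feature_vector, keyword) pairs.
    """
    closest = min(training, key=lambda example: math.dist(example[0], features))
    return closest[1]


# Hypothetical training data: (warmth, greenness) -> keyword.
training = [
    ((0.9, 0.1), "sunset"),
    ((0.1, 0.9), "forest"),
    ((0.5, 0.5), "beach"),
]
predicted = nearest_keyword((0.8, 0.2), training)
```

Once keywords are assigned this way at index time, retrieval can proceed as ordinary keyword search.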
Two Routes
1. Translate the image into a piece of text
  – Forsyth and others
  – Manmatha and others
2. Find the category of images to which a keyword applies
  – Tsai and Tait (SIGIR 2005)
Second Session Summary
• Separating Index Time and Retrieval Time Operations
• “First generation CBIR”
  – Navigation (by colour etc.)
  – Relevance Feedback
• Keyword based Retrieval
  – Manual Indexing
  – Associated Text
  – Automatic Keywording
Video Retrieval
• All current systems are based on one or more of:
  – Narrow domain - news, sport
  – Use automatic speech recognition to do speech to text on the soundtrack
  – Do key frame extraction and then treat the problem as still image retrieval
Missing Opportunities in Video Retrieval
• Using deltas - frame to frame differences - to segment the image into foreground/background, players, pitch, crowd etc.
• Trying to relate image data to language/text data
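The frame-to-frame delta idea can be sketched in a few lines. This assumes greyscale frames stored as lists of rows of intensities (0..255) and a fixed threshold of 30, both arbitrary choices for illustration; the function name is hypothetical. Pixels that change by more than the threshold are marked as moving (candidate foreground), the rest as static (candidate background).

```python
def motion_mask(prev_frame, next_frame, threshold=30):
    """Mark pixels whose intensity changes by more than `threshold` between two frames."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]


# Toy 2x2 frames: only the top-right pixel changes.
frame_a = [[10, 10], [10, 10]]
frame_b = [[10, 200], [10, 10]]
mask = motion_mask(frame_a, frame_b)
```

A mask like this only separates moving from static pixels; telling players from pitch or crowd would need further features on top of it.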
Music Retrieval
• Distinctive and Hard Problem
  – What makes one piece of music similar to another
• Features
  – Melody
  – Artist
  – Genre ?
Requirements
• > 5000 key word vocabulary
• > 5% accuracy of keyword assignment for all keywords
• > 5% precision in response to single key word queries
The Semantic Gap Bridged!
CLAIRE
• Example State of the Art Semantic CBIR System
• Colour and Texture Features
• Simple Tiling Scheme
• Two Stage Learning Machine
  – SVM/SVM and SVM/k-NN
  – Colour to 10 basic colours
  – Texture to one texture term per category
Architecture of CLAIRE
[Diagram; labels as extracted: Image, Segmentation, Colour, Texture, Data Extractor, Texture Classifier, Key word Annotation, Known Key Word/class]
Training/Test Collection
• Randomly selected from Corel
• Training Set
  – 30 images per category
• Test Collection
  – 20 images per category
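A random per-category split of this shape could be drawn as in the sketch below. The 30/20 sizes come from the slide; the fixed seed, the function name, and the assumption that each category has at least 50 images available are illustrative choices, not details from the source.

```python
import random


def split_category(image_ids, n_train=30, n_test=20, seed=0):
    """Randomly pick disjoint training and test images from one category."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    chosen = rng.sample(image_ids, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# Hypothetical category with 100 image ids.
category = [f"img{i:03d}" for i in range(100)]
train, test = split_category(category)
```

Sampling both sets in a single draw guarantees that no image appears in both the training set and the test collection.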
SVM/SVM Keywording with 100+50 Categories
[Chart: vertical axis 0% to 70%; horizontal axis 10, 30, 50, 70, 100; series: concrete classes, abstract classes, baseline]
Example Keywords
• Concrete
  – Beaches, Dogs, Mountain, Orchids, Owls, Rodeo, Tulips, Women
• Abstract
  – Architecture, City, Christmas, Industry, Sacred, Sunsets, Tropical, Yuletide
SVM vs kNN
[Chart: vertical axis 0% to 70%; horizontal axis 10, 30, 50, 70, 100, 150; series: SVM concrete, SVM abstract, baseline, kNN abstract, kNN concrete]
Reduction in Unreachable Classes
[Chart: Missing Category Numbers (0 to 60) against 10, 30, 50, 70, 100, 150; series: SVM concrete, SVM abstract, kNN concrete, kNN abstract]
Keywording 200+200 Categories, SVM/1-NN
[Chart: vertical axis 0% to 60%; horizontal axis 10, 30, 50, 70, 100, 150, 200; series: concrete keywords, abstract keywords, baseline, exponential trend line (abstract keywords)]
Discussion
• Results still promising
• 5.6% of images have at least one relevant keyword assigned
• Still useful - but only for a vocabulary of 400 words!
• See demo at http://osiris.sunderland.ac.uk/~da2wli/system/silk1/
• High proportion of categories which are never assigned
Effectiveness Comparison: Five Tiles vs Five Regions, 1-NN Data Extractor
[Two charts: Accuracy against No. of concrete classes and Accuracy against No. of abstract classes (10, 30, 50, 70, 100, 150, 200); series: tiles, regions. Data labels are listed below in the order extracted; which series and class count each belongs to is not recoverable from the transcript.]
Concrete classes: 8% (81), 13.7% (25), 16.7% (15), 21.43% (5), 27.9% (2), 36.67% (0), 52.5% (0), 14.3% (30), 9.13% (69), 27.79% (1), 33% (0), 40.67% (1), 61.5% (0), 18.4% (9)
Abstract classes: 8% (81), 8.86% (44), 11.25% (21), 16.29% (7), 22.55% (2), 26.33% (2), 52.5% (0), 9% (51), 9.13% (69), 9.25% (27), 14.14% (7), 31% (0), 21.06% (0), 48% (0)
Next Steps
• More categories
• Integration into complete systems
• Systematic comparison with the generative approach pioneered by Forsyth and others
Other Promising Examples
• Jeon, Manmatha and others
  – High number of categories - results difficult to interpret
• Carneiro and Vasconcelos
  – Also problems with missing concepts
• Srikanth et al
  – Possibly leading results in terms of precision and vocabulary scale
Conclusions
• Image Indexing and Retrieval is Hard
• Effective Image Retrieval needs a cheap and predictable way of relating words and images
• Adaptive and Machine Learning approaches offer one way forward with much promise
Early Systems
The following leads into all the major trends in systems based on colour, texture and shape:
• A. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain “Content-based Image Retrieval at the End of the Early Years” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.
CHROMA
• Sharon McDonald and John Tait “Search Strategies in Content-Based Image Retrieval” Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 80-87. ISBN 1-58113-646-3
• Sharon McDonald, Ting-Sheng Lai and John Tait “Evaluating a Content Based Image Retrieval System” Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001), New Orleans, September 2001. W.B. Croft, D.J. Harper, D.H. Kraft, and J. Zobel (Eds). ISBN 1-58113-331-6 pp 232-240.
Translation Based Approaches
• P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth “Learning a Lexicon for a Fixed Image Vocabulary” European Conference on Computer Vision, 2002.
• K. Barnard, P. Duygulu, N. de Freitas and D. Forsyth “Matching Words and Pictures” Journal of Machine Learning Research 3: 1107-1135, 2003.
Very recent new paper on this is:
• P. Virga, P. Duygulu “Systematic Evaluation of Machine Translation Methods for Image and Video Annotation” Images and Video Retrieval, Proceedings of CIVR 2005, Singapore, Springer, 2005.
Cross-media Relevance Models etc
• J. Jeon, V. Lavrenko, R. Manmatha “Automatic Image Annotation and Retrieval using Cross-Media Relevance Models” Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 119-126
See also recent unpublished papers on
http://ciir.cs.umass.edu/~manmatha/mmpapers.html
More recent stuff
• G. Carneiro and N. Vasconcelos “A Database Centric View of Semantic Image Annotation and Retrieval” Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
• M. Srikanth, J. Varner, M. Bowden, D. Moldovan “Exploiting Ontologies for Automatic Image Annotation” Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005
See also the SIGIR workshop proceedings
http://mmir.doc.ic.ac.uk/mmir2005