Geophoto Memex
Liangliang Cao
What is “Geophoto Memex”?
• Geophoto Memex: Record all the photos that are associated with locations in
the world, and provide geographical analytics on request.
• Our related papers:
– ACMMM 2009: “Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression”
– ICASSP 2010: “A worldwide tourism recommendation system based on geotagged web photos”
– SDM 2011: “Diversified Trajectory Pattern Ranking in Geo-tagged Social Media”
– WWW 2011: “Geographical topic discovery and comparison”
Geophotos
• Where is the data from?
– Advanced cameras with GPS receivers
– GPS sensors in smartphones
– Web apps, including Google Earth, Flickr, and Twitter
Project Overview
Stages: Data Collection, Data Cleaning, Data Analytics
Data Collection
• Already collected 1M geo-tagged photos
• Aim to collect
– 100M+ geo-tagged photos from Flickr
– More geo-tagged documents from Twitter
Data Cleaning (Project 1)
• Remove the label ambiguity
• Refine the annotation
Data Analytics (Project 2, Project 3)
• Tourism recommendation
• Geo-info discovery
• User interest mining
Our Projects
• Project 1: Geographical & Semantic Annotation
• Project 2: Tourism Recommendation
• Project 3: Geographical Topics in Social Media
Geographical & Semantic Annotation
User-provided tags of the example photo: 2006, clouds, sc, d50, mywinners, nikon, pond, reflections, sun, september
• User-provided annotations are usually limited and noisy.
– Labels are often ambiguous or irrelevant
– Only a small fraction of photos are geo-tagged
• Is it possible to refine and enrich these annotations?
– Large-scale visual recognition can help.
– We combine visual features and tag features in the classifiers.
Annotation questions:
– What exists in the image?
– Where was the image taken?
• Annotation cues lie in different features.
• We train a model on Flickr data to annotate images automatically.
New annotation for the example photo: nature, sky, water
Combining Visual Features and Noisy Labels
• There are multiple features for online images
– Visual features: color, shape, GIST…
– Noisy annotations
• We explore the canonical correlations between multiple features and use them to enrich the annotations
Canonical Correlation Analysis
Let x and y represent two feature vectors. CCA looks for a pair of projection directions a and b, one for each vector, chosen to maximize the correlation between the two projections.
The solution can be found by solving a generalized eigenvalue decomposition problem.
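The formulas on this slide did not survive extraction. For reference, the textbook CCA formulation is sketched below. The covariance symbols C_xx, C_yy, C_xy and the eigenvalue ρ are assumed notation and may differ from the symbols used on the original slide.

```latex
% Projections of the two feature vectors
u = a^\top x, \qquad v = b^\top y

% Objective: maximize the correlation between the projections
(a^*, b^*) = \arg\max_{a,\,b}
  \frac{a^\top C_{xy}\, b}{\sqrt{a^\top C_{xx}\, a}\;\sqrt{b^\top C_{yy}\, b}}

% Equivalent generalized eigenvalue problem (rho is the canonical correlation)
\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
= \rho
\begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
```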
CCA for a Toy Example
Neither of the two dimensions in original space characterizes the
linear correlation. However, after projecting the data into the
canonical space, we can see the linear correlation clearly.
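The toy figure is not available here, so the following is a minimal sketch of the same effect on synthetic data. The data generator, the function name `cca_first_pair`, and the noise levels are all assumptions made for illustration. CCA is computed by whitening each view and taking an SVD, which is one standard way to solve the problem.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """Return (a, b, rho): the first pair of canonical directions and their correlation."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with their Cholesky factors, then take the SVD.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)  # shared latent signal (hypothetical)

def make_view():
    # Each raw dimension mixes the weak shared signal with strong private noise.
    signal = z + 0.1 * rng.normal(size=n)
    noise = 5.0 * rng.normal(size=n)
    return np.column_stack([signal + noise, signal - noise]) / np.sqrt(2)

X, Y = make_view(), make_view()
a, b, rho = cca_first_pair(X, Y)

# Largest absolute correlation between any raw dimension of X and any of Y.
raw_max = max(abs(np.corrcoef(X[:, i], Y[:, j])[0, 1])
              for i in range(2) for j in range(2))
# Correlation after projecting onto the canonical directions.
proj_corr = np.corrcoef(X @ a, Y @ b)[0, 1]
print(f"raw max |corr| = {raw_max:.3f}, canonical corr = {proj_corr:.3f}")
```

In this setup the raw dimensions are only weakly correlated, while the projected data are strongly correlated, which is the point the toy example makes.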
Logistic Canonical Correlation Regression
• Given multiple features, we can compute the canonical correlation between each feature and a given label.
• To combine the cues from multiple features, we employ the logistic canonical correlation regression (LCCR) model, which is fit by maximizing the likelihood.
• The estimated function is a logistic function whose inputs are the correlations between the label and each feature (the m-th input corresponds to the m-th feature) and whose weights are the parameters of the logistic model.
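The slide's formulas were lost in extraction. One plausible way to write a model of this shape is sketched below. The symbols c_m, w_0, w_m, M, and N are assumed notation, and the exact parameterization in the ACMMM 2009 paper may differ.

```latex
% c_m(x): correlation-based score between the label and the m-th feature of image x (assumed symbol)
% w_0, w_1, ..., w_M: parameters of the logistic model (assumed symbols)
P(y = 1 \mid x) =
  \frac{1}{1 + \exp\!\left(-\left(w_0 + \sum_{m=1}^{M} w_m\, c_m(x)\right)\right)}

% The parameters are chosen to maximize the log-likelihood over N training images
\mathcal{L}(w) = \sum_{i=1}^{N}
  \Big[\, y_i \log P(y = 1 \mid x_i) + (1 - y_i) \log\big(1 - P(y = 1 \mid x_i)\big) \Big]
```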
Dataset
• We collected 380,573 images with tags and GPS records from Flickr.
• The number of tags for each image
– varies from zero to over ten
– averages 4.96 tags per image.
• The GPS locations:
– Restricted to North America, since users in other areas may use different languages for tags
Enhancing Semantic Annotation
• We use 66 semantic concepts (tags) for semantic annotation: the most popular labels on Flickr
• We train our LCCR model on multiple features
– 6 visual features: LAB color histogram, GIST, tiny image, LAB color of tiny image, and image projections in PCA and LDA spaces.
– Existing tag features: during both training and testing we remove any tag identical to a labeling concept, because those tags are the very annotations we are trying to predict.
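The tag-removal step can be sketched as follows. The function name and the four concepts listed are hypothetical stand-ins (the full list of 66 concepts is not given on the slide), and case-insensitive matching is an assumption.

```python
def strip_target_concepts(tags, concepts):
    """Drop tags equal to a target concept so the label cannot leak into the tag features."""
    concept_set = {c.lower() for c in concepts}
    return [t for t in tags if t.lower() not in concept_set]

# Hypothetical subset of the labeling concepts.
concepts = ["nature", "sky", "water", "flower"]
tags = ["nikon", "red", "green", "usa", "flower", "purple"]

tag_features = strip_target_concepts(tags, concepts)
print(tag_features)
```

Applying the same filter at training and test time keeps the tag features from trivially containing the answer.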
More Examples
Original tags of three photos, each followed by the new annotations:
– canon, water, ocean, wildlife, fish, 20d, seagull, feathery friday, camping, gull
  New: bird, nature, sea
– river, colors, sony, quebec, minolta, paysage, a100, automne
  New: autumn, fall, landscape
– nikon, red, green, usa, flower, purple, october, plants, texture, flora, illinois, natural, pattern, wallpaper
  New: autumn, fall, flower, garden, nature
Evaluation: Geographical Annotation
Evaluation: Semantic Annotation
Tourism Recommendation from Image Retrieval
Query → similar photos from the indexed dataset → recommended places
Find Popular Attractions
Popular attractions are usually those with many photos.
(Different colors denote different attractions.)
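The "many photos" idea can be illustrated by counting photos per location cell. This sketch swaps in plain grid binning, which is not necessarily the clustering used in the original system. The function name, the cell size, and the coordinates are assumptions for illustration.

```python
from collections import Counter

def popular_cells(photos, cell_deg=0.01, top_k=2):
    """Count photos per lat/lon grid cell and return the top_k densest cells with their counts."""
    counts = Counter(
        (round(lat / cell_deg), round(lon / cell_deg)) for lat, lon in photos
    )
    return counts.most_common(top_k)

# Hypothetical photo coordinates (latitude, longitude).
photos = [
    (48.8584, 2.2945), (48.8582, 2.2948), (48.8586, 2.2941),  # three photos close together
    (48.8606, 2.3376), (48.8609, 2.3372),                      # two photos close together
    (48.8530, 2.3499),                                         # one isolated photo
]

top = popular_cells(photos)
print(top)
```

Each returned entry is a grid-cell index pair and its photo count. Denser cells are candidate attractions.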
Mining Top Tourist Routes in Big Cities
London Eye → Big Ben → Downing Street → Horse Guards → Trafalgar Square
Apple Store → St. Patrick's Cathedral → Rockefeller Center
Eiffel Tower → Louvre → Notre-Dame
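A minimal sketch of route mining, assuming each user's photos have already been turned into an ordered list of visited attractions. It counts contiguous sub-sequences, which is a simplification and not the diversified trajectory pattern ranking of the SDM 2011 paper. The function name, parameters, and trajectories are hypothetical.

```python
from collections import Counter

def frequent_routes(trajectories, length=3, min_support=2):
    """Keep contiguous sub-sequences of `length` stops seen in at least min_support trajectories."""
    support = Counter()
    for stops in trajectories:
        # A set ensures each trajectory counts a given route at most once.
        seen = {tuple(stops[i:i + length]) for i in range(len(stops) - length + 1)}
        support.update(seen)
    return {route: c for route, c in support.items() if c >= min_support}

# Hypothetical per-user visit sequences.
trajectories = [
    ["London Eye", "Big Ben", "Downing Street", "Horse Guards"],
    ["London Eye", "Big Ben", "Downing Street"],
    ["Big Ben", "Downing Street", "Horse Guards", "Trafalgar Square"],
]

routes = frequent_routes(trajectories)
for route, count in routes.items():
    print(" → ".join(route), count)
```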
Topics over Geographical Regions [WWW 2011]
Input:
Output:
1. Geographic topics
2. Topics at a location
Motivations
• Goal:
– Analyze the cultural differences around the world
– Explore the hot topics or events in different places
– Compare the popularity of specific products in different regions
• Latent Geographical Topic Analysis
– The topics are generated from regions instead of documents
– If two words are close to each other in space, they are more likely to belong to the same region
– If two words are from the same region, they are more likely to be clustered into the same topic
Latent Geographical Topic Analysis
Model components (from the diagram):
– Region importance (N-d vector)
– Region geo-information: location, shape
– {p(z|r)}, {p(w|z)}
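How these components could fit together is sketched below, assuming each region is a 2-D Gaussian (its mean is the location, its covariance the shape) and that the words of a document are drawn through the region's topic mixture. All parameter values and function names are made up for illustration. This is a reading of the diagram, not the paper's exact estimation procedure.

```python
import numpy as np

# All parameter values below are made up for illustration.
alpha = np.array([0.5, 0.5])                        # region importance p(r)
mus = np.array([[0.0, 0.0], [10.0, 10.0]])          # region centers (location)
covs = np.array([np.eye(2), np.eye(2)])             # region covariances (shape)
theta = np.array([[0.9, 0.1], [0.1, 0.9]])          # p(z|r), one row per region
phi = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])  # p(w|z), one row per topic

def log_gauss2d(loc, mu, cov):
    """Log density of a 2-D Gaussian at loc."""
    diff = loc - mu
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + np.log(np.linalg.det(cov)) + 2 * np.log(2 * np.pi))

def region_log_weights(loc):
    """log p(r) + log p(loc | r) for every region r."""
    return np.log(alpha) + np.array(
        [log_gauss2d(loc, m, c) for m, c in zip(mus, covs)]
    )

def topics_at_location(loc):
    """p(z | loc) = sum_r p(r | loc) p(z | r)."""
    lw = region_log_weights(loc)
    p_r = np.exp(lw - lw.max())
    p_r /= p_r.sum()
    return p_r @ theta

def doc_log_likelihood(word_ids, loc):
    """log p(w_d, l_d) = log sum_r p(r) p(l_d | r) prod_w p(w | r)."""
    p_w_given_r = theta @ phi  # shape (regions, vocabulary)
    scores = region_log_weights(loc) + np.log(p_w_given_r[:, word_ids]).sum(axis=1)
    return scores.max() + np.log(np.exp(scores - scores.max()).sum())

near = np.array([0.0, 0.0])
far = np.array([10.0, 10.0])
t_near = topics_at_location(near)
ll_near = doc_log_likelihood([0, 0], near)
ll_far = doc_log_likelihood([0, 0], far)
print(t_near, ll_near, ll_far)
```

`topics_at_location` corresponds to the "topics at a location" output. `doc_log_likelihood` gives the per-document quantity log p(w_d, l_d) that the location/text perplexity is built from.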
Location/Text Perplexity
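$$\text{perplexity}_{\text{location/text}}(D_{\text{test}}) = \exp\left\{ -\frac{\sum_{d \in D_{\text{test}}} \log p(w_d, l_d)}{\sum_{d \in D_{\text{test}}} N_d} \right\}$$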
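A small sketch of computing this perplexity from per-document log-probabilities log p(w_d, l_d) and document lengths N_d. It assumes the standard negative sign in the exponent. The function name and the numbers are hypothetical.

```python
import math

def location_text_perplexity(doc_log_probs, doc_lengths):
    """exp(-sum_d log p(w_d, l_d) / sum_d N_d) over the test documents."""
    return math.exp(-sum(doc_log_probs) / sum(doc_lengths))

# Hypothetical test set: two documents with 4 and 6 words.
ppl = location_text_perplexity([-8.0, -12.0], [4, 6])
print(ppl)
```

Lower values mean the model assigns higher probability to the held-out words and locations.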
Geographical Topic Comparison
• Food dataset
Topics over Geographical Regions
Topics shown: Italian food, Japanese food, Chinese food, Spanish food, Mexican food, French food
The redness represents the probability of each topic at a location.
Distinguish Different Landscapes
Beach, Desert, Mountains
Acknowledgement To My Terrific Collaborators
Thomas Huang, Jiawei Han, Zhijun Yin, Jiebo Luo, Andrew Gallagher