Nonnegative Shared Subspace Learning and Its Application … · 2011-03-24 · Previous...
Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
Presenter: Andy Lim
Paper Topic
• Folksonomy
• Social media sharing platforms
The Problem
• Rise in popularity of social image and video sharing platforms
• Precision of tag‐based media retrieval
• Tags are
  • Noisy
  • Ambiguous
  • Incomplete
  • Subjective
• Lack of constraints
Tags: hotdog, chinese, trololol, aidjishi, sandwich, bread
• Free‐text tags (e.g. “djfja;sldfkj”)
Previous Research (Internal)
• Improving tag relevance
• Sigurbjornsson and Zwol
  • Developed a method of recommending a set of relevant tags based on tag popularity
• Li et al.
  • List all images for a given tag and determine tag relevance from visual similarity
• All are confined to noisy tags within the primary dataset
The Approach
• Internal vs. External
• Leverage external auxiliary sources of information to improve target tagging systems (presumably much noisier)
• Exploit disparate characteristics of target domain using auxiliary source
• Note: What is the optimal level of joint modeling such that the target domain still benefits from the auxiliary source?
Assumptions
• There is a common underlying subspace shared by the primary and secondary domains
• The primary domain is much noisier than the secondary domain
Nonnegative Matrix Factorization
• X (M x N data matrix): N documents, each represented in terms of M vocabulary words
• F (M x R nonnegative matrix) represents R basis vectors
• H (R x N nonnegative matrix) contains the coordinates of each document
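The factorization described above approximates X ≈ FH with both factors kept nonnegative. A minimal sketch follows, assuming a squared Frobenius-norm objective and the standard multiplicative update rules (the slides do not say which solver was used). The function name `nmf` and its parameters are illustrative only.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative M x N matrix X as F @ H (assumed objective: ||X - FH||_F^2)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    F = rng.random((M, R))  # R basis vectors over the M vocabulary words
    H = rng.random((R, N))  # coordinates of each of the N documents
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative;
        # eps guards against division by zero.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        F *= (X @ H.T) / (F @ H @ H.T + eps)
    return F, H
```

For a small term-document matrix `X`, `F, H = nmf(X, R=2)` returns factors whose product can be compared against `X` with `np.linalg.norm(X - F @ H)`.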
Joint Shared Nonnegative Matrix Factorization (JSNMF)
• Input:
  • X (target domain), Y (auxiliary domain), R1 and R2 (dimensionality of underlying subspaces of X and Y), K (number of shared basis vectors)
• Output:
  • W (joint shared subspace), U (remaining subspace in target domain), V (remaining subspace in auxiliary domain), H (coordinate matrix for target domain), L (coordinate matrix for auxiliary domain)
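One way to read the inputs and outputs above is X ≈ [W U]H and Y ≈ [W V]L, where the K columns of W are shared by both domains. The sketch below assumes an unweighted sum of the two squared Frobenius errors and multiplicative updates derived from that objective. The original method's exact objective, weighting, and update rules may differ, and `jsnmf` with its parameter names is hypothetical.

```python
import numpy as np

def jsnmf(X, Y, R1, R2, K, n_iter=200, eps=1e-9, seed=0):
    """Sketch of joint shared NMF: X ~ [W U] H and Y ~ [W V] L, all factors nonnegative.

    Assumes X and Y are indexed by the same M vocabulary words (rows).
    """
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    W = rng.random((M, K))           # basis vectors shared by both domains
    U = rng.random((M, R1 - K))      # basis vectors specific to the target domain
    V = rng.random((M, R2 - K))      # basis vectors specific to the auxiliary domain
    H = rng.random((R1, X.shape[1])) # coordinates of target-domain items
    L = rng.random((R2, Y.shape[1])) # coordinates of auxiliary-domain items
    for _ in range(n_iter):
        A = np.hstack([W, U])        # full basis for X
        B = np.hstack([W, V])        # full basis for Y
        H *= (A.T @ X) / (A.T @ A @ H + eps)
        L *= (B.T @ Y) / (B.T @ B @ L + eps)
        Hw, Hu = H[:K], H[K:]        # coordinates on shared / target-only bases
        Lw, Lv = L[:K], L[K:]        # coordinates on shared / auxiliary-only bases
        # W receives gradient contributions from both reconstruction errors.
        W *= (X @ Hw.T + Y @ Lw.T) / (A @ H @ Hw.T + B @ L @ Lw.T + eps)
        A = np.hstack([W, U])
        U *= (X @ Hu.T) / (A @ H @ Hu.T + eps)
        B = np.hstack([W, V])
        V *= (Y @ Lv.T) / (B @ L @ Lv.T + eps)
    return W, U, V, H, L
```

Setting `K=0` removes the shared block entirely, and setting `K` equal to both `R1` and `R2` forces every basis vector to be shared, which corresponds to the two extremes of sharing.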
Retrieval using JSNMF
• Input: W, U, H, query sentence SQ, number of images (or videos) to be retrieved N, and image (or video) dataset
• Output: Return top N retrieved images (or videos)
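A possible reading of the retrieval step: encode the query as a bag-of-words vector, project it onto the target-domain basis [W U] under a nonnegativity constraint, and rank items by cosine similarity between the query coordinates and the columns of H. The projection method, the similarity measure, and the names `retrieve` and `vocab` are assumptions, not details taken from the slides.

```python
import numpy as np

def retrieve(W, U, H, query_terms, vocab, top_n, n_iter=200, eps=1e-9):
    """Return indices of the top_n columns of H most similar to the query (hypothetical sketch)."""
    A = np.hstack([W, U])                 # target-domain basis [W U]
    q = np.zeros(A.shape[0])
    for term in query_terms:              # bag-of-words encoding of the query
        if term in vocab:
            q[vocab[term]] = 1.0
    # Nonnegative projection of q onto A via multiplicative updates.
    h = np.full(A.shape[1], 1.0 / A.shape[1])
    AtA, Atq = A.T @ A, A.T @ q
    for _ in range(n_iter):
        h *= Atq / (AtA @ h + eps)
    # Cosine similarity between the query coordinates and each item's coordinates.
    sims = (H.T @ h) / (np.linalg.norm(H, axis=0) * np.linalg.norm(h) + eps)
    return np.argsort(-sims)[:top_n]
```

Here `vocab` maps each tag to its row index in the data matrix, so a call such as `retrieve(W, U, H, ["cloud"], vocab, top_n=10)` would return ten item indices in ranked order.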
Experiment
• Use LabelMe tags (auxiliary) to improve
  • Image retrieval in Flickr
  • Video retrieval in YouTube
• Why LabelMe?
  • Object image tagging
  • Controlled vocabulary
Flickr Dataset
• Downloaded 50,000 images from Flickr
• Average number of distinct tags = 8
• Removed
  • Rare tags (appearing fewer than 5 times)
  • Images with no tags and non‐English tags
• Obtained 20,000 labeled images
• 7,000 examples were kept to serve as an internal auxiliary dataset
YouTube Dataset
• Downloaded 18,000 videos’ metadata (tags, URL, category, title, comments, etc.)
• Average number of distinct tags = 7
• Removed
  • Rare tags (appearing fewer than 2 times)
  • Videos with no tags or non‐English tags
• Obtained dataset corresponding to 12,000 videos
• Again, kept 7,000 examples to be used as an internal auxiliary dataset
LabelMe Dataset
• Added 7,000 images with tags from LabelMe
• Average number of distinct tags = 32
• Removed
  • Rare tags (appearing fewer than 2 times)
• Cleanup does not reduce the size of the dataset
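The cleanup applied to the three datasets (dropping rare tags, then dropping items left without tags) can be sketched as below. This is an illustration under assumptions: the helper name `clean_tags` is hypothetical, tag frequency is counted once per item, and the non‐English filter is left out because the slides do not say how it was done.

```python
from collections import Counter

def clean_tags(items, min_count):
    """Drop tags used by fewer than min_count items, then drop items with no tags left.

    items: dict mapping an item id to its list of tags.
    """
    # Count each tag once per item (an assumption about how "appearing" is measured).
    counts = Counter(tag for tags in items.values() for tag in set(tags))
    cleaned = {}
    for item_id, tags in items.items():
        kept = [tag for tag in tags if counts[tag] >= min_count]
        if kept:  # items whose tags were all rare are removed
            cleaned[item_id] = kept
    return cleaned
```

Under this reading, `min_count=5` corresponds to the Flickr threshold and `min_count=2` to the YouTube and LabelMe thresholds.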
Evaluation Measures
• Defined query set Q
• {cloud, man, street, water, road, leg, table, plant, girl, drawer, lamp, bed, cable, bus, pole, laptop, plate, kitchen, river, pool, flower}
• Manually annotated the two datasets (Flickr and YouTube) with respect to the query set (no benchmark dataset available)
• A query term and an image (or video) are considered relevant if the concept is clearly visible in the image (or video)
Results with JSNMF
• Precision‐Scope Curve
• Fix recall at 0.1
  • Users are usually only interested in the first few results
• 10% improvement
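The two measures mentioned above can be computed from a ranked result list and a set of manually judged relevant items. The sketch below is one common formulation, and the function names are illustrative. Precision at a fixed recall is taken at the first rank where the required number of relevant items has been retrieved, which is an assumption about how the recall level was fixed.

```python
from math import ceil

def precision_at_scope(ranked, relevant, scope):
    """Fraction of the top `scope` results that are relevant."""
    hits = sum(1 for item in ranked[:scope] if item in relevant)
    return hits / scope

def precision_at_recall(ranked, relevant, recall_level):
    """Precision at the first rank where recall reaches recall_level."""
    needed = max(1, ceil(recall_level * len(relevant)))
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            if hits >= needed:
                return hits / rank
    return 0.0  # the requested recall level is never reached
```

Plotting `precision_at_scope` over increasing scope values gives a precision‐scope curve for one query. Averaging over the query set gives a single curve per method.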
Results with JSNMF
• Under‐representation
  • Shares very few basis vectors
• Over‐representation
  • Forces many basis vectors to represent both datasets
• Appropriate level of representation
Flickr Retrieval Results
• Results are better with LabelMe
• As recall increases, precision decreases
• When K=0 (no sharing) or K=40 (full sharing), precision is lower than with K=15
YouTube Retrieval Results
• Similar to Flickr Results
Extra Notes & Questions?
• Can be extended to multiple datasets (not just 2)
• Can use generic model to apply to other data mining tasks