Gunhee Kim Leonid Sigal Eric P. Xing
Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction
Gunhee Kim Leonid Sigal Eric P. Xing
June 16, 2014
Outline
• Problem Statement
• Algorithm
  Video summarization
  Storyline reconstruction
• Experiments
• Conclusion
Background
Online photo and video sharing has become very popular
Information overload problem in visual data
• An average of 3,000 pictures are uploaded per minute
• 100 hours of video are uploaded per minute
Is there an efficient and comprehensive way to summarize them?
Our Objective
Jointly summarize large sets of online images and videos
• The characteristics of the two media are complementary
Collections of images help video summarization
• Videos (a user video): much redundant and noisy information (e.g., backlit subjects, frames full of trivial background, overexposure)
• Images (a set of photo streams): more carefully taken from canonical viewpoints
Collections of videos help image summarization
• Images (a photo stream): sequential structure is often missing
• Videos (a set of user videos): motion pictures
Problem Statement
(Input) A set of photo streams and user videos for a topic of interest
(Output1) Video summary: keyframe-based summarization
(Output2) Image summary as a storyline graph
• Vertices: dominant image clusters
• Edges: chronological or causal relations (i.e., transitions that recur in many photo streams)
Flickr and YouTube Dataset
20 outdoor recreational classes
• Surfing Beach, Horse Riding, Rafting, Yacht, Air Ballooning, Rowing, Scuba Diving, Formula One, Snowboarding, Safari Park, Mountain Camping, Rock Climbing, Tour de France, London Marathon, Fly Fishing, Independence Day, Chinese New Year, Memorial Day, St. Patrick's Day, Wimbledon
• # videos: 15,912
• # images / photo streams: 2,769,504 / 35,545
Algorithm for Video Summarization
1. For each video, find the K-nearest photo streams
• Extreme diversity even with the same keywords
• Use the Naïve-Bayes Nearest Neighbor method
2. Build a similarity graph between video frames and images
• k-th order Markov chain between frames
• Each image casts m similarity votes
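Step 2 can be sketched as follows. This is only an illustration under stated assumptions: frames and images are represented by fixed-length descriptor vectors, similarity is cosine similarity clipped at zero, and the function name `build_similarity_graph` and the parameters `k` and `m` are hypothetical names chosen to mirror the slide, not the authors' implementation.

```python
import numpy as np

def build_similarity_graph(frames, images, k=2, m=3):
    """Return a symmetric weight matrix over (frames + images) nodes.

    frames: (n_frames, d) array of frame descriptors in temporal order.
    images: (n_images, d) array of image descriptors.
    Nodes 0..n_frames-1 are frames; the remaining nodes are images.
    """
    n_f, n_i = len(frames), len(images)
    feats = np.vstack([frames, images]).astype(float)
    # Cosine similarity, clipped to be non-negative (an assumed similarity measure).
    norms = np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-12)
    unit = feats / norms
    sim = np.clip(unit @ unit.T, 0.0, None)

    W = np.zeros((n_f + n_i, n_f + n_i))
    # k-th order Markov chain: link each frame to the k frames that follow it.
    for a in range(n_f):
        for b in range(a + 1, min(a + k, n_f - 1) + 1):
            W[a, b] = W[b, a] = sim[a, b]
    # Each image casts m votes: it is linked to its m most similar frames.
    for j in range(n_i):
        node = n_f + j
        top = np.argsort(-sim[node, :n_f])[:m]
        for a in top:
            W[node, a] = W[a, node] = sim[node, a]
    return W
```

In this sketch images never link to other images; they only contribute weight to the frames they vote for, which is one simple way to let carefully taken photos influence which frames become keyframes.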
3. Solve an optimization problem of diversity ranking
• Choose the nodes at which to place heat sources so as to maximize the temperature
• Sources should be (i) densely connected nodes and (ii) distant from one another
• The objective is submodular [Kim et al. ICCV 2011], so a simple greedy algorithm achieves a constant-factor approximation
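The greedy selection in step 3 can be illustrated with a stand-in objective. The heat-source (temperature) objective itself is not reproduced in this transcript, so the sketch below swaps in a facility-location coverage function, f(S) = Σ_i max_{j∈S} W[i, j], which is also monotone submodular and for which greedy selection has the classical (1 − 1/e) guarantee. The function name `greedy_select` is hypothetical, and `W` is assumed to be a non-negative similarity matrix.

```python
import numpy as np

def greedy_select(W, budget):
    """Greedily pick `budget` nodes maximizing f(S) = sum_i max_{j in S} W[i, j].

    W must be non-negative. This is a stand-in for the slide's objective,
    not the authors' heat-diffusion formulation.
    """
    n = W.shape[0]
    covered = np.zeros(n)               # best similarity of each node to the chosen set
    picked = np.zeros(n, dtype=bool)
    chosen = []
    for _ in range(min(budget, n)):
        # Marginal gain of adding each candidate node j to the chosen set.
        gains = np.maximum(W, covered[:, None]).sum(axis=0) - covered.sum()
        gains[picked] = -np.inf         # never pick the same node twice
        best = int(np.argmax(gains))
        chosen.append(best)
        picked[best] = True
        covered = np.maximum(covered, W[:, best])
    return chosen
```

With this objective the first pick lands in the densest cluster, and later picks are pushed toward nodes that are poorly covered so far, which mirrors the "densely connected" and "distant from one another" criteria.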
Definition of Storyline Graphs
A storyline graph
• Vertex set: the set of codewords (i.e., image clusters)
• There are too many images, and many of them are largely redundant
• Edge set: popular transitions recurring across many photo streams
Edges should be sparse and time-varying [Song et al. 09, Kolar et al. 10]
• Sparsity: only a small number of branching stories per node, i.e., only a few nonzero elements in the adjacency matrix
• Time-varying: popular transitions change over time
[Figure: timeline at t = 10AM, 12PM, and 2PM with example clusters 10, 25, and 44, and storyline graphs at 1PM and at 7PM]
Directed Tree Derived from Photo Stream
1. For each photo stream, find the K-nearest videos
• Use the Naïve-Bayes Nearest Neighbor method
2. Build a k-th order Markov chain between images in the photo stream
3. Detect keyframes in each neighbor video
4. Connect additional links based on one-to-one correspondences
5. Replace the vee structure (an impractical artifact) with two parallel edges
• A vee structure arises when two images are followed by the same image.
• ✗ It would imply that both preceding images must occur in order for the following image to appear.
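The K-nearest search in step 1 can be sketched in the spirit of Naïve-Bayes Nearest Neighbor (NBNN): each local descriptor of the query is matched to its nearest descriptor in every candidate, and candidates are ranked by the summed distances. The function name `nbnn_k_nearest`, the use of squared Euclidean distance, and the dense distance computation are assumptions made for illustration; the slides do not specify the descriptors or any indexing structure.

```python
import numpy as np

def nbnn_k_nearest(query_desc, candidate_descs, K=2):
    """Rank candidates by an NBNN-style distance to the query.

    query_desc: (n_q, d) local descriptors of the query (e.g., one photo stream).
    candidate_descs: list of (n_c, d) arrays, one per candidate (e.g., one video).
    Returns the indices of the K candidates with the smallest total distance.
    """
    totals = []
    for cand in candidate_descs:
        # Squared Euclidean distance between every query/candidate descriptor pair.
        diff = query_desc[:, None, :] - cand[None, :, :]
        d2 = (diff ** 2).sum(axis=2)
        # Sum, over query descriptors, of the distance to the nearest candidate descriptor.
        totals.append(d2.min(axis=1).sum())
    return [int(i) for i in np.argsort(totals)[:K]]
```

The same routine covers both directions of the search (videos to photo streams, and photo streams to videos) by swapping which side supplies the query descriptors.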
Inferring Photo Storyline Graphs (1/3)
Input: A set of photo streams
Output: A set of adjacency matrices $A^t$, one for each time point $t$
Objective: Derive the likelihood of an observed set of photo streams under reasonable assumptions
(A1) All photo streams are taken independently
• This gives the likelihood of a single photo stream
(A2) k-th order Markovian assumption between consecutive images in a photo stream (e.g., k = 1)
• This gives the transition model
(A3) The codewords of $x^l_i$ are conditionally independent of one another given $x^l_{i-1}$
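The equations on this slide did not survive extraction. As a hedged reconstruction, assumptions (A1) to (A3) with k = 1 would lead to a factorization of roughly the following form. The notation is assumed rather than taken from the slide: $\mathcal{P} = \{\mathcal{P}^1, \dots, \mathcal{P}^L\}$ is the set of photo streams, $x^l_i$ is the codeword vector of the i-th image in stream l, $N_l$ is the number of images in stream l, and D is the number of codewords.

```latex
\begin{align}
\log P(\mathcal{P}) &= \sum_{l=1}^{L} \log P(\mathcal{P}^l)
  && \text{(A1): independent photo streams} \\
P(\mathcal{P}^l) &= P(x^l_1) \prod_{i=2}^{N_l} P(x^l_i \mid x^l_{i-1})
  && \text{(A2): Markov chain with } k = 1 \\
P(x^l_i \mid x^l_{i-1}) &= \prod_{d=1}^{D} P(x^l_{i,d} \mid x^l_{i-1})
  && \text{(A3): conditionally independent codewords}
\end{align}
```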
Inferring Photo Storyline Graphs (2/3)
Objective: Derive the likelihood of an observed set of photo streams under reasonable assumptions
For the transition model, use a linear dynamic model with Gaussian noise
• 1st-order Markovian assumption
• k-th order Markovian assumption
A transition from x to y is very unlikely!
Inferring Photo Storyline Graphs (3/3)
Objective: Derive the likelihood of an observed set of photo streams under reasonable assumptions
For the transition model, use a linear dynamic model with Gaussian noise
• 1st-order Markovian assumption
• The transition model can be written per dimension, using the d-th row of the adjacency matrix
• The log-likelihood follows from the per-dimension transition model
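A simplified form of the per-dimension model is sketched below. It is an assumption-laden reconstruction, not the slide's exact equations: it fixes one time point, writes $A$ for the adjacency matrix and $A_d$ for its d-th row, uses a single noise variance $\sigma^2$, and omits any dependence on the time gap between consecutive images.

```latex
\begin{align}
x^l_{i,d} &= A_d \, x^l_{i-1} + \varepsilon_d,
  \qquad \varepsilon_d \sim \mathcal{N}(0, \sigma^2) \\
P(x^l_{i,d} \mid x^l_{i-1}) &= \mathcal{N}\!\left(x^l_{i,d};\; A_d \, x^l_{i-1},\; \sigma^2\right) \\
\log P(\mathcal{P}) &= \sum_{l=1}^{L} \sum_{i=2}^{N_l} \sum_{d=1}^{D}
  \left[ -\frac{\left(x^l_{i,d} - A_d \, x^l_{i-1}\right)^2}{2\sigma^2}
         - \frac{1}{2}\log\!\left(2\pi\sigma^2\right) \right] + \text{const}
\end{align}
```

Under these assumptions, maximizing the log-likelihood with respect to each row $A_d$ is a least-squares problem that can be solved independently for each d.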
Optimization (1/2)
• (A4) Graphs vary smoothly over time.
For each t, estimate $A^t$ by maximizing the log-likelihood
• Gaussian kernel weighting of the data (i.e., images) along the timeline
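The kernel weighting can be sketched as follows: when estimating the graph at time t, each observation is weighted by how close its timestamp is to t. The function name `gaussian_kernel_weights`, the `bandwidth` parameter, and the normalization to unit sum are assumptions for illustration; the slide does not give the exact kernel form.

```python
import numpy as np

def gaussian_kernel_weights(timestamps, t, bandwidth):
    """Weight each observation by the closeness of its timestamp to the query time t.

    timestamps: observation times (e.g., hours of the day as floats).
    bandwidth: kernel width; larger values give smoother changes over time.
    Returns weights that sum to 1.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    k = np.exp(-0.5 * ((timestamps - t) / bandwidth) ** 2)
    return k / k.sum()
```

Because neighboring time points share most of their heavily weighted observations, the estimated graphs change smoothly over time, which is the intent of (A4).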
Optimization (2/2)
In summary, graph inference iteratively solves a weighted L1-regularized least-squares problem
• The L1 penalty enforces sparsity
• Trivially parallelizable (for each d)
• Linear-time algorithm (e.g., coordinate descent)
• Important in our problem (i.e., handling millions of images)
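A weighted L1-regularized least-squares problem of this kind can be solved by coordinate descent, as in the sketch below. It is a generic weighted lasso solver under stated assumptions, not the authors' code: the objective is Σ_n w[n]·(y[n] − X[n]·a)² + lam·‖a‖₁, where the names `weighted_lasso_cd`, `X`, `y`, `w`, and `lam` are hypothetical. For row d of the graph, `y` would hold the d-th codeword of each image, `X` the codeword vectors of the preceding images, and `w` the kernel weights.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step of the L1 penalty)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def weighted_lasso_cd(X, y, w, lam, n_iters=200):
    """Minimize sum_n w[n] * (y[n] - X[n] @ a)**2 + lam * ||a||_1 by coordinate descent."""
    n, p = X.shape
    a = np.zeros(p)
    z = (w[:, None] * X ** 2).sum(axis=0)    # weighted squared norm of each column
    resid = y - X @ a                        # residual, kept up to date incrementally
    for _ in range(n_iters):
        for j in range(p):
            if z[j] == 0.0:
                continue
            resid += X[:, j] * a[j]          # remove coordinate j's contribution
            rho = np.sum(w * X[:, j] * resid)
            a[j] = soft_threshold(rho, lam / 2.0) / z[j]
            resid -= X[:, j] * a[j]          # add the updated contribution back
    return a
```

Each sweep touches every entry of `X` once, so the cost per sweep is linear in the size of the data, and the problems for different rows d share no state and can run in parallel.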
Evaluation of Video Summarization via AMT
Ground truth for video summarization via Amazon Mechanical Turk (AMT)
• (1) For each of 100 test videos, each algorithm selects K keyframes
• (2) At least five turkers are asked to choose GT keyframes
• (3) Compare the GT keyframes with the ones chosen by each algorithm
Methods compared
• (OursV): our method with videos only
• (OursIV): our method with videos and images
• (Unif): uniform sampling
• (Spect), (Kmeans): spectral clustering / k-means
• (RankT): keyframe extraction using the rank-tracing technique
Comparison of Video Summarization
[Figure: keyframes selected by AMT, (OursIV), (OursV), (Kmean), and (Unif) for air+ballooning and fly+fishing]
• (OursIV): gets help from the votes of more carefully taken images
• (OursV): suffers from the limitations of using low-level features only
• (Kmean): hard to know the best K
• (Unif): cannot correctly handle subshots of different lengths
Evaluation on Storyline Graphs via AMT
Main difficulty of quantitative evaluation
• No ground truth is available.
• For a human subject, the images are too many and the graphs are too big
Crowdsourcing-based evaluation via AMT
Example (fly+fishing): which is better?
1. Each algorithm creates a storyline per topic.
2. Sample 100 important images as test images.
3. Each algorithm predicts the most likely next image after the test image.
4. Run a pairwise preference test
• Given the test image, which of A and B is more likely to come next?
• Get responses from at least 3 turkers per test image
A crowd of human subjects evaluates only a basic unit (i.e., important edges of the storyline).
[Figure: a test image with two candidate next images, A and B, one predicted by our method and one by a baseline]
Quantitative Evaluation on Storyline Graphs via AMT
Results of pairwise preference tests
• The numbers indicate the percentage of responses judging our prediction more likely to occur next.
• The numbers should be higher than 50% to validate the superiority of our algorithm.
Methods compared
• (OursV): our method with videos only
• (OursIV): our method with videos and images
• NET: Network-based topic models [Kim et al. 2008]
• HMM: Hidden Markov Models
• Page: PageRank-based image retrieval (no structural info)
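The reported percentage is simple arithmetic over the collected responses. The helper below is a hypothetical illustration (the name `preference_rate` and the boolean encoding of responses are assumptions); it pools all responses for one baseline comparison.

```python
def preference_rate(responses):
    """Percentage of responses that preferred our prediction.

    responses: iterable of booleans, True when a turker judged our predicted
    image more likely to come next than the baseline's prediction.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("no responses collected")
    return 100.0 * sum(responses) / len(responses)
```

For example, three of four responses in favor of our prediction gives 75.0, which is above the 50% level expected from a coin flip between the two candidates.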
Qualitative Evaluation on Storyline Graphs
Given a pair of images in a novel photo stream, predict 10 images that are likely to occur between them using its storyline graph
• (HMM) retrieves reasonably good but highly redundant images, with no branching structure.
• (PageRank) retrieves high-quality images but with no sequential structure.
[Figure: rows of images for GT, Ours, (HMM), and (PageRank)]
[Figure: GT and Ours rows of images, shown with a downsized storyline graph]
Conclusion
Joint summarization of Flickr images and YouTube videos
• 2.7M Flickr images and 17K YouTube videos for 20 classes
• The characteristics of the two media are complementary
• Images: more carefully taken from canonical viewpoints
• Videos: motion pictures
Structural summary with branching narratives
Semantic summary even with simple feature similarity
Inference algorithm for sparse time-varying directed graphs
• Global optimality, linear complexity, and easy parallelization
Thank you!