Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText...
![Page 1: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/1.jpg)
Bridge Semantic Gap: A Large Scale
Concept Ontology for Multimedia
(LSCOM)
Guo-Jun Qi
Beckman Institute
University of Illinois at Urbana-Champaign
![Page 2: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/2.jpg)
LSCOM (Large Scale Concept
Ontology for Multimedia)
A broadcast news video dataset
200+ news videos/ 170 hours
61,901 shots
Language
◦ English/Arabic/Chinese
![Page 3: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/3.jpg)
Why a broadcast news ontology?
Critical mass of users, content providers,
applications
Good content availability (TRECVID, LDC, FBIS)
Shares a large set of core concepts with
other domains
![Page 4: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/4.jpg)
LSCOM Provides
Richly annotated video content to support access and analysis functions over massive amounts of video
A large-scale, useful, well-defined semantic lexicon
◦ More than 3000 concepts
◦ 374 annotated concepts
◦ Bridging semantic gap from low-level features to high-level concepts
![Page 5: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/5.jpg)
An LSCOM concept
000 - Parade
Concept ID: 000
Name: Parade
Definition: Multiple units of marchers,
devices, bands, banners or Music.
Labeled: Yes
![Page 6: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/6.jpg)
LSCOM Hierarchy
http://www.lscom.org/ontology/index.html
Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalance
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
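The dot-indented listing above can be read as a tree in which the number of leading dots gives a node's depth. The following is a minimal Python sketch of that reading; the function name `parse_hierarchy` and the `(name, depth, parent)` layout are illustrative assumptions and are not part of any LSCOM tooling. Only an excerpt of the listing is used.

```python
def parse_hierarchy(lines):
    """Parse dot-indented lines into (name, depth, parent) triples.

    Assumption: the count of leading dots is the depth of the node.
    """
    nodes = []
    stack = []  # stack[d] holds the most recent node name seen at depth d
    for line in lines:
        name = line.lstrip(".")
        depth = len(line) - len(name)
        del stack[depth:]  # discard nodes at the same or a deeper level
        parent = stack[-1] if stack else None
        nodes.append((name, depth, parent))
        stack.append(name)
    return nodes


# Excerpt of the listing shown on the slide.
listing = [
    "Thing",
    ".Individual",
    "..Dangerous_Thing",
    "...Dangerous_Situation",
    "....Emergency_Incident",
    ".....Disaster_Event",
    "......Natural_Disaster",
    "....Natural_Hazard",
    ".....Earthquake",
    "...Dangerous_Tangible_Thing",
    "....Cutting_Device",
]
tree = parse_hierarchy(listing)
parents = {name: parent for name, _, parent in tree}
```

Looking up `parents["Earthquake"]` then walks one step up the hierarchy; repeating the lookup until `None` yields the full path to `Thing`.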
![Page 7: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/7.jpg)
Definition: What is an ontology?
(Wikipedia) An ontology is a formal representation
of knowledge as a set of concepts
within a domain and the relationships
between those concepts. It is used to
reason about the properties of that
domain, and may be used to describe the
domain.
![Page 8: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/8.jpg)
Ontology
Represents the visual knowledge base in a
structured way
◦ Graph structure
◦ Tree (hierarchy) structure
Images/videos can be effectively learned
and retrieved by exploiting the coherence between
concepts
◦ Logical coherence
◦ Statistical coherence
![Page 9: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/9.jpg)
An Ontology Hierarchy: Military
Vehicle
![Page 10: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/10.jpg)
An example from Wikipedia
![Page 11: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/11.jpg)
Ontology Tree for LSCOM
![Page 12: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/12.jpg)
A Light Scale Concept Ontology for
Multimedia Understanding
(LSCOM-Lite)
The aim is to partition the semantic space using
a small set of concepts (39 concepts).
Selection Criteria
◦ Semantic Coverage
The light concept set should cover as many of the semantic concepts in news videos as possible.
◦ Compactness
These concepts should not overlap semantically.
◦ Modelability
These concepts can be modeled with a smaller
semantic gap.
![Page 13: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/13.jpg)
Selected concept dimensions
Divide the semantic space into a multi-dimensional space in which the dimensions are nearly orthogonal
◦ Program Category
◦ Setting/Scene/Site
◦ People
◦ Objects
◦ Activities
◦ Events
◦ Graphics
![Page 14: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/14.jpg)
Histogram of LSCOM-Lite
Concepts
![Page 15: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/15.jpg)
Some example keyframes
![Page 16: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/16.jpg)
Applications
Application I: Conceptual Fusion (most
basic – early fusion)
Application II: Cross-Category
Classification (inter-class relation)
Application III: Event Dynamics in Concept
Space
![Page 17: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/17.jpg)
Application I: Conceptual Fusion
[Diagram: Video, Visual Features, Classifier, and outputs Concept 1, Concept 2, Concept 3, …, Concept n]
![Page 18: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/18.jpg)
LSCOM 374 Models
374 LIBSVM models
◦ http://www.ee.columbia.edu/ln/dvmm/columbia374/
◦ Features used (MPEG-7 descriptors)
Color Moments
Edge Histogram
Wavelet Texture
◦ LIBSVM: a library for support vector machines, available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
![Page 19: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/19.jpg)
Application II: cross-category
classification with concept transfer
G.-J. Qi et al., Towards Cross-Category
Knowledge Propagation for Learning
Visual Concepts, in CVPR 2011
![Page 20: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/20.jpg)
Instance-Level Concept Correlation
[Figure: +1 / -1 labels for the concepts "Mountain" and "Castle", and for "Mountain and castle"]
![Page 21: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/21.jpg)
Transfer Function
Mountain, Castle
Mountain
Castle
None of them
![Page 22: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/22.jpg)
Model Concept Relations
![Page 23: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/23.jpg)
Automatically construct ontology in
a data-driven manner
![Page 24: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/24.jpg)
Application III: Event Dynamics
in Concept Space
![Page 25: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/25.jpg)
Event Detection with Concept
Dynamics
W. Jiang et al., Semantic event detection
based on visual concept prediction, ICME,
Germany, 2008.
![Page 26: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/26.jpg)
Open Problems
Cross-Dataset Gap
◦ Generalize from the LSCOM dataset to other datasets (e.g.,
non-news video datasets)
Cross-Domain Gap
◦ Can the text scripts associated with news videos
help information extraction for visual concepts?
Automatic ontology construction
◦ Task dependent vs. task independent
◦ Data driven vs. prior knowledge (e.g., WordNet)
◦ Incorporate prior human knowledge (logical relations,
etc.)
![Page 27: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/27.jpg)
TRECVID Competition
Task I: High-Level Feature Extraction
◦ Input: subshot
◦ Output: detection results for 39 LSCOM-Lite
concepts in the subshot
![Page 28: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/28.jpg)
High-Level Feature Extraction
Each concept is assumed to be binary
(absent or present) in each subshot
Submission: Find subshots that contain a
certain concept, rank them by the
detection confidence score, and submit
the top 2000.
Evaluation: NIST evaluated 20 medium-frequency
concepts out of the 39 concepts, using a
50% random sample of the submission pools
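The submission rule above (rank subshots by detection confidence and keep the top 2000) can be sketched in a few lines of Python. The function name `build_submission`, the dictionary input, and the subshot IDs are hypothetical; they are not taken from the TRECVID submission format.

```python
def build_submission(scores, max_results=2000):
    """Return subshot IDs sorted by descending confidence, truncated to max_results.

    scores: dict mapping a subshot ID to the detector's confidence for one concept.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [shot_id for shot_id, _ in ranked[:max_results]]


# Toy example with made-up subshot IDs and scores.
scores = {"shot_a": 0.12, "shot_b": 0.93, "shot_c": 0.55}
submission = build_submission(scores, max_results=2)
```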
![Page 29: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/29.jpg)
20 Evaluated Concepts
![Page 30: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/30.jpg)
Evaluation Metric: Average Precision
Relevant subshots should be ranked
higher than the irrelevant ones.
R is the total number of relevant images,
R_j is the number of relevant images among the top
j images, I_j is 1 if the j-th image is
relevant and 0 otherwise, and N is the number of ranked images.
$$\text{Average Precision} = \frac{1}{R} \sum_{j=1}^{N} \frac{R_j}{j} \, I_j$$
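As a worked illustration of the average precision defined on this slide, here is a minimal Python sketch. It assumes the usual convention that the indicator is 1 for a relevant item and 0 otherwise, and that R defaults to the number of relevant items in the ranked list; the function name and arguments are illustrative.

```python
def average_precision(relevance, total_relevant=None):
    """Average precision of a ranked list.

    relevance: list of 0/1 flags in ranked order (1 = relevant); this is I_j.
    total_relevant: R; defaults to the number of relevant items in the list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0      # running count R_j of relevant items in the top j
    total = 0.0
    for j, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / j   # precision at rank j, counted only at relevant ranks
    return total / total_relevant


ap = average_precision([1, 0, 1, 0])  # (1/1 + 2/3) / 2
```

For the list `[1, 0, 1, 0]` the relevant items sit at ranks 1 and 3, so the sum is 1/1 + 2/3 and the result is that sum divided by R = 2.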
![Page 31: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/31.jpg)
Results
![Page 32: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/32.jpg)
TRECVID Competition
Task II: Video Search
◦ Input: 24 text-based topics
◦ Output: relevant subshots in the database
![Page 33: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/33.jpg)
Topics to search
![Page 34: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/34.jpg)
Topics to search (cont’d)
![Page 35: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/35.jpg)
Topics to search
![Page 36: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/36.jpg)
Three Types of Search Systems
![Page 37: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/37.jpg)
Results: Automatic Runs
![Page 38: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/38.jpg)
Results: Manual Runs
![Page 39: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/39.jpg)
Results: Interactive Runs
![Page 40: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/40.jpg)
Machine Problem 7: Shot Boundary
Detection in Videos
![Page 41: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/41.jpg)
Goals
Detect the abrupt content changes
between consecutive frames.
◦ Scene changes
◦ Scene cuts
![Page 42: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/42.jpg)
Steps
Step 1: Measure the change in content
between video frames
◦ Visual/Acoustic measurements
Step 2: Compare the content distance
between successive frames. If the
distance is larger than a certain threshold,
then a shot boundary may exist.
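The two steps above can be sketched as follows. The slides do not specify the distance measure, so the L1 distance used here is an assumption, as are the function name and the toy feature vectors.

```python
def detect_shot_boundaries(features, threshold):
    """Return frame indices i where the distance between frame i-1 and frame i
    exceeds the threshold.

    features: list of per-frame feature vectors (lists of floats).
    The L1 distance is an assumed choice; the slides leave the measure open.
    """
    boundaries = []
    for i in range(1, len(features)):
        prev, curr = features[i - 1], features[i]
        distance = sum(abs(a - b) for a, b in zip(prev, curr))
        if distance > threshold:
            boundaries.append(i)
    return boundaries


# Toy example: the content changes abruptly between frame 1 and frame 2.
frames = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
cuts = detect_shot_boundaries(frames, threshold=0.5)
```

A lower threshold reports more candidate boundaries and a higher one fewer, which is the trade-off the report is asked to explain.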
![Page 43: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/43.jpg)
Measuring Content based on Visual
Information
256-dimensional Color Histogram
◦ In RGB space, normalize r, g, b to [0, 1]
◦ Color space: 8×8 histogram over the normalized red (nr) and normalized green (ng) axes
![Page 44: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/44.jpg)
Color Histograms
Divide each image into four parts; each
part has an 8×8 histogram (64 bins), giving 4 × 64 = 256 feature
dimensions in total.
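A minimal Python sketch of this 256-dimensional feature follows. The slides only say that r, g, b are normalized to [0, 1]; the chromaticity normalization nr = r/(r+g+b), ng = g/(r+g+b) used here is an assumption, as are the function names and the nested-list image layout.

```python
def rg_histogram(pixels, bins=8):
    """bins x bins histogram over normalized (nr, ng), flattened, summing to 1.

    Assumption: nr = r / (r + g + b) and ng = g / (r + g + b).
    """
    hist = [0.0] * (bins * bins)
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            nr = ng = 0.0  # black pixel: placed in the first bin (arbitrary choice)
        else:
            nr, ng = r / total, g / total
        i = min(int(nr * bins), bins - 1)  # clamp nr == 1.0 into the last bin
        j = min(int(ng * bins), bins - 1)
        hist[i * bins + j] += 1.0
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def frame_feature(image, bins=8):
    """image: list of rows, each row a list of (r, g, b) tuples.

    Splits the image into four quadrants and concatenates their histograms,
    giving 4 * bins * bins values (256 for bins=8).
    """
    h, w = len(image), len(image[0])
    feature = []
    for rows in (range(0, h // 2), range(h // 2, h)):
        for cols in (range(0, w // 2), range(w // 2, w)):
            block = [image[y][x] for y in rows for x in cols]
            feature.extend(rg_histogram(block, bins))
    return feature


# Toy 4x4 image: top half pure red, bottom half pure green.
image = [[(255, 0, 0)] * 4 for _ in range(2)] + [[(0, 255, 0)] * 4 for _ in range(2)]
feature = frame_feature(image)
```

Normalizing each quadrant's histogram to sum to 1 keeps the feature independent of frame size, which makes distances between frames comparable.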
![Page 45: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/45.jpg)
Acoustic Features
12 cepstral coefficients
Energy (sum of squares of the raw signal samples)
Zero-crossing rate (ZCR)
ZCR = sum(|sign(S(2:N))-sign(S(1:N-1))|)
Hint: normalize the energy so that it does not
dominate the distances computed
between successive frames
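The ZCR expression and the energy definition above translate directly to Python. This is a small sketch with illustrative function names; cepstral coefficients are not covered here. As written on the slide, each sign change between +1 and -1 contributes 2 to the sum.

```python
def sign(x):
    """Return -1, 0, or 1, matching the sign() used in the slide's formula."""
    return (x > 0) - (x < 0)


def zero_crossing_rate(samples):
    """sum(|sign(S(2:N)) - sign(S(1:N-1))|) in 0-based Python indexing."""
    return sum(abs(sign(b) - sign(a)) for a, b in zip(samples, samples[1:]))


def energy(samples):
    """Sum of squares of the raw signal samples."""
    return sum(s * s for s in samples)


# Toy audio frame with three sign changes.
signal = [1.0, -1.0, 1.0, 1.0, -2.0]
zcr = zero_crossing_rate(signal)   # 2 + 2 + 0 + 2
e = energy(signal)                 # 1 + 1 + 1 + 1 + 4
```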
![Page 46: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/46.jpg)
Datasets
Two videos, each a little over one minute long
Manually label the shot boundary
![Page 47: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/47.jpg)
What to submit
Source code
Report
◦ Compare the shot boundary detection results
returned by your algorithm with the manually
labeled boundaries
◦ Compare
◦ Explain your choice of threshold
◦ Explain the differences between the acoustic-based
and visual-based detection results
![Page 49: Guo-Jun Qi Beckman Institute University of Illinois at …ece417/LectureNotes/ECE417-LSCOM.pdfText script associated with news videos Can help information extraction for visual concepts?](https://reader033.fdocuments.us/reader033/viewer/2022050101/5f3ff0c7b606b734c31f5f0d/html5/thumbnails/49.jpg)
Thanks! Q&A