Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for...
![Page 1: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/1.jpg)
Copyright © 2004
Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues
Alex Hauptmann, Howard Wactlar
Carnegie Mellon University
Pittsburgh, USA
October 2004
![Page 2: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/2.jpg)
Our Meaning of Contexture
Definition: The weaving or assembly of [multimedia] parts into a cohesive whole in order to provide a more complete picture or information structure [to both questions and answers]
• Interpreting and communicating an associated visual and verbal context to information
  • May contain language, imagery and gestures
  • May illuminate the meaning or significance
  • May explain its circumstances
  • More like a collegial expert response than an encyclopedic source
• Accelerating discovery by both system and analyst
• Understanding video perspectives
  • Subtle opinions, attitudes, biases
  • Both visual and textual rhetoric
• Continuously updating video biographs and event timelines
![Page 3: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/3.jpg)
Conceptual Overview

[Figure: system overview diagram. Inputs: Multiple Multimedia Information Sources (Domestic Sources, Foreign Sources) and Structured Data. User: Analyst, with the Analyst's Profile & QA History.]

Extract Semantic Data
- Scene classification
- Event detection
- Title/topic labeling
- Named entity extraction
- Verify entities with structured data

Context Analysis
- Semantic relations on entities
- Harden with structured data
- Perspective interpretation
- Video ontology

Generate Information Contexture
- Understand questions
- Provide context-rich answers
- Produce video biographs
- Enable context-based iterative QA process

Outputs: Biograph data, Visualizations, Synthesized video clips, Contextual info

Example biograph timeline for Wesley Clark (captions are truncated in the source):
Dates shown: 9/11/2001, 3/18/1998, 7/2/1999, 9/19/2001, 3/19/2003, 9/17/2003
Entry types: People Association, Interviewee, Monologue, Word Quotation, Visual Quotation
- A word quotation of Retired Wesley Clark describing for the president in the
- Being a CNN Military Analyst on military deployment in preparation for the wa
- Being an interviewee in a news program as a presidential candidate for
- Wesley Clark (without being mentioned in the video clip) sitting next to Madeleine hearing on a resolution that would direct President Clinton to
- A visual quotation of Wesley Clark (referred commander) describing President Milosevic’s intent in.
- Being a CNN Military Analyst on action in the Iraq War.
![Page 4: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/4.jpg)
Scope of Work
• Extracting information from video sources for finding answers
• Applying broadcast TV news ontology (joint with USC)
• Understanding multimedia questions and their context
• Understanding the bias of source, topical, or rhetorical perspectives
• Integration and evaluation
• Incorporating video biographs and perspectives into answer contextures
• Contexture dialogue
• Learning from the analyst
![Page 7: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/7.jpg)
Understanding Multimedia Questions
1. Find shots of Pope John Paul II.
2. Find shots of a rocket or missile taking off.
3. Find shots of the Tomb of the Unknown Soldier at Arlington National Cemetery.
4. Find shots of the front of the White House in the day-time with the fountain running.
![Page 8: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/8.jpg)
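Automatic Video Retrieval System

[Figure: retrieval pipeline. A Multi-modal Query (example: "Pope John Paul II") is matched against the Video Library by Multiple Modality Video Analysis Experts. A Weighted Fusion of Similarity Rankings produces the Final Ranked List of Video Shots.]

Multiple Modality Video Analysis Experts:
• Speech Trans.
• Video OCR
• Color Feature
• Texture Feature
• Audio Feature
• Semantic Class Filter
• …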
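The weighted fusion step named on this slide can be sketched as a weighted sum of per-expert similarity scores. This is a minimal illustration and not the Informedia implementation: the function name `fuse_rankings`, the expert keys, the scores, and the weights are all hypothetical, and scores are assumed to be normalized to [0, 1] already.

```python
from typing import Dict, List, Tuple


def fuse_rankings(
    expert_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Combine per-expert shot scores into one ranked list via a weighted sum."""
    fused: Dict[str, float] = {}
    for expert, scores in expert_scores.items():
        w = weights.get(expert, 0.0)  # experts without a weight contribute nothing
        for shot_id, score in scores.items():
            fused[shot_id] = fused.get(shot_id, 0.0) + w * score
    # Highest fused score first; ties are broken by shot id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three of the experts named on the slide.
expert_scores = {
    "speech_transcript": {"shot_1": 0.9, "shot_2": 0.2, "shot_3": 0.4},
    "video_ocr": {"shot_1": 0.1, "shot_2": 0.8},
    "color_feature": {"shot_2": 0.5, "shot_3": 0.5},
}
# Hypothetical combination weights (assumed to be given).
weights = {"speech_transcript": 0.6, "video_ocr": 0.3, "color_feature": 0.1}

ranked = fuse_rankings(expert_scores, weights)
```

A shot that an expert does not score simply receives no contribution from that expert, which is one possible convention among several.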
![Page 9: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/9.jpg)
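Finding the combination weights

[Figure: two-phase diagram.]
• Offline: Training Queries and Training Data from the Video Library are used to Learn Weights
• Online: a Query against the Video Library yields Similarity rankings from multiple experts, combined with the learned weights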
![Page 10: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/10.jpg)
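Finding the combination weights

[Figure: two-phase diagram, now with a Classify Queries step in each phase.]
• Offline: Classify Queries is applied to the Training Queries, and Training Data from the Video Library is used to Learn Weights
• Online: Classify Queries is applied to the Query, and the Similarity rankings from multiple experts are combined with the learned weights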
![Page 11: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/11.jpg)
Query Types for Video Retrieval
• Named person queries, possibly with constraints
  “Find shots of Yasser Arafat”, “Find shots of Ronald Reagan speaking”.
• Named object queries for an object with a unique name.
  “Find shots of the Statue of Liberty”, “Find shots of the Mercedes logo”.
• General object queries for a type of object. They may be qualified.
  “Find shots of snow-covered mountains”, “Find shots of one or more cats”.
• Scene queries for multiple types of objects in certain relationships.
  “Find shots of roads with lots of vehicles”, “Find shots of people spending leisure time on the beach”.
![Page 12: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/12.jpg)
Finding the Combination Weights for Merging Search Results
• Uniform, fixed weights for all queries
• Individual weightings for each query
  • Not enough known about each query
• Weightings for each of 4 query types
Text search usually does better and is more consistent than any other single search modality
![Page 13: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/13.jpg)
Query Classification

[Figure: decision tree applied to Query X.]
1. Named-Entity extraction
   • people name: Person Q, e.g. “Find shots of Bill Clinton”
   • organization or location name: Specific Object Q, e.g. “Find shots of Capitol Hill”
   • no proper name: go to step 2
2. POS tagging + NP chunking
   • multiple NPs: Scene Q, e.g. “Find shots with (multiple pedestrians) and (multiple vehicles in motion)”
   • single NP: go to step 3
3. Syntactic parsing
   • nested NPs: Scene Q, e.g. “Find shots of (a person diving into the water)”
   • no nested NP: General Object Q, e.g. “Find shots of (one or more cats)”
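The classification tree on this slide can be sketched as a short chain of checks. The sketch assumes the three NLP steps (named-entity extraction, POS tagging with NP chunking, syntactic parsing) have already run upstream. The `QueryAnalysis` container, its field names, and the returned labels are hypothetical, and checking for a person name before an organization or location name is an assumption about branch order.

```python
from dataclasses import dataclass


@dataclass
class QueryAnalysis:
    """Hypothetical container for upstream NLP results (not computed here)."""
    has_person_name: bool           # from named-entity extraction
    has_org_or_location_name: bool  # from named-entity extraction
    num_noun_phrases: int           # from POS tagging + NP chunking
    has_nested_np: bool             # from syntactic parsing


def classify_query(analysis: QueryAnalysis) -> str:
    """Walk the decision tree and return one of four query types."""
    if analysis.has_person_name:
        return "person"
    if analysis.has_org_or_location_name:
        return "specific_object"
    # No proper name: decide from the noun-phrase structure.
    if analysis.num_noun_phrases > 1:
        return "scene"
    return "scene" if analysis.has_nested_np else "general_object"


# "Find shots of Bill Clinton"
bill_clinton = classify_query(QueryAnalysis(True, False, 1, False))
# "Find shots of Capitol Hill"
capitol_hill = classify_query(QueryAnalysis(False, True, 1, False))
# "Find shots of (a person diving into the water)"
diving = classify_query(QueryAnalysis(False, False, 1, True))
# "Find shots of (one or more cats)"
cats = classify_query(QueryAnalysis(False, False, 1, False))
```

Because each query leaves the tree at exactly one leaf, a sketch like this assigns a single type per query.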
![Page 14: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/14.jpg)
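Hierarchical Mixture of Experts

[Figure: diagram labels: Query, Video Shots, Query Type, Text Retrieval, Retrieval Expert 1 … Retrieval Expert n, with lower-level weights $\lambda^{l}$ and upper-level weights $\lambda^{u}$.]

Lower level:

$$f\left(\sum_{k=1}^{n} \lambda_k^{l}\, P(R_k^{l} \mid q, s)\right)$$

Upper level:

$$\sum_{k=1}^{2} \lambda_k^{u}\, P(R_k^{u} \mid q, s)$$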
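A two-level combination of this kind can be sketched as follows. The sketch rests on several assumptions: that the lower level applies a function f to a weighted sum of the n retrieval experts' outputs, that the upper level is a weighted sum of text retrieval and that lower-level output, that f is a logistic sigmoid, and that weights are looked up by query type. All names and numbers below are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, used here as an assumed choice for f."""
    return 1.0 / (1.0 + math.exp(-x))


def hierarchical_score(
    text_score: float,
    expert_scores: List[float],
    lower_weights: List[float],
    upper_weights: List[float],
) -> float:
    """Fuse the n experts at the lower level, then fuse with text retrieval."""
    if len(expert_scores) != len(lower_weights):
        raise ValueError("one lower-level weight per retrieval expert is required")
    if len(upper_weights) != 2:
        raise ValueError("the upper level combines exactly two inputs")
    lower = sigmoid(sum(w * p for w, p in zip(lower_weights, expert_scores)))
    return upper_weights[0] * text_score + upper_weights[1] * lower


# Hypothetical weights keyed by query type.
WEIGHTS_BY_QUERY_TYPE = {
    "person": {"lower": [1.0, -1.0], "upper": [0.8, 0.2]},
    "scene": {"lower": [0.5, 0.5], "upper": [0.4, 0.6]},
}

params = WEIGHTS_BY_QUERY_TYPE["person"]
score = hierarchical_score(0.5, [0.3, 0.3], params["lower"], params["upper"])
```

Keeping text retrieval on its own branch lets its weight vary by query type independently of how the other experts are mixed.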
![Page 15: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/15.jpg)
Performance of different weighting schemes

[Bar chart: MAP, PREC@10, and PREC@30 (y-axis from 0.1 to 0.35) for four schemes: Text Only, Q-Uniform Oracle, Q-Type Single MoE, Q-Type Hier MoE.]
![Page 16: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/16.jpg)
Performance of different weighting schemes

[Bar chart: MAP, PREC@10, and PREC@30 (y-axis from 0.1 to 0.4) for five schemes: Text Only, Q-Uniform Oracle, Q-Type Single MoE, Q-Type Hier MoE, Q-Specific Oracle.]
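For reference, the three measures on the chart axes (MAP, PREC@10, PREC@30) can be computed from a ranked list of relevance judgments as sketched below. These are the standard definitions and not the evaluation code used for the slides, and the example judgments are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked shots that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """Mean of the precision values at each rank where a relevant shot appears."""
    if num_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant


def mean_average_precision(runs: List[Tuple[Sequence[bool], int]]) -> float:
    """Average the per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)


# Hypothetical judgments for two queries: (ranked relevance flags, total relevant).
query_a = ([True, False, True, False], 2)
query_b = ([False, True], 1)

map_score = mean_average_precision([query_a, query_b])
p_at_2 = precision_at_k(query_a[0], 2)
```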
![Page 17: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/17.jpg)
Current Limitations
• Unable to assign multiple query types to one query
  • “Finding Bill Clinton speaking in front of a US flag” (person, object)
• Unable to capture the query-specific aspects
  • “Finding day-time scenes of the Federal Reserve Building, Washington DC”
![Page 18: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/18.jpg)
Scope of Work
• Extracting information from video sources for finding answers
• Applying broadcast TV news ontology (joint with USC)
• Understanding multimedia questions and their context
• Understanding the bias of source, topical, or rhetorical perspectives
• Integration and evaluation
• Incorporating video biographs and perspectives into answer contextures
• Learning from the analyst
• Contexture dialogue
![Page 19: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/19.jpg)
Labeling Every Face with a News Structure Model (NSM)
Sources of information:
• Audio transcripts + Named Entity extraction
• Overlaid text
• Speaker audio characteristics
• Temporal position of name w.r.t. video segment
• Temporal structure of news (“Grammar”)
• Constraints based on image similarity
• Constraints from speaker audio similarity
![Page 20: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/20.jpg)
Baseline Algorithm

[Flowchart, applied to each shot s:]
1. Transcript clues exist for anchor OR s is first shot in story?
   • Y: anchor name
   • N: go to step 2
2. Transcript clues exist for reporter?
   • Y: reporter name(s) by distance
   • N: news-subject name(s) by distance
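The baseline flowchart can be sketched as a two-step decision. The sketch assumes the flowchart reads as: anchor clue or first shot in story yields the anchor name; otherwise a reporter clue yields reporter names ordered by distance; otherwise news-subject names ordered by distance. It also assumes "distance" means a gap between the name's position in the transcript and the shot, with the unit left unspecified. The function names, parameters, and example names are hypothetical.

```python
from typing import List, Tuple

# (name, distance) pairs; the meaning and unit of distance are assumptions.
Candidate = Tuple[str, float]


def rank_by_distance(candidates: List[Candidate]) -> List[str]:
    """Return candidate names ordered from nearest to farthest."""
    return [name for name, _ in sorted(candidates, key=lambda c: c[1])]


def baseline_label(
    anchor_clue: bool,
    first_shot_in_story: bool,
    reporter_clue: bool,
    anchor_name: str,
    reporter_candidates: List[Candidate],
    subject_candidates: List[Candidate],
) -> List[str]:
    """Follow the flowchart for one shot and return ranked name hypotheses."""
    if anchor_clue or first_shot_in_story:
        return [anchor_name]
    if reporter_clue:
        return rank_by_distance(reporter_candidates)
    return rank_by_distance(subject_candidates)


# Hypothetical shot: no anchor clue, not the first shot, no reporter clue.
labels = baseline_label(
    anchor_clue=False,
    first_shot_in_story=False,
    reporter_clue=False,
    anchor_name="Anchor A",
    reporter_candidates=[("Reporter B", 4.0)],
    subject_candidates=[("Subject C", 7.5), ("Subject D", 2.0)],
)
```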
![Page 21: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/21.jpg)
Overlaid Text with Video OCR
Overlaid text: Rep. NEWT GINGRICH
VOCR text: rgp nev~j ginuhicij
Edit distance to names:
• Bill Clinton (0.67)
• Newt Gingrich (0.46)
• David Ensor (0.72)
• Saddam Hussein (0.78)
• Elizabeth Vargas (0.88)
• Bill Richardson (0.80)
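Matching noisy VOCR output against a list of known names can be sketched with a normalized edit distance. The slide does not say how its scores are normalized, so this sketch assumes plain Levenshtein distance on lowercased strings divided by the longer string's length. Its values will therefore not necessarily match the numbers on the slide, although the closest name comes out the same.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def rank_names(vocr_text: str, names: List[str]) -> List[Tuple[str, float]]:
    """Order candidate names from closest to farthest from the VOCR text."""
    scored = [(name, normalized_distance(vocr_text, name)) for name in names]
    return sorted(scored, key=lambda pair: pair[1])


vocr_text = "rgp nev~j ginuhicij"
names = [
    "Bill Clinton",
    "Newt Gingrich",
    "David Ensor",
    "Saddam Hussein",
    "Elizabeth Vargas",
    "Bill Richardson",
]
best_name, best_distance = rank_names(vocr_text, names)[0]
```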
![Page 22: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/22.jpg)
Detection of Anchors, Reporters and News-Subjects

[Figure: example shots, each labeled anchor, reporter, or news-subject.]
![Page 23: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/23.jpg)
Image and Audio Similarity Constraints

[Figure: shots 1 to 11 in sequence, aligned with Speaker ID segments. Span labels shown: 1, 3, 2 (first row, shots 1 to 6) and 2, 3, 3 (second row, shots 7 to 11). Links between shots mark constraints from image similarity and constraints from speaker IDs.]
![Page 24: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/24.jpg)
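Naming Accuracy of Different Approaches

MAP for each role (count in parentheses):

| Rank | Method | Anchor (187) | Reporter (64) | News-Subject (125) | Overall (376) |
|---|---|---|---|---|---|
| Top-1 | Baseline | 0.834 | 0.359 | 0.256 | 0.561 |
| Top-1 | NSM | 0.957 | 0.734 | 0.512 | 0.771 |
| Top-3 | Baseline | 0.877 | 0.515 | 0.560 | 0.710 |
| Top-3 | NSM | 0.983 | 0.922 | 0.752 | 0.896 |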
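The Overall figures above appear to be count-weighted means of the three per-role figures. The slide does not state this; it is an observation, and the short check below reproduces it to within rounding. The variable and function names are illustrative only.

```python
# Number of faces per role, as given in the column headers.
counts = {"anchor": 187, "reporter": 64, "news_subject": 125}

# (per-role scores, reported overall score) for each row of the results.
rows = {
    ("top1", "baseline"): ({"anchor": 0.834, "reporter": 0.359, "news_subject": 0.256}, 0.561),
    ("top1", "nsm"): ({"anchor": 0.957, "reporter": 0.734, "news_subject": 0.512}, 0.771),
    ("top3", "baseline"): ({"anchor": 0.877, "reporter": 0.515, "news_subject": 0.560}, 0.710),
    ("top3", "nsm"): ({"anchor": 0.983, "reporter": 0.922, "news_subject": 0.752}, 0.896),
}


def weighted_overall(scores: dict, role_counts: dict) -> float:
    """Mean of the per-role scores, weighted by the number of faces per role."""
    total = sum(role_counts.values())
    return sum(scores[role] * role_counts[role] for role in role_counts) / total


# Difference between the recomputed and the reported overall score per row.
differences = {
    key: abs(weighted_overall(scores, counts) - reported)
    for key, (scores, reported) in rows.items()
}
```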
![Page 25: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/25.jpg)
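Visual Gender Classification

[Figure: processing pipeline: Face detection → Original scale → Feature extraction (Haar wavelets) → Boosting classifiers → Output (male or female). Example outputs shown: male, male, female, female, male, each marked Correct or Error.]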
![Page 26: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/26.jpg)
Interface Showing the People Labeled with Names
![Page 27: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/27.jpg)
Scope of Work
• Understanding the bias of source, topical, or rhetorical perspectives
• Contexture dialogue
• Extracting information from video sources for finding answers
• Applying broadcast TV news ontology (joint with USC)
• Understanding multimedia questions and their context
• Integration and evaluation
• Incorporating video biographs and perspectives into answer contextures
• Learning from the analyst
![Page 28: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/28.jpg)
Finding Stories with Different Perspectives
![Page 29: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/29.jpg)
Show length and shot type
22 sec
2 min 24 sec
12 sec
![Page 30: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/30.jpg)
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq.
Perspective of broadcaster can be seen in text overlay
![Page 31: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/31.jpg)
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq.
FOX uses a faster cut rate and has more participation by the anchor
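Cut rate and anchor participation can both be computed from a list of shots with durations and shot-type labels. The sketch below assumes cut rate means shots per minute and anchor participation means the fraction of story time spent in anchor shots; the slides do not define either measure. The two stories are hypothetical and do not represent the FOX or CNN footage.

```python
from typing import List, Tuple

Shot = Tuple[float, str]  # (duration in seconds, shot-type label)


def cut_rate_per_minute(shots: List[Shot]) -> float:
    """Number of shots per minute of story time (an assumed definition)."""
    total = sum(duration for duration, _ in shots)
    return 60.0 * len(shots) / total if total else 0.0


def anchor_share(shots: List[Shot]) -> float:
    """Fraction of story time spent in shots labeled 'anchor'."""
    total = sum(duration for duration, _ in shots)
    anchor = sum(duration for duration, label in shots if label == "anchor")
    return anchor / total if total else 0.0


# Two hypothetical stories with made-up shot lists.
story_a = [(10.0, "anchor"), (5.0, "field"), (5.0, "anchor"), (10.0, "field")]
story_b = [(20.0, "anchor"), (40.0, "field")]

rate_a, rate_b = cut_rate_per_minute(story_a), cut_rate_per_minute(story_b)
share_a, share_b = anchor_share(story_a), anchor_share(story_b)
```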
![Page 32: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/32.jpg)
Scope of Work
• Integration and evaluation
• Contexture dialogue
• Extracting information from video sources for finding answers
• Applying broadcast TV news ontology (joint with USC)
• Understanding multimedia questions and their context
• Understanding the bias of source, topical, or rhetorical perspectives
• Incorporating video biographs and perspectives into answer contextures
• Learning from the analyst
![Page 33: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/33.jpg)
Metrics-based Evaluations
NIST TRECVID 2004 Video Search Evaluation
• Submitted classification results for 10 different semantic features
  • Similar to a “Routing Task” for video clips
• Submitted Informedia system video search answers for
  • Interactive runs comparing expert/novice users
  • Interactive runs using either complete or only visual information
  • Automatic/Manual runs contrasting components of the system
Results to be announced later in October ...
![Page 34: Copyright © 2004 Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues Alex Hauptmann, Howard.](https://reader030.fdocuments.us/reader030/viewer/2022032706/56649de45503460f94adad47/html5/thumbnails/34.jpg)
Thank you
Carnegie Mellon University
Pittsburgh, PA
USA