Story Compression: Aggregating News Feeds

Joseph W. Barker

Advisor: James W. Davis

Ohio State University

What is Story Compression?
• News broadcasts from multiple sources tend to cover the same stories
• Stories have content overlap
  – General content: covered by multiple sources
  – Specific content: covered by one source
• Information gathering
  – Waste time if viewing all broadcasts (general content → redundancy)
  – Miss information if viewing only one broadcast (specific content)
• Answer: Story Compression
  – Detect general vs. specific content and create a single story from all broadcasts with no redundancy

Overview
• Divide story into content segments (i.e., single idea)
  – Video shot (continuous scene) detection
• Compare segments
  – Speech/text contains most of the informational content
  – Word similarity → segment similarity
• Detect specific vs. general segments

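Word Similarity
• Focus on concepts rather than specific word matching
• Graph-based hierarchy of word-concept relationships
  – E.g., WordNet
• Malik et al. 2007
  – $\mathrm{sim}(w_1, w_2) = \dfrac{2 \cdot \mathrm{dist}(\mathrm{root}, \mathrm{LCS}(w_1, w_2))}{\mathrm{dist}(\mathrm{root}, w_1) + \mathrm{dist}(\mathrm{root}, w_2)}$
• Li et al. 2003
  – $\mathrm{sim}(w_1, w_2) = e^{-\alpha \, \mathrm{dist}(w_1, w_2)} \tanh\left(\beta \, \mathrm{dist}(\mathrm{root}, \mathrm{LCS}(w_1, w_2))\right)$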

[Figure: example concept hierarchy with nodes Object, Mammal, Feline, Canine, Poodle]
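The two word-similarity measures listed above (attributed to Malik et al. 2007 and Li et al. 2003) can be sketched on a toy hierarchy. This is only an illustration under stated assumptions: the edges of the hierarchy are guessed from the node names in the figure, LCS is taken to be the deepest common ancestor of the two words, dist counts edges, and the default values of `alpha` and `beta` are arbitrary placeholders rather than values from the slides.

```python
import math

# Toy hierarchy as child -> parent. The edges are an assumption; the slides
# only show the node names Object, Mammal, Feline, Canine, Poodle.
PARENT = {
    "mammal": "object",
    "feline": "mammal",
    "canine": "mammal",
    "poodle": "canine",
}


def path_to_root(word):
    """Return [word, parent, ..., root]."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(word):
    """dist(root, word), counted in edges."""
    return len(path_to_root(word)) - 1


def lcs(w1, w2):
    """Deepest node that is an ancestor of (or equal to) both words."""
    ancestors = set(path_to_root(w1))
    for node in path_to_root(w2):  # walks upward, so the first hit is deepest
        if node in ancestors:
            return node
    raise ValueError("words do not share a root")


def sim_depth_ratio(w1, w2):
    """2 * dist(root, LCS) / (dist(root, w1) + dist(root, w2))."""
    denom = depth(w1) + depth(w2)
    if denom == 0:  # both words are the root
        return 1.0
    return 2.0 * depth(lcs(w1, w2)) / denom


def sim_exp_tanh(w1, w2, alpha=0.2, beta=0.45):
    """exp(-alpha * dist(w1, w2)) * tanh(beta * dist(root, LCS))."""
    common = lcs(w1, w2)
    path_len = depth(w1) + depth(w2) - 2 * depth(common)  # dist(w1, w2)
    return math.exp(-alpha * path_len) * math.tanh(beta * depth(common))


print(sim_depth_ratio("feline", "poodle"))
print(sim_depth_ratio("canine", "poodle"))
print(sim_exp_tanh("feline", "poodle"))
```

In this toy tree, "canine" and "poodle" score higher than "feline" and "poodle" under both measures, because their common ancestor is deeper and the path between them is shorter.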

Segment Similarity
• Sentence similarity?
  – Segments range from sub-sentence to multiple sentences
  – Also, sentence boundaries (when there are multiple sentences) are poorly defined
  – Sentence similarity emphasizes grammar/word order, so it won’t work
• If ordering is problematic, use unordered groups instead
• Solution: Graph collapsing
  – Group of nodes collapsed to a single node by summing edge weights
  – Inspired by spectral clustering and the notion of a random walk on graphs
  – Random walk between groups is equivalent to a random walk between collapsed nodes
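A minimal sketch of graph collapsing as described above: each group of nodes (e.g., the words in one segment) becomes a single node, and the weight between two collapsed nodes is the sum of the edge weights between their members. The function name `collapse` and the example weights are hypothetical, and the slides do not say whether the collapsed weights are normalized afterwards, so no normalization is applied here.

```python
def collapse(weights, groups):
    """Collapse a weighted graph by summing edge weights between groups.

    weights: square, symmetric list of lists (node-to-node similarity).
    groups:  list of lists of node indices; each group becomes one node.
    Returns a len(groups) x len(groups) matrix of summed weights.
    """
    return [
        [sum(weights[i][j] for i in ga for j in gb) for gb in groups]
        for ga in groups
    ]


# Hypothetical word-similarity matrix for four words.
word_sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]

# Words 0-1 form one segment and words 2-3 form another.
segment_sim = collapse(word_sim, [[0, 1], [2, 3]])
print(segment_sim)
```

Because the groups are treated as unordered sets, word order inside a segment has no effect on the result, which is the property the slide asks for.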

[Figure labels: Segment Similarity; Word Similarity]

Most Unique Segments
• Manual segmentation employed
• Specific content
• Uniqueness → overall dissimilarity
• Perfect dissimilarity → similarity matrix rows/columns are zero except for the diagonal
• Thus, the off-diagonal sum of a row/column should approach zero for the most dissimilar segments
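The row-sum criterion above can be sketched as follows. Excluding the diagonal from the sum is an assumption (it is what makes the sum approach zero for a perfectly dissimilar segment), and the function names and matrix values are hypothetical.

```python
def uniqueness_scores(sim):
    """Off-diagonal row sums of a segment similarity matrix (lower = more unique)."""
    return [
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(sim)
    ]


def most_unique(sim, k=1):
    """Indices of the k segments with the smallest off-diagonal row sums."""
    scores = uniqueness_scores(sim)
    return sorted(range(len(sim)), key=lambda i: scores[i])[:k]


# Hypothetical similarity matrix for three segments.
segment_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

print(uniqueness_scores(segment_sim))
print(most_unique(segment_sim))
```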

Most Related Segments
• General content
• Related → group is self-similar
• Perfect self-similarity → similarity matrix elements for the group are all one
• Thus, the sum of the elements should approach $n^2$ ($n$ = number of segments in the group)
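A small sketch of the relatedness criterion above, applied to pairs of segments (n = 2). Dividing the block sum by n² so that a perfectly self-similar group scores 1 is an assumption made here for readability; the slides only state that the raw sum approaches n². Function names and matrix values are hypothetical.

```python
from itertools import combinations


def relatedness(sim, group):
    """Sum of similarity entries within a group, divided by n^2."""
    n = len(group)
    total = sum(sim[i][j] for i in group for j in group)
    return total / (n * n)


def most_related_pair(sim):
    """Pair of segment indices whose 2x2 block sum is closest to n^2 = 4."""
    pairs = combinations(range(len(sim)), 2)
    return max(pairs, key=lambda pair: relatedness(sim, pair))


# Hypothetical similarity matrix for three segments.
segment_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

print(most_related_pair(segment_sim))
print(relatedness(segment_sim, (0, 1)))
```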

[Figure: Segment Pair Similarity (higher is better); x-axis: segment pairs (sorted); annotations: perfect similarity, somewhat similar]

[Figure: Segment Uniqueness (lower is better); x-axis: segments (sorted); annotations: perfect dissimilarity, somewhat dissimilar]

Automatic Segment Detection
• How to decide boundaries between segments?
  – No sentence boundaries, so text is not a strong indicator
• Shot detection: detect visual change from one scene to another
• Common techniques:
  – Temporal extent
    • Consecutive: compare sequential pairs of frames
    • Key frame: compare to “key” frame of previous segment
  – Distance measures
    • Pixel-based: Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC)
    • Color-based (histograms): χ², Bhattacharyya
    • Texture-based: Scale Invariant Feature Transform (SIFT)
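The pixel- and histogram-based measures and the two temporal extents listed above can be sketched in a few lines. Several choices here are assumptions, since each measure has more than one common variant: NCC is the zero-mean form, χ² is Σ(p−q)²/(p+q) without a factor of 1/2, the Bhattacharyya function returns the coefficient (1 = identical normalized histograms) rather than a distance, and the key frame is taken to be the first frame of the current shot. Frames are flat lists of numbers, and `detect_boundaries` is a hypothetical helper, not the authors' detector.

```python
import math


def sad(a, b):
    """Sum of Absolute Differences between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of Squared Differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Zero-mean Normalized Cross-Correlation (1 = perfect match)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def chi2(p, q):
    """Chi-square distance between histograms (0 = identical)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(p, q) if x + y > 0)


def bhattacharyya_coeff(p, q):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(math.sqrt(x * y) for x, y in zip(p, q))


def detect_boundaries(frames, dist, threshold, mode="consecutive"):
    """Return indices t where frame t starts a new shot.

    mode="consecutive": compare each frame with the previous frame.
    mode="key_frame":   compare each frame with the first frame of the
                        current shot (an assumed choice of key frame).
    """
    boundaries = []
    ref = frames[0]
    for t in range(1, len(frames)):
        if mode == "consecutive":
            ref = frames[t - 1]
        if dist(ref, frames[t]) > threshold:
            boundaries.append(t)
            ref = frames[t]  # the new shot's first frame becomes the key frame
    return boundaries


# A slow drift: no single step is large, but the scene changes over time.
drift = [[0], [3], [6], [9]]
print(detect_boundaries(drift, sad, threshold=5, mode="consecutive"))
print(detect_boundaries(drift, sad, threshold=5, mode="key_frame"))
```

The drift example shows why the temporal extent matters: consecutive comparison never sees a step larger than the threshold, while key-frame comparison accumulates the change and reports a boundary.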

Towards Improving Segment Detection
• Common methods give mediocre performance
• May be due to examining only a single temporal extent
• Possible solution: use graph collapsing to examine all temporal extents simultaneously
• Sum of a block on the diagonal approaches $n^2$ if its members are in the same segment
• Sum of the block anti-diagonal approaches zero if the corner is a segment boundary
• Current problem: the scale of valleys (boundaries) varies quadratically with segment size, so simple peak finding is not good enough
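One possible reading of the block sums above, sketched on a frame-by-frame similarity matrix. The interpretation is an assumption: the "diagonal block" is taken to be the square block covering a candidate segment, and the "anti-diagonal" block is taken to be the off-diagonal block pairing the `width` frames before a candidate boundary with the `width` frames after it. The function names, the `width` parameter, and the idealized matrix are hypothetical.

```python
def block_sum(sim, rows, cols):
    """Sum of sim[i][j] over the given row and column index ranges."""
    return sum(sim[i][j] for i in rows for j in cols)


def diagonal_block_sum(sim, start, end):
    """Sum over the square block for frames [start, end).

    Approaches n^2 (n = end - start) when all frames are mutually similar.
    """
    frames = range(start, end)
    return block_sum(sim, frames, frames)


def cross_block_sum(sim, t, width):
    """Sum over the block pairing frames [t - width, t) with [t, t + width).

    Approaches zero when t is a segment boundary.
    """
    before = range(max(0, t - width), t)
    after = range(t, min(len(sim), t + width))
    return block_sum(sim, before, after)


# Idealized similarity: 1 inside a segment, 0 across segments.
segment_of_frame = [0, 0, 0, 1, 1]
frame_sim = [
    [1.0 if a == b else 0.0 for b in segment_of_frame]
    for a in segment_of_frame
]

print(diagonal_block_sum(frame_sim, 0, 3))      # one full segment, n = 3
print(cross_block_sum(frame_sim, 3, width=2))   # true boundary
print(cross_block_sum(frame_sim, 2, width=2))   # not a boundary
```

The raw sums grow with the block area, which is consistent with the scale problem noted in the last bullet; this sketch makes no attempt to resolve it.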

[Figure: Shot Detection: Key Frame (First); x-axis: normalized threshold (1 = perfect match); series: SIFT-MR, BATTA-H16, CHI2-H16]

[Figure: Shot Detection: Consecutive; x-axis: normalized threshold (1 = perfect match); series: SIFT-MR, BATTA-H16, CHI2-H16]

Method      F      TP     FP     FN
SAD         0.747  0.596  0.081  0.322
SSD         0.746  0.595  0.044  0.362
NCC         0.770  0.626  0.009  0.365
BATTA-H16   0.779  0.638  0.125  0.237
CHI2-H16    0.210  0.117  0.005  0.878
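The slides do not say how the F column is computed. As a check, the sketch below applies the standard F-measure written in terms of counts or rates, F = 2·TP / (2·TP + FP + FN), to the TP, FP, and FN columns; this is an inference, not a formula from the source. It reproduces the listed F values to within the rounding of the three-decimal inputs.

```python
def f_measure(tp, fp, fn):
    """F1 score expressed directly in terms of TP, FP, and FN."""
    return 2 * tp / (2 * tp + fp + fn)


# (method, F, TP, FP, FN) rows copied from the table above.
ROWS = [
    ("SAD", 0.747, 0.596, 0.081, 0.322),
    ("SSD", 0.746, 0.595, 0.044, 0.362),
    ("NCC", 0.770, 0.626, 0.009, 0.365),
    ("BATTA-H16", 0.779, 0.638, 0.125, 0.237),
    ("CHI2-H16", 0.210, 0.117, 0.005, 0.878),
]

for method, listed_f, tp, fp, fn in ROWS:
    print(f"{method:10s} listed={listed_f:.3f} recomputed={f_measure(tp, fp, fn):.4f}")
```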

[Figure: Anti-diagonal Sum]

Conclusion and Future Work
• Graph collapsing can be used to derive group similarity from the similarity of group members
• Additionally, it can be used to evaluate the uniqueness of objects and the relatedness of groups
  – Tested with text, working on video
• Future work
  – Finalize graph collapsing video segmentation
  – Expand word similarity to include multiple languages
  – Investigate sub-image feature extraction/matching
  – Examine other sources (e.g., YouTube)

“…declaring a public health emergency….”

ABC NBC

“…after the virus killed….” “…sadly had claimed 18 lives….”

“…declaring a public health emergency….”

“…to repeat, declared a public health emergency….”

ABC NBC

“…they’ve set up a special tent….”

“…a tent has been setup….”

“In Boston today, the mayor sounded the alarm”

“…moved onto the upper respiratory, which is a lot of coughing…”

“…stay home when you are sick…”

“…I’ve never been hit by a Mack truck…”

“…is on the panel that decides what goes in the vaccine…”

“…after confirmed cases of flu reach 700…”

[Figure and table labels: Consecutive Shot Detection Across All Stories; Video similarity; Sum of diagonal blocks; Block End]
