503 Final Presentation

Post on 08-Jul-2015

Computational Linguistics Course Project

Transcript of 503 Final Presentation

TIMELINE FROM NEWS

KK Lo

GOAL...

RELATED WORK

Topic Detection and Tracking

Temporal and Event Tagging

2 communities

Topic Detection and Tracking

tracking topics -> classifying documents

discovering new topics

Events of interest

assume each article is an event

Problems

lack of details

publication date = time the event happened?

Temporal and Event Tagging

Tagging events and their temporal relationships

too many Events....

Problems

Result obtained from the TARSQI toolkit

[Figure: text in which many spans are tagged as Event]

MY SOLUTION

APPLY SUMMARIZATION TECHNIQUE AS EVENT FILTERING

3 components

Prior Ranking

1. Sentence A

2. Sentence B

3. Sentence C

4. ...

Beginning sentence has a higher prior probability

[Figure: axis labelled prior probability, origin 0]
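The idea that earlier sentences get a higher prior can be sketched as follows. The slides do not give the formula, so the power-law decay and the `decay` parameter are assumptions made only for illustration.

```python
def position_prior(num_sentences, decay=0.25):
    """Return a prior distribution over sentence positions.

    Assumption: weight is proportional to (position + 1) ** -decay, so the
    first sentence gets the largest share. The actual formula is not stated.
    """
    if num_sentences <= 0:
        return []
    weights = [(i + 1) ** -decay for i in range(num_sentences)]
    total = sum(weights)
    # Normalise so the weights form a probability distribution.
    return [w / total for w in weights]


prior = position_prior(4)
```

Any monotonically decreasing function of position would serve the same purpose; the normalisation matters more than the exact shape.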

Grasshopper

A Page-rank-like ranking algorithm

[Figure: graph of sentences s1 to s5, edges labelled cosine similarities]
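A PageRank-like ranking over a sentence similarity graph might look like the sketch below. It covers only a teleporting random walk with a prior; the absorbing-state re-ranking that Grasshopper uses to promote diversity is omitted. The function name, `damping` value, and iteration count are assumptions.

```python
def rank_sentences(similarity, prior, damping=0.85, iterations=100):
    """Score sentences by a random walk with teleporting on a similarity graph.

    similarity: square list of lists of pairwise cosine similarities.
    prior: probability distribution used as the teleport target.
    """
    n = len(similarity)
    transition = []
    for i in range(n):
        # Ignore self-similarity, then normalise the row to sum to 1.
        row = [similarity[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        if total == 0:
            transition.append(list(prior))  # isolated sentence: jump by prior
        else:
            transition.append([v / total for v in row])

    scores = list(prior)
    for _ in range(iterations):
        scores = [
            damping * sum(scores[i] * transition[i][j] for i in range(n))
            + (1 - damping) * prior[j]
            for j in range(n)
        ]
    return scores


similarity = [
    [1.0, 0.8, 0.7],
    [0.8, 1.0, 0.1],
    [0.7, 0.1, 1.0],
]
scores = rank_sentences(similarity, prior=[1 / 3, 1 / 3, 1 / 3])
```

In this toy graph the first sentence is similar to both others, so it ends up with the highest score.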

TARSQI Toolkit

explicit time

event instance

event-time link

event-event link

From TEXT to TimeML

Event Filtering

Events in TimeML

Appear in the Top Selected Sentences?

YES: PICK

NO: BYE
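The filtering rule above amounts to a membership test. A minimal sketch, assuming each event is a dict carrying a hypothetical `sentence_id` field that points back to the sentence it was tagged in:

```python
def filter_events(events, top_sentence_ids):
    """Keep only events whose sentence is among the top-ranked sentences."""
    selected = set(top_sentence_ids)
    return [event for event in events if event["sentence_id"] in selected]


events = [
    {"id": "e1", "sentence_id": 0},
    {"id": "e2", "sentence_id": 3},
    {"id": "e3", "sentence_id": 7},
]
kept = filter_events(events, top_sentence_ids=[0, 7])
```

Here `e2` is dropped because sentence 3 was not selected.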

Temporal Reasoner

Find the (start, end) bounds for each event

[Figure: timeline with ticks 2008, Dec, 2009 and event1, event2, event3 placed on it]
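One way to derive (start, end) bounds is to propagate known dates along BEFORE links until nothing changes. This is a sketch under assumptions, not the reasoner used in the project: events are treated as points, dates are encoded as integers such as 200812, and the link list is hypothetical.

```python
import math


def propagate_bounds(bounds, before_links):
    """Tighten (earliest, latest) bounds using 'earlier BEFORE later' links.

    bounds: dict mapping event name to (earliest, latest); use -inf / inf
            for unknown sides.
    before_links: list of (earlier_event, later_event) pairs.
    """
    bounds = {event: list(b) for event, b in bounds.items()}
    changed = True
    while changed:
        changed = False
        for earlier, later in before_links:
            # The later event cannot happen before the earlier one's lower bound.
            if bounds[later][0] < bounds[earlier][0]:
                bounds[later][0] = bounds[earlier][0]
                changed = True
            # The earlier event cannot happen after the later one's upper bound.
            if bounds[earlier][1] > bounds[later][1]:
                bounds[earlier][1] = bounds[later][1]
                changed = True
    return {event: tuple(b) for event, b in bounds.items()}


bounds = propagate_bounds(
    {
        "event1": (200812, 200812),       # anchored to an explicit time
        "event2": (-math.inf, math.inf),  # no explicit time
        "event3": (200901, 200901),
    },
    before_links=[("event1", "event2"), ("event2", "event3")],
)
```

An event with no links to an anchored event keeps its infinite bounds, so it cannot be placed on the timeline.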

RESULT?

Sentence Selection Quality

Special Thanks to

for the data and ROUGE =p

250-word summary from 25 documents with the DUC2007 data set

How can we represent 3320 events on a timeline?

Effect of Sentence Filtering

                           D0701A   D0720E
#Events before filtering     3320     1435
#Events after filtering        67       37

choosing the top 10 sentences

This shows that my approach is a failure

Time-Event Anchoring

                           D0701A   D0720E
#Events before filtering     3320     1435
#Failures                    3085     1129
#Events after filtering        67       37
#Failures                      49       29
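The counts above are easier to compare as failure rates. The snippet below only redoes the arithmetic on the numbers from the table; the dictionary layout is an assumption.

```python
# (number of events, number of anchoring failures) taken from the table.
results = {
    "D0701A": {"before": (3320, 3085), "after": (67, 49)},
    "D0720E": {"before": (1435, 1129), "after": (37, 29)},
}


def failure_rate(events, failures):
    """Fraction of events that could not be anchored to a time."""
    return failures / events


for topic, stages in results.items():
    for stage, (events, failures) in stages.items():
        rate = failure_rate(events, failures)
        print(f"{topic} {stage} filtering: {rate:.1%} failed")
```

For D0701A this gives roughly 93% before filtering and 73% after; for D0720E the rate is about 78 to 79% in both cases.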

WHY? Unable to deduce the relationships for all pairs of events

TARSQI only supports a single document

e.g. with 50 tagged events, only 50 pairs of relations are tagged; there should be 50C2 = 1225
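The pair count in the example can be checked directly. The helper name is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def all_pairs(num_events):
    """Number of unordered event pairs a full temporal ordering would need."""
    return comb(num_events, 2)


needed = all_pairs(50)   # 50 * 49 / 2
coverage = 50 / needed   # share of pairs actually tagged in the example
```

With 50 tagged relations out of 1225 possible pairs, only about 4% of the pairs carry an explicit relation.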

LESSON LEARNED

3 areas

Topic Detection and Tracking

Temporal and Event Tagging

Automatic Summarization

my project

The limit of existing technology

cannot get enough information from the documents

The limit of temporal analysis

OR EVEN

cosine similarity with tf-idf weighting is computationally expensive

2.5 hrs for 867 sentences
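For reference, 867 sentences imply 375,411 unordered sentence pairs. A minimal sketch of tf-idf weighted cosine similarity using sparse dictionaries follows; whitespace tokenisation and the plain `log(N / df)` idf are assumptions, and this is not the implementation that produced the timing above.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse tf-idf vector (term -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(sentences)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: c * math.log(n / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors([
    "the storm hit the coast",
    "the storm moved north",
    "markets fell sharply",
])
```

Iterating only over the terms of one sentence keeps each comparison proportional to sentence length rather than vocabulary size.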

DUC2007 Documents are hard to parse

different documents have different formats...

no standard date format...

contain some special characters that cause trouble for XML parsers...
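One common workaround for the special-character problem is to strip control characters and escape markup characters before wrapping text in XML. This is a sketch, assuming it is applied to text content only and not to existing tags; the `TEXT` wrapper and helper name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Control characters that XML 1.0 does not allow (tab, LF and CR are allowed).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def make_xml_safe(text):
    """Drop illegal control characters, then escape &, < and >."""
    cleaned = _INVALID_XML_CHARS.sub("", text)
    return escape(cleaned)


raw = "AT&T said profits < forecasts\x0c"
wrapped = "<TEXT>" + make_xml_safe(raw) + "</TEXT>"
element = ET.fromstring(wrapped)
```

Without the cleaning step, the bare `&` and the form-feed character would each make the parser reject the document.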

Q & A