Information Extraction from Event Announcements

Student: Jianwei Lu (40942937)
Supervisor: Robert Dale



Page 2

Agenda

Project Introduction
Email Event Information Extractor
Conclusion

Page 3

Background

What is Information Extraction (IE)?
Automated extraction of key information from text
Populating a database with the extracted information

Why is it significant?
Data can be managed and searched efficiently
Extracted data can serve other target applications

For more information, see [Cowie, J. and Wilks, Y., n.d.]: http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf

Page 4

The Outcomes

Title

URL

Page 5

Sample Data

Corpus 1 – 30 documents
Corpus 2 – 100 documents
Corpus 3 – 1,500 documents

Page 6

Agenda

Project Introduction
Email Event Information Extractor
Conclusion

Page 7

My System Architecture

Page 8

Text Zoning

Page 9

URL Finding Rules

Use a pattern to capture URLs
Approaches for finding an event URL:
1. Search the Summary zone
2. Search the whole document
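The two-step search above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the regular expression, the function name `find_event_url`, and the idea of passing the Summary zone as a separate string are all assumptions, and the sketch is written for Python 3 rather than the Python 2.6 used in the project.

```python
import re

# Assumed URL pattern; the slides do not show the actual expression.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>]+", re.IGNORECASE)


def find_event_url(summary_zone, whole_document):
    """Return the first URL in the Summary zone, else the first in the document."""
    for text in (summary_zone, whole_document):
        match = URL_PATTERN.search(text)
        if match:
            # Trailing punctuation usually belongs to the sentence, not the URL.
            return match.group(0).rstrip(".,;:)")
    return None
```

Searching the Summary zone first reflects the ordering on the slide; the fallback only runs when that zone contains no URL at all.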

Results

Page 10

Dates Finding Rules

Use a pattern to capture dates
Use clues to find the corresponding date:
1. submission-date < start-date <= end-date
2. no submission-date in a “Call for Participation” announcement
3. etc.
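As a rough sketch of how rules 1 and 2 might be checked in code: the date pattern below only covers the "12 July 2010" style and is an assumption, as are the names `find_dates` and `is_consistent`. The slides do not show the real patterns or how clues are matched to dates, so this covers only the consistency constraints.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumed format: day, month name, four-digit year (e.g. "12 July 2010").
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def find_dates(text):
    """Return every well-formed date found in the text, in order of appearance."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            pass  # e.g. "31 February 2010" is skipped
    return found


def is_consistent(submission, start, end, call_for_participation=False):
    """Check a candidate assignment against rules 1 and 2 (None = not found)."""
    if call_for_participation and submission is not None:
        return False  # rule 2: no submission date in a Call for Participation
    if start is not None and end is not None and start > end:
        return False  # rule 1: start-date <= end-date
    if submission is not None and start is not None and submission >= start:
        return False  # rule 1: submission-date < start-date
    return True
```

A candidate assignment that fails the check could be discarded in favour of another; how the original system chose between candidates is not stated on the slide.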

Results

Page 11

Locations Finding Rules

Tokenise lines into words
Use a gazetteer to capture locations
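A minimal sketch of the tokenise-then-look-up idea, under stated assumptions: the four-entry `GAZETTEER` is a stand-in for the much larger list the project used, the regular-expression tokeniser is a simplification, and the longest-match strategy for multi-word names is an illustrative choice rather than something the slides describe.

```python
import re

# Tiny stand-in gazetteer (lower-cased place names); purely illustrative.
GAZETTEER = {"sydney", "australia", "new york", "beijing"}
MAX_NAME_WORDS = max(len(name.split()) for name in GAZETTEER)


def tokenise(line):
    """Split a line into alphabetic words, dropping punctuation."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", line)


def find_locations(line):
    """Return gazetteer entries found in the line, preferring longer names."""
    words = tokenise(line)
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at word i first.
        for n in range(min(MAX_NAME_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in GAZETTEER:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this word
    return found
```

Because punctuation is dropped during tokenisation, this sketch can join words across a comma; a fuller implementation would probably keep punctuation as tokens.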

Results

Page 12

Title Finding Rules

Page 13

Title Finding Rules (cont’d)

Apply machine learning to classify title lines
Refine the title after classification
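To illustrate the classification step, the sketch below turns each line into a small feature vector and writes it in the sparse text format read by SVMlight (`<target> <index>:<value> ...`). The four features, the keyword list, and the function names are hypothetical; the slides do not say which features were used, and the refinement step after classification is not covered here.

```python
# Hypothetical keywords that often appear in event titles.
TITLE_KEYWORDS = ("conference", "workshop", "symposium", "seminar")


def line_features(line, position, total_lines):
    """Map a line to {feature index: value}; all four features are assumptions."""
    words = line.split()
    n = len(words)
    capitalised = sum(1 for w in words if w[:1].isupper())
    lowered = line.lower()
    return {
        1: position / max(total_lines - 1, 1),          # relative position in document
        2: capitalised / n if n else 0.0,               # share of capitalised words
        3: 1.0 if any(k in lowered for k in TITLE_KEYWORDS) else 0.0,
        4: min(n, 20) / 20.0,                           # capped, scaled line length
    }


def to_svmlight(is_title, features):
    """Format one example as an SVMlight line, omitting zero-valued features."""
    parts = ["+1" if is_title else "-1"]
    parts += ["%d:%.4f" % (i, v) for i, v in sorted(features.items()) if v != 0]
    return " ".join(parts)
```

Lines written this way, one per example, could form a training file for an SVM learner, with the trained model then labelling the lines of unseen announcements.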

Results

Page 14

Current Performance

Page 15

Agenda

Project Introduction
Email Event Information Extractor
Conclusion

Page 16

What I have Achieved

Modules for information extraction: URL, Dates, Locations, Title
Evaluation framework

Page 17

Limitations and Future Work

Extension for refining titles
Comparison for titles
Comprehensive study of the SVM tool and the features used for machine learning

Page 18

Implementation Details

Python 2.6
Gazetteer from http://world-gazetteer.com/
Support Vector Machine: http://svmlight.joachims.org/
Natural Language Toolkit (NLTK): http://www.nltk.org/Home

Page 19

Questions?