Information Extraction from Event Announcements
Jianwei Lu 1
Information Extraction from Event Announcements
Student: Jianwei Lu (40942937)
Supervisor: Robert Dale
Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion
Background
- What is Information Extraction (IE)?
  - Automated extraction of key information
  - Populate a database
- Why is it significant?
  - Manage and search data efficiently
  - Serve other target applications
FOR MORE INFO...
[Cowie J and Wilks Y, n.d.] http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf
The Outcomes
Title
URL
Sample Data
- Corpus 1 – 30 documents
- Corpus 2 – 100 documents
- Corpus 3 – 1,500 documents
Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion
My System Architecture
Text Zoning
URL Finding Rules
- Use a pattern to capture URLs
- Approaches for finding an event URL
  1. Search the Summary zone
  2. Search the whole document
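The two-step search above can be sketched as follows. This is a minimal illustration only: the regular expression, the function names `find_urls` and `find_event_url`, and the choice to return the first URL found are all assumptions, not the pattern or selection rule used in the project.

```python
import re

# Hypothetical pattern: tokens that start with http://, https:// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def find_urls(text):
    """Return all URL-like strings in text, without trailing punctuation."""
    return [m.group(0).rstrip(".,;:!?") for m in URL_PATTERN.finditer(text)]


def find_event_url(summary_zone, whole_document):
    """Search the summary zone first, then fall back to the whole document."""
    for scope in (summary_zone, whole_document):
        urls = find_urls(scope)
        if urls:
            # Assumption: the first URL in the narrowest scope is the event URL.
            return urls[0]
    return None
```

Here `summary_zone` stands for whatever text the zoning step labels as the summary; an empty string makes the function fall through to the full document.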
Results
Dates Finding Rules
- Use a pattern to capture dates
- Use clues to find the corresponding date
  1. submission-date < start-date <= end-date
  2. No submission-date in a “Call for Participation” announcement
  3. etc.
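A rough sketch of how a date pattern and the two listed rules might fit together is given below. The date formats handled, the clue words in `SUBMISSION_CLUES`, and the function names are illustrative assumptions; the project's actual patterns and clues are not given on the slide.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Hypothetical pattern: "12 March 2010" or "March 12, 2010".
DATE_PATTERN = re.compile(
    r"\b(?:(\d{1,2})\s+([A-Za-z]+)|([A-Za-z]+)\s+(\d{1,2}),?)\s+(\d{4})\b")

# Hypothetical clue words that mark a submission deadline.
SUBMISSION_CLUES = ("deadline", "submission", "due")


def find_dates(line):
    """Return every date in the line that matches the pattern."""
    found = []
    for m in DATE_PATTERN.finditer(line):
        day = m.group(1) or m.group(4)
        month = (m.group(2) or m.group(3)).lower()
        if month in MONTHS:
            try:
                found.append(date(int(m.group(5)), MONTHS[month], int(day)))
            except ValueError:
                pass  # e.g. "31 February 2010"
    return found


def assign_dates(lines, is_call_for_participation=False):
    """Assign submission, start and end dates using clues and the rules."""
    submission, event_dates = None, []
    for line in lines:
        dates = find_dates(line)
        if not dates:
            continue
        if any(clue in line.lower() for clue in SUBMISSION_CLUES):
            # Rule 2: a "Call for Participation" has no submission date.
            if not is_call_for_participation and submission is None:
                submission = dates[0]
        else:
            event_dates.extend(dates)
    if not event_dates:
        return {"submission": submission, "start": None, "end": None}
    start, end = min(event_dates), max(event_dates)
    # Rule 1: submission-date < start-date <= end-date.
    if submission is not None and not submission < start:
        submission = None
    return {"submission": submission, "start": start, "end": end}
```

Taking the earliest and latest non-deadline dates as start and end guarantees start-date <= end-date; a candidate submission date that does not precede the start date is simply discarded in this sketch.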
Results
Locations Finding Rules
- Tokenise lines into words
- Use a gazetteer to capture locations
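As an illustration of tokenising a line and looking the words up in a gazetteer, consider the sketch below. The four-entry `GAZETTEER` is a stand-in for the real gazetteer data, and the regex tokeniser and longest-match lookup are assumptions made to keep the example self-contained.

```python
import re

# Tiny stand-in gazetteer; each entry is a tuple of lower-cased tokens.
GAZETTEER = {("sydney",), ("australia",), ("new", "york"), ("hong", "kong")}
MAX_NAME_LENGTH = max(len(entry) for entry in GAZETTEER)


def tokenise(line):
    """Split a line into word and number tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+", line)


def find_locations(line):
    """Return gazetteer names found in the line, preferring longer matches."""
    tokens = tokenise(line)
    locations, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so "New York" beats a shorter name.
        for n in range(min(MAX_NAME_LENGTH, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in GAZETTEER:
                locations.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1
    return locations
```

A plain lookup like this will also match common words that happen to be place names, so a real system would likely need extra filtering.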
Results
Title Finding Rules
Title Finding Rules (cont’d)
Apply Machine Learning to classify title lines
Refine title after classification
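To make the classification step more concrete, here is a sketch of turning a line into a feature vector and writing it in the sparse `<target> <feature>:<value>` text format that SVMlight reads. The four features and the names `line_features` and `to_svmlight` are purely hypothetical; the slides do not say which features the project used.

```python
# Hypothetical keywords that often appear in event titles.
EVENT_WORDS = ("conference", "workshop", "symposium", "seminar")


def line_features(line, line_index, total_lines):
    """Build a small, illustrative feature vector for one line."""
    words = line.split()
    capitalised = sum(1 for w in words if w[0].isupper())
    return [
        line_index / float(total_lines),                     # 1: relative position
        capitalised / float(len(words)) if words else 0.0,   # 2: share of capitalised words
        1.0 if any(k in line.lower() for k in EVENT_WORDS) else 0.0,  # 3: event keyword
        float(len(words)),                                   # 4: number of words
    ]


def to_svmlight(label, features):
    """Format one example as an SVMlight line; zero-valued features are omitted."""
    pairs = ["%d:%g" % (i, v) for i, v in enumerate(features, start=1) if v != 0]
    return " ".join(["%+d" % label] + pairs)
```

Lines labelled `+1` would be title lines and `-1` everything else; the resulting file could then be passed to the SVM tool for training.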
Results
Current Performance
Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion
What I have Achieved
- Modules for Information Extraction
  - URL
  - Dates
  - Locations
  - Title
- Evaluation Framework
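The slides do not say which measures the evaluation framework reports. Assuming the usual IE measures of precision, recall and F1 per field, a scoring function could look like the sketch below; the `evaluate` name and the exact-match criterion are assumptions.

```python
def evaluate(predicted, gold):
    """Score one field over a corpus.

    predicted and gold are parallel lists with one value per document;
    None means nothing was extracted (or no value exists in the gold data).
    """
    correct = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    extracted = sum(1 for p in predicted if p is not None)
    expected = sum(1 for g in gold if g is not None)
    precision = correct / float(extracted) if extracted else 0.0
    recall = correct / float(expected) if expected else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Exact matching is strict for a field such as the title, where a partial match may still be useful, so a real framework might score that field more leniently.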
Limitations and Future Work
- Extension for refining titles
- Comparison for titles
- Comprehensive study of the SVM tool and the features used for machine learning
Implementation Details
- Python 2.6
- Gazetteer from http://world-gazetteer.com/
- Support Vector Machine: http://svmlight.joachims.org/
- Natural Language Toolkit (NLTK): http://www.nltk.org/Home
Questions?