Information Extractors
Hassan A. Sleiman
RoadMap
• Introduction
• Comparison
• IE Framework
• Conclusions
We are talking about IEs
Wrapper
Form Filler
Navigator
Information Extractor
Ontologiser
Verifier
IE in action
• Input:
  • Web pages
  • Rules/patterns
• Output:
  • Extracted data
Extraction rules
Information extractor
Document
Data
The Da Vinci Code
Dan Brown
15.95 €
2006
Robert Langdon…
Doubleday
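The flow on this slide (document plus extraction rules in, data out) can be sketched as follows. This is only an illustration: the HTML markup, the regular-expression rules, and the names DOCUMENT, RULES and extract are assumptions made for the example; only the field values come from the slide.

```python
import re

# Hypothetical document: the markup is invented for illustration,
# only the field values are taken from the slide.
DOCUMENT = """
<div class="book">
  <h1>The Da Vinci Code</h1>
  <span class="author">Dan Brown</span>
  <span class="price">15.95 €</span>
  <span class="year">2006</span>
  <span class="publisher">Doubleday</span>
</div>
"""

# Hypothetical extraction rules: one regular expression per attribute,
# each with a single capturing group for the value.
RULES = {
    "title": r"<h1>(.*?)</h1>",
    "author": r'<span class="author">(.*?)</span>',
    "price": r'<span class="price">(.*?)</span>',
    "year": r'<span class="year">(.*?)</span>',
    "publisher": r'<span class="publisher">(.*?)</span>',
}


def extract(document, rules):
    """Apply each rule to the document and keep the first match per attribute."""
    record = {}
    for attribute, pattern in rules.items():
        match = re.search(pattern, document)
        record[attribute] = match.group(1).strip() if match else None
    return record


record = extract(DOCUMENT, RULES)
print(record)
```

Real extractors usually rely on learned or more robust rules than hand-written regular expressions; the regex form is used here only because it is short.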
Comparison
...
...
Framework
• IE framework.
• Reusable.
• Comparable results.
RoadMap
• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions
Survey
• 62 Information Extractors identified.
• 43 IEs are studied.
Components
DataSet
Resultset
RuleSet
Learner
InfoExtractor
Preprocessor
Utilities
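One way these components might fit together is sketched below. The slide only names the components, so every attribute, method and relationship here is an assumption: in particular, that a Learner turns a DataSet into a RuleSet and that an InfoExtractor applies a RuleSet to a DataSet to produce a ResultSet. Utilities is left out because nothing on the slide says what it contains.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]  # one extracted record: attribute name -> value


@dataclass
class DataSet:
    """Input documents (assumed here to be plain strings)."""
    documents: List[str]


@dataclass
class RuleSet:
    """Extraction rules, modelled here as attribute name -> function over a document."""
    rules: Dict[str, Callable[[str], str]]


@dataclass
class ResultSet:
    """Records produced by an extractor, one per input document."""
    records: List[Record] = field(default_factory=list)


class Preprocessor:
    """Cleans a document before extraction; here it only collapses whitespace."""

    def process(self, document: str) -> str:
        return " ".join(document.split())


class Learner:
    """Assumed role: induce a RuleSet from a training DataSet (left abstract)."""

    def learn(self, training: DataSet) -> RuleSet:
        raise NotImplementedError


class InfoExtractor:
    """Applies a RuleSet to every document of a DataSet."""

    def __init__(self, rule_set: RuleSet, preprocessor: Preprocessor):
        self.rule_set = rule_set
        self.preprocessor = preprocessor

    def extract(self, data: DataSet) -> ResultSet:
        result = ResultSet()
        for document in data.documents:
            clean = self.preprocessor.process(document)
            result.records.append(
                {name: rule(clean) for name, rule in self.rule_set.rules.items()}
            )
        return result


# Usage with a hand-written rule standing in for a learned one.
rules = RuleSet({"first_word": lambda doc: doc.split()[0]})
extractor = InfoExtractor(rules, Preprocessor())
result = extractor.extract(DataSet(["  The   Da Vinci Code "]))
print(result.records)
```

Keeping the data containers separate from the Learner and InfoExtractor is what would let different extractors run on the same DataSet and be compared on their ResultSets, which matches the "comparable results" goal stated earlier.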
<a href="http://example.com"> the _<span> Times </span></a>
Tokenisation
<a href="http://example.com"> the <span> Times </span></a>
• Tag & Text
• Word & No-Word
• Chars
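The three granularities listed above can be illustrated on the anchor snippet from this slide. This is a minimal sketch: the function names are hypothetical, and the exact meaning of "Word & No-Word" is an assumption (runs of word characters versus single non-word, non-space characters).

```python
import re

HTML = '<a href="http://example.com"> the <span> Times </span></a>'


def tag_and_text(source):
    """Split into tags and the text runs between them, dropping empty runs."""
    parts = re.findall(r"<[^>]+>|[^<]+", source)
    return [part.strip() for part in parts if part.strip()]


def word_and_no_word(source):
    """Split into runs of word characters and single non-word, non-space characters."""
    return re.findall(r"\w+|[^\w\s]", source)


def chars(source):
    """Split into individual characters, whitespace included."""
    return list(source)


print(tag_and_text(HTML))
print(word_and_no_word(HTML))
print(len(chars(HTML)))
```

Coarser tokens keep rules short but hide detail inside tags; character tokens expose everything at the cost of much longer token sequences.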
Example:
DataSet 1/2
DataSet 2/2
RuleSet
Keep in mind!
Dataset
Conclusions
• Goals for 2010:
  • IE Framework.
  • Survey.
  • Comparable IE implementations.
  • Marking tool.
  • Tokeniser.
• Achievements 2009:
  • Studying 43 IEs.
  • Framework Modules definition.
Looking for a paper? Try the TDG Scholar at
http://scholar.tdg-seville.info/
Thanks!