Bootstrapping an Ontology-based
Information Extraction System
Alexander Maedche, Günter Neumann, Steffen Staab
(presented by D. Lonsdale)
CS 652 – June 7/04
Overview
Traditional IE + machine learning
Extensive use of NLP (SMES: German, English, Japanese)
Ontologies and related tools (OntoEdit, OntoBroker)
Abstract ontology + lexicon = concrete ontology
Conclusions/reflections
The mantra
Lexical knowledge: as usual, concepts are grounded in lexical items
Extraction rules: OntoBroker (deductive, OODB, F-Logic)
Ontology: abstract ontology + lexicon = concrete ontology
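A minimal Python sketch of the "abstract ontology + lexicon" idea, assuming the concrete ontology is simply a pair of the two. The class names, fields, and the small tourism-style vocabulary are hypothetical illustrations, not taken from the paper or from OntoEdit.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AbstractOntology:
    """Concepts plus a taxonomy; no words attached yet (hypothetical structure)."""
    concepts: Set[str] = field(default_factory=set)
    # Maps a concept to its direct super-concept.
    is_a: Dict[str, str] = field(default_factory=dict)


@dataclass
class ConcreteOntology:
    """An abstract ontology paired with a lexicon that grounds its concepts."""
    ontology: AbstractOntology
    # Maps a lower-cased surface word to the concepts it may refer to.
    lexicon: Dict[str, Set[str]] = field(default_factory=dict)

    def concepts_for(self, word: str) -> Set[str]:
        """Look up the concepts a word is grounded in (empty set if unknown)."""
        return self.lexicon.get(word.lower(), set())


# Hypothetical example data.
abstract = AbstractOntology(
    concepts={"Accommodation", "Hotel", "City"},
    is_a={"Hotel": "Accommodation"},
)
concrete = ConcreteOntology(
    abstract,
    lexicon={"hotel": {"Hotel"}, "inn": {"Hotel"}, "town": {"City"}},
)
print(concrete.concepts_for("Inn"))
```

Keeping the lexicon separate from the concept structure means the same abstract ontology could be paired with lexicons for different languages.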
Lexical knowledge
Low-level lexicons, dynamically updated
Basic low-level NLP:
  tokenization (50 classes)
  morphological processing
  POS tagging
  named entity extraction
  chunk parsing
  thematic role assignment (grammatical function)
Cascading finite-state transducers
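The cascade idea can be sketched as a chain of stages, each consuming the previous stage's output. This is a toy illustration only: plain Python functions stand in for the transducers, the tiny POS lexicon and the example sentence are hypothetical, and none of it reflects how SMES is actually implemented.

```python
import re

# Hypothetical toy lexicon; a real system would use large, dynamically updated ones.
POS_LEXICON = {
    "the": "DET", "a": "DET", "old": "ADJ",
    "hotel": "NOUN", "sauna": "NOUN", "offers": "VERB",
}


def tokenize(text):
    """Stage 1: split raw text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def tag(tokens):
    """Stage 2: attach a part-of-speech tag to each token (UNK if unlisted)."""
    return [(tok, POS_LEXICON.get(tok, "UNK")) for tok in tokens]


def chunk(tagged):
    """Stage 3: group DET? ADJ* NOUN+ runs into noun-phrase (NP) chunks."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            # At least one noun was found: emit the whole run as an NP.
            chunks.append(("NP", [tok for tok, _ in tagged[i:k]]))
            i = k
        else:
            # No NP starts here: pass the single token through with its tag.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


def cascade(text, stages=(tokenize, tag, chunk)):
    """Feed the output of each stage into the next one."""
    result = text
    for stage in stages:
        result = stage(result)
    return result


result = cascade("The old hotel offers a sauna")
print(result)
```

Each stage only sees the output of the stage below it, which is what lets higher levels (named entities, grammatical functions) be layered on without touching the lower ones.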
Extraction
Concept definitions
Inference rules/axioms
Bridging (forward inferencing)
Syntactic dependency relations
  “...implementations of idiosyncratic syntactic cues for particular ontological structures...”
Logical relations (e.g. transitivity, LocatedIn)
OntoBroker engine
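To make the transitivity example concrete, here is a naive forward-chaining sketch for a LocatedIn-style relation. It is plain Python, not F-Logic and not the OntoBroker engine; the function name and the place names are hypothetical.

```python
def forward_chain_transitive(facts):
    """Add (a, c) whenever (a, b) and (b, c) are known, until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow during the pass.
        for a, b in list(derived):
            for c, d in list(derived):
                if b == c and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived


# Hypothetical LocatedIn facts, as (thing, place) pairs.
located_in = {
    ("Hotel Adler", "Rostock"),
    ("Rostock", "Mecklenburg"),
    ("Mecklenburg", "Germany"),
}
closure = forward_chain_transitive(located_in)
print(sorted(closure))
```

Forward inferencing of this kind materializes the implied facts up front, so a later query such as "which hotels are in Germany?" becomes a simple lookup.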
Ontology learning
So how does ontology learning happen?
Ontology engineer specifies, refines knowledge structures
Select and process a text corpus with the model
Use a set of different learning approaches
  “...generalized association rule learning algorithm...”
Extend the extracted model (all three parts...)
Human reviews learning decisions
The ontology is concrete, the methodology description less so...
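The slides only name the "generalized association rule learning algorithm". One common reading of "generalized" is that each transaction is extended with the taxonomy ancestors of its items, so that rules can surface at a more abstract level. The sketch below illustrates that reading; it is an assumption, not the authors' procedure. The taxonomy, the transactions, and all function names are hypothetical, and the taxonomy is assumed to be acyclic.

```python
def ancestors(concept, is_a):
    """All super-concepts of `concept`, following is_a links upward (acyclic assumed)."""
    found = set()
    while concept in is_a:
        concept = is_a[concept]
        found.add(concept)
    return found


def extend(transaction, is_a):
    """Add every ancestor of every item, so rules can hold at higher taxonomy levels."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, is_a)
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs) / support(lhs); assumes lhs occurs at least once."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Hypothetical taxonomy and co-occurrence data (e.g. concepts found in one sentence).
IS_A = {
    "Hotel": "Accommodation", "Hostel": "Accommodation",
    "Sauna": "Facility", "Pool": "Facility",
}
raw = [{"Hotel", "Sauna"}, {"Hostel", "Pool"}, {"Hotel", "Pool"}, {"Hotel"}]
transactions = [extend(t, IS_A) for t in raw]

print(confidence({"Hotel"}, {"Sauna"}, transactions))
print(confidence({"Accommodation"}, {"Facility"}, transactions))
```

On this toy data the rule at the abstract level (Accommodation, Facility) has higher confidence than the specific one (Hotel, Sauna). A suggestion of that kind is what the human reviewer would then accept or reject.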