Issues in Learning an Ontology from Text
![Page 1: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/1.jpg)
Christopher Brewster, Simon Jupp, Joanne Luciano, David Shotton, Robert Stevens, and Ziqi Zhang
![Page 2: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/2.jpg)
The Use Case: Animal Behaviour
• Animal behaviour community recognises the need for an ontology, e.g. for video annotation/retrieval
• The community created an “Animal Behaviour Ontology” - 339 terms
• Can we (semi-)automatically build one from text?
![Page 3: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/3.jpg)
Some Questions
• Do we get a “good ontology”?
• If not, is it useful?
• Is it low-effort?
• Should the result be “tidied up” or used as a donor?
![Page 4: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/4.jpg)
Methodology: Dataset
• Journal “Animal Behaviour” from Elsevier
• 623 articles from Vol 71 (2006) - Vol 74 (2007)
• 2.2 million words
• Various formats - most usefully XML
![Page 5: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/5.jpg)
We Want an Ontology of Green
• An ontology of “animal behaviours”
• Not an ontology of the corpus
We want the green terms in the ontology
![Page 6: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/6.jpg)
Processing Steps (1)
1. Text extracted from XML - excluding affiliations, acknowledgements, bibliography (except for titles), etc.
2. Noise removed - person names, animal names, place names
3. Lemmatiser used to reduce data sparsity
4. Term extraction applied
![Page 7: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/7.jpg)
Processing Steps (2)
5. Term selection
Regular expression used to select terms ending in behaviour, display, construction, inspection plus generic -ing, -ism, etc.
Build hierarchies using String Inclusion
6. Top level terms filtered using “Hearst Patterns” to test if X ISA behaviour/activity/etc.
Walking, Running, Jumping, Hunting, Pecking, Reed Bunting, Corn Bunting, Herring, Courtship, Studentship, Cannibalism, Dimorphism
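The term selection in step 5 can be sketched roughly as follows. The slides do not give the actual regular expression, so `TERM_PATTERN`, `select_terms` and the candidate list are assumptions made for illustration. The sketch also shows why a later filtering step is needed: suffix matching alone lets through non-behaviours such as "Reed Bunting" and "herring".

```python
import re

# Hypothetical pattern: the slides only say that terms ending in
# behaviour, display, construction, inspection, -ing, -ism, etc. were kept.
TERM_PATTERN = re.compile(
    r"(?:\b(?:behaviour|display|construction|inspection)|\w+(?:ing|ism))$",
    re.IGNORECASE,
)


def select_terms(terms):
    """Keep the terms whose final word matches one of the suffix patterns."""
    return [t for t in terms if TERM_PATTERN.search(t.strip())]


# Illustrative candidates, partly taken from the slide's example terms.
candidates = [
    "grooming behaviour",
    "courtship display",
    "nest construction",
    "walking",
    "cannibalism",
    "Reed Bunting",   # a bird, but it ends in -ing
    "herring",        # a fish, but it ends in -ing
    "dimorphism",     # not a behaviour, but it ends in -ism
    "body mass",      # rejected: no matching suffix
]

selected = select_terms(candidates)
```

Here only "body mass" is rejected; the false positives remain in `selected`, which is the kind of noise the Hearst-pattern filter in step 6 is meant to reduce.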
![Page 8: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/8.jpg)
Applying String Inclusion Rules to Terms
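Schematic: C is the parent of BC and AC; BC is the parent of ABC
Example: Selection is the parent of Mate Selection and Natural Selection; Mate Selection is the parent of Female Mate Selection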
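A minimal sketch of hierarchy building by string inclusion, assuming that a term's parent is the longest other term that forms a word-level suffix of it (so the head noun stays at the end). The function name `build_hierarchy` and this exact rule are assumptions; the slides do not spell out the rules used.

```python
def build_hierarchy(terms):
    """Map each term to its parent term, or to None if it is top level.

    The parent is the longest other term that is a word-level suffix
    of the term, e.g. "mate selection" for "female mate selection".
    """
    tokenised = {t: tuple(t.lower().split()) for t in terms}
    known = {words: t for t, words in tokenised.items()}
    parents = {}
    for term, words in tokenised.items():
        parents[term] = None
        # Try progressively shorter suffixes, longest first.
        for start in range(1, len(words)):
            suffix = words[start:]
            if suffix in known:
                parents[term] = known[suffix]
                break
    return parents


terms = [
    "Selection",
    "Mate Selection",
    "Natural Selection",
    "Female Mate Selection",
]
parents = build_hierarchy(terms)
```

With the slide's example terms, "Selection" has no parent, "Mate Selection" and "Natural Selection" sit under "Selection", and "Female Mate Selection" sits under "Mate Selection". Terms left with no parent form the top level of the hierarchy.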
![Page 9: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/9.jpg)
Lexico-Syntactic Patterns
• X such as P, Q, R; X is a Y
• Grooming is a behaviour
• Copulation is an activity
• Dimorphism is a behaviour
• Calls such as trills, whistles, grunts
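The two patterns above can be approximated with regular expressions, as in the sketch below. This is a simplified illustration that matches single-word terms only; `IS_A`, `SUCH_AS` and `extract_isa` are hypothetical names, and the patterns actually used by the authors may differ.

```python
import re

# "X is a Y" / "X is an Y", single-word X and Y only.
IS_A = re.compile(r"\b(\w+) is an? (\w+)", re.IGNORECASE)
# "X such as P, Q, R": X is the broader term, P, Q, R the narrower ones.
SUCH_AS = re.compile(r"\b(\w+) such as ([\w ,]+)", re.IGNORECASE)


def extract_isa(sentence):
    """Return (narrower, broader) pairs found in one sentence."""
    pairs = []
    for m in IS_A.finditer(sentence):
        pairs.append((m.group(1).lower(), m.group(2).lower()))
    for m in SUCH_AS.finditer(sentence):
        broader = m.group(1).lower()
        items = re.split(r",\s*|\s+and\s+", m.group(2))
        pairs.extend((i.strip().lower(), broader) for i in items if i.strip())
    return pairs


grooming = extract_isa("Grooming is a behaviour")
copulation = extract_isa("Copulation is an activity")
calls = extract_isa("Calls such as trills, whistles, grunts")
```

A top-level term X could then be kept only if the corpus yields a pair such as (X, "behaviour") or (X, "activity"); a term like "dimorphism" would be dropped if no such sentence is found.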
![Page 10: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/10.jpg)
Results
• 64,000 terms extracted
• The regexp selected 10,335 terms
• Step 6a resulted in an ontology with 17,776 classes and 1,295 top level classes
• Step 6b resulted in an ontology with 13,058 classes and 912 top level classes
![Page 11: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/11.jpg)
Results (2) - Copulation Sub-tree
![Page 12: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/12.jpg)
Results (3)
• Evaluation of terms excluded by regexp:
• 56,000 terms excluded
• Random sample of 3,140 terms evaluated by hand
• 7 verbs and 42 nouns should not have been excluded
• E.g., “interaction”
• A recall of 0.905
![Page 13: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/13.jpg)
Discussion: The problem of focus
![Page 14: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/14.jpg)
Other Issues
• More a vocabulary than an ontology
• SKOS-like rather than OWL-like
• Can deal with “selection”, “mate selection” and “natural selection”
• Highly compositional terms, e.g. “Adult male grooming behaviour”
• Cleanish list of top level terms: Cannibalism, copulation, eating, foraging, fighting, grooming
![Page 15: Issues in Learning an Ontology from Text](https://reader033.fdocuments.us/reader033/viewer/2022052900/555d05add8b42add648b5720/html5/thumbnails/15.jpg)
Discussion: Is it useful?
• Answers to the earlier questions: not a “good ontology”, but useful, low-effort, and best used as a donor
• Useful ontological fragments
• Bringing ontology to ontology learning is the research challenge
• Limitations: noise; the problem of focus; only taxonomic relations
• Advantages: speed; ease; a step towards formal ontologies