Blaz Fortuna, Marko Grobelnik, Dunja Mladenic
Jozef Stefan Institute
http://ontogen.ijs.si
ONTOGEN: SEMI-AUTOMATIC ONTOLOGY EDITOR
Outline
Motivation
Functionality
Conclusion
HCII2007, July 26th
Blaz Fortuna, Jozef Stefan Institute, Slovenia
Motivation
What is an ontology?
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.
Generally it consists of:
  Classes: sets, collections, or types of objects
  Instances: the basic or "ground level" objects
  Relations: ways that objects can be related to one another
It can be used:
  … as a schema for a knowledge management system,
  … to reason about the objects within that domain, etc.
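The three building blocks above can be sketched as a small data structure. This is a minimal illustration only, not OntoGen's internal model: the `Ontology` class, the `subclass_of` relation name, and the `Company` / `Publisher` classes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: classes, instances, and relations (all names hypothetical)."""
    classes: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)  # instance name -> class name
    relations: set = field(default_factory=set)    # (subject, relation, object)

    def add_class(self, name, parent=None):
        self.classes.add(name)
        if parent is not None:
            self.classes.add(parent)
            self.relations.add((name, "subclass_of", parent))

    def add_instance(self, name, cls):
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls

    def ancestors(self, cls):
        """Very simple reasoning: follow subclass_of links upward (single parent)."""
        found = []
        current = cls
        while True:
            parents = sorted(o for (s, r, o) in self.relations
                             if s == current and r == "subclass_of")
            # Stop at the root, or if a cycle would be revisited.
            if not parents or parents[0] in found:
                return found
            current = parents[0]
            found.append(current)

    def instances_of(self, cls):
        """Instances of a class, including instances of its subclasses."""
        return sorted(name for name, c in self.instances.items()
                      if c == cls or cls in self.ancestors(c))


onto = Ontology()
onto.add_class("Company")
onto.add_class("Publisher", parent="Company")
onto.add_instance("Xerox", "Company")
onto.add_instance("The Washington Post", "Publisher")
print(onto.instances_of("Company"))  # ['The Washington Post', 'Xerox']
```

The `instances_of` query is the "reason about the objects" use in miniature: an instance of `Publisher` is also returned for `Company` because of the `subclass_of` relation.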
Sample Ontology
Creating Ontology
An ontology is normally designed by knowledge engineers using ontology editors (Protégé, OntoStudio, …).
Domain experts are needed to help the knowledge engineer understand the domain.
Ontology editors are not aware of the ontology's domain.
Our goal is to make the ontology editor easy to use and domain-aware, so that it can be used by domain experts.
This reduces the need for a knowledge engineer.
This is done through the use of text mining and machine learning.
In this presentation we focus on the construction of topic ontologies.
Ontology Editor
Domain Expert
Knowledge Engineer
Xerox
Xerox Corporation is a technology and services enterprise engaged in developing, manufacturing, marketing, servicing and financing a portfolio of document equipment, software, solutions and services. It manages its business in four segments: Production, Office, Developing Markets Operations (DMO) and Other. The Production segment includes black and white products, which operate at speeds over 90 pages per minute …
Yahoo!
Yahoo! Inc. is a provider of Internet products and services to consumers and businesses through the Yahoo! Network, its worldwide network of online properties. The Company's properties and services for consumers and businesses reside in four areas: Search and Marketplace, …
The Washington Post
Company's principal business activities consist of newspaper publishing (principally The Washington Post), television broadcasting (through the ownership and operation of six television broadcast stations), the ownership and operation of cable television systems, magazine publishing (principally Newsweek magazine), and (through its Kaplan subsidiary) the provision of educational services. …
How does it work?
OntoGen suggests concepts.
Suggestions are generated automatically:
  … from the text corpus, by clustering similar documents
  … based on a user query
  … through a text corpus map
The user selects appropriate suggestions and adds them to the ontology.
OntoGen helps the user decide which suggestions to include:
  … by extracting main keywords from the documents
  … with ontology and concept visualizations
  … by listing the documents behind concepts
Behind each concept there is a set of documents.
Documents are automatically assigned to concepts.
Document assignments can be edited manually.
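The "clustering similar documents" step can be sketched as follows. The slides do not say which clustering algorithm or document representation OntoGen uses, so this is an assumption-laden illustration: k-means over unit-length bag-of-words vectors with cosine similarity. The function names, the `seeds` parameter, and the four short documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector (term -> weight), scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def cosine(a, b):
    """Dot product of two sparse vectors; the cosine similarity for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Sum of sparse vectors, rescaled to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()}


def suggest_concepts(docs, seeds, iterations=10):
    """k-means with cosine similarity; `seeds` are indices of the initial centroids.

    Returns one cluster index per document. Each cluster is a candidate
    sub-concept, and its members are the documents "behind" that concept.
    """
    vectors = [vectorize(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment


docs = [
    "xerox document equipment printing software services",
    "yahoo internet search online network services",
    "office printing equipment document copier",
    "online internet advertising search marketplace",
]
print(suggest_concepts(docs, seeds=[0, 1]))  # [0, 1, 0, 1]
```

Fixed seed indices keep the sketch deterministic. The returned assignment is only a suggestion: the user can still accept, reject, or manually edit which documents belong to which concept.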
Example
[Figure: Domain, Text corpus, Ontology (Concept A, Concept B, Concept C)]
Functionality
Main Features
Interactive user interface
  User can interact in real time with the integrated machine learning and text mining methods
Concept discovery methods:
  Unsupervised
    System provides suggestions
  Supervised
    Concept learning
    Concept visualization
Methods for helping to understand the discovered concepts:
  Keyword extraction
    Generates a list of characteristic keywords of a given concept
  Concept visualization
    Creates a map of documents from a given concept
    Also available as a separate tool named Document Atlas
    http://docatlas.ijs.si
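Keyword extraction for a concept can be illustrated with a short sketch. The slides do not state the weighting scheme, so TF-IDF summed over the concept's documents is an assumption here, and `concept_keywords` and the sample corpus are hypothetical. The sketch also assumes every concept document is part of the corpus.

```python
import math
from collections import Counter


def concept_keywords(concept_docs, corpus, top_n=2):
    """Rank terms by TF-IDF summed over the concept's documents.

    Assumes every document in `concept_docs` also appears in `corpus`,
    so each term has a document frequency of at least one.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n_docs = len(corpus)
    scores = Counter()
    for doc in concept_docs:
        for term, tf in Counter(doc.lower().split()).items():
            # Terms that occur in every document get weight log(1) = 0.
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


corpus = [
    "xerox printing document printing document equipment",
    "printing document software services",
    "internet search services",
    "internet search advertising services",
]
concept_docs = corpus[:2]  # documents behind one concept
print(concept_keywords(concept_docs, corpus))
```

Terms that are frequent inside the concept but rare elsewhere in the corpus rank highest, which is what makes them characteristic of the concept rather than of the whole collection.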
Main view
Concept hierarchy
List of suggested sub-concepts
Ontology visualization
Selected concept
Concept suggestion
Selected concept
Suggested subconcepts
Add new concept
New concept
Personalized suggestions
Topics view
Countries view
UK takeovers and mergers
The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …
Lloyd’s CEO questioned in recovery suit in U.S.
Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …
Concept learning
Query
New Concept
Finish
Concept’s instances visualization
Instances are visualized as points on a 2D map.
The distance between two instances on the map corresponds to their content similarity.
Characteristic keywords are shown for all parts of the map.
The user can select groups of instances on the map to create sub-concepts.
Concept management
Concept’s details
Concept’s instance management
Selected concept
Keywords
Selected instance
Adding new documents to ontology
New documents
Classification of selected document
Content of selected document
Selected document
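How a newly added document might be classified into existing concepts can be sketched as below. The slides do not describe the classifier OntoGen uses, so this is a stand-in: a nearest-centroid rule over pooled bag-of-words counts with cosine similarity. The `classify` function, the concept names, and the sample texts are hypothetical, and the sketch assumes non-empty documents.

```python
import math
from collections import Counter


def unit_vector(texts):
    """Pooled bag-of-words counts for one or more texts, scaled to unit length."""
    counts = Counter(t for text in texts for t in text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def classify(document, concepts):
    """Return (concept name, cosine similarity) pairs, best match first."""
    doc_vec = unit_vector([document])
    scored = []
    for name, docs in concepts.items():
        concept_vec = unit_vector(docs)  # one vector per concept
        similarity = sum(w * concept_vec.get(t, 0.0) for t, w in doc_vec.items())
        scored.append((name, similarity))
    return sorted(scored, key=lambda pair: -pair[1])


concepts = {
    "Document equipment": ["printing document equipment", "document software services"],
    "Internet services": ["internet search services", "online network advertising"],
}
ranking = classify("online search advertising network", concepts)
print(ranking[0][0])  # Internet services
```

Returning the full ranking rather than a single label matches the interactive workflow: the tool proposes a classification and the user confirms or corrects it.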
Conclusions
Evaluation
The first prototype was successfully used in several commercial projects:
  Applied in multiple domains: business, legislation, and digital libraries
  Users were always domain experts with limited knowledge of and experience with ontology construction / knowledge engineering
  Valuable data from the first trials was used as input for the interface design of the second prototype (the one presented here)
Feedback from the users of the second prototype:
  The main impression was that the tool saves time and is especially useful when working with large collections of documents
  Among the main disadvantages were abstraction and an unattractive look
  Many users use the program for exploring the data
Future work
Tools for suggestion and learning of more complex relations
Extended support for collaborative editing of ontologies
Easier input of background knowledge
Improvement of the user interface based on feedback from user trials and real-world users
Questions? Comments?
Thank you for listening!
http://ontogen.ijs.si