IN350 Summary and Overview Judith Molka-Danielsen Nov.28.2003.


IN350 Summary and Overview

Judith Molka-Danielsen

Nov.28.2003

Major Topics
What is Content Management? What is Document Management and Information Steering? (2003)
What is the role of Markup Languages in Content Management?
Document Properties and Markup Languages (Text + Multimedia Languages + Properties, Ch. 6, Baeza-Yates)
File Organization and Storage Structures (Connolly)
Text Properties, Zipf's Law, Heaps' Law (Text Operations, Ch. 7, Baeza-Yates)
Text Operations, Ch. 7 (doc)
Search Enhancements (older notes) (htm) and Compression of Text (Compressing, Ch. 7, Cyganski)
Oracle Text Operations... Creating an Index and Types of Indices

Major Topics (continued)
Retrieval Evaluation Measures, with Precision and Recall (Read: Retrieval Evaluation, Ch. 3, Baeza-Yates)
The role of taxonomies in content management
Searching the Web (Read: Searching on the Web, Ch. 13, Baeza-Yates)
Multimedia Management (Read: Image Compression, Ch. 8, Cyganski, and Digital Video, Ch. 9, Cyganski)
Data Warehousing (Read: Data Warehousing, Ch. 13)
Large Capacity Storage (Ch. 17, Cyganski)
Document Publishing and Distribution, and older notes on Online Publishing
B2B e-commerce standards for document exchange
Ontologies in Document Exchange

COEUR-SW program

Ontologies

From the IDI Lecture series

Reference: Jon Atle Gulla

Nicola Guarino: Formal Ontology and Information Systems.

Robert Jasper and Mike Uschold: A Framework for Understanding and Classifying Ontology Applications.

Ontology ABC

Ontology has recently attracted attention across many fields of computer science.

There is no consensus definition of ontology. One of the most cited is “Ontology is an explicit representation of a conceptualization, the conceptualization includes a set of concepts, their definition and inter-relationships”.

In many cases, the term ontology is another name denoting the result of familiar activities like conceptual analysis and domain modeling.

The roles of ontology vary from knowledge management to semantic interoperability.

One important reason why ontology has attracted so much attention recently is the Semantic Web, since ontology is considered its key enabler.

More terminology

Ontology: engineering artifact
  Constituted by a vocabulary (concepts, relations)
  Assumptions about intended meaning

Formalization:
  Logical theory accounting for the intended meaning of a formal vocabulary
  Committed to a particular conceptualization of the world

Ontology vs. conceptualization
  Conceptualization is language-independent
  Ontology is language-dependent

Example 1. Ontology of American Universities

SHOE ontology of university concepts

<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<!DOCTYPE ontology SYSTEM "http://…/onto.dtd">
<ontology id="university-ont" version="2.1" description="…">
  <def-category name="Department" isa="EducationalOrganization" short="university department" />
  <def-category name="Activity" isa="SHOEEntity" short="activity" />
  <def-category name="Work" isa="Activity" short="work" />
  <def-category name="Course" isa="Work" short="teaching course" />
  ….
</ontology>
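The isa attributes in the SHOE fragment form a small subsumption hierarchy. As a minimal sketch (not part of the original slides), the four categories can be copied into a Python dictionary and the chain of broader categories followed upward. The function name `ancestors` is an illustrative assumption, not SHOE terminology.

```python
# isa links copied from the def-category entries shown in the slide.
ISA = {
    "Department": "EducationalOrganization",
    "Activity": "SHOEEntity",
    "Work": "Activity",
    "Course": "Work",
}

def ancestors(category, isa=ISA):
    """Return every category reached by following isa links upward."""
    chain = []
    while category in isa:
        category = isa[category]
        chain.append(category)
    return chain

print(ancestors("Course"))  # ['Work', 'Activity', 'SHOEEntity']
```

A program that knows the hierarchy can conclude that every Course is also an Activity, which is the kind of meaning a plain tag name does not carry.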


Example 2. Business Process Ontology

MIT process handbook

Sell financial service

Sell savings & investment service

Sell loan

Sell management service

Sell account access services

Sell ATM access

Sell telephone access

Sell reserve credit

Sell credit card

Sell installment loan

Sell letter of credit

Sell mortgage

Sell credit line

Sell certificate of deposit

Sell account

Sell mutual funds

Sell retirement plan

Example 3. Hierarchical Categories?

Can hierarchical categories be ontologies?

Conceptualization of medical domain?

More Confusion

Differences and similarities

Ontology

Thesaurus

Dictionary

Categories?

The Semantic Web

Goal: evolve the Web
  From sites designed for human consumption
  To sites also understandable and usable by computer programs

What would that do for us?
  Query answering rather than document retrieval
  Services findable, usable, and composable by automated agents
  Information exchange among independently designed programs

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

How do we get there from here?

For services:
  Service description
  Ontologies to provide intended meaning of service items

For documents:
  Structure, à la XML
  Ontologies to provide intended meaning of terms

XML Describes Document Structure

HTML

Language for describing how to display document content

E.g., tag a word to be displayed in bold or italic

XML

Language for describing the structure of document content

E.g., declare data to be a retail price, a sales tax, a book title, ...

Uniform method for describing and exchanging data using HTTP

Provides a “syntactic schema”

XML allows authors to create their own markup (e.g. <AUTHOR>), which seems to carry some semantics. However, from a computational perspective, a tag like <AUTHOR> carries as much semantics as a tag like <H1>. A computer simply does not know what an author is, or how the concept author is related to, e.g., the concept person.

Bibliographic Entry in XML

<Publication URL="ftp://db.stanford … xml.ps">
  <Title> From Semistructured Data ... Language </Title>
  <Author> R. Goldman </Author>
  <Published> Proceedings of ... Databases </Published>
  <Location>
    <City> Philadelphia </City>
    <State> Pennsylvania </State>
  </Location>
  <Date>
    <Month> June </Month>
    <Year> 1999 </Year>
  </Date>
</Publication>

Location of what?

When in June?
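To illustrate why tag names alone carry no meaning for a program, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The entry is a trimmed version of the bibliographic record (the fields elided in the slide are left out), and `leaf_values` is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliographic entry; elided fields are omitted.
ENTRY = """
<Publication>
  <Author> R. Goldman </Author>
  <Location><City> Philadelphia </City><State> Pennsylvania </State></Location>
  <Date><Month> June </Month><Year> 1999 </Year></Date>
</Publication>
"""

def leaf_values(xml_text):
    """Return (tag, text) pairs for every element that has no children."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip())
            for el in root.iter() if len(el) == 0]

original = leaf_values(ENTRY)
renamed = leaf_values(ENTRY.replace("Author", "H1"))

# The parser handles both documents identically: only the label differs.
print(original[0])  # ('Author', 'R. Goldman')
print(renamed[0])   # ('H1', 'R. Goldman')
```

Nothing in the parsed tree says that Location is the location of the conference rather than of the author, or that an Author is a person. That information has to come from somewhere outside the XML structure.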

XML Is Not Enough

Language for describing the structure of document content

E.g., declare data to be a retail price, a sales tax, a book title, ...

Uniform method for describing and exchanging data using HTTP

Provides a “syntactic schema”

Provides no means of specifying intended meaning of tags

Ontologies enable independently developed programs to exchange data

Ontologies specify intended meaning in a computer interpretable form

W3C Semantic Web Activity

Semantic Web Activity (http://www.w3.org/2001/sw/)

“Established to serve a leadership role, in both the design of enabling specifications and the open, collaborative development of technologies that support the automation, integration and reuse of data across various applications.”

Successor to the W3C Metadata Activity

RDF Core Working Group (http://www.w3.org/2001/sw/RDFCore/)

Responsible for the Resource Description Framework (RDF)

Web Ontology Working Group (http://www.w3.org/2001/sw/WebOnt/)

Charter: Build upon the RDF Core work a language for defining structured web based ontologies which will provide richer integration and interoperability of data among descriptive communities

Developing the Web Ontology Language (OWL)

Based on DAML+OIL, developed in DARPA’s Agent Markup Language program

Resource Description Framework

A simple representation language for describing Web resources

All sentences are triples of the form “(Property Subject Object)”
  Property is a binary relation
  Subject is a URI reference
  Object is either a URI reference or a literal
  E.g., (creatorOf http://www.w3.org/Lassila “Ora Lassila”)

XML external syntax

Model theoretic semantics

Includes a resource “Class” and properties “type”, “subClassOf”, etc.

Supports classes of resources and literals
  E.g., (type Clyde Elephant)

Supports subclass hierarchies
  E.g., (subClassOf Elephant Mammal)

Like a primitive frame representation language
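A set of triples can be held and queried with very little machinery. The sketch below is an illustration only, not an RDF library: it stores the slide's examples as Python tuples in (Property, Subject, Object) order, so the instance Clyde is the subject of the type statement, and it uses the RDF Schema spelling subClassOf. The function `match` and its None-as-wildcard convention are assumptions made for this example.

```python
# Triples in the slide's (Property, Subject, Object) order.
TRIPLES = {
    ("creatorOf", "http://www.w3.org/Lassila", "Ora Lassila"),
    ("type", "Clyde", "Elephant"),
    ("subClassOf", "Elephant", "Mammal"),
}

def match(triples, prop=None, subject=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return sorted(
        t for t in triples
        if (prop is None or t[0] == prop)
        and (subject is None or t[1] == subject)
        and (obj is None or t[2] == obj)
    )

print(match(TRIPLES, prop="type"))
# [('type', 'Clyde', 'Elephant')]
print(match(TRIPLES, subject="Elephant"))
# [('subClassOf', 'Elephant', 'Mammal')]
```

Every question about the data reduces to a pattern over the three positions, which is what makes the model easy to exchange between independently written programs.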

RDF

Classes: Resource, Property, Literal, Statement, Container (Bag, Seq, Alt)

Properties: type, subject, predicate, object

RDF Schema

Properties: subClassOf, subPropertyOf, seeAlso, isDefinedBy, comment, label, range, domain, member

Classes: Class, ContainerMembershipProperty

[Diagram: class hierarchy with Resource at the top; below it Class, Property (with ContainerMembershipProperty), Literal, Container (with Bag, Seq, Alt) and Statement]

RDF-S Class and Property Definitions

<rdf:Class ID="MotorVehicle">
  <rdfs:subClassOf rdf:resource="http.../PR-rdf-schema-19990303#Resource"/>
</rdf:Class>

<rdf:Class ID="PassengerVehicle">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Class>

<rdf:Class ID="Van">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Class>

<rdf:Class ID="MiniVan">
  <rdfs:subClassOf rdf:resource="#Van"/>
  <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
</rdf:Class>

<rdf:Property ID="registeredTo">
  <rdfs:domain rdf:resource="#MotorVehicle" />
  <rdfs:range rdf:resource="#Person" />
</rdf:Property>

Christine is a passenger vehicle.

Is Christine a motor vehicle? Yes.

Christine is registered to Arnie.

What is Arnie? A person.
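The two answers about Christine and Arnie follow from the subClassOf, domain and range declarations. The following is a minimal sketch of that reasoning in plain Python, assuming the class and property names from the definitions on this slide; the function names `superclasses` and `infer_types` are hypothetical, and a real RDF Schema reasoner does considerably more.

```python
# Schema copied from the RDF-S definitions in the slide.
SUBCLASS_OF = {
    "PassengerVehicle": {"MotorVehicle"},
    "Van": {"MotorVehicle"},
    "MiniVan": {"Van", "PassengerVehicle"},
}
DOMAIN = {"registeredTo": "MotorVehicle"}
RANGE = {"registeredTo": "Person"}

def superclasses(cls):
    """Return cls together with everything reachable through subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(SUBCLASS_OF.get(current, ()))
    return seen

def infer_types(type_facts, property_facts):
    """Derive each individual's classes from stated types, domains and ranges."""
    types = {}
    for individual, cls in type_facts:
        types.setdefault(individual, set()).update(superclasses(cls))
    for prop, subject, obj in property_facts:
        if prop in DOMAIN:
            types.setdefault(subject, set()).update(superclasses(DOMAIN[prop]))
        if prop in RANGE:
            types.setdefault(obj, set()).update(superclasses(RANGE[prop]))
    return types

types = infer_types(
    type_facts={("Christine", "PassengerVehicle")},
    property_facts={("registeredTo", "Christine", "Arnie")},
)
print("MotorVehicle" in types["Christine"])  # True
print(types["Arnie"])                        # {'Person'}
```

Note that the range declaration is used to infer that Arnie is a person, not to reject the statement if Arnie's type is unknown.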

Comments on RDF and RDF-S

Severely lacking in expressive power

Domain and range constraints rather than Value-Type
  E.g., can’t define class of people all of whose children are male

No cardinality constraints
  Particularly important for “exactly 1” and “at most 1”

No decompositions
  Particularly important for “disjoint” and “exhaustive”

No axioms

No negation (!)
  Not useful for checking consistency
  E.g., can’t prove an object is not an instance of a class

Basically a typing system

More powerful ontology representation languages are needed.

Ontology languages

The DAML Program

DAML: DARPA Agent Markup Language

Goal: achieve semantic interoperability between Web pages, databases, programs, and sensors.

DAML+OIL:

This language gets its strange name because it was created by a Joint Committee of US and European researchers who were working on two different, but similar languages.

DAML stands for the DARPA Agent Markup Language, which is a project being funded by the US Defense Advanced Research Projects Agency -- the same organization that funded much of the original work on the Internet (which was then called the ARPAnet).

OIL stands for the Ontology Interchange Language and is developed by a number of researchers, primarily a group funded by the European Union's Information Society Technologies Program.

The joint committee created a new language with the best features of SHOE, DAML, OIL and several other markup approaches. At the time of this writing, DAML+OIL is the most advanced web ontology language, and it is expected to provide the basis for future web standards for ontologies (OWL).

Web site: http://www.daml.org/

DAML+OIL

A representation language for user-defined ontologies

An ontology added to RDF and RDF-Schema

Specification document:

http://www.daml.org/2000/12/daml+oil-index.html

Expressive power analogous to: Description logics (e.g., CLASSIC)

Monotonic frame languages (e.g., OKBC knowledge model)

Designed in collaboration with the European Community

Designers of the Ontology Inference Layer (OIL)

Basis for OWL, the candidate W3C standard

DAML+OIL Classes

Thing

Restriction

List

Ontology

AbstractProperty

TransitiveProperty

DatatypeProperty

UniqueProperty

UnambiguousProperty

Nothing

DAML+OIL Properties

Equivalence: equivalentTo, sameClassAs, samePropertyAs

Lists: first, rest, item

Properties: inverseOf

Ontologies: versionInfo, imports

Classes: disjointWith

Defining non-primitive classes: unionOf, disjointUnionOf, intersectionOf, complementOf, oneOf

Restrictions:
  onProperty, toClass, hasValue, hasClass, hasClassQ
  minCardinality, maxCardinality, cardinality
  minCardinalityQ, maxCardinalityQ, cardinalityQ

Property Restrictions on Classes

<Class ID="Person">
  <comment>Person is a subclass of objects whose parents are persons.</comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasParent" />
      <daml:toClass rdf:resource="#Person" />
    </daml:Restriction>
  </rdfs:subClassOf>
  <comment>Person is a subclass of resources that have exactly one father.</comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasFather" />
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</Class>

[Diagram: Person lies inside both of these sets]

All objects all of whose parents are persons

All objects that have exactly 1 father
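The two restrictions on Person can be read as conditions on the values of a property. The sketch below checks them against a small set of known facts. It rests on a loudly stated assumption: it treats the known facts as complete (a closed-world check), whereas DAML+OIL itself has open-world semantics and uses restrictions to draw inferences rather than to validate data. The individuals Ann, Bob, Cat and Rex and both function names are hypothetical.

```python
def satisfies_to_class(individual, prop, cls, facts, types):
    """toClass: every known value of prop is an instance of cls.

    Vacuously true when the individual has no values for prop.
    ASSUMPTION: the fact set is treated as complete (closed world).
    """
    values = [o for (p, s, o) in facts if p == prop and s == individual]
    return all(cls in types.get(v, set()) for v in values)

def satisfies_cardinality(individual, prop, n, facts):
    """cardinality: the individual has exactly n distinct known values."""
    values = {o for (p, s, o) in facts if p == prop and s == individual}
    return len(values) == n

# Hypothetical facts in (Property, Subject, Object) order.
FACTS = {
    ("hasParent", "Ann", "Bob"),
    ("hasParent", "Ann", "Cat"),
    ("hasFather", "Ann", "Bob"),
    ("hasParent", "Rex", "Fido"),
}
TYPES = {"Bob": {"Person"}, "Cat": {"Person"}, "Fido": {"Dog"}}

print(satisfies_to_class("Ann", "hasParent", "Person", FACTS, TYPES))  # True
print(satisfies_cardinality("Ann", "hasFather", 1, FACTS))             # True
print(satisfies_to_class("Rex", "hasParent", "Person", FACTS, TYPES))  # False
```

Neither condition can be stated in RDF Schema alone, which is the gap DAML+OIL restrictions fill.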

Formal ontology and information systems

This paper offers a systematic account of the central role ontologies may play in information systems.

Ontologies may affect the three main components of an information system: information resources, user interfaces and application programs.

In AI, an ontology is an engineering artifact. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.

Kinds of ontologies, depending on level of generality

Top-level ontologies: general concepts like space, time, matter, object, event, etc., which are independent of a particular problem or domain.

Domain ontologies: the vocabulary related to a generic domain (e.g., medicine), obtained by specializing the terms of the top-level ontology.

Task ontologies: describe generic tasks or activities (e.g., diagnosing or selling).

Application ontologies: describe concepts depending on both a particular domain and a particular task. An application ontology is a particular knowledge base, describing facts assumed to be always true by a community of users.

Ontology-driven information systems

An IS consists of components of three different types: application programs, information resources, and user interfaces. Ontologies can play a central role here.

Two dimensions for analysis:
  Temporal dimension: using ontologies at development time or run time.
  Structural dimension: impact of ontologies on different IS components.

The structural dimension: impact of ontologies on IS components

Using an ontology for the database component

An ontology can be compared with the schema component of a database.

At development time, the conceptual model resulting from requirements analysis can be represented as a computer-processable ontology and from there mapped to concrete target platforms.

Another main use of ontologies at development time is information integration.

At run time, explicit ontologies (run-time accessible database schemas) are at the core of the mediation-based approach to information integration.

Using an ontology for the user interface component

Allow the user to query and browse the ontology.

The user can browse the ontology in order to better understand the vocabulary used by the IS, and can therefore formulate queries at the desired level of specificity.

Another use is vocabulary decoupling: the user can use his or her own natural language terms, which are mapped to the IS vocabulary with the help of the ontology.
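Mapping a user's own terms onto the vocabulary of the IS can be sketched as a lookup from everyday words to ontology concepts. This is only an illustration under stated assumptions: the synonym table, the concept names (borrowed from the vehicle example earlier in the deck) and the function `to_is_vocabulary` are all hypothetical, and a real system would need proper linguistic processing rather than whitespace splitting.

```python
# Hypothetical mapping from user terms to concepts in the IS ontology.
SYNONYMS = {
    "car": "MotorVehicle",
    "automobile": "MotorVehicle",
    "minivan": "MiniVan",
    "owner": "Person",
}

def to_is_vocabulary(query, synonyms=SYNONYMS):
    """Replace each known user term with its ontology concept.

    Terms missing from the table are passed through unchanged.
    """
    return [synonyms.get(word.lower(), word) for word in query.split()]

print(to_is_vocabulary("Automobile owner"))  # ['MotorVehicle', 'Person']
print(to_is_vocabulary("red minivan"))       # ['red', 'MiniVan']
```

Keeping the mapping in the ontology rather than in the interface code means new user terms can be added without changing the application.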

Impact of ontologies on IS components (continued)

Using an ontology for the application program component

Application programs encode knowledge in the form of type or class declarations and procedures.

The ontological commitment of the program should be made explicit using ontologies.

Further, for ease of maintenance and flexibility, the program can be turned into a knowledge-based system.

Conclusion

Ontology-driven information systems

Different types of ontologies

The role of ontology in IS
  Time dimension: development time vs. run time
  Structural dimension: information resources, user interfaces, and application programs

Common access to information

Use ontologies to enable multiple target applications (or humans) to have access to heterogeneous sources of information (ontology based information integration).

Four categories:
  Human communication
  Data access via shared ontology
  Data access via mapped ontology
  Shared services

Human communication

Promote common understanding among knowledge workers.

Supporting technologies include ontology editors and browsers.

Example: the Workflow Management Coalition reference documents.

Maturity: library classification skills have a long history (knowledge workers sharing an ontology in the form of a glossary).

Data access via shared ontology

An ontology can be used as an interchange format to enable common access to operational data.

Example: Process Interchange Format (PIF) and EcoCyc

Maturity: commercial success exists in some contexts, while in others the technology is a long way from being mature. It is difficult to agree on a common ontology.

Shared services

Similar to data access via shared ontology, but differs in what is being shared. The ontology defines interfaces in multiple target languages.

Example: UML is used to create an ontology for product data management; this ontology is then used to generate interface code for the client and server.

Maturity: relatively mature

Ontology-based search

Use an ontology for searching an information repository for desired resources.

Example: Yahoo

Maturity: Many commercial internet portals are beginning to explore the use of concepts for ontology-based search.

Endeca, Kaidara
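One common way to use concepts in search is to expand a query concept to all of its subconcepts before matching. The sketch below shows the idea with the vehicle hierarchy from the RDF-S slide; it does not describe how Yahoo, Endeca or Kaidara work. The document index and the function names `expand` and `search` are hypothetical.

```python
# Subclass links from the RDF-S example earlier in the deck.
SUBCLASS_OF = {
    "PassengerVehicle": ["MotorVehicle"],
    "Van": ["MotorVehicle"],
    "MiniVan": ["Van", "PassengerVehicle"],
}

# Hypothetical repository: each document is tagged with ontology concepts.
INDEX = {
    "doc1": {"MiniVan"},
    "doc2": {"Person"},
    "doc3": {"Van"},
}

def expand(concept):
    """Return the concept plus all of its direct and indirect subclasses."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result

def search(concept, index=INDEX):
    """Return documents tagged with the concept or any narrower concept."""
    wanted = expand(concept)
    return sorted(doc for doc, tags in index.items() if wanted & tags)

print(search("Van"))           # ['doc1', 'doc3']
print(search("MotorVehicle"))  # ['doc1', 'doc3']
print(search("Person"))        # ['doc2']
```

A keyword search for "Van" would miss doc1, which is tagged only with MiniVan; the hierarchy supplies the missing link.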

Conclusion

The paper presents a framework for understanding ontology applications.

We studied:
  The framework
  Various ontology application scenarios (use cases):
    Ontology as specification
    Common access to information
      Human communication
      Data access via shared ontology
      Data access via mapped ontology
      Shared services
    Ontology-based search

Summary

Ontology ABC

Motivation: the Semantic Web

RDF and RDFS

Brief introduction to state-of-the-art ontology languages

In-depth introduction to one such language: DAML+OIL

Impact of ontologies on information systems

Classification of ontology applications


IN350 Document Management and Information Steering