Semantic Web - Yolanetwork536.yolasite.com/resources/semantic web.pdf · •Web Data Annotation...

Posted on 22-Jul-2020



Semantic Web

Tahani Aljehani

Motivation: Example 1

• You are interested in SOAP Web architecture

• Use your favorite search engine to find articles about SOAP

• Keywords-based search

• You'll get lots of information, both relevant and irrelevant

• Dishwashing soap, facial soap, (...)

• You still have to do a lot of work to find the information you need

Motivation: Example 1

• What was the problem?

• Simple keyword matching does not take “semantics” into account

• The word “soap” means different things in different contexts (lexical ambiguity)

• Semantics of a word = meaning of the word

Motivation: Example 2

• You are interested in learning about the former kings of Saudi Arabia

• Again, the keyword “former” would not match the words “previous” or “old”

• Most search engines do not search intelligently, for example, by exploiting synonymy

• The information is available on the Web, but you might not find it

• The problem becomes worse when information is in another language, e.g. Arabic

Limitation of the current Web

• Why is finding relevant information hard?

• Synonymy (above, Example 2)

• Homonymy (above, Example 1)

• Spelling variants:

• e.g. “organize” in American English vs. “organise” in British English

• Spelling mistakes

• Multiple languages

• English, Arabic, French,…

Limitation of the current Web

• Tasks often require combining data from the Web

– Searching for the same information in different digital libraries

– Information may come from different web sites and needs to be combined

• Some existing Web sites provide limited facilities for combining data from various sources

– But these do not scale

How to improve the existing Web?

• Increasing automatic linking among data

• Increasing accuracy in search

• Increasing automation in data integration

• Adding semantics to data is the solution!

What is the Semantic Web?

• A Web of data

• It is not a Web of pages

• It describes the relationships between things

– X is-a-student-of Y

• It describes properties of things

– price, color, ...

What is the Semantic Web?

• The next generation of the WWW

• Information has machine-processable and machine-understandable semantics

• Not a separate Web but an augmentation of the current one

• RDF and ontologies form the backbone of the Semantic Web

Semantic Web

“Semantics” means the study of meaning

• “The Semantic Web is a major research initiative of the World Wide Web Consortium (W3C) to create a metadata-rich Web of resources that can describe themselves not only by how they should be displayed (HTML) or syntactically (XML), but also by the meaning of the metadata.”

• An enhancement to the current Web, not a replacement

• “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

The Semantic Web is about…

• Web Data Annotation – connecting (syntactic) Web objects, like text chunks, images, … to their semantic notion (e.g., this image is about Innsbruck, Dieter Fensel is a professor)

• Data Linking on the Web (Web of Data) – global networking of knowledge through URI, RDF, and SPARQL (e.g., connecting my calendar with my RSS feeds, my pictures, ...)

• Data Integration over the Web – seamless integration of data based on different conceptual models (e.g., integrating data coming from my two favorite book sellers)

The structure of data integration

Same book in French

Start making queries…

• A user of dataset “F” can now ask queries like:

– “give me the title of the original”

– or rather, in French: « donne-moi le titre de l’original »

• This information is not in the dataset “F”…

• …but can be retrieved by merging with dataset “A”!
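The merge-then-query idea above can be sketched in a few lines of Python. This is a minimal sketch. It assumes each dataset is a plain set of (subject, predicate, object) tuples, and every identifier (a:book1, f:livre1, f:original, a:title, and so on) and both titles are hypothetical placeholders, not the slides' data. The point is that dataset “F” refers to the original through the same identifier that dataset “A” uses, so a plain set union is enough to connect them.

```python
# Hypothetical datasets as sets of (subject, predicate, object) triples.
# All identifiers and titles are invented placeholders for illustration.
dataset_a = {
    ("a:book1", "a:title", "The Original Title"),
    ("a:book1", "a:author", "a:person1"),
}
dataset_f = {
    ("f:livre1", "f:titre", "Le titre traduit"),
    ("f:livre1", "f:original", "a:book1"),  # F reuses A's identifier for the original
    ("f:livre1", "f:auteur", "f:person7"),
}


def merge(*graphs):
    """Merge graphs by taking the union of their triples (no blank nodes involved)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def original_title(graph, book):
    """Follow f:original from a book, then read the a:title of the original."""
    titles = set()
    for original in objects(graph, book, "f:original"):
        titles |= objects(graph, original, "a:title")
    return titles


merged = merge(dataset_a, dataset_f)
print(original_title(dataset_f, "f:livre1"))  # set(): the title is not in "F" alone
print(original_title(merged, "f:livre1"))     # {'The Original Title'}
```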

However, more can be achieved…

• We “feel” that a:author and f:auteur should be the same

• But an automatic merge does not know that!

• Let us add some extra information to the merged data:

– a:author same as f:auteur

– both identify a “Person”

– a term that a community may have already defined:

• a “Person” is uniquely identified by his/her name and, say, homepage

• can be used as a “category” for certain types of resources

Start making richer queries!

• A user of dataset “F” can now query:

– « donne-moi la page d’accueil de l’auteur de l’original »

– i.e., “give me the home page of the original’s ‘auteur’”

• The information is not in datasets “F” or “A”…

but was made available by:

– merging datasets “A” and “F”

– adding the extra information that a:author and f:auteur are the same
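The effect of the extra “glue” statement can be sketched the same way. This is a minimal sketch that assumes the same tuple-based representation, and all identifiers and the home page URL are hypothetical. The slide says “same as”. For two properties, OWL's term is owl:equivalentProperty, so the sketch uses that instead. The rule is applied in a single pass, which means chains of equivalences are not followed.

```python
# Hypothetical merged data; all identifiers and the URL are invented placeholders.
merged = {
    ("a:book1", "a:author", "a:person1"),
    ("a:person1", "a:homepage", "http://example.org/home"),
    ("f:livre1", "f:original", "a:book1"),
    ("f:livre1", "f:auteur", "f:person7"),
}
# The extra "glue": a:author and f:auteur mean the same thing.
glue = {("a:author", "owl:equivalentProperty", "f:auteur")}


def apply_equivalent_properties(graph):
    """For each declared pair of equivalent properties, copy triples in both directions."""
    pairs = {(s, o) for (s, p, o) in graph if p == "owl:equivalentProperty"}
    result = set(graph)
    for p1, p2 in pairs:
        for (s, p, o) in graph:
            if p == p1:
                result.add((s, p2, o))
            elif p == p2:
                result.add((s, p1, o))
    return result


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def homepage_of_original_auteur(graph, book):
    """Follow book -> f:original -> f:auteur -> a:homepage."""
    pages = set()
    for original in objects(graph, book, "f:original"):
        for auteur in objects(graph, original, "f:auteur"):
            pages |= objects(graph, auteur, "a:homepage")
    return pages


print(homepage_of_original_auteur(merged, "f:livre1"))  # set()
enriched = apply_equivalent_properties(merged | glue)
print(homepage_of_original_auteur(enriched, "f:livre1"))  # {'http://example.org/home'}
```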

Combine with different datasets

• Using, e.g., the “Person”, the dataset can be combined with other sources

• For example, data in Wikipedia can be extracted using dedicated tools

– e.g., the DBpedia project can extract the “infobox” information from Wikipedia

So where is the Semantic Web?

• The Semantic Web provides technologies to make such integration possible!

• Hopefully you will have the full picture by the end of the tutorial…

Semantic Web technology stack as a framework

Semantic Web Technologies

– Hypertext Web technologies

– Standardized Semantic Web technologies

– Unrealized Semantic Web technologies

Hypertext Web technologies

• Internationalized Resource Identifier (IRI) – a generalization of URI. The Semantic Web needs unique identification to allow provable manipulation of resources in the top layers.

• Unicode – the Semantic Web should also help to bridge documents in different human languages, so it should be able to represent them.

• XML – a markup language that enables the creation of documents composed of structured data. The Semantic Web gives meaning (semantics) to structured data.

• XML Namespaces – provide a way to use markup from multiple sources. The Semantic Web is about connecting data together, so a single document needs to refer to multiple sources.

Standardized Semantic Web technologies

• Resource Description Framework (RDF) – a framework for creating statements in the form of so-called triples. It enables information about resources to be represented as a graph.

• RDF Schema (RDFS) – provides a basic vocabulary for RDF. Using RDFS it is, for example, possible to create hierarchies of classes and properties.

• Web Ontology Language (OWL) – allows stating additional constraints, such as cardinality, restrictions on values, or characteristics of properties such as transitivity. It is based on description logic and so brings reasoning power to the Semantic Web.

• SPARQL – an RDF query language that can be used to query any RDF-based data (i.e., including statements involving RDFS and OWL). A query language is necessary to retrieve information for Semantic Web applications.
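To give a feel for what a SPARQL query does, here is a minimal sketch of evaluating a basic graph pattern over a set of triples. A basic graph pattern is a conjunction of triple patterns, with variables written ?x. This is not SPARQL and not any library's API. The data and the short names (type, starsIn, Actor) are hypothetical and are written without namespaces. The query corresponds roughly to SELECT ?x ?film WHERE { ?x :type :Actor . ?x :starsIn ?film }.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, joining bindings pattern by pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in graph:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = {
    ("WillSmith", "type", "Actor"),
    ("WillSmith", "starsIn", "MenInBlack"),
    ("MenInBlack", "type", "Film"),
}
results = query(graph, [("?x", "type", "Actor"), ("?x", "starsIn", "?film")])
print(results)  # [{'?x': 'WillSmith', '?film': 'MenInBlack'}]
```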

Unrealized Semantic Web technologies

• RIF or SWRL – will bring support for rules. This is important, for example, to allow describing relations that cannot be described directly using the description logic underlying OWL.

• Cryptography – is important to ensure and verify that Semantic Web statements come from a trusted source. This can be achieved by appropriate digital signatures of RDF statements.

– Trust in derived statements will be supported by (a) verifying that the premises come from trusted sources and (b) relying on formal logic when deriving new information.

• User interface – the final layer, which will enable humans to use Semantic Web applications.

RDF

• RDF was designed to provide a common way to describe information so it can be read and understood by computer applications.

• RDF descriptions are not designed to be displayed on the web.

• RDF documents are commonly written in XML. The XML syntax used by RDF is called RDF/XML (other serializations, such as Turtle, also exist).

RDF

• By using XML, RDF information can easily be exchanged between different types of computers using different types of operating systems and application languages

• RDF uses Web identifiers (URIs) to identify resources.

• RDF describes resources with properties and property values.
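To show how an RDF/XML document encodes resources, properties, and property values, the sketch below reads a tiny document with Python's standard xml.etree.ElementTree and turns each property element into a triple. It is only a sketch. It handles rdf:Description elements that have an rdf:about subject and whose properties are either literals or rdf:resource references. A real RDF/XML parser supports much more of the syntax. The http://example.org/si# vocabulary namespace is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny RDF/XML document; the "si" namespace URI is a hypothetical placeholder.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:si="http://example.org/si#">
  <rdf:Description rdf:about="http://www.w3schools.com/rdf">
    <si:author>Jan Egil Refsnes</si:author>
    <si:homepage rdf:resource="http://www.w3schools.com"/>
  </rdf:Description>
</rdf:RDF>"""


def rdfxml_triples(text):
    """Return (subject, predicate, object) triples from simple rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree expands si:author to "{http://example.org/si#}author";
            # the predicate IRI is the namespace URI followed by the local name.
            ns, local = prop.tag[1:].split("}", 1)
            # rdf:resource points at another resource; otherwise use the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, ns + local, obj))
    return triples


for triple in rdfxml_triples(DOC):
    print(triple)
```

The second property shows that a property value can itself be another resource.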

RDF

• Explanation of Resource, Property, and Property value:

– A Resource is anything that can have a URI, such as "http://www.w3schools.com/rdf"

– A Property is a Resource that has a name, such as "author" or "homepage"

– A Property value is the value of a Property, such as "Jan Egil Refsnes" or "http://www.w3schools.com" (note that a property value can be another resource)

RDF

• RDF Statements – The combination of a Resource, a Property, and a Property value forms a Statement (known as the subject, predicate, and object of a Statement).

• Statement: "The author of http://www.w3schools.com/rdf is Jan Egil Refsnes".

– The subject of the statement above is: http://www.w3schools.com/rdf

– The predicate is: author

– The object is: Jan Egil Refsnes
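A statement maps directly onto a subject-predicate-object tuple. The minimal sketch below stores the example statement and prints it in N-Triples, a line-based RDF syntax that puts IRIs in angle brackets and literals in double quotes. Every RDF predicate must be an IRI, but the slide's predicate is just the word “author”. The sketch therefore assumes a hypothetical full IRI, http://example.org/terms/author.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # an IRI
    predicate: str  # an IRI
    obj: str        # a plain string literal in this sketch

    def to_ntriples(self) -> str:
        """Serialize as one N-Triples line: <subject> <predicate> "literal" ."""
        literal = (
            self.obj.replace("\\", "\\\\")  # escape backslashes first
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'<{self.subject}> <{self.predicate}> "{literal}" .'


stmt = Statement(
    subject="http://www.w3schools.com/rdf",
    predicate="http://example.org/terms/author",  # hypothetical predicate IRI
    obj="Jan Egil Refsnes",
)
print(stmt.to_ntriples())
# <http://www.w3schools.com/rdf> <http://example.org/terms/author> "Jan Egil Refsnes" .
```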

Limitation of RDF

• RDF (together with RDF Schema) provides only basic terms such as:

– type

– subClassOf

– subPropertyOf

– range

– domain

– label

– comment

Examples

• type – a resource belongs to a certain class

– <WillSmith> <type> <Actor>

– This defines which properties will be relevant to Will Smith

• subClassOf – every member of a class also belongs to its parent class

– <Actor> <subClassOf> <Person>
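These two terms already allow a machine to derive facts that nobody stated. This is a minimal sketch that treats triples as plain string tuples with the slide's short names. It repeatedly applies the rule “if x has type C and C is a subClassOf D, then x has type D”, together with transitivity of subClassOf, until nothing new appears. The extra <Person> <subClassOf> <Agent> triple is hypothetical and is added only to show a chain.

```python
def rdfs_type_inference(graph):
    """Apply (x type C) + (C subClassOf D) => (x type D), and transitivity of
    subClassOf, until no new triples are produced."""
    inferred = set(graph)
    while True:
        new = set()
        sub = {(s, o) for (s, p, o) in inferred if p == "subClassOf"}
        # Propagate types up the class hierarchy.
        for (x, p, c) in inferred:
            if p == "type":
                for (c2, d) in sub:
                    if c == c2:
                        new.add((x, "type", d))
        # subClassOf is transitive.
        for (c, d) in sub:
            for (d2, e) in sub:
                if d == d2:
                    new.add((c, "subClassOf", e))
        new -= inferred
        if not new:
            return inferred
        inferred |= new


graph = {
    ("WillSmith", "type", "Actor"),
    ("Actor", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),  # hypothetical extra class
}
result = rdfs_type_inference(graph)
print(sorted(result - graph))
```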

OWL = Web Ontology Language

Ontologies

• Ontologies? – Definition and classification of concepts and entities, and the relationships between them.

• Provide a mechanism for defining the relationships among different words and, for the Semantic Web, the relationships among different resources

Ontologies

Based on the basic elements of RDF; adds more vocabulary for describing properties and classes:

• Relationships between classes (e.g., disjointWith)

• Equality (e.g., sameAs)

• Richer properties (e.g., symmetric)

• Class property restrictions (e.g., allValuesFrom)

Ontologies

Relationships between Classes

• disjointWith – resources belonging to one class cannot belong to the other

<Person> <disjointWith> <Country>

• complementOf – the members of one class are all the resources that do not belong to the other

<InanimateThings> <complementOf> <LivingThings>
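Declaring classes disjoint lets software detect contradictory data. The minimal sketch below only looks at explicitly stated types and performs no other inference. It reports every resource that is typed with two classes declared disjointWith each other. The Georgia data is a hypothetical mistake in which a resource is typed both as a Person and as a Country. complementOf is not covered.

```python
def disjointness_violations(graph):
    """Return (resource, class1, class2) for every resource explicitly typed with
    two classes that are declared disjointWith each other."""
    disjoint = {(s, o) for (s, p, o) in graph if p == "disjointWith"}
    types = {}
    for (s, p, o) in graph:
        if p == "type":
            types.setdefault(s, set()).add(o)
    violations = set()
    for resource, classes in types.items():
        for (c1, c2) in disjoint:
            if c1 in classes and c2 in classes:
                violations.add((resource, c1, c2))
    return violations


graph = {
    ("Person", "disjointWith", "Country"),
    ("WillSmith", "type", "Person"),
    ("Georgia", "type", "Country"),
    ("Georgia", "type", "Person"),  # hypothetical contradictory statement
}
print(disjointness_violations(graph))  # {('Georgia', 'Person', 'Country')}
```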

Ontologies

Equality

• sameAs – indicates that two resources actually refer to the same real-world thing or concept

<wills> <sameAs> <wismith>

• equivalentClass – indicates that two classes have the same set of members

<CoopBoardMembers> <equivalentClass> <CoopResidents>
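One common way to use sameAs is to pick one representative for every group of identifiers that denote the same thing, then rewrite the data to use it. This is a minimal sketch that assumes string triples. Grouping uses a union-find structure, so chains such as a sameAs b, b sameAs c end up in one group. Choosing the alphabetically smallest name as the representative is an arbitrary, hypothetical choice. The extra triples about <wills> and <wismith> are also hypothetical.

```python
def same_as_groups(graph):
    """Group resources linked directly or indirectly by sameAs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (s, p, o) in graph:
        if p == "sameAs":
            union(s, o)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def canonicalize(graph):
    """Rewrite every resource to its group's representative and drop sameAs triples."""
    rep = {}
    for group in same_as_groups(graph):
        canonical = min(group)  # arbitrary but deterministic representative
        for x in group:
            rep[x] = canonical
    return {(rep.get(s, s), p, rep.get(o, o)) for (s, p, o) in graph if p != "sameAs"}


graph = {
    ("wills", "sameAs", "wismith"),
    ("wills", "type", "Actor"),              # hypothetical
    ("wismith", "starsIn", "MenInBlack"),    # hypothetical
}
print(sorted(canonicalize(graph)))
# [('wills', 'starsIn', 'MenInBlack'), ('wills', 'type', 'Actor')]
```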

Ontologies

Richer Properties

• Symmetric – a relationship between A and B is also true between B and A

<WillSmith> <marriedTo> <JadaPinkettSmith> implies <JadaPinkettSmith> <marriedTo> <WillSmith>

• Transitive – a relationship between A and B and between B and C is also true between A and C

<piston> <isPartOf> <engine> and <engine> <isPartOf> <automobile> imply <piston> <isPartOf> <automobile>
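The two property characteristics can be turned into a small closure routine. This is a minimal sketch. It assumes string triples and takes the property's characteristics as flags instead of reading them from the data. It keeps adding the implied triples until the set stops growing.

```python
def close_property(graph, prop, symmetric=False, transitive=False):
    """Add the triples implied by declaring prop symmetric and/or transitive."""
    result = set(graph)
    changed = True
    while changed:
        edges = {(s, o) for (s, p, o) in result if p == prop}
        new = set()
        if symmetric:
            new |= {(o, prop, s) for (s, o) in edges}
        if transitive:
            new |= {(a, prop, c) for (a, b) in edges for (b2, c) in edges if b == b2}
        changed = not new <= result  # stop once nothing new is produced
        result |= new
    return result


married = close_property({("WillSmith", "marriedTo", "JadaPinkettSmith")},
                         "marriedTo", symmetric=True)
parts = close_property({("piston", "isPartOf", "engine"),
                        ("engine", "isPartOf", "automobile")},
                       "isPartOf", transitive=True)
print(("JadaPinkettSmith", "marriedTo", "WillSmith") in married)  # True
print(("piston", "isPartOf", "automobile") in parts)              # True
```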

Ontologies

Richer Properties continued

• inverseOf – if properties X and Y are declared inverses, a relationship of type X between A and B implies a relationship of type Y between B and A

<starsIn> <inverseOf> <hasStar>

<MenInBlack> <hasStar> <WillSmith>

implies <WillSmith> <starsIn> <MenInBlack>
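The inverseOf rule is a simple rewrite: swap subject and object and switch to the inverse property. This is a minimal sketch that assumes string triples, a single pass, and the slide's short names.

```python
def apply_inverses(graph):
    """For every (P inverseOf Q): (s P o) implies (o Q s), and (s Q o) implies (o P s)."""
    inverses = {(s, o) for (s, p, o) in graph if p == "inverseOf"}
    result = set(graph)
    for (p_name, q_name) in inverses:
        for (s, p, o) in graph:
            if p == p_name:
                result.add((o, q_name, s))
            elif p == q_name:
                result.add((o, p_name, s))
    return result


graph = {
    ("starsIn", "inverseOf", "hasStar"),
    ("MenInBlack", "hasStar", "WillSmith"),
}
result = apply_inverses(graph)
print(("WillSmith", "starsIn", "MenInBlack") in result)  # True
```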

Ontologies

Inferences

• Create new triples based on existing triples

• Deduce new facts based on the stated facts

<piston> <isPartOf> <engine>

<engine> <isPartOf> <automobile> implies <piston> <isPartOf> <automobile>
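Creating new triples from existing ones is often done by forward chaining. You apply every rule to the current facts, add whatever is new, and stop once a full round adds nothing (a fixpoint). This is a minimal sketch that assumes rules are plain Python functions over string triples. The hasPart property and its inverse rule are hypothetical additions that show two rules feeding each other.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def forward_chain(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply all rules until no rule produces a triple that is not already known."""
    known = set(facts)
    while True:
        new = {t for rule in rules for t in rule(known)} - known
        if not new:
            return known
        known |= new


def transitive(prop: str) -> Rule:
    """Rule: (a prop b) and (b prop c) imply (a prop c)."""
    def rule(g: Set[Triple]) -> Set[Triple]:
        edges = [(s, o) for (s, p, o) in g if p == prop]
        return {(a, prop, c) for (a, b) in edges for (b2, c) in edges if b == b2}
    return rule


def inverse(prop: str, inv: str) -> Rule:
    """Rule: (s prop o) implies (o inv s)."""
    def rule(g: Set[Triple]) -> Set[Triple]:
        return {(o, inv, s) for (s, p, o) in g if p == prop}
    return rule


facts = {("piston", "isPartOf", "engine"), ("engine", "isPartOf", "automobile")}
derived = forward_chain(facts, [transitive("isPartOf"), inverse("isPartOf", "hasPart")])
for triple in sorted(derived - facts):
    print(triple)
# ('automobile', 'hasPart', 'engine')
# ('automobile', 'hasPart', 'piston')
# ('engine', 'hasPart', 'piston')
# ('piston', 'isPartOf', 'automobile')
```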