Introduction to the Semantic Web and Linked Data

Posted on 01-Nov-2014

This was presented to the San Francisco chapter of DAMA International on June 9, 2010 at SAP in Palo Alto, California.

Transcript of Introduction to the Semantic Web and Linked Data

Introduction to the Semantic Web and Linking Data

Eric Axel Franzon
Vice President
Semantic Universe/Wilshire Conferences

About Me
• Professional
  • Wilshire Conferences
  • Semantic Universe
  • W3C
  • Guidewire Group
• Coach / Consultant / Trainer
• Geek

Today we will talk about:

• Semantic Technologies

• Semantic Web & Web 3.0

• Linked Data
  – Linked Open Data
  – Linked Enterprise Data

• Use cases

• That harmonica on the first slide

Semantic Technologies

Semantic Web

Web Technologies

World Wide Web

Semantic Web = Web 3.0

Semantic Web = Web of Data

(Image: www.geekandpoke.com)

What is the Web of Data Not?

• A software package
• Something that will ever “be complete”
• A replacement for the current Web
• A pipe dream
• A silver bullet

It’s also not…

• HAL 9000

• Skynet

What is the Web of Data?

• A Web-scale architecture
• A metadata technology
• A layer of meaning on the existing Web
• In use TODAY!

Web of Data

Q: What does Linked Data have to do with the Semantic Web?

Web 1.0 – Linking Documents

“I see: characters + formatting + images” -- my computer

Web 1.0 – Linking Documents
Web 2.0 – Linking People

“I see: characters + formatting + images” -- my computer

Web 1.0 – Linking Documents
Web 2.0 – Linking People
Web 3.0 – Linking Data

Web 3.0 – Linking Data

Title, Publisher, Price, Format, Cover, Author

“I see: things + relationships. This information is about a book.”

Semantic Technologies

Semantic Web

Linked Open Data

Linking Open Data Project: May 2007

March 2009

Data from these trusted sources is available for you to use in your applications TODAY.

Data you can LINK to.

And not just data…

Semantic data that is not only machine READABLE, but machine UNDERSTANDABLE!

Disambiguation

mole, n.

But…

Metadata

Doctorow’s Criticisms | LOD/LED Response
“People lie” | Allow users to choose a social trust model
“People are lazy” | Automate where possible and encourage authoring where needed
“People are stupid” | Automate where possible; check where possible
“Mission Impossible: know thyself” | Allow multiple sources of metadata
“Schemas aren’t neutral” | Allow multiple schemas
“Metrics influence results” | Allow multiple metrics
“There’s more than one way to describe something” | Allow multiple descriptions

LOD/LED is flexible

How does LOD/LED work?
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES

So, what’s a THING?

A THING is anything that can be uniquely identified by a URI or a literal (string).

Me: http://twitter.com/ericaxel

My postal code: http://www.city-data.com/zips/90043.html

The White House: Lat 38.89859, Long -77.035971

L.A. County’s sales tax rate: 9.750 %

http://ericfranzon.com/operator.jpg

This is a collection of THINGS:

t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230

Trees and Tables

t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230

people
  David
    City: Fredericksburg
    State: VA
    Postcode: 22408
  Eric
    City: Culver City
    State: CA
    Postcode: 90230

Trees and Tables – Problem 1

t_people
Name | City | State | Post code | flag
David | Fredericksburg | VA | 22408 | 1
Eric | Culver City | CA | 90230 |

people
  David
    City: Fredericksburg
    State: VA
    Postcode: 22408
    flag: 1
  Eric
    City: Culver City
    State: CA
    Postcode: 90230

Adding partial data to tables leads to sparseness.

Trees and Tables – Problem 2

t_people
Name | City | State | Post code
David | Culver City | CA | 90230
Eric | Culver City | CA | 90230

people
  David
    City: Culver City
    State: CA
    Postcode: 90230
  Eric
    City: Culver City
    State: CA
    Postcode: 90230

Common data leads to duplication (lots of it!).

Graphs

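people → Eric, David
Eric, David → City → Culver City (a single shared node)
Eric, David → State → CA (a single shared node)
Eric, David → Postcode → 90230 (a single shared node)
David → flag → 1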
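Here is a minimal Python sketch of the graph above. It assumes each edge is stored as a (node, label, node) tuple, which is an illustrative choice and not how any particular triplestore works internally. Adding the flag costs one extra edge instead of an empty column, and Culver City, CA and 90230 each exist only once as nodes:

```python
# Minimal sketch: a graph stored as a set of (node, label, node) edges.
# Node and label names come from the slides; the tuple representation is an assumption.
edges = {
    ("Eric", "City", "Culver City"),
    ("David", "City", "Culver City"),
    ("Eric", "State", "CA"),
    ("David", "State", "CA"),
    ("Eric", "Postcode", "90230"),
    ("David", "Postcode", "90230"),
    ("David", "flag", "1"),  # partial data: one extra edge, no empty cell for Eric
}


def nodes(graph):
    """Return every distinct node; shared values such as 'Culver City' appear once."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def describe(graph, node):
    """Return the (label, value) pairs attached to one node, sorted for stable output."""
    return sorted((label, value) for s, label, value in graph if s == node)


print(sorted(nodes(edges)))
print(describe(edges, "Eric"))
```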

Who’s your daddy?

How does LOD/LED work?
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS

Is Father of

<owl:ObjectProperty rdf:ID="isFather">
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Person"/>
</owl:ObjectProperty>

mailto:ericaxel@yahoo.com
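The rdfs:domain and rdfs:range declarations in the snippet are not validation rules. Under RDFS semantics they let software infer new facts: whatever appears on either side of isFather can be concluded to be a Person. Here is a minimal Python sketch of those two inference rules. It assumes triples are plain tuples and uses the hypothetical individuals #dad and #kid:

```python
# Sketch of two RDFS entailment rules: rdfs2 (domain) and rdfs3 (range).
# Triples are plain (subject, predicate, object) tuples; "#dad" and "#kid" are hypothetical.
RDF_TYPE = "rdf:type"

schema = {
    ("#isFather", "rdfs:domain", "#Person"),
    ("#isFather", "rdfs:range", "#Person"),
}
data = {("#dad", "#isFather", "#kid")}


def infer_types(schema, data):
    """Return the rdf:type triples implied by the domain/range declarations."""
    domains = {(prop, cls) for prop, rel, cls in schema if rel == "rdfs:domain"}
    ranges = {(prop, cls) for prop, rel, cls in schema if rel == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        inferred |= {(s, RDF_TYPE, cls) for prop, cls in domains if prop == p}
        inferred |= {(o, RDF_TYPE, cls) for prop, cls in ranges if prop == p}
    return inferred


for triple in sorted(infer_types(schema, data)):
    print(triple)
```

Nothing here rejects bad data. If #kid were actually a document, the rules would still conclude that it is a Person.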

How does LOD/LED work?
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES

What’s a triple?

Triples? It’s Elementary! (School)

“book has title.”
The relationship (“has”) is the predicate. That is a triple!

“This book has a title.”

“Eric wrote this Web page.”

“This article is about moles.”

“I like blues.”

“I like B.L.U.E.S.”

“This image can be used non-commercially.”

“My email address is ericaxel@yahoo.com.”

Triples? It’s Elementary!

Book | Has Title | “Title”

Eric | Created | Webpage

Image | Has License | CC Non-Commercial

Triples: Subjects, Predicates, Objects

Book → Author, Title, Publisher, ISBN
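Because every statement has the same three-part shape, a handful of triples can be queried with a single pattern-matching function. Here is a minimal Python sketch. It assumes triples are stored as tuples of strings and that None acts as a wildcard; the function name match is hypothetical:

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
triples = {
    ("Book", "Has Title", "Title"),
    ("Eric", "Created", "Webpage"),
    ("Image", "Has License", "CC Non-Commercial"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches any value."""
    return sorted(
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(triples, s="Eric"))         # everything said about Eric
print(match(triples, p="Has License"))  # every licensing statement
```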

The Trouble with Triples

Cytoscape.org

Review of the Review

Our Data are Multiplying.

Trends in data growth

• Vast amounts of digital data are being produced daily.

– Wal-Mart produces 1 million transactions every hour; its databases are estimated at > 2.5 petabytes.

• The US National Archives creates > 10 million digital assets annually.

Data Inflation

• Megabyte (MB) = 2^20 bytes

• Gigabyte (GB) = 2^30 bytes

• Terabyte (TB) = 2^40 bytes

• Petabyte (PB) = 2^50 bytes, or about 1,000 TB

• Exabyte (EB) = 2^60 bytes, or about 1,000 PB

• Zettabyte (ZB) = 2^70 bytes, or about 1,000 EB

• Yottabyte (YB) = 2^80 bytes, or about 1,000 ZB
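The list above can be checked with a few lines of Python. This sketch assumes the slide's power-of-two definitions. Each step up is exactly 1,024 times the one before, which is why "1,000" is an approximation. Strictly, the power-of-two units have IEC names such as MiB and TiB, while SI prefixes such as MB denote powers of ten.

```python
# Binary (power-of-two) byte units as on the slide: each step up multiplies by 2**10.
UNITS = ["MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
size_in_bytes = {unit: 2 ** (20 + 10 * i) for i, unit in enumerate(UNITS)}

for smaller, larger in zip(UNITS, UNITS[1:]):
    exponent = size_in_bytes[larger].bit_length() - 1  # log2 of an exact power of two
    ratio = size_in_bytes[larger] // size_in_bytes[smaller]
    print(f"1 {larger} = 2^{exponent} bytes = {ratio:,} {smaller}")
```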

Acceleration

– Decoding the human genome involves analyzing 3 billion base pairs.

• What took 10 years to process in 2003 takes a week today.

A brand new professional has emerged…

The data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.

- The Economist, “Data, data everywhere”, Feb 27th 2010

When we come back…

S-T-R-E-T-C-H Break!

Linked Data is like a harmonica

• It’s easy to play

Facebook
• Unique Visitors*: 540,000,000
• Page Views: 570,000,000,000

* Per month

Source: Google - The 1000 most-visited sites on the web

FOAF: Friend-Of-A-Friend

http://www.foaf-project.org/

FOAF-a-Matic: http://www.ldodds.com/foaf/foaf-a-matic

semantictweet.com

semantictweet.com can create four FOAF files:
• Friends (who I follow)
• Followers
• All
• Just Me
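semantictweet.com's actual output is not reproduced here. As a rough illustration of what a small "Just Me" FOAF file can contain, here is a Python sketch that writes one in Turtle. foaf:Person, foaf:name, foaf:mbox and foaf:knows are real FOAF terms. The helper name foaf_profile and the #me fragment on the profile URI are hypothetical:

```python
# Hypothetical helper: build a minimal FOAF profile in Turtle syntax.
# The FOAF terms are real; the helper and the "#me" fragment are illustrative assumptions.
def foaf_profile(me, name, email, knows=()):
    escaped_name = name.replace("\\", "\\\\").replace('"', '\\"')  # Turtle string escaping
    statements = [
        "a foaf:Person",
        f'foaf:name "{escaped_name}"',
        f"foaf:mbox <mailto:{email}>",
    ]
    statements += [f"foaf:knows <{uri}>" for uri in knows]
    body = " ;\n    ".join(statements)  # ";" repeats the same subject
    return f"@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n\n<{me}> {body} .\n"


print(foaf_profile(
    "http://www.ericaxel.com/foaf.rdf#me",
    "Eric Axel Franzon",
    "ericaxel@yahoo.com",
    knows=["http://twitter.com/bsletten"],
))
```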

Linked Data is like a harmonica

• It’s easy to play
• It’s a “real” instrument

The Technologies of RDBMS

• Data
• Schemas
• Query Language

RDBMS Data

t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230

RDBMS Schema

RDBMS Query Language: SQL

SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
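To try the query, here is a minimal sketch using Python's built-in sqlite3 module. The Book table layout and its three rows are hypothetical and exist only so the query has something to return. The 6% rate comes from the query itself:

```python
import sqlite3

# Hypothetical Book table and rows, invented only to give the slide's query some data.
# (REAL is fine for a demo; money is usually stored as DECIMAL or integer cents.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book (isbn, title, price) VALUES (?, ?, ?)",
    [
        ("isbn-0001", "Hypothetical Title B", 120.00),
        ("isbn-0002", "Hypothetical Title A", 150.00),
        ("isbn-0003", "Hypothetical Title C", 40.00),  # filtered out by WHERE price > 100.00
    ],
)

# The query from the slide, unchanged apart from line breaks.
rows = conn.execute(
    "SELECT isbn, title, price, price * 0.06 AS sales_tax "
    "FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
conn.close()

for isbn, title, price, sales_tax in rows:
    print(f"{isbn}  {title}  {price:.2f}  tax {sales_tax:.2f}")
```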

The Technologies of LOD/LED

• Data
• Schemas
• Query Language

The Data Language

Resource Description Framework

RDF Triples

Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | “Brian Sletten”

RDF Triple Components

Subject | Predicate | Object
URI | URI | URI or string literal

http://twitter.com/bsletten

“RDF is good for distributing data across the Web and pretending it’s in one place.”

-Dean Allemang, TopQuadrant

Just so you know… there are many ways of representing RDF:

• RDF/XML
• N3
• JSON
• N-Triples
• Turtle
• RDFa

Each serialization has pros and cons, but they all are used to connect THINGS and RELATIONSHIPS into TRIPLES
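As a concrete example of one serialization, here is a minimal Python sketch that writes N-Triples. The format puts one triple per line, URIs in angle brackets and literals in double quotes, and ends each line with " .". The sketch makes two assumptions. First, dc: is taken to be the Dublin Core element namespace http://purl.org/dc/elements/1.1/. Second, foaf:knows is meant to point at a person rather than a name string, so the sketch links to http://twitter.com/bsletten and attaches "Brian Sletten" with foaf:name. The dc:location triple is left out because the Dublin Core element set has no location element.

```python
# Minimal N-Triples writer; URIs are assumed to be well-formed and are not escaped.
DC = "http://purl.org/dc/elements/1.1/"  # assumed meaning of the slide's dc: prefix
FOAF = "http://xmlns.com/foaf/0.1/"


def term(value, is_literal=False):
    """Render a URI as <...> or a literal as a quoted, escaped string."""
    if is_literal:
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) rows, one per line."""
    lines = [f"{term(s)} {term(p)} {term(o, lit)} ." for s, p, o, lit in triples]
    return "\n".join(lines) + "\n"


triples = [
    ("http://plushbeautybar.com", DC + "creator", "http://www.ericaxel.com/foaf.rdf", False),
    ("http://twitter.com/ericaxel", FOAF + "knows", "http://twitter.com/bsletten", False),
    ("http://twitter.com/bsletten", FOAF + "name", "Brian Sletten", True),
]
doc = to_ntriples(triples)
print(doc, end="")
```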

The Schemata

Linked Data schemas consist of:

Your RDF relationships (predicates)
+
Relationship descriptions

LOD/LED Schemata

Schema: id | First Name | Last Name
Data: 1 | Tony | Shaw

Initial schema:
hasID → 1
hasFirstName → Tony
hasLastName → Shaw

Relationship description:
hasLastName owl:sameAs hasSurname
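Here is a minimal Python sketch of what the relationship description buys you. Once hasLastName and hasSurname are declared to mean the same thing, a query written with either name finds Shaw. The subject ID person1 and the helper expand are hypothetical. The sketch handles only direct pairs, not chains of equivalences. In OWL, the dedicated term for two properties with the same meaning is owl:equivalentProperty; owl:sameAs is normally used between individuals.

```python
# Data from the slide as triples; the subject ID "person1" is hypothetical.
data = {
    ("person1", "hasID", "1"),
    ("person1", "hasFirstName", "Tony"),
    ("person1", "hasLastName", "Shaw"),
}
# Relationship description: these two predicates mean the same thing.
equivalent = {("hasLastName", "hasSurname")}


def expand(triples, pairs):
    """Copy each statement onto its declared-equivalent predicate, in both directions."""
    out = set(triples)
    for a, b in pairs:
        for s, p, o in triples:
            if p == a:
                out.add((s, b, o))
            elif p == b:
                out.add((s, a, o))
    return out


merged = expand(data, equivalent)
# A query written against the new predicate name still finds the old data.
print(sorted(o for s, p, o in merged if p == "hasSurname"))
```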

Choosing Relationships

• Reuse popular vocabularies

–FOAF (Friend-of-a-friend)

–Dublin Core (library/publisher metadata)

–SIOC (Semantically-Interlinked Online Communities)

• ...or make up your own!

RDF Triples

Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | “David Wood”

1. Resource Description Framework Schema (RDFS): Simple, hierarchical classes

2. Simple Knowledge Organization System (SKOS): Port taxonomies to the Semantic Web

3. Web Ontology Language (OWL): Complex logical relationships

Relationship Descriptions

Combine vocabularies and descriptions

LOD/LED Schemata

• Put as much work into creating your LED schema as you put into creating your relational schemas

• ... maybe even a bit more (due to links between your data and others’).

New York Times – SKOS

SKOS STUFF

The query language

SPARQL Protocol and RDF Query Language

SPARQL

SPARQL Example #1
FOAF (some people that Eric Franzon knows)

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <http://ericaxel.com/eric.rdf>
WHERE {
  ?knower foaf:knows ?known .
  ?known foaf:name ?name .
}

Example #1 - Results
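How does a SPARQL engine answer Example #1? Conceptually, it finds every way to bind the variables so that both triple patterns match, joining on the shared ?known. Here is a minimal Python sketch of that basic-graph-pattern join. The graph contents (_:eric, "Person One", "Person Two") are hypothetical stand-ins for eric.rdf. Real engines add indexing and query planning on top of this.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical graph contents standing in for eric.rdf.
graph = {
    ("_:eric", FOAF_KNOWS, "_:p1"),
    ("_:eric", FOAF_KNOWS, "_:p2"),
    ("_:p1", FOAF_NAME, "Person One"),
    ("_:p2", FOAF_NAME, "Person Two"),
    ("_:eric", FOAF_NAME, "Eric"),
}


def match_pattern(graph, pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple.
    Terms starting with '?' are variables; anything else must match exactly."""
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield candidate


def evaluate(graph, patterns):
    """Join the patterns left to right, like the WHERE block of Example #1."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [ext for b in bindings for ext in match_pattern(graph, pattern, b)]
    return bindings


query = [("?knower", FOAF_KNOWS, "?known"), ("?known", FOAF_NAME, "?name")]
names = sorted(b["?name"] for b in evaluate(graph, query))
print(names)
```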

SPARQL Example #2
Querying two FOAF profiles

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name
FROM NAMED <http://ericaxel.com/eric.rdf>
FROM NAMED <http://zepheira.com/team/dave/dave.rdf>
WHERE {
  GRAPH <http://ericaxel.com/eric.rdf> {
    ?x rdf:type foaf:Person .
    ?x foaf:name ?name .
  }
  GRAPH <http://zepheira.com/team/dave/dave.rdf> {
    ?y rdf:type foaf:Person .
    ?y foaf:name ?name .
  }
}

Example #2 - Results
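In Example #2, ?name appears in both GRAPH blocks, so a row survives only if the same name belongs to a foaf:Person in both profiles. Here is a minimal Python sketch of that logic with hypothetical graph contents (Alice, Bob, Carol). Because it uses sets, it returns distinct names, as SELECT DISTINCT would. The plain SELECT can repeat a name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical contents for the two named graphs; the real files are not reproduced here.
named_graphs = {
    "http://ericaxel.com/eric.rdf": {
        ("_:a", RDF_TYPE, FOAF + "Person"), ("_:a", FOAF + "name", "Alice"),
        ("_:b", RDF_TYPE, FOAF + "Person"), ("_:b", FOAF + "name", "Bob"),
    },
    "http://zepheira.com/team/dave/dave.rdf": {
        ("_:x", RDF_TYPE, FOAF + "Person"), ("_:x", FOAF + "name", "Bob"),
        ("_:y", RDF_TYPE, FOAF + "Person"), ("_:y", FOAF + "name", "Carol"),
    },
}


def person_names(graph):
    """Names of every resource typed foaf:Person in one graph (one GRAPH block)."""
    persons = {s for s, p, o in graph if p == RDF_TYPE and o == FOAF + "Person"}
    return {o for s, p, o in graph if s in persons and p == FOAF + "name"}


# ?name is shared by both GRAPH blocks, so only names found in both graphs survive.
shared = (person_names(named_graphs["http://ericaxel.com/eric.rdf"])
          & person_names(named_graphs["http://zepheira.com/team/dave/dave.rdf"]))
print(sorted(shared))
```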

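SPARQL Example #3
Bart Simpson's chalkboard gags (DBpedia)

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?episode ?chalkboard_gag
WHERE {
  ?episode skos:subject ?season .
  ?season rdfs:label ?season_title .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
  FILTER (regex(?season_title, "The Simpsons episodes, season"))
}
ORDER BY ?season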

Example #3 - Results

http://www.milinkito.com/swf/bart.php

Are *real* companies using Linked Data?

Easy to play; takes work to master.

E-Commerce

A vocabulary to describe products, services, and other e-commerce terms.

Who is using GoodRelations?

1100+ Best Buy stores

Phase 2

~640,000 “next-gen” product detail pages

21 Open Box Products listed at this store!

Who is using GoodRelations?

With RDFa + GoodRelations, but no additional SEO work, PlushBeautyBar.com was indexed by Google within one week.

Semantic (Web) Technologies

Semantic Web

Linked Enterprise Data

RDBMS

CRM

Calendars

Linked Open Data

MIXING private and public data?

Absolutely! And it’s really useful to do so!

Example:

iConcertCal

Public + Private Data: iConcertCal

Example:

Siri

Siri.com

Siri is a Virtual Assistant.

I ask it to do things for me.

It does so by mixing data, by disambiguating, and by reasoning.

Example:

• Largest broadcasting corporation in the world

• 8 national TV channels

• 10 national radio stations

• 40 local radio stations

• An extensive website, bbc.co.uk

• Broadcasts 1,000-1,500 programs per day.

• Publishes information in several formats: audio, video, textual.

• Needed to relate information across media for both users and third-party developers

• Approach: create a Web presence for each broadcast, artist, and species (and other biological ranks), habitat and adaptation that the BBC has an interest in.

"Creating web identifiers for every item the BBC has an interest in, and considering those as aggregations of BBC content about that item, allows us to enable very rich cross-domain user journeys."-- Yves Raimond

• BBC Music is underpinned by the Musicbrainz music database and Wikipedia.

• “BBC Music takes the approach that the Web itself is its content management system. [BBC] editors directly contribute to Musicbrainz and Wikipedia.”

BBC

• Wildlife Finder links existing LOD data with BBC content to make pages about each species, habitat and adaptation:

• Wildlife programmes (clips and episodes) are identified by tagging the clip or episode with the appropriate dbpedia URI.

"The RDF representations of these web identifiers allow developers to use our data to build applications."-- Yves Raimond

A few final thoughts

A little bit can be very powerful!

Web 3.0 = Semantic Web

triple, OWL, RDF, SPARQL, Linked Data, RDFS, SKOS, RDFa

Web 3.0 = Semantic Web

Dublin Core, triple, OWL, RDF, RDFa, PURLs, ontology, NLP, OWL-DL, OWL-Full, RDFS, entity extraction, OWL 2, OWL-Lite, subject, object, predicate, folksonomy, microformats, GRDDL, URI, triplestore, SPARQL, Artificial Intelligence, cloud computing, open world reasoning, reasoning engine, Linked Data, taxonomy, data portability, LOD, LED, REST, vocabulary, SKOS, microdata

Questions? Operators are standing by.

EricAxel@yahoo.com

THANK YOU!

Semantic Universe
Free Informational Resource
www.SemanticUniverse.com

Semantic Technology Conference
www.Semantic-Conference.com

June 21-25, 2010

Resources

http://geekandpoke.typepad.com/

http://richard.cyganiak.de/2007/10/lod/

http://iconcertcal.com

http://siri.com

http://data.nytimes.com

http://freedigitalphotos.com

http://aldobucchi.com

http://www.milinkito.com/swf/bart.php

http://www.flickr.com/photos/kellyhogaboom/4369774518/

http://www.flickr.com/photos/zenera/56677048/

http://www.flickr.com/photos/97964364@N00/59780745/

http://www.flickr.com/photos/starwarsblog/793008715/

http://www.flickr.com/photos/peterpearson/871254091/

http://www.flickr.com/photos/birdfarm/60946474/

http://www.flickr.com/photos/entropy1138/173847148/

http://www.flickr.com/photos/wainwright/351684037/

http://data.nytimes.com/50891932523096258603.rdf