Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann

Freie Universität Berlin, Universität Leipzig

Querying Wikipedia

like a Database

Domain specific Data

Title

Description

Images

Languages

Infoboxes

Web Links

Categorization

Infobox Extraction

dbpedia:Albert_Einstein p:name "Albert Einstein"

dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm

dbpedia:Albert_Einstein p:birth_date "1879-03-14"
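The triples above can be produced mechanically from an infobox. The following is only a minimal sketch of that idea, not the DBpedia extraction code: the function name `infobox_to_triples`, the dictionary input format, and the rule that a `[[...]]` wiki link becomes a resource while everything else becomes a literal are assumptions made for illustration.

```python
def infobox_to_triples(page_title, infobox):
    """Turn infobox key/value pairs into (subject, predicate, object) triples.

    Hypothetical helper: `infobox` is assumed to be a dict of raw
    infobox keys to raw wiki-text values.
    """
    subject = "dbpedia:" + page_title.replace(" ", "_")
    triples = []
    for key, value in infobox.items():
        predicate = "p:" + key.strip().replace(" ", "_")
        value = value.strip()
        if value.startswith("[[") and value.endswith("]]"):
            # A wiki link points at another article, so emit a resource.
            target = value[2:-2].split("|")[0].strip()
            obj = "dbpedia:" + target.replace(" ", "_")
        else:
            # Anything else is kept as a plain literal.
            obj = '"' + value + '"'
        triples.append((subject, predicate, obj))
    return triples


einstein = infobox_to_triples(
    "Albert Einstein",
    {"name": "Albert Einstein", "birth place": "[[Ulm]]"},
)
for triple in einstein:
    print(" ".join(triple))
```

A real extractor would also need rules for dates, units and lists, which this sketch leaves out.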

Property Synonyms

Structuring Wikipedia's Knowledge

• Structuring actual data, not modeling the world

• Bound to Wikipedia Templates; parsers handle template values based on rules (property splitting, merging, transformation)

DBpedia Ontology

• DBpedia Ontology built from scratch

• 170 classes, 900 properties

No living things

Class Hierarchy

"Select all TV Episodes …"

Template Mapping

Class TV Episode (Work)

Wikipedia Templates:

Television Episode

UK Office Episode

Simpsons Episode

DoctorWhoBox

Template Mapping

Infobox Cricketer

Infobox Historic Cricketer

Infobox Recent Cricketer

Infobox Old Cricketer

Infobox Cricketer Biography

=> Class Cricketer (Athlete)
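The template mapping and class hierarchy from these slides can be pictured as two lookup tables. This is a sketch under assumptions: the table and function names are hypothetical, the article titles in the usage example are made up, and only the templates and classes named on the slides are included.

```python
# Many Wikipedia templates map to one ontology class (names from the slides).
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Each class has at most one parent in this simplified hierarchy.
SUPERCLASS = {"TV Episode": "Work", "Cricketer": "Athlete"}


def classes_for_template(template):
    """Return the mapped class followed by all of its ancestors."""
    chain = []
    cls = TEMPLATE_TO_CLASS.get(template)
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS.get(cls)
    return chain


def select_by_class(articles, wanted):
    """articles: dict of article title -> template name used on that page."""
    return [
        title
        for title, template in articles.items()
        if wanted in classes_for_template(template)
    ]


# Made-up article titles, for illustration only.
articles = {
    "Episode A": "Simpsons Episode",
    "Episode B": "DoctorWhoBox",
    "Player C": "Infobox Old Cricketer",
}
print(select_by_class(articles, "TV Episode"))
```

With the mapping in place, one query for the class covers every template that feeds it, which is what makes "Select all TV Episodes" possible.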

People

Actors

Athlete

Journalist

MusicalArtist

Politician

Scientist

Writer

Places

Airport

City

Country

Island

Mountain

River

Organisations

Band

Company

Educational Institution

Radio Station

Sports Team

Event

Convention

Military Conflict

Music Event

Sport Event

Work

Book

Broadcast

Film

Software

Television

More structured data

• Categories in SKOS

• Intra-wiki links

• Disambiguation

• Redirects

• Links to Images (and Flickr)

• Links to external webpages

• Data about 2.6 million “things”

• 274 million pieces of information (RDF triples)

Multilingual

Abstracts

– English: 2,613,000

– German: 391,000

– French: 383,000

– Dutch: 284,000

– Polish: 256,000

– Italian: 286,000

– Spanish: 226,000

– Japanese: 199,000

– Portuguese: 246,000

– Swedish: 144,000

– Chinese: 101,000

DBpedia as

Linked Data Hub

Semantic Web

"My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that."

Prof. James Hendler

Web of Documents

[Diagram: Web browsers and search engines access HTML documents A, B, C and D over HTTP; the documents are connected by hyperlinks.]

Web of Data

[Diagram: Search engines, Linked Data browsers and Linked Data mashups access data about things A, B, C, D and E over HTTP; the things are connected by data links.]

Linked Data

• Use URIs as names for things.

• Use HTTP URIs so that people can look up those names.

• When someone looks up a URI, provide useful information.

• Include links to other URIs so that they can discover more things.

Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid

DBpedia Resource URI: http://dbpedia.org/resource/Madrid

HTTP URIs

Information Resources: http://dbpedia.org/page/Madrid

HTTP GET -> 200 OK

Real-World Resources: http://dbpedia.org/resource/Madrid

HTTP GET -> 303 See Other, redirecting to http://dbpedia.org/page/Madrid or http://dbpedia.org/data/Madrid -> 200 OK
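The lookup behaviour on this slide can be modelled without touching the network. The sketch below only simulates the status codes; the function name `dereference` is hypothetical, and choosing between the /page/ and /data/ documents based on the Accept header is an assumption about how the server decides, since the slide lists both targets without saying which is picked when.

```python
RESOURCE = "http://dbpedia.org/resource/"
PAGE = "http://dbpedia.org/page/"
DATA = "http://dbpedia.org/data/"


def dereference(uri, accept="text/html"):
    """Simulate an HTTP GET and return the (status, uri) hops in order."""
    hops = []
    if uri.startswith(RESOURCE):
        # A real-world thing cannot be sent over HTTP, so the server
        # answers 303 See Other and points at a document describing it.
        name = uri[len(RESOURCE):]
        # Assumption: RDF-aware clients get /data/, everyone else /page/.
        target = (DATA if "rdf" in accept else PAGE) + name
        hops.append((303, uri))
        uri = target
    # Information resources (documents) are returned directly.
    hops.append((200, uri))
    return hops


for status, uri in dereference("http://dbpedia.org/resource/Madrid"):
    print(status, uri)
```

Keeping the thing and the documents about it under separate URIs means statements about Madrid the city are never confused with statements about a web page.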

[Diagram: data sets grouped into Online Activities, Music, Publications, Geographic, Cross-Domain and Life Sciences]

4.5 billion triples, 180 million data links

Use Cases

1. Data Source for Web-Applications

2. Querying Wikipedia like a database

3. Tag Web content with concepts instead of free-text tags

4. Vocabulary and semantic backbone for enterprise linked data integration

DBpedia as data source

• Embed structured information from Wikipedia into your web applications

• Build (mobile) maps applications using DBpedia data about places

• Display multilingual titles & descriptions in 15 languages

DBpedia Mobile

SPARQL Endpoint

http://dbpedia.org/sparql
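As an illustration of querying the endpoint, the sketch below builds a request URL for a "select all TV episodes" style query. It does not send the request. The class name `dbo:TelevisionEpisode` and its namespace are assumptions about the ontology's naming, and the helper names are hypothetical; the `query` parameter itself is the one defined by the SPARQL protocol for GET requests.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed class name; check the ontology for the exact identifier.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?episode WHERE {
  ?episode a dbo:TelevisionEpisode .
}
LIMIT 10
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query into a GET URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def query_from_url(url):
    """Decode the SPARQL query back out of a request URL."""
    return parse_qs(urlparse(url).query)["query"][0]


url = build_query_url(QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.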

Wikipedia Query

Annotating Documents

• Use DBpedia concepts to annotate documents instead of free-text tags

• Named Entity Extraction Systems already use DBpedia URIs (OpenCalais, Muddy Boots)

• Social Bookmarking with DBpedia URIs as tags: www.faviki.com

"Apple"

http://dbpedia.org/resource/Apple_Inc.

http://dbpedia.org/resource/Apple_(fruit)

http://dbpedia.org/resource/Apple_Records
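The "Apple" example shows why a concept URI is a better tag than a free-text string. A minimal sketch of that idea follows; the candidate table is hard-coded from the three URIs above, and the function names and document ids are hypothetical. A real application would presumably obtain candidates from a lookup service rather than a fixed table.

```python
# Candidate concepts for an ambiguous free-text tag (URIs from the slide).
CANDIDATES = {
    "apple": [
        "http://dbpedia.org/resource/Apple_Inc.",
        "http://dbpedia.org/resource/Apple_(fruit)",
        "http://dbpedia.org/resource/Apple_Records",
    ]
}


def candidate_concepts(free_text_tag):
    """Return the concept URIs a free-text tag could refer to."""
    return CANDIDATES.get(free_text_tag.strip().lower(), [])


def tag_document(index, doc_id, concept_uri):
    """Record that a document is about a concept, keyed by its URI."""
    index.setdefault(concept_uri, set()).add(doc_id)
    return index


index = {}
tag_document(index, "doc-1", "http://dbpedia.org/resource/Apple_Inc.")
tag_document(index, "doc-2", "http://dbpedia.org/resource/Apple_(fruit)")
print(candidate_concepts("Apple"))
```

Because the two documents are indexed under different URIs, a search for the company no longer returns the article about fruit.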

Annotating Documents

• BBC editors tag news articles with DBpedia concepts

• DBpedia Lookup Service: http://lookup.dbpedia.org

Linking Enterprise Data

Take the Linking Open Data approach to the enterprises

Linking Enterprise Data

• Connect data sets with DBpedia as shared vocabulary

• Enable meaningful navigation paths across BBC websites

• Browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …

• Make use of the rich background information: relate the release of a music album to a news article about the artist

The Future of DBpedia

Improve Information Extraction

Crowd-source

Information Extraction

Crowd Sourced Extraction

Where's the user benefit?

Data Fusion

Cross-Language Data Fusion

• 264 Wikipedia Editions in different languages

– Italian Wikipedians know more about Italian villages

– German Wikipedia contains more person infoboxes

• Augment the infobox dataset with facts from other Wikipedia editions.
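One simple way to picture cross-language fusion is to fill gaps in one edition's infobox with values from others. The sketch below uses a fixed priority order between editions, which is an assumption chosen for brevity and not necessarily the strategy the authors have in mind; the function name and the sample village data are made up.

```python
def fuse(editions, priority):
    """Merge per-language infobox facts into one record.

    editions: dict of language code -> dict of property -> value.
    priority: language codes, most trusted first.
    Returns property -> (value, source language), keeping the first
    value found in priority order.
    """
    fused = {}
    for lang in priority:
        for prop, value in editions.get(lang, {}).items():
            if prop not in fused:
                fused[prop] = (value, lang)
    return fused


# Made-up facts about a fictional Italian village.
editions = {
    "en": {"name": "Examplevilla"},
    "it": {"name": "Examplevilla", "population": "1200"},
}
print(fuse(editions, ["en", "it"]))
```

Recording the source language next to each value keeps the provenance visible, which matters when editions disagree.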

Augment DBpedia with External Data

• Linking Open Data cloud provides more data than Wikipedia

– EuroStat provides additional statistical information about countries.

– Musicbrainz contains additional information about other bands.

– Geonames provides additional information about locations.

• Idea

– Augment DBpedia with additional data from external sources.

Contribute back to Wikipedia

• Opportunity

– Feed data back to Wikipedia

• Extend the Wikipedia authoring environment with

– Suggestions for infobox values

– Cross-language consistency checking for infoboxes

• Currently going on

– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)

Contribute back to Wikipedia

• Initialize Wikipedia Clean-Up Cycles

– Data-driven search interfaces expose the weaknesses of the Wikipedia template system.

– Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.

Live Update

• Current Situation

– DBpedia update cycle: 3 months

– Wikipedia provides us with access to the live update stream

• Opportunity

– Increase the currency of the DBpedia dataset using this update stream

• Result

– DBpedia in synchronization with Wikipedia.

Open Source

Open Data

What is the

Wikipedia for Data?

Wikipedia is the

Wikipedia for Data

Summary

http://dbpedia.org

georgi.kobilarov@fu-berlin.de