Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann
Freie Universität Berlin, Universität Leipzig
Infobox Extraction
dbpedia:Albert_Einstein p:name "Albert Einstein"
dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm
dbpedia:Albert_Einstein p:birth_date "1879-03-14"
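A minimal sketch of how infobox lines could be turned into triples like the ones above. It assumes the common `| key = value` wikitext layout; the function name `infobox_to_triples`, the template name in the sample, and the regular expressions are illustrative and not taken from the DBpedia extraction code.

```python
import re

def infobox_to_triples(subject, infobox_wikitext):
    """Turn `| key = value` infobox lines into (subject, predicate, object) triples."""
    triples = []
    for line in infobox_wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=]+?)\s*=\s*(.+?)\s*$", line)
        if not match:
            continue  # skip template open/close lines and empty values
        key, value = match.groups()
        predicate = "p:" + key.replace(" ", "_")
        # A value that is exactly one wiki link becomes a resource, not a literal.
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            obj = "dbpedia:" + link.group(1).strip().replace(" ", "_")
        else:
            obj = '"' + value + '"'
        triples.append((subject, predicate, obj))
    return triples

# Hypothetical infobox fragment, for illustration only.
SAMPLE = """{{Infobox scientist
| name = Albert Einstein
| birth_place = [[Ulm]]
}}"""

triples = infobox_to_triples("dbpedia:Albert_Einstein", SAMPLE)
for triple in triples:
    print(" ".join(triple))
```

Links become resources and everything else becomes a string literal; a real extractor would also need rules for dates, units, and lists.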
Structuring Wikipedia's Knowledge
• Structuring actual data, not modeling the world
• Bound to Wikipedia templates; parsers handle template values based on rules (property splitting, merging, transformation)
DBpedia Ontology
• DBpedia Ontology built from scratch
• 170 classes, 900 properties
Template Mapping
Class TV Episode (Work)
Wikipedia Templates:
Television Episode
UK Office Episode
Simpsons Episode
DoctorWhoBox
Template Mapping
Infobox Cricketer
Infobox Historic Cricketer
Infobox Recent Cricketer
Infobox Old Cricketer
Infobox Cricketer Biography
=> Class Cricketer (Athlete)
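The two mapping slides can be read as a many-to-one lookup from template names to ontology classes. The sketch below models that with plain dictionaries; the names `TEMPLATE_TO_CLASS`, `SUPERCLASS`, and `classes_for_template` are hypothetical, and the real mapping format used by DBpedia is not shown in the slides.

```python
# Template names and classes are taken from the slides; the data structure is an assumption.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Parent class shown in parentheses on each slide.
SUPERCLASS = {"TV Episode": "Work", "Cricketer": "Athlete"}

def classes_for_template(template_name):
    """Return the mapped class followed by its ancestors, or [] if unmapped."""
    cls = TEMPLATE_TO_CLASS.get(template_name)
    if cls is None:
        return []
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain

print(classes_for_template("Simpsons Episode"))
print(classes_for_template("Infobox Old Cricketer"))
```

Mapping several templates onto one class is what lets a single query cover articles that use different infobox variants.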
Organisations
Band
Company
Educational Institution
Radio Station
Sports Team
More structured data
• Categories in SKOS
• Intra-wiki links
• Disambiguation
• Redirects
• Links to Images (and Flickr)
• Links to external webpages
Multilingual Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
Semantic Web
“My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.”
Prof. James Hendler
Web of Documents
[Figure: Web browsers and search engines access HTML documents on servers A, B, C, and D over HTTP; the documents are connected by hyperlinks.]
Web of Data
[Figure: Linked Data browsers, search engines, and mashups access data about things on servers A, B, C, D, and E over HTTP; the things are connected by data links.]
Linked Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid
DBpedia Resource URI: http://dbpedia.org/resource/Madrid
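The Madrid example suggests that a DBpedia resource URI reuses the article name from the English Wikipedia URI. A small sketch of that derivation follows; the function name `wikipedia_to_dbpedia` is hypothetical, and the one-to-one naming is assumed to generalize beyond the single example on the slide.

```python
from urllib.parse import urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def wikipedia_to_dbpedia(article_uri):
    """Derive a DBpedia resource URI from an English Wikipedia article URI."""
    parts = urlsplit(article_uri)
    prefix = "/wiki/"
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URI: " + article_uri)
    # Keep the article name exactly as it appears in the path.
    return DBPEDIA_RESOURCE + parts.path[len(prefix):]

print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Madrid"))
```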
HTTP URIs
Real-World Resource: http://dbpedia.org/resource/Madrid
HTTP GET -> 303 See Other
Information Resources: http://dbpedia.org/page/Madrid, http://dbpedia.org/data/Madrid
HTTP GET -> 200 OK
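The redirect pattern on this slide can be simulated without any network access. In the sketch below, `resolve` and `follow` are hypothetical helpers, and choosing between the `/page/` and `/data/` documents from the `Accept` header is an assumption; the slide only shows that both documents answer with 200 OK.

```python
BASE = "http://dbpedia.org/"

def resolve(uri, accept="text/html"):
    """Simulate one HTTP GET and return (status, redirect target or body)."""
    resource_prefix = BASE + "resource/"
    if uri.startswith(resource_prefix):
        name = uri[len(resource_prefix):]
        # Assumption: RDF clients are sent to /data/, everyone else to /page/.
        target = "data/" if "rdf" in accept else "page/"
        return 303, BASE + target + name
    if uri.startswith(BASE + "page/") or uri.startswith(BASE + "data/"):
        return 200, "document about " + uri.rsplit("/", 1)[1]
    return 404, None

def follow(uri, accept="text/html"):
    """Follow 303 redirects and return the list of (uri, status) hops."""
    hops = []
    status, payload = resolve(uri, accept)
    hops.append((uri, status))
    while status == 303:
        uri = payload
        status, payload = resolve(uri, accept)
        hops.append((uri, status))
    return hops

for hop in follow("http://dbpedia.org/resource/Madrid"):
    print(hop)
```

The 303 response keeps the URI of the city itself distinct from the URIs of documents that describe the city.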
[Figure: datasets grouped by domain: Music, Online Activities, Publications, Geographic, Cross-Domain, Life Sciences.]
Use Cases
1. Data Source for Web‐Applications
2. Querying Wikipedia like a database
3. Tag Web content with concepts instead of free-text tags
4. Vocabulary and semantic backbone for enterprise linked data integration
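For use case 2, "querying Wikipedia like a database" would typically mean sending a SPARQL query to an endpoint. The sketch below only builds the request URL and sends nothing. The endpoint address, the expansion of the `p:` and `dbpedia:` prefixes, and the helper name `sparql_request_url` are assumptions that do not appear in the slides.

```python
from urllib.parse import urlencode

# Assumption: a public SPARQL endpoint at this address.
ENDPOINT = "http://dbpedia.org/sparql"

# Reuses the birth_place property and the Ulm resource from the infobox slide.
QUERY = """
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?person WHERE {
  ?person p:birth_place dbpedia:Ulm .
}
LIMIT 10
"""

def sparql_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (SPARQL protocol `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_request_url(ENDPOINT, QUERY)
print(url)
```

Passing the resulting URL to any HTTP client would run the query, provided the endpoint and property names match the deployed dataset.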
DBpedia as data source
• Embed structured information from Wikipedia into your web applications
• Build (mobile) maps applications using DBpedia data about places
• Display multilingual titles & descriptions in 15 languages
Annotating Documents
• Use DBpedia concepts to annotate documents instead of free-text tags
• Named Entity Extraction Systems already use DBpedia URIs (OpenCalais, Muddy Boots)
• Social Bookmarking with DBpedia URIs as tags: www.faviki.com
"Apple"
http://dbpedia.org/resource/Apple_Inc.
http://dbpedia.org/resource/Apple_(fruit)
http://dbpedia.org/resource/Apple_Records
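The "Apple" example shows why a concept URI is less ambiguous than a free-text tag. A toy sketch of picking one of the three URIs from surrounding words is given below; the cue words and the function name `annotate` are invented for illustration and do not describe how OpenCalais, Muddy Boots, or Faviki work.

```python
# URIs come from the slide; the cue words per URI are hypothetical.
CANDIDATES = {
    "http://dbpedia.org/resource/Apple_Inc.": {"iphone", "mac", "company"},
    "http://dbpedia.org/resource/Apple_(fruit)": {"tree", "juice", "orchard"},
    "http://dbpedia.org/resource/Apple_Records": {"beatles", "label", "album"},
}

def annotate(text):
    """Return the candidate URI whose cue words overlap most with the text, or None."""
    words = set(text.lower().split())
    scores = {uri: len(words & cues) for uri, cues in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(annotate("The Beatles released the album on their own label"))
```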
Annotating Documents
• BBC editors tag news articles with DBpedia concepts
• DBpedia Lookup Service: http://lookup.dbpedia.org
Linking Enterprise Data
Take the Linking Open Data approach to the enterprises
Linking Enterprise Data
• Connect data sets with DBpedia as shared vocabulary
• Enable meaningful navigation paths across BBC websites
• Browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …
• Make use of the rich background information: relate the release of a music album to a news article about the artist
Cross-Language Data Fusion
• 264 Wikipedia Editions in different languages
– Italian Wikipedians know more about Italian villages
– German Wikipedia contains more person infoboxes
• Augment the infobox dataset with facts from other Wikipedia editions.
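One simple way to picture this kind of fusion is to fill in properties missing from one edition with values from another. The sketch below does that with a fixed language priority; the function name `fuse`, the priority rule, and the sample values are all invented for illustration, since the slides do not say how conflicts are resolved.

```python
def fuse(editions, priority):
    """Merge per-language property dicts; the first language in `priority` wins.

    Returns {property: (value, source_language)} so provenance is kept.
    """
    fused = {}
    for lang in priority:
        for prop, value in editions.get(lang, {}).items():
            fused.setdefault(prop, (value, lang))
    return fused

# Made-up infobox data for a fictional village.
editions = {
    "en": {"name": "Example Village"},
    "it": {"name": "Example Village", "population": 1200},
}

fused = fuse(editions, priority=["en", "it"])
print(fused)
```

Keeping the source language next to each value makes it possible to revisit a choice later, for example by preferring the Italian edition for Italian villages.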
Augment DBpedia with External Data
• Linking Open Data cloud provides more data than Wikipedia
– EuroStat provides additional statistical information about countries.
– Musicbrainz contains additional information about other bands.
– Geonames provides additional information about locations.
• Idea
– Augment DBpedia with additional data from external sources.
Contribute back to Wikipedia
• Opportunity
– Feed data back to Wikipedia
• Extend the Wikipedia authoring environment with
– Suggestions for infobox values
– Cross-language consistency checking for infoboxes
• Currently going on
– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)
Contribute back to Wikipedia
• Initialize Wikipedia Clean-Up Cycles
– Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
– Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.
Live Update
• Current Situation
– DBpedia update cycle: 3 months
– Wikipedia provides us with access to the live update stream
• Opportunity
– Increase the currency of the DBpedia dataset using this update stream
• Result
– DBpedia in synchronization with Wikipedia.