DBpedia 2014: Highlights and Issues of the New Release

DBpedia 2014 : Highlights and Issues of the New Release Volha Bryl Data and Web Science Research Group University of Mannheim, Germany DBpedia Community Meeting, Leipzig, Germany, September 3, 2014

Presentation of the new DBpedia 2014 release at the DBpedia Community meeting in Leipzig, September 3, 2014

DBpedia 2014: Highlights and Issues of the New Release

Volha Bryl

Data and Web Science Research Group

University of Mannheim, Germany

DBpedia Community Meeting, Leipzig, Germany, September 3, 2014

DBpedia 2014 : Almost Released

http://dbpedia.org/page/Rome/London

http://dbpedia.org/sparql/

http://dbpedia.org/fct/

http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/

http://wiki.dbpedia.org/Datasets2014/DatasetStatistics
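As a rough illustration of how the SPARQL endpoint listed above can be addressed over HTTP, the sketch below builds a GET URL that passes a query in the `query` parameter defined by the SPARQL protocol. The helper name `build_query_url`, the example query, and the resource URI `http://dbpedia.org/resource/Rome` are assumptions made for illustration; they are not taken from the slides.

```python
from urllib.parse import urlencode

# Endpoint URL as listed on the slide.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql/"

# Example query (an assumption for illustration): list some types of the Rome resource.
EXAMPLE_QUERY = """
SELECT ?type WHERE {
  <http://dbpedia.org/resource/Rome> a ?type .
} LIMIT 10
"""


def build_query_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return a GET URL carrying the query in the SPARQL protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(EXAMPLE_QUERY)
print(url)
```

The URL can then be fetched with any HTTP client; the desired result format would typically be requested through the `Accept` header.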

DBpedia 2014 : Almost Released

• Why “DBpedia 2014”?

• Suggested at one of the developers’ hangouts

• 3.10 would be confusing

• 4.0 would mean “there are major changes/improvements”

DBpedia 2014 : Team

• Research Group Data and Web Science, University of Mannheim

• Daniel Fleischhacker *

• Michael Moore * (intern from Uni Waterloo)

• Volha Bryl *

• Christian Bizer

* funded by the LOD2 project

• With the support of

• Dimitris Kontokostas

• Jona Christopher Sahnwaldt

• Kingsley Idehen, Patrick van Kleef, Mitko Iliev (OpenLink Software)

• Heiko Paulheim, Petar Ristoski (Uni Mannheim)

• …the whole DBpedia community…

DBpedia 2014 : Facts and Numbers

• Dumps from April / May 2014

• 3.9 was based on dumps from March / April 2013

• Improved mappings

• http://mappings.dbpedia.org/, mid July 2014

• 4,339 mappings (3.9: 3,177 mappings)

• Enlarged Ontology

• 685 classes (3.9: 529)

• 1,079 object and 1,600 datatype properties (3.9: 927 and 1,290)

• More mappings to schema.org, Wikidata, …

• Mappings to DOLCE ontology, by Aldo Gangemi

DBpedia 2014 : Facts and Numbers

• 125 languages (3.9: 119)

• Wikipedia language editions with 10,000+ articles

• New mapping-based chapters and data for Belarusian (be), Serbian (sr), Welsh (cy), Slovak (sk)

• Wikimedia Commons extraction

• New extractors/datasets

• Length of article page (page-length)

• Number of out-going links (out-degree)

• Anchor texts used by links referring to an entity (anchor-text)

• Should we publish them?

• Surface forms (anchor-texts + redirect labels + labels) (surface-form)
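To make the new datasets more concrete, here is a minimal sketch of how out-degree and surface forms could be derived from page links. The toy data, the function names, and the choice to count every link (rather than distinct targets) are assumptions for illustration; the actual extractors in the Extraction Framework may work differently.

```python
from collections import Counter, defaultdict

# Toy input, made up for illustration: (source page, target page, anchor text).
links = [
    ("Italy", "Rome", "Rome"),
    ("Italy", "Rome", "the Eternal City"),
    ("Lazio", "Rome", "capital"),
    ("Rome", "Italy", "Italy"),
]
# Labels of pages that redirect to the entity (hypothetical values).
redirect_labels = {"Rome": ["Roma", "City of Rome"]}
# The entity's own label.
labels = {"Rome": "Rome", "Italy": "Italy"}


def out_degree(links):
    """Number of out-going links per source page (every link is counted)."""
    return Counter(source for source, _target, _anchor in links)


def anchor_texts(links):
    """Anchor texts used by links that refer to each target entity."""
    texts = defaultdict(set)
    for _source, target, anchor in links:
        texts[target].add(anchor)
    return texts


def surface_forms(entity, links, redirect_labels, labels):
    """Surface forms = anchor texts + redirect labels + the entity's label."""
    forms = set(anchor_texts(links).get(entity, set()))
    forms.update(redirect_labels.get(entity, []))
    if entity in labels:
        forms.add(labels[entity])
    return forms


print(out_degree(links))
print(sorted(surface_forms("Rome", links, redirect_labels, labels)))
```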

DBpedia 2014 : Facts and Numbers (code)

• New abstract extraction approach; abstracts are much cleaner now

• Local Wikipedia copy + using the MediaWiki API for parsing

• Canonicalized (-en-uris) dumps based on Wikidata language links

• Based on newly introduced Wikidata extractors

• old-interlanguage-links dumps contain leftover language links directly contained in Wikipedia (ignored for canonicalization)

• Support for RDF 1.1

• Improved handling of dates and times

• Improved handling of external URLs
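The idea behind the canonicalized (-en-uris) dumps can be sketched as follows: localized resource URIs are rewritten to their English counterparts using a language-link mapping. The mapping contents, the example triples, and the policy of dropping triples whose subject has no English counterpart are assumptions for illustration, not a description of the framework's actual code.

```python
# Hypothetical mapping derived from language links: localized URI -> English URI.
lang_links = {
    "http://de.dbpedia.org/resource/Rom": "http://dbpedia.org/resource/Rome",
    "http://de.dbpedia.org/resource/Italien": "http://dbpedia.org/resource/Italy",
}


def canonicalize(triples, lang_links):
    """Rewrite localized URIs to English URIs.

    Assumption: a triple is kept only if its subject has an English
    counterpart; objects without a mapping (e.g. literals) are left as-is.
    """
    result = []
    for subject, predicate, obj in triples:
        if subject not in lang_links:
            continue  # no English counterpart, so not part of the canonicalized dump
        result.append((lang_links[subject], predicate, lang_links.get(obj, obj)))
    return result


triples = [
    ("http://de.dbpedia.org/resource/Rom",
     "http://dbpedia.org/ontology/country",
     "http://de.dbpedia.org/resource/Italien"),
    ("http://de.dbpedia.org/resource/Rom",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Rom"),
    ("http://de.dbpedia.org/resource/Unbekannt",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Unbekannt"),
]

for triple in canonicalize(triples, lang_links):
    print(triple)
```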

DBpedia 2014 : Facts and Numbers

      Instances, localized (non-en) URIs  Mapping-based statements
Lang  3.9        2014       diff, %       3.9         2014        diff, %
en    4,258,406  4,584,616  7.7           41,804,545  61,255,734  46.5
nl    1,461,314  1,774,536  21.4          5,039,583   6,752,260   34.0
de    1,547,785  1,692,634  9.4           4,070,927   6,733,886   65.4
fr    1,378,099  1,504,453  9.2           5,273,302   6,899,052   30.8
it    1,029,528  1,128,909  9.7           5,724,415   7,984,501   39.5
ru    999,165    1,119,142  12.0          3,174,725   4,070,294   28.2
es    1,003,158  1,086,296  8.3           5,950,626   7,070,608   18.8
pl    960,880    1,043,400  8.6           4,624,126   6,031,811   30.4
ja    860,917    913,488    6.1           1,674,891   2,136,719   27.6
pt    764,132    812,610    6.3           4,489,235   5,098,947   13.6

* More at http://wiki.dbpedia.org/Datasets2014/DatasetStatistics
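The "diff, %" columns appear to be the relative growth from 3.9 to 2014. The slides do not state the formula, so the sketch below assumes `(new - old) / old * 100`, rounded to one decimal; under that assumption it reproduces the values listed for English. The function name is made up for illustration.

```python
def growth_percent(old: int, new: int) -> float:
    """Relative growth from the old to the new value, in percent (one decimal)."""
    return round((new - old) / old * 100, 1)


# English instances: 4,258,406 (3.9) vs. 4,584,616 (2014)
print(growth_percent(4_258_406, 4_584_616))    # 7.7
# English mapping-based statements: 41,804,545 (3.9) vs. 61,255,734 (2014)
print(growth_percent(41_804_545, 61_255_734))  # 46.5
```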

DBpedia 2014 : Facts and Numbers

English DBpedia

Class                     3.9      2014       diff, %
Persons                   832,000  1,445,000  73.68
Places                    639,000  735,000    15.02
Populated Places          427,000  478,000    11.94
Creative Works            372,000  411,000    10.48
Music Albums              116,000  123,000    6.03
Films                     78,000   87,000     11.54
Video Games               18,500   19,000     2.70
Organizations             209,000  241,000    15.31
Companies                 49,000   58,000     18.37
Educational Institutions  45,000   49,000     8.89
Species                   226,000  251,000    11.06
Diseases                  5,600    6,000      7.14

DBpedia 2014 : Organizational

• International DBpedia chapters

• http://wiki.dbpedia.org/Internationalization/Chapters

• …how many are alive?

• Call to chapter maintainers: please update the data!

• …or, do you prefer to (more frequently) extract data on your own?

• Let’s keep track of all the ongoing and completed DBpedia projects

• https://github.com/dbpedia/extraction-framework/wiki

DBpedia 2014 : Open Points

• Mappings are… aging

• cs (Czech) template usage patterns changed =>

• Fixed: redirects in both directions are now resolved

• Fix on the statistics page?
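Resolving redirects "in both directions" when looking up a mapping can be sketched as below. The data structures, the template names, and the single-hop resolution are assumptions for illustration; the slides do not describe how the fix is implemented.

```python
def find_mapping(template, mappings, redirects):
    """Look up a mapping for a template, resolving redirects in both directions.

    mappings:  dict template name -> mapping (any object)
    redirects: dict redirecting template name -> target template name
    Only a single redirect hop is followed in this sketch.
    """
    if template in mappings:
        return mappings[template]
    # Direction 1: the template used on the page redirects to a mapped template.
    target = redirects.get(template)
    if target in mappings:
        return mappings[target]
    # Direction 2: a mapped template redirects to the template used on the page.
    for source, dest in redirects.items():
        if dest == template and source in mappings:
            return mappings[source]
    return None


# Hypothetical template names.
mappings = {"Infobox A old": "mapping-A", "Infobox B new": "mapping-B"}
redirects = {"Infobox A old": "Infobox A new", "Infobox B old": "Infobox B new"}

print(find_mapping("Infobox A new", mappings, redirects))  # mapping-A
print(find_mapping("Infobox B old", mappings, redirects))  # mapping-B
```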

DBpedia 2014 : Open Points

• Mappings are… aging

• For some old and new templates parallel mappings exist!

• New, not detailed, created by Lebot (who is to blame?) http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_spaceflight

• Old, detailed, created in 2010: http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_space_mission

• Redirects from the old to the new one exist in Wikipedia

• …but our fix would not help in this case
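Parallel mappings of this kind could at least be detected automatically. The sketch below flags every redirect whose source and target templates both have a mapping; the function name and data structures are assumptions for illustration, while the two template names come from the mapping URLs above.

```python
def parallel_mappings(mapped_templates, redirects):
    """Return (old, new) template pairs where both sides of a redirect are mapped."""
    return sorted(
        (old, new)
        for old, new in redirects.items()
        if old in mapped_templates and new in mapped_templates
    )


mapped_templates = {"Infobox space mission", "Infobox spaceflight"}
# Redirect from the old template to the new one, as it exists in Wikipedia.
redirects = {"Infobox space mission": "Infobox spaceflight"}

print(parallel_mappings(mapped_templates, redirects))
```

Deciding which of the two mappings to keep, or how to merge them, would still need a human editor.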

DBpedia 2014 : Open Points

• Wikidata vs. DBpedia

• Wikidata properties adoption in Infoboxes, counts

• English: 5,229 occurrences

• Italian: 291

• German: 460

=> strategy – ignore (so far)

• Does any kind of template auto-completion exist?
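Counts like the ones above could be gathered by scanning infobox wikitext for Wikidata property calls. The sketch below assumes that adoption is measured by occurrences of the `{{#property:…}}` parser function; the slides do not say how the numbers were obtained, and the sample wikitext is made up.

```python
import re

# Matches the start of a {{#property:...}} call and captures the property name or ID.
PROPERTY_CALL = re.compile(r"\{\{\s*#property\s*:\s*([^|}]+)", re.IGNORECASE)


def count_property_calls(wikitext: str) -> int:
    """Count occurrences of the {{#property:...}} parser function in wikitext."""
    return len(PROPERTY_CALL.findall(wikitext))


# Made-up sample infobox for illustration.
sample = """{{Infobox settlement
| name = Example
| population = {{#property:P1082}}
| country = {{#property:P17}}
| area = 12
}}"""

print(count_property_calls(sample))  # 2
```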

DBpedia 2014 : Open Points

• Nesting templates and conditional logic

• When an infobox is filled from other templates, the Extraction Framework gives very few results

• Strategy – ???

DBpedia 2014 : More Points

• Take care of documentation and comments

• A big effort was made to improve the extraction/release preparation guides while working on the release; they still need to be integrated:

https://github.com/dfleischhacker/extraction-framework/wiki/

• Testing and data quality checking should be done on languages other than English

• Next time, announce not only a mapping sprint but also a coding sprint

• …volunteers for doing the next release?