DBpedia ♥ Commons
2nd DBpedia Meeting Leipzig 03.09.2014
Gaurav Vaidya - Dimitris Kontokostas - Andrea Di Menna - Jim O'Regan
~23M pages like this
A lot of pages like this
Many pages like this
Not very similar to pages like this
DBpedia Extraction Framework
✔ “Wiki agnostic”
✔ Pluggable extractors
✔ Out of the box support for common metadata
✗ Tuned for extraction in the main namespace (not File:)
✗ Many other challenges left
Challenges
✔ File metadata
✔ KML files
✔ Image Galleries
✔ Image Annotations
✔ Mappings Wiki
✔ Bootstrap community mappings
✔ Template Statistics
✔ Licensing
✔ Technical details I'll not go into
Out-of-the-box support
● Categories (skos)
● External links
● Geo-coordinates
● Raw infobox properties
● Labels
● PageIds / Revisions
● Links (internal / external)
● Mappings Wiki (with some tweaking / more on that later)
File metadata
● New Extractor
● New file Class hierarchy
– dbo:File, dbo:Image, dbo:StillImage, dbo:MovingImage and dbo:Sound
Sample Output:
:Aeropetes.JPG a dbo:StillImage, dbo:Image, dbo:Document, dbo:File, dbo:Work ;
    dcterms:type dbo:StillImage ;
    dbo:fileExtension "jpg" ;
    dcterms:format "image/jpeg" ;
    dbo:fileURL commons-path:Aeropetes.JPG ;
    foaf:depiction commons-path:Aeropetes.JPG ;
    dbo:thumbnail commons-path:Aeropetes.JPG?width=300 .
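As a rough illustration of how such file metadata could be derived from a file title alone, here is a minimal Python sketch. The lookup table `EXTENSION_INFO` and the function `file_metadata` are hypothetical and are not taken from the extraction framework (which is not written in Python); the choice of class per extension is an assumption made for the example.

```python
# Hypothetical lookup table (an assumption, not the framework's actual mapping):
# lowercase extension -> (MIME type, most specific dbo class).
EXTENSION_INFO = {
    "jpg": ("image/jpeg", "dbo:StillImage"),
    "png": ("image/png", "dbo:StillImage"),
    "webm": ("video/webm", "dbo:MovingImage"),
    "ogg": ("audio/ogg", "dbo:Sound"),  # assumed audio; .ogg can also hold video
}


def file_metadata(title):
    """Return (predicate, object) pairs for a Commons file title."""
    # Take the text after the last dot as the extension, normalised to lowercase.
    extension = title.rsplit(".", 1)[-1].lower() if "." in title else ""
    info = EXTENSION_INFO.get(extension)
    if info is None:
        # Unknown extension: fall back to the most generic class only.
        return [("rdf:type", "dbo:File")]
    mime_type, dbo_class = info
    return [
        ("rdf:type", dbo_class),
        ("rdf:type", "dbo:File"),
        ("dbo:fileExtension", extension),
        ("dcterms:format", mime_type),
    ]


if __name__ == "__main__":
    for predicate, obj in file_metadata("Aeropetes.JPG"):
        print(":Aeropetes.JPG", predicate, obj)
```

The sketch only emits the most specific class plus dbo:File; the full superclass list shown in the sample output would come from the ontology.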
Image Galleries
● Attach each gallery item to the page resource
:Colorado dbo:hasGalleryItem Colorado.JPG, Denver_Colorado_Art.jpg, ColoradoCenter1.jpg.
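A gallery extractor of this kind could look roughly like the following Python sketch, assuming galleries are written with the MediaWiki `<gallery>` tag, one file per line, with an optional caption after a pipe. The names `gallery_items` and `gallery_triples` are made up for illustration and do not come from the extraction framework.

```python
import re

# Matches the body of each <gallery ...> ... </gallery> block.
GALLERY_RE = re.compile(r"<gallery[^>]*>(.*?)</gallery>", re.DOTALL | re.IGNORECASE)


def gallery_items(wikitext):
    """Return the file names listed inside <gallery> blocks, in order."""
    items = []
    for body in GALLERY_RE.findall(wikitext):
        for line in body.splitlines():
            # Everything before the first pipe is the file title.
            name = line.split("|", 1)[0].strip()
            if not name:
                continue
            # Drop an optional "File:" or "Image:" namespace prefix.
            prefix, sep, rest = name.partition(":")
            if sep and prefix.strip().lower() in ("file", "image"):
                name = rest.strip()
            # Resource names use underscores instead of spaces.
            items.append(name.replace(" ", "_"))
    return items


def gallery_triples(page, wikitext):
    """Format one dbo:hasGalleryItem statement per gallery item."""
    return [f":{page} dbo:hasGalleryItem :{item} ." for item in gallery_items(wikitext)]


if __name__ == "__main__":
    sample = (
        "<gallery>\n"
        "File:Colorado.JPG|State view\n"
        "Image:Denver Colorado Art.jpg\n"
        "ColoradoCenter1.jpg|Center\n"
        "</gallery>"
    )
    print("\n".join(gallery_triples("Colorado", sample)))
```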
Image Annotations
● AnnotationGadget
● Boxes with optional description
● W3C Media Fragments recommendation
● Embed the box in the URI
– ?width=15130&height=1886#xywh=pixel:10431,324,1670,1208> .
● Add descriptions in the new resource
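The URI pattern above can be sketched in Python as follows. The function name `annotation_uri` and the example base URL are assumptions for illustration; only the `?width=…&height=…#xywh=pixel:x,y,w,h` shape comes from the slide, with `xywh=pixel:` being the spatial dimension syntax of the Media Fragments recommendation.

```python
def annotation_uri(file_url, full_width, full_height, x, y, box_width, box_height):
    """Build a URI for a rectangular annotation on an image.

    The query string records the image size the box coordinates refer to;
    the fragment selects the box as a pixel-based spatial media fragment.
    """
    query = f"?width={full_width}&height={full_height}"
    fragment = f"#xywh=pixel:{x},{y},{box_width},{box_height}"
    return f"{file_url}{query}{fragment}"


if __name__ == "__main__":
    # Hypothetical base URL; the coordinates are the ones shown on the slide.
    print(annotation_uri("http://example.org/Map.jpg", 15130, 1886, 10431, 324, 1670, 1208))
```

Each such URI identifies a new resource, so the optional annotation description can be attached to it directly.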
Mappings Wiki
Template Statistics
Licensing
● Automatically identified and imported ~360 licence templates
● Use the mappings wiki
● Needed some hacking to make it work
– e.g. {{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}
:Acraea_circeis.JPG dbo:license <http://creativecommons.org/publicdomain/mark/1.0/>
:Antepipona_deflenda_-_2012-10-17.webm dbo:license <http://creativecommons.org/licenses/by-sa/3.0/>
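To show why a template such as {{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}} needs special handling, here is a small Python sketch that expands the comma-separated version shorthand and maps Creative Commons names to licence URIs. The functions `expand_self_template` and `licence_uri` are hypothetical; the real mapping is done through the mappings wiki, and treating the comma list as "same licence, other versions" is an assumption about the template's meaning.

```python
import re

# Recognises names such as "cc-by-3.0" or "cc-by-sa-2.5".
CC_RE = re.compile(r"^cc-(by(?:-sa)?)-(\d\.\d)$")


def expand_self_template(template):
    """Expand {{Self|...}} arguments into individual licence names."""
    inner = template.strip()[2:-2]  # drop the surrounding {{ and }}
    parts = [part.strip() for part in inner.split("|")]
    licences = []
    for arg in parts[1:]:  # parts[0] is the template name
        if "=" in arg:
            continue  # skip named parameters such as author=...
        pieces = arg.split(",")
        first = pieces[0]
        licences.append(first)
        # "cc-by-sa-3.0,2.5" is read as cc-by-sa-3.0 plus cc-by-sa-2.5.
        prefix = first.rsplit("-", 1)[0]
        for version in pieces[1:]:
            licences.append(f"{prefix}-{version.strip()}")
    return licences


def licence_uri(name):
    """Return the Creative Commons URI for a licence name, or None."""
    match = CC_RE.match(name.lower())
    if match is None:
        return None
    return f"http://creativecommons.org/licenses/{match.group(1)}/{match.group(2)}/"


if __name__ == "__main__":
    for licence in expand_self_template("{{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}"):
        print(licence, licence_uri(licence))
```

Licences outside the Creative Commons pattern (GFDL here) are returned by name only and left unmapped in this sketch.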
KML Annotations attached to media
Attach raw KML data to resource with custom extractor
Sample Output:
:Yellowstone_1871b.jpg dbo:hasKMLData """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<name>Yorktown, Indiana (1878)</name>
<description>An 1878 map of Yorktown in Tippecanoe County, Indiana. Source: Kingman Brothers' Combination Atlas Map of Tippecanoe County, Indiana, 1878.</description>
<color>99ffffff</color>
<Icon><href>BIG_LINK_HERE</href><viewBoundScale>0.75</viewBoundScale></Icon>
<LatLonBox><north>40.26126145890567</north><south>40.25777915632657</south><east>-86.77033439383223</east><west>-86.77398493316619</west><rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay>
</kml>"""^^rdfs:XMLLiteral .
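Because the KML is attached as a raw literal, a consumer has to parse it to get at the overlay's bounding box. A minimal sketch of that step in Python, using only the standard library; the function name `lat_lon_box` is made up for the example, and the namespace URI is the one that appears in the sample output.

```python
import xml.etree.ElementTree as ET

# Namespace used by the KML document in the sample output.
KML_NS = {"kml": "http://earth.google.com/kml/2.2"}


def lat_lon_box(kml_text):
    """Return the north/south/east/west bounds of the first LatLonBox, or None."""
    root = ET.fromstring(kml_text)
    box = root.find(".//kml:LatLonBox", KML_NS)
    if box is None:
        return None
    return {
        side: float(box.find(f"kml:{side}", KML_NS).text)
        for side in ("north", "south", "east", "west")
    }


if __name__ == "__main__":
    # Shortened, made-up KML with the same structure as the sample output.
    sample = (
        '<kml xmlns="http://earth.google.com/kml/2.2"><GroundOverlay>'
        "<LatLonBox><north>40.5</north><south>40.25</south>"
        "<east>-86.75</east><west>-87.0</west></LatLonBox>"
        "</GroundOverlay></kml>"
    )
    print(lat_lon_box(sample))
```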
Remaining TODOs
● Nested templates are commonly used and cannot be handled by the mappings wiki at the moment
– e.g. Media descriptions (although mapped) are missing
{{Information
|Description= {{en|Logo of the [[w:en:DBpedia|DBpedia project]]}}
{{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}
● Annotation descriptions need some tweaking
– Need to render wikitext
● Put it under a SPARQL Endpoint
● Provide Linked Data
– http://commons.dbpedia.org
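For the nested-template and wikitext-rendering TODOs, one possible approach is sketched below in Python: pull out language-tagged templates such as {{en|…}} from a description and reduce wiki links to their visible label. This is only an illustration under simplifying assumptions (language templates are not nested inside each other, and links are the only markup to render); `descriptions` and `plain_text` are hypothetical names, not part of the framework.

```python
import re

# A language template such as {{en|some text}}: a 2-3 letter lowercase code,
# a pipe, then the text up to the first closing braces.
LANG_TEMPLATE_RE = re.compile(r"\{\{([a-z]{2,3})\|(.*?)\}\}", re.DOTALL)

# A wiki link, either [[target]] or [[target|label]]; group 1 is the visible text.
WIKILINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def plain_text(wikitext):
    """Replace wiki links with their visible label and trim whitespace."""
    return WIKILINK_RE.sub(r"\1", wikitext).strip()


def descriptions(wikitext):
    """Map language code -> plain-text description for each language template."""
    return {lang: plain_text(body) for lang, body in LANG_TEMPLATE_RE.findall(wikitext)}


if __name__ == "__main__":
    sample = (
        "{{Information |Description= "
        "{{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} "
        "{{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}"
    )
    for lang, text in descriptions(sample).items():
        print(f'"{text}"@{lang}')
```

The result maps naturally onto language-tagged literals, one per language template.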
Thank You!
Special thanks to:
● Alexandru Todor (importing the License templates)
● Google Summer of Code for sponsoring this project (Gaurav Vaidya)
Questions?
Dataset: http://nl.dbpedia.org/downloads/commonswiki
Dataset samples: https://github.com/gaurav/commons-extraction