The RDF Report Card: Beyond the Triple Count
![Page 1: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/1.jpg)
Leigh Dodds (@ldodds)
http://kasabi.com
http://slideshare.net/ldodds
The RDF Report Card
Beyond the Triple Count
26th September 2011
SemTechBiz 2011
![Page 2: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/2.jpg)
Triple counts tell us nothing
![Page 3: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/3.jpg)
Triple counts are not a quality indicator
![Page 4: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/4.jpg)
http://dbpedia.org/resource/London
![Page 5: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/5.jpg)
6 triples for Population Density

| Property | Count | Values |
| --- | --- | --- |
| http://dbpedia.org/ontology/PopulatedPlace/populationDensity | 2 | 4807.0, 4806.971873853451 |
| http://dbpedia.org/ontology/populationDensity | 2 | 4806.971874, 4807.000000 |
| http://dbpedia.org/property/populationDensityKm | 1 | 4807 |
| http://dbpedia.org/property/populationDensitySqMi | 1 | 12450 |
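A rough way to see the redundancy is to normalise the six values to one unit and compare them. The sketch below assumes every value is people per square kilometre except `populationDensitySqMi`, which is assumed to be per square mile; the unit labels and the helper name `per_sq_km` are illustrative and not part of DBpedia.

```python
# Sketch: do the six population density triples describe one measurement?
SQ_KM_PER_SQ_MI = 2.589988110336  # standard conversion: 1 sq mi in sq km

# (property, value, assumed unit) taken from the slide; units are an assumption.
density_triples = [
    ("http://dbpedia.org/ontology/PopulatedPlace/populationDensity", 4807.0, "km2"),
    ("http://dbpedia.org/ontology/PopulatedPlace/populationDensity", 4806.971873853451, "km2"),
    ("http://dbpedia.org/ontology/populationDensity", 4806.971874, "km2"),
    ("http://dbpedia.org/ontology/populationDensity", 4807.000000, "km2"),
    ("http://dbpedia.org/property/populationDensityKm", 4807, "km2"),
    ("http://dbpedia.org/property/populationDensitySqMi", 12450, "mi2"),
]


def per_sq_km(value, unit):
    """Convert a density to people per square kilometre."""
    return value / SQ_KM_PER_SQ_MI if unit == "mi2" else value


normalised = [per_sq_km(value, unit) for _, value, unit in density_triples]
spread = max(normalised) - min(normalised)
print(f"{len(normalised)} triples, spread of {spread:.3f} people per sq km")
```

Under those assumptions the values differ only by rounding, so one triple would carry the same information as all six.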
![Page 6: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/6.jpg)
12 triples for Location (1)

| Property | Count | Value |
| --- | --- | --- |
| georss:point | 1 | 51.507222222222225 -0.1275 |
| geo:geometry | 1 | POINT(-0.1275 51.5072) |
| geo:lat | 1 | 51.507221 |
| geo:long | 1 | -0.127500 |
![Page 7: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/7.jpg)
12 triples for Location (2)

| Property | Count | Value |
| --- | --- | --- |
| dbpprop:latd | 1 | 51 |
| dbpprop:latm | 1 | 30 |
| dbpprop:lats | 1 | 26 |
| dbpprop:latns | 1 | N |
| dbpprop:longd | 1 | 0 |
| dbpprop:longm | 1 | 7 |
| dbpprop:longs | 1 | 39 |
| dbpprop:longew | 1 | W |
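The eight degree/minute/second triples encode the same point as the decimal coordinates on the previous slide. A minimal sketch of the conversion, using the values above; the function name `dms_to_decimal` is made up for this illustration:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation.
    return -value if hemisphere in ("S", "W") else value


# Values from dbpprop:latd/latm/lats/latns and dbpprop:longd/longm/longs/longew.
lat = dms_to_decimal(51, 30, 26, "N")
lon = dms_to_decimal(0, 7, 39, "W")
print(f"lat={lat:.6f} long={lon:.6f}")
```

The result agrees with the geo:lat and geo:long values to within rounding, so all 12 location triples describe a single coordinate pair.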
![Page 8: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/8.jpg)
~4.6m redundant triples
![Page 9: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/9.jpg)
Triple counts don't indicate utility
![Page 10: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/10.jpg)
http://bbc.co.uk/programmes
2.5 million unique users per week, 60 req/s*
*http://www.guardian.co.uk/media/pda/2011/apr/06/bbc-yves-raimond
![Page 11: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/11.jpg)
http://bbc.co.uk/programmes
Dataset is less than 50 million triples
![Page 12: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/12.jpg)
Beyond the Triple Count
![Page 13: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/13.jpg)
Dataset Information Spectrum
Low Detail: Summary and overview of dataset content
High Detail: Detailed data model documentation & guides
![Page 14: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/14.jpg)
Dataset Information Spectrum
Low Detail: Summary and overview of dataset content
High Detail: Detailed data model documentation & guides
More Information
![Page 15: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/15.jpg)
Dataset Information Spectrum
Low Detail / High Detail
Metadata
● Title, Description
● Provenance
● Publication dates
● Licensing
● Usage cues
● Related datasets
![Page 16: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/16.jpg)
Dataset Information Spectrum
Low Detail / High Detail
Scope
● What types of entity?
● How many of each type?
● Coverage
  ● Geographic
  ● Events (time)
![Page 17: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/17.jpg)
Dataset Information Spectrum
Low Detail / High Detail
Structure
● URI Scheme
● Vocabulary meshing
  ● How is a person described?
![Page 18: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/18.jpg)
Dataset Information Spectrum
Low Detail / High Detail
Internals
● List of Schemas & RDF terms
● Class/property usage counts
● Triple counts
● Named graph structure
● Source files
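Class and property usage counts can be derived directly from the triples. The sketch below is one possible approach, not the method used in the talk: it treats a dataset as an in-memory list of (subject, predicate, object) tuples, and the `example.org` URIs are placeholder data.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def usage_counts(triples):
    """Return (class usage counts, property usage counts) for a list of triples."""
    class_counts = Counter()
    property_counts = Counter()
    for _subject, predicate, obj in triples:
        property_counts[predicate] += 1
        if predicate == RDF_TYPE:
            # The object of an rdf:type triple is the class being used.
            class_counts[obj] += 1
    return class_counts, property_counts


# Placeholder data for illustration only.
sample_triples = [
    ("http://example.org/artist/1", RDF_TYPE, "http://example.org/schema/Artist"),
    ("http://example.org/artist/1", "http://example.org/schema/name", "Artist One"),
    ("http://example.org/artist/2", RDF_TYPE, "http://example.org/schema/Artist"),
    ("http://example.org/album/1", RDF_TYPE, "http://example.org/schema/Album"),
    ("http://example.org/album/1", "http://example.org/schema/name", "Album One"),
]

class_counts, property_counts = usage_counts(sample_triples)
for class_uri, count in class_counts.most_common():
    print(count, class_uri)
```

The plain triple count is just the sum of the property counts; the per-class and per-property breakdown is what says something about the content.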
![Page 19: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/19.jpg)
RDF Report Card Example
![Page 20: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/20.jpg)
Summarising Content of a Dataset
● Find all classes in all datasets in Kasabi
● Tag each class against a pre-defined set of categories
  ● Customized version of top-level schema.org classes
● Generate a report card for each dataset listing types of entity
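The steps above could be sketched roughly as follows. This is not Kasabi's implementation: the class-to-category mapping, the category names, the `Other` fallback, and the counts are all hypothetical and stand in for the hand-tagged mapping the slide describes.

```python
from collections import Counter

# Hypothetical tagging of classes against a small set of categories.
CATEGORY_BY_CLASS = {
    "http://xmlns.com/foaf/0.1/Person": "Person",
    "http://xmlns.com/foaf/0.1/Organization": "Organization",
    "http://example.org/schema/Album": "CreativeWork",
}


def report_card(class_counts):
    """Roll per-class entity counts up into per-category totals."""
    card = Counter()
    for class_uri, count in class_counts.items():
        # Classes nobody has tagged yet fall into an assumed "Other" bucket.
        card[CATEGORY_BY_CLASS.get(class_uri, "Other")] += count
    return dict(card)


# Made-up entity counts for a single dataset.
example_class_counts = {
    "http://xmlns.com/foaf/0.1/Person": 120,
    "http://xmlns.com/foaf/0.1/Organization": 15,
    "http://example.org/schema/Album": 300,
    "http://example.org/schema/Widget": 4,
}

card = report_card(example_class_counts)
for category, total in sorted(card.items()):
    print(f"{category}: {total}")
```

Keeping the mapping separate from the counting means one tagging pass over the known classes can be reused for every dataset.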
![Page 21: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/21.jpg)
Report Card Categories
![Page 22: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/22.jpg)
Ordnance Survey
http://beta.kasabi.com/dataset/ordnance-survey-linked-data
![Page 23: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/23.jpg)
BBC Music
http://beta.kasabi.com/dataset/bbc-music
![Page 24: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/24.jpg)
British National Bibliography
http://beta.kasabi.com/dataset/british-national-bibliography-bnb
![Page 25: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/25.jpg)
NHS Performance Data
http://beta.kasabi.com/dataset/nhs-performance-data
![Page 26: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/26.jpg)
Summary
● Triple counts tell us nothing
● Vital to present the quality & utility of our data
● Data publishing platforms should support this
● "Progressive disclosure"
● Right detail at the right time
● Dataset analysis can generate useful summaries
● e.g. an RDF report card
![Page 27: The RDF Report Card: Beyond the Triple Count](https://reader034.fdocuments.us/reader034/viewer/2022042702/55d730bebb61eb502b8b4582/html5/thumbnails/27.jpg)