Data All the Way Down

Post on 24-Jan-2015


Presentation at OKCon 2011 on how to build web applications that provide complex data using a layered architecture.


Data All the Way Down

Jeni Tennison

@JeniT

http://www.jenitennison.com/blog/

Data All the Way Down

• challenges of complex open data

• layered approach to data publishing

• essential steps

• benefits

Complex Datasets

• too much for a single spreadsheet

• need to navigate

  • browse through data

  • look at slices of larger dataset

  • view summary statistics

• need to explain

  • definitions of terms, provisos & disclaimers

User Challenge

• complex data sets have a range of users

  • different hardware / platforms

  • different tasks / goals

  • different ability / understanding

• no one interface satisfies everyone

• data owners cannot satisfy everyone

• create ecosystem around open data

visualisation / data gap

end user vs reuser

Visualisations

• approachable for real people

• necessary for stakeholder buy-in

• beauty is in what's left out

  • advertisement or taster of rich datasets

  • often not possible in official data

• leaves questions unanswered

  • what if we looked at the data in a different way?

Raw Data

• importable into own data store

  • often only interested in particular slice

  • data set may be massive / changing

• run whatever analysis you want

  • requires at least some programming skills

  • analysis might not be appropriate for the data

  • documentation probably lacking

bridging the gap

layered data access

Photo by Nikita Kravchuk http://www.flickr.com/photos/mi55er/3845619153/

Layered Architecture

• user interface

  • navigation and global understanding

• API

  • curated, targeted, programmable access

• query

  • free-form programmable access

• raw data
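The four layers above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names and function names (`RAW_POSTS`, `query`, `api_posts_in_unit`, `api_grade_summary`, `ui_unit_page`) are hypothetical and do not come from the presentation or from any real service. The point is that each layer is built only on the one beneath it, so a reuser can step in at whichever level suits them.

```python
from collections import Counter

# Raw data layer: hypothetical records standing in for a published dataset.
RAW_POSTS = [
    {"id": "p1", "unit": "Finance", "grade": "G1"},
    {"id": "p2", "unit": "Finance", "grade": "G2"},
    {"id": "p3", "unit": "Policy", "grade": "G2"},
]


def query(predicate):
    """Query layer: free-form access; the caller supplies any filter."""
    return [row for row in RAW_POSTS if predicate(row)]


def api_posts_in_unit(unit):
    """API layer: a curated, targeted question built on the query layer."""
    return query(lambda row: row["unit"] == unit)


def api_grade_summary():
    """API layer: summary statistics, again built on the query layer."""
    return dict(Counter(row["grade"] for row in query(lambda row: True)))


def ui_unit_page(unit):
    """User interface layer: a text 'page' built only from API calls."""
    posts = api_posts_in_unit(unit)
    lines = [f"{unit}: {len(posts)} post(s)"]
    lines += [f"  {post['id']} ({post['grade']})" for post in posts]
    return "\n".join(lines)


if __name__ == "__main__":
    print(ui_unit_page("Finance"))
    print(api_grade_summary())
```

Someone who dislikes `ui_unit_page` can call the API functions directly, and someone who needs a question the API does not answer can drop down to `query` or to the raw records.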

legislation.gov.uk: lists as Atom feeds

legislation.gov.uk: content as XML

legislation.gov.uk: layer other views
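As a rough sketch of how a reuser might layer another view over a list published as an Atom feed, the Python below reads a feed and collects each entry's title together with links to its alternate representations. The feed content is an invented sample using example.org addresses, not real legislation.gov.uk output, and `list_entries` is a hypothetical helper; only the Atom namespace and the standard-library `xml.etree.ElementTree` calls are real.

```python
import xml.etree.ElementTree as ET

# The Atom namespace is defined by the Atom Syndication Format (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample feed for illustration only; not output from a real service.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example list</title>
  <entry>
    <title>Example Item One</title>
    <link rel="alternate" type="text/html" href="http://example.org/item/1"/>
    <link rel="alternate" type="application/xml" href="http://example.org/item/1/data.xml"/>
  </entry>
  <entry>
    <title>Example Item Two</title>
    <link rel="alternate" type="text/html" href="http://example.org/item/2"/>
    <link rel="alternate" type="application/xml" href="http://example.org/item/2/data.xml"/>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return each entry's title and its links, keyed by media type."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        links = {
            link.get("type"): link.get("href")
            for link in entry.findall("atom:link", ATOM_NS)
        }
        entries.append({"title": title, "links": links})
    return entries


if __name__ == "__main__":
    for item in list_entries(SAMPLE_FEED):
        print(item["title"], "->", item["links"].get("application/xml"))
```

Because each entry links to both an HTML page and an XML document, the same list serves people browsing and programs that want the underlying content.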

organograms: navigable visualisation

organograms: JSON data

organograms: RDF / XML / HTML

organograms: SPARQL query

organograms: raw data

Key Techniques

• resource-driven design (good URIs)

• every page built based on API calls

• explicit links to API access

  • for bonus points, link to your transformation code

• consistent terminology

  • clear mapping from UI to API

• caching & access control at each level
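Several of these techniques can be shown together in a small Python sketch. It assumes a hypothetical URI scheme on example.org in which adding `.json` to a resource URI gives its API representation; `DATASET`, `resource_uri`, `api_get` and `render_page` are all invented names, not part of any real site. The page is built from the API response, links explicitly to that API call, reuses the API's own field names, and caches at the API level with the standard-library `functools.lru_cache`. Access control is left out for brevity.

```python
from functools import lru_cache
from html import escape
import json

BASE = "http://example.org"  # hypothetical base URI

# Hypothetical stand-in for the query / raw data layers.
DATASET = {"unit/finance": {"label": "Finance", "posts": 2}}


def resource_uri(resource_id, fmt=None):
    """One URI per resource; a format suffix selects the API representation."""
    uri = f"{BASE}/{resource_id}"
    return f"{uri}.{fmt}" if fmt else uri


@lru_cache(maxsize=128)
def api_get(resource_id):
    """API layer: return JSON text for a resource, cached at this level."""
    return json.dumps(DATASET[resource_id], sort_keys=True)


def render_page(resource_id):
    """UI layer: built only from the API response, with a link back to it."""
    data = json.loads(api_get(resource_id))
    json_uri = escape(resource_uri(resource_id, "json"))
    return (
        f'<link rel="alternate" type="application/json" href="{json_uri}">\n'
        f"<h1>{escape(data['label'])}</h1>\n"
        f"<p>posts: {data['posts']}</p>\n"
        f'<p><a href="{json_uri}">Get this data as JSON</a></p>'
    )


if __name__ == "__main__":
    print(render_page("unit/finance"))
```

The page labels its figure `posts`, the same term as the JSON key, so a reader who follows the link can map what they saw in the UI straight onto the API.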

Benefits

• fork at any point

  • don't like the visualisation / API? create your own!

• everyone is human

  • reusers gain understanding from user interface

• visualisation benefits the stack

  • API oriented towards achieving a goal

  • visual validation of data improves quality

Questions?