HDI III - Healthdata.gov - Now, Next and Challenges

healthdata.gov now and next challenges overview hhs ocio, health datapalooza 2012

description

This is a presentation that will be given at the 2012 Health Datapalooza (http://hdiforum.org), describing the new healthdata.gov site, its PaaS/DaaS direction, and related i2/ONC developer challenges.

Transcript of HDI III - Healthdata.gov - Now, Next and Challenges

Page 1: HDI III - Healthdata.gov - Now, Next and Challenges

healthdata.gov now and next

challenges overview

hhs ocio, health datapalooza 2012

Page 2

session agenda

• now – tools and features

• next – target architecture

• challenges – explanations in sequence

Page 3

now – tools and features

• Drupal – publishing workflow and community engagement

• Solr – faceted search

• CKAN – ‘on-demand resources’ (RESTful API and feeds)

• EC2 – powered by AWS GovCloud

• github.com/hhs – public repos coming soon!

Page 4

publishing workbench

[screenshot: publishing workbench]

Page 5

community engagement

[screenshot: community engagement, with a question and/or ideas example]

Page 7

hub.healthdata.gov/api/rest/dataset

step 1: HTTP GET /dataset returns the dataset collection as JSON (each dataset identified by GUID or name)
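Step 1 can be sketched with the Python standard library alone. This is a minimal, non-authoritative sketch: it assumes the endpoint returns a flat JSON array of dataset names or GUIDs, and because the 2012 hub.healthdata.gov host may no longer respond, parsing is kept separate from the network call so it can be exercised offline.

```python
import json
from urllib.request import Request, urlopen

# Base URL from the slide; the 2012 host may no longer be online.
BASE = "http://hub.healthdata.gov/api/rest"


def dataset_collection_url(base: str = BASE) -> str:
    """URL for step 1: the collection of all datasets."""
    return f"{base}/dataset"


def parse_dataset_collection(body: str) -> list:
    """Parse the response body, assumed to be a JSON array of names or GUIDs."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of dataset identifiers")
    return [str(item) for item in data]


def fetch_dataset_collection(base: str = BASE) -> list:
    """Perform the HTTP GET (requires network access; not called below)."""
    req = Request(dataset_collection_url(base), headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return parse_dataset_collection(resp.read().decode("utf-8"))


# Offline usage with a hand-written body standing in for a real response.
print(parse_dataset_collection('["hospice-medicare-cost-report-data"]'))
```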

Page 8

hub.healthdata.gov/api/rest/dataset/{name}

step 2: HTTP GET /dataset/{name} for each dataset (as JSON, RDF/XML, or N3)
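Step 2 iterates over the identifiers returned by step 1. In the sketch below, choosing among JSON, RDF/XML, and N3 through the HTTP `Accept` header is an assumption (the slide does not say how the format is selected), and the media types are the commonly registered ones rather than values confirmed for this API. The requests are only constructed here; passing each one to `urlopen` would send it.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://hub.healthdata.gov/api/rest"

# Assumed mapping from the three formats named on the slide to media types.
MEDIA_TYPES = {
    "json": "application/json",
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3",
}


def dataset_request(name: str, fmt: str = "json", base: str = BASE) -> Request:
    """Build (but do not send) the step-2 GET for a single dataset."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    # Percent-encode the identifier so reserved characters stay inside one path segment.
    url = f"{base}/dataset/{quote(name, safe='')}"
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def dataset_requests(names, fmt: str = "json", base: str = BASE) -> list:
    """Apply step 2 to every identifier returned by step 1."""
    return [dataset_request(n, fmt, base) for n in names]


req = dataset_request("hospice-medicare-cost-report-data", fmt="rdfxml")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```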

Page 9

hub.healthdata.gov/api/search/dataset?q=medicare+costs

search query: JSON results for ‘medicare’ and ‘costs’
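The search URL can be built with `urllib.parse.urlencode`, which encodes spaces as `+` and so reproduces the `q=medicare+costs` form shown above. The response handling is an assumption: it supposes a JSON object with `count` and `results` keys, which the slide does not confirm.

```python
import json
from urllib.parse import urlencode

SEARCH_BASE = "http://hub.healthdata.gov/api/search/dataset"


def search_url(terms: str, base: str = SEARCH_BASE) -> str:
    """Build the search URL; urlencode turns spaces into '+'."""
    return f"{base}?{urlencode({'q': terms})}"


def parse_search(body: str):
    """Return (count, results), assuming a {"count": ..., "results": [...]} body."""
    data = json.loads(body)
    return int(data.get("count", 0)), list(data.get("results", []))


print(search_url("medicare costs"))
```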

Page 10

hub.healthdata.gov/feeds/dataset.atom

Atom feed for all datasets (including recent updates and changes)
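Because the feed is standard Atom 1.0, a client needs no healthdata.gov-specific library to follow dataset changes. A minimal standard-library sketch: it reads only the `id`, `title`, and `updated` elements that Atom defines for each entry, and the sample feed is invented for offline illustration.

```python
import xml.etree.ElementTree as ET

FEED_URL = "http://hub.healthdata.gov/feeds/dataset.atom"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace in ElementTree's {uri}name form


def parse_atom_entries(xml_text: str) -> list:
    """Return one dict per <entry> with its id, title and updated timestamp."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": entry.findtext(f"{ATOM}id", default=""),
            "title": entry.findtext(f"{ATOM}title", default=""),
            "updated": entry.findtext(f"{ATOM}updated", default=""),
        }
        for entry in root.findall(f"{ATOM}entry")
    ]


# A tiny hand-written feed for offline use; a real client would fetch FEED_URL.
SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><id>urn:example:1</id><title>Example dataset</title>"
    "<updated>2012-05-01T00:00:00Z</updated></entry>"
    "</feed>"
)
print(parse_atom_entries(SAMPLE))
```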

Page 11

hub.healthdata.gov/feeds/custom.atom?q=medicare+cost

custom search query result as an Atom feed (anything matching ‘medicare+cost’)
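A custom feed turns a saved search into something a client can poll. The sketch below builds the feed URL and keeps only entries whose Atom `updated` timestamp is later than the last poll. It assumes every timestamp carries a time zone (Atom dates follow RFC 3339, which requires one), and the sample feed is invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urlencode

CUSTOM_FEED = "http://hub.healthdata.gov/feeds/custom.atom"
ATOM = "{http://www.w3.org/2005/Atom}"


def custom_feed_url(terms: str, base: str = CUSTOM_FEED) -> str:
    """Build the custom feed URL for a search; spaces become '+'."""
    return f"{base}?{urlencode({'q': terms})}"


def _parse_time(value: str) -> datetime:
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def updated_since(xml_text: str, since: datetime) -> list:
    """Return ids of entries updated strictly after `since` (a timezone-aware datetime)."""
    root = ET.fromstring(xml_text)
    ids = []
    for entry in root.findall(f"{ATOM}entry"):
        stamp = entry.findtext(f"{ATOM}updated")
        if stamp and _parse_time(stamp) > since:
            ids.append(entry.findtext(f"{ATOM}id", default=""))
    return ids


SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><id>urn:example:a</id><updated>2012-05-01T00:00:00Z</updated></entry>"
    "<entry><id>urn:example:b</id><updated>2012-06-01T00:00:00Z</updated></entry>"
    "</feed>"
)
last_poll = datetime(2012, 5, 15, tzinfo=timezone.utc)
print(custom_feed_url("medicare cost"), updated_since(SAMPLE, last_poll))
```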

Page 12

next – target architecture

• linked data
  – (closed) Google Knowledge Graph
  – open health knowledge graph

• integration framework
  – top-down modeling
  – bottom-up mapping
  – social curation

Page 13

#gkg – (closed) ‘things, not strings’

“The Knowledge Graph helps us understand the relationships between things [… that are] linked in our graph. […] It’s not just a catalog of objects; it also models all these inter-relationships.” (source)

Page 15

health.data.gov/id/hospital/393303

Page 16

clinical quality linked data (HDI II)

Page 17

lifting and enrichment

Page 18

Linked Data Integration Framework

[diagram: GKG/Watson/Siri/… and healthdata.gov, HKG, PCAST DEAS, Health Data Actor; Variety, Volume, Velocity]

Page 19

social meta/data – graph curation

Page 20

i2 challenges

• two types
  – three domain specific
    • improve the integration and liquidity of data made available
  – four platform specific
    • enhance the capabilities of the technology components

• 3 release rounds
  – sequenced to leverage dependencies
    • round 1: June through October 2012
    • round 2: November 2012 through May 2013
    • round 3: June through December 2013

Page 21

round 1 challenges

• June 2012 through October 2012
  – domain specific
    • [1.1] cross-domain and domain specific metadata
      – voluntary consensus standards organizations, de facto standards, other
  – platform specific
    • [1.2] Simplified Sign On (SSO)
      – WebID identity provider and relying parties, HDP infrastructure components
  – $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes

Page 22

round 2 challenges

• November 2012 through May 2013
  – domain specific
    • [2.3] Mapping, Reconciliation and Correlation
      – structural variety, authoritative URIs, linking heuristics
  – platform specific
    • [2.4] Faceted Browsing and Visualization
      – D3 (Backbone, jQuery, etc.)
    • [2.5] Custom API
      – Linked Data API ‘configurator’ for dataset resources
  » each of these builds on [1.1] results

Page 23

round 3 challenges

• June 2013 through December 2013
  – domain specific
    • [3.6] Correlating HHS and NHS Classifications
      – structural variety, authoritative URIs, linking heuristics
  – platform specific
    • [3.7] Linked Data API based Data Element Access Services
      – ‘securing the data, not just the device’
      » builds on [1.1], [1.2], and [2.5]

Page 24

domain challenge [1.1]

• Metadata
  – requests the application of existing voluntary consensus standards for metadata common to all open government data,
  – and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations
  – that form the foundations for better discovery, integration and liquidity.

• 374 on challenge.gov

Page 25

W3C SKOS – concept schemes

Page 26

W3C DCAT – data catalogs

Page 27

hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf

RDF/XML output uses Dublin Core and DCAT metadata (mapping issues still to work out, N3 output is incomplete, etc.)
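The general shape of such RDF/XML can be sketched with ElementTree: a `dcat:Dataset` described with Dublin Core terms, plus one `dcat:distribution` per downloadable resource. This is an illustration only. The property choices (`dct:title`, `dct:description`, `dcat:accessURL`) are assumptions, not a copy of the hub's actual output, and the title and download URL in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)  # emit readable prefixes instead of ns0, ns1, ...


def q(prefix: str, local: str) -> str:
    """Qualified name in ElementTree's {namespace}local form."""
    return f"{{{NS[prefix]}}}{local}"


def dataset_to_rdfxml(uri: str, title: str, description: str, access_urls) -> str:
    """Serialize one dataset as rdf:RDF > dcat:Dataset with DC terms and distributions."""
    root = ET.Element(q("rdf", "RDF"))
    ds = ET.SubElement(root, q("dcat", "Dataset"), {q("rdf", "about"): uri})
    ET.SubElement(ds, q("dct", "title")).text = title
    ET.SubElement(ds, q("dct", "description")).text = description
    for url in access_urls:
        wrapper = ET.SubElement(ds, q("dcat", "distribution"))
        dist = ET.SubElement(wrapper, q("dcat", "Distribution"))
        ET.SubElement(dist, q("dcat", "accessURL"), {q("rdf", "resource"): url})
    return ET.tostring(root, encoding="unicode")


print(dataset_to_rdfxml(
    "http://hub.healthdata.gov/dataset/hospice-medicare-cost-report-data",
    "Hospice Medicare Cost Report Data",   # illustrative title
    "Illustrative description.",
    ["http://example.org/hospice.csv"],    # hypothetical download location
))
```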

Page 28

https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf

CKAN template that creates DC and DCAT metadata tags/values (thanks @JoshData! public GitHub repo soon :-)

Page 29

W3C Data Cube – statistics

refactor CQLD vocabs/data? start here and follow imports

Page 30

W3C Provenance – change mgmt

apply to CKAN /revisions

Page 31

hub.healthdata.gov/revision
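Applying W3C PROV to CKAN revisions could mean describing each revision as a `prov:Activity` that generates a new version of every dataset it touched, with each version linked back to its dataset through `prov:specializationOf`. A sketch that emits N-Triples lines; the revision fields (`id`, `timestamp`, `author`, `packages`) are assumed rather than confirmed against the hub's API, and the `/user/` and `/revision/{id}/dataset/{name}` URI patterns are hypothetical.

```python
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def revision_to_prov(rev: dict, base: str = "http://hub.healthdata.gov") -> list:
    """Describe one CKAN-style revision record as PROV-O N-Triples lines."""
    rev_id = quote(str(rev["id"]), safe="")
    act = f"<{base}/revision/{rev_id}>"
    lines = [
        f"{act} <{RDF_TYPE}> <{PROV}Activity> .",
        f'{act} <{PROV}endedAtTime> "{rev["timestamp"]}"^^<{XSD}dateTime> .',
    ]
    if rev.get("author"):
        # Hypothetical URI pattern for the agent who made the change.
        agent = f"<{base}/user/{quote(str(rev['author']), safe='')}>"
        lines.append(f"{act} <{PROV}wasAssociatedWith> {agent} .")
    for pkg in rev.get("packages", []):
        name = quote(str(pkg), safe="")
        # Each touched dataset gets a version entity generated by this revision.
        version = f"<{base}/revision/{rev_id}/dataset/{name}>"
        lines.append(f"{version} <{PROV}wasGeneratedBy> {act} .")
        lines.append(f"{version} <{PROV}specializationOf> <{base}/dataset/{name}> .")
    return lines


example = {  # hypothetical revision record
    "id": "r1",
    "timestamp": "2012-05-30T15:04:27",
    "author": "jdoe",
    "packages": ["hospice-medicare-cost-report-data"],
}
for line in revision_to_prov(example):
    print(line)
```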

Page 32

W3C org – organization

Page 33

quantity, units, dimensions, time

Page 34

OGC GeoSPARQL – geospatial

Page 36

CQLD domain specific

Page 37

platform challenge [1.2]

• WebID based SSO
  – will improve community engagement by providing simplified sign on (SSO) for external users interacting across multiple HDP technology components,
  – making it easier for community collaborators to contribute,
  – leveraging new approaches to decentralized authentication.

• 375 on challenge.gov
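In the WebID-TLS protocol, the browser presents a client certificate whose Subject Alternative Name holds the user's WebID URI, the TLS handshake proves the user holds the matching private key, and the relying party then dereferences the WebID profile to confirm it lists that same public key. The slide does not say whether the challenge used exactly this flow, so the sketch below covers only the final comparison step, under stated assumptions: RSA keys, a profile publishing `cert:modulus` (hex) and `cert:exponent` values, and a hypothetical `fetch_profile_keys` callback standing in for the HTTP fetch and RDF parsing. The WebID URI and the tiny modulus are toy values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class RSAPublicKey:
    modulus: int
    exponent: int


def key_from_profile_values(modulus_hex: str, exponent: str) -> RSAPublicKey:
    """Build a key from cert:modulus (hex, whitespace tolerated) and cert:exponent values."""
    return RSAPublicKey(int("".join(modulus_hex.split()), 16), int(exponent))


def verify_webid_claim(
    san_uris: Iterable[str],
    cert_key: RSAPublicKey,
    fetch_profile_keys: Callable[[str], Set[RSAPublicKey]],
) -> Optional[str]:
    """Return the first claimed WebID whose profile lists the certificate key, else None.

    Assumes the TLS layer has already checked that the client holds the private key.
    """
    for webid in san_uris:
        if cert_key in fetch_profile_keys(webid):
            return webid
    return None


alice = "https://idp.example.org/alice#me"  # hypothetical WebID
profiles = {alice: {key_from_profile_values("c0ffee", "65537")}}  # toy in-memory profile store
print(verify_webid_claim([alice], RSAPublicKey(0xC0FFEE, 65537), lambda w: profiles.get(w, set())))
```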

Page 38

relying party WebID login

Page 39

identity provider WebID login

Page 40

edit WebID property ACL at IdP

Page 41

property is now visible to the RP

Page 42

domain challenge [2.3]

• Mapping, Reconciliation and Correlation
  – builds on the Metadata domain challenge [1.1],
  – begins by acknowledging disparate open government publishing practices,
  – and seeks the demonstration of an innovative and automated solution that transforms semi-structured data into structured data,
  – reconciles decentralized distributions about the same data entity against the master identity of an authoritative source,
  – and correlates these master identities when multiple authoritative sources exist,
  – enabling the network effect by introducing strong identity resolution techniques that make it easier to aggregate different data about the same entities from independent publishers.
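Reconciling ‘strings to things’ can start with a very simple heuristic: normalize each locally published label and look it up against the labels of an authoritative source's master identities. The sketch below uses exact matching on normalized labels and leaves ambiguous or unknown labels unresolved. It is a deliberately naive stand-in for the automated linking heuristics the challenge asks for, and the example URIs and names are hypothetical.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Fold accents and case, drop apostrophes, collapse punctuation and spaces.

    Non-ASCII characters without an ASCII base (e.g. curly quotes) are dropped.
    """
    text = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    text = text.lower().replace("'", "")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def reconcile(local_labels, authority: dict) -> dict:
    """Map each local label to a master URI; `authority` maps URI -> canonical label.

    A label is resolved only when exactly one master identity shares its
    normalized form; ambiguous or unknown labels map to None for human review.
    """
    index = {}
    for uri, label in authority.items():
        index.setdefault(normalize(label), []).append(uri)
    resolved = {}
    for label in local_labels:
        candidates = index.get(normalize(label), [])
        resolved[label] = candidates[0] if len(candidates) == 1 else None
    return resolved


authority = {  # hypothetical master identities
    "http://example.org/id/hospital/1": "St. Mary's Hospital",
    "http://example.org/id/hospital/2": "General Hospital",
}
print(reconcile(["ST MARYS HOSPITAL", "general  hospital", "Nowhere Clinic"], authority))
```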

Page 43

automating structural transformations

Page 44

‘reconciling’ strings to things

Page 45

result: turtle is the new JSON!

Page 46

link automation heuristics editor

Page 47

platform challenge [2.4]

• Faceted Browsing and Visualization
  – builds on the Metadata domain challenge [1.1],
  – uses the most popular browser-based UI frameworks and libraries to realize novel exploration and discovery techniques for traversing large amounts of interrelated data,
  – contributing to a growing collection of open source widgets that make it easy for third parties to create new applications and embed health data in their content.
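Faceted browsing boils down to two operations: counting the distinct values of selected fields across the current result set, and narrowing that set when a user picks a value. Solr does this server side; the sketch below shows the same idea in memory, as the kind of logic a browser widget might wrap. The field names (`agency`, `tags`) and records are hypothetical.

```python
from collections import Counter


def _values(record: dict, field: str) -> list:
    """Treat a missing field as no values and a scalar as a one-item list."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records: list, fields: list) -> dict:
    """Count how many records carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            counts[field].update(set(_values(record, field)))  # count each record once per value
    return counts


def apply_facets(records: list, selected: dict) -> list:
    """Keep records that carry every selected field=value pair."""
    return [
        r for r in records
        if all(value in _values(r, field) for field, value in selected.items())
    ]


records = [  # hypothetical catalog entries
    {"agency": "CMS", "tags": ["medicare", "costs"]},
    {"agency": "CMS", "tags": ["hospice"]},
    {"agency": "CDC", "tags": ["costs"]},
]
print(facet_counts(records, ["agency", "tags"]))
print(apply_facets(records, {"agency": "CMS", "tags": "costs"}))
```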

Page 48

surfing the domain schemata: no domain knowledge required to discover entities and relationships

Page 49

agents construct e/r queries

Siri, which {LA County} Hospitals have the best {Heart Attack} stats?

Page 50

D3 (jQuery, Backbone, etc.)

Page 51

platform challenge [2.5]

• Custom API
  – also builds on the Metadata domain challenge [1.1],
  – makes it possible to tune programmatic access in accordance with dataset metadata, leveraging an existing ‘Web 3.0’ framework and Linked Data API (LDA) implementation to provide specialized interfaces.

Page 52

a ‘Web 3.0’ API ‘configurator’

• Linked Data API (LDA) – http://code.google.com/p/linked-data-api/

• open source impl here

– http://code.google.com/p/puelia-php/

• example usage here

– http://reference.data.gov.uk/doc/department

• example api reference docs here

– http://environment.data.gov.uk/lab/doc/api-bwq-reference-v0.2.html

• commercialization example here

– http://kasabi.com/tour

Page 53

domain challenge [3.6]

• Correlating HHS – NHS Classifications
  – builds on both the Metadata [1.1] and Mapping, Reconciliation and Correlation [2.3] domain challenges,
  – and uses the US and UK health domain specific classification schemes to exercise the capabilities demonstrated by the automated solution to [2.3],
  – resulting in better international integration of frameworks for understanding societal outcomes and their corresponding health statistics.

Page 54

platform challenge [3.7]

• Linked Data API based Data Element Access Services
  – builds on the Metadata domain challenge [1.1], and the WebID based SSO [1.2] and Custom API [2.5] platform challenges,
  – augmenting WebID based authentication with metadata driven authorization,
  – introducing an innovative security and privacy implementation of ‘data element access services’ (DEAS) as described by the PCAST Health IT Report,
  – resulting in a Custom API configured by domain specific metadata that governs fine-grained access to provide the right data to the right user.

• ‘secure the data, not just the devices’

Page 55

LDA + PPO = DEAS

Page 56

Privacy Preference Ontology (PPO)

Page 57

user 1 AuthZ ‘1101’ all attributes

Page 58

multiple machine readable formats

Page 59

user 2 AuthZ ‘1101’ no attributes
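The demo slides return the same record, ‘1101’, to two users with different results: all attributes for one, none for the other. A minimal sketch of that attribute-level filtering, in which each attribute carries a preference listing the requester properties required to see it. This is a simplification inspired by the Privacy Preference Ontology's idea of conditioning access on who the requester is; the `Preference` class, attribute names, values, and requester profiles are hypothetical, not PPO terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """Release `attribute` only to requesters whose profile has every required pair."""
    attribute: str
    required: frozenset  # of (property, value) tuples taken from the requester's profile


def release(record: dict, preferences: list, requester: dict, key: str = "id") -> dict:
    """Return the record identifier plus only the attributes this requester may see.

    Attributes without a satisfied preference are withheld (default deny).
    """
    claims = set(requester.items())
    allowed = {p.attribute for p in preferences if p.required <= claims}
    return {k: v for k, v in record.items() if k == key or k in allowed}


record = {"id": "1101", "diagnosis": "example", "zip": "00000"}  # hypothetical data element
prefs = [
    Preference("diagnosis", frozenset({("role", "clinician")})),
    Preference("zip", frozenset({("role", "clinician")})),
]
user1 = {"role": "clinician"}  # satisfies every preference
user2 = {"role": "public"}     # satisfies none

print(release(record, prefs, user1))  # all attributes
print(release(record, prefs, user2))  # identifier only
```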

Page 60

thanks!

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix drm: <http://vocab.data.gov/def/drm#> .
@prefix sdo: <http://schema.org/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .

<http://hhs.gov/staff/georgethomas#>
    rdf:type drm:DataSteward , sdo:Person ;
    vcard:email "george dot thomas 1 at hhs dot gov" ;
    dc:contributor <http://healthdata.gov> ,
        <http://data.gov/semantic> .