Sentara Linked Data Workshop - Sept 10, 2012

Integrated Data for Improved Personal Health Delivery. 10-September-2012. Presenters: Bernadette Hyland, David Wood & Luke Ruth. Email: [email protected]. Twitter: @BernHyland. This presentation: http://slideshare.net/3roundstones


One-day workshop for Sentara Healthcare on using a Linked Data approach for enterprise architecture. Topics include Open Government Data initiatives; a demo of the Weather Health Web application; leveraging open data from NIH, NLM, NOAA, EPA and HHS; and Callimachus Enterprise, a Linked Data Management System for the enterprise.

Transcript of Sentara Linked Data Workshop - Sept 10, 2012

Page 1: Sentara Linked Data Workshop - Sept 10, 2012

Integrated Data for Improved Personal Health Delivery

10-September-2012

Presenters: Bernadette Hyland, David Wood & Luke Ruth

Email: [email protected]

Twitter: @BernHyland

This presentation: http://slideshare.net/3roundstones

Page 2: Sentara Linked Data Workshop - Sept 10, 2012

• 9.00-9.20 - (All) Introductions

• 9.20-9.45 - (Phil) Goals & objectives

• 9.45-10.30 - (Bernadette) Value proposition of Linked Data, update on government data publishing initiatives, Health Datapalooza

• 10.30-11.10 - (David) Intro to enterprise linked data, a resource oriented approach to interoperability

• 11.10-11.30 - Break

• 11.30-12noon - (Luke) Review of Weather Health app development

• 12.00-12.45 - lunch

• 12.45-1.30 (David) Web of data architecture, Callimachus

• 1.30-2.15 (All) Building support within Sentara, use cases for Weather Health (Phase I), Q&A

Today’s Agenda

Page 3: Sentara Linked Data Workshop - Sept 10, 2012

• Sentara team

• 3 Round Stones team

• Dave Wood, PhD - Enterprise Architect

• Bernadette Hyland - Sr. Solutions Architect

• Luke Ruth - Software Engineer

• ... All specialists in Web architecture & Linked Data

Introductions ...

Page 4: Sentara Linked Data Workshop - Sept 10, 2012

Customers & Affiliations

Environmental Protection Agency

Government Printing Office

Health & Human Services

Page 5: Sentara Linked Data Workshop - Sept 10, 2012

• Linked Data is about publishing and consuming data using international data standards

• Based on 20 year old idea

• A system of linked information systems

Why am I speaking on Linked Data and sharing today? I’m here in my role as the co-chair of the W3C GLD WG. I’m a serial entrepreneur in this space, having founded several companies that led some of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I’ve authored a couple of peer-reviewed chapters in these books, which are available in hardcopy or for free via the Web.

Page 6: Sentara Linked Data Workshop - Sept 10, 2012

What ideas involving data access, sharing & re-use can we help nurture?

Page 7: Sentara Linked Data Workshop - Sept 10, 2012

Businesses are in future shock

• Needs changing at faster pace

• Affordable Care Act, new regulations, changes in global economy accelerating changes

• Information increasingly more central to the operation of any business

Jeff Pollock, Oracle

In a dynamic economy, we have to adapt quickly. We cannot change people or hardware fast enough, so we have to take a new approach in software. This point comes from Jeff Pollock, a director at Oracle.

Credits: (c) Random House

Page 8: Sentara Linked Data Workshop - Sept 10, 2012

"If information systems are to

keep up with business,

we need to change more than technology -

we need to change how people deal with technology."

- Jeff Pollock

Of course, Jeff also said "Changes in behavior have to be well-motivated and show some visible value immediately."

Page 9: Sentara Linked Data Workshop - Sept 10, 2012

Goal for improved health delivery ...

• Harness larger & more complex datasets to evaluate the potential for health impacts

• More accurately predict factors that contribute to illness or diagnose disease

Page 10: Sentara Linked Data Workshop - Sept 10, 2012

DATA

Drives every decision we make daily & every decision others make on our behalf

What is happening to data? We are sharing it ... The Web is a natural place to publish information for public dissemination. The modern Web is an information system owned by no one and yet open to vendors, governments and private citizens. The Web of documents has been a great place to share HTML and PDF. However, we are entering the Web of Data. This is how we’ll share most open data in the next decade.

Page 11: Sentara Linked Data Workshop - Sept 10, 2012

“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

Governments around the world are defining detailed digital services plans that are based on Open data and open APIs to deliver government and private digital services. At the highest level, government executives in the UK, EU, US, India, Brazil are committed to managing open data and content in a way that is useful for the consumer of that content. The question is HOW?

Page 12: Sentara Linked Data Workshop - Sept 10, 2012

Sharing Worldwide

We are sharing documents and data worldwide, routinely with people we don’t know. If achieved, it will transform how governments interact with one another, between nations and how they serve their citizens in the 21st Century. Using the Web to solicit input and inform decision making, and ultimately, to create a more transparent and accessible government is a very, very worthwhile goal.

Page 13: Sentara Linked Data Workshop - Sept 10, 2012

Who is sharing their data ... ? Small and large commercial and government organizations, NGOs, Non-profits ... plus many universities. Governments in the last few years have been responding to Open Government initiatives that mandate publishing open government data. Some are careful, slow-moving entities who simply needed to find real solutions to real problems.

Page 14: Sentara Linked Data Workshop - Sept 10, 2012

Retailers

Goal: Improve click-throughs on search results

Page 15: Sentara Linked Data Workshop - Sept 10, 2012

Book Publishers

Goals: Improve internal manuscript pipelines, expose additional ways of finding and using content

Page 16: Sentara Linked Data Workshop - Sept 10, 2012

New Media

Page 17: Sentara Linked Data Workshop - Sept 10, 2012

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)

Page 18: Sentara Linked Data Workshop - Sept 10, 2012

Common business need ...

•The ability to integrate & manage large amounts of data in a rigorous & transparent manner

•Discovery through interaction of scientific communities, including biomedical informatics & evidence-based medicines

Page 19: Sentara Linked Data Workshop - Sept 10, 2012

How many are doing it ...

the Web of Data

• No one vendor owns it

• It scales ... to Web-scale

• Doesn’t require a super model

• Based on International Data Exchange Standards (RDF, SPARQL)

Scope: Bigger than any other deployed system. Infinitely adaptable: Changes piecemeal and allows for ad hoc additions & changes. Ownership: Nobody owns it.

Page 20: Sentara Linked Data Workshop - Sept 10, 2012

Let’s look at some ‘versions’ of the Web. It should be said here that Tim Berners-Lee, the recognized “father” of the WWW, doesn’t like the idea of versioning the Web. I happen to agree, but I understand why people do it.

As we talk about these versions of the Web, you may want to think of this as a continuum with significant waves; each with its own benchmark technologies rather than specific versions with distinct start and end points.

Nova Spivack of Radar Networks and Twine.com created this.

Page 21: Sentara Linked Data Workshop - Sept 10, 2012

RDF is a lingua franca for data exchange

Not all Open Government content is Linked Data. A relatively small percentage of open data is 4-5 star Linked Data; however, it is growing exponentially. Use of structured data is actively promoted by international standards groups like the W3C and by major search engines: Google, Yahoo!, Bing and Yandex.
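To make the "lingua franca" idea concrete, here is a minimal sketch in plain Python of why a shared triple model helps with exchange: two datasets published separately can be merged with a simple set union and then queried by pattern. It does not use an RDF library. All URIs, property names and values below are hypothetical placeholders, not data from the workshop.

```python
# All URIs, properties and values are hypothetical placeholders.
EX = "http://example.org/"

# Triples (subject, predicate, object) from a clinical source.
clinic_data = {
    (EX + "patient/1", EX + "hasCondition", EX + "condition/asthma"),
    (EX + "patient/1", EX + "livesIn", EX + "zip/23510"),
}

# Triples from a separately published environmental source.
air_quality_data = {
    (EX + "zip/23510", EX + "currentAQI", 135),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    found = [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]
    return sorted(found, key=str)


# Because both sources use the same model, merging is a set union.
merged = clinic_data | air_quality_data


def aqi_for_patient(graph, patient):
    """Follow patient -> place -> air quality across the merged data."""
    for _, _, place in match(graph, s=patient, p=EX + "livesIn"):
        for _, _, aqi in match(graph, s=place, p=EX + "currentAQI"):
            return aqi
    return None


print(aqi_for_patient(merged, EX + "patient/1"))
```

The lookup only succeeds on the merged data, because the shared URI for the location is what connects the two sources. In practice a SPARQL query over an RDF store would play the role of the `match` helper.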

Page 22: Sentara Linked Data Workshop - Sept 10, 2012

Semantic Technologies

Semantic Web

Linked Data

Linked Open Data is a small, pragmatic portion of the greater body of Semantic Technologies & international standards for data.

Page 23: Sentara Linked Data Workshop - Sept 10, 2012

Credit: http://www.w3.org/DesignIssues/LinkedData.html

★ Make your stuff available on the web (any format)

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Use URLs to identify things, so that people can point at your stuff

★★★★★ Link your data to other people’s data to provide context

The 5 Stars of Open Linked Data

Guidance per Tim Berners-Lee, W3C

Page 24: Sentara Linked Data Workshop - Sept 10, 2012

★ Publish your vocabulary on the Web at a stable URI

★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)

★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes

★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation

★★★★★ Link to other vocabularies by re-using elements rather than re-inventing

5 Stars of Open Linked Vocabularies

Bernard Vatant (Mondeca) Guidance

Credit: http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html

Page 25: Sentara Linked Data Workshop - Sept 10, 2012

Why is RDF important?

• It is an international standard for publishing data on the Web (public and private)

• Data exchange model

• It is the future of the Web

• ... because it is how we share and reuse data

Leading publishers, HCLS scientists, library scientists, new media, old media, retailers have all committed to structured data for improved search & access.

Page 26: Sentara Linked Data Workshop - Sept 10, 2012

WE’VE SEEN THIS BEFORE

Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.

Page 27: Sentara Linked Data Workshop - Sept 10, 2012

Each HTML page is paired with a machine-readable data representation.
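One common way to pair a human-readable page with a machine-readable representation is HTTP content negotiation: the client states the formats it prefers in the Accept header and the server picks the best match. The sketch below is a simplified illustration in plain Python, not how any particular server implements it. The file names are hypothetical, and only exact media types and `*/*` are handled.

```python
def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(prefs)]


def choose_representation(accept_header, available):
    """Pick the representation that best matches the client's preferences."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return available[media_type]
        if media_type == "*/*":
            # Any type is fine: fall back to the first one we offer.
            return next(iter(available.values()))
    return None


# Hypothetical file names for one resource in two representations.
representations = {"text/html": "page.html", "text/turtle": "page.ttl"}

print(choose_representation("text/turtle, text/html;q=0.5", representations))
```

A browser asking for `text/html` gets the page for people, while a data client asking for `text/turtle` gets the RDF for the same resource at the same URL.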

Page 28: Sentara Linked Data Workshop - Sept 10, 2012

Open Government Data

3 brief years ...

• Starting in 2008, a few heads of state directed open government data to be published on the Web

• In September 2011, Presidents Obama (USA) and Rousseff (Brazil) endorsed the Open Government Partnership

• 7 other nations launched their government’s National Plans during the meeting of the UN General Assembly

Beginning in 2008, a couple of heads of state directed open government data to be published on the Web. In September 2011, President Obama and President Dilma Rousseff stood with other heads of state to endorse the principles of the Open Government Partnership and launch their governments’ Open Government National Plans during the meeting of the UN General Assembly. In addition to Brazil and the US, nations that have made commitments include Indonesia, Mexico, Norway, the Philippines, South Africa and the UK.

Page 29: Sentara Linked Data Workshop - Sept 10, 2012

• Structured data on the Web is rapidly becoming mainstream

• Government authorities are funding more Linked Open Data projects, especially for weather, human health and scientific research

• In 2012 we’re seeing Apps Challenges, hack-a-thons, funding ($1M-$200M)

What is next for Data?

What’s next? We are already seeing signs of the things to come. Structured data on the Web is quickly becoming mainstream. There have been many well-publicized triple challenges, hack-a-thons and apps challenges; they are popping up everywhere. Organizations with mission-critical applications based on relational technologies are creating a layer above their traditional architectures and building Linked Data-driven Web apps. Web apps based on Linked Data are beginning to replace data warehouses.

Page 30: Sentara Linked Data Workshop - Sept 10, 2012

Publishing data in 2012 & beyond ...

• Good = Use Data Standards (RDF) to publish metadata about data and models

• Better = Use a Linked Data approach to publish all your open data on the Web

• Best = Link your data + models using a Linked Data approach

• Web architecture, Web-scale

Page 31: Sentara Linked Data Workshop - Sept 10, 2012
Page 32: Sentara Linked Data Workshop - Sept 10, 2012

EMR Data: Clinical Condition Specific

Internal Portal Data: Physicians, Services, Locations

Linked Data Cloud: DBpedia, PubMed, NLM

Open Government Data: CDC, EPA, US Census

Social Media: Facebook, Twitter

Clinical Ontology

Business Ontology

Page 33: Sentara Linked Data Workshop - Sept 10, 2012

Methodology

1. Define target population and clinical data from electronic medical record

2. Identify sources of open government data related to environmental, weather, and other variables related to chronic pulmonary disease exacerbations

3. Combine open content from NLM, PubMed, Medline to support education

4. Leverage a Linked Data approach, using Open Source and international data exchange standards (RDF)

5. Alert patient of possible hazardous conditions and recommend appropriate actions
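Step 5 can be sketched as a small rule that combines a patient's conditions from the EMR with an air quality reading. The category boundaries follow the standard US EPA Air Quality Index bands. Everything else is an assumption made for illustration: the patient record layout, the alert threshold of 100 and the message wording are hypothetical, not the rules used in the Weather Health application and not clinical guidance.

```python
# Upper bounds of the standard US EPA Air Quality Index categories.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]

# Assumption: conditions treated as sensitive to air quality in this sketch.
SENSITIVE_CONDITIONS = {"asthma", "copd"}


def aqi_category(aqi):
    """Map a numeric AQI value to its EPA category name."""
    for upper_bound, label in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return label
    return "Hazardous"


def build_alert(patient, aqi, threshold=100):
    """Return an alert message, or None if no alert is needed.

    `patient` is a hypothetical record with 'conditions' and 'zip' keys.
    `threshold` is an illustrative cut-off, not a clinical rule.
    """
    conditions = {c.lower() for c in patient["conditions"]}
    if not (conditions & SENSITIVE_CONDITIONS) or aqi <= threshold:
        return None
    return (
        f"Air quality near {patient['zip']} is {aqi_category(aqi)} "
        f"(AQI {aqi}). Please review your care plan."
    )


patient = {"conditions": ["Asthma"], "zip": "23510"}  # invented example record
print(build_alert(patient, 135))
```

The same function returns `None` for a patient without a listed condition or on a day with a low reading, so only relevant alerts would be passed on to SMS, email or the Web.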

Page 34: Sentara Linked Data Workshop - Sept 10, 2012

Iterative Approach

• Initial POC delivered May 2012 (60 day sprint)

• EMR (anonymized)

• EPA air quality

• Doctors listing (spreadsheet)

• Demo’d at Health Datapalooza, Washington DC in June

Page 35: Sentara Linked Data Workshop - Sept 10, 2012

Using EMR and Linked Open Data to Manage Chronic Asthma and COPD

Health Datapalooza

Health Data Initiative Forum III

Page 36: Sentara Linked Data Workshop - Sept 10, 2012

Health Data Initiative Forum III

Health Datapalooza

Conceptual MODEL

Patients with chronic pulmonary disease that are educated and notified of adverse environmental, weather, and geographic conditions are ... better able to respond and proactively manage their condition.

Page 37: Sentara Linked Data Workshop - Sept 10, 2012

Health Data Initiative Forum III

Health Datapalooza

Decrease in costly Emergency Department visits

Reduce hospital re-admissions after treatment

Improve self-care and medication compliance

Awareness of triggers and disease management

Value

MODEL  

PROPOSITION

Page 38: Sentara Linked Data Workshop - Sept 10, 2012

Big data ecosystem includes complex data

A phased approach to delivering a successful Weather Health Explorer application depends on selecting data sources that are both available and reliable. For these reasons, authoritative government sources, including the National Library of Medicine (NLM), the National Oceanic and Atmospheric Administration (NOAA) and the US Environmental Protection Agency (EPA), have been selected for use in this project.

Page 39: Sentara Linked Data Workshop - Sept 10, 2012

Health Data Initiative Forum III

Health Datapalooza

Leverage Linked DATA, OPEN SOURCE & STANDARDS

Web of Data: CDC, EPA, US Census, DBpedia, PubMed, NLM

EMR

SEMANTIC FRAMEWORK

SMS, Email, Web

Callimachus is a Linked Data Management platform that takes full advantage of RDF and data-driven navigation. It was created with Web 2.0 developers in mind. Governments are providing citizens access to open government data; corporations can provide the public, customers, suppliers and regulators with timely information on the corporation; research portals are a further use.

Page 40: Sentara Linked Data Workshop - Sept 10, 2012
Page 41: Sentara Linked Data Workshop - Sept 10, 2012

Current EPA Data

Patient Admission Data by Date

Historic EPA Data at Admission

Today’s Asthma Forecast

Anticipate and Prevent

Page 42: Sentara Linked Data Workshop - Sept 10, 2012

Progress Update

• June - Sept 2012

• Designed Weather Health Web application

• Identified data sources (NIH, NOAA, EPA)

• Created a Web based application with live data feeds from NIH, NOAA & EPA

• Hosted on the cloud using a linked data management system, Callimachus

Page 43: Sentara Linked Data Workshop - Sept 10, 2012
Page 44: Sentara Linked Data Workshop - Sept 10, 2012
Page 45: Sentara Linked Data Workshop - Sept 10, 2012
Page 46: Sentara Linked Data Workshop - Sept 10, 2012

NOAA EPA AQS EPA UV

NIH NIH

User

Page 47: Sentara Linked Data Workshop - Sept 10, 2012

The NLM will function as the primary source for drug-related information. The NLM publishes multiple APIs that could be of use to this project, but the most immediately beneficial will probably be DailyMed, which offers access to current Structured Product Label (SPL) information for drugs.

Page 48: Sentara Linked Data Workshop - Sept 10, 2012

http://demo.3roundstones.net/sentara/home.docbook?view

Page 49: Sentara Linked Data Workshop - Sept 10, 2012

Drug information may also be taken from a service called MedlinePlus, which is organized and distributed by the National Library of Medicine, the National Institutes of Health, and the Department of Health and Human Services. MedlinePlus is currently being upgraded to return an XML document as opposed to a search results page. This feature would be extremely useful and, if fully functional, may make MedlinePlus the logical choice for primary drug information.

Page 50: Sentara Linked Data Workshop - Sept 10, 2012

EBS - 50 GB

M2.2XLarge

S3 - 50 GB

Additional attached storage

Periodic snapshots (backup)

Monitoring Service

Application-level monitoring

HTTP/HTTPS

Email/SMS notifications

Administration

Off-site backups

SNS

System-level monitoring

Email/SMS notifications

Callimachus (application)

Public users

Hosted on cloud

Page 51: Sentara Linked Data Workshop - Sept 10, 2012

In summary, Weather Health ...

• Leverages internal and external structured data on the Web

• All data from authoritative sources

• Involves a combination of static and dynamic data

• Hosted on the cloud using AWS

• Created using a linked data management system

• Callimachus enables Web 2.0 internal or contract developers to combine data sources & quickly build a web UI for Web or mobile devices

The Weather Health application can also serve to warn patients of drug interactions or advise them on dosage. There is also an opportunity for smaller modules within the application, such as pill identification using imprint data. This application was built using Callimachus, a data platform for data-driven applications. Callimachus allows Web 2.0 developers within Sentara or external developers to combine multiple data sources and quickly build a Web UI.

The basic architecture for the Weather Health solution involves a combination of both static (or pseudo-static) and dynamic data.
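A minimal sketch of how static and dynamic data might be combined is shown here, assuming the static data is loaded once and each dynamic feed is re-fetched only after a time-to-live expires. The class, the function names, the clinic entries and the 60-second TTL are hypothetical illustrations, not the Callimachus implementation.

```python
import time

# Static (or pseudo-static) data: changes rarely, so it is loaded once.
# These entries are invented placeholders, not real Sentara data.
CLINIC_LOCATIONS = {"23510": "Example Clinic A", "23451": "Example Clinic B"}


class CachedFeed:
    """Wrap a dynamic data source and refresh it when the TTL expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that retrieves fresh data
        self._ttl = ttl_seconds      # how long a fetched value stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._fetched_at is None or now - self._fetched_at >= self._ttl
        )
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


def dashboard(zip_code, aqi_feed):
    """Combine a static lookup with the latest dynamic reading."""
    return {
        "clinic": CLINIC_LOCATIONS.get(zip_code),
        "aqi": aqi_feed.get(),
    }


# Hypothetical feed: a real one would call a live air quality service.
aqi_feed = CachedFeed(fetch=lambda: 135, ttl_seconds=60)
print(dashboard("23510", aqi_feed))
```

The design choice here is that callers never need to know which parts of a page are static and which are live; the feed wrapper decides when a fresh request is needed.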

Page 52: Sentara Linked Data Workshop - Sept 10, 2012

LUNCH BREAK!

Page 53: Sentara Linked Data Workshop - Sept 10, 2012

Web of Data

• Resource oriented approach to data interoperability

• Callimachus Overview

• Maturity of ecosystem

• Development environments, reporting tools, databases, hosting, commercial support & training

• Next steps, an iterative approach

Page 54: Sentara Linked Data Workshop - Sept 10, 2012

1970s: A neat little package

$ cat foo.txt | grep blah | sort

1980s: Client-Server

1990s: The Early Web

A History of Silos

Page 55: Sentara Linked Data Workshop - Sept 10, 2012

Web of Data

Extending the Universal Client

Expanding the Universal Connection

Providing the Universal Database

Explaining the Logic

Ubiquitous, reusable applications

The Next Great Leap

Page 56: Sentara Linked Data Workshop - Sept 10, 2012

1970s 1980s 1990s 2000s

Code written

Data formatted

Writing Business Applications

Page 57: Sentara Linked Data Workshop - Sept 10, 2012

R&D | RDI

Requirements of The Informatics Landscape

• Must span the entire drug development lifecycle
  - and back (post-market surveillance to discovery)

• Must support large and very heterogeneous data
  - single nucleotide polymorphisms to countries

• Will change as new science emerges & new regulations come into play
  - Medline just under 1M articles/year

• Must be able to work with multiple, international regulatory bodies
  - Emerging markets

• Partners, customers and collaborators will change
  - and will have divergent technical aptitudes

• Must be able to interoperate with pre-competitive consortia
  - Can they perform common tasks for the community

• Must be able to work with legacy data
  - Lots of unmined gems here!

Maximal Agility

Slide credit: Tom Plaster, PhD, AstraZeneca

Page 58: Sentara Linked Data Workshop - Sept 10, 2012

Improving Internal Interoperability

Scientists, Clinicians, Informaticists can now freely interoperate as:

• The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations

• The vocabulary server provides a way of harmonizing concepts across different domains
  - Where possible, public vocabularies are used
  - Where not, they’re extended
  - We don’t want to develop and maintain vocabularies

Slide credit: Tom Plaster, PhD, AstraZeneca
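The role of a PURL server described above can be sketched as a lookup table that maps a persistent identifier to wherever the resource currently lives, answering with an HTTP redirect. This is a simplified illustration in plain Python, not the PURL server's actual code or API. The paths, target URLs and the choice of a 302 status are assumptions made for the example.

```python
# Hypothetical mapping from persistent paths to current locations.
PURL_TABLE = {
    "/id/drug/albuterol": "http://vocab.example.org/drugs/2012/albuterol",
}


def resolve(path, table=PURL_TABLE):
    """Return (status, location) for a persistent URL path.

    A known path answers with a redirect to the current location;
    an unknown path answers 404 with no location.
    """
    target = table.get(path)
    if target is None:
        return 404, None
    return 302, target


def move_resource(path, new_location, table=PURL_TABLE):
    """Update where a resource lives without changing its persistent URL."""
    table[path] = new_location


print(resolve("/id/drug/albuterol"))
move_resource("/id/drug/albuterol", "http://data.example.org/drug/albuterol")
print(resolve("/id/drug/albuterol"))
```

Systems that stored the persistent path keep working after the move, because only the table entry changes.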

Page 59: Sentara Linked Data Workshop - Sept 10, 2012

•Callimachus is a framework for data-driven applications based on Linked Data principles

•Callimachus allows Web developers to easily create data driven applications for the Web

• It is Open Source (FLOSS)

•http://callimachusproject.org

Page 60: Sentara Linked Data Workshop - Sept 10, 2012

• Large and small vendors are involved in Linked Data

• From Oracle, IBM to 3 Round Stones

• Listing of active research projects & deployments, see http://dir.w3.org/

• Best practices, see http://www.w3.org/2011/gld/charter

Tools & best practices?

Page 61: Sentara Linked Data Workshop - Sept 10, 2012

W3C HCLS

• Activities:
  - Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
  - Implement proof-of-concept demonstrations and industry-ready code.
  - Document guidelines to accelerate the adoption of the technology.
  - Disseminate information about the group's work at government, industry, academic events and by participating in community initiatives.

• Use Cases/Domains
  - Drug Discovery
  - Electronic Lab Notebooks
  - Comparator Arm Data
  - Patient Data Ownership
  - Biotech Acquisition
  - Supply Chain Automation
  - Web Integration
  - Bio-surveillance
  - Co-development

Reference: http://www.w3.org/blog/hcls/

The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.

Slide credit: Tom Plaster, PhD, AstraZeneca

Page 62: Sentara Linked Data Workshop - Sept 10, 2012

This work is Copyright © 2011-2012 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

This presentation is licensed under a Creative Commons BY-SA license, allowing you to share and remix its contents as long as you give us attribution and share alike.