Data Commons Garvan - 2016

The Data Commons: Digital Ecosystems for Sharing and Analyzing Biomedical Big Data
Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science, Office of Data Science (ADDS)
National Institutes of Health

Transcript of Data Commons Garvan - 2016



Let's Talk About Biomedical Big Data

What Makes Big Data Big?

VOLUME, VELOCITY, VARIETY, VERACITY

It's a signal of the coming Digital Economy
DATA has VALUE
DATA is CENTRAL to the Digital Economy
But it's more than this...

An economy characterized by using data to gain a business advantage

(yes, institutions are a business)

Organizations that are not born digital will be at a disadvantage in the new economy

Organizations will be defined by their digital assets

Scientific digital assets: Data, Software, Workflows, Documentation, Journal Articles

The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise

Make data

The currency of an organization

Usable in a digital ecosystem: Data Commons

The problem with biomedical data

Digital assets include Data

Challenges of Biomedical Data

* The Journal Article is the end goal
* Data is a means to an end (low value)
* Data is not FAIR: Findable, Accessible, Interoperable, Reusable
* Limited e-infrastructures to support FAIR data

The Problem With Biomedical DATA

https://www.youtube.com/watch?v=N2zK3sAtr-4

What's Changing?

FAIR principles drive data to become the currency

Policies that promote data sharing via FAIR help change the culture

Currencies don't exist in a vacuum

Buy and sell Goods


We also need a digital ecosystem that allows transactions to occur on FAIR data at scale

The Data Commons is a platform that fosters the development of a digital ecosystem

The Data Commons: a platform that fosters the development of a digital ecosystem

Treats the products of research (data, software, methods, papers, etc.) as digital assets (objects)

Digital objects need to conform to FAIR principles

Digital objects exist in a shared virtual space where digital assets can be found, deposited, managed, shared and reused

Enables interactions between Producers and Consumers of digital assets

Gives currency to digital assets and the people who develop and support them

The Data Commons is a platform? that fosters the development of a digital ecosystem

A nascent platform

A platform is a plug and play model that allows multiple participants (producers and consumers) to connect to it, interact with each other and create value

Sangeet Paul Choudary Platform Scale

A lot of what we see today uses a platform approach

Sangeet Paul Choudary Platform Scale

Platforms that utilize data as a central currency enable transactions between producers and consumers

The goal of a Data Commons Platform is to enable interactions between producers and consumers

Sangeet Paul Choudary Platform Scale

* Producers create digital objects (data, tools, workflows) that are used by consumers
* The Platform enables these transactions
* Accommodates bioinformatics and non-bioinformatics users

To understand the Data Commons Platform (and how it works for biomedical data) we need to use a Platform stack to help visualize the concept

Framework helps visualize the concept of the platform

Sangeet Paul Choudary Platform Scale

Platforms have 3 layers

NIH Data Commons - Platform Stack
https://datascience.nih.gov/commons

Layers: Technology, Data, Network/marketplace


NIH Data Commons: Digital Asset Compliance - Making things FAIR

Initial Phase
* Unique digital object identifiers, resolvable to the original authoritative source
* Machine readable
* A minimal set of searchable metadata
* Clear access rules (especially important for human subjects data)
* An entry (with metadata) in one or more indices

Future Phases
* Standard, community-based unique digital object identifiers
* Conform to community-approved standard metadata and ontologies for enhanced searching
* Digital objects accessible via open standard APIs
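The initial-phase requirements can be read as a checklist applied to each digital object. The following is a minimal sketch of that reading, not an NIH implementation: the `DigitalObject` record, the `REQUIRED_METADATA` fields and the `initial_phase_gaps` helper are all hypothetical names chosen for illustration, and "machine readable" is approximated here by serializing the record to JSON.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical minimal metadata set; the slides do not specify the fields.
REQUIRED_METADATA = ("title", "type")


@dataclass
class DigitalObject:
    identifier: str                                 # unique ID for the object
    source_url: str                                 # authoritative source the ID resolves to
    metadata: dict = field(default_factory=dict)    # minimal searchable metadata
    access_rule: str = ""                           # e.g. "open" or "controlled"
    indices: list = field(default_factory=list)     # indices that hold an entry


def initial_phase_gaps(obj):
    """Return the initial-phase requirements that the object does not yet meet."""
    gaps = []
    if not obj.identifier:
        gaps.append("unique identifier")
    if not obj.source_url:
        gaps.append("resolvable source")
    missing = [key for key in REQUIRED_METADATA if not obj.metadata.get(key)]
    if missing:
        gaps.append("metadata: " + ", ".join(missing))
    if not obj.access_rule:
        gaps.append("access rule")
    if not obj.indices:
        gaps.append("index entry")
    return gaps


def to_machine_readable(obj):
    """Serialize the record as JSON so software can consume it."""
    return json.dumps(asdict(obj), sort_keys=True)
```

An object with every field filled in returns an empty list from `initial_phase_gaps`; anything else returns the names of the missing requirements, which could feed a deposit form or a curation report.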

Data Commons Platform drives digital ecosystem

The NIH Data Commons Pilot


Co-location of large and/or highly utilized NIH funded data with storage and computing infrastructure + commonly used tools for analyzing and sharing digital objects to create an interoperable resource for the research community.

Investigators will be able to collaborate and share digital objects within this environment and connect with others

NIH Nascent Commons Pilots

An NIH Wide Data Commons Pilot

Indexing

Authorization/authentication layer
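How an index and an authorization layer might fit together can be sketched in a few lines. This is a toy in-memory illustration under stated assumptions, not the pilot's design: the `CommonsIndex` class, its method names and the "open"/"controlled" access rules are hypothetical, and hiding controlled objects from unapproved users at search time is one possible policy among several.

```python
class CommonsIndex:
    """Toy in-memory index with an authorization check in front of search."""

    def __init__(self):
        self._entries = {}      # identifier -> (metadata, access_rule)
        self._approved = set()  # users approved for controlled-access objects

    def approve(self, user):
        """Record that a user may see controlled-access objects."""
        self._approved.add(user)

    def deposit(self, identifier, metadata, access_rule="open"):
        """Producer side: add or update an index entry for a digital object."""
        self._entries[identifier] = (metadata, access_rule)

    def find(self, user, term):
        """Consumer side: identifiers matching the term that the user may see."""
        term = term.lower()
        hits = []
        for identifier, (metadata, access_rule) in self._entries.items():
            allowed = access_rule == "open" or user in self._approved
            text = " ".join(str(value) for value in metadata.values()).lower()
            if allowed and term in text:
                hits.append(identifier)
        return sorted(hits)
```

For example, after depositing one controlled dataset and one open workflow, an unapproved user searching for a shared term sees only the open workflow, while an approved user sees both.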

Considerations

Metrics - understanding and accounting of data usage patterns

Cost - Cloud Storage, pay for use cloud compute (NIH credits)

Hybrid Clouds - Mix of research and commercial clouds

Connecting - Interoperability with other Commons, clouds

Consent - Reconsenting data, Dynamic consents

Standards - Metadata, UIDs, APIs

An Australian Commons Experiment?

A Garvan Data Commons Platform?

Garvan DATA
NCI + Cloud
Analysis tools (incl. 3rd party)
Apps Store
Community: Research, Clinical, Public
API connectivity with other Commons

* All Garvan Data + Tools in an authorized, access-controlled environment that allows access to approved users

* Hybrid Clouds: NCI (National Computational Infrastructure) + Commercial (AWS)
* Allow approved users, Garvan or others, including commercial vendors (e.g. DNAnexus), to develop tools (SaaS) on top of the Garvan data
* API connections to other Commons (NY Genome Center)
* Beacon projects - variation


An Australian Data Commons?

Australian DATA - Flora and Fauna
Commercial Cloud (NCI)
Analysis tools (incl. 3rd party)
Apps Store
Community: Research, Clinical, Public
API connectivity with other Commons

Develop an Australian Data Commons

* Make ALL Australian data (flora, fauna, incl. human clinical data) available in a data commons cloud (mix of NCI and commercial cloud)
* Encourage tool development from bioinformatics research or commercial groups
* Make the commons interoperable with other Cloud Commons
* Use NCBI and EBI as an archive; learn their annotation methods for metadata, their data distribution methods and their cloud access
* Embed postdocs within NCBI and EBI to learn these methods and bring them back to Australia
* Develop a team approach
* Use this as a way to train the next generation of scientists, Bfx and non-Bfx


To achieve great things, two things are needed: a plan and not quite enough time

Leonard Bernstein

Thank you

* ADDS Office: Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
* NCBI: Jim Ostell, David Lipman, George Komatsoulis
* NHGRI: Valentina di Francesco, Kevin Lee, Eric Green
* NIGMS: John Lorsch, Susan Gregurik
* CIT: Andrea Norris, Debbie Sinmao, Stacy Charland
* NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen, Ian Fore
* NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni
* The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
* Trans NIH BD2K Executive Committee & Working groups
* Many biomedical researchers, cloud providers, IT professionals

John Mattick and the Garvan Institute

Stay in Touch


Slideshare
Blog (Coming soon!)
Vivien Bonazzi
