NIH Data Commons - Note: Presentation has animations

The NIH Data Commons: Digital Ecosystems for using and sharing FAIR Data. Vivien Bonazzi, Ph.D., Senior Advisor for Data Science, Office of Data Science (ADDS), National Institutes of Health

Transcript of NIH Data Commons - Note: Presentation has animations

Page 1: NIH Data Commons  - Note:  Presentation has animations

The NIH Data Commons: Digital Ecosystems for using and sharing FAIR Data

Vivien Bonazzi, Ph.D.

Senior Advisor for Data Science

Office of Data Science (ADDS)

National Institutes of Health

Page 2: NIH Data Commons  - Note:  Presentation has animations

The Data Commons is a platform

that fosters the development of a digital ecosystem

Page 3: NIH Data Commons  - Note:  Presentation has animations

That digital ecosystem allows

transactions to occur on FAIR data at scale

Page 4: NIH Data Commons  - Note:  Presentation has animations

Data Commons is a Platform that fosters development of a digital Ecosystem

Treats products of research (data, software, methods, papers, etc.) as digital assets (objects)

Digital objects need to conform to FAIR principles - Findable, Accessible, Interoperable, Reusable

Digital objects exist in a shared virtual space (initial) - Find, Deposit, Manage, Share and Reuse digital assets

Enables interactions between Producers and Consumers of digital assets

Gives currency to digital assets and the people who develop and support them
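
The points on this slide can be illustrated with a small sketch. It is not the NIH Data Commons implementation; it only assumes that a digital object carries an identifier, a location, descriptive metadata and reuse terms, and that a shared space supports deposit and find. The names `DigitalObject`, `Commons`, `deposit`, `find` and `reuse_terms` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass(frozen=True)
class DigitalObject:
    """A research product treated as a digital asset (hypothetical model)."""
    uid: str                      # unique identifier, supports "Findable"
    kind: str                     # e.g. "data", "software", "method", "paper"
    location: str                 # where it can be retrieved, supports "Accessible"
    metadata: Dict[str, str] = field(default_factory=dict)  # shared terms, supports "Interoperable"
    reuse_terms: str = "unspecified"                        # conditions of reuse, supports "Reusable"


class Commons:
    """An in-memory stand-in for the shared virtual space (assumption)."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def deposit(self, kind: str, location: str,
                metadata: Dict[str, str],
                reuse_terms: str = "unspecified") -> DigitalObject:
        # A producer deposits an asset; the commons assigns the identifier.
        obj = DigitalObject(uid=str(uuid.uuid4()), kind=kind, location=location,
                            metadata=dict(metadata), reuse_terms=reuse_terms)
        self._objects[obj.uid] = obj
        return obj

    def find(self, **criteria: str) -> List[DigitalObject]:
        # A consumer finds assets whose metadata matches every criterion.
        return [o for o in self._objects.values()
                if all(o.metadata.get(k) == v for k, v in criteria.items())]
```

A producer would call `deposit(...)` and a consumer `find(organism="human")`; a real platform would add persistent identifiers, storage and access control on top of this.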

Page 5: NIH Data Commons  - Note:  Presentation has animations

To understand the Data Commons Platform (and how it works for biomedical data) we need to use a Platform stack to help visualize the concept

Page 6: NIH Data Commons  - Note:  Presentation has animations

NIH Data Commons - Platform Stack

https://datascience.nih.gov/commons

Page 7: NIH Data Commons  - Note:  Presentation has animations

https://datascience.nih.gov/commons

NIH Data Commons - Platform Stack

Page 8: NIH Data Commons  - Note:  Presentation has animations

NIH Data Commons - Platform Stack

Digital Market Place, Bazaar, Community

Sangeet Paul Choudary – Platform Scale

Network/Community

Market Place

Technology

Data

Page 9: NIH Data Commons  - Note:  Presentation has animations

NIH Data Commons Pilots

Page 10: NIH Data Commons  - Note:  Presentation has animations

Current Data Commons Pilots

Reference Data Sets

• Making large and/or high value NIH funded data sets and tools accessible in the cloud

Commons Stack Pilots

• Explore feasibility of the Commons Platform (FW)
• Provide data objects to populate the Commons
• Facilitate collaboration and interoperability

Cloud Credit Model

• Provide access to cloud (IaaS) and PaaS/SaaS via credits
• Connecting credits to NIH grants

Resource Search & Index

• Developing Data & Software Indexing methods
• Leveraging BD2K efforts (bioCADDIE et al.)
• Collaborating with external groups

Page 11: NIH Data Commons  - Note:  Presentation has animations
Page 12: NIH Data Commons  - Note:  Presentation has animations

Data Commons Pilot – connecting the pieces

Co-location of large and/or highly utilized NIH funded data on the cloud, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.

Investigators will be able to collaborate and share digital objects within this environment and connect with others

Page 13: NIH Data Commons  - Note:  Presentation has animations

An NIH Wide Data Commons Pilot

Data Lake

Page 14: NIH Data Commons  - Note:  Presentation has animations

Data Lake

Page 15: NIH Data Commons  - Note:  Presentation has animations

Indexing

Data Lake

Page 16: NIH Data Commons  - Note:  Presentation has animations

Indexing

Data Lake

Page 17: NIH Data Commons  - Note:  Presentation has animations

Indexing

Data Lake

New large data projects

Messy data

Data Pond

Page 18: NIH Data Commons  - Note:  Presentation has animations

Indexing

Authorization/authentication layer
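
One way to picture how an index and an authorization/authentication layer could sit in front of a data lake is the sketch below. It is a toy model under stated assumptions, not the pilot's architecture: search over the index is assumed to be open, while resolving an entry to its storage location requires a known token (authentication) with the right group (authorization). `IndexEntry`, `IndexedDataLake`, `AccessDenied`, the token table and the group names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class IndexEntry:
    uid: str
    description: str
    location: str        # where the object lives in the data lake
    required_group: str  # hypothetical access group, e.g. "public" or "controlled"


class AccessDenied(Exception):
    """Raised when a caller is not authenticated or not authorized."""


class IndexedDataLake:
    def __init__(self, tokens: Dict[str, Set[str]]) -> None:
        # tokens maps a token to the groups it grants; this stands in for
        # a real identity provider (assumption).
        self._tokens = tokens
        self._index: Dict[str, IndexEntry] = {}

    def register(self, entry: IndexEntry) -> None:
        self._index[entry.uid] = entry

    def search(self, term: str) -> List[str]:
        # Indexing: return the UIDs whose description mentions the term.
        term = term.lower()
        return sorted(uid for uid, e in self._index.items()
                      if term in e.description.lower())

    def resolve(self, token: str, uid: str) -> str:
        groups = self._tokens.get(token)
        if groups is None:                       # authentication check
            raise AccessDenied("unknown token")
        entry = self._index[uid]
        if entry.required_group != "public" and entry.required_group not in groups:
            raise AccessDenied("insufficient permissions")  # authorization check
        return entry.location
```

Keeping search separate from `resolve` lets anyone discover that a data set exists while still gating access to its contents; whether discovery itself should be restricted is a policy choice this sketch does not settle.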

Page 19: NIH Data Commons  - Note:  Presentation has animations

Considerations

Metrics - understanding and accounting of data usage patterns

Cost - Cloud Storage, pay for use cloud compute (NIH credits)

Hybrid Clouds – Mix of research and commercial clouds

Connecting - Interoperability with other Commons, clouds

Consent - Reconsenting data, Dynamic consents

Standards – Metadata, UIDs, APIs

Page 20: NIH Data Commons  - Note:  Presentation has animations

A digital economy is characterized by making data a central currency to gain a business advantage

Organizations that are not born digital will be at a disadvantage in the new economy

Page 21: NIH Data Commons  - Note:  Presentation has animations

Thank you

• ADDS Office: Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurik
• CIT: Andrea Norris, Debbie Sinmao
• NCI: Warren Kibbe, Tony Kerlavage, Tanja Davidsen, Ian Fore
• NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni, Alison Yao
• The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
• Trans NIH BD2K Executive Committee & Working groups
• Many biomedical researchers, cloud providers, IT professionals