NDS Relevant Update from the NIH Data Science (ADDS) Office
![Page 1: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/1.jpg)
AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015
NDS Relevant Update from the NIH Data Science (ADDS) Office
Phil Bourne, Ph.D., FACMI
Associate Director for Data Science (ADDS)
![Page 2: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/2.jpg)
How Can NDS Succeed?
• Be at the right place at the right time
• Bring together all the right stakeholders; there are groups missing now, e.g. application scientists, publishers
• Define very well the problem(s) you are trying to solve
• Start with pilots, but proceed to a soup-to-nuts application that has value and can be sustained
![Page 3: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/3.jpg)
How can NDS Interface with the NIH ….
![Page 4: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/4.jpg)
ADDS Mission Statement
To use data science
to foster an open digital ecosystem
that will accelerate efficient, cost-effective
biomedical research
to enhance health, lengthen life, and reduce illness and disability
![Page 5: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/5.jpg)
A couple of announcements …
![Page 6: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/6.jpg)
http://www.nih.gov/news/health/oct2015/od-20.htm
![Page 7: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/7.jpg)
![Page 8: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/8.jpg)
ADDS Strategy
• Discovery and Innovation
Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development
Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process
Contribute to policies & processes involving data that further the NIH mission
• Leadership
Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability
To foster a sustainable, efficient, and productive data science ecosystem
Diagram labels: Sustainability, Workforce Development, Discovery & Innovation, Policy & Process, Leadership
![Page 9: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/9.jpg)
ADDS Strategy
• Discovery and Innovation
Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development
Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process
Contribute to policies & processes involving data that further the NIH mission
• Leadership
Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability
To foster a sustainable, efficient, and productive data science ecosystem: The Commons
Diagram labels: Sustainability, Workforce Development, Discovery & Innovation, Policy & Process, Leadership
![Page 10: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/10.jpg)
Some Developments…
• Centers, standards, training coordination centers off and running
• Looking at funding reference datasets
• Hackathons and more…
• NLM 2.0
![Page 11: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/11.jpg)
Commons Update
enabling the digital enterprise
![Page 12: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/12.jpg)
What is The Commons?
• Treats products of research – data, methods, papers etc. as digital objects
• These digital objects exist in a shared virtual space
• Digital objects conform to FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
![Page 13: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/13.jpg)
The Commons: Components
• Computing environment
– cloud and/or HPC
– supports access, utilization, sharing and storage of digital objects
• Methods for Interoperability
– enable connectivity, shareability and interoperability between digital objects
– APIs, Containers (Docker etc.)
• Digital object compliance model
– describes the properties of digital objects that enable them to be discoverable and shareable
– Metadata, UIDs, clear access controls (human subject data)
• Indexing
– Means to find and catalog digital objects
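As a rough illustration of the indexing component, the sketch below catalogs digital objects by the words in their metadata and looks them up by keyword. It is a toy in-memory example, not the Commons index itself; the class name `DigitalObjectIndex`, the identifiers `obj-001` and `obj-002`, and the metadata fields are all hypothetical.

```python
from collections import defaultdict


class DigitalObjectIndex:
    """Toy in-memory index (hypothetical): metadata keyword -> object UIDs."""

    def __init__(self):
        self._by_keyword = defaultdict(set)  # keyword -> set of UIDs
        self._records = {}                   # UID -> metadata dict

    def add(self, uid, metadata):
        """Catalog one digital object under every word of its metadata values."""
        self._records[uid] = metadata
        for value in metadata.values():
            for word in str(value).lower().split():
                self._by_keyword[word].add(uid)

    def find(self, keyword):
        """Return the sorted UIDs whose metadata mentions the keyword."""
        return sorted(self._by_keyword.get(keyword.lower(), set()))


# Made-up example objects: one dataset and one method.
index = DigitalObjectIndex()
index.add("obj-001", {"title": "Gut microbiome reads", "type": "dataset"})
index.add("obj-002", {"title": "Microbiome assembly pipeline", "type": "method"})

print(index.find("microbiome"))  # ['obj-001', 'obj-002']
print(index.find("dataset"))     # ['obj-001']
```

A real index would sit behind a search service and use agreed metadata standards; the point here is only that finding an object depends on it having a UID and searchable metadata.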
![Page 14: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/14.jpg)
The Commons: Components
![Page 15: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/15.jpg)
Computing Environment: Cloud
The ability to store, share and compute on digital research objects
Especially useful for large data sets that are not easily computed locally
Scalable and Elastic
Pay per use - Cost effective
An environment that fosters collaboration
![Page 16: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/16.jpg)
The Commons: Cloud
• Commercial
– AWS, Google, Microsoft, IBM
– Others
• Academic
– OSC (Open Science Cloud)
– iDASH (HIPAA compliant)
– The Broad
– Others
![Page 17: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/17.jpg)
The Commons: HPC
• Supercomputing Centers in the US
– Supported by DOE and NSF
• NERSC (San Francisco)
• ORNL (Oak Ridge)
• TACC (Texas)
• SDSC (San Diego)
• Argonne (Urbana-Champaign)
• Optimized, high performance systems with IT support
![Page 18: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/18.jpg)
The Commons: Interoperability
![Page 19: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/19.jpg)
The Commons: Interoperability
• Software that supports connectivity and interoperability between digital (data) objects
– APIs (Application Programming Interfaces)
• Expose and provide direct access to data
• Enable data to be passed to analysis tools or pipelines
– Containers
• Package and deploy software tools and pipelines to the cloud
![Page 20: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/20.jpg)
The Commons: Digital Object Compliance
![Page 21: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/21.jpg)
The Commons
Digital Object Compliance: FAIR
• Attributes of digital objects in the Commons
– Initial Phase
• Unique digital object identifiers of some type
• A minimal set of searchable metadata
• Physically available in a cloud-based Commons provider
• Clear access rules (especially important for human subjects data)
• An entry (with metadata) in one or more indices
– Future Phases
• Standard, community-based unique digital object identifiers
• Conform to community-approved standard metadata for enhanced searching
• Digital objects accessible via open standard APIs
• Are physically and logically available to the Commons
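The initial-phase attributes can be read as a checklist. The sketch below encodes them as a simple check that reports which attributes a digital object still lacks. It is illustrative only: the `DigitalObject` fields, the `initial_phase_gaps` function, and the example values are assumptions, not part of any Commons specification.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one digital object; field names are assumptions."""
    uid: str = ""                                       # unique identifier of some type
    metadata: dict = field(default_factory=dict)        # minimal searchable metadata
    cloud_location: str = ""                            # where a Commons provider hosts it
    access_rule: str = ""                               # e.g. "open" or "controlled"
    index_entries: list = field(default_factory=list)   # indices that list the object


def initial_phase_gaps(obj):
    """Return the initial-phase attributes the object does not yet satisfy."""
    checks = {
        "unique identifier": bool(obj.uid),
        "searchable metadata": bool(obj.metadata),
        "available from a cloud-based Commons provider": bool(obj.cloud_location),
        "clear access rule": bool(obj.access_rule),
        "entry in at least one index": bool(obj.index_entries),
    }
    return [name for name, ok in checks.items() if not ok]


# A draft object with only an identifier and metadata (made-up values).
draft = DigitalObject(uid="obj-001", metadata={"title": "Example dataset"})
print(initial_phase_gaps(draft))
# ['available from a cloud-based Commons provider', 'clear access rule',
#  'entry in at least one index']
```

The future-phase attributes (community identifiers, standard metadata, open APIs) would need checks against external standards, so they are left out of this sketch.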
![Page 22: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/22.jpg)
Commons Pilot Projects
![Page 23: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/23.jpg)
Commons Pilot Projects
• Evaluating Commons Framework & Populating the Commons
– NIH-funded Large Resource groups, BD2K groups (cloud)
– HMP Data and tools available in the cloud (AWS)
• https://aws.amazon.com/datasets/1903160021374413
– NCI Cloud Pilots & Genomic Data Commons (AWS, Google)
• The Cloud Credits: business model for using cloud resources
![Page 24: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/24.jpg)
Commons Credits (business model)
Diagram labels: NIH (provides credits); Investigator (uses credits in the Commons); The Commons (infrastructure): Cloud Provider A, Cloud Provider B, Cloud Provider C; Discovery Index (enables search); Indexes; Option: Direct Funding
![Page 25: NDS Relevant Update from the NIH Data Science (ADDS) Office](https://reader035.fdocuments.us/reader035/viewer/2022070602/587650dc1a28ab0d198b6b4f/html5/thumbnails/25.jpg)
Cloud Credits: Pros and Cons
Pros
• Cost effective: only pay for IT support used
• Drives competition: better services at lower cost
• Supports data access and sharing by driving science into the Commons
• Can help determine metrics of data object usage
• Facilitates public-private partnership
Cons
• Never been tried, so we don’t have data about likelihood of success
• Cost Models: Predicated prices among providers
• Service Providers: Predicated on service providers willing to make the investment to become conformant
• Persistence: The model is ‘Pay As You Go’ which means if you stop paying it stops going
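The pay-as-you-go behaviour of the credits model can be sketched as a small ledger: an investigator receives credits, spends them at conformant cloud providers, and the work stops once the balance runs out. The per-provider totals hint at how usage metrics could fall out of the same records. The class `CreditAccount`, the investigator name, and the credit amounts are hypothetical and chosen only for illustration.

```python
class CreditAccount:
    """Toy ledger (hypothetical) for credits granted to one investigator."""

    def __init__(self, investigator, credits):
        self.investigator = investigator
        self.balance = credits
        self.usage = []  # list of (provider, amount) spending records

    def spend(self, provider, amount):
        """Spend credits at a provider; refuse if the balance is too low."""
        if amount > self.balance:
            return False  # pay as you go: no credits, no compute
        self.balance -= amount
        self.usage.append((provider, amount))
        return True

    def usage_by_provider(self):
        """Total credits spent per provider, a simple usage metric."""
        totals = {}
        for provider, amount in self.usage:
            totals[provider] = totals.get(provider, 0) + amount
        return totals


# Made-up numbers: 100 credits granted, spent across two providers.
account = CreditAccount("investigator-1", 100)
first = account.spend("Cloud Provider A", 60)    # accepted
second = account.spend("Cloud Provider B", 30)   # accepted
third = account.spend("Cloud Provider A", 20)    # refused, only 10 credits left

print(first, second, third)          # True True False
print(account.balance)               # 10
print(account.usage_by_provider())   # {'Cloud Provider A': 60, 'Cloud Provider B': 30}
```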