The NCI Cancer Research Data Commons
Allen Dearry, Ph.D., Program Director
Center for Biomedical Informatics and Information Technology
CI4CC 10.25.2017
Agenda
1. Background
2. Overview – NCI Cancer Research Data Commons
3. Commons Framework
4. Discussion
Background
2006-2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors
Omics Characterization (10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
Precision Medicine Initiative (PMI)
Cancer research and care generate detailed data that are critical to create a learning health system for cancer.
Key tenet of the PMI: secure, responsible access to high-quality data.
The PMI was announced during the State of the Union Address, 2015.
Precision Medicine is a grand challenge, requiring:
• Deep biological understanding
• Advances in scientific methods, instrumentation, and technology
• Advances in data management and computation
• Ability to apply those advances to drive research and treatment
• Ability to securely share data across domains, institutions, and stakeholders
Basic Ingredients for PMI Big Data
• Open Science. Supporting Open Access, Open Data, Open Source Software, and Data Liquidity for the cancer community
• Standardization through terminology, CDEs, and CRFs
• Interoperability by exposing existing knowledge through appropriate integration of ontologies, vocabularies, taxonomies, and data standards
• Sustainable models for informatics infrastructure, services, data, metadata, curation
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Guiding Principle:
The greatest public benefit will be realized if large-scale genomic data are made available in a timely manner to the largest possible number of investigators. For human data, data are made available under terms and conditions consistent with the informed consent provided by individual participants.
The Beau Biden Cancer Moonshot℠
Overarching goals – Jan, 2016
• Accelerate progress in cancer, including prevention & screening
  • From cutting edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and private sector
• Enhance data sharing
Blue Ribbon Panel – October, 2016
• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 3D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
• Full report: www.cancer.gov/brp
National Cancer Data Ecosystem Recommendation
Overall goal: “Enable all participants across the cancer research and care continuum to contribute, access, combine and analyze diverse data that will enable new discoveries and lead to lowering the burden of cancer.”
• Envisioned to consist of multiple components
• Fundamental infrastructure to connect the components and ensure interoperability
  • Common APIs
  • Data schemas
  • Common data dictionaries
  • Enhanced cloud computing platforms
• Components such as repositories, analytics services, and interactive portals
• The ability to link diverse data types and data sources is fundamental to interoperability of the Cancer Data Ecosystem.
Changing the Conversation around Data Sharing
• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we adopt/adapt or create data standards?
• How do we make more data machine readable?
National Cancer Data Ecosystem
NCI Cancer Research Data Commons
NIH Data Commons Pilot
Data Commons co-locate data, storage and computing infrastructure, and frequently used tools for analyzing and sharing data to create an interoperable resource for the research community.
Why Develop a Cancer Research Data Commons?
• A key component of a learning National Cancer Data Ecosystem
• Making research data available for discovery, validation, new therapies
• Maximizing the impact, reuse, and reproducibility of cancer research
• Facilitating innovation of methods and tools for research
• Promoting research collaborations
• Changing incentives for data sharing
NCI Cancer Research Data Commons - Vision
Reduce the risk, improve early detection, outcomes, and survivorship in cancer
The NCI Genomic Data Commons
• Unify fragmentary repositories
• Support the receipt, quality control, integration, storage, and redistribution of standardized genomic data sets derived from cancer research studies
• Harmonization of raw sequence both from existing and new cancer research programs
• Application of state-of-the-art methods of generating derived genomic data
• Provide the foundation for:
  • Identification of high- and low-frequency cancer drivers
  • Defining genomic determinants of response to therapy
  • Clinical trial cohorts sharing targeted genetic lesions
Three NCI Genomics Cloud Pilots
Broad Institute
• PI: Gad Getz, Anthony Philippakis
• Google Cloud
• Firehose in the cloud including Broad best practices workflows
• http://firecloud.org
Institute for Systems Biology
• PI: Ilya Shmulevich
• Google Cloud
• Leverage Google infrastructure; novel query and visualization
• http://cgc.systemsbiology.net/
Seven Bridges Genomics
• PI: Brandi Davis-Dusenbery
• Amazon Web Services
• Interactive data exploration; >30 public pipelines
• http://www.cancergenomicscloud.org
Timeline phases: Design/Build I, Design/Build II, Evaluation, Extension, Cloud Resources
Timeline dates: Sept 2014, April 2015, Jan 2016, Sept 2016, October 2017
Original Goals of the Pilots Remain Relevant
Democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.
Provide:
• Access to large genomic data sets without need to download
• Access to popular pipelines and visualization tools
• Ability for researchers to bring their own tools and pipelines to the data
• Ability for researchers to bring their own data and analyze in combination with existing genomic data
• Workspaces, for researchers to save and share their data and results of analyses
GDC / Cloud Resources: Today
Diagram labels:
• Researchers
• Web Interface; APIs
• GDC: Data Submission & Harmonization
• Genomic Data Commons: Harmonization, Visualization, & Download
• NCI Cloud Resources (SBG CGC, Broad FireCloud, ISB CGC): Visualization, Compute, Pipelines, Workspaces
• Authentication & Authorization thru eRA Commons & dbGaP
Data Commons Framework
What is it?
• Reusable, expandable framework for the Data Commons
• Defines the core principles and structure of a Data Commons
• Provides reusable components that can be leveraged across the Data Commons
Components
• Secure user authentication and authorization
• Metadata validation and tools
• Domain-specific, extensible data models
• API and container environment for tools and pipelines
• Access to computational workspaces for storing data, tools, and results
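As an illustration of the metadata validation component, the following is a minimal sketch that checks a submitted record against a small data dictionary. The field names, the controlled vocabularies, and the `validate_record` helper are all hypothetical and are not taken from any actual NCI dictionary or Framework API.

```python
# Hypothetical data dictionary: field name -> (required?, allowed values or None).
# Field names and vocabularies are illustrative assumptions only.
DATA_DICTIONARY = {
    "case_id": (True, None),
    "primary_site": (True, {"Lung", "Breast", "Colon"}),
    "sample_type": (False, {"Primary Tumor", "Blood Derived Normal"}),
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in the record; an empty list means valid."""
    problems = []
    for name, (required, allowed) in dictionary.items():
        value = record.get(name)
        if value is None or value == "":
            if required:
                problems.append(f"missing required field: {name}")
            continue
        # Only fields with a controlled vocabulary are value-checked.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: {value!r} not in controlled vocabulary")
    # Flag fields the dictionary does not define.
    for name in record:
        if name not in dictionary:
            problems.append(f"unknown field: {name}")
    return problems
```

Calling `validate_record({"case_id": "C1", "primary_site": "Lung"})` returns an empty list, while a record that lacks `case_id` or uses an unlisted `primary_site` returns one message per problem.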
Data Commons Framework – Why?
• Leverage work already completed by GDC and Cloud Pilots/Resources.
• Develop infrastructure and foundation for the Data Commons and nodes as they are created.
• Ensure consistency and interoperability from the start, maximize future data sharing.
• Design modular, interoperable components (data access services, indexing and search, workspaces, workflow and tool stores, portals and UIs) that can be flexible and assembled into diverse data environments.
• Optimize ability to integrate new data types.
• Interrelate with other Commons developments: NIH, CZI...
NCI Cancer Research Data Commons (NCRDC)
Genomic Data Commons Node: GDC
Imaging Data Commons Node: IDC
Proteomic Data Commons Node: PDC
APIs
Data Commons Framework – Modular, Flexible Core Services
• Authentication and Authorization
• Metadata Validation Tools
• Data Models
• User Workspaces
• Container Environment
GDC / Cloud Resources: Near Term - Moving Towards a Commons Framework
Diagram labels:
• Researchers
• Web Interface; APIs
• GDC: Data Submission & Harmonization
• Centrally-managed copies of the data, mirrored in the commercial clouds: GDC@GCP, GDC@AWS, GDC@Azure
• Centrally-managed authentication and authorization thru eRA Commons and dbGaP
• Cloud Resources (SBG CGC, Broad FireCloud, ISB CGC) continue to provide data access, analytic tools, workspace
• DockStore; Analysis resources
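To make the idea of centrally-managed authorization concrete, here is a minimal sketch of an access check that every cloud resource could apply in the same way. It assumes two access levels ("open" and "controlled") and a set of studies a user has been approved for. The file records, the study identifiers, and the `can_access` and `visible_files` helpers are hypothetical placeholders, not real dbGaP accessions or an actual eRA Commons or dbGaP interface.

```python
# Hypothetical file metadata. Access levels and study identifiers are
# placeholders for illustration, not real dbGaP accessions.
FILES = [
    {"name": "summary.maf", "access": "open", "study": "study-A"},
    {"name": "reads.bam", "access": "controlled", "study": "study-A"},
    {"name": "other.bam", "access": "controlled", "study": "study-B"},
]


def can_access(file_meta, approved_studies):
    """Open files are readable by anyone; controlled files need study approval."""
    if file_meta["access"] == "open":
        return True
    return file_meta["study"] in approved_studies


def visible_files(files, approved_studies):
    """Names of the files a user with the given approvals may read."""
    return [f["name"] for f in files if can_access(f, approved_studies)]
```

A user with no approvals sees only `summary.maf`; a user approved for `study-A` also sees `reads.bam`. Keeping the decision in one function is the point of the sketch: each resource asks the same question rather than keeping its own access list.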
The NCI Cancer Research Data Commons
A virtual, expandable infrastructure
Ø Standardized data submission and Q/C
Ø Controlled vocabularies
Ø Harmonization by subject matter experts
Ø Secure data access through API or web UI
Ø Query across data domains
Ø Analytics, elastic compute, visualization
Diagram labels:
• Data types: Genomic Data, Proteomic Data, Imaging Data
• NCI Cancer Research Data Commons domains: GDC, Clinical, Functional, Cancer Models, Imaging, Population, Proteomics
• Authentication & Authorization; API
• Users: Biologists / Clinical Researchers, Clinicians and Patients, Tool / Algorithm Developers, Computational Scientists
• Data Contributors
Cancer Data Aggregator
Aggregate by case, sample, study, disease, tissue, etc.
Diagram layers, connected by APIs:
• Community Presentation
• Analytics
• Multi-modal data aggregation
• Data Commons Repositories / Nodes: Genomics, Imaging, Proteomics, Clinical
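Aggregating by case across nodes can be sketched as a simple grouping step. The example below assumes that each node returns records carrying a shared `case_id`. The record shapes, file names, node names, and the `aggregate` helper are hypothetical and do not reflect the actual Cancer Data Aggregator design.

```python
from collections import defaultdict

# Hypothetical records as they might come back from separate nodes.
genomic = [{"case_id": "C1", "file": "C1.bam"}, {"case_id": "C2", "file": "C2.bam"}]
imaging = [{"case_id": "C1", "file": "C1.dcm"}]
proteomic = [{"case_id": "C2", "file": "C2.raw"}, {"case_id": "C3", "file": "C3.raw"}]


def aggregate(nodes, key="case_id"):
    """Group file names from several nodes by a shared key such as case_id."""
    grouped = defaultdict(lambda: defaultdict(list))
    for node_name, records in nodes.items():
        for rec in records:
            grouped[rec[key]][node_name].append(rec["file"])
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {k: dict(v) for k, v in grouped.items()}


by_case = aggregate({"genomics": genomic, "imaging": imaging, "proteomics": proteomic})
```

Here `by_case["C1"]` holds one genomic file and one imaging file for the same case. Passing a different `key` (for example a sample or study identifier, if the records carry one) gives the other groupings named above. The grouping only works if every node uses the same identifier for the same case, which is why shared data dictionaries matter.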
Governance and Outreach
• Governance process to be established, including Scientific and Technical Review Board and Steering Committee
• Structured process for decisions, interactions, roles
• Outreach and collaboration
• Working with NIH and other ICs on related initiatives/Data Commons, as well as external groups such as Chan Zuckerberg
• Participating on NIH and interagency working groups and on PMI- and Moonshot-related projects
• Plans for workshops and RFIs to get community input, feedback, and participation
Acknowledgements
Cloud Resources Team Leads
• Gad Getz, Ph.D. - Broad Institute
• Ilya Shmulevich, Ph.D. - ISB
• Brandi Davis-Dusenbery, Ph.D. - Seven Bridges
NCI CBIIT Team
• Durga Addepalli, Ph.D.
• Allen Dearry, Ph.D.
• Juli Klemm, Ph.D.
• Tanja Davidsen, Ph.D.
• Izumi Hinkson, Ph.D.
• Betsy Hsu, Ph.D.
• Stephen Jett, Ph.D.
• John Otridge, Ph.D.
• Sima Pandya
• Eve Shalley
• Steve Tsang, Ph.D.
Framework Team
• Robert Grossman, Ph.D. - University of Chicago
• Phillis Tang
• Christina Yung
NCI Center for Cancer Genomics
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
NCI Office of Cancer Clinical Proteomics Research
• Henry Rodriguez, Ph.D.
• Chris Kinsinger, Ph.D.
NCI Cancer Imaging Program
• Paula Jacobs
• John Freymann
• Justin Kirby
NCI Leadership
• Doug Lowy, M.D.
• Warren Kibbe, Ph.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
www.cancer.gov www.cancer.gov/espanol