FDA NGS and Big Data Conference September 2014

Transcript
Page 1: FDA NGS and Big Data Conference September 2014

National Cancer Institute
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
National Institutes of Health

NCI Genomics Data Commons and cloud pilots

September 2014

Page 2

Overview

• National Challenges in Cancer Data

• Disruptive Technologies

• NCI Genomics Data Commons

• NCI Cloud Pilots

• Building a national learning health system for cancer clinical genomics

Page 3

National Challenges in Cancer Informatics

• Lowering barriers to data access, analysis and modeling for cancer research

• Integrating data and learning from basic and clinical research with cancer care to enable prediction and improved outcomes

Page 4

We need:

• Open Science (Open Access, Open Data, Open Source) and Data Liquidity for the cancer community

• Semantic interoperability through CDEs and Case Report Forms mapped to standards

• Sustainable models for informatics infrastructure, services, data

Page 5

Where we are

Disruptive technologies

Getting social

Open access to data

Page 6

Disruptive Technologies

• Printing

• Steam power

• Transportation

• Electricity

• Antibiotics

• Semiconductors & VLSI design

• HTTP

• High throughput biology

Systems view - end of reductionism?

Page 7

Disruptive Technologies

• Printing

• Steam power

• Transportation

• Electricity

• Antibiotics

• Semiconductors & VLSI design

• HTTP

• High throughput biology

• Ubiquitous computing

Everyone is a data provider

Data immersion

World:
6.6B active mobile contracts
1.9B smart phone contracts
1.1B land lines
World population 7.1B

US:
345M active mobile contracts
287M smart phone contracts
US population 313M

Page 8

What about social media?

• Social media may be one avenue for modifying behaviors that result in cancer

• Properly orchestrated, social media can have dramatic impact on quality of life for patients and survivors

• It can reach into all segments of our society, including underserved populations

Page 9

Public Health

• Three modifiable factors (infectious disease, smoking, and poor nutrition combined with lack of exercise) contribute to at least 50% of our current cancer burden. The cost in lost quality of life, pain, and suffering is incalculable.

Page 10

Some NCI Big Data activities

• TCGA, TARGET and ICGC

– Cancer Genomics Data Commons

– NCI Cloud Pilots

• Molecular Clinical Trials:

– MPACT, MATCH, Exceptional Responders

Page 11

Data are accumulating!

Page 12

From the Second Machine Age

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 13

Molecular data is Big Data

• Brief trip down memory lane

• Sequencing and the Human Genome Project

Page 14
Page 15
Page 16

GenBank High Throughput Genome Sequence (HTGS)

Page 17

February 12, 2001

Page 18

HGP outcomes

Page 19

Page 20

Assays and Data Types

Page 21

TCGA history

• Initiated in 2005

• Collaboration of NHGRI and NCI to examine GBM, lung, and ovarian cancer using genomic techniques in 2006.

• Expanded to 20+ tumor types.

Page 22

TCGA drivers

• Providing high quality reference sets for 20+ tissue types

• Providing a platform for systems biology and hypothesis generation

• Providing a test bed for understanding the real world implications of consent and data access policies on genomic and clinical data.

Page 23

Focus on TCGA

• TCGA consortium slides

• Thanks to Lou Staudt and Jean Claude Zenklusen

Page 24

TCGA: Lessons from structural genomics

Jean Claude Zenklusen, Ph.D.
Director
TCGA Program Office
National Cancer Institute

Page 25

The Mutational Burden of Human Cancer

Mike Lawrence and Gaddy Getz

[Figure labels: increasing genomic complexity; childhood cancers; carcinogens]

Page 26
Page 27
Page 28

TCGA Nature 497:67 (2013)

Molecular Subgroups Refine Histological Diagnosis of Endometrial Carcinoma

[Figure: four molecular subgroups: POLE (ultramutated), MSI (hypermutated), copy-number low (endometrioid), copy-number high (serous-like). Tracks: histology (endometrioid, serous), mutations per Mb, PolE, MSI / MSH2, copy #, PTEN, p53. Annotation: serous misdiagnosed as endometrioid?]

Page 29

TCGA Nature 497:67 (2013)

Molecular Diagnosis of Endometrial Cancer May Influence Choice of Therapy

[Figure: four molecular subgroups: POLE (ultramutated), MSI (hypermutated), copy-number low (endometrioid), copy-number high (serous-like). Tracks: histology, mutations per Mb, PolE, MSI / MSH2, copy #, PTEN, p53. Annotations: Adjuvant chemotherapy? Adjuvant radiotherapy? Surgery only?]

Page 30

NCI Cancer Genomics Data Commons

[Diagram labels: GDC; NCI Genomics Data Commons; genomic + clinical data . . .]

Page 31

NCI Cancer Genomics Data Commons

[Diagram labels: GDC; NCI Genomics Data Commons; genomic + clinical data . . . ; cancer information donor]

Page 32

Utility of a Cancer Knowledge Base

• Identify low-frequency cancer drivers

• Define genomic determinants of response to therapy

• Compose clinical trial cohorts sharing targeted genetic lesions

[Diagram labels: GDC; cancer information donor]

Page 33

[Diagram labels: DACO; ICGC; dbGaP; EGA; TCGA; BAM; Open; Open; ERA; BAM; Germ Line; + EGA id; BAMBAM]

Page 34

ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)

COSMIC Open Data

Page 35

Driver for the Cloud Pilots

• An inflection point for TCGA is looming

[Figure: y-axis label "Gigabytes (GB)"]

Page 36

Local copies and computation

• Assuming the 2.5 PB TCGA data set

• Storage and backups ~ $1M US

• Downloading TCGA data at 10 Gb/sec = 23 days

• Size + high dimensionality = high computational requirements that grow quickly
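
The 23-day download figure above can be checked with a short calculation. This is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 Gb = 10^9 bits) and a fully saturated link with no protocol overhead or contention. The function name `transfer_days` is illustrative and does not come from the slides.

```python
def transfer_days(size_petabytes: float, link_gigabits_per_sec: float) -> float:
    """Idealized transfer time in days for a data set over a network link.

    Assumptions (not from the slides): decimal units, a saturated link,
    and no protocol overhead, retries, or storage bottlenecks.
    """
    size_bits = size_petabytes * 1e15 * 8               # petabytes -> bits
    seconds = size_bits / (link_gigabits_per_sec * 1e9)  # bits / (bits per second)
    return seconds / 86_400                              # seconds -> days


if __name__ == "__main__":
    # The 2.5 PB TCGA data set over a 10 Gb/sec link.
    print(f"{transfer_days(2.5, 10):.1f} days")
```

Under these assumptions the result is a little over 23 days, matching the slide. At 1 Gb/sec the same transfer would take ten times longer, which is part of the case for moving computation to the data.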

Page 37

Relationship of the Cancer Genomics Data Commons and NCI Cloud Pilots

[Diagram labels: GDC (NCI Genomics Data Commons); NCI Cloud Computational Centers; periodic data freezes; search / retrieve; analysis]

Page 38

Cancer Genomics Cloud Pilots

Page 39

NCI Cloud Pilots

• Funding for up to 3 cloud pilots: 24-month pilots that are meant to inform the Cancer Genomics Data Commons

– Explore models for cancer genomics APIs

– Explore cloud models for data+analysis

Page 40

NCI Cloud Pilots

• A way to move computation to the data

• Sustainable models for providing access to data

• Reproducible pipelines for QA, variant calling, knowledge sharing

• Define genomics/phenomics APIs for discovering new variants contributing to cancer, enhancing response, modulating risk

Page 41

The future

• Elastic computing ‘clouds’

• Social networks

• Big Data analytics

• Precision medicine

• Measuring health

• Practicing protective medicine

Semantic and synoptic data

Intervening before health is compromised

Learning systems that enable learning from every cancer patient

Page 42

Thank you

Warren A. [email protected]

Page 43