Genentech icgc 2015

Status and Update of the International Cancer Genome Consortium (ICGC). June 1st 2015. B.F. Francis Ouellette [email protected]. Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON. Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.

Transcript of Genentech icgc 2015

Page 1: Genentech icgc 2015

Status and Update of the International Cancer Genome Consortium (ICGC)

June 1st 2015

B.F. Francis Ouellette [email protected]

• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON

• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.

Page 2: Genentech icgc 2015

ONTARIO INSTITUTE FOR CANCER RESEARCH

You are free to:

Copy, share, adapt, or re-mix;

Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation. Provided that:

You attribute the work to its author and respect the rights and licenses associated with its components.

Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

Page 3: Genentech icgc 2015


Page 4: Genentech icgc 2015

Page 5: Genentech icgc 2015

Disclaimer

I am on the SAB of many NIH funded projects (SGD, Galaxy, GenomeSpace, and HMP2), as well as on the Science, Industry Advisory Committee of Genome Canada.

I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.

Page 6: Genentech icgc 2015


@bffo

[email protected]

Page 7: Genentech icgc 2015

Page 8: Genentech icgc 2015

International Cancer Genome Consortium

Page 9: Genentech icgc 2015


http://www.csb.utoronto.ca/

Page 10: Genentech icgc 2015

http://bioinformatics.ca/

Page 11: Genentech icgc 2015

Page 12: Genentech icgc 2015

Page 13: Genentech icgc 2015

http://bioinformatics.ca/workshops/2014

Page 14: Genentech icgc 2015


E-mail: [email protected]

Web: http://bioinformatics.ca

Page 15: Genentech icgc 2015

Cancer: A Disease of the Genome

Challenge in Treating Cancer:

Every tumor is different

Every cancer patient is different

Page 16: Genentech icgc 2015

Large-Scale Studies of Cancer Genomes

Johns Hopkins: > 18,000 genes analyzed for mutations; 11 breast and 11 colon tumors. L.D. Wood et al, Science, Oct. 2007

Wellcome Trust Sanger Institute: 518 genes analyzed for mutations; 210 tumors of various types. C. Greenman et al, Nature, Mar. 2007

TCGA (NIH): Multiple technologies; brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma). F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007

Page 17: Genentech icgc 2015

Lessons learned

Heterogeneity within and across tumor types

High rate of abnormalities (driver vs passenger)

Sample quality matters

Consent and controlled data access is complicated

Page 18: Genentech icgc 2015


International Cancer Genome Consortium

Collect ~500 tumour/normal pairs from each of 50 different major cancer types;

Comprehensive genome analysis of each T/N pair:

Genome

Transcriptome

Methylome

Clinical data

Make the data available to the research community & public.

Identify genome changes

…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
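The three fragments above differ at a single base, which is the kind of change a tumour/normal comparison is meant to surface. A minimal sketch of that comparison follows, assuming the fragments are already aligned and of equal length; the slide does not say which fragment is the normal and which the tumour, so the names `fragment_a` and `fragment_b` and the function `find_substitutions` are illustrative only.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# The two distinct fragments shown on the slide (flanking ellipses dropped).
fragment_a = "GATTATTCCAGGTAT"
fragment_b = "GATTATTGCAGGTAT"

substitutions = find_substitutions(fragment_a, fragment_b)
print(substitutions)  # [(7, 'C', 'G')]
```

Real somatic mutation calling works on aligned reads with quality and depth information; this only illustrates the idea of a single-base difference between two sequences.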

Page 19: Genentech icgc 2015

Rationale for the ICGC

The scope is huge, such that no country can do it all.

Coordinated cancer genome initiatives will reduce duplication of effort for common and easy-to-acquire tumor samples and ensure complete studies for many less frequent forms of cancer.

Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets.

The spectrum of many tumor types varies across the world because of environmental, genetic and other causes.

The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites and the user community.

Page 20: Genentech icgc 2015

ICGC Goals, Structure, Policies & Guidelines

http://goo.gl/sPGLQN

Page 21: Genentech icgc 2015

Primary Goal: coordinate efforts to reach goals (50 tumour types)

Page 22: Genentech icgc 2015


http://docs.icgc.org/dcc-data-element-specifications

Page 23: Genentech icgc 2015


Primary Goal: be comprehensive

http://goo.gl/BE7KH1

Page 24: Genentech icgc 2015


Analysis Data Types

Germline variants (SNPs)

Simple Somatic Mutations (SSM)

Copy Number Alterations (CNA)

Structural Variants (SV)

Gene Expression (micro-arrays and RNASeq)

miRNA Expression (RNASeq)

Epigenomics (Arrays and Methylation)

Splicing Variation (RNASeq)

Protein Expression (Arrays)

Page 25: Genentech icgc 2015


Primary Goal: generate highest quality

http://goo.gl/FXCvi9

Page 26: Genentech icgc 2015

Page 27: Genentech icgc 2015

Primary Goal: available to all

Page 28: Genentech icgc 2015


Primary Goal: available to all

Page 29: Genentech icgc 2015

ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes

• Gene Expression (probe-level data)

• Raw genotype calls

• Gene-sample identifier links

• Genome sequence files

ICGC Open Access (OA) Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade

• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up

• Gene Expression (normalized)

• DNA methylation

• Computed Copy Number and Loss of Heterozygosity

• Newly discovered somatic variants

http://goo.gl/w4mrV

Page 30: Genentech icgc 2015


Secondary Goal: coordinate work to benefit productivity

http://goo.gl/K5mHC3

Page 31: Genentech icgc 2015


https://icgc.org/icgc/committees-and-working-groups

Page 32: Genentech icgc 2015


Secondary Goal: disseminate knowledge

http://goo.gl/ObcZXy

Page 33: Genentech icgc 2015

ICGC Goals, Structure, Policies & Guidelines

http://goo.gl/sPGLQN

Page 34: Genentech icgc 2015


Policy

ICGC membership implies compliance with Core Bioethical Elements for samples used in ICGC Cancer Projects:

http://goo.gl/TFrCmK

http://goo.gl/nYx6YG

Page 35: Genentech icgc 2015

POLICY: The members of the International Cancer Genome Consortium (ICGC) are committed to the principle of rapid data release to the scientific community.

http://goo.gl/TFrCmK

Page 36: Genentech icgc 2015


Publication Policy

The individual research groups in the ICGC are free to publish the results of their own efforts in independent publications at any time (subject, of course, to any policies of any collaborations in which they may be participating).

Page 37: Genentech icgc 2015


Moratorium: http://www.icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy

Page 38: Genentech icgc 2015


Publication Policy

Page 39: Genentech icgc 2015

Where do you find that information?

We actually make it hard to find, but we are working on that! (This is an example of where ICGC would like to do what TCGA does!)

http://cancergenome.nih.gov/publications/publicationguidelines

Page 40: Genentech icgc 2015

Policy on Intellectual Property

All ICGC members agree not to make claims to possible IP derived from primary data (including somatic mutations) and to not pursue IP protections that would prevent or block access to or use of any element of ICGC data or conclusions drawn directly from those data.

http://goo.gl/TCMXCl

Page 41: Genentech icgc 2015

International Cancer Genome Consortium: February 2015

85 Projects

18 Jurisdictions

42 Cancer types

Over 12,000 Cancer Genomes

Page 42: Genentech icgc 2015


DCC Activities

DCC activities are split between two groups:

Software Development

DCC portal

Submission tool

Biocuration (which also includes Content Management)

Data level management

Submitter “handling”

Coordination with secretariat

User support

http://dcc.icgc.org/team

Page 43: Genentech icgc 2015

[Figure: DCC data flow. Labels: Data; Validation (dictionary); Validation (across fields); indexing; Happy Users]

http://goo.gl/1EcyR
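The stages named on this slide (dictionary validation, validation across fields, indexing) can be sketched as a small pipeline. This is an illustration only, assuming the stages run in that order; the field names, permitted values, and the cross-field rule below are hypothetical and are not taken from the ICGC data dictionary or the DCC submission software.

```python
# Hypothetical dictionary: field name -> set of permitted values (None = free text).
DICTIONARY = {
    "donor_id": None,
    "donor_sex": {"male", "female"},
    "donor_vital_status": {"alive", "deceased"},
    "donor_survival_time": None,
}


def validate_dictionary(record):
    """Stage 1: every field must be known and hold a permitted value."""
    errors = []
    for field, value in record.items():
        if field not in DICTIONARY:
            errors.append(f"unknown field: {field}")
        elif DICTIONARY[field] is not None and value not in DICTIONARY[field]:
            errors.append(f"invalid value for {field}: {value!r}")
    for field in DICTIONARY:
        if field not in record:
            errors.append(f"missing field: {field}")
    return errors


def validate_across_fields(record):
    """Stage 2: a hypothetical rule relating two fields of the same record."""
    errors = []
    if record.get("donor_vital_status") == "deceased" and not record.get("donor_survival_time"):
        errors.append("deceased donor requires donor_survival_time")
    return errors


def build_index(records):
    """Stage 3: index accepted records by donor_id for lookup."""
    return {record["donor_id"]: record for record in records}


def process(records):
    """Run both validation stages, then index only the records that passed."""
    accepted, rejected = [], {}
    for record in records:
        # Cross-field checks run only when the dictionary checks pass.
        errors = validate_dictionary(record) or validate_across_fields(record)
        if errors:
            rejected[record.get("donor_id", "?")] = errors
        else:
            accepted.append(record)
    return build_index(accepted), rejected


records = [
    {"donor_id": "D1", "donor_sex": "female", "donor_vital_status": "alive", "donor_survival_time": ""},
    {"donor_id": "D2", "donor_sex": "unknown", "donor_vital_status": "alive", "donor_survival_time": ""},
    {"donor_id": "D3", "donor_sex": "male", "donor_vital_status": "deceased", "donor_survival_time": ""},
]
index, rejected = process(records)
print(sorted(index))     # ['D1']
print(sorted(rejected))  # ['D2', 'D3']
```

Running the dictionary check first keeps the cross-field rules simple, since they can assume each individual value is already well formed.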

Page 44: Genentech icgc 2015


http://docs.icgc.org/methods

Page 45: Genentech icgc 2015


http://docs.icgc.org/dcc-data-element-specifications

Page 46: Genentech icgc 2015


ICGC Biocuration

Helping submitters get their data to ICGC

Progress reporting (data audit)

Quality checks (coverage, correctness, etc.)

Helping users get to the data

Validate and check (and recheck) metadata on public repositories

Test and integrate with other public repositories via standard data formats, ontologies.

Documentation, documentation, and more documentation

Training


Page 47: Genentech icgc 2015


ICGC datasets to date: https://dcc.icgc.org/projects/history

Page 48: Genentech icgc 2015


http://goo.gl/CekF6y

Missing Clinical Data?

Page 49: Genentech icgc 2015


http://goo.gl/CekF6y

Page 50: Genentech icgc 2015

Page 51: Genentech icgc 2015

DACO

Data Portal

Info/help

Login

Page 52: Genentech icgc 2015


http://dcc.icgc.org/

Page 53: Genentech icgc 2015


http://dcc.icgc.org/

55 projects

Access to all data files (and more with DACO access)

Faceted searches

Page 54: Genentech icgc 2015


https://dcc.icgc.org/projects

Page 55: Genentech icgc 2015


https://dcc.icgc.org/search

Page 56: Genentech icgc 2015

Page 57: Genentech icgc 2015

https://dcc.icgc.org/repository

Page 58: Genentech icgc 2015


ICGC DCC community http://goo.gl/wfxRqJ

https://goo.gl/M1vch1

Page 59: Genentech icgc 2015

ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)

Page 60: Genentech icgc 2015


ICGC

TCGA

Page 61: Genentech icgc 2015


ICGC

TCGA

Differences between ICGC & TCGA

• Different tumour types

• Different geographic rules

• Many countries vs one jurisdiction

• Different definitions of what is controlled

• Different data access rules

Page 62: Genentech icgc 2015

ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data

• Gene Expression (probe-level data)

• Raw genotype calls

• Gene-sample identifier links

• Genome sequence files

• Germ line variants

ICGC Open Access Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade

• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up

• Gene Expression (normalized)

• DNA methylation

• Computed Copy Number and Loss of Heterozygosity

• Somatic variants from Exome or WGS

http://goo.gl/w4mrV
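The two lists on this slide amount to a lookup from data type to access tier. A minimal sketch of that lookup follows; the lower-cased keys and the function name `access_tier` are editorial conventions for the example, not identifiers used by the ICGC portal.

```python
# Data types transcribed from the two slide lists; the lower-cased keys are an
# editorial convention for this example, not ICGC identifiers.
CONTROLLED_ACCESS = {
    "detailed phenotype and outcome data",
    "gene expression (probe-level data)",
    "raw genotype calls",
    "gene-sample identifier links",
    "genome sequence files",
    "germ line variants",
}
OPEN_ACCESS = {
    "cancer pathology",
    "patient/person",
    "gene expression (normalized)",
    "dna methylation",
    "computed copy number and loss of heterozygosity",
    "somatic variants from exome or wgs",
}


def access_tier(data_type: str) -> str:
    """Return 'controlled' or 'open' for a data type listed on the slide."""
    key = data_type.strip().lower()
    if key in CONTROLLED_ACCESS:
        return "controlled"
    if key in OPEN_ACCESS:
        return "open"
    raise KeyError(f"data type not listed on the slide: {data_type!r}")


print(access_tier("Gene Expression (normalized)"))        # open
print(access_tier("Gene Expression (probe-level data)"))  # controlled
```

Note that gene expression appears in both tiers: normalized values are open, while probe-level data are controlled.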

Page 63: Genentech icgc 2015

TCGA Controlled Access Datasets

• Primary sequence data (BAM and FASTQ files)

• SNP6 array level 1 and level 2 data

• Exon array level 1 and level 2 data

• Somatic variants from whole genome sequencing

• Certain information in MAFs

• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu

TCGA Open Access Datasets

• De-identified clinical and demographic data

• Gene expression data

• Copy number alterations in regions of the genome

• Epigenetic data

• Summaries of data compiled across individuals

• Anonymized single amplicon DNA sequence data

• Somatic variants from scrubbed exome sequencing

http://goo.gl/A1rMRB

Page 64: Genentech icgc 2015


TCGA/ICGC users agreed:

… to keep all computer systems on which controlled access data reside, or which provide access to such data, up to date with respect to software and security patches.

… to protect Controlled Access Data against disclosure to unauthorized individuals. 

… to monitor and control which individuals have access to Controlled Access Data. 

Page 65: Genentech icgc 2015


TCGA/ICGC users agreed:

… to destroy all copies of controlled access data after controlled access privileges expire.

… to use only secure transfer protocols, e.g. https and sftp.

… to encrypt Controlled Access Data in transfers and storage.
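The transfer-protocol commitment above can be enforced with a simple guard before any download is attempted. The sketch below assumes the allowed set is exactly the two protocols the slide gives as examples (https and sftp); the slide says "e.g.", so a real policy may permit others. The function name and the example URLs other than the portal address are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: only the two protocols named on the slide are treated as secure.
SECURE_SCHEMES = {"https", "sftp"}


def uses_secure_protocol(url: str) -> bool:
    """Return True if the URL's scheme is one of the allowed secure protocols."""
    return urlparse(url).scheme.lower() in SECURE_SCHEMES


print(uses_secure_protocol("https://dcc.icgc.org/repository"))  # True
print(uses_secure_protocol("http://dcc.icgc.org/"))             # False
print(uses_secure_protocol("sftp://example.org/data.bam"))      # True
```

A check like this covers only the transfer step; encryption at rest and access monitoring, the other commitments listed, need separate controls.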

Page 66: Genentech icgc 2015


What does it mean for this file?

simple_somatic_mutation.aggregated.vcf.gz

https://dcc.icgc.org/repository/release_18/Summary
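For readers who have not opened a file like this, here is a minimal sketch of reading a gzip-compressed VCF with the Python standard library. It assumes only the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the two records and their IDs are made up for the example and do not come from the ICGC release, and the actual contents of the aggregated file are not described in the slides.

```python
import gzip
import io


def iter_vcf_records(handle):
    """Yield one dict per data line of a VCF text stream, skipping header lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, var_id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # ALT may list several alleles
            "info": info,
        }


# Made-up two-record VCF, compressed in memory so the example is self-contained.
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\tEXAMPLE_1\tC\tG\t.\t.\t.\n"
    "2\t200\tEXAMPLE_2\tA\tT,C\t.\t.\t.\n"
)
compressed = gzip.compress(vcf_text.encode("utf-8"))

with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as handle:
    records = list(iter_vcf_records(handle))

print(len(records))  # 2
```

To read a downloaded file instead, pass its path to `gzip.open` in place of the in-memory buffer.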

Page 67: Genentech icgc 2015

Page 68: Genentech icgc 2015

Identify yourself

Fill out a detailed form which includes:

• Contact and Project Information

• Information Technology details and procedures for keeping data secure

• Data Access Agreement

All of these documents are compiled into a PDF file that you print and have your institution sign on your behalf.

Page 69: Genentech icgc 2015

Page 70: Genentech icgc 2015

Page 71: Genentech icgc 2015

Page 72: Genentech icgc 2015

Page 73: Genentech icgc 2015

Page 74: Genentech icgc 2015

Page 75: Genentech icgc 2015

https://icgc.org/daco/approved-projects

173 groups 977 people

Page 76: Genentech icgc 2015

[Figure: data access routes. Labels: DACO; ICGC; TCGA; dbGaP; cgHUB; EGA; ERA; Open; BAM; WGS; EGA id & password]

Page 77: Genentech icgc 2015


Making sense of it all

1 project == 1 pipeline

Page 78: Genentech icgc 2015


Making sense of it all

55 projects == 55 pipelines

Page 79: Genentech icgc 2015


Making sense of it all

55 projects == 1 pipeline

Page 80: Genentech icgc 2015

PanCancer Analysis of Whole Genomes (PCAWG)

2,400 T/N pairs with clinical data, analyzed over 6 academic clouds

16 working groups, > 1000 scientists

1 alignment pipeline (10 months)

Data freeze 2 months ago

3 somatic mutation pipelines (2 more months?)

2 RNA-Seq pipelines (done)

Start writing papers in January 2016

Page 81: Genentech icgc 2015


From PCAWG we will have:

1st PANCANCER analysis on > 2,400 cancer tumours from a WGS perspective

RNA, SSM, CNV, Methylation analysis

Published (executable) pipelines

Docker https://github.com/docker/docker

Galaxy galaxyproject.org

Seqware http://seqware.github.io/

Method papers

Multiple cloud access to data

Multiple portal access to data

Page 82: Genentech icgc 2015

Other projects in planning

ICGC to finish in Spring of 2018

Planning for ICGC2

ICGC1: 25,000 tumours (DNA, RNA, Epigenome, Clinical data)

ICGC2 (planning): 250,000 tumours (DNA, RNA, Epigenome, Clinical trial) (1/2 million genomes)

ICGC1 was the picture, ICGC2 will be the movie (before and after treatment).

Trailers to come out in December, before Christmas

Submission system with one place for data and metadata

Tools/links directory portal
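The tumour counts on this slide can be checked against the collection target stated earlier in the talk (about 500 tumour/normal pairs from each of 50 cancer types). The short calculation below does that. The factor of two genomes per tumour is an assumption made here to reproduce the "1/2 million genomes" figure; the slide does not say whether the second genome is a matched normal or a second time point.

```python
# Figures taken from the slides.
CANCER_TYPES = 50
PAIRS_PER_TYPE = 500        # "~500 tumour/normal pairs" per cancer type
ICGC2_TUMOURS = 250_000

# Assumption (not stated on the slide): two genomes are sequenced per tumour.
GENOMES_PER_TUMOUR = 2

icgc1_tumours = CANCER_TYPES * PAIRS_PER_TYPE
icgc2_genomes = ICGC2_TUMOURS * GENOMES_PER_TUMOUR
scale_up = ICGC2_TUMOURS // icgc1_tumours

print(icgc1_tumours)  # 25000
print(icgc2_genomes)  # 500000
print(scale_up)       # 10
```

Under these figures, ICGC2 as planned would be a tenfold increase in tumours over ICGC1.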

Page 83: Genentech icgc 2015

Acknowledgments

ICGC/OICR Project leaders

Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings

DCC Software Developer

Vincent Ferretti, Daniel Chang, Anthony Cros, Jerry Lam, Brian O'Connor, Bob Tiernay, Stuart Watt, Shane Wilson, Junjun Zhang

ICGC DCC Biocuration

Hardeep Nahal, Marc Perry, Kevin Chen

Ouellette Lab

Michelle Brazas, Emilie Chautard, Nina Palikuca, Zhibin Lu

Web Dev

Joseph Yamada, Kamen Wu, Kim Cullion, Miyuki Fukuma

Research IT/Systems

David Sutton, Bob Gibson, Sam Maclennan, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood

EGA

Justin Paschall, Jeff Almeida-King, Ilkka Lappalainen, Jordi Rambla De Argila, Marc Sitges Puy

Genome Sequence Informatics (GSI)

Lars Jorgensen, Tim Beck, Tony DeBat, Larry Heissler, Xuemei (Mei) Luo, Michael Moorhouse, Yogi Sundaravadanam, Morgan Taschuk, Michael Laszloffy, Peter Ruzanov

… and all the patients and their families that are putting their hopes into our work!

http://oicr.on.ca

http://icgc.org

Page 84: Genentech icgc 2015


Informatics and Biocomputing at the OICR

Page 85: Genentech icgc 2015


http://icgc.org

http://dcc.icgc.org

http://docs.icgc.org

[email protected]

[email protected] @bffo