UCSC Cancer Browser Workshop


description

UCSC Cancer Browser Workshop. Mary Goldman, [email protected]. First: use Firefox or Chrome. Please do not use Internet Explorer; download Firefox or Chrome if you need to. Our browser does have some functionality on IE, but it is limited. Use Firefox or Chrome for our full feature set.

Transcript of UCSC Cancer Browser Workshop

Page 1: UCSC Cancer Browser Workshop
Page 2: UCSC Cancer Browser Workshop

UCSC Cancer Browser Workshop

Mary Goldman
[email protected]

Page 3: UCSC Cancer Browser Workshop

First: use Firefox or Chrome

Please do not use Internet Explorer
➔ Download Firefox or Chrome if you need to

Our browser does have some functionality on IE but it is limited. Use Firefox or Chrome for our full feature set.

Page 4: UCSC Cancer Browser Workshop

What is the Cancer Browser?

It is a tool to visually explore and analyze cancer genomics data and its associated clinical information.

https://genome-cancer.ucsc.edu/

Page 5: UCSC Cancer Browser Workshop

It can be used to:

● analyze data on the browser
● do proof-of-concept visualization to determine if more complicated analysis is worth performing
● visualize analysis results
  ○ for colleagues, papers, presentations, posters, etc.

Page 6: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 7: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 8: UCSC Cancer Browser Workshop
Page 9: UCSC Cancer Browser Workshop

Interactive tutorial highlights features of our browser

Page 10: UCSC Cancer Browser Workshop

Genomic data Clinical data

Page 11: UCSC Cancer Browser Workshop

Genomic data Clinical data

Samples

Genomic locations / Genes

Page 12: UCSC Cancer Browser Workshop

Genomic data Clinical data

Samples

Genomic locations / Genes

Page 13: UCSC Cancer Browser Workshop

Both clinical and genomic heatmaps sorted by left-most clinical feature and then subsorted on following features
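The sort-then-subsort behaviour described above can be sketched as a multi-key sort. This is only an illustration, not the browser's actual implementation: the sample IDs, feature names (`subtype`, `er_status`) and values are made up for the example.

```python
# Toy rows: each dict is one sample. Feature names and values are illustrative only.
samples = [
    {"sample": "S1", "subtype": "Basal", "er_status": "Negative"},
    {"sample": "S2", "subtype": "LumA", "er_status": "Positive"},
    {"sample": "S3", "subtype": "Basal", "er_status": "Positive"},
    {"sample": "S4", "subtype": "LumA", "er_status": "Negative"},
]

# Clinical features in display order, left to right.
feature_order = ["subtype", "er_status"]


def sort_samples(rows, features):
    """Sort by the first feature, breaking ties with each following feature."""
    return sorted(rows, key=lambda row: tuple(row[f] for f in features))


sorted_samples = sort_samples(samples, feature_order)

# The same sample order would then be applied to the rows of the genomic heatmap.
sample_order = [row["sample"] for row in sorted_samples]
print(sample_order)
```

Moving a different feature to the front of `feature_order` changes the primary sort, which mirrors dragging a clinical feature to the left-most position.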

Page 14: UCSC Cancer Browser Workshop

Red = amplification
Blue = deletion

Page 15: UCSC Cancer Browser Workshop

View data in summary modes (box plot or proportions)

Page 16: UCSC Cancer Browser Workshop

Also known as stacked bar graphs, proportions view shows the distribution of each column of data
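As a rough sketch of what a proportions view summarizes, the snippet below computes the fraction of samples in each category of one column. The function name and the toy column are assumptions for illustration; missing values are simply skipped here, which may differ from how the browser treats them.

```python
from collections import Counter


def proportions(column):
    """Return the fraction of non-missing values that fall in each category."""
    values = [v for v in column if v is not None]
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Toy column of values for one gene or clinical feature (illustrative only).
calls = [0, 1, 0, -1, 0, None, 1, 0]
fractions = proportions(calls)
print(fractions)
```

Each stacked bar would then be drawn with segment heights equal to these fractions.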

Page 17: UCSC Cancer Browser Workshop
Page 18: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 19: UCSC Cancer Browser Workshop

Data Sources
● TCGA
● TARGET and other pediatric cancer
● CCLE
● SU2C
● Connectivity Map

➢ 698 datasets including 526 public datasets
➢ 227,000 samples

Page 20: UCSC Cancer Browser Workshop

http://cancergenome.nih.gov/

Page 21: UCSC Cancer Browser Workshop

Level 3 data

All of the TCGA data we display are Level 3. Level 3 means:
● read-level data has been summarized to gene- and probe-level data
● no longer patient identifiable
● publicly available

Page 22: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation (gene-level)
● Protein expression
● Paradigm Pathway activity

Page 23: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm Pathway activity

Vaske,C.J., Benz,S.C., Sanborn,J.Z., Earl,D., Szeto,C., Zhu,J., Haussler,D. and Stuart,J.M. (2010) Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 26, i237-i245.

Page 24: UCSC Cancer Browser Workshop

Multidimensional data is challenging

Gene Expression

DNA Methylation

Copy Number Variation

Mutation

Page 25: UCSC Cancer Browser Workshop

Paradigm

● Infers patient-specific pathway activities using CNV and gene expression data

● Developed at UCSC

● Multiple datasets depending on what data was used to make the calls (e.g. RNAseq + CNV)

Vaske et al., 2010

Page 26: UCSC Cancer Browser Workshop

PANCAN12 Datasets

● TCGA formed an Analysis Working Group to look at genomics abnormalities across cancers

● 12 tumor types: breast cancer, ovarian cancer, GBM, ....

● CNV, expression, mutation, protein

Hoadley,K.A., Yau,C., Wolf,D.M., Cherniack,A.D., Tamborero,D., Ng,S., Leiserson,M.D.M., Niu,B., McLellan,M.D., Uzunangelov,V., et al. (2014) Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin. Cell, 158, 929–944.

Page 27: UCSC Cancer Browser Workshop

Pan-Cancer datasets

● We assembled these datasets
● 19 tumor types: breast cancer, ovarian cancer, GBM, melanoma, thyroid cancer, ....
● CNV, expression, mutation, paradigm

Page 28: UCSC Cancer Browser Workshop

Pan-Cancer mutations

Looking at the most frequently mutated genes in cancer, we can see across almost 4.5K samples that TP53 is by far the most mutated

Page 29: UCSC Cancer Browser Workshop

Pan-Cancer Normalized Gene expression
● Allows you to see differences in expression across all cancer types
● Combine Illumina RNAseq data from all TCGA cohorts
● Mean-normalized per gene

Page 30: UCSC Cancer Browser Workshop

https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/#?bookmark=c347fdabddde3e73d824caff1290a6a8

FOXM1 Pathway

FOXM1 Pathway

GBM

LGG

Page 31: UCSC Cancer Browser Workshop

TCGA Data Curation

● Map between patient, sample and omic IDs. Same ID on genomic and clinical matrices

● Curated overall and recurrence-free survival
● More easily readable clinical/phenotype data
● Matrix format can be downloaded for both genomic and phenotype/clinical data

Page 32: UCSC Cancer Browser Workshop

Non-TCGA Public Data

● TARGET and Childhood cancer

● Cell line data (CCLE, SU2C, Connectivity Map)

Page 33: UCSC Cancer Browser Workshop

TARGET and Childhood cancer

● TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. (AML and Neuroblastoma)

● Other cancer types, including some from the Pediatric Tumor Affymetrix Database

Page 34: UCSC Cancer Browser Workshop

Cell Line data

● CCLE: Genome-wide information of ~1000 cell lines under baseline condition. Pharmacologic response profiles (IC50) and mutation status analysis.

● SU2C: 50 Breast cancer cell lines. GI50 to 77 therapeutic compounds.

● Connectivity Map: 4 cell lines and 1309 perturbagens at several concentrations. Gene expression change after treatment.

Page 35: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 36: UCSC Cancer Browser Workshop

Browser Demo

Page 37: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 38: UCSC Cancer Browser Workshop

PAM50

● Breast cancer
● 4 major intrinsic subtypes: Luminal A, Luminal B, Her2-enriched, Basal
● Subtypes are clinically relevant for drug sensitivity and long-term survival
● Determine tumor subtype by looking at the gene expression of 50 genes

Page 39: UCSC Cancer Browser Workshop

Our Goals

● Look at the expression of these 50 genes and their relationship to the subtype calls

● Look at the survivorship of these different subtypes

● Make a bookmark to share with others

Page 40: UCSC Cancer Browser Workshop

Steps

1. Go to https://genome-cancer.ucsc.edu/
2. Open TCGA Breast Agilent dataset
3. Go to genes mode
4. Replace current geneset with the predefined PAM50 geneset from the Favorites menu.
5. Perform KM plot
6. Bookmark the view to share

Page 41: UCSC Cancer Browser Workshop

How to view Kaplan-Meier Plots

x-axis: Time

y-axis: Survival

Steep curve = Poor survival

worse survival

better survival
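For readers who want to see where such a curve comes from, here is a minimal sketch of the standard Kaplan-Meier estimator. It is not the browser's code, and the follow-up times and event flags are made-up toy values. At each time with at least one observed event, the running survival probability is multiplied by (1 - deaths / number still at risk); censored patients leave the at-risk set without causing a drop.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time where an event occurred.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (illustrative only): five patients, two of them censored.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

A group whose curve drops quickly (a steep curve) loses a large fraction of its at-risk patients at early time points, which is why steepness reads as poor survival.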

Page 42: UCSC Cancer Browser Workshop
Page 43: UCSC Cancer Browser Workshop

Initially Luminal AB have higher survival than Basal / HER2-enriched

Page 44: UCSC Cancer Browser Workshop

As patients age, Basal / HER2-enriched have higher survival than Luminal AB

Page 45: UCSC Cancer Browser Workshop

Bonus Question

We know that several of these tumor samples went through both Agilent and RNAseq analysis. We now want to see if the gene expression patterns we're seeing for these 50 genes are Agilent-specific or if they are cross-platform.
➢ How do you do this?

Page 46: UCSC Cancer Browser Workshop

More information: PAM50
● Tumors can instead be classified by hormone cell surface receptors --> ER, PR and HER2
● Patients who have at least one of these cell surface receptors tend to respond to traditional hormone therapy

● Patients who are triple negative (negative for all 3 cell surface receptors) typically do not respond and have a poor prognosis

Page 47: UCSC Cancer Browser Workshop

Our Goals

● Examine relationship between these two subtyping methods

● Examine survivorship of triple negative patients compared with other patients

Page 48: UCSC Cancer Browser Workshop

But!

There is no 'triple negative' classification in the browser.

➔ We will need to create this classification and load it back into the browser

Page 49: UCSC Cancer Browser Workshop

Steps
1. Download clinical data in view
2. Open clinical data in Excel or other spreadsheet program

Page 50: UCSC Cancer Browser Workshop

Our Goal for Excel

Create a column next to the Sample ID column, where if the sample is triple negative it will be "1". Otherwise it will be "0".

➔ https://genome-cancer.ucsc.edu/download/public/BRCA_modified_clinical.xls
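The same column can also be built outside a spreadsheet. The sketch below is one possible way to do it, under stated assumptions: the column names (`sampleID`, `ER_status`, `PR_status`, `HER2_status`) and the value "Negative" are hypothetical and would need to match whatever the downloaded clinical file actually uses. Samples with a missing receptor status are flagged "0" here, which may not be what you want for a real analysis.

```python
# Hypothetical column names; adjust to match the downloaded clinical data.
RECEPTOR_COLUMNS = ("ER_status", "PR_status", "HER2_status")


def triple_negative_flag(row):
    """Return "1" if ER, PR and HER2 are all negative, otherwise "0"."""
    is_triple_negative = all(row.get(col) == "Negative" for col in RECEPTOR_COLUMNS)
    return "1" if is_triple_negative else "0"


# Toy rows standing in for the downloaded clinical table (illustrative only).
rows = [
    {"sampleID": "sample-A", "ER_status": "Negative", "PR_status": "Negative", "HER2_status": "Negative"},
    {"sampleID": "sample-B", "ER_status": "Positive", "PR_status": "Negative", "HER2_status": "Negative"},
    {"sampleID": "sample-C", "ER_status": "Negative", "PR_status": "Negative", "HER2_status": None},
]

for row in rows:
    row["triple_negative"] = triple_negative_flag(row)

flags = {row["sampleID"]: row["triple_negative"] for row in rows}
print(flags)
```

The resulting two-column table (sample ID plus the new flag) is what would be uploaded back as a new clinical feature.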

Page 51: UCSC Cancer Browser Workshop

Last Steps:
https://genome-cancer.ucsc.edu/download/public/BRCA_modified_clinical.xls
3. Upload new column
4. Perform KM plot, see differences in survival between triple negative and non-triple negative patients

Page 52: UCSC Cancer Browser Workshop

Outline

● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example

Page 53: UCSC Cancer Browser Workshop

Telomeres and nervous system cancers

Olena Morozova

Page 54: UCSC Cancer Browser Workshop

TCGA Lower Grade Glioma

TARGET Neuroblastoma

TCGA GBM

Page 55: UCSC Cancer Browser Workshop

ATRX mutation frequency is high in TCGA Lower Grade Glioma and TARGET childhood neuroblastoma, but not in TCGA GBM.

ATRX

TCGA Lower Grade Glioma

TARGET Neuroblastoma

TCGA GBM

Page 56: UCSC Cancer Browser Workshop

ALT and ATRX
● ATRX affects chromatin remodeling and methylation patterns across the genome

● Loss of ATRX is associated with alternative lengthening of telomeres (ALT)

Page 57: UCSC Cancer Browser Workshop

Telomeres
● Repeating sequences at end of chromosomes
● Shorten due to cell replication
● Extended by telomerase in germline cells
● If the telomeres get too short, the cell undergoes apoptosis and dies
● Cancer cells lengthen telomeres as a way to avoid cell death

Page 58: UCSC Cancer Browser Workshop

http://www.sciencemag.org/content/336/6087/1388.full

Page 59: UCSC Cancer Browser Workshop

Hypothesis: Low expression of ATRX leads to ALT

ATRX Alternative lengthening of telomeres

Telomeres lengthened (avoids cell death)

ATRX Mutation

Page 60: UCSC Cancer Browser Workshop

ATRX Alternative lengthening of telomeres

Telomeres lengthened (avoids cell death)

ATRX Mutation

➢ What about the telomerase pathway?

Hypothesis: Low expression of ATRX leads to ALT

Page 61: UCSC Cancer Browser Workshop

Telomerase and TERT

● TERT protein is a subunit of telomerase

● High expression lengthens telomeres

Page 62: UCSC Cancer Browser Workshop

TERT

Use gene expression to infer telomere lengthening method

Increased telomerase activity

Telomeres lengthened (avoids cell death)

ATRX Alternative lengthening of telomeres

Telomeres lengthened (avoids cell death)

Page 63: UCSC Cancer Browser Workshop

Our Goals
● Examine how ATRX mutation frequency relates to ATRX expression
  ○ Does a mutation in ATRX lead to lower expression?
● Examine how ATRX expression and TERT expression relate to each other
  ○ What is the relationship between these two telomere lengthening methods?

Page 64: UCSC Cancer Browser Workshop

On the current Cancer Browser

Page 65: UCSC Cancer Browser Workshop

On the current Cancer Browser

➔ Create a new spreadsheet view

Page 66: UCSC Cancer Browser Workshop

Xena

View multiple types of data together in a large spreadsheet
View mutation position

Page 67: UCSC Cancer Browser Workshop

Xena

https://genome-cancer.ucsc.edu/proj/site/hgHeatmap-cavm/
http://tinyurl.com/op8qk3g

➔ Please use Chrome if you have it

Page 68: UCSC Cancer Browser Workshop

Xena demo

Page 69: UCSC Cancer Browser Workshop

ATRX

TERT

2 pathways to same result

NO Alternative lengthening of telomeres

Increased telomerase activity

Telomeres lengthened (avoids cell death)

ATRX

TERT

Alternative lengthening of telomeres

Controlled telomerase no lengthening

OR

Telomeres lengthened (avoids cell death)

Page 70: UCSC Cancer Browser Workshop

Summary: ATRX/TERT
● ATRX mutations are associated with low ATRX expression
● ATRX and TERT expression is positively related -> one or the other pathway is being activated to lengthen telomeres
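One simple way to quantify a "positively related" pair of expression columns is a Pearson correlation coefficient. The sketch below is an illustration only: the expression values are made up, and the workshop itself reads the relationship off the Xena spreadsheet view rather than computing a statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Toy per-sample expression values (illustrative only, not TCGA data).
atrx_expression = [1.0, 2.0, 3.0, 4.0]
tert_expression = [0.5, 1.5, 2.0, 3.5]

r = pearson(atrx_expression, tert_expression)
print(r)
```

A value near +1 means samples with low ATRX also tend to have low TERT, and samples with high ATRX tend to have high TERT, which is the pattern the summary describes.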

Page 71: UCSC Cancer Browser Workshop

More: TERT promoter mutations

Looked at mutations in the promoter region of TERT and found that many samples had mutations
Marked each sample as TERT promoter wildtype or mutant

Page 72: UCSC Cancer Browser Workshop

Our Goals

Examine Olena's TERT promoter calls in the context of other data from TCGA LGG

Is there a relationship between TERT promoter mutations and TERT expression?

Page 73: UCSC Cancer Browser Workshop

Xena

View multiple types of data together in a large spreadsheet
View mutation position
Securely and easily view:
  your own annotations

Page 74: UCSC Cancer Browser Workshop

Xena demo 2

Page 75: UCSC Cancer Browser Workshop

Summary TERT promoter mutations

TERT promoter mutations are associated with increased TERT expression
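A quick numerical check of this kind of association is to compare mean expression between the two annotation groups. The following is a sketch with made-up values: the labels "wildtype" and "mutant" and the expression numbers are assumptions for illustration, not the actual promoter calls or TCGA LGG data.

```python
from statistics import mean


def mean_by_group(values, groups):
    """Average `values` separately for each label in `groups`."""
    grouped = {}
    for value, group in zip(values, groups):
        grouped.setdefault(group, []).append(value)
    return {group: mean(vals) for group, vals in grouped.items()}


# Toy data (illustrative only): one expression value and one promoter call per sample.
tert_expression = [0.2, 1.8, 2.1, 0.4, 1.5]
promoter_status = ["wildtype", "mutant", "mutant", "wildtype", "mutant"]

group_means = mean_by_group(tert_expression, promoter_status)
print(group_means)
```

A higher mean in the mutant group would be consistent with the summary above; a real analysis would also want a significance test and a look at the full distributions.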

Page 76: UCSC Cancer Browser Workshop

Xena

View multiple types of data together in a large spreadsheet
View mutation position, including 3D structure
Securely and easily view:
  your own annotations

Page 77: UCSC Cancer Browser Workshop

Xena

View multiple types of data together in a large spreadsheet
View mutation position, including 3D structure
Securely and easily view:
  your own annotations
  your own cohort of data

Analyze data in Galaxy

Page 78: UCSC Cancer Browser Workshop

Galaxy Analysis Tools

Users continually asking for more and more analysis tools
● keeping up with demand is impossible

--> Integrate with Galaxy to provide users with a huge range of tools

Page 79: UCSC Cancer Browser Workshop

Galaxy
● Large tool workshed
● Import our data, analyze, and then visualize on our browser
● Galaxy keeps track of analysis done so that you can reproduce it later
● Can currently import and export from Cancer Browser and Xena

Page 80: UCSC Cancer Browser Workshop

Future Xena

Composite cohorts
More data (COSMIC, ICGC, LINCS)
Make it easier to view own data

Page 81: UCSC Cancer Browser Workshop

With your laptop Xena you could ...

● View your own genomic, clinical/phenotype or mutation data.

● View your annotations on TCGA data
● Perform analysis in Galaxy

--> Click on 'Help' in the top menu bar to get started.

Page 82: UCSC Cancer Browser Workshop

The End

Page 83: UCSC Cancer Browser Workshop

Acknowledgements

Brian Craft
Teresa Swatloski
Jingchun Zhu
Melissa Cline
Olena Morozova
Sofie Salama
Maximilian Haeussler
Erich Weiler
Joshua Stuart
David Haussler

[email protected]

Page 84: UCSC Cancer Browser Workshop
Page 85: UCSC Cancer Browser Workshop
Page 86: UCSC Cancer Browser Workshop

Normalization for visualization

● All normalization we've talked about is on the data
  ○ We also do some normalization for visualization only
● Does not affect underlying data
● Subtract the mean of each genomic location
● Automatically turned on for all transcription datasets except for pancan normalized datasets
● Can be turned off and on as desired

Page 87: UCSC Cancer Browser Workshop

Without normalization

Everything is red because, for RNAseq, all values are above zero

Page 88: UCSC Cancer Browser Workshop

With normalization

By subtracting out the mean, we can see places in the genome that are relatively under- or over-expressed compared to other samples
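The mean subtraction can be sketched in a few lines. This is an illustration under assumptions, not the browser's code: the gene names and values are toy data, the matrix is represented as a dict of gene to per-sample values, and missing values (`None`) are left out of the mean.

```python
def subtract_row_means(matrix):
    """Center each gene (row) on zero by subtracting its mean across samples.

    matrix -- dict mapping gene name to a list of values, one per sample;
              None marks a missing value and is ignored when computing the mean.
    Returns a new dict; the input (the underlying data) is not modified.
    """
    centered = {}
    for gene, values in matrix.items():
        present = [v for v in values if v is not None]
        row_mean = sum(present) / len(present) if present else 0.0
        centered[gene] = [None if v is None else v - row_mean for v in values]
    return centered


# Toy log-scale RNAseq values (illustrative only): all positive before centering.
expression = {
    "GENE_A": [10.0, 12.0, 14.0],
    "GENE_B": [3.0, 3.0, 6.0],
}
centered = subtract_row_means(expression)
print(centered)
```

After centering, values below the gene's mean become negative and values above it become positive, so a diverging color scale can show relative under- and over-expression.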

Page 89: UCSC Cancer Browser Workshop

The Cancer Genome Atlas (TCGA)

● Comprehensive study of 20+ cancer types
● Bulk of our data in the browser
● Typically studies only obtain a few types of genomic data (e.g. only mutation)
● TCGA aims to obtain as many different types of genomic data about one tumor as possible
  ○ It's a comprehensive resource

Page 90: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm

Page 91: UCSC Cancer Browser Workshop

Copy Number Variation (CNV)
● 2 processing methods: CBS or Gistic2
● Circular binary segmentation (CBS) determines which pieces of DNA were amplified/deleted based on SNP array results
  ○ 2 datasets
  ○ One dataset has germline CNV removed by Broad
  ○ We don't display normal samples
● Gistic2 generates gene level CNV estimates
  ○ 3 datasets

Page 92: UCSC Cancer Browser Workshop

Gistic2 Copy Number Variation Calls
● Called by Firehose, an analysis pipeline
● Separates arm-level and focal alterations (short segments) based on segment length before predicting overall CNV
● GISTIC2 focal: focal alterations only
  ○ TCGA doesn't give arm-level alterations only

● GISTIC2 thresholded: data has been thresholded to -2,-1,0,1,2, representing homozygous deletion, single copy deletion, diploid normal copy, low copy number amplification, or high copy number amplification.
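When working with a downloaded thresholded matrix, the five codes can be decoded with a small lookup. The mapping below follows the list on this slide; the function name and the tolerance for values stored as strings or floats are assumptions for illustration.

```python
# Meaning of each GISTIC2 thresholded value, as listed on the slide.
GISTIC2_THRESHOLDED_LABELS = {
    -2: "homozygous deletion",
    -1: "single copy deletion",
    0: "diploid normal copy",
    1: "low copy number amplification",
    2: "high copy number amplification",
}


def describe_call(value):
    """Translate a thresholded value (int, float or numeric string) into its label."""
    code = int(float(value))
    if code not in GISTIC2_THRESHOLDED_LABELS:
        raise ValueError(f"not a GISTIC2 thresholded value: {value!r}")
    return GISTIC2_THRESHOLDED_LABELS[code]


print(describe_call(-2))
print(describe_call("1"))
```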

Page 93: UCSC Cancer Browser Workshop

Ovarian Serous Cystadenocarcinoma

Glioblastoma Multiforme

segmented copy number (delete germline cnv)

Page 94: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm

Page 95: UCSC Cancer Browser Workshop

DNA methylation
● 2 platforms: 27K and 450K. 90% of 27K in 450K.
● DNA methylation beta values range between 0 (hypomethylated) and 1 (hypermethylated). Bimodal distribution with peaks at 0.1 and 0.9
● Beta values were offset by 0.5 (new range: -0.5 to 0.5)
● In 27K platform, the average of the unshifted beta values is 0.26, thus much of the heatmap appears hypomethylated (blue). In 450K platform, the average is around 0.5.
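The offset is a plain shift of each beta value, which is why a platform average of 0.26 lands on the negative side of the color scale. A minimal sketch, with a hypothetical function name:

```python
def offset_beta(beta):
    """Shift a methylation beta value from the range [0, 1] to [-0.5, 0.5]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"beta value out of range: {beta!r}")
    return beta - 0.5


# The 27K platform average quoted on the slide, before and after the shift.
average_27k = 0.26
print(offset_beta(average_27k))
```

To recover the original beta value from a displayed value, add 0.5 back.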

Page 96: UCSC Cancer Browser Workshop

27k

450k

Page 97: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm

Page 98: UCSC Cancer Browser Workshop

Gene expression
● Microarrays - Agilent and Affy
● RNAseq - 2 Illumina sequencers
  ○ Most use RSEM to estimate gene-level transcription
  ○ We log2 transformed the data to normalize the distribution

Page 99: UCSC Cancer Browser Workshop

Exon Expression

● Illumina RNAseq
● Exon-level transcription estimates, as in RPKM values (Reads Per Kilobase of exon model per Million mapped reads)

● We log2 transformed the data to normalize the distribution
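To make the two quantities concrete, here is a small sketch of the RPKM definition spelled out in the acronym, followed by a log2 transform. The read counts are made-up numbers. The pseudocount of 1 added before taking the log is an assumption of this sketch (it keeps zero values defined); the slide only states that the data were log2 transformed.

```python
from math import log2


def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return exon_reads / (kilobases * millions_of_reads)


def log2_transform(value, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount is an assumption, not from the slide."""
    return log2(value + pseudocount)


# Toy numbers (illustrative only): 500 reads on a 1,000 bp exon, 10 million mapped reads.
value = rpkm(exon_reads=500, exon_length_bp=1_000, total_mapped_reads=10_000_000)
print(value, log2_transform(value))
```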

Page 100: UCSC Cancer Browser Workshop

GBM gene-level Illumina HiSeq. We can see a clear correlation in the expression of the genes to the subtypes called in the first clinical feature

Glioblastoma: gene expression subtypes

https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/#?bookmark=6e2048a6ef5cd04fe7c336e9f270c106

Page 101: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm

Page 102: UCSC Cancer Browser Workshop

Somatic Mutation

● High level view of mutations across genome
● Mutation calls from TCGA pan-cancer analysis
● If there is a non-silent mutation in a coding gene or any mutation in a non-coding gene, then we mark the entire gene as being mutated
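The gene-level rule in the last bullet can be written as a short predicate. This is a sketch of the stated rule only: the classification labels ("Silent", "Missense_Mutation") are assumed to follow MAF-style naming, and the function name is hypothetical.

```python
SILENT = "Silent"  # assumed label for a silent mutation (MAF-style naming)


def gene_is_mutated(mutation_classes, is_coding_gene):
    """Apply the slide's rule to the mutations found in one gene for one sample.

    mutation_classes -- list of classification labels for the mutations in the gene
    is_coding_gene   -- True for a protein-coding gene, False for a non-coding gene
    """
    if is_coding_gene:
        # Coding gene: only non-silent mutations count.
        return any(cls != SILENT for cls in mutation_classes)
    # Non-coding gene: any mutation counts.
    return len(mutation_classes) > 0


print(gene_is_mutated(["Silent"], is_coding_gene=True))
print(gene_is_mutated(["Silent", "Missense_Mutation"], is_coding_gene=True))
print(gene_is_mutated(["Silent"], is_coding_gene=False))
```

Applying this per gene and per sample gives the binary gene-by-sample matrix that the browser displays.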

Page 103: UCSC Cancer Browser Workshop

Somatic Mutation

Page 104: UCSC Cancer Browser Workshop

Somatic Mutation

● View in genes mode since calls are per gene
● View in proportions to get a feel for what proportion of the samples have a mutation in a particular gene

Page 105: UCSC Cancer Browser Workshop

TCGA Data types

● Copy Number Variation
● DNA Methylation
● Gene and exon expression
● Somatic mutation
● Protein expression
● Paradigm

Page 106: UCSC Cancer Browser Workshop

Protein

● Reverse Phase Protein Array (RPPA)
● 200 antibodies. Most antibodies are for phosphorylated protein level. Some are for total protein.
● Include kinases, cell surface receptors, etc.
● RBN (replicate-based normalization)
  ○ RBN allows you to combine datasets from multiple RPPA runs

Page 107: UCSC Cancer Browser Workshop

Future Xena Data

COSMIC (Catalogue Of Somatic Mutations In Cancer)

● 947,213 Samples with 1,592,109 Mutations

● January 2014

Page 108: UCSC Cancer Browser Workshop

Future Xena Data

LINCS (Library of Integrated Network-based Signatures)
● 42,532 perturbations for 15 cell lines
● April 2014

Page 109: UCSC Cancer Browser Workshop

Outline
● Quick overview of the browser
● Overview of our data (TCGA + more)
● How to use the browser
● Breast cancer PAM50 example
● Lower Grade Glioma Telomere example
● Lower Grade Glioma IDH1 example

Page 110: UCSC Cancer Browser Workshop

TCGA Lower Grade Glioma

● Large survivorship difference within the LGG
● Can we find subtypes within LGG that are predictive of survivorship?
● Ran clustering algorithm on DNA, RNA and methylation data

TCGA (2014) Comprehensive and Integrative Genomic Characterization of Diffuse Lower Grade Gliomas, N. Engl. J. Med., In press.

Page 111: UCSC Cancer Browser Workshop

Clustering Results

● Found IDH1 mutation status to be predictive of survival

● Also correlated with EGFR and PTEN copy number status

TCGA (2014) Comprehensive and Integrative Genomic Characterization of Diffuse Lower Grade Gliomas, N. Engl. J. Med., In press.

Page 112: UCSC Cancer Browser Workshop

Goals

Start up a fresh browser session
View IDH1 mutation status as well as EGFR and PTEN copy number status together to see how they relate to one another.

Page 113: UCSC Cancer Browser Workshop

Steps

1. Open TCGA Lower Grade Glioma cohort
2. Open LGG mutation (broad automated) --> IDH1
3. Open LGG copy number (delete germline cnv) --> EGFR, PTEN

Page 114: UCSC Cancer Browser Workshop

Xena demo 3