Genomic DNA Variation

Computer-Aided Discovery Methods

Baylor College of Medicine course 311-405, Term 3, 2008/2009

Lecture on Wednesday, January 28th, 2009

Aleksandar Milosavljevic, PhD

http://www.brl.bcm.tmc.edu

Entering Segment 2

Segment 1 (3 weeks): Cancer Lectures (1,2,3) Lab: Genboree, Ruby

Segment 2 (4 weeks): Bringing it together: Lecture+Lab

Segment 3: Review lectures

Background reading

A broad-brush survey of trends: CREATIVITY SUPPORT TOOLS: Accelerating Discovery and Innovation, Ben Shneiderman

A bit of history and pointers to philosophy (Karl Popper, C.S. Peirce): THINKING WITH MACHINES: Intelligence Augmentation, Evolutionary Epistemology, and Semiotic, Peter Skagestad

Cancer Genome Variation: Methods

Recent landmark studies (not covered this year):

The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008 Sep 4. [Epub ahead of print].

Parsons DW et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008 Sep 26;321(5897):1807-12. Epub 2008 Sep 4.

Jones S et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008 Sep 26;321(5897):1801-6. Epub 2008 Sep 4.

Cancer Genome Variation: Methods

Lecture focus today

Lab focus (Friday)

Chromosome Aberrations: References 1 of 2

Background (optional)

[Balmain 2001] Balmain, A., Cancer genetics: from Boveri and Mendel to microarrays. Nat Rev Cancer, 2001. 1(1): p. 77-82.

[Albertson et al. 2003] Albertson, D.G., et al., Chromosome aberrations in solid tumors. Nat Genet, 2003. 34(4): p. 369-76.

[Rabbitts et al. 2003] Rabbitts, T.H. and M.R. Stocks, Chromosomal translocation products engender new intracellular therapeutic technologies. Nat Med, 2003. 9(4): p. 383-6.

Chromosome Aberrations: References 2 of 2

Breast cancer – copy number variation, array CGH and gene expression

[Chin K. et al. 2006] Chin K et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell 10:529-541 2006

Prostate cancer – aberrant fusions – via gene expression

[Tomlins et al. 2005] Tomlins SA et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Breast cancer – direct detection of aberrant fusions by end-sequence profiling

[Hampton OA et al] Hampton, OA et al, A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Research. 2008 Dec 9. [Epub ahead of print]

Boveri, one century ago …

Multiple cell poles cause unequal segregation of chromosomes.

a | Fertilization of sea-urchin eggs by two sperm results in multiple cell poles.

b | Chromosomes are aberrantly segregated

[Balmain 2001]

Chromosomal aberrations

[Albertson et al.]

Cancer Genome Variation: Methods

(Array) Comparative Genomic Hybridization (array CGH)

Chin K. et al., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell 10:529-541 2006.

• 100+ aggressively treated early stage breast tumors

1989-1997, before ERBB2 antagonist Trastuzumab (Herceptin) was approved for treating ERBB2+ breast cancer

ERBB2 heuristic (“paradigm”) formulated in last sentence of Chin K. et al.

“Taking ERBB2 as the paradigm (recurrently amplified, overexpressed, associated with outcome and with demonstrated functional importance in cancer) suggests FGFR1, TACC1, ADAM9, IKBKB, PNMT, and GRB7 as high-priority therapeutic targets in these regions of amplification.”

“Taking ERBB2 as the paradigm (recurrently amplified, overexpressed…

Array CGH (~3K BAC array)

Gene expression (Affymetrix U133A array)

“Taking ERBB2 as the paradigm (recurrently amplified…

“Taking ERBB2 as the paradigm (… associated with outcome…)
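
A minimal Python sketch of how the ERBB2 paradigm could be applied as a filter over candidate genes. Only the four criteria come from the quoted sentence; the record fields, the thresholds, and the gene names below are hypothetical and are not values from Chin K. et al.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Per-gene evidence summary (all fields are illustrative assumptions)."""
    name: str
    amplified_fraction: float   # fraction of tumors with amplification (array CGH)
    cn_expr_correlation: float  # copy number vs. expression correlation
    outcome_p_value: float      # p-value for association with outcome
    functional_evidence: bool   # demonstrated functional importance in cancer


def follows_erbb2_paradigm(gene, min_fraction=0.05, min_corr=0.5, max_p=0.05):
    """Apply the four criteria; the default thresholds are assumptions."""
    recurrently_amplified = gene.amplified_fraction >= min_fraction
    overexpressed = gene.cn_expr_correlation >= min_corr
    associated_with_outcome = gene.outcome_p_value <= max_p
    return (recurrently_amplified and overexpressed
            and associated_with_outcome and gene.functional_evidence)


# Made-up candidates, for illustration only.
candidates = [
    GeneEvidence("GENE_A", 0.20, 0.80, 0.01, True),
    GeneEvidence("GENE_B", 0.20, 0.10, 0.01, True),   # amplified, not overexpressed
    GeneEvidence("GENE_C", 0.01, 0.90, 0.01, True),   # not recurrently amplified
]
prioritized = [g.name for g in candidates if follows_erbb2_paradigm(g)]
```

With these made-up inputs only GENE_A passes all four criteria.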

Deletions, amplifications induce aberrant fusions

…but…

Some aberrant fusion-producing rearrangements (reciprocal translocations, inversions) may not affect copy number

Mapping rearrangements (aberrant fusions) using paired ends

Two significant types of aberrant fusions

[Rabbitts et al.]

• Aberrantly amplified expression

• Aberrant activation of signaling protein

BCR-ABL fusion in Chronic Myeloid Leukaemia: four decades from lesion discovery to Imatinib (Gleevec)

1960: Philadelphia chromosome discovered

1973: Chromosome translocation t(9;22) identified

1983: Activated oncogene ABL identified

2001: Drug inhibiting BCR-ABL fusion identified

Fourfold significance of recurrent chromosomal aberrations

Prognostic Marker

Drug target

Pointing to biological pathway

Early diagnostic marker

Two case studies of fusion discovery

Case Study: Prostate Cancer. Overexpression → recurrent chromosomal aberration

[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Case Study: Breast Cancer. Direct discovery of submicroscopic chromosomal aberrations

[Hampton OA et al] Hampton, OA et al. A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Research. 2008 Dec 9. [Epub ahead of print]

Case Study: Prostate Cancer

Recurrent (> 50% of cases) chromosomal aberrations discovered in leukaemias, lymphomas, and sarcomas

Carcinomas more complex:

• more rearrangements

• submicroscopic structure

Gene overexpression → recurrent chromosomal aberration present in > 50% of prostate carcinomas

[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Cancer Outlier Profile Analysis (COPA) using the Oncomine database reveals overexpression of ETV1 and ERG

[Tomlins et al.]
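
A minimal sketch of the COPA transformation as it is usually described: median-center each gene's expression values, scale by the median absolute deviation (MAD), then score the gene by a high percentile of the transformed values. The nearest-rank percentile, the function names, and the example values are assumptions made for illustration; this is not the Oncomine implementation.

```python
import math
from statistics import median


def copa_transform(values):
    """Median-center one gene's expression values and scale them by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the gene shows no spread across samples")
    return [(v - med) / mad for v in values]


def copa_score(values, q=90):
    """Nearest-rank q-th percentile of the transformed values (q is an integer)."""
    transformed = sorted(copa_transform(values))
    rank = max(1, math.ceil(q * len(transformed) / 100))
    return transformed[rank - 1]


# Made-up expression values: two outlier samples versus a flat profile.
outlier_gene = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 9.0, 10.0]
flat_gene = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.05, 0.95]
ranked = sorted(
    {"outlier_gene": outlier_gene, "flat_gene": flat_gene}.items(),
    key=lambda item: copa_score(item[1]),
    reverse=True,
)
```

Because the median and MAD ignore a minority of extreme samples, a gene overexpressed in only a subset of cases keeps a large score at the chosen percentile instead of being averaged away.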

Frequent gene amplifications and losses in receptor tyrosine kinase-mediated signaling

ETV1

ERG

Recurrent TMPRSS2:ETV1 and TMPRSS2:ERG fusions revealed by the study of rearrangements involving ETV1 and ERG

Expression of TMPRSS2 is regulated by androgen

[Tomlins et al.]

Exclusivity of rearrangement: either ETV1 or ERG

[Tomlins et al.]

TMPRSS2 translocation associated with:

• Aggressive disease

Cancer Res 66:8347-51, 2006

• Reduced disease free survival

Cancer Biol Ther 6, 2007

• Higher rate of prostate cancer specific death

TMPRSS2:ERG gene fusion associated with lethal prostate cancer in a watchful waiting cohort. Oncogene, 2007

Direct discovery of submicroscopic chromosomal aberrations by end-sequence profiling

Detecting breakpoints / fusions by end-sequence profiling of genomic DNA fragments

Human Chr 20

Human Chr 3

Cancer chromosome

Paired-end shotgun sequencing

Spectral Karyotyping (SKY) of MCF-7 breast cancer cell line

• Near triploid

• Translocations involve all chromosomes except 4

Davidson et al (2000) Br J Cancer 83, 1309-17

Current model for origin of rearrangements in breast cancer:

Breakage-Fusion-Bridge (BFB) cycles initiated by “sticky” telomere ends

Figure 10.14a The Biology of Cancer (© Garland Science 2007)

Figure 10.14b The Biology of Cancer (© Garland Science 2007)

Figure 10.14c The Biology of Cancer (© Garland Science 2007)

Genome instability occurs during transition from hyperplasia to carcinoma in situ

End-sequence profiling of cancer

First genome-wide End-Sequence Profile of cancer: MCF-7 breast cancer cell line (Volik et al, 2003 & 2006)

~20,000 BAC ends sequenced by Sanger method

~1X genome coverage

MCF-7 BAC (~150Kb)

chromosome 17

chromosome 20

Left Tag Right Tag

Whole-genome BAC-end sequencing of MCF-7 (Volik et al. 2006):

1) ~20,000 MCF-7 BACs end-sequenced

2) end-sequences mapped onto reference genome

Intrachromosomally rearranged BACs

Interchromosomally rearranged BACs
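
A minimal sketch of how mapped end pairs could be sorted into the categories above. The `EndHit` record, the size window, and the orientation rule (forward-strand end upstream of the reverse-strand end) are assumptions chosen for illustration, not the criteria used by Volik et al.

```python
from typing import NamedTuple


class EndHit(NamedTuple):
    """Where one end-sequence maps on the reference genome (hypothetical record)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def classify_clone(left, right, min_size=50_000, max_size=300_000):
    """Classify one clone from the mapping of its two end-sequences.

    The size window is an assumed tolerance around a ~150 Kb BAC insert.
    """
    if left.chrom != right.chrom:
        return "interchromosomal"
    first, second = sorted((left, right), key=lambda hit: hit.pos)
    span = second.pos - first.pos
    convergent = first.strand == "+" and second.strand == "-"
    if not convergent or not (min_size <= span <= max_size):
        return "intrachromosomal"
    return "concordant"


# Arbitrary coordinates, for illustration only.
spanning = classify_clone(EndHit("chr17", 1_000_000, "+"),
                          EndHit("chr20", 5_000_000, "-"))
normal = classify_clone(EndHit("chr1", 1_000_000, "+"),
                        EndHit("chr1", 1_150_000, "-"))
```

Only clones that are not concordant are candidates for spanning a rearrangement; in practice repeats and mapping errors would also have to be filtered out.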

Rearrangement-spanning MCF-7 BACs

~ 600 BACs contain rearrangements (Volik et al. 2006)

~ 2.5 % of the human genome

[Figure: BAC pooling and sequencing layout]

• 454 PyroSeq Run 1: 96-BAC Pools 1 and 2 (Fosmid Libraries A and B)

• 454 PyroSeq Run 2: 96-BAC Pools 3 and 4 (Fosmid Libraries C and D)

• 454 PyroSeq Run 3: 96-BAC Pools 5 and 6 (Fosmid Libraries E and F)

• 8-10K fosmid clones selected from each library for end sequencing (Sanger)

569 non-redundant rearranged BACs

Volik et al, 2003 & 2006

Down to the basepair level: Sequencing of Rearrangement-spanning MCF-7 BACs

Hampton OA et al.

Bridging (FES) and Outlining (454 PyroSeq)

[Figure labels: BAC (134 Kb); fosmids (40 Kb); PyroSeq reads on chromosomes 3, 17, and 20]

PCR Validation Pipeline and Genboree integration

Hampton OA et al.

157 PCR-confirmed somatic breakpoint junctions

Hampton OA et al.

Genomic Aberrations in MCF-7

[Figure labels: chromosomes 1, 3, 17, and 20]

157 rearrangements

• detected in BACs

• PCR-validated on gDNA

83 Intrachromosomal

74 Interchromosomal

Hampton OA et al.

A majority of dispersed breakpoints fall within LCRs

Hampton OA et al.

Detection of Fusion Transcripts

Transcript RT-PCR to validate expression of predicted fusion transcripts

[Figure: RAD51C / ATXN7 genomic fusion and predicted transcript (labels: promoter, exon 6, exon 7, exon 13); RT-PCR of the fusion, RAD51C, and ATXN7 in MCF7, 10A, and N samples]

Hampton OA et al.

Expression of predicted fusion transcripts

Validation by siRNA knock-down

Hampton OA et al.

Biological validation: siRNA knock-down of SULF2 in 3 cell lines

growth

survival

anchorage-independent growth

Hampton OA et al.

Expression of predicted fusion transcripts

Hampton OA et al.

Two Mechanisms for Double-Strand Break Repair

NAHR:

Non-Allelic Homologous Recombination

NHEJ:

Non-Homologous End-Joining

Figure 12.32 The Biology of Cancer (© Garland Science 2007)

RAD51

RAD51C

Roles of RAD51 and RAD51C in HR

NAHR: Non-Allelic Homologous Recombination

RAD51C is under-expressed in 51 out of 53 breast cancer cell lines relative to normal breast tissue

[Figure: RAD51C expression level by cell line (600MPE through ZR75B), with normal breast and MCF-7 marked]

RAD51C under-expression is cancer specific, not tissue specific

Does the RAD51C / ATXN7 fusion

• interfere with resolution of Holliday junctions or

• otherwise affect HR

in a dominant negative fashion?

Figure 12.33 The Biology of Cancer (© Garland Science 2007)

NHEJ: Non-Homologous End-Joining

Figure 12.34a The Biology of Cancer (© Garland Science 2007)

Pending publication in PNAS ?

Back to technology: ramping up

Coverage is proportional to insert size

long inserts

short inserts

2X

2X 2X

coverage = L * N / G

L = insert size

N = number of inserts

G = genome size

Probability of breakpoint detection = 1 - e^(-coverage)
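
A short worked sketch of the two formulas above, using the MCF-7 BAC library figures from the earlier slides (~20,000 BACs of ~150 Kb). The 3 Gbp genome size is an assumed round number, and `inserts_needed` is simply the detection formula solved for N.

```python
import math


def clone_coverage(insert_size, num_inserts, genome_size):
    """coverage = L * N / G"""
    return insert_size * num_inserts / genome_size


def detection_probability(coverage):
    """Probability that at least one insert spans a given breakpoint."""
    return 1.0 - math.exp(-coverage)


def inserts_needed(target_probability, insert_size, genome_size):
    """Smallest N with 1 - e^(-L*N/G) >= target_probability."""
    return math.ceil(-math.log(1.0 - target_probability) * genome_size / insert_size)


GENOME_SIZE = 3_000_000_000  # assumed ~3 Gbp human genome

bac_coverage = clone_coverage(150_000, 20_000, GENOME_SIZE)
bac_detection = detection_probability(bac_coverage)
bacs_for_95_percent = inserts_needed(0.95, 150_000, GENOME_SIZE)
```

Under these assumptions the library gives 1X clone coverage, so any one breakpoint is spanned with probability of about 0.63; reaching 0.95 would take roughly 60,000 BACs.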

Massively parallel paired-end sequencing

Fragment size: platform, yield per run (run ~ cost unit)

• 200 bp: Illumina, > 50M reads per run (8 lanes)

• 3 Kbp: Illumina, SOLiD, > 50M reads per run

• 20 Kbp: Roche-454, < 1M reads per run

• 40 Kbp: diTag method, > 50M reads (54 bp diTags) per run

[Figure: error rate (%) versus cycles (0 to 70) for 35 fragment, 45 paired end, 75 fragment, and 75 paired end runs]

Left Tag Right Tag

Illumina

BLAST hits using diTag as query

Platform-independent end-sequencing

Paired-end Method X

Paired-end Method Y

Vendor X, Vendor Y, Vendor Z

Paired-end Method Z

$1M genome $100 genome

Modular paired-end method

Effective coverage is reduced when cell population is heterogeneous

Effective coverage = Coverage * Fraction of tumor cells with rearrangement

• 80% tumor cells, 20% non-tumor cells: Effective coverage = Coverage * 80%

• 20% tumor cells, 80% non-tumor cells: Effective coverage = Coverage * 20%
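
A small sketch of what the two cases mean for breakpoint detection, assuming 1X nominal clone coverage (an illustrative value) and the same 1 - e^(-coverage) detection model used earlier in the lecture.

```python
import math


def effective_coverage(coverage, tumor_fraction):
    """Scale nominal coverage by the fraction of cells carrying the rearrangement."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return coverage * tumor_fraction


def detection_probability(coverage):
    return 1.0 - math.exp(-coverage)


NOMINAL_COVERAGE = 1.0  # assumed for illustration

p_80 = detection_probability(effective_coverage(NOMINAL_COVERAGE, 0.80))
p_20 = detection_probability(effective_coverage(NOMINAL_COVERAGE, 0.20))
```

At 1X nominal coverage the detection probability falls from about 0.55 in the 80% tumor sample to about 0.18 in the 20% tumor sample, so heterogeneous samples need proportionally more inserts.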

From the perspective of an LCR breakpoint, insert size is effectively reduced by LCR size

Probability of breakpoint detection = 1 - e^(-effective coverage)

effective coverage = W * N / G

W = insert size – LCR size (“wiggle room”)

N = number of inserts

G = genome size
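
A sketch of the wiggle-room formula comparing two insert sizes against the same LCR. The 10 Kbp LCR, the insert count, and the 3 Gbp genome size are hypothetical values; clamping W at zero (an insert shorter than the LCR cannot span it) is an added assumption.

```python
import math


def wiggle_room(insert_size, lcr_size):
    """W = insert size - LCR size, clamped at zero (assumption)."""
    return max(0, insert_size - lcr_size)


def lcr_detection_probability(insert_size, lcr_size, num_inserts, genome_size):
    """1 - e^(-W * N / G) for a breakpoint that falls inside an LCR."""
    w = wiggle_room(insert_size, lcr_size)
    return 1.0 - math.exp(-(w * num_inserts / genome_size))


GENOME_SIZE = 3_000_000_000  # assumed ~3 Gbp
LCR_SIZE = 10_000            # hypothetical LCR length
NUM_INSERTS = 300_000        # hypothetical library size

p_fosmid = lcr_detection_probability(40_000, LCR_SIZE, NUM_INSERTS, GENOME_SIZE)
p_short = lcr_detection_probability(3_000, LCR_SIZE, NUM_INSERTS, GENOME_SIZE)
```

With these numbers the 40 Kbp inserts keep 30 Kbp of wiggle room and detect the breakpoint with probability of about 0.95, while 3 Kbp inserts have no wiggle room and cannot detect it however many are sequenced.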

“wiggle room”

inserts

LCR

breakpoint

Breakpoints detected by 54bp diTag sequencing

~ 0.5 Mbp deletion

Roche-454 and Illumina diTag mappings are consistent with fosmid insert size

Roche-454

Illumina

Genboree pipelines for genome mapping: paired-end and array CGH

Laboratory exercise this week: array CGH

Analysis of array CGH data from a set of tumor samples using Genboree

– Upload array CGH data

– Perform segmentation (invoke Bioconductor tool)

– Subtract polymorphisms (databases, current literature)

– Identify recurrent amplifications or deletions

– Study correlation with gene expression
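
A rough sketch of the "identify recurrent amplifications or deletions" step, written in plain Python rather than against the Genboree or Bioconductor interfaces. It assumes segmentation has already produced, for each sample, a list of (chromosome, start, end, log2 ratio) tuples; that layout, the ±0.3 thresholds, and the sample data are all hypothetical.

```python
def mean_log2_ratio(segments, chrom, start, end):
    """Overlap-weighted mean log2 ratio of one sample over a region, or None."""
    total = 0.0
    covered = 0
    for seg_chrom, seg_start, seg_end, ratio in segments:
        if seg_chrom != chrom:
            continue
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap > 0:
            total += ratio * overlap
            covered += overlap
    return total / covered if covered else None


def recurrence(samples, chrom, start, end, gain=0.3, loss=-0.3):
    """Return (fraction amplified, fraction deleted) across all samples."""
    n_amp = 0
    n_del = 0
    for segments in samples.values():
        ratio = mean_log2_ratio(segments, chrom, start, end)
        if ratio is None:
            continue  # region not covered in this sample; counted as neutral
        if ratio >= gain:
            n_amp += 1
        elif ratio <= loss:
            n_del += 1
    return n_amp / len(samples), n_del / len(samples)


# Made-up segmented profiles on a made-up chromosome.
samples = {
    "tumor1": [("chrA", 0, 1000, 0.0), ("chrA", 1000, 2000, 0.9)],
    "tumor2": [("chrA", 0, 2000, 0.6)],
    "tumor3": [("chrA", 0, 2000, -0.5)],
    "tumor4": [("chrA", 0, 2000, 0.05)],
}
amp_fraction, del_fraction = recurrence(samples, "chrA", 1200, 1800)
```

Regions whose amplified or deleted fraction exceeds a chosen recurrence cutoff would then go on to the polymorphism subtraction and expression correlation steps.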