CLUSTERING OF SOMATIC MUTATIONS TO CHARACTERIZE CANCER HETEROGENEITY WITH WHOLE GENOME SEQUENCING
J Becq¹, A Alexa¹, K Cheetham¹, R Grocock¹, Z Kingsbury¹, A Timbs², D McBride¹, S Humphray¹, M Ross¹, A Schuh² and D Bentley¹
¹Illumina Cambridge Ltd., Chesterford Research Park, Cambridge, UK and ²Oxford NIHR Biomedical Research Centre, University of Oxford, Oxford, UK
CANCER HETEROGENEITY
CLINICAL STUDY
DETECTION OF TUMOUR SPECIFIC MUTATIONS
TIME-SERIES ANALYSIS OF SOMATIC SINGLE NUCLEOTIDE VARIANTS
[Figure: read pileup at one site (REF base = A). Stage D: 4 of 7 reads carry T, mutant AF = 4/7 = 0.57. Stage R1: 8 of 9 reads carry T, mutant AF = 8/9 = 0.89]
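In the pileup example, the mutant AF is the fraction of reads that carry the non-reference base. The sketch below assumes one called base per read and treats any non-reference base as the mutant allele. `mutant_allele_frequency` is a hypothetical helper, not part of the authors' pipeline:

```python
from collections import Counter


def mutant_allele_frequency(bases: str, ref: str) -> float:
    """Fraction of reads at a site whose base differs from the reference.

    `bases` holds one called base per read (e.g. "AAATTTT"); every
    non-reference base is counted as the mutant allele in this sketch.
    """
    if not bases:
        raise ValueError("no reads cover this site")
    counts = Counter(base.upper() for base in bases)
    ref_count = counts[ref.upper()]
    return (len(bases) - ref_count) / len(bases)


# Pileups from the example figure (REF base = A)
stage_d = "AAATTTT"     # 4 of 7 reads carry T
stage_r1 = "ATTTTTTTT"  # 8 of 9 reads carry T
print(round(mutant_allele_frequency(stage_d, "A"), 2))   # 0.57
print(round(mutant_allele_frequency(stage_r1, "A"), 2))  # 0.89
```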
1. Select SNVs seen in at least one time-point
2. Filter for good quality: not in a copy number aberration region, 15x < coverage < 200x in all but one time-point, and genotype Q-score > 15 in all but one time-point
3. Measure the mutant allele frequency (mutant AF) at each time-point
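Steps 1 to 3 can be sketched as a per-SNV filter. The record layout (`SnvRecord`) and the helper names are hypothetical. "Seen in at least one time-point" is approximated here as a mutant AF above zero at some time-point, which is an assumption rather than the study's exact criterion:

```python
from dataclasses import dataclass


@dataclass
class SnvRecord:
    """Per-SNV measurements across time-points (hypothetical layout)."""
    mutant_af: list[float]   # mutant AF at each time-point
    coverage: list[int]      # read depth at each time-point
    genotype_q: list[float]  # genotype Q-score at each time-point
    in_cna_region: bool      # overlaps a copy number aberration region


def holds_in_all_but_one(values, predicate) -> bool:
    """True if `predicate` fails for at most one of `values`."""
    return sum(1 for v in values if not predicate(v)) <= 1


def passes_filters(snv: SnvRecord) -> bool:
    # Step 1 (proxy): observed with AF > 0 in at least one time-point
    seen = any(af > 0 for af in snv.mutant_af)
    # Step 2: quality filters, each allowed to fail at one time-point
    coverage_ok = holds_in_all_but_one(snv.coverage, lambda c: 15 < c < 200)
    quality_ok = holds_in_all_but_one(snv.genotype_q, lambda q: q > 15)
    return seen and not snv.in_cna_region and coverage_ok and quality_ok


# One low-coverage time-point (12x) is tolerated; two are not.
afs = [0.0, 0.0, 0.22, 0.43, 0.42]
ok = SnvRecord(afs, [30, 12, 35, 40, 33], [40, 30, 50, 50, 45], False)
bad = SnvRecord(afs, [30, 12, 10, 40, 33], [40, 30, 50, 50, 45], False)
print(passes_filters(ok), passes_filters(bad))  # True False
```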
Mutant AF of each SNV at each time-point:
chr    position   D       R1      P1      R2      preT3
chr1   154543705  0.0000  0.0000  0.2174  0.4255  0.4242
chr2   198266834  0.5000  0.6364  0.3478  0.4091  0.5938
chr3   31107645   0.0303  0.0000  0.1928  0.4146  0.4483
…      …          …       …       …       …       …
chr21  1592215    0.0000  0.0000  0.0476  0.0526  0.2500
chr22  32831696   0.4211  0.5294  0.0538  0.0000  0.0000
chrX   142716811  0.0000  0.0000  0.4737  0.9000  1.0000
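Each row of this table is one AF profile per SNV, which is the input to the clustering step. The parser sketch below assumes whitespace-separated columns with the header on the first line and `…` marking elided rows. `parse_af_table` is a hypothetical name:

```python
def parse_af_table(text):
    """Map (chromosome, position) -> list of mutant AFs, one per time-point.

    Expects a header such as "chr position D R1 P1 R2 preT3" followed by
    whitespace-separated rows; blank lines and rows containing "…" are skipped.
    """
    lines = text.strip().splitlines()
    time_points = lines[0].split()[2:]
    profiles = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or "…" in fields:
            continue
        chrom, pos, *afs = fields
        if len(afs) != len(time_points):
            raise ValueError(f"expected {len(time_points)} AF values in: {line!r}")
        profiles[(chrom, int(pos))] = [float(x) for x in afs]
    return profiles


table = """chr position D R1 P1 R2 preT3
chr22 32831696 0.4211 0.5294 0.0538 0.0000 0.0000
… … … … … … …
chrX 142716811 0.0000 0.0000 0.4737 0.9000 1.0000"""
print(parse_af_table(table)[("chrX", 142716811)])
```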
MUTATION PROFILES OF ALL SOMATIC SNVS
FROM CLUSTERS TO CLONES
Founder mutations
Mutations sensitive to treatment
Mutations resistant to treatment
Emerging mutations after treatment
Cancers are genomically diverse and dynamic entities
Clonal evolution generally selects for increased proliferation and survival, and might lead to invasion, metastasis and therapeutic resistance
Black lines: somatic SNVs with non-synonymous or nonsense consequences
VALIDATION WITH DEEP SEQUENCING
PROPOSED CELL POPULATION
Ultra-deep sequencing (50,000x) of amplicons for all somatic SNVs with non-synonymous consequences, reporting an accurate mutant allele frequency for each amplicon
Workflow: target the mutation of interest → amplify by PCR → sequence on a MiSeq instrument
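Depth is what makes the amplicon AFs accurate, because the uncertainty of an AF estimate shrinks roughly with the square root of the read depth. The sketch below computes a Wilson score interval for an AF. The choice of the Wilson interval is ours and not necessarily the statistic used in the study. `wilson_interval` is a hypothetical helper:

```python
import math


def wilson_interval(mutant_reads: int, depth: int, z: float = 1.96):
    """Wilson score interval (~95% for z = 1.96) for a mutant allele frequency."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    p = mutant_reads / depth
    z2 = z * z
    denom = 1 + z2 / depth
    centre = (p + z2 / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z2 / (4 * depth * depth)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# One mutant read at 30x (WGS-like) versus a 1% AF at 50,000x (amplicon-like)
for reads, depth in [(1, 30), (500, 50_000)]:
    lo, hi = wilson_interval(reads, depth)
    print(f"{reads}/{depth}: AF {reads / depth:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

At 30x, the interval around a single mutant read spans well over ten percentage points. At 50,000x, a 1% AF is pinned down to within about a tenth of a percentage point.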
Mutations present at all stages, regardless of treatments
Mutations decreasing after Fludarabine + Chlorambucil + Rituximab treatment
Emerging mutations (mutant AF is 0% at early stages)
Expanding mutations (mutant AF is >0% at early stages)
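The four profile categories can be written as simple rules over an AF profile, for example a cluster centroid. This is a hypothetical rule set, not the authors' method, since they clustered with k-means. Treating D and R1 as the "early" time-points and using a 0.05 presence threshold are both illustrative assumptions:

```python
from statistics import mean

# Hypothetical split of the time-points D, R1, P1, R2, preT3
EARLY = slice(0, 2)  # D, R1
LATER = slice(2, 5)  # P1, R2, preT3


def label_profile(afs, present_min=0.05):
    """Assign one of the four categories to an AF profile (rule-based sketch)."""
    early, later = afs[EARLY], afs[LATER]
    if min(afs) >= present_min:
        return "present at all stages"
    if all(af == 0 for af in early) and max(later) > 0:
        return "emerging"
    if mean(later) < mean(early):
        return "decreasing"
    if mean(later) > mean(early):
        return "expanding"
    return "unclassified"


print(label_profile([0.5000, 0.6364, 0.3478, 0.4091, 0.5938]))  # chr2 row
print(label_profile([0.4211, 0.5294, 0.0538, 0.0000, 0.0000]))  # chr22 row
```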
Time-series whole genome sequencing at 30x is sufficient to represent tumour cell populations and their heterogeneity, provided each cell population is >10%
Deep sequencing confirmed the WGS analysis and gave greater sensitivity: mutations at very low frequencies (~1%) can be detected
DNA samples were collected from a patient with Chronic Lymphocytic Leukemia at different time-points during his treatment
[Figure: principal component analysis, Component 1 vs Component 2]
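A projection onto the first principal component can be computed without external libraries by applying power iteration to the covariance matrix. This is a sketch for the first component only. `first_principal_component` is a hypothetical helper, and the study's exact PCA settings are not given:

```python
import math


def first_principal_component(rows, iterations=500):
    """First principal component of `rows` via power iteration.

    rows: equal-length numeric profiles (e.g. one mutant AF profile per SNV),
    at least two of them. Returns (unit loading vector, per-row scores).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Non-uniform positive start vector, normalised to unit length
    start = [float(j + 1) for j in range(d)]
    norm0 = math.sqrt(sum(x * x for x in start))
    v = [x / norm0 for x in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance along the current direction
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores
```

The sign of a principal component is arbitrary, so only the relative positions of the scores are meaningful.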
4. Cluster SNVs with similar mutant AF profiles using k-means
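Step 4 can be sketched with Lloyd's k-means algorithm in plain Python. The deterministic farthest-point seeding is a simplification chosen here so that results are reproducible. The study's choice of k, seeding and distance is not stated, and the helper names are hypothetical:

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(profiles, k):
    """Deterministic seeding: start from the first profile, then repeatedly
    add the profile farthest from all centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm on mutant AF profiles; returns (labels, centroids)."""
    centroids = farthest_point_init(profiles, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: assignments unchanged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```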
[Figure: clinical timeline. Germline (GL) and tumour samples taken at time-points D, R1, P1, R2 and preT3 over the course of tumour progression (diagnosis, remissions, relapses); treatments: Chlorambucil and Fludarabine + Chlorambucil + Rituximab; course: short remission duration, aggressive disease, death despite treatment]
[Figure: somatic variant calling workflow. Tumour BAM and Normal BAM → candidate indel search → candidate indels → realignment → tumour and normal realigned reads → Somatic Caller → post-call filtration → small somatic variants]
There are two late subclones, one present at diagnosis and one emerging after the second line of treatment
Proposed cell populations (% of cells) at each time-point:
Population                            D     R1    P1    R2    preT3
Founder mutations only                13%   8%    5%    2%    3%
Founder + early subclone mutations    80%   88%   12%   1%    0%
Founder + late subclone mutations     3%    3%    47%   89%   95%
Non-cancer                            4%    1%    36%   8%    2%
[Figure: proportions of cell populations, including the two late subclones, at time-points D, R1, P1, R2 and preT3]
Enumerate possible cells (possible common ancestor: the founder clone), then enumerate the most parsimonious phylogenies
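One way to screen candidate phylogenies is the "sum rule" often used in subclonal reconstruction: at every time-point, a clone's cell fraction must be at least the combined fractions of its direct descendants. This check is a swapped-in illustration and not necessarily the parsimony criterion used on the poster. It uses the average HET mutant AF x 2 values of the 1136-, 686- and 1241-SNV groups, which are named founder, early and late here by assumption. The helper names are hypothetical:

```python
from itertools import product

# Average HET mutant AF x 2 (%) at D, R1, P1, R2, preT3
FRACTIONS = {
    "founder": [96, 99, 64, 92, 98],
    "early": [80, 88, 12, 1, 0],
    "late": [3, 3, 47, 89, 95],
}


def is_tree(parent_of, root):
    """True if following parents from every node reaches the root (no cycles)."""
    for start in parent_of:
        node, visited = start, set()
        while node != root:
            if node in visited:
                return False
            visited.add(node)
            node = parent_of[node]
    return True


def candidate_trees(subclones, root="founder"):
    """Enumerate every way to attach each subclone to the root or another subclone."""
    for parents in product([root, *subclones], repeat=len(subclones)):
        parent_of = dict(zip(subclones, parents))
        if is_tree(parent_of, root):
            yield parent_of


def satisfies_sum_rule(parent_of, fractions, root="founder"):
    """Each clone's fraction must be >= the sum of its children's at every time-point."""
    for parent in [root, *parent_of]:
        children = [c for c, p in parent_of.items() if p == parent]
        for t in range(len(fractions[root])):
            if sum(fractions[c][t] for c in children) > fractions[parent][t]:
                return False
    return True


valid = [t for t in candidate_trees(["early", "late"]) if satisfies_sum_rule(t, FRACTIONS)]
print(valid)  # [{'early': 'founder', 'late': 'founder'}]
```

With these values, only the branching tree passes, with both subclones descending directly from the founder clone. A chain with late below early fails at P1 (47% > 12%), and a chain with early below late fails at D (80% > 3%).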
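Mutational groups (average HET mutant AF x 2, approximately the fraction of cells carrying the group's mutations):
Mutational group   # of SNVs   D     R1    P1    R2    preT3
Founder            1136        96%   99%   64%   92%   98%
Early subclone     686         80%   88%   12%   1%    0%
Late subclone      1241        3%    3%    47%   89%   95%
Noise (green)      502         3%    5%    4%    5%    5%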
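The group values above are per-time-point averages of heterozygous mutant AFs, doubled so that they approximate the fraction of cells carrying the mutations. This sketch uses hypothetical helper names. It applies the 10% noise cut-off to the doubled values, which is our assumption:

```python
def doubled_mean_af(af_profiles):
    """Average HET mutant AF x 2 at each time-point for one mutational group."""
    n = len(af_profiles)
    return [2 * sum(column) / n for column in zip(*af_profiles)]


def is_noise(cell_fractions, threshold=0.10):
    """True if the group's value stays below `threshold` at every time-point."""
    return all(f < threshold for f in cell_fractions)


print(is_noise([0.03, 0.05, 0.04, 0.05, 0.05]))  # True  (502-SNV group)
print(is_noise([0.03, 0.03, 0.47, 0.89, 0.95]))  # False (1241-SNV group)
```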
Because its mutant AF stays constantly below 10%, the green mutational group (502 SNVs) is considered noise.
Cell populations x, w, y and z must satisfy, at each time-point:
w + x + y + z = 100% - GL contamination = …
… + y + z = …
w + y = …
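If the early and late subclones are disjoint descendants of the founder clone, the proportions at each time-point follow by subtraction from the founder, early and late group values (in percent). For D (96, 80, 3), this gives 13%, 80%, 3% and 4%, which matches the percentages reported for that time-point. `decompose` is a hypothetical helper, and the nesting is an assumption:

```python
def decompose(founder, early, late):
    """Split cells at one time-point into four populations (percent).

    Assumes every cancer cell carries the founder mutations, and that the
    early and late subclones are disjoint branches below the founder clone.
    """
    founder_only = founder - early - late
    if founder_only < 0:
        raise ValueError("subclones exceed the founder fraction: tree inconsistent")
    return {
        "founder_only": founder_only,
        "early_subclone": early,
        "late_subclone": late,
        "non_cancer": 100 - founder,
    }


print(decompose(96, 80, 3))  # time-point D
```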
Trees containing this subclone are rejected because its frequency never exceeds 3% (below noise level)
Deep sequencing also provides accurate proportions
CONCLUSION