Figure S1. Overview of the variant calling and verification process. This figure expands on Fig. 1c with details of the identification of verified variants in 547 additional validation samples. Somatic variants (SNVs, indels, focal and chromosome-arm-level CNVs, and fusion products) were first called in 197 diagnostic samples with remission DNA (for germline) using a custom Complete Genomics whole genome sequencing (WGS) variant calling pipeline. Complete Genomics calls were optimized at the start of the TARGET project using 100 independently verified variants in WGS samples. Matched tumor and remission samples in 153 cases were used for somatic variant calling by both WGS and targeted capture sequencing (TCS) of genes recurrently impacted in the WGS samples. TCS confirmed 72% of WGS SNVs and 76% of WGS indels (red and green text in the figure). For focal copy number (CN) alterations spanning fewer than 7 genes, 75% of recurrent WGS deletion/loss calls and 85% of gain/amplification calls matched recurrent alterations discovered by SNP6 arrays in 96 matching samples. For chromosomal junctions, we integrated WGS, clinical and RNA-seq data by majority vote, and confirmed 89% of WGS calls. An additional 29 samples from the WGS discovery cohort were verified by TCS of diagnostic cases only, as part of 146 tumors without matched remission (see top portion of the figure). The remainder of these 146 cases were not used for discovery or validation purposes; rather, we simply identified recurrence of variants that were observed and verified in other samples.
Nature Medicine: doi:10.1038/nm.4439
Figure S2. Cellular processes and pathways commonly impacted in pediatric AML. The height of each bar indicates the percentage of samples with verified fusions
(green), SNVs/indels (grey), or focal CNVs (gold) in recurrently impacted genes within 684 pediatric AML samples. See Table S2b for a list of the impacted genes.
Figure S3. Data type overlap for TARGET and TCGA diagnostic samples. UpSet plots (http://www.caleydo.org/tools/upset/) showing the set overlaps for whole genome sequencing (WGS), whole exome sequencing (WXS), mRNA sequencing, DNA methylation arrays (CpGmeth), miRNA sequencing and targeted capture sequencing (TCS) in the TARGET and TCGA cohorts. The number of assays analyzed for each type is indicated by the horizontal bar graphs, and the number in each set intersection is shown in the vertical bar graphs. The Clinical category includes samples comprising the entire TARGET AML dataset, including those in TARGET AML subprojects (e.g. the previously reported WXS analysis, ref. 6). Data from these samples are included in the chromosomal arm level and karyotype based assessments of copy loss and fusions. (a) All TARGET AML project samples available. (b) All TCGA samples used for comparisons to TARGET. (c) Assay type overlaps for TCGA and TARGET data combined.
Panel titles: (a) TARGET AML assay overlap (n=1023); (b) TCGA assay overlap (n=177); (c) Combined TARGET & TCGA assay data overlap (n=1200). * Clinical annotations include ISCN karyotype.
Figure S4. Clonality estimates are consistent by age across cohorts. Both TCGA and TARGET AML cohorts contain affected individuals between the ages of 15 and 39
(adolescent and young adult, or AYA). Mutational and karyotypic clonality were assessed in AYA patients with whole-genome or whole-exome sequencing from either
cohort, resulting in estimates from 40 TARGET AML subjects and 22 TCGA AML subjects in this age group. No significant association between cohort and mutational
clonality estimate (p = 0.79613, Fisher’s exact test) or karyotypic clonality (p = 0.180302, Fisher’s exact test) is observed (TCGA AYA cases are older and more likely to
have normal karyotype, though not significantly so). A multivariate Poisson model similarly shows little evidence for a significant cohort-wise effect. The strongest
predictor of (decreasing) mutational clonality in AYAs is age at diagnosis (p=0.28).
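As an illustration of the test named in this legend, a two-sided Fisher's exact test on a 2x2 table can be sketched in Python with the standard library alone. The counts used below are hypothetical placeholders, since the legend does not give the per-category counts behind the figure, and the function name `fisher_exact_two_sided` is our own.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: rows = cohort, columns = clonality category.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

The two-sided definition used here (summing all tables at most as probable as the observed one) is one common convention; other software may define the two-sided p-value differently.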
Figure S5. The context of genome-wide mutation burden in pediatric AML. The mutational burden of SNVs and indels is low in pediatric AML (blue), with a median of 10 mutations/case across the 197-sample WGS cohort. This places pediatric AML, along with other pediatric malignancies (rhabdoid tumor, Ewing sarcoma, medulloblastoma) and adult AML (red), among the least mutated of human cancers. Figure reproduced from the raw data reported by Lawrence and colleagues (ref. 51), updated to reflect TARGET AML results and plotted using the ggplot2 package in the R statistical environment.
Figure S6. A simplified visualization of common genomic variants in TARGET and TCGA AML data. Selected small variants are grouped by those that appear distinctive
from core binding factor (CBF; t(8;21) and inv(16)) and KMT2A (aka MLL) fusions (grp1: mutations of WT1, NPM1, PTPN11, GATA2, CEBPA) and those that frequently co-
occur with CBF alterations (grp2: mutations of KIT or ASXL2, loss of chr X). C, chromosomal alteration; J, junction/translocation; M, mutation; I, ITD.
Panels: Pediatric, Adult. Row labels (Pediatric): KMT2A, CBF, grp1.var, FLT3, grp2.var, NRAS, KRAS, ZEB2, MBNL1. Row labels (Adult): KMT2A, KRAS, grp1.var, DNMT3A, IDH2, IDH1, CBF, TET2, TP53, NRAS, grp2.var.
Figure S7. Adult-Pediatric mutational contrasts in AML. Lollipop plots generated with ProteinPaint (https://pecan.stjude.org/#/proteinpaint) highlight differences in
frequency, type, and location of sequence variants in pediatric and adult AML. The plotted data reflect all somatic coding variants identified at presentation in 177 TCGA
cases and 815 TARGET AML cases (WGS + TCS). Mutations are coded by functional class: blue, missense; brown, insertion; gray, deletion; red, frameshifting; orange,
stop-gain; green, tandem duplication.
Panels: (a) MYC; (b) GATA2; (c) KRAS; (d) FLT3; (e) NRAS; (f) KIT. Tracks within each panel are labeled TARGET and TCGA.
Figure S8. The impact of pediatric gene fusions on clinical outcome. (a) 199 patients evaluated for CBFA2T3-GLIS2 fusion had clinical outcome data available for
analysis. Those with the fusion (n=9) had significantly worse overall survival than patients without the fusion (n=190) (p=0.0101). (b) 824 patients were evaluated for
fusions involving ETS family transcription factors (ETV6, FUS, or ERG) through karyotype and/or transcriptome sequencing and had clinical outcome data available for
analysis. Those with fusions (n=20) had significantly worse event-free survival than patients without a fusion (n=804) (p=0.0060). (c) 824 patients were evaluated for
fusions involving KAT6A through karyotype and/or transcriptome sequencing and had clinical outcome data available for analysis. Those with fusions (n=8) had
significantly worse overall survival than patients without a fusion (n=816) (p=0.0195). Differences in outcome were assessed by log-rank test. EFS, event-free survival;
OS, overall survival.
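For readers unfamiliar with the log-rank test used for these comparisons, the following is a minimal two-group sketch in Python. It is not the authors' analysis code: the follow-up times and event indicators are hypothetical, the function name `logrank_test` is our own, and the p-value uses the chi-square approximation with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time for each patient
    events_* : 1 if the event was observed, 0 if the patient was censored
    """
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _ in group_a if ti >= t)  # at risk in A
        n_b = sum(1 for ti, _ in group_b if ti >= t)  # at risk in B
        d_a = sum(1 for ti, e in group_a if ti == t and e)
        d_b = sum(1 for ti, e in group_b if ti == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical follow-up data (days); not taken from the study.
chi2, p = logrank_test(
    [100, 200, 300, 400, 500], [1, 1, 1, 1, 1],
    [600, 700, 800, 900, 1000], [1, 1, 1, 0, 1],
)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```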
Figure S9. Pediatric CBL Exonic Deletions Detected by cDNA Fragment Length Analysis. Representative examples of CBL wild-type and deletion transcripts detected by capillary electrophoresis of cDNA. The horizontal axis depicts the size of the PCR fragment (bp), and the vertical axis indicates signal strength. The WT (full-length transcript) fragment is 685 bp, the fragment with deletion of exon 8 only is 563 bp, and the fragment with deletion of exons 8 and 9 is 354 bp.
Figure S10. Mutational frequency differences in key myeloid genes. (a) ECOG comparison (ref. 4). (b) TCGA comparison, balanced by cytogenetic subtypes (see online
Methods). Error bars indicate the empirical SD from the resampling procedure.
Panels: (a) TARGET vs. ECOG; (b) TARGET vs. TCGA.
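The legend for Figure S10 refers to a resampling procedure that balances the TCGA comparison by cytogenetic subtype; its details are in the online Methods and are not reproduced here. Purely to illustrate how an empirical SD can arise from such a procedure, the sketch below repeatedly draws a subtype-balanced subsample from a hypothetical cohort and records the mutation frequency in each draw. The data, subtype labels, target proportions, and function name are all assumptions, and sampling with replacement within each subtype is our choice, not a documented detail of the authors' method.

```python
import random
import statistics

def balanced_resample_frequency(samples, target_props, n_draw, n_iter=1000, seed=0):
    """Mean and empirical SD of mutation frequency over balanced resamples.

    samples      : list of (subtype, mutated) pairs, with mutated being 0 or 1
    target_props : dict mapping subtype -> desired proportion in each resample
    n_draw       : approximate number of samples drawn per resample
    """
    rng = random.Random(seed)
    by_subtype = {}
    for subtype, mutated in samples:
        by_subtype.setdefault(subtype, []).append(mutated)

    freqs = []
    for _ in range(n_iter):
        drawn = []
        for subtype, prop in target_props.items():
            k = round(prop * n_draw)
            # Sample with replacement within each subtype stratum.
            drawn.extend(rng.choices(by_subtype[subtype], k=k))
        freqs.append(sum(drawn) / len(drawn))
    return statistics.mean(freqs), statistics.stdev(freqs)

# Hypothetical cohort: (cytogenetic subtype, mutation present?).
cohort = ([("CBF", 1)] * 5 + [("CBF", 0)] * 15
          + [("KMT2A", 1)] * 2 + [("KMT2A", 0)] * 18
          + [("Normal", 1)] * 20 + [("Normal", 0)] * 40)
# Hypothetical subtype proportions of the reference cohort.
target = {"CBF": 0.25, "KMT2A": 0.25, "Normal": 0.50}

mean_freq, sd_freq = balanced_resample_frequency(cohort, target, n_draw=40)
print(f"frequency = {mean_freq:.3f} +/- {sd_freq:.3f}")
```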
Figure S11. Mutational co-occurrence in KMT2A rearranged childhood AML. We identified single copy segmental deletions of ZEB2 and/or MBNL1 in 14 patients, 6 of
whom had concurrent KMT2A fusions (p=0.035, Fisher’s exact test). The row entitled “KMT2A (clinical)” shows the manually curated classification of the tumor’s primary cytogenetic type, obtained by combining results from clinical, genomic and RNA-seq assays. By this measure, all samples are classified as belonging to the KMT2A fusion cytogenetic group. The row entitled “KMT2A (WGS)” shows KMT2A variants found by WGS alone. Note that 2 samples have copy number alterations as well as fusions impacting KMT2A. C, copy number alteration; J, junction/translocation; M, mutation; I, ITD.
Row labels: KMT2A (clinical), KMT2A (WGS), MLLT3, NRAS, FLT3, KRAS, MLLT10, MBNL1, TMEM14E, ZEB2.
Figure S12. Clonality at presentation in pediatric AML. (a) Mutation-based inference of clonality in 197 TARGET AML cases with WGS and 177 TCGA AML cases
identifies 2 or more detectable clones in the majority of patients across age ranges. (b) A similar pattern with overall fewer detectable clones was observed by
karyotypic inference of clonal relationships at presentation.
Panel titles: (a) Mutational clones detected at diagnosis; (b) Karyotypic clones detected at diagnosis. Age groups: Infants (age <3), Children (age 3-15), AYA (age 15-40), Adults (age >40).
Figure S13. Gene variants alone and in combination impact pediatric AML outcomes. (a) 963 patients from the TARGET dataset with clinical results for FLT3 internal tandem duplication (ITD), NPM1 mutation, WT1 mutation, and NUP98-NSD1 fusion had clinical outcome data for analysis. Patients with FLT3 ITD in combination with WT1 mutation or NUP98-NSD1 fusion exhibit significantly decreased overall survival compared with patients with FLT3 ITD alone or in combination with NPM1 mutation (p<0.001). (b-d) Similar results were found for COG trial AAML0531 (b), COG trial CCG-2961 (c), and the Dutch Childhood Oncology Group (DCOG) (d). In each trial, those with FLT3 ITD plus WT1 mutation and/or NUP98-NSD1 fusion exhibit significantly worse overall survival. The number of patients in each subgroup and the total number of evaluable patients in each cohort are indicated in the table below. ITD, FLT3-ITD.
Subgroup                      TARGET  AAML0531  CCG-2961  DCOG
ITD-                             687       651       435   225
ITD- NPM1+                        37        41        41    14
ITD- WT1+                         56        43        27    14
ITD- NPM1+ WT1+                    7         5         2     0
ITD- WT1+ NUP98-NSD1+              4         3         0     1
ITD- NUP98-NSD1+                   0         0         0     1
ITD+                              72        67        17    28
ITD+ NPM1+                        27        28         8     9
ITD+ WT1+                         27        21        11     9
ITD+ NPM1+ WT1+                    7         3         2     0
ITD+ WT1+ NUP98-NSD1+             17        12         4     4
ITD+ NUP98-NSD1+                  21        13         9     9
ITD- NPM1+ NUP98-NSD1+             1         1         0     0
Total                            963       888       556   314
Figure S14. Remission rates vary for pediatric AML with FLT3-ITD according to cooperating mutations. The CCG-2961, AAML0531 and DCOG cohorts were combined
to compare complete remission (CR) rates after one cycle of induction therapy for groups with FLT3-ITD cooperating mutations, as shown. CR rates are consistent with
the survival outcomes (Figs. 3c and S13) among these studies: the poorest outcome group, containing FLT3-ITD and a cooperating WT1 mutation and/or NUP98-NSD1 fusion, had the lowest CR rate, at 54.8%, whereas the most favorable group, positive for both FLT3-ITD and NPM1 mutation, had a CR rate of 93.0% (groupwise p<0.0001, Kruskal-Wallis).
Figure S15. Novel ZEB2 and MBNL1 Deletions. (a, b) Short (<500 kbp) deletion segments along chromosome 2 (panel a, ZEB2) and chromosome 3 (panel b, MBNL1) in TARGET discovery cohort samples (n=197). (c) With the exception of one ZEB2-deleted sample (red point at top right of panel c), samples with ZEB2 and MBNL1 deletions are not impacted by large numbers of other CNVs.
Figure S16. Novel ELF1 focal deletions in the TARGET discovery cohort. (a) Genome browser view of segmental deletions covering the ELF1 locus. Patients (n=197) are
in rows; blue bars indicate the length of the deletion in that genomic region. (b) Genomic deletions were confirmed in a secondary assay using the nCounter CNV assay (NanoString Technologies), with verification (boxed specimens with low probe signals, shown as green in the heatmap below) of all ELF1 deletions initially
identified by WGS. (c) Expression values (RPKM) of ELF1 differ between those with the deletion and those with wild-type copy number (p=0.0077). (d) Unsupervised
clustering of 63 differentially expressed genes (p<0.01) between patients with and without ELF1 deletion shows many genes are upregulated in the samples with ELF1
deletions. Orange labels on the y axis indicate patients with an ELF1 deletion.
Panel c plot: title, Expression of ELF1; groups, ELF1 deletion and ELF1 WT; y axis, ELF1 expression (RPKM), 0 to 1200.
Figure S17. Summary view of the key fusion classes in pediatric AML. Each colored region represents a fusion family. Descriptive labels are written adjacent to each
family. The fusion partner genes for each family are indicated by their HGNC symbols. The lines connecting gene symbols indicate fusion partners. The thickness of each
line reflects the frequency of the observed fusion.
Figure S18. Varying the age cutoff for infants (< 3 years in Figure 4b) vs. children, to < 2 or even < 1, does not substantially alter conclusions about fusion prevalence.
Panel c is the same as Fig. 4b (reproduced here for comparison). Panels a and b show how samples shift between age groups if the infant-child threshold is reduced to
<2 years (b) or <1 year (a). Fusions are listed in the same order as in Fig. 4b and use the same color scheme.
Age-group labels in each panel: Infants, Children, AYA, Adults. Infant and child ranges: <1 and 1-15 (a); <2 and 2-15 (b); <3 and 3-15 (c).
Figure S19. Co-occurring mutations with CEBPA. (a) Oncoprint (http://www.cbioportal.org) showing all TARGET samples with functionally validated CSF3R mutations (ref. 20). Green indicates samples with mutations. (b) CEBPA and GATA2 mutations combinatorially impact event-free survival (EFS).
Panel b plot: title, CEBPA and/or GATA2 in Normal Cyto EFS; x axis, EFS (days), 0 to 4000; y axis, percent survival, 0 to 100; curves, CEBPA+/GATA2+, CEBPA+/GATA2-, GATA2+/CEBPA-, and wild type; group sizes shown in the plot, N=16, N=13, N=143, N=7; P=0.0177.
Figure S20. Patterns of mutual co-occurrence and mutual exclusion among somatic pediatric AML variants. (a) Patterns of co-occurrence and (b) mutual exclusion
among variants in the TARGET cohort were evaluated using CoMEt (see online Methods). Line thickness represents -log(p-value) for the observed co-occurrence rates.
Orange boxes indicate cytogenetic groups. Except for copy number alterations at the top-right, which were only evaluated within the 197 samples with WGS, all other
relations are among 684 samples with TCS. (c) An alternative derivation of conditional gene-gene relationships using a penalized Ising model yields similar conditional
dependencies.
Figure S21. Anti-correlated DNA methylation and reduced transcription potential. By scanning 2000 bp upstream and 200 bp downstream of the transcription start
site (TSS) for all known ENSEMBL isoforms of ~8000 expressed genes in AML, we fit segmented regression models of DNA methylation (X axis) against asinh (transcripts
per million, TPM, Y axis) of each transcript or gene. Hyperbolic arcsine (asinh) is similar to log transformation but is defined at all points along the real number line.
Because large batch effects confound the biological differences between TARGET pediatric AML and TCGA adult AML mRNA data, we took the within-cohort median expression of samples with 10% or less methylation at a CpG locus as the “unmethylated” expression level. The “silencing threshold” at the locus corresponding to the gene of interest was then defined as the methylation fraction beyond which no sample in a cohort exceeded that median “unmethylated” expression level within its cohort. Any locus where healthy progenitors or myeloid cells showed >= 10% methylation was omitted from consideration. After these filtering
steps, the most significantly associated locus (ideally correlated with r > 0.8 against its neighboring loci) was then selected as a “tag CpG” for the downstream
transcript(s). A tag CpG for HumanMethylation450 arrays and either the same locus or (if not present) the best surrogate locus for HumanMethylation27 arrays passing
the filters was retained for silencing “calls”. If no suitable HumanMethylation27 locus could be found, only samples with HumanMethylation450 data were assayed for
silencing of a given gene. This method identified 119 genes with recurrent silencing by promoter hypermethylation within the TARGET and TCGA datasets. Examples
below include THRB and WDR35 (components of NMF signature 2 and 13 signals, respectively), CDKN2B, and ULBP1, ULBP2 and ULBP3 (NK ligands). The red line marks
the empirically determined silencing threshold (% methylation).
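The silencing-threshold definition in this legend can be sketched for a single CpG/transcript pair as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it covers only the asinh transform and the threshold rule, omits the segmented regression, the healthy-tissue filter and tag-CpG selection, and uses hypothetical methylation fractions and TPM values. The function name `silencing_threshold` is our own.

```python
import math
import statistics

def silencing_threshold(methylation, tpm, unmethylated_cutoff=0.10):
    """Methylation fraction beyond which no sample exceeds the median
    asinh(TPM) of the unmethylated samples (methylation <= cutoff).

    Returns None when the threshold cannot be determined.
    """
    expr = [math.asinh(x) for x in tpm]  # asinh is defined at TPM = 0
    unmeth = [e for m, e in zip(methylation, expr) if m <= unmethylated_cutoff]
    if not unmeth:
        return None
    baseline = statistics.median(unmeth)
    # Methylation levels of samples still expressed above the baseline.
    above = [m for m, e in zip(methylation, expr) if e > baseline]
    if not above:
        return None
    return max(above)

# Hypothetical methylation fractions and TPM values within one cohort.
meth = [0.02, 0.05, 0.08, 0.30, 0.55, 0.70, 0.90]
tpm = [40.0, 60.0, 50.0, 55.0, 20.0, 5.0, 0.0]
threshold = silencing_threshold(meth, tpm)
print(threshold)
```

In this toy cohort the unmethylated median corresponds to 50 TPM, and the most methylated sample still expressed above it sits at 30% methylation, so every sample beyond that fraction falls below the unmethylated median.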
Figure S22. Integrative analysis of gene mutations, deletion, and transcriptional silencing by promoter methylation. Silencing (gold) or mutation/deletion events
(gray) for each gene (rows) are displayed for all assayed patients (columns), with the marginal total of events per patient illustrated in the upper histogram. The plotted data reflect 172 TCGA cases and 284 TARGET cases at 119 genes and are outlined in Tables S8-S9. These data represent a complete illustration of the subset shown in
Fig. 5a with differences in row/column ordering based on differing clustering solutions for greater numbers of samples and genes.
Legend: Cohort; Status (silenced, mutated).
Figure S23. NMF Deconvolution of genome-wide methylation patterns. DNA methylation signatures derived by non-negative matrix factorization (NMF) and in silico
purification. Samples are ordered by hierarchical clustering of signatures (labeled at right) and demonstrate the relative similarity of methylation features from samples
within cytogenetic categories (top ribbon). The plotted data are outlined in Table S10 and represent a complete illustration of those shown in Fig. 5b.
Annotation labels: Cohort; Associations.
Figure S24. Two DNA methylation signatures mark poor prognosis. Kaplan-Meier plots for signatures 2 and 13. After stratifying by cohort and adjusting for both TP53
mutation status and white blood cell count, these two signatures predict significantly (p < 0.05) poorer event-free survival in both pediatric and adult patients with
above-median scores.
Panel titles: DNA methylation signature #2; DNA methylation signature #13.
Figure S25. Unsupervised Nonnegative Matrix Factorization (NMF) Clustering of miRNA Expression. This figure is a fully annotated version of Fig. 6a in the main text.
Unsupervised NMF clustering of miRNA expression patterns in pediatric AML samples revealed 4 discrete pediatric subgroups (marked by the numbered colored
rectangles at the top) that were correlated with specific genomic alterations (indicated by blue bars in the gray annotation rows below the “race” and “FAB category”
annotations, near the top).
Color scales: consensus matrix score; expression z-score. Cluster labels: 1, 2, 3, 4.
Figure S26. Kaplan-Meier plots for samples expressing low and high levels of miRs let-7a-3p, let-7b-5p and 30a-3p. The expression (RPM) cut point between high and low expression groups for each miRNA was defined using the X-tile method (ref. 77), in which all separation points between patients are considered and the selected cut point is the one that provides the optimal (lowest) EFS log-rank p-value. OS, overall survival.
P values shown in the panels: P<0.0001, P<0.0001, P=0.0001.
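The cut-point search described in the legend for Figure S26 (consider every separation point between patients, keep the one with the lowest log-rank p-value) can be sketched as below. This is not the X-tile software itself. With two groups the log-rank statistic has 1 degree of freedom, so the lowest p-value corresponds to the largest chi-square, and the sketch maximizes the statistic directly. The expression values, follow-up times, and function names are hypothetical.

```python
def logrank_chi2(times, events, in_high):
    """Log-rank chi-square statistic comparing high vs. low groups."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [(h, ti, e) for h, ti, e in zip(in_high, times, events) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for h, _, _ in at_risk if h)
        d = sum(1 for _, ti, e in at_risk if ti == t and e)
        d_high = sum(1 for h, ti, e in at_risk if h and ti == t and e)
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def best_cut_point(expression, times, events):
    """Scan every split between adjacent distinct expression values and
    return (cut point, chi-square) for the split with the largest statistic."""
    values = sorted(set(expression))
    best = (None, 0.0)
    for lower, upper in zip(values, values[1:]):
        cut = (lower + upper) / 2
        in_high = [x > cut for x in expression]
        chi2 = logrank_chi2(times, events, in_high)
        if chi2 > best[1]:
            best = (cut, chi2)
    return best

# Hypothetical data: expression (RPM), follow-up (days), event indicator.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [900, 1000, 950, 1100, 100, 150, 300, 200]
events = [0, 0, 1, 0, 1, 1, 1, 1]

cut, chi2 = best_cut_point(expression, times, events)
print(f"best cut = {cut}, chi2 = {chi2:.2f}")
```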
Figure S27. Kaplan-Meier plots for samples expressing low and high levels of miRs 155-5p, 3614-5p, 4662-5p and 26a-2-3p. The expression (RPM) cut point between
high and low expression groups for each miRNA was defined using the X-tile method (ref. 77). OS, overall survival.
Figure S28. High expression levels of miRs 133a-3p, 212-3p, and 29c-5p have deleterious effects on event-free survival (EFS). The expression (RPM) cut point between high and low expression groups for each miRNA was defined using the X-tile method (ref. 77). EFS, event-free survival.