Supplementary Materials for...2013/08/05  · Peiyong Guan, Wen-Hui Weng, Ee Yan Siew, Yujing Liu,...


www.sciencetranslationalmedicine.org/cgi/content/full/5/197/197ra101/DC1

Supplementary Materials for

Genome-Wide Mutational Signatures of Aristolochic Acid and Its Application as a Screening Tool

Song Ling Poon, See-Tong Pang,* John R. McPherson, Willie Yu, Kie Kyon Huang, Peiyong Guan, Wen-Hui Weng, Ee Yan Siew, Yujing Liu, Hong Lee Heng, Soo Ching Chong, Anna Gan, Su Ting Tay, Weng Khong Lim, Ioana Cutcutache, Dachuan Huang, Lian Dee Ler, Maarja-Liisa Nairismägi, Ming Hui Lee, Ying-Hsu Chang, Kai-Jie Yu, Waraporn Chan-on, Bin-Kui Li, Yun-Fei Yuan, Chao-Nan Qian, Kwai-Fong Ng, Ching-Fang Wu, Cheng-Lung Hsu, Ralph M. Bunte, Michael R. Stratton, P. Andrew Futreal, Wing-Kin Sung, Cheng-Keng Chuang, Choon Kiat Ong, Steven G. Rozen,* Patrick Tan,* Bin Tean Teh*

*Corresponding author. E-mail: [email protected] (B.T.T.); [email protected] (P.T.); [email protected] (S.G.R.); [email protected] (S.-T.P.)

Published 7 August 2013, Sci. Transl. Med. 5, 197ra101 (2013)

DOI: 10.1126/scitranslmed.3006086

This PDF file includes:

Materials and Methods
Fig. S1. Frequency of mutations in carcinogen-induced cancers.
Fig. S2. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the AA-UTUC whole genome.
Fig. S3. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the exomes of nine AA-UTUCs.
Fig. S4. Systematic up-regulation of NMD gene transcripts in AA-UTUC compared to adjacent normal tissue.
Fig. S5. A heterozygous 3′ splice-site mutation results in skipping of RFC2 exon 10 in AA-UTUC.
Fig. S6. Strong association between CAG>CTG mutations at 3′ splice sites and altered splicing.
Fig. S7. Further details of the in vivo model of AA-induced damage.
Fig. S8. Superimposed individual tumor data points for the total nonsynonymous single-nucleotide variants and each of the separate mutation types in AA-HCCs and non–AA-HCCs.
Fig. S9. Nineteen HCCs exhibiting a “weak” AA mutational signature.
Fig. S10. Schematic representation of 3′ splice-site CAGs.
Table S1. Clinical characteristics of AA-UTUC patients analyzed by whole-genome and/or exome sequencing.

Table S2. Sequence analysis summary of whole genome–sequenced AA-UTUC (9T).
Table S3. Breakdown of somatic mutations by genomic region.
Table S4. Somatic nonsynonymous substitutions in protein-coding genes of the whole genome–sequenced AA-UTUC.
Table S5. Somatic substitutions in unspliced transcript regions (transcribed, including introns and untranslated regions) of AA-UTUC (9T).
Table S6. Somatic substitutions in the intergenic regions of AA-UTUC (9T).
Table S7. Sequence analysis summary of nine exome-sequenced AA-UTUCs.
Table S8. Somatic nonsynonymous substitutions in protein-coding genes of nine AA-UTUCs.
Table S9. The effect of +/− one base flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.
Table S10. The effect of +/− one base flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.
Table S11. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.
Table S12. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.
Table S13. The effect of +/− one base flanking the mutated TAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.
Table S14. The effect of +/− one base flanking the mutated CAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.
Table S15. Hypergeometric analysis for enrichment of CAG splice-site mutations in AA-UTUCs, AA-treated HK2 clones, and non–AA-associated cancers.
Table S16. RPKM gene expression values for 15 NMD pathway genes in the AA-UTUC and matched normal tissue.
Table S17. Identities of 3′ splice sites with CAG>CTG mutations and RPKM > 2.
Table S18. 3′ splice sites without CAG>CTG mutations for evaluating the proportion of unmutated sites associated with aberrant splicing.
Table S19. Sequence analysis summary of two exome-sequenced AA-treated HK2 clones.
Table S20. Somatic nonsynonymous substitutions in protein-coding genes of AA-treated HK2 clones.
Table S21. Comparison of mutation rates in AA-UTUC, carcinogen-induced cancers, mismatch repair–defective colorectal cancers, and POLE/POLD1 mutated colorectal cancers.
Table S22. Primer sequences.

Other Supplementary Material for this manuscript includes the following: (available at www.sciencetranslationalmedicine.org/cgi/content/full/5/197/197ra101/DC1)

Tables S1 to S22 (Microsoft Excel format)

Materials and Methods

Study Design

This was an observational, hypothesis-generating study based on archival tissue samples. The study did not involve treatments for UTUC or HCC and did not consider prognosis or clinical endpoints. The rationale for this study was the hypothesis that examination of genome and exome sequences of UTUC tumors from patients with likely AA exposure, compared to matched non-malignant tissues, would reveal new information about the mechanisms by which AA induces tumorigenesis. The finding that some HCCs also show strong evidence of AA exposure was serendipitous. Most HCC sequence data was from a previously published study (31), as specified in the Main Text. After we detected likely AA signatures in the published HCC data, we examined five additional HCC tumors sequenced by our group and found one that was likely to have been exposed to AA.

The purpose of the mouse studies was to confirm that the compounds used for the in vitro (cell line) experiments were indeed nephrotoxic. We planned to assess nephrotoxicity at three time points, and therefore included 3 controls in the study. We planned to examine nephrotoxicity in 4 mice after 10 days’ exposure, reasoning that there would be little change in physiological status of the kidney. Based on previous literature (9), we expected clear nephrotoxicity after 30 days’ exposure, and we planned to examine 10 mice to confirm this. Again, based on previous literature, we expected severe nephrotoxicity after 90 days’ exposure and planned to examine 6 mice to confirm this.

Whole genome sequencing

We used the Illumina TruSeq DNA Sample Prep Kit (Illumina Inc.) to prepare DNA for whole genome shotgun (WGS) libraries. In brief, 1 µg of DNA from each sample was sheared using the Covaris E210 instrument (Covaris) to a range of 100-700 base pairs. The resulting DNA fragments were end-repaired, phosphorylated, and adenylated at the 3’ ends. Standard paired-end adaptors were ligated to both ends, and fragments were subjected to gel electrophoresis (2% agarose, 120 volts, 1 hour) and size selected by gel excision of the bands (400-500 bp). The selected fragments were purified with the MinElute Gel Extraction Kit (Qiagen Inc.), followed by enrichment with PCR amplification (10 cycles) per the Illumina protocol. This produced a WGS library for each sample, with inserts averaging 300 bp to 400 bp. AMPure XP Beads (Illumina) were used for the PCR clean-up. WGS libraries were sequenced on an Illumina HiSeq 2000 as paired-end 76-base pair reads, resulting in an average haploid coverage of 33X for the tumor genome and 33X for the normal genome.

Exome sequencing

Genomic DNA from the AA-UTUC and the adjacent normal tissues of nine patients (3 µg per sample) was used to prepare fragment libraries suitable for massively parallel paired-end sequencing as previously described (1, 2). The coding sequences were enriched with the SureSelect Human All Exon kit v3 and sequenced on the Illumina Genome Analyzer IIx with 76-bp paired-end reads.

Bioinformatic analysis of genome and exome

We used the Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/) to align the sequence reads to the human reference genome NCBI GRC Build 37 (hg19) and employed SAMtools (http://samtools.sourceforge.net/) to remove PCR duplicates. In order to detect single-nucleotide variants (SNVs), we used a discovery pipeline based on the Genome Analysis Toolkit (GATK). Our pipeline first recalibrated base qualities and realigned the sequence reads around micro-indels. Next, we employed the GATK Unified Genotyper, which performs consensus calling, in order to identify SNVs. Only well-mapped reads (mapping quality ≥30 and number of mismatches within a 40-bp window ≤3) were used as input for the genotyper. We retained SNVs that passed additional quality filters (quality by depth ≥3, variant depth ≥10 and normal depth ≥5) and discarded any SNV close to a micro-indel or to several other SNVs. The quality-by-depth score is GATK's consensus quality score for a variant divided by the (unfiltered) read depth at that position (http://www.broadinstitute.org/gsa/wiki/index.php/main_page). The consensus quality score increases when there is more evidence for the existence of a variant at that position. If quality by depth is low, the inference is that evidence for the existence of a variant is weak in proportion to the number of reads available. The variant depth refers to the number of variant reads in the tumor sample, and the normal depth refers to the number of reads in the same position in the corresponding normal sample. We compared our variants against the common polymorphisms present in dbSNP v135 (http://www.ncbi.nlm.nih.gov/projects/SNP/) and in the 1000 Genomes Project databases (http://www.1000genomes.org/), in order to discard any common SNPs. Several cancer somatic mutations are also present in dbSNP, and we retained any common variants also found to be present in COSMIC v52. All variants retained after this step were considered to be newly identified here. Whenever possible, we used the Consensus CDS (CCDS) gene annotation database to determine amino acid changes. In the few instances where CCDS annotations were not available for SureSelect capture targets, we used RefSeq annotations, or, if those also were not available, Ensembl or UCSC annotations were used. For exome sequencing data, only SNVs in exons or in canonical splice sites were further analyzed. Amino acid changes corresponding to SNVs were annotated according to the largest transcript of the gene. In order to determine somatic variants for analysis, we compared the lists of newly identified non-synonymous SNVs detected in the tumor with those found in the corresponding normal sample and retained the ones that were present only in the tumor. All such somatic SNVs were submitted to PolyPhen (http://genetics.bwh.harvard.edu/pph2/indext.shtml) for functional prediction. Sanger capillary sequencing was used to validate selected mutations predicted by deep sequencing.
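
The filtering logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline (which was built on GATK): the `Candidate` record, its field names, and the set-based lookups for common polymorphisms and COSMIC entries are all hypothetical. Only the three numeric thresholds come from the text. The sketch leaves out the read-level mapping filters and the removal of SNVs near micro-indels or clustered SNVs.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_QUAL_BY_DEPTH = 3.0   # quality by depth >= 3
MIN_VARIANT_DEPTH = 10    # variant reads in the tumor >= 10
MIN_NORMAL_DEPTH = 5      # reads at the same position in the normal >= 5


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-site record; field names are illustrative only."""
    key: str                   # e.g. "chr17:7579329:T>A"
    consensus_quality: float   # consensus quality score of the call
    unfiltered_depth: int      # unfiltered read depth at the site
    variant_depth: int         # variant-supporting reads in the tumor
    normal_depth: int          # reads at the site in the matched normal
    seen_in_normal: bool       # variant also called in the matched normal


def quality_by_depth(c):
    """Consensus quality divided by the unfiltered read depth."""
    if c.unfiltered_depth == 0:
        return 0.0
    return c.consensus_quality / c.unfiltered_depth


def passes_quality(c):
    """Apply the three depth and quality thresholds."""
    return (quality_by_depth(c) >= MIN_QUAL_BY_DEPTH
            and c.variant_depth >= MIN_VARIANT_DEPTH
            and c.normal_depth >= MIN_NORMAL_DEPTH)


def select_somatic(candidates, common_snps, cosmic):
    """Keep tumor-only variants that pass the quality filters.

    Common polymorphisms are dropped unless the same variant is in COSMIC.
    """
    kept = []
    for c in candidates:
        if not passes_quality(c):
            continue
        if c.key in common_snps and c.key not in cosmic:
            continue
        if c.seen_in_normal:
            continue
        kept.append(c)
    return kept
```

Keeping the thresholds as named constants makes it easy to see which values come from the text and which parts of the sketch are assumed.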

Sanger sequencing validation

Sanger sequencing primers were designed using Primer3 software (http://frodo.wi.mit.edu). Purified PCR products were sequenced in the forward and reverse directions using ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction kits (Version 3) and an ABI PRISM 3730 Genetic Analyzer.

Given the high mutation rates observed in the AA-UTUC, it is impractical to validate all candidate mutations. Thus, for exome sequencing data, we randomly selected at least 100 candidate somatic coding mutations from each tumor for validation. Thirty mutations could not be tested due to PCR failure, and of the remaining 970 mutations, 946 were confirmed by Sanger sequencing (97.5%). For whole genome sequencing, we randomly selected 50 coding mutations, 100 intragenic mutations and 100 intergenic mutations for validation. Approximately 98% of the WGS-predicted somatic substitutions were confirmed as genuine somatic mutations.
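
As a quick arithmetic check of the exome validation figures quoted above, the following Python snippet recomputes the confirmation rate. The counts are taken from the text; the function and variable names are illustrative.

```python
def validation_rate(confirmed, tested):
    """Fraction of tested candidates confirmed by Sanger sequencing."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return confirmed / tested


pcr_failures = 30   # candidates that could not be tested
tested = 970        # candidates with a usable Sanger result
confirmed = 946     # candidates confirmed as mutations

selected = tested + pcr_failures           # candidates originally selected
rate = validation_rate(confirmed, tested)
print(f"{selected} selected, {tested} tested, {rate:.1%} confirmed")
# prints: 1000 selected, 970 tested, 97.5% confirmed
```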

Splice site mutation enrichment analysis

We used a hypergeometric test to determine the probability of observing > X somatic CAG mutations at splice sites from among a total of N somatic CAG mutations. To determine the number of assayed CAGs at splice sites and the total number of assayed CAGs at splice and non-splice sites, we used the CCDS entries corresponding to the Agilent SureSelect v3 all-exon capture kit to generate a list of genes to be considered. We used the CCDS (Consensus CDS) database (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) to extract the corresponding ENSEMBL (http://www.ensembl.org/index.html) transcript annotations, including the following information: genomic start and stop coordinates for 3’ UTR, 5’ UTR, and exons. After removal of 3’ and 5’ UTRs, a CAG was considered to be at a 3’ splice site according to fig. S10.

In a total of 16,546 genes, the total number of CAGs at splice sites is 93,292, while the total number of CAGs at non-splice sites is 776,280. Given these background counts of splice-site and non-splice-site CAGs in the captured exome, for each tumor we used the phyper() function in the R statistical package (http://www.r-project.org/) to calculate the probability of observing greater than X somatic CAG splice site mutations in a total of N somatic CAG mutations.
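
To make the test above concrete, here is a small Python sketch of the same upper-tail hypergeometric probability. The original analysis used R's phyper(); this is only an equivalent calculation written with the standard library. The background counts are those quoted in the text, the function name is hypothetical, and the example values of X and N are invented for illustration. In R, the same quantity would correspond to a call of the form phyper(X, 93292, 776280, N, lower.tail = FALSE).

```python
from math import comb

SPLICE_SITE_CAGS = 93_292        # background CAGs at splice sites (from the text)
NON_SPLICE_SITE_CAGS = 776_280   # background CAGs at non-splice sites (from the text)


def hypergeom_upper_tail(x, successes, failures, draws):
    """P(more than x successes) in `draws` draws without replacement."""
    total = successes + failures
    if not 0 <= draws <= total:
        raise ValueError("draws must lie between 0 and the population size")
    denom = comb(total, draws)
    highest = min(draws, successes)
    # Sum the exact integer numerators, then divide once at the end.
    tail = sum(comb(successes, k) * comb(failures, draws - k)
               for k in range(max(x + 1, 0), highest + 1))
    return tail / denom


# Hypothetical tumor: N = 200 somatic CAG mutations, X = 40 of them at splice sites.
p = hypergeom_upper_tail(40, SPLICE_SITE_CAGS, NON_SPLICE_SITE_CAGS, 200)
print(f"P(> 40 splice-site CAG mutations out of 200) = {p:.3g}")
```

Integer arithmetic is used for the binomial coefficients so that the very large intermediate values do not lose precision before the final division.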

RNA sequencing and analysis

Libraries for sequencing were prepared using the Illumina TruSeq RNA Sample Preparation v2, according to the manufacturer’s instructions. Briefly, poly-A RNA was recovered from 1 µg of total RNA using poly-T oligo-attached magnetic beads. The recovered poly-A RNA was then chemically fragmented and converted to cDNA using SuperScript II and random primers. The second strand was synthesized using the Second Strand Master Mix provided in the kit, followed by purification with AMPure XP beads. The ends of the cDNA were repaired using 3’ to 5’ exonuclease. A single adenosine was added to the 3′ end, and the adaptors were attached to the ends of the cDNA using T4 DNA ligase. The fragments with adaptors ligated onto both ends were enriched by PCR. Libraries were validated with an Agilent Bioanalyzer (Agilent Technologies). Libraries were diluted to 11 pM and applied to an Illumina flow cell using the Illumina Cluster Station. Sequencing was performed on an Illumina HiSeq 2000 sequencer with the paired-end 76-bp read option, according to the manufacturer’s instructions.

For the analysis of the RNA sequencing data, we converted Illumina-format “bcl” (base call) files to fastq files using Illumina’s CASAVA 1.8 software (http://support.illumina.com/sequencing/sequencing/sequencing_software/casava.ilmn). We used TopHat 1.2 (http://tophat.cbcb.umd.edu/) to map the reads to hg19 and the ENSEMBL60 gene annotation. We visualized the exons surrounding splice site mutations using the UCSC genome browser (http://genome.ucsc.edu/). Mapped reads were analyzed via RNA-SeQC (https://confluence.broadinstitute.org/display/CGATools/RNA-SeQC) for quality control and exon level quantification. We used Cufflinks 1.3 (http://cufflinks.cbcb.umd.edu/) for whole-transcript quantification of NMD pathway genes.

RT-qPCR

Total purified RNA was reverse-transcribed to cDNA using iScript™ cDNA Synthesis Kit (Bio-Rad) following the manufacturer’s instructions. Real-time PCR was performed with SsoFast EvaGreen Supermix according to the manufacturer’s recommendations using a CFX96™ Real-Time PCR Detection System (Bio-Rad), and the primers for UPF1, UPF2, UPF3A and MAGOH used in this study were designed accordingly (table S22). All RT-qPCR experiments were run in triplicate, and a mean value was used for the determination of mRNA levels. Negative controls containing water instead of sample cDNAs were used in each experiment. Relative quantification of the mRNA levels of UPF1, UPF2, UPF3A and MAGOH was performed using the comparative Cq method with GAPDH as the reference gene and with the formula $2^{-\Delta\Delta Cq}$.
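
The comparative Cq calculation can be illustrated with a short Python sketch. The text states only that triplicates were averaged and that GAPDH was the reference gene. Averaging the Cq values before taking differences, the function name, and the example Cq values below are assumptions made for illustration.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_control, ref_control):
    """Fold change by the comparative Cq method, 2^(-ddCq).

    Each argument is a list of replicate Cq values (for example a triplicate).
    """
    d_sample = mean(target_sample) - mean(ref_sample)      # dCq in the sample of interest
    d_control = mean(target_control) - mean(ref_control)   # dCq in the control sample
    dd_cq = d_sample - d_control
    return 2.0 ** -dd_cq


# Hypothetical Cq values: target gene vs. reference gene, tumor vs. normal.
fold = relative_expression(
    target_sample=[24.0, 24.0, 24.0],
    ref_sample=[18.0, 18.0, 18.0],
    target_control=[26.0, 26.0, 26.0],
    ref_control=[18.0, 18.0, 18.0],
)
print(fold)  # ddCq = (24 - 18) - (26 - 18) = -2, so the fold change is 4.0
```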

Human proximal tubule epithelial cells (HK2) exposed to AA

Human immortalized proximal tubule epithelial cells (HK2) were obtained from the American Type Culture Collection and grown in 10% Keratinocyte medium supplemented with bovine pituitary extract and human epidermal growth factor (Life Technologies) at 37 °C in a humidified atmosphere of 5% CO2. After reaching 90% confluence, the cells were sub-cultured using a trypsin/EDTA solution (0.05% trypsin, 0.5 mM EDTA). HK2 cells were treated with aristolochic acid-I (Sigma-Aldrich Co., Cat# A9451) at a subtoxic dose (10 µM) for six months. Two individual clones from these treated cells were then randomly selected and subjected to exome analysis as described above.

Induction of AA-associated nephropathy in C57/B6 mice

A total of 23 mice were used in this experiment. Twenty mice were gavaged with AA at a dose of 50 mg/kg of body weight in phosphate-buffered saline (PBS) for three days, while 3 non-treated mice served as controls (one for each indicated time point). The mice were killed under diethyl ether anesthesia on days 10 (4 mice), 30 (10 mice), and 90 (6 mice) after AA treatment. The AA-treated and non-treated mouse kidneys were fixed with 4% formaldehyde and sectioned at 4 µm throughout the kidney. To evaluate glomerulonephritis and tubulointerstitial nephritis in mouse kidneys, one section in every 20 sequential sections was selected for hematoxylin and eosin staining. The procedures for the present study were approved by the Animal Committee under Singapore Health Institute, and all animals were treated according to the guidelines for animal experimentation of Singapore Health Institute (IACUC protocol #2012/SHS/773).

Statistical analysis

Statistical analysis was carried out in the R statistical programming environment (http://r-project.org). Enrichment for splice-site mutations (Fig. 3A, table S15) was determined by hypergeometric tests (R function phyper). Up-regulation of NMD genes (Fig. 3B) was analyzed by one-sided Wilcoxon rank-sum tests (function wilcox.test). Up-regulation of 13 out of 15 NMD genes (fig. S4 and table S16) was analyzed by a two-sided Wilcoxon signed-rank test (function wilcox.test). The analysis in fig. S6 used a two-sided Fisher’s exact test (function fisher.test).
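
For readers unfamiliar with the last of these tests, the following Python sketch shows how a two-sided Fisher's exact p-value can be computed for a 2×2 table. It sums the probabilities of all tables that are no more likely than the observed one, which is the usual two-sided definition. It is not the authors' code (they used R's fisher.test), and the example counts are hypothetical because the underlying counts are not given in this text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability of a table whose top-left cell is k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(k) for k in range(lowest, highest + 1))
                if p <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: altered vs. unaltered splicing at mutated
# and unmutated 3' splice sites.
print(fisher_exact_two_sided(10, 5, 2, 27))
```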

Supplementary Figures

Supplementary Figure 1. Frequency of mutations in carcinogen-induced cancers.

Supplementary Figure 2. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the AA-UTUC whole genome.

Supplementary Figure 3. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the exomes of nine AA-UTUCs.

Supplementary Figure 4. Systematic up-regulation of NMD gene transcripts in AA-UTUC compared to adjacent normal tissue. The same 13 genes reported to be up-regulated in the activation of NMD in myelodysplasia (26) were also up-regulated in AA-UTUC tumor. P-value by Wilcoxon signed-rank test.

Supplementary Figure 5. A heterozygous 3′ splice-site mutation resulting in skipping of RFC2 exon 10 in AA-UTUC. Bridging reads, confirming the exon skipping, are shown.

Supplementary Figure 6. Strong association between CAG>CTG mutations at 3′ splice-sites and altered splicing. (A) We looked for altered splicing in tumor transcripts at the locations of 15 CAG > CTG mutations at 3′ splice-sites preceding internal exons. These were mutations in the 15 genes with adequate coverage to assess splicing alteration (whole-gene RPKM by Cufflinks > 2; table S17). (B) As a control data set, we searched for altered splicing near 29 unmutated 3′ splice sites with similar RPKMs (table S18). (C) We tested for enrichment of altered splicing at the sites of the 15 CAG > CTG mutations.

Supplementary Figure 7. Further details of the in vivo model of AA-induced damage. (A,B) Non-treated and AA-treated mouse kidneys. (C,D) Non-treated vs. AA-treated mouse kidney, H&E staining, 100X, scale bar = 100 µm and 200X, scale bar = 50 µm. At day 10, there was a dramatic accumulation of proteinaceous fluid within the tubules of the AA-treated kidney (*). At days 30 and 90, the tubular epithelial cells were necrotic and had collapsed, leaving the tubular basement membrane naked (#). There were few signs of regeneration. The glomeruli demonstrated ischemic shrinkage (arrowheads). At day 90, local inflammatory infiltration (arrows) was evident in the interstitium. In addition, urothelial dysplasia developed.

Supplementary Figure 8. Superimposed individual tumor data points for (A) total nonsynonymous single-nucleotide variants and (B) each of the separate mutation types in AA-HCCs and non–AA-HCCs.

Supplementary Figure 9. Nineteen HCCs exhibiting a “weak” AA mutational signature. (A) Numbers of mutations in each of six possible mutation classes. (B) The sequence contexts for A > T mutations in the HCC exomes. Mutation rates are expressed as fractions of the counts of the given triplet. For example, the rate of C[A>T]G mutations is the rate per million CAG triplets. (C) Strand bias: there are about twice as many A > T mutations on the sense (s; non-transcribed) strand as on the antisense (a; transcribed) strand.
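
The per-triplet normalization described in panel (B) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, and the mutation and triplet counts in the example are invented, not taken from the study.

```python
from itertools import product

BASES = "ACGT"


def at_contexts():
    """The 16 trinucleotide contexts N[A>T]N, written as 5' base + 'A' + 3' base."""
    return [five + "A" + three for five, three in product(BASES, repeat=2)]


def rate_per_million(mutations, triplet_count):
    """Mutations per million occurrences of the triplet in the assayed region."""
    if triplet_count <= 0:
        raise ValueError("triplet_count must be positive")
    return mutations * 1e6 / triplet_count


contexts = at_contexts()
print(len(contexts))     # 16 contexts, from AAA to TAT
print("CAG" in contexts)

# Invented example: 12 C[A>T]G mutations among 800,000 assayed CAG triplets.
print(rate_per_million(12, 800_000))
```

Dividing by the number of occurrences of each triplet keeps common triplets from appearing more mutable simply because they occur more often.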

Supplementary Figure 10. Schematic representation of 3′ splice-site CAG.

Supplementary Table 1

Clinical characteristics of AA-UTUC patients analyzed by whole-genome and/or exome sequencing

Sample | Age at diagnosis | Gender | Characteristic | Grade | Herb intake | Renal function before operation | Surgery time* | Exome sequenced | Whole genome sequenced | MSI
3T | 60 | Female | papillary | Low | Yes | Normal | 2009/03/18 | Yes | No | MSS
6T | 56 | Female | papillary | Low | Yes | Poor (Cr 2.41 mg/dl) | 2008/09/03 | Yes | No | MSS
9T | 43 | Female | infiltrating | High | Yes | Poor (Cr 4.2 mg/dl) | 2007/08/15 | Yes | Yes | MSS
10T | 67 | Female | infiltrating | High | Yes | Normal | 2007/04/10 | Yes | No | MSS
13T | 64 | Female | papillary | Low | Yes | Normal | 2008/05/20 | Yes | No | MSS
20T | 71 | Female | papillary | Low | Yes | Normal | 2008/06/04 | Yes | No | MSS
79T | 70 | Female | papillary | Low | Yes | Normal | 2009/08/14 | Yes | No | MSS
80T | 78 | Female | infiltrating | High | Yes | ESRD | 2009/09/21 | Yes | No | MSS
100T | 72 | Female | infiltrating | High | Yes | Poor (Cr 3.7 mg/dl) | 2010/06/21 | Yes | No | MSS

Note:
Cr unit: creatinine ratio
ESRD: End stage renal disease
*Surgery time is when tissue was collected
MSS: Microsatellite stable

Supplementary Table 2

Sequence analysis summary of whole genome–sequenced AA-UTUC (9T)

Sample | Tissue | Ave. Depth Per Targeted Base | Targeted Bases with Depth at Least 1X | Targeted Bases with Depth at Least 20X
9T | Normal | 33 | 96.9% | 81.3%
9T | Tumor | 33 | 96.8% | 85.2%

Supplementary Table 3

Breakdown of somatic mutations by genomic region

Category | Class | Total | Mutation Rates (per Mb) | Region Size (bp)
Base Substitutions | Intragenic (transcribed) | 201,192 | 130.7 | 1,538,519,851
 | Coding region | 2,858 | 75.44 | 37,883,392
 | Intron, 5' UTR, 3' UTR | 198,334 | 132.17 | 1,500,636,459
 | Intergenic Region | 237,680 | 174.9 | 1,358,790,611
 | Total Mutations | 438,872 | 151.4 | 2,897,310,462
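
The mutation rates in this table follow from dividing each mutation count by the region size in megabases. The short Python check below recomputes them from the tabulated counts and sizes. The variable and function names are illustrative. The recomputed values agree with the tabulated rates to within about 0.1 per Mb, which presumably reflects rounding in the original table.

```python
# (label, mutation count, region size in bp, tabulated rate per Mb) from Table S3
TABLE_S3 = [
    ("Intragenic (transcribed)", 201_192, 1_538_519_851, 130.7),
    ("Coding region", 2_858, 37_883_392, 75.44),
    ("Intron, 5' UTR, 3' UTR", 198_334, 1_500_636_459, 132.17),
    ("Intergenic region", 237_680, 1_358_790_611, 174.9),
    ("Total mutations", 438_872, 2_897_310_462, 151.4),
]


def rate_per_mb(count, region_bp):
    """Mutations per megabase of the given region."""
    return count / (region_bp / 1e6)


for label, count, size_bp, tabulated in TABLE_S3:
    computed = rate_per_mb(count, size_bp)
    print(f"{label}: computed {computed:.2f} per Mb, tabulated {tabulated}")
```

The counts are also internally consistent: the coding and non-coding transcribed counts sum to the intragenic total, and the intragenic and intergenic counts sum to the overall total.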

Supplementary Table 4

Somatic nonsynonymous substitutions in protein-coding genes of the whole genome–sequenced AA-

UTUC

Gene Symbol Sample

ID Nucleotide (Genomic)

AA

Change Change Type

AP3B2 9T g.chr15: 83331919 A>T L836Q Missense

ATP2B3 9T g.chrX: 152818503 A>G I612V Missense

BCAP31 9T g.chrX: 152966174 T>A H165L Missense

BTBD7 9T g.chr14: 93730138 T>A Y455F Missense

CIT 9T g.chr12: 120241041 T>A S422C Missense

CUL9 9T g.chr6: 43189421 A>T - Splice site

DNAH6 9T g.chr2: 84954839 A>T Q3340L Missense

DNAH6 9T g.chr2: 84936672 T>A - Splice site

FBXO9 9T g.chr6: 52938277 A>T - Splice site

GOLGA6L2 9T g.chr15: 23688976 T>A E180V Missense

HDAC6 9T g.chrX: 48673239 A>T - Splice site

IKBKAP 9T g.chr9: 111653686 T>A - Splice site

KCNA6 9T g.chr12: 4919578 A>T Y124F Missense

KCNH6 9T g.chr17: 61619736 T>A Y697N Missense

KCNQ1 9T g.chr11: 2591856 A>T - Splice site

KDM6A 9T g.chrX: 44928975 A>T Q692L Missense

KRT18P29 9T g.chr2: 182826925 T>A Y94F Missense

KRT37 9T g.chr17: 39577829 A>T L344Q Missense

LRRC1 9T g.chr6: 53764543 A>T - Splice site

MED24 9T g.chr17: 38179458 T>A S726C Missense

MERTK 9T g.chr2: 112767579 C>T P672L Missense

MLL 9T g.chr11: 118371808 A>T T2086P Missense

MSH2 9T g.chr2: 47635644 A>T R106X Nonsense

MYO7B 9T g.chr2: 128341759 T>A V469E Missense

MYO7B 9T g.chr2: 128341759 T>A V469E Missense

NCAPD2 9T g.chr12: 6627094 A>T I520F Missense

NCRNA00219 9T g.chr5: 111497846 A>T - Splice site

PIK3C2A 9T g.chr11: 17190802 T>A S163C Missense

PPIP5K2 9T g.chr5: 102482229 A>T - Splice site

PRPF4B 9T g.chr6: 4041004 A>T - Splice site

PSD4 9T g.chr2: 113958889 T>A L1023Q Missense

RFX4 9T g.chr12: 107113771 A>T Q391L Missense

RP11-141O11.2 9T g.chr5: 68265748 T>C - Splice site

RP11-569O4.6 9T g.chr13: 21523056 T>A H80L Missense

RP1-238O23.5 9T g.chr6: 41100612 A>T - Splice site

RPS4XP21 9T g.chr19: 34583858 A>T Y131N Missense

SEC14L1 9T g.chr17: 75202813 A>T S449C Missense

SLC4A1 9T g.chr17: 42330725 T>A K691I Missense

SOLH 9T g.chr16: 599377 A>T Y583F Missense

SRPK3 9T g.chrX: 153049776 T>A V392E Missense

TBC1D21 9T g.chr15: 74180019 T>A L279Q Missense

TECPR2 9T g.chr14: 102916105 A>T Q1072L Missense

TP53 9T g.chr17: 7578556 T>A - Splice site

TP53 9T g.chr17: 7579329 T>A K120X Nonsense

TRPM4 9T g.chr19: 49703872 A>T K928M Missense

U52112.12 9T g.chrX: 153154005 T>A M149K Missense

USP20 9T g.chr9: 132637837 A>T - Splice site

WDR6 9T g.chr3: 49051910 C>T H951Y Missense

WDR87 9T g.chr19: 38383525 A>T C901S Missense

ZBTB44 9T g.chr11: 130131321 T>A N150Y Missense
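The rows of this table share a regular layout (gene symbol, sample ID, genomic coordinate, nucleotide change, amino acid change, change type), so they can be tallied by substitution class. The following is a minimal sketch, not the authors' pipeline: the regular expression, the function names `parse_row` and `strand_neutral_class`, and the convention of reporting each change with an A or C reference base are all assumptions made for illustration. The five example rows are copied from the table above.

```python
import re
from collections import Counter

# Assumed row layout: GENE SAMPLE g.chrN: POSITION REF>ALT [AA_CHANGE CHANGE_TYPE]
ROW_RE = re.compile(
    r"^(?P<gene>\S+)\s+(?P<sample>\S+)\s+g\.\s*chr(?P<chrom>\w+):\s*(?P<pos>\d+)\s+"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])(?:\s+(?P<aa>\S+)\s+(?P<kind>.+))?$"
)

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def parse_row(line):
    """Return a dict for one table row, or None if the row does not fit the layout."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pos"] = int(record["pos"])
    return record


def strand_neutral_class(ref, alt):
    """Collapse complementary changes, so that A>T and T>A both become 'A:T>T:A'."""
    if ref in ("G", "T"):
        # Report the change from the opposite strand so the reference base is A or C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


# Example rows copied from Supplementary Table 4.
rows = [
    "AP3B2 9T g.chr15: 83331919 A>T L836Q Missense",
    "ATP2B3 9T g.chrX: 152818503 A>G I612V Missense",
    "BCAP31 9T g.chrX: 152966174 T>A H165L Missense",
    "CUL9 9T g.chr6: 43189421 A>T - Splice site",
    "MERTK 9T g.chr2: 112767579 C>T P672L Missense",
]

records = [r for r in (parse_row(line) for line in rows) if r is not None]
counts = Counter(strand_neutral_class(r["ref"], r["alt"]) for r in records)

for substitution, n in counts.most_common():
    print(substitution, n)
```

Rows that lack a REF>ALT field are returned as `None` rather than guessed at, so incomplete entries are skipped instead of being silently misclassified.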

Supplementary Table 5

Somatic substitutions in unspliced transcript regions (transcribed, including introns and untranslated regions) of AA-UTUC (9T)

Gene symbol | Sample ID | Nucleotide (Genomic)

SLC9A6 9T g.chrX: 135103323 A>T

HIP1 9T g.chr7: 75275455 T>A

EFCAB2 9T g.chr1: 245226587 A>T

uc002lcx.2 9T g.chr18: 45111034 A>T

IL1RAPL2 9T g.chrX: 103935618 T>A

CHSY3 9T g.chr5: 129280152 T>A

HS3ST4 9T g.chr16: 25911747 T>A

C8orf46 9T g.chr8: 67427820 T>A

HECW2 9T g.chr2: 197274587 T>A

IDE 9T g.chr10: 94264889 T>A

TRPM3 9T g.chr9: 73666998 A>T

NP_005699 9T g.chr13: 94463217 A>T

RP11-445O3.2 9T g.chr5: 4730066 T>A

SMOC2 9T g.chr6: 168893650 A>T

LINGO2 9T g.chr9: 28177909 T>A

RBMS3 9T g.chr3: 28979525 G>A

B4DJW3 9T g.chr10: 89901054 T>A

Q59GJ1 9T g.chr9: 87545086 A>T

NBEA 9T g.chr13: 35733188 A>T

NRXN3 9T g.chr14: 79573597 T>A

SORCS2 9T g.chr4: 7287997 A>T

MBOAT1 9T g.chr6: 20144580 T>A

RP3-410B11.1 9T g.chrX: 18066741 T>A

REEP1 9T g.chr2: 86552956 A>T

TNIK 9T g.chr3: 170958372 G>T

NDFIP2 9T g.chr13: 80093913 A>T

GCLC 9T g.chr6: 53401065 T>A

RNASEN 9T g.chr5: 31414219 T>A

ABCC9 9T g.chr12: 22068897 T>A

FSTL4 9T g.chr5: 132584785 A>T

ETV2 9T g.chr19: 36133111 T>A

CDH18 9T g.chr5: 19742210 T>A

NP_001139395 9T g.chr15: 76688922 T>A

AC011294.3 9T g.chr7: 46766363 T>A

AL355916.1 9T g.chr14: 62026870 A>T

NPAS3 9T g.chr14: 33915975 A>T

ENOX2 9T g.chrX: 129871217 T>A

RAP1GDS1 9T g.chr4: 99358642 A>T

ACO2 9T g.chr22: 41885492 A>T

CADM2 9T g.chr3: 85287890 T>A

AC133680.1 9T g.chr3: 24902748 A>G

RALGPS2 9T g.chr1: 178850123 T>A

RP11-400D2.3 9T g.chr4: 135345443 T>A

uc001pkv.1 9T g.chr11: 110207442 A>T

Q67FW5 9T g.chr17: 80989752 T>A

FANCC 9T g.chr9: 98058513 T>A

PDE8B 9T g.chr5: 76602773 T>A

SLC25A36 9T g.chr3: 140694351 A>T

ABCA4 9T g.chr1: 94521077 T>A

ZNF473 9T g.chr19: 50538507 A>T

GABRB1 9T g.chr4: 47204188 G>T

MSH3 9T g.chr5: 79969597 A>G

RP11-20B7.1 9T g.chr3: 73914159 T>A

NP_443068 9T g.chr10: 73178045 T>A

NEK11 9T g.chr3: 130918963 A>T

uc003xyc.1 9T g.chr8: 69905012 A>T

MITF 9T g.chr3: 69815171 A>T

RP11-978I15.10 9T g.chr1: 247774573 A>T

C9orf171 9T g.chr9: 135308436 A>T

Q8WWA6 9T g.chr7: 111940387 A>T

RP11-779P15.2 9T g.chr3: 99226142 A>T

NCRNA00210 9T g.chr1: 218093698 A>C

RPL14 9T g.chr3: 40500832 A>T

uc010kgo.2 9T g.chr6: 135900536 C>T

RP11-454C18.2 9T g.chr3: 151613997 T>A

C1orf57 9T g.chr1: 233093436 A>T

Q8NB90-3 9T g.chr4: 123885255 A>T

MOBKL2B 9T g.chr9: 27475826 A>T

RBM6 9T g.chr3: 50031686 A>T

CCNB3 9T g.chrX: 50089427 A>T

WDR64 9T g.chr1: 241962096 T>C

HECW1 9T g.chr7: 43423583 T>T

CTD-2245E15.3 9T g.chr5: 1546591 A>G

GREB1L 9T g.chr18: 18890938 T>A

KIAA0802 9T g.chr18: 8771099 A>T

PIK3C2G 9T g.chr12: 18496516 A>G

DGKB 9T g.chr7: 14976051 A>T

CTD-2340E1.3 9T g.chr5: 101977310 T>A

NELL1 9T g.chr11: 21236047 A>T

HS6ST3 9T g.chr13: 97412407 T>A

Q0VG04 9T g.chr18: 48134672 T>A

TMEM64 9T g.chr8: 91650736 T>A

SETD3 9T g.chr14: 99941572 T>A

KCNJ6 9T g.chr21: 39141463 T>C

PALM2 9T g.chr9: 112544722 A>T

CDH18 9T g.chr5: 20465415 A>T

ZNF438 9T g.chr10: 31162692 A>T

AC068987.1 9T g.chr12: 52744781 A>T

Supplementary Table 6

Somatic substitutions in the intergenic regions of AA-UTUC (9T)

Gene symbol | Sample ID | Nucleotide (Genomic)

Intra_1 9T g.chr10: 65943421 A>T

Intra_2 9T g.chr18: 12394557 A>T

Intra_3 9T g.chrX: 132126132 T>C

Intra_4 9T g.chr6: 164071634 A>T

Intra_5 9T g.chr3: 104590294 T>A

Intra_6 9T g.chr4: 60329949 T>A

Intra_7 9T g.chr14: 45940707 A>T

Intra_8 9T g.chr13: 63012417 T>A

Intra_9 9T g.chr11: 59241667 A>T

Intra_10 9T g.chr9: 29710104 A>G

Intra_11 9T g.chrX: 76611475 A>T

Intra_12 9T g.chr14: 62637522 T>A

Intra_13 9T g.chrX: 42634291 A>T

Intra_14 9T g.chr4: 185814004 A>T

Intra_15 9T g.chr4: 182088275 A>T

Intra_16 9T g.chr12: 41534083 A>T

Intra_17 9T g.chr15: 79596062 T>A

Intra_18 9T g.chr14: 103730618 T>A

Intra_19 9T g.chr14: 96738041 T>A

Intra_20 9T g.chr13: 96729807 T>A

Intra_21 9T g.chr7: 67921420 T>A

Intra_22 9T g.chr12: 11544214 A>T

Intra_23 9T g.chr2: 79295864 T>A

Intra_24 9T g.chr3: 137440666 T>A

Intra_25 9T g.chr4: 61236737 A>C

Intra_26 9T g.chr11: 25686709 T>A

Intra_27 9T g.chr9: 11174306 A>T

Intra_28 9T g.chr4: 60814990 C>A

Intra_29 9T g.chr7: 9431510 T>A

Intra_30 9T g.chr1: 159712935 T>A

Intra_31 9T g.chr2: 138590367 A>T

Intra_32 9T g.chr5: 110294140 A>T

Intra_33 9T g.chrX: 15910211 A>T

Intra_34 9T g.chr4: 185891404 A>T

Intra_35 9T g.chr3: 11946867 T>A

Intra_36 9T g.chr6: 155694030 A>T

Intra_37 9T g.chr1: 80454030 T>G

Intra_38 9T g.chr9: 28756586 G>A

Intra_39 9T g.chr1: 51639186 T>A

Intra_40 9T g.chr6: 49781341 G>A

Intra_41 9T g.chr11: 10929472 A>G

Intra_42 9T g.chr2: 194100276 T>A

Intra_43 9T g.chrX: 113647575 A>G

Intra_44 9T g.chr7: 49373001 C>T

Intra_45 9T g.chr15: 39711715 T>A

Intra_46 9T g.chr7: 1218518 T>A

Intra_47 9T g.chr8: 33126381 T>A

Intra_48 9T g.chr6: 87758958 A>T

Intra_49 9T g.chr9: 5244825 A>T

Intra_50 9T g.chr15: 26147197 A>T

Intra_51 9T g.chrX: 62336318 A>G

Intra_52 9T g.chr3: 5733085 A>T

Intra_53 9T g.chr10: 89844130 A>T

Intra_54 9T g.chr2: 62666724 G>A

Intra_55 9T g.chr7: 17783888 T>A

Intra_56 9T g.chr5: 14580920 A>T

Intra_57 9T g.chr15: 80293494 A>T

Intra_58 9T g.chr7: 6105635 C>A

Intra_59 9T g.chr4: 153229559 A>T

Intra_60 9T g.chr7: 88233656 A>C

Intra_61 9T g.chr7: 9088664 A>T

Intra_62 9T g.chrX: 20609773 C>T

Intra_63 9T g.chr6: 92265045 T>A

Intra_64 9T g.chr2: 173016965 T>A

Intra_65 9T g.chr4: 12804502 T>A

Intra_66 9T g.chr7: 108844198 A>T

Intra_67 9T g.chr13: 89950282 T>A

Intra_68 9T g.chr4: 146977164 C>A

Intra_69 9T g.chr1: 160444224 A>T

Intra_70 9T g.chr5: 119014089 A>T

Intra_71 9T g.chr8: 97197166 A>T

Intra_72 9T g.chr5: 16275367 C>T

Intra_73 9T g.chr4: 180521786 A>T

Intra_74 9T g.chr19: 31229836 A>T

Intra_75 9T g.chr13: 59405509 A>T

Intra_76 9T g.chr15: 97568375 A>T

Intra_77 9T g.chr5: 108052849 A>T

Intra_78 9T g.chrX: 1694957 T>A

Intra_79 9T g.chr16: 26262204 T>A

Intra_80 9T g.chr9: 38704742 T>A

Intra_81 9T g.chr5: 63025558 T>C

Intra_82 9T g.chr12: 94484571 T>A

Intra_83 9T g.chr14: 29059801 T>A

Intra_84 9T g.chrX: 108137918 T>A

Intra_85 9T g.chr8: 140419290 A>T

Intra_86 9T g.chr13: 44609566 T>A

Intra_87 9T g.chr1: 63428116 T>A

Intra_88 9T g.chr1: 85382956 T>A

Intra_89 9T g.chr19: 39292235 T>A

Intra_90 9T g.chr7: 98172016 T>A

Intra_91 9T g.chr5: 86141968 T>A

Intra_92 9T g.chr10: 113298318 T>A

Intra_93 9T g.chr4: 45618818 T>A

Intra_94 9T g.chr12: 30124771 T>A

Intra_95 9T g.chr1: 68018886 T>A

Supplementary Table 7

Sequence analysis summary of nine exome-sequenced AA-UTUCs

Sample | Tissue | Bases in Target Region | Bases Mapped to Target Region | Ave. Depth Per Targeted Base | Targeted Bases with Depth at Least 1X | Targeted Bases with Depth at Least 20X | Somatic Mutations Identified in Targeted Region

3T Normal 37804019 37192007 40 92.4 64 -

3T Tumor 37804019 31149098 33 92.5 60 437

6T Normal 37804019 19051035 35 94.1 58 -

6T Tumor 37804019 19025237 35 94.1 57 1364

9T Normal 37804019 18557621 34 94.5 58 -

9T Tumor 37804019 18537281 34 94.9 59 1775

10T Normal 37804019 18674154 35 94.8 59 -

10T Tumor 37804019 18877699 35 94.6 58 403

13T Normal 37804019 33166677 35 92.2 61 -

13T Tumor 37804019 35941590 37 93.2 64 955

20T Normal 37804019 18924524 35 95.7 61 -

20T Tumor 37804019 18851564 35 96.7 55 1383

79T Normal 37804019 39151870 42 93.1 68 -

79T Tumor 37804019 35165500 37 92.5 63 1037

80T Normal 37804019 31127604 33 92.7 61 -

80T Tumor 37804019 32370161 34 92.8 58 1102

100T Normal 37804019 31670901 33 93.8 59 -

100T Tumor 37804019 32050111 34 92.7 62 1477
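To compare mutation burden across the nine tumors, the somatic mutation counts can be scaled by the size of the target region. The sketch below is illustrative only: the per-megabase normalization and the helper name `mutations_per_megabase` are assumptions and do not appear in the source. The counts and the target size are copied from the tumor rows of the table above.

```python
# "Bases in Target Region" is identical for every sample in the table.
TARGET_BASES = 37_804_019

# Counts copied from the "Somatic Mutations Identified in Targeted Region" column (tumor rows).
SOMATIC_MUTATIONS = {
    "3T": 437, "6T": 1364, "9T": 1775, "10T": 403, "13T": 955,
    "20T": 1383, "79T": 1037, "80T": 1102, "100T": 1477,
}


def mutations_per_megabase(count, target_bases=TARGET_BASES):
    """Scale a raw mutation count to mutations per million targeted bases."""
    return count / (target_bases / 1_000_000)


rates = {sample: mutations_per_megabase(n) for sample, n in SOMATIC_MUTATIONS.items()}
total = sum(SOMATIC_MUTATIONS.values())
mean_count = total / len(SOMATIC_MUTATIONS)

# List tumors from highest to lowest mutation rate.
for sample, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{sample}: {rate:.1f} mutations per Mb of target")
print(f"mean count per tumor: {mean_count:.0f}")
```

Because every sample shares the same target size, this scaling preserves the ranking of the raw counts. It would matter only when comparing against data captured with a different target region.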

Supplementary Table 8

Somatic nonsynonymous substitutions in protein-coding genes of nine AA-UTUCs

Gene Symbol | Sample ID | Nucleotide (Genomic) | AA Change | Change Type

RP11-790I12.1 3T g.chr4: 70265447 A>T K11X Nonsense

CCNB3 3T g.chrX: 50052086 A>T Q306L Missense

TET1 3T g.chr10: 70332976 A>T H294L Missense

ARHGEF38 3T g.chr4: 106588667 A>T D652V Missense

FAM178A 3T g.chr10: 102684332 A>T K525I Missense

ARSK 3T g.chr5: 94927237 A>T H335L Missense

LRRTM4 3T g.chr2: 77746288 A>T L236X Nonsense

RP4-555N2.2 3T g.chrX: 118355124 A>T Q127L Missense

WDR60 3T g.chr7: 158705715 A>T I544F Missense

OVCH1 3T g.chr12: 29630115 A>T S433T Missense

OR7A17 3T g.chr19: 14991680 A>T V163E Missense

TTN 3T g.chr2: 179466753 A>T D15847E Missense

GRHL1 3T g.chr2: 10101201 A>T Q102L Missense

HIVEP1 3T g.chr6: 12122558 A>T T844S Missense

PKD1L3 3T g.chr16: 71984095 A>T L1102Q Missense

DOCK3 3T g.chr3: 51395271 A>T - Splice site

F8 3T g.chrX: 154221283 A>T Y177N Missense

GRID2 3T g.chr4: 94693392 A>T R923W Missense

OR4C10P 3T g.chr11: 48454529 A>T M57K Missense

SCN8A 3T g.chr12: 52156408 A>T E831V Missense

TSHB 3T g.chr1: 115576652 A>T Y74F Missense

OXSM 3T g.chr3: 25833031 A>T K174X Nonsense

C11orf30 3T g.chr11: 76169243 A>T S88C Missense

CDH11 3T g.chr16: 64982547 A>T Y680N Missense

GK2 3T g.chr4: 80328321 A>T I345N Missense

SLITRK2 3T g.chrX: 144905604 A>T N554T Missense

DMD 3T g.chrX: 32490358 A>T S958T Missense

FMN2 3T g.chr1: 240492706 A>T T1459S Missense

PRRC2C 3T g.chr1: 171491384 A>T L271F Missense

TMEM225 3T g.chr11: 123755985 A>T W50R Missense

AC010872.2 3T g.chr2: 21362956 A>T S873C Missense

KCNU1 3T g.chr8: 36793232 A>T T1082S Missense

PDE1C 3T g.chr7: 31862741 A>T W510R Missense

RP11-513I15.3 3T g.chr6: 34187365 A>T I50L Missense

SCN1A 3T g.chr2: 166901793 A>T S474R Missense

TTC3 3T g.chr21: 38538777 A>T T1421S Missense

ACTRT1 3T g.chrX: 127186125 A>T C21S Missense

C4orf44 3T g.chr4: 3265639 A>T R253X Nonsense

FRAS1 3T g.chr4: 79296985 A>G N1082H Missense

ADAM9 3T g.chr8: 38947672 A>T R725S Missense

VPS8 3T g.chr3: 184542427 A>T N3Y Missense

ZNF99 3T g.chr19: 22939396 A>T F925L Missense

C11orf82 3T g.chr11: 82644226 A>T R616W Missense

CEP192 3T g.chr18: 13073006 A>T - Splice site

DGKK 3T g.chrX: 50113475 A>T V1222E Missense

KCMF1 3T g.chr2: 85273323 A>T S175C Missense

KDM6A 3T g.chrX: 44949966 A>T - Splice site

USH2A 3T g.chr1: 216390880 A>T C1002X Nonsense

PDE6B 3T g.chr4: 663865 C>A A845E Missense

HCN1 3T g.chr5: 45303881 C>A D480Y Missense

HDAC4 3T g.chr2: 239988551 C>A C952F Missense

KIAA1377 3T g.chr11: 101828996 G>A E202K Missense

OR2J1 3T g.chr6: 29069377 G>T G220C Missense

LPHN3 3T g.chr4: 62598993 G>A D306N Missense

LPHN3 3T g.chr4: 62598912 G>A E279K Missense

NBEA 3T g.chr13: 35806645 G>C E1889Q Missense

MGAM 3T g.chr7: 141708506 G>T - Splice site

MYO6 3T g.chr6: 76596696 G>T L881F Missense

PCDH18 3T g.chr4: 138449633 T>A - Splice site

ATP10B 3T g.chr5: 160112022 T>C Y69C Missense

KRBA2 3T g.chr17: 8273402 T>A T177S Missense

COL4A3BP 3T g.chr5: 74677020 T>A N670Y Missense

FAT4 3T g.chr4: 126336256 T>A D2046E Missense

NEB 3T g.chr2: 152528966 T>A K1406X Nonsense

PRR14L 3T g.chr22: 32110638 T>A K1063X Nonsense

PHKA2 3T g.chrX: 18944568 T>A - Splice site

TAS2R30 3T g.chr12: 11286345 T>A K167X Nonsense

ZNF274 3T g.chr19: 58723646 T>G L126V Missense

OR52A4 3T g.chr11: 5142647 T>A K54N Missense

SCN3A 3T g.chr2: 165947154 T>A M1837L Missense

CDC42BPA 3T g.chr1: 227333321 T>A K338X Nonsense

MUC16 3T g.chr19: 8996369 T>A T13735S Missense

PLG 3T g.chr6: 161127436 T>A - Splice site

FAT4 3T g.chr4: 126412143 T>A H4722Q Missense

NPFFR2 3T g.chr4: 73013419 T>A F487I Missense

OR6B3 3T g.chr2: 240984538 T>A S318C Missense

ANKRD34B 3T g.chr5: 79855624 T>A E72V Missense

N6AMT2 3T g.chr13: 21306187 T>C I101V Missense

CDK19 3T g.chr6: 110953242 T>A K213X Nonsense

USP29 3T g.chr19: 57640739 T>A C232X Nonsense

GABRQ 3T g.chrX: 151820129 T>A Y348N Missense

RFTN2 3T g.chr2: 198511279 T>A Q84L Missense

ZFHX4 3T g.chr8: 77763949 T>A C1553S Missense

VTCN1 3T g.chr1: 117695815 T>A M208L Missense

CCDC54 3T g.chr3: 107096438 T>A Y2N Missense

HGD 3T g.chr3: 120394676 T>A E17V Missense

KRBA2 3T g.chr17: 8273480 T>A K151X Nonsense

RP11-4H14.1 3T g.chr3: 140621361 T>A L25Q Missense

SIN3A 3T g.chr15: 75702483 T>A S385C Missense

TRIM3 3T g.chr11: 6478534 T>A K230X Nonsense

ATP13A3 3T g.chr3: 194180616 T>A N104Y Missense

C6orf112 3T g.chr6: 105614442 T>A S58T Missense

OR51J1 3T g.chr11: 5424630 T>A H268Q Missense

STAG2 3T g.chrX: 123200222 T>A L734Q Missense

SYF2 3T g.chr1: 25553933 T>A H156L Missense

BTBD9 3T g.chr6: 38548045 T>A Q328L Missense

GPR110 3T g.chr6: 46989725 T>A Y174F Missense

HSPA13 3T g.chr21: 15746182 T>A E391V Missense

CREBBP 3T g.chr16: 3789633 T>G F1409C Missense

CDH10 3T g.chr5: 24509775 T>A R386W Missense

KDM6A 6T g.chrX: 44922793 A>T S552C Missense

ABCF1 6T g.chr6: 30553654 A>T - Splice site

GRIN3A 6T g.chr9: 104433085 A>T C537S Missense

LRRK2 6T g.chr12: 40716983 A>T D1844V Missense

LRRK2 6T g.chr12: 40742246 A>G M2106V Missense

MBD5 6T g.chr2: 149248144 A>T Q1415L Missense

SPTA1 6T g.chr1: 158615301 A>T L1327X Nonsense

SYNE2 6T g.chr14: 64516447 A>T Q2499L Missense

SYNE2 6T g.chr14: 64430615 A>T - Splice site

USH2A 6T g.chr1: 216172244 A>T Y2214X Nonsense

BSN 6T g.chr3: 49693462 A>T H2158L Missense

CNTLN 6T g.chr9: 17457560 A>T K1051N Missense

COL5A1 6T g.chr9: 137710896 A>T - Splice site

COL6A3 6T g.chr2: 238271909 A>T L2017Q Missense

COL6A3 6T g.chr2: 238265910 A>T L43Q Missense

CPS1 6T g.chr2: 211542689 A>T S1501C Missense

DAPK1 6T g.chr9: 90296322 A>T T669S Missense

DAPK1 6T g.chr9: 90321813 A>T K1276M Missense

FLG 6T g.chr1: 152280558 A>T H2268Q Missense

GPR180 6T g.chr13: 95275557 A>T - Splice site

HERC2 6T g.chr15: 28422635 A>T L3062V Missense

LAMC2 6T g.chr1: 183196728 A>T D455V Missense

LRFN5 6T g.chr14: 42356084 A>T T86S Missense

LRP1B 6T g.chr2: 141027893 A>T S4389T Missense

LRRC7 6T g.chr1: 70484405 A>G K404E Missense

NF1 6T g.chr17: 29556961 A>T S987C Missense

NF1 6T g.chr17: 29509652 A>T K286I Missense

NRP1 6T g.chr10: 33475318 A>T C721S Missense

PCNX 6T g.chr14: 71479729 A>T R936X Nonsense

PDCD11 6T g.chr10: 105173795 A>T R420X Nonsense

PIK3CA 6T g.chr3: 178952085 A>T H1047L Missense

PIKFYVE 6T g.chr2: 209190552 A>T Q1006L Missense

PPFIA1 6T g.chr11: 70184484 A>T Q499L Missense

ROS1 6T g.chr6: 117715413 A>T L359Q Missense

ADAMTSL1 6T g.chr9: 84690254 N1456Y Missense

SCN10A 6T g.chr3: 38812891 A>T F160I Missense

SEZ6L 6T g.chr22: 26689017 A>T E247V Missense

SLC22A14 6T g.chr3: 38347971 A>T T152S Missense

TRPM6 6T g.chr9: 77411805 A>T I748N Missense

ABCA9 6T g.chr17: 67028333 A>T L454H Missense

KCNA2 6T g.chr1: 111146380 A>T F342Y Missense

SCN10A 6T g.chr3: 38812891 A>T F160I Missense

SLC12A2 6T g.chr5: 127484544 A>T L660F Missense

SLC22A14 6T g.chr3: 38347971 A>T T152S Missense

TRPM1 6T g.chr15: 31294399 A>T S1480T Missense

TRPM6 6T g.chr9: 77411805 A>T I748N Missense

ATP1A2 6T g.chr1: 160105048 A>T Q693L Missense

CACNA1S 6T g.chr1: 201046136 A>T L580H Missense

CACNG4 6T g.chr17: 65014347 A>T E88V Missense

KCTD7 6T g.chr7: 66098401 A>T Y95F Missense

SLC17A4 6T g.chr6: 25769280 A>T Q53H Missense

SLC6A16 6T g.chr19: 49793914 A>T M630K Missense

SLC4A1AP 6T g.chr2: 27888030 A>T K297X Nonsense

ATP6V0A4 6T g.chr7: 138444559 A>T C193S Missense

SCN3A 6T g.chr2: 165986519 A>T C951X Nonsense

IGSF10 6T g.chr3: 151155382 C>T V2323M Missense

SCN1A 6T g.chr2: 166848516 C>T G1746R Missense

RIPK1 6T g.chr6: 3105964 C>T P419S Missense

SCN1A 6T g.chr2: 166848516 C>T G1746R Missense

SNTG1 6T g.chr8: 51314780 G>A G13E Missense

SORBS2 6T g.chr4: 186515083 G>A R1031C Missense

TAS1R1 6T g.chr1: 6639019 G>A R634H Missense

SLC6A2 6T g.chr16: 55703518 G>T G106W Missense

KDM6A 6T g.chrX: 44950111 T>A - Splice site

DNAH9 6T g.chr17: 11783478 T>A L3521Q Missense

MYO5C 6T g.chr15: 52556407 T>A N343Y Missense

TP53 6T g.chr17: 7578291 T>A - Splice site

ANKRD26 6T g.chr10: 27368093 T>A - Splice site

IGSF10 6T g.chr3: 151163661 T>A T1370S Missense

SH3TC2 6T g.chr5: 148427447 T>A Q86L Missense

SPG11 6T g.chr15: 44921537 T>A R595S Missense

SPTA1 6T g.chr1: 158653252 T>A E100V Missense

SYNE1 6T g.chr6: 152708252 T>A Q2814H Missense

SYNE1 6T g.chr6: 152738003 T>A S1857C Missense

USH2A 6T g.chr1: 216108136 T>A - Splice site

ALS2 6T g.chr2: 202587806 T>A Y1221F Missense

CHD6 6T g.chr20: 40122545 G>T D378Y Missense

CASK 6T g.chrX: 41428922 T>A - Splice site

COL6A3 6T g.chr2: 238287413 T>A Q788L Missense

FAM5B 6T g.chr1: 177250326 T>A S672T Missense

HERC2 6T g.chr15: 28380829 T>A T4009S Missense

ITPR2 6T g.chr12: 26640004 T>A - Splice site

MYH1 6T g.chr17: 10395849 T>A R1902W Missense

MYO5B 6T g.chr18: 47479655 T>A Q576L Missense

RB1CC1 6T g.chr8: 53573219 T>A - Splice site

RPE65 6T g.chr1: 68912474 T>A E55V Missense

RPE65 6T g.chr1: 68904766 T>A - Splice site

SCN2A 6T g.chr2: 166170519 T>A Y428X Nonsense

TG 6T g.chr8: 133961126 T>A V1780E Missense

UBR4 6T g.chr1: 19494518 T>A - Splice site

UBR4 6T g.chr1: 19407930 T>A E5049V Missense

WDR72 6T g.chr15: 53901780 T>A E961V Missense

ABCC11 6T g.chr16: 48226622 T>A S839C Missense

ATP6V1C2 6T g.chr2: 10923387 T>A - Missense

ATP8B2 6T g.chr1: 154310133 T>A C383S Missense

KCNQ5 6T g.chr6: 73904885 T>A S849R Missense

ATP8B2 6T g.chr1: 154313388 T>A C398S Missense

KCNJ1 6T g.chr11: 128709031 T>A T389S Missense

SLC12A3 6T g.chr16: 56918015 T>A I575N Missense

SLC12A5 6T g.chr20: 44664504 T>A V146E Missense

SLC17A1 6T g.chr6: 25813324 T>A - Splice site

SLC17A8 6T g.chr12: 100790131 T>A H204Q Missense

SLC22A8 6T g.chr11: 62763484 T>C Y96C Missense

SLC5A6 6T g.chr2: 27429792 T>A N138Y Missense

SLC6A13 6T g.chr12: 344372 T>A T239S Missense

SCN2A 6T g.chr2: 166170519 T>A Y428X Nonsense

SLC4A5 6T g.chr2: 74483029 T>A K300X Nonsense

KCNQ5 6T g.chr6: 73787616 T>A - Splice site

SLC12A6 6T g.chr15: 34547459 T>A - Splice site

SLC5A5 6T g.chr19: 17999266 T>A - Splice site

DCHS2 6T g.chr4: 155155838 T>A E2867D Missense

ARID1A 6T g.chr1: 27097616 A>T K1069X Nonsense

SETX 6T g.chr9: 135158647 A>T - Splice site

CREBBP 6T g.chr16: 3786686 T>A K1509X Nonsense

KDM6A 9T g.chrX: 44928975 A>T Q692L Missense

DNAH9 9T g.chr17: 11757413 A>T R3201W Missense

DNAH9 9T g.chr17: 11511538 A>T Q170H Missense

MYO5C 9T g.chr15: 52571196 A>T M108K Missense

GRIN3A 9T g.chr9: 104449238 A>T L315H Missense

IGSF10 9T g.chr3: 151162912 A>T S1619R Missense

MBD5 9T g.chr2: 149247504 A>T S1202C Missense

SH3TC2 9T g.chr5: 148407581 A>T Y572N Missense

SPTA1 9T g.chr1: 158653289 A>T - Splice site

SYNE2 9T g.chr14: 64465690 A>T T1138S Missense

USH2A 9T g.chr1: 215847599 A>T W4552R Missense

ARMC3 9T g.chr10: 23248385 A>T Y140F Missense

CASK 9T g.chrX: 41485949 A>T V308E Missense

COL5A1 9T g.chr9: 137734011 A>T K1793N Missense

CPS1 9T g.chr2: 211476878 A>T Q816L Missense

DAPK1 9T g.chr9: 90312118 A>T - Splice site

FLG2 9T g.chr1: 152323217 A>T Y2349N Missense

GPR180 9T g.chr13: 95275532 A>T E355V Missense

ADAMTSL1 9T g.chr9: 84592659 A>T H664L Missense

HMCN1 9T g.chr1: 186055437 A>T I2982F Missense

HMCN1 9T g.chr1: 186151407 A>T K5468X Nonsense

LAMA1 9T g.chr18: 6985242 A>T L1885Q Missense

LAMC2 9T g.chr1: 183209276 A>C E1057D Missense

MACF1 9T g.chr1: 39782061 A>T - Splice site

MACF1 9T g.chr1: 39910443 A>T E4957V Missense

MACF1 9T g.chr1: 39833821 A>T - Splice site

NF1 9T g.chr17: 29509547 A>G D251G Missense

PCNX 9T g.chr14: 71568795 A>T H1893L Missense

PDCD11 9T g.chr10: 105183303 A>C Q884P Missense

PDE3B 9T g.chr11: 14852316 A>T K627M Missense

PPFIA1 9T g.chr11: 70201892 A>T Q821H Missense

RELN 9T g.chr7: 103206814 A>T M1598K Missense

RELN 9T g.chr7: 103214542 A>C - Splice site

RELN 9T g.chr7: 103163883 A>T I2482K Missense

RIPK1 9T g.chr6: 3085645 A>T - Splice site

SCN10A 9T g.chr3: 38763750 A>T - Splice site

SLC22A14 9T g.chr3: 38347882 A>T Q122L Missense

SNTG1 9T g.chr8: 51449364 A>T S226C Missense

SPEG 9T g.chr2: 220352936 A>T I2588F Missense

SRCAP 9T g.chr16: 30740854 A>T T2030S Missense

TRPM6 9T g.chr9: 77397732 A>T I986K Missense

TRPM6 9T g.chr9: 77415213 A>T M732K Missense

ABCD2 9T g.chr12: 39980033 A>T H571Q Missense

ATP11A 9T g.chr13: 113490562 A>T - Splice site

ATP1A4 9T g.chr1: 160141428 A>G T579A Missense

KCNH7 9T g.chr2: 163302648 A>T N478K Missense

KCNJ6 9T g.chr21: 38997661 A>T Y358N Missense

SCN10A 9T g.chr3: 38763750 A>T - Splice site

SLC17A3 9T g.chr6: 25862661 A>T C35S Missense

SLC19A3 9T g.chr2: 228567028 A>T C3S Missense

SLC22A14 9T g.chr3: 38347882 A>T Q122L Missense

SLC47A2 9T g.chr17: 19584847 A>T M448K Missense

SLC7A11 9T g.chr4: 139144472 A>T V176E Missense

TRPM6 9T g.chr9: 77415213 A>T M732K Missense

TRPM6 9T g.chr9: 77397732 A>T I986K Missense

ATP2B3 9T g.chrX: 152818503 A>G I612V Missense

ATP6V0A1 9T g.chr17: 40620122 A>T - Splice site

ATP6V0A4 9T g.chr7: 138433945 A>T Y383N Missense

ATP8A1 9T g.chr4: 42571217 A>T V434D Missense

CACNA1F 9T g.chrX: 49070726 A>T Y1212N Missense

KCNB2 9T g.chr8: 73849709 A>T S707C Missense

SLC20A1 9T g.chr2: 113410198 A>T Q66H Missense

SLC25A14 9T g.chrX: 129492624 A>T Q170L Missense

SLC45A2 9T g.chr5: 33947323 A>T L438Q Missense

TRPM4 9T g.chr19: 49703872 A>T K928M Missense

KCNH8 9T g.chr3: 19491657 A>T R479X Nonsense

SLC10A4 9T g.chr4: 48487157 A>T K267X Nonsense

AQP10 9T g.chr1: 154295715 A>T - Splice site

SLC22A3 9T g.chr6: 160831759 A>T - Splice site

TRPC5 9T g.chrX: 111155885 A>T C178X Nonsense

USH2A 9T g.chr1: 216011333 C>A R3124I Missense

APOB 9T g.chr2: 21232009 C>T M2577I Missense

SLC6A17 9T g.chr1: 110717451 C>T R208X Nonsense

FLG 9T g.chr1: 152282665 G>T T1566N Missense

SCN2A 9T g.chr2: 166246035 G>A V1907M Missense

SORBS2 9T g.chr4: 186508806 G>A P1093L Missense

SCN2A 9T g.chr2: 166246035 G>A V1907M Missense

SLC5A5 9T g.chr19: 17985491 G>T W165L Missense

DCHS2 9T g.chr4: 155253631 T>A - Splice site

MYO5C 9T g.chr15: 52532060 T>A K858M Missense

TP53 9T g.chr17: 7579329 T>A K120X Nonsense

ANKRD26 9T g.chr10: 27342321 T>A - Splice site

ATRX 9T g.chrX: 76944313 T>A - Splice site

IGSF10 9T g.chr3: 151162889 T>A Q1627L Missense

LRRK2 9T g.chr12: 40646807 T>A L426X Nonsense

SCN1A 9T g.chr2: 166897815 T>A N770Y Missense

SPG11 9T g.chr15: 44888412 T>A K1435X Nonsense

SYNE1 9T g.chr6: 152754941 T>A S1484C Missense

SYNE1 9T g.chr6: 152652963 T>A Q4286L Missense

ALS2 9T g.chr2: 202592003 T>A - Splice site

ARMC3 9T g.chr10: 23295742 T>A S23T Missense

COL6A3 9T g.chr2: 238243488 T>A S3004C Missense

CPS1 9T g.chr2: 211469972 T>A - Splice site

CUBN 9T g.chr10: 16893316 T>A N3194I Missense

DDX60 9T g.chr4: 169195161 T>A Y793F Missense

EPHA7 9T g.chr6: 94120446 T>A Y202F Missense

HERC2 9T g.chr15: 28380628 T>A R4076X Nonsense

IKBKAP 9T g.chr9: 111679834 T>A E286V Missense

IKBKAP 9T g.chr9: 111653686 T>A - Splice site

ITPR2 9T g.chr12: 26640123 T>A Y1811F Missense

ITPR2 9T g.chr12: 26648195 T>A - Splice site

KRTAP10-11 9T g.chr21: 46066685 T>A C104S Missense

LRFN5 9T g.chr14: 42360707 T>A I547N Missense

MAGI1 9T g.chr3: 65361589 T>A - Splice site

MLL3 9T g.chr7: 151877842 T>A Q2368L Missense

MYH1 9T g.chr17: 10397768 T>A - Splice site

PIK3C2A 9T g.chr11: 17190802 T>A S163C Missense

RB1CC1 9T g.chr8: 53574107 T>A H449L Missense

RELN 9T g.chr7: 103207089 T>G K1569T Missense

RELN 9T g.chr7: 103124129 T>A Q3384H Missense

RPE65 9T g.chr1: 68904738 T>A K295N Missense

SCN2A 9T g.chr2: 166245507 T>A C1731S Missense

SCN2A 9T g.chr2: 166245286 T>A L1657H Missense

SEZ6L 9T g.chr22: 26736587 T>A M734K Missense

SNTG1 9T g.chr8: 51621455 T>A Y401N Missense

SORBS2 9T g.chr4: 186536279 T>A I892F Missense

STAT2 9T g.chr12: 56742590 T>A - Splice site

UBR4 9T g.chr1: 19518874 T>A - Splice site

ABCA10 9T g.chr17: 67187398 T>A I644F Missense

ABCA8 9T g.chr17: 66928642 T>A H195L Missense

ABCB1 9T g.chr7: 87196278 T>A Y118F Missense

ABCC4 9T g.chr13: 95686889 T>A E1280D Missense

ATP12A 9T g.chr13: 25274951 T>A M591K Missense

ATP5E 9T g.chr20: 57605482 T>A Y12F Missense

ATP6V1H 9T g.chr8: 54682222 T>A R377S Missense

CATSPER1 9T g.chr11: 65793850 T>A M1L Missense

CATSPER4 9T g.chr1: 26527439 T>A L369H Missense

CATSPERB 9T g.chr14: 92102815 T>A S566C Missense

KCNH1 9T g.chr1: 211280614 T>A Q62L Missense

SCN1A 9T g.chr2: 166897815 T>A N770Y Missense

SCN2A 9T g.chr2: 166245507 T>A C1731S Missense

SCN2A 9T g.chr2: 166245286 T>A L1657H Missense

SCN8A 9T g.chr12: 52200293 T>A Y1675N Missense

SCN9A 9T g.chr2: 167055808 T>A M1770L Missense

SLC11A2 9T g.chr12: 51389416 T>A Q329L Missense

SLC38A11 9T g.chr2: 165755161 T>A H314L Missense

TRPA1 9T g.chr8: 72987592 T>A Q18L Missense

TRPC4 9T g.chr13: 38211320 T>A Q890L Missense

TRPM6 9T g.chr9: 77442805 T>A M244L Missense

ABCG1 9T g.chr21: 43711705 T>A M543K Missense

ATP13A5 9T g.chr3: 193029642 T>A Q803L Missense

ATP7B 9T g.chr13: 52539118 T>A T587S Missense

KCNB2 9T g.chr8: 73848740 T>A Y384N Missense

KCNB2 9T g.chr8: 73480015 T>A S16T Missense

KCND2 9T g.chr7: 119914973 T>A L96Q Missense

KCNH6 9T g.chr17: 61619736 T>A Y697N Missense

KCNK10 9T g.chr14: 88652253 T>A S420C Missense

KCTD1 9T g.chr18: 24039649 T>A N792Y Missense

SLC1A2 9T g.chr11: 35327669 T>A M228L Missense

SLC25A12 9T g.chr2: 172666140 T>A Q18L Missense

SLC30A8 9T g.chr8: 118174094 T>A S230R Missense

ABCB4 9T g.chr7: 87046833 T>A - Splice site

ATP13A4 9T g.chr3: 193232515 T>A E69V Missense

SLC39A6 9T g.chr18: 33703455 T>A - Splice site

CDH10 9T g.chr5: 24491891 T>C H557R Missense

KDM6A 10T g.chrX: 44966653 A>T - Splice site

DCHS2 10T g.chr4: 155157973 A>T Y2156N Missense

SCN1A 10T g.chr2: 166912988 A>T C136S Missense

COL5A1 10T g.chr9: 137593015 A>T - Splice site

CUBN 10T g.chr10: 16919073 A>T C2977S Missense

HMCN1 10T g.chr1: 186121990 A>T E5002V Missense

MACF1 10T g.chr1: 39896469 A>T R4182S Missense

MYH2 10T g.chr17: 10428845 A>T L1487Q Missense

PIKFYVE 10T g.chr2: 209200622 A>T - Splice site

SNTG1 10T g.chr8: 51362289 A>T - Splice site

TNKS 10T g.chr8: 9565879 A>T - Splice site

ABCC9 10T g.chr12: 22089572 A>T Y13N Missense

SCN1A 10T g.chr2: 166912988 A>T C136S Missense

SLC13A1 10T g.chr7: 122757637 A>T L513H Missense

SLC36A1 10T g.chr5: 150847473 A>T Q237L Missense

ATP6V0A4 10T g.chr7: 138444626 A>T - Splice site

MBD5 10T g.chr2: 149247139 C>T P1080L Missense

FAM5B 10T g.chr1: 177226406 C>G N185K Missense

HERC2 10T g.chr15: 28436146 C>T D2872N Missense

ATP2B4 10T g.chr1: 203678616 C>T P582L Missense

HMCN1 10T g.chr1: 186024775 G>C M2371I Missense

MACF1 10T g.chr1: 39549979 G>A R30Q Missense

KCNB1 10T g.chr20: 47989776 G>T P774H Missense

MYO5C 10T g.chr15: 52511984 T>A S1253C Missense

TP53 10T g.chr17: 7577100 T>A R280X Nonsense

ATRX 10T g.chrX: 76939787 T>A S321C Missense

LRRK2 10T g.chr12: 40668734 T>A L627Q Missense

SH3TC2 10T g.chr5: 148408046 T>A R417X Nonsense

SPTA1 10T g.chr1: 158605793 T>C D1781G Missense

ALS2 10T g.chr2: 202608999 T>A R718W Missense

CUBN 10T g.chr10: 16957975 T>A H2352L Missense

STAT2 10T g.chr12: 56750063 T>A E46D Missense

TAS1R1 10T g.chr1: 6639211 T>A L698Q Missense

TMEM132D 10T g.chr12: 129566383 T>A Q615L Missense

WDR72 10T g.chr15: 53997395 T>A N380Y Missense

ZNF608 10T g.chr5: 123983113 T>A Q988H Missense

SCN11A 10T g.chr3: 38888399 T>C Y1721C Missense

TRPM3 10T g.chr9: 73225527 T>A - Splice site

SETX 10T g.chr9: 135205772 T>A N405Y Missense

C3orf35 13T g.chr3: 37476357 A>T K83N Missense

SERAC1 13T g.chr6: 158569947 A>T V102E Missense

ITGA11 13T g.chr15: 68605190 A>T L965Q Missense

MRPS5P3 13T g.chr5: 126479587 A>G C15G Missense

GPKOW 13T g.chrX: 48972313 A>T L358M Missense

RP11-543B16.2 13T g.chr1: 211380761 A>T R61W Missense

MYOM3 13T g.chr1: 24413244 A>T L563Q Missense

KBTBD5 13T g.chr3: 42730146 A>T Y453F Missense

PARVG 13T g.chr22: 44602220 A>T S304C Missense

KCNG4 13T g.chr16: 84270838 A>T L85Q Missense

REL 13T g.chr2: 61118816 A>T - Splice site

ABCB8 13T g.chr7: 150733629 A>T - Splice site

NLRC5 13T g.chr16: 57079358 A>T - Splice site

TNR 13T g.chr1: 175360422 A>T - Splice site

RBBP7 13T g.chrX: 16863081 A>T - Splice site

MYH14 13T g.chr19: 50812935 A>T Q1967L Missense

SIN3B 13T g.chr19: 16962260 A>T Q255L Missense

PIK3R6 13T g.chr17: 8722405 A>T F663Y Missense

RNF112 13T g.chr17: 19319213 A>T M541L Missense

SH2D4B 13T g.chr10: 82394257 A>T H400L Missense

KIF1A 13T g.chr2: 241664775 A>T M1289K Missense

APOB 13T g.chr2: 21235181 A>T L1520Q Missense

RP11-302I18.2 13T g.chr1: 220487703 A>T L17H Missense

ZFYVE19 13T g.chr15: 41105034 A>T R322W Missense

PANX1 13T g.chr11: 93862626 A>T I50F Missense

NPTX1 13T g.chr17: 78444723 A>T C397S Missense

RP11-432I13.3 13T g.chr10: 45742206 A>T W56R Missense

KCNMA1 13T g.chr10: 78846244 A>T - Splice site

USH2A 13T g.chr1: 216465699 A>T L553X Nonsense

ODF2 13T g.chr9: 131235199 A>T K173X Nonsense

WNT2 13T g.chr7: 116918365 A>T C309X Nonsense

SHROOM4 13T g.chrX: 50341538 A>T - Splice site

TET1 13T g.chr10: 70404793 A>T E769D Missense

MAST1 13T g.chr19: 12976569 A>T T615S Missense

PTGFR 13T g.chr1: 78963602 A>T Y281F Missense

CDH23 13T g.chr10: 73501502 A>T I1602F Missense

C1orf129 13T g.chr1: 170931064 A>T N108Y Missense

DNTTIP1 13T g.chr20: 44439543 A>T E272V Missense

DMD 13T g.chrX: 31497171 A>G L2866P Missense

HS2ST1 13T g.chr1: 87570369 A>T K354M Missense

TGFBI 13T g.chr5: 135394898 A>T S600C Missense

GTF2H3 13T g.chr12: 124139477 A>C K165Q Missense

CXCR2P1 13T g.chr2: 218925611 A>T L37Q Missense

ABCG8 13T g.chr2: 44099141 A>T S331C Missense

UBXN2A 13T g.chr2: 24199914 A>T S86C Missense

ARHGEF10L 13T g.chr1: 17953828 A>T M472L Missense

DNM3 13T g.chr1: 172037957 A>T - Splice site

PTEN 13T g.chr10: 89720654 A>T K269X Nonsense

CHD2 13T g.chr15: 93486296 A>T Q350H Missense

ARID1B 13T g.chr6: 157528178 A>G H1950R Missense

SORL1 13T g.chr11: 121454205 C>T R1207X Nonsense

COL27A1 13T g.chr9: 117002747 C>T R939C Missense

COPS7B 13T g.chr2: 232653609 C>G P110R Missense

SLC24A4 13T g.chr14: 92920322 C>T T303I Missense

NKX2-2 13T g.chr20: 21494154 C>A A52S Missense

DUSP27 13T g.chr1: 167097723 C>T Q1119X Nonsense

SLC18A1 13T g.chr8: 20038417 C>T R20Q Missense

HDAC7 13T g.chr12: 48191205 G>C P180R Missense

OR2AG2 13T g.chr11: 6789911 G>C S93C Missense

RP11-572H4.2 13T g.chr9: 31253974 G>A V26M Missense

C1orf54 13T g.chr1: 150248149 G>C - Splice site

CABYR 13T g.chr18: 21736705 G>A E414K Missense

ROBO4 13T g.chr11: 124766568 T>A - Splice site

PTPN3 13T g.chr9: 112190951 T>A K260X Nonsense

WDR45 13T g.chrX: 48934338 T>A K104X Nonsense

ATRX 13T g.chrX: 76875968 T>A I1723F Missense

MUC5B 13T g.chr11: 1155619 T>A C145S Missense

ABCC4 13T g.chr13: 95673932 T>A Y1292F Missense

GP2 13T g.chr16: 20334208 T>A Q213L Missense

EIF2C2 13T g.chr8: 141549543 T>A H682L Missense

BZRAP1 13T g.chr17: 56386657 T>A S1326C Missense

HAPLN4 13T g.chr19: 19371792 T>A Q105L Missense

P2RX5 13T g.chr17: 3593938 T>A S133C Missense

RPS6KA2 13T g.chr6: 166836808 T>A Q568L Missense

C15orf38 13T g.chr15: 90451608 T>A I69F Missense

AC005400.1 13T g.chr7: 35187404 T>A - Splice site

MED12 13T g.chrX: 70356734 T>A Y1802X Nonsense

TECPR1 13T g.chr7: 97861181 T>A T637S Missense

MYF5 13T g.chr12: 81111087 T>A M82K Missense

SLC7A3 13T g.chrX: 70149486 T>A Y121F Missense

RP11-226L15.1 13T g.chr1: 159990364 T>A V51E Missense

ABP1 13T g.chr7: 150554807 T>G C417G Missense

PTCHD3 13T g.chr10: 27702383 T>A N266I Missense

TEK 13T g.chr9: 27218797 T>A W1029R Missense

UNC80 13T g.chr2: 210840906 T>A L2818H Missense

GRIPAP1 13T g.chrX: 48837713 T>A K615M Missense

NEGR1 13T g.chr1: 72400854 T>A Q106L Missense

ATG2A 13T g.chr11: 64664084 T>A Q1759H Missense

AJ239329.1 13T g.chrX: 37351507 T>A H47Q Missense

SMC1A 13T g.chrX: 53436189 T>A E450V Missense

ANKRD31 13T g.chr5: 74484441 T>A S481C Missense

DUSP27 13T g.chr1: 167095806 T>A W480R Missense

DMKN 13T g.chr19: 35990861 T>A - Splice site

C2CD3 13T g.chr11: 73844604 T>A - Splice site

SPG11 13T g.chr15: 44914098 T>A K827X Nonsense

HDAC9 13T g.chr7: 18767270 T>A M600K Missense

CHD1 13T g.chr5: 98228232 T>A - Splice site

DCHS2 13T g.chr4: 155156196 T>A E2748V Missense

KDM6A 13T g.chrX: 44949979 A>T K1250X Nonsense

ADAMTSL1 13T g.chr9: 18770776 A>T - Splice site

KDM6A 20T g.chrX: 44918297 A>T R308W Missense

DNAH9 20T g.chr17: 11603157 A>T E1661V Missense

LRRK2 20T g.chr12: 40716170 A>T E1789D Missense

SCN1A 20T g.chr2: 166896089 A>T F800L Missense

SYNE2 20T g.chr14: 64520317 A>T K3229I Missense

APOB 20T g.chr2: 21231467 A>T L2758Q Missense

APOB 20T g.chr2: 21252632 A>G M499T Missense

APOB 20T g.chr2: 21230211 A>T Y3177N Missense

COL6A3 20T g.chr2: 238275367 A>T H1821Q Missense

DAPK1 20T g.chr9: 90272978 A>T N620I Missense

FAM5B 20T g.chr1: 177249988 A>T Y559F Missense

LRFN5 20T g.chr14: 42356415 A>T K196M Missense

LRFN5 20T g.chr14: 42361138 A>T T691S Missense

MYH1 20T g.chr17: 10411976 A>T F534Y Missense

PCNX 20T g.chr14: 71502846 A>T H1280L Missense

PDCD11 20T g.chr10: 105183312 A>T E887V Missense

PDE3B 20T g.chr11: 14865386 A>T R778S Missense

PIKFYVE 20T g.chr2: 209200846 A>T Q1481L Missense

SCN2A 20T g.chr2: 166226646 A>T D1229V Missense

SRCAP 20T g.chr16: 30747572 A>T K2261X Nonsense

SRCAP 20T g.chr16: 30748505 A>T S2382C Missense

TNKS 20T g.chr8: 9590849 A>T L736F Missense

TRPM6 20T g.chr9: 77377164 A>T C1475S Missense

TRPM6 20T g.chr9: 77391042 A>T F10I Missense

WDR72 20T g.chr15: 53889332 A>T L1031Q Missense

ATP10D 20T g.chr4: 47578794 A>T Y1124F Missense

ATP11B 20T g.chr3: 182605478 A>T L940F Missense

ATP6V0A2 20T g.chr12: 124233220 A>T K608M Missense

CACNG2 20T g.chr22: 36960768 A>T F201Y Missense

KCNAB2 20T g.chr1: 6158599 A>T N357Y Missense

KCNQ5 20T g.chr6: 73751755 A>T R196W Missense

SCN4B 20T g.chr11: 118014760 A>T V84E Missense

SLC2A2 20T g.chr3: 170727818 A>T M142K Missense

SLC35A1 20T g.chr6: 88187080 A>T - Splice site

SLC44A1 20T g.chr9: 108127919 A>T - Splice site

SLC4A1 20T g.chr17: 42337223 A>T L188Q Missense

ABCA12 20T g.chr2: 215823144 A>T S1992T Missense

ABCA4 20T g.chr1: 94508359 A>G S1096P Missense

ABCC8 20T g.chr11: 17432181 A>T L859Q Missense

ABCD2 20T g.chr12: 39998687 A>T W428R Missense

AQP4 20T g.chr18: 24442173 A>T S140R Missense

ATP11B 20T g.chr3: 182583486 A>T - Splice site

CACNG7 20T g.chr19: 54445336 A>T E206V Missense

NALCN 20T g.chr13: 101714399 A>T L1559Q Missense

SLC38A8 20T g.chr16: 84050785 A>T S305T Missense

SLC45A1 20T g.chr1: 8390964 A>T S471C Missense

SLC6A15 20T g.chr12: 85260867 A>T L534Q Missense

SLC7A3 20T g.chrX: 70147714 A>T L326H Missense

SLC8A2 20T g.chr19: 47944660 A>T Y601N Missense

TRPM6 20T g.chr9: 77377164 A>T C1475S Missense

TRPM8 20T g.chr2: 234894401 A>T D944V Missense

ATP11A 20T g.chr13: 113470418 A>T K155X Nonsense

KCTD5 20T g.chr16: 2752423 A>T K207X Nonsense

ABCG1 20T g.chr21: 43645779 A>T - Splice site

SLC25A48 20T g.chr5: 135186112 A>T - Splice site

DNAH9 20T g.chr17: 11757708 C>T A3299V Missense

USH2A 20T g.chr1: 216363588 C>A G1458V Missense

COL6A3 20T g.chr2: 238285856 C>A V877L Missense

IKBKAP 20T g.chr9: 111640407 C>A K1241N Missense

RB1CC1 20T g.chr8: 53540721 C>A V1503L Missense

SEZ6L 20T g.chr22: 26702041 C>T T482M Missense

ZNF521 20T g.chr18: 22807406 C>T R159H Missense

SLC9A3 20T g.chr5: 484719 C>T R283H Missense

KCNF1 20T g.chr2: 11053888 C>T R446C Missense

KCNJ4 20T g.chr22: 38823596 C>T R181Q Missense

SLC29A3 20T g.chr10: 73082758 C>T R83C Missense

SEZ6L 20T g.chr22: 26771655 G>C - Splice site

AQP11 20T g.chr11: 77301461 G>A G142S Missense

MYO5C 20T g.chr15: 52529767 T>A Q927L Missense

TP53 20T g.chr17: 7578208 T>A H214L Missense

ANKRD26 20T g.chr10: 27324267 T>A R1038X Nonsense

ATRX 20T g.chrX: 76952194 T>A - Splice site

COL11A1 20T g.chr1: 103488300 T>A - Splice site

IGSF10 20T g.chr3: 151163277 T>A M1498L Missense

LRRK2 20T g.chr12: 40734258 T>A - Splice site

SCN1A 20T g.chr2: 166901676 T>A E513D Missense

SH3TC2 20T g.chr5: 148386493 T>A Y1209F Missense

SPG11 20T g.chr15: 44890850 T>A S1291C Missense

SPTA1 20T g.chr1: 158645938 T>A T369S Missense

SYNE1 20T g.chr6: 152646266 T>A K5204X Nonsense

USH2A 20T g.chr1: 216371708 T>A M1344L Missense

CASK 20T g.chrX: 41394169 T>A K733M Missense

CPS1 20T g.chr2: 211456674 T>A V362D Missense

CUBN 20T g.chr10: 17171671 T>A R32X Nonsense

CUBN 20T g.chr10: 17157479 T>A Q237H Missense

DDX60 20T g.chr4: 169177011 T>A L1136F Missense

EPHA7 20T g.chr6: 94120890 T>A - Splice site

ITPR2 20T g.chr12: 26875388 T>A N156I Missense

KRTAP10-11 20T g.chr21: 46066685 T>A C104S Missense

LAMA1 20T g.chr18: 6958623 T>A E2606V Missense

LAMC2 20T g.chr1: 183196772 T>A C470S Missense

LRFN5 20T g.chr14: 42357204 T>A L459H Missense

MAGI1 20T g.chr3: 65433698 T>A - Splice site

MYH2 20T g.chr17: 10443262 T>A E377V Missense

MYH2 20T g.chr17: 10440607 T>G K614Q Missense

NRP1 20T g.chr10: 33552708 T>A Y175F Missense

PIK3C2A 20T g.chr11: 17139093 T>A Q1054L Missense

ROS1 20T g.chr6: 117609792 T>A R2303W Missense

RPE65 20T g.chr1: 68910270 T>G T147P Missense

SCN10A 20T g.chr3: 38797361 T>A H460L Missense

STAT2 20T g.chr12: 56740405 T>A K622M Missense

TAS2R1 20T g.chr5: 9630054 T>A N31Y Missense

TG 20T g.chr8: 133882039 T>A L81Q Missense

TMEM132D 20T g.chr12: 129566579 T>A - Splice site

TRPM6 20T g.chr9: 77397384 T>A S1035C Missense

ZNF608 20T g.chr5: 123980182 T>A E1293V Missense

ABCA10 20T g.chr17: 67183857 T>A L765F Missense

CACNA1S 20T g.chr1: 201030546 T>A Y1035F Missense

CACNA1S 20T g.chr1: 201009109 T>A E1824D Missense

KCNA10 20T g.chr1: 111060436 T>A Q325L Missense

KCNH8 20T g.chr3: 19389326 T>A L227H Missense

KCNJ12 20T g.chr17: 21319391 T>A L246Q Missense

SCN1A 20T g.chr2: 166901676 T>A E513D Missense

SLC12A8 20T g.chr3: 124826598 T>A R478W Missense

SLC27A6 20T g.chr5: 128302153 T>A V108E Missense

SLC30A4 20T g.chr15: 45814216 T>A R113W Missense

SLC34A2 20T g.chr4: 25674799 T>A L380Q Missense

SLC6A7 20T g.chr5: 149580653 T>A - Splice site

TRPC3 20T g.chr4: 122820799 T>A N839Y Missense

ABCA8 20T g.chr17: 66925823 T>A Y273F Missense

ATP13A4 20T g.chr3: 193185209 T>A Q337L Missense

CATSPER1 20T g.chr11: 65787795 T>A K686M Missense

KCNQ2 20T g.chr20: 62038530 T>A T696S Missense

SCN10A 20T g.chr3: 38797361 T>A H460L Missense

SLC12A3 20T g.chr16: 56913515 T>A L466H Missense

SLC13A3 20T g.chr20: 45224960 T>A Q188L Missense

SLC19A3 20T g.chr2: 228563932 T>A M167L Missense

SLC1A7 20T g.chr1: 53580467 T>A I132F Missense

SLC26A9 20T g.chr1: 205904913 T>A R12S Missense

SLC7A8 20T g.chr14: 23612372 T>A T184S Missense

SLC8A3 20T g.chr14: 70633441 T>A R567W Missense

SLC8A3 20T g.chr14: 70517824 T>A - Splice site

SLC9A9 20T g.chr3: 143185967 T>A M461L Missense

ABCC8 20T g.chr11: 17470203 T>A K398X Nonsense

ATP7B 20T g.chr13: 52511764 T>A K1251X Nonsense

SLCO1B1 20T g.chr12: 21358925 T>A C485X Nonsense

SLC12A3 20T g.chr16: 56904009 T>A - Splice site

SLC14A2 20T g.chr18: 43258991 T>A - Splice site

SLC7A8 20T g.chr14: 23597255 T>A K472X Nonsense

DCHS2 20T g.chr4: 155161818 A>T F1955L Missense

CHD6 20T g.chr20: 40162122 T>A S41C Missense

DAGLA 79T g.chr11: 61496472 A>T K281X Nonsense

MGAT5B 79T g.chr17: 74909775 A>T Q350H Missense

MYOZ3 79T g.chr5: 150042513 A>T K4X Nonsense

ASPSCR1 79T g.chr17: 79954475 A>T Q229L Missense

GIMAP5 79T g.chr7: 150439885 A>T S220C Missense

ABCC12 79T g.chr16: 48134841 A>T S994T Missense

B4GALNT3 79T g.chr12: 665985 A>T Q778L Missense

PRICKLE4 79T g.chr6: 41753148 A>T Q151L Missense

KBTBD5 79T g.chr3: 42730169 A>T M461L Missense

AUTS2 79T g.chr7: 70255695 A>T T1165S Missense

FAM135B 79T g.chr8: 139189617 A>T L359H Missense

TRAF3IP1 79T g.chr2: 239237760 A>T Q231L Missense

XIRP2 79T g.chr2: 168103624 A>T T1908S Missense

SLC24A3 79T g.chr20: 19664954 A>T M346L Missense

AATK 79T g.chr17: 79094987 A>T Y917N Missense

SOX11 79T g.chr2: 5832960 A>T E36V Missense

TIPRL 79T g.chr1: 168154055 A>G Y108C Missense

APOH 79T g.chr17: 64216826 A>C F150L Missense

GAPVD1 79T g.chr9: 128067358 A>T Q349L Missense

MMRN1 79T g.chr4: 90830441 A>T Y213F Missense

HHATL 79T g.chr3: 42735194 A>T L388Q Missense

PACSIN1 79T g.chr6: 34498289 A>T K321M Missense

RP13-228J13.10 79T g.chrX: 154539707 A>T N7Y Missense

DPP3 79T g.chr11: 66264867 A>T - Splice site

PAEP 79T g.chr9: 138457182 A>G - Splice site

COL11A1 79T g.chr1: 103469998 A>T - Splice site

USP47 79T g.chr11: 11901748 A>T R16X Nonsense

CEP250 79T g.chr20: 34061712 A>T Q469L Missense

ACAN 79T g.chr15: 89417121 A>T Q2461L Missense

FLYWCH1 79T g.chr16: 2979894 A>T M70L Missense

MYO18B 79T g.chr22: 26177749 A>T S754C Missense

STAB1 79T g.chr3: 52551593 A>T S1531C Missense

CUBN 79T g.chr10: 17024572 A>T C1536S Missense

CDH16 79T g.chr16: 66946629 A>T L407H Missense

ACAN 79T g.chr15: 89398382 A>T S856C Missense

MOXD1 79T g.chr6: 132619050 A>T L518H Missense

ADAMTS13 79T g.chr9: 136291398 A>T S207C Missense

C7 79T g.chr5: 40959685 A>T S542C Missense

IL22 79T g.chr12: 68647218 A>T L4Q Missense

CDH26 79T g.chr20: 58560054 A>T - Splice site

TNFRSF9 79T g.chr1: 7993305 A>T L199X Nonsense

NAV3 79T g.chr12: 78598880 A>T K2312X Nonsense

FMN2 79T g.chr1: 240256570 A>T Q387H Missense

DACH1 79T g.chr13: 72038864 A>T - Splice site

ARID1A 79T g.chr1: 27100818 A>T - Splice site

PTEN 79T g.chr10: 89692768 A>T - Splice site

SET 79T g.chr9: 131455250 A>T K174I Missense

CHD9 79T g.chr16: 53260328 A>G I649M Missense

ASPHD1 79T g.chr16: 29913022 C>G P244A Missense

EBF1 79T g.chr5: 158158119 C>A K361N Missense

COL12A1 79T g.chr6: 75804894 C>A - Splice site

MLL2 79T g.chr12: 49416411 C>A E5434X Nonsense

MAGEA11 79T g.chrX: 148794916 G>T - Splice site

DPM2 79T g.chr9: 130697930 G>A P20S Missense

METT10D 79T g.chr17: 2344819 G>T L255M Missense

CAD 79T g.chr2: 27460414 G>T - Splice site

TTLL8 79T g.chr22: 50480200 G>A P227L Missense

HRAS 79T g.chr11: 533875 G>T Q61K Missense

DNAH5 79T g.chr5: 13866393 T>A - Splice site

CHD6 79T g.chr20: 40065941 G>T D1347E Missense

COL6A3 79T g.chr2: 238275672 T>A K1720X Nonsense

MAML2 79T g.chr11: 95825260 T>A Q645H Missense

BPI 79T g.chr20: 36963945 T>A S17T Missense

ANKMY1 79T g.chr2: 241421606 T>G Q871P Missense

SCN1A 79T g.chr2: 166170484 A>T I417L Missense

ACVRL1 79T g.chr12: 52306296 T>A L13Q Missense

TLN1 79T g.chr9: 35713279 T>A Q1089L Missense

SIRPB1 79T g.chr20: 1600557 T>A S12C Missense

GADL1 79T g.chr3: 30898599 T>A Q82L Missense

MAPK15 79T g.chr8: 144804377 T>A S531T Missense

PRRT2 79T g.chr16: 29825053 T>A D226E Missense

GFOD2 79T g.chr16: 67709689 T>A H176L Missense

PKD1L1 79T g.chr7: 47941982 T>A K686N Missense

AC092718.1 79T g.chr16: 81150983 T>A R49W Missense

ASAP3 79T g.chr1: 23769010 T>A Q190L Missense

NFATC1 79T g.chr18: 77170957 T>A C215S Missense

SHANK2 79T g.chr11: 70333244 T>A S694C Missense

S1PR2 79T g.chr19: 10335163 T>A Y140F Missense

PTPRS 79T g.chr19: 5258140 T>A - Splice site

PYCR2 79T g.chr1: 226109073 T>A - Splice site

RASGEF1C 79T g.chr5: 179555612 T>A - Splice site

GS1-541M1.2 79T g.chrX: 26675416 T>A Q67L Missense

MEFV 79T g.chr16: 3293995 T>A K400M Missense

MAP7 79T g.chr6: 136698890 T>A - Splice site

PDE3B 79T g.chr11: 14839832 T>A D542E Missense

NFKBID 79T g.chr19: 36388650 T>A Y122F Missense

BEND3 79T g.chr6: 107391344 T>A S351C Missense

CNNM1 79T g.chr10: 101090451 T>A L436Q Missense

OLA1 79T g.chr2: 174943758 T>A I343F Missense

SCGB1C1 79T g.chr11: 193804 T>A L50M Missense

KDR 79T g.chr4: 55963840 T>A K868I Missense

IGLV7-46 79T g.chr22: 22724367 T>A L88H Missense

DHX15 79T g.chr4: 24538798 T>A L18F Missense

CELSR3 79T g.chr3: 48685734 T>A Y2313F Missense

AFAP1L2 79T g.chr10: 116057049 T>C K746R Missense

ECEL1 79T g.chr2: 233347320 T>A - Splice site

FGFR4 79T g.chr5: 176517643 T>A L115X Nonsense

GHRHR 79T g.chr7: 31011710 T>A - Splice site

NPHS1 79T g.chr19: 36332614 T>A - Splice site

NCOR1 79T g.chr17: 15968256 T>A R1677X Nonsense

ATRX 79T g.chrX: 76855240 T>A D1916V Missense

NCOA2 79T g.chr8: 71039266 T>A Q1233L Missense

LRRK2 79T g.chr12: 40668799 T>A - Splice site

DCHS2 79T g.chr4: 155225939 A>T F1374L Missense

USH2A 79T g.chr1: 216256802 A>T L1765H Missense

CREBBP 79T g.chr16: 3828009 T>A - Splice site

CDH10 79T g.chr5: 24509898 T>A T345S Missense

GATA1 80T g.chrX: 48650728 A>G - Splice site

DNAI2 80T g.chr17: 72306154 A>T - Splice site

WBP5 80T g.chrX: 102612541 A>T - Splice site

RP13-77O11.2 80T g.chrX: 52862803 A>T Y49X Nonsense

GUCY1B2 80T g.chr13: 51594629 A>T L353X Nonsense

PRRC2A 80T g.chr6: 31593091 A>T - Splice site

KIF19 80T g.chr17: 72356353 A>T - Splice site

AC036111.1 80T g.chr11: 55522736 A>T E25V Missense

PARP14 80T g.chr3: 122414327 A>T Q218L Missense

ZNF229 80T g.chr19: 44932813 A>T C715S Missense

SEMA3F 80T g.chr3: 50220092 A>T H260L Missense

ACSF2 80T g.chr17: 48541261 A>T M377L Missense

STK19 80T g.chr6: 31946765 A>T Y218F Missense

KCNH8 80T g.chr3: 19432059 A>T T300S Missense

GPR149 80T g.chr3: 154138891 A>T S520R Missense

ALOX12B 80T g.chr17: 7984263 A>T Y156N Missense

KRT8P14 80T g.chrX: 45491705 A>T Y422N Missense

UGT2B4 80T g.chr4: 70361452 A>T L43Q Missense

ZNF41 80T g.chrX: 47307450 A>T H573Q Missense

SYT9 80T g.chr11: 7324516 A>T H131L Missense

LAMB4 80T g.chr7: 107703374 A>T C1043S Missense

DBH 80T g.chr9: 136508539 A>T E250V Missense

GRM7 80T g.chr3: 7188338 A>T Q240L Missense

BMPER 80T g.chr7: 33976899 A>T - Splice site

C7orf63 80T g.chr7: 89929179 A>T - Splice site

IL21R 80T g.chr16: 27455861 A>T - Splice site

CTC-348L14.1 80T g.chr5: 82837510 A>T L29X Nonsense

GLI2 80T g.chr2: 121726477 A>T S148C Missense

RAD51L1 80T g.chr14: 69061307 A>T Q381L Missense

C14orf159 80T g.chr14: 91626745 A>T - Splice site

SLC22A10 80T g.chr11: 63065197 A>T - Splice site

USP13 80T g.chr3: 179479042 A>T - Splice site

USP30 80T g.chr12: 109509505 A>G H190R Missense

ARHGAP9 80T g.chr12: 57872524 A>T H111Q Missense

UNC5A 80T g.chr5: 176289631 A>T Q26L Missense

OTOG 80T g.chr11: 17632801 A>T Q1997L Missense

IMPG1 80T g.chr6: 76640727 A>T L729H Missense

KDM6A 80T g.chrX: 44896898 A>T - Splice site

NCOA7 80T g.chr6: 126202293 A>T S173C Missense

SETD1A 80T g.chr16: 30972738 A>T K133X Nonsense

LRRK2 80T g.chr12: 40699751 A>T K1314N Missense

HERC6 80T g.chr4: 89319318 A>T H350L Missense

FLT4 80T g.chr5: 180046770 C>A - Splice site

DSCR4 80T g.chr21: 39325190 C>A D117Y Missense

PDCD11 80T g.chr10: 105201680 C>G P1552R Missense

CACNA1G 80T g.chr17: 48674220 C>A A1065E Missense

SETD7 80T g.chr4: 140439168 C>G G264A Missense

WDR27 80T g.chr6: 170038699 G>A S602F Missense

RP5-1158E12.2 80T g.chrX: 45772886 G>C W68S Missense

HSPA7 80T g.chr1: 161576646 G>A R189Q Missense

ITFG2 80T g.chr12: 2927518 G>T - Splice site

HMCN1 80T g.chr1: 186010203 G>A R2080K Missense

CHD5 80T g.chr1: 6185888 G>C S1370X Nonsense

NPC2 80T g.chr14: 74946993 T>A - Splice site

INSL5 80T g.chr1: 67263854 T>A K84X Nonsense

FSIP2 80T g.chr2: 186661198 T>A L3201X Nonsense

HPX 80T g.chr11: 6461466 T>A R89X Nonsense

BCAS1 80T g.chr20: 52601879 T>A K363X Nonsense

KBTBD6 80T g.chr13: 41706643 T>A Q2L Missense

EPB41L4B 80T g.chr9: 111947779 T>A K803I Missense

RP11-118F2.3 80T g.chr9: 94789814 T>A L12H Missense

FAP 80T g.chr2: 163046259 T>C K486E Missense

SLC25A39 80T g.chr17: 42399905 T>A Q69L Missense

OR51B5 80T g.chr11: 5364529 T>A M76L Missense

IPCEF1 80T g.chr6: 154481085 T>A T399S Missense

HTR1A 80T g.chr5: 63257471 T>A T26S Missense

GSDMC 80T g.chr8: 130762242 T>A I403L Missense

MYOM2 80T g.chr8: 2021455 T>A F332Y Missense

ABCC9 80T g.chr12: 21981977 T>A K1195I Missense

GFRA1 80T g.chr10: 117825133 T>A Q401L Missense

GPR142 80T g.chr17: 72368656 T>A C436S Missense

PTGFR 80T g.chr1: 79002259 T>A C323S Missense

BCAR3 80T g.chr1: 94037273 T>A D643V Missense

LRRC66 80T g.chr4: 52861278 T>A Q637L Missense

POSTN 80T g.chr13: 38158986 T>A E325D Missense

SPOCK2 80T g.chr10: 73832299 T>C Y69C Missense

STAB2 80T g.chr12: 104098351 T>A C1287S Missense

DMD 80T g.chrX: 31676108 T>A - Splice site

MYH11 80T g.chr16: 15847367 T>A - Splice site

COL2A1 80T g.chr12: 48373349 T>A - Splice site

C1orf61 80T g.chr1: 156374287 T>A Q12L Missense

AP002512.1 80T g.chr11: 56216444 T>A C99X Nonsense

RP11-293F5.8 80T g.chr1: 21737997 T>C R153G Missense

TM6SF2 80T g.chr19: 19380986 T>A R133X Nonsense

AP001482.1 80T g.chr11: 88845898 T>A C6S Missense

ARHGAP17 80T g.chr16: 24979666 T>A - Splice site

TNR 80T g.chr1: 175334147 T>A - Splice site

TTC5 80T g.chr14: 20766956 T>A - Splice site

WDR52 80T g.chr3: 113152477 T>A E12V Missense

CLNK 80T g.chr4: 10509653 T>A Q305L Missense

ANPEP 80T g.chr15: 90335737 T>A E769V Missense

TXNDC11 80T g.chr16: 11823782 T>A N255I Missense

MLLT3 80T g.chr9: 20448215 T>A E109V Missense

EPHA5 80T g.chr4: 66230888 T>C K695E Missense

NCOA2 80T g.chr8: 71041023 T>A R1173X Nonsense

ATR 80T g.chr3: 142211983 T>A K2023N Missense

SETD2 80T g.chr3: 47162530 T>A Q1199L Missense

CHD8 80T g.chr14: 21875046 T>A K680M Missense

JMJD6 80T g.chr17: 74720048 T>A K204M Missense

MLL2 80T g.chr12: 49442554 T>A - Splice site

TP53 80T g.chr17: 7579494 A>T R65X Nonsense

SETX 80T g.chr9: 135205119 T>A - Splice site

DNAH9 80T g.chr17: 11572858 A>T I1034F Missense

CDH10 80T g.chr5: 24509750 T>A H394L Missense

ADAMTSL1 80T g.chr9: 18723054 A>T L416H Missense

USP5 100T g.chr12: 6969313 A>T - Splice site

RAB11FIP4 100T g.chr17: 29854858 A>T - Splice site

DIRC2 100T g.chr3: 122564608 A>T - Splice site

PTPN7 100T g.chr1: 202127440 A>T L44Q Missense

FGD4 100T g.chr12: 32729289 A>T Q46H Missense

LYPD6B 100T g.chr2: 150061614 A>T K67X Nonsense

KALRN 100T g.chr3: 124114272 A>T N95Y Missense

DLG3 100T g.chrX: 69712449 A>T - Splice site

PPRC1 100T g.chr10: 103904062 A>T - Splice site

SSTR3 100T g.chr22: 37603714 A>T S43R Missense

PLEKHA2 100T g.chr8: 38801329 A>T Q74L Missense

DTX1 100T g.chr12: 113515429 A>T N154Y Missense

FANCI 100T g.chr15: 89807826 A>T Q248L Missense

PPARD 100T g.chr6: 35391914 A>T S206C Missense

MED12L 100T g.chr3: 151148098 A>T Q2105H Missense

TRPC1 100T g.chr3: 142503867 A>T I394F Missense

TNFSF14 100T g.chr19: 6667432 A>T L83Q Missense

KRT7 100T g.chr12: 52628958 A>T Q115L Missense

ADARB1 100T g.chr21: 46603371 A>T T448S Missense

OR5I1 100T g.chr11: 55703374 A>T L168Q Missense

TNR 100T g.chr1: 175348866 A>T D595E Missense

IGHV5-51 100T g.chr14: 107034944 A>T Y46N Missense

GSG1L 100T g.chr16: 27895848 A>T L170H Missense

LMNA 100T g.chr1: 156105102 A>T - Splice site

G3BP1 100T g.chr5: 151178771 A>T - Splice site

DNAH17 100T g.chr17: 76565246 A>T - Splice site

DECR2 100T g.chr16: 455541 A>T S60C Missense

SLC47A2 100T g.chr17: 19618217 A>T L146H Missense

WAS 100T g.chrX: 48544472 A>T R170X Nonsense

ADAMTSL1 100T g.chr9: 18574247 A>T I153F Missense

IDH3G 100T g.chrX: 153055624 A>T C29S Missense

FITM1 100T g.chr14: 24602021 A>T K290X Nonsense

PLXNA1 100T g.chr3: 126741156 A>T K1400X Nonsense

SCML4 100T g.chr6: 108042098 A>T L261X Nonsense

SERPINB11 100T g.chr18: 61387273 A>T S54C Missense

CHD6 100T g.chr20: 53358140 A>T Q2660L Missense

AE000659.8 100T g.chr14: 22309774 A>T Q53L Missense

MLL 100T g.chr11: 118374588 A>T K2658X Nonsense

KDM6A 100T g.chrX: 44870204 A>T - Splice site

SETD1A 100T g.chr16: 30982741 A>T Y1020F Missense

ARID4A 100T g.chr14: 58814448 A>T E419V Missense

MLLT10 100T g.chr10: 22002766 A>T T605S Missense

SETBP1 100T g.chr18: 42531454 A>T K717X Nonsense

LRRK2 100T g.chr12: 40626129 A>T Q97H Missense

NCOA1 100T g.chr2: 24930115 A>T L592F Missense

ARID1A 100T g.chr1: 27097784 A>T K1125X Nonsense

ARID2 100T g.chr12: 46285700 A>T - Splice site

VSIG4 100T g.chrX: 65253560 C>T W56X Nonsense

MOGAT3 100T g.chr7: 100839260 C>G G264R Missense

DNAH6 100T g.chr2: 84756185 C>A P186H Missense

IARS 100T g.chr9: 95025273 C>A V589L Missense

PSD 100T g.chr10: 104175876 C>A - Splice site

MLL 100T g.chr11: 118343798 C>T L642F Missense

SUPT3H 100T g.chr6: 45289535 G>T - Splice site

SYNPO2L 100T g.chr10: 75406610 G>A P934S Missense

RASL11A 100T g.chr13: 27847371 G>T V157L Missense

EPHA5 100T g.chr4: 66356195 G>T H434Q Missense

SAFB2 100T g.chr19: 5598897 T>G - Splice site

REXO4 100T g.chr9: 136276195 T>A - Splice site

DDX60 100T g.chr4: 169206651 T>A - Splice site

ATP8B4 100T g.chr15: 50294422 T>A - Splice site

SHPK 100T g.chr17: 3533642 T>A - Splice site

LRRIQ1 100T g.chr12: 85547825 T>A L1558X Nonsense

LRRC48 100T g.chr17: 17907855 T>A L244Q Missense

CENPI 100T g.chrX: 100375419 T>A L207X Nonsense

XIST 100T g.chrX: 73047745 T>A M26L Missense

COL22A1 100T g.chr8: 139606430 T>A Y1482F Missense

RGS9 100T g.chr17: 63159213 T>A - Splice site

HMSD 100T g.chr18: 61627528 T>C F120S Missense

APC 100T g.chr5: 112179558 T>C I2756T Missense

TRIM9 100T g.chr14: 51450125 T>A T599S Missense

CCDC135 100T g.chr16: 57735981 T>A L213Q Missense

SACS 100T g.chr13: 23914405 T>A S1204C Missense

RYR1 100T g.chr19: 39009953 T>A V3373E Missense

XRN1 100T g.chr3: 142090087 T>A E1021V Missense

ADAMTS9 100T g.chr3: 64606882 T>A E907D Missense

KDR 100T g.chr4: 55970893 T>A Q635L Missense

OR2A13P 100T g.chr7: 143839246 T>A C7S Missense

DHX37 100T g.chr12: 125441411 T>A Q760L Missense

IGLV4-60 100T g.chr22: 22516934 T>A S74R Missense

YME1L1 100T g.chr10: 27436564 T>A I68F Missense

WDR49 100T g.chr3: 167245702 T>A E485V Missense

AE000660.10 100T g.chr14: 22600551 T>A F13I Missense

LRIG3 100T g.chr12: 59282252 T>A Q269L Missense

ITGA4 100T g.chr2: 182343486 T>A Y187N Missense

GPR77 100T g.chr19: 47844093 T>A Y13N Missense

FAM123A 100T g.chr13: 25744200 T>A S401C Missense

SLC28A2 100T g.chr15: 45557339 T>A L252Q Missense

ASTN2 100T g.chr9: 119249665 T>A K1153M Missense

DBH 100T g.chr9: 136516823 T>A V420E Missense

TYR 100T g.chr11: 88911227 T>A C36S Missense

SLC25A22 100T g.chr11: 792199 T>A Q254L Missense

HSPBAP1 100T g.chr3: 122496755 T>A - Splice site

TMEM44 100T g.chr3: 194344041 T>A - Splice site

PHF21B 100T g.chr22: 45285664 T>A - Splice site

DAB2 100T g.chr5: 39392567 T>A - Splice site

FBN2 100T g.chr5: 127599350 T>A - Splice site

MAL 100T g.chr2: 95719326 T>A C15S Missense

OTOF 100T g.chr2: 26702137 T>A K737X Nonsense

ERCC3 100T g.chr2: 128036750 T>A K577X Nonsense

RYR1 100T g.chr19: 38958256 T>A V1062E Missense

PRF1 100T g.chr10: 72360180 T>A Q160L Missense

ATP13A1 100T g.chr19: 19770717 T>A H159L Missense

NFXL1 100T g.chr4: 47896219 T>A Q477L Missense

PHKA2 100T g.chrX: 18917328 T>A Q1025L Missense

SHANK2 100T g.chr11: 70332859 T>A Q822L Missense

DNMT3A 100T g.chr2: 25497835 T>A E205V Missense

NCOA6 100T g.chr20: 33338025 T>A Q658L Missense

ARID4B 100T g.chr1: 235331959 T>A T1274S Missense

HERC1 100T g.chr15: 63950744 T>A - Splice site

MLL2 100T g.chr12: 49427737 T>A Q3584L Missense

PIK3CG 100T g.chr7: 106545627 T>A L1035Q Missense

SMARCA1 100T g.chrX: 128602884 T>A - Splice site

BRIP1 100T g.chr17: 59861787 T>A - Splice site

SETX 100T g.chr9: 135205675 T>A Y437F Missense

DCHS2 100T g.chr4: 155242017 C>G D1057H Missense

USH2A 100T g.chr1: 216074128 T>A R2474W Missense

DNAH9 100T g.chr17: 11597768 A>T - Splice site

MYO5C 100T g.chr15: 52543631 T>A R540X Nonsense

CREBBP 100T g.chr16: 3832778 T>A N494Y Missense

Supplementary Table 9

The effect of +/− one base flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC

Triplet sequence; Number of occurrences in reference unspliced transcript regions; Whole genome mutations in unspliced transcripts (sense strand) per million triplet occurrences

TAG 21,385,856 489.1

CAG 33,081,154 457.4

TAC 17,558,516 361.5

CTA 19,584,816 252.6

TAT 32,436,138 247.3

CAT 28,802,535 236.9

TAA 31,522,442 231.6

CTG 34,400,990 225.8

CAC 23,010,091 222.3

CAA 26,804,923 214.6

GTA 18,835,554 182.5

AAG 30,295,166 161.1

GAG 28,017,769 160.6

ATA 30,643,440 130.1

ATG 29,057,712 129.5

GTG 26,485,820 116.1

TTA 33,840,449 112.1

TTG 32,831,444 106.7

CTT 33,446,798 83.3

GAA 30,134,519 81.7

CTC 27,481,282 80.6

GAC 15,016,072 74.8

CCG 5,119,091 63.9

AAC 21,105,949 56.6

GAT 21,869,739 55.7

AAA 54,647,870 51.2

AAT 37,056,182 44.9

CGG 5,052,009 44.9

TTC 32,138,122 43

GCG 4,495,083 41.2

GTC 15,927,567 39.4

ACG 4,145,742 38.6

CCC 21,932,371 35.1

CGT 4,584,943 32.1

GTT 25,336,011 31.8

ATC 20,434,461 26.8

GGG 22,783,978 26.6

CGC 4,320,937 26.4

TTT 66,478,696 26.1

CCA 28,970,783 26

ATT 40,028,353 24.6

TCG 3,948,305 23.8

TGG 31,610,504 20.7

CGA 3,752,064 19.5

CCT 29,864,527 17.6

AGG 29,156,183 16.5

GGA 25,390,838 14.1

TCC 25,158,132 14.1

ACC 18,338,132 10.7

TCT 36,335,394 10.2

AGA 33,959,533 10

GCC 20,457,256 9.8

GGT 20,012,098 9.2

GGC 20,417,785 9

TGA 31,935,743 8.5

ACA 29,224,997 8.1

TCA 30,539,328 7.6

TGT 35,004,257 6.6

ACT 24,980,528 5.7

AGT 26,982,802 5.1

GCA 22,962,367 5.1

TGC 24,226,432 4.5

AGC 22,681,450 3.7

GCT 23,733,151 3.3
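The rate column of Supplementary Table 9 normalizes a mutation count by how often each triplet occurs in the reference. The sketch below shows that normalization and its inverse, assuming the rate is simply 10^6 × mutations / occurrences. The function names are hypothetical, and the mutation count is back-calculated from the table, not reported in it.

```python
def per_million(mutation_count: float, occurrences: int) -> float:
    """Mutations per million occurrences of a sequence context."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return 1e6 * mutation_count / occurrences


def implied_count(rate_per_million: float, occurrences: int) -> float:
    """Invert the normalization: approximate mutation count implied by a rate."""
    return rate_per_million * occurrences / 1e6


# TAG row of Supplementary Table 9: 21,385,856 occurrences, 489.1 per million.
tag_count = implied_count(489.1, 21_385_856)
print(f"TAG: roughly {tag_count:.0f} mutations implied by the tabulated rate")
```

Because the tabulated rates are rounded to one decimal place, the implied count is only approximate.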

Supplementary Table 10

The effect of +/− one base flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC

Triplet sequence; Number of occurrences in reference outside transcribed regions; Whole genome intergenic mutations per million triplet occurrences

CTA : TAG 35,286,346 514.2

CAG : CTG 52,807,266 509.3

GTA : TAC 30,703,409 398.2

ATG : CAT 50,755,626 265.4

CAC : GTG 39,654,427 264.4

ATA : TAT 58,565,776 259.8

TAA : TTA 57,491,649 245

CAA : TTG 52,404,647 233.8

CTC : GAG 44,367,158 180.3

AAG : CTT 54,359,348 177.7

CCG : CGG 6,372,732 91

GAA : TTC 54,396,287 90.1

GAC : GTC 25,085,578 82.5

AAC : GTT 39,807,488 69.1

ATC : GAT 36,676,580 59.5

CGC : GCG 5,451,325 57.3

AAA : TTT 105,971,797 54.3

AAT : ATT 70,126,249 49.6

ACG : CGT 6,231,337 48.5

CCC : GGG 33,505,642 47.7

CCA : TGG 48,825,083 35.2

AGG : CCT 46,388,989 29.7

CGA : TCG 5,451,878 29.7

GGA : TCC 41,028,220 22.7

ACC : GGT 30,626,984 16.4

AGA : TCT 60,587,491 15.5

GCC : GGC 29,932,276 14.6

TCA : TGA 53,457,500 14

ACA : TGT 55,112,234 11.4

ACT : AGT 43,279,543 9.4

GCA : TGC 38,228,283 8.7

AGC : GCT 36,563,127 6.3
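Each row label in Supplementary Table 10 appears to pair a triplet with its reverse complement, which is consistent with intergenic regions having no sense strand to orient against. The sketch below reproduces that labeling, assuming the two members of a pair are written in alphabetical order, as the labels in the table are. The helper names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_pair_label(triplet: str) -> str:
    """Label a triplet together with its reverse complement, sorted alphabetically."""
    first, second = sorted((triplet.upper(), reverse_complement(triplet)))
    return f"{first} : {second}"


# Collapsing all 64 triplets by strand symmetry gives one label per pair.
labels = {strand_pair_label("".join(t)) for t in product("ACGT", repeat=3)}
print(strand_pair_label("TAG"), len(labels))
```

No triplet is its own reverse complement, so the 64 triplets collapse into 32 pairs, matching the 32 rows of the table.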

Supplementary Table 11

The effect of +/− two bases flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC

Quintuplet sequence; Number of occurrences in reference unspliced transcript regions; Whole genome mutations in unspliced transcripts (sense strand) per million quintuplet occurrences

ATAGG 1,196,711 819.7

CTAGG 1,156,560 721.1

CTAGC 824,340 691.5

ATAGC 1,097,065 674.5

ACAGG 2,179,044 660.8

ATAGA 1,825,784 660

ACAGC 1,468,809 630.4

CTAGA 1,343,347 616.4

GTAGG 980,728 599.6

ACAGT 1,893,468 538.7

ATACA 2,175,087 530.6

CTACA 1,386,997 523.4

CCAGG 2,920,014 515.8

ACAGA 2,556,782 506.5

GCAGG 2,005,285 505.2

GTACA 1,125,872 472.5

ATAGT 1,607,995 458.3

CCAGA 1,968,757 455.1

CTATG 1,252,622 441.5

GTAGA 1,498,367 433.8

GCAGC 1,521,819 427.8

GTAGC 1,033,949 427.5

TCAGG 2,157,273 426.5

CTAAG 1,180,278 425.3

CCAGT 1,678,066 424.3

CTAGT 1,023,498 421.1

CTATA 1,354,815 420

ATACC 1,008,599 414.4

GCAGA 1,956,583 413.5

CCTAT 1,123,682 412.9

CTACC 915,021 407.6

ATACT 1,540,424 403.8

ACATA 2,028,968 396.8

CCAGC 2,637,820 396.2

ATATG 2,076,349 394.9

TTAGG 1,442,396 394.5

ATACG 216,776 382.9

ATATA 3,382,612 382.2

ATAAG 1,576,073 375.6

ACATG 2,054,573 370.4

GCAGT 1,907,864 367.4

CCATG 1,955,349 364.1

GTAGT 1,169,626 363.4

TCAGC 1,997,426 358

CTACT 1,566,215 357.5

GTATG 1,181,594 357.1

CTAAC 863,735 356.6

CCATA 1,210,080 355.3

GCTAT 1,101,373 355

TCAGA 2,192,906 350.2

TCAGT 2,036,941 350

TTAGC 1,393,011 345.3

ACACA 2,558,406 340.4

ACAAG 1,505,533 338.1

CCTAG 1,143,756 335.7

GTAAG 1,125,649 334

ACACG 336,252 330.1

ACACT 1,512,147 328

ATAAC 1,245,568 327.6

TCTAT 1,793,844 327.2

GTATA 1,468,668 326.1

GTACT 948,726 322.5

GCTGT 1,782,660 316.4

TCTAG 1,414,524 310.4

GCTAG 862,816 308.3

CTATC 912,009 300.4

CCTGT 2,378,443 297.7

ATATC 1,408,600 294.6

CCAAG 1,787,484 292

TTAGA 1,930,896 287.9

GCATA 1,110,806 287.2

GCATG 1,655,026 285.8

CCTAC 861,548 279.7

ACATC 1,285,663 279.2

ACAAC 1,126,435 277

GTATC 875,253 270.8

ACACC 1,223,095 269

GTACC 681,562 268.5

GCACA 1,534,212 266.6

CCTGG 3,036,580 266.1

TGTAT 2,627,279 263

TCTAC 1,310,024 261.1

ACAAT 1,751,803 260.9

CAAGG 1,629,941 260.1

AGAGG 2,235,125 259

ACTAT 1,395,163 258.8

TAAGG 1,264,331 257.8

TTACA 2,386,112 255.2

CCTGC 2,068,227 254.8

ACTGT 1,935,065 251.2

CTAAA 2,028,241 249.5

CTATT 1,894,484 248.1

CTACG 161,886 247.1

ATAAA 3,870,624 246.7

TCTGT 3,129,571 246.4

CCACA 1,918,264 239.8

GTAAC 839,636 239.4

TGTAG 1,700,884 237.5

GCAAG 1,332,614 236.4

CTAAT 1,719,130 235.6

ACTGG 1,557,983 234.3

GTACG 124,926 232.1

CATAG 1,321,681 229.3

TCTGG 2,123,671 227.9

TTAGT 1,860,278 227.4

ATAAT 2,868,687 227.3

CCAAT 1,139,360 226.4

TATAG 1,511,726 226.2

TTACC 1,148,826 223.7

CCACT 1,839,226 223.5

GCATC 1,025,348 222.4

GCACT 1,497,985 222.3

GCTAC 909,960 222

TTAAG 1,914,895 220.9

CATAT 2,024,901 219.3

GGTAT 1,103,551 219.3

CCAAC 1,150,566 217.3

CATGT 2,232,017 215.9

GTAAA 1,899,997 215.8

TCAAG 1,950,784 215.8

TGTAC 1,176,813 215

CGTAT 237,460 214.8

CTTAG 1,340,465 214.1

AAAGG 2,275,558 214

ACATT 2,614,712 213.8

CCAAA 2,334,491 213.8

TCATA 1,719,856 213.4

CAAGC 1,283,238 212

TTACT 1,960,114 211.7

GCTGC 1,599,146 211.4

ATATT 3,544,566 211

ACAAA 3,127,568 209.4

CCATC 1,669,035 209.1

GCACG 335,795 208.5

CCTAA 1,239,069 207.4

TATGT 2,347,689 202.8

GGTAG 1,019,912 202

GCAAT 1,324,115 201.6

CCACG 460,938 199.6

TCTGC 2,065,759 197.5

CCTGA 2,157,244 195.2

TTACG 210,494 194.8

TTATG 1,996,409 194.8

GCTGG 2,881,658 194

AGTAT 1,664,023 193.5

GTAAT 1,772,965 193.5

TAAGC 1,028,832 193.4

CTTAT 1,727,514 192.8

GCAAC 1,044,025 192.5

ACTAG 926,779 192.1

AGAGC 1,613,921 191.5

CAAGA 1,991,200 190.8

CGAGG 457,087 188.1

TGAGG 2,446,641 187.2

TCATG 1,981,533 185.7

ACTGA 1,774,841 183.7

CATGG 1,927,451 183.1

GAAGG 1,927,260 182.1

TATAT 3,554,080 182

AGAGA 3,208,362 178.3

GCACC 1,082,671 178.3

ACTAC 948,651 178.1

GCTGA 1,965,043 178.1

CGTAC 118,678 176.9

AGTAG 1,773,315 175.9

TATGG 1,313,356 175.9

CGTAG 188,618 175

TTAAC 1,432,050 174.6

GATAG 1,022,660 174.1

TCAAC 1,105,819 173.6

GTATT 2,106,637 169.9

GCTAA 1,302,400 166.6

GTTAG 1,048,024 166

TCTGA 2,219,897 165.8

ACTGC 1,723,894 165.3

TAAGA 1,851,381 165.3

CAAGT 1,671,961 165.1

CATAC 1,022,951 164.2

TCACA 2,061,956 163.4

TCACT 2,368,410 163.4

TGTGT 3,431,103 162.9

CTTAC 1,102,458 162.4

TATAC 1,335,228 161.8

TCAAT 1,563,226 161.2

AGTGT 1,851,038 160.5

CCATT 2,129,170 160.2

TCTAA 1,759,834 159.7

TGAGC 1,986,513 158.6

TATGC 1,101,358 158

CATGC 1,588,965 156.1

AGAGT 1,891,490 155.4

GATAT 1,500,074 155.3

AAAGC 1,845,144 155

CCACC 2,170,438 153

GGTAC 661,436 152.7

GCAAA 1,869,491 152.4

AGTAC 924,188 151.5

TTATA 2,784,296 149.1

GGAGG 2,880,335 147.6

CTTGT 1,862,880 147.1

TCACG 447,573 145.2

AGAAG 2,624,991 144.8

CGTGT 425,183 143.5

CGAGA 478,705 142

GGAGC 1,316,451 141.3

TGAGA 2,868,040 140.9

CGAAG 235,543 140.1

CCTTG 1,843,788 139.9

GAAGC 1,415,080 139.2

GATGT 1,487,539 139.2

TAAGT 1,652,007 139.2

CGAGC 216,337 138.7

CCTTA 1,282,901 138

GTTAC 897,669 137

TTAAA 4,220,733 136.9

TTATC 1,622,110 136.9

CTTGG 2,031,681 134.9

GCATT 1,757,313 134.3

TGTGG 2,301,303 134.3

TGAGT 1,938,112 133.6

TGTGC 1,796,759 133.6

TCATC 1,640,929 133.5

AGTGG 2,051,485 133.1

TCAAA 2,689,897 132.3

ATTGT 2,211,104 132.1

TTAAT 2,961,609 132

TCACC 1,661,133 131.2

TTTAG 2,491,824 130.8

TGTAA 2,445,904 130

CATGA 1,899,001 129

GTTGT 1,533,215 128.5

GGTGT 1,506,228 127.5

AATAG 1,871,604 126.6

CCTCT 2,297,690 124

CTTGC 1,459,495 124

AAAGA 3,599,258 123.4

ATTGG 1,422,826 123

GTTAT 1,464,484 122.9

GATAC 832,386 122.5

GGAGA 2,345,781 117.2

TGAAG 2,273,317 117

CAAAG 2,378,782 116.9

GCCGT 248,314 116.8

TTTAC 1,995,555 116.8

CGAGT 284,243 116.1

GAAGA 2,341,241 115.8

ATTAG 1,746,436 115.7

GATGC 1,029,408 115.6

GTTGG 1,459,066 113.8

CAATG 1,452,710 113.6

ACTAA 1,490,471 112.7

GGAGT 1,848,924 112.5

TTTAT 4,782,503 111.7

GGTAA 1,140,608 111.3

TGACA 1,798,112 110.7

CCTTT 2,671,626 108.2

CGACC 157,280 108.1

ATTAT 3,002,047 107.6

CGTAA 204,686 107.5

AATGT 2,624,276 107.1

CTTGA 2,081,248 107.1

AATAT 3,333,236 106.2

CAACA 1,851,159 105.9

AGTGC 1,663,404 105.8

AGTAA 1,847,337 105

GATGG 1,813,464 104.2

TTTGT 4,264,759 103.9

AGACA 2,211,856 103.5

CGTGG 517,850 102.3

GCTCT 1,700,767 102.3

TCATT 2,961,630 100.6

ATTGC 1,523,508 99.8

TAAAG 2,183,525 99.8

GAAGT 1,726,675 99.6

TTTGG 2,875,060 99.5

TCTCT 3,430,559 99.1

CAATA 1,541,229 98.6

GCTTG 1,439,693 98.6

CCTCA 2,368,321 98.4

AATGG 2,008,464 97.6

GTTAA 1,481,224 97.2

GCTTA 1,123,982 97

CCTCG 467,262 96.3

ACCGT 281,526 95.9

AGAAC 1,604,320 95.4

ACGGC 201,551 94.3

CATAA 1,700,150 94.1

TAACA 1,753,204 94.1

GTTGA 1,393,457 94

TATGA 1,744,778 94

AAAGT 2,789,924 93.6

TGTGA 2,405,210 93.5

CGACG 53,661 93.2

GTTGC 1,225,081 93.1

CGATA 140,380 92.6

GGAAG 2,276,439 92.2

GAAAG 2,390,000 92.1

TCTTG 2,409,905 92.1

ACTTG 1,741,874 91.3

AGATA 1,916,407 90.8

CGATG 220,508 90.7

CGTGC 367,049 89.9

CCTTC 1,995,653 89.7

CCCGC 462,974 88.6

GGTGG 2,286,103 88.4

AGATG 2,280,579 88.1

ACTTA 1,591,657 88

CCCGT 343,210 87.4

TGATA 1,712,570 86.4

CGACT 163,151 85.8

CGACA 211,039 85.3

TTTGC 2,269,755 85

GCTTC 1,480,116 84.5

AGTGA 2,455,577 84.3

AGAAT 2,807,086 84.1

CTTAA 1,871,029 83.4

TGATG 2,012,580 83

ATTGA 1,915,648 82

TATAA 2,589,257 81.1

AATAC 1,750,376 80

TAATG 1,893,230 79.8

TGACT 1,717,370 79.8

ACTCT 1,784,576 79.6

CAAAT 2,490,445 79.5

GGTGC 1,169,212 79.5

ACGGT 251,817 79.4

TAAAC 1,564,335 78.6

CAACT 1,275,167 78.4

TTATT 4,573,537 78.1

ACTCA 1,644,045 77.9

ATTAC 1,710,030 77.8

TGAAC 1,625,556 77.5

ACCGC 246,239 77.2

AATGC 1,573,543 76.9

AGACG 416,851 76.8

CGAAC 185,317 75.5

TCTTA 2,043,125 75.4

TGAAT 2,537,051 75.3

GCCGG 439,091 75.2

TACGC 119,692 75.2

TCCGT 320,449 74.9

TATCG 147,372 74.6

TTTGA 3,301,776 74.2

TGACG 229,315 74.1

CAAAC 1,575,197 73.6

GCTCG 217,848 73.4

AAAAG 3,557,820 72.8

CGAAT 192,395 72.8

CGTGA 497,670 72.3

AGACT 1,734,014 72.1

GCTCA 1,887,082 72.1

GATGA 1,692,502 71.5

CGTCG 56,374 71

CTTCT 2,826,827 70.8

GCCGC 312,289 70.4

GCGGT 258,027 69.8

CTTCA 2,187,966 69

TGAAA 3,474,874 67.9

TCTCA 2,756,477 67.5

TTTAA 4,480,310 67.4

CCTCC 2,927,702 67.3

GGAAC 1,084,026 67.3

GGTGA 1,759,283 67.1

GCTTT 2,240,670 66.9

GGATG 1,508,733 66.9

AGAAA 4,454,267 66.7

TCTTT 4,488,015 66.4

TAATA 2,516,522 66

TACGT 278,358 64.7

GGCGT 454,366 63.8

AAACG 329,619 63.7

TCTCC 2,350,751 63

CACGT 430,088 62.8

AGACC 1,363,176 62.4

AAACA 3,220,352 61.8

CTTCG 261,669 61.1

ACCGG 181,150 60.7

TCCGC 329,980 60.6

TCGGT 198,165 60.6

GGATA 1,124,692 60.5

ATTAA 2,697,424 60.4

TCTTC 2,516,257 60.4

ACGGG 333,595 60

CGCGC 184,400 59.7

TGTTA 2,077,631 59.2

CGAAA 254,428 59

GATAA 1,538,897 58.5

AATGA 2,670,950 58.4

ACTCG 256,893 58.4

CAATC 995,390 58.3

CTTTG 2,842,703 57.3

CAACC 1,106,715 56.9

TGCGT 298,767 56.9

CTTTA 2,397,027 56.7

GAACA 1,657,449 56.7

GACGT 231,783 56.1

GAATA 1,862,184 55.3

GGACA 1,373,640 55.3

ACCGA 181,101 55.2

TGACC 1,325,013 54.3

AAAAC 2,843,978 53.4

TAACT 1,553,687 53.4

ACTTT 2,955,966 53.1

TAAAT 3,470,005 53

TGTCT 2,636,640 52.7

GTTCT 1,863,119 52.6

CATTG 1,805,412 52.1

TCTCG 521,574 51.8

GCTCC 1,237,058 51.7

CCCCT 1,549,404 51

TGTTG 2,472,227 51

TAACG 176,992 50.8

TCCCT 2,206,139 50.8

GAAAC 1,814,590 50.7

GCGTA 137,955 50.7

TATTG 1,959,216 50

ACGTA 260,967 49.8

CGATC 240,798 49.8

TGTCA 1,968,231 49.8

CCGTC 343,390 49.5

GGAAT 1,760,350 49.4

TGCGC 264,078 49.2

AACGC 162,959 49.1

ACTCC 1,631,453 49

CTTCC 2,335,942 48.8

CCGGC 435,045 48.3

TAACC 896,878 47.9

GAATG 1,924,126 47.8

GGAAA 2,742,646 47.4

GTTCA 1,709,786 47.4

ACGCT 232,572 47.3

GCGCG 190,401 47.3

CCGTA 169,716 47.1

CACGG 363,943 46.7

AAATG 3,606,681 46.6

CAACG 193,710 46.5

ACTTC 1,620,365 46.3

AGGGG 1,515,646 46.2

ATTCT 2,940,287 45.9

GGACG 240,819 45.7

AGATC 1,363,087 45.5

AGTCG 176,040 45.4

GGACT 1,218,661 45.1

TGCGG 310,425 45.1

AGCGC 223,129 44.8

AAATA 4,768,519 44.7

GTTCG 201,328 44.7

AGTCA 1,637,059 44.6

TAAAA 4,800,838 44.6

GTCGG 180,588 44.3

GCCGA 361,825 44.2

AATAA 3,733,046 43.9

GGCGC 433,487 43.8

TGTCG 255,141 43.1

ACGCG 69,711 43

TCCGA 188,225 42.5

ACCAT 1,799,498 42.2

CCGGT 189,741 42.2

ACCCT 1,376,458 42.1

GCGGG 479,786 41.7

ACGTT 339,505 41.2

CAAAA 3,576,736 41.1

GATTG 1,293,906 41

CCGGA 246,274 40.6

GACGC 172,356 40.6

CGTCT 419,530 40.5

AACGT 323,346 40.2

CGTTC 224,657 40.1

TGATC 1,502,826 39.9

CATTA 1,789,486 39.7

GAAAT 2,882,921 39.5

AGCGG 278,882 39.4

TAATC 1,589,175 39

CATCA 1,749,553 38.9

ATCGC 257,954 38.8

CGCGT 77,386 38.8

GCCCT 1,391,836 38.8

GCGAT 284,533 38.7

GGACC 801,101 38.7

GTCGC 180,882 38.7

TCGGG 336,464 38.6

TATCT 1,978,794 38.4

TCGCG 78,192 38.4

GAACG 209,638 38.2

GCGGA 314,482 38.2

TATCA 1,597,745 38.2

TGTCC 1,494,977 38.1

TATTA 2,615,304 37.9

ATTCA 2,369,453 37.6

CATCT 2,215,828 37.5

GCCAT 1,487,994 37

AGTTG 1,572,993 36.9

TCGGA 189,833 36.9

ATTTA 3,714,156 36.6

TTCGT 300,726 36.6

TTTCA 3,702,732 36.5

CAATT 1,788,994 36.3

GCGTC 192,694 36.3

ATTTG 2,988,525 36.1

CCCAT 1,612,870 36

GGCGG 614,549 35.8

CTTTT 4,484,575 35.7

GTTTG 2,074,984 35.7

CGTTA 197,330 35.5

TCCCG 481,128 35.3

GTTCC 1,172,656 35

GGTCT 1,491,396 34.9

ACGAT 229,873 34.8

CCCCG 492,991 34.5

TACCG 144,941 34.5

CTTTC 2,631,978 34.2

ACCCC 1,201,127 34.1

TCCCC 1,639,875 34.1

TTTCT 5,383,927 34

AGCGT 265,750 33.9

CCCCA 2,003,088 33.9

GCGTT 209,754 33.4

TCCAT 2,063,377 33.4

TGGGT 1,951,465 33.3

ACGTC 211,048 33.2

AGATT 2,232,198 33.2

AGTCT 1,866,281 33.2

TCGTA 180,801 33.2

GGGGT 1,336,146 32.9

TATCC 1,094,063 32.9

CCCGG 639,671 32.8

ACCCG 336,523 32.7

GACGA 183,720 32.7

AAACC 1,748,056 32.6

AGGGA 2,148,555 32.6

TGGGG 2,211,306 32.6

CCCCC 1,201,327 32.5

CGTTG 247,037 32.4

CTCGT 308,730 32.4

GCGCA 246,785 32.4

CCGCT 279,398 32.2

ACGGA 281,671 32

CCCTG 2,141,641 31.8

ATCGT 252,737 31.7

ATGGG 1,676,500 31.6

GCCCC 1,204,419 31.6

AGGGT 1,458,919 31.5

TACGA 159,672 31.3

CCGCA 288,662 31.2

GACGG 352,833 31.2

CATCG 224,820 31.1

TATTC 1,928,243 31.1

GAACT 1,584,270 30.9

ATGGA 2,050,759 30.7

AGTTA 1,637,716 30.5

AAACT 2,499,781 30.4

CCGAC 166,458 30

CGTCA 233,678 30

CCGGG 634,882 29.9

CCCTT 1,745,886 29.8

CCCGA 336,597 29.7

GGTTG 1,318,526 29.6

CGTCC 237,658 29.5

GGTCG 169,286 29.5

AGTCC 1,197,074 29.2

CGTTT 411,262 29.2

TCGGC 379,732 29

ATTCC 1,666,678 28.8

TCCCA 2,890,728 28.7

TGTTC 1,924,955 28.6

CCCTA 947,329 28.5

GAAAA 3,838,493 28.4

CATTT 3,987,777 28.3

AGCGA 356,441 28.1

CATCC 1,430,845 28

TAATT 3,322,941 28

TGTTT 4,287,921 28

AAAAT 5,813,462 27.9

ACCAC 1,542,993 27.9

GTTTA 1,861,719 27.9

GCGCC 432,566 27.7

TCGTG 361,038 27.7

ACCCA 1,704,439 27.6

GGTTA 978,764 27.6

TTCGC 185,740 26.9

ACGCC 410,905 26.8

ATCTA 1,424,270 26.7

GTTTC 2,106,256 26.6

CACGC 490,095 26.5

GTTTT 3,843,083 26.5

ATGGC 1,446,599 26.3

GCCAC 1,670,766 26.3

CAGGT 1,963,502 26

ACGCA 231,978 25.9

ATGGT 1,889,813 25.9

AACGG 195,438 25.6

GGTCA 1,289,385 25.6

CCGAG 549,026 25.5

GATTA 1,722,546 25.5

CGCGA 78,619 25.4

CTGGT 1,774,314 25.4

CCGTG 438,594 25.1

GGCGA 278,461 25.1

GCGGC 320,154 25

GGGGG 1,200,687 25

ACCAG 1,610,502 24.8

CCGTT 241,819 24.8

TTTTA 5,777,747 24.8

TTTCC 3,005,026 24.6

TCCGG 245,895 24.4

ACGTG 452,092 24.3

TCGTT 289,165 24.2

TGCGA 207,264 24.1

TTTTG 4,805,218 24.1

AGTTT 3,018,806 23.9

GGTCC 798,015 23.8

TCCAA 1,681,218 23.8

GTCGT 168,842 23.7

CCCAC 1,692,493 23.6

GGATT 1,905,074 23.6

CATTC 1,880,272 23.4

GCGTG 555,896 23.4

GTGGG 1,889,457 23.3

CGGCA 258,868 23.2

AGGAG 2,808,406 23.1

GGTTT 2,168,060 23.1

GCCCA 1,780,207 23

TTCGA 261,131 23

CAGGG 2,049,422 22.9

CGGGT 348,816 22.9

TCCAC 1,482,678 22.9

CCGCC 613,395 22.8

ATCCG 220,629 22.7

ATTTC 3,021,513 22.5

TGATT 2,484,674 22.5

AAAAA 9,462,714 22.2

TTCGG 225,076 22.2

CCCAA 1,818,902 22

GAACC 1,047,764 22

GTGGT 1,815,701 22

AACGA 228,240 21.9

CCCAG 3,254,121 21.8

AGGGC 1,333,814 21.7

TAGGT 1,244,207 21.7

AAATC 2,084,708 21.6

TAGGA 1,535,787 21.5

CAGAT 1,915,927 21.4

AATTG 2,014,516 21.3

GCGCT 235,541 21.2

GCCAG 1,989,326 21.1

TCCTG 2,988,450 21.1

GATCA 1,394,788 20.8

GCCAA 1,441,148 20.8

TAGGG 1,061,118 20.7

ATCGG 149,114 20.1

CAGGA 2,701,023 20

TCCAG 2,350,691 20

TTTCG 301,797 19.9

CGCAT 202,138 19.8

CGGGA 456,824 19.7

CTCCA 2,278,862 19.7

CGGGC 458,659 19.6

CCGAA 205,761 19.4

TCGAA 258,183 19.4

CGGAT 207,047 19.3

CGATT 260,957 19.2

CGGGG 520,613 19.2

TATTT 5,616,489 19.2

TTGGG 2,131,650 19.2

AATCA 2,042,160 19.1

GCCCG 471,701 19.1

CTCCT 2,887,270 19

GTGGA 1,681,036 19

CCCTC 1,688,997 18.9

AAATT 4,196,357 18.8

AGTTC 1,645,584 18.8

GCCTA 905,644 18.8

TGGGC 1,860,194 18.8

TTGGT 1,917,795 18.8

CTGGG 3,394,548 18.6

TAGAT 1,614,361 18.6

ATCCT 1,783,775 18.5

GGGGC 1,242,732 18.5

CTCCG 434,620 18.4

TGGAT 1,845,048 18.4

CGGAG 438,057 18.3

CTCTA 1,583,490 18.3

TGGGA 3,069,293 18.2

CGGCT 444,780 18

GCGAC 166,650 18

GGGGA 1,668,896 18

GATCT 1,453,710 17.9

GTCCG 167,450 17.9

TACCA 1,341,094 17.9

CGCGG 168,320 17.8

ATCCA 1,653,448 17.5

GCGAA 172,071 17.4

AATTA 3,028,733 17.2

TCGCC 290,342 17.2

AATCT 2,044,248 17.1

ATTTT 6,889,566 17.1

ACCAA 1,470,794 17

TTTTC 4,633,855 17

GAATC 1,256,563 16.7

GGGAT 1,619,645 16.7

TCGAC 121,003 16.5

GATCC 977,327 16.4

GGCCG 486,575 16.4

GTGGC 1,770,658 16.4

TTGGA 2,021,706 16.3

CTGGA 2,422,025 16.1

AAGGT 1,565,091 16

ACCTT 1,627,537 16

TCGAG 314,039 15.9

ACCTG 1,902,889 15.8

GTCGA 126,568 15.8

GGGAG 2,548,300 15.7

TCCTT 2,740,778 15.7

TCGCA 191,240 15.7

GAGGG 1,728,181 15.6

GGTTC 1,151,677 15.6

AAGGG 1,618,064 15.5

TAGAG 1,744,127 15.5

AAGGA 2,465,830 15.4

CGCCT 649,561 15.4

AGGAT 1,828,630 15.3

AAGGC 1,448,102 15.2

ATCCC 1,444,244 15.2

CGGTT 199,182 15.1

ACCTC 1,802,771 15

TCGTC 200,023 15

ACCTA 1,071,888 14.9

CTCCC 2,546,263 14.9

GGGCC 1,148,697 14.8

TCCTC 2,104,325 14.7

TCCTA 1,443,269 14.6

TTGGC 1,717,496 14.6

ACGAG 275,645 14.5

ATCTG 1,927,416 14.5

GAGGA 2,135,035 14.5

GGCCT 1,694,825 14.2

ATCAA 1,629,493 14.1

GGGAC 1,132,001 14.1

TGGTA 1,488,127 14.1

ATGAT 2,077,816 14

TCGCT 358,226 14

GTCCA 1,081,576 13.9

GAATT 2,253,811 13.8

CTCGC 293,386 13.6

GGATC 961,929 13.5

TACCT 1,407,871 13.5

TGGAG 2,596,589 13.5

GATTT 2,533,510 13.4

CACCT 2,032,511 13.3

GTCTA 916,887 13.1

ACGAA 231,638 13

ATCAG 1,609,797 13

CACCA 2,173,231 12.9

CGGCG 154,655 12.9

TGCCT 2,791,274 12.9

TGGTG 2,471,478 12.9

CACCG 389,594 12.8

CGGAA 234,656 12.8

CTCGA 312,122 12.8

GATTC 1,332,981 12.8

GACAT 1,424,213 12.6

GGGCA 1,688,438 12.4

AGGTG 2,171,788 12

CAGAC 1,336,578 12

GTGAC 1,166,789 12

ATCTT 2,352,939 11.9

CTGGC 2,009,055 11.9

AACAT 2,448,809 11.8

GCCTC 2,464,770 11.8

TAGGC 935,262 11.8

TGGCA 2,040,974 11.8

TGGAC 1,118,195 11.6

AAGAT 2,182,717 11.5

GAGAG 2,174,136 11.5

GTCCC 1,128,048 11.5

CAGGC 2,547,809 11.4

AATCC 1,599,750 11.3

ATGTA 2,115,619 11.3

GGGAA 2,043,736 11.3

GTGAT 1,856,529 11.3

CACAT 2,075,826 11.1

CGCAA 179,679 11.1

TAGAA 2,336,356 11.1

TTCCT 3,255,059 11.1

GTCAA 1,086,031 11

GTCTT 2,002,883 11

AACCG 183,118 10.9

CTCGG 549,387 10.9

AGCCA 2,317,199 10.8

CTCTC 2,135,735 10.8

AGCCT 2,521,864 10.7

CACAA 1,685,692 10.7

CTCTT 2,613,324 10.7

GCCTG 2,607,454 10.7

TAGTG 1,303,038 10.7

TTTTT 12,403,974 10.7

ATGCA 1,784,314 10.6

TTCCC 2,099,250 10.5

CACCC 1,542,080 10.4

AGGCC 1,570,698 10.2

CTGAT 1,673,282 10.2

ATCAT 1,981,919 10.1

ATCTC 1,988,006 10.1

CACGA 296,505 10.1

GCGAG 297,587 10.1

CAGAG 2,709,767 10

CGGTG 399,373 10

GAGGC 2,308,648 10

GGGTA 901,530 10

GTCAT 1,393,797 10

TCGAT 201,684 9.9

TGGAA 2,615,669 9.9

AGGAA 2,969,380 9.8

AACTA 1,447,514 9.7

AACCT 1,674,876 9.6

AGGCA 2,512,670 9.6

ATGAA 2,702,783 9.6

ATTCG 207,881 9.6

CTCAT 1,985,339 9.6

GACAC 1,041,277 9.6

TACAG 2,072,679 9.6

TTGAG 2,408,269 9.6

GCCTT 1,679,701 9.5

TTGAT 2,105,257 9.5

TTCCA 2,564,195 9.4

AGGTA 1,397,835 9.3

GTCAG 1,516,244 9.2

TAGTT 1,837,905 9.2

ATGAG 1,986,080 9.1

GTGTA 1,312,975 9.1

AAGAG 2,320,735 9

GACAA 1,334,055 9

GTCTG 1,548,023 9

GTGAG 2,113,466 9

TACTA 1,335,826 9

AATTT 4,476,001 8.9

AAGAC 1,611,101 8.7

CTGTC 1,951,171 8.7

GACCC 915,662 8.7

GAGAT 2,079,231 8.7

GAGGT 1,847,409 8.7

AGCCC 1,414,929 8.5

CACTA 1,057,476 8.5

CTCAA 2,002,943 8.5

TGGTC 1,409,723 8.5

TTGTC 1,673,883 8.4

ATCAC 1,562,962 8.3

TGCCA 2,050,488 8.3

TGGCT 2,530,312 8.3

GATCG 242,933 8.2

TTCTT 4,488,245 8.2

ATGAC 1,240,787 8.1

CACAG 2,229,476 8.1

GTGAA 1,980,737 8.1

CTCTG 2,881,012 8

TAGTC 994,806 8

AACCA 1,640,714 7.9

AGGCT 2,518,182 7.9

GACCA 1,268,804 7.9

AATTC 2,042,794 7.8

TTCAC 1,912,944 7.8

AACAA 2,583,271 7.7

AGGTT 1,824,771 7.7

GGGTT 1,564,662 7.7

TAGAC 903,455 7.7

TTCTA 2,353,747 7.6

CTGTG 2,669,503 7.5

TTGTA 2,392,407 7.5

GGGTC 951,210 7.4

TGGTT 2,028,333 7.4

TACTT 2,056,964 7.3

TGCCC 1,769,066 7.3

TTGAC 1,238,357 7.3

AGCAT 1,809,041 7.2

CGGTA 138,026 7.2

GAGAA 2,772,870 7.2

GGCCA 1,798,589 7.2

GGGTG 1,678,020 7.2

TTGAA 2,940,009 7.1

AGCCG 426,921 7

GACCT 1,289,808 7

GGCCC 1,138,358 7

TTCTC 2,989,593 7

TTGTG 2,276,551 7

AACCC 1,300,674 6.9

AACTC 1,585,053 6.9

AAGTG 2,306,384 6.9

ATGTC 1,452,773 6.9

AGCAA 2,052,753 6.8

ATGTT 2,786,964 6.8

CCGAT 146,543 6.8

TAGCG 146,093 6.8

GACAG 1,795,098 6.7

TGCCG 298,743 6.7

TGCAC 1,505,446 6.6

TGGCG 452,214 6.6

GACTA 920,327 6.5

GAGCG 309,210 6.5

GAGAC 1,874,728 6.4

CACAC 2,081,905 6.2

CGGTC 161,217 6.2

TTGTT 3,529,061 6.2

GTGTG 2,651,781 6

TTCTG 3,182,478 6

CACTC 1,512,674 5.9

CAGAA 2,712,980 5.9

CCGCG 170,170 5.9

TACAT 2,035,214 5.9

GACCG 172,337 5.8

TACCC 860,100 5.8

TTCAT 2,943,069 5.8

AACTG 1,756,292 5.7

CACTG 2,438,855 5.7

CTGTT 2,475,015 5.7

GAGCA 1,574,195 5.7

GGCAG 2,285,671 5.7

GTCTC 1,946,130 5.7

TTGCA 2,099,278 5.7

GGGCG 535,296 5.6

TGCAT 1,971,144 5.6

AACAC 1,449,098 5.5

AAGTT 2,181,688 5.5

CAGCG 361,790 5.5

CTGAG 2,731,706 5.5

CTGCG 367,134 5.4

TACTG 1,488,913 5.4

TTGCC 1,678,964 5.4

CAGTG 2,646,157 5.3

TACTC 1,133,773 5.3

CTCAC 1,931,040 5.2

CTGTA 2,130,012 5.2

TACAA 1,908,493 5.2

TGCAA 1,917,811 5.2

AAGCA 2,170,505 5.1

ATCGA 195,088 5.1

AACAG 2,000,834 5

AGGAC 1,231,398 4.9

CTGCT 2,245,964 4.9

TTCAG 2,522,192 4.8

AAGCT 1,712,564 4.7

AGGCG 638,469 4.7

CAGTA 1,494,909 4.7

CGCCA 422,958 4.7

GGCTG 2,741,900 4.7

GTGCT 1,714,266 4.7

TACAC 1,057,643 4.7

TGGCC 1,920,427 4.7

AGCAG 2,164,259 4.6

GACTG 1,307,666 4.6

GGGCT 1,523,231 4.6

TGCAG 2,605,444 4.6

CTCAG 2,736,400 4.4

GTCCT 1,350,689 4.4

TGCTC 1,585,672 4.4

TTGCT 2,485,128 4.4

AAGTC 1,400,112 4.3

GAGTC 1,154,013 4.3

AATCG 239,739 4.2

TGCTA 1,429,971 4.2

AGCAC 1,475,385 4.1

CACTT 2,208,939 4.1

GAGTA 1,235,963 4

GAGTG 1,765,602 4

ATGCT 1,790,013 3.9

ATGTG 2,336,429 3.9

CTGAA 2,287,643 3.9

GAGTT 1,807,249 3.9

TTCAA 2,591,506 3.9

CTGCC 2,363,179 3.8

GGCAT 1,566,285 3.8

TTCCG 261,801 3.8

GAGCT 1,605,030 3.7

GGCTA 1,071,955 3.7

TGCTG 2,682,543 3.7

AAGAA 3,668,520 3.5

CTGAC 1,424,020 3.5

GGCAA 1,420,064 3.5

GTCAC 1,132,235 3.5

GTGCG 287,419 3.5

TAGCA 1,427,378 3.5

TGCTT 2,537,248 3.5

CAGCA 2,329,412 3.4

CAGCT 2,359,725 3.4

CAGTT 2,048,159 3.4

AACTT 2,124,044 3.3

AAGCG 306,975 3.3

AGCTG 2,432,398 3.3

ATGCC 1,507,838 3.3

TAGTA 1,525,667 3.3

AGCTC 1,543,819 3.2

AGCTT 1,856,882 3.2

CAGCC 2,575,196 3.1

GGCTC 1,621,320 3.1

GTGCA 1,635,546 3.1

CGCAG 336,093 3

GAGCC 1,644,944 3

GTGTT 1,971,703 3

CTGCA 2,480,807 2.8

GACTC 1,085,413 2.8

TAGCC 1,078,376 2.8

AAGTA 1,952,402 2.6

AGGTC 1,225,867 2.4

TAGCT 1,696,538 2.4

CAGTC 1,327,380 2.3

GTGTC 1,277,120 2.3

AAGCC 1,382,283 2.2

GTGCC 1,359,530 2.2

GACTT 1,519,825 2

AGCTA 1,540,926 1.9

CGCCC 525,704 1.9

GGCTT 1,581,596 1.9

GGCAC 1,228,974 0.8

ACGAC 130,986 0

ATGCG 211,435 0

CGCAC 240,881 0

CGCCG 149,145 0

CGCTA 133,733 0

CGCTC 291,869 0

CGCTG 371,534 0

CGCTT 308,712 0

CGGAC 152,601 0

CGGCC 478,383 0

TACGG 156,372 0

TTGCG 214,486 0
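
The rate column in Supplementary Table 11 is expressed per million quintuplet occurrences, so an approximate mutation count for each quintuplet can be recovered from the two numeric columns. The sketch below assumes the rate was computed as (mutations / occurrences) × 1,000,000, which is what the column header implies but is not spelled out in the table. The function names `parse_row` and `implied_mutations` are illustrative only, and because the published rates are rounded to one decimal place, the recovered counts are estimates.

```python
import re

# One table row looks like: "TACGA 159,672 31.3"
ROW = re.compile(r"^([ACGT]{5}) ([\d,]+) ([\d.]+)$")


def parse_row(line):
    """Split a Table 11 row into (quintuplet, occurrences, rate per million)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"not a table row: {line!r}")
    quintuplet, occurrences, rate = match.groups()
    return quintuplet, int(occurrences.replace(",", "")), float(rate)


def implied_mutations(occurrences, rate_per_million):
    """Back-calculate the approximate mutation count (assumed formula)."""
    return rate_per_million * occurrences / 1_000_000


for row in ["TACGA 159,672 31.3", "GGCAC 1,228,974 0.8", "TTGCG 214,486 0"]:
    quintuplet, occurrences, rate = parse_row(row)
    print(quintuplet, round(implied_mutations(occurrences, rate)))
```

Under this assumption, the TACGA row corresponds to roughly 5 mutations and the GGCAC row to roughly 1, which illustrates how few events underlie the rates for rare quintuplets.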

Supplementary Table 12

The effect of +/− two bases flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC

Columns: Quintuplet sequence; Number of occurrences in reference outside transcript regions; Whole genome intergenic mutations per million quintuplet occurrences

ATAGG : CCTAT 2,052,364 849.7

CCTAG : CTAGG 1,896,141 771.6

ACAGG : CCTGT 3,450,530 751.5

ATAGC : GCTAT 1,959,877 743.4

ACAGC : GCTGT 2,532,402 706.1

CCTAC : GTAGG 1,484,355 655.5

ATAGA : TCTAT 3,476,362 647.0

CTAGC : GCTAG 1,398,035 633.7

CTAGA : TCTAG 2,416,288 626.5

ATACA : TGTAT 4,300,830 615.0

CCAGG : CCTGG 4,241,635 593.7

ACAGA : TCTGT 4,777,684 592.8

CTACA : TGTAG 2,626,430 571.1

CCTGC : GCAGG 2,958,690 565.1

ACAGT : ACTGT 3,070,569 546.8

ATACG : CGTAT 375,014 522.6

CCAGA : TCTGG 3,326,103 510.5

GTAGA : TCTAC 2,365,186 510.3

GCTAC : GTAGC 1,509,832 490.1

GCAGC : GCTGC 2,293,528 485.8

ACTGG : CCAGT 2,650,172 484.9

CATAG : CTATG 2,315,218 476.9

CCTGA : TCAGG 3,432,582 468.8

CTATA : TATAG 2,594,873 468.2

CTAAG : CTTAG 2,099,113 465.9

CCAGC : GCTGG 3,965,825 463.5

GTACA : TGTAC 1,938,474 460.2

ACTAT : ATAGT 2,711,116 458.8

ACATG : CATGT 3,734,730 450.7

ACTAG : CTAGT 1,673,935 439.0

AGTAT : ATACT 2,747,714 438.2

GCAGA : TCTGC 3,286,245 437.8

CTACC : GGTAG 1,566,082 432.9

ACATA : TATGT 4,063,726 425.7

ATATG : CATAT 3,878,372 422.6

ATACC : GGTAT 1,862,340 419.3

CGTAG : CTACG 255,747 414.4

GCTGA : TCAGC 3,122,946 397.0

ACTGC : GCAGT 2,766,283 392.6

CATGG : CCATG 3,238,291 390.3

CCTAA : TTAGG 2,290,472 390.3

ACACA : TGTGT 5,179,407 388.5

ATAAG : CTTAT 2,991,670 384.8

AGTAG : CTACT 2,655,704 384.5

ACTGA : TCAGT 3,143,432 382.7

CCATA : TATGG 2,275,110 378.4

AGTAC : GTACT 1,476,881 370.4

ATATA : TATAT 6,772,263 367.7

ACACG : CGTGT 546,593 364.1

ACTAC : GTAGT 1,744,628 359.4

GCATA : TATGC 1,989,600 358.3

CATAC : GTATG 1,887,210 357.6

ACAAG : CTTGT 2,877,958 352.0

CTTAC : GTAAG 1,832,124 347.2

ACACT : AGTGT 2,860,776 343.3

TCAGA : TCTGA 3,786,073 343.1

GCTAA : TTAGC 2,215,166 339.0

GCACA : TGTGC 2,675,440 331.1

GTATA : TATAC 2,573,081 329.2

ATAAC : GTTAT 2,423,998 321.4

CTATC : GATAG 1,774,630 320.6

CTAAC : GTTAG 1,612,806 315.0

CCAAG : CTTGG 3,021,796 314.7

CATGC : GCATG 2,630,545 309.8

ACATC : GATGT 2,408,340 309.3

GGTAC : GTACC 1,097,889 308.8

CAAGG : CCTTG 2,819,989 308.5

TCTAA : TTAGA 3,209,633 305.0

CCACA : TGTGG 3,442,619 303.5

ACAAC : GTTGT 2,299,513 301.8

ATATC : GATAT 2,846,993 297.5

CTTGC : GCAAG 2,325,023 291.6

ACAAT : ATTGT 3,739,513 289.6

CCTTA : TAAGG 2,140,251 286.9

CTAAA : TTTAG 3,845,536 285.3

GTAAC : GTTAC 1,442,530 282.1

AGAGG : CCTCT 3,609,743 281.7

GATAC : GTATC 1,475,654 279.2

ACACC : GGTGT 2,190,637 273.4

CGTAC : GTACG 170,187 270.3

AGTGC : GCACT 2,404,275 267.9

AATAG : CTATT 3,515,013 262.9

ATTGG : CCAAT 2,413,363 259.0

ATAAA : TTTAT 8,099,851 258.9

AGTGG : CCACT 3,095,771 257.8

ATTAG : CTAAT 3,039,701 257.3

TGTAA : TTACA 4,148,426 257.2

CCACG : CGTGG 642,335 250.7

CCAAC : GTTGG 2,121,332 249.4

CCATC : GATGG 2,829,160 243.9

AGAGC : GCTCT 2,615,718 240.9

GTAAA : TTTAC 3,406,648 236.0

ACTAA : TTAGT 2,881,846 236.0

AAAGG : CCTTT 4,226,614 234.7

GCACC : GGTGC 1,682,128 233.6

GATGC : GCATC 1,677,346 231.3

CCTCA : TGAGG 3,771,129 230.7

GCAAC : GTTGC 1,815,344 230.2

ATTGC : GCAAT 2,588,999 229.8

ATAAT : ATTAT 5,489,421 227.9

CTTGA : TCAAG 3,322,833 227.6

ACAAA : TTTGT 6,652,102 227.4

CATGA : TCATG 3,425,400 226.8

AGTAA : TTACT 3,229,297 226.1

CAAGC : GCTTG 2,176,986 224.6

CGTGC : GCACG 452,367 223.3

ATTAC : GTAAT 2,947,816 222.9

CGAGC : GCTCG 267,150 220.8

GGTAA : TTACC 1,932,429 216.9

GCTTA : TAAGC 1,817,182 216.3

TATGA : TCATA 3,145,460 212.7

AATGT : ACATT 4,725,048 211.2

CCAAA : TTTGG 4,516,292 209.6

CTTAA : TTAAG 3,198,367 208.5

AATAT : ATATT 6,575,896 206.3

CAAGA : TCTTG 3,748,442 206.2

CCTCG : CGAGG 589,094 203.7

TCACA : TGTGA 3,913,802 202.9

CCACC : GGTGG 3,273,847 193.3

AGTGA : TCACT 3,915,720 191.5

GTTAA : TTAAC 2,494,623 191.2

CCTTC : GAAGG 3,205,842 190.9

GTTGA : TCAAC 2,242,837 190.4

AGAGA : TCTCT 5,807,492 187.2

CATAA : TTATG 3,392,975 186.9

ATTGA : TCAAT 3,409,665 184.2

ACTTG : CAAGT 2,801,997 182.3

TAAGA : TCTTA 3,367,481 180.9

AAAGC : GCTTT 3,462,030 179.9

AATGG : CCATT 3,768,892 175.9

ACTCT : AGAGT 3,097,200 174.6

GCTCA : TGAGC 2,942,445 174.0

CGTAA : TTACG 318,896 172.5

AATAC : GTATT 3,349,997 171.0

TATAA : TTATA 4,968,257 169.5

GCAAA : TTTGC 3,718,838 168.9

CCTCC : GGAGG 4,217,404 165.5

ACTTA : TAAGT 2,796,406 163.8

ACTCA : TGAGT 3,077,142 163.1

GATGA : TCATC 2,849,649 159.7

AATGC : GCATT 2,961,725 157.4

CGTGA : TCACG 681,218 157.1

CGAGA : TCTCG 697,460 154.8

ACCGT : ACGGT 374,711 152.1

TTAAA : TTTAA 7,445,848 148.0

ACTCG : CGAGT 381,872 146.6

AGAAG : CTTCT 4,707,560 145.5

TCTCA : TGAGA 4,855,169 145.4

TCAAA : TTTGA 5,337,344 143.5

GAAGC : GCTTC 2,372,194 142.1

GATAA : TTATC 2,890,095 141.9

GGTGA : TCACC 2,695,528 140.6

AAAGA : TCTTT 7,414,758 138.1

GCTCC : GGAGC 1,910,076 137.7

GGAGA : TCTCC 3,817,593 137.0

ATTAA : TTAAT 5,119,815 134.1

ACGGC : GCCGT 276,090 134.0

GAAGA : TCTTC 4,233,476 130.4

CTTCA : TGAAG 3,837,423 129.5

CGTTA : TAACG 294,433 125.7

CGAAG : CTTCG 349,354 120.3

CAACA : TGTTG 3,717,821 119.7

CGATA : TATCG 243,357 119.2

AGAAC : GTTCT 2,999,610 118.7

CAAAG : CTTTG 4,519,070 116.2

AGACA : TGTCT 4,209,019 115.2

ACGGG : CCCGT 442,822 115.2

ACTTC : GAAGT 2,862,069 113.5

ACTCC : GGAGT 2,708,221 112.6

AATGA : TCATT 5,030,821 110.5

CTTTA : TAAAG 4,018,590 110.2

CTTCC : GGAAG 3,779,279 109.8

CAATG : CATTG 3,026,134 107.8

AGTCG : CGACT 233,226 107.2

TAACA : TGTTA 3,362,077 105.3

CGACA : TGTCG 324,069 104.9

CGAAC : GTTCG 270,384 103.6

GCCGC : GCGGC 318,992 103.4

CCCGC : GCGGG 541,822 101.5

TGACA : TGTCA 3,205,410 101.4

CAATA : TATTG 3,400,301 100.9

AGATG : CATCT 3,743,691 99.9

CCGGC : GCCGG 501,117 99.8

CAACG : CGTTG 332,131 99.4

ACGGA : TCCGT 433,275 99.3

CCGCG : CGCGG 162,136 98.6

AAAGT : ACTTT 4,911,127 97.2

GTTCA : TGAAC 2,767,227 96.2

CGACG : CGTCG 54,041 92.6

ACCGC : GCGGT 327,027 91.7

ATTCA : TGAAT 4,511,329 90.2

ACGCA : TGCGT 379,989 89.5

AGTTG : CAACT 2,433,741 88.8

AATAA : TTATT 7,753,623 87.8

CATCG : CGATG 325,063 86.2

AGATA : TATCT 3,622,646 85.9

CATTA : TAATG 3,292,676 85.6

CTTTC : GAAAG 4,424,502 85.2

GCGCA : TGCGC 306,633 84.8

CAAAC : GTTTG 3,214,849 84.6

AGCGC : GCGCT 272,542 84.4

AGTCA : TGACT 2,858,999 83.9

AAAAG : CTTTT 7,052,340 83.3

AGACT : AGTCT 3,037,692 83.3

AGAAT : ATTCT 5,136,604 82.4

AGACG : CGTCT 585,786 82.0

CATCA : TGATG 3,323,362 81.8

ACCGA : TCGGT 269,661 81.5

GCGGA : TCCGC 405,725 81.3

TATCA : TGATA 3,078,532 80.5

AGACC : GGTCT 2,239,082 80.4

GCGTA : TACGC 186,896 80.2

AGAAA : TTTCT 8,982,776 79.9

GGTTA : TAACC 1,578,124 79.8

ATTCG : CGAAT 340,773 79.3

ACGCT : AGCGT 343,503 78.6

ATTTG : CAAAT 5,046,087 77.7

TAATA : TATTA 4,751,072 75.8

CCCGG : CCGGG 755,245 75.4

CCGGA : TCCGG 308,370 74.6

GGACA : TGTCC 2,342,223 74.3

CGATC : GATCG 338,139 71.0

GGAAC : GTTCC 1,880,006 70.7

CAACC : GGTTG 1,995,173 70.2

ACCGG : CCGGT 242,556 70.1

CGTCA : TGACG 328,474 70.0

CGAAA : TTTCG 430,148 69.7

ATTCC : GGAAT 3,077,476 69.2

GTTTA : TAAAC 3,025,257 69.1

AAACA : TGTTT 6,593,719 67.3

CATCC : GGATG 2,372,318 66.6

GAACA : TGTTC 3,130,709 64.9

CAATC : GATTG 2,087,332 64.7

TGAAA : TTTCA 6,328,700 63.9

AGTTA : TAACT 2,739,276 63.9

CGTTC : GAACG 316,884 63.1

CCCGA : TCGGG 431,561 62.6

CATTC : GAATG 3,442,173 62.5

ATTTA : TAAAT 6,541,833 62.4

ACGTC : GACGT 320,149 62.4

TCCGA : TCGGA 258,527 61.9

ACGTG : CACGT 654,147 61.1

AGGGG : CCCCT 2,293,071 61.0

CGTCC : GGACG 294,768 61.0

CCGCA : TGCGG 379,538 60.6

GAATA : TATTC 3,514,487 60.1

GCCGA : TCGGC 484,164 59.9

CACGC : GCGTG 671,787 59.5

ATGGG : CCCAT 2,777,449 59.1

CAAAA : TTTTG 7,444,256 56.9

CACGG : CCGTG 517,811 56.0

AACGC : GCGTT 267,967 56.0

AGGGA : TCCCT 3,562,788 55.6

GGTCA : TGACC 2,040,429 55.4

ACGCC : GGCGT 549,598 54.6

ACCCT : AGGGT 2,229,226 53.4

GGATA : TATCC 2,042,852 53.3

ACCCC : GGGGT 1,894,959 53.3

AGATC : GATCT 2,365,504 52.9

ACGTA : TACGT 435,280 52.8

GCGCC : GGCGC 498,139 52.2

AAATG : CATTT 6,725,462 51.8

ACCCA : TGGGT 3,008,267 51.5

GGAAA : TTTCC 4,995,248 51.4

ACGAT : ATCGT 371,807 51.1

CCCCA : TGGGG 3,180,062 50.9

CCGTA : TACGG 236,122 50.8

AGCGG : CCGCT 339,581 50.1

CCCTA : TAGGG 1,665,007 49.8

GAAAC : GTTTC 3,351,605 49.5

AGTCC : GGACT 1,942,058 49.4

GGGGA : TCCCC 2,523,449 48.7

ATGGA : TCCAT 3,722,767 48.3

CGGGA : TCCCG 601,104 48.2

AGGGC : GCCCT 1,981,241 47.4

AGTTC : GAACT 2,728,094 47.3

ACCAT : ATGGT 3,157,808 46.8

CCGTC : GACGG 448,462 46.8

AAACG : CGTTT 581,809 46.4

GGCGA : TCGCC 367,842 46.2

AAAAC : GTTTT 5,844,209 46.0

GATTA : TAATC 2,814,962 45.8

AACGT : ACGTT 525,445 45.7

CCCCC : GGGGG 1,673,893 45.4

AGCGA : TCGCT 488,795 45.0

CCGCC : GGCGG 712,310 44.9

ACCCG : CGGGT 446,585 44.8

TCCCA : TGGGA 4,695,858 44.5

TAAAA : TTTTA 9,212,257 44.4

CCCCG : CGGGG 585,338 44.4

CACGA : TCGTG 478,740 43.9

AAACT : AGTTT 4,918,718 43.7

CGACC : GGTCG 207,990 43.3

AATTG : CAATT 3,637,696 43.1

ATTTC : GAAAT 5,379,726 42.8

CAGGG : CCCTG 3,061,530 41.5

ACCAG : CTGGT 2,654,089 41.1

CCCTC : GAGGG 2,562,870 41.0

ACCTG : CAGGT 2,945,489 40.8

ACGAG : CTCGT 417,911 40.7

GATCA : TGATC 2,385,310 40.2

ATCGA : TCGAT 325,380 39.9

GAACC : GGTTC 1,741,491 39.0

CCGAA : TTCGG 309,817 38.7

AAATA : TATTT 9,528,640 38.4

ACCTA : TAGGT 2,014,228 37.7

AATTA : TAATT 5,738,091 37.3

AATCT : AGATT 3,824,640 36.6

CGCGC : GCGCG 193,803 36.1

CCCAG : CTGGG 4,805,933 35.8

GATCC : GGATC 1,536,655 35.8

GACGC : GCGTC 223,131 35.8

ATGGC : GCCAT 2,453,540 35.5

ACCAA : TTGGT 2,998,119 35.0

ACCAC : GTGGT 2,627,802 35.0

CTCGC : GCGAG 372,293 34.9

AAACC : GGTTT 3,341,606 34.7

GGACC : GGTCC 1,180,875 34.7

TACGA : TCGTA 261,572 34.4

TCGCA : TGCGA 291,523 34.3

AACGG : CCGTT 324,489 33.9

ATCCG : CGGAT 297,731 33.6

AAGGA : TCCTT 4,524,298 33.3

CCCAA : TTGGG 3,196,868 33.2

GAAAA : TTTTC 7,580,956 32.8

AATCA : TGATT 4,144,717 32.8

GCCCA : TGGGC 2,648,929 32.5

GTGGA : TCCAC 2,589,941 32.4

AAAAT : ATTTT 11,341,687 32.3

ATCCA : TGGAT 3,084,545 32.1

ATCGG : CCGAT 219,179 31.9

CGGGC : GCCCG 538,369 31.6

GAATC : GATTC 2,265,912 30.9

CTGGA : TCCAG 3,820,895 30.6

AAGGG : CCCTT 2,776,940 30.6

CAGGA : TCCTG 4,405,951 30.4

ATCTA : TAGAT 2,810,436 30.2

ATCCC : GGGAT 2,478,217 29.8

CTCCA : TGGAG 3,906,542 28.5

TAGGA : TCCTA 2,521,865 28.1

CCCAC : GTGGG 2,752,863 28.0

GCCCC : GGGGC 1,640,122 27.4

TCGAA : TTCGA 401,739 27.4

GAGGA : TCCTC 3,348,583 27.2

AGCCG : CGGCT 553,889 27.1

ACCTC : GAGGT 2,823,718 26.9

CACCG : CGGTG 522,284 26.8

CTCCC : GGGAG 3,808,670 26.5

CTGGC : GCCAG 2,902,899 26.5

AAGGT : ACCTT 2,611,045 26.5

TCCAA : TTGGA 3,302,901 26.3

ATCTG : CAGAT 3,249,764 26.2

ACGAA : TTCGT 421,152 26.1

AAATC : GATTT 4,129,568 25.9

CCGAG : CTCGG 700,550 25.7

ATCGC : GCGAT 389,448 25.6

CTCTA : TAGAG 2,773,224 24.6

GCCAC : GTGGC 2,484,104 24.5

AGGAG : CTCCT 4,385,115 24.2

AAAAA : TTTTT 17,969,434 24.0

ACGAC : GTCGT 208,177 24.0

AGGAT : ATCCT 3,020,489 23.8

CGCCG : CGGCG 126,036 23.8

ATCAT : ATGAT 3,677,654 23.4

CCGAC : GTCGG 214,350 23.3

AATCC : GGATT 2,929,127 22.9

GGGAA : TTCCC 3,400,156 22.6

AATCG : CGATT 399,232 22.6

ATCAG : CTGAT 2,847,458 22.1

GACGA : TCGTC 271,370 22.1

ATCAA : TTGAT 3,532,513 21.8

AAGAT : ATCTT 4,004,025 21.7

TACCA : TGGTA 2,440,256 21.3

AACCG : CGGTT 281,066 21.3

CACCA : TGGTG 3,709,651 21.0

AGCCA : TGGCT 3,859,218 20.5

CTCGA : TCGAG 444,875 20.2

CGGAG : CTCCG 552,978 19.9

ATCTC : GAGAT 3,492,313 19.8

GCCAA : TTGGC 2,575,039 19.8

AACGA : TCGTT 407,359 19.6

GGGAC : GTCCC 1,696,168 19.5

GCGAA : TTCGC 257,972 19.4

AGGTC : GACCT 1,975,778 19.2

AATTC : GAATT 3,843,858 18.5

GAGGC : GCCTC 3,452,260 18.5

GCGAC : GTCGC 215,949 18.5

AGGCA : TGCCT 4,143,383 18.3

GTCCA : TGGAC 1,814,974 18.2

AAGGC : GCCTT 2,480,421 18.1

CTCTC : GAGAG 3,640,518 17.6

AGGTG : CACCT 3,202,165 17.5

GGGTA : TACCC 1,491,394 17.4

AAATT : AATTT 7,925,716 17.3

AGGAA : TTCCT 5,349,454 17.2

AACTA : TAGTT 2,982,814 17.1

AGGAC : GTCCT 2,045,582 17.1

GCCTA : TAGGC 1,522,285 17.1

GTCTA : TAGAC 1,650,946 17.0

CACTA : TAGTG 2,012,473 16.9

ATGAG : CTCAT 3,452,201 16.8

TGCCA : TGGCA 3,270,540 16.8

GTCGA : TCGAC 180,867 16.6

ATGTA : TACAT 3,772,157 16.5

CACCC : GGGTG 2,407,840 16.2

CGCTC : GAGCG 372,157 16.2

ATGAC : GTCAT 2,264,189 15.9

AGCAT : ATGCT 3,100,402 15.8

TGGAA : TTCCA 4,630,455 15.5

AAGAG : CTCTT 4,263,642 15.5

CTGTA : TACAG 3,350,982 15.2

CAGAG : CTCTG 4,452,819 14.9

TAGAA : TTCTA 4,232,848 14.8

ATGTC : GACAT 2,573,351 14.8

CGGAC : GTCCG 203,247 14.8

CGGTA : TACCG 202,040 14.8

ATGTG : CACAT 3,956,837 14.7

AACCA : TGGTT 3,195,409 14.7

AGCCT : AGGCT 3,773,234 14.6

CGGCC : GGCCG 549,181 14.5

CAGGC : GCCTG 3,670,279 14.4

CGGTC : GACCG 208,544 14.4

GACCA : TGGTC 2,092,573 14.3

AGGTA : TACCT 2,325,061 14.2

ATCAC : GTGAT 2,874,220 14.0

AACCT : AGGTT 2,891,494 13.8

AGCCC : GGGCT 2,137,993 13.1

ACGCG : CGCGT 76,690 13.0

TACTA : TAGTA 2,388,396 12.6

AGGCC : GGCCT 2,401,729 12.5

GTCAA : TTGAC 2,080,186 12.5

CACAG : CTGTG 3,867,255 12.4

GTGAA : TTCAC 3,329,761 12.3

CTCAA : TTGAG 3,703,373 12.2

CTGAC : GTCAG 2,300,700 11.8

CGCAG : CTGCG 426,776 11.7

CTGTC : GACAG 2,867,756 11.5

CGCGA : TCGCG 87,447 11.5

TTCAA : TTGAA 4,996,620 11.4

AACCC : GGGTT 2,288,382 11.4

CACAC : GTGTG 3,837,959 11.2

AAGAC : GTCTT 3,105,255 11.2

GTCAC : GTGAC 1,788,265 11.1

CGCCC : GGGCG 629,376 11.1

AGCAG : CTGCT 3,469,011 11.0

GGGCA : TGCCC 2,539,788 11.0

CTCAG : CTGAG 4,276,152 10.9

CTCAC : GTGAG 3,214,093 10.9

AAGAA : TTCTT 7,391,191 10.8

AACAT : ATGTT 4,629,550 10.8

AACTT : AAGTT 3,778,240 10.8

GAGAC : GTCTC 2,984,778 10.8

GACAA : TTGTC 2,690,605 10.4

CAGAC : GTCTG 2,330,738 10.3

GACAC : GTGTC 1,948,916 10.3

ATGAA : TTCAT 5,057,318 9.9

GGCCA : TGGCC 2,720,333 9.9

CACAA : TTGTG 3,548,989 9.8

AACTG : CAGTT 3,186,755 9.7

CAGAA : TTCTG 5,142,565 9.6

CTGAA : TTCAG 4,060,876 9.6

AGCTA : TAGCT 2,655,556 9.4

AGCAC : GTGCT 2,450,745 9.4

AACAC : GTGTT 2,920,967 9.2

AACAA : TTGTT 5,508,395 9.1

ATGCA : TGCAT 3,390,737 9.1

CAGCG : CGCTG 443,621 9.0

TGCAA : TTGCA 3,612,657 8.9

AGCAA : TTGCT 3,967,310 8.8

AAGTG : CACTT 3,706,350 8.7

GTGTA : TACAC 2,069,745 8.7

AAGCT : AGCTT 3,011,869 8.6

GTGCA : TGCAC 2,549,578 8.6

GGCCC : GGGCC 1,501,415 8.6

AGGCG : CGCCT 806,814 8.6

CACTG : CAGTG 3,907,592 8.5

AACAG : CTGTT 3,745,316 8.5

CGGCA : TGCCG 351,139 8.5

TACAA : TTGTA 3,821,239 8.4

GAGAA : TTCTC 5,060,061 8.3

CACTC : GAGTG 2,650,103 8.3

GAGCA : TGCTC 2,535,818 8.3

CTGCA : TGCAG 4,023,532 8.2

AAGTC : GACTT 2,548,544 8.2

AAGTA : TACTT 3,338,487 7.8

GACTC : GAGTC 1,793,884 7.8

CAGTC : GACTG 2,101,763 7.6

GACTA : TAGTC 1,627,837 7.4

GACCC : GGGTC 1,391,465 7.2

AGCTG : CAGCT 3,659,524 7.1

CAGTA : TACTG 2,434,289 7.0

AAGCA : TGCTT 3,978,654 6.8

CGCAA : TTGCG 294,062 6.8

ATGCC : GGCAT 2,461,463 6.5

CTGCC : GGCAG 3,385,498 6.2

GGCAA : TTGCC 2,574,229 6.2

GAGTA : TACTC 1,948,447 6.1

CAGCA : TGCTG 3,994,634 6.0

AAGCC : GGCTT 2,412,704 5.8

TAGCA : TGCTA 2,478,445 5.7

CGGAA : TTCCG 351,915 5.7

AACTC : GAGTT 2,872,014 4.9

CGCTA : TAGCG 203,508 4.9

AAGCG : CGCTT 425,150 4.7

GGCTA : TAGCC 1,745,360 4.6

AGCTC : GAGCT 2,477,629 4.4

CAGCC : GGCTG 3,816,659 3.9

GGCAC : GTGCC 1,883,650 3.8

CGCCA : TGGCG 565,505 3.5

ATGCG : CGCAT 306,587 3.2

CGCAC : GTGCG 330,260 3.0

GAGCC : GGCTC 2,349,760 2.6
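Each row of Supplementary Table 12 pairs a quintuplet with its reverse complement (for example GATTA : TAATC), apparently listing the alphabetically earlier sequence first, and reports a rate normalised per million occurrences in the reference. The following Python sketch illustrates those two operations. The function names are hypothetical, and the raw mutation counts behind each rate are not given in the table, so any count passed to `mutations_per_million` is an illustrative assumption.

```python
# Illustrative helpers; names are hypothetical and not taken from the original analysis.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_pair_label(quintuplet: str) -> str:
    """Label a quintuplet with its reverse complement, alphabetically earlier member first."""
    first, second = sorted((quintuplet, reverse_complement(quintuplet)))
    return f"{first} : {second}"


def mutations_per_million(mutation_count: int, occurrences: int) -> float:
    """Normalise a mutation count by the number of reference occurrences of the context."""
    return mutation_count * 1_000_000 / occurrences


# Either strand of a context maps to the same row label.
label = strand_pair_label("TAATC")
```

Because both strands collapse to one label, a mutation observed at a thymidine on one strand is counted in the same row as the corresponding adenine mutation on the other strand.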

Supplementary Table 13

The effect of +/− one base flanking the mutated TAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC

Quintuplet sequence | Number of occurrences in reference unspliced transcript regions | Whole genome mutations in unspliced transcript (sense strand) per million quintuplet occurrences

ATAGG 1,196,711 819.7

CTAGG 1,156,560 721.1

CTAGC 824,340 691.5

ATAGC 1,097,065 674.5

ATAGA 1,825,784 660

CTAGA 1,343,347 616.4

GTAGG 980,728 599.6

ATAGT 1,607,995 458.3

GTAGA 1,498,367 433.8

GTAGC 1,033,949 427.5

CTAGT 1,023,498 421.1

TTAGG 1,442,396 394.5

GTAGT 1,169,626 363.4

TTAGC 1,393,011 345.3

TTAGA 1,930,896 287.9

TTAGT 1,860,278 227.4

Supplementary Table 14

The effect of +/− one base flanking the mutated CAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC

Quintuplet sequence | Number of occurrences in reference unspliced transcript regions | Whole genome mutations in unspliced transcript (sense strand) per million quintuplet occurrences

ACAGG 2,179,044 660.8

ACAGC 1,468,809 630.4

ACAGT 1,893,468 538.7

CCAGG 2,920,014 515.8

ACAGA 2,556,782 506.5

GCAGG 2,005,285 505.2

CCAGA 1,968,757 455.1

GCAGC 1,521,819 427.8

TCAGG 2,157,273 426.5

CCAGT 1,678,066 424.3

GCAGA 1,956,583 413.5

CCAGC 2,637,820 396.2

GCAGT 1,907,864 367.4

TCAGC 1,997,426 358.0

TCAGA 2,192,906 350.2

TCAGT 2,036,941 350.0

Supplementary Table 15

Hypergeometric analysis for enrichment of CAG splice-site mutations in AA-UTUCs, AA-treated HK2 clones, and non–AA-associated cancers

A>T:T>A

Sample | splice | non splice | total | p-value <x

AA-UTUCs

3T 12 34 46 1.55E-02

6T 32 103 135 4.76E-06

9T 40 189 229 7.58E-04

10T 8 25 33 6.58E-03

13T 24 99 128 1.27E-03

20T 39 171 210 2.41E-04

79T 29 111 140 1.70E-04

80T 32 132 164 2.89E-04

100T 49 149 198 6.60E-09

AA treated HK2 clones

HK2_clone 1 7 8 15 5.62E-05

HK2_clone 2 3 15 18 2.29E-04

non-AA-associated cancers

H. pylori associated gastric cancer (n=15) 1 13 14 0.76

OV associated cholangiocarcinoma (n=8) 3 49 52 0.21

Exome CAG triad 93,292 776,280
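The enrichment test named in the caption of Supplementary Table 15 can be sketched as a hypergeometric upper-tail probability. The Python example below is a minimal illustration under stated assumptions: it treats the "Exome CAG triad" row as the background population (93,292 splice sites and 776,280 non-splice sites) and asks how likely it is to see at least the observed number of splice-site mutations by chance. The function name and this choice of background are assumptions; the table does not state how the authors configured the test, so the values computed here need not match the published p-values.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n items are drawn without replacement from N items, K of which are successes."""
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


# Assumed background, taken from the "Exome CAG triad" row of the table.
SPLICE_SITES = 93_292
NON_SPLICE_SITES = 776_280
POPULATION = SPLICE_SITES + NON_SPLICE_SITES

# Sample 100T: 49 splice-site mutations among 198 mutations in total.
p_100T = hypergeom_upper_tail(49, 198, SPLICE_SITES, POPULATION)

# H. pylori associated gastric cancer: 1 splice-site mutation among 14.
p_gastric = hypergeom_upper_tail(1, 14, SPLICE_SITES, POPULATION)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the tail, at the cost of working with very large integers; for much larger samples a log-space implementation would be preferable.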

Supplementary Table 16

RPKM gene expression values for 15 NMD pathway genes in the AA-UTUC and matched normal tissue

Gene | 80N (RPKM value) | 80T (RPKM value)

MAGOH 6.87443 33.0464

WIBG 10.9648 28.1009

SMG5 3.94804 19.5251

UPF1 4.83847 19.7272

EIF4A3 3.58962 14.3719

DHX34 2.08127 10.5602

RBM8A 4.06941 10.9308

UPF2 2.40556 3.72954

SMG7 1.01325 2.12839

UPF3B 0.403335 1.39435

CASC3 2.20616 3.01697

SMG6 1.4214 2.00396

SMG8 0.085067 0.283679

SMG1 0.170046 0.164403

SMG9 14.6615 12.3358
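The comparison in Supplementary Table 16 can be summarised as a per-gene log2 fold change between the tumor (80T) and the matched normal (80N). The Python sketch below uses the RPKM values copied from the table. The variable and function names are hypothetical, and the use of a plain log2 ratio without a pseudocount is an assumption that works here only because every value is greater than zero.

```python
from math import log2

# RPKM values copied from Supplementary Table 16: gene -> (80N normal, 80T tumor).
RPKM = {
    "MAGOH": (6.87443, 33.0464),
    "WIBG": (10.9648, 28.1009),
    "SMG5": (3.94804, 19.5251),
    "UPF1": (4.83847, 19.7272),
    "EIF4A3": (3.58962, 14.3719),
    "DHX34": (2.08127, 10.5602),
    "RBM8A": (4.06941, 10.9308),
    "UPF2": (2.40556, 3.72954),
    "SMG7": (1.01325, 2.12839),
    "UPF3B": (0.403335, 1.39435),
    "CASC3": (2.20616, 3.01697),
    "SMG6": (1.4214, 2.00396),
    "SMG8": (0.085067, 0.283679),
    "SMG1": (0.170046, 0.164403),
    "SMG9": (14.6615, 12.3358),
}


def log2_fold_change(normal: float, tumor: float) -> float:
    """log2(tumor / normal); positive values mean higher expression in the tumor."""
    return log2(tumor / normal)


fold_changes = {gene: log2_fold_change(n, t) for gene, (n, t) in RPKM.items()}
up_regulated = [gene for gene, lfc in fold_changes.items() if lfc > 0]
down_regulated = [gene for gene, lfc in fold_changes.items() if lfc < 0]
```

With these values, 13 of the 15 genes have a higher RPKM in 80T than in 80N; SMG1 and SMG9 are the two exceptions.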

Supplementary Table 17

Identities of 3′ splice sites with CAG>CTG mutations and RPKM > 2

Gene symbol | Exon | RPKM | Finding | Ensembl transcript

METAP2 4 2.13 Exon skipping by reads spanning flanking exons ENST00000261220

TRAM1 7 2.13 Exon skipping by read depth and reads spanning flanking exons ENST00000262213

GGNBP2 4 2.4 Exon skipping by reads spanning flanking exons ENST00000304718

CNOT8 4 2.62 Mutated intron retained ENST00000523698

OXSM 2 4.01 Exon skipping by reads spanning flanking exons ENST00000420173

MARS 15 4.63 Mutated intron retained ENST00000262027

RBM10 18 5.31 Mutated intron retained ENST00000377604

AIDA 4 5.38 IC; high 3' RPKM ENST00000340020

SEC31A 3 5.39 IC ENST00000348405

MLL2 13 6.18 IC; high 3' RPKM ENST00000301067

AEBP1 8 8.96 All exons 5' of the mutation skipped by read depth ENST00000223357

C3orf19 6 9.27 IC ENST00000285042

RFC2 9 15.13 Exon skipping by read depth and reads spanning flanking exons ENST00000352131

MBOAT7 2 25.41 Exon skipping by read depth and reads spanning flanking exons ENST00000245615

SRRT 4 27.55 No aberration ENST00000423692

Note: IC = inadequate coverage at site of hypothetical mutation; high 3' RPKM = overall RPKM is high because of very high coverage in the 3'-most exon

Supplementary Table 18

3′ splice sites without CAG>CTG mutations for evaluating the proportion of unmutated sites associated with aberrant splicing

Gene symbol | Exon | RPKM | Finding | Ensembl transcript

HTRA3 2 2.13 No aberration ENST00000307358

PPFIBP2 17 2.13 No aberration ENST00000299492

SMG7 20 2.13 IC ENST00000367537

KIF3B 5 2.13 IC; high 3' RPKM ENST00000375712

LSP1 2 2.4 IC; high 3' RPKM ENST00000381775

ARHGAP26 13 2.4 No aberration ENST00000274498

DHDS 5 2.62 No aberration ENST00000374194

RSL24D1 2 2.62 IC; high 3' RPKM ENST00000260443

ITGA1 27 4 No aberration ENST00000282588

CCBP2 3 4.01 IC ENST00000496604

AGPAT6 7 4.62 No aberration ENST00000396987

RHBDL2 2 4.63 No aberration ENST00000372985

MEF2D 6 5.3 No aberration ENST00000348159

TMEM98 3 5.32 No aberration ENST00000439138

HDAC7 21 5.38 IC ENST00000380610

KDM5C 20 5.38 No aberration ENST00000375401

TIAM1 4 5.3 No aberration ENST00000399841

AP3D1 6 5.4 No aberration ENST00000355272

PRKDC 45 6.11 IC ENST00000523565

PDE4DIP 6 6.33 IC ENST00000369359

UBAP2L 14 8.95 No aberration ENST00000271877

TRIM26 5 9.24 No aberration ENST00000454678

PCIG1 13 9.28 No aberration ENST00000443130

RCC1 12 15.12 No aberration ENST00000398962

INPP5K 12 15.13 No aberration ENST00000406424

IDH1 4 25.4 No aberration ENST00000345146

RTN2 5 25.42 No aberration ENST00000245923

SFRS16 5 27.52 No aberration ENST00000221455

BRF1 2 27.9 No aberration ENST00000327359

Note: IC = inadequate coverage at site of hypothetical mutation; high 3' RPKM = overall RPKM is high because of very high coverage in the 3'-most exon
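One way to quantify the contrast between Supplementary Tables 17 and 18 is to tally the Finding column of each table, set aside the sites marked IC, and apply a one-sided Fisher's exact test to the resulting 2×2 table. The Python sketch below is illustrative only: the tallies were counted by hand from the two tables, the function name is hypothetical, and whether the authors used this particular test is not stated here.

```python
from math import comb

# Hand tallies of the Finding column (assumption: any finding other than
# "No aberration" or "IC" counts as aberrant splicing; IC sites are excluded).
MUTATED = {"aberrant": 10, "no_aberration": 1, "inadequate_coverage": 4}     # Table 17
UNMUTATED = {"aberrant": 0, "no_aberration": 21, "inadequate_coverage": 8}   # Table 18


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a or more in the top-left cell
    when the row and column totals are held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1))
    return numerator / comb(total, row1)


p_value = fisher_exact_greater(
    MUTATED["aberrant"], MUTATED["no_aberration"],
    UNMUTATED["aberrant"], UNMUTATED["no_aberration"],
)
```

Excluding the IC sites is a design choice: a site without adequate coverage provides no evidence either way, so counting it as "no aberration" would bias the comparison.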

Supplementary Table 19

Sequence analysis summary of two exome-sequenced AA-treated HK2 clones

Sample | Bases in Target Region | Bases Mapped to Target Region | Ave. Depth Per Targeted Base | Targeted Bases with Depth at Least 1X | Targeted Bases with Depth at Least 20X | Somatic Mutations Identified in Targeted Region

HK2_ctrl 37,804,019 29,125,955 34 91.6 56

HK2_clone 1 37,804,019 30,308,167 34 91.7 57 168

HK2_clone 2 37,804,019 23,749,988 28 91.1 49 219

Supplementary Table 20

Somatic nonsynonymous substitutions in protein-coding genes of AA-treated HK2 clones

Gene Symbol | Sample ID | Nucleotide (Genomic) | AA Change | Change Type

ITGA1 AA_HK2_clone 1 g.chr5: 52235727 A>W S1080C Missense

PTPRB AA_HK2_clone 1 g.chr12: 70932712 A>W I1954K Missense

EBAG9 AA_HK2_clone 1 g.chr8: 110575694 A>W Y197F Missense

JAZF1 AA_HK2_clone 1 g.chr7: 28111249 A>W W2R Missense

NT5DC1 AA_HK2_clone 1 g.chr6: 116466615 A>W T182S Missense

FAM73A AA_HK2_clone 1 g.chr1: 78326961 A>W Q443L Missense

ZSWIM3 AA_HK2_clone 1 g.chr20: 44506288 A>W N364I Missense

KIAA0586 AA_HK2_clone 1 g.chr14: 58896090 A>W D70V Missense

TET2 AA_HK2_clone 1 g.chr4: 106157825 A>W Q909L Missense

JAK2 AA_HK2_clone 1 g.chr9: 5069984 A>W T525S Missense

ACTR10 AA_HK2_clone 1 g.chr14: 58701154 A>W Q380L Missense

C22orf23 AA_HK2_clone 1 g.chr22: 38349178 A>W F58I Missense

KRT1 AA_HK2_clone 1 g.chr12: 53070958 A>R L380P Missense

RTP4 AA_HK2_clone 1 g.chr3: 187088871 A>W T151S Missense

PKHD1 AA_HK2_clone 1 g.chr6: 51513899 A>W V3765E Missense

PLEKHG1 AA_HK2_clone 1 g.chr6: 151152519 A>W S758C Missense

MTF2 AA_HK2_clone 1 g.chr1: 93584947 A>W T31S Missense

TG AA_HK2_clone 1 g.chr8: 134145907 A>W - Splice site

PAN2 AA_HK2_clone 1 g.chr12: 56713400 A>W - Splice site

B3GAT2 AA_HK2_clone 1 g.chr6: 71571491 A>W H309Q Missense

HMGCS2 AA_HK2_clone 1 g.chr1: 120307153 A>W Y67X Nonsense

DKK1 AA_HK2_clone 1 g.chr10: 54076441 A>W R225S Missense

RUNX2 AA_HK2_clone 1 g.chr6: 45479981 A>W - Splice site

ZNF766 AA_HK2_clone 1 g.chr19: 52793724 A>W K227M Missense

CHD8 AA_HK2_clone 1 g.chr14: 21878104 T>W E478V Missense

MLL2 AA_HK2_clone 1 g.chr12: 49420693 G>R A5019V Missense

JMJD1C AA_HK2_clone 1 g.chr10: 64974567 T>W T454S Missense

HIVEP2 AA_HK2_clone 1 g.chr6: 143092547 T>W H1110L Missense

CASQ1 AA_HK2_clone 1 g.chr1: 160167402 T>W - Splice site

C1orf186 AA_HK2_clone 1 g.chr1: 206241623 T>W S56C Missense

ROCK2 AA_HK2_clone 1 g.chr2: 11332613 T>W M1305L Missense

KRT36 AA_HK2_clone 1 g.chr17: 39643829 T>W Q287L Missense

FAM26D AA_HK2_clone 1 g.chr6: 116879255 T>K S90A Missense

FEZF1 AA_HK2_clone 1 g.chr7: 121942943 T>W R323X Nonsense

BNC1 AA_HK2_clone 1 g.chr15: 83931855 T>W E716D Missense

OR2AE1 AA_HK2_clone 1 g.chr7: 99473789 T>W T290S Missense

MAGEB6B AA_HK2_clone 1 g.chrX: 26179580 T>W L288Q Missense

MUC16 AA_HK2_clone 1 g.chr19: 9086915 T>W T1634S Missense

RHEB AA_HK2_clone 1 g.chr7: 151174464 T>W D77V Missense

CENPJ AA_HK2_clone 1 g.chr13: 25480785 T>W N464I Missense

PARL AA_HK2_clone 1 g.chr3: 183585748 T>W R76X Nonsense

JAM2 AA_HK2_clone 1 g.chr21: 27078366 T>W V258E Missense

SLC9A9 AA_HK2_clone 1 g.chr3: 143271281 C>Y V338I Missense

VPS24 AA_HK2_clone 1 g.chr2: 86737516 C>M A125S Missense

C13orf40 AA_HK2_clone 1 g.chr13: 103382804 C>S R6748T Missense

IL22RA2 AA_HK2_clone 1 g.chr6: 137476239 C>Y W104X Nonsense

DNAH7 AA_HK2_clone 2 g.chr2: 196753635 A>T V1706E Missense

FANCM AA_HK2_clone 2 g.chr14: 45650702 A>T K1431I Missense

AC073343.2 AA_HK2_clone 2 g.chr7: 6715977 A>T T160S Missense

ITGA2B AA_HK2_clone 2 g.chr17: 42463385 A>T F101L Missense

PMAIP1 AA_HK2_clone 2 g.chr18: 57569877 A>T - Splice site

PTPRB AA_HK2_clone 2 g.chr12: 70932712 A>T I1954K Missense

HOXB5 AA_HK2_clone 2 g.chr17: 46670858 A>G S63P Missense

IKZF5 AA_HK2_clone 2 g.chr10: 124753324 A>T F411Y Missense

TRIO AA_HK2_clone 2 g.chr5: 14330935 A>T T594S Missense

KRT38 AA_HK2_clone 2 g.chr17: 39593694 A>T C447X Nonsense

UPK3A AA_HK2_clone 2 g.chr22: 45689079 A>T I197F Missense

KTN1 AA_HK2_clone 2 g.chr14: 56108481 A>T - Splice site

BICC1 AA_HK2_clone 2 g.chr10: 60556187 A>T T423S Missense

NLE1 AA_HK2_clone 2 g.chr17: 33462448 A>C V345G Missense

TRO AA_HK2_clone 2 g.chrX: 54957697 A>T K683X Nonsense

TMEM86A AA_HK2_clone 2 g.chr11: 18723407 A>T T192S Missense

SNCAIP AA_HK2_clone 2 g.chr5: 121799229 A>T K1004X Nonsense

KRT1 AA_HK2_clone 2 g.chr12: 53070958 A>G L380P Missense

PKP2 AA_HK2_clone 2 g.chr12: 32974403 A>C W678G Missense

SLC5A12 AA_HK2_clone 2 g.chr11: 26743137 A>T L42Q Missense

ABCC4 AA_HK2_clone 2 g.chr13: 95840754 T>A R436X Nonsense

ANPEP AA_HK2_clone 2 g.chr15: 90340900 T>A E688V Missense

IFNA8 AA_HK2_clone 2 g.chr9: 21409555 T>A V127E Missense

AC104667.3 AA_HK2_clone 2 g.chr2: 238499912 T>C - Splice site

CDC27 AA_HK2_clone 2 g.chr17: 45201328 T>A - Splice site

REEP1 AA_HK2_clone 2 g.chr2: 86444235 T>A - Splice site

DPP3 AA_HK2_clone 2 g.chr11: 66260238 T>G V347G Missense

SERPINB11 AA_HK2_clone 2 g.chr18: 61383361 T>A N36K Missense

FEN1 AA_HK2_clone 2 g.chr11: 61563225 T>G V131G Missense

BID AA_HK2_clone 2 g.chr22: 18222132 T>G T162P Missense

ZMYM4 AA_HK2_clone 2 g.chr1: 35870649 T>G V1185G Missense

RNF216 AA_HK2_clone 2 g.chr7: 5792480 T>A - Splice site

ABCA10 AA_HK2_clone 2 g.chr17: 67181728 T>A K796I Missense

RCBTB2 AA_HK2_clone 2 g.chr13: 49070341 T>A K501X Nonsense

ABCA2 AA_HK2_clone 2 g.chr9: 139909983 T>A I1194F Missense

ZBTB17 AA_HK2_clone 2 g.chr1: 16271027 T>A H380L Missense

PAPOLG AA_HK2_clone 2 g.chr2: 60988881 T>G V62G Missense

DNHD1 AA_HK2_clone 2 g.chr11: 6589392 T>G V4153G Missense

C12orf51 AA_HK2_clone 2 g.chr12: 112641549 T>A D2344V Missense

ATP8B2 AA_HK2_clone 2 g.chr1: 154303054 T>C S72P Missense

NCOA6 AA_HK2_clone 2 g.chr20: 33345199 T>A Q451L Missense

HERC4 AA_HK2_clone 2 g.chr10: 69750124 T>A K493X Nonsense

MGAT4A AA_HK2_clone 2 g.chr2: 99279644 T>A - Splice site
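In Supplementary Table 20, the rows for AA_HK2_clone 1 write the variant with a single-letter code (for example A>W), while the rows for AA_HK2_clone 2 spell out both bases (A>T). The clone 1 codes are consistent with standard IUPAC two-base ambiguity symbols: PTPRB at chr12:70932712 appears as A>W in clone 1 and A>T in clone 2, and KRT1 at chr12:53070958 appears as A>R and A>G. Reading them as heterozygous genotype codes is an interpretation rather than something the table states. The Python sketch below, with a hypothetical function name, converts one notation into the other under that assumption.

```python
# Standard IUPAC codes for two-base ambiguity.
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}
PLAIN_BASES = set("ACGT")


def decode_change(change: str) -> str:
    """Convert a change such as 'A>W' into explicit form ('A>T').

    Changes that already name a plain base (for example 'A>T') are returned unchanged.
    Assumes the ambiguity code describes a heterozygous genotype containing the
    reference base, so the alternate allele is the other base of the pair.
    """
    ref, alt = change.split(">")
    if alt in PLAIN_BASES:
        return change
    pair = IUPAC_PAIRS[alt]
    if ref not in pair:
        raise ValueError(f"reference base {ref!r} is not part of IUPAC code {alt!r}")
    alternate = pair.replace(ref, "")
    return f"{ref}>{alternate}"


# Codes that occur in the clone 1 rows of the table.
decoded = {code: decode_change(code) for code in ["A>W", "T>W", "A>R", "G>R", "T>K", "C>Y", "C>M", "C>S"]}
```

Decoding the clone 1 rows this way puts both clones in the same notation, which makes shared positions such as PTPRB and KRT1 directly comparable.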

Supplementary Table 21

Comparison of mutation rates in AA-UTUC, carcinogen-induced cancers, mismatch repair–defective colorectal cancers, and POLE/POLD1 mutated colorectal cancers

AA-UTUC (n=9) | UV-melanoma (n=7) | tobacco-lung cancer (n=12) | OV-CCA (n=8) | H. pylori-gastric cancer (n=15) | MSI-colorectal cancer (n=34) | POLE mutated colorectal cancer (n=8)

1103.67 | 335.85 | 213.83 | 128 | 68 | 1596.5 | 3678

Supplementary Table 22

Primer sequences

Primer sequences used in RT-qPCR

Primers ID Sequences

UPF1_forward 5' AAC GAG CAC CAA GGC ATT GGC T 3'

UPF1_reverse 5' GGC TGC TTT GAT AGT GCC TTC G 3'

UPF2_forward 5' TCT CAC CTG AGG ACC AGT GTA C 3'

UPF2_reverse 5' AGC TGG AGG TGG GTT GCA GTA G 3'

UPF3A_forward 5' TAC TGG AGG TGG CAA GCA GGA A 3'

UPF3A_reverse 5' CCT GTG CTC TTT ATC ACT GCC G 3'

MAGOH_forward 5' CCT GGA GTT TGA GTT TCG ACC G 3'

MAGOH_reverse 5' TCT CTT CAG TTC CTC CAT CAC GC 3'

Primer sequences used in verification of MBOAT7 exon skipping

Primers ID Sequences

MBOAT7_E2_forward 5' TCT TAT CTC CAT CCC CAT CG 3'

MBOAT7_E5_reverse 5' CTG AGG CCA TTT CCT TCC T 3'