www.sciencemag.org/content/357/6352/eaan2507/suppl/DC1
Supplementary Materials for
A pathology atlas of the human cancer transcriptome
Mathias Uhlen,* Cheng Zhang, Sunjae Lee, Evelina Sjöstedt, Linn Fagerberg, Gholamreza Bidkhori, Rui Benfeitas, Muhammad Arif, Zhengtao Liu, Fredrik Edfors,
Kemal Sanli, Kalle von Feilitzen, Per Oksvold, Emma Lundberg, Sophia Hober, Peter Nilsson, Johanna Mattsson, Jochen M. Schwenk, Hans Brunnström, Bengt Glimelius, Tobias Sjöblom, Per-Henrik Edqvist, Dijana Djureinovic, Patrick Micke, Cecilia Lindskog, Adil Mardinoglu,
Fredrik Ponten
*Corresponding author. Email: [email protected]
Published 18 August 2017, Science 357, eaan2507 (2017) DOI: 10.1126/science.aan2507
This PDF file includes: Materials and Methods
Figs. S1 to S14
Captions for tables S1 to S21
References
Other supplementary material for this manuscript includes the following: Tables S1 to S21 (Excel format)
Materials and Methods
Sample preparation
Samples of normal and cancer tissues used for protein and mRNA expression analysis, as
described previously (6), were obtained from the Department of Pathology, Uppsala University
Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank
(http://www.uppsalabiobank.uu.se/en/). All human tissue samples used in the present study were
anonymized in accordance with approval and advisory report from the Uppsala Ethical Review
Board (Reference # 2002-577, 2005-338, 2007-159 and 2012-532 (protein) and # 2011-473 and
2012-532 (RNA)).
Cancer patient samples used for mRNA expression and survival analysis were collected
from The Cancer Genome Atlas (TCGA) project from the initial release of Genomic Data
Commons (GDC) on June 6, 2016, and information regarding sex, age and other clinical
information can be found at https://gdc-portal.nci.nih.gov/. Only samples with both clinical info
and transcriptomic data available at that time point were used in this study.
The lung cancer cohort consists of 345 patients who underwent surgery consecutively at
Uppsala University Hospital between 2006 and 2010, as published previously (31). Fresh-frozen RNA
was available for 199 of these patients and was used for RNA-seq analysis as previously described
(30).
The colorectal cancer cohort is based on U-CAN (http://www.u-can.uu.se/?languageId=1),
an infrastructure programme for biobanking, and includes 828 patients with tumor tissue from
colorectal cancers in a TMA format and an associated clinical database. All patients underwent
surgery at Uppsala University Hospital between 2010 and 2016. For a selected subset of
these patients (n=60), where frozen tumor tissue showed a high fraction of tumor cells, RNA was
extracted and used for RNA sequencing based on the same methodologies as for the lung cancer
tissue described above.
The hepatocellular carcinoma cell line Hep G2 was obtained from DSMZ, Braunschweig,
Germany (42).
Protein profiling (tissue microarrays and immunohistochemistry)
Candidates for protein profiling in lung and colorectal cancer were selected based on
prognostic association in the TCGA data, availability of antibodies already analyzed by the
Human Protein Atlas project, supportive antibody validation and a distinct, differential
staining pattern among the 12 cancer patients available in the Human Protein Atlas. Generation
of tissue microarrays (TMAs), immunohistochemical staining and slide scanning were performed
as previously described (43). Briefly, formalin-fixed, paraffin-embedded (FFPE) tissue samples
were collected from the pathology archives based on hematoxylin and eosin (HE)-stained tissue
sections showing a representative normal histology for each tissue type. Representative cores (1
mm in diameter) were sampled from the FFPE blocks and assembled into TMAs. TMA blocks
were cut into 4-μm-thick sections using waterfall microtomes (Microm HM 355S, Thermo
Fisher Scientific, Fremont, CA, USA), dried at room temperature overnight and baked at 50°C for 12-24 hours
prior to immunohistochemical staining. Automated immunohistochemistry was performed using
Autostainer 480® instruments (Lab Vision, Fremont, CA, USA). For details on antibodies, see
Table S18. High-resolution digital images were obtained by slide scanning using Scanscope XT
(Aperio, Vista, CA, USA). The images of immunohistochemically stained TMA sections were
evaluated and scored manually using a four-graded scale for staining intensity (negative, weak,
moderate or strong) and a six-graded scale for fraction of positive cells (0-1%, 2-10%, 11-25%,
26-50%, 51-75% or >75%).
Transcript profiling (RNA-seq)
Tissue samples were embedded in Optimal Cutting Temperature (O.C.T.) compound and
stored at –80°C. HE-stained frozen sections (4 µm) were prepared from each sample using a
cryostat and the CryoJane® Tape-Transfer System (Instrumedics, St. Louis, MO, USA). Each
slide was examined by a pathologist to ensure sampling of representative normal tissue. Three
sections (10 µm) were cut from each frozen tissue block and collected in a tube for subsequent
RNA extraction. The tissue was homogenized mechanically using a 3-mm steel grinding ball
(VWR). Total RNA was extracted from the cell lines and tissue samples using the RNeasy Mini
Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The extracted RNA
samples were analyzed using either an Experion automated electrophoresis system (Bio-Rad
Laboratories, Hercules, CA, USA) with the standard sensitivity RNA chip or an Agilent 2100
Bioanalyzer system (Agilent Biotechnologies, Palo Alto, USA) with the RNA 6000 Nano
Labchip Kit. Only samples of high-quality RNA (RNA Integrity Number ≥7.5) were used in the
following mRNA sample preparation for sequencing.
Processing of RNA-seq data
RNA sequencing data for 162 samples from 37 tissues and organs from the Human Protein
Atlas, 9,666 samples from 33 cancer types from TCGA, 198 samples from a lung cancer cohort
from a previous study (30) and 59 colorectal cancer samples in the UCAN cohort were
processed/reprocessed using the same pipeline as GDC. In brief, the processed reads were
mapped to the human genome (GRCh38) using STAR v2.4.2a (44). To obtain quantification
scores for all human genes and transcripts across all samples, raw counts were calculated using
HTSeq v0.6.1p1 (45) and then converted to FPKM (fragments per kilobase of exon per million
mapped reads). Gencode annotation v22 was used in HTSeq, and 19,571 protein-coding genes
overlapped with the Human Protein Atlas. The average FPKM value for all individual samples
for each tissue was used to estimate gene expression levels. A cut-off value of 1 FPKM was used
as a detection limit across all tissues.
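As a rough illustration of the quantification step, the sketch below converts raw per-gene counts to FPKM and averages the values per tissue. It is not the GDC pipeline itself: the function names and input layout are hypothetical, and the library size is assumed to be the sum of all counted fragments.

```python
from collections import defaultdict

DETECTION_LIMIT = 1.0  # FPKM cut-off used as the detection limit in the text


def counts_to_fpkm(counts, exon_length_bp):
    """Convert raw fragment counts to FPKM for one sample.

    FPKM = fragments * 1e9 / (exon length in bp * total mapped fragments).
    Assumption: the library size is the sum of all counted fragments.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted fragments")
    return {g: c * 1e9 / (exon_length_bp[g] * total) for g, c in counts.items()}


def mean_fpkm_by_tissue(sample_fpkm, sample_tissue):
    """Average FPKM over all samples of each tissue: {tissue: {gene: mean}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for sample, fpkm in sample_fpkm.items():
        for gene, value in fpkm.items():
            grouped[sample_tissue[sample]][gene].append(value)
    return {
        tissue: {gene: sum(v) / len(v) for gene, v in genes.items()}
        for tissue, genes in grouped.items()
    }
```

A gene would then count as detected in a tissue when its mean FPKM reaches `DETECTION_LIMIT`.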
RNA-based classification of genes
Each of the 19,571 genes with mapped RNA-seq data was classified into one of six
categories for normal tissues and cancers based on the FPKM levels in 37 normal tissues and 33
cancer types, respectively: (1) Not detected: FPKM <1 in all tissues/cancers; (2) Enriched: at
least a 5-fold higher FPKM level in one tissue/cancer than in all other tissues/cancers; (3) Group
enriched: at least a 5-fold higher average FPKM value in a group of 2-7 tissues/cancers than in all other
tissues/cancers; (4) Expressed in all: detected in all 37/33 tissues/cancers with FPKM >1; (5) Tissue
enhanced: at least a 5-fold higher FPKM level in one tissue/cancer than the average value of all
37/33 tissues/cancers; and (6) Mixed: the remaining genes detected in 1-36/32 tissues/cancers
with FPKM >1 that did not fit the above categories.
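The six categories can be read as a decision cascade. The following sketch is one possible reading, assuming the categories are tested in the listed order and that a "group" is always formed from the highest-expressing tissues; the function name and its parameters are hypothetical.

```python
def classify_gene(fpkm, fold=5.0, limit=1.0, max_group=7):
    """Assign one expression category to a gene from its per-tissue FPKM values.

    `fpkm` maps tissue (or cancer) name to the average FPKM of the gene.
    """
    values = sorted(fpkm.values(), reverse=True)
    n = len(values)
    if all(v < limit for v in values):
        return "Not detected"
    # Enriched: the top tissue is at least `fold` times every other tissue.
    if n > 1 and values[0] >= fold * values[1]:
        return "Enriched"
    # Group enriched: the mean of the top k tissues (k = 2..max_group) is at
    # least `fold` times the highest value outside the group.
    for k in range(2, min(max_group, n - 1) + 1):
        if sum(values[:k]) / k >= fold * values[k]:
            return "Group enriched"
    if all(v > limit for v in values):
        return "Expressed in all"
    # Tissue enhanced: the top tissue is at least `fold` times the overall mean.
    if values[0] >= fold * sum(values) / n:
        return "Tissue enhanced"
    return "Mixed"
```

Checking only the top k tissues is sufficient for the group test, because any other group of the same size has a lower mean and a higher value left outside it.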
Differential expression analysis
The significantly down-regulated tissue enriched genes in liver cancer were identified by
differential expression analysis using DESeq2 (46). The raw counts for 10 normal and 365
cancer samples were used as input for DESeq2.
Survival analysis
Based on the FPKM value of each gene, we classified the patients into two groups and
examined their prognoses. In the analysis, we excluded genes with low expression, i.e., those
with a median expression among samples of less than 1 FPKM. The prognosis of each group of
patients was examined by Kaplan-Meier survival estimators, and the survival outcomes of the
two groups were compared by log-rank tests. To choose the best FPKM cut-off for grouping the
patients, every FPKM value from the 20th to the 80th percentile was used in turn to
group the patients, the difference in survival outcomes between the groups was tested,
and the value yielding the lowest log-rank P value was selected.
Additionally, a previously published method, available as the R package ‘maxstat’ (13), that
corrects for the optimal selection of the expression cut-off was employed to evaluate the stability of
the results.
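The cut-off scan described above can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function names are hypothetical, patients with FPKM above the cut-off form the high group, and the P value is the unadjusted two-group log-rank P value (no ‘maxstat’ correction).

```python
import math


def logrank_test(times, events, in_high):
    """Two-group log-rank test.

    Returns (p_value, observed_events_high, expected_events_high).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs = exp = var = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # patients at risk overall
        n1 = sum(1 for x, h in zip(times, in_high) if h and x >= t)
        d = sum(1 for x, e in zip(times, events) if e and x == t)
        d1 = sum(1 for x, e, h in zip(times, events, in_high) if e and h and x == t)
        obs += d1
        exp += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0, obs, exp
    chi2 = (obs - exp) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return p_value, obs, exp


def best_cutoff(fpkm, times, events, lower=0.2, upper=0.8):
    """Scan FPKM values between two percentiles and keep the lowest log-rank P."""
    ranked = sorted(fpkm)
    n = len(ranked)
    candidates = sorted(set(ranked[int(n * lower): int(n * upper) + 1]))
    best = None
    for cutoff in candidates:
        high = [v > cutoff for v in fpkm]
        if not any(high) or all(high):
            continue  # both groups must contain patients
        p_value, obs, exp = logrank_test(times, events, high)
        if best is None or p_value < best[1]:
            best = (cutoff, p_value, obs, exp)
    return best
```

The observed and expected event counts of the high-expression group are returned alongside the P value because they indicate the direction of the association.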
Defining favorable and unfavorable prognostic genes
Genes with log-rank P values less than 0.001 were defined as prognostic genes. In addition,
if the group of patients with high expression of a selected prognostic gene had more observed
events than expected events, the gene was defined as an unfavorable prognostic gene; otherwise, it was defined as a favorable
prognostic gene. When the statistical method of Hothorn and Lausen was used for sensitivity
analysis, prognostic genes were defined as genes with a maximal P value less than 0.01. When the
hazard ratio (HR) was used for sensitivity analysis, genes for which the high-expression
group of patients had an HR of more than 1.2 were defined as unfavorable prognostic genes, and
genes for which the low-expression group of patients had an HR of more than 1.2 were
defined as favorable prognostic genes.
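The primary definition (log-rank P value plus observed versus expected events) amounts to a small decision rule. A minimal sketch, with a hypothetical function name and with the inputs assumed to come from a two-group log-rank test:

```python
def classify_prognostic_gene(p_value, observed_high, expected_high, alpha=0.001):
    """Label a gene from its log-rank result for the high-expression group."""
    if p_value >= alpha:
        return "not prognostic"
    # More events than expected among high expressers means worse outcome.
    if observed_high > expected_high:
        return "unfavorable"
    return "favorable"
```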
Survival analysis based on a panel of genes with the most prognostic expression
After examining the most prognostic genes for each cancer with their best FPKM cut-offs,
we selected the five most significant favorable genes and the five most significant adverse genes as
“panel” genes. For each patient, we examined whether each favorable gene was expressed above
its best cut-off and whether each adverse gene was expressed below its best cut-off.
If more than 80% of the panel genes met these conditions, we
predicted that the patient would be in the better survival group; otherwise, we predicted that
the patient would be in the poor survival group. Next, we compared the survival outcomes for
these two patient groups using log-rank tests.
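A sketch of the panel rule for a single patient is given below. The function and argument names are hypothetical. The threshold is expressed as a count and defaults to 8 of 10 genes, following the caption of Fig. S13; whether exactly 80% counts as "good" is an assumption here.

```python
def predict_survival_group(expression, favorable_cutoffs, unfavorable_cutoffs, min_signs=8):
    """Predict 'good' or 'poor' survival from panel-gene expression.

    `expression` maps gene to FPKM for one patient; the cut-off dictionaries
    map each panel gene to its best FPKM cut-off.
    """
    # Favorable sign: favorable gene above its cut-off.
    signs = sum(expression[g] > c for g, c in favorable_cutoffs.items())
    # Favorable sign: adverse gene below its cut-off.
    signs += sum(expression[g] < c for g, c in unfavorable_cutoffs.items())
    return "good" if signs >= min_signs else "poor"
```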
Survival analysis of lung and colorectal cancer validation cohorts
The protein expression scores, based on staining intensity (score 1-4) and fraction of stained
cells (score 1-6), were multiplied in order to generate a protein level score between 1 and 24.
This score was used for subsequent survival analysis using a best separation cut-off.
Gene ontology analysis
Enriched gene ontology terms in sets of enriched genes were determined using DAVID
Bioinformatics Resource v 6.8 (47). Only the biological gene ontology term ‘GOTERM_BP_5’
was used to obtain reliable and interpretable enriched terms.
Visualization of enriched GO terms
The enriched GO terms were visualized in a network plot using Cytoscape (version 3.2.1)
with the external package EnrichmentMap (48). An FDR of 0.05 was used as a threshold for the
selection of enriched GO terms. The overlap coefficient 0.8 and combined constant 0.8 were
selected for similarity cut-offs.
Generality and directionality in the bubble plot
All GO terms that were over-represented by favorable or unfavorable prognostic genes for
at least one cancer were visualized in the plot. The bubbles are located based on two parameters
of the corresponding GO terms defined herein as generality (y-axis) and directionality (x-axis),
calculated as follows:
$$\text{Generality} = \sum_{i=1}^{n} \left( N_{\text{fav},i} + N_{\text{unf},i} \right)$$

$$\text{Directionality} = \sum_{i=1}^{n} N_{\text{fav},i} - \sum_{i=1}^{n} N_{\text{unf},i}$$
where i represents each cancer type, and n represents the total number of cancer types. N_fav,i and
N_unf,i are binary variables whose values are 1 if the GO term is over-represented by the
favorable and unfavorable prognostic genes of the i-th cancer, respectively, and 0 otherwise.
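The two bubble-plot coordinates are simple counts over cancer types. A minimal sketch, assuming the enriched GO terms are stored as one set per cancer type (the data layout and function name are hypothetical):

```python
def generality_directionality(term, favorable_terms, unfavorable_terms):
    """Return (generality, directionality) for one GO term.

    `favorable_terms` and `unfavorable_terms` map each cancer type to the set
    of GO terms over-represented among its favorable or unfavorable genes.
    """
    n_fav = sum(term in terms for terms in favorable_terms.values())
    n_unf = sum(term in terms for terms in unfavorable_terms.values())
    return n_fav + n_unf, n_fav - n_unf
```

For example, a term enriched among unfavorable genes in three cancers and among favorable genes in one has generality 4 and directionality -2.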
Generation of cancer-specific co-expression networks
For each cancer transcriptome, we first removed genes with low expression by disregarding
the bottom 25% expression means (mean values shown in Table S11). Calculating Pearson’s
correlation coefficients between the expressions of genes above the bottom 25%, we selected the
top 1% correlation values of those gene pairs of each cancer type and constructed cancer-specific
co-expression networks. Co-expression networks are available at http://inetmodels.com.
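The network construction can be sketched as follows, assuming that "top 1% correlation values" refers to the highest (positive) Pearson coefficients and that expression is given as one list of values per gene across samples. The function names are hypothetical, and a real transcriptome would call for a vectorized implementation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, low_fraction=0.25, top_fraction=0.01):
    """Drop low-expressed genes, then keep the most correlated gene pairs."""
    means = {g: sum(v) / len(v) for g, v in expr.items()}
    ranked = sorted(expr, key=means.get)
    kept = ranked[int(len(ranked) * low_fraction):]  # remove bottom 25% by mean
    scored = [(pearson(expr[a], expr[b]), a, b) for a, b in combinations(kept, 2)]
    scored.sort(reverse=True)
    n_top = max(1, int(len(scored) * top_fraction))
    return [(a, b, r) for r, a, b in scored[:n_top]]
```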
Hallmark gene list
The hallmark genes used in this study were collected based on related biological functions
in the MSigDB and KEGG databases, and the detailed list is included in Table S19.
Analysis of co-expression networks
In each co-expression network, co-expression clusters, i.e., groups of highly co-expressed
genes, were identified using the modularity-based random-walk community detection algorithm
implemented in the cluster_walktrap function of the R igraph package (49, 50). Here,
we excluded small co-expression clusters with fewer than five genes from further analysis.
While visualizing co-expression networks by their co-expression clusters (Figure 5C),
interactions among co-expression clusters were identified based on an interaction score between two
clusters, A and B ($I_{AB}$), which was defined by the expected co-expression links ($E_{AB}$) and the observed
co-expression links ($O_{AB}$) between the clusters. Clusters were considered to interact when $I_{AB} > 1$, with

$$I_{AB} = \frac{O_{AB} - E_{AB}}{E_{AB}} \qquad (1)$$

$$E_{AB} = \sum_{a \in A} \sum_{b \in B} \frac{k_a k_b}{2N} \qquad (2)$$

where a and b respectively indicate a node of cluster A and a node of cluster B, $k_a$ and $k_b$
respectively indicate the degree of connectivity of nodes a and b, and N indicates the number of
all network edges.
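Equations (1) and (2) translate directly into code. The sketch below assumes an undirected network given as a list of unique gene pairs and two disjoint clusters given as sets; the function name is hypothetical.

```python
from collections import Counter


def interaction_score(edges, cluster_a, cluster_b):
    """Interaction score I_AB = (O_AB - E_AB) / E_AB between two clusters."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_edges = len(edges)
    # Observed links: edges with one endpoint in each cluster.
    observed = sum(
        1
        for u, v in edges
        if (u in cluster_a and v in cluster_b) or (u in cluster_b and v in cluster_a)
    )
    # Expected links given the node degrees: sum of k_a * k_b / (2N).
    expected = sum(degree[a] * degree[b] for a in cluster_a for b in cluster_b) / (2 * n_edges)
    return (observed - expected) / expected
```

With this score, two clusters are drawn as interacting when the returned value exceeds 1, i.e., when they share more than twice the number of links expected from their node degrees.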
Next, we examined co-expression clusters enriched in genes associated with hallmarks of
cancer using hypergeometric tests (Figure 5A-C). For the examination, we selected genes that
were associated with hallmarks of cancer from the MSigDB and KEGG pathway (16, 20).
Likewise, we examined co-expression clusters that were enriched in favorable or adverse genes
by hypergeometric tests (Figure 5C).
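For reference, the one-sided hypergeometric enrichment P value used in such tests can be computed with the standard library alone. The function and argument names below are hypothetical, and no multiple-testing adjustment is applied in this sketch.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster_size, overlap):
    """P(X >= overlap) when drawing `cluster_size` genes from `population`
    genes, of which `annotated` carry the property of interest."""
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Here `population` would be the number of genes in the network, `annotated` the number of hallmark (or prognostic) genes among them, `cluster_size` the number of genes in the cluster and `overlap` the number of annotated genes found in the cluster.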
Reconstruction of personalized genome-scale metabolic models (GSMMs)
Personalized models were reconstructed based on the RNA-seq data and a previously
developed task-driven model reconstruction (tINIT) algorithm (26). The tINIT algorithm
employs defined metabolic tasks for imposing constraints on the functionality of the
reconstructed models. In this context, only cell growth was defined as required for tumor cells.
This metabolic task was used as an input to the tINIT algorithm so that the reconstructed
personalized models were capable of growth while remaining consistent with the RNA-seq data. A
generic GSMM for human cancer (Table S20) was used as the reference model for the tINIT
algorithm. A time limit of 10 h was set, and as a result, 6753 personalized models were
reconstructed (Table S21). The personalized GSMMs are available at
https://www.ebi.ac.uk/biomodels (51) with the accession numbers MODEL1707110000-
MODEL1707116752.
Metabolic pathway enrichment analysis
The genes related to a specific metabolic pathway were defined according to the Gene-
Reaction relationship from the generic GSMM for human cancer. A gene set was regarded as
enriched in a specific metabolic pathway if it significantly (Padj <0.05) overlapped with the
genes related to the metabolic pathway using the hypergeometric test.
Fig. S1 Global expression pattern of protein-coding genes in human tissues and cancers.
Heat map showing the pairwise correlation between all 37 normal tissues and 33 TCGA cancers
based on transcript expression levels of 19,571 genes. The average FPKM values for each gene
and tissue/cancer were used in the analysis.
Fig. S2 Classification of protein-coding genes in human tissues and cancers. The number of
protein-coding genes classified in each expression category based on the transcript expression
level of 33 cancers from TCGA (A) and 37 normal tissues (B).
Fig. S3 GO term enrichment analysis of cancer-specific house-keeping genes. Network
visualization of enriched GO terms, in which the node sizes indicate the number of genes in the
corresponding GO terms, and edge widths indicate the number of genes shared between the two
linked GO terms.
Fig. S4 PCA plot showing the similarities in expression of 19,571 protein-coding genes
among 21 subcancer types. The short names of the subcancer types follow the naming of
TCGA which are provided in Table S4.
Fig. S5 Overall survival analysis for 17 cancer types. Kaplan-Meier plots showing the overall
survival rates of patients from each of the 17 cancer types. Overall survival rates
(solid lines) are shown with 95% confidence intervals (dashed lines for upper and lower bounds).
Prostate cancer and testis cancer had the best 3-year survival rates, and glioma and
pancreatic cancer had the lowest 3-year survival rates.
Fig. S6 Comparison between maximally selected log rank P values used in our study and
another method described by Hothorn and Lausen for 17 cancer types. Scatter plots show
the correlation between the expression cut-offs for stratifying patients (left) and log scale P
values (right) between the method used in this paper and the method described by Hothorn and
Lausen (13). The two alternative statistical methods showed highly similar results.
Fig. S7 Bubble plots showing the common enriched GO terms among the 17 Human
Pathology Atlas cancer types based on prognostic genes defined by optional P value or HR
cutoffs.
Fig. S8 Overlapping of hallmark genes with prognostic genes of cancers. Bar plot showing
the fraction of hallmark genes that overlap with prognostic genes for all and each of the 17
cancers.
Fig. S9. Co-expression network analyses with prognostic genes selected based on optional
parameters. (A) Overlapping of hallmark genes with prognostic genes of cancers. (B) Network
plot showing co-expression clusters of lung cancer, overlapped with prognostic genes. The gray,
yellow and red color of the nodes indicates that the cluster was significantly enriched with
hallmark genes, prognostic genes and both cases, respectively. (C) Bar plot showing the fraction
of prognostic genes that are mere hallmark genes (red), co-expressed in “hallmark” gene clusters
(pink), or not co-expressed with “hallmark” genes (gold).
Fig. S10 Co-expression cluster analysis for 17 cancers. Network plots for the co-expression
clusters for 17 major cancers. All nodes indicate gene co-expression clusters, and edges indicate
significant co-expression links connected between clusters. Gray, yellow and red nodes indicate
clusters that are significantly enriched with hallmark genes, prognostic genes or both,
respectively.
Fig. S11 Summary of model statistics for personalized GSMMs from 17 cancer types. Box
plot showing the number of reactions, metabolites and genes for personalized GSMMs from 17
different cancer types.
Fig. S12 Metabolic functions of non-toxic genes that are essential for tumor growth and
conserved in 17 cancer types. Circos plot showing the 32 conserved genes that are essential for
tumor growth and their corresponding metabolic functions.
Fig. S13 Validation of panel genes for lung cancer. Kaplan-Meier plot for patient groups
stratified by panel genes in an independent lung cancer cohort, showing high statistical significance
(log-rank P = 0.0154). Exactly the same expression cutoffs as in the discovery cohort were used for
each gene, and each patient in the group marked as ‘good’ has 8 out of the 10 panel genes showing
a favorable sign (high expression of favorable genes or low expression of unfavorable genes).
Fig. S14 Validation of selected genes with a prognostic effect in colorectal cancer.
Kaplan-Meier plots for RNA level separation from the TCGA cohort, the HPA cohort and
protein level separation are shown in the first, second and third columns, respectively. The 3-
year recurrence was used as event for the HPA cohort because of the short follow-up time. The
log-rank P values are shown in the lower left corner of each Kaplan-Meier plot. High and low
protein staining are shown in the fourth and fifth columns. Protein expression levels of the
targets in all Human Pathology Atlas cancers are shown in the last column.
Table S1. Summary of 33 TCGA cancer types.
Table S2. Categories of protein-coding genes in normal tissues and cancers.
Table S3. GO term enrichment analysis for cancer-specific house-keeping genes from DAVID.
Table S4. Summary of the 17 major cancer types examined in this study.
Table S5. The number of prognostic genes for 17 major cancer types.
Table S6. Expression cut-off for the best stratification and results of the survival analysis for all protein-
coding genes in 17 major cancer types.
Table S7. Prognostic genes and their log-rank P values involved in prognostic panels of the Big 5 cancers
shown in Figure 2A.
Table S8. Summary of all prognostic genes and the respective cancer types for which they are prognostic
markers.
Table S9. Enriched GO terms for each cancer type with prognostic genes defined by two different log rank
P value cutoffs and HR cutoff.
Table S10. Summary of unfavorable prognostic cell cycle genes and the respective cancer types for which
they are prognostic markers.
Table S11. Hypergeometric P values of the overlap between favorable prognostic genes for each cancer
and genes with elevated expression in their supposed tissues of origin.
Table S12. Statistical features of cancer-specific co-expression networks for 17 cancer types. All the
networks are normalized and of the same size with 14,293 genes and 1,021,378 co-expressed gene pairs for
fair comparison.
Table S13. Summary of genes involved in the co-expression cluster of lung cancer in Figure 5B.
Table S14. Statistical summary of cancer-specific co-expression networks in cancer.
Table S15. Statistical summary of genome-scale metabolic models for all patients.
Table S16. Summary of metabolic pathways associated with the essential genes in 17 cancers.
Table S17. Short names for all 17 cancer types.
Table S18. Antibodies used for protein profiling of the selected genes.
Table S19. Terms and full gene list for hallmark of cancer.
Table S20. Reference GSMM for reconstruction of personalized GSMMs.
Table S21. Complete list of patient IDs and corresponding cancer types for reconstructed GSMMs.
References
1. D. J. Brennan, D. P. O’Connor, E. Rexhepaj, F. Ponten, W. M. Gallagher, Antibody-based proteomics: Fast-tracking molecular diagnostics in oncology. Nat. Rev. Cancer 10, 605–617 (2010). doi:10.1038/nrc2902 Medline
2. E. Björnson, B. Mukhopadhyay, A. Asplund, N. Pristovsek, R. Cinar, S. Romeo, M. Uhlen, G. Kunos, J. Nielsen, A. Mardinoglu, Stratification of hepatocellular carcinoma patients based on acetate utilization. Cell Rep. 13, 2014–2026 (2015). doi:10.1016/j.celrep.2015.10.045 Medline
3. A. Mardinoglu, J. Nielsen, New paradigms for metabolic modeling of human cells. Curr. Opin. Biotechnol. 34, 91–97 (2015). doi:10.1016/j.copbio.2014.12.013 Medline
4. S. Lee, A. Mardinoglu, C. Zhang, D. Lee, J. Nielsen, Dysregulated signaling hubs of liver lipid metabolism reveal hepatocellular carcinoma pathogenesis. Nucleic Acids Res. 44, 5529–5539 (2016). doi:10.1093/nar/gkw462 Medline
5. The Cancer Genome Atlas (TCGA) Research Network, J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J. M. Stuart, The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013). doi:10.1038/ng.2764 Medline
6. M. Uhlén, L. Fagerberg, B. M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, Å. Sivertsson, C. Kampf, E. Sjöstedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A.-K. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P.-H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen, F. Pontén, Tissue-based map of the human proteome. Science 347, 1260419 (2015). doi:10.1126/science.1260419 Medline
7. J. Lonsdale, J. Thomas, M. Salvatore, R. Phillips, E. Lo, S. Shad, R. Hasz, G. Walters, F. Garcia, N. Young, B. Foster, M. Moser, E. Karasik, B. Gillard, K. Ramsey, S. Sullivan, J. Bridge, H. Magazine, J. Syron, J. Fleming, L. Siminoff, H. Traino, M. Mosavel, L. Barker, S. Jewell, D. Rohrer, D. Maxim, D. Filkins, P. Harbach, E. Cortadillo, B. Berghuis, L. Turner, E. Hudson, K. Feenstra, L. Sobin, J. Robb, P. Branton, G. Korzeniewski, C. Shive, D. Tabor, L. Qi, K. Groch, S. Nampally, S. Buia, A. Zimmerman, A. Smith, R. Burges, K. Robinson, K. Valentino, D. Bradbury, M. Cosentino, N. Diaz-Mayoral, M. Kennedy, T. Engel, P. Williams, K. Erickson, K. Ardlie, W. Winckler, G. Getz, D. DeLuca, D. MacArthur, M. Kellis, A. Thomson, T. Young, E. Gelfand, M. Donovan, Y. Meng, G. Grant, D. Mash, Y. Marcus, M. Basile, J. Liu, J. Zhu, Z. Tu, N. J. Cox, D. L. Nicolae, E. R. Gamazon, H. K. Im, A. Konkashbaev, J. Pritchard, M. Stevens, T. Flutre, X. Wen, E. T. Dermitzakis, T. Lappalainen, R. Guigo, J. Monlong, M. Sammeth, D. Koller, A. Battle, S. Mostafavi, M. McCarthy, M. Rivas, J. Maller, I.
Rusyn, A. Nobel, F. Wright, A. Shabalin, M. Feolo, N. Sharopova, A. Sturcke, J. Paschal, J. M. Anderson, E. L. Wilder, L. K. Derr, E. D. Green, J. P. Struewing, G. Temple, S. Volpi, J. T. Boyer, E. J. Thomson, M. S. Guyer, C. Ng, A. Abdallah, D. Colantuoni, T. R. Insel, S. E. Koester, A. R. Little, P. K. Bender, T. Lehner, Y. Yao, C. C. Compton, J. B. Vaught, S. Sawyer, N. C. Lockhart, J. Demchok, H. F. Moore, The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013). doi:10.1038/ng.2653 Medline
8. L. Collado-Torres, A. Nellore, K. Kammers, S. E. Ellis, M. A. Taub, K. D. Hansen, A. E. Jaffe, B. Langmead, J. T. Leek, Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35, 319–321 (2017). doi:10.1038/nbt.3838 Medline
9. L. Peng, X. W. Bian, D. K. Li, C. Xu, G. M. Wang, Q. Y. Xia, Q. Xiong, Large-scale RNA-Seq transcriptome analysis of 4043 cancers and 548 normal tissue controls across 12 TCGA cancer types. Sci. Rep. 5, 13413 (2015). doi:10.1038/srep13413 Medline
10. F. Edfors, F. Danielsson, B. M. Hallström, L. Käll, E. Lundberg, F. Pontén, B. Forsström, M. Uhlén, Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 12, 883 (2016). doi:10.15252/msb.20167144 Medline
11. D. Hanahan, R. A. Weinberg, Hallmarks of cancer: The next generation. Cell 144, 646–674 (2011). doi:10.1016/j.cell.2011.02.013 Medline
12. C. Kandoth, M. D. McLellan, F. Vandin, K. Ye, B. Niu, C. Lu, M. Xie, Q. Zhang, J. F. McMichael, M. A. Wyczalkowski, M. D. M. Leiserson, C. A. Miller, J. S. Welch, M. J. Walter, M. C. Wendl, T. J. Ley, R. K. Wilson, B. J. Raphael, L. Ding, Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013). doi:10.1038/nature12634 Medline
13. T. Hothorn, B. Lausen, On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal. 43, 121–137 (2003). doi:10.1016/S0167-9473(02)00225-6
14. B. Hjelm, D. J. Brennan, N. Zendehrokh, J. Eberhard, B. Nodin, A. Gaber, F. Pontén, H. Johannesson, K. Smaragdi, C. Frantz, S. Hober, L. B. Johnson, S. Påhlman, K. Jirström, M. Uhlen, High nuclear RBM3 expression is associated with an improved prognosis in colorectal cancer. Proteomics Clin. Appl. 5, 624–635 (2011). doi:10.1002/prca.201100020 Medline
15. C. J. Creighton, M. Morgan, P. H. Gunaratne, D. A. Wheeler, R. A. Gibbs, A. Gordon Robertson, A. Chu, R. Beroukhim, K. Cibulskis, S. Signoretti, F. Vandin Hsin-Ta Wu, B. J. Raphael, R. G. W. Verhaak, P. Tamboli, W. Torres-Garcia, R. Akbani, J. N. Weinstein, V. Reuter, J. J. Hsieh, A. Rose Brannon, A. Ari Hakimi, A. Jacobsen, G. Ciriello, B. Reva, C. J. Ricketts, W. Marston Linehan, J. M. Stuart, W. Kimryn Rathmell, H. Shen, P. W. Laird, D. Muzny, C. Davis, M. Morgan, L. Xi, K. Chang, N. Kakkar, L. R. Treviño, S. Benton, J. G. Reid, D. Morton, H. Doddapaneni, Y. Han, L. Lewis, H. Dinh, C. Kovar,
Y. Zhu, J. Santibanez, M. Wang, W. Hale, D. Kalra, C. J. Creighton, D. A. Wheeler, R. A. Gibbs, G. Getz, K. Cibulskis, M. S. Lawrence, C. Sougnez, S. L. Carter, A. Sivachenko, L. Lichtenstein, C. Stewart, D. Voet, S. Fisher, S. B. Gabriel, E. Lander, R. Beroukhim, S. E. Schumacher, B. Tabak, G. Saksena, R. C. Onofrio, S. L. Carter, A. D. Cherniack, J. Gentry, K. Ardlie, C. Sougnez, G. Getz, S. B. Gabriel, M. Meyerson, A. Gordon Robertson, A. Chu, H.-J. E. Chun, A. J. Mungall, P. Sipahimalani, D. Stoll, A. Ally, M. Balasundaram, Y. S. N. Butterfield, R. Carlsen, C. Carter, E. Chuah, R. J. N. Coope, N. Dhalla, S. Gorski, R. Guin, C. Hirst, M. Hirst, R. A. Holt, C. Lebovitz, D. Lee, H. I. Li, M. Mayo, R. A. Moore, E. Pleasance, P. Plettner, J. E. Schein, A. Shafiei, J. R. Slobodan, A. Tam, N. Thiessen, R. J. Varhol, N. Wye, Y. Zhao, I. Birol, S. J. M. Jones, M. A. Marra, J. T. Auman, D. Tan, C. D. Jones, K. A. Hoadley, P. A. Mieczkowski, L. E. Mose, S. R. Jefferys, M. D. Topal, C. Liquori, Y. J. Turman, Y. Shi, S. Waring, E. Buda, J. Walsh, J. Wu, T. Bodenheimer, A. P. Hoyle, J. V. Simons, M. G. Soloway, S. Balu, J. S. Parker, D. Neil Hayes, C. M. Perou, R. Kucherlapati, P. Park, H. Shen, T. Triche Jr., D. J. Weisenberger, P. H. Lai, M. S. Bootwalla, D. T. Maglinte, S. Mahurkar, B. P. Berman, D. J. Van Den Berg, L. Cope, S. B. Baylin, P. W. Laird, C. J. Creighton, D. A. Wheeler, G. Getz, M. S. Noble, D. DiCara, H. Zhang, J. Cho, D. I. Heiman, N. Gehlenborg, D. Voet, W. Mallard, P. Lin, S. Frazer, P. Stojanov, Y. Liu, L. Zhou, J. Kim, M. S. Lawrence, L. Chin, F. Vandin, H.-T. Wu, B. J. Raphael, C. Benz, C. Yau, S. M. Reynolds, I. Shmulevich, R. G. W. Verhaak, W. Torres-Garcia, R. Vegesna, H. Kim, W. Zhang, D. Cogdell, E. Jonasch, Z. Ding, Y. Lu, R. Akbani, N. Zhang, A. K. Unruh, T. D. Casasent, C. Wakefield, D. Tsavachidou, L. Chin, G. B. Mills, J. N. Weinstein, A. Jacobsen, A. Rose Brannon, G. Ciriello, N. Schultz, A. Ari Hakimi, B. Reva, Y. Antipin, J. Gao, E. 
Cerami, B. Gross, B. Arman Aksoy, R. Sinha, N. Weinhold, S. Onur Sumer, B. S. Taylor, R. Shen, I. Ostrovnaya, J. J. Hsieh, M. F. Berger, M. Ladanyi, C. Sander, S. S. Fei, A. Stout, P. T. Spellman, D. L. Rubin, T. T. Liu, J. M. Stuart, S. Ng, E. O. Paull, D. Carlin, T. Goldstein, P. Waltman, K. Ellrott, J. Zhu, D. Haussler, P. H. Gunaratne, W. Xiao, C. Shelton, J. Gardner, R. Penny, M. Sherman, D. Mallery, S. Morris, J. Paulauskis, K. Burnett, T. Shelton, S. Signoretti, W. G. Kaelin, T. Choueiri, M. B. Atkins, R. Penny, K. Burnett, D. Mallery, E. Curley, S. Tickoo, V. Reuter, W. Kimryn Rathmell, L. Thorne, L. Boice, M. Huang, J. C. Fisher, W. Marston Linehan, C. D. Vocke, J. Peterson, R. Worrell, M. J. Merino, L. S. Schmidt, P. Tamboli, B. A. Czerniak, K. D. Aldape, C. G. Wood, J. Boyd, J. E. Weaver, M. V. Iacocca, N. Petrelli, G. Witkin, J. Brown, C. Czerwinski, L. Huelsenbeck-Dill, B. Rabeno, J. Myers, C. Morrison, J. Bergsten, J. Eckman, J. Harr, C. Smith, K. Tucker, L. Anne Zach, W. Bshara, C. Gaudioso, C. Morrison, R. Dhir, J. Maranchie, J. Nelson, A. Parwani, O. Potapova, K. Fedosenko, J. C. Cheville, R. Houston Thompson, S. Signoretti, W. G. Kaelin, M. B. Atkins, S. Tickoo, V. Reuter, W. Marston Linehan, C. D. Vocke, J. Peterson, M. J. Merino, L. S. Schmidt, P. Tamboli, J. M. Mosquera, M. A. Rubin, M. L. Blute, W. Kimryn Rathmell, T. Pihl, M. Jensen, R. Sfeir, A. Kahn, A. Chu, P. Kothiyal, E. Snyder,
J. Pontius, B. Ayala, M. Backus, J. Walton, J. Baboud, D. Berton, M. Nicholls, D. Srinivasan, R. Raman, S. Girshik, P. Kigonya, S. Alonso, R. Sanbhadti, S. Barletta, D. Pot, M. Sheth, J. A. Demchok, T. Davidsen, Z. Wang, L. Yang, R. W. Tarnuzzer, J. Zhang, G. Eley, M. L. Ferguson, K. R. Mills Shaw, M. S. Guyer, B. A. Ozenberger, H. J. Sofia, Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). doi:10.1038/nature12222
16. A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, J. P. Mesirov, Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005). doi:10.1073/pnas.0506580102
17. H. A. Edmondson, P. E. Steiner, Primary carcinoma of the liver: A study of 100 cases among 48,900 necropsies. Cancer 7, 462–503 (1954). doi:10.1002/1097-0142(195405)7:3<462:AID-CNCR2820070308>3.0.CO;2-E
18. T. M. Pawlik, A. L. Gleisner, R. A. Anders, L. Assumpcao, W. Maley, M. A. Choti, Preoperative assessment of hepatocellular carcinoma tumor grade using needle biopsy: Implications for transplant eligibility. Ann. Surg. 245, 435–442 (2007). doi:10.1097/01.sla.0000250420.73854.ad
19. A. J. Simpson, O. L. Caballero, A. Jungbluth, Y. T. Chen, L. J. Old, Cancer/testis antigens, gametogenesis and cancer. Nat. Rev. Cancer 5, 615–625 (2005). doi:10.1038/nrc1669
20. M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, K. Morishima, KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017). doi:10.1093/nar/gkw1092
21. T. I. Zack, S. E. Schumacher, S. L. Carter, A. D. Cherniack, G. Saksena, B. Tabak, M. S. Lawrence, C. Z. Zhang, J. Wala, C. H. Mermel, C. Sougnez, S. B. Gabriel, B. Hernandez, H. Shen, P. W. Laird, G. Getz, M. Meyerson, R. Beroukhim, Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013). doi:10.1038/ng.2760
22. N. N. Pavlova, C. B. Thompson, The emerging hallmarks of cancer metabolism. Cell Metab. 23, 27–47 (2016). doi:10.1016/j.cmet.2015.12.006
23. M. G. Vander Heiden, R. J. DeBerardinis, Understanding the intersections between metabolism and cancer biology. Cell 168, 657–669 (2017). doi:10.1016/j.cell.2016.12.039
24. P. Ghaffari, A. Mardinoglu, J. Nielsen, Cancer metabolism: A modeling perspective. Front. Physiol. 6, 382 (2015). doi:10.3389/fphys.2015.00382
25. A. Mardinoglu, R. Agren, C. Kampf, A. Asplund, M. Uhlen, J. Nielsen, Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat. Commun. 5, 3083 (2014). doi:10.1038/ncomms4083
26. R. Agren, A. Mardinoglu, A. Asplund, C. Kampf, M. Uhlen, J. Nielsen, Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling. Mol. Syst. Biol. 10, 721 (2014). doi:10.1002/msb.145122
27. A. Mardinoglu, E. Bjornson, C. Zhang, M. Klevstig, S. Söderlund, M. Ståhlman, M. Adiels, A. Hakkarainen, N. Lundbom, M. Kilicarslan, B. M. Hallström, J. Lundbom, B. Vergès, P. H. R. Barrett, G. F. Watts, M. J. Serlie, J. Nielsen, M. Uhlén, U. Smith, H.-U. Marschall, M.-R. Taskinen, J. Boren, Personal model-assisted identification of NAD(+) and glutathione metabolism as intervention target in NAFLD. Mol. Syst. Biol. 13, 916 (2017). doi:10.15252/msb.20167422
28. L. Jerby-Arnon, N. Pfetzer, Y. Y. Waldman, L. McGarry, D. James, E. Shanks, B. Seashore-Ludlow, A. Weinstock, T. Geiger, P. A. Clemons, E. Gottlieb, E. Ruppin, Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell 158, 1199–1209 (2014). doi:10.1016/j.cell.2014.07.027
29. C. Zhang, Q. Hua, Applications of genome-scale metabolic models in biotechnology and systems medicine. Front. Physiol. 6, 413 (2016). doi:10.3389/fphys.2015.00413
30. D. Djureinovic, B. M. Hallström, M. Horie, J. S. M. Mattsson, L. La Fleur, L. Fagerberg, H. Brunnström, C. Lindskog, K. Madjar, J. Rahnenführer, S. Ekman, E. Ståhle, H. Koyi, E. Brandén, K. Edlund, J. G. Hengstler, M. Lambe, A. Saito, J. Botling, F. Pontén, M. Uhlén, P. Micke, Profiling cancer testis antigens in non-small-cell lung cancer. JCI Insight 1, e86837 (2016). doi:10.1172/jci.insight.86837
31. P. Micke, J. S. M. Mattsson, D. Djureinovic, B. Nodin, K. Jirström, L. Tran, P. Jönsson, M. Planck, J. Botling, H. Brunnström, The impact of the Fourth Edition of the WHO Classification of Lung Tumours on histological classification of resected pulmonary NSCCs. J. Thorac. Oncol. 11, 862–872 (2016). doi:10.1016/j.jtho.2016.01.020
32. T. Tanaka, G. Kutomi, T. Kajiwara, K. Kukita, V. Kochin, T. Kanaseki, T. Tsukahara, Y. Hirohashi, T. Torigoe, Y. Okamoto, K. Hirata, N. Sato, Y. Tamura, Cancer-associated oxidoreductase ERO1-α drives the production of VEGF via oxidative protein folding and regulating the mRNA level. Br. J. Cancer 114, 1227–1234 (2016). doi:10.1038/bjc.2016.105
33. K. Katono, Y. Sato, S.-X. Jiang, M. Kobayashi, K. Saito, R. Nagashio, S. Ryuge, Y. Satoh, M. Saegusa, N. Masuda, Clinicopathological significance of S100A10 expression in lung adenocarcinomas. Asian Pac. J. Cancer Prev. 17, 289–294 (2016). doi:10.7314/APJCP.2016.17.1.289
34. K. Saito, M. Kobayashi, R. Nagashio, S. Ryuge, K. Katono, H. Nakashima, B. Tsuchiya, S.-X. Jiang, M. Saegusa, Y. Satoh, N. Masuda, Y. Sato, S100A16 is a prognostic marker for lung adenocarcinomas. Asian Pac. J. Cancer Prev. 16, 7039–7044 (2015). doi:10.7314/APJCP.2015.16.16.7039
35. F. Penault-Llorca, N. Radosevic-Robin, Ki67 assessment in breast cancer: An update. Pathology 49, 166–171 (2017). doi:10.1016/j.pathol.2016.11.006
36. J. N. Jakobsen, J. B. Sørensen, Clinical impact of Ki-67 labeling index in non-small cell lung cancer. Lung Cancer 79, 1–7 (2013). doi:10.1016/j.lungcan.2012.10.008
37. M. Younes, R. W. Brown, M. Stephenson, M. Gondo, P. T. Cagle, Overexpression of Glut1 and Glut3 in stage I nonsmall cell lung carcinoma is associated with poor survival. Cancer 80, 1046–1051 (1997). doi:10.1002/(SICI)1097-0142(19970915)80:6<1046:AID-CNCR6>3.0.CO;2-7
38. C. K. Jung, J. H. Jung, G. S. Park, A. Lee, C. S. Kang, K. Y. Lee, Expression of transforming acidic coiled-coil containing protein 3 is a novel independent prognostic marker in non-small cell lung cancer. Pathol. Int. 56, 503–509 (2006). doi:10.1111/j.1440-1827.2006.01998.x
39. K. Magnusson, G. Gremel, L. Rydén, V. Pontén, M. Uhlén, A. Dimberg, K. Jirström, F. Pontén, ANLN is a prognostic biomarker independent of Ki-67 and essential for cell cycle progression in primary breast cancer. BMC Cancer 16, 904 (2016). doi:10.1186/s12885-016-2923-8
40. C. Suzuki, Y. Daigo, N. Ishikawa, T. Kato, S. Hayama, T. Ito, E. Tsuchiya, Y. Nakamura, ANLN plays a critical role in human lung carcinogenesis through the activation of RHOA and by involvement in the phosphoinositide 3-kinase/AKT pathway. Cancer Res. 65, 11314–11325 (2005). doi:10.1158/0008-5472.CAN-05-1507
41. D. Aran, M. Sirota, A. J. Butte, Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015). doi:10.1038/ncomms9971
42. D. P. Aden, A. Fogel, S. Plotkin, I. Damjanov, B. B. Knowles, Controlled synthesis of HBsAg in a differentiated human liver carcinoma-derived cell line. Nature 282, 615–616 (1979). doi:10.1038/282615a0
43. C. Kampf, I. Olsson, U. Ryberg, E. Sjöstedt, F. Pontén, Production of tissue microarrays, immunohistochemistry staining and digitalization within the human protein atlas. J. Vis. Exp. 3620, 3620 (2012). doi:10.3791/3620
44. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). doi:10.1093/bioinformatics/bts635
45. S. Anders, P. T. Pyl, W. Huber, HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015). doi:10.1093/bioinformatics/btu638
46. M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). doi:10.1186/s13059-014-0550-8
47. D. W. Huang, B. T. Sherman, Q. Tan, J. Kir, D. Liu, D. Bryant, Y. Guo, R. Stephens, M. W. Baseler, H. C. Lane, R. A. Lempicki, DAVID Bioinformatics Resources: Expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35 (suppl. 2), W169–W175 (2007). doi:10.1093/nar/gkm415
48. D. Merico, R. Isserlin, O. Stueker, A. Emili, G. D. Bader, Enrichment map: A network-based method for gene-set enrichment visualization and interpretation. PLOS ONE 5, e13984 (2010). doi:10.1371/journal.pone.0013984
49. G. Csardi, T. Nepusz, The igraph software package for complex network research. Int. J. Complex Syst. 1695, 1–9 (2006).
50. P. Pons, M. Latapy, in International Symposium on Computer and Information Sciences (Springer, 2005), pp. 284–293.
51. V. Chelliah, N. Juty, I. Ajmera, R. Ali, M. Dumousseau, M. Glont, M. Hucka, G. Jalowicki, S. Keating, V. Knight-Schrijver, A. Lloret-Villas, K. N. Natarajan, J.-B. Pettit, N. Rodriguez, M. Schubert, S. M. Wimalaratne, Y. Zhao, H. Hermjakob, N. Le Novère, C. Laibe, BioModels: Ten-year anniversary. Nucleic Acids Res. 43, D542–D548 (2015). doi:10.1093/nar/gku1181