Analysis of Body-wide Unfractionated Tissue Data to Identify a … · profiling of 124 samples...



Article

Analysis of Body-wide Unfractionated Tissue Data to Identify a Core Human Endothelial Transcriptome

Graphical Abstract

Highlights

• RNA-seq of 124 unfractionated tissue samples from 32 different organs was analyzed
• Human pan-endothelial enriched transcripts across vascular beds were identified
• Relative expression profile was maintained in early passage cultured cells
• Analysis method is applicable to profile other body-wide expressed cell types

Butler et al., 2016, Cell Systems 3, 1–15, September 28, 2016 © 2016 Elsevier Inc. http://dx.doi.org/10.1016/j.cels.2016.08.001

Authors

Lynn Marie Butler, Björn Mikael Hallström, Linn Fagerberg, Fredrik Pontén, Mathias Uhlén, Thomas Renné, Jacob Odeberg

Correspondence

[email protected]

In Brief

Butler et al. use RNA-seq data from 124 unfractionated tissue samples from 32 human organs to identify known and previously unknown endothelial-specific transcripts and provide a searchable resource that can be used to determine the “endothelial enrichment score” of any human protein-coding gene. In addition to identifying potential vascular drug targets or endothelial biomarkers, this study provides a framework to determine the specific transcriptome profiles of other cell types distributed across multiple organs.


Please cite this article in press as: Butler et al., Analysis of Body-wide Unfractionated Tissue Data to Identify a Core Human Endothelial Transcriptome,Cell Systems (2016), http://dx.doi.org/10.1016/j.cels.2016.08.001

Cell Systems

Article

Analysis of Body-wide Unfractionated Tissue Data to Identify a Core Human Endothelial Transcriptome

Lynn Marie Butler,1,2,6,* Björn Mikael Hallström,3 Linn Fagerberg,3 Fredrik Pontén,4 Mathias Uhlén,3 Thomas Renné,1,2 and Jacob Odeberg3,5

1Institute for Clinical Chemistry and Laboratory Medicine, University Medical Centre Hamburg-Eppendorf, 20246 Hamburg, Germany
2Clinical Chemistry and Blood Coagulation, Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
3Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology (KTH), 171 21 Stockholm, Sweden
4Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
5Coagulation Unit, Centre for Hematology, Karolinska University Hospital, 171 76 Stockholm, Sweden
6Lead Contact
*Correspondence: [email protected]

http://dx.doi.org/10.1016/j.cels.2016.08.001

SUMMARY

Endothelial cells line blood vessels and regulate hemostasis, inflammation, and blood pressure. Proteins critical for these specialized functions tend to be predominantly expressed in endothelial cells across vascular beds. Here, we present a systems approach to identify a panel of human endothelial-enriched genes using global, body-wide transcriptomics data from 124 tissue samples from 32 organs. We identified known and unknown endothelial-enriched gene transcripts and used antibody-based profiling to confirm expression across vascular beds. The majority of identified transcripts could be detected in cultured endothelial cells from various vascular beds, and we observed maintenance of relative expression in early passage cells. In summary, we describe a widely applicable method to determine cell-type-specific transcriptome profiles in a whole-organism context, based on differential abundance across tissues. We identify potential vascular drug targets or endothelial biomarkers and highlight candidates for functional studies to increase understanding of the endothelium in health and disease.

INTRODUCTION

Endothelial cells (ECs) line the inside of all vessels and have a critical role in the regulation of hemostasis, inflammation, defense against blood-borne pathogens, vascular tone, angiogenesis, and the transport of molecules and nutrients to and from the bloodstream (Pober and Sessa, 2007; Vita, 2011). The involvement of ECs in multiple disease states, such as coronary artery disease, venous thromboembolism, edema, and vasculitis, is well recognized (Ganz and Hsue, 2013; Mackman, 2012; Steyers and Miller, 2014; Tabas et al., 2015). ECs from different vascular beds can vary in their gene expression profiles, reflecting organ-specific functions, and even morphologically similar ECs can show differences in gene expression (Aird, 2012; Civelek et al., 2011; Nolan et al., 2013; Seaman et al., 2007). Known genes with largely EC-restricted expression across tissue beds are important for vascular stability (Du Toit, 2015) or for cell-specific functions, for example, in inflammatory processes (Ley, 2003) or hemostasis (Lenting et al., 2015).

Recently, there have been significant technological advancements in the large-scale analysis of cellular gene expression profiles (Spies and Ciaudo, 2015). As ECs are a minority cell type in any given organ, it is challenging to determine EC gene expression profiles from averaged transcriptome analysis of whole-tissue samples. Methodological advances, such as laser capture microdissection (Cheng et al., 2013), enzymatic or manual dissection and cell sorting (Berger et al., 2012; Nolan et al., 2013), and immunopurification (Wang and Navin, 2015), have allowed the isolation of ECs from tissue prior to analysis. However, such processing and/or subsequent in vitro culture can trigger changes in gene expression due to the loss of the organ-specific microenvironment (Amaya et al., 2015; Balda and Matter, 2009; Durr et al., 2004).

Here, as an illustrative application of the Human Protein Atlas resource (Uhlen et al., 2015), we used a systems-based approach to define the physiological, in vivo, human pan-endothelial-enriched gene expression profile using whole-transcriptome analysis of unfractionated tissue samples. We identify a panel of human pan-EC-enriched transcripts and replicate our findings by applying the same analysis protocol to Genotype-Tissue Expression (GTEx) datasets. Of the identified transcripts, 118 encode novel or uncharacterized EC proteins. We also provide a searchable resource that can be used to determine the extent of pan-endothelial specificity of any gene.

The identification of previously unknown EC-enriched genes provides potential new vascular drug targets and biomarker candidates, and highlights genes for future studies to further increase our understanding of EC function in health and disease.

RESULTS

We performed RNA sequencing (RNA-seq) tissue transcript profiling of 124 samples collected from 32 human organs (n = 2–7 samples/organ) as part of the Human Protein Atlas Project (HPA; http://www.proteinatlas.org/) (Uhlen et al., 2015). Tissue cryosections from selected organs (bone marrow, pancreas, ovary, tonsil, salivary gland, appendix, spleen, thyroid gland, gallbladder, urinary bladder, heart muscle, and lung) were H&E stained to morphologically determine the percentage of ECs prior to RNA processing and sequencing of identical samples (see Table S1 and Experimental Procedures for further details). Fragments per kilobase of exon model per million mapped reads (FPKM) values were calculated for 20,073 mapped protein-coding genes in all 124 samples.

Figure 1. CLEC14A, vWF, and CD34 Transcript Quantities In Vivo Reflect the Degree of Tissue Vascularization
(A) Mean FPKM values for C-type lectin domain family 14, member A (CLEC14A), von Willebrand factor (vWF), and CD34 (CD34) transcripts in bone marrow, pancreas, ovary, tonsil, salivary gland, appendix, spleen, thyroid gland, gallbladder, urinary bladder, heart muscle, and lung; n = 2–5 individual samples/organ (see Table S1). Data are mean ± SEM. Corresponding IHC images stained with primary antibodies against CLEC14A, vWF, and CD34 protein are shown on tissue sections from ovary, appendix, gallbladder, and lung.
(B) Scatterplots showing correlations between mean CLEC14A, vWF, and CD34 FPKM values and the estimated mean EC percentage in the sequenced sample, determined by histological examination prior to processing. Tissue type represented by each symbol corresponds to that indicated on the x axis of (A). Pearson correlation and corresponding p values are shown in the top left of each scatterplot.
See also Figure S1A. Scale bars, 100 μm.
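For readers unfamiliar with the unit, the FPKM normalization can be written as a short function. This is a minimal sketch of the standard definition only; it is not the HPA processing pipeline (which also involves read mapping and exon-model construction), and the function name and example counts are hypothetical.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    fragments: fragments assigned to one gene's exon model in one sample
    exon_length_bp: summed length of that gene's exon model, in base pairs
    total_mapped_fragments: all mapped fragments in the same sample (library size)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Hypothetical counts: 1,000 fragments on a 2 kb exon model, 10 million mapped fragments.
print(fpkm(1_000, 2_000, 10_000_000))  # 50.0
```

Dividing by exon-model length and by library size is intended to make values roughly comparable between genes within a sample and between samples sequenced to different depths.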

CLEC14A, vWF, and CD34 Transcript Quantities Reflect Levels of Tissue Vascularization

We selected three transcripts that encode proteins predominantly expressed in ECs across different vascular beds: C-type lectin domain family 14, member A (CLEC14A) (Rho et al., 2011), von Willebrand factor (vWF) (Zanetta et al., 2000), and CD34 (CD34) (Muller et al., 2002; Pusztaszeri et al., 2006). VWF has long been acknowledged as an EC marker in vivo (Zanetta et al., 2000), as has CD34, although both reportedly show some variation between tissue beds and vessel types (Muller et al., 2002; Pusztaszeri et al., 2006). CLEC14A was originally described as an EC protein in murine models (Rho et al., 2011) and was later described as a tumor angiogenesis marker with limited expression in selected normal human tissues (Mura et al., 2012; Noy et al., 2015). However, immunohistochemistry (IHC) confirmed enriched EC expression of all three across vascular beds (examples shown in Figure 1A). Mean FPKM values of CLEC14A, vWF, and CD34 ranged from <1–56, <1–110, and 4–166, respectively, across the 32 organs (Figure 1A). Although absolute FPKM values differed, the relative expression of the EC reference transcripts was strikingly similar, with the highest levels detected in highly vascularized organs, such as the heart, lung, placenta, and adipose tissue, and the lowest in less vascularized organs, such as the pancreas and ovary (organs with accompanying EC percentage data are shown in Figure 1A; those without, in Figure S1A.i). CLEC14A, vWF, and CD34 FPKM values were strongly correlated with each other across individual samples (correlation >0.74, p values <0.001) (Figure S1A.ii), supporting the concept that combined CLEC14A, vWF, and CD34 expression provides a surrogate measurement of the relative degree of tissue vascularization in vivo. Consistent with the expression data, IHC revealed a high vascular content in tissues with high CLEC14A, vWF, and CD34 FPKM values (Figure 1A). CLEC14A, vWF, and CD34 expression correlated with the percentage of ECs in the corresponding sequenced tissue samples (correlation 0.82, p value 0.001; correlation 0.90, p value <0.0001; and correlation 0.80, p value 0.002, respectively) (Figure 1B).

Figure 2. Correlation Values between the Reference Endothelial Cell Transcripts CLEC14A, vWF, CD34 and Proteins Described as EC Enriched in the Literature
(A) RNA-seq data from 124 individual samples from 32 different human tissue types were used to generate Spearman pairwise correlation values between the EC reference transcripts CLEC14A, vWF, and CD34 and transcripts reported in the literature as EC enriched.
(B) IHC images of salivary gland, gallbladder, and lung tissue stained for proteins encoded by HSPA12B, PECAM1, ENG, ESM1, LIPG, and EDF1. Corresponding scatterplots (right) show the correlation between mean FPKM values and mean EC percentage in selected sequenced tissue samples. Tissue type represented by each symbol corresponds to that indicated on the x axis of Figure 1A. Pearson correlations and corresponding p values are shown for each scatterplot. Scale bars, 50 μm.
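The comparison between a reference transcript and histological EC content can be sketched as a Pearson correlation over organs, as in Figure 1B. The sketch assumes one mean FPKM value and one mean EC percentage per organ; the input values are invented for illustration, and the p values reported in the text (which require the t distribution) are not computed here.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-organ values, NOT the study's data:
ec_percent = [1.0, 2.5, 4.0, 6.0, 9.0]    # histological EC percentage per organ
mean_fpkm = [3.0, 7.5, 12.0, 18.0, 27.0]  # mean FPKM of one reference transcript
print(round(pearson_r(ec_percent, mean_fpkm), 3))  # 1.0 (perfectly linear toy data)
```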

Cross-Tissue Correlation Analysis Can Be Used to Identify EC-Enriched Transcripts

We performed a bioinformatics analysis of the RNA-seq tissue transcript profiling data across the 32 organ types to produce correlation coefficient values between CLEC14A, vWF, and CD34 FPKM values and those of the other 20,073 mapped protein-coding genes. A high correlation value with all three EC reference genes should indicate EC-enriched expression of the gene(s) in question across tissue types. To test this method for the identification of EC-enriched transcripts, we generated a list of 26 genes widely considered to be EC enriched based on published data (Ballabio et al., 2004; Bernat et al., 2006; Ho et al., 2003; Huminiecki and Bicknell, 2000; Jaye et al., 1999; Korhonen et al., 1995; Steagall et al., 2006) and analyzed the correlation between the FPKM values for these transcripts and those of CLEC14A, vWF, and CD34 (Figure 2A). Of the 26 selected genes, 20 had a mean correlation coefficient >0.5 with CLEC14A, vWF, and CD34 (15/26 had a mean correlation >0.6). However, FPKM values for endothelial cell-specific molecule 1 (ESM1), endothelial lipase (LIPG), and endothelial differentiation-related factor 1 (EDF1) failed to correlate with CLEC14A, vWF, or CD34 FPKM values (correlation 0.04, −0.02, and −0.09, respectively), suggesting misclassification of these genes as pan EC enriched. Consistent with this hypothesis, IHC for proteins encoded by transcripts with high correlation coefficients, e.g., HSPA12B, PECAM1, and ENG (mean correlation 0.73, p value <0.001; 0.67, p value <0.001; and 0.59, p value <0.001, respectively), confirmed EC-enriched expression, while IHC for ESM1, LIPG, and EDF1 did not (selected organs representing low [salivary gland], medium [gallbladder], and high [lung] EC percentage by histology are shown as representative examples in Figure 2B). Furthermore, mean FPKM values for HSPA12B, PECAM1, and ENG showed a correlation with the estimated mean percentage of ECs in bone marrow, pancreas, ovary, tonsil, salivary gland, appendix, spleen, thyroid gland, gallbladder, urinary bladder, heart muscle, and lung (correlation 0.60, p value 0.04; 0.53, p value 0.07; and 0.75, p value 0.005, respectively), while no such correlation was seen for ESM1, LIPG, and EDF1 (correlation 0.43, p value 0.16; 0.23, p value 0.47; and 0.05, p value 0.86, respectively) (Figure 2B, right). Based on these correlation data, we defined “EC-enriched genes” as those with significant mean correlation coefficients >0.5 with the EC reference transcripts CLEC14A, vWF, and CD34.
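The screen described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: for every gene it computes Spearman's rho with each reference transcript across samples (the correlation type named in the Figure 2 legend), averages the three values, and keeps genes at or above a threshold of 0.5. The significance testing that the text also requires is omitted, and the function names, gene names, and FPKM values in the toy example are hypothetical.

```python
import math

REFERENCES = ("CLEC14A", "VWF", "CD34")  # VWF is the HGNC symbol for vWF


def _mean(values):
    return sum(values) / len(values)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def _pearson(xs, ys):
    mx, my = _mean(xs), _mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a gene with constant expression
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return _pearson(average_ranks(xs), average_ranks(ys))


def ec_screen(fpkm_by_gene, references=REFERENCES, threshold=0.5):
    """Return {gene: mean rho} for genes whose mean Spearman correlation with the
    reference transcripts is >= threshold. Each value in fpkm_by_gene is a list of
    FPKM values, one per sample, in the same sample order for every gene."""
    ref_profiles = [fpkm_by_gene[r] for r in references]
    hits = {}
    for gene, profile in fpkm_by_gene.items():
        if gene in references:
            continue
        rho = _mean([spearman(profile, ref) for ref in ref_profiles])
        if rho >= threshold:  # NaN compares False, so constant genes are dropped
            hits[gene] = rho
    return hits


toy = {  # hypothetical FPKM values in six samples
    "CLEC14A": [1, 2, 3, 4, 5, 6],
    "VWF": [2, 4, 5, 8, 10, 12],
    "CD34": [1, 3, 2, 5, 6, 7],
    "GENE_A": [10, 20, 30, 40, 50, 60],
    "GENE_B": [6, 5, 4, 3, 2, 1],
}
print(ec_screen(toy))  # only GENE_A passes
```

Rank-based correlation is less sensitive than Pearson correlation to a few organs with very high FPKM values, which is one plausible reason for using it across heterogeneous tissues.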

Exclusion of Possible False Positives Due to the Presence of Other Cell Types

The EC reference transcripts CD34 and vWF are also expressed in hematopoietic stem cells (Satterthwaite et al., 1992) and platelets (Kanaji et al., 2012; Schick et al., 1997), respectively, raising the concern that transcripts from vasculature-associated hematopoietic cells could be incorrectly classified as EC enriched. Protein tyrosine phosphatase, receptor type, C (PTPRC; commonly known as CD45), a differentiated hematopoietic cell marker, was predominantly expressed in the lymph node, tonsil, appendix, spleen, and bone marrow (Figure S1B.i), and the platelet protein integrin, alpha 2b (ITGA2B; platelet glycoprotein IIb of the IIb/IIIa complex, antigen CD41) was expressed mainly in bone marrow (Figure S1C.i). PTPRC and ITGA2B showed no significant correlation with the EC reference seeds (correlation with CLEC14A, vWF, and CD34: 0.01, 0.11, and −0.11 for PTPRC and 0.11, 0.19, and 0.25 for ITGA2B) (Figures S2A and S2B), arguing against potential misclassification of transcripts expressed by circulating blood cells as EC enriched. As vascular smooth muscle cells (SMCs) surround vessels, we assessed whether transcripts from this cell type could be incorrectly classified as EC enriched. The SMC marker protein myosin, heavy chain 11, smooth muscle (MYH11) was most highly expressed in smooth muscle tissue and esophagus (Figure S1D.i). MYH11 expression did show a significant, albeit weak, correlation with the three EC reference transcripts (correlation with CLEC14A, vWF, and CD34: 0.41, 0.36, and 0.41, respectively) (Figure S2C), indicating that further analysis was required to determine whether any SMC transcripts were falsely annotated as EC enriched. We found no association between the mean percentage of ECs in bone marrow, pancreas, ovary, tonsil, salivary gland, appendix, spleen, thyroid gland, gallbladder, urinary bladder, heart muscle, and lung and the mean FPKM value for PTPRC, ITGA2B, or MYH11 (correlation −0.30, −0.29, and −0.05, respectively; Figures S1B.ii, S1C.ii, and S1D.ii).

Sensitivity and Specificity Analysis

To test the sensitivity and specificity of our method for the identification of EC-enriched transcripts and to determine optimal analysis criteria, we compared correlation coefficients between the EC reference genes, CLEC14A, vWF, and CD34, and four sets of transcripts categorized as: (1) “previously known EC enriched” (Ballabio et al., 2004; Bernat et al., 2006; Ho et al., 2003; Huminiecki and Bicknell, 2000; Jaye et al., 1999; Korhonen et al., 1995) (as featured in Figure 2A, with the exclusion of ESM1, LIPG, and EDF1 due to lack of evidence of EC expression); (2) “non-EC expressed” (no expression in cultured ECs, no evidence of EC staining in vivo by IHC, and expression in at least 20 of the 32 organs sequenced; see Table S2, tab 2 for details); (3) “smooth muscle cell (SMC) enriched” (Conley, 2001; Dreiza et al., 2010; Long et al., 2009; Miwa et al., 1991; Rensen et al., 2007; Wang et al., 2003; Yamawaki et al., 2001); or (4) “macrophage enriched” (East and Isacke, 2002; Fabriek et al., 2009; Kaufmann et al., 2001; Kunjathoor et al., 2002; Liang and Tedder, 2001; Murray and Wynn, 2011; Varchetta et al., 2012) (Table S2, tabs 1–4: column A). Of the previously known EC-enriched transcripts, 15/23 (65%) had mean correlation values >0.6 with our EC reference transcripts, which increased to 20/23 (87%) when the cutoff was lowered to ≥0.5 (Table S2, tab 1, section A; Figure S3A.i). In contrast, all 50 “non-EC transcripts” had mean correlation values <0.3 with the EC reference transcripts (mean −0.01, SD 0.19) (Table S2, tab 2, section A; Figure S3A.ii). Of the “SMC-enriched” transcripts, 9/12 (75%) had correlation values <0.5 with the EC reference transcripts (mean 0.40, SD 0.06), but 3/12 (25%) had correlation values >0.5 (mean 0.52, SD 0.01), indicating a 25% rate of false classification of SMC genes as EC enriched (Table S2, tab 3, section A; Figure S3A.iii). All “macrophage-enriched” transcripts had correlation values <0.38 with the EC reference transcripts (mean 0.22, SD 0.11) (Table S2, tab 4, section A; Figure S3A.iv). Overall, 67/70 (96%) of the non-EC-enriched transcripts had a correlation coefficient <0.5 with the EC reference transcripts, and the three others were all from the SMC-enriched category. To determine whether such false positives could be identified, we performed an additional analysis to measure mean correlation coefficient values between three selected SMC reference transcripts, myosin, heavy chain 11, smooth muscle (MYH11); myosin light chain kinase (MYLK); and actin, alpha 2, smooth muscle, aorta (ACTA2), and those in the “previously known EC-enriched” and SMC-enriched test sets (Table S2, tabs 1 and 3, section C). The previously known EC-enriched transcripts had higher mean correlation values with the EC reference transcripts, CLEC14A, vWF, and CD34, than with the SMC reference transcripts, MYH11, MYLK, and ACTA2 (mean 0.64 versus 0.33, respectively) (Table S2, tab 1, section A versus section C), while the reverse was true for the SMC-enriched transcripts (mean 0.76 with the SMC reference transcripts versus 0.43 with the EC reference transcripts) (Table S2, tab 3, section A versus section C). Based on these analyses, we defined EC-enriched genes as those that had statistically significant mean correlation coefficients ≥0.5 with the EC reference transcripts CLEC14A, vWF, and CD34 and, on a transcript-by-transcript basis, lower correlation values with the SMC reference transcripts MYH11, MYLK, and ACTA2. These criteria minimized the risk of both false negatives and false positives. Correlation values of each identified gene with the SMC reference transcripts are provided in Table S3, tab 1, columns AC–AE.
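The final classification rule and the sensitivity and specificity figures can be made concrete as follows. The function names are hypothetical, and the way "lower correlation values with the SMC reference transcripts" is implemented (here, a lower mean) is an assumption, since the text leaves the exact comparison open; the counts in the example are those reported above for the HPA material at the 0.5 cutoff.

```python
def mean(values):
    return sum(values) / len(values)


def passes_ec_criteria(ec_rhos, smc_rhos, threshold=0.5):
    """Apply the two-part rule to one transcript.

    ec_rhos:  correlations with the EC references (CLEC14A, vWF, CD34)
    smc_rhos: correlations with the SMC references (MYH11, MYLK, ACTA2)
    Assumption: "lower correlation with the SMC references" is read as a lower
    mean correlation; a stricter reading could compare values pairwise.
    """
    ec_mean = mean(ec_rhos)
    return ec_mean >= threshold and ec_mean > mean(smc_rhos)


def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    return true_neg / (true_neg + false_pos)


# Counts reported for the HPA data at the 0.5 cutoff (before the SMC filter):
# 20 of 23 known EC-enriched transcripts passed; 67 of 70 non-EC-enriched ones did not.
print(f"sensitivity {sensitivity(20, 3):.0%}, specificity {specificity(67, 3):.0%}")
# sensitivity 87%, specificity 96%

# A hypothetical SMC-like transcript: above the cutoff but closer to the SMC references.
print(passes_ec_criteria([0.52, 0.51, 0.53], [0.78, 0.74, 0.76]))  # False
```

The second condition only matters for transcripts that already clear the EC cutoff, which is where the three SMC-enriched false positives fell.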

Identification of EC-Enriched Genes

Of the 20,073 mapped protein-coding genes, 481 transcripts had mean correlation coefficients ≥0.5 with the EC reference transcripts CLEC14A, vWF, and CD34. Of these, 373/481 (77.5%) had higher individual correlation coefficients with the EC reference transcripts than with the SMC transcripts MYH11, MYLK, and ACTA2, and the remainder (23%) were excluded (corresponding approximately to the predicted false positive rate of 25%). Of the 373 retained transcripts, 332 (89.1%) had a Bonferroni-corrected p value <0.05 and an FDR <0.0001. Individual and mean correlation values between CLEC14A, vWF, and CD34 and all detected protein-coding genes are provided in Table S3, tab 2.

Figure 3. Summary of Endothelial Cell Reference Transcript Correlation Analysis Data
RNA-seq data from 124 individual samples from 32 different human tissue types were used to generate pairwise correlation values between the EC reference transcripts CLEC14A, vWF, and CD34 and the other 20,073 detectable protein-coding genes.
(A) 234 transcripts were identified as EC enriched and categorized as known (previously reported as EC expressed), unknown (not reported as EC expressed), or uncharacterized. The ten most highly correlated in each category are displayed (p < 0.001 in all cases).
(B) Scatterplots showing the correlation between mean FPKM values for selected genes from each category and the mean EC percentage in the sequenced tissue sample, determined by histological examination prior to processing. Tissue type represented by each symbol corresponds to that indicated on the x axis of Figure 1A. Pearson correlation and corresponding p values are shown in the top left of each scatterplot.
See also Table S3, tab 1.
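Both multiple-testing controls mentioned above can be illustrated with their standard formulas: Bonferroni adjustment and the Benjamini-Hochberg FDR procedure. Which FDR procedure the study used, and whether the number of tests m was all 20,073 genes, is not stated in this section, so both are assumptions here; the p values are invented.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (FDR q values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the q values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for four correlation tests:
p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Bonferroni controls the chance of any false positive and is therefore the stricter of the two; the FDR bounds the expected fraction of false positives among the genes called.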

Replication of Results Using GTEx Datasets

To confirm our results with an independent approach, we used RNA-seq data from the Genotype-Tissue Expression (GTEx) portal (http://www.gtexportal.org/home/) (Ardlie et al., 2015) from 24 human organs (4,470 samples) (Table S2, tab 6) to replicate our analysis. We repeated the sensitivity and specificity analysis as for the HPA material. Of the previously known EC-enriched transcripts, 20/22 (91%) had mean correlation values >0.5 with the EC reference transcripts (mean correlation 0.71, SD 0.16) (Table S2, tab 1, section B; Figure S3B.i). All 50 non-EC transcripts had mean correlation values <0.5 with the EC reference transcripts (mean correlation 0.08, SD 0.27) (Table S2, tab 2, section B; Figure S3B.ii). Of the SMC-enriched transcripts, 10/12 (83%) had correlation values <0.5 (mean correlation 0.39, SD 0.07), but 2/12 (17%) had correlation values ≥0.5 (mean correlation 0.51, SD 0.01) (Table S2, tab 3, section B; Figure S3B.iii), indicating a lower false positive rate than in the HPA material (17% versus 25%). Of the macrophage-enriched transcripts, 1/8 (12.5%) had a correlation value >0.5 (mean correlation 0.43, SD 0.06) (Table S2, tab 4, section B; Figure S3B.iv), revealing a higher false positive rate in this category than in the HPA material. GTEx and HPA values for all test transcripts were highly correlated with each other (correlation 0.84, p value <0.0001) (Figure S3C). Based on this analysis, we defined the requirement for replication as a mean correlation coefficient ≥0.5 with the EC reference transcripts CLEC14A, vWF, and CD34 in the GTEx material for each HPA-identified EC-enriched transcript. Of the EC-enriched genes determined from the HPA material, 233/332 (70%) were replicated in the GTEx material. This final list contained 82 of the 100 transcripts most highly correlated with the EC reference seeds in the GTEx material (Table S3, tab 3). A summary of the selection protocol is shown in Figure S3D.
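The replication requirement amounts to a filter on the HPA-derived list. The sketch below is hypothetical in its names and values, and it assumes that each gene's mean GTEx correlation with the EC reference transcripts has already been computed in the same way as for the HPA data.

```python
def replicated(hpa_hits, gtex_mean_rho, threshold=0.5):
    """Keep HPA-identified EC-enriched genes whose mean correlation with the EC
    reference transcripts in the GTEx data is >= threshold.

    Assumption: a gene missing from the GTEx table is treated as not replicated.
    """
    return [g for g in hpa_hits if gtex_mean_rho.get(g, float("-inf")) >= threshold]


# Hypothetical gene names and GTEx mean correlations, for illustration only.
hpa_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gtex_mean_rho = {"GENE_A": 0.74, "GENE_B": 0.50, "GENE_C": 0.31}
print(replicated(hpa_hits, gtex_mean_rho))  # ['GENE_A', 'GENE_B']

# Replication rate reported in the text: 233 of 332 HPA-identified genes.
print(f"{233 / 332:.0%}")  # 70%
```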

The 234 transcripts included 116 previously described in an

EC context, 88 that had not been previously been associated

with EC and 30 protein-coding genes on which knowledge is

sparse or entirely absent (Table S3, tab 1, column Q). The

ten most highly correlated genes in each category are detailed

in Figure 3A. Example correlation plots of the transcript FPKM

values versus the mean estimated percentage of EC in seq-

uenced samples are shown for three genes from each category

(correlation range 0.91–0.54, p value range <0.001–0.06) (Fig-

Figure 4. EH-Domain Containing 2 Is a Pan Endothelial-Enriched Prote

(A) IHC staining of multiple tissue types using a primary antibody targeting EHD2

(B.i.) Plotted mean FPKM values for von Willebrand factor (vWF) and EHD2 trans

represented as mean ± SEM. Corresponding IHC images from liver, kidney, adre

(B.ii.) Staining for EHD2 in (1) veins, (2) venules, and (3) capillaries of the heart m

Data are represented as mean ± SEM. See also Figure S2D.

ure 3B). Gene ontology (GO) analysis (Ashburner et al., 2000)

was performed on the final list of EC-enriched transcripts.

The most significant biological process groupings were all

related to EC function (vasculature development, blood vessel

development, angiogenesis, circulatory system development,

cardiovascular system development [corrected p value for

all <2.2 3 1033]), with numerous other endothelial related

groupings also identified, e.g., endothelial development, regu-

lation of endothelial cell migration, positive regulation of endo-

thelial cell migration, endothelial cell differentiation, vascular

endothelial growth factor signaling pathway, etc. (Table S3,

tab 4).

Protein Profiling of Novel EC-Enriched Genes In Vivo

We selected three genes identified as EC enriched from the unknown or uncharacterized category, with varying levels of expression in primary cultured ECs in vitro (see following section and Table S3, column AI), for antibody-based protein profiling. Two came from the unknown category: EH-domain containing 2 (EHD2; mean correlation 0.72, p value <0.001) and LIM and senescent cell antigen-like domains 2 (LIMS2; mean correlation 0.69, p value <0.001) (Table S3, tab 1 and Figures S2D and S2E, respectively). The third, family with sequence similarity 110, member D (FAM110D; mean correlation 0.65, p value <0.001) (Table S3, tab 1 and Figure S2F), came from the uncharacterized category. All three are predominantly expressed in ECs, e.g., in stomach, skin, cerebral cortex, esophagus, gallbladder, urinary bladder, placenta, breast, nasopharynx, heart, and lung (Figures 4A, 5A, and 6A). EHD2 and LIMS2 FPKM expression levels both paralleled that of vWF (Figures 4B and 5B). By ANOVA, organ type explained 72.9% and gene 0.4% of the variation for EHD2; for LIMS2 the figures were 62.5% and 0.18%. However, higher relative levels of EHD2 versus vWF were observed in liver, kidney, adrenal gland, and ovary (Figure 4B.i, dotted boxes). IHC staining showed that EHD2 was expressed in liver, kidney, and adrenal gland ECs, while vWF was largely absent. Ovarian ECs were positive for both EHD2 and vWF, with some EHD2 positivity in ovarian stromal cells. Higher relative levels of LIMS2 versus vWF expression were observed in the liver, kidney, small intestine, and prostate (Figure 5B.i, dotted boxes). LIMS2 staining was stronger than vWF staining in liver, kidney, and small intestine ECs (Figure 5B.i). Prostate ECs were positive for both, with some weak LIMS2 staining in prostate smooth muscle cells (Figure 5B.i). Unlike EHD2 and LIMS2, FAM110D had mean FPKM values significantly lower than those of vWF (Figure 6B); by ANOVA, organ type explained 32.4% and gene 19.8% of the variation. FAM110D FPKM values relative to vWF were highest in the kidney (second dotted box, Figure 6B.i). IHC confirmed FAM110D expression in kidney ECs, but some positive staining was observed outside the EC compartment (Figure 6B.i). EHD2, LIMS2, and FAM110D protein staining was observed in different vessel types (illustrated in heart muscle; Figures 4B.ii, 5B.ii, and 6B.ii, respectively).

Figure 4. [Legend partially lost in extraction; the title ends "… In Vivo".] (A) IHC staining. Scale bars, 100 µm. (B.i) Transcript FPKM values in 124 individual samples from 32 different human tissue types; corresponding IHC images from the organs denoted by dotted boxes (including adrenal gland and ovary) are displayed above. (B.ii) Staining of vessels in heart muscle. Scale bars, 50 µm.
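The protein-profiling paragraph quotes two kinds of statistics: a gene's "mean correlation" and the "variation by ANOVA" attributable to organ type and to gene. The sketch below illustrates both. It rests on two assumptions that this section does not spell out:

* The "mean correlation" is taken to be the average Pearson correlation between a candidate gene's FPKM profile across tissue samples and the profiles of a set of EC reference transcripts. The actual reference set and statistic are defined in the paper's Methods; "vWF" and "REF2" here are illustrative placeholders.
* The ANOVA percentage is taken to be each main effect's share of the total sum of squares.

All function names and toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_reference_correlation(candidate, references):
    """Hypothetical enrichment score: average Pearson correlation between a
    candidate gene's per-sample FPKM profile and each reference transcript's
    profile over the same samples (identical sample order assumed)."""
    return mean(pearson(candidate, profile) for profile in references.values())


def anova_variance_share(records):
    """records: iterable of (organ, gene, fpkm) tuples.

    Returns the percentage of the total sum of squares explained by each
    main effect (organ type, gene), computed from group means. This is a
    simple main-effects decomposition, not necessarily the paper's model.
    """
    records = list(records)
    values = [fpkm for _, _, fpkm in records]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)

    def ss_main_effect(column):
        # Between-group sum of squares for the factor stored at `column`.
        groups = defaultdict(list)
        for rec in records:
            groups[rec[column]].append(rec[2])
        return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())

    return {
        "organ type": 100 * ss_main_effect(0) / ss_total,
        "gene": 100 * ss_main_effect(1) / ss_total,
    }


if __name__ == "__main__":
    # Toy profiles (hypothetical): the candidate tracks "vWF" and opposes "REF2".
    refs = {"vWF": [2, 4, 6, 8], "REF2": [4, 3, 2, 1]}
    print(mean_reference_correlation([1, 2, 3, 4], refs))
    # Toy FPKM table in which only organ type explains the variation.
    toy = [("liver", "vWF", 2), ("liver", "EHD2", 2),
           ("lung", "vWF", 20), ("lung", "EHD2", 20)]
    print(anova_variance_share(toy))  # {'organ type': 100.0, 'gene': 0.0}
```

A low "gene" share, as reported for EHD2 and LIMS2 versus vWF, means the two transcripts move together across organs. A larger "gene" share, as for FAM110D, reflects a systematic level difference between the genes. The sketch omits p values, which would need a significance test on top of these quantities. With unbalanced designs or an interaction term, the split also depends on the model specification.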

Expression of In Vivo Pan Endothelial-Enriched Genes in In Vitro Cell-Culture Systems

We performed RNA-seq on first passage human umbilical vein ECs (HUVECs), freshly isolated from four independent individuals, to analyze the expression of our identified EC-enriched transcripts at the cellular level. Of all mapped protein-coding genes, 56% were expressed (11,292 transcripts with FPKM ≥ 1). Of these transcripts, 80% were housekeeping genes that we have previously reported as widely expressed in all tissues (Uhlen et al., 2015). Of the 234 EC-enriched transcripts identified in vivo, 196 were detectable with a cutoff of FPKM ≥ 1 (Table S3, tab 1, column AI; Figure 7A).

We also analyzed expression of the 234 genes in human ECs isolated from other vascular beds, using microarray expression data (http://www.ncbi.nlm.nih.gov/geo/). These were bladder microvasculature (HBMEC), iliac artery (HIAEC), saphenous vein (HSVEC), umbilical artery (HUAEC), and uterine microvasculature (HUtMEC). The normalized expression values are not strictly comparable to the RNA-seq FPKM values from HUVEC, as they were generated using a different technology platform. However, 161 of the pan EC-enriched transcripts identified by our analysis were detected in one or more of these EC types (Table S3, tab 1, columns AM–AQ). Only 20 measured transcripts were not detectable in any of the cultured cells, and 16 of these belonged to the unknown or uncharacterized category. Because EC gene expression profiles have historically been characterized largely in vitro, this could explain why these transcripts have not been recognized as endothelial.

To determine whether the relative in vivo expression levels of the 234 EC-enriched genes (to each other) were maintained in cultured ECs, we calculated the mean FPKM for each transcript across the 124 sequenced human tissues. This produced a relative expression score (Table S3, tab 1, column AH), and we analyzed the correlation between these scores and the respective transcript expression levels in cultured ECs. We observed a positive relationship (correlation 0.50, p value <0.0001) between the relative in vivo expression scores of the known, unknown, and uncharacterized transcripts and the corresponding transcript levels in cultured HUVEC (Figure 7A.ii). In contrast, correlations were absent in HBMEC, HIAEC, HSVEC, HUAEC, and HUtMEC (correlation 0.03, 0.01, 0.04, 0.03, and 0.02, respectively). Taken together, these results show that most pan EC-enriched genes expressed in vivo can be detected in cultured ECs, and that their relative expression levels are maintained in first passage HUVEC.

GIPC3 and KANK3 are representative transcripts from the unknown and uncharacterized EC-enriched categories, respectively, that were not detected in vitro. Both were confirmed to be expressed in ECs in vivo, e.g., in stomach, liver, cerebral cortex, and kidney (Figure 7B). This analysis indicates how suitable in vitro systems are for studying EC-enriched gene expression and function.
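The detection count and the in vivo versus in vitro comparison described above reduce to two small computations. The first applies the FPKM ≥ 1 cutoff to cultured-cell values. The second takes each transcript's mean FPKM across tissue samples (the relative expression score) and correlates it with the transcript's mean level in culture. The following is a minimal sketch, assuming Pearson correlation on untransformed FPKM values. The Figure 7 legend reports Pearson correlations, but whether values were transformed is not stated here. Gene names and numbers are hypothetical toy data, not values from Table S3.

```python
from math import sqrt
from statistics import mean


def detected(cultured_fpkm, cutoff=1.0):
    """Genes whose FPKM in cultured ECs meets the detection cutoff (FPKM >= 1)."""
    return {gene for gene, fpkm in cultured_fpkm.items() if fpkm >= cutoff}


def relative_expression_scores(tissue_fpkm):
    """Mean FPKM of each transcript across all tissue samples."""
    return {gene: mean(values) for gene, values in tissue_fpkm.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def in_vivo_vs_culture(tissue_fpkm, cultured_fpkm):
    """Correlate relative in vivo scores with cultured-cell levels,
    restricted to genes measured in both datasets."""
    scores = relative_expression_scores(tissue_fpkm)
    genes = sorted(set(scores) & set(cultured_fpkm))
    return pearson([scores[g] for g in genes], [cultured_fpkm[g] for g in genes])


if __name__ == "__main__":
    # Hypothetical toy values: per-sample tissue FPKM and mean HUVEC FPKM.
    tissue = {"GENE_A": [1, 3], "GENE_B": [4, 6], "GENE_C": [7, 9]}
    huvec = {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 3.5}
    print(sorted(detected(huvec)))                   # ['GENE_B', 'GENE_C']
    print(round(in_vivo_vs_culture(tissue, huvec), 3))  # 1.0
```

Restricting the comparison to genes present in both inputs reflects the fact that only measured transcripts can be compared across platforms. Reporting a p value alongside the correlation, as in the text, would additionally require a significance test, for example `scipy.stats.pearsonr`, which returns both the coefficient and a p value.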

Figure 5. LIM and Senescent Cell Antigen-like Domains 2 Is a Pan Endothelial-Enriched Protein In Vivo

(A) IHC staining of multiple tissue types using a primary antibody targeting LIMS2. Scale bars, 100 µm.

(B.i) Plotted mean FPKM values for von Willebrand factor (vWF) and LIMS2 transcripts in 124 individual samples from 32 different human tissue types. Data are represented as mean ± SEM. Corresponding IHC images from liver, kidney, small intestine, and prostate (denoted by dotted boxes) are displayed above.

(B.ii) Staining for LIMS2 expression in (1) veins, (2) venules, and (3) capillaries of the heart muscle. Scale bars, 50 µm.

See also Figure S2E.

DISCUSSION

Here, we use an integrative transcriptomics and antibody-based profiling approach to identify human EC-enriched proteins. We provide a searchable resource (Table S3), which can be used to determine the extent of pan EC specificity of any mapped gene. For example, there are three types of NOS, an enzyme involved in the synthesis of nitric oxide from L-arginine (Forstermann and Sessa, 2012): neuronal nNOS (NOS1) (Zhou and Zhu, 2009), cytokine-inducible iNOS (NOS2) (Bogdan, 2015), and endothelial eNOS (NOS3) (Oliveira-Paula et al., 2016). Surprisingly, the extent of pan EC specificity of eNOS in humans has not been described. Our data suggest that NOS3 is predominantly EC enriched (correlation 0.50), in contrast to NOS1 and NOS2 (correlation −0.04 and 0.12, respectively). This resource can be used together with our HPA tissue protein profiling data (http://www.proteinatlas.org/) (Uhlen et al., 2015).

Our study has some limitations. We observe EC expression for the majority of EC-enriched genes identified but could not confirm all of them, due to lack of antibody specificity or other technical issues, e.g., staining for secreted proteins. HPA antibody reliability guidelines are available, with assessment of concordance between RNA-seq and IHC staining, detection by western blot, and protein array specificity analysis. We minimized incorrect classification of SMC or macrophage genes as EC enriched, but such an analysis was not possible for pericytes, due to the lack of specific markers (Armulik et al., 2011). However, pericytes are present in different quantities relative to ECs across vascular beds (Díaz-Flores et al., 2009), so pericyte-specific genes, if they exist, should not correlate with our EC reference transcripts. Because we analyzed samples from a large number of organs but few from each, the term “pan EC-transcriptome” needs to be interpreted with caution: a lack of EC expression in some tissue beds is compatible with a high correlation value. Examination of correlation plots to identify tissue-specific outliers, analysis of larger datasets (as we did using GTEx material), or IHC confirmation could address this. EC reference transcripts are not uniformly expressed across all vascular beds, e.g., vWF is low in liver ECs; however, the identified EC-enriched genes were detectable in ECs of such “outlier” organs.

Previous efforts to determine human EC-enriched transcripts have used isolated/cultured cells (Chi et al., 2003; Ho et al., 2003), and the genes confirmed as pan EC enriched are often critical for EC function, e.g., CDH5 (Carmeliet et al., 1999), FLT1 (Li et al., 2015), and vWF (Lenting et al., 2015). Many studies have examined EC gene expression during normal and pathological angiogenesis (Seaman et al., 2007; Seano et al., 2014; van Beijnum et al., 2006). Many genes we identified have a role in angiogenesis induction and vessel stability, e.g., RAMP2 (Ichikawa-Shindo et al., 2008), RRAS (Sawada et al., 2012), ADGRL4 (Masiero et al., 2013), and RHOJ (Kim et al., 2014). EC turnover in adult tissues occurs over years rather than months or weeks (Hobson and Denekamp, 1984), so constitutive expression of these pro-angiogenic genes is unlikely to indicate active angiogenesis. We also identified angiogenesis-inhibitory genes, e.g., NOTCH4 (Leong et al., 2002), FGD5 (Cheng et al., 2012), MMRN2 (Lorenzon et al., 2012), and DLL4 (Liu et al., 2014). Thus, we speculate that pro- and anti-angiogenic gene expression maintains a homeostatic balance in the absence of external cues. Indeed, neovascularization is associated with modulation of baseline gene expression (Benedito et al., 2012; Shih et al., 2002), rather than with absolute induction or suppression of specific transcripts.

Figure 7. Pan EC-Enriched Transcript Expression in Cultured ECs

(A.i) RNA-seq data from first passage primary umbilical vein endothelial cells (HUVECs) were used to identify the proportion of detectable known, unknown, and uncharacterized pan EC-enriched transcripts (FPKM ≥ 1). (A.ii) RNA-seq data from 124 individual samples from 32 different human tissue types were used to calculate a mean FPKM expression value for each pan EC-enriched transcript, which was plotted against the corresponding mean transcript expression in HUVEC (n = 4). Green, red, and black points represent known, unknown, and uncharacterized transcripts, respectively. Pearson correlations and corresponding p values are shown in the lower right of each plot.

(B) Stomach, liver, cerebral cortex, and kidney tissue sections stained for proteins encoded by GIPC3 and KANK3, pan EC-enriched transcripts that could not be detected in first passage HUVEC (unknown and uncharacterized category, respectively).

We identified 88 EC-enriched genes encoding previously unknown or uncharacterized EC proteins. Nine of these, e.g., FAM110D, are expressed at very low transcript levels (mean expression across tissues <2 FPKM). Previous studies of human EC transcript expression have involved EC isolation (Bhasin et al., 2010; Chu and Peters, 2008; Seaman et al., 2007; Urich et al., 2012), so their results lack whole-organism context. Thus, lowly expressed (but highly enriched) EC genes may have been overlooked. Recent studies highlight the complex relationship between mRNA transcription and protein production, finding that the two correlate, but not always strongly (Vogel and Marcotte, 2012). Protein levels are primarily determined by translation rates and secondarily by transcription rates (Schwanhausser et al., 2011), with gene-specific dynamics. For example, metabolic genes can have high protein-to-mRNA ratios (Vogel et al., 2010), while genes involved in transcriptional regulation can show the opposite (Schwanhausser et al., 2011). Although FAM110D FPKM values were <1 in most organs, IHC confirmed EC expression, supporting an integrative transcriptomics and antibody-based profiling approach. Pan EC-enriched transcripts expressed at low levels in the “baseline” state could be increased during inflammation or angiogenesis, as for other EC genes, e.g., VEGFC (Semenza, 2001) and SELE (Collins et al., 1995).

We identified a number of EC-enriched proteins with undefined in vivo function. Ectopic expression of EHD2, an ATPase (Stoeber et al., 2012), revealed a role in the regulation of caveolin 1 (CAV1) carrier state via actin filament interaction (Moren et al., 2012; Stoeber et al., 2012). CAV1 was also identified as pan EC enriched; thus, one could speculate that EHD2 has a role in EC-specific caveolin function in vivo. In cell lines, LIMS2 localized to focal adhesions and formed a complex with integrin-linked kinase (ILK) (Zhang et al., 2002). There it negatively regulated the interaction with LIMS1, a regulator of actin cytoskeletal arrangement (Zhang et al., 2002). LIMS2 could be involved in EC-specific regulation of LIMS1 and may have a role in angiogenesis, where shape changes and migration processes are critical (Seano et al., 2014). There are currently no studies on FAM110D (Figure 6), but its paralogs, FAM110A-C, localize to centrosomes and spindle poles, with a possible role in cell-cycle regulation (Hauge et al., 2007).

Figure 6. Family with Sequence Similarity 110, Member D Is a Pan EC-Enriched Protein In Vivo

(A) IHC staining of multiple tissue types using a primary antibody targeting FAM110D. Scale bars, 100 µm.

(B.i) Mean FPKM values for von Willebrand factor (vWF) and FAM110D transcripts in 124 individual samples from 32 different human tissue types. Data are represented as mean ± SEM. Corresponding IHC images from liver, kidney, skeletal muscle, and ovary (denoted by dotted boxes) are displayed above.

(B.ii) Staining for FAM110D in (1) veins, (2) arterioles, and (3) capillaries of the heart muscle. Scale bars, 50 µm.

See also Figure S2F.

Most identified pan EC-enriched genes were detected in cultured ECs. Some transcripts encoding well-described EC proteins were lost in vitro, e.g., PDGFRB (Beitz et al., 1991), GPIHBP1 (Pei-Ling Chiu et al., 2014), HSPA12B (Hu et al., 2006), and PEAR1 (Nanda et al., 2005). Other undetectable genes were unknown or uncharacterized, e.g., GIPC3 and KANK3, both of which were confirmed as EC enriched in vivo. Cultured venous and arterial ECs can retain distinct gene expression differences, even after multiple passages (Chi et al., 2003). However, the absence of microenvironmental cues, e.g., shear stress, can modulate mRNA levels (Amaya et al., 2015) and stability (Wu et al., 2011), inducing a rapid drift in gene expression (Durr et al., 2004; Lacorre et al., 2004). Thus, the loss of pan EC-enriched gene expression in vitro could be due to culture and/or repeated passage, as for CD34 (Delia et al., 1993).

Bioinformatic algorithms have been proposed for deconvoluting gene expression data from heterogeneous tissues into cell type-specific gene expression profiles, i.e., in silico microdissection (Gaujoux and Seoighe, 2013; Ju et al., 2013; Shen-Orr et al., 2010). Here, we present a direct method to identify the transcriptome profile of a low-abundance cell type, the EC, from heterogeneous tissue samples. The same principle could be used to analyze the transcriptome of other cell types that are present across multiple tissue beds and express specific markers, e.g., mast cells or resident macrophages. The pan EC-enriched transcriptome generated here provides a basis for in-depth functional studies to expand our knowledge of the vascular system in health and disease.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
• METHOD DETAILS
  ▪ Human Tissue Preparation and Transcript Profiling
  ▪ Estimation of Endothelial Cell Percentage in Selected Tissues
  ▪ Tissue Profiling: Human Tissue Sections
  ▪ Transcript Profiling: Isolated Human Endothelial Cells
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ▪ Analysis of RNA-Seq Data to Determine Pan EC-Enriched Transcripts
  ▪ Analysis of GTEx RNA-Seq Data as Replication
  ▪ Gene Ontology (GO) Enrichment Analysis
• DATA AND SOFTWARE AVAILABILITY
• ADDITIONAL RESOURCES

SUPPLEMENTAL INFORMATION

Supplemental Information includes three figures and three tables and can be found with this article online at http://dx.doi.org/10.1016/j.cels.2016.08.001.

AUTHOR CONTRIBUTIONS

Conceptualization, L.M.B. and J.O.; Methodology, L.M.B. and J.O.; Formal Analysis, L.F. and B.M.H.; Investigation, L.M.B. and J.O.; Resources, F.P. and M.U.; Writing – Original Draft, L.M.B. and J.O.; Writing – Review & Editing, all; Visualization, L.M.B., T.R., and J.O.; Funding Acquisition, L.M.B., M.U., F.P., T.R., and J.O.

ACKNOWLEDGMENTS

We acknowledge the staff of the Human Protein Atlas (HPA) program, the Science for Life Laboratory, and the pathology team in Mumbai, India. We thank the Department of Pathology at the Uppsala Akademiska Hospital, Uppsala, Sweden and Uppsala Biobank for kindly providing specimens used in this study. The HPA was funded by the Knut & Alice Wallenberg Foundation. We also acknowledge funding from Hjart Lungfonden (20140691 and 20150623) and Vetenskapsradet (2013-42608-102305-28) to L.M.B.; Stockholm Council (LS 1302-0311) to J.O.; and Vetenskapsradet (K2013-65X-21462-04-5), German Research Society (SFB841, SFB877), and a European Research Council grant (ERC-StG-2012-311575_F-12) to T.R.

Received: December 1, 2015

Revised: May 23, 2016

Accepted: August 3, 2016

Published: September 15, 2016

REFERENCES

Aird, W.C. (2012). Endothelial cell heterogeneity. Cold Spring Harb. Perspect. Med. 2, a006429.

Amaya, R., Pierides, A., and Tarbell, J.M. (2015). The interaction between fluid wall shear stress and solid circumferential strain affects endothelial gene expression. PLoS ONE 10, e0129952.

Ardlie, K.G., DeLuca, D.S., Segre, A.V., Sullivan, T.J., Young, T.R., Gelfand, E.T., Trowbridge, C.A., Maller, J.B., Tukiainen, T., Lek, M., et al.; GTEx Consortium (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660.

Armulik, A., Genove, G., and Betsholtz, C. (2011). Pericytes: Developmental, physiological, and pathological perspectives, problems, and promises. Dev. Cell 21, 193–215.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al.; The Gene Ontology Consortium (2000). Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29.

Balda, M.S., and Matter, K. (2009). Tight junctions and the regulation of gene expression. Biochim. Biophys. Acta 1788, 761–767.

Ballabio, E., Mariotti, M., De Benedictis, L., and Maier, J.A.M. (2004). The dual role of endothelial differentiation-related factor-1 in the cytosol and nucleus: Modulation by protein kinase A. Cell. Mol. Life Sci. 61, 1069–1074.

Beitz, J.G., Kim, I.S., Calabresi, P., and Frackelton, A.R., Jr. (1991). Human microvascular endothelial cells express receptors for platelet-derived growth factor. Proc. Natl. Acad. Sci. USA 88, 2021–2025.

Benedito, R., Rocha, S.F., Woeste, M., Zamykal, M., Radtke, F., Casanovas, O., Duarte, A., Pytowski, B., and Adams, R.H. (2012). Notch-dependent VEGFR3 upregulation allows angiogenesis without VEGF-VEGFR2 signalling. Nature 484, 110–114.


Berger, C., Harzer, H., Burkard, T.R., Steinmann, J., van der Horst, S., Laurenson, A.S., Novatchkova, M., Reichert, H., and Knoblich, J.A. (2012). FACS purification and transcriptome analysis of Drosophila neural stem cells reveals a role for Klumpfuss in self-renewal. Cell Rep. 2, 407–418.

Bernat, J.A., Crawford, G.E., Ogurtsov, A.Y., Collins, F.S., Ginsburg, D., and Kondrashov, A.S. (2006). Distant conserved sequences flanking endothelial-specific promoters contain tissue-specific DNase-hypersensitive sites and over-represented motifs. Hum. Mol. Genet. 15, 2098–2105.

Bhasin, M., Yuan, L., Keskin, D.B., Otu, H.H., Libermann, T.A., and Oettgen, P. (2010). Bioinformatic identification and characterization of human endothelial cell-restricted genes. BMC Genomics 11, 342.

Bogdan, C. (2015). Nitric oxide synthase in innate and adaptive immunity: An update. Trends Immunol. 36, 161–178.

Carmeliet, P., Lampugnani, M.G., Moons, L., Breviario, F., Compernolle, V., Bono, F., Balconi, G., Spagnuolo, R., Oosthuyse, B., Dewerchin, M., et al. (1999). Targeted deficiency or cytosolic truncation of the VE-cadherin gene in mice impairs VEGF-mediated endothelial survival and angiogenesis. Cell 98, 147–157.

Cheng, C., Haasdijk, R., Tempel, D., van de Kamp, E.H., Herpers, R., Bos, F., Den Dekker, W.K., Blonden, L.A., de Jong, R., Burgisser, P.E., et al. (2012). Endothelial cell-specific FGD5 involvement in vascular pruning defines neovessel fate in mice. Circulation 125, 3142–3158.

Cheng, L., Zhang, S., MacLennan, G.T., Williamson, S.R., Davidson, D.D., Wang, M., Jones, T.D., Lopez-Beltran, A., and Montironi, R. (2013). Laser-assisted microdissection in translational research: Theory, technical considerations, and future applications. Appl. Immunohistochem. Mol. Morphol. 21, 31–47.

Chi, J.T., Chang, H.Y., Haraldsen, G., Jahnsen, F.L., Troyanskaya, O.G., Chang, D.S., Wang, Z., Rockson, S.G., van de Rijn, M., Botstein, D., and Brown, P.O. (2003). Endothelial cell diversity revealed by global expression profiling. Proc. Natl. Acad. Sci. USA 100, 10623–10628.

Chu, T.J., and Peters, D.G. (2008). Serial analysis of the vascular endothelial transcriptome under static and shear stress conditions. Physiol. Genomics 34, 185–192.

Civelek, M., Manduchi, E., Riley, R.J., Stoeckert, C.J., Jr., and Davies, P.F. (2011). Coronary artery endothelial transcriptome in vivo: Identification of endoplasmic reticulum stress and enhanced reactive oxygen species by gene connectivity network analysis. Circ. Cardiovasc. Genet. 4, 243–252.

Collins, T., Read, M.A., Neish, A.S., Whitley, M.Z., Thanos, D., and Maniatis, T. (1995). Transcriptional regulation of endothelial cell adhesion molecules: NF-kappa B and cytokine-inducible enhancers. FASEB J. 9, 899–909.

Conley, C.A. (2001). Leiomodin and tropomodulin in smooth muscle. Am. J. Physiol. Cell Physiol. 280, C1645–C1656.

Cooke, B.M., Usami, S., Perry, I., and Nash, G.B. (1993). A simplified method for culture of endothelial cells and analysis of adhesion of blood cells under conditions of flow. Microvasc. Res. 45, 33–45.

Delia, D., Lampugnani, M.G., Resnati, M., Dejana, E., Aiello, A., Fontanella, E., Soligo, D., Pierotti, M.A., and Greaves, M.F. (1993). CD34 expression is regulated reciprocally with adhesion molecules in vascular endothelial cells in vitro. Blood 81, 1001–1008.

Díaz-Flores, L., Gutierrez, R., Madrid, J.F., Varela, H., Valladares, F., Acosta, E., Martín-Vasallo, P., and Díaz-Flores, L., Jr. (2009). Pericytes. Morphofunction, interactions and pathology in a quiescent and activated mesenchymal cell niche. Histol. Histopathol. 24, 909–969.

Dreiza, C.M., Komalavilas, P., Furnish, E.J., Flynn, C.R., Sheller, M.R., Smoke, C.C., Lopes, L.B., and Brophy, C.M. (2010). The small heat shock protein, HSPB6, in muscle function and disease. Cell Stress Chaperones 15, 1–11.

Du Toit, A. (2015). Mechanotransduction: VE-cadherin lets it flow. Nat. Rev. Mol. Cell Biol. 16, 268–268.

Durr, E., Yu, J., Krasinska, K.M., Carver, L.A., Yates, J.R., Testa, J.E., Oh, P., and Schnitzer, J.E. (2004). Direct proteomic mapping of the lung microvascular endothelial cell surface in vivo and in cell culture. Nat. Biotechnol. 22, 985–992.

East, L., and Isacke, C.M. (2002). The mannose receptor family. Biochim. Biophys. Acta 1572, 364–386.

Fabriek, B.O., van Bruggen, R., Deng, D.M., Ligtenberg, A.J., Nazmi, K., Schornagel, K., Vloet, R.P., Dijkstra, C.D., and van den Berg, T.K. (2009). The macrophage scavenger receptor CD163 functions as an innate immune sensor for bacteria. Blood 113, 887–892.

Flicek, P., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., et al. (2014). Ensembl 2014. Nucleic Acids Res. 42, D749–D755.

Forstermann, U., and Sessa, W.C. (2012). Nitric oxide synthases: Regulation and function. Eur. Heart J. 33, 829–837, 837a–837d.

Ganz, P., and Hsue, P.Y. (2013). Endothelial dysfunction in coronary heart disease is more than a systemic process. Eur. Heart J. 34, 2025–2027.

Gaujoux, R., and Seoighe, C. (2013). CellMix: A comprehensive toolbox for gene expression deconvolution. Bioinformatics 29, 2211–2212.

Hauge, H., Patzke, S., and Aasheim, H.C. (2007). Characterization of the FAM110 gene family. Genomics 90, 14–27.

Ho, M., Yang, E., Matcuk, G., Deng, D., Sampas, N., Tsalenko, A., Tabibiazar, R., Zhang, Y., Chen, M., Talbi, S., et al. (2003). Identification of endothelial cell genes by combined database mining and microarray analysis. Physiol. Genomics 13, 249–262.

Hobson, B., and Denekamp, J. (1984). Endothelial proliferation in tumours and normal tissues: Continuous labelling studies. Br. J. Cancer 49, 405–413.

Hu, G., Tang, J., Zhang, B., Lin, Y., Hanai, J., Galloway, J., Bedell, V., Bahary, N., Han, Z., Ramchandran, R., et al. (2006). A novel endothelial-specific heat shock protein HspA12B is required in both zebrafish development and endothelial functions in vitro. J. Cell Sci. 119, 4117–4126.

Huminiecki, L., and Bicknell, R. (2000). In silico cloning of novel endothelial-specific genes. Genome Res. 10, 1796–1806.

Ichikawa-Shindo, Y., Sakurai, T., Kamiyoshi, A., Kawate, H., Iinuma, N., Yoshizawa, T., Koyama, T., Fukuchi, J., Iimuro, S., Moriyama, N., et al. (2008). The GPCR modulator protein RAMP2 is essential for angiogenesis and vascular integrity. J. Clin. Invest. 118, 29–39.

Jaye, M., Lynch, K.J., Krawiec, J., Marchadier, D., Maugeais, C., Doan, K., South, V., Amin, D., Perrone, M., and Rader, D.J. (1999). A novel endothelial-derived lipase that modulates HDL metabolism. Nat. Genet. 21, 424–428.

Ju, W., Greene, C.S., Eichinger, F., Nair, V., Hodgin, J.B., Bitzer, M., Lee, Y.S., Zhu, Q., Kehata, M., Li, M., et al. (2013). Defining cell-type specificity at the transcriptional level in human disease. Genome Res. 23, 1862–1873.

Kampf, C., Olsson, I., Ryberg, U., Sjöstedt, E., and Ponten, F. (2012). Production of tissue microarrays, immunohistochemistry staining and digitalization within the human protein atlas. J. Vis. Exp. 63, http://dx.doi.org/10.3791/3620.

Kanaji, S., Fahs, S.A., Shi, Q., Haberichter, S.L., and Montgomery, R.R. (2012). Contribution of platelet vs. endothelial VWF to platelet adhesion and hemostasis. J. Thromb. Haemost. 10, 1646–1652.

Kaufmann, A., Salentin, R., Gemsa, D., and Sprenger, H. (2001). Increase of CCR1 and CCR5 expression and enhanced functional response to MIP-1 alpha during differentiation of human monocytes to macrophages. J. Leukoc. Biol. 69, 248–252.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.

Kim, C., Yang, H., Fukushima, Y., Saw, P.E., Lee, J., Park, J.S., Park, I., Jung, J., Kataoka, H., Lee, D., et al. (2014). Vascular RhoJ is an effective and selective target for tumor angiogenesis and vascular disruption. Cancer Cell 25, 102–117.

Korhonen, J., Lahtinen, I., Halmekyto, M., Alhonen, L., Janne, J., Dumont, D., and Alitalo, K. (1995). Endothelial-specific gene expression directed by the tie gene promoter in vivo. Blood 86, 1828–1835.

Kunjathoor, V.V., Febbraio, M., Podrez, E.A., Moore, K.J., Andersson, L., Koehn, S., Rhee, J.S., Silverstein, R., Hoff, H.F., and Freeman, M.W. (2002). Scavenger receptors class A-I/II and CD36 are the principal receptors responsible for the uptake of modified low density lipoprotein leading to lipid loading in macrophages. J. Biol. Chem. 277, 49982–49988.


Lacorre, D.A., Baekkevold, E.S., Garrido, I., Brandtzaeg, P., Haraldsen, G.,

Amalric, F., and Girard, J.P. (2004). Plasticity of endothelial cells: Rapid dedif-

ferentiation of freshly isolated high endothelial venule endothelial cells outside

the lymphoid tissue microenvironment. Blood 103, 4164–4172.

Lenting, P.J., Christophe, O.D., and Denis, C.V. (2015). von Willebrand factor

biosynthesis, secretion, and clearance: Connecting the far ends. Blood 125,

2019–2028.

Leong, K.G., Hu, X., Li, L., Noseda, M., Larrivee, B., Hull, C., Hood, L., Wong,

F., and Karsan, A. (2002). Activated Notch4 inhibits angiogenesis: role of beta

1-integrin activation. Mol Cell Biol. 22, 2830–2841.

Ley, K. (2003). The role of selectins in inflammation and disease. Trends Mol.

Med. 9, 263–268.

Li, S., Zhou, X.L., Dang, Y.Y., Kwan, Y.W., Chan, S.W., Leung, G.P.H., Lee,

S.M.Y., and Hoi, M.P.M. (2015). Basal Flt1 tyrosine kinase activity is a positive

regulator of endothelial survival and vascularization during zebrafish embryo-

genesis. Biochim. Biophys. Acta 1850, 373–384.

Liang, Y., and Tedder, T.F. (2001). Identification of a CD20-, FcepsilonRIbeta-,

and HTm4-related gene family: Sixteen newMS4A family members expressed

in human and mouse. Genomics 72, 119–127.

Liu, Z., Fan, F., Wang, A., Zheng, S., and Lu, Y. (2014). Dll4-Notch signaling in

regulation of tumor angiogenesis. J. Cancer Res. Clin. Oncol. 140, 525–536.

Long, X., Tharp, D.L., Georger, M.A., Slivano, O.J., Lee, M.Y., Wamhoff, B.R.,

Bowles, D.K., and Miano, J.M. (2009). The smooth muscle cell-restricted

KCNMB1 ion channel subunit is a direct transcriptional target of serum

response factor and myocardin. J. Biol. Chem. 284, 33671–33682.

Lorenzon, E., Colladel, R., Andreuzzi, E., Marastoni, S., Todaro, F.,

Schiappacassi, M., Ligresti, G., Colombatti, A., and Mongiat, M. (2012).

MULTIMERIN2 impairs tumor angiogenesis and growth by interfering with

VEGF-A/VEGFR2 pathway. Oncogene 31, 3136–3147.

Mackman, N. (2012). New insights into themechanisms of venous thrombosis.

J. Clin. Invest. 122, 2331–2336.

Malatesta, M. (2016). Histological and histochemical methods - theory and practice. Eur. J. Histochem. 60, 2639.

Masiero, M., Simoes, F.C., Han, H.D., Snell, C., Peterkin, T., Bridges, E., Mangala, L.S., Wu, S.Y., Pradeep, S., Li, D., et al. (2013). A core human primary tumor angiogenesis signature identifies the endothelial orphan receptor ELTD1 as a key regulator of angiogenesis. Cancer Cell 24, 229–241.

Mi, H., Muruganujan, A., Casagrande, J.T., and Thomas, P.D. (2013). Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551–1566.

Mi, H., Poudel, S., Muruganujan, A., Casagrande, J.T., and Thomas, P.D. (2016). PANTHER version 10: Expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44, D336–D342.

Miwa, T., Manabe, Y., Kurokawa, K., Kamada, S., Kanda, N., Bruns, G., Ueyama, H., and Kakunaga, T. (1991). Structure, chromosome location, and expression of the human smooth muscle (enteric type) gamma-actin gene: Evolution of six human actin genes. Mol. Cell. Biol. 11, 3296–3306.

Moren, B., Shah, C., Howes, M.T., Schieber, N.L., McMahon, H.T., Parton, R.G., Daumke, O., and Lundmark, R. (2012). EHD2 regulates caveolar dynamics via ATP-driven targeting and oligomerization. Mol. Biol. Cell 23, 1316–1329.

Muller, A.M., Hermanns, M.I., Skrzynski, C., Nesslinger, M., Muller, K.M., and Kirkpatrick, C.J. (2002). Expression of the endothelial markers PECAM-1, vWf, and CD34 in vivo and in vitro. Exp. Mol. Pathol. 72, 221–229.

Mura, M., Swain, R.K., Zhuang, X., Vorschmitt, H., Reynolds, G., Durant, S., Beesley, J.F., Herbert, J.M., Sheldon, H., Andre, M., et al. (2012). Identification and angiogenic role of the novel tumor endothelial marker CLEC14A. Oncogene 31, 293–305.

Murray, P.J., and Wynn, T.A. (2011). Protective and pathogenic functions of macrophage subsets. Nat. Rev. Immunol. 11, 723–737.

Nanda, N., Bao, M., Lin, H., Clauser, K., Komuves, L., Quertermous, T., Conley, P.B., Phillips, D.R., and Hart, M.J. (2005). Platelet endothelial aggregation receptor 1 (PEAR1), a novel epidermal growth factor repeat-containing transmembrane receptor, participates in platelet contact-induced activation. J. Biol. Chem. 280, 24680–24689.

Nolan, D.J., Ginsberg, M., Israely, E., Palikuqi, B., Poulos, M.G., James, D., Ding, B.S., Schachterle, W., Liu, Y., Rosenwaks, Z., et al. (2013). Molecular signatures of tissue-specific microvascular endothelial cell heterogeneity in organ maintenance and regeneration. Dev. Cell 26, 204–219.

Noy, P.J., Lodhia, P., Khan, K., Zhuang, X., Ward, D.G., Verissimo, A.R., Bacon, A., and Bicknell, R. (2015). Blocking CLEC14A-MMRN2 binding inhibits sprouting angiogenesis and tumour growth. Oncogene 34, 5821–5831.

Oliveira-Paula, G.H., Lacchini, R., and Tanus-Santos, J.E. (2016). Endothelial nitric oxide synthase: From biochemistry and gene structure to clinical implications of NOS3 polymorphisms. Gene 575, 584–599.

Pei-Ling Chiu, A., Wang, F., Lal, N., Wang, Y., Zhang, D., Hussein, B., Wan, A., Vlodavsky, I., and Rodrigues, B. (2014). Endothelial cells respond to hyperglycemia by increasing the LPL transporter GPIHBP1. Am. J. Physiol. Endocrinol. Metab. 306, E1274–E1283.

Pober, J.S., and Sessa, W.C. (2007). Evolving functions of endothelial cells in inflammation. Nat. Rev. Immunol. 7, 803–815.

Ponten, F., Jirstrom, K., and Uhlen, M. (2008). The Human Protein Atlas - a tool for pathology. J. Pathol. 216, 387–393.

Pusztaszeri, M.P., Seelentag, W., and Bosman, F.T. (2006). Immunohistochemical expression of endothelial markers CD31, CD34, von Willebrand factor, and Fli-1 in normal human tissues. J. Histochem. Cytochem. 54, 385–395.

Rensen, S.S., Doevendans, P.A., and van Eys, G.J. (2007). Regulation and characteristics of vascular smooth muscle cell phenotypic diversity. Neth. Heart J. 15, 100–108.

Rho, S.S., Choi, H.J., Min, J.K., Lee, H.W., Park, H., Park, H., Kim, Y.M., and Kwon, Y.G. (2011). Clec14a is specifically expressed in endothelial cells and mediates cell to cell adhesion. Biochem. Biophys. Res. Commun. 404, 103–108.

Satterthwaite, A.B., Burn, T.C., Le Beau, M.M., and Tenen, D.G. (1992). Structure of the gene encoding CD34, a human hematopoietic stem cell antigen. Genomics 12, 788–794.

Sawada, J., Urakami, T., Li, F., Urakami, A., Zhu, W., Fukuda, M., Li, D.Y., Ruoslahti, E., and Komatsu, M. (2012). Small GTPase R-Ras regulates integrity and functionality of tumor blood vessels. Cancer Cell 22, 235–249.

Schick, P.K., Walker, J., Profeta, B., Denisova, L., and Bennett, V. (1997). Synthesis and secretion of von Willebrand factor and fibronectin in megakaryocytes at different phases of maturation. Arterioscler. Thromb. Vasc. Biol. 17, 797–801.

Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W., and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature 473, 337–342.

Seaman, S., Stevens, J., Yang, M.Y., Logsdon, D., Graff-Cherry, C., and St Croix, B. (2007). Genes that distinguish physiological and pathological angiogenesis. Cancer Cell 11, 539–554.

Seano, G., Chiaverina, G., Gagliardi, P., di Blasio, L., Puliafito, A., Bouvard, C., Sessa, R., Tarone, G., Sorokin, L., Helley, D., et al. (2014). Endothelial podosome rosettes regulate vascular branching in tumor angiogenesis. Nat. Cell Biol. 16, 931–941.

Semenza, G.L. (2001). Regulation of hypoxia-induced angiogenesis: A chaperone escorts VEGF to the dance. J. Clin. Invest. 108, 39–40.

Shen-Orr, S.S., Tibshirani, R., Khatri, P., Bodian, D.L., Staedtler, F., Perry, N.M., Hastie, T., Sarwal, M.M., Davis, M.M., and Butte, A.J. (2010). Cell type-specific gene expression differences in complex tissues. Nat. Methods 7, 287–289.

Shih, S.C., Robinson, G.S., Perruzzi, C.A., Calvo, A., Desai, K., Green, J.E., Ali, I.U., Smith, L.E., and Senger, D.R. (2002). Molecular profiling of angiogenesis markers. Am. J. Pathol. 161, 35–41.

Spies, D., and Ciaudo, C. (2015). Dynamics in transcriptomics: Advancements in RNA-seq time course and downstream analysis. Comput. Struct. Biotechnol. J. 13, 469–477.


Please cite this article in press as: Butler et al., Analysis of Body-wide Unfractionated Tissue Data to Identify a Core Human Endothelial Transcriptome,Cell Systems (2016), http://dx.doi.org/10.1016/j.cels.2016.08.001

Steagall, R.J., Rusinol, A.E., Truong, Q.A., and Han, Z. (2006). HSPA12B is predominantly expressed in endothelial cells and required for angiogenesis. Arterioscler. Thromb. Vasc. Biol. 26, 2012–2018.

Steyers, C.M., 3rd, and Miller, F.J., Jr. (2014). Endothelial dysfunction in chronic inflammatory diseases. Int. J. Mol. Sci. 15, 11324–11349.

Stoeber, M., Stoeck, I.K., Hanni, C., Bleck, C.K.E., Balistreri, G., and Helenius, A. (2012). Oligomers of the ATPase EHD2 confine caveolae to the plasma membrane through association with actin. EMBO J. 31, 2350–2364.

Tabas, I., García-Cardeña, G., and Owens, G.K. (2015). Recent insights into the cellular biology of atherosclerosis. J. Cell Biol. 209, 13–22.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515.

Uhlen, M., Fagerberg, L., Hallstrom, B.M., Lindskog, C., Oksvold, P., Mardinoglu, A., Sivertsson, A., Kampf, C., Sjostedt, E., Asplund, A., et al. (2015). Proteomics. Tissue-based map of the human proteome. Science 347, 1260419.

Urich, E., Lazic, S.E., Molnos, J., Wells, I., and Freskgard, P.O. (2012). Transcriptional profiling of human brain endothelial cells reveals key properties crucial for predictive in vitro blood-brain barrier models. PLoS ONE 7, e38149.

van Beijnum, J.R., Dings, R.P., van der Linden, E., Zwaans, B.M., Ramaekers, F.C., Mayo, K.H., and Griffioen, A.W. (2006). Gene expression of tumor angiogenesis dissected: Specific targeting of colon cancer angiogenic vasculature. Blood 108, 2339–2348.

Varchetta, S., Brunetta, E., Roberto, A., Mikulak, J., Hudspeth, K.L., Mondelli, M.U., and Mavilio, D. (2012). Engagement of Siglec-7 receptor induces a pro-inflammatory response selectively in monocytes. PLoS ONE 7, e45821.

Vita, J.A. (2011). Endothelial function. Circulation 124, e906–e912.

Vogel, C., and Marcotte, E.M. (2012). Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232.

Vogel, C., Abreu, Rde.S., Ko, D., Le, S.Y., Shapiro, B.A., Burns, S.C., Sandhu, D., Boutz, D.R., Marcotte, E.M., and Penalva, L.O. (2010). Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol. Syst. Biol. 6, 400.

Wang, Y., and Navin, N.E. (2015). Advances and applications of single-cell sequencing technologies. Mol. Cell 58, 598–609.

Wang, Z., Wang, D.Z., Pipes, G.C., and Olson, E.N. (2003). Myocardin is a master regulator of smooth muscle gene expression. Proc. Natl. Acad. Sci. USA 100, 7129–7134.

Wu, W., Xiao, H., Laguna-Fernandez, A., Villarreal, G., Jr., Wang, K.C., Geary, G.G., Zhang, Y., Wang, W.C., Huang, H.D., Zhou, J., et al. (2011). Flow-dependent regulation of Kruppel-like factor 2 is mediated by MicroRNA-92a. Circulation 124, 633–641.

Yamawaki, K., Ito, M., Machida, H., Moriki, N., Okamoto, R., Isaka, N., Shimpo, H., Kohda, A., Okumura, K., Hartshorne, D.J., and Nakano, T. (2001). Identification of human CPI-17, an inhibitory phosphoprotein for myosin phosphatase. Biochem. Biophys. Res. Commun. 285, 1040–1045.

Yates, A., Akanni, W., Amode, M.R., Barrell, D., Billis, K., Carvalho-Silva, D., Cummins, C., Clapham, P., Fitzgerald, S., Gil, L., et al. (2016). Ensembl 2016. Nucleic Acids Res. 44, D710–D716.

Zanetta, L., Marcus, S.G., Vasile, J., Dobryansky, M., Cohen, H., Eng, K., Shamamian, P., and Mignatti, P. (2000). Expression of Von Willebrand factor, an endothelial cell marker, is up-regulated by angiogenesis factors: A potential method for objective assessment of tumor angiogenesis. Int. J. Cancer 85, 281–288.

Zhang, Y., Chen, K., Guo, L., and Wu, C. (2002). Characterization of PINCH-2, a new focal adhesion protein that regulates the PINCH-1-ILK interaction, cell spreading, and migration. J. Biol. Chem. 277, 38328–38338.

Zhou, L., and Zhu, D.Y. (2009). Neuronal nitric oxide synthase: Structure, subcellular localization, regulation, and clinical implications. Nitric Oxide: Biol. Chem. 20, 223–230.


STAR★METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
CLEC14A | Atlas Antibodies | HPA039468
vWF | Atlas Antibodies | Cat# HPA001815, RRID:AB_611880
CD34 | Atlas Antibodies | HPA036722
HSPA12B | Atlas Antibodies | Cat# HPA013659, RRID:AB_1234541
ESM1 | Sigma-Aldrich | Cat# HPA036660, RRID:AB_10670842
PTPRC | Atlas Antibodies | Cat# HPA000440, RRID:AB_611377
ITGA2B | Atlas Antibodies | Cat# HPA031168, RRID:AB_10664706
MYH11 | Atlas Antibodies | Cat# HPA014539, RRID:AB_1234906
EHD2 | Atlas Antibodies | HPA049890
LIMS2 | Atlas Antibodies | HPA058340
FAM110D | Atlas Antibodies | Cat# HPA013664, RRID:AB_1234332
KANK3 | Atlas Antibodies | HPA051153
GIPC3 | Atlas Antibodies | HPA061258
ENG | Leica Microsystems | Cat# NCL-CD105, RRID:AB_563482

Deposited Data
HUVEC sequencing data | This paper | ArrayExpress E-MTAB-4897
Human tissue sequencing data | Uhlen et al., 2015 | ArrayExpress E-MTAB-2836

Other
Human Protein Atlas resource | Ponten et al., 2008; Uhlen et al., 2015 | http://www.proteinatlas.org/

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author, Dr. Lynn Marie Butler ([email protected]).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human tissue transcript profiling was performed in house as part of the Human Protein Atlas (HPA) project (Ponten et al., 2008; Uhlen et al., 2015) (http://www.proteinatlas.org/). A total of 124 individual human tissue samples from 32 different organs (details in Table S1) were obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden, as part of the Uppsala Biobank. Samples were handled in accordance with Swedish laws and regulations, with approval and advisory reports from the Uppsala Ethical Review Board (Uhlen et al., 2015).

METHOD DETAILS

Human Tissue Preparation and Transcript Profiling

Tissue samples were embedded in optimal cutting temperature compound and stored at −80°C. Hematoxylin and eosin (HE)-stained frozen sections (4 μm) were prepared from each sample and examined by a pathologist to confirm sampling of representative normal tissue. Three sections per sample were homogenized using a 3 mm metal grinding ball (VWR), and total RNA was extracted using the RNeasy Mini Kit (QIAGEN) according to the manufacturer's instructions. Extracted RNA was analyzed using either an Experion automated electrophoresis system (Bio-Rad Laboratories) with the standard-sensitivity RNA chip or an Agilent 2100 Bioanalyzer system (Agilent Technologies) with the RNA 6000 Nano LabChip Kit. Only high-quality RNA (RNA integrity number ≥ 7.5) was used for library preparation (PolyA) and sequencing. Next-generation RNA sequencing was performed on Illumina HiSeq 2000 and HiSeq 2500 instruments using the standard Illumina RNA-seq protocol, with paired-end reads of 2 × 100 bp or 2 × 125 bp and on average 50M reads per library (range 13–84M reads). Processed reads were mapped to the human genome (GRCh37 and GRCh38) using TopHat v2.0.8b (Kim et al., 2013), allowing for two mismatches. Transcript abundance, expressed as FPKM (fragments per kilobase of exon model per million mapped reads), was calculated using Cufflinks v2.1.2 (Trapnell et al., 2010) with Ensembl build 75 (Flicek et al., 2014) or Ensembl build 83 (Yates et al., 2016); summarized gene-level FPKM values were used, without accounting for different isoforms in the analysis. The number of protein coding genes mapped was 20,344.
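The FPKM values above follow the standard definition: fragments assigned to a gene, scaled by the length of its exon model in kilobases and by the library size in millions of mapped fragments. The Python sketch below illustrates only that arithmetic. It does not reproduce Cufflinks, which estimates isoform abundances statistically using effective transcript lengths. The function names are hypothetical, and the gene-level helper assumes, as described above, that a gene is summarized as the sum of its isoform values.

```python
from typing import Dict


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    Hypothetical helper showing the textbook formula, not Cufflinks' estimator.
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions_mapped


def gene_fpkm(isoform_fpkm: Dict[str, float]) -> float:
    """Summarize a gene by summing its isoform FPKMs (isoforms not analyzed separately)."""
    return sum(isoform_fpkm.values())


# 500 fragments on a 2,000 bp exon model in a library of 50M mapped fragments
print(fpkm(500, 2_000, 50_000_000))  # 5.0
print(gene_fpkm({"isoform_a": 3.0, "isoform_b": 2.0}))  # 5.0
```

With the same fragment count, doubling the exon model length halves the FPKM. This length scaling is what makes values comparable between long and short genes.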

Estimation of Endothelial Cell Percentage in Selected Tissues

The percentage of EC was estimated in the fresh frozen tissue samples from bone marrow, pancreas, ovary, tonsil, salivary gland, appendix, spleen, thyroid gland, gall bladder, urinary bladder, heart muscle and lung (2–5 individual samples/organ; see Table S1) that were used for RNA extraction and mRNA sequencing. A cryosection was stained with hematoxylin-eosin (H&E), and a pathologist or a trained technician identified the constituent ECs by high-power microscopy using established morphological criteria (Malatesta, 2016), specifically elongated nuclei surrounding clear vascular spaces or slits. The mean percentage was estimated from at least four representative fields for each individual tissue sample.
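The per-sample figure above is an average over microscope fields. The following is a minimal sketch of that bookkeeping. It assumes each field has already been scored as a percentage of ECs among all cells; the scoring itself is a visual, morphological judgment and is not modeled. The function name and the validation rules are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence


def mean_ec_percentage(field_percentages: Sequence[float], min_fields: int = 4) -> float:
    """Mean EC percentage for one tissue sample from per-field estimates.

    Hypothetical helper: enforces the 'at least four representative fields'
    rule and rejects values outside 0-100%.
    """
    if len(field_percentages) < min_fields:
        raise ValueError(f"need at least {min_fields} fields, got {len(field_percentages)}")
    if any(not 0 <= p <= 100 for p in field_percentages):
        raise ValueError("field percentages must lie between 0 and 100")
    return float(mean(field_percentages))


print(mean_ec_percentage([10, 12, 8, 14]))  # 11.0
```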

Tissue Profiling: Human Tissue Sections

Tissue microarrays (TMA) were generated and stained as part of the HPA project, as previously described (Kampf et al., 2012; Ponten et al., 2008). Briefly, formalin-fixed, paraffin-embedded tissue samples were sectioned, deparaffinized in xylene, hydrated in graded alcohols and blocked for endogenous peroxidase in 0.3% hydrogen peroxide diluted in 95% ethanol. For antigen retrieval, a Decloaking Chamber (Biocare Medical, CA) was used, and slides were boiled in citrate buffer, pH 6 (Lab Vision, CA). Primary antibodies against CLEC14A (Atlas Antibodies HPA039468), VWF (Atlas Antibodies HPA001815), CD34 (Atlas Antibodies HPA036722), HSPA12B (Atlas Antibodies HPA013659), ENG (Novocastra NCL-CD105), ESM1 (Atlas Antibodies HPA036660), PTPRC (Atlas Antibodies HPA000440), ITGA2B (Atlas Antibodies HPA031168), MYH11 (Atlas Antibodies HPA014539), EHD2 (Atlas Antibodies HPA049890), LIMS2 (Atlas Antibodies HPA058340), FAM110D (Atlas Antibodies HPA013664), KANK3 (Atlas Antibodies HPA051153) or GIPC3 (Atlas Antibodies HPA061258), followed by a dextran polymer visualization system (UltraVision LP HRP polymer, Lab Vision), were each incubated for 30 min at room temperature, and slides were developed for 10 min using diaminobenzidine (Lab Vision) as the chromogen. Slides were counterstained in Mayer's hematoxylin (Histolab) and scanned using a ScanScope XT (Aperio).

Transcript Profiling: Isolated Human Endothelial Cells

Human umbilical vein endothelial cells (HUVEC) were isolated from the umbilical cords of four different donors, as previously described (Cooke et al., 1993). Cells were maintained in Medium 199 (M199, Invitrogen) containing 20% fetal calf serum, 28 μg/ml gentamicin, 2.5 μg/ml amphotericin B, 1 ng/ml epidermal growth factor and 1 μg/ml hydrocortisone (all from Sigma) for 48 hr prior to processing. HUVEC cultures isolated using this method were 96%–98% pure, as determined by positive flow cytometric staining for CD105, CD31 and vWF and by elevated expression of intercellular adhesion molecule 1 (ICAM-1) and E-selectin following stimulation with the inflammatory cytokine interleukin-1β. Total HUVEC RNA was isolated using the RNeasy Mini Kit with QIAshredder (QIAGEN) according to the manufacturer's instructions. The RNA integrity number was > 8.0 for all samples. RNA sequencing was performed using the standard Illumina RNA-seq protocol. FPKM (fragments per kilobase of exon model per million mapped reads) values were calculated using Cufflinks v2.1.2 (Trapnell et al., 2010) and Ensembl build 75 (Flicek et al., 2014). The number of protein coding genes mapped was 20,073. Normalized microarray gene expression datasets for human bladder microvascular EC (HBMEC; GSM72644), human iliac artery EC (HIAEC; GSM72657, GSM72658, GSM72659, GSM72660), human saphenous vein EC (HSVEC; GSM72683, GSM72683), human umbilical artery EC (HUAEC; GSM72686, GSM72687, GSM72688, GSM72689, GSM72690, GSM72691) and human uterine microvascular EC (HUtMEC; GSM72692, GSM72692) were derived from a public dataset of 61 different normal human cell cultures (GSE3239, GE Codelink Human Uniset) downloaded from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/).

QUANTIFICATION AND STATISTICAL ANALYSIS

Analysis of RNA-Seq Data to Determine Pan EC-Enriched Transcripts

Because EC are present in all human tissues, at differing levels, we used a correlation analysis method to identify EC-enriched gene transcripts from the whole-tissue RNA-seq data described above. We calculated the pairwise Spearman correlation coefficients between the EC reference transcripts C-type lectin domain family 14, member A (CLEC14A), von Willebrand factor (VWF) and CD34 (CD34) and the 20,073 mapped protein coding genes. A mean Spearman correlation coefficient of 0.5 or above between a 'test' transcript and the EC reference transcripts was considered a positive result. All statistical analyses were performed in R (version 3.1.1). Correlation values were calculated using the cor() function with method = "spearman" and use = "complete". Linear regression was performed using the lm() function with default parameters. Multiple comparison correction of p values was done with p.adjust() using both method = "fdr" and method = "bonferroni". We then measured the mean correlation coefficients between the identified EC-enriched transcripts and three selected SMC reference transcripts: Myosin, Heavy Chain 11, Smooth Muscle (MYH11); Myosin Light Chain Kinase (MYLK); and Actin, Alpha 2, Smooth Muscle, Aorta (ACTA2). Transcripts with a higher mean correlation with the SMC reference transcripts than with the EC reference transcripts were excluded.
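The screening rule above can be sketched in Python. This is an illustration under stated assumptions, not the authors' R code. Spearman's coefficient is computed as the Pearson correlation of average ranks, so tied values share their mean rank. Samples with a missing value for either gene are dropped, which mirrors use = "complete" for a single pair of vectors. `expr` is a hypothetical mapping from gene symbol to one expression value per tissue sample. A gene passes if its mean correlation with CLEC14A, VWF and CD34 is at least 0.5 and is not exceeded by its mean correlation with MYH11, MYLK and ACTA2.

```python
from math import sqrt
from typing import Dict, List, Optional, Sequence

EC_REFS = ("CLEC14A", "VWF", "CD34")
SMC_REFS = ("MYH11", "MYLK", "ACTA2")


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Spearman's rho over samples where both values are present (NaN if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return float("nan")
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant gene has no defined rank correlation
    return cov / (sx * sy)


def mean_corr(expr: Dict[str, Sequence[Optional[float]]], gene: str, refs: Sequence[str]) -> float:
    """Mean Spearman correlation of one gene with a set of reference genes."""
    return sum(spearman(expr[gene], expr[r]) for r in refs) / len(refs)


def ec_enriched(expr: Dict[str, Sequence[Optional[float]]], threshold: float = 0.5) -> List[str]:
    """Genes with mean EC-reference rho >= threshold and no stronger SMC-reference rho."""
    hits = []
    for gene in expr:
        ec = mean_corr(expr, gene, EC_REFS)
        if not ec >= threshold:  # also rejects NaN
            continue
        if mean_corr(expr, gene, SMC_REFS) > ec:
            continue  # tracks smooth muscle more closely than endothelium
        hits.append(gene)
    return hits


# Toy data: six hypothetical tissue samples; GENE_EC and GENE_SMC are made-up test genes
expr = {
    "CLEC14A": [1, 2, 3, 4, 5, 6],
    "VWF": [2, 3, 4, 5, 6, 7],
    "CD34": [1, 3, 2, 4, 6, 5],
    "MYH11": [6, 5, 4, 3, 2, 1],
    "MYLK": [5, 6, 4, 3, 1, 2],
    "ACTA2": [6, 4, 5, 2, 3, 1],
    "GENE_EC": [10, 20, 30, 40, 50, 60],
    "GENE_SMC": [60, 50, 40, 30, 20, 10],
}
print(ec_enriched(expr))  # ['CLEC14A', 'VWF', 'CD34', 'GENE_EC']
```

In this sketch the reference genes themselves pass trivially because each correlates perfectly with itself. A real screen would report them separately from newly identified candidates.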

Analysis of GTEx RNA-Seq Data as Replication

The gene RPKM and sample attributes tables of GTEx version 6 (dbGaP Accession phs000424.v6.p1) were downloaded from the GTEx portal (http://www.gtexportal.org/home/). Data from samples originating from the tissues listed in Table S2, tab 6 were extracted from the expression matrix, and pairwise Spearman correlation coefficients with the EC reference transcripts CLEC14A, VWF and CD34 were calculated on these data using the methods described above. A mean correlation coefficient ≥ 0.5 between an EC-enriched transcript identified in the HPA material and the EC reference transcripts in the GTEx material was considered a positive replication result. A summary of the complete selection protocol is shown in Figure S3D.
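The replication criterion can be expressed as a filter over the HPA-derived candidates. In this sketch, `gtex_mean_rho` is a hypothetical mapping from gene symbol to the mean Spearman coefficient with CLEC14A, VWF and CD34, computed on the GTEx samples as described above. Genes absent from GTEx, or with an undefined coefficient, are treated as not replicated; this is an assumption, not a documented rule. Spearman's coefficient uses only ranks, so it is unchanged by any strictly increasing transformation of a gene's values, such as a log transform.

```python
from math import isnan
from typing import Dict, Iterable, Set


def replicated(hpa_candidates: Iterable[str], gtex_mean_rho: Dict[str, float],
               threshold: float = 0.5) -> Set[str]:
    """HPA-identified EC-enriched genes whose GTEx mean rho with the EC references is >= threshold."""
    kept = set()
    for gene in hpa_candidates:
        rho = gtex_mean_rho.get(gene)
        if rho is None or isnan(rho):
            continue  # not measured (or undefined) in GTEx: treated as not replicated
        if rho >= threshold:
            kept.add(gene)
    return kept


print(sorted(replicated({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_A": 0.62, "GENE_B": 0.41})))  # ['GENE_A']
```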

Gene Ontology (GO) Enrichment Analysis

The Gene Ontology Consortium resources (Ashburner et al., 2000) and the PANTHER classification system (Mi et al., 2013; Mi et al., 2016) were used to identify overrepresented terms (biological processes) from the GO database (release date March 2016) in the final panel of identified EC-enriched transcripts.
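PANTHER performs the over-representation test itself. The sketch below only illustrates the general idea, under stated assumptions. It uses a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. This is one common choice but not necessarily the test PANTHER applied. The per-term p values are then adjusted with the Benjamini-Hochberg (FDR) and Bonferroni procedures, mirroring R's p.adjust(method = "fdr") and p.adjust(method = "bonferroni"). All names are hypothetical. For a given GO term, `population` is the number of reference genes and `annotated` is how many of them carry the term. `panel` is the size of the EC-enriched panel, and `hits` is how many panel genes carry the term.

```python
from math import comb
from typing import List, Sequence


def hypergeom_upper_tail(hits: int, population: int, annotated: int, panel: int) -> float:
    """P(X >= hits) when drawing `panel` genes from `population`, `annotated` of which carry the term."""
    top = min(annotated, panel)
    tail = sum(comb(annotated, i) * comb(population - annotated, panel - i)
               for i in range(max(hits, 0), top + 1))
    return tail / comb(population, panel)


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Multiply by the number of tests, capped at 1 (as in R's p.adjust 'bonferroni')."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Step-up FDR adjustment (as in R's p.adjust 'BH', alias 'fdr')."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# A term carried by 4 of 10 reference genes, seen in 2 genes of a 3-gene panel: P = 40/120
print(hypergeom_upper_tail(2, 10, 4, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Bonferroni controls the chance of any false positive term and is therefore stricter. Benjamini-Hochberg controls the expected fraction of false positives among the reported terms. Applying both, as described above, shows how robust each enriched term is.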

DATA AND SOFTWARE AVAILABILITY

HUVEC sequencing data have been deposited in ArrayExpress under accession number E-MTAB-4897.

ADDITIONAL RESOURCES

The Human Protein Atlas (HPA) website contains details of all sequencing data and antibody-based protein profiling used in this study: http://www.proteinatlas.org/
