Mixed-effects association of single cells identifies …...Email: [email protected] SCIENCE...

18
SINGLE CELL ANALYSIS Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works Mixed-effects association of single cells identifies an expanded effector CD4 + T cell subset in rheumatoid arthritis Chamith Y. Fonseka 1,2,3,4,5 *, Deepak A. Rao 2 *, Nikola C. Teslovich 2 , Ilya Korsunsky 1 , Susan K. Hannes 2 , Kamil Slowikowski 1,2,3,4,5 , Michael F. Gurish 2 , Laura T. Donlin 6,7,8 , James A. Lederer 1 , Michael E. Weinblatt 2 , Elena M. Massarotti 2 , Jonathan S. Coblyn 2 , Simon M. Helfgott 2 , Derrick J. Todd 2 , Vivian P. Bykerk 8,9 , Elizabeth W. Karlson 2 , Joerg Ermann 2 , Yvonne C. Lee 2,10 , Michael B. Brenner 2 , Soumya Raychaudhuri 1,2,3,5,11High-dimensional single-cell analyses have improved the ability to resolve complex mixtures of cells from human disease samples; however, identifying disease-associated cell types or cell states in patient samples remains challenging because of technical and interindividual variation. Here, we present mixed-effects modeling of associa- tions of single cells (MASC), a reverse single-cell association strategy for testing whether case-control status influences the membership of single cells in any of multiple cellular subsets while accounting for technical confounders and biological variation. Applying MASC to mass cytometry analyses of CD4 + T cells from the blood of rheumatoid arthritis (RA) patients and controls revealed a significantly expanded population of CD4 + T cells, identified as CD27 - HLA-DR + effector memory cells, in RA patients (odds ratio, 1.7; P = 1.1 × 10 -3 ). The frequency of CD27 - HLA-DR + cells was similarly elevated in blood samples from a second RA patient cohort, and CD27 - HLA-DR + cell frequency decreased in RA patients who responded to immunosuppressive therapy. Mass cytometry and flow cytometry analyses indicated that CD27 - HLA-DR + cells were associated with RA (meta-analysis P = 2.3 × 10 -4 ). Compared to peripheral blood, syn- ovial fluid and synovial tissue samples from RA patients contained about fivefold higher frequencies of CD27 - HLA-DR + cells, which comprised ~10% of synovial CD4 + T cells. CD27 - HLA-DR + cells expressed a distinctive effector memory transcriptomic program with T helper 1 (T H 1) and cytotoxicity-associated features and produced abundant interferon- g (IFN-g) and granzyme A protein upon stimulation. We propose that MASC is a broadly applicable method to identify disease-associated cell populations in high-dimensional single-cell data. INTRODUCTION The advance of single-cell technologies has enabled investigators to resolve cellular heterogeneity with unprecedented resolution. Single-cell assays have been particularly useful in the study of the immune system, in which diverse cell populations often consisting of rare and transitional cell states may play an important role (1). Application of single-cell transcriptomic and cytometric assays in a case-control study has the potential to reveal expanded pathogenic cell populations in immune- mediated diseases. Rheumatoid arthritis (RA) is a chronic, systemic disease affecting 0.5 to 1% of the adult population, making it one of the most common autoimmune disorders worldwide (2). RA is triggered by environmental and genetic risk factors, leading to activation of autoreactive T cells and B cells that mediate an autoimmune response directed at the joints (3, 4). CD4 + T cells have been strongly implicated in RA pathogenesis (5, 6). For one, the strongest genetic association to RA is with the HLA- DRB1 gene within the major histocompatibility complex (MHC); these polymorphisms affect the range of antigens that MHC II molecules can bind and present to activate CD4 + T cells (3, 7, 8). Furthermore, many RA risk alleles outside of the MHC locus also lie in pathways important for CD4 + T cell activation, differentiation into effector (T eff ) and reg- ulatory (T reg ) subsets, and maintenance of subset identity (4, 812). Defining the precise CD4 + T cell subsets that are expanded or dysregu- lated in RA patients is critical to deciphering pathogenesis. Such cell populations may be enriched in antigen-specific T cells and may aid in the discovery of dominant disease-associated autoantigens. In addition, these populations may directly carry out pathologic effector functions that can be targeted therapeutically (13). For many autoimmune diseases, directly assaying affected tissues is difficult because samples are only available through invasive proce- dures. Instead, querying peripheral blood for altered immune cell popu- lations is a rapidly scalable strategy that achieves larger sample sizes and allows for serial monitoring. Flow cytometric studies have identified alterations in specific circulating T cell subsets in RA patients, including an increased frequency of CD28 CD4 + T cells (1416); however, the expansion of CD28 T cells represents one of the relatively few T cell alterations that have been reproducibly detected by multiple groups. Limited reproducibility may be the consequence of differences in clinical cohorts, small sample sizes, methodologic variability, and use of limited idiosyncratic combinations of phenotypic markers (17). The advent of mass cytometry now allows for relatively broad assess- ment of circulating immune cell populations with >30 markers (18), 1 Center for Data Sciences, Brigham and Womens Hospital and Harvard Medical School, Boston, MA 02115, USA. 2 Division of Rheumatology, Immunology, and Allergy, Brigham and Womens Hospital and Harvard Medical School, Boston, MA 02115, USA. 3 Division of Genetics, Brigham and Womens Hospital and Harvard Medical School, Boston, MA 02115, USA. 4 Department of Biomedical Informatics, Harvard University, Cambridge, MA 02138, USA. 5 Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA. 6 Arthritis and Tissue Degeneration Program, Hospital for Special Sur- gery, New York, NY 10021, USA. 7 David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY 10021, USA. 8 Department of Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10021, USA. 9 Divi- sion of Rheumatology, Hospital for Special Surgery, 535 East 70th Street, New York, NY 10021, USA. 10 Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA. 11 Institute of Inflammation and Repair, University of Manchester, Manchester, UK. *These authors contributed equally to this work as co-first authors. Corresponding author. Email: [email protected] SCIENCE TRANSLATIONAL MEDICINE | RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018 1 of 17 by guest on October 2, 2020 http://stm.sciencemag.org/ Downloaded from

Transcript of Mixed-effects association of single cells identifies …...Email: [email protected] SCIENCE...

Page 1: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

S INGLE CELL ANALYS I S

1Center for Data Sciences, Brigham and Women’s Hospital and Harvard MedicalSchool, Boston, MA 02115, USA. 2Division of Rheumatology, Immunology, and Allergy,Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA.3Division of Genetics, Brigham and Women’s Hospital and Harvard Medical School,Boston, MA 02115, USA. 4Department of Biomedical Informatics, Harvard University,Cambridge, MA 02138, USA. 5Program in Medical and Population Genetics, BroadInstitute of Massachusetts Institute of Technology and Harvard, Cambridge, MA02142, USA. 6Arthritis and Tissue Degeneration Program, Hospital for Special Sur-gery, New York, NY 10021, USA. 7David Z. Rosensweig Genomics Research Center,Hospital for Special Surgery, New York, NY 10021, USA. 8Department of Medicine,Weill Cornell Medical College, Cornell University, New York, NY 10021, USA. 9Divi-sion of Rheumatology, Hospital for Special Surgery, 535 East 70th Street, New York,NY 10021, USA. 10Division of Rheumatology, Northwestern University FeinbergSchool of Medicine, Chicago, IL 60611, USA. 11Institute of Inflammation and Repair,University of Manchester, Manchester, UK.*These authors contributed equally to this work as co-first authors.†Corresponding author. Email: [email protected]

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

Copyright © 2018

The Authors, some

rights reserved;

exclusive licensee

American Association

for the Advancement

of Science. No claim

to original U.S.

Government Works

http://stm.sciencem

aD

ownloaded from

Mixed-effects association of single cells identifiesan expanded effector CD4+ T cell subset inrheumatoid arthritisChamith Y. Fonseka1,2,3,4,5*, Deepak A. Rao2*, Nikola C. Teslovich2, Ilya Korsunsky1,Susan K. Hannes2, Kamil Slowikowski1,2,3,4,5, Michael F. Gurish2, Laura T. Donlin6,7,8,James A. Lederer1, Michael E. Weinblatt2, Elena M. Massarotti2, Jonathan S. Coblyn2,Simon M. Helfgott2, Derrick J. Todd2, Vivian P. Bykerk8,9, Elizabeth W. Karlson2, Joerg Ermann2,Yvonne C. Lee2,10, Michael B. Brenner2, Soumya Raychaudhuri1,2,3,5,11†

High-dimensional single-cell analyses have improved the ability to resolve complex mixtures of cells from humandisease samples; however, identifying disease-associated cell types or cell states in patient samples remainschallenging because of technical and interindividual variation. Here, we present mixed-effects modeling of associa-tions of single cells (MASC), a reverse single-cell association strategy for testingwhether case-control status influencesthe membership of single cells in any of multiple cellular subsets while accounting for technical confounders andbiological variation. Applying MASC to mass cytometry analyses of CD4+ T cells from the blood of rheumatoidarthritis (RA) patients and controls revealed a significantly expanded population of CD4+ T cells, identified as CD27−

HLA-DR+ effector memory cells, in RA patients (odds ratio, 1.7; P = 1.1 × 10−3). The frequency of CD27− HLA-DR+ cellswas similarly elevated inblood samples froma secondRApatient cohort, andCD27−HLA-DR+ cell frequencydecreasedin RA patientswho responded to immunosuppressive therapy.Mass cytometry and flow cytometry analyses indicatedthat CD27− HLA-DR+ cells were associated with RA (meta-analysis P = 2.3 × 10−4). Compared to peripheral blood, syn-ovial fluid and synovial tissue samples fromRApatients contained about fivefoldhigher frequencies of CD27−HLA-DR+

cells, which comprised ~10% of synovial CD4+ T cells. CD27− HLA-DR+ cells expressed a distinctive effector memorytranscriptomic programwith T helper 1 (TH1)– and cytotoxicity-associated features and produced abundant interferon-g(IFN-g) and granzyme A protein upon stimulation. We propose that MASC is a broadly applicable method to identifydisease-associated cell populations in high-dimensional single-cell data.

g.or

by guest on O

ctober 2, 2020g/

INTRODUCTION

The advance of single-cell technologies has enabled investigators toresolve cellular heterogeneity with unprecedented resolution. Single-cellassays have been particularly useful in the study of the immune system,inwhich diverse cell populations often consisting of rare and transitionalcell states may play an important role (1). Application of single-celltranscriptomic and cytometric assays in a case-control study has thepotential to reveal expanded pathogenic cell populations in immune-mediated diseases.

Rheumatoid arthritis (RA) is a chronic, systemic disease affecting0.5 to 1% of the adult population, making it one of the most commonautoimmune disordersworldwide (2). RA is triggered by environmentaland genetic risk factors, leading to activation of autoreactive T cells andB cells that mediate an autoimmune response directed at the joints

(3, 4). CD4+ T cells have been strongly implicated in RA pathogenesis(5, 6). For one, the strongest genetic association to RA is with theHLA-DRB1 gene within the major histocompatibility complex (MHC); thesepolymorphisms affect the range of antigens that MHC II molecules canbind and present to activate CD4+ T cells (3, 7, 8). Furthermore, manyRA risk alleles outside of theMHC locus also lie in pathways importantfor CD4+ T cell activation, differentiation into effector (Teff) and reg-ulatory (Treg) subsets, and maintenance of subset identity (4, 8–12).Defining the precise CD4+ T cell subsets that are expanded or dysregu-lated in RA patients is critical to deciphering pathogenesis. Such cellpopulations may be enriched in antigen-specific T cells and may aid inthe discovery of dominant disease-associated autoantigens. In addition,these populations may directly carry out pathologic effector functionsthat can be targeted therapeutically (13).

For many autoimmune diseases, directly assaying affected tissues isdifficult because samples are only available through invasive proce-dures. Instead, querying peripheral blood for altered immune cell popu-lations is a rapidly scalable strategy that achieves larger sample sizes andallows for serial monitoring. Flow cytometric studies have identifiedalterations in specific circulating T cell subsets in RA patients, includingan increased frequency of CD28−CD4+ T cells (14–16); however, theexpansion of CD28− T cells represents one of the relatively few T cellalterations that have been reproducibly detected by multiple groups.Limited reproducibility may be the consequence of differences inclinical cohorts, small sample sizes, methodologic variability, and useof limited idiosyncratic combinations of phenotypic markers (17).

The advent ofmass cytometry nowallows for relatively broad assess-ment of circulating immune cell populations with >30 markers (18),

1 of 17

Page 2: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

enabling detailed, multiparametric char-acterization of lymphocyte subsets. Thistechnology provides the potential to de-fine and quantify lymphocyte subsets athigh resolution using multiple markers.Although the discovery of novel cellularpopulations has been enabled by rapidprogress in developing sensitive clusteringmethods (19–23), a key challenge that re-mains is establishing methods to identifycell populations associated with a disease.In particular, interindividual variation andtechnical variation can influence cell pop-ulation frequencies and need to be ac-counted for in an association framework.Single-cell association studies, with eithermass cytometry or single-cell RNA se-

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

quencing (RNA-seq), require statistical strategies robust to inter-individual donor variability and technical effects that can skew cellsubset estimates. For example, mass cytometry studies need to con-trol for variability in machine sensitivity, reagent staining, and sam-ple handling that can lead to batch effects. Interindividual differencescan lead to real shifts in cell population frequencies at baseline,whereas technical effects can lead to apparent shifts in cell popula-tion frequencies.

Here, we describe a robust statistical method to test for diseaseassociations with single-cell data calledMASC (mixed-effects modelingof associations of single cells), which tests, at the single-cell level, theassociationbetweenpopulation clusters anddisease status. It is a “reverse”association strategy where the case-control status is an independentvariable, rather than the dependent variable. We applied MASC toidentify T cell subsets associated with RA in a mass cytometry case-control immunophenotyping dataset that we generated focused onCD4+ T cells. This high-dimensional analysis enabled us to identifydisease-specific changes in canonical as well as noncanonical CD4+

T cell populations using a panel of 32 markers to reveal cell lineage,activation, and function (24, 25). Using MASC, we identified a popula-tion ofmemory CD4+ T cells, characterized as CD27− human leukocyteantigen (HLA)–DR+ which was expanded in the circulation of RA pa-tients. Further, we found that CD27−HLA-DR+ T cells were enrichedwithin inflamed RA joints, rapidly produced interferon-g (IFN-g) andcytolytic factors, and contracted with successful treatment of RA.

RESULTSStatistical and computational strategyWe acquired single-cell mass cytometry data from RA case and osteo-arthritis (OA) control peripheral blood samples (Fig. 1) and then ap-plied MASC after stringent quality controls to remove technicalartifacts and poorly stained cells that are typically observed in masscytometry data (Materials and Methods). First, we objectively definedsubsets of cells using an equal number of randomcells from each sampleso that they contributed equally to the subsequent analyses. We thenused DensVM (26) to cluster the mass cytometry data. Finally, we ap-plied MASC to identify differentially abundant cellular populationsassociated with disease.

MASC is a reverse association strategy that uses single-cell logisticmixed-effect modeling to test individual cellular populations for as-sociation by predicting the subset membership of each cell based on

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

fixed effects (e.g., sex) and random effects (e.g., batch and donor). Itassumes a nullmodel where the subsetmembership of each single cell isestimated by fixed and random effects without considering the case-control status of the samples. Thus, under this null framework,we assumethat variation in cluster frequencies is not associated with case-controlstatus. We then measured the improvement in model fit when a fixed-effect term for the case-control status of the sample was included with alikelihood ratio test. This framework allowed us to evaluate the signif-icance and effect size of the case-control association for each subsetwhile controlling for interindividual and technical variability.

To ensure that MASC had appropriately calibrated type 1 error, weran the analysis 10,000 times after permuting case-control labels usingthe dataset described below to obviate almost any case-control associa-tion that might be present. We then recorded the reported P values foreach cluster. Under ideal circumstances, 5% of these 10,000 trials wouldachieve a P value of <0.05 purely by chance in this randomdataset; con-versely, an inflated method would have much greater than 5% of trialsobtaining a P value of <0.05. This approach demonstrated that MASChas only a modestly inflated type 1 error rate for the 19 clusters in thedataset, with 6.5% of trials obtaining P < 0.05 (fig. S1A). We found thatincluding both donor and batch random effects was critical; elimi-nating random effects in the model led to highly inflated P values,with 66.1% of trials obtaining P < 0.05 (fig. S1B). As an alternativeand frequently used strategy, we also tested a simple binomial test forcase-control association. This approach is limited in our view becauseit fails tomodel donor-specific and technical effects. Unsurprisingly, wefound that this commonly used approach produced highly inflatedresults in comparison to MASC, with 65.7% of trials obtaining P <0.05 (fig. S1C).

Experimental strategyWeappliedMASC tomass cytometric analysis ofmemory CD4+T cellsthat were magnetically isolated from peripheral blood mononuclearcells (PBMCs) from patients with established RA (cases) and non-inflammatory OA controls (Table 1). We either (i) rested the cells for24 hours or (ii) stimulated them with anti-CD3/anti-CD28 beads for24 hours before analyzing cells with a 32-marker mass cytometry panelthat included 22 markers of lineage, activation, and function (table S1).This experimental design allowed us to interrogate immune statesacross cases and controls and also capture stimulation-dependentchanges. After stringent quality control measures, we analyzed a totalof 26 RA cases and 26 controls. We randomly sampled 1000 cells from

Fig. 1. MASC overview. Single-cell transcriptomics or proteomics are used to assay samples from cases andcontrols, such as immunoprofiling of peripheral blood. The data are then clustered to define populations of similarcells. Mixed-effects logistic regression is used to predict individual cell membership in previously defined popula-tions. The addition of a case-control term to the regression model allows the user to identify populations for whichcase-control status is significantly associated.

2 of 17

Page 3: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

#2 #9

54 5

F F

8 7

9 N

Ye N

S C I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

each of the 52 samples so that each sample contributed equally to theanalysis, preventing samples that happened to havemore cells capturedby themass cytometry assay from being overrepresented.We projectedresting and stimulated T cell data separately using the Barnes-Hutmod-ification to the t-SNE (t-distributed stochastic neighbor embedding)algorithm (27) so that all cells from all samples were projected intothe same two dimensions using all markers in the panel, with the excep-tion of CD4 andCD45RO, which we used to gate CD4+memory T cellsfor analysis.

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

Clustering approachProjecting the data into t-SNE space revealed areas of local density thatconsisted predominantly of cells fromRA orOA samples (fig. S2).Wewanted to identify CD4+memory T cell populations in an unsupervisedmanner. Currently, clustering high-dimensional single-cell data (suchas mass cytometry or single-cell RNA-seq data) is an active area of re-search, and there is no consensus on the best clustering strategy. Hence,we objectively considered multiple clustering algorithm options. Weevaluated DensVM (26), FlowSOM (28), and PhenoGraph (23), which

Table 1. Clinical characteristics of patient samples used. Mean ± SD is shown. Parentheses indicate percentages. “Other biologics” includes rituximab,tofacitinib, and abatacept. ACPA, anticitrullinated protein antibody; CDAI, clinical disease activity index; CRP, C-reactive protein; NA, not applicable; TNF, tumornecrosis factor.

Mass cytometry cohort

Flow cytometry cohort

Controls

Cases Controls Cases

Blood cross-sectionalcohorts

Number

26 26 27 39

Age

57 ± 15 66 ± 9 61 ± 14 58 ± 14

Female

19 (73) 20 (77) 18 (64) 30 (77)

ACPA- or RF-positive

NA 22 (85) NA 39 (100)

CRP (mg/liter)

NA 8.6 ± 16.9 NA 9.8 ± 17.9

CDAI

NA 9.3 ± 4.4 NA 13.7 ± 7.4

Methotrexate

0 18 (69) 0 18 (46)

Anti-TNF

0 10 (38) 0 16 (41)

Other biologics

0 5 (19) 0 9 (23)

Longitudinalcohort

3

Blood longitudinal cohort

Number 18

Age

49 ± 17

Female

17 (94)

ACPA- or RF-positive

18 (100)

CDAI before

17.6 ± 9.3

CDAI after

6.3 ± 4.2

Started methotrexate

7

Started anti-TNF

4

Started other biologic

7

Patient

#1 #3 #4 #5 #6 #7 #8

Synovial tissue donors

Age 57 76 46 46 79 62 63 2

Sex

F F F F F M M

CRP (mg/liter)

25 8 11 17 19 13 66 6

CDAI

14 17 15 21 25 5 9 A

Methotrexate

No s No No No No No Yes o

Biologic therapy

Yes s Yes Yes Yes No No No o Ye N

of 17

Page 4: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

Dow

nl

were identified as among the best performing in a recent benchmarkingcomparison of clusteringmethods for high-dimensional cytometry data(29). TheDensVMmethoduses a t-SNEprojection of the dataset to firstestimate the number of clusters by searching for local densities on theprojection with varying bandwidths before classifying cells based on thesimilarity of expression. PhenoGraph works by creating a graph repre-senting phenotypic similarities between cells and identifying clustersusing Louvain community detection. The FlowSOM algorithm buildsa self-organizing map with a minimum spanning tree to detect popula-tions and then classifies cells in a meta-clustering step using consensushierarchical clustering. We clustered cells with all three methods usingthe same selection ofmarkers that was used to create t-SNE projections.

After running all three algorithms on the dataset, we identified 19,19, and 21 clusters for DensVM, FlowSOM, and PhenoGraph, respec-tively. An ideal clustering algorithm would define clusters that aredistinct from each other in terms ofmarker intensities. However, clusterintensity differences should not be driven by batch effects; that is,clusters should not be disproportionately constituted by cells fromany individual batch.

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

To quantify the extent to which clustering approaches definedclusters with distinct marker intensities, we used a marker informative-ness metric (MIM) (Materials and Methods). All three methods weresimilar in their ability to generate clusters with distinct marker intensities(fig. S3A). Then, to determine whether those intensity differences weredependent on batch differences, we used a clustering informative-ness metric (CIM)–based metric to assess whether clusters were dis-proportionately represented by individual batches (Materials andMethods). Here, we observed that PhenoGraph and FlowSOM weremuchmore sensitive to batch effects than clusters identified by DensVM(P < 2 × 10−3, Wilcoxon rank-sum test; fig. S3B). Consistent with thisquantitative assessment, we observed that, in our data, FlowSOM andPhenoGraph produced clusters that were constituted exclusively of ordominated by cells from a specific batch. Thus, we chose to analyzeclusters identified by the DensVM algorithm going forward.

Landscape of CD4+ memory T cell subsetsWeobserved substantial diversity among restingCD4+memoryT cells inboth cases and controls, consistent with previous reports demonstrating

by guest on October 2, 2020

http://stm.sciencem

ag.org/oaded from

ffff

Density

aCD62L CXCR3 CD25 FoxP3 CD27 HLA-DR

IL-2 IFN-γ IL-17A FoxP3 CD27 HLA-DR

Low

Expression

T

TH1

HLA-DR

T

TH1

A

C

FE

High

24513136810129141615711181917

CD

4

CD

45R

O

CD

3

CD

8a

CD

62L

CD

28

CD

27

CD

38

CD

40L

CD

25

CC

R5

CX

CR

3

CX

CR

5

Fox

P3

IFN

g

HLA

.DR

PD

.1

TN

Fa

Per

forin

IL17

A

IL2

IL5

24513136810129141615711181917

0

1

2

3

4

0

1

2

3

4

Ave

rage

exp

ress

ion

Ave

rage

exp

ress

ion

CD

4

CD

45R

O

CD

3

CD

8a

CD

62L

CD

28

CD

27

CD

38

CD

40L

CD

25

CC

R5

CX

CR

3

CX

CR

5

Fox

P3

IFN

g

HLA

.DR

PD

.1

TN

Fa

Per

for in

IL17

A

IL2

IL5

710111469131516191718202145123812

0

1

2

3

4

5

D

B

t-SNE1

t-S

NE

2

t-SNE1

t-S

NE

2

t-SNE1

t-S

NE

2

t-SNE1

t-S

NE

2

Density

reg

CD27+

+

HLA-DRCD27

reg

Fig. 2. Diversity of CD4+ memory T cells before and after stimulation. (A) t-SNE projection of 50,000 resting CD4+ memory T cells sampled equally from RA patients(n = 24) and controls (n = 26). DensVM identified 19 populations in this dataset. (B) Same t-SNE projections as in (A) colored by the density of cells on the SNE plot or theexpression of the markers labeled above each panel. (C) t-SNE projection of 52,000 CD4+ stimulated memory T cells sampled equally from RA patients (n = 26) andcontrols (n = 26). Cells were stimulated for 24 hours with anti-CD3/anti-CD28 beads. (D) Same t-SNE projections as in (C) colored by the density of cells on the SNE plotor the expression of the markers labeled above each panel. (E) Heatmap showing mean expression of indicated markers across the 19 populations found in resting cells.(F) Heatmap showing mean expression of indicated markers across the 21 populations found after stimulation. Protein expression data are shown after arcsinhtransformation. All markers but CD4 and CD45RO were used to create t-SNE projections and perform clustering.

4 of 17

Page 5: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

a breadth of phenotypes in CD8+ T cells and CD4+ T cells (30–32).We identified 19 distinct subsets in resting (R) memory CD4+ T cells(Fig. 2A and figs. S4Aand S5). CentralmemoryT cells (TCM) segregatedfrom effector memory T cells (TEM) by the expression of CD62L (Fig.2B). Five subsets (subsets 1 to 5) of TCM, all expressing CD62L, variedin expression of CD27 and CD38, highlighting the heterogeneity withinthe TCM compartment (Fig. 2E).We identified twoThelper 1 (TH1) cellsubsets (subsets 8 and 12) and two Treg subsets (subsets 7 and 11). BothTreg subsets expressed high levels of CD25 and FoxP3, and subset 11also expressed HLA-DR, reflecting a known diversity among Treg pop-ulations in humans (33).

When applied to the stimulated CD4+ T cell data in a separate anal-ysis, DensVM identified 21 subsets among all case and control samples(Fig. 2C and figs. S4B and S5). As expected, certain activation markers,such as CD25 and CD40L, were broadly expressed across most subsetsafter stimulation (Fig. 2D).Mass cytometry robustly detected cytokineproduction from stimulated CD4+ memory T cells (Fig. 2, D and F).Activated effector cells identified by the production of interleukin-2(IL-2) were separated into three groups (subsets 1 to 3) according tothe relative expression of TNF. Cytokine expression after stimulationalso improved the ability to resolve certain CD4+ Teff subsets, such asTH17 cells (subset 15) and TH1 cells (subset 12). However, we were un-able to resolve the Treg subtypes that were observed in the resting T cells;after stimulation, all CD25+ FoxP3+ cells were grouped together (subset21). The set of cells that did not activate after stimulation (subsets 16, 17,and 19) were easily identified by the lack of expression of activationmarkers, such as CD25, CD40L, and cytokines, and the retention ofhigh CD3 expression.

Identifying populations that are enriched or depleted inRA samplesWe next sought to identify subsets that were significantly overrepre-sented or underrepresented in patient cells. The frequency of RA cellsin each subset ranged from 36.7 to 63.6% in the resting state and from38.9 to 65.7% in the stimulated state (table S2 and fig. S6A). Visualizingthe density of the t-SNE projections revealed that related cells clusteredinto dense groups both at rest and after stimulation (Fig. 2, B and D),and plotting t-SNE projections of cells from cases and controls sepa-rately while coloring by density suggested differential abundance ofRA cells among clusters (fig. S2). Accounting for subject- and batch-specific random effects withMASC (Materials andMethods), we ob-served three populations with significantly altered proportions of cellsfrom cases in the resting T cell data. We observed enrichment in subset18 (P = 5.9 × 10−4; Table 2 and Fig. 3A); this subset consisted of 3.1%of total cells from cases compared to 1.7% of cells from controls and

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

achieved an odds ratio (OR) of 1.9 [95% confidence interval (CI), 1.3to 2.7]. Conversely, subset 7 (P = 8.8 × 10−4; OR, 0.6; 95% CI, 0.5 to0.8) and subset 12 (P = 2.0 × 10−3; OR, 0.5; 95% CI, 0.4 to 0.8) wereunderrepresented for RA cells.

To confirm the robustness of this analytical model, we conducted astringent permutation test that was robust to potential batch effects. Tocontrol for batch, we reassigned case-control labels randomly to eachsample, retaining the same number of cases and controls in each batch.For each T cell subset, we recorded the number of case cells after ran-domization to define a null distribution. In the real data, we observedthat 753 of the 1184 cells in subset 18 were from case samples. In con-trast, when we permuted the data, we observed a mean of 550 cells andan SD of 63 cells from case samples. In only 93 of 100,000 permutationsdid we observemore than 753 cells from case samples in the random-ized data (permutation P = 9.4 × 10−4; table S2). We applied permuta-tion testing to every subset and found that the P values produced bypermutationwere similar to those derived from themixed-effectsmodelframework (Spearman’s r = 0.86; fig. S6, B and C).Whenwe consideredthe subsets identified as significant by MASC, both subsets 18 (permu-tation P = 9.4 × 10−4) and 7 (permutation P = 1.8 × 10−4) retainedsignificance, whereas subset 12 demonstrated only nominal evidence ofcase-control association (permutation P = 1.3 × 10−2).

Next, for each resting T cell subset, we identified correspondingsubsets in the stimulated T cell dataset using a cluster centroid align-ment strategy to calculate the distance between subsets across datasets(Materials and Methods). Subset 18 in the resting dataset was mostsimilar to subset 18 in the stimulated dataset, whereas subset 7 in theresting dataset wasmost similar to subset 21 in the stimulated dataset(fig. S6, D and E). Applying MASC, we observed case-control asso-ciation for subset 18 in the stimulated data (P = 1.2 × 10−3; OR, 1.7;95%CI, 1.2 to 2.2), whereas subset 21 (P= 0.55) was not significant inthe stimulated data (Table 2 and Fig. 3B). We identified one additionalpopulation subset that was significant in the stimulated dataset: subset20 (P = 1.7 × 10−3; OR, 1.7; 95% CI, 1.2 to 2.3) (table S3).

We wanted to ensure that the results we observed were not anartifact of using DensVM to cluster the cytometry data. When we usedPhenoGraph (23) and FlowSOM (28) to cluster the same mass cytom-etry dataset, we observed a CD27−, HLA-DR+ cluster with eithermethod (fig. S7, A and C) with association to RA by MASC (fig. S7,B and D). We also assessed how analysis byMASC compared to Citrus(34), an algorithm that uses hierarchical clustering to define cellular sub-sets and then builds a set of models using the clusters to stratify casesand controls. When applied to the same case-control resting dataset,Citrus was unable to producemodels with an acceptable cross-validationerror rate, regardless of the method used (fig. S8).

Table 2. Overview of the subsets found to be significantly expanded in RA. RA proportion reflects the fraction of cells in the subset that were from RAdonors. The 95% CI is shown next to the OR.

Condition

Description Subset RA proportion P OR Test

Resting

HLA-DR+, CD27− 18 0.636 5.9 × 10−4 1.9 (1.3–2.7) MASC

Stimulated

HLA-DR+, CD27− 18 0.619 1.3 × 10−3 1.7 (1.2–2.2) MASC

Flow cytometry replication

GatedHLA-DR+, CD27−

NA

NA 4.4 × 10−2 NA One-tailed t test

Meta-analysis

HLA-DR+, CD27− NA NA 2.3 × 10−4 NA Stouffer’s Z-score method

5 of 17

Page 6: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

CD27− HLA-DR+ CD4+ TEM expansion in RAAfter noting that subset 18 demonstrated robust association with RA,we interrogated the key features of this population. The lack of CD62Lexpression in subset 18 indicated that this subset was a TEM population.To define the markers that best differentiated this subset from othercells, we calculated the MIM for the expression of each marker in thesubset for both the resting and stimulated datasets (Materials andMethods and Fig. 4A). The expression of HLA-DR and perforin wasnotably increased in subset 18 compared to all other cells, whereasthe expression of CD27 was decreased in this subset (Fig. 4B). Weobserved that gating on CD27 and HLA-DR largely recapitulatedsubset 18 in both the resting cells and the stimulated cells (Fig. 4,C and D), with an F measure of 0.8 and 0.7, respectively (Materialsand Methods). Although HLA-DR is known to be expressed on T cellsin response to activation, it takes several days to induce strongHLA-DRexpression (35). Thus, it is likely that cells in subset 18 expressedHLA-DRbefore stimulation such that analyses of both resting and stim-ulated cells identified the same HLA-DR+ CD27− TEM population.

To confirm the expansion of theCD27−HLA-DR+T cell populationin RA patients that we observed by mass cytometry, we evaluated thefrequency of CD27− HLA-DR+ T cells in an independent cohort of39 seropositive RA patients and 27 noninflammatory controls withOAusing conventional flow cytometry (Table 1).We determined thepercentage of memory CD4+ T cells with a CD27− HLA-DR+ pheno-type by gating individual samples from each group (Fig. 3C and figs.S9A and S10). Consistent with the mass cytometry analysis, CD27−

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

HLA-DR+ cells were significantly expandedin theRApatient samples (P= 0.044, one-tailed t test; P > 0.52, Shapiro-Wilk nor-mality test; Fig. 3D). The frequency of thissubset was 0.8% in controls and 1.7% inRA samples, which was similar to thetwofold enrichment we observed in themass cytometry data.We then consideredthat the mass cytometry and flow cytom-etry association results together in ameta-analysis, confirming that CD27−HLA-DR+

cells significantly associated with RA (P =2.3 × 10−4, Stouffer’s Z-score method;Table 2).

To assess the effect of RA treatment onCD27−HLA-DR+ cell frequency,wequan-tified CD27−HLA-DR+ cell frequencies in23 RA patients before and 3 months afterinitiation of a new medication for RA.We dichotomized patients as those whoexperienced a clinical response, defined asa reduction in CDAI (36) scores (DCDAI−),versus those who did not, defined as anincrease in CDAI scores (DCDAI+). Weobserved that changes in CD27−HLA-DR+

cell frequency tracked with clinical re-sponse to treatment initiation (P = 3.49 ×10−4, Wilcoxon signed-rank test; Fig. 5A).Specifically, in the 18 DCDAI− patients,the frequency of CD27- HLA-DR+ cellssignificantly reduced by 0.7-fold (P =0.006, Wilcoxon signed-rank test; fig.S11B); in contrast, the 5 DCDAI+ patients

in this same trial had a 1.8-fold increase in CD27− HLA-DR+ cells,although the increase was not statistically significant.We did observea significant difference in the CD27− HLA-DR+ frequency fold changebetween patients experiencing a CDAI reduction versus not (P =0.02, Wilcoxon rank-sum test; fig. S11A). The decrease in frequency ofCD27−HLA-DR+ cells in patients responding to treatment escalationwas accompanied by a slight increase in the frequency of CD27+

HLA-DR− cells: CD27+ HLA-DR− cells represented an average of86.2% of CD4+ memory T cells before treatment, and an averageof 87.9% of CD4+ memory T cells afterward (P = 0.009, Wilcoxonsigned-rank test; fig. S11C). These results confirm that CD27−

HLA-DR+CD4+T cells were expanded in the circulation of RApatientsand decreased with effective disease treatment. We also examined therelationship between the frequency of CD27−HLA-DR+ CD4+ T cellsand disease activity or therapeutic use, but we found no significantassociations (fig. S12).

To determine whether CD27− HLA-DR+ T cells were furtherenriched at the sites of inflammation in seropositive RA patients, weevaluated T cells in inflamed synovial tissue samples obtained at thetime of arthroplasty (Table 1) and in inflammatory synovial fluidsamples fromRApatients. In a set of nine synovial tissue samples withlymphocytic infiltrates observed by histology, the frequency of CD27−

HLA-DR+ cells was significantly increased fivefold (P < 0.01, Wilcoxonrank-sum test; median, 10.5% of memory CD4+ T cells) compared toblood (Fig. 5B). In two of the tissue samples, >20% of the memoryCD4+ T cells displayed this phenotype. CD27− HLA-DR+ cells were

A B

D

% C

D27

– H

LA

-DR

+ /C

D4+

Tm

em

0.0625

0.125

0.25

0.5

1

2

4

8

16

Control RA

P = 0.04

CD27 HLA-DR

1

2

3

4

5

6

7

8

9

10

11

12

1314

15

16

17

18

19

1

0.1

0.01

0.001

0.25 0.5 1 2 4Odds ratio

Resting

141321

17151 11125 19

32

784

6

16

10

9 2018

1

0.1

0.01

0.001

0.25 0.5 1 2 4Odds ratio

Pva

lue

Stimulated

Pva

lue

C

HLA

-DR

CD27

1.5

57.537.2

3.9

– +CD27 HLA-DR– +

Fig. 3. MASC identifies a population that is expanded in RA. (A and B) ORs and association P values werecalculated by MASC for each population identified the resting (A) and stimulated (B) datasets. The yellow dashedlines indicate the significance threshold after applying the Bonferroni correction for multiple testing. (C) Flow cytometrydot plot of gated memory CD4+ T cells from a single RA donor shows the gates used to identify CD27− HLA-DR+ memoryCD4+ T cells (blue quadrant). (D) Flow cytometric quantification of the percentage of CD27− HLA-DR+ cells among bloodmemory CD4+ T cells in an independent cohort of seropositive RA patients (n = 39) and controls (n = 27). Statisticalsignificance was assessed using a one-tailed t test after assessing normality with a Shapiro-Wilk test (P > 0.52).

6 of 17

Page 7: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

similarly expanded in synovial fluid samples from seropositive RApatients (P < 0.001, Wilcoxon rank-sum test; median, 8.9% of memoryCD4+ T cells; n = 8). Thus, CD27− HLA-DR+ T cells were enriched atthe primary sites of inflammation in RA patients.

Functional and transcriptional features ofCD27− HLA-DR+ CD4+ TEMTo evaluate the potential function of the CD27− HLA-DR+ cell subset,we performed RNA-seq on CD27− HLA-DR+ CD4+ T cells with othereffector populations to identify transcriptomic signatures for this subset

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

(fig. S9, B to D). We sorted and sequenced the following CD4+ T cellpopulations: naïve CD4+ T cells (TN), central memory CD4+ T cells(TCM), regulatory CD4

+ T cells (Treg), and all four CD27−/+ HLA-DR−/+

subsets—defined as DR+27+ Teff (DR+27+), DR+27− Teff (DR+27−),

DR−27+ Teff (DR−27+), and DR−27− Teff (DR

−27−)—from PBMCsin seven RA cases and six OA controls. In total, we generated 1.1 billionreads for 90 samples.We aligned reads with Kallisto (37) and appliedstringent quality controls to remove genes that lacked sufficient ex-pression (Materials and Methods), ultimately resulting in a set of15,234 genes for analysis.

CD8α

IL-17A

FoxP3

IL-5

IFN-γ

PD-1

CXCR5

CD38

TNF

CD28

IL-2

CXCR3

CD40L

CD3

CD25

CCR5

Perforin

CD62L

CD27

HLA-DR

0.0 0.2 0.4 0.6 0.8 1.0

Marker informativeness metric (MIM)

C

D

0.0

0.2

0.4

0.6

0.8

1.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

F-m

easu

re

1

0 2 4 6 8

1

0 1 2 3 4

1

0 1 2 3 4

1

0 1 2 3

1

0 1 2

Expression

Resting Stimulated1

0 2 4 6

1

0 1 2 3 4

1

0 1 2 3 4

1

0 1 2 3 4

1

0 1 2 3 4 5

Expression

Marker expression densityB

CCR5

Perforin

CD62L

CD27

HLA-DR

0.0

0.2

0.4

0.6

0.8

1.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Cluster

F-m

easu

re

Cluster

A

Fig. 4. CD27 and HLA-DR expression specifically mark the expanded population. (A) Plot of the Kullback-Liebler (KL) divergence for each marker comparing cluster18 to all other cells (gray) in both the resting dataset (red) and the stimulated dataset (blue). (B) Density plots showing expression of the five markers most differentbetween cluster 18 cells (resting, red; stimulated, blue) and all other cells in the same dataset (black line). (C) Left: t-SNE projection of clusters identified in restingdataset. Middle: Same t-SNE projection, with cells gated as CD27− HLA-DR+ colored in red. Right: F-measure scores were calculated for the overlap between gated cellsand each cluster in the resting dataset. (D) Left: t-SNE projection of clusters identified in stimulated dataset. Middle: Same t-SNE projection, with cells gated as CD27−

HLA-DR+ colored in red. Right: F-measure scores were calculated for the overlap between gated cells and each cluster in the stimulated dataset.

7 of 17

Page 8: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

We performed principal components analysis (PCA) on the expres-sion data (Fig. 6A). The first PC, capturing 4% of the total variation,separated cell types along a naïve to effector axis, with the CD27−

HLA-DR+ subset representing the extreme case with the highest PC1values and naïve T cells with the lowest PC1 values. We examinedthe genes with the highest PC1 loadings and found that PC1 was as-sociated negatively with CCR7 and positively with CXCR3 and CCR5.This finding was consistent with the elevated expression of CXCR3 andCCR5we observed in themass cytometry analyses of CD27−HLA-DR+

cells. We note that CXCR3 and CCR5 are both chemokine receptorsassociated with a TH1 phenotype. In addition, PRF1 (perforin), a cy-totoxic factor, was also strongly associated with PC1. The extremeposition of CD27− HLA-DR+ cells along the continuum of CD4+

T cells suggested a possible late or terminal effectormemory phenotype.To further explore this naïve to effector gradient, we used the geneloadings along PC1 to perform GSEA and identify the pathways thatwere most associated. Intriguingly, the naïve to CD27− HLA-DR+ axiswas strongly correlated with naïve versus effector and natural killer(NK) versus CD4+ T cell gene signatures (Fig. 6B).

The association of CXCR3 and CCR5 with PC1 prompted us toexamine the expression of TH1-associated genes in these cells. Theeffector memory cell populations showed increased expression ofmultiple TH1-associated genes, including IFNG (IFN-g) and TBX21(Tbet), compared to TN, Treg, and TCM. In addition, expression of theseTH1-associated genes was higher in CD27− HLA-DR+ compared toCD27+ HLA-DR− effector memory cells, which constituted most ofthe TEM pool (Fig. 6C). In contrast, cytokines characteristic of otherpolarized TH subsets (IL17A, IL4, IL21, and TGFB1) and transcriptionfactors associated with other TH subsets (RORC, GATA3, BCL6, andFOXP3) were not elevated in CD27−HLA-DR+ cells compared to othereffector populations (fig. S13A). We noted that the transcription factor

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

CIITA was also increased in CD27−HLA-DR+ cells (Fig. 6C). CIITAis well known for its role as a regulator of MHC class II; we observedthat targets of CIITA such as HLA genes were significantly overex-pressed in the CD27−HLA-DR+ population compared to other popula-tions (fig. S13B).

As the expression of PRF1 was positively associated with PC1, wewanted to assess the relationship between the expression of cytotoxicgene programs and the naïve to effector PC1 gradient.We used GSEAto query whether NK cell–associated genes were up-regulated in theCD27− HLA-DR+ population. Specifically, we ranked genes by theirdegree of differential expression, comparing expression in CD27−

HLA-DR+ cells versus that seen in all other cell types. For NK cell–associated genes, we used the previously defined geneset fromMSigDBthat defines genes up-regulated in NK cells versus CD4+ T cells. Genesthat were highly expressed inNK cells similarly showed high expressionin CD27− HLA-DR+ cells, whereas genes with low expression in NKcells showed similarly low expression in CD27−HLA-DR+ cells (tableS4). Consistent with the enrichment results, a set of genes characteris-tically associated with cytolytic cells, including PRF1, GZMB, GZMA,GNLY, and NKG7, were all increased in expression in CD27− HLA-DR+

cells compared to most other T cell populations analyzed (Fig. 6D).These results indicate that the CD27− HLA-DR+ cells expressed atranscriptomic signature characteristic of cytotoxic cells.

We also evaluated the production and expression of cytokines andcytolytic factors at the protein level by intracellular flow cytometry.We assessed production of effector molecules by cells after in vitrostimulation with phorbol 12-myristate 13-acetate (PMA) + ionomycin,a stimulation method that readily reveals T cell capacity for cytokineproduction. Consistent with RNA-seq analyses, CD27−HLA-DR+ cellsalsomore frequently expressed perforin (P=0.031, one-tailedWilcoxonsigned-rank test) and granzyme A (P = 0.015, one-tailed Wilcoxon

B

0

1

2

3

4

5

6

1/8 1/4 1/2 1 2 4 8Fold change after treatment

Bef

ore

tre

atm

ent

Response+ CDAI– CDAI

P = 3.49 × 10–4

n = 23

P < 5 × 10–2

*

**

**

* **

P < 5 × 10–5

P < 5 × 10–10

****

*

**

A

*

*

0.0625

0.125

0.25

0.5

1

2

4

8

16

32

Control b

lood

RA blo

od

RA synovia

l flu

id

RA synovia

l tiss

ue

% CD27– HLA-DR+ /CD4 + TMEM

% C

D27

– H

LA

-DR

+ /C

D4+

Tm

em

% C

D27

– H

LA

-DR

+ /C

D4+

Tm

em

Fig. 5. CD27− HLA-DR+ memory CD4+ T cells are expanded in the blood and joints of patients with active RA. (A) Flow cytometric quantification of the frequencyof CD27− HLA-DR+ memory CD4+ T cells in 18 RA patients before starting a new medication, plotted against change in cell frequency after 3 months of new therapy.Treatment significantly reduced CD27− HLA-DR+ cell frequency as determined by a Wilcoxon signed-rank test. (B) Flow cytometric quantification of the percentage ofmemory CD4+ T cells with a CD27− HLA-DR+ phenotype in cells from seropositive RA synovial fluid (n = 8) and synovial tissue (n = 9) compared to blood samples fromRA patients and controls. Blood sample data are the same as shown in Fig. 3D. Significance was assessed using one-tailed t test after determining normality with aShapiro-Wilk test (P > 0.52) and applying a Bonferroni correction for multiple testing.

8 of 17

Page 9: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

signed-rank test) than did CD27+ HLA-DR− cells, which constitutedmost of the memory CD4+ T cells. In addition, CD27− HLA-DR+ cellsproduced IFN-g at a much higher frequency (P = 0.009, one-tailedWilcoxon signed-rank test; Fig. 6E). Percent positivity for each markerwas determined by selecting an expression level threshold to deter-mine positive staining (fig. S14). Together, broad analyses of gene ex-pression and targeted measures of effector molecule production at the

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

protein level indicated that CD27−HLA-DR+T cells are a TH1-skewedT cell population capable of producing cytotoxic molecules.

DISCUSSIONMass cytometry has been successfully applied to decipher the heteroge-neity of human T cells in multiple settings, including the identification

0

10

20

30

40

% G

ran

zym

e A

0

1

2

3

4

5

% P

erfo

rin

0

25

50

75

100

% IF

N-γ

+P = 0.009 P = 0.031 P = 0.015

GZMK NKG7 PRF1

GNLY GZMA GZMB

0.0

2.5

5.0

0

2

4

6

8

0

2

4

6

0

3

6

4

6

8

0

2

4

6

8

Cell type

GSE22886 Natural killer cell enrichment

GSE11057 CD4+ effector memory cell enrichment

B

C

D

E

A

−80

0−

600

−40

0−

200

0S

core

P

−5

05

Val

ues

Edge value = −3.6

−60

0−

400

−20

00

Sco

re

P < 1 × 10

−5

05

Val

ues

Edge value = −3.8

HLA−DRA IFNG TBX21 TNF

CD27 CCR5 CIITA CXCR3

T Treg

T

0

2

4

6

8

1

2

3

4

2

3

4

5

6

0

1

2

3

0

2

4

6

−101234

2

4

6

0

2

4

6

8

Cell type

Log

(1 +

TP

M)

DR 27

DR 27

DR 27

DR 27

−0.2

−0.1

0.0

0.1

−0.1 0.0 0.1PC1 (14%)

PC

2 (6

%)

Cell type

TTreg

TDR 27DR 27DR 27DR 27

HLA-DRAHLA-DQB1HLA-DRB5PRF1HLA-D PA1HLA-DPB1

CCR7

SELLCD27

PC1 loading

5

−5

−2

−1

0

1

2Z score

HLA-HCXCR3

(adj P = 2.53 × 10 )–19 (adj P = 3.66 × 10 )–20

(adj P = 9.04 × 10 )–15

(adj P = 2.86 × 10 )–21

(adj P = 2.51 × 10 )–14 (adj P = 5.19 × 10 )–12(adj P = 6.52 × 10 )–10

(adj P = 4.52 × 10 )–8

(adj P = 1.08 × 10 )–7

(adj P = 1.42 × 10 )–8 (adj P = 1.64 × 10 )–8

(adj P = 2.38 × 10 )–15

(adj P = 7.93 × 10 )–26

(adj P = 2.32 × 10 )–18

2Lo

g (1

+ T

PM

)2

CMna

ïve

T Treg

T CMna

ïve

T Treg

T CMna

ïve

T Treg

T CMna

ïve− −− −+

+ ++

DR 27

DR 27

DR 27

DR 27

− −− −+

+ ++

DR 27

DR 27

DR 27

DR 27

− −− −+

+ ++

DR 27

DR 27

DR 27

DR 27

− −− −+

+ ++

++

TregT CM N

DR 27

DR 27

DR 27

DR 27− −

− −+

−−

+

+−

−−

+

+−

−−

+

+

+ ++

T NNT TregT CM

DR 27

DR 27

DR 27

DR 27− −

− −++ +

+T Tre

gT CM

DR 27

DR 27

DR 27

DR 27− −

− −++ +

+

–4 < 1 × 10–4

Tnaïve Tre

gT CM

DR 27

+

DR 27

++

DR 27−−

DR 27−−

CM

N

+

+

+

+

− −

Fig. 6. Transcriptomic characterization of CD27−HLA-DR+memory CD4+ T cells identified a TH1-skewed cytotoxic phenotype. RNA-seq characterization of CD27−

HLA-DR+ (DR+27−) cells and six related CD4+ T cell populations: TN, Treg, TCM, and three populations of TEM [CD27+ HLA-DR− (DR−27+), CD27+ HLA-DR+ (DR+27+), and CD27−HLA-DR−

(DR−27−)]. (A) PCA plot (top) and PC1 gene loadings (bottom) of 90 samples from the seven CD4+ T cell populations. Cells were colored on the PCA plot according to known celltype. Normal confidence ellipses at 1 SD were plotted for each cell type. The 300most positive and 300 most negative PC1 gene loadings for each cell type were averagedand plotted in the heatmap. Genes relevant to the CD27−HLA-DR+ populationwere labeled. (B) Gene set enrichment analysis (GSEA) was performed on all genes, ranked on theirPC1 loadings. Two significantly enriched gene sets, NK signature (GSE22886 NAIVE CD4 T CELL VS NKCELL DN) and TEM signature (GSE11057 NAIVE VS EFFMEMORY CD4 T CELL),are shown. (C) Distribution of log-scaled expression of six canonical TH1 genes: CCR5, CIITA, CXCR3, IFNG, TBX21 (Tbet), and TNF. Populations are ordered by PC1 loading values, withCD27− HLA-DR+ population highlighted in red. TPM, transcripts per million. (D) Distribution of log-scaled gene expression of six canonical cytotoxic genes: GNLY, GZMA, GZMB,GMZK, NKG7, and PRF1. Populations are ordered by PC1 loading values, with the CD27− HLA-DR+ population highlighted in red. Reported P values in (C) and (D) correspond to alinear model of gene expression against ordered cell type (as an ordinal variable), with P values adjusted for multiple testing by the Benjamini-Hochberg procedure. (E) Cytokineexpression determined by intracellular cytokine staining of peripheral effectormemory CD4+ T cells after in vitro stimulationwith PMA/ionomycin. The percentage of cells positivefor each stain is plotted for CD27+HLA-DR− andCD27−HLA-DR+ subsets. Eachdot represents a separate donor (n=12; 6 RApatients and 6 controls, except for the quantificationofgranyzme A and perforin, where n = 6; 3 RA patients and 3 controls). Statistical significance was assessed using a Wilcoxon signed-rank test.

9 of 17

Page 10: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

of disease-specific changes to circulating immune cell populations andmapping of developmental pathways (30, 38–41). It and other emergingsingle-cell strategies offer a promising avenue to characterize the T celland other immunological features of a wide range of diseases in humans.However, successful application of mass cytometry on a larger scale(>20 samples) to conduct association studies on clinical phenotypesin humans has been limited thus far (42–45), in part due to the limitedavailability of effective association testing strategies.

Although there are many approaches to cluster single-cell data(19–23, 26, 28), extension of these methods to perform case-controlassociation testing is not straightforward. The simplest strategy wouldbe to define subsets of interest by clustering the cells and then compar-ing frequencies of these subsets in cases and controls with a univariatetest such as a t test or a nonparametric Mann-Whitney test. Althoughcommonly used in flow cytometry analysis, this approach is markedlyunderpowered because it relies upon reducing single-cell data to poten-tially inaccurate per-sample subset frequencies.

In contrast, our methodology takes full advantage of single-cellmeasurements by using a “reverse association” framework to testwhether case-control status influences the membership of a given cellin a population—that is, each single cell was treated as a single event.However, these cells are not entirely statistically independent. Failing toaccount for dependencies, for example, by using a binomial test or notaccounting for randomeffects, can result in considerably inflated statisticsand irreproducible associations (fig. S1). Using amixed-effects logisticregressionmodel allowed us to account for covariance in single-cell datainduced by technical and biological factors that could confound associ-ation signals (Fig. 1 and fig. S1B), without inflated association tests.MASC also allows users to use technical covariates relevant to a single-cell measurement (for example, read depth per cell for single-cellRNA-seq or signal quality for mass cytometry) that might influencecluster membership for a single cell.

We note that Citrus, an association strategy to automatically highlightstatistical differences between experimental groups and identify pre-dictive populations in mass cytometry data (34), was unable to iden-tify the expanded CD27−HLA-DR+ population. We believe that ourmethodology compared favorably to Citrus because it incorporatedboth technical covariates (e.g., batch) and clinical covariates whenmodeling associations, a key feature when analyzing high-dimensionaldatasets of large disease cohorts. In addition, the agglomerative hierar-chical clustering framework in Citrus substantially increased the testingburden and limited power.

Although we found that clustering with DensVM outperformedFlowSOMand PhenoGraph on our data (fig. S3), wewant to emphasizethat many clustering methods have emerged to analyze both cytometryand single-cell RNA-seq data; clustering single-cell data remains anactive field of research (46, 47). There continues to be controversy asto the best strategies, most appropriate metrics, and the optimalparameter choices (46, 48–50). Available clustering approaches use avariety of methods, such as graph-based community detection, self-or-ganizing maps (28), and density-based clustering (20, 23, 51). Becauseneural networks and a subclass, autoencoders, have been proven to bepowerful general nonlinear models, we implemented an autoencoder-based clustering approach that we tested on our datasets. We foundthat this clustering method also had potential and should be furtherinvestigated in the future (fig. S15). Because different clustering strate-giesmay bemore appropriate for different datasets, we have implemen-tedMASC specifically to allow for different clustering options based onuser preference.

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

The CD27−HLA-DR+ T cells identified byMASC in blood samplesfromRApatients were further enriched in both synovial tissue and syn-ovial fluid of RA patients. The accumulation of these cells in chronicallyinflamedRA joints, combinedwith the lack of CD27, suggests that thesecells have been chronically activated. Loss of CD27 is characteristic ofT cells that have been repeatedly activated, for example, T cells that rec-ognize common restimulation antigens (52–54). The broader CD27−

CD4+ T cell population did not itself differ in frequency betweenRA patients and controls, in contrast to the expanded population ofCD27− HLA-DR+ cells identified by MASC. We hypothesize that theCD27− HLA-DR+ T cell population in RA patients may be enrichedin RA antigen-specific T cells, offering a potential tool to identify rele-vant antigens in RA.

Expression of HLA-DR suggests recent or continued activation ofthese cells in vivo. The well-known HLA class II association to RA mayalso suggest an important role in disease susceptibility for this subset.Despite the suspected chronic activation of these cells, CD27−HLA-DR+

cells did not appear functionally exhausted. CD27−HLA-DR+ cells rap-idly produced multiple effector cytokines upon stimulation in vitro,with an increased predisposition to IFN-g production. These cells alsoshowed increased expression of cytolyticmolecules such as granzymeAand perforin on both the transcriptomic and proteomic level.

CD4+ cytotoxic T cells expressing granzyme A and perforin, alsowith a CD27− phenotype, are reported to be expanded in patients withchronic viral infections (55). Of note, CD4+ cells that are perforin+ andgranzyme A+ have been observed in RA synovial samples (56, 57) andother chronic inflammatory conditions (58, 59). Cytolytic CD4+ T cellslack the capacity to provide B cell help, suggesting that this is a distinctpopulation from the expanded T peripheral helper cell population inseropositive RA (13, 57). These findings nominate CD27− HLA-DR+

T cells as a potential pathogenic T cell population that may participatein the chronic autoimmune response in RA.

Although we were able to detect significant associations between thefrequency of CD27−HLA-DR+ cells and clinical response (Fig. 5A), werecognize the need for a larger study to detect other subtle subpheno-typic associations, such as association with specific therapeutics or dis-ease activity (fig. S12). To this end, larger prospective cohorts with deepimmunophenotyping of CD4+ T cells in blood and tissue will be critical.

We note that this study has certain limitations. First, the degree ofexpansion of CD27− HLA-DR+ T cells in the periphery is modest andvaries between individuals. Although we were able to recovermost ofthe cells identified as expanded by MASC in our mass cytometryexperiments using CD27 and HLA-DR as markers, we note that themore fine-grained immunophenotypingmay help to refine phenotypesthat even more specifically pinpoint the disease-associated T cell pop-ulation. Although we have characterized these cells as effector memoryCD4+ T cells that produce molecules associated with cytotoxicity andTH1 phenotype, further characterization is needed in future studies. Forexample, we assigned an effector memory phenotype to the CD27−

HLA-DR+ cell subset based onmass cytometry data; however, the masscytometry panel did not specifically measure the expression of CD45RA.Although this would appear to leave open the possibility that these cellsmight belong to a TEMRA (CD45RO+ CD45RA+) subset, we note thatour sorting strategy for cells before assay by mass cytometry specificallyexcluded CD45RA+ cells and consider this possibility unlikely. In ad-dition, our study focused exclusively onCD45RO+memoryCD4+T cells;thus, we have not assessed other CD4+ T cell populations that may alsohave cytotoxic features, including T effectormemory CD45RA+ (TEMRA)cells (60). A broader cytometric assessment of T cell and other immune

10 of 17

Page 11: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

cell phenotypes in RA will be of interest in subsequent studies. Also, wedid observe that the mRNA and protein expression of specific markerswas not always concordant. Applyingmore recently developed single-celltechniques that ascertain protein andmRNAexpression simultaneouslywould better describe the dynamics of CD27− HLA-DR+ T cells in re-sponse to stimulation (61, 62).We anticipate thatMASCwill be appliedto test for case-control associated differential abundance acrossmultiplecell types in the future.

In summary, theMASC single-cell associationmodeling frameworkidentified a TH1-skewed cytotoxic effector memory CD4+ T cell popu-lation expanded inRAusing a case-controlmass cytometry dataset. TheMASCmethod is adaptable to any case-control experiment in whichsingle-cell data are available, including flow cytometry, mass cytometry,and single-cell RNA-seq datasets. Although current single-cell RNA-seqstudies are not yet large scale, ongoing projectsmay benefit fromusing theMASC framework in case-control testing or testing for other clinical sub-phenotypes such as specific treatment response or disease progression.

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

MATERIALS AND METHODSStudy designThe objective of this research study was to profile immune subsets inperipheral blood samples from RA patients and controls with OA toidentify disease-related alterations or changes in frequency amongCD4+ T cells. Human subject research was performed in accordancewith the Institutional Review Boards at Partners HealthCare andHospital for Special Surgery via approved protocols (PartnersHealthCareprotocol 2014P00255) with appropriate informed consent as required.Patients with RA fulfilled the American College of Rheumatology 2010Rheumatoid Arthritis classification criteria, and electronic medical re-cords were used to ascertain patients’ rheumatoid factor and anti-CCP(cyclic citrullinated peptide) antibody status, CRP level, andmedicationuse. Synovial tissue samples formass and flow cytometry were collectedfrom seropositive RA patients undergoing arthroplasty at the Hospitalfor Special Surgery, New York, or at Brigham and Women’s Hospital(BWH), Boston. Samples with lymphocytic infiltrates on histologywereselected for analysis. Sample inclusion criteria were established prospec-tively, with the exception of samples that were excluded from the studybecause of poor acquisition by the mass cytometer. The study samplesize was a result of including all samples that passed quality control andwas not set prospectively.

Synovial fluid sampleswereobtainedas excessmaterial froma separatecohort of patients undergoing diagnostic or therapeutic arthrocentesisof an inflammatory knee effusion as directed by the treating rheumatol-ogist. These samples were de-identified; therefore, additional clinicalinformation was not available.

Blood samples for clinical phenotyping were obtained from con-sented patients seen at BWH. We performed medical record reviewfor ACPA positivity according to CCP2 assays, CRP, and RA-specificmedications including methotrexate and biologic disease-modifyingantirheumatic drugs (DMARDs). For blood cell analyses in the cross-sectional cohort, the treating physician measured the CDAI on the dayof sample acquisition. For RA patients followed longitudinally, a newDMARD was initiated at the discretion of the treating rheumatologist,andCDAIswere determined at each visit by trained research study staff.Blood samples were acquired before initiation of a new biologicDMARDor within 1 week of starting methotrexate and 3 months after initiatingDMARD therapy (63). Concurrent prednisone at doses ≤10 mg/daywas permitted. All synovial fluid and blood samples were subjected to

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

density centrifugation using Ficoll-Hypaque to isolate mononuclearcells, which were cryopreserved for batched analyses.

Sample preparation for mass cytometryWe rapidly thawed cryopreserved PBMCs and isolated total CD4+

memory T cells by negative selection using MACS (magnetic-activatedcell sorting) magnetic bead separation technology (Miltenyi). Subse-quently, we rested the CD4+ memory T cells for 24 hours in completeRPMI (Gibco) sterile filtered and supplemented with 15% fetal bovineserum (FBS), 1% penicillin/streptomycin (Gibco), 0.5% essential andnonessential amino acids (Gibco), 1% sodium pyruvate (Gibco), 1%Hepes (Gibco), and 55 mM 2-mercaptoethanol (Gibco). We activatedthe cells using Human T-Activator CD3/CD28 Dynabeads (ThermoFisher) at a density of 1 bead:2 cells. At 6 hours before harvesting (t =18 hours of stimulation), we added monensin and brefeldin A (1:1000)(BD GolgiPlug and BD GolgiStop). After 24 hours of stimulation, weincubated the cells with a rhodium metallointercalator (Fluidigm) inculture at a final dilution of 1:500 for 15 min as a viability measure.We then harvested cells into fluorescence-activated cell sorting (FACS)tubes and washed with CyTOF staining buffer (CSB) composed ofphosphate-buffered saline (PBS) with 0.5% bovine serum albumin (BSA)(Sigma-Aldrich), 0.02% sodium azide (Sigma-Aldrich), and 2 mMEDTA(Ambion).We spun the cells at 500g for 7min at room temperature.Weincubated the resulting cell pellets in 10ml of Fc receptor binding inhibitorpolyclonal antibody (eBioscience) and40ml ofCSB for 10min at 4°C.Thesamples were then incubated for 30min at 4°C on a shaker rack with 1 mlof the following 18 CyTOF surface antibodies in a cocktail brought to avolume of 50 ml per sample in CSB: anti-human CD49D (9F10)–141Pr(Fluidigm), anti-humanCCR5 (CD195)(−P-6G4)–144Nd (Fluidigm),anti-human CD4 (RPA-T4)–145Nd (Fluidigm), anti-human CD8a(RPA-T8)–146Nd (Fluidigm), anti-human CD45RO-147Sm (BWHCyTOFCore), anti-humanCD28-148Nd (BWHCyTOFCore), anti-human CD25 IL-2R–(2A3)–149Sm (Fluidigm), anti-human PD1-151Eu (BWH CyTOF Core), anti-human CD62L (DREG-56)–153Eu(Fluidigm), anti-human CD3 (UCHT1)–154Sm (Fluidigm), anti-humanCD27 (L128)–155Gd (Fluidigm), anti-humanCD183[CXCR3](G025H7)-156Gd (Fluidigm), anti-human CCR7-170Er (BWH CyTOF Core),anti-human ICOS-160Gd (BWH CyTOF Core), anti-human CD38(HIT2)–167Er (Fluidigm), anti-human CD154 (CD40L) (24–31)–168Er(Fluidigm), anti-human CXCR5[CD185](51505)-171Yb (Fluidigm),and anti-human HLA-DR (L243)–174Yb (Fluidigm). We washed thecells with 1ml of CSB and spun at 700g for 5 min at room temperature.After spin, we aspirated the buffer frompellet and added 1ml of 1:4 ratioof concentrate to diluent of a Foxp3/Transcription Factor StainingBuffer Set (eBioscience) supplemented with formaldehyde solution(F1268, Sigma-Aldrich) to a final concentration of 1.6%. We incubatedthe cells at room temperature on a gentle shaker in the dark for 45 min.Wewashed the cells with 2ml of CSB+ 0.3% saponin (CSB-S) and spunat 800g for 5min.We incubated the cell pellet with 1 ml of the following14 intracellular antibodies and 10 ml of a solution of iridium (1:25) inCSB-S brought to a total volume of 100 ml for 35min at room tempera-ture on a gentle shaker: anti-human IL-4 (MP4-25D2)–142 (Fluidigm),anti-mouse/human IL-5 (TRFK5)–143Nd (Fluidigm), anti-human IL-22(22URTI)–150Nd (Fluidigm), anti-human TNFa (Mab11)–152Sm(Fluidigm), anti-human IL-2 (MQ1-17H12)–158Gd (Fluidigm), anti-human IL-21–159Tb (BWH CyTOF Core), anti-human IFN-g (B27)–165Ho (Fluidigm), anti-human GATA3-166Er (BWH CyTOF Core),anti-human IL-9–172Yb (BWHCyTOF Core), anti-human perforin(B-D48)–175Lu (Fluidigm), anti-human IL-10–176Yb (BWH CyTOF

11 of 17

Page 12: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

Core), anti-human IL-17A–169Tm (BWH CyTOF Core), anti-humanFoxp3 (PCH101)–162Dy (Fluidigm), and anti-human Tbet-164Dy(BWH CyTOF Core). After incubation, we washed the cells in PBSand spun them at 800g for 5 min. We resuspended the pellet in 1 mlof 4% formaldehyde prepared in CSB and incubated the cells for 10minat room temperature on a gentle shaker, followed by another PBS washand spin at 800g for 5 min. We washed the pellet in 1 ml of Milli-Qdeionized water, spun at 800g for 6 min, and subsequently resuspendedthe resulting pellet in deionized water at a concentration of 700,000cells/ml for analysis via the CyTOF 2.We transferred the suspensions tonew FACS tubes through a 70-mm cell strainer and added MaxPar EQFour Element Calibration Beads (Fluidigm) at a ratio of 1:10 by volumebefore acquisition.

Mass cytometry panel designWe designed an antibody panel for mass cytometry with the goal ofboth accurately identifying CD4+ TEM populations and measuringcellular heterogeneity within these populations. We chose markers thatfell into one of five categories to generate a broadly informative panel:chemokine receptors, transcription factors, lineage markers, effectormolecules, and markers of cellular activation and exhaustion (table S1).

Mass cytometry data acquisitionWe analyzed samples at a concentration of 700,000 cells/ml on aFluidigm-DVSCyTOF2mass cytometer.We addedMaxPar 4-ElementEQ calibration beads to every sample that was run on the CyTOF 2,which allowed us to normalize variability in detector sensitivity forsamples run in different batches using previously described methods(64). We used staining for iridium and rhodiummetallointercalators toidentify viable singlet events. We excluded samples where acquisitionfailed or only yielded a fraction of the input cells (<5%). The criteriafor sample exclusion were not set prospectively but were maintainedduring all data collection runs.

Because samples were processed and analyzed on different dates,we ran equal numbers of cases and controls each time to guard againstbatch effects. However, because CyTOF data are very sensitive to day-to-day variability, we took extra steps to preprocess and normalize dataacross the entire study.

Flow cytometry sample preparationFor flow cytometry analysis of the validation and longitudinal bloodcohorts and synovial samples, cryopreserved cells were thawed intowarm RPMI/10% FBS, washed once in cold PBS, and stained in PBS/1%BSAwith the following antibodies for 45min: anti-CD27–fluoresceinisothiocyanate (FITC) (TB01), anti-CXCR3–phycoerythrin (PE) (CEW33D),anti-CD4–PE-Cy7 (RPA-T4), anti-ICOS–peridinin chlorophyll protein(PerCP)–Cy5.5 (ISA-3), anti-CXCR5–BV421 (J252D4), anti-CD45RA–BV510 (HI100), anti–HLA-DR–BV605 (G46-6), anti-CD49d–BV711(9F10), anti–PD-1–allophycocyanin (APC) (EH12.2H7), anti-CD3–Alexa Fluor 700 (HIT3A), anti-CD29–APC-Cy7 (TS2/16), and pro-pidium iodide. Cells were washed in cold PBS and passed through a70-mm filter, and data were acquired on a BD FACSAria Fusion or BDFortessa using FACSDiva software. Sampleswere analyzed in uniformlyprocessed batches containing both cases and controls.

Flow cytometry intracellular cytokine stainingEffector memory CD4+ T cells were purified from cryopreservedPBMCs by magnetic negative selection (Miltenyi) and rested overnightin RPMI/10% FBS medium. The following day, cells were stimulated

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

with PMA (50 ng/ml) and ionomycin (1 mg/ml) for 6 hours. BrefeldinA and monensin (both 1:1000; eBioscience) were added for the last5 hours. Cells were washed twice in cold PBS, incubated for 30 minwith Fixable Viability Dye eFluor 780 (eBioscience), washed in PBS/1%BSA, and stained with anti-CD4–BV650 (RPA-T4), anti-CD27–BV510(TB01), anti–HLA-DR–BV605 (G46-6), anti-CD20–APC-Cy7 (2H7),and anti-CD14–APC-Cy7 (M5E2). Cells were then washed, fixed, andpermeabilized using the eBioscience Transcription Factor Fix/PermBuffer. Cells were then washed in PBS/1% BSA/0.3% saponin and in-cubated with anti–IFN-g–FITC (B27), anti-TNF–PerCP/Cy5.5 (mAb11),anti–IL-10–PE (JES3-9D7), and anti–IL-2–PE/Cy7 (MQ1-17H12) oranti–granzyme A–AF647 (CB9) and anti-perforin–PE/Cy7 (B-D48)for 30 min, washed once, and filtered, and data were acquired on aBD Fortessa analyzer. Gates were drawn to identify singlet T cells byforward scatter (FSC)/side scatter (SSC) characteristics, and dead cellsand any contaminating monocytes and B cells were excluded by gatingout eFluor 780–positive, CD20+, and CD14+ events.

Synovial tissue processingSynovial samples were acquired after removal as part of standard of careduring arthroplasty surgery. Synovial tissue was isolated by carefuldissection, minced, and digested with Liberase TL (100 mg/ml) and de-oxyribonuclease I (100 mg/ml) (both Roche) in RPMI (Life Technologies)for 15 min, inverting every 5 min. Cells were passed through a 70-mmcell strainer, washed, subjected to red blood cell lysis, and cryopreservedin CryoStor CS10 (BioLife Solutions) for batched analyses.

RNA library preparation and sequencingSeven RA case samples and six OA control samples were flow sortedinto the following seven cellular subsets for low-input RNA-seq: TCM,TN, Treg, DR

+27+, DR+27−, DR−27+, and DR−27−. We rapidly thawedthe case and control samples of cryopreserved PBMCs, and MACSenriched the samples forCD4+T cells (Miltenyi).We rested the samplesovernight in complete RPMI/10% FBS. After the rest, we prepared thesamples for FACS.Wewashed the cells once in cold PBS and incubatedthemwith eBioscience Human Fc Receptor Binding Inhibitor (ThermoFisher). Subsequently, we stained the samples in PBS/5%FBS for 45minwith the following antibodies: FITC CD27 (BioLegend), PE CD25(BioLegend), PE/Cy7 CD127(IL-7Ra) (BioLegend), Brilliant Violet510 HLA-DR (BioLegend), Brilliant Violet 605 CD45RA (BioLegend),APC CD62L (BioLegend), Alexa Fluor 700 CD4 (BioLegend), APC/Cy7CD14 (BioLegend), and APC/Cy7 CD19 (BioLegend). After staining, wewashed the cells in cold PBS, passed them through a 70-mm filter, andacquired the samples on a BD FACSAria Fusion cytometer. One thou-sand cells from each subset were sorted and collected into 5 ml of TCLlysis buffer (Qiagen) with 1% b-me. We processed and collected thesamples uniformly in batches containing both cases and controls andrandomized the samples within the plate. We prepared sequencinglibraries using the Smart-Seq2 protocol. Sequenced libraries were pooledand sequenced with the Illumina HiSeq 2500 using 25–base pair paired-end reads. We removed one outlier with low read depth. The remaininglibraries were sequenced to a depth of 6 million to 19 million reads.

Flow cytometry data analysisFlow cytometry data were analyzed using FlowJo 10.0.7 (TreeStar Inc.),with serial gates drawn to identify singlet lymphocytes by FSC/SSCcharacteristics. Viable memory CD4+ T cells were identified as propi-dium iodide–negative CD3+CD4+CD45RA– cells.We then calculated thefrequency of various populations from the pool of memory CD4+ T cells.

12 of 17

Page 13: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

Gene expression quantificationWe quantified complementary DNAs on canonical chromosomes(autosomal, X, Y, and mitochondrial) in Ensembl release 83 withKallisto v0.43.1 in TPM. This analysis quantified inferred counts andlength-normalized expression (TPM) of transcripts. To quantify geneexpression, we collapsed transcripts mapping to the same HGNC genesymbol by summing the TPM values over. We filtered transcripts thatwere not sufficiently well expressed, omitting those that did not haveat least five counts in at least 10 samples. This resulted in 15,234 well-expressed genes out of 27,717 total genes. For differential expressionanalysis and PCA, we used log (base 2)–transformed TPM values.

Principal components analysisUsing the gene-level log2 TPM expression metric described above, weselected the top 500most variably expressed genes for PCA, excludinggenes with lower than 1 mean or 0.75 SD expression. We used the re-moveBatchEffect function in the R limma package, with default para-meters, to regress out donor-specific contributions to gene expression. Toprevent the absolute range of a strongly expressed gene from dominatingthe signal in the PCA, we scaled gene expression using the base R scalefunction on the rows of the expressionmatrix. This function centers andscales a vector by subtracting the mean and dividing by SD. We thenrenormalized the samples by centering and scaling the columns, withthe same R function. Finally, we used prcomp in R to perform PCAon the resulting gene expression matrix.

Correlation analysisFor individual genes, we computed association of their expression withthe proposed ordering of cell types, along the naïve to effector gradient.In this association, we modeled expression as a linear function of celltype, encoded as an ordinal variable, and donor, one-hot encoded asa categorical variable. The statistical significance of the association wasestimated with the lm function in R, followed by adjustment using theBenjamini-Hochberg procedure.

Gene set enrichment analysisWe performed GSEA using the R package gage directly on orderingsdefined by previous analyses. To enrich pathways from the PCA results,we used gene loadings for each principal component. For differentialexpression, we used the estimated t statistics. To produce barcodeplots for select pathways, we reanalyzed these pathways using theR package liger.

Mixed-effects modeling of associations of single cellsHere, we present a flexible method of finding significant associationsbetween subset abundance and case-control status that we havenamed MASC. The MASC framework has three steps: (i) stringentquality control, (ii) definition of population clusters, and (iii) associationtesting.Here, we assume thatwe have single-cell assays each quantifyingM possiblemarkers, wheremarkers can be genes (RNA-seq) or proteins(cytometry).Quality controlTo mitigate the influence of batch effects and spurious clusters, we firstremoved poorly recorded events and low-qualitymarkers before furtheranalysis. We removed those markers (i) that have little expression, be-cause these markers are not informative, and (ii) with significant batchvariability. First, we concatenated samples by batch and measured thefraction of cells negative and positive for each marker. We thencalculated the ratio of between-batch variance to total variance for each

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

marker’s negative and positive populations, allowing us to rank andretain 20 markers that were the least variable between batches. Wealso removedmarkers that were either uniformly negative or positiveacross batches, because this indicated that the antibody for thatmarker was not binding specifically to its target. For single-cell tran-scriptomic data, an analogous step would involve removing geneswith low numbers of supporting reads or genes whose expressionvaries widely between batches.

Once low-qualitymarkerswere identified and removed,we removedevents that were likely to be artifacts. We first removed events that hadextremely high signal for a single marker: Events that have recordedexpression values at or above the 99.9th percentile for that marker areremoved. These eventswere considered unlikely to be intact, viable cells.Next, a composite “information content” score (Eq. 1) for each event iwas created in the following manner: The expression x for each markerM is rescaled from 0 to 1 across the entire dataset to create normalizedexpression values yi for each event i. The sum of these normalized ex-pression values was used to create the event’s information content score.

INFOi ¼ ∑m¼M

m¼1yi;m ð1Þ

The information content score reflects that events with little to noexpression in every channel are less informative than events that havemore recorded expression. Events with low scores (INFOi < 0.05) wereconsidered unlikely to be informative in downstream analysis and wereremoved. In addition, events that derived more than half of theirinformation content score from expression in a single channel were alsoremoved (Eq. 2):

INFOi � 0:5 < maxm∈Mðyi;mÞ ð2Þ

Potential explanations for these events include poorly stained cells orartifacts caused by the clumping of antibodies with DNA fragments. Afinal filtering step retained events that were recorded as having detect-able expression in at least Mmin markers, where Mmin may vary fromexperiment to experiment based on the panel design and expected levelof coexpression between channels. The quality control steps describedhere are specific for mass cytometry analysis and will need to be opti-mized for use with transcriptomic data.ClusteringAfter applying quality control measures to each sample, we combineddata from cases and controls into a single dataset. It was critical toensure that each sample contributed equal numbers of cells to thisdataset, because otherwise the largest samples would dominate theanalysis and confound association testing. After sampling an equalnumber of cells from each sample, we partitioned these cells into popu-lations using DensVM (26), which performs unsupervised clusteringbased on marker expression. We note that partitioning the data canbe accomplished with different clustering approaches, such as SPADEor PhenoGraph for mass cytometry data, or even by using traditionalbivariate gating, because MASC is not dependent on any particularmethod of clustering (fig. S5).Association testingOnce all cells were assigned to a given cluster, the relationship betweensingle cells and clusters wasmodeled usingmixed-effects logistic regres-sion to account for donor or technical variation (Eq. 3).Wemodeled the

13 of 17

Page 14: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

age and sex of sample k as fixed-effect covariates, whereas the donor andbatch that cell i belongs toweremodeled as randomeffects. The randomeffects variance-covariance matrix treated each sample and batch asindependent Gaussians. Each cluster was individually modeled. Notethat this baselinemodel did not explicitly include any single-cell expres-sion measures.

logYi;j

1� Yi;j

� �¼ qj þ bclinicalXi;k þ ðfijkÞ þ ðkijmÞ ð3Þ

whereYi,j is the odds of cell i belonging to cluster j, qj is the intercept forcluster j, bclinical is a vector of clinical covariates for the kth sample, (fi|k)is the random effect for cell i from kth sample, and (ki|m) is the randomeffect for cell i from batch m.

To determinewhether any clusterswere associatedwith case-controlstatus, we included an additional covariate that indicated whether thekth sample is a case or control (Eq. 4):

logYi;j

1� Yi;j

� �¼ qj þ bclinicalXi;k þ ðfijkÞ þ ðkijmÞ þ bcaseXi;k ð4Þ

Here, Yi,j is the odds of cell i belonging to cluster j, qj is the interceptfor cluster j, bclinical is a vector of clinical covariates for the kth sample,(fi|k) is the random effect for cell i from kth sample, (ki|m) is therandom effect for cell i from batch m, and bcase indicates the effectof kth sample’s case-control status.

D ¼ �2 * lnlikelihood for null modellikelihood for full model

� �ð5Þ

p ¼ 1� tðv�2Þ=2e�t=2

2v=2G v2

� � !

ð6Þ

We compared the twomodels using a likelihood ratio test (Eq. 5) tofind the test statistic D, which is the ratio of the likelihoods for thebaseline and full models. The term D is distributed under the null by ac2 distribution with 1 df, because there is only one additional parameterin the full model compared to the null (case-control status).We deriveda P value by comparing test statistic D of the likelihood ratio test to thevalue of the c2 distribution with 1 df (Eq. 6), allowing us to find clustersin which case-control status significantly improves model fit. A signif-icant result (P < 0.05 after multiple testing correction) indicated thatcluster membership for a single cell is influenced by case-control statusafter accounting for technical and clinical covariates. The effect size ofthe case-control association can be estimated by calculating the ORfrom bcase. If a dataset includes multiple groups, then we can test forassociation between g groups using g − 1 indicator variables. This ap-proach allowed us to capture interindividual differences between donorsand model the influence of technical and clinical covariates that mightinfluence a cell to be included as amember of one cluster versus another.

Mass cytometry data analysisWe analyzed 50magnetically sorted peripheral blood samples (26 casesand 24 controls) under the resting condition and 52 samples (26 casesand 26 controls) in the stimulated condition by CyTOF. Here, we stimu-

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

lated samples by incubating them with Human T-Activator CD3/CD28Dynabeads (Thermo Fisher) at a density of 1 bead:2 cells for 24 hours.Two samples were only analyzed after stimulation due to low numbers ofavailable PBMCs.We ran aliquots of a standard PBMC sample alongsidecases and controls with each CyTOF run to allow us to measure batchvariability directly, because these aliquots should not be biologically dis-similar. We then used these data to find markers that had stained poorlyor varied significantly between batches and removed them from analysis.After acquisition, each sample was gated to a CD4+, CD45RO+ popula-tion using FlowJo 10.1 (TreeStar) and combined into a single datasetbefore analyzing the data using MASC, as previously described. We per-formed the data filtration steps requiring cells to demonstratemeasurableexpression (arcsinh-transformed expression > 0) in at least five markers(Mmin = 5). This removed 3.0 to 6.0% of all events captured in each sam-ple.We first removed the initial noise factor applied to all zero expressionvalues in mass cytometry by subtracting 1 from expression values andsetting any negative values to 0 and then applied the inverse hyperbolicsine transform with a cofactor of 5 to the raw expression data, using thefollowing equation: y ¼ sinh�1 maxðx�1;0Þ

5To partition the data, we first randomly selected 1000 cells from

each sample and applied the t-SNE algorithm [Barnes-Hut implemen-tation (27)] to the reduced dataset with the following parameters:perplexity = 30 and q = 0.5. We did not include channels for CD4 orCD45RO in the t-SNE clustering because these markers were only usedin gating samples to confirm the purity of CD4 memory T cell selection.We performed separate t-SNE projections for resting and stimulatedcells. To identify high-dimensional populations, we used a modifiedversion of DensVM (26). DensVM performs kernel density estimationacross the dimensionally reduced t-SNE map to build a training set andthen assigns cells to clusters by their expression of all markers using asupport vector machine (SVM) classifier. We modified the DensVMcode to increase the range of potential bandwidths searched duringthe density estimation step and to return the SVM model generatedfrom the t-SNE projection.

To create elbow plots, we ran DensVM using 25 bandwidth settingsevenly spaced along the interval [0.61, 5] for the resting data and [0.66, 5]for the stimulated data. We normalized marker expression to mean 0,variance 1 in each dataset before calculating the fraction of total varianceexplained by between-cluster variance.

Association testing for each cluster was performed using mixed-effects logistic regression according to the MASC method. Donorand batch were included as random-effect covariates, donor sex andage were included as a fixed-effect covariate, and donors were labeledas either cases (RA) or controls (OA). To confirm the associations foundby MASC, we conducted exact permutation testing in which we per-muted the association between case-control status and samples withinbatches 10,000 times, measuring the fraction of cells from RA samplesthat contributed to each cluster in each permutation. This allowed us tobuild an empirical null for the case-control skew of each cluster, and wecould then determine for each cluster how often a skew equal orgreater to the observed skew occurred. We adjusted the P valuesfor the number of tests we performed (the number of clusters ana-lyzed in each condition) using the Bonferroni correction.Cluster alignmentWe aligned subsets between experiments using the following strategy:In each experiment, expression data were first scaled to mean 0, vari-ance 1 to account for differences in sensitivity. We used the mean ex-pression value formarker column to define a centroid for each cluster inboth datasets. For a given cluster in the first dataset (query dataset), we

14 of 17

Page 15: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

calculated the Euclidean distance (Eq. 7) between that cluster andcluster centroids in the second dataset (target dataset) across all sharedmarkers. The cluster that ismost similar to the cluster in query dataset isthe cluster with the lowest distance in the target dataset, relative to allother clusters in that dataset.

distðq; tÞ ¼ ∑K

k¼1

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðqk � tkÞ2

qð7Þ

Here, q refers to the query cluster and t refers to the target cluster,whereas qk and tk indicate the normalized mean expression of markerk (out of K total markers) in clusters q and t, respectively.Marker informativeness metricWe wanted to determine which markers best separated a given popu-lation from the rest of the data in a quantitativemanner, because findinga set of population-specific markers is crucial for isolating the popu-lation in vivo. To do this, we examined the distribution of expressionfor each marker individually in the entire dataset (Q) and the popula-tion of interest (P). We then grouped expression values for P and Qinto 100 bins, normalized the binned vector to 1, and calculated theKLdivergence from Q to P for that marker with the following equation:

DKLðP jjQÞ ¼ ∑i¼100

i¼1Pilog

Pi

Qi

The divergence score can be interpreted as a measure of how muchthe distribution of expression for a given marker in the entire datasetresembles the distribution of expression for that marker in the popula-tion of interest. Higher scores represent lower similarity of the marker’sexpression distributions for P and Q, indicating that the expressionprofile of thatmarker ismore specific for that population. By calculatingthis score for every marker, we can rank and identify markers that bestdifferentiate the population of interest from the dataset.Biaxial gating and cluster overlapTo determine the concordance between biaxial gating of CD27− HLA-DR+ cells and cluster 18, we first selected cells that had normalizedexpression values of CD27 < 1 andHLA-DR≥ 1.We then calculatedan F-measure statistic between the cells selected using the CD27 andHLA-DR gates and each cluster identified in the resting and stimulateddatasets (Eq. 8). Here, precision is defined as the number of cells in eachcluster tested that fall into the CD27− HLA-DR+ gate, whereas recall isdefined as the number of cells gated as CD27− HLA-DR+ that are ineach cluster.

Fmeasure ¼ 2� precision� recallprecisionþ recall

ð8Þ

Clustering informativeness metricWe compared clustering sets that contained either 19 (FlowSOM andDensVM) or 21 (PhenoGraph) clusters. We independently clusteredthe resting datasetwith PhenoGraph andFlowSOMusing the same cellsand markers used to cluster the data with DensVM. We set k to 19 forFlowSOM clustering to match the number of clusters found byDensVM; for PhenoGraph, we used the default setting of k = 30.

To evaluate and quantify the ability of different clustering algorithmsto define clusters that was explaining marker fluctuations, we definedan information theory–basedmetric to evaluate the relative information

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

content captured by each set of clusters in terms ofmarker intensity.Weselected this approach because it is separate from the objective functionsthat the clustering algorithms were attempting to optimize.

First, for each cell, we normalized marker intensities so that theysummed to one. Then, we defined a nullQi representing the average nor-malized intensity formarker i across all cells.We also defined Pi,j, which isthe mean intensity ofmarker i of cells from cluster j. Then, for each clusterj, we calculate their KL divergence for each of theMmarkers (Eq. 9).

DKL;jðPj jjQÞ ¼ ∑M

i Pi;jlnPi;j

Qið9Þ

A cluster with low divergence from the average expression ofmarkersacross the entire dataset will capture lessmarker intensity informationthan onewith a high divergence, because biologically valid clusters willhave unique marker profiles that differ greatly from one another andfrom the averagemarker expression profile.We defined a similarmetricto quantify the extent to which individual batches were accounting fordifferences in cluster composition. In this instance, we calculated Pi,j,which is the proportion of cells from cluster j that batch i contributed.We also calculate Qi, which is the proportion of cells that batch icontributes overall to the dataset. With this definition, we calculatethe KL divergence for each of the M batches (Eq. 10).

DKL;jðPj jjQÞ ¼ ∑M

i Pi;jlnPi;j

Qið10Þ

A cluster that contains cells with low divergence from the nulldistribution of cells across batches is affected less by batch effects thanone with a high divergence score, and a cluster completely free of batcheffects should have a KL divergence of zero.Autoencoder clusteringWe performed clustering analysis using a deep autoencoder and aGaussian mixture model (GMM). The autoencoder was designed withan architecture of three hidden layers, with depths of 8, 2, and 8 nodes,respectively. We trained the model with no regularization and 300epochs. The two modes from the middle hidden layer were then usedas features to learn a GMM. The number of clusters was chosen a priorito match the number discovered in the DensVM analysis. All para-meters not specific in this section were set to default values.Meta-analysisWe used Stouffer’s Z-score method to define a meta-analysis P valuefor the significant expansion of CD27− HLA-DR+ cells in RA. Weconverted P values from the restingmass cytometry and flow cytometryanalyses to Z scores and found a meta-analysis Z score by taking thesum of these scores divided by the square root of the number ofscores—which was two, in our case.We then derived a meta-analysisP value from the Z score using the standard normal distribution.

All analyses were performed using custom scripts for R 3.4.0. Weused the following packages: flowCore (65) to read and process FCS filesfor further analysis, lme4 (66) to apply mixed-effects logistic regression,ggplot2 (67), pheatmap (68) for data visualization, and cytofkit (69) forthe implementation of FlowSOM and PhenoGraph clustering algo-rithms. RNA-seq analyses were conducted using Kallisto (37) to alignreads and gage (70) and liger (71) to perform GSEA. The h2o (72)and mclust (73) packages were used to implement the autoencoderclustering method.

15 of 17

Page 16: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

D

SUPPLEMENTARY MATERIALSwww.sciencetranslationalmedicine.org/cgi/content/full/10/463/eaaq0305/DC1Fig. S1. MASC type 1 error.Fig. S2. t-SNE projection density.Fig. S3. CIM analysis of clustering approaches.Fig. S4. DensVM clustering of elbow plots.Fig. S5. Marker expression distribution plots for DensVM clusters.Fig. S6. Association permutation testing and cluster alignment.Fig. S7. PhenoGraph and FlowSOM clustering.Fig. S8. Association testing with Citrus.Fig. S9. Flow cytometry and RNA-seq gating strategies.Fig. S10. CD27 and HLA-DR expression in flow cytometry cohort.Fig. S11. CD4+ TEM populations in a clinical response cohort.Fig. S12. CD27− HLA-DR+ frequency and clinical characteristics.Fig. S13. RNA-seq analysis of CD4+ T cell subsets.Fig. S14. Flow cytometry expression quantification.Fig. S15. Using a neural network autoencoder to cluster mass cytometry data.Table S1. Panel design for mass cytometry experiments.Table S2. MASC analysis of the 19 clusters identified in the resting dataset.Table S3. MASC analysis of the 21 clusters identified in the stimulated dataset.Table S4. GSEA of genes differentially expressed in CD27− HLA-DR+ cells.

by guest on October 2, 2020

http://stm.sciencem

ag.org/ow

nloaded from

REFERENCES AND NOTES1. K. E. Neu, Q. Tang, P. C. Wilson, A. A. Khan, Single-cell genomics: Approaches and utility in

immunology. Trends Immunol. 38, 140–149 (2017).2. M. Cross, E. Smith, D. Hoy, L. Carmona, F. Wolfe, T. Vos, B. Williams, S. Gabriel, M. Lassere,

N. Johns, R. Buchbinder, A. Woolf, L. March, The global burden of rheumatoid arthritis:Estimates from the global burden of disease 2010 study. Ann. Rheum. Dis. 73, 1316–1322(2014).

3. S. Viatte, D. Plant, S. Raychaudhuri, Genetics and epigenetics of rheumatoid arthritis. Nat.Rev. Rheumatol. 9, 141–153 (2013).

4. J. Kurkó, T. Besenyei, J. Laki, T. T. Glant, K. Mikecz, Z. Szekanecz, Genetics of rheumatoidarthritis—A comprehensive review. Clin. Rev. Allergy Immunol. 45, 170–179 (2013).

5. A. M. Gizinski, D. A. Fox, T cell subsets and their role in the pathogenesis of rheumaticdisease. Curr. Opin. Rheumatol. 26, 204–210 (2014).

6. W. Wang, S. Shao, Z. Jiao, M. Guo, H. Xu, S. Wang, The Th17/Treg imbalance and cytokineenvironment in peripheral blood of patients with rheumatoid arthritis. Rheumatol. Int. 32,887–893 (2012).

7. S. Raychaudhuri, C. Sandor, E. A. Stahl, J. Freudenberg, H.-S. Lee, X. Jia, L. Alfredsson,L. Padyukov, L. Klareskog, J. Worthington, K. A. Siminovitch, S.-C. Bae, R. M. Plenge,P. K. Gregersen, P. I. W. de Bakker, Five amino acids in three HLA proteins explain most ofthe association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 44,291–296 (2012).

8. C. Perricone, F. Ceccarelli, G. Valesini, An overview on the genetic of rheumatoid arthritis:A never-ending story. Autoimmun. Rev. 10, 599–608 (2011).

9. D. Diogo, Y. Okada, R. M. Plenge, Genome-wide association studies to advance ourunderstanding of critical cell types and pathways in rheumatoid arthritis: Recent findingsand challenges. Curr. Opin. Rheumatol. 26, 85–92 (2014).

10. S. Eyre, J. Bowes, D. Diogo, A. Lee, A. Barton, P. Martin, A. Zhernakova, E. Stahl, S. Viatte,K. McAllister, C. I. Amos, L. Padyukov, R. E. M. Toes, T. W. J. Huizinga, C. Wijmenga,G. Trynka, L. Franke, H.-J. Westra, L. Alfredsson, X. Hu, C. Sandor, P. I. W. de Bakker,S. Davila, C. C. Khor, K. K. Heng, R. Andrews, S. Edkins, S. E. Hunt, C. Langford, D. Symmons;Biologics in Rheumatoid Arthritis Genetics and Genomics Study Syndicate; WellcomeTrust Case Control Consortium; P. Concannon, S. Onengut-Gumuscu, S. S. Rich,P. Deloukas, M. A. Gonzalez-Gay, L. Rodriguez-Rodriguez, L. Ärlsetig, J. Martin,S. Rantapää-Dahlqvist, R. M. Plenge, S. Raychaudhuri, L. Klareskog, P. K. Gregersen,J. Worthington, High-density genetic mapping identifies new susceptibility loci forrheumatoid arthritis. Nat. Genet. 44, 1336–1340 (2012).

11. Y. Okada, D. Wu, G. Trynka, T. Raj, C. Terao, K. Ikari, Y. Kochi, K. Ohmura, A. Suzuki,S. Yoshida, R. R. Graham, A. Manoharan, W. Ortmann, T. Bhangale, J. C. Denny, R. J. Carroll,A. E. Eyler, J. D. Greenberg, J. M. Kremer, D. A. Pappas, L. Jiang, J. Yin, L. Ye, D.-F. Su,J. Yang, G. Xie, E. Keystone, H.-J. Westra, T. Esko, A. Metspalu, X. Zhou, N. Gupta, D. Mirel,E. A. Stahl, D. Diogo, J. Cui, K. Liao, M. H. Guo, K. Myouzen, T. Kawaguchi, M. J. H. Coenen,P. L. C. M. van Riel, M. A. F. J. van de Laar, H.-J. Guchelaar, T. W. J. Huizinga, P. Dieudé,X. Mariette, S. L. Bridges Jr., A. Zhernakova, R. E. M. Toes, P. P. Tak, C. Miceli-Richard,S.-Y. Bang, H.-S. Lee, J. Martin, M. A. Gonzalez-Gay, L. Rodriguez-Rodriguez,S. Rantapää-Dahlqvist, L. Ärlestig, H. K. Choi, Y. Kamatani, P. Galan, M. Lathrop; RACIConsortium; GARNET consortium; S. Eyre, J. Bowes, A. Barton, N. de Vries, L. W. Moreland,L. A. Criswell, E. W. Karlson, A. Taniguchi, R. Yamada, M. Kubo, J. S. Liu, S.-C. Bae,J. Worthington, L. Padyukov, L. Klareskog, P. K. Gregersen, S. Raychaudhuri, B. E. Stranger,

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

P. L. De Jager, L. Franke, P. M. Visscher, M. A. Brown, H. Yamanaka, T. Mimori, A. Takahashi,H. Xu, T. W. Behrens, K. A. Siminovitch, S. Momohara, F. Matsuda, K. Yamamoto,R. M. Plenge, Genetics of rheumatoid arthritis contributes to biology and drug discovery.Nature 506, 376–381 (2014).

12. X. Hu, H. Kim, E. Stahl, R. Plenge, M. Daly, S. Raychaudhuri, Integrating autoimmune riskloci with gene-expression data identifies specific pathogenic immune cell subsets.Am. J. Hum. Genet. 89, 496–506 (2011).

13. D. A. Rao, M. F. Gurish, J. L. Marshall, K. Slowikowski, C. Y. Fonseka, Y. Liu, L. T. Donlin,L. A. Henderson, K. Wei, F. Mizoguchi, N. C. Teslovich, M. E. Weinblatt, E. M. Massarotti,J. S. Coblyn, S. M. Helfgott, Y. C. Lee, D. J. Todd, V. P. Bykerk, S. M. Goodman, A. B. Pernis,L. B. Ivashkiv, E. W. Karlson, P. A. Nigrovic, A. Filer, C. D. Buckley, J. A. Lederer,S. Raychaudhuri, M. B. Brenner, Pathologically expanded peripheral T helper cell subsetdrives B cells in rheumatoid arthritis. Nature 542, 110–114 (2017).

14. P. B. Martens, J. J. Goronzy, D. Schaid, C. M. Weyand, Expansion of unusual CD4+ T cells insevere rheumatoid arthritis. Arthritis Rheum. 40, 1106–1114 (1997).

15. M. Scarsi, T. Ziglioli, P. Airò, Decreased circulating CD28-negative T cells in patients withrheumatoid arthritis treated with abatacept are correlated with clinical response.J. Rheumatol. 37, 911–916 (2010).

16. K. J. Warrington, S. Takemura, J. J. Goronzy, C. M. Weyand, CD4+,CD28- T cells inrheumatoid arthritis patients combine features of the innate and adaptive immunesystems. Arthritis Rheum. 44, 13–20 (2001).

17. C. Y. Fonseka, D. A. Rao, S. Raychaudhuri, Leveraging blood and tissue CD4+ T cellheterogeneity at the single cell level to identify mechanisms of disease in rheumatoidarthritis. Curr. Opin. Immunol. 49, 27–36 (2017).

18. S. C. Bendall, E. F. Simonds, P. Qiu, E.-a. D. Amir, P. O. Krutzik, R. Finck, R. V. Bruggner,R. Melamed, A. Trejo, O. I. Ornatsky, R. S. Balderas, S. K. Plevritis, K. Sachs, D. Pe’er,S. D. Tanner, G. P. Nolan, Single-cell mass cytometry of differential immune and drugresponses across a human hematopoietic continuum. Science 332, 687–696 (2011).

19. T. Sörensen, S. Baumgart, P. Durek, A. Grützkau, T. Häupl, immunoClust—An automatedanalysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. Cytometry 87, 603–615 (2015).

20. N. Samusik, Z. Good, M. H. Spitzer, K. L. Davis, G. P. Nolan, Automated mapping ofphenotype space with single-cell data. Nat. Methods 13, 493–496 (2016).

21. P. Lin, M. Troup, J. W. K. Ho, CIDR: Ultrafast and accurate clustering through imputationfor single-cell RNA-seq data. Genome Biol. 18, 59 (2017).

22. E. Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A. R. Bialas,N. Kamitaki, E. M. Martersteck, J. J. Trombetta, D. A. Weitz, J. R. Sanes, A. K. Shalek,A. Regev, S. A. McCarroll, Highly parallel genome-wide expression profiling of individualcells using nanoliter droplets. Cell 161, 1202–1214 (2015).

23. J. H. Levine, E. F. Simonds, S. C. Bendall, K. L. Davis, E.-a. D. Amir, M. D. Tadmor, O. Litvin,H. G. Fienberg, A. Jager, E. R. Zunder, R. Finck, A. L. Gedman, I. Radtke, J. R. Downing,D. Pe’er, G. P. Nolan, Data-driven phenotypic dissection of AML reveals progenitor-likecells that correlate with prognosis. Cell 162, 184–197 (2015).

24. O. Ornatsky, D. Bandura, V. Baranov, M. Nitz, M. A. Winnik, S. Tanner, Highlymultiparametric analysis by mass cytometry. J. Immunol. Methods 361, 1–20 (2010).

25. J. Ermann, D. A. Rao, N. C. Teslovich, M. B. Brenner, S. Raychaudhuri, Immune cell profiling toguide therapeutic decisions in rheumatic diseases. Nat. Rev. Rheumatol. 11, 541–551 (2015).

26. B. Becher, A. Schlitzer, J. Chen, F. Mair, H. R. Sumatoh, K. W. W. Teng, D. Low, C. Ruedl,P. Riccardi-Castagnoli, M. Poidinger, M. Greter, F. Ginhoux, E. W. Newell, High-dimensionalanalysis of the murine myeloid cell system. Nat. Immunol. 15, 1181–1189 (2014).

27. L. van der Maaten, Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res.15, 3221–3245 (2014).

28. S. Van Gassen, B. Callebaut, M. J. Van Helden, B. N. Lambrecht, P. Demeester, T. Dhaene,Y. Saeys, FlowSOM: Using self-organizing maps for visualization and interpretation ofcytometry data. Cytometry A 87, 636–645 (2015).

29. L. M. Weber, M. D. Robinson, Comparison of clustering methods for high-dimensionalsingle-cell flow and mass cytometry data. Cytometry A 89, 1084–1096 (2016).

30. M. T. Wong, D. E. H. Ong, F. S. H. Lim, K. W. W. Teng, N. McGovern, S. Narayanan,W. Q. Ho, D. Cerny, H. K. K. Tan, R. Anicete, B. K. Tan, T. K. H. Lim, C. Y. Chan, P. C. Cheow,S. Y. Lee, A. M. P. Takano, E. H. Tan, J. K. C. Tam, E. C. Y. Tan, J. K. Y. Chan, K. Fink,A. Bertoletti, F. Ginhoux, M. A. Curotto de Lafaille, E. W. Newell, A high-dimensional atlasof human T cell diversity reveals tissue-specific trafficking and cytokine signatures.Immunity 45, 442–456 (2016).

31. Y. Cheng, M. T. Wong, L. van der Maaten, E. W. Newell, Categorical analysis of humanT cell heterogeneity with one-dimensional soli-expression by nonlinear stochasticembedding. J. Immunol. 196, 924–932 (2016).

32. M. T. Wong, J. Chen, S. Narayanan, W. Lin, R. Anicete, H. T. K. Kiaang,M. A. Curotto de Lafaille, M. Poidinger, E. W. Newell, Mapping the diversity of follicularhelper T cells in human blood and tonsils using high-dimensional mass cytometryanalysis. Cell Rep. 11, 1822–1833 (2015).

33. C. Baecher-Allan, E. Wolf, D. A. Hafler, MHC class II expression identifies functionallydistinct human regulatory T cells. J. Immunol. 176, 4622–4631 (2006).

16 of 17

Page 17: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

SC I ENCE TRANS LAT IONAL MED I C I N E | R E S EARCH RE SOURCE

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from

34. R. V. Bruggner, B. Bodenmiller, D. L. Dill, R. J. Tibshirani, G. P. Nolan, Automatedidentification of stratifying signatures in cellular subpopulations. Proc. Natl. Acad. Sci. U.S.A.111, E2770–E2777 (2014).

35. S. Oshima, D. D. Eckels, Selective expression of class II MHC isotypes by MLC-activatedhuman T lymphocytes. Hum. Immunol. 27, 208–219 (1990).

36. D. Aletaha, V. P. K. Nell, T. Stamm, M. Uffmann, S. Pflugbeil, K. Machold, J. S. Smolen, Acutephase reactants add little to composite disease activity indices for rheumatoid arthritis:Validation of a clinical activity score. Arthritis Res. Ther. 7, R796–R806 (2005).

37. N. L. Bray, H. Pimentel, P. Melsted, L. Pachter, Near-optimal probabilistic RNA-seqquantification. Nat. Biotechnol. 34, 525–527 (2016).

38. G. M. Mason, K. Lowe, R. Melchiotti, R. Ellis, E. de Rinaldis, M. Peakman, S. Heck,G. Lombardi, T. I. M. Tree, Phenotypic complexity of the human regulatory T cellcompartment revealed by mass cytometry. J. Immunol. 195, 2030–2037 (2015).

39. S. C. Bendall, K. L. Davis, E.-a. D. Amir, M. D. Tadmor, E. F. Simonds, T. J. Chen,D. K. Shenfeld, G. P. Nolan, D. Pe’er, Single-cell trajectory detection uncoversprogression and regulatory coordination in human B cell development. Cell 157,714–725 (2014).

40. P. B. Ferrell Jr., K. E. Diggins, H. G. Polikowsky, S. R. Mohan, A. C. Seegmiller, J. M. Irish,High-dimensional analysis of acute myeloid leukemia reveals phenotypic changes inpersistent cells during induction therapy. PLOS ONE 11, e0153207 (2016).

41. M. Setty, M. D. Tadmor, S. Reich-Zeliger, O. Angel, T. M. Salame, P. Kathail, K. Choi,S. Bendall, N. Friedman, D. Pe’er, Wishbone identifies bifurcating developmentaltrajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).

42. M. Mingueneau, S. Boudaoud, S. Haskett, T. L. Reynolds, G. Nocturne, E. C. Norton,X. Zhang, M. Constant, D. Park, W. Wang, T. Lazure, C. Le Pajolec, A. Ergun,X. Mariette, Cytometry by time-of-flight immunophenotyping identifies a bloodSjögren’s signature correlating with disease activity and glandular inflammation.J. Allergy Clin. Immunol. 137, 1809–1821.e12 (2016).

43. L. Hansmann, L. Blum, C.-H. Ju, M. Liedtke, W. H. Robinson, M. M. Davis, Mass cytometryanalysis shows that a novel memory phenotype B cell is expanded in multiple myeloma.Cancer Immunol. Res. 3, 650–660 (2015).

44. D. M. Strauss-Albee, J. Fukuyama, E. C. Liang, Y. Yao, J. A. Jarrell, A. L. Drake, J. Kinuthia,R. R. Montgomery, G. John-Stewart, S. Holmes, C. A. Blish, Human NK cell repertoirediversity reflects immune experience and correlates with viral susceptibility. Sci. Transl.Med. 7, 297ra115 (2015).

45. B. Gaudillière, G. K. Fragiadakis, R. V. Bruggner, M. Nicolau, R. Finck, M. Tingle, J. Silva,E. A. Ganio, C. G. Yeh, W. J. Maloney, J. I. Huddleston, S. B. Goodman, M. M. Davis,S. C. Bendall, W. J. Fantl, M. S. Angst, G. P. Nolan, Clinical recovery from surgery correlateswith single-cell immune signatures. Sci. Transl. Med. 6, 255ra131 (2014).

46. Y. Saeys, S. Van Gassen, B. N. Lambrecht, Computational flow cytometry: Helping to makesense of high-dimensional immunology data. Nat. Rev. Immunol. 16, 449–462 (2016).

47. A. Wagner, A. Regev, N. Yosef, Revealing the vectors of cellular identity with single-cellgenomics. Nat. Biotechnol. 34, 1145–1160 (2016).

48. C. Chester, H. T. Maecker, Algorithmic tools for mining high-dimensional cytometry data.J. Immunol. 195, 773–779 (2015).

49. D. Y. Orlova, L. A. Herzenberg, G. Walther, Science not art: Statistically sound methods foridentifying subsets in multi-dimensional flow and mass cytometry data sets. Nat. Rev.Immunol. 18, 77 (2017).

50. Y. Saeys, S. Van Gassen, B. Lambrecht, Response to Orlova et al. “Science not art:Statistically sound methods for identifying subsets in multi-dimensional flow and masscytometry data sets”. Nat. Rev. Immunol. 18, 78 (2017).

51. X. Qiu, A. Hill, J. Packer, D. Lin, Y.-A. Ma, C. Trapnell, Single-cell mRNA quantification anddifferential analysis with Census. Nat. Methods 14, 309–315 (2017).

52. A. Larbi, T. Fulop, From “truly naïve” to “exhausted senescent” T cells: When markerspredict functionality. Cytometry A 85, 25–35 (2014).

53. R. de Jong, M. Brouwer, B. Hooibrink, T. van der Pouw-Kraan, F. Miedema, R. A. W. van Lier,The CD27– subset of peripheral blood memory CD4+ lymphocytes contains functionallydifferentiated T lymphocytes that develop by persistent antigenic stimulation in vivo.Eur. J. Immunol. 22, 993–999 (1992).

54. A. Schiött, M. Lindstedt, B. Johansson-Lindbom, E. Roggen, C. A. K. Borrebaeck,CD27– CD4+ memory T cells define a differentiated memory population at both thefunctional and transcriptional levels. Immunology 113, 363–370 (2004).

55. V. Appay, J. J. Zaunders, L. Papagno, J. Sutton, A. Jaramillo, A. Waters, P. Easterbrook,P. Grey, D. Smith, A. J. McMichael, D. A. Cooper, S. L. Rowland-Jones, A. D. Kelleher,Characterization of CD4+ CTLs ex vivo. J. Immunol. 168, 5954–5958 (2002).

56. G. M. Griffiths, S. Alpert, E. Lambert, J. McGuire, I. L. Weissman, Perforin and granzymeA expression identifying cytolytic lymphocytes in rheumatoid arthritis. Proc. Natl. Acad.Sci. U.S.A. 89, 549–553 (1992).

57. T. Namekawa, U. G. Wagner, J. J. Goronzy, C. M. Weyand, Functional subsets of CD4 T cellsin rheumatoid synovitis. Arthritis Rheum. 41, 2108–2116 (1998).

58. H. Mattoo, V. S. Mahajan, T. Maehara, V. Deshpande, E. Della-Torre, Z. S. Wallace,M. Kulikova, J. M. Drijvers, J. Daccache, M. N. Carruthers, F. V. Castelino, J. R. Stone,

Fonseka et al., Sci. Transl. Med. 10, eaaq0305 (2018) 17 October 2018

J. H. Stone, S. Pillai, Clonal expansion of CD4+ cytotoxic T lymphocytes in patients withIgG4-related disease. J. Allergy Clin. Immunol. 138, 825–838 (2016).

59. C. L. Kohem, R. I. Brezinschek, H. Wisbey, C. Tortorella, P. E. Lipsky, N. Oppenheimer-Marks,Enrichment of differentiated CD45RBdim, CD27 - memory T cells in the peripheralblood, synovial fluid, and synovial tissue of patients with rheumatoid arthritis.Arthritis Rheum. 39, 844–854 (1996).

60. V. S. Patil, A. Madrigal, B. J. Schmiedel, J. Clarke, P. O’Rourke, A. D. de Silva, E. Harris, B. Peters,G. Seumois, D. Weiskopf, A. Sette, P. Vijayanand, Precursors of human CD4+ cytotoxic Tlymphocytes identified by single-cell transcriptome analysis. Sci. Immunol. 3, eaan8664 (2018).

61. M. Stoeckius, C. Hafemeister, W. Stephenson, B. Houck-Loomis, P. K. Chattopadhyay,H. Swerdlow, R. Satija, P. Simbert, Simultaneous epitope and transcriptome measurementin single cells. Nat. Methods 14, 865–868 (2017).

62. V. M. Peterson, K. X. Zhang, N. Kumar, J. Wong, L. Li, D. C. Wilson, R. Moore,T. K. McClanahan, S. Sadekova, J. A. Klappenbach, Multiplexed quantification of proteinsand transcripts in single cells. Nat. Biotechnol. 35, 936–939 (2017).

63. Y. C. Lee, C. O. Bingham III, R. R. Edwards, W. Marder, K. Phillips, M. B. Bolster, D. J. Clauw,L. W. Moreland, B. Lu, A. Wohlfahrt, Z. Zhang, T. Neogi, Association between painsensitization and disease activity in patients with rheumatoid arthritis: A cross-sectionalstudy. Arthritis Care Res. 70, 197–204 (2018).

64. R. Finck, E. F. Simonds, A. Jager, S. Krishnaswamy, K. Sachs, W. Fantl, D. Pe’er, G. P. Nolan,S. C. Bendall, Normalization of mass cytometry data with bead standards. Cytometry A83A, 483–494 (2013).

65. B. Ellis, P. Haaland, F. Hahne, N. Le Meur, N. Gopalakrishnan, J. Spidlen, M. Jiang, flowCore:Basic structures for flow cytometry data. R package version 1.46.1 (2018).

66. D. Bates, M. Mächler, B. M. Bolker, S. C. Walker, Fitting linear mixed-effects models usinglme4. J. Stat. Softw. 67, 1–48 (2015).

67. H. Wickham, ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).68. R. Kolde, pheatmap: Pretty Heatmaps. R package version 1.0.10 (2018).69. H. Chen, M. C. Lau, M. T. Wong, E. W. Newell, M. Poidinger, J. Chen, Cytofkit:

A bioconductor package for an integrated mass cytometry data analysis pipeline.PLOS Comput. Biol. 12, e1005112 (2016).

70. W. Luo, M. S. Freidman, K. Shedden, K. D. Hankenson, P. J. Woolf, GAGE: Generallyapplicable gene set enrichment for pathway analysis. BMC Bioinformatics 10, 161(2009).

71. J. Fan, P. Kharchenko, liger: Lightweight Iterative Geneset Enrichment. R package version0.1 (2018).

72. E. LeDell, N. Gill, S. Aiello, A. Fu, A. Candel, C. Click, T. Kraljevic, T. Nykodym,P. Aboyoun, M. Kurka, M. Malohlava, h2o: R Interface for ‘H2O’. R package version3.20.0.8 (2018).

73. L. Scrucca, M. Fop, T. B. Murphy, A. E. Raftery, mclust 5: Clustering, classification anddensity estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016).

Funding: This work was supported, in part, by funding from the NIH (UH2AR067677 toS.R., U19AI111224 to S.R., 1R01AR063759 to S.R., and R01 AR064850-03 to Y.C.L.), the DorisDuke Charitable Foundation (grant 2013097 to S.R.), T32 AR007530-31 (to M.B.B. and D.A.R.),the William Docken Inflammatory Autoimmune Disease Fund (to M.B.B. and S.R.), theRuth L. Kirschstein National Research Service Award (F31-AR070582 to K.S.), and theRheumatology Research Foundation Tobé and Stephen Malawista, MD Endowment inAcademic Rheumatology (to D.A.R.). Author contributions: C.Y.F. and S.R. conceived theMASC association testing method. C.Y.F., S.R., I.K., and K.S. conducted all statistical analyses.D.A.R., N.C.T., S.K.H., M.F.G., J.E., J.A.L., M.B.B., and S.R. designed and conducted all masscytometry, flow cytometry, and functional immunology assays. D.A.R., L.T.D., M.E.W., E.M.M.,J.S.C., S.M.H., D.J.T., V.P.B., E.W.K., and Y.C.L. recruited patients and obtained samples for thisstudy. C.Y.F., D.A.R., and S.R. wrote the initial manuscript. All authors edited and revisedthe final manuscript. Competing interests: The authors declare that they have no competinginterests. I.K. has been a paid bioinformatics consultant for Outlier Bio LLC since November2017. Data and materials availability: All data associated with this study can be found inthe paper or the Supplementary Materials. The RNA-seq data from this study have beendeposited in the Gene Expression Omnibus database (GSE118209). MASC and all other customscripts used in this analysis are available at www.github.com/immunogenomics.

Submitted 22 September 2017Resubmitted 9 May 2018Accepted 25 September 2018Published 17 October 201810.1126/scitranslmed.aaq0305

Citation: C. Y. Fonseka, D. A. Rao, N. C. Teslovich, I. Korsunsky, S. K. Hannes, K. Slowikowski,M. F. Gurish, L. T. Donlin, J. A. Lederer, M. E. Weinblatt, E. M. Massarotti, J. S. Coblyn,S. M. Helfgott, D. J. Todd, V. P. Bykerk, E. W. Karlson, J. Ermann, Y. C. Lee, M. B. Brenner,S. Raychaudhuri, Mixed-effects association of single cells identifies an expanded effectorCD4+ T cell subset in rheumatoid arthritis. Sci. Transl. Med. 10, eaaq0305 (2018).

17 of 17

Page 18: Mixed-effects association of single cells identifies …...Email: soumya@broadinstitute.org SCIENCE TRANSLATIONAL MEDICINE| RESEARCH RESOURCE Fonseka et al., Sci. Transl. Med. 10,

subset in rheumatoid arthritis T cell+Mixed-effects association of single cells identifies an expanded effector CD4

Brenner and Soumya RaychaudhuriSimon M. Helfgott, Derrick J. Todd, Vivian P. Bykerk, Elizabeth W. Karlson, Joerg Ermann, Yvonne C. Lee, Michael B.Michael F. Gurish, Laura T. Donlin, James A. Lederer, Michael E. Weinblatt, Elena M. Massarotti, Jonathan S. Coblyn, Chamith Y. Fonseka, Deepak A. Rao, Nikola C. Teslovich, Ilya Korsunsky, Susan K. Hannes, Kamil Slowikowski,

DOI: 10.1126/scitranslmed.aaq0305, eaaq0305.10Sci Transl Med

for the analysis of high-dimensional single-cell data.toolThe authors' method performed better than the current gold standard and could be a potentially widely applied

T cell subset in the blood of rheumatoid arthritis patients that was later seen to contract upon treatment.+CD4and technical variation. Using their method on single-cell mass and flow cytometry data revealed an expandeddeveloped a simple statistical method to do just that while simultaneously controlling for confounding biological

.et allarge datasets, and it can be challenging to identify rare cell types associated with disease. Fonseka New techniques analyzing single cells are being applied to clinical samples. These techniques generate

Pinpointing culprits in single-cell data

ARTICLE TOOLS http://stm.sciencemag.org/content/10/463/eaaq0305

MATERIALSSUPPLEMENTARY http://stm.sciencemag.org/content/suppl/2018/10/15/10.463.eaaq0305.DC1

CONTENTRELATED

http://stm.sciencemag.org/content/scitransmed/11/519/eaaw4626.fullhttp://stm.sciencemag.org/content/scitransmed/11/489/eaar6584.fullhttp://stm.sciencemag.org/content/scitransmed/11/477/eaat3356.fullhttp://stm.sciencemag.org/content/scitransmed/8/331/331ra38.fullhttp://stm.sciencemag.org/content/scitransmed/9/375/eaah3438.fullhttp://stm.sciencemag.org/content/scitransmed/8/369/369ra176.fullhttp://stm.sciencemag.org/content/scitransmed/10/452/eaan3508.full

REFERENCES

http://stm.sciencemag.org/content/10/463/eaaq0305#BIBLThis article cites 68 articles, 14 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

registered trademark of AAAS. is aScience Translational MedicineScience, 1200 New York Avenue NW, Washington, DC 20005. The title

(ISSN 1946-6242) is published by the American Association for the Advancement ofScience Translational Medicine

of Science. No claim to original U.S. Government WorksCopyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement

by guest on October 2, 2020

http://stm.sciencem

ag.org/D

ownloaded from