Identification of transcriptional regulators in the mouse...
Transcript of Identification of transcriptional regulators in the mouse...
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION �
r e s o u r c e
The differentiation of hematopoietic stem cells into cells of the immune system has been studied extensively in mammals, but the transcriptional circuitry that controls it is still only partially understood. Here, the Immunological Genome Project gene-expression profiles across mouse immune lineages allowed us to systematically analyze these circuits. To analyze this data set we developed Ontogenet, an algorithm for reconstructing lineage-specific regulation from gene-expression profiles across lineages. Using Ontogenet, we found differentiation stage–specific regulators of mouse hematopoiesis and identified many known hematopoietic regulators and 175 previously unknown candidate regulators, as well as their target genes and the cell types in which they act. Among the previously unknown regulators, we emphasize the role of ETV5 in the differentiation of gd T cells. As the transcriptional programs of human and mouse cells are highly conserved, it is likely that many lessons learned from the mouse model apply to humans.
programsofhumanandmousecellsarehighlyconserved4,manylessonslearnedfromthemousemodelwillprobablybeapplicabletohumans.Twokeyapproachesfortheidentificationofregulatorynetworks5arephysicalmodelsbasedontheassociationofatranscrip-tionfactororacis-regulatoryelementwithatarget’spromoter(forexample,fromchromatinimmunoprecipitation(ChIP)followedbydeepsequencing)andobservationalmodelswithwhichregulationcanbeinferredfromasignificantcorrelationbetweentheabundanceoractivityofatranscriptionfactor(asproteinormRNA)andthatofitspresumedtarget.Inbothcases,analysisoftherelationshipbetweenaputativeregulatorandamoduleofcoregulatedtargetsenhancesrobustnessandbiological interpretability6,7.Physicaldataprovidedirectevidenceofbiochemical interactionsbutdonotnecessarilyindicatefunction8andarechallengingtocollect9,whereasmRNAprofilesarehighlyaccessiblebutprovideonlycorrelativeevidence.Asphysicalandobservationalmodelsarecomplementary,usingboth3canenhanceconfidence5–7andexpandthescopeofdiscovery.
Analysisofcellsorganizedinaknownlineage,asinhematopoiesis,offersuniqueopportunitiesthathavenotbeenleveragedbefore.Inparticular,publishedmodels3havenotexplicitlyconsideredthefactthatcellsthataremorecloselyrelated(accordingtotheknownline-agetree)probablysharemanyoftheirregulatorymechanismsandthatregulatoryrelationshipsthatexistinonesublineagemaynotbeactiveinanother.Incorporatingsuchinformationmayhelpintheidentificationoftrueregulatorsofhematopoiesis.
Hereweused those insights todevelopanewcomputationalmethod,Ontogenet,andtoapplyittotheImmGencompendium
1Computer Science Department, Stanford University, Stanford, California, USA. 2Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. 3Department of Pathology, University of Massachusetts Medical School, Worcester, Massachusetts USA. 4Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, USA. 5Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 6A full list of members and affiliations appears at the end of the paper. 7These authors contributed equally to this work. 8These authors jointly directed this work. Correspondence should be addressed to A.R. ([email protected]) or D.K. ([email protected]).
Received 30 January; accepted 13 March; published online 28 April 2013; doi:10.1038/ni.2587
TheImmunologicalGenomeProject(ImmGen)isaconsortiumofimmunologistsandcomputationalbiologistswhoaim,throughtheuseofsharedandrigorouslycontrolleddata-generationpipelines,toexhaustivelychartgene-expressionprofilesandtheirunderlyingregulatorynetworksinthemouseimmunesystem1.Inthiscontext,weprovidethe firstcomprehensiveanalysisof theImmGencom-pendiumanduseanewcomputationalalgorithmtoreconstructamodularmodeloftheregulatoryprogramofmousehematopoiesis.Understanding the regulatory mechanisms that underlie the dif-ferentiationofcellsoftheimmunesystemhasimportantimplica-tionsforthestudyofdevelopmentandforunderstandingthebasisof human immunological disorders and hematological malignan-cies.Moststudiesofhematopoiesisviewdifferentiationasaprocesscontrolled by relatively few ‘master’ transcription factors that areexpressed inspecific lineagesandact tosetandreinforcedistinctcellstates2.However,analysisofgeneexpressionin38celltypesinhumanhematopoiesis3hassuggestedamorecomplexorganizationthat involvesa largernumberof transcriptionfactors thatcontrolcombinationsofmodulesofcoexpressedgenesandarearrangedindensely interconnected circuits. However, that human study wasrestrictedtohumancellsthatcouldbeobtainedinsufficientquanti-tiesfromperipheralorcordbloodandthuscouldnotaccessmanycellpopulationsoftheimmunesystem.
The246celltypesofthemouseimmunesysteminthe816arraysoftheImmGencompendiumofferanunprecedentedopportunityfor studying the regulatory organization of hematopoiesis in thecontextofarichanddiverselineagetree.Becausethetranscriptional
Identification of transcriptional regulators in the mouse immune systemVladimir Jojic1,7, Tal Shay2,7, Katelyn Sylvia3, Or Zuk2, Xin Sun4, Joonsoo Kang3, Aviv Regev2,5,8, Daphne Koller1,8 & the Immunological Genome Project Consortium6
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
� aDVaNCE ONLINE PUBLICaTION nature immunology
r e s o u r c e
tobuildanobservationalmodelassociating578candidateregula-torswithmodulesofcoexpressedgenes.Wedefinedmodulesattwodifferentgranularities,with81largercoarse-grainedmodules,someofwhichwefurtherrefinedintosmallermoduleswithmorecoherentexpression; thisresulted in334 fine-grainedmodules.Themodelidentifiedmanyofthealreadyknownhematopoieticregulators, was supported through the use of a complementaryphysicalmodelandproposeddozensofpreviouslyunknowncan-didateregulators.Ourmodelprovidesarichresourceoftestablehypotheses for experimental studies, and the Ontogenet algo-rithmcanbeused todelineate regulation in thecontextofanycelllineage.
RESULTSTranscriptional compendium of the mouse immune systemThe ImmGen consortium data set1 (April 2012 release) consistsof816expressionprofilesfrom246celltypesofthemouseimmunesystem (Fig. 1 and Supplementary Table 1). The cell types spanall major hematopoietic lineages, including stem and progenitorcells,granulocytes,monocytes,macrophages,dendriticcells(DCs),natural killer (NK) cells, B cells and T cells. The T cells includemany types of αβ T cells, regulatory T cells (Treg cells), naturalkiller T cells (NKT cells) andγδ T cells. The ‘same’ cell type wasoftensampled fromseveral tissues, suchasbonemarrow, thymusandspleen.
Similarities in global profiles trace the cell ontogenyCorrelationsinglobalprofilesbetweensampleswerelargelyconsist-entwiththeknownlineagetree(Fig. 2).Ingeneral,theclosertwocellpopulationswereinthelineagetree,themoresimilartheirexpressionprofileswere(Pearsonr=–0.71;Supplementary Fig. 1).Formyeloidcells,profilesweresimilaroverall,withgranulocytesbeingtheleastvariable,DCsbeingthemostvariable(consistentwithDCsamples’beingobtainedfromdiversetissuesandtheirknowninherentdiver-sity10)andallmyeloidcellsbeingweakly similar to stromalcells.Conversely,lymphocyteshadlargerdifferencesbetweenlineages.NKcells,althoughtightlycorrelated,didshowweakersimilaritytoTcells,especiallyCD8+TcellsandNKTcells.Tcellswereveryheterogene-ous,whichpartlyreflectedthefinersamplingforthislineage.Stemcellsweremostsimilartoearlymyeloidandlymphoidprogenitors(S&Pgroup,Fig. 2),followedbypre-Bcellsandpre-Tcells,consist-entwithagraduallossofdifferentiationpotential.Asaresourceforstudyingeachlineage,weusedone-wayanalysisofvariancetodefinecharacteristicsignaturesofover-andunderexpressedgenesforeachofthemainelevenlineagescomparedwiththeexpressionofthosegenesinallotherlineages(Supplementary Table 2).
Coarse- and fine-grained expression modules in hematopoiesisTocharacterizethekeypatternsofgeneregulation,wenextdefinedmodulesofcoexpressedgenesattwogranularities(Supplementary Fig. 2a,b).Wefirstconstructed81coarse-grainedmodules(C1–C81;
SC.LTSL.BM
CLP.BM
preT.ETP.Th
SC.STSL.BM
NK.49CI–.Sp NK.49CI+.Sp NK.49H+.Sp NK.49H-.Sp
NK.MCMV1.Sp
proB.FrA.BM
proB.FrBC.BM
preB.FrC.BM
preB.FrD.BM
B.FrE.BM
B.T1.Sp
B.T2.Sp
B.T3.Sp
B.Fo.Sp
B.GC.SpB.FO.PC
preT.DN2.Th
T.DN4.Th
T.4int8+.Th
T.8SP69+.Th
T.4SP24int.Th
Tgd.Th
T.DP69+.Th
T.4+8int.Th
T.4SP69+.Th T.4FP3+25+.AA
T.DPbl.Th
T.DPsm.Th
T.DP.Th
Lu
NK.SpT.8SP24int.Th
T.8SP24-.Th
T.8Nve.SpT.8Nve.LN T.8Nve.MLN T.8Nve.PP
T.8Mem.LN T.8Mem.Sp T.8Nve.Sp.OT1
T.8Mem.Sp.d45
T.8Mem.Sp.d106
T.4SP24-.Th
T.4Nve.Sp T.4Nve.PPT.4Nve.MLNT.4Nve.LN
T.4Mem.LN
T.4.Pa.BDC
T.4.PLN.BDC
T.4.LN.BDC
T.4Mem.Sp
T.4Mem44h62l.Sp T.4Mem44h62l.LN
NKT.44–NK1.1–.Th
NKT.44+NK1.1+.Th
NKT.44+NK1.1–.Th
NKT.4+.Sp NKT.4+Lv
NKT.4–.Sp
NKT.4+.Lu
NKT.4–.Lv
NK.H–.MCMV1.SpNK.H+.MCMV1.Sp
NK.H+.MCMV7.SpNK.MCMV7.Sp
B CTRL
T.8Eff.Sp.d5T.8Eff.Sp.48hr
T.8Eff.Sp.d8
T.8Eff.Sp.d10
T.8Eff.Sp.d6
T.8Eff.Sp.d8
T.8Eff.Sp.d15 T.8Eff.Sp.d15
MLP.BM
IIhilang+103+11bLO.SLN
IIhilang+103–11b+.SLN
IIhilang–103–11b+.SLN
IIhilang–103–11bLO.SLN
LC.Sk
SC.LTSL.FL
SC.STSL.FL
MLP.FL
proB.FrA.FL
proB.FrBC.FL
preB.FrD.FL
B.FrE.FL
B.1a.PC
Fetal Adult
T CTRL
RP.Sp
Thio5.II+480lo.PC
Thio5.II+480int.PC
Thio5.II–480int.PC
Thio5.II-480hi.PC
II–480hi.PC
4+
8–
8+
8–4–11b–
MLN Th Sp
6C+II+
6C–II–
6C+II–
6C–II+
BI LN
6C–IIint
B.MZ.Sp B1a.SpB1b.PC
8–4–11b+
SLN
BM
Microglia.CNS
SC.CMP.BM
SC.GMP.BM
BMBM
T.8Mem.Sp.d100
ARTH.BM
ARTH.SYNF
Bl
THIO.PC
URAC.PC
B.FO.LN B.FO.MLN
LvLu.LN IC
SC.MPP34F.BM
SC.ST34F.BM
SC.MEP.BM
preT.DN2-3.Th
T.ISP.Th
PreT.DN3-4.Th
T.4FP3+25+.LN T.4FP3+25+.Sp
T.8Eff.Sp.12hr
T.8Eff.Sp.24hr
T.8Eff.Sp.d6
LisOva.OT1 VSVOva.OT1
103+.BR103-.BR 103-.Sp
T.8Mem.d20
Granulocytes
Monocytes
Macrophages
103+.11b+
103+.11b–
SI S.SI
Vg2+.Sp
Vg2+.act.Sp
Vg1+Vd6–24–.Th
Vg1+Vd6–.Th
Vg1+Vd6–24+.Th
Tgd.Vg1+Vd6+24–.Th
Vg1+Vd6+.Th
Vg1+Vd6+24+.Th
Vg2+24–.Th
Vg2+24+.Th
Tgd.Sp
Vg2–.Sp
Vg2–.act.Sp
Vg5–.IEL
Vg5–.act.IEL
Vg5+.IEL
Vg5+.act.IEL
T.8Mem.Sp.d45
T.4FP3–.Sp
Putative edges
Nonintersecting edges
B.FrF.BM
CLP.FL
CD8+ Tcells
CD4+ Tcells
Treg cells
NKT cells
γδ T cells
DCsEp.MEChi
Fi
SLN
FRC
LEC
BEC
MLNSk
ST.31-38-44-
Th
Stromal cells
NK cellsB cellsDC MF-MO T cellsGN
SC.LT34F
SC.MDP.BM
SC.CDP.BM
103–.11b+24+.Lu
103-.11b+.Lu
103-.11b+.SI
103-.11b+.Salm.SI
11cloSer.SI
11cloSer.Salm.SI
II+480lo.PC
DC.103-11b+F4/80lo.Kd
103-.11b+
pDC.8–
pDC.8+
Figure 1 Mouse cell populations in the ImmGen compendium. Lineage tree of the hematopoietic mouse cell types profiled by the ImmGen Consortium (nomenclature and markers for sorting, Supplementary Table 1). Some samples of stem cells, progenitor cells and B cells were obtained from adult and fetal liver. Stromal cells (box, bottom right) were also measured as part of ImmGen but are not part of the lineage tree. Color in bar beneath matches color of branches in tree above. GN, granulocyte; MF-MO, macrophages-monocyte. Adapted from ref. 4.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION �
r e s o u r c e
Supplementary Fig. 2c–h and Supplementary Table 3) and thenfurtheridentifiedforeachcoarse-grainedmoduleasetofnestedfinemodules(Supplementary Fig. 2a),whichresultedin334finemod-ulesspanning7,965genes(F1–F334;Supplementary Table 4).Coarsemoduleshelpeduscapturethemechanismsthatcoregulatealargersetofgenesinonelineage,whereasfinemodulesmayhelpintheidenti-ficationofdistinctregulatorymechanismsthatcontrolonlyasmallersubsetofthesegenesintheotherlineage(s).Manyofthemodulesshowedenrichmentforcoherentfunctionalannotations,cis-regulatoryelements(Supplementary Table 5)andbindingoftranscriptionfac-tors(Supplementary Table 6andSupplementary Note 1),includingbindingsitesforfactorsknowntoactasregulatorsinthelineage(s)inwhichthemodule’sgenesareexpressed(Supplementary Note 2).Allmodulesandtheirassociatedenrichmentscanbesearched,browsedand downloaded at the ImmGen portal (http://www.immgen.org/ModsRegs/modules.html).
Most coarse-grained modules (48 of 81 modules; 4,478 of 7,965genes) showed either lineage-specific induction (Supplementary Figs. 2c and 3) or ‘pan-differentiation’ regulation (Supplementary Figs. 2de, 4 and 5). Inaddition,6moduleswere ‘mixed-use’acrosslineages (Supplementary Figs. 2f and 6), 8 were stromal specific(Supplementary Fig. 2g)and19hadexpressionpatternsthatdidnotfallintothosecategories(Supplementary Figs. 2h and 7).Lineage-specificrepressionwasrare(onlyinC53(Bcells)andC17(stromalcells)).
Ontogenet: reconstructing lineage-sensitive regulationWenextdevelopedanewalgorithm,Ontogenet,todelineatetheregu-latorycircuitsthatdrivehematopoieticcelldifferentiation.Ontogenetaimstofulfillthefollowingbiologicalconsiderations:criterion1,theexpressionofeachmoduleofgenesisdeterminedbyacombinationofactivatingandrepressingtranscriptionfactors;criterion2,theactivityofthosefactorsmaychangeindifferentcelltypes(forexample,factorAmayactivateamoduleinonelineagebutnotinanother,evenifAisexpressedinbothlineages);criterion3,theidentityandactivityofthefactorsthatregulateamodulearemoresimilarincellsthatareclosetoeachotherinthelineagetree(forexample,fromthesamesublineage)thanin‘distant’cells(forexample,fromtwodifferentsublineages),inaccordancewiththegreatersimilarityinexpressionprofilesofclosercelltypes(Supplementary Fig. 1);andcriterion4,masterregulatorsofalineage(forexample,GATA-3forTcells)areactiveacrossthesublineages,butthesubtypescanalsohaveadditional,morespecificregulators(forexample,Foxp3forTregcells).Theformershouldbecapturedassharedregulatorsofacoarsemoduleanditsnestedfinemodules,whereasthelatterregulateonlyparticularfinemodules.
Ontogenetreceivesasinputthegene-expressionmodule,thelineagetreeandtheexpressionprofilesofapredesignatedsetof‘candidateregulators’(transcriptionfactors,chromatinregulatorsandsoon).It then associates each module with a combination of regulators(criterion1above),wherebyeachregulatorisassignedan‘activityweight’foreachcelltypethatindicatesitsactivityasaregulatorforthatmoduleinthatcell(criterion2above).Theregulatoractivityis
attheproteinlevelbutisinferredsolelyfromtranscriptabundance.FollowingtheapproachintheLirnetmethodforregulatory-networkreconstruction6,theactivity-weightedexpressionoftheregulatorsiscombinedinalinearmodeltogenerateapredictionofamodule’sgeneexpressionineachcelltype(Fig. 3).Inthismodel,theexpres-sionofthemodule’sgenesinagivencelltypeisapproximatedbythelinearsumoftheregulators’expressioninthatcelltypemultipliedbyeachregulator’sactivityweightinthatcelltype.Asaresult,themodelmakespredictionssuchas“inpreBcells,Module1isactivatedbytranscriptionfactorsAandBandisrepressedbyfactorC,whereasinBcells,factorsAandCarenolongeractive(evenifthefactorsareexpressed),andModule1isactivatedbyBandD.”Ourmodelassumesthatallgenesinthesamemoduleareregulatedinthesameway.Thisisessentialforstatisticalrobustness,althoughitcomesatthecostofmissingsomegene-specificexpressionpatterns.Thefinemodulesletusexaminesubtlerexpressionpatternssharedbyfewergenesbutaremoresusceptibletonoise.
AlthoughOntogenetreconstructsapotentiallydifferentregulatoryprogramforeachcelltype,asreflectedbythecell-specificactivityweightsofeachregulator,itisgearedtowardmaintainingthesameactivityacrossconsecutivestagesindifferentiation(criterion3).Thisisachievedbypenalizingchangesintheactivityweightsoftheregula-toryprogrambetweenacelltypeanditsprogenitor.Thefine-grainedmodulesderived fromacoarse-grainedmodule ‘inherit’ thesameregulatorsandactivityweightsthatwereinferredfortheircoarse-grainedmodule(whilepossiblygainingadditionalregulators;crite-rion4).Collectively,weuseanoptimizationapproachthatconstructsanensembleofregulatoryprogramsthattrytoachievethefollowinggoals:eachregulatoryprogramexplainsasmuchofthegene-expressionvarianceinthemoduleaspossible;theregulatoryprogramsremainassimpleaspossible;regulatoryprogramsareconsistentacrossrelatedcelltypesintheontogeny;andfinemoduleshaveregulatorssimilartothoseofthecoarsemodulestowhichtheybelong.
Notably,theapproachusedbeforetoidentifycombinationsofregu-lators(forexample,linearregressionregularizedwiththeElasticNet
GN
MF
MO
DC
S&P
PROB
B
NK
PRET
T4
T8
ACTT8
NKT
gdT
Stromal
NKBDC T8GN MF MO S&P
PROB
PRET T4 NKTACTT8 gdT
Stromal
−1.0
−0.8
−0.6
−0.4
−0.2 0
0.2
0.4
0.6
0.8
1.0
Correlation
Figure 2 Related cells have very similar expression profiles. Pearson correlation coefficients (purple, positive; yellow, negative; white, none) for each pair of profiled cell types, calculated for the 1,000 genes (of the 8,431 unique expressed genes) with the highest s.d. value of all samples. Samples are sorted by breadth-first search on the tree (Fig. 1), with stromal cells at the lower or right end. Black vertical and horizontal lines delineate major lineages according to labels along left margin and beneath (color in bar beneath matches colors in Fig. 1). S&P, stem and progenitor cells; PROB, pre-B cells and pro-B cells; T4, CD4+ T cells; T8, CD8+ T cells; ACTT8, activated CD8+ T cells; gdT, γδ T cells.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
� aDVaNCE ONLINE PUBLICaTION nature immunology
r e s o u r c e
penalty6,11) assumed that regulatory activ-ity (andhenceactivityweight) is the sameacrossallcelltypes.Thus,ifaregulatorwasexpressedsimilarlyintwodifferentcells,itwasdeemedtobeactivetothesameextent.Thisviolatestheknowncontextspecificityofregulationincomplexlineages.Conversely,allowingthealgorithmtoconstructasepa-rate regulatory program for each cell typeindependentlyisimpracticalandalsoignorestheexpectedsimilaritybetweenrelatedcelltypesinthelineageintermsofgeneregulation.Ontogenetsolvesthisproblembyleveragingthelineagetreewheninferringtheregulatoryconnectionsandtheiractivity,suchthatthemodule’sgenesaremorelikelytoberegulatedinasimilarwayinrelatedcelltypes.
Ontogenet regulatory model for mouse hematopoiesisWeappliedOntogenettothe81coarse-grainedmodulesand334finemodules,alineagetreeconsistingof195celltypesand580candidateregulators.TheOntogenetmodelidentified1,417regulatoryrelations(1,091activating,317repressingand9mixed)between81coarse-grainedmodulesand480uniqueregulators(Fig. 4,Supplementary Fig. 8andSupplementary Table 5).Onaverage,therewere17regula-torspercoarse-grainedmodule,andthreecoarse-grainedmodulesperregulator.Asdeterminedbycross-validation,Ontogenetconstructsregulatoryprograms thatare strictlybetteratpredictingnewandpreviouslyunknownexpressiondatathanthoseobtainedbyElasticNet6,amethodthatdoesnotusethetreeandhasfixedactivityweights(Supplementary Fig. 9andSupplementary Note 3).
Inmostcases(59%),aregulator’sactivityweightsvariedindiffer-entcelltypes(‘frequentlychanging’),reflectiveofcontext-specificregulation(Supplementary Fig. 10).Whenweprunedregulatoryinteractionswhosemaximaleffect(definedastheproductofactiv-ityweightandexpression)waslow,weobtainedasparsernetwork,in which ‘pan-differentiation’ and lineage-specific modules werecontrolledmostlybydistinctregulators(Fig. 5),whereasmixed-usemodulessharedregulatorswithmodules intheotherclasses.Theregulatorymodelassociating334finemodulesand554regulatorsin6,151interactionshadqualitativelysimilarpatterns,exceptforhav-ingmoreregulatorswithmixedactivity(thatis,aregulator’sactivityweightsfrequentlychangedinsomemodulesandremainedconstantinothers),probablyreflectiveofboththegreaternumberofinterac-tionsandthefinerregulatoryprogram(Supplementary Fig. 10andSupplementary Table 7).Thisrichregulatorymodelfordifferentia-tionofthemouseimmunesystemidentifiedmanyknownregulatoryinteractionsandsuggestednewregulatory interactions inspecificimmunologicalcontexts.
Ontogenet prediction of known regulatory interactionsManyof theregulatory interactions identifiedbyOntogenetwerealready known, which supported the accuracy of our model. Forexample,amongindividualregulators,PU.1(encodedbySfpi1)wasselectedasaregulatorofthemyeloidandBcellmoduleC25(and13ofits15finemodules);C/EBPα(encodedbyCebpa)regulatesthemyeloidmodulesC24,C30andC74,themacrophagemoduleC29,andmanymyeloidfinemodules;C/EBPβ(encodedbyCebpb)regu-latesthemyeloid-specificmodulesC25andC30andmanymyeloidfinemodules;MafB(encodedbyMafb)regulatesthemacrophage-specificmodulesC29,F128andF131;STAT1regulatestheinterferon-responsemoduleC52;T-bet(encodedbyTbx21)regulatestheNKcellmoduleC19andNKTcellmoduleF288;andCIITA(encodedbyCiita)regulatestheantigen-presentingcellmoduleF136.
Furthermore, the combination of regulators associated with asinglemodulewasalsoconsistentwithknownregulatoryrelations.For example, the B cell module C33 is regulated by the knownB cell regulators Pax5, EBF1, POU2AF1 and Spi-B (Fig. 4); theT cell module C18 (Supplementary Fig. 8) is regulated by theknown T cell regulators Bcl-11B, GATA-3, Lef1, TOX and TCF7;theγδTcellmoduleC56isregulatedbytheknownγδTcellregula-torsPLZF(ZBTB16),Sox13andId3,allalsoinvolvedinNKTcelldevelopmentandfunction; theNKTmoduleF188 isregulatedbyGATA-3,T-betandPLZF;andfinemodulesF150andF152,inwhichtheexpressionoftheirmembergenesbyCD8+DCsishigherthanthatofCD4+DCs,areregulatedbyIRF8(butnotIRF4),consistentwiththeknownroleofsubset-selectiveexpressionIRF4andIRF8inDCcommitment12.
Ontogenet’spredictionswerealsosupportedbytheirsignificantoverlapwiththosebasedonenrichmentofcis-regulatorymotifsandChIP-basedbindingprofilesinthemodules(Supplementary Tables 5 and 6),whichsupportedtheideaofdirectphysicalinteractionbetweenaregulatorandthegenesinthemodulewithwhichitwasassociatedbyOntogenet(Supplementary Table 8).Forexample,27oftheasso-ciationsbetweenaregulatorandacoarsemoduleweresupportedbyenrichmentforcis-regulatorymotifs(P=2.6×10−5(hypergeometrictest fortwogroups)andP<1×10−5(permutationtest)),suchas
ba
Target module mean
Global activatorExpression
Activity
Context-specific regulator
Global repressor
Target module expression
Expression
Activity
Expression
Activity
Samples
−2 0 2
Expression (fold)
Repression None Activation
Activity
= Regulator expressionRegulator activity
Irf5 expression
Gata3 expression
Tox expression
Trim14 expression
IRF5 activity
GATA-3 activity
TOX activity
TRIM14 activity
+
+
+
Mean module expression
= ×
×
×
×
×
Σ
Figure 3 Overview of Ontogenet. (a) Use of the regulator expression profile (blue-red; key below) and activity profile (orange-purple; key below) to demonstrate how a type of regulator (bottom) can ‘explain’ the expression of a module (top): a regulator may have a uniformly positive activity weight across the lineage (constitutive activator; top), a uniformly negative activity weight (constitutive repressor; middle) or variable activity weights (context-specific regulator; bottom). (b) Mean expression of a module (top) calculated as a linear combination of regulator expression (blue-red; left) and activity (orange-purple; right).
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION �
r e s o u r c e
theGATA-2motifinthehematopoieticstemcell(HSC)moduleC40,andthePU.1(SFPI1)motif in myeloid cell module C25. TheChIP profiles supported the prediction of21 regulator–coarse module associations(P=2.2×10−5(hypergeometrictestfortwogroups)andP<1×10−5(permutationtest)),suchasthebindingofC/EBPαandC/EBPβinthemyeloidcellmoduleC24andthebind-ingofEBF1intheBcellmoduleC33.
Althoughthoseoverlapswerestatisticallysignificant,theyneverthelessalsoindicatedthatthepredictionsofmostregulatoryinter-actionswerenotsupportedbyenrichmentforknowncis-regulatorymotifsoravailabletranscriptionfactor–bindingdata,andviceversa.Therearethreereasonsforthis.First,assigningscoresforbindingsitesandtheirenrichmentisaprocessthatishighlypronetofalse-negativeresults;thisisparticularlylikelytooccurformuchsmallerfinemodules.Second,themajorityofregulatorschosenbyOntogenetdonothaveacharacterizedbindingmotif(60%ofregulators;334of554)orChIPbindingdata inanycell type (90%of regulators;497of554).Suchregulatorscanbenominatedonlybyanexpres-sion-basedmethod,suchasOntogenet,andshouldnotbeconsideredfalse-positiveresultsofourmethod.Third,inmanycasesinwhichwedofindenrichmentforacis-regulatoryelementorbindingprofilefor(forexample) transcription factorA inmoduleB(300of551cis-regulatory interactions (54%); 52 of 90 ChIP-based interactions(57%)),thetranscriptionfactor(A)anditstargetmodule(B)showlittleornocorrelationinexpression(absolutePearsonr<0.5).Insomecases,thisisduetoafactorthatisnotitselftranscriptionallyregulated(areal ‘false-negative’resultofOntogenet),butinmanyothercasesthefactorprobablycontrolsthesetargetsinanothercelltype not measured in our study (and hence is not in fact a false-negativeresultofOntogenet).
Afewknownregulatorsofdifferentiationoftheimmunesystem13werenotidentifiedbythemodelforvariousreasons.Tal-1andBMI1didnotmeettheinitialfilteringcriteria,astheywereexpressedonlyinHSCs,andhencewerenotprovidedasinput.GFI1wasnotassigned
asaregulatorinstemandprogenitorcellsorgranulocytesbecauseitsexpressionwashighestinpre-Tcellsandwasonlysparseandinter-mediateinstemandprogenitorcellsandgranulocytes.E2A(encodedbyTcf3)wasnotidentifiedasaTcellregulator,perhapsbecauseitwasnotspecificallyexpressedinTcellsandhadlowexpressioningeneral,possiblybecauseofabadprobeset.XBP1wasnotidentifiedasaBcellregulatorbecauseithadrelativelylowexpressioninBcellsinourarraysandhadhigherexpressioninmyeloidcells.
Thereidentificationofknownregulatorslendssupporttothemanypreviouslyunknownregulatoryinteractionsinthemodel.Ofthe475regulatorsthatOntogenetassociatedwithlineage-specificmodulesor‘pan-differentiation’modules,atleast175(37%)werecompletelyunknowninthiscontext.Amongthose,forexample,KLF12waspre-dictedtobearegulatoroftheNKcellmoduleC19butwasnotassoci-atedbeforewiththeregulationofNKcells.GATA-6waspredictedtobearegulatorofthemacrophage-specificmodulesC31,C50andC58butwasnotassociatedbeforewithmacrophages.Thatisinagreementwiththemuchlowernumberofgranulocyte-macrophagecoloniesgener-atedbyembryoidbodiesofGATA-6-deficientmice14.Finally,ETV5waspredictedbythemodeltobearegulatoroftheγδTcellmodulesF287andF289,apreviouslyunknownrolediscussedbelow.
Context-specific regulation underlies mixed-use modulesContext-specificregulation,inwhichthesamesetofgenesisregu-latedbyonesetofregulatorsinthecontextofonelineageandby
POU2AF1
Pax5
EBF1
Spi-B
RFX5
PRDM9
HDAC10
c
b
aF175
F176
F177
F178
F179
F180
F181
Bcl7a
RestBlk
Cd19Cd79aCd79bEbf1Pax5
Pou2af1Vpreb3Cd22Cd55
Fcer2aFcrl1Fcrl5Il9r
Igll1Vpreb1
BlnkIl12a
Tcf4
Siglecg
Spib
NKBDC T8GN MF MO S&P PREB PRET T4 NKT ACTT8 GDT
Regulator expression
Regulator activity weight
Pou2af1
Pax5
Ebf1
Spib
Rfx5
Prdm9
Hdac10
GN MF DCs NK T-reg NKTT4T8
gdT
MO
d
POU2AF1
−2 0 2
Repression None Activation
Expression (fold)
ActivityPax5EBF1POU2AF1Spi-BRFX5HDAC10
Figure 4 Ontogenet regulatory model for coarse–grained module C33. (a) Module C33: mean-centered expression (red-blue; key, bottom right) of the module’s genes (rows) in each cell (column); major lineages are delineated by dashed vertical lines (which correspond to color bar beneath c; matches colors in Fig. 1); fine modules F175–F181 (right margin) nested within C33 are delineated by thin horizontal lines; left margin, examples of genes in module. This module contains some typical B cell genes, including Cd19, Blnk, Ebf1 and Cd79a. (b) Regulator expression, presented as mean-centered expression (red-blue color bar, below c) of the regulators (rows) assigned by Ontogenet to module C33. (c) Regulator activity weight (orange-purple; key, bottom right) assigned by Ontogenet for each of the regulators from b in each cell type. (d) Projection of mean-centered mean expression (blue, low; red, high) of module C33 onto the hematopoietic tree (below, differentiated populations on that tree); arrowheads indicate ‘edges’ (differentiation steps) at which the activity weight of selected inferred regulators (labeled in diagram) changes.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
� aDVaNCE ONLINE PUBLICaTION nature immunology
r e s o u r c e
anothersetofregulatorsinthecontextofanotherlineage,hasbeenreportedinselectedcases,suchastheregulationofRag2byGATA-3inTcellsandbyPax5inBcells15.TheabilityofOntogenettoidentifydifferentregulatoryprogramsforthesamemoduleindifferentpartsofthelineagetreecanhelpdelineatetheregulatorymechanismsthatunderlie‘mixed-use’modulesexpressedinmorethanonelineage.Forexample,moduleC70isinducedbothinTregcellsandsomemyeloidpopulations.Eachactivationeventisassociatedwithdifferentregu-latorsinourmodel:Foxp3inCD4+Tcells(itselfamemberofthemodule,althoughnotexpressedintheDCsubsets),andPIAS3,HSF2andINSM1inDCs.Inanotherexample, thefine-grainedmoduleF300isindependentlyinducedinbothmatureBcellsandTcells.Althoughsomeofitsregulatorsarethemselves‘mixed-use’inbothlineages,othersareBcellspecific(ZFP318,RFX5andCIITA)orTcellspecific(EGR2).
Regulatory recruitment and ‘rewiring’ during differentiationMostregulatoryrelationsidentifiedbyOntogenetweredynamic,asreflectedbythechangeintheirassociatedactivityweightsduringdiffer-entiation.Thischangeprovideda‘bird’s-eye’viewofthe‘recruitment’
and‘disposal’ofregulators(Fig. 6a).Tocharacterizethis,foreachcelltype,weidentifiedalltheregulatoryinteractionswhoseactiv-ityweightchanged(increasedordecreased)betweenthatcelltypeanditsimmediateprogenitor(Supplementary Table 9),aswellastheuniqueregulatorsandmodulesinvolvedinthoseinteractions.Inthisway,weidentifiedmodulesandregulatorsthatwererecruitedandstrengthened(activityweightgreaterthanthatofitsprogenitor)orweredisposedofandweakened(activityweightlowerthanthatofitsprogenitor)ateachdifferentiationstep.Notably,recruitment(ordisposal)ofregulatorsdoesnotnecessarilymeanthattheregulators’expressionchangesbutthatthemodelsuggeststhattheirregulatoryactivityhaschangedforthissetoftargets.Forexample,duringthedifferentiationofCD8+Tcellsfromcommonlymphoidprogenitors,61regulatoryinteractionswererecruited,involving34modulesand49regulators,only15ofwhichhavebeenassociatedbeforewithTcelldifferentiation.Inparticular,forthedifferentiationstepfromdouble-negative(CD4−CD8−)stage4Tcelltoimmaturesingle-positive(CD4+orCD8+)Tcell,OntogenetindependentlyidentifiedthepreviouslyreportedinvolvementofMXD4,BatfandNFIL3andnewlyidenti-fiedtheinvolvementofRCBTB1,PIAS3andITGB3BP(Fig. 6b,c).
TRPS1
SPIC
NFE2L2
JDP2
ZFP691
POU2AF1
SOX5
RFX5
SPIB
EBF1
PRDM9
PAX5
BCL11B
HLF
76
65
18
19
62
75
31
ETS1
AES
33
61
1628
27
41
74
24
TEAD2
PA2G4
ACTL6A
23
RUVBL1
PHF5A
HMGB2
ASF1B
FOXM1
11
THRA
47
42
HMGA2
PHB2RUVBL2
MECOM
PLAG1
CNBP
9
6
15
55
SOX7
7
3
4010
504
14
CBX5
DNMT1
GATA1
HSF2
MYBL2
43
UHRF1
12
ATAD2
RB
BP
7
MO
RF4
L2
NP
M1
BUD31
CHURC1
HOXA9
SU
V39
H2
HDAC277
443029
FOXN2KLF6
32
EPAS1
AHR
26
48
58
CEBPBHDAC5
IRF6
ZEB1
KLF12IRF8RCOR2
PPARGTRERF1
GATA6
RARB
CREG1NR1H3
CEBPAETV1 KLF9 MAFB M
AF
CH
D3
EY
A1
TCFE
BID
2N
FATC
2G
M99
07
AR
ID4A
CH
D7
FLI1
WH
SC
1L1
ME
IS1
8
54 67
BH
LHE
40
34 5259
STA
T2
DTX
3LIR
F7 AR
SO
X13
ZBTB
16
NFE
2L1
STA
T1
NP
AS
2
71
FHL2
SP4
RN
F2
ID3
70
56
BHLHE41
NCOA7
25
64
ETV3
HIC1
EGR2TRIM25MXD4PIAS3
HIF1AFOXP3INSM1IRF5
MITF
HIV
EP
1
EYA2RUNX3STAT4EZH1RXRAKCNIP3
EOMES
GATA2HMGN3
TCFECWWTR1
IRF3PLAGL1
IRF4RBPJMYCL1
TEAD1RUNX2VDRNR4A2
DACH1
KDM2B
CEBPE
BATF3
PADI4NFE2
FUBP1
PBX1
KLF8
RELB
TBX21
Regulator
Lineage-specific module (MF)
Lineage-specific module (DC)
Lineage-specific module (GN)
Lineage-specific module(myeloid cell)
Lineage-specific module (NK cell)
Lineage-specific module (T cell)
Lineage-specific module (B cell)
Module downregulated withdifferentiationModule upregulated withdifferentiationMixed-use modules
Regulatory interaction
PRDM1
AFF3
ZHX2
BACH2NR4A3
DTX1
MBD2
CREB3L1
NR3C2
ZFP263NCOA3
SREBF2ARID3B
SMAD3ZFP318
SAP18
Figure 5 Ontogenet regulatory model for coarse-grained modules. Lineage-specific modules (inner circle; colors as in Fig. 2, except myeloid induced modules (dark purple)), ‘pan-differentiation’ induced (red) and repressed (gray) modules and mixed-use modules (yellow) and their Ontogenet assigned regulators (outer circle; cream) with regulatory interactions with a maximal effect (absolute activity weight × expression) >1. Lines (regulatory interactions) connect each regulator to the module(s) it regulates.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION �
r e s o u r c e
Inanotherexample,duringthedifferentiationstepthatleadstoNKcells,theNKcellmoduleC19wasassignedtheknownNKcellregula-torsEomesandT-betasactivators.BothEomesandT-betwerealsorecruitedasrepressorsatthisdifferentiationstepinothermodules.ThedifferentiationstepthatleadstoTregcellsrecruitedtheTregcellmoduleC70anditsknownregulatorsFoxp3andCREM(whichhasbeenproposedasaTregcellregulator16).Notably,becauseHSCshavenoparentinourmodel,regulatorsactiveinHSCswillbenotedonlywhentheyarenolongerusedatlaterpoints(forexample,HOXA7andHOXA9werenolongerusedasactivatorsatthemultilymphoid
progenitorstage).Thefirstdifferentiationstepwithactivatorrecruit-mentisthestepthatleadstomultilymphoidprogenitors,atwhichMEIS1isrecruitedtomoduleC42.MEIS1is laterno longerusedbyC42inTcells,inagreementwiththereportedmethylationandsilencingofthegeneencodingMEIS1duringdifferentiationtowardTcells17.
Ranking of lineage activators and repressorsTheactivityweightsassignedforeachregulatorateachdifferentiationpointallowedustoidentifyandrankregulatorsaslineageactivators
100
200
300
400
500
600
700
800
NKBDC T8GN MF MO S&P PROB PRET T4 NKTACTT8 gdT
d
GN MOMF DC B abTNK NKT gdT
CLP.B
M
preT
.ETP.T
h
preT
.DN2.
Th
preT
.DN2−
3.Th
preT
.DN3−
4.Th
T.DN4.
Th
T.ISP.T
h
T.DP.T
h
T.DPbl.
Th
T.DPsm
.Th
T.DP69
+.Th
T.4int
8+.T
h
T.8SP69
+.Th
T.8SP24
int.T
h
T.8SP24
−.Th
T.8Nve
.PP
T.8Nve
.MLN
T.8Nve
.LN
T.8M
em.L
N
T.8Nve
.Sp
T.8M
em.S
p
Activa
ted
αβT C
ell
HES1TRP73
IRF9SCMH1
CBX7TCF15
RUNX2ETS2
KLF15PHF1
SMAD7PIAS3
TBX21RCBTB1ZBTB16
TAF11KLF15
ZBTB16KLF3
CLOCKZFP367
AFF1WHSC1ZFP367ZBTB38TRIM14
SP4MAF
TWIST1KLF15
POU4F2FOXE3AEBP1KLF15SOX4KLF15ATF6
MYNNKLF15
TRIM14ZHX2SOX9MXD4BATF
RCBTB1PIAS3
ITGB3BPMXD4NFIL3CUX1
HOXD8SMARCD2
ATF6STAT4BCL6BSMYD1
MORF4L2LMO4
POU6F1RORA
GFI1
CLP.BM
preT.ETP.Th
preT.DN2.Th
T.DN4.Th
T.4int8+.Th
T.8SP69+.Th
T.DP69+.Th
T.DPbl.Th
T.DPsm.Th
T.8SP24int.Th
T.8SP24-.Th
T.8Nve.SpT.8Nve.LN T.8Nve.MLN T.8Nve.PP
T.8Mem.LN T.8Mem.Sp T.8Nve.Sp.OT1
Activated CD8+ T
preT.DN2-3.Th
T.ISP.Th
PreT.DN3-4.Th
b
a
HES1, TRP73, IRF9, SCMH1, CBX7, TCF15, Runx2, ETS2,KLF15, PHF1, Smad7, PIAS3, 42; 69; 67; 77; 54; 38; 71; 57; 37; 45
GFI1, TBX21, RCBTB1, ZBTB16(2), TAF11, KLF15(2), KLF3, CLOCK, ZFP367(2), AFF1, WHSC1, ZBTB38, TRIM14, SP4, MAF, TWIST1, 68; 71; 52; 56; 35; 62; 6778; 14; 77; 23; 15; 65; 17; 66
ZHX2, Sox9, 18; 37
KLF15(3), POU4F2, Foxe3, AEBP1, Sox4; 39; 79; 38; 66; 77
ATF6, MYNN, KLF15, TRIM14, 71; 78; 73; 34
MXD4(2), BATF, RCBTB1, PIAS3, ITGB3BP, NFIL378; 34; 18; 71; 75; 81
CUX1, HOXD8, SMARCD2, 77; 69; 78
ATF6, 78
STAT4, BCL6B, SMYD1, 59; 73; 19
c
MORF4L2, 20
RORA, 19
LMO4, 44
POU6F1, 4
Reported T cell regulatorNot reported in T cells
SC
SC.COMP.BM CLP.BM
PHB2, CNBP, TRIM28, RUVBL2, HLF, LMO2, PA2G4, NPM1, RBBP7, PHF5A
ENO1PADI4C/EBPβ BACH1XBP1NCOA4STAT6NFE2KLF6TCFE3
CREG1XBP1C/EBPαLMO2ENO1C/EBPβ CNBPBACH1MAFNCOA4
C/EBPβ ENO1XBP1IRF5SFPI1BACH1TRPS1ATF6MEF2ALMO2
RELBETV3KLF6IKBαRBPJXBP1CIITAINSM1RELNFE2L1
POU2AF1PAX5HMGB2CBX5UHRF1EBF1FUBP1SPIBPA2G4ASF1B
EOMESTBX21ETS1ENO1SMAD3AESKLF2ID2SGM9907STAT6
AESTCF7ENO1LEF1BCL11BTRIM28NPM1ETS1FUBP1SP100
GATA3AESZBTB16ID2ENO1SP100NPM1TBX21STAT1LEF1
MBD2AESSTAT4DTX1ETS1ENO1HIC1GATA3TCF7SREBF2
Fre
quen
tly c
hang
ing
regu
lato
ry in
tera
ctio
ns
Repression None Activation
Activity
Figure 6 Changes in activity weights across the hematopoietic lineage tree. (a) Activity weights in each cell type (columns; correspond to color bar below, which matches colors in Fig. 1) for every ‘frequently changing’ regulatory interaction between a regulator and a coarse-grained module: orange, positive (activation) activity weight; purple, negative (repression) activity weight; white, 0 (no regulation). (b) Enlargement of boxed area in a of ‘frequently changing’ interactions for only those activating interactions recruited in the CD8+ T cell lineage. (c) Known and previously unknown regulators recruited in the CD8+ T cell lineage, including the CD8+ T cell lineage branch (squares, cell types; lines (‘edges’), differentiation steps); right, regulators recruited along each differentiation step and their associated modules (red, regulators reported to have a role in T cells; black, not reported in T cells). Numbers in parentheses indicate activity weight changes for the regulator on this edge, if >1. (d) Simplified ImmGen tree (cell type colors correspond to those in Fig. 1) with Ontogenet-inferred lineage regulators (red and black as in c).
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
� aDVaNCE ONLINE PUBLICaTION nature immunology
r e s o u r c e
and repressors on the basis of the entire model (Fig. 6d andSupplementary Table 10).Inthiswaywecorrectlycapturedmanyknownregulatorsofeachlineageamongthetop-rankedactivators.Forexample,ourmodelassociatedc-Myc,N-Myc,GATA-2andMEIS1withstemandprogenitorcells;Bcl-11B,TCF7andGATA-3withαβTcells;POU2AF1,Pax5,EBF1andSpi-BwithBcells;Eomes,T-betandSmad3withNKcells;andGATA-3andPLZFwithNKTcells.Inaddition,themodelmademanypredictionsoflineageregulatorsnotpreviouslyassociatedwiththoselineages,suchasthefollowing:instemandprogenitorcells,HLF;ingranulocytes,DACH1(reportedtoregulatecell-cycleprogressioninmyeloidcells18),Bach1andNFE2;inmacrophages,CREG1;inDCs,ATF6,ETV3,SKIL,NR4A2andNR4A3(showntobeinducedinviralinfectedDCs19,20);inmono-cytes,POU2F2(Oct2;reportedtobeupregulatedduringmacrophagedifferentiation21)andKLF13(aregulatorofBcellsandTcells22withhigherexpressioninmonocytes);inBcells,ZFP318;andinNKcells,ELF4(Gm9907;showntocontroltheproliferationandhomingofCD8+Tcells23).Notably,althoughthis‘pan-model’analysisisuseful,itcandeemphasizethecontributionofimportantregulatorscapturedbythemodelinamorenuancedway—forexample,asactingonlydur-ingalimitedwindowofdifferentiationbutnotpresentinthematurestage.Thosearecapturedbytherecruitmentanddisposalanalysispresentedabove(Fig. 6).
Finally,bycountingthechangesinactivityweightthatoccur(acrossallregulatorsandmodules)ateachdifferentiationstep(‘edge’),wecanidentifythosedifferentiationpointsatwhichregulationis‘rewired’mostsubstantially(Supplementary Fig. 11).Forexample,19regula-torswererecruitedtocoarsemodules(thatis,theiractivityweightincreased from0)at theearly thymocyteprogenitorstage,and28regulatorswererecruitedtocoarsemodulesat theγδTcell stage,includingtheknownTcellregulatorGATA-3andtheknownγδTcellregulatorsId3andSox13(Supplementary Fig. 11a).Atthecommonlymphoidprogenitorstage,fourregulatorsweredisposedof(thatis,theiractivityweightdiminishedto0)bycoarsemodules,includingtheHSCregulatorsHOXA7,HOXA9andHOXB3.Eighteenregula-torsweredisposedofatthedouble-negative-2Tcellprecursorstage,includingGATA-1,c-MycandN-Myc(Supplementary Fig. 11b).
Overall,‘rewiring’wasmoreprominentathigherlevelsinthelineagethanat lower(more-differentiated) levels,althoughthismayhavebeenpartlyduetothediminishedpowertodetectchangesincelltypeswithnoothercellsdifferentiatingfromthem(terminallydifferenti-ated;alsocalled‘leavesinthetree’).Theindividualdifferentiationstepswiththelargestnumberofactivityweightchangeswerethoseinsmall-intestineDCs,thymusγδTcells, liverandlungDCsanddouble-negative-2Tcellprecursorstage,whichsuggestssubstantialregulatory ‘rewiring’ in these cells, possibly due to tissue-specificeffects. The regulatory model for fine modules identified a largernumberofregulatorychanges(achangeinactivityweightfor82%ofthedifferentiationsteps,comparedwith65%forthecoarse-grainedmodulemodel),inparticularindifferentiationstepsleadingto‘leaves’(terminally differentiated cells; 67% versus 48%). Thus, the fine-grainedmoduleshelptoidentifymorecelltype–specificregulation.
ETV5 regulates gd T cell differentiationTotestoneofthemodel’spredictionsin vivo,wecenteredonregu-latoryactivatorsoflineage-specificmoduleswithnoknownfunc-tioninthatlineage.Apracticalcriterionwasthatthegenecouldbemanipulatedin vivoinacelltype–restrictedmanner.WefocusedontheEts familymemberETV5anditspredictedroleasaregulatorofthedifferentiationofγδTcellsinmodulesF287andF289,asitsexpressionishighlyrestrictedtotheγδTcelllineage.Althoughthemodelassignedseveralregulatorstothesemodules,onlytwo,Sox13andETV5,arespecifictotheγδTcelllineage.Bothareexpressedindistinctthymicprecursors,whichraisedthepossibilitythattheyareamongtheearliestdeterminantsofthelineage.Sox13isaknownregulatorofγδTcells,butETV5hasnotbeen linkedtoγδTcelldevelopmentthusfar.
ToassesstheregulatoryroleofETV5inγδTcells,weanalyzedγδTcelldevelopmentandfunctioninmicelackingETV5specificallyinTcells(CD2p-CreTg+Etv5fl/flmice).AsthymocytesthatexpresstheγδTcellantigenreceptortransitfromimmaturecellswithhighexpression of the cell surface marker CD24 (CD24hi) to matureCD24locells,theyacquireeffectorfunctions24.ETV5hasitshigh-estexpressioninγδthymocytesexpressingγ-chainvariableregion2
ba
7920 962.8
30 30
2516
2.5 2.7
7421
6
3
0
10
5
0
Thymus
Spleen
CD24
CD62L
CD
44
ThymusCtrl CKO
Eve
nts
(% o
f max
)
100806040200
0
104
103
102
105
0 104103102 105
0 104103102 105
c
67
9.7
42
32
RORγt
Thymus
IL-17
8.849
100
80
60
40
20
0
100
80
60
40
20
0
18
18
9.8
45
42
43
14
74
CD27
IFN-γ
IL-1
7C
CR
6
LNCtrl CKO
0
104
103
102
105
0
104
103
102
105
0 104103102 105
CtrlCKO
Ctrl CKO
Eve
nts
(% o
f max
)E
vent
s(%
of m
ax)
γδ T
cel
ls (
×103 )
γδ T
cel
ls (
×105 )
Figure 7 ETV5 is a regulator of γδ T cells. (a) Total γδ T cells in the thymus and spleen of mice with T cell–specific ETV5 deficiency (CKO) and their CD2p-CreTg+Etv5+/+ littermates (control (Ctrl)). (b) Expression of CD24 (top) and of CD44 and CD62L (bottom) by Vγ2+ thymocytes from 7-day-old mice as in a. Numbers above bracketed lines (top) indicate percent CD24lo (mature) thymocytes (left) or CD24hi (immature) thymocytes; numbers in quadrants (bottom) indicate percent cells in each. Similar results were obtained for mice of different ages. (c) Expression of RORγt and production of IL-17 (left) by mature (CD24lo) Vγ2+ thymocytes of mice as in a; numbers above bracketed lines indicate percent RORγt− cells (top; left number) or RORγt+ cells (top; right number), or IL-17+ cells (bottom). Right, expression of CCR6 and CD27 (top) and of IL-17 and interferon-γ (IFN-γ; bottom) by Vγ2+ T cells from the lymph nodes (LN) of 4-week-old mice as in a; numbers in or adjacent to outlined areas indicate percent cells in each. Data are representative of three experiments with one to three independent litters with a minimum of two mice per genotype (a; error bars, s.e.m.), five experiments (b) or three experiments (c).
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION �
r e s o u r c e
(Vγ2)oftheTcellantigenreceptor,whichconstitutenearlyhalfofallγδTcellsinpostnatalmice.MostVγ2+cellsdifferentiateintointer-leukin17(IL-17)-producingγδeffectorcellsinthethymus24.Thus,onepredictionofthemodelwasthattheintrathymicdevelopmentofIL-17-producingγδeffectorcellswouldbeparticularlyimpairedintheabsenceofETV5.InmicewithconditionalTcell–specificdeficiencyinETV5,theoverallnumberofγδTcellsgeneratedwassimilartothatofcontrolmice(theirCD2p-CreTg+Etv5+/+littermates):in7-day-oldneonates,totalnumberofthymocytesinmicewithTcell–specificETV5deficiencywas~50%ofnormal,butthefrequencyofthymo-cytesthatexpressedtheγδTcellantigenreceptorwasabouttwofoldhigher,whichresultedinanabundanceofγδTcellsinthethymusandspleensimilartothatincontrolmice(Fig. 7a).However,therewasspecificlossofmatureVγ2+thymocytesinmicewithTcell–specificETV5deficiency(Fig. 7b,top).Thismayhavebeenduetoinefficientactivation,asindicatedbythelowerexpressionofCD44(thenominalmarkeroflymphocyteactivation)onVγ2+thymocytesfrommicewithTcell–specificETV5deficiencyandthecorrespondinglyhigherexpres-sionofCD62L(amarkerofthenaivestate)onthosecells(Fig. 7b,bottom).ForγδthymocytesthatexpressedotherVγchains,thepro-portionofmaturecellsoractivatedcellsinmicewithTcell–specificETV5deficiencywasnotdifferentfromthatofcontrols.Critically,theresidualmaturethymocytesinmicewithTcell–specificETV5deficiencywereimpairedinthegenerationofIL-17-producingγδeffectorcells(Fig. 7c).MatureVγ2+thymocytesfromETV5-deficientmicehadlowerexpressionofthetranscriptionfactorRORγt(whichinducesIl17transcription),andboththymicandperipheralγδTcellswereimpairedinthegenerationofCCR6+CD27−IL-17-producingγδeffectorcells(Fig. 7c).TheseresultssupportedthepredictionofourmodelanddemonstratedthatETV5wasessentialforproperintra-thymicmaturationoftheIL-17-producingγδeffectorcellsubset.
Studying the Ontogenet model on the ImmGen portalTo facilitate exploration and testing of other predictions of ourmodel,weprovidethefullsetofmodulesandregulatorymodelaspartoftheImmGenportal,withrelevanttoolsforsearching,brows-ingandvisuallyinspectingtheresults.Specifically,the‘ModulesandRegulators’databrowseroftheImmGenportal(http://www.immgen.org/ModsRegs/modules.html)isthegatewaytotheOntogenetregu-latorymodeloftheImmGen.Itallowstheusertobrowsecoarse-grainedorfine-grainedmodulesbytheirnumber,theirpatternofexpression, a gene they contain, a regulator predicted to regulatethemorthecelltypeinwhichtheyareinduced.Foreachmodule,wepresenttheexpressionofitsgenesandpredictedregulators(eachasaheatmap),theactivityweightsofeachregulatorineachcell,andthemodule’smeanexpressionprojectedonthelineagetree(asinFig. 4a).Themodulepagealsolinkstoalistofthegenesinthemodule,theregulatorsthataremembersofthemodule,theregulatorspredictedtoregulatethemodule,theregulatorssuggestedbyenrichmentofcismotifsandbindingeventsofthemodulegenes,andfunctionalenrichmentsofthemodule.Finally,weprovidelinksfordownloadingatablewiththeassignmentofallgenestocoarseandfinemodules,theregulatoryprogramofallmodules,andtheOntogenetcode.
DISCUSSIONTheImmGendatasetprovidesthemostdetailedandcomprehensiveviewofthetranscriptionalactivityofanymammalianimmunesystemand(arguably)ofanydevelopmentalcell–differentiationprocess.Wehaveusedthosedatatoanalyzetheregulatorycircuitsunderlyingsuchprocesses,fromglobalprofilestomodulestothetranscriptionfactors thatcontrol them.Theunique featuresofOntogenethave
allowedustoidentifyregulatoryprogramsactiveatspecificdifferen-tiationstagesandtofollowthemasthey‘unfold’and‘rewire’.
Ouranalysishasautomatically reidentifiedmanyof theknownregulatorsandtheircorrectfunction,hassuggestedadditionalrolesforatleast175moreregulatorsnotassociatedbeforewithhemato-poiesisandhasidentifiedpointsinthelineageatwhichregulatorsarerecruitedtocontrolaspecificgeneprogramorlosetheirregula-toryfunction.Ourabilitytoautomaticallyreidentifymanyknownregulatorsattheappropriatedevelopmentalstageandthesignificantcorrespondenceamongthepredictedregulators,knownfunctions,enrichmentforcis-motifsandenrichmentbyChIPfollowedbydeepsequencingsupportstheprobablyhighqualityofournewpredictions.Amongthose,weexperimentallytestedandconfirmedapreviouslyunknownroleforETV5inthedifferentiationoftheγδTeffectorcellsubset.AdditionalstudiesshoulddeterminewhetherETV5regulatesthedifferentiationofIL-17-producingγδeffectorcellsbyselectivelycontrollingtheexpressionofgenesinγδlineage–specificmodules.
Ontogenet’srichmodelallowsustopredictthespecificbiologi-calcontextatwhichregulationoccurs,togeneralizebroadrolesforregulatorsandtoidentifyglobalprinciplesoftheregulatoryprogram.Theabilityto identifyregulatorsthatactonlyduringspecificdif-ferentiationwindowshelpstodetect‘early’programmingtranscrip-tion factorswhoseexpression is shutoffwhencells transit to thematurestage.However,integratingacrossthemodel’spredictionsinanentirelineagehelpstoidentifytranscriptionfactorsimportantforthemaintenanceoflineageidentityorfunction,suchasthosethatdirectlyregulatetheexpressionofeffectormolecules.Finally,general-izingacrossmultipleregulators,wecanidentifythosedifferentiationstepsatwhichregulatorycontrol‘rewires’mostsubstantiallyandtheregulatorsthatcontrolsuch‘rewiring’.
Aswithallexpression-basedmethodsusedtopredictregulation,Ontogenetcannotdirectlydistinguishcausaldirectionality.Toavoidarbitrary resolution of this ambiguity, Ontogenet allows severalregulatorswithsimilarexpressionprofilestobeassignedtogetheras regulators of a module. The dense interconnected circuits andextensiveautoregulationinothermammaliancircuitsthatcontrolscellstates3,25suggestthatsuchregulatoryinteractionsareprobablyfunctional,althoughsomemaybe‘false-positive’results.Conversely,theactivationofotherfunctionalregulatorsmaynotbereflectedbytheirexpression,andsomemayhavebeenfilteredbyourstringentcriteria(forexample,Tal1,whichencodesaknownHSCregulator).Thosemaybecapturedbyourcomplementaryanalysisofenrich-mentofmodulesincis-regulatorymotifsandbindingofregulators.Anotherchallengeisposedbygeneswithuniqueexpressionprofilesthatareassignedtomoduleswithsimilarbutdistinctexpressionpro-files(suchasRag1andRag2inmoduleC5).Theinferredregulatoryprogramisunlikelytoholdtrueforthosegenes.
Asimilarstudyofhumanhematopoiesis3hassuggestedsubstan-tialmixeduseofmodulesbylineages,whereasthemousecompen-diumsuggeststhatmostmodulesarelineagespecific.Ashasbeenshownbefore,globalprofiles,lineage-specificsignaturesandgene-coexpression patterns are otherwise broadly conserved betweenhumansandmice4.Onepossiblereasonforthediminished‘mixeduse’inthemouseprogramisthatwhereasthemousedatasetcon-tainsmanymorecelltypes,itdoesnotincludeerythrocytes,mega-karyocyte,basophilsandeosinophils,thecellsforwhichmanyofthe‘mixed-use’patternshavebeenobservedinhumans3.Notably,manyregulatorsweresharedacrosslineages.Inparticular,someregulatorswereactive inonlyone lineage insomemodulesbutweresharedbylineagesinothermodules.Forexample,ATF6wasanactivatorinalllineagesinthemyeloidmodulesC25,C45andC49butwasa
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
�0 aDVaNCE ONLINE PUBLICaTION nature immunology
r e s o u r c e
Tcell–specificrepressorintheTcellprecursormoduleC57andwasaTcell–specificactivatorintheBcellmoduleC71.
Ontogenetisapplicabletootherdifferentiationdatasets,includingdataobtainedwithfetalsamplesorforcancerstudies,whenotherpre-dictorsareusedascandidateregulators(forexample,geneticvariantsasinLirnet6),whencellsaremeasuredinboththerestingstateandstimulatedstate,orforprotein-expressiondata(forexample,single-cell,high-dimensionalphosphoproteomicmasscytometrydata26).Ineachcase,theabilitytoshareregulatoryprogramsbyrelatedcelltypesorconditionscanbothenhancethepowerandhelpwithbiologicalinterpretation.Notably,Ontogenetnowdependsonapreconstructedontogeny.Althoughmuchisknownaboutthehematopoieticlineage,somepartsremainunstructured(forexample,allDCsinthemyeloidlineage)andsomeprogenitorsarenotknown(forexample, thoseofγδTcellsorotherinnate-likelymphocytes).Thisreflectsinpartinherentlineageflexibility,wherebyseveralcelltypescandifferenti-ateintothesamecelltype,butreflectsinpartsimplythepresentlackofknowledgeoftheparticularprogenitorofagivencelltype.Newmethodswouldbeneededtoconstructanontogenyautomaticallyortoreviseanexistingone.Inothercases,Ontogenet’soutputcanbeusedtorefinethetopologyoftheontogenybyidentifying‘edges’thatdonotcorrespondtoanychangesinregulatoryprogramsandcanberemovedwithoutdisconnectingthelineage.TheImmGencompen-dium,coarse-and fine-grainedmodulesand identifiedregulatorsandregulatoryrelationsareallavailableforinteractivesearchingandbrowsingandfordownloadingattheImmGenportalandwillprovideaninvaluableresourceforfuturestudiesoftheroleofgeneregulationincelldifferentiationandimmunologicaldisease.
METHODSMethodsandanyassociatedreferencesareavailable in theonlineversionofthepaper.
Accession code.GEO:microarraydata,GSE15907.
Note: Supplementary information is available in the online version of the paper.
ACKnOwleDGmenTSWethanktheImmGencoreteam(includingM.PainterandS.Davis)forhelpwithdatagenerationandprocessing;eBioscience,AffymetrixandExpressionAnalysisforsupportofImmGen;L.Gaffneyforhelpwithfigurepreparationandlayoutofthelineagetree;S.Hartforinitiallayoutofthelineagetree;andA.Liberzon(MolecularSignaturesDatabase)forthepositionalgenesetsformouse.SupportedbyNationalInstituteofAllergyandInfectiousDiseases(R24AI072073totheImmGenConsortium),theUSNationalInstitutesofHealth(A.R.;andU54-CA149145and149644.0103toV.J.andD.K.),theBurroughsWellcomeFund(A.R.),theKlarmanCellObservatory(A.R.),theHowardHughesMedicalInstitute(A.R.),theMerkinFoundationforStemCellResearchattheBroadInstitute(A.R.)andtheNationalScienceFoundation(DBI-0345474toV.J.andD.K.).
AUTHOR COnTRIBUTIOnSV.J.,T.S.,A.R.andD.K.designedthestudy;V.J.developedthealgorithm,withinputfromT.S.,A.R.andD.K.;T.S.analyzedthedata;K.S.didexperiments;O.Z.providedthemotif-relatedcodeandparticipatedinwriting;X.S.generated
amousemodel;J.K.designedtheexperimentsandparticipatedinwritingthemanuscript;andV.J.,T.S.,A.R.andD.K.wrotethemanuscriptwithinputfromallauthors.
COmPeTInG FInAnCIAl InTeReSTSTheauthorsdeclarenocompetingfinancialinterests.
reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
1. Heng, T.S.P. et al. The Immunological Genome Project: networks of gene expression in immune cells. Nat. Immunol. 9, 1091–1094 (2008).
2. Iwasaki, H. & Akashi, K. Myeloid lineage commitment from the hematopoietic stem cell. Immunity 26, 726–740 (2007).
3. Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
4. Shay, T. et al. Conservation and divergence in the transcriptional programs of the human and mouse immune systems. Proc. Natl. Acad. Sci. USA 110, 2946–2951 (2013).
5. Kim, H.D., Shay, T., O’Shea, E.K. & Regev, A. Transcriptional regulatory circuits: predicting numbers from alphabets. Science 325, 429–432 (2009).
6. Lee, S.-I. et al. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 5, e1000358 (2009).
7. Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176 (2003).
8. Capaldi, A.P. et al. Structure and function of a transcriptional network activated by the MAPK Hog1. Nat. Genet. 40, 1300–1306 (2008).
9. Ram, O. et al. Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells. Cell 147, 1628–1639 (2011).
10. Miller, J.C. et al. Deciphering the transcriptional network of the dendritic cell lineage. Nat. Immunol. 13, 888–899 (2012).
11. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B Stat. Methodol. 67, 301–320 (2005).
12. Tamura, T. et al. IFN regulatory factor-4 and -8 govern dendritic cell subset development and their functional diversity. J. Immunol. 174, 2573–2581 (2005).
13. Orkin, S.H. & Zon, L.I. SnapShot: hematopoiesis. Cell 132, 712.e711–712.e712 (2008).
14. Pierre, M., Yoshimoto, M., Huang, L., Richardson, M. & Yoder, M.C. VEGF and IHH rescue definitive hematopoiesis in Gata-4 and Gata-6–deficient murine embryoid bodies. Exp. Hematol. 37, 1038–1053 (2009).
15. Kishi, H. et al. Lineage-specific regulation of the murine RAG-2 promoter: GATA-3 in T cells and Pax-5 in B cells. Blood 95, 3845–3852 (2000).
16. Bodor, J., Fehervari, Z., Diamond, B. & Sakaguchi, S. ICER/CREM-mediated transcriptional attenuation of IL-2 and its role in suppression by regulatory T cells. Eur. J. Immunol. 37, 884–895 (2007).
17. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010).
18. Lee, J.-W. et al. DACH1 regulates cell cycle progression of myeloid cells through the control of cyclin D, Cdk 4/6 and p21Cip1. Biochem. Biophys. Res. Commun. 420, 91–95 (2012).
19. Ng, S.S.M., Chang, T.-H., Tailor, P., Ozato, K. & Kino, T. Virus-induced differential expression of nuclear receptors and coregulators in dendritic cells: Implication to interferon production. FEBS Lett. 585, 1331–1337 (2011).
20. Wang, T. et al. Inhibition of activation-induced death of dendritic cells and enhancement of vaccine efficacy via blockade of MINOR. Blood 113, 2906–2913 (2009).
21. Neumann, M. et al. Differential expression of Rel/NF-κB and octamer factors is a hallmark of the generation and maturation of dendritic cells. Blood 95, 277–285 (2000).
22. Outram, S.V. et al. KLF13 influences multiple stages of both B and T cell development. Cell Cycle 7, 2047–2055 (2008).
23. Yamada, T., Park, C.S., Mamonkin, M. & Lacorazza, H.D. Transcription factor ELF4 controls the proliferation and homing of CD8+ T cells via the Kruppel-like factors KLF4 and KLF2. Nat. Immunol. 10, 618–626 (2009).
24. Narayan, K. et al. Intrathymic programming of effector fates in three molecularly distinct γδ T cell subtypes. Nat. Immunol. 13, 511–518 (2012).
25. Yosef, N. & Regev, A. Impulse control: temporal dynamics in gene transcription. Cell 144, 886–896 (2011).
26. Bendall, S.C. et al. Single-cell Mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011).
The complete list of authors is as follows: Adam J Best9, Jamie Knell9, Ananda Goldrath9, Vladimir Jojic1, Daphne Koller1, Tal Shay2, Aviv Regev2,5, nadia Cohen11, Patrick Brennan11, michael Brenner11, Francis Kim12, Tata nageswara Rao12, Amy wagers12, Tracy Heng13, Jeffrey ericson13, Katherine Rothamel13, Adriana Ortiz-lopez13, Diane mathis13, Christophe Benoist13, natalie A Bezman14, Joseph C Sun14, Gundula min-Oo14, Charlie C Kim14, lewis l lanier14, Jennifer miller15, Brian Brown15, miriam merad15, emmanuel l Gautier15,16, Claudia Jakubzick15, Gwendalyn J Randolph15,16, Paul monach17, David A Blair18,
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology aDVaNCE ONLINE PUBLICaTION ��
r e s o u r c e
9Division of Biological Sciences, University of California San Diego, La Jolla, California, USA. 10Broad Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 11Division of Rheumatology, Immunology and Allergy, Brigham and Women’s Hospital, Boston, Massachusetts, USA. 12Joslin Diabetes Center, Boston, Massachusetts, USA. 13Division of Immunology, Department of Microbiology & Immunobiology, Harvard Medical School, Boston, Massachusetts, USA. 14Department of Microbiology & Immunology, University of California San Francisco, San Francisco, California, USA. 15Icahn Medical Institute, Mount Sinai Hospital, New York, New York, USA. 16Department of Pathology & Immunology, Washington University, St. Louis, Missouri, USA. 17Department of Medicine, Boston University, Boston, Massachusetts, USA. 18Skirball Institute of Biomolecular Medicine, New York University School of Medicine, New York, New York, USA. 19Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA. 20Computer Science Department, Brown University, Providence, Rhode Island, USA. 21Department of Biomedical Engineering, Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA. 22Program in Molecular Medicine, Children’s Hospital, Boston, Massachusetts, USA. 23Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.
michael l Dustin18, Susan A Shinton19, Richard R Hardy19, David laidlaw20, Jim Collins21, Roi Gazit22, Derrick J Rossi22, nidhi malhotra3, Katelyn Sylvia3, Joonsoo Kang3, Taras Kreslavsky23, Anne Fletcher23, Kutlu elpek23, Angelique Bellemare-Pelletier23, Deepali malhotra23 & Shannon Turley23
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology doi:10.1038/ni.2587
ONLINE METHODSData set.ExpressionofmousegeneswasmeasuredonAffymetrixMogen1arrays(Affymetrixannotationversion31).SortingstrategiesfortheImmGenpopulationsareavailableontheImmGenwebsite(http://www.immgen.org/Protocols/ImmGenCellprepandsortingSOP.pdf).
AsthedatasetoftheImmGengraduallygrewfrom2010to2012,cluster-ing,regulatoryprogramreconstructionandfinalpresentationweredoneonthreedifferentImmGenreleases(September2010,March2011andApril2012)withattemptstomaximizebackwardcompatibilityasmuchaspossible.TheclustersandtheregulatoryprogramarefromtheSeptember2010andMarch2011releases,chosentoensureconsistencywiththeotherImmGenReportpapersthatrefertothem.ClusteringwasdoneontheImmGenreleaseofSeptember2010,with744samples,647ofwhichremainedintheApril2012release.OntogenetwasappliedtotheImmGenreleaseofMarch2011,onlytothedataofthe676samples(195hematopoieticcelltypes)thatwereconnectedtothehematopoietictree.Thus,wemaintainedmembershipinclustersfromtheearlieranalysisbutusedonlysomeofthesamplestolearntheregulatoryprogram.Theheatmapspresentedhereinclude755samples(244cell types),excludingcontrolsamples.Forsimplicity,only720sam-plesarepresentedonthefulltree(210celltypes).Supplementary Table 1listsallthesamplesinthelastImmGenrelease(April2012)andstatesforeachsampleifitwasusedingeneratingthemodules,regulatoryprogramreconstruction,thepresentedheatmapsandtree.TheImmGenwebsiteiscontinuouslyupdated.
Data preprocessing.ExpressiondatawerenormalizedaspartoftheImmGenpipeline by the robust multiarray average method. Data were log2 trans-formed.Forgeneswithmorethanoneprobesetonthearray,onlytheprobesetwiththehighestmeanexpressionwasretained.Ofthose,onlyprobesetswithas.d.valueabove0.5fortheentiredatasetwereusedfortheclustering,whichresultedwith7,965uniquegeneswithadifferenceinexpressionintheSeptember2011releaseand8,431intheApril2012release.
Lineage-specific signatures.Wecalculatedsignaturesfor11lineages:granu-locyte,macrophage,monocyte,DC,Bcell,NKcell,CD4+Tcell,CD8+Tcell,NKTcell,γδTcellandstemandprogenitorcell.AssignmentofsamplesintolineagesisinSupplementary Table 2.One-wayanalysisofvariancewasdoneforeachofthe6,997geneswithanexpressionvalueabovelog2(120)inatleastonelineage,followedbypost-hocanalysis(functionsanova1andmultcompareinMATLABsoftware).Foreachofthe11lineages,agenewasconsideredinducedifithadsignificantlyhigherexpressioninthatlineagethaninallotherlineages.Agenewasconsideredrepressedifithadsignificantlylowerexpres-sioninthatlineagethaninallotherlineages.Afalse-discoveryrate(FDR)of10%wasappliedtotheanalysisofvariancePvaluesofallgenes.
Definition of modules. Modules were defined by clustering. For coarse-grained modules, clustering was done by superparamagnetic clustering(SPC)27,aprincipledapproachforchoosingstableclustersfromahierarchi-calsetting.SPCwasusedbecauseitdoesnotrequireapredefinednumberof clusters but instead identifies the number inherently supported by thedata.TheclustersdefinedbySPCarestableacrossarangeofparameters,although they can have variable degrees of compactness. SPC was runwithdefaultparameters,whichresultedin80stableclusters(coarse-grainedmodules C1–C80); the remaining unclustered genes were grouped into aseparatecluster(C81).
Each coarse-grained module was further partitioned into fine-grainedmodulesbyaffinitypropagationclustering28,withcorrelationastheaffinitymeasure.The‘self-responsibility’parameter(whichindicatesthepropensityofthealgorithmtoformanewcluster)wassetat0.01.AffinitypropagationwasusedbecauseSPCandhierarchicalclusteringdidnotfurtherbreakthecoarsemodules.Affinitypropagationcouldnotbeusedforclusteringofallgenes,becauseitmustworkwitha‘sparsified’affinitymatrix.
Clusteringresultedin334fine-grainedmodules(F1–F334).Onaverage,3.9 fine-grained modules were nested in a single coarse-grained module.Theminimumnumberoffinemodulesnestedinacoarse-grainedmodulewas 1 (23 coarse-grained modules) and the maximum was 11 (7 coarse-grainedmodules).
Choice of candidate regulators. Candidate regulators were curated fromthefollowingsources:mouseorthologsofallthegenesencodingmoleculesusedascandidateregulatorsinapublishedstudyofhumanhematopoiesis3;genesannotatedwiththegene-ontologyterm‘transcriptionfactoractivity’inmouse,humanorrat;genesforwhichthereisaknownDNA-bindingmotifinTRANSFACmatrixdatabase(version8.3)29,theJASPARdatabase(version2008)30andexperimentallydeterminedpositionweightmatrices(PWMs)31,32;andgeneswithpublisheddataobtainedbyChIPfollowedbydeepsequenc-ingorChIPfollowedbymicroarray(Supplementary Table 11).Regulatorsthatwerenotmeasuredon thearrayorwhoseexpressiondidnot changesufficiently(s.d.<0.5acrosstheentiredataset)tobeincludedinthecluster-ingwereremoved,unlesstheywerehighlycorrelated(>0.85)withanotherregulator thatpassed thecutoff.This resulted in578candidate regulators(Supplementary Table 12).
Hematopoietic tree building.Thehematopoietictree(Fig. 1)wasbuiltbythemembersoftheImmGenConsortium.Eachgroupcreateditsownsub-lineagetree,andthesublineagetreeswereconnectedonthebasisofthebestknowledgeavailableatpresent,althoughsomeedgesarehypothetical(dashedlines,Fig. 1).Therearetworootstothetree:long-termstemcellsfromadultbonemarrow,andlong-termstemcellsfromfetalliver.Eachpopulationisanodeinthetree(square,Fig. 1).Edgesindicateadifferentiationstep,anactivationstep,time(asintheactivatedTcells)orageneralassumptionofsimilarityinregulatoryprogram(Supplementary Table 13).Someintermedi-ateinferrednodeswereaddedtogroupcellpopulationsthatwereassumedtohaveacommonprogenitororcommonregulatoryprogrambutforwhichthishypotheticalpopulationwasnotmeasured(forexample,granulocytesandmacrophages).Forthepopulationsthatconnectedtomorethanoneparentpopulation,oneoftheedgeswasmanuallypruned,eitherthelesslikelyoneorarbitrarily(Supplementary Table 13).
Module regulatory program.Ontogenettakesthefollowingasinput:gene-expressionprofilesacrossmanydifferentcelltypes;apartitioningofthegenesintomodules(thecoarse-grainedandfine-grainedclustersdescribedabove);apredefinedsetofcandidateregulators;andanontogenytreerelatingthecelltypes.Itthenconstructsaregulatoryprogramforeachmoduleconsistingofalinearcombinationofregulatorswithpossiblydistinctactivityweightsforeachregulatorineachcelltype.Amodule’sregulatoryprogramisthelinearsumoftheregulators’expressionmultipliedbyeachregulator’sactivityweight,whichapproximatestheexpressionpatternofthemodule.Eachregulatoryprogramaimstoexplainasmuchofthegene-expressionvarianceinthemoduleaspossiblewhileremainingassimpleaspossibleandbeingconsistentacrossrelatedcelltypesintheontogeny.Inaregularlinearmodel,theactivityweightsareconstantacrossallconditions.Here,weallowachangeofactivityweightsbetweencelltypes(Fig. 3).
Notably,allregulatorsareconsideredaspotentialregulatorsforeachmod-ule.Thatincludesregulatorsthataremembersofthemodule.Thus,amodulecanbeassignedregulatorsthatareitsmembersandregulatorsthatarenotitsmembers,butregulatorsthataremembersofthemodulewillnotnecessarilybeassignedtoit.
Moreformally,wemodeltheexpressionofageneinamoduleasa(noisy)linearcombinationoftheexpressionoftheregulators.Wedenotetheactivityofaregulatorrinacelltypetasar,t.Wemodeltheexpressionofagenei,amemberofmodulem,incelltypetas
X w ai t m r t r t m tr, , , , ,= +∑ e
whereeachεm,tisaGaussianrandomvariablewith0meanandvariancesm t,2
specifictoacombinationofamodulemandacelltypet.Hencetheregula-toryprogramlearnedbyOntogenetisrepresentedintermsofwm,r,tactivityweightsspecifictoacombinationofmodule,regulatorandcelltype.Becauseofparametertyingenforcedbythemodel,theeffectivenumberofparametersismuchsmallerthanthenominalsizeoftheregulatoryprogramrepresentation(modules)×(regulators)×(celltypes).
Module cell-type specific variance estimation. The module varianceinagivencell typesm t,
2 is estimated from theexpressionof themodule’s
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunologydoi:10.1038/ni.2587
membergenesacrossallreplicatesofthecelltype.Althoughweuseanunbi-asedestimator,wemakespecialconsiderationsforthemoduleswithlessthan10members.Forthesemodulesthevarianceestimatesm t,
2 iscomputedbyapooledvarianceestimatoracrossmoduleswithmorethantenmembersbutstillspecifictothecelltype.Theestimatedvariancesinafine-grainedmodulearetypicallysmallerthanthevariancesinitsparentcoarse-grainedmodule.
Regulatory program fitting as a penalized regression problem.Estimationoftheactivityweightswm,r,ttakestheformofaregressionproblem,butbecauseof ‘over-parameterization’of theproblem, itmustbe ‘regularized’withanextensionof the fusedLasso framework33,whichgivesrise toapenalizedregressionproblemoftheform
1 12 2
2
nx w a J w
m m ti t m r t r tri t s ,, , , ,, ( ),−( ) +∑∑
whereJ(w)isachosenpenalty.Inourcase,thispenaltyiscomposedoftwoparts,onethatpromotessparsityandselectionofcorrelatedpredictorsandanotherthatpromotesconsistencyofregulatoryprogramsbetweenrelatedcelltypes.
Weassumethatonlyasmallnumberofregulatorsareactivelyregulatinganyonemodule.Astandardapproachtopromotingsuchsparsityinregres-sionproblemsistointroduceanL1penalty,thesumofabsolutevaluesΣmΣrΣt|wm,r,t|.However,thispenaltytendstobeoverlyaggressiveininducingsparsityandthusprunesmanyhighlycorrelatedpredictorsandselectsonlyasinglerepresentative.Suchaggressivepruningmaybeinappropriate,asthecorrelatedregulatorsmayallbebiologicallyrelevantbecauseof‘redundancy’indenselyinterconnectedregulatorycircuits.Thatcanbecounteractedbytheadditionofsquaredterms12
2Σ Σ Σm r t m r tw( ), , ,whichyieldsacompositepenaltyknownas‘ElasticNet’11,asproposedbefore6,
l k| | ( ), , , ,w wm r tt m r ttrr∑ ∑∑∑ +2
2
whichwewritecompactlyas
l k|| || || ||w wm m1 2 22+
Animportantinputtoourregulatoryprogramfittingprocedureistheontog-eny(differentiation)tree(Supplementary Table 13).Thistreeisencodedasanedgelist(f ),andwith(t1,t2)∈fwedenotethatcelltypet1isaparentofcelltypet2.Thesimilarityoftheregulatoryprogramsforaparticularmoduleintworelatedcelltypes(t1,t2)∈fcanbeassessedasasumoftheabsolutevalueofthedifferenceofactivityweightsinthetwoprograms,Σr m r t m r tw w| |, , , ,1 2− .Thekeyobservationisthat| |, , , ,w wm r t m r t1 2− =0iftheregulatoryrelationshipbetweenregulatorrandmodulemisthesameincelltypet2anditsparenttypet1.Moregenerally,thetotaldifferenceoftheregulatoryprogramscanbewrittenasΣ Σ( , ) , , , ,| |t t f r m r t m r tw w1 2 1 2∈ − .Wewillwritethisterminacompactformas||Dwm||,wherewm isavectorofactivityweightsforallregulatorsacrossallcelltypesconcatenatedtogetherandDisamatrixofsize(RE)×(RT),whereRisthenumberofregulators,TisthenumberofcelltypesandEisthenumberofedgesinthetree.WenotethatmultiplicationbythematrixDcomputesthedifferencesbetweenrelevantentriesofthevectorwm.Thelesstheregulatoryprogramschangethroughoutdifferentiation,thesmallertheterm||Dwm||.Thus,withthistermasapenaltywillpromotethepreservationofaconsistentregulatoryprogramthroughoutdifferentiation.
Combiningalltheconsiderationsabove,thecompleteobjectiveforfittingaregulatoryprogramofamodulemisgivenby
1 12 22
21 2
2n
x w a w w Dwm m t
i t m r t r tr m m msl k g
,, , , , || || || || || |−( ) + + +∑ || ., 1i t∑
Optimizationofthisobjectiveissomewhatcomplicatedbythefactthatabsolutevalueisanon-smoothfunctionandhencedirectoptimizationbymethodssuchasgradientdescentisnotfeasible,astheseworkonlyonsmoothprob-lems.Alternativemethods,suchasprojectedgradients,canbeused,buttheirconvergenceisrelativelyslow.Wethereforeoptedtouseaprimaldualinteriorpointmethod34.Differentchoicesoftheparametersλ,κandδyielddifferent
regulatory models as solutions, with different data-fitting and model-complexityproperties.Wescannedsetsofparametersintherange(thesched-uleforeachoftheparametersλ,κandδwasgeometric,e−7,e−6,…,e3spanningvaluesbetween0.001and20)andchosetheoptimalsetofparameterswiththeBayesianinformationcriteria(describedbelow).
Tosimplify thediscussionof theoptimization,we introduce thesparsepredictormatrixAofsize(RT)×(T),whereA at r T t r t mt,( ) , /− + =1 s and=0otherwise.Furthermore,wenotethattheoptimalwmdependsonlyonthemeanexpressionprofileofthemodule’sgenesandwecanintroducevariabley
nxt
mti m
mi t= ∈
1 1s
Σ , .Hencewecanrewritetheobjectiveas
12 22
21 2
21|| || || || || || || || .y Aw w w Dwm m m m− + + +l k g
Finallywecanabsorbthetermk2 22|| ||wm intothefirsttermasfollows:
12 0
2
2
1 1y A
Iw w Dwm m m
−
+ +k
l g|| || || || .
Regulatory program transfer between coarse-grained and fine-grained modules.Thefine-grainedmodulesare‘encouraged’tohaveaprogramsimilartothatofthecoarse-grainedmoduleinwhichtheyarenested.Thisisaccom-plishedbytheintroductionofanadditionalpenaltyterm.Wewilldenotethealreadylearnedregulatoryprogramofacoarse-grainedmoduleasw0andtheregulatoryprogramofafine-grainedmodulethatwewishtolearnaswm.Thecoarse-to-fineversionofourobjectiveisthen
12 2 22
21 2
21 0 2
2|| || || || || || || || || ||y Aw w w Dw w wm m m m m− + + + + −l k g t
wherethelasttermtiestheprogramsofthecoarse-grainedandfine-grainedmodules.Thisobjectivecanbetransformedinto
12
0
0 2
2
1
y
w
A
I
I
w w Dwm m m
t
k
t
l g
−
+ +|| || || ||| .1
Solving the prototypical optimization problem.Wenotethatregulatory-program-fittingproblemsforbothcoarse-grainedandfine-grainedmodulehavebeenexpressedinthefollowinggeneralform
minimizew
y Xw w Dw12 2
21 1|| || || || || || .− + +l g
Wereformulatethatoptimizationproblembyaddingvariablesthatdecouplethepenalties:
minimize
subject to
w, , || || || ||
, ,
z d r r z d
r y Xw z w d Dw
12 1 1′ + +
= − = =
l g
..
Thisreformulationenablesstraightforwardderivationofaprimaldualinteriorpointmethod34.
Model selection with Bayesian information criterion.Theformulationofouroptimizationproblemaboveisdependentonthesetofparametersλ,κandδ;weobtainamodelbysolvingtheconvexproblemaboveforaparticularcombinationofλ,κandδ.Differentcombinationsoftheseparameterswillyieldregulatoryprogramsofdifferentquality.Onewaytoidentifytheoptimalλ,κandδisthroughtheuseofheld-outdataorthroughcross-validation.However,asearchfortheseparameterswithcross-validationwouldbepro-hibitivelyexpensive.Asanalternative,weuseamodelselectionapproachbasedontheBayesianinformationcriterion(BIC)tocomparemodelsresult-ingfromdifferentchoicesofthesethreeparametersandselectthebestone.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunology doi:10.1038/ni.2587
Thiscriterioncomparesmodels,here‘encoded’byregulatoryprograms,basedontheirtradeoffbetweendataloglikelihoodanddegreesoffreedom.Theloglikelihoodforourmodelis
LL const( ) .,
, , , ,,w x w am t
i t m r t r tri tm= − −( ) +∑∑∑ 12 2
2
s
Thecomputationofthedegreesoffreedomissomewhattechnicallyinvolvedbut intuitivelysimple:anactivityweight that remains thesamethroughaparticularconnectedportionofthedifferentiationtreeiscountedasasin-gledegreeoffreedom.Tomakethismoreformal,weconsidermatrixAandconstructitscounterpart,B.WeuseAr,ttodenoteacolumnofmatrixA.WethenconstructagraphinwhichnodescorrespondtocolumnsofmatrixA.Given two nodes corresponding to Ar t, 1 andAr t, 2, the graph will have anedgebetweenthesetwonodesifcelltypet1isaparentofcelltypet2,andw wm r t m r t, , , ,1 2= .ThematrixBwillhavecolumnsthataresumsofcolumnscorrespondingtoconnectedcomponentsinthegraph.Weeliminateallcol-umnsofBthatarezerosandthefinaldegreesoffreedomaregivenbydf(w)=Trace(B(B′B+κdiag(c))−1B′),wherediag(c)isadiagonalmatrixwithentriesbeingthenumberofcolumnsofAintheconnectedcomponentassociatedwithacolumnofB.
HencewecancomputetheBIC(w)as
BIC w LL df
df
( ) ( ) ( )log
( )l,
, , , ,
= − +
= −( ) +∑2
12
2
w w T
x w a wm t
i t m r t r trsoog ., Ti tm∑∑
Post-processing of regulatory programs.Onceweobtainanoptimalregula-toryprogramintermsofBIC,weusepost-processingtoremoveregulatoryrelationshipsforunderexpressedregulators.Weplacedalowexpressioncutoffof5.5onthelog2scale.Atthiscutoff,thecorrelationbetweenthepredictorandthetargetmodulemayverywellbeduetonoiseandhencetherelation-shipcouldbespurious.
Systematic query for known functions of regulators. For each lineage-specificmodule,weautomaticallyquerieditsregulatorsinPubMedwiththenameofthelineage(s).Foreachmodulethatwasupregulatedordownregu-latedwithdifferentiation,wequerieditsregulatorswith‘hematopoieticdif-ferentiation’.AllPubMedqueriesweredoneon30July2012.
Choice of lineage regulators.Foreachlineage,wecollectedtheregulatorsdeemedactivebyhavinganonzeroactivityweightandsignificantexpressioninexcessof9.0onthelog2scale.Foragivenlineage,wedeemedaregulatora lineageactivator if itsaverageactivityweightacrossall cell types in thelineageandallmoduleswaspositive.Analogously,aregulatorwasdeemedalineagerepressorifitsaverageactivityweightwasnegative.Wesubsequentlyrankedtheregulatorsonthebasisoftheiraverageexpressionacrosscelltypesinwhichtheregulatorhadarole.Hence,theregulatorsthatwerefrequentlyactiveinalineageand,whenactive,hadhigherexpressionwererankedhigherthanwereregulatorsthatwereinfrequentlyactiveorhadlowexpression.Theregulatorswiththehighestexpressiontypicallyweregiventhehighesttotalactivityweightacrosslineages.
Notably,thisprocedure,althoughstraightforward,willnotreflectallthelineageregulatorsidentifiedbythemodel.First,thoselineageregulatorsthatactonlyduringalimitedwindow(forexample,earlyindifferentiation)wouldbeunder-representedbythisanalysisyetwouldbecapturedintheoverallmodelinthewindowinwhichtheyact.Second,becauseofthepost-processingstep(describedabove),regulatorswithhighbaselineexpressioncanhaveaconstantactivityweighteveniftheirexpressionisverylineagespecific(forexample,GATA-3)andthusbeunder-representedintherecruitmentanalysis(althoughtheytooarechosenasregulatorsinthemodel).
Motif scanning.Wescannedpromotersofmousegenesforenrichedmotifs.We downloaded promoter sequences for mouse (mm9) from the genomebrowserwebsiteoftheUniversityofCaliforniaSantaCruz(http://hgdownload.cse.ucsc.edu/downloads.html).Foreachgene,wescannedtheregionstarting
fromposition–1,000(basepairsupstreamofthetranscriptionstartsite)andendingatposition+200(base-pairsdownstreamof the transcriptionstartsite).Werepresentedthenucleotideatpositionj(relativeto–1,000bpfromthetranscriptionstartsite)forgeneiasSi,j.Werepresentedeachcis-regulatoryelementbyaPWM.Wecompiledasetof1,651PWMsfromtheTRANSFACmatrixdatabase(version8.3)29,theJASPARdatabase(version2008)30andexperimentallydeterminedPWMs31,32.WedenotethePWMofthe‘k-th’motifbyPk,amatrixofsize4×Lk,whereLkisthelengthofthemotifandPk(i,j)rep-resentstheprobabilityofencounteringthenucleotidej(thatis,A,C,GorT)atthe‘i-th’position.Foreachgenei,apositionalongthepromoterjandaPWMk,wecomputedthelocalmotif-matchingscoreLOD(i,j,k),definedasthe log likelihoodratio(LODscore) forobservingthesequencegiventhePWMversusagivenrandomgenomicbackground:
LOD i j k P r S P Sk i j rr
L
b i j rk
( , , ) log ( , ) log ( ), ,= −
+ −=
+ −∑ 2 11
2 1
GenomicbackgroundwasdeterminedasPb(‘A’)=Pb(‘T’)=0.3,Pb(‘C’)=Pb(‘G’) = 0.2, which represents the nucleotide composition of the mousegenome. We then found the best motif instance over the entire promoterregion,definedasMAX-LOD(i,k)=maxjLOD(i,j,k).
Motif-scoring threshold.WeautomaticallycomputedaPWM-specificthresh-oldbyusingtheinformationcontentofeachmotif.Theinformationcontentforthe‘k-th’motifisdefinedas
IC k L P i j P i jk k kji
Lk( ) ( , ) log ( , )= +
==∑∑ 21
4
1
WedefinedthePWM-specificthresholdforthe‘k-th’motifasτk,the1–2−IC(k)quantileofthePWMLODdistributionacrossallgenes’promoters.Wecon-sidereda‘hit’forthe‘k-th’motifatthe‘i-th’geneifthebestscore(MAX-LOD(i,k))exceededthethresholdτk.
Motif enrichment in modules.ForeachmoduleofgenesM,andeachmotifk,wecomputedthePvalueforenrichment,pe(M,k)ofthemotifinthemod-ulerelativetothatoftheentiresetofgenesassignedtomodulesservingasbackground.Anenrichmentofamotif inamoduleresults inhigherthanexpectedMAX-LODscoresforthegenesinthismodule;tocapturethiseffect,wecomputedthePvaluebycomparingthescoresMAX-LOD(i,k)forallgenesiinthemoduleMandthescoresfortheentiresetofgenesassignedtomod-ulesbyaone-sidedrank-sumtest.WethenusedanFDRof5%ontheentirematrixofPvaluespe(M,k)anddeclaredallPvaluesthatpassedthisproceduresignificant‘hits’.TheFDRwascalculatedseparatelyforcoarse-grainedandfine-grainedmodules.
Binding events enrichment. The public data sets obtained by ChIPfollowedbydeepsequencingandChIPfollowedbymicroarray(Supplementary Table 11) were downloaded from the GEO (Gene Expression Omnibus)database repository, supplementary material and designated sites in theoriginal publications (Supplementary Table 11; 360 experiments of 109uniqueregulators).Thetargetlistdefinedineachoriginalpublicationwasusedwheneveravailable.Otherwise,genesthathadabindingeventreportedfromtheposition1,000basepairsupstreamofthetranscriptionstartsitetotheposition200basepairsdownstreamofthetranscriptionstartsitewerelistedastargets.Indatasetsobtainedwithhumansamples,genesymbolswerereplacedbythemousegenesymbolwhereveraone-to-oneorthologexistsaccord-ingtothephylogeneticresourceEnsemblCompara35.Onlygenesincludedintheclusteringwereconsideredtargetsforthepurposeofthecalculationofenrichment.
ThehypergeometricPvaluewascalculatedforthesizeofintersectionofeachmodulewitheachtarget list.AnFDRof10%wasusedfortheentiretableofPvaluesofallmodulesandalltargetslists.TheFDRwascalculatedseparatelyforcoarse-grainedandfine-grainedmodules.
©20
13 N
atu
re A
mer
ica,
Inc.
All
rig
hts
res
erve
d.
nature immunologydoi:10.1038/ni.2587
Estimating the significance of regulatory program overlap.WereporttwoP values for each overlap of the three regulation models (from ChIP,cis-elementsandOntogenet).First,wecalculatedthehypergeometrictestfortwoorthreegroupsforwhichthe‘universesizes’werethenumberofpossibleregulatoryinteractionsincludingtheoverlappingregulators.Forexample,forestimationofthesignificanceoftheoverlapofChIPandOntogenetregulatoryinteractions,the‘universesize’isthenumberofregulatorsthatwerecandi-datesforOntogenetandhadChIPinformationmultipliedbythenumberofmodules.TheChIPinteractionsaretheenrichedmodulesaccordingtotheChIPdataset,andtheOntogenetinteractionsaretheregulatorschosenforeachmodule.Second,wecalculatedanempiricalPvaluefrom10,000permuta-tionsoftheregulatorsintheregulatoryinteractions,includingtheoverlappingregulators.ThelastPvalueswerecalculatedtoaccountforthefactthatsomemoduleshavemoreregulatorsthanothers.ThehypergeometricPvaluesandtheempiricalPvaluesaresimilarfortheoverlapofeachtwomethodsbutdif-ferinsignificanceforthethree-methodoverlap,becausethehypergeometricscoreforthreegroupsexplicitlytakesintoaccounttheoverlapbetweeneachtwogroups,whereastheempiricalPvaluedoesnot.
Functional enrichment.Curatedgenesets(C2),motifgenesets(C3)andgeneontology(GO)genesets(C5)fromtheMolecularSignaturesDatabase(versionV.3)weredownloadedfromtheBroadInstitutewebsite(http://www.broadinstitute.org/gsea)36.Foreachgroup,genesymbolswerereplacedbythemousegenesymbolwhereveraone-to-oneorthologexistsaccordingtoEnsemblCompara. Only genes included in the clustering were consideredfunctionalgroupmembersforthepurposeofthecalculationofenrichment.
AhypergeometricPvaluewascalculatedforthesizeofintersectionofeachmodulewitheachfunctionalgroup.AnFDRof10%wasusedfortheentiretableofPvaluesofallmodulesandallfunctionalgroups.TheFDRwascal-culatedseparatelyforcoarse-grainedandfine-grainedmodules,andforthedifferentclassesoffunctionalannotation(C1,C2,C3andC5).
Identification of differentiation steps with a change in activity weight of regulators.Foreachmoduleandeachedge(differentiationstep)ofthehemato-poietictree,theactivityweightofthe‘parent’wascomparedwiththeactivityweightofthe‘child’,whichresultedinoneofthefollowingclassifications:nochange(activityweightsarethesame);activatorrecruitment(parentactivityweightis0;childactivityweightispositive);activatorstrengthening(parentactivityweightispositiveandissmallerthanthatofthechild);activatordis-posal(parentactivityweightispositiveandchildactivityweightis0);repres-sorrecruitment(parentactivityweightis0;childactivityweightisnegative);repressorstrengthening(parentactivityweightisnegativeandislargerthanthatofthechild);repressordisposal(parentactivityweightisnegativeandchildactivityweightis0).Forsimplicity,weomittedthe‘regulatorweakening’option.Thoselineage-specificregulatorsthatareassignedconstantactivity
weightacrossallcell types (suchasGATA-3)willnotbecapturedby thisanalysisbutarepartofthemodel.
Mice. Mice with loxP-flanked Etv5 alleles (Etv5fl/fl)37 were crossed withC57BL/6micewithatransgeneencodingCrerecominasedrivenbythepro-moterofthegeneencodingCD2togeneratemicewithTcell–specificETV5deficiency(CD2p-CreTg+Etv5fl/fl,backcrossedthreetimestotheC57BL/6strain).TheloxP-flankedEtv5locusisspecificallydeletedfromthegenomestarting in CD25+CD44−CD3−CD4−CD8− thymic precursors (DN3) with~80%deletionefficiencyinγδthymocytesubsets,asinferredfromtheanaly-sisofCre-activity-reportermice(CD2p-CreTg+Rosa-STOPfl/fl-EYFP).
Flow cytometry.Intracellularstaining(Cytofix/CytopermKit;BDBiosciences)and intranuclear staining (FoxP3 Staining Kit; eBioscience) were done asdescribed24. The following antibodies were used: anti-TCRδ (GL3), anti-CD24(HSA,M1/69),anti-CD44(IM7),anti-CD62l(MEL-15),anti-IL-17A(ebio17B7)andanti-RORγt(AFKJS-9;all fromeBioscience);andanti-Vγ2(UC3-10A6), anti-Vδ6.3 (8F4H7B7), anti-CCR6 (140706) and anti-CD27(LG.3A10; all from BD Biosciences). Anti-Vγ1.1 (2.11) was purified fromculturesupernatantandwasbiotinylatedwiththeFluoReporterMini-Biotin-XXLabelingKit(Invitrogen).DatawereacquiredonanLSRII(BD)andwereanalyzedwithFlowJosoftware(Treestar).
27. Blatt, M., Wiseman, S. & Domany, E. Superparamagnetic clustering of data. Phys. Rev. Lett. 76, 3251–3254 (1996).
28. Frey, B.J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).
29. Matys, V. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).
30. Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32 (suppl. 1), D91–D94 (2004).
31. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).
32. Berger, M.F. et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133, 1266–1276 (2008).
33. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. & Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Series B Stat. Methodol. 67, 91–108 (2005).
34. Boyd, S.P. & Vandenberghe, L. in Convex Optimization, Ch 11.7 (Cambridge University, Cambridge, UK, 2004).
35. Vilella, A.J. et al. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009).
36. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
37. Zhang, Z., Verheyden, J.M., Hassell, J.A. & Sun, X. FGF-regulated Etv genes are essential for repressing Shh expression in mouse limb buds. Dev. Cell 16, 607–613 (2009).