Diffusion pseudotime robustly reconstructs lineage branching · diffusion-like random walks. This...
Transcript of Diffusion pseudotime robustly reconstructs lineage branching · diffusion-like random walks. This...
DiffusionpseudotimerobustlyreconstructslineagebranchingLalehHaghverdi1,MarenBüttner1,F.AlexanderWolf1,FlorianBuettner1,2,FabianJ.Theis1,3
1HelmholtzZentrumMünchen–GermanResearchCenterforEnvironmentalHealth,InstituteofComputationalBiology,Neuherberg,Germany.2EuropeanMolecularBiologyLaboratory,EuropeanBioinformaticsInstitute,WellcomeTrustGenomeCampus,Hinxton,Cambridge,UK.3DepartmentofMathematics,TechnischeUniversitätMünchen,Munich,Germany.Single-cell gene expression profiles of differentiating cells encode their intrinsic latenttemporalorder.Wedescribeanefficientwaytorobustlyestimatethisorderaccordingtoadiffusionpseudotime,whichmeasurestransitionsonalllengthscalesbetweencellsusingdiffusion-like random walks. This allows us to identify cells that undergo branchingdecisionsorare inmetastablestates,andtherebygenesdifferentiallyregulatedatthesestates. Cellular programs are driven by gene-regulatory interactions, which due to inherentstochasticity as well as external influences often exhibit strong heterogeneity andasynchronyinthetimingofprogramexecution.Time-resolvedbulktranscriptomicsaveragesovertheseeffectsandobscurestheunderlyinggenedynamics. Instead,single-cellprofilingtechniquesallowasystematicobservationofasinglecell'sregulatorystate1,capturingcellsat various stages in their respective process2,3. To infer gene dynamics and hence thesequenceofcellularprograms, thecollective (‘universal’)processdynamics (Box1)canbereconstructedby reorderingcellsaccording tosomemeasureofexpressionsimilarity.Thisso-calledpseudotemporalorderingwasinitiallyproposedforbulkexpression4,andwaslaterextendedtosingle-cellRNA-seq5andproteinprofilesfrommasscytometry6.Box1:UniversaltimeIncontrasttocontinuoustimeobservationsofasinglecelle.g.fromtime-lapsemicroscopy,high-throughputsnapshotexperimentssuchassinglecellRNA-seqorFACSonlyencodethecollective(‘universal’)timedependenceofcells,notthestochastictrajectoriesofsinglecells.Wedefineuniversal timeas thegeodesicdistanceon themanifold that isassociatedwiththe deterministic program underlying the stochastic cellular process. For time-lapse data,universaltimecanbeconstructedbyestimatingthevelocity𝒗(𝑡)tangentialtothismanifoldCfromlocalaveragesofsingle-celltrajectories.Thegeodesicdistance
𝑠 𝑡 = 𝑑𝑠!: ! ! ,! !
= 𝑑𝑡!!
! 𝒗 𝑡! ≈ 𝑑𝑡!
!
!
1𝜌 𝑡! .
thenquantifiesthearclengthi.e.theuniversaltimealongthemanifold,where𝜌 𝑡 denotesthelocaldensityofsamplesonasinglecelltrajectory(seeSupplementarySec.1).Pseudotimes are proxies for universal time (Supplementary Figs. 1-3). Our proposed DPTapproximatesuniversal timebetter thanotherpseudotimeschemesas itdoesnot involvedimensionreduction,andbetterthandiffusiondistance11asitaccountsforrandomwalksonalllengthscales.Ultimately these approaches aim to fully understanddifferentiationdynamics as paths onWaddington’s‘epigenetic landscape’7,8.Howeversofar it isunclearhowtoidentifycellsatcritical branching decisions aswell as quiescent ormetastable cells, forwhich there is nonotionof temporal ordering.Moreover, asnovel experimental techniques suchasdroplet
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
sequencing9,10 allow to profile tens of thousands of cells, there is an urgent need forcomputationallyefficient,scalableandrobustalgorithms.To overcome these problems, we introduce ‘diffusion pseudotime’ (DPT). It measuresprogression through branching lineages using a random-walk-based distance in diffusionmapspace11andallowsforbranchingandpseudotimeanalysisonlarge-scaleRNA-seqdatasets.Evenintheabsenceofbranching,DPTissignificantlymorerobustwithrespecttonoiseinlow-densityregionsandcelloutliersthanexistingmethods,whichrelyontheestimationofminimumspanningtrees5orsampling-baseddistances6,12.Diffusion pseudotime is computed in three steps (Fig. 1a and Online Methods). First, atransitionmatrixTthatapproximatesthedynamictransitionsofcellsthroughstagesofthedifferentiation process is determined. The right eigenvectors of T are known as diffusioncomponents and have been used in diffusion maps for visualizing single cell RNA-seqdata13,14. While using only few diffusion components yields interpretable visualizations,important informationmaybelostbyremovingtheothers.Consequently,DPTisbasedonthefullrankTratherthanalowrankapproximation.Inthesecondstep,thedistancedpt(x,y)betweentwocellswithindexxandyiscomputedas
dpt(𝑥,𝑦) = 𝑴 𝑥, . −𝑴 𝑦, . !/!! , 𝑴 = 𝑻! . (1)!
!!!
Here, … !/!! denotes the𝐿! norm weighted by the first left eigenvector𝜑! of T, the
steadystate(OnlineMethods,SupplementarySec.1.3).Insteadoftheprobability(𝑻!)!"fora random walk of fixed length11 t from x to y, in Eq. (1), we compute the accumulatedtransition probability(𝑴)!" of visiting y when starting from x over random walks of alllengths.Thisisdoneusingthemodifiedtransitionmatrix𝑻,whichisdefinedasTwithouttheeigenspace associated with the steady state𝜑!,and therefore describes how the steadystateisapproached.Fixingaknownrootcellrasstartofthedynamicalprocessofinterest,wedefinethepseudotimeofcellxasdpt(x,r).Inthethirdstep,branchingpointsareidentifiedbycomparingtworandomwalksovercells,one starting at the root cell r and the other at its maximally distant cell y, measuringpseudotimeswithrespecttorandy, respectively. Thetwosequencesofpseudotimesareanticorrelateduntil the twowalksmerge inanewbranch,where theybecomecorrelated(Online methods). This criterion robustly identifies branching points as we illustrate forsimulationdataforwhichthegroundtruthisknown(SupplementaryFig.4).ToillustratetheabilityofDPTtoidentifybranchesonrealdata,wereanalyzedasingle-cellqPCRdatasetfocusingonearlyblooddevelopment15,forwhichwehaveshownpreviouslythat diffusion maps allow to visually identify a precursor branch splitting up into twoseparate lineages (cf. Fig.1b).DPTorderedcellsalongtheirdevelopmental trajectoryandidentified cells at thebranchingpoint. The three identifiedbranchesqualitatively coincidewithaprecursorbranchandthereportedblood(branch1)andendothelialbranches(branch2)15.Inparticularweidentifiedcharacteristicpatternsinthedevelopmentalstagesinbloodprogenitors (Fig.1d),namelythehemangioblast-likesequence16 (subsequentup-regulationof Cdh1 to Tal1 and Cdh5) at the precursor branch15 and the endothelial differentiationroute15onbranch2(elevatedlevelsofPecam1,ErgandSox17amongstothers).Further,we
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
findtheerythroid-likesequenceofEtv2,Tal1,Runx1andGata117atbranch1.Thetemporalresolution obtained by DPT indicates immediate (directly after branching, cf. Ikarosexpression in Fig. 1c) and late transitions (cf. Erg in Fig. 1c) as well as a number ofintermediate regulatory events15 until the onset ofHbb-bH1 expression (cf. Fig. 1d, blacktriangles),hintingatpotentialnovelregulatoryinteractions.
Figure1:Diffusion pseudotime reveals temporal ordering and cellular decisions on the single cell level. (a)Outline of the computational workflow. The diffusion transition matrix𝑻!" is constructed bysuperimposing local kernels at the expression levels of cells x and y (1). The diffusion pseudotimedpt(x,y)approximatesthegeodesicdistanceofxandyonthemappedmanifold(2).Branchingpointsareidentifiedaspointswhereanti-correlateddistancesfrombranchendsbecomecorrelated(3).(b)ApplicationofDPTtosingle-cellqPCRof42genesin3934singlecellsduringearlyhematopoiesis15,sorted from 5 different populations: primitive streak (PS), neural plate (NP), head fold (HF), foursomiteGFPnegative(4SG-),foursomiteGFPpositive(4SG+).DPTidentifiesonebranchingpoint.(c)Exemplary dynamics of genes Erg and Ikaros show qualitatively different behavior in the twobranches,blacklinesdescribeamovingaverageover50adjacentcellsalongtherespectivebranch.(d) Heatmap of gene expression, with cells ordered by diffusion pseudotime and genes orderedaccording to theonsetof firstmajor change inexpression (seeSupplementary Sec. 7.2, smoothedalong50adjacentcells,seeSupplementaryFig.5fornon-smoothedversion).Barsontopindicatethecells’ population (b) and cell density, respectively,with high density regions indicatingmetastablestates. The time series were clustered temporally by the time point of the first transition event(precursorbranch,branch1andnone,respectively,SupplementaryFig.6).Thepiecharts(bottom)showthefractionofcellsmakingupthemetastablestates(blackhorizontalline).
DC1
DC2
0
5
10
15
20
25
30
35
40
45
DC1
DC2
0
10
20
30
40
50
60
70
80
90
100
B1 B2
B1 B2
M
branch 1 branch 2
higher probability
lower probability
x
yx
y
a1 2
x
ydiffusion pseudotime:
scale-free average over random walks
3x
y
d(x,_)
d(y,_)
b d
decis
ion st
age
termina
l bran
ch 1
termina
l bran
ch 2
diffusion pseudotime
population indexcell density
gene expression
gene dynamics
not observed
activationdeactivation
c
root
DPTDPT
precu
rsor s
tage
construction of transition matrix
branching point identification
branch 2branch 1
cell state composition
correlated anti-correlatedorder vs :d(x,_) d(y,_)
z
0 1000 2000 3000order in DPT
-10
-8
-6
-4
-2
0
Ikar
os
0 1000 2000 3000order in DPT
-10
-8
-6
-4
-2
0
Erg
Ikar
os
Erg
DPT order DPT order
sorted populations
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
DPT identified regions of small time-steps i.e. of high cell density (Fig. 1d, top andSupplementaryFig.7b)alongthedifferentiationprocess.Thesehigh-densityregionsindicatemetastable states, which correspond to biologically meaningful intermediates: We foundfourmetastablestateswithexpressionpatternsofprecursorcells,hemangioblast-likecellsat the decision state, erythroid-like and endothelial-like cells. Notably, both decision andprecursor states consist of cellmixtures from two or three different stages, stressing theasynchrony of developmental stages that could not be resolved without pseudotemporalordering.Toidentifykeydecisiongenes,wequantifiedexpressiondifferencesintheidentifiedstatesof decision versus precursor usingMAST18 (Supplementary Fig. 8a). This resulted inmorethan50%ofchangedgenes(27outof42),includingPecam1andCbfa2t3h,whichareknowntoindicatehematopoieticandendothelialdevelopment16,respectively.Incontrast,only24genesaredifferentiallyexpressedbetweensortedcellsfromheadfoldandprimitivestreak,all changing gradually but preserving bimodal distribution (Supplementary Fig. 8d andSupplementaryTable1).Also,differentialgeneexpressionbetweenHFand4SG-cellsfailstoidentifyendothelialdifferentiationbutbringsuperythroidfactors(Runx1, IkarosandGfi1bamongstothers,seeSupplementaryFig.8eandSupplementaryTable2).Insummarywhencomparing differentially expressed genes between metastable states, we identify moregenes thancomparingdevelopmental stages,andthegeneshave lessbimodalexpression.This shows that the anatomical stages overstate developmental heterogeneity thusdisguisingtheroleofkeyfactors.DPT copes well with large-scale experiments such as scRNA-seq combined with dropletbarcoding10: In the experiment, Klein et al. monitored the transcriptomic profiles andheterogeneity indifferentiationofmouseES cells after LIFwithdrawal (Fig.2a).After cell-cyclenormalization (SupplementaryFigs.9-10),DPTdescribesa singledifferentiationpathfrom which two populations branch off. With increasing pseudotime, we observeupregulatedepiblastmarkers(Krt8/18/19)anddownregulatedpluripotencyfactors(Nanog,Fig. 2b). Clusteringof the geneexpressiondynamics identified four clusterswithdifferenttemporalbehaviors(Fig.2candSupplementaryFig.11)butcoherentbiologicalfunctions(Fig.2d).Earlypseudotimecoincideswithday0cellsexhibitingstrongexpressionofpluripotencyfactors (purple cluster). Then, a small subpopulation (57 cells)mainly consisting of day 2cellsbranchesoff,enrichedinneuron-associatedgenes(Bc1,Lin7b,Snord64,Tagln3,Dtnbp1,Nenf; 6 out of 22, see Supplementary Fig. 12). Subsequent stages are characterized by agradualdecreaseofpluripotencyfactorsandslowriseofbothprimitiveendodermmarkers(yellow cluster) and epiblast markers (orange cluster)19. In late pseudotime, another twobranchingeventgivesrise toapopulationwith increasedprimitiveendodermmarkers (21cells),whereasepiblastmarkergenesrisetwo-tothree-foldintheotherbranch(120cells).Altogether, DPT is able to remove asynchronity of scRNA-seq snapshot data from severaldays,aligningcellsintermsoftheirdegreeofdifferentiation.ToevaluateDPT’sperformancewithoutbranching compared toWanderlustandMonocle,we counted how often a pseudotime puts a cell from a later temporal sorting before anearlierone(measuredbyKendallrankcorrelation𝜏).DPTreconstructsthetemporalordersof ESC differentiation with significantly higher accuracy than Wanderlust (𝜏= 0.78±10-3versus0.71±10-3,respectively,Fig.2eandSupplementaryFig.13).Thisholdstruealsowhen
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
compared to Monocle in repeated bootstrap runs and on other data sets (Fig. 2f andSupplementaryTable3).
Figure2:Diffusionpseudotime identifiesdifferentiationdynamics indroplet-basedscRNA-seqexperiments10.(a)MouseESCsafter LIFwithdrawalwereharvestedat T=0,2, 4 and7daysandprofiledwith thedropSeqprotocol,giving2717cellswith24175observeduniquetranscripts10.Aftercorrectionforcellcyclevariation,alowdimensionalvisualizationusingdiffusionmapsshowsoverlappingbutdirectedtemporal dynamics. (b) Temporal dynamics of selected genes as in the original publicationreconstructedbyDPT,relativetoexpressionatinitialtimepoint.(c)DPTidentifiesamaintimeaxiswith twominor branching events: an early side branch and late separation of cells enrichedwithmarkersforepiblastsandprimitiveendoderm(top).PopulationandcelldensityareshownasinFig.1d. The heatmapdepicts genedynamics after hierarchical clustering and removal of a fluctuating,mainlyconstantsubgroup (cf.SupplementaryFig.11b):Thedynamicsubgroups (indicatedbycolorbar,right)consistofepiblastmarkerssuchasKrt8/18/19,Sfn,Tagln(orange),gradualdownregulatedpluripotency factors such as Pou5f1 (Oct4), Sox2, Trim28, Nanog (purple) and slow consistentupregulatedprimitiveendodermmarkerssuchasCol4a1/2,Lama1/b1,Serpinh1,Sparc(yellow).(d)Geneontologyenrichmentshowsacellularreorganizationsignature(orange),ametabolicsignatureconsistent for differentiation (purple) and a cell motility signature (yellow). (e) Pseudotimedistribution of cells in the experiments from the four different days, for DPT and Wanderlust.Diffusionpseudotimeorderscellswellalongthe four temporalcategories (Kendall rankcorrelation0.78±10-3),significantlybetterthanpseudotemporalorderingbyWanderlust(Kendalrankcorrelation0.71±10-3,seealsoSupplementaryFig.13).(f)ComparisonsofKendallrankcorrelationonbootstrapsamples (n=100bootstrapruns,downsamplingtomaximal1800cells forWanderlustandDPT,700cellsforMonocleduetoperformanceissues)forthepresentedESCdataset,theqPCRdatasetfromfigure 1 and an scRNA-seq of differentiating myoblasts from the Monocle paper5 show that DPTconsistentlyoutperformstheothertwomethods(2-sidedt-testwithsignificancelevels *p<0.05;**p<0.01;***p<0.001,n.s.notsignificant).
DC1DC2
DC3
T=0d T=2d T=4d T=7d
amESCs
-LIF
e
0
0.2
0.4
0.6
0.8
ESC dropseq
early blood qPCR
myoblast RNA-seq
DPT
Wande
rlust
Monoc
le
Monoc
le
Monoc
le
Wande
rlust
Wande
rlust
DPTDPT
conc
orda
nce
with
tim
e la
bels
f
cdiffusion pseudotime
population indexcell density
0.2 0.3 0.4 0.5 0.6 0.7 0.80.9 1Wanderlust pseudotime
0d
2d
4d
7d
time
lab
els
0.05 0.1 0.5Diffusion pseudotime
0d
2d
4d
7d
time
lab
els
d
b
0 0.1 0.2 0.3 0.4DPT
1
1.5
2
2.5
3
3.5
expressio
n
Pou5f1Dppa5aZfp42Ccnd3NanogEsrrbSox2Otx2ActbKrt8
Krt8
Actb
Otx2
Nanog
diffusion pseudotime (main branch)
expr
essi
on re
lativ
e to
t=0
*** ***
******
** n.s.
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
Inconclusion,weintroduceDPTasapseudotimemeasurethatovercomesthedeficienciesofexistingapproaches:itisabletodealwithbranchinglineagesandidentifiesmetastableorsteadystates,itisstatisticallyrobust,anditscomputationcanbescaledtolargedatasetswithoutdimensionreduction.ComparedtoWanderlust6,whichhasbeenproposedforthelower-dimensionalmasscytometrydata,wereplacedapproximateandcomputationallycostlysamplingofshortestpathsbytheexactandcomputationallycheapaverageoverrandomwalksineq.(1).ComparedtoMonocle5,whichworksonRNA-seqdatabutonlyafterdimensionreductionandonmediumsamplenumbers,DPT’saverageoverallrandomwalksissignificantlymorerobustthanMonocle’sminimumspanningtreeapproach(Fig.2e).In the future, robust computation of pseudotimes will allow inferring regulatoryrelationships with much higher confidence than based on perturbations alone15, and weexpectDPTtoallowscalingthistogenome-widemodels.Recentlypseudotemporalorderinghas been applied to cell morphology to identify cell cycle states20 – here diffusionpseudotimewouldallowinclusionofbranchingforexampletoidentifycellsswitchingintoaquiescent state as well as comparison to time-lapse microscopy via universal time. Tosummarize, diffusion pseudotime provides a powerful and robust tool to order cellsaccordingtotheirstateondifferentiationtrajectoriesinsingle-celltranscriptomicsstudies.References1. Shalek,A.K.etal.Single-celltranscriptomicsrevealsbimodalityinexpressionand
splicinginimmunecells.Nature498,236–240(2013).2. Moignard,V.etal.Characterizationoftranscriptionalnetworksinbloodstemand
progenitorcellsusinghigh-throughputsingle-cellgeneexpressionanalysis.NatCellBiol15,363–372(2013).
3. Treutlein,B.etal.Reconstructinglineagehierarchiesofthedistallungepitheliumusingsingle-cellRNA-seq.Nature509,371–375(2014).
4. Magwene,P.M.,Lizardi,P.&Kim,J.Reconstructingthetemporalorderingofbiologicalsamplesusingmicroarraydata.Bioinformatics19,842–850(2003).
5. Trapnell,C.etal.Thedynamicsandregulatorsofcellfatedecisionsarerevealedbypseudotemporalorderingofsinglecells.NatBiotechnol32,381–386(2014).
6. Bendall,S.C.etal.Single-celltrajectorydetectionuncoversprogressionandregulatorycoordinationinhumanBcelldevelopment.Cell157,714–725(2014).
7. Islam,S.etal.Characterizationofthesingle-celltranscriptionallandscapebyhighlymultiplexRNA-seq.GenomeResearch21,1160–1167(2011).
8. Trapnell,C.Definingcelltypesandstateswithsingle-cellgenomics.GenomeResearch25,1491–1498(2015).
9. Macosko,E.Z.etal.HighlyParallelGenome-wideExpressionProfilingofIndividualCellsUsingNanoliterDroplets.Cell161,1202–1214(2015).
10. Klein,A.M.etal.Dropletbarcodingforsingle-celltranscriptomicsappliedtoembryonicstemcells.Cell161,1187–1201(2015).
11. Coifman,R.R.etal.Geometricdiffusionsasatoolforharmonicanalysisandstructuredefinitionofdata:diffusionmaps.Proc.Natl.Acad.Sci.U.S.A.102,7426–7431(2005).
12. Qiu,P.etal.Extractingacellularhierarchyfromhigh-dimensionalcytometrydatawithSPADE.NatBiotechnol29,886–891(2011).
13. Haghverdi,L.,Buettner,F.&Theis,F.J.Diffusionmapsforhigh-dimensionalsingle-cellanalysisofdifferentiationdata.Bioinformatics31,2989–2998(2015).
14. Angerer,P.etal.destiny-diffusionmapsforlarge-scalesingle-celldatainR.Bioinformaticsbtv715(2015).doi:10.1093/bioinformatics/btv715
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
15. Moignard,V.etal.Decodingtheregulatorynetworkofearlyblooddevelopmentfromsingle-cellgeneexpressionmeasurements.NatBiotechnol33,269–276(2015).
16. Huber,T.L.,Kouskoff,V.,Fehling,H.J.,Palis,J.&Keller,G.Haemangioblastcommitmentisinitiatedintheprimitivestreakofthemouseembryo.Nature432,625–630(2004).
17. Costa,G.,Kouskoff,V.&Lacaud,G.OriginofbloodcellsandHSCproductionintheembryo.TrendsinImmunology33,215–223(2012).
18. Finak,G.etal.MAST:aflexiblestatisticalframeworkforassessingtranscriptionalchangesandcharacterizingheterogeneityinsingle-cellRNAsequencingdata.GenomeBiology16,278(2015).
19. Nishikawa,S.I.,Jakt,L.M.&Era,T.Embryonicstem-cellcultureasatoolfordevelopmentalcellbiology.NaturereviewsMolecularcellbiology8,502–507(2007).
20. Gut,G.,Tadmor,M.D.,Pe’er,D.,Pelkmans,L.&Liberali,P.Trajectoriesofcell-cycleprogressionfromfixedcellpopulations.NatMeth12,951–954(2015).
MethodsMethodsandanyassociatedreferencesareavailableintheonlineversionofthepaper.AccessioncodesAMATLABimplementationofDPTisavailableonhttp://www.helmholtz-muenchen.de/icb/dpt.AuthorcontributionsL.H.developedthemethodandthecomputationaltools,performedtheanalysisandwrotethepaper.M.B.contributedtotheanalysisofresults.F.B.andF.A.W.helpedinterprettheresultsandwritethesupplement.F.J.T.conceivedandsupervisedthestudy,contributedtothemethoddevelopmentandwrotethepaperwithhelpfromallco-authors.AcknowledgementsWewouldliketoacknowledgeCarstenMarr,JanHasenauer,MatthiasHeinig,JanKrumsiekandThomasBlasifortheirhelpfuladviceandcommentsonthemanuscript.M.B. is supported by a DFG Fellowship through the Graduate School of QuantitativeBiosciences Munich (QBM). F.A.W. acknowledges support by the “Helmholtz PostdocProgramme”,InitiativeandNetworkingFundoftheHelmholtzAssociation.F.B.issupportedbytheUKMedicalResearchCouncil(MRC)viaaCareerDevelopmentAward(Biostatistics).F.J.T. acknowledges financial support by the German Science Foundation (SFB 1243 andGraduateSchoolQBM)aswellasbytheBavariangovernment(BioSysNet).CompetingfinancialinterestsTheauthorsdeclarenocompetingfinancialinterests.
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
OnlineMethods
OverviewofDPTalgorithm0) (Initialization)inputsthefollowing:
a. ThenbyGdatamatrixb. One(orseveral)rootcell(s).c. Diffusionmapsoptions“classic”or“locallyscaled”andrespectivelythe
parameters“θ”(kernelwidth)or“κ"(numberofnearestneighboursforadjustingthekernelwidth).
1) ComputesthetransitionmatrixT.2) BuildstheaccumulatedtransitionmatrixMandcomputesdiffusionpseudotimewith
respecttothespecifiedroot.Ifseveralrootsaredefined,DPTaveragesthepseudotimeforeachcellyovertheseroots.
3) DPTiterativelyassignscellstobranchesandsubbranches.DPTgroupsthecellsforeachbranchandprovidesdiffusionpseudotimeforeachgroup.
DiffusionpseudotimeWecalculate thediffusionmaps transitionmatrixT and its rightand lefteigen-vectorsψ!and𝜑! .Itthencomputestheaccumulatedtransitionprobabilitiesoverallnumbersoftimesteps.
M = 𝑻! = (𝐼 − 𝑻)!! − 𝐼 where 𝑻 = T - ψ!𝜑!! . !
!!!
This isdone relative to thesteadystate𝜑! ,whichstoresno informationaboutdynamics.Fixingaknownrootcellxasstartofthedynamicalprocessofinterest,DiffusionpseudotimeofcellyisdefinedasadensityweightedL!norm
dpt(𝑥,𝑦) = 𝑴 𝑥, . −𝑴 𝑦, . ! !!.
FurtherdetailsaregiveninSupplementarySec.3.BranchassignmentWefindthecellywiththemaximaldptdistancefromtheroot(s)xandalsoanothercellzwhichhasmaximaldistancetoxandy:
z = argmax!! dpt 𝑧′, 𝑥 + dpt 𝑧′,𝑦 .
Ifthemanifoldisbranching,thenasdefinedyandzwillprovidecellsattwodifferenttipsoftwobranches.DPT thenobtains twoorderingsOy=dpt(.,y) andOz=dpt(.,z) anddetermines the cutoff celluntilwhichthesequenceoforderedcellsinOx(callthemXi),OyandOzbecomemaximallycorrelatedusingKendall’srankcorrelation.DPTthusassignscellsXitothebranchofx.DPT treats y and z as root of the subbranchesYi and Zi respectively and in a similarwaysearches for new subbranches within each branch. Further details are provided inSupplementarySec.4.
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
MetastablestatesThe pseudotemporal ordering of cell populations reflects gradual and switch-like changesalongacertainbranch.HighlysimilarcellshavesmalldistanceinthegenespaceandahighprobabilitytobereachedbyarandomwalkasdefinedbythetransitionmatrixT.Then,thedifferenceinpseudotimebetweensuchcellsissmall,i.e.,thedensityofthedistancetotheroot cell dpt(r,.) increases at sites, where highly similar cells are found. In particular,developmental steady states have high densities in the pseudotime measure, but theseaccumulationsitesarenotsufficienttodepictasteadystate.However,theseaccumulationsitesarenotsufficienttodepictasteadystate. DetectingtranscriptionalchangesTo identify the succession of switch-like transcriptional changes revealed by thepseudotemporal order in qPCR data, we computed an approximate derivative of thesmoothed gene expression level along branch 1. A switch-like change is defined as themaximuminthederivative(detailsinSupplementarySec.7.2). DifferentialexpressionanalysisWeemployedatwo-part,generalizedlinearmodelthatallowstoquantifytheproportionofcells expressing a certain gene as well as the mean expression level, a modified Hurdlemodel1. Briefly, the model has two parts: A discrete part to decide whether a gene isexpressedandacontinuouspartusinganormaldistributionifthegeneisexpressed.Then,a likelihood ratio test is used to identify differentially expressed genes (details inSupplementarySecs.5and7.3.andFinaketal1). ESCqPCRDataWe reanalyzed a single-cell qPCR dataset (normalized version with 3934 cells, 42 genes)focusingonearlyblooddevelopment2.Foreachgene,thelimitofdetection(LOD)wastheaverageCtvalueforthelastdilutionatwhichallsixreplicateshadpositiveamplification.TheoverallLODof25forthegenesetwasthemedianCtvalueacrossallgenes.RawCtvaluesand normalized data can be found in Supplementary Table 7 of Moignard et al2. Geneexpressionwassubtractedfromthelimitofdetectionandnormalizedonacell-wisebasistothemean expression of the four housekeeping genes (Eif2b1,Mrpl19,Polr2a andUbc) ineach cell. Cells that did not express all four housekeeping genes were excluded fromsubsequentanalysis,aswerecellsforwhichthemeanofthefourhousekeeperswas±3s.d.from themean of all cells. A dCt value of −14was then assignedwhere a genewas notdetected.85–90%ofsortedcellswereretained for furtheranalysis.Gata2didnotamplifycorrectly andHoxB3was not expressed in any cells, so these factors have been excludedfromtheanalysis.TheanalysesweredoneonthedCtvaluesforalltranscriptionfactorsandmarkergenes,butnothousekeepinggenes.DropSeqdataWereanalyzedasingle-cellRNA-seqdatasetusingthedropSeqprotocol fromKleinetal3.Here,singlecellsalongwithasetofuniquelybarcodedprimerswerecaptureintinydropletsandsequenced.Thecapabilitiesof this techniqueweredemonstratedusinganundirecteddifferentiationprocessofmouseembryonicstemcellsuponleukemiainhibitoryfactor(LIF)withdrawal.Thedata set ispubliclyavailableunder theGEOaccessionnumberGSE65525.Countdatawerenormalizedby librarysizeand log10transformed(seeSupplementarySec.8.1).We corrected for cell-cycle and batch effects using scLVM4 on 2044 highly variable
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint
genes (see Supplementary Table 3 in Klein et al3). Then, diffusionmapwith local densityrescaling (Supplementary Sec. 2) visualizes the temporal order for all cells. Hierarchicalclustering was performed in R (http://www.r-project.org/) using the hclust package onquantile-normalized data (Supplementary Sec. 8.2) and displayed with ComplexHeatmappackage, where the distance was defined as 1 – correlation between all samples(SupplementarySec.8.3).Inaddition,weperformedaranksumstestonthefirstsidebranchtoidentifygenesbeinguniquelydifferentfrominitialpluripotentandlateepiblast-likecells(SupplementarySec.8.4).ConcordanceofpseudotimewithtimelabelsWesubsampledsetsof ~70%ofdataand foreachsetperformedWanderlust5,Monocle6andDPTpseudotimeorderings. SinceMonocledoesnotperformonvery largenumberofcells(>103),wereducedthesubsamplingto700cellswhennecessary.Theconcordanceforeach subsetwas thenmeasured as Kendall tau correlation of each pseudotimewith timelabels of that subset. We then performed a t-test and calculated p-values between thehistogramoftheconcordancemeasureforWanderlustandMonoclecomparedtotheDPTpseudotimes.TheresultisshowninFig.2fofthemaintextandSupplementaryTable3. References
1. Finak,G.etal.MAST:aflexiblestatisticalframeworkforassessingtranscriptionalchangesandcharacterizingheterogeneityinsingle-cellRNAsequencingdata.GenomeBiology16,278(2015).
2. Moignard,V.etal.Decodingtheregulatorynetworkofearlyblooddevelopmentfromsingle-cellgeneexpressionmeasurements.NatBiotechnol33,269–276(2015).
3. Klein,A.M.etal.Dropletbarcodingforsingle-celltranscriptomicsappliedtoembryonicstemcells.Cell161,1187–1201(2015).
4. Buettner,F.etal.Computationalanalysisofcell-to-cellheterogeneityinsingle-cellRNA-sequencingdatarevealshiddensubpopulationsofcells.NatBiotechnol33,155–160(2015).
5. Bendall,S.C.etal.Single-celltrajectorydetectionuncoversprogressionandregulatorycoordinationinhumanBcelldevelopment.Cell157,714–725(2014).
6. Trapnell,C.etal.Thedynamicsandregulatorsofcellfatedecisionsarerevealedbypseudotemporalorderingofsinglecells.NatBiotechnol32,381–386(2014).
.CC-BY-NC 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/041384doi: bioRxiv preprint