Sakamoto, M., Venditti, C., & Benton, M. J. (2017). 'Residual diversity estimates' do not correct for sampling bias in palaeodiversity data. Methods in Ecology and Evolution, 8(4), 453–459. https://doi.org/10.1111/2041-210X.12666 Peer reviewed version License (if available): CC BY-NC Link to published version (if available): 10.1111/2041-210X.12666 Link to publication record in Explore Bristol Research PDF-document This is the author accepted manuscript (AAM). The final published version (version of record) is available online via Wiley at http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12666/abstract. Please refer to any applicable terms of use of the publisher. University of Bristol - Explore Bristol Research General rights This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/red/research-policy/pure/user-guides/ebr-terms/


‘Residual diversity estimates’ do not correct for sampling bias in palaeodiversity data

SHORT TITLE: Do not use residuals method

WORD COUNT: 3,635

Manabu Sakamoto¹, Chris Venditti¹ and Michael J. Benton²

¹ School of Biological Sciences, University of Reading, Reading, RG6 6AJ, UK

² School of Earth Sciences, University of Bristol, Bristol, BS8 1RJ, UK

EMAIL: [email protected]

ABSTRACT

1. It is widely accepted that the fossil record suffers from various sampling biases – diversity signals through time may partly or largely reflect the rock record – and many methods have been devised to deal with this problem. One widely used method, the ‘residual diversity’ method, uses residuals from a modelled relationship between palaeodiversity and sampling (sampling-driven diversity model) as ‘corrected’ diversity estimates, but the unorthodox way in which these residuals are generated presents serious statistical problems; the response and predictor variables are decoupled through independent sorting, rendering the new bivariate relationship meaningless.

2. Here, we use simple simulations to demonstrate the detrimental consequences of independent sorting, through assessing error rates and biases in regression model coefficients.

3. Regression models based on independently sorted data result in unacceptably high rates of incorrect and systematically, directionally biased estimates, when the true parameter values are known. The large number of recent papers that used the method are likely to have produced misleading results and their implications should be reassessed.

4. We note that the ‘residuals’ approach based on the sampling-driven diversity model cannot be used to ‘correct’ for sampling bias, and instead advocate the use of phylogenetic multiple regression models that can include various confounding factors, including sampling bias, while simultaneously accounting for statistical non-independence owing to shared ancestry. Evolutionary dynamics such as speciation are inherently a phylogenetic process, and only an explicitly phylogenetic approach will correctly model this process.

KEYWORDS

Palaeodiversity; residuals; modeling; sampling bias; fossil record; independent sorting

INTRODUCTION

It has been well known since the time of Darwin that the fossil record is largely incomplete (Darwin 1859), prompting generations of macroevolutionary researchers to take a cautious approach when interpreting patterns of palaeodiversity through time (Raup 1972; Raup 1976; Raup 1991; Prothero 1999; Smith & McGowan 2007; Alroy 2010b). There have been many attempts to account for this sampling bias (Raup 1972; Raup 1976; Smith & McGowan 2007; Alroy 2010b), but one approach in particular, often referred to as the ‘residual diversity’ method, devised by Smith and McGowan (2007) (and modified by Lloyd (2012)), has been widely used (citation count ~215 to Aug 2016; Google Scholar).

Using regression residuals as data ‘corrected’ for confounding factors is a widely used method in biology, the social sciences and economics (King 1986; Freckleton 2002), and even in palaeodiversity studies (Raup 1976). However, Smith and McGowan’s (2007) approach differs from these classical residuals approaches in one key way: the ‘residuals’ are generated not as regression residuals (ε = y - ŷ) from a simple regression of diversity (y) on a proxy of sampling (x), but from “a model in which rock area at outcrop was a perfect predictor of sampled diversity” (Smith & McGowan 2007), here referred to as the sampling-driven diversity model (SDDM). The SDDM is constructed as a regression model between y sorted from low to high values (y’) and x sorted from low to high values (x’), where the relationship between these two independently sorted variables y’ and x’ is assumed to represent the SDD generating process – though there is no reason to assume so. ‘Residuals’ are obtained as the difference between the SDDM predictions ŷ’ and the observed values y, which are then treated as the ‘residual diversity estimates’ (figure 1).
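The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of Smith and McGowan (2007): the helper names (`ols`, `sddm_residuals`, `classical_residuals`) are hypothetical, the toy data use the values quoted for figure 1 (a = 0.4, b = 0.6, σe = 0.5) with n = 100, and SDDM predictions are evaluated at the observed x, which is one reading of the description.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of an ordinary least squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def classical_residuals(x, y):
    """Ordinary regression residuals: y minus the fit to the paired data."""
    a_hat, b_hat = ols(x, y)
    return [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

def sddm_residuals(x, y):
    """'Residual diversity estimates': y minus the fit to independently sorted data."""
    # Sorting each variable on its own is the step that decouples the pairs.
    a_srt, b_srt = ols(sorted(x), sorted(y))
    return [yi - (a_srt + b_srt * xi) for xi, yi in zip(x, y)]

# Toy data (assumed values): x ~ N(0, 1), y = 0.4 + 0.6 x + N(0, 0.5).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [0.4 + 0.6 * xi + rng.gauss(0.0, 0.5) for xi in x]

a_paired, b_paired = ols(x, y)
a_sorted, b_sorted = ols(sorted(x), sorted(y))
classical = classical_residuals(x, y)
sddm = sddm_residuals(x, y)
print("paired slope:", b_paired, "sorted slope:", b_sorted)
```

Because both sorted vectors are in ascending order, their cross-product sum is at least as large as that of the paired data (the rearrangement inequality), so the sorted slope can never be smaller than the paired slope. The SDDM ‘residuals’ therefore retain a negative association with x instead of being orthogonal to it.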

However, independently sorting y and x as outlined above decouples a paired, bivariate dataset, and is obviously problematic in statistics. Model fitting on decoupled data (e.g. y’ and x’) will lead to spurious predictions and ‘residuals’ as the estimated regression coefficients will be based on a forced (false) linear relationship (figure 1b). However, owing to continued wide use of the SDDM as a preferred method for identifying supposedly ‘true’ palaeodiversity signals (as recently as Grossnickle & Newham 2016), it appears that this basic statistical concept has somehow been overlooked. While it has been suggested that the use of formation counts to ‘correct’ palaeodiversity time series data is unlikely to be meaningful because of substantial redundancy of the two metrics (Benton et al. 2011; Benton 2015), and a recent study has scrutinized the performance of SDDM residuals in accurately predicting true simulated biodiversity signals (Brocklehurst 2015), the performance of the SDDM itself has never been assessed. Here, we demonstrate the detrimental effects of decoupling data in regression modelling using simple simulations.

MATERIAL AND METHODS

We first generated random deviates, x, sampling from a normal distribution (μ = 0, σ = 1), at a sample size n = 100 (see SI for other sample sizes n = 30 and 1000). We then calculated y using a linear relationship in the form of y = a + bx + e, where a is the intercept, b is the slope and e is Gaussian noise. For simplicity, we fixed a = 0.4 and b = 0.6, while varying e (μe = 0, σe = 0.05, 0.1, 0.25, 0.5) – other values of a and b should return similar if not identical results (though, b = 1 would be meaningless). Following Smith and McGowan (2007), we sorted y and x independently of each other to generate y’ and x’, and fitted an ordinary least squares (OLS) regression model to y’ on x’ (SDDM). For comparison, we fitted an OLS regression model to y on x in their original paired bivariate relationship (the standard regression model, SRM), the performance of which serves as a benchmark.

To test Smith and McGowan’s (2007) assertion that the SDDM is indeed “a model in which rock area at outcrop was a perfect predictor of sampled diversity”, we evaluated whether the estimated regression coefficients α and β significantly differed from the true regression parameters, a and b, using a t-test. We repeated the procedure over 5000 simulations and calculated the percentage of times the estimated coefficients differed significantly from the true parameters. We would expect about 5% of the simulations to result in regression coefficients significantly different from the true parameters by chance alone; anything substantially above this threshold would indicate that the model has an unacceptably high Type I error rate, i.e. that it falsely rejects a true null hypothesis too often, where our null hypothesis is that the SDDM can correctly estimate the ‘true’ model parameters.
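The error-rate test described above could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the study: the function names are hypothetical, the default number of simulations is reduced from 5000 to 500 to keep the run short, and the critical value of Student's t is hard-coded for df = 98, so it is valid only for n = 100.

```python
import math
import random

T_CRIT = 1.9845  # two-sided 5% critical value of Student's t for df = 98 (n = 100 only)

def fit(x, y):
    """OLS of y on x: intercept, slope and their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope

def rejection_rates(sigma_e, n_sims=500, n=100, a=0.4, b=0.6, seed=0):
    """Share of simulations whose coefficients differ significantly from a and b."""
    rng = random.Random(seed)
    counts = {"SRM": [0, 0], "SDDM": [0, 0]}  # [intercept, slope] rejections
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        # SRM uses the paired data; SDDM uses independently sorted data.
        for name, (u, v) in (("SRM", (x, y)), ("SDDM", (sorted(x), sorted(y)))):
            alpha, beta, se_a, se_b = fit(u, v)
            counts[name][0] += abs((alpha - a) / se_a) > T_CRIT
            counts[name][1] += abs((beta - b) / se_b) > T_CRIT
    return {k: (v[0] / n_sims, v[1] / n_sims) for k, v in counts.items()}

rates = {s: rejection_rates(s) for s in (0.05, 0.1, 0.25, 0.5)}
for s, r in rates.items():
    print(s, r)  # (intercept rate, slope rate) per model
```

Each call returns, per model, the fraction of simulations in which the intercept and the slope were rejected. Multiplying by 100 gives percentages comparable in form to those in table 1.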

In addition, we tested for bias in the estimated regression slopes, i.e. whether the estimates systematically deviated from the simulation parameter b = 0.6. The mean of the 5000 slopes was subjected to a t-test against a fixed value of 0.6. If deviations were random, then we would not expect to find any significant differences between the mean slope and the theoretical value, with all slopes randomly distributed around it.
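A minimal sketch of this bias test, assuming the same simulation settings as above, is given below. The function names are hypothetical, 1000 rather than 5000 simulations are used for speed, and only the t statistic is reported because the Python standard library has no t-distribution for p-values.

```python
import math
import random
import statistics

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sum((xi - mx) ** 2 for xi in x)

def slope_bias(sigma_e, n_sims=1000, n=100, a=0.4, b=0.6, seed=0):
    """Mean slope and one-sample t statistic against b, for SRM and SDDM."""
    rng = random.Random(seed)
    slopes = {"SRM": [], "SDDM": []}
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        slopes["SRM"].append(slope(x, y))
        slopes["SDDM"].append(slope(sorted(x), sorted(y)))
    out = {}
    for name, vals in slopes.items():
        mean = statistics.fmean(vals)
        se = statistics.stdev(vals) / math.sqrt(n_sims)  # standard error of the mean
        out[name] = (mean, (mean - b) / se)  # (mean slope, t statistic)
    return out

bias = slope_bias(0.5)
print(bias)
```

A t statistic near zero indicates slopes scattered randomly around b, whereas a large positive value indicates systematic overestimation.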

RESULTS

SRM coefficients were significantly different from the true model parameters in only ~5% of the 5000 iterations across σe (figure 2a; table 1; SI), within acceptable levels of randomly detecting a statistical significance. Variation in regression lines across 5000 iterations is distributed randomly about the simulated line (figure 3a), with no significant difference between the mean regression slope and the simulation parameter b = 0.6 (table 2; SI). In contrast, SDDM coefficients were significantly different from the true parameters (figure 2b) in 26.1–100% of iterations, far above the conventionally accepted 5% (table 1; SI). The mean slope of the regression models significantly differed from the simulation parameter b, in a systematic and directional manner (figure 3b; table 2; SI) – SDDM regression coefficients are not only incorrect but grossly misleading. This systematic bias increases with increased noise in the data (table 2) – the more noise there is in the data, the more positive the relationship between y’ and x’ becomes.
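A back-of-envelope approximation, which is not taken from the original analysis, suggests why the SDDM slope drifts upward with noise. An OLS slope equals the correlation multiplied by the ratio of standard deviations. Independent sorting leaves the two marginal standard deviations unchanged but pushes the correlation r’ towards 1. Treating r’ ≈ 1 and replacing sample standard deviations with their population values are the assumptions here:

```latex
\hat{\beta}' \;=\; r'\,\frac{s_{y'}}{s_{x'}} \;=\; r'\,\frac{s_{y}}{s_{x}}
\;\approx\; \frac{\sqrt{b^{2}\sigma_{x}^{2} + \sigma_{e}^{2}}}{\sigma_{x}}
\quad (r' \approx 1),
\qquad
\hat{\beta} \;=\; r\,\frac{s_{y}}{s_{x}} \;\approx\; b .
```

With σx = 1 and b = 0.6 this gives approximately 0.602, 0.608, 0.650 and 0.781 for σe = 0.05, 0.1, 0.25 and 0.5. These values are close to, and slightly above, the mean SDDM slopes reported in table 2 (0.602, 0.607, 0.646 and 0.775), as expected when r’ is slightly below 1.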

DISCUSSION

By establishing “a model in which rock area at outcrop was a perfect predictor of sampled diversity”, Smith and McGowan (2007) attempted to create a sampling-driven diversity model. However, their SDDM is not based on any hypothesized or empirical relationship between diversity and sampling, or formulated from first principles. This is in contrast to other well-formulated biological models such as various scaling models where the parameter of interest (i.e. scaling coefficient or the slope of the bivariate relationship) is founded on first-principle theories, e.g. the 2/3 rule for the scaling of area with mass. Rather, the SDDM is based on the assumption that y’ and x’ (y and x sorted independently of each other) form the expected theoretical bivariate relationship between y and x, which this study shows to be incorrect (figures 2, 3), as one would expect since there is no reason to assume such a thing.

A further and perhaps more serious problem with using a forced pairing of y’ and x’ is that each data point (pair of y’_i and x’_i) does not represent a natural pairing and has no meaning; the new pairing is actually y_i and x_j, where the ith and jth orders are independent of each other. For instance, using the marine generic diversity and rock area data of Smith and McGowan (2007) (figure 4), the lowest marine generic diversity is in the Cambrian, Tommotian Stage (529–521 million years ago [Ma]; genus count = 309), while the smallest marine rock outcrop area (after removing 0 valued data (Smith & McGowan 2007)) is from the Early Permian, Asselian/Sakmarian Stage (299–290 Ma; rock area = 1). Similarly, the highest diversity is recorded for the Pliocene (5.3–2.58 Ma; genus count = 3911) while the largest rock area is found in the Cenomanian (100–94 Ma; rock area = 373). These two extreme points alone demonstrate that the paired diversity and rock area values are millions of years apart, and are independent of each other (figure 4).

This may be obvious, but independently sorting y and x has serious statistical consequences. For instance, in Smith and McGowan’s (2007) data, log10 marine generic diversity has no significant relationship with log10 rock area in their original paired bivariate data (figure 4; r² = 0.0398; p = 0.0979), but once sorted, has a significant and strong positive relationship with log10 rock area sorted independently of log10 diversity (figure 4; r² = 0.903; p < 0.001). This general pattern is true in at least two more datasets (Benson et al. 2010; Benson & Upchurch 2013) (figures S1 and S2). The independent sorting procedure has forced a strong but false linear relationship between two variables that otherwise do not show any significant (or if significant, a very weak) relationship. In fact, two randomly generated deviates (e.g. sampled from a normal distribution) that have no relationship with each other (figure 5a), once sorted independently from lowest to highest, will inevitably have a significant and strong relationship (r² ≈ 1; figure 5b). Perhaps more detrimental is the fact that the independently sorted bivariate relationship will always be strongly positive – a simulated negative relationship between x and y (figure 5c) will have a strong and positive relationship once they are sorted independently (figure 5d).
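Both effects described above, a strong relationship forced out of unrelated variables and a negative relationship turned positive, can be reproduced with a short simulation. The sketch below is illustrative only: the helper name `slope_and_r2` is hypothetical, and the parameters of the negative relationship (slope -0.6, σe = 0.5) are assumed for the example, not taken from figure 5.

```python
import random

def slope_and_r2(x, y):
    """OLS slope of y on x and the coefficient of determination r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

rng = random.Random(42)
n = 100
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_null = [rng.gauss(0.0, 1.0) for _ in range(n)]            # unrelated to x
y_neg = [0.4 - 0.6 * xi + rng.gauss(0.0, 0.5) for xi in x]  # assumed negative slope

results = {}
for label, y in (("unrelated", y_null), ("negative", y_neg)):
    results[label] = {
        "paired": slope_and_r2(x, y),                  # original pairing
        "sorted": slope_and_r2(sorted(x), sorted(y)),  # independent sorting
    }
    print(label, results[label])  # (slope, r^2) for each pairing
```

Sorting both variables in ascending order guarantees a non-negative covariance, so the sign of the original relationship is lost regardless of the data.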

In some clades (namely Mesozoic dinosaurs), diversity measures can have very strongly positive relationships with some sampling metrics, such as geological formation counts (β = 0.868; r² = 0.85; p < 0.001 (Barrett, McGowan & Page 2009)) or fossil collection counts (β = 0.865; r² = 0.79; p < 0.001 (Butler et al. 2011)), which would justify correcting for such confounding factors, if the sampling metrics were indeed non-redundant with diversity (Benton et al. 2011; Benton et al. 2013; Benton 2015). However, even in such cases, it does not change the fact that the modelled relationship obtained from the SDDM will still be systematically biased (figure 3), and alternative methods should be considered.

It is problematic to stipulate that this forced relationship is the ‘true’ relationship between sampled palaeodiversity and the rock record. Our simulations show that regression models fitted on independently sorted data have unacceptably high Type I error rates when the data generation processes are known, meaning that Smith and McGowan’s (2007) approach is not statistically viable. In particular, the fact that the slopes are incorrectly estimated at very high rates (~100% when σe = 0.5) has severe consequences in that SDDM predictions are systematically biased (figures 2b, 3b), leading to erroneous ‘residuals’. Inferences made from such problematic ‘residuals’ (e.g. Smith & McGowan 2007; Barrett, McGowan & Page 2009; Benson et al. 2010; Butler et al. 2011; Benson & Upchurch 2013) will inevitably be misleading (Brocklehurst 2015), lacking any biological or geological meaning.

Given our simulations, we strongly recommend against using the SDDM approach in modelling the relationship between palaeodiversity and rock record data; the standard regression using unsorted data is a sensible option. However, using the residuals of a regression model as data for subsequent analyses has also long been known to introduce biased statistical estimates (King 1986; Freckleton 2002). Successive series of modelling removes variance and degrees of freedom from subsequent model parameter estimation, so the final models and statistical analyses do not account for the removed errors appropriately (King 1986). Instead, one can directly model the confounding effects along with effects of interest (e.g. environment, climate, etc) through multiple regressions (OLS, GLMs or generalized least squares [GLS]). In the context of palaeodiversity studies, one can fit a multiple regression model using some diversity metric as the response variable and sampling proxy as a confounding covariate, alongside additional predictor variables such as sea level, temperature, etc. The resulting model coefficients for the environmental predictors would be the effects of interest after accounting for the undesired effects of rock availability. Since diversity measures are frequently taken as counts, it is advisable to use models that appropriately account for errors in count data, such as the Poisson or negative binomial models (O'Hara & Kotze 2010). Whether or not to include time series terms (e.g. autoregressive [AR] terms) depends on the level of serial autocorrelation in the time series data and on sample size; palaeontological time series tend to be short, with 30 time bins or fewer being fairly typical (Mesozoic dinosaurs only span a maximum of 26 geological stages (Butler et al. 2011; Benson & Mannion 2012)), in which case complex models face the risks of over-parameterisation. Model selection procedures using the Akaike Information Criterion (Akaike 1973) or similar indices can help make this decision (Burnham & Anderson 2002). However, we do not lightly advocate the use of time series modelling, especially if the dependent variable, sampled diversity, is in the form of counts, in which case appropriate time series methods are severely under-developed (but see generalised linear autoregressive moving average [GLARMA] models (Dunsmuir & Scott 2015) or Poisson exponentially weighted moving average [PEWMA] models (Brandt et al. 2000)), but more importantly since there are more appropriate alternative methods, i.e. phylogenetic approaches (? Silvestro et al. 2015; Sakamoto, Benton & Venditti 2016).
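The contrast drawn above between modelling a sampling proxy as a covariate and analysing regression residuals can be illustrated with a small simulation. Everything in the sketch is assumed for the purpose of illustration: the variable names, the effect sizes, the correlation of 0.7 between the sampling proxy and sea level, and the large n, which is far longer than any real palaeontological series and chosen only to make the bias visible. It also uses Gaussian OLS for brevity, whereas count data would call for the Poisson or negative binomial models recommended above.

```python
import math
import random

def centred(v):
    """Subtract the mean so that intercepts drop out of the slope formulas."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

rng = random.Random(7)
n = 2000                   # deliberately large (assumed) to expose the bias
rho = 0.7                  # assumed correlation between sampling and sea level
b_samp, b_sea = 0.5, 0.3   # assumed true effects on diversity

sampling = [rng.gauss(0.0, 1.0) for _ in range(n)]
sea_level = [rho * s + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
             for s in sampling]
diversity = [b_samp * s + b_sea * z + rng.gauss(0.0, 0.5)
             for s, z in zip(sampling, sea_level)]

x1, x2, y = centred(sampling), centred(sea_level), centred(diversity)
s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
s1y, s2y = dot(x1, y), dot(x2, y)

# (1) Multiple regression: both slopes estimated jointly (normal equations).
det = s11 * s22 - s12 ** 2
sea_joint = (s11 * s2y - s12 * s1y) / det

# (2) Two-step approach: regress diversity on sampling, then regress the
#     residuals on sea level.
step1_slope = s1y / s11
resid = [yi - step1_slope * xi for xi, yi in zip(x1, y)]
sea_two_step = dot(x2, resid) / s22

r12_sq = s12 ** 2 / (s11 * s22)  # squared correlation between the predictors
print(sea_joint, sea_two_step, sea_joint * (1 - r12_sq))
```

In this setting the two-step estimate equals the joint estimate multiplied by (1 - r²), where r is the sample correlation between the two predictors, so it shrinks towards zero whenever the sampling proxy and the environmental predictor are correlated.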

Fundamentally, macroevolutionary studies aim to increase our understanding of evolutionary processes (speciation and extinction through time), rather than the resulting patterns or phenomena (sampled diversity, e.g. richness). Thus, we should seek to characterize the process using biologically meaningful and interpretable models instead of describing the patterns. Further, simply exploring error in the fossil record in itself seems rather fruitless because uncertainty depends on the questions being posed; palaeontological studies of macroevolution should be no different from other statistical approaches in the natural sciences in that uncertainty is assessed while exploring the phenomena of interest (Benton 2015). Explicitly phylogenetic approaches (e.g. Lloyd et al. 2008; Didier, Royer-Carenzi & Laurin 2012; Stadler 2013; Stadler et al. 2013; Sakamoto, Benton & Venditti 2016) offer the best and most appropriate means to tackle questions of evolutionary processes. Especially when extrinsic causal mechanisms for changes in biodiversity are tested using regression models, ignoring phylogeny is in serious violation of statistical independence (Felsenstein 1985; Harvey & Pagel 1991). It is also worth noting that although subsampling approaches (e.g. Alroy’s SQS (Alroy 2010a; Alroy 2010b; Alroy 2010c)) are gaining wide popularity as modern methods to account for sampling bias, they are not without problems (Hannisdal et al. 2016), and certainly do not take shared ancestry described by phylogeny into account, thus also suffering statistical non-independence (Felsenstein 1985; Harvey & Pagel 1991), and can frequently result in incorrect interpretation of the data. For instance, while recent studies using binned time series approaches (including SDDM and SQS) have led to mixed conclusions regarding the long-term demise of dinosaurs before their final extinction at the Cretaceous-Paleogene (K-Pg) boundary 66 million years ago (Ma) (Barrett, McGowan & Page 2009; Lloyd 2012; Brusatte et al. 2015), an explicitly phylogenetic Bayesian analysis has strongly suggested that dinosaurs were indeed in a long-term decline tens of millions of years prior to the K-Pg mass extinction event, in which speciation rate was exceeded by extinction rate and dinosaurs were increasingly incapable of replacing extinct taxa with new ones (Sakamoto, Benton & Venditti 2016). Such evolutionary dynamics cannot be identified using time-binned (tabulated) data. Phylogenetic mixed modelling approaches (Hadfield 2010) further allow the incorporation of confounding variables such as sampling but also environmental effects (Sakamoto, Benton & Venditti 2016). Therefore, in order to advance our understanding of the evolutionary dynamics of biodiversity, speciation and extinction through time (or the underlying process generating the observed patterns in sampled diversity, e.g. taxon richness), while accounting for sampling and phylogenetic non-independence, it is imperative that we have an abundance of large-scale comprehensive phylogenetic trees of fossil (and extant) taxa.

ACKNOWLEDGEMENTS

We thank Jo Baker, Ciara O’Donovan and Henry Ferguson-Gow for discussion and insightful comments. We also thank Neil Brocklehurst and Michel Laurin for reviewing this manuscript and providing helpful commentary. We have no conflicts of interest.

FUNDING

MS and CV are funded by Leverhulme Trust Research Project Grant RPG-2013-185 (awarded to CV). MJB is funded by Natural Environment Research Council Standard Grant NE/I027630/1.

REFERENCES

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory (eds B.N. Petrov & F. Csaki), pp. 267–281. Akademiai Kiado, Budapest.

Alroy, J. (2010a) Fair sampling of taxonomic richness and unbiased estimation of origination and extinction rates. Quantitative methods in paleobiology. Paleontological Society Papers, 16, 55-80.

Alroy, J. (2010b) Geographical, Environmental and Intrinsic Biotic Controls on Phanerozoic Marine Diversification. Palaeontology, 53, 1211-1235.

Alroy, J. (2010c) The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194.

Barrett, P.M., McGowan, A.J. & Page, V. (2009) Dinosaur diversity and the rock record. Proceedings of The Royal Society B-Biological Sciences, 276, 2667-2674.

Benson, R.B.J., Butler, R.J., Lindgren, J. & Smith, A.S. (2010) Mesozoic marine tetrapod diversity: mass extinctions and temporal heterogeneity in geological megabiases affecting vertebrates. Proceedings of The Royal Society B-Biological Sciences, 277, 829-834.

Benson, R.B.J. & Mannion, P.D. (2012) Multi-variate models are essential for understanding vertebrate diversification in deep time. Biology Letters, 8, 127-130.

Benson, R.B.J. & Upchurch, P. (2013) Diversity trends in the establishment of terrestrial vertebrate ecosystems: Interactions between spatial and temporal sampling biases. Geology, 41, 43-46.

Benton, M.J. (2015) Palaeodiversity and formation counts: redundancy or bias? Palaeontology, 58, 1003-1029.

Benton, M.J., Dunhill, A.M., Lloyd, G.T. & Marx, F.G. (2011) Assessing the quality of the fossil record: insights from vertebrates. Comparing the Geological and Fossil Records: Implications for Biodiversity Studies, 358, 63-94.

Benton, M.J., Ruta, M., Dunhill, A.M. & Sakamoto, M. (2013) The first half of tetrapod evolution, sampling proxies, and fossil record quality. Palaeogeography Palaeoclimatology Palaeoecology, 372, 18-41.

Brandt, P.T., Williams, J.T., Fordham, B.O. & Pollins, B. (2000) Dynamic modeling for persistent event-count time series. American Journal of Political Science, 44, 823-843.

Brocklehurst, N. (2015) A simulation-based examination of residual diversity estimates as a method of correcting for sampling bias. Palaeontologia Electronica, 18.

Brusatte, S.L., Butler, R.J., Barrett, P.M., Carrano, M.T., Evans, D.C., Lloyd, G.T., Mannion, P.D., Norell, M.A., Peppe, D.J., Upchurch, P. & Williamson, T.E. (2015) The extinction of the dinosaurs. Biological Reviews, 90, 628-642.

Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference: a practical information-theoretical approach, 2nd edn. Springer, New York.

Butler, R.J., Benson, R.B.J., Carrano, M.T., Mannion, P.D. & Upchurch, P. (2011) Sea level, dinosaur diversity and sampling biases: investigating the 'common cause' hypothesis in the terrestrial realm. Proceedings of The Royal Society B-Biological Sciences, 278, 1165-1170.

Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, First Edition edn., London, UK.

Didier, G., Royer-Carenzi, M. & Laurin, M. (2012) The reconstructed evolutionary process with the fossil record. Journal Of Theoretical Biology, 315, 26-37.

Dunsmuir, W.T.M. & Scott, D.J. (2015) The glarma Package for Observation-Driven Time Series Regression of Counts. Journal of Statistical Software, 67, 1-36.

Felsenstein, J. (1985) Phylogenies and the Comparative Method. American Naturalist, 125, 1-15.

Freckleton, R. (2002) On the misuse of residuals in ecology: regression of residuals vs. multiple regression. (vol 71, pg 542, 2002). Journal of Animal Ecology, 71, 722-722.

Grossnickle, D.M. & Newham, E. (2016) Therian mammals experience an ecomorphological radiation during the Late Cretaceous and selective extinction at the K–Pg boundary. Proceedings of the Royal Society of London B: Biological Sciences, 283.

Hadfield, J.D. (2010) MCMC methods for multi-response Generalized Linear Mixed Models: The MCMCglmm R Package. Journal of Statistical Software, 33, 1-22.

Hannisdal, B., Haaga, K.A., Reitan, T., Diego, D. & Liow, L.H. (2016) Common species link global ecosystems to climate change. bioRxiv, 043729.

Harvey, P.H. & Pagel, M.D. (1991) The comparative method in evolutionary biology. Oxford University Press.

King, G. (1986) How Not to Lie with Statistics - Avoiding Common Mistakes in Quantitative Political-Science. American Journal of Political Science, 30, 666-687.

Lloyd, G.T. (2012) A refined modelling approach to assess the influence of sampling on palaeobiodiversity curves: new support for declining Cretaceous dinosaur richness. Biology Letters, 8, 123-126.

Lloyd, G.T., Davis, K.E., Pisani, D., Tarver, J.E., Ruta, M., Sakamoto, M., Hone, D.W.E., Jennings, R. & Benton, M.J. (2008) Dinosaurs and the Cretaceous Terrestrial Revolution. Proceedings Of The Royal Society B-Biological Sciences, 275, 2483-2490.

O'Hara, R.B. & Kotze, D.J. (2010) Do not log-transform count data. Methods in Ecology and Evolution, 1, 118-122.

Prothero, D. (1999) Fossil record. Encyclopedia of paleontology (ed. R. Singer). Fitzroy Dearborn Publishers, Chicago, USA.

Raup, D.M. (1972) Taxonomic Diversity during the Phanerozoic. Science, 177, 1065-1071.

Raup, D.M. (1976) Species Diversity in the Phanerozoic: An Interpretation. Paleobiology, 2, 289-297.

Raup, D.M. (1991) Extinction: bad genes or bad luck? W.W. Norton, New York.

Sakamoto, M., Benton, M.J. & Venditti, C. (2016) Dinosaurs in decline tens of millions of years before their final extinction. Proceedings of the National Academy of Sciences, USA, 113, 5036-5040.

Silvestro, D., Antonelli, A., Salamin, N. & Quental, T.B. (2015) The role of clade competition in the diversification of North American canids. Proceedings of The National Academy of Sciences, USA, 112, 8684-8689.

Smith, A.B. & McGowan, A.J. (2007) The shape of the phanerozoic marine palaeodiversity curve: How much can be predicted from the sedimentary rock record of western Europe? Palaeontology, 50, 765-774.

Stadler, T. (2013) Recovering speciation and extinction dynamics based on phylogenies. Journal Of Evolutionary Biology, 26, 1203-1219.

Stadler, T., Kuhnert, D., Bonhoeffer, S. & Drummond, A.J. (2013) Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of The National Academy of Sciences, USA, 110, 228-233.

TABLES

Table 1. Type I error rates (%) for SRM (Standard Regression Model) and SDDM (Sampling-Driven Diversity Model) estimates (intercept α and slope β) across residual error (σe).

σe      SRM α    SRM β    SDDM α   SDDM β
0.05    5.34     4.90     26.1     28.5
0.10    4.84     4.92     40.2     48.4
0.25    4.82     4.78     57.3     91.3
0.50    5.48     5.14     68.7     100.0

Table 2. t-test results between mean regression slopes of 5000 iterations and the theoretical slope b = 0.6, for SRM (Standard Regression Model) and SDDM (Sampling-Driven Diversity Model) across residual error (σe).

        SRM                               SDDM
σe      mean slope   t-value   p-value    mean slope   t-value   p-value
0.05    0.6          1.230     0.220      0.602        20.9      0
0.10    0.6          -1.790    0.073      0.607        46.0      0
0.25    0.6          -0.042    0.967      0.646        131.0     0
0.50    0.6          0.685     0.493      0.775        244.0     0

FIGURES

Figure 1. Procedure for generating ‘residuals’ from a sampling-driven diversity model. (a) A paired, bivariate dataset x (sampling proxy) and y (sampled diversity) was simulated so that x is randomly drawn from a normal distribution (μ = 0, σ = 1) and y is calculated as y = a + bx + e where a = 0.4, b = 0.6 and e is noise (μ = 0, σ = 0.5). The thick black line is the expected relationship y = a + bx. Vertical lines represent the true residuals or deviations in y from the thick line. (b) Following Smith and McGowan (2007) x and y are sorted from low to high values independent of each other (x’ and y’ respectively), and an ordinary least squares (OLS) regression model (pink line) is fitted to y’ on x’. Despite the pink line supposedly representing the data generating process, it is clear that it is not a good estimator of the true known generating process, the thick line. (c) The OLS model from (b) is used as the sampling-driven diversity model (SDDM) or the expected relationship between y and x, from which ‘residuals’ are computed as the deviations in y from the pink line (vertical pink dotted lines). It is immediately clear that there is a substantial difference between the true residuals (a) and the SDDM ‘residuals’ (c).

441

[Figure 1 plots: (a) y against x; (b) y’ against x’; (c) y against x.]
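The three panels of Figure 1 can be followed step by step in code. The sketch below is a minimal illustration under stated assumptions: the sample size `n = 100` and the seed are not given in this section and are chosen here arbitrarily, and the helper names `ols` and `simulate` are hypothetical.

```python
import random
import statistics


def ols(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, a=0.4, b=0.6, sigma_e=0.5, rng=random):
    """Paired dataset: x ~ N(0, 1), y = a + b*x + e with e ~ N(0, sigma_e)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
    return x, y


rng = random.Random(1)   # assumed seed, for repeatability only
n = 100                  # assumed sample size
x, y = simulate(n, rng=rng)

# (a) True residuals: deviations in y from the known line y = 0.4 + 0.6 x.
true_resid = [yi - (0.4 + 0.6 * xi) for xi, yi in zip(x, y)]

# (b) Sort x and y independently of each other, then fit OLS to y' on x'.
a_sddm, b_sddm = ols(sorted(x), sorted(y))

# (c) SDDM 'residuals': deviations of the original paired y from that line.
sddm_resid = [yi - (a_sddm + b_sddm * xi) for xi, yi in zip(x, y)]

mean_abs_gap = statistics.fmean(
    abs(t - s) for t, s in zip(true_resid, sddm_resid)
)
print(f"SDDM fit: intercept={a_sddm:.3f}, slope={b_sddm:.3f}")
print(f"mean |true residual - SDDM residual| = {mean_abs_gap:.3f}")
```

Because the fitted line in step (b) comes from data whose pairing has been destroyed, the gap printed at the end is nonzero and grows toward the extremes of x, where the two lines are furthest apart.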


Figure 2. Regression modelling on a decoupled bivariate dataset fails to estimate the simulation slope parameter. (a) A bivariate dataset (y and x) was generated so as to follow a theoretical relationship (thick line) with intercept a = 0.4, slope b = 0.6 and noise (e [μe = 0, σe = 0.5]). The best-fit regression line (blue) is not significantly different from the theoretical line (dashed 95% confidence intervals encompass the thick line; see table 1 for Type I error rates over 5000 simulations), with y and x forming a moderately strong relationship (r² = 0.526) appropriate for the degree of e modelled. Regression model residuals (vertical lines) show no structure, as expected. (b) The bivariate data in (a) were sorted independently of each other (y’ and x’), to which a regression model was fitted. The best-fit sampling-driven diversity model (SDDM) regression line (pink) deviates strongly from the theoretical relationship (dashed 95% confidence intervals do not encompass the thick line; table 1), and y’ and x’ form a very strong (but false) linear relationship (r² = 0.973). Regression residuals (vertical lines) show clear structure. One pair of model comparisons out of 5000 simulations is shown.
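The contrast in Figure 2 between the two fits (r² and whether the confidence interval covers the theoretical slope) can be checked on a single simulated dataset. The following is a simplified sketch: it tests only the slope, not the full confidence band drawn in the figure, it uses a normal approximation (z = 1.96) for the 95% interval, and the sample size, seed and helper names `fit` and `covers` are assumptions made for illustration.

```python
import math
import random
import statistics


def fit(x, y):
    """OLS fit of y on x: returns slope, its standard error, and r squared."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy                    # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, se_slope, 1.0 - sse / syy


def covers(slope, se_slope, b, z=1.96):
    """True if the approximate 95% interval for the slope contains b."""
    return slope - z * se_slope <= b <= slope + z * se_slope


rng = random.Random(2)   # assumed seed
n, a, b, sigma_e = 100, 0.4, 0.6, 0.5   # n is an assumption
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]

slope_p, se_p, r2_paired = fit(x, y)                   # panel (a): paired data
slope_s, se_s, r2_sorted = fit(sorted(x), sorted(y))   # panel (b): sorted data

print(f"paired: slope={slope_p:.3f}, r2={r2_paired:.3f}, "
      f"covers b: {covers(slope_p, se_p, b)}")
print(f"sorted: slope={slope_s:.3f}, r2={r2_sorted:.3f}, "
      f"covers b: {covers(slope_s, se_s, b)}")
```

Sorting inflates r² while shrinking the standard error of the slope, so the sorted fit tends to be both more biased and more confident than the paired fit.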


Figure 3. SDDM regression predictions are systematically biased. (a) Standard regression lines (blue) for 5000 simulated datasets at σe = 0.5 deviate randomly around the theoretical relationship (thick line) with the mean slope showing no significant difference from the theoretical slope b = 0.6 (table 2). (b) SDDM regression lines on decoupled datasets (pink) deviate systematically away from the theoretical relationship (thick line), with a significant difference between the mean regression slope and the theoretical slope (table 2).
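One way to read the direction and size of this bias is through the standard identity that an OLS slope equals the correlation coefficient times the ratio of sample standard deviations. This is a back-of-envelope reading rather than a derivation given in the text. Sorting x and y independently leaves their standard deviations unchanged but pushes the correlation toward 1, so the SDDM slope approaches the ratio of standard deviations (with σx = 1 in the simulations):

```latex
\hat{b}_{\mathrm{SDDM}}
  = r_{x'y'}\,\frac{s_{y}}{s_{x}}
  \approx \frac{s_{y}}{s_{x}}
  \approx \frac{\sqrt{b^{2}\sigma_{x}^{2} + \sigma_{e}^{2}}}{\sigma_{x}}
  = \sqrt{0.6^{2} + \sigma_{e}^{2}}
\qquad\Longrightarrow\qquad
\begin{array}{c|cccc}
\sigma_{e} & 0.05 & 0.10 & 0.25 & 0.50 \\ \hline
\sqrt{0.36 + \sigma_{e}^{2}} & 0.602 & 0.608 & 0.650 & 0.781
\end{array}
```

These approximate values sit close to the SDDM mean slopes reported in table 2 (0.602, 0.607, 0.646, 0.775); a small shortfall is what one would expect if the correlation between the sorted variables is slightly below 1.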


Figure 4. The difference between the original paired, bivariate relationship (a) and the forced, false relationship (b) shown using the data from Smith and McGowan (2007). Log-transformed marine generic diversity has a non-significant and weak relationship with log-transformed rock area (β = 0.105; r² = 0.0398; p = 0.0979; a). However, once diversity and rock area are sorted independently of each other following Smith and McGowan (2007), then the relationship becomes significant and strong (β = 0.499; r² = 0.903; p < 0.001; b). Points are coloured according to their geological age with cooler colours on the older and warmer colours on the younger ends of the timescale. Filled and outline colours in (b) correspond to the ages of the rock record and diversity respectively, and demonstrate visually the mismatch between y’ and x’. Dashed lines are confidence intervals, while dotted lines are prediction intervals.

[Figure 4 plots: (a) diversity against rock area; (b) sorted diversity against sorted rock area.]


Figure 5. Independently sorting any two variables results in a forced positive relationship. (a) Two randomly generated variables y and x show no significant relationships across 1000 simulations, with the slopes of the regression lines (blue) distributed around the expected slope of zero. (b) When regression models are fitted on independently sorted datasets (y’ and x’), estimated slopes are significantly different from the expected value of zero, and result in a strong positive relationship (r² ≈ 1; inset pink) despite the unrelated nature of the original datasets (r² ≈ 0; inset blue). (c) A bivariate dataset (y and x) was generated so as to follow a theoretical relationship (thick line) with intercept a = 0.4, slope b = -0.6 and noise (e [μe = 0, σe = 0.5]). Standard regression lines (blue) deviate randomly around the theoretical relationship with the mean slope showing no significant difference from the theoretical slope b = -0.6. (d) However once sorted independently, regression lines (pink) deviate systematically away from the theoretical relationship, with all estimated slopes being positive. Thus SDDM slope estimates are systematically and directionally biased.

[Figure 5 plots: (a) y against x; (b) y’ against x’, with inset histogram of r² (frequency); (c) y against x; (d) y’ against x’.]
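Both halves of Figure 5 can be reproduced in a few lines: unrelated variables and a negatively related pair each acquire a positive slope once sorted independently. The sketch below is illustrative only; the sample size, seed and the helper name `slope_and_r2` are assumptions, and a single dataset is used per case rather than the 1000 simulations described in the caption.

```python
import random
import statistics


def slope_and_r2(x, y):
    """OLS slope of y on x and the squared correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


rng = random.Random(5)   # assumed seed
n = 100                  # assumed sample size

# Panels (a)/(b): two unrelated standard normal variables.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
slope_rand, r2_rand = slope_and_r2(x, y)
slope_rand_sorted, r2_rand_sorted = slope_and_r2(sorted(x), sorted(y))

# Panels (c)/(d): a negative relationship, y = 0.4 - 0.6 x + e, sigma_e = 0.5.
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [0.4 - 0.6 * ui + rng.gauss(0.0, 0.5) for ui in u]
slope_neg, r2_neg = slope_and_r2(u, v)
slope_neg_sorted, r2_neg_sorted = slope_and_r2(sorted(u), sorted(v))

print(f"unrelated: paired slope={slope_rand:.3f} (r2={r2_rand:.3f}), "
      f"sorted slope={slope_rand_sorted:.3f} (r2={r2_rand_sorted:.3f})")
print(f"negative:  paired slope={slope_neg:.3f} (r2={r2_neg:.3f}), "
      f"sorted slope={slope_neg_sorted:.3f} (r2={r2_neg_sorted:.3f})")
```

The sign flip in the second case follows from the sorting itself: two sequences arranged in ascending order can only have a non-negative covariance, whatever the relationship between the original pairs.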