Evaluating Methods - Tandy Warnow (tandy.cs.illinois.edu/Method-Evaluation.pdf)

Post on 17-Aug-2020

Evaluating Methods

Tandy Warnow

You’ve designed a new method! Now what?

To evaluate a new method:
•  Establish theoretical properties.
•  Evaluate on data.
•  Compare the new method to other methods.
How do you do this?

General Issues

•  So far we have computed trees and we have computed alignments.

•  How can we quantify accuracy or error? What datasets should we use?

•  What are the issues?

Basic criteria

•  Sensitivity = true positive rate = recall rate = TP/(TP+FN)

•  Precision = positive predictive value (PPV) = TP/(TP+FP)

•  Specificity = true negative rate = TN/(TN+FP)

•  False Discovery Rate = 1 - PPV

True positives, false positives, etc.

•  For these criteria, we need to understand the concepts of
–  true positive,
–  false positive,
–  true negative, and
–  false negative

•  In other words, we need to have a “yes/no” classifier.

Simple example: HIV testing

•  Sample space: HIV tests (ELISA)
– True positive: the test comes out positive and the person does have HIV

– True negative: the test comes out negative and the person does not have HIV

– False positive: the test comes out positive but the person does not have HIV

– False negative: the test comes out negative but the person does have HIV
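The four outcomes above can be sketched as a small tally over (test result, true status) pairs. This is only an illustration: the function names `outcome` and `tally` and the sample input are made up here and are not part of the slides.

```python
def outcome(test_positive, has_condition):
    """Label one yes/no call against the known truth."""
    if test_positive:
        return "TP" if has_condition else "FP"
    return "FN" if has_condition else "TN"


def tally(results):
    """Count TP, FP, TN, FN over (test_positive, has_condition) pairs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for test_positive, has_condition in results:
        counts[outcome(test_positive, has_condition)] += 1
    return counts


# Illustrative input only: five hypothetical test results.
example = [(True, True), (True, False), (False, False),
           (False, True), (False, False)]
counts = tally(example)
print(counts)
```

Any method that makes yes/no calls (an edge is in the tree or not, a homology is in the alignment or not) can be scored this way once the truth is known.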

Hypothetical Example

•  The population is 1,000 samples
•  10 of them have the disease, 990 do not
•  The test is positive on 20: 9 of the 10 with the disease, and 11 of the 990 who do not have the disease
– TP = 9, FP = 11, TN = 979, FN = 1
– Sensitivity = TP/(TP+FN) = 9/10 = 90%
– Specificity = TN/(TN+FP) = 979/990 = 98.9%
– Precision = TP/(TP+FP) = 9/20 = 45%
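As a check on the arithmetic, the counts and criteria in this example can be derived directly from the population description. The variable names below are chosen for illustration; the numbers are the ones on the slide.

```python
# Population described on the slide.
population = 1000
with_disease = 10
positive_tests = 20
TP = 9                                   # positive tests among those with the disease

# The remaining three counts follow from the description.
FN = with_disease - TP                   # diseased but test negative
FP = positive_tests - TP                 # test positive but not diseased
TN = (population - with_disease) - FP    # not diseased and test negative

sensitivity = TP / (TP + FN)             # recall, true positive rate
specificity = TN / (TN + FP)             # true negative rate
precision = TP / (TP + FP)               # positive predictive value
false_discovery_rate = 1 - precision

print(f"TP={TP} FP={FP} TN={TN} FN={FN}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"precision   = {precision:.1%}")
print(f"FDR         = {false_discovery_rate:.1%}")
```

Note how a test with 90% sensitivity and nearly 99% specificity still has precision below 50%, because the disease is rare in this population.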

Hypothetical Example

•  The population is 1,000 samples
•  10 of them have the disease, 990 do not
•  The test is positive on 20: 9 of the 10 with the disease, and 11 of the 990 who do not have the disease
– What is the false positive rate?
– FP rate = # false positives divided by the total number of positive test results, so FP/(FP+TP) = 11/20 = 55%. This quantity is the false discovery rate (1 - precision); the conventional false positive rate is 1 - specificity = FP/(FP+TN) = 11/990 = 1.1%.
– What is the false negative rate?
– FN rate = # false negatives divided by the total number of negative test results, so FN/(FN+TN) = 1/980 = 0.1%. The conventional false negative rate is 1 - sensitivity = FN/(FN+TP) = 1/10 = 10%.
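The two ways of normalizing errors give very different numbers on this example, so it helps to compute them side by side. The sketch below uses the slide's counts; the variable names are illustrative, and the "conventional" rates are the standard 1 - specificity and 1 - sensitivity.

```python
TP, FP, TN, FN = 9, 11, 979, 1

# Errors as a fraction of the test's own calls (the definitions used on the slide).
fp_among_positive_calls = FP / (FP + TP)   # 11/20, equals 1 - precision
fn_among_negative_calls = FN / (FN + TN)   # 1/980

# Errors as a fraction of the true condition (the conventional rates).
false_positive_rate = FP / (FP + TN)       # 11/990, equals 1 - specificity
false_negative_rate = FN / (FN + TP)       # 1/10, equals 1 - sensitivity

print(f"FP / positive calls = {fp_among_positive_calls:.1%}")
print(f"FN / negative calls = {fn_among_negative_calls:.2%}")
print(f"conventional FPR    = {false_positive_rate:.1%}")
print(f"conventional FNR    = {false_negative_rate:.1%}")
```

When reporting an "FP rate" or "FN rate", it is worth stating which denominator is used, since both conventions appear in practice.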

Performance criteria

•  Running time
•  Space
•  Statistical performance issues (e.g., statistical consistency and sequence length requirements)
•  “Topological accuracy” with respect to the underlying true tree, typically studied in simulation.
•  Accuracy with respect to a mathematical score (e.g., tree length or likelihood score) on real data

Statistical Consistency

[Figure: plot with axes labeled "error" and "Data".]

[Figure: tree comparison with FN and FP edges marked; 50% error rate.]

FN: false negative (missing edge)
FP: false positive (incorrect edge)

Alignment Error/Accuracy

•  SPFN: percentage of homologies in the true alignment that are not recovered (false negative homologies)

•  SPFP: percentage of homologies in the estimated alignment that are false (false positive homologies)

•  TC: total number of columns correctly recovered
•  SP-score: percentage of homologies in the true alignment that are recovered

•  Pairs score: 1 - (avg of SPFN and SPFP)
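The criteria above can be sketched in a few lines, assuming an alignment is given as a list of equal-length gapped strings (one row per sequence, same row order in both alignments) and that a homology is a pair of residues placed in the same column. The function names, the fraction (rather than percentage) scale, and the toy alignments are assumptions made for illustration; this is not the implementation used by any particular tool.

```python
from itertools import combinations


def residue_columns(alignment):
    """Describe each column by the residue index it holds per row (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        ids = []
        for row, ch in enumerate(col):
            if ch == "-":
                ids.append(None)
            else:
                ids.append(counters[row])
                counters[row] += 1
        cols.append(tuple(ids))
    return cols


def homologies(alignment):
    """Set of residue pairs that share a column."""
    pairs = set()
    for ids in residue_columns(alignment):
        present = [(row, idx) for row, idx in enumerate(ids) if idx is not None]
        pairs.update(combinations(present, 2))
    return pairs


def alignment_scores(true_aln, est_aln):
    """SPFN, SPFP, SP-score and pairs score as fractions; TC as a column count."""
    true_h, est_h = homologies(true_aln), homologies(est_aln)
    shared = true_h & est_h
    spfn = 1 - len(shared) / len(true_h)   # assumes at least one true homology
    spfp = 1 - len(shared) / len(est_h)    # assumes at least one estimated homology
    tc = len(set(residue_columns(true_aln)) & set(residue_columns(est_aln)))
    return {"SPFN": spfn, "SPFP": spfp, "SP-score": 1 - spfn,
            "TC": tc, "pairs": 1 - (spfn + spfp) / 2}


# Toy alignments, made up for illustration.
true_aln = ["AC-G", "A-TG"]
est_aln = ["ACG", "ATG"]
scores = alignment_scores(true_aln, est_aln)
print(scores)
```

In this toy case the estimated alignment recovers every true homology (SPFN is 0) but also pairs C with T, which the true alignment does not, so SPFP is 1/3.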

Other Alignment Estimation Criteria

•  Tree topology error
•  Tree branch length error

•  Gap length distribution
•  Insertion/deletion ratio
•  Alignment length
•  Number of indels

Studying Methods

•  The point is to evaluate a new method in comparison to prior methods.

•  You need to do this on data, not just using theorems.

•  How do you do this?

Benchmarks

•  Simulations: can control everything, and true alignment is not disputed
– Different simulators

•  Biological: can’t control anything, and reference alignment and reference tree might not be correct. Alignment benchmarks are also somewhat problematic, for various reasons:
–  BAliBASE, HomFam, Prefab
–  CRW (Comparative Ribosomal Website)

Simulation Studies

[Figure: a true tree on leaves S1, S2, S3, S4 and an estimated tree on the same leaves, linked by "Compare".]

True tree and alignment
S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-CT-------GACCGC--
S4 = -------TCAC--GACCGACA

Unaligned Sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Estimated tree and alignment
S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-C--T-----GACCGC--
S4 = T---C-A-CGACCGA----CA

Figure 1.6 A simulation study protocol. Sequences are evolved down a model tree under a process that includes insertions and deletions; hence, the true alignment and true tree are known. An alignment and tree are estimated on the generated sequences, and then compared to the true alignment and true tree.

The Robinson-Foulds (RF) distance between two trees is the number of non-trivial bipartitions that are present in one or the other tree but not in both trees.

Each of these ways of quantifying error in an estimated tree can be normalized to produce a proportion between 0 and 1 (equivalently, a percentage between 0 and 100). For example, the FN error rate would be the percentage of the non-trivial model tree bipartitions that are not present in the estimated tree, and the FP error rate would be the percentage of the non-trivial bipartitions in the estimated tree that are not present in the model tree. Finally, the Robinson-Foulds error rate is the RF distance divided by 2n - 6, where n is the number of leaves in the model tree; note that 2n - 6 is the maximum possible RF distance between two trees on the same set of n leaves.

Figure 1.7 provides an example of this comparison; note that the model tree (called the true tree in the figure) is rooted, but the inferred tree is unrooted. To compute the tree error, we unroot the true tree, and treat it only as an unrooted tree. Since both trees are binary (i.e., each non-leaf node has degree three), there are only two internal edges. Each of the two trees has the non-trivial bipartition separating S1,S2 from S3,S4,S5, but each tree also has a bipartition that is not in the other tree. Hence, the RF distance between the two trees is 2, out of a maximum possible of 4, and so the RF error rate is 50%. Note also that there is one true positive edge and one false positive edge in the inferred tree, so that the inferred tree has FN and FP rates of 50%.
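The error rates described above reduce to set operations on bipartitions. The sketch below assumes each tree is already given as a set of non-trivial bipartitions; the helper names are illustrative. The shared bipartition S1,S2 | S3,S4,S5 comes from the text, while the second bipartition in each tree is a hypothetical stand-in, since the text does not say what those bipartitions are.

```python
LEAVES = frozenset({"S1", "S2", "S3", "S4", "S5"})


def bipartition(side, leaves=LEAVES):
    """Canonical form of a split: the side that excludes the smallest leaf name."""
    side = frozenset(side)
    return leaves - side if min(leaves) in side else side


def tree_error(true_biparts, est_biparts, n_leaves):
    """FN rate, FP rate, RF distance and RF rate for two bipartition sets."""
    fn = len(true_biparts - est_biparts)   # true edges missing from the estimate
    fp = len(est_biparts - true_biparts)   # estimated edges not in the true tree
    rf = fn + fp
    return {
        "FN rate": fn / len(true_biparts),
        "FP rate": fp / len(est_biparts),
        "RF distance": rf,
        "RF rate": rf / (2 * n_leaves - 6),
    }


# Both trees contain S1,S2 | S3,S4,S5 (from the text).
# The second split in each tree is a hypothetical choice for illustration.
true_tree = {bipartition({"S1", "S2"}), bipartition({"S4", "S5"})}
est_tree = {bipartition({"S3", "S4", "S5"}), bipartition({"S3", "S5"})}

errors = tree_error(true_tree, est_tree, n_leaves=len(LEAVES))
print(errors)
```

Storing each split in a canonical form means that S1,S2 | S3,S4,S5 compares equal no matter which side it was written from. For two binary trees on the same leaves the FN and FP counts are always equal, so the three rates coincide, as in this example.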

Designing a simulation study

•  Consider the realism of the simulator.
•  Consider whether the conditions are too easy or too difficult to be helpful.

•  Consider the competing methods to explore.
•  Consider statistical significance.
•  Be concerned with repeatability.
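One possible way to address statistical significance, when two methods are run on the same replicate datasets, is a paired sign-flip permutation test on the per-replicate error differences. This is a sketch of one option, not a recommendation from the slides, and the error values below are hypothetical placeholders, not results.

```python
from itertools import product


def paired_sign_flip_pvalue(errors_a, errors_b):
    """Exact two-sided p-value for the mean paired difference.

    Enumerates every sign assignment, so it is only practical for a
    small number of replicates (2**n assignments).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:   # tolerance for floating-point ties
            extreme += 1
    return extreme / total


# Hypothetical per-replicate error rates for two methods on six datasets.
method_a = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13]
method_b = [0.07, 0.08, 0.06, 0.09, 0.05, 0.10]
p_value = paired_sign_flip_pvalue(method_a, method_b)
print(p_value)
```

Pairing matters because replicate datasets differ in difficulty; comparing the methods dataset by dataset removes that source of variation.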

Data

•  Biological data:
– How reliable are the reference alignments and trees?

•  Simulated data:
– How realistic are the simulation conditions?

Simulators

•  Sequence evolution down a tree:
–  Indels? If so, what lengths?
– Substitutions under what model?
– How many substitutions? How many indels?
– How is the tree topology and set of branch lengths defined?

–  Is the tree ultrametric?
– How many leaves in the tree (i.e., # sequences)?
– How long are the sequences?
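To make these choices concrete, here is a deliberately minimal toy simulator. It assumes the Jukes-Cantor substitution model, no indels, and branch lengths measured in expected substitutions per site; the tree encoding, function names, and parameter values are all illustrative assumptions. Real studies would use an established simulator with a richer model.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def evolve(seq, branch_length, rng):
    """Apply Jukes-Cantor substitutions along one branch (no indels)."""
    # Probability that a site ends the branch in a different state under JC69.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, seq, rng, leaves=None):
    """tree is a leaf name (str) or a list of (subtree, branch_length) pairs."""
    if leaves is None:
        leaves = {}
    if isinstance(tree, str):
        leaves[tree] = seq
        return leaves
    for child, length in tree:
        simulate(child, evolve(seq, length, rng), rng, leaves)
    return leaves


rng = random.Random(0)                      # fixed seed so the run can be repeated
seq_length = 50                             # how long are the sequences?
root = "".join(rng.choice(NUCLEOTIDES) for _ in range(seq_length))

# Four leaves; every root-to-leaf path has length 0.15, so this tree is ultrametric.
model_tree = [([("S1", 0.1), ("S2", 0.1)], 0.05),
              ([("S3", 0.1), ("S4", 0.1)], 0.05)]
leaf_seqs = simulate(model_tree, root, rng)
for name in sorted(leaf_seqs):
    print(name, leaf_seqs[name])
```

Because there are no indels here, the true alignment is trivial (the sequences are already aligned), which is one reason such a simulator would be too easy for evaluating alignment methods.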

Methods

•  Are you picking the best competing methods?
•  Are you running them in the best way?

Criteria

•  Are you using criteria that are considered appropriate by the research community?

•  If you are using new criteria, justify these criteria (and probably use the standard criteria anyway).

Repeatability

•  Provide full details about how you ran the analyses so that the same experiment could be done by the person reading the paper.

•  Save your data and make them available to the readers.
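One lightweight way to keep those details is to write a small manifest next to each result. The sketch below is only a suggestion; the field names, the command string, the parameter setting, and the input data are hypothetical examples.

```python
import hashlib
import json
import platform
import sys


def run_manifest(command, params, seed, data_bytes):
    """Collect the details a reader would need to repeat one analysis."""
    return {
        "command": command,                  # exact command line that was run
        "params": params,                    # method settings
        "seed": seed,                        # random seed, if any
        "python": sys.version.split()[0],    # interpreter version
        "platform": platform.platform(),     # operating system description
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # input checksum
    }


example_input = b">S1\nAGGCTATCACCTGACCTCCA\n"
manifest = run_manifest(
    command="my_aligner --in data.fasta",    # hypothetical tool and file name
    params={"model": "example"},             # hypothetical setting
    seed=0,
    data_bytes=example_input,
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

The checksum lets a reader confirm that the data they downloaded are the data that were analyzed.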

Writing Papers

Read
•  Appendix C in Computational Phylogenetics for guidelines about writing papers about computational methods.

•  “How to write your first paper” – on my homepage

•  “Commonly encountered challenges in research ethics” – on my homepage