PomBase conventions for improving annotation depth, breadth, consistency and accuracy


Transcript of PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Page 1: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Page 2: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Annotation numbers are important… but numbers aren't everything…

•  Downstream use of annotation for data mining and data analysis is limited by errors, inconsistencies and omissions.
•  PomBase uses a combination of annotation conventions, to improve information content (annotation coverage, specificity and redundancy), and QC mechanisms, to identify possible annotation inconsistencies and errors.
•  In combination, these mechanisms address many recurring annotation issues.

Page 3: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

1. The definition is critical

All ontology terms have a "fixed" definition
•  If a definition is misleading or incorrect, its meaning cannot be changed. To fix it, the term is obsoleted and annotations are migrated.
•  This makes annotations very robust to ontology changes. If a term needs to be repositioned, the annotations remain correct.
•  We annotate to the definition, not the term name. Always check the definition.

Page 4: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

2. Improving annotation specificity

•  i) Consider descendant terms
•  ii) Veto use of uninformative terms

Page 5: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

2i. Consider descendants

Annotate as specifically as the experiment allows and be unambiguous about the biology
•  regulation: positive or negative?
•  translation: cytoplasmic or mitochondrial?
•  transport: of what? to where? how?
•  chromosome segregation: mitotic or meiotic?

If the available terms are insufficient, request a more specific term

Page 6: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

2i. Consider descendants, e.g.

•  For a carboxylic acid carrier, "carboxylic acid transport" initially looks OK
•  However, "transmembrane transport" is not explicit here… Carboxylic acid might be transported in other ways…

Page 7: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

2i. Consider descendants, e.g.

More specific annotation can provide additional detail, e.g.
•  substrate,
•  type (transmembrane),
•  sometimes directionality
Additional parents increase the information content, because the gene product is annotated indirectly to more terms.

Page 8: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

2ii. Veto use of uninformative terms

Identify the set of ontology terms where more specific annotation should be possible (more biological detail)
Examples:
•  e.g. cellular process -> which?
•  e.g. translation -> cytoplasmic? mitochondrial?
•  e.g. transport -> of what? to where?
Some GO terms are already flagged as not for manual annotation.
Review and improve annotations to vetoed terms
PomBase vetoes a lot of the upper ontology levels: 1175 GO terms blocked for annotation (only ~50 violations)
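A veto check of this kind can be sketched in a few lines of Python. This is only an illustration, not the PomBase pipeline: the three-term veto list, the gene names and the function name `find_veto_violations` are all hypothetical, although the GO identifiers are the real IDs for the terms named on the slide.

```python
# Hypothetical veto list for illustration; the real blocked set has ~1175 terms.
VETOED_TERMS = {
    "GO:0009987": "cellular process",
    "GO:0006810": "transport",
    "GO:0006412": "translation",
}

def find_veto_violations(annotations, vetoed=VETOED_TERMS):
    """Return the (gene, term_id) pairs annotated directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed]

# Made-up gene names; only the direct annotation to "transport" is flagged.
annotations = [
    ("geneA", "GO:0006810"),  # transport: of what? to where?
    ("geneB", "GO:0002181"),  # cytoplasmic translation: specific enough
]

for gene, term in find_veto_violations(annotations):
    print(f"{gene}: {term} ({VETOED_TERMS[term]}) needs a more specific term")
```

Each reported pair is a candidate for review and re-annotation to a descendant term.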

Page 9: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3. Improve the ontologies

Page 10: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3i. Missing parents

Original arrangement

Page 11: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3i. Missing parents

These process annotations were originally in different branches of the ontology, so all annotations were required

Page 12: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3i. Missing parents

New arrangement:

Page 13: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3i. Missing parents

Collapsed 6 processes to 2.
Exactly the same information content
Less redundancy, easier for users to interpret annotation

Page 14: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

3ii. Report incorrect parents

AKA "True Path Violations" or "TPVs"
For example:
protein maturation
-- protein processing (part_of)
---- proteolysis (part_of)
(not all proteolysis is processing or maturation)
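The sketch below shows why an incorrect parent matters: under the true path rule, an annotation to a term implies every ancestor of that term. The parent map reproduces only the arrangement drawn on this slide, with term names standing in for GO IDs, and the function name `inferred_terms` is hypothetical.

```python
# Parentage as drawn on the slide (the arrangement being reported as incorrect).
PARENTS = {
    "proteolysis": ["protein processing"],
    "protein processing": ["protein maturation"],
    "protein maturation": [],
}

def inferred_terms(term, parents=PARENTS):
    """All terms that an annotation to `term` implies, including itself."""
    implied, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in implied:
            implied.add(current)
            stack.extend(parents.get(current, []))
    return implied

# Any gene annotated to proteolysis is also inferred to act in protein
# maturation, which is wrong for proteolysis that is not processing.
print(sorted(inferred_terms("proteolysis")))
```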

Page 15: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

4. The power of Annotation Extensions

Provide additional specificity for a GO annotation, e.g.
•  Target gene (kinase substrate, TF regulation target)
•  Location of a function
•  Localization dependencies (protein A localizes protein B)
•  Spatial and temporal aspects of processes, functions, locations (cell cycle stage of occurrence)

•  ADD an example of a gene product specific AE

See: Huntley et al. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. PMID:24885854

Page 16: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

4. Annotation Extension, e.g. cdc2

cyclin-dependent protein serine/threonine kinase
•  has substrate jh2 involved in negative regulation of conjugation with cellular fusion
•  directly inhibits srw1 involved in positive regulation of G1/S transition
•  has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication
•  has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase
•  has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining, during mitotic G2 phase
•  has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery during mitotic M phase
•  has substrate hcn1 during mitotic M phase
•  has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
•  has substrate mde4 involved in correction of merotelic attachment, mitotic, during mitotic metaphase
•  has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic metaphase
•  has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase
•  has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
•  directly inhibits clp1 involved in negative regulation of exit from mitosis
•  has substrate byr4 involved in positive regulation of septation initiation signaling
•  directly inhibits dis2
•  has substrate rum1, crb2, sds23

Link function (cyclin-dependent kinase) to target genes, processes, and temporal information
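One way to picture an extended annotation is as a core term plus an ordered list of (relation, target) pairs. The following is a minimal sketch under that assumption; the class names `Extension` and `Annotation` and the `render` method are hypothetical and do not describe how PomBase stores or displays its data. The example values are taken from the cut3 line above.

```python
from dataclasses import dataclass, field

@dataclass
class Extension:
    relation: str  # e.g. "has substrate", "involved in", "during"
    target: str    # a gene, a process term, or a cell cycle phase

@dataclass
class Annotation:
    gene: str
    term: str
    extensions: list = field(default_factory=list)

    def render(self):
        """Join the core term and its extensions into one display line."""
        parts = [self.term] + [f"{e.relation} {e.target}" for e in self.extensions]
        return " ".join(parts)

ann = Annotation(
    gene="cdc2",
    term="cyclin-dependent protein serine/threonine kinase",
    extensions=[
        Extension("has substrate", "cut3"),
        Extension("involved in", "positive regulation of mitotic chromosome condensation"),
        Extension("during", "mitotic metaphase"),
    ],
)
print(ann.render())
```

Keeping the target gene, process and phase as separate fields, rather than pre-composing them into one ontology term, is what lets a single function term carry many substrate-specific statements.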

Page 17: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Alternative (human CDK1):

Not scalable or maintainable

Page 18: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

4. Using AE for effectors

•  Reciprocal of the extension (automated) called "target of"
•  Collects known "upstream effectors" on cdc2 page
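The reciprocal relationship can be derived mechanically by inverting the extension links. The sketch below assumes the links are available as (effector, relation, target) triples; that layout and the function name `target_of` are hypothetical. The three example links are taken from the cdc2 extension slide.

```python
from collections import defaultdict

def target_of(extension_links):
    """Invert (effector, relation, target) links into target -> [(relation, effector)]."""
    reciprocal = defaultdict(list)
    for effector, relation, target in extension_links:
        reciprocal[target].append((relation, effector))
    return dict(reciprocal)

links = [
    ("cdc2", "has substrate", "cut3"),
    ("cdc2", "directly inhibits", "clp1"),
    ("cdc2", "has substrate", "rum1"),
]

# Each target gene can now list its upstream effectors.
for target, effectors in target_of(links).items():
    for relation, effector in effectors:
        print(f"{target}: target of {effector} ({relation})")
```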

Page 19: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

4. Using Annotation Extensions to generate networks/pathways

•  We can use effector-substrate connections to generate networks (interaction, metabolic, regulatory)
•  Provide directional links to support pathway reconstruction

[Figure: example network with nodes sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1]

Page 20: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

4. Automated AE networks, e.g.

44/59 connected in automated network based on annotated connections within "regulation of G2/M transition" (fission yeast)
(Network for each GO slim category from the slim page)
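A count like "44/59 connected" can be reproduced from effector-target links once "connected" is defined. The sketch below assumes it means "has at least one annotated link to another gene in the same process set", which is an interpretation and not something the slide states. The gene names are placeholders and `connected_genes` is a hypothetical helper.

```python
def connected_genes(process_genes, links):
    """Genes with at least one link to another gene in the same process set."""
    genes = set(process_genes)
    connected = set()
    for source, target in links:
        # Ignore links that leave the process set, and self-links.
        if source in genes and target in genes and source != target:
            connected.update((source, target))
    return connected

process_genes = ["g1", "g2", "g3", "g4"]
links = [("g1", "g2"), ("g2", "g3"), ("g9", "g4")]  # g9 is outside the process

connected = connected_genes(process_genes, links)
print(f"{len(connected)}/{len(process_genes)} connected")
```

Genes left unconnected are a starting point for asking whether a link is unannotated or genuinely unknown.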

Page 21: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA annotation

•  PomBase pipelines filter redundant IEA (Inferred from Electronic Annotation) evidence
•  Removes >90% of IEA (because a manual annotation already exists)

Page 22: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA annotation

13 annotations are reduced to 4

Same information, fewer terms
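The filtering idea can be sketched as follows: an IEA annotation is redundant if a manual annotation already exists to the same term or to one of its descendants. This is a toy illustration, not the PomBase pipeline; the parent map is a tiny made-up fragment with term names standing in for GO IDs, and both function names are hypothetical.

```python
# Toy parent map (is_a/part_of collapsed together) for illustration only.
PARENTS = {
    "cytoplasmic translation": ["translation"],
    "translation": ["gene expression"],
    "gene expression": [],
    "lipid metabolic process": [],
}

def ancestors(term, parents=PARENTS):
    """All ancestors of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def filter_redundant_iea(manual_terms, iea_terms, parents=PARENTS):
    """Keep only IEA terms not already implied by a manual annotation."""
    covered = set(manual_terms)
    for term in manual_terms:
        covered |= ancestors(term, parents)
    return [term for term in iea_terms if term not in covered]

manual = ["cytoplasmic translation"]
iea = ["translation", "gene expression", "lipid metabolic process"]
print(filter_redundant_iea(manual, iea))
```

The IEA terms that survive the filter are the ones worth inspecting: they point to a wrong mapping, a missing parent in the ontology, or a missing manual annotation.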

Page 23: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA, QC of mappings

Incorrect annotations are more easily spotted
Mis16 is not involved in 'chromatin modification' -> fix mapping

Page 24: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA, QC of ontology (SPBC543.05c)

Missing parents in ontology more obvious
"inorganic anion exchanger" should be an 'ancestor' of GO:0005452, to suppress the IEA as redundant

Page 25: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA annotation

•  >40,000 fission yeast IEAs available.
•  PomBase filters 36,000 as redundant and retains 4,000 (IEAs are at least 90% accurate if the manual annotations are correct).
•  It is easier to evaluate the remaining IEAs to identify/fix anomalies

Reducing IEAs over time

Page 26: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

5. Suppress redundant IEA

•  More concise view with zero loss of information
•  IEA mappings derived from a single experiment/publication can be interpreted as proof by repetition and make weak EXP data appear multiply supported/acceptable
•  Fewer annotations, easier QC of remaining IEAs

Q: "Why isn't an IEA covered by manual annotation?" Either:
1.  Incorrect mapping
2.  Missing parent in ontology
3.  Missing annotation -> find supporting evidence and annotate manually (EXP or ISO)
(PomBase also filters NAS/TAS/IC)

Page 27: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

6. Annotate by process (pathway)

•  Annotating by process rather than "ad hoc" improves consistency and allows 'annotation gaps' to be targeted
•  Process papers more quickly (become more familiar with the field, experimental methods). Become familiar with an area of biology and the techniques used. Don't need to read the background every time. Recognise phenotypes.

Page 28: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

6. Annotate by process or pathway

From PMID:22898774

Regulation of the metaphase/anaphase transition by the MCC, the APC and upstream signalling
Identify obvious missing annotation, for example between complex members

Page 29: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

6. Annotate by process or pathway

[Figure: pathway diagram with nodes cdc20, proteasome, APC, separase, cohesin subunit, securin, SAC/MCC; label "Post transition"]

Can perform QC on processes or components, e.g.
Use STRING to evaluate outliers (potential annotation errors)
Input list "regulation of mitotic metaphase/anaphase transition"

Can also ask "are any complex members missing?"

Page 30: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Assess annotation at the organismal level

•  We are annotating whole organisms… use a holistic whole annotation approach
•  Evaluate annotation breadth (coverage) using slims
•  Evaluate intersections between slim processes

Page 31: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Evaluate organismal annotation coverage using "slims"

•  EXP supported BP
•  ISO/IEA inferred BP

'unknowns'
•  Species specific, no inference possible
•  Conserved, but unannotated in any species
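Coverage against a slim reduces to asking, for each gene, whether it maps to at least one slim term. The sketch below assumes a precomputed gene-to-slim-terms mapping. The data and the helper name `slim_coverage` are hypothetical, and the split into the evidence classes listed above is not modelled.

```python
def slim_coverage(all_genes, gene_to_slims):
    """Return (number of genes covered by any slim term, sorted list of 'unknowns')."""
    unknown = sorted(g for g in all_genes if not gene_to_slims.get(g))
    covered = len(all_genes) - len(unknown)
    return covered, unknown

all_genes = ["g1", "g2", "g3", "g4"]
gene_to_slims = {
    "g1": {"cytokinesis"},
    "g2": {"signalling", "transcription"},
    "g3": set(),  # no biological process annotation that maps to the slim
}

covered, unknown = slim_coverage(all_genes, gene_to_slims)
print(f"{covered}/{len(all_genes)} covered; unknown: {unknown}")
```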

Page 32: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Browsable Slim:

Page 33: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Sensible assignments?

DNA recombination

Periodically check that slim class contents look sensible

Page 34: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Visual slim, all pombe proteins

[Figure: diagram of slim categories with gene counts. Labels and counts recoverable from the figure, in source order:]

•  Unknown: 830
•  TOTAL: 5054
•  cytoskeleton org: 206
•  nuclear DNA replication, recombination, repair: 305
•  mitotic chromosome segregation: 184
•  regulation of mitotic cell cycle: 232
•  CELL DIVISION: 751
•  cytokinesis: 110
•  MITOCHONDRIAL ORG/EXP: 280
•  cell wall org: 130
•  MEMBRANES, TRAFFICKING, CELL SURFACE: 787
•  lipid met: 222
•  vesicle mediated transport: 324
•  glycosylation polysacc met: 140
•  membrane org: 199
•  detox
•  SMALL MOLECULE TM TRANSPORT: 288
•  AA sulfur met: 220
•  vitamin cofactor met
•  nucleobase/-side/-tide met: 219
•  small sugar met: 77
•  CENTRAL MET, ENERGY AND BUILDING BLOCKS: 549
•  Nitrogen: 15
•  other energy generation: 25
•  signalling: 404
•  sexual reproductive process: 262 (many intersections)
•  Other: 290 (no intersections; includes adhesion, many proteases, peroxions)
•  EXPRESSION: 1294
•  EXPRESSION submod: 863
•  ribosome biogenesis: 317
•  RNA metabolism: 772
•  cytoplasmic translation: 249
•  nucleocyto transport: 110
•  Transcription: 479
•  PROTEIN ASSEMBLY/STABILITY: 765
•  protein catabolism autophagy: 251
•  ubiquitination: 192
•  folding: 102
•  complex assembly: 325

Page 35: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Evaluate intersections between slim categories

Evaluate intersections between processes
Many GO processes are rarely co-annotated because they are functionally, spatially or temporally distant.
For example, we would not expect "ribosome biogenesis" to intersect with "vitamin metabolism".
We can use this observation to identify potential conflicts
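The intersection check amounts to comparing the gene sets of every pair of slim categories and reporting the pairs that share genes. The sketch below assumes the annotations have already been grouped into one gene set per slim category. The gene names are placeholders and `slim_intersections` is a hypothetical helper.

```python
from itertools import combinations

def slim_intersections(slim_to_genes):
    """Map each pair of slim categories to the genes they share (non-empty only)."""
    result = {}
    for a, b in combinations(sorted(slim_to_genes), 2):
        shared = slim_to_genes[a] & slim_to_genes[b]
        if shared:
            result[(a, b)] = shared
    return result

slims = {
    "ribosome biogenesis": {"g1", "g2", "g3"},
    "vitamin metabolism": {"g3", "g4"},
    "cytokinesis": {"g5"},
}

# An unexpected pair such as this one is a prompt to review gene g3.
for pair, genes in slim_intersections(slims).items():
    print(pair, sorted(genes))
```

Unexpected pairs can then be traced back to an ontology error, an incorrect mapping, or an annotation based on an indirect effect.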

Page 36: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Slim intersections Oct 2014

[Figure: slim intersections]

Page 37: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Slim intersections Feb 2015

Page 38: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

March 2016

Page 39: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Identifies ontology errors (e.g.)

DNA metabolism and chromosome segregation do not usually intersect
Regulation of chromosome condensation should not be a DNA metabolic process

Page 40: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Ontology error (e.g.)

Folic acid is classified as an amino acid by ChEBI, so folate metabolism is also annotated to amino acid metabolism. Need to fix ChEBI, which will fix GO

Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid metabolism

Page 41: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

7. Finds incorrect mappings (e.g.)

Intersect between tRNA metabolism and transcription. Elongator is no longer thought to have a direct role in transcription, mapping removed

Page 42: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

8. Consider author intent

Think about the biology the author intended

e.g. rubidium ion transmembrane transporter/transport
Rubidium ion is used as an assay for K+ transport, not rubidium (non-physiological substrate)

e.g. Apoptosis (RPS19)
Rps19 mutant displayed condensed DNA, a fragmented nucleus and caspase activation, indicative of apoptosis. Since RPS19 has an essential role in ribosome biogenesis, apoptosis is likely to be an indirect effect of the disruption of an upstream process, translation (i.e. an experimental readout)

Page 43: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

9. Communication with the author and community curation

•  Most authors are happy to discuss their publications. If unsure about an annotation, ask them. PomBase routinely uses the authors as a QC step to refine annotation.
•  Most authors are happy to curate their own papers (especially PhD/postdoc/recent papers). >40% of *** papers assigned to community have been returned with high quality annotation

Page 44: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

9. Community Curation

•  ….. Authors also curate their own recent papers

Co-curation by author and curator improves annotation quality

Page 45: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Some example sessions

•  http://tinyurl.com/q2bgyqv
•  http://tinyurl.com/p7d979b
•  http://tinyurl.com/o72bzul

Page 46: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Very detailed annotation is made possible because Canto guides the user step by step to construct genotypes and ontology-based annotations. "Drill down" to more specific terms is assisted. Prompts are provided for AE of specified types for certain terms

Page 47: PomBase conventions for improving annotation depth, breadth, consistency and accuracy
Page 48: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Is it successful?

•  Numbers
•  Show increase uptake graph?
•  Accuracy high, but often omissions
•  Once people have done one, happy to repeat
•  Better on subsequent sessions
•  More success closer to publication date
•  Motivation: publication visibility (data dissemination)

Page 49: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

10. Prioritise error fixing

•  Fixing known errors takes precedence over new annotation.... like critical bugs in code
•  Even small errors often uncover larger issues, or can fix many problems simultaneously across multiple species.
•  Prevents propagation of annotation errors

Page 50: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Summary

Page 51: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

Spare slides

Page 52: PomBase conventions for improving annotation depth, breadth, consistency and accuracy

8. GO vs. phenotype

•  GO annotations should reflect a gene's direct involvement in, or role in regulating, processes or functions. In contrast, phenotype annotations indicate that a mutation causes a change in a process, but may reflect downstream or indirect effects.
•  ER membrane defect -> nuclear envelope defect -> chromosome decondensation defect -> defects in next round of DNA replication. Clearly a DNA replication phenotype alone is not enough to make a "DNA replication" GO annotation.
•  At PomBase we only make GO annotations based on phenotypes if
   i) The phenotype is known to be completely determinative for the process
   ii) Additional data supports GO inference from phenotype (location, orthology)
•  Often the experiments do not definitively resolve the exact process that the gene is involved in.
•  Intersections between processes are useful for identifying annotation errors caused by indirect annotation