PomBase conventions for improving annotation depth, breadth, consistency and accuracy
Annotation numbers are important… but numbers aren't everything…
• Downstream use of annotation for data-mining and data-analysis is limited by errors, inconsistencies and omissions.
• PomBase uses a combination of annotation conventions, to improve information content (annotation coverage, specificity and redundancy), and QC mechanisms to identify possible annotation inconsistencies and errors.
• In combination these mechanisms address many recurring annotation issues.
1. The definition is critical
All ontology terms have a "fixed" definition
• If a definition is misleading or incorrect its meaning cannot be changed. To fix it, the term is obsoleted and annotations are migrated.
• This makes annotations very robust to ontology changes. If a term needs to be repositioned the annotations remain correct.
• We annotate to the definition, not the term name. Always check the definition.
2. Improving annotation specificity
• i) Consider descendant terms
• ii) Veto use of uninformative terms
2i. Consider descendants
Annotate as specifically as the experiment allows and be unambiguous about the biology
• regulation: positive or negative?
• translation: cytoplasmic or mitochondrial?
• transport: of what? to where? how?
• chromosome segregation: mitotic or meiotic?
If the available terms are insufficient, request a more specific term
• For a carboxylic acid carrier "carboxylic acid transport" looks initially OK
• However "transmembrane transport" is not explicit here… Carboxylic acid might be transported in other ways…
2i. Consider descendants e.g.
More specific annotation can provide additional detail e.g.
• substrate,
• type (transmembrane),
• sometimes directionality
Additional parents increase the information content, as the gene product is annotated indirectly to more terms.
2ii. Veto use of uninformative terms
Identify the set of ontology terms where more specific annotation should be possible (more biological detail)
Examples:
• e.g. cellular process -> which?
• e.g. translation -> cytoplasmic? mitochondrial?
• e.g. transport -> of what? to where?
Some GO terms are already flagged as not for manual annotation.
Review and improve annotations to vetoed terms
PomBase blocks a lot of the upper ontology levels
1175 GO terms blocked for annotation (only ~50 violations)
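A minimal sketch of how a veto check could work, assuming annotations are available as simple (gene, term) pairs. The vetoed-term set, the gene names and the function name `find_vetoed` are all hypothetical and are not PomBase's actual pipeline.

```python
# Hypothetical list of vetoed (too general) terms, for illustration only.
VETOED_TERMS = {"cellular process", "translation", "transport"}


def find_vetoed(annotations, vetoed=VETOED_TERMS):
    """Return the (gene, term) pairs annotated directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed]


# Toy annotation set with placeholder gene names.
annotations = [
    ("geneA", "transport"),
    ("geneA", "carboxylic acid transmembrane transport"),
    ("geneB", "cytoplasmic translation"),
    ("geneC", "cellular process"),
]

violations = find_vetoed(annotations)
for gene, term in violations:
    print(f"{gene}: '{term}' is too general; look for a descendant term")
```

Each reported pair is a candidate for review: either a more specific descendant term can be used, or a new term needs to be requested.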
3. Improve the ontologies
3i. Missing parents
Original arrangement
These process annotations were originally in different branches of the ontology, so all annotations were required
New arrangement:
Collapsed 6 processes to 2. Exactly the same information content.
Less redundancy, easier for users to interpret annotation
3ii. Report incorrect parents
AKA "True Path Violations" or "TPVs"
For example
protein maturation
-- protein processing (part_of)
---- proteolysis (part_of)
(not all proteolysis is processing or maturation)
4. The power of Annotation Extensions
Provide additional specificity for a GO annotation e.g.
• Target gene (kinase substrate, TF regulation target)
• Location of a function
• Localization dependencies (protein A localizes protein B)
• Spatial and temporal aspects of processes, functions, locations (cell cycle stage of occurrence)
• ADD an example of a gene product specific AE
See: Huntley et al. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. PMID:24885854
4. Annotation Extension e.g. cdc2
cyclin-dependent protein serine/threonine kinase
• has substrate jh2 involved in negative regulation of conjugation with cellular fusion
• directly inhibits srw1 involved in positive regulation of G1/S transition
• has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication
• has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase
• has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining, during mitotic G2 phase
• has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery during mitotic M phase
• has substrate hcn1 during mitotic M phase
• has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
• has substrate mde4 involved in correction of merotelic attachment, mitotic, during mitotic metaphase
• has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic metaphase
• has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase
• has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
• directly inhibits clp1 involved in negative regulation of exit from mitosis
• has substrate byr4 involved in positive regulation of septation initiation signaling
• directly inhibits dis2
• has substrate rum1, crb2, sds23
Link function (cyclin-dependent kinase) to target genes, processes, and temporal information
Alternative (human CDK1):
Not scalable or maintainable
4. Using AE for effectors
• Reciprocal of the extension (automated) called "target of"
• Collects known "upstream effectors" on cdc2 page
• We can use effector-substrate connections to generate networks (interaction, metabolic, regulatory)
• Provide directional links to support pathway reconstruction
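A rough sketch of the reciprocal "target of" idea, assuming each annotation extension can be reduced to an (effector, relation, target) triple. The triple format, the underscore relation names and the function names are illustrative assumptions, not the PomBase data model. The cdc2 relations are taken from the cdc2 example in this deck.

```python
from collections import defaultdict

# Extension triples (effector, relation, target); the format is an assumption.
extensions = [
    ("cdc2", "has_substrate", "rum1"),
    ("cdc2", "has_substrate", "byr4"),
    ("cdc2", "directly_inhibits", "clp1"),
    ("cdc2", "directly_inhibits", "dis2"),
]


def reciprocal_index(records):
    """Map each target to the (relation, effector) pairs that act on it."""
    target_of = defaultdict(list)
    for effector, relation, target in records:
        target_of[target].append((relation, effector))
    return dict(target_of)


def directed_edges(records):
    """Collapse the triples to a sorted list of effector -> target edges."""
    return sorted({(effector, target) for effector, _, target in records})


target_of = reciprocal_index(extensions)
edges = directed_edges(extensions)

print(target_of["clp1"])  # upstream effectors recorded for clp1
print(edges)              # directed edges usable as network input
```

The reciprocal index gives the "upstream effectors" view for a target gene, while the edge list is the kind of directed input a network or pathway layout tool would need.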
4. Using Annotation Extensions to generate networks/pathways
[Figure: network generated from annotation extensions; nodes shown include sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1]
4. Automated AE networks e.g.
44/59 connected in automated network based on annotated connections within "regulation of G2/M transition" (fission yeast)
(Network for each GO slim category from the slim page)
5. Suppress redundant IEA annotation
• PomBase pipelines filter redundant IEA (Inferred from Electronic Annotation) evidence
• Removes >90% of IEA (because a manual annotation already exists)
13 annotations are reduced to 4
Same information, fewer terms
5. Suppress redundant IEA, QC of mappings
Incorrect annotations are more easily spotted
Mis16 is not involved in 'chromatin modification' -> fix mapping
5. Suppress redundant IEA, QC of ontology (SPBC543.05c)
Missing parents in ontology more obvious
"inorganic anion exchanger" should be an 'ancestor' of GO:0005452, to suppress the IEA as redundant
• >40,000 fission yeast IEAs available.
• PomBase filters 36,000 as redundant and retains 4,000 (IEAs are at least 90% accurate if the manual annotations are correct).
• It is easier to evaluate the remaining IEAs to identify/fix anomalies
Reducing IEAs over time
5. Suppress redundant IEA
• More concise view with zero loss of information
• IEA mappings derived from a single experiment/publication can be interpreted as proof by repetition and make weak EXP data appear multiply supported/acceptable
• Fewer annotations, easier QC of remaining IEAs
Q "Why isn't an IEA covered by manual annotation?" Either:
1. Incorrect mapping
2. Missing parent in ontology
3. Missing annotation -> find supporting evidence and annotate manually (EXP or ISO)
(PomBase also filters NAS/TAS/IC)
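One way to picture the redundancy filter is sketched below. It assumes an IEA is redundant when the same gene already has a manual annotation to the same term or to one of its descendants. The toy ontology fragment, gene name and function names are hypothetical, and the sketch filters IEA only (not NAS/TAS/IC).

```python
# Toy ontology fragment: term -> direct parents (hypothetical, not real GO).
PARENTS = {
    "cytoplasmic translation": ["translation"],
    "translation": ["gene expression"],
    "gene expression": [],
    "chromatin modification": [],
}


def ancestors_or_self(term, parents=PARENTS):
    """Return the term plus every ancestor reachable through its parents."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def filter_redundant_iea(annotations):
    """Drop IEAs already implied by a manual annotation for the same gene."""
    covered = {}
    for gene, term, evidence in annotations:
        if evidence != "IEA":
            covered.setdefault(gene, set()).update(ancestors_or_self(term))
    return [
        (gene, term, evidence)
        for gene, term, evidence in annotations
        if evidence != "IEA" or term not in covered.get(gene, set())
    ]


annotations = [
    ("geneA", "cytoplasmic translation", "EXP"),
    ("geneA", "translation", "IEA"),
    ("geneA", "gene expression", "IEA"),
    ("geneA", "chromatin modification", "IEA"),
]
kept = filter_redundant_iea(annotations)
for row in kept:
    print(row)
```

In this toy input the two IEAs implied by the EXP annotation are dropped. The surviving "chromatin modification" IEA is the kind of record that prompts the three questions above: incorrect mapping, missing parent, or missing manual annotation.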
6. Annotate by process (pathway)
• Annotating by process rather than "ad hoc" improves consistency and allows 'annotation gaps' to be targeted
• Process papers more quickly (become more familiar with the field, experimental methods)
Become familiar with an area of biology and the techniques used. Don't need to read the background every time. Recognise phenotypes.
6. Annotate by process or pathway
From PMID:22898774
Regulation of the metaphase/anaphase transition by the MCC, the APC and upstream signalling
[Figure: pathway diagram; labels include cdc20, proteasome, APC, separase, cohesin subunit, securin, post transition, SAC/MCC]
Identify obvious missing annotation, for example between complex members
Can perform QC on processes or components e.g. use STRING to evaluate outliers (potential annotation errors)
Input list "regulation of mitotic metaphase/anaphase transition"
Can also ask "are any complex members missing?"
7. Assess annotation at the organismal level
• We are annotating whole organisms… use a holistic whole annotation approach
• Evaluate annotation breadth (coverage) using slims
• Evaluate intersections between slim processes
7. Evaluate organismal annotation coverage using "slims"
• EXP supported BP
• ISO/IEA inferred BP
'unknowns'
• Species specific, no inference possible
• Conserved, but unannotated in any species
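A small sketch of the coverage breakdown, assuming the gene sets have already been extracted from the annotation files. The gene identifiers and the function name `coverage_report` are placeholders.

```python
def coverage_report(all_genes, exp_bp, inferred_bp):
    """Split genes into EXP-supported, inferred-only and unknown sets."""
    exp = all_genes & exp_bp
    inferred_only = (all_genes & inferred_bp) - exp
    unknown = all_genes - exp - inferred_only
    return {"EXP": exp, "ISO/IEA only": inferred_only, "unknown": unknown}


# Placeholder gene sets for illustration.
all_genes = {"g1", "g2", "g3", "g4", "g5"}
exp_bp = {"g1", "g2"}        # genes with an experimental BP annotation
inferred_bp = {"g2", "g3"}   # genes with an ISO/IEA BP annotation

report = coverage_report(all_genes, exp_bp, inferred_bp)
for label, genes in report.items():
    print(label, len(genes), sorted(genes))
```

A gene with both kinds of support is counted once, under EXP, so the three sets partition the genome and the "unknown" set is what remains for follow-up.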
7. Browsable Slim:
7. Sensible assignments?
DNA recombination
Periodically check that slim class contents look sensible
7. Visual slim, all pombe proteins
[Figure: visual slim diagram. TOTAL 5054; Unknown 830.
Modules: CELL DIVISION 751; MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; EXPRESSION 1294; EXPRESSION submod 863; PROTEIN ASSEMBLY/STABILITY 765.
Slim classes: cytoskeleton org 206; nuclear DNA replication, recombination, repair 305; mitotic chromosome segregation 184; regulation of mitotic cell cycle 232; cytokinesis 110; cell wall org 130; lipid met 222; vesicle mediated transport 324; glycosylation polysacc met 140; membrane org 199; detox; AA & sulfur met 220; vitamin cofactor met; nucleobase/side/tide met 219; small sugar met 77; nitrogen 15; other energy generation 25; signalling 404; sexual reproductive process 262 (many intersections); ribosome biogenesis 317; RNA metabolism 772; cytoplasmic translation 249; nucleocyto transport 110; transcription 479; protein catabolism & autophagy 251; ubiquitination 192; folding 102; complex assembly 325.
Other 290: no intersections; includes adhesion, many proteases, peroxions.]
7. Evaluate intersections between slim categories
Evaluate intersections between processes
Many GO processes are rarely co-annotated because they are functionally, spatially or temporally distant.
For example, we would not expect "ribosome biogenesis" to intersect with "vitamin metabolism"
We can use this observation to identify potential conflicts
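The intersection check can be sketched as follows, assuming each slim term maps to a set of annotated genes and that a curated list of "unexpected" term pairs exists. The gene sets, the `UNEXPECTED` list and the function names are hypothetical.

```python
from itertools import combinations

# Placeholder slim-term -> gene-set mapping.
slim_genes = {
    "ribosome biogenesis": {"g1", "g2", "g3"},
    "vitamin metabolism": {"g3", "g4"},
    "cytoplasmic translation": {"g1", "g5"},
}

# Term pairs that are not expected to share genes (an assumed curated list).
UNEXPECTED = {frozenset({"ribosome biogenesis", "vitamin metabolism"})}


def slim_intersections(slim_genes):
    """Return the shared genes for every pair of slim terms."""
    return {
        frozenset({a, b}): slim_genes[a] & slim_genes[b]
        for a, b in combinations(sorted(slim_genes), 2)
    }


def flag_conflicts(slim_genes, unexpected=UNEXPECTED):
    """Keep only unexpected pairs that share at least one gene."""
    shared = slim_intersections(slim_genes)
    return {pair: genes for pair, genes in shared.items()
            if pair in unexpected and genes}


conflicts = flag_conflicts(slim_genes)
for pair, genes in conflicts.items():
    print(sorted(pair), "share", sorted(genes))
```

Each flagged gene is a review candidate rather than a confirmed error; the cause may be an annotation error, an ontology error or an incorrect mapping.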
Slim intersections Oct 2014
[Figure: slim intersection matrix with marked (x) cells]
Slim intersections Feb 2015
March 2016
7. Identifies ontology errors (e.g.)
DNA metabolism and chromosome segregation do not usually intersect
Regulation of chromosome condensation should not be a DNA metabolic process
7. Ontology error (e.g.)
Folic acid is classified as an amino acid by ChEBI, so folate metabolism is also annotated to amino acid metabolism. Need to fix ChEBI, which will fix GO
Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid metabolism
7. Finds incorrect mappings (e.g.)
Intersect between tRNA metabolism and transcription. Elongator is no longer thought to have a direct role in transcription, mapping removed
8. Consider author intent
Think about the biology the author intended
e.g. rubidium ion transmembrane transporter/transport
Rubidium ion is used as an assay for K+ transport, not rubidium (non-physiological substrate)
e.g. Apoptosis (RPS19)
Rps19 mutant displayed condensed DNA, a fragmented nucleus and caspase activation - indicative of apoptosis. Since RPS19 has an essential role in ribosome biogenesis, apoptosis is likely to be an indirect effect of the disruption of an upstream process, translation (i.e. an experimental readout)
9. Communication with the author and community curation
• Most authors are happy to discuss their publications. If unsure about an annotation ask them. PomBase routinely uses the authors as a QC step to refine annotation.
• Most authors are happy to curate their own papers (especially PhD/postdoc/recent papers). >40% of *** papers assigned to community have been returned with high quality annotation
9. Community Curation
• ….. Authors also curate their own recent papers
Co-curation by author and curator improves annotation quality
Some example sessions
• http://tinyurl.com/q2bgyqv
• http://tinyurl.com/p7d979b
• http://tinyurl.com/o72bzul
Very detailed annotation is made possible because Canto guides the user step by step to construct genotypes and ontology based annotations. "Drill down" to more specific terms is assisted. Prompts are provided for AE of specified types for certain terms
Is it successful?
• Numbers
• Show increase uptake graph?
• Accuracy high, but often omissions
• Once people have done one, happy to repeat
• Better on subsequent sessions
• More success closer to publication date
• Motivation – publication visibility (data dissemination)
10. Prioritise error fixing
• Fixing known errors takes precedence over new annotation.... like critical bugs in code
• Even small errors often uncover larger issues, or can fix many problems simultaneously across multiple species.
• Prevents propagation of annotation errors
Summary
Spare slides
8. GO vs. phenotype
• GO annotations should reflect a gene's direct involvement in, or role in regulating, processes or functions. In contrast, phenotype annotations indicate that a mutation causes a change in a process, but may reflect downstream or indirect effects.
• ER membrane defect -> nuclear envelope defect -> chromosome decondensation defect -> defects in next round of DNA replication. Clearly a DNA replication phenotype alone is not enough to make a "DNA replication" GO annotation.
• At PomBase we only make GO annotations based on phenotypes if
i) The phenotype is known to be completely determinative for the process
ii) Additional data supports GO inference from phenotype (location, orthology)
• Often the experiments do not definitively resolve the exact process that the gene is involved in.
• Intersections between processes are useful for identifying annotation errors caused by indirect annotation