The i5k Workspace@NAL · i5k.github.io/webinar_slides/i5k_webinar-i5k_workspace_Oct04-2017.…

The i5k Workspace@NAL: a pan-Arthropoda Genome Database
Chris Childers and Monica Poelchau, USDA-ARS, National Agricultural Library

Outline
• Background and overview
• Why join the i5k Workspace?
• What do we need for a project?
• What do we do with your data?
• What don’t we do with your data?
• Our new system for submitting projects and data

Background
• The i5k initiative tasked itself with coordinating the sequencing and assembly of 5000 insect or related arthropod genomes
• International effort to prioritize insect genomes for sequencing; provide guidelines for genome sequencing and curation; and seek funding.
• The i5k Workspace@NAL is available to help any i5k (arthropod) project with genome hosting needs

Genome Project Trajectory
• Research plan
• Generate material for sequencing
• Genome sequencing
• Genome assembly
• Automated annotation of genome assembly
• Biological insights / Publication
• Manual curation
• Official gene set (OGS) generation
• Genome project maintenance

Workspace Project Basics
• The i5k Workspace centers around projects.
  • A project is a collection of data based on the genome assembly of an arthropod
  • All data is used in the context of the genome assembly
• Each project has a project coordinator.
  • Serves as the point of contact for questions about the project
  • Main responsibility: approve or reject new Apollo users
• All of our data is user-submitted

Why join the i5k Workspace?
• Gain access to a large diverse community
  • A diversity of organisms
    • 58 species and counting
    • 20% of the arthropods with genome assemblies at NCBI
  • Large user community with many different interests
    • People versed in the biology of specific systems
    • Experts in a species or group of species
• A common interface for accessing data, tools and search
• Detailed policies on data and project management
  • Helpful if you have data management requirements
  • Data management
    • https://i5k.nal.usda.gov/data-management-policy
  • Long-term project management
    • https://i5k.nal.usda.gov/long-term-i5k-workspace-project-management

What do we need for a project?
• Your project metadata
  • Information about your organism
  • Metadata for submitted data files (the more the better)
    • What tools or methods were used
    • Software versions and options set
    • When and where the data were generated
    • Other information (location collected, life-stage, etc.)
• Your data files
  • Genome assembly needs to be in GenBank/ENA/DDBJ
  • Data should be open access (no private repositories)
  • Additional datasets need to be mapped to the same assembly
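
The metadata checklist above can be turned into a quick self-check before submitting. This is only a sketch: the field names are hypothetical, chosen to mirror the bullets on this slide, and are not the actual fields of the i5k Workspace submission forms.

```python
# Hypothetical field names that mirror the checklist on this slide;
# they are NOT the actual i5k Workspace submission form fields.
REQUIRED_FIELDS = (
    "organism",
    "tools_and_methods",
    "software_versions_and_options",
    "date_and_place_generated",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in `record`."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


record = {
    "organism": "Oncopeltus fasciatus",
    "tools_and_methods": "RNA-Seq alignment",
    "software_versions_and_options": "",
}
print(missing_metadata(record))
```

An empty result would mean every field in this illustrative checklist has been filled in.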

What do we do with your data?
• Create resources
  • Organism and gene pages
  • Data downloads
• Integrate your data with our tools
  • Genome browser
  • BLAST, Clustal, HMMer
  • Apollo for gene curation
• Offer post-curation services
  • Annotation QC and Official Gene Set (OGS) creation
  • Update gene pages, Apollo, BLAST with OGS

Workspace@NAL (https://i5k.nal.usda.gov/)
Submission
• ‘Frozen’ genome assembly
• Automated annotations
• Ancillary data files (e.g. RNA-Seq alignments)
Tools
• Custom BLAST interface
• Apollo manual curation tool
• JBrowse genome browser
• HMMer
• Clustal
Resources
• Organism Information Page
• Bulk data downloads
• Tutorials
Services
• Manual annotation quality control
• Official gene set generation
Challenges
• Non-standard data formatting
• Failure to submit all metadata (e.g. sample origin; analysis methods)

What don’t we do with your data?
• Computationally intense analyses such as
  • Gene prediction
  • Raw RNA-Seq mapping
• We are not a long-term archive or repository
  • NCBI
  • Ag Data Commons
  • Dryad Digital Repository
  • CyVerse Data Commons
  • Many other options available

Criteria for starting a project
• You need to have an arthropod genome assembly, accessioned by NCBI (or another INSDC member)
  • Using GenBank's accession numbers avoids confusion about assembly version
  • The GenBank contamination screen improves the assembly quality
  • Using a stable assembly is beneficial for the labor-intensive community annotation process
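
To illustrate why an accession number pins down the assembly version: GenBank and RefSeq assembly accessions end in a version suffix. The sketch below splits one into its stable part and its version. The accession shown is a made-up placeholder, and the helper name is hypothetical.

```python
import re

# Assembly accessions have the shape GCA_#########.v (GenBank)
# or GCF_#########.v (RefSeq), where v is the assembly version.
ASSEMBLY_ACCESSION = re.compile(r"^(GC[AF]_\d{9})\.(\d+)$")


def parse_assembly_accession(text):
    """Return (accession, version), or None if the version is missing or malformed."""
    match = ASSEMBLY_ACCESSION.match(text.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Placeholder accession for illustration only, not a real assembly.
print(parse_assembly_accession("GCA_000000000.2"))
```

Recording the versioned accession with every dataset makes it unambiguous which assembly the data were mapped to.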

Other things to consider before submitting
• All data submitted to the i5k Workspace is public.
  • However, we do state whether Ft. Lauderdale/Toronto agreements of data sharing should apply
• Is your genome an ‘orphan’, or is there another suitable database?
  • We can host genomes that are already hosted elsewhere, and actively communicate with other database providers
  • All manual annotation efforts need to be at one database

Getting an account
• Apply for a dataset submission account: https://i5k.nal.usda.gov/register/project-dataset/account
• Once your account is approved, you can submit projects, assemblies or other datasets

Start an i5k Workspace Project
• Log in
  • https://i5k.nal.usda.gov/user
• From menu, select ‘Data -> Submit data -> Request a new i5k Workspace Project’
  • https://i5k.nal.usda.gov/datasets/request-project
• We’ll review your submission and will get in touch with you

Submit your genome assembly
• All information submitted through this form will be re-formatted for display at the i5k Workspace (except for email address and file checksum)
• From menu, select ‘Data -> Submit data -> Submit a genome assembly’
  • https://i5k.nal.usda.gov/datasets/assembly-data
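
The submission forms mention a file checksum. The slides do not say which algorithm is expected, so the sketch below assumes MD5 purely for illustration. It reads the file in chunks so that large assemblies do not need to fit in memory.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks.

    The default algorithm (MD5) is an assumption; use whatever
    the submission form actually asks for.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_checksum("assembly.fa.gz")` would return the hex string to paste into the form, where `assembly.fa.gz` is a hypothetical file name.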

Submit gene predictions
• All information submitted through this form will be re-formatted for display at the i5k Workspace (except for email address and file checksum)
• Under menu bar, select ‘Data -> Submit data -> Submit Gene Predictions’
  • https://i5k.nal.usda.gov/datasets/gene-prediction

Submit mapped datasets
• All information submitted through this form will be re-formatted for display at the i5k Workspace (except for email address and file checksum)
• Under menu bar, select ‘Data -> Submit data -> Submit a Mapped Dataset’
  • https://i5k.nal.usda.gov/datasets/mapped

Send us your files
• There are currently five ways to share files with us:
  1. Use our data submission forms
  2. Transmit the file via ftp (only for files <2 Gb)
  3. Email it to us (for files <25 Mb only)
  4. Provide us with a URL, if available
  5. Upload the file to CyVerse and share with our organization, “NALBioinformatics”
• We prefer that you share your files with us via our data submission forms.
• For more information, see https://i5k.nal.usda.gov/content/sharing-files-us
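
The size limits above can be read as a simple decision rule. The sketch below lists which of the five routes fit a file of a given size. It assumes that “Gb” and “Mb” mean decimal gigabytes and megabytes, and that the submission forms, URL and CyVerse routes have no size limit, since none is stated on the slide. The function name and labels are hypothetical.

```python
MB = 10**6  # assumption: "Mb" on the slide means megabytes
GB = 10**9  # assumption: "Gb" on the slide means gigabytes


def sharing_options(size_bytes):
    """List the sharing routes from the slide that fit a file of this size."""
    options = ["data submission form (preferred)"]
    if size_bytes < 2 * GB:
        options.append("ftp")
    if size_bytes < 25 * MB:
        options.append("email")
    options += ["URL", "CyVerse share"]
    return options


print(sharing_options(30 * MB))
```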

Other resources at the NAL: the Ag Data Commons
• Hosts any dataset funded by the USDA
  • Landing page
  • Citable DOI
  • https://data.nal.usda.gov/
  • 9 i5k datasets already available

Need more information?
i5k Workspace@NAL:
• https://i5k.nal.usda.gov/
• https://github.com/NAL-i5K/
The i5k initiative:
• New website: http://i5k.github.io/

Official Gene Set creation at the i5k Workspace

Official Gene Set creation at the i5k Workspace
• Official Gene Set definition
• Our OGS generation process
  • Manual and community annotation
  • Quality control
  • Merge
  • Release
• Examples and future directions of the OGS generation process

The Official Gene Set – what is it?
• Loose definition: The best known representation of gene models for a genome assembly
• When the i5k Workspace generates an OGS, this is a merge between one gene set (usually computationally predicted), and a set of manually validated annotations (usually from the Apollo software)

Why generate an Official Gene Set?
• This depends on your genome community’s needs.
• If several groups want to perform downstream analyses, it helps to have an authoritative ‘reference gene set’ for your community, rather than multiple competing gene sets

Our OGS generation process
• New public version of program is available: https://github.com/NAL-i5K/GFF3toolkit (Mei-Ju Chen, Li-Mei Chiang)
• The full process is time-consuming, but we are generally available to perform OGS generation for i5k Workspace projects
1. Manual annotation (via Apollo)
   Manual annotation freeze
2. Error checking / Curator fixes
3. Merge with one designated gene set
4. Release Official Gene Set

1. Manual and community annotation
What is manual annotation?
• Manual review and improvement of an existing gene prediction
• Often, but not always: drawing on external evidence (e.g. RNA-Seq, cDNA, genes from other species) to improve a computationally predicted gene model
• Structural annotation – e.g. modify exons
• Functional annotation – e.g. add name

1. Manual and community annotation
Why manually annotate?
• “Incorrect annotations poison every experiment that makes use of them… Worse still, the poison spreads because incorrect annotations from one organism are often unknowingly used by other projects to help annotate their own genomes.”
  • Yandell and Ence 2012, doi:10.1038/nrg3174
• Link gene models to existing literature and ontologies, providing richer data
• One current ‘model’ of the genome paper often draws heavily from insights confirmed by manual annotation

1. Manual and community annotation
• What is community annotation?
  • Scientists collectively examine and improve gene models (usually computationally predicted)
• Community annotation at the i5k Workspace:
  • Access to a large community of curators
  • Tutorials, guidelines, webinars
  • Registration mechanism for new annotators
  • One-on-one support
  • Over 400 registered annotators have curated over 10,000 gene models using the Apollo software

1. Manual and community annotation – i5k pilot example
• Number of curators per organism: community size varies among organisms.
• Number of organisms per curator: 35% of curators worked on more than one organism

1. Manual and community annotation – i5k pilot example
• Three organisms that completed the manual annotation process required similar amounts of structural changes to their computationally predicted gene annotations
• Computationally predicted genes often have inaccurate gene structures
• Community annotation can effectively improve gene sets

Organism                     Total number of manually    Proportion of manually annotated
                             annotated models            models with structural changes
Anoplophora glabripennis⁶    1144                        0.75
Cimex lectularius⁷           1354                        0.76
Oncopeltus fasciatus         1518                        0.76

2. OGS generation – Quality Control
• Manual curation can introduce many errors, even using standard software packages (e.g. Apollo)
• QC program identifies common formatting errors from the manual curation process
  • GitHub repo: https://github.com/NAL-i5K/GFF3toolkit
  • Identifies over 50 error types
• Another in-house pipeline corrects many of these errors

2. OGS generation – Quality Control
• Requires some manual review – can’t be completely automated
  • e.g. did you name your gene model ‘test’ or ‘Contig277’?
• Note that i5k Workspace staff aren’t ‘curators’ in the traditional sense – we do not review the biological validity of any of the community-annotated models.
• The degree of manual review of community annotations is higher if Official Gene Sets are to be submitted to NCBI
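
A check like the ‘test’ / ‘Contig277’ example can flag names for a human to look at, even though it cannot decide what the right name is. The sketch below is not the GFF3toolkit logic. The pattern list is a hypothetical starting point built from the two examples on this slide, plus scaffold-style names as an assumed extension.

```python
import re

# Hypothetical placeholder patterns: 'test' and contig/scaffold-style IDs.
PLACEHOLDER_NAME = re.compile(r"^(test|contig\d+|scaffold\d+)$", re.IGNORECASE)


def needs_manual_review(name):
    """Flag empty names and names that look like placeholders."""
    cleaned = name.strip()
    return cleaned == "" or PLACEHOLDER_NAME.match(cleaned) is not None


for name in ("test", "Contig277", "hunchback"):
    print(name, needs_manual_review(name))
```

Flagged names would still go back to the annotator, which is why this step cannot be fully automated.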

2. OGS generation – Quality Control
• Diaphorina citri example (Database, doi:10.1093/database/bax032)
  • First round of corrections for community curation:
    • 513 errors in 587 manually annotated models
    • 397 of these errors needed curator feedback
  • Second round of corrections:
    • 15 errors needed annotator feedback
Error checking / Curator fixes

3. OGS generation – Merge
The GFF3toolkit Merge program can identify which gene models in the ‘reference’ gene set should be replaced by gene models in a second gene set (i.e. the manually annotated models) via ‘auto-assignment’.
Figure labels: Reference gene; Manually annotated gene

3. OGS generation – Merge
• Auto-assignment uses both sequence similarity and coordinate overlap
  • Extract CDS and pre-mRNA sequences from mRNA features from both gene sets.
  • Use blastn to determine which sequences from the modified and reference gene set align to each other in their coding sequence.
    • These parameters are used: -evalue 1e-10 -penalty -15 -ungapped
  • If two models pass the alignment step, check that matched models also have coordinate overlap
  • Add a ‘Replace Tag’ with the ID of each overlapping model to the modified gene set.
  • If no reference model overlaps with a new model, then the program will add 'replace=NA'.
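
The auto-assignment steps above can be sketched as follows. This is an illustration of the idea, not the GFF3toolkit code: the function names, the tuple layout for model locations, and the requirement that overlapping models share a strand are all assumptions. The blastn options shown are the ones listed on the slide, plus standard input and output options.

```python
def blastn_command(query_fasta, subject_fasta):
    """Build a blastn call with the parameters listed on the slide."""
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-evalue", "1e-10",
        "-penalty", "-15",
        "-ungapped",
        "-outfmt", "6",  # tabular output, added here for easy parsing
    ]


def overlaps(a, b):
    """True if two (seqid, strand, start, end) models share at least one base.

    Requiring the same strand is an assumption of this sketch.
    """
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def replace_tags(new_models, ref_models, aligned_pairs):
    """Map each new model ID to the reference IDs it replaces, or 'NA'.

    `aligned_pairs` holds the (new_id, ref_id) pairs that passed blastn.
    """
    tags = {}
    for new_id, new_loc in new_models.items():
        hits = sorted(
            ref_id
            for ref_id, ref_loc in ref_models.items()
            if (new_id, ref_id) in aligned_pairs and overlaps(new_loc, ref_loc)
        )
        tags[new_id] = ",".join(hits) if hits else "NA"
    return tags


new = {"m1": ("chr1", "+", 100, 500), "m2": ("chr1", "+", 900, 1200)}
ref = {"r1": ("chr1", "+", 450, 800), "r2": ("chr2", "+", 100, 500)}
print(replace_tags(new, ref, {("m1", "r1"), ("m1", "r2")}))
```

Here `m1` aligns to both reference models but only overlaps `r1`, so only `r1` ends up in its tag, while `m2` has no match and gets `NA`.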

3. OGS generation – Merge
• The program determines merge actions based on the Replace Tags:
  1. deletion
  2. simple replacement
  3. new addition
  4. split replacement
  5. merge replacement
• Models from modified manual annotations replace models from reference annotations based on merge actions in step 2.
Figure labels: Reference gene; Updated gene; Merge replacement
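
One way to read the merge actions is as a classification over the Replace Tags. The rules below are an interpretation for illustration, not the GFF3toolkit implementation: a new model with no reference is an addition, a new model replacing several references is a merge, several new models replacing the same reference is a split, and a one-to-one match is a simple replacement. Deletion is left out because the slides do not say how it is encoded.

```python
from collections import defaultdict


def merge_actions(replace_tags):
    """Classify each new model from its replace tag.

    `replace_tags` maps a new model ID to a list of reference IDs;
    an empty list stands for replace=NA. Deletions are not handled.
    """
    replaced_by = defaultdict(set)
    for new_id, ref_ids in replace_tags.items():
        for ref_id in ref_ids:
            replaced_by[ref_id].add(new_id)

    actions = {}
    for new_id, ref_ids in replace_tags.items():
        if not ref_ids:
            actions[new_id] = "new addition"
        elif len(ref_ids) > 1:
            actions[new_id] = "merge replacement"
        elif len(replaced_by[ref_ids[0]]) > 1:
            actions[new_id] = "split replacement"
        else:
            actions[new_id] = "simple replacement"
    return actions


tags = {"a": [], "b": ["r1"], "c": ["r2", "r3"], "d": ["r4"], "e": ["r4"]}
print(merge_actions(tags))
```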

3. OGS generation – Merge
• Diaphorina citri example (Database, doi:10.1093/database/bax032)
  1. # genes deleted: 1
  2. # genes with simple replacement: 437
  3. # genes added: 72
  4. # genes split: 38
  5. # genes merged: 31
  6. Total number of genes in OGS: 20,217

3. OGS generation – Merge
• Other software tools can be used to merge gene sets
  • Combiner tools that use ‘weights’ for different input annotations, e.g.
    • EVidenceModeler (EVM, https://evidencemodeler.github.io/)
    • Glean (https://sourceforge.net/projects/glean-gene/)
  • Other overlap-based replacement tools, e.g. Bedtools intersect (http://bedtools.readthedocs.io/en/latest/)

4. OGS generation – Release OGS
• Generate new or maintain old gene model IDs
• Establish release date with genome coordinator
• Generate fasta files
• Add to i5k Workspace@NAL database
• *Submit to NCBI if requested by genome coordinator*

Completed OGS projects using i5k Workspace’s pipeline
• Diaphorina citri OGSv1.0
• Frankliniella occidentalis OGSv1.0
• Hyalella azteca OGSv1.0
• Oncopeltus fasciatus OGSv1.2
• Athalia rosae OGSv1.0
• Orussus abietinus OGSv1.0
• Leptinotarsa decemlineata OGSv1.0

Future updates
• Current improvements:
  • GFF3toolkit support for QC and merge of non-coding transcripts (Li-Mei Chiang)
• Future work:
  • Improve methods for merging multi-isoform models
  • Improve QC process – how to improve communications about errors with annotators

Questions?
i5k Workspace@NAL:
• https://i5k.nal.usda.gov/
• https://github.com/NAL-i5K/
• GFF3toolkit issue tracker: https://github.com/NAL-i5K/GFF3toolkit/issues
• Email: [email protected]

Thank you!
The NAL Team
• Yu-yu Lin
• Chaitanya Gutta
• Li-Mei Chiang
• Yi Hsiao
• Gary Moore
• Susan McCarthy
i5k Workspace alumni
• Chien-Yueh Lee
• Han Lin
• Jun-Wei Lin
• Vijaya Tsavatapalli
• Mei-Ju Chen
• Chao-I Tuan
i5k Workspace@NAL advisory committee
• i5k Coordinating Committee
• i5k Pilot Project
• Apollo & JBrowse Development Teams
• GMOD/Tripal community
• All of our users and contributors!

OGS generation – the GFF3toolkit

The Replaced Models field
• We use the information in this field to generate a merged, non-redundant gene set from the manually curated models and the official or primary gene set
• Your official or primary gene set is listed in the category field of the track selector
• If you don’t know what your project’s gene set is, contact us!
https://i5k.nal.usda.gov/apollo-replaced-models-field-explanations-and-examples
Figure label: Replaced Models field
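
As a rough illustration of how this field could be read downstream, the sketch below pulls a `replace` attribute out of the ninth column of a GFF3 line. It assumes the Replaced Models field is exported as a `replace=` attribute with comma-separated IDs, as the 'replace=NA' wording on the merge slides suggests. The function name and the example line are hypothetical.

```python
def replaced_models(gff3_line):
    """Return the reference IDs named in a line's `replace` attribute.

    Assumes the field is exported as `replace=ID1,ID2` in column 9;
    a missing attribute or `replace=NA` yields an empty list.
    """
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line has nine tab-separated columns")
    attributes = dict(
        pair.split("=", 1) for pair in columns[8].split(";") if "=" in pair
    )
    value = attributes.get("replace", "NA")
    return [] if value == "NA" else value.split(",")


# Made-up feature line for illustration only.
line = "scaffold1\tApollo\tmRNA\t100\t900\t.\t+\t.\tID=m1;replace=r1,r2"
print(replaced_models(line))
```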

Community annotation lifecycle (end goal: OGS)
1. Genome sequencing, assembly and annotation
2. Community building: conference calls and training
3. Manual annotation via Apollo
4. Manual annotation ‘freeze’
5. General QC (NAL)
6. Official Gene Set generation (merge of manual annotations and reference gene set)