EBI web resources I: databases and tools
Yanbin Yin
Outline
• Intro to EBI
• Databases and web tools
  – UniProt
  – Gene Ontology
• Hands-on practice
MOST MATERIALS ARE FROM: http://www.ebi.ac.uk/training/online/course-list
Three international nucleotide sequence databases
The European Bioinformatics Institute (EBI)
Created in 1992 as part of the European Molecular Biology Laboratory (EMBL)
EMBL was created in 1974 and is a molecular biology research institution supported by 20 European countries and Australia
Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; neighbor of the Wellcome Trust Sanger Institute
http://www.ebi.ac.uk/
Research groups in EBI
InterPro
UniProt
miRBase
Major databases in EBI
EMBL-Bank (DNA and RNA sequences), Ensembl (genomes), ArrayExpress (microarray-based gene-expression data), UniProt (protein sequences), InterPro (protein families, domains and motifs), PDBe (macromolecular structures)
Others, such as IntAct (protein–protein interactions), Reactome (pathways), ChEBI (small molecules), IntEnz (enzyme classification), GO (gene ontology)
GenBank, Genome, MapView
GEO, nr (GenPept)
CDD, MMDB
Swiss Institute of Bioinformatics, Sanger Institute
http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi
chromatograms
A sequence might first enter ENA as fragmented sequence reads in the SRA (Sequence Read Archive). It might then be re-submitted as assembled WGS (Whole Genome Shotgun) contigs built from overlapping reads. After further assembly it might be re-submitted again as CON (Constructed) sequence entries, with the older WGS entries being consigned to the Sequence Version Archive.
Data is first split into classes, then it is split into intersecting slices by taxonomy.
UniProt
http://www.uniprot.org/help/uniparc
Sources of annotation for the UniProt Knowledgebase
Life as a Scientific Curator: http://www.ebi.ac.uk/about/jobs/career-profiles/scientific-curator
Scientific Database Curator job: Cambridge, United Kingdom: http://www.nature.com/naturejobs/science/jobs/589083-hgnc-gene-nomenclature-advisor
Curation generation: http://cys.bios.niu.edu/yyin/teach/PBB/Bioinformatics%20Curation%20generation.pdf
Hands-on practice 1: UniProt
www.uniprot.org
http://www.uniprot.org/help/about
http://www.uniprot.org/docs/uniprot_flyer.pdf
We are going to do ID mapping.
http://cys.bios.niu.edu/yyin/teach/PBB/at-id.txt
Choose Araport here and UniProtKB here.
These are UniProt IDs.
Select the PAL proteins and align them.
The Clustal Omega program will be called to align the selected protein sequences. It may take about 1 min to finish.
This is the MSA result page. Toggling these options on will add colors to the alignment.
Go back to the protein list page. Selecting one protein will enable the BLAST button.
Choosing Advanced will allow you to change the BLAST parameters.
Here you can make changes.
We are going to search UniProt proteomes for the human protein set. Click on Advanced and you will see a pop-out window.
Here you can specify search terms.
Click here to get help.
Click here to open a new page.
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases.
The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD).
GO provides three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
There are three separate aspects to this effort:
1. the development and maintenance of the ontologies themselves;
2. the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and
3. the development of tools that facilitate the creation, maintenance and use of ontologies.
http://geneontology.org/page/documentation
Gene Ontology
GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.
GO is not a dictated standard mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.
GO is not a way to unify biological databases (i.e. GO is not a 'federated solution'). Sharing vocabulary is a step towards unification, but is not, in itself, sufficient.
Gene Ontology covers three domains:
cellular component, the parts of a cell or its extracellular environment;
molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis;
biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.
The scope of GO
The structure of GO can be described in terms of a graph, where each GO term is a node, and the relationships between the terms are edges between the nodes. GO is loosely hierarchical, with 'child' terms being more specialized than their 'parent' terms, but unlike a strict hierarchy, a term may have more than one parent term.
http://geneontology.org/page/ontology-structure
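Because a term can have more than one parent, collecting all of a term's ancestors means walking a graph rather than following a single chain. Below is a minimal Python sketch. It assumes a hypothetical toy ontology in which term "D" has two parents, "B" and "C". It also treats every edge alike and ignores GO's different relationship types.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its list of parent terms.
# "D" has two parents, which a strict hierarchy (tree) would not allow.
PARENTS: dict[str, list[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable by following parent edges upward."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # "A" is reached via both B and C, counted once
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


print(sorted(ancestors("E", PARENTS)))  # → ['A', 'B', 'C', 'D']
```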
http://www.ebi.ac.uk/training/online/course/go-quick-tour/what-can-i-do-go
id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
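The term record above is written in the OBO flat-file format. Each line holds one "tag: value" pair, and tags such as synonym, xref and is_a may repeat. Below is a minimal parsing sketch. It assumes only the simple tag/value layout shown here: it strips the "! comment" only from is_a lines and does not implement the full OBO grammar. The function name parse_obo_stanza is our own.

```python
def parse_obo_stanza(text: str) -> dict[str, list[str]]:
    """Parse the 'tag: value' lines of one OBO term stanza.

    Tags can repeat (synonym, xref, is_a), so every tag maps to a list.
    """
    record: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and headers such as [Term]
        tag, value = line.split(":", 1)  # split on the first colon only
        value = value.strip()
        if tag == "is_a" and " ! " in value:
            value = value.split(" ! ", 1)[0]  # drop the human-readable comment
        record.setdefault(tag.strip(), []).append(value)
    return record


STANZA = """id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds"""

term = parse_obo_stanza(STANZA)
print(term["is_a"])  # → ['GO:0004553']
```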
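Enrichment analysis: uses a statistical test, e.g. Fisher's exact test. Example: in the human genome background (20,000 genes total), 40 genes are involved in the p53 signaling pathway. In a given gene list, 3 out of 300 genes belong to the p53 signaling pathway. We then ask whether 3/300 is more than expected by random chance compared with the human background of 40/20,000 (by chance alone we would expect only 300 × 40/20,000 = 0.6 pathway genes in the list).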
http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E4
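The p53 example can be computed directly. A one-sided Fisher's exact test on this table equals the upper tail of the hypergeometric distribution. That tail is the probability of seeing 3 or more pathway genes when 300 genes are drawn at random from 20,000 genes, 40 of which are in the pathway. Below is a stdlib-only Python sketch. The function name is our own, and DAVID's own score may differ from this plain test.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test as a hypergeometric upper tail.

    k: list genes in the category (3)      n: size of the gene list (300)
    K: background genes in the category (40)  N: background size (20,000)
    Returns P(X >= k) for n genes drawn at random without replacement from N.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


expected = 300 * 40 / 20000  # 0.6 pathway genes expected by chance
p = enrichment_p_value(3, 300, 40, 20000)
print(expected, p)
```

The p-value comes out on the order of 0.02. Seeing 3 pathway genes where 0.6 are expected is therefore unlikely to be chance alone.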
UniProt-GO annotation (GOA)
http://www.ebi.ac.uk/training/online/course/uniprot-goa-quick-tour/what-uniprot-goa
Each annotation includes: the reference used to make the annotation (e.g. a journal article); an evidence code denoting the type of evidence upon which the annotation is based; and the date and the creator of the annotation.
Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID 17611253
Assigned by: UniProtKB, June 6, 2008
UniProt-GOA format
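UniProt-GOA distributes annotations like this one as tab-separated rows in the GAF (GO Annotation File) format. The fields in the example above map onto GAF columns. Below is a minimal Python sketch that assumes the 17-column GAF 2.x layout. The Annotation class and parse_gaf_line are hypothetical names for illustration. Columns the slide does not give, such as symbol and taxon, are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Column order of the tab-separated GAF 2.x format.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


@dataclass
class Annotation:
    gene_product: str   # DB:ID, e.g. "UniProtKB:P68032"
    go_id: str          # e.g. "GO:0060047"
    aspect: str         # P = biological process, F = molecular function, C = cellular component
    evidence_code: str  # e.g. "IMP"
    reference: str      # e.g. "PMID:17611253"
    date: str           # YYYYMMDD
    assigned_by: str    # e.g. "UniProtKB"


def parse_gaf_line(line: str) -> Optional[Annotation]:
    """Parse one GAF data row; return None for '!' header lines and blank lines."""
    if not line.strip() or line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (len(GAF_COLUMNS) - len(fields))  # pad optional trailing columns
    row = dict(zip(GAF_COLUMNS, fields))
    return Annotation(
        gene_product=f"{row['db']}:{row['db_object_id']}",
        go_id=row["go_id"],
        aspect=row["aspect"],
        evidence_code=row["evidence_code"],
        reference=row["reference"],
        date=row["date"],
        assigned_by=row["assigned_by"],
    )


# The slide's example as a GAF row; unknown columns are deliberately empty.
EXAMPLE = "\t".join([
    "UniProtKB", "P68032", "", "", "GO:0060047", "PMID:17611253", "IMP",
    "", "P", "Actin, alpha cardiac muscle 1", "", "protein", "", "20080606",
    "UniProtKB",
])
print(parse_gaf_line(EXAMPLE))
```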
If you have a newly sequenced genome/transcriptome, how do you perform GO annotation for it?
1. Find the closest model organism that has been annotated by GO.
2. BLAST your data against this closest organism.
3. Transfer the GO annotation of the best match to your query sequences.
For instance, if we want to annotate a fern transcriptome with GO function descriptions…
1. Find the Arabidopsis UniProt protein dataset.
2. Find the Arabidopsis GOA association file.
3. BLASTx the fern reads (or assembled UniGenes) against the UniProt set.
4. Analyze the BLAST result to link the fern reads to GO terms.
The idea of GO annotation for new sequences
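The BLAST-then-transfer steps can be sketched in Python. Every choice here is an illustrative assumption:

- the input is BLAST+ tabular output (-outfmt 6), whose 12 default columns end with evalue and bitscore;
- the e-value cutoff is an arbitrary 1e-5;
- "best match" means the highest bit score;
- all read, protein and GO identifiers are hypothetical.

In practice the protein-to-GO lookup would be built from the Arabidopsis GOA association file.

```python
def best_hits(blast_tab: str, max_evalue: float = 1e-5) -> dict[str, str]:
    """Pick the highest-bitscore subject per query from BLAST -outfmt 6 text."""
    best: dict[str, tuple[float, str]] = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # too weak to trust for annotation transfer
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def transfer_go(hits: dict[str, str], go_by_protein: dict[str, set[str]]) -> dict[str, set[str]]:
    """Copy the GO terms of each query's best hit onto the query."""
    return {query: set(go_by_protein.get(subject, set())) for query, subject in hits.items()}


# Hypothetical toy data: two fern reads, two reference proteins.
TOY_BLAST = "\n".join("\t".join(row) for row in [
    ["read1", "PROT_A", "85.0", "120", "18", "0", "1", "360", "5", "124", "1e-50", "200.0"],
    ["read1", "PROT_B", "60.0", "100", "40", "1", "1", "300", "10", "109", "1e-20", "90.0"],
    ["read2", "PROT_B", "45.0", "80", "44", "2", "1", "240", "3", "82", "0.5", "30.0"],
])
TOY_GO = {"PROT_A": {"GO:term_a", "GO:term_b"}, "PROT_B": {"GO:term_c"}}

print(transfer_go(best_hits(TOY_BLAST), TOY_GO))
```

In this toy run read2 receives no annotation because its only hit fails the e-value cutoff.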
Hands-on practice 2: GO annotation
http://geneontology.org/
http://amigo1.geneontology.org/cgi-bin/amigo/blast.cgi
Get an example protein sequence file from http://cys.bios.niu.edu/yyin/teach/PBB/csl-pr.fa
This is easy. Now let's try to get a list of differentially expressed genes and then find what these genes have in common in terms of function.
We're going to use the NCBI GEO website to get the gene list and then feed the gene list to GO enrichment analysis tools.
Go to the NCBI homepage, search GEO DataSets with the keyword “GDS4831”, and hit Search.
Choose “Compare 2 sets of samples”
Choose “Value means difference”
Choose “8+ fold”
Choose “higher”
Then go to Step 2.
For group A, select the three COP1-depletion samples from the Huh7 cell line.
For group B, select the three negative-control samples from the Huh7 cell line.
Hit OK, and go to Step 3.
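The "8+ fold higher" filter keeps profiles whose expression in group A is at least eight times that in group B. Below is a simplified Python sketch. It assumes fold change is the ratio of the two group means, which GEO's own calculation may not match. The profile names and values are hypothetical.

```python
from statistics import mean


def fold_higher(
    profiles: dict[str, tuple[list[float], list[float]]], min_fold: float = 8.0
) -> list[str]:
    """Return profile IDs whose group-A mean is at least min_fold times the group-B mean."""
    selected = []
    for profile_id, (group_a, group_b) in profiles.items():
        mean_a, mean_b = mean(group_a), mean(group_b)
        if mean_b > 0 and mean_a / mean_b >= min_fold:  # skip zero controls
            selected.append(profile_id)
    return selected


# Hypothetical values: (three COP1-depletion samples, three control samples).
TOY_PROFILES = {
    "probe_1": ([800, 900, 1000], [100, 110, 90]),  # 900 / 100 = 9-fold
    "probe_2": ([200, 210, 190], [100, 100, 100]),  # 2-fold
    "probe_3": ([50, 60, 70], [0, 0, 0]),           # control mean is zero
}
print(fold_higher(TOY_PROFILES))  # → ['probe_1']
```

Profiles with a zero control mean are skipped here to avoid dividing by zero. A real analysis would need an explicit rule for them.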
In total, 256 gene profiles show 8+ fold higher expression under COP1 depletion than in the negative control in the Huh7 cell line.
To get the list of genes, choose the Gene database and hit Find items.
In total, 225 genes correspond to the 256 gene profiles. To download the list of Gene IDs, hit Send to, choose UI list as the format, and hit Create file.
A file named “gene_result.txt” will be automatically downloaded to your local computer. Find out where it was downloaded to, and open it using Notepad++.
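Before uploading the list to DAVID, it can help to check the downloaded file. Below is a small Python sketch. It assumes the UI list file holds one Gene ID per line, and it drops blank lines and duplicates while keeping the original order. The function names are our own.

```python
def parse_gene_ids(text: str) -> list[str]:
    """Return unique, non-empty Gene IDs (one per line) in first-seen order."""
    unique: dict[str, None] = {}  # dict keys keep insertion order
    for line in text.splitlines():
        gene_id = line.strip()
        if gene_id:
            unique.setdefault(gene_id, None)
    return list(unique)


def load_gene_ids(path: str) -> list[str]:
    """Read a downloaded UI list file such as gene_result.txt."""
    with open(path, encoding="utf-8") as handle:
        return parse_gene_ids(handle.read())
```

Usage: ids = load_gene_ids("gene_result.txt"). If the download is complete, len(ids) should match the 225 genes reported by NCBI.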
View the file using Notepad++.
Next we will use DAVID to perform functional enrichment analysis.
The Database for Annotation, Visualization and Integrated Discovery (DAVID)
Hit Start Analysis.
Upload the list of Gene IDs.
Select ENTREZ_GENE_ID.
Click on Gene List.
Check the submitted gene list.
This allows you to view functional annotation from various resources, including GO.
If you have clicked on Functional Annotation Tool, you are on this page.
All of these can be changed by users (whether to show each resource, and what to show).
Uncheck this.
Select just GO.
Clicking here will open a new window showing which GO terms the 225 differentially expressed genes are enriched in.
In which GO categories are the genes enriched (compared with the genome background)?
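What such a tool computes for each GO category can be shown with a toy Python sketch. It counts how many list genes and background genes carry each term, applies the one-sided Fisher (hypergeometric) test per term, and ranks terms by p-value. The background annotation and gene names are hypothetical. DAVID's own scoring and multiple-testing corrections are not reproduced here.

```python
from collections import Counter
from math import comb


def upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


def go_enrichment(
    gene_list: list[str], background: dict[str, set[str]]
) -> list[tuple[str, int, float]]:
    """Rank GO terms seen in gene_list by one-sided Fisher p-value.

    background maps every background gene to its GO terms; list genes
    absent from the background are ignored.
    """
    N = len(background)
    listed = [g for g in dict.fromkeys(gene_list) if g in background]  # dedupe, keep order
    n = len(listed)
    in_background = Counter(t for terms in background.values() for t in terms)
    in_list = Counter(t for g in listed for t in background[g])
    results = [
        (term, count, upper_tail(count, n, in_background[term], N))
        for term, count in in_list.items()
    ]
    return sorted(results, key=lambda r: r[2])  # most significant first


# Hypothetical toy background of 10 genes annotated with two terms.
TOY_BACKGROUND = {
    "g1": {"T1"}, "g2": {"T1"}, "g3": {"T1", "T2"}, "g4": {"T2"}, "g5": set(),
    "g6": set(), "g7": set(), "g8": set(), "g9": {"T2"}, "g10": set(),
}
for term, count, p in go_enrichment(["g1", "g2", "g3", "g4"], TOY_BACKGROUND):
    print(term, count, round(p, 4))
```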
Next lecture: EBI web resources II (ENSEMBL and InterPro)