Bioinformatics tools for biologists @ the EBI An overview.
Bioinformatics tools for biologists @ the EBI
An overview
2 EBI Overview
Bioinformatics
• The science of storing, retrieving and analyzing large amounts of biological information
• An interdisciplinary science, involving biologists, computer scientists and mathematicians
• At the heart of modern biology
3 EBI Overview
“Large-scale” focus
• Data explosion and new types of data
• High-throughput biology
• Emphasis on systems, not reductionism
• Large community of users with no training in bioinformatics
• Growth of applied biology – molecular medicine, agriculture, food, environmental sciences…
4 EBI Overview
What is EMBL-EBI?
• Based on the Wellcome Trust Genome Campus near Cambridge, UK
• Part of the European Molecular Biology Laboratory
• Non-profit organization
5 EBI Overview
The EBI’s mission
• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
• To help disseminate cutting-edge technologies to industry
Databases and tools
www.ebi.ac.uk
New types of data
• Genomes
• DNA & RNA sequence
• Gene expression
• Protein sequence
• Protein families, motifs and domains
• Protein structure
• Protein interactions
• Chemical entities
• Pathways
• Systems
• Literature and ontologies
7 EBI Overview
8 EBI Overview
Databases: molecules to systems
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Microarray & gene expression data: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: PDBe
• Protein interactions: IntAct
• Chemical entities: ChEBI
• Pathways: Reactome
• Systems: BioModels
• Literature and ontologies: CiteXplore, GO
Database collaborations
9 EBI Overview
10 EBI Overview
Standards development – international collaborations
• Genome annotation: www.geneontology.org
• Microarray and Gene Expression Data (MGED): www.mged.org
• Protein sequence: www.uniprot.org
• HUPO Proteomics Standards Initiative (PSI): www.psidev.info
• Protein structure: www.wwpdb.org
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.reactome.org, www.biopax.org
• Systems modeling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Genomics Standards Consortium (GSC): http://gensc.org
• Nucleotide sequence: www.insdc.org
EBI website: www.ebi.ac.uk
11 EBI Overview
Databases Tools
12 EBI Overview
Search all main databases in one go
EBI search engine: EB-eye
13 EBI Overview
Nucleotides: European Nucleotide Archive (ENA)
• ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data
• Collaboration with GenBank and DDBJ for data sharing
• It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms)
• Provides access to the whole scale of sequencing information: from raw data, through assembly and mapping information, through to high-level functional annotation (see figure).
Nucleotides: ENA
Download data
Navigate to view related data, e.g. taxon-specific data
Other types of data include SRA experiments
14 EBI Overview
Genomes: Ensembl & Ensembl Genomes
• Genome browser providing free access to the complete genome sequences of higher and model organisms
• With Ensembl you can:
Retrieve all or part of a genome sequence
Perform sequence alignment using BLAST or BLAT
Link to genome annotation from microarray results
View expressed mRNA, protein, etc. in a chromosomal region
View variations such as SNPs across strains or populations
View all alternative splicing for a gene
Explore homologues and phylogenetic trees across more than 30 species
View conserved regions across species
• Ensembl Genomes extends to non-vertebrate genomes
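The slide describes the Ensembl web browser and does not cover programmatic access. As a minimal sketch of the first task above (retrieving information for part of a genome), the example below assumes the public Ensembl REST service at rest.ensembl.org and its lookup/symbol endpoint. The helper names lookup_url and lookup_gene are hypothetical, and the network call is defined but not executed here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the public Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """Build the URL that looks up a gene by species name and gene symbol."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return ENSEMBL_REST + path + "?content-type=application/json"


def lookup_gene(species, symbol):
    """Fetch the gene record as a dict (requires network access)."""
    with urllib.request.urlopen(lookup_url(species, symbol), timeout=30) as resp:
        return json.load(resp)


# Only the URL is built here; call lookup_gene(...) to perform the request.
print(lookup_url("homo_sapiens", "BRCA1"))
```

Calling lookup_gene("homo_sapiens", "BRCA1") should return a record with the gene's identifier and chromosomal coordinates, which can then be used to request the corresponding sequence region.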
15 EBI Overview
Genomes: Ensembl
Across species Within species
Synteny
Pick a genome
Orthology
Genomic alignments
Gene families
SNPs
Genes
Chromosomes
16 EBI Overview
Genomes: Ensembl Genomes
17 EBI Overview
Across species View options
Ensembl Metazoa
Ensembl Bacteria
Ensembl-like genome browser for non-vertebrate species
Select Orthologue view to see putative orthologues
Using view options, you can select to view only the current gene or the entire expanded gene tree
Retrieving data with BioMart
• BioMart is a search engine that can be used to download data in table format
• Many EBI databases are powered by BioMart
• For example, you can use Ensembl BioMart to retrieve:
All the genes for one species
Or… only genes in one specific region of a chromosome
Or… genes in one region of a chromosome associated with an InterPro domain
Or… etc.
18 EBI Overview
BioMart – how it works
First step:
Choose a dataset
Second step:
Add filters to define a gene set
Third step:
Add attributes to determine column output
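The three steps above map directly onto BioMart's XML query format. The sketch below builds such a query with the Python standard library. The dataset, filter and attribute names are assumptions chosen for illustration and should be checked against the configuration of the mart being queried; build_biomart_query is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated table output
        "header": "1",               # include column names
        "uniqueRows": "1",           # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    # First step: choose a dataset.
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Second step: filters define the gene set.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Third step: attributes determine the output columns.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names: genes in a region of human chromosome 17, with InterPro IDs.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "17", "start": 43000000, "end": 43200000},
    attributes=["ensembl_gene_id", "external_gene_name", "interpro"],
)
print(xml_query)
```

The resulting string would typically be sent as the query parameter to a mart's "martservice" endpoint, which returns the table shown on the results slide.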
19 EBI Overview
BioMart results
20 EBI Overview
www.biomart.org
21 EBI Overview
ArrayExpress & Atlas of Gene Expression
• ArrayExpress Archive is a public repository of functional genomics experiments, including gene expression, supporting scientific publications
• You can query it to retrieve experimental information and download functional genomics data
• Atlas of Gene Expression contains a subset of curated and re-annotated Archive data
• Can be queried for individual gene expression under different biological conditions across experiments
22 EBI Overview
Transcriptomes: ArrayExpress
Expand results
Spreadsheets describing the experiment, sample properties or array design
Search by keyword
ArrayExpress Archive: browse experiments
23 EBI Overview
Transcriptomes: Atlas of Gene Expression
Search by gene name or biological condition
Gene summary page
Atlas interface
Experiment page
24 EBI Overview
Protein sequence: UniProt
• Provides the scientific community with a comprehensive, richly curated, high-quality and freely accessible resource of protein sequence and functional information
• Users can perform simple and complex text-based queries, run sequence-based searches, perform multiple sequence alignments, etc.
• Consists of:
UniProtKB/Swiss-Prot, manually annotated
UniProtKB/TrEMBL, computationally analyzed records
UniRef, clustered by sequence identity
UniParc, the most comprehensive publicly available non-redundant protein sequence database, un-annotated
UniMES, protein sequences from metagenomic and environmental data
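The split between Swiss-Prot and TrEMBL is visible in the FASTA headers of UniProtKB downloads, where the prefix "sp" marks a Swiss-Prot entry and "tr" a TrEMBL entry. The following sketch parses such a header. parse_header is a hypothetical helper, the header lines are illustrative, and the "tr" accession in the usage example is made up.

```python
import re

# UniProtKB FASTA header layout: >db|accession|entry_name description
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")

SECTION = {
    "sp": "UniProtKB/Swiss-Prot (manually annotated)",
    "tr": "UniProtKB/TrEMBL (computationally analyzed)",
}


def parse_header(line):
    """Split a UniProtKB FASTA header into its section, accession and name."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB FASTA header: " + line)
    db, accession, entry_name, description = match.groups()
    return {
        "section": SECTION[db],
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Illustrative header for a Swiss-Prot entry.
record = parse_header(
    ">sp|P38398|BRCA1_HUMAN Breast cancer type 1 susceptibility protein "
    "OS=Homo sapiens OX=9606 GN=BRCA1 PE=1 SV=2"
)
print(record["accession"], record["section"])
```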
25 EBI Overview
UniProt text search for Brca1
26 EBI Overview
• InterPro is an integrated documentation resource for protein families, domains and functional sites
• Protein signatures from different member databases describing the same biological protein family or domain are united into a single InterPro entry containing information about the signature(s) and links to the protein in UniProt
• Links to Gene Ontology indicate the biological function and process that the proteins are involved in
27 EBI Overview
Protein families, motifs and domains: InterPro
View architectures of proteins containing a signature
Compare methods of protein signature prediction
Visualize the taxonomic range for a protein signature
28 EBI Overview
Molecular interaction database: IntAct
• IntAct provides a freely available, open source database system and analysis tools for protein interaction data.
• All interactions are derived from literature curation or direct user submissions
• With IntAct you can:
Find molecules that interact with your protein of interest
Display interaction networks
Analyze interaction networks using GO terms, molecule type, role, etc.
Download data
Install IntAct system locally
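Finding the interactors of a protein amounts to a neighbour lookup in an undirected graph built from binary interaction pairs. The sketch below shows the idea on a small hand-written list of pairs. The pairs are illustrative and were not retrieved from IntAct; build_network and interactors are hypothetical helpers.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from (A, B) pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def interactors(network, protein):
    """Return the direct interaction partners of a protein, sorted by name."""
    return sorted(network.get(protein, set()))


# Illustrative binary interactions (not downloaded from IntAct).
pairs = [("BRCA1", "BARD1"), ("BRCA1", "PALB2"), ("PALB2", "BRCA2")]
network = build_network(pairs)
print(interactors(network, "BRCA1"))
```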
29 EBI Overview
The Protein Data Bank in Europe (PDBe)
• PDBe is a resource for the collection, organization and dissemination of data about biological macromolecular structures
• A suite of web-based services is available:
PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe database
PDBeAnalysis provides searches and statistical analyses of macromolecular structure and residue information
PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of structures
PDBeChem lets you search for and visualize any molecule in the PDB’s ligand dictionary
PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces, predicting probable quaternary structures (assemblies) and searching the PDB for structurally similar interfaces and assemblies
PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs in conjunction with ligand environment, and secondary structure patterns
Many more tools are available
30 EBI Overview
Structures: PDBe
Ligands
Sequence mapping
Linking to domain data
Assemblies
Surface matching
Fold matching
Active sites
Electron density visualization
31 EBI Overview
PRoteomics IDEntifications database (PRIDE)
• PRIDE is a centralized, standards compliant, public data repository for proteomics data
• Provides the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications.
• PRIDE is also able to capture details of post-translational modifications coordinated relative to the peptides in which they have been found.
32 EBI Overview
Enzymes: IntEnz
• IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature.
• IntEnz contains the recommendations of the Nomenclature Committee of the IUBMB on the nomenclature and classification of enzyme-catalysed reactions.
33 EBI Overview
Chemical entities: ChEBI
• ChEBI is a freely available, manually annotated database of small molecular entities
• A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity, not directly encoded by the genome
• With ChEBI you can:
Find the correct chemical terminology using name, formula or registry number
Visualize chemical structures
Perform similarity searches
View the relationship between molecules using the ChEBI ontology
Bridge the gap between small molecules and the macromolecules they interact with (cross-links to UniProt and Reactome)
Download chemical structures
Submit new structures
34 EBI Overview
Chemical entities: ChEBI
Link to other databases
View mappings to other databases such as Reactome and UniProt
View structure, nomenclature, formula and more
View relationships in the ChEBI Ontology
Download flat files, database dumps and the ChEBI Ontology for local installation
35 EBI Overview
• ChEMBL is a publicly available database of drugs, drug-like small molecules and their targets
• The data includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and information on the molecules’ absorption, distribution, metabolism, excretion and toxicity.
• ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and bioactivity data (such as binding constants and pharmacology).
• The bioactivity data is tagged to show links between molecular targets and published assays, with a set of varying confidence levels.
• Additional data on the clinical progress of compounds is being integrated into ChEMBL.
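The Lipinski "Rule of Five" parameters mentioned above can be checked with a few comparisons. The sketch below counts violations using the commonly quoted thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). It is an illustration only: how ChEMBL itself calculates and stores these properties is not described in the slides, and the two compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # molecular weight in daltons
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c):
    """Count how many of the four Lipinski criteria the compound violates."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical compounds with made-up property values.
small = Compound("example_small", 320.4, 2.1, 2, 5)
large = Compound("example_large", 720.9, 6.3, 6, 12)
for compound in (small, large):
    print(compound.name, rule_of_five_violations(compound))
```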
36 EBI Overview
Chemogenomics: ChEMBL
37 EBI Overview
Pathways: Reactome
• A free, online, open-source curated database of pathways and reactions in human biology
• Information in the database is authored by expert biologists and maintained by the Reactome editorial staff
• Used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast
• Extensively cross-referenced to other resources, e.g. NCBI, Ensembl, UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO
38 EBI Overview
Pathways: Reactome
View reactions and events in detail
Select a pathway
Compare events in different species
Export pathway
Pathways: Reactome
Display expression data
Link to source databases
40 EBI Overview
Biological ontologies: Gene Ontology (GO)
• The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases
• GO develops ontologies that describe biological processes, cellular components and molecular functions in a species-independent manner
• Entries in several of the EBI’s databases are annotated with GO terms
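The three ontologies correspond to the single-letter aspect codes used in GO annotation files: P (biological process), F (molecular function) and C (cellular component). The sketch below groups a few annotations by aspect. The rows are illustrative and were not taken from a GO release; group_by_aspect is a hypothetical helper.

```python
from collections import defaultdict

# Aspect codes as used in GO annotation files.
ASPECTS = {
    "P": "biological process",
    "F": "molecular function",
    "C": "cellular component",
}

# Illustrative rows: (gene product, GO ID, term name, aspect code).
annotations = [
    ("BRCA1", "GO:0006281", "DNA repair", "P"),
    ("BRCA1", "GO:0003677", "DNA binding", "F"),
    ("BRCA1", "GO:0005634", "nucleus", "C"),
]


def group_by_aspect(rows):
    """Group (GO ID, term) pairs under the full name of their ontology."""
    grouped = defaultdict(list)
    for _gene, go_id, term, aspect in rows:
        grouped[ASPECTS[aspect]].append((go_id, term))
    return dict(grouped)


for ontology, terms in group_by_aspect(annotations).items():
    print(ontology, terms)
```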
41 EBI Overview
User support
• 2Can bioinformatics user support – www.ebi.ac.uk/2Can
• Online help pages – www.ebi.ac.uk/help
• E-mail support – www.ebi.ac.uk/support
42 EBI Overview
http://www.ebi.ac.uk/Information/Brochures/
43 EBI Overview
Research
www.ebi.ac.uk/groups
45 EBI Overview
Key facts about research
• The EBI provides a unique environment for bioinformatics research
• Seven dedicated research groups aim to understand biology through new approaches to interpreting biological data
• Services teams also carry out R&D to enhance existing services and develop new ones
• Research program complements services and the two are mutually supportive
Mammalian stem cell differentiation and development: Bertone
Vertebrate genome annotation: Flicek
Genome analysis using evolutionary tools: Goldman
Transcriptome analysis on a genomic scale: Brazma
Functional genomics and small RNA analysis: Enright
Literature analysis and semantic data integration in life science research: Rebholz-Schuhmann
Protein sequence analysis and functional annotation: Apweiler
Cheminformatics and metabolism: Steinbeck
Chemogenomics and drug discovery: Overington
Neurobiology networks and systems: Le Novère
Genome-scale analysis of regulatory systems: Luscombe
Analysis of protein structure, function and evolution: Thornton
Algorithmic methods for genome analysis: Birney
Analysis and validation of protein structures; protein–ligand interactions: Kleywegt
Research
Systems Biomedicine: Saez-Rodriguez
Evolutionary biology: Marioni
Training
www.ebi.ac.uk/training
48 EBI Overview
A tripartite user-training programme
• Bioinformatics Roadshow: training comes to you (www.ebi.ac.uk/training/roadshow)
• eLearning programme: training any time, anywhere, at any pace (www.ebi.ac.uk/training/elearning)
• Hands-on training at EMBL-EBI: hands-on user training on all our core data resources for researchers (www.ebi.ac.uk/training/handson)
49 EBI Overview
Hands-on training for all levels of experience
• Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of talks and practical exercises
• Take a tour of all our core data resources, or focus in on specific data types
• Full programme at www.ebi.ac.uk/training/handson
50 EBI Overview
eLearning project – pilot phase
• Do you want to learn at your own pace at a time that suits you?
• We are developing a new eLearning platform and need our users to help us test it
• If you would like to get involved, contact: [email protected]