Dr. Susan Steinbusch-Coort susan.coort@maastrichtuniversity… · Biological sequence databases...
What happens to the human body when you are running?
Organ systems work together
Skeletal system - supports the body
Muscular system - pulls on the bones to enable you to move
Respiratory system - makes sure your muscles have enough oxygen for respiration
Circulatory system - provides oxygen and glucose to the skeletal muscle cells
Human body structure
(Bio)Molecules: individual players are important
Heaps of knowledge on biomolecules is available online.
BWE2007 - January 13th, 2014
Protein synthesis
Gene structure
www.carolguze.com
CDS = Coding DNA Sequence
UTR = UnTranslated Region
GOAL
To understand biological sequence databases
• Which biological sequence databases are available?
• How can you find information in these databases?
• What is the content of the databases?
• What is Gene Ontology?
• Two projects aimed at deciphering the content of the human genome: the Human Genome Project and ENCODE.
What is a database?
https://www.youtube.com/watch?v=gfT7EGibry0
Genes instead of persons
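| Name | Identifier | Sequence | Synonyms | Chromosomal location | Disease | Many more |
|---|---|---|---|---|---|---|
| Gene 1 | 2456 | AGTCCCGT | DAH, HSD | 4q12 | Cancer | ... |
| Gene 2 | 4333 | CGGTAACT | HGR | 7p10 | Diabetes | ... |
| Gene 3 | 6799 | AGTCGGCGGG | | | | |
| etc. | | | | | | |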
All the available information is stored in databases!
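To make the idea of a gene table concrete, here is a minimal sketch that loads the three toy records from the slide into an in-memory SQLite database and looks a gene up by one of its synonyms. The table layout and the helper names (`build_gene_db`, `find_by_synonym`) are assumptions made for illustration; real sequence databases use far richer schemas.

```python
import sqlite3

# Toy records copied from the slide; missing fields are stored as NULL.
GENES = [
    ("Gene 1", 2456, "AGTCCCGT", "DAH, HSD", "4q12", "Cancer"),
    ("Gene 2", 4333, "CGGTAACT", "HGR", "7p10", "Diabetes"),
    ("Gene 3", 6799, "AGTCGGCGGG", None, None, None),
]


def build_gene_db():
    """Create an in-memory database with one row per gene (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (name TEXT, identifier INTEGER PRIMARY KEY, "
        "sequence TEXT, synonyms TEXT, location TEXT, disease TEXT)"
    )
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?)", GENES)
    return conn


def find_by_synonym(conn, synonym):
    """Return (name, identifier) for every gene that lists the given synonym."""
    rows = conn.execute(
        "SELECT name, identifier, synonyms FROM gene WHERE synonyms IS NOT NULL"
    ).fetchall()
    return [
        (name, identifier)
        for name, identifier, synonyms in rows
        if synonym in [s.strip() for s in synonyms.split(",")]
    ]


conn = build_gene_db()
print(find_by_synonym(conn, "HSD"))
print(conn.execute("SELECT name FROM gene WHERE disease = ?", ("Diabetes",)).fetchall())
```

The numeric identifier is used as the primary key because, unlike a name or synonym, it is unique per record.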
Biological sequence databases
Originally – just a storage place for sequences.
Currently – the databases are a bioinformatics workbench that provides many tools for retrieving, comparing and analyzing sequences.
1. Global nucleotide/protein sequence storage databases:
– GenBank of NCBI (National Center for Biotechnology Information)
– The European Molecular Biology Laboratory (EMBL) database
– The DNA Data Bank of Japan (DDBJ)
2. Genome-centered databases
– NCBI genomes
– Ensembl Genome Browser
– UCSC Genome Bioinformatics Site
3. Protein Databases
– UniProt (see the lecture on protein structures)
NCBI nucleotide databases
• GenBank
– Individual submissions (DNA, mRNA, protein)
– Bulk submissions (Genome centers)
• High throughput sequencing (DNA)
• Expressed Sequence Tags (mRNA)
• RefSeq
– Curated subset of GenBank
– “Reference” sequence
– Single sequence per locus / molecule
Growth of GenBank
Genome-centered databases
• Ensembl: http://www.ensembl.org/
• UCSC: http://genome.ucsc.edu/
• NCBI: http://www.ncbi.nlm.nih.gov
NCBI homepage
NCBI Global Cross-database search http://www.ncbi.nlm.nih.gov/gquery/
UniGene
• EST:
– DNA sequence corresponding to mRNA from expressed gene
– ~500 base pairs long
– Sequenced from a cDNA library
• Predict genes based on ESTs (expressed sequence tags)
• Cluster ESTs from many cDNA libraries to predict distinct genes
Map mRNA (EST) back to DNA
an EST
EST clusters
This is a gene with 1 EST associated; the cluster size is 1
This is a gene with 10 ESTs associated; the cluster size is 10
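The notion of cluster size can be sketched in a few lines: count how many ESTs are assigned to each gene, then count how many clusters exist of each size. The EST and gene names below are hypothetical and simply mirror the two cases on the slide (one gene with 1 EST, one gene with 10 ESTs); this is an illustration, not how UniGene itself builds its clusters.

```python
from collections import Counter

# Hypothetical EST-to-gene assignments mirroring the slide:
# gene_A has 1 associated EST, gene_B has 10.
est_to_gene = {"EST_A1": "gene_A"}
est_to_gene.update({f"EST_B{i}": "gene_B" for i in range(1, 11)})


def cluster_sizes(assignments):
    """Map each gene to the number of ESTs assigned to it (its cluster size)."""
    return Counter(assignments.values())


def size_distribution(sizes):
    """Map each cluster size to the number of clusters that have that size."""
    return Counter(sizes.values())


sizes = cluster_sizes(est_to_gene)
print(dict(sizes))
print(dict(size_distribution(sizes)))
```

The second function produces the kind of data one would plot as number of clusters against cluster size.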
UniGene clusters
[Figure: number of clusters plotted against cluster size. Labels in the figure: "Likely to be a real gene"; 40986]
Gene (NCBI)
DHH as example
OMIM (NCBI)
HomoloGene
Homologue = One of a group of similar DNA sequences that share a common ancestry.
PubMed (NCBI)
Ensembl homepage
Ensembl
example DHH (human)
UCSC homepage
UCSC: Search Gene (DHH)
UCSC: Entry page (DHH)
Search for genomic information using identifiers
How can you store genes with a unique name?
Regular gene names are not suitable
• Structured identifiers
• These are different for different databases
NCBI identifiers
• RefSeq:
– Chromosome: NC_
– mRNA: NM_
– Protein: NP_
• GenBank:
– Many types of IDs
• NCBI gene ID:
– Number
• OMIM ID:
– Number
• PubMed ID:
– Number
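As a rough illustration of how structured identifiers can be recognized, the sketch below classifies an identifier using only the three RefSeq prefixes listed above. The function name and the example accessions are hypothetical, and RefSeq uses more prefixes than the three shown on the slide. Note that a bare number cannot be attributed to a single database, since NCBI Gene, OMIM and PubMed all use numeric IDs.

```python
# Only the three prefixes named on the slide; RefSeq defines additional ones.
REFSEQ_PREFIXES = {"NC_": "chromosome", "NM_": "mRNA", "NP_": "protein"}


def classify_identifier(identifier):
    """Return a coarse description of an NCBI-style identifier."""
    for prefix, molecule in REFSEQ_PREFIXES.items():
        if identifier.startswith(prefix):
            return "RefSeq " + molecule
    if identifier.isdigit():
        # Gene, OMIM and PubMed IDs are all plain numbers, so the
        # database has to be known from context.
        return "numeric ID (Gene, OMIM or PubMed)"
    return "unknown"


# Hypothetical accessions, used only to exercise the prefixes.
for example in ["NC_000000", "NM_000000", "NP_000000", "12345", "XYZ"]:
    print(example, "->", classify_identifier(example))
```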
Ensembl identifiers
• ENSG### Ensembl Gene ID
• ENST### Ensembl Transcript ID
• ENSP### Ensembl Peptide ID
• ENSE### Ensembl Exon ID
• For species other than human, a species code is inserted after "ENS":
MUS (Mus musculus) for mouse: ENSMUSG###
DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
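The Ensembl naming scheme is regular enough to parse with a small pattern. The following is a minimal sketch, assuming the layout "ENS", an optional species code, one feature letter (G, T, P or E) and a run of digits. Only the species codes mentioned on the slide are mapped to names, and the identifiers in the example are made up for illustration.

```python
import re

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
# Only the codes named on the slide; an empty code means human.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# "ENS" + optional species code + feature letter + digits.
ENSEMBL_ID = re.compile(r"^ENS([A-Z]*)([GTPE])(\d+)$")


def parse_ensembl_id(identifier):
    """Split an Ensembl-style ID into (species, feature type, number), or None."""
    match = ENSEMBL_ID.match(identifier)
    if match is None:
        return None
    code, feature, number = match.groups()
    # Unknown species codes are returned unchanged rather than guessed.
    return SPECIES_CODES.get(code, code), FEATURE_TYPES[feature], number


# Hypothetical identifiers, not real database entries.
for example in ["ENSG00000000001", "ENSMUSG00000000001", "ENSDART00000000001"]:
    print(example, "->", parse_ensembl_id(example))
```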
Where does all this information come from?
• Submissions (e.g. Sequences)
• Literature
• Curators and contributors
• Automated generation by computer tools
• High-throughput lab screenings
• Individual contributions and large scale contributions
Functional genomics

| Single biomolecules | High throughput | Approach |
|---|---|---|
| DNA | Genome | Sequencing and gene identification |
| RNA | Transcriptome | Sequencing and gene expression |
| Protein | Proteome | Identification and structure determination |
nu.nl – Sept. 6th 2012
HGP and ENCODE
• We will now discuss these two major projects, which contributed a lot of data
• The Human Genome Project (1990-2003)
– Sequencing of the human genome
– Characterizing the genes on the DNA sequence
• The ENCODE project (2003-2012)
– Focuses on regulatory elements on the DNA
The Human Genome Project
International Human Genome Sequencing Consortium, Finishing the euchromatic sequence of the human genome. Nature 431, 931-945 (21 October 2004).
movie
Genome sequencing: general principle
[Figure: Genome → Fragments of DNA → Short DNA sequences (AC..GC, TT..TC, CG..CA, ...) → Sequenced genome (ACGTGACCGGTACTGGTAACGTACA...)]
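The general principle can be illustrated with a toy assembler, assuming that short sequences are joined wherever the end of one matches the start of another. The greedy strategy below is a deliberately simple stand-in chosen for illustration, not the method used by any real sequencing project. The three reads are hypothetical fragments cut from the first 25 bases shown on the slide.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the reads as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical overlapping reads taken from ACGTGACCGGTACTGGTAACGTACA.
reads = ["ACGTGACCGG", "ACCGGTACTGG", "ACTGGTAACGTACA"]
print(greedy_assemble(reads))
```

Greedy merging can go wrong on repetitive sequences, which is one reason real assemblers rely on high coverage and more careful algorithms.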
Sequencing the Human Genome
[Figure: log10(price) of sequencing a human genome plotted against year, 2000-2015]
• 2001: Human Genome Project, 2.7G$, 11 years
• 2001: Celera, 100M$, 3 years
• 2007: 454, 1M$, 3 months
• 2008: ABI SOLiD, 60K$, 2 weeks
• 2009: Illumina, Helicos, 40-50K$
• 2010: 5K$, a few days
• 2015: 100$, <24 hrs?
When has a genome been fully sequenced?
• N-fold coverage
– A typical goal is to obtain five to ten-fold coverage.
– With next-generation sequencing, coverage is typically even higher, e.g. 30-fold
– Mostly both strands are sequenced
• Finished sequence
– Usually no gaps in the sequence
– High quality standard; error rate <0.01%.
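N-fold coverage can be made concrete with a small calculation, assuming the usual definition of average coverage as total sequenced bases divided by genome size. The 100 bp read length below is an assumption chosen for illustration; the genome size is the approximate human value of 3 × 10^9 bp.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # approximate size of the human genome
READ_LENGTH_BP = 100             # assumed read length, for illustration only


def fold_coverage(n_reads, read_length, genome_size):
    """Average coverage = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


for target in (5, 10, 30):
    n = reads_needed(target, READ_LENGTH_BP, HUMAN_GENOME_BP)
    print(f"{target}-fold coverage needs about {n:,} reads of {READ_LENGTH_BP} bp")
```

This is an average: individual positions may be covered more or less often, which is why targets well above 1-fold are used.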
Genome sizes in nucleotide base pairs (log scale)
[Figure: ranges of genome size on an axis from 10^4 to 10^11 bp for viruses, plasmids, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds and mammals]
The size of the human genome is ~3 × 10^9 bp; almost all of its complexity is in single-copy DNA. The human genome is thought to contain ~20,000-30,000 genes.
http://www3.kumc.edu/jcalvet/PowerPoint/bioc801b.ppt
Which genomes are sequenced?
Selection of genomes for sequencing is based on criteria such as:
• genome size (some plant genomes are far larger than the human genome)
• cost
• relevance to human disease (or other disease)
• relevance to basic biological questions
• relevance to agriculture or other food production
Number of genes

| Species and Common Name | Estimated Total Size of Genome (bp)* | Estimated Number of Protein-Encoding Genes* |
|---|---|---|
| Saccharomyces cerevisiae (unicellular budding yeast) | 12 million | 6,000 |
| Trichomonas vaginalis | 160 million | 60,000 |
| Plasmodium falciparum (unicellular malaria parasite) | 23 million | 5,000 |
| Caenorhabditis elegans (nematode) | 95.5 million | 18,000 |
| Drosophila melanogaster (fruit fly) | 170 million | 14,000 |
| Arabidopsis thaliana (mustard; thale cress) | 125 million | 25,000 |
| Oryza sativa (rice) | 470 million | 51,000 |
| Gallus gallus (chicken) | 1 billion | 20,000-23,000 |
| Canis familiaris (domestic dog) | 2.4 billion | 19,000 |
| Mus musculus (laboratory mouse) | 2.5 billion | 30,000 |
| Homo sapiens (human) | 2.9 billion | 20,000-25,000 |
Plants and amphibians with huge genomes (not in table) do not have huge numbers of genes
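That genome size and gene number do not scale together can be checked directly from the table by computing gene density. The sketch below uses four rows of the table; where the table gives a range, the lower bound is used, which is a simplification made here for illustration.

```python
# (genome size in bp, estimated protein-encoding genes), taken from the table.
# For Homo sapiens the lower bound of the 20,000-25,000 range is used.
GENOMES = {
    "Saccharomyces cerevisiae": (12_000_000, 6_000),
    "Trichomonas vaginalis": (160_000_000, 60_000),
    "Mus musculus": (2_500_000_000, 30_000),
    "Homo sapiens": (2_900_000_000, 20_000),
}


def genes_per_mb(genome_bp, n_genes):
    """Gene density: protein-encoding genes per million base pairs."""
    return n_genes / (genome_bp / 1_000_000)


for species, (size_bp, n_genes) in GENOMES.items():
    print(f"{species}: {genes_per_mb(size_bp, n_genes):.1f} genes per Mb")
```

On these figures yeast packs roughly 500 genes into each million base pairs, while the human genome has fewer than 10.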
Pray, L. (2008) Eukaryotic genome complexity. Nature Education 1(1)
Organization of the human genome
www.carolguze.com
Non-Protein coding DNA
www.carolguze.com
The ENCODE Project: ENCyclopedia Of DNA Elements
A public research consortium
Launched: September 2003, upgraded to the entire genome in September 2007.
Goal: to identify all the functional elements in the human genome sequence.
The making of ENCODE: Lessons for big-data projects. Birney E. Nature. 2012 Sep 6;489(7414):49-51
Understanding of the human genome is far from complete. We are missing knowledge on:
1. Non-coding RNA
2. Alternatively spliced transcripts
3. Regulatory sequences
Data retrieved from ENCODE project
Genomics: ENCODE explained. Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E Nature. 2012 Sep 6;489(7414):52-5.
ENCODE data in Ensembl
Gene Ontology
• Built for a very specific purpose:
“annotation of genes and proteins in genomic and protein databases”
• Applicable to all species
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
GO muscle contraction – tree view
Gene products - Striated muscle contraction (GO:0006941)
Anatomy of a GO term
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
Legend: id = unique GO ID; name = term name; namespace = ontology; def = definition; exact_synonym = synonym; xref_analog = database ref; is_a = parentage
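Because a GO term is a list of "tag: value" lines, it is easy to read into a dictionary. The sketch below is a simplified reader for the gluconeogenesis stanza shown on the slide, with the definition placed on one line and its source reference left out; it is not a full parser for the Gene Ontology file format. Every tag maps to a list because some tags, such as is_a, can occur more than once.

```python
# The stanza from the slide, with the definition on a single line.
GO_TERM = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_term(text):
    """Turn 'tag: value' lines into a dict mapping each tag to a list of values."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ": " only, so values such as "GO:0006094" stay intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip())
    return term


term = parse_term(GO_TERM)
print(term["id"], term["name"])
print("parents:", term["is_a"])
```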
No GO Areas
• GO covers ‘normal’ functions and processes
– No pathological processes
– No experimental conditions
• NO evolutionary relationships
• NOT a system of nomenclature
Searching and Browsing GO
• AmiGO
– http://www.godatabase.org
• Downloads
– http://www.godatabase.org/dev/database/
– XML or as a MySQL database dump
• http://www.geneontology.org/GO.tools.annotation.shtml
– Annotate gene by sequence similarity.
Practical session
– Ensembl tutorials
– Ensembl genome browser
– Several NCBI databases
- Gene
- OMIM
– Gene Ontology