The Ensembl Gene set: The “Genebuild”, 21 April 2008
Outline
The GeneBuild (determining the Ensembl gene set)
What does it mean for the scientist? ‘Annotation pipeline’ vs ‘manual curation’
Pseudogenes
ncRNAs
The CCDS project
Introduction
What is available?
I) Sequence assemblies from genome sequencing efforts
Gene Sequencing: the Assembly
http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html
This approach generates clones, unlike newer sequencing methods.
Clones Available
Human: tilepath clones (used in the assembly)
Ciona intestinalis: shotgun assembly
ContigView: Clones and Contigs
[Screenshot labels: contigs; clones (plate/well numbers); Ensembl transcripts]
Task:
View the tilepath clone in ContigView for the region containing the human
BRCA2 gene.
Hint: Start with a search for the BRCA2 gene.
The Ensembl Geneset
How does Ensembl use mRNA and protein information along with the sequence assembly to define distinct genes on the genome?
[Diagram labels: protein; sequence assembly; Ensembl geneset]
Once the Assembly is Imported…
Proteins/mRNAs are aligned to it.
These sequences have been submitted to databases such as:
UniProt (whose Swiss-Prot section is manually curated) and
RefSeq (partially manually curated).
The Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
UniProt/Swiss-Prot: a manually curated database, and therefore of the highest accuracy
NCBI RefSeq: a partially manually curated database
UniProt/TrEMBL: automatically annotated translations of EMBL coding sequence (CDS) features
EMBL / GenBank / DDBJ: the primary nucleotide sequence repositories
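To make the curation levels in this list concrete, here is a minimal sketch that ranks evidence sources from most to least curated and picks the best one supporting a candidate gene. The numeric ranking, the `EVIDENCE_RANK` table and the `best_evidence` function are hypothetical illustrations of the ordering described above; they are not Ensembl's pipeline code.

```python
# Hypothetical ranking: a lower number means a more heavily curated source.
EVIDENCE_RANK = {
    "UniProt/Swiss-Prot": 0,   # manually curated
    "NCBI RefSeq": 1,          # partially manually curated
    "UniProt/TrEMBL": 2,       # automatic translations of EMBL CDS features
    "EMBL/GenBank/DDBJ": 3,    # primary nucleotide repositories
}


def best_evidence(sources):
    """Return the most heavily curated source among `sources`.

    Raises ValueError when no recognised evidence is present, reflecting
    the rule that every Ensembl gene must rest on experimental evidence.
    """
    known = [s for s in sources if s in EVIDENCE_RANK]
    if not known:
        raise ValueError("no supporting evidence; candidate gene rejected")
    return min(known, key=EVIDENCE_RANK.__getitem__)


print(best_evidence(["UniProt/TrEMBL", "NCBI RefSeq"]))  # prints NCBI RefSeq
```

Raising an error rather than returning a default keeps the "no evidence, no gene" rule explicit.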
Database Relationship
[Diagram labels: individual lab's submission; EMBL-Bank, GenBank, DDBJ; NCBI RefSeq; UniProt (Swiss-Prot, TrEMBL)]
Genebuild
[Diagram labels: sequence (assembly); proteins (e.g. Swiss-Prot); mRNA; EST; manual annotation (HAVANA); EMBL-Bank, GenBank, DDBJ; Ensembl genes; EST genes]
Why Do I Want to Know?…
What is an Ensembl gene based on?
Ensembl genes may be based on multiple proteins/mRNAs.
Task
Look at the evidence for the human EPO gene.
What was this gene based on?
Hint: Go to Exon Information from the GeneView page
EPO gene supporting evidence
Species-Specific GeneBuilds
Pan troglodytes genes are built by projection from human genes.
Zebrafish has many gene duplications.
Homo sapiens genes must have protein evidence, not just mRNA.
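The human rule above can be expressed as a per-species filter on candidate gene models. This is a minimal sketch: `CandidateModel`, `REQUIRES_PROTEIN` and `passes_evidence_rule` are hypothetical names, and the fallback rule for non-human species (protein or mRNA evidence suffices) is an assumption made only for illustration, not a documented Ensembl policy.

```python
from dataclasses import dataclass


@dataclass
class CandidateModel:
    species: str
    has_protein_evidence: bool
    has_mrna_evidence: bool


# From the slide: human genes must have protein evidence, not just mRNA.
REQUIRES_PROTEIN = {"Homo sapiens"}


def passes_evidence_rule(model: CandidateModel) -> bool:
    """Apply a species-specific evidence rule to a candidate gene model."""
    if model.species in REQUIRES_PROTEIN:
        return model.has_protein_evidence
    # Assumed default, for illustration only: protein or mRNA evidence suffices.
    return model.has_protein_evidence or model.has_mrna_evidence


print(passes_evidence_rule(CandidateModel("Homo sapiens", False, True)))  # False
```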
Task
When was the chimpanzee (Pan troglodytes) Genebuild performed?
Can you find information as to how genes were annotated?
Hint: Look on the chimpanzee index page
External Gene Set: VEGA/Havana
Human, zebrafish, mouse and dog
Havana transcripts in blue or gold…
What are Havana transcripts?
Havana and Ensembl match
When Havana (manual curation) and Ensembl (automatic methods) predict the same transcript, base pair for base pair, the transcripts are merged and coloured gold.
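The merge rule can be sketched as a comparison of exon coordinates. This assumes each transcript is represented as a list of `(start, end)` exon coordinates on the same chromosome and strand; the `classify_pair` function and its return labels are hypothetical and stand in for whatever comparison the real pipeline performs.

```python
from typing import List, Tuple

# (start, end) genomic coordinates of one exon; both transcripts are assumed
# to lie on the same chromosome and strand.
Exon = Tuple[int, int]


def classify_pair(havana: List[Exon], ensembl: List[Exon]) -> str:
    """Identical exon structures (base pair for base pair) become one merged,
    gold transcript; any difference keeps the two transcripts separate."""
    if sorted(havana) == sorted(ensembl):
        return "merged (gold)"
    return "kept separate"


print(classify_pair([(100, 200), (300, 450)], [(300, 450), (100, 200)]))  # merged (gold)
print(classify_pair([(100, 200), (300, 450)], [(100, 200), (301, 450)]))  # kept separate
```

A single base-pair difference at one exon boundary is enough to prevent the merge.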
Manually-curated gene sets in Ensembl
Vega (Havana): Homo sapiens, Danio rerio, Mus musculus and Canis familiaris
WormBase: Caenorhabditis elegans
FlyBase: Drosophila melanogaster
SGD: Saccharomyces cerevisiae
What Can Go Wrong?
I) A gap in the assembly: the gene might not be found in Ensembl.
II) Fused genes: the gene might be associated with two names. [Diagram label: BLAST hit (Swiss-Prot entry)]
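One way to spot a possible fused gene is to check whether a single gene model overlaps more than one separately named locus. The sketch below is a hypothetical heuristic, not Ensembl's quality-control procedure; `overlapping_names` and the locus names `GENE_A` to `GENE_C` are made up for illustration.

```python
from typing import Iterable, List, Tuple


def overlapping_names(start: int, end: int,
                      loci: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Names of annotated loci (name, start, end) that overlap [start, end].
    More than one name hints that the model may be a fusion of two genes."""
    return [name for name, s, e in loci if s <= end and start <= e]


loci = [("GENE_A", 100, 250), ("GENE_B", 300, 500), ("GENE_C", 600, 900)]
print(overlapping_names(120, 480, loci))  # ['GENE_A', 'GENE_B']
```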
Outline
The genome sequence
The Genebuild
‘Manual curation’ by Havana
Other: EST gene set
Pseudogenes
ncRNAs
Expressed Sequence Tags vs ‘cDNA’
ESTs are annotated separately. Why?
mRNA and cDNA used in the GeneBuild: sequenced to a high standard, often complete.
EST: lower-quality sequence.
‘One-shot’ sequencing of cDNA from the 5’ and 3’ ends creates the EST sequences. ESTs are only 500-800 nucleotides long: low-quality fragments with a sequence error rate of ~2%.
BUT ESTs provide useful expression information: discovery of new genes, especially in diseased organisms; tissue type; timing/developmental stage; and sampling of more transcripts and variants.
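The ~2% error rate quoted above translates into a concrete number of errors per read. This short calculation assumes errors occur independently at a uniform per-base rate, which is a simplification; the function names are illustrative.

```python
def expected_errors(length_nt: int, error_rate: float = 0.02) -> float:
    """Expected number of miscalled bases in one read."""
    return length_nt * error_rate


def p_error_free(length_nt: int, error_rate: float = 0.02) -> float:
    """Probability that a read contains no errors at all (independent errors)."""
    return (1 - error_rate) ** length_nt


for length in (500, 800):
    print(length, round(expected_errors(length), 1), f"{p_error_free(length):.1e}")
```

For 500 and 800 nt this gives about 10 and 16 expected errors per EST, and a vanishingly small chance that any single EST is error-free. This is one reason ESTs are kept out of the main GeneBuild.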
Where Can I See This EST Geneset?
In ContigView, choose EST genes to display the EST track.
Pseudogenes: ‘False’ Genes
Unprocessed: produced by gene duplication and rearrangement.
Processed: produced by reverse transcription of an mRNA and re-integration into the genome; the pseudogene keeps the poly-A tail (AAAAAA) of the mRNA.
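The two origins above leave different footprints in the genome. A processed pseudogene is copied from mRNA, so it lacks introns and often keeps a poly-A stretch. An unprocessed copy keeps the exon-intron structure of its parent gene. The sketch below turns that into a toy heuristic; `classify_pseudogene` and the `min_polya=6` threshold are assumptions for illustration, and Ensembl's actual pseudogene pipeline uses more evidence than this.

```python
def classify_pseudogene(exon_count: int, downstream_seq: str,
                        min_polya: int = 6) -> str:
    """Toy heuristic: an intronless copy followed by a poly-A stretch looks
    processed; a copy with several exons looks unprocessed."""
    has_polya = downstream_seq.upper().startswith("A" * min_polya)
    if exon_count == 1 and has_polya:
        return "processed"
    if exon_count > 1:
        return "unprocessed"
    return "unclassified"


print(classify_pseudogene(1, "AAAAAAGCT"))  # processed
print(classify_pseudogene(3, "GCTTAG"))     # unprocessed
```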
ncRNAs (non-coding RNAs)
What types are in Ensembl?
tRNA (transfer RNA)
rRNA (ribosomal RNA)
scRNA (small cytoplasmic)
snRNA (small nuclear)
snoRNA (small nucleolar)
miRNA (microRNA)
ncRNAs (2 types)
I) RNAs with low sequence homology can be identified through conserved secondary structure (search the genome using Rfam patterns).
II) RNAs with high sequence conservation (miRNA): found by BLAST alignment; ‘RNAfold’ is then applied to make sure the sequences can fold into a hairpin.
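RNAfold predicts a minimum-free-energy structure, which is far more than fits here. As a much simpler stand-in, the toy check below asks only whether the two ends of a candidate sequence can base-pair into a stem around a loop. It allows Watson-Crick and G-U wobble pairs. `can_form_hairpin`, its parameters and the default minimum loop of 3 bases are illustrative assumptions, not RNAfold's algorithm.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_form_hairpin(seq: str, stem: int, min_loop: int = 3,
                     max_mismatches: int = 0) -> bool:
    """Toy check: do the first `stem` bases pair with the last `stem` bases
    (read backwards), leaving a loop of at least `min_loop` bases?"""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    if len(seq) < 2 * stem + min_loop:
        return False
    five_arm = seq[:stem]
    three_arm = seq[-stem:][::-1]  # position i pairs with position len-1-i
    mismatches = sum((a, b) not in PAIRS for a, b in zip(five_arm, three_arm))
    return mismatches <= max_mismatches


print(can_form_hairpin("GGGAAAUCCC", stem=3))  # True
print(can_form_hairpin("GGGAAAAGGG", stem=3))  # False
```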
ncRNAs… where can I see them?
Find them in ContigView, or use BioMart.
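BioMart can also be queried programmatically by sending an XML query to its martservice endpoint. The sketch below builds such a query for genes of one biotype, using Python's standard library only. The dataset name `hsapiens_gene_ensembl`, the filter name `biotype` and the attribute names reflect current Ensembl BioMart conventions and are assumptions here; names in the 2008-era releases may differ, so check them in the BioMart interface first.

```python
import urllib.parse
import xml.etree.ElementTree as ET

MARTSERVICE = "https://www.ensembl.org/biomart/martservice"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query(dataset: str, biotype: str, attributes: list) -> str:
    """Build a BioMart XML query selecting genes of one biotype."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    ET.SubElement(ds, "Filter", name="biotype", value=biotype)  # assumed filter name
    for attr in attributes:
        ET.SubElement(ds, "Attribute", name=attr)
    return ET.tostring(query, encoding="unicode")


def query_url(query_xml: str) -> str:
    """URL for a GET request to the martservice endpoint."""
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": XML_HEADER + query_xml})


q = build_query("hsapiens_gene_ensembl", "miRNA",
                ["ensembl_gene_id", "chromosome_name", "start_position", "end_position"])
print(query_url(q))
```

With network access, `urllib.request.urlopen(query_url(q)).read().decode()` should return the results as tab-separated text. The URL is only built here, not fetched.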
Summary – Ensembl Genes
All Ensembl genes are based on biological evidence (protein and mRNA).
One Ensembl gene may come from proteins and mRNAs in various databases.
Havana (manually curated) genes are incorporated into the Ensembl geneset; for human, they are merged with it.
The CCDS set strives for consensus coding sequences across databases.
Pseudogenes and ncRNAs are annotated, along with a separate EST gene set.
For more on GeneBuild:
Help and Documentation
(About Ensembl)
http://www.ensembl.org/info/about/docs/genome_annotation.html