Survey of Misannotations and Pseudogenes in the Arabidopsis Genome
Tanmay Prakash
Objectives
Why
• Misannotation can hinder research
• Pseudogenes can be used to study natural selection
Objectives
• Find possible misannotations
• Find possible pseudogenes
Misannotations
Many misannotations arise when a gene prediction program labels a stretch of sequence as an intron because it contains a stop codon.
[Figure: gene model diagram with regions labeled UTR, CDS, and Intron]
Pseudogenes
Pseudogenes are DNA sequences that no longer function but resemble the functional genes they once were. There are two types:
• Processed
• Non-processed
Common properties of pseudogenes
• Stop codons
• Frameshift mutations
• Lack of selective pressure
Example: a single-base deletion shifts the reading frame and introduces stop codons (shown as "." in the translation):
agtacatgcataggactcgatcgactc    STCIGLDRL
agtacatgataggactcgatcgactc     ST..DSID
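The two sequences on this slide differ by one deleted base, and the two peptides are their translations in the first reading frame. A minimal Python sketch of that translation follows, assuming the standard genetic code; the function name `translate` and its `stop` parameter are illustrative and not taken from the slides.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, stop="*"):
    """Translate a DNA string in frame 1; a trailing partial codon is ignored."""
    dna = dna.lower()
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        peptide.append(stop if aa == "*" else aa)
    return "".join(peptide)


original = "agtacatgcataggactcgatcgactc"
deleted = "agtacatgataggactcgatcgactc"  # one base removed relative to original

print(translate(original))           # STCIGLDRL
print(translate(deleted, stop="."))  # ST..DSID
```

The deletion turns the third and fourth codons into `tga` and `tag`, which is why the second peptide contains two consecutive stops.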
Pipeline
[Flowchart]
• Query: protein domains; subject: Arabidopsis introns; BLAST search → genes matching in introns
• Query: protein domains; subject: Arabidopsis CDS; HMMER search → genes matching in CDS
• Genes matching in both → possibly misannotated genes
• Check for stop codons and frameshifts, then check Ka/Ks → possible pseudogenes
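The step that combines the two searches can be sketched as a per-gene set intersection. This is only an illustration under stated assumptions: search hits are taken to be already parsed into (gene ID, domain ID) pairs, the function name `genes_matching_in_both` is hypothetical, and the gene and domain identifiers are placeholders, not results from the survey.

```python
from collections import defaultdict


def genes_matching_in_both(intron_hits, cds_hits):
    """Return {gene: domains} for domains that hit both the introns and the CDS of a gene."""

    def by_gene(hits):
        grouped = defaultdict(set)
        for gene, domain in hits:
            grouped[gene].add(domain)
        return grouped

    introns, cds = by_gene(intron_hits), by_gene(cds_hits)
    shared = {}
    for gene in introns.keys() & cds.keys():
        common = introns[gene] & cds[gene]
        if common:  # keep the gene only if the SAME domain matched in both
            shared[gene] = common
    return shared


# Placeholder hits (hypothetical identifiers).
intron_hits = [("geneA", "DOM1"), ("geneB", "DOM2")]
cds_hits = [("geneA", "DOM1"), ("geneB", "DOM3"), ("geneC", "DOM1")]

candidates = genes_matching_in_both(intron_hits, cds_hits)
print(candidates)
```

Here only `geneA` is kept: `geneB` matches different domains in its introns and CDS, and `geneC` has no intron match.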
[Flowchart detail] Protein domains are searched against Arabidopsis introns (BLAST search) and Arabidopsis CDS (HMMER search), giving the genes matching in introns and the genes matching in exons.
[Flowchart detail] Genes matching in both → possibly misannotated genes
Results
346 genes (alternative gene models not counted separately) had matches to the same domain in both their introns and their exons.
Of these, 299 genes (alternative gene models not counted separately) had matches to the same domain in an intron and its flanking exons. These are most likely misannotations.
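The stricter criterion, a domain match in an intron and in its flanking exons, can be sketched as follows. The slides do not say how features were indexed or whether both flanking exons had to match, so this sketch assumes 1-based indices where intron i lies between exon i and exon i+1, and it requires both neighbours to match. The function name and all identifiers are hypothetical.

```python
def flanked_intron_matches(intron_hits, exon_hits):
    """Genes where a domain hits intron i and also exons i and i+1 (assumed indexing).

    Both arguments are iterables of (gene, domain, feature_index) tuples.
    """
    exon_hits = set(exon_hits)
    genes = set()
    for gene, domain, i in intron_hits:
        left_exon = (gene, domain, i)
        right_exon = (gene, domain, i + 1)
        if left_exon in exon_hits and right_exon in exon_hits:
            genes.add(gene)
    return genes


# Placeholder hits (hypothetical identifiers).
intron_hits = {("geneA", "DOM1", 1), ("geneB", "DOM1", 2)}
exon_hits = {
    ("geneA", "DOM1", 1),
    ("geneA", "DOM1", 2),
    ("geneB", "DOM1", 1),  # not adjacent to intron 2 of geneB
}

likely_misannotated = flanked_intron_matches(intron_hits, exon_hits)
print(likely_misannotated)
```

Relaxing the test to a single flanking exon would only change the `and` to `or`; which reading matches the survey is not stated in the slides.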
The four domains with the most possible misannotations

Domain       Possible misannotations   # Domains
PF01657.7    16                        76
PF02902.8    15                        32
PF06721.1    13                        3
PF07734.2    15                        113
[Figure: Domain Family Size vs Misannotations. x-axis: Number of Domains in Family (0 to 3000); y-axis: Number of Misannotations (0 to 16)]
[Figure: Misannotation Frequency. x-axis: Number of Genes Matching Domain (0 to 10000); y-axis: Percentage Misannotation (0 to 0.6)]
[Figure: Domain Gene Frequency. x-axis: Number of Genes Matching Domain (0 to 10000); y-axis: Number of Misannotations (0 to 20)]
Future Research
• Identify pseudogenes by looking for stop codons and frameshift mutations in the introns and by checking the Ka/Ks value
• Use a more recent database of domains
• Follow the same process for the rice genome
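Ka/Ks compares the nonsynonymous substitution rate (Ka) with the synonymous rate (Ks). A ratio near 1 is commonly read as relaxed selective pressure, which is what a pseudogene would be expected to show, while a ratio well below 1 suggests purifying selection. The sketch below only interprets Ka and Ks values that have already been estimated elsewhere; the function name and the `tolerance` cutoff of 0.2 are arbitrary assumptions for illustration, not values from the slides.

```python
def selection_from_ka_ks(ka, ks, tolerance=0.2):
    """Return (ratio, label) for precomputed Ka and Ks values.

    `tolerance` is a hypothetical cutoff for calling a ratio "near 1".
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "relaxed"    # consistent with a pseudogene
    if ratio < 1.0:
        return ratio, "purifying"  # sequence still appears constrained
    return ratio, "positive"


# Placeholder values, not measurements from the survey.
print(selection_from_ka_ks(0.09, 0.10))
print(selection_from_ka_ks(0.02, 0.10))
```

A candidate with in-frame stop codons or frameshifts and a "relaxed" ratio would be a stronger pseudogene candidate than one flagged by either check alone.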
Acknowledgement
Dr. Shin-Han Shiu
Dr. Kosuke Hanada
Dr. Melissa Lehti-Shiu
Dr. Gail Richmond
HSHSP