Post on 18-Dec-2015
Welcome to Introduction to Bioinformatics
Wednesday, 28 February 2007
Introduction to Viral Metagenome Project
• Discussion of Edwards & Rohwer (2005)*
• Exam retrospective (Problem 12)
• Other matters?
*Unless otherwise noted, all figures herein are from: Edwards RA, Rohwer F (2005). Viral metagenomics. Nature Rev Microbiol 3:504-510.
Edwards & Rohwer (2005): Phage phylogeny and taxonomy
Placement of unknown phage into phylogeny
SQ11. How to test? Result of test?
~50,000 nt
Blast
~500 nt
Edwards & Rohwer (2005): The proviral metagenome
SQ11. What's a provirus or prophage? Why would a virus do such a thing?
[Figure labels: Infection; Phage; Bacterial chromosome; Phage genome; Lysogenic pathway; Phage genome; Death; General transduction]
Edwards & Rohwer (2005): The proviral metagenome
[Figure labels: Lytic pathway; Infection; Phage; Bacterial chromosome; Phage genome; Phage genome; Life!; Lytic pathway; Lysogenic pathway]
Edwards & Rohwer (2005): Viral community structure and ecology
SQ14. What does it mean to have ~10^12 viruses but only ~1000 viral genotypes? Two scenarios?
Edwards & Rohwer (2005): Viral community structure and ecology
SQX. How to measure complexity?
- Sample 1000
- How many counted once?
- How many counted twice?
- How many counted zero times?
- Model the process
- Use different numbers of types
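The sampling experiment in the list above can be modeled with a short simulation. This is only a sketch: it assumes every type is equally abundant and that sampling is with replacement, neither of which the slide states. The function name `encounter_histogram` and the `seed` parameter are made up for illustration.

```python
import random
from collections import Counter

def encounter_histogram(n_types, sample_size=1000, seed=0):
    """Draw sample_size items from n_types equally abundant types
    (an assumption) and return {k: number of types seen exactly k times}."""
    rng = random.Random(seed)
    # Count how many times each type was drawn.
    draws = Counter(rng.randrange(n_types) for _ in range(sample_size))
    # Count how many types were seen once, twice, and so on.
    hist = Counter(draws.values())
    # Types that never appeared in the sample.
    hist[0] = n_types - len(draws)
    return dict(sorted(hist.items()))

# "Use different numbers of types": compare a simple and a complex community.
for n in (200, 5000):
    hist = encounter_histogram(n)
    print(n, "types:", {k: hist[k] for k in list(hist)[:4]})
```

With few types, most types are counted several times; with many types, most are never counted at all. The shape of the histogram is therefore a handle on how many types the community contains.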
Edwards & Rohwer (2005): Viral community structure and ecology
SQX. How to measure complexity?
[Figure: probability (0 to 1) vs. times encountered (0 to 15); 200 types]
Edwards & Rohwer (2005): Viral community structure and ecology
SQX. How to measure complexity?
[Figure: probability (0 to 1) vs. times encountered (0 to 15); curves for 200 types and 5000 types]
Edwards & Rohwer (2005): Viral community structure and ecology
SQX. How to measure complexity?
[Figure: probability (0 to 0.4) vs. times encountered (0 to 20)]
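Probability curves of this kind can also be computed without simulation. The sketch below assumes equally abundant types and a sample of 1000, so the number of times a given type is encountered follows a binomial distribution. That model is an assumption made here for illustration; it may not be the one used to draw the figures. The name `encounter_pmf` is hypothetical.

```python
from math import comb

def encounter_pmf(n_types, k, sample_size=1000):
    """Probability that one particular type is encountered exactly k times
    when sample_size items are drawn from n_types equally abundant types
    (equal abundance is an assumption)."""
    p = 1.0 / n_types
    return comb(sample_size, k) * p**k * (1 - p)**(sample_size - k)

# Print the first few probabilities for a simple and a complex community.
for n in (200, 5000):
    row = [round(encounter_pmf(n, k), 3) for k in range(8)]
    print(n, "types:", row)
```

Under this model, a type in a 200-type community is expected to turn up about five times, while a type in a 5000-type community is most likely never seen.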
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
1. How to identify genes?
2. How to identify genes' viruses?
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
How to identify genes?
Sequence
Open reading frames
Sequence 151 TATTTCGTAG TTATGTTGAA CCGATGAAAC TTGTTTGTTC TCAAATTGAG
Translation-Frame-1 151 Y F V V M L N R * N L F V L K L S
Translation-Frame-2 151 I S * L C * T D E T C L F S N * A
Translation-Frame-3 151 F R S Y V E P M K L V C S Q I E
Complement 151 ATAAAGCATC AATACAACTT GGCTACTTTG AACAAACAAG AGTTTAACTC
Translation-Frame-4 151 I E Y N H Q V S S V Q K N E F Q
Translation-Frame-5 151 Y K T T I N F R H F K N T R L N L
Translation-Frame-6 151 T N R L * T S G I F S T Q E * I S
Sequence 201 CTCAATACAG CTCTTCAACT AGTTAGTAGA GCTGTAGCCA CTAGGCCTTC
Translation-Frame-1 201 S I Q L F N * L V E L * P L G L R
Translation-Frame-2 201 Q Y S S S T S * * S C S H * A F
Translation-Frame-3 201 L N T A L Q L V S R A V A T R P S
Complement 201 GAGTTATGTC GAGAAGTTGA TCAATCATCT CGACATCGGT GATCCGGAAG
Translation-Frame-4 201 A * Y L E E V L * Y L Q L W * A K
Translation-Frame-5 201 E I C S K L * N T S S Y G S P R
Translation-Frame-6 201 S L V A R * S T L L A T A V L G E
Open reading frame finder + ORF characteristics
E.g. GeneMark
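A six-frame translation like the one shown above can be sketched in a few lines. This uses the standard genetic code and is not how GeneMark works (GeneMark also scores ORF characteristics). The frame labels `+1` to `-3` and the helper names are choices made here; the reverse-strand labels may not correspond to the slide's frames 4 to 6.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq, offset=0):
    """Translate complete codons starting at offset; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def six_frames(seq):
    """Translate three frames on each strand."""
    rc = reverse_complement(seq)
    frames = {f"+{i + 1}": translate(seq, i) for i in range(3)}
    frames.update({f"-{i + 1}": translate(rc, i) for i in range(3)})
    return frames

def longest_open_stretch(protein):
    """Longest run of amino acids without a stop codon."""
    return max(protein.split("*"), key=len)

# The 50 nucleotides starting at position 151 on the slide.
seq = "TATTTCGTAGTTATGTTGAACCGATGAAACTTGTTTGTTCTCAAATTGAG"
for name, protein in six_frames(seq).items():
    print(name, protein, "longest open stretch:", longest_open_stretch(protein))
```

Frames that are interrupted by many stops are unlikely to encode a protein, which is the simplest cue an ORF finder uses.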
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
How to identify genes?
Sequence
Open reading frames
Predicted function
BlastP
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
How to identify genes?
Sequence
Open reading frames
Predicted function
BlastN?
SQ16. Other Blasts? TBlastX? Why so much time?
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
How to identify genes' viruses?
Codon usage in different organisms
SQ16. What does "codon usage" mean? How useful is it?
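One way to make "codon usage" concrete is to count how often each codon appears in a gene. The sketch below assumes the input is an in-frame coding sequence; the function name `codon_usage` and the toy sequence are made up for illustration.

```python
from collections import Counter

def codon_usage(orf):
    """Relative frequency of each codon in an in-frame coding sequence."""
    orf = orf.upper()
    # Split into complete, non-overlapping codons.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

# Toy example: lysine is encoded here only by AAA, never by AAG.
usage = codon_usage("ATGAAAAAAGGGTAA")
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 2))
```

Organisms differ in which synonymous codons they prefer, so a table like this, computed over many genes, acts as a rough fingerprint of the source genome.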
GC content in different organisms
SQ18. GC/AT differences in cyanobacterial genomes?
GC content in different organisms
S6301    0.5548433
S7942    0.554378
P9313    0.50739753
S6803    0.47359636
Npun     0.4135452
A7120    0.4126833
Tery     0.34196815
PRO1375  0.3644214
S8102    0.594126
Gvi      0.6199786
TeBP1    0.5391793
PMED4    0.3079916
Cwat     0.37098223
A29413   0.4141176
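Values like those above are GC fractions: the share of G and C among the bases of a genome. A minimal sketch of the calculation follows. Ignoring ambiguous bases such as N is a choice made here, and the function name `gc_content` is hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

# Toy sequences, not real genomes.
print(gc_content("GGCCAATT"))
print(gc_content("ATATATGC"))
```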
Constancy of sequence characteristics
- GC content
- Codon frequencies
- Dinucleotide frequencies
DNA sequence
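Constancy can be checked by cutting a DNA sequence into windows and computing the same characteristic in each. The sketch below does this for dinucleotide frequencies. The window size, helper names, and toy sequence are assumptions for illustration, and overlapping dinucleotides are counted on one strand only.

```python
from collections import Counter

def dinucleotide_freqs(seq):
    """Relative frequency of each overlapping dinucleotide (one strand)."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Skip pairs that contain ambiguous bases such as N.
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}

def windows(seq, size):
    """Non-overlapping windows of the given size; a short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

# Toy sequence: compare the CG frequency from window to window.
genome = "ACGTTGCAACGTTGCAACGTTGCAACGTTGCA"
for w in windows(genome, 16):
    print(w, round(dinucleotide_freqs(w).get("CG", 0.0), 3))
```

If the values stay similar from window to window within a genome but differ between genomes, the characteristic is useful as a signature.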
Constancy of sequence characteristics
Karlin S (2001). Trends Microbiol 9:335-343
Edwards & Rohwer (2005): Bioinformatics and viral metagenomics
How to identify genes' viruses?
- GC content
- Codon frequencies
- Dinucleotide frequencies
Virus #1
Virus #2
Virus #3
Virus #4
Virus #5
Virus #6
. . .
Viral fragment
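The matching step suggested here, comparing a viral fragment against each candidate virus, can be sketched with dinucleotide frequencies as the signature. Using the mean absolute difference as the distance is an assumption made for illustration, a simplified stand-in rather than the measure from Karlin (2001). The genome strings are toy data, not real viruses.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def signature(seq):
    """Vector of the 16 dinucleotide frequencies (one strand, overlapping)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    return [counts[d] / total if total else 0.0 for d in DINUCS]

def distance(a, b):
    """Mean absolute difference between two signatures (an assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def closest_genome(fragment, genomes):
    """Name of the genome whose signature is nearest to the fragment's."""
    frag_sig = signature(fragment)
    return min(genomes, key=lambda name: distance(frag_sig, signature(genomes[name])))

# Toy data: two made-up "genomes" with very different composition.
genomes = {"virus_1": "AT" * 50, "virus_2": "GC" * 50}
print(closest_genome("ATATATATAT", genomes))
```

The same scheme works with GC content or codon frequencies as the signature; short fragments give noisy signatures, so close calls should be treated with caution.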