This presentation uses animations and is best viewed as a slide show. To start the presentation,...

Post on 18-Dec-2015


This presentation uses animations and is best viewed as a slide show.

To start the presentation, click Slide Show on the top tool bar and then View show.

Welcome to Introduction to Bioinformatics

Wednesday, 28 February 2007

Introduction to Viral Metagenome Project

• Discussion of Edwards & Rohwer (2005)*

• Exam retrospective (Problem 12)

• Other matters?

*Unless otherwise noted, all figures herein are from: Edwards RA, Rohwer F (2005). Viral metagenomics. Nature Rev Microbiol 3:504-510.

Edwards & Rohwer (2005): Phage phylogeny and taxonomy

Placement of unknown phage into phylogeny

SQ11. How to test? Result of test?

~50,000 nt

Blast

~500 nt

Edwards & Rohwer (2005): The proviral metagenome

SQ11. What's a provirus or prophage? Why would a virus do such a thing?

[Figure: lysogenic pathway. Labels: Infection; Phage; Bacterial chromosome; Phage genome; Lysogenic pathway; Death; General transduction]

Edwards & Rohwer (2005): The proviral metagenome

[Figure: lytic and lysogenic pathways. Labels: Infection; Phage; Bacterial chromosome; Phage genome; Lytic pathway; Lysogenic pathway; Life!]

Edwards & Rohwer (2005): Viral community structure and ecology

SQ14. What does it mean that there are ~10^12 viruses but only ~1000 viral genotypes? Two scenarios?

Edwards & Rohwer (2005): Viral community structure and ecology

SQX. How to measure complexity?

- Sample 1000

- How many counted once?

- How many counted twice?

- How many counted zero times?

- Model the process: use different numbers of types
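The modelling step above can be sketched as a small simulation. This is only an illustration, not the method used in the paper: it assumes every type is equally abundant, and the function name `encounter_distribution` and its parameters are made up for this example.

```python
import random
from collections import Counter


def encounter_distribution(n_types, n_samples=1000, seed=0):
    """Fraction of types encountered 0, 1, 2, ... times in a random sample.

    Assumption (not from the slides): all types are equally abundant,
    and individuals are drawn independently with replacement.
    """
    rng = random.Random(seed)
    # How many times each type turned up in the sample.
    hits = Counter(rng.randrange(n_types) for _ in range(n_samples))
    # How many types were seen exactly k times.
    times_seen = Counter(hits.values())
    # Types that never turned up at all.
    times_seen[0] = n_types - len(hits)
    return {k: times_seen[k] / n_types for k in sorted(times_seen)}


if __name__ == "__main__":
    for n_types in (200, 5000):
        dist = encounter_distribution(n_types)
        print(n_types, "types:", {k: round(p, 3) for k, p in dist.items()})
```

With few types, most types are encountered several times; with many types, most are never encountered. Comparing the observed counts with curves like these is one way to estimate how many types a community holds.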

Edwards & Rohwer (2005): Viral community structure and ecology

SQX. How to measure complexity?

[Plot: probability vs. times encountered, for 200 types]

Edwards & Rohwer (2005): Viral community structure and ecology

SQX. How to measure complexity?

[Plot: probability vs. times encountered, for 200 types and 5000 types]

Edwards & Rohwer (2005): Viral community structure and ecology

SQX. How to measure complexity?

[Plot: probability vs. times encountered]

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

1. How to identify genes?

2. How to identify genes' viruses?

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

How to identify genes?

Sequence

Open reading frames

Sequence            151 TATTTCGTAG TTATGTTGAA CCGATGAAAC TTGTTTGTTC TCAAATTGAG
Translation-Frame-1 151 Y F V V M L N R * N L F V L K L S
Translation-Frame-2 151 I S * L C * T D E T C L F S N * A
Translation-Frame-3 151 F R S Y V E P M K L V C S Q I E
Complement          151 ATAAAGCATC AATACAACTT GGCTACTTTG AACAAACAAG AGTTTAACTC
Translation-Frame-4 151 I E Y N H Q V S S V Q K N E F Q
Translation-Frame-5 151 Y K T T I N F R H F K N T R L N L
Translation-Frame-6 151 T N R L * T S G I F S T Q E * I S

Sequence            201 CTCAATACAG CTCTTCAACT AGTTAGTAGA GCTGTAGCCA CTAGGCCTTC
Translation-Frame-1 201 S I Q L F N * L V E L * P L G L R
Translation-Frame-2 201 Q Y S S S T S * * S C S H * A F
Translation-Frame-3 201 L N T A L Q L V S R A V A T R P S
Complement          201 GAGTTATGTC GAGAAGTTGA TCAATCATCT CGACATCGGT GATCCGGAAG
Translation-Frame-4 201 A * Y L E E V L * Y L Q L W * A K
Translation-Frame-5 201 E I C S K L * N T S S Y G S P R
Translation-Frame-6 201 S L V A R * S T L L A T A V L G E

Open reading frame finder + ORF characteristics (e.g., GeneMark)
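A six-frame translation like the one shown can be sketched in a few lines. This is a minimal illustration that assumes the standard genetic code; the function names are invented for this example, and it is not how GeneMark works (GeneMark also scores ORF characteristics statistically).

```python
from itertools import product

# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, offset=0):
    """Translate complete codons starting at offset; unknown codons become X."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frames(seq):
    """Translate both strands in all three reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


def longest_open_stretch(protein):
    """Longest run of residues uninterrupted by a stop (*)."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    # First 50 nt shown on the slide (positions 151-200).
    dna = "TATTTCGTAGTTATGTTGAACCGATGAAACTTGTTTGTTCTCAAATTGAG"
    for name, protein in six_frames(dna).items():
        print(name, protein, "longest open stretch:", longest_open_stretch(protein))
```

For these 50 nt, frames +1 to +3 reproduce Translation-Frame-1 to 3 on the slide, apart from the final residue of frames 1 and 2, whose codon runs into the next row. Long stretches without a stop codon are candidate open reading frames.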

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

How to identify genes?

Sequence

Open reading frames

Predicted function

BlastP

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

How to identify genes?

Sequence

Open reading frames

Predicted function

BlastN?

SQ16. Other Blasts? TBlastX? Why so much time?

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

How to identify genes' viruses?

Codon usage in different organisms

SQ16. What does "codon usage" mean? How useful is it?

GC content in different organisms

SQ18. GC/AT differences in cyanobacterial genomes?

GC content in different organisms

Genome    GC fraction
S6301     0.5548433
S7942     0.554378
P9313     0.50739753
S6803     0.47359636
Npun      0.4135452
A7120     0.4126833
Tery      0.34196815
PRO1375   0.3644214
S8102     0.594126
Gvi       0.6199786
TeBP1     0.5391793
PMED4     0.3079916
Cwat      0.37098223
A29413    0.4141176
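Values like those above are the fraction of G and C among the bases of a genome. A minimal sketch follows, assuming that ambiguous characters such as N are simply skipped; the function name `gc_content` is made up for this example, and the slides do not say how the listed values were computed.

```python
def gc_content(seq):
    """Fraction of G + C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real genome.
    print(gc_content("ATGCGCNNTA"))
```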

Constancy of sequence characteristics

- GC content

- Codon frequencies

- Dinucleotide frequencies

DNA sequence

Constancy of sequence characteristics

Karlin S (2001). Trends Microbiol 9:335-343
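The codon and dinucleotide characteristics listed above can be computed with one k-mer counter. This is a simplified sketch: the odds ratio f(XY) / (f(X) f(Y)) is taken from a single strand only, whereas published genome-signature methods usually combine both strands. The function names are invented for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, step=1):
    """Relative frequencies of the k-mers in seq, ignoring ambiguous ones.

    step=1 counts overlapping k-mers (mono- and dinucleotides);
    k=3 with step=3 counts codons in the reading frame that starts at position 0.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    valid = {kmer: n for kmer, n in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(valid.values())
    return {kmer: n / total for kmer, n in valid.items()} if total else {}


def dinucleotide_odds_ratios(seq):
    """Observed / expected dinucleotide frequency (single-strand simplification)."""
    mono = kmer_frequencies(seq, 1)
    di = kmer_frequencies(seq, 2)
    return {x + y: di.get(x + y, 0.0) / (mono[x] * mono[y])
            for x, y in product(mono, repeat=2)}


if __name__ == "__main__":
    # Toy sequence for illustration only.
    toy = "ATGGCGAAAGCGTTTGCGTAA"
    print("codon frequencies:", kmer_frequencies(toy, 3, step=3))
    print("dinucleotide odds ratios:", dinucleotide_odds_ratios(toy))
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts; ratios far from 1 mark over- or under-represented dinucleotides.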

Edwards & Rohwer (2005): Bioinformatics and viral metagenomics

How to identify genes' viruses?

- GC content

- Codon frequencies

- Dinucleotide frequencies

Virus #1, Virus #2, Virus #3, Virus #4, Virus #5, Virus #6, . . .

Viral fragment
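One way to match a viral fragment to candidate viruses is to compare sequence signatures and pick the closest. The sketch below is hypothetical and not the method from the paper: it compares plain dinucleotide frequencies by mean absolute difference, and the reference names and sequences are invented toy data.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_profile(seq):
    """Dinucleotide frequency vector, in the fixed order given by DINUCS."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    if total == 0:
        raise ValueError("sequence has no unambiguous dinucleotides")
    return [counts[d] / total for d in DINUCS]


def profile_distance(p, q):
    """Mean absolute difference between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def closest_reference(fragment, references):
    """Return (best_name, all_distances) for a fragment against named references."""
    frag = dinuc_profile(fragment)
    scored = {name: profile_distance(frag, dinuc_profile(seq))
              for name, seq in references.items()}
    return min(scored, key=scored.get), scored


if __name__ == "__main__":
    # Invented toy references; real ones would be whole viral genomes.
    references = {"virus_1": "AT" * 50, "virus_2": "GC" * 50, "virus_3": "ACGT" * 25}
    best, scores = closest_reference("ATATTATAATAT", references)
    print(best, scores)
```

Short fragments give noisy profiles, so a nearest match of this kind is a hint to be checked, not an identification.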