Welcome to Introduction to Bioinformatics* Wednesday, 8 February Genome Sequencing/Assembly



(Didn’t have time to do this in class)

Discussion of Study Question 14 from notes of Feb 6, which focuses on Table 1 and other sequence assembly parameters presented by

Myers EW et al (2000). A whole-genome assembly of Drosophila. Science 287:2196-2204. 

* http://www.people.vcu.edu/~elhaij/bnfo301-12/


We’re using this article to get an idea of how one can progressively make sense of a genome sequence.

The article presents a lot of quantitative information, particularly on the second page of the article and especially in Table 1.


One of you asked:

I am having trouble understanding the meaning of the requested and received columns.


An explanation of special terms in a table should be found in a footnote to the table or somewhere similar, and so it is!

Don’t concern yourself much with the Requested column. It’s just what Myers et al figured they would need, as judged by simulation experiments.


The Received column is much more important, giving the values they actually obtained from their sequencing.


SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

From Tour of Myers et al (2000)

Here’s the study question we’re considering.

Many in the class were confused about how to go about checking these statements. You should interpret “checking” to mean finding consistent evidence elsewhere in the article that confirms the numbers are not misprints.


Focus on Part a of the question. Here’s a sample comment:

I understand what the table represents but cannot make any links as to how they obtained the 3.156 [million] reads and 1.76 Gbp of sequence.


First, confirm that the statement indeed comes from the article. It does, and nothing else in the statement seems to shed much light on the numbers.

a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."


What do these quantities mean?

Take the first one. What do you need to know to calculate the average read length?


Not sure? What would you need to know to calculate the average length of a book in the VCU library?


Is that analogous to information you have with regard to the fly sequence?


(yes)

Well, that’s how the game is played.
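The book analogy translates directly into arithmetic: average length = total length ÷ number of items. A minimal Python sketch using only the two figures quoted in statement a gives the average read length, which you could then compare against the figures in Table 1. The function name `average_length` is just an illustrative choice, not anything from the article.

```python
# Average read length = total sequence / number of reads,
# just as average book length = total pages / number of books.

def average_length(total_units: float, item_count: float) -> float:
    """Return the mean size of an item, given a total and a count."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return total_units / item_count

reads = 3.156e6      # "3.156 million reads" (statement a)
total_bp = 1.76e9    # "1.76 Gbp of sequence" (statement a)

mean_read_bp = average_length(total_bp, reads)
print(f"Average read length: {mean_read_bp:.0f} bp")
# prints: Average read length: 558 bp
```

If Table 1 points to a read length in the same neighborhood, that is the kind of consistent evidence suggesting the numbers in statement a are not misprints.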


Try using a similar approach to figure out the other two parts.
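As one possible illustration of the same approach for part b, the sketch below counts how many pairs of reads an all-against-all comparison would involve. It assumes every read is compared with every other read exactly once; checking both orientations of each pair would roughly double the count. The helper `pair_count` is hypothetical, named only for this example.

```python
# Rough check of statement b: how many read pairs would an
# all-against-all comparison have to consider?

def pair_count(n: int) -> int:
    """Number of unordered pairs among n items (n choose 2)."""
    return n * (n - 1) // 2

reads = 3_156_000    # read count from statement a
pairs = pair_count(reads)
print(f"{pairs:,} candidate pairs (~{pairs / 1e12:.1f} trillion)")
# prints: 4,980,166,422,000 candidate pairs (~5.0 trillion)
```

Under either assumption the count lands in the trillions. Part c depends on the library figures in Table 1, so it is left for you to work through the same way.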