GigAssembler. Genome Assembly: A big picture .
-
Upload
lindsay-horn -
Category
Documents
-
view
212 -
download
0
Transcript of GigAssembler. Genome Assembly: A big picture .
![Page 1: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/1.jpg)
GigAssembler
![Page 2: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/2.jpg)
Genome Assembly: A big picture
http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429
![Page 3: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/3.jpg)
GigAssembler – Preprocessing
1. Decontaminating & Repeat Masking.
2. Aligning of mRNAs, ESTs, BAC ends & paired reads against initial sequence contigs. psLayout → BLAT
3. Creating an input directory (folder) structure.
![Page 4: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/4.jpg)
http://www.triazzle.com; The image from http://www.dangilbert.com/port_fun.htmlReference: Jones NC, Pevzner PA, Introduction to Bioinformatics Algorithms, MIT press
![Page 5: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/5.jpg)
RepBase + RepeatMasker
![Page 6: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/6.jpg)
GigAssembler: Build merged sequence contigs (“rafts”)
![Page 7: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/7.jpg)
Sequencing quality (Phred Score)
![Page 8: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/8.jpg)
Sequencing quality (Phred Score)
http://en.wikipedia.org/wiki/Phred_quality_score
Base-callingError Probability
![Page 9: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/9.jpg)
GigAssembler: Build merged sequence contigs (“rafts”)
![Page 10: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/10.jpg)
GigAssembler: Build merged sequence contigs (“rafts”)
![Page 11: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/11.jpg)
GigAssembler: Build sequenced clone contigs (“barges”)
![Page 12: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/12.jpg)
GigAssembler: Build a “raft-ordering” graph
![Page 13: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/13.jpg)
GigAssembler: Build a “raft-ordering” graph
Add information from mRNAs, ESTs, paired plasmid reads, BAC end pairs: building a “bridge”
Different weight to different data type: (mRNA ~ highest)
Conflicts with the graph as constructed so far are rejected.
Build a sequence path through each raft.
Fill the gap with N.
100: between rafts
50,000: between bridged barges
![Page 14: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/14.jpg)
Bellman-Ford algorithm
http://compprog.wordpress.com/2007/11/29/one-source-shortest-path-the-bellman-ford-algorithm/
![Page 15: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/15.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+7
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 16: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/16.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+7
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 17: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/17.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+7START
Inf. Inf.
Inf. Inf.
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 18: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/18.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+70
START
Inf.
Inf.
+7(→ A)
+6(→ A)
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 19: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/19.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+70
START
+7(→ A)
+6(→ A)
+2(→ B)
+4(→ D)
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 20: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/20.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+70
START
+7(→ A)
+2(→ B)
+4(→ D)
+2(→ C)
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 21: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/21.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
Find the shortest path to all nodes.
-4
+70
START
+7(→ A)
+4(→ D)
+2(→ C)
-2(→ B)
Take every edge and try to relax it (N – 1 times where N is the count of nodes)
![Page 22: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/22.jpg)
A
B C
D E
+6
+8
+7
+9
+2
+5
-2
-3
-4
+70
START
+7(→ A)
+4(→ D)
+2(→ C)
-2(→ B)
Answer: A-D-C-B-E
![Page 23: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/23.jpg)
Next-generation sequencing
![Page 24: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/24.jpg)
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina
![Page 25: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/25.jpg)
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina
![Page 26: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/26.jpg)
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Roche/454
![Page 27: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/27.jpg)
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 SOLiD
![Page 28: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/28.jpg)
An example of single molecule DNA sequencing,from Helicos (approx. 1 billion reads / run)
Pushkarev, D., N.F. Neff, and S.R. Quake. Nat Biotechnol (2009) 27, 847-50Harris, T.D., et al. Science (2008) 320, 106-9
![Page 29: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/29.jpg)
Mapping program
Trapnell C, Salzberg SL, Nat. Biotech., 2009
![Page 30: GigAssembler. Genome Assembly: A big picture .](https://reader035.fdocuments.us/reader035/viewer/2022070418/5697bfeb1a28abf838cb7e36/html5/thumbnails/30.jpg)
Two strategies in mapping
Trapnell C, Salzberg SL, Nat. Biotech., 2009