De Novo Repeat Classification and Fragment Assembly
![Page 1: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/1.jpg)
Pusan National University, Interdisciplinary Program of Bioinformatics
De Novo Repeat Classification and Fragment Assembly
First-year master's student: Kim Woo Yeon
![Page 2: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/2.jpg)
Repeat-Related Programs
Repeat annotation (library-based):
RepeatMasker (A.F.A. Smit and P. Green, unpubl.)
MaskerAid (Bedell et al. 2000)
Limitation: no de novo compilation
Repeat analysis:
RepeatMatch (Delcher et al. 1999)
REPuter (Kurtz et al. 2000, 2001)
RECON, RepeatFinder, LTR_STRUC
Limitation: no compact overview or summary of the repeat family
![Page 3: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/3.jpg)
Genome Research. Received January 27, 2004; accepted in revised form June 29, 2004.
![Page 4: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/4.jpg)
CONTENTS
Introduction
Concepts
Methods
  De Bruijn Graphs & A-Bruijn Graphs
  RepeatGluer Algorithm
  Constructing A-Bruijn Graphs Without the Similarity Matrix
  Fragment Assembly
  FragmentGluer Algorithm
Results and Discussion
![Page 5: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/5.jpg)
INTRODUCTION
“The problem of automated repeat sequence family classification is inherently messy and ill-defined and does not appear to be amenable to a clean algorithmic attack” – Bao and Eddy (2002)
One of the difficulties in repeat classification is that many repeats represent mosaics of sub-repeats – Bailey et al. 2002
Aims:
Propose a new approach to repeat classification
Present the FragmentGluer assembler
![Page 6: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/6.jpg)
CONCEPTS
![Page 7: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/7.jpg)
Genomic dot-plot
Genomic dot-plot of an imaginary sequence
An imaginary evolutionary process
Gluing the repeated regions of the final genome leads to the repeat graph
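As a rough illustration of the dot-plot idea, the sketch below lists the "dots" of a self-comparison: pairs of positions whose k-mers are identical. The function name `dot_plot`, the exact-match criterion, and the choice k = 3 are assumptions made for this example; real dot-plots are usually built from similarity alignments rather than exact k-mer matches.

```python
def dot_plot(seq, k=3):
    """Return sorted (i, j) pairs, i < j, whose k-mers are identical.

    Exact k-mer matching is an assumption of this sketch; it stands in
    for the similarity search that a real genomic dot-plot would use.
    """
    # Index every k-mer by the positions where it starts.
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)

    # Every pair of start positions sharing a k-mer is one dot.
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


if __name__ == "__main__":
    print(dot_plot("ACTGCTGCC", k=3))
```

Consecutive dots along a diagonal correspond to a repeated region, which is the structure the gluing operation acts on.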
![Page 8: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/8.jpg)
The idea of our approach
By gluing points together, repeats transform into the A-Bruijn graph
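A minimal sketch of the gluing operation, assuming the aligned position pairs are already given: positions that are aligned to each other are merged with a union-find structure, each merged class becomes one vertex, and consecutive positions of the sequence contribute edges. The function `glue` and its input format are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
def glue(n, aligned_pairs):
    """Glue aligned positions of a length-n sequence.

    Returns (labels, edges): labels[i] is the vertex that position i is
    glued into, and edges lists the (possibly repeated) edges between
    the vertices of consecutive positions.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the class representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in aligned_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smallest position as the representative.
            parent[max(ri, rj)] = min(ri, rj)

    labels = [find(i) for i in range(n)]
    edges = [(labels[i], labels[i + 1]) for i in range(n - 1)]
    return labels, edges


if __name__ == "__main__":
    # Positions 1-4 and 4-7 of ACTGCTGCC both spell CTGC.
    labels, edges = glue(9, [(1, 4), (2, 5), (3, 6), (4, 7)])
    print(labels)
    print(edges)
```

In this toy input the glued positions form a cycle that the sequence traverses twice, which is how a repeat shows up as a repeated path in the glued graph.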
![Page 9: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/9.jpg)
Mosaic repeat organization
BAC from human Chromosome Y
Repeat pairs by REPuter & sub-repeats by our division
Repeat multigraph
Repeat graph
RepeatFinder vs RECON vs REPuter
![Page 10: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/10.jpg)
METHODS
![Page 11: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/11.jpg)
De Bruijn Graphs & A-Bruijn Graphs
De Bruijn Graph: ACTGCTGCC
3-mers of the sequence: ACT, CTG, TGC, GCT, GCC
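The example sequence can be turned into a small graph in a few lines. This sketch assumes that the vertices are the distinct 3-mers and that an edge joins two 3-mers that are consecutive in the sequence; other formulations use (k-1)-mers as vertices and k-mers as edges. The function name `de_bruijn` is hypothetical.

```python
def de_bruijn(seq, k=3):
    """Build a simple de Bruijn-style graph of seq.

    Assumption of this sketch: vertices are distinct k-mers and an edge
    joins two k-mers that occur consecutively in the sequence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    vertices = sorted(set(kmers))
    # Repeated k-mers collapse into one vertex, so the path of the
    # sequence revisits vertices wherever the sequence repeats.
    edges = sorted(set(zip(kmers, kmers[1:])))
    return vertices, edges


if __name__ == "__main__":
    vertices, edges = de_bruijn("ACTGCTGCC", k=3)
    print(vertices)
    print(edges)
```

For ACTGCTGCC the repeated 3-mers CTG and TGC each collapse into a single vertex, which creates the cycle CTG, TGC, GCT.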
![Page 12: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/12.jpg)
De Bruijn Graphs & A-Bruijn Graphs
A-Bruijn Graph: … AT … ACT … ACAT …
![Page 13: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/13.jpg)
Whirls & Bulges
Available gaps & mismatch
![Page 14: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/14.jpg)
RepeatGluer Algorithm
1. Construct the A-Bruijn graph
2. Eliminate whirls
3. Remove bulges
4. Erosion: remove all leaves
5. Straighten zigzag paths
6. Form the consensus sequence
7. Output repeat families
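The erosion step can be sketched on a plain undirected graph. In the version below, one round removes every vertex that currently has exactly one neighbor; the `rounds` parameter, the adjacency-dict representation, and the function name `erode` are assumptions for illustration, and the slides do not say how many rounds the actual algorithm applies.

```python
def erode(adj, rounds=1):
    """Remove leaves (degree-1 vertices) from an undirected graph.

    adj maps each vertex to the set of its neighbors. Each round removes
    all vertices that are leaves at the start of that round. The input
    is not modified; a pruned copy is returned.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    for _ in range(rounds):
        leaves = [v for v, ns in adj.items() if len(ns) == 1]
        if not leaves:
            break
        for v in leaves:
            # Drop the leaf and detach it from its remaining neighbors.
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
    return adj


if __name__ == "__main__":
    # A triangle a-b-c with a two-vertex tail c-d-e.
    g = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c", "e"},
        "e": {"d"},
    }
    print(sorted(erode(g, rounds=2)))
```

Two rounds strip the tail d-e and leave the triangle, which mirrors how short dangling branches are trimmed while the repeat structure is kept.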
![Page 15: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/15.jpg)
Constructing A-Bruijn Graphs Without the Similarity Matrix
Construction of the A-Bruijn graph assumes S and A
S and { S1, …, St } can also be used to construct the A-Bruijn graph of S
The set must cover every pair of consecutive positions in S
Matrices of size |Si| x |Sj| are used instead
A snapshot of a “small” area of matrix A
S: a genomic sequence
n: the length of S
A: an n x n matrix
{ S1, …, St }: a set of substrings
|Si|: the length of the string Si
![Page 16: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/16.jpg)
Fragment Assembly
Assemblers:
Phrap (Green 1994)
Celera assembler (Myers et al. 2000)
EULER assembler (Pevzner et al. 2001), http://nbcr.sdsc.edu/euler
ARACHNE, Phusion, CAP, TIGR
Building an accurate assembler: EULER + Phrap → EULER+
Combines EULER’s accuracy in analyzing repeats with Phrap’s ability to handle low-coverage regions, low-quality reads, and read ends
Less memory than the original EULER
FragmentGluer algorithm
![Page 17: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/17.jpg)
FragmentGluer Algorithm
1. Construct the A-Bruijn graph of S
2. Eliminate whirls by splitting the composed vertices
3. Remove bulges
4. Erosion procedure: remove all leaves
5. Straighten zigzag paths
6. Thread each read
7. Define the consensus sequence
8. Output repeat families
9. Transform mate-pairs into mate-paths after step 6
10. Assemble the resulting contigs into scaffolds by the EULER Scaffolding algorithm
![Page 18: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/18.jpg)
RESULTS AND DISCUSSION
![Page 19: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/19.jpg)
Benchmarking
Results of a study of 518 human chromosome 20 clones.
| Metric | Phrap | ARACHNE | EULER+ |
| --- | --- | --- | --- |
| Av. # contigs/clone | 6.8 | 13.8 | 6.2 |
| Av. coverage | 99.30% | 98.60% | 98.80% |
| # misassembled contigs | 37 | 17 | 7 |
| # missing repeats | 5 | 9 | 4 |
EULER+ produced the fewest misassembled contigs (7, versus 17 for ARACHNE and 37 for Phrap). EULER+ also had the fewest missing repeat copies (4), ahead of Phrap (5) and ARACHNE (9). Average coverage over the 518 clones was 99.3% for Phrap, 98.8% for EULER+, and 98.6% for ARACHNE. The average number of contigs per clone was lowest for EULER+ (6.2), followed by Phrap (6.8) and ARACHNE (13.8).
![Page 20: De Novo Repeat Classification and Fragment Assembly](https://reader036.fdocuments.us/reader036/viewer/2022062518/56814514550346895db1d6da/html5/thumbnails/20.jpg)
More research
Analysis of the consensus sequences produced by FragmentGluer
De novo detection of HERVs as consensus sequences of FragmentGluer