RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly...

RNA Sequence Assembly WEI Xueliang


Page 1: RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly To Do.

RNA Sequence Assembly

WEI Xueliang


Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do


Sequence Assembly

• Goal: obtain the DNA/RNA sequence.

• A sequencing machine cannot read a whole genome in one go; it reads small pieces of between 20 and 1000 bases.

• Definition: Read = Tag = Fragment (the three terms are used interchangeably).


De novo sequence assembly


Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do


De novo sequence assembly

Calculating the overlaps between tags takes a huge amount of time.
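To make the cost concrete, here is a minimal sketch of a naive overlap computation, assuming "overlap" means the longest suffix of one tag that equals a prefix of another. The function names `overlap` and `all_overlaps` and the `min_len` threshold are illustrative assumptions, not taken from the slides. Every ordered pair of tags is compared, so the work grows quadratically with the number of tags.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_overlaps(tags, min_len=3):
    """Compare every ordered pair of tags: O(n^2) pairs for n tags."""
    result = {}
    for i, a in enumerate(tags):
        for j, b in enumerate(tags):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    result[(i, j)] = n
    return result


# Two hypothetical tags that share the 4 bases "GACT"
print(all_overlaps(["AAGACT", "GACTCC"]))
```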


DE BRUIJN GRAPH

K-mer: a length-k substring of a tag.

Each node has an out-degree of at most 4.
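A minimal sketch of one common way to build such a graph, assuming nodes are k-mers and an edge joins two k-mers that are consecutive within a tag (so they overlap by k-1 bases). The function name `de_bruijn` is an illustrative assumption. Because the successor of a node differs only in its last base, a node can have at most 4 successors.

```python
from collections import defaultdict


def de_bruijn(tags, k):
    """Map each k-mer to the set of k-mers that directly follow it in some tag."""
    graph = defaultdict(set)
    for tag in tags:
        # A tag of length L holds L - k + 1 k-mers, hence L - k edges.
        for i in range(len(tag) - k):
            graph[tag[i:i + k]].add(tag[i + 1:i + k + 1])
    return graph


graph = de_bruijn(["AAGACT"], 3)
for node, successors in graph.items():
    print(node, "->", sorted(successors))
```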

Hashing the node (A=0, C=1, G=2, T=3): "CTG" => (132)_4 = (30)_10

"CTG" => "TGG": shift left to get (1320)_4, then take it modulo (1000)_4 to get (320)_4

(320)_4 + (2)_4 for 'G' = (322)_4 = (58)_10
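The base-4 hashing step can be sketched as follows, assuming the encoding A=0, C=1, G=2, T=3 implied by "CTG" = (132)_4. The names `CODE`, `kmer_hash` and `roll` are illustrative assumptions. Moving to the next k-mer takes constant time: shift left by one base-4 digit, drop the leading digit with a modulo, and add the new base.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def kmer_hash(kmer):
    """Interpret a k-mer as a base-4 number."""
    h = 0
    for base in kmer:
        h = h * 4 + CODE[base]
    return h


def roll(h, next_base, k):
    """Hash of the next k-mer: shift left, drop the leading digit, append the new base."""
    return (h * 4) % (4 ** k) + CODE[next_base]


h = kmer_hash("CTG")   # (132)_4
print(h)               # 30
print(roll(h, "G", 3)) # hash of "TGG"
```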


DE BRUIJN GRAPH (CONT’)

If there are repeats, like "GACT":

A 3-mer de Bruijn graph cannot tell which path is the correct one. A 6-mer graph can recover the correct sequence.

The larger K is, the better the result.
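The effect of K on repeats can be illustrated with a small sketch. The sequence `SEQ` below is hypothetical (it is not the one from the slide figure); it contains the repeat "GACT" twice, each time followed by a different base. The helper `branching_nodes`, also an assumption, lists the k-mers that have more than one successor, which are the points where the path is ambiguous.

```python
from collections import defaultdict


def branching_nodes(sequence, k):
    """Return the k-mers of `sequence` that are followed by more than one distinct k-mer."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k):
        successors[sequence[i:i + k]].add(sequence[i + 1:i + k + 1])
    return sorted(node for node, nxt in successors.items() if len(nxt) > 1)


SEQ = "AAGACTCCGACTGG"  # hypothetical sequence with the repeat "GACT"

print(branching_nodes(SEQ, 3))  # ambiguous nodes with 3-mers
print(branching_nodes(SEQ, 6))  # ambiguous nodes with 6-mers
```

With 3-mers the node "ACT" sits inside the repeat and can be followed by either "CTC" or "CTG"; with 6-mers every node spans the whole repeat plus its context, so no node branches.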


De novo sequence assembly

Suppose we use K = length of tag (20-mer):
TGACGTAGCTATGTATTTTG
GACGTAGCTATGTATTTTGT
(no 20-mer)

Coverage is not high enough to support a large K.


Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do


MY METHOD.

Tag length = 6, K = 3. When we reach a branch:

AAGACT? Try every possible extension:

AAGACTC, AAGACTT, AAGACTG

Check against the tags: AGACTC is a tag.

So the correct extension is AAGACTC.
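One possible reading of this step, as a sketch only: at a branch, extend the current path by each candidate base and keep the candidates whose last tag-length window is an observed tag. The function name `resolve_branch` and its parameters are assumptions made for illustration; the slides do not show the actual implementation.

```python
def resolve_branch(path, candidates, tags, tag_len):
    """Return the candidate bases whose extension of `path` ends in an observed tag."""
    tag_set = set(tags)  # set lookup keeps each check O(1) on average
    supported = []
    for base in candidates:
        extended = path + base
        if extended[-tag_len:] in tag_set:
            supported.append(base)
    return supported


# Slide example: path AAGACT, candidates C/T/G, observed tag AGACTC
print(resolve_branch("AAGACT", "CTG", ["AGACTC"], 6))
```

If more than one candidate is supported, the branch stays ambiguous under this sketch; how such cases are handled is not stated in the slides.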


Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do


RNA ASSEMBLY


ALTERNATIVE SPLICING

The graph

All cDNA sequences.


RNA ASSEMBLY’S PROBLEM

Merge? Index the sequence.


RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?


RNA ASSEMBLY’S PROBLEM(CONT’)

Index Tags


RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?

Speed?


SINGLE TAG’S LIMITATION

|Yellow sequence| >= length of tag
Length of tag: 25-100 bp.
A single tag is not enough!


DATASET - PAIRED END TAGS

Fragment length is usually > 1k bases.
Some RNA sequences are shorter than 1k bases.


TO DO

• Handle large datasets (10G).
• Improve accuracy.
• Use PET (paired-end tag) data.


Thanks!!