Fast Sequence Alignment Methods Using CUDA-enabled GPU


Authors: Yeim-Kuan Chang, De-Yu Chen
Presenter: Pei-Hua Huang
Date: 2014/06/04

Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

INTRODUCTION

Sequence alignment, which calculates the degree of similarity between two sequences, is the first step in bioinformatics research.

A large amount of biological data is stored in the form of DNA, RNA and protein sequences, each made up of letters from its respective alphabet:
• DNA: {A, C, G, T}
• RNA: {A, C, G, U}
• Protein: {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}

Computer & Internet Architecture Lab, CSIE, National Cheng Kung University

Only a selected set of operations is allowed at each aligned position:
1. match one letter with another,
2. mismatch one letter with another,
3. align one letter with a gap symbol (denoted by "－") to increase similarity.

S1 = A － G G C C T A － T G
S2 = A C G G C C T A A T C

In this example, the first column (A/A) is a match, the second column (－/C) is a gap, and the last column (G/C) is a mismatch.
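The three operations can be checked mechanically on the example above. The following is a minimal Python sketch; the function name `classify_columns` is hypothetical, and the ASCII hyphen `-` stands in for the gap symbol "－" used on the slide.

```python
def classify_columns(row1, row2, gap="-"):
    """Label each aligned column as 'match', 'mismatch' or 'gap'."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    labels = []
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            labels.append("gap")        # one letter is aligned with a gap
        elif a == b:
            labels.append("match")      # identical letters
        else:
            labels.append("mismatch")   # different letters
    return labels


# The aligned rows S1 and S2 from the example, with '-' as the gap symbol.
s1 = "A-GGCCTA-TG"
s2 = "ACGGCCTAATC"
labels = classify_columns(s1, s2)
counts = {kind: labels.count(kind) for kind in ("match", "mismatch", "gap")}
print(counts)  # {'match': 8, 'mismatch': 1, 'gap': 2}
```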

Sequence alignment can be further classified as:
• Global alignment: align every letter of both sequences
• Local alignment: find the similar region within two given sequences

S1 = A G C C C T A G C G
S2 = C G G C G C A A T G

Global alignment:
－ A G C C C T A G C G
C G G C G C － A A T G

Local alignment:
G C C C T A － G
G C G C A A T G

Smith-Waterman algorithm

To quantify an alignment, the Smith-Waterman (SW) algorithm uses a scoring mechanism with three types of score:
• Match score if S1[i] = S2[j]
• Mismatch score if S1[i] ≠ S2[j]
• Gap penalty score if S1[i] = "－" or S2[j] = "－", or both

The match and mismatch scores are obtained by looking up a substitution matrix in which each cell contains a score for the corresponding pair of bases.
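To make the scoring mechanism concrete, here is a small Python sketch that sums the three score types over an already aligned pair. The values `MATCH`, `MISMATCH` and `GAP_PENALTY` are hypothetical toy numbers chosen only for illustration (a real run would look the pair up in a substitution matrix), and `-` stands in for the gap symbol.

```python
# Hypothetical toy scores, not taken from the slides.
MATCH, MISMATCH, GAP_PENALTY = 2, -1, 1


def sbt(a, b):
    """Toy substitution lookup: one score for equal letters, one for different."""
    return MATCH if a == b else MISMATCH


def alignment_score(row1, row2, gap="-"):
    """Sum match/mismatch scores and subtract the gap penalty per gap column."""
    total = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            total -= GAP_PENALTY
        else:
            total += sbt(a, b)
    return total


# 8 matches, 1 mismatch and 2 gaps: 8*2 - 1 - 2*1 = 13
score = alignment_score("A-GGCCTA-TG", "ACGGCCTAATC")
print(score)
```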

Given two sequences S1 and S2 of lengths m and n, let H be a score matrix of size (m+1) × (n+1). The recursive formula is defined as:

\[
H[i][j] = \max\left\{
\begin{array}{l}
H[i-1][j-1] + sbt(S_1[i], S_2[j]) \\
H[i-1][j] - g \\
H[i][j-1] - g \\
0
\end{array}
\right.
\]

where 1 ≤ i ≤ m, 1 ≤ j ≤ n, sbt(S1[i], S2[j]) gives the match or mismatch score, and g is the gap penalty.

H[i][0] and H[0][j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n are initialized to 0.
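The recurrence can be sketched directly in Python. This is an illustrative sequential fill, not the authors' GPU code; `sw_matrix` and `toy_sbt` are hypothetical names, and the toy scores (match 2, mismatch -1, g = 1) are assumptions made for the example.

```python
def sw_matrix(s1, s2, sbt, g):
    """Fill the (m+1) x (n+1) score matrix H with the SW recurrence."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),  # match or mismatch
                H[i - 1][j] - g,                              # gap, from above
                H[i][j - 1] - g,                              # gap, from the left
                0,                                            # local alignment floor
            )
    return H


def toy_sbt(a, b):
    """Hypothetical substitution scores: +2 for equal letters, -1 otherwise."""
    return 2 if a == b else -1


H = sw_matrix("AGCT", "GCA", toy_sbt, g=1)
best = max(max(row) for row in H)
print(best)  # 4, from the common substring "GC"
```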

Example score matrix H. Row 0 and column 0 hold the initial zeros. Arrows give the traceback direction (↖ diagonal, ↑ up, ← left).

 i         0    1    2    3    4    5    6    7    8    9   10    (j)
                C    C    A    C    C    G    G    G    G    C
 0         0←   0←   0←   0←   0←   0←   0←   0←   0←   0←   0←
 1   G     0↑   0↖   0↖   0↖   0↖   0↖   6↖   6↖   6↖   6↖   2←
 2   C     0↑   9↖   9↖   5←   9↖   9↖   5←   3↖   3↖   3↖  15↖
 3   A     0↑   5↑   9↖  13↖   9←   9↖   9↖   5↖   3↖   3↖  11↑
 4   G     0↑   1↑   5↑   9↖  10↖   6↖  15↖  15↖  11↖   9↖   7↑
 5   G     0↑   0↖   1↑   5↖   6↖   7↖  12↖  21↖  21↖  17↖  13↑
 6   G     0↑   0↖   0↖   1↖   2↖   3↖  13↖  18↖  27↖  27↖  23↑
 7   T     0↑   0↖   0↖   0↖   0↖   1↖   9↑  14↑  23↑  25↖  26↖
 8   T     0↑   0↖   0↖   0↖   0↖   0↖   5↑  10↑  19↑  21↖  24↖
 9   A     0↑   0↖   0↖   4↖   0↖   0↖   1↑   6↑  15↑  19↖  21↖
10   G     0↑   0↖   0↖   0↖   1↖   0↖   6↖   7↖  12↖  21↖  17↑

Resulting local alignment:
S1 = C A G G G
S2 = C C G G G
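The values in this matrix appear consistent with BLOSUM62 scores for the four letters involved and a gap penalty of g = 4; that reading is an inference from the numbers, not something the slide states. Under that assumption, the following Python sketch fills the matrix and traces back from the first maximum cell. The names `BLOSUM62_SUBSET` and `smith_waterman` are hypothetical, and `-` would mark a gap in the output.

```python
# BLOSUM62 entries for the letters A, C, G, T only (the matrix is symmetric).
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("C", "C"): 9, ("G", "G"): 6, ("T", "T"): 5,
    ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "G"): -3, ("C", "T"): -1, ("G", "T"): -2,
}


def sbt(a, b):
    """Look up the substitution score for an unordered pair of letters."""
    key = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
    return BLOSUM62_SUBSET[key]


def smith_waterman(s1, s2, g):
    """Return (best score, aligned s1 part, aligned s2 part, matrix H)."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                          H[i - 1][j] - g, H[i][j - 1] - g, 0)
            if H[i][j] > best:              # keep the first maximum found
                best, best_pos = H[i][j], (i, j)
    # Traceback: follow diagonal, then up, then left, until a zero cell.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]):
            top.append(s1[i - 1]); bottom.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - g:
            top.append(s1[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom)), H


# Row letters and column letters of the example matrix.
best, a1, a2, H = smith_waterman("GCAGGGTTAG", "CCACCGGGGC", g=4)
print(best, a1, a2)  # 27 CAGGG CCGGG
```

Other tie-breaking rules (for example starting from the second cell holding 27) can yield a different, equally scored alignment.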

Parallelism mechanisms

1. Exploit the parallelism of the cells in the anti-diagonals: all cells on one anti-diagonal can be computed independently of each other.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)
(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)

The figure marks three consecutive anti-diagonals, (i-2), (i-1) and i: the cells of anti-diagonal i depend only on cells of anti-diagonals (i-1) and (i-2).
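The anti-diagonal order can be sketched in Python as a wavefront loop. This is a sequential simulation of the idea, not GPU code: the list comprehension for each anti-diagonal reads only earlier anti-diagonals, which is what would let independent threads evaluate it. The function names and the toy scores are assumptions made for the example.

```python
def sw_row_major(s1, s2, sbt, g):
    """Reference: fill H row by row with the SW recurrence."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                          H[i - 1][j] - g, H[i][j - 1] - g, 0)
    return H


def sw_by_antidiagonals(s1, s2, sbt, g):
    """Fill H one anti-diagonal at a time; return H and the visiting order."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    wavefronts = []
    for d in range(2, m + n + 1):        # cells on one anti-diagonal share i + j = d
        cells = [(i, d - i) for i in range(max(1, d - n), min(m, d - 1) + 1)]
        # Every value below reads only anti-diagonals d-1 and d-2,
        # so the cells of this wavefront do not depend on each other.
        values = [max(H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                      H[i - 1][j] - g, H[i][j - 1] - g, 0)
                  for i, j in cells]
        for (i, j), v in zip(cells, values):
            H[i][j] = v
        wavefronts.append(cells)
    return H, wavefronts


def toy_sbt(a, b):
    """Hypothetical substitution scores: +2 for equal letters, -1 otherwise."""
    return 2 if a == b else -1


# A 7 x 8 grid, the same shape as the figure.
s1, s2 = "GCAGGGT", "CCACCGGG"
H, wavefronts = sw_by_antidiagonals(s1, s2, toy_sbt, g=1)
same = H == sw_row_major(s1, s2, toy_sbt, g=1)
print(len(wavefronts), max(len(w) for w in wavefronts), same)
```

The short wavefronts at the start and end of the sweep are the usual drawback of this scheme: only a few cells are available to work on there.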

2. The matrix is computed vector by vector, in order, with each vector parallel to the query sequence.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7)

(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7)

Vector labels in the figure: 1 to 7 and 8 to 14.

3. Process the vector parallel to the query sequence and, at the same time, process the vector from the next column in the anti-diagonal direction.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)

(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7) (8,8)

(9,1) (9,2) (9,3) (9,4) (9,5) (9,6) (9,7) (9,8)

(10,1) (10,2) (10,3) (10,4) (10,5) (10,6) (10,7) (10,8)

(11,1) (11,2) (11,3) (11,4) (11,5) (11,6) (11,7) (11,8)

(12,1) (12,2) (12,3) (12,4) (12,5) (12,6) (12,7) (12,8)

Proposed method

Redefine the recursive formula of the SW algorithm so that one row of the matrix can be calculated in parallel.

 i         0    1    2    3    4    5    6    7    8    (j)
                b1   b2   b3   b4   b5   b6   b7   b8
 0         0    0    0    0    0    0    0    0    0
 1   a1    0
 2   a2    0
 3   a3    0

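In order to compute the scores in parallel, we have the following new formulas:

\[
H'[1][j] = \max\left\{
\begin{array}{l}
H[0][j-1] + sbt(a_1, b_j) \\
H[0][j] - g \\
0
\end{array}
\right.
\quad \text{for } j = 1..8
\]

\[
H[1][j] = \max\left\{
\begin{array}{l}
H'[1][j] \\
H[1][j-1] - g \\
0
\end{array}
\right.
\quad \text{for } j = 1..8.
\]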

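The formula for H'[1][j] reads only row 0, so all H'[1][j] can be computed in parallel. The formula for H[1][j] still depends on H[1][j-1]; substituting it into itself repeatedly, it expands to

\[
H[1][j] = \max\{\, H'[1][j],\; H'[1][j-1] - g,\; H'[1][j-2] - 2g,\; \ldots,\; H'[1][1] - (j-1)g \,\}
\quad \text{for } j = 1..8.
\]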

Next, the expanded formula is as follows.
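The formulas shown on these slides did not survive as text. One reading that is consistent with the row-1 recurrences and with the later references to formulas (a), (b) and (c) is sketched below. The intermediate name \tilde{H} and the grouping into (a), (b) and (c) are assumptions, not the authors' notation.

```latex
\begin{align}
H'[i][j] &= \max\{\, H[i-1][j-1] + sbt(S_1[i], S_2[j]),\; H[i-1][j] - g,\; 0 \,\} \tag{a} \\
\tilde{H}[i][j] &= \max_{1 \le k \le j-1} \{\, H'[i][j-k] - k\,g \,\} \tag{b} \\
H[i][j] &= \max\{\, H'[i][j],\; \tilde{H}[i][j] \,\} \tag{c}
\end{align}
```

For j = 1 the set in (b) is empty and (c) reduces to H[i][1] = H'[i][1]. Because H' is never negative and g is positive, the term coming from the zero boundary column cannot win the maximum, so it is omitted.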

Formula (a) can be computed in parallel by the threads assigned to the cells of a row. After this first step is finished, formulas (b) and (c) are executed, where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
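The three-step row computation can be simulated sequentially in Python. The sketch assumes that (a) is the step that reads only the previous row, (b) takes the maximum over cells further left in the same row with accumulated gap penalties, and (c) combines the two; that split is a reading of the slides, and all function names are hypothetical. Step (b) is written naively here, without any scan.

```python
def sw_classic(s1, s2, sbt, g):
    """Reference: the standard cell-by-cell SW recurrence."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                          H[i - 1][j] - g, H[i][j - 1] - g, 0)
    return H


def sw_row_parallel(s1, s2, sbt, g):
    """Row-wise SW in three steps; assumes g > 0."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        prev = H[i - 1]
        # Step (a): each j reads only row i-1, so all n values are independent.
        h_prime = [0] + [max(prev[j - 1] + sbt(s1[i - 1], s2[j - 1]),
                             prev[j] - g, 0)
                         for j in range(1, n + 1)]
        # Step (b): best value reachable from a cell further left in this row,
        # paying k gap penalties for a distance of k cells (0 if there is none).
        h_tilde = [0] * (n + 1)
        for j in range(1, n + 1):
            h_tilde[j] = max((h_prime[j - k] - k * g for k in range(1, j)),
                             default=0)
        # Step (c): combine the two candidates.
        for j in range(1, n + 1):
            H[i][j] = max(h_prime[j], h_tilde[j])
    return H


def toy_sbt(a, b):
    """Hypothetical substitution scores: +2 for equal letters, -1 otherwise."""
    return 2 if a == b else -1


s1, s2 = "GCAGGGTTAG", "CCACCGGGGC"
agree = sw_row_parallel(s1, s2, toy_sbt, g=1) == sw_classic(s1, s2, toy_sbt, g=1)
print(agree)
```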

Optimization

Use a prefix max scan to speed up the computation of formula (b). The prefix max scan is defined as:
Input: array [a1, a2, ..., an].
Output: array [a1, M(a1, a2), M(a1, ..., a3), ..., M(a1, ..., an)], where M denotes the maximum.
The prefix max scan is performed in k steps, where k = log2(n) (rounded up if n is not a power of two). For step i = 0 to k − 1, the element array[j] is compared with array[j − 2^i] for j = 2^i + 1 to n in parallel, and the larger value is stored in array[j].
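The step structure of the scan can be simulated in Python. In this sketch each step takes a snapshot of the array before writing, which mimics all parallel comparisons of one step reading the values of the previous step. The function name `prefix_max_scan` is hypothetical, and indices are 0-based, so `j` runs from `stride` to `n - 1`.

```python
from itertools import accumulate


def prefix_max_scan(values):
    """Prefix max scan in ceil(log2(n)) doubling steps."""
    a = list(values)
    n = len(a)
    stride = 1                          # stride is 2**i in step i
    steps = 0
    while stride < n:
        prev = a[:]                     # all reads in a step see the old values
        for j in range(stride, n):      # these iterations are independent
            a[j] = max(prev[j], prev[j - stride])
        stride *= 2
        steps += 1
    return a, steps


data = [3, 1, 4, 1, 5, 9, 2, 6]
result, steps = prefix_max_scan(data)
print(result, steps)                    # [3, 3, 4, 4, 5, 9, 9, 9] 3

# Cross-check against a plain sequential running maximum.
expected = list(accumulate(data, max))
print(result == expected)
```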

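Prefix max scan on eight elements, showing the array after the first scan step (stride 1):

Input:            a1   a2          a3          a4          a5          a6          a7          a8
After first step: a1   M(a1, a2)   M(a2, a3)   M(a3, a4)   M(a4, a5)   M(a5, a6)   M(a6, a7)   M(a7, a8)

Applying the scan to row 1, the gap penalties are added as weights before the scan and subtracted afterwards:

Weighted input:  H'[1][1]+7g   H'[1][2]+6g   H'[1][3]+5g   H'[1][4]+4g   H'[1][5]+3g   H'[1][6]+2g   H'[1][7]+g   H'[1][8]
Subtract after:  -7g           -6g           -5g           -4g           -3g           -2g           -g           -0

Results, from left to right:
H'[1][1]
M(H'[1][2], H'[1][1] - g)
M(H'[1][3], H'[1][2] - g, H'[1][1] - 2g)
...
M(H'[1][8], H'[1][7] - g, ..., H'[1][1] - 7g)

(Figure labels: Step 1, Step 2, Step 3.)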

CUDA implementation

One thread block processes one alignment task. Each thread in the block processes one cell of a row of the matrix.

Experimental results

The experiments were executed on an NVIDIA C1060 graphics card with 30 MPs comprising 240 SPs and 4 GB of RAM, installed in a PC with an Intel Core 2 6420 2.13 GHz processor running Ubuntu.

Five databases were produced by Seq-Gen [16]. Each database includes 100 sequences of equal length, with lengths ranging from 128 to 8192 across the databases.

The BLOSUM62 substitution matrix is used with a gap penalty of −4.

