Fast Sequence Alignment Methods Using CUDA-enabled GPU

Authors: Yeim-Kuan Chang, De-Yu Chen
Presenter: Pei-Hua Huang
Date: 2014/06/04
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.


Page 1: Fast Sequence Alignment Methods Using CUDA-enabled GPU


Page 2: INTRODUCTION

Sequence alignment is the first step in bioinformatics research; it calculates the degree of similarity between two sequences.

A large amount of biological data is stored in the form of DNA, RNA, and protein sequences, each made of letters from its own alphabet:
• DNA: {A, C, G, T}
• RNA: {A, C, G, U}
• Protein: {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}

Computer & Internet Architecture Lab, CSIE, National Cheng Kung University

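A minimal Python sketch (the names `ALPHABETS` and `possible_types` are our own, not from the slides) that stores the three letter sets above and reports which of them could have produced a given string. It makes one practical point visible: the sets overlap, since every DNA letter is also a valid amino-acid code.

```python
# Letter sets from the slide; "Protein" holds the 20 standard amino-acid codes.
ALPHABETS = {
    "DNA": frozenset("ACGT"),
    "RNA": frozenset("ACGU"),
    "Protein": frozenset("ARNDCQEGHILKMFPSTWYV"),
}


def possible_types(seq: str) -> list:
    """Return, sorted, the names of every alphabet that contains all letters of seq."""
    letters = set(seq.upper())
    return sorted(name for name, alphabet in ALPHABETS.items() if letters <= alphabet)
```

For example, `possible_types("GATTACA")` returns `['DNA', 'Protein']` and `possible_types("GAUUACA")` returns `['RNA']`, so the letters alone cannot always identify the sequence type.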

Page 3: INTRODUCTION

Only a selected set of operations is allowed at each aligned position:
1. match one letter with another,
2. mismatch one letter with another, or
3. align one letter with a gap symbol (denoted by "-") to increase similarity.

Example alignment (figure legend: match, mismatch, gap):

S1 = A - G G C C T A - T G

S2 = A C G G C C T A A T C
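The three operations can be read off an alignment column by column. The Python sketch below (the function name `classify_columns` is hypothetical) labels each column of the example above, with the spaces dropped and "-" marking a gap.

```python
def classify_columns(row1: str, row2: str) -> list:
    """Label each column of a pairwise alignment as 'match', 'mismatch' or 'gap'."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            labels.append("gap")       # operation 3: a letter aligned with a gap
        elif x == y:
            labels.append("match")     # operation 1
        else:
            labels.append("mismatch")  # operation 2
    return labels


# The example from this slide, with spaces removed.
ops = classify_columns("A-GGCCTA-TG", "ACGGCCTAATC")
print({op: ops.count(op) for op in ("match", "mismatch", "gap")})
# {'match': 8, 'mismatch': 1, 'gap': 2}
```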

Page 4: INTRODUCTION

Sequence alignment can be further classified as:
• Global alignment: align every letter of both sequences.
• Local alignment: find the most similar region within two given sequences.

Example with S1 = A G C C C T A G C G and S2 = C G G C G C A A T G:

Global alignment:

- A G C C C T A G C G

C G G C G C - A A T G

Local alignment:

G C C C T A - G

G C G C A A T G
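One way to make the global/local distinction concrete, sketched in Python with our own hypothetical helpers `is_global` and `is_local`: removing the gaps from a global alignment must give back both full sequences, while a local alignment only has to give back a contiguous substring of each. The checks below are applied to the example above.

```python
def ungapped(row: str) -> str:
    """Remove gap symbols to recover the letters an aligned row was built from."""
    return row.replace("-", "")


def is_global(row1: str, row2: str, s1: str, s2: str) -> bool:
    """A global alignment uses every letter of both sequences."""
    return len(row1) == len(row2) and ungapped(row1) == s1 and ungapped(row2) == s2


def is_local(row1: str, row2: str, s1: str, s2: str) -> bool:
    """A local alignment only needs to cover a contiguous region of each sequence."""
    return len(row1) == len(row2) and ungapped(row1) in s1 and ungapped(row2) in s2


S1, S2 = "AGCCCTAGCG", "CGGCGCAATG"
print(is_global("-AGCCCTAGCG", "CGGCGC-AATG", S1, S2))  # the global example
print(is_local("GCCCTA-G", "GCGCAATG", S1, S2))         # the local example
```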

Page 5: Smith-Waterman algorithm

To quantify an alignment, the SW algorithm uses a scoring mechanism with three types of score:
• Match score if S1[i] = S2[j]
• Mismatch score if S1[i] ≠ S2[j]
• Gap penalty score if S1[i] = "-" or S2[j] = "-"

The match and mismatch scores are obtained by looking up a substitution matrix in which each cell contains the score for the corresponding pair of letters (bases or amino acids).
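A sketch of this scoring mechanism in Python. As an assumption, it stores only the BLOSUM62 entries for the letters A, C, G and T (BLOSUM62 is a protein matrix; the worked example on Page 8 appears to reuse these entries for nucleotide letters) together with a linear gap penalty of -4, the value used in the experiments. `BLOSUM62_ACGT`, `sbt`, and `alignment_score` are illustrative names.

```python
# BLOSUM62 entries for the letters A, C, G and T only (the matrix is symmetric).
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # gap penalty score g used in the experiments


def sbt(x: str, y: str) -> int:
    """Substitution score lookup; only one triangle of the matrix is stored."""
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def alignment_score(row1: str, row2: str, gap: int = GAP) -> int:
    """Sum the column scores of an alignment: gap penalty or substitution score."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    return sum(gap if "-" in (x, y) else sbt(x, y) for x, y in zip(row1, row2))
```

Applied to the example on Page 3, `alignment_score("A-GGCCTA-TG", "ACGGCCTAATC")` adds eight match scores (48 in total), two gap penalties (-8), and one G/C mismatch (-3), for a total of 37.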

Page 6: Smith-Waterman algorithm

Given two sequences S1 and S2 of lengths m and n, let H be a score matrix of size (m+1) × (n+1). The recurrence is defined as:

$$
H[i][j] = \max\begin{cases}
0 \\
H[i-1][j-1] + sbt(S_1[i], S_2[j]) \\
H[i-1][j] + g \\
H[i][j-1] + g
\end{cases}
$$

where 1 ≤ i ≤ m, 1 ≤ j ≤ n, sbt(S1[i], S2[j]) returns the match or mismatch score, and g is the gap penalty score (a negative value).

H[i][0] and H[0][j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n are initialized to 0.
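A direct, sequential Python transcription of the recurrence above, assuming a linear gap penalty g = -4 and the same A/C/G/T subset of BLOSUM62 as in the scoring sketch. The names are illustrative, and this is the textbook row-by-row fill, not the authors' GPU code. Python strings are 0-based, so S1[i] on the slide is `s1[i - 1]` here.

```python
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # assumed gap penalty g


def sbt(x: str, y: str) -> int:
    """Symmetric substitution lookup (only one triangle is stored)."""
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def sw_matrix(s1: str, s2: str, gap: int = GAP) -> list:
    """Fill the (m+1) x (n+1) Smith-Waterman score matrix H row by row."""
    m, n = len(s1), len(s2)
    h = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            h[i][j] = max(
                0,                                            # start a new local alignment
                h[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),  # match / mismatch
                h[i - 1][j] + gap,                            # S1 letter against a gap
                h[i][j - 1] + gap,                            # S2 letter against a gap
            )
    return h
```

Filling the matrix for S1 = GCAGGGTTAG and S2 = CCACCGGGGC reproduces the worked example on Page 8, including its maximum score of 27.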

Page 7: Smith-Waterman algorithm

Page 8: Smith-Waterman algorithm

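Worked example: S1 = GCAGGGTTAG along the rows (i ↓) and S2 = CCACCGGGGC along the columns (j →). The scores are consistent with BLOSUM62 and g = -4. Each cell shows H[i][j] and an arrow to the neighbor it was derived from.

i\j         0    1    2    3    4    5    6    7    8    9   10
                 C    C    A    C    C    G    G    G    G    C
 0         0←   0←   0←   0←   0←   0←   0←   0←   0←   0←   0←
 1  G      0↑   0↖   0↖   0↖   0↖   0↖   6↖   6↖   6↖   6↖   2←
 2  C      0↑   9↖   9↖   5←   9↖   9↖   5←   3↖   3↖   3↖  15↖
 3  A      0↑   5↑   9↖  13↖   9←   9↖   9↖   5↖   3↖   3↖  11↑
 4  G      0↑   1↑   5↑   9↖  10↖   6↖  15↖  15↖  11↖   9↖   7↑
 5  G      0↑   0↖   1↑   5↖   6↖   7↖  12↖  21↖  21↖  17↖  13←
 6  G      0↑   0↖   0↖   1↖   2↖   3↖  13↖  18↖  27↖  27↖  23←
 7  T      0↑   0↖   0↖   0↖   0↖   1↖   9↑  14↑  23↑  25↖  26↖
 8  T      0↑   0↖   0↖   0↖   0↖   0↖   5↑  10↑  19↑  21↖  24↖
 9  A      0↑   0↖   0↖   4↖   0↖   0↖   1↑   6↑  15↑  19↖  21↖
10  G      0↑   0↖   0↖   0↖   1↖   0↖   6↖   7↖  12↖  21↖  17↑

Best local alignment (score 27, traced back from H[6][8]):

S1 = C A G G G

S2 = C C G G G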
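The traceback that produces the alignment above can be sketched in Python as follows (hypothetical names; same scoring assumptions as the earlier sketches). It starts at the first cell holding the maximum in row-major order, which here is H[6][8], and re-derives each arrow from the neighboring scores until it reaches a cell with score 0; when several arrows apply, it prefers the diagonal, then up, then left.

```python
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # assumed gap penalty g


def sbt(x: str, y: str) -> int:
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def sw_matrix(s1: str, s2: str, gap: int = GAP) -> list:
    h = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            h[i][j] = max(0, h[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                          h[i - 1][j] + gap, h[i][j - 1] + gap)
    return h


def sw_align(s1: str, s2: str, gap: int = GAP) -> tuple:
    """Return (best score, aligned part of S1, aligned part of S2)."""
    h = sw_matrix(s1, s2, gap)
    best = max(max(row) for row in h)
    # First cell, scanning row by row and left to right, that holds the best score.
    i, j = next((r, c) for r, row in enumerate(h) for c, v in enumerate(row) if v == best)
    top, bottom = [], []
    while h[i][j] > 0:  # a local alignment ends where the score drops to 0
        if h[i][j] == h[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]):  # diagonal arrow
            top.append(s1[i - 1])
            bottom.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:  # up arrow: S1 letter against a gap
            top.append(s1[i - 1])
            bottom.append("-")
            i -= 1
        else:  # left arrow: S2 letter against a gap
            top.append("-")
            bottom.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(sw_align("GCAGGGTTAG", "CCACCGGGGC"))  # (27, 'CAGGG', 'CCGGG')
```

Note that 27 also appears at H[6][9]; starting there instead would yield the equally scoring alignment CAGGG / CGGGG, so the tie-breaking rule determines which of the two is reported.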

Page 9: Parallelism mechanisms

1. Exploit the parallelism of the cells in each anti-diagonal: every cell on an anti-diagonal can be computed independently of the others, because cell (i, j) depends only on cells in the two preceding anti-diagonals.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)

Anti-diagonal labels in the figure: (i-2), (i-1), i
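A sequential Python simulation of the anti-diagonal (wavefront) scheme, under the same scoring assumptions as the earlier sketches; `sw_antidiagonal` is a hypothetical name. Cell (i, j) needs (i-1, j-1), (i-1, j) and (i, j-1), which lie on anti-diagonals i+j-2 and i+j-1, so all cells with the same i+j can be evaluated together. Here the values of one anti-diagonal are computed into a list before any of them is written back, standing in for one thread per cell.

```python
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # assumed gap penalty g


def sbt(x: str, y: str) -> int:
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def sw_antidiagonal(s1: str, s2: str, gap: int = GAP) -> list:
    """Fill H one anti-diagonal (constant i + j) at a time."""
    m, n = len(s1), len(s2)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(2, m + n + 1):
        # Cells (i, j) with i + j == d, 1 <= i <= m and 1 <= j <= n.
        cells = [(i, d - i) for i in range(max(1, d - n), min(m, d - 1) + 1)]
        # All reads touch anti-diagonals d-1 and d-2 only, so these could run in parallel.
        values = [max(0, h[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),
                      h[i - 1][j] + gap, h[i][j - 1] + gap) for i, j in cells]
        for (i, j), v in zip(cells, values):
            h[i][j] = v
    return h
```

A known drawback on a GPU is uneven parallelism: anti-diagonal d holds at most min(m, n) cells, and the first and last anti-diagonals hold very few.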

Page 10: Parallelism mechanisms

2. The matrix is computed vector by vector, in order, with each vector parallel to the query sequence.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7)

(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7)

(Figure labels: 1 2 3 4 5 6 7 and 8 9 10 11 12 13 14.)

Page 11: Parallelism mechanisms

3. Process the vector parallel to the query sequence and, at the same time, process the vector from the next column in the anti-diagonal direction.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)

(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7) (8,8)

(9,1) (9,2) (9,3) (9,4) (9,5) (9,6) (9,7) (9,8)

(10,1) (10,2) (10,3) (10,4) (10,5) (10,6) (10,7) (10,8)

(11,1) (11,2) (11,3) (11,4) (11,5) (11,6) (11,7) (11,8)

(12,1) (12,2) (12,3) (12,4) (12,5) (12,6) (12,7) (12,8)

Page 12: Proposed method

Redefine the recursive formula of the SW algorithm so that one row of the matrix can be calculated in parallel.

         j →   0     1     2     3     4     5     6     7     8
                     b1    b2    b3    b4    b5    b6    b7    b8
i ↓   0        0     0     0     0     0     0     0     0     0
      1   a1   0
      2   a2   0
      3   a3   0

Page 13: Proposed method

To compute the scores in parallel, we use the following new formulas:

$$
H'[1][j] = \max\begin{cases}
0 \\
H[0][j-1] + sbt(a_1, b_j) \\
H[0][j] + g
\end{cases}
\qquad j = 1, \dots, 8
$$

$$
H[1][j] = \max\begin{cases}
H'[1][j] \\
H[1][j-1] + g
\end{cases}
\qquad j = 1, \dots, 8
$$
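The two formulas can be checked against the ordinary recurrence with a short Python sketch, generalized from row 1 to any row i (illustrative names; same scoring assumptions as the earlier sketches). Step 1 reads only the previous row, so every j is independent; step 2 keeps a left-to-right dependency, which is what the later slides remove with a prefix max scan.

```python
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # assumed gap penalty g


def sbt(x: str, y: str) -> int:
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def next_row(prev_row: list, a: str, b: str, gap: int = GAP) -> list:
    """Compute row i of H from row i-1 (the slides show the case i = 1)."""
    n = len(b)
    # Step 1: H'[i][j] reads only row i-1, so every j could be its own thread.
    h_prime = [0] + [max(0, prev_row[j - 1] + sbt(a, b[j - 1]), prev_row[j] + gap)
                     for j in range(1, n + 1)]
    # Step 2: the horizontal gap term still links column j to column j-1.
    row = [0] * (n + 1)
    for j in range(1, n + 1):
        row[j] = max(h_prime[j], row[j - 1] + gap)
    return row


def sw_by_rows(s1: str, s2: str, gap: int = GAP) -> list:
    """Build the whole matrix with next_row, starting from the all-zero row 0."""
    rows = [[0] * (len(s2) + 1)]
    for a in s1:
        rows.append(next_row(rows[-1], a, s2, gap))
    return rows
```

Because max(H'[i][j], H[i][j-1] + g) is just the original four-way maximum regrouped, building all rows this way reproduces the Page 8 matrix.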

Page 14: Proposed method

$$
H'[1][j] = \max\begin{cases}
0 \\
H[0][j-1] + sbt(a_1, b_j) \\
H[0][j] + g
\end{cases}
\qquad j = 1, \dots, 8
$$

Page 15: Proposed method

expand to

$$
H[1][j] = \max\begin{cases}
H'[1][j] \\
H[1][j-1] + g
\end{cases}
\qquad j = 1, \dots, 8
$$

Page 16: Proposed method

Next, the expanded formula is as follows.

Page 17: Proposed method

Next, the expanded formula is as follows.
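The expanded formula itself appears only as an image. The derivation below is our own reconstruction, obtained by unrolling the second formula of Page 13 with H[1][0] = 0 and assuming g < 0; it is consistent with the prefix-max illustration on Page 20, but its exact form may differ from the slides.

$$
\begin{aligned}
H[1][j] &= \max\{H'[1][j],\ H[1][j-1] + g\} \\
        &= \max\{H'[1][j],\ H'[1][j-1] + g,\ H[1][j-2] + 2g\} \\
        &= \max_{1 \le k \le j}\bigl(H'[1][k] + (j-k)\,g\bigr), \qquad j = 1, \dots, 8
\end{aligned}
$$

The leftover term H[1][0] + jg = jg is dropped because H'[1][1] ≥ 0, so H'[1][1] + (j-1)g is always larger. Adding and subtracting (8 - j)g turns the result into a prefix maximum over a shifted array:

$$
H[1][j] = \max_{1 \le k \le j}\bigl(H'[1][k] + (8-k)\,g\bigr) - (8-j)\,g
$$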

Page 18: Proposed method

Formula (a) can be computed in parallel by the threads assigned to the cells. After this first step finishes, formulas (b) and (c) are executed.

where 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Page 19: Optimization

Use a prefix max scan to speed up the computation of formula (b).

The prefix max scan is defined as:
• Input: array[a1, a2, ..., an]
• Output: array[a1, M(a1, a2), M(a1, a3), ..., M(a1, an)], where M(ai, aj) denotes the maximum of ai, ..., aj

The prefix max scan is performed in k steps, where k = ⌈log2(n)⌉. In step i (i = 0 to k - 1), element array[j] is compared with array[j - 2^i] for all j = 2^i + 1 to n in parallel, and the larger value is stored in array[j].
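A Python sketch of the scan as described above (the function name is ours). The rounds run sequentially here; copying the array at the start of each round stands in for all threads reading before any of them writes, which on a GPU would need double buffering or a barrier.

```python
def prefix_max_scan(values: list) -> list:
    """Inclusive prefix max in ceil(log2 n) rounds (Hillis-Steele style)."""
    arr = list(values)
    offset = 1  # 2**i in round i
    while offset < len(arr):
        prev = arr[:]  # every "thread" reads the previous round's values
        # 0-based index; corresponds to the slide's j = 2**i + 1 .. n
        for j in range(offset, len(arr)):
            arr[j] = max(prev[j], prev[j - offset])
        offset *= 2
    return arr


print(prefix_max_scan([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 3, 4, 4, 5, 9, 9, 9]
```

For eight elements the loop runs with offsets 1, 2 and 4, which are the three steps shown on Page 20.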

Page 20: Optimization

Prefix max scan on eight elements:

Input:  a1  a2  a3  a4  a5  a6  a7  a8
Step 1: a1  M(a1, a2)  M(a2, a3)  M(a3, a4)  M(a4, a5)  M(a5, a6)  M(a6, a7)  M(a7, a8)
Step 2: a1  M(a1, a2)  M(a1, a3)  M(a1, a4)  M(a2, a5)  M(a3, a6)  M(a4, a7)  M(a5, a8)
Step 3: a1  M(a1, a2)  M(a1, a3)  M(a1, a4)  M(a1, a5)  M(a1, a6)  M(a1, a7)  M(a1, a8)

For row 1 (n = 8), the scan input is

H'[1][1]+7g  H'[1][2]+6g  H'[1][3]+5g  H'[1][4]+4g  H'[1][5]+3g  H'[1][6]+2g  H'[1][7]+g  H'[1][8]

and the offsets -7g  -6g  -5g  -4g  -3g  -2g  -g  -0 are then added element by element, so that

H[1][1] = H'[1][1]
H[1][2] = M(H'[1][2], H'[1][1] + g)
H[1][3] = M(H'[1][3], H'[1][2] + g, H'[1][1] + 2g)
...
H[1][8] = M(H'[1][8], H'[1][7] + g, ..., H'[1][1] + 7g)
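The shift-scan-shift trick in the figure can be sketched in Python under our own naming: add (n - k)g to H'[1][k], take the prefix max, then subtract (n - j)g, which leaves the maximum over k ≤ j of H'[1][k] + (j - k)g. The sketch assumes H' ≥ 0 and g ≤ 0 (both hold in SW) and includes the sequential recurrence as a reference to compare against.

```python
def prefix_max_scan(values: list) -> list:
    """Inclusive prefix max in ceil(log2 n) rounds; each round reads the previous round."""
    arr = list(values)
    offset = 1
    while offset < len(arr):
        prev = arr[:]
        for j in range(offset, len(arr)):
            arr[j] = max(prev[j], prev[j - offset])
        offset *= 2
    return arr


def row_sequential(h_prime: list, gap: int) -> list:
    """Reference: H[i][j] = max(H'[i][j], H[i][j-1] + g), with H[i][0] = 0."""
    row, left = [], 0
    for v in h_prime:
        left = max(v, left + gap)
        row.append(left)
    return row


def row_via_scan(h_prime: list, gap: int) -> list:
    """Same row via one prefix max scan (assumes h_prime >= 0 and gap <= 0)."""
    n = len(h_prime)
    # 0-based k here is the slide's k + 1, so the slide's (n - k) becomes (n - 1 - k).
    shifted = [v + (n - 1 - k) * gap for k, v in enumerate(h_prime)]
    scanned = prefix_max_scan(shifted)
    return [s - (n - 1 - j) * gap for j, s in enumerate(scanned)]
```

With H' for row 1 of the Page 8 example, `row_via_scan([0, 0, 0, 0, 0, 6, 6, 6, 6, 0], -4)` yields the row 0 0 0 0 0 6 6 6 6 2 shown there.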

Page 21: CUDA implementation

One thread block processes one alignment task.

Page 22: CUDA implementation

Each thread in the block processes one cell of a row of the matrix.
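Putting the pieces together: a plain-Python emulation (not CUDA) of one block aligning one sequence pair, with one notional thread per column j. List comprehensions stand in for the threads, and comments mark where a CUDA kernel would need a barrier such as `__syncthreads()`. The scoring assumptions (A/C/G/T entries of BLOSUM62, g = -4) and all names are ours; since formula (c) is not spelled out in the slide text, this sketch only tracks the best score rather than reproducing the authors' full kernel.

```python
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}
GAP = -4  # assumed gap penalty g


def sbt(x: str, y: str) -> int:
    return BLOSUM62_ACGT[(x, y)] if (x, y) in BLOSUM62_ACGT else BLOSUM62_ACGT[(y, x)]


def prefix_max_scan(values: list) -> list:
    """Inclusive prefix max in ceil(log2 n) rounds; each round reads the previous round."""
    arr = list(values)
    offset = 1
    while offset < len(arr):
        prev = arr[:]
        for j in range(offset, len(arr)):
            arr[j] = max(prev[j], prev[j - offset])
        offset *= 2
    return arr


def sw_row_parallel(s1: str, s2: str, gap: int = GAP) -> int:
    """Best local alignment score, one row at a time with per-column "threads"."""
    n = len(s2)
    prev = [0] * (n + 1)  # row i-1, including H[i-1][0] = 0
    best = 0
    for a in s1:  # rows stay sequential: row i needs all of row i-1
        # Step (a): "thread" j computes H'[i][j] from row i-1 only.
        h_prime = [max(0, prev[j - 1] + sbt(a, s2[j - 1]), prev[j] + gap)
                   for j in range(1, n + 1)]
        # A CUDA kernel would need a barrier (__syncthreads()) here.
        # Step (b): shift by (n-k)g, prefix max scan, shift back by (n-j)g.
        shifted = [v + (n - 1 - k) * gap for k, v in enumerate(h_prime)]
        scanned = prefix_max_scan(shifted)
        row = [s - (n - 1 - j) * gap for j, s in enumerate(scanned)]
        # ...and another barrier before the next row reads this one.
        best = max([best] + row)
        prev = [0] + row
    return best
```

Only two rows are kept at any time, so memory per alignment is O(n) rather than O(mn); the trade-off is that a traceback is not possible without storing more of the matrix.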

Page 23: Experimental results

The experiments are executed on an NVIDIA Tesla C1060 graphics card with 30 multiprocessors (MPs) comprising 240 streaming processors (SPs) and 4 GB of RAM, installed in a PC with an Intel Core 2 6420 2.13 GHz processor running Ubuntu.

Page 24: Experimental results

Five databases are produced by Seq-Gen [16]. Each database contains 100 sequences of equal length, with lengths ranging from 128 to 8192.

The BLOSUM62 substitution matrix is used with a gap penalty of -4.

Page 25: Experimental results

Page 26: Experimental results