Fast Sequence Alignment Methods Using CUDA-enabled GPU
Authors: Yeim-Kuan Chang, De-Yu Chen
Presenter: Pei-Hua Huang
Date: 2014/06/04
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
INTRODUCTION
Sequence alignment, which calculates the degree of similarity between two sequences, is the first step in bioinformatics research.
A large amount of biological data is stored in the form of DNA, RNA and protein sequences, each made up of letters from its respective letter set:
• DNA: {A, C, G, T}
• RNA: {A, C, G, U}
• Protein: {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}
Computer & Internet Architecture Lab, CSIE, National Cheng Kung University
INTRODUCTION
Only a selected set of operations is allowed at each aligned position:
1. match one letter with another,
2. mismatch one letter with another,
3. align one letter with a gap symbol (denoted by "-"), inserted to increase similarity.
S1 = A - G G C C T A - T G
S2 = A C G G C C T A A T C
Figure labels: match, mismatch, gap.
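The three operations can be made concrete by labelling each column of the example alignment. The following is a minimal Python sketch; the function name `classify_columns` and the use of plain strings for the aligned sequences are illustrative assumptions, not part of the slides.

```python
def classify_columns(a1, a2, gap="-"):
    """Label each column of a pairwise alignment as match, mismatch or gap."""
    labels = []
    for x, y in zip(a1, a2):
        if x == gap or y == gap:
            labels.append("gap")        # one letter aligned with the gap symbol
        elif x == y:
            labels.append("match")      # identical letters
        else:
            labels.append("mismatch")   # different letters
    return labels


# The aligned sequences S1 and S2 from the slide, written without spaces.
s1 = "A-GGCCTA-TG"
s2 = "ACGGCCTAATC"
labels = classify_columns(s1, s2)
counts = {kind: labels.count(kind) for kind in ("match", "mismatch", "gap")}
print(counts)
```

For this alignment the columns split into 8 matches, 1 mismatch and 2 gaps.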
INTRODUCTION
Sequence alignment can be further classified as:
• Global alignment: align every letter of both sequences
• Local alignment: find the similar region within the two given sequences
S1 = A G C C C T A G C G
S2 = C G G C G C A A T G

Global alignment:
- A G C C C T A G C G
C G G C G C - A A T G

Local alignment:
G C C C T A - G
G C G C A A T G
Smith-Waterman algorithm
To quantify an alignment, the Smith-Waterman (SW) algorithm uses a scoring mechanism with three types of score:
• Match score, if S1[i] = S2[j]
• Mismatch score, if S1[i] ≠ S2[j]
• Gap penalty score, if S1[i] = "-" or S2[j] = "-", or both
The match and mismatch scores are obtained by looking up a substitution matrix in which each cell contains the score for the corresponding pair of bases.
Smith-Waterman algorithm
Given two sequences S1 and S2 of lengths m and n, let H be a score matrix of size (m+1) × (n+1). The recursive formula is defined as:
\[
H[i][j] = \max
\begin{cases}
H[i-1][j-1] + \mathrm{sbt}(S_1[i], S_2[j]) \\
H[i-1][j] + g \\
H[i][j-1] + g \\
0
\end{cases}
\]
where 1 ≤ i ≤ m, 1 ≤ j ≤ n, sbt(S1[i], S2[j]) returns the match or mismatch score, and g is the gap penalty score (a negative value, so adding g lowers the score).
H[i][0] and H[0][j], for 0 ≤ i ≤ m and 0 ≤ j ≤ n, are initialized to 0.
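A minimal sequential Python sketch of this recurrence follows. It assumes a negative gap score g that is added (g = -4, the value given for the experiments) and a lookup table holding the BLOSUM62 scores for the four letters A, C, G and T only. The names `BLOSUM62_ACGT`, `sbt` and `smith_waterman` are illustrative, and the two input strings are the sequences from the example score matrix in these slides.

```python
# Assumption: BLOSUM62 scores restricted to the letters A, C, G, T.
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("C", "C"): 9, ("G", "G"): 6, ("T", "T"): 5,
    ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "G"): -3, ("C", "T"): -1, ("G", "T"): -2,
}


def sbt(x, y):
    """Symmetric substitution-matrix lookup for a pair of letters."""
    if (x, y) in BLOSUM62_ACGT:
        return BLOSUM62_ACGT[(x, y)]
    return BLOSUM62_ACGT[(y, x)]


def smith_waterman(s1, s2, g=-4):
    """Fill the (m+1) x (n+1) score matrix H; g is a negative gap score."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + sbt(s1[i - 1], s2[j - 1]),  # match / mismatch
                H[i - 1][j] + g,                              # gap, from above
                H[i][j - 1] + g,                              # gap, from the left
                0,                                            # local alignment floor
            )
    return H


H = smith_waterman("GCAGGGTTAG", "CCACCGGGGC")
best = max(max(row) for row in H)
print(best)
```

With these inputs the largest entry is 27, the same maximum that appears in the example score matrix in the slides.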
Smith-Waterman algorithm
Example score matrix H (j → indexes columns, i ↓ indexes rows; the arrows give the traceback direction):

  j →          0    1    2    3    4    5    6    7    8    9   10
i ↓                 C    C    A    C    C    G    G    G    G    C
 0             0←   0←   0←   0←   0←   0←   0←   0←   0←   0←   0←
 1    G        0↑   0↖   0↖   0↖   0↖   0↖   6↖   6↖   6↖   6↖   2←
 2    C        0↑   9↖   9↖   5←   9↖   9↖   5←   3↖   3↖   3↖  15↖
 3    A        0↑   5↑   9↖  13↖   9←   9↖   9↖   5↖   3↖   3↖  11↑
 4    G        0↑   1↑   5↑   9↖  10↖   6↖  15↖  15↖  11↖   9↖   7↑
 5    G        0↑   0↖   1↑   5↖   6↖   7↖  12↖  21↖  21↖  17↖  13↑
 6    G        0↑   0↖   0↖   1↖   2↖   3↖  13↖  18↖  27↖  27↖  23↑
 7    T        0↑   0↖   0↖   0↖   0↖   1↖   9↑  14↑  23↑  25↖  26↖
 8    T        0↑   0↖   0↖   0↖   0↖   0↖   5↑  10↑  19↑  21↖  24↖
 9    A        0↑   0↖   0↖   4↖   0↖   0↖   1↑   6↑  15↑  19↖  21↖
10    G        0↑   0↖   0↖   0↖   1↖   0↖   6↖   7↖  12↖  21↖  17↑

Resulting local alignment (traceback from the maximum score 27 at i = 6, j = 8):
S1 = C A G G G
S2 = C C G G G
Parallelism mechanisms
1. Exploit the parallelism among the cells on an anti-diagonal: the cells of one anti-diagonal can be computed independently of one another.

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)
(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)

Figure labels: (i-2), (i-1) and i mark three consecutive anti-diagonals; the cells of anti-diagonal i depend only on anti-diagonals (i-1) and (i-2).
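The anti-diagonal order can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' implementation: the names `antidiagonals` and `sw_wavefront`, the simple match and mismatch scores, and the negative gap score g are hypothetical, and the inner loop runs sequentially here where a GPU would process the cells concurrently.

```python
def antidiagonals(m, n):
    """Yield, for d = 2 .. m+n, the cells (i, j) with i + j == d (1-based)."""
    for d in range(2, m + n + 1):
        yield [(i, d - i) for i in range(max(1, d - n), min(m, d - 1) + 1)]


def sw_wavefront(s1, s2, match=2, mismatch=-1, g=-1):
    """Fill the score matrix one anti-diagonal at a time."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for cells in antidiagonals(m, n):
        # Every cell in `cells` reads only the two previous anti-diagonals,
        # so this inner loop has no dependencies between its iterations.
        for i, j in cells:
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + g, H[i][j - 1] + g, 0)
    return H


# The 7 x 8 grid in the figure has 14 anti-diagonals covering all 56 cells.
waves = list(antidiagonals(7, 8))
print(len(waves), sum(len(cells) for cells in waves))
```

The amount of parallel work varies with the length of each anti-diagonal, which starts at one cell, grows, and then shrinks again.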
Parallelism mechanisms
2. The matrix is computed vector by vector, in an order parallel to the query sequence.
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7)
(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7)
(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7)
Figure labels (vector numbers): 1 2 3 4 5 6 7 / 8 9 10 11 12 13 14
Parallelism mechanisms
3. Process each vector parallel to the query sequence and, at the same time, process the vector from the next column in the anti-diagonal direction.
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (2,7) (2,8)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7) (3,8)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7) (5,8)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (6,7) (6,8)
(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (7,7) (7,8)
(8,1) (8,2) (8,3) (8,4) (8,5) (8,6) (8,7) (8,8)
(9,1) (9,2) (9,3) (9,4) (9,5) (9,6) (9,7) (9,8)
(10,1) (10,2) (10,3) (10,4) (10,5) (10,6) (10,7) (10,8)
(11,1) (11,2) (11,3) (11,4) (11,5) (11,6) (11,7) (11,8)
(12,1) (12,2) (12,3) (12,4) (12,5) (12,6) (12,7) (12,8)
Proposed method
Redefine the recursive formula of the SW algorithm so that one row of the matrix can be calculated in parallel.

j →        0    1    2    3    4    5    6    7    8
i ↓             b1   b2   b3   b4   b5   b6   b7   b8
0          0    0    0    0    0    0    0    0    0
1    a1    0
2    a2    0
3    a3    0
Proposed method
In order to compute the scores in parallel, we have the following new formulas (shown for row 1):
\[
H'[1][j] = \max
\begin{cases}
H[0][j-1] + \mathrm{sbt}(a_1, b_j) \\
H[0][j] + g \\
0
\end{cases}
\quad \text{for } j = 1..8
\]
\[
H[1][j] = \max
\begin{cases}
H'[1][j] \\
H[1][j-1] + g \\
0
\end{cases}
\quad \text{for } j = 1..8.
\]
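Proposed method
Substituting H[1][j-1] repeatedly, the recurrence H[1][j] = max{H'[1][j], H[1][j-1] + g, 0} expands to the following (the 0 term is redundant because H'[1][j] ≥ 0):
\[
H[1][j] = \max\{\, H'[1][j],\; H'[1][j-1] + g,\; H'[1][j-2] + 2g,\; \ldots,\; H'[1][1] + (j-1)g \,\}
\quad \text{for } j = 1..8.
\]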
Proposed method
Next, the expanded formula is as follows.
[Formulas (a), (b) and (c): shown as images on the slides, not captured in this transcript]
Proposed method
Formula (a) can be evaluated in parallel by the threads assigned to the cells. After this first step is finished, formulas (b) and (c) are then executed.
The formulas hold for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
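A minimal Python sketch of this two-step, row-by-row evaluation follows. Formulas (a) to (c) are not reproduced in this transcript, so the sketch assumes they generalize the row-1 formulas to every row i: a first step that computes H'[i][j] from row i-1 only, then a second step that resolves the remaining dependency on the left neighbour. The names `simple_sbt` and `sw_row_by_row`, the default scores, and the negative gap score g are illustrative assumptions.

```python
def simple_sbt(x, y):
    # Hypothetical scoring, not from the slides: +2 for a match, -1 otherwise.
    return 2 if x == y else -1


def sw_row_by_row(s1, s2, sbt=simple_sbt, g=-1):
    """Fill H one row at a time; g is a negative gap score that is added."""
    m, n = len(s1), len(s2)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        up = H[i - 1]
        # Step 1: H'[i][j] reads only row i-1, so every j is independent
        # (on a GPU, one thread per cell of the row).
        h_prime = [0] * (n + 1)
        for j in range(1, n + 1):
            h_prime[j] = max(up[j - 1] + sbt(s1[i - 1], s2[j - 1]), up[j] + g, 0)
        # Step 2: resolve the dependency on the left neighbour. Written as a
        # sequential running maximum here; this is the part that a parallel
        # prefix max scan can replace.
        for j in range(1, n + 1):
            H[i][j] = max(h_prime[j], H[i][j - 1] + g)
    return H


H_rows = sw_row_by_row("GATTACA", "GCATGCA")
print(max(max(row) for row in H_rows))
```

Splitting the recurrence this way moves all substitution-matrix lookups into the fully parallel first step and leaves only a maximum over shifted values for the second.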
Optimization
Use a prefix max scan to speed up the computation of formula (b).
The prefix max scan is defined as:
Input: array [a1, a2, ..., an].
Output: array [a1, M(a1, a2), M(a1, a3), ..., M(a1, an)], where M(a1, aj) denotes the maximum of a1 through aj.
The prefix max scan is performed in k steps, where k = ⌈log2(n)⌉. In step i, for i = 0 to k - 1, each element array[j] is compared with array[j - 2^i], for j = 2^i + 1 to n in parallel, and the larger value is stored in array[j].
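The k-step procedure can be sketched in Python as below. The function name `prefix_max_scan` is illustrative, indices are 0-based (the slides use 1-based j), and the inner loop is sequential here; the snapshot `prev` stands in for all threads reading their inputs before any of them writes.

```python
def prefix_max_scan(values):
    """Inclusive prefix maximum in ceil(log2(n)) doubling steps."""
    a = list(values)
    n = len(a)
    steps = (n - 1).bit_length() if n > 1 else 0  # equals ceil(log2(n))
    for i in range(steps):
        offset = 2 ** i
        prev = a[:]  # every comparison in a step reads the pre-step values
        for j in range(offset, n):  # independent iterations: one per thread
            a[j] = max(prev[j], prev[j - offset])
    return a


scanned = prefix_max_scan([3, 1, 4, 1, 5, 9, 2, 6])
print(scanned)
```

After step i each element holds the maximum of the 2^(i+1) inputs ending at its position (or of all inputs up to it, if fewer exist), so eight elements are complete after three steps.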
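Optimization
Prefix max scan on eight elements, where M(ap, aq) denotes the maximum of ap through aq:

Input:   a1   a2          a3          a4          a5          a6          a7          a8
Step 1:  a1   M(a1, a2)   M(a2, a3)   M(a3, a4)   M(a4, a5)   M(a5, a6)   M(a6, a7)   M(a7, a8)
Step 2:  a1   M(a1, a2)   M(a1, a3)   M(a1, a4)   M(a2, a5)   M(a3, a6)   M(a4, a7)   M(a5, a8)
Step 3:  a1   M(a1, a2)   M(a1, a3)   M(a1, a4)   M(a1, a5)   M(a1, a6)   M(a1, a7)   M(a1, a8)

Applied to row 1:
Scan input:               H'[1][1]+7g  H'[1][2]+6g  H'[1][3]+5g  H'[1][4]+4g  H'[1][5]+3g  H'[1][6]+2g  H'[1][7]+g  H'[1][8]
Offset added after scan:  -7g          -6g          -5g          -4g          -3g          -2g          -g          -0

Resulting scores for row 1:
\[
\begin{aligned}
H[1][1] &= H'[1][1] \\
H[1][2] &= M\{\, H'[1][2],\; H'[1][1] + g \,\} \\
H[1][3] &= M\{\, H'[1][3],\; H'[1][2] + g,\; H'[1][1] + 2g \,\} \\
&\;\;\vdots \\
H[1][8] &= M\{\, H'[1][8],\; H'[1][7] + g,\; \ldots,\; H'[1][1] + 7g \,\}
\end{aligned}
\]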
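The offset trick in this figure can be sketched in Python as follows, assuming a negative gap score g that is added. The function name `resolve_row` is illustrative, and `itertools.accumulate` with `max` stands in for the parallel prefix max scan. The sample input is the H' row for the first letter (G) of the example score matrix with g = -4.

```python
from itertools import accumulate


def resolve_row(h_prime, g):
    """Return H[j] = max over k <= j of (H'[k] + (j - k) * g) for one row."""
    n = len(h_prime)
    # Add (n - j) * g to element j (1-based), as in the figure: +7g, +6g, ..., +0.
    shifted = [h + (n - 1 - j) * g for j, h in enumerate(h_prime)]
    # Inclusive prefix maximum (stand-in for the parallel scan).
    scanned = list(accumulate(shifted, max))
    # Remove the offset again: -7g, -6g, ..., -0.
    return [v - (n - 1 - j) * g for j, v in enumerate(scanned)]


# H' for row 1 of the example matrix: only the four G columns score 6.
row = resolve_row([0, 0, 0, 0, 0, 6, 6, 6, 6, 0], g=-4)
print(row)
```

Adding the position-dependent offset first turns the gap-weighted maximum into a plain prefix maximum, so one generic scan serves every row; here the last cell becomes 2 because it extends the neighbouring 6 by one gap.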
CUDA implementation
One thread block processes one alignment task.
CUDA implementation
Each thread in the block processes one cell of the current row of the matrix.
Experimental results
The experiments were run on an NVIDIA C1060 graphics card with 30 multiprocessors (MPs) comprising 240 streaming processors (SPs) and 4 GB of RAM, installed in a PC with an Intel Core 2 6420 2.13 GHz processor running Ubuntu.
Experimental results
Five databases were produced with Seq-Gen [16]. Each database includes 100 sequences of equal length; the lengths range from 128 to 8192.
The BLOSUM62 substitution matrix is used, with a gap penalty of -4.