A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf ·...

54
GARY BENSON, YOZEN HERNANDEZ, & JOSHUA LOVING BIOINFORMATICS PROGRAM BOSTON UNIVERSITY [email protected] A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm

Transcript of A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf ·...

Page 1: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

GARY BENSON, YOZEN HERNANDEZ, & JOSHUA LOVING

B I O I N F O R M A T I C S P R O G R A M

B O S T O N U N I V E R S I T Y J L O V I N G @ B U . E D U

A Bit-Parallel, General Integer-Scoring

Sequence Alignment Algorithm

Page 2: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Introduction: Problem Description

Input: • Sequences X and Y • Integer weights M; I; G

match; mismatch; indel or gap that define a similarity or distance scoring function S

Output: • Calculate the global alignment score for X and Y

Page 3: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Introduction

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Match = 2, Mismatch = -3, Indel = -5

Global Alignment – Needleman-Wunsch Alignment Scoring Matrix

Sequence X

Seq

ue

nce

Y

Page 4: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Introduction

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

Page 5: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Introduction

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Match = 2, Mismatch = -3, Indel = -5

Penalty from beginning

Page 6: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Introduction

- A C T G C A A - 0 0 0 0 0 0 0 0 A 0 2 -3 -3 -3 -3 2 2 G 0 -3 -1 -6 -1 -6 -3 -1 T 0 -3 -6 1 -4 -4 -8 -6 C 0 -3 -1 -4 -2 -2 -7 -11 A 0 2 -3 -9 -7 -5 0 -5 A 0 2 -1 -6 -7 -10 -3 2

Match = 2, Mismatch = -3, Indel = -5

No initial Penalty

Page 7: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Page 8: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Page 9: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Page 10: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Page 11: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

Page 12: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

Page 13: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

Page 14: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

Page 15: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

Page 16: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

Page 17: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30

Page 18: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30

Page 19: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30

Page 20: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 T -15 C -20 A -25 A -30

Page 21: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Needleman-Wunsch Alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

Page 22: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

Page 23: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

Page 24: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

Page 25: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30

Page 26: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30

Page 27: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30

Page 28: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel alignment

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30

Match = 2, Mismatch = -3, Indel = -5

Integer Scores

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Page 29: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Motivation

Cheaper sequencing of DNA means that larger datasets are being generated

Sequence analysis of such large datasets can be accelerated by faster alignment algorithms

Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) Available at: www.genome.gov/sequencingcosts. Accessed June 10, 2013.

Page 30: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Earlier Bit-parallel Pattern Matching Algorithms

Longest Common Subsequence (LCS) (Allison & Dix, 1986;

Crochemore et al, 2001; Hyyro, 2004)

Unit-cost edit-distance (Myers, 99; Hyyro et al, 2005)

K-differences (agrep; Wu-Manber, 92)

Regular expression search (Navarro, 04)

Arbitrary weights edit-distance (Bergeron&Hamel, 02)

Page 31: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Algorithm Foundation

Match = 2, Mismatch = -3, Indel = -5

Page 32: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Algorithm Foundation

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Match = 2, Mismatch = -3, Indel = -5

-3 -8

-1 -6

Page 33: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

-3 -8

-1 -6

-5 2 2 -5

Algorithm Foundation

Match = 2, Mismatch = -3, Indel = -5

Page 34: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

-3 -8

-1 -6

∆H

∆V ∆VNEXT

∆HNEXT

-5 2 2 -5

Algorithm Foundation

Match = 2, Mismatch = -3, Indel = -5

Page 35: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

∆H -5

∆V -

-5

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

-3 -8

-1 -6

-5 2 2 -5

∆H

∆V ∆VNEXT

∆HNEXT

Algorithm Foundation

Match = 2, Mismatch = -3, Indel = -5

Output

Input

Page 36: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Function Table

∆VNEXT output values given ∆V and ∆H input values

Page 37: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

What is the range of differences?

Page 38: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

What is the range of differences?

Match = 2, Mismatch = -3, Indel = -5

Page 39: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

What is the range of differences?

Match = 2, Mismatch = -3, Indel = -5

Minimum Value = Indel = -5 Maximum Value = Match – Indel = 2 - (- 5) = 7

Page 40: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Generalized Function Table

M = match score I = mismatch score G = indel (gap) penalty

Page 41: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Zones in Example Function Table

Page 42: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Bit-parallel Representation

Bit-vectors: computer words 64 bits long

A bit-vector for each possible difference, both horizontally and vertically (∆V and ∆H)

A set of Match vectors (MatchA, MatchC, MatchG, MatchT in the DNA case)

We keep track of match positions because they are a special case in the function table.

Page 43: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Example ∆H Bit-vector Storage

- A C T G C A A

C -20 -13 -6 -4 -2 -2 -7 -12

∆H values 7 7 2 2 0 -5 -5

∆H Bit-Vectors 7 1 1 0 0 0 0 0 6 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 2 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 0 -3 0 0 0 0 0 0 0 -4 0 0 0 0 0 0 0 -5 0 0 0 0 0 1 1

Page 44: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Example Match Vectors

- A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2

Match Vectors MatchesA 1 0 0 0 0 1 1 MatchesC 0 1 0 0 1 0 0 MatchesT 0 0 1 0 0 0 0 MatchesG 0 0 0 1 0 0 0

Page 45: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Algorithm

Start with ∆H values

Compute ∆V values

Then compute the new ∆H values

Page 46: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Algorithm: Example

- A C T G C A A

C -20 -13 -6 -4 -2 -2 -7 -12

A -25 -18 -11 -9 -7 -5 0 -5

∆H values 7 7 2 2 0 -5 -5

-5 -5 -5 -5

-3 7 7 ∆V values

-5

Page 47: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Time Complexity

𝑂 𝑧𝑛𝑚

𝑤

where n = |Sequence Y| m = |Sequence X| w = length of computer word

𝑧 =(𝑀 − 2𝐺 + 1)2− (𝐼 − 2𝐺)2

2

Page 48: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Implementation

Python script that generates C code based on input parameters (M; I; G)

Will eventually have web page for download of code

Page 49: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Experimental Analysis

Compared BHL with

Wu-Manber K-differences algorithm

Unit cost edit distance bit-parallel algorithm

Longest Common Subsequence bit-parallel algorithm

Needleman-Wunsch dynamic programming algorithm

Computed 25 million alignments with each program

Each DNA sequence was 63 bases long

All programs compiled using GCC, optimization level O3

Computation done on a typical desktop computer

Page 50: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Results: Comparison to NW algorithm

Page 51: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Results: comparison to bit-parallel algorithms

Page 52: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Current and Future Work

Implementation for sequences longer than one word

Single Instruction Multiple Data (SIMD) implementation

BLOSUM and PAM type substitution matrix support

General Purpose Graphics Processing Unit (GPGPU) implementation

New bit-parallel representations for greater speed and compactness of data

Page 53: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Acknowledgements

My advisor, Dr. Gary Benson

Lab members

Yevgeniy Gelfand Yozen Hernandez

Funded by the National Science Foundation (NSF)

Page 54: A Bit-Parallel, General Integer-Scoring Sequence Alignment ...stelo/cpm/cpm13/04_loving.pdf · Sequence Alignment Algorithm . Introduction: Problem Description Input: •Sequences

Questions