RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler...


RNA-Sequencing

Nicolas Balcazar, Corinna Blasse, An Duc Dang, Hannes Hauswedell, Sebastian Thieme

Advanced Algorithms for Bioinformatics (P4)
K. Reinert and S. Andreotti

SoSe 2010, FU Berlin

May 13, 2010

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq


Outline

- Review RNA-Seq
  - Why do we use RNA-Seq?
- Read Mapping
- Bowtie
- Burrows-Wheeler transformation
  - Principle of BWT
  - Retransformation
- Exactmatch
  - Backward-search algorithm
  - Backward-step algorithm
- Backtracking
  - How are mismatches handled?
  - Excessive backtracking


What is RNA-Seq?

- Also called "Whole Transcriptome Shotgun Sequencing"
- A technique for sequencing cDNA in order to obtain information about the content of an RNA sample


Why do we use RNA-Seq?

- Crucial for understanding the processes inside a cell
- DNA cannot give us all the information, unlike transcriptomes
- mRNAs specify cells' identity
- mRNAs govern cells' activity in response to environmental conditions

[Figure: apoptosis pathway]


Advantages

- High-throughput sequencing
- No bacterial cloning of cDNA needed (no bacterial cloning constraints)
- High coverage
- Can discover new exons and genes
- Can handle RNA splicing
- Low cost


Pipeline

1. Poly(A) selection of RNA
2. Fragmentation of RNA to an average length
3. Conversion into cDNA
4. Sequencing
5. Mapping of the reads onto the genome
6. Calculation of the transcript prevalence


Overview

Mapping

- Read mapping is the semi-global alignment of many (very) short sequences (reads) to a long sequence (the genome)
- Alignment: approximate string matching
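To make "approximate string matching" concrete, here is a minimal sketch in Python. It is a deliberate simplification, not what any of the tools discussed here do: it allows mismatches only (no gaps) and scans the genome naively. The function name `map_read` and its parameters are assumptions made for this illustration.

```python
def map_read(genome, read, max_mismatches):
    """Return (start, mismatches) for every ungapped placement of `read`
    in `genome` that has at most `max_mismatches` mismatching positions."""
    hits = []
    # The read must be aligned completely; the ends of the genome are free.
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        # Hamming distance between the read and this genome window.
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(map_read("ACGTACGTTACG", "ACGA", 1))  # [(0, 1), (4, 1)]
```

The scan costs time proportional to genome length times read length for each read, which is why real read mappers rely on an index or a filter instead.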


Important Questions

Questions regarding the choice of algorithm and implementation:

Input
- How long are the reads? How many are there?
- How big is the genome?
- What technology was used?
- Is there a known error profile, or are there quality values?

Output
- Gapped or ungapped alignments?
- What kind of scoring / distance measure?


Tools

Tools designed for short sequences (limited number of mismatches, short gaps):

- Bowtie
- RazerS
- SOAP
- MAQ
- ZOOM


Two main phases

1. Filtration / non-filtration (other preprocessing)
2. Verification

⇒ Different (combinations of) algorithms are available


Outline - Lectures

Lecture 1: Non-filtering algorithm
- Bowtie
  - BWT

Lecture 2: Filtering algorithm
- Filtering
  - Pigeonhole
  - q-gram
- RazerS

Lecture 3: Repeat Resolution


An introduction

The paper


Bowtie

- Fast and memory-efficient
- Burrows-Wheeler indexing → aligns more than 25 million reads per CPU hour (memory footprint approx. 1.3 GB)
- Extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches
- Mismatches → backtracking


Bowtie

Goal? → map reads back to the reference genome

- Exact match
- Inexact match


Exact Match

→ Exact string matching
→ EXACTMATCH


Inexact Match

- Mismatches or indels
- Backtracking


Efficiency

- Compress the genome → space efficient
- Index the genome → fast search
- Ideas? → suffix array
- Or even better → Burrows-Wheeler indexing (BWT)


Short revision

Suffix Arrays

- An array of integers
- Gives the starting positions of the suffixes of a string in lexicographical order
- A very space-economical data structure


Suffix Arrays: Example

Consider the string "abracadabra$":

index:   1  2  3  4  5  6  7  8  9 10 11 12
string:  a  b  r  a  c  a  d  a  b  r  a  $

It has twelve suffixes: abracadabra$, bracadabra$, racadabra$, ..., a$, $


Suffix Arrays: Example (sorted suffixes)

index  sorted suffix
12     $
11     a$
8      abra$
1      abracadabra$
4      acadabra$
6      adabra$
9      bra$
2      bracadabra$
5      cadabra$
7      dabra$
10     ra$
3      racadabra$


⇒ For the string "abracadabra$", the suffix array is [12, 11, 8, 1, 4, 6, 9, 2, 5, 7, 10, 3].
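The suffix array of the example string can be reproduced with a few lines of Python. This is a naive sketch for illustration only: it sorts the suffixes directly, which is far too slow and memory-hungry for a genome. The function name `suffix_array` is an assumption, and positions are 1-based to match the slides. It also relies on `$` sorting before the letters, which holds for ASCII.

```python
def suffix_array(text):
    """Return the 1-based start positions of all suffixes of `text`,
    ordered by the lexicographical order of the suffixes."""
    # text[i - 1:] is the suffix that starts at 1-based position i.
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


print(suffix_array("abracadabra$"))
# [12, 11, 8, 1, 4, 6, 9, 2, 5, 7, 10, 3]
```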


Burrows-Wheeler transformation

- Published by Michael Burrows and David Wheeler in 1994
- A reversible transformation of an input text
- Performs no data compression itself, but can lead to more effective compression


Principle of BWT

- Input: string S of length N over an ordered alphabet Σ
  - e.g. S = mississippi, N = 11, Σ = {i, m, p, s}
- Output: string L of length N + 1 over the ordered alphabet Σ ∪ {special character}, together with an index I
  - e.g. L = ipssm#pissii, I = 5


Principle of BWT: 1. sort rotations

- Append a special character ∉ Σ, such as #, to S; # is lexicographically minimal
- Build a conceptual (N + 1) × (N + 1) matrix M whose rows are the cyclic shifts of S#
- Sort the rows lexicographically


Principle of BWT: 2. find last characters of rotations

- The strings L and F are the last and the first column of matrix M, respectively, e.g. L = ipssm#pissii
- The character L[i] is the (cyclic) predecessor in S# of the character F[i]


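Principle of BWT: 2. find last characters of rotations

- Rank-preserving property: for two occurrences c1 and c2 of the same character, pos_L(c1) < pos_L(c2) ⇔ pos_F(c1) < pos_F(c2), where pos_X(c) is the position of character c in string X
- The index I is the row number of the input followed by the special character, i.e. of S#, with rows counted from 0 (e.g. I = 5)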
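The two steps (sort the rotations, then read off the last column) can be sketched directly in Python. This follows the conceptual matrix M literally and therefore needs quadratic space. It is meant only to reproduce the mississippi example, not to show how an aligner builds its index. The function name `bwt` and the `sentinel` parameter are assumptions. The sketch also assumes that the sentinel compares smaller than every character of the input, which is true for `#` and lowercase letters.

```python
def bwt(s, sentinel="#"):
    """Return (L, I): the last column of the sorted rotation matrix of
    s + sentinel, and the 0-based row index at which s + sentinel occurs."""
    if sentinel in s:
        raise ValueError("the sentinel must not occur in the input")
    text = s + sentinel
    # Step 1: all cyclic shifts of S#, sorted lexicographically (matrix M).
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # Step 2: L is the last character of every row; I is the row holding S#.
    last_column = "".join(row[-1] for row in rotations)
    return last_column, rotations.index(text)


print(bwt("mississippi"))  # ('ipssm#pissii', 5)
```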


Principle of BWT

[Fig. 7: Part of a BWT output example from the original BWT paper.]


Advantages of BWT

- Because a large input S contains many repetitions, the string L has runs of identical characters
  → good for compression algorithms (may be discussed in the next lecture)
- The BWT stores a string of characters instead of integer indices as a suffix array does → less space for large text inputs such as DNA


Retransformation

- Input: string L of length N + 1 over the ordered alphabet Σ ∪ {special character}, and the index I
  - e.g. L = ipssm#pissii, N = 11, Σ = {i, m, p, s}, special character = # and I = 5
- Output: string S of length N over the ordered alphabet Σ
  - e.g. S = mississippi


Retransformation

1. Find F using L

- Every column of matrix M is a permutation of the string S#
- The first column F and the last column L are therefore also permutations of S# and of one another
- M has sorted rows and F is its first column ⇒ F is sorted
- Get F by sorting L

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 58: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

Retransformation

Retransformation

1. find F using LI every column in matrix M is

permutation of string S#

I first column F and last column Lalso permutations of S# and ofone another

I M has sorted rows and F is firstcolumn⇒ F is sorted

I get F by sorting L

2. F to L mapping
- build mapping vector T where T[i] is the position of the character in L corresponding to character F[i] (bijective mapping)
- e.g.: T = {5,0,7,10,11,4,1,6,2,3,8,9} (rank preserving!)
- recover S by starting at the position of # in L; # is the last character of S#
- the successor of # is the character of F in the same row
- e.g.: m is the first character of S
- use mapping vector T to get back from m in F to m in L
- jump from the last character of this row to its first character to get the next successor character of S
- e.g.: we get mi
- use mapping vector T to get back from i in F to i in L, then jump to the front
- e.g.: we get mis
- go on until # in F is reached
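The two retransformation steps can be sketched in Python as follows. This is a minimal illustration, not the implementation used by any particular tool; the function name `inverse_bwt` is hypothetical, indices are 0-based as in the example vector T, and the sketch assumes the special character sorts before every character of Σ (true for `#` and lowercase letters in ASCII).

```python
def inverse_bwt(L, sentinel="#"):
    """Rebuild S from the last column L; returns (S, T)."""
    # Step 1: F is L sorted. A stable sort of the positions of L gives the
    # mapping vector directly: T[i] is the position in L of character F[i].
    # Stability keeps equal characters in their original order (rank preserving).
    T = sorted(range(len(L)), key=lambda k: L[k])
    F = [L[k] for k in T]

    # Step 2: start in the row where L holds the sentinel; the character of F
    # in that row is the first character of S. Then follow T until # shows up in F.
    row = L.index(sentinel)
    out = []
    while F[row] != sentinel:
        out.append(F[row])
        row = T[row]  # from this character in F to the same character in L
    return "".join(out), T


S, T = inverse_bwt("ipssm#pissii")
print(S)  # mississippi
print(T)  # [5, 0, 7, 10, 11, 4, 1, 6, 2, 3, 8, 9]
```

Sorting costs O(N log N) here; a counting sort over the small alphabet would bring step 1 down to linear time.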

Exactmatch

Where are we?

We have
- Reads
- Index of the reference genome (BWT)

We want
- Exact matches

FM-Index: Full-Text Index in Minute Space

- Motivation: finding reads by looking at only a small portion of the compressed text
- Use the relation between the Burrows-Wheeler compression algorithm and the suffix array data structure
- FM-Index = compressed suffix array that connects both the compressed text and the full-text indexing information
- Supports two basic operations:
  - Count: return the number of occurrences of P in T
  - Locate: find all positions of P in T

Backward-search algorithm

= Algorithm that counts the number of occurrences of P in T

- Input: L, C, Occ()
- Output: number of occurrences

Example:
- si → 2
- pssi → 0

- C[1,...,|Σ|]: C[c] contains the total number of characters in T which are alphabetically smaller than c (including character repetitions)

Example
- C[] for mississippi#
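The C array for the example text can be computed with a short sketch like the one below. The helper name `build_C` is hypothetical, and the code assumes that `#` sorts before all letters, which holds in ASCII.

```python
from collections import Counter


def build_C(T):
    """C[c] = number of characters in T that are strictly smaller than c."""
    counts = Counter(T)
    C, total = {}, 0
    for ch in sorted(counts):  # walk the alphabet in sorted order
        C[ch] = total          # everything counted so far is smaller than ch
        total += counts[ch]
    return C


print(build_C("mississippi#"))
# {'#': 0, 'i': 1, 'm': 5, 'p': 6, 's': 8}
```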

- Occ(c,q): number of occurrences of character c in prefix L[1,q]

Example

Occ(s,5) = 2
Occ(s,12) = 4
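One simple way to answer Occ(c,q) in constant time is a table of prefix counts, sketched below. The name `build_occ_table` is hypothetical, and the full table is only an illustration: it needs |Σ| · (N + 1) integers, so a real index would presumably store only part of it and count the rest on demand.

```python
def build_occ_table(L):
    """table[c][q] = number of occurrences of c in the prefix L[1..q] (1-based)."""
    alphabet = sorted(set(L))
    table = {c: [0] * (len(L) + 1) for c in alphabet}
    for q, ch in enumerate(L, start=1):
        for c in alphabet:
            # carry the previous count forward, adding one for the character at q
            table[c][q] = table[c][q - 1] + (1 if c == ch else 0)
    return table


occ = build_occ_table("ipssm#pissii")
print(occ["s"][5], occ["s"][12])  # 2 4
```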

L-to-F Mapping

- Formula to determine the position of character L[i] in F:
  pos(i) = C[L[i]] + Occ(L[i], i)
- Why?
  - C[L[i]] = number of characters before the first L[i] in F
  - Occ(L[i], i) = number of occurrences of character L[i] in string L[1,i]
- Example: i = 9
  Occ(s, 9) = 3
  pos(9) = C[s] + Occ(s,9) = 8 + 3 = 11
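The formula can be checked on the example with a few lines of Python. This is a naive sketch: the function name `lf` is hypothetical, C and Occ are recomputed by scanning L on every call, and `#` is assumed to sort before all letters.

```python
def lf(L, i):
    """Map 1-based row i of column L to the row of the same character in column F."""
    c = L[i - 1]                            # L[i] in the slides' 1-based notation
    smaller = sum(1 for ch in L if ch < c)  # C[L[i]]
    rank = L[:i].count(c)                   # Occ(L[i], i)
    return smaller + rank


L = "ipssm#pissii"
print(lf(L, 9))  # 11
```

Applied to every row, `lf` yields each row number exactly once, which reflects that the mapping between L and F is a bijection.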

Backward-search algorithm: How do we find a pattern of length 1?

- Remember: BWT matrix rows = sorted suffixes of T
- All suffixes prefixed by P occur in a consecutive set of rows
- Possible to calculate the first and last position
- Occurrences = Last - First + 1
- How do we search?
  - Use C
  - Find First and Last
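For a single character the row range follows from character counts alone, as the sketch below illustrates. The function name `single_char_range` is hypothetical; rows are 1-based, and `#` is assumed to sort before all letters.

```python
def single_char_range(T, c):
    """Return the 1-based (First, Last) rows starting with c, or None if c is absent."""
    smaller = sum(1 for ch in T if ch < c)  # C[c]
    count = T.count(c)
    if count == 0:
        return None
    # First = C[c] + 1; Last = C[c] + count, i.e. C of the next larger character
    return smaller + 1, smaller + count


print(single_char_range("mississippi#", "s"))  # (9, 12)
print(single_char_range("mississippi#", "i"))  # (2, 5)
```

With First = 9 and Last = 12, the character s occurs Last - First + 1 = 4 times.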

Backward-search algorithm: How do we find a pattern ssi?

- Find First and Last of P[j]
- Determine L[First, Last]
- Find the first P[j-1] in L[First, Last]
- Find the last P[j-1] in L[First, Last]
- Use L-to-F mapping
- j = j-1 → repetition
- Termination by finding ssi
- Occurrences = 12 - 11 + 1 = 2

Algorithm backward_search(P[1,p])

i ← p, c ← P[p], First ← C[c]+1, Last ← C[c+1]
while ((First ≤ Last) and (i ≥ 2)) do
    c ← P[i-1];
    First ← C[c] + Occ(c, First-1) + 1;
    Last ← C[c] + Occ(c, Last);
    i ← i-1;
if (Last < First) then return "no rows prefixed by P[1,p]"
else return ⟨First, Last⟩

- Here c+1 denotes the character that follows c in the ordered alphabet
- Running time: O(p)
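The pseudocode translates almost line by line into Python. The sketch below is an illustration under stated assumptions, not Bowtie's code: Occ is computed by scanning L, so each step costs O(N) instead of O(1), rows are 1-based as in the slides, and `#` is assumed to sort before all letters.

```python
def backward_search(L, P):
    """Return the 1-based row range (First, Last) of pattern P, or None if P is absent."""
    # C[c]: number of characters in the text that are smaller than c
    C, total = {}, 0
    for ch in sorted(set(L)):
        C[ch] = total
        total += L.count(ch)

    def occ(c, q):
        return L[:q].count(c)  # occurrences of c in L[1..q]

    c = P[-1]
    if c not in C:
        return None
    first, last = C[c] + 1, C[c] + L.count(c)  # last equals C[c+1]
    i = len(P)
    while first <= last and i >= 2:
        c = P[i - 2]  # P[i-1] in 1-based notation
        if c not in C:
            return None
        first = C[c] + occ(c, first - 1) + 1
        last = C[c] + occ(c, last)
        i -= 1
    return (first, last) if first <= last else None


L = "ipssm#pissii"
print(backward_search(L, "ssi"))   # (11, 12)
print(backward_search(L, "si"))    # (9, 10)
print(backward_search(L, "pssi"))  # None
```

The pattern is consumed from its last character to its first, which is why the search is called backward.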

Backward-step algorithm

Algorithm to locate P in T

- Every row i = First, First+1, ..., Last represents one occurrence of the pattern
- Possible because all characters have the same order in L and F
- Locate the position in T for every i

Algorithm to locate P in T: Backward-step

- Locate the position posT(i) in T for every i
- Problem: can't find posT(i) in T directly. But: it's possible to find it indirectly
- Given index j and its position in T → can find index i such that posT(i) = posT(j) − 1
- Backward_step algorithm
- How does it work?
  - You have: j and posT(j), a compressed L
  - You want: index i of L of T[posT(j)-1]
- Determine L[j]: compare Occ(c, j) with Occ(c, j-1) for every c ∈ Σ ∪ {#}
- Use L-to-F mapping: i = pos(j) = C[L[j]] + Occ(L[j], j)
- Calling Occ() is O(1) → Backward_step in O(1)
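A backward step that never reads L directly, only C and Occ, might look like the sketch below. The names `make_index` and `backward_step` are hypothetical, the full prefix-count table stands in for the compressed representation of L, and rows are 1-based.

```python
def make_index(L):
    """Build (alphabet, C, occ) with occ[c][q] = occurrences of c in L[1..q]."""
    alphabet = sorted(set(L))
    occ = {c: [0] for c in alphabet}
    for ch in L:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (1 if c == ch else 0))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += occ[c][-1]
    return alphabet, C, occ


def backward_step(index, j):
    """Return the row i whose suffix starts one text position before that of row j."""
    alphabet, C, occ = index
    # L[j] is the only character whose count changes between prefix j-1 and prefix j
    c = next(ch for ch in alphabet if occ[ch][j] != occ[ch][j - 1])
    return C[c] + occ[c][j]  # L-to-F mapping


index = make_index("ipssm#pissii")
print(backward_step(index, 9))  # 11
```

The comparison loops over the alphabet, so one step takes O(|Σ|) table lookups, which is constant for a fixed alphabet such as DNA.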

Algorithm to locate P in T: Preprocessing

- Mark every x-th character from T and its corresponding position in L
- Store them in S
- Example: pos(r3) = 8
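The marking step can be sketched as follows. The function name `mark_rows` is hypothetical, and the sampling distance x = 4 is an assumption chosen because it reproduces the marked rows of the example; the suffixes are sorted naively, which is fine for a toy text but far too slow for a genome.

```python
def mark_rows(T, x):
    """Map 1-based row -> 1-based text position for every x-th position of T."""
    # Row r of the BWT matrix starts with the r-th smallest suffix of T.
    rows = sorted(range(len(T)), key=lambda k: T[k:])
    return {r + 1: k + 1 for r, k in enumerate(rows) if (k + 1) % x == 0}


marks = mark_rows("mississippi#", 4)
print(marks)  # {1: 12, 3: 8, 10: 4}
```

Row 3 maps to text position 8, matching pos(r3) = 8. A smaller x means more stored positions but fewer backward steps per lookup.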

Algorithm to locate P in T

- Find position of row i:
  - If ri is marked, then return pos(ri) from S
  - Otherwise use Backward_step(i)
  - Repeat t times until you find a marked row j
  - posT(i) = posT(j) + t

Algorithm to locate P in T: Finding si

- Determine all positions for i = First, First+1, ..., Last
- First = 9, Last = 10
- i = 10
  - r10 is marked
  - pos(10) = 4
- i = 9
  - r9 is not marked
  - Backward_step(9), t = 1
  - r11 is not marked
  - Backward_step(11), t = 2
  - r4 is not marked
  - Backward_step(4), t = 3
  - r10 is marked
  - pos(9) = pos(10) + 3 = 4 + 3 = 7

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 114: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

Backward-step algorithm

Algorithm to locate P in TFinding si

I i = 9I r9 is not markedI Backward_step(9), t=1I r11 is not markedI Backward_step(11), t=2I r4 is not markedI Backward_step(4), t=3I r10 is markedI pos(9) = pos(10) +3 =

4 + 3 = 7

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 115: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

Backward-step algorithm

Algorithm to locate P in TFinding si

I i = 9I r9 is not markedI Backward_step(9), t=1I r11 is not markedI Backward_step(11), t=2I r4 is not markedI Backward_step(4), t=3I r10 is markedI pos(9) = pos(10) +3 =

4 + 3 = 7

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 116: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

Backward-step algorithm

Algorithm to locate P in TFinding si

I i = 9I r9 is not markedI Backward_step(9), t=1I r11 is not markedI Backward_step(11), t=2I r4 is not markedI Backward_step(4), t=3I r10 is markedI pos(9) = pos(10) +3 =

4 + 3 = 7

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 117: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

How are mismatches handled?

OutlineReview RNA-Seq

Why do we use RNA-Seq?Read MappingBowtieBurrows-Wheeler transformation

Principle of BWTRetransformation

ExactmatchBackward-search algorithmBackward-step algorithm

BacktrackingHow are mismatches handled?Excessive backtracking

N. Balcazar, C. Blasse, D. Dang, H. Hauswedell, S. Thieme

RNA-Seq

Page 118: RNA-Sequencing - Freie Universität · Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking RNA-Sequencing Nicolas Balcazar Corinna Blasse An

Review RNA-Seq Read Mapping Bowtie Burrows-Wheeler transformation Exactmatch Backtracking

How are mismatches handled?

Mismatches

Due to:
- sequencing errors
- differences between reference and query organisms

EXACTMATCH alone is therefore insufficient.


Backtracking algorithm

- Greedy
- Depth-first search
- Allowed mismatches ↑ ⇒ runtime ↑
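The three points above can be illustrated with a small Python sketch of a greedy, depth-first backward search that tolerates substitutions. It is a simplified illustration under stated assumptions, not Bowtie's actual aligner: it ignores base qualities, collects every hit instead of stopping at the first one, and the names `fm_index` and `search` are hypothetical. Positions are 0-based.

```python
def fm_index(text):
    """Build a toy FM index: suffix array, C table and occurrence counts."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda k: text[k:])
    bwt = "".join(text[k - 1] for k in sa)
    alphabet = sorted(set(text))
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total                      # characters smaller than ch
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, C, occ


def search(pattern, sa, C, occ, max_mismatches):
    """Return sorted start positions matching with at most max_mismatches."""
    hits = set()

    def extend(i, lo, hi, budget):
        if lo >= hi:                       # empty range: dead end, backtrack
            return
        if i < 0:                          # whole pattern consumed
            hits.update(sa[lo:hi])
            return
        # Greedy: try the read's own base first, then the substitutions.
        candidates = [pattern[i]] + [c for c in "ACGT" if c != pattern[i]]
        for c in candidates:
            cost = 0 if c == pattern[i] else 1
            if cost > budget or c not in C:
                continue
            extend(i - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], budget - cost)

    extend(len(pattern) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)
```

With a budget of zero the search follows a single path, exactly like exact matching. Every additional allowed mismatch lets each position branch into up to three substitutions, which is why the runtime grows with the number of allowed mismatches.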


Example: ACGCACGTACGG

[Figure: tree built over the example text, with a root and nodes numbered 1 to 11; edge labels include ACG, C, G, TACGG, ACGTACGG, and CACGTACGG.]


Example query: ggta [figure not preserved]


Excessive backtracking

- Avoid excessive backtracking
- Solution → double indexing
- and a backtracking ceiling
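A backtracking ceiling can be sketched as a counter that aborts the depth-first search once too many branches have failed. The Python sketch below is illustrative only: what exactly counts as one backtrack is an assumption made here (every failed branch counts), and `build_fm`, `search_with_ceiling`, `max_backtracks` and `CeilingReached` are hypothetical names, not Bowtie's API. Double indexing is not shown.

```python
class CeilingReached(Exception):
    """Raised when the search exceeds its backtrack budget."""


def build_fm(text):
    """Toy FM index: suffix array, C table and occurrence counts."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda k: text[k:])
    bwt = [text[k - 1] for k in sa]
    C = {ch: sum(x < ch for x in text) for ch in set(text)}
    occ = {ch: [0] for ch in C}            # occ[ch][i] = count of ch in bwt[:i]
    for b in bwt:
        for ch in occ:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, C, occ


def search_with_ceiling(read, fm, max_mismatches, max_backtracks):
    """Return one hit position, or None if nothing is found within the ceiling."""
    sa, C, occ = fm
    backtracks = 0

    def extend(i, lo, hi, budget):
        nonlocal backtracks
        if i < 0:
            return sa[lo]                  # report the first hit and stop
        for c in [read[i]] + [x for x in "ACGT" if x != read[i]]:
            cost = 0 if c == read[i] else 1
            if c not in C or cost > budget:
                continue
            new_lo, new_hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
            if new_lo < new_hi:
                hit = extend(i - 1, new_lo, new_hi, budget - cost)
                if hit is not None:
                    return hit
            backtracks += 1                # this branch failed (assumed unit)
            if backtracks > max_backtracks:
                raise CeilingReached
        return None

    try:
        return extend(len(read) - 1, 0, len(sa), max_mismatches)
    except CeilingReached:
        return None
```

The ceiling trades sensitivity for speed: a read that would need many failed branches before a valid alignment is found is reported as unaligned instead of consuming unbounded search time.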


Appendix

Sources

Ben Langmead, Cole Trapnell, Mihai Pop & Steven L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.

M. Burrows & D.J. Wheeler. A block-sorting lossless data compression algorithm. Digital SRC Research Report, 1994.


Thank you


Questions?
