Foldalign 2.2: multithreaded structural RNA alignment

Post on 20-Apr-2022

5 views 0 download

Transcript of Foldalign 2.2: multithreaded structural RNA alignment

Foldalign 2.2: multithreaded structural RNA alignment

Daniel Sundfeld1,3 Jakob Havgaard1,2 Alba C. M. A. de Melo3

Jan Gorodkin1,2

1Center for non-coding RNA in Technology and Health

2IKVHUniversity of Copenhagen

3Department of Computer ScienceUniversity of Brasilia

19/02/2015 - Winterseminar

Sundfeld et al Foldalign 2.2 19/02/2015 1 / 16

Foldalign

*

Algorithms for simultaneously aligning and folding:

the probabilistic methods: usually based on Stochastic ContextFree Grammars, that use statistical learning to creates a solutionbased on previously known answers;

Sankoff-like methods: use some method of minimization of the freeenergy to calculates how elements of an RNA structure contribute tofree energy.

*M. Ziv-Ukelson, I. Gat-Viks, Y. Wexler, and R. Shamir. A faster algorithm for rna co-folding. In Proceedings of the 8thInternational Workshop on Algorithms in Bioinformatics, pages 174–185, 2008.

Sundfeld et al Foldalign 2.2 19/02/2015 2 / 16

Foldalign

The time complexity O(L6), where L is the sequence length of thesequences, makes the Sankoff algorithm prohibited for long sequences.Foldalign use several constraints to reduce the time requirement:

The final alignment is limited to λ nucleotides, and the maximumlength difference between any two subsequences being aligned islimited to δ nucleotides;

Subalignments with score bellow to a threshold are eliminated.

With this constraints, the time complexity is reduced to O(L2λ2δ2)

Sundfeld et al Foldalign 2.2 19/02/2015 3 / 16

FoldalignFoldalign recursion function.

The simplified Foldalign recursion:

Di,j,k,l = max

Di+1,j−1,k+1,l−1 + Sbp(ni , nj , nk , nl , σbp) (a)

Di+1,j−1,k,l + SbpiI (ni , nj ,−,−, σbpiI ) (b)

Di,j,k+1,l−1 + SbpiK (−,−, nk , nl , σbpiK ) (c)

Di+1,j,k+1,l + Sal (ni , nk , σal ) (d)

Di,j−1,k,l−1 + Sar (nj , nl , σar ) (e)

Di+1,j,k,l + SglI (ni ,−, σglI ) (f)

Di,j−1,k,l + SglrI (nj ,−, σglrI ) (g)

Di,j,k+1,l + SglK (−, nk , σglK ) (h)

Di,j,k,l−1 + SgrK (−, nl , σgrK ) (i)max

i<m<jk<n<l

{Di,m,k,n + Dm+1,j,n+1,l + σmbl} (j)

(1)

Sundfeld et al Foldalign 2.2 19/02/2015 4 / 16

FoldalignGraphical Representantion

J. Gorodkin and W. L. Ruzzo. RNA Sequence, Structure and Function: Computational and Bioinformatic Methods. HumanaPress, 2014.

Sundfeld et al Foldalign 2.2 19/02/2015 5 / 16

FoldalignFoldalign algorithm.

1: for i = LengthI → 1 do

2: for k = LengthK → 1 do3: for j = 1→ λ do

4: for l = (j − δ)→ (j + δ) do

5: if Msi,j,k,l can be the right part of bifurcation then6: Mli,j,k,l = Msi,j,k,l7: end if8: Expand Position(i , j , k, l)9: for Wm = 1→ (λ− j) do

10: for Wn = (Wm − δ)→ (Wm + δ) do

11: Expand MB(i , j , k, l ,Wm,Wn)12: end for13: end for14: end for15: end for16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 6 / 16

Parallel FoldalignParallel algorithm.

The multithreading algorithm is based on the fact that cells in the i loopmight be calculated in parallel. Some data dependency exist inside theloop, conditions are added to avoid race conditions.

Nthreads are created, each one have different i values:

Thread 1:

1: for i = LI → 1 step Nthreads do2: for k = LK → 1 do3: (...)

16: end for17: end for

Thread 2:

1: for i = (LI − 1)→ 1 step Nthreads do2: for k = LK → 1 do3: (...)

16: end for17: end for

(...)

Sundfeld et al Foldalign 2.2 19/02/2015 7 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 20 do2: for k = 20 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 20 do2: for k = 19 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 20 do2: for k = ... do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 20 do2: for k = 1 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 19 do2: for k = 20 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Single Thread:

1: for i = 19 do2: for k = 19 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 8 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 20 step 2 do2: for k = 20 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 19 step 2 do2: for k = waiting do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 20 step 2 do2: for k = 19 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 19 step 2 do2: for k = waiting do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 20 step 2 do2: for k = 18 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 19 step 2 do2: for k = 20 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 20 step 2 do2: for k = (...) do3: (...)

16: end for17: end for

Thread 2:

1: for i = 19 step 2 do2: for k = (...) do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 18 step 2 do2: for k = 20 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 17 step 2 do2: for k = waiting do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 18 step 2 do2: for k = 19 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 17 step 2 do2: for k = waiting do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

FoldalignExecution sample.

Thread 1:

1: for i = 18 step 2 do2: for k = 18 do3: (...)

16: end for17: end for

Thread 2:

1: for i = 17 step 2 do2: for k = 20 do3: (...)

16: end for17: end for

Sundfeld et al Foldalign 2.2 19/02/2015 9 / 16

Parallel FoldalignParallel algorithm.

Other mechanisms are necessary to the final implementation:

Threadpoll: To avoid thread creation overhead, a fixed number ofthreads are created and each thread wait for a i value.

Protection: The global memory need protection mechanisms toavoid simultaneous writes.

and other implementation details...

Sundfeld et al Foldalign 2.2 19/02/2015 10 / 16

ResultsMultithreaded Programming: Expectation vs. Reality

Sundfeld et al Foldalign 2.2 19/02/2015 11 / 16

ResultsRandom Sequences Performance

Sundfeld et al Foldalign 2.2 19/02/2015 12 / 16

ResultsReal Sequences Performance

16S Ribosomal RNA from the Gutell websitehttp://www.rna.icmb.utexas.edu/(|S1| = 2315, |S2| = 2019, λ = 2019, δ = 100).

Sundfeld et al Foldalign 2.2 19/02/2015 13 / 16

Conclusion

A multithreaded version of Foldalign is now available.The implementation were carefully designed to keep all previous functions,and it open new possibilities to search for structured RNAs in much longersequences in reasonable time, and in our experiments, it was possible tosearch 2.5-2.7 times faster with random sequences, and 4.18 times fasterwith 8 cores, 5.03 times faster with 14 cores.

Sundfeld et al Foldalign 2.2 19/02/2015 14 / 16

Acknowledgements

Sundfeld et al Foldalign 2.2 19/02/2015 15 / 16

Thank you!Questions?

Sundfeld et al Foldalign 2.2 19/02/2015 16 / 16