Computer Processing for Amyotrophic Lateral Sclerosis On Parallelizing a Dynamic Programming Algorithm for RNA Folding
  • date post

    21-Dec-2015

Page 1: Computer Processing for Amyotrophic Lateral Sclerosis On Parallelizing a Dynamic Programming Algorithm for RNA Folding.

Computer Processing for Amyotrophic Lateral Sclerosis

On Parallelizing a Dynamic Programming Algorithm

for RNA Folding

Outline

• Work to be Undertaken

• Background and Significance
  – Expected Significance
  – Relation to Class Materials
  – Relation to Present State of Knowledge

• Preliminary Studies / Progress Report
  – General Plan of the Work
  – Broad Design of Activities to be Undertaken

• Project Design and Methods

Work to be Undertaken

• Take existing parallel code.

• Modify it for use with MPI.

• Find or create code to measure Altix performance
  – Time complexity
  – Message complexity

• See whether analysis in a published article matches experimental measurements of complexity.

• See whether knowledge about data dependencies in the algorithm can be used to design localization, and thereby improve parallel performance by reducing unnecessary communication.
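
One way to compare a published complexity analysis with measurements is to time the program at several sequence lengths and estimate the growth exponent from a log-log fit. The sketch below is only an illustration under stated assumptions: the function name `estimate_exponent` is hypothetical, and the timings are made-up values that grow exactly as N^2, not measurements from the Altix.

```python
import math

def estimate_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    If time grows like c * N**k, the slope approximates k.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical timings that grow exactly as N**2 (made-up data, not measurements).
sizes = [100, 200, 400, 800]
times = [3e-6 * n ** 2 for n in sizes]
exponent = estimate_exponent(sizes, times)
```

With real timings the fitted exponent would be noisy, so it would be compared against the published bound only approximately.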

Background and Significance

• A physician who was very kind to me had this disease. I would like to be of service in helping people with this disease.

• There is increasing evidence for RNA processing problems causing motor neuron degeneration.

• (ALSR Today, 2008, Vol. 4, Advances in ALS Genetics, Ammar Al-Chalabi, Ph.D., F.R.C.P.)

RNA Processing

• RNA is created when genes are transcribed. (“Central dogma”, Watson and Crick)

• “When termination finally occurs, the RNA transcript is released from the DNA template and in the case of eukaryotes it is rapidly processed.” (Proudfoot and Whitelaw, p. 97)

• 3’ end processing has ...role in regulation of gene expression (ibid., p. 98)

• “RNA splicing is a series of cleavage and ligation reactions that result in the precision excision of introns from the precursor RNA”, p. 131 (Krainer and Maniatis)

• “it appears that recognition of splice sites by the tRNA splicing enzymes is based primarily on common structural features of the exons, and on the conserved position of the intron”, p. 133

RNA Processing and Folding

• “the presence of the splice sites in single-stranded loops are characteristic features of S. cerevisiae pre-tRNAs, but these features are not required for splicing” (ibid., p. 132)

Background and Significance

• The remaining obstacles are therefore the statistical methods needed to analyze the data, the huge computing resources needed to handle billions of DNA results in thousands of people, and the money required to finance the research.

• (ALSR Today, 2008, Vol. 4, Advances in ALS Genetics, Ammar Al-Chalabi, Ph.D., F.R.C.P.)

Computational Aspects

• “Many algorithms have been developed for the inference and (database) similarity search of RNA secondary structure. However, the execution time (and memory requirement) is often a polynomial with a degree as high as 6. This complexity limits the application of these algorithms. For instance, Baird et al. estimated that it would take 6 months to search for a single regulatory element in a database of 20,000 entries (untranslated regions) [2].” (Ogoubi et al.)

Background of Computation

• RNA folding is predicted with several algorithms, at least one of which is a dynamic programming algorithm.

• Dynamic programming algorithms break a larger problem into smaller subproblems, compute the value of each subproblem once, and store these values in an array (matrix) for reuse.

• Often values stored in different parts of the array can be computed in parallel, for example the elements along one diagonal of the matrix might be independent of one another.
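
The diagonal independence described above can be sketched with a small example. The code below is an assumption-laden illustration, not the algorithm from the cited paper: it uses a simplified Nussinov-style base-pair maximization (no energy model, no minimum loop length), and the names `can_pair`, `fill_by_diagonals`, and `max_pairs` are hypothetical. Every cell on diagonal d reads only cells from shorter diagonals, so the inner loop over i is the part that could be divided among processors.

```python
def can_pair(a, b):
    """Watson-Crick and G-U wobble pairs (simplified pairing rule)."""
    return (a, b) in {("A", "U"), ("U", "A"), ("G", "C"),
                      ("C", "G"), ("G", "U"), ("U", "G")}

def fill_by_diagonals(seq):
    """Fill a base-pair maximization table one diagonal at a time.

    table[i][j] holds the maximum number of pairs within seq[i..j].
    """
    n = len(seq)
    table = [[0] * n for _ in range(n)]
    for d in range(1, n):              # d = j - i; one diagonal per outer step
        for i in range(n - d):         # cells on a diagonal are independent
            j = i + d
            best = max(table[i + 1][j], table[i][j - 1])
            if can_pair(seq[i], seq[j]):
                inner = table[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            for k in range(i + 1, j):  # split the interval into two parts
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table

def max_pairs(seq):
    """Maximum number of base pairs for the whole sequence."""
    if not seq:
        return 0
    return fill_by_diagonals(seq)[0][len(seq) - 1]
```

For example, `max_pairs("GGCC")` finds the two nested G-C pairs.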

Time Complexity

According to Ogoubi et al., “Many algorithms have been developed for the inference and (database) similarity search of RNA secondary structure. However, the execution time (and memory requirement) is often a polynomial with a degree as high as 6. This complexity limits the application of these algorithms. For instance, Baird et al. estimated that it would take 6 months to search for a single regulatory element in a database of 20,000 entries (untranslated regions) [2].”

Parallelized Dynamic Programming

• Dynamic programming has a matrix creation/fill phase, and a readback phase.

• The readback phase is sequential, but it is linear.

• According to Ogoubi et al., the “execution time of the fill stage of the RNA folding algorithm can be done in O(N^2).”
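
One way to see how a quadratic fill time could arise is to count steps under simplifying assumptions that are not taken from the paper: a cell on diagonal d examines d split points, there are enough processors to compute a whole diagonal at once, and communication is free. The hypothetical helper `fill_costs` below returns the total work and the length of the critical path under those assumptions.

```python
def fill_costs(n):
    """Return (total_work, critical_path) for an n-by-n triangular fill.

    Assumption: a cell (i, i + d) costs d steps, one per split point.
    """
    total_work = 0
    critical_path = 0
    for d in range(1, n):
        cells_on_diagonal = n - d
        total_work += cells_on_diagonal * d   # sequential cost of this diagonal
        critical_path += d                    # parallel cost: one cell's worth
    return total_work, critical_path

work, path = fill_costs(100)
```

Under these assumptions the total work is n(n-1)(n+1)/6, which grows as N^3, while the critical path is n(n-1)/2, which grows as N^2.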

Dynamic Programming

• The dependencies of one cell of the matrix upon previously computed cells form a pattern known at compile time.

• This pattern might be represented as a graph.

• If we were to imagine a fine-grained case where each cell of the matrix was computed separately, the dependencies would result in interprocessor communication along the graph edges.

• When we consider how most efficiently to deploy the individual cell computations onto coarser-grained processors, we would consider this dependency graph, trying to form subsets of the vertices that enclose, within each subset boundary, as much of the flow on the graph as possible.
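
The partitioning idea can be made concrete by building the dependency graph and counting the edges that cross processor boundaries. The sketch below assumes a simplified Nussinov-style dependency pattern (each cell reads cells in its own row and its own column, plus the enclosed interval) and assigns whole rows to processors; `dependencies`, `cut_edges`, and both row assignments are hypothetical choices made for illustration.

```python
def dependencies(i, j):
    """Cells read when computing cell (i, j), i < j, in a Nussinov-style table."""
    deps = set()
    for k in range(i, j):
        deps.add((i, k))          # left part of a split: same row
        deps.add((k + 1, j))      # right part of a split: same column
    if i + 1 <= j - 1:
        deps.add((i + 1, j - 1))  # enclosed interval when i pairs with j
    return deps

def cut_edges(n, owner_of_row):
    """Count dependency edges whose endpoints belong to different processors."""
    cut = 0
    for i in range(n):
        for j in range(i + 1, n):
            for (r, _c) in dependencies(i, j):
                if owner_of_row(r) != owner_of_row(i):
                    cut += 1
    return cut

n, p = 8, 2
block = cut_edges(n, lambda r: r * p // n)   # contiguous blocks of rows
cyclic = cut_edges(n, lambda r: r % p)       # rows dealt out round-robin
single = cut_edges(n, lambda r: 0)           # one processor: nothing crosses
```

In this toy setting the block assignment cuts fewer edges than the cyclic one, although it balances the triangular workload less evenly; a real deployment would have to weigh both effects.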

Expected Significance

• To quickly and efficiently fold long ribonucleic acid (RNA) sequences, fast computational models are needed. This paper compares two parallel multiprocessor computer architectures for the prediction of RNA secondary structure. We show promising experimental results using the OpenMP programming environment. This work is intended to be a testbed for the development of new approaches for the prediction of consensus RNA secondary structure from multiple sequences.

• Parallel Multiprocessor Approaches to the RNA Folding Problem, Etienne Ogoubi, David Pouliot, Marcel Turcotte, and Abdelhakim Hafid, PPAM 2007, LNCS 4967, pp. 1230–1239, 2008.

Relation to Class Materials

• In class we are studying the use of parallel machines, including the MPI library and shared memory with its underlying cache coherence protocols.

• This class has taught us that shared memory implementations imply interprocessor communication. A statement such as the following might be subject to doubt:

• “The ability of all the processors to access the same pool of variables with no communication overheads cost and no network transit time cost makes OpenMP more suitable for our application compared to Message Passing Interface (MPI).”

Relation to Student’s Research Interests

• These researchers are working on RNA folding, which is part of RNA processing, which is implicated as problematic for ALS sufferers.

• Perhaps these researchers can benefit from the material taught in this course.

Relations to Present State of Knowledge in the Field

• The proposed work is, compared to the present state of knowledge in computer science, probably not a contribution. However, it might be that the researchers in RNA folding would benefit.

Preliminary Studies / Progress Report

• Minimal:
  – Single process program runs on SGI Altix.
  – Excel spreadsheet tool for prediction of bus saturation in preparation.

General Plan of the Work

• Find out what measurement tools, for time and message complexity, are available with SGI Altix, for monitoring loading of processor resources.

• Design a deployment of the algorithm onto processors.

• Measure the time and messages of the MPI and shared memory implementations.

• Compare with paper.

• Check apparent assertion in paper about no interprocessor communication.

Broad Design of the Activities to be Undertaken

• See what can be measured.

• Implement some code.

• Predict its bus use with spreadsheet tool.

• Measure its performance,
  – Comparing the MPI implementation with
  – Shared memory implementation

Project Design and Methods

• Establish a monitor which can measure interprocessor communication, especially cache coherence bus use.

• Establish a procedure for measuring elapsed time.

• Predict communication patterns from algorithm, and attempt to minimize interprocess communication by using localization considerations to deploy matrix cell computation onto specific processors.

• Compare different implementations and deployments with different localities.
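
A measurement procedure along these lines might start from a small harness such as the one below. It is a sketch under stated assumptions: `time_best_of`, `CountingChannel`, and `toy_workload` are hypothetical names, the workload is made up, and the channel only tallies explicit messages. Cache coherence traffic on a shared memory machine would not show up here and would need hardware counters or a vendor tool.

```python
import time

def time_best_of(fn, repeats=5):
    """Run fn() several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result

class CountingChannel:
    """Hypothetical stand-in for a send routine that tallies traffic."""

    def __init__(self):
        self.messages = 0
        self.bytes_sent = 0

    def send(self, dest, payload):
        # A real implementation would hand the payload to MPI here.
        self.messages += 1
        self.bytes_sent += len(payload)

channel = CountingChannel()

def toy_workload():
    # Made-up workload: send one 8-byte value to each of 4 neighbours.
    for dest in range(4):
        channel.send(dest, b"\x00" * 8)
    return sum(range(1000))

elapsed, value = time_best_of(toy_workload, repeats=3)
```

Taking the best of several runs is one common way to reduce the influence of other activity on the machine; the counters accumulate across all repeats, so they would be divided by the repeat count to get per-run figures.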

References

• Proudfoot and Whitelaw, in Transcription and Splicing, Hames and Glover, eds., IRL Press, 1988

• Krainer and Maniatis, in Transcription and Splicing, p. 97, Hames and Glover, eds., IRL Press, 1988

• Baird, S.D., Turcotte, M., Korneluk, R.G., Holcik, M.: Searching for IRES. RNA 12(10), 1755–1785 (2006), cited in Ogoubi, Pouliot, et al., Parallel Multiprocessor Approaches to the RNA Folding Problem, in R. Wyrzykowski et al. (Eds.): PPAM 2007, LNCS 4967, pp. 1230–1239, 2008.