Tuning Tophat2
Belinda Giardine
Tophat2
Aligns reads from RNA to the genome
Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes.
Adds handling of gaps in the alignments by breaking the reads into small pieces (~20 bases) and reassembling the reads after mapping.
Although the new version is more parallel, it is still slow (more than 4 days for recent runs).
It uses Bowtie to do the actual mapping.
RNA-seq
(Image from Wikipedia)
FASTQ file, a single read:
@DGM97JN1:330:C3EW0ACXX:1:1101:2723:1993 1:N:0:NAAGGC
GAATGCCCCCGGCCGTCCCTCTTAATCATGGCCTCAGTTCCGAAAACCANCAAAATAGAACCGCGGTCCTATTNN
+
CCCFFFFFHHHHGIIFGIIIIJJIIJIFGIJEHIIJIGHIJHAGHHFEE#,;;BACEEDDDDDD@B>BBDCDC##
Tophat2
Pipeline written in C++ (34,351 lines of code in 63 files)
Wrapper written in Python
3 of the programs use Boost threads (pthreads): long_spanning_reads.cpp, segment_juncs.cpp, tophat_reports.cpp
The programs are compiled as one unit under autoconf and automake, and they communicate with each other through temporary files.
Many prerequisites (zlib, Boost, samtools, Bowtie); these, together with the amount of file I/O, make running on the MIC alone infeasible.
Data files
Reads in FASTQ format, 20–200 million reads (2 × 20 GB for my test)
Reference sequence and indexes used for mapping: 6 GB for mouse
Final output: 14 GB for my test
Work from last time
Compiling: started with gcc, then icc, then added -mmic (this failed when trying to get all the prerequisites to build)
Test runs on the host, using Tophat's log of the run for timing:
Run on biostar (Xeon) using 8 threads: 26 hours
Run on Stampede (host) using 16 threads: 19 hours, 40 minutes
Run on Stampede (host) using 32 threads: 24 hours
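The two Stampede runs used the same host, so their ratio compares 16 against 32 threads on the same hardware (writing $T_n$ for the wall-clock time with $n$ threads):

$$
\frac{T_{32}}{T_{16}} = \frac{24\ \mathrm{h}}{19\ \mathrm{h}\ 40\ \mathrm{min}} = \frac{1440\ \mathrm{min}}{1180\ \mathrm{min}} \approx 1.22
$$

That is, doubling the thread count made the full run about 22% slower. The biostar run is on different hardware, so it is not directly comparable.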
New work
The Python wrapper and long run times make it difficult to profile the code with gprof and VTune.
Based on my experience on biostar, I am starting with the segment_juncs executable.
Keeping the temporary files that are used to pass data between programs, I ran segment_juncs alone.
Time for segment_juncs run alone:
8 threads: 2 hours 13 minutes
16 threads: 1 hour 15 minutes (2 ½ out of 19 ½ hours total); 76% of this is spent in the parallel section
32 threads: 2 hours 12 minutes
Failed attempts
Run VTune on segment_juncs: times out on the full data; license errors
Check the loops that par_report flags as having assumed dependencies: the lines of code it indicates are not loops, or not inside loops? Contradictory lines
Offload the threaded section of code in segment_juncs.cpp: will it actually improve speed, or is there too much file I/O?
Lots of variables to copy
File I/O
Hardison Lab
vec_report3
segment_juncs.cpp(135): (col. 32) remark: loop was not vectorized: existence of vector dependence.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed ANTI dependence between r.92068 line 135 and r.92068 line 135.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed FLOW dependence between r.92068 line 135 and r.92068 line 135.
Line 135: left_seg.left = max(0, T.right() - 2);
opt_report
REMOVED VAR left_mismatches.201433.0_V$78b
REMOVED PACK left_mismatches.201433.0
REMOVED VAR right_mismatches.201433.0_V$78d
REMOVED PACK right_mismatches.201433.0
gprof output for segment_juncs (each sample counts as 0.01 seconds):

| % time | cumulative seconds | self seconds | calls | self Ts/call | total Ts/call | name |
|---:|---:|---:|---:|---:|---:|:---|
| 100.01 | 0.01 | 0.01 | | | | `extend_from_seeds(std::vector<SeedExtension, std::allocator<SeedExtension> >&, PackedSplice const&, std::vector<std::vector<ReadHit, std::allocator<ReadHit> >, std::allocator<std::vector<ReadHit, std::allocator<ReadHit> > > > const&, std::string const&, std::string const&, unsigned long, unsigned long, int)` |
| 0.00 | 0.01 | 0.00 | 89528 | 0.00 | 0.00 | `pack_splice(std::string const&, int, int, unsigned int)` |
| 0.00 | 0.01 | 0.00 | 3 | 0.00 | 0.00 | `__do_global_dtors_aux` |
| 0.00 | 0.01 | 0.00 | 2 | 0.00 | 0.00 | `pack_right_splice_half(std::string const&, unsigned int, unsigned int)` |
Parallel section of code:
vector<boost::thread*> threads;
for (int i = 0; i < num_threads; ++i) {
    // Each worker handles its own range of left reads and writes its
    // results into its own per-thread output vectors.
    SegmentSearchWorker worker;
    worker.rt = &rt;
    worker.reads_fname = left_reads_fname;
    worker.segmap_fnames = &left_segmap_fnames;
    worker.partner_reads_map_fname = right_reads_map_fname;
    worker.seg_partner_reads_map_fname = right_seg_fname_for_segment_search;
    worker.juncs = &vseg_juncs[i];
    worker.deletions = &vdeletions[i];
    worker.insertions = &vinsertions[i];
    worker.fusions = &vfusions[i];
    worker.read = READ_LEFT;
    worker.partner_hit_offset = 0;
    worker.seg_partner_hit_offset = 0;
    if (i == 0) {
        // The first worker starts at the beginning of every file.
        worker.begin_id = 0;
        worker.seg_offsets = vector<int64_t>(left_segmap_fnames.size(), 0);
        worker.read_offset = 0;
    } else {
        // Later workers start at the read ID and file offsets computed for them.
        worker.begin_id = read_ids[i-1];
        worker.seg_offsets.insert(worker.seg_offsets.end(), offsets[i-1].begin()+1, offsets[i-1].end());
        worker.read_offset = offsets[i-1][0];
        if (partner_offsets.size() > 0)
            worker.partner_hit_offset = partner_offsets[i-1];
        if (seg_partner_offsets.size() > 0)
            worker.seg_partner_hit_offset = seg_partner_offsets[i-1];
    }
    // The last worker's range is open-ended.
    worker.end_id = (i+1 < num_threads) ? read_ids[i] : std::numeric_limits<uint64_t>::max();
    //Geo debug:
    //fprintf(stderr, "Worker %d: begin_id=%lu, end_id=%lu\n", i, worker.begin_id, worker.end_id);
    // Workers 0..num_threads-2 get their own thread; the last one runs on the calling thread.
    if (num_threads > 1 && i + 1 < num_threads)
        threads.push_back(new boost::thread(worker));
    else
        worker();
}