Parallel Processing - Gravity Probe B · Science Advisory Committee Meeting - 20 3 September 3,...

Science Advisory Committee Meeting - 20 September 3, 2010 • Stanford University

1 04_Parallel Processing

Parallel Processing

Majid AlMeshari John W. Conklin

Science Advisory Committee Meeting - 20

September 3, 2010 • Stanford University

04_Parallel Processing

Outline •  Challenge •  Requirements •  Resources •  Approach •  Status •  Tools for Processing

Challenge

A computationally intensive algorithm is applied on a huge set of data. Verification of filter’s correctness takes a long time and hinders progress.

Requirements •  Achieve a speed-up between 5-10x over serial version •  Minimize parallelization overhead •  Minimize time to parallelize new releases of code •  Achieve reasonable scalability

Number of Processors

SAC 19

Resources •  Hardware

– Computer Cluster (Regelation), a 44 64-bit CPUs » 64-bit enables us to address a memory space beyond 4 GB » Using this cluster because:

(a) likely will not need more than 44 processors (b) same platform as our Linux boxes

•  Software – MatlabMPI

» Matlab parallel computing toolbox created by MIT Lincoln Lab

» Eliminates the need to port our code into another language

Approach - Serial Code Structure Initialization

Loop over iterations

Loop Gyros

Loop over orbits

Loop over Segments

Build h(x) (the right hand side)

Build Jacobian, J(x)

Load data (Z), perform fit Bayesian estimation,

display results

Computationally Intensive

Components

(Parallelize!!!)

Approach - Parallel Code Structure Initialization

Loop over iterations Loop Gyros

Loop over orbits Loop over Segments

Build h(x)

Build Jacobian, J(x)

Bayesian estimation, display results

Distribute J(x), all nodes

Load data (Z), perform fit Loop over partial orbits

Send information matrices to master & combine

In parallel, split up by columns of J(x)

In parallel, split up by orbits

Custom data distribution code

required (Overhead)

Code in Action

. . . . . . . . .

Initialization

Compute h and Partial J

Sync Point Get Complete J

Perform Fit

Sync Point Combine Info Matrix

Bayesian Estimation Another Iteration?

Split by Columns of J

Split by Rows of J

Parallelization Techniques - 1 •  Minimize inter-processor communication

–  Use MatlabMPI for one-to-one communication only –  Use a custom approach for one-to-many communication

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Comp J

Dist J (I/O)

comp info (I/O)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Comp J

Dist J (I/O)

comp info (I/O)

Processors

ec) Comm time with

custom approach Comm time with

Parallelization Techniques - 2 •  Balance processing load among processors

Load per processor = ∑ length(reducedVectork)*(computationWeightk + diskIOWeight)

k: {RT, Tel, Cg, TFM, S0}

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Comp J

Dist J (I/O)

comp info (I/O)

Processors

ec) CPUs have

uniform load

Parallelization Techniques - 3 •  Minimize memory footprint of slave code

–  Tightly manage memory to prevent swapping to disk, or “out of memory” errors

–  We managed to save ~40% of RAM per processor and never seen “out of memory” since

Parallelization Techniques - 4 •  Find contention points

–  Cache what can be cached –  Optimize what can be optimized

Old Code Optimized Code

Example: Compute h Optimization

Status – Current Speed up

6.50 7.22

1 CPU 8 CPUs 16 CPUs 20 CPUs

Sep. '09

May '10

July '10

Number of Processors

Tools for Processing - 1 •  Run Manager/Scheduler/Version Controller

– Built by Ahmad Aljadaan – Given a batch of options files, it schedules batches of runs,

serial or parallel, on cluster –  Facilitates version control by creating a unique directory for

each run with its options file, code and results – Used with large number of test cases – Helped us schedule 1048+ runs since January 2010

»  and counting…

Tools for Processing - 2 •  Result Viewer

– Built by Ahmad Aljadaan – Allows searching for runs, displays results, plots related

charts –  Facilitates comparison of results

Questions

Parallel Processing - Gravity Probe B · Science Advisory Committee Meeting - 20 3 September 3,...

Documents

Transcript of Parallel Processing - Gravity Probe B · Science Advisory Committee Meeting - 20 3 September 3,...

Computationally Efﬁcient Unscented Kalman Filtering ...

Computationally Efﬁcient Multiphase Heuristics for ... · Computationally Efﬁcient Multiphase Heuristics for Simulation-Based Optimization Christoph Bodenstein, Thomas Dietrich,

Should learning design be supported computationally?

SEISMOLOGY Earthquake detection through computationally ... · Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms.

Computationally-Efficient Approximation Mechanisms (cont.)

Processing Electron, X-ray, and CL images Electron probe microanalysis EPMA Modified 8/22/08.

Simple and Computationally Efficient ... - eprints.soton.ac.uk

Computationally-predicted AOPs · computationally-predicted AOPs & networks

Image Processing for HTS SQUID probe microscope Advanced Image Processing Seminar Colin Bothwell 0570063.

· Lockers Solutions for every environment. probe lockers probe shelving. probe cupboards and workstations probe mobile shelving. probe library shelving probe cloakroom. probe cabine

Computationally modeling interpersonal trust

Making computer vision computationally efficient

Computationally Independent Model and Service Specification · Computationally Independent Model and Service Specification ... Service Specification v. 1.5 - 2 ... 0 Computationally

Towards Real-Time Information Processing of Sensor Network Data Using Computationally ...papers/cpsweek08/papers/ipsn08/3157a... · 2008-03-20 · Towards Real-Time Information Processing

Computationally Related Problems

COMPUTATIONALLY EFFICIENT VISION-BASED ROBOT ...essay.utwente.nl/74107/1/TERRIVEL_MA_EEMCS.pdfa user interface, data processing, (digital) filtering and high-level control, thus cannot

Computationally Eﬃcient Receiver Design for Mitigating Multipath …aspin.eng.uci.edu/papers/Computationally_Efficient... · 2017. 11. 23. · Computationally Eﬃcient Receiver

free pre-processing and post-processing modules, · PDF filefree pre-processing and post-processing modules, allowing users to ... documentation, printing, etc. ... Multi-probe scan

IEEE TRANSACTIONS ON IMAGE PROCESSING, …infolab.stanford.edu/~wangz/project/3DHMM/TIP/joshi_ip.pdfIEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006 1871 A Computationally

Prediction versus Understanding in Computationally ...