Bulk Synchronous Parallel Processing Model
Jamie Perkins
Overview
Four W’s – Who, What, When and Why
Goals for BSP
BSP Design and Program
Cost Functions
Languages and Machines
A Bridge for Parallel Computation
Von Neumann model
Designed to insulate hardware and software
BSP model (Bulk Synchronous Parallel)
Proposed by Leslie Valiant of Harvard University in 1990
Developed by W.F. McColl of Oxford
Designed to be a “bridge” for parallel computation
Goals for BSP
Scalability – performance of HW & SW must scale from a single processor to thousands of processors
Portability – SW must run unchanged, with high performance, on any general-purpose parallel architecture
Predictability – performance of SW on different architectures must be predictable in a straightforward way
BSP Design
Three components:
Node – processor and local memory
Router or communication network – message passing or point-to-point communication
Barrier or synchronization mechanism – implemented in hardware
BSP Design
Fixed memory architecture
Hashing to allocate memory in “random” fashion
Fast access to local memory
Uniformly slow access to remote memory
Illustration of BSP Computer
[Diagram: nodes, each a processor (P) with local memory (M), connected by a communication network and synchronized by a barrier]
http://peace.snu.ac.kr/courses/parallelprocessing/
BSP Program
Composed of S supersteps
A superstep consists of:
A computation where each processor uses only locally held values
A global message transmission from each processor to any subset of the others
A barrier synchronization
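The superstep structure above can be sketched with ordinary Python threads. This is a minimal illustration, not a real BSP library: the processor count, local values, and inbox lists are all hypothetical, and `threading.Barrier` stands in for the hardware barrier.

```python
import threading

P = 4                                   # hypothetical number of processors
barrier = threading.Barrier(P)          # stands in for the BSP barrier mechanism
local = [p + 1 for p in range(P)]       # each processor's locally held value
inbox = [[] for _ in range(P)]          # messages delivered during the superstep
results = [0] * P

def worker(pid):
    # Computation phase: use only locally held values
    v = local[pid] * 2
    # Communication phase: send v to every other processor
    for q in range(P):
        if q != pid:
            inbox[q].append(v)
    # Barrier synchronization: all messages are visible after this point
    barrier.wait()
    # The next superstep may now read the received values
    results[pid] = v + sum(inbox[pid])

threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # every processor ends with the same total: [20, 20, 20, 20]
```

Note how no processor reads its inbox until after `barrier.wait()`: within a superstep, computation depends only on local state, which is exactly what makes BSP costs predictable.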
Strategies for programming on BSP
Balance the computation between processes
Balance the communication between processes
Minimize the number of supersteps
BSP Program
[Diagram: processors P1–P4 each perform computation, then communication, then meet at a barrier, ending superstep 1 and beginning superstep 2]
http://peace.snu.ac.kr/courses/parallelprocessing/
Advantages of BSP
Eliminates the need for programmers to manage memory, assign communication, and perform low-level synchronization (given sufficient parallel slackness)
Synchronization allows automatic optimization of the communication pattern
The BSP model provides a simple cost function for analyzing the complexity of algorithms
Cost Function
g – “gap”, or bandwidth inefficiency
L – “latency”, minimum time needed for one superstep
w – largest amount of work performed (per processor)
h – largest number of packets sent or received
Execution time for superstep i: wi + g·hi + L
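The cost function above is easy to evaluate numerically. The machine parameters (g, L) and per-superstep measurements (w, h) below are hypothetical values chosen only to show the arithmetic, not figures from the slides.

```python
g = 4                 # gap (bandwidth inefficiency), assumed machine parameter
L = 100               # latency / barrier cost, assumed machine parameter
w = [500, 300, 800]   # largest local work per superstep (hypothetical)
h = [20, 50, 10]      # largest packets sent or received per superstep (hypothetical)

def superstep_cost(w_i, h_i, g, L):
    # Cost of one superstep: w_i + g * h_i + L
    return w_i + g * h_i + L

total = sum(superstep_cost(w_i, h_i, g, L) for w_i, h_i in zip(w, h))
# Equivalently: sum(w) + g * sum(h) + S * L = 1600 + 4*80 + 3*100
print(total)  # → 2220
```

Because the total is just sum(w) + g·sum(h) + S·L, the programming strategies from the earlier slide fall out directly: balance work (reduce the per-superstep maxima w), balance communication (reduce h), and minimize the number of supersteps S (each one pays the latency L).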
Languages & Machines
Languages: BSP++, CC++, Fortran, JBSP, Opal
Machines: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP
Thank You
Any Questions?
References
http://peace.snu.ac.kr/courses/parallelprocessing/
http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
http://www.cs.mu.oz.au/677/notes/node41.html
McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994.
United States Patent 5083265
Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103–111.