Bulk Synchronous Parallel Processing Model
Jamie Perkins
Overview
Four W’s – Who, What, When and Why
Goals for BSP
BSP Design and Program
Cost Functions
Languages and Machines
A Bridge for Parallel Computation
Von Neumann model
Designed to insulate hardware and software
BSP model (Bulk Synchronous Parallel)
Proposed by Leslie Valiant of Harvard University in 1990
Developed by W.F. McColl of Oxford
Designed to be a “bridge” for parallel computation
Goals for BSP
Scalability – performance of HW & SW must scale from a single processor to thousands of processors
Portability – SW must run unchanged, with high performance, on any general-purpose parallel architecture
Predictability – performance of SW on different architectures must be predictable in a straightforward way
BSP Design
Three components:
Node – processor and local memory
Router or communication network – message passing or point-to-point communication
Barrier or synchronization mechanism – implemented in hardware
BSP Design
Fixed memory architecture
Hashing to allocate memory in “random” fashion
Fast access to local memory
Uniformly slow access to remote memory
Illustration of BSP Computer
[Diagram: nodes, each a processor (P) with local memory (M), connected by a communication network and synchronized by a barrier]
http://peace.snu.ac.kr/courses/parallelprocessing/
BSP Program
Composed of S supersteps
A superstep consists of:
A computation where each processor uses only locally held values
A global message transmission from each processor to any subset of the others
A barrier synchronization
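The superstep structure above can be sketched with ordinary Python threads. This is a minimal illustration, not a real BSP library: the processor count, local values, and inbox lists are all hypothetical, and `threading.Barrier` stands in for the hardware barrier.

```python
import threading

P = 4                                   # hypothetical number of processors
barrier = threading.Barrier(P)          # stands in for the BSP barrier mechanism
local = [p + 1 for p in range(P)]       # each processor's locally held value
inbox = [[] for _ in range(P)]          # messages delivered during the superstep
results = [0] * P

def worker(pid):
    # Computation phase: use only locally held values
    v = local[pid] * 2
    # Communication phase: send v to every other processor
    for q in range(P):
        if q != pid:
            inbox[q].append(v)
    # Barrier synchronization: all messages are visible after this point
    barrier.wait()
    # The next superstep may now read the received values
    results[pid] = v + sum(inbox[pid])

threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # every processor ends with the same total: [20, 20, 20, 20]
```

Note how no processor reads its inbox until after `barrier.wait()`: within a superstep, computation depends only on local state, which is exactly what makes BSP costs predictable.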
Strategies for programming on BSP
Balance the computation between processes
Balance the communication between processes
Minimize the number of supersteps
BSP Program
[Diagram: processors P1–P4 each perform computation, then communication, then meet at a barrier, ending superstep 1 and beginning superstep 2]
http://peace.snu.ac.kr/courses/parallelprocessing/
Advantages of BSP
Eliminates the need for programmers to manage memory, assign communication, and perform low-level synchronization (given sufficient parallel slackness)
Synchronization allows automatic optimization of the communication pattern
The BSP model provides a simple cost function for analyzing the complexity of algorithms
Cost Function
g – “gap”, or bandwidth inefficiency
L – “latency”, minimum time needed for one superstep
w – largest amount of work performed (per processor)
h – largest number of packets sent or received
Execution time for superstep i: wi + g·hi + L
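The cost function above is easy to evaluate numerically. The machine parameters (g, L) and per-superstep measurements (w, h) below are hypothetical values chosen only to show the arithmetic, not figures from the slides.

```python
g = 4                 # gap (bandwidth inefficiency), assumed machine parameter
L = 100               # latency / barrier cost, assumed machine parameter
w = [500, 300, 800]   # largest local work per superstep (hypothetical)
h = [20, 50, 10]      # largest packets sent or received per superstep (hypothetical)

def superstep_cost(w_i, h_i, g, L):
    # Cost of one superstep: w_i + g * h_i + L
    return w_i + g * h_i + L

total = sum(superstep_cost(w_i, h_i, g, L) for w_i, h_i in zip(w, h))
# Equivalently: sum(w) + g * sum(h) + S * L = 1600 + 4*80 + 3*100
print(total)  # → 2220
```

Because the total is just sum(w) + g·sum(h) + S·L, the programming strategies from the earlier slide fall out directly: balance work (reduce the per-superstep maxima w), balance communication (reduce h), and minimize the number of supersteps S (each one pays the latency L).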
Languages & Machines
Languages: BSP++, CC++, Fortran, JBSP, Opal
Machines: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP
Thank You
Any Questions?
References
http://peace.snu.ac.kr/courses/parallelprocessing/
http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
http://www.cs.mu.oz.au/677/notes/node41.html
McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994.
United States Patent 5083265
Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103–111.