July 31 - August 4, 1999

SCI'99 / ISAS'99

Performance Prediction for Data Intensive Applications on Large Scale Parallel Systems

Yuhong Wen and Geoffrey C. Fox

Northeast Parallel Architecture Center (NPAC)

Syracuse University

{wen,gcf}@npac.syr.edu

Why Performance Prediction?

• Application complexity
• Many processors required
• Large amounts of data involved
• Time-consuming processing
• Performance prediction speeds up
  – new parallel computer architecture design
  – application model design

Performance Prediction Approaches

• Concept-design-level performance prediction
  – aims to provide a quick and roughly correct performance prediction at the early stage of model design
  – PetaSIM is based on this level
• Detailed performance prediction
  – provides detailed information about a given application running on a specific computer system (simulation)

PetaSIM Motivation

• PetaSIM was designed to allow “qualitative” performance estimates, where in particular the design of the machine is easy to change
• The project will build up a suite of applications that can be used in future activities such as “Petaflop” architecture studies
• Applications are to be derived “by hand” or by automatic generation from Maryland Application Emulators
• Special attention to support of hierarchical memory machines and data intensive applications
• Support parallelism and representation at different grain sizes
• Support simulation of both “pure data-parallel” applications and compositions of linked modules

Performance Prediction Process

[Figure: flow diagram of the performance prediction process. Boxes: Existing Real Applications, Application Emulators, Detailed Simulators, Cost Model, PSL Specification, Execution Script, Architecture Description, PetaSIM Estimator, Estimated Performance.]

[Figure: decomposition of a full heterogeneous metaproblem for PetaSIM. Labels: Full Heterogeneous MetaProblem; Module; Aggregate; Components; Loosely Synchronize Computation; Task Parallelism; Data Parallelism; Splitting into Lower Level; Memory Hierarchy; Peta-Computing Hierarchy; PetaSIM; Simulate; Real Computing.]

Performance Prediction Model

[Figure: multi-domain model made up of the Application Domain, the Software / Operating System Domain, and the Hardware Domain.]

Three Domain Performance Prediction

• Application Domain
  – to extract the data aggregates
  – to give abstract data movement and computation behavior
• Software / Operating System Domain
  – to provide the methods for task process and memory management, communication, and parallel file access
• Hardware Domain
  – to provide the model of processor and memory components, including cache

Emulators

• Extract the application’s computational and data access patterns
• A simplified version of the real application that contains all the necessary communication, computation, and I/O characteristics
• Less accurate than the full application, but more robust
• Fast performance prediction for rapid prototyping

PetaSIM Design

• We define an object structure for the computer (including network) and the data
• Architecture Description
  – nodeset & linkset (describe the architecture memory hierarchy)
• Data Description
  – dataset & distribution
• Application Description
  – execution script
• System / Software Description

Architecture Description

• A nodeset is a collection of entities with types such as
  – memory with cache & disks
  – CPU where results can be calculated
  – pathway such as bus, switch or network

• A linkset connects nodesets together in various ways

Application Description

• An application consists of dataset objects
• Dataset implementation is controlled by the distribution objects
• Application behavior is represented by the execution script
  – a set of command statements
  – data movements
  – data computation
  – synchronization

Nodeset Object Structure

• Name: one per nodeset object
• type: choose from memory, cache, disk, CPU, pathway
• number: number of members of this nodeset in the architecture
• grainsize: size in bytes of each member of this nodeset (for memory, cache, disk)
• bandwidth: maximum bandwidth allowed in any one member of this nodeset
• floatspeed: CPU’s floating-point calculation speed
• calculate(): method used by CPU nodeset to perform computation
• cacherule: controls persistence of data in a memory or cache
• portcount: number of ports on each member of nodeset
• portname[]: ports connected to linkset
• portlink[]: name of linkset connecting to this port
• nodeset_member_list: list of nodeset members in this nodeset (for nodeset member identification)
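The nodeset fields listed above could be modelled roughly as follows. This is only an illustrative Python sketch, not PetaSIM's actual C++ object definition: the units (bytes per second, floating-point operations per second), the default values, and the body of `calculate()` are assumptions, since the slide names the method without giving its cost formula.

```python
from dataclasses import dataclass, field
from typing import List

# Type names taken from the slide's "type" field.
NODESET_TYPES = {"memory", "cache", "disk", "CPU", "pathway"}


@dataclass
class Nodeset:
    name: str
    type: str
    number: int                 # members of this nodeset in the architecture
    grainsize: int = 0          # bytes per member (memory, cache, disk)
    bandwidth: float = 0.0      # assumed unit: bytes per second
    floatspeed: float = 0.0     # assumed unit: floating-point ops per second
    cacherule: str = ""         # placeholder; the rule format is not given
    portcount: int = 0
    portname: List[str] = field(default_factory=list)
    portlink: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.type not in NODESET_TYPES:
            raise ValueError(f"unknown nodeset type: {self.type}")

    def calculate(self, float_ops: float) -> float:
        """Assumed cost model: time = operation count / floatspeed."""
        if self.type != "CPU":
            raise ValueError("calculate() applies to CPU nodesets only")
        return float_ops / self.floatspeed


# Hypothetical CPU nodeset with 16 members at 1e8 flop/s each.
cpu = Nodeset(name="cpu", type="CPU", number=16, floatspeed=1e8)
compute_time = cpu.calculate(2e8)
```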

Linkset Object Structure

• Name: one per linkset object
• type: choose from updown, across
• nodesetbegin: name of initial nodeset joined by this linkset
• nodesetend: name of final nodeset joined by this linkset
• topology: used for across networks to specify linkage between members of a single nodeset
• duplex: choose from full or half
• number: number of members of this linkset in the architecture
• latency: time to send zero length message across any member of linkset
• bandwidth: maximum bandwidth allowed in any link of this linkset
• send(): method that calculates cost of sending a message across the linkset
• distribution: name of geometric distribution controlling this linkset
• linkset_member_list: list of linkset members in this linkset (for linkset member identification)
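As a rough illustration of how the latency and bandwidth fields might feed `send()`, the sketch below uses the common linear model (latency plus message size divided by bandwidth). The slides do not give PetaSIM's actual formula, so this model and the units (seconds, bytes per second) are assumptions. It is at least consistent with the definition of latency as the time to send a zero-length message.

```python
from dataclasses import dataclass


@dataclass
class Linkset:
    name: str
    type: str            # "updown" or "across", per the slide
    nodesetbegin: str    # name of the initial nodeset
    nodesetend: str      # name of the final nodeset
    number: int          # members of this linkset in the architecture
    latency: float       # assumed unit: seconds
    bandwidth: float     # assumed unit: bytes per second
    duplex: str = "full"
    topology: str = ""
    distribution: str = ""

    def send(self, nbytes: float) -> float:
        """Assumed linear cost model: latency + nbytes / bandwidth."""
        return self.latency + nbytes / self.bandwidth


# Hypothetical link between two memory nodesets.
link = Linkset(name="net", type="across", nodesetbegin="mem",
               nodesetend="mem", number=16, latency=0.5, bandwidth=4.0)
empty_cost = link.send(0)   # equals the latency
small_cost = link.send(8)   # 0.5 + 8 / 4.0
```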

Dataset Object Structure

• Name: one per dataset object
• type: choose from grid1dim, grid2dim, grid3dim; specifies type of dataset
• bytesperunit: number of bytes in each unit
• floatsperunit: update cost as a floating point arithmetic count
• operationsperunit: operations in each unit
• update(): method that updates a given dataset, which is contained in a CPU nodeset, with a grainsize controlled by the last memory nodeset visited
• transmit(): method that calculates cost of transmission of dataset between memory levels, either communication or movement up and down the hierarchy
  – Methods can use other parameters or be custom
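The per-unit fields suggest how `update()` and `transmit()` costs might scale with the number of units handled. The following Python sketch assumes simple proportional formulas. The method signatures, the explicit `floatspeed`, `latency` and `bandwidth` arguments, and the formulas themselves are illustrative guesses, since the slide notes that methods can use other parameters or be custom.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    type: str                  # grid1dim, grid2dim or grid3dim
    bytesperunit: int
    floatsperunit: int         # floating-point operations per unit update
    operationsperunit: int = 0

    def update(self, units: int, floatspeed: float) -> float:
        """Assumed compute cost: total float ops / CPU floatspeed."""
        return units * self.floatsperunit / floatspeed

    def transmit(self, units: int, latency: float, bandwidth: float) -> float:
        """Assumed transfer cost: latency + total bytes / bandwidth."""
        return latency + units * self.bytesperunit / bandwidth


# Hypothetical 2-D grid dataset; all numbers are made up for illustration.
grid = Dataset(name="grid", type="grid2dim", bytesperunit=8, floatsperunit=4)
update_time = grid.update(units=100, floatspeed=200.0)
transmit_time = grid.transmit(units=100, latency=1.0, bandwidth=400.0)
```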

Execution Script

• Currently a few primitives which stress (unlike most languages) movement of data through memory hierarchies
• send DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K
  – These reference object names for data and memory nodesets
• move DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K
• Use distribution DISTRIBUTION from MEM-LEVEL-L to MEM-LEVEL-K
• compute DATAFAMILY-A, DATAFAMILY-B,… on MEM-LEVEL-L
• synchronize (synchronizes all processors --- loosely synchronous barrier)
• loop operation
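To make the primitives above concrete, here is a small Python sketch that classifies script lines written in the shape shown on the slide. The exact PetaSIM script grammar is not given, so the one-statement-per-line layout, the regular expressions, and the object names (`grid`, `halo`, `disk`, `memory`, `cache`) are all hypothetical. The loop primitive is left out because its syntax is not shown.

```python
import re

# One pattern per primitive, following the wording on the slide.
PATTERNS = [
    ("send", re.compile(r"^send (\S+) from (\S+) to (\S+)$", re.I)),
    ("move", re.compile(r"^move (\S+) from (\S+) to (\S+)$", re.I)),
    ("use", re.compile(r"^use distribution (\S+) from (\S+) to (\S+)$", re.I)),
    ("compute", re.compile(r"^compute (.+) on (\S+)$", re.I)),
    ("synchronize", re.compile(r"^synchronize$", re.I)),
]


def parse_line(line):
    """Return (primitive, arguments) for one script line."""
    text = line.strip()
    for op, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            args = list(match.groups())
            if op == "compute":
                # Split the comma-separated data families; the memory
                # level stays as the last argument.
                families = [name.strip() for name in args[0].split(",")]
                args = families + [args[1]]
            return (op, args)
    raise ValueError(f"unrecognized statement: {text!r}")


# Hypothetical script using made-up object names.
SCRIPT = """\
move grid from disk to memory
send grid from memory to cache
compute grid, halo on cache
synchronize
"""

ops = [parse_line(line) for line in SCRIPT.splitlines() if line.strip()]
```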

PetaSIM Estimation Schedule

• Each nodeset member has its usage control to record when dataset arrives and when to send out to next nodeset member

• Each linkset member has its usage control to record at what time the linkset member is free or occupied

• Data-driven model: Ready ---> Go (first come, first served)

• Supports both data parallel mode and individual operation on each nodeset or linkset member
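The usage-control idea can be sketched as a per-member "free at" timestamp combined with first-come, first-served reservation. This Python sketch is an assumption about how such bookkeeping could work, not PetaSIM's implementation. The class name `UsageControl`, the method `reserve`, and the request times are hypothetical.

```python
class UsageControl:
    """Tracks when one nodeset or linkset member next becomes free."""

    def __init__(self, name):
        self.name = name
        self.free_at = 0.0

    def reserve(self, ready_time, duration):
        """Occupy the member as soon as both the data and the member are ready.

        Requests must be passed in arrival order to get first-come,
        first-served behaviour. Returns (start_time, finish_time).
        """
        start = max(ready_time, self.free_at)
        self.free_at = start + duration
        return (start, self.free_at)


# Three transfers over one hypothetical link member: (ready time, duration).
link = UsageControl("link-0")
requests = [(0.0, 2.0), (1.0, 2.0), (6.0, 1.0)]
schedule = [link.reserve(ready, duration) for ready, duration in requests]
# The second transfer is ready at 1.0 but waits until the link frees at 2.0.
```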

Architecture of PetaSIM

[Figure: PetaSIM architecture diagram. Components: C++ Simulator, Multi-User Java Server, and two Standard Java Applet Clients.]

PetaSIM Experiments

IBM-SP2 nodeset and linkset components -- Architecture I

Titan Estimation Results

[Figure: chart comparing IBM-SP2 running time with PetaSIM estimated time for P15 (9K data), P15 (27K data), and P32 (27K data); vertical axis from 0 to 350.]

Pathfinder Estimation Results

[Figure: chart comparing IBM-SP2 running time with PetaSIM estimated time for P15 (9K data), P15 (27K data), and P32 (55K data); vertical axis from 0 to 800.]

PetaSIM Experiments

IBM-SP2 nodeset and linkset components -- Architecture II

Pathfinder Performance Estimation Results

[Figure: time (sec.) versus processor number (16, 32, 64, 100). Series: measured running time on SP2, estimated running time on SP2, sequential time to produce estimate. Vertical axis from 0 to 140.]

Pathfinder Estimation Results II

[Figure: time (sec.) versus number of I/O nodes (4, 8, 16, 32, 48, 56, 60). Series: measured running time on SP2, estimated running time on SP2, sequential time to produce estimate. Vertical axis from 0 to 35.]

Titan Estimation Results (Fixed)

[Figure: time (sec.) versus processor number (16, 32, 64, 100). Series: measured running time on SP2, estimated running time on SP2, sequential time to produce estimate. Vertical axis from 0 to 90.]

VMScope Estimation Results

[Figure: time (sec.) versus processor number (16, 32, 64, 100). Series: measured running time on SP2, estimated running time on SP2, sequential time to produce estimate. Vertical axis from 0 to 25.]

PetaSIM Features

• Accurate estimation
• Friendly user interface
  – Easy to modify the architecture design
  – Easy to monitor the effect of a design change
• Fast estimation
• Detailed performance estimation
  – Provides detailed usage of each individual nodeset and linkset member in the memory hierarchy

Comparison with Other Simulators

• Different simulation approach
  – PetaSIM does not actually run the application; it estimates the execution script (operation abstraction)
• PetaSIM runs on a single processor
• Similar performance estimation results
• PetaSIM can easily deal with different kinds of computer architecture
• PetaSIM can provide detailed information on any part of the architecture

PetaSIM Current Progress Summary

• Architecture Description (nodeset & linkset)
• Application Description (dataset & execution script)
• Link to Application Emulators
• Jacobi hand-written example
• Pathfinder, Titan, VMScope real applications (generated by UMD’s Emulator)
• Easily modified architecture and application descriptions
• Fast and relatively accurate performance estimation (PetaSIM running on a single processor)
• Java applet based user interface
• Data Parallel Model & Individual Control

Possible Future Work

• Richer set of applications using standard benchmarks and DoD MSTAR
• Relate object model to those used in “seamless interfaces” / metacomputing, i.e. to efforts to establish a (distributed) object model for computation
• Review the very simple execution script -- should we add more complex primitives or regard “application emulators” as this complex script?
• Binary format (“compiled PetaSIM”) of architecture and application description (ASCII format will make the execution script very large)
  – Translation tool from ASCII format to binary format (to retain the friendly user interface)
• Upgrade performance evaluation model
• Run performance simulation in parallel (i.e. PetaSIM running on multiple processors)

PetaSIM Web-Site URL

http://kopernik.npac.syr.edu:4096/petasim/V1.0/PetaSIM.html

-- PetaSIM Java applet front-end user interface and demo
-- Related PetaSIM documents

Interface of PetaSIM Client