
© 2016 OpenPOWER Foundation

Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors

Saritha Vinod

Power Systems Performance Analyst

IBM Systems

[email protected]


Agenda

• Emerging Workloads Characteristics and Performance
• Performance Modelling Lifecycle for Future Generation Processors
• Workload Tracing Process
• Workload Tracing Methods & Tools
• Key Challenges in Workload Tracing
• Performance Evaluations using Traces
    • Microarchitecture Design Analysis
    • Software Performance Optimizations
    • Performance Verification
• Summary


Emerging Workloads Characteristics and Performance

• New industry trends are leading to emerging workloads in domains such as cognitive computing, deep learning, analytics and cloud.
• To achieve the best performance, the next generation processor design needs to address emerging workload characteristics such as the following:
    • Instruction mixes & compute needs
    • Cache access patterns & prefetch
    • Data access patterns
    • Sharing of data
    • Data affinities
    • Branch prediction
    • OS and Hypervisor calls
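As an illustration of how one of these characteristics (the instruction mix) might be quantified from a trace, here is a minimal sketch. It assumes a hypothetical trace that has already been reduced to a list of mnemonics, and the mnemonic-to-class table is a small illustrative subset, not the format or classification used by any IBM tool.

```python
from collections import Counter

# Illustrative (assumed) mapping from a few Power ISA mnemonics to classes.
CLASS_OF = {
    "ld": "load", "lwz": "load",
    "std": "store", "stw": "store",
    "add": "integer", "mulld": "integer",
    "fadd": "float", "fmul": "float",
    "b": "branch", "bc": "branch",
}

def instruction_mix(mnemonics):
    """Return the fraction of traced instructions that fall in each class."""
    counts = Counter(CLASS_OF.get(m, "other") for m in mnemonics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical eight-instruction trace, for demonstration only.
trace = ["ld", "add", "add", "std", "bc", "ld", "fadd", "nop"]
mix = instruction_mix(trace)
for cls, share in sorted(mix.items()):
    print(f"{cls:8s} {share:.3f}")
```

The same counting approach extends to the other characteristics, for example by tallying distinct cache lines touched or taken versus not-taken branches.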


Performance Modelling Lifecycle for Future Generation Processors

[Figure: Processor Performance Modeling Lifecycle. Diagram labels: Workloads; Instruction Traces; Develop/Configure processor Model; Design/Feature Evaluation; Identify bottlenecks; Design Enhancements; Remodel; decision "Reached Target Performance?"; Final Processor Model]

• Traces provide key workload characteristics
• Enable performance evaluation of future generation processors


Workload Tracing Process

[Figure: Workload tracing process. Stage labels: Trace Generation (Workload; Functional Simulator, HW Trace, Valgrind); Trace Post processing & Validation (with a Recapture Trace path); Performance Modelling (Input: Instruction Traces; Models: Core Model, I/O Model, Memory Model; Output: Model statistics, Pipeline Visualizations)]


Workload Tracing Methods & Tools

Functional Simulator
• Highly controlled simulation environment
• Supports sampling of multi-phase workloads
• System level tracing
• Not well-suited for workloads with complex stack, large memory and highly threaded workloads

Hardware Traces
• Used for commercial workloads with high core counts and memory requirements
• Instruction and bus traces
• System level tracing
• Complex setup process
• Lacks support for generating sampled traces

Valgrind Framework
• Useful for tracing hot functions or problem areas in the application
• Supports sampling
• Provides only application tracing, no system level

Reference: IBM SDK for Linux on Power https://www-304.ibm.com/webapp/set2/sas/f/lopdiags/sdklop.html

Reference: IBM POWER8 Functional Simulator (systemsim) http://www-304.ibm.com/webapp/set2/sas/f/pwrfs/pwrfsinstall.html


Key Challenges in Workload Tracing

Challenges

• Hardware models execute only a subset of instructions, while most workloads run into billions of instructions.
• The overall runtime of emerging workloads is increasing.
• Design studies require a smaller subset of the runtime that still shows representative workload behavior.
• The selection depends on the design needs and the workload characteristics.
• The selected segment needs to retain the original workload characteristics.

Resolutions

• Identify the workload interval to trace: workload steady state, or phases based on performance counter data
• Representative trace segment selection: sampled, contiguous, filtered or at unit level
• Trace profile validation: capturing the right application runtime and maintaining the CPI characteristics
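The interval selection and CPI validation steps can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes performance counter data is available as hypothetical (cycles, instructions) samples per fixed interval, picks the contiguous window whose CPI is closest to the whole-run CPI, and then checks the result against an assumed relative tolerance.

```python
def pick_representative_window(samples, window):
    """Pick the contiguous run of `window` intervals closest to whole-run CPI.

    samples: list of (cycles, instructions) tuples, one per interval.
    Returns (start_index, window_cpi, whole_run_cpi).
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    target = sum(c for c, _ in samples) / sum(i for _, i in samples)
    best = None
    for start in range(len(samples) - window + 1):
        chunk = samples[start:start + window]
        chunk_cpi = sum(c for c, _ in chunk) / sum(i for _, i in chunk)
        error = abs(chunk_cpi - target)
        if best is None or error < best[0]:
            best = (error, start, chunk_cpi)
    return best[1], best[2], target

def cpi_is_retained(trace_cpi, target_cpi, tolerance=0.05):
    """True if the trace CPI is within a relative tolerance of the target."""
    return abs(trace_cpi - target_cpi) <= tolerance * target_cpi

# Hypothetical counter samples: a slow warm-up interval followed by steadier ones.
samples = [(300, 100), (150, 100), (120, 100), (130, 100), (200, 100)]
start, window_cpi, run_cpi = pick_representative_window(samples, window=2)
print(start, round(window_cpi, 2), round(run_cpi, 2))
print(cpi_is_retained(window_cpi, run_cpi, tolerance=0.10))
```

CPI is only one of the characteristics a segment should retain; a fuller check would also compare items such as instruction mix and cache miss rates.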


Performance Evaluations using Traces

Workload Traces are used for:

Microarchitecture Design
• Design evaluations of new processor features
• Tuning and trade-off analysis

Software Performance Optimizations
• Analysis of hot functions and bottlenecks in applications
• Compiler optimizations
• System tuning

Performance Verification
• Hardware model performance verification


Microarchitecture Design Analysis and Optimization

• Tuning and trade-off analysis
    • Determine capacity: cache size, queue size
    • Sensitivity analysis using various categories of workload traces
• New design evaluations
    • New techniques for load-store handling
    • Branch prediction algorithms
    • Data prefetch design
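A capacity sensitivity study of the kind listed above can be sketched with a toy model. The example below is an assumption-laden illustration, not a real processor model: it replays a hypothetical address trace through a fully associative LRU cache and reports the miss rate for several capacities. The 128-byte line size and the trace itself are assumed values.

```python
from collections import OrderedDict

def miss_rate(addresses, num_lines, line_bytes=128):
    """Miss rate of a fully associative LRU cache holding `num_lines` lines."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(addresses) if addresses else 0.0

# Hypothetical trace: a loop that walks four cache lines, three times over.
trace = [i * 128 for i in range(4)] * 3
sweep = {size: miss_rate(trace, size) for size in (2, 4, 8)}
for size, rate in sweep.items():
    print(f"{size} lines: miss rate {rate:.2f}")
```

With two lines the loop thrashes and every access misses; once the capacity covers the four-line working set, only the cold misses remain and extra capacity brings no further gain. Running the same sweep over different categories of traces shows how sensitive each workload is to the parameter.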


Software Performance Optimizations

• Analyzing application performance bottlenecks
    • Back-to-back latency issues, LSU stalls, branch mispredictions etc.
• Compiler optimizations
    • Microarchitecture dependent
        • Scheduling, ISA exploitation
    • Microarchitecture independent
        • Inlining, unrolling etc.
    • Flag tuning
• System tuning
    • SMT levels
    • Prefetch settings
    • Large pages
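One common way to rank bottlenecks such as those above is a CPI breakdown. The sketch below assumes the simulator statistics have been reduced to hypothetical per-category stall cycle counts; the category names and numbers are invented for illustration and do not correspond to any real tool output.

```python
def cpi_breakdown(base_cycles, stall_cycles, instructions):
    """Return (category, CPI contribution) pairs, largest contributor first."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    parts = {"base": base_cycles / instructions}
    for category, cycles in stall_cycles.items():
        parts[category] = cycles / instructions
    return sorted(parts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical statistics for a 1000-instruction trace segment.
stalls = {
    "lsu_stall": 400,
    "branch_mispredict": 150,
    "back_to_back_latency": 50,
}
breakdown = cpi_breakdown(base_cycles=300, stall_cycles=stalls, instructions=1000)
total_cpi = sum(value for _, value in breakdown)
for category, value in breakdown:
    print(f"{category:22s} {value:.2f}")
```

The largest contributor suggests where to look first, for example at prefetch settings or large pages when load-store stalls dominate.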


Micro-architecture Pipeline View for Optimizations

Cycle accurate simulator
• Micro-architecture statistics
• Pipeline view for the instruction mix

Reference: IBM Power 8 Performance Simulator https://www-304.ibm.com/webapp/set2/sas/f/lopdiags/sdkdownload.html


Performance Verification

• Workload traces are used for performance verification of the hardware model
    • Broader performance comparison of the final hardware model and the performance model
    • To identify delta gaps in performance
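The comparison step can be sketched as follows, assuming each model has already produced a CPI per trace. The trace names, CPI values and the 5% threshold are hypothetical and chosen only to illustrate how a delta gap would be flagged.

```python
def delta_gaps(performance_model_cpi, hardware_model_cpi, threshold=0.05):
    """Return {trace: relative gap} where the two models disagree by more than threshold.

    The gap is (hardware - performance) / performance, so a positive value
    means the hardware model is slower than the performance model predicted.
    """
    gaps = {}
    common = performance_model_cpi.keys() & hardware_model_cpi.keys()
    for trace in sorted(common):
        expected = performance_model_cpi[trace]
        observed = hardware_model_cpi[trace]
        relative = (observed - expected) / expected
        if abs(relative) > threshold:
            gaps[trace] = relative
    return gaps

# Hypothetical per-trace CPI from each model.
performance_model = {"analytics": 1.00, "deep_learning": 0.80, "cloud": 1.50}
hardware_model = {"analytics": 1.02, "deep_learning": 0.92, "cloud": 1.49}
gaps = delta_gaps(performance_model, hardware_model)
for trace, gap in gaps.items():
    print(f"{trace}: {gap:+.1%}")
```

Traces flagged this way point to the parts of the design where the hardware model and the performance model need to be reconciled.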


Summary

• OpenPOWER processors designed to deliver superior performance
• Performance evaluation and micro-architecture analysis tools and methods available for open innovation
• Key insights derived from emerging workloads through traces
    • Enables micro-architecture design evaluations, trade-off analysis, software/compiler optimizations and verification


Thank you
