Page 1

ACMSE’04, AL
Department of Electrical and Computer Engineering - UAH

Execution Characteristics of SPEC CPU2000 Benchmarks: Intel C++ vs. Microsoft VC++

Swathi Tanjore Gurumani, Aleksandar Milenkovic

Electrical and Computer Engineering Department

University of Alabama in Huntsville

Page 2

Outline

• Objective
• Background
• Problem Overview
• Performance Evaluation - Overview
• Experimental Setup
• Results
• Conclusion and Future Research

Page 3

Problem Objective

Prove and stress the importance of designing architecture-aware compilers

Page 4

Background - Application Performance

Advancement in processor technology
• Deep pipelining
• Multi-level cache hierarchy
• Improved branch predictors
• Out-of-order execution engine
• Advanced floating point
• Multimedia units

Compilers
• Optimization levels and switches

Compilers should keep up with processor technology

Page 5

Architecture-aware Compilers

Compiler/hardware interaction can maximize application performance by
• Exploiting advances in processor technology
• Generating target-specific optimal code

- Path length reduction
- Efficient instruction selection
- Pipeline scheduling
- Instruction-level parallelism
- Memory penalty minimization

Page 6

Performance Evaluation

A systematic process of data collection and analysis used to determine and evaluate the performance of a system

Benchmarks → Compile → Exe → Performance Metrics

Benchmark: a program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed.

Page 7

Performance Evaluation – Previous Works

Study underlying architecture and characterize workloads
• Evaluation of Pentium Pro using SPEC 2000
• Evaluation of Pentium II using multimedia applications

Processor-centric optimization
• Xeon vs. Pentium III
• Pentium III vs. Pentium IV

Compilers and optimization
• Branch optimizations by different compilers

Page 8

Problem Overview

Objective
Prove and stress the importance of architecture-aware compilers

How?
• Compile the benchmarks using different compilers
• Use the same optimization switches
• Execute the binaries under a performance analyzer
• Analyze and compare the performance metrics collected

The OS and hardware are the same for every run, so any difference in the metrics is due only to the compiler used

Page 9

Experimental Setup

SPEC CPU2000 → IC++ → Exe → VTune → Performance Metrics
SPEC CPU2000 → VC++ → Exe → VTune → Performance Metrics

• Processor: Pentium IV
• Operating System: Windows 2000
• Optimization Level: /O2
• Input: Reference set from SPEC

Page 10

SPEC CPU2000

• Represents real user applications and is computation-intensive
• Can measure the performance of the processor, memory, and compiler
• Does not stress I/O devices, networking, or the OS
• Benchmarks used come from CINT2000 and CFP2000

Name                 Description
164.gzip (INT)       Data compression, written in C
176.gcc (INT)        C programming language compiler
177.mesa (FP)        3-D graphics library, written in C
181.mcf (INT)        Combinatorial optimization, written in C
186.crafty (INT)     Chess (game playing), written in C
197.parser (INT)     Word processing, written in C
252.eon (INT)        Computer visualization, written in C++
253.perlbmk (INT)    PERL programming language, written in C
254.gap (INT)        Group theory, interpreter, written in C
255.vortex (INT)     Object-oriented database, written in C

Page 11

VTune Performance Analyzer

Simultaneous sampling of multiple events and real-time display using counter monitors

Supports time-based and event-based sampling
• To take advantage of Pentium IV’s EBS feature

Has low intrusion
• Samples collected provide a closer representation of the application’s actual performance

Events collected
• Clockticks, instructions retired, loads retired, stores retired, branches retired, I-level cache misses, and mispredicted branches
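The events listed above are raw counts; the ratios compared on the later slides can be derived from them. The following is a minimal Python sketch of that arithmetic. The function name, the dictionary keys, and the sample counts are hypothetical and chosen only for illustration; they are not VTune output or measured data. The "other" category assumes each retired instruction is counted in at most one of the load, store, and branch classes.

```python
def derived_metrics(events):
    """Derive per-run ratios from raw event counts (key names are hypothetical)."""
    instr = events["instructions_retired"]
    loads = events["loads_retired"]
    stores = events["stores_retired"]
    branches = events["branches_retired"]
    return {
        # Clockticks per retired instruction
        "cpi": events["clockticks"] / instr,
        # Instruction mix as fractions of all retired instructions
        "load_fraction": loads / instr,
        "store_fraction": stores / instr,
        "branch_fraction": branches / instr,
        # Assumes loads, stores, and branches do not overlap
        "other_fraction": (instr - loads - stores - branches) / instr,
        # Mispredicted branches per retired branch
        "mispredict_ratio": events["mispredicted_branches"] / branches,
    }


# Made-up counts, used only to exercise the function
sample = {
    "clockticks": 2_000,
    "instructions_retired": 1_000,
    "loads_retired": 300,
    "stores_retired": 100,
    "branches_retired": 200,
    "mispredicted_branches": 10,
}
metrics = derived_metrics(sample)
```

Expressing loads, stores, and branches as fractions of the instruction count makes the instruction mix comparable between two binaries even when their absolute instruction counts differ.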

Page 12

Compiler Optimizations

• Both compilers were used with the /O2 option
• With /O2, both compilers invoke the same switches, which have the same functions
• Microsoft VC++ has special switches to target Pentium (/G5) and Pentium Pro (/G6)
• The Intel C++ compiler optimizes performance for applications running on Intel architecture-based computers

Option    Effect
/Od       Disable optimization
/O1       Minimize size
/O2       Maximize speed

Performance gains from using IC++ are the result of
- profile-guided optimization
- pre-fetch instruction
- support for Streaming SIMD Extensions (SSE)
- data prefetching
- inter-procedural optimization

Page 13

Comparison of Clock ticks

• On average, 10% performance gain with IC++
• The performance gain is more pronounced for the 3D graphics library and the computer visualization application

Figure: Clockticks Ratio (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: MSVC++, IC++
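A ratio chart of this kind can be produced by dividing each IC++ count by the MSVC++ count for the same benchmark. The Python sketch below shows that arithmetic under the assumption that the MSVC++ run is the baseline (ratio 1.0); the slides do not state the normalization explicitly. The function names, benchmark labels, and clocktick counts are placeholders, not measured values.

```python
def normalized_ratios(baseline, candidate):
    """Return candidate/baseline per benchmark; the baseline itself maps to 1.0."""
    return {name: candidate[name] / baseline[name] for name in baseline}


def mean_gain_percent(ratios):
    """Arithmetic mean of the reduction relative to the baseline, in percent."""
    return 100.0 * sum(1.0 - r for r in ratios.values()) / len(ratios)


# Placeholder clocktick counts (hypothetical, not from the study)
msvc_ticks = {"bench_a": 1_000, "bench_b": 2_000}
icpp_ticks = {"bench_a": 900, "bench_b": 1_600}

ratios = normalized_ratios(msvc_ticks, icpp_ticks)
gain = mean_gain_percent(ratios)
print(ratios, gain)
```

A ratio below 1.0 means the candidate binary needed fewer clockticks than the baseline. The arithmetic mean is used here for simplicity; a geometric mean of the ratios is another common choice for summarizing normalized results.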

Page 14

Comparison of Binaries

Benchmark       Code size (bytes)
                MSVC++        IC++
164.gzip        69,632        77,824
176.gcc         1,089,536     1,314,816
177.mesa        442,368       610,304
181.mcf         49,152        53,248
186.crafty      241,664       258,048
197.parser      118,784       131,072
252.eon         405,504       413,696
253.perlbmk     516,096       651,264
254.gap         356,352       413,696
255.vortex      417,792       454,656

VC++ produced smaller binaries for every benchmark
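The size difference can be quantified per benchmark. The following minimal Python sketch uses the byte counts from the table above; the variable names are illustrative only.

```python
# Code sizes in bytes, copied from the table above: (MSVC++, IC++)
code_size = {
    "164.gzip": (69_632, 77_824),
    "176.gcc": (1_089_536, 1_314_816),
    "177.mesa": (442_368, 610_304),
    "181.mcf": (49_152, 53_248),
    "186.crafty": (241_664, 258_048),
    "197.parser": (118_784, 131_072),
    "252.eon": (405_504, 413_696),
    "253.perlbmk": (516_096, 651_264),
    "254.gap": (356_352, 413_696),
    "255.vortex": (417_792, 454_656),
}

# Relative growth of the IC++ binary over the MSVC++ binary, in percent
growth = {name: 100.0 * (ic - vc) / vc for name, (vc, ic) in code_size.items()}

# Print benchmarks from largest to smallest relative growth
for name, pct in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {pct:5.1f}%")

largest = max(growth, key=growth.get)
smallest = min(growth, key=growth.get)
```

On these numbers the IC++ binary is larger in every case, with the largest relative growth for 177.mesa and the smallest for 252.eon.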

Page 15

Comparison of Instruction Count

The 3D graphics and computer visualization applications show a much larger reduction in instruction count than the other applications

Figure: Instruction Count (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: MSVC++, IC++

Page 16

Comparison of Loads

Figure: Distribution of Loads (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: Icount, MSVC++, IC++

Page 17

Comparison of Stores

Figure: Distribution of Stores (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: Icount, MSVC++, IC++

Page 18

Comparison of Branches

Figure: Distribution of Branches (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: Icount, MSVC++, IC++

Figure: Mispredicted Branches Ratio (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: Branches, MSVC++, IC++

Page 19

Comparison of Other Instructions

Figure: Distribution of Other Instructions (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: Icount, MSVC++, IC++

Page 20

Comparison of Cache Misses

Figure: I-Level Cache Misses Ratio (0 to 1.2) by application (164, 176, 177, 181, 186, 197, 252, 253, 254, 255); legend: References, MSVC++, IC++

Page 21

Conclusion & Future Research

• Execution characteristics of the CPU2000 benchmarks were presented for VC++ and IC++
• IC++ performed better than VC++ for all applications considered, and the gain was more pronounced for graphics applications
• The distribution of loads, stores, and branches was the same for both compilers; the difference was in the absolute numbers
• No difference in branch prediction and memory references

Use
• Identifying the strengths and weaknesses of compilers

Future Directions
• Different optimization switches
• Usage of microbenchmarks for better control

Page 22

Thank You!

Questions and Feedback…