Cs2100 14 Understanding Performance

46
CS2100 Computer Organisation Understanding Performance

Transcript of Cs2100 14 Understanding Performance

  • CS2100 Computer Organisation Understanding Performance

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Performance MetricsPurchasing perspective given a collection of machines, which has the best performance ?least cost ?best cost/performance?

    Design perspectivefaced with design options, which has the best performance improvement ?least cost ?best cost/performance?

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Performance MetricsBoth requirebasis for comparisonmetric for evaluation

    Goal: Understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors

    Understanding Performance

  • 2011 Sem 1Understanding Performance*On cost and pricingTechnology costR&D Manufacturing costRaw materialCost of factoryMarketing costDistribution (including advertising)Price of competitorVolumeConsumer expectation

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Moores LawThe number of transistors that can be inexpensively placed on an integrated circuit is increasing exponentially, doubling approximately every two years.- Gordon Moore (1965) co-founder of Intel

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Moores Law

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Main driver: device scaling ...From: Facing the Hot Chips Challenge Again, Bill Holt, Intel, presented at Hot Chips 17, 2005.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Secondary driver: Wafer sizeFrom: Facing the Hot Chips Challenge Again, Bill Holt, Intel, presented at Hot Chips 17, 2005.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Intel Penryn45nm technologyQuad-core will have 820 million transistorsDual core will be 107mm2

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Defining (Speed) PerformanceNormally interested in reducingResponse time (aka execution time) the time between the start and the completion of a taskImportant to individual usersThus, to maximize performance, need to minimize execution timeperformanceX = 1 / execution_timeX

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Defining (Speed) Performance

    Throughput the total amount of work done in a given timeImportant to data center managersDecreasing response time almost always improves throughputIf X is n times faster than Y, then

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Performance FactorsWant to distinguish elapsed time and the time spent on our task

    CPU execution time (CPU time) time the CPU spends working on a taskDoes not include time waiting for I/O or running other programs

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Performance FactorsCan improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program or

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Review: Machine Clock RateClock rate (MHz, GHz) is inverse of clock cycle time (clock period)CC = 1 / CRone clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec clock cycle => 1 GHz clock rate500 psec clock cycle => 2 GHz clock rate250 psec clock cycle => 4 GHz clock rate200 psec clock cycle => 5 GHz clock rate

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Myth: Higher clock = faster CPU

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Clock Cycles per InstructionNot all instructions take the same amount of time to executeOne way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Clock Cycles per InstructionClock cycles per instruction (CPI) the average number of clock cycles each instruction takes to executeA way to compare two different implementations of the same ISA

    CPI for this instruction classABCCPI123

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Effective CPIComputing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averagingOverall effective CPI = (CPIi x ICi)i = 1nWhere ICi is the count (percentage) of the number of instructions of class i executedCPIi is the (average) number of clock cycles per instruction for that instruction classn is the number of instruction classes

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Effective CPIThe overall effective CPI varies by instruction mix a measure of the dynamic frequency of instructions across one or many programs

    Understanding Performance

  • 2011 Sem 1Understanding Performance*The Performance EquationOur basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle or

    Understanding Performance

  • 2011 Sem 1Understanding Performance*The Performance EquationThese equations separate the three key factors that affect performanceCan measure the CPU execution time by running the programThe clock rate is usually givenCan measure overall instruction count by using profilers/ simulators without knowing all of the implementation detailsCPI varies by instruction type and ISA implementation for which we must know the implementation details

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle

    Instruction_countCPIclock_cycleAlgorithmProgramming languageCompilerISAProcessor organizationTechnology

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycleXXXXXXXXXXXX

    Instruction_countCPIclock_cycleAlgorithmProgramming languageCompilerISAProcessor organizationTechnology

    Understanding Performance

  • 2011 Sem 1Understanding Performance*A Simple ExampleHow much faster would the machine be if a better data cache reduced the average load time to 2 cycles?How does this compare with using branch prediction to shave a cycle off the branch time?What if two ALU instructions could be executed at once?

    OpFreqCPIiFreq x CPIiALU50%1.Load20%5Store10%3Branch20%2 =

    Understanding Performance

  • 2011 Sem 1Understanding Performance*A Simple Example.51.0.3.42.21.6.5 .4.3.4.51.0.3.22.0.251.0.3.41.95

    OpFreqCPIiFreq x CPIiALU50%1 Load20%5Store10%3Branch20%2 =

    Understanding Performance

  • 2011 Sem 1Understanding Performance*BenchmarkingHow not to compare apples with oranges?Use a standard set of programs to test performanceNeed more than a single programA benchmark suite is a standard set of (usually portable) programs used for performance testingInput specifiedExpected output checked (with error margins defined)

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Comparing and Summarizing PerformanceHow do we summarize the performance for benchmark set with a single number?The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM)Where Timei is the execution time for the ith program of a total of n programs in the workloadA smaller mean indicates a smaller average execution time and thus improved performance

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Comparing and Summarizing PerformanceGuiding principle in reporting performance measurements is reproducibility list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.))

    Understanding Performance

  • 2011 Sem 1Understanding Performance*SPEC2000 Benchmarks www.spec.org

    Integer benchmarksFP benchmarksgzipcompressionwupwiseQuantum chromodynamicsvprFPGA place & routeswimShallow water modelgccGNU C compilermgridMultigrid solver in 3D fieldsmcfCombinatorial optimizationappluParabolic/elliptic pdecraftyChess programmesa3D graphics libraryparserWord processing programgalgelComputational fluid dynamicseonComputer visualizationartImage recognition (NN)perlbmkperl applicationequakeSeismic wave propagation simulationgapGroup theory interpreterfacerecFacial image recognitionvortexObject oriented databaseammpComputational chemistrybzip2compressionlucasPrimality testingtwolfCircuit place & route fma3dCrash simulation femsixtrackNuclear physics accelapsiPollutant distribution

    Understanding Performance

  • 2011 Sem 1Understanding Performance*SPEC2006 Benchmarks www.spec.org

    Integer benchmarksperlbmkperl applicationbzip2compressiongccGNU C compilermcfCombinatorial optimizationgoGo programhmmerGene sequencingsjengAI pattern recognitionlibquantumQuantum computingh264refVideo compressionomnetppDiscrete event simulationastarComputer gamexalancbmkXML processor

    Understanding Performance

  • 2011 Sem 1Understanding Performance*SPEC2006 Benchmarks www.spec.org

    Floating point benchmarksbwavesComputational fluid dynamicsgamessQuantum chemical computationsmilcQuantum ChromodynamicszeusmpMagnetohydrodynamicsgromacsMolecular DynamicscactusADMGeneral relativity calculationsleslie3DComputational fluid dynamicsnamdMolecular Dynamics SimulationdealIISolution of Partial Differential EquationssoplexSimplex Linear Program (LP) SolverpovrayComputer VisualizationcalculixStructural MechanicsgemFDTDComputational ElectromagneticstontoQuantum CrystallographylbmCFD Lattice Boltzmann MethodwrfWeather Forecastingsphinx3Speech Recognition

    Understanding Performance

  • 2011 Sem 1Understanding Performance*SPEC_rateAfter the benchmarks are run on the system under test (SUT), a ratio for each of them is calculated using the run time on the SUT and a SPEC-determined reference time. From these ratios, the following metrics are calculated:

    CINT2006 (for integer compute intensive performance comparisons):

    * SPECint2006: The geometric mean of twelve normalized ratios - one for each integer benchmark - when the benchmarks are compiled with peak tuning. * SPECint_base2006: The geometric mean of twelve normalized ratios when the benchmarks are compiled with base tuning. * SPECint_rate2006: The geometric mean of twelve normalized throughput ratios when the benchmarks are compiled with peak tuning. * SPECint_rate_base2006: The geometric mean of twelve normalized throughput ratios when the benchmarks are compiled with base tuning.

    Similar metrics are used for the FP benchmarks.

    In all cases, a higher score means "better performance" on the given workload.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*NotesThe geometric mean of a data set [a1, a2, ..., an] is given by

    SPEC uses a historical Sun system, the Ultra Enterprise 2 which was introduced in 1997, as the reference machine. The reference machine uses a 296 MHz UltraSPARC II processor.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Performance improvement

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Other benchmarks (Windows world)BAPCO's SYSmark 2004 SE. Based on real Windows applications including Office. POV-Ray 3.70 beta 20. 3ds Max 8. 3D modeling and animation tool. NewTek LightWave. Another 3D modeling and animation tool used primarily for special effects.Adobe After Effects. Maxon's Cinebench, 3D content creation benchmark. Windows Media Encoder. Mainconcept H.264 and MPEG-2 Encoders. 3DMark06 and PCMark05. Synthetic benchmarks. Games, such as Flight Simulator X, Supreme Commander, Prey, and Company of Heroes.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*The Challenges of BenchmarkingProducts can be tuned for specific benchmarksMeasures only performanceSecurity, cost, reliability, serviceability, scalability etc. not measuredMay not reflect the actual workloadA good read: http://donutey.com/hardwaretesting.php

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Other Performance MetricsPower consumption especially in the embedded market where battery life is important (and passive cooling)For power-limited applications, the most important metric is energy efficiency

    Understanding Performance

  • 2011 Sem 1Understanding Performance*The Challenge of PowerSource: Fred Pollack, Intel Corp.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Intels chip

    Understanding Performance

  • 2011 Sem 1Understanding Performance*The reality of coolingDesktop PCMotherboardCooling tower

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Summary: Evaluating ISAs Design-time metrics:Can it be implemented, in how long, at what cost?Can it be programmed? Ease of compilation? Static Metrics:How many bytes does the program occupy in memory? Dynamic Metrics:How many instructions are executed? How many bytes does the processor fetch to execute the program?How many clocks are required per instruction?How "lean" a clock is practical?Best Metric: Time to execute the program! depends on the instructions set, the processor organization, and compilation techniques.

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Multicore processorsBoth Intel and AMD already sell 2-4 core processorsA single chip contains 2-4 CPUsThese can be used as throughput machinesFor a single application, to take advantage of multicore must go for parallel processing

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Intel Core i7 (Sandy Bridge)

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Intel Core i7 (Sandy Bridge)

    Understanding Performance

  • 2011 Sem 1Understanding Performance*Parallel programming matters!QX6850 4 cores E6850 2 cores

    Understanding Performance

  • 2011 Sem 1Understanding Performance*END

    Understanding Performance