Lecture 2: Fundamentals of Computer Design, Kai Bu (kaibu@zju.edu.cn)

Post on 11-Jan-2016

http://list.zju.edu.cn/kaibu/comparch

Chapter 1

• Transition from a single processor to multiple processors

• Quantitative approach: uses empirical observations of programs, experimentation, and simulation as its tools

Outline

• Classes of computers

• Parallelism

• Instruction Set Architecture

• Trends

• Dependability

• Performance Measurement

5 Classes of Computers

PMD: Personal Mobile Device

• Wireless devices with multimedia user interfaces

• cell phones, tablet computers, etc.

• a few hundred dollars

PMD Characteristics

• Cost effectiveness
    less expensive packaging; absence of a fan for cooling

• Responsiveness & Predictability
    real-time performance: a maximum execution time for each app segment;
    soft real-time: an average time constraint that tolerates an occasionally missed time constraint on an event.

• Memory efficiency
    optimize code size

• Energy efficiency
    battery power, heat dissipation

Desktop Computing

• Largest market share

• low-end netbooks: $x00

• …

• high-end workstations: $x000

Desktop Characteristics

• Price-Performance
    the combination of performance and price;
    compute performance
    graphics performance

• What matters most to customers, and hence to computer designers

Servers

• Provide large-scale and reliable file and computing services (to desktops)

• Constitute the backbone of large-scale enterprise computing

Server Characteristics

• Availability
    against server failure

• Scalability
    in response to increasing demand, by scaling up computing capacity, memory, storage, and I/O bandwidth

• Efficient throughput
    toward more requests handled per unit time

Why Server Availability

Clusters/WSCs

WSCs: Warehouse-Scale Computers

collections of desktop computers or servers connected by local area networks to act as a single larger computer

Characteristics: price-performance, power, availability

Embedded Computers

hide everywhere

Embedded vs Non-embedded

• Dividing line
    the ability to run third-party software

• Embedded computers’ primary goal
    meet the performance need at a minimum price, rather than achieve higher performance at a higher price

Application Parallelism

• DLP: Data-Level Parallelism
    many data items being operated on at the same time

• TLP: Task-Level Parallelism
    tasks of work are created that can operate independently and largely in parallel
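
The two kinds of application parallelism can be contrasted with a minimal sketch. The function names are hypothetical, and Python threads are used only to show the structure of the work; they are not a claim about real hardware speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x, factor=2):
    """One operation that is applied independently to each data item."""
    return x * factor


def data_level(items):
    # DLP: the same operation over many data items; no item depends
    # on another, so all of them could be processed at the same time.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(scale, items))


def task_level(text):
    # TLP: two different, largely independent tasks submitted side by side.
    with ThreadPoolExecutor() as pool:
        word_count = pool.submit(lambda: len(text.split()))
        upper_case = pool.submit(text.upper)
        return word_count.result(), upper_case.result()


if __name__ == "__main__":
    print(data_level([1, 2, 3]))
    print(task_level("a b c"))
```

In `data_level` the parallelism comes from the data; in `task_level` it comes from having distinct tasks.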

Hardware Parallelism

• Computer hardware exploits two kinds of application parallelism in four major ways:
    Instruction-Level Parallelism
    Vector Architectures and GPUs
    Thread-Level Parallelism
    Request-Level Parallelism

Hardware Parallelism

• Instruction-Level Parallelism
    exploits data-level parallelism
    at modest levels: pipelining;
    at medium levels: speculative execution

Hardware Parallelism

• Vector Architectures & GPUs (Graphics Processing Units)
    exploit data-level parallelism
    apply a single instruction to a collection of data in parallel

Hardware Parallelism

• Thread-Level Parallelism
    exploits either DLP or TLP
    in a tightly coupled hardware model that allows for interaction among parallel threads

Hardware Parallelism

• Request-Level Parallelism
    exploits parallelism among largely decoupled tasks specified by the programmer or the OS

Classes of Parallel Architectures

by Michael Flynn, according to the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor:

SISD, SIMD, MISD, MIMD

SISD

• Single instruction stream, single data stream – uniprocessor

• Can exploit instruction-level parallelism

SIMD

• Single instruction stream, multiple data streams

• The same instruction is executed by multiple processors using different data streams.

• Exploits data-level parallelism

• Each processor has its own data memory, whereas there is a single instruction memory and control processor.

MISD

• Multiple instruction streams, single data stream

• No commercial multiprocessor of this type yet

MIMD

• Multiple instruction streams, multiple data streams

• Each processor fetches its own instructions and operates on its own data.

• Exploits task-level parallelism

Instruction Set Architecture

ISA

• the actual programmer-visible instruction set

• the boundary between software and hardware

• 7 major dimensions

ISA: Class

• Most are general-purpose register architectures with operands of either registers or memory locations

• Two popular versions
    register-memory ISA: e.g., 80x86; many instructions can access memory
    load-store ISA: e.g., ARM, MIPS; only load or store instructions can access memory

ISA: Memory Addressing

• Byte addressing

• Aligned address
    object width: s bytes
    address: A
    aligned if A mod s = 0

Each misaligned object requires two memory accesses
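
The alignment rule can be checked with a short sketch. The function names are hypothetical, and the second function assumes, as a simplification, that memory is accessed in aligned blocks of the same width as the object.

```python
def is_aligned(address, size):
    """An object of `size` bytes at byte address `address` is aligned if address mod size == 0."""
    return address % size == 0


def aligned_blocks_touched(address, size):
    """Count the size-byte aligned blocks that the object overlaps.

    Assumption: memory is read in aligned blocks of `size` bytes, so an
    aligned object overlaps one block and a misaligned one overlaps two.
    """
    first_block = address // size
    last_block = (address + size - 1) // size
    return last_block - first_block + 1


if __name__ == "__main__":
    for addr in (8, 6):
        print(addr, is_aligned(addr, 4), aligned_blocks_touched(addr, 4))
```

For a 4-byte object, address 8 is aligned and falls in one block, while address 6 is misaligned and straddles two blocks.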

ISA: Addressing Modes

• Specify the address of a memory object

• Register, Immediate, Displacement

ISA: Types and Sizes of Operands

Type                                           Size in bits
ASCII character                                8
Unicode character, half word                   16
Integer, word                                  32
Double word, long integer                      64
IEEE 754 floating point, single precision      32
IEEE 754 floating point, double precision      64
Floating point, extended double precision      80

MIPS64 Operations

• Data transfer

• Arithmetic/Logical

• Control

• Floating point

ISA: Control Flow Instructions

• Types:
    conditional branches
    unconditional jumps
    procedure calls
    returns

• Branch address: obtained by adding an address field to the PC (program counter)

ISA: Encoding an ISA

• Fixed length: ARM, MIPS; 32 bits

• Variable length: 80x86; 1 to 18 bytes

http://en.wikipedia.org/wiki/MIPS_architecture

Start with a 6-bit opcode.

R-type: three registers, a shift amount field, and a function field;

I-type: two registers, a 16-bit immediate value;

J-type: a 26-bit jump target.
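
The three formats can be sketched as bit-packing functions. The 6-bit opcode, 16-bit immediate, and 26-bit target come from the description above; the 5-bit register and shift-amount fields and the 6-bit function field are the standard 32-bit MIPS layout, stated here as an assumption. The function names are hypothetical.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) rs(5) rt(5) immediate(16); the immediate is masked to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-type: opcode(6) target(26); the target is masked to 26 bits."""
    return (opcode << 26) | (target & 0x3FFFFFF)


if __name__ == "__main__":
    # add $t0, $s1, $s2: opcode 0, rs=17, rt=18, rd=8, shamt=0, funct=0x20
    print(hex(encode_r(0, 17, 18, 8, 0, 0x20)))
    # addi $t0, $s1, -1: opcode 8, rs=17, rt=8
    print(hex(encode_i(8, 17, 8, -1)))
    # j with word target 0x100: opcode 2
    print(hex(encode_j(2, 0x100)))
```

Every instruction occupies exactly 32 bits, which is what makes this a fixed-length encoding.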

Computer Architecture

• ISA: the actual programmer-visible instruction set; the boundary between software and hardware

• Organization: high-level aspects of computer design: memory system, memory interconnect, design of the internal processor or CPU

• Hardware: computer specifics: logic design, packaging technology

Five Critical Implementation Technologies

• Integrated circuit logic technology

• Semiconductor DRAM

• Semiconductor flash

• Magnetic disk technology

• Network technology

Integrated circuit logic technology

• Moore’s Law: transistor count on a chip grows by about 40% to 55% per year, i.e., it doubles every 18 to 24 months
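
The link between an annual growth rate and a doubling time can be checked with a small sketch. It assumes steady compound growth; the function name is hypothetical.

```python
import math


def doubling_time_months(annual_growth):
    """Months needed to double at a compound annual growth rate (0.40 means 40% per year)."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for rate in (0.40, 0.55):
        print(f"{rate:.0%} per year -> doubles in about {doubling_time_months(rate):.1f} months")
```

Under this assumption, 55% per year gives roughly 19 months and 40% per year roughly 25 months, close to the quoted 18 to 24 month range.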

Semiconductor DRAM

• Capacity per DRAM chip doubles roughly every 2 or 3 years

Semiconductor Flash

• Electrically erasable programmable read-only memory (EEPROM)

• Capacity per Flash chip doubles roughly every two years

• In 2011, 15 to 20 times cheaper per bit than DRAM

Magnetic Disk Technology

• Since 2004, density doubles every three years

• 15 to 20 times cheaper per bit than Flash

• 300 to 500 times cheaper per bit than DRAM

• For server and warehouse scale storage

Network Technology

• Switches

• Transmission systems

Performance Trends

• Bandwidth/Throughput
    the total amount of work done in a given time

• Latency/Response Time
    the time between the start and the completion of an event

Bandwidth over Latency

Trends in Power and Energy

• Power = Energy per unit time
    1 watt = 1 joule per second
    energy to execute a workload = avg power x execution time

• Three primary concerns
    the max power for a processor
    sustained power consumption
    energy and energy efficiency

Trends in Power and Energy

• Sustained power consumption

• Metric: TDP (Thermal Design Power), which determines the cooling requirement

• Heat management
    1. reduce the clock rate, and hence power, as the temperature approaches the junction temperature limit;
    2. if step 1 is not enough, power down the chip.

Trends in Power and Energy

• Energy and Energy Efficiency

• energy to execute a workload = avg power x execution time

• Example
    processor A has 20% higher avg power consumption than processor B, but A executes the task in 70% of the time needed by B; which of A and B is more energy-efficient?

• Answer
    EnergyConsumptionA = 1.2 x 0.7 x EnergyConsumptionB = 0.84 x EnergyConsumptionB, so A is more energy-efficient.
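
The comparison follows directly from energy = avg power x execution time. A minimal sketch, with a hypothetical function name and both inputs expressed relative to processor B:

```python
def relative_energy(power_ratio, time_ratio):
    """Energy of A relative to B, given A's avg power and execution time relative to B."""
    return power_ratio * time_ratio


if __name__ == "__main__":
    ratio = relative_energy(power_ratio=1.2, time_ratio=0.7)
    print(f"A uses {ratio:.2f} x the energy of B")
```

A ratio below 1 means A finishes the task with less energy, even though it draws more power while running.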

Trends in Power and Energy

• Primary energy consumption within a microprocessor is for switching transistors: dynamic energy
    logic transition: 0->1->0 or 1->0->1

• The energy of a single transition

Trends in Power and Energy

• The power required per transistor

• For a fixed task, slowing clock rate (frequency) reduces power, but not energy.
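
The energy of a single transition and the power required per transistor are conventionally modeled as below. This is the standard textbook form, given here as an assumption about the intended formulas; "Capacitive load", "Voltage", and "Frequency switched" are the usual names for the quantities.

```latex
\begin{align*}
\text{Energy}_{\text{dynamic}} &\propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \\
\text{Power}_{\text{dynamic}} &\propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}
\end{align*}
```

Frequency appears only in the power expression. For a fixed task, a slower clock lowers power but stretches execution time by the same factor, so the energy for the task is unchanged.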

Trends in Power and Energy

• Example
    some microprocessors have adjustable voltage;
    15% reduction in voltage -> 15% reduction in frequency;
    what is the impact on dynamic energy and dynamic power?

Trends in Power and Energy

• Answer
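
A sketch of how the answer can be worked out, assuming the standard model in which dynamic energy scales with Voltage^2 and dynamic power with Voltage^2 x frequency, with capacitive load unchanged. The function name is hypothetical.

```python
def dvfs_ratios(voltage_scale, frequency_scale):
    """Return (new/old dynamic energy, new/old dynamic power).

    Assumes energy ~ V^2 and power ~ V^2 * f with a fixed capacitive load.
    """
    energy_ratio = voltage_scale ** 2
    power_ratio = energy_ratio * frequency_scale
    return energy_ratio, power_ratio


if __name__ == "__main__":
    # A 15% reduction means scaling both voltage and frequency by 0.85.
    energy, power = dvfs_ratios(0.85, 0.85)
    print(f"energy: {energy:.2f} of original, power: {power:.2f} of original")
```

Under these assumptions, dynamic energy drops to about 72% of the original and dynamic power to about 61%.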

Trends in Power and Energy

• Challenges
    distributing the power
    removing the heat
    preventing hot spots

potential research topics

Trends in Power and Energy

• Energy-efficiency improvement techniques
    1. do nothing well
       turn off the clock of inactive modules
    2. DVFS: dynamic voltage-frequency scaling
       scale down clock frequency and voltage during periods of low activity

DVFS

Trends in Power and Energy

• Energy-efficiency improvement techniques
    3. design for typical case
       PMDs, laptops: often idle
       memory and storage with low power modes to save energy
    4. overclocking
       the chip runs at a higher clock rate for a short time until temperature rises

Trends in Cost

• Cost of an Integrated Circuit
    the wafer is tested, then chopped into dies for packaging

Trends in Cost

• Cost of an Integrated Circuit

yield: the percentage of manufactured devices that survive the testing procedure

Intel Core i7 Die

Trends in Cost

• Example

Trends in Cost

• Cost of an Integrated Circuit

• N: process-complexity factor for measuring manufacturing difficulty
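
A sketch of how die cost can be estimated, using the cost model commonly given in the course textbook (dies per wafer, then die yield with the process-complexity factor N). The model is an assumption about the intended formulas, the function names are hypothetical, and every numeric input below is an illustrative assumption.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area divided by die area, minus dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Fraction of good dies; n is the process-complexity factor N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, n):
    """Wafer cost spread over the good dies on the wafer."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2, n
    )
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Illustrative assumptions: 30 cm wafer, 1.5 cm x 1.5 cm die,
    # 0.031 defects per cm^2, N = 13.5, wafer cost of 5000.
    print(dies_per_wafer(30, 2.25))
    print(die_yield(0.031, 2.25, 13.5))
    print(cost_of_die(5000, 30, 2.25, 0.031, 13.5))
```

Because die area enters both the die count and the yield, cost grows much faster than linearly with die size in this model.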

Dependability

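• SLA: service level agreements

• System states: up or down

• Service states
    service accomplishment
    service interruption
    transitions between the two: failure (accomplishment to interruption) and restoration (interruption to accomplishment)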

Dependability

• Two measures of dependability
    Module reliability
    Module availability

Dependability

• Two measures of dependability
    Module reliability: continuous service accomplishment from a reference initial instant

MTTF: mean time to failure
MTTR: mean time to repair
MTBF: mean time between failures
MTBF = MTTF + MTTR

Dependability

• Two measures of dependability
    Module reliability
    FIT: failures in time, i.e., failures per billion hours

MTTF of 1,000,000 hours = 10^9/10^6 = 1000 FIT
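
The reliability measures above translate directly into code. A minimal sketch with hypothetical function names, with all times in hours:

```python
def fit_from_mttf(mttf_hours):
    """Failures in time: expected failures per billion (10^9) hours of operation."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = mean time to failure + mean time to repair."""
    return mttf_hours + mttr_hours


if __name__ == "__main__":
    print(fit_from_mttf(1_000_000))
    # The 24-hour repair time is an illustrative assumption.
    print(mtbf(1_000_000, 24))
```

The first call reproduces the 1000 FIT figure for an MTTF of 1,000,000 hours.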

Dependability

• Two measures of dependability
    Module availability
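
Module availability is commonly defined as the fraction of time the service is accomplished, MTTF / (MTTF + MTTR). The sketch below assumes that standard definition; the function name and the repair time are illustrative.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time in service accomplishment: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative assumption: MTTF of 1,000,000 hours, 24 hours to repair.
    print(f"{availability(1_000_000, 24):.6f}")
```

Shortening the repair time raises availability even when reliability (MTTF) stays the same.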

Dependability

• Example

Dependability

• Answer

Measuring Performance

• Execution time
    the time between the start and the completion of an event

• Throughput
    the total amount of work done in a given time

Measuring Performance

• Computer X and Computer Y

• X is n times faster than Y
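
"X is n times faster than Y" is conventionally read as a ratio of execution times, with performance taken as the reciprocal of execution time. A sketch under that assumption, with a hypothetical function name and illustrative times:

```python
def times_faster(exec_time_y, exec_time_x):
    """n such that X is n times faster than Y: n = time_Y / time_X = perf_X / perf_Y."""
    return exec_time_y / exec_time_x


if __name__ == "__main__":
    # Illustrative assumption: Y takes 15 s and X takes 10 s on the same task.
    print(times_faster(15, 10))
```

With these times, X is 1.5 times faster than Y.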

Quantitative Principles

• Parallelism

• Locality
    temporal locality: recently accessed items are likely to be accessed in the near future;
    spatial locality: items whose addresses are near one another tend to be referenced close together in time

Quantitative Principles

• Amdahl’s Law: two factors
    1. Fraction_enhanced: e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
    2. Speedup_enhanced: e.g., 5/2 if the enhanced portion takes 2 seconds where it originally took 5 seconds
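
The two factors combine in the usual statement of Amdahl's Law, overall speedup = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). A sketch with a hypothetical function name; feeding both example values into one calculation is an assumption made for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is sped up."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # Fraction 20/60 of the run time, enhanced by a factor of 5/2.
    print(amdahl_speedup(20 / 60, 5 / 2))
    # Even a near-infinite enhancement is capped at 1 / (1 - fraction).
    print(amdahl_speedup(20 / 60, 1e12))
```

With these inputs the overall speedup is 1.25, and no enhancement of that same fraction can push it past 1.5.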

Quantitative Principles

• Example

Quantitative Principles

• The Processor Performance Equation
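
The processor performance equation is usually written as CPU time = instruction count x cycles per instruction (CPI) x clock cycle time. A sketch under that standard form; the function name and the numbers are illustrative assumptions.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instructions x cycles per instruction / clock rate.

    Dividing by the clock rate is the same as multiplying by the clock cycle time.
    """
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Illustrative assumption: 1e9 instructions, CPI of 2.0, 1 GHz clock.
    print(cpu_time(1e9, 2.0, 1e9))
```

Each of the three terms can be improved separately: fewer instructions, fewer cycles per instruction, or a shorter clock cycle.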

Quantitative Principles

• Example

Reading

• Chapter 1.8, 1.10 – 1.13