lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola...

56
Lecture 04: ISA Principles CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan [email protected] www.secs.oakland.edu/~yan 1

Transcript of lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola...

Page 1: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Lecture04:ISAPrinciples

CSE564ComputerArchitectureSummer2017

DepartmentofComputerScienceandEngineering

YonghongYan

[email protected]

www.secs.oakland.edu/~yan

1

Page 2: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Contents

1.   Introduc@on2.   ClassifyingInstruc@onSetArchitectures3.   MemoryAddressing4.   TypeandSizeofOperands5.   Opera@onsintheInstruc@onSet6.   Instruc@onsforControlFlow7.   EncodinganInstruc@onSet8.   CrosscuMngIssues:TheRoleofCompilers9.   RISC-VISA•  Supplement

–  MIPSISA

–  RISCvsCISC–  CompilercompilaFonstages

–  ISAHistorical•  AppendixL

–  ComparisonofISA

•  AppendixK2

Page 3: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

1Introduc@on

Instruc@onSetArchitecture–theporFonofthemachine

visibletotheassemblylevelprogrammerortothe

compilerwriter

–  Tousethehardwareofacomputer,wemustspeakits

language

–  Thewordsofacomputerlanguagearecalledinstruc,ons,and

itsvocabularyiscalledaninstruc,onset

3

instruc@onset

soRware

hardware

Instr.# OperaFon+Operands

i movl-4(%ebp),%eax

(i+1)addl%eax,(%edx)

(i+2)cmpl8(%ebp),%eax

(i+3)jl L5

:

L5:

Page 4: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

4

sum.sforX86

•  h]p://www.hep.wisc.edu/~pinghc/

x86AssmTutorial.htm

•  h]ps://en.wikibooks.org/wiki/X86_Assembly/SSE

Page 5: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

5

sum.sforRISC-V

h]ps://riscv.org/

Page 6: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

2ClassifyingInstruc@onSetArchitectures

6

OperandstorageinCPU Wherearetheyotherthanmemory

#explicitoperandsnamedperinstruc@on

Howmany?Min,Max,Average

Addressingmode HowtheeffecFveaddressforan

operandcalculated?Canalluseany

mode?

Opera@ons WhataretheopFonsfortheopcode?

Type&sizeofoperands Howistypingdone?Howisthesize

specified?

Thesechoicescri,callyaffectnumberofinstruc,ons,CPI,and

CPUcycle,me

Page 7: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ISAClassifica@on

•  MostbasicdifferenFaFon:internalstorageinaprocessor

–  Operandsmaybenamedexplicitlyorimplicitly

•  Majorchoices:

1.  Inanaccumulatorarchitectureoneoperandisimplicitlythe

accumulator=>similartocalculator

2.  Theoperandsinastackarchitectureareimplicitlyonthe

topofthestack

3.  Thegeneral-purposeregisterarchitectureshaveonlyexplicitoperands–eitherregistersormemorylocaFon

7

Page 8: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

RegisterMachines

• Howmanyregistersaresufficient?

• General-purposeregistersvs.special-purposeregisters•  compilerflexibilityandhand-op,miza,on

•  TwomajorconcernsforarithmeFcandlogicalinstrucFons(ALU)

1.Twoorthreeoperands

•  X+Y⇒X•  X+Y⇒Z

2.Howmanyoftheoperandsmaybememoryaddresses(0–3)

8Hence,registerarchitectureclassifica@on(#mem,#operands)

Number of memory addresses

Maximum number of operands

allowed

Type of Architecture

Examples

0 3 Load-Store Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32

1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x

2 2 Memory – memory VAX (also has 3 operand formats)

3 3 Memory - memory VAX (also has 2 operand formats)

Page 9: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

(0,3):Register-Register

•  ALUisRegistertoRegister–alsoknownaspureReducedInstruc-onSetComputer(RISC)

o Advantages–  simplefixedlengthinstrucFonencoding

–  decodeissimplesinceinstrucFontypesaresmall

–  simplecodegeneraFonmodel

–  instrucFonCPItendstobeveryuniform

–  exceptformemoryinstrucFonsofcourse

–  butthereareonly2ofthem-loadandstore

o Disadvantages–  instrucFoncounttendstobehigher

–  someinstrucFonsareshort-wasFnginstrucFonwordbits

9

Page 10: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

(1,2):Register-Memory

10

EvolvedRISCandalsooldCISC

•newRISCmachinescapableofdoingspecula,veloads

•predicatedand/ordeferredloadsarealsopossible

o Advantages–  dataaccesstoALUimmediatewithoutloadingfirst

–  instrucFonformatisrelaFvelysimpletoencode

–  codedensityisimprovedoverRegister(0,3)model

o Disadvantages–  operandsarenotequivalent-sourceoperandmaybe

destroyed

–  needformemoryaddressfieldmaylimit#ofregisters

–  CPIwillvary•ifmemorytargetisinL0cachethennotsobad

•ifnot-lifegetsmiserable

Page 11: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

(2,2)or(3,3):Memory-Memory

11

TrueandmostcomplexCISCmodel

•currentlyexFnctandlikelytoremainso

•morecomplexmemoryacFonsarelikelytoappearbutnot

directlylinkedtotheALU

o Advantages–  mostcompactcode

–  doesn’twasteregistersfortemporaryvalues

• goodideaforuseoncedata-e.g.streamingmedia

o Disadvantages–  largevariaFonininstrucFonsize-mayneedashoe-horn

–  largevariaFoninCPI-i.e.workperinstrucFon

–  exacerbatestheinfamousmemorybo]leneck

• registerfilereducesmemoryaccessesifreused

Notusedtoday

Page 12: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Summary:TradeoffsfortheISAClasses

12

Type Advantages Disadvantages

Register-register (0,3)

Simple, fixed length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute.

Higher instruction count than architectures with memory references in the instructions. More instructions and lower instruction density leads to larger programs

Register-memory (1,2)

Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density

Operands are not equivalent since a source operand is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location

Memory-memory (2,2) or (3,3)

Most compact. Does not waste registers for temporaries.

Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today)

Page 13: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

3MemoryAddressing

• Objectshavebyteaddresses– thenumberofbytescountedfromthebeginningofmemory

• ObjectLength:– bytes(8bits),halfwords(16bits),– words(32bits),anddoublewords(64bits).– Thetypeisimpliedinopcode,e.g.,

•  LDB–loadbyte•  LDW–loadword,etc

• ByteOrdering– Li]leEndian:putsthebytewhoseaddressisxx00attheleastsignificantposiFonintheword.(7,6,5,4,3,2,1,0)

– BigEndian:putsthebytewhoseaddressisxx00atthemostsignificantposiFonin

theword.(0,1,2,3,4,5,6,7)

•  Problemoccurswhenexchangingdataamongmachineswithdifferent

orderings

13

Page 14: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Interpre@ngMemoryAddresses

•  AlignmentIssues

–  Accessestoobjectslargerthanabytemustbealigned.Anaccessto

anobjectofsizesbytesatbyteaddressAisalignedifAmods=0.

–  MisalignmentcauseshardwarecomplicaFons,sincethememoryis

typicallyalignedonawordoradouble-wordboundary

–  Misalignmenttypicallyresultsinanalignmentfaultthatmustbe

handledbytheOS

•  Hence–  byteaddressisanything-nevermisaligned

–  halfword-evenaddresses-loworderaddressbit=0(XXXXXXX0)elsetrap

–  word-loworder2addressbits=0(XXXXXX00)elsetrap–  doubleword-loworder3addressbits=0(XXXXX000)elsetrap

14

Page 15: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

MemoryAlignment

15

Page 16: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

16

Aligned/MisalignedAddresses

Page 17: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

17

AddressingModes

•  Howarchitecturespecifytheeffec@veaddressofanobject?

–  Effec-veaddress:theactualmemoryaddressspecifiedbythe

addressingmode.

•  E.g.Mem[R[R1]]referstothecontentsofthememory

loca,onwhoseloca,onisgiventhecontentsofregister1

(R1).

•  AddressingModes:–  Register.–  Immediate

–  Displacement

–  Registerindirect,……..

Page 18: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

18

AddressModes

Page 19: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

AddressingModeImpacts

•  InstrucFoncounts•  ArchitectureComplexity

•  CPI

19

Page 20: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

20

Summaryofuseofmemoryaddressingmodes

Page 21: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

21

Displacementvaluesarewidelydistributed

ImpactinstrucFonlength

Page 22: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

DisplacementAddressingMode

•  Benchmarksshow–  12bitsofdisplacementwouldcaptureabout75%ofthefull32-bit

displacements–  16bitsshouldcaptureabout99%

•  Remember:–  [email protected],thechoiceisatleast12-16bits

•  Foraddressesthatdofitindisplacementsize:

Add R4,10000(R0)

•  Foraddressesthatdon’tfitindisplacementsize,thecompilermustdo

thefollowing:

Load R1,1000000

AddR1,R0

Add R4,0(R1)

22

Page 23: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ImmediateAddressingMode

•  Usedwherewewanttogettoanumericalvalueinan

instrucFon

•  Around25%oftheopera@onshaveanimmediateoperand

23

Athighlevel:a=b+3;if(a>17)gotoAddr

AtAssemblerlevel:LoadR2,#3AddR0,R1,R2LoadR2,#17CMPBGTR1,R2LoadR1,AddressJump(R1)

Page 24: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

About25%ofdatatransferandALUopera@onshaveanimmediateoperand

24

ImpactinstrucFonlength

Page 25: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Numberofbitsforimmediate

•  16bitswouldcaptureabout80%and8bitsabout50%.

25

ImpactinstrucFonlength

Page 26: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Summary:MemoryAddressing

•  Anewarchitectureexpectedtosupportatleast:displacement,immediate,andregisterindirect–  represent75%to99%oftheaddressingmodes

•  Thesizeoftheaddressfordisplacementmodetobeat

least12-16bits–  capture75%to99%ofthedisplacements

•  Thesizeoftheimmediatefieldtobeatleast8-16bits–  capture50%to80%oftheimmediates

Processorsrelyoncompilerstogeneratecodesusingthose

addressingmode

26

Page 27: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

4TypeAndSizeofOperands

• Thetypeoftheoperandisusuallyencodedintheopcode–  e.g.,LDB–loadbyte;LDW–loadword

• Commonoperandtypes:(implytheirsizes)

Character(8bitsor1byte)

Halfword(16bitsor2bytes)

Word(32bitsor4bytes)

Doubleword(64bitsor8bytes)

SingleprecisionfloaFngpoint(4bytesor1word)

DoubleprecisionfloaFngpoint(8bytesor2words)

ü CharactersarealmostalwaysinASCII

ü 16-bitUnicode(usedinJava)isgainingpopularityü  Integersaretwo�scomplementbinary

ü  Floa,ngpointsfollowtheIEEEstandard754• Somearchitecturessupportpackeddecimal:4bitsareusedto

encodethevalues0-9;2decimaldigitsarepackedintoeachbyte27

Howisthetypeofanoperanddesignated?

Page 28: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Distribu@onofdataaccessesbysize

28

Page 29: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Summary:TypeandSizeofoperands

•  32-architecturesupports8-,16-,and32-bitintegers,32-bitand64-bitIEEE754floaFng-pointdata.

•  Anew64-bitaddressarchitecturesupports64-bitintegers•  MediaprocessorandDSPsneedwideraccumulaFng

registersforaccuracy.

29

Page 30: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

5Opera@onsintheInstruc@onSet

•  AllcomputersgenerallyprovideafullsetofoperaFonsfor

thefirstthreecategories

•  AllcomputersmusthavesomeinstrucFonsupportforbasic

systemfuncFons

•  GraphicsinstrucFonstypicallyoperateonmanysmaller

dataitemsinparallel

30

Page 31: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Top10instruc@onsforthe80x86

31

Page 32: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

6Instruc@onsforControlFlow

•  ControlinstrucFonschangetheflowofcontrol:–  insteadofexecuFngthenextinstrucFon,theprogram

branchestotheaddressspecifiedinthebranchinginstrucFons

•  Theybreakthepipeline–  DifficulttoopFmizeout

–  ANDtheyarefrequent•  FourtypesofcontrolinstrucFons

–  Condi@onalbranches–  Jumps–uncondi@onaltransfer–  Procedurecalls–  Procedurereturns

32

Page 33: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Breakdownofcontrolflowinstruc@ons

–  Condi@onalbranches–  Jumps–uncondi@onaltransfer–  Procedurecalls–  Procedurereturns

•  Issues:–  Whereisthetargetaddress?Howtospecifyit?–  Whereisreturnaddresskept?Howaretheargumentspassed?

(calls)–  Whereisreturnaddress?Howaretheresultspassed?(returns)

33

Page 34: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

AddressingModesforControlFlowInstruc@ons

•  PC-relaFve(ProgramCounter)

–  supplyadisplacementaddedtothePC

•  KnownatcompileFmeforjumps,branches,andcalls(specified

withintheinstrucFon)

–  thetargetisoxennearthecurrentinstrucFon•  requiringfewerbits•  independentlyofwhereitisloaded(posiFonindependence)

•  Registerindirectaddressing–dynamicaddressing

–  ThetargetaddressmaynotbeknownatcompileFme

–  Namingaregisterthatcontainsthetargetaddress

•  Caseorswitchstatements

•  VirtualfuncFonsormethodsinC++orJava

•  High-orderfuncFonsorfuncFonpointersinCorC++•  Dynamicallysharedlibraries

34

Page 35: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Branchdistances

35

Page 36: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Condi@onalBranchOp@ons

36

Figure2.21MajormethodsforevaluaFngbranchcondiFons

Page 37: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ComparisonTypevs.Frequency

•  Mostloopsgofrom0ton.

•  Mostbackwardbranches

areloops–takenabout

90%

37

Program % backward branches

% all control instructions that modify PC

gcc 26% 63% spice 31% 63% TeX 17% 70% Average 25% 65%

Page 38: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ProcedureInvoca@onOp@ons

•  Procedurecallsandreturns–  controltransfer–  statesaving;thereturnaddressmustbesaved

Newerarchitecturesrequirethecompilertogeneratestoresandloads

foreachregistersavedandrestored

•  TwobasicconvenFonsinusetosaveregisters–  callersaving:thecallingproceduremustsavetheregistersthatit

wantspreservedforaccessaxerthecall

•  thecalledprocedureneednotworryaboutregisters–  calleesaving:thecalledproceduremustsavetheregistersitwantsto

use

•  leavingthecallerunrestrained

mostrealsystemstodayuseacombinaFonofboth

•  ApplicaFonbinaryinterface(ABI)thatsetdownthebasicrulesastowhichregisterbecallersavedandwhichshouldbecalleesaved

38

Page 39: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

7.EncodinganInstruc@onSet

•  Opcode:specifyingtheoperaFon•  #ofoperand

–  addressingmode

–  addressspecifier:tellswhataddressingmodeisused

–  Load-storecomputer

•  Onlyonememoryoperand

•  Onlyoneortwoaddressingmodes

•  ThearchitecturemustbalancingseveralcompeFngforceswhen

encodingtheinstrucFonset:

–  #ofregisters&&Addressingmodes

–  Sizeofregisters&&Addressingmodefields;AverageinstrucFonsize&&Averageprogramsize.

–  EasytohandleinpipelineimplementaFon.

39

Page 40: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

x86vs.AlphaInstruc@onFormats

•  x86:

•  Alpha:

40

Page 41: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Instruc@onLengthTradeoffs

•  Fixedlength:LengthofallinstrucFonsthesame

+EasiertodecodesingleinstrucFoninhardware

+EasiertodecodemulFpleinstrucFonsconcurrently

--WastedbitsininstrucFons(Whyisthisbad?)

--Harder-to-extendISA(howtoaddnewinstrucFons?)

•  Variablelength:LengthofinstrucFonsdifferent(determinedbyopcodeandsub-opcode)

+Compactencoding(Whyisthisgood?)

Intel432:Huffmanencoding(sortof).6to321bitinstrucFons.How?

--MorelogictodecodeasingleinstrucFon

--HardertodecodemulFpleinstrucFonsconcurrently

•  Tradeoffs–  Codesize(memoryspace,bandwidth,latency)vs.hardwarecomplexity

–  ISAextensibilityandexpressiveness

–  Performance?Smallercodevs.imperfectdecode

41

Page 42: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

UniformvsNon-uniformDecode

•  Uniformdecode:SamebitsineachinstrucFoncorrespond

tothesamemeaning

–  OpcodeisalwaysinthesamelocaFon

–  immediatevalues,…

–  Many�RISC�ISAs:Alpha,MIPS,SPARC

+Easierdecode,simplerhardware

+Enablesparallelism:generatetargetaddressbeforeknowingtheinstrucFon

isabranch

--RestrictsinstrucFonformat(fewerinstrucFons?)orwastesspace

•  Non-uniformdecode

–  E.g.,opcodecanbethe1st-7thbyteinx86+MorecompactandpowerfulinstrucFonformat

--Morecomplexdecodelogic

42

Page 43: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Threebasicvaria@oninstruc@onencoding:variablelength,fixedlength,andhybrid

43

Thelengthof80x86(CISC)instruc@onsvariesbetween1and17bytes.

ThelengthofmostRISCISAinstruc@onsare4bytes.

&&

X86programaregenerallysmallerthanRISCISA.

ToreduceRISCcodesize

Page 44: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ReducedCodeSizeinRISCs

•  Hybridencoding–support16-bitand32-bitinstrucFonsinRISC,eg.ARMThumb,MIPS16andRISC-V

–  narrowinstrucFonssupportfeweroperaFons,smalleraddressand

immediatefields,fewerregisters,andtwo-addressformatratherthanthe

classicthree-addressformat

–  claimacodesizereducFonofupto40%

•  CompressioninIBM’sCodePack–  AddshardwaretodecompressinstrucFonsastheyarefetchedfrom

memoryonaninstrucFoncachemiss

–  TheinstrucFoncachecontainsfull32-bitinstrucFons,butcompressed

codeiskeptinmainmemory,ROMs,andthedisk

–  ClaimcodereducFon35%-40%

–  PowerPCcreateaHashtableinmemorythatmapbetweencompressed

anduncompressedaddress.Codesize35%~40%

•  Hitachi’sSuperH:fixed16-bitformat

–  16ratherthan32registers–  fewerinstrucFons

44

Page 45: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

SummaryofInstruc@onEncoding

•  Threechoices–  Variable,fixedandhybrid–  Notethedifferencesofhybridandvariable

•  ChoicesofinstrucFonencodingisatradeoffbetween–  Forperformance:fixedencoding

–  Forcodesize:variableencoding

•  HowhybridencodingisusedinRISCtoreducecodesize–  16bitand32bit

•  Ingeneral,wesee:–  RISC:fixedorhybrid–  CISC:variable

45

Page 46: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

8TheRoleofCompilers

•  Almostallprogrammingisdoneinhigh-levellanguages.

–  AnISAisessenFallyacompliertarget.

•  Seebackupslidesforthecompila,onstagebymost

compiler,e.g.gcc

•  Compilergoals:

–  Allcorrectprogramsexecutecorrectly

–  Mostcompiledprogramsexecutefast(opFmizaFons)

–  FastcompilaFon

–  Debuggingsupport

46

Page 47: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

TypicalModernCompilerStructure

47

Figure A.19 Compilers typically consist of two to four passes, with more highly optimizing compilers having more passes. This structure maximizes the probability that a program compiled at various levels of optimization will produce the same output when given the same input. The optimizing passes are designed to be optional and may be skipped when faster compilation is the goal and lower-quality code is acceptable. A pass is simply one phase in which the compiler reads and transforms the entire program. (The term phase is often used inter-changeably with pass.) Because the optimizing passes are separated, multiple languages can use the same optimizing and code generation passes. Only a new front end is required for a new language.

Page 48: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Op@miza@onTypes

•  Highlevel–doneatornearsourcecodelevel–  Ifprocedureiscalledonlyonce,putitin-lineandsaveCALL–  moregeneralcase:ifcall-count<somethreshold,putthemin-line

•  Local–donewithinstraight-linecode–  commonsub-expressionsproducesamevalue–eitherallocatea

registerorreplacewithsinglecopy

–  constantpropagaFon–replaceconstantvaluedvariablewiththeconstant

–  stackheightreducFon–re-arrangeexpressiontreetominimize

temporarystorageneeds

•  Global–acrossabranch–  copypropagaFon–replaceallinstancesofavariableAthathas

beenassignedX(i.e.,A=X)withX.

–  codemoFon–removecodefromaloopthatcomputessamevalue

eachiteraFonoftheloopandputitbeforetheloop

–  simplifyoreliminatearrayaddressingcalculaFonsinloops

48

Page 49: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Op@miza@onTypes

•  Machine-dependentopFmizaFons–basedonmachine

knowledge

–  strengthreducFon–replacemulFplybyaconstantwithshixs

andadds

•  wouldmakesenseiftherewasnohardwaresupportfor

MUL

•  atrickierversion:17×=arithmeFclexshix4andadd

•  pipeliningscheduling–reorderinstrucFonstoimprove

pipelineperformance

–  dependencyanalysis–  branchoffsetopFmizaFon-reordercodetominimizebranch

offsets

49

Page 50: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Majortypesofop@miza@ons

50

Page 51: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

ComplierOp@miza@ons–ChangeinIC

51

•  L0–unopFmized

•  L1–localopts,codescheduling,&localreg.allocaFon•  L2–globaloptsandlooptransformaFons,&globalreg.AllocaFon

•  L3–procedureintegraFon

gcc-O2hello.c-ohello

Page 52: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

CompilerBasedRegisterOp@miza@on

•  Compilerassumessmallnumberofregisters(16-32)–  OpFmizinguseisuptocompiler

–  HLLprogramshavenoexplicitreferencestoregisters

–  usually–isthisalwaystrue?

•  CompilerApproach

–  Assignsymbolicorvirtualregistertoeachcandidatevariable

–  Map(unlimited)symbolicregisterstorealregisters

–  Symbolicregistersthatdonotoverlapcansharerealregisters

–  Ifyourunoutofrealregisterssomevariables

•  Spilling

Page 53: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

GraphColoring

•  Givenagraphofnodesandedges–  Assignacolortoeachnode–  Adjacentnodeshavedifferentcolors–  Useminimumnumberofcolors

•  RegistraFonallocaFon–  Nodesaresymbolicregisters

–  Tworegistersthatareliveinthesameprogramfragmentare

joinedbyanedge

–  Trytocolorthegraphwithncolors,wherenisthenumberof

realregisters

–  Nodesthatcannotbecoloredareplacedinmemory

h]ps://en.wikipedia.org/wiki/Graph_coloring

Page 54: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Iron-codeSummary

•  Sec-onA.2—Usegeneral-purposeregisterswithaload-storearchitecture.•  Sec-onA.3—Supporttheseaddressingmodes:displacement(withanaddressoffset

sizeof12to16bits),immediate(size8to16bits),andregisterindirect.•  Sec-onA.4—Supportthesedatasizesandtypes:8-,16-,32-,and64-bitintegersand

64-bitIEEE754floa@ng-pointnumbers.–  Nowwesee16-bitFPfordeeplearninginGPU

•  hup://www.nextplavorm.com/2016/09/13/nvidia-pushes-deep-learning-inference-new-pascal-gpus/

•  Sec-onA.5—Supportthesesimpleinstruc@ons,sincetheywilldominatethenumberofinstruc@onsexecuted:load,store,add,subtract,moveregister-register,andshiR.

•  Sec-onA.6—Compareequal,comparenotequal,compareless,branch(withaPC-rela@veaddressatleast8bitslong),jump,call,andreturn.

•  Sec-onA.7—Usefixedinstruc@onencodingifinterestedinperformance,andusevariableinstruc@onencodingifinterestedincodesize.

•  Sec-onA.8—Provideatleast16general-purposeregisters,besurealladdressingmodesapplytoalldatatransferinstruc@ons,andaimforaminimalistIS

–  ORenuseseparatefloa@ng-pointregisters.–  Thejus@fica@onistoincreasethetotalnumberofregisterswithoutraisingproblems

intheinstruc@onfor-matorinthespeedofthegeneral-purposeregisterfile.Thiscompromise,however,isnotorthogonal.

54

Page 55: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

RealWorldISA

55

Page 56: lecture04 ISA Principles - GitHub Pages · 1 2 Register-Memory IBM 360/370, Intel 80x86, Motorola ... – decode is simple since instrucFon types are small ... • PC-relaFve ...

Thedetailsistotrade-off

56