Chapter 2: Instructions: Language of the Computer · Instruction Set Architecture –the portion of...

Chapter 2: Instructions: Language of the Computer CSCE 212 Introduction to Computer Architecture, Spring 2019 https://passlab.github.io/CSCE212/ Department of Computer Science and Engineering Yonghong Yan [email protected] http://cse.sc.edu/~yanyh


Page 1: Chapter 2: Instructions: Language of the Computer · Instruction Set Architecture –the portion of the machine visible to the assembly level programmer or to the compiler writer

Chapter 2: Instructions: Language of the Computer

CSCE 212 Introduction to Computer Architecture, Spring 2019
https://passlab.github.io/CSCE212/

Department of Computer Science and Engineering
Yonghong Yan

[email protected]
http://cse.sc.edu/~yanyh

Page 2: Chapter 2: Instructions: Language of the Computer

• Lecture 07
  – 2.1 Introduction
  – 2.2 Operations of the Computer Hardware
  – 2.3 Operands of the Computer Hardware
  – 2.4 Signed and Unsigned Numbers
  – 2.5 Representing Instructions in the Computer
• Lecture 08
  – 2.6 Logical Operations
  – 2.7 Instructions for Making Decisions
  – A.4 – A.6: Loading, Memory and Procedure Call Convention
  – 2.8 Supporting Procedures in Computer Hardware
  – 2.9 Communicating with People
• Lecture 09
  – 2.10 MIPS Addressing for 32-Bit Immediates and Addresses
  – 2.11 Parallelism and Instructions: Synchronization
  – 2.12 Translating and Starting a Program
• We covered in Appendix A and C Basics
  – 2.13 A C Sort Example to Put It All Together
  – 2.14 Arrays versus Pointers
  – 2.15 Advanced Material: Compiling C and Interpreting Java
• Lecture 10
  – 2.16 Real Stuff: ARMv7 (32-bit) Instructions
  – 2.17 Real Stuff: x86 Instructions
  – 2.18 Real Stuff: ARMv8 (64-bit) Instructions
  – 2.19 Fallacies and Pitfalls
  – 2.20 Concluding Remarks
  – 2.21 Historical Perspective and Further Reading

Page 3: Review of Lecture 06

Page 4: Review: C Basics

    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        /* print a greeting */
        printf("Good evening!\n");
        return 0;
    }

Page 5: C Variable and Pointer

    & = address of
    * = contents at

Page 6: C Pointer and Memory

    & = address of
    * = contents at

Page 7: Example 2: swap_2

    void swap_2(int *a, int *b)
    {
        int temp;
        temp = *a;
        *a = *b;
        *b = temp;
    }

    void call_swap_2( ) {
        int x = 3;
        int y = 4;
        swap_2(&x, &y);
        /* values of x and y ? */
    }

Q: Let x=3, y=4, after swap_2(&x,&y); x =? y=?

A1: x=3; y=4;

A2: x=4; y=3;

Page 8: Arrays

• Adjacent memory locations storing the same type of data
• int a[6]; means space for six integers
• a is the name of the array's base address
  – 0x0C

Page 9: Address of Array Elements

• int a[6];
• a is the name of the array's base address
  – 0x0C
  – E.g. &a[2]: 0x0C + 2*4 = 0x14
• By itself, a is also the address of the first integer
  – *a and a[0] mean the same thing
• The address of a is not stored in memory: the compiler inserts code to compute it when it appears

Byte address of a[i]: &a[i] = a + i * sizeof(int)
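The address rule above can be checked directly in C. This is a minimal sketch, assuming a flat byte-addressed machine; the helper names element_address and address_rule_holds are hypothetical, and the real base address will differ from the slide's 0x0C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte address of element i, computed by hand as base + i * sizeof(int).
   element_address is an illustrative helper, not part of the lecture. */
uintptr_t element_address(const int *base, size_t i) {
    return (uintptr_t)base + i * sizeof(int);
}

/* Returns 1 when the hand-computed address equals the compiler's &a[i]. */
int address_rule_holds(const int *a, size_t n, size_t i) {
    assert(i < n); /* stay inside the array */
    return element_address(a, i) == (uintptr_t)&a[i];
}
```

For int a[6], address_rule_holds(a, 6, 2) compares the slide's 0x0C + 2*4 style computation with the address the compiler produces.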

Page 10: C Stores Array in Memory in Row Major

    int A[3][4];

    8 6 5 4
    2 1 9 7
    3 6 4 2

Offset of A[1][2]:

    sizeof(int) * (1 * 4 + 2) = 4 * 6 = 24

Address of element A[1][2]:

    = A + offset (from A to A[1][2])
    = A + sizeof(int) * (1 * 4 + 2)
    = A + 4 * 6
    = A + 24
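The row-major offset computation can be written as a small C function. A sketch only: row_major_offset is a hypothetical helper name, and the 4-byte int in the slide is an assumption about the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [row][col] from the start of a row-major
   2-D array with ncols columns and elem_size bytes per element. */
size_t row_major_offset(size_t row, size_t col, size_t ncols, size_t elem_size) {
    assert(col < ncols); /* column index must fall inside one row */
    return elem_size * (row * ncols + col);
}
```

With row = 1, col = 2, ncols = 4 and 4-byte elements this gives 4 * (1 * 4 + 2) = 24, matching the slide.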

Page 11: End of Review of Lecture 06


Page 13: MIPS and X86_64 Assembly Example

§2.1 Introduction

Page 14: Instruction Set

• The repertoire of instructions of a computer
• Different computers have different instruction sets
  – But with many aspects in common
• Early computers had very simple instruction sets
  – Simplified implementation
• Many modern computers also have simple instruction sets

Page 15: Instruction Set Architecture: the Interface between Hardware and Software

Instruction Set Architecture – the portion of the machine visible to the assembly level programmer or to the compiler writer
  – To use the hardware of a computer, we must speak its language
  – The words of a computer language are called instructions, and its vocabulary is called an instruction set

[Figure: the instruction set as the layer between software (above) and hardware (below)]

    Instr. #   Operation + Operands
    i          movl -4(%ebp), %eax
    (i+1)      addl %eax, (%edx)
    (i+2)      cmpl 8(%ebp), %eax
    (i+3)      jl L5
    :
    L5:

Page 16: RISC vs. CISC

• Design "philosophies" for ISAs: RISC vs. CISC
  – CISC = Complex Instruction Set Computer
    • x86, x86_64 (Intel and AMD, mainstream desktop/laptop/server)
    • Modern x86 processors still translate instructions internally into RISC-like micro-operations
  – RISC = Reduced Instruction Set Computer
    • ARM: smartphone/pad
    • RISC-V: free ISA, closer to MIPS than other ISAs; the same textbook is available in a RISC-V version
    • Others: Power, SPARC, etc.
• RISC:
  – Small instruction set
    • Easier for compilers
  – Limit each instruction to (at most):
    • three register accesses,
    • one memory access,
    • one ALU operation
    • => facilitates parallel instruction execution (ILP)
  – Load-store machine: minimize off-chip access
• Tradeoff:

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
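The CPU time equation is a plain product of three factors, which a few lines of C make concrete. A sketch with made-up numbers: the function name cpu_time_seconds and the sample values are assumptions for illustration, not measurements.

```c
#include <assert.h>

/* CPU time = instructions/program * clock cycles/instruction * seconds/clock cycle. */
double cpu_time_seconds(double instructions, double cycles_per_instruction,
                        double seconds_per_cycle) {
    assert(instructions >= 0.0);
    assert(cycles_per_instruction > 0.0 && seconds_per_cycle > 0.0);
    return instructions * cycles_per_instruction * seconds_per_cycle;
}
```

For example, 1e9 instructions at a CPI of 2.0 on a 2 GHz clock (0.5e-9 seconds per cycle) take about one second; a design that needs more instructions can still win if it lowers CPI or cycle time enough.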

Page 17: The MIPS Instruction Set

• Used as the example throughout the book
• Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
• Large share of embedded core market
  – Applications in consumer electronics, network/storage equipment, cameras, printers, …
• Typical of many modern ISAs
  – See MIPS Reference Data tear-out card, and Appendixes B and E
• Other Instruction Set Architectures:
  – x86 and x86_64: Intel and AMD, mainstream desktop/laptop/server
  – ARM: smartphone/pad
  – RISC-V: emerging and free ISA, closer to MIPS than other ISAs
    • The same textbook in RISC-V version
  – Others: Power, SPARC, etc.

Page 18: Arithmetic Operations

§2.2 Operations of the Computer Hardware

• Add and subtract, three operands
  – Two sources and one destination

    add a, b, c    # a gets b + c

• All arithmetic operations have this form
• Design Principle 1: Simplicity favours regularity
  – Regularity makes implementation simpler
  – Simplicity enables higher performance at lower cost

Page 19: Arithmetic Example

• C code:

    f = (g + h) - (i + j);

• Compiled MIPS code:

    add t0, g, h     # temp t0 = g + h
    add t1, i, j     # temp t1 = i + j
    sub f, t0, t1    # f = t0 - t1

Page 20: Registers in CPU

§2.3 Operands of the Computer Hardware

• Registers are small, very fast storage inside the CPU.
• Data and instructions must be loaded into registers before they can be processed.

Page 21: Register Operands

• Arithmetic instructions use register operands
• MIPS has a 32 × 32-bit register file
  – Use for frequently accessed data
  – Numbered 0 to 31
  – 32-bit data called a "word"
• Assembler names
  – $t0, $t1, …, $t9 for temporary values
  – $s0, $s1, …, $s7 for saved variables
• Design Principle 2: Smaller is faster
  – cf. main memory: millions of locations

Page 22: Register Operand Example

• C code:

    f = (g + h) - (i + j);

  – f, …, j in $s0, …, $s4
• Compiled MIPS code:

    add $t0, $s1, $s2    # register $t0 contains g + h
    add $t1, $s3, $s4    # register $t1 contains i + j
    sub $s0, $t0, $t1    # $s0 gets $t0 – $t1, which is (g + h) – (i + j)

Page 23: Memory Operands

• Main memory used for composite data
  – Arrays, structures, dynamic data
• To apply arithmetic operations
  – Load values from memory into registers
  – Store result from register to memory
• Memory is byte addressed
  – Each address identifies an 8-bit byte
• Words are aligned in memory
  – Address must be a multiple of 4
• MIPS is Big Endian
  – Most-significant byte at least address of a word
  – cf. Little Endian: least-significant byte at least address
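Byte order and word alignment can both be probed from C. The sketch below is illustrative only: the function names are hypothetical, and the result of host_is_big_endian depends on the machine that runs it (most desktop hosts are little endian, unlike the big-endian MIPS described above).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the most-significant byte at the least address. */
int host_is_big_endian(void) {
    uint32_t word = 0x01020304u;
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x01;
}

/* Assemble a 32-bit word from four bytes laid out in big-endian order. */
uint32_t load_word_big_endian(const unsigned char *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* A word address is aligned when it is a multiple of 4. */
int is_word_aligned(uint32_t address) {
    return address % 4u == 0u;
}
```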

Page 24: Memory Operand Example 1

• C code:

    int A[N];
    g = h + A[8];

  – g in $s1, h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires offset of 32; A[8] is a right-value (rvalue) reference → load
    • 4 bytes per word

    lw  $t0, 32($s3)     # load word: offset 32, base register $s3
    add $s1, $s2, $t0

Page 25: Memory Operand Example 2

• C code:

    int A[N];
    A[12] = h + A[8];

  – h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires offset of 32 and index 12 requires offset of 48; A[8] is a right-value (load), A[12] is a left-value (store)

    lw  $t0, 32($s3)     # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)     # store word
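The load/add/store sequence can be mirrored in C with explicit byte offsets. This is a sketch under the slide's assumption of 4-byte words; load_word, store_word and add_h_to_element are hypothetical helpers that model lw and sw, not real MIPS semantics in full.

```c
#include <assert.h>
#include <stdint.h>

/* Model of lw: read the word at base + byte_offset (offset must be word aligned). */
int32_t load_word(const int32_t *base, int32_t byte_offset) {
    assert(byte_offset >= 0 && byte_offset % 4 == 0);
    return base[byte_offset / 4];
}

/* Model of sw: write a word at base + byte_offset. */
void store_word(int32_t *base, int32_t byte_offset, int32_t value) {
    assert(byte_offset >= 0 && byte_offset % 4 == 0);
    base[byte_offset / 4] = value;
}

/* A[12] = h + A[8], written as the three MIPS steps. */
void add_h_to_element(int32_t *A, int32_t h) {
    int32_t t0 = load_word(A, 32);  /* lw  $t0, 32($s3) */
    t0 = h + t0;                    /* add $t0, $s2, $t0 */
    store_word(A, 48, t0);          /* sw  $t0, 48($s3) */
}
```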

Page 26: Registers vs. Memory

• Registers are faster to access than memory
• Operating on memory data requires loads and stores
  – More instructions to be executed
• Compiler must use registers for variables as much as possible
  – Only spill to memory for less frequently used variables
  – Register optimization is important!

Page 27: Immediate Operands

• Constant data specified in an instruction

    addi $s3, $s3, 4

• No subtract immediate instruction
  – Just use a negative constant

    addi $s2, $s1, -1

• Design Principle 3: Make the common case fast
  – Small constants are common
  – Immediate operand avoids a load instruction

Page 28: The Constant Zero

• MIPS register 0 ($zero) is the constant 0
  – Cannot be overwritten
• Useful for common operations
  – E.g., move between registers

    add $t2, $s1, $zero

Page 29: Unsigned Binary Integers

§2.4 Signed and Unsigned Numbers

• Given an n-bit number

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$

• Range: 0 to +2^n – 1
• Example
  – 0000 0000 0000 0000 0000 0000 0000 1011 (base 2)
    = 0 + … + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0
    = 0 + … + 8 + 0 + 2 + 1 = 11 (base 10)
• Using 32 bits
  – 0 to +4,294,967,295
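The positional sum for unsigned numbers can be evaluated bit by bit. A small sketch; unsigned_value is a hypothetical helper that accepts a string of '0' and '1' characters (spaces allowed, at most 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate a binary string as an unsigned integer, most-significant bit first.
   Doubling the running value and adding the next bit is the same sum of
   x_i * 2^i, accumulated from the left. */
uint32_t unsigned_value(const char *bits) {
    uint32_t x = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        if (*p == ' ') continue;          /* allow grouping spaces */
        assert(*p == '0' || *p == '1');
        x = x * 2u + (uint32_t)(*p - '0');
    }
    return x;
}
```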

Page 30: 2s-Complement Signed Integers

• Given an n-bit number

$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$

• Range: –2^(n–1) to +2^(n–1) – 1
• Example
  – 1111 1111 1111 1111 1111 1111 1111 1100 (base 2)
    = –1×2^31 + 1×2^30 + … + 1×2^2 + 0×2^1 + 0×2^0
    = –2,147,483,648 + 2,147,483,644 = –4 (base 10)
• Using 32 bits
  – –2,147,483,648 to +2,147,483,647
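The 2s-complement formula differs from the unsigned one only in the negative weight of the top bit. The sketch below applies it to a 32-bit pattern; twos_complement_value is a hypothetical name, and the result is returned in a 64-bit integer so that no step overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 32-bit pattern under 2s-complement:
   bit 31 has weight -2^31, bits 30..0 have weights +2^i. */
int64_t twos_complement_value(uint32_t bits) {
    int64_t value = 0;
    for (int i = 0; i < 31; i++) {
        if ((bits >> i) & 1u) value += (int64_t)1 << i;
    }
    if ((bits >> 31) & 1u) value -= (int64_t)1 << 31;
    assert(value >= -2147483648LL && value <= 2147483647LL);
    return value;
}
```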

Page 31: 2s-Complement Signed Integers

$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$

• Bit 31 is sign bit
  – 1 for negative numbers
  – 0 for non-negative numbers
• +2^(n–1) can't be represented
  – 1000… is negative now
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers
  – 0: 0000 0000 … 0000
  – –1: 1111 1111 … 1111
  – Most-negative: 1000 0000 … 0000, which is –2,147,483,648
  – Most-positive: 0111 1111 … 1111, which is 2,147,483,647

Page 32: Signed Negation

• Complement and add 1
  – Complement means 1 → 0, 0 → 1

$$x + \bar{x} = 1111\ldots111_2 = -1$$

$$\bar{x} + 1 = -x$$

• Example: negate +2
  – +2 = 0000 0000 … 0010 (base 2)
  – –2 = complement(+2) + 1 = 1111 1111 … 1101 (base 2) + 1 = 1111 1111 … 1110 (base 2)
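"Complement and add 1" maps directly onto the C operators ~ and +. A minimal sketch, assuming a 2s-complement host (true of essentially all current machines); negate_by_complement is a hypothetical helper, and it works on the unsigned bit pattern to avoid signed overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Negate x by inverting every bit and adding 1. */
int32_t negate_by_complement(int32_t x) {
    assert(x != INT32_MIN);          /* the most-negative value has no positive counterpart */
    uint32_t bits = (uint32_t)x;     /* same bit pattern, unsigned arithmetic */
    uint32_t negated = ~bits + 1u;   /* complement and add 1 */
    return (int32_t)negated;
}
```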

Page 33: Sign Extension

• Representing a number using more bits
  – E.g. short a = -5; int b = a;
  – Preserve the numeric value
• Replicate the sign bit to the left
  – cf. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  – +2: 0000 0010 => 0000 0000 0000 0010
  – –2: 1111 1110 => 1111 1111 1111 1110
• In MIPS instruction set
  – addi: extend immediate value
  – lb, lh: extend loaded byte/halfword
  – beq, bne: extend the displacement
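The 8-bit to 16-bit examples can be reproduced by replicating the sign bit by hand. A sketch assuming a 2s-complement host; the function names are hypothetical, and sign_extension_examples simply records the two slide examples as checks.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension: copy bit 7 into bits 15..8. */
int16_t sign_extend_8_to_16(uint8_t byte) {
    uint16_t wide = byte;
    if (byte & 0x80u) wide |= 0xFF00u;  /* negative: fill upper byte with 1s */
    return (int16_t)wide;
}

/* Zero extension, used for unsigned values: upper byte stays 0. */
uint16_t zero_extend_8_to_16(uint8_t byte) {
    return byte;
}

void sign_extension_examples(void) {
    assert(sign_extend_8_to_16(0x02) == 2);    /* 0000 0010 => 0000 0000 0000 0010 */
    assert(sign_extend_8_to_16(0xFE) == -2);   /* 1111 1110 => 1111 1111 1111 1110 */
    assert(zero_extend_8_to_16(0xFE) == 254);  /* same bits read as unsigned */
}
```

In ordinary C the compiler does this automatically: assigning short a = -5 to int b yields b == -5.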

Page 34: Representing Instructions

§2.5 Representing Instructions in the Computer

• Instructions are encoded in binary
  – Called machine code
• MIPS instructions
  – Encoded as 32-bit instruction words
  – Small number of formats encoding operation code (opcode), register numbers, …
  – Regularity!
• Register numbers (total 32 registers) mapping convention
  – $t0 – $t7 are reg's $8 – $15
  – $t8 – $t9 are reg's $24 – $25
  – $s0 – $s7 are reg's $16 – $23

Page 35: MIPS R-format Instructions

    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

• Instruction fields
  – op: operation code (opcode)
  – rs: first source register number
  – rt: second source register number
  – rd: destination register number
  – shamt: shift amount (00000 for now)
  – funct: function code (extends opcode)

Page 36: R-format Example

    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

    add $t0, $s1, $s2

    special $s1     $s2     $t0     0       add
    0       17      18      8       0       32
    000000  10001   10010   01000   00000   100000

    00000010001100100100000000100000 (base 2) = 02324020 (base 16)
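Packing the six R-format fields into one 32-bit word is a matter of shifts and ORs. The sketch below uses the field widths from the slide; encode_r_format is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) into a 32-bit instruction word. */
uint32_t encode_r_format(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(op < 64 && funct < 64);                     /* 6-bit fields */
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32); /* 5-bit fields */
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}
```

Calling encode_r_format(0, 17, 18, 8, 0, 32) encodes add $t0, $s1, $s2 and yields 0x02324020, the value shown on the slide.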

Page 37: Hexadecimal

• Base 16
  – Compact representation of bit strings
  – 4 bits per hex digit

    0  0000    4  0100    8  1000    c  1100
    1  0001    5  0101    9  1001    d  1101
    2  0010    6  0110    a  1010    e  1110
    3  0011    7  0111    b  1011    f  1111

• Example: eca8 6420
  – 1110 1100 1010 1000 0110 0100 0010 0000

Page 38: MIPS I-format Instructions

    op      rs      rt      constant or address
    6 bits  5 bits  5 bits  16 bits

• Immediate arithmetic and load/store instructions
  – rt: destination or source register number
  – Constant: –2^15 to +2^15 – 1
  – Address: offset added to base address in rs
• Design Principle 4: Good design demands good compromises
  – Different formats complicate decoding, but allow 32-bit instructions uniformly
  – Keep formats as similar as possible

Example C statement: g = h + A[8];
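The I-format can be packed the same way as the R-format, with the 16-bit constant in the low half of the word. A sketch: encode_i_format is a hypothetical helper, and the opcodes used in the note below (35 for lw, 8 for addi) are the standard MIPS values from the reference card rather than something stated on this slide.

```c
#include <assert.h>
#include <stdint.h>

/* Pack op(6) rs(5) rt(5) immediate(16) into a 32-bit instruction word. */
uint32_t encode_i_format(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);  /* -2^15 to +2^15 - 1 */
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

For the load in g = h + A[8], lw $t0, 32($s3) becomes encode_i_format(35, 19, 8, 32), which is 0x8E680020; addi $s2, $s1, -1 becomes encode_i_format(8, 17, 18, -1), which is 0x2232FFFF.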

Page 39: MIPS Instruction Encoding

• Textbook example on pages 84–85

Page 40: Stored Program Computers

The BIG Picture

• Instructions represented in binary, just like data
• Instructions and data stored in memory
• Programs can operate on programs
  – e.g., compilers, linkers, …
• Binary compatibility allows compiled programs to work on different computers
  – Standardized ISAs

Page 41: End of Lecture 07


Page 43: Review of Lecture 07

Page 44: Review: Instruction Set Architecture: the Interface between Hardware and Software

Instruction Set Architecture – the portion of the machine visible to the assembly level programmer or to the compiler writer
  – To use the hardware of a computer, we must speak its language
  – The words of a computer language are called instructions, and its vocabulary is called an instruction set

[Figure: the instruction set as the layer between software (above) and hardware (below)]

    Instr. #   Operation + Operands
    i          movl -4(%ebp), %eax
    (i+1)      addl %eax, (%edx)
    (i+2)      cmpl 8(%ebp), %eax
    (i+3)      jl L5
    :
    L5:

Page 45: Arithmetic-Logic Instructions (add, sub, addi, and, or, shift left|right, etc.)

• C code:

    f = (g + h) - (i + j);

  – f, …, j in $s0, …, $s4
• Compiled MIPS code: (R-type, i.e. registers as operands)

    add $t0, $s1, $s2    # register $t0 contains g + h
    add $t1, $s3, $s4    # register $t1 contains i + j
    sub $s0, $t0, $t1    # $s0 gets $t0 – $t1, which is (g + h) – (i + j)

• I-type (immediate as one of the operands)

    addi $s3, $s3, 4

Page 46: Memory Load/Store Instructions: lw and sw

• C code:

    int A[N];
    A[12] = h + A[8];

  – h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires offset of 32 and index 12 requires offset of 48; A[8] is a right-value (load), A[12] is a left-value (store)

    lw  $t0, 32($s3)     # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)     # store word


Page 53: End of Review of Lecture 07


Page 55: Three Classes of Instructions We Will Focus On:

1. Arithmetic-logic instructions
   – add, sub, addi, and, or, shift left|right, etc.
2. Memory load and store instructions
   – lw and sw: load/store word
   – lb and sb: load/store byte
3. Control transfer instructions
   – Conditional branch: bne, beq
   – Unconditional jump: j
   – Procedure call and return: jal and jr

Page 56: Logical Operations

§2.6 Logical Operations

• Instructions for bitwise manipulation

    Operation      C     Java    MIPS
    Shift left     <<    <<      sll
    Shift right    >>    >>>     srl
    Bitwise AND    &     &       and, andi
    Bitwise OR     |     |       or, ori
    Bitwise NOT    ~     ~       nor

• Useful for extracting and inserting groups of bits in a word

Page 57: Shift Operations

    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

• shamt: how many positions to shift
• Shift left logical
  – Shift left and fill with 0 bits
  – sll by i bits multiplies by 2^i
  – E.g. int a = b << 2; // a = b * 4 (2^2)
• Shift right logical
  – Shift right and fill with 0 bits
  – srl by i bits divides by 2^i (unsigned only)
  – E.g. int a = b >> 2; // a = b / 4 (2^2), for non-negative b
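The multiply and divide effect of logical shifts is easy to see on unsigned values in C. A short sketch; the helper names are hypothetical, and unsigned operands are used because srl corresponds to the zero-filling shift.

```c
#include <assert.h>
#include <stdint.h>

/* sll: shift left, fill with 0 bits; multiplies by 2^shamt (modulo 2^32). */
uint32_t shift_left_logical(uint32_t x, unsigned shamt) {
    assert(shamt < 32);   /* shamt is a 5-bit field */
    return x << shamt;
}

/* srl: shift right, fill with 0 bits; divides an unsigned value by 2^shamt. */
uint32_t shift_right_logical(uint32_t x, unsigned shamt) {
    assert(shamt < 32);
    return x >> shamt;
}
```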

Page 58: AND Operations

• Useful to mask bits in a word
  – Select some bits, clear others to 0

    and $t0, $t1, $t2

    $t2    0000 0000 0000 0000 0000 1101 1100 0000
    $t1    0000 0000 0000 0000 0011 1100 0000 0000
    $t0    0000 0000 0000 0000 0000 1100 0000 0000

Page 59: OR Operations

• Useful to include bits in a word
  – Set some bits to 1, leave others unchanged

    or $t0, $t1, $t2

    $t2    0000 0000 0000 0000 0000 1101 1100 0000
    $t1    0000 0000 0000 0000 0011 1100 0000 0000
    $t0    0000 0000 0000 0000 0011 1101 1100 0000

Page 60: NOT Operations

• Useful to invert bits in a word
  – Change 0 to 1, and 1 to 0
• MIPS has NOR 3-operand instruction
  – a NOR b == NOT (a OR b)

    nor $t0, $t1, $zero

    $t1    0000 0000 0000 0000 0011 1100 0000 0000
    $t0    1111 1111 1111 1111 1100 0011 1111 1111

Register 0: always read as zero
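The three bit patterns from the AND, OR and NOT slides can be reproduced with the C bitwise operators. A sketch: mips_and, mips_or and mips_nor are hypothetical names, and the hexadecimal constants are the slide's $t1 and $t2 values rewritten in base 16.

```c
#include <assert.h>
#include <stdint.h>

uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }
uint32_t mips_or(uint32_t a, uint32_t b)  { return a | b; }
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }  /* NOT (a OR b) */

void logical_examples(void) {
    uint32_t t1 = 0x00003C00u;  /* 0000 ... 0011 1100 0000 0000 */
    uint32_t t2 = 0x00000DC0u;  /* 0000 ... 0000 1101 1100 0000 */
    assert(mips_and(t1, t2) == 0x00000C00u);  /* mask: keep bits set in both */
    assert(mips_or(t1, t2)  == 0x00003DC0u);  /* include: set bits from either */
    assert(mips_nor(t1, 0u) == 0xFFFFC3FFu);  /* NOR with $zero acts as NOT */
}
```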

Page 61: Conditional Branch and Unconditional Jump

§2.7 Instructions for Making Decisions

Branch to a labeled instruction if a condition is true
  – Otherwise, continue sequentially
  – Label is the symbolic representation of the memory address of an instruction.
• beq rs, rt, L1
  – if (rs == rt) branch to instruction labeled L1;
• bne rs, rt, L1
  – if (rs != rt) branch to instruction labeled L1;

Unconditional Jump
• j L1
  – unconditional jump to instruction labeled L1

Page 62: Compiling If Statements

• C code:

    if (i==j) f = g+h;
    else f = g-h;

  – f, g, … in $s0, $s1, …
• Compiled MIPS code:

          bne $s3, $s4, Else
          add $s0, $s1, $s2
          j   Exit
    Else: sub $s0, $s1, $s2
    Exit: …

  – Assembler calculates addresses
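The branch structure of the compiled code can be mimicked in C with goto, which makes the inverted test (bne for ==) visible. A sketch for illustration only; if_else_with_branches is a hypothetical function and goto is used here purely to mirror the assembly.

```c
#include <assert.h>

/* if (i == j) f = g + h; else f = g - h;  written the way the MIPS code runs. */
int if_else_with_branches(int i, int j, int g, int h) {
    int f;
    if (i != j) goto Else;  /* bne $s3, $s4, Else */
    f = g + h;              /* add $s0, $s1, $s2 */
    goto Exit;              /* j   Exit */
Else:
    f = g - h;              /* sub $s0, $s1, $s2 */
Exit:
    return f;
}

void if_else_examples(void) {
    assert(if_else_with_branches(3, 3, 10, 4) == 14);  /* i == j: add path */
    assert(if_else_with_branches(3, 4, 10, 4) == 6);   /* i != j: sub path */
}
```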

Page 63: Compiling Loop Statements

• C code:

    while (save[i] == k) i += 1;

  – i in $s3, k in $s5, address of save in $s6
• Compiled MIPS code:

    Loop: sll  $t1, $s3, 2       # $t1 = i * 4
          add  $t1, $t1, $s6     # $t1 = address of save[i] (base + offset)
          lw   $t0, 0($t1)       # $t0 = save[i]
          bne  $t0, $s5, Exit    # exit the loop if save[i] != k
          addi $s3, $s3, 1       # i = i + 1
          j    Loop
    Exit: …
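The loop can be written in C in the same test-then-jump shape as the MIPS code. A sketch: while_with_branches is a hypothetical function, and the caller is assumed to supply an array that contains some element different from k at or after index i, so the loop terminates inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* while (save[i] == k) i += 1;  returns the final value of i. */
int while_with_branches(const int *save, int i, int k) {
    assert(save != NULL);
Loop:
    if (save[i] != k) goto Exit;  /* sll/add/lw, then bne $t0, $s5, Exit */
    i += 1;                       /* addi $s3, $s3, 1 */
    goto Loop;                    /* j Loop */
Exit:
    return i;
}
```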

Page 64: More Conditional Operations

• Set result to 1 if a condition is true
  – Otherwise, set to 0
• slt rd, rs, rt
  – if (rs < rt) rd = 1; else rd = 0;
• slti rt, rs, constant
  – if (rs < constant) rt = 1; else rt = 0;
• Use in combination with beq, bne

    slt $t0, $s1, $s2     # if ($s1 < $s2)
    bne $t0, $zero, L     #   branch to L

Page 65: Branch Instruction Design

• Why not blt, bge, etc?
• Hardware for <, ≥, … slower than =, ≠
  – Combining with branch involves more work per instruction, requiring a slower clock
  – All instructions penalized!
• beq and bne are the common case
• This is a good design compromise

Signed vs. Unsigned

• Signed comparison: slt, slti
• Unsigned comparison: sltu, sltiu
• Example
  – $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
  – $s1 = 0000 0000 0000 0000 0000 0000 0000 0001
  – slt $t0, $s0, $s1 # signed
    • –1 < +1 ⇒ $t0 = 1
  – sltu $t0, $s0, $s1 # unsigned
    • +4,294,967,295 > +1 ⇒ $t0 = 0
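The difference between the signed and unsigned comparison can be reproduced in C on the same two bit patterns. The sketch below is an illustration, not how the hardware is built; the function names slt and sltu are borrowed from the instructions, and the signed version uses the sign-bit flip trick so that it stays within well-defined unsigned arithmetic.

```c
#include <stdint.h>

/* sltu: compare the two 32-bit patterns as unsigned numbers. */
int sltu(uint32_t rs, uint32_t rt)
{
    return rs < rt;
}

/* slt: compare the same patterns as two's complement numbers.
 * Flipping the sign bit of both operands maps signed order onto
 * unsigned order, so no signed conversion is needed. */
int slt(uint32_t rs, uint32_t rt)
{
    return (rs ^ 0x80000000u) < (rt ^ 0x80000000u);
}
```

With rs = 0xFFFFFFFF and rt = 1, slt gives 1 (–1 < +1) and sltu gives 0 (4,294,967,295 > 1), matching the example.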

Memory layout of a program (process) and hardware support for function calls

./a.out: Loading a File for Execution

• Steps performed by the operating system:
  – It reads the executable’s header to determine the size of the text and data segments.
  – It creates a new address space for the program.
  – It copies instructions and data from the executable into the new address space.
  – It copies arguments passed to the program onto the stack.
  – It initializes the machine registers.
    • In general, most registers are cleared, but the stack pointer must be assigned the address of the first free stack location (see Section A.5).
  – It jumps to a start-up routine that copies the program’s arguments from the stack to registers and calls the program’s main routine.
    • When the main routine returns, the start-up routine terminates the program with the exit system call.

§A.4 Loading

Process Memory Layout

• Text: program code
• Static data: global variables
  – e.g., static variables in C, constant arrays and strings
  – $gp initialized to address allowing ±offsets into this segment
• Dynamic data: heap
  – e.g., malloc in C, new in Java
• Stack: automatic storage

Program Counter (PC)

• A register to hold the address of the current instruction being executed.
  – A better name: instruction address register.

[Figure label: relative address]

Linux Process Memory in 32-bit System (4G space)

• Code (machine instructions) → text segment
• Static variables → data or BSS segment
• Function variables → stack (i, A[100] and B)
  – A is an array variable; the memory for A’s 100 int elements is on the stack
  – B holds a memory address and is itself stored on the stack, but the memory B points to is on the heap (100 int elements)
• Memory dynamically allocated with malloc or C++ “new” → heap (B[100]); this memory persists across function calls

    #include <stdio.h>
    #include <stdlib.h>   /* malloc, free */

    static char *gonzo = "God's own prototype";
    static char *userName;

    int main(int argc, char* argv[]) {
        int i;                                    /* stack */
        int A[100];                               /* stack */
        int *B = (int*)malloc(sizeof(int)*100);   /* heap */

        for (i = 0; i < 100; i++) {
            A[i] = i*i;
            B[i] = A[i] * 20;
            printf("A[i]: %d, B[i]: %d\n", A[i], B[i]);
        }
        free(B);
        return 0;
    }

• Stack size limit: if it is 8 MB, “int A[10,000,000]” (about 40 MB with 4-byte ints) won’t work.

§A.5 Memory Usage

Procedure Calling

• Steps required
  – Place parameters in registers
  – Transfer control to procedure
  – Acquire storage for procedure
  – Perform procedure’s operations
  – Place result in register for caller
  – Return to place of call

§2.8 Supporting Procedures in Computer Hardware
§A.6 Procedure Call Convention

Sum Example: sum_full.c

https://passlab.github.io/CSCE212/exercises/sum

Sum Example: sum_full_mips.s

https://passlab.github.io/CSCE212/exercises/sum

Annotations on the listing:
– Arguments for sum call
– Memory address of sum entry
– Store return address in $31 and transfer control to sum
– Return to caller

Procedure Call Instructions

• Procedure call: jump and link

    jal ProcedureLabel

  – Address of following instruction put in $ra
  – Jumps to target address
• Procedure return: jump register

    jr $ra

  – Copies $ra to program counter
  – Can also be used for computed jumps
    • e.g., for case/switch statements

Register Usage

• $a0 – $a3: arguments (reg’s 4 – 7)
• $v0, $v1: result values (reg’s 2 and 3)
• $t0 – $t9: temporaries
  – Can be overwritten by callee
• $s0 – $s7: saved
  – Must be saved/restored by callee
• $gp: global pointer for static data (reg 28)
• $sp: stack pointer (reg 29)
• $fp: frame pointer (reg 30)
• $ra: return address (reg 31)

Stack Memory Used for Function Calls

• The stack is a Last-In-First-Out (LIFO) data structure that stores the info of each function on the call path
  – main() calls foo(), foo() calls bar(), bar() calls tar()
  – Call: push the function onto the top of the stack
  – Return: pop the function from the top
• Stack frame, function frame, activation record
  – The memory and the data holding the info for each function call

[Figure: stack holding main(), foo(), bar(), tar(), with push and pop at the top]

Stack Frame (Activation Record) of a Function Call

• Information:
  – Parameters
  – Local variables
  – Return address
  – Location to put return value when function exits
  – Control link to the caller’s activation record
  – Saved registers
  – Temporary variables and intermediate results
  – (not always) Access link to the function’s static parent
• Frame pointer (fp register): the starting address of the AR
• Stack pointer (sp register): the ending address of the AR

Leaf Procedure Example

• Leaf procedure
  – A procedure that does not call other procedures
    • Think of procedure calls as a tree
• C code:

    int leaf_example (int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }

  – Arguments g, …, j in $a0, …, $a3
  – f in $s0 (hence, need to save $s0 on stack)
  – Result in $v0

Leaf Procedure Example

• MIPS code:

    leaf_example:
        addi $sp, $sp, -4      # push: save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1     # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero   # result
        lw   $s0, 0($sp)       # pop: restore $s0
        addi $sp, $sp, 4
        jr   $ra               # return

Non-Leaf Procedures

• Procedures that call other procedures
• For nested call, caller needs to save on the stack:
  – Its return address
  – Any arguments and temporaries needed after the call
• Restore from the stack after the call

Non-Leaf Procedure Example

• C code:

    int fact (int n)
    {
        if (n < 1) return 1;
        else return n * fact(n - 1);
    }

  – Argument n in $a0
  – Result in $v0

Non-Leaf Procedure Example

• MIPS code:

    fact:
        addi $sp, $sp, -8     # push: adjust stack for 2 items
        sw   $ra, 4($sp)      # save return address
        sw   $a0, 0($sp)      # save argument n
        slti $t0, $a0, 1      # test for n < 1
        beq  $t0, $zero, L1   # branch to L1 if n >= 1
        addi $v0, $zero, 1    # n < 1: result is 1
        addi $sp, $sp, 8      # pop 2 items from stack
        jr   $ra              # and return
    L1: addi $a0, $a0, -1     # else decrement n
        jal  fact             # recursive call
        lw   $a0, 0($sp)      # restore original n
        lw   $ra, 4($sp)      # and return address
        addi $sp, $sp, 8      # pop 2 items from stack
        mul  $v0, $a0, $v0    # multiply to get result
        jr   $ra              # and return
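To see what the pushes and pops in fact accomplish, the recursion can be unrolled in C with an explicit stack of saved arguments. This is a sketch under stated assumptions: the function name fact_explicit_stack and the size STACK_WORDS are hypothetical, the stack is a small local array rather than real stack memory, and the return address is not saved because ordinary C control flow takes its place.

```c
#define STACK_WORDS 64   /* arbitrary capacity for this toy stack */

int fact_explicit_stack(int n)
{
    int stack[STACK_WORDS];
    int sp = STACK_WORDS;   /* grows toward lower indices, like $sp */
    int result;

    /* "Call" phase: push n for every frame until the base case n < 1. */
    while (n >= 1) {
        if (sp == 0) return -1;   /* toy stack is full */
        stack[--sp] = n;          /* addi $sp ... ; sw $a0, 0($sp)  */
        n = n - 1;                /* addi $a0, $a0, -1 ; jal fact   */
    }
    result = 1;                   /* base case: addi $v0, $zero, 1  */

    /* "Return" phase: pop each saved n and multiply it into the result. */
    while (sp < STACK_WORDS) {
        result = stack[sp++] * result;   /* lw $a0 ; mul $v0, $a0, $v0 */
    }
    return result;
}
```

For n = 5 the first loop pushes 5, 4, 3, 2, 1 and the second loop pops them in reverse order, multiplying up to 120.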

Reading After Class

• Read and understand the examples in 2.8 and A.6

Character Data

• Byte-encoded character sets
  – ASCII: 128 characters
    • 95 graphic, 33 control
    • char v = 'a'; // store byte-size character a in variable v
  – Latin-1: 256 characters
    • ASCII, +96 more graphic characters
• Unicode: 32-bit character set
  – Used in Java, C++ wide characters, …
  – Most of the world’s alphabets, plus symbols
  – UTF-8, UTF-16: variable-length encodings

§2.9 Communicating with People
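"Variable-length" means that the number of bytes depends on the character. As a small illustration, the function below returns how many bytes UTF-8 uses for one Unicode code point; the name utf8_length is hypothetical and the function only computes the length, it does not produce the encoded bytes.

```c
#include <stdint.h>

/* Number of bytes UTF-8 needs for code point cp,
 * or 0 if cp is not a valid Unicode scalar value. */
int utf8_length(uint32_t cp)
{
    if (cp <= 0x7F)   return 1;                   /* ASCII range */
    if (cp <= 0x7FF)  return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
    if (cp <= 0xFFFF) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;                                     /* beyond the Unicode range */
}
```

ASCII characters such as 'a' keep their one-byte form, which is why ASCII text is also valid UTF-8.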

Byte/Halfword Operations

• Could use bitwise operations
• MIPS byte/halfword load/store to a 32-bit register
  – String processing is a common case

    lb rt, offset(rs)     lh rt, offset(rs)
  – Sign extend to 32 bits in rt

    lbu rt, offset(rs)    lhu rt, offset(rs)
  – Zero extend to 32 bits in rt

    sb rt, offset(rs)     sh rt, offset(rs)
  – Store just rightmost byte/halfword
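The effect of sign extension versus zero extension can be sketched in C. The helper names below are hypothetical, memory is modeled as a plain byte array, and the sign extension is written with arithmetic on the byte value so that it does not rely on implementation-defined conversions.

```c
#include <stdint.h>

/* lb: load one byte and sign-extend it to 32 bits. */
int32_t load_byte_signed(const uint8_t *mem, int offset)
{
    int32_t v = mem[offset];     /* 0..255 */
    return (v ^ 0x80) - 0x80;    /* copies bit 7 into bits 8..31 */
}

/* lbu: load one byte and zero-extend it to 32 bits. */
uint32_t load_byte_unsigned(const uint8_t *mem, int offset)
{
    return (uint32_t)mem[offset];
}

/* sb: store just the rightmost (least significant) byte of the register. */
void store_byte(uint8_t *mem, int offset, uint32_t reg)
{
    mem[offset] = (uint8_t)(reg & 0xFF);
}
```

For the byte 0x80, the signed load yields –128 while the unsigned load yields 128; for bytes below 0x80 the two loads agree.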

String Copy Example

• C code (naïve):
  – Null-terminated string

    void strcpy (char x[], char y[])
    {
        int i;
        i = 0;
        while ((x[i] = y[i]) != '\0')
            i += 1;
    }

  – Addresses of x, y in $a0, $a1
  – i in $s0

String Copy Example

• MIPS code:

    strcpy:
        addi $sp, $sp, -4       # adjust stack for 1 item
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $zero, $zero  # i = 0
    L1: add  $t1, $s0, $a1      # addr of y[i] in $t1
        lbu  $t2, 0($t1)        # $t2 = y[i]
        add  $t3, $s0, $a0      # addr of x[i] in $t3
        sb   $t2, 0($t3)        # x[i] = y[i]
        beq  $t2, $zero, L2     # exit loop if y[i] == 0
        addi $s0, $s0, 1        # i = i + 1
        j    L1                 # next iteration of loop
    L2: lw   $s0, 0($sp)        # restore saved $s0
        addi $sp, $sp, 4        # pop 1 item from stack
        jr   $ra                # and return

End of Lecture 8

Review of Lecture 8

IMPORTANT: Read and understand each assembly instruction and the code in the examples


End of Review of Lecture 8

32-bit Constants

• Most constants are small
  – The I-format 32-bit instruction word includes 16 bits for a constant
  – A 16-bit immediate is sufficient
• For the occasional 32-bit constant

    lui rt, constant

  – Copies the 16-bit constant to the left (upper) 16 bits of rt
  – Clears the right 16 bits of rt to 0

    op       rs       rt       constant or address
    6 bits   5 bits   5 bits   16 bits

§2.10 MIPS Addressing for 32-Bit Immediates and Addresses
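The effect of lui can be modeled with a shift. The sketch below also pairs it with ori to fill in the lower half, which is the usual way to build a full 32-bit constant in MIPS but is an assumption here since the slide text names only lui. The function names are hypothetical and registers are modeled as plain 32-bit values.

```c
#include <stdint.h>

/* lui rt, constant: the constant goes to the upper 16 bits,
 * the lower 16 bits are cleared to 0. */
uint32_t lui(uint16_t constant)
{
    return (uint32_t)constant << 16;
}

/* ori rt, rs, constant: OR the zero-extended constant into rs. */
uint32_t ori(uint32_t rs, uint16_t constant)
{
    return rs | (uint32_t)constant;
}

/* Build any 32-bit value from two 16-bit immediates. */
uint32_t load_constant32(uint32_t value)
{
    uint32_t reg = lui((uint16_t)(value >> 16));
    return ori(reg, (uint16_t)(value & 0xFFFF));
}
```

For example, 4,000,000 is 0x003D0900, so the upper immediate is 0x003D (61) and the lower immediate is 0x0900 (2304).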

Loading a 32-Bit Constant

Branch Addressing

• Branch instructions (I-format) specify
  – Opcode, two registers, target address
• Most branch targets are near the branch
  – Forward or backward

    op       rs       rt       constant or address as offset
    6 bits   5 bits   5 bits   16 bits

• PC-relative addressing
  – Target address = PC + offset × 4
  – PC already incremented by 4 by this time
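The PC-relative rule can be written out as a small C function. This is a sketch with a hypothetical name (branch_target); it takes the address of the branch instruction itself and adds 4 first, since the PC has already been incremented when the offset is applied.

```c
#include <stdint.h>

/* Target address of a taken branch located at branch_pc whose
 * 16-bit signed offset field (counted in words) is `offset`. */
uint32_t branch_target(uint32_t branch_pc, int16_t offset)
{
    uint32_t next_pc = branch_pc + 4;                  /* PC already incremented */
    return next_pc + (uint32_t)((int32_t)offset * 4);  /* offset is in words */
}
```

A branch at address 80012 with offset 2 lands on 80016 + 8 = 80024, and a negative offset reaches backward in the same way.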

Jump Addressing

• Jump (j and jal) targets could be anywhere in text segment
  – Encode full address in instruction

    op       address
    6 bits   26 bits

• (Pseudo) Direct jump addressing
  – Target address = PC[31…28] : (address × 4)
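The concatenation in the pseudo-direct rule can be sketched in C with a mask and a shift. The function name jump_target is hypothetical; the upper four bits are taken from the already incremented PC.

```c
#include <stdint.h>

/* Target of a j/jal instruction located at jump_pc whose 26-bit
 * address field is address26. */
uint32_t jump_target(uint32_t jump_pc, uint32_t address26)
{
    uint32_t next_pc = jump_pc + 4;                 /* PC already incremented */
    uint32_t upper   = next_pc & 0xF0000000u;       /* PC[31…28] */
    uint32_t lower   = (address26 & 0x03FFFFFFu) << 2;  /* address × 4 */
    return upper | lower;
}
```

Because only 26 bits come from the instruction, a jump can reach any word within the current 256 MB region selected by the top four PC bits.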

Target Addressing Example

• Loop code from earlier example
  – Assume Loop at location 80000

    Loop: sll  $t1, $s3, 2      80000    0    0   19    9    2    0
          add  $t1, $t1, $s6    80004    0    9   22    9    0   32
          lw   $t0, 0($t1)      80008   35    9    8         0
          bne  $t0, $s5, Exit   80012    5    8   21         2
          addi $s3, $s3, 1      80016    8   19   19         1
          j    Loop             80020    2            20000
    Exit: …                     80024

• bne to Exit: PC = 80024, which is PC + offset × 4 = 80016 + 2 × 4 = 80024
• j to Loop: PC = 80000, which is address × 4 = 20000 × 4 = 80000
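The decimal fields in the table can be packed into 32-bit instruction words with shifts and ORs. The helper names encode_r, encode_i and encode_j are hypothetical; the field widths follow the R-, I- and J-format layouts (6/5/5/5/5/6, 6/5/5/16 and 6/26 bits).

```c
#include <stdint.h>

/* R-format: op | rs | rt | rd | shamt | funct */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) |
           ((rd & 0x1F) << 11) | ((shamt & 0x1F) << 6) | (funct & 0x3F);
}

/* I-format: op | rs | rt | 16-bit constant or offset */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, uint16_t imm)
{
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) |
           ((rt & 0x1F) << 16) | (uint32_t)imm;
}

/* J-format: op | 26-bit word address */
uint32_t encode_j(uint32_t op, uint32_t address26)
{
    return ((op & 0x3F) << 26) | (address26 & 0x03FFFFFFu);
}
```

For instance, the bne row (5, 8, 21, 2) packs to 0x15150002 and the j row (2, 20000) packs to 0x08004E20.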

Branching Far Away

• If branch target is too far to encode with 16-bit offset, assembler rewrites the code
• Example

          beq $s0, $s1, L1
              ↓
          bne $s0, $s1, L2
          j   L1
    L2:   …
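Whether a rewrite is needed comes down to checking if the word offset fits in a signed 16-bit field. The sketch below shows that check; the function names are hypothetical, addresses are assumed to be word aligned, and a real assembler's logic may differ in detail.

```c
#include <stdint.h>
#include <stdbool.h>

/* Word offset that would be placed in a branch at branch_pc
 * in order to reach target (both assumed word aligned). */
int64_t branch_word_offset(uint32_t branch_pc, uint32_t target)
{
    return ((int64_t)target - ((int64_t)branch_pc + 4)) / 4;
}

/* True if the offset fits the 16-bit signed field, so no rewrite is needed. */
bool branch_reaches(uint32_t branch_pc, uint32_t target)
{
    int64_t offset = branch_word_offset(branch_pc, target);
    return offset >= INT16_MIN && offset <= INT16_MAX;
}
```

When branch_reaches returns false, the inverted branch plus j sequence shown above is the fallback, since j carries a 26-bit address.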

C Sort Example

• Illustrates use of assembly instructions for a C bubble sort function
• Swap procedure (leaf)

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

  – v in $a0, k in $a1, temp in $t0

§2.13 A C Sort Example to Put It All Together

The Procedure Swap

• 4 bytes per element

    swap: sll $t1, $a1, 2    # $t1 = k * 4
          add $t1, $a0, $t1  # $t1 = v + (k * 4)
                             #   (address of v[k])
          lw  $t0, 0($t1)    # $t0 (temp) = v[k]
          lw  $t2, 4($t1)    # $t2 = v[k+1]
          sw  $t2, 0($t1)    # v[k] = $t2 (v[k+1])
          sw  $t0, 4($t1)    # v[k+1] = $t0 (temp)
          jr  $ra            # return to calling routine

Note:
1. Array references (v[k] and v[k+1]) are translated to lw or sw depending on whether the reference is a read or a write (right-value or left-value).
2. v[k] = v[k+1] is translated to two instructions, i.e., lw and sw.

The Sort Procedure in C

• Bubble sort
  – Doubly nested loop
• Non-leaf (calls swap)

    void sort (int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i += 1) {
            for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
                swap(v, j);
            }
        }
    }

  – v in $a0, n in $a1, i in $s0, j in $s1


Effect of Compiler Optimization

[Figure: four bar charts over optimization levels none, O1, O2, O3: Relative Performance, Clock Cycles, Instruction count, CPI]

Compiled with gcc for Pentium 4 under Linux

Effect of Language and Algorithm

[Figure: three bar charts over C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT: Bubblesort Relative Performance, Quicksort Relative Performance, Quicksort vs. Bubblesort Speedup]

Lessons Learnt

• Instruction count and CPI are not good performance indicators in isolation
• Compiler optimizations are sensitive to the algorithm
  – High-level optimization → better performance
  – O2 is generally the default optimization level
• Java/JIT compiled code is significantly faster than JVM interpreted
  – Comparable to optimized C in some cases
• Nothing can fix a dumb algorithm!

Arrays vs. Pointers

• Array indexing involves
  – Multiplying index by element size
  – Adding to array base address
• Pointers correspond directly to memory addresses
  – Can avoid indexing complexity

§2.14 Arrays versus Pointers

Example: Clearing an Array

    void clear1(int array[], int size) {
        int i;
        for (i = 0; i < size; i += 1)
            array[i] = 0;
    }

           move $t0,$zero        # i = 0
    loop1: sll  $t1,$t0,2        # $t1 = i * 4
           add  $t2,$a0,$t1      # $t2 = &array[i]
           sw   $zero,0($t2)     # array[i] = 0
           addi $t0,$t0,1        # i = i + 1
           slt  $t3,$t0,$a1      # $t3 = (i < size)
           bne  $t3,$zero,loop1  # if (…) goto loop1

    void clear2(int *array, int size) {
        int *p;
        for (p = &array[0]; p < &array[size]; p = p + 1)
            *p = 0;
    }

           move $t0,$a0          # p = &array[0]
           sll  $t1,$a1,2        # $t1 = size * 4
           add  $t2,$a0,$t1      # $t2 = &array[size]
    loop2: sw   $zero,0($t0)     # Memory[p] = 0
           addi $t0,$t0,4        # p = p + 4
           slt  $t3,$t0,$t2      # $t3 = (p < &array[size])
           bne  $t3,$zero,loop2  # if (…) goto loop2

• 6 instructions per iteration (clear1) vs. 4 instructions per iteration (clear2)
• If we use p != &array[size] for the loop condition, we do not need slt in the loop.
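The last remark can be made concrete with a third variant that ends the loop on inequality. This is a sketch, not part of the original example: the name clear3 is hypothetical and the size is taken as size_t so that it cannot be negative, which is what makes the != test safe.

```c
#include <stddef.h>

/* Pointer version whose loop test is p != end. On MIPS this test
 * maps to a single bne, so no slt is needed inside the loop. */
void clear3(int *array, size_t size)
{
    int *end = array + size;   /* &array[size], computed once before the loop */
    for (int *p = array; p != end; p = p + 1)
        *p = 0;
}
```

Calling clear3(a, n) zeroes exactly the first n elements and leaves the rest of the array untouched.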

Comparison of Array vs. Ptr

• Multiply “strength reduced” to shift
• Array version requires shift to be inside loop
  – Part of index calculation for incremented i
  – c.f. incrementing pointer
• Compiler can achieve same effect as manual use of pointers
  – Induction variable elimination
  – Better to make program clearer and safer

End of Lecture 09

Review of Lecture 09


Example: C Sort, and Array vs Pointer

• IMPORTANT: Read and understand each assembly instruction and the code in the examples

End of Review of Lecture 09

Chapter 2: Instructions: Language of the Computer

• Lecture 10
  – 2.16 Real Stuff: ARMv7 (32-bit) Instructions
  – 2.17 Real Stuff: x86 Instructions
  – 2.18 Real Stuff: ARMv8 (64-bit) Instructions
  – 2.19 Fallacies and Pitfalls
  – 2.20 Concluding Remarks
  – 2.21 Historical Perspective and Further Reading
  – Introduction of MARS MIPS assembler and simulator

ARM & MIPS Similarities

• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS

                             ARM              MIPS
    Date announced           1985             1985
    Instruction size         32 bits          32 bits
    Address space            32-bit flat      32-bit flat
    Data alignment           Aligned          Aligned
    Data addressing modes    9                3
    Registers                15 × 32-bit      31 × 32-bit
    Input/output             Memory mapped    Memory mapped

§2.16 Real Stuff: ARM Instructions

Compare and Branch in ARM

• Uses condition codes for result of an arithmetic/logical instruction
  – Negative, zero, carry, overflow
  – Compare instructions to set condition codes without keeping the result
• Each instruction can be conditional
  – Top 4 bits of instruction word: condition value
  – Can avoid branches over single instructions
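How a compare sets the four condition codes can be sketched in C. The struct and function names below are hypothetical, and the code models only the flag results of a 32-bit compare (a subtraction whose result is discarded), following the ARM convention that the carry flag is set when the subtraction does not borrow.

```c
#include <stdint.h>

struct flags { int n, z, c, v; };   /* negative, zero, carry, overflow */

/* Flags produced by comparing a with b, i.e. by computing a - b. */
struct flags arm_cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;        /* result itself is not kept */
    struct flags f;
    f.n = (result >> 31) & 1;       /* sign bit of the result */
    f.z = (result == 0);
    f.c = (a >= b);                 /* set when there is no borrow */
    /* signed overflow: operands differ in sign and the result's
     * sign differs from a's sign */
    f.v = (((a ^ b) & (a ^ result)) >> 31) & 1;
    return f;
}

/* Signed "less than" condition: true when N differs from V. */
int is_less_than_signed(struct flags f)
{
    return f.n != f.v;
}
```

A conditional instruction then only has to test these flag bits, which is how a single compare can serve several later conditional instructions.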

ARM vs MIPS

Instruction Encoding

The Intel x86 ISA

• Evolution with backward compatibility
  – 8080 (1974): 8-bit microprocessor
    • Accumulator, plus 3 index-register pairs
  – 8086 (1978): 16-bit extension to 8080
    • Complex instruction set (CISC)
  – 8087 (1980): floating-point coprocessor
    • Adds FP instructions and register stack
  – 80286 (1982): 24-bit addresses, MMU
    • Segmented memory mapping and protection
  – 80386 (1985): 32-bit extension (now IA-32)
    • Additional addressing modes and operations
    • Paged memory mapping as well as segments

§2.17 Real Stuff: x86 Instructions

The Intel x86 ISA

• Further evolution…
  – i486 (1989): pipelined, on-chip caches and FPU
    • Compatible competitors: AMD, Cyrix, …
  – Pentium (1993): superscalar, 64-bit datapath
    • Later versions added MMX (Multi-Media eXtension) instructions
    • The infamous FDIV bug
  – Pentium Pro (1995), Pentium II (1997)
    • New microarchitecture (see Colwell, The Pentium Chronicles)
  – Pentium III (1999)
    • Added SSE (Streaming SIMD Extensions) and associated registers
  – Pentium 4 (2001)
    • New microarchitecture
    • Added SSE2 instructions

The Intel x86 ISA

• And further…
  – AMD64 (2003): extended architecture to 64 bits
  – EM64T – Extended Memory 64 Technology (2004)
    • AMD64 adopted by Intel (with refinements)
    • Added SSE3 instructions
  – Intel Core (2006)
    • Added SSE4 instructions, virtual machine support
  – AMD64 (announced 2007): SSE5 instructions
    • Intel declined to follow, instead…
  – Advanced Vector Extension (announced 2008)
    • Longer SSE registers, more instructions
• If Intel didn’t extend with compatibility, its competitors would!
  – Technical elegance ≠ market success

Basic x86 Registers

Basic x86 Addressing Modes

• Two operands per instruction

Source/dest operand   Second source operand
Register              Register
Register              Immediate
Register              Memory
Memory                Register
Memory                Immediate

• Memory addressing modes
  – Address in register
  – Address = Rbase + displacement
  – Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
  – Address = Rbase + 2^scale × Rindex + displacement
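The four memory addressing modes are all special cases of the last formula. A minimal Python sketch, assuming 32-bit addresses that wrap around; the function name `effective_address` and its parameter names are illustrative only:

```python
def effective_address(base, index=0, scale=0, displacement=0):
    """Address = Rbase + 2^scale * Rindex + displacement, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    # Shifting left by `scale` multiplies the index by 1, 2, 4, or 8.
    return (base + (index << scale) + displacement) & 0xFFFFFFFF
```

Leaving `index` and `displacement` at zero gives the "address in register" mode. With `scale=2` the index is multiplied by 4, which matches stepping through an array of 32-bit words.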

x86 Instruction Encoding

• Variable length encoding
  – Postfix bytes specify addressing mode
  – Prefix bytes modify operation
    • Operand length, repetition, locking, …
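A few concrete byte sequences make the variable length visible. The sketch below assumes 32-bit (IA-32) mode and lists only a handful of encodings; the `EXAMPLES` table and the helper `split_prefixes` are illustrative names, and the helper is nowhere near a full decoder.

```python
# Some IA-32 prefix bytes and what they modify.
PREFIXES = {
    0x66: "operand-size",
    0x67: "address-size",
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
}

# Assembly text -> machine code bytes (32-bit mode).
EXAMPLES = {
    "nop": bytes([0x90]),                                  # 1 byte
    "mov eax, 0x12345678": bytes([0xB8, 0x78, 0x56, 0x34, 0x12]),
    "mov ax, 0x1234": bytes([0x66, 0xB8, 0x34, 0x12]),     # prefix changes operand length
    "rep movsb": bytes([0xF3, 0xA4]),                      # prefix adds repetition
    # Opcode 0x8B followed by postfix bytes: ModRM, SIB, 8-bit displacement.
    "mov eax, [ebx+ecx*4+8]": bytes([0x8B, 0x44, 0x8B, 0x08]),
}

def split_prefixes(code):
    """Separate leading prefix bytes from the rest of one instruction."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]
```

Compared with the fixed 32-bit MIPS instruction word, the examples here range from 1 to 5 bytes, and the same opcode byte `0xB8` moves 16 or 32 bits depending on the prefix.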

Implementing IA-32

• Complex instruction set makes implementation difficult
  – Hardware translates instructions to simpler microoperations
    • Simple instructions: 1–1
    • Complex instructions: 1–many
  – Microengine similar to RISC
  – Market share makes this economically viable
• Comparable performance to RISC
  – Compilers avoid complex instructions

Concluding Remarks

• Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
• Layers of software/hardware
  – Compiler, assembler, hardware
• MIPS: typical of RISC ISAs
  – cf. x86

§2.20 Concluding Remarks

Concluding Remarks

• Measure MIPS instruction executions in benchmark programs
  – Consider making the common case fast
  – Consider compromises

Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
Arithmetic          add, sub, addi                       16%            48%
Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui    35%            36%
Logical             and, or, nor, andi, ori, sll, srl    12%            4%
Cond. Branch        beq, bne, slt, slti, sltiu           34%            8%
Jump                j, jr, jal                           2%             0%
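To see which cases are the common ones, the table can be ranked directly. This small Python sketch only re-reads the percentages above; the names `MIX`, `top_classes`, and `share` are illustrative.

```python
# Instruction class -> (SPEC2006 Int %, SPEC2006 FP %), copied from the table.
MIX = {
    "Arithmetic": (16, 48),
    "Data transfer": (35, 36),
    "Logical": (12, 4),
    "Cond. branch": (34, 8),
    "Jump": (2, 0),
}

INT, FP = 0, 1  # column indices

def top_classes(column, count=2):
    """Return the `count` most frequently executed classes for one benchmark suite."""
    ranked = sorted(MIX, key=lambda name: MIX[name][column], reverse=True)
    return ranked[:count]

def share(column, names):
    """Total percentage of executed instructions covered by the given classes."""
    return sum(MIX[name][column] for name in names)
```

For the integer programs the two most frequent classes are data transfer and conditional branch; for the floating-point programs they are arithmetic and data transfer. Those are the cases worth making fast first.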

Three Classes of Instructions

1. Arithmetic-logic instructions
  – add, sub, addi, and, or, shift left|right, etc.
2. Memory load and store instructions
  – lw and sw: Load/store word
  – lb and sb: Load/store byte
3. Control transfer instructions
  – Conditional branch: bne, beq
  – Unconditional jump: j
  – Procedure call and return: jal and jr

MIPS Simulator

• MARS (MIPS Assembler and Runtime Simulator)
  – http://courses.missouristate.edu/KenVollmar/MARS/index.htm
  – https://courses.missouristate.edu/KenVollmar/mars/tutorial.htm

Write Assembly Code in MARS
https://courses.missouristate.edu/KenVollmar/mars/tutorial.htm

• .data segment
• .text segment
• row-major.asm example

row-major.asm: Offset and Address for row-major 2-dimensional array

• Slide 49: https://passlab.github.io/CSCE212/notes/lecture06_CBasic.pdf
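The offset and address calculation for a row-major array can be sketched as follows. This is a minimal Python illustration, assuming 4-byte elements by default; the function names and the base address used in the usage note are chosen for the example and are not taken from row-major.asm.

```python
def row_major_offset(row, col, num_cols, elem_size=4):
    """Byte offset of element [row][col] in a row-major 2-dimensional array."""
    # Skip `row` complete rows, then `col` elements within the row.
    return (row * num_cols + col) * elem_size

def row_major_address(base, row, col, num_cols, elem_size=4):
    """Address of element [row][col] given the address of element [0][0]."""
    return base + row_major_offset(row, col, num_cols, elem_size)
```

With 16 columns of 4-byte words, element [1][2] sits (1 × 16 + 2) × 4 = 72 bytes past the base, so `row_major_address(0x10010000, 1, 2, 16)` gives `0x10010048`.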

Multiplication in MIPS

• Multiplication, Textbook 3.3, page 188
  – MIPS multiply unit contains two 32-bit registers called hi and lo
    • They are not general purpose registers
  – The 64-bit product of two 32-bit operands is stored in the hi and lo registers
• Need mfhi and mflo to copy the values in hi and lo to general purpose registers
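How the product is split between hi and lo can be shown with a small model. The Python sketch below mimics the signed `mult` instruction under the assumption that register values are given as 32-bit patterns; `to_signed32` and `mult` are illustrative helper names, not simulator functions.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs, rt):
    """Model of signed `mult rs, rt`: return (hi, lo) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    hi = product >> 32        # what mfhi would copy out
    lo = product & MASK32     # what mflo would copy out
    return hi, lo
```

When the product fits in 32 bits, as in `mult(6, 7)`, hi is 0 and lo alone holds the result, which is why short programs often use only `mflo`. A product such as 65536 × 65536 leaves lo at 0 and sets hi to 1.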

row-major.asm Example

row-major.asm Example in MARS: Edit window and To Assembly

row-major.asm Example in MARS: Assembly window

row-major.asm Example in MARS: After Execution

Download and Try Examples

• MARS Tutorial: https://courses.missouristate.edu/KenVollmar/mars/tutorial.htm
  – Fibonacci.asm
  – row-major.asm
  – column-major.asm
• Rewrite the textbook examples to make them run
  – C sort
  – Array vs pointer

End of Chapter 02

Other slides of the chapter

Synchronization

• Two processors sharing an area of memory
  – P1 writes, then P2 reads
  – Data race if P1 and P2 don’t synchronize
    • Result depends on order of accesses
• Hardware support required
  – Atomic read/write memory operation
  – No other access to the location allowed between the read and write
• Could be a single instruction
  – E.g., atomic swap of register ↔ memory
  – Or an atomic pair of instructions

§2.11 Parallelism and Instructions: Synchronization

Synchronization in MIPS

• Load linked: ll rt, offset(rs)
• Store conditional: sc rt, offset(rs)
  – Succeeds if location not changed since the ll
    • Returns 1 in rt
  – Fails if location is changed
    • Returns 0 in rt
• Example: atomic swap (to test/set lock variable)

    try: add $t0,$zero,$s4    # copy exchange value
         ll  $t1,0($s1)       # load linked
         sc  $t0,0($s1)       # store conditional
         beq $t0,$zero,try    # branch if store fails
         add $s4,$zero,$t1    # put load value in $s4
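The retry loop in the swap example can be traced with a toy model. The Python sketch below is a deliberate simplification: it assumes a single memory word with one link flag (real hardware tracks the link per processor), and the class `Memory`, the function `atomic_swap`, and the `interfere` hook are illustrative names.

```python
class Memory:
    """One shared word plus a link flag, loosely modeling ll/sc."""

    def __init__(self, value=0):
        self.value = value
        self.linked = False

    def load_linked(self):
        """ll: read the word and start watching it."""
        self.linked = True
        return self.value

    def store(self, value):
        """An ordinary store by another processor; it breaks the link."""
        self.value = value
        self.linked = False

    def store_conditional(self, value):
        """sc: store only if the link is intact. Returns 1 on success, 0 on failure."""
        if not self.linked:
            return 0
        self.value = value
        self.linked = False
        return 1

def atomic_swap(mem, new_value, interfere=None):
    """Swap new_value into mem and return the old value, retrying like the `try` loop."""
    while True:
        old = mem.load_linked()
        if interfere is not None:
            interfere(mem)        # simulate another processor writing in between
            interfere = None      # interfere on the first attempt only
        if mem.store_conditional(new_value):
            return old
```

If another write lands between the load linked and the store conditional, the store fails, the loop runs again, and the swap returns the value written by the other processor rather than a stale one.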

Translation and Startup

Many compilers produce object modules directly

Static linking

§2.12 Translating and Starting a Program

Assembler Pseudoinstructions

• Most assembler instructions represent machine instructions one-to-one
• Pseudoinstructions: figments of the assembler’s imagination

    move $t0, $t1    →  add $t0, $zero, $t1
    blt $t0, $t1, L  →  slt $at, $t0, $t1
                        bne $at, $zero, L

  – $at (register 1): assembler temporary
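The two expansions on this slide can be written as a tiny rewriting step. This Python sketch handles only `move` and `blt`, exactly as shown above; the function name `expand` is illustrative, and a real assembler may choose different expansions.

```python
def expand(instruction):
    """Expand `move` and `blt` into real MIPS instructions; pass others through."""
    op, _, rest = instruction.partition(" ")
    args = [arg.strip() for arg in rest.split(",")]
    if op == "move":
        rd, rs = args
        return [f"add {rd}, $zero, {rs}"]
    if op == "blt":
        rs, rt, label = args
        # $at holds the comparison result, so user code should not rely on $at.
        return [f"slt $at, {rs}, {rt}", f"bne $at, $zero, {label}"]
    return [instruction]
```

One pseudoinstruction may become more than one machine instruction, so the instruction count seen in a simulator's execution window can be larger than the number of source lines.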

Producing an Object Module

• Assembler (or compiler) translates program into machine instructions
• Provides information for building a complete program from the pieces
  – Header: describes contents of object module
  – Text segment: translated instructions
  – Static data segment: data allocated for the life of the program
  – Relocation info: for contents that depend on absolute location of loaded program
  – Symbol table: global definitions and external refs
  – Debug info: for associating with source code

Linking Object Modules

• Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external refs
• Could leave location dependencies for fixing by a relocating loader
  – But with virtual memory, no need to do this
  – Program can be loaded into absolute location in virtual memory space
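The three linker steps can be sketched on a toy object format. Everything about the format here is an assumption made for illustration: each module is a dict with a `text` list of words, a `defs` map (label to word index), and a `refs` map (word index to label). A patched word is simply replaced by the label's address, whereas a real linker patches address fields inside instructions.

```python
def link(modules, text_base=0x00400000):
    """Toy linker: merge text segments, resolve labels, patch references."""
    # 1. Merge segments: give each module's text a starting address.
    starts = []
    address = text_base
    for mod in modules:
        starts.append(address)
        address += 4 * len(mod["text"])     # 4 bytes per word

    # 2. Resolve labels: turn (module, word index) into an absolute address.
    symbols = {}
    for mod, start in zip(modules, starts):
        for name, index in mod["defs"].items():
            symbols[name] = start + 4 * index

    # 3. Patch references, including ones to labels defined in other modules.
    image = []
    for mod in modules:
        words = list(mod["text"])
        for index, name in mod["refs"].items():
            words[index] = symbols[name]
        image.extend(words)
    return image, symbols
```

For example, if a three-word module that defines `main` refers to `helper`, and `helper` is the second word of the next module, the linker places `helper` at `0x00400010` and writes that address into the referring word.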

Loading a Program

• Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
     • Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
     • Copies arguments to $a0, … and calls main
     • When main returns, do exit syscall

Dynamic Linking

• Only link/load library procedure when it is called
  – Requires procedure code to be relocatable
  – Avoids image bloat caused by static linking of all (transitively) referenced libraries
  – Automatically picks up new library versions

Lazy Linkage

• Indirection table
• Stub: loads routine ID, jumps to linker/loader
• Linker/loader code
• Dynamically mapped code
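The interplay of indirection table, stub, and linker/loader can be imitated in a few lines. This Python sketch is only an analogy under simple assumptions: a dict stands in for the indirection table, Python callables stand in for mapped code, and the class name `LazyLinker` and its attributes are illustrative.

```python
class LazyLinker:
    """Toy model of lazy procedure linkage through an indirection table."""

    def __init__(self, library):
        self.library = library      # routine ID -> code (the not-yet-mapped library)
        self.table = {}             # indirection table: routine ID -> current target
        self.resolutions = 0        # how many times the linker/loader ran
        for name in library:
            self.table[name] = self._make_stub(name)

    def _make_stub(self, name):
        def stub(*args):
            # Stub: hand the routine ID to the linker/loader, which maps the
            # code and patches the table entry to point at the real routine.
            self.resolutions += 1
            self.table[name] = self.library[name]
            return self.table[name](*args)
        return stub

    def call(self, name, *args):
        """Every call goes through the indirection table."""
        return self.table[name](*args)
```

The first call to a routine pays for the linking; later calls find the real routine in the table and skip the stub, and a routine that is never called is never linked.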

Starting Java Applications

• Simple portable instruction set for the JVM
• Interprets bytecodes
• Compiles bytecodes of “hot” methods into native code for host machine

ARMv8 Instructions

• In moving to 64-bit, ARM did a complete overhaul
• ARMv8 resembles MIPS
  – Changes from v7:
    • No conditional execution field
    • Immediate field is 12-bit constant
    • Dropped load/store multiple
    • PC is no longer a GPR
    • GPR set expanded to 32
    • Addressing modes work for all word sizes
    • Divide instruction
    • Branch if equal/branch if not equal instructions

§2.18 Real Stuff: ARMv8 (64-bit) Instructions

Fallacies

• Powerful instruction ⇒ higher performance
  – Fewer instructions required
  – But complex instructions are hard to implement
    • May slow down all instructions, including simple ones
  – Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
  – But modern compilers are better at dealing with modern processors
  – More lines of code ⇒ more errors and less productivity

§2.19 Fallacies and Pitfalls

Fallacies

• Backward compatibility ⇒ instruction set doesn’t change
  – But they do accrete more instructions

x86 instruction set

Pitfalls

• Sequential words are not at sequential addresses
  – Increment by 4, not by 1!
• Keeping a pointer to an automatic variable after procedure returns
  – e.g., passing pointer back via an argument
  – Pointer becomes invalid when stack popped
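The first pitfall is easy to check numerically. In this Python sketch, which assumes 4-byte words and byte addressing as in MIPS, the function name `word_addresses` and the base address in the usage note are illustrative.

```python
def word_addresses(base, count, word_size=4):
    """Byte addresses of `count` consecutive words starting at `base`."""
    # Memory is byte addressed, so each word is word_size bytes past the previous one.
    return [base + i * word_size for i in range(count)]
```

Starting at `0x10010000`, the first four words sit at `0x10010000`, `0x10010004`, `0x10010008`, and `0x1001000C`. A loop that adds 1 to the address instead of 4 would land in the middle of the first word.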