CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense...

29
CS 61C: Great Ideas in Computer Architecture CALL continued ( Linking and Loading) 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.Berkeley.edu/~cs61c/sp16

Transcript of CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense...

Page 1: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

CS61C:GreatIdeasinComputerArchitecture

CALLcontinued(LinkingandLoading)

1

Instructors:NicholasWeaver&VladimirStojanovic

http://inst.eecs.Berkeley.edu/~cs61c/sp16

Page 2: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

WhereAreWeNow?

2

Page 3: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Linker(1/3)• Input:Objectcodefiles,informationtables(e.g.,foo.o,libc.o forMIPS)

• Output:Executablecode(e.g.,a.out forMIPS)

• Combinesseveralobject(.o)filesintoasingleexecutable(“linking”)

• Enableseparatecompilationoffiles– Changestoonefiledonotrequirerecompilationofthewholeprogram• Windows7sourcewas>40Mlinesofcode!

– Oldname“LinkEditor”fromeditingthe“links”injumpandlinkinstructions

3

Page 4: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

.o file 1text 1data 1info 1

.o file 2text 2data 2info 2

Linker

a.outRelocated text 1Relocated text 2Relocated data 1Relocated data 2

Linker(2/3)

4

Page 5: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Linker(3/3)

• Step1:Taketextsegmentfromeach.o fileandputthemtogether

• Step2:Takedatasegmentfromeach.o file,putthemtogether,andconcatenatethisontoendoftextsegments

• Step3:Resolvereferences– GothroughRelocationTable;handleeachentry– Thatis,fillinallabsoluteaddresses

5

Page 6: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

FourTypesofAddresses

• PC-RelativeAddressing(beq,bne)– neverrelocate

• AbsoluteFunctionAddress(j,jal)– alwaysrelocate

• ExternalFunctionReference(usuallyjal)– alwaysrelocate

• StaticDataReference(oftenlui andori)– alwaysrelocate

6

Page 7: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

AbsoluteAddressesinMIPS

• Whichinstructionsneedrelocationediting?– J-format:jump,jumpandlink

– Loadsandstorestovariablesinstaticarea,relativetoglobalpointer

– Whataboutconditionalbranches?

– PC-relativeaddressingpreservedevenifcodemoves

j/jal xxxxx

lw/sw $gp $x address

beq/bne $rs $rt address

7

Page 8: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

ResolvingReferences(1/2)

• Linkerassumesfirstwordoffirsttextsegmentisataddress0x04000000.– (Morelaterwhenwestudy“virtualmemory”)

• Linkerknows:– lengthofeachtextanddatasegment– orderingoftextanddatasegments

• Linkercalculates:– absoluteaddressofeachlabeltobejumpedto(internalorexternal)andeachpieceofdatabeingreferenced

8

Page 9: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

ResolvingReferences(2/2)

• Toresolvereferences:– searchforreference(dataorlabel)inall“user”symboltables

– ifnotfound,searchlibraryfiles(forexample,forprintf)

– onceabsoluteaddressisdetermined,fillinthemachinecodeappropriately

• Outputoflinker:executablefilecontainingtextanddata(plusheader)

9

Page 10: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

WhereAreWeNow?

10

Page 11: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

LoaderBasics• Input:ExecutableCode(e.g.,a.out forMIPS)

• Output:(programisrun)• Executablefilesarestoredondisk• Whenoneisrun,loader’sjobistoloaditintomemoryandstartitrunning

• Inreality,loaderistheoperatingsystem(OS)– loadingisoneoftheOStasks– Andthesedays,theloaderactuallydoesalotofthelinking

11

Page 12: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Loader…whatdoesitdo?• Readsexecutablefile’sheadertodeterminesizeoftextand

datasegments• Createsnewaddressspaceforprogramlargeenoughtohold

textanddatasegments,alongwithastacksegment• Copiesinstructionsanddatafromexecutablefileintothenew

addressspace• Copiesargumentspassedtotheprogramontothestack• Initializesmachineregisters

– Mostregisterscleared,butstackpointerassignedaddressof1stfreestacklocation

• Jumpstostart-uproutinethatcopiesprogram’sargumentsfromstacktoregisters&setsthePC– Ifmainroutinereturns,start-uproutineterminatesprogramwiththe

exitsystemcall 12

Page 13: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Clicker/PeerInstructionAtwhatpointinprocessareallthemachinecodebitsdeterminedforthefollowingassemblyinstructions:1)addu $6, $7, $82)jal fprintf

A:1)&2)AftercompilationB:1)Aftercompilation,2)AfterassemblyC:1)Afterassembly,2)AfterlinkingD:1)Aftercompilation,2)AfterlinkingE:1)Aftercompilation,2)Afterloading

13

Page 14: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Example:C⇒ Asm⇒ Obj⇒ Exe⇒ Run

#include <stdio.h>int main (int argc, char *argv[]) {int i, sum = 0;for (i = 0; i <= 100; i++)

sum = sum + i * i;printf ("The sum of sq from 0 .. 100 is %d\n", sum);

}

C Program Source Code: prog.c

“printf” lives in “libc”

14

Page 15: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Compilation:MAL.text.align 2.globl main

main:subu $sp,$sp,32sw $ra, 20($sp)sd $a0, 32($sp)sw $0, 24($sp)sw $0, 28($sp)

loop:lw $t6, 28($sp)mul $t7, $t6,$t6lw $t8, 24($sp)addu $t9,$t8,$t7sw $t9, 24($sp)

addu $t0, $t6, 1sw $t0, 28($sp)ble $t0,100, loopla $a0, strlw $a1, 24($sp)jal printfmove $v0, $0lw $ra, 20($sp)addiu $sp,$sp,32jr $ra.data.align 0

str:.asciiz "The sum of sq from 0 .. 100 is %d\n"

Where are7 pseudo-instructions?

15

Page 16: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Compilation:MAL.text.align 2.globl main

main:subu $sp,$sp,32sw $ra, 20($sp)sd $a0, 32($sp)sw $0, 24($sp)sw $0, 28($sp)

loop:lw $t6, 28($sp)mul $t7, $t6,$t6lw $t8, 24($sp)addu $t9,$t8,$t7sw $t9, 24($sp)

addu $t0, $t6, 1sw $t0, 28($sp)ble $t0,100, loopla $a0, strlw $a1, 24($sp)jal printfmove $v0, $0lw $ra, 20($sp)addiu $sp,$sp,32jr $ra.data.align 0

str:.asciiz "The sum of sq from 0 .. 100 is %d\n"

7 pseudo-instructionsunderlined

16

Page 17: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Assemblystep1:

00 addiu $29,$29,-3204 sw$31,20($29)08 sw$4, 32($29)0c sw$5, 36($29)10 sw $0, 24($29)14 sw $0, 28($29)18 lw $14, 28($29)1c multu $14, $1420 mflo $1524 lw $24, 24($29)28 addu $25,$24,$152c sw $25, 24($29)

30 addiu $8,$14, 134 sw$8,28($29)38 slti $1,$8, 101 3c bne $1,$0, loop40 lui $4, l.str44 ori $4,$4,r.str 48 lw$5,24($29)4c jal printf50 add $2, $0, $054 lw $31,20($29) 58 addiu $29,$29,325c jr $31

Remove pseudoinstructions, assign addresses

17

Page 18: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Assemblystep2

• SymbolTableLabel address(inmodule) typemain: 0x00000000 global textloop: 0x00000018 local textstr: 0x00000000 local data

• RelocationInformationAddress Instr.type Dependency0x00000040 lui l.str0x00000044 ori r.str 0x0000004c jal printf

Create relocation table and symbol table

18

Page 19: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Assemblystep3

00 addiu $29,$29,-3204 sw $31,20($29)08 sw $4, 32($29)0c sw $5, 36($29)10 sw $0, 24($29)14 sw $0, 28($29)18 lw $14, 28($29)1c multu $14, $1420 mflo $1524 lw $24, 24($29)28 addu $25,$24,$152c sw $25, 24($29)

30 addiu $8,$14, 134 sw $8,28($29)38 slti $1,$8, 101 3c bne $1,$0, -1040 lui $4, l.str44 ori $4,$4,r.str48 lw $5,24($29)4c jal printf50 add $2, $0, $054 lw $31,20($29) 58 addiu $29,$29,325c jr $31

Resolve local PC-relative labels

19

Page 20: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Assemblystep4

• Generateobject(.o)file:– Outputbinaryrepresentationfor• textsegment(instructions)• datasegment(data)• symbolandrelocationtables

– Usingdummy“placeholders”forunresolvedabsoluteandexternalreferences

20

Page 21: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Textsegmentinobjectfile0x000000 001001111011110111111111111000000x000004 101011111011111100000000000101000x000008 101011111010010000000000001000000x00000c 101011111010010100000000001001000x000010 101011111010000000000000000110000x000014 101011111010000000000000000111000x000018 100011111010111000000000000111000x00001c 100011111011100000000000000110000x000020 000000011100111000000000000110010x000024 001001011100100000000000000000010x000028 001010010000000100000000011001010x00002c 101011111010100000000000000111000x000030 000000000000000001111000000100100x000034 000000110000111111001000001000010x000038 000101000010000011111111111101110x00003c 101011111011100100000000000110000x000040 001111000000010000000000000000000x000044 100011111010010100000000000000000x000048 000011000001000000000000111011000x00004c 001001000000000000000000000000000x000050 100011111011111100000000000101000x000054 001001111011110100000000001000000x000058 000000111110000000000000000010000x00005c 00000000000000000001000000100001

21

Page 22: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Linkstep1:combineprog.o,libc.o

• Mergetext/datasegments• Createabsolutememoryaddresses• Modify&mergesymbolandrelocationtables• SymbolTable– Label Addressmain: 0x00000000loop: 0x00000018str: 0x10000430printf: 0x000003b0 …

• RelocationInformation– Address Instr.Type Dependency0x00000040 lui l.str0x00000044 ori r.str0x0000004c jal printf …

22

Page 23: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Linkstep2:

00 addiu $29,$29,-3204 sw$31,20($29)08 sw$4, 32($29)0c sw$5, 36($29)10 sw $0, 24($29)14 sw $0, 28($29)18 lw $14, 28($29)1c multu $14, $1420 mflo $1524 lw $24, 24($29)28 addu $25,$24,$152c sw $25, 24($29)

30 addiu $8,$14, 134 sw$8,28($29)38 slti $1,$8, 101 3c bne $1,$0, -1040 lui $4, 409644 ori $4,$4,107248 lw$5,24($29)4c jal 81250 add $2, $0, $054 lw $31,20($29) 58 addiu $29,$29,325c jr$31

• Edit Addresses in relocation table • (shown in TAL for clarity, but done in binary )

23

Page 24: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Linkstep3:

• Outputexecutableofmergedmodules– Singletext(instruction)segment– Singledatasegment– Headerdetailingsizeofeachsegment

• NOTE:– Thepreceeding examplewasamuchsimplifiedversionofhowELFandotherstandardformatswork,meantonlytodemonstratethebasicprinciples.

24

Page 25: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

StaticvsDynamicallylinkedlibraries

• Whatwe’vedescribedisthetraditionalway:statically-linked approach– Thelibraryisnowpartoftheexecutable,soifthelibraryupdates,wedon’tgetthefix(havetorecompileifwehavesource)

– Itincludestheentire libraryevenifnotallofitwillbeused

– Executableisself-contained• Analternativeisdynamicallylinkedlibraries(DLL),commononWindows&UNIXplatforms

25

Page 26: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Dynamicallylinkedlibraries

• Space/timeissues+Storingaprogramrequireslessdiskspace+Sendingaprogramrequireslesstime+Executingtwoprogramsrequireslessmemory(iftheysharealibrary)– Atruntime,there’stimeoverheadtodolink

• Upgrades+Replacingonefile(libXYZ.so)upgradeseveryprogramthatuseslibrary“XYZ”– Havingtheexecutableisn’tenoughanymore

Overall, dynamic linking adds quite a bit of complexity to the compiler, linker, and operating system. However, it provides many benefits that often outweigh these

en.wikipedia.org/wiki/Dynamic_linking

26

Page 27: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

Dynamicallylinkedlibraries• Theprevailingapproachtodynamiclinkingusesmachine

codeasthe“lowestcommondenominator”– Thelinkerdoesnotuseinformationabouthowtheprogramor

librarywascompiled(i.e.,whatcompilerorlanguage)– Thiscanbedescribedas“linkingatthemachinecodelevel”– Thisisn’ttheonlywaytodoit...

• Alsothesedayswillrandomizelayout (AddressSpaceLayoutRandomization)– ActsasadefensetomakeexploitingCmemoryerrors

substantiallyharder,asmodernexploitationrequiresjumpingtopiecesofexistingcode(“Returnorientedprogramming”)tocounteranotherdefense(markingheap&stackunexecutable,soattackercan’twritecodeintojustanywhereinmemory).

27

Page 28: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

UpdateYourLinuxSystems!!!• TheGNUglibc hasacatastrophicallybadbug– Astackoverflowingetaddrinfo()

• Functionthatturns"DNSname"into"IPaddress"• CVE-2015-7547

– "CommonVulnerabilities and Exposures"

• If”badguy”canmakeyourprogramlookupanameoftheirchoosing…– Andtheirbadnamehasaparticularlylongreply...

• Withstaticlinking,therewouldbeaneedtorecompileandupdatehundreds ofdifferentprograms

• Withdynamiclinking,"just"needtoupdatetheoperatingsystem

28

Page 29: CS 61C: Great Ideas in Computer Architecture CALL ...cs61c/sp16/lec/12/... · – Acts as a defense to make exploiting C memory errors substantially harder, as modern exploitation

InConclusion…§ Compiler converts a single HLL file

into a single assembly language file.§ Assembler removes pseudo-

instructions, converts what it can to machine language, and creates a checklist for the linker (relocation table). A .s file becomes a .o file.ú Does 2 passes to resolve addresses,

handling internal forward references

§ Linker combines several .o files and resolves absolute addresses.ú Enables separate compilation, libraries

that need not be compiled, and resolves remaining addresses

§ Loader loads executable into memory and begins execution.

29