Programming Abstractions in C++ - Stanford University

29
Computer Systems Cynthia Lee CS107

Transcript of Programming Abstractions in C++ - Stanford University

Page 1: Programming Abstractions in C++ - Stanford University

Computer Systems

Cynthia Lee

C S 1 0 7

Page 2: Programming Abstractions in C++ - Stanford University

Today’s Topics

LECTURE:

› Pop quiz on floating point!

• Just kidding.

› New: Assembly code

Two friendly reminders:

2

Page 3: Programming Abstractions in C++ - Stanford University

REMINDER: Everything is bits!

Page 4: Programming Abstractions in C++ - Stanford University

Everything is bits!

We’ve seen many data types so far:

› Integers:

• char/short/int/long (encoding as unsigned or two’s complement signed)

› Letters/punctutation/etc:

• char (ASCII encoding)

› Real numbers:

• float/double (IEEE floating point encoding)

› Memory addresses:

• pointer types (unsigned long encoding)

› Now a new one…..the code itself!

• Instructions (AMD64 encoding)

Page 5: Programming Abstractions in C++ - Stanford University

What happens when we compile our code?ANATOMY OF AN EXECUTABLE FILE

Page 6: Programming Abstractions in C++ - Stanford University

What happens when we compile our code?

int sum_array(int arr[], int nelems) {

int sum = 0;

for (int i = 0; i < nelems; i++) {

sum += arr[i];

}

return sum;

}

> make

> ls

Makefile sum sum.c

> objdump –d sum

Page 7: Programming Abstractions in C++ - Stanford University

7

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

Page 8: Programming Abstractions in C++ - Stanford University

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

8

Name of the function (same as in

the C code) and the memory

address where the code for this

function starts

Page 9: Programming Abstractions in C++ - Stanford University

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

9

Memory address

where each of line of

instruction is found—

sequential instructions

are found sequentially

in memory

Page 10: Programming Abstractions in C++ - Stanford University

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

10

Assembly code:

“human-readable”

version of each

instruction

Page 11: Programming Abstractions in C++ - Stanford University

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

11

Machine code:

raw hexadecimal

version of each

instruction,

representing the

binary as it would

be read by the

computer

Page 12: Programming Abstractions in C++ - Stanford University

Anatomy of an individual instruction

Page 13: Programming Abstractions in C++ - Stanford University

Anatomy of an individual instruction

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

13

Operation name

(sometimes called

“opcode”)

Operands

(like arguments)

Page 14: Programming Abstractions in C++ - Stanford University

Anatomy of an individual instruction

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

14

%[name] names a register—

these are a small collection of

memory slots right on the CPU

that can hold variables’ values

Page 15: Programming Abstractions in C++ - Stanford University

Anatomy of an individual instruction

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

15

$[number] means a constant

value (this is the number 1)

Page 16: Programming Abstractions in C++ - Stanford University

000000000040055d <sum_array>:

40055d: ba 00 00 00 00 mov $0x0,%edx

400562: b8 00 00 00 00 mov $0x0,%eax

400567: eb 09 jmp 400572

400569: 48 63 ca movslq %edx,%rcx

40056c: 03 04 8f add (%rdi,%rcx,4),%eax

40056f: 83 c2 01 add $0x1,%edx

400572: 39 f2 cmp %esi,%edx

400574: 7c f3 jl 400569

400576: f3 c3 repz retq

16

Guessing game: which part of the sum_array C code do you think

corresponds to the marked assembly instruction?

int sum_array(int arr[], int nelems) {

int sum = 0; // (A)

for (int i = 0; i < nelems; i++) { // (B)

sum += arr[i]; // (C)

} // or (D) other?

return sum;

}

Page 17: Programming Abstractions in C++ - Stanford University

Registers and memoryANATOMY OF THE COMPUTER

Page 18: Programming Abstractions in C++ - Stanford University

An architecture view of computer hardware

18

Page 19: Programming Abstractions in C++ - Stanford University

The mov instructionOUR FIRST INSTRUCTION

Page 20: Programming Abstractions in C++ - Stanford University

Dude, where’s my data?

A main job of assembly language is to manage data:

› Data can be on the CPU (in registers) or in memory (at an address)

• Turns out this distinction REALLY MATTERS for performance

• https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html

› Instructions often want to move data:

• Move from one place in memory to another

• Move from one register to another

• Move from memory to register

• Move from register to memory

› Instructions often want to operate on data:

• Add contents of register X to contents of register Y

Hence “mov” (move) instruction is paramount!

Page 21: Programming Abstractions in C++ - Stanford University

mov

mov src,dst

› Optional suffix (b,w,l,q): movb, movw, movl, movq

› One confusing thing about “move” it makes it sound like it leaves the

src “empty”—no!

• Does a copy, like the assignment operator you are familiar with

› src,dst options:

• Immediate (AKA constant value)

• Register

• Memory

Page 22: Programming Abstractions in C++ - Stanford University

Basic addressing modes

22

(Think: assembly version of VARIABLES)

Notice that one major difference between high-level code and assembly

instructions is the absence of programmer-chosen, descriptive variable

names:

We don’t get to choose variable names, we have to talk directly about

places in hardware

“Addressing modes” are allowable ways of naming these places

int total_goodness = nReeses + nButterfinger;

addl 8(%rbp),%eax

Page 23: Programming Abstractions in C++ - Stanford University

Basic addressing modes

Op Source Dest Dest Comments

movl $0, %eax Name of a register

movl $0, 0x8f2713e0 Actual address literal (note address

literals are different from other

literals—don’t need $ in front)

movl $0, (%rax) Look in the register named, find an

address there, and use it

23

(Think: assembly version of VARIABLES)

Reminder: need to put $

in front of immediate

values (constant literals)

Page 24: Programming Abstractions in C++ - Stanford University

Basic addressing modes

Op Source Dest Dest Comments

movl $0, %eax Name of a register

movl $0, 0x8f2713e0 Actual address literal (note address

literals are different from other

literals—don’t need $ in front)

movl $0, (%rax) Look in the register named, find an

address there, and use it

movl $0, -24(%rbp) Add -24 to an address in the

named register, and use that

address

24

(Think: assembly version of VARIABLES)

Displacement can be

positive or negative

Displacement must be a

constant; to have a variable

base and variable

displacement , use two

steps: add then mov

Page 25: Programming Abstractions in C++ - Stanford University

Basic addressing modes

Op Source Dest Dest Comments

movl $0, %eax Name of a register

movl $0, 0x8f2713e0 Actual address literal (note address

literals are different from other

literals—don’t need $ in front)

movl $0, (%rax) Look in the register named, find an

address there, and use it

movl $0, -24(%rbp) Add -24 to an address in the

named register, and use that

address

movl $0 8(%rbp, %eax, 2) Address to use = (8 + address in

rbp) + (2 * index in eax)

25

(Think: assembly version of VARIABLES)

Only 1, 2, 4, 8 allowedAny constant allowed

Page 26: Programming Abstractions in C++ - Stanford University

Matching exercise: Addressing modes use cases

Op Src Dest Use case?

movl $0 8(%rbp, %eax, 2)

movl $0, %eax

movl $0, 0x8f2713e0

movl $0, 4(%rax)

26

Use cases

(a) Prepare to use 0 in a calculation

(b) Zero out a field of a struct

(c) Zero out a given array bucket

(d) Zero out a global variable

Match up which use cases make the most sense for which addressing

modes (some guesswork expected)

Page 27: Programming Abstractions in C++ - Stanford University

Instruction Set ArchitecturesSOME CONTEXT AND TERMINOLOGY

Page 28: Programming Abstractions in C++ - Stanford University

Instruction Set Architecture

The ISA defines:

› Operations that the processor can execute

› Data transfer operations + how to access data

› Control mechanisms like branch, jump (think loops and if-else)

› Contract between programmer/compiler and hardware

Layer of abstraction:

› Above:

• Programmer/compiler can write code for the ISA

• New programming languages can be built on top of the ISA as long as the compiler will do the translation

› Below:

• New hardware can implement the ISA

• Can have even potentially radical changes in hardware implementation

• Have to “do” the same thing from programmer point of view

ISAs have incredible inertia!

› Legacy support is a huge issue for x86-64

Page 29: Programming Abstractions in C++ - Stanford University

Two major categories of Instruction Set Architectures

CISC:

› Complex instruction set computers

• e.g., x86 (CS107 studies this)

› Have special instructions for each thing you might

want to do

› Can write code with fewer instructions, because each

instruction is very expressive

RISC:

› Reduced instruction set computers

• e.g., MIPS

› Have only a very tiny number of instructions, optimize

the heck out of them in the hardware

› Code may need to be longer because you have to go

roundabout ways of achieving what you wanted