Assembly Instructions & Debugging with GDB

CS356: Discussion #4 Assembly Instructions & Bomb Lab


  • CS356: Discussion #4 Assembly Instructions & Bomb Lab

  • Last week: 16 ⨉ 64-bit general registers

    q (8 bytes)   l (4 bytes)   w (2 bytes)   b (1 byte)   Role
    %rax          %eax          %ax           %al          accumulate
    %rbx          %ebx          %bx           %bl          base
    %rcx          %ecx          %cx           %cl          counter
    %rdx          %edx          %dx           %dl          data
    %rsi          %esi          %si           %sil         source index
    %rdi          %edi          %di           %dil         destination index
    %rsp          %esp          %sp           %spl         stack pointer
    %rbp          %ebp          %bp           %bpl         base pointer

    ● In addition: %r8 to %r15 (%r8d / %r8w / %r8b for lower 4 / 2 / 1 bytes)

  • Last week: Data Movement

    Move to register/memory (register operands must match size codes)

    ● movb src, dst (1 byte)

    ● movw src, dst (2 bytes)

    ● movl src, dst (4 bytes / with a register destination, the upper 4 bytes are set to 0)

    ● movq src, dst (8 bytes)

    ● movabsq imm, reg (8 bytes / 64-bit source value allowed into register)

    (movq only supports a 32-bit immediate; movabsq allows a 64-bit immediate)

    (Either src or dst can refer to a memory location, not both; no imm as dst.)


  • Operand Forms

    Different ways to specify source values and output location.

    Immediate: $imm to use a constant input value, e.g., $0xFF.

    Register: %reg to use the value contained in a register, e.g., %rax .

    Memory reference

    ● Absolute: addr, e.g., 0x1122334455667788 [use a fixed address]

    ● Indirect: (%reg), e.g., (%rax) [use the address contained in a q register]

    ● Base+displacement: imm(%reg), e.g., 16(%rax) [add a displacement]

    ● Indexed: (%reg1,%reg2), e.g., (%rax,%rbx) [add another register]

    ● Indexed+displacement: imm(%reg1,%reg2) [add both]

    ● Scaled indexed: imm(%reg1,%reg2,c) [use address: imm+reg1+reg2*c]

    c must be one of 1, 2, 4, 8

    Variants: omit imm or reg1 or both. E.g., (,%rax,4)

    (A memory reference selects the first byte.)

  • Operand Forms: Examples

    Which one is correct?

    ● A. (%rax, , 4)

    ● B. (%rax, %rsp, 3)

    ● C. 123

    ● D. $1(%rbx, %rbp, 1)

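    Solution: C (an absolute address). A gives a scale without an index register, B uses a scale other than 1, 2, 4, or 8, and D puts $ in front of a displacement.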

  • Operand Forms: Examples

    Values at each memory address:

    ● 0x100: 0xFF
    ● 0x104: 0xAB
    ● 0x108: 0x13
    ● 0x10C: 0x11

    Values in registers:

    ● %rax: 0x100
    ● %rcx: 0x1
    ● %rdx: 0x3

    Operand value?

    Operand              Solution
    %rax                 0x100
    0x104                0xAB
    $0x108               0x108
    (%rax)               0xFF
    (%eax)               Illegal
    4(%rax)              0xAB
    9(%rax,%rdx)         0x11
    0xFC(,%rcx,4)        0xFF
    (%rax,%rdx,4)        0x11
    0x4(%rax,%rdx,3)     Illegal
    $4(%rax,%rcx)        Illegal
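
    The memory-operand solutions above can be checked with a small C sketch. It is only a toy model of the example: the names valid_scale, effective_address, and load are hypothetical helpers, and the constants are copied from the register and memory values listed on the slide.

```c
#include <stdint.h>
#include <stdbool.h>

/* Register values copied from the example. */
const uint64_t RAX = 0x100, RCX = 0x1, RDX = 0x3;

/* The scale factor of a memory operand must be 1, 2, 4, or 8. */
bool valid_scale(uint64_t c) {
    return c == 1 || c == 2 || c == 4 || c == 8;
}

/* Effective address of imm(base,index,c): imm + base + index*c.
   Pass 0 for an omitted base or index and 1 for an omitted scale. */
uint64_t effective_address(uint64_t imm, uint64_t base,
                           uint64_t index, uint64_t c) {
    return imm + base + index * c;
}

/* Hypothetical helper: value stored at one of the four example addresses. */
uint64_t load(uint64_t addr) {
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;   /* address not part of the example */
    }
}
```

    For instance, load(effective_address(9, RAX, RDX, 1)) models 9(%rax,%rdx), and valid_scale(3) reports why 0x4(%rax,%rdx,3) is rejected.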

  • Data Movement: Instructions

    Move from register/memory to register (zero extension)

    ● movzbw src, reg (byte to word)

    ● movzbl src, reg (byte to double word)

    ● movzbq src, reg (byte to quad word)

    ● movzwl src, reg (word to double word)

    ● movzwq src, reg (word to quad word)

    Same, but with sign extension (replicate MSB):

    ● movsbw, movsbl, movsbq, movswl, movswq, movslq

    ● cltq (= movslq %eax, %rax)

  • Data Movement: Examples

    Which one is wrong?

    ● A. movq $-1, (%rax)

    ● B. movq (%rax), %rax

    ● C. movq $23, 10(%rdx, %rax)

    ● D. movq (%rax), 8(%rbx)

    Solution: D

    Either src or dst can refer to a memory location, not both

  • Data Movement: Zero and Sign Extension

    A sequence of instructions

    movabsq $0x0011223344556677, %rax // %rax = 0x0011223344556677

    movb $0xAA, %dl // %dl = 0xAA

    movb %dl, %al // %rax = 0x00112233445566AA

    movsbq %dl, %rax // %rax = 0xFFFFFFFFFFFFFFAA

    movzbq %dl, %rax // %rax = 0x00000000000000AA

    Another sequence of instructions

    movabsq $0x0011223344556677, %rax // %rax = 0x0011223344556677

    movb $-1, %al // %rax = 0x00112233445566FF

    movw $-1, %ax // %rax = 0x001122334455FFFF

    movl $-1, %eax // %rax = 0x00000000FFFFFFFF

    movq $-1, %rax // %rax = 0xFFFFFFFFFFFFFFFF

    // note: “movq $-1, %rax” sign-extends the 32-bit immediate $0xFFFFFFFF to 8 bytes

    movq $0xFF, %rax // %rax = 0x00000000000000FF
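
    The register updates in the two sequences above can be modeled in C. This is a rough sketch, not how the hardware is implemented: the function names are made up for illustration, and each one returns the new 64-bit register value after the corresponding move.

```c
#include <stdint.h>

/* movb into a register: replace only the lowest byte, keep the upper 7. */
uint64_t write_b(uint64_t reg, uint8_t v)  { return (reg & ~0xFFULL) | v; }

/* movw into a register: replace only the lowest 2 bytes. */
uint64_t write_w(uint64_t reg, uint16_t v) { return (reg & ~0xFFFFULL) | v; }

/* movl into a register: the old contents are discarded, because the
   upper 4 bytes are set to 0. */
uint64_t write_l(uint64_t reg, uint32_t v) { (void)reg; return (uint64_t)v; }

/* movsbq: sign-extend a byte (replicate its most significant bit). */
uint64_t movsbq(uint8_t v) {
    return (v & 0x80) ? (0xFFFFFFFFFFFFFF00ULL | v) : (uint64_t)v;
}

/* movzbq: zero-extend a byte. */
uint64_t movzbq(uint8_t v) { return (uint64_t)v; }
```

    Starting from 0x0011223344556677, write_b with 0xAA reproduces the third line of the first sequence, while write_l with 0xFFFFFFFF shows why movl $-1, %eax clears the upper half.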

  • BombLab

    Goal: to defuse a “binary bomb” by figuring out the correct inputs.

    ● A sequence of 8 phases: each phase asks for an input from stdin.

    ● If a wrong input is provided, the program terminates with an “explosion.”

    Your goal is to complete all phases. You must figure out the correct inputs by

    disassembling the binary program that is already in your GitHub repository.

    ● Complete the assignment inside the VM (must have internet connection).

    ● The binary program pings our server.

    ● Commit and push your solution files sol1.txt through sol8.txt to GitHub.

  • gdb: The GNU Debugger

    Goal: “To help you catch bugs in the act.”

    How?

    ● Start your program (specifying inputs).

    ● Pause it when a condition is met (breakpoints).

    ● Examine the current state (inspect).

    ● Proceed step-by-step (understand).

    Getting started

    ● Install gdb: apt-get install gdb (already present on your VM)

    ● Include debugging information: gcc -g hello.c -o hello

    ● Run gdb on your binary program:

    $ gdb hello

    Reading symbols from hello...done.

    (gdb) _

    For a fish, the archer fish is known to

    shoot down bugs from low hanging plants

    by spitting water at them.

    — Jamie Guinan | https://goo.gl/VxsgbU

  • An interactive shell

    ● Autocomplete a command with tab

    ● Scroll history of previous commands with up / down

    ● Repeat the previous command with enter

    ● Commands can often be abbreviated with a few letters (e.g., b for break, c for continue)

    ● Help about a command: (gdb) help command_name

    ● Open a file for debugging: (gdb) file program_name

    ● Quit: (gdb) quit

    A bit tedious!

    There is a more practical interface: gdb -tui, the “terminal user interface”

    User Interface

  • User Interface Reloaded: gdb -tui

    Enter commands

    Scroll through source code

  • Moving the focus

    ● By pressing up / down / left / right, you scroll the source sub-window

    ● To scroll the history or move along the command line, you must set the

    focus on the other part of the screen: C-x o (press ctrl+x, release, press o)

    Redrawing the screen

    ● If your program prints to stdout, it will interfere with the TUI

    ● If that happens, you can redraw the screen with C-l

    Changing mode

    ● You can enable/disable the TUI mode with C-x a

    ● Or, you can select a mode:

    ○ (gdb) layout src Show source and commands

    ○ (gdb) layout asm Show assembly and commands

    ○ (gdb) layout split Show source, assembly, commands

    ○ (gdb) layout regs Show registers

    A few tips

  • Layouts

  • Breakpoints

    ● Add at current location: (gdb) break

    ● Add at the beginning of a function: (gdb) break func_name

    ● Add at a specific line of a source file: (gdb) break hello.c:5

    ● Add at a specific line of current file: (gdb) break 5

    ● List all breakpoints: (gdb) info breakpoints

    ● Delete a breakpoint: (gdb) delete N (N is the number shown by info breakpoints; with no argument, all breakpoints are deleted)

    ● Disable/enable a breakpoint: (gdb) disable N and (gdb) enable N

    Controlling the execution

    ● Run a program from start, until first breakpoint: (gdb) run

    ● Advance your program execution manually

    ○ Continue to the next line, stepping over subroutine calls: (gdb) next

    ○ Continue to the next line, stepping into subroutines: (gdb) step

    ● Run until the next breakpoint: (gdb) continue

    ● Run until the end of the function and print return value: (gdb) finish

    Breakpoints and Control Flow

  • Inspecting Data

    Registers: (gdb) info registers

    Stack: (gdb) info stack and (gdb) info frame

    Memory

    ● Print 1 byte at 0x12345 as unsigned int: (gdb) x/1ub 0x12345

    ● Print 2 words above stack pointer as hex: (gdb) x/2xw $sp

    ● Print string at memory address contained in %rdi: (gdb) x/s $rdi

    Variables

    ● Print an expression: (gdb) print a/b+3.0*func_name(3)

    ● In hexadecimal: (gdb) print/x var_name

    ● Display an expression after every step: (gdb) display var_name

    Pausing on variable or condition changes

    ● Add a watchpoint for a variable (current scope): (gdb) watch var_name

  • Disassembling binary code

    When source code is missing...

    ● List all the strings in a binary file using: strings objfile

    ● Print the symbol table: objdump -t objfile

    ○ Names of all functions and global variables in objfile

    ○ Example:

    0000000000400ab6 g F .text 0000000000000064 riddle_2

    Meaning: a global Function in section .text with name riddle_2

    ● Debugging with gdb (use layout asm in gdb -tui)

    ○ Print the assembly of a function: (gdb) disassemble func_name

    ○ Breakpoint at a given address: (gdb) break *address

    ○ Next/step one assembly instruction at a time: (gdb) ni and si

    ○ Jump to a given address: (gdb) jump *address

    ○ Print the string at a given address: (gdb) x/s address

  • Getting started with the assignment

    Disassemble and step through main

    ● Open gdb -tui and set layout asm

    ● Load the binary file: (gdb) file riddle

    ● Set a breakpoint on main: (gdb) b main

    ● Start the program: (gdb) run

    ● Look around and advance with ni and si

    ○ Can you find where inputs are read from stdin?

    ○ Can you find the calls to riddle_1 and riddle_2?

    ○ Can you figure out their input parameters?

    Remember

    ● Disassemble a function with (gdb) disas func_name

    ● Redraw the screen with Ctrl-l

    ● Print the string at the address in %rdi using: (gdb) x/s $rdi

  • Today: an easier problem

    Download from: https://usc-cs356.github.io/labs/riddle.zip

    Two phases

    ● The main program reads two strings from stdin.

    ● The strings are validated by calling functions riddle_1 and riddle_2

    $ ./riddle

    To continue, tell me: how is an orange like a bell?

    I know you can Google it, but don't.

    Very well then. Tell me the ages of my three children.

    Hint 1: If you multiply their ages, the product is 36.

    Hint 2: If you add up their ages, it is the number of

    my neighbor's house.

    Hint 3: The oldest one is in fourth grade.

    Sorry, you failed to complete the riddle challenge.


  • Main function

  • Riddle 1

    Understanding

    ● Which functions are called by riddle_1?

    ● Which parameters are passed?

    ● Which output values are used afterward?

    ● Jumps? Conditional jumps?

    (gdb) disas riddle_1

    Dump of assembler code for function riddle_1:

    0x0000000000400a30:  sub    $0x8,%rsp
    0x0000000000400a34:  mov    $0x400dd0,%esi
    0x0000000000400a39:  callq  0x4009c9
    0x0000000000400a3e:  test   %eax,%eax
    0x0000000000400a40:  je     0x400a47
    0x0000000000400a42:  callq  0x400891
    0x0000000000400a47:  add    $0x8,%rsp
    0x0000000000400a4b:  retq

    End of assembler dump.

  • Arithmetic Instructions

    Unary (with q / l / w / b variants)

    ● incq x is equivalent to x++

    ● decq x is equivalent to x--

    ● negq x is equivalent to x = -x

    ● notq x is equivalent to x = ~x

    Binary (with q / l / w / b variants)

    ● addq x,y is equivalent to y += x

    ● subq x,y is equivalent to y -= x

    ● imulq x,y is equivalent to y *= x

    ● andq x,y is equivalent to y &= x

    ● orq x,y is equivalent to y |= x

    ● xorq x,y is equivalent to y ^= x

    ● salq n,y is equivalent to y = y << n (shlq is the same instruction)

    ● sarq n,y is equivalent to y = y >> n arithmetic: fill in sign bit from left

    ● shrq n,y is equivalent to y = y >> n logical: fill in zeros from left

    Any instruction that generates a 32-bit

    value for a register also sets the high-

    order portion of the register to 0.

    Except for right shift, all instructions

    are the same for signed/unsigned

    values (thanks to 2’s-complement)
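
    The difference between an arithmetic and a logical right shift can be sketched in C. The function names below simply borrow the instruction mnemonics and are not a library API; the shift count n is assumed to be between 0 and 63. The arithmetic shift is written with explicit masking because >> on a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Left shift: zeros are filled in from the right. */
uint64_t salq(uint64_t y, unsigned n) { return y << n; }

/* Logical right shift: zeros are filled in from the left. */
uint64_t shrq(uint64_t y, unsigned n) { return y >> n; }

/* Arithmetic right shift: the sign bit is replicated from the left. */
uint64_t sarq(uint64_t y, unsigned n) {
    uint64_t shifted = y >> n;
    if (n > 0 && (y >> 63)) {
        shifted |= ~(~0ULL >> n);   /* set the top n bits to 1 */
    }
    return shifted;
}
```

    With the bit pattern of -16 as input, sarq by 2 gives the bit pattern of -4, while shrq by 2 gives a large positive number.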

  • Arithmetic Instructions: Examples

    Values at each memory address:

    ● 0x100: 0xFF
    ● 0x108: 0xAB
    ● 0x110: 0x13
    ● 0x118: 0x11

    Values in registers:

    ● %rax: 0x100
    ● %rcx: 0x1
    ● %rdx: 0x3

    Effect?

    Instruction                 Solution
    addq %rcx,(%rax)            Write 0x100 at 0x100
    imulq $16,(%rax,%rdx,8)     Write 0x110 at 0x118
    incq 16(%rax)               Write 0x14 at 0x110
    decq %rcx                   Write 0x0 inside %rcx
    subq %rdx,%rax              Write 0xFD inside %rax
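
    As a cross-check of these solutions, here is a toy C model of the example state. The names mem, cell, and run_examples are hypothetical. The five instructions are applied in the listed order, which gives the same results as evaluating each one separately because they write to different destinations and %rax changes last.

```c
#include <stdint.h>

/* Four 8-byte cells at addresses 0x100, 0x108, 0x110, 0x118. */
uint64_t mem[4] = {0xFF, 0xAB, 0x13, 0x11};
uint64_t rax = 0x100, rcx = 0x1, rdx = 0x3;

/* Hypothetical helper: map an example address to its 8-byte cell. */
uint64_t *cell(uint64_t addr) { return &mem[(addr - 0x100) / 8]; }

/* Each statement updates the destination (second) operand in place. */
void run_examples(void) {
    *cell(rax) += rcx;             /* addq  %rcx,(%rax)       */
    *cell(rax + rdx * 8) *= 16;    /* imulq $16,(%rax,%rdx,8) */
    *cell(16 + rax) += 1;          /* incq  16(%rax)          */
    rcx -= 1;                      /* decq  %rcx              */
    rax -= rdx;                    /* subq  %rdx,%rax         */
}
```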

  • Riddle 2

    0x0000000000400a79:  sub    $0x18,%rsp
    0x0000000000400a7d:  lea    0x4(%rsp),%rsi
    0x0000000000400a82:  callq  0x400a4c
    0x0000000000400a87:  mov    0x4(%rsp),%eax
    0x0000000000400a8b:  test   %eax,%eax
    0x0000000000400a8d:  jns    0x400a94
    0x0000000000400a8f:  callq  0x400891
    0x0000000000400a94:  cmp    $0x2,%eax
    0x0000000000400a97:  je     0x400a9e
    0x0000000000400a99:  callq  0x400891
    0x0000000000400a9e:  cmpl   $0x2,0x8(%rsp)
    0x0000000000400aa3:  je     0x400aaa
    0x0000000000400aa5:  callq  0x400891
    0x0000000000400aaa:  cmpl   $0x9,0xc(%rsp)
    0x0000000000400aaf:  je     0x400ab6
    0x0000000000400ab1:  callq  0x400891
    0x0000000000400ab6:  add    $0x18,%rsp
    0x0000000000400aba:  retq

  • read_three_numbers

    0x0000000000400a4c:  sub    $0x8,%rsp
    0x0000000000400a50:  mov    %rsi,%rdx
    0x0000000000400a53:  lea    0x4(%rsi),%rcx
    0x0000000000400a57:  lea    0x8(%rsi),%r8
    0x0000000000400a5b:  mov    $0x400e15,%esi
    0x0000000000400a60:  mov    $0x0,%eax
    0x0000000000400a65:  callq  0x400680
    0x0000000000400a6a:  cmp    $0x2,%eax
    0x0000000000400a6d:  jg     0x400a74
    0x0000000000400a6f:  callq  0x400891
    0x0000000000400a74:  add    $0x8,%rsp
    0x0000000000400a78:  retq

    sscanf: Reads formatted input from a string

    int sscanf(const char *str, const char *format, ...)
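
    One possible C reading of the two listings above is sketched below. Several details are assumptions: the "%d %d %d" format string (the listing only shows its address, 0x400e15), the names read_three and riddle_2_passes, and the choice to return 0 wherever the listing calls the function at 0x400891, whose behavior is not shown.

```c
#include <stdio.h>

/* Mirrors read_three_numbers: the three output pointers are passed in
   %rdx, %rcx, %r8, and the listing fails unless sscanf returns more than 2.
   The format string is an assumption. */
int read_three(const char *input, int nums[3]) {
    return sscanf(input, "%d %d %d", &nums[0], &nums[1], &nums[2]) > 2;
}

/* Mirrors the comparisons in riddle_2: returns 1 when every check passes,
   0 where the listing would call the function at 0x400891. */
int riddle_2_passes(const char *input) {
    int nums[3];                   /* stored at 0x4, 0x8, 0xc(%rsp) */
    if (!read_three(input, nums)) return 0;
    if (nums[0] < 0)  return 0;    /* test %eax,%eax ; jns   */
    if (nums[0] != 2) return 0;    /* cmp  $0x2,%eax         */
    if (nums[1] != 2) return 0;    /* cmpl $0x2,0x8(%rsp)    */
    if (nums[2] != 9) return 0;    /* cmpl $0x9,0xc(%rsp)    */
    return 1;
}
```

    Under these assumptions, riddle_2_passes("2 2 9") returns 1, which agrees with the riddle's hints: 2 * 2 * 9 = 36 and there is a single oldest child.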

  • Some tips

    ● Set breakpoints at main, phase_1, phase_2, etc.

    ● Set a breakpoint at explode_bomb in case you mistype an input and reach this

    function. When gdb stops there, explode_bomb has not run yet: type

    run to restart the program. The breakpoints are still there.

  • Bonus: leaq (Load Effective Address)

    leaq src, reg

    ● Saves the address computed from the first operand (not the memory contents) into an 8-byte register

    ● The first parameter can be any displaced / indexed / scaled address

    Useful for:

    ● Saving an address for later use.

    ● Performing simple additions and constant multiplication:

    leaq imm(reg1,reg2,c), reg3 saves imm+reg1+reg2*c into reg3

    ● Only one instruction is used: efficient!

    Examples (%rax = x, %rcx = y)

    ● leaq 6(%rax),%rdx saves (6+x) in %rdx

    ● leaq (%rax,%rcx),%rdx saves (x+y) in %rdx

    ● leaq (%rax,%rcx,4),%rdx saves (x+4*y) in %rdx

    ● leaq 7(%rax,%rax,8),%rdx saves (7+9*x) in %rdx

    ● leaq 0xA(,%rcx,4),%rdx saves (10+4*y) in %rdx
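
    Written as C, each example above is a plain arithmetic expression with no memory access. The function names lea1 through lea5 are made up for this sketch; x stands for %rax and y for %rcx. Whether a compiler actually emits leaq for these functions depends on the compiler and optimization level.

```c
/* Each function returns the value the matching leaq stores in %rdx. */
long lea1(long x)         { return 6 + x; }       /* leaq 6(%rax),%rdx        */
long lea2(long x, long y) { return x + y; }       /* leaq (%rax,%rcx),%rdx    */
long lea3(long x, long y) { return x + 4 * y; }   /* leaq (%rax,%rcx,4),%rdx  */
long lea4(long x)         { return 7 + 9 * x; }   /* leaq 7(%rax,%rax,8),%rdx */
long lea5(long y)         { return 10 + 4 * y; }  /* leaq 0xA(,%rcx,4),%rdx   */
```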

  • Fill In the Missing C Expression

    The assembly code below is produced by the compiler. What is a

    corresponding C expression for the input code?

    long scale(long x, long y, long z) {
        // x in %rdi, y in %rsi, z in %rdx
        // output saved in %rax
        return ???;
    }

    scale:
        leaq (%rdi,%rdi,4), %rax
        leaq (%rax,%rsi,2), %rax
        leaq (%rax,%rdx,8), %rax
        ret

    Solution:

    long scale(long x, long y, long z) {
        // x in %rdi, y in %rsi, z in %rdx
        // output saved in %rax
        return 5*x + 2*y + 8*z;
    }

    Try it yourself! Save the C code in scale.c and compile: gcc -O3 -c scale.c

    Then, disassemble the binary object scale.o generated by the compiler:

    $ objdump -M suffix -d scale.o

    scale.o: file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <scale>:
       0: 48 8d 04 bf    leaq (%rdi,%rdi,4),%rax
       4: 48 8d 04 70    leaq (%rax,%rsi,2),%rax
       8: 48 8d 04 d0    leaq (%rax,%rdx,8),%rax
       c: c3             retq