H07 Final Practice

CS107 Handout #7
J Zelenski
May 23, 2011

Final practice

Final Exam: Friday, June 3rd 8:30-11:30am
Memorial Auditorium

This is our registrar-scheduled exam time. There is no alternate exam. You may bring your textbook, notes, and other paper resources, but no electronic devices may be used.

SCPD (local): See forum post with map and parking info for the on-campus exam.

SCPD (remote): Please confirm arrangements for on-site exam by email to head TA Nate Hardison ([email protected]) by May 31st.

Material

The final is comprehensive but expect more coverage on post-midterm topics and particular focus on material covered in the labs and assignments. Check your rear-view mirror for the very impressive list of things you've learned in 107:

• C: strings, arrays, pointers, &, *, void*, typecasts, function pointers
• Data representation: bits, bytes, ASCII, two's complement integers, floating point, arrays, pointers, structs
• IA32 assembly: data access and addressing modes, arithmetic and logical ops, implementation of C control structures, call/return, register use
• Address space: layout and purpose of text/data/stack/heap segments, handling of globals/locals/parameters
• Runtime stack: protocol for function call/return, parameter passing, management of ebp and esp registers
• Compilation: tasks handled by preprocessor, compiler, assembler, and linker, static and dynamic linking, relocatable object files and executables, makefiles
• Memory: memory hierarchy, caches, locality, static versus dynamic allocation, heap allocator strategies and tradeoffs
• Performance: compiler optimizations, measuring execution time, profiling

The rest of this handout is the final from CS107 last term so questions are fairly representative in terms of format, difficulty, and content. To conserve paper, I removed answer space, but the real exam will have much more room for your answers and scratch work. We'll distribute solutions later this week. Good luck preparing!



Problem 1: Python and C

Parts (a) and (c) were Python questions. We did not cover Python this term and it will not appear on the exam.

b: Implement the C function Max that takes an array of strings and returns the longest string from the array. The function should use qsort to sort a copy of the array by length and then return the longest. You can assume the array has at least one entry.

    char *Max(char *words[], int nWords)

Problem 2: IA32

Below is the unoptimized assembly code generated for the Binky function.

    Binky:
        push %ebp
        mov  %esp, %ebp
        sub  $0x18, %esp
        movl $0x4, -0x4(%ebp)
        jmp  .L2
    .L3:
        addl $0x4, 0x8(%ebp)
        mov  0xc(%ebp), %eax
        lea  0xf(%eax,%eax,4), %eax
        mov  %eax, 0x4(%esp)
        mov  0x8(%ebp), %eax
        mov  %eax, (%esp)
        call Binky
        add  %eax, -0x4(%ebp)
    .L2:
        mov  0xc(%ebp), %eax
        cmp  -0x4(%ebp), %eax
        jne  .L3
        mov  0xc(%ebp), %eax
        shl  $0x2, %eax
        add  0x8(%ebp), %eax
        mov  (%eax), %eax
        leave
        ret

Fill in the blanks in the C code below for Binky to compile to the assembly above. Your code should refer to variables, not register names. Note this is nonsense code, not intended to do something meaningful.

    int Binky(int *param1, int param2)
    {
        int local = _____________________ ;

        while ( _____________________________ ) {

            __________________________________________;

            __________________________________________;

        }

        return ___________________________;
    }
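Looking back at Problem 1b, the part that tends to trip people up is the comparator: qsort passes pointers to the array elements, and each element is itself a char *. The following is a minimal sketch of one possible approach, not the official solution. CompareLength is a hypothetical helper name, and the sketch assumes it is acceptable to return a pointer into the caller's original strings since only the array of pointers is copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to array elements. Each element is a
 * char *, so the void * arguments are really char ** in disguise. */
static int CompareLength(const void *a, const void *b)
{
    size_t lenA = strlen(*(char *const *)a);
    size_t lenB = strlen(*(char *const *)b);
    return (lenA > lenB) - (lenA < lenB);    /* -1, 0, or 1 without overflow */
}

char *Max(char *words[], int nWords)
{
    assert(nWords > 0);                      /* the problem guarantees >= 1 entry */
    size_t nbytes = (size_t)nWords * sizeof(char *);
    char **copy = malloc(nbytes);
    assert(copy != NULL);
    memcpy(copy, words, nbytes);             /* copies the pointers, not the characters */
    qsort(copy, (size_t)nWords, sizeof(char *), CompareLength);
    char *longest = copy[nWords - 1];        /* ascending sort puts the longest last */
    free(copy);
    return longest;
}
```

Sorting the copy leaves the caller's array order untouched, which is the reason the problem asks for a copy in the first place.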


Problem 3: Runtime stack

a: Implement the Corrupt function that creates stack corruption for testing a crash reporter. The function has two arguments: an integer whichType and a pointer ptr. If whichType is an odd number, the function overwrites the return address with the value of ptr. If whichType is even, it overwrites the saved base pointer with a pointer back to itself. Note the stack data being overwritten is within the stack frame of Corrupt, not main or further back in the stack.

    void Corrupt(int whichType, void *ptr)

b: Assume the Corrupt function has been implemented correctly. Consider its use in a program consisting of just Corrupt and the main function below.

    int main(int argc, const char *argv[])
    {
        printf("Before %d %p \n", argc, argv);
        Corrupt(argc, NULL);
        printf("After %d %p \n", argc, argv);
        return 0;
    }

The above program is executed with no command-line arguments. What does the program output and how does it behave? Be specific.

The above program is executed with one command-line argument. What does it output and how does it behave? Be specific.

Problem 4: Heap allocator

You are implementing a segregated storage allocator. This allocator treats the heap segment as a collection of individual pages where each page contains a group of same-size blocks.

    typedef struct {
        size_t sz;                 // size of blocks on this page
        unsigned int status[31];   // bit vector of free/in-use per block
    } page_header;

Each page begins with a header as declared above. The rest of the page is divided into blocks. Every block on a page is the same size as the sz field of the page header. The blocks are laid out end-to-end; there is no block header or padding. The status array in the page header is used as a bit vector. Each block on the page corresponds to one bit. The bit is 1 if that block is freed, 0 if in use.

A page is 4096 bytes. The page header occupies the first 128 bytes; the remaining 3968 bytes are divided into blocks. In the first example page diagrammed below, blocks are 4 bytes and a total of 992 blocks fit within the page. The first block is at address 0x7080, the last at 0x7ffc. The first block's status bit is at position 0 of the bit vector, the last block's at position 991. When the page is created, each block's status bit is initialized to 1. A block's status bit is toggled when being allocated or freed.
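The address arithmetic in this layout can be checked with a short sketch. BlocksPerPage, BlockAddress, and HEADERSZ are hypothetical names introduced here for illustration; the sketch assumes 32-bit addresses, as on IA32, and takes the 128-byte header size directly from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGESZ   4096u   /* bytes per page (from the handout) */
#define HEADERSZ 128u    /* bytes occupied by the page header (from the handout) */

/* Number of blocks of size sz that fit in the page after the header. */
static unsigned int BlocksPerPage(unsigned int sz)
{
    return (PAGESZ - HEADERSZ) / sz;
}

/* Address of the block whose status bit is at position pos, on the page
 * that starts at pageStart. Blocks are laid end-to-end after the header. */
static uint32_t BlockAddress(uint32_t pageStart, unsigned int sz, unsigned int pos)
{
    assert(pos < BlocksPerPage(sz));
    return pageStart + HEADERSZ + pos * sz;
}
```

With 4-byte blocks this gives 992 blocks per page, which is exactly the number of bits in the 31-word status array (31 * 32 = 992).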


    address:  0x7000  7004-7076  7080  7084  7088  708c  7090  7094  7098  709c  ...  7ff0  7ff4  7ff8  7ffc
    content:  sz = 4  status     4-byte blocks

Another heap page might be divided into 16-byte blocks as shown below:

    address:  0x8000   8004-8076  8080  8090  ...  8ff0
    content:  sz = 16  status     16-byte blocks

Specific implementation facts:

• This allocator rounds up all requested sizes to a power of 2. The minimum block size is 4. We will ignore allocating blocks larger than 2048.

• This allocator returns pointers aligned to 4-byte boundaries.

• To keep things simple, every page stores the same number of entries in the status bit vector, even though a page with larger blocks requires fewer. Assume the excess status bits are initialized to 0, marking them unavailable.

• The heap segment consists of a sequence of pages laid out end-to-end. The heap start address is always page-aligned.

• This allocator never splits nor coalesces blocks. The realloc operation must move a block to change its size.

Here are the global variables, type definitions, and constants.

    // constant: number of bits in an unsigned int
    #define INTBITS (sizeof(unsigned int)*8)

    // constant: number of bytes per page
    #define PAGESZ 4096

    typedef struct {
        size_t sz;
        unsigned int status[31];
    } page_header;

    static void *heapStart;   // addr of first page in heap segment
    static void *heapEnd;     // addr past end of last page of segment

The allocator uses the gcc built-in function clz ('count leading zeros'). The clz function counts the leading 0-bits in integer parameter val starting from the most significant bit. If the val is zero, it returns the INTBITS constant. This built-in is implemented as a single, fast IA32 instruction (bsr bit scan reverse).
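For experimenting outside the exam, a stand-in with the behavior described above can be built on GCC's __builtin_clz. This is a sketch under an assumption: the real built-in leaves its result undefined for a zero argument, so the wrapper handles zero explicitly to match the handout's description.

```c
#include <assert.h>   /* static_assert (C11) */

static_assert(sizeof(unsigned int) == 4, "handout assumes 32-bit unsigned int");

#define INTBITS (sizeof(unsigned int) * 8)

/* Stand-in for the handout's clz. __builtin_clz(0) is undefined in GCC,
 * so zero is special-cased to return INTBITS as the handout specifies. */
static int clz(unsigned int val)
{
    return val == 0 ? (int)INTBITS : __builtin_clz(val);
}
```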

    int clz(unsigned int val);

a: The RoundToPower function is given a size sz and returns the smallest power of 2 that is greater than or equal to sz. For example, RoundToPower(7) returns 8. The parameter sz is required to be non-zero and must be able to be rounded without overflow. The implementation below uses clz for efficiency.


    static size_t RoundToPower(size_t sz)
    {
        int count = clz(sz);
        return ((unsigned int)INT_MIN) >> (count - 1);
    }

The function as implemented above does not work correctly in all cases. Identify the input(s) for which it returns an incorrect result. Fix the function so it works correctly for all inputs, retaining comparable efficiency.
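Tracing a few inputs by hand is a good way to find the failing cases. The sketch below is one possible line of attack rather than the official solution. It reproduces the handout's version next to a candidate fix so the two can be compared. RoundToPowerOriginal and RoundToPowerFixed are hypothetical names, unsigned int stands in for size_t to match the 32-bit setting, and clz is a stand-in built on GCC's __builtin_clz.

```c
#include <assert.h>
#include <limits.h>

#define INTBITS (sizeof(unsigned int) * 8)

/* Stand-in for the handout's clz: returns INTBITS for a zero argument. */
static int clz(unsigned int val)
{
    return val == 0 ? (int)INTBITS : __builtin_clz(val);
}

/* The handout's version, with unsigned int in place of size_t. */
static unsigned int RoundToPowerOriginal(unsigned int sz)
{
    int count = clz(sz);
    return ((unsigned int)INT_MIN) >> (count - 1);
}

/* Candidate fix: count the leading zeros of sz - 1 instead of sz, so an
 * input that is already a power of 2 is not bumped to the next power. */
static unsigned int RoundToPowerFixed(unsigned int sz)
{
    assert(sz != 0 && sz <= (unsigned int)INT_MIN);   /* handout's preconditions */
    int count = clz(sz - 1);
    return ((unsigned int)INT_MIN) >> (count - 1);
}
```

The candidate fix relies on clz(0) returning INTBITS, which is what makes an input of 1 come out as 1. It keeps the same cost as the original: one clz, one subtraction, and one shift.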

b: The status field in the page header is an array of unsigned integers used as a bit vector. The bits are referred to by position. Position 0 refers to the least significant bit of status[0], position 32 is the least significant bit of status[1], and so on. Below are two helper functions that operate on the status bit vector.

Complete the FindFree function below by providing the missing test expression. This function searches for and returns a free position within the bit vector or -1 if no positions are free. A free position has bit equal to 1.

    static int FindFree(unsigned int *array)
    {
        for (int i = 0; i < 31; i++) {   // status array has 31 entries
            for (int pos = 0; pos < INTBITS; pos++) {
                if (_________________________________________)
                    return pos + i*INTBITS;
            }
        }
        return -1;
    }

Complete the Toggle function below that inverts a single bit at a given position in the bit vector. The function can assume that pos is within range for the bit vector.

    static void Toggle(unsigned int *array, int pos)
    {
        int index1 = pos / INTBITS;
        int index2 = pos % INTBITS;
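The mapping from a position to a word index and a bit index can be sketched as follows. GetBit and FlipBit are hypothetical names chosen to avoid suggesting this is the expected exam answer, and NWORDS is an illustrative constant for the 31-entry status array.

```c
#include <assert.h>

#define INTBITS (sizeof(unsigned int) * 8)
#define NWORDS  31   /* length of the status array in the handout */

/* Returns 1 if the bit at position pos is set, 0 otherwise.
 * pos / INTBITS picks the array entry, pos % INTBITS picks the bit in it. */
static int GetBit(const unsigned int *array, int pos)
{
    assert(pos >= 0 && pos < (int)(NWORDS * INTBITS));
    return (array[pos / INTBITS] >> (pos % INTBITS)) & 1u;
}

/* Inverts the bit at position pos by XOR-ing with a one-bit mask. */
static void FlipBit(unsigned int *array, int pos)
{
    assert(pos >= 0 && pos < (int)(NWORDS * INTBITS));
    array[pos / INTBITS] ^= 1u << (pos % INTBITS);
}
```

XOR is a natural fit for inverting because the same call both sets a clear bit and clears a set bit, so the caller does not need to know the current state.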

c: Complete the mymalloc function below. The function takes one argument, the requested payload size in bytes, and returns the address of a newly allocated heap block. The requested size is rounded up to the nearest power of 2. It uses a first-fit search through the heap pages to find a page with blocks of the correct size containing a free block. If an appropriate block is found, it updates the heap data structures and returns the pointer. If none is found, it returns NULL (do not use larger sizes or extend the heap). You should use the helper functions FindFree and Toggle. You can assume that myinit has been called and all pages in the heap segment have been properly initialized.

    void *mymalloc(size_t sz)
    {
        sz = max(RoundToPower(sz), 4);   // round to power of 2, minimum 4


The PageStart function is given a pointer and returns the page-aligned start address of the page containing that pointer. The C code below works correctly, but the direct translation into unoptimized assembly uses an expensive divl operation and requires 5 total instructions for the function body, not counting the function prolog/epilog.

    static void *PageStart(void *ptr)
    {
        return (void *)(((unsigned int)ptr/PAGESZ)*PAGESZ);
    }

        push %ebp               # prolog
        mov  %esp,%ebp
        mov  0x8(%ebp),%eax     # first body inst, load dividend
        mov  $0x0,%edx          # clear for divl instr
        mov  $0x1000,%ecx       # load divisor
        divl %ecx               # divide edx:eax/ecx, quotient into eax
        imul %ecx,%eax          # multiply quotient by divisor
        pop  %ebp               # epilog
        ret
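To see what the division version computes on concrete addresses, here is a small sketch of the same arithmetic. The only change from the handout is an assumption made for portability: the cast goes through uintptr_t instead of unsigned int so the code also behaves on 64-bit builds.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define PAGESZ 4096

/* The page size is a power of 2, which is worth keeping in mind for part d. */
static_assert((PAGESZ & (PAGESZ - 1)) == 0, "PAGESZ must be a power of 2");

/* Same arithmetic as the handout's PageStart: integer division discards the
 * offset within the page, and the multiplication scales back to an address. */
static void *PageStart(void *ptr)
{
    return (void *)(((uintptr_t)ptr / PAGESZ) * PAGESZ);
}
```

For example, any address from 0x7000 through 0x7fff maps to 0x7000, and 0x8000 maps to itself.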

d: Re-implement PageStart to compute an equivalent result using only 2 assembly instructions in the function body. Show both the C code and its generated assembly. You can hard-code knowledge that page size is the constant 4096.

    static void *PageStart(void *ptr)
    {
        return _________________________________________________;
    }

        push %ebp               # prolog
        mov  %esp,%ebp
        mov  0x8(%ebp),%eax     # first body inst
        ___________________________________________
        pop  %ebp               # epilog
        ret

e: Complete the myfree function which deallocates a heap block and properly updates the heap data structures. You should use the helper functions PageStart and Toggle. You can assume that ptr is the address of an allocated heap block.

void myfree(void *ptr)

f: A callgrind profile shows mymalloc to be a bottleneck. It spends many cycles examining a given page (almost all that time is spent in FindFree) and examines many pages. Compiling with optimization helps, but you need a further boost in throughput.

Describe a change in code/strategy that could significantly reduce the number of cycles spent in FindFree.


Describe a change in code/strategy that could significantly reduce the number of pages being examined.

g: This form of segregated storage has very little internal fragmentation. Compute the utilization for the best-case scenario of a heap consisting of just one full page of in-use 8-byte blocks. How does this compare to the best-case utilization for a non-segregated allocator that tacks an 8-byte header onto every block?
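The utilization arithmetic can be set up as a short sketch. It assumes utilization means payload bytes in use divided by total heap bytes, and for the comparison it assumes the non-segregated allocator is also serving 8-byte payloads with no overhead other than the per-block header. SegregatedUtilization and HeaderUtilization are hypothetical names.

```c
#include <assert.h>

#define PAGESZ   4096
#define HEADERSZ 128

/* One full page of in-use blocks: every byte after the page header that
 * fits a whole block counts as payload. */
static double SegregatedUtilization(unsigned int blockSz)
{
    assert(blockSz >= 4 && blockSz <= 2048);
    unsigned int nblocks = (PAGESZ - HEADERSZ) / blockSz;
    return (double)(nblocks * blockSz) / PAGESZ;
}

/* Per-block header allocator: each payload carries an 8-byte header. */
static double HeaderUtilization(unsigned int payloadSz)
{
    return (double)payloadSz / (payloadSz + 8);
}
```

For 8-byte blocks the page holds 3968 / 8 = 496 blocks, so the segregated page reaches 3968 / 4096 utilization, while the per-block header design spends as many bytes on headers as on payload at that size.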

Segregation can increase external fragmentation. Describe a scenario where the segregated storage allocator would have much lower utilization than the non-segregated allocator due to external fragmentation.

Problem 5: Compilation

Failing to #include a necessary header file can cause a variety of consequences. For each scenario described below, provide/describe a code example that results in that outcome.

a: The missing #include causes a compiler warning but doesn't block the build nor create an execution error.

b: The missing #include causes a compiler error.

c: The missing #include causes a linker error.

d: The missing #include causes an execution error (builds but does wrong thing).