
Slide 1: Memory Management for High-Performance Applications

Emery Berger
University of Massachusetts, Amherst

Slide 2: High-Performance Applications

- Web servers, search engines, scientific codes
- Written in C or C++ (still…)
- Run on one server box or a cluster of server boxes
- [Figure: server boxes, each with multiple CPUs, RAM, and a RAID drive; software stack: software, compiler, runtime system, operating system, hardware]
- Needs support at every level

Slide 3: New Applications, Old Memory Managers

- Applications and hardware have changed:
  - Multiprocessors now commonplace
  - Object-oriented, multithreaded
  - Increased pressure on the memory manager (malloc, free)
- But memory managers have not kept up:
  - Inadequate support for modern applications

Slide 4: Current Memory Managers Limit Scalability

- [Figure: runtime performance; speedup vs. number of processors (1–14), ideal vs. actual; Larson server benchmark on a 14-processor Sun]
- As we add processors, the program slows down
- Caused by heap contention

Slide 5: The Problem

- Current memory managers are inadequate for high-performance applications on modern architectures
- They limit scalability, application performance, and robustness

Slide 6: This Talk

- Building memory managers: the Heap Layers framework [PLDI 2001]
- Problems with current memory managers: contention, false sharing, space
- Solution: a provably scalable memory manager, Hoard [ASPLOS-IX]
- Extended memory manager for servers: Reap [OOPSLA 2002]

Slide 7: Implementing Memory Managers

- Memory managers must be space efficient and very fast
- Heavily optimized code: hand-unrolled loops, macros, monolithic functions
- Hard to write, reuse, or extend

Slide 8: Building Modular Memory Managers

- Classes: overhead, rigid hierarchy
- Mixins: no overhead, flexible hierarchy

Slide 9: A Heap Layer

- A heap layer is a mixin that provides malloc and free methods:

    template <class SuperHeap>
    class GreenHeapLayer : public SuperHeap { … };

- [Diagram: GreenHeapLayer stacked on top of RedHeapLayer]

Slide 10: Example: Thread-Safe Heap Layer

- LockedHeap: protects its superheap with a lock
- LockedMallocHeap: LockedHeap layered over mallocHeap
- [Diagram: LockedMallocHeap = LockedHeap stacked on mallocHeap]
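To make the composition concrete, here is a minimal sketch of the pattern in modern C++ (it is not the actual Heap Layers source; MallocHeap and the use of std::mutex are illustrative stand-ins): each layer forwards to its superheap, and the locking layer simply wraps malloc and free in a lock.

    #include <cstddef>
    #include <cstdlib>
    #include <mutex>

    // Bottom layer: obtains memory from the system allocator.
    class MallocHeap {
    public:
      void* malloc(std::size_t sz) { return std::malloc(sz); }
      void free(void* ptr)         { std::free(ptr); }
    };

    // Mixin layer: serializes access to whatever superheap it is stacked on.
    template <class SuperHeap>
    class LockedHeap : public SuperHeap {
    public:
      void* malloc(std::size_t sz) {
        std::lock_guard<std::mutex> guard(lock_);
        return SuperHeap::malloc(sz);
      }
      void free(void* ptr) {
        std::lock_guard<std::mutex> guard(lock_);
        SuperHeap::free(ptr);
      }
    private:
      std::mutex lock_;
    };

    // Composition by template parameterization: a thread-safe malloc heap.
    using LockedMallocHeap = LockedHeap<MallocHeap>;

    int main() {
      LockedMallocHeap heap;
      void* p = heap.malloc(64);
      heap.free(p);
      return 0;
    }

Because the layer is chosen at compile time, the lock can be added or removed without touching the underlying heap code, which is the point of the mixin approach on the previous slide.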

Slide 11: Empirical Results

- Heap Layers vs. the originals:
  - KingsleyHeap vs. the BSD allocator
  - LeaHeap vs. DLmalloc 2.7
- Competitive runtime and memory efficiency
- [Figure: runtime and space, normalized to the Lea allocator, for cfrac, espresso, lindsay, LRUsim, perl, roboop, and the average; bars for Kingsley, KingsleyHeap, Lea, and LeaHeap]

Slide 12: Overview

- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- Extended memory manager for servers: Reap

Slide 13: Problems with General-Purpose Memory Managers

- Previous work for multiprocessors:
  - Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]: impractical
  - Multiple heaps [Larson 98, Gloger 99]
- Multiple heaps reduce contention but, as we show, cause other problems:
  - P-fold or even unbounded increase in space
  - Allocator-induced false sharing

Slide 14: Multiple Heap Allocator: Pure Private Heaps

- One heap per processor:
  - malloc gets memory from its local heap
  - free puts memory on its local heap
- Used by STL, Cilk, and ad hoc allocators
- [Diagram: processors 0 and 1 each allocate and free objects (x1 = malloc(1) … x4 = malloc(1), free(x1) … free(x4)); freed memory goes onto the freeing processor's own heap. Key: in use by processor 0; free, on heap 1]

Slide 15: Problem: Unbounded Memory Consumption

- Producer-consumer pattern: processor 0 allocates, processor 1 frees
- Freed memory accumulates on processor 1's heap, where processor 0 never reuses it
- Unbounded memory consumption: crash!
- [Diagram: processor 0 executes x1 = malloc(1), x2 = malloc(1), x3 = malloc(1), …; processor 1 executes free(x1), free(x2), free(x3), …]
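A hedged sketch of the pattern that triggers this blowup (the fixed-size handoff buffer is a deliberate simplification, not from the talk): one thread only allocates and hands objects to another thread, which only frees them. Under a pure private-heaps allocator, every free lands on the consumer's heap, so the producer keeps requesting fresh memory.

    #include <atomic>
    #include <cstdlib>
    #include <thread>
    #include <vector>

    int main() {
      const int kItems = 1000000;
      std::vector<void*> slots(kItems, nullptr);   // simplistic handoff buffer
      std::atomic<int> produced{0};

      std::thread producer([&] {       // e.g., runs on processor 0: only mallocs
        for (int i = 0; i < kItems; ++i) {
          slots[i] = std::malloc(64);
          produced.store(i + 1, std::memory_order_release);
        }
      });

      std::thread consumer([&] {       // e.g., runs on processor 1: only frees
        for (int i = 0; i < kItems; ++i) {
          while (produced.load(std::memory_order_acquire) <= i) { /* spin */ }
          std::free(slots[i]);         // with pure private heaps, this memory
                                       // lands on the consumer's heap and the
                                       // producer never reuses it
        }
      });

      producer.join();
      consumer.join();
      return 0;
    }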

Slide 16: Multiple Heap Allocator: Private Heaps with Ownership

- free returns memory to the original heap
- Bounded memory consumption: no crash!
- Used by "Ptmalloc" (Linux) and LKmalloc
- [Diagram: x1 = malloc(1), x2 = malloc(1), free(x1), free(x2) across processors 0 and 1; each free returns the memory to the heap it was originally allocated from]

Slide 17: Problem: P-fold Memory Blowup

- Occurs in practice: round-robin producer-consumer
  - Processor i mod P allocates
  - Processor (i+1) mod P frees
- Footprint = 1 (e.g., 2 GB), but space = 3 (6 GB)
- Exceeds the 32-bit address space: crash!
- [Diagram: processor 0 executes x1 = malloc(1) and free(x3); processor 1 executes x2 = malloc(1) and free(x1); processor 2 executes x3 = malloc(1) and free(x2)]

Slide 18: Problem: Allocator-Induced False Sharing

- False sharing: non-shared objects on the same cache line
- The bane of parallel applications; extensively studied
- All of these allocators cause false sharing!
- [Diagram: processor 0 executes x1 = malloc(1) and processor 1 executes x2 = malloc(1); both objects land on the same cache line, which thrashes back and forth between CPU 0's and CPU 1's caches over the bus]
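A hedged illustration of the effect (the object size and loop are illustrative, not from the talk): if the allocator hands two threads one-word objects carved from the same cache line, each thread's writes invalidate the other's cached copy even though no data is logically shared.

    #include <cstdlib>
    #include <thread>

    int main() {
      // If the allocator carves both objects out of the same cache line (as the
      // multiple-heap allocators above can), these two logically private
      // counters ping-pong that line between the CPUs' caches.
      long* counter0 = static_cast<long*>(std::malloc(sizeof(long)));
      long* counter1 = static_cast<long*>(std::malloc(sizeof(long)));
      *counter0 = 0;
      *counter1 = 0;

      std::thread t0([&] { for (int i = 0; i < 100000000; ++i) ++*counter0; });
      std::thread t1([&] { for (int i = 0; i < 100000000; ++i) ++*counter1; });
      t0.join();
      t1.join();

      long total = *counter0 + *counter1;   // keep the writes observable
      std::free(counter0);
      std::free(counter1);
      return total == 200000000 ? 0 : 1;
    }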

Slide 19: So What Do We Do Now?

- Where do we put free memory?
  - On a central heap: heap contention
  - On our own heap (pure private heaps): unbounded memory consumption
  - On the original heap (private heaps with ownership): P-fold blowup
- And how do we avoid false sharing?

Slide 20: Overview

- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- Extended memory manager for servers: Reap

Slide 21: Hoard: Key Insights

- Bound local memory consumption:
  - Explicitly track utilization
  - Move free memory to a global heap
  - Provably bounds memory consumption
- Manage memory in large chunks:
  - Avoids false sharing
  - Reduces heap contention

Slide 22: Overview of Hoard

- Manage memory in page-sized heap blocks: avoids false sharing
- Allocate from the local heap block: avoids heap contention
- When a heap block's utilization drops low, move it to the global heap: avoids space blowup
- [Diagram: per-processor heaps (processor 0 … processor P-1) exchanging heap blocks with a global heap]
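A highly simplified sketch of that policy (a toy model, not Hoard's real implementation; the struct names, the single size class, and the caller-supplied owning block are all assumptions made for brevity): allocation stays on the local heap, and frees that leave a block completely empty push the whole block to the global heap.

    #include <algorithm>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    // Toy model: a "heap block" is a chunk of fixed-size object slots.
    struct HeapBlock {
      static const int kSlots = 64;
      std::vector<void*> freeSlots;   // free object slots in this block
      std::vector<char>  storage;
      explicit HeapBlock(std::size_t objSize) : storage(kSlots * objSize) {
        for (int i = 0; i < kSlots; ++i)
          freeSlots.push_back(&storage[i * objSize]);
      }
      bool full()  const { return freeSlots.empty(); }
      bool empty() const { return (int)freeSlots.size() == kSlots; }
    };

    struct GlobalHeap {
      std::mutex lock;                 // only the global heap is locked here
      std::vector<HeapBlock*> blocks;
    };

    struct LocalHeap {                 // one per processor
      std::size_t objSize;
      GlobalHeap* global;
      std::vector<HeapBlock*> blocks;

      void* takeSlot(HeapBlock* b) {
        void* p = b->freeSlots.back();
        b->freeSlots.pop_back();
        return p;
      }

      void* malloc() {
        for (HeapBlock* b : blocks)                 // 1. try a local block
          if (!b->full()) return takeSlot(b);
        {                                           // 2. adopt a global block
          std::lock_guard<std::mutex> g(global->lock);
          if (!global->blocks.empty()) {
            blocks.push_back(global->blocks.back());
            global->blocks.pop_back();
            return takeSlot(blocks.back());
          }
        }
        blocks.push_back(new HeapBlock(objSize));   // 3. get fresh memory
        return takeSlot(blocks.back());
      }

      // In this toy model the caller names the owning block; real Hoard finds
      // it from the block's header.
      void free(HeapBlock* b, void* p) {
        b->freeSlots.push_back(p);
        // Emptiness policy (simplified): a completely free block, when we hold
        // more than one, is handed back to the global heap so other processors
        // can reuse it instead of growing their own footprint.
        if (b->empty() && blocks.size() > 1) {
          blocks.erase(std::find(blocks.begin(), blocks.end(), b));
          std::lock_guard<std::mutex> g(global->lock);
          global->blocks.push_back(b);
        }
      }
    };

    int main() {
      GlobalHeap global;
      LocalHeap heap{sizeof(long), &global, {}};
      void* p = heap.malloc();
      heap.free(heap.blocks.back(), p);
      return 0;
    }

The per-block tracking is what lets the real allocator bound how much free memory any processor can hoard locally, which is the source of the space bound on the next slide.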

Slide 23: Summary of Analytical Results

- Space consumption: near-optimal worst case
  - Hoard: O(n log(M/m) + P), where P « n
  - Optimal: O(n log(M/m)) [Robson 70] (≈ bin-packing)
  - Private heaps with ownership: O(P n log(M/m))
- Provably low synchronization
- Notation: n = memory required, M = biggest object size, m = smallest object size, P = number of processors

Slide 24: Empirical Results

- Measure runtime on a 14-processor Sun
- Allocators: Solaris (system allocator), Ptmalloc (GNU libc), mtmalloc (Sun's "MT-hot" allocator)
- Micro-benchmarks:
  - Threadtest: no sharing
  - Larson: sharing (server-style)
  - Cache-scratch: mostly reads & writes (tests for false sharing)
- Real application experience is similar

Slide 25: Runtime Performance: threadtest

- speedup(x, P) = runtime(Solaris allocator on one processor) / runtime(x on P processors)
- Many threads, no sharing
- Hoard achieves linear speedup

Slide 26: Runtime Performance: Larson

- Many threads, sharing (server-style)
- Hoard achieves linear speedup

Slide 27: Runtime Performance: false sharing

- Many threads, mostly reads & writes of heap data
- Hoard achieves linear speedup

Slide 28: Hoard in the "Real World"

- Open source code: www.hoard.org
  - 13,000 downloads
  - Solaris, Linux, Windows, IRIX, …
- Widely used in industry: AOL, British Telecom, Novell, Philips
  - Reports: 2x-10x, "impressive" improvement in performance
  - Search servers, telecom billing systems, scene rendering, real-time messaging middleware, text-to-speech engines, telephony, JVMs
- A scalable general-purpose memory manager

Slide 29: Overview

- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- Extended memory manager for servers: Reap

Slide 30: Custom Memory Allocation

- Programmers often replace malloc/free to:
  - Attempt to increase performance
  - Provide extra functionality (e.g., for servers)
  - Reduce space (rarely)
- Empirical study of custom allocators [OOPSLA 2002]:
  - The Lea allocator is often as fast or faster
  - Custom allocation is ineffective, except for regions

Slide 31: Overview of Regions

- Separate areas of memory; deletion only en masse
- API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
- + Fast: pointer-bumping allocation, deletion of whole chunks
- + Convenient: one call frees all memory
- - Risky: accidental deletion, too much space
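To make the pointer-bumping and en-masse deletion concrete, here is a minimal region-allocator sketch using the call names from the slide (the chunk size, struct layout, and alignment handling are illustrative assumptions, not the implementations studied in the talk):

    #include <cstddef>
    #include <cstdlib>

    // One chunk of a region; chunks are chained so regiondelete can free them all.
    struct Chunk {
      Chunk*      next;
      std::size_t used;
      std::size_t capacity;
      // object data follows the header
    };

    struct Region {
      Chunk* chunks = nullptr;
    };

    static const std::size_t kChunkSize = 64 * 1024;

    void regioncreate(Region& r) { r.chunks = nullptr; }

    void* regionmalloc(Region& r, std::size_t sz) {
      sz = (sz + 7) & ~std::size_t(7);                     // 8-byte alignment
      Chunk* c = r.chunks;
      if (!c || c->used + sz > c->capacity) {              // need a fresh chunk
        std::size_t cap = sz > kChunkSize ? sz : kChunkSize;
        c = static_cast<Chunk*>(std::malloc(sizeof(Chunk) + cap));
        c->next = r.chunks;
        c->used = 0;
        c->capacity = cap;
        r.chunks = c;
      }
      char* base = reinterpret_cast<char*>(c + 1);         // data starts after header
      void* p = base + c->used;
      c->used += sz;                                       // pointer-bumping allocation
      return p;
    }

    void regiondelete(Region& r) {                         // one call frees everything
      for (Chunk* c = r.chunks; c != nullptr; ) {
        Chunk* next = c->next;
        std::free(c);
        c = next;
      }
      r.chunks = nullptr;
    }

    int main() {
      Region r;
      regioncreate(r);
      int* xs = static_cast<int*>(regionmalloc(r, 100 * sizeof(int)));
      for (int i = 0; i < 100; ++i) xs[i] = i;
      regiondelete(r);                                     // no per-object frees
      return 0;
    }

Note that there is no per-object free at all, which is exactly the convenience and the drawback the following slides discuss.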

Slide 32: Why Regions?

- Apparently faster and more space-efficient
- Servers need memory management support:
  - Avoid resource leaks
  - Tear down memory associated with terminated connections or transactions
- Current approach (e.g., Apache): regions

Slide 33: Drawbacks of Regions

- Can't reclaim memory within regions
  - A problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs
  - Leads to unbounded memory consumption
- Current situation for Apache:
  - Vulnerable to denial of service
  - Limits the runtime of connections
  - Limits module programming

Slide 34: Reap Hybrid Allocator

- Reap = region + heap: adds individual object deletion and a heap
- API: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
- Can reduce memory consumption
- Fast; adapts to use (region or heap style); cheap deletion
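A semantics-only usage sketch of the interface named on the slide (the signatures and the tracking implementation below are assumptions for illustration; the real Reap adapts between pointer-bumping and heap-style reuse rather than tracking objects in a set): callers keep region-style bulk teardown while also being able to return individual objects.

    #include <cstddef>
    #include <cstdlib>
    #include <unordered_set>

    struct Reap {
      std::unordered_set<void*> live;   // every live object, so bulk delete works
    };

    void reapcreate(Reap& r) { r.live.clear(); }

    void* reapmalloc(Reap& r, std::size_t sz) {
      void* p = std::malloc(sz);
      if (p) r.live.insert(p);
      return p;
    }

    void reapfree(Reap& r, void* p) {   // individual deletion (heap style)
      if (r.live.erase(p)) std::free(p);
    }

    void reapdelete(Reap& r) {          // bulk deletion (region style)
      for (void* p : r.live) std::free(p);
      r.live.clear();
    }

    int main() {
      Reap r;
      reapcreate(r);
      void* a = reapmalloc(r, 128);
      void* b = reapmalloc(r, 256);
      reapfree(r, a);                   // release one object early
      (void)b;
      reapdelete(r);                    // everything else goes at once
      return 0;
    }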

Slide 35: Using Reap as Regions

- [Figure: normalized runtime on region-based benchmarks (lcc, mudlle) for Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap; one bar reaches 4.08]
- Reap performance nearly matches regions

Slide 36: Reap: Best of Both Worlds

- Combining new/delete with regions is usually impossible:
  - Incompatible APIs
  - Hard to rewrite code
- Using Reap: incorporate new/delete code into Apache
  - "mod_bc" (arbitrary-precision calculator): changed 20 lines (out of 8000)
  - Benchmark: compute the 1000th prime
  - With Reap: 240 KB; without Reap: 7.4 MB

Slide 37: Open Questions

- Grand unified memory manager?
  - Hoard + Reap
  - Integration with garbage collection
- Effective custom allocators?
  - Exploit sizes, lifetimes, locality, and sharing
- Challenges of newer architectures: NUMA, SMT/CMP, 64-bit, predication

Slide 38: Current Work: Robust Performance

- Currently: no VM-GC communication
  - Bad interactions under memory pressure
- Our approach (with Eliot Moss, Scott Kaplan): Cooperative Robust Automatic Memory Management
- [Diagram: garbage collector/allocator and virtual memory manager exchanging memory pressure, LRU queue information, and empty pages, for reduced impact]

Slide 39: Current Work: Predictable VMM

- Recent work on scheduling for QoS (e.g., proportional-share)
- Under memory pressure, the VMM acts as the scheduler:
  - Paged-out processes may never recover
  - Intermittent processes may wait a long time
- Scheduler-faithful virtual memory (with Scott Kaplan, Prashant Shenoy)
  - Based on page value rather than page order

Slide 40: Conclusion

- Memory management for high-performance applications:
  - Heap Layers framework [PLDI 2001]: reusable components, no runtime cost
  - Hoard scalable memory manager [ASPLOS-IX]: high performance, provably scalable & space-efficient
  - Reap hybrid memory manager [OOPSLA 2002]: speed & robustness for server applications
- Current work: robust memory management for multiprogramming

Slide 41: The Obligatory URL Slide

http://www.cs.umass.edu/~emery

Slide 42: If You Can Read This, I Went Too Far

Slide 43: Hoard: Under the Hood

- [Diagram of Hoard's heap layers: MallocOrFreeHeap, PerProcessorHeap, SelectSizeHeap, LockedHeap, HeapBlockManager, SuperblockHeap, SystemHeap, FreeToHeapBlock, empty heap blocks]
- Annotations: select heap based on size; large objects (> 4K) go to the system heap; malloc from the local heap, free to the owning heap block; get memory from or return it to the global heap

Slide 44: Custom Memory Allocation

- Very common practice: Apache, gcc, lcc, STL, database servers…
- Language-level support in C++
- Replace new/delete, bypassing the general-purpose allocator:
  - Reduce runtime – often
  - Expand functionality – sometimes
  - Reduce space – rarely
- "Use custom allocators"

Slide 45: Drawbacks of Custom Allocators

- Avoiding the memory manager means:
  - More code to maintain & debug
  - Can't use memory debuggers
  - Not modular or robust: mix memory from custom and general-purpose allocators → crash!
- Increased burden on programmers

Slide 46: Overview

- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps – a generalization of regions & heaps

Slide 47: (I) Per-Class Allocators

- Recycle freed objects from a per-class free list
- [Diagram: Class1 free list holding objects a, b, c across the sequence a = new Class1; b = new Class1; c = new Class1; delete a; delete b; delete c; a = new Class1; b = new Class1; c = new Class1;]
- + Fast: linked-list operations
- + Simple: identical semantics, C++ language support
- - Possibly space-inefficient
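A hedged sketch of this per-class idiom using C++'s class-level operator new/delete (the class name and payload are illustrative): freed objects go onto an intrusive free list and are recycled on the next allocation.

    #include <cstddef>
    #include <new>

    class Class1 {
    public:
      // Class-level allocation functions: the C++ language support noted above.
      static void* operator new(std::size_t sz) {
        if (freeList_ != nullptr) {              // recycle a previously freed object
          FreeNode* node = freeList_;
          freeList_ = node->next;
          return node;
        }
        return ::operator new(sz);               // otherwise fall back to the heap
      }
      static void operator delete(void* p) noexcept {
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = freeList_;                  // push onto the free list
        freeList_ = node;                        // (never returned to the system:
      }                                          //  the space inefficiency above)

    private:
      struct FreeNode { FreeNode* next; };
      static FreeNode* freeList_;
      double payload_[4];                        // illustrative data members
    };

    Class1::FreeNode* Class1::freeList_ = nullptr;

    int main() {
      Class1* a = new Class1;
      Class1* b = new Class1;
      delete a;                                  // a's storage goes on the free list
      Class1* c = new Class1;                    // reuses a's storage
      delete b;
      delete c;
      return 0;
    }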

Slide 48: (II) Custom Patterns

- Tailor-made to fit specific allocation patterns
- Example: 197.parser (natural language parser) allocates out of a fixed char[MEMORY_LIMIT] array
- [Diagram: a = xalloc(8); b = xalloc(16); c = xalloc(8); xfree(b); xfree(c); d = xalloc(8); objects a, b, c, d laid out sequentially; an end_of_array pointer is bumped forward and back]
- + Fast: pointer-bumping allocation
- - Brittle: fixed memory size, requires stack-like lifetimes
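A hedged reconstruction of this style of allocator (MEMORY_LIMIT, xalloc, and xfree follow the names on the slide, but the code is an illustrative sketch, not 197.parser's source): allocation bumps an end-of-array index, and frees only make sense in reverse (stack-like) order, which is exactly the brittleness noted above.

    #include <cstddef>

    const std::size_t MEMORY_LIMIT = 30 * 1000 * 1000;       // fixed-size arena
    static char memory[MEMORY_LIMIT];
    static std::size_t end_of_array = 0;                      // bump index

    // Allocate by bumping the end-of-array index forward.
    void* xalloc(std::size_t sz) {
      if (end_of_array + sz > MEMORY_LIMIT) return nullptr;   // brittle: fixed size
      void* p = &memory[end_of_array];
      end_of_array += sz;
      return p;
    }

    // Free only works for the most recent allocations: the index is pulled
    // back to the freed object, so lifetimes must be stack-like.
    void xfree(void* p) {
      end_of_array = static_cast<char*>(p) - memory;
    }

    int main() {
      void* a = xalloc(8);
      void* b = xalloc(16);
      void* c = xalloc(8);
      xfree(c);               // must free in reverse order of allocation
      xfree(b);
      void* d = xalloc(8);    // reuses b's old space
      (void)a; (void)d;
      return 0;
    }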

Slide 49: (III) Regions

- Separate areas of memory; deletion only en masse
- API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
- + Fast: pointer-bumping allocation, deletion of whole chunks
- + Convenient: one call frees all memory
- - Risky: accidental deletion, too much space

Slide 50: Overview

- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps – a generalization of regions & heaps

Slide 51: Custom Allocators Are Faster…

- [Figure: normalized runtime on custom-allocator benchmarks (197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, plus non-regions, regions, and overall averages); bars for the custom allocator and Win32]

Slide 52: Not So Fast…

- [Figure: the same benchmarks with DLmalloc added; bars for the custom allocator, Win32, and DLmalloc; non-regions, regions, and overall averages]

Slide 53: The Lea Allocator (DLmalloc 2.7.0)

- Optimized for common allocation patterns
  - Per-size quicklists ≈ per-class allocation
  - Deferred coalescing (combining adjacent free objects)
- Highly optimized fast path
- Space-efficient

Slide 54: Space Consumption Results

- [Figure: normalized space on the custom-allocator benchmarks; bars for the original custom allocator and DLmalloc; non-regions, regions, and overall averages]

Slide 55: Overview

- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps – a generalization of regions & heaps

Slide 56: Why Regions?

- Apparently faster and more space-efficient
- Servers need memory management support:
  - Avoid resource leaks
  - Tear down memory associated with terminated connections or transactions
- Current approach (e.g., Apache): regions

Slide 57: Drawbacks of Regions

- Can't reclaim memory within regions
  - A problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs
  - Leads to unbounded memory consumption
- Current situation for Apache:
  - Vulnerable to denial of service
  - Limits the runtime of connections
  - Limits module programming

Slide 58: Reap Hybrid Allocator

- Reap = region + heap: adds individual object deletion and a heap
- API: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
- Can reduce memory consumption
- + Fast
- + Adapts to use (region or heap style)
- + Cheap deletion

Slide 59: Using Reap as Regions

- [Figure: normalized runtime on region-based benchmarks (lcc, mudlle) for Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap; one bar reaches 4.08]
- Reap performance nearly matches regions

Slide 60: Reap: Best of Both Worlds

- Combining new/delete with regions is usually impossible:
  - Incompatible APIs
  - Hard to rewrite code
- Using Reap: incorporate new/delete code into Apache
  - "mod_bc" (arbitrary-precision calculator): changed 20 lines (out of 8000)
  - Benchmark: compute the 1000th prime
  - With Reap: 240 KB; without Reap: 7.4 MB

Slide 61: Conclusion

- Empirical study of custom allocators:
  - The Lea allocator is often as fast or faster
  - Custom allocation is ineffective, except for regions
- Reaps: nearly match region performance without the other drawbacks
- Take-home message: stop using custom memory allocators!

Slide 62: Software

http://www.cs.umass.edu/~emery
(part of the Heap Layers distribution)