Memory unmanglement

Memory Unmanglement With Perl

How to do what you do without getting hit in the memory.

Steven Lembark
Workhorse Computing

In Our Last Episode...

● We saw our hero battling the forces of rambloat in long-running, heavily-forked, or large-scale processes.

● Learned the golden rule: Nothing Shrinks.
● Observed memory benchmarks using Devel::Peek, Devel::Size, and perl -d.
● Devel::Peek's Dump() shows the structure & hash efficiency.
● size() & total_size() show memory usage.

Time vs. Space

● The classic tradeoff is handled in favor of time in the perl implementations.

● More efficient data structures can help both sides.
● Avoiding wasted space can help avoid thrashing, heap management, and system call overhead.
● Faster access for arrays can make them more compact and faster than hashes in some situations.

● Benchmarks are not only for time: include checks of size(), total_size(), and Devel::Peek's Dump() to see what is really going on.

Nothing Ever Shrinks

● perl maintains strings and arrays as pointers to memory allocations.
● Adjusting the size of a scalar with substr or a regex changes its start and length.
● shift and pop adjust an array's initial offset and count.

● None of these will reduce the memory overhead of the 'scaffolding' perl uses to manage the data.
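
The effect can be observed without any benchmarking module. The sketch below uses the core B module to read the used and allocated sizes that perl records for a string buffer and an array; the helper names string_alloc and array_alloc are made up for this illustration, and the exact byte counts will vary between perl builds.

```perl
use strict;
use warnings;
use B ();

# Used (CUR) and allocated (LEN) bytes of a string scalar's buffer.
sub string_alloc
{
    my $sv = B::svref_2object( $_[0] );

    return ( $sv->CUR, $sv->LEN );
}

# Highest allocated index (MAX) of an array, whatever it currently holds.
sub array_alloc
{
    return B::svref_2object( $_[0] )->MAX;
}

my $n      = 1_000_000;
my $string = 'x' x $n;

substr( $string, 10, $n - 10, '' );     # truncate to 10 characters

my ( $cur_after, $len_after ) = string_alloc( \$string );

print "string: used $cur_after, allocated $len_after\n";

my @array      = ( 1 .. 1_000 );
my $max_before = array_alloc( \@array );

pop @array while @array > 10;           # discard all but 10 entries

my $max_after = array_alloc( \@array );

print "array: ", scalar @array, " entries, allocated through index $max_after\n";
```

The string reports 10 bytes in use while its allocation stays at roughly the original megabyte, and the array keeps its original allocation after the pops.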

Look Deep Into Your Memory

● Devel::Peek
  ● Dump() lets you peek at the structure.
  ● Shows efficiency of hashing.

● Devel::Size
  ● size() shows memory usage of “scaffolding”.
  ● total_size() includes contents along with skeleton.

● size() can be useful in loops for managing size of recycled buffers.

Size & Structure

● Scalars
  ● Reference allocations for strings with offset & length.
  ● size() of the scalar is small, total_size() can be large.

● Arrays
  ● Allocated list of Scalars, also with offset & length.
  ● size() reports space for list, total_size() includes contents.

● Hashes
  ● Hash chains are an array of arrays with min. 8 chains.
  ● size() reports space for hash chains.

Taming the Beast

● There are tools for managing the memory, most of which involve some sort of time/space tradeoff.
● undef can help – probably less than you think.
● You can manage the lifetime of variables with lexical or local values.
● Recycling buffers localizes the bloat to one structure.
● Adapting your code to use more effective data structures offers the best solution for large data.

● Here are some ideas.

undef() is somewhat helpful

● Marks the variable for reclamation.
● Space may not be immediately reclaimed – up to perl whether to add heap or recycle the undefed variables.

● Structures are discarded, not reduced.
● This can have a significant performance overhead on nested, reused data structures.

● Tradeoff: space for time for rebuilding the skeleton of discarded structures.

● Most useful for recycling single-level structures.

undefing an Array Doesn't Zero It

● For a large, nested structure this may not save the amount of space you expect.

    use Devel::Size qw( size );

    my @a = ();
    $#a = 999_999;
    print "Full \@a:\t", size( \@a ), "\n";

    undef @a;
    print "Post \@a:\t", size( \@a ), "\n";

● The contents are discarded & reallocated:

    Full @a: 4000200
    Post @a: 100

Recycling Buffers

● Use size() to discard and reallocate the buffer if it grows too large.

● Preallocate to avoid margin-of-error added by perl when the initial allocation grows.

● Decent tradeoff between reallocating a buffer frequently and having it grow without bounds.

● Avoids one record botching the entire processing cycle.

Scalar Buffer

● Recycle buffer, clean it up, then copy by value.
● Easiest with scalars since they don't have any nested structure.

    while( $buffer = get_data )
    {
        $buffer =~ s/^\s+//;
        ...
        push @data, $buffer;

        if( size( $buffer ) > $max_buff )
        {
            undef $buffer;
            $buffer = ' ' x $max_buff;
        }
    }

Array Buffer

● This works well for single-level buffers; multi-level buffers often require too much work to rebuild.

    my @buff = ();
    $#buff = $buff_count;

    while( @buff = get_data )
    {
        ...                         # clean up buffer
        $data{ $key } = [ @buff ];  # store values

        if( size( \@buff ) > $buff_max )
        {
            undef @buff;
            $#buff = $buff_count;
        }
    }

Assign Arrays Single-Pass

● Say you have to store a large number of items:

    my ( @a, @b );

    push @a, "" for ( 1 .. 1_000_000 );
    @b = map { "" } ( 1 .. 1_000_000 );

    print 'Size of @a: ', size( \@a ), "\n";
    print 'Size of @b: ', size( \@b ), "\n";

● Push ends up with a larger structure:

    Size of @a: 4194388
    Size of @b: 4000100

Hashes are Huge

● Incremental assignment doesn't make hashes larger: they are 8x larger than arrays in both cases.

    my %a = ();
    my %b = ();

    $a{ $_ } = "" for ( 1 .. 1_000_000 );
    %b = map { $_ => "" } ( 1 .. 1_000_000 );

    print 'Size of %a: ', size( \%a ), "\n";
    print 'Size of %b: ', size( \%b ), "\n";

    Size of %a: 32083244    # vs. 4000100
    Size of %b: 32083244    # in an array!

Two Ways of Storing Nothing

● There are two common ways of storing nothing in the values of a hash:
  ● Assign an empty list: $hash{ $key } = ();
  ● Assign an empty string: $hash{ $key } = "";

● Question:

Which would take less space: empty list or empty string?

TMTOWTDN

    my %a = ();
    my %b = ();

    $a{ $_ } = () for( 'aaa' .. 'zzz' );
    $b{ $_ } = '' for( 'aaa' .. 'zzz' );

    print "Size of %a:\t", size( \%a ), "\n";
    print "Size of %b:\t", size( \%b ), "\n";

    Size of %a: 570516    # same size for "" & ()?
    Size of %b: 570516

● size() gives the same result for both values. Why?

TMTOWTDN

    my %a = ();
    my %b = ();

    $a{ $_ } = () for( 'aaa' .. 'zzz' );
    $b{ $_ } = '' for( 'aaa' .. 'zzz' );

    print "Size of %a:\t", size( \%a ), "\n";
    print "Size of %b:\t", size( \%b ), "\n";

    print "Total in %a:\t", total_size( \%a ), "\n";
    print "Total in %b:\t", total_size( \%b ), "\n";

    Size of %a: 570516    # size() doesn't always
    Size of %b: 570516    # matter!

● total_size() benchmarks the values:

    Total in %a: 851732
    Total in %b: 1203252
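
One way to see why the totals differ: assigning an empty list to a hash value stores undef, while assigning '' stores a defined, zero-length string. The short sketch below checks only that difference, using core Perl; that the empty string's extra bookkeeping accounts for the larger total_size() is an inference from the numbers above, not something this code measures.

```perl
use strict;
use warnings;

my %a;
my %b;

$a{ key } = ();     # empty list in scalar context: the value is undef
$b{ key } = '';     # empty string: a defined, zero-length value

for my $hash ( \%a, \%b )
{
    my $value = $hash->{ key };

    printf "exists: %d, defined: %d\n",
        exists $hash->{ key } ? 1 : 0,
        defined $value        ? 1 : 0;
}
```

Both hashes hold the same key, so the hash structure that size() reports is identical; only the stored values differ.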

Replace Hashes With Arrays

● The smartmatch operator (“~~”) is fast.
● Pushing onto an array:

    $a ~~ @uniq or push @uniq, $a

  uses about 1/8 the space of assigning hash keys:

    $uniq{ $a } = ();
    keys %uniq

● The extra space used by array growth in push is dwarfed by the savings of an array over a hash.

● sort @uniq is much faster than sort keys %uniq.
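
Smartmatch has been flagged experimental since perl 5.18 and deprecated in 5.38, so the same array-based test may need another spelling on current perls. A minimal sketch, assuming string comparison is what is wanted, uses any() from the core List::Util module (available from List::Util 1.33); push_uniq is a hypothetical helper name.

```perl
use strict;
use warnings;
use List::Util qw( any );

# Append $value to @$uniq unless an equal string is already present.
sub push_uniq
{
    my ( $uniq, $value ) = @_;

    any { $_ eq $value } @$uniq
        or push @$uniq, $value;

    return $uniq;
}

my @uniq;

push_uniq( \@uniq, $_ ) for qw( cat dog cat bird dog );

print join( ' ', sort @uniq ), "\n";
```

Each check is a linear scan of the array, so this is the same space-for-time trade as the smartmatch version: less memory than a hash, more work per lookup as the list grows.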

Example: Taxonomy Trees

● The NCBI Taxonomy is delivered with each entry having a full tree.

● These must be reduced to a single tree for data entry and validation.

● There are several ways to do this...

Worst Solution: Parent tree.

● Since the tree is often used from the bottom up, some people store it as a child:parent relationship:

$parentz{ $child_id } = $parent_id;

● Unfortunately, this allocates a full hash table for each 1:1 relationship between a child and parent.

Another Bad Solution: Child Tree

● Another alternative is storing the children in a hash for each parent:

$childz{ $parent_id }{ $child_id } = ();

$childz{ '' } = [ $root_id ];

● This works via depth-first search to generate the trees and has space to store the tree depth.

● Hashes are bulky and slow for storing a single-level structure like this.

Another Solution: Single-Level Hash

● One oft-forgotten bit of Perly lore in the age of references: multi-part hash keys.

$childz{ $parent_id, $child_id } = $depth;

$childz{ "" } = [ $root_id ];

● Trades the wasted space of thousands of anon hashes for the cost of split /$;/o, $key and greps over the keys.

● Usable for moderate trees.
● Obviously painful for really large trees.
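
A rough sketch of what the split-and-grep lookup can look like. The ids are invented, children_of is a hypothetical helper, and the root entry under the "" key is left out so that every key splits into exactly two parts. $SUBSEP is the English.pm name for $;.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # provides $SUBSEP as a name for $;

my %childz;

# Invented ids: parent 1 has children 10 and 11, parent 10 has child 100.
# The value is the depth of the child.
$childz{ 1,  10  } = 1;
$childz{ 1,  11  } = 1;
$childz{ 10, 100 } = 2;

# Find the children of one parent by splitting every composite key.
sub children_of
{
    my ( $tree, $parent_id ) = @_;

    return sort { $a <=> $b }
           map  { $_->[1] }
           grep { $_->[0] == $parent_id }
           map  { [ split /$SUBSEP/, $_ ] }
           keys %$tree;
}

print join( ' ', children_of( \%childz, 1 ) ), "\n";
```

Every lookup walks all of the keys, which is where the pain for really large trees comes from.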

Q: Why Nest Hashes?

● Hashes are nice for the top-level lookup, but why nest them?

● Arrays save about 85% of the overhead below the top level.

● Any wasted space from the arrays growing via push is more than saved by avoiding hashes.

● The arrays only need to be sorted once if the tree is used multiple times.

my $c = $childz{ $parent_id } ||= [];

$new_id ~~ $c or push @$c, $new_id;

Nested Lists

● List::Util has first() which saves grepping entire lists.
● A key and payload on an array can be handled quickly:

    first { $_->[0] eq $key } @data;

● For shorter lists this saves space and can be faster than a hash.

● This is best for numerics, which don't have to be converted to text in order to be hashed: $_->[0] == $value is the least amount of work to compare integers.
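
A small sketch of the key-plus-payload layout with a numeric key. The three entries are illustrative and lookup is a hypothetical helper name.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Each entry is [ key, payload ]; a short illustrative list.
my @data =
(
    [ 9606,  'Homo sapiens'            ],
    [ 10090, 'Mus musculus'            ],
    [ 7227,  'Drosophila melanogaster' ],
);

# Return the payload for a numeric key, or undef if it is absent.
sub lookup
{
    my ( $list, $value ) = @_;

    # first() stops at the first match instead of scanning the whole list.
    my $found = first { $_->[0] == $value } @$list;

    return $found ? $found->[1] : undef;
}

print lookup( \@data, 10090 ), "\n";
```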

Manage Lifespans

● Lexical variables are an obvious place.
● Local values are another.

● Saves reallocating a set of values within tight loops in the called code.

● Local hash keys are a good way to manage storage in reused hashes handled with recursion.

● Use delete to remove hash keys in multi-level structures instead of assigning an empty list or "".
  ● This preserves the skeleton for recycling.
  ● Saves storing the keys.
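
A minimal sketch of a local hash key in a recursive walk. The tree and the names %seen and max_depth are invented for the example. Each call adds its own key to the shared hash, and perl removes it again when the call returns, so the hash never holds more than the current path.

```perl
use strict;
use warnings;

my %seen;       # one hash reused by every level of the recursion

# Invented tree: each node id maps to a list of child ids.
my %tree =
(
    root => [ qw( a b ) ],
    a    => [ qw( c )   ],
    b    => [],
    c    => [],
);

# Depth-first walk returning the deepest level reached.
sub max_depth
{
    my ( $node, $depth ) = @_;

    return $depth if exists $seen{ $node };     # guard against cycles

    local $seen{ $node } = $depth;  # key goes away when this call returns

    my $max = $depth;

    for my $child ( @{ $tree{ $node } } )
    {
        my $d = max_depth( $child, $depth + 1 );

        $max = $d if $d > $max;
    }

    return $max;
}

my $deepest = max_depth( root => 0 );

print "deepest level: $deepest\n";
print "keys left in %seen: ", scalar( keys %seen ), "\n";
```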

Use Simpler Objects

● If you're using inside-out objects, why bless a hash?
● Users aren't supposed to diddle around inside your objects anyway.

● The only thing you care about is the address.
● Bless something smaller:

my $obj = bless \(my $a), $package;
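
As a sketch of how the blessed scalar fits into an inside-out class: the object is only an address, and the data lives in a lexical hash keyed by that address. Counter is a hypothetical class; refaddr comes from the core Scalar::Util module.

```perl
package Counter;    # hypothetical class name

use strict;
use warnings;
use Scalar::Util qw( refaddr );

my %count;          # inside-out storage, keyed by object address

sub new
{
    my $class = shift;
    my $obj   = bless \( my $anon ), $class;    # bless an empty scalar

    $count{ refaddr $obj } = 0;

    return $obj;
}

sub increment { ++$count{ refaddr $_[0] } }
sub value     {   $count{ refaddr $_[0] } }

# Remove the data when the object goes away, or %count keeps growing.
sub DESTROY   { delete $count{ refaddr $_[0] } }

package main;

my $c = Counter->new;

$c->increment for 1 .. 3;

print $c->value, "\n";
```

The DESTROY method is the price of the approach: with the data stored outside the object, the class has to clean up after each object itself.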

Use Linked Lists for Queues

● Automatically frees discarded nodes without having to modify the entire list.

● Based on an array they don't use much extra data:

    $node = [ $ref_to_next, @node_data ];

● Walking the list is simple enough:

    ( $node, my @data ) = @$node;

● So is removing a node:

    $node->[0] = $node->[0][0];

● These are quite convenient for threading.
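
Put together as a queue, the node layout above might look like the following sketch. The placeholder head node and the enqueue and dequeue names are assumptions made for the example, not part of the original slides.

```perl
use strict;
use warnings;

# Each node is [ $ref_to_next, @node_data ]; the last node's next is undef.
my $head = [ undef ];       # placeholder head node with no data
my $tail = $head;

# Append a node after the current tail.
sub enqueue
{
    my $node = [ undef, @_ ];

    $tail->[0] = $node;
    $tail      = $node;

    return;
}

# Unlink the first real node and return its data.
sub dequeue
{
    my $node = $head->[0] or return;

    $head->[0] = $node->[0];            # the removal step, with $head as $node
    $tail      = $head if ! $head->[0]; # queue is empty again

    my ( undef, @data ) = @$node;

    return @data;
}

enqueue( $_ ) for qw( first second third );

my ( $item ) = dequeue();

print "$item\n";
```

Once a node is unlinked nothing refers to it any more, so perl frees it without the rest of the list being touched.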

Use Hashes for Sparse Arrays

● OK, time to stop beating up on hashes.
● They beat out arrays for sparse lists.
● Even lists of integers.

● Say a collection of DNA runs from 15 to 10_000 bases, filling about 10% of the actual values.

● You could store it as:

    $dnaz[ $length ] = [ qw( dna dna dna ) ];

● But this is probably better stored in a hash:

    $dnaz{ $length } = [ qw( dna dna dna ) ];

Accessing Hash Keys: Integer Slices

● Numeric sequences work fine as hash keys.
● Say you want to find all of the sequences within +/- 10% of the current length:

    my $min = 0.9 * $length;
    my $max = 1.1 * $length;
    my @found = grep { $_ } @dnaz{ ( $min .. $max ) };

● For non-trivial, sparse lists this saves scaffolding by only storing the structure necessary.

● This doesn't change the data storage, just the overhead for accessing it by length.
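
A self-contained variation on the lookup, as a sketch with invented data. It tests each candidate length with exists rather than slicing, which is one way to collect the matches while being sure the hash stays sparse; near_length is a hypothetical helper name.

```perl
use strict;
use warnings;

# Invented data: sequences stored by length, most lengths unused.
my %dnaz =
(
    95  => [ qw( seq_a seq_b ) ],
    100 => [ qw( seq_c )       ],
    108 => [ qw( seq_d )       ],
    250 => [ qw( seq_e )       ],
);

# Return all sequences whose length is within 10% of $length.
sub near_length
{
    my ( $by_length, $length ) = @_;

    my $min = int( 0.9 * $length );
    my $max = int( 1.1 * $length );

    # exists skips the unused lengths without adding them to the hash.
    return map  { @{ $by_length->{ $_ } } }
           grep { exists $by_length->{ $_ } }
           ( $min .. $max );
}

print join( ' ', near_length( \%dnaz, 100 ) ), "\n";
```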

Store Upper-triangular Comparisons

● Saves more than half the space.
● Accessor can look for $i > $j ? [$i][$j] : [$j][$i] and get the same results.
● Requires designing symmetric comparison algorithms (values can be returned as-is or just negated).

● Also saves about half the processing time to only generate a single comparison for each pair.

● Requires access to the algorithm.
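
A sketch of the storage and its accessor. The comparison here is a stand-in (absolute difference) chosen only because it is symmetric; the names compare, fill_scores, and score are hypothetical.

```perl
use strict;
use warnings;

my @score;      # only entries with $i >= $j are ever stored

# Stand-in symmetric comparison: absolute difference of two values.
sub compare { abs( $_[0] - $_[1] ) }

# Fill one triangle: a single comparison for each pair.
sub fill_scores
{
    my @values = @_;

    for my $i ( 0 .. $#values )
    {
        $score[ $i ][ $_ ] = compare( @values[ $i, $_ ] ) for 0 .. $i;
    }

    return;
}

# Accessor swaps the indexes so callers can ask in either order.
sub score
{
    my ( $i, $j ) = @_;

    return $i > $j ? $score[ $i ][ $j ] : $score[ $j ][ $i ];
}

fill_scores( 3, 8, 20 );

print score( 0, 2 ), ' ', score( 2, 0 ), "\n";
```

For three values this stores six entries, diagonal included, instead of nine; the saving approaches half as the number of values grows.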

Example: DNA Analysis

● Our W-curve analysis is used to compare large groups of DNA to one another.

● The original algorithm compared the curves until the first one was exhausted.

● Changing that to use the longer sequence in all cases saved us over half the comparison time.

Summary

● Devel::Size can be useful in your code.
● Managing the lifespan of values helps.
● Using efficient structures helps even more.

● Use arrays instead of hash structures where they make sense.

● Bless smaller structures: scalars, regexen, globs make perfectly good objects and take less space than hashes.

● Use XS or Inline where necessary.
● And, yes, size() still matters.
