Apr 17, 2013

Persistent Data Structures

Definitions

An immutable data structure is one that, once created, cannot be modified

Immutable data structures can (usually) be copied, with modifications, to create a new version

The modified version takes up as much memory as the original version

A persistent data structure is one that, when modified, retains both the old and the new values

Persistent data structures are effectively immutable, in that prior references to them do not see any change

Modifying a persistent data structure may copy part of the original, but the new version shares memory with the original

This definition is unrelated to persistent storage, which means keeping a copy of data on disk between program executions

Why persistent data structures?

Functional programming is based on the idea of immutable data—or persistent data, which is effectively immutable

The use of immutable data structures greatly simplifies concurrent programming

Synchronization is expensive, and immutable data structures don’t need to be synchronized

Copying large data structures is expensive and wastes space, but persistent data structures can use sophisticated structure sharing to reduce the cost

Lists

Lists are the original persistent data structures, and are very heavily used in functional programming

[Figure: a list of nodes x, y, z with a new node w; labels: original, insert w, delete x]

As you can see, persistence is automatic with a list, and requires no additional effort
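The sharing in the list figure can be sketched in a few lines. This is only an illustration: the `Node` class and the `to_pylist` helper are hypothetical names, not part of the original slides. It assumes insertion and deletion both happen at the front of the list.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a singly linked list (hypothetical name)."""
    value: str
    rest: Optional["Node"]


def to_pylist(node):
    """Collect the values of a linked list into a Python list, for display."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.rest
    return out


# The original list: x -> y -> z
original = Node("x", Node("y", Node("z", None)))

# "insert w" at the front: one new cell whose tail IS the original list
with_w = Node("w", original)

# "delete x" from the front: just a reference to the existing tail
without_x = original.rest

print(to_pylist(original))   # ['x', 'y', 'z']
print(to_pylist(with_w))     # ['w', 'x', 'y', 'z']
print(to_pylist(without_x))  # ['y', 'z']
```

All three versions coexist, and no cell of the original list is copied or changed.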

Trees and binary trees

Trees and binary trees can also be implemented in a persistent fashion, though it takes a bit more work

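[Figure: a binary tree with root A, second level B C, third level D E F G, and bottom level H I J K L M N; a modified version adds the new nodes A’, C’, and G’]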
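One common way to get the "bit more work" done is path copying: rebuild only the nodes on the path from the root to the change, and share everything else. The sketch below applies that idea to a binary search tree. The `Tree` class and the `insert` function are hypothetical names chosen for illustration, and the integer keys are made up.

```python
from typing import NamedTuple, Optional


class Tree(NamedTuple):
    """An immutable binary search tree node (hypothetical name)."""
    key: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def insert(tree, key):
    """Return a new tree containing key; the input tree is left untouched."""
    if tree is None:
        return Tree(key)
    if key < tree.key:
        # Copy this node, rebuild the left side, share the right side
        return tree._replace(left=insert(tree.left, key))
    if key > tree.key:
        # Copy this node, rebuild the right side, share the left side
        return tree._replace(right=insert(tree.right, key))
    return tree  # key already present: share the whole subtree


old = None
for k in (8, 4, 12, 2, 6, 10, 14):
    old = insert(old, k)

# Inserting 15 copies only the path 8 -> 12 -> 14
new = insert(old, 15)

print(new.left is old.left)              # True: left subtree is shared
print(new.right.left is old.right.left)  # True: subtree under 10 is shared
print(old.right.right.right)             # None: the old version is unchanged
```

Here the nodes 8, 12, and 14 play the roles of A, C, and G in the figure: they are the only ones that get new copies.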

Arrays and vectors

It’s more difficult to implement a persistent array

The programming language Clojure implements persistent vectors, which are like arrays but can be expanded

Any location in a vector can be accessed in (almost) O(1) time

Vectors are represented as “fat trees,” or more precisely, as 32-tries

Tries

A trie is like a binary search tree, only each node may have many children

Tries are most often used with strings (and have up to 26 children per node)

Each node of a 32-trie may have 32 children

Vector implementation I

A persistent vector in Clojure is implemented as an N-level trie (N <= 7), where the root and internal nodes are arrays of 32 references, and the leaves are arrays of 32 values

The depth of the trie (1 to 7) is also kept as an instance value

For example, consider accessing location 5000 in a vector

5000 decimal is 1001110001000 binary

Padded to 20 bits and split into 5-bit groups, numbered 4 to 1 from the left: 00000 00100 11100 01000

To access element 5000 in a trie of depth 4:

The binary number in group 4 (00000) says to take the 0th reference

The binary number in group 3 (00100) says to take the 4th reference

The binary number in group 2 (11100) says to take the 28th reference

The binary number in group 1 (01000) says to take the 8th value
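The group-by-group lookup is just shifting and masking. A minimal sketch, assuming 5 bits per level (since 32 = 2^5); the names `BITS`, `MASK`, and `child_indices` are made up for this example:

```python
BITS = 5                # log2(32): bits consumed per trie level
MASK = (1 << BITS) - 1  # 0b11111, selects one 5-bit group


def child_indices(index, depth):
    """Return the child index to follow at each level, root first."""
    return [(index >> (BITS * level)) & MASK
            for level in range(depth - 1, -1, -1)]


print(bin(5000))               # 0b1001110001000
print(child_indices(5000, 4))  # [0, 4, 28, 8]
```

Reading the result root first: 0 * 32768 + 4 * 1024 + 28 * 32 + 8 = 5000.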

Vector implementation II

The trie can be treated as a “fat tree,” with the structure sharing discussed earlier

Because the trie is fat (many children per node), there is a high proportion of actual data to structure

Access time is “almost” O(1), but as the size increases, the constant factor grows from 1 to 7 (depth of trie)

This design is especially good for appending vectors

For adding single elements to the end of the vector, there are additional special-case optimizations
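To make the structure sharing concrete, here is a simplified sketch of lookup and update on a 32-way trie, using Python tuples as nodes. It is not Clojure's actual implementation (it leaves out appending and the special-case optimizations). The function names `get` and `assoc` and the 1024-element example vector are assumptions made for illustration.

```python
BITS = 5
MASK = (1 << BITS) - 1


def get(root, depth, index):
    """Follow one 5-bit group per level, from the root down to the value."""
    node = root
    for level in range(depth - 1, -1, -1):
        node = node[(index >> (BITS * level)) & MASK]
    return node


def assoc(node, depth, index, value):
    """Return a new trie with index set to value, copying one path only."""
    i = (index >> (BITS * (depth - 1))) & MASK
    copy = list(node)  # copy this node's 32 slots
    if depth == 1:
        copy[i] = value                                   # leaf: store value
    else:
        copy[i] = assoc(node[i], depth - 1, index, value)  # rebuild one child
    return tuple(copy)


# A depth-2 trie holding the values 0..1023: a root of 32 leaves of 32 values
old = tuple(tuple(range(i * 32, (i + 1) * 32)) for i in range(32))
new = assoc(old, 2, 100, -1)

print(get(old, 2, 100))  # 100: the old version is unchanged
print(get(new, 2, 100))  # -1
print(sum(a is b for a, b in zip(old, new)))  # 31 of the 32 leaves are shared
```

An update copies one node per level, so at most 7 small arrays regardless of how large the vector is.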

Persistent Hash Map

Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could be implemented just like a vector

For a vector, the additional space required for the trie structure is a reasonable proportion of the total space

For a hash map, the additional space required is not reasonable

There will be a large number of 32-element arrays which contain mostly nulls

The hard part is to use only as much space as needed

Basic approach:

Use arrays of size N <= 32, where N is the number of non-null children

Use a 32-bit word to indicate which children are actually present

For example: 00010000000100010000000000101000 indicates 5 children

Find a fast function to map numbers in the range [0, 31] into the range [0, N)

Many processors have an instruction to count the number of 1 bits in a word

This would make a good assignment for the next time I teach this course
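One possible mapping function counts the 1 bits below the child's own bit. The sketch below is an assumption-laden illustration, not the author's solution: the name `slot` is made up, bits are numbered from the least significant end, and `bin(...).count("1")` stands in for the processor's bit-count instruction.

```python
def slot(bitmap, i):
    """Map logical child i (0..31) to its position in the compact array.

    Returns None when child i is absent.
    """
    bit = 1 << i
    if not bitmap & bit:
        return None
    # Count the children that are present below position i
    return bin(bitmap & (bit - 1)).count("1")


bitmap = 0b00010000000100010000000000101000

print(bin(bitmap).count("1"))  # 5 children present
print([i for i in range(32) if slot(bitmap, i) is not None])  # [3, 5, 16, 20, 28]
print(slot(bitmap, 16))        # 2: third entry of the 5-element array
print(slot(bitmap, 4))         # None: child 4 is absent
```

With this numbering, the five present children land in array positions 0 through 4 in increasing bit order.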

The End

Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.

--Sir Winston Churchill, Speech in November 1942