Working with functional data structures

Post on 06-May-2015


Beyond the bread-and-butter singly linked list are dozens of practical Functional Data Structures available to mask complexity, enable composition, and open possibilities in pattern matching. This session focuses on data structures available today in F#, but applies to almost any functional language. The session covers what makes these structures functional, when and why to use them, how to choose a structure, time complexity awareness, garbage collection, and practical insights into creating and profiling your own Functional Data Structure. Bibliography at http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography

Transcript of Working with functional data structures

04/11/2023 1

Working with Functional Data Structures

Practical F# Application

acster.com jackfoxy.com @foxyjackfox

Jack Fox


tl;dr

Singly-linked list: the fundamental purely functional data structure

Time complexity overview

Garbage collection and real-world performance

Reasons to use Purely Functional Data Structures

When not to use Purely Functional Data Structures

Choices and shapes

Build your own Purely Functional Data Structure


What is purely functional?

Immutable

Persistent

Thread safe

Recursive

Incremental
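The first, second, and last of these properties can be seen with the built-in F# list. The following is a minimal sketch; the value names are illustrative only.

```fsharp
// A list value is never modified; "adding" an element builds a new list.
let original = [2; 3; 4]
let extended = 1 :: original          // new list, original is untouched

// Persistent: both versions remain usable after the "update".
printfn "%A" original
printfn "%A" extended

// Incremental: the new list reuses the old list as its tail instead of copying it.
let sharesTail = obj.ReferenceEquals(List.tail extended, original)
printfn "%b" sharesTail
```

Because nothing is ever mutated, both values can be read from several threads without locking.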


Theoretical Performance

O(1)

O(log* n) practically O(1)

O(log log n)

O(log n)

O(n) linear time

O(n^2) gets real bad from here on out


Theoretical Performance (most common)

O(1)

O(log* n) practically O(1)

O(log log n)

O(log n)

O(n) linear time

O(n^2) gets real bad from here on out

O(i) variables other than n require explanation


Actual Performance

Processor architecture (instruction look-ahead, cache, etc.)

.NET Garbage Collection

O(n) behavior starts for “large enough size”

Recursive Benchmarks over different Structure Sizes: 10^2, 10^3, 10^4, 10^5, 10^6

often looks like << O(n)

usually settles down to O(n), sometimes looks like > O(n)


List as a recursive structure

[Diagram: the list 1 :: 2 :: 3 :: 4 :: [ ], with the labels Empty List, Adding Element, Head, Tail]
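The recursive shape of the list (either empty, or a head consed onto a tail) is what pattern matching takes apart. A minimal sketch follows; the function name `sum` is illustrative.

```fsharp
// Sum a list by recursing on its structure: empty list, or head :: tail.
let rec sum list =
    match list with
    | [] -> 0                          // empty list: nothing to add
    | head :: tail -> head + sum tail  // head plus the sum of the tail

let numbers = 1 :: 2 :: 3 :: 4 :: []   // same value as [1; 2; 3; 4]

printfn "%d" (List.head numbers)
printfn "%A" (List.tail numbers)
printfn "%d" (sum numbers)
```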


So what the heck would you do with a list?

Demo 1


“Getting” the recursive thing

SICP

a.k.a

Abelson & Sussman

a.k.a

The Wizard Book


Why no update or remove in List?


Okasaki’s Pseudo-Canonical List Update

let rec loop i updateElem (l:list<'a>) =
    match (i, l) with
    | _, [] -> raise (System.Exception("subscript"))
    | 0, _::xs -> updateElem::xs
    | i', x::xs -> x::(loop (i' - 1) updateElem xs)

[Diagram: stepping through the list 1 :: 2 :: 3 :: 4 :: [ ] one :: at a time until the target index is reached: “found it!”]


Do you see a problem?
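One plausible answer, consistent with the stack overflow reported for 10^5 elements in the measurements later in this deck: the recursive call sits inside a cons, so it is not a tail call and every element before the index costs a stack frame. The following is a sketch of a tail-recursive variant. The name `updateAt` and the accumulator approach are assumptions for illustration, not code from the talk.

```fsharp
// Hypothetical tail-recursive variant of the pseudo-canonical update.
let updateAt i updateElem (l: list<'a>) =
    // Walk to index i, collecting the skipped elements in reverse order.
    let rec loop i' front back =
        match i', back with
        | _, [] -> raise (System.Exception("subscript"))
        | 0, _ :: xs ->
            // Put the skipped elements back in front of the updated cell.
            List.fold (fun acc x -> x :: acc) (updateElem :: xs) front
        | _, x :: xs -> loop (i' - 1) (x :: front) xs
    loop i [] l

printfn "%A" (updateAt 2 99 [1; 2; 3; 4])
```

Both the walk and the rebuild are loops in constant stack space, at the price of allocating the prefix twice.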


We could just punt

let punt i updateElem (l:list<'a>) =
    let a = List.toArray l
    a.[i] <- updateElem
    List.ofArray a


…or try a Hybrid approach

let hybrid i updateElem (l:list<'a>) =
    if (i = 0) then updateElem :: (List.tail l)
    else
        // Copy the elements before index i into an array, in reverse order.
        let rec loop i' (front:'a array) back =
            match i' with
            | x when x < 0 -> front, (List.tail back)
            | x ->
                Array.set front x (List.head back)
                loop (x - 1) front (List.tail back)
        let front, back = loop (i - 1) (Array.create i updateElem) l
        // Cons the saved elements back onto the updated remainder.
        let rec loop2 i' frontLen (front':'a array) back' =
            match i' with
            | x when x > frontLen -> back'
            | x -> loop2 (x + 1) frontLen front' (front'.[x]::back')
        loop2 0 ((Array.length front) - 1) front (updateElem::back)


Time complexity of update options

Pseudo-Canonical: O(i)

Punt: O(n)

Hybrid: O(i)

Place your bets!


Actual Performance

Size   10k Random Updates       One-time Worst Case
10^2   PC      -     2.9ms      Punt    -     0.2ms
       Hybrid  1.4X  4.0        PC      1.1X  0.2
       Punt    1.5X  4.5        Hybrid  4.1X  0.8

PC looks perfect!

Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/


Actual Performance

Size   10k Random Updates       One-time Worst Case
10^2   PC      -     2.9ms      Punt    -     0.2ms
       Hybrid  1.4X  4.0        PC      1.1X  0.2
       Punt    1.5X  4.5        Hybrid  4.1X  0.8
10^3   Hybrid  -     29.6       Punt    -     0.2
       Punt    1.6X  47.6       PC      1.1X  0.2
       PC      1.7X  50.3       Hybrid  4.1X  0.8
10^4   Hybrid  -     320.3      Punt    -     0.3
       Punt    1.7X  534.9      PC      1.3X  0.4
       PC      2.9X  920.2      Hybrid  3.2X  0.9
10^5   Hybrid  -     4.67sec    Punt    -     1.0
       Punt    2.0X  9.34       Hybrid  1.5X  1.5
       PC      stack overflow!


Benchmarking performance

Hard to reason about actual performance

DS_Benchmark

◦ Open source on GitHub
◦ Discards outliers
◦ Fully isolates code to benchmark
◦ Fully documented
◦ “How to extend” documented


Shapes: let your imagination run wild!


Graphics: Larry D. Moore Attribution-Share Alike 3.0 Unported license. http://commons.wikimedia.org/wiki/File:Playdoh.jpg


Binary Random Access List

Same Cons, Head, Tail signature

Optimized for Lookup and Update O(log n)

…but not for Remove

Why Not?

Does it with alternate internal structures


Queue (FIFO)

[Diagram: a queue holding the elements 1 to 5, with the symbols :: and ;; and the labels Adding Element, Head, Tail, Empty Queue]
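One well-known way to build a purely functional FIFO queue is the two-list ("batched") queue: a front list to take from and a rear list, kept in reverse, to add to. The sketch below is an illustration under that assumption, not the FSharpx implementation; all type and function names are hypothetical.

```fsharp
// Hypothetical two-list queue: Front is dequeued from, Rear holds newly
// added elements in reverse order.
type BatchedQueue<'a> = { Front: 'a list; Rear: 'a list }

let emptyQueue<'a> : BatchedQueue<'a> = { Front = []; Rear = [] }

// Invariant: Front is empty only when the whole queue is empty.
let private normalize (q: BatchedQueue<'a>) =
    match q.Front with
    | [] -> { Front = List.rev q.Rear; Rear = [] }
    | _ -> q

// Add an element at the end of the queue.
let conj x (q: BatchedQueue<'a>) = normalize { q with Rear = x :: q.Rear }

// Oldest element, or None for an empty queue.
let tryHead (q: BatchedQueue<'a>) = List.tryHead q.Front

// Queue without its oldest element, or None for an empty queue.
let tryTail (q: BatchedQueue<'a>) =
    match q.Front with
    | [] -> None
    | _ :: rest -> Some (normalize { q with Front = rest })

let q = emptyQueue |> conj 1 |> conj 2 |> conj 3
printfn "%A" (tryHead q)
printfn "%A" (tryTail q |> Option.bind tryHead)
```

Adding is a single cons. Removing is usually a single pattern match, with an occasional O(n) `List.rev` when the front runs out.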


Deque (double-ended queue)

[Diagram: a deque holding the elements 1 to 5, with the symbols :: and ;; and the labels Adding Element, Head, Tail, Init, Last, Empty Deque]


Deque and remove

Approximately O(i/2)

(where i is index to element)


Heap

* names in signature altered from Okasaki’s implementation

Graphics: http://www.turbosquid.com/3d-models/heap-gravel-max/668104

[Diagram: a heap with element 1 at the head, the symbol ::, and the labels Head, Tail, Insert Element, Merge Heaps, Empty Heap]


Heap and remove

O(1) (if implemented)

…but implementation raises issues

Deleting before inserting

Order of events could nullify deletion before insertion

Equal values?


Canonical Functional Linear Structures

Order: by construction, ascending, descending, random

Grow

Shrink

Peek


FSharpx.Collections

RandomAccessList = List + iLookup + iUpdate

DList = List + conj + append

Deque = List + conj + last + initial + rev = initial U tail

LazyList = List^Lazy

Heap = List + sorted + append

Queue = List - cons + conj

Vector = List - cons - head - tail + conj + last + initial + iLookup + iUpdate = RandomAccessList^-1


Summary of time complexity performance

Vector & Binary Random Access List

O(1) cons-conj / head-last / tail-init

O(log_32 n) lookup / update

DList

O(1) cons / conj / head / append

O(log n) tail

Deque

O(1) cons / head / tail / conj / last / init

O(1) reverse

O(i/2) lookup / update (generally)

Heap

O(1) insert / head

O(log n) merge / tail

Queue

O(1) conj / head / tail (generally)

O(log n) merge / tail


Measured performance (grow by one)

Structure            10^2   10^3   10^4    10^5       10^6
ms.f#.array          0.8    1.8    100.9   11,771.4   n/a
ms.f#.array - list   0.3    1      69.5    n/a        n/a
ms.f#.list           0.4    0.4    0.4     1.0        13.8
ms.f#.list - list    0.7    0.7    0.9     2.3        45.3
Deque - conj         0.3    0.3    0.5     4.7        *
Deque - cons         0.3    0.3    0.5     4.7        *
DList - conj         0.7    0.7    1.0     7.7        153.0
DList - cons         0.7    0.7    1.0     6.4        118.4
Heap                 3.2    3.3    5.0     22.5       254.7
LazyList             0.9    0.9    1.0     2.6        108.3
Queue                1.0    1.1    1.4     7.6        106.6
RandomAccessList     0.8    0.9    3.3     19.6       189.8
Vector               0.8    0.9    3.3     19.7       189.1


Trees

Wide variety of applications

Binary (balanced or unbalanced)

Multiway (a.k.a. RoseTree)


Red Black Tree Balancing

[Diagram: red-black tree balancing cases over the subtrees a, b, c, d]

Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf


type color = Red | Black

type 'a t = Node of color * 'a * 'a t * 'a t | Leaf

// A black node with a red child and a red grandchild becomes a red node with two black children.
let balance = function
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | x -> Node x

Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html


Talk about reducing complexity!
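To show where a balance function like this is called from, here is a sketch of insertion in the style usually attributed to Okasaki: insert a red node at the bottom, rebalance on the way back up, and repaint the root black. The type and function names (`Color`, `Tree`, `insert`, `toList`, `depth`) are assumptions for this example, and the block repeats the balance function so that it runs on its own.

```fsharp
type Color = Red | Black

type Tree<'a> =
    | Leaf
    | Node of Color * 'a * Tree<'a> * Tree<'a>

// Same four-cases-to-one rewrite as on the slide.
let balance = function
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | color, value, left, right -> Node (color, value, left, right)

let insert x tree =
    // Insert as a red node, rebalancing every node on the path back up.
    let rec ins node =
        match node with
        | Leaf -> Node (Red, x, Leaf, Leaf)
        | Node (color, y, left, right) ->
            if x < y then balance (color, y, ins left, right)
            elif x > y then balance (color, y, left, ins right)
            else node
    // The root is always repainted black.
    match ins tree with
    | Node (_, y, left, right) -> Node (Black, y, left, right)
    | Leaf -> Leaf

// In-order contents (not tail-recursive; fine for a small demonstration).
let rec toList tree =
    match tree with
    | Leaf -> []
    | Node (_, v, left, right) -> toList left @ [v] @ toList right

let rec depth tree =
    match tree with
    | Leaf -> 0
    | Node (_, _, left, right) -> 1 + max (depth left) (depth right)

// Ascending input would degenerate an unbalanced tree into a linked list.
let tree = List.fold (fun t x -> insert x t) Leaf [1 .. 1000]
printfn "size %d, depth %d" (List.length (toList tree)) (depth tree)
```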


Extra Credit

Write the Remove operation for a Red Black Tree

Here’s how: http://en.wikipedia.org/wiki/Red-black_tree#Removal


FSharpx.Collections.Experimental

IntMap (Map-like structure)

BKTree

RoseTree (lazy multiway)

EagerRoseTree

IndexedRoseTree

MS.F#.Collections

Map

Set


To Do:

Benchmark:

RoseTree (lazy)

EagerRoseTree (not yet implemented)

IndexedRoseTree

Multiway as unbalanced binary tree (polymorphic recursion)


Another To Do:

The (not-so-) Naïve Binary Tree:

As seen all over the internet…


…yet often missing: pre-order, post-order, and in-order fold traversals (better be tail-recursive). And maybe a zipper navigator while you are at it!
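As a sketch of what tail-recursive fold traversals could look like, the version below replaces the call stack with an explicit work list held on the heap. The type `BinaryTree` and the three function names are hypothetical, and this is only one of several possible techniques (continuation passing is another).

```fsharp
type BinaryTree<'a> =
    | Empty
    | Branch of BinaryTree<'a> * 'a * BinaryTree<'a>

// Pre-order: visit the value, then the left subtree, then the right subtree.
let foldPreOrder f state tree =
    let rec loop acc stack =
        match stack with
        | [] -> acc
        | Empty :: rest -> loop acc rest
        | Branch (l, v, r) :: rest -> loop (f acc v) (l :: r :: rest)
    loop state [tree]

// In-order: descend left while remembering (value, right subtree) pairs.
let foldInOrder f state tree =
    let rec loop acc stack node =
        match node, stack with
        | Branch (l, v, r), _ -> loop acc ((v, r) :: stack) l
        | Empty, (v, r) :: rest -> loop (f acc v) rest r
        | Empty, [] -> acc
    loop state [] tree

// Post-order: the work list holds subtrees to visit (Choice1Of2) and
// values to emit once both subtrees are done (Choice2Of2).
let foldPostOrder f state tree =
    let rec loop acc stack =
        match stack with
        | [] -> acc
        | Choice1Of2 Empty :: rest -> loop acc rest
        | Choice1Of2 (Branch (l, v, r)) :: rest ->
            loop acc (Choice1Of2 l :: Choice1Of2 r :: Choice2Of2 v :: rest)
        | Choice2Of2 v :: rest -> loop (f acc v) rest
    loop state [Choice1Of2 tree]

// Collect the visited values in visiting order.
let collect fold tree = fold (fun acc v -> v :: acc) [] tree |> List.rev

let sample = Branch (Branch (Empty, 1, Empty), 2, Branch (Empty, 3, Empty))
printfn "%A" (collect foldPreOrder sample)
printfn "%A" (collect foldInOrder sample)
printfn "%A" (collect foldPostOrder sample)
```

Each inner loop calls itself only in tail position, so the depth of the tree is limited by heap memory rather than by the call stack.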


Call for Action!

FSharpx.Collections.Experimental

GitHub fork FSharpx

Implement some interesting structure and tests

Sync back to your fork

Pull request

Out of ideas or just want to practice?

Unimplemented Okasaki structures:

http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures


When not to use purely functional

Consider Array if performance is critical

Functional dictionary-like structures (Map) may not perform well enough, especially beyond scale 10^4

Consider a .NET dictionary-like object
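For orientation, the sketch below puts the two options side by side: the immutable F# `Map` and the mutable `System.Collections.Generic.Dictionary`. It only illustrates the difference in semantics and makes no performance claim of its own; the value names are illustrative.

```fsharp
open System.Collections.Generic

// Immutable Map: Map.add returns a new map and leaves the old one unchanged.
let m1 = Map.ofList [ ("a", 1); ("b", 2) ]
let m2 = Map.add "c" 3 m1
printfn "%d %d" m1.Count m2.Count

// Mutable Dictionary: additions change the single instance in place.
let d = Dictionary<string, int>()
d.Add("a", 1)
d.Add("b", 2)
d.["c"] <- 3
printfn "%d" d.Count

// Lookups: an option from Map, a (found, value) pair from Dictionary.
let fromMap = Map.tryFind "c" m2
let found, fromDict = d.TryGetValue("c")
printfn "%A %b %d" fromMap found fromDict
```

The Dictionary gives up persistence and thread safety, which is the trade-off this slide is pointing at.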


Publishing your functional DS

FSharpx.Collections.readme.md

Include a Try value returning an option for each value that can throw an exception

Include other common values if < O(n)

Reason about edge cases (more unit tests are better than not enough)
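The Try convention can be sketched on a deliberately tiny structure. `SimpleStack` below is hypothetical and exists only to show a throwing value next to its option-returning counterpart.

```fsharp
// Hypothetical minimal stack, used only to illustrate the naming convention.
type SimpleStack<'a> =
    | SimpleStack of 'a list

    // Throws InvalidOperationException on an empty stack.
    member this.Head =
        match this with
        | SimpleStack (x :: _) -> x
        | SimpleStack [] -> invalidOp "stack is empty"

    // Safe counterpart: returns None instead of throwing.
    member this.TryHead =
        match this with
        | SimpleStack (x :: _) -> Some x
        | SimpleStack [] -> None

let stack = SimpleStack [1; 2]
let emptyStack : SimpleStack<int> = SimpleStack []

printfn "%d" stack.Head
printfn "%A" stack.TryHead
printfn "%A" emptyStack.TryHead
```

The empty structure is exactly the kind of edge case that deserves its own unit test.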


Build your own structure

Demo 3

Leverage Heap as internal structure to create RandomStack


Closing Thought

The functional data structures further from the “mainstream” (if such a measure were possible) tend to have less inherent value in their generic form.

Therefore the ultimate functional data structures collection would combine the characteristics of a library, a snippet collection, a benchmarking tool, superb documentation, test cases, and EXAMPLES!


Resources

FSPowerPack.Core.Community (NuGet)

FSharpx.Core (GitHub & NuGet)

FSharpx.Collections.Experimental (GitHub & NuGet)

DS_Benchmark (GitHub) raw code for structures not yet merged to FSharpx