functional thinking

applying the philosophy of functional programming to system design and architecture

Jed Wesley-Smith @jedws

functional programming has many benefits: better program reasonability, composition, refactorability and performance

yet, the dominant models & paradigms for software architecture and building software systems today remain rooted in mutation and side-effects

many of the ideas and principles of functional programming have been applied to solve design problems including security, concurrency, auditing and robustness

it is possible and desirable to apply them to all of the systems we build, and gain practical advantage from doing so

is the universe mutable?

what is change?

what about the past?

what is now?

what is functional programming?

programming, with functions!

a function

f : A -> B

relates one value from its domain: A

to exactly one value from its range – or co-domain: B

always the same – or equivalent – value

and nothing else!

programming with values

values

immutable, values do not change

shareable, can be cached forever

referentially transparent expressions

the state of a thing in an instant in time

functions are values too
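a minimal sketch of the points above in Python, assuming a hypothetical `Point` value type: the function returns a new value instead of modifying its argument, and a function is passed around like any other value

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    """Hypothetical immutable value: fields cannot be reassigned."""
    x: int
    y: int


def translate(p: Point, dx: int, dy: int) -> Point:
    """Pure function: same inputs, same output, and nothing else."""
    return replace(p, x=p.x + dx, y=p.y + dy)


def twice(f):
    """Functions are values too: take one in, hand a new one back."""
    return lambda v: f(f(v))


origin = Point(0, 0)
moved = translate(origin, 1, 2)            # origin itself is untouched
shift_right_two = twice(lambda p: translate(p, 1, 0))
```

because `origin` can never change, it can be shared or cached freely; every later "change" is just another value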

what about identities?

identity

what we think of as the things around us; you, me, the plants and animals, rivers and mountains…

identities are things we name

we are used to thinking of the world in terms of identities, they are the objects in our world

since the time of Plato and Aristotle, philosophers have posited true reality as timeless, based on permanent substances, while processes are denied or subordinated to timeless substances

if Socrates changes, becoming sick, Socrates is still the same, and change (his sickness) only glides over his substance: change is accidental, whereas the substance is essential

http://en.wikipedia.org/wiki/Process_philosophy


“No man ever steps in the same river twice, for it's not the same river and he's not the same man.”

– Heraclitus

an identity is a series of values

over time

reifying time

f : A -> B

f : A -> T -> B

Version: 1, Time: A

Version: 2, Time: B

Version: 3, Time: C

a -> t1 -> X

a -> t2 -> X'
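one way to read `f : A -> T -> B` in code, as a rough sketch: an identity is a series of values indexed by time, and asking for the identity at a time is a pure lookup. the `History` type and the `record` and `at` helpers are hypothetical names chosen for illustration

```python
from bisect import bisect_right
from typing import NamedTuple, Tuple


class History(NamedTuple):
    """An identity: an immutable series of values over time."""
    times: Tuple[int, ...] = ()
    values: Tuple[str, ...] = ()


def record(h: History, t: int, value: str) -> History:
    """Return a new history with `value` taking effect at time `t`."""
    if h.times and t <= h.times[-1]:
        raise ValueError("time only moves forward")
    return History(h.times + (t,), h.values + (value,))


def at(h: History, t: int) -> str:
    """The value of the identity as of time `t`."""
    i = bisect_right(h.times, t)
    if i == 0:
        raise LookupError("the identity has no value yet at this time")
    return h.values[i - 1]


river = record(record(History(), 1, "X"), 2, "X'")
```

`at(river, 1)` keeps answering `"X"` no matter how many later versions are recorded; the past is a value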

change

X + Δ = X'

X' - X = Δ

X' - Δ = X

we can store entire versions, or we can store deltas

they are equivalent

being in possession of any two allows us to traverse time
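the three equations can be tried out on plain dictionaries. this is a sketch under assumptions: a state is a flat `dict`, and a delta is a hypothetical mapping from each changed key to its (before, after) pair

```python
_MISSING = object()  # marks "key not present" on one side of a change


def diff(old: dict, new: dict) -> dict:
    """X' - X = Δ: each changed key mapped to (before, after)."""
    delta = {}
    for key in old.keys() | new.keys():
        before = old.get(key, _MISSING)
        after = new.get(key, _MISSING)
        if before != after:
            delta[key] = (before, after)
    return delta


def patch(state: dict, delta: dict) -> dict:
    """X + Δ = X': returns a new state, the input is left alone."""
    out = dict(state)
    for key, (_, after) in delta.items():
        if after is _MISSING:
            out.pop(key, None)
        else:
            out[key] = after
    return out


def invert(delta: dict) -> dict:
    """Swap before and after, so that X' + invert(Δ) = X."""
    return {key: (after, before) for key, (before, after) in delta.items()}


x = {"a": 1, "b": 2}
x_prime = {"a": 1, "b": 3, "c": 4}
delta = diff(x, x_prime)
```

holding `x` and `delta` is enough to move forward in time, holding `x_prime` and `delta` is enough to move back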

architecture in the Real World™

problem: atomic updates

journaling file system

many writes in a single update

• describe writes in a log
• perform writes
• mark logged writes as complete

replay incomplete writes to recover from system failure

journal is an append-only immutable structure, contains an audit log of all changes (usually deltas)

can be used to revert a system to a previous state
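the log, write, mark-complete cycle can be sketched in memory. everything here is a stand-in: `JournaledStore`, its `crash_after` parameter (used only to simulate a failure part way through an update) and the entry layout are hypothetical, not any real file system's format

```python
class JournaledStore:
    """In-memory stand-in for a disk (`data`) plus its journal."""

    def __init__(self):
        self.journal = []  # append-only: entries are added, never edited
        self.data = {}

    def update(self, txid, writes, crash_after=None):
        # 1. describe the writes in the log
        self.journal.append(("begin", txid, dict(writes)))
        # 2. perform the writes (optionally "crashing" part way through)
        for n, (key, value) in enumerate(writes.items()):
            if crash_after is not None and n >= crash_after:
                return
            self.data[key] = value
        # 3. mark the logged writes as complete
        self.journal.append(("complete", txid))

    def recover(self):
        """Replay every logged update that was never marked complete."""
        done = {entry[1] for entry in self.journal if entry[0] == "complete"}
        for entry in list(self.journal):
            if entry[0] == "begin" and entry[1] not in done:
                self.data.update(entry[2])  # setting a key is safe to repeat
                self.journal.append(("complete", entry[1]))


store = JournaledStore()
store.update(1, {"a": 1, "b": 2})
store.update(2, {"a": 10, "b": 20}, crash_after=1)  # only "a" gets written
torn = dict(store.data)
store.recover()
```

after the simulated crash `torn` holds half of update 2; `recover` replays the logged writes so the update lands as a whole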

copy-on-write file system: zfs

constant time snapshot of file-system state

incremental changes create multiple versions that are persistent, revertible and replayable (i.e. copy-on-write)

high cache efficiency due to immutability of data

storage compaction via data de-duplication

continuous integrity checking and automatic data repair

content-addressable storage

files are stored at an address computed from their content: a content hash

names are associated with a hash

retrieval looks up the current hash for a name, then accesses the content stored at that address

update adds new content, then a new (name, hash) pair

caches only cache content at a hash, not at a name
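a small in-memory sketch of such a store, assuming SHA-256 as the content hash; the `ContentStore` class and its method names are hypothetical

```python
import hashlib


class ContentStore:
    def __init__(self):
        self.blobs = {}  # hash -> content; an entry never changes once written
        self.names = {}  # name -> current hash; the only mutable part

    def put(self, name: str, content: bytes) -> str:
        """Add new content, then a new (name, hash) pair."""
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # same content, same address
        self.names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Look up the current hash for a name, then fetch that content."""
        return self.blobs[self.names[name]]

    def get_by_hash(self, digest: str) -> bytes:
        """Safe to cache forever: a hash always maps to the same bytes."""
        return self.blobs[digest]


cas = ContentStore()
h1 = cas.put("readme", b"version one")
h2 = cas.put("readme", b"version two")
h3 = cas.put("copy-of-old-readme", b"version one")
```

the old content stays reachable through `h1` after the name moves on, and storing the same bytes under a second name adds no new blob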

git: version control system

non-linear development, branching/merging

distributed development, changes must be shareable between repositories that are not necessarily connected

cryptographic authentication of history, the ability to uniquely identify the complete development history of any change to the resources in a repository

git: design

content is stored as a directed acyclic graph (DAG) of content and content deltas plus meta-data

content blobs are stored using the hash of the content or delta

trees store lists of file names and links to content in the form of other tree hashes or blob hashes

commits are stored using a hash of the meta-data, including tree hash, author, date, parent commit/s

git: file format

updates add new objects; a pack file stores many objects together, each as a full version or as a delta against another object

all old versions are reconstructable

the same content produces the same hash, equivalent updates commute

data-structure is (mostly) immutable

mutable pointer to head of a branch

git: benefits

presents a mutable “view” of an immutable structure

commit hash includes parent commits, providing a cryptographically secure signature of content and history

commit and content data are shareable values, enabling distribution between multiple repositories
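to make the hashing concrete: `git_blob_hash` below follows git's blob object format (SHA-1 over a `blob <size>` header, a NUL byte, then the content). `commit_hash` is a simplified stand-in and not git's real commit encoding, and a blob hash is used where a tree hash would go; it only illustrates how including the parent hash ties each commit to its whole history

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash a blob the way git does: header, NUL byte, then content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def commit_hash(tree: str, parents: tuple, author: str, message: str) -> str:
    """Simplified, hypothetical commit encoding (not git's exact format)."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "", message]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


empty = git_blob_hash(b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

tree_v1 = git_blob_hash(b"content v1")  # stand-in for a real tree hash
tree_v2 = git_blob_hash(b"content v2")

first = commit_hash(tree_v1, (), "alice", "first")
second = commit_hash(tree_v2, (first,), "alice", "second")

# rewrite history: same tree, different message on the first commit
forged_first = commit_hash(tree_v1, (), "alice", "first (edited)")
forged_second = commit_hash(tree_v2, (forged_first,), "alice", "second")
```

`forged_second` differs from `second` even though its own tree, author and message are identical: changing any ancestor changes every descendant hash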

lucene

full-text indexing and search

needs to maintain a stable searchable “view” of an index in the face of concurrent updates

lucene: index

an index is a collection of Documents

a document is a collection of Fields and has an ID

an index is updated by deleting and re-adding documents

searching is done via a Searcher – for its lifetime, a searcher will see the state of the index as it was when it was opened

an index is made of Segment files

segments contain documents

deleting a document adds the document ID to a per-segment “.del” file – i.e. it doesn’t modify the segment file directly

when no searchers reference a segment with many deleted documents, it may be merged with others into a new segment containing the remaining documents – i.e. garbage collection
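a toy model of the behaviour described above; this is not Lucene's API. `Index`, `Searcher` and their methods are hypothetical, segments are dictionaries that are never modified after creation, and deletions are kept as separate (segment number, document ID) marks, standing in for the per-segment “.del” files

```python
class Searcher:
    """Sees the index exactly as it was when the searcher was opened."""

    def __init__(self, segments, deleted):
        self._segments = segments  # tuple of segments, fixed at open time
        self._deleted = deleted    # frozenset of (segment number, doc ID)

    def search(self, term):
        return sorted(
            doc_id
            for n, segment in enumerate(self._segments)
            for doc_id, text in segment.items()
            if (n, doc_id) not in self._deleted and term in text.split()
        )


class Index:
    def __init__(self):
        self.segments = ()
        self.deleted = frozenset()

    def add_segment(self, docs):
        self.segments = self.segments + (dict(docs),)  # old tuple is untouched

    def delete(self, doc_id):
        marks = {(n, doc_id) for n, s in enumerate(self.segments) if doc_id in s}
        self.deleted = self.deleted | marks            # a new frozenset

    def update(self, doc_id, text):
        """Update = delete the old document, re-add it in a new segment."""
        self.delete(doc_id)
        self.add_segment({doc_id: text})

    def open_searcher(self):
        return Searcher(self.segments, self.deleted)


index = Index()
index.add_segment({0: "functional thinking", 1: "mutable state"})
searcher1 = index.open_searcher()
index.update(1, "immutable values")
searcher2 = index.open_searcher()
```

`searcher1` still finds document 1 under "mutable", while `searcher2` finds it under "immutable"; neither ever sees a half-applied update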

lucene: file-format

[figure: a sequence of diagrams showing segment 1 (document 0 onward), segment 2 (document 10 onward), a later segment 3 (document 20 onward) together with document 11, and the searchers searcher1 and searcher2 opened against them]

netflix

scale: 30% of last-mile internet traffic, 10k+ AWS instances

immutable everything, including servers:

• servers are values, not modified
• new versions are printed and deployed
• old versions are replaced

idempotent updates
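a sketch of what idempotence buys, using a hypothetical `Fleet` value: an update that names the target state can be applied any number of times with the same result, while an update that names a relative change cannot

```python
from typing import NamedTuple


class Fleet(NamedTuple):
    """Hypothetical description of a deployment, as an immutable value."""
    image: str
    count: int


def roll_out(fleet: Fleet, image: str) -> Fleet:
    """Idempotent: states the desired result, so a retry changes nothing."""
    return fleet._replace(image=image)


def scale_by(fleet: Fleet, extra: int) -> Fleet:
    """Not idempotent: a retried request adds the servers twice."""
    return fleet._replace(count=fleet.count + extra)


current = Fleet(image="v1", count=3)
once = roll_out(current, "v2")
retried = roll_out(once, "v2")
scaled_once = scale_by(current, 2)
scaled_retried = scale_by(scaled_once, 2)
```

`retried` equals `once`, so a duplicated or replayed roll-out is harmless; `scaled_retried` does not equal `scaled_once`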

RxJava / RxJS (JavaScript) programming model

conclusions

avoid mutation at all costs

values replace – or occlude – values

store change

apply changes to construct a temporal view

apply these ideas to your entire system architecture

profit!

thanks
