functional thinking
applying the philosophy of functional programming to system design and architecture
Jed Wesley-Smith @jedws
functional programming has many benefits: better program reasonability, composition, refactorability and performance
yet, the dominant models & paradigms for software architecture and building software systems today remain rooted in mutation and side-effects
many of the ideas and principles of functional programming have been applied to solve design problems including security, concurrency, auditing and robustness
it is possible and desirable to apply them to all of the systems we build, and gain practical advantage from doing so
is the universe mutable?
what is change?
what about the past?
what is now?
what is functional programming?
programming, with functions!
a function
f : A -> B
relates one value from its domain: A
to exactly one value from its range – or co-domain: B
always the same – or equivalent – value
and nothing else!
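a minimal sketch of the distinction, in Python. the names `area`, `area_logged` and `calls` are made up for illustration: the first relates each input to exactly one output and does nothing else, the second returns the same result but also changes something outside itself.

```python
import math

def area(radius: float) -> float:
    """A function in the sense above: the result depends only on the argument."""
    return math.pi * radius * radius

calls = []  # state that lives outside the function

def area_logged(radius: float) -> float:
    """Same result, but 'something else' happens too: `calls` is mutated."""
    calls.append(radius)
    return math.pi * radius * radius
```

calling `area(2.0)` can always be replaced by its result; calling `area_logged(2.0)` cannot, because each call leaves a trace in `calls`.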
programming with values
values
immutable, values do not change
shareable, can be cached forever
referentially transparent expressions
the state of a thing in an instant in time
functions are values too
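as a rough illustration, assuming Python's standard `dataclasses` and `functools` modules: a frozen dataclass behaves as a value (it cannot be changed, only replaced), which makes it safe to share and to cache. `Point`, `norm_squared`, `shift_right` and `twice` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, replace
from functools import lru_cache

@dataclass(frozen=True)
class Point:
    """An immutable value: assigning to a field raises an error."""
    x: int
    y: int

@lru_cache(maxsize=None)
def norm_squared(p: Point) -> int:
    """Safe to cache forever: a Point can never change after construction."""
    return p.x * p.x + p.y * p.y

def shift_right(p: Point) -> Point:
    """'Changing' a value means building a new one; the old one is untouched."""
    return replace(p, x=p.x + 1)

def twice(f):
    """Functions are values too: they can be passed in and returned."""
    return lambda v: f(f(v))

p1 = Point(3, 4)
p2 = twice(shift_right)(p1)  # a new Point; p1 still equals Point(3, 4)
```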
what about identities?
identity
what we think of as the things around us; you, me, the plants and animals, rivers and mountains…
identities are things we name
we are used to thinking of the world in terms of identities, they are the objects in our world
since the time of Plato and Aristotle, philosophers have posited true reality as timeless, based on permanent substances, while processes are denied or subordinated to timeless substances
if Socrates changes, becoming sick, Socrates is still the same, and change (his sickness) only glides over his substance: change is accidental, whereas the substance is essential
http://en.wikipedia.org/wiki/Process_philosophy
“No man ever steps in the same river twice, for it's not the same river and he's not the same man.”
– Heraclitus
an identity is a series of values
over time
reifying time
f : A -> B
f : A -> T -> B
Version: 1, Time: A
Version: 2, Time: B
Version: 3, Time: C
a -> t1 -> X
a -> t2 -> X'
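one way to picture this, as a sketch only: an identity is an append-only series of (time, value) pairs, and looking it up takes a time as an extra argument. `Identity`, `record` and `at` are hypothetical names, and times are plain integers for simplicity.

```python
from bisect import bisect_right
from typing import NamedTuple, Tuple

class Identity(NamedTuple):
    """A series of values over time; both tuples are immutable."""
    times: Tuple[int, ...]
    values: Tuple[str, ...]

def record(identity: Identity, t: int, value: str) -> Identity:
    """Return a new series with `value` recorded at time `t`."""
    if identity.times and t <= identity.times[-1]:
        raise ValueError("time must move forward")
    return Identity(identity.times + (t,), identity.values + (value,))

def at(identity: Identity, t: int) -> str:
    """f : A -> T -> B, the value the identity had at time t."""
    i = bisect_right(identity.times, t)
    if i == 0:
        raise LookupError("nothing recorded at or before that time")
    return identity.values[i - 1]

a = record(record(Identity((), ()), 1, "X"), 2, "X'")
```

here `at(a, 1)` is `"X"` and `at(a, 2)` is `"X'"`: asking about the past gives the same answer no matter how many later values have been recorded.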
change
X + Δ = X'
X' - X = Δ
X' - Δ = X
we can store entire versions, or we can store deltas
they are equivalent
being in possession of any two of X, X' and Δ allows us to recover the third, and so traverse time
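the three equations can be sketched for dictionary-shaped state. this is an illustration under assumptions, not a prescribed encoding: a delta here maps each changed key to a (before, after) pair, `MISSING` marks an absent key, and `diff`, `apply`, `invert` and `unapply` are hypothetical names.

```python
MISSING = object()  # marks "key not present" on one side of a change

def diff(old: dict, new: dict) -> dict:
    """X' - X = Δ: for every key that differs, record (before, after)."""
    delta = {}
    for key in old.keys() | new.keys():
        before = old.get(key, MISSING)
        after = new.get(key, MISSING)
        if before != after:
            delta[key] = (before, after)
    return delta

def apply(state: dict, delta: dict) -> dict:
    """X + Δ = X': build a new state, leaving `state` untouched."""
    result = dict(state)
    for key, (_, after) in delta.items():
        if after is MISSING:
            result.pop(key, None)
        else:
            result[key] = after
    return result

def invert(delta: dict) -> dict:
    """Swap before and after so the delta runs backwards in time."""
    return {key: (after, before) for key, (before, after) in delta.items()}

def unapply(state: dict, delta: dict) -> dict:
    """X' - Δ = X."""
    return apply(state, invert(delta))
```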
architecture in the Real World™
problem: atomic updates
journaling file system
many writes in a single update
• describe writes in a log
• perform writes
• mark logged writes as complete
replay incomplete writes to recover from system failure
journal is an append-only immutable structure, contains an audit log of all changes (usually deltas)
can be used to revert a system to a previous state
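the three steps and the recovery pass can be sketched as follows. this is a toy model, not how any particular file system lays out its journal: the log is a plain list that is only ever appended to, the "disk" is a dict, and `update`, `recover` and the `crash_after` parameter (which simulates a failure part-way through) are made up for the example.

```python
def update(log: list, disk: dict, txid: int, writes: dict, crash_after=None) -> None:
    # 1. describe the writes in the log before touching the disk
    log.append(("begin", txid, dict(writes)))
    # 2. perform the writes (optionally "crashing" part-way through)
    for n, (path, data) in enumerate(writes.items()):
        if crash_after is not None and n >= crash_after:
            return  # simulated system failure: no "done" record is written
        disk[path] = data
    # 3. mark the logged writes as complete
    log.append(("done", txid, None))

def recover(log: list, disk: dict) -> None:
    """Replay every logged update that was never marked complete."""
    done = {txid for kind, txid, _ in log if kind == "done"}
    for kind, txid, writes in list(log):
        if kind == "begin" and txid not in done:
            disk.update(writes)  # replaying whole values is safe to repeat
            log.append(("done", txid, None))
            done.add(txid)
```

because each log entry records the full intended writes, replaying one that was half-applied gives the same disk state as if it had never been interrupted.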
copy-on-write file system: zfs
constant time snapshot of file-system state
incremental changes create multiple versions that are persistent, revertible and replayable (i.e. copy-on-write)
high cache efficiency due to immutability of data
storage compaction via data de-duplication
continuous integrity checking and automatic data repair
content-addressable storage
files are stored at an address computed from their content: a content hash
names are associated with a hash
retrieval looks up the current hash for a name, then accesses the content stored at that address
update adds new content, then a new (name, hash) pair
caches only cache content at a hash, not at a name
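a small sketch of the scheme described above, assuming SHA-256 from Python's `hashlib` as the content hash. `put`, `current_hash` and `get` are hypothetical names; the store is a dict from hash to content, and names are an append-only list of (name, hash) pairs.

```python
import hashlib

def put(store: dict, names: list, name: str, content: bytes) -> str:
    """Add the content at its hash, then add a new (name, hash) pair."""
    address = hashlib.sha256(content).hexdigest()
    store.setdefault(address, content)  # identical content is stored once
    names.append((name, address))
    return address

def current_hash(names: list, name: str) -> str:
    """The most recently added hash for a name."""
    for candidate, address in reversed(names):
        if candidate == name:
            return address
    raise KeyError(name)

def get(store: dict, names: list, name: str) -> bytes:
    """Look up the current hash for the name, then fetch that address."""
    return store[current_hash(names, name)]
```

the content stored at a hash never changes, so anything keyed by hash can be cached indefinitely; only the name lookup has to be fresh.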
git: version control system
non-linear development, branching/merging
distributed development, changes must be shareable between repositories that are not necessarily connected
cryptographic authentication of history, the ability to uniquely identify the complete development history of any change to the resources in a repository
git: design
content is stored as a directed acyclic graph (DAG) of content and content deltas plus meta-data
content blobs are stored using the hash of the content or delta
trees store lists of file names, each linked to content by the hash of a blob or of another tree
commits are stored using a hash of the meta-data, including tree hash, author, date, parent commit/s
git: file format
updates add new objects, stored whole or as deltas inside a pack file
all old versions are reconstructable
the same content produces the same hash, equivalent updates commute
data-structure is (mostly) immutable
mutable pointer to head of a branch
git: benefits
presents a mutable “view” of an immutable structure
commit hash includes parent commits, providing a cryptographically secure signature of content and history
commit and content data are shareable values, enabling distribution between multiple repositories
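the shape of this design can be sketched in a few lines. this is deliberately not git's real object format: objects are serialized as JSON and hashed with SHA-1 purely for illustration, and `store_object`, `commit`, `objects` and `refs` are hypothetical names. what it does show is an immutable, hash-addressed object store with one mutable pointer per branch.

```python
import hashlib
import json

def store_object(objects: dict, kind: str, payload) -> str:
    """Store an object under the hash of its own content; return that hash."""
    data = json.dumps([kind, payload], sort_keys=True).encode()
    digest = hashlib.sha1(data).hexdigest()
    objects[digest] = data  # same content, same hash, same slot
    return digest

def commit(objects: dict, refs: dict, branch: str, tree: str, message: str) -> str:
    """Create a commit whose hash covers its tree and its parent commit."""
    parent = refs.get(branch)
    meta = {"tree": tree, "parents": [parent] if parent else [], "message": message}
    digest = store_object(objects, "commit", meta)
    refs[branch] = digest  # the only mutation: move the branch pointer
    return digest
```

since a commit's hash covers its parents' hashes, two histories that end in the same tree but differ anywhere earlier produce different commit hashes.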
lucene
full-text indexing and search
needs to maintain a stable searchable “view” of an index in the face of concurrent updates
lucene: index
an index is a collection of Documents
a document is a collection of Fields and has an ID
an index is updated by deleting and re-adding documents
searching is done via a Searcher – for its lifetime, a searcher will see the state of the index as it was when it was opened
an index is made of Segment files
segments contain documents
deleting a document adds the document ID to a per-segment “.del” file, i.e. it doesn’t modify the segment file directly
when no searchers reference a segment with many deleted documents, it may be merged with others into a new segment containing the remaining documents, i.e. garbage collection
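a toy model of this behaviour, to make the snapshot idea concrete. it does not use Lucene's API or file format: `Segment`, `Index` and `Searcher` are stand-ins written for this sketch, a frozenset plays the role of the “.del” file, and merging is triggered by hand rather than by a policy.

```python
from types import MappingProxyType
from typing import FrozenSet, Mapping, NamedTuple, Tuple

class Segment(NamedTuple):
    docs: Mapping[int, str]  # doc ID -> text, read-only once written
    deleted: FrozenSet[int]  # stands in for the per-segment ".del" file

class Searcher:
    def __init__(self, segments: Tuple[Segment, ...]):
        self._segments = segments  # the index as it was when opened

    def search(self, term: str) -> list:
        return sorted(doc_id for seg in self._segments
                      for doc_id, text in seg.docs.items()
                      if doc_id not in seg.deleted and term in text)

class Index:
    def __init__(self):
        self.segments: Tuple[Segment, ...] = ()  # replaced, never edited in place

    def add(self, docs: dict) -> None:
        new = Segment(MappingProxyType(dict(docs)), frozenset())
        self.segments = self.segments + (new,)

    def delete(self, doc_id: int) -> None:
        # record the deletion next to the segment; its docs are not touched
        self.segments = tuple(
            seg._replace(deleted=seg.deleted | {doc_id}) if doc_id in seg.docs else seg
            for seg in self.segments)

    def merge(self) -> None:
        # "garbage collection": one new segment holding only live documents
        live = {i: t for seg in self.segments
                for i, t in seg.docs.items() if i not in seg.deleted}
        self.segments = (Segment(MappingProxyType(live), frozenset()),)

    def open_searcher(self) -> Searcher:
        return Searcher(self.segments)
```

a searcher opened before a delete, an add or a merge keeps returning results from the segments it captured, because those segment values are never modified afterwards.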
lucene: file-format
[diagram sequence: segment 1 holds documents 0-9 and segment 2 holds documents 10-19; searcher1 is opened over segments 1 and 2; documents 3, 8 and 11 are then shown apart from their segments and a new segment 3 holds documents 20-22; searcher2 is opened alongside searcher1]
netflix
scale: 30% of last-mile internet traffic, 10k+ AWS instances
immutable everything, including servers:
• servers are values, not modified
• new versions are printed and deployed
• old versions are replaced
idempotent updates
Reactive Extensions programming model: RxJava, RxJS (JavaScript)
conclusions
avoid mutation at all costs
values replace – or occlude – values
store change
apply changes to construct a temporal view
apply these ideas to your entire system architecture
profit!
thanks