Posted on 14-Feb-2017
DESIGNING A DATABASE LIKE AN ARCHAEOLOGIST
Yoav Rubin (@yoavrubin)
The background story
■ How I got to it
■ 500 Lines or Less
■ Datomic
“…My own strategy is to find a car, or the nearest equivalent, which looks as if it knows where it’s going and follow it.”
Douglas Adams, The Long Dark Tea-Time of the Soul
My understanding of Datomic
Mental model - entities
■ The database is built of entities
– Slices are entities
■ Entities have attributes
– E.g., shape, color, quantity
■ Attributes have values
■ Key insight: things can “change” only by adding layers
[Figure: entities A, B, C and D in the layer at time 0]
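Mental model - layers
■ Each change creates a new layer
– Update entity
– Delete entity
– Add entity
■ Each layer has its timestamp
[Figure, built up over several slides: layers 0 to 4 stacked over entities A, B, C and D; later layers change D (marked X) and add entity E]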
Mental model - Datom
■ The basic data building block
■ Composed of the value of an attribute of an entity at a specific time
– E.g., (E, A, V, T):
– A, count, 3, 0
– A, color, blue, 0
– A, shape, rectangle, 0
■ A datom points to its previous version
■ A datom may represent a relationship between entities
– An entity may point to another entity
[Figure: the layers 0 to 4 over entities A to E]
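A minimal sketch of the datom idea, assuming datoms are plain [entity attribute value time] vectors rather than the records used later in the talk. Digging down to time t means keeping only the datoms written at or before t and taking the newest one, so a later layer hides an earlier value. The name value-at and the sample data are hypothetical.

```clojure
;; Hypothetical sample data: each datom is [entity attribute value time].
(def datoms
  [[:A :count 3 0]
   [:A :color :blue 0]
   [:A :shape :rectangle 0]
   [:A :color :red 2]]) ; a later layer that hides the blue color

(defn value-at
  "Value of attribute a of entity e as seen at time t: the value of the
  newest datom for (e, a) whose time is not later than t, or nil."
  [datoms e a t]
  (let [visible (filter (fn [[de da _ dt]]
                          (and (= de e) (= da a) (<= dt t)))
                        datoms)]
    (when (seq visible)
      ;; peek on a vector returns its last element, i.e. the time
      (nth (apply max-key peek visible) 2))))

(value-at datoms :A :color 1) ; => :blue, the red layer is not visible yet
(value-at datoms :A :color 2) ; => :red
```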
Why archaeology
■ It’s like an archaeological excavation site
■ The excavation site is a database
■ Each artifact is an entity
– With its corresponding ID
■ Each entity has a set of attributes
– Which may change over time
■ Each attribute has a specific value at a specific time
■ When you go deeper, you go back in time
■ A change is a new layer
– That hides the previous value
Design approach: Bottom up
[Roadmap diagram: Data model, Indexes, Datom lifecycle (Add, Update, Remove), Transactions, What-if, Evolution, Graph queries and Datalog queries, arranged under Build and Read]
Data model - constructs

    (defrecord Database [layers top-id curr-time])
    (defrecord Layer [storage VAET AVET VEAT EAVT])
    (defrecord Entity [id attrs])
    (defrecord Attr [name value ts prev-ts])

    (defprotocol Storage
      (get-entity [storage e-id])
      (write-entity [storage entity])
      (drop-entity [storage entity]))
Data model - basic creators

    (defn make-attr
      ([name value type ; these ones are required
        & {:keys [cardinality] :or {cardinality :db/single}}] ; defaults
       {:pre [(contains? #{:db/single :db/multiple} cardinality)]} ; DbC preconditions
       (with-meta (Attr. name value -1 -1) ; creation
                  {:type type :cardinality cardinality}))) ; metadata

    (defn make-entity
      ([] (make-entity :db/no-id-yet))
      ([id] (Entity. id {})))

    (defn add-attr [ent attr]
      (let [attr-id (keyword (:name attr))]
        (assoc-in ent [:attrs attr-id] attr)))
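A usage sketch of the creators above. The definitions are repeated (with the single-arity make-attr written without the extra parentheses) so the snippet runs on its own; the slice entity and the :string and :number type keywords are illustrative assumptions. Type and cardinality travel as metadata, so they do not take part in the attribute's fields or in its equality.

```clojure
(defrecord Attr [name value ts prev-ts])
(defrecord Entity [id attrs])

(defn make-attr
  [name value type & {:keys [cardinality] :or {cardinality :db/single}}]
  {:pre [(contains? #{:db/single :db/multiple} cardinality)]}
  (with-meta (Attr. name value -1 -1) {:type type :cardinality cardinality}))

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (let [attr-id (keyword (:name attr))]
    (assoc-in ent [:attrs attr-id] attr)))

;; Hypothetical usage: a slice with two attributes
(def slice
  (-> (make-entity)
      (add-attr (make-attr :color :blue :string))
      (add-attr (make-attr :count 3 :number :cardinality :db/single))))

(meta (get-in slice [:attrs :color])) ; => {:type :string, :cardinality :db/single}
```

An unsupported cardinality such as :db/many fails the :pre check and throws an AssertionError.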
Data model - accessors

    (defn entity-at
      ([db ent-id] (entity-at db (:curr-time db) ent-id))
      ([db ts ent-id] (get-entity (get-in db [:layers ts :storage]) ent-id)))

    (defn attr-at
      ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
      ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

    (defn value-of-at
      ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
      ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

    (defn indx-at
      ([db kind] (indx-at db kind (:curr-time db)))
      ([db kind ts] (kind ((:layers db) ts))))
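To see the accessors travel through time, here is a sketch with a hypothetical MapStorage record that implements the Storage protocol over a persistent map (the talk uses fdb.storage.InMemory, whose code is not shown). Plain maps stand in for the Database, Layer and Entity records. Because write-entity returns a new storage, the older layer keeps its own view of the entity.

```clojure
(defprotocol Storage
  (get-entity [storage e-id])
  (write-entity [storage entity])
  (drop-entity [storage entity]))

;; Hypothetical storage: a persistent map from entity id to entity.
(defrecord MapStorage [m]
  Storage
  (get-entity [_ e-id] (get m e-id))
  (write-entity [this entity] (assoc this :m (assoc m (:id entity) entity)))
  (drop-entity [this entity] (assoc this :m (dissoc m (:id entity)))))

(defn entity-at
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (get-entity (get-in db [:layers ts :storage]) ent-id)))

(defn attr-at
  ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
  ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
  ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

;; Two layers: entity 1 is blue at time 0 and red at time 1.
(def s0 (write-entity (->MapStorage {}) {:id 1 :attrs {:color {:value :blue}}}))
(def s1 (write-entity s0 {:id 1 :attrs {:color {:value :red}}}))
(def db {:layers [{:storage s0} {:storage s1}] :curr-time 1})

(value-of-at db 1 :color)   ; => :red
(value-of-at db 1 :color 0) ; => :blue
```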
Indexing - why
■ The database accumulates facts
– Many of them
■ It needs to provide mechanisms to ask questions about these facts
– Graph query APIs
– Datalog query language APIs
■ That mechanism of extracting insights must be efficient
■ This is what indexes are all about
Indexing - what
■ A fact can be identified by the triplet entityId, attributeName and value
– At a specific time
– Such a fact is a Datom
■ Datoms are indexed
Indexing - how
■ An index is a three-level structure:
– First level: a map from a key to a second-level map
– Second level: a map from a key to a third-level set
– Third level: a set
■ Each datom is represented in an index structure
– Each level holds a different kind of item
– Either entityId, attributeName or value
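The three-level structure maps directly onto nested Clojure maps and sets. A sketch with a hypothetical index-datom helper: the datom's triplet is first rearranged into the index's order (AVET here), then update-in walks the first two levels and conj adds the third-level item, with fnil supplying an empty set the first time a path is seen.

```clojure
(defn index-datom
  "Adds an already-rearranged triplet [k1 k2 k3] to an index {k1 {k2 #{k3}}}."
  [index [k1 k2 k3]]
  (update-in index [k1 k2] (fnil conj #{}) k3))

;; Hypothetical reordering for AVET: [e a v] -> [a v e]
(defn eav->avet [[e a v]] [a v e])

(def avet
  (reduce index-datom {}
          (map eav->avet [[1 :color :blue]
                          [2 :color :blue]
                          [2 :shape :circle]])))

;; avet => {:color {:blue #{1 2}}, :shape {:circle #{2}}}
;; "Which entities are blue?" becomes a two-step lookup:
(get-in avet [:color :blue]) ; => #{1 2}
```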
■ The name of an index is derived from the kind of items found at each of its levels
– EAVT: {entityId {attributeName #{value}}}
– VEAT: {value {entityId #{attributeName}}}
– AVET: {attributeName {value #{entityId}}}
– VAET: {value {attributeName #{entityId}}}
[Diagrams: example EAVT and AVET indexes]
    (defn make-index [from-eav to-eav usage-pred]
      (with-meta {} {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred}))

■ An index starts as an empty map whose metadata holds three functions:
– from-eav: takes a triplet in the canonical EAV order and rearranges it into the index order
– to-eav: takes an index triplet and rearranges it into the canonical EAV order
– usage-pred: decides, for a given datom, whether it should be indexed in this index
    (defn make-db []
      (atom
        (Database.
          [(Layer.
             (fdb.storage.InMemory.)                                       ; storage
             (make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(ref? %))  ; VAET
             (make-index #(vector %2 %3 %1) #(vector %3 %1 %2) always)     ; AVET
             (make-index #(vector %3 %1 %2) #(vector %2 %3 %1) always)     ; VEAT
             (make-index #(vector %1 %2 %3) #(vector %1 %2 %3) always))]   ; EAVT
          0    ; top-id
          0))) ; curr-time
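Each (from-eav, to-eav) pair in make-db must be mutually inverse, otherwise a datom could not be read back in EAV order. A quick check of the four pairs, copied from make-db; the names reorderings and round-trips? are hypothetical.

```clojure
;; [from-eav to-eav] pairs, copied from make-db
(def reorderings
  {:VAET [#(vector %3 %2 %1) #(vector %3 %2 %1)]
   :AVET [#(vector %2 %3 %1) #(vector %3 %1 %2)]
   :VEAT [#(vector %3 %1 %2) #(vector %2 %3 %1)]
   :EAVT [#(vector %1 %2 %3) #(vector %1 %2 %3)]})

(defn round-trips?
  "True when rearranging [e a v] into index order and back yields [e a v]."
  [[from-eav to-eav] [e a v]]
  (= [e a v] (apply to-eav (from-eav e a v))))

;; Index order of the datom [1 :color :blue] in each index
(into {} (for [[k [from-eav _]] reorderings]
           [k (from-eav 1 :color :blue)]))
;; => {:VAET [:blue :color 1], :AVET [:color :blue 1],
;;     :VEAT [:blue 1 :color], :EAVT [1 :color :blue]}
```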
Datom lifecycle
    (defn add-entity [db ent] …)
    (defn remove-entity [db ent-id] …)
    (defn update-entity
      ([db ent-id attr-name new-val] …)
      ([db ent-id attr-name new-val operation] …))

Each of these functions:
■ Extracts the top layer
■ Operates on storage
■ Operates on indexes, producing a new layer
■ Adds the new layer
■ Returns a database
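The add flow can be sketched on a deliberately simplified model: a layer is a map holding a :storage map and a single :EAVT index, and entity attributes are plain attribute-to-value maps. The name add-entity-sketch and this data layout are assumptions; it is not the talk's add-entity, which works on the records and on all four indexes defined earlier.

```clojure
(defn- index-datom [index [k1 k2 k3]]
  (update-in index [k1 k2] (fnil conj #{}) k3))

(defn add-entity-sketch
  "Hypothetical, simplified add: returns a new db with one more layer."
  [db ent]
  (let [id        (inc (:top-id db))                     ; allocate a new id
        ent       (assoc ent :id id)
        top-layer (peek (:layers db))                    ; extract top layer
        layer     (assoc-in top-layer [:storage id] ent) ; operate on storage
        layer     (reduce (fn [l [a v]]                  ; operate on indexes
                            (update l :EAVT index-datom [id a v]))
                          layer
                          (:attrs ent))]
    (assoc db                                            ; add layer, return a db
           :layers (conj (:layers db) layer)
           :top-id id)))

(def db0 {:layers [{:storage {} :EAVT {}}] :top-id 0})
(def db1 (add-entity-sketch db0 {:attrs {:color :blue :count 3}}))
;; db1 has two layers; the first one is still the empty layer of db0
```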
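What’s in a transaction
■ A database
■ A set of operations to be performed in an ACI manner (atomicity, consistency, isolation; no durability, as the DB lives in memory)
■ The desired API:

    (transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

■ Not function calls!! transact receives these forms unevaluated and turns them into data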
The process

    (transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

■ The call should be transformed into a collection of valid operations
■ Each operation adds a layer
– On top of the layer that the previous operation added
■ Problem: several layers may be added during a transaction
■ Solution: re-layer the initial DB with the latest layer only
– Then set the time of the new layer
– Use the top-id from the last layer
■ Update the Atom that holds the DB
– Or not, in the case of what-if
    ;; ops is in the form [[op param1 param2 …] [op param1 param2 …]]
    (defn transact-on-db [initial-db ops]
      (loop [[op & rst-ops] ops      ; go over the operations in the transaction
             transacted initial-db]
        (if op
          ;; build and execute the add / update / remove call; each iteration
          ;; adds a single layer on top of the previously added one
          (recur rst-ops (apply (first op) transacted (rest op)))
          ;; no more operations: construct the output of the transaction,
          ;; a fully updated db that gains only the last layer
          (let [initial-layer (:layers initial-db) ; the initial db's layers
                new-layer (last (:layers transacted))]
            (assoc initial-db :layers (conj initial-layer new-layer)
                              :curr-time (next-ts initial-db)
                              :top-id (:top-id transacted))))))
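To see the re-layering at work, here is transact-on-db run on a toy db. Two assumptions: add-layer is a hypothetical operation that pushes one layer with a key set, and incrementing :curr-time stands in for next-ts, which is not shown. Although each operation adds a layer, the result carries exactly one new layer for the whole transaction.

```clojure
(defn add-layer
  "Hypothetical operation: pushes a copy of the top layer with k set to v."
  [db k v]
  (assoc db :layers (conj (:layers db) (assoc (peek (:layers db)) k v))))

(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops
         transacted initial-db]
    (if op
      (recur rst-ops (apply (first op) transacted (rest op)))
      (let [initial-layer (:layers initial-db)
            new-layer (last (:layers transacted))]
        (assoc initial-db
               :layers (conj initial-layer new-layer)
               :curr-time (inc (:curr-time initial-db)) ; stand-in for next-ts
               :top-id (:top-id transacted))))))

(def db0 {:layers [{}] :curr-time 0 :top-id 0})
(def db1 (transact-on-db db0 [[add-layer :a 1] [add-layer :b 2]]))
;; Inside the loop the db grew to 3 layers; db1 keeps only 2:
(:layers db1) ; => [{} {:a 1, :b 2}]
```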
■ _transact is a macro that expands into a call to the operation it received as an argument (swap! or _what-if), passing it the db, transact-on-db and the transaction’s operations as a vector of vectors

    (defmacro _transact [db op & txs]
      (when txs
        (loop [[frst-tx# & rst-tx#] txs
               res# [op db `transact-on-db]
               accum-txs# []]
          (if frst-tx#
            ;; turn each (f arg …) form into the data [f arg …]
            (recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
            ;; emit (op db transact-on-db [[f arg …] …])
            (list* (conj res# accum-txs#))))))

    (defmacro transact [db-conn & txs]
      `(_transact ~db-conn swap! ~@txs))

    (defmacro what-if [db & txs]
      `(_transact ~db _what-if ~@txs))

    (defn- _what-if [db f txs]
      (f db txs))
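The same trick, stripped down so it runs on its own: one macro turns the operation forms into a vector of vectors, and the two front-end macros differ only in the function that receives them. In this sketch apply-ops stands in for transact-on-db (it just threads a value through the operations), mapv replaces the loop in _transact, and run-what-if is public so the expansion can also be used from another namespace. All of these names are hypothetical.

```clojure
(defn apply-ops
  "Hypothetical stand-in for transact-on-db: threads db through each [f & args]."
  [db ops]
  (reduce (fn [d [f & args]] (apply f d args)) db ops))

(defmacro _tx [db op & txs]
  (when txs
    ;; (f arg ...) forms become [f arg ...] data, collected in one vector
    `(~op ~db apply-ops ~(mapv vec txs))))

(defn run-what-if [db f txs] (f db txs))

(defmacro tx! [db-conn & txs] `(_tx ~db-conn swap! ~@txs))
(defmacro what-if [db & txs] `(_tx ~db run-what-if ~@txs))

(def conn (atom {:a 0}))

(what-if @conn (assoc :a 1) (update :a inc)) ; => {:a 2}, conn still holds {:a 0}
(tx! conn (assoc :a 5))                       ; conn now holds {:a 5}
```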
Transaction vs What-if process

Transaction:

    (transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
    (_transact db-conn swap! (add-entity e1) (update-entity e2 atr2 val2 :db/add))
    (swap! db-conn transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
    (transact-on-db db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]]) ; db is the Atom's current value
    (add-entity db e1)                         ; returns db'
    (update-entity db' e2 atr2 val2 :db/add)

■ The given db-conn (an Atom) points to a new db

What-if:

    (what-if db (add-entity e1) (update-entity e2 atr2 val2 :db/add))
    (_transact db _what-if (add-entity e1) (update-entity e2 atr2 val2 :db/add))
    (_what-if db transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
    (transact-on-db db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
    (add-entity db e1)                         ; returns db'
    (update-entity db' e2 atr2 val2 :db/add)

■ Returns a new db; the original db is left as it was
Evolutionary queries
■ Seeing how an entity’s attribute evolved over time
■ Each attribute has a prev-ts property
■ We can use it to look back and see what came before

    (defn evolution-of [db ent-id attr-name]
      (loop [res [] ts (:curr-time db)]
        (if (= -1 ts)
          (reverse res)
          (let [attr (attr-at db ent-id attr-name ts)]
            (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

■ Ends up with a sequence of evolutionary steps, oldest first, each of the form {<time> <value>}
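A toy run of the same loop, assuming a hypothetical layers vector in which entry ts is the version of one attribute visible at layer ts, so attr-at reduces to nth. Layer 1 changed something else and still shows the version written at time 0, so following prev-ts skips it.

```clojure
;; Hypothetical: one attribute as seen at layers 0..3
(def layers
  [{:value :blue  :ts 0 :prev-ts -1}
   {:value :blue  :ts 0 :prev-ts -1}  ; unchanged at layer 1
   {:value :red   :ts 2 :prev-ts 0}
   {:value :green :ts 3 :prev-ts 2}])

(defn toy-evolution-of [layers]
  (loop [res [] ts (dec (count layers))]     ; start at the current time
    (if (= -1 ts)
      (reverse res)                          ; oldest step first
      (let [attr (nth layers ts)]
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

(toy-evolution-of layers) ; => ({0 :blue} {2 :red} {3 :green})
```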
Graph queries
■ Treating the database as a graph
■ Each entity models a node
■ An entity may have attributes whose type is :db/ref
– The value of such an attribute is the Id of another entity
■ Each such attribute models a link
– The link’s label is the attribute name
– The link’s target is the attribute value
Completing the graph story
■ For each link we know
– Source: the containing entity
– Target: the value
■ Each node also needs to know its links
– Outgoing: by extracting from the entity the attributes whose type is :db/ref
– Incoming: by using the VAET index
■ V: the current node’s Id
■ A: the attribute names, i.e. the link labels
■ E: the set of entities pointing to this node
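A sketch of both directions on toy data. Assumptions: entities are plain maps from attribute name to a {:type :value} map, the VAET index is built only from :db/ref attributes (mirroring the ref? usage predicate in make-db), and the helper names are hypothetical.

```clojure
;; Hypothetical entities: id -> {attribute {:type t :value v}}
(def entities
  {1 {:name    {:type :string :value "Alice"}
      :manager {:type :db/ref :value 2}}
   2 {:name    {:type :string :value "Bob"}}
   3 {:name    {:type :string :value "Carol"}
      :manager {:type :db/ref :value 2}}})

(defn ref-attrs
  "Seq of [attribute target-id] for the entity's :db/ref attributes."
  [ent]
  (for [[a attr] ent :when (= :db/ref (:type attr))]
    [a (:value attr)]))

;; VAET over refs only: {target-id {attribute #{source-id}}}
(def vaet
  (reduce (fn [idx [e ent]]
            (reduce (fn [idx [a target]]
                      (update-in idx [target a] (fnil conj #{}) e))
                    idx
                    (ref-attrs ent)))
          {}
          entities))

(defn outgoing-links [ent] (into {} (ref-attrs ent)))

(defn incoming-links [vaet node-id] (get vaet node-id {}))

(outgoing-links (entities 1)) ; => {:manager 2}
(incoming-links vaet 2)       ; => {:manager #{1 3}}
```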
Summary
■ We have an in-memory functional DB with
– Transactions
– What-if
– Graph queries
– Evolution queries
– Simple Datalog queries
■ 488 lines, of which
– 73 are blank
– 55 are docstrings
– Net total: 360 lines
Summary - what made it possible
■ Design approach: bottom up
– With occasional top-down
■ Clojure’s magic
– Persistent data structures
– Macros
– Data literals
– Higher-order functions (HOFs)
– Destructuring
■ Clojure approach
– Everything is a library
– Design data structures and write data-structure transformation code
– The rest will follow
“… I may not have gone where I intended to go, but I think I have ended up where I needed to be.”
Douglas Adams, The Long Dark Tea-Time of the Soul