Designing a database like an archaeologist

DESIGNING A DATABASE LIKE AN ARCHAEOLOGIST Yoav Rubin @yoavrubin

Transcript of Designing a database like an archaeologist

Page 1: Designing a database like an archaeologist

DESIGNING A DATABASE LIKE AN ARCHAEOLOGIST

Yoav Rubin, @yoavrubin

Page 2: Designing a database like an archaeologist

The background story
■ Me
■ How I got to it

Page 3: Designing a database like an archaeologist

The background story

■ 500 Lines or Less
■ Datomic

Page 4: Designing a database like an archaeologist

My understanding of Datomic

Page 5: Designing a database like an archaeologist

Mental model - entities
■ Built of entities
■ Slices are entities
■ Entities have attributes
– E.g., shape, color, quantity
■ Attributes have values
■ Update to an entity adds a new layer
■ Add / edit / delete

[Figure: entities A, B, C and D]

Page 6: Designing a database like an archaeologist

Mental model - layers
■ Each change creates a new layer
■ Update entity
■ Delete entity
■ Add entity
■ Each layer has its timestamp

[Figure: layers with timestamps 0 to 2, holding entities A, B, C and D]

Page 7: Designing a database like an archaeologist

Mental model - layers

[Figure: layers with timestamps 0 to 3; labels A, B, C, D, D and X]

Page 8: Designing a database like an archaeologist

Mental model - layers

[Figure: layers with timestamps 0 to 4; labels A, B, C, D, E, D and X]

Page 9: Designing a database like an archaeologist

Mental model - Datom
■ A value of an attribute of an entity at a specific time is a datom
■ E.g., A@0:
– Count: 3
– Color: blue
– Shape: rectangle
■ A datom points to its previous version
■ An entity may point to another entity
– Represents a relationship between entities
– Modeled as yet another datom

[Figure: layers with timestamps 0 to 4; labels A, B, C, D, E, D and X]
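
A minimal sketch of the "datom points to its previous version" idea, assuming the same four fields as the Attr record shown later in the talk. The helper name update-attr and the sample values are hypothetical and only illustrate the chain of versions.

```clojure
;; Same field names as the talk's Attr record; everything else is illustrative.
(defrecord Attr [name value ts prev-ts])

(defn update-attr
  "Return a new version of attr holding new-val at time ts.
   The new version points back to the version it replaces."
  [attr new-val ts]
  (assoc attr :value new-val :ts ts :prev-ts (:ts attr)))

;; A@0 has Count: 3; -1 marks 'no previous version'.
(def count-at-0 (->Attr :count 3 0 -1))

;; An update at time 2 hides the old value but keeps a pointer to it.
(def count-at-2 (update-attr count-at-0 5 2))

(println (:value count-at-2) (:ts count-at-2) (:prev-ts count-at-2))
```

The old version is never modified, so following :prev-ts from any version leads back through the earlier layers.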

Page 10: Designing a database like an archaeologist

Why archeology
■ It’s like an archeological excavation site.
■ The excavation site is a database.
■ Each artifact is an entity with a corresponding ID.
■ Each entity has a set of attributes, which may change over time.
■ Each attribute has a specific value at a specific time.
■ When you go deeper, you go back in time
■ An update just hides the previous value

[Figure: layers with timestamps 0 to 4; labels A, B, C, D, E, D and X]

Page 11: Designing a database like an archaeologist

Design approach: Bottom up

Page 12: Designing a database like an archaeologist

[Diagram: roadmap of the talk, with the parts Data model, Datom life cycle, Add, Update, Remove, Transactions, What-if, Evolution, Graph queries, Datalog queries, Indexes, Build, Read]

Page 13: Designing a database like an archaeologist

Data model - constructs

(defrecord Database [layers top-id curr-time])
(defrecord Layer [storage VAET AVET VEAT EAVT])
(defrecord Entity [id attrs])
(defrecord Attr [name value ts prev-ts])

(defprotocol Storage
  (get-entity [storage e-id])
  (write-entity [storage entity])
  (drop-entity [storage entity]))

Page 14: Designing a database like an archaeologist

Data model – basic creators

(defn make-attr
  ([name value type ; these ones are required
    & {:keys [cardinality] :or {cardinality :db/single}}] ; defaults
   {:pre [(contains? #{:db/single :db/multiple} cardinality)]} ; DbC preconditions
   (with-meta (Attr. name value -1 -1) ; creation
     {:type type :cardinality cardinality}))) ; metadata

(defn make-entity
  ([] (make-entity :db/no-id-yet))
  ([id] (Entity. id {})))

(defn add-attr [ent attr]
  (let [attr-id (keyword (:name attr))]
    (assoc-in ent [:attrs attr-id] attr)))

Page 15: Designing a database like an archaeologist

Data model - accessors

(defn entity-at
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (stored-entity (get-in db [:layers ts :storage]) ent-id)))

(defn attr-at
  ([db ent-id attr-name] (attr-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts] (get-in (entity-at db ts ent-id) [:attrs attr-name])))

(defn value-of-at
  ([db ent-id attr-name] (:value (attr-at db ent-id attr-name)))
  ([db ent-id attr-name ts] (:value (attr-at db ent-id attr-name ts))))

(defn indx-at
  ([db kind] (indx-at db kind (:curr-time db)))
  ([db kind ts] (kind ((:layers db) ts))))
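
To see how reading "at a time" works, here is a simplified, self-contained sketch. It assumes each layer's storage is a plain map from entity id to entity (the talk uses a Storage protocol instead), and the sample data is made up.

```clojure
;; Assumption: storage is a plain map, not the talk's Storage protocol.
(def db
  {:curr-time 1
   :layers [{:storage {:a {:id :a
                           :attrs {:color {:value "blue" :ts 0 :prev-ts -1}}}}}
            {:storage {:a {:id :a
                           :attrs {:color {:value "red" :ts 1 :prev-ts 0}}}}}]})

(defn entity-at
  "The entity as it was stored in the layer with timestamp ts
   (defaults to the current time)."
  ([db ent-id] (entity-at db (:curr-time db) ent-id))
  ([db ts ent-id] (get-in db [:layers ts :storage ent-id])))

(defn value-of-at
  "The value of an attribute of an entity at a given time."
  ([db ent-id attr-name] (value-of-at db ent-id attr-name (:curr-time db)))
  ([db ent-id attr-name ts]
   (get-in (entity-at db ts ent-id) [:attrs attr-name :value])))

(println (value-of-at db :a :color))    ; value in the top layer
(println (value-of-at db :a :color 0))  ; value one layer deeper
```

Going deeper into the :layers vector is the "going back in time" of the excavation metaphor.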

Page 16: Designing a database like an archaeologist

Indexes

Page 18: Designing a database like an archaeologist

Indexing

■ The database accumulates facts, called datoms
■ A datom is a triplet: [entityId attributeName value]
■ Need to index datoms
■ An index is a three-level structure:
– First level: a map from a key to a second-level map
– Second level: a map from a key to a third-level set
– Third level: a set
■ Each level holds a different kind of item (entityId, attributeName or value)
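
The three-level structure can be sketched with nested maps and sets. This is an illustration only: the function name add-datom and the sample datoms are assumptions, and the index here is ordered entity, attribute, value.

```clojure
(defn add-datom
  "Add one [entity attribute value] triplet to an index shaped as
   {entity-id {attribute-name #{value}}}."
  [index [e a v]]
  ;; fnil supplies an empty set the first time a path is seen
  (update-in index [e a] (fnil conj #{}) v))

(def datoms
  [[1 :name "Ann"]
   [1 :likes "pizza"]
   [1 :likes "pasta"]
   [2 :likes "pizza"]])

(def eavt (reduce add-datom {} datoms))

(println eavt)
```

Looking up a full path such as (get-in eavt [1 :likes]) returns the third-level set for that entity and attribute.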

Page 19: Designing a database like an archaeologist

Indexing

■ The name of the index is derived from the order of the items in its levels
– EAVT: {entityId {attributeName #{value}}}
– VEAT: {value {entityId #{attributeName}}}
– AVET: {attributeName {value #{entityId}}}
– VAET: {value {attributeName #{entityId}}}
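
As a rough illustration of how the same datoms land in differently ordered indexes, the sketch below reorders each [e a v] triplet before inserting it. The map from-eav and the function build-index are hypothetical names chosen for this example.

```clojure
;; One reordering function per index kind: [e a v] -> [level1 level2 leaf].
(def from-eav
  {:EAVT (fn [e a v] [e a v])
   :AVET (fn [e a v] [a v e])
   :VEAT (fn [e a v] [v e a])
   :VAET (fn [e a v] [v a e])})

(defn build-index
  "Build a three-level index of the given kind from [e a v] datoms."
  [kind datoms]
  (reduce (fn [index [e a v]]
            (let [[k1 k2 leaf] ((from-eav kind) e a v)]
              (update-in index [k1 k2] (fnil conj #{}) leaf)))
          {}
          datoms))

(def datoms [[1 :likes "pizza"] [2 :likes "pizza"] [2 :name "Bob"]])

(println (build-index :AVET datoms))
(println (build-index :VEAT datoms))
```

With AVET the entity ids end up in the third-level sets, which is what makes "who likes pizza?" a direct lookup.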

Page 20: Designing a database like an archaeologist

EAVT

Page 21: Designing a database like an archaeologist

AVET

Page 22: Designing a database like an archaeologist

(defn make-index [from-eav to-eav usage-pred]
  (with-meta {} {:from-eav from-eav :to-eav to-eav :usage-pred usage-pred}))

(defn make-db []
  (atom (Database.
          [(Layer.
             (fdb.storage.InMemory.) ; storage
             (make-index #(vector %3 %2 %1) #(vector %3 %2 %1) #(ref? %)) ; VAET
             (make-index #(vector %2 %3 %1) #(vector %3 %1 %2) always)    ; AVET
             (make-index #(vector %3 %1 %2) #(vector %2 %3 %1) always)    ; VEAT
             (make-index #(vector %1 %2 %3) #(vector %1 %2 %3) always))]  ; EAVT
          0 0)))

Page 24: Designing a database like an archaeologist

(defn add-entity [db ent] …)

(defn remove-entity [db ent-id] …)

(defn update-entity
  ([db ent-id attr-name new-val] …)
  ([db ent-id attr-name new-val operation] …))

(defn add-entities [db ents-seq] (reduce add-entity db ents-seq))

[Diagram: the steps of each function: Extract top layer, Operate on storage, Operate on indexes, New layer, Add layer, Return a database]

No update to the DB’s curr-time!!

Page 25: Designing a database like an archaeologist

Transactions

Page 27: Designing a database like an archaeologist

What’s in a transaction
■ A database
■ Set of operations to be performed in an ACI manner
■ The desired API:

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))

Note: (add-entity e1) and (update-entity …) here are not function calls!!

Page 28: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ Each operation adds a layer
– On top of the layer that the previous operation added
■ Problem: several layers may be added during a transaction
■ Solution: re-layer the initial DB with the latest layer
– Then increase the time
– Use the top-id from the last layer
■ Update the Atom that holds the DB
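
The re-layering step can be sketched in isolation. The code below is a simplified stand-in, not the talk's implementation: a db is a plain map, a layer is a map from entity id to entity, and add-entity / remove-entity are reduced to the minimum needed to add one layer per call.

```clojure
;; Simplified stand-ins: every call adds exactly one layer on top.
(defn add-entity [db ent]
  (let [top (peek (:layers db))]
    (update db :layers conj (assoc top (:id ent) ent))))

(defn remove-entity [db ent-id]
  (let [top (peek (:layers db))]
    (update db :layers conj (dissoc top ent-id))))

(defn transact-on-db
  "Run all ops (each one is [f & args]), then keep only the last
   layer they produced, on top of the initial db's layers."
  [initial-db ops]
  (let [transacted (reduce (fn [db [f & args]] (apply f db args))
                           initial-db
                           ops)]
    (-> initial-db
        (update :layers conj (peek (:layers transacted)))
        (update :curr-time inc))))

(def db0 {:layers [{}] :curr-time 0})

(def db1
  (transact-on-db db0 [[add-entity {:id :a}]
                       [add-entity {:id :b}]
                       [remove-entity :a]]))

(println (count (:layers db1)) (:curr-time db1) (peek (:layers db1)))
```

Three operations ran, yet db1 has only one layer more than db0: the intermediate layers exist only inside the transaction.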

Page 29: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations

(defmacro _transact [db op & txs]
  (when txs
    (loop [[frst-tx# & rst-tx#] txs
           res# [op db `transact-on-db]
           accum-txs# []]
      (if frst-tx#
        (recur rst-tx# res# (conj accum-txs# (vec frst-tx#)))
        (list* (conj res# accum-txs#))))))

Page 32: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ _transact is a macro that expands into a call to the function it received as its op argument, passing it the db, transact-on-db and the vector of operations

Page 33: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations

The transact macro delegates to _transact, with swap! as the operation:

(defmacro transact [db-conn & txs]
  `(_transact ~db-conn swap! ~@txs))

Page 34: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations

The what-if macro also delegates to _transact, with _what-if in place of swap!:

(defmacro what-if [db & txs]
  `(_transact ~db _what-if ~@txs))

(defn- _what-if [db f txs]
  (f db txs))

Page 36: Designing a database like an archaeologist

The process

(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
■ Should be transformed into a collection of valid operations
■ Execute each operation in the transaction, adding layers
■ Re-layer the initial DB with the latest layer and update its time
■ Create a new instance of DB
■ Update the Atom to hold the new instance
– Or not, in the case of what-if

Page 37: Designing a database like an archaeologist

(defn transact-on-db [initial-db ops]
  (loop [[op & rst-ops] ops
         transacted initial-db]
    (if op
      (recur rst-ops (apply (first op) transacted (rest op)))
      (let [initial-layer (:layers initial-db)
            new-layer (last (:layers transacted))]
        (assoc initial-db
               :layers (conj initial-layer new-layer)
               :curr-time (next-ts initial-db)
               :top-id (:top-id transacted))))))

Operations come in the form of: [[op param1 param2…] [op param1 param2…]]

Page 38: Designing a database like an archaeologist

transact-on-db, the loop: going over the operations in the transaction

Page 39: Designing a database like an archaeologist

transact-on-db, the recur step: each iteration builds and executes the add / update / remove call, adding a single layer on top of the previously added layer

Page 40: Designing a database like an archaeologist

transact-on-db, the let step: no more operations, so construct the output of the transaction, a fully updated db

Page 41: Designing a database like an archaeologist

Transaction vs What-if process

Transaction (each form turns into the next one):
(transact db-conn (add-entity e1) (update-entity e2 atr2 val2 :db/add))
(_transact db-conn swap! (add-entity e1) (update-entity e2 atr2 val2 :db/add))
(swap! db-conn transact-on-db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
(transact-on-db db [[add-entity e1] [update-entity e2 atr2 val2 :db/add]])
(add-entity db e1) (update-entity db e2 atr2 val2 :db/add)
Result: the given db-conn (an atom) holds the updated state

What if (each form turns into the next one):
(what-if db (add-entity a3) (remove-entity a4))
(_transact db _what-if (add-entity a3) (remove-entity a4))
(_what-if db transact-on-db [[add-entity a3] [remove-entity a4]])
(transact-on-db db [[add-entity a3] [remove-entity a4]])
(add-entity db a3) (remove-entity db a4)
Result: a new db is returned
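
The difference between the two paths comes down to who calls the pure function. The sketch below uses plain functions instead of the talk's macros; apply-ops, transact! and the map-shaped "db" are hypothetical simplifications.

```clojure
(defn apply-ops
  "Pure function: apply every op ([f & args]) to db, in order."
  [db ops]
  (reduce (fn [acc [f & args]] (apply f acc args)) db ops))

(defn transact!
  "Swap the atom's content for the transacted db."
  [db-conn ops]
  (swap! db-conn apply-ops ops))

(defn what-if
  "Compute the transacted db without touching any atom."
  [db ops]
  (apply-ops db ops))

(def db-conn (atom {:count 0}))

;; Speculation: the connection is left unchanged.
(def speculative
  (what-if @db-conn [[update :count inc] [assoc :note "maybe"]]))
(def after-what-if @db-conn)

;; Real transaction: the atom now holds the new value.
(transact! db-conn [[update :count inc]])

(println speculative after-what-if @db-conn)
```

Because the database value is immutable, what-if needs no rollback: it simply never stores its result.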

Page 43: Designing a database like an archaeologist

Evolutionary queries

■ Seeing how an entity’s attribute evolved throughout time
■ Each attribute has a prev-ts property
■ We can use it to look back and see what was before

Page 44: Designing a database like an archaeologist

Evolutionary queries

(defn evolution-of [db ent-id attr-name]
  (loop [res []
         ts (:curr-time db)]
    (if (= -1 ts)
      (reverse res)
      (let [attr (attr-at db ent-id attr-name ts)]
        (recur (conj res {(:ts attr) (:value attr)}) (:prev-ts attr))))))

Ends up with a sequence of evolutionary steps, oldest first, each of the form {<time> <value>}
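
A small self-contained illustration of walking the prev-ts chain. It assumes the history of one attribute is available as a map from timestamp to attribute version; that map, the function name evolution and the sample values are made up for this example.

```clojure
;; Assumption: one attribute's versions, keyed by the time they were written.
(def color-history
  {0 {:value "blue"  :ts 0 :prev-ts -1}
   2 {:value "green" :ts 2 :prev-ts 0}
   3 {:value "red"   :ts 3 :prev-ts 2}})

(defn evolution
  "Follow :prev-ts links from latest-ts back to -1.
   Returns a vector of {time value} steps, oldest first."
  [history latest-ts]
  (loop [res '()          ; conj on a list prepends, so older steps end up first
         ts latest-ts]
    (if (= -1 ts)
      (vec res)
      (let [{:keys [value prev-ts]} (history ts)]
        (recur (conj res {ts value}) prev-ts)))))

(println (evolution color-history 3))
```

Timestamp 1 never appears in the output: the attribute was not touched in that layer, and the prev-ts links skip it.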

Page 46: Designing a database like an archaeologist

Graph queries

■ Treating the database as a graph
■ Each entity is a node
■ An entity may have attributes whose type is :db/ref
– The value of such an attribute is the id of another entity
■ Each such attribute is a link
– The link’s label is the attribute name
– The link’s target is the attribute value

Page 47: Designing a database like an archaeologist

[Figure: layers with timestamps 0 to 4; labels A, B, C, D, E, D and X; E points to A]

Page 48: Designing a database like an archaeologist

Completing the graph story

■ For each link we know
– Source: the containing entity
– Target: the value
■ Each node needs to know its links
– Outgoing
– Incoming
– (at a given time)
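
Both directions can be sketched on plain data. Assumptions in this example: entities are plain maps, an attribute is a link when its :type is :db/ref, every ref holds a single target, and build-vaet is a hypothetical helper that indexes only the ref attributes.

```clojure
(def entities
  {:a {:id :a :attrs {:shape {:type :string :value "rectangle"}}}
   :e {:id :e :attrs {:points-to {:type :db/ref :value :a}
                      :color     {:type :string :value "blue"}}}})

(defn ref? [attr] (= :db/ref (:type attr)))

(defn outgoing-refs
  "Targets of the links stored on the entity itself."
  [entities ent-id]
  (->> (get-in entities [ent-id :attrs])
       vals
       (filter ref?)
       (map :value)
       set))

(defn build-vaet
  "Index the ref attributes as {target {attr-name #{source}}}."
  [entities]
  (reduce (fn [index [src attr-name target]]
            (update-in index [target attr-name] (fnil conj #{}) src))
          {}
          (for [[id ent] entities
                [attr-name attr] (:attrs ent)
                :when (ref? attr)]
            [id attr-name (:value attr)])))

(defn incoming-refs
  "Sources of the links that point at ent-id, found through the index."
  [vaet ent-id]
  (reduce into #{} (vals (vaet ent-id))))

(println (outgoing-refs entities :e))
(println (incoming-refs (build-vaet entities) :a))
```

Outgoing links are read off the entity; incoming links need the value-first index, because the target entity stores nothing about who points at it.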

Page 49: Designing a database like an archaeologist

(defn incoming-refs [db ts ent-id & ref-names]
  (let [vaet (indx-at db :VAET ts)
        all-attr-map (vaet ent-id)
        ;; we may want only part of the links
        filtered-map (if ref-names (select-keys all-attr-map ref-names) all-attr-map)]
    (reduce into #{} (vals filtered-map))))

(defn outgoing-refs [db ts ent-id & ref-names]
  (let [;; we may want only part of the links
        val-filter-fn (if ref-names #(vals (select-keys % ref-names)) vals)]
    (if-not ent-id
      []
      (->> (entity-at db ts ent-id) ; the entity at that timestamp
           (:attrs)                 ; take the attributes
           (val-filter-fn)          ; filter them according to the given ref-names
           (filter ref?)            ; take from it only the ones that are links
           (mapcat :value)))))      ; take all the targets

Page 50: Designing a database like an archaeologist

Example: BFS or DFS over the incoming or outgoing links

(defn- traverse [pendings explored exploring-fn ent-at structure-fn]
  (let [cleaned-pendings (remove-explored pendings explored structure-fn)
        item (first cleaned-pendings)
        all-next-items (exploring-fn item)
        next-pends (reduce conj (structure-fn (rest cleaned-pendings)) all-next-items)]
    (when item
      (cons (ent-at item)
            (lazy-seq (traverse next-pends (conj explored item) exploring-fn ent-at structure-fn))))))

(defn traverse-db
  ([start-ent-id db algo direction]
   (traverse-db start-ent-id db algo direction (:curr-time db)))
  ([start-ent-id db algo direction ts]
   (let [structure-fn (if (= :graph/bfs algo) vec list*)
         explore-fn (if (= :graph/outgoing direction) outgoing-refs incoming-refs)]
     (traverse [start-ent-id] #{} (partial explore-fn db ts) (partial entity-at db ts) structure-fn))))
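
The BFS / DFS switch relies on how conj behaves: it appends to a vector (a queue) and prepends to a list (a stack). The sketch below shows that trick on a tiny hard-coded graph; the graph, the function walk and the simplified traverse signature are assumptions made for the example.

```clojure
;; Adjacency map: node -> nodes it links to.
(def graph {:a [:b :c] :b [:d] :c [:d] :d []})

(defn traverse
  "Lazily walk the graph. pendings is a vector for BFS or a list for DFS."
  [pendings explored neighbors structure-fn]
  (let [cleaned (remove explored pendings)   ; drop nodes already visited
        item (first cleaned)]
    (when item
      (let [next-pends (reduce conj
                               (structure-fn (rest cleaned))
                               (neighbors item))]
        (cons item
              (lazy-seq
                (traverse next-pends (conj explored item)
                          neighbors structure-fn)))))))

(defn walk [start algo]
  (let [structure-fn (if (= :graph/bfs algo) vec #(apply list %))]
    (traverse (structure-fn [start]) #{} graph structure-fn)))

(println (walk :a :graph/bfs))
(println (walk :a :graph/dfs))
```

The traversal code is identical for both algorithms; only the data structure holding the pending nodes changes.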

Page 51: Designing a database like an archaeologist

Example: BFS or DFS over the incoming or outgoing links

Preparations: traverse-db chooses the structure-fn and the explore-fn before calling traverse

Page 54: Designing a database like an archaeologist

Simple Datalog queries

■ The database accumulates facts
– A fact is a triplet structured like this: [EntityId AttributeName Value]
■ Need a query language that can operate on facts
■ A Datalog query has two main components
– Output structure
– List of conditions (query clauses)
■ A condition is structured the same way a fact is
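
Since a condition has the same shape as a fact, checking one against the other can be done position by position. The following is a naive sketch that scans the facts directly (the talk's engine works on indexes instead); match-clause and the sample facts are hypothetical.

```clojure
(defn variable?
  "A variable is a symbol whose name starts with '?'."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-clause
  "Return a map of variable bindings when fact satisfies clause,
   otherwise nil. Constants must be equal; variables match anything."
  [clause fact]
  (reduce (fn [bindings [term item]]
            (cond
              (variable? term) (assoc bindings term item)
              (= term item)    bindings
              :else            (reduced nil)))
          {}
          (map vector clause fact)))

(def facts
  [[1 :likes "pizza"]
   [1 :name "Ann"]
   [2 :likes "sushi"]])

(println (keep (partial match-clause '[?e :likes "pizza"]) facts))
(println (keep (partial match-clause '[?e :name ?nm]) facts))
```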

Page 55: Designing a database like an archaeologist

The anatomy of a query

{:find [?nm ?bd]
 :where [[?e :likes "pizza"]
         [?e :name ?nm]
         [?e :speak "English"]
         [?e :birthday (birthday-this-month? ?bd)]]}

:find is a vector describing the output structure (think of SELECT in SQL)

Page 56: Designing a database like an archaeologist

The anatomy of a query

:where is a vector of query clauses. The operator between them is ‘AND’. Each clause is built of 3 terms.

Page 57: Designing a database like an archaeologist

The anatomy of a query

The first term of a clause (?e) operates on the entity id part of a datom. Here it is a variable: same symbol => same value.

Page 58: Designing a database like an archaeologist

The anatomy of a query

The second term of a clause (e.g. :likes) operates on the attribute part of a datom. Here a simple value means an exact match.

Page 59: Designing a database like an archaeologist

The anatomy of a query

The third term of a clause operates on the value part of a datom. A simple value such as "pizza" is an “equals” predicate to apply on the value.

Page 60: Designing a database like an archaeologist

The anatomy of a query

(birthday-this-month? ?bd) is a user-provided predicate that acts on a value

Page 61: Designing a database like an archaeologist

The anatomy of a query

?nm and ?bd are the variables to be used in the output

The whole query asks: what are the names and birthdays of entities who like pizza, speak English, and have a birthday this month?

Page 62: Designing a database like an archaeologist

How does it work?

■ Need to transform the query clauses into predicate clauses
– Each term is transformed into a predicate (a function returning true or false)
■ Need to execute the predicates on the data to find the facts that satisfy each clause
– Then AND them
■ Need to extract the output from these facts
– Based on the user’s request

Page 64: Designing a database like an archaeologist

What’s in a query clause
■ A query clause is built of 3 terms
■ Each term can be one of the following:
– Variable: starts with ‘?’
– Don’t care symbol: ‘_’
– Single value: interpreted as equals
– Unary operator with a variable: (negative? ?num)
– Binary operator with the variable as the first operand: (> ?num 5)
– Binary operator with the variable as the second operand: (> 5 ?num)
■ Each of these should be transformed into an executable predicate
– If there was a variable, need to remember its symbol
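
The term kinds listed above can be turned into predicates with an ordinary function. This is a runtime sketch, not the macro-based approach of the talk: term->pred is a hypothetical name, and operator terms are assumed to be lists whose first item is already a function value.

```clojure
(defn variable?
  "A variable is a symbol whose name starts with '?'."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn term->pred
  "Turn one clause term into a predicate of a single argument."
  [term]
  (cond
    (= '_ term)       (constantly true)   ; don't care
    (variable? term)  (constantly true)   ; a bare variable matches anything
    (not (seq? term)) #(= % term)         ; single value: equals
    :else                                 ; operator with a variable operand
    (let [[f & args] term]
      (fn [x]
        ;; put the tested item where the variable was
        (apply f (map #(if (variable? %) x %) args))))))

(println ((term->pred "pizza") "pizza"))
(println ((term->pred (list > '?num 5)) 7))    ; evaluates (> 7 5)
(println ((term->pred (list > 5 '?num)) 7))    ; evaluates (> 5 7)
(println ((term->pred (list neg? '?num)) -1))  ; evaluates (neg? -1)
```

Remembering which variable a term used (the "meta" side) is a separate step that this sketch leaves out.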

Page 65: Designing a database like an archaeologist

(defmacro clause-term-expr [clause-term]
  (cond
    (variable? (str clause-term)) ; variable
    #(= % %)
    (not (coll? clause-term)) ; constant
    `#(= % ~clause-term)
    (= 2 (count clause-term)) ; unary operator
    `#(~(first clause-term) %)
    (variable? (str (second clause-term))) ; binary operator, 1st operand is variable
    `#(~(first clause-term) % ~(last clause-term))
    (variable? (str (last clause-term))) ; binary operator, 2nd operand is variable
    `#(~(first clause-term) ~(second clause-term) %)))

(defmacro clause-term-meta [clause-term]
  (cond
    (coll? clause-term) ; unary or binary operator
    (first (filter #(variable? % false) (map str clause-term)))
    (variable? (str clause-term) false) ; variable, without don’t care
    (str clause-term)
    :no-variable-in-clause ; constant or don’t care
    nil))

clause-term-expr: the term becomes an executable form

Page 66: Designing a database like an archaeologist

clause-term-meta: extracting the name of the variable used in the term

Page 67: Designing a database like an archaeologist

(defmacro pred-clause [clause]
  (loop [[trm# & rst-trm#] clause
         exprs# []
         metas# []]
    (if trm#
      (recur rst-trm#
             (conj exprs# `(clause-term-expr ~trm#))
             (conj metas# `(clause-term-meta ~trm#)))
      (with-meta exprs# {:db/variable metas#}))))

(defmacro q-clauses-to-pred-clauses [clauses]
  (loop [[frst# & rst#] clauses
         preds-vecs# []]
    (if-not frst#
      preds-vecs#
      (recur rst# `(conj ~preds-vecs# (pred-clause ~frst#))))))

pred-clause: going over the terms in a clause (a triplet)

Page 68: Designing a database like an archaeologist

q-clauses-to-pred-clauses: going over the conditions in a query

Page 69: Designing a database like an archaeologist

:where [[?e :likes "pizza"]
        [?e :name ?nm]
        [?e :speak "English"]
        [?e :birthday (birthday-this-month? ?bd)]]

Query clause:     [?e :likes "pizza"]
Predicate clause: [#(= % %) #(= % :likes) #(= % "pizza")]
Meta clause:      ["?e" nil nil]

Query clause:     [?e :name ?nm]
Predicate clause: [#(= % %) #(= % :name) #(= % %)]
Meta clause:      ["?e" nil "?nm"]

Query clause:     [?e :speak "English"]
Predicate clause: [#(= % %) #(= % :speak) #(= % "English")]
Meta clause:      ["?e" nil nil]

Query clause:     [?e :birthday (birthday-this-month? ?bd)]
Predicate clause: [#(= % %) #(= % :birthday) #(birthday-this-month? %)]
Meta clause:      ["?e" nil "?bd"]

Page 74: Designing a database like an archaeologist

Executing the query
■ Need to build a query plan
– The query itself gets executed on an index
– Not on the data!!
– Need to decide which index to use
■ Executing the query means applying each of the clauses on an index
– Each of the terms on the right level of the index
– Remember, an index is a map (top level) of maps (second level) of sets (third level)
■ There may be a need to restructure the clauses to match the index structure
– The query clause is ordered as E->A->V
– The index is not necessarily ordered that way

Page 75: Designing a database like an archaeologist

Building a query plan

■ There’s actually only one predefined query plan
– Operates on a single index
– Only one variable can be used in different clauses: the joining variable
■ Needs to receive the index to operate on to be fully operable
– That index is decided based on the joining variable
■ The join is executed on the third level of the index
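
How the joining variable picks the index can be shown on the meta clauses alone. In this sketch joining-variable-index and index-for-position are hypothetical names; the position-to-index mapping (0 to AVET, 1 to VEAT, 2 to EAVT) is the one used in the talk's build-query-plan.

```clojure
(defn joining-variable-index
  "meta-clauses holds one [e a v] triplet per clause, with a variable
   name or nil in each position. Returns the position in which every
   clause uses the same variable."
  [meta-clauses]
  (let [collapse  (fn [acc clause]
                    ;; keep a position only while all clauses agree on it
                    (map #(when (= %1 %2) %1) acc clause))
        collapsed (reduce collapse meta-clauses)]
    (first (keep-indexed (fn [i v] (when v i)) collapsed))))

;; The joining variable should sit in the third level (the sets),
;; so pick the index whose leaf is that position.
(def index-for-position {0 :AVET, 1 :VEAT, 2 :EAVT})

(def metas
  [["?e" nil nil]
   ["?e" nil "?nm"]
   ["?e" nil nil]
   ["?e" nil "?bd"]])

(println (index-for-position (joining-variable-index metas)))
```

For the pizza query the shared variable is ?e, in the entity position, so the plan runs on AVET, where entity ids are the third-level items.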

Page 76: Designing a database like an archaeologist

(defn index-of-joining-variable [query-clauses]
  (let [metas-seq (map #(:db/variable (meta %)) query-clauses) ; extracting the meta clauses
        collapsing-fn (fn [accV v] (map #(when (= %1 %2) %1) accV v)) ; clause collapsing fn
        collapsed (reduce collapsing-fn metas-seq)] ; reducing query to one triplet, with one variable
    (first (keep-indexed #(when (variable? %2 false) %1) collapsed)))) ; taking the index of the variable

(defn build-query-plan [query]
  (let [term-ind (index-of-joining-variable query)
        ind-to-use (case term-ind 0 :AVET 1 :VEAT 2 :EAVT)]
    (partial single-index-query-plan query ind-to-use)))

(defn single-index-query-plan [query indx db]
  (let [q-res (query-index (indx-at db indx) query)]
    (bind-variables-to-query q-res (indx-at db indx))))

index-of-joining-variable: finding the index of the joining variable

Page 77: Designing a database like an archaeologist

build-query-plan: deciding which index to use in the query

Page 78: Designing a database like an archaeologist

single-index-query-plan: constructing the plan

Page 79: Designing a database like an archaeologist

Executing the plan

■ Apply each clause to the index
– Each such application returns a result clause
  ■ All the paths in the index that passed all the predicates
■ Collect all the results and AND them
– By looking at the values of the joining variable
  ■ The third-level items that are found in all of the result clauses

Page 80: Designing a database like an archaeologist

Index

Page 81: Designing a database like an archaeologist

The levels

[Diagram: the index, with Level 1, Level 2 and Level 3 labeled]

Page 82: Designing a database like an archaeologist

Applying one predicate clause

[Diagram: the index, with Level 1, Level 2 and Level 3 labeled]

Page 83: Designing a database like an archaeologist

Applying another predicate clause

[Diagram: the index, with Level 1, Level 2 and Level 3 labeled]

Page 84: Designing a database like an archaeologist

The joining variable: we need to see which items are found in all of the sets

Page 85: Designing a database like an archaeologist

(defn query-index [index pred-clauses]
  (let [result-clauses (filter-index index pred-clauses)
        relevant-items (items-that-answer-all-conditions (map last result-clauses)
                                                         (count pred-clauses))
        cleaned-result-clauses (map (partial mask-path-leaf-with-items relevant-items)
                                    result-clauses)]
    (filter #(not-empty (last %)) cleaned-result-clauses)))

(defn filter-index [index predicate-clauses]
  (for [pred-clause predicate-clauses
        :let [[lvl1-prd lvl2-prd lvl3-prd] (apply (from-eav index) pred-clause)]
        [k1 l2map] index ; keys and values of the first level
        :when (try (lvl1-prd k1) (catch Exception e false))
        [k2 l3-set] l2map ; keys and values of the second level
        :when (try (lvl2-prd k2) (catch Exception e false))
        :let [res (set (filter lvl3-prd l3-set))]]
    (with-meta [k1 k2 res] (meta pred-clause))))

Page 86: Designing a database like an archaeologist

(defn filter-index [index predicate-clauses]
  (for [pred-clause predicate-clauses
        :let [[lvl1-prd lvl2-prd lvl3-prd] (apply (from-eav index) pred-clause)]
        [k1 l2map] index ; keys and values of the first level
        :when (try (lvl1-prd k1) (catch Exception e false))
        [k2 l3-set] l2map ; keys and values of the second level
        :when (try (lvl2-prd k2) (catch Exception e false))
        :let [res (set (filter lvl3-prd l3-set))]]
    (with-meta [k1 k2 res] (meta pred-clause))))

Adapting to the index’s structure

Page 87: Designing a database like an archaeologist

[[?e :likes "pizza"]
 [?e :name ?nm]
 [?e :speak "English"]
 [?e :birthday (birthday-this-month? ?bd)]]

Use the AVET index
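Why AVET for this query: a minimal sketch, assuming each clause carries its variable names as an [entity attribute value] triplet (the "Result Meta" column on the following slides). The only position holding the same variable in every clause is the entity, so the joining index is 0, and the `case` in build-query-plan maps 0 to :AVET, the index whose leaves are entity IDs. `variable-name?` is a simplified, hypothetical stand-in for the deck's `variable?`.

```clojure
;; Hypothetical stand-in for the deck's `variable?`:
;; a variable is a string that starts with "?".
(defn variable-name? [x]
  (and (string? x) (= \? (first x))))

;; Variable metadata of the four query clauses, in [entity attribute value] order.
(def metas
  [["?e" nil nil]
   ["?e" nil "?nm"]
   ["?e" nil nil]
   ["?e" nil "?bd"]])

;; Keep a position only while every clause holds the same thing there.
(defn collapse [acc v]
  (map #(when (= %1 %2) %1) acc v))

;; Only the entity position survives the collapse.
(def collapsed (reduce collapse metas))

;; Position of the surviving variable.
(def joining-index
  (first (keep-indexed #(when (variable-name? %2) %1) collapsed)))

;; Same mapping as in build-query-plan.
(def index-to-use
  (case joining-index 0 :AVET 1 :VEAT 2 :EAVT))
```

Here `collapsed` holds "?e" in the first position and nil elsewhere, so `index-to-use` is :AVET.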

Page 88: Designing a database like an archaeologist

Entity ID   Attribute Name   Attribute Value
1           :name            USA
            :likes           Pizza
            :speak           English
            :birthday        July 4, 1776
2           :name            France
            :likes           Red wine
            :speak           French
            :birthday        July 14, 1789
3           :name            Canada
            :likes           Snow
            :speak           English
            :birthday        July 1, 1867

[[?e :likes "pizza"]
 [?e :name ?nm]
 [?e :speak "English"]
 [?e :birthday (birthday-this-month? ?bd)]]

Result Clause                        Result Meta
[:likes "Pizza" #{1}]                ["?e" nil nil]
[:name "USA" #{1}]                   ["?e" nil "?nm"]
[:speak "English" #{1, 3}]           ["?e" nil nil]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:name "France" #{2}]                ["?e" nil "?nm"]
[:birthday "July 14, 1789" #{2}]     ["?e" nil "?bd"]
[:name "Canada" #{3}]                ["?e" nil "?nm"]
[:birthday "July 1, 1867" #{3}]      ["?e" nil "?bd"]

Use the AVET index

Page 89: Designing a database like an archaeologist

Result Clause                        Result Meta
[:likes "Pizza" #{1}]                ["?e" nil nil]
[:name "USA" #{1}]                   ["?e" nil "?nm"]
[:speak "English" #{1, 3}]           ["?e" nil nil]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:name "France" #{2}]                ["?e" nil "?nm"]
[:birthday "July 14, 1789" #{2}]     ["?e" nil "?bd"]
[:name "Canada" #{3}]                ["?e" nil "?nm"]
[:birthday "July 1, 1867" #{3}]      ["?e" nil "?bd"]

Use the AVET index

Notice that the result clauses have the triplet structure of the index (AVE)
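A minimal sketch of why result clauses come out in index order, assuming a toy AVET index (attribute -> value -> set of entity IDs) built from part of the slide's data. `filter-toy-index` and the predicate triplets are hypothetical simplifications of filter-index: the predicates are already in index order, so there is no from-eav step, and the result metadata is left out.

```clojure
;; Toy AVET index: attribute -> value -> set of entity IDs.
(def avet
  {:likes {"Pizza" #{1}, "Red wine" #{2}, "Snow" #{3}}
   :speak {"English" #{1 3}, "French" #{2}}})

;; Hypothetical predicate clauses, already in index (A, V, E) order.
(def pred-clauses
  [[#(= % :likes) #(= % "Pizza")   (constantly true)]
   [#(= % :speak) #(= % "English") (constantly true)]])

(defn filter-toy-index [index pred-clauses]
  (for [[lvl1-prd lvl2-prd lvl3-prd] pred-clauses
        [k1 l2map] index   ; first level: attributes
        :when (lvl1-prd k1)
        [k2 l3-set] l2map  ; second level: values
        :when (lvl2-prd k2)]
    ;; The path walked through the index, with the filtered leaf set last.
    [k1 k2 (set (filter lvl3-prd l3-set))]))

(def result-clauses (vec (filter-toy-index avet pred-clauses)))
```

Each element of `result-clauses` is an [attribute value entity-set] path, which is the AVE shape noted above.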

Page 90: Designing a database like an archaeologist

(defn items-that-answer-all-conditions [items-seq num-of-conditions]
  (->> items-seq ; take the items-seq
       (map vec) ; make each collection (actually a set) into a vector
       (reduce into []) ; reduce all the vectors into one vector
       (frequencies) ; count for each item in how many collections (sets) it was in
       (filter #(<= num-of-conditions (last %))) ; items that answered all conditions
       (map first) ; take from the duos the items themselves
       (set)))

(defn query-index [index pred-clauses]
  (let [result-clauses (filter-index index pred-clauses)
        relevant-items (items-that-answer-all-conditions (map last result-clauses)
                                                         (count pred-clauses))
        cleaned-result-clauses (map (partial mask-path-leaf-with-items relevant-items)
                                    result-clauses)]
    (filter #(not-empty (last %)) cleaned-result-clauses)))

(defn mask-path-leaf-with-items [relevant-items path]
  (update-in path [2] CS/intersection relevant-items))

ANDing the results

Page 91: Designing a database like an archaeologist

(defn query-index [index pred-clauses]
  (let [result-clauses (filter-index index pred-clauses)
        relevant-items (items-that-answer-all-conditions (map last result-clauses)
                                                         (count pred-clauses))
        cleaned-result-clauses (map (partial mask-path-leaf-with-items relevant-items)
                                    result-clauses)]
    (filter #(not-empty (last %)) cleaned-result-clauses)))

(defn mask-path-leaf-with-items [relevant-items path]
  (update-in path [2] CS/intersection relevant-items))

Filtering the ANDed results
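A worked example of the two steps, using the two helper functions as they appear on the slide and the eight result clauses of the pizza query. Assumptions: the result metadata is omitted, and the alias CS is taken to be clojure.set (the require line is added so the block runs on its own).

```clojure
(require '[clojure.set :as CS])

;; As on the slide.
(defn items-that-answer-all-conditions [items-seq num-of-conditions]
  (->> items-seq
       (map vec)
       (reduce into [])
       (frequencies)
       (filter #(<= num-of-conditions (last %)))
       (map first)
       (set)))

(defn mask-path-leaf-with-items [relevant-items path]
  (update-in path [2] CS/intersection relevant-items))

;; The eight result clauses of the pizza query (AVE order).
(def result-clauses
  [[:likes "Pizza" #{1}]
   [:name "USA" #{1}]
   [:speak "English" #{1 3}]
   [:birthday "July 4, 1776" #{1}]
   [:name "France" #{2}]
   [:birthday "July 14, 1789" #{2}]
   [:name "Canada" #{3}]
   [:birthday "July 1, 1867" #{3}]])

;; ANDing: entity 1 shows up in 4 leaf sets, entity 3 in 3, entity 2 in 2.
;; With 4 query clauses only entity 1 answers all conditions.
(def relevant-items
  (items-that-answer-all-conditions (map last result-clauses) 4))

;; Filtering: intersect every leaf with the relevant items, drop empty paths.
(def cleaned
  (->> result-clauses
       (map (partial mask-path-leaf-with-items relevant-items))
       (filter #(not-empty (last %)))))
```

`relevant-items` is #{1}, and `cleaned` keeps the four clauses of entity 1, with the :speak leaf narrowed from #{1 3} to #{1}. The count-based AND relies on each entity contributing at most one path per clause, which holds for this data.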

Page 92: Designing a database like an archaeologist

[[?e :likes "pizza"]
 [?e :name ?nm]
 [?e :speak "English"]
 [?e :birthday (birthday-this-month? ?bd)]]

Result Clause                        Result Meta
[:likes "Pizza" #{1}]                ["?e" nil nil]
[:name "USA" #{1}]                   ["?e" nil "?nm"]
[:speak "English" #{1, 3}]           ["?e" nil nil]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:name "France" #{2}]                ["?e" nil "?nm"]
[:birthday "July 14, 1789" #{2}]     ["?e" nil "?bd"]
[:name "Canada" #{3}]                ["?e" nil "?nm"]
[:birthday "July 1, 1867" #{3}]      ["?e" nil "?bd"]

Use the AVET index

Result Clause                        Result Meta
[:likes "Pizza" #{1}]                ["?e" nil nil]
[:name "USA" #{1}]                   ["?e" nil "?nm"]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:speak "English" #{1}]              ["?e" nil nil]

Page 93: Designing a database like an archaeologist

How does it work?

■ Need to transform the query clauses into predicate clauses
– Each term is transformed into a predicate (a function returning true or false)
■ Need to execute the predicates on the data to find the facts that apply to each clause
– Then AND them
■ Need to extract the output from these facts
– Based on the user’s request
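The first step can be sketched as follows. This is a simplified, hypothetical version written with plain functions: it handles only constants and variables, takes variables as strings, and does not cover expressions such as (birthday-this-month? ?bd). The names `variable-term?`, `term->pred` and `clause->pred-clause` are illustrative; only the :db/variable metadata key comes from the deck's code.

```clojure
;; Hypothetical helper: a term is a variable if it is a string starting with "?".
(defn variable-term? [t]
  (and (string? t) (= \? (first t))))

;; A variable accepts anything; a constant must be matched exactly.
(defn term->pred [t]
  (if (variable-term? t)
    (constantly true)
    #(= % t)))

;; Turn one [entity attribute value] clause into a predicate clause,
;; keeping the variable names as metadata for the reporting step.
(defn clause->pred-clause [clause]
  (with-meta (mapv term->pred clause)
    {:db/variable (mapv #(when (variable-term? %) %) clause)}))

(def pred-clause (clause->pred-clause ["?e" :likes "Pizza"]))
```

The entity predicate of `pred-clause` accepts any entity, the attribute predicate accepts only :likes, and the metadata records that the first position is bound to "?e".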

Page 94: Designing a database like an archaeologist

Reporting the results

■ Transform the result clauses into a binding pairs structure
– A structure that follows an index structure (map->map->set)
■ Now at each level we have a pair: a match from the result clause and the matching item from the meta clause
■ Extract from the binding pairs the variables that the user requested

Page 95: Designing a database like an archaeologist

Result Clause                        Result Meta
[:likes "Pizza" #{1}]                ["?e" nil nil]
[:name "USA" #{1}]                   ["?e" nil "?nm"]
[:birthday "July 4, 1776" #{1}]      ["?e" nil "?bd"]
[:speak "English" #{1}]              ["?e" nil nil]

{[1 "?e"] {[:likes nil]    ["Pizza" nil]
           [:name nil]     ["USA" "?nm"]
           [:speak nil]    ["English" nil]
           [:birthday nil] ["July 4, 1776" "?bd"]}}

Bind pairs structure

Page 96: Designing a database like an archaeologist

(defn single-index-query-plan [query indx db]
  (let [q-res (query-index (indx-at db indx) query)]
    (bind-variables-to-query q-res (indx-at db indx))))

(defn bind-variables-to-query [q-res index]
  (let [seq-res-path (mapcat (partial combine-path-and-meta (from-eav index)) q-res)
        res-path (map #(->> %1 (partition 2) (apply (to-eav index))) seq-res-path)]
    (reduce #(assoc-in %1 (butlast %2) (last %2)) {} res-path)))

Page 97: Designing a database like an archaeologist

(defn combine-path-and-meta [from-eav-fn path]
  (let [expanded-path [(repeat (first path)) (repeat (second path)) (last path)] ; path’s set is cut to items
        meta-of-path (apply from-eav-fn (map repeat (:db/variable (meta path)))) ; meta in index order
        combined-data-and-meta-path (interleave meta-of-path expanded-path)] ; interleaving all
    (apply (partial map vector) combined-data-and-meta-path)))

6-item vector: result and its meta

Page 98: Designing a database like an archaeologist

(defn bind-variables-to-query [q-res index]
  (let [seq-res-path (mapcat (partial combine-path-and-meta (from-eav index)) q-res)
        res-path (map #(->> %1 (partition 2) (apply (to-eav index))) seq-res-path)]
    (reduce #(assoc-in %1 (butlast %2) (last %2)) {} res-path)))

Restructuring the 6-item vector into pairs in an EAV structure

Page 99: Designing a database like an archaeologist

(defn bind-variables-to-query [q-res index]
  (let [seq-res-path (mapcat (partial combine-path-and-meta (from-eav index)) q-res)
        res-path (map #(->> %1 (partition 2) (apply (to-eav index))) seq-res-path)]
    (reduce #(assoc-in %1 (butlast %2) (last %2)) {} res-path)))

Building the 3 pairs into binding pairs structure
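The last two steps can be traced on the pizza results. A sketch under these assumptions: the 6-item rows are written by hand in AVE order, with the variable name (or nil) before each value, which is the order in which combine-path-and-meta interleaves meta and path; `ave->eav` is a hypothetical stand-in for (to-eav index) on an AVET index; each pair is turned into a vector only to keep the result easy to read.

```clojure
;; 6-item rows in AVE order: [a-var a, v-var v, e-var e].
(def six-item-rows
  [[nil :likes    nil   "Pizza"        "?e" 1]
   [nil :name     "?nm" "USA"          "?e" 1]
   [nil :speak    nil   "English"      "?e" 1]
   [nil :birthday "?bd" "July 4, 1776" "?e" 1]])

;; Hypothetical stand-in for (to-eav index) on an AVET index.
(defn ave->eav [a v e] [e a v])

;; Cut each row into 3 pairs and reorder the pairs to EAV.
(def eav-paths
  (map #(apply ave->eav (map vec (partition 2 %))) six-item-rows))

;; Entity pair -> attribute pair -> value pair.
(def bind-pairs
  (reduce #(assoc-in %1 (butlast %2) (last %2)) {} eav-paths))
```

`bind-pairs` ends up as a single-entry map keyed by ["?e" 1], whose value maps each attribute pair to its value pair, for example [nil :name] to ["?nm" "USA"].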

Page 100: Designing a database like an archaeologist

{[1 "?e"] {[:likes nil]    ["Pizza" nil]
           [:name nil]     ["USA" "?nm"]
           [:speak nil]    ["English" nil]
           [:birthday nil] ["July 4, 1776" "?bd"]}}

Reporting

■ We have a superset of the answer

■ Need to take only the variables the user requested

:find [?nm ?bd ]

Page 101: Designing a database like an archaeologist

(defn unify [binded-res-col needed-vars]
  (map (partial locate-vars-in-query-res needed-vars) binded-res-col))

(defn locate-vars-in-query-res [vars-set q-res]
  (let [[e-pair av-map] q-res
        e-res (resultify-bind-pair vars-set [] e-pair)]
    (map (partial resultify-av-pair vars-set e-res) av-map)))

(defn resultify-bind-pair [vars-set accum pair]
  (let [[var-name _] pair]
    (if (contains? vars-set var-name) (conj accum pair) accum)))

(defn resultify-av-pair [vars-set accum-res av-pair]
  (reduce (partial resultify-bind-pair vars-set) accum-res av-pair))

Entity pair => result

Page 102: Designing a database like an archaeologist

(defn resultify-av-pair [vars-set accum-res av-pair]
  (reduce (partial resultify-bind-pair vars-set) accum-res av-pair))

Attribute value pair => result

Page 103: Designing a database like an archaeologist

{[1 "?e"] {[:likes nil]    ["Pizza" nil]
           [:name nil]     ["USA" "?nm"]
           [:speak nil]    ["English" nil]
           [:birthday nil] ["July 4, 1776" "?bd"]}}

:find [?nm ?bd ]

[("?nm" "USA") ("?bd" "July 4, 1776")]
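The extraction can be tried with the four functions from the previous slides, reordered here so that each is defined before it is used. One assumption to note: resultify-bind-pair reads the variable name from the first slot of a pair, so the bind pairs below are written [variable value], matching the pairs in the result shown above. The raw output has one entry per attribute, empty where no requested variable was found, so the sketch drops the empty ones.

```clojure
(defn resultify-bind-pair [vars-set accum pair]
  (let [[var-name _] pair]
    (if (contains? vars-set var-name) (conj accum pair) accum)))

(defn resultify-av-pair [vars-set accum-res av-pair]
  (reduce (partial resultify-bind-pair vars-set) accum-res av-pair))

(defn locate-vars-in-query-res [vars-set q-res]
  (let [[e-pair av-map] q-res
        e-res (resultify-bind-pair vars-set [] e-pair)]
    (map (partial resultify-av-pair vars-set e-res) av-map)))

(defn unify [binded-res-col needed-vars]
  (map (partial locate-vars-in-query-res needed-vars) binded-res-col))

;; Assumed input: bind pairs written [variable value].
(def bind-pairs
  {["?e" 1] {[nil :likes]    [nil "Pizza"]
             [nil :name]     ["?nm" "USA"]
             [nil :speak]    [nil "English"]
             [nil :birthday] ["?bd" "July 4, 1776"]}})

;; Results for the single entity, without the empty per-attribute entries.
(def found
  (->> (unify bind-pairs #{"?nm" "?bd"})
       first
       (remove empty?)
       set))
```

`found` contains one entry holding the pair ["?nm" "USA"] and one holding ["?bd" "July 4, 1776"]; "?e" is skipped because it was not requested in :find.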

Page 104: Designing a database like an archaeologist

Running the show

(defmacro q [db query]
  `(let [pred-clauses# (q-clauses-to-pred-clauses ~(:where query))
         needed-vars# (symbol-col-to-set ~(:find query))
         query-plan# (build-query-plan pred-clauses#)
         query-internal-res# (query-plan# ~db)]
     (unify query-internal-res# needed-vars#)))

Page 105: Designing a database like an archaeologist

Running the show

(defmacro q [db query]
  `(let [pred-clauses# (q-clauses-to-pred-clauses ~(:where query))
         needed-vars# (symbol-col-to-set ~(:find query))
         query-plan# (build-query-plan pred-clauses#)
         query-internal-res# (query-plan# ~db)]
     (unify query-internal-res# needed-vars#)))

Each of these steps does data structure transformation!

Page 106: Designing a database like an archaeologist

Summary

■ We have an in-memory functional DB with
– Transactions
– What-if
– Graph queries
– Evolution queries
– Simple datalog queries
■ 488 lines, of which
– 73 blank
– 55 docstrings
– Total: 360 lines

Page 107: Designing a database like an archaeologist

Summary - what made it possible

■ Priorities
– Normal project: correct > optimized > readable > short
– This project: correct > readable > short > optimized
■ Ignored by design
– Networking
– Durability
– Nothing besides clojure.core
  ■ (Almost succeeded)

Page 108: Designing a database like an archaeologist

Summary - what made it possible

■ Design approach: bottom up
– With occasional top-down
■ Clojure’s magic
– Persistent data structures
– Macros
– Data literals
– HOFs
– Destructuring
■ Clojure approach
– Everything is a library
– Design data structures and write data structure transformation code
– The rest will follow

Page 109: Designing a database like an archaeologist

Thank You