Advanced Non-Relational Schemas for Big Data
by Victor Smirnov
Non-Relational Schema
● Is just a data structure
● That uses some Memory Model
● Typically, a Key->Value mapping
● Where Key is an integer ID
● And Value is an arbitrary array of limited size, or a memory block
● It's assumed that operations on memory blocks are atomic.
Storage Options
Partial (Prefix) Sums Tree
● Given a sequence S[0, N) = s0...sN-1 of non-negative integers
● Sum(i) returns X = s0+s1+...+si
● FindLT(X) returns the position i of the largest Sum(i) < X
● FindLE(X) is the same, but for Sum(i) <= X
● We can also define range versions: Sum(i, j) and FindLT(j, X)
● All operations run in O(log N) time.
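A minimal sketch of the operations above, assuming a perfectly balanced binary tree packed into a flat array (node k has children 2k and 2k+1, leaves padded with zeroes up to a power of two). The class name `PartialSumsTree`, the method names, and the convention of returning -1 when no position qualifies are illustrative assumptions, not taken from the slides.

```python
class PartialSumsTree:
    """Balanced partial sums tree packed into one array (heap layout)."""

    def __init__(self, values):
        values = list(values)
        if any(v < 0 for v in values):
            raise ValueError("values must be non-negative")
        self.n = len(values)
        size = 1
        while size < max(self.n, 1):
            size *= 2
        self.size = size                      # number of leaves (power of two)
        self.tree = [0] * (2 * size)          # tree[1] is the root
        self.tree[size:size + self.n] = values
        for node in range(size - 1, 0, -1):   # each node stores its subtree sum
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def sum(self, i):
        """Sum(i) = s0 + s1 + ... + si (inclusive)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        node = self.size + i
        total = self.tree[node]
        while node > 1:
            if node % 2 == 1:                 # right child: add the left sibling
                total += self.tree[node - 1]
            node //= 2
        return total

    def _find(self, x, strict):
        """Largest i with Sum(i) < x (strict) or Sum(i) <= x, else -1."""
        def fits(value):
            return value < x if strict else value <= x

        node, acc = 1, 0
        while node < self.size:
            left = self.tree[2 * node]
            if fits(acc + left):              # whole left subtree qualifies
                acc += left
                node = 2 * node + 1
            else:
                node = 2 * node
        leaf = node - self.size
        if not fits(acc + self.tree[node]):
            leaf -= 1
        return min(leaf, self.n - 1)          # ignore zero padding

    def find_lt(self, x):
        return self._find(x, strict=True)

    def find_le(self, x):
        return self._find(x, strict=False)


tree = PartialSumsTree([3, 0, 2, 5, 1])       # prefix sums: 3, 3, 5, 10, 11
print(tree.sum(2), tree.find_lt(5), tree.find_le(5))
```

Both `sum` and the two find operations walk a single root-to-leaf path, so they take O(log N) steps. Because the tree lives in one contiguous array, it can be stored inside a single memory block.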
Packing Perfect Balanced Tree into an Array
Some Performance Bits
[Figure: PackedTree random read performance, 1 million random reads. X axis: Memory Block Size, Kb (1 to 262144, logarithmic scale, with L1, L2, L3 and RAM regions marked). Y axis: Performance, operations/sec (0 to 5e+07). Series: PackedTree<BigInt>, 2 children; PackedTree<BigInt>, 32 children; std::set<BigInt>, 2 children.]
Dynamic Vector
● An ordered sequence of N elements (bytes, integers, strings)
● Access(i) is O(log N)
● Insert(i, value) is O(log N)
● Delete(i) is O(log N)
● We can also define batch operations:
● Insert(i, value[])
● Delete(i, j)
● Split(i); Merge(AnotherVector); ...
Dynamic Vector
Dynamic Vector Operations
● FindLT(i) returns the block B that contains position i, and the offset j of i within B
● Access(i) is O(log N)
● Insert(i, value) and Delete(i) are also O(log N) because the tree is balanced.
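The block lookup described above can be sketched as a vector split into bounded-size blocks. This is a simplified illustration, not the author's implementation: the name `BlockedVector`, the `max_block` parameter, and the split-in-half policy are assumptions, and the block sizes are scanned linearly here where the slides use a balanced partial sums tree to reach O(log N).

```python
class BlockedVector:
    """Sequence stored as a list of small blocks (hypothetical sketch)."""

    def __init__(self, max_block=4):
        self.max_block = max_block
        self.blocks = [[]]

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def _locate(self, i):
        """Return (block index, offset in block) for position i."""
        if i < 0:
            raise IndexError(i)
        # Linear scan over block sizes; a partial sums tree over the
        # sizes would answer the same question in O(log N).
        for b, block in enumerate(self.blocks):
            if i < len(block):
                return b, i
            i -= len(block)
        raise IndexError("position out of range")

    def access(self, i):
        b, j = self._locate(i)
        return self.blocks[b][j]

    def insert(self, i, value):
        n = len(self)
        if not 0 <= i <= n:
            raise IndexError(i)
        if i == n:                                # append to the last block
            b = len(self.blocks) - 1
            j = len(self.blocks[b])
        else:
            b, j = self._locate(i)
        block = self.blocks[b]
        block.insert(j, value)
        if len(block) > self.max_block:           # split an overflowing block
            mid = len(block) // 2
            self.blocks[b:b + 1] = [block[:mid], block[mid:]]

    def delete(self, i):
        b, j = self._locate(i)
        value = self.blocks[b].pop(j)
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]                    # drop an emptied block
        return value


vec = BlockedVector(max_block=4)
for k in range(10):
    vec.insert(k, k)
vec.insert(3, 99)
print(vec.access(3), [len(block) for block in vec.blocks])
```

Each insert or delete touches only one block, plus the size index, which is what keeps updates cheap when the index itself is a balanced tree.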
File System: Map<ID, Vector<T>>
● Maps ID to Vector<T>
● Merge all values into one large Dynamic Vector, in ID order
● Create a separate “index” sequence from pairs <ID, Offset>, in ID order
● We can represent this “index” sequence as two partial sums trees, one for ID and one for Offset
● We can merge these two trees into one because they have exactly the same structure: a multi-index balanced partial sums tree.
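A minimal sketch of this layout, assuming non-negative integer IDs. The index is kept as two parallel sequences, ID deltas and vector sizes, whose prefix sums give the ID and the end offset of each entry. Plain lists with `itertools.accumulate` stand in for the partial sums trees, and the function names `build` and `lookup` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def build(mapping):
    """Flatten {id: list} into (id_deltas, sizes, data), in ID order."""
    id_deltas, sizes, data = [], [], []
    previous_id = 0
    for key in sorted(mapping):
        id_deltas.append(key - previous_id)   # prefix sums recover the IDs
        sizes.append(len(mapping[key]))       # prefix sums recover the offsets
        data.extend(mapping[key])
        previous_id = key
    return id_deltas, sizes, data


def lookup(id_deltas, sizes, data, key):
    """Return the vector stored under key, or None if key is absent."""
    # Rebuilding the prefix sums costs O(N) per call; a partial sums
    # tree would find the same entry in O(log N).
    id_sums = list(accumulate(id_deltas))
    k = bisect_left(id_sums, key)
    if k == len(id_sums) or id_sums[k] != key:
        return None
    end = sum(sizes[:k + 1])
    start = end - sizes[k]
    return data[start:end]


index = build({7: ["a", "b"], 2: ["x"], 10: [], 15: ["p", "q", "r"]})
print(index)
print(lookup(*index, 7))
```

Because `id_deltas` and `sizes` always have the same length and are updated together, they can share one tree whose nodes carry two sums each.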
Map<ID, Vector<T>>
Sharing Tree Structures
● Tree structure sharing saves both space and time: the SPMD principle (single program, multiple data)
● We can align partial sums trees with different structures using interpolation (padding with zeroes)
● We can merge the index and data streams of Map<ID, Vector<T>> into one multi-stream tree
● When merging the trees, we try to place index pairs and their corresponding data in the same leaf node of the multi-stream tree.
Multistream Tree Node Layout
Multistream Balanced Tree
ACID
● Atomic block operations are not enough
● Even a simple tree update affects several blocks
● So, ACID is mandatory for advanced non-relational schemas
● We can get ACID for free with Multi-Version Concurrency Control (MVCC)
● We need a Version History over data blocks
● Where each transaction is a version.
Transaction History via MVCC
Version History Implementation
● Version History maps a pair <ID, Version> to the ID of the real data block for that version and the given ID
● We have Map<ID, Vector<Version, ID>>
● We can turn it into a Version History by sorting each Vector<Version, ID> (less space, slower)
● Or by creating an additional partial sums tree index on top of it (more space, but much faster)
● We can do it in just one multi-stream balanced tree
● MVCC requires some other data structures, but they can be designed by analogy.
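The sorted-vector variant can be sketched as follows. This assumes versions are totally ordered integers and that a read at version V should see the newest block written at or before V; both are assumptions made for illustration, since the slides do not specify the lookup rule or whether the history branches. The class and method names are hypothetical.

```python
from bisect import bisect_right


class VersionHistory:
    """Maps (logical ID, version) to a physical block ID (sketch)."""

    def __init__(self):
        self._versions = {}   # logical ID -> sorted list of versions
        self._blocks = {}     # logical ID -> physical IDs, parallel to _versions

    def record(self, logical_id, version, physical_id):
        """Register the physical block written for logical_id in version."""
        versions = self._versions.setdefault(logical_id, [])
        blocks = self._blocks.setdefault(logical_id, [])
        pos = bisect_right(versions, version)
        if pos > 0 and versions[pos - 1] == version:
            blocks[pos - 1] = physical_id      # same version: overwrite
        else:
            versions.insert(pos, version)      # keep the vector sorted
            blocks.insert(pos, physical_id)

    def resolve(self, logical_id, version):
        """Newest physical block at or before version, or None."""
        versions = self._versions.get(logical_id, [])
        pos = bisect_right(versions, version)
        if pos == 0:
            return None
        return self._blocks[logical_id][pos - 1]


history = VersionHistory()
history.record(1, 10, 100)
history.record(1, 30, 300)
history.record(1, 20, 200)
print(history.resolve(1, 25))
```

Readers at an older version keep resolving to the blocks that existed at that version, so writers never overwrite data that a running transaction can still see.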
Concurrency Handling
● Version History is a complicated data structure
● Concurrent access to it must be restricted
● Split the whole Version History into shards
● And shard blocks by ID to reduce lock contention on the Version History
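One way to picture sharding by ID is a table split into independent shards, each guarded by its own lock. The modulo placement rule, the shard count, and the name `ShardedHistory` are assumptions for this sketch; the slides do not say how IDs are assigned to shards.

```python
import threading


class ShardedHistory:
    """(block ID, version) -> physical ID table, split into locked shards."""

    def __init__(self, shard_count=4):
        # Each shard has its own table and its own lock, so threads
        # working on different shards never wait for each other.
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def shard_index(self, block_id):
        return block_id % len(self._shards)    # assumed placement rule

    def put(self, block_id, version, physical_id):
        table, lock = self._shards[self.shard_index(block_id)]
        with lock:
            table[(block_id, version)] = physical_id

    def get(self, block_id, version):
        table, lock = self._shards[self.shard_index(block_id)]
        with lock:
            return table.get((block_id, version))


history = ShardedHistory(shard_count=4)
history.put(6, 1, 60)
print(history.shard_index(6), history.get(6, 1))
```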
Distributed Storage and Processing
● MVCC is very Raft/Paxos-friendly
● Because of its Version History
● So we can join storage nodes into Raft groups
● And join Raft groups into larger groups with 2PC
● Using a split/merge model to map data to nodes.
Bonus Slides
Searchable Bitmaps
● rank1(n) = number of ones in [0, n)
● select1(i) = position of the i-th 1 in the bitmap
● rank0(n) = number of zeroes in [0, n)
● select0(i) = position of the i-th 0 in the bitmap
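A small sketch of these four operations, assuming a static bitmap, a per-block table of cumulative counts for rank, and a binary search over rank for select. The i in select is taken as 1-based here (select1(1) is the position of the first 1), which is an assumption since the slide does not fix the convention. The class name and the `block` parameter are hypothetical.

```python
class SearchableBitmap:
    """Static bitmap with rank and select (illustrative sketch)."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        self.block_ranks = [0]        # ones before the start of each block
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, n):
        """Number of ones in [0, n)."""
        if not 0 <= n <= len(self.bits):
            raise IndexError(n)
        b = n // self.block
        return self.block_ranks[b] + sum(self.bits[b * self.block:n])

    def rank0(self, n):
        """Number of zeroes in [0, n)."""
        return n - self.rank1(n)

    def _select(self, i, rank):
        total = rank(len(self.bits))
        if not 1 <= i <= total:
            raise IndexError(i)
        lo, hi = 0, len(self.bits) - 1
        while lo < hi:                # smallest p with rank(p + 1) >= i
            mid = (lo + hi) // 2
            if rank(mid + 1) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def select1(self, i):
        """Position of the i-th 1 (1-based i)."""
        return self._select(i, self.rank1)

    def select0(self, i):
        """Position of the i-th 0 (1-based i)."""
        return self._select(i, self.rank0)


bitmap = SearchableBitmap([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=4)
print(bitmap.rank1(4), bitmap.select1(4), bitmap.select0(4))
```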
Searchable Bitmap: Structure
Searchable Bitmaps: Views
LOUDS Tree
LOUDS Tree: Parent()
Wavelet Tree
● Searchable sequence [0...N) for large alphabets
● Rank(i, s) returns the number of symbols s in [0, i)
● Select(k, s) returns the position i of the k-th symbol s
● Insert(i, s), Delete(i), Access(i): insert, remove and access the symbol at position i, respectively
● All these operations have O(log N) time complexity
● By mapping numbers to symbols we can perform the following lookup operations: >, >=, <, <=, <> in O(log N) time.
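To illustrate Rank, Access, Select and the "<" lookup, here is a sketch of a static, pointer-based wavelet tree over integer symbols. It is a simplification: Insert and Delete are left out, Select is done by binary search over Rank rather than by a direct descent, and k is taken as 1-based. The class name and `count_less` are hypothetical names, not from the slides.

```python
class WaveletTree:
    """Static wavelet tree over integers (illustrative sketch)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi, self.size = lo, hi, len(seq)
        self.left = self.right = None
        self.prefix = [0]   # prefix[i] = symbols in seq[:i] sent to the left child
        if lo < hi and seq:
            mid = (lo + hi) // 2
            for s in seq:
                self.prefix.append(self.prefix[-1] + (1 if s <= mid else 0))
            self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
            self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Symbol at position i."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        node = self
        while node.lo < node.hi:
            went_left = node.prefix[i]
            if node.prefix[i + 1] > went_left:
                node, i = node.left, went_left
            else:
                node, i = node.right, i - went_left
        return node.lo

    def rank(self, i, s):
        """Number of symbols s in [0, i)."""
        if not 0 <= i <= self.size:
            raise IndexError(i)
        node = self
        while node.lo < node.hi and node.size:
            went_left = node.prefix[i]
            if s <= (node.lo + node.hi) // 2:
                node, i = node.left, went_left
            else:
                node, i = node.right, i - went_left
        return i if node.lo == node.hi == s else 0

    def select(self, k, s):
        """Position of the k-th symbol s (1-based k), via search over rank."""
        if not 1 <= k <= self.rank(self.size, s):
            raise IndexError(k)
        lo, hi = 0, self.size - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid + 1, s) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def count_less(self, i, x):
        """Number of symbols smaller than x in [0, i)."""
        if not 0 <= i <= self.size:
            raise IndexError(i)
        node, total = self, 0
        while node.lo < node.hi and node.size:
            went_left = node.prefix[i]
            if x <= (node.lo + node.hi) // 2:
                node, i = node.left, went_left
            else:
                total += went_left            # everything on the left is < x
                node, i = node.right, i - went_left
        if node.lo == node.hi and node.lo < x:
            total += i
        return total


tree = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
print(tree.rank(10, 5), tree.select(2, 5), tree.count_less(10, 4))
```

Each level of the tree halves the symbol range, so `access`, `rank` and `count_less` visit one node per level. A dynamic version would replace the `prefix` lists with searchable bitmaps that support insertion.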
Wavelet Tree: Structure
Wavelet Tree: Rank
Wavelet Tree: Inverted Index
Inverted Index Lookup
Thanks! More details are at:
https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData