GaianDB
A dynamic distributed federated database
Dale Lane (@dalelane)
A massively over-simplified view of data-warehousing...
The “Internet of Things”
GaianDB
a dynamic distributed federated database
Federated data
Network of distributed databases
A dynamic network
Biologically-Inspired Self-Organisation
Exploit natural selection in nature to build better networks
Robust self-organizing network architectures
Frameworks and algorithms for robust fault-tolerant information dissemination
Robust communications with minimal complexity or human control
Gaian database
[Figure: a network of twelve database nodes, N0 to N11, shown across four slides with an SQL query entering the network]
SQL Queries
Queries routed to all database nodes – a flood query, but retrieving only the data required to satisfy a query
Exchanges query traffic in the network for data traffic – aiming to minimize total traffic
Predicated on a ‘store data locally, read data from anywhere’ paradigm
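The idea in the bullets above, where every node receives the query but only matching data travels back, can be sketched roughly as follows. This is a toy illustration, not GaianDB code: `flood_query`, the node names, and the `meter`/`kwh` fields are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def flood_query(nodes: Dict[str, List[Row]],
                predicate: Callable[[Row], bool],
                columns: List[str]) -> List[Row]:
    """Send the same query to every node; each node filters its own rows."""
    results: List[Row] = []
    for name, rows in nodes.items():
        for row in rows:
            if predicate(row):
                # Only the requested columns of matching rows leave the node.
                projected: Row = {c: row[c] for c in columns}
                projected["node"] = name
                results.append(projected)
    return results

# Hypothetical data: each node stores its own readings locally.
nodes = {
    "N0": [{"meter": "a", "kwh": 3.0}, {"meter": "b", "kwh": 9.5}],
    "N1": [{"meter": "c", "kwh": 12.1}],
    "N2": [{"meter": "d", "kwh": 1.2}],
}
heavy = flood_query(nodes, lambda r: r["kwh"] > 9.0, ["meter"])
```

Here all three nodes see the query, but only two rows (one column each) are returned to the caller.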
Architecture
[Diagram: GaianDB node architecture, with the following labels]
Derby Engine: Parsing, Compilation, Execution
GaianPStmtNode VTI: Executes queries on physical leaf nodes and propagates the original SQL (+ queryID & steps state info) to linked Gaian nodes
Instantiates; invokes costing methods; pushes columns and ‘where’ clause in a structure
Data sources: DB2, Oracle, MS SQL Server, Sybase, MySQL, flat files, in-memory tables, Derby
Other labels: MQ(tt) stream data, text index, Derby tables
Original SQL is propagated to linked GaianDB nodes
[Figure: the network of nodes N0 to N11 receiving an SQL query, shown across two slides, with an expanded node]
Multithreaded, breadth-first query propagation
Loop detection/handling – no duplicates
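A minimal, single-threaded sketch of breadth-first propagation with loop detection is shown below. It assumes each node remembers the IDs of queries it has already handled, which is one plausible reading of the "queryID & steps" state mentioned in the architecture slide. The function name, the `links` topology, and the query ID are hypothetical, and the real system is described as multithreaded.

```python
from collections import deque
from typing import Dict, List, Set

def propagate(links: Dict[str, List[str]], origin: str, query_id: str) -> Dict[str, int]:
    """Breadth-first flood from origin; returns the hop count (steps) per node."""
    # Query IDs each node has already handled (assumed loop-detection state).
    seen: Dict[str, Set[str]] = {node: set() for node in links}
    steps = {origin: 0}
    seen[origin].add(query_id)
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbour in links[node]:
            if query_id in seen[neighbour]:
                continue  # loop detected: this neighbour already ran the query
            seen[neighbour].add(query_id)
            steps[neighbour] = steps[node] + 1
            frontier.append(neighbour)
    return steps

# Hypothetical four-node network containing cycles.
links = {
    "N0": ["N1", "N2"],
    "N1": ["N0", "N2", "N3"],
    "N2": ["N0", "N1", "N3"],
    "N3": ["N1", "N2"],
}
steps = propagate(links, "N0", "q-1")
```

Each node executes the query once, even though the cycles in `links` would otherwise deliver it several times.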
Performance – with 1,250 nodes
[Chart: Query time for 1025 nodes, fetching up to 1025 rows from each. X axis: rows fetched per node (0 to 1200). Y axis: time in milliseconds (0 to 6000). Series: Query Execute Time, Total Query Time, and a linear fit to Total Query Time, y = 4.217x + 349.251]
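As a worked reading of the trend line y = 4.217x + 349.251, the sketch below evaluates it at the largest case named on the slide. It assumes x is rows fetched per node and y is total query time in milliseconds, as the axis labels suggest. The constant and function names are illustrative.

```python
SLOPE_MS_PER_ROW = 4.217   # slope of the fitted line from the chart
INTERCEPT_MS = 349.251     # intercept of the fitted line from the chart
NODES = 1025               # number of nodes in the measurement

def fitted_total_query_time_ms(rows_per_node: float) -> float:
    """Evaluate the chart's linear fit at a given number of rows per node."""
    return SLOPE_MS_PER_ROW * rows_per_node + INTERCEPT_MS

rows_per_node = 1025
total_rows = NODES * rows_per_node                       # rows fetched across the network
estimate_ms = fitted_total_query_time_ms(rows_per_node)  # about 4671.7 ms
```

Under these assumptions, fetching 1025 rows from each of 1025 nodes (a little over one million rows in total) sits at roughly 4.7 seconds on the fitted line.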
[Chart: Query Performance. X axis: number of nodes (0 to 1200). Y axis: query time in milliseconds (0 to 539.0). Series: Average Query Time, Predicted Max (Layers), Predicted Min (Layers)]
Performance questions
The time to propagate a query to all of the nodes in the database, as a function of the number of database nodes (N);
The time to fetch data from across the nodes of the database to a single node, as a function of the volume of data;
The time to fetch data from across the database to multiple nodes concurrently querying, as a function of the number of nodes concurrently querying.
Graph metrics
The eccentricity ε(νi) of a graph vertex νi is the maximum graph distance between νi and any other vertex νj of G, i.e. the "longest shortest path" from νi to any other vertex of the graph.
The maximum eccentricity is the graph diameter Gd. The minimum eccentricity is the graph radius Gr. We define the size of G as the number of vertices N, and the number of connections at each vertex νi as the vertex degree δi (1 ≤ i ≤ N).
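These definitions can be checked on a small example. The sketch below computes eccentricity with a breadth-first search and derives the diameter, radius and vertex degrees from it. It assumes a connected, undirected graph stored as adjacency lists; the five-vertex path graph is just an illustration.

```python
from collections import deque
from typing import Dict, List

def eccentricity(graph: Dict[int, List[int]], v: int) -> int:
    """Longest shortest path from v to any other vertex (graph must be connected)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

# Illustrative path graph: 0 - 1 - 2 - 3 - 4
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

ecc = {v: eccentricity(graph, v) for v in graph}
diameter = max(ecc.values())                 # Gd: maximum eccentricity
radius = min(ecc.values())                   # Gr: minimum eccentricity
degree = {v: len(graph[v]) for v in graph}   # vertex degree at each vertex
```

For the path graph, the end vertices have the largest eccentricity and the middle vertex the smallest.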
Biologically inspired self-organisation
[Chart: X axis: number of nodes N (0 to 1000). Y axis: graph dimension in edges (0 to 10). Series: Radius, Diameter, (1+e)ln(N), (1-e)ln(N)]
Network growth by preferential attachment
Using a fitness function at each node
Limit maximum vertex degree = 10
Gd = nint [ (1+e) * ln(N) ]
Gr = nint [ (1-e) * ln(N) ]
e = 0.24
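The sketch below evaluates the two fitted formulas and grows a toy network by preferential attachment under the degree cap. Several things are assumptions made for illustration and are not stated in the slides: the fitness function is taken to be a node's current degree, each new node makes two links, and `nint` rounds halves up.

```python
import math
import random
from typing import Dict, Set

E = 0.24          # fitted constant e from the slide
MAX_DEGREE = 10   # maximum vertex degree from the slide

def nint(x: float) -> int:
    """Nearest integer (halves rounded up, an assumed convention)."""
    return math.floor(x + 0.5)

def predicted_diameter(n: int) -> int:
    return nint((1 + E) * math.log(n))

def predicted_radius(n: int) -> int:
    return nint((1 - E) * math.log(n))

def grow(n: int, links_per_node: int = 2, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow an n-node network; new nodes prefer high-degree nodes below the cap."""
    rng = random.Random(seed)
    graph: Dict[int, Set[int]] = {0: {1}, 1: {0}}
    for new in range(2, n):
        # Nodes already at the degree cap cannot accept further links.
        candidates = [v for v in graph if len(graph[v]) < MAX_DEGREE]
        graph[new] = set()
        for _ in range(min(links_per_node, len(candidates))):
            weights = [len(graph[v]) for v in candidates]  # fitness = degree (assumed)
            target = rng.choices(candidates, weights=weights, k=1)[0]
            graph[new].add(target)
            graph[target].add(new)
            candidates.remove(target)
    return graph

network = grow(200)
largest_degree = max(len(neighbours) for neighbours in network.values())
```

For N = 1000 the formulas give a predicted diameter of 9 and a predicted radius of 5, which is consistent with the 0 to 10 range on the chart's vertical axis.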
Query propagation time
The predicted maximum (Tmax) and minimum (Tmin) times to execute the flood query are:
Tmax = (Gd + 1)(TL + Tp)
Tmin = (Gr + 1)(TL + Tp)
where TL = link latency and Tp = processor delay,
with the predicted execute query time from any node ν (Tν) being:
Tν = (ε(ν) + 1)(TL + Tp)
Hence, substituting ε(ν) = nint[B * ln(N)]:
Tν = (nint[B * ln(N)] + 1)(TL + Tp)
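To make the Tmin and Tmax formulas concrete, the sketch below combines them with Gd = nint[(1+e) ln(N)] and Gr = nint[(1-e) ln(N)]. The latency and delay values in the example are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements from the slides.

```python
import math
from typing import Tuple

def nint(x: float) -> int:
    """Nearest integer (halves rounded up, an assumed convention)."""
    return math.floor(x + 0.5)

def propagation_time_ms(hops: int, link_latency_ms: float, processor_delay_ms: float) -> float:
    """T = (hops + 1)(TL + Tp), where hops is an eccentricity, radius or diameter."""
    return (hops + 1) * (link_latency_ms + processor_delay_ms)

def predicted_bounds_ms(n: int, link_latency_ms: float, processor_delay_ms: float,
                        e: float = 0.24) -> Tuple[float, float]:
    """Return (Tmin, Tmax) for an n-node network."""
    radius = nint((1 - e) * math.log(n))     # Gr
    diameter = nint((1 + e) * math.log(n))   # Gd
    t_min = propagation_time_ms(radius, link_latency_ms, processor_delay_ms)
    t_max = propagation_time_ms(diameter, link_latency_ms, processor_delay_ms)
    return t_min, t_max

# Hypothetical values: TL = 50 ms, Tp = 4 ms, N = 1000 nodes.
t_min, t_max = predicted_bounds_ms(1000, 50.0, 4.0)
```

With these example values, Gr = 5 and Gd = 9, so the bounds are 6 x 54 = 324 ms and 10 x 54 = 540 ms.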
Measured query propagation
[Chart: Individual Query Time Scalability. X axis: number of nodes (0 to 1200). Y axis: query time in ms (0 to 592.9). Series: Average Query Time, Predicted Max (Diameter+1), Predicted Min (Radius+1), Queried node eccentricity+1]
[Chart: Individual Query Time Scalability, for up to 100 nodes. X axis: number of nodes (0 to 100). Y axis: query time in ms (0 to 323.4). Series: Individual Query Times, Average Query Time, Queried node eccentricity+1]
Measured data fetch
[Chart: Query time to fetch 1 million rows. X axis: total rows fetched (0 to 1,200,000). Y axis: time in milliseconds (0 to 6000). Series: Total Query Time 1025 nodes, Total Query Time 1 node, Total Query Time 1 node indexed, with linear fits for the 1025-node and 1-node series. Fit equations shown: y = 4.217x + 349.251 and y = 1.7383x + 678.141]
Example uses
Smart Metering: centralised write
Smart Metering: centralised read
Smart Metering: distributed federated write
Smart Metering: distributed federated read
Other uses...
http://www.alphaworks.ibm.com/tech/gaiandb
Image credits
Background: YouTube video “The Internet of Things”, IBM http://www.youtube.com/watch?v=sfEbMV295Kk
Icons: DB and envelope icons, Tim Morgan http://flickr.com/photos/timothymorgan/sets/1615269
Microsoft Excel icon, Vincent Garnier (courtesy of IconArchive) http://iconarchive.com/show/softdimension-icons-by-benjigarner/Excel-icon.html
Photo of car mechanics, Tomas http://flickr.com/photos/tma/2264878
All other images original from GaianDB work