Memory is the new disk, disk is the new tape, Bela Ban (JBoss by Red Hat)
Memory is the new disk, disk is the new tape
Bela Ban, JBoss / Red Hat
Motivation
● We want to store our data in memory
– Memory access is faster than disk access
– Even across a network
– A DB requires network communication, too
● The disk is used for archival purposes
● Not a replacement for DBs!
– Only a key-value store
– NoSQL
Problems
● #1: How do we provide memory large enough to store the data (e.g. 2 TB of memory)?
● #2: How do we guarantee persistence?
– Survival of data between reboots / crashes
#1: Large memory
● We aggregate the memory of all nodes in a cluster into a large virtual memory space
– 100 nodes × 10 GB = 1 TB of virtual memory
#2: Persistence
● We store keys redundantly on multiple nodes
– Unless all nodes on which key K is stored crash at the same time, K is persistent
● We can also store the data on disk
– To prevent data loss in case all cluster nodes crash
– This can be done asynchronously, on a background thread
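The persistence guarantee above can be quantified with a quick back-of-the-envelope sketch. With K independent copies of a key, the key is lost only if all K owners fail at once; assuming independent failures, that probability is p^K. (The per-node failure probability below is an assumed, illustrative number, not from the talk.)

```java
// Sketch: probability that a key is lost, assuming its K owners fail
// independently. p = 0.01 is an assumed, illustrative per-node failure
// probability, not a measured value.
public class LossProbability {
    public static double lossProbability(double perNodeFailure, int copies) {
        return Math.pow(perNodeFailure, copies);
    }

    public static void main(String[] args) {
        double p = 0.01; // assumed per-node failure probability
        System.out.println(lossProbability(p, 1)); // 1 copy   -> ~1e-2
        System.out.println(lossProbability(p, 2)); // 2 copies -> ~1e-4
        System.out.println(lossProbability(p, 3)); // 3 copies -> ~1e-6
    }
}
```

Each extra copy multiplies the loss probability by p, which is why even a small replication count (2 or 3) already makes simultaneous loss very unlikely.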
How do we provide redundancy ?
Store every key on every node
A B C D
K1 K1 K1 K1
K2 K2 K2 K2
K3 K3 K3 K3
K4 K4 K4 K4
● RAID 1
● Pro: data is available everywhere
– No network round trip
– Data loss only when all nodes crash
● Con: we can only use 25% of our memory
Store every key on 1 node only
A B C D
K1 K2 K3 K4
● RAID 0, JBOD
● Pro: we can use 100% of our memory
● Con: data loss on node crash
– No redundancy
Store every key on K nodes
A B C D
K1 K1
K2 K2
K3 K3
K4 K4
● K is configurable (2 in the example)
● Variable RAID
● Pro: we can use a variable % of our memory
– User determines tradeoff between memory consumption and risk of data loss
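The memory tradeoff across the three schemes reduces to a simple formula: with K copies per key, only 1/K of the aggregate cluster memory is usable. K = 1 gives the RAID 0 case (100%), K = number of nodes gives the RAID 1 case (25% with 4 nodes, as in the slide). A small sketch with illustrative numbers:

```java
// Sketch: usable fraction of aggregate cluster memory when every key is
// stored K times. K = 1 reproduces RAID 0, K = cluster size reproduces
// RAID 1, anything in between is the "variable RAID" scheme.
public class UsableMemory {
    public static double usableFraction(int copiesPerKey) {
        return 1.0 / copiesPerKey;
    }

    public static long usableGb(long perNodeGb, int nodes, int copiesPerKey) {
        return perNodeGb * nodes / copiesPerKey;
    }

    public static void main(String[] args) {
        // 4 nodes of 10 GB, K = 4 (RAID 1): 25% usable, as in the slide
        System.out.println(usableFraction(4));     // 0.25
        // K = 2 (variable RAID): half the 40 GB aggregate is usable
        System.out.println(usableGb(10, 4, 2));    // 20
    }
}
```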
So how do we determine on which nodes the keys are stored?
Consistent hashing
● Given a key K and a set of nodes, CH(K) will always pick the same node P for K
– We can also pick a list {P,Q} for K
● Anyone 'knows' that K is on P
● If P leaves, CH(K) will pick another node Q and rebalance the affected keys
● A good CH will rebalance at most 1/N of the keys (where N = number of cluster nodes)
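The picking-and-rebalancing behaviour can be sketched with a toy hash ring (a minimal illustration only; the actual JGroups consistent-hash implementation differs). Nodes sit at fixed positions on a ring of hash values; a key is owned by the first node clockwise from its own hash, and the next distinct node serves as backup. Because node positions do not depend on each other, removing a node only moves the keys that node owned:

```java
import java.util.*;

// Toy consistent-hash ring: CH(K) deterministically maps a key to an
// ordered list of owner nodes {P, Q, ...}. Illustrative only, not the
// JGroups implementation.
public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node)    { ring.put(node.hashCode() & 0x7fffffff, node); }
    public void removeNode(String node) { ring.remove(node.hashCode() & 0x7fffffff); }

    // Primary owner: first node at or after the key's position (wrapping around).
    public String primary(String key) {
        int h = key.hashCode() & 0x7fffffff;
        Map.Entry<Integer, String> e = ring.ceilingEntry(h);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // The first `copies` distinct nodes clockwise from the key's position.
    public List<String> owners(String key, int copies) {
        int h = key.hashCode() & 0x7fffffff;
        List<String> nodes = new ArrayList<>(ring.tailMap(h).values());
        nodes.addAll(ring.headMap(h).values());
        return nodes.subList(0, Math.min(copies, nodes.size()));
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        for (String n : List.of("A", "B", "C", "D")) ring.addNode(n);
        String before = ring.primary("K2");
        ring.removeNode(before);           // the primary owner crashes
        String after = ring.primary("K2"); // the next node on the ring takes over
        System.out.println(before + " -> " + after);
    }
}
```

The key property is determinism: every node can compute CH(K) locally, so "anyone knows that K is on P" without a central directory.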
Example
A B C D
K1 K1
K2 K2
K3 K3
K4 K4
● K2 is stored on B (primary owner) and C (backup owner)
Example
A B C D
K1 K1
K2 K2
K3 K3
K4 K4
● Node B now crashes
Example
● C (the backup owner of K2) copies K2 to D
– C is now the primary owner of K2
● A copies K1 to C
– C is now the backup owner of K1
A B C D
K1 K1 K1
K2 K2 K2
K3 K3
K4 K4
Rebalancing
● Unless all N owners of a key K crash exactly at the same time, K is always stored redundantly
● When less than N owners crash, rebalancing will copy/move keys to other nodes, so that we have N owners again
Enter ReplCache
● ReplCache is a distributed hashmap spanning the entire cluster
● Operations: put(K,V), get(K), remove(K)
● For every key, we can define how many times we'd like it to be stored in the cluster
– 1: RAID 0
– -1: RAID 1
– N: variable RAID
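These per-key semantics can be illustrated with a minimal in-process sketch. This is not the real JGroups ReplCache API (its signatures and placement logic differ); it only shows the idea that each put() carries its own replication count, with -1 meaning "mirror to all nodes":

```java
import java.util.*;

// In-process sketch of the ReplCache idea: a put() carries a per-key
// replication count. 1 = one owner (RAID 0), -1 = every node (RAID 1),
// N = variable RAID. The node list stands in for cluster members;
// the placement rule here is a simplification, not the JGroups one.
public class MiniReplCache {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    public MiniReplCache(int clusterSize) {
        for (int i = 0; i < clusterSize; i++) nodes.add(new HashMap<>());
    }

    public void put(String key, String value, int replCount) {
        int copies = (replCount == -1) ? nodes.size() : replCount;
        int start = (key.hashCode() & 0x7fffffff) % nodes.size();
        for (int i = 0; i < copies; i++)                 // consecutive nodes own the key
            nodes.get((start + i) % nodes.size()).put(key, value);
    }

    public String get(String key) {
        for (Map<String, String> node : nodes) {
            String v = node.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public void remove(String key) {
        for (Map<String, String> node : nodes) node.remove(key);
    }

    public long copiesOf(String key) {
        return nodes.stream().filter(n -> n.containsKey(key)).count();
    }
}
```

Usage: `cache.put("session-1", "alice", 2)` stores the value on two nodes, while `cache.put("config", "prod", -1)` mirrors it everywhere.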
Use of ReplCache
[Diagram: HTTP clients reach Apache (mod_jk), which load-balances across several JBoss servlet containers; each container embeds a ReplCache instance, the ReplCache instances form the cluster, and a DB sits behind them.]
Demo
Use cases
● JBoss AS: session distribution using Infinispan
– For data scalability, sessions are stored only N times in a cluster
● GridFS (Infinispan)
– I/O over grid
– Files are chunked into slices; each slice is stored in the grid (redundantly if needed)
– Store a 4 GB DVD in a grid where each node has only 2 GB of heap
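The chunking idea can be sketched as follows (illustrative slice size; Infinispan's GridFS uses its own chunk size and key scheme). Each slice becomes an independent key that the grid places on whichever nodes have room, so no single node ever has to hold the whole file:

```java
import java.util.*;

// Sketch of GridFS-style chunking: cut a file into fixed-size slices.
// Each slice can then be stored under its own key (e.g. "<name>#<index>"),
// redundantly if requested, so a file larger than any one node's heap
// still fits in the grid. Illustrative only.
public class Chunker {
    public static List<byte[]> chunk(byte[] data, int sliceSize) {
        List<byte[]> slices = new ArrayList<>();
        for (int off = 0; off < data.length; off += sliceSize)
            slices.add(Arrays.copyOfRange(data, off, Math.min(off + sliceSize, data.length)));
        return slices;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10_000];          // stand-in for a large file
        List<byte[]> slices = chunk(file, 4096); // slices of 4096, 4096, 1808 bytes
        System.out.println(slices.size());       // 3
    }
}
```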
Use cases
● Hibernate Over Grid (OGM)
– Replaces the DB backend with an Infinispan-backed grid
Conclusion
● Given enough nodes in a cluster, we can provide persistence for data
● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key
● Ideal for data sets which need to be accessed quickly
– For the paranoid we can still stream to disk
Conclusion
● Data is distributed over a grid
– Cache is closer to clients
– No bottleneck to the DBMS
– Keys are on different nodes
Conclusion
[Diagram: many clients, each connecting to one of several caches in the grid.]
Questions?
● Demo (JGroups): http://www.jgroups.org
● Infinispan: http://www.infinispan.org
● OGM: http://community.jboss.org/en/hibernate/ogm