Next generation databases july2010
![Page 1: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/1.jpg)
© 2010 Quest Software, Inc. ALL RIGHTS RESERVED
This is Not Your Father’s Database: Everything You Need to Know Now About Cloud Computing and Emerging Database Technology
Guy Harrison
Director Research and Development, Melbourne
www.guyharrison.net
![Page 2: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/2.jpg)
Introductions
![Page 3: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/3.jpg)
![Page 4: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/4.jpg)
![Page 5: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/5.jpg)
• Mainframes
• Minicomputers
• Client Server
• Internet/Y2K Boom
• After the gold rush
![Page 6: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/6.jpg)
Current Day Trends
• Big Data
• Cloud computing
• Solid State Disk
![Page 7: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/7.jpg)
Big Data
• The Industrial Revolution of data*
– User generated data: Twitter, Facebook, Amazon
– Machine generated data: RFID, POS, cell phones, GPS
• Traditional RDBMS neither economical nor capable
* http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html
![Page 8: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/8.jpg)
Big data 1: Google
![Page 9: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/9.jpg)
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
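The many-Maps-then-Reduce shape of the diagram can be sketched in a few lines. This is a minimal single-process illustration using word counting, not how Google or Hadoop implement it; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample input are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Each document could be mapped on a different node; here they run in turn.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = map_reduce(["big data big clusters", "big data"])
print(counts)
```

Because each Map call touches only its own split, the Map step can be spread over as many machines as there are splits; only the grouping step needs to move data between them.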
![Page 10: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/10.jpg)
Hadoop: Open source Map-reduce
• Yahoo! Hadoop cluster:
– 4000 nodes
– 16PB disk
– 64 TB of RAM
– 32,000 Cores
– Very Low $/TB
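Dividing the cluster totals by the node count gives a feel for how ordinary each machine is. The arithmetic below is a rough sketch that assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
nodes = 4000
disk_tb = 16 * 1000   # 16 PB, assuming 1 PB = 1000 TB
ram_gb = 64 * 1000    # 64 TB, assuming 1 TB = 1000 GB
cores = 32_000

# Per-node averages, assuming resources are spread evenly.
disk_per_node_tb = disk_tb / nodes
ram_per_node_gb = ram_gb / nodes
cores_per_node = cores / nodes

print(f"{disk_per_node_tb} TB disk, {ram_per_node_gb} GB RAM, "
      f"{cores_per_node} cores per node")
```

Under those assumptions each node averages about 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity hardware rather than a high-end database server.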
![Page 11: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/11.jpg)
Hive
[Diagram labels: SQL, Java, Results]
![Page 12: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/12.jpg)
Big Data 2: Web 2.0
![Page 13: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/13.jpg)
Twitter Growth
![Page 14: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/14.jpg)
The fail whale
![Page 15: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/15.jpg)
[Diagram: Web Servers; Memcached Servers; Database Servers split into Shard (A-F), Shard (G-O) and Shard (P-Z); Read Only Slaves]
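The routing logic that the application has to carry in this architecture can be sketched as follows. The shard ranges come from the diagram; everything else (the names `shard_for`, `save_user`, `load_user`, and plain dictionaries standing in for memcached and the database servers) is a hypothetical illustration.

```python
import bisect

# Inclusive upper bound of each shard's key range, in sorted order.
SHARD_UPPER_BOUNDS = ["F", "O", "Z"]
SHARD_NAMES = ["shard_a_f", "shard_g_o", "shard_p_z"]

cache = {}                                   # stands in for memcached
shards = {name: {} for name in SHARD_NAMES}  # stands in for the database servers


def shard_for(username):
    """Pick the shard whose letter range covers the first character."""
    if not username:
        raise ValueError("empty username")
    first = username[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError(f"no shard covers {username!r}")
    return SHARD_NAMES[bisect.bisect_left(SHARD_UPPER_BOUNDS, first)]


def save_user(username, profile):
    shards[shard_for(username)][username] = profile
    cache.pop(username, None)  # drop any stale cached copy


def load_user(username):
    """Read-through cache: try the cache first, then the owning shard."""
    if username in cache:
        return cache[username]
    profile = shards[shard_for(username)].get(username)
    if profile is not None:
        cache[username] = profile
    return profile


save_user("george", {"followers": 10})
print(shard_for("george"), load_user("george"))
```

Every query that spans users on different shards now has to be assembled in application code, which is a large part of why this approach is hard to operate.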
![Page 16: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/16.jpg)
Clouds and Elastic provisioning
[Chart: Capacity / Demand over Time; Demand plotted against Capacity, which rises at each Hardware upgrade; regions marked Over provisioned and Under provisioned]
![Page 17: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/17.jpg)
CAP Theorem
[Diagram labels: Consistency, Availability, Partition Tolerance, RDBMS, NoSQL, NO GO]
![Page 18: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/18.jpg)
In search of the elastic database
• Big Web sites AND Cloud applications need servers that scale up (and down) on demand
• Elastic provisioning works fine for web servers, application servers, etc.
• However RDBMS does not scale easily:
– SQL Azure limited to one database <50GB on a single host
– Oracle’s RAC not supported in cloud environments
– MySQL sharding “obnoxious”
• Many are willing to sacrifice relational database features for scalability and operational simplicity
![Page 19: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/19.jpg)
The NoSQL movement
![Page 20: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/20.jpg)
NoSQL (A.K.A. Cloud databases)
• Generally DO NOT support:
– SQL
– Transactions
– Immediate consistency
• Usually DO support:
– Elasticity (scale out AND in)
– Eventual consistency
– Inherent redundancy and fault tolerance
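The difference between immediate and eventual consistency is easiest to see in a toy model. The sketch below is an assumption-laden illustration, not any particular product's protocol: writes land on one replica, replication messages are queued, and conflicts are resolved last-write-wins by a simple version counter. All class and method names are invented for the example.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-write-wins: keep whichever entry has the higher version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # replication messages not yet delivered
        self.clock = 0

    def write(self, key, value):
        # Acknowledge after updating one replica; the others catch up later.
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.clock, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].read(key)

    def replicate(self):
        # Deliver every queued message; afterwards all replicas agree.
        while self.pending:
            replica, key, version, value = self.pending.pop(0)
            replica.apply(key, version, value)


store = EventuallyConsistentStore()
store.write("status", "hello")
stale = store.read("status", replica_index=2)  # replica has not caught up
store.replicate()
fresh = store.read("status", replica_index=2)
print(stale, fresh)
```

A reader that hits a lagging replica sees old data for a while, and in exchange writes do not wait for every copy to be updated.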
![Page 21: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/21.jpg)
NoSQL Data Models
![Page 22: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/22.jpg)
Key Value Stores
Amazon Dynamo
Google BigTable
Document DB
JSON/XML DB
Graph Databases
MemcacheDB
Azure Table Services
Redis
Tokyo Cabinet
SimpleDB
Riak
Voldemort
Cassandra
HBase
Hypertable
CouchDB
MongoDB
Neo4j
FlockDB
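One way to compare these families is to store the same small record in each shape. The structures below are plain Python stand-ins chosen for illustration; the record, the key formats and the `followers_of` helper are all assumptions, and no product's actual API or storage format is implied.

```python
import json

user = {"name": "Ada", "city": "Melbourne", "follows": ["grace"]}

# Key-value store: the store sees only an opaque value per key.
key_value_store = {"user:ada": json.dumps(user)}

# BigTable-style store: row key -> column family -> column -> value.
wide_column_store = {
    "ada": {
        "profile": {"name": "Ada", "city": "Melbourne"},
        "follows": {"grace": "1"},
    }
}

# Document database: a structured document whose fields can be queried.
document_store = {"users": [{"_id": "ada", **user}]}

# Graph database: nodes plus typed relationships between them.
graph_nodes = {"ada": {"name": "Ada"}, "grace": {"name": "Grace"}}
graph_edges = [("ada", "FOLLOWS", "grace")]


def followers_of(person):
    """Walk the edge list to find who follows the given node."""
    return [src for src, rel, dst in graph_edges
            if rel == "FOLLOWS" and dst == person]


print(json.loads(key_value_store["user:ada"])["city"])
print([doc["_id"] for doc in document_store["users"]
       if doc["city"] == "Melbourne"])
print(followers_of("grace"))
```

The further down the list, the more of the data's structure the database itself understands, and the more kinds of query it can answer without help from the application.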
![Page 23: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/23.jpg)
Not so easy to get the data out...
![Page 24: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/24.jpg)
[Diagram: a Data Hub giving SQL access to data in the Amazon AWS Cloud (MySQL, HBase, SimpleDB), the Microsoft Azure Cloud (SQL Azure, Table Services) and On-Premise, AKA private Cloud (SQL Server, Oracle)]
![Page 25: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/25.jpg)
![Page 26: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/26.jpg)
Big Data 3: Data Warehousing
[Chart: TB (0 to 600) by year, 1996 to 2010]
![Page 27: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/27.jpg)
Data Warehouse players
![Page 28: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/28.jpg)
DATAllegro architecture
![Page 29: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/29.jpg)
Column Databases (Vertica, Sybase)
• Data is stored together in columns
• Very fast answers to analytic aggregate queries
• Better compression
• Not write optimized
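The bullets above can be illustrated with a toy table held both ways. This is a sketch in ordinary Python lists, not Vertica's or Sybase's storage format; the sample rows and the `run_length_encode` helper are assumptions made for the example.

```python
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 80.0},
    {"order_id": 3, "region": "EMEA", "amount": 200.0},
    {"order_id": 4, "region": "EMEA", "amount": 50.0},
    {"order_id": 5, "region": "EMEA", "amount": 50.0},
]

# Column layout: one contiguous list per column instead of one dict per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row store: the aggregate has to visit every whole row.
total_from_rows = sum(row["amount"] for row in rows)

# Column store: the aggregate reads a single column and nothing else.
total_from_columns = sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


compressed_region = run_length_encode(columns["region"])
print(total_from_columns, compressed_region)
```

Values in one column tend to repeat, so they compress well. The cost shows up on writes: inserting one row means touching every column list, which is the sense in which the layout is not write optimized.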
![Page 30: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/30.jpg)
Disk drives and Moore’s law
• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
– CPU clock speeds
– RAM
– Hard Disk Drive storage density
• But not in mechanical components
– Service time (Seek latency) – limited by actuator arm speed and disk circumference
– Throughput (rotational latency) – limited by speed of rotation, circumference and data density
![Page 31: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/31.jpg)
Big Data vs. Fast Data
Disk trends 2001-2009 (percentage change):
• IO Rate: 260
• Disk Capacity: 1,635
• IO/GB: -630
• CPU: 1,013
• IO/CPU: -390
![Page 32: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/32.jpg)
SSD to the rescue?
Seek time (us):
• Solid State Disk DDR-RAM: 15
• Solid State Disk Flash: 200
• Magnetic Disk: 4,000
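The seek times on this slide translate into a rough ceiling on random IOs per second. The calculation below is a back-of-envelope sketch that assumes one random IO per seek and ignores transfer time, queueing and controller overhead, so the results are upper bounds for comparison only.

```python
# Seek times in microseconds, as quoted on the slide.
seek_time_us = {
    "Solid State Disk DDR-RAM": 15,
    "Solid State Disk Flash": 200,
    "Magnetic Disk": 4000,
}

# Assumption: a device completes one random IO per seek, back to back.
iops = {device: 1_000_000 / us for device, us in seek_time_us.items()}

# How many times faster than magnetic disk each device seeks.
speedup = {device: seek_time_us["Magnetic Disk"] / us
           for device, us in seek_time_us.items()}

for device in seek_time_us:
    print(f"{device}: ~{iops[device]:,.0f} IO/s, {speedup[device]:.0f}x disk")
```

On these figures a magnetic disk manages about 250 random IOs per second and flash about 5,000, a 20-fold gap that comes from seek latency alone.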
![Page 33: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/33.jpg)
Power consumption
[Chart: Watts (logarithmic scale, 1 to 100) for Flash SSD vs. SATA HDD when Idle, during Seek and at Start up; labeled values 8, 10, 20]
![Page 34: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/34.jpg)
Economics of SSD
[Chart: $/GB and $/IOPs (logarithmic scale, $0.10 to $1,000.00) for Capacity HDDs, Performance HDDs, Flash SSDs (read) and DRAM SSDs; labeled values $13.30, $16.60, $1.40, $0.50, $3.00, $28.00, $100.00, $400.00]
![Page 35: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/35.jpg)
Fast reads but slow writes
Latency (microseconds):
• 256 page block erase: 2000
• 4k page write: 250
• 4k page seek: 25
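These three latencies explain why overwriting data on flash can be so much slower than reading it. The model below is a deliberately naive worst case and an assumption of this sketch, not a description of any real controller: it supposes that changing one page in a full block means reading the other pages, erasing the block, and writing everything back.

```python
# Latencies in microseconds, as quoted on the slide.
PAGE_READ_US = 25
PAGE_WRITE_US = 250
BLOCK_ERASE_US = 2000
PAGES_PER_BLOCK = 256


def naive_overwrite_us():
    """Hypothetical worst case for changing one page in a full block."""
    read_back = (PAGES_PER_BLOCK - 1) * PAGE_READ_US  # save the other pages
    rewrite = PAGES_PER_BLOCK * PAGE_WRITE_US         # write all pages back
    return read_back + BLOCK_ERASE_US + rewrite


write_vs_read = PAGE_WRITE_US / PAGE_READ_US
erase_vs_read = BLOCK_ERASE_US / PAGE_READ_US

print(f"write is {write_vs_read:.0f}x a read, erase is {erase_vs_read:.0f}x")
print(f"naive in-place overwrite: {naive_overwrite_us():,} us")
```

Even before any erase is involved, a page write costs ten reads on these figures; the erase penalty is what makes sustained random writes the weak spot.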
![Page 36: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/36.jpg)
Hierarchical storage management
• Main Memory
• DDR SSD
• Flash SSD
• Disk
• Tape
[Diagram: storage tiers arranged against $/IOP and $/GB axes]
![Page 37: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/37.jpg)
In Memory Databases: VoltDB & H-Store
• In Memory Distributed (“Sharded”) Database
• No transactional IO
• ACID transactions (k-safety)
• Single Threaded (no latches or locks)
• Java Stored Procedure transactions
• Hierarchical data model
• Double Shared Nothing (disk OR CPU)
• Spool out to DW for ad-hoc analysis
• Very high TPS for suitable applications
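The "single threaded, no latches or locks" idea can be sketched as a set of in-memory partitions, each draining its own queue of stored procedures one at a time. This is an illustrative toy, not VoltDB's or H-Store's engine: Python functions stand in for the Java stored procedures, and the names `Partition`, `route` and `deposit` are invented for the example.

```python
from collections import deque


class Partition:
    """One in-memory shard that runs its transactions strictly in order."""

    def __init__(self):
        self.rows = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run(self):
        # Only this loop touches self.rows, so no locks are needed.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.rows, *args))
        return results


def deposit(rows, account, amount):
    """Stand-in for a stored procedure: a whole transaction in one call."""
    rows[account] = rows.get(account, 0) + amount
    return rows[account]


partitions = [Partition() for _ in range(2)]


def route(account):
    # Deterministic routing so the same key always lands on one partition.
    return partitions[sum(account.encode()) % len(partitions)]


route("alice").submit(deposit, "alice", 100)
route("alice").submit(deposit, "alice", -30)
route("bob").submit(deposit, "bob", 5)

results = [partition.run() for partition in partitions]
balance = route("alice").rows["alice"]
print(results, balance)
```

Because a transaction never waits on disk or on another thread, each partition can run flat out; the catch is that a transaction spanning several partitions needs coordination that this sketch leaves out.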
![Page 38: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/38.jpg)
Oracle EXADATA
• RAC clusters provide MPP
• Dedicated storage servers
• High Speed InfiniBand channels
• Smart storage reduces data transfer requirements
• Hybrid Flash & spinning disk storage system
• Flash caching in the database systems
![Page 39: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/39.jpg)
The Next Generation?
![Page 40: Next generation databases july2010](https://reader037.fdocuments.us/reader037/viewer/2022102813/547fb8dbb4af9f760d8b4588/html5/thumbnails/40.jpg)
너를 감사하십시요 Thank You Danke Schön
Gracias 有難う御座いました Merci
Grazie Obrigado 谢谢