Zing Database – Distributed Key-Value Database
![Page 1: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/1.jpg)
Zing Database – Distributed Key-Value Database
Nguyễn Quang Nam
Zing Web-Technical Team
![Page 2: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/2.jpg)
Content
1. Introduction
2. Why
3. Overview architecture
4. Single Server/Storage
5. Distribution
![Page 3: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/3.jpg)
Introduction
![Page 4: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/4.jpg)
Some statistics:
- Feeds: 1.6 B, 700 GB hard drive in 4 DB instances, 8 caching servers, 136 GB memory cache in use.
- User Profiles: 44.5 M registered accounts, 2 database instances, 30 GB memory cache.
- Comments: 350 M, 50 GB hard drive in 2 DB instances, 20 GB memory cache
![Page 5: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/5.jpg)
Why
![Page 6: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/6.jpg)
Access time
- L1 cache reference: 0.5 ns
- Branch mispredict: 5 ns
- L2 cache reference: 7 ns
- Mutex lock/unlock: 100 ns
- Main memory reference: 100 ns
- Compress 1K bytes with Zippy: 10,000 ns
- Send 2K bytes over 1 Gbps network: 20,000 ns
- Read 1 MB sequentially from memory: 250,000 ns
- Round trip within same datacenter: 500,000 ns
- Disk seek: 10,000,000 ns
- Read 1 MB sequentially from network: 10,000,000 ns
- Read 1 MB sequentially from disk: 30,000,000 ns
- Send packet CA->Netherlands->CA: 150,000,000 ns
by Jeff Dean (http://labs.google.com/people/jeff)
![Page 7: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/7.jpg)
Standard & Real Requirements
- Time to load a page < 200 ms
- Read data rate ~12K ops/sec
- Write data rate ~8K ops/sec
- Caching service/Database recovery time < 5 mins
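As a rough illustration of how tight the page-load budget is, the sketch below divides the 200 ms limit by a few of the latencies quoted on the Access time slide. The dictionary keys and the `ops_within_budget` helper are names made up for this example; only the numbers come from the slides.

```python
# Latencies in nanoseconds, copied from the "Access time" slide.
LATENCY_NS = {
    "main_memory_reference": 100,
    "round_trip_same_datacenter": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_from_disk": 30_000_000,
}

PAGE_BUDGET_NS = 200 * 1_000_000  # "Time to load a page < 200 ms"


def ops_within_budget(op: str, budget_ns: int = PAGE_BUDGET_NS) -> int:
    """How many sequential operations of one kind fit in the budget."""
    return budget_ns // LATENCY_NS[op]


if __name__ == "__main__":
    for name in LATENCY_NS:
        print(f"{name}: {ops_within_budget(name)} per page load")
```

Only about 20 sequential disk seeks fit into one page load, against roughly 400 round trips inside the datacenter, which is consistent with the heavy use of memory caches in the statistics above.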
![Page 8: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/8.jpg)
Existing solutions
- RDBMS (MySQL, MSSQL): writes are too slow; reads are acceptable with a small DB but poor with a huge DB
- Cassandra (by Facebook): difficult to operate and maintain, and performance is not so good
- HBase/Hadoop: we use this for our log system
- MongoDB, Membase, Tokyo Tyrant, ...: OK! We use these in several cases, but they are not suitable for all
![Page 9: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/9.jpg)
Overview architecture
![Page 10: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/10.jpg)
[Diagram: overview architecture]
- Transport layer: ZNonblockingServer, serving the Requests API over TCP
- Model (Business) layer: MODEL
  - Load configuration
  - Create & manage backend storages
  - Implement business rules
- Storage layer
  - Memory storage: LRU ICache (RW)
  - Persistent storage (local database on disk): Commitlog Storage (W), ZiDB Storage (RW)
  - Remote storage: Remote Storage (RW), connected to a remote system
![Page 11: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/11.jpg)
Server/Storage
![Page 12: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/12.jpg)
ZNonblockingServer
- Based on TNonblockingServer (Apache Thrift)
- 185K reqs/sec (the original TNonblockingServer reaches just 45K reqs/sec)
- Serializes/deserializes data
- Prevents server overload
- Data is not secured in transit
- Protects the service from invalid requests
![Page 13: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/13.jpg)
ICache
- Least Recently Used/time-based expiration strategy
- zlru_table<key_type, value_type>: hash table data structure
- Custom malloc/free functions replace the standard glibc malloc/free to reduce memory fragmentation
- Supports dirty-item marking => for lazy DB flush
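The eviction, expiration and dirty-marking behaviour listed above can be sketched in a few lines. This is a minimal illustration, not the zlru_table implementation: the class name `LruCache`, the `flush` callback and the `clock` parameter are assumptions made for the example, and the custom allocator is out of scope.

```python
import time
from collections import OrderedDict


class LruCache:
    """LRU cache with optional time-based expiry and dirty-item marking."""

    def __init__(self, capacity, ttl_seconds=None, flush=None, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.flush = flush or (lambda key, value: None)  # hypothetical DB writer
        self.clock = clock
        self._items = OrderedDict()  # key -> [value, stored_at, dirty]

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        if self.ttl is not None and self.clock() - entry[1] > self.ttl:
            self._evict(key)  # expired: drop it (flushing first if dirty)
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, value):
        self._items[key] = [value, self.clock(), True]  # new data is dirty
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            oldest = next(iter(self._items))  # least recently used key
            self._evict(oldest)

    def flush_dirty(self):
        """Lazily write all dirty items to the backing store."""
        for key, entry in self._items.items():
            if entry[2]:
                self.flush(key, entry[0])
                entry[2] = False

    def _evict(self, key):
        value, _, dirty = self._items.pop(key)
        if dirty:
            self.flush(key, value)  # never drop an unflushed write
```

Writes only touch memory; the backing store is hit when `flush_dirty` runs or when a dirty item is evicted, which is one way to read "lazy DB flush".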
![Page 14: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/14.jpg)
ZiDB
- Separated into a DataFile and an IndexFile
- 1 seek for a read, 1-2 seeks for a write
- The IndexFile (hash structure) is loaded into memory as a memory-mapped file (shared memory) to reduce system calls
- Write-ahead log to avoid data loss
- Data magic-padding
- Checksums & checkpoints for data repair
- DB partitioning for easier maintenance
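To make the "1 seek for a read" point concrete, here is a small sketch of a data file paired with an in-memory hash index. The slides do not describe the actual ZiDB record layout, so the header format, the `MAGIC` value and the `TinyStore` class are all assumptions; the write-ahead log, checkpoints and partitioning are left out.

```python
import os
import struct
import tempfile
import zlib

MAGIC = 0x5A494442  # arbitrary marker chosen for this sketch
HEADER = struct.Struct("<IHII")  # magic, key length, value length, CRC32 of value


class TinyStore:
    """Append-only data file plus an in-memory hash index (key -> offset)."""

    def __init__(self, path):
        self.data = open(path, "a+b")
        self.index = {}

    def put(self, key: bytes, value: bytes) -> None:
        self.data.seek(0, os.SEEK_END)
        offset = self.data.tell()
        header = HEADER.pack(MAGIC, len(key), len(value), zlib.crc32(value))
        self.data.write(header + key + value)
        self.data.flush()
        self.index[key] = offset  # update the index only after the data is written

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.data.seek(offset)  # the single seek for a read
        magic, klen, vlen, crc = HEADER.unpack(self.data.read(HEADER.size))
        stored_key = self.data.read(klen)
        value = self.data.read(vlen)
        if magic != MAGIC or stored_key != key or zlib.crc32(value) != crc:
            raise IOError("corrupt record at offset %d" % offset)
        return value

    def close(self):
        self.data.close()


if __name__ == "__main__":
    store = TinyStore(os.path.join(tempfile.mkdtemp(), "data.bin"))
    store.put(b"user:1", b"alice")
    print(store.get(b"user:1"))
    store.close()
```

Because the index lives in memory, a lookup costs no disk access until the record itself is read, and the magic marker and checksum give a repair tool something to scan for after a crash.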
![Page 15: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/15.jpg)
Distribution
![Page 16: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/16.jpg)
Key requirements:
- Scalability
- Load balance
- Availability
- Consistency
![Page 17: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/17.jpg)
2 Models:
- Centralized: 1 addressing server & multiple storage servers => bottleneck & single point of failure
- Peer-to-peer: each server includes an addressing module & storage

2 Types of routing:
- Client routing: each client does the addressing itself and queries the data
- Server routing: the addressing is done at the server
![Page 18: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/18.jpg)
Operation Flows
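[Diagram: operation flow]
- Business Logic Server
- Addressing Server (DHT)
- Storage Layer: Storage Node 1 ... Storage Node N, each with ICache, ZiDB and a Storage Module

(1) Business Logic Server requests key locations from the Addressing Server
(2) Addressing Server returns the key locations
(3) Business Logic Server sends Get & Set operations to the storage nodes
(4) The operation returns

* In the peer-to-peer model, the addressing module is moved into each storage node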
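The four numbered steps of the centralized flow can be sketched as three small classes. All class and method names are invented for the example, the storage node is a plain dictionary standing in for ICache and ZiDB, and the hash-modulo placement rule is a placeholder rather than the addressing scheme used by the real system.

```python
import zlib


class StorageNode:
    def __init__(self, name):
        self.name = name
        self.data = {}  # stands in for ICache + ZiDB

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


class AddressingServer:
    """Maps a key to the storage node responsible for it."""

    def __init__(self, nodes):
        self.nodes = nodes

    def locate(self, key):
        # Placeholder placement rule (hash modulo node count) for the sketch.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]


class BusinessLogicServer:
    def __init__(self, addressing):
        self.addressing = addressing

    def set(self, key, value):
        node = self.addressing.locate(key)  # (1) request, (2) key location
        return node.set(key, value)         # (3) set, (4) operation returns

    def get(self, key):
        node = self.addressing.locate(key)  # (1) request, (2) key location
        return node.get(key)                # (3) get, (4) operation returns


if __name__ == "__main__":
    nodes = [StorageNode(f"node-{i}") for i in range(3)]
    logic = BusinessLogicServer(AddressingServer(nodes))
    logic.set("user:42", "profile")
    print(logic.get("user:42"))
```

Every request passes through `AddressingServer.locate`, which makes the bottleneck and single point of failure of the centralized model easy to see.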
![Page 19: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/19.jpg)
Addressing:
- Provides key locations of resources
- Basically a Distributed Hash Table, using consistent hashing
- Hashing: Jenkins, Murmur, or any algorithm that satisfies two conditions:
  - Uniform distribution of generated keys in the key space
  - Consistency
  (MD5 and SHA are poor choices because of their performance)
![Page 20: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/20.jpg)
Addressing - Node location:
Each node is assigned a continuous range of IDs (hashed key)
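A minimal sketch of this addressing scheme: keys and nodes are hashed into the same 32-bit ID space, and each node owns the range of IDs ending at its own position. The hash shown is Jenkins' one-at-a-time function, one member of the Jenkins family; the slides do not say which variant is used, and the `HashRing` class and node names are assumptions.

```python
import bisect


def jenkins_one_at_a_time(key: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, returning a 32-bit ID."""
    h = 0
    for byte in key:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


class HashRing:
    """Each node owns the IDs after the previous node's position, up to its own."""

    def __init__(self, node_names):
        points = sorted((jenkins_one_at_a_time(n.encode()), n) for n in node_names)
        self._ids = [p for p, _ in points]
        self._names = [n for _, n in points]

    def locate(self, key: str) -> str:
        h = jenkins_one_at_a_time(key.encode())
        i = bisect.bisect_left(self._ids, h) % len(self._ids)  # wrap around the ring
        return self._names[i]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.locate("user:42"))
```

The consistency property shows up when a node is removed: only the keys that lived on that node move, while every other key keeps its location.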
![Page 21: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/21.jpg)
Addressing - Node location: Golden ratio principle (a/b = (a+b)/a)
- Init ratio = 1.618
- Max ratio ~ 2.6
- Easy to implement
- Easy for routing from client

[Figure: ring with nodes 1 to 5]
![Page 22: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/22.jpg)
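Addressing - Node location: Virtual nodes
- Each real server has multiple virtual nodes on the ring
- The more virtual nodes, the better the load balance
- The table of nodes is hard to maintain

Example:
- Server 1: virtual nodes 1, 2, 3
- Server 2: virtual nodes 4, 5, 6, 7
- Server 3: virtual nodes 8, 9

[Figure: ring with virtual nodes 1 to 9 interleaved across the three servers]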
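The effect of virtual nodes on load balance can be explored with a short experiment. This is a sketch under stated assumptions: `ring_hash` uses BLAKE2 from the Python standard library purely as a convenient stand-in (the slides recommend Jenkins or Murmur), and the function names, server names and key format are invented for the example.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(text: str) -> int:
    # Stand-in hash for this sketch; not the hash recommended by the slides.
    digest = hashlib.blake2b(text.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_ring(servers, vnodes_per_server):
    """Return (positions, owners), sorted by position, one entry per virtual node."""
    points = sorted(
        (ring_hash(f"{server}#{i}"), server)
        for server in servers
        for i in range(vnodes_per_server)
    )
    return [p for p, _ in points], [s for _, s in points]


def locate(ring, key):
    positions, owners = ring
    i = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
    return owners[i]


def load_per_server(servers, vnodes_per_server, n_keys=20000):
    """Count how many of n_keys synthetic keys land on each real server."""
    ring = build_ring(servers, vnodes_per_server)
    return Counter(locate(ring, f"key:{k}") for k in range(n_keys))


if __name__ == "__main__":
    servers = ["server-1", "server-2", "server-3"]
    for vnodes in (1, 10, 200):
        print(vnodes, dict(load_per_server(servers, vnodes)))
```

Running it with a growing number of virtual nodes per server lets you compare the per-server key counts; the table of positions also grows with it, which is the maintenance cost mentioned above.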
![Page 23: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/23.jpg)
Addressing - Multi-layer rings

[Figure: layered rings with nodes labeled A, B and C]

- Store the change history of the system
- Provide availability/reconfigurability
- Able to put a node on a ring manually

* Write: data is written to the highest ring
* Read: data is looked up on the highest ring first, then on lower rings if not found
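The read and write rules for layered rings can be sketched as follows. Each ring here is just a list of node names with a placeholder placement rule, and `MultiLayerStore`, `add_ring` and the in-memory `storage` dictionary are assumptions for illustration; how the real system migrates or retires old layers is not covered by the slides.

```python
import zlib


class Ring:
    """One layer: a fixed list of node names with a simple placement rule."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def locate(self, key):
        # Placeholder rule for the sketch; a real ring would use consistent hashing.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]


class MultiLayerStore:
    def __init__(self):
        self.rings = []    # index 0 is the lowest (oldest) ring
        self.storage = {}  # node name -> {key: value}, stands in for real nodes

    def add_ring(self, nodes):
        """Record a new configuration on top of the existing ones."""
        self.rings.append(Ring(nodes))
        for name in nodes:
            self.storage.setdefault(name, {})

    def write(self, key, value):
        node = self.rings[-1].locate(key)  # always the highest ring
        self.storage[node][key] = value

    def read(self, key):
        for ring in reversed(self.rings):  # highest ring first, then lower ones
            node = ring.locate(key)
            if key in self.storage[node]:
                return self.storage[node][key]
        return None


if __name__ == "__main__":
    store = MultiLayerStore()
    store.add_ring(["A", "B"])
    store.write("user:1", "old layout")
    store.add_ring(["A", "B", "C"])  # reconfiguration: a new ring on top
    print(store.read("user:1"))      # still found, via the lower ring if needed
```

Keeping the older rings means data written under a previous layout stays readable after a reconfiguration, without moving it first.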
![Page 24: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/24.jpg)
Replication & Backup
- Each node has one primary range of IDs and some secondary ranges of IDs
- Each real node needs a backup instance to replace it in case it goes down

* Data is queried from the primary node first, then from the secondary nodes
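The primary-then-secondary read order can be sketched as below. The slides do not say which nodes hold the secondary ranges, so this example assumes they are the next nodes in list order; `ReplicatedCluster`, the `replicas` parameter and the `up` flag are likewise illustrative, and the separate backup instance is not modeled.

```python
import zlib


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedCluster:
    def __init__(self, names, replicas=1):
        self.nodes = [Node(n) for n in names]
        self.replicas = replicas  # number of secondary copies (assumed parameter)

    def _owners(self, key):
        """Primary node first, then the secondaries that also hold the key's range."""
        start = zlib.crc32(key.encode()) % len(self.nodes)  # placeholder placement
        count = min(self.replicas + 1, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

    def set(self, key, value):
        for node in self._owners(key):
            if node.up:
                node.data[key] = value

    def get(self, key):
        for node in self._owners(key):  # primary first, then secondaries
            if node.up and key in node.data:
                return node.data[key]
        return None


if __name__ == "__main__":
    cluster = ReplicatedCluster(["n1", "n2", "n3"], replicas=1)
    cluster.set("user:1", "alice")
    cluster._owners("user:1")[0].up = False  # simulate the primary going down
    print(cluster.get("user:1"))             # served by the secondary
```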
![Page 25: Zing Database – Distributed Key-Value Database](https://reader033.fdocuments.us/reader033/viewer/2022061111/54550f51af795994188b46b6/html5/thumbnails/25.jpg)
Configuration: finding the best parameters to configure the DB, or choosing a suitable DB type.
- How many reads/writes per second?
- Length deviation of the data: are records about the same length, or do they differ widely?
- Is data updated or deleted?
- How important is the data: is loss acceptable or not?
- Can old data be recycled?