Scaling HDFS with a Strongly Consistent Relational Model for Metadata
Kamal Hakimzadeh, Hooman Peiro Sajjad, Jim Dowling
{mahh, shps, jdowling}@kth.se
DAIS 2014
I-node File Systems
Kamal Hakimzadeh, DAIS 2014
[Figure: a file set mapped to i-nodes, each holding file info and pointers to blocks]
Hadoop Distributed File System (HDFS)
[Figure: the NameNode (NN) holds the i-nodes (file info and pointers); the blocks are stored on DataNodes (DN) running on commodity machines]
High Availability in HDFS 2.0
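[Figure: four DataNodes (DN), an active NameNode and a standby NameNode, three journal nodes (JN), a checkpoint NameNode, and three ZooKeeper nodes (ZK)]
• Master-slave replication of NN state
• Shared NN log stored in a quorum of journal nodes
• Agreement on the active master
• Faster recovery, cut journal log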
NameNode Limitations and Tradeoffs
1. 60 GB JVM heap for NN
• Compression, larger blocks
2. Operation reorder in failures
3. Single writer concurrency model
4. HA consensus overhead
100 M files ≈ 10 PB
65 M files ≈ 21 PB
Eventually Consistent
Poor throughput
Move Metadata into a Distributed Database
[Figure: DataNodes (DN), stateless NameNodes (NN), and NDB with up to 48 nodes]
MySQL Cluster
• Distributed, replicated, in-memory database
• Transaction support
• Read-committed isolation level
• Row-level locks
• 17.6 M tx/sec
Metadata Consistency
Objective: Strongly Consistent Metadata
1. One transaction per metadata operation
2. Read-committed isolation level
3. Row-level locking
Serializable Isolation Level ≈ Strongly Consistent Model
HDFS Uses System Level Lock = Single Writer Concurrency Model
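To make the contrast with a single system-level lock concrete, here is a minimal sketch of running each metadata operation as one transaction that locks only the rows it touches. The class `RowLockTable`, its `transaction` method, and the in-memory dictionary standing in for the database are all hypothetical names chosen for illustration; this is not the NDB API or the authors' implementation.

```python
import threading
from contextlib import ExitStack


class RowLockTable:
    """Hypothetical in-memory stand-in for a table with row-level locks."""

    def __init__(self):
        self._rows = {}                  # row key -> value
        self._locks = {}                 # row key -> lock for that row
        self._guard = threading.Lock()   # protects the lock dictionary only

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def transaction(self, keys, operation):
        """Lock every row in `keys`, run `operation(rows)`, then release.

        Rows are locked in sorted order so that two transactions never
        wait on each other in a cycle.
        """
        with ExitStack() as stack:
            for key in sorted(set(keys)):
                stack.enter_context(self._lock_for(key))
            return operation(self._rows)
```

Transactions that touch different rows do not block each other, whereas a single system-level lock would serialize all of them.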
HDFS Metadata
Order of Locks in the DAG of Metadata
Metadata Operations:
1. Path Operation
2. Block Operation
3. Lease Operation
Conflicting Lock Order → Total Order Locking
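Total order locking can be sketched as follows: every transaction sorts the objects it needs by the same key before locking them, so two transactions can never hold locks in conflicting orders. The ordering key used here (path depth, then path string) and the helper names `depth`, `total_order`, and `with_locks` are assumptions made for illustration; the slides do not specify the exact order used over the metadata DAG.

```python
import threading
from contextlib import ExitStack

_locks = {}                        # path -> lock (hypothetical lock table)
_locks_guard = threading.Lock()    # protects the lock table itself


def depth(path):
    """Number of path components: '/' -> 0, '/a' -> 1, '/a/b' -> 2."""
    return len([part for part in path.split("/") if part])


def total_order(paths):
    """Return the paths in one global order: parents first, then by name."""
    return sorted(set(paths), key=lambda p: (depth(p), p))


def with_locks(paths, operation):
    """Acquire all locks in the global order, run the operation, release."""
    with ExitStack() as stack:
        for path in total_order(paths):
            with _locks_guard:
                lock = _locks.setdefault(path, threading.Lock())
            stack.enter_context(lock)
        return operation()
```

Two callers that pass the same paths in opposite orders still acquire the locks in the same sequence, which is what rules out the deadlock.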
Locking Issues
1. Range Queries
2. Semantically Related Objects
3. Lock Upgrade
Implicit Sub-tree lock
Strongest Required Lock
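The "strongest required lock" idea can be illustrated with a small sketch: before a transaction starts, it works out the strongest lock mode it will ever need on each row and takes that mode immediately, so it never has to upgrade a shared lock later. The `LockMode` enum and the `strongest_required` helper are hypothetical names, not part of HDFS or NDB.

```python
from enum import IntEnum


class LockMode(IntEnum):
    """Lock modes ordered by strength (hypothetical names)."""
    SHARED = 1
    EXCLUSIVE = 2


def strongest_required(planned_accesses):
    """Map each row to the strongest mode any planned access needs.

    planned_accesses: iterable of (row, LockMode) pairs describing
    everything the transaction intends to do.
    """
    plan = {}
    for row, mode in planned_accesses:
        plan[row] = max(plan.get(row, mode), mode)
    return plan


# A transaction that first reads and later writes "/a" locks it
# exclusively from the start instead of upgrading midway.
plan = strongest_required([
    ("/a", LockMode.SHARED),
    ("/a", LockMode.EXCLUSIVE),
    ("/b", LockMode.SHARED),
])
```

Taking the exclusive lock up front trades some concurrency for the guarantee that two readers never deadlock while both try to upgrade.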
Scale of Capacity
…
48-node NDB cluster, 12 TB
• NDB: 3 TB, replication factor 2
• File: 2 blocks, 3 replicas
HDFS: 100M files
Our Solution: 4.1B files
Factor of 40
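The factor quoted above follows directly from the two file counts on the slide. A quick check, using only those two numbers (the variable names are illustrative):

```python
hdfs_files = 100_000_000          # HDFS: 100M files
our_files = 4_100_000_000         # Our Solution: 4.1B files

factor = our_files / hdfs_files   # 41.0, which the slide rounds to a factor of 40
```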
Row-level lock throughput impact
[Figure: throughput for the open operation (shared lock) and the create operation (exclusive lock)]
Improvement: Snapshotting