Extending Hadoop for Fun & Profit
Date posted: 13-Sep-2014
Category: Technology
![Page 1: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/1.jpg)
Extending Hadoop for Fun & Profit
Milind Bhandarkar, Chief Scientist, Pivotal Software
(Twitter: @techmilind)
![Page 2: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/2.jpg)
About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)
![Page 3: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/3.jpg)
Agenda
• Extending MapReduce
• Functionality
• Performance
• Beyond MapReduce with YARN
• Hamster & GraphLab
• Extending HDFS
• Q & A
![Page 4: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/4.jpg)
Extending MapReduce
![Page 5: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/5.jpg)
MapReduce Overview
• Record = (Key, Value)
• Key : Comparable, Serializable
• Value: Serializable
• Logical Phases: Input, Map, Shuffle, Reduce, Output
![Page 6: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/6.jpg)
Map
• Input: (Key1, Value1)
• Output: List(Key2, Value2)
• Projections, Filtering, Transformation
![Page 7: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/7.jpg)
Shuffle
• Input: List(Key2, Value2)
• Output: Sort(Partition(List(Key2, List(Value2))))
• Provided by Hadoop : Several Customizations Possible
![Page 8: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/8.jpg)
Reduce
• Input: List(Key2, List(Value2))
• Output: List(Key3, Value3)
• Aggregations
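The Map, Shuffle, and Reduce signatures above can be sketched in a few lines of plain Python. This is an in-memory illustration only, not Hadoop code: the word-count `map_fn` and `reduce_fn`, the hash partitioner, and the function names are all assumptions chosen for the example.

```python
from collections import defaultdict

def map_fn(key, value):
    # (Key1, Value1) -> List(Key2, Value2): here (offset, line) -> (word, 1)
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    # (Key2, List(Value2)) -> List(Key3, Value3): here a simple aggregation
    return [(key, sum(values))]

def shuffle(pairs, num_partitions):
    # Partition by key hash, group values per key, then sort keys
    # inside each partition (hypothetical stand-in for Hadoop's shuffle).
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions][key].append(value)
    return [sorted(p.items()) for p in partitions]

def run_job(records, num_partitions=2):
    mapped = [kv for k, v in records for kv in map_fn(k, v)]
    output = []
    for partition in shuffle(mapped, num_partitions):
        for key, values in partition:
            output.extend(reduce_fn(key, values))
    return output

records = [(0, "a b a"), (6, "b c")]
result = dict(run_job(records))
```

Which partition a key lands in depends on the hash function, but each key always reaches exactly one reducer, so the aggregated counts do not depend on the number of partitions.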
![Page 9: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/9.jpg)
MapReduce DataFlow
![Page 10: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/10.jpg)
Configuration
• Unified Mechanism for
• Configuring Daemons
• Runtime environment for Jobs/Tasks
• Defaults: *-default.xml
• Site-Specific: *-site.xml
• final parameters
![Page 11: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/11.jpg)
Example
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>head.server.node.com:9001</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://head.server.node.com:9000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <final>true</final>
  </property>
  ....
</configuration>
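The effect of a final parameter can be illustrated with a toy model of layered configuration. This is a simplified sketch, not Hadoop's Configuration class: `load_layers` is a hypothetical helper, and the default and job-level values for `mapred.child.java.opts` are made up for illustration.

```python
def load_layers(layers):
    """Merge config layers in order; a key marked final cannot be overridden later."""
    merged, final_keys = {}, set()
    for layer in layers:
        for name, (value, is_final) in layer.items():
            if name in final_keys:
                continue  # ignore attempts to override a final parameter
            merged[name] = value
            if is_final:
                final_keys.add(name)
    return merged

# Each layer maps name -> (value, is_final). Values below are illustrative.
defaults = {"mapred.child.java.opts": ("-Xmx200m", False)}
site = {
    "mapred.child.java.opts": ("-Xmx512m", True),
    "fs.default.name": ("hdfs://head.server.node.com:9000", False),
}
job = {"mapred.child.java.opts": ("-Xmx2g", False)}

conf = load_layers([defaults, site, job])
```

In this model the site file overrides the default, and the job-level attempt to raise the heap size is ignored because the site file marked the parameter as final.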
![Page 12: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/12.jpg)
Extending Input Phase
• Convert ByteStream to List(Key, Value)
• Several Formats pre-packaged
• TextInputFormat<LongWritable, Text>
• SequenceFileInputFormat<K,V>
• KeyValueTextInputFormat<Text,Text>
• Specify InputFormat for each job
• JobConf.setInputFormat()
![Page 13: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/13.jpg)
InputFormat
• getSplits() : From Input descriptors, get Input Splits, such that each Split can be processed independently
• <FileName, startOffset, length>
• getRecordReader() : From an InputSplit, get list of Records
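A minimal sketch of split computation, assuming fixed-size splits described by <FileName, startOffset, length>. The `FileSplit` tuple, the `get_splits` function, and the file path are hypothetical, and the sketch ignores refinements found in Hadoop's real FileInputFormat.

```python
from typing import List, NamedTuple

class FileSplit(NamedTuple):
    path: str
    start: int
    length: int

def get_splits(path: str, file_size: int, split_size: int) -> List[FileSplit]:
    """Cut a file into contiguous byte ranges using only metadata (its size)."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append(FileSplit(path, offset, length))
        offset += length
    return splits

MB = 1024 * 1024
splits = get_splits("/videos/cam1.mpg", 300 * MB, 128 * MB)
```

The function never opens the file, which keeps split determination cheap even though it runs in a single process.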
![Page 14: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/14.jpg)
Industry Use Case
Surveillance Video Anomaly Detection
![Page 15: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/15.jpg)
Acknowledgements
• Victor Fang
• Regu Radhakrishnan
• Derek Lin
• Sameer Tiwari
![Page 16: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/16.jpg)
Anomaly Detection in Surveillance Video
• Detect anomalous objects in a restricted perimeter
• A typical large enterprise collects terabytes of video per day
• Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events
• Post-Incident monitoring enabled by Interactive Query
![Page 17: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/17.jpg)
Video DataFlow
• Timestamped Video Files as input
• Distributed Video Transcoding : ETL in Hadoop
• Distributed Video Analytics in Hadoop/HAWQ
• Insights in relational DB
![Page 18: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/18.jpg)
Real World Video Data
• Benchmark Surveillance videos from UK Home Office (iLids)
• CCTV Video footage depicting scenarios central to Govt requirements
![Page 19: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/19.jpg)
Common Video Standards
• MPEG & ITU responsible for most video standards
• MPEG-2 (1995) Widely adopted in DVDs, TV, Set Top boxes
![Page 20: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/20.jpg)
MPEG Standard Format
• Sequence of encoded video frames
• Compression by eliminating:
• Redundancy in Time: Inter-Frame Encoding
• Redundancy in Space: Intra-Frame Encoding
![Page 21: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/21.jpg)
Motion Compensation
• I-Frame: Intra-Frame encoding
• P-Frame: Predicted frame from previous frame
• B-Frame: Predicted frame from both previous & next frame
![Page 22: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/22.jpg)
Distributed MPEG Decoding
• HDFS splits large files into 64 MB or 128 MB blocks
• Each HDFS block can be processed independently by a Map task
• Can we decode individual video frames from an arbitrary HDFS block in an MPEG file?
![Page 23: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/23.jpg)
Splitting MPEG-2
• Header Information available only once per file
• Group of Pictures (GOP) header repeats
• Each GOP starts with an I-Frame and extends up to the next GOP header
• Each GOP can be decoded independently
• First and last GOP may straddle HDFS blocks
![Page 24: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/24.jpg)
MPEG2InputFormat
• Derived from FileInputFormat
• getSplits() : Identical to FileInputFormat
• InputSplit = HDFS Block
• getRecordReader()
• MPEG2RecordReader
![Page 25: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/25.jpg)
MPEG2RecordReader
• Start from beginning of block
• Search for the first GOP Header
• Locate an I-Frame, decode, keep in memory
• If P-Frame, decode using last frame
• If B-Frame, keep current frame in memory, read next frame, decode current frame
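The "search for the first GOP header" step can be sketched as a byte scan. This is an illustration under several assumptions: the data is a raw MPEG-2 video elementary stream held in memory, a GOP header is identified by the start code `00 00 01 B8`, and a block owns every GOP whose header starts inside it. The function names are hypothetical and frame decoding is not shown.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"  # MPEG-2 group_start_code

def gop_offsets(data: bytes, start: int, end: int):
    """Offsets of GOP headers that begin in [start, end)."""
    offsets = []
    pos = data.find(GOP_START_CODE, start)
    while pos != -1 and pos < end:
        offsets.append(pos)
        pos = data.find(GOP_START_CODE, pos + len(GOP_START_CODE))
    return offsets

def gops_for_block(data: bytes, block_start: int, block_len: int):
    """Byte ranges of GOPs owned by one block.

    The last GOP may extend past the block end, so the reader
    keeps reading until the next GOP header (or end of file).
    """
    block_end = min(block_start + block_len, len(data))
    ranges = []
    for off in gop_offsets(data, block_start, block_end):
        nxt = data.find(GOP_START_CODE, off + len(GOP_START_CODE))
        ranges.append((off, nxt if nxt != -1 else len(data)))
    return ranges

# Toy stream: a file header followed by three GOPs with dummy payloads.
G = GOP_START_CODE
data = b"HDR" + G + b"aaaa" + G + b"bbbbbb" + G + b"cc"
all_ranges = []
for block_start in range(0, len(data), 10):  # 10-byte "HDFS blocks"
    all_ranges.extend(gops_for_block(data, block_start, 10))
```

With this ownership rule every GOP is read by exactly one record reader, even when it straddles a block boundary.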
![Page 26: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/26.jpg)
Considerations for Input Format
• Use as little metadata as possible
• Number of Splits = Number of Map Tasks
• Combine small files
• Split determination happens in a single process, so should be metadata-based
• Affects scalability of MapReduce
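One way to picture "combine small files" is a greedy packing of file sizes into splits, using only metadata. Hadoop ships a CombineFileInputFormat for this purpose; the sketch below is not its algorithm, and `combine_small_files` and the sample sizes are hypothetical.

```python
def combine_small_files(files, max_split_bytes):
    """Greedily pack (path, size) pairs into splits of at most max_split_bytes.

    A file larger than the limit gets a split of its own.
    """
    splits, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        splits.append(current)
    return splits

files = [("a", 10), ("b", 20), ("c", 100), ("d", 5), ("e", 30)]
splits = combine_small_files(files, 64)
```

Five files become three splits here, so the job launches three Map tasks instead of five.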
![Page 27: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/27.jpg)
Scalability
• If one node processes k MB/s, then N nodes should process (k*N) MB/s
• If some fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes
• Linear Scalability
![Page 28: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/28.jpg)
Reduce Latency
Minimize Job Execution time
![Page 29: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/29.jpg)
Increase Throughput
Maximize amount of data processed per unit time
![Page 30: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/30.jpg)
Amdahl’s Law
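$$S = \frac{N}{1 + \alpha (N - 1)}$$
where N is the number of processors and α is the serial (non-parallelizable) fraction of the work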
![Page 31: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/31.jpg)
Multi-Phase Computations
• If computation C is split into N different parts, C1..CN
• If partial computation Ci can be sped up by a factor of Si
![Page 32: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/32.jpg)
Amdahl’s Law, Restated
$$S = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} \frac{C_i}{S_i}}$$
![Page 33: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/33.jpg)
Amdahl’s Law
• Suppose Job has 5 phases: P0 is 10 seconds, P1, P2, P3 are 200 seconds each, and P4 is 10 seconds
• Sequential runtime = 620 seconds
• P1, P2, P3 parallelized on 100 machines with speedup of 80 (Each executes in 2.5 seconds)
• After parallelization, runtime = 27.5 seconds
• Effective Speedup: (620s/27.5s) = 22.5
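The five-phase example can be checked numerically with the restated form of the law: total sequential time divided by the sum of each phase's time over its speedup. The function name `effective_speedup` is an assumption for this sketch; the phase times and speedups come from the slide.

```python
def effective_speedup(phase_times, phase_speedups):
    """Return (sequential runtime, parallel runtime, overall speedup)."""
    sequential = sum(phase_times)
    parallel = sum(t / s for t, s in zip(phase_times, phase_speedups))
    return sequential, parallel, sequential / parallel

# P0 and P4 stay serial (speedup 1); P1..P3 each get a speedup of 80.
times = [10, 200, 200, 200, 10]
speedups = [1, 80, 80, 80, 1]
seq, par, s = effective_speedup(times, speedups)
```

Although 600 of the 620 seconds are parallelized with a speedup of 80, the two 10-second serial phases hold the overall speedup to roughly 22.5.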
![Page 34: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/34.jpg)
MapReduce Workflow
![Page 35: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/35.jpg)
Extending Shuffle
![Page 36: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/36.jpg)
Why Shuffle?
• Often the most expensive phase in MapReduce, because it involves slow disks and the network
• Map tasks partition, sort and serialize outputs, and write to local disk
• Reduce tasks pull individual Map outputs over network, merge, and may spill to disk
![Page 37: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/37.jpg)
Message Cost Model
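T = α + Nβ
where T is the time to send one message, α is the fixed per-message latency, N is the message size in bytes, and β is the transfer time per byte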
![Page 38: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/38.jpg)
Message Granularity
• For Gigabit Ethernet
• α = 300 μs
• 1/β = 100 MB/s (β = 10 ns per byte)
• 100 Messages of 10 KB each = 40 ms
• 10 Messages of 100 KB each = 13 ms
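The two totals above follow from applying T = α + Nβ per message. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and treats the 100 MB/s figure as bandwidth, so the per-byte cost is its reciprocal; the constant and function names are hypothetical.

```python
ALPHA_S = 300e-6             # fixed per-message latency: 300 microseconds
BANDWIDTH_BPS = 100e6        # 100 MB/s, in bytes per second
BETA_S = 1 / BANDWIDTH_BPS   # transfer time per byte: 10 nanoseconds

def transfer_time(num_messages, bytes_per_message):
    """Total time in seconds to send num_messages messages one after another."""
    return num_messages * (ALPHA_S + bytes_per_message * BETA_S)

small = transfer_time(100, 10_000)   # 100 messages of 10 KB
large = transfer_time(10, 100_000)   # 10 messages of 100 KB
```

Both cases move the same 1 MB of payload; the difference comes entirely from paying the fixed latency α 100 times instead of 10 times.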
![Page 39: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/39.jpg)
Alpha-Beta
• Common Mistake: Assuming that α is constant
• Scheduling latency for responder
• Time slice for MR daemons is inversely proportional to the number of concurrent tasks
• Common Mistake: Assuming that β is constant
• Network congestion
• TCP incast
![Page 40: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/40.jpg)
Efficient Hardware Platforms
• Mellanox - Hadoop Acceleration through Network-assisted Merge
• RoCE - Brocade, Cisco, Extreme, Arista...
• SSD - Velobit, Violin, FusionIO, Samsung..
• Niche - Compression, Encryption...
![Page 41: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/41.jpg)
Pluggable Shuffle & Sort
• Replace HTTP-based pull with RDMA
• Avoid spilling altogether
• Replace default Sort implementation with Job-optimized sorting algorithm
• Experimental APIs
• Search for "PluggableShuffleAndPluggableSort.html" in the Hadoop documentation
![Page 42: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/42.jpg)
Mellanox UDA
• Developed jointly with Auburn University
• 2x Performance on TeraSort
• Reduces disk writes by 45%, disk reads by 15%
![Page 43: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/43.jpg)
Syncsort DMX-h
![Page 44: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/44.jpg)
Beyond MapReduce with YARN
![Page 45: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/45.jpg)
Hadoop 1.0 (Image Courtesy Arun Murthy, Hortonworks)
[Figure: five single-application silos: three BATCH (each on its own HDFS), one INTERACTIVE, and one ONLINE]
![Page 46: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/46.jpg)
MapReduce 1.0 (Image Courtesy Arun Murthy, Hortonworks)
![Page 47: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/47.jpg)
Hadoop 2.0 (Image Courtesy Arun Murthy, Hortonworks)
HADOOP 1.0
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
• Pig (data flow), Hive (sql), Others (cascading)
HADOOP 2.0
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• Tez (execution engine), MR (batch), RT Stream/Graph (Storm, Giraph), Services (HBase)
• Pig (data flow), Hive (sql), Others (cascading)
![Page 48: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/48.jpg)
YARN Platform (Image Courtesy Arun Murthy, Hortonworks)
Applications Run Natively IN Hadoop
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search) (Weave…)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
![Page 49: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/49.jpg)
YARN Architecture (Image Courtesy Arun Murthy, Hortonworks)
[Figure: a ResourceManager with its Scheduler, a grid of NodeManagers, and two applications spread across them: AM 1 with Containers 1.1 to 1.3, and AM2 with Containers 2.1 to 2.4, plus Client2]
![Page 50: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/50.jpg)
YARN
• Yet Another Resource Negotiator
• Resource Manager
• Node Managers
• Application Masters
• Specific to paradigm, e.g. MR Application master (aka JobTracker)
![Page 51: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/51.jpg)
Beyond MapReduce
• Apache Giraph - BSP & Graph Processing
• Storm on Yarn - Streaming Computation
• HOYA - HBase on Yarn
• Hamster - MPI on Hadoop
• More to come ...
![Page 52: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/52.jpg)
Hamster
• Hadoop and MPI on the same cluster
• OpenMPI Runtime on Hadoop YARN
• Hadoop Provides: Resource Scheduling, Process monitoring, Distributed File System
• Open MPI Provides: Process launching, Communication, I/O forwarding
![Page 53: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/53.jpg)
Hamster Components
• Hamster Application Master
• Gang Scheduler, YARN Application Preemption
• Resource Isolation (lxc Containers)
• ORTE: Hamster Runtime
• Process launching, Wireup, Interconnect
![Page 54: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/54.jpg)
Hamster Architecture
[Figure: a Client, the Resource Manager (Scheduler, AMService), and Node Managers with Aux Srvcs. One Proc/Container holds the MPI AM with its Scheduler and HNP; other Proc/Containers hold a Framework Daemon, NS, and MPI processes. Labeled protocols: Client-RM, RM-AM, AM-NM, RM-NodeManager]
![Page 55: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/55.jpg)
Hamster Scalability
• Sufficient for small to medium HPC workloads
• Job launch time gated by YARN resource scheduler

           Launch     WireUp     Collectives   Monitor
OpenMPI    O(logN)    O(logN)    O(logN)       O(logN)
Hamster    O(N)       O(logN)    O(logN)       O(logN)
![Page 56: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/56.jpg)
GraphLab + Hamster on Hadoop
![Page 57: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/57.jpg)
About GraphLab
• Graph-based, High-Performance distributed computation framework
• Started by Prof. Carlos Guestrin at CMU in 2009
• GraphLab Inc. was recently founded to commercialize Graphlab.org
![Page 58: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/58.jpg)
GraphLab Features
• Topic Modeling (e.g. LDA)
• Graph Analytics (Pagerank, Triangle counting)
• Clustering (K-Means)
• Collaborative Filtering
• Linear Solvers
• etc...
![Page 59: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/59.jpg)
Only Graphs are not Enough
• A full data processing workflow requires ETL/Postprocessing, Visualization, Data Wrangling, and Serving
• MapReduce excels at data wrangling
• OLTP/NoSQL Row-Based stores excel at Serving
• GraphLab should co-exist with other Hadoop frameworks
![Page 60: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/60.jpg)
Coming Soon…
![Page 61: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/61.jpg)
Extending HDFS
![Page 62: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/62.jpg)
HCFS
• Hadoop Compatible File Systems
• FileSystem, FileContext
• S3, Local FS, webhdfs
• Azure Blob Storage, CassandraFS, Ceph, CleverSafe, Google Cloud Storage, Gluster, Lustre, QFS, EMC ViPR (more to come)
![Page 63: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/63.jpg)
New Dataset
• Reuse Namenode and Datanode implementations
• Substitute a different DataSet implementation: FsDatasetSpi, FsVolumeSpi
• Jira: HDFS-5194
![Page 64: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/64.jpg)
Extending Namenode
• Pluggable Namespace: HDFS-5324, HDFS-5389
• Pluggable Block Management: HDFS-5477
• Requires fine-grained locking in Namenode: HDFS-5453
![Page 65: Extending Hadoop for Fun & Profit](https://reader033.fdocuments.us/reader033/viewer/2022051012/54138c668d7f7299698b465c/html5/thumbnails/65.jpg)
Questions ?