A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose [email protected]
Faculty of Informatics, PUCRS, Porto Alegre, Brazil
February 13, 2014
Outline
• Introduction
• Container-based Virtualization
• MapReduce
• Evaluation
• Conclusion
Introduction
• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios: better resource sharing, cloud computing
• However, hypervisor-based technologies have traditionally been avoided in MapReduce environments
Container-based Virtualization
• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system

[Diagram: container-based virtualization stacks Guest Processes directly on Hardware → Host OS → Virtualization Layer; hypervisor-based virtualization adds a Guest OS per group of Guest Processes on top of the Virtualization Layer]
Container-based Virtualization
• Each container has:
  • Its own network interface (and IP address): bridged, routed, …
  • Its own filesystem
• Isolation (security): containers A and B can't see each other
• Isolation (resource usage): RAM, CPU, I/O
• Current systems: Linux-VServer, OpenVZ, LXC
Container-based Virtualization
• Implements Linux namespaces:
  • Mount: mounting/unmounting file systems
  • UTS: hostname, domain name
  • IPC: SysV message queues, semaphores, shared memory segments
  • Network: IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID: its own set of PIDs
• Chroot is a filesystem namespace
• Current systems: Linux-VServer, OpenVZ, LXC
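The namespace memberships listed above can be inspected on a Linux host through procfs; a minimal sketch (assumes `/proc` is available, and simply returns an empty mapping elsewhere):

```python
import os

def current_namespaces(pid="self"):
    """Return the namespace identifiers of a process, read from /proc.

    Each entry in /proc/<pid>/ns is a symlink such as 'uts:[4026531838]';
    two processes share a namespace exactly when these identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):  # not a Linux host with procfs mounted
        return {}
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}

if __name__ == "__main__":
    for name, ident in sorted(current_namespaces().items()):
        print(name, ident)
```

Comparing this output for a process inside and outside a container shows which namespaces the container system actually unshared.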
Container-based Systems
• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
• Linux Containers (LXC)
  • Based on cgroups
Hypervisor- vs. Container-based Systems

| Hypervisor-based | Container-based |
| --- | --- |
| Different OS kernels | Single kernel |
| Device emulation | System calls |
| Many FS caches | Single FS cache |
| Limits per machine | Limits per process |
| High performance overhead | Low performance overhead |
MapReduce
• A parallel programming model
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to address the parallelism problem
  • A highly visible case: MapReduce has been successfully used by companies like Google, Yahoo!, Facebook and Amazon
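The model's simplicity is easy to see in miniature; a word-count sketch of the map and reduce phases in plain Python (a toy illustration of the programming model, not Hadoop's actual API):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
result = reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
print(result)  # per-word counts across all input lines
```

In a real MapReduce cluster the map calls run in parallel across nodes and the shuffle moves pairs over the network, but the user-visible contract is exactly these two functions.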
MapReduce and Containers
• Apache Mesos
  • Shares a cluster between multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications
Evaluation
• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (without hardware threads) per node
  • 16 GB of memory per node
  • 146 GB of disk space per node
• Analysis of performance
  • Through micro-benchmarks:
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Analysis of isolation
  • Through the IBS benchmark
• At least 50 executions were performed for each experiment
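Reporting results over at least 50 runs implies summarizing each experiment by its mean and spread; a minimal sketch of that summary step (the sample times below are illustrative, not measurements from the paper):

```python
import statistics

def summarize(times):
    """Mean and sample standard deviation of repeated execution times."""
    return statistics.mean(times), statistics.stdev(times)

# Hypothetical execution times (seconds) from repeated runs of one benchmark.
runs = [151.2, 149.8, 150.5, 152.0, 150.1, 149.4]
mean, sd = summarize(runs)
print(f"mean={mean:.1f}s sd={sd:.2f}s")
```

A small standard deviation relative to the mean is what justifies comparing single summary numbers across virtualization systems.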
HDFS Evaluation
• Settings:
  • Replication factor of 3
  • File sizes from 100 MB to 3000 MB
• All container-based systems have performance similar to native
• The OpenVZ results show a loss of about 3 Mbps
  • This is due to the CFQ I/O scheduler
[Figure: HDFS throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
HDFS Evaluation
• All container-based systems obtained performance results similar to native
• Linux-VServer uses a physical (non-virtualized) network path
[Figure: HDFS throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
NameNode Evaluation using NNBench
• The NNBench benchmark was chosen to evaluate the NameNode component
• It generates operations on 1000 files on HDFS
• Linux-VServer reaches an average latency of about 48 ms, while LXC obtained the worst result with an average of about 56 ms
• The differences are not significant given the magnitude of the numbers
• However, the key findings are that no exceptions were observed under heavy HDFS management stress, and that all systems responded as effectively as native

| | Native | LXC | OpenVZ | VServer |
| --- | --- | --- | --- | --- |
| Open/Read (ms) | 0.51 | 0.52 | 0.51 | 0.49 |
| Create/Write (ms) | 54.65 | 56.89 | 51.96 | 48.90 |
MapReduce Evaluation using MRBench
• The results obtained from MRBench show that the MapReduce layer suffers no substantial effect while running on the different container-based virtualization systems

| | Native | LXC | OpenVZ | VServer |
| --- | --- | --- | --- | --- |
| Execution time | 14251 | 13577 | 14304 | 13614 |
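From these execution times, the relative difference of each system against native is a one-line calculation (the numbers are taken from the slide; units as reported there):

```python
def overhead_vs_native(native, value):
    """Relative difference from the native execution time, in percent.

    Negative values mean the system ran faster than native in this sample,
    which can happen when differences are within measurement noise.
    """
    return (value - native) / native * 100

native = 14251
for name, t in {"LXC": 13577, "OpenVZ": 14304, "VServer": 13614}.items():
    print(f"{name}: {overhead_vs_native(native, t):+.1f}%")
```

All three systems land within about 5% of native, consistent with the slide's claim of no substantial effect.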
Analyzing Performance with WordCount
[Figure: WordCount execution time (seconds) for Native, LXC, OpenVZ and VServer]
• 30 GB of input data
• The peak of performance degradation for OpenVZ is explained by the I/O scheduler overhead
Analyzing Performance with TeraSort
[Figure: TeraSort execution time (seconds) for Native, LXC, OpenVZ and VServer]
• Standard MapReduce sort
• Steps:
  • Generate 30 GB of input data
  • Run the sort on that input data
• An HDFS block size of 64 MB
Performance Isolation
• Run a baseline application in container A and measure its execution time
• Run the baseline application in container A again, while container B runs a stress test, and measure the execution time
• Compare the two execution times to obtain the performance degradation (%)
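The degradation metric sketched above compares the baseline execution time with and without a competing stress workload; a minimal sketch (the sample times are illustrative, not measurements from the paper):

```python
def degradation_pct(baseline_time, stressed_time):
    """Performance degradation (%) of the baseline application when a
    stress test runs in a neighboring container on the same host."""
    return (stressed_time - baseline_time) / baseline_time * 100

# Hypothetical times: 120 s running alone, 130 s while a neighbor
# container runs a stress test.
print(f"{degradation_pct(120.0, 130.0):.1f}% degradation")
```

Perfect isolation corresponds to 0% degradation: the stress test in container B would have no measurable effect on container A.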
Performance Isolation
• We chose LXC as the representative container-based virtualization system to be evaluated

| | CPU | Memory | I/O | Fork Bomb |
| --- | --- | --- | --- | --- |
| LXC | 0% | 8.3% | 5.5% | 0% |

• The per-container CPU usage limits work well: no significant impact was noted
• A little performance degradation (memory and I/O) needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that contains the fork bomb's impact
Conclusions
• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its capabilities for restricting resources among containers
• Although some works already take advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters
Future Work
• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects of green computing, such as the trade-off between performance and energy consumption
Thank you for your attention!