SFO15-TR6: Hadoop on ARM
Presented by: Nachiket Bhoyar, Steve Capper
Date: Wednesday 23 September 2015
Event: SFO15
Agenda
1. Quick intro to Hadoop stack.
2. Summary of our work.
3. Demo time!
4. Q & A
The Hadoop Stack
And lots more components!
The Hadoop Distribution
● LOTS of components fit with Hadoop.
● Hadoop distros package these.
● The Open Data Platform Initiative has just been formed to promote compatibility between Hadoop distros.
Our Hadoop work
● Open Data Platform is in its early days.
● We needed a Hadoop distro to start experimenting with on AArch64.
● We chose to start with Hortonworks (who are a member of Open Data Platform).
● We will move on to work with Open Data Platform distributions.
AArch64 Hadoop Work
● A lot of ramp up on build systems (Ant, Ivy, Maven, Gradle…), and tweaking build logic.
● We had to stop builds downloading the x86 version of node.js then running it on ARM…
  ○ io.js was needed as it worked with AArch64 V8 JS.
● Otherwise, things mostly just worked.
● Upstream Hadoop and Spark are being investigated too.
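The node.js issue above comes down to build logic that fetches an x86 binary regardless of the host. A minimal sketch of an architecture guard follows, assuming a hypothetical helper `select_js_runtime` and illustrative runtime labels; none of this is taken from the actual Hortonworks build scripts.

```python
import platform

# Hypothetical mapping from machine name to the JavaScript runtime a build
# should fetch. The labels are illustrative only, not real download names.
RUNTIME_BY_MACHINE = {
    "x86_64": "node.js (x86_64 build)",
    "aarch64": "io.js (AArch64 build)",
}


def select_js_runtime(machine=None):
    """Return the runtime label for this machine, or raise if unsupported."""
    machine = machine or platform.machine()
    try:
        return RUNTIME_BY_MACHINE[machine]
    except KeyError:
        # Fail loudly instead of silently falling back to an x86 binary.
        raise RuntimeError(f"no JavaScript runtime known for {machine!r}")
```

Failing on an unknown architecture, rather than defaulting to x86, is the design choice that would have surfaced the problem at download time instead of at run time.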
OpenJDK Work
● Building and testing Hadoop + Spark has given the AArch64 OpenJDK a very good stress test.
● A bug has been found and it has been fixed in the 1508 OpenJDK release:
  ○ https://bugs.openjdk.java.net/browse/JDK-8133842
Future work
● We need to package up everything:
  ○ currently tricky as we don’t have the deb/rpm logic,
  ○ some build systems appear to download the internet,
  ○ which is very bad in areas with no local mirrors!
● Clusters to be deployed + tested + profiled.
● Workloads that are representative of the real world need to be formulated and executed, as well as micro-benchmarks.
Demo Time!
Useful H2O Links
• H2O: http://h2o.ai/product/
• Flow: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/flow/README.md
• Downloads: http://h2o.ai/download/
• HDP + H2O video: https://www.youtube.com/watch?v=KigG7rPBNHM
• Documentation: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/7/docs-website/h2o-docs/index.html
• H2O Airlines Demo Video: https://www.youtube.com/watch?v=bInMSgZhDd4
Thank you for your attention!
Any questions/comments?
Backup Slides
Agenda
1. What is H2O?
2. What is a Flow?
3. H2O with Hadoop
4. System Configuration
5. Demo
6. Summary
What is H2O?
● Data collection is easy. Decision making is hard.
● H2O derives insight using faster and better predictive modelling.
● Combines power of:
  ○ Highly advanced algorithms
  ○ Freedom of open source
  ○ Capacity of scalable in-memory processing
● Processes big data on single or multiple nodes.
● Supports R, Python, Scala, Java and ReST API.
● Easy integration with Hadoop
H2O Stack
What is a Flow?
● A Flow is an open-source user interface for H2O
● Allows user to combine code execution, text, mathematics, graphs, and rich media in a single document
● In simplest sense, it’s a sequence of executable cells
● Cells can be modified, rearranged or saved to library
● Each cell has input field to:
  ○ Enter commands
  ○ Define functions
  ○ Call other functions
  ○ Access other cells/objects in the flow
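The "sequence of executable cells" idea can be illustrated with a toy model. This is a sketch in plain Python, not Flow's own cell syntax or implementation; the `Flow` class and its methods are hypothetical names chosen for the example.

```python
class Flow:
    """Toy model of a notebook: ordered cells sharing one namespace."""

    def __init__(self):
        self.cells = []       # source text of each cell, in order
        self.namespace = {}   # objects defined by earlier cells

    def add_cell(self, source):
        self.cells.append(source)

    def run(self):
        # Execute cells in order; later cells see names from earlier ones.
        for source in self.cells:
            exec(source, self.namespace)
        return self.namespace


flow = Flow()
flow.add_cell("def square(x):\n    return x * x")  # a cell that defines a function
flow.add_cell("result = square(7)")                 # a cell that calls it
print(flow.run()["result"])  # prints 49
```

The shared namespace is what lets one cell define a function and a later cell call it or access its objects, as the bullets describe.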
H2O with Hadoop
● H2O can be run as an application in Hadoop
● It is run as a mapper process on each node
● Easy integration of data from HDFS
● Shows Cluster Status:
  ○ GC status, Disk usage, System usage, System load, etc.
  ○ Water meter to show status of cores
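Running H2O as a Hadoop application is typically done through the driver jar shipped with H2O's Hadoop downloads. The sketch below only assembles the command line. It assumes the `h2odriver.jar` flags `-nodes`, `-mapperXmx` and `-output` as described in H2O's Hadoop documentation; the helper name, the heap size and the output directory are hypothetical and not taken from the talk.

```python
def h2o_driver_command(nodes, mapper_xmx, output_dir, jar="h2odriver.jar"):
    """Build (but do not run) the argument list for launching H2O on Hadoop."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return [
        "hadoop", "jar", jar,
        "-nodes", str(nodes),        # number of H2O nodes (mapper processes)
        "-mapperXmx", mapper_xmx,    # Java heap size for each mapper
        "-output", output_dir,       # HDFS output directory for the job
    ]


# Hypothetical values: 6 nodes matches the cluster size in this deck,
# the heap size and directory name are placeholders.
cmd = h2o_driver_command(nodes=6, mapper_xmx="32g", output_dir="h2o_out")
print(" ".join(cmd))
```

On a machine with a configured Hadoop client, the returned list could be passed to `subprocess.run`.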
System Configuration
● Cluster - 6 nodes of AMD Opteron A1100 ARM64 servers
● Memory - 64GB per node
● OS - Fedora 22
● JDK - Linaro Open JDK 1.8 15/08 release
● Hadoop - Hortonworks HDP 2.6.0-SNAPSHOT
● H2O version - h2o-3.0.0.30-hdp2.2
Model Building Scaling
• Linear scaling observed for both 32GB and 64GB
File Parsing Scaling
• This phase is network dependent
• A linear scaling observed for 10GigE
• Network bottleneck observed for 1GigE going beyond 2 nodes
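Whether a phase scales linearly can be checked from wall-clock timings by comparing the measured speedup with the ideal one. A minimal sketch, assuming a hypothetical helper `scaling_report`; the timings in the example are placeholders, NOT measurements from this talk.

```python
def scaling_report(timings):
    """Map node count -> (speedup, efficiency) relative to the smallest run.

    timings: dict of {node_count: wall_clock_seconds}.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    report = {}
    for nodes, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        ideal = nodes / base_nodes  # speedup expected under linear scaling
        report[nodes] = (speedup, speedup / ideal)
    return report


# Placeholder timings, NOT measurements from the talk.
example = {1: 120.0, 2: 60.0, 4: 40.0}
for nodes, (speedup, eff) in scaling_report(example).items():
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

An efficiency that stays near 100% corresponds to linear scaling; a drop past a given node count, as in the 4-node placeholder row, is the pattern a bottleneck would produce.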
Summary
● AMD Opteron A1100 and Linaro Open JDK 1.8 scale linearly w.r.t. number of nodes on H2O
● 10GigE scales linearly, whereas 1GigE suffers from a network bottleneck beyond 2 nodes
Summary - H2O
● H2O helps to easily apply math and predictive analytics to solve challenging business problems
● With H2O, you can:
  ○ Make better predictions using ready-to-use algorithms and processing power to analyze: bigger data sets, more models and more variables
  ○ Work with your existing languages and tools
  ○ Extend the platform seamlessly into your Hadoop environments
● It is Open Source
Summary - Flow
● Import data Files > Build Models > Iteratively Improve them > Make predictions
● Easy-to-use Modern Graphical Interactive WebUI
● Access any H2O object in well-organized tabular data