FastR+Apache Flink
date post
23-Jan-2018Category
Software
view
977download
1
Embed Size (px)
Transcript of FastR+Apache Flink
- 1. FastR + Apache Flink Enabling distributed data processing in R
- 2. Disclaimer This is a work in progress
- 3. Outline 1. Introduction and motivation Flink and R Oracle Truffle + Graal 2. FastR and Flink Execution Model Memory Management 3. Preliminary Results 4. Conclusions and Future Work 5. [DEMO] within a distributed configuration
- 4. Introduction and motivation
- 5. Why R? http://spectrum.ieee.org R is one of the most popular languages for data analytics Functional Lazy Object Oriented Dynamic R is:
- 6. But, R is slow Benchmark SlowdowncomparedtoC
- 7. Problem It is neither fast nor distributed
- 8. Problem It is neither fast nor distributed
- 9. Apache Flink Framework for distributed stream and batch data processing Custom scheduler Custom optimizer for operations Hadoop/Amazon plugins YARN connector for cluster execution It runs on laptops, clusters and supercomputers with the same code High level APIs: Java, Scala and Python
- 10. Flink Example, Word Count public class LineSplit { public void flatMap(...) { for (String token : value.split("W+")) if (token.length() > 0) { out.collect(new Tuple2(token, 1); } } DataSet textInput = env.readTextFile(input); DataSet counts= textInput.flatMap(new LineSplit()) .groupBy(0) .sum(1); JobManager TaskManager TaskManager Flink Client And Optimizer
- 11. We propose a compilation approach built within Truffle/Graal for distributed computing on top of Apache Flink. Our Solution How can we enable distributed data processing in R exploiting Flink?
- 12. FastR on Top of Flink * Flink Material FastR
- 13. Truffle/Graal Overview
- 14. How to Implement Your Own Language? Oracle Labs material
- 15. Truffle/Graal Infrastructure Oracle Labs material
- 16. Truffle node specialization Oracle Labs material
- 17. Deoptimization and rewriting Oracle Labs material
- 18. More Details about Truffle/Graal https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations [Thomas Wrthinger, Christian Wimmer, Andreas W, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, Mario Wolczko] One VM to Rule Them All In Proceedings of Onward!, 2013.
- 19. OpenJDK Graal - New JIT for Java Graal Compiler is a JIT compiler for Java which is written in Java The Graal VM is a modification of the HotSpot VM -> It replaces the client and server compilers with the Graal compiler. Open Source: http://openjdk.java.net/projects/graal/
- 20. FastR + Flink: implementation
- 21. Our Solution: FastR - Flink Compiler Our goal: Run R data processing applications in a distributed system as easy as possible Approach Custom FastR-Flink compiler builtins (library) Minimal R/Flink setup and configuration Offload hard computation if the cluster is available
- 22. FastR-Flink API Example - Word Count # Word Count in R bigText