Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Relationship
Introducing a New Client/Server Framework for Big Data ... · Introducing a New Client/Server...
Transcript of Introducing a New Client/Server Framework for Big Data ... · Introducing a New Client/Server...
Introducing a New Client/Server Framework for BigData Analytics with the R Language
Drew Schmidt, Wei-Chen, and George Ostrouchov
July 19, 2016
Slides: http://bit.do/cddXm Drew SchmidtIntroducing a New Client/Server Framework for Big Data
Analytics with the R Language
Acknowledgements
Support
This work was supported in part by the project “Harnessing ScalableLibraries for Statistical Computing on Modern Architectures andBringing Statistics to Large Scale Computing” funded by the NationalScience Foundation Division of Mathematical Sciences under Grant No.1418195.This work used resources supported by the National ScienceFoundation under Grant Number 1137097 and by the University ofTennessee through the Beacon Project.Just got an allocation on Comet! Thank you XSEDE!
Disclaimer
The findings and conclusions in this presentation have not beenformally disseminated by the U.S. Department of Health & HumanServices nor by the U.S. Department of Energy, and should not beconstrued to represent any determination or policy of University,Agency, Administration and National Laboratory.
Slides: http://bit.do/cddXm Drew SchmidtIntroducing a New Client/Server Framework for Big Data
Analytics with the R Language
Background and Motivation
1 Background and Motivation
2 pbdR Compute Backend
3 The Client/Server
4 Challenges and Future Work
Slides: http://bit.do/cddXm Drew Schmidt
Background and Motivation
Problems with ”Big Data“ Software
Many frameworks; what do they all do?
Don’t always play nice with HPC systems.
Often not as ”high level“ as advertised.
Almost exclusively batch!
Slides: http://bit.do/cddXm Drew Schmidt 3/27
Background and Motivation
Data Analysis Is An Interactive Activity
Data analysis is an interactiveactivitya
aData analysis is an interactive activity
Slides: http://bit.do/cddXm Drew Schmidt 4/27
Background and Motivation
Why am I here?
XSEDE: a conference for users and service providers.
I want to talk to you!
Slides: http://bit.do/cddXm Drew Schmidt 5/27
pbdR Compute Backend
1 Background and Motivation
2 pbdR Compute Backend
3 The Client/Server
4 Challenges and Future Work
Slides: http://bit.do/cddXm Drew Schmidt
pbdR Compute Backend
pbdR
Client/Server + pbdR = interactive big data analysis
This talk is not about the old stuff. . .
But we have to talk about it!
Slides: http://bit.do/cddXm Drew Schmidt 6/27
pbdR Compute Backend
Programming with Big Data in R (pbdR)
Free/Open Source R packages.
Actively maintained, available on CRANand Github.
Packages: Library interfaces, high-levelframeworks, profiling and vis, . . .
For today: MPI+ScaLAPACK stuff.
Syntax identical to R.
Slides: http://bit.do/cddXm Drew Schmidt 7/27
pbdR Compute Backend
About the Name
Slides: http://bit.do/cddXm Drew Schmidt 8/27
pbdR Compute Backend
R as an Interface Language
http://datascience.la/john-chambers-user-2014-keynote/
Slides: http://bit.do/cddXm Drew Schmidt 9/27
pbdR Compute Backend
Distributed Matrices and Statistics with pbdDMAT
21.34
42.6885.35170.7 341.41
1016.09
21.3442.6885.35 170.7
341.411016.09
21.3442.68
85.35
170.7341.41
1016.09
25
50
75
100
125
5041008
20164032
806424000
5041008
20164032
806424000
5041008
20164032
806424000
Cores
Run
Tim
e (S
econ
ds)
Predictors 500 1000 2000
Fitting y~x With Fixed Local Size of ~43.4 MiB
l i b r a r y (pbdDMAT)i n i t . g r i d ( )
x <− ddmat r i x ( ” rnorm” ,nrow=m, nco l=n )
y <− ddmat r i x ( ” rnorm” ,nrow=m, nco l =1)
mdl <− lm . f i t ( x=x , y=y )
f i n a l i z e ( )
Slides: http://bit.do/cddXm Drew Schmidt 10/27
pbdR Compute Backend
pbdR Paradigms/”Opinions“
In core.
Focus on dense data.
Integrates well with traditional HPC.
C and Fortran for performance; R for the interface.
Exclusively batch.
(Mostly) Very different from Hadoop/Spark!
Slides: http://bit.do/cddXm Drew Schmidt 11/27
pbdR Compute Backend
”OLCF Researchers Scale R to Tackle Big Science Data Sets“
A problem that takes several hours onApache Spark [was analyzed] in less than aminute using R on OLCF high-performancehardware.
”. . . for situations where one needsinteractive near-real-time analysis, thepbdR approach is much better.“
https://www.hpcwire.com/2016/07/06/
olcf-researchers-scale-r-tackle-big-science-data-sets/
Slides: http://bit.do/cddXm Drew Schmidt 12/27
The Client/Server
1 Background and Motivation
2 pbdR Compute Backend
3 The Client/ServerDesign and Architecture of the Client ServerExamplesOverhead
4 Challenges and Future Work
Slides: http://bit.do/cddXm Drew Schmidt
The Client/Server Design and Architecture of the Client Server
3 The Client/ServerDesign and Architecture of the Client ServerExamplesOverhead
Slides: http://bit.do/cddXm Drew Schmidt
The Client/Server Design and Architecture of the Client Server
Motivation
Connect local R session to a remote one.
Slides: http://bit.do/cddXm Drew Schmidt 13/27
The Client/Server Design and Architecture of the Client Server
The C/S Framework
No specialized softwareenvironment; just R packages.
Designed for the cloud or HPCresources.
Mostly ”un-opinionated“; can useour compute backend, SparkR,whatever.
Optional security features:passwords and encryption.
Slides: http://bit.do/cddXm Drew Schmidt 14/27
The Client/Server Design and Architecture of the Client Server
A Quick Summary of the Main Components
pbdZMQ: ZeroMQ bindings.Use: our C/S packages; IRkernel, . . .
remoter: client/server core.Use: cloud computing, ”reference“ bigdata framework.
pbdCS: Extension of remoter for the MPIbackend stuff.Use: Bringing interactivity to pbdR ”bigdata“ system.
Slides: http://bit.do/cddXm Drew Schmidt 15/27
The Client/Server Design and Architecture of the Client Server
Comparison to Similar Utilities
”Moving computations to a remote machine“
Web frameworks: shiny, htmlwidgets, fiery, . . .
Revolution Microsoft Analytics: Microsoft R Server and R Client
Jupyter/notebooks.
RStudio Server
rmote
Slides: http://bit.do/cddXm Drew Schmidt 16/27
The Client/Server Examples
3 The Client/ServerDesign and Architecture of the Client ServerExamplesOverhead
Slides: http://bit.do/cddXm Drew Schmidt
The Client/Server Examples
Basics
Elsewhere:
1 R> remoter :: server ()
2
3 ## [2016 -07 -19 12:53:23]: *** Launching secure server ***
In your local R session:
1 R> remoter :: client ()
2 remoter> x = 1
3 remoter> x
4 ## [1] 1
5 remoter> exit()
6
7 R> x
8 ## Error: object ’x’ not found
9 R> remoter :: client ()
10
11 remoter> s2c(x)
12 remoter> exit()
13 R> x
14 [1] 1
Slides: http://bit.do/cddXm Drew Schmidt 17/27
The Client/Server Examples
Slides: http://bit.do/cddXm Drew Schmidt 18/27
The Client/Server Examples
A Demo?
No time for a live demo. . .
Any time you see me at XSEDE16, ask and I’ll give you one!
For a non-live demo, see:https://github.com/snoweye/user2016.demo
More information in the remoter vignette:https://cran.r-project.org/web/packages/remoter/
vignettes/remoter.html
Slides: http://bit.do/cddXm Drew Schmidt 19/27
The Client/Server Overhead
3 The Client/ServerDesign and Architecture of the Client ServerExamplesOverhead
Slides: http://bit.do/cddXm Drew Schmidt
The Client/Server Overhead
Size Costs
There is a size overhead in transmitted messages.
For almost everything you would do, no big deal.
See paper for full discussion.
Slides: http://bit.do/cddXm Drew Schmidt 20/27
The Client/Server Overhead
Relays
Slides: http://bit.do/cddXm Drew Schmidt 21/27
The Client/Server Overhead
Run Time Costs
Slides: http://bit.do/cddXm Drew Schmidt 22/27
Challenges and Future Work
1 Background and Motivation
2 pbdR Compute Backend
3 The Client/Server
4 Challenges and Future Work
Slides: http://bit.do/cddXm Drew Schmidt
Challenges and Future Work
Already done!
Batch computing with a remote server.
Graphics and help, integration with rmote.
Many small things we didn’t bother to mention.
Slides: http://bit.do/cddXm Drew Schmidt 23/27
Challenges and Future Work
Object Guarding
Currently possible to send huge amounts of data without realizing it.
1 pbdR > print(huge_thing)
2 ## uh-oh!
Slides: http://bit.do/cddXm Drew Schmidt 24/27
Challenges and Future Work
Relays and Integration with the Scheduler
Some day. . .
1 R> bigJob(machine="stampede", nodes =10, interactive=TRUE)
Slides: http://bit.do/cddXm Drew Schmidt 25/27
Challenges and Future Work
References
Schmidt, D., Chen, W.-C. and Ostrouchov, G. (2016), “Introducing aNew Client/Server Framework for Big Data Analytics with the RLanguage, XSEDE2016.
Chen, W.-C., Ostrouchov, G., Schmidt, D., Patel, P. and Yu, H.(2012), “A Quick Guide for the pbdMPI Package. R Vignette, URLhttp://cran.r-project.org/package=pbdMPI
Schmidt, D., Chen, W.-C., Patel, P., and Ostrouchov, G. (2014),“Speaking Serial R with a Parallel Accent. R Vignette, URLhttp://cran.r-project.org/package=pbdDEMO
Chen, W.-C., Ostrouchov, G., Pugmire, D., Prahat, and Wehner, M.(2013), “A Parallel EM Algorithm for Model-Based Clustering Appliedto the Exploration of Large Spatio-Temporal Data, Technometrics,55(4), 513-523.
Slides: http://bit.do/cddXm Drew Schmidt 26/27
Challenges and Future Work
∼Thanks!∼
Questions?
Email: [email protected]
GitHub: https://github.com/wrathematics
Web: http://wrathematics.info
Blog: http://librestats.com
Twitter: @wrathematics
Slides: http://bit.do/cddXm Drew Schmidt 27/27