
Introducing a New Client/Server Framework for Big Data Analytics with the R Language

Drew Schmidt, Wei-Chen Chen, and George Ostrouchov

July 19, 2016

Slides: http://bit.do/cddXm

Acknowledgements

Support

This work was supported in part by the project “Harnessing Scalable Libraries for Statistical Computing on Modern Architectures and Bringing Statistics to Large Scale Computing” funded by the National Science Foundation Division of Mathematical Sciences under Grant No. 1418195. This work used resources supported by the National Science Foundation under Grant Number 1137097 and by the University of Tennessee through the Beacon Project. Just got an allocation on Comet! Thank you XSEDE!

Disclaimer

The findings and conclusions in this presentation have not been formally disseminated by the U.S. Department of Health & Human Services nor by the U.S. Department of Energy, and should not be construed to represent any determination or policy of University, Agency, Administration and National Laboratory.


Background and Motivation

1 Background and Motivation

2 pbdR Compute Backend

3 The Client/Server

4 Challenges and Future Work


Problems with "Big Data" Software

Many frameworks; what do they all do?

Don’t always play nice with HPC systems.

Often not as "high level" as advertised.

Almost exclusively batch!


Data Analysis Is An Interactive Activity

Data analysis is an interactive activity. (a)

(a) Data analysis is an interactive activity.


Why am I here?

XSEDE: a conference for users and service providers.

I want to talk to you!


pbdR Compute Backend


pbdR

Client/Server + pbdR = interactive big data analysis

This talk is not about the old stuff...

But we have to talk about it!


Programming with Big Data in R (pbdR)

Free/Open Source R packages.

Actively maintained, available on CRAN and GitHub.

Packages: Library interfaces, high-level frameworks, profiling and vis, ...

For today: MPI+ScaLAPACK stuff.

Syntax identical to R.


About the Name


R as an Interface Language

http://datascience.la/john-chambers-user-2014-keynote/


Distributed Matrices and Statistics with pbdDMAT

[Figure: "Fitting y~x With Fixed Local Size of ~43.4 MiB". Run time (seconds) versus number of cores (504, 1008, 2016, 4032, 8064, 24000), shown for 500, 1000, and 2000 predictors; the plotted points carry the labels 21.34, 42.68, 85.35, 170.7, 341.41, and 1016.09.]

library(pbdDMAT)
init.grid()

x <- ddmatrix("rnorm", nrow=m, ncol=n)
y <- ddmatrix("rnorm", nrow=m, ncol=1)

mdl <- lm.fit(x=x, y=y)

finalize()
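For comparison, here is a rough serial analogue of the distributed fit above, using only base R. It is a sketch to show how closely the two call patterns line up, not code from the talk: the sizes `m` and `n` and the seed are placeholder values picked for illustration, and ordinary `matrix()` objects stand in for the distributed `ddmatrix` ones.

```r
# Serial, base-R analogue of the pbdDMAT slide code (illustrative only).
# m and n are small placeholder sizes chosen here, not values from the talk.
set.seed(1234)
m <- 1000
n <- 5

# Ordinary in-memory matrices instead of distributed ddmatrix objects.
x <- matrix(rnorm(m * n), nrow = m, ncol = n)
y <- matrix(rnorm(m), nrow = m, ncol = 1)

# Same lm.fit() call shape as in the distributed version.
mdl <- lm.fit(x = x, y = y)
```

The only lines that differ from the distributed version are the grid setup and teardown and the matrix constructor.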


pbdR Paradigms/"Opinions"

In core.

Focus on dense data.

Integrates well with traditional HPC.

C and Fortran for performance; R for the interface.

Exclusively batch.

(Mostly) Very different from Hadoop/Spark!


"OLCF Researchers Scale R to Tackle Big Science Data Sets"

A problem that takes several hours on Apache Spark [was analyzed] in less than a minute using R on OLCF high-performance hardware.

"... for situations where one needs interactive near-real-time analysis, the pbdR approach is much better."

https://www.hpcwire.com/2016/07/06/olcf-researchers-scale-r-tackle-big-science-data-sets/


The Client/Server

Design and Architecture of the Client Server

Motivation

Connect local R session to a remote one.


The C/S Framework

No specialized software environment; just R packages.

Designed for the cloud or HPC resources.

Mostly "un-opinionated"; can use our compute backend, SparkR, whatever.

Optional security features: passwords and encryption.


A Quick Summary of the Main Components

pbdZMQ: ZeroMQ bindings. Use: our C/S packages; IRkernel, ...

remoter: client/server core. Use: cloud computing, "reference" big data framework.

pbdCS: Extension of remoter for the MPI backend stuff. Use: Bringing interactivity to pbdR "big data" system.


Comparison to Similar Utilities

"Moving computations to a remote machine"

Web frameworks: shiny, htmlwidgets, fiery, . . .

Revolution Microsoft Analytics: Microsoft R Server and R Client

Jupyter/notebooks.

RStudio Server

rmote

Examples

Basics

Elsewhere:

R> remoter::server()

## [2016-07-19 12:53:23]: *** Launching secure server ***

In your local R session:

R> remoter::client()
remoter> x = 1
remoter> x
## [1] 1
remoter> exit()

R> x
## Error: object 'x' not found
R> remoter::client()

remoter> s2c(x)
remoter> exit()
R> x
## [1] 1


A Demo?

No time for a live demo...

Any time you see me at XSEDE16, ask and I’ll give you one!

For a non-live demo, see: https://github.com/snoweye/user2016.demo

More information in the remoter vignette: https://cran.r-project.org/web/packages/remoter/vignettes/remoter.html

Overhead

Size Costs

There is a size overhead in transmitted messages.

For almost everything you would do, no big deal.

See paper for full discussion.
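To get a feel for the kind of size overhead meant here, the sketch below compares the raw payload of a numeric vector with its size after base R's `serialize()`. This is an illustration under an assumption: it treats `serialize()` as a stand-in for the wire format, which may not match what remoter actually transmits, and it does not reproduce the measurements in the paper. The helper names `wire_size` and `overhead` are made up for this example.

```r
# Illustrative only: assumes messages look like base R's serialize() output;
# the real remoter wire format may differ.
wire_size <- function(obj) length(serialize(obj, connection = NULL))

# Bytes beyond the raw 8-byte doubles stored in a numeric vector.
overhead <- function(obj) wire_size(obj) - 8 * length(obj)

small <- 1
big <- rnorm(1e5)

# Relative overhead: large for tiny objects, negligible for big ones.
rel_small <- overhead(small) / wire_size(small)
rel_big <- overhead(big) / wire_size(big)
```

In absolute terms the extra bytes are few either way, which fits the "no big deal" remark above.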


Relays


Run Time Costs


Challenges and Future Work


Already done!

Batch computing with a remote server.

Graphics and help, integration with rmote.

Many small things we didn’t bother to mention.


Object Guarding

It is currently possible to send huge amounts of data without realizing it.

pbdR> print(huge_thing)
## uh-oh!
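One way such a guard might look is sketched below in base R. This is not the framework's implementation: `guarded_print` and its `max_bytes` threshold are hypothetical names and values, chosen only to show the idea of checking an object's size before its printed form gets shipped back to the client.

```r
# Hypothetical guard (not part of remoter or pbdCS): refuse to print
# objects whose in-memory size exceeds a threshold.
guarded_print <- function(obj, max_bytes = 1e6) {
  size <- as.numeric(utils::object.size(obj))
  if (size > max_bytes) {
    message(sprintf("Object is %.1f MB; not printing. Try head() or summary().",
                    size / 1e6))
    return(invisible(FALSE))
  }
  print(obj)
  invisible(TRUE)
}

huge_thing <- rnorm(1e6)       # roughly 8 MB of doubles
guarded_print(huge_thing)      # blocked with a message
guarded_print(head(huge_thing)) # small enough, printed as usual
```

A size check like this is cheap because it runs where the data lives, before anything crosses the network.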


Relays and Integration with the Scheduler

Some day...

R> bigJob(machine="stampede", nodes=10, interactive=TRUE)


References

Schmidt, D., Chen, W.-C. and Ostrouchov, G. (2016), “Introducing a New Client/Server Framework for Big Data Analytics with the R Language,” XSEDE2016.

Chen, W.-C., Ostrouchov, G., Schmidt, D., Patel, P. and Yu, H. (2012), “A Quick Guide for the pbdMPI Package,” R Vignette, URL http://cran.r-project.org/package=pbdMPI

Schmidt, D., Chen, W.-C., Patel, P., and Ostrouchov, G. (2014), “Speaking Serial R with a Parallel Accent,” R Vignette, URL http://cran.r-project.org/package=pbdDEMO

Chen, W.-C., Ostrouchov, G., Pugmire, D., Prabhat, and Wehner, M. (2013), “A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data,” Technometrics, 55(4), 513-523.


∼Thanks!∼

Questions?

Email: [email protected]

GitHub: https://github.com/wrathematics

Web: http://wrathematics.info

Blog: http://librestats.com

Twitter: @wrathematics
