Data Analysis using the R Project for Statistical Computing
Daniela Ushizima, NERSC Analytics
Lawrence Berkeley National Laboratory
Outline
I. R programming
– Why use R
– R in the scientific community
– Extensible
– Graphics
– Profiling
II. Exploratory data analysis
– Regression
– Clustering algorithms
III. Case study
– Accelerated laser-wakefield particles
IV. HPC
– State of the art
R PROGRAMMING
Packages, data visualization and examples
Download: http://www.r-project.org
Recommended tutorial: http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
R is a language and environment for statistical computing and graphics, a GNU project. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
1. Why use R?
• Open-source, multiplatform, extensible;
• Easy on users with familiarity with S/S+, Matlab, Python or IDL;
• Active and growing community:
– Google, Pfizer, Merck, Bank of America, Boeing, the InterContinental Hotels Group and Shell.
2. R in the scientific community
2.1. You R with NERSC
• Get started with R on DaVinci:
> module load R
> R
> help()
> demo()
> help.start()
> source('your_function.R')
> library(package_name)
3. Extensible
• Add-on packages:
– Data input/output: hdf5, RNetCDF, DICOM, etc.
– Graphics: trellis, gplot, RGL, fields, etc.
– Multivariate analysis: MASS, mclust, ape, etc.
– Other languages: Rcpp, Rpy, R.matlab, etc.
4. Statistical analysis and graphs
• Histogram
• Density
• Boxplot
• Multivariate plot
• Conditioning plot
• Contour plot
4.1. Multivariate plots
> data = read.table('ozone.data.txt', header = TRUE)
> names(data)
[1] "rad" "temp" "wind" "ozone"
> pairs(data, panel = panel.smooth)  # panel.smooth = locally weighted polynomial regression
Ex: the explanatory variables are solar radiation, temperature and wind; the response variable is ozone.
- Use pairs() on a data frame to check for dependencies between the variables.
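The file ozone.data.txt is not distributed with R, so the following is only a sketch of the same scatterplot matrix under an assumption: the built-in airquality data set stands in for the ozone data, with its columns renamed to match the slide. The correlation row at the end is an addition for illustration and is not part of the original example.

```r
# Assumption: the built-in 'airquality' data replaces ozone.data.txt.
data <- na.omit(airquality[, c("Solar.R", "Temp", "Wind", "Ozone")])
names(data) <- c("rad", "temp", "wind", "ozone")

pdf(NULL)                            # null device, so the sketch also runs without a display
pairs(data, panel = panel.smooth)    # scatterplot matrix with a smoother in each panel
invisible(dev.off())

# Numeric companion to the plot: correlation of ozone with each variable
ozone_cor <- cor(data)["ozone", ]
print(round(ozone_cor, 2))
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so that the plot appears on screen.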
4.2. Conditional plots
• Check the relation between the two explanatory variables wind and temp and the response variable ozone;
> coplot(ozone ~ wind | temp, data = data, panel = panel.smooth)
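A conditioning plot draws one ozone-versus-wind panel per overlapping range of temperature. The sketch below makes those ranges explicit with co.intervals(); it assumes, as a stand-in for the slide's data, the built-in airquality data set, and the number of intervals and the overlap are simply coplot's defaults written out.

```r
# Assumption: built-in 'airquality' data instead of ozone.data.txt.
aq <- na.omit(airquality[, c("Ozone", "Wind", "Temp")])

# Six overlapping temperature ranges; each one becomes a panel
iv <- co.intervals(aq$Temp, number = 6, overlap = 0.5)
print(iv)

pdf(NULL)                            # null device, so no display is needed
coplot(Ozone ~ Wind | Temp, data = aq, given.values = iv,
       panel = panel.smooth)
invisible(dev.off())
```

Reading the panels from the lowest to the highest temperature range shows whether the ozone-wind relation changes with temperature.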
4.3. Package RGL for 3D visualization
• OpenGL
– rgl.demo.lsystem()
– kernel density estimation
Use VisIt: https://wci.llnl.gov/codes/visit/
5. Profiling
• Where does your program spend more time?
several.times <- function(n, f, ...) {  # "..." = variable number of arguments, passed on to f
  for (i in 1:n) {
    f(...)
  }
}
matrix.multiplication <- function(s) {
  A <- matrix(1:(s*s), nrow = s, ncol = s)
  B <- matrix(1:(s*s), nrow = s, ncol = s)
  C <- A %*% B
}
v <- NULL
for (i in 2:10) {
  v <- append(v, system.time(several.times(10000, matrix.multiplication, i))[1])
}
plot(v, type = 'b', pch = 15, main = "Matrix product computation time")
Also try packages: profr and proftools
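Besides timing whole calls with system.time(), base R ships a sampling profiler, Rprof(), whose output summaryRprof() breaks down by function. The sketch below is an illustration only: slow_sum is a hypothetical function invented for this example, and the sampling interval and repetition count are arbitrary choices.

```r
# Hypothetical workload: an explicit loop that accumulates square roots
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)        # start sampling the call stack
for (k in 1:20) result <- slow_sum(1e6)  # repeat so that enough samples are collected
Rprof(NULL)                              # stop profiling

by_self <- summaryRprof(prof_file)$by.self  # time spent in each function itself
print(head(by_self))
unlink(prof_file)
```

The by.self table lists the functions in which the samples fell, which answers the question of where the program spends its time.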
EXPLORATORY DATA ANALYSIS
Basics and beyond
1. Statistical analysis
• Statistical modeling: check for variations in the response variable given explanatory variables;
– Linear regression
• Multivariate statistics: look for structure in the data;
– Clustering:
  • Hierarchical
    – Dendrograms
  • Partitioning
    – K-means (stats)
    – Mixture models (mclust)
2. Linear regression
• Ex: find the equation that best fits the data, given the decay of radioactive emission over a 50-day period;
• Linear regression: variables expected to be linearly related;
• Maximum likelihood estimates of the parameters = least-squares estimates (for normally distributed errors);
2.1. Linear regression
data = read.table('sapdecay.txt', header = TRUE)
attach(data)
par(mfrow = c(1, 3))
plot(x, y, main = 'Decay of radioactive emission over a 50-day period', xlab = 'days')
# log(y) gives a rough idea of the decay constant, a, for these data
# by linear regression of log(y) against x
mylm = lm(log(y) ~ x)
print(mylm$coefficients)
# sum of squares of the difference between the observed yv and predicted yp
# values of y, given a specific value of parameter a
sumsq <- function(a, xv = x, yv = y) {
  yp = exp(-a * xv)  # predicted model for y
  sum((yv - yp)^2)
}
a = seq(0.01, 0.2, .005)
sq = sapply(a, sumsq)
plot(a, sq, type = 'l', xlab = 'decay constant', ylab = 'sum of squares of (observed - predicted)')
decayK = a[which.min(sq)]  # this is the least-squares estimate for the decay constant
points(decayK, min(sq), pch = 19, col = 'red')
plot(x, y)
days = seq(0, 50, 0.1)
lines(days, exp(-decayK * days), col = 'blue')
detach(data)
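Since sapdecay.txt is not included here, the following sketch repeats the two estimates on simulated data. Everything about the data is an assumption made for illustration: the true decay constant (0.08), the noise level and the random seed. It also swaps the grid search over a for optimize(), which minimizes the same sum of squares over the same interval.

```r
set.seed(2)
# Hypothetical data: exponential decay with multiplicative noise
x <- 0:50
y <- exp(-0.08 * x) * exp(rnorm(length(x), sd = 0.05))

# Estimate 1: slope of the regression of log(y) on x (decay constant = -slope)
fit  <- lm(log(y) ~ x)
a_lm <- -unname(coef(fit)["x"])

# Estimate 2: least squares on the original scale, as in sumsq() above
sumsq <- function(a) sum((y - exp(-a * x))^2)
a_ls  <- optimize(sumsq, interval = c(0.01, 0.2))$minimum

print(c(log_scale = a_lm, least_squares = a_ls))
```

The two estimates generally differ slightly, because the log transform gives every day equal weight whereas least squares on the original scale is dominated by the early, large values of y.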
3. Cluster analysis
• Hierarchical
– dendrogram (stats)
• Partitioning
– kmeans (stats)
• Mixture models:
– Mclust (mclust)
Iris dataset: 150 samples of Iris flowers, each described by its petal and sepal length and width.
3.1. Hierarchical clustering
• Analysis of a set of dissimilarities, combined with an agglomeration method for analyzing it:
• Dissimilarities: Euclidean, Manhattan, …
• Methods:
– ward, single, complete, average, mcquitty, median or centroid.
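As a minimal sketch of this combination, the code below computes Euclidean dissimilarities on the built-in iris measurements and agglomerates them with hclust(). The choice of average linkage, the scaling step and the cut into three groups are assumptions for illustration. Note that recent R versions name the Ward variants "ward.D" and "ward.D2".

```r
x  <- scale(iris[, 1:4])               # standardize the four measurements
d  <- dist(x, method = "euclidean")    # dissimilarities ("manhattan" also works)
hc <- hclust(d, method = "average")    # agglomeration method

groups <- cutree(hc, k = 3)            # cut the dendrogram into three groups
print(table(groups, iris$Species))

pdf(NULL)                              # null device, so no display is needed
plot(hc, labels = FALSE, main = "Iris: average linkage")
invisible(dev.off())
```

Changing the method argument of dist() or hclust() is enough to compare the dissimilarities and agglomeration methods listed above.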
3.2. K-means
• Split n observations into k clusters;
– each observation belongs to the cluster with the nearest mean.
Cluster (rows) versus species (columns) for the Iris dataset:
   setosa versicolor virginica
1       0         48        14
2       0          2        36
3      50          0         0
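A cross-tabulation like the one above can be produced with kmeans() from the stats package. This is a sketch under assumptions: the seed and the number of random starts are arbitrary, and because cluster labels are assigned arbitrarily, the rows may come out in a different order than on the slide.

```r
set.seed(42)                                   # k-means starts from random centers
km  <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Rows: cluster label; columns: species
tab <- table(km$cluster, iris$Species)
print(tab)
print(km$centers)                              # the mean of each cluster
```

Using several random starts (nstart) makes it less likely that the reported partition is a poor local optimum.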
3.3. Model-based clustering
• Mixture models
– Each cluster is mathematically represented by a parametric distribution;
– The set of k distributions is called a mixture, and the overall model is a finite mixture model;
– Each probability distribution gives the probability of an instance being in a given cluster.
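To make the idea concrete without depending on the mclust package, here is a hand-written EM loop for a two-component Gaussian mixture in one dimension. It is a sketch, not mclust's algorithm: the simulated data, the starting values and the fixed number of iterations are all assumptions chosen for illustration.

```r
set.seed(1)
# Hypothetical data: two groups centered at 0 and 5
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 5, sd = 1))

k      <- 2
mu     <- as.numeric(quantile(x, c(0.25, 0.75)))  # crude starting means
sigma  <- rep(sd(x), k)
lambda <- rep(1 / k, k)                           # mixing proportions

for (iter in 1:100) {
  # E-step: probability of each observation belonging to each component
  dens <- sapply(1:k, function(j) lambda[j] * dnorm(x, mu[j], sigma[j]))
  post <- dens / rowSums(dens)
  # M-step: update the parameters from the weighted observations
  nj     <- colSums(post)
  lambda <- nj / length(x)
  mu     <- colSums(post * x) / nj
  sigma  <- sqrt(colSums(post * outer(x, mu, "-")^2) / nj)
}

cluster <- max.col(post)   # most probable component for each observation
print(round(rbind(lambda, mu, sigma), 2))
```

Each row of post is the set of membership probabilities described in the last bullet; assigning an observation to its most probable component turns the fitted mixture into a clustering.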
Case study
Accelerated laser-wakefield particles
http://www.lbl.gov/publicinfo/newscenter/features/2008/apr/af-bella.html
Knowledge discovery in LWFA science via machine learning
[Figure; axis label: time steps]
• PI: C. Geddes (LBNL) in SciDAC COMPASS project, Incite.
• Accomplishments:
– Described compact electron clouds using minimum enclosing ellipsoids;
– Developed algorithms to adapt mixture model clustering to large datasets;
• Science impact:
– Automated detection and analysis of compact electron clouds;
– Derived dispersion features of electron clouds;
– Algorithms extensible to other science problems;
• Collaborators:
– Tech-X
– Math Group, LBNL
– UC Davis, University of Kaiserslautern
Framework
• Goal: automate the analysis of electron bunches by detecting compact groups of particles, subject to similar momentum and spatio-temporal coherence.
B1. Select relevant particles
• Beams of interest are characterized by a high density of high-energy particles:
1. Elimination of low-energy particles (px < 1e10)
– Wake oscillation: px <= 1e9
– Excludes particles of the background
2. Calculation of the average number of particles per time step over the simulation (µs);
3. Elimination of time steps with fewer than µs particles;
Figure: representation of particle momentum in one time step; spline interpolation onto a grid for visualization of irregularly spaced input data.
Packages: akima, hdf5, fields
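The three selection steps can be sketched in base R as follows. The particle table is simulated and entirely hypothetical (column names, number of time steps and momentum range are assumptions); only the threshold px < 1e10 and the comparison against the average count come from the steps above.

```r
set.seed(7)
# Hypothetical particle table: one row per particle per time step
particles <- data.frame(
  step = rep(1:5, times = c(100, 400, 300, 50, 150)),
  px   = 10^runif(1000, min = 8, max = 11)
)

# Step 1: eliminate low-energy particles (px < 1e10)
high <- particles[particles$px >= 1e10, ]

# Step 2: average number of remaining particles per time step
counts <- table(factor(high$step, levels = 1:5))
mu_s   <- mean(counts)

# Step 3: eliminate time steps with fewer than mu_s particles
kept_steps <- as.integer(names(counts)[counts >= mu_s])
selected   <- high[high$step %in% kept_steps, ]

print(counts)
print(kept_steps)
```

In the actual case study the particles would be read from the simulation output (the slide lists the hdf5 package for that) rather than simulated.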
B2. Kernel-based estimation
• Kernel density estimators are less sensitive than histograms to the placement of the bin edges;
• Goal: retrieve a dense group of particles with similar spatial and momentum characteristics, i.e. the location of argmax f(x, y, px), using a neighborhood of 2 µm.
Packages: misc3d, rgl, fields
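The idea of locating argmax f(x, y, px) can be illustrated with a small hand-written Gaussian kernel estimate evaluated at every particle. This is a base-R sketch, not the misc3d-based code of the case study: the simulated particles, the scaling of the three variables and the bandwidth h are assumptions, and the 2 µm neighborhood is not modeled.

```r
set.seed(3)
n_bg <- 200; n_beam <- 100
# Hypothetical particles: a diffuse background plus one compact bunch
pts <- rbind(
  cbind(x = runif(n_bg, 0, 100), y = runif(n_bg, -20, 20), px = runif(n_bg, 1, 10)),
  cbind(x = rnorm(n_beam, 60, 1), y = rnorm(n_beam, 0, 1), px = rnorm(n_beam, 8, 0.2))
)

z  <- scale(pts)                 # put x, y and px on a common scale
h  <- 0.3                        # hypothetical bandwidth, in scaled units
d2 <- as.matrix(dist(z))^2       # squared distances between all particle pairs

# Gaussian kernel density at each particle (up to a constant factor)
f_hat <- rowMeans(exp(-d2 / (2 * h^2)))

peak <- pts[which.max(f_hat), ]  # particle sitting at the density maximum
print(peak)
```

The pairwise distance matrix grows with the square of the number of particles, so this direct form only suits small samples; gridded estimators are the usual choice for large data.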
B3. Identify beam candidates
• Detection of compact groups of particles, independent of being a maximum in one of the variables;
B4. Cluster using mixture models
• Model and number of clusters can be selected at run time (mclust);
• Partition of multidimensional space;
• Assume that the functional form of the underlying probability density follows a mixture of normal distributions;
Packages: mclust, rgl
B5. Evaluation of compactness
• Bunches of interest move at speed ≈ c, hence are nearly stationary in the moving simulation window;
• Moving averages smooth out short-term fluctuations and highlight longer-term trends.
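A centered moving average is available in base R through stats::filter(). The sketch below applies it to a simulated series; the series itself (a hypothetical bunch spread per time step) and the window length of 5 are assumptions for illustration.

```r
set.seed(11)
steps  <- 1:60
# Hypothetical compactness measure per time step: slow trend plus noise
spread <- 5 + 0.05 * steps + rnorm(60, sd = 0.5)

window   <- 5
# Centered moving average: each value is the mean of a 5-step window
smoothed <- stats::filter(spread, rep(1 / window, window), sides = 2)

print(head(cbind(step = steps, raw = spread, smoothed = smoothed), 8))
```

The first and last two values are NA because a full centered window does not fit at the ends of the series.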
High performance computing
Packages, challenges and new businesses
1. Improve performance/reusability
• Good coding: avoid loops, use vectorization;
• Extend R using compiled code:
– packages: Rcpp, inline
• Recycle your Python codes:
– package: Rpython
• Parallelism:
– Explicit: packages Rmpi, Rpvm, nws
– Implicit: packages pnmath, pnmath0 for multithreaded math functions
• Use out-of-memory processing with
– packages bigmemory and ff
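The first bullet can be illustrated by writing the same computation twice, once with an explicit loop and once vectorized. The function names and the problem size are hypothetical; timings vary by machine, so none are quoted here.

```r
n <- 1e5
x <- seq_len(n) / n

# Explicit loop: one element at a time
loop_version <- function(x) {
  out <- numeric(length(x))          # preallocate the result
  for (i in seq_along(x)) out[i] <- x[i]^2 + 1
  out
}

# Vectorized: the arithmetic is applied to the whole vector at once
vec_version <- function(x) x^2 + 1

t_loop <- system.time(r1 <- loop_version(x))["elapsed"]
t_vec  <- system.time(r2 <- vec_version(x))["elapsed"]
print(c(loop = unname(t_loop), vectorized = unname(t_vec)))
```

Both versions return the same values; the vectorized form is shorter and moves the loop into compiled code.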
2. What is going on with HPC in R?
• Parallelism:
– Multicore: multicore, pnmath, …
– Computer cluster: snow, Rmpi, rpvm, …
– Grid computing: GRIDR, …
• GPU:
– gputools: parallel algorithms using CUDA + CUBLAS
• Extremely large data:
– ff: memory-mapped pages of binary flat files.
3. Nothing is perfect…
• Limits on individual objects: the maximum number of elements of a vector is 2^31 - 1 (64-bit builds of R 3.0.0 and later also support longer "long vectors");
• R will take all the RAM it can get (Linux only);
• For more information, type:
> help('Memory-limits')
> gc()  # garbage collector
> object.size(your_obj)  # size of your object
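These commands can be tried on a throwaway object. The sketch below is illustrative only: the vector length is arbitrary, and the reported size assumes 8-byte doubles plus a small header.

```r
# The 2^31 - 1 limit is the largest value of R's integer type
max_len <- .Machine$integer.max
print(max_len)

x <- numeric(1e6)                            # one million doubles
size_bytes <- as.numeric(object.size(x))
print(format(object.size(x), units = "MB"))  # about 8 bytes per element

rm(x)                                        # drop the object ...
invisible(gc())                              # ... and run the garbage collector
```

Calling gc() directly is rarely required, since R collects garbage automatically, but its printed summary is a quick way to see current memory use.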
Take home
Source: http://www.nettakeaway.com/tp/R/129/understanding-r
References
• Michael J. Crawley. Statistics: An Introduction using R. Wiley, 2005. ISBN 0-470-02297-3.
– data: http://www.bio.ic.ac.uk/research/mjcraw/therbook/
• Robert H. Shumway and David S. Stoffer. Time Series Analysis and Its Applications: With R Examples. Springer, New York, 2006. ISBN 978-0-387-29317-2.
• Basics
– http://cran.r-project.org/doc/contrib/Short-refcard.pdf
– http://cran.r-project.org/doc/contrib/refcard.pdf
– http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
– http://www.manning.com/kabacoff/Kabacoff_MEAPCH1.pdf
• Intermediate
– http://math.acadiau.ca/ACMMaC/Rmpi/basics.html
– User lists
Cheat sheets
Acknowledgements
http://www.sciviews.org/Tinn-R/