Open Source Analytics: Visualization and Predictive Modeling of Big Data with R
Michael E. Driscoll, Ph.D.
July 22, 2009
OSCON
(from Jessica Hagy’s thisisindexed.com)
“Hard-working Middle Class” Hypothesis
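# read the two tables and join them on their shared column(s)
gdp <- read.csv('gdp.csv')
hours <- read.csv('hours.csv')
gdp.hours <- merge(hours, gdp)
# 4380 = 365 * 12 hours
gdp.hours$freetime <- 4380 - gdp.hours$hours
# data= is used instead of attach(): the gdp data frame would mask the gdp column
plot(freetime ~ gdp, data=gdp.hours)
# linear fit (lwd sets the line width)
m <- lm(freetime ~ gdp, data=gdp.hours)
abline(m, col=3, lwd=2)
# local regression curve for comparison
pm <- loess(freetime ~ gdp, data=gdp.hours)
lines(spline(gdp.hours$gdp, fitted(pm)))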
Munge & Model OECD Data
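The files gdp.csv and hours.csv are not part of this transcript, so the following is only a minimal sketch of the same munge-and-model steps on made-up data. The country column, the synthetic values, and the max.gap check are assumptions for illustration, not OECD figures or the speaker's analysis.

```r
# Hypothetical stand-ins for gdp.csv and hours.csv; values are made up
# for illustration and are NOT OECD figures.
set.seed(1)
gdp <- data.frame(country = paste0("C", 1:20),
                  gdp = seq(10000, 48000, by = 2000))
hours <- data.frame(country = paste0("C", 1:20),
                    hours = 2200 - 0.012 * gdp$gdp + rnorm(20, sd = 40))

# merge() joins on the column name the two tables share, here "country"
gdp.hours <- merge(hours, gdp)
gdp.hours$freetime <- 4380 - gdp.hours$hours

# Linear fit: the slope is the change in free time per unit of GDP
m <- lm(freetime ~ gdp, data = gdp.hours)
print(coef(m))
print(summary(m)$r.squared)

# Local regression for comparison; a large gap between the two fits
# would hint that a straight line is a poor description
pm <- loess(freetime ~ gdp, data = gdp.hours)
max.gap <- max(abs(fitted(pm) - fitted(m)))
print(max.gap)
```

Passing data= to lm() and loess() keeps the column lookup explicit, which matters here because a data frame named gdp also exists in the workspace.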
Visualize the Analysis: is it True?
modeling Big Data
100 thousand gene measures
1 million transactions during this presentation
If You Liked ____, You’ll Love ___ !
1 billion clicks during this presentation
1 million pitches thrown since 2007
A Tale of Two Pitchers
Hamels
Webb
xyplot(x ~ y, data=pitch)
xyplot(x ~ y, groups=type, data=pitch)
xyplot(x ~ y | type, data=pitch)
xyplot(x ~ y | type, data=pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         # keep only the colors for the rows drawn in this panel
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill=fill, ...)
       })
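The pitch data set is not included in this transcript, so the custom panel function can only be tried on invented data. The sketch below assumes random x/y values, three made-up pitch type codes, and a color column derived from the type. It also passes pch = 21, since fill only shows on fillable plotting symbols.

```r
library(lattice)

# Hypothetical pitch data: column names mirror the slides, values are random
set.seed(42)
n <- 300
pitch <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  type = factor(sample(c("FA", "CU", "CH"), n, replace = TRUE))
)
# One color per row, derived from pitch type (an assumption for this sketch)
palette.by.type <- c(FA = "red", CU = "blue", CH = "darkgreen")
pitch$color <- unname(palette.by.type[as.character(pitch$type)])

p <- xyplot(x ~ y | type, data = pitch,
            pch = 21,                      # fillable symbol, forwarded via ...
            fill.color = pitch$color,
            panel = function(x, y, fill.color, ..., subscripts) {
              # subscripts holds the row indices of the points in this panel,
              # so indexing keeps each color aligned with its own row
              fill <- fill.color[subscripts]
              panel.xyplot(x, y, fill = fill, ...)
            })
print(p)
```

Without the subscripts lookup, every panel would start reading fill.color from its first element, and colors would no longer match the rows being drawn.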
visualizing Big Data
ggplot2 = grammar of graphics
qplot(carat, price, data = diamonds)
qplot(log(carat), log(price), data = diamonds)
OR
qplot(carat, price, log="xy", data = diamonds)
qplot(log(carat), log(price), data = diamonds, alpha = I(1/20))
qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
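The qplot() calls above are shorthand. As a rough sketch of the "grammar of graphics" idea, the last plot can also be written with the layered ggplot() interface, where data, aesthetic mappings, geometry, and facets are separate pieces. This assumes ggplot2 is installed; the diamonds data set ships with it.

```r
library(ggplot2)

# Data and aesthetic mappings first, then layers added with +
p <- ggplot(diamonds, aes(x = log(carat), y = log(price))) +
  geom_point(alpha = 1/20) +   # fixed transparency, like alpha = I(1/20)
  facet_grid(. ~ color)        # one column of panels per color grade
print(p)
```

The low alpha makes dense regions darker, which is what keeps a scatterplot of this many points readable.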
R on the cloud
Data
Desktop
Coding vs. Clicking
Linux, Apache, MySQL, R
http://labs.dataspora.com/gameday
Final thoughts