Data Visualization with R (II) Dr. Jieh-Shan George YEH jsyeh@pu.edu.tw.

Posted on 21-Dec-2015

Outline

• Data Visualization with R
• Visualizing Different Types of Data
  – Univariate
  – Univariate Categorical
  – Bivariate Categorical
  – Bivariate Continuous vs Categorical
  – Bivariate Continuous vs Continuous
  – Bivariate Continuous vs Time

Data Visualization with R

• Both anecdotally and according to Google Trends, R is the language and tool most closely associated with creating data visualizations.
  – http://www.google.com/trends/explore?hl=en-US#q=R%20language,%20Data%20Visualization,%20D3.js,%20Processing.js&cmpt=q

Google Trends on R & Data Visualization

GRAPHS FOR DATA MINING

Hierarchical Clustering

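hc <- hclust(dist(mtcars))   # cluster the 32 cars in mtcars on Euclidean distance
plot(hc)                     # draw the dendrogram
rect.hclust(hc, k = 4)       # draw boxes around the 4 clusters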
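
rect.hclust() only draws boxes on the dendrogram. To get the same four-way split as data (which cluster each car belongs to), the tree can be cut with cutree() from the stats package, which is what rect.hclust() uses internally. A minimal sketch on the same tree as above:

```r
# Same tree as on the slide: complete-linkage clustering (hclust's default)
# on Euclidean distances between the rows of mtcars
hc <- hclust(dist(mtcars))

# Cut the tree into the same 4 groups that rect.hclust(hc, k = 4) outlines;
# the result is an integer cluster id per car, named by car
groups <- cutree(hc, k = 4)

# Cluster sizes, and the cars that fall into cluster 1
print(table(groups))
print(names(groups)[groups == 1])
```

Because dist() works on the raw columns here, variables with large ranges such as disp and hp dominate the distances; calling scale() on the data before dist() is a common alternative when the variables should count equally.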

Decision Tree

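library(rpart)         # recursive partitioning (decision trees)
library(rpart.plot)    # provides prp() for plotting rpart trees
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)   # classify cylinder count from mpg
prp(rp1)               # plot the fitted tree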
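
prp() only draws the tree. A minimal sketch of inspecting the same model in text form and checking it against the data with rpart's print() and predict() methods; the confusion table and accuracy figure are additions for illustration and are computed on the training data itself, so they will look better than performance on new cars would be.

```r
# rpart ships with R as a recommended package
library(rpart)

# Same model as on the slide: classify cylinder count from mpg alone
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)

# Text version of the splits that prp() draws
print(rp1)

# Predicted class for each car versus its actual cylinder count
pred <- predict(rp1, type = "class")
conf_mat <- table(predicted = pred, actual = mtcars$cyl)
print(conf_mat)

# Share of training cars classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```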

OTHERS

Financial Time Series: quantmod (Quantitative Financial Modeling Framework)

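library(quantmod)
getSymbols("YHOO", src = "google")        # from Google Finance (this source has since been discontinued)
getSymbols("YHOO", from = "2014-01-01")   # from Yahoo Finance, the default source
# note: the YHOO ticker was retired in 2017; substitute any current symbol
chartSeries(YHOO)                          # price chart with volume
barChart(YHOO)                             # OHLC bar chart
candleChart(YHOO, multi.col = TRUE, theme = "white")              # candlestick chart
chartSeries(to.weekly(YHOO), up.col = "white", dn.col = "blue")  # weekly bars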
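
to.weekly() collapses daily bars into weekly ones. To show what that aggregation means without downloading any data, here is a base-R sketch on simulated prices (hypothetical numbers, not real quotes) that applies the usual OHLC convention: each week's open is its first daily open, high the largest daily high, low the smallest daily low, and close the last daily close. This illustrates the convention only; it is not quantmod's implementation.

```r
# Hypothetical daily prices for two trading weeks (simulated, not real data)
set.seed(42)
days <- seq(as.Date("2014-01-06"), as.Date("2014-01-17"), by = "day")
days <- days[as.integer(format(days, "%u")) <= 5]   # keep Mon-Fri (ISO weekday 1-5)

cl <- 100 + cumsum(rnorm(length(days)))   # daily closes as a random walk
op <- c(100, head(cl, -1))                # each day opens at the previous close
hi <- pmax(op, cl) + runif(length(days))  # daily high above both open and close
lo <- pmin(op, cl) - runif(length(days))  # daily low below both open and close

# Week key: year plus Monday-based week number; factor keeps chronological order
key  <- format(days, "%Y-%W")
week <- factor(key, levels = unique(key))

# OHLC aggregation: first open, highest high, lowest low, last close
weekly <- data.frame(
  open  = sapply(split(op, week), function(v) v[1]),
  high  = sapply(split(hi, week), max),
  low   = sapply(split(lo, week), min),
  close = sapply(split(cl, week), function(v) v[length(v)])
)
print(weekly)
```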

GGPLOT2

ggplot2

• The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots.

• Originally based on Leland Wilkinson's The Grammar of Graphics, ggplot2 allows you to create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner.

• Grouping can be represented by color, symbol, size, and transparency. The creation of trellis plots (i.e., conditioning) is relatively simple.

• qplot() (short for "quick plot") hides much of this complexity when creating standard graphs.
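
To make the grammar concrete, here is a minimal sketch of one scatterplot written in the full ggplot() form, with the equivalent qplot() call shown as a comment, using the built-in mtcars data. Note that qplot() has since been deprecated (in ggplot2 3.4.0), so the ggplot() form is the one to prefer in current code.

```r
library(ggplot2)

# Quick form (deprecated in ggplot2 >= 3.4.0), for comparison only:
# qplot(wt, mpg, data = mtcars, color = factor(cyl))

# Full grammar: data + aesthetic mappings + a geometric layer + labels
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon", color = "Cylinders")

print(p)   # needed in a script; typing p at the console also draws it
```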

qplot()

• The qplot() function can be used to create the most common graph types. While it does not expose ggplot2's full power, it can create a very wide range of useful plots. The format is:

qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

Notes:
• At present, ggplot2 cannot be used to create 3D graphs or mosaic plots.
• Use I(value) to set an aesthetic to a fixed value rather than map it to data. For example, size=z makes the size of the plotted points or lines vary with the values of a variable z. In contrast, size=I(3) sets every point or line to a fixed size of 3, regardless of the data.
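
In ggplot(), the distinction that I() draws inside qplot() becomes the difference between mapping an aesthetic inside aes() and setting it as a plain argument of the geom. A minimal sketch with mtcars that sizes points by horsepower in one plot and fixes them at size 3 in the other; layer_data() exposes the values ggplot2 actually computed for each point.

```r
library(ggplot2)

# Mapped: point size varies with hp (like qplot(..., size = hp))
p_mapped <- ggplot(mtcars, aes(wt, mpg, size = hp)) + geom_point()

# Set: every point has size 3 (like qplot(..., size = I(3)))
p_set <- ggplot(mtcars, aes(wt, mpg)) + geom_point(size = 3)

# Compare the sizes in the built layer data
print(summary(layer_data(p_mapped)$size))   # a spread of sizes from the size scale
print(unique(layer_data(p_set)$size))       # a single fixed value
```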

Customizing ggplot2 Graphs

• Unlike base R graphs, ggplot2 graphs are not affected by many of the options set with the par() function.

• They can be modified with the theme() function and by passing graphical parameters to qplot().

• For greater control, use ggplot() and other functions provided by the package.

• ggplot2 functions can be chained with "+" signs to generate the final plot.
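
Where base graphics would use par(), ggplot2 styling is just another component added with "+". A minimal sketch that builds a plot step by step, starts from a complete built-in theme (theme_bw()), and then overrides individual elements with theme(); the particular color and theme choices are arbitrary.

```r
library(ggplot2)

# Each "+" adds one component: a layer, labels, a complete theme, theme tweaks
p <- ggplot(mtcars, aes(hp, mpg)) +
  geom_point(color = "steelblue") +
  labs(title = "Mileage vs Horsepower", x = "Horsepower", y = "Miles per Gallon") +
  theme_bw() +                                     # complete built-in theme
  theme(plot.title = element_text(hjust = 0.5),   # center the title
        panel.grid.minor = element_blank())        # remove minor grid lines

print(p)
```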

Example

# ggplot2 examples
library(ggplot2)

# create factors with value labels
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears"))
mtcars$am   <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
mtcars$cyl  <- factor(mtcars$cyl, levels = c(4, 6, 8), labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data = mtcars, geom = "density", fill = gear, alpha = I(.5),
      main = "Distribution of Gas Mileage", xlab = "Miles Per Gallon", ylab = "Density")
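
The same density plot can be written with ggplot(), where the fixed transparency becomes a plain alpha = 0.5 argument instead of I(.5). A minimal sketch; it recodes gear on a fresh copy taken from datasets::mtcars, so it works whether or not the recoding above has already been run.

```r
library(ggplot2)

# Fresh copy of the built-in data with gear recoded as on the slide
cars <- datasets::mtcars
cars$gear <- factor(cars$gear, levels = c(3, 4, 5),
                    labels = c("3gears", "4gears", "5gears"))

# One filled, semi-transparent density curve per gear level
p <- ggplot(cars, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage", x = "Miles Per Gallon", y = "Density")

print(p)
```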

# Scatterplot of mpg vs. hp for each combination of gears and cylinders;
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data = mtcars, shape = am, color = am, facets = gear ~ cyl,
      size = I(3), xlab = "Horsepower", ylab = "Miles per Gallon")
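
In ggplot(), the facets = gear ~ cyl argument corresponds to facet_grid(gear ~ cyl), with gear levels as rows and cylinder levels as columns. A minimal sketch on a freshly recoded copy of mtcars:

```r
library(ggplot2)

# Fresh copy of the data with the same labelled factors as the slides
cars <- datasets::mtcars
cars$gear <- factor(cars$gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears"))
cars$cyl  <- factor(cars$cyl,  levels = c(4, 6, 8), labels = c("4cyl", "6cyl", "8cyl"))
cars$am   <- factor(cars$am,   levels = c(0, 1),    labels = c("Automatic", "Manual"))

# Transmission type mapped to both shape and color; one panel per gear x cyl cell
p <- ggplot(cars, aes(hp, mpg, shape = am, color = am)) +
  geom_point(size = 3) +
  facet_grid(gear ~ cyl) +
  labs(x = "Horsepower", y = "Miles per Gallon")

print(p)
```

No car in mtcars has four gears and eight cylinders, so one of the nine panels in the grid stays empty.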

# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"), method = "lm", formula = y ~ x,
      color = cyl, xlab = "Weight", ylab = "Miles per Gallon", main = "Regression of MPG on Weight")
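
With method = "lm" and formula = y ~ x, each smoothed line is an ordinary least-squares fit of mpg on weight within one cylinder group. A minimal base-R sketch that computes those intercepts and slopes directly, so the lines in the plot can be checked numerically:

```r
# Fresh copy of the data so earlier factor recoding does not matter
cars <- datasets::mtcars

# One lm(mpg ~ wt) per cylinder group, as the smoother fits per color group
fits  <- lapply(split(cars, cars$cyl), function(d) lm(mpg ~ wt, data = d))
coefs <- sapply(fits, coef)   # 2 x 3 matrix: rows (Intercept) and wt; columns 4, 6, 8

print(round(coefs, 2))
```

The wt row holds the slope of each colored line in miles per gallon per 1000 lbs of weight.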

# Boxplots of mpg by number of gears;
# observations (points) are overlaid and jittered
qplot(gear, mpg, data = mtcars, geom = c("boxplot", "jitter"), fill = gear,
      main = "Mileage by Gear Number", xlab = "", ylab = "Miles per Gallon")
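
A ggplot() version of the same chart stacks two layers, a boxplot and jittered points. Two settings here are choices of this sketch rather than of the original: outlier.shape = NA stops outliers from being drawn twice (once by the boxplot, once by the jitter layer), and height = 0 jitters the points only horizontally so their mpg values stay exact.

```r
library(ggplot2)

# Fresh copy of the data with gear recoded as on the slides
cars <- datasets::mtcars
cars$gear <- factor(cars$gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears"))

# Boxplot layer first, then every observation as a jittered point on top
p <- ggplot(cars, aes(gear, mpg)) +
  geom_boxplot(aes(fill = gear), outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0) +
  labs(title = "Mileage by Gear Number", x = "", y = "Miles per Gallon")

print(p)
```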

• To learn more, see the ggplot2 reference site:
  – http://docs.ggplot2.org/current/index.html