Introduction to Graphics in R

3/12/2014

First, let’s get some data

• Load the Duncan dataset

• It’s in the car package. Remember how to get it?

– library(car)
– data(Duncan)

Getting started

• Okay, now plot income levels:

– plot(Duncan$income)

• What is this graph? Can you make it a line plot instead?

– plot(Duncan$income, type="l")
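A short sketch of what that first call draws, assuming the car package is installed: when plot() is given a single numeric vector, it plots each value against its position in the vector, so the x axis is just the row number. The variable name x is only for illustration.

```r
library(car)   # in current versions of car, the Duncan data ship in the
               # carData package, which library(car) attaches

x <- Duncan$income

# With one numeric vector, plot() puts the index (1, 2, 3, ...) on the x axis.
# Spelling the index out by hand gives the same picture as plot(x):
plot(seq_along(x), x, xlab = "Index", ylab = "Duncan$income")

# type controls how the values are drawn:
# "p" = points (the default), "l" = lines, "b" = both.
plot(x, type = "b")
```

Because the row order of the dataset carries no meaning here, the index axis says nothing about income itself.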

Histogram

• The X axis is useless. Wouldn’t a histogram be more informative?

• Make a histogram

• If you’re stuck, use Google

– hist(Duncan$income)

Fix the title

• ‘Histogram of Duncan$income’ is not a good title

• Change it to ‘Income Distribution in Duncan Dataset’

– hist(Duncan$income, main="Income Distribution in Duncan Dataset")

Another option

• There’s another way to set the title. Maybe some of you will have done this (my crystal ball is murky):

– hist(Duncan$income)
– title("Income Distribution in Duncan Dataset")

• But wait. That looks awful: the new title is printed on top of the default one. We need hist() not to print its own title. How do we do that?

– hist(Duncan$income, main="")
– title("Income Distribution in Duncan Dataset")

Scatterplot

• Okay, let’s look at income vs. prestige

• Make a scatterplot comparing income (x-axis) to prestige (y-axis)

– plot(Duncan$income, Duncan$prestige)

• Did you get the x- and y-axes right?

• Add a title: Income vs. Prestige

– title("Income vs. Prestige")

Scatterplot: Axis labels

• The axis labels display the variable names. Can we do better than that?

• Label the X axis “Income” and the Y axis “Prestige”

– plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige")

Scatterplot: Axis range

• How come income doesn’t have ticks at 0 and 100 but prestige does?

• Make both axes run from 0 to 100

– plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100))

Scatterplot: Axis tick marks

• Actually, your collaborator wants tick marks every 5 points on the X axis.

• DO IT

• Caveat: this is trickier:

– plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
– axis(1, at=seq(0,100, by=5))

Axis labels sideways

• Your collaborator still isn’t happy. Turn the x labels sideways.

– plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
– axis(1, las=2, at=seq(0,100, by=5))

More columns

• Now your collaborator wants to see how education affects this relationship. Create a dichotomous variable named ‘high_education’ categorizing education > 50 as TRUE and <= 50 as FALSE

– Duncan$high_education <- Duncan$education > 50

High education: sanity check

• How many high and low education jobs are there?

– table(Duncan$high_education)

• Plot education (y-axis) by high_education (x-axis)

– plot(Duncan$high_education, Duncan$education)

• Does it look right?
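One way to answer that question without eyeballing the plot is to check the range of education inside each group. This is only a sketch of a possible check, assuming car is installed; the name edu_range is made up for the example.

```r
library(car)

Duncan$high_education <- Duncan$education > 50

# Minimum and maximum education within each group. If the split is right,
# the FALSE group tops out at 50 or below and the TRUE group starts above 50.
edu_range <- tapply(Duncan$education, Duncan$high_education, range)
print(edu_range)
```

If the two ranges overlapped, the cutoff in the previous step would have been applied incorrectly.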

Adding color

• Okay, now color your income/prestige graph so high-education jobs are blue and low-education jobs are red

• This is a little tricky

– colors <- as.numeric(Duncan$high_education)+1
– plot(Duncan$income, Duncan$prestige, col=c("red", "blue")[colors], xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
– axis(1, at=seq(0,100, by=5))
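The tricky part is the indexing step, so here it is unpacked as a sketch (assuming car is installed). The names idx, palette_two, point_col and point_col2 are illustrative, and the ifelse() version and the legend are additions, not part of the slide's solution.

```r
library(car)

high_education <- Duncan$education > 50

# as.numeric() turns FALSE/TRUE into 0/1; adding 1 gives the valid indices 1/2.
idx <- as.numeric(high_education) + 1
palette_two <- c("red", "blue")     # position 1 = low education, 2 = high
point_col <- palette_two[idx]       # one colour name per row of Duncan

# An equivalent, arguably more readable spelling of the same mapping:
point_col2 <- ifelse(high_education, "blue", "red")

plot(Duncan$income, Duncan$prestige, col = point_col,
     xlab = "Income", ylab = "Prestige")
legend("topleft", legend = c("education <= 50", "education > 50"),
       col = palette_two, pch = 1)
```

Indexing into a vector of colour names generalizes to more than two groups, which is why the slide's approach is worth learning even though ifelse() reads more easily here.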

Box plot

• Okay, now run this code:

– plot(Duncan$type, Duncan$income)

• What happened? Why didn't we get a scatterplot? Can you get one?

– plot(as.numeric(Duncan$type), Duncan$income)
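A sketch of what is going on, assuming car is installed: Duncan$type is a factor, and plot() draws one box per level when its first argument is a factor. Converting to numbers gives a scatterplot but drops the level names from the axis; the axis() call below is one way to put them back. The name codes is illustrative.

```r
library(car)

class(Duncan$type)     # a factor, which is why plot() drew boxes, not points
levels(Duncan$type)    # the category labels behind the integer codes

# as.numeric() on a factor returns the integer codes (1, 2, 3, ...),
# so the x axis of the scatterplot would show bare numbers.
codes <- as.numeric(Duncan$type)

# Suppress the default x axis, then label each code with its level name.
plot(codes, Duncan$income, xaxt = "n", xlab = "Type", ylab = "Income")
axis(1, at = seq_along(levels(Duncan$type)), labels = levels(Duncan$type))
```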

More than one plot at a time

• Now your collaborator wants your scatterplot and histogram side-by-side. (Don’t worry about color if you don't want to)

– opar <- par(no.readonly=TRUE)
– par(mfrow=c(1,2))
– hist(Duncan$income, main="Income Distribution in Duncan Dataset")
– plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n")
– axis(1, at=seq(0,100, by=5))
– par(opar)
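One detail about saving and restoring graphics settings: a bare par() also returns read-only parameters, and handing those back to par() produces warnings. The sketch below, assuming car is installed, shows two common idioms that avoid this. The names opar and old are illustrative.

```r
library(car)

# Idiom 1: save only the settings that can be set again, restore them at the end.
opar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))                 # 1 row, 2 columns of plots
hist(Duncan$income, main = "Income Distribution")
plot(Duncan$income, Duncan$prestige, xlab = "Income", ylab = "Prestige")
par(opar)                            # back to the saved settings

# Idiom 2: par() returns the previous values of whatever it changes,
# so the change and the save can happen in one call.
old <- par(mfrow = c(2, 1))          # 2 rows, 1 column this time
hist(Duncan$income, main = "Income")
hist(Duncan$prestige, main = "Prestige")
par(old)
```

The second form restores only what was changed, which is usually all that is needed.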

ggplot

• ggplot is a whole different beast from base graphics

• ggplot is like R itself – some work to get oriented, but powerful once you do

• You don't have to know ggplot to be successful using R

– But you do have to experiment with it for this class

Load the ggplot library

• Hint: the package name, confusingly, is ggplot2
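For reference, a minimal version of that step. The commented-out install line is only needed if the package is not already on the machine.

```r
# install.packages("ggplot2")   # one-time install, if the package is missing
library(ggplot2)                # the package is ggplot2; its main function is ggplot()
```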

Plot income vs. prestige

• It will be easiest to start using qplot. qplot mimics plot(), but uses the ggplot layout engine.

– qplot(Duncan$income, Duncan$prestige)

ggplot

• qplot is the training wheels version of ggplot

• ggplot's syntax takes some getting used to. Try this:

– ggplot(Duncan) + aes(x=income, y=prestige) + geom_point()

• Huh? What are the pluses about?

ggplot syntax

• ggplot objects are weird

• You execute them (like a command) to draw their plot

• But you construct them by adding options to them

• Options specify data source, data columns, etc., resulting in code like this:

– p <- ggplot(Duncan)
– p <- p + aes(x=income, y=prestige)
– p + geom_point()
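The build-then-draw idea can be made concrete with a short sketch, assuming car and ggplot2 are installed. Nothing is drawn while the object is being assembled; the plot appears only when the object is printed.

```r
library(car)
library(ggplot2)

p <- ggplot(Duncan)                       # data only: nothing to draw yet
p <- p + aes(x = income, y = prestige)    # map columns to the axes
p <- p + geom_point()                     # add a layer of points

class(p)    # p is an ordinary R object that describes the plot

# Typing p at the console prints (draws) it. Inside a function, loop,
# or script, call print() explicitly:
print(p)

# Because p is stored, it can be extended later without rebuilding it:
p_titled <- p + ggtitle("Income vs. Prestige")
print(p_titled)
```

This is also what the pluses are doing in the one-line version: each `+` returns a new plot object with one more piece attached.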

Where ggplot shines

• In my opinion, it's harder to think about doing simple plots in ggplot

• But when I want to do something multi-faceted (e.g. with different colors, sizes, etc.), ggplot makes it really easy

• I use it a lot to understand relationships among three or more variables in data
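As a small illustration with the Duncan data (a sketch, not the example from the slides; it assumes car and ggplot2 are installed), four variables fit in one plot by mapping two of them to color and point size. The legend titles are made up for the example.

```r
library(car)
library(ggplot2)

# x and y carry income and prestige; colour carries occupation type
# and point size carries education.
p <- ggplot(Duncan, aes(x = income, y = prestige,
                        color = type, size = education)) +
  geom_point() +
  labs(x = "Income", y = "Prestige",
       color = "Occupation type", size = "Education")
print(p)
```

Doing the same in base graphics would mean building the color and size vectors and the legends by hand, as in the earlier red/blue exercise.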

ggplot example (one of many)

ggplot code for that example

ggplot(data=nycnames) +
  aes(x=as.factor(race), y=n1_013002p, color=as.factor(nbhdarkwalk)) +
  geom_point(position="jitter") +
  scale_x_discrete(breaks=1:7, limits=1:7, name="Subject Race",
                   labels=c('Asian', 'Black', 'First\nPeoples', 'Pacific\nIslander',
                            'Non-Hispanic\nWhite', 'Other', 'Hispanic')) +
  scale_color_discrete(breaks=1:4, limits=1:4, name="Neighborhood Safe After Dark",
                       labels=c('Strongly Agree', 'Somewhat Agree',
                                'Somewhat disagree', 'Strongly Disagree')) +
  scale_y_continuous(name="Neighborhood percent white (1km buffer)")

Exercises