Maps and Time Series - hofroe.net/stat579/08-maps.pdf


Maps and Time Series Stat 579

Heike Hofmann


Outline

• Melting and Casting

• Maps: polygons, choropleth

• Time series


Warm-up

• Start R and load the data set ‘fbi’ from http://www.hofroe.net/stat579/crimes-2012.csv

• This data set contains the number of crimes by type for each state in the U.S.

• Investigate which states have the highest number of crimes (largely regardless of crime type)

• Pick one state and crime type and plot a time series


getting ready for loops

• Let’s concentrate on the years since 2000

• Pick a state and fit a linear model (use lm) of the number of burglaries over time (i.e. lm(Burglary~Year))

• Save the resulting object. Investigate it with your poking and prodding functions.

• Extract the coefficients (intercept and slope) from the model

• Repeat for another state.

• How can we extract coefficients for all states?
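For a single state, the steps above can be sketched as follows. The data frame `toy` is a hypothetical stand-in for `fbi` with made-up burglary counts, and the column names `state`, `Year` and `Burglary` are assumed to match the exercise.

```r
# Hypothetical toy data standing in for fbi (made-up numbers);
# the column names state, Year and Burglary are assumed.
toy <- data.frame(
  state    = rep(c("Iowa", "Ohio"), each = 5),
  Year     = rep(2000:2004, times = 2),
  Burglary = c(100, 110, 120, 130, 140,   # Iowa: +10 per year
               500, 480, 460, 440, 420)   # Ohio: -20 per year
)

# Fit Burglary ~ Year for one state and keep the fitted object
onestate <- subset(toy, state == "Iowa" & Year >= 2000)
model <- lm(Burglary ~ Year, data = onestate)

# Poke and prod: what kind of object is it, and what does it contain?
class(model)
names(model)

# coef() returns a named vector: "(Intercept)" and the slope "Year"
b <- coef(model)
b
```

The intercept is the fitted value at Year 0, not a mean; fitting `lm(Burglary ~ I(Year - 2000), data = onestate)` instead makes it the fitted value for 2000.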


• We want to run the same block of code multiple times:

• Loop or iteration

allstates <- unique(fbi$state)   # all state names (same column as in subset() below)
for (i in allstates) {           # iterations: one pass per state
  # block of commands
  onestate <- subset(fbi, state == i & Year >= 2000)
  model <- lm(Burglary ~ Year, data = onestate)
  print(coef(model))             # output
}


Why should we avoid loops?

• speed: for-loops in R can still be slow

• main reason: lots of error-prone housekeeping chores before and after the ‘meat’ (setting up storage, keeping track of indices, collecting the results)
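To make the housekeeping visible, this sketch collects the coefficients for every state twice: once with a `for` loop whose storage we have to set up and fill by hand, and once with `split()` and `sapply()`, which do that bookkeeping for us. The data frame `toy` is again a hypothetical stand-in with made-up numbers.

```r
# Hypothetical toy data standing in for fbi (made-up numbers, assumed column names)
toy <- data.frame(
  state    = rep(c("Iowa", "Ohio"), each = 5),
  Year     = rep(2000:2004, times = 2),
  Burglary = c(100, 110, 120, 130, 140,
               500, 480, 460, 440, 420)
)
allstates <- unique(toy$state)

# Loop version: we set up the storage and fill it in ourselves
coefs <- matrix(NA_real_, nrow = length(allstates), ncol = 2,
                dimnames = list(allstates, c("(Intercept)", "Year")))
for (i in allstates) {
  onestate <- subset(toy, state == i & Year >= 2000)
  model <- lm(Burglary ~ Year, data = onestate)
  coefs[i, ] <- coef(model)   # store the result in this state's row
}

# Loop-free version: split by state, fit each piece, let sapply() collect results
recent  <- subset(toy, Year >= 2000)
fit_one <- function(d) coef(lm(Burglary ~ Year, data = d))
coefs2  <- t(sapply(split(recent, recent$state), fit_one))

coefs
coefs2
```

Both versions should give the same matrix: one row per state, with columns `(Intercept)` and `Year`.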


fbi exploration

• Draw a scatterplot of population size against the number of violent crimes in 2012. What is your conclusion? How do things change in 2011?

• Plot population against the number of burglaries in 2012. What is your conclusion there?

• What should we look at instead?


Reshaping Data

• Two-step process:

• melt: get data into a “convenient” shape, i.e. one that is particularly flexible

• cast: reshape the data into new shape(s) that are better suited for analysis


• id.vars: all identifiers (keys) and qualitative variables

• measure.vars: all quantitative variables

[Figure: the original data, with columns key (id.vars) and X1, X2, X3, X4, X5 (measure.vars), is melted into the molten form “long & skinny”, where X1 to X5 are stacked into a single column next to the repeated key]

melt.data.frame(data, id.vars, measure.vars, na.rm = FALSE, ...)
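To see what melting does to the data, here is a base-R sketch on a hypothetical wide data frame with a `key` column and measurements `X1` to `X3`. The helper `melt_sketch()` is made up for illustration and is not how reshape2 implements `melt()`; with reshape2 loaded, `melt(wide, id.vars = "key")` should produce the same three columns (`key`, `variable`, `value`).

```r
# Hypothetical wide data: one row per key, measurements in columns X1..X3
wide <- data.frame(key = c("a", "b"), X1 = c(1, 2), X2 = c(3, 4), X3 = c(5, 6))

# Base-R sketch of melting: repeat the id columns once per measure column,
# then stack the measure columns on top of each other
melt_sketch <- function(data, id.vars, measure.vars) {
  n   <- nrow(data)
  ids <- data[rep(seq_len(n), times = length(measure.vars)), id.vars, drop = FALSE]
  rownames(ids) <- NULL
  data.frame(ids,
             variable = factor(rep(measure.vars, each = n), levels = measure.vars),
             value    = unlist(data[measure.vars], use.names = FALSE))
}

long <- melt_sketch(wide, id.vars = "key", measure.vars = c("X1", "X2", "X3"))
long   # 6 rows: "long & skinny"
```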


Casting

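• Function dcast(dataset, rows ~ columns, aggregate)

[Figure: result table with the rows variable(s) down the side, the columns variable(s) across the top, and aggregate(data) in each cell]

Sometimes the aggregation is just a transformation: if every cell holds exactly one value, dcast only reshapes the data.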


> fbi.melt <- melt(fbi, id.vars=c("State","Abbr","Population"), measure.vars=4:12)
> head(fbi.melt)
       State Abbr Population      variable  value
1    Alabama   AL    4708708 Violent.crime  21179
2     Alaska   AK     698473 Violent.crime   4421
3    Arizona   AZ    6595778 Violent.crime  26929
4   Arkansas   AR    2889450 Violent.crime  14959
5 California   CA   36961664 Violent.crime 174459
6   Colorado   CO    5024748 Violent.crime  16976
> tail(fbi.melt)
            State Abbr Population            variable value
445       Vermont   VT     621760 Motor.vehicle.theft   448
446      Virginia   VA    7882590 Motor.vehicle.theft 11419
447    Washington   WA    6664195 Motor.vehicle.theft 23680
448 West Virginia   WV    1819777 Motor.vehicle.theft  2741
449     Wisconsin   WI    5654774 Motor.vehicle.theft  8926
450       Wyoming   WY     544270 Motor.vehicle.theft   771
> summary(fbi.melt)
        State          Abbr        Population                                   variable       value        
 Alabama   :  9   AK     :  9   Min.   :  544270   Violent.crime                       : 50   Min.   :      7  
 Alaska    :  9   AL     :  9   1st Qu.: 1796619   Murder.and.nonnegligent.manslaughter: 50   1st Qu.:   1536  
 Arizona   :  9   AR     :  9   Median : 4403094   Forcible.rape                       : 50   Median :  11056  
 Arkansas  :  9   AZ     :  9   Mean   : 6128138   Robbery                             : 50   Mean   :  47124  
 California:  9   CA     :  9   3rd Qu.: 6664195   Aggravated.assault                  : 50   3rd Qu.:  37964  
 Colorado  :  9   CO     :  9   Max.   :36961664   Property.crime                      : 50   Max.   :1009614  
 (Other)   :396   (Other):396                      (Other)                             :150                  


Incidence rates are now easy to compute:

• fbi.melt$irate <- fbi.melt$value / fbi.melt$Population
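As a quick check of why rates matter, this sketch recomputes `irate` for the six rows printed by `head(fbi.melt)` (numbers copied from that output) and also expresses it per 100,000 residents, the scale on which crime statistics are commonly reported. The data frame name `fbi.head` is made up for the illustration.

```r
# First six rows of fbi.melt, copied from the head() output
fbi.head <- data.frame(
  State      = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  Population = c(4708708, 698473, 6595778, 2889450, 36961664, 5024748),
  variable   = "Violent.crime",
  value      = c(21179, 4421, 26929, 14959, 174459, 16976)
)

# Incidence rate: offenses per resident, then rescaled to per 100,000 residents
fbi.head$irate   <- fbi.head$value / fbi.head$Population
fbi.head$per100k <- round(1e5 * fbi.head$irate, 1)

# Highest rate first
fbi.head[order(fbi.head$irate, decreasing = TRUE), c("State", "value", "per100k")]
```

Among these six states, California has by far the most violent crimes, but Alaska has the highest rate.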


Recreate this chart of incidence rates

[Figure: one panel per crime type (Murder.and.nonnegligent.manslaughter, Forcible.rape, Robbery, Motor.vehicle.theft, Aggravated.assault, Violent.crime, Burglary, Larceny.theft, Property.crime); horizontal axis count (0 to 4000), vertical axis reorder(State, irate), with the 50 states sorted by incidence rate, from South Dakota and North Dakota at one end to South Carolina at the other]


Then, cast

• Row variables, column variables, and a summary function (sum, mean, max, etc.)

• dcast(molten, row ~ col, summary)

• dcast(molten, row1 + row2 ~ col, summary)

• dcast(molten, row ~ . , summary)

• dcast(molten, . ~ col, summary)
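Several of these dcast() forms have rough base-R counterparts, sketched here on a hypothetical molten data frame with made-up numbers: `xtabs()` for a row-by-column table of sums and `tapply()` for totals along one dimension. This is a swapped-in technique for illustration, not part of reshape2.

```r
# Hypothetical molten data (made-up numbers): one row per state and crime type
molten <- data.frame(
  State    = rep(c("Iowa", "Ohio"), times = 2),
  variable = rep(c("Burglary", "Robbery"), each = 2),
  value    = c(10, 20, 1, 2)
)

# State ~ variable, sum: a two-way table of sums
by_state_type <- xtabs(value ~ State + variable, data = molten)

# State ~ ., sum: one total per state
by_state <- tapply(molten$value, molten$State, sum)

# . ~ variable, sum: one total per crime type
by_type <- tapply(molten$value, molten$variable, sum)

by_state_type
by_state
by_type
```

With reshape2 loaded, the corresponding calls would be `dcast(molten, State ~ variable, sum)`, `dcast(molten, State ~ ., sum)` and `dcast(molten, . ~ variable, sum)`.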


Casting

• Using dcast:

• find the total number of offenses in 2009

• find the number of offenses by type of crime

• find the total number of offenses by state


What is a map?

[Figure: points plotted as long (-96 to -91) against lat (40.5 to 43.5)]

Set of points specifying latitude and longitude

[Figure: the same points, long (-96 to -91) against lat (40.5 to 43.5), connected into a polygon]

Polygon: connect dots in correct order
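Why the order matters can be sketched in base R with a hypothetical four-point outline that carries an explicit `order` column (the data returned by `map_data()` includes such a column). Connecting the rows as stored draws a crossed "bow tie"; sorting by `order` first draws the intended square.

```r
# Hypothetical outline of a unit square, stored out of drawing order
pts <- data.frame(long  = c(0, 1, 1, 0),
                  lat   = c(0, 1, 0, 1),
                  order = c(1, 3, 2, 4))

# Sort the points by their drawing order
outline <- pts[order(pts$order), ]

op <- par(mfrow = c(1, 2))
# Left: rows connected as stored, giving a self-intersecting shape
plot(pts$long, pts$lat, type = "n", xlab = "long", ylab = "lat", main = "as stored")
polygon(pts$long, pts$lat, col = "grey80")
# Right: rows connected in the correct order, giving the square
plot(outline$long, outline$lat, type = "n", xlab = "long", ylab = "lat", main = "sorted by order")
polygon(outline$long, outline$lat, col = "grey80")
par(op)
```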


What is a map?

[Figures: two panels of long (-95 to -85) against lat (30 to 40)]

Polygon: connect only the correct dots


Grouping

• Use the parameter group to connect the “right” dots (sometimes we need to create the grouping variable ourselves)
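A base-R sketch of what `group` does, using a hypothetical data set with two separate squares: each group is drawn as its own polygon, so no edge joins the two pieces. When a data set has no suitable grouping column, one can often be built from existing columns, e.g. with `interaction()`.

```r
# Hypothetical map data with two separate pieces (made-up coordinates)
shapes <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(1:2, each = 4)
)

# Empty plotting region covering all points
plot(range(shapes$long), range(shapes$lat), type = "n", xlab = "long", ylab = "lat")

# Draw each group separately, so no edge joins the two squares
pieces <- split(shapes, shapes$group)
for (p in pieces) polygon(p$long, p$lat, col = "grey80")
```

In ggplot2, `group = group` inside `aes()` (or the `group=` argument of `qplot()`) plays the same role.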


[Figures: four maps of long (-120 to -70) against lat (30 to 45), produced by the four commands below]

qplot(long, lat, geom="point", data=states)

qplot(long, lat, geom="path", data=states, group=group)

qplot(long, lat, geom="polygon", data=states, group=group, fill=region)

qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)


Practice

• Using the maps package, pull out map data for all US counties: counties <- map_data("county")

• Draw a map of counties (polygon and path geoms)

• Colour all counties called “story”

• Advanced: Which county names are used most often?
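A sketch of the counting step for the advanced question, on a hypothetical excerpt shaped like `map_data("county")` (state names in `region`, county names in `subregion`). Because the map data has one row per boundary point, each county appears many times, so the sketch first reduces the data to unique (state, county) pairs. The `is_story` column is an illustrative flag that could serve as the fill colour for the “story” exercise.

```r
# Hypothetical excerpt shaped like map_data("county"): several points per county
pts <- data.frame(
  region    = c("iowa", "iowa", "iowa", "ohio", "ohio", "texas"),
  subregion = c("story", "story", "polk", "washington", "washington", "washington")
)

# Flag the boundary points of counties named "story" (usable as a fill variable)
pts$is_story <- pts$subregion == "story"

# Count each (state, county) pair once, not once per boundary point
counties_unique <- unique(pts[, c("region", "subregion")])
name_counts <- sort(table(counties_unique$subregion), decreasing = TRUE)
name_counts
```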


Merging Data

• Merging data from different datasets:

merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)

e.g.:

states.fbi <- merge(states, fbi.cast, by.x="", by.y="Abbr")
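A sketch of such a merge on hypothetical stand-in data with made-up numbers. The map points use lower-case state names in `region`, so a matching key is built with `tolower()` before calling `merge()`; the column names here are assumptions and need not match the real `fbi.cast`.

```r
# Hypothetical stand-ins: map points (several per region, with a drawing order)
# and one row of crime data per state (made-up numbers)
map_pts <- data.frame(
  long   = c(1, 2, 2, 5, 6),
  lat    = c(1, 1, 2, 1, 2),
  order  = 1:5,
  region = c("alabama", "alabama", "alabama", "arizona", "arizona")
)
crime <- data.frame(State = c("Arizona", "Alabama", "Alaska"),
                    Burglary = c(200, 100, 50))

# Build a key that matches the lower-case names in 'region'
crime$region <- tolower(crime$State)

# Inner join (all = FALSE): each map point gets its state's crime numbers;
# Alaska has no map points here and is dropped
merged <- merge(map_pts, crime, by = "region")

# merge() sorts by the key (sort = TRUE), so restore the drawing order
merged <- merged[order(merged$order), ]
merged
```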


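Merging Data

• Merging data from different datasets:

[Figure: a table keyed by region (one row for alabama) and a table with columns X1, X2, region, X3 (several alabama rows) are merged into a table with columns region, X1, X2, X3, ..., in which every alabama row is matched]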


Practice

• Merge the fbi crime data and the map of the states

• Plot choropleth maps of crimes.

• Describe the patterns that you see.

• Advanced: try to cluster the states according to crime rates (use hclust)
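For the advanced question, a sketch of hierarchical clustering with `hclust()` on a hypothetical matrix of incidence rates (states A to D, made-up numbers). The rates are standardised with `scale()` first so that crime types with large counts do not dominate the distances; average linkage and two clusters are arbitrary choices for the illustration.

```r
# Hypothetical incidence rates per 100,000 (made-up numbers), one row per state
rates <- rbind(
  A = c(Burglary = 400, Robbery = 50,  Larceny.theft = 2000),
  B = c(Burglary = 420, Robbery = 55,  Larceny.theft = 2100),
  C = c(Burglary = 900, Robbery = 150, Larceny.theft = 3200),
  D = c(Burglary = 950, Robbery = 140, Larceny.theft = 3100)
)

# Standardise each crime type, compute Euclidean distances, cluster
d  <- dist(scale(rates))
hc <- hclust(d, method = "average")

# Cut the tree into two groups; names are the states
clusters <- cutree(hc, k = 2)
clusters
```

The cluster labels could then be merged onto the map data by state and used as the fill colour of a choropleth map.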


Time Series


NASA Meteorological Data

• 24 x 24 grid across Central America

• satellite-captured data: temperature, near-surface temperature (surftemp), pressure, ozone, and cloud coverage: low (cloudlow), medium (cloudmid), high (cloudhigh)

• for each location, monthly averages from Jan 1995 to Dec 2000

[Figure: grid of locations, x from 1 to 24 and y from 1 to 24]


What is a Time Series?

for each location, multiple measurements

connected by a line

but only connect the right points

[Figures: three panels of ts (275 to 305) against TimeIndx (10 to 70), one per command below]

qplot(time, temperature, geom="point", data=subset(nasa, (x==1) & (y==1)))

qplot(time, temperature, geom="line", data=subset(nasa, (x==1) & (y==1)))

qplot(time, temperature, geom="line", data=subset(nasa, (x==1) & (y %in% c(1,15))), group=y)
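When both `x` and `y` vary, a location is identified by the pair, so the grouping variable has to be built from both. A base-R sketch on a hypothetical miniature of the NASA grid (2 x 2 locations, 6 time points, made-up temperatures), using `interaction()` to create the location id and drawing one line per location:

```r
# Hypothetical miniature of the nasa data: 2 x 2 locations, 6 time points,
# made-up temperatures (the real grid is 24 x 24 with 72 months)
nasa_toy <- expand.grid(time = 1:6, x = 1:2, y = 1:2)
nasa_toy$temperature <- 290 + nasa_toy$x + 2 * nasa_toy$y + sin(nasa_toy$time)

# A location is the combination of x and y, so build the group from both
nasa_toy$location <- interaction(nasa_toy$x, nasa_toy$y)
series <- split(nasa_toy, nasa_toy$location)

# One line per location; lines() joins points in row order, so sort by time
plot(range(nasa_toy$time), range(nasa_toy$temperature), type = "n",
     xlab = "time", ylab = "temperature")
for (s in series) {
  s <- s[order(s$time), ]
  lines(s$time, s$temperature)
}
```

In ggplot2 the same idea should be expressible as `group = interaction(x, y)`.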


Practice

• For each location, draw a time series for pressure. What do you expect? Are there surprising values? Which are they?

• Plot near-surface temperatures for each location. Which locations show the highest range in temperatures? Which locations show the highest overall increase in temperatures?

• Use ddply to get these summaries.
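A base-R sketch of these per-location summaries on a hypothetical miniature of the data (2 x 2 locations, 12 months, made-up surftemp values that grow linearly): split by location, compute the range and, as one possible definition of overall increase, the slope of a linear trend, then bind the rows together. With plyr loaded, the split-apply-combine step would be roughly `ddply(nasa, .(x, y), summarise, range = diff(range(surftemp)))`.

```r
# Hypothetical miniature: 2 x 2 locations, 12 months, made-up surftemp values
toy <- expand.grid(time = 1:12, x = 1:2, y = 1:2)
toy$surftemp <- 295 + (0.1 * toy$x + 0.2 * toy$y) * toy$time

# Split by location, summarise each piece, then combine the rows
per_location <- lapply(split(toy, list(toy$x, toy$y)), function(d) {
  data.frame(
    x     = d$x[1],
    y     = d$y[1],
    range = diff(range(d$surftemp)),                       # max minus min
    slope = coef(lm(surftemp ~ time, data = d))[["time"]]  # trend per month
  )
})
summaries <- do.call(rbind, per_location)

# Locations with the largest range first
summaries[order(summaries$range, decreasing = TRUE), ]
```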