Maps and Time Series (hofroe.net/stat579/08-maps.pdf)

Maps and Time Series Stat 579

Heike Hofmann

Outline

• Melting and Casting

• Maps: polygons, choropleth

• Time series

Warm-up

• Start R and load data ‘fbi’ from http://www.hofroe.net/stat579/crimes-2012.csv

• This data set contains number of crimes by type for each state in the U.S.

• Investigate which states have the highest number of crimes (almost independently of type)

• Pick one state and crime type and plot a time series

getting ready for loops

• Let’s concentrate on the years since 2000

• Pick a state and fit a model (use lm) for the number of burglaries over time (i.e. lm(Burglary~Year))

• Save the resulting object. Investigate it with your poking and prodding functions.

• Extract the coefficients (intercept and slope) from the model

• Repeat for another state.

• How can we extract coefficients for all states?

• Want to run the same block of code multiple times:


• Loop or iteration

for (i in allstates) {
  onestate <- subset(fbi, state==i & Year >= 2000)
  model <- lm(Burglary~Year, data=onestate)
  print(coef(model))
}
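The loop above only prints the coefficients. A minimal sketch of one way to keep them for later use, assuming a small made-up data frame `toy` (columns `state`, `Year`, `Burglary`, with invented values) in place of the real fbi data:

```r
# Toy stand-in for the fbi data (invented values): two states whose
# burglary counts change exactly linearly over time.
toy <- data.frame(
  state    = rep(c("A", "B"), each = 5),
  Year     = rep(2000:2004, times = 2),
  Burglary = c(100 + 10 * (0:4), 500 - 20 * (0:4))
)

allstates <- unique(toy$state)

# Housekeeping before the loop: set up a place to store the results.
coefs <- data.frame(state = allstates, intercept = NA_real_, slope = NA_real_)

for (k in seq_along(allstates)) {
  onestate <- subset(toy, state == allstates[k] & Year >= 2000)
  model <- lm(Burglary ~ Year, data = onestate)
  # coef() returns the intercept first, then the slope for Year.
  coefs[k, c("intercept", "slope")] <- coef(model)
}

print(coefs)
```

The storage set-up before the loop and the indexing inside it are the kind of extra bookkeeping that surrounds the actual model fit.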

Iterations

[Diagram: a block of commands and its output]

Why should we avoid loops?

• the speed of for loops is still an issue

• main reason: lots of error-prone housekeeping chores before and after the ‘meat’
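One way to sidestep the housekeeping is to split the data and apply a function to each piece. This is a base R sketch, not necessarily the approach intended in the course; the data frame `toy` and the helper `fit_one` are made up for illustration:

```r
# Toy stand-in for the fbi data (invented values).
toy <- data.frame(
  state    = rep(c("A", "B"), each = 5),
  Year     = rep(2000:2004, times = 2),
  Burglary = c(100 + 10 * (0:4), 500 - 20 * (0:4))
)

# Hypothetical helper: fit the model for one state, return its coefficients.
fit_one <- function(d) coef(lm(Burglary ~ Year, data = d))

# split() makes one data frame per state; sapply() runs fit_one on each
# piece and collects the results, so no storage has to be set up by hand.
coefs <- t(sapply(split(toy, toy$state), fit_one))

print(coefs)
```

The result is a matrix with one row per state and one column per coefficient.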

fbi exploration

• Draw a scatterplot of population size against the number of violent crimes in 2012. What is your conclusion? How do things change in 2011?

• Plot population against the number of burglaries in 2012. What is your conclusion there?

• What should we look at instead?

Reshaping Data

• Two step process:

• melt: get data into a “convenient” shape, i.e. one that is particularly flexible

• cast: cast data into new shape(s) that are better suited for analysis

melt.data.frame(data, id.vars, measure.vars, na.rm = F, ...)

• id.vars: all identifiers (keys) and qualitative variables

• measure.vars: all quantitative variables

[Figure: the original data (columns key, X1, ..., X5; id.vars = key, measure.vars = X1 to X5) is melted into molten form, “long & skinny”]
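A small sketch of melting, assuming the reshape2 package is installed and using a made-up data frame `wide` shaped like the diagram (a key plus measurement columns):

```r
library(reshape2)  # provides melt() and dcast()

# Toy data in "wide" form (invented values): one row per key,
# one column per measurement.
wide <- data.frame(
  key = c("a", "b"),
  X1  = c(1, 2),
  X2  = c(3, 4),
  X3  = c(5, 6)
)

# id.vars identify a row; each measure.var is stacked into
# a (variable, value) pair, giving the "long & skinny" form.
molten <- melt(wide, id.vars = "key", measure.vars = c("X1", "X2", "X3"))

print(molten)
```

Two rows with three measurements each become six rows, one per key and variable.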

Casting

• Function cast: dcast(dataset, rows ~ columns, aggregate)

[Figure: a table laid out by rows and columns, each cell holding aggregate(data)]

Data aggregation sometimes is just a transformation

> fbi.melt <- melt(fbi, id.vars=c("State","Abbr","Population"), measure.vars=4:12)

> head(fbi.melt)
       State Abbr Population      variable  value
1    Alabama   AL    4708708 Violent.crime  21179
2     Alaska   AK     698473 Violent.crime   4421
3    Arizona   AZ    6595778 Violent.crime  26929
4   Arkansas   AR    2889450 Violent.crime  14959
5 California   CA   36961664 Violent.crime 174459
6   Colorado   CO    5024748 Violent.crime  16976

> tail(fbi.melt)
            State Abbr Population            variable value
445       Vermont   VT     621760 Motor.vehicle.theft   448
446      Virginia   VA    7882590 Motor.vehicle.theft 11419
447    Washington   WA    6664195 Motor.vehicle.theft 23680
448 West Virginia   WV    1819777 Motor.vehicle.theft  2741
449     Wisconsin   WI    5654774 Motor.vehicle.theft  8926
450       Wyoming   WY     544270 Motor.vehicle.theft   771

> summary(fbi.melt)
        State          Abbr       Population                                       variable       value
 Alabama   :  9   AK     :  9   Min.   :  544270   Violent.crime                       : 50   Min.   :      7
 Alaska    :  9   AL     :  9   1st Qu.: 1796619   Murder.and.nonnegligent.manslaughter: 50   1st Qu.:   1536
 Arizona   :  9   AR     :  9   Median : 4403094   Forcible.rape                       : 50   Median :  11056
 Arkansas  :  9   AZ     :  9   Mean   : 6128138   Robbery                             : 50   Mean   :  47124
 California:  9   CA     :  9   3rd Qu.: 6664195   Aggravated.assault                  : 50   3rd Qu.:  37964
 Colorado  :  9   CO     :  9   Max.   :36961664   Property.crime                      : 50   Max.   :1009614
 (Other)   :396   (Other):396                      (Other)                             :150

Incidences are now easy to compute:

• fbi.melt$irate <- fbi.melt$value/fbi.melt$Population

Recreate this chart of incidence rates

[Figure: chart of incidence rates, one panel per crime type (Murder.and.nonnegligent.manslaughter, Forcible.rape, Robbery, Motor.vehicle.theft, Aggravated.assault, Violent.crime, Burglary, Larceny.theft, Property.crime); vertical axis: reorder(State, irate), listing all states; horizontal axis: count, from 0 to 4000 in each panel]

Then, cast

• Row variables, column variables, and a summary function (sum, mean, max, etc)

• dcast(molten, row ~ col, summary)

• dcast(molten, row1 + row2 ~ col, summary)

• dcast(molten, row ~ . , summary)

• dcast(molten, . ~ col, summary)
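The formula patterns listed above can be tried on a tiny molten data set. This is a sketch assuming the reshape2 package is installed; the data frame `molten` and its values are invented, and `value.var` is passed explicitly so that dcast does not have to guess the value column:

```r
library(reshape2)

# Toy molten data (invented values): one row per state and crime type.
molten <- data.frame(
  state    = rep(c("A", "B"), each = 2),
  variable = rep(c("Burglary", "Robbery"), times = 2),
  value    = c(10, 1, 20, 2)
)

# One row per state, one column per crime type.
by_both <- dcast(molten, state ~ variable, sum, value.var = "value")

# "." on the right: collapse all crime types, one total per state.
by_state <- dcast(molten, state ~ ., sum, value.var = "value")

# "." on the left: collapse all states, one total per crime type.
by_type <- dcast(molten, . ~ variable, sum, value.var = "value")

print(by_both)
print(by_state)
print(by_type)
```

In each call the third argument is the summary function that is applied whenever several molten rows fall into the same cell.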

Casting

• Using dcast:

• find the number of all offenses in 2009

• find the number of offenses by type of crime

• find the number of all offenses by state

What is a map?

[Figure: points plotted by long (-96 to -91) and lat (40.5 to 43.5)]

Set of points specifying latitude and longitude

[Figure: the same long and lat ranges, with the points connected]

Polygon: connect dots in correct order

What is a map?

[Figure: two plots of lat (30 to 40) against long (-95 to -85)]

Polygon: connect only the correct dots

Grouping

• Use parameter group to connect the “right” dots (need to create grouping sometimes)

[Figures: maps of the US, lat (30 to 45) against long (-120 to -70), illustrating the calls listed next]

qplot(long, lat, geom="point", data=states)

qplot(long, lat, geom="path", data=states, group=group)

qplot(long, lat, geom="polygon", data=states, group=group, fill=region)

qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)
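The effect of the group parameter can be seen on a made-up example. This is a sketch assuming ggplot2 is installed; it uses ggplot() with geom_polygon() instead of qplot(), which is deprecated in recent ggplot2 versions, and the data frame `squares` is invented for illustration:

```r
library(ggplot2)

# Two unit squares (invented coordinates); 'group' records which
# points belong to the same outline.
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4)
)

# Without the group aesthetic all eight points would be joined into a
# single shape; with it, each square is drawn as its own polygon.
p <- ggplot(squares, aes(long, lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black")

print(p)
```

Map data works the same way: every state or county outline gets its own group value, so that the last point of one outline is not connected to the first point of the next.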

Practice

• Using the maps package, pull out map data for all US counties: counties <- map_data("county")

• Draw a map of counties (polygons & path geom)

• Colour all counties called “story”

• Advanced: Which county names are used most often?

Merging Data

• Merging data from different datasets:

merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)

e.g.:

states.fbi <- merge(states, fbi.cast, by.x="", by.y="Abbr")
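A self-contained sketch of merging with differently named key columns, using base R only. The data frames `map_pts` and `stats`, their column names and their values are made up for illustration:

```r
# Map-like table (invented values): several rows (outline points) per region.
map_pts <- data.frame(
  region = c("alabama", "alabama", "alaska"),
  long   = c(-87.5, -87.6, -150.0),
  lat    = c(30.4, 30.3, 61.0)
)

# One row of statistics per region, with the key stored under another name.
stats <- data.frame(
  name = c("alabama", "alaska"),
  X3   = c(1.5, 2.5)
)

# by.x and by.y name the key column in the first and second table;
# the single stats row is repeated for every matching map row.
merged <- merge(map_pts, stats, by.x = "region", by.y = "name")

print(merged)
```

Because merge() sorts the result by the key, the row order of real map data may change. Restoring the original point order before drawing polygons is worth checking, for example with the order column that map_data() provides.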

Merging Data

• Merging data from different datasets:

[Figure: a table with columns X1, X2, region and a table with columns region, X3 are matched on region (e.g. alabama), giving a merged table with columns region, X1, X2, X3]

Practice

• Merge the fbi crime data and the map of the States

• Plot choropleth maps of crimes.

• Describe the patterns that you see.


• Advanced: try to cluster the states according to crime rates (use hclust)

Time Series

NASA Meteorological Data

• 24 x 24 grid across Central America

• satellite captured data: temperature, near surface temperature (surftemp), pressure, ozone, cloud coverage: low (cloudlow), medium (cloudmid), high (cloudhigh)

• for each location monthly averages for Jan 1995 to Dec 2000

[Figure: map of the grid; axes: Gridx 1 to 24, Gridy 1 to 24]

What is a Time Series?

[Figure: ts (275 to 305) against TimeIndx (10 to 70)]

for each location multiple measurements

[Figure: ts (275 to 305) against TimeIndx (10 to 70)]

connected by a line

[Figure: ts (275 to 305) against TimeIndx (10 to 70)]

but only connect the right points

qplot(time, temperature, geom="point", data=subset(nasa, (x==1) & (y==1)))

qplot(time, temperature, geom="line", data=subset(nasa, (x==1) & (y==1)))

qplot(time, temperature, geom="line", data=subset(nasa, (x==1) & (y %in% c(1,15))), group=y)

Practice

• For each location, draw a time series for pressure. What do you expect? Are there surprising values? Which are they?

• Plot near surface temperatures for each location. Which locations show the largest range in temperatures? Which locations show the largest overall increase in temperatures?

use ddply to get these summaries
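A minimal sketch of such a per-location summary, assuming the plyr package is installed. The data frame `toy` and its values are invented; the column names x, y, time and surftemp follow the nasa data described earlier, and "overall increase" is taken here to mean last value minus first value, which is only one possible definition:

```r
library(plyr)

# Toy stand-in for the nasa data (invented values): two grid locations,
# three time points each.
toy <- data.frame(
  x        = rep(1, 6),
  y        = rep(c(1, 2), each = 3),
  time     = rep(1:3, times = 2),
  surftemp = c(290, 295, 300, 280, 285, 281)
)

# ddply splits the data by location (x, y), computes the summaries for
# each piece, and stacks the results into one data frame.
temps <- ddply(toy, .(x, y), summarise,
  range    = max(surftemp) - min(surftemp),
  increase = surftemp[which.max(time)] - surftemp[which.min(time)]
)

print(temps)
```

Sorting the result by range or by increase then shows which locations stand out.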
