12 adv-manip

Hadley Wickham

Stat405

Advanced data manipulation 2

Thursday, 30 September 2010

1. Colloquium Monday

2. String basics

3. Group-wise transformations

4. Practice challenges

Colloquium on Summer research experiences

When: Monday, 4:00 - 5:00

Where: DH 1070

Coffee and Cookies will be served ahead of time

1. James Rigby - summer institute in bioinformatics

2. Gabi Quart - internship at Deutsche Bank

3. Liz Jackson - Survey methodology summer program

4. Ollie McDonald - internship at Novartis in Switzerland

5. Christine Peterson - research in the Med center

6. Joseph Egbulefu - research at Rice

Speakers

String basics

install.packages("stringr")
library(stringr)

str_length("Hadley")
str_c(letters, LETTERS)
str_c(letters, LETTERS, sep = " ")
str_c("H", "a", "d", "l", "e", "y", collapse = "")

tolower("Hadley")
toupper("Hadley")

str_sub("Hadley", 1, 3)
str_sub("Hadley", -1)
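These functions are vectorised, which is what makes them useful for adding columns to a data frame. A minimal sketch on a few made-up names (the vector `nm` is an illustrative assumption, not part of the baby-names data):

```r
library(stringr)

# Made-up example names, not taken from the course data
nm <- c("Hadley", "Ann", "Christopher")

str_length(nm)               # number of characters in each name
str_sub(nm, 1, 1)            # first letter of each name
str_sub(nm, -1, -1)          # last letter; negative positions count from the end
str_c(nm, collapse = ", ")   # collapse the whole vector into a single string
```

Each call returns one result per element of `nm`, except the last, which returns a single string.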

Add new columns that give:

The length of each name

The first letter of the name

The last letter of the name

Your turn

library(stringr)

bnames <- read.csv("baby-names2.csv.bz2", stringsAsFactors = FALSE)

bnames <- transform(bnames, length = str_length(name), first = str_sub(name, 1, 1), last = str_sub(name, -1, -1))

Explore how the average length of names has changed over time (for each sex)

Your turn

library(plyr)
sy <- ddply(bnames, c("sex", "year"), summarise, avg_length = weighted.mean(length, prop))

library(ggplot2)
qplot(year, avg_length, data = sy, colour = sex, geom = "line")

# Another approach
syl <- ddply(bnames, c("sex", "length", "year"), summarise, prop = sum(prop))
qplot(year, prop, data = syl, colour = sex, geom = "line") + facet_wrap(~ length)
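The first approach uses weighted.mean rather than mean, so each name's length counts in proportion to how many babies received it. A small sketch in base R with made-up numbers (the `toy` data frame is an assumption for illustration, and `nchar` stands in for `str_length`):

```r
# Made-up data: three names in one year, with the share of babies given each
toy <- data.frame(
  name = c("Al", "Christopher", "Max"),
  prop = c(0.6, 0.1, 0.3),
  stringsAsFactors = FALSE
)
toy$length <- nchar(toy$name)   # lengths are 2, 11 and 3

plain    <- mean(toy$length)                      # every name counts equally
weighted <- weighted.mean(toy$length, toy$prop)   # common names count for more

plain
weighted
```

Here the long but rare name pulls the plain mean up to about 5.3, while the weighted mean is 3.2, closer to what a typical baby's name looks like.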

Transformations

What about group-wise transformations? e.g. what if we want to compute the rank of a name within a sex and year?

This task is easy if we have a single year & sex, but hard otherwise.

Transformations

How would you do it for a single group?

one <- subset(bnames, sex == "boy" & year == 2008)
one$rank <- rank(-one$prop, ties.method = "first")

# or
one <- transform(one, rank = rank(-prop, ties.method = "first"))
head(one)

What if we want to transform every sex and year?

1. Extract a single group

2. Figure out how to solve it for just that group

3. Use ddply to solve it for all groups

Workflow

How would you use ddply to calculate all ranks?

bnames <- ddply(bnames, c("sex", "year"), transform, rank = rank(-prop, ties.method = "first"))
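To see what the ddply call is doing, the same split, apply and combine steps can be written out by hand. This is a rough sketch in base R on made-up data; the `toy` data frame and the names `pieces`, `ranked` and `result` are illustrative assumptions, and plyr's actual implementation differs.

```r
# Made-up data: two groups defined by sex and year
toy <- data.frame(
  sex  = c("boy", "boy", "boy", "girl", "girl"),
  year = c(2008, 2008, 2008, 2008, 2008),
  name = c("A", "B", "C", "D", "E"),
  prop = c(0.02, 0.05, 0.01, 0.03, 0.04),
  stringsAsFactors = FALSE
)

# 1. Split the data frame into one piece per sex/year combination
pieces <- split(toy, list(toy$sex, toy$year), drop = TRUE)

# 2. Apply the single-group solution to each piece
ranked <- lapply(pieces, function(df) {
  transform(df, rank = rank(-prop, ties.method = "first"))
})

# 3. Combine the pieces back into one data frame
result <- do.call(rbind, ranked)
rownames(result) <- NULL
result
```

The ranks restart within each group: 2, 1, 3 for the boys and 2, 1 for the girls.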

ddply + transform = group-wise transformation

ddply + summarise = per-group summaries

ddply + subset = per-group subsets
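The three combinations differ mainly in how many rows come back per group. A minimal sketch on a made-up data frame (`toy`, `shares`, `totals` and `tops` are hypothetical names used only for this illustration):

```r
library(plyr)

# Made-up data: five names in two groups
toy <- data.frame(
  sex  = c("boy", "boy", "boy", "girl", "girl"),
  name = c("A", "B", "C", "D", "E"),
  prop = c(0.02, 0.05, 0.01, 0.03, 0.04),
  stringsAsFactors = FALSE
)

# transform: same number of rows, plus a new column computed within each sex
shares <- ddply(toy, "sex", transform, share = prop / sum(prop))

# summarise: one row per sex
totals <- ddply(toy, "sex", summarise, total = sum(prop))

# subset: only the rows that pass a per-group test
tops <- ddply(toy, "sex", subset, prop == max(prop))

shares
totals
tops
```

With this data, `shares` keeps all five rows, `totals` has two rows (0.08 for boys, 0.07 for girls), and `tops` keeps B and E, the most common name within each sex.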

You now have all the tools to solve 95% of data manipulation problems in R. It’s just a matter of figuring out which tools to use, and how to combine them.

The following challenges will give you some practice.

Challenges

Warmups

Which names were most popular in 1999?

Work out the average proportion for each name.

List the 10 names with the highest average proportions.

# Which names were most popular in 1999?
subset(bnames, year == 1999 & rank < 10)
# Note: max(prop) below is taken over all years, not just 1999
subset(bnames, year == 1999 & prop == max(prop))

# Average usage
overall <- ddply(bnames, "name", summarise, prop = mean(prop))

# Top 10 names
head(arrange(overall, desc(prop)), 10)

How has the total proportion of babies with names in the top 1000 changed over time?

How has the popularity of different initials changed over time?

Challenge 1

sy <- ddply(bnames, c("year","sex"), summarise, prop = sum(prop), npop = sum(prop > 1/1000))

qplot(year, prop, data = sy, colour = sex, geom = "line")
qplot(year, npop, data = sy, colour = sex, geom = "line")

Challenge 2

For each name, find the year in which it was most popular, and the rank in that year. (Hint: you might find which.max useful).

Print all names that have been the most popular name at least once.
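As a reminder of how which.max behaves, here is a tiny sketch in base R with made-up values (the vectors `year` and `prop` are stand-ins for one name's rows, not real data). It returns the position of the first maximum, which can then index a parallel vector:

```r
# Made-up values for a single name across four years
year <- c(1990, 1991, 1992, 1993)
prop <- c(0.010, 0.030, 0.020, 0.030)

best <- which.max(prop)   # position of the first maximum
best
year[best]                # the year found at that position
```

Note that ties are resolved in favour of the earliest position, so this gives 1991 rather than 1993.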

most_pop <- ddply(bnames, "name", summarise, year = year[which.max(prop)], rank = min(rank))
most_pop <- ddply(bnames, "name", subset, prop == max(prop))

subset(most_pop, rank == 1)

# Double challenge: Why is this last one wrong?

Challenge 3

What name has been in the top 10 most often?

(Hint: you'll have to do this in three steps. Think about what they are before starting)

top10 <- subset(bnames, rank <= 10)
counts <- count(top10, c("sex", "name"))

ddply(counts, "sex", subset, freq == max(freq))
head(arrange(counts, desc(freq)), 10)
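The three steps are easier to check on data small enough to count by eye. A sketch with made-up rows (the `toy` data frame and the names `top`, `counts` and `sorted` are assumptions for illustration only):

```r
library(plyr)

# Made-up data: a name's rank in each of three years
toy <- data.frame(
  year = c(2001, 2001, 2002, 2002, 2003, 2003),
  name = c("A", "B", "A", "C", "A", "B"),
  rank = c(1, 2, 1, 12, 3, 15),
  stringsAsFactors = FALSE
)

top    <- subset(toy, rank <= 10)       # step 1: keep only top-10 appearances
counts <- count(top, "name")            # step 2: count appearances per name
sorted <- arrange(counts, desc(freq))   # step 3: most frequent first
sorted
```

Here A appears in the top 10 three times and B once, while C never makes the cut and so drops out at the first step.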

No homework this week.

Use what you’ve learned to make your projects even better!

Homework
