What factors are most responsible for height? Outcome = (Model) + Error.


Page 1:

What factors are most responsible for height?

Outcome = (Model) + Error
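The identity on this slide can be checked numerically. The sketch below is only an illustration: it assumes a straight-line model fitted with `lm()`, and the `father` and `child` values are made up, not taken from the Galton data.

```r
# Hypothetical data: father and child heights in inches (made-up values).
father <- c(65, 67, 68, 70, 72, 74)
child  <- c(66, 66, 69, 69, 71, 72)

fit <- lm(child ~ father)      # the "Model" part: a fitted straight line
model_part <- fitted(fit)      # predicted outcome for each person
error_part <- residuals(fit)   # what the model leaves unexplained

# Outcome = Model + Error holds row by row.
decomposition_ok <- isTRUE(all.equal(unname(model_part + error_part), child))
print(decomposition_ok)
```

Every observed height splits exactly into a fitted value plus a residual, which is all the slide's equation claims.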

Page 2:

Analytics & History: 1st Regression Line

The first “Regression Line”

Page 3:

Galton’s Notebook on Families & Height

Page 4:

X1 X2 X3 Y

Galton’s Family Height Dataset

Page 5:

> getwd()
[1] "C:/Users/johnp_000/Documents"

> setwd()

Page 6:

Dataset Input

Object <- Function("Filename")

h <- read.csv("GaltonFamilies.csv")

Page 7:

str() summary()

Data Types: Numbers and Factors/Categorical
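As a rough illustration of what `str()` and `summary()` report for numeric and factor columns, here is a sketch on a tiny made-up data frame. The `toy` object and its rows are hypothetical; only the column names are borrowed from the slides. Note that in R 4.0 and later `read.csv()` leaves text columns as character unless `stringsAsFactors = TRUE` is passed, so the factor is created explicitly here.

```r
# Hypothetical mini data set with the same kinds of columns (made-up rows).
toy <- data.frame(
  father      = c(70.0, 68.5, 72.0, 66.0),
  mother      = c(64.0, 63.5, 65.0, 62.0),
  gender      = factor(c("male", "female", "female", "male")),
  childHeight = c(71.0, 64.5, 66.0, 68.0)
)

str(toy)       # one line per column: type plus the first few values
summary(toy)   # quartiles and mean for numbers, counts per level for factors

column_types <- sapply(toy, class)
print(column_types)
```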

Page 8:

Outline

• One Variable: Univariate
  • Dependent / Outcome Variable

• Two Variables: Bivariate
  • Outcome and each Predictor

• All Four Variables: Multivariate

Page 9:

Steps

Variable                    Role     Type         Plot
Child’s Height              Y        Continuous   Histogram
Dad’s Height, Mom’s Height  X1, X2   Continuous   Scatter
Gender                      X3       Categorical  Boxplot

Linear Regression

Page 10:

Frequency Distribution, Histogram

hist(h$childHeight)
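A histogram is a drawing of a frequency distribution, and `hist()` can return the underlying counts without plotting. The sketch below uses made-up heights and hand-picked 2-inch bins; both are assumptions for illustration.

```r
# Hypothetical child heights in inches (made-up values).
x <- c(60, 62, 63, 63.5, 64, 65, 66, 66.5, 67, 70)

# Same binning that hist() would draw, without plotting.
bins <- hist(x, breaks = seq(60, 70, by = 2), plot = FALSE)

# One row per bin: lower edge, upper edge, number of observations.
freq_table <- data.frame(
  from  = head(bins$breaks, -1),
  to    = tail(bins$breaks, -1),
  count = bins$counts
)
print(freq_table)
```

By default each bin includes its upper edge, so a height of exactly 64 is counted in the 62 to 64 bin.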

Page 11:

Area = 1

Density Plot

plot(density(h$childHeight))
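The "Area = 1" property can be checked approximately. This sketch simulates heights rather than using the Galton file (the sample is an assumption) and sums the estimated density over the evenly spaced grid that `density()` returns.

```r
set.seed(1)
# Hypothetical heights in inches (simulated, not the Galton data).
heights_sim <- rnorm(500, mean = 67, sd = 3.5)

d <- density(heights_sim)   # kernel density estimate on an even grid of x values

# Riemann-sum approximation of the area under the curve.
step <- d$x[2] - d$x[1]
area <- sum(d$y) * step
print(round(area, 3))
```

The result is close to 1 rather than exactly 1, because the curve is evaluated on a finite grid.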

Page 12:

hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

Mode, Bimodal

Page 13:

Industry                 Pct.
Research                 24%
Higher Education          7%
Information Technology    9%
Computer Software         7%
Financial Services        6%
Banking                   2%
Pharmaceuticals           4%
Biotechnology             4%
Market Research           3%
Management Consulting     3%
Total                    69%

Hadley Wickham

Asst. Professor of Statistics at Rice University

ggplot2, plyr, reshape, rggobi, profr

Industries / Organizations Creating and Using R

http://ggplot2.org/

Page 14:

ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(childHeight))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()

Page 15:

ggplot2
h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Density")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)

Page 16:

Steps

Variable                    Role     Type         Plot
Child’s Height              Y        Continuous   Histogram
Dad’s Height, Mom’s Height  X1, X2   Continuous   Scatter
Gender                      X3       Categorical  Boxplot

Linear Regression

Page 17:

Correlation and Regression

Page 18:
Page 19:

Covariance

1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross-product deviations.
5. The covariance is the average of the cross-product deviations:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

Page 20:

Covariance

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

[Figure: scatterplot of Y against X]

Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

Page 21:

Covariance

• Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
• Calculate the error [deviation] between the mean and their score for the second variable (y).
• Multiply these error values.
• Add these values and you get the cross-product deviations.
• The covariance is the average of the cross-product deviations:

$$
\begin{aligned}
\operatorname{cov}(x, y) &= \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1} \\
&= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} \\
&= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} \\
&= \frac{17}{4} = 4.25
\end{aligned}
$$
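The steps on this slide can be sketched in R. The raw scores `x` and `y` below are an assumption: the slide shows only the deviations, so these values were chosen to reproduce them (means of 5.4 and 11).

```r
# Assumed raw scores, chosen so the deviations match the worked example.
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dev_x <- x - mean(x)                     # step 1: deviations for x
dev_y <- y - mean(y)                     # step 2: deviations for y
cross_products <- dev_x * dev_y          # step 3: multiply the paired deviations
sum_cp <- sum(cross_products)            # step 4: sum of cross-product deviations
cov_manual <- sum_cp / (length(x) - 1)   # step 5: divide by N - 1

print(cov_manual)
print(cov(x, y))   # built-in sample covariance, for comparison
```

Both lines print 4.25, since `cov()` also divides by N - 1.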

Page 22:

• Covariance depends upon the units of measurement
• Normalize the data
• Divide by the standard deviations of both variables.

• The standardized version of covariance is known as the correlation coefficient

Standardizing the Covariance
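A minimal sketch of why standardizing matters, using made-up heights (the vectors below are hypothetical): converting inches to centimetres changes the covariance but leaves the correlation coefficient untouched.

```r
# Hypothetical father and child heights in inches (made-up values).
father_in <- c(65, 67, 68, 70, 72, 74)
child_in  <- c(66, 66, 69, 69, 71, 72)

# The same heights in centimetres.
father_cm <- father_in * 2.54
child_cm  <- child_in * 2.54

cov_in <- cov(father_in, child_in)
cov_cm <- cov(father_cm, child_cm)   # larger by a factor of 2.54^2

# Standardize: divide the covariance by both standard deviations.
r_in <- cov_in / (sd(father_in) * sd(child_in))
r_cm <- cov_cm / (sd(father_cm) * sd(child_cm))

print(c(cov_in = cov_in, cov_cm = cov_cm))
print(c(r_in = r_in, r_cm = r_cm, builtin = cor(father_in, child_in)))
```

The two covariances differ, while `r_in`, `r_cm`, and the built-in `cor()` agree.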

Page 23:

Correlation

?cor

cor(h$father, h$childHeight)

[1] 0.2660385

Page 24:

Scatterplot Matrix: pairs()

Page 25:

Correlations Matrix
library(car)
scatterplotMatrix(heights)

Page 26:

ggplot2

Page 27:

Steps

Variable                    Role     Type         Plot
Child’s Height              Y        Continuous   Histogram
Dad’s Height, Mom’s Height  X1, X2   Continuous   Scatter
Gender                      X3       Categorical  Boxplot

Linear Regression

Page 28:

Box Plot

Page 29:

Children’s Height vs. Gender
boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")

Page 30:

Descriptive Stats: Box Plot

69.23 - 64.10 = 5.13
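The slide appears to show one summary value per gender and their difference. Assuming those are group means (the slide does not say), a comparison of that kind could be computed as sketched below. The `toy` rows are made up; only the column names follow the slides.

```r
# Hypothetical rows (made-up values), with the column names used in the slides.
toy <- data.frame(
  gender      = c("male", "male", "male", "female", "female", "female"),
  childHeight = c(70, 69, 71, 64, 63, 65)
)

# Mean height within each gender, then the gap between the two groups.
group_means <- tapply(toy$childHeight, toy$gender, mean)
gap <- group_means[["male"]] - group_means[["female"]]

print(group_means)
print(gap)
```

Swapping `mean` for `median` in the `tapply()` call gives the centre lines that a box plot draws.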

Page 31:

Subset Males
men <- subset(h, gender == 'male')

Page 32:

Subset Females
women <- subset(h, gender == 'female')

Page 33:

Children’s Height: Males

qqnorm(men$childHeight)
qqline(men$childHeight)

hist(men$childHeight)
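To show what a normal Q-Q plot puts on its axes, the sketch below rebuilds the points by hand and compares them with what `qqnorm()` returns. The sample is simulated (an assumption), not the `men` subset.

```r
set.seed(2)
# Hypothetical heights in inches (simulated, not the Galton data).
x <- rnorm(200, mean = 69, sd = 2.6)

# Theoretical standard-normal quantiles paired with the sorted sample.
theoretical <- qnorm(ppoints(length(x)))
sample_q    <- sort(x)

# qqnorm() computes the same pairs; plot.it = FALSE skips the drawing.
qq <- qqnorm(x, plot.it = FALSE)
same_points <- isTRUE(all.equal(sort(qq$x), theoretical)) &&
  isTRUE(all.equal(sort(qq$y), sample_q))
print(same_points)
```

If the sample is roughly normal, these pairs fall near the straight line that `qqline()` adds.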

Page 34:

Children’s Height: Females

qqnorm(women$childHeight)
qqline(women$childHeight)

hist(women$childHeight)

Page 35:

ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), childHeight))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))