What factors are most responsible for height? Outcome = (Model) + Error.
What factors are most responsible for height?
Outcome = (Model) + Error
Analytics & History: 1st Regression Line
The first “Regression Line”
Galton’s Notebook on Families & Height
X1 X2 X3 Y
Galton’s Family Height Dataset
> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()
Dataset Input
Object <- Function(Filename)
h <- read.csv("GaltonFamilies.csv")
str() summary()
Data Types: Numbers and Factors/Categorical
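The input step can be tried without the original file. The sketch below is only an illustration: the four rows and the names toy, path, and h_toy are hypothetical stand-ins, not the contents of GaltonFamilies.csv. It assumes R 4.0 or later, where read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given.

```r
# Hypothetical stand-in rows; the real GaltonFamilies.csv is not reproduced here.
toy <- data.frame(
  father      = c(78.5, 75.5, 75.0, 69.0),
  mother      = c(67.0, 66.5, 64.0, 66.0),
  gender      = c("male", "female", "male", "female"),
  childHeight = c(73.2, 69.2, 71.0, 65.5)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Object <- Function(Filename); stringsAsFactors = TRUE turns text columns into factors
h_toy <- read.csv(path, stringsAsFactors = TRUE)

str(h_toy)      # column types: num for the heights, Factor for gender
summary(h_toy)  # quartiles for numbers, counts per level for factors
```

str() answers "what type is each column", while summary() gives a first look at the values, which is why the two are usually run together right after loading.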
Outline
• One Variable: Univariate
  • Dependent / Outcome Variable
• Two Variables: Bivariate
  • Outcome and each Predictor
• All Four Variables: Multivariate
Steps
Variable        Role  Type         Step
Child’s Height  Y     Continuous   Histogram
Dad’s Height    X1    Continuous   Scatter
Mom’s Height    X2    Continuous   Scatter
Gender          X3    Categorical  Boxplot
All four variables: Linear Regression
Frequency Distribution, Histogram
hist(h$child)
Area = 1
Density Plot
plot(density(h$childHeight))
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)
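The "Area = 1" property of a density-scaled histogram can be checked numerically. This is a minimal sketch on simulated heights (the vector heights and its mean and sd are assumptions for illustration, not the Galton data): with freq = FALSE the bar heights are densities, so bar height times bar width sums to one.

```r
set.seed(1)
heights <- rnorm(500, mean = 66.7, sd = 3.6)  # simulated, hypothetical values

# Histogram on the density scale, computed without drawing it
hd <- hist(heights, breaks = 25, plot = FALSE)
bar_area <- sum(hd$density * diff(hd$breaks))  # sum of height * width over all bars
bar_area

# Kernel density estimate: trapezoid-rule area under the curve is close to 1
d <- density(heights)
curve_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
curve_area
```

The histogram area is one by construction; the kernel density area is only approximately one because it is evaluated on a finite grid.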
Mode, Bimodal
Industry                Pct.
Research                24%
Higher Education        7%
Information Technology  9%
Computer Software       7%
Financial Services      6%
Banking                 2%
Pharmaceuticals         4%
Biotechnology           4%
Market Research         3%
Management Consulting   3%
Total                   69%
Hadley Wickham
Asst. Professor of Statistics at Rice University
ggplot2, plyr, reshape, rggobi, profr
Industries / Organizations Creating and Using R
http://ggplot2.org/
ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(child))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()
ggplot2
h.gg <- ggplot(h, aes(child)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Frequency")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)
Correlation and Regression
1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross product deviations.
5. The covariance is the average of cross-product deviations.
Covariance
$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
Covariance
[Figure: each person’s deviations from the mean, Y plotted against X]
Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.
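$$\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$
$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$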
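The covariance steps can be followed one at a time in R. In this sketch the raw scores x and y are hypothetical values, chosen only so that their deviations from the mean are -0.4, -1.4, -1.4, 0.6, 2.6 and -3, -2, -1, 2, 4; the slides do not list the raw data. The names dx, dy, cross, and cov_manual are likewise illustrative.

```r
# Hypothetical raw scores for five persons
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dx <- x - mean(x)   # step 1: deviation of each x from its mean
dy <- y - mean(y)   # step 2: deviation of each y from its mean
cross <- dx * dy    # step 3: multiply the paired deviations

# steps 4 and 5: sum the cross products, then divide by N - 1
cov_manual <- sum(cross) / (length(x) - 1)
cov_manual   # 4.25

# Built-in function for comparison; it also divides by N - 1
cov(x, y)
```

Dividing by N - 1 rather than N is what both the hand calculation and cov() do, so the two results agree.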
Covariance
• Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
• Calculate the error [deviation] between the mean and their score for the second variable (y).
• Multiply these error values.
• Add these values and you get the cross product deviations.
• The covariance is the average cross-product deviations:
• Covariance depends upon the units of measurement
• Normalize the data
• Divide by the standard deviations of both variables.
• The standardized version of covariance is known as the correlation coefficient
Standardizing the Covariance
Correlation
?cor
cor(h$father, h$child)
0.2660385
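Standardizing the covariance can be checked with a few lines of R. The vectors below are hypothetical illustration values, not the Galton data, and the names r_manual and x_cm are assumptions. The sketch shows two things: dividing the covariance by both standard deviations reproduces cor(), and rescaling a variable changes the covariance but not the correlation.

```r
# Hypothetical scores for five persons
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

# Correlation as standardized covariance
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual
cor(x, y)   # same value from the built-in function

# Change the units of x (for example inches to centimetres)
x_cm <- x * 2.54
cov(x_cm, y)   # 2.54 times larger than cov(x, y)
cor(x_cm, y)   # unchanged
```

Because the correlation is unit-free, it always lies between -1 and 1, which makes values from different pairs of variables comparable.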
Scatterplot Matrix: pairs()
Correlations Matrix
library(car)
scatterplotMatrix(heights)
ggplot2
Box Plot
Children’s Height vs. Gender
boxplot(h$child ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")
Descriptive Stats: Box Plot
69.23 - 64.10 = 5.13
Subset Males
men <- subset(h, gender == 'male')
Subset Females
women <- subset(h, gender == 'female')
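Subsetting by gender and comparing the two groups might look like the sketch below. The data frame toy and its five rows are hypothetical, as is the column name childHeight. Using the group mean as the summary is also an assumption, since the slides do not say which statistic their figures are.

```r
# Hypothetical toy data, not the Galton heights
toy <- data.frame(
  gender      = factor(c("male", "female", "male", "female", "male")),
  childHeight = c(70, 64, 69, 65, 71)
)

# Same pattern as the subsets above, applied to the toy data
men   <- subset(toy, gender == "male")
women <- subset(toy, gender == "female")
diff_subset <- mean(men$childHeight) - mean(women$childHeight)
diff_subset   # 5.5

# Equivalent one-step version: mean of childHeight within each gender level
means <- tapply(toy$childHeight, toy$gender, mean)
diff_tapply <- means[["male"]] - means[["female"]]
```

tapply() avoids creating separate data frames when only a per-group summary is needed; the explicit subsets remain useful for the per-group plots that follow.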
Children’s Height: Males
qqnorm(men$childHeight)
qqline(men$childHeight)
hist(men$childHeight)
Children’s Height: Females
qqnorm(women$child)
qqline(women$child)
hist(women$child)
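What qqnorm() plots can be made explicit with a short sketch. The sample below is simulated (its mean and sd are assumptions, not estimates from the Galton data). The idea is that the sorted observations are paired with the normal quantiles at the probabilities returned by ppoints(); roughly normal data then fall close to a straight line.

```r
set.seed(2)
x <- rnorm(50, mean = 64, sd = 2.4)  # simulated, hypothetical heights

# Ask qqnorm() for the plotted coordinates without drawing the plot
qq <- qqnorm(x, plot.it = FALSE)

# Theoretical quantiles computed by hand
n <- length(x)
theo <- qnorm(ppoints(n))

same_x <- isTRUE(all.equal(sort(qq$x), theo))     # theoretical quantiles match
same_y <- isTRUE(all.equal(sort(qq$y), sort(x)))  # sample quantiles are the data

# Correlation between sorted data and theoretical quantiles
line_fit <- cor(sort(x), theo)
line_fit
```

qqline() then adds a reference line through the first and third quartiles, so departures from normality show up as points bending away from that line.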
ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), child))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))