Exam Feb 28: sets 1,2
• Set 1 due Thurs
• Memo C-1 due Feb 14
• Free tutoring will be available next week
• Plan A: MW 4-6PM OR Plan B: TT 2-4PM
• VOTE for Plan A or Plan B
• Announce results Thurs
Kinderman Supplement
• Ch 2: Multiple Regression
• Ch 3: Analysis of Variance
MULTIPLE REGRESSION
Kinderman, Ch 2
Example
• Reference: Statistics for Managers
• By Levine, David M; Berenson; Stephan
• Second edition (1999)
• Prentice Hall
Y = dependent variable = heating oil sales (gal)
• X1 = Temperature (degrees)
• X2 = Insulation (inches)
• X1 and X2 are independent variables
• Y = bo + b1X1 + b2X2
• Enter data to Excel
• NOTE: If you can’t find Data Analysis, try Add-Ins
Y = 562 - 5X1 - 20X2
Bottom table:
Coefficient Column
Interpret coefficients
• Intercept = bo = 562: If temp = 0 and insulation = 0, estimated heating oil sales = 562 gallons
• b1 = -5: For all homes with the same insulation, each 1 degree increase in temperature should decrease heating oil sales by 5 gallons
• b2 = -20: For all homes with the same temp, each additional 1 inch of insulation should decrease sales by 20 gallons
Categorical Variables
• X = 0 or 1
• Example: 0 if male, 1 if female
• Example: 1 if graduate, 0 if drop out
• Example: 1 if citizen, 0 if alien
• NOTE: not in this fuel oil example
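As a small illustration of the 0/1 coding described above, here is a minimal Python sketch. The helper name `encode_binary` and the sample list are made up for illustration; they do not come from the fuel oil example.

```python
def encode_binary(values, one_label):
    """Return 1 where a value equals one_label, otherwise 0."""
    return [1 if value == one_label else 0 for value in values]

# Made-up illustration data, coded as 0 if male, 1 if female.
sexes = ["male", "female", "female", "male"]
x = encode_binary(sexes, one_label="female")
print(x)  # [0, 1, 1, 0]
```

The resulting 0/1 column can then be used as an X variable like any other.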
Estimate sales if temp = 30, insulation = 6
• Y = 562 - 5(30) - 20(6) = 292 gal
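The estimate above can be checked with a short Python sketch. The coefficients are the ones read from the Excel output in these slides; the function name `predict_sales` is a hypothetical helper, not part of Excel.

```python
# Coefficients from the bottom table of the Excel regression output.
B0 = 562   # intercept, in gallons
B1 = -5    # gallons per degree of temperature
B2 = -20   # gallons per inch of insulation

def predict_sales(temp_degrees, insulation_inches):
    """Estimated heating oil sales (gallons) from Y = bo + b1X1 + b2X2."""
    return B0 + B1 * temp_degrees + B2 * insulation_inches

estimate = predict_sales(30, 6)
print(estimate)  # 292
```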
Standard Error = 26
• Top table
• Interpret: Actual fuel oil sales typically differed by about 26 gal from the average fuel oil sales of other homes with the same temp and insulation
COEFFICIENT OF MULTIPLE DETERMINATION
• Top table, R square
• Interpret: 96% of total variation in fuel oil sales can be explained by variation in temperature and insulation
Is there a relationship between all independent variables and the dependent variable?
• Ho: Null hypothesis: All slope coefficients = 0
• Ho: NO relationship
• H1: Alternative hypothesis: At least one slope coefficient is not zero
• H1: There is a relationship
Computer output: Sample data
• Hypotheses: Population parameters
• Ho: Parameters = 0, but sample data makes it appear that there is a relationship
• Simple regression: Ho: zero slope vs H1: slope positive or slope negative
Exponents
• 10^-1 = 0.1
• 10^-2 = 0.01
Decision Rule
• Reject Ho if “Significance F” < alpha
• Middle table
• Fuel oil example: Significance F = 1.6E-09
• Excel: E = Exponent
• 1.6E-09 = 1.6*10^-9 = 0.0000000016
• Practically zero
Significance F = p-value
• Excel uses the label "P-value" only for tests based on the t distribution
• Significance F = probability, if Ho is true, that F is greater than sample F
Assume alpha = .05
• Since 1.6E-09 (practically 0) < .05, reject Ho
• We conclude there IS a relationship between fuel oil sales and the independent variables
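The decision rule above can be sketched in Python, which reads Excel's E notation directly. The variable names are illustrative only; the numbers are the ones quoted in the slides.

```python
ALPHA = 0.05

# Excel's "1.6E-09" parses as an ordinary floating-point number.
significance_f = float("1.6E-09")

# Written out with 10 decimal places, as on the slide.
written_out = f"{significance_f:.10f}"
print(written_out)  # 0.0000000016

# Decision rule: reject Ho if Significance F < alpha.
reject_h0 = significance_f < ALPHA
print(reject_h0)  # True
```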
Which independent variables seem to be important factors?
• Ho: Temperature not important factor
• H1: Temperature is important
• Reject Ho if p-value < alpha
• Bottom table: p-value column, X1 row
• P-value = 1.6E-09, practically zero
• Reject Ho
• Temp is important
Insulation
• Ho: insulation unimportant
• H1: insulation important
• P-value = 1.9E-06, practically zero
• Reject Ho
• Insulation important
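The same rule applies to each row of the bottom table. A minimal sketch, using the two p-values quoted above and assuming alpha = .05 as before (the dictionary layout and the name `is_important` are illustrative choices):

```python
ALPHA = 0.05

# P-values from the bottom table of the Excel output, one per X variable.
p_values = {
    "Temperature (X1)": 1.6e-09,
    "Insulation (X2)": 1.9e-06,
}

def is_important(p_value, alpha=ALPHA):
    """Reject Ho (variable not important) when p-value < alpha."""
    return p_value < alpha

decisions = {name: is_important(p) for name, p in p_values.items()}
for name, important in decisions.items():
    print(name, "-> reject Ho" if important else "-> do not reject Ho")
```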
Analysis of Variance (ANOVA)
Kinderman, Ch 3
X = number of auto accidents
Live in City    Live in Suburb    Live in Rural
1               2                 1
3               0                 0
2               1                 0
Hypothesis Testing
• Ho: µ1 = µ2 = µ 3
• H1: Not all means are equal
• H1: There are differences among 3 populations
• H1: Average number of accidents different depending on where you live
This course: manual calculations
• If you used computer software, you could have as many populations as needed
• Homework, exam: 3 populations
• Computer: 4 or more populations
• Ex: Ethnic classifications at CSUN
Sample Sizes
• Column 1: n1 = number of drivers sampled from policyholders living in city = 3
• Column 2: n2 = sampled from suburban drivers = 3
• Col 3: n3 = sampled from rural = 3
• Number of rows of data
• Kinderman example: Different sample sizes
n = n1 + n2 + n3
n =3 + 3 + 3 = 9
X = number of auto accidents
Live in City    Live in Suburb    Live in Rural
1 = X11         2                 1
3 = X21         0                 0
2 = X31         1                 0
$\bar{X}_1 = \frac{X_{11} + X_{21} + X_{31}}{n_1} = \frac{1 + 3 + 2}{3}$
Do not assume n1 = 3 on exam
$\bar{X}_1 = 2$
X = number of auto accidents
Live in City     Live in Suburb    Live in Rural
1 = X11          2                 1
3 = X21          0                 0
2 = X31          1                 0
Σ = 6            Σ = 3             Σ = 1
Sample mean = 2  Sample mean = 1   Sample mean = .3
$\bar{X}_2 = 1$
$\bar{X}_3 = .3$
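The sample sizes and sample means from the table can be reproduced with a few lines of Python. This is only a sketch; the dictionary layout and variable names are illustrative choices, while the data are the nine values from the table.

```python
# Number of accidents per sampled driver, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# n1, n2, n3 are the number of rows of data in each column.
sizes = {group: len(xs) for group, xs in accidents.items()}
n = sum(sizes.values())  # n = n1 + n2 + n3

# Sample mean of each column = column sum / column sample size.
means = {group: sum(xs) / len(xs) for group, xs in accidents.items()}

print(sizes, n)
for group, mean in means.items():
    print(group, round(mean, 1))
```

Counting the rows with `len` rather than typing 3 follows the warning not to assume n1 = 3.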
Hypotheses
• Ho: Differences in sample means due to chance, but no differences if ALL drivers were included (Prop 103)
• H1: Population means are different because city drivers have more accidents
$\bar{X}_{..} = \frac{\sum \sum X_{ij}}{n}$
$\bar{X}_{..} = \frac{6 + 3 + 1}{9}$
$\bar{X}_{..} = 1.1$
Grand mean = 1.1
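A quick check of the grand mean in Python, pooling all nine observations (variable names are illustrative):

```python
# Columns of the accident table: city, suburb, rural.
columns = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

# Pool every observation, then divide the total by n.
all_values = [x for column in columns for x in column]
grand_mean = sum(all_values) / len(all_values)

print(round(grand_mean, 1))  # 1.1
```

The unrounded value is 10/9, about 1.11; the slides round it to 1.1.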
SSB = Sum of Squares Between
• Between 3 groups
• Explained Variation
• Here: Variation in number of accidents explained by where you live (city, suburb, rural)
• If where you live did not affect accidents, we would expect SSB = 0
• Next slide: SSB formula
$SSB = n_1(\bar{X}_1 - \bar{X}_{..})^2 + n_2(\bar{X}_2 - \bar{X}_{..})^2 + n_3(\bar{X}_3 - \bar{X}_{..})^2$
This example
• SSB = 3(2-1.1)^2 + 3(1-1.1)^2 + 3(.3-1.1)^2 = 4.2 (computed with the unrounded means 1.11 and .33)
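The SSB calculation can be sketched in Python as follows. It keeps the means unrounded, which is what gives 4.2; plugging in the rounded values 1.1 and .3 by hand gives about 4.4 instead. Variable names are illustrative.

```python
# Columns of the accident table: city, suburb, rural.
columns = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

n = sum(len(column) for column in columns)
grand_mean = sum(sum(column) for column in columns) / n

# SSB: for each group, sample size times squared distance
# between the group mean and the grand mean.
ssb = sum(
    len(column) * (sum(column) / len(column) - grand_mean) ** 2
    for column in columns
)

print(round(ssb, 1))  # 4.2
```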
MSB = Mean Square Between
• MSB = SSB/2
• Note: OK for this course (3 groups), but bigger problems would have bigger denominator (number of groups - 1)
• MSB = 4.2/2 = 2.1
SSE= Sum of Squared Error
• Variation within group
• Ex: Variation within group of city drivers
• Unexplained variation
• If every city driver had same number of accidents, we would expect SSE = 0
• Formula on next slide
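$SSE = \sum_{j=1}^{3} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
$SSE = \sum_{i} (X_{i1} - \bar{X}_1)^2 + \sum_{i} (X_{i2} - \bar{X}_2)^2 + \sum_{i} (X_{i3} - \bar{X}_3)^2$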
SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2
= 4.67
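The same sum can be sketched in Python: each observation is compared with the mean of its own column. Variable names are illustrative.

```python
# Columns of the accident table: city, suburb, rural.
columns = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

# SSE: squared distance of every observation from its own group mean.
sse = sum(
    (x - sum(column) / len(column)) ** 2
    for column in columns
    for x in column
)

print(round(sse, 2))  # 4.67
```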
MSE = Mean Square Error
Mean Square Within
Next slide is formula for this course.
Bigger problems have bigger denominator
$MSE = \frac{SSE}{n - 3}$
$MSE = \frac{4.67}{9 - 3}$
MSE = 0.78
F RATIO
• Sample F statistic
• Test statistic
• SAM F
$\text{sam } F = \frac{MSB}{MSE}$
$\text{sam } F = \frac{2.1}{.78}$
Sam F = 2.7
• Extreme case#1: Where you live does not affect number of accidents, so SSB =0, so MSB = 0, so sam F = 0
• Extreme case #2: Every city driver has same number of accidents, etc, so SSE = 0, so MSE = 0, so sam F is very large
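Putting the pieces together, here is a minimal Python sketch of the whole manual calculation for this example. It uses the course's denominators (2 and n - 3), which assume exactly 3 groups. Variable names are illustrative.

```python
# Columns of the accident table: city, suburb, rural.
columns = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]

n = sum(len(column) for column in columns)
grand_mean = sum(sum(column) for column in columns) / n

# Explained variation (between groups) and unexplained variation (within groups).
ssb = sum(
    len(column) * (sum(column) / len(column) - grand_mean) ** 2
    for column in columns
)
sse = sum(
    (x - sum(column) / len(column)) ** 2
    for column in columns
    for x in column
)

msb = ssb / 2        # denominator for this course (3 groups)
mse = sse / (n - 3)  # denominator for this course (3 groups)
sam_f = msb / mse

print(round(msb, 1), round(mse, 2), round(sam_f, 1))  # 2.1 0.78 2.7
```

Without intermediate rounding the ratio is about 2.71, while 2.1/.78 is about 2.69; both round to the 2.7 shown above.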
Critical F = cr F
• F table at end of Kinderman Supplement
• Appendix A, Table A.3, p 60 in Second Edition (assumes alpha = .05)
• Column = 2 (denominator of MSB)
• Row = n – 3 (denominator of MSE)
• Correct for this course, different for bigger problems
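For reference, the values 2 and n - 3 used in this course are the 3-group case of the standard one-way ANOVA degrees of freedom. Writing k for the number of groups (a symbol not used in these slides), the general pattern is:

```latex
\[
MSB = \frac{SSB}{k - 1}, \qquad
MSE = \frac{SSE}{n - k}, \qquad
\text{column} = k - 1, \qquad
\text{row} = n - k
\]
```

Setting k = 3 gives column = 2 and row = n - 3, as above.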
Example
• Col 2
• Row = 9-3 = 6
• Cr F = 5.14
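The table value can be checked without the table. When the column is 2 (as it always is in this course), the right-tail area of the F distribution has a simple closed form, so the critical value can be solved for directly. The sketch below relies on that special case and does not work for other columns; the function name is hypothetical.

```python
def critical_f_column_2(alpha, row_df):
    """Critical F for column = 2 and the given row.

    For column = 2, P(F > f) = (1 + 2*f/row_df) ** (-row_df/2).
    Setting that equal to alpha and solving for f gives the line below.
    """
    return (row_df / 2) * (alpha ** (-2 / row_df) - 1)

cr_f = critical_f_column_2(alpha=0.05, row_df=9 - 3)
print(round(cr_f, 2))  # 5.14
```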
Hypothesis Testing
• Ho: µ1 = µ2 = µ 3
• H1: Not all means are equal
• H1: There are differences among 3 populations
• H1: Average number of accidents different depending on where you live
Decision Rule
• Reject Ho if sam F > cr F
• Only right tail since SSB>0, SSE>0, so sam F>0
• If you reject Ho, you conclude that where you live affects number of accidents
• If you do not reject Ho, you conclude that there is too much variation within city drivers, etc to draw any conclusions
Example
• Since 2.7 is NOT > 5.14, we can NOT reject Ho
• Differences between city and suburb, etc are NOT significant
Computer Approach
• Similar to multiple regression
• Reject Ho if Significance F < alpha
• Needed if more than 3 groups
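To see that the computer approach agrees with the manual one for the accident example, Significance F can be sketched in Python. With exactly 3 groups the right-tail probability has a closed form; with more than 3 groups this shortcut does not apply and software such as Excel is needed. The function name is hypothetical, and alpha = .05 is assumed as in the F table.

```python
ALPHA = 0.05

def significance_f_column_2(sam_f, row_df):
    """P(F > sam_f) when column = 2 (3 groups) and row = row_df."""
    return (1 + 2 * sam_f / row_df) ** (-row_df / 2)

# Accident example: sam F = 2.7, row = n - 3 = 6.
sig_f = significance_f_column_2(sam_f=2.7, row_df=9 - 3)
reject_h0 = sig_f < ALPHA

print(round(sig_f, 3))  # 0.146
print(reject_h0)        # False
```

Since Significance F is well above .05, Ho is not rejected, which matches the comparison of sam F = 2.7 with cr F = 5.14.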