Exam Feb 28: sets 1,2


Page 1: Exam Feb 28: sets 1,2

Exam Feb 28: sets 1,2

• Set 1 due Thurs

• Memo C-1 due Feb 14

• Free tutoring will be available next week

• Plan A: MW 4-6PM, OR Plan B: TT 2-4PM

• VOTE for Plan A or Plan B

• Announce results Thurs

Page 2: Exam Feb 28: sets 1,2

Kinderman Supplement

• Ch 2: Multiple Regression

• Ch 3: Analysis of Variance

Page 3: Exam Feb 28: sets 1,2

MULTIPLE REGRESSION

Kinderman, Ch 2

Page 4: Exam Feb 28: sets 1,2

Example

• Reference: Statistics for Managers

• By David M. Levine, Mark L. Berenson, and David Stephan

• Second edition (1999)

• Prentice Hall

Page 5: Exam Feb 28: sets 1,2

Y = dependent variable = heating oil sales (gal)

• X1 = Temperature (degrees)

• X2 = Insulation (inches)

• X1 and X2 are independent variables

• Y = b0 + b1X1 + b2X2

• Enter data to Excel

• NOTE: If you can’t find Data Analysis, try Add-Ins

Page 6: Exam Feb 28: sets 1,2

Y = 562 – 5X1 – 20X2

• Bottom table: Coefficients column

Page 7: Exam Feb 28: sets 1,2

Interpret coefficients

• Intercept = b0 = 562: If temp = 0 and insulation = 0, estimated heating oil sales = 562 gallons

• b1 = -5: For all homes with the same insulation, each 1 degree increase in temperature is estimated to decrease heating oil sales by 5 gallons

• b2 = -20: For all months with the same temp, each additional 1 inch of insulation is estimated to decrease sales by 20 gallons

Page 8: Exam Feb 28: sets 1,2

Categorical Variables

• X = 0 or 1

• Example: 0 if male, 1 if female

• Example: 1 if graduate, 0 if drop out

• Example: 1 if citizen, 0 if alien

• NOTE: not in this fuel oil example
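
As a minimal sketch of the 0/1 coding described above, the following Python snippet turns category labels into a dummy variable. The helper name `to_dummy` and the sample labels are made up for illustration; they are not part of the course material.

```python
def to_dummy(values, one_label):
    """Code each value as 1 if it equals one_label, otherwise 0 (hypothetical helper)."""
    return [1 if v == one_label else 0 for v in values]

# Made-up illustrative labels, following the "1 if graduate, 0 if drop out" example.
status = ["graduate", "drop out", "graduate"]
print(to_dummy(status, "graduate"))  # [1, 0, 1]
```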

Page 9: Exam Feb 28: sets 1,2

Estimate sales if temp = 30, insulation = 6

• Y = 562 -5(30) – 20(6) = 292 gal
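
The estimate above can be reproduced with a short Python sketch. The function name `predict_sales` is hypothetical, and the coefficients are the rounded values shown on the slide (562, -5, -20), so results are only as precise as that rounding.

```python
# Rounded coefficients from the fitted equation Y = 562 - 5*X1 - 20*X2.
B0, B1, B2 = 562, -5, -20

def predict_sales(temp_degrees, insulation_inches):
    """Return estimated heating oil sales in gallons (hypothetical helper)."""
    return B0 + B1 * temp_degrees + B2 * insulation_inches

print(predict_sales(30, 6))  # 292
```

Setting both inputs to 0 returns the intercept, 562, which matches the interpretation of b0.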

Page 10: Exam Feb 28: sets 1,2

Standard Error = 26

• Top table

• Interpret: Typical fuel oil sales were about 26 gal away from average fuel oil sales of other homes with same temp and insulation

Page 11: Exam Feb 28: sets 1,2

COEFFICIENT OF MULTIPLE DETERMINATION

• Top table, R square

• Interpret: 96% of total variation in fuel oil sales can be explained by variation in temperature and insulation

Page 12: Exam Feb 28: sets 1,2

Is there a relationship between all independent variables and dependent variables?

• Ho: Null hypothesis: All coefficients = 0

• Ho: NO relationship

• H1: Alternative hypothesis: At least one coefficient is not zero

• H1: There is a relationship

Page 13: Exam Feb 28: sets 1,2

Computer output: Sample data

• Hypotheses: Population parameters

• Ho: Parameters = 0, but sample data makes it appear that there is a relationship

• Simple regression: Ho: zero slope vs H1: slope positive or slope negative

Page 14: Exam Feb 28: sets 1,2

Exponents

• 10^-1 = 0.1

• 10^-2 = 0.01

Page 15: Exam Feb 28: sets 1,2

Decision Rule

• Reject Ho if “Significance F” < alpha

• Middle table

• Fuel oil example: Significance F = 1.6E-09

• Excel: E = Exponent

• 1.6E-09 = 1.6*10^-9 = 0.0000000016

• Effectively zero
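
To check the E notation outside Excel, here is a small Python sketch. It assumes only that Python parses the same notation as Excel displays, which holds for this value.

```python
import math

sig_f = float("1.6E-09")   # Excel-style E notation parses directly as a float
print(f"{sig_f:.10f}")     # written out with 10 decimal places: 0.0000000016
print(math.isclose(sig_f, 1.6 * 10**-9))  # True: same number as 1.6 * 10^-9
```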

Page 16: Exam Feb 28: sets 1,2

Significance F=p-value

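• Excel uses the label “P-value” only for the t tests (bottom table); for the F test it labels the same kind of quantity “Significance F”

• Significance F = probability, if Ho is true, that F is greater than Sample F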

Page 17: Exam Feb 28: sets 1,2

Assume alpha = .05

• Since 1.6E-09 (effectively 0) < .05, reject Ho

• We conclude there IS a relationship between fuel oil sales and the independent variables
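
The decision rule for the fuel oil example can be sketched in Python as follows. The helper name `reject_h0` is hypothetical; the Significance F value and alpha are the ones given on the slides.

```python
ALPHA = 0.05  # significance level assumed on the slide

def reject_h0(p_value, alpha=ALPHA):
    """Return True when the decision rule says to reject Ho (hypothetical helper)."""
    return p_value < alpha

significance_f = 1.6e-09   # from the middle table of the Excel output
print(reject_h0(significance_f))  # True: conclude there IS a relationship
```

The same comparison applies to the p-value of an individual coefficient in the bottom table.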

Page 18: Exam Feb 28: sets 1,2

Which independent variables seem to be important factors?

• Ho: Temperature not important factor

• H1: Temperature is important

• Reject Ho if p-value < alpha

• Bottom table: p-value column, X1 row

• P-value = 1.6E-09, effectively zero

• Reject Ho

• Temp is important

Page 19: Exam Feb 28: sets 1,2

Insulation

• Ho: insulation unimportant

• H1: insulation important

• P-value = 1.9E-06, effectively zero

• Reject Ho

• Insulation important

Page 20: Exam Feb 28: sets 1,2

Analysis of Variance (ANOVA)

Kinderman, Ch 3

Page 21: Exam Feb 28: sets 1,2

X = number of auto accidents

Live in City | Live in Suburb | Live in Rural
1 | 2 | 1
3 | 0 | 0
2 | 1 | 0

Page 22: Exam Feb 28: sets 1,2

Hypothesis Testing

• Ho: µ1 = µ2 = µ3

• H1: Not all means are equal

• H1: There are differences among 3 populations

• H1: Average number of accidents different depending on where you live

Page 23: Exam Feb 28: sets 1,2

This course: manual calculations

• If you used computer software, you could have as many populations as needed

• Homework, exam: 3 populations

• Computer: 4 or more populations

• Ex: Ethnic classifications at CSUN

Page 24: Exam Feb 28: sets 1,2

Sample Sizes

• Column 1: n1 = number of drivers sampled from policyholders living in city = 3

• Column 2: n2 = sampled from suburban drivers = 3

• Col 3: n3 = sampled from rural = 3

• Number of rows of data

• Kinderman example: Different sample sizes

Page 25: Exam Feb 28: sets 1,2

n = n1 + n2 + n3

n =3 + 3 + 3 = 9

Page 26: Exam Feb 28: sets 1,2

X = number of auto accidents

Live in City | Live in Suburb | Live in Rural
1 = X11 | 2 | 1
3 = X21 | 0 | 0
2 = X31 | 1 | 0

Page 27: Exam Feb 28: sets 1,2

$\bar{X}_1 = \dfrac{X_{11} + X_{21} + X_{31}}{n_1}$

Page 28: Exam Feb 28: sets 1,2

$\bar{X}_1 = \dfrac{1 + 3 + 2}{3}$

Page 29: Exam Feb 28: sets 1,2

Do not assume n1=3 on exam

Page 30: Exam Feb 28: sets 1,2

$\bar{X}_1 = 2$

Page 31: Exam Feb 28: sets 1,2

X = number of auto accidents

Live in City | Live in Suburb | Live in Rural
1 = X11 | 2 | 1
3 = X21 | 0 | 0
2 = X31 | 1 | 0
Σ = 6 | Σ = 3 | Σ = 1
Sample mean = 2 | Sample mean = 1 | Sample mean = .3

Page 32: Exam Feb 28: sets 1,2

$\bar{X}_2 = 1$

Page 33: Exam Feb 28: sets 1,2

$\bar{X}_3 = .3$
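
The three sample means can be checked with a short Python sketch. The dictionary layout and variable names are assumptions made for illustration; the data are the accident counts from the table.

```python
# Accident counts from the slide, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# Sample mean for each group: column total divided by that column's own n.
means = {}
for group, xs in accidents.items():
    n_j = len(xs)  # counted from the data, not assumed to be 3
    means[group] = sum(xs) / n_j
    print(group, "n =", n_j, "sum =", sum(xs), "mean =", round(means[group], 2))
```

Counting each group's size from the data keeps the calculation correct when the sample sizes differ.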

Page 34: Exam Feb 28: sets 1,2

Hypotheses

• Ho: Differences in sample means due to chance, but no differences if ALL drivers were included (Prop 103)

• H1: Population means are different because city drivers have more accidents

Page 35: Exam Feb 28: sets 1,2

$\bar{X}_{..} = \dfrac{\sum_{j}\sum_{i} X_{ij}}{n}$

Page 36: Exam Feb 28: sets 1,2

$\bar{X}_{..} = \dfrac{6 + 3 + 1}{9}$

Page 37: Exam Feb 28: sets 1,2

$\bar{X}_{..} = 1.1$

Page 38: Exam Feb 28: sets 1,2

Grand mean = 1.1
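
A minimal Python sketch of the grand mean, pooling all nine observations. Variable names are assumptions for illustration.

```python
# Accident counts from the slide, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

all_values = [x for xs in accidents.values() for x in xs]
n = len(all_values)                # n = n1 + n2 + n3
grand_mean = sum(all_values) / n   # total of all observations divided by n
print(n, round(grand_mean, 1))     # 9 1.1
```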

Page 39: Exam Feb 28: sets 1,2

SSB = Sum of Squares Between

• Between 3 groups

• Explained Variation

• Here: Variation in number of accidents explained by where you live (city, suburb, rural)

• If where you live did not affect accidents, we would expect SSB = 0

• Next slide: SSB formula

Page 40: Exam Feb 28: sets 1,2

$SSB = n_1(\bar{X}_1 - \bar{X}_{..})^2 + n_2(\bar{X}_2 - \bar{X}_{..})^2 + n_3(\bar{X}_3 - \bar{X}_{..})^2$

Page 41: Exam Feb 28: sets 1,2

X = number of auto accidents

Live in City | Live in Suburb | Live in Rural
1 = X11 | 2 | 1
3 = X21 | 0 | 0
2 = X31 | 1 | 0
Σ = 6 | Σ = 3 | Σ = 1
Sample mean = 2 | Sample mean = 1 | Sample mean = .3

Page 42: Exam Feb 28: sets 1,2

This example

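• $SSB = 3(2 - 1.1)^2 + 3(1 - 1.1)^2 + 3(.3 - 1.1)^2 \approx 4.2$ (the 4.2 comes from carrying the unrounded means 1/3 and 10/9; the rounded values shown give about 4.4)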

Page 43: Exam Feb 28: sets 1,2

MSB = Mean Square Between

• MSB = SSB/2

• Note: The denominator 2 is correct for this course (3 groups); in general it is the number of groups minus 1, so bigger problems have a bigger denominator

• MSB = 4.2/2 = 2.1
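
SSB and MSB for this example can be reproduced with the following Python sketch. The helper `mean` and the variable names are assumptions for illustration. The code carries unrounded means, which is why it lands on 4.2.

```python
# Accident counts from the slide, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

grand_mean = mean([x for xs in accidents.values() for x in xs])

# SSB: for each group, n_j * (group mean - grand mean)^2, summed over groups.
ssb = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in accidents.values())

k = len(accidents)    # number of groups (3 here)
msb = ssb / (k - 1)   # denominator is 2 when there are 3 groups
print(round(ssb, 1), round(msb, 1))  # 4.2 2.1
```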

Page 44: Exam Feb 28: sets 1,2

SSE= Sum of Squared Error

• Variation within group

• Ex: Variation within group of city drivers

• Unexplained variation

• If every city driver had same number of accidents, we would expect SSE = 0

• Formula on next slide

Page 45: Exam Feb 28: sets 1,2

$SSE = \sum_{j=1}^{3} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

Page 46: Exam Feb 28: sets 1,2

$SSE = \sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2 + \sum_{i=1}^{n_3} (X_{i3} - \bar{X}_3)^2$

Page 47: Exam Feb 28: sets 1,2

X = number of auto accidents

Live in City | Live in Suburb | Live in Rural
1 = X11 | 2 | 1
3 = X21 | 0 | 0
2 = X31 | 1 | 0
Σ = 6 | Σ = 3 | Σ = 1
Sample mean = 2 | Sample mean = 1 | Sample mean = .3

Page 48: Exam Feb 28: sets 1,2

$SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2 = 4.67$
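
A minimal Python sketch of the SSE calculation: each observation is compared with its own group's mean. The helper `mean` and the variable names are assumptions for illustration.

```python
# Accident counts from the slide, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

# SSE: squared distance of every observation from its own group mean.
sse = sum((x - mean(xs)) ** 2 for xs in accidents.values() for x in xs)
print(round(sse, 2))  # 4.67
```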

Page 49: Exam Feb 28: sets 1,2

MSE = Mean Square Error

Mean Square Within

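• Next slide is the formula for this course (3 groups), with denominator n – 3

• In general the denominator is n minus the number of groups, so bigger problems have a different denominator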

Page 50: Exam Feb 28: sets 1,2

$MSE = \dfrac{SSE}{n - 3}$

Page 51: Exam Feb 28: sets 1,2

$MSE = \dfrac{4.67}{9 - 3}$

Page 52: Exam Feb 28: sets 1,2

MSE = 0.78
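
MSE can be sketched in Python as SSE divided by n minus the number of groups, which reduces to n - 3 for this course. The helper `mean` and the variable names are assumptions for illustration.

```python
# Accident counts from the slide, one list per column of the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

sse = sum((x - mean(xs)) ** 2 for xs in accidents.values() for x in xs)

n = sum(len(xs) for xs in accidents.values())  # 9 observations in total
k = len(accidents)                             # 3 groups
mse = sse / (n - k)                            # 4.67 / (9 - 3)
print(round(mse, 2))  # 0.78
```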

Page 53: Exam Feb 28: sets 1,2

F RATIO

• Sample F statistic

• Test statistic

• SAM F

Page 54: Exam Feb 28: sets 1,2

$\text{sam } F = \dfrac{MSB}{MSE}$

Page 55: Exam Feb 28: sets 1,2

$\text{sam } F = \dfrac{2.1}{.78}$

Page 56: Exam Feb 28: sets 1,2

Sam F = 2.7

• Extreme case#1: Where you live does not affect number of accidents, so SSB =0, so MSB = 0, so sam F = 0

• Extreme case #2: Every city driver has same number of accidents, etc, so SSE = 0, so MSE = 0, so sam F is very large
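
The whole calculation of sam F, including the two extreme cases, can be sketched in Python as below. The function name `sample_f` is hypothetical, and the data sets used for the extreme cases are made up purely to trigger SSB = 0 and SSE = 0.

```python
import math

def sample_f(groups):
    """Return sam F = MSB / MSE for a list of groups (hypothetical helper)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    if mse == 0:
        # No within-group variation (assuming SSB > 0): sam F grows without bound.
        return math.inf
    return msb / mse

print(round(sample_f([[1, 3, 2], [2, 0, 1], [1, 0, 0]]), 1))  # 2.7 (slide data)
print(sample_f([[1, 3], [2, 2], [0, 4]]))  # 0.0 (made-up data, equal group means)
print(sample_f([[2, 2], [1, 1], [0, 0]]))  # inf (made-up data, no within-group spread)
```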

Page 57: Exam Feb 28: sets 1,2

Critical F = cr F

• F table at end of Kinderman Supplement

• Appendix A, Table A.3, p 60 in Second Edition (assumes alpha = .05)

• Column = 2 (denominator of MSB)

• Row = n – 3 (denominator of MSE)

• Correct for this course, different for bigger problems

Page 58: Exam Feb 28: sets 1,2

Example

• Col 2

• Row = 9-3 = 6

• Cr F = 5.14
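
The table value can be cross-checked without the table. This is not the course's method: it is a sketch that relies on the fact that, when the column (numerator degrees of freedom) is 2, the F distribution's right-tail probability has the closed form (1 + 2F/d)^(-d/2), where d is the row (n - 3). The function name is hypothetical, and the shortcut does not apply to problems with more than 3 groups.

```python
def critical_f_three_groups(n, alpha=0.05):
    """Critical F for 3 groups: column 2, row n - 3 (hypothetical helper).

    Solves alpha = (1 + 2*F/d) ** (-d/2) for F, valid only when the
    numerator degrees of freedom equal 2.
    """
    d = n - 3
    return (d / 2) * (alpha ** (-2 / d) - 1)

print(round(critical_f_three_groups(9), 2))  # 5.14
```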

Page 59: Exam Feb 28: sets 1,2

Hypothesis Testing

• Ho: µ1 = µ2 = µ3

• H1: Not all means are equal

• H1: There are differences among 3 populations

• H1: Average number of accidents different depending on where you live

Page 60: Exam Feb 28: sets 1,2

Decision Rule

• Reject Ho if sam F > cr F

• Only right tail since SSB>0, SSE>0, so sam F>0

• If you reject Ho, you conclude that where you live affects number of accidents

• If you do not reject Ho, you conclude that there is too much variation within city drivers, etc to draw any conclusions

Page 61: Exam Feb 28: sets 1,2

Example

• Since 2.7 is NOT > 5.14, we can NOT reject Ho

• Differences between city and suburb, etc are NOT significant

Page 62: Exam Feb 28: sets 1,2

Computer Approach

• Similar to multiple regression

• Reject Ho if Significance F < alpha

• Needed if more than 3 groups
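
For the accident example, a Significance F can be approximated by hand-written code rather than software output. The sketch below is an assumption-laden shortcut: it uses the closed-form right-tail probability (1 + 2F/d)^(-d/2), which holds only when there are exactly 3 groups (numerator degrees of freedom = 2). With more than 3 groups a statistics package would be needed. The function name is hypothetical.

```python
def significance_f_three_groups(sam_f, n):
    """P(F > sam_f) for 3 groups and n total observations (hypothetical helper)."""
    d = n - 3
    return (1 + 2 * sam_f / d) ** (-d / 2)

ALPHA = 0.05
sig_f = significance_f_three_groups(2.7, 9)  # sam F and n from the example
print(round(sig_f, 3))   # roughly 0.146
print(sig_f < ALPHA)     # False: do not reject Ho, same decision as 2.7 vs 5.14
```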