Power 16

1


2

Review Post-Midterm Cumulative

3

Projects

4

Logistics

Put the PowerPoint slide show on a high-density floppy disk, or e-mail it as an attachment, for a WINTEL machine.

E-mail the slide show to [email protected] as a PowerPoint attachment.

5

Assignments
1. Project choice
2. Data Retrieval
3. Statistical Analysis
4. PowerPoint Presentation
5. Executive Summary
6. Technical Appendix
7. Graphics

Power_13

6

PowerPoint Presentations: Member 4
1. Introduction: Members 1, 2, 3
   What
   Why
   How
2. Executive Summary: Member 5
3. Exploratory Data Analysis: Member 3
4. Descriptive Statistics: Member 3
5. Statistical Analysis: Member 3
6. Conclusions: Members 3 & 5
7. Technical Appendix: Table of Contents, Member 6

7

Executive Summary and Technical Appendix

8

I. Your report should have an executive summary of one to one and a half pages that summarizes your findings in words for a non-technical reader. It should explain the problem being examined from an economic perspective, i.e., it should motivate the reader's interest in the issue. It should explain, in simple language, how you are investigating the issue and why you are approaching the problem in this particular fashion. It should also explain the economic importance of your findings.

You can attach the technical details of your findings as an appendix.

9

Technical Appendix
Table of Contents
Spreadsheet of data used and sources or, if extensive, a subsample of the data
Descriptive statistics and histograms for the variables in the study
If time series data, a plot of each variable against time
If relevant, a plot of the dependent variable vs. each of the explanatory variables

10

Technical Appendix (Cont.)
Statistical results, for example a regression
Plot of the actual, fitted, and error series, and other diagnostics
Brief summary of the conclusions and meanings drawn from the exploratory, descriptive, and statistical analysis

11

Post-Midterm Review
Project I: Power 16
Contingency Table Analysis: Power 14, Lab 8
ANOVA: Power 15, Lab 9
Survival Analysis: Power 12, Power 11, Lab 7
Multivariate Regression: Power 11, Lab 6

12

Slide Show Challenger disaster

13

Project I
Number of o-rings failing on launch i: y_i(#) = a + b*temp_i + e_i
  Biased because of the zeros, even if the equation is divided by 6
Two ways to proceed
  Tobit, non-linear estimation: y_i(#) = a + b*temp_i + e_i
  Bernoulli variable: probability models
    Probability models: y_i(0,1) = a + b*temp_i + e_i

14

Project I (Cont.)
Probability models: y_i(0,1) = a + b*temp_i + e_i
  OLS, linear probability model (LPM): linear approximation to the sigmoid
  Probit: non-linear estimate of the sigmoid
  Logit: non-linear estimate of the sigmoid
Significant dependence on temperature
  t-test (or z-test) on the slope, H0: b = 0
  F-test
  Wald test
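A minimal sketch of the linear probability model and the t-test on its slope, using the 24 launches listed in the data slides of this lecture. The helper name `ols_line` and the variable names are illustrative assumptions, not part of the course materials, and the standard error is the ordinary homoskedastic OLS one.

```python
# Launch data transcribed from the data slides: (failed o-rings, temperature in F).
launches = [
    (3, 53), (1, 57), (1, 58), (1, 63), (0, 66), (0, 67), (0, 67), (0, 67),
    (0, 68), (0, 69), (1, 70), (1, 70), (0, 70), (0, 70), (0, 72), (0, 73),
    (2, 75), (0, 75), (0, 76), (0, 76), (0, 78), (0, 79), (0, 80), (0, 81),
]


def ols_line(x, y):
    """Return intercept, slope, and slope t-statistic for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = (ssr / (n - 2) / sxx) ** 0.5  # homoskedastic OLS standard error
    return a, b, b / se_b


temp = [t for _, t in launches]
fail01 = [1 if k > 0 else 0 for k, _ in launches]  # Bernoulli variable y_i(0,1)

lpm_a, lpm_b, lpm_t = ols_line(temp, fail01)
lpm_at_31 = lpm_a + lpm_b * 31  # linear extrapolation to 31 F

print(f"a = {lpm_a:.3f}, b = {lpm_b:.4f}, t = {lpm_t:.2f}")
print(f"LPM fitted value at 31 F = {lpm_at_31:.2f}")
```

The fitted value at 31 F is above 1, which illustrates why the LPM is only a linear approximation to the sigmoid and why probit or logit are the usual alternatives.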

15

Project I (Cont.)
Plots of number or probability vs. temperature
  Label the axes
Answer all parts, a-f
The most frequent sins
  Did not explicitly address significance
  Did not answer b (66 F): all launches at lower temperatures had one or more o-ring failures
  Did not execute c: estimate the linear probability model

16

Challenger Disaster
Failure of the o-rings that sealed grooves on the booster rockets
Was there any relationship between o-ring failure and temperature?
Engineers knew that the rubber o-rings hardened and were less flexible at low temperatures
But were there launch data that showed a problem?

17

Challenger Disaster

What: Was there a relationship between launch temperature and o-ring failure prior to the Challenger disaster?
Why: Should the launch have proceeded?
How: Analyze the relationship between launch temperature and o-ring failure

18

Launches Before Challenger
Data
  number of o-rings that failed
  launch temperature

19

o-rings  temperature
3        53
1        57
1        58
1        63
0        66
0        67
0        67
0        67
0        68
0        69
1        70

20

o-rings  temperature
1        70
0        70
0        70
0        72
0        73
2        75
0        75
0        76
0        76
0        78
0        79

21

o-rings  temperature
0        80
0        81

22

Exploratory Analysis
Launches where there was a problem

23

o-rings  temperature
1        58
1        57
1        70
1        63
1        70
2        75
3        53

[Figure: scatter plot of ORINGS (vertical axis, 0.5 to 3.5) versus TEMP (horizontal axis, 50 to 80) for the launches with o-ring failures]

25

Exploratory Analysis
All Launches
Plot of failures per observation versus temperature range shows temperature dependence
Mean temperature for the 7 launches with o-ring failures was lower, 63.7, than for the 17 launches without o-ring failures, 72.6
Contingency table analysis
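The two mean temperatures quoted above can be checked directly from the launch data on the earlier slides. This is a small sketch; the variable names are illustrative.

```python
# Launch data transcribed from the data slides: (failed o-rings, temperature in F).
launches = [
    (3, 53), (1, 57), (1, 58), (1, 63), (0, 66), (0, 67), (0, 67), (0, 67),
    (0, 68), (0, 69), (1, 70), (1, 70), (0, 70), (0, 70), (0, 72), (0, 73),
    (2, 75), (0, 75), (0, 76), (0, 76), (0, 78), (0, 79), (0, 80), (0, 81),
]

# Split the launch temperatures by whether any o-ring failed.
temps_fail = [t for k, t in launches if k > 0]
temps_ok = [t for k, t in launches if k == 0]

mean_fail = sum(temps_fail) / len(temps_fail)
mean_ok = sum(temps_ok) / len(temps_ok)

print(len(temps_fail), round(mean_fail, 1))  # launches with failures, mean temperature
print(len(temps_ok), round(mean_ok, 1))      # launches without failures, mean temperature
```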

26

Launches and O-Ring Failures (Yes/No)

Temperature  Fail: Yes  Fail: No  Total
53-62 F      3          0         3
63-71 F      3          8         11
72-81 F      1          9         10
Total        7          17        24

27

Launches and O-Ring Failures (Yes/No), Expected/Observed

Temperature  Fail: Yes  Fail: No  Total
53-62 F      0.875/3    2.125/0   3
63-71 F      3.208/3    7.792/8   11
72-81 F      2.917/1    7.083/9   10
Total        7          17        24

28

Launches and O-Ring Failures: Contributions to Chi-Square
Chi-square (2 degrees of freedom) = 9.08; critical value (alpha = 0.05) = 5.99, about 6

Temperature     Fail: Yes  Fail: No  Total launches
53-62 F         5.16       2.125     3
63-71 F         0.013      0.005     11
72-81 F         1.26       0.519     10
Total launches  7          17        24
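A short sketch of the contingency table calculation: expected counts are (row total * column total)/n, and each cell contributes (observed - expected)^2/expected to the chi-square statistic. The variable names are illustrative. The p-value and critical value use the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2), so this shortcut applies only to the 2-degree-of-freedom case.

```python
import math

# Observed launches by temperature range: (Fail: Yes, Fail: No).
observed = {
    "53-62 F": (3, 0),
    "63-71 F": (3, 8),
    "72-81 F": (1, 9),
}

row_totals = {label: sum(cells) for label, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
n = sum(row_totals.values())

chi_square = 0.0
for label, cells in observed.items():
    for j, obs in enumerate(cells):
        expected = row_totals[label] * col_totals[j] / n
        contribution = (obs - expected) ** 2 / expected
        chi_square += contribution
        print(f"{label} col {j}: expected {expected:.3f}, contribution {contribution:.3f}")

dof = (len(observed) - 1) * (2 - 1)       # (rows - 1) * (columns - 1) = 2
critical_5pct = -2.0 * math.log(0.05)     # valid for 2 degrees of freedom only
p_value = math.exp(-chi_square / 2.0)     # valid for 2 degrees of freedom only

print(f"chi-square = {chi_square:.2f}, critical = {critical_5pct:.2f}, p = {p_value:.4f}")
```

Because the statistic exceeds the 5% critical value, the hypothesis that failure is independent of temperature range is rejected at that level. Several expected counts are below 5, so the chi-square approximation is rough here.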

[Figure: Number of O-ring Failures Vs. Temperature; ORINGS (vertical axis, 0 to 4) versus TEMP (horizontal axis, 50 to 90)]

30

Probability Models

[Figure: probability (vertical axis, -0.2 to 1) versus temperature (horizontal axis, 30 to 90); series: Bernoulli, LPM Fitted, Probit Fitted]

Logit extrapolated to 31 F:
Probit extrapolated to 31 F:
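The slide's extrapolated values did not survive in this transcript. As a hedged illustration only, the sketch below fits a logit by Newton-Raphson to the launch data from the data slides and extrapolates to 31 F. The function name `fit_logit`, the centering of temperature at 70 F, and the fixed iteration count are assumptions made for this example; the result is not claimed to match the number on the original slide.

```python
import math

# Launch data transcribed from the data slides: (failed o-rings, temperature in F).
launches = [
    (3, 53), (1, 57), (1, 58), (1, 63), (0, 66), (0, 67), (0, 67), (0, 67),
    (0, 68), (0, 69), (1, 70), (1, 70), (0, 70), (0, 70), (0, 72), (0, 73),
    (2, 75), (0, 75), (0, 76), (0, 76), (0, 78), (0, 79), (0, 80), (0, 81),
]

CENTER = 70.0  # assumption: center temperature to keep the Newton steps well scaled
x = [t - CENTER for _, t in launches]
y = [1.0 if k > 0 else 0.0 for k, _ in launches]


def fit_logit(x, y, iterations=25):
    """Newton-Raphson for P(y = 1) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), 2 x 2.
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2 x 2 system explicitly.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


logit_a, logit_b = fit_logit(x, y)
p_31 = 1.0 / (1.0 + math.exp(-(logit_a + logit_b * (31.0 - CENTER))))

print(f"slope per degree F = {logit_b:.3f}")
print(f"estimated P(one or more failures) at 31 F = {p_31:.3f}")
```

A probit fit would follow the same pattern with the normal CDF in place of the logistic function.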

31

Number of Failed O-Rings

[Figure: number of failed o-rings (vertical axis, -0.5 to 3.5) versus temperature (horizontal axis, 30 to 90); series: Number of Failed O-Rings, OLS Fitted, Tobit Fitted]

Extrapolating to 31 F:
OLS:
Tobit:
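The extrapolated values are missing from this transcript. As an illustration under stated assumptions, the sketch below fits the count regression y_i(#) = a + b*temp_i + e_i by OLS to the launch data from the data slides and evaluates it at 31 F. The Tobit fit is not reproduced, and the number printed comes from this transcription of the data rather than from the original slide.

```python
# Launch data transcribed from the data slides: (failed o-rings, temperature in F).
launches = [
    (3, 53), (1, 57), (1, 58), (1, 63), (0, 66), (0, 67), (0, 67), (0, 67),
    (0, 68), (0, 69), (1, 70), (1, 70), (0, 70), (0, 70), (0, 72), (0, 73),
    (2, 75), (0, 75), (0, 76), (0, 76), (0, 78), (0, 79), (0, 80), (0, 81),
]

count = [k for k, _ in launches]
temp = [t for _, t in launches]
n = len(launches)

temp_bar = sum(temp) / n
count_bar = sum(count) / n

# Closed-form simple regression: slope = Sxy / Sxx, intercept from the means.
sxx = sum((t - temp_bar) ** 2 for t in temp)
sxy = sum((t - temp_bar) * (k - count_bar) for t, k in zip(temp, count))
ols_b = sxy / sxx
ols_a = count_bar - ols_b * temp_bar

ols_at_31 = ols_a + ols_b * 31
print(f"a = {ols_a:.3f}, b = {ols_b:.4f}, fitted failures at 31 F = {ols_at_31:.2f}")
```

The fitted count at 31 F is close to three failed o-rings, far outside the range of temperatures in the sample, so it should be read as an extrapolation.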

32

Conclusions
From extrapolating the probability models (linear probability, probit, or logit) to 31 F, there was a high probability of one or more o-rings failing.
From extrapolating the number of o-rings failing (OLS or Tobit) to 31 F, 3 or more o-rings would fail.
There had been only one launch out of 24 where as many as 3 o-rings had failed.
Decision theory argument: expected cost/benefit ratio:

33

Conclusions
Decision theory argument: expected cost/benefit ratio:

34

Ways to Analyze Challenger
Difference in mean temperatures for failures and successes
Difference in probability of one or more o-ring failures for high and low temperature ranges
Probability models: LPM (OLS), probit, logit
Number of o-ring failures per launch vs. temperature: OLS, Tobit
Contingency table analysis
ANOVA

35

Contingency Table Analysis
Challenger example

36

Launches and O-Ring Failures (Yes/No)

Temperature  Fail: Yes  Fail: No  Total
53-62 F      3          0         3
63-71 F      3          8         11
72-81 F      1          9         10
Total        7          17        24

37

ANOVA and O-Rings
Probability one or more o-rings fail
  Low temp: 53-62 degrees
  Medium temp: 63-71 degrees
  High temp: 72-81 degrees
Average number of o-rings failing per launch
  Low temp: 53-62 degrees
  Medium temp: 63-71 degrees
  High temp: 72-81 degrees
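A sketch of the two group comparisons named above, computed from the launch data on the data slides: the share of launches with one or more failures and the average number of failed o-rings in each temperature range, followed by a one-way ANOVA F statistic for the counts. The grouping function and variable names are illustrative assumptions.

```python
# Launch data transcribed from the data slides: (failed o-rings, temperature in F).
launches = [
    (3, 53), (1, 57), (1, 58), (1, 63), (0, 66), (0, 67), (0, 67), (0, 67),
    (0, 68), (0, 69), (1, 70), (1, 70), (0, 70), (0, 70), (0, 72), (0, 73),
    (2, 75), (0, 75), (0, 76), (0, 76), (0, 78), (0, 79), (0, 80), (0, 81),
]


def temp_group(t):
    """Assign a launch temperature to the low / medium / high range."""
    if t <= 62:
        return "low"
    if t <= 71:
        return "medium"
    return "high"


groups = {"low": [], "medium": [], "high": []}
for failed, t in launches:
    groups[temp_group(t)].append(failed)

prob_fail = {g: sum(1 for k in v if k > 0) / len(v) for g, v in groups.items()}
mean_failed = {g: sum(v) / len(v) for g, v in groups.items()}

# One-way ANOVA on the number of failed o-rings per launch.
n = len(launches)
grand_mean = sum(k for k, _ in launches) / n
ss_between = sum(len(v) * (mean_failed[g] - grand_mean) ** 2 for g, v in groups.items())
ss_within = sum((k - mean_failed[g]) ** 2 for g, v in groups.items() for k in v)
f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

for g in groups:
    print(f"{g}: P(fail) = {prob_fail[g]:.3f}, mean failed = {mean_failed[g]:.3f}")
print(f"F(2, {n - len(groups)}) = {f_stat:.2f}")
```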

38

Probability one or more o-rings fails

39

Number of o-rings failing per launch

41

Outline
ANOVA and Regression
(Non-Parametric Statistics)
(Goodman Log-Linear Model)

42

ANOVA and Regression: One-Way
Salesaj = c(1)*convenience + c(2)*quality + c(3)*price + e
E[Salesaj | convenience=1, quality=0, price=0] = c(1) = mean for city(1)
  c(1) = mean for city(1) (convenience)
  c(2) = mean for city(2) (quality)
  c(3) = mean for city(3) (price)
Test the null hypothesis that the means are equal using a Wald test: c(1) = c(2) = c(3)

43

One-Way ANOVA and Regression

Regression Coefficients are the City Means; F statistic

44

ANOVA and Regression: One-Way, Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e
E[Salesaj | convenience=0, quality=0] = c(1) = mean for city(3) (price, the omitted one)
E[Salesaj | convenience=1, quality=0] = c(1) + c(2) = mean for city(1) (convenience)
  c(1) = mean for city(3), the omitted city
  c(2) = mean for city(1) minus mean for city(3)
Test that the mean for city(1) = mean for city(3) using the t-statistic for c(2)

45

ANOVA and Regression: One-Way, Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*price + e
E[Salesaj | convenience=0, price=0] = c(1) = mean for city(2) (quality, the omitted one)
E[Salesaj | convenience=1, price=0] = c(1) + c(2) = mean for city(1) (convenience)
  c(1) = mean for city(2), the omitted city
  c(2) = mean for city(1) minus mean for city(2)
Test that the mean for city(1) = mean for city(2) using the t-statistic for c(2)
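A small sketch of why the dummy-variable regressions reproduce the one-way ANOVA means. The sales figures below are hypothetical values invented for this illustration (three observations per city), not the apple juice data; `solve` and `ols` are illustrative helpers that fit OLS through the normal equations.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) c = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical sales by marketing strategy (city); values are made up.
sales = {
    "convenience": [500.0, 520.0, 540.0],
    "quality": [600.0, 640.0, 650.0],
    "price": [580.0, 600.0, 620.0],
}
order = ["convenience", "quality", "price"]
y = [s for city in order for s in sales[city]]

# Specification 1: no intercept, one dummy per city.
x_means = [[1.0 if city == c else 0.0 for c in order] for city in order for _ in sales[city]]
coef_means = ols(x_means, y)

# Specification 2: intercept plus convenience and quality dummies (price omitted).
x_diff = [
    [1.0, 1.0 if city == "convenience" else 0.0, 1.0 if city == "quality" else 0.0]
    for city in order
    for _ in sales[city]
]
coef_diff = ols(x_diff, y)

print("city means:", [round(c, 6) for c in coef_means])
print("omitted-city mean and differences:", [round(c, 6) for c in coef_diff])
```

With these made-up numbers the first regression returns the three city means, and the second returns the mean of the omitted city followed by each included city's difference from it.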

46

ANOVA and Regression: Two-Way
Series of regressions; compare to Table 11, Lecture 15
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television + c(5)*convenience*television + c(6)*quality*television + e, SSR = 501,136.7
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television + e, SSR = 502,746.3
Test for interaction effect: F(2, 54) = [(502746.3 - 501136.7)/2]/(501136.7/54) = (1609.6/2)/9280.3 = 0.09

Table 11: Two-Way ANOVA of Apple Juice Sales

Source of Variation              Sum of Squares        Degrees of Freedom     Mean Square
Explained (between treatments)   ESS =
  Strategy                       ESS(Strat) = 98838.6  (a-1) = 2              49419.3
  Medium                         ESS(Med) = 13172.0    (b-1) = 1              13172.0
  Interaction                    ESS(I) = 1609.6       (a-1)(b-1) = 2         804.8
Unexplained (within treatments)  USS = 501136.7        (n-ab) = 60 - 6 = 54   9280.3
Total                            TSS = 614756.98       (n-1) = 59

48

ANOVA and Regression: Two-Way
Series of regressions
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e, SSR = 515,918.3
Test for media effect: F(1, 54) = [(515918.3 - 502746.3)/1]/(501136.7/54) = 13172/9280.3 = 1.42
Salesaj = c(1) + e, SSR = 614,757
Test for strategy effect: F(2, 54) = [(614757 - 515918.3)/2]/(501136.7/54) = (98838.7/2)/9280.3 = 5.32
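The three F-tests in this series of regressions follow one pattern: the drop in the sum of squared residuals between two nested models, divided by the number of restrictions, over the mean square error of the full model. A minimal sketch using the SSR values from the slides; the function and variable names are illustrative.

```python
# Sums of squared residuals from the nested regressions on the slides.
ssr_full = 501136.7            # strategy, medium, and interaction terms
ssr_no_interaction = 502746.3  # strategy and medium only
ssr_strategy_only = 515918.3   # strategy dummies only
ssr_constant_only = 614757.0   # intercept only
df_error = 54                  # n - ab = 60 - 6


def f_stat(ssr_restricted, ssr_unrestricted, restrictions, mse):
    """F = [(SSR_restricted - SSR_unrestricted) / q] / MSE of the full model."""
    return ((ssr_restricted - ssr_unrestricted) / restrictions) / mse


mse_full = ssr_full / df_error

f_interaction = f_stat(ssr_no_interaction, ssr_full, 2, mse_full)
f_media = f_stat(ssr_strategy_only, ssr_no_interaction, 1, mse_full)
f_strategy = f_stat(ssr_constant_only, ssr_strategy_only, 2, mse_full)

print(f"interaction: F(2, 54) = {f_interaction:.3f}")
print(f"media:       F(1, 54) = {f_media:.3f}")
print(f"strategy:    F(2, 54) = {f_strategy:.3f}")
```

The strategy statistic comes out at about 5.325, which the slide reports as 5.32.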

49

Survival Analysis
Density, f(t)
Cumulative distribution function, CDF, F(t)
  Probability you failed up to time t* = F(t*)
Survivor function, S(t) = 1 - F(t)
  Probability you survived longer than t* = S(t*)
Kaplan-Meier estimates: (# at risk - # ending)/# at risk
Applications
  Testing a new drug

50

Chemotherapy Drug Taxol
Current standard for ovarian cancer is taxol and a platinate such as cisplatin
Previous standard was cyclophosphamide and cisplatin
Kaplan-Meier survival curves comparing the two regimens
Lab 7: (# at risk - # ending)/# at risk

51

Taxol (Bristol-Myers Squibb) interrupts cell division (mitosis)

It is a cyclical hydrocarbon

52

Top panel: European, Canadian, and Scottish; 342 at risk for Tc, 292 survived 1 year
Bottom panel: Gynecological Oncology Group; 196 at risk for Tc, 168 survived 1 year
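A minimal sketch of the Kaplan-Meier product-limit calculation, (# at risk - # ending)/# at risk, multiplied across intervals. Treating the whole first year as a single interval with no censoring is a simplifying assumption made here so the two panels' numbers can be used directly; the three-interval series is hypothetical and only shows how the factors accumulate.

```python
def kaplan_meier(intervals):
    """Return the survivor estimate after each (at_risk, ending) interval."""
    survival = 1.0
    curve = []
    for at_risk, ending in intervals:
        survival *= (at_risk - ending) / at_risk
        curve.append(survival)
    return curve


# One-year survival for Tc, using the counts quoted for each panel.
top_panel = kaplan_meier([(342, 342 - 292)])[-1]
bottom_panel = kaplan_meier([(196, 196 - 168)])[-1]
print(f"top panel:    {top_panel:.3f}")
print(f"bottom panel: {bottom_panel:.3f}")

# Hypothetical multi-interval example: 10 at risk, then 8, then 7.
hypothetical_curve = kaplan_meier([(10, 2), (8, 1), (7, 0)])
print([round(s, 3) for s in hypothetical_curve])
```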

53

2003 Final

54

Nonparametric Statistics
What to do when the sample of observations is not distributed normally?

55

3 Nonparametric Techniques
Wilcoxon Rank Sum Test for independent samples
  Data Analysis Plus
Sign Test for Matched Pairs: rated data
  EViews, Descriptive Statistics
Wilcoxon Signed Rank Sum Test for Matched Pairs: quantitative data
  EViews

56

Wilcoxon Rank Sum Test for Independent Samples

Testing the difference between the means of two populations when they are non-normal

A New Painkiller Vs. Aspirin, Xm17-02

57

Rating scheme

Score Legend

5 Extremely Effective

4 Quite Effective

3 Somewhat Effective

2 Slightly Effective

1 Not At All Effective

58

Ratings

New Drug  Aspirin
3         4
5         1
4         3
3         2
2         4
5         1
1         3
4         4
5         2
3         2
3         2
5         4
5         3
5         4
4         5

59

Rank the 30 Ratings
30 total ratings for both samples
3 ratings of 1
5 ratings of 2
etc.

60

Rating  Raw Rank  Rank/Ties
1       1         2
1       2         2
1       3         2
2       4         6
2       5         6
2       6         6
2       7         6
2       8         6
3       9         12
3       10        12
3       11        12
3       12        12
3       13        12
3       14        12
3       15        12

61

Rating  Raw Rank  Rank/Ties (continued)
4       16        19.5
4       17        19.5
4       18        19.5
4       19        19.5
4       20        19.5
4       21        19.5
4       22        19.5
4       23        19.5
5       24        27
5       25        27
5       26        27
5       27        27
5       28        27
5       29        27
5       30        27

62

Drug Rate  Rank   Asp. Rate  Rank
3          12     4          19.5
5          27     1          2
4          19.5   3          12
3          12     2          6
2          6      4          19.5
5          27     1          2
1          2      3          12
4          19.5   4          19.5
5          27     2          6
3          12     2          6
3          12     2          6
5          27     4          19.5
5          27     3          12
5          27     4          19.5
4          19.5   5          27
Rank Sum   276.5             188.5

63

Rank Sum, T
E(T) = n1*(n1 + n2 + 1)/2 = 15*31/2 = 232.5
VAR(T) = n1*n2*(n1 + n2 + 1)/12 = 15*15*31/12 = 581.25, so sigma_T = 24.1
For sample sizes larger than 10, T is approximately normal
Z = [T - E(T)]/sigma_T = (276.5 - 232.5)/24.1 = 1.83
Null hypothesis: the central tendency for the two drugs is the same
Alternative hypothesis: the central tendency for the new drug is greater than for aspirin (1-tailed test)
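The rank sum test above can be traced step by step: pool the 30 ratings, give tied ratings the average of the raw ranks they occupy, sum the ranks for the new drug, and standardize. This is a sketch with illustrative names; like the slide, it uses the variance formula without a correction for ties.

```python
import math

# Ratings as listed in the ranking table on the slides.
new_drug = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]


def average_ranks(values):
    """Map each distinct value to the average of the raw ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1   # first raw rank held by this value
        count = ordered.count(v)       # number of tied observations
        ranks[v] = first + (count - 1) / 2
    return ranks


ranks = average_ranks(new_drug + aspirin)
t_new = sum(ranks[r] for r in new_drug)
t_aspirin = sum(ranks[r] for r in aspirin)

n1, n2 = len(new_drug), len(aspirin)
expected_t = n1 * (n1 + n2 + 1) / 2
sigma_t = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction, as on the slide
z = (t_new - expected_t) / sigma_t
p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability

print(f"T = {t_new}, E(T) = {expected_t}, sigma_T = {sigma_t:.1f}")
print(f"Z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```

Since Z exceeds 1.645, the null hypothesis is rejected at the 5% level in the one-tailed test.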

Figure 1: One-Tailed Test, 5% Level, Normal Distribution (frequency versus Z; the upper 5% tail lies to the right of Z = 1.645)
