Power 16
1
Power 16
2
Review Post-Midterm Cumulative
3
Projects
4
Logistics
Put the PowerPoint slide show on a high-density floppy disk, or e-mail it as an attachment, for a WINTEL machine.
E-mail [email protected] the slide show as a PowerPoint attachment.
5
Assignments
1. Project choice
2. Data Retrieval
3. Statistical Analysis
4. PowerPoint Presentation
5. Executive Summary
6. Technical Appendix
7. Graphics
Power_13
6
PowerPoint Presentations: Member 4
1. Introduction: Members 1, 2, 3
  What
  Why
  How
2. Executive Summary: Member 5
3. Exploratory Data Analysis: Member 3
4. Descriptive Statistics: Member 3
5. Statistical Analysis: Member 3
6. Conclusions: Members 3 & 5
7. Technical Appendix: Table of Contents, Member 6
7
Executive Summary and Technical Appendix
8
I. Your report should have an executive summary of one to one
and a half pages that summarizes your findings in words for a non-
technical reader. It should explain the problem being examined
from an economic perspective, i.e. it should motivate interest in the
issue on the part of the reader. Your report should explain how you
are investigating the issue, in simple language. It should explain
why you are approaching the problem in this particular fashion.
Your executive report should explain the economic importance of
your findings.
You can attach the technical details of your findings as an
appendix.
9
Technical Appendix
Table of Contents
Spreadsheet of data used and sources or, if extensive, a subsample of the data
Descriptive Statistics and Histograms for the variables in the study
If time series data, a plot of each variable against time
If relevant, plot of the dependent vs. each of the explanatory variables
10
Technical Appendix (Cont.)
Statistical Results, for example regression
Plot of the actual, fitted and error and other diagnostics
Brief summary of the conclusions, meanings drawn from the exploratory, descriptive, and statistical analysis.
11
Post-Midterm Review
Project I: Power 16
Contingency Table Analysis: Power 14, Lab 8
ANOVA: Power 15, Lab 9
Survival Analysis: Power 12, Power 11, Lab 7
Multi-variate Regression: Power 11, Lab 6
12
Slide Show Challenger disaster
13
Project I
Number of O-Rings Failing On Launch i: yi(#) = a + b*tempi + ei
  Biased because of zeros, even if the equation is divided by 6
Two Ways to Proceed
  Tobit, non-linear estimation: yi(#) = a + b*tempi + ei
  Bernoulli variable: probability models
    Probability Models: yi(0,1) = a + b*tempi + ei
14
Project I (Cont.)
Probability Models: yi(0,1) = a + b*tempi + ei
  OLS, Linear Probability Model, linear approximation to the sigmoid
  Probit, non-linear estimate of the sigmoid
  Logit, non-linear estimate of the sigmoid
Significant Dependence on Temperature
  t-test (or z-test) on slope, H0: b = 0
  F-test
  Wald test
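As a minimal sketch of the linear probability model listed above, the Python below fits yi(0,1) = a + b*tempi + ei by ordinary least squares. The 24 (temperature, failures) pairs are transcribed from the launch table in these slides; the names `LAUNCHES` and `ols_line` are illustrative assumptions, not part of the course materials, and no standard errors or tests are computed.

```python
# (launch temperature in F, number of o-rings failing), transcribed from the
# launch table in the slides; treat the transcription as an assumption.
LAUNCHES = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


temps = [t for t, _ in LAUNCHES]
# Bernoulli variable: 1 if one or more o-rings failed, else 0.
failed = [1 if k > 0 else 0 for _, k in LAUNCHES]

a, b = ols_line(temps, failed)
fitted_31 = a + b * 31  # linear extrapolation to 31 F

print(f"a = {a:.4f}, b = {b:.4f}, fitted value at 31 F = {fitted_31:.3f}")
```

With this data the slope is negative and the fitted value at 31 F lies above 1, a reminder that the LPM is only a linear approximation to the sigmoid and is not bounded like a probability.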
15
Project I (Cont.)
Plots of Number or Probability vs. Temp.
  Label the axes
Answer all parts, a-f
The most frequent sins
  Did not explicitly address significance
  Did not answer b, 66 F: all launches at lower temperatures had one or more o-ring failures
  Did not execute c, estimate linear probability model
16
Challenger Disaster
Failure of O-rings that sealed grooves on the booster rockets
Was there any relationship between o-ring failure and temperature?
Engineers knew that the rubber o-rings hardened and were less flexible at low temperatures
But was there launch data that showed a problem?
17
Challenger Disaster
What: Was there a relationship between launch temperature and o-ring failure prior to the Challenger disaster?
Why: Should the launch have proceeded?
How: Analyze the relationship between launch temperature and o-ring failure
18
Launches Before Challenger
Data
  number of o-rings that failed
  launch temperature
19
o-rings  temperature
3        53
1        57
1        58
1        63
0        66
0        67
0        67
0        67
0        68
0        69
1        70
20
o-rings  temperature
1        70
0        70
0        70
0        72
0        73
2        75
0        75
0        76
0        76
0        78
0        79
21
o-rings  temperature
0        80
0        81
22
Exploratory Analysis
Launches where there was a problem
23
Orings  temperature
1       58
1       57
1       70
1       63
1       70
2       75
3       53
[Figure: ORINGS versus TEMP]
25
Exploratory Analysis
All Launches
Plot of failures per observation versus temperature range shows temperature dependence:
Mean temperature for the 7 launches with o-ring failures was lower, 63.7, than for the 17 launches without o-ring failures, 72.6.
Contingency table analysis
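The comparison of mean temperatures can be reproduced with a short Python sketch. The data are transcribed from the launch table in these slides (an assumption about the transcription), and the variable names are illustrative.

```python
from statistics import mean

# (launch temperature in F, number of o-rings failing), transcribed from the
# launch table in the slides.
LAUNCHES = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]

# Split launch temperatures by whether any o-ring failed.
fail_temps = [t for t, k in LAUNCHES if k > 0]
ok_temps = [t for t, k in LAUNCHES if k == 0]

mean_fail = mean(fail_temps)
mean_ok = mean(ok_temps)

print(f"{len(fail_temps)} launches with failures, mean temperature {mean_fail:.1f}")
print(f"{len(ok_temps)} launches without failures, mean temperature {mean_ok:.1f}")
```

This only reproduces the two means; a formal test of the difference would still need the group variances.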
26
Launches and O-Ring Failures (Yes/No)
               Fail: Yes  Fail: No  Row Totals
53-62 F        3          0         3
63-71 F        3          8         11
72-81 F        1          9         10
Column Totals  7          17        24
27
Launches and O-Ring Failures (Yes/No)
Expected/Observed
               Fail: Yes  Fail: No  Row Totals
53-62 F        0.875/3    2.125/0   3
63-71 F        3.208/3    7.792/8   11
72-81 F        2.917/1    7.083/9   10
Column Totals  7          17        24
28
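Launches and O-Ring Failures
Chi-Square (2 degrees of freedom) = 9.08, critical value (α = 0.05) = 5.99
Cell entries are the contributions (Observed - Expected)^2/Expected
               Fail: Yes  Fail: No  Row Totals
53-62 F        5.16       2.125     3
63-71 F        0.013      0.006     11
72-81 F        1.26       0.519     10
Column Totals  7          17        24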
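The chi-square statistic can be checked with a few lines of Python. This is a sketch using only the observed counts from the contingency table; the dictionary layout and variable names are illustrative assumptions.

```python
# Observed counts per temperature range: (launches with a failure, launches without).
observed = {
    "53-62 F": (3, 0),
    "63-71 F": (3, 8),
    "72-81 F": (1, 9),
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]        # launches per temperature range
col_totals = [sum(c) for c in zip(*rows)]  # total Fail: Yes, total Fail: No
n = sum(row_totals)

chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / n  # expected count under independence
        chi2 += (obs - expected) ** 2 / expected

dof = (len(rows) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi2:.2f} with {dof} degrees of freedom")
```

Several expected counts are below 5, so the chi-square approximation should be read with some caution for a table this small.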
[Figure: Number of O-ring Failures Vs. Temperature (ORINGS versus TEMP)]
30
Probability Models
[Figure: Probability versus Temperature; series: Bernoulli, LPM Fitted, Probit Fitted]
Logit extrapolated to 31F:
Probit extrapolated to 31F:
31
Number of Failed O-Rings
[Figure: Number versus Temperature; series: Number of Failed O-Rings, OLS Fitted, Tobit Fitted]
Extrapolating OLS to 31F: OLS: Tobit:
32
Conclusions
From extrapolating the probability models to 31 F, Linear Probability, Probit, or Logit, there was a high probability of one or more o-rings failing
From extrapolating the Number of O-rings failing to 31 F, OLS or Tobit, 3 or more o-rings would fail.
There had been only one launch out of 24 where as many as 3 o-rings had failed.
Decision theory argument: expected cost/benefit ratio:
33
Conclusions
Decision theory argument: expected cost/benefit ratio:
34
Ways to Analyze Challenger
Difference in mean temperatures for failures and successes
Difference in probability of one or more o-ring failures for high and low temperature ranges
Probability models: LPM (OLS), probit, logit
Number of o-ring failures per launch vs. Temp.: OLS, Tobit
Contingency table analysis
ANOVA
35
Contingency Table Analysis Challenger example
36
Launches and O-Ring Failures (Yes/No)
               Fail: Yes  Fail: No  Row Totals
53-62 F        3          0         3
63-71 F        3          8         11
72-81 F        1          9         10
Column Totals  7          17        24
37
ANOVA and O-Rings
Probability one or more o-rings fail
  Low temp: 53-62 degrees
  Medium temp: 63-71 degrees
  High temp: 72-81 degrees
Average number of o-rings failing per launch
  Low temp: 53-62 degrees
  Medium temp: 63-71 degrees
  High temp: 72-81 degrees
38
Probability one or more o-rings fails
39
Number of o-rings failing per launch
40
41
Outline
ANOVA and Regression
(Non-Parametric Statistics)
(Goodman Log-Linear Model)
42
Anova and Regression: One-Way
Salesaj = c(1)*convenience + c(2)*quality + c(3)*price + e
E[Salesaj | (convenience=1, quality=0, price=0)] = c(1) = mean for city(1)
c(1) = mean for city(1) (convenience)
c(2) = mean for city(2) (quality)
c(3) = mean for city(3) (price)
Test the null hypothesis that the means are equal using a Wald test: c(1) = c(2) = c(3)
43
One-Way ANOVA and Regression
Regression Coefficients are the City Means; F statistic
44
Anova and Regression: One-Way
Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e
E[Salesaj | (convenience=0, quality=0)] = c(1) = mean for city(3) (price, the omitted one)
E[Salesaj | (convenience=1, quality=0)] = c(1) + c(2) = mean for city(1) (convenience)
c(1) = mean for city(3), the omitted city
c(2) = mean for city(1) minus mean for city(3)
Test that the mean for city(1) = mean for city(3) using the t-statistic for c(2)
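To illustrate why the intercept equals the omitted city's mean and each dummy coefficient equals a difference in means, here is a small Python sketch. The sales figures are hypothetical (they are NOT the course's apple juice data), and `solve` and `ols` are illustrative helpers written for this example rather than any library's API.

```python
# Hypothetical sales figures (NOT the course data), three observations per strategy.
SALES = {
    "convenience": [500.0, 600.0, 550.0],
    "quality": [650.0, 700.0, 600.0],
    "price": [580.0, 620.0, 600.0],
}


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) c = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


x_rows, y = [], []
for strategy, values in SALES.items():
    for sale in values:
        # Columns: intercept, convenience dummy, quality dummy (price is omitted).
        x_rows.append([1.0, float(strategy == "convenience"), float(strategy == "quality")])
        y.append(sale)

c1, c2, c3 = ols(x_rows, y)
means = {name: sum(values) / len(values) for name, values in SALES.items()}

print(f"c(1) = {c1:.1f}, mean for the omitted strategy (price) = {means['price']:.1f}")
print(f"c(2) = {c2:.1f}, convenience mean minus price mean = {means['convenience'] - means['price']:.1f}")
```

Changing which dummy is omitted changes the baseline that c(1) measures, but the fitted group means stay the same.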
45
Anova and Regression: One-Way
Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*price + e
E[Salesaj | (convenience=0, price=0)] = c(1) = mean for city(2) (quality, the omitted one)
E[Salesaj | (convenience=1, price=0)] = c(1) + c(2) = mean for city(1) (convenience)
c(1) = mean for city(2), the omitted city
c(2) = mean for city(1) minus mean for city(2)
Test that the mean for city(1) = mean for city(2) using the t-statistic for c(2)
46
ANOVA and Regression: Two-Way
Series of Regressions; Compare to Table 11, Lecture 15
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television + c(5)*convenience*television + c(6)*quality*television + e, SSR = 501,136.7
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television + e, SSR = 502,746.3
Test for interaction effect: F(2, 54) = [(502746.3-501136.7)/2]/(501136.7/54) = (1609.6/2)/9280.3 = 0.09
Table 11: 2-Way ANOVA of Apple Juice Sales
Source of Variation              Sum of Squares        Degrees of Freedom    Mean Square
Explained (between treatments)   ESS =
  Strategy                       ESS(Strat) = 98838.6  (a-1) = 2             49419.3
  Medium                         ESS(Med) = 13172.0    (b-1) = 1             13172.0
  Interaction                    ESS(I) = 1609.6       (a-1)(b-1) = 2        804.8
Unexplained (within treatments)  USS = 501136.7        (n-ab) = 60 - 6 = 54  9280.3
Total                            TSS = 614756.98       (n-1) = 59
Table of Two-Way ANOVA for Apple Juice Sales
48
ANOVA and Regression: Two-Way
Series of Regressions
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e, SSR = 515,918.3
Test for media effect: F(1, 54) = [(515918.3-502746.3)/1]/(501136.7/54) = 13172/9280.3 = 1.42
Salesaj = c(1) + e, SSR = 614,757
Test for strategy effect: F(2, 54) = [(614757-515918.3)/2]/(501136.7/54) = (98838.7/2)/9280.3 = 5.33
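The three F statistics from the series of regressions can be recomputed from the reported sums of squared residuals. The Python below is a sketch of that arithmetic; the helper name `f_stat` is an assumption, and, as on the slides, the denominator is always the mean square error of the full interaction model.

```python
# Sums of squared residuals (SSR) reported on the slides.
SSR_FULL = 501136.7         # strategy + medium + interaction terms
SSR_NO_INTERACT = 502746.3  # strategy + medium
SSR_STRATEGY = 515918.3     # strategy dummies only
SSR_CONSTANT = 614757.0     # constant only

DF_FULL = 54                     # n - ab = 60 - 6
MSE_FULL = SSR_FULL / DF_FULL    # unexplained mean square of the full model


def f_stat(ssr_restricted, ssr_unrestricted, q):
    """F statistic for dropping q terms, scaled by the full-model mean square."""
    return ((ssr_restricted - ssr_unrestricted) / q) / MSE_FULL


f_interaction = f_stat(SSR_NO_INTERACT, SSR_FULL, 2)
f_media = f_stat(SSR_STRATEGY, SSR_NO_INTERACT, 1)
f_strategy = f_stat(SSR_CONSTANT, SSR_STRATEGY, 2)

print(f"interaction: F(2, {DF_FULL}) = {f_interaction:.2f}")
print(f"media:       F(1, {DF_FULL}) = {f_media:.2f}")
print(f"strategy:    F(2, {DF_FULL}) = {f_strategy:.2f}")
```

Each numerator is the drop in SSR when a block of terms is added, which matches the corresponding explained sum of squares in the two-way ANOVA table.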
49
Survival Analysis
Density, f(t)
Cumulative distribution function, CDF, F(t)
  Probability you failed up to time t* = F(t*)
Survivor Function, S(t) = 1-F(t)
  Probability you survived longer than t*, S(t*)
Kaplan-Meier estimates: (# at risk - # ending)/# at risk
Applications
  Testing a new drug
50
Chemotherapy Drug Taxol
Current standard for ovarian cancer is taxol and a platinate such as cisplatin
Previous standard was cyclophosphamide and cisplatin
Kaplan-Meier Survival curves comparing the two regimens
Lab 7: (# at risk - # ending)/# at risk
51
Taxol ( Bristol-Myers Squibb) interrupts cell division (mitosis)
It is a cyclical hydrocarbon
52
Top Panel: European Canadian and Scottish, 342 at risk for Tc, 292 survived 1 year
Bottom Panel: Gynecological Oncology Group, 196 at risk for Tc, 168 survived 1 year
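The Kaplan-Meier step (# at risk - # ending)/# at risk can be sketched in Python as a running product over intervals. The function name `kaplan_meier` is illustrative, and treating the first year as a single interval in which everyone who did not survive counts as "ending" (no censoring) is an assumption made only for this example.

```python
def kaplan_meier(intervals):
    """Cumulative survival after each interval.

    intervals: list of (at_risk, ending) pairs, one per interval, in time order.
    Each interval contributes the factor (at_risk - ending) / at_risk.
    """
    survival = 1.0
    curve = []
    for at_risk, ending in intervals:
        survival *= (at_risk - ending) / at_risk
        curve.append(survival)
    return curve


# One-year survival for the Tc arm in each panel, using the counts on the slide.
top_panel = kaplan_meier([(342, 342 - 292)])[-1]
bottom_panel = kaplan_meier([(196, 196 - 168)])[-1]

print(f"top panel:    S(1 year) = {top_panel:.3f}")
print(f"bottom panel: S(1 year) = {bottom_panel:.3f}")
```

With several intervals the same function multiplies the per-interval survival fractions, which is how the stepped survival curves are built up.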
53
2003 Final
54
Nonparametric Statistics
What to do when the sample of observations is not distributed normally?
55
3 Nonparametric Techniques
Wilcoxon Rank Sum Test for independent samples
  Data Analysis Plus
Sign Test for Matched Pairs: Rated Data
  EViews, Descriptive Statistics
Wilcoxon Signed Rank Sum Test for Matched Pairs: Quantitative Data
  EViews
56
Wilcoxon Rank Sum Test for Independent Samples
Testing the difference between the means of two populations when they are non-normal
A New Painkiller Vs. Aspirin, Xm17-02
57
Rating scheme
Score Legend
5 Extremely Effective
4 Quite Effective
3 Somewhat Effective
2 Slightly Effective
1 Not At All Effective
58
Ratings
New Drug  Aspirin
3         4
5         1
4         3
3         2
2         4
5         1
1         3
4         4
5         2
3         2
3         2
5         4
5         3
5         4
4         5
59
Rank the 30 Ratings
30 total ratings for both samples
3 ratings of 1
5 ratings of 2
etc.
60
Rating  Raw Rank  Rank/Ties
1       1         2
1       2         2
1       3         2
2       4         6
2       5         6
2       6         6
2       7         6
2       8         6
3       9         12
3       10        12
3       11        12
3       12        12
3       13        12
3       14        12
3       15        12
61
(continued)
Rating  Raw Rank  Rank/Ties
4       16        19.5
4       17        19.5
4       18        19.5
4       19        19.5
4       20        19.5
4       21        19.5
4       22        19.5
4       23        19.5
5       24        27
5       25        27
5       26        27
5       27        27
5       28        27
5       29        27
5       30        27
62
Drug Rate  Rank   Asp. Rate  Rank
3          12     4          19.5
5          27     1          2
4          19.5   3          12
3          12     2          6
2          6      4          19.5
5          27     1          2
1          2      3          12
4          19.5   4          19.5
5          27     2          6
3          12     2          6
3          12     2          6
5          27     4          19.5
5          27     3          12
5          27     4          19.5
4          19.5   5          27
Rank Sum   276.5             188.5
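The ranking with ties and the two rank sums can be reproduced with a short Python sketch. The ratings are taken from the table above; `average_ranks` is an illustrative helper, not a function from Data Analysis Plus or EViews.

```python
# Ratings for the 15 patients in each sample, as listed in the table.
drug = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]


def average_ranks(values):
    """Map each distinct value to the average of the raw ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy raw ranks i+1 .. j; assign their average.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


ranks = average_ranks(drug + aspirin)  # rank the pooled 30 ratings
t_drug = sum(ranks[r] for r in drug)
t_aspirin = sum(ranks[r] for r in aspirin)

print(f"rank sum, new drug = {t_drug}, rank sum, aspirin = {t_aspirin}")
```

The two rank sums must add up to 30*31/2 = 465, which is a quick check on the ranking.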
63
Rank Sum, T
E(T) = n1(n1 + n2 + 1)/2 = 15*31/2 = 232.5
VAR(T) = n1*n2(n1 + n2 + 1)/12
VAR(T) = 15*15*31/12 = 581.25, σ_T = 24.1
For sample sizes larger than 10, T is approximately normal
Z = [T - E(T)]/σ_T = (276.5 - 232.5)/24.1 = 1.83
Null Hypothesis is that the central tendency for the two drugs is the same
Alternative hypothesis: central tendency for the new drug is greater than for aspirin: 1-tailed test
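The normal approximation for the rank sum can be worked through in Python. This sketch follows the formulas on the slide, without any tie correction to the variance; the variable names and the use of `math.erfc` for the upper-tail probability are choices made for this example.

```python
import math

n1 = n2 = 15
t = 276.5  # rank sum for the new drug

e_t = n1 * (n1 + n2 + 1) / 2           # expected rank sum under the null
var_t = n1 * n2 * (n1 + n2 + 1) / 12   # variance of the rank sum (no tie correction)
sigma_t = math.sqrt(var_t)

z = (t - e_t) / sigma_t
# Upper-tail probability of a standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
reject_at_5_percent = z > 1.645        # one-tailed 5% critical value

print(f"E(T) = {e_t}, VAR(T) = {var_t}, sigma_T = {sigma_t:.1f}")
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}, reject H0: {reject_at_5_percent}")
```

Under these formulas z exceeds 1.645, so the one-tailed test at the 5% level favors the alternative that the new drug is rated higher.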
Figure 1: One-Tailed Test, 5% Level, Normal Distribution
[Figure: FREQUENCY versus Z; 5% upper tail beyond the critical value 1.645]