Lecture6 Applied Econometrics and Economic Modeling

Concepts in Hypothesis Testing

10.1a | 10.2 | 10.3 | 10.4 | 10.5 | 10.6 | 10.7 | 10.8 | 10.9

Background Information

The manager of Pepperoni Pizza Restaurant has recently begun experimenting with a new method of baking its pepperoni pizzas.

He believes that the new method produces a better-tasting pizza, but he would like to base a decision on whether to switch from the old method to the new method on customer reactions.

Therefore he performs an experiment.

The Experiment

For 100 randomly selected customers who order a pepperoni pizza for home delivery, he includes both an old style and a free new style pizza in the order.

All he asks is that these customers rate the difference between pizzas on a -10 to +10 scale, where -10 means they strongly favor the old style, +10 means they strongly favor the new style, and 0 means they are indifferent between the two styles.

Once he gets the ratings from the customers, how should he proceed?

Hypothesis Testing

This example’s goal is to explain hypothesis testing concepts. We are not implying that the manager would, or should, use a hypothesis testing procedure to decide whether to switch methods.

First, hypothesis testing does not take costs into account. In this example, if the new method is more costly, a hypothesis test would ignore that extra cost.

Second, even if costs of the two pizza-making methods are equivalent, the manager might base his decision on a simple point estimate and possibly a confidence interval.

Null and Alternative Hypotheses

Usually, the null hypothesis is labeled H0 and the alternative hypothesis is labeled Ha.

The null and alternative hypotheses divide all possibilities into two nonoverlapping sets, exactly one of which must be true.

Traditionally, hypothesis testing has been phrased as a decision-making problem, where an analyst decides either to accept the null hypothesis or reject it, based on the sample evidence.

One-Tailed Versus Two-Tailed Tests

The form of the alternative hypothesis can be either one-tailed or two-tailed, depending on what the analyst is trying to prove.

A one-tailed test is one where only sample results in a particular direction can lead to rejection of the null hypothesis. In the pizza example, these are the results where the sample mean rating is positive.

A two-tailed test is one where results in either of two directions can lead to rejection of the null hypothesis.

One-Tailed Versus Two-Tailed Tests -- continued

Once the hypotheses are set up, it is easy to detect whether the test is one-tailed or two-tailed.

One-tailed alternatives are phrased in terms of “>” or “<”, whereas two-tailed alternatives are phrased in terms of “≠”.

The real question is whether to set up hypotheses for a particular problem as one-tailed or two-tailed.

There is no statistical answer to this question. It depends entirely on what we are trying to prove.

Types of Errors

Whether one decides to accept or reject the null hypothesis, it might be the wrong decision.

One might reject the null hypothesis when it is true or incorrectly accept the null hypothesis when it is false.

These errors are called type I and type II errors.

We commit a type I error when we incorrectly reject a null hypothesis that is true. We commit a type II error when we incorrectly accept a null hypothesis that is false.

Types of Errors -- continued

These ideas appear graphically below.

While these errors seem to be equally serious, actually type I errors have traditionally been regarded as the more serious of the two.

Therefore, the hypothesis-testing procedure favors caution in terms of rejecting the null hypothesis.

Significance Level and Rejection Region

The real question is how strong the evidence in favor of the alternative hypothesis must be to reject the null hypothesis.

The analyst determines the probability of a type I error that he is willing to tolerate. This value is denoted by α and is most commonly equal to 0.05, although α = 0.01 and α = 0.10 are also frequently used.

The value of α is called the significance level of the test.

Significance Level and Rejection Region -- continued

Then, given the value of α, we use statistical theory to determine the rejection region.

If the sample falls into this region we reject the null hypothesis; otherwise, we accept it.

Sample evidence that falls into the rejection region is called statistically significant at the α level.
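
As a rough illustration of how a significance level translates into a rejection region, the sketch below computes right-tail cutoffs for the three common levels. It assumes a test statistic that is standard normal under the null hypothesis, which is a simplification (the pizza example uses a t distribution), and the function name is made up for this illustration.

```python
from statistics import NormalDist

def right_tail_critical_value(alpha: float) -> float:
    """Cutoff z* with P(Z > z*) = alpha for a standard normal Z (assumed distribution)."""
    return NormalDist().inv_cdf(1.0 - alpha)

# The rejection region of a right-tailed test is everything beyond the cutoff.
for alpha in (0.10, 0.05, 0.01):
    cutoff = right_tail_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 when z > {cutoff:.3f}")
```

A smaller significance level pushes the cutoff farther into the tail, so stronger evidence is needed before the null hypothesis is rejected.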

Significance from p-values

This approach is currently more popular than the significance level and rejection region approach.

The idea is to avoid the use of the α level and instead simply report “how significant” the sample evidence is.

We do this by means of the p-value. The p-value is the probability of seeing a random sample at least as extreme as the sample observed, given that the null hypothesis is true.

Significance from p-values -- continued

Here “extreme” is relative to the null hypothesis.

In general smaller p-values indicate more evidence in support of the alternative hypothesis. If a p-value is sufficiently small, almost any decision maker will conclude that rejecting the null hypothesis is the more “reasonable” decision.

Significance from p-values -- continued

How small is a “small” p-value? This is largely a matter of semantics, but if the p-value is

– less than 0.01, it provides “convincing” evidence that the alternative hypothesis is true;

– between 0.01 and 0.05, there is “strong” evidence in favor of the alternative hypothesis;

– between 0.05 and 0.10, it is in a “gray area”;

– greater than 0.10, there is weak or no evidence in support of the alternative.
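
The rule of thumb in the list above can be written as a small lookup function. This is only a sketch: the function name is hypothetical, and the treatment of values that fall exactly on a boundary (0.01, 0.05, 0.10) is an assumption, since the slide does not say which side they belong to.

```python
def describe_evidence(p_value: float) -> str:
    """Map a p-value to the informal evidence labels used on the slide."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "convincing"
    if p_value < 0.05:   # boundary handling is an assumption
        return "strong"
    if p_value < 0.10:
        return "gray area"
    return "weak or none"

# p-values reported later in this lecture
for p in (0.004, 0.045, 0.092, 0.814):
    print(p, "->", describe_evidence(p))
```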

Hypothesis Tests for a Population Mean

Recall that the manager of the Pepperoni Pizza Restaurant is running an experiment to test the hypotheses H0: μ ≤ 0 versus Ha: μ > 0, where μ is the mean rating in the entire customer population.

Here, each customer rates the difference between an old-style pizza and a new-style pizza on a -10 to +10 scale, where negative ratings favor the old-style pizza and positive ratings favor the new-style pizza.

PIZZA1.XLS

The ratings of 40 randomly selected customers and several summary statistics appear in this file and in the following table.

Summary Statistics

From the summary statistics, we see that the sample mean is 2.10 and the sample standard deviation is 4.717.

The positive sample mean provides some evidence in favor of the alternative hypothesis, but given the rather large standard deviation and the boxplot of ratings shown on the next slide, does it provide enough evidence to reject H0?

Summary Statistics -- continued

Running the Test

To run the test, we calculate the test statistic, using the borderline null hypothesis value μ0 = 0, and report how much probability is beyond it in the right tail of the appropriate t distribution.

We use the right tail because the alternative is one-tailed of the “greater than” variety.

The test statistic is

$$t\text{-value} = \frac{2.10 - 0}{4.717/\sqrt{40}} = 2.816$$
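
The calculation above can be checked with a short Python sketch. The inputs are the summary statistics from the slides (sample mean 2.10, sample standard deviation 4.717, n = 40); the function name is hypothetical.

```python
import math

def one_sample_t(sample_mean: float, hypothesized_mean: float,
                 sample_sd: float, n: int) -> float:
    """t statistic: distance from the hypothesized mean in standard-error units."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

t_value = one_sample_t(2.10, 0.0, 4.717, 40)
print(round(t_value, 3))  # prints 2.816
```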

Running the Test -- continued

The probability beyond this value in the right tail of the t distribution with n-1 = 39 degrees of freedom is approximately 0.004, which can be found in Excel with the function TDIST(2.816,39,1).

The probability, 0.004, is the p-value for the test. It indicates that these sample results would be very unlikely if the null hypothesis were true.
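
For readers without Excel, the right-tail probability can be approximated in plain Python. The sketch below is not how TDIST works internally; it simply integrates the t density numerically with Simpson's rule, and the helper names are made up for this illustration.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies to the right of 0; subtract the area on [0, t].
    return 0.5 - total * h / 3

p_value = t_right_tail(2.816, 39)
print(round(p_value, 3))
```

The result should agree with the value of about 0.004 reported on the slide.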

The manager has two choices: he can conclude that the null hypothesis is true or he can conclude that the alternative hypothesis is true - and presumably switch to the new-style pizza. The second choice appears to be more reasonable.

Using StatPro

Another way to interpret the results is in terms of traditional significance levels, but the p-value is the preferred method.

The StatPro One-Sample procedure can be used to perform this analysis easily. To use it, select the StatPro/Statistical Inference/One-Sample Analysis menu item, and choose the Rating variable as the variable to analyze.

Then fill in the dialog boxes as shown here.

One-Sample Dialog Box

Hypothesis Test Dialog Box

The Results

Most of this output should be familiar; it mirrors the previous calculations.

The results are significant at the 1% level.

Conclusion

Should the manager switch to the new-style pizza on the basis of these sample results?

We would probably recommend “yes”. There is no indication that the new-style pizza costs any more to make than the old-style pizza, and the sample evidence is fairly convincing that customers, on average, will prefer the new-style pizza.

Therefore, unless there are reasons for not switching (for example, costs), we recommend the switch.

Hypothesis Tests for a Population Mean

Assume that the manager of the Pepperoni Pizza Restaurant currently uses two methods of producing pepperoni pizzas.

He plans to discontinue one of these methods if the results of a survey indicate that customers favor one of the methods by a “significant” margin.

The survey is conducted exactly as in the previous example.

Background Information -- continued

Each of 40 randomly selected customers receives two pizzas, one made by each method.

These customers are asked to rate the pizzas on a scale of -10 to +10, where negative ratings favor the first method and positive ratings favor the second method.

PIZZA2.XLS

The results of the survey appear in this file.

Is there enough evidence in this sample data to persuade the manager to discontinue one of the methods?

Formulating the Hypotheses

We now write the hypotheses as H0: μ = 0 versus Ha: μ ≠ 0, where μ is the mean rating over the entire customer population.

A two-tailed alternative is appropriate here because the manager has no idea, before the sample is taken, which method (if either) will be favored.

It is not appropriate to look at the sample and decide to use the one-tailed variety because of the negative mean. The hypotheses should always be formulated before the sample data are observed.

Test & Findings

To run the test, use StatPro's One-Sample Analysis procedure and select the option to do a hypothesis test on the mean.

The small p-value provides convincing evidence for the manager that there is a difference, on average, between customer reactions to the two methods of making pizzas.

On average, customers appear to favor the first method of making pizzas.

In Conclusion

Should the manager discontinue the (evidently) less popular second method on the basis of this hypothesis test?

The answer almost certainly depends on the costs that have not been mentioned.

The primary reason for discontinuing one of the methods is presumably to save costs by using only one production method instead of two.

It appears that on average the population favors the first method but the data show that a good-sized minority favors the second method.

In Conclusion -- continued

So why not continue to use the second method if the cost is not prohibitive?

The company could easily achieve greater overall profit by continuing to make pizzas by both methods than by discontinuing the slightly less popular method.

Once again, hypothesis testing provides useful information but the decision should be based on a careful cost analysis.

Hypothesis Tests for Other Parameters

The Walpole Appliance Company has a customer service department that handles customer questions and complaints.

This department's processes are set up to respond quickly and accurately to customers who phone in their concerns. However, there is a sizable minority of customers who prefer to write letters.

Traditionally, the customer service department has not been very efficient in responding to these customers.

Background Information -- continued

Letter writers first receive a mail-gram asking them to call customer service; and when they do call, the customer service representative who answers the phone typically has no knowledge of the customer’s problem.

As a result, the department manager estimates that 15% of the letter writers have not obtained a satisfactory response within 30 days of the time their letters were first received.

The manager’s goal is to reduce this value by at least half, that is, to 7.5% or less.

Background Information -- continued

To do so, she changes the process for responding to letter writers. Under the new process, these customers now receive a prompt and courteous form letter that responds to their problem.

Each form letter states that if the customer still has problems, he or she can call the department.

The manager also files the original letters so that if customers do call back, the representative will be able to find their letters quickly and respond intelligently.

Background Information -- continued

With this new process in place, the manager has tracked 400 letter writers and has found that only 23 of them are classified as “unsatisfied” after a 30-day period.

Does it appear that the manager has achieved her goal?

Solution

The manager’s goal is to reduce the proportion of unsatisfied customers after 30 days from 0.15 to 0.075 or less.

Because the burden of proof is on her to “prove” that she has accomplished this goal, we set up the hypotheses as H0: p ≥ 0.075 versus Ha: p < 0.075, where p is the proportion of all the letter writers who are still unsatisfied after 30 days.

The sample proportion she has observed is 23/400 = 0.0575. This is obviously less than 0.075, but is it enough less to reject the null hypothesis?

LETTERS.XLS

The test statistic for the data, using the borderline value p0 = 0.075, is

$$z\text{-value} = \frac{0.0575 - 0.075}{\sqrt{0.075(1 - 0.075)/400}} = -1.329$$

This value appears in cell B10 of this file, which contains the analysis of the new process.
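
The same z-value can be reproduced outside the spreadsheet. The sketch below uses the figures from the slides (23 unsatisfied out of 400, borderline value 0.075); the function name is hypothetical.

```python
import math

def one_proportion_z(p_hat: float, p0: float, n: int) -> float:
    """z statistic for a proportion; the standard error uses the hypothesized p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z_value = one_proportion_z(23 / 400, 0.075, 400)
print(round(z_value, 3))  # prints -1.329
```

Note that the standard error is built from the hypothesized proportion, not the sample proportion, because the test statistic is evaluated under the null hypothesis.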

Solution -- continued

We find the denominator in cell B8 with the formula

=SQRT(HypProp*(1-HypProp)/SampSize)

The corresponding p-value, 0.092, is found with the formula =NORMSDIST(TestStat) in cell B11.

It is the probability to the left of -1.329 in the standard normal distribution.

Also, because np0 = 400(0.075)=30>5 and n(1-p0)= 400(0.925)>5, this test is valid; that is, the sample size is large enough for the normal approximation to hold.
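
A minimal Python sketch of the p-value and the sample-size check described above follows. It relies only on the standard library's `statistics.NormalDist`; the variable names are illustrative and not taken from the spreadsheet.

```python
import math
from statistics import NormalDist

n, p0 = 400, 0.075
p_hat = 23 / n

# Left-tail test: the p-value is the area to the left of the z statistic.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)

# Rule of thumb from the slide: both expected counts should exceed 5.
normal_approx_ok = n * p0 > 5 and n * (1 - p0) > 5

print(round(p_value, 3), normal_approx_ok)
```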

Results

The p-value might not be as low as you expected - or as low as the manager would like.

In spite of the fact that the sample proportion appears to be well below the target proportion of 0.075, the evidence in support of the alternative hypothesis is not overwhelming.

In statistical terminology, the results are significant at the 10% level, but not at the 5% or 1% level.

Results -- continued

The 95% confidence interval extends from 0.035 to 0.080.

It includes the target value, 0.075, but just barely. In this sense it also supports the argument that the manager has indeed achieved her goal.

Analysts might disagree on whether a hypothesis test or a confidence interval is the more appropriate way to present these results. However, we see them as complementary and do not necessarily favor one over the other.

The bottom line is that they both provide strong, but not totally conclusive, evidence that the manager has achieved her goal.
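
The confidence interval quoted on this slide is consistent with the usual large-sample interval for a proportion, which (unlike the test statistic) uses the sample proportion in the standard error. The sketch below assumes that formula; the slides do not state which method StatPro uses, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_multiple = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_multiple * standard_error, p_hat + z_multiple * standard_error

lower, upper = proportion_confidence_interval(23, 400)
print(f"{lower:.3f} to {upper:.3f}")  # prints 0.035 to 0.080
```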

Hypothesis Test for Differences Between Population Proportions

Background Information

The ArmCo Company, a large manufacturer of automobile parts, has several plants in the United States.

For years ArmCo employees have complained that their suggestions for improvements in the manufacturing processes are ignored by upper management.

In the spirit of employee empowerment, ArmCo management at the Midwest plant decided to initiate a number of policies to respond to employee suggestions.

Background Information -- continued

No such initiatives were taken at the other ArmCo plants.

As expected, there was a great deal of employee enthusiasm at the Midwest plant shortly after the new policies were implemented, but the question was whether life would revert to normal and enthusiasm would dampen with time.

To check this, 100 randomly selected employees at the Midwest plant and 300 employees from other plants were asked to fill out a questionnaire 6 months after the implementation of the new policies.

Background Information -- continued

Employees were instructed to respond to each item on the questionnaire by checking either a “yes” box or a “no” box.

Two specific items on the questionnaire were

– Management at this plant is generally responsive to employee suggestions for improvements in the manufacturing processes.

– Management at this plant is more responsive to employee suggestions now than it used to be.

EMPOWER1.XLS

The results of the questionnaire for these two items appear in this file and in rows 5 and 6 of the table below.

Questions

Does it appear that the policies at the Midwest plant are appreciated?

Should ArmCo implement these policies in other plants?

Solution

For either questionnaire item we let p1 be the proportion of “yes” responses we would obtain at the Midwest plant if the questionnaire were given to all its employees.

We define p2 similarly for the other plants.

Management certainly hopes to find a larger proportion of “yes” responses (to either item) at the Midwest plant, so we set up the hypotheses as H0: p1 - p2 ≤ 0 versus Ha: p1 - p2 > 0.

Solution -- continued

The data from this type of questionnaire are usually given as counts of “yes” and “no” responses, but these translate into sample proportions.

For the first questionnaire item, the sample proportions of “yes” responses are 0.39 and 0.31 for a difference of 0.08. The standard error of this difference, under the assumption that p1 = p2, uses the pooled proportion equal to 0.33. This produces a standard error of 0.054, calculated in cell B13 with the formula

=SQRT(PooledProp*(1-PooledProp)*(1/SampSize1+1/SampSize2))

Solution -- continued

Then the test statistic is 1.473, and the corresponding p-value for the test is the probability to the right of 1.473 in the standard normal distribution. Its value is 0.070 found in cell B15 with the formula

=1-NORMSDIST(TestStat)

A similar analysis for the second questionnaire item leads to a sample difference of 0.15 and a p-value of 0.004.
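
The calculation for the first questionnaire item can be sketched in Python as follows. The counts 39 and 93 are not stated on the slides; they are implied by the sample proportions 0.39 and 0.31 together with the sample sizes 100 and 300. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(yes1: int, n1: int, yes2: int, n2: int):
    """Right-tailed test of H0: p1 - p2 <= 0 using the pooled proportion."""
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return pooled, standard_error, z, p_value

# First item: Midwest plant versus the other plants (counts implied by the slides).
pooled, se, z, p_value = two_proportion_z_test(39, 100, 93, 300)
print(round(pooled, 2), round(se, 3), round(z, 3), round(p_value, 3))
```

The printed values should match the pooled proportion 0.33, standard error 0.054, test statistic 1.473, and p-value 0.070 given in the text.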

Results

These results should be fairly good news for management.

There is moderate, but not overwhelming, support for the hypothesis that management at the Midwest plant is more responsive than at the other plants, at least as perceived by employees.

There is convincing support for the hypothesis that things have improved more at the Midwest plant than at the other plants.

Results -- continued

Corresponding 95% confidence intervals for the difference between proportions appear in rows 21 and 22.

Since they are almost completely positive, they reinforce the hypothesis-test findings.

Moreover, they provide a range of plausible values for the differences between the population proportions.

The only real downside to these findings, from Midwest management’s point of view, is the sample proportion for the first item.

Results -- continued

Only 39% of the sampled employees at that plant believe that management generally responds to their suggestions, even though 68% believe things are better than they used to be.

A reasonable conclusion by ArmCo management is that they are on the right track at the Midwest plant, and the policies initiated there ought to be initiated at other plants, but more still needs to be done at all plants.

Tests for Normality

A company manufactures strips of metal that are supposed to have a width of 10 centimeters.

For purposes of quality control, the manager plans to run some statistical tests on these strips.

However, realizing that these statistical procedures assume normally distributed widths, he first tests this normality assumption on 90 randomly sampled strips.

How should he proceed?

NORMTEST.XLS

The sample data appear in this file and in the table below.

Summary Measures

A number of summary measures also appear in the table.

These summary measures help the manager to select “reasonable” categories for a histogram of the data.

After observing them, the manager chooses 10 categories for the histogram. The extreme categories are “less than or equal to 9.980” and “greater than 10.020”, and the middle eight categories each have a length of 0.005.

The Test

The test we will be running is the chi-square goodness-of-fit test.

This test involves forming a histogram of the sample data and comparing it to the expected histogram we would observe if the data were normally distributed with the same mean and standard deviation as the sample.

To begin running the test in this example, we select the StatPro/Tests for Normality/Chi-Square Test menu item, which leads to the same dialog box as the histogram procedure.

The Test -- continued

After specifying the histogram categories in the usual way we obtain the following message:

In addition we obtain the following histogram and data table:

Resulting Histogram

Resulting Table

Analysis

The normal fit to the data appears to be quite good.

The message confirms this statistically.

The values in columns D and E of the table were calculated as the total number of observations multiplied by the normal probability of being in the corresponding category.

Column E contains the individual chi-square test statistic.
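
To make the "observations times normal probability" step concrete, the sketch below computes expected counts for the ten categories described earlier (cutoffs from 9.980 to 10.020 in steps of 0.005, n = 90). The mean 10.000 and standard deviation 0.010 are placeholder values chosen only for illustration, since the slides do not report the sample values, and the function names are hypothetical.

```python
from statistics import NormalDist

def expected_counts(n: int, mean: float, sd: float, edges: list) -> list:
    """Expected count per category = n * normal probability of that category."""
    dist = NormalDist(mean, sd)
    cdf_values = [0.0] + [dist.cdf(edge) for edge in edges] + [1.0]
    return [n * (high - low) for low, high in zip(cdf_values, cdf_values[1:])]

def chi_square_statistic(observed: list, expected: list) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Nine cutoffs define ten categories: <= 9.980, eight of width 0.005, > 10.020.
edges = [round(9.980 + 0.005 * i, 3) for i in range(9)]
expected = expected_counts(90, 10.000, 0.010, edges)  # mean and sd are assumed values
print([round(e, 2) for e in expected])
```

Passing the observed histogram counts together with `expected` to `chi_square_statistic` would give the overall test statistic.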

Analysis -- continued

The corresponding p-value in cell H5, 0.814, is calculated with the formula =CHIDIST(TestStat,7).

The large p-value provides no evidence whatsoever of nonnormality.

It implies that if we repeated the procedure on many random samples, each taken from a population known to be normal, we would obtain a fit at least this poor in about 81% of the samples.

Stated differently, only about 19% of the fits would be better than the one we observed.

Analysis -- continued

Therefore, whatever statistical procedure the manager intends to use, he doesn’t need to worry about the normality assumption.

Chi-Square Test for Independence

Background Information

Big Office, a chain of large office supply stores, sells an extensive line of desktop and laptop computers.

Company executives want to know whether the demands for these two types of computers are related in any way.

The products might act as complementary products, where high demand for desktops accompanies high demand for laptops (computers in general are hot), or they might act as substitute products (demand for one takes away demand for the other), or their demand might be unrelated.

Background Information -- continued

Because of limitations in its information system, Big Office does not have the exact demands for these products.

However, it does have daily information on categories of demand, listed in aggregate (that is, over all stores).

These data appear on the next slide.

PCDEMAND.XLS

Each day’s demand for each type of computer is categorized as Low, Medium-Low, Medium-High, or High.

The table is based on 250 days, so that the counts add to 250.

PCDEMAND.XLS

The individual counts show, for example, that demand was high for both desktops and laptops on 11 of the 250 days.

For convenience, we include row and column totals in the margins.

Based on these data, can Big Office conclude that demands for these two products are independent?

Chi-Square Test for Independence

This test is used in situations where a population is categorized in two different ways.

For example, we might categorize people by their smoking habits and their drinking habits. The question then is whether these two attributes are independent in a probabilistic sense.

The answer is yes if information on a person’s drinking habits is of no use in predicting the person's smoking habits (or vice versa).

Chi-Square Test for Independence -- continued

In this example, however, we might suspect that these attributes are dependent.

In particular, we might suspect that heavy drinkers are more likely to be heavy smokers, and we might suspect that nondrinkers are more likely to be nonsmokers.

The chi-square test for independence enables us to test this empirically.

Chi-Square Test for Independence -- continued

The null hypothesis for the test is that the two attributes are independent. Therefore, statistically significant results are those that indicate some sort of dependence.

The data for the test consist of counts in various combinations of categories.

We usually arrange these in a rectangular table called a contingency table, a cross-tabs, or, using Excel terminology, a pivot table.

Testing the Data

The idea of the test is to compare actual counts in the table with what we would expect them to be under independence.

If the actual counts are sufficiently far from the expected counts, we can then reject the null hypothesis of independence.

The “distance” measure used to check how far apart they are is essentially the same chi-square statistic used in the chi-square test for normality.
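
For reference, the standard form of this statistic can be written out as below. The notation is introduced here and is not from the slides: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, and N is the grand total. Under independence the statistic is compared with a chi-square distribution with (r-1)(c-1) degrees of freedom, which is 9 for a table with four rows and four columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```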

Testing the Data -- continued

What do we expect under independence?

The totals in row 9 indicate that demand for desktops was low on 38 of the 250 days. Therefore, if we had to estimate the probability of low demand for desktops, this estimate would be 38/250 = 0.152.

Now, if demands for the two products were independent, we should arrive at this same estimate from the data in any of the rows 5-8.

That is, the prediction about desktops should be the same regardless of the demand for laptops.

Testing the Data -- continued

The probability estimate of low desktop demand from row 5, for example, is 4/43 = 0.093. Similarly, for rows 6, 7, and 8, it is 8/80 = 0.100, 16/70 = 0.229, and 10/57 = 0.175.

These calculations provide some evidence that desktops and laptops act as substitute products - the probability of low desktop demand is larger when laptop demand is medium-high or high than when it is low or medium-low.
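
The comparison above can be reproduced from the counts quoted on the slides. In the sketch below, the mapping of rows 5 to 8 to the laptop categories Low through High is inferred from the surrounding text, and only the Low-desktop column is used because the remaining cell counts are not given here.

```python
# Laptop-demand row totals and Low-desktop cell counts quoted on the slides.
laptop_row_totals = {"Low": 43, "Medium-Low": 80, "Medium-High": 70, "High": 57}
low_desktop_counts = {"Low": 4, "Medium-Low": 8, "Medium-High": 16, "High": 10}

total_days = sum(laptop_row_totals.values())
low_desktop_total = sum(low_desktop_counts.values())
overall_rate = low_desktop_total / total_days  # estimate if laptop demand is ignored

for level, row_total in laptop_row_totals.items():
    conditional_rate = low_desktop_counts[level] / row_total
    expected_count = row_total * overall_rate  # expected cell count under independence
    print(f"{level:12s} observed {low_desktop_counts[level]:2d}  "
          f"expected {expected_count:6.3f}  rate {conditional_rate:.3f}")
```

Under independence every conditional rate would be close to the overall rate of 0.152, so the gap between observed and expected counts is what the chi-square statistic measures.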

Testing the Data -- continued

We can perform the calculations for the test easily with StatPro.

We use the StatPro/Statistical Inference/Chi-Square Independence Test menu item.

There is only one dialog box, which asks for the range of the contingency table - not counting any labels or row or column totals that might surround the table.

In this case the relevant range has been range-named Counts.

StatPro then provides the message shown here, and it appends a sheet named ChiSqIndep that contains the calculations shown on the next slide.

Analysis

We interpret the p-value of the test, 0.045, in the usual way. Specifically, we can reject the null hypothesis of independence at the 5% or 10% significance level, but not at the 1% level.

There is a good bit of evidence that the demands for the two products are not independent, but it is not overwhelming.

If we accept that there is some sort of dependence, we can use the output to examine its form.

Analysis -- continued

The two tables in rows 8-19 are especially helpful. If the demands were independent, the rows of the first table should be identical, and the columns of the second table should be identical.

This is because each row in the first table shows the distribution of desktop demand for each category of laptop demand, whereas each column in the second table shows the distribution of laptop demand for each category of desktop demand.

A close study of these percentages again provides some evidence that the two products are substitutes, but the evidence is not overwhelming.
