CHAPTER 16 THE FURTHER DATA ANALYSIS

Page 1: CHAPTER 16    THE FURTHER DATA ANALYSIS

CHAPTER 16 THE FURTHER DATA ANALYSIS

Page 2: CHAPTER 16    THE FURTHER DATA ANALYSIS

16.1 Introduction

Page 3: CHAPTER 16    THE FURTHER DATA ANALYSIS

16.2 FURTHER DATA ANALYSIS (MEASURED V ATTRIBUTE)

FDA is a procedure that enables a decision to be made, based on the sample evidence, between two viewpoints:

There is no relationship
There is a relationship

These statistical procedures are called hypothesis tests.

Page 4: CHAPTER 16    THE FURTHER DATA ANALYSIS

Hypothesis: a statement about a population developed for the purpose of testing.

Hypothesis test: a procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.

Four stages of hypothesis tests
Stage 1: Specifying the hypotheses.
Stage 2: Defining the test parameters and the decision rule.
Stage 3: Examining the sample evidence.
Stage 4: The conclusions.

Page 5: CHAPTER 16    THE FURTHER DATA ANALYSIS

FDA for Measured v Attribute requires two different hypothesis tests:

Test 1: the attribute explanatory variable has exactly two levels
Test 2: the attribute explanatory variable has three or more levels

Page 6: CHAPTER 16    THE FURTHER DATA ANALYSIS

16.3 HYPOTHESIS TEST 1: Measured Response v Attribute Explanatory Variable with exactly two levels

Illustrative Example
Response Variable: AMOUNT spent on clothes per month
Attribute Explanatory Variable: GENDER (Male/Female)

If Males and Females have the same 'spending on clothes' characteristics, then the average amounts spent monthly by Males and by Females should be the same.

If Males and Females have different 'spending on clothes' characteristics, then the average amounts spent monthly by Males and by Females would be different.

Page 7: CHAPTER 16    THE FURTHER DATA ANALYSIS

The total population can be split into two or more sub-populations according to the level of the attribute: here, a population of Males and a population of Females.

Page 8: CHAPTER 16    THE FURTHER DATA ANALYSIS

POPULATION MEANS THE SAME

Page 9: CHAPTER 16    THE FURTHER DATA ANALYSIS

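Stage 1: Specifying the hypotheses.

NULL HYPOTHESIS: $H_0 : \mu_0 = \mu_1$

ALTERNATIVE HYPOTHESIS: $H_1 : \mu_0 \neq \mu_1$

where $\mu_0$ and $\mu_1$ are the population means of the two sub-populations.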

Page 10: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 2: The Decision Rule

Results of IDA for the Illustrative Example

Outcome 1
Male Mean = £45 (Stand Dev = £20)
Female Mean = £55 (Stand Dev = £20)
Not enough evidence to form a clear judgement; FDA is required.

Page 11: CHAPTER 16    THE FURTHER DATA ANALYSIS

Outcome 2
Male Mean = £45 (Stand Dev = £10)
Female Mean = £55 (Stand Dev = £10)
The widths of the boxes would lead to the decision from the IDA that there is definitely a link.

Page 12: CHAPTER 16    THE FURTHER DATA ANALYSIS

Outcome 3
Male Mean = £45 (Stand Dev = £40)
Female Mean = £55 (Stand Dev = £40)
FDA is required; the standard deviations are bigger than in Outcome 1.

Page 13: CHAPTER 16    THE FURTHER DATA ANALYSIS

Measure of Relative Separation of the boxplots

Considers not only the MEANS but also the STANDARD DEVIATIONS of the two samples.

Finding a "threshold value":
If Measure of Relative Separation > Threshold value, there is a connection.
If Measure of Relative Separation < Threshold value, there is no connection.

Page 14: CHAPTER 16    THE FURTHER DATA ANALYSIS

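Student's t Ratio (a measure of the relative separation of the boxplots)

Assumption: the sample data are Normally distributed.
Test: Student's t-test.
tcalc: the value of the t-ratio,

$$t_{calc} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\bar{X}_1, \bar{X}_2$ are the sample means, $s_1, s_2$ the sample standard deviations and $n_1, n_2$ the sample sizes of the two samples.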

Page 15: CHAPTER 16    THE FURTHER DATA ANALYSIS

Bigger |tcalc| means larger separation: Outcome 2 > Outcome 1 > Outcome 3.

Set up the decision rule.
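The ordering of the three outcomes can be checked numerically. The sketch below is only illustrative: the slides do not give the sample sizes, so a group size of 50 is assumed (the name `N` and the function `t_ratio` are hypothetical), and the t ratio is taken as the difference in sample means divided by its standard error.

```python
import math

def t_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t ratio: difference in sample means over its standard error."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

N = 50  # ASSUMPTION: hypothetical sample size per group; not given in the slides

# Common standard deviation of the Male and Female samples in each IDA outcome
outcomes = {"Outcome 1": 20, "Outcome 2": 10, "Outcome 3": 40}

# Male mean = 45, Female mean = 55 in every outcome
t_values = {name: t_ratio(45, sd, N, 55, sd, N) for name, sd in outcomes.items()}

for name, t in t_values.items():
    print(f"{name}: tcalc = {t:.2f}")
```

With the same means in every outcome, a smaller standard deviation gives a larger |tcalc|, which is why Outcome 2 shows the widest relative separation.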

Page 16: CHAPTER 16    THE FURTHER DATA ANALYSIS

Decision Rule

If the tcalc value lies within the range -tcrit to +tcrit, then the decision rule is flagging H0, supporting the viewpoint that there is no relationship.

If the tcalc value lies outside the range -tcrit to +tcrit, then the decision rule is flagging H1, supporting the viewpoint that there is a relationship.

Value of tcrit
Depends upon the sample size, through a measure called Degrees of Freedom (DF).
Can be looked up in statistical tables.

Page 17: CHAPTER 16    THE FURTHER DATA ANALYSIS

The hypothesis test described above is called Student's t-test and is a two-tailed test using the 5% level of significance.

Formally, the level of significance may be defined as the chance the tester is prepared to take of coming to the wrong conclusion about H0, that is, of rejecting H0 when it is in fact true.

Page 18: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 3: Doing the calculations

If the tcalc value lies within the range -tTable to +tTable, then the decision rule is flagging H0: there is no relationship.

If the tcalc value lies outside the range -tTable to +tTable, then the decision rule is flagging H1: there is a relationship.

Page 19: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 4: The conclusions

State the conclusions in terms of the original business problem specification.

For example: on the basis of the sample, there is evidence to suggest that there is a link between the amount spent on clothes and gender. Males spend on average about £45 per month and Females spend on average about £55.

Page 20: CHAPTER 16    THE FURTHER DATA ANALYSIS

Worked Example CREDIT IDA

Page 21: CHAPTER 16    THE FURTHER DATA ANALYSIS

FDA Stage 1: Define the hypotheses:

$\mu_0$: true average amount borrowed on credit for house owners
$\mu_1$: true average amount borrowed on credit for non house owners

$H_0 : \mu_0 = \mu_1$
$H_1 : \mu_0 \neq \mu_1$

Page 22: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 2: Defining the test parameters and the decision rule

Student's t-test

Page 23: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 3: Examining the sample evidence

MINITAB is used to do the calculations on the sample data.

tTable = 1.96
tcalc = -4.51 lies outside the range -1.96 to 1.96, so reject H0 and accept H1.
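The decision step for this worked example can be written out as a short sketch. The function name `t_test_decision` is hypothetical. The threshold 1.96 is reproduced here from the Normal distribution, on the assumption that the sample is large enough for the t threshold to be close to the Normal one; for small samples the value should come from t tables for the appropriate Degrees of Freedom.

```python
from statistics import NormalDist

def t_test_decision(t_calc, t_crit):
    """Two-tailed decision rule: flag H0 inside [-t_crit, +t_crit], H1 outside."""
    return "H0" if -t_crit <= t_calc <= t_crit else "H1"

# ASSUMPTION: large sample, so the 5% two-tailed threshold is taken from
# the Normal distribution (2.5% in each tail).
t_table = round(NormalDist().inv_cdf(0.975), 2)

# tcalc = -4.51 is the value reported in the slides
decision = t_test_decision(-4.51, t_table)
print(t_table, decision)
```

A value such as tcalc = 1.0 would fall inside the range and flag H0 instead.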

Page 24: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 4: The conclusions.

Based on the sample evidence there is a connection between Amount Borrowed on Credit and House-ownership. On average house owners borrow £869.50 and non house owners borrow £1009.00.

Page 25: CHAPTER 16    THE FURTHER DATA ANALYSIS

16.4 HYPOTHESIS TEST 2: Measured Response v Attribute Explanatory Variable with three or more levels

For example
Response variable: amount spent in a supermarket
Explanatory variable: the customer's marital status, with four categories: Single, Married, Divorced or Widowed

The common data analysis methodology applies and has the following three stages:
Initial Data Analysis
Further Data Analysis
Describing the Relationship

Page 26: CHAPTER 16    THE FURTHER DATA ANALYSIS

Example 1: No evidence of a connection.

Example 2: Some degree of separation Measure of relative separation

Page 27: CHAPTER 16    THE FURTHER DATA ANALYSIS

Hypothesis Test: Four stages
Stage 1: Specifying the hypotheses.
Stage 2: Defining the test parameters and the decision rule.
Stage 3: Examining the sample evidence.
Stage 4: The conclusions.

Page 28: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 1: Specifying the hypotheses.

By definition, if there is no connection then all the population means are equal, whilst if there is a connection at least one of the means must be different.

Null hypothesis
$H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4$

Alternative hypothesis
$H_1$ : at least one mean is different

Page 29: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 2: Defining the test parameters and the decision rule.

Decision rule: based on the F-Ratio.
Test procedure: Oneway Analysis of Variance (ANalysis Of VAriance: ANOVA).
Fcrit is the particular value of F that splits the area under the distribution in the proportions 95%/5%.

Page 30: CHAPTER 16    THE FURTHER DATA ANALYSIS

Decision rule

If the value of Fcalc is between 0 and Fcrit, then conclude that there is no link.

If the value of Fcalc is greater than Fcrit, then conclude that on the basis of the sample evidence there is a link.

Page 31: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 3: Examining the sample evidence

Example 1: Fcalc would be small. The F-Ratio is defined in such a way that if the null hypothesis is true, i.e. all the means are equal, then Fcalc is expected to be close to 1.

Example 2: Fcalc measures the relative separation; the wider the separation, the larger the Fcalc value.
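To make the link between separation and Fcalc concrete, here is a minimal sketch of the one-way ANOVA F-ratio, computed as the between-group mean square divided by the within-group mean square. The function name `f_ratio` and both data sets are hypothetical and chosen only so that one set has nearly equal group means and the other has well separated means.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)  # number of levels, total sample size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# HYPOTHETICAL data: group means almost equal (little separation)
close = [[10, 12, 14], [11, 13, 15], [9, 12, 15]]
# HYPOTHETICAL data: same spread within groups, means far apart
apart = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]

f_close = f_ratio(close)
f_apart = f_ratio(apart)
print(f"little separation: Fcalc = {f_close:.2f}")
print(f"wide separation:   Fcalc = {f_apart:.2f}")
```

The two degrees of freedom of the F-Ratio appear in the code as `k - 1` and `n - k`.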

Page 32: CHAPTER 16    THE FURTHER DATA ANALYSIS

To find the Threshold Value: Fcrit

The F-Ratio has two degrees of freedom (they depend on the sample size).
Look up the statistical tables: Ftable

Suppose: Fcalc = 8.91
The degrees of freedom are (3, 80)
Then Ftable = 2.72

Page 33: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 4: The conclusions.

Since the value of Fcalc (8.91) is larger than the value of Ftable (2.72), the conclusion is that, on the basis of the sample evidence, there is enough evidence to suggest that there is a link between the amount spent by customers in a supermarket and the customer's marital status. The remaining issue is to describe the connection.

Page 34: CHAPTER 16    THE FURTHER DATA ANALYSIS

Worked Example CREDIT data scenario

Question: Does the explanatory variable 'REGION' influence the response variable 'CREDIT'? That is, is the amount borrowed on credit dependent upon the region of the country where the customer lives?

Page 35: CHAPTER 16    THE FURTHER DATA ANALYSIS

IDA

Page 36: CHAPTER 16    THE FURTHER DATA ANALYSIS

FDA Stage 1: Specifying the hypotheses.

$H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_1$ : at least one mean is different

Stage 2: Defining the test parameters and the decision rule.

Page 37: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 3: Examining the sample evidence

MINITAB: ANOVA, ONE WAY

Analysis of Variance for CREDIT
Source   DF          SS       MS     F    P
REGION    4     3445125   861281  5.10  0.0
Error   649   109631953   168924
Total   653   113077078

Ftable = 2.39

Since Fcalc = 5.10 > Ftable = 2.39, the sample evidence is indicating a link between "Amount borrowed on credit" and "The region the customer lives in".
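The entries of the ANOVA table are related by simple arithmetic: each mean square (MS) is the sum of squares (SS) divided by its degrees of freedom (DF), and F is the ratio of the two mean squares. The sketch below recomputes them from the SS and DF figures in the MINITAB output; the variable names are illustrative only.

```python
# Figures copied from the MINITAB output in the slides
ss_region, df_region = 3445125, 4
ss_error, df_error = 109631953, 649

ms_region = ss_region / df_region  # mean square = sum of squares / DF
ms_error = ss_error / df_error
f_calc = ms_region / ms_error      # F-ratio = MS(REGION) / MS(Error)

f_table = 2.39  # threshold quoted in the slides
link = f_calc > f_table  # decision rule: Fcalc above the threshold flags a link

print(f"MS region = {ms_region:.0f}, MS error = {ms_error:.0f}")
print(f"Fcalc = {f_calc:.2f}, link = {link}")
```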

Page 38: CHAPTER 16    THE FURTHER DATA ANALYSIS

Stage 4: The conclusions

Examination of the average values shows London to be the region with the highest amount on credit, followed by the South-West and South-East with similar average credits, with the North having the lowest amount on credit.

REGION       AMOUNT
SOUTH-WEST   £977.10
SOUTH-EAST   £958.40
LONDON       £1061.80
MIDLANDS     £898.10
NORTH        £864.30

Page 39: CHAPTER 16    THE FURTHER DATA ANALYSIS

Examine the diagram displaying the 95% confidence intervals for each level of the attribute variable.

Interpretation: the decision rule is that if the confidence limits for two levels of the attribute don't overlap, then there is a real difference in the means for those two levels.

For example, Region 3, London, has an average amount on credit that is statistically significantly larger than the average amount on credit for Region 4, the Midlands, because the two confidence intervals don't overlap.

Page 40: CHAPTER 16    THE FURTHER DATA ANALYSIS

The final description of the link can be summarised as follows: the amount borrowed on credit in London is significantly higher than in the Midlands and the North.

          level 2         level 3         level 4         level 5
level 1   No Difference   No Difference   No Difference   No Difference
level 2                   No Difference   No Difference   No Difference
level 3                                   Difference      Difference
level 4                                                   No Difference
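The overlap rule can be sketched in code. The slides give only the regional means, so several things here are assumptions: equal group sizes of 130 per region (hypothetical; only the total of 654 customers is implied by the ANOVA table), intervals built from the pooled standard deviation (the square root of the Error mean square), and a Normal multiplier in place of a t value. Under those assumptions the sketch lists the pairs of regions whose 95% intervals do not overlap.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Regional means from the slides
means = {"SOUTH-WEST": 977.10, "SOUTH-EAST": 958.40, "LONDON": 1061.80,
         "MIDLANDS": 898.10, "NORTH": 864.30}

pooled_sd = math.sqrt(168924)  # square root of the Error mean square
n_per_region = 130             # ASSUMPTION: hypothetical equal group sizes

# Half-width of an approximate 95% confidence interval for each mean
half_width = NormalDist().inv_cdf(0.975) * pooled_sd / math.sqrt(n_per_region)
intervals = {r: (m - half_width, m + half_width) for r, m in means.items()}

def overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Pairs of regions flagged as different by the non-overlap rule
different = [(r1, r2) for r1, r2 in combinations(means, 2)
             if not overlap(intervals[r1], intervals[r2])]
print(different)
```

With real data the group sizes would come from the sample itself, and unequal sizes would give intervals of different widths.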
