Assessing Normality and Data Transformations

Tools for Assessing Normality

• Histogram and Boxplot
• Normal Quantile Plot (also called Normal Probability Plot)
• Goodness of Fit Tests
– Shapiro-Wilk Test (JMP)
– Kolmogorov-Smirnov Test (SPSS)
– Anderson-Darling Test (MINITAB)

Histograms and Boxplots

The cholesterol levels of the patients appear to be approximately normal, although there is some evidence of right skewness as the mean is larger than the median.

The red curve represents a normal distribution fit to these data and the blue curve the density estimate for these data; these curves should agree if our data is normally distributed.
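The mean-versus-median comparison can be checked numerically. The following is a minimal sketch, not the JMP output from the slide: the sample values are invented for illustration, and `skew_summary` is a hypothetical helper name. A mean above the median (and a positive skewness coefficient) hints at right skewness, but it is only a hint.

```python
import statistics

def skew_summary(values):
    """Return (mean, median, moment-based sample skewness g1)."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Second and third central moments (population form, divide by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = m3 / m2 ** 1.5
    return mean, median, skewness

# Hypothetical cholesterol-like values, invented for illustration only.
sample = [150, 165, 172, 180, 185, 190, 198, 205, 220, 260, 310]
mean, median, skewness = skew_summary(sample)
print(f"mean={mean:.1f}  median={median:.1f}  skewness={skewness:.2f}")
```

Here the long right tail pulls the mean above the median, which is the pattern the slide describes.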

Histograms and Boxplots

The systolic volumes of the male heart patients in this study suggest that they come from a right skewed population distribution.

The red curve represents a normal distribution fit to these data and the blue is the estimated density from the data which does not agree with the imposed normal.

Outliers are not consistent with normality.
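One common way a boxplot flags outliers is the 1.5 × IQR fence rule. The sketch below assumes that rule (the slide does not say which rule its software used), and the data values and the name `boxplot_outliers` are hypothetical.

```python
import statistics

def boxplot_outliers(values):
    """Return the values lying outside the 1.5 * IQR boxplot fences."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical right-skewed volumes, invented for illustration only.
volumes = [45, 50, 52, 55, 58, 60, 63, 66, 70, 75, 140]
print(boxplot_outliers(volumes))
```

Different packages compute quartiles slightly differently, so the fences (and the points flagged) may not match another tool exactly.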

Normal Quantile Plot

• Basically compares the spacing of our data to the spacing we would expect to see if our data were approximately normal.

If our data is approximately normally distributed, we should see spacing similar to what I attempted to show on the normal curve on the right: very few observations in both tails and increasingly more observations as we move towards the mean from either side. Also remember the spacing must be symmetric about the mean.

Normal Quantile Plot
THE IDEAL PLOT:

Here is an example where the data is perfectly normal. The plot on the right is a normal quantile plot with the data on the vertical axis and the expected z-scores if our data were normal on the horizontal axis.

When our data is approximately normal, the spacing of the two will agree, resulting in a plot with observations lying on the reference line in the normal quantile plot. The points should lie within the dashed lines.
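The coordinates of a normal quantile plot can be sketched as follows. This is an illustration under assumptions, not the software's exact method: the plotting position (i − 0.5)/n is one common convention (packages differ), and `normal_quantile_points` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Pair each sorted observation with its expected z-score.

    Returns (expected_z, observed_value) tuples: z on the horizontal
    axis, data on the vertical axis, as described on the slide.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position
        points.append((std_normal.inv_cdf(p), x))
    return points

for z, x in normal_quantile_points([3, 1, 2, 5, 4]):
    print(f"z={z:+.3f}  x={x}")
```

If the data are roughly normal, these points fall close to a straight line; curvature or stray points in the tails signal skewness or outliers.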

Tests of Normality

H0: The distribution of systolic volume is normal

HA: The distribution of systolic volume is NOT normal

Shapiro-Wilk

• A brief rundown of the strangeness associated with the Shapiro-Wilk test
– You fail to reject the null when your observed value is greater than your critical value (that’s right, the critical region on this test is in the small tail)
– The test actually pairs observations from within the sample to determine normality
– The number of pairs is determined by nearly the same equation that you would use to determine the median

So How Does It Work?

• The W-Statistic:

$$W = \frac{b^2}{(n-1)s^2}$$

• Recall that the variance of a sample is s²:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

• So really all we are required to give is the sum of the squared deviations from the mean (plus this term b²):

$$(n-1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

• b² is a bit more complex, but it is more odd than difficult
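The denominator identity, (n − 1)s² equals the sum of squared deviations from the mean, is easy to confirm numerically. A minimal sketch, using the six values 1, 2, 3, 4, 5, 100 purely as an illustrative sample:

```python
import math
import statistics

values = [1, 2, 3, 4, 5, 100]  # illustrative sample
n = len(values)
mean = statistics.fmean(values)

# Sum of squared deviations from the mean: the denominator of W.
sum_sq_dev = sum((x - mean) ** 2 for x in values)

# Sample variance s^2 (divides by n - 1).
s2 = statistics.variance(values)

# (n - 1) * s^2 recovers the same quantity.
assert math.isclose((n - 1) * s2, sum_sq_dev)
print(f"(n-1)s^2 = {(n - 1) * s2:.4f}, sum of squared deviations = {sum_sq_dev:.4f}")
```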

Strange, don’t you think?

• But first let’s show you the equation for b

$$b = \sum_{i=1}^{k} a_i \, (x_{n+1-i} - x_i)$$

Labels on the equation (data sorted from least to greatest):
– x_{n+1-i} − x_i: Big and Little Pairs
– a_i: weight (from math that you don’t want to have to learn); basically the weights are the result of an expected normal distribution and its resulting covariance matrix
– k: the position of the median

Getting to B-Squared

• The b term is actually a weighted comparison of all the pairs within the sample

• The way that it works is that you sort all of your data from least to greatest

• Then you create k pairs from the sample, with k = n/2 if n is even and k = (n+1)/2 if n is odd (note that k is the position of the median in the sorted sample, not the median value; when n is odd the middle observation is paired with itself and contributes zero)

• Each observation in a pair has a companion that is from the other end of the sample

• Example: given the following set of numbers: 1, 2, 3, 4, 5, 100, your pairs would be as follows:

• 100 and 1, 5 and 2, and 4 and 3
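The pairing step and the resulting W can be sketched in a few lines. This is an illustration, not the routine JMP runs: `extreme_pairs` and `shapiro_wilk_w` are hypothetical names, and the n = 6 weights below are as I recall them from the published Shapiro-Wilk coefficient table, so check them against your own table before relying on the number.

```python
def extreme_pairs(values):
    """Pair the largest with the smallest, second largest with second smallest, ..."""
    ordered = sorted(values)
    n = len(ordered)
    k = (n + 1) // 2  # n/2 if n is even, (n+1)/2 if n is odd
    return [(ordered[n - 1 - i], ordered[i]) for i in range(k)]

def shapiro_wilk_w(values, weights):
    """W = b^2 / sum of squared deviations, with b = sum a_i * (big - little)."""
    pairs = extreme_pairs(values)
    if len(weights) != len(pairs):
        raise ValueError("need exactly one weight per pair")
    b = sum(a * (big - little) for a, (big, little) in zip(weights, pairs))
    mean = sum(values) / len(values)
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return b * b / sum_sq_dev

# Assumed table weights for n = 6 (verify against a published table).
A_N6 = (0.6431, 0.2806, 0.0875)

data = [1, 2, 3, 4, 5, 100]
print(extreme_pairs(data))
print(round(shapiro_wilk_w(data, A_N6), 3))
```

The value 100 dominates both b and the sum of squared deviations, and W comes out far below 1, which is what we expect when one extreme observation breaks the normal spacing.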

Results

• W = 0.952165
• This isn’t very small, so we are going to fail to reject
• H0: Normal HA: Not Normal (note the wording here, we are not saying that this test shows that the data is normal, we are only saying that it fails to show that the data is not normal)
• W(critical) for 0.05 and n=20 is 0.905
• Note that our result of 0.952 has a p-value of around 0.40
• This sample is suitable for parametric analysis
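The decision rule is the reverse of most tests, so it may help to see it written out. A minimal sketch using the numbers on this slide; `shapiro_wilk_decision` is a hypothetical helper, and the critical value must still come from a table for the chosen significance level and n.

```python
def shapiro_wilk_decision(w_observed, w_critical):
    """Small W is evidence against normality: reject only when W is at or below the critical value."""
    if w_observed > w_critical:
        return "fail to reject H0"
    return "reject H0"

# Values from the slide: W = 0.952165, critical value 0.905 (alpha = 0.05, n = 20).
print(shapiro_wilk_decision(0.952165, 0.905))
```

"Fail to reject" keeps the careful wording from the slide: the test has not shown the data to be non-normal, which is weaker than showing that the data are normal.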

Shapiro-Wilk Tables

Pair Coefficients (weights)

Critical levels for significance
