SADC Course in Statistics
Introduction to Non-Parametric Methods
(Session 19)
2To put your footer here go to View > Header and Footer
Learning Objectives
At the end of this session, you will be able to:
• Understand the general meaning of non-parametric methods and when they might be used
• Implement and interpret a simple non-parametric test, the sign test, and understand its advantages and limitations
• Appreciate some practical problems associated with non-parametric methods
An illustrative example
A random sample of 12 small businesses was asked “What percentage of last year’s profit was reinvested?”.
Data: 5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
5.9, 5.5, 11.6, 17.5, 8.2, 7.7
A government official claims the real “average” is 10%.
How can this claim be tested?
Start by plotting
- A very skewed distribution
[Figure: Boxplot of % Reinvested (axis: reinvest, 5 to 25)]
Addressing the question …
• A one-sample t-test is often employed in such cases, but the procedure assumes normally distributed data
• This is clearly NOT the case here, and hence the validity of the t-test procedure is questionable
Robustness of the t-test
• Recall that the t-test is robust to departures from normality because of the Central Limit Theorem
• We only need to worry if the sample size is quite small and/or the underlying distribution is very non-normal
• Hence, we might be concerned about applying a t-test in our example
Two alternative approaches
• Transformations
– Are the measurements approximately normally distributed on a different measurement scale, e.g. a logarithmic scale? If so, analyse the data on the transformed scale
• Non-Parametric methods
– Utilise a technique that does not assume a normal distribution. Such methods are often collectively referred to as non-parametric methods …
Non-Parametric methods
• Non-parametric methods (or tests) derive their name from the fact that no explicit distribution (e.g. normal, gamma, …) is associated with the data
• Occasionally the techniques are called distribution-free methods, but assumptions may be made, e.g. a symmetrical distribution. Hence, the name is potentially misleading
• To illustrate the above we shall now apply a simple sign test to the example
Back to the example
• Let us make no assumption about the distribution of reinvestment percentages
• Having said this, the distribution is clearly very skewed. When attempting to summarise the “average” of such a distribution, the median is a natural choice
– Sample median = 7.4%
• The median is a flexible summary and so hypotheses of interest are generally phrased in terms of a population median
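As a quick check of the summary above, the sample median can be recomputed from the 12 values on the example slide. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original session.

```python
from statistics import mean, median

# Reinvestment percentages for the 12 sampled businesses (from the example slide)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

# With n = 12 the median is the average of the 6th and 7th ordered values
sample_median = median(reinvest)

# The mean is shown only for comparison; the long right tail pulls it upwards
sample_mean = mean(reinvest)

print(f"median = {sample_median:.1f}%, mean = {sample_mean:.1f}%")
```

The mean sits above the median here, which is what one would expect for a right-skewed sample.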
The sign test
Hypotheses:
H0: Population median = 10% vs.
H1: Population median ≠ 10%
Assumptions: Data values are independent. No distributional assumption is necessary
Logic: If H0 is true, then we would expect half of the observed values to fall below 10 and half above 10. How inconsistent is our data with this expectation?
Applying the sign test
• List the data in ascending order: 4.7, 5.1, …, 8.2, 11.6, …, 23.6
• If a value is < 10 assign a negative sign; if a value is > 10 assign a positive sign
• Under H0, we have a random sample of n=12 binary outcomes (– or +):
– – – – – – – – + + + +
• This gives 8 –ve and 4 +ve signs compared to the expected 6 and 6 respectively
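The sign-assignment step can be sketched in a few lines of Python. This is an illustration only; the names `reinvest` and `hypothesised_median` are assumptions introduced here, and sorting is not actually needed to count the signs.

```python
# Reinvestment percentages for the 12 sampled businesses (from the example slide)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
hypothesised_median = 10.0  # the official's claimed "average"

# A value below the hypothesised median gets a negative sign, above it a positive sign
n_negative = sum(1 for x in reinvest if x < hypothesised_median)
n_positive = sum(1 for x in reinvest if x > hypothesised_median)

# Values exactly equal to the hypothesised median carry no sign
n_ties = sum(1 for x in reinvest if x == hypothesised_median)

print(f"-ve: {n_negative}, +ve: {n_positive}, ties: {n_ties}")
```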
Applying the sign test
• How unusual is this result under H0?
• A natural test statistic is literally the number of +ve signs [the choice –ve vs. +ve is arbitrary]
• A sufficiently small or large value is evidence to reject H0
• Under H0, R = number of +ve signs follows a binomial distribution with n=12 and p=0.5
– This is a symmetric distribution
• A two-sided p-value is then Prob(R ≤ 4) + Prob(R ≥ 8) = 2 Prob(R ≤ 4)
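For readers who want to see the arithmetic, the tail probability can be worked out directly from the binomial coefficients for n = 12 and p = 0.5. This worked step is an addition for illustration and uses only the quantities defined on this slide.

```latex
\[
\Pr(R \le 4) = \sum_{r=0}^{4} \binom{12}{r} \left(\tfrac{1}{2}\right)^{12}
             = \frac{1 + 12 + 66 + 220 + 495}{4096}
             = \frac{794}{4096} \approx 0.1938
\]
\[
\text{two-sided p-value} = 2 \Pr(R \le 4) \approx 2 \times 0.1938 \approx 0.3877
\]
```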
The p-value
• Using statistical software, e.g. Stata:
Two-sided test:
Ho: median of reinvest - 10 = 0 vs.
Ha: median of reinvest - 10 != 0
Pr(#positive >= 8 or #negative >= 8) =
min(1, 2*Binomial(n = 12, x >= 8, p = 0.5))= 0.3877
• P-value = 0.39
• This may be calculated by using the Excel BINOMDIST worksheet function
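The same calculation can be sketched without statistical software. The Python below is a minimal illustration using only the standard library; the helper name `binom_upper_tail` is hypothetical. In Excel, a formula along the lines of `=2*BINOMDIST(4,12,0.5,TRUE)` should give the same figure, since the final argument requests the cumulative probability Prob(R ≤ 4).

```python
from math import comb


def binom_upper_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


n = 12            # number of signed observations
larger_count = 8  # the larger of the +ve and -ve counts

# Two-sided p-value: double the tail probability, capped at 1
p_value = min(1.0, 2 * binom_upper_tail(n, larger_count))

print(round(p_value, 4))
```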
Conclusions
• The p-value (0.39) is large, well above conventional significance levels such as 0.05. Hence, there is no evidence to reject H0
• The estimated median reinvestment, 7.4%, is not significantly different from 10%
• There is no evidence based on this survey against the government official’s claim
Further notes
• P-value calculation
– The p-value may be approximated using the normal approximation to the binomial distribution
– Under H0, Z = (R − n/2) / (√n/2) ~ N(0,1) approximately
– Compare Z with the tails of a N(0,1) distribution
– n > 20 will usually give a reasonable approximation
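The normal approximation, Z = (R − n/2) / (√n/2), can be sketched as follows. This is an illustration under stated assumptions: no continuity correction is applied (the slide's formula does not include one), the standard normal CDF is built from `math.erf`, and the function names are hypothetical.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Cumulative distribution function of N(0,1), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sign_test_normal_approx(r, n):
    """Return (Z, two-sided p-value) for r positive signs out of n, under H0."""
    # Under H0, R ~ Binomial(n, 0.5): mean n/2, standard deviation sqrt(n)/2
    z = (r - n / 2) / (sqrt(n) / 2)
    p_two_sided = 2 * standard_normal_cdf(-abs(z))
    return z, p_two_sided


# The example from this session: 4 positive signs out of 12
z, p_approx = sign_test_normal_approx(r=4, n=12)
print(f"Z = {z:.3f}, approximate two-sided p = {p_approx:.3f}")
```

With n = 12 the approximate p-value comes out at roughly 0.25, noticeably different from the exact binomial value of 0.3877, which is in line with the guidance that the approximation is better suited to n > 20.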
Further notes
• No signs
– If any value equals the hypothesised median of 10 then it is ignored and the sample size is reduced accordingly
• One-sided tests
– Although a two-sided example was discussed, one-sided tests are also possible
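Both notes can be combined into one small routine. The sketch below is an assumption-laden illustration rather than a reference implementation: the function name `sign_test` and the `alternative` labels are introduced here, values equal to the hypothesised median are dropped before counting, and the one-sided p-values use the binomial tail in the direction of the alternative.

```python
from math import comb


def sign_test(values, median0, alternative="two-sided"):
    """Exact sign test of H0: population median = median0.

    Returns (number of +ve signs, number of -ve signs, p-value).
    """
    positives = sum(1 for v in values if v > median0)
    negatives = sum(1 for v in values if v < median0)
    n = positives + negatives  # values equal to median0 are ignored

    def upper_tail(k):
        # P(R >= k) for R ~ Binomial(n, 0.5)
        return sum(comb(n, r) for r in range(k, n + 1)) / 2**n

    if alternative == "two-sided":
        p_value = min(1.0, 2 * upper_tail(max(positives, negatives)))
    elif alternative == "greater":  # H1: median > median0, many +ve signs expected
        p_value = upper_tail(positives)
    elif alternative == "less":     # H1: median < median0, many -ve signs expected
        p_value = upper_tail(negatives)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return positives, negatives, p_value


reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

print(sign_test(reinvest, 10.0))                      # two-sided, as in the session
print(sign_test(reinvest, 10.0, alternative="less"))  # one-sided: median below 10%
print(sign_test(reinvest + [10.0], 10.0))             # an added tie leaves n at 12
```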
Pros and cons of the sign test
Advantages
• Simple and logical
• Widely applicable
– Few assumptions
• Robust to outliers
– Recorded values are not used, only signs
Pros and cons of the sign test
Major Disadvantages
• Severe loss of information
– Recorded values not used, only signs
– Makes the sign test inefficient
• Confidence intervals (CIs)
– A CI for the true median can be constructed, but it is cumbersome
– Software packages tend not to present a CI for the median, instead concentrating on the p-value
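To give a feel for why the CI is cumbersome, here is one common construction, sketched under assumptions: the interval runs between the k-th smallest and k-th largest observations, with k chosen as large as possible while keeping Prob(R ≤ k − 1) at or below half the allowed error, where R ~ Binomial(n, 0.5). The session does not say which method it has in mind, so the approach and the function name `median_confidence_interval` are illustrative only. The achieved coverage is usually somewhat above the nominal level because the binomial distribution is discrete.

```python
from math import comb


def median_confidence_interval(values, confidence=0.95):
    """Order-statistic CI for the median; returns (lower, upper, achieved coverage)."""
    data = sorted(values)
    n = len(data)
    alpha = 1 - confidence

    k = 0             # interval will be (k-th smallest, k-th largest)
    tail = 0.0        # P(R <= k - 1) for the chosen k
    cumulative = 0.0  # running value of P(R <= j), R ~ Binomial(n, 0.5)
    for j in range(n // 2):
        cumulative += comb(n, j) / 2**n
        if cumulative > alpha / 2:
            break
        k = j + 1
        tail = cumulative

    if k == 0:
        raise ValueError("sample too small for the requested confidence level")

    coverage = 1 - 2 * tail
    return data[k - 1], data[n - k], coverage


reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
lower, upper, coverage = median_confidence_interval(reinvest)
print(f"CI for the median: ({lower}, {upper}), coverage = {coverage:.3f}")
```

For the reinvestment data this gives the 3rd and 10th ordered values, 5.5% and 14.3%, an interval that contains the claimed 10% and so agrees with the sign test's failure to reject H0.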
Concluding remarks
• Non-parametric methods generally concentrate on hypothesis testing, and hence the p-value
• The lack of confidence intervals is a major disadvantage
• We shall return to these issues in Session 20
References
The two references below apply to both Sessions 19 and 20 and also to non-parametric methods in general.
• Conover, W.J. (1999) Practical Nonparametric Statistics, 3rd edn. Wiley, 584 pp.
• Sprent, P. (1993) Applied Nonparametric Statistical Methods, 2nd edn. Chapman and Hall, London.
Practical work follows …