SADC Course in Statistics: Introduction to Non-Parametric Methods (Session 19)



Learning Objectives

At the end of this session, you will be able to:

• Understand the general meaning of non-parametric methods and when they might be used

• Implement and interpret a simple non-parametric test, the sign test, and understand its advantages and limitations

• Appreciate some practical problems associated with non-parametric methods


An illustrative example

A random sample of 12 small businesses was asked “What percentage of last year’s profit was reinvested?”

Data: 5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7

A government official claims the real “average” is 10%.

How can this claim be tested?


Start by plotting

- A very skewed distribution

[Figure: Boxplot of % Reinvested (horizontal axis: reinvest, scale 5 to 25)]


Addressing the question …

• A one-sample t-test is often employed in such cases, but the procedure assumes normally distributed data

• This is clearly NOT the case here, and hence the validity of the t-test procedure is questionable
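
For reference, the one-sample t statistic that such a test would use can be computed by hand. The sketch below is only an illustration using the Python standard library; the variable names are assumptions, and the p-value (which needs the t distribution with n − 1 degrees of freedom) is deliberately left out.

```python
import math
import statistics

# Reinvestment percentages from the example (n = 12)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
mu0 = 10.0  # the official's claimed "average"

n = len(reinvest)
mean = statistics.mean(reinvest)
sd = statistics.stdev(reinvest)  # sample standard deviation (n - 1 divisor)

# t = (sample mean - hypothesised mean) / standard error of the mean
t_stat = (mean - mu0) / (sd / math.sqrt(n))

print(f"mean = {mean:.2f}, sd = {sd:.2f}, t = {t_stat:.3f} on {n - 1} df")
```

Whether this statistic can be trusted depends on the normality assumption discussed above.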


Robustness of the t-test

Recall the t-test is robust to departures from normality due to the Central Limit Theorem

We only need to worry if the sample size is quite small and/or the underlying distribution is very non-normal

Hence, we might be concerned here about applying a t-test in our example


Two alternative approaches

• Transformations: Are the measurements approximately normally distributed on a different measurement scale, e.g. a logarithmic scale? If so, analyse the data on the transformed scale

• Non-parametric methods: Utilise a technique that does not assume a normal distribution. Such methods are often collectively referred to as non-parametric methods …
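
As a rough illustration of the transformation idea, the sketch below compares a moment-based skewness coefficient for the example data on the original and the natural-log scale. Using skewness as the check is an assumption made here for illustration, not something the session prescribes; a plot on the log scale would serve the same purpose.

```python
import math

# Reinvestment percentages from the example (n = 12)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


log_reinvest = [math.log(v) for v in reinvest]

skew_raw = skewness(reinvest)
skew_log = skewness(log_reinvest)

print(f"skewness on original scale: {skew_raw:.2f}")
print(f"skewness on log scale:      {skew_log:.2f}")
```

The log scale reduces the skewness but does not remove it; whether the transformed data are close enough to normal remains a judgement call.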


Non-Parametric methods

• Non-parametric methods (or tests) derive their name from the fact that no explicit distribution (e.g. normal, gamma, …) is associated with the data

• Occasionally the techniques are called distribution-free methods, but assumptions may be made, e.g. a symmetrical distribution. Hence, the name is potentially misleading

• To illustrate the above we shall now apply a simple sign test to the example


Back to the example

• Let us make no assumption about the distribution of reinvestment percentages

• Having said this, the distribution is clearly very skewed. When attempting to summarise the “average” of such a distribution the median is a natural choice

– Sample median = 7.4%

• The median is a flexible summary and so hypotheses of interest are generally phrased in terms of a population median
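
The sample median quoted above can be checked directly. A minimal sketch with the standard library (variable names are illustrative); the mean is printed alongside to show how the long right tail pulls it above the median.

```python
import statistics

# Reinvestment percentages from the example (n = 12)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

# With an even sample size the median is the average of the two middle values
sample_median = statistics.median(reinvest)
sample_mean = statistics.mean(reinvest)

print(f"median = {sample_median:.1f}%, mean = {sample_mean:.1f}%")
```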


The sign test

Hypotheses:

H0: Population median = 10% vs.

H1: Population median ≠ 10%

Assumptions: Data values are independent. No distributional assumption is necessary

Logic: If H0 is true, then we would expect half of the observed values to fall below 10 and half above 10. How inconsistent is our data with this expectation?


Applying the sign test

• List the data in ascending order: 4.7, 5.1, …, 8.2, 11.6, …, 23.6

• If a value is < 10 assign a negative sign; if a value is > 10 assign a positive sign

• Under H0, we have a random sample of n=12 binary outcomes (– or +):

– – – – – – – – + + + +

• This gives 8 –ve and 4 +ve signs compared to the expected 6 and 6 respectively
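
The sign assignment can be reproduced in a few lines. This is a sketch with illustrative names; values exactly equal to the hypothesised median are skipped because they carry no sign (none occur in this data set).

```python
# Reinvestment percentages from the example (n = 12)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
m0 = 10.0  # hypothesised population median under H0

# Sort, then label each value "-" (below m0) or "+" (above m0)
signs = ["+" if v > m0 else "-" for v in sorted(reinvest) if v != m0]

n_pos = signs.count("+")
n_neg = signs.count("-")

print(" ".join(signs))
print(f"{n_neg} negative and {n_pos} positive signs; "
      f"expected {len(signs) / 2:.0f} of each under H0")
```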


Applying the sign test

• How unusual is this result under H0?

• A natural test statistic is literally the number of +ve signs [the choice –ve vs. +ve is arbitrary]

• A sufficiently small or large value is evidence to reject H0

• Under H0, R = number of +ve signs follows a binomial distribution with n=12 and p=0.5

– This is a symmetric distribution

• A two-sided p-value is then Prob(R ≤ 4) + Prob(R ≥ 8) = 2 Prob(R ≤ 4)
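
The two tail probabilities can be summed exactly from the Binomial(12, 0.5) distribution. The following is a minimal sketch; the helper name binom_cdf is an assumption introduced here, not a library function.

```python
from math import comb

n, r = 12, 4  # sample size and observed number of +ve signs


def binom_cdf(k, n, p=0.5):
    """P(R <= k) for R ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


lower_tail = binom_cdf(r, n)              # Prob(R <= 4)
upper_tail = 1 - binom_cdf(n - r - 1, n)  # Prob(R >= 8)

# The two tails are equal because the distribution is symmetric when p = 0.5
p_two_sided = min(1.0, lower_tail + upper_tail)

print(f"Prob(R <= 4) = {lower_tail:.4f}")
print(f"Prob(R >= 8) = {upper_tail:.4f}")
print(f"two-sided p  = {p_two_sided:.4f}")
```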


The p-value

• Using statistical software, e.g. Stata:

Two-sided test:

Ho: median of reinvest - 10 = 0 vs.

Ha: median of reinvest - 10 != 0

Pr(#positive >= 8 or #negative >= 8) = min(1, 2*Binomial(n = 12, x >= 8, p = 0.5)) = 0.3877

• P-value = 0.39

• This may be calculated by using the Excel BINOMDIST worksheet function


Conclusions

• The p-value is very large. Hence, there is no evidence to reject H0

• The estimated median reinvestment, 7.4%, is not significantly different from 10%

• There is no evidence based on this survey against the government official’s claim


Further notes

• P-value calculation

– The p-value may be approximated using the normal approximation to the binomial distribution

– Under H0, Z = (R − n/2) / (√n / 2) ~ N(0,1) approximately

– Compare Z with the tails of a N(0,1) distribution

– n > 20 will usually give a reasonable approximation
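
A sketch of the normal approximation applied to the example, using only the standard library. Under H0, R has mean n/2 and standard deviation √n/2, which gives the Z statistic. No continuity correction is applied here, since the notes above do not mention one; that choice is an assumption of this sketch.

```python
import math

n, r = 12, 4  # sample size and observed number of +ve signs

# Standardise R using its mean n/2 and standard deviation sqrt(n)/2 under H0
z = (r - n / 2) / (math.sqrt(n) / 2)


def normal_cdf(x):
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


# Two-sided p-value: probability in both tails beyond |z|
p_approx = 2 * normal_cdf(-abs(z))

print(f"Z = {z:.3f}, approximate two-sided p = {p_approx:.3f}")
```

With n = 12 the uncorrected approximation gives a p-value of roughly 0.25 against the exact 0.39, so it is rough for a sample this small; the conclusion (no evidence against H0) is unchanged.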


Further notes

• No signs

– If any value equals the hypothesised median of 10 then it is ignored and the sample size is reduced accordingly

• One-sided tests

– Although a two-sided example was discussed, one-sided tests are also possible
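
Both notes can be folded into one small function. The sketch below is an illustrative implementation, not the routine of any particular package; the function name sign_test and the alternative labels "less", "greater" and "two-sided" are assumptions chosen here.

```python
from math import comb


def sign_test(values, m0, alternative="two-sided"):
    """Exact sign test of H0: median = m0. Returns (n_pos, n_used, p_value)."""
    kept = [v for v in values if v != m0]  # values equal to m0 carry no sign
    n = len(kept)                          # reduced sample size
    n_pos = sum(v > m0 for v in kept)      # number of +ve signs

    def cdf(k):
        # P(R <= k) for R ~ Binomial(n, 0.5); equals 0 when k < 0
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "less":        # H1: median < m0 (few +ve signs)
        p = cdf(n_pos)
    elif alternative == "greater":   # H1: median > m0 (many +ve signs)
        p = 1 - cdf(n_pos - 1)
    elif alternative == "two-sided":
        p = min(1.0, 2 * min(cdf(n_pos), 1 - cdf(n_pos - 1)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return n_pos, n, p


reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
for alt in ("two-sided", "less", "greater"):
    n_pos, n_used, p = sign_test(reinvest, 10.0, alt)
    print(f"{alt:>9}: {n_pos} +ve signs out of {n_used}, p = {p:.4f}")
```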


Pros and cons of the sign test

Advantages

• Simple and logical

• Widely applicable

– Few assumptions

• Robust to outliers

– Recorded values are not used, only signs


Pros and cons of the sign test

Major Disadvantages

• Severe loss of information

– Recorded values not used, only signs

– Makes the sign test inefficient

• Confidence intervals (CIs)

– A CI for the true median can be constructed, but it is cumbersome

– Software packages tend not to present a CI for the median, instead concentrating on the p-value
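
To give a feel for what constructing such an interval involves, the sketch below uses one standard construction: pick order statistics from each end of the sorted sample so that the Binomial(n, 0.5) tail probability cut off on each side is at most α/2. This is an illustration under that assumption, not a method taken from the session, and the function name median_ci is hypothetical. Because the binomial distribution is discrete, the achieved coverage is at least, rather than exactly, the requested level.

```python
from math import comb

# Reinvestment percentages from the example (n = 12)
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]


def median_ci(values, conf=0.95):
    """Order-statistic CI for the median: (lower, upper, achieved coverage)."""
    x = sorted(values)
    n = len(x)
    alpha = 1 - conf

    def cdf(k):
        # P(R <= k) for R ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    # Find the largest k such that P(R <= k - 1) <= alpha / 2
    k = 0
    while cdf(k) <= alpha / 2:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")

    coverage = 1 - 2 * cdf(k - 1)
    # Interval runs from the k-th smallest to the k-th largest observation
    return x[k - 1], x[n - k], coverage


lower, upper, coverage = median_ci(reinvest)
print(f"CI for the median: ({lower}, {upper}), coverage = {coverage:.3f}")
```

For the example this takes the 3rd smallest and 3rd largest observations. The hypothesised value of 10% lies inside the interval, in line with the sign test not rejecting H0.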


Concluding remarks

• Non-parametric methods generally concentrate on hypothesis testing, and hence the p-value

• The lack of confidence intervals is a major disadvantage

• We shall return to these issues in Session 20


References

The two references below apply to both Sessions 19 and 20 and also to non-parametric methods in general.

• Conover, W.J. (1999) Practical Nonparametric Statistics. 3rd edn. Wiley, pp. 584.

• Sprent, P. (1993) Applied Nonparametric Statistical Methods. 2nd edn. Chapman and Hall, London.


Practical work follows …
