
Nonparametric tests

European Molecular Biology Laboratory
Predoc Bioinformatics Course

17th Nov 2009

Tim Massingham

What is a nonparametric test?

Parametric: assume the data come from some family of distribution functions

Normal distribution
• mean
• variance

Gamma distribution
• shape
• scale

etc…

[Figure: Gamma distribution with different parameters]

Shape   Scale
1       2
2       2
3       2
5       1
10      0.5

Non-parametric means that no particular family of distributions is assumed

Generally means just looking at the ranks of the data

Most traditional tests assume a normal distribution

Robustness

A single observation can change the outcome of many tests

Robust tests are resistant to outliers but require more data

200 observations from normal distributions: x ~ normal(0,1), y ~ normal(1,3)

Pearson’s correlation test

Correlation = -0.05076632 (p-value = 0.02318)
Correlation = -0.06499109 (p-value = 0.003632)
Correlation = -0.1011426 (p-value = 5.81e-06)
Correlation = 0.1204287 (p-value = 6.539e-08)

Robustness: Spearman’s correlation test (non-parametric) compared with Pearson’s correlation test (parametric)
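The effect of a single outlier on the two correlation measures can be sketched as follows. This is an illustration on simulated data, not the data behind the slides: the seed, the position of the outlier, and the reading of normal(1,3) as mean 1 and standard deviation 3 are all assumptions.

```r
# Sketch: effect of one outlier on Pearson vs. Spearman correlation.
set.seed(1)                        # arbitrary seed, for reproducibility
x <- rnorm(200, mean = 0, sd = 1)
y <- rnorm(200, mean = 1, sd = 3)  # assumption: the "3" in normal(1,3) is the sd

# Add a single extreme observation (hypothetical outlier)
x_out <- c(x, 20)
y_out <- c(y, 60)

pearson_clean  <- cor(x, y)
pearson_out    <- cor(x_out, y_out)
spearman_clean <- cor(x, y, method = "spearman")
spearman_out   <- cor(x_out, y_out, method = "spearman")

# How far does one observation move each estimate?
change <- c(pearson  = abs(pearson_out  - pearson_clean),
            spearman = abs(spearman_out - spearman_clean))
print(change)

# The corresponding tests on the contaminated data
print(cor.test(x_out, y_out, method = "pearson")$p.value)
print(cor.test(x_out, y_out, method = "spearman")$p.value)
```

Because Spearman’s test only sees ranks, the outlier counts as "the largest value" however extreme it is, so it can move the estimate only slightly.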

Newcomb’s speed of light data

Newcomb’s lab (1878)

Washington Monument (~12 µs later)

Standard test of all data
Mean 26.2, 95% confidence interval 23.6 to 28.9 (width = 5.3)

Newcomb dropped the outlier
Mean 27.3, 95% confidence interval 25.7 to 28.8 (width = 3.1)

Robust test (Sign test for median)
Median 27.0, 95% confidence interval 26.0 to 28.5 (width = 2.5)

Efficiency of robust tests

Few results, mostly for large samples
• Using median rather than mean: 50% more data
• Wilcoxon test vs. t-test: no more than 20% more data

Potvin and Roff (1993) Ecology 74:1617-1628

[Figure: percentage extra data for same tests; plotted values 5, 57, 31, 15, 100, 48, 25, 0, 0 and -33, 0, -17, -27, -21]

Asymptotic Relative Efficiency
• Asymptotic ≈ valid for large samples
• Relative efficiency ≈ ratio of variances


Negative percentages in the figure: the robust test requires less data!
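The "ratio of variances" idea can be checked by simulation. The sketch below compares the sampling variance of the median with that of the mean for normally distributed data; the sample size, number of replicates and seed are arbitrary choices. For normal data the standard large-sample value of this ratio is π/2 ≈ 1.57, broadly in line with the rough "50% more data" figure.

```r
# Sketch: relative efficiency of the median vs. the mean for normal data.
set.seed(42)
n    <- 101    # sample size (odd, so the median is a single observation)
reps <- 5000   # number of simulated samples

samples <- matrix(rnorm(n * reps), nrow = reps)  # one sample per row

var_mean   <- var(rowMeans(samples))             # sampling variance of the mean
var_median <- var(apply(samples, 1, median))     # sampling variance of the median

# Ratio of variances: roughly how much more data the median needs
rel_eff <- var_median / var_mean
print(rel_eff)
```

For heavy-tailed data the ratio can fall below 1, which is the situation where the robust estimate needs less data.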

Kolmogorov test

AKA Kolmogorov-Smirnov test

Type of data: continuous
Parametric equivalent: none
Distribution of statistic: exact when no ties in data

• Does this data follow a specific distribution?
• Are two sets of data from the same distribution?

Test statistic: maximum difference between the two cumulative distribution functions

Kolmogorov test: why does it work?

Rank differences stay constant under transformation: stretching and contracting the x axis does not change the statistic
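A minimal sketch of both points, on simulated data with an arbitrary seed: the statistic D is computed by hand as the largest gap between the empirical and hypothesised cumulative distribution functions, and the same D is obtained after stretching the x axis with exp(), provided the null distribution is transformed in the same way (normal becomes log-normal).

```r
# Sketch: the Kolmogorov statistic by hand, and its invariance under
# a monotone transformation of the x axis.
set.seed(7)
x <- rnorm(50)
n <- length(x)

# Largest gap between the empirical CDF and the hypothesised CDF,
# checked just after and just before each jump of the empirical CDF
Fx <- pnorm(sort(x))
D_hand <- max(c((1:n) / n - Fx, Fx - (0:(n - 1)) / n))

D_normal <- unname(ks.test(x, "pnorm")$statistic)

# exp() stretches the x axis; testing against the matching
# log-normal distribution gives the same statistic
D_lognormal <- unname(ks.test(exp(x), "plnorm")$statistic)

print(c(by_hand = D_hand, normal = D_normal, lognormal = D_lognormal))
```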

Kolmogorov test

For testing whether data is normally distributed or not, the Shapiro-Wilk test is preferred. See shapiro.test in R

Not valid when the null distribution has been fitted to the data, e.g. testing against a normal distribution after fitting its mean and variance

Is Studentized expression data normal?

ks.test(stud_logexp, pnorm)

One-sample Kolmogorov-Smirnov test

data: stud_logexp
D = 0.0526, p-value < 2.2e-16
alternative hypothesis: two-sided

Kolmogorov two-sample test

Are two sets of data from the same distribution?

Gene expression data from Arabidopsis thaliana
• sprayed with 1.6mM Tween
• sprayed with water

ks.test(logexp1,logexp2)

Two-sample Kolmogorov-Smirnov test

data: logexp1 and logexp2
D = 0.0207, p-value = 0.0001146
alternative hypothesis: two-sided

Biggest deviations for low expression

Sign test

Is the median of the data zero?
Is the median x? (Subtract x from data and test against zero)

Type of data: continuous
Parametric equivalent: Student’s t-test (one sample)
Distribution of statistic: exact when no ties in data

Each observation has a 50:50 chance of falling on either side of the median
Count them up and use a binomial test

Gene expression differences

<0      0     >0
12334   321   10155

binom.test( c(10155, 12334) )

Exact binomial test

data: c(10155, 12334)
number of successes = 10155, number of trials = 22489, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4450344 0.4580863
sample estimates:
probability of success
             0.4515541

Sign test: notes on the gene expression example

• Expect difference in expression to be zero
• Discard differences of exactly zero
• Confidence interval is on the proportion, not the expression difference

SIGN.test in the PASWR package is a more convenient way of doing a sign test and gives confidence intervals.
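The recipe above (subtract the hypothesised median, discard exact zeros, count the signs, apply a binomial test) can be wrapped in a few lines of base R. The helper name sign_test is hypothetical and the simulated data are made up; only the counts 10155 and 12334 come from the gene expression example.

```r
# Sketch: sign test built from binom.test.
# sign_test is a hypothetical helper, not a function from any package.
sign_test <- function(x, mu = 0) {
  d <- x - mu          # test "median = mu" by testing "median of d = 0"
  d <- d[d != 0]       # discard differences of exactly zero
  binom.test(sum(d > 0), length(d), p = 0.5)
}

# Counts from the gene expression example: 10155 positive, 12334 negative
res <- binom.test(10155, 10155 + 12334, p = 0.5)
print(res$estimate)    # proportion of positive differences
print(res$conf.int)    # interval for that proportion, not for the median

# Usage on simulated data (made up, true median 0.3)
set.seed(3)
sim <- rnorm(200, mean = 0.3)
print(sign_test(sim)$p.value)
```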

Wilcoxon Signed Rank test

Type of data: ordinal (interval for paired data)
Parametric equivalent: Student’s t-test
Distribution of statistic: exact

Is the data symmetric about zero?
Is the data symmetric about x? (Subtract x and test against zero)

Much stronger assumption than the sign test

Test rejects non-symmetric data, even when it is centred on its median (median = 0.72 in this example)

a <- rweibull(1000,1,1)
wilcox.test( a-median(a) )

p-value = 1.087e-05

Wilcoxon Signed Rank test

Special case when we do expect symmetry: paired data
• Same gene under two different conditions
• Measuring response (before and after)
• Paired control, e.g. sibling pairs

Look at a pair X & Y

X = Intrinsic + RandomX
Y = Intrinsic + RandomY

Random property
• measurement error
• natural variation

X - Y = RandomX - RandomY

The intrinsic part cancels, so when RandomX and RandomY have the same distribution the distribution of the difference is symmetric about zero


Have gene expression data in two matched Arabidopsis thaliana plants
• one sprayed with 1.6mM Tween and left for one hour
• one sprayed with distilled water and left for one hour

The genes form matched pairs

[Figure panels: Water, Tween, Difference]

wilcox.test( lexp1 , lexp2, paired=TRUE )

Wilcoxon signed rank test with continuity correction

data: lexp1 and lexp2
V = 108347390, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

wilcox.test( lexp1 , lexp2, paired=TRUE , conf.int=TRUE)

Wilcoxon signed rank test with continuity correction

data: lexp1 and lexp2
V = 108347390, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.05204535 -0.04207491
sample estimates:
(pseudo)median
   -0.04705803
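The "(pseudo)median" reported with conf.int=TRUE is the median of all pairwise averages of the differences (the Walsh averages), not the ordinary median. A small sketch on made-up differences, with an arbitrary seed and a sample small enough that wilcox.test uses its exact method:

```r
# Sketch: the pseudomedian as the median of the Walsh averages.
set.seed(11)
d <- rnorm(20, mean = 0.5)    # made-up paired differences

# Walsh averages: (d_i + d_j) / 2 for all i <= j
sums  <- outer(d, d, "+")
walsh <- sums[!lower.tri(sums)] / 2
pseudo_hand <- median(walsh)

pseudo_r <- unname(wilcox.test(d, conf.int = TRUE)$estimate)

print(c(by_hand = pseudo_hand, wilcox.test = pseudo_r, median = median(d)))
```

For data that really is symmetric, the pseudomedian and the median estimate the same quantity.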

Wilcoxon Rank Sum Test

Also referred to as Mann-Whitney or Mann-Whitney-Wilcoxon test

Type of data: ordinal
Parametric equivalent: two-sample Student’s t-test
Distribution of statistic: exact

Do two samples have the same median?

Look at same expression data but ignore pairing

wilcox.test( lexp1 , lexp2, conf.int=TRUE)

Wilcoxon rank sum test with continuity correction

data: lexp1 and lexp2
W = 256243890, p-value = 0.005504
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.08455685 -0.01295834
sample estimates:
difference in location
           -0.04910699

Paired vs two-sample tests

Pairing can make a huge difference to the power of a test

Look at a case where the variation in the intrinsic part is greater than the effect

wilcox.test(sample1,sample2)

Wilcoxon rank sum test

data: sample1 and sample2
W = 4930, p-value = 0.8652
alternative hypothesis: true location shift is not equal to 0

wilcox.test(sample1,sample2,paired=TRUE)

Wilcoxon signed rank test

data: sample1 and sample2
V = 1609, p-value = 0.001645
alternative hypothesis: true location shift is not equal to 0
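A situation of this kind can be simulated directly from the X = Intrinsic + Random model. The sketch below is not the data used for the output above; the sample size, standard deviations, effect size and seed are all assumptions chosen so that the intrinsic variation swamps the effect.

```r
# Sketch: paired vs. two-sample Wilcoxon tests when intrinsic
# variation is much larger than the effect.
set.seed(5)
n <- 100
intrinsic <- rnorm(n, mean = 0, sd = 10)  # large unit-to-unit variation
effect    <- 1                            # small shift between conditions

sample1 <- intrinsic + rnorm(n, sd = 1)
sample2 <- intrinsic + effect + rnorm(n, sd = 1)

# Ignoring the pairing: the effect is buried in the intrinsic variation
p_unpaired <- wilcox.test(sample1, sample2)$p.value

# Using the pairing: the intrinsic part cancels in each difference
p_paired <- wilcox.test(sample1, sample2, paired = TRUE)$p.value

print(c(unpaired = p_unpaired, paired = p_paired))
```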

Kruskal-Wallis

Type of data: ordinal
Parametric equivalent: ANOVA
Distribution of statistic: approximate

What if we have several groups?
Arabidopsis gene expression data consisted of 6 experiments

6 groups of expression data; do they have different medians?

kruskal.test(gene_expression)

Kruskal-Wallis rank sum test

data: gene_expression
Kruskal-Wallis chi-squared = 58.421, df = 5, p-value = 2.575e-11

For two samples, Kruskal-Wallis is equivalent to Wilcoxon Rank Sum
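The two-sample equivalence can be checked numerically. In this sketch (simulated data, arbitrary seed and sample sizes) the Wilcoxon test is run with the normal approximation and without continuity correction, since that is the version the Kruskal-Wallis chi-squared approximation corresponds to; with R's default settings the two p-values would be close but not identical.

```r
# Sketch: Kruskal-Wallis with two groups vs. Wilcoxon rank sum.
set.seed(9)
a <- rnorm(30)
b <- rnorm(40, mean = 0.5)

p_kw <- kruskal.test(list(a, b))$p.value

# Normal approximation, no continuity correction, to match Kruskal-Wallis
p_w <- wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value

print(c(kruskal = p_kw, wilcoxon = p_w))
```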

Friedman test

Which test for which design (Genes × Groups)
• Paired observations (two groups, G1 and G2): Wilcoxon Signed Rank test
• Many groups: Kruskal-Wallis test
• Many groups in distinct units: Friedman test

Type of data: ordinal
Parametric equivalent: ANOVA with blocks
Distribution of statistic: approximate

Friedman test

Classic example: wine tasting
Ask 4 women to rank 3 different wines. Is one wine preferred?

wine
        Merlot  Shiraz  Pinot Noir
Agnes   1       2       3
Clara   2       1       3
Mona    1       3       2
Pam     1       2       3

friedman.test(wine)

Friedman rank sum test

data: wine
Friedman chi-squared = 4.5, df = 2, p-value = 0.1054

Flip the question: are judges ranking wines in a consistent manner?

t(wine)
            Agnes  Clara  Mona  Pam
Merlot      1      2      1     1
Shiraz      2      1      3     2
Pinot Noir  3      3      2     3

friedman.test(t(wine))

Friedman rank sum test

data: t(wine)
Friedman chi-squared = 0.1429, df = 3, p-value = 0.9862

This result is expected, since the judges are forced to rank
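The Friedman statistic for the wine example can be reproduced by hand from the rank sums. The sketch below uses the textbook formula without a tie correction, which is sufficient here because no judge gives two wines the same rank; the matrix is retyped from the table above.

```r
# Sketch: Friedman statistic by hand for the wine tasting example.
wine <- matrix(c(1, 2, 3,
                 2, 1, 3,
                 1, 3, 2,
                 1, 2, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("Agnes", "Clara", "Mona", "Pam"),
                               c("Merlot", "Shiraz", "Pinot Noir")))

n <- nrow(wine)   # judges (blocks)
k <- ncol(wine)   # wines (groups)

ranks <- t(apply(wine, 1, rank))   # rank within each judge (already ranks here)
R <- colSums(ranks)                # rank sum for each wine

# Friedman chi-squared statistic (no tie correction)
Q <- 12 / (n * k * (k + 1)) * sum(R^2) - 3 * n * (k + 1)
p <- pchisq(Q, df = k - 1, lower.tail = FALSE)

print(R)
print(c(statistic = Q, p.value = p))
```

friedman.test treats rows as blocks and columns as groups, which is why transposing the matrix changes the question being asked.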

Friedman test

Another look at the Arabidopsis data: look at the first 20 genes

Gene  Exp 1   Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
1     1360.8  638.2   839.8   807.9   1252.4  1421.9
2     12.4    3.6     0.9     2.1     3.4     12.0
3     1297.0  1354.8  1401.5  1198.6  1017.4  1322.2
4     73.9    83.4    87.4    156.4   150.3   69.0
5     943.6   938.9   904.8   1133.4  958.2   940.1
6     1301.4  1089.4  1153.5  1173.5  1157.8  1337.5
7     908.4   837.0   795.4   1227.2  1008.2  1027.6
8     1585.4  1699.7  1747.8  2093.3  1851.6  2118.7
9     2837.8  3848.7  3960.2  3438.9  3608.9  3987.4
10    1498.7  1095.0  1213.8  1719.0  1914.5  1836.2
11    1296.1  1033.5  1212.2  1256.6  1333.1  1345.0
12    35.2    29.8    23.3    8.6     10.5    22.7
13    41.1    27.3    26.8    13.6    15.2    29.6
14    64.2    31.9    32.5    14.8    13.1    37.3
15    60.3    45.2    41.2    28.8    24.0    38.6
16    136.6   89.6    83.6    42.4    39.7    95.0
17    518.9   333.1   347.8   229.8   206.3   421.3
18    108.9   70.0    61.5    80.4    78.9    80.4
19    1516.0  967.1   1038.5  600.7   565.2   1381.3
20    1377.4  853.8   834.7   415.4   366.6   965.8

Friedman test: p-value = 0.0006611
Kruskal-Wallis test: p-value = 0.8761

Friedman test

Friedman test: p-value = 0.0006611

Friedman / Kruskal-Wallis: at least one experiment shows a difference
Does not say which experiment

Pairwise Wilcoxon Signed Rank tests (multiple comparisons problem)

Raw p-values

        Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
Exp 1   0.027   0.033   0.409   0.330   0.784
Exp 2           0.117   0.841   0.985   0.004
Exp 3                   0.869   0.927   0.001
Exp 4                           0.245   0.021
Exp 5                                   0.004

Adjusted p-values

        Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
Exp 1   0.293   0.328   1.000   1.000   1.000
Exp 2           1.000   1.000   1.000   0.051
Exp 3                   1.000   1.000   0.015
Exp 4                           0.245   0.248
Exp 5                                   0.051
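Adjusting a set of pairwise p-values can be sketched with p.adjust. The slides do not say which adjustment was used, so Holm's method (the default of p.adjust) is assumed here purely for illustration. The inputs are the rounded raw p-values from the table, so the output need not match the adjusted table digit for digit.

```r
# Sketch: multiple-comparisons adjustment of the 15 pairwise p-values.
# Assumption: Holm's method; the slides do not name the method used.
raw <- c(0.027, 0.033, 0.409, 0.330, 0.784,   # Exp 1 vs Exp 2..6
         0.117, 0.841, 0.985, 0.004,          # Exp 2 vs Exp 3..6
         0.869, 0.927, 0.001,                 # Exp 3 vs Exp 4..6
         0.245, 0.021,                        # Exp 4 vs Exp 5..6
         0.004)                               # Exp 5 vs Exp 6

adj <- p.adjust(raw, method = "holm")

print(round(adj, 3))
print(sum(raw < 0.05))   # comparisons below 0.05 before adjustment
print(sum(adj < 0.05))   # comparisons below 0.05 after adjustment
```

Given the full data, pairwise.wilcox.test with paired = TRUE runs the pairwise tests and the adjustment in one step.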

Friedman test: experiment map

Actually have three pairs of experiments
A  Exp 6 & Exp 1: with and without Tween, 1 hour
B  Exp 2 & Exp 3: with and without Tween, 2.5 hours
C  Exp 5 & Exp 4: with and without Tween, 1 hour (replicate of A)

Difference detected may not be a useful one

But note:
• Looked at first 20 genes
• Full set has 22810

Aside on blocking

[Figure: grid of Gene × Experiment]

Statistical lingo
• Experiments are “treatments”
• Genes are “blocks”

The Friedman test assumes that all treatments are applied to all blocks: a “balanced complete design”

Might not be able to do this
• too expensive
• blocks only available in packs of fixed size

Incomplete experimental design
Which treatments go with which blocks is a critical issue


Talk to a statistician before you start
