RESEARCH
IN
BIOSTAT
Submitted by:
Alonzo, Jessa Marie
Balbin, Carmina
Magalona, Norie Rose
Magsalin, Alexander Hubert
Natividad, Leslie Ann
Valdez, Darrel Jan
Submitted to:
Mr. Joselito Roque
(Professor)
1. Test of Hypothesis
A. Differentiate:
i. Null and Alternative Hypothesis
ii. One-tailed Test and Two-tailed Test
B. What is the Level of Significance / Critical Value?
C. Test of Significance
2. Differentiate Parametric and Non-parametric Test
3. Define and determine when it is appropriate to use:
- Z-test
- T-test
- Correlation and Regression
- Analysis of Variance
- Chi-square Test
Illustrate examples of each.
4. Types of Non-parametric Test - Valdez, Darrel Jan
Magalona, Norie Rose
Magsalin, Alexander Hubert
Balbin, Carmina
Natividad, Leslie Ann
Alonzo, Jessa Marie
1) Test of Hypothesis
A. Differentiate:
i. Null and Alternative Hypothesis
Null hypothesis
The null hypothesis is a hypothesis about a population parameter. The purpose of
hypothesis testing is to test the viability of the null hypothesis in the light of experimental
data. Depending on the data, the null hypothesis either will or will not be rejected as a
viable possibility.
Consider a researcher interested in whether the time to respond to a tone is affected by
the consumption of alcohol. The null hypothesis is that μ1 - μ2 = 0, where μ1 is the mean
time to respond after consuming alcohol and μ2 is the mean time to respond otherwise.
Thus, the null hypothesis concerns the parameter μ1 - μ2, and the null hypothesis is that the
parameter equals zero.
The null hypothesis is often the reverse of what the experimenter actually believes; it is
put forward to allow the data to contradict it. In the experiment on the effect of alcohol,
the experimenter probably expects alcohol to have a harmful effect. If the experimental
data show a sufficiently large effect of alcohol, then the null hypothesis that alcohol has
no effect can be rejected.
It should be stressed that researchers very frequently put forward a null hypothesis in the
hope that they can discredit it. For a second example, consider an educational researcher
who designed a new way to teach a particular concept in science, and wanted to test
experimentally whether this new method worked better than the existing method. The
researcher would design an experiment comparing the two methods. Since the null
hypothesis would be that there is no difference between the two methods, the researcher
would be hoping to reject the null hypothesis and conclude that the method he or she
developed is the better of the two.
The symbol H0 is used to indicate the null hypothesis. For the example just given, the null
hypothesis would be designated by the following symbols:
H0: μ1 - μ2 = 0
or by
H0: μ1 = μ2.
The null hypothesis is typically a hypothesis of no difference as in this example where it
is the hypothesis of no difference between population means. That is why the word "null"
in "null hypothesis" is used -- it is the hypothesis of no difference.
Despite the "null" in "null hypothesis," there are occasions when the parameter is not
hypothesized to be 0. For instance, it is possible for the null hypothesis to be that the
difference between population means is a particular value. Or, the null hypothesis could
be that the mean SAT score in some population is 600. The null hypothesis would then be
stated as H0: μ = 600. Although the null hypotheses discussed so far have all involved the
testing of hypotheses about one or more population means, null hypotheses can involve
any parameter. An experiment investigating the correlation between job satisfaction and
performance on the job would test the null hypothesis that the population correlation (ρ)
is 0. Symbolically, H0: ρ = 0.
Some possible null hypotheses are given below:
H0: μ = 0
H0: μ = 10
H0: μ1 - μ2 = 0
H0: ρ = .5
H0: ρ1 - ρ2 = 0
H0: μ1 = μ2 = μ3
H0: π1 - π2 = 0
When a one-tailed test is conducted, the null hypothesis includes the direction of the
effect. A one-tailed test of the differences between means might test the null hypothesis
that μ1 - μ2 ≥ 0. If M1 - M2 were much less than 0, then the null hypothesis would be
rejected in favor of the alternative hypothesis: μ1 - μ2 < 0.
Alternative hypothesis
In statistical hypothesis testing, the alternative hypothesis (or maintained
hypothesis or research hypothesis) and the null hypothesis are the two rival hypotheses
which are compared by a statistical hypothesis test. An example might be where water
quality in a stream has been observed over many years and a test is made of the null
hypothesis that there is no change in quality between the first and second halves of the
data against the alternative hypothesis that the quality is poorer in the second half of the record.
The concept of an alternative hypothesis in testing was devised by Jerzy
Neyman and Egon Pearson, and it is used in the Neyman-Pearson lemma. It forms a
major component in modern statistical hypothesis testing. However it was not part
of Ronald Fisher's formulation of statistical hypothesis testing, and he violently opposed
its use.[1] In Fisher's approach to testing, the central idea is to assess whether the observed
dataset could have resulted from chance if the null hypothesis were assumed to hold,
notionally without preconceptions about what other model might hold. Modern statistical
hypothesis testing accommodates this type of test since the alternative hypothesis can be
just the negation of the null hypothesis.
ii. One-tailed Test and Two-tailed Test
One-tailed test
A statistical test in which the critical region consists of all values of a test statistic that are
less than a given value or greater than a given value, but not both.
We choose a critical region. In a one-tailed test, the critical region will have just one part.
If our sample value lies in this region, we reject the null hypothesis in favour
of the alternative.
Suppose we are looking for a definite decrease. Then the critical region will be to the left.
Note, however, that in the one-tailed test the value of the parameter can be as high as you
like.
Example
Suppose we are given that X has a Poisson distribution and we want to carry out a hypothesis
test on the mean, λ, based upon a sample observation of 3.
Suppose the hypotheses are:
H0: λ = 9
H1: λ < 9
We want to test if it is "reasonable" for the observed value of 3 to have come from a Poisson
distribution with parameter 9. So what is the probability that a value as low as 3 has come
from a Po(9)?
P(X ≤ 3) = 0.0212 (this has come from a Poisson table)
The probability is less than 0.05, so there is less than a 5% chance that the value has come
from a Po(9) distribution. We therefore reject the null hypothesis in favour of the
alternative at the 5% level.
However, the probability is greater than 0.01, so we would not reject the null hypothesis in
favour of the alternative at the 1% level.
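As a quick check, the tail probability above can be computed directly; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import poisson

observed = 3
mu0 = 9  # mean under the null hypothesis H0: lambda = 9

# P(X <= 3) for X ~ Po(9); compare against the 5% and 1% levels.
p_value = poisson.cdf(observed, mu0)
print(f"P(X <= {observed}) = {p_value:.4f}")     # ~0.0212
print("Reject H0 at 5% level:", p_value < 0.05)  # True
print("Reject H0 at 1% level:", p_value < 0.01)  # False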
Two-tailed test
The two-tailed test is a statistical test used in inference, in which a given statistical
hypothesis, H0 (the null hypothesis), will be rejected when the value of the statistic is either
sufficiently small or sufficiently large. The test is named after the "tail" of data under the far
left and far right of a bell-shaped normal data distribution, or bell curve. However, the
terminology is extended to tests relating to distributions other than normal.
"In general a test is called two-sided or two-tailed if the null hypothesis is rejected for values
of the test statistic falling into either tail of its sampling distribution, and it is called one-
sided or one-tailed if the null hypothesis is rejected only for values of the test statistic falling
into one specified tail of its sampling distribution".[1] For example, if our alternative
hypothesis is μ ≠ 42.5, rejecting the null hypothesis of μ = 42.5 for small or for large
values of the sample mean, the test is called two-tailed or two-sided. If our alternative
hypothesis is μ > 1.4, rejecting the null hypothesis of μ = 1.4 only for large values of the
sample mean, it is then called one-tailed or one-sided.
If the distribution from which the samples are derived is considered to be normal, Gaussian,
or bell-shaped, then the test is referred to as a one- or two-tailed T test. If the test is
performed using the actual population mean and variance, rather than an estimate from a
sample, it would be called a one- or two-tailed Z test.
The statistical tables for Z and for t provide critical values for both one- and two-tailed tests.
That is, they provide the critical values that cut off an entire alpha region at one or the other
end of the sampling distribution as well as the critical values that cut off the 1/2 alpha regions
at both ends of the sampling distribution.
In a two-tailed test, we are looking for either an increase or a decrease. So, for example,
H0 might be that the mean is equal to 9 (as before). This time, however, H1 would be that the
mean is not equal to 9. In this case, therefore, the critical region has two parts.
Example
Let's test the parameter p of a Binomial distribution at the 10% level.
Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the
coin is fair. If the coin is fair, p = 0.5. Put this as the null hypothesis:
H0: p = 0.5
H1: p ≠ 0.5
Now, because the test is 2-tailed, the critical region has two parts. Half of the critical region
is to the right and half is to the left. So the critical region contains both the top 5% of the
distribution and the bottom 5% of the distribution (since we are testing at the 10% level).
If H0 is true, X ~ Bin(10, 0.5).
If the null hypothesis is true, what is the probability that X is 7 or above?
P(X ≥ 7) = 1 - P(X < 7) = 1 - P(X ≤ 6) = 1 - 0.8281 = 0.1719
Is this in the critical region? No, because the probability that X is at least 7 is not less than
0.05 (5%), which is what we need it to be.
So there is not significant evidence at the 10% level to reject the null hypothesis.
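The same calculation can be scripted; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import binom

n, p0, heads = 10, 0.5, 7

# Upper-tail probability P(X >= 7) = 1 - P(X <= 6) for X ~ Bin(10, 0.5).
p_upper = 1 - binom.cdf(heads - 1, n, p0)
print(f"P(X >= {heads}) = {p_upper:.4f}")     # ~0.1719

# Two-tailed test at the 10% level: each tail holds 5%.
print("In critical region:", p_upper < 0.05)  # False, so do not reject H0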
B. What is the Level of Significance / Critical Value?
Level of significance
In statistics, a result is called statistically significant if it is unlikely to have occurred
by chance. The phrase test of significance was coined by Ronald Fisher.
As used in statistics, significant does not mean important or meaningful, as it does in
everyday speech. For example, a study that included tens of thousands of participants might
be able to say with great confidence that residents of one city were more intelligent than
people of another city by 1/20 of an IQ point. This result would be statistically significant,
but the difference is small enough to be utterly unimportant. Many researchers urge that tests
of significance should always be accompanied by effect-size statistics, which approximate the
size and thus the practical importance of the difference.
The amount of evidence required to accept that an event is unlikely to have arisen by chance
is known as the significance level or critical p-value: in traditional Fisherian statistical
hypothesis testing, the p-value is the probability of observing data at least as extreme as that
observed, given that the null hypothesis is true. If the obtained p-value is small then it can be
said either the null hypothesis is false or an unusual event has occurred. It is worth stressing
that p-values do not have any repeat sampling interpretation.
An alternative statistical hypothesis testing framework is the Neyman-Pearson frequentist
school, which requires that both a null and an alternative hypothesis be defined and
investigates the repeat sampling properties of the procedure, i.e. the probability that a
decision to reject the null hypothesis will be made when it is in fact true and should not have
been rejected (this is called a "false positive" or Type I error) and the probability that a
decision will be made to accept the null hypothesis when it is in fact false (Type II error).
More typically, the significance level of a test is such that the probability of mistakenly
rejecting the null hypothesis is no more than the stated probability. This allows the test to be
performed using non-sufficient statistics, which has the advantage of reducing the
computational burden while wasting some information.
It is worth stressing that Fisherian p-values are philosophically different from Neyman-
Pearson Type I errors. This confusion is unfortunately propagated by many statistics
textbooks.
Use in practice
The significance level is usually denoted by the Greek symbol α (lowercase alpha). Popular
levels of significance are 5% (0.05), 1% (0.01) and 0.1% (0.001). If a test of
significance gives a p-value lower than the α-level, the null hypothesis is rejected. Such
results are informally referred to as 'statistically significant'. For example, if someone argues
that "there's only one chance in a thousand this could have happened by coincidence," a
0.001 level of statistical significance is being implied. The lower the significance level, the
stronger the evidence required. Choosing a level of significance is an arbitrary task, but for
many applications, a level of 5% is chosen, for no better reason than that it is conventional.
In some situations it is convenient to express the statistical significance as 1 - α. In general,
when interpreting a stated significance, one must be careful to note what, precisely, is being
tested statistically.
Different α-levels trade off countervailing effects. Smaller levels of α increase confidence in
the determination of significance, but run an increased risk of failing to reject a false null
hypothesis (a Type II error, or "false negative determination"), and so have less statistical
power. The selection of an α-level thus inevitably involves a compromise between
significance and power, and consequently between the Type I error and the Type II error.
More powerful experiments - usually experiments with more subjects or replications - can
obviate this choice to an arbitrary degree.
In some fields, for example nuclear and particle physics, it is common to express statistical
significance in units of "σ" (sigma), the standard deviation of a Gaussian distribution. A
statistical significance of "nσ" can be converted into a value of α via use of the error
function: α = 1 - erf(n/√2).
The use of σ implicitly assumes a Gaussian distribution of measurement values. For example,
if a theory predicts a parameter to have a value of, say, 100, and one measures the parameter
to be 109 ± 3, then one might report the measurement as a "3σ deviation" from the theoretical
prediction. In terms of α, this statement is equivalent to saying that "assuming the theory is
true, the likelihood of obtaining the experimental result by coincidence is 0.27%" (since
1 - erf(3/√2) = 0.0027).
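The conversion can be checked numerically; a minimal sketch in Python using only the standard library:

from math import erf, sqrt

def sigma_to_alpha(n_sigma):
    # Two-sided Gaussian tail probability for a deviation of n_sigma.
    return 1 - erf(n_sigma / sqrt(2))

print(f"{sigma_to_alpha(3):.4f}")  # 0.0027, matching the 3-sigma example above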
Fixed significance levels such as those mentioned above may be regarded as useful in
exploratory data analyses. However, modern statistical advice is that, where the outcome of a
test is essentially the final outcome of an experiment or other study, the p-value should be
quoted explicitly. And, importantly, it should be quoted whether or not the p-value is judged to be
significant. This is to allow maximum information to be transferred from a summary of the
study into meta-analyses.
Critical value
In differential topology, a critical value of a differentiable function f : M → N
between differentiable manifolds is the image f(x) of a critical point x.
The basic result on critical values is Sard's lemma. The set of critical values can be quite
irregular; but in Morse theory it becomes important to consider real-valued functions on a
manifold M, such that the set of critical values is in fact finite. The theory of Morse
functions shows that there are many such functions; and that they are even typical, or generic
in the sense of Baire category.
A critical value is used in significance testing. It is the value that a test statistic must exceed
in order for the null hypothesis to be rejected. For example, the critical value of t (with
12 degrees of freedom using the 0.05 significance level) is 2.18. This means that for the
probability value to be less than or equal to 0.05, the absolute value of the t statistic must be
2.18 or greater. It should be noted that the all-or-none rejection of a null hypothesis is not
recommended.
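The quoted critical value can be verified; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import t

# Two-tailed test at alpha = 0.05 with 12 degrees of freedom:
# each tail holds 0.025, so we need the 97.5th percentile.
critical = t.ppf(1 - 0.05 / 2, df=12)
print(f"{critical:.2f}")  # ~2.18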
Statistics
In statistics, a critical value is the value corresponding to a given significance level. This
cutoff value determines the boundary between those samples resulting in a test statistic that
leads to rejecting the null hypothesis and those that lead to a decision not to reject the null
hypothesis. If the absolute value of the calculated value from the statistical test is greater than
the critical value, then the null hypothesis is rejected in favour of the alternative hypothesis,
and vice versa. You can never 'accept' an alternative hypothesis, you can only reject the null
hypothesis in favour of the alternative.
C. Test of Significance
Once sample data has been gathered through an observational study or experiment, statistical
inference allows analysts to assess evidence in favor of some claim about the population
from which the sample has been drawn. The methods of inference used to support or reject
claims based on sample data are known as tests of significance.
Every test of significance begins with a null hypothesis H0. H0 represents a theory that has
been put forward, either because it is believed to be true or because it is to be used as a basis
for argument, but has not been proved. For example, in a clinical trial of a new drug, the null
hypothesis might be that the new drug is no better, on average, than the current drug. We
would write H0: there is no difference between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to
establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be
that the new drug has a different effect, on average, compared to that of the current drug. We
would write Ha: the two drugs have different effects, on average. The alternative hypothesis
might also be that the new drug is better, on average, than the current drug. In this case we
would write Ha: the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null
hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude
"reject Ha", or even "accept Ha".
If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is
true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting
the null hypothesis, then, suggests that the alternative hypothesis may be true.
Hypotheses are always stated in terms of a population parameter, such as the mean μ. An
alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a
parameter is either larger or smaller than the value given by the null hypothesis. A two-sided
hypothesis claims that a parameter is simply not equal to the value given by the null
hypothesis -- the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
H0: μ = k
Ha: μ > k
or
H0: μ = k
Ha: μ < k.
Hypotheses for a two-sided test for a population mean take the following form:
H0: μ = k
Ha: μ ≠ k.
A confidence interval gives an estimated range of values which is likely to include an
unknown population parameter, the estimated range being calculated from a given set of
sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics
Glossary v1.1)
Example
Suppose a test has been given to all high school students in a certain state. The mean test
score for the entire state is 70, with standard deviation equal to 10. Members of the school
board suspect that female students have a higher mean score on the test than male students,
because the mean score from a random sample of 64 female students is equal to 73.
Does this provide strong evidence that the overall mean for female students is higher?
The null hypothesis H0 claims that there is no difference between the mean score for female
students and the mean for the entire population, so that μ = 70. The alternative hypothesis
claims that the mean for female students is higher than the entire student population mean, so
that μ > 70.
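The example stops at stating the hypotheses; as a sketch of how the test statistic would then be computed (assuming the population standard deviation of 10 also applies to the sample of 64 female students), in Python with SciPy:

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar = 70, 10, 64, 73

z = (xbar - mu0) / (sigma / sqrt(n))  # (73 - 70) / 1.25 = 2.4
p_value = norm.sf(z)                  # upper-tail (one-sided) probability
print(f"z = {z:.2f}, p = {p_value:.4f}")  # z = 2.40, p ~ 0.0082: reject H0 at the 5% level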
2) Differentiate Parametric and Non-Parametric Test
Parametric statistics is a branch of statistics that assumes data have come from a type of probability
distribution and makes inferences about the parameters of the distribution. Most well-known
elementary statistical methods are parametric.
Generally speaking, parametric methods make more assumptions than non-parametric methods. If
those extra assumptions are correct, parametric methods can produce more accurate and precise
estimates. They are said to have more statistical power. However, if those assumptions are
incorrect, parametric methods can be very misleading. For that reason they are often not
considered robust. On the other hand, parametric formulae are often simpler to write down and
faster to compute. In some, but definitely not all cases, their simplicity makes up for their non-
robustness, especially if care is taken to examine diagnostic statistics.
Because parametric statistics require a probability distribution, they are not distribution-free.
History
Statistician Jacob Wolfowitz coined the statistical term "parametric" in order to define its
opposite in 1942:
"Most of these developments have this feature in common, that the distribution functions of the
various stochastic variables which enter into their problems are assumed to be of known
functional form, and the theories of estimation and of testing hypotheses are theories of
estimation of and of testing hypotheses about, one or more parameters. . ., the knowledge of
which would completely determine the various distribution functions involved. We shall refer to
this situation. . .as the parametric case, and denote the opposite case, where the functional forms
of the distributions are unknown, as the non-parametric case."
Example
Suppose we have a sample of 99 test scores with a mean of 100 and a standard deviation of 10. If
we assume all 99 test scores are random samples from a normal distribution we predict there is a
1% chance that the 100th test score will be higher than 123.65 (that is the mean plus 2.365
standard deviations) assuming that the 100th test score comes from the same distribution as the
others. The normal family of distributions all have the same shape and are parameterized by
mean and standard deviation. That means if you know the mean and standard deviation, and that
the distribution is normal, you know the probability of any future observation. Parametric
statistical methods are used to compute the 2.365 value above, given
99 independent observations from the same normal distribution.
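The 2.365 figure appears to be the upper 1% point of a t distribution with 98 degrees of freedom; a sketch checking that assumption in Python with SciPy:

from scipy.stats import t

# 99th percentile of t with 99 - 1 = 98 degrees of freedom.
print(f"{t.ppf(0.99, df=98):.3f}")  # ~2.365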
A non-parametric estimate of the same thing is the maximum of the first 99 scores. We don't
need to assume anything about the distribution of test scores to reason that before we gave the
test it was equally likely that the highest score would be any of the first 100. Thus there is a 1%
chance that the 100th is higher than any of the 99 that preceded it.
Non-parametric test
In statistics, the term non-parametric statistics has at least two different meanings:
1. The first meaning of non-parametric covers techniques that do not rely on data belonging
to any particular distribution. These include, among others:
- distribution free methods, which do not rely on assumptions that the data are drawn
from a given probability distribution. As such it is the opposite of parametric statistics.
It includes non-parametric statistical models, inference and statistical tests.
- non-parametric statistics (in the sense of a statistic over data, which is defined to be a
function on a sample that has no dependency on a parameter), whose interpretation
does not depend on the population fitting any parametrized distributions. Statistics
based on the ranks of observations are one example of such statistics and these play a
central role in many non-parametric approaches.
2. The second meaning of non-parametric covers techniques that do not assume that the
structure of a model is fixed. Typically, the model grows in size to accommodate the
complexity of the data. In these techniques, individual variables are typically assumed to
belong to parametric distributions, and assumptions about the types of connections
among variables are also made. These techniques include, among others:
- non-parametric regression, which refers to modeling where the structure of the
relationship between variables is treated non-parametrically, but where nevertheless
there may be parametric assumptions about the distribution of model residuals.
- non-parametric hierarchical Bayesian models, such as models based on the Dirichlet
process, which allow the number of latent variables to grow as necessary to fit the
data, but where individual variables still follow parametric distributions and even the
process controlling the rate of growth of latent variables follows a parametric
distribution.
Applications and purpose
Non-parametric methods are widely used for studying populations that take on a ranked order
(such as movie reviews receiving one to four stars). The use of non-parametric methods may be
necessary when data have a ranking but no clear numerical interpretation, such as when
assessing preferences; in terms of levels of measurement, for data on an ordinal scale.
As non-parametric methods make fewer assumptions, their applicability is much wider than the
corresponding parametric methods. In particular, they may be applied in situations where less is
known about the application in question. Also, due to the reliance on fewer assumptions, non-
parametric methods are more robust.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even
when the use of parametric methods is justified, non-parametric methods may be easier to use.
Due both to this simplicity and to their greater robustness, non-parametric methods are seen by
some statisticians as leaving less room for improper use and misunderstanding.
The wider applicability and increased robustness of non-parametric tests comes at a cost: in cases
where a parametric test would be appropriate, non-parametric tests have less power. In other
words, a larger sample size can be required to draw conclusions with the same degree of
confidence.
Non-parametric models
Non-parametric models differ from parametric models in that the model structure is not
specified a priori but is instead determined from data. The term non-parametric is not meant to
imply that such models completely lack parameters but that the number and nature of the
parameters are flexible and not fixed in advance.
- A histogram is a simple nonparametric estimate of a probability distribution.
- Kernel density estimation provides better estimates of the density than histograms.
- Nonparametric regression and semiparametric regression methods have been developed
based on kernels, splines, and wavelets.
- Data Envelopment Analysis provides efficiency coefficients similar to those obtained
by Multivariate Analysis without any distributional assumption.
Methods
Non-parametric (or distribution-free) inferential statistical methods are mathematical
procedures for statistical hypothesis testing which, unlike parametric statistics, make no
assumptions about the probability distributions of the variables being assessed. The most
frequently used tests include
- Anderson-Darling test
- Cochran's Q
- Cohen's kappa
- Friedman two-way analysis of variance by ranks
- Kaplan-Meier
- Kendall's tau
- Kendall's W
- Kolmogorov-Smirnov test
- Kruskal-Wallis one-way analysis of variance by ranks
- Kuiper's test
- Logrank test
- Mann-Whitney U or Wilcoxon rank sum test
- median test
- Pitman's permutation test
- Rank products
- Siegel-Tukey test
- Spearman's rank correlation coefficient
- Wald-Wolfowitz runs test
- Wilcoxon signed-rank test

3) Define and determine when it is appropriate to use:
a. Z-test
It is a statistical test where normal distribution is applied and is basically used for dealing
with problems relating to large samples when n ≥ 30.
There are different types of Z-test, each for a different purpose. Some of the popular types are
outlined below:
1. z-test for single proportion is used to test a hypothesis on a specific value of the population
proportion.
Statistically speaking, we test the null hypothesis H0: p = p0 against the alternative hypothesis
H1: p ≠ p0, where p is the population proportion and p0 is a specific value of the population
proportion we would like to test for acceptance.
The example on tea drinkers given below requires this test. In that example, p0 = 0.5. Notice
that in this particular example, proportion refers to the proportion of tea drinkers.
2. z-test for difference of proportions is used to test the hypothesis that two populations have the
same proportion.
For example, suppose one is interested to test if there is any significant difference in the habit of
tea drinking between male and female citizens of a town. In such a situation, Z-test for difference
of proportions can be applied.
One would have to obtain two independent samples from the town, one from males and the other
from females and determine the proportion of tea drinkers in each sample in order to perform this
test.
3. z-test for single mean is used to test a hypothesis on a specific value of the population mean.
Statistically speaking, we test the null hypothesis H0: μ = μ0 against the alternative hypothesis
H1: μ ≠ μ0, where μ is the population mean and μ0 is a specific value of the population mean that we
would like to test for acceptance.
Unlike the t-test for single mean, this test is used if n ≥ 30 and the population standard deviation is
known.
4. z-test for single variance is used to test a hypothesis on a specific value of the population
variance.
Statistically speaking, we test the null hypothesis H0: σ² = σ0² against H1: σ² ≠ σ0², where σ² is the
population variance and σ0² is a specific value of the population variance that we would like to test
for acceptance.
In other words, this test enables us to test if the given sample has been drawn from a population
with specific variance σ0². Unlike the chi square test for single variance, this test is used if n ≥ 30.
5. Z-test for testing equality of variance is used to test the hypothesis of equality of two population
variances when the sample size of each sample is 30 or larger.
Example:
n = sample size
For example, suppose a person wants to test if both tea and coffee are equally popular in a
particular town. Then he can take a sample of size, say, 500 from the town, out of which suppose
280 are tea drinkers. To test the hypothesis, he can use a Z-test.
Assumption:
Irrespective of the type of Z-test used, it is assumed that the populations from which the
samples are drawn are normal.
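A minimal sketch of the tea-drinkers example above as a z-test for a single proportion, in Python with SciPy, testing H0: p = 0.5 against H1: p ≠ 0.5:

from math import sqrt
from scipy.stats import norm

n, successes, p0 = 500, 280, 0.5
p_hat = successes / n  # observed proportion, 0.56

# Standard normal test statistic under H0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))  # two-tailed
print(f"z = {z:.2f}, p = {p_value:.4f}")  # z ~ 2.68, p ~ 0.007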
b. T-test
The Student's t-test is a statistical method that is used to see if two sets of data differ
significantly. The method assumes that the test statistic follows Student's t-distribution if the
null hypothesis is true. This null hypothesis will usually stipulate
that there is no significant difference between the means of the two data sets.
It is best used to try and determine whether there is a difference between two independent
sample groups. For the test to be applicable, the sample groups must be completely independent,
and it is best used when the sample size is too small to use more advanced methods.
Before using this type of test it is essential to plot the sample data from the two samples and
make sure that it has a reasonably normal distribution, or the Student's t-test will not be suitable.
It is also desirable to randomly assign samples to the groups, wherever possible.
Restrictions:
The two sample groups being tested must have a reasonably normal distribution. If the
distribution is skewed, then the Student's t-test is likely to throw up misleading results. The
distribution should have only one main peak (= mode) near the mean of the group.
If the data does not adhere to the above parameters, then either a large data sample is needed or,
preferably, a more complex form of data analysis should be used.
Results:
The Student's t-test can let you know if there is a significant difference in the means of the two
sample groups and disprove the null hypothesis. Like all statistical tests, it cannot prove anything, as
there is always a chance of experimental error occurring. But the test can support a hypothesis.
However, it is still useful for measuring small sample populations and determining if there is a
significant difference between the groups.
Example:
You might be trying to determine if there is a significant difference in test scores between
two groups of children taught by different methods.
The null hypothesis might state that there is no significant difference in the mean test scores of
the two sample groups and that any difference is down to chance.
The Student's t-test can then be used to try and disprove the null hypothesis.
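A minimal sketch of such a comparison in Python with SciPy; the two groups of scores are hypothetical, made up purely for illustration:

from scipy.stats import ttest_ind

method_a = [78, 85, 92, 74, 88, 81, 79, 90]  # hypothetical scores, teaching method A
method_b = [71, 80, 75, 69, 84, 73, 77, 70]  # hypothetical scores, teaching method B

# Independent two-sample t-test of H0: equal means.
t_stat, p_value = ttest_ind(method_a, method_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Reject H0 if p falls below the chosen significance level.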
c. Correlation and Regression
Correlation Types
Correlation is a measure of association between two variables. The variables are not designated
as dependent or independent. The two most popular correlation coefficients are: Spearman's
correlation coefficient rho and Pearson's product-moment correlation coefficient.
When calculating a correlation coefficient for ordinal data, select Spearman's technique. For
interval or ratio-type data, use Pearson's technique.
The value of a correlation coefficient can vary from minus one to plus one. A minus one
indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation.
A correlation of zero means there is no relationship between the two variables. When there is a
negative correlation between two variables, as the value of one variable increases, the value of
the other variable decreases, and vice versa. In other words, for a negative correlation, the
variables work opposite each other. When there is a positive correlation between two variables,
as the value of one variable increases, the value of the other variable also increases. The
variables move together.
The standard error of a correlation coefficient is used to determine the confidence intervals
around a true correlation of zero. If your correlation coefficient falls outside of this range, then it
is significantly different than zero. The standard error can be calculated for interval or ratio-type
data (i.e., only for Pearson's product-moment correlation).
The significance (probability) of the correlation coefficient is determined from the t-statistic. The
probability of the t-statistic indicates whether the observed correlation coefficient occurred by
chance if the true correlation is zero. In other words, it asks if the correlation is significantly
different than zero. When the t-statistic is calculated for Spearman's rank-difference correlation
coefficient, there must be at least 30 cases before the t-distribution can be used to determine the
probability. If there are fewer than 30 cases, you must refer to a special table to find the
probability of the correlation coefficient.
Example:
A company wanted to know if there is a significant relationship between the total number of
salespeople and the total number of sales. They collect data for five months.
Variable 1   Variable 2
207          6907
180          5991
220          6810
205          6553
190          6190
--------------------------------
Correlation coefficient = .921
Standard error of the coefficient = .068
t-test for the significance of the coefficient = 4.100
Degrees of freedom = 3
Two-tailed probability = .0263
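A sketch reproducing these figures in Python with SciPy:

from scipy.stats import pearsonr

salespeople = [207, 180, 220, 205, 190]
sales = [6907, 5991, 6810, 6553, 6190]

# Pearson product-moment correlation with a two-tailed p-value.
r, p_value = pearsonr(salespeople, sales)
print(f"r = {r:.3f}, p = {p_value:.4f}")  # r ~ .921, p ~ .0263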
Another Example:
Respondents to a survey were asked to judge the quality of a product on a four-point Likert scale
(excellent, good, fair, poor). They were also asked to judge the reputation of the company that
made the product on a three-point scale (good, fair, poor). Is there a significant relationship
between respondents' perceptions of the company and their perceptions of quality of the product?
Since both variables are ordinal, Spearman's method is chosen. The first variable is the rating for
the quality of the product. Responses are coded as 4=excellent, 3=good, 2=fair, and 1=poor. The
second variable is the perceived reputation of the company and is coded 3=good, 2=fair, and
1=poor.
Variable 1   Variable 2
4            3
2            2
1            2
3            3
4            3
1            1
2            1
-------------------------------------------
Correlation coefficient rho = .830
t-test for the significance of the coefficient = 3.332
Number of data pairs = 7
Probability must be determined from a table because of the small sample size.
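A sketch of the same calculation in Python with SciPy. Note that SciPy's implementation corrects for tied ranks, so its value comes out slightly below the .830 obtained from the simple formula rho = 1 - 6*sum(d^2)/(n*(n^2 - 1)):

from scipy.stats import spearmanr

quality = [4, 2, 1, 3, 4, 1, 2]     # 4=excellent ... 1=poor
reputation = [3, 2, 2, 3, 3, 1, 1]  # 3=good ... 1=poor

rho, p_value = spearmanr(quality, reputation)
print(f"rho = {rho:.3f}")  # ~.816 with the tie correction
# With only 7 pairs, the large-sample p-value is unreliable; use a table.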
Regression
Simple regression is used to examine the relationship between one dependent and one
independent variable. After performing an analysis, the regression statistics can be used to
predict the dependent variable when the independent variable is known. Regression goes beyond
correlation by adding prediction capabilities.
People use regression on an intuitive level every day. In business, a well-dressed man is
thought to be financially successful. A mother knows that more sugar in her children's diet
results in higher energy levels. The ease of waking up in the morning often depends on how late
you went to bed the night before. Quantitative regression adds precision by developing a
mathematical formula that can be used for predictive purposes.
For example, a medical researcher might want to use body weight (independent variable)
to predict the most appropriate dose for a new drug (dependent variable). The purpose of running
the regression is to find a formula that fits the relationship between the two variables. Then you
can use that formula to predict values for the dependent variable when only the independent
variable is known. A doctor could prescribe the proper dose based on a person's body weight.
The regression line (known as the least squares line) is a plot of the expected value of the
dependent variable for all values of the independent variable. Technically, it is the line that
"minimizes the squared residuals". The regression line is the one that best fits the data on a
scatterplot.
Using the regression equation, the dependent variable may be predicted from the
independent variable. The slope of the regression line (b) is defined as the rise divided by the
run. The y intercept (a) is the point on the y axis where the regression line would intercept the y
axis. The slope and y intercept are incorporated into the regression equation. The intercept is
usually called the constant, and the slope is referred to as the coefficient. Since the regression
model is usually not a perfect predictor, there is also an error term in the equation.
In the regression equation, y is always the dependent variable and x is always the
independent variable. Here are three equivalent ways to mathematically describe a linear
regression model.
y = intercept + (slope x) + error
y = constant + (coefficient x) + error
y = a + bx + e
The significance of the slope of the regression line is determined from the t-statistic. It is
the probability that the observed correlation coefficient occurred by chance if the true correlation
is zero. Some researchers prefer to report the F-ratio instead of the t-statistic. The F-ratio is equal
to the t-statistic squared.
The t-statistic for the significance of the slope is essentially a test to determine if the
regression model (equation) is usable. If the slope is significantly different than zero, then we
can use the regression model to predict the dependent variable for any value of the independent
variable.
On the other hand, take an example where the slope is zero. It has no prediction ability
because for every value of the independent variable, the prediction for the dependent variable
would be the same. Knowing the value of the independent variable would not improve our ability
to predict the dependent variable. Thus, if the slope is not significantly different than zero, don't
use the model to make predictions.
The coefficient of determination (r-squared) is the square of the correlation coefficient.
Its value may vary from zero to one. It has the advantage over the correlation coefficient in that it
may be interpreted directly as the proportion of variance in the dependent variable that can be
accounted for by the regression equation. For example, an r-squared value of .49 means that 49%
of the variance in the dependent variable can be explained by the regression equation. The other
51% is unexplained.
The standard error of the estimate for regression measures the amount of variability in the points
around the regression line. It is the standard deviation of the data points as they are distributed
around the regression line. The standard error of the estimate can be used to develop confidence
intervals around a prediction.
Example:
A company wants to know if there is a significant relationship between its advertising
expenditures and its sales volume. The independent variable is advertising budget and the
dependent variable is sales volume. A lag time of one month will be used because sales are
expected to lag behind actual advertising expenditures. Data was collected for a six month
period. All figures are in thousands of dollars. Is there a significant relationship between
advertising budget and sales volume?
Indep. Var.   Depen. Var.
4.2           27.1
6.1           30.4
3.9           25.0
5.7           29.7
7.3           40.1
5.9           28.8
--------------------------------------------------
Model: y = 10.079 + (3.700 x) + error
Standard error of the estimate = 2.568
t-test for the significance of the slope = 4.095
Degrees of freedom = 4
Two-tailed probability = .0149
r-squared = .807
You might make a statement in a report like this: A simple linear regression was performed
on six months of data to determine if there was a significant relationship between advertising
expenditures and sales volume. The t-statistic for the slope was significant at the .05 critical
alpha level, t(4)=4.10, p=.015. Thus, we reject the null hypothesis and conclude that there was a
positive significant relationship between advertising expenditures and sales volume.
Furthermore, 80.7% of the variability in sales volume could be explained by advertising
expenditures.
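A sketch of the same regression in Python with SciPy; the coefficients come out close to, though not exactly matching, the figures quoted above, which may reflect rounding in the published example:

from scipy.stats import linregress

advertising = [4.2, 6.1, 3.9, 5.7, 7.3, 5.9]  # independent variable
sales = [27.1, 30.4, 25.0, 29.7, 40.1, 28.8]  # dependent variable

result = linregress(advertising, sales)
print(f"y = {result.intercept:.3f} + ({result.slope:.3f} x) + error")
print(f"r-squared = {result.rvalue ** 2:.3f}")
print(f"two-tailed p for the slope = {result.pvalue:.4f}")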
d. Analysis of Variance
An important technique for analyzing the effect of categorical factors on a response is to
perform an Analysis of Variance. An ANOVA decomposes the variability in the response
variable amongst the different factors. Depending upon the type of analysis, it may be important
to determine: (a) which factors have a significant effect on the response, and/or (b) how much of
the variability in the response variable is attributable to each factor.
STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:
1. One-Way ANOVA - used when there is only a single categorical factor. This is equivalent to
comparing multiple groups of data.
2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in a
crossed pattern. When factors are crossed, the levels of one factor appear at more than one level
of the other factors.
3. Variance Components Analysis - used when there are multiple factors, arranged in a
hierarchical manner. In such a design, each factor is nested in the factor above it.
4. General Linear Models - used whenever there are both crossed and nested factors, when some
factors are fixed and some are random, and when both categorical and quantitative factors are
present.
One-Way ANOVA
A one-way analysis of variance is used when the data are divided into groups according
to only one factor. The questions of interest are usually: (a) Is there a significant difference
between the groups?, and (b) If so, which groups are significantly different from which others?
Statistical tests are provided to compare group means, group medians, and group standard
deviations. When comparing means, multiple range tests are used, the most popular of which is
Tukey's HSD procedure. For equal size samples, significant group differences can be determined
by examining the means plot and identifying those intervals that do not overlap.
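STATGRAPHICS is the tool named above; the same one-way analysis can be sketched in Python with SciPy. The three groups of measurements are hypothetical:

from scipy.stats import f_oneway

group_a = [24.1, 25.3, 26.0, 24.8]  # hypothetical measurements, group A
group_b = [27.5, 28.1, 26.9, 27.7]  # hypothetical measurements, group B
group_c = [23.0, 22.4, 24.2, 23.5]  # hypothetical measurements, group C

# One-way ANOVA of H0: all group means are equal.
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value says at least one mean differs; multiple range tests
# such as Tukey's HSD then identify which groups differ.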
Multifactor ANOVA
When more than one factor is present and the factors are crossed, a multifactor ANOVA
is appropriate. Both main effects and interactions between the factors may be estimated. The
output includes an ANOVA table and a new graphical ANOVA from the latest edition of
Statistics for Experimenters by Box, Hunter and Hunter (Wiley, 2005). In a graphical ANOVA,
the points are scaled so that any levels that differ by more than is exhibited in the distribution of the
residuals are significantly different.
Variance Components Analysis
A Variance Components Analysis is most commonly used to determine the level at which
variability is being introduced into a product. A typical experiment might select several batches,
several samples from each batch, and then run replicate tests on each sample. The goal is to
determine the relative percentages of the overall process variability that is being introduced at
each level.
General Linear Model
The General Linear Models procedure is used whenever the above procedures are not
appropriate. It can be used for models with both crossed and nested factors, models in which one
or more of the variables is random rather than fixed, and when quantitative factors are to be
combined with categorical ones. Designs that can be analyzed with the GLM procedure include
partially nested designs, repeated measures experiments, split plots, and many others. For
example, pages 536-540 of the book Design and Analysis of Experiments (sixth edition) by
Douglas Montgomery (Wiley, 2005) contain an example of an experimental design with both
crossed and nested factors. For that data, the GLM procedure produces several important tables,
including estimates of the variance components for the random factors.
e. Chi-Square Test
Any statistical test that uses the chi square distribution can be called a chi square test. It is
applicable both for large and small samples, depending on the context.
There are different types of chi square test, each for a different purpose. Some of the popular
types are outlined below.
Chi square test for testing goodness of fit is used to decide whether there is any difference
between the observed (experimental) value and the expected (theoretical) value.
For example, given a sample, we may like to test if it has been drawn from a normal population.
This can be tested using the chi square goodness of fit procedure.
Chi square test for independence of two attributes. Suppose N observations are considered
and classified according to two characteristics, say A and B. We may be interested to test
whether the two characteristics are independent. In such a case, we can use the Chi square test
for independence of two attributes.
The example considered below, testing for independence of success in the English test vis-a-vis
immigrant status, is a case fit for analysis using this test.
Chi square test for single variance is used to test a hypothesis on a specific value of the
population variance. Statistically speaking, we test the null hypothesis H0: σ² = σ0² against the
research hypothesis H1: σ² ≠ σ0², where σ² is the population variance and σ0² is a specific value of
the population variance that we would like to test for acceptance.
In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance σ0². This is a small sample test to be used only if sample size is
less than 30 in general.
Example:
For example, suppose a person wants to test the hypothesis that the success rate in a particular
English test is similar for indigenous and immigrant students.
If we take a random sample of, say, size 80 students and measure both
indigenous/immigrant as well as success/failure status of each of the students, the chi square test
can be applied to test the hypothesis.
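A sketch of this example as a chi-square test of independence in Python with SciPy; the 2x2 counts are hypothetical, summing to the sample of 80 students:

from scipy.stats import chi2_contingency

#                 pass  fail
observed = [[28, 12],   # indigenous students (hypothetical counts)
            [22, 18]]   # immigrant students (hypothetical counts)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
# Check that every expected cell frequency is at least 5 (see Assumptions below).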
Assumptions:
The Chi square test for single variance has an assumption that the population from which
the sample has been drawn is normal. This normality assumption need not hold for the chi square goodness
of fit test and test for independence of attributes.
However, while implementing these two tests, one has to ensure that the expected frequency
in any cell is not less than 5. If it is so, then it has to be pooled with the preceding or succeeding
cell so that the expected frequency of the pooled cell is at least 5.
Non-Parametric and Distribution Free:
It has to be noted that the Chi square goodness of fit test and test for independence of
attributes depend only on the set of observed and expected frequencies and degrees of freedom.
These two tests do not need any assumption regarding distribution of the parent population from
which the samples are taken.
Since these tests do not involve any population parameters or characteristics, they are
also termed as non-parametric or distribution free tests. An additional important fact about these two
tests is that they are sample size independent and can be used for any sample size, as long as the
assumption on minimum expected cell frequency is met.
4) Types of Non-Parametric Test
Basically, there is at least one nonparametric equivalent for each parametric general type of test.
In general, these tests fall into the following categories:
- Tests of differences between groups (independent samples);
- Tests of differences between variables (dependent samples);
- Tests of relationships between variables.
Differences between independent groups. Usually, when we have two samples that we want to
compare concerning their mean value for some variable of interest, we would use the t-test for
independent samples; nonparametric alternatives for this test are the Wald-Wolfowitz runs test, the
Mann-Whitney U test, and the Kolmogorov-Smirnov two-sample test. If we have multiple groups,
we would use analysis of variance (see ANOVA/MANOVA); the nonparametric equivalents to this
method are the Kruskal-Wallis analysis of ranks and the Median test.
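A minimal sketch of one such alternative, the Mann-Whitney U test, in Python with SciPy; the two samples are hypothetical:

from scipy.stats import mannwhitneyu

group_1 = [12, 15, 9, 20, 17, 11]  # hypothetical measurements
group_2 = [8, 7, 13, 6, 10, 9]     # hypothetical measurements

u_stat, p_value = mannwhitneyu(group_1, group_2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")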
Differences between dependent groups. If we want to compare two variables measured in the same
sample we would customarily use the t-test for dependent samples (in Basic Statistics, for example, if
we wanted to compare students' math skills at the beginning of the semester with their skills at the
end of the semester). Nonparametric alternatives to this test are the Sign test and Wilcoxon's matched
pairs test. If the variables of interest are dichotomous in nature (i.e., "pass" vs. "no pass") then
McNemar's Chi-square test is appropriate. If there are more than two variables that were measured in
the same sample, then we would customarily use repeated measures ANOVA. Nonparametric
alternatives to this method are Friedman's two-way analysis of variance and the Cochran Q test (if the
variable was measured in terms of categories, e.g., "passed" vs. "failed"). Cochran Q is particularly
useful for measuring changes in frequencies (proportions) across time.
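A minimal sketch of the matched-pairs case using Wilcoxon's test, in Python with SciPy; the before/after scores are hypothetical:

from scipy.stats import wilcoxon

before = [62, 70, 58, 66, 73, 61, 69, 75]  # hypothetical scores, start of semester
after = [68, 74, 60, 72, 71, 66, 74, 80]   # hypothetical scores, end of semester

# Wilcoxon matched pairs (signed-rank) test of H0: no systematic change.
w_stat, p_value = wilcoxon(before, after)
print(f"W = {w_stat}, p = {p_value:.4f}")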
Relationships between variables. To express a relationship between two variables one usually
computes the correlation coefficient. Nonparametric equivalents to the standard correlation
coefficient are Spearman R, Kendall Tau, and coefficient Gamma (see Nonparametric correlations).
If the two variables of interest are categorical in nature (e.g., "passed" vs. "failed" by "male" vs.
"female") appropriate nonparametric statistics for testing the relationship between the two variables
are the Chi-square test, the Phi coefficient, and the Fisher exact test. In addition, a simultaneous test
for relationships between multiple cases is available: Kendall coefficient of concordance. This test is
often used for expressing inter-rater agreement among independent judges who are rating (ranking)
the same stimuli.
Descriptive statistics. When one's data are not normally distributed, and the measurements at best
contain rank order information, then computing the standard descriptive statistics (e.g., mean,
standard deviation) is sometimes not the most informative way to summarize the data. For example,
in the area of psychometrics it is well known that the rated intensity of a stimulus (e.g., perceived
brightness of a light) is often a logarithmic function of the actual intensity of the stimulus (brightness
as measured in objective units of Lux). In this example, the simple mean rating (sum of ratings
divided by the number of stimuli) is not an adequate summary of the average actual intensity of the
stimuli. (In this example, one would probably rather compute the geometric mean.) Nonparametrics
and Distributions will compute a wide variety of measures of location (mean, median, mode, etc.)
and dispersion (variance, average deviation, quartile range, etc.) to provide the "complete picture" of
one's data.
When to Use Which Method
It is not easy to give simple advice concerning the use of nonparametric procedures. Each
nonparametric procedure has its peculiar sensitivities and blind spots. For example, the Kolmogorov-
Smirnov two-sample test is not only sensitive to differences in the location of distributions (for
example, differences in means) but is also greatly affected by differences in their shapes. The
Wilcoxon matched pairs test assumes that one can rank order the magnitude of differences in
matched observations in a meaningful manner. If this is not the case, one should rather use the Sign
test. In general, if the result of a study is important (e.g., does a very expensive and painful drug
therapy help people get better?), then it is always advisable to run different nonparametric tests;
should discrepancies in the results occur contingent upon which test is used, one should try to
understand why some tests give different results. On the other hand, nonparametric statistics are less
statistically powerful (sensitive) than their parametric counterparts, and if it is important to detect
even small effects (e.g., is this food additive harmful to people?) one should be very careful in the
choice of a test statistic.
Large data sets and nonparametric methods. Nonparametric methods are most appropriate when the
sample sizes are small. When the data set is large (e.g., n > 100) it often makes little sense to use
nonparametric statistics at all. Elementary Concepts briefly discusses the idea of the central limit
theorem. In a nutshell, when the samples become very large, then the sample means will follow the
normal distribution even if the respective variable is not normally distributed in the population, or is
not measured very well. Thus, parametric methods, which are usually much more sensitive (i.e.,
have more statistical power) are in most cases appropriate for large samples. However, the tests of
significance of many of the nonparametric statistics described here are based on asymptotic (large
sample) theory; therefore, meaningful tests can often not be performed if the sample sizes become
too small. Please refer to the descriptions of the specific tests to learn more about their power and
efficiency.
Nonparametric Correlations
The following are three types of commonly used nonparametric correlation coefficients
(Spearman R, Kendall Tau, and Gamma coefficients). Note that the chi-square statistic computed for
two-way frequency tables also provides a careful measure of a relation between the two (tabulated)
variables, and unlike the correlation measures listed below, it can be used for variables that are
measured on a simple nominal scale.
Spearman R. Spearman R (Siegel & Castellan, 1988) assumes that the variables under
consideration were measured on at least an ordinal (rank order) scale, that is, that the individual
observations can be ranked into two ordered series. Spearman R can be thought of as the regular
Pearson product moment correlation coefficient, that is, in terms of proportion of variability
accounted for, except that Spearman R is computed from ranks.
Kendall tau. Kendall tau is equivalent to Spearman R with regard to the underlying assumptions. It
is also comparable in terms of its statistical power. However, Spearman R and Kendall tau are
usually not identical in magnitude because their underlying logic as well as their computational
formulas are very different. Siegel and Castellan (1988) express the relationship of the two measures
in terms of the inequality -1 ≤ 3 * Kendall tau - 2 * Spearman R ≤ 1. More importantly, Kendall tau and Spearman R imply different
interpretations: Spearman R can be thought of as the regular Pearson product moment correlation
coefficient, that is, in terms of proportion of variability accounted for, except that Spearman R is
computed from ranks. Kendall tau, on the other hand, represents a probability, that is, it is the
difference between the probability that in the observed data the two variables are in the same order
versus the probability that the two variables are in different orders.
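A sketch comparing the two coefficients and checking the inequality above on hypothetical data, in Python with SciPy:

from scipy.stats import kendalltau, spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical ranks
y = [2, 1, 4, 3, 6, 5, 8, 7]  # hypothetical ranks

tau, _ = kendalltau(x, y)
rho, _ = spearmanr(x, y)
print(f"tau = {tau:.3f}, rho = {rho:.3f}")  # tau ~ .714, rho ~ .905
print("-1 <= 3*tau - 2*rho <= 1:", -1 <= 3 * tau - 2 * rho <= 1)  # True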
Gamma. The Gamma statistic (Siegel & Castellan, 1988) is preferable to Spearman R or Kendall
tau when the data contain many tied observations. In terms of the underlying assumptions, Gamma is
equivalent to Spearman R or Kendall tau; in terms of its interpretation and computation it is more
similar to Kendall tau than Spearman R. In short, Gamma is also a probability; specifically, it is
computed as the difference between the probability that the rank ordering of the two variables agree
minus the probability that they disagree, divided by 1 minus the probability of ties. Thus, Gamma is
basically equivalent to Kendall tau, except that ties are explicitly taken into account.