Experiment Design & Statistical Analyses for FYP Students_June 2015

Experiment Design & Statistical Analyses for FYP Students

(A Refresher Biostatistics session for FYP students)

23 June 2015

Conducted by Dr Nevil

LSCT, NP

1

Outline of today's session

Experiment design & Testing hypothesis

Data validation & Analysis

Interpretation of results and making a conclusion

Statistical analyses in Excel: paired t-test, unpaired t-test, analysis of variance (ANOVA) and multiple comparison

Some examples of published scientific presentations in different types of charts

How to create a chart with error bars in Excel

2

Important Note: Any research involving human participants will require IRB approval. Any research involving animals will require IACUC approval.

The Most Important Points to note in your FYP in relation to statistics are

Experiment design & Hypothesis testing

Clear objectives

State your hypotheses: Null and Alternate (Research)

Sample size

Replication of experiment

Execution of experiment

Data collection and validation

Check your data for normal distribution and outliers

Set criteria

Significance level (α) = 0.05, 0.01 or 0.001

Test statistics

Choose the correct statistical test (refer to the flow chart in the annex, slide 58)

3

The Most Important points to note in your FYP in relation to statistics are

Interpretation of results & making conclusion

Interpret the result based on p-value and significance level.

Interpret the result by comparing the calculated value of the test statistic with the critical value.

Interpret the trendline (the relationship / association between 2 variables).

Make your conclusion on the null or alternative hypothesis.

Presentation of results

Choose suitable graph/table

Use error bars to compare the means of your data, or a trendline to show the relationship between variables.

Write a clear title, axis labels, legend and footnote.

What is Hypothesis Testing?

Hypothesis testing, otherwise called significance testing, is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample (e.g. the Mean height of the Male student population in NP is 170 cm).

What is the goal of hypothesis testing?

The goal of hypothesis testing is to determine how likely it is that a claim about a population parameter, such as the Mean height of Asian males, is true.

Null & Alternative Hypotheses (Two tail / Non-directional)

The null hypothesis, denoted by H0, is a tentative assumption about a population parameter. It is assumed to be true until the data provide evidence against it.

E.g. The Mean height of the Male student population in NP is 170 cm.

The alternative hypothesis, otherwise called the research hypothesis and denoted by Ha, is the opposite of what is stated in the null hypothesis. It states that the null hypothesis is NOT true.

E.g. The Mean height of the Male student population in NP is NOT 170 cm.

Researchers use data from sample(s) to test these two competing statements.

Researchers always challenge the null hypothesis, looking for evidence that it is NOT TRUE.

Tips to remember

Null means: always neutral, no effect, same, or no difference

Alternative means: not neutral, has an effect, not the same, different, increase or decrease

Three Forms for Null and Alternative Hypotheses

When we do hypothesis testing, we must follow one of the following three forms:

One-tailed (lower-tail): H0: μ ≥ μ0, Ha: μ < μ0

One-tailed (upper-tail): H0: μ ≤ μ0, Ha: μ > μ0

Two-tailed: H0: μ = μ0, Ha: μ ≠ μ0

E.g. H0: The height of the Asian male population is 173 cm. Ha: The height of the Asian male population is not 173 cm. This is the most commonly used form.

E.g. H0: The height of the Asian male population is equal to or more than 173 cm. Ha: The height of the Asian male population is less than 173 cm.

E.g. H0: The height of the Asian male population is equal to or less than 173 cm. Ha: The height of the Asian male population is more than 173 cm.

μ0 is the hypothesized population Mean; μ is the true population Mean.
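As a rough illustration of how the three forms change the decision, the sketch below runs a one-sample t-test by hand. The ten heights are made-up values, not data from the slides; the 173 cm figure is the hypothesized mean from the examples above, and the critical values are taken from a standard t table for 9 degrees of freedom at a 0.05 significance level.

```python
import math
import statistics

# Hypothetical sample of 10 heights in cm (illustrative values only).
heights = [171, 175, 168, 180, 174, 169, 177, 172, 176, 178]
mu0 = 173  # hypothesized population mean from the slide example

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)  # sample SD (n - 1 in the denominator)
t_stat = (mean - mu0) / (sd / math.sqrt(n))

# Critical t values for df = 9 at a 0.05 significance level (standard t table).
T_CRIT_TWO_TAILED = 2.262
T_CRIT_ONE_TAILED = 1.833


def decide(t, form):
    """Return True if H0 is rejected for the given form of the test."""
    if form == "two-tailed":   # Ha: mean differs from mu0
        return abs(t) > T_CRIT_TWO_TAILED
    if form == "lower-tail":   # Ha: mean is less than mu0
        return t < -T_CRIT_ONE_TAILED
    if form == "upper-tail":   # Ha: mean is more than mu0
        return t > T_CRIT_ONE_TAILED
    raise ValueError(f"unknown form: {form}")


for form in ("two-tailed", "lower-tail", "upper-tail"):
    print(form, "reject H0:", decide(t_stat, form))
```

With these made-up heights the sample mean is 174 cm and the t statistic is about 0.80, so H0 is not rejected under any of the three forms.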

Null & Alternative Hypotheses (One tail/Directional)

Another example: A new drug is believed to reduce the serum glucose level in diabetic patients more than the existing drugs do.

Null Hypothesis (H0): The new drug does not reduce the serum glucose level in diabetic patients more than the existing drugs.

Alternative Hypothesis (Ha): The new drug reduces the serum glucose level in diabetic patients more than the existing drugs.

Steps of hypothesis testing

1. State the hypotheses: Null and Alternate (Research).

2. Set the criteria for a decision (significance level, called alpha (α)).

3. Select and compute the test statistic(s).

4. Interpret the results

5. Make a decision.

Step 1. State the hypothesis

The null hypothesis (H0), stated as the null, is a statement about a parameter, such as the population Mean, that is assumed to be true.

The null hypothesis is a starting point. We will test whether the parameter (Population Mean) stated in the null hypothesis is likely to be true.

E.g. The Mean height of Male students population in NP is 170 cm.

An alternative hypothesis (Ha) is a statement that directly contradicts a null hypothesis by stating that the actual value of a parameter (Population Mean) is less than, greater than, or not equal to the value stated in the null hypothesis.

The alternative hypothesis states that the null hypothesis is wrong. E.g. The Mean height of the Male student population in NP is NOT 170 cm.

2. Set the criteria (α-value) to make a decision

To set the criteria, we need to state the level of significance, otherwise called alpha and indicated by the symbol α. Traditionally, the level of significance is set at α = 0.05. The other levels used are α = 0.01 and α = 0.001.

The level of significance, or significance level, is the cut-off point at which we say the calculated probability (p-value) is small enough to reject the null hypothesis, or too large to reject it.

Probability (p) value

The p-value, or calculated probability, is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis (H0) of a study question is true. The significance level α is the risk we are willing to take of rejecting H0 when it is in fact true.

If p is low, H0 must GO

Key points to remember

If the p-value is smaller than α, we say that the mean values are significantly different between groups. Hence, we reject the null hypothesis.

If the p-value is bigger than α, we say that the mean values are NOT significantly different between groups. Hence, we fail to reject (informally, "accept") the null hypothesis.

Significance level & Confidence level

Significance levels are related to confidence levels through the rule CL = 1 - α.

1-0.05 = 0.95

1-0.01 = 0.99

1-0.001 = 0.999

It is common to express the confidence level in %.

CL = 1-0.05 = 0.95 = 95%

CL = 1-0.01 = 0.99 = 99%

CL = 1-0.001 = 0.999 = 99.9%

3. Select and Compute the Test Statistic

To compute the test statistic, select and apply the appropriate statistical test (e.g. t-test, z-test or ANOVA), which allows researchers to determine the likelihood of obtaining the sample outcome if the null hypothesis were true.

The value of the test statistic is used to make a decision regarding the null hypothesis.

4. Make a decision

We use the result of the test statistic and the significance level to make a decision about the null hypothesis.

In practice, a researcher can make one of these two decisions:

1. Reject the null hypothesis when the p-value is less than the set α (e.g. 0.05). That means the difference between μ and μ0 is significant. The difference is unlikely to be caused by chance alone and is attributed to the experiment. The experiment has strong evidence to reject the null hypothesis.

2. Fail to reject (informally, "accept") the null hypothesis when the p-value is more than the set α (e.g. 0.05). That means the difference between μ and μ0 is NOT significant. The difference could be caused by chance and cannot be attributed to the experiment. The experiment does not have strong evidence to reject the null hypothesis.

P < 0.05: the difference is significant, reject H0.

P > 0.05: the difference is not significant, do not reject (accept) H0.
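The decision rule can be written as a small helper. This is only a sketch: the function name `decide` and the returned wording are illustrative choices, and the default significance level of 0.05 follows the convention stated on the slides.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule from the slides: reject H0 when p is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "significant: reject H0"
    return "not significant: do not reject H0"


# Hypothetical p-values, checked against the default alpha of 0.05.
for p in (0.003, 0.04, 0.20):
    print(p, "->", decide(p))
```

The same p-value can lead to different decisions at different significance levels, for example `decide(0.04)` against `decide(0.04, alpha=0.01)`, which is why α must be set before the test is run.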

Experiment Design - Objectives

Example 1: The aim of the project is to evaluate the effect of a new antibiotic on the mortality of bacterium sp. X

15

Example 2. The aim of the project is to study the effect of a new drug on reducing the blood serum glucose level of diabetic mice.

Example 3. The aim of the project is to evaluate the effect of a herbal supplement on the immune parameters of juvenile fish.

Let's take example 3 for hypothesis testing

Experiment design: Making Null and Alternative Hypotheses

E.g. Objective: The aim of the project is to evaluate the effect of a herbal supplement on the immune parameters of juvenile fish.

Write your null and alternative hypotheses and their symbols for this objective.

Null hypothesis (H0): The herbal supplement will not affect the immune parameters of juvenile fish.

Alternative hypothesis (Ha): The herbal supplement will affect the immune parameters of juvenile fish.

16

H0: μ = μ0

Ha: μ ≠ μ0

μ = value of true population mean

μ0 = value of hypothesized population mean

Is this hypothesis one-tailed or two-tailed?

17

Steps of Hypothesis Testing

Step 1: Develop Null and Alternative hypotheses

Refer to slides 6 & 7

Step 2: Set the criteria for decision

Level of significance (conventionally α = 0.05 or 0.01)

Step 3: Compute the test statistics

Apply an appropriate statistical test such as the t-test, z-test or ANalysis Of VAriance (ANOVA)

Step 4: Interpret results

Make your decision based on the set alpha (α) and the p-value, or based on the critical and calculated values.

Step 5: Make a decision / conclusion

When we reach statistical significance, the null hypothesis is rejected and the research (alternative) hypothesis is accepted.

When we fail to reach statistical significance, the null hypothesis is not rejected (informally, "accepted") and the research (alternative) hypothesis is not supported.

Sample Size

One of the main issues in experimental design is setting the sample size.

A larger sample size is generally desirable, because the data are then more likely to approximate a normal distribution and show less variation.

If your project involves a population study (e.g. a survey), the sample size should be at least 10% of that population. This number is a challenge in studies that require an animal model, due to cost and animal ethics issues.

The sample size required depends on the significance level your experiment needs to meet: a stricter significance level needs a bigger sample size.

Some textbooks regard a sample size of 30 and above as large for biological research. Below 30 is classified as a small sample size, which is also accepted in biological research.

It's a good practice to refer to scientific articles related to your project to set your sample size.

In general, clinical trials and population studies require large sample sizes.

Websites are available to calculate the sample size at different significance levels.

18

Replication of experiment

It is very important to obtain consistency and reproducibility in your experiment results.

Replication of the whole experiment is laborious, time-consuming and costly.

One way to address this issue is to have 3 or more replicates of samples within the same experiment.

However, in some long-term research or clinical studies, replication of the whole experiment is required.

19

Types of data you may collect in your experiments

Data type: Qualitative data

Description: Descriptive, observed, categorical or interpretive data

Examples: Colour of bacteria colonies (yellow, red, brown, grey); behaviour of mice fed on a caffeine diet; survey on elderly habits in taking prescribed medication

Types of stats analysis to apply: Classification; factor analysis; cluster analysis; prediction

Data type: Quantitative data (count / frequency data)

Examples: No. of larvae that survived after heat treatment; no. of viable cells after treatment with an anti-tumour drug

Types of stats analysis to apply: Chi-square test, Goodness of Fit

Data type: Quantitative data (measured data)

Examples: Length of fish larvae after 2 weeks of treatment with supplemented feed; amount of insulin in the blood of diabetic mice in relation to their glycemic load diet over 2 weeks

Types of stats analysis to apply: t-test, z-test, ANOVA, multiple comparison

20

Once you have designed your experiments, you will need to think about the data you will collect and which tests to apply.

Data validation prior to analysis

Check your data for normal distribution.

It can be done by subjecting your data to descriptive statistics.

It can be visualized in Excel with a scatter plot or by constructing a frequency histogram.

If your data points are normally distributed, you should see a roughly bell-shaped curve.

If your data points are normally distributed, proceed to parametric tests such as the paired t-test, unpaired t-test or ANOVA, whichever is suitable for your data.

If your data are not normally distributed, proceed to a non-parametric test such as the Wilcoxon signed-rank test (counterpart of the paired t-test), the Mann-Whitney test (counterpart of the unpaired t-test) or the Kruskal-Wallis test (counterpart of ANOVA), whichever is suitable for your data. Use the Chi-square test for categorical data.
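The pairing of parametric and non-parametric tests described above can be kept as a simple lookup. The sketch below is only a memory aid under the slide's own categories; the design labels ("paired", "unpaired", "more than 2 groups") and the function name `choose_test` are made up for illustration, and the annex flow chart remains the fuller reference.

```python
# Lookup of (study design, data normally distributed?) -> test named on the slide.
TESTS = {
    ("paired", True): "paired t-test",
    ("paired", False): "Wilcoxon signed-rank test",
    ("unpaired", True): "unpaired t-test",
    ("unpaired", False): "Mann-Whitney test",
    ("more than 2 groups", True): "ANOVA",
    ("more than 2 groups", False): "Kruskal-Wallis test",
}


def choose_test(design, normally_distributed):
    """Return the test the slide suggests for this design and distribution."""
    try:
        return TESTS[(design, normally_distributed)]
    except KeyError:
        raise ValueError(f"unknown design: {design!r}") from None


print(choose_test("paired", True))
print(choose_test("more than 2 groups", False))
```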

21

Refer to the annex (slide 58) for more information.

Some examples of statistical tests and their use

Descriptive statistics: preliminary check to validate your data

Paired t-test: compares the means of two groups that came from the same subjects. Used in before-and-after situations.

Unpaired t-test: compares the means of two independent groups. The sample sizes need not be equal and the data do not come from the same subjects.

ANalysis Of VAriance (ANOVA)*: compares the means of more than 2 groups by analysing their variances

Multiple comparison test*, e.g. Tukey's test: continuation of ANOVA to compare means in pairs

Correlation & Regression: show the relationship between 2 or more variables

Chi-square test: tests the distribution of categorical variables

22

23

Descriptive Statistics

Key in your data sets in Excel and follow the screenshots.

Descriptive statistics give a summary of your data such as the mean, median, mode, variance, standard deviation and other measures.

It is a preliminary check to get to know your data before analysis.

Descriptive statistics output in MS Excel

26

The important values you need to know from this table are the mean, standard deviation, skewness and kurtosis. Why?
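For readers who want to check the Excel output by hand, the sketch below computes the same summary values in Python. The five data points are made up. The skewness and kurtosis functions are intended to follow the sample formulas used by Excel's SKEW and KURT worksheet functions; values near zero for both are commonly read as consistent with a normal distribution.

```python
import statistics


def excel_skew(data):
    """Sample skewness, following the formula used by Excel's SKEW."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def excel_kurt(data):
    """Sample excess kurtosis, following the formula used by Excel's KURT."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical measurements (illustrative values only).
data = [10, 12, 14, 16, 18]

print("mean    :", statistics.mean(data))
print("median  :", statistics.median(data))
print("SD      :", round(statistics.stdev(data), 3))
print("skewness:", round(excel_skew(data), 3))
print("kurtosis:", round(excel_kurt(data), 3))
```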

Paired t-test

You can use the paired t-test when you have 2 groups of data.

Conditions

Data are normally distributed.

Samples are dependent; the sample sizes must be equal.

Used in before-and-after situations, in which the subjects are the same.

Two types of data collected from the same subjects.

Matching pairs by gender or age.

It is most commonly used in clinical trial studies to evaluate the efficacy of new drugs for various diseases and disorders before and after treatment.

27

28

Do you see a significant difference between S1 and S2?

Example of Paired t-test

In Excel it is called t-Test: Paired Two Sample for Means
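What the paired test computes can be sketched in a few lines: it is a one-sample t-test on the within-subject differences. The S1 and S2 readings below are made up and are not the values from the slide; the critical value is the two-tailed 0.05 entry for 4 degrees of freedom in a standard t table.

```python
import math
import statistics

# Hypothetical before (S1) and after (S2) readings on the same 5 subjects.
s1 = [120, 135, 128, 140, 132]
s2 = [115, 130, 126, 133, 129]

# The paired test works on the difference within each subject.
diffs = [before - after for before, after in zip(s1, s2)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sem_diff = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_diff / sem_diff

T_CRIT = 2.776  # two-tailed, alpha = 0.05, df = n - 1 = 4 (standard t table)
significant = abs(t_stat) > T_CRIT

print("mean difference:", mean_diff)
print("t statistic    :", round(t_stat, 3))
print("reject H0      :", significant)
```

With these made-up readings the t statistic is about 5.05, which is larger than the critical value, so H0 would be rejected.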


Unpaired or Independent sample t-test

You can use unpaired t-test

When you have 2 groups of data

Conditions

Data are normally distributed

Samples are independent

Sample sizes need not be equal

Data collected from different subjects

It is most commonly used to compare the Means of 2 groups.

Unpaired t- test

Example: A pair of students tested the effect of hand washing liquid on bacterial growth. The students prepared several petri dishes with culture medium. Hand washing liquid was added to some of the petri dishes and not to others. The same amount of bacteria was inoculated in all petri dishes, which were then incubated for 48 hrs. After 48 hrs, the number of colonies present in each petri dish was counted. The results are tabulated in the next slide.

What are your null and alternative hypotheses?

Unpaired or Independent sample t-test in Excel

In Excel it is called t-Test: Two-Sample Assuming Equal Variances

Unpaired or Independent sample t-test output in Excel

Interpretation: The P value is smaller than 0.05 and the t Stat value is larger than the t critical value. These values indicate that there is a significant difference between the control group and the treatment group in terms of bacterial colonies.

What is your conclusion on the null hypothesis?
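The equal-variance unpaired t-test can be checked by hand as in the sketch below. The colony counts are made up and are not the values tabulated on the slide; the critical value is the two-tailed 0.05 entry for 8 degrees of freedom in a standard t table.

```python
import math
import statistics

# Hypothetical colony counts per petri dish (illustrative values only).
control = [52, 48, 55, 50, 45]     # no hand washing liquid
treatment = [30, 34, 28, 33, 25]   # with hand washing liquid

n1, n2 = len(control), len(treatment)
var1 = statistics.variance(control)    # sample variance (n - 1 denominator)
var2 = statistics.variance(treatment)

# Pooled variance, as used when equal variances are assumed.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (statistics.mean(control) - statistics.mean(treatment)) / std_error

T_CRIT = 2.306  # two-tailed, alpha = 0.05, df = 8 (standard t table)
significant = abs(t_stat) > T_CRIT

print("pooled variance:", pooled_var)
print("t statistic    :", round(t_stat, 3))
print("reject H0      :", significant)
```

Here the t statistic (about 8.45) exceeds the critical value, which corresponds to the slide's reading that a calculated t larger than t critical indicates a significant difference.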

ANOVA: ANalysis Of VAriance

You can use ANOVA when

you have more than 2 groups of data to compare.

your data are normally distributed.

Sample sizes need not be the same.

Advantage

It can tell us whether there is an overall significant difference among the experimental groups.

Limitation

It doesn't tell us which group is significantly different from the other groups.
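A rough sketch of what a one-way ANOVA computes is given below: the variation between group means is compared with the variation within groups to give an F value. The four groups of readings are made up for illustration; the critical value is the 0.05 entry for 3 and 16 degrees of freedom in a standard F table.

```python
import statistics

# Hypothetical readings for four groups (illustrative values only).
groups = {
    "A": [1.0, 1.2, 1.1, 0.9, 1.3],
    "B": [1.5, 1.7, 1.6, 1.4, 1.8],
    "C": [1.1, 1.0, 1.2, 1.3, 0.9],
    "D": [2.0, 2.2, 2.1, 1.9, 2.3],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between-groups sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)
# Within-groups sum of squares: spread of the values around their own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 3.24  # alpha = 0.05, df = (3, 16), from a standard F table
significant = f_stat > F_CRIT

print("F statistic:", round(f_stat, 2))
print("reject H0  :", significant)
```

A significant F only says that at least one group mean differs; it does not say which one, which is the limitation noted above.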

34

One-way ANOVA (Anova: Single Factor in Excel)

Example: A biologist quantified the levels of cadmium present in different species of seaweed. He collected the same amount of 4 different species of seaweed and measured the cadmium level of each separately. The data are tabulated in the next slide.

What are your null and alternative hypotheses?

36

Before doing ANOVA, check the data for normality

Are data in this example normally distributed? Why?

37

ANOVA single factor in MS Excel

38

ANOVA single factor in MS Excel

What is your interpretation on the ANOVA results?

Do you accept or reject the null hypothesis? Why?

What is your next step in analysis? Why?

Multiple comparison: Tukey's test

You can use Tukey's test when ANOVA shows an overall significant difference (P < 0.05) in your data.

Tukey's test is able to tell you which group is significantly different from the others.

It makes pairwise comparisons to show the difference in Means.

It is a very useful test for multiple comparison, available in Excel through the Data Analysis Plus add-in.

40

How to do Multiple Comparisons in Excel?

Select data with labels > click Add-Ins > Data Analysis Plus > Multiple Comparisons > OK

Check Labels if you included labels when you selected the data; otherwise leave it unchecked. By default α is 0.05. Press OK (unless you need to change this).

You will see the multiple comparison output in another Excel sheet.

Multiple comparison in MS Excel (Tukey's Test)

41

42

Multiple comparison in MS Excel (Tukey's Test)

43

The Excel output of Tukey's test looks like this

How to interpret?

Look at the values in the difference column and the Omega column.

If the value in the difference column is LARGER than the value in the Omega column for a particular pair, the Mean difference between the pair is statistically significant.

If the value in the difference column is SMALLER than the value in the Omega column for a particular pair, the Mean difference between the pair is NOT statistically significant.

In your conclusion, if you have many pairs to interpret, you may write about the pairs that are significantly different.
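The reading rule above (difference versus Omega) can be sketched as follows. The group means and the Omega value are placeholders made up for illustration; in practice both would be read from the Excel multiple comparison output rather than typed in like this.

```python
from itertools import combinations

# Hypothetical group means and Omega value (placeholders, not real output).
group_means = {"A": 1.1, "B": 1.6, "C": 1.1, "D": 2.1}
omega = 0.29


def compare_pairs(means, omega_value):
    """Flag each pair whose absolute mean difference is larger than Omega."""
    results = []
    for (name1, mean1), (name2, mean2) in combinations(means.items(), 2):
        difference = abs(mean1 - mean2)
        results.append((name1, name2, difference, difference > omega_value))
    return results


for name1, name2, difference, significant in compare_pairs(group_means, omega):
    label = "significant" if significant else "not significant"
    print(f"{name1} vs {name2}: difference = {difference:.2f} ({label})")
```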

Scientific Presentation of Results

What you mainly need are:

Suitable chart

Mean values of results

Standard deviation or Standard error of mean

Confidence interval (in some cases)

Interpretation of results

Proper X and Y axes labels

Adequate title

Footnote

In the footnote, indicate the statistical output, the level of significance (α), the sample size, the test statistics used and whether the experiment was replicated.

44

You can refer to slides 46-50 for some examples of published scientific data presentations using different kinds of charts.

When do you use Standard deviation (SD), Standard error of mean (SEM) and Confidence Interval (CI) in your data presentation?

45

When your Goal is to SD SEM CI

show the variation in each group

show how precisely you have determined the mean.

compare the means of different groups.

Whatever you choose to show, be sure to state your choice in the footnote of your graph /table.
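To make the three measures concrete, the sketch below computes SD, SEM and a 95% confidence interval for one group. The five plant heights are made up; the critical t value is the two-tailed 0.05 entry for 4 degrees of freedom in a standard t table, and the interval is taken as mean plus or minus t times SEM.

```python
import math
import statistics

# Hypothetical plant heights in cm for one group (illustrative values only).
heights = [20, 22, 24, 26, 28]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)   # spread of the individual values
sem = sd / math.sqrt(n)          # precision of the estimated mean

T_CRIT = 2.776  # two-tailed, alpha = 0.05, df = n - 1 = 4 (standard t table)
ci_low = mean - T_CRIT * sem
ci_high = mean + T_CRIT * sem

print("mean  :", mean)
print("SD    :", round(sd, 3))
print("SEM   :", round(sem, 3))
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
```

SD stays roughly the same as more samples are added, while SEM and the width of the CI shrink as n grows, which is why the choice of error bar has to be stated in the footnote.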

Samples of data presentation

Example 1: Table format

46

Table format is fine if you have many groups to compare. However, it may not be visually impressive, and it is not easy for readers to visualize the effects of the experiment.

Note that the title is on top. The footnote at the bottom of the table contains details of the test groups, sample size, statistical tests and software used.

Example 2: Simple bar chart to compare Means of 3 groups

47

Note that the title and footnote are combined at the bottom of the graph.

Bars represent mean values and error bars represent SEM.

The bar marked with * shows that the No Gum group is significantly different from the other 2 groups.

Example 3: Vertical clustered bar chart used to compare the means of 5 groups (different time intervals) for different blood cells

48

Bars indicate group Means and error bars indicate Standard error of Mean

49

Note: confidence interval is stated

Mean value

Example 4. Line chart to present survival studies (a line chart is suitable for presenting data collected over a period of time, such as in cell culture studies and incubation studies)

Example 5. Line and clustered bar charts

50

A line chart is well suited for incubation experiments that show data over the period of the experiment (it shows trends).

It is easy to visualize when you have many groups to compare.

Important Steps to prepare a graph with error bars in MS Excel !!!

Key in raw data in Excel spreadsheet.

Find the Mean, Standard Deviation (SD) and Standard Error of Mean (SEM) in Excel or manually.

Use the Mean values to draw the chart, NOT THE RAW DATA.

Use SD as error bars on the chart if your interest is to show the deviation of the data from the mean.

Use SEM as error bars on the chart, if your interest is to compare the means of different groups. This is the most commonly used (E.g. Compare the mean height of kangkong plant grown in different conditions).

Label the X and Y axes with proper terms and units.

Give appropriate title to your chart.

Indicate the statistical test used, level of significance, sample size (n) and choice of error bar in the footnote. (E.g. test used is ANOVA, level of significance is p

How to prepare chart with error bars in Excel

52

In Excel, go to the formula bar and type:

=AVERAGE(cells with data), e.g. =AVERAGE(B2:B6)

=STDEV(cells with data), e.g. =STDEV(B2:B6)

=(cell with STDEV value)/SQRT(n), e.g. =B8/SQRT(5)

Or select descriptive statistics as shown in slides 23-26.

E.g. Key in the data and calculate the mean, SD and SEM as shown in this slide.

n is the sample size

To draw the bar chart

Copy and paste the feed labels and mean values onto new cells as shown in this slide. Select the feed labels and mean values > Insert > select column chart > 2D column and press Enter. You will see a bar chart as shown below.

53

To enter error bars

Click the chart > chart elements > click the error bar option and its arrow, then select more options as shown in this slide.

54

Under the error bar options select Both (it should be selected by default), scroll down and select Custom as shown in this slide.

55

Click Specify Value and fill in the positive and negative error values by selecting the SEM values in the spreadsheet, once for the positive error value and a second time for the negative error value, as shown in this slide, then press OK.

56

You can now see a chart with SEM as error bars. Fill in the axis labels and title as shown in this slide.

57

Flow chart used to choose an appropriate statistical test for your data

58

It may look complicated, but it's not and it's a very good reference.

Refer to the soft copy for an enlarged view

Start here

Annex

Q & A

59
