Experiment Design & Statistical Analyses for FYP Students
(A Refresher Biostatistics session for FYP students)
23, June 2015
Conducted by Dr Nevil
LSCT, NP
1
-
Outline of today's session
Experiment design & Testing hypothesis
Data validation & Analysis
Interpretation of results and making a conclusion
Statistical analyses in Excel: paired t-test, unpaired t-test, analysis of variance (ANOVA) and multiple comparison
Some examples of published scientific presentations in different types of charts
How to create a chart with error bars in Excel
2
Important note: Any research involving human participants will require IRB approval. Any research involving animals will require IACUC approval.
-
The Most Important Points to note in your FYP in relation to statistics are
Experiment design & Hypothesis testing
Clear objectives
State your hypotheses: Null and Alternate (Research)
Sample size
Replication of experiment
Execution of experiment
Data collection and validation
Check your data for normal distribution and outliers
Set criteria
Significance level (α) = 0.05, 0.01 or 0.001
Test statistics
Choose the correct statistical test (refer to the flow chart in annex slide 58)
3
-
The Most Important points to note in your FYP in relation to statistics are
Interpretation of results & making conclusion
Interpret the result based on p-value and significance level.
Interpret the result based on the critical value and the calculated value.
Interpret the trendline (relationship / association between 2 variables).
Make your conclusion on the null or alternative hypothesis.
Presentation of results
Choose suitable graph/table
Use error bars to compare the means of your data, or a trendline to show the relationship between variables.
Write a clear title, axis labels, legend and footnote.
-
What is hypothesis testing?
Hypothesis testing, otherwise called significance testing, is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample (e.g. the mean height of the male student population in NP is 170 cm).
What is the goal of hypothesis testing?
The goal of hypothesis testing is to determine how likely it is that a hypothesized value of a population parameter, such as the mean height of Asian males, is true.
-
Null & Alternative Hypotheses (Two tail / Non-directional)
The null hypothesis, denoted by H0, is a tentative assumption about a population parameter. It is assumed to be true until the sample data give strong evidence against it.
E.g. The mean height of the male student population in NP is 170 cm.
The alternative hypothesis, otherwise called the research hypothesis and denoted by Ha, is the opposite of what is stated in the null hypothesis. It states that the null assumption is NOT true.
E.g. The mean height of the male student population in NP is NOT 170 cm.
Researchers use data from sample(s) to test these two competing statements.
Researchers always challenge the null hypothesis, looking for evidence that it is NOT TRUE.
Tips to remember
Null means: always neutral, no effect, same, or no difference
Alternative means: not neutral, has effect, not same, different, increase or decrease
-
Three Forms for Null and Alternative Hypotheses
When we do hypothesis testing, we must follow one of the following three forms:
One-tailed (lower-tail)
H0: μ ≥ μ0
Ha: μ < μ0
E.g. H0: The mean height of the Asian male population is equal to or more than 173 cm. Ha: The mean height of the Asian male population is less than 173 cm.
One-tailed (upper-tail)
H0: μ ≤ μ0
Ha: μ > μ0
E.g. H0: The mean height of the Asian male population is equal to or less than 173 cm. Ha: The mean height of the Asian male population is more than 173 cm.
Two-tailed
H0: μ = μ0
Ha: μ ≠ μ0
E.g. H0: The mean height of the Asian male population is 173 cm. Ha: The mean height of the Asian male population is not 173 cm. This is the most commonly used form.
μ0 is the hypothesized population mean; μ is the true population mean.
-
Null & Alternative Hypotheses (One tail/Directional)
Another example: A newly developed drug is believed to reduce the serum glucose level in diabetic patients more than the existing drugs.
Null Hypothesis (H0): The new drug does not reduce the serum glucose level in diabetic patients more than the existing drugs.
Alternative Hypothesis (Ha): The new drug reduces the serum glucose level in diabetic patients more than the existing drugs.
-
Steps of hypothesis testing
1. State the hypotheses: Null and Alternate (Research).
2. Set the criteria for a decision (significance level, called alpha (α)).
3. Select and compute the test statistic(s).
4. Interpret the results
5. Make a decision.
-
Step 1. State the hypothesis
The null hypothesis (H0), stated as the null, is a statement about a parameter, such as the population Mean, that is assumed to be true.
The null hypothesis is a starting point. We will test whether the parameter (Population Mean) stated in the null hypothesis is likely to be true.
E.g. The Mean height of Male students population in NP is 170 cm.
An alternative hypothesis (Ha) is a statement that directly contradicts a null hypothesis by stating that the actual value of a parameter (population mean) is less than, greater than, or not equal to the value stated in the null hypothesis.
The alternative hypothesis states that the null hypothesis is wrong.
E.g. The mean height of the male student population in NP is NOT 170 cm.
-
2: Set the criteria (α-value) to make a decision.
To set the criteria, we need to state the level of significance, otherwise called alpha and indicated by the symbol α. Traditionally, the level of significance is set at α = 0.05. The other levels used are α = 0.01 and α = 0.001.
The level of significance, or significance level, is the cut-off point at which we say the calculated probability (p-value) is small enough to reject the null hypothesis, or too large to reject it.
Probability (p) value
The p-value, or calculated probability, is the probability of obtaining a result at least as extreme as the one observed in the sample, assuming that the null hypothesis (H0) is true.
If p is low, H0 must GO
Key points to remember
If the p-value is smaller than α, we say that the mean values are significantly different between groups. Hence, we reject the null hypothesis.
If the p-value is bigger than α, we say that the mean values are NOT significantly different between groups. Hence, we fail to reject (in everyday terms, accept) the null hypothesis.
-
Significance level & Confidence level
Significance levels are related to confidence levels through the rule CL = 1 - α.
1-0.05 = 0.95
1-0.01 = 0.99
1-0.001 = 0.999
It is common to express the confidence level in %.
CL = 1-0.05 = 0.95 = 95%
CL = 1-0.01 = 0.99 = 99%
CL = 1-0.001 = 0.999 = 99.9%
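The rule CL = 1 - α can be checked with a few lines of Python. This is only an illustrative sketch; the function name `confidence_level` is a made-up name and not part of the session material.

```python
def confidence_level(alpha: float) -> float:
    """Return the confidence level CL = 1 - alpha as a fraction."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1 - alpha


# The three significance levels used in this session.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: CL = {confidence_level(alpha) * 100:.1f}%")
```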
-
3. Select and Compute the Test Statistic
Select and apply the appropriate statistical test (e.g. t-test, z-test or ANOVA). The test statistic it produces allows researchers to determine the likelihood of obtaining the sample outcomes if the null hypothesis were true.
The value of the test statistic is used to make a decision regarding the null hypothesis.
-
4. Make a decision
We use the result of the test statistic and the significance level to make a decision about the null hypothesis.
In practice, a researcher can make one of these two decisions:
1. Reject the null hypothesis when the p-value is less than the set α (e.g. 0.05). That means the difference between μ and μ0 is significant. The difference is unlikely to be caused by chance alone and is attributed to the experiment. The experiment has strong evidence to reject the null hypothesis.
2. Accept (more precisely, fail to reject) the null hypothesis when the p-value is more than the set α (e.g. 0.05). That means the difference between μ and μ0 is NOT significant. The difference could plausibly be caused by chance. The experiment does not have strong evidence to reject the null hypothesis.
P < 0.05: the difference is significant, reject H0
P > 0.05: the difference is not significant, accept H0
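The decision rule on this slide can be written as a small Python function. This is a minimal sketch; the name `decide` and the returned strings are illustrative choices. What the slide calls "accept H0" is returned as "do not reject H0", the more cautious wording for the same outcome.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the slide's rule: reject H0 only when p is smaller than alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"         # the difference is significant
    return "do not reject H0"      # the difference is not significant


print(decide(0.03))               # compared against the default alpha of 0.05
print(decide(0.03, alpha=0.01))   # same p-value, stricter significance level
```

The second call shows why α has to be set before the analysis: the same p-value leads to a different decision under a stricter level.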
-
Experiment Design - Objectives
Example 1: The aim of the project is to evaluate the effect of a new antibiotic on the mortality of bacterium sp. X.
Example 2: The aim of the project is to study the effect of a new drug on diabetic mice to reduce blood serum glucose level.
Example 3: The aim of the project is to evaluate the effect of a herbal supplement on the immune parameters of juvenile fish.
15
Let's take example 3 for hypothesis testing
-
Experiment design: Making Null and Alternate Hypotheses
E.g. Objective: The aim of the project is to evaluate the effect of a herbal supplement on the immune parameters of juvenile fish.
Write your null and alternative hypotheses and symbols for this objective
Null hypothesis (H0): The herbal supplement will not affect the immune parameters of juvenile fish.
Alternative hypothesis (Ha): The herbal supplement will affect the immune parameters of juvenile fish.
16
H0: μ = μ0
Ha: μ ≠ μ0
μ = value of true population mean; μ0 = value of hypothesized population mean
Is this hypothesis one tail or two tail?
-
17
Steps of Hypothesis Testing
Step 1: Develop null and alternative hypotheses
Refer to slides 6 & 7
Step 2: Set the criteria for a decision
Level of significance (conventionally α = 0.05 or 0.01)
Step 3: Compute the test statistic
Use an appropriate statistical test such as the t-test, z-test or ANalysis Of VAriance (ANOVA)
Step 4: Interpret the results
Base your interpretation on the set alpha (α) and the p-value, or on the critical and calculated values.
Step 5: Make a decision / conclusion
When we reach statistical significance, the null hypothesis is rejected and the research (alternative) hypothesis is accepted.
When we fail to reach statistical significance, the null hypothesis is not rejected (accepted) and the research (alternative) hypothesis is not supported.
-
Sample Size
One of the main issues in experimental design is setting the sample size. A larger sample size is generally desirable, because the sample means tend towards a normal distribution and show less variation.
If your project involves a population study (e.g. a survey), the sample size should be at least 10% of that population. This number is a challenge in studies that require animal models, due to cost and animal ethics issues.
The sample size required depends on the significance level you set and the size of the effect you need to detect. A stricter significance level or a smaller effect needs a bigger sample size.
Some textbooks classify a sample size of 30 and above as large for biological research. Below 30 is classified as a small sample size, which is also accepted in biological research.
It is good practice to refer to scientific articles related to your project to set your sample size.
In general, clinical trials and population studies require large sample sizes. Websites are available to calculate the sample size at different significance levels.
18
-
Replication of experiment
It is very important to obtain consistency and reproducibility in your experiment results.
Replication of the whole experiment is laborious, time-consuming and costly.
One way to address this issue is to have 3 or more replicates of samples within the same experiment.
However, in some long-term research or clinical studies, replication of the whole experiment is required.
19
-
Types of data you may collect in your experiments
Data type: Qualitative data
Description: Descriptive, observed, categorical or interpretive data
Examples: Colour of bacteria colonies (yellow, red, brown, grey); behaviour of mice fed on a caffeine diet; survey on elderly habits in taking prescribed medication
Types of stats analysis to apply: Classification; factor analysis; cluster analysis; prediction
Data type: Quantitative data (count / frequency data)
Examples: No. of larvae that survived after heat treatment; no. of viable cells after treatment with anti-tumour drug
Types of stats analysis to apply: Chi-square test, goodness of fit
Data type: Quantitative data (measured data)
Examples: Length of fish larvae after 2 weeks of treatment with supplemented feed; amount of insulin in the blood of diabetic mice in relation to their glycemic load diet over 2 weeks
Types of stats analysis to apply: t-test, z-test, ANOVA, multiple comparison
20
Once you have designed your experiments, you will need to think about the data you will collect and what are the tests to apply.
-
Data validation prior to analysis
Check your data for normal distribution.
This can be done by subjecting your data to descriptive statistics.
It can be visualized in Excel with a scatter plot or by constructing a frequency histogram.
If your data points are normally distributed, you should see a bell-shaped curve.
If your data points are normally distributed, proceed to parametric tests such as the paired t-test, unpaired t-test or ANOVA, whichever is suitable for your data.
If your data are not normally distributed, proceed to non-parametric tests such as the Wilcoxon signed-rank test (counterpart of the paired t-test), Mann-Whitney test (counterpart of the unpaired t-test), Kruskal-Wallis test (counterpart of ANOVA) or Chi-square test (for categorical data), whichever is suitable for your data.
21
Refer to annex slide 58 for more info
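The pairing of parametric and non-parametric tests on this slide can be kept as a small lookup table. The sketch below is in Python; the dictionary `TESTS`, the function `choose_test` and the design labels are illustrative names, and the table covers only the tests named on this slide, not the full flow chart in the annex.

```python
# (experimental design, data normally distributed?) -> test named on the slide
TESTS = {
    ("paired", True): "paired t-test",
    ("paired", False): "Wilcoxon signed-rank test",
    ("unpaired", True): "unpaired t-test",
    ("unpaired", False): "Mann-Whitney test",
    ("more than 2 groups", True): "ANOVA",
    ("more than 2 groups", False): "Kruskal-Wallis test",
}


def choose_test(design: str, normal: bool) -> str:
    """Return the test the slide suggests for this design and normality result."""
    try:
        return TESTS[(design, normal)]
    except KeyError:
        raise ValueError(f"unknown design: {design!r}") from None


print(choose_test("paired", normal=False))
```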
-
Some examples of statistical tests and their use
Descriptive statistics: preliminary check to validate your data
Paired t-test: compares the means of two groups that came from the same subjects. Used in before-and-after situations.
Unpaired t-test: compares the means of two independent groups. The sample sizes need not be equal, and the data come from different subjects.
ANalysis Of VAriance (ANOVA)*: analyses the variances to compare the means of more than 2 groups
Multiple comparison test* (e.g. Tukey's test): continuation of ANOVA to compare means in pairs
Correlation & Regression: shows the relationship between 2 or more variables
Chi-square test: tests the distribution of categorical variables
22
-
23
Descriptive Statistics
Key in your data sets in Excel and follow the screenshots.
Descriptive statistics give a summary of your data, such as the mean, median, mode, variance and standard deviation.
They are a preliminary check to get to know your data before analysis.
-
24
-
Descriptive statistics output in MS Excel
-
26
The important values you need to know from this table are mean, standard deviation, skewness and kurtosis. Why?
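One reason these four values matter is that skewness and kurtosis both close to 0 suggest the data are roughly normally distributed, which is the condition for the parametric tests later in the session. The Python sketch below computes them with the sample-adjusted formulas that Excel's SKEW and KURT functions use. The function name `describe` and the fish-length values are hypothetical.

```python
import statistics


def describe(data):
    """Return mean, sample SD, skewness and excess kurtosis of a sample."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values to compute kurtosis")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample SD, n - 1 in the denominator
    if sd == 0:
        raise ValueError("all values are identical")
    z = [(x - mean) / sd for x in data]  # standardized deviations
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "sd": sd, "skewness": skew, "kurtosis": kurt}


# Hypothetical lengths (mm) of fish larvae, for illustration only.
lengths = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4]
for name, value in describe(lengths).items():
    print(f"{name}: {value:.3f}")
```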
-
Paired t-test
You can use the paired t-test when you have 2 groups of data.
Conditions
Data are normally distributed
Samples are dependent, so the sample sizes must be equal
Used in before-and-after situations. In this case, the subjects will be the same.
Two types of data collected from the same subjects
Matching pairs by gender or age
It is most commonly used in clinical trial studies to evaluate the efficacy of new drugs for various diseases and disorders before and after treatments.
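To see what the paired test computes, the Python sketch below works out the t statistic from the differences within each pair: t = mean(d) / (SD(d) / sqrt(n)), with n - 1 degrees of freedom. The function name `paired_t` and the glucose readings are hypothetical. The sketch does not compute a p-value; it compares t with a critical value read from a standard t-table, as in the "critical value vs calculated value" interpretation.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal size")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least 2 pairs")
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1


# Hypothetical serum glucose readings for the same 5 subjects.
before = [120, 132, 128, 140, 135]
after = [115, 126, 125, 133, 130]
t_stat, df = paired_t(before, after)

t_crit = 2.776  # two-tailed critical value for alpha = 0.05, df = 4, from a t-table
print(f"t = {t_stat:.3f}, df = {df}")
print("significant" if abs(t_stat) > t_crit else "not significant")
```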
27
-
28
Do you see a significant difference between S1 and S2?
Example of paired t-test
In Excel it is called t-Test: Paired Two Sample for Means
-
Unpaired or Independent sample t-test
You can use the unpaired t-test when you have 2 groups of data.
Conditions
Data are normally distributed
Samples are independent
Sample sizes need not be equal
Data are collected from different subjects
It is most commonly used to compare the means of 2 groups.
-
Unpaired t- test
Example: A pair of students tested the effect of hand washing liquid on bacterial growth. The students prepared several petri dishes with culture medium. Hand washing liquid was added to some of the petri dishes and not to the others. The same amount of bacteria was inoculated in all petri dishes and incubated for 48 hrs. After 48 hrs, the number of colonies present in each petri dish was counted. The results are tabulated in the next slide.
What are your null and alternative hypotheses?
-
Unpaired or Independent sample t-test in Excel
In Excel it is called t-Test: Two-Sample Assuming Equal Variances
-
Unpaired or Independent sample t-test output in Excel
Interpretation: The P value is smaller than 0.05. The t Stat value is larger than the t critical value. These values indicate there is a significant difference between the control group and the treatment group in terms of bacterial colonies.
What is your conclusion on the null hypothesis?
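The "t Stat" in the equal-variances output can be reproduced by hand. The Python sketch below pools the two sample variances, which is what "assuming equal variances" implies. The colony counts are hypothetical, not the data from the slide, and `unpaired_t` is an illustrative name. As with the interpretation above, t is compared with a critical value from a t-table.

```python
import math
import statistics


def unpaired_t(group1, group2):
    """Return (t statistic, degrees of freedom) using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 values")
    var1 = statistics.variance(group1)  # sample variances
    var2 = statistics.variance(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, df


# Hypothetical colony counts per petri dish.
control = [52, 48, 55, 60, 50]   # no hand washing liquid
treated = [30, 28, 35, 25, 32]   # hand washing liquid added
t_stat, df = unpaired_t(control, treated)

t_crit = 2.306  # two-tailed critical value for alpha = 0.05, df = 8, from a t-table
print(f"t = {t_stat:.3f}, df = {df}")
print("significant" if abs(t_stat) > t_crit else "not significant")
```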
-
ANOVA: ANalysis Of VAriance
You can use ANOVA when
you have more than 2 groups of data to compare.
your data are normally distributed.
sample sizes need not be the same.
Advantage
It can tell us whether there is an overall significant difference among the experimental groups.
Limitation
It doesn't tell us which group is significantly different from the other groups.
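The overall comparison ANOVA makes is the F ratio: the variance between group means divided by the variance within groups. The Python sketch below computes it for a one-way design. The function name `one_way_anova` and the numbers are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for 3 groups.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_b, df_w = one_way_anova(groups)

f_crit = 5.14  # critical F for alpha = 0.05 with (2, 6) degrees of freedom, from an F-table
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")
print("overall difference is significant" if f_stat > f_crit else "not significant")
```

A large F only says that at least one group differs; it does not say which one, which is the limitation noted above.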
34
-
One-way ANOVA (ANOVA single factor in Excel)
Example: A biologist quantified the levels of cadmium present in different species of seaweed. He collected the same amount of 4 different species of seaweed and measured the cadmium levels separately. The data are tabulated in the next slide.
What are your null and alternative hypotheses?
-
36
Prior to doing ANOVA, check the data for normality
Are data in this example normally distributed? Why?
-
37
ANOVA single factor in MS Excel
-
What is your interpretation on the ANOVA results?
Do you accept or reject the null hypothesis? Why?
What is your next step in analysis? Why?
-
Multiple comparison: Tukey's test
You can use Tukey's test when ANOVA shows an overall significant difference (P < 0.05) in your data.
Tukey's test is able to tell you which group is significantly different from the others.
It makes pairwise comparisons to show the difference in means.
A very useful test, available in Excel through the Data Analysis Plus add-in, for multiple comparison.
40
How to do Multiple Comparisons in Excel?
Select data with labels > click Add-Ins > Data Analysis Plus > Multiple Comparisons > OK
Check Labels if you included the labels when you selected the data; otherwise, leave it unchecked. By default α is 0.05. Press OK (unless you need to change α).
You will see the multiple comparison output in another Excel sheet.
-
Multiple comparison in MS Excel (Tukey's Test)
41
-
43
The Excel output of Tukey's test looks like this
How to interpret?
Look at the values in the difference column and the Omega column.
If the value in the difference column is LARGER than the value in the Omega column for a particular pair, the mean difference between the pair is statistically significant.
If the value in the difference column is SMALLER than the value in the Omega column for a particular pair, the mean difference between the pair is NOT statistically significant.
In your conclusion, if you have many pairs to interpret, you may write about the pairs that are significantly different.
-
Scientific Presentation of Results
What you mainly need are:
Suitable chart
Mean values of results
Standard deviation or standard error of the mean
Confidence interval (in some cases)
Interpretation of results
Proper X and Y axes labels
Adequate title
Footnote
In the footnote, indicate the statistical output, level of significance (α), sample size, test statistic used and whether the experiment was replicated.
44
You can refer to slides 46-50 for some examples of published scientific data presentations using different kinds of charts.
-
When do you use standard deviation (SD), standard error of the mean (SEM) and confidence interval (CI) in your data presentation?
45
When your goal is to (choose among SD, SEM and CI):
show the variation in each group
show how precisely you have determined the mean
compare the means of different groups
Whatever you choose to show, be sure to state your choice in the footnote of your graph / table.
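The three measures are related, which a short Python sketch makes concrete: SD describes the spread of the individual values, SEM = SD / sqrt(n) describes how precisely the mean is known, and a confidence interval is the mean plus or minus a critical t value times the SEM. The function name `spread_summary` and the data are hypothetical, and the critical t value is assumed to be read from a t-table for n - 1 degrees of freedom.

```python
import math
import statistics


def spread_summary(data, t_crit):
    """Return the mean, SD, SEM and confidence interval of one group."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 values")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)      # spread of the individual values
    sem = sd / math.sqrt(n)          # precision of the mean
    half_width = t_crit * sem        # half-width of the confidence interval
    return {"mean": mean, "sd": sd, "sem": sem,
            "ci": (mean - half_width, mean + half_width)}


# Hypothetical group of 5 values; 2.776 is the t-table value for 95% CI, df = 4.
summary = spread_summary([1, 2, 3, 4, 5], t_crit=2.776)
print(f"mean = {summary['mean']}, SD = {summary['sd']:.3f}, SEM = {summary['sem']:.3f}")
print(f"95% CI = ({summary['ci'][0]:.3f}, {summary['ci'][1]:.3f})")
```

SEM shrinks as n grows while SD does not, so the two answer different questions and should not be swapped silently.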
-
Samples of data presentation
Example 1: Table format
46
Table format is fine if you have many groups to compare. However, it may not be impressive, and it is not easy for readers to visualize the effects of the experiment.
Note the title is on top. The footnote at the bottom of the table contains details of the test groups, sample size, statistical tests and software used.
-
Example 2: Simple bar chart to compare Means of 3 groups
47
Note the title and footnote are combined at the bottom of the graphs.
Bars represent mean values and error bars represent SEM.
The bar marked with * indicates that the No Gum group is significantly different from the other 2 groups.
-
Example 3: Vertical Cluster bar chart used to compare means of 5 groups (different time intervals) with different blood cells
48
Bars indicate group Means and error bars indicate Standard error of Means
-
49
Example 4. Line chart to present survival studies (a line chart is suitable for presenting data that were collected over a period of time, e.g. cell culture studies and incubation studies)
Chart annotations: Mean value. Note: confidence interval is stated
-
Example 5. Line and clustered bar charts
50
A line chart is well suited for incubation experiments that show data over the period of the experiment (shows trends).
It is easy to visualize if you have many groups to compare.
-
Important Steps to prepare a graph with error bars in MS Excel !!!
Key in raw data in Excel spreadsheet.
Find the Mean, Standard Deviation (SD) and Standard Error of Mean (SEM) in Excel or manually.
Use Mean values to draw a chart. NOT THE RAW DATA.
Use SD as error bars on the chart if your interest is to show the deviation of the data from the mean.
Use SEM as error bars on the chart if your interest is to compare the means of different groups. This is the most commonly used (e.g. compare the mean height of kangkong plants grown in different conditions).
Label the X and Y axes with proper terms and units.
Give an appropriate title to your chart.
Indicate the statistical test used, level of significance, sample size (n), and choice of error bar in the footnote. (E.g. test used is ANOVA, level of significance is p
-
How to prepare chart with error bars in Excel
52
In Excel, go to the formula bar and type:
=AVERAGE(select cells with data) E.g. =AVERAGE(B2:B6)
=STDEV(select cells with data) E.g. =STDEV(B2:B6)
=(cell with STDEV value)/SQRT(n) E.g. =B8/SQRT(5)
Or select descriptive statistics as shown in slides 23-26.
E.g. Key in the data, then calculate the mean, SD and SEM as shown in this slide.
n is the sample size
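For students who prefer to check the spreadsheet values outside Excel, the same three formulas can be applied to every group at once. The Python sketch below builds the small table of labels, means and SEMs that the chart is drawn from. The function name `chart_table`, the feed labels and the measurements are hypothetical.

```python
import math
import statistics


def chart_table(groups):
    """Return a list of (label, mean, SEM) rows, one per group."""
    rows = []
    for label, values in groups.items():
        mean = statistics.mean(values)       # Excel: =AVERAGE(range)
        sd = statistics.stdev(values)        # Excel: =STDEV(range)
        sem = sd / math.sqrt(len(values))    # Excel: =SD cell/SQRT(n)
        rows.append((label, mean, sem))
    return rows


# Hypothetical raw data for two feeds, 5 values each.
raw = {
    "Feed A": [10.2, 11.0, 9.8, 10.5, 10.9],
    "Feed B": [12.1, 12.8, 11.9, 12.5, 13.0],
}
for label, mean, sem in chart_table(raw):
    print(f"{label}: mean = {mean:.2f}, SEM = {sem:.3f}")
```

The means become the bar heights and the SEM column supplies the custom error values.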
-
To draw a bar chart
Copy and paste the feed labels and mean values onto new cells as shown in this slide.
Select the feed labels and mean values > Insert > select column chart > 2D column and press Enter. You will see a bar chart as shown below.
53
-
To enter error bars
Click the chart > Chart Elements > click the Error Bars option and its arrow, then select More Options as shown in this slide.
54
-
Under the error bar options select Both (it should be selected by default), scroll down and select Custom as shown in this slide
55
-
Click Specify Value and fill in the positive and negative error values by selecting the SEM values in the spreadsheet, once for the positive error value and a second time for the negative error value, as shown in this slide, and press OK
56
-
You can now see a chart with SEM as error bars. Fill in the axis labels and title as shown in this slide
57
-
Flow chart used to choose an appropriate statistical test for your data
58
It may look complicated, but it is not, and it is a very good reference.
Refer to the soft copy for an enlarged view
Annex
-
Q & A
59