Hypothesis Tests in R Programming
Department of Management Information Systems
FINAL REPORT
Hypothesis Tests in R Programming
Atacan Garip
1303041042
Ankara 2017
Table of Contents
Parametric or Nonparametric
  Scale of measurement
  The population distribution
  Parametric
  Non-Parametric
Parametric Tests
  T-Test
  ANOVA and Post Hoc
Non-Parametric Tests
  Chi-Square
  Mann-Whitney U and Wilcoxon Test
  Kruskal Wallis Test
Correlation
  Correlation
Regression
  Simple Regression
  Multiple Regression
Resources
Parametric or Nonparametric
Which statistical test should be used, parametric or nonparametric? To answer this question, we should look at two dimensions.
Scale of measurement
If the data are nominal (e.g. gender) or ordinal (e.g. low-to-high), a nonparametric test should be used.
If the data are interval (e.g. measures of temperature) or ratio (e.g. income), a parametric test may be used.
The population distribution
If the population is normally distributed, a parametric test may be used.
If the population is not normally distributed, a nonparametric test should generally be used.
Before going through the individual tests, it helps to compare the parametric and nonparametric approaches.
Parametric
Information about the population distribution is assumed to be known
Specific assumptions are made regarding the population
The null hypothesis is stated on parameters of the population distribution
The test statistic is based on that distribution
Parametric tests are applicable only to variables
No parametric test exists for nominal scale data
A parametric test is the more powerful choice, if one exists and its assumptions hold
Non-Parametric
No information about the population distribution is required
No assumptions are made regarding the form of the population distribution
The null hypothesis is free from parameters
The test statistic does not depend on an assumed distribution
They apply to both variables and attributes
Nonparametric tests exist for nominal and ordinal scale data
They are generally less powerful than parametric tests
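The population-distribution criterion above is usually checked on the sample before a test is chosen. The following is a minimal sketch, assuming simulated data (the vector `x` and its parameters are illustrative and not taken from the report) and using the Shapiro-Wilk test from base R as one possible normality check.

```r
# Illustrative data only: 50 draws from a normal distribution.
set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)

# Shapiro-Wilk test. Ho: the sample comes from a normal distribution.
sw <- shapiro.test(x)
print(sw)

# A p-value at or above 0.05 gives no evidence against normality, so a
# parametric test remains a candidate; a smaller p-value points towards
# a nonparametric test.
use_parametric <- sw$p.value >= 0.05
print(use_parametric)
```

The scale of measurement still has to be judged from how the variable was recorded; no test can decide that part.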
Parametric Tests
A parametric test is a hypothesis test that makes statements about a parameter of the parent population, typically its mean.
T-Test
A t-test compares the means of two groups (or one mean against a reference value) and decides whether the difference is due to chance or is statistically significant.
Data set: the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors, and Professors in a college in the U.S. This data set is used for the one-sample and two-independent-samples t-tests.
One-sample T Test
A one-sample t-test tests whether a sample mean (of a normally distributed interval variable) differs significantly from a hypothesized value. It is typically applied to check a sample against a forecast or target value.
Hypothesis
Ho: Avg. Salary is equal to 90000
H1: Avg. Salary is greater than 90000
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. The average salary is significantly greater than 90000.
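A one-sided, one-sample test of this kind can be sketched in R as follows. The salary values are simulated (hypothetical numbers, not the report's data set), and `salary`, `alpha` and `reject_h0` are names chosen for illustration.

```r
# Simulated salaries (hypothetical values, not the report's data).
set.seed(42)
salary <- rnorm(200, mean = 110000, sd = 30000)

# Ho: mean = 90000, H1: mean > 90000 (one-sided test).
res <- t.test(salary, mu = 90000, alternative = "greater")
print(res)

# Decision rule: reject Ho when the p-value is below the significance level.
alpha <- 0.05
reject_h0 <- res$p.value < alpha
print(reject_h0)
```

The `alternative = "greater"` argument is what makes the test match H1 as stated; the default is a two-sided test.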
Two Independent Samples T Test
An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. It tests the significance of the difference between the two arithmetic means.
Hypothesis
Ho: Avg. Salaries of Male and Female are equal
H1: Avg. Salaries of Male and Female are not equal
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is equal to 0.002. Comparing the p-value with the significance level:
P-value (0.002) < Significance Level (0.05)
Reject Ho. The average salaries of Male and Female differ significantly.
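As a sketch of the two-group comparison, the code below builds a small simulated data frame (the column names `sex` and `salary` and all values are assumptions for illustration) and runs R's default Welch t-test, which does not assume equal variances. Whether the report used the Welch or the pooled-variance version is not stated.

```r
# Simulated two-group data (hypothetical values, not the report's data).
set.seed(42)
df <- data.frame(
  sex    = factor(rep(c("Female", "Male"), times = c(40, 160))),
  salary = c(rnorm(40, mean = 100000, sd = 25000),
             rnorm(160, mean = 115000, sd = 30000))
)

# Ho: the two group means are equal, H1: they are not equal (two-sided).
# t.test() uses the Welch correction unless var.equal = TRUE is given.
res <- t.test(salary ~ sex, data = df)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```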
Paired T Test
A paired (samples) t-test is used when you have two related observations (i.e. two observations per subject) and you want to see whether the means of these two normally distributed interval variables differ from one another.
Data set: birth and death rates for 69 countries. This data set is used for the paired t-test.
Hypothesis
Ho: The mean difference between birth and death rate is 15
H1: The mean difference between birth and death rate is not 15
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is equal to 0.001. Comparing the p-value with the significance level:
P-value (0.001) < Significance Level (0.05)
Reject Ho. The mean difference between birth and death rate differs significantly from 15.
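A paired test against a non-zero hypothesized difference can be sketched like this. The two vectors are simulated stand-ins for 69 paired observations (hypothetical values, not the report's data), and the names `birth` and `death` are chosen only to mirror the text.

```r
# Simulated paired observations for 69 units (hypothetical values).
set.seed(42)
birth <- rnorm(69, mean = 30, sd = 10)
death <- rnorm(69, mean = 11, sd = 4)

# Ho: mean(birth - death) = 15, H1: mean(birth - death) != 15.
res <- t.test(birth, death, paired = TRUE, mu = 15)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```

Because H1 is two-sided, rejecting Ho only says the mean difference is not 15; the sign of `res$estimate - 15` shows on which side it lies.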
ANOVA and Post Hoc
Analysis of Variance (ANOVA) is a family of procedures used to test hypotheses about the differences among group means.
Data set: data from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France. This data set is used for the one-way ANOVA, two-way ANOVA, and Post Hoc tests.
One-way ANOVA
One of the most widely used tests for comparing more than two groups is one-way analysis of variance. One of its prerequisites is that each group is randomly sampled from a normally distributed population. In addition, the groups must have equal variances.
Hypothesis
Ho: There is no statistically significant relationship between “ncontrols” and “agegp” (the group means are equal)
H1: There is a statistically significant relationship between “ncontrols” and “agegp” (at least one group mean differs)
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is equal to 0.022. Comparing the p-value with the significance level:
P-value (0.022) < Significance Level (0.05)
Reject Ho. There is a statistically significant relationship between “ncontrols” and “agegp”.
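The data description matches the `esoph` data set shipped with base R, so a one-way ANOVA of this form can be sketched directly. The exact call used in the report is not shown, so the formula below is an assumption, and the p-value may differ from the one quoted because `esoph` was corrected in later R releases.

```r
# 'esoph' ships with base R (package 'datasets').
data(esoph)

# One-way ANOVA: does mean ncontrols differ across age groups?
fit <- aov(ncontrols ~ agegp, data = esoph)
tab <- summary(fit)[[1]]
print(tab)

# Extract the p-value of the agegp term and apply the decision rule.
p_value <- tab[["Pr(>F)"]][1]
reject_h0 <- p_value < 0.05
print(reject_h0)
```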
Two-way ANOVA
One-way analysis of variance for independent samples involves one independent variable and one dependent variable, whereas two-way analysis of variance involves two independent variables and one dependent variable. In two-way ANOVA, the main goal is to measure the separate and joint effects of the independent variables on the dependent variable.
Hypothesis
Ho: There is no statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”
H1: There is a statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. There is a statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”.
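A two-way layout on the same base R data set might look like the sketch below. Including the interaction term (`alcgp * tobgp`) is an assumption made here to show the joint effect; the report does not say whether the author fitted the interaction or only the two main effects.

```r
# 'esoph' ships with base R (package 'datasets').
data(esoph)

# Two-way ANOVA with main effects and their interaction.
fit2 <- aov(ncontrols ~ alcgp * tobgp, data = esoph)
tab2 <- summary(fit2)[[1]]
print(tab2)

# One p-value per term (alcgp, tobgp, alcgp:tobgp); the last row is
# the residual line and has no p-value.
p_values <- tab2[["Pr(>F)"]]
print(p_values)
```

Each term gets its own p-value, so the decision rule is applied term by term rather than once for the whole model.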
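Post-Hoc
Post hoc tests are run after an ANOVA has found a significant overall difference, to determine which specific pairs of groups differ.
Decision Making Using the P-value
With the Tukey test, each pair of groups gets its own adjusted p-value, and we see which pairs differ significantly (p-value below 0.05) and which do not.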
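A Tukey comparison on the one-way model can be sketched with `TukeyHSD()` from base R. The choice of `agegp` as the factor is an assumption, and the ordered factor is converted to an unordered one on a copy of the data (`d`, a name used only here) to keep the pairwise output straightforward.

```r
# Work on a copy of base R's 'esoph' with agegp as an unordered factor.
data(esoph)
d <- esoph
d$agegp <- factor(d$agegp, ordered = FALSE)

fit <- aov(ncontrols ~ agegp, data = d)

# Tukey's Honest Significant Differences for every pair of age groups.
tuk <- TukeyHSD(fit)
pairs <- tuk$agegp          # columns: diff, lwr, upr, p adj
print(pairs)

# Pairs whose adjusted p-value is below the 0.05 significance level.
significant <- rownames(pairs)[pairs[, "p adj"] < 0.05]
print(significant)
```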
Non-Parametric Tests
A nonparametric test is a hypothesis test that does not rely on assumptions about the underlying distribution, i.e. it does not require the population's distribution to be described by specific parameters.
Chi-Square
The chi-square test of independence checks whether two categorical variables are related.
Data set: distribution of hair and eye color and sex in students. This data set is used for the chi-square test.
Hypothesis
Ho: The variables are statistically independent
H1: The variables are statistically dependent
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. The variables are statistically dependent.
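The data description matches base R's `HairEyeColor` table. The report does not say which pair of variables was tested, so the sketch below assumes hair color against eye color, summed over sex.

```r
# 'HairEyeColor' ships with base R: a Hair x Eye x Sex contingency table.
data(HairEyeColor)

# Collapse over Sex to obtain a 4 x 4 Hair-by-Eye table.
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

# Chi-square test of independence. Ho: hair and eye color are independent.
res <- chisq.test(hair_eye)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```

The degrees of freedom are (rows - 1) x (columns - 1), which is 9 for a 4 x 4 table.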
Mann-Whitney U and Wilcoxon Test
Mann-Whitney U
It is a nonparametric test used to compare two independent samples and to test whether they come from the same population, i.e. whether their distributions have the same location. This test should be applied when the conditions for the t-test are not met.
Data set: the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors, and Professors in a college in the U.S. This data set is used for the Mann-Whitney U test.
Hypothesis
Ho: The distributions of the two populations are equal
H1: The distributions of the two populations are not equal
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. The distributions of the two populations are not equal.
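In R the Mann-Whitney U test is run through `wilcox.test()` with two unpaired samples. The sketch below uses simulated, right-skewed values (hypothetical numbers; `female` and `male` are illustrative names, since the report does not state the grouping variable).

```r
# Simulated skewed samples for two independent groups (hypothetical values).
set.seed(42)
female <- rlnorm(40,  meanlog = log(95000),  sdlog = 0.25)
male   <- rlnorm(160, meanlog = log(110000), sdlog = 0.25)

# Unpaired wilcox.test() is the Mann-Whitney U (Wilcoxon rank sum) test.
res <- wilcox.test(male, female)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```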
Wilcoxon Signed Rank
The Wilcoxon signed rank test is a nonparametric hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample, to assess whether their population mean ranks differ. It is the nonparametric counterpart of the paired t-test.
Data set: birth and death rates for 69 countries. This data set is used for the Wilcoxon test.
Hypothesis
Ho: The medians of the two populations are equal
H1: The medians of the two populations are not equal
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. The medians of the two populations are not equal.
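The paired version uses the same function with `paired = TRUE`. The vectors below are simulated stand-ins for 69 paired observations (hypothetical values, not the report's data).

```r
# Simulated paired observations for 69 units (hypothetical values).
set.seed(42)
birth <- rnorm(69, mean = 30, sd = 10)
death <- rnorm(69, mean = 11, sd = 4)

# Wilcoxon signed rank test on the paired differences.
res <- wilcox.test(birth, death, paired = TRUE)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```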
Kruskal Wallis Test
The Kruskal-Wallis H test is used to test whether there is a significant difference among the distributions of a dependent variable measured on more than two groups (samples) that are independent of each other. It works on ranks, so medians are compared instead of arithmetic means.
Data set: students were administered two parallel forms of a test after random assignment to three different treatments. This data set is used for the Kruskal-Wallis test.
Hypothesis
Ho: There is no statistically significant difference among the groups
H1: There is a statistically significant difference among the groups
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. There is a statistically significant difference among the groups.
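A three-group comparison of this kind can be sketched with `kruskal.test()`. The data frame below is simulated (the names `scores`, `group` and `score` and all values are assumptions, not the report's data set).

```r
# Simulated scores for three treatment groups (hypothetical values).
set.seed(42)
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 22)),
  score = c(rnorm(22, mean = 40, sd = 8),
            rnorm(22, mean = 46, sd = 8),
            rnorm(22, mean = 50, sd = 8))
)

# Kruskal-Wallis rank sum test. Ho: all groups share the same distribution.
res <- kruskal.test(score ~ group, data = scores)
print(res)

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```

With three groups the test statistic has 2 degrees of freedom (number of groups minus one).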
Correlation
Correlation
A correlation is useful when you want to measure the strength and direction of the linear relationship between two (or more) normally distributed interval variables.
Data set: a simulated data set containing sales of child car seats at 400 different stores. This data set is used for the correlation tests.
Pearson
Hypothesis
Ho: There is no statistically significant relationship (the correlation is zero)
H1: There is a statistically significant relationship
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. There is a statistically significant relationship.
Spearman
Hypothesis
Ho: There is no statistically significant relationship (the rank correlation is zero)
H1: There is a statistically significant relationship
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. There is a statistically significant relationship.
Kendall
Hypothesis
Ho: There is no statistically significant relationship (the rank correlation is zero)
H1: There is a statistically significant relationship
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. There is a statistically significant relationship.
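The three correlation tests above differ only in the `method` argument of `cor.test()`. The sketch below runs all three on simulated data; the variable names `sales` and `advertising` and the values are assumptions, since the report does not state which pair of variables was correlated.

```r
# Simulated store-level data (hypothetical values, 400 stores).
set.seed(42)
advertising <- runif(400, min = 0, max = 30)
sales <- 6 + 0.1 * advertising + rnorm(400, sd = 2.5)

# Run the Pearson, Spearman and Kendall tests with the same call.
methods <- c("pearson", "spearman", "kendall")
results <- lapply(methods, function(m) {
  cor.test(sales, advertising, method = m)
})
names(results) <- methods

# Collect the estimated coefficient and the p-value of each test.
overview <- sapply(results, function(r) {
  c(estimate = unname(r$estimate), p.value = r$p.value)
})
print(overview)
```

Pearson measures linear association, while Spearman and Kendall are rank-based and only require a monotonic relationship.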
Regression
Simple Regression
Simple regression analysis examines the relationship between a dependent variable and one independent variable. It fits a linear equation that represents the relationship between the dependent and independent variables.
Data set: a simulated data set containing sales of child car seats at 400 different stores. This data set is used for the simple and multiple regression.
Hypothesis
Ho: Advertising has no effect on Sales (the slope is zero)
H1: Advertising has an effect on Sales
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. Advertising has a statistically significant effect on Sales; whether the effect is an increase is read from the sign of the estimated coefficient.
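A simple regression of this form can be sketched with `lm()`. The data frame `stores` is simulated (hypothetical values); only the column names `Sales` and `Advertising` follow the variables named in the hypothesis.

```r
# Simulated store-level data (hypothetical values, 400 stores).
set.seed(42)
stores <- data.frame(Advertising = runif(400, min = 0, max = 30))
stores$Sales <- 6 + 0.1 * stores$Advertising + rnorm(400, sd = 2.5)

# Fit Sales = b0 + b1 * Advertising.
fit <- lm(Sales ~ Advertising, data = stores)
coefs <- summary(fit)$coefficients
print(coefs)

# Slope estimate and its p-value (test of Ho: b1 = 0).
slope   <- coefs["Advertising", "Estimate"]
p_value <- coefs["Advertising", "Pr(>|t|)"]
reject_h0 <- p_value < 0.05
print(c(slope = slope, p_value = p_value))
```

A positive `slope` together with a rejected Ho is what supports the statement that Advertising increases Sales.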
Multiple Regression
Multiple regression analysis examines the relationship between a dependent variable and more than one independent variable.
Hypothesis
Ho: Advertising and Income have no effect on Sales (all slopes are zero)
H1: Advertising and Income have an effect on Sales
With a 0.05 significance level.
Decision Making Using the P-value
According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:
P-value (very close to 0) < Significance Level (0.05)
Reject Ho. Advertising and Income have a statistically significant effect on Sales; whether each effect is an increase is read from the sign of its estimated coefficient.
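With two predictors the same `lm()` call reports one p-value per coefficient plus an overall F-test. The sketch below uses simulated data (hypothetical values); only the column names follow the variables named in the hypothesis, and reading the overall p-value from the F statistic is one possible way to test the joint Ho.

```r
# Simulated store-level data (hypothetical values, 400 stores).
set.seed(42)
n <- 400
stores <- data.frame(
  Advertising = runif(n, min = 0, max = 30),
  Income      = runif(n, min = 20, max = 120)
)
stores$Sales <- 5 + 0.1 * stores$Advertising + 0.015 * stores$Income +
  rnorm(n, sd = 2.5)

# Fit Sales = b0 + b1 * Advertising + b2 * Income.
fit <- lm(Sales ~ Advertising + Income, data = stores)
coefs <- summary(fit)$coefficients
print(coefs)

# Overall F-test of Ho: b1 = b2 = 0, computed from the stored F statistic.
f <- summary(fit)$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"],
                       lower.tail = FALSE))
print(overall_p)
```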
Resources
Difference Between Parametric and Nonparametric Test. (2017, January 5). Key Differences. Retrieved from http://keydifferences.com/difference-between-parametric-and-nonparametric-test.html
Elementary Statistics with R. (2017, January 5). Elementary Statistics with R. Retrieved from http://www.r-tutor.com/elementary-statistics
Hipotez Testleri [Hypothesis Tests]. (2017, January 5). İstatistik Analiz Hakkında [About Statistical Analysis]. Retrieved from http://www.istatistikanaliz.com/default.asp
İSTATİSTİK [Statistics]. (2017, January 5). Retrieved from http://mustafaotrar.net/istatistik/
Using R for statistical analyses - Basic Statistics. (2017, January 5). Gardeners Own. Retrieved from http://www.gardenersown.co.uk/education/lectures/r/basics.htm#t_test
What statistical analysis should I use? (2017, January 5). Institute for Digital Research and Education. Retrieved from http://www.ats.ucla.edu/stat/stata/whatstat/whatstat.htm