Hypothesis Tests in R Programming

Department of Management Information Systems

FINAL REPORT

Hypothesis Tests in R Programming

Atacan Garip

1303041042

Ankara 2017

Table of Contents

Parametric or Nonparametric
    Scale of measurement
    The population distribution
    Parametric
    Non-Parametric
Parametric Tests
    T-Test
    ANOVA and Post Hoc
Non-Parametric Tests
    Chi-Square
    Mann-Whitney U and Wilcoxon Test
    Kruskal Wallis Test
Correlation
    Correlation
Regression
    Simple Regression
    Multiple Regression
Resources

Parametric or Nonparametric

Which kind of statistical test should be used, parametric or non-parametric? To answer this question, we should look at two dimensions.

Scale of measurement

If the data is nominal (e.g. gender) or ordinal (e.g. low-to-high), a non-parametric test should be used.

If the data is interval (e.g. temperature) or ratio (e.g. income), a parametric test may be used.

The population distribution

If the population is normally distributed, a parametric test may be used.

If the population is not normally distributed, a non-parametric test should be used, particularly when the sample is small.

Before going through the statistical tests, it is worth comparing the parametric and non-parametric approaches.

Parametric

Information about the population distribution is assumed to be known

Specific assumptions are made regarding the population

The null hypothesis is stated in terms of parameters of the population distribution

The test statistic is based on that distribution

Parametric tests apply only to variables (measured quantities)

No parametric test exists for nominal scale data

A parametric test is more powerful when its assumptions hold

Non-Parametric

No information about the population distribution is required

No assumptions are made regarding the form of the population distribution

The null hypothesis is free from parameters

The test statistic does not depend on a particular distribution

It applies to both variables and attributes

Non-parametric tests exist for nominal and ordinal scale data

It is less powerful than a parametric test when the parametric assumptions hold
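
In practice, the population-distribution criterion is often checked with a normality test on the sample. The sketch below is an illustration only: it applies base R's `shapiro.test` to two simulated vectors (`x_normal` and `x_skewed` are hypothetical names and values, not data from this report).

```r
# Simulated samples (hypothetical, for illustration only)
set.seed(10)
x_normal <- rnorm(50, mean = 50, sd = 10)   # drawn from a normal population
x_skewed <- rexp(50, rate = 1 / 50)         # drawn from a right-skewed population

# Shapiro-Wilk test: Ho = the sample comes from a normal distribution
res_normal <- shapiro.test(x_normal)
res_skewed <- shapiro.test(x_skewed)

alpha <- 0.05
for (res in list(res_normal, res_skewed)) {
  verdict <- if (res$p.value < alpha) "normality rejected" else "normality not rejected"
  cat(res$data.name, ": p =", signif(res$p.value, 3), "->", verdict, "\n")
}
```

When normality is rejected and the sample is small, a non-parametric test is the safer choice.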

Parametric Tests

A parametric test is a hypothesis test that makes statements about parameters of the parent population, such as its mean, under assumptions about its distribution.

T-Test

The t test compares a mean with a hypothesized value, or the means of two groups with each other, and decides whether the difference is due to chance or is statistically significant.

Data set: the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors, and Professors in a college in the U.S. This data set is used in the one-sample and two-independent-samples t tests.

One-sample T Test

A one-sample t test allows us to test whether a sample mean (of a normally distributed interval variable) differs significantly from a hypothesized value. It is typically applied to check a sample against a given forecast or predicted value.

Hypothesis

Ho: Avg. salary is equal to 90000

H1: Avg. salary is greater than 90000

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. The data support the conclusion that the avg. salary is greater than 90000.
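
The one-sample test described above can be run with base R's `t.test`. The following is a minimal sketch, not the report's original code: the `salary` vector is simulated (hypothetical values) so that the example runs without any extra package or data file.

```r
# Simulated stand-in for the salary data (hypothetical values)
set.seed(1)
salary <- rnorm(100, mean = 110000, sd = 30000)

# Ho: mean salary = 90000, H1: mean salary > 90000 (one-sided test)
res <- t.test(salary, mu = 90000, alternative = "greater")
print(res)

# Decision rule: compare the p-value with the significance level
alpha <- 0.05
if (res$p.value < alpha) {
  cat("Reject Ho: the mean salary is greater than 90000\n")
} else {
  cat("Fail to reject Ho\n")
}
```

The `alternative = "greater"` argument matches the one-sided H1; leaving it out would run a two-sided test instead.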

Two Independent Samples T Test

An independent-samples t test is used to compare the means of a normally distributed interval dependent variable for two independent groups. It tests whether the difference between the two arithmetic means is statistically significant.

Hypothesis

Ho: Avg. salaries of males and females are equal

H1: Avg. salaries of males and females are not equal

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is equal to 0.002. Comparing the p-value with the significance level:

P-value (0.002) < Significance Level (0.05)

Reject Ho. The avg. salaries of males and females are not equal.
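
A two-group comparison like this one can be sketched with the formula interface of `t.test`. The data frame `salaries` and its columns `sex` and `salary` are simulated stand-ins (hypothetical names and values), so the resulting p-value is not the 0.002 reported above.

```r
# Simulated stand-in for the salary data (hypothetical values)
set.seed(2)
salaries <- data.frame(
  sex    = factor(rep(c("Female", "Male"), times = c(40, 60))),
  salary = c(rnorm(40, mean = 100000, sd = 25000),
             rnorm(60, mean = 115000, sd = 30000))
)

# Ho: equal mean salary in both groups, H1: the means differ (two-sided)
# t.test() runs Welch's test by default; var.equal = TRUE gives the pooled test
res <- t.test(salary ~ sex, data = salaries)
print(res)

alpha <- 0.05
cat(if (res$p.value < alpha) "Reject Ho\n" else "Fail to reject Ho\n")
```

Welch's version does not assume equal variances in the two groups, which is why R uses it as the default.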

Paired T Test

A paired (samples) t test is used when you have two related observations (i.e. two observations per subject) and you want to see whether the means of these two normally distributed interval variables differ from one another.

Data set: birth and death rates for 69 countries. This data set is used in the paired t test.

Hypothesis

Ho: The mean difference between birth and death rates is 15

H1: The mean difference between birth and death rates is not 15

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is equal to 0.001. Comparing the p-value with the significance level:

P-value (0.001) < Significance Level (0.05)

Reject Ho. The mean difference between birth and death rates is not 15.
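
A paired test with a non-zero hypothesized difference can be sketched as follows. The vectors `birth_rate` and `death_rate` are simulated (hypothetical values for 69 units), so the output will not reproduce the p-value quoted above.

```r
# Simulated paired observations (hypothetical values, one pair per country)
set.seed(3)
n <- 69
death_rate <- rnorm(n, mean = 10, sd = 3)
birth_rate <- death_rate + rnorm(n, mean = 18, sd = 6)

# Ho: mean(birth - death) = 15, H1: mean difference != 15
res <- t.test(birth_rate, death_rate, paired = TRUE, mu = 15)
print(res)

# A paired t test is the same as a one-sample t test on the differences
same <- t.test(birth_rate - death_rate, mu = 15)
cat("Same p-value:", isTRUE(all.equal(res$p.value, same$p.value)), "\n")
```

The `mu = 15` argument sets the hypothesized difference; with the default `mu = 0` the test would check whether the two rates are equal on average.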

ANOVA and Post Hoc

Analysis of Variance (ANOVA) is a collection of procedures used to test hypotheses about differences among group means.

Data set: a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France. This data set is used in the one-way and two-way ANOVA and in the post hoc tests.

One-way ANOVA

One of the most widely known and used tests for comparing the means of more than two groups is one-way analysis of variance. Its prerequisites are that each group is randomly sampled from a normally distributed population and that the groups have equal variances.

Hypothesis

Ho: There is no statistically significant relationship between “ncontrols” and “agegp” (the mean of “ncontrols” is the same in every age group)

H1: There is a statistically significant relationship between “ncontrols” and “agegp” (at least one group mean differs)

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is equal to 0.022. Comparing the p-value with the significance level:

P-value (0.022) < Significance Level (0.05)

Reject Ho. There is a statistically significant relationship between “ncontrols” and “agegp”.
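
The data description and the variable names match R's built-in `esoph` data set, so the sketch below assumes that this is the data used. The exact p-value may differ from the one quoted above, because the values of `ncontrols` in `esoph` were corrected in newer R versions.

```r
# Assumption: the report's data is the built-in 'esoph' data set
data(esoph)

# One-way ANOVA: does the mean of ncontrols differ between age groups?
fit <- aov(ncontrols ~ agegp, data = esoph)
print(summary(fit))

# Extract the p-value of the agegp term from the ANOVA table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

alpha <- 0.05
cat(if (p_value < alpha) "Reject Ho\n" else "Fail to reject Ho\n")
```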

Two-way ANOVA

One-way analysis of variance for independent samples involves one independent variable and one dependent variable, whereas two-way analysis of variance involves two independent variables and one dependent variable. In two-way ANOVA, the main goal is to measure the effects of the two independent variables on the dependent variable, separately and jointly.

Hypothesis

Ho: There is no statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”

H1: There is a statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. There is a statistically significant relationship between “ncontrols” and “alcgp”, “tobgp”.
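
A two-way model can be sketched in the same way, again assuming the built-in `esoph` data set. Whether the report fitted the additive model or the model with interaction is not stated, so both are shown.

```r
# Assumption: the report's data is the built-in 'esoph' data set
data(esoph)

# Additive two-way ANOVA: main effects of alcohol and tobacco group
fit_add <- aov(ncontrols ~ alcgp + tobgp, data = esoph)
tab <- summary(fit_add)[[1]]
print(tab)

# p-values of the two main effects (rows 1 and 2 of the ANOVA table)
p_values <- tab[["Pr(>F)"]][1:2]
cat("p-values (alcgp, tobgp):", signif(p_values, 3), "\n")

# Model with interaction: alcgp * tobgp expands to alcgp + tobgp + alcgp:tobgp
fit_int <- aov(ncontrols ~ alcgp * tobgp, data = esoph)
print(summary(fit_int))
```

The interaction row tests the joint effect, i.e. whether the effect of one factor depends on the level of the other.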

Post-Hoc

Post hoc tests are run after a significant ANOVA result to find out which specific groups differ from each other.

Decision Making Using the P-value

With the Tukey test we see, for each pair of groups, whether the difference between their means is statistically significant.
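
A Tukey comparison can be sketched with base R's `TukeyHSD`, again assuming the built-in `esoph` data set. Converting `agegp` to an unordered factor is a precaution taken in this sketch, not something stated in the report.

```r
# Assumption: the report's data is the built-in 'esoph' data set
# agegp is stored as an ordered factor; use a plain factor for the comparison
d <- transform(esoph, agegp = factor(agegp, ordered = FALSE))

fit <- aov(ncontrols ~ agegp, data = d)
tuk <- TukeyHSD(fit, "agegp", conf.level = 0.95)
print(tuk)

# Keep only the pairs whose adjusted p-value is below the significance level
alpha <- 0.05
sig <- tuk$agegp[tuk$agegp[, "p adj"] < alpha, , drop = FALSE]
cat("Significant pairs:", nrow(sig), "of", nrow(tuk$agegp), "\n")
```

Each row of the result is one pair of groups, with the difference in means, its confidence interval, and the adjusted p-value.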

Non-Parametric Tests

A non-parametric test is a hypothesis test that does not rely on assumptions about the form of the population distribution, i.e. it does not require the population's distribution to be described by specific parameters.

Chi-Square

The chi-square test of independence checks whether two categorical variables are related.

Data set: distribution of hair color, eye color, and sex in students. This data set is used in the chi-square test.

Hypothesis

Ho: The variables are statistically independent

H1: The variables are statistically dependent

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. The variables are statistically dependent.
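
The data description matches R's built-in `HairEyeColor` table, so the sketch below assumes that this is the data used. The report does not say which pair of variables was tested; hair color against eye color is chosen here for illustration.

```r
# Assumption: the report's data is the built-in 'HairEyeColor' table
# Collapse the 3-way table (Hair x Eye x Sex) to a 2-way Hair x Eye table
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

# Ho: hair color and eye color are independent
res <- chisq.test(hair_eye)
print(res)

alpha <- 0.05
if (res$p.value < alpha) {
  cat("Reject Ho: hair color and eye color are dependent\n")
} else {
  cat("Fail to reject Ho\n")
}
```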

Mann-Whitney U and Wilcoxon Test

Mann-Whitney U

The Mann-Whitney U test is a non-parametric test used to compare two independent samples and to test whether they come from the same population, i.e. whether the two population distributions (and hence their medians) are equal. It should be applied when the conditions for the t test are not met.

Data set: the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors, and Professors in a college in the U.S. This data set is used in the Mann-Whitney U test.

Hypothesis

Ho: The two populations have the same distribution (equal medians)

H1: The two populations do not have the same distribution (unequal medians)

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. The two populations do not have the same distribution.
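
In R, the Mann-Whitney U test is available as `wilcox.test` with two independent samples (R calls it the Wilcoxon rank sum test). The sketch below uses a simulated data frame `salaries` with hypothetical columns `sex` and `salary`.

```r
# Simulated, right-skewed stand-in for the salary data (hypothetical values)
set.seed(4)
salaries <- data.frame(
  sex    = factor(rep(c("Female", "Male"), times = c(40, 60))),
  salary = c(rlnorm(40, meanlog = log(95000),  sdlog = 0.3),
             rlnorm(60, meanlog = log(110000), sdlog = 0.3))
)

# Ho: both groups come from the same distribution
res <- wilcox.test(salary ~ sex, data = salaries)
print(res)

alpha <- 0.05
cat(if (res$p.value < alpha) "Reject Ho\n" else "Fail to reject Ho\n")
```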

Wilcoxon Signed Rank

The Wilcoxon signed rank test is a non-parametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ. It is the non-parametric counterpart of the paired t test.

Data set: birth and death rates for 69 countries. This data set is used in the Wilcoxon test.

Hypothesis

Ho: The median difference between the paired observations is zero

H1: The median difference between the paired observations is not zero

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. The median difference between the paired observations is not zero.
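
The signed rank test uses the same `wilcox.test` function with `paired = TRUE`. The vectors below are simulated (hypothetical values for 69 units) and are built so that every birth rate exceeds its paired death rate.

```r
# Simulated paired observations (hypothetical values, one pair per country)
set.seed(5)
n <- 69
death_rate <- rgamma(n, shape = 4, scale = 2.5)
birth_rate <- death_rate + rgamma(n, shape = 3, scale = 6)

# Ho: the median of the paired differences is zero
res <- wilcox.test(birth_rate, death_rate, paired = TRUE)
print(res)

alpha <- 0.05
cat(if (res$p.value < alpha) "Reject Ho\n" else "Fail to reject Ho\n")
```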

Kruskal Wallis Test

The Kruskal-Wallis H test is used to test whether there is a significant difference between the distributions of a dependent variable measured on two or more independent groups (samples). It works on ranks, so it compares medians rather than arithmetic means.

Data set: students were administered two parallel forms of a test after random assignment to three different treatments. This data set is used in the Kruskal-Wallis test.

Hypothesis

Ho: There is no statistically significant difference between the groups

H1: There is a statistically significant difference between the groups

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. There is a statistically significant difference between the groups.
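
A three-group comparison of this kind can be sketched with base R's `kruskal.test`. The data frame `scores` and its columns `treatment` and `score` are simulated stand-ins (hypothetical names and values).

```r
# Simulated test scores for three treatments (hypothetical values)
set.seed(6)
scores <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 22)),
  score     = c(rnorm(22, mean = 40, sd = 8),
                rnorm(22, mean = 45, sd = 8),
                rnorm(22, mean = 50, sd = 8))
)

# Ho: the score distribution is the same in all three treatments
res <- kruskal.test(score ~ treatment, data = scores)
print(res)

alpha <- 0.05
cat(if (res$p.value < alpha) "Reject Ho\n" else "Fail to reject Ho\n")
```

With three groups the test statistic has 2 degrees of freedom (number of groups minus one).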

Correlation

Correlation

A correlation is useful when you want to see the linear relationship between two (or more) normally distributed interval variables.

Data set: a simulated data set containing sales of child car seats at 400 different stores. This data set is used in the correlation tests.

Pearson

Hypothesis

Ho: There is no statistically significant relationship

H1: There is a statistically significant relationship

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. There is a statistically significant relationship.

Spearman

Spearman's rank correlation measures the monotonic relationship between two variables using their ranks, so it does not require normally distributed data.

Hypothesis

Ho: There is no statistically significant relationship

H1: There is a statistically significant relationship

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. There is a statistically significant relationship.

Kendall

Kendall's rank correlation is also based on ranks; it measures how often pairs of observations are ordered the same way on both variables.

Hypothesis

Ho: There is no statistically significant relationship

H1: There is a statistically significant relationship

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. There is a statistically significant relationship.
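
All three correlation tests are available through the `method` argument of base R's `cor.test`. In the sketch below, `sales` and `advertising` are simulated vectors (hypothetical names and values for 400 stores); the report does not state which pair of variables was tested.

```r
# Simulated stand-in for the car seat data (hypothetical values)
set.seed(7)
n <- 400
advertising <- runif(n, min = 0, max = 30)
sales <- 5 + 0.1 * advertising + rnorm(n, sd = 2.5)

# Run the same test with the three correlation methods
methods <- c("pearson", "spearman", "kendall")
results <- lapply(methods, function(m) cor.test(sales, advertising, method = m))
names(results) <- methods

# Ho for each method: the correlation is zero
for (m in methods) {
  res <- results[[m]]
  cat(sprintf("%-8s estimate = %.3f, p-value = %.3g\n",
              m, res$estimate, res$p.value))
}
```

The estimate is Pearson's r, Spearman's rho, or Kendall's tau, depending on the method.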

Regression

Simple Regression

Simple regression analysis examines the relationship between a dependent variable and one independent variable. It fits a linear equation that represents the relationship between the dependent and independent variables.

Data set: a simulated data set containing sales of child car seats at 400 different stores. This data set is used in the simple and multiple regression.

Hypothesis

Ho: Advertising does not increase Sales

H1: Advertising increases Sales

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. Advertising increases Sales.
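
A simple regression of this form can be sketched with base R's `lm`. The data frame `carseats` is simulated here (hypothetical values; the column names `Sales` and `Advertising` follow the variables named above), so the coefficients are illustrative only.

```r
# Simulated stand-in for the car seat data (hypothetical values)
set.seed(8)
n <- 400
carseats <- data.frame(Advertising = runif(n, min = 0, max = 30))
carseats$Sales <- 5 + 0.12 * carseats$Advertising + rnorm(n, sd = 2.5)

# Fit Sales = b0 + b1 * Advertising
fit <- lm(Sales ~ Advertising, data = carseats)
print(summary(fit))

# Slope and its two-sided p-value (Ho: b1 = 0)
slope   <- coef(fit)[["Advertising"]]
p_value <- coef(summary(fit))["Advertising", "Pr(>|t|)"]
cat("slope =", round(slope, 3), ", p-value =", signif(p_value, 3), "\n")
```

The p-value reported by `summary` is two-sided; the sign of the slope shows whether Sales rise or fall with Advertising.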

Multiple Regression

Multiple regression analysis examines the relationship between a dependent variable and more than one independent variable.

Hypothesis

Ho: Advertising and Income do not increase Sales

H1: Advertising and Income increase Sales

Significance level: 0.05.

Decision Making Using the P-value

According to the test result, the p-value is very close to 0. Comparing the p-value with the significance level:

P-value (very close to 0) < Significance Level (0.05)

Reject Ho. Advertising and Income increase Sales.
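
Adding a second predictor only changes the model formula. The sketch below again uses a simulated `carseats` data frame (hypothetical values; the column names follow the variables named above) and shows both the per-coefficient p-values and the overall F test.

```r
# Simulated stand-in for the car seat data (hypothetical values)
set.seed(9)
n <- 400
carseats <- data.frame(
  Advertising = runif(n, min = 0, max = 30),
  Income      = runif(n, min = 20, max = 120)
)
carseats$Sales <- 4 + 0.12 * carseats$Advertising +
  0.015 * carseats$Income + rnorm(n, sd = 2.5)

# Fit Sales = b0 + b1 * Advertising + b2 * Income
fit <- lm(Sales ~ Advertising + Income, data = carseats)
print(summary(fit))

# p-value of each coefficient (Ho: that coefficient is zero)
p_coef <- coef(summary(fit))[, "Pr(>|t|)"]

# Overall F test (Ho: all slopes are zero)
f <- summary(fit)$fstatistic
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
cat("overall p-value =", signif(p_overall, 3), "\n")
```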

Resources

Difference Between Parametric and Nonparametric Test. (2017, January 5). Key Differences. Retrieved from http://keydifferences.com/difference-between-parametric-and-nonparametric-test.html

Elementary Statistics with R. (2017, January 5). Elementary Statistics with R. Retrieved from http://www.r-tutor.com/elementary-statistics

Hipotez Testleri [Hypothesis Tests]. (2017, January 5). İstatistik Analiz Hakkında. Retrieved from http://www.istatistikanaliz.com/default.asp

İSTATİSTİK [Statistics]. (2017, January 5). Retrieved from http://mustafaotrar.net/istatistik/

Using R for statistical analyses - Basic Statistics. (2017, January 5). Gardeners Own. Retrieved from http://www.gardenersown.co.uk/education/lectures/r/basics.htm#t_test

What statistical analysis should I use? (2017, January 5). Institute for Digital Research and Education. Retrieved from http://www.ats.ucla.edu/stat/stata/whatstat/whatstat.htm