Elite_data Analysis With Stata


Introduction to Data Analysis with Stata
Sara Godoy. Grupo Avanzado (Advanced Group). November 2011

+

Nonparametric Analysis

+

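Non-parametric tests: summary

NATURE OF DEPENDENT VARIABLE | ONE-SAMPLE                         | TWO-SAMPLE, RELATED/MATCHED | TWO-SAMPLE, INDEPENDENT    | K-SAMPLE, INDEPENDENT
-----------------------------+------------------------------------+-----------------------------+----------------------------+----------------------
CATEGORICAL/NOMINAL          | Binomial test                      | McNemar test                | Fisher's exact test        | Chi-square test
ORDINAL/INTERVAL             | Kolmogorov-Smirnov one-sample test | Wilcoxon signed-ranks test  | Wilcoxon-Mann-Whitney test | Kruskal-Wallis test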
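The summary table is effectively a lookup keyed on two questions: how the dependent variable is measured and how the samples are structured. As a minimal sketch (in Python, only to make the lookup explicit), the table can be encoded as a dictionary. The names `NONPARAMETRIC_TESTS` and `choose_test` and the short design labels are hypothetical, not part of Stata.

```python
# Hypothetical lookup encoding the summary table; names and labels are illustrative.
# "nominal" stands for the CATEGORICAL/NOMINAL row, "ordinal" for ORDINAL/INTERVAL.
NONPARAMETRIC_TESTS = {
    ("nominal", "one-sample"): "Binomial test",
    ("nominal", "two related"): "McNemar test",
    ("nominal", "two independent"): "Fisher's exact test",
    ("nominal", "k independent"): "Chi-square test",
    ("ordinal", "one-sample"): "Kolmogorov-Smirnov one-sample test",
    ("ordinal", "two related"): "Wilcoxon signed-ranks test",
    ("ordinal", "two independent"): "Wilcoxon-Mann-Whitney test",
    ("ordinal", "k independent"): "Kruskal-Wallis test",
}


def choose_test(level, design):
    """Return the test the summary table lists for a measurement level and design."""
    key = (level.strip().lower(), design.strip().lower())
    try:
        return NONPARAMETRIC_TESTS[key]
    except KeyError:
        raise KeyError(f"no table entry for level={level!r}, design={design!r}") from None


print(choose_test("ordinal", "two independent"))  # Wilcoxon-Mann-Whitney test
```

The table only narrows the choice; the assumptions of each test still need checking against the data.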

+

Non-parametric correlation

A Spearman correlation is used when one or both of the variables are not assumed to be normally distributed and interval-scaled (but are assumed to be ordinal). The values of the variables are converted into ranks, and the ranks are then correlated.

Syntax: spearman [varlist] [if] [in] [, options]

    spearman read write

     Number of obs =     200
    Spearman's rho =  0.6167

    Test of Ho: read and write are independent
        Prob > |t| =  0.0000

The results suggest that the relationship between read and write (rho = 0.6167, p < 0.0001) is statistically significant.
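To make "converted into ranks and then correlated" concrete, here is a minimal Python sketch, assuming tied values receive the average of their ranks (the usual convention). The function names are hypothetical. Because the Stata output reports `Prob > |t|`, the sketch also includes the common t statistic for testing independence, t = rho * sqrt((n - 2) / (1 - rho^2)) with n - 2 degrees of freedom; treat that as the standard textbook form rather than a statement of Stata's internals.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_t_statistic(rho, n):
    """t statistic (n - 2 df) for testing H0: the two variables are independent."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


print(spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))  # 0.8
print(round(rho_t_statistic(0.6167, 200), 2))  # 11.02 for the read/write result
```

A perfectly monotone but non-linear relationship, such as y = x squared for positive x, gives rho = 1, which is why Spearman suits ordinal data.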

+

P-values: meaning

- A p-value is a measure of how much evidence we have against the null hypothesis (H0): the smaller the p-value, the stronger the evidence against H0.
- The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true.
- One often "rejects the null hypothesis" when the p-value is less than the chosen significance level (α), commonly 0.05.
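The definition "at least as extreme as the one observed, assuming H0 is true" can be computed directly for a simple case. A minimal Python sketch, assuming a one-sided exact binomial test of a fair coin (H0: p = 0.5) where "more extreme" means "as many heads or more"; the function names are hypothetical.

```python
from math import comb


def binomial_p_upper(k, n, p0=0.5):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p0), i.e. under H0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def reject_h0(p_value, alpha=0.05):
    """Conventional decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha


p = binomial_p_upper(9, 10)  # 9 heads in 10 tosses of a supposedly fair coin
print(round(p, 4), reject_h0(p))  # 0.0107 True
```

Here P(X >= 9) = (10 + 1) / 1024, about 0.0107, which is below 0.05, so H0 would be rejected at the 5% level. With 7 heads the p-value is 176 / 1024, about 0.17, and H0 would not be rejected.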

    xi: regress csat expense percent income high college i.region, robust

    Number of obs =     50
    F(8, 41)      =  69.82
    Prob > F      = 0.0000
    R-squared     = 0.9111
    Root MSE      = 21.492

    ------------------------------------------------------------------------------
                 |                Robust
            csat |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         expense |   -.002021   .0035883    -0.56   0.576    -.0092676    .0052256
         percent |  -3.007647   .2358047   -12.75   0.000    -3.483864    -2.53143
          income |  -.1674421   1.196409    -0.14   0.889    -2.583638    2.248754
            high |   1.814731    1.02694     1.77   0.085    -.2592168    3.888679
         college |   4.670564   1.599798     2.92   0.006     1.439705    7.901422
      _Iregion_2 |   69.45333   17.99933     3.86   0.000     33.10295    105.8037
      _Iregion_3 |   25.39701   12.52558     2.03   0.049      .101086    50.69293
      _Iregion_4 |   34.57704    9.44989     3.66   0.001      15.4926    53.66149
           _cons |   808.0206   67.86418    11.91   0.000     670.9661    945.0751
    ------------------------------------------------------------------------------
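The columns of this output are linked by simple arithmetic: the t statistic is the coefficient divided by its robust standard error, and the 95% confidence interval is the coefficient plus or minus a t critical value times that standard error. A worked check on the expense row, assuming residual degrees of freedom 50 - 9 = 41 (50 observations, 8 slopes plus a constant) and the tabulated critical value t(0.975, 41) ≈ 2.0195:

$$
t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{-0.002021}{0.0035883} \approx -0.56
$$

$$
\hat{\beta} \pm t_{0.975,\,41}\,\mathrm{SE}(\hat{\beta}) = -0.002021 \pm 2.0195 \times 0.0035883 \approx (-0.0092676,\ 0.0052256)
$$

Both results match the expense row, so the interval excludes nothing that the t test does not: a 95% interval containing 0 corresponds to P>|t| above 0.05.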

NOTE: By default, xi omits the first (lowest-coded) category and uses it as the reference. To select a different reference category, type the following before running the regression:

    char region[omit] 4
    xi: regress csat expense percent income high college i.region, robust

This makes Midwest (region = 4) the reference category for the dummy variables.
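What xi does with i.region is dummy (indicator) coding: one 0/1 column per category, except the reference category, whose effect is absorbed into the constant. A minimal Python sketch of that idea, assuming the lowest code is the default reference as described in the note; `dummy_code`, the `region_<k>` column names and the six region codes are hypothetical (xi itself names its columns `_Iregion_2` and so on).

```python
def dummy_code(values, prefix, omit=None):
    """Return (reference_category, {column_name: list of 0/1}) for a categorical variable.

    By default the lowest category is omitted as the reference; pass `omit`
    to choose another reference, as `char region[omit] 4` does for xi.
    """
    levels = sorted(set(values))
    reference = levels[0] if omit is None else omit
    if reference not in levels:
        raise ValueError(f"reference category {reference!r} does not occur in the data")
    columns = {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }
    return reference, columns


region = [1, 2, 3, 4, 4, 2]  # hypothetical region codes for six observations
print(dummy_code(region, "region"))          # reference 1: columns region_2..region_4
print(dummy_code(region, "region", omit=4))  # reference 4: columns region_1..region_3
```

Each dummy coefficient is then read as the difference from the reference category, which is why changing the omitted category changes the coefficients but not the model's fit.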

+

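Regression: correlation matrix

Below is a correlation matrix for all variables in the model. The numbers are Pearson correlation coefficients, which range from -1 to 1. Values closer to -1 or 1 indicate a stronger linear relationship; values near 0 indicate little linear relationship. A negative value indicates an inverse relationship (roughly, when one goes up the other goes down). The sig option prints the significance level (p-value) below each coefficient, and star(0.05) marks coefficients significant at the 5% level with an asterisk.

    pwcorr csat expense percent income high college, star(0.05) sig

                 |     csat   expense   percent    income      high   college
    -------------+------------------------------------------------------------
            csat |   1.0000
                 |
                 |
         expense |  -0.4663*   1.0000
                 |   0.0006
                 |
         percent |  -0.8758*   0.6509*   1.0000
                 |   0.0000    0.0000
                 |
          income |  -0.4713*   0.6784*   0.6733*   1.0000
                 |   0.0005    0.0000    0.0000
                 |
            high |   0.0858    0.3133*   0.1413    0.5099*   1.0000
                 |   0.5495    0.0252    0.3226    0.0001
                 |
         college |  -0.3729*   0.6400*   0.6091*   0.7234*   0.5319*   1.0000
                 |   0.0070    0.0000    0.0000    0.0000    0.0001
                 |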
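The stars depend on a p-value for H0: the population correlation is 0. As a rough cross-check, the sketch below uses Fisher's z transform, z = atanh(r) * sqrt(n - 3), which is approximately standard normal under H0. This is a large-sample approximation swapped in for illustration, not necessarily how pwcorr computes its p-values, and n = 51 is an assumption (it matches the N of the regressions on these same variables later in this transcript). The function names are hypothetical.

```python
import math


def approx_corr_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0, via Fisher's z transform.

    Requires -1 < r < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF at |z|
    return 2.0 * (1.0 - phi)


def starred(r, n, level=0.05):
    """Format r to 4 decimals, appending '*' when the approximate p-value < level."""
    p = approx_corr_p_value(r, n)
    return f"{r:.4f}*" if p < level else f"{r:.4f}"


for r in (-0.4663, 0.0858, 0.3133):
    print(starred(r, 51))  # -0.4663*, then 0.0858, then 0.3133*
```

For high and csat (r = 0.0858) the approximation gives p of about 0.55, close to the 0.5495 that pwcorr reports, and the star pattern agrees with the matrix for the coefficients checked here.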

+

Regression: graph matrix

The graph matrix command produces a graphical representation of the correlation matrix by presenting a series of scatterplots for all pairs of variables. The half option draws only the lower triangle, and maxis(ylabel(none) xlabel(none)) suppresses the axis labels.

    graph matrix csat expense percent income high college, half maxis(ylabel(none) xlabel(none))

+

Regression: managing all this output

- Usually, when we're running regressions, we'll be testing multiple models at a time, which can make the results difficult to compare.
- Stata offers several user-friendly options for storing and viewing regression output from multiple models:
  - Storing output: eststo / esttab
  - Outputting into Excel: outreg2

+

Regression: eststo/esttab

We can store this information in Stata; just type:

    regress csat expense, robust
    eststo model1
    regress csat expense percent income high college, robust
    eststo model2
    xi: regress csat expense percent income high college i.region, robust
    eststo model3
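Conceptually, eststo keeps each model's estimates under a name and esttab lays the stored models out side by side, leaving a cell blank when a model does not include that variable. A minimal Python sketch of that layout idea (not of estout's actual implementation); `side_by_side` and `stored` are hypothetical names, and the numbers are the coefficients and t statistics of models (1) and (2) from the esttab output in this section. Significance stars are left out because they would need p-values.

```python
def side_by_side(models, width=12):
    """Lay out stored estimates in columns, one column per model.

    `models` maps a model name to {variable: (coefficient, t_statistic)}.
    Each variable gets a coefficient line and a line with the t statistic
    in parentheses; cells are blank where a model omits that variable.
    """
    names = list(models)
    # Collect variables in order of first appearance, keeping _cons last.
    variables = []
    for estimates in models.values():
        for var in estimates:
            if var not in variables:
                variables.append(var)
    if "_cons" in variables:
        variables.remove("_cons")
        variables.append("_cons")

    lines = [" " * width + "".join(name.rjust(width) for name in names)]
    for var in variables:
        coef_line, t_line = var.ljust(width), " " * width
        for name in names:
            if var in models[name]:
                coef, t = models[name][var]
                coef_line += format(coef, ".5g").rjust(width)
                t_line += f"({t:.2f})".rjust(width)
            else:
                coef_line += " " * width
                t_line += " " * width
        lines += [coef_line, t_line]
    return "\n".join(lines)


stored = {
    "model1": {"expense": (-0.0223, -6.07), "_cons": (1060.7, 43.55)},
    "model2": {"expense": (0.00335, 0.70), "percent": (-2.618, -11.44),
               "income": (0.106, 0.09), "high": (1.631, 1.73),
               "college": (2.031, 0.96), "_cons": (851.6, 14.86)},
}
print(side_by_side(stored))
```

Collecting the union of variables across models is what lets nested models with different regressors share one table.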

+

Regression: eststo/esttab

Now Stata will hold your output in memory until you ask to recall it:

    esttab model1 model2 model3

    ------------------------------------------------------------
                         (1)             (2)             (3)
                        csat            csat            csat
    ------------------------------------------------------------
    expense          -0.0223***      0.00335        -0.00202
                     (-6.07)          (0.70)         (-0.56)

    percent                           -2.618***       -3.008***
                                    (-11.44)        (-12.75)

    income                             0.106          -0.167
                                      (0.09)         (-0.14)

    high                               1.631           1.815
                                      (1.73)          (1.77)

    college                            2.031           4.671**
                                      (0.96)          (2.92)

    _Iregion_2                                         69.45***
                                                      (3.86)

    _Iregion_3                                         25.40*
                                                      (2.03)

    _Iregion_4                                         34.58***
                                                      (3.66)

    _cons             1060.7***        851.6***        808.0***
                     (43.55)         (14.86)         (11.91)
    ------------------------------------------------------------
    N                     51              51              50
    ------------------------------------------------------------

t statistics in parentheses * p