Methodological Foundations of the Web-based PayWizard Dataset (USA) Isabelle Ferreras (Harvard Labor and Worklife Program/FNRS/Louvain) Damian Raess (MIT) Members of the team headed by Prof. Richard Freeman (HarvardLWP/NBER) - A 5 Minute Presentation - 3rd Global WageIndicator Conference. Amsterdam, April 16, 2008


Page 1: Methodological Foundations of the Web-based PayWizard Dataset (USA) Isabelle Ferreras (Harvard Labor and Worklife Program/ FNRS/Louvain ) Damian Raess.

Methodological Foundations of the Web-based PayWizard Dataset (USA)

Isabelle Ferreras (Harvard Labor and Worklife Program/FNRS/Louvain)
Damian Raess (MIT)
Members of the team headed by Prof. Richard Freeman (Harvard LWP/NBER)

- A 5 Minute Presentation -

3rd Global WageIndicator Conference. Amsterdam, April 16, 2008

Page 2:

General Approach

How to generalize from a non-representative sample?

• Building a strategy to probe the significance of the web-generated dataset:
– 1/ Questionnaire design
– 2/ Basic statistical tests (a simple model determining income)
– 2.a. Correlation analysis
– 2.b. Regression analysis
– 2.c. Weighting might not be enough
– 3/ More to come... With YOU!

• 18 months after launch, a tough road in the US: a highly competitive market; now 10,000 visitors a month, waiting for completed surveys!

• Number of observations so far = 3000

Page 3:

1/ Questionnaire design

• 1/ Locate the “reference survey” in your country = a random sample representative of the entire US population

• In the USA: US Current Population Survey (generated by the Bureau of Labor Statistics and the Census Bureau)

• 2/ Identify the questions of the WageIndicator survey that correspond to questions covered by that “reference survey”, and aim at matching these.

• Example: the wage question(s) should conform to those of your “reference survey”.

Page 4:

2/ Basic statistical tests
– A simple model determining income by age, sex, education
– Descriptive variables
– Sex, percentage of men: CPS, 51%; PayW, 54%
– Age, median: CPS, 41; PayW, 37
– Education, years of schooling

CPS

. tab peeduca_new

 peeduca_new |      Freq.     Percent        Cum.
-------------+-----------------------------------
         2.5 |         99        0.71        0.71
         5.5 |        153        1.10        1.81
         7.5 |        117        0.84        2.65
           9 |        158        1.13        3.78
          10 |        193        1.38        5.17
          11 |        409        2.93        8.10
          12 |      4,083       29.30       37.40   (HS)
          13 |      2,825       20.27       57.67
          14 |        714        5.12       62.79
          15 |        776        5.57       68.36
          16 |      2,909       20.87       89.23   (BA)
          18 |      1,105        7.93       97.16   (MA)
          20 |        194        1.39       98.55
          22 |        202        1.45      100.00   (PhD)
-------------+-----------------------------------
       Total |     13,937      100.00

PayW

. tab educat_new

  educat_new |      Freq.     Percent        Cum.
-------------+-----------------------------------
         2.5 |          7        0.24        0.24
         5.5 |          3        0.10        0.35
         7.5 |          3        0.10        0.45
           9 |          4        0.14        0.59
          10 |          7        0.24        0.83
          11 |         27        0.94        1.77
          12 |        163        5.65        7.41   (HS)
          13 |        571       19.78       27.19
          14 |        154        5.33       32.53
          15 |        100        3.46       35.99
          16 |        923       31.97       67.96   (BA)
          18 |        541       18.74       86.70   (MA)
          20 |        165        5.72       92.41
          22 |        219        7.59      100.00   (PhD)
-------------+-----------------------------------
       Total |      2,887      100.00

Page 5:

2/ Basic statistical tests
– Weekly wage (logwage)

CPS

. sum logprernwa, d

                         logprernwa
-------------------------------------------------------------
      Percentiles      Smallest
 1%     4.248495       -4.60517
 5%     5.105946       -4.60517
10%     5.500809      -1.469676       Obs               13925
25%     5.991465              0       Sum of Wgt.       13925

50%      6.44572                      Mean           6.413416
                        Largest       Std. Dev.      .7921646
75%     6.907755       7.967145
90%     7.338537       7.967145       Variance       .6275247
95%     7.600903       7.967145       Skewness      -1.355724
99%     7.967145       7.967145       Kurtosis       13.02899

PayW

. sum logwagegro8, d

                         logwagegro8
-------------------------------------------------------------
      Percentiles      Smallest
 1%     3.401197       -4.60517
 5%     5.347107      -2.207275
10%     5.950643      -1.139434       Obs                2875
25%     6.476973              0       Sum of Wgt.        2875

50%     7.003965                      Mean           6.927878
                        Largest       Std. Dev.       1.00245
75%     7.683864       7.967145
90%     7.967145       7.967145       Variance       1.004906
95%     7.967145       7.967145       Skewness      -2.677555
99%     7.967145       7.967145       Kurtosis       19.60873
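The log weekly wages above are easier to compare once converted back to dollars. The following is a minimal Python sketch (not part of the original Stata session) that assumes the variables are natural logs of weekly dollar amounts; the function name `to_dollars` is illustrative only.

```python
import math

# 50th percentiles of log weekly wage, copied from the two summaries above.
LOG_MEDIAN = {"CPS": 6.44572, "PayW": 7.003965}

# Largest and smallest values, which appear in both samples.
LOG_MAX = 7.967145
LOG_MIN = -4.60517


def to_dollars(log_wage):
    """Invert a natural-log transform (an assumption) to get weekly dollars."""
    return math.exp(log_wage)


for sample, log_value in LOG_MEDIAN.items():
    print(f"{sample}: median weekly wage of about ${to_dollars(log_value):,.0f}")

print(f"shared maximum of about ${to_dollars(LOG_MAX):,.2f}")
print(f"shared minimum of about ${to_dollars(LOG_MIN):.2f}")
```

Under that assumption the medians are roughly $630 (CPS) and $1,100 (PayW) per week, and both samples share a ceiling near $2,885 and a floor of one cent. In the PayW sample the 90th percentile already sits at that ceiling. Reading the shared ceiling as a common top-code is an inference from the numbers, not something the slides state.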

Page 6:

2/ Basic statistical tests
– A simple model determining income
– 2.a. Correlation analysis

Table 1: Matrix of Correlations for CPS

              |    peage     pesex  peeduca_new  logprernwa
--------------+--------------------------------------------
        peage |        1
        pesex |   0.0128         1
  peeduca_new |   0.1074    0.0575            1
   logprernwa |   0.2587   -0.2283       0.3675           1

Table 2: Matrix of Correlations for PayW

              |  age_new       sex   educat_new  logwagegro8
--------------+---------------------------------------------
      age_new |        1
          sex |   0.0093         1
   educat_new |   0.0120   -0.0071            1
  logwagegro8 |   0.2303   -0.1580       0.1680            1

Page 7:

2/ Basic statistical tests
– A simple model determining income
– 2.b. Regression analysis

A) CPS - UNWEIGHTED

. reg logprernwa peage peagesq pesex peeduca_new

      Source |       SS       df       MS              Number of obs =   13925
-------------+------------------------------           F(  4, 13920) = 1389.69
       Model |  2493.50892     4   623.37723           Prob > F      =  0.0000
    Residual |  6244.14544 13920  .448573667           R-squared     =  0.2854
-------------+------------------------------           Adj R-squared =  0.2852
       Total |  8737.65436 13924  .627524732           Root MSE      =  .66976

------------------------------------------------------------------------------
  logprernwa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       peage |   .0968396    .003055    31.70   0.000     .0908514    .1028279
     peagesq |  -.0010304   .0000376   -27.38   0.000    -.0011042   -.0009566
       pesex |  -.3917856   .0113767   -34.44   0.000    -.4140855   -.3694857
 peeduca_new |   .0971796   .0020588    47.20   0.000      .093144    .1012152
       _cons |   3.200938   .0618519    51.75   0.000       3.0797    3.322176
------------------------------------------------------------------------------

Page 8:

2/ Basic statistical tests
– A simple model determining income
– 2.b. Regression analysis

B) PayW

. reg logwagegro8 age_new age_newsq sex educat_new

      Source |       SS       df       MS              Number of obs =    2861
-------------+------------------------------           F(  4,  2856) =  105.35
       Model |  358.531783     4  89.6329458           Prob > F      =  0.0000
    Residual |  2429.99993  2856   .85084031           R-squared     =  0.1286
-------------+------------------------------           Adj R-squared =  0.1274
       Total |  2788.53171  2860  .975011087           Root MSE      =  .92241

------------------------------------------------------------------------------
 logwagegro8 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_new |   .1254245   .0115034    10.90   0.000     .1028687    .1479803
   age_newsq |  -.0013163   .0001425    -9.24   0.000    -.0015957   -.0010369
         sex |  -.2948835   .0346723    -8.50   0.000    -.3628687   -.2268984
  educat_new |   .0506158   .0059966     8.44   0.000     .0388577    .0623738
       _cons |   3.541355   .2324046    15.24   0.000     3.085658    3.997053
------------------------------------------------------------------------------
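Both regressions model log wage as a quadratic in age plus sex and years of schooling, so the coefficients can be turned into more readable quantities. The Python sketch below is illustrative only (the slides use Stata). It takes the coefficients from regressions A and B as given and assumes natural logs; `peak_age` and `pct_effect` are hypothetical helper names.

```python
import math

# Coefficients copied from regression A (CPS, unweighted) and regression B (PayW).
COEFS = {
    "CPS": {"age": 0.0968396, "age_sq": -0.0010304,
            "sex": -0.3917856, "educ": 0.0971796},
    "PayW": {"age": 0.1254245, "age_sq": -0.0013163,
             "sex": -0.2948835, "educ": 0.0506158},
}


def peak_age(b_age, b_age_sq):
    """Age that maximises b_age*age + b_age_sq*age**2 (first-order condition)."""
    return -b_age / (2.0 * b_age_sq)


def pct_effect(coef):
    """Percentage change in the wage implied by a coefficient in a log-wage model."""
    return (math.exp(coef) - 1.0) * 100.0


for sample, c in COEFS.items():
    print(
        f"{sample}: wage profile peaks near age {peak_age(c['age'], c['age_sq']):.1f}; "
        f"one extra year of schooling: {pct_effect(c['educ']):+.1f}%; "
        f"one-unit change in the sex variable: {pct_effect(c['sex']):+.1f}%"
    )
```

On these figures the age profile peaks at about 47 in both samples, while the estimated return to a year of schooling is roughly 10% in the CPS against roughly 5% in PayW. The slides do not state how the sex variable is coded, so only the size of that effect is interpretable here, not which group it favours.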

Page 9:

2/ Basic statistical tests
– 2.b. Regression analysis

Weighting factor:
– CPS weighted to be comparable to PayW

                     | PayW (%)   CPS (%)   Cps_weight
---------------------+--------------------------------
       < High School |    1.77      8.10     0.218519
         High School |    5.64     29.30     0.192491
  Some Col./Ass.deg  |   28.58     30.96     0.923127
        College (BA) |   31.97     20.87     1.531864
           > College |   32.04     10.77     2.974930
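The Cps_weight column is consistent with a simple ratio of education shares: the PayW share divided by the CPS share for each category. The Python sketch below reproduces that arithmetic from the shares in the table above. The ratio rule is inferred from the numbers, and the name `cps_weight` is illustrative rather than taken from the authors' code.

```python
# Education shares in percent, copied from the weighting table: (label, PayW, CPS).
SHARES = [
    ("< High School", 1.77, 8.10),
    ("High School", 5.64, 29.30),
    ("Some Col./Ass.deg", 28.58, 30.96),
    ("College (BA)", 31.97, 20.87),
    ("> College", 32.04, 10.77),
]


def cps_weight(payw_pct, cps_pct):
    """Weight that rescales a CPS education share to the matching PayW share."""
    return payw_pct / cps_pct


weights = {label: cps_weight(payw, cps) for label, payw, cps in SHARES}

for label, weight in weights.items():
    print(f"{label:<20} {weight:.6f}")

# Sanity check: reweighted CPS shares should add back up to about 100%.
reweighted_total = sum(cps * weights[label] for label, _, cps in SHARES)
print(f"reweighted CPS total: {reweighted_total:.2f}%")
```

Each CPS respondent would then carry the weight of his or her education category, which is how a weight variable like the one passed to `[pweight=weight]` could be built.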

C) CPS - WEIGHTED

. reg logprernwa peage peagesq pesex peeduca_new [pweight=weight]
(sum of wgt is   1.3921e+04)

Linear regression                                      Number of obs =   13925
                                                       F(  4, 13920) =  660.61
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2464
                                                       Root MSE      =  .71414

------------------------------------------------------------------------------
             |               Robust
  logprernwa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       peage |   .0996643   .0044517    22.39   0.000     .0909385    .1083902
     peagesq |  -.0010605   .0000562   -18.87   0.000    -.0011707   -.0009504
       pesex |  -.3597226   .0165176   -21.78   0.000    -.3920993   -.3273458
 peeduca_new |    .095726   .0038926    24.59   0.000     .0880959     .103356
       _cons |   3.165757    .094409    33.53   0.000     2.980703    3.350811
------------------------------------------------------------------------------
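One rough way to read the point that weighting might not be enough is to compare the 95% confidence intervals for the education coefficient across the three regressions. The Python sketch below does only that. Interval overlap is a coarse heuristic rather than a formal test of equality, and the helper name `intervals_overlap` is hypothetical.

```python
# 95% confidence intervals for the education coefficient, copied from the
# three regressions: CPS without weights, CPS with [pweight=weight], and PayW.
EDUC_CI = {
    "CPS without weights": (0.093144, 0.1012152),
    "CPS with pweight": (0.0880959, 0.103356),
    "PayW": (0.0388577, 0.0623738),
}


def intervals_overlap(a, b):
    """Return True when two closed intervals (low, high) share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


names = list(EDUC_CI)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        verdict = "overlap" if intervals_overlap(EDUC_CI[first], EDUC_CI[second]) else "do not overlap"
        print(f"{first} vs {second}: intervals {verdict}")
```

Reweighting the CPS to the PayW education mix barely moves its education coefficient (about 0.097 to about 0.096), and its interval still lies well above the PayW interval. On this reading, differences in education composition alone do not account for the gap between the two samples.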