Variable Selection in Wide Data Sets
Bob Stine
Department of Statistics
The Wharton School of the University of Pennsylvania
www-stat.wharton.upenn.edu/~stine
Univ of Penn – p. 1
Overview
• Problem
  • Picking the features to use in a model
  • Wide data sets imply many choices
• Examples
  • Simulated idealized problems
  • Predicting credit risk
• Themes
  • Modifying familiar tools for data mining
  • Dealing with the problem of multiplicity
Collaboration with Dean Foster
Questions
Credit scoring
  Who will repay the loan? Who will declare bankruptcy?
Identifying faces
  Is this a picture of a face?
Genomics
  Does a pattern of genes predict higher risk of a disease?
Text processing
  Which references were left out of this paper?
These are great statistics problems, so...
Why not use our workhorse, regression?
It's familiar, with diagnostics available.
Regression Models
Simple structure
• $n$ independent observations of $m$ features
• $q$ predictors in model with error variance $\sigma^2$:
  $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_q X_q + \epsilon$
Calibrated predictions allow various link functions ...
• Linear regression has identity, logistic has logit link.
  $E(Y \mid X) = h(\beta_0 + \beta_1 X_1 + \cdots + \beta_q X_q)$
Rich feature space is key to the model
• Usual mix: continuous, categorical, dummy vars, ...
• Missing data indicators
• Combinations (principal components) and clusters
• Nonlinear terms, transformations (quadratics)
Hard Question
If we allow for the possibility of interactions among the predictors, then the feature space becomes larger still.
Which features generate the best predictions?
Also hope to do this without cross-validation.
A favorite example: Stock Market
Where's the stock market headed in 2005?
Daily percentage changes in the closing price of the S&P 500 during the last 3 months of 2004 (85 trading days) ...
[Figure: Pct Change vs. Day, with "???" marked on the plot]
Predicting the Market
Problem
  Predict returns on S&P 500 in 2005 using features built from 12 exogenous factors (a.k.a. technical trading rules).
Regression model
  $R^2 = 0.85$ using $q = 28$ predictors.
  With $n = 85$, $F = 12$ with p-value < 0.00001.
Coefficients are impressive...
Term Estimate Std Error t-Ratio p-value
Intercept 0.323 0.078 4.14 0.0001
X4 0.172 0.040 4.34 0.0000
(X1)*(X1) -0.202 0.039 -5.16 0.0000
(X1)*(X5) 0.256 0.048 5.34 0.0000
(X2)*(X6) 0.289 0.044 6.59 0.0000
(X7)*(X9) 0.249 0.046 5.37 0.0000
...
Model Fit and Predictions
Predictions track the historical returns closely, even matching turning points.
[Figure: Daily Pct Change vs. Day]
Better short the market that day... No guts, no glory!
Prediction Errors
In-sample prediction errors are small...
[Figure: prediction errors, Pct Change vs. Day]
Prediction Errors
How'd you lose the house? Even Bonferroni lost his house!
[Figure: prediction errors, Pct Change vs. Day]
What was that model?
Exogenous base variables
  12 columns of Gaussian noise. Null model is the right model.
Selecting predictors
  Turn stepwise regression loose on $m = 90$ features
    12 linear + 12 squares + 66 interactions
  with “promiscuous” settings for adding variables,
    prob-to-enter = 0.16 (AIC).
  Then run stepwise backward to remove extraneous effects.
Why does this fool Bonferroni?
  $s^2$ becomes biased (too small), so remaining predictors appear more significant. Often “cascades” into a perfect fit.
  Cannot fit saturated model to get unbiased estimate of $\sigma^2$ because $m > n$.
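The experiment described on this slide can be imitated in a few lines. The following is a minimal sketch, not the software used for the talk: it builds 90 features from 12 noise columns, then runs a greedy forward search twice, once with an entry rule of $t^2 > 2$ (close to prob-to-enter 0.16) and once with $t^2 > 2 \log m$ applied from the first step. The function name `forward_stepwise`, the seed, and the cap of 40 terms are assumptions made for illustration, and the backward pass mentioned on the slide is omitted.

```python
import numpy as np

def forward_stepwise(X, y, t2_enter, max_terms=40):
    """Greedy forward selection: add the candidate with the largest t^2
    as long as that t^2 exceeds t2_enter. Returns (selected columns, R^2)."""
    n, m = X.shape
    Q = np.ones((n, 1)) / np.sqrt(n)      # orthonormal basis of current model (intercept only)
    resid = y - y.mean()
    tss = resid @ resid
    used = np.zeros(m, dtype=bool)
    selected = []
    while len(selected) < max_terms:
        Z = X - Q @ (Q.T @ X)             # candidates with the current model swept out
        zz = (Z ** 2).sum(axis=0)
        zr = Z.T @ resid
        zz[used], zr[used] = 1.0, 0.0     # columns already in the model cannot re-enter
        gain = zr ** 2 / zz               # drop in residual SS from adding each candidate
        rss = resid @ resid
        df = n - len(selected) - 2        # intercept + current terms + the new term
        t2 = gain / ((rss - gain) / df)   # squared t-ratio, with s^2 from the enlarged model
        j = int(np.argmax(t2))
        if t2[j] <= t2_enter:
            break
        q = Z[:, j] / np.sqrt(zz[j])
        Q = np.column_stack([Q, q])
        resid = resid - q * (q @ resid)
        used[j] = True
        selected.append(j)
    return selected, 1.0 - (resid @ resid) / tss

rng = np.random.default_rng(2005)         # arbitrary seed
n, k = 85, 12
base = rng.standard_normal((n, k))        # 12 columns of Gaussian noise
inter = np.column_stack([base[:, i] * base[:, j]
                         for i in range(k) for j in range(i + 1, k)])
X = np.column_stack([base, base ** 2, inter])   # 12 + 12 + 66 = 90 features
y = rng.standard_normal(n)                # the response is pure noise as well

m = X.shape[1]
aic_terms, aic_r2 = forward_stepwise(X, y, t2_enter=2.0)
ric_terms, ric_r2 = forward_stepwise(X, y, t2_enter=2 * np.log(m))
print("promiscuous rule:", len(aic_terms), "terms, R^2 =", round(aic_r2, 2))
print("2 log m rule:    ", len(ric_terms), "terms, R^2 =", round(ric_r2, 2))
```

Both runs follow the same greedy path and differ only in where they stop, so any gap in $R^2$ between them comes entirely from the entry threshold.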
Big Picture, 1
Moral
  Stepwise regression can make a silk purse from a sow's ear.
Must control the fitting process
  Use Bonferroni from the start, not after the fact. Then no predictor joins the model, which is the right answer.
So, how does one control the fitting process?
  Particularly when there are many possible choices.
Second Example: Predicting Bankruptcy
Predict onset of personal bankruptcy
  Estimate probability customer declares bankruptcy.
Challenge
  Can stepwise regression predict as well as commercial “data-mining” tools or substantive models?
Many features
  About 350 “basic” variables
  • Short time series for each account
  • Spending, utilization, payments, background
  • Missing data and indicators
  • Interactions are important (cash advance in Vegas)
  $m > 67{,}000$ predictors!
  • Transaction history would vastly expand the problem.
Complication: Sparse Response
How much data do you really have?
  3 million months of activity, each with 67,000 predictors.
Needle in many haystacks
  Observe only 2,244 bankruptcies.
Further complication
  What loss function should be minimized?
  Profitable customers look risky. Want to lose them?
  Who is the ideal customer?
  “Borrow lots of money and pay it back slowly.”
Cross-validation
  Less appealing with so few events.
Approach
Forward stepwise search
  Use p-values to determine whether to add predictors.
Two questions
  1. How to calculate a p-value?
  2. Where to put the threshold for the p-value?
Risk inflation criterion (RIC) ($m$ = # possible predictors)
  Hard thresholding, Bonferroni, Fisher's method
  Add $X_j \iff t_j^2 > 2 \log m \iff p_j < \frac{1}{m}$
Obtains RIC bound (Foster & George 1994)
  $\min_{\hat\beta} \max_{\beta} \frac{E\|Y - X\hat\beta\|^2}{|\beta|\,\sigma^2} \le 2 \log m$
  where $|\beta| = \#\{\beta_j \ne 0\}$.
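The link between the cutoff $t_j^2 > 2 \log m$ and a p-value threshold near $1/m$ can be checked numerically. The sketch below assumes a two-sided normal reference distribution for $t_j$ (an assumption, since the slide does not say which distribution is used) and prints the p-value at the RIC cutoff next to $1/m$ for a few values of $m$ taken from the examples in this talk.

```python
import math

def two_sided_normal_p(t):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(t) / math.sqrt(2.0))

for m in (90, 67_000, 10_000_000):
    t_ric = math.sqrt(2.0 * math.log(m))      # RIC cutoff for |t|
    p_at_cutoff = two_sided_normal_p(t_ric)   # p-value of a statistic sitting exactly at the cutoff
    print(f"m={m:>10,}  |t| cutoff={t_ric:5.2f}  "
          f"p at cutoff={p_at_cutoff:.2e}  1/m={1.0 / m:.2e}")
```

Under this normal approximation the two thresholds agree up to a slowly varying factor of order $1/\sqrt{\log m}$, so the equivalence on the slide is best read as approximate.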
Adaptive Variable Selection
Sparse models
  RIC is best when the “truth” is sparse, but lacks power if there is much signal.
  False discovery rate motivates an alternative.
Adaptive threshold
  If you've added $q$ predictors,
  Add $X_j \iff p_j < \frac{q+1}{m}$
Further motivation
• Half-normal plot (Cuthbert Daniel?)
• Generalized degrees of freedom (Ye 1998, 2002)
• Empirical Bayes (George & Foster 2000)
• Information theory (Foster, Stine & Wyner 2002)
Universal prior
  Which predictors minimize this ratio?
  $\min_{q} \max_{\pi} \frac{E\|Y - \hat{Y}(q)\|^2}{E\|Y - \hat{Y}(\pi)\|^2}$ for $\beta \sim \pi$
Example: Finding Subtle Signal
Signal is a Brownian bridge
  Stylized version of financial volatility.
  $Y_t = BB_t + \sigma \epsilon_t$
[Figure: series plotted over t from 0 to 1000]
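A series of the form $Y_t = BB_t + \sigma \epsilon_t$ is easy to simulate. The sketch below is one possible construction, assuming a standard Brownian bridge on $[0, 1]$ sampled at 1000 steps and a noise level $\sigma = 1$; the slide does not state the scaling of the signal or the value of $\sigma$, so both are illustrative choices.

```python
import random

def brownian_bridge(n, rng):
    """Discretized Brownian bridge on n + 1 grid points, pinned to 0 at both ends."""
    walk = [0.0]
    for _ in range(n):
        walk.append(walk[-1] + rng.gauss(0.0, (1.0 / n) ** 0.5))   # Brownian motion W_t on [0, 1]
    return [w - (i / n) * walk[-1] for i, w in enumerate(walk)]    # B_t = W_t - t * W_1

rng = random.Random(0)          # arbitrary seed
n, sigma = 1000, 1.0            # sigma is an assumed noise level
bb = brownian_bridge(n, rng)
y = [b + sigma * rng.gauss(0.0, 1.0) for b in bb]
print(min(y), max(y))
```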
Example: Finding Subtle Signal
Wavelet transform has many coefficients, but none is very large relative to the others.
[Figure: plot of the wavelet coefficients]
[Figure: MSE with adaptive (top) vs. hard (bottom)]
But it fails for bankruptcy!
Thresholding classical p-values does not work
  Fit on 600,000, predict 2,400,000.
  As fitting proceeds, out-of-sample error shows sudden spike.
  Diagnostic plots revealed the problem.
Stylized problem
  $n = 10{,}000$, dithered: $X_1 = 1$, $X_2, \ldots, X_{10{,}000} \sim N(0, 0.025)$
  $P(Y = 1) = 1/1000$, independent of $X$.
[Figure: scatterplot of Y vs. X]
$t = 14$, but common sense suggests $p \approx 1/1000$.
What went wrong?
Theory implies p-values are known
• Normal distribution on error (thin tailed).
• Know “true” error variance $\sigma^2$, plus it's constant.
• Predictors are orthogonal.
$\Rightarrow \hat\beta_j \overset{iid}{\sim} N(\beta_j, SE_j)$
In practice...
• Is any data ever normally distributed?
• Don't believe there's a true model, much less fixed error variance.
• Collinear features, frequently with $m > n$.
$\Rightarrow$ At best, can only guess p-values.
Honest p-values
Concern from stock example
  Must avoid “false positives” that lead to inaccurate predictions and a cascade of mistakes.
Robustness of validity
  Rank regression would also protect you to some extent, but has problems dealing with extreme heteroscedasticity.
Alternatives
  Method                          Requires
  Robust estimator (e.g. ranks)   Homoscedastic
  White estimator                 Symmetry
  Bennett bounds                  Bounded Y
Bounded influence estimators
  Another possibility, but can it be computed fast enough?
White Estimator
$Y = \beta_0 + \beta_{q,1} X_{q,1} + \cdots + \beta_{q,q} X_{q,q} + \epsilon$
Sandwich formula (H. White, 1980, Econometrica)
  $\mathrm{Var}(\hat\beta_q) = (X_q' X_q)^{-1} X_q' \, \underbrace{\mathrm{Var}(\epsilon)} \, X_q (X_q' X_q)^{-1}$
Estimate variance using the residuals from the prior step:
  $\widehat{\mathrm{Var}}(\hat\beta_q) = (X_q' X_q)^{-1} X_q' \, \underbrace{\mathrm{Diag}(e_{q-1}^2)} \, X_q (X_q' X_q)^{-1}$
Result in stylized problem
  Estimated SE is 10 times larger, more appropriate p-value.
Moral: Watch your null hypothesis!
  Only test $H_0: \beta_{q,q} = 0$ rather than
  $H_0: \beta_{q,q} = 0$ & $\mathrm{Var}(\epsilon_i) = \sigma^2$
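The contrast between the classical and sandwich standard errors can be reproduced on data shaped like the stylized problem. The sketch below is an illustration under stated assumptions rather than the original computation: it reads 0.025 as a standard deviation, forces the single high-leverage case to be an event (the situation in which the large t-ratio arises), and uses residuals from the intercept-only fit as the "prior step" residuals.

```python
import numpy as np

rng = np.random.default_rng(1)        # arbitrary seed
n = 10_000
x = rng.normal(0.0, 0.025, size=n)    # assumes 0.025 is a standard deviation
x[0] = 1.0                            # the single high-leverage case
y = (rng.random(n) < 1 / 1000).astype(float)
y[0] = 1.0                            # assumption: the leverage point happens to be an event

xc = x - x.mean()
sxx = xc @ xc
b = (xc @ (y - y.mean())) / sxx       # least squares slope

# Classical OLS standard error: one error variance estimated from the residuals
resid = (y - y.mean()) - b * xc
se_ols = np.sqrt((resid @ resid) / (n - 2) / sxx)

# White (sandwich) standard error with residuals from the prior step (model without x)
e_prior = y - y.mean()
se_white = np.sqrt((xc ** 2) @ (e_prior ** 2)) / sxx

print(f"slope={b:.3f}  OLS t={b / se_ols:.1f}  White t={b / se_white:.1f}")
```

Using the prior-step residuals means the variance estimate is computed under $H_0: \beta_{q,q} = 0$ alone, without also assuming constant error variance.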
Example: Finding Missed Signal
Question
  Does White estimator ever find an effect that OLS misses? Usually OLS underestimates SE.
Heteroscedastic data
  High variance at 0 obscures the differences at extremes.
[Figure: scatterplot of the heteroscedastic data]
Least squares
  Standard OLS reports SE = 3.6 giving t = 1.2.
  White SE ≈ 0.9 and finds the underlying effect.
Honest p-values for Testing $H_0: \beta = 0$
Three methods
  Each method makes some assumption about the data in order to produce a reliable p-value.
  Method                          Requires
  Robust estimator (e.g. ranks)   Homoscedastic
  White estimator                 Symmetry
  Bennett bounds                  Bounded Y
Bennett Inequality
Think differently
  Sampling distribution of $\hat\beta$ is not normal because huge leverage contributes “Poisson-like” variation.
Bennett inequality (Bennett, 1962, JASA)
  Bounded independent r.v. $U_1, \ldots, U_n$ with $\max |U_i| < M$, $E\,U_i = 0$, and $\sum_i E\,U_i^2 = 1$,
  $P\left(\sum_i U_i \ge \tau\right) \le \exp\left( \frac{\tau}{M} - \left( \frac{\tau}{M} + \frac{1}{M^2} \right) \log(1 + M\tau) \right)$
  If $M\tau$ is small, $\log(1 + M\tau) \approx M\tau - M^2\tau^2/2$, so the bound is close to the Gaussian rate $\exp(-\tau^2/2)$.
Allows heteroscedastic data
  Free to divvy up the variances as you choose, albeit only for bounded random variables.
In stylized example assigns p-value ≈ 1/100.
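To see how the Bennett bound behaves, it helps to evaluate it in two regimes. The sketch below codes the bound in the form quoted on this slide, with $M$ denoting the bound on $|U_i|$, and compares it with the Gaussian rate $\exp(-\tau^2/2)$. The second call uses $M = \tau = 14$, an illustrative assumption meant to mimic a statistic driven by one leverage point; these are not values taken from the slides.

```python
import math

def bennett_bound(tau, M):
    """Bound on P(sum U_i >= tau) for independent, mean-zero U_i with
    |U_i| < M and total variance 1, in the form quoted above."""
    return math.exp(tau / M - (tau / M + 1.0 / M ** 2) * math.log1p(M * tau))

def gaussian_rate(tau):
    """The exp(-tau^2 / 2) rate that a normal approximation would suggest."""
    return math.exp(-tau ** 2 / 2.0)

# Many tiny summands (small M): the bound is close to the Gaussian rate.
print(bennett_bound(2.0, 0.01), gaussian_rate(2.0))

# One summand as large as tau itself: the bound is far larger than the Gaussian rate.
print(bennett_bound(14.0, 14.0), gaussian_rate(14.0))
```

In the second regime the bound lands on the order of 1/100, while the Gaussian rate is astronomically small, which is the sense in which leverage makes the sampling distribution "Poisson-like".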
Classification Results
Success!
  Not only does the model obtain smaller costs than C4.5 (w/wo boosting), it has huge lift:
[Figure: lift chart, % BR Found vs. % Called]
But...
• Most predictors were interactions.
• Know that you missed some things.
• Slloooowwwwwwww.
Big Picture, 2
Moral
  Better get the right p-values
Successful
  Carefully computed standard errors and p-values
  + Adaptive thresholding rules
  = Competitive procedure
Downside
• Know that we're not leveraging domain knowledge.
• Model is more complicated than need be.
• Took forever to run.
Obesity
The issues arising in the bankruptcy application only get worse as the data set gets wider...
and data sets seem to be getting wider and wider.
Application Number of Cases Number Raw Features
Bankruptcy 3,000,000 350
Faces 10,000 1,400
Genetics 1,000 10,000
CiteSeer 500 10,000,000
Sequential Selection
Simple idea
  Search features in some order rather than all at once.
Opportunities
• Incorporate substantive knowledge into search order
• Sequential selection of features based on current status
• Open vs. closed view of space of predictors
• Run faster
Heuristics
• Theory from adaptive selection suggests that if you can order predictors, then little to be gained from knowing $\beta$.
• Alpha spending rules in clinical trials.
• Depth-first rather than breadth-first search.
Multiple Hypothesis Testing
Test $m$ null hypotheses $\{H_1, \ldots, H_m\}$, $H_j: \theta_j = 0$.

  True state   Accept $H_0$     Reject $H_0$     Total
  $H_0$        $U^\theta(m)$    $V^\theta(m)$    $m_0$
  $H_0^c$      $T^\theta(m)$    $S^\theta(m)$    $m - m_0$
  Total        $m - R(m)$       $R(m)$           $m$

  $V^\theta(m) = \sum_{j=1}^m V_j^\theta$, a sum of indicators
Classical criterion
  Family wide error rate
  $\mathrm{FWER}(m) = P_0(V^\theta(m) \ge 1)$
Classical procedure
  Bonferroni: test each $H_j$ at level $\alpha_j$ so that
  $P_0(V_j^\theta = 1) \le \alpha_j, \quad \sum_j \alpha_j \le \alpha$
False Discovery Rate
Approach
  Control testing procedure once it rejects.
FDR criterion (Benjamini & Hochberg 1995, JRSSB)
  Proportion of false positives among rejects
  $\mathrm{FDR}(m) = E_\theta\left( \frac{V(m)}{R(m)} \,\Big|\, R(m) > 0 \right) P(R(m) > 0)$.
Step-down testing (procedure)
  Order p-values of $m$ independent tests of hypotheses
  $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$
  Starts with Bonferroni comparison $p_{(1)} \le \alpha/m$, then gradually raises threshold as more hypotheses are rejected.
  Reject $H_{(j)}$ if $p_{(j)} \le \alpha j/m$.
  Controls FWER and FDR with more power.
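The step-down rule as described on this slide can be written directly. The sketch below follows that description literally: it visits the ordered p-values, rejects $H_{(j)}$ while $p_{(j)} \le \alpha j / m$, and stops at the first failure. The function names and the p-values are made up for illustration, and a plain Bonferroni rule is included for comparison.

```python
def step_down(pvalues, alpha):
    """Indices rejected by the rule: visit p-values in increasing order and
    reject H_(j) while p_(j) <= alpha * j / m; stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    for j, i in enumerate(order, start=1):
        if pvalues[i] > alpha * j / m:
            break
        rejected.append(i)
    return rejected

def bonferroni(pvalues, alpha):
    """Indices rejected when every test uses the fixed level alpha / m."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]

pvals = [0.5, 0.001, 0.02, 0.8, 0.012]    # made-up p-values
print(step_down(pvals, alpha=0.05))       # → [1, 4, 2]
print(bonferroni(pvals, alpha=0.05))      # → [1]
```

Every hypothesis that Bonferroni rejects is also rejected by the step-down rule, since the first threshold is the Bonferroni level and later thresholds only rise.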
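An Alternative to FDR
Count the number of correct rejections $S^\theta(m)$ in excess of a fraction of the total number rejected.
  $\mathrm{EDC}_{\alpha,\gamma}(m) = E_\theta[S^\theta(m) - \gamma R(m)] + \alpha, \quad 0 < \alpha, \gamma < 1$.
Heuristically, $\alpha$ controls FWER and $\gamma$ controls FDR.
Testing procedure “controls EDC” $\iff$ EDC $\ge 0$
[Figure: counts $E_\theta R$, $\gamma E_\theta R - \alpha$, and $E_\theta S$ plotted against $\theta$ (no signal, moderate, strong signal), with the EDC marked]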
Comparison to FDR
Simulate tests of $m = 200$ hypotheses with
  $\mu_j \sim \begin{cases} 0 & \text{w.p. } 1 - \pi_1 \\ N(0, \sigma^2) & \text{w.p. } \pi_1 \end{cases}$
[Figure: four panels, two showing FDR and two showing EDC, each plotted against $\pi_1$]
Three methods: naive fixed $\alpha$ level, step-down, Bonferroni.
Alpha-Investing Rules
Sequentially tests hypotheses in style of alpha-spending rules.
Alpha-wealth
  Test next hypothesis $H_j$ with level $\alpha_j$ up to its current alpha-wealth,
  $0 < \alpha_j \le W(j-1)$
P-value
  Determines how alpha-wealth changes:
  $W(j) \approx W(j-1) + \begin{cases} \omega - p_j & \text{if } p_j \le \alpha_j, \\ -\alpha_j & \text{if } p_j > \alpha_j. \end{cases}$
Payout
  If the test rejects, the rule “earns” payout $\omega$ toward future tests. Otherwise, its wealth decreases by $\alpha_j$.
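The wealth update above translates into a short loop. The sketch below is a minimal illustration of that update, assuming one particular spending policy: each test bids a fixed fraction of the current wealth. The parameter `spend_fraction` and the p-values are hypothetical; the slides leave the choice of $\alpha_j$ open beyond $0 < \alpha_j \le W(j-1)$.

```python
def alpha_investing(pvalues, w0, omega, spend_fraction=0.5):
    """Run an alpha-investing rule over p-values in the order given.
    spend_fraction is a hypothetical policy: bid that share of current wealth."""
    wealth = w0
    rejected = []
    for j, p in enumerate(pvalues):
        alpha_j = spend_fraction * wealth   # any level with 0 < alpha_j <= wealth is allowed
        if p <= alpha_j:
            rejected.append(j)
            wealth += omega - p             # a rejection earns the payout omega
        else:
            wealth -= alpha_j               # a failed test costs the level that was bid
    return rejected, wealth

rejected, wealth = alpha_investing([0.001, 0.5, 0.01, 0.9], w0=0.05, omega=0.05)
print(rejected, round(wealth, 5))           # → [0, 2] 0.04475
```

Because each bid is at most the current wealth, the wealth never goes negative, and early rejections fund later tests.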
Alpha-investing Controls EDC
Benefit of EDC
  Because it uses differences in counts rather than an expected ratio, we can prove alpha-investing satisfies the EDC criterion.
Theorem
  An alpha-investing rule with initial wealth $W(0) \le \alpha$ and payoff $\omega \le 1 - \gamma$ controls EDC,
  $\inf_M \inf_\theta E_\theta\, \mathrm{EDC}_{\alpha,\gamma}(M) \ge 0$
  for any stopping time $M$.
Generality
  The tests need not be independent, just “honest” in the sense that
  $E(V_j \mid R_1, \ldots, R_{j-1}) \le \alpha_j$ under $H_j$
Comparison to Step-Down Testing
Simulation
  Same setup: 200 hypotheses, varying proportion of signal $\pi_1$.
Testing procedures
• Step-down testing
• Alpha-investing: “conservative” and “aggressive”
Two scenarios: What is known?
  Order of testing differs in the two cases:
  Nothing: random order.
  Lots: order of $|\mu_j|$ (not $Y_j$)
Approaches to Alpha-Investing
Conservative
Micro-investing matches performance of FDR. Suppose FDR rejects $k$ hypotheses, and $\alpha = \omega = W(0)$.
• Test all $m$ hypotheses at Bonferroni level $\alpha/m$.
  Rejects $H_{(1)}$: pay $\alpha = m \times \alpha/m$ to earn $\alpha$.
• Test other $m - 1$ hypotheses given $p_j > \alpha/m$.
  Rejects $H_{(2)}$: pays $\alpha$ and earns $\alpha$.
• Continue...
Rejects at least those FDR rejects, and perhaps more.
Aggressive
Spend more $\alpha$ testing leading hypotheses since these should have the most signal, if your science is right. E.g., set level as
$$\alpha_j = \frac{c\,W(0)}{j^2}, \quad j = 1, 2, \ldots$$
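A minimal Python sketch of the aggressive rule, under stated assumptions: the wealth update charges $\alpha_j/(1-\alpha_j)$ for a failed test and pays $\omega$ for a rejection, which is one published form of alpha-investing and is not spelled out on this slide; the constant $c = 6/\pi^2$ is chosen here so that the planned levels sum to at most $W(0)$; and each level is capped so a failed test cannot drive wealth below zero. The function names are hypothetical.

```python
import math

C_DEFAULT = 6 / math.pi ** 2  # assumed constant: sum of c / j^2 over all j equals 1

def aggressive_levels(w0, m, c=C_DEFAULT):
    """Planned levels alpha_j = c * W(0) / j^2 for j = 1..m."""
    return [c * w0 / j ** 2 for j in range(1, m + 1)]

def alpha_investing(p_values, w0=0.05, omega=0.05, c=C_DEFAULT):
    """Test p-values in the given order with the aggressive spending rule.

    Assumed wealth update: +omega on rejection, -alpha_j / (1 - alpha_j) otherwise.
    Returns the 1-based indices of rejected hypotheses and the final wealth.
    """
    wealth = w0
    rejected = []
    for j, p in enumerate(p_values, start=1):
        if wealth <= 0:
            break  # nothing left to invest
        # Cap the level so the charge for a failed test never exceeds current wealth.
        level = min(c * w0 / j ** 2, wealth / (1 + wealth))
        if p <= level:
            rejected.append(j)
            wealth += omega
        else:
            wealth -= level / (1 - level)
    return rejected, wealth

levels = aggressive_levels(0.05, 200)
rejected, final_wealth = alpha_investing([0.001, 0.5, 0.0001, 0.9])
```

Most of the initial wealth is committed to the first few tests, so the rule pays off only when the ordering really does put the strongest signals first.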
Results of Simulation
Simulate tests of $m = 200$ hypotheses with
$$\mu_j \sim \begin{cases} 0 & \text{w.p. } 1 - \pi_1 \\ N(0, \sigma^2) & \text{w.p. } \pi_1 \end{cases}$$
[Figure: four panels plotted against $\pi_1$ (0.1 to 1): two FDR panels (vertical axis 0.01 to 0.07) and two EDC panels (vertical axis -2 to 6). Legend: step-down testing (· · ·); alpha-investing, conservative (---) and aggressive (- · -).]
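The mixture that generates the means can be simulated with the Python standard library alone. The sketch below makes assumptions not stated on this slide: each hypothesis is observed once as $Y_j \sim N(\mu_j, 1)$, p-values are two-sided, and the default `sigma=2.0` is an arbitrary illustrative value. The function name `simulate_p_values` is hypothetical.

```python
import math
import random

def simulate_p_values(m=200, pi1=0.2, sigma=2.0, seed=0):
    """Draw mu_j from the spike-and-normal mixture and return (mus, p_values).

    mu_j = 0 with probability 1 - pi1, and mu_j ~ N(0, sigma^2) with probability pi1.
    Assumption: one observation Y_j ~ N(mu_j, 1) per hypothesis, two-sided z-test.
    """
    rng = random.Random(seed)
    mus, p_values = [], []
    for _ in range(m):
        mu = rng.gauss(0.0, sigma) if rng.random() < pi1 else 0.0
        y = rng.gauss(mu, 1.0)
        # Two-sided p-value for H_j: mu_j = 0, i.e. 2 * (1 - Phi(|y|)).
        p = math.erfc(abs(y) / math.sqrt(2))
        mus.append(mu)
        p_values.append(p)
    return mus, p_values

mus, p_values = simulate_p_values(m=200, pi1=0.2)
null_mus, null_p = simulate_p_values(m=50, pi1=0.0)
```

Repeating the draw over a grid of $\pi_1$ values and averaging the outcomes of each procedure would produce curves of the kind the figure describes.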
Knowledge = Power
Percentage rejection relative to step-down testing.
$$100 \, \frac{S_\theta(m, \text{aggressive investing})}{S_\theta(m, \text{step-down})} > 150\%$$
Aggressive alpha investing works well when the information it uses is accurate.
[Figure: % rejected vs step-down, plotted against $\pi_1$ (0.1 to 1), vertical axis 100 to 150. Curves: step-down, conservative, aggressive.]
Going Further
In for a penny, in for a pound
EDC and alpha-investing provide framework for testing a stream of hypotheses using several investing rules.
Multiple predictor streams
• Start with raw variables as basic stream.
• Once select $X_1$ and $X_2$, try $X_1 * X_2$.
Auction = Multiple alpha-investing rules
• Two rules, one for $X$'s and second for interactions.
• Each starts with wealth $\alpha/2$.
• Easy to wager more for $m$ $X$'s than $m^2$ interactions.
• Best strategy accumulates wealth, makes most choices.
It works!
• Have run auctions through 1,000,000 predictors.
• Finds linear effects and high-order interactions.
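The auction idea can be illustrated with a toy Python sketch. Almost everything in it is an assumption made for illustration, not the authors' implementation: each rule bids a fixed fraction of its current wealth, the highest bid decides whose next candidate is tested, a failed test costs $\alpha_j/(1-\alpha_j)$, and a rejection earns $\omega$. The name `run_auction` and the stream labels are hypothetical.

```python
def run_auction(streams, alpha=0.05, omega=0.05, fraction=0.1):
    """Run several alpha-investing rules against each other.

    streams: dict mapping a rule name to the p-values of its candidates,
             in the order that rule would offer them.
    Each rule starts with an equal share of alpha (alpha/2 for two rules).
    Returns the accepted (rule, candidate index) pairs and the final wealths.
    """
    wealth = {name: alpha / len(streams) for name in streams}
    position = {name: 0 for name in streams}
    accepted = []
    while True:
        # Collect bids from rules that still have candidates and wealth.
        bids = {}
        for name, p_values in streams.items():
            w = wealth[name]
            if position[name] < len(p_values) and w > 0:
                # Assumed bidding rule: a fixed fraction of wealth, capped so
                # that a failed test cannot make wealth negative.
                bids[name] = min(fraction * w, w / (1 + w))
        if not bids:
            break
        winner = max(bids, key=bids.get)  # highest bid tests next (ties: first rule)
        level = bids[winner]
        index = position[winner]
        position[winner] += 1
        if streams[winner][index] <= level:
            accepted.append((winner, index))
            wealth[winner] += omega
        else:
            wealth[winner] -= level / (1 - level)
    return accepted, wealth

streams = {"main": [0.0001, 0.8, 0.6], "interactions": [0.9, 0.00001]}
accepted, wealth = run_auction(streams)
```

In this toy run the rule for main effects rejects early, grows wealthier, and so wins the next rounds, which mirrors the remark that the best strategy accumulates wealth and makes most choices.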
Discussion
Variable selection
• Importance of p-values (which not all methods have).
• Role for cross-validation.
• Provably as good as class of Bayes estimators.
Strategies for feature creation
• Room for serious improvement
• Substantive expert can order features (chemist)
• Parasitic rules that exploit substantive rules
Multiple testing using EDC
• Adaptive sequential trials
• Universal wagering