Validity First, Optimality Next
An Approach to Valid Post-selection Inference in Assumption-lean Linear Regression

Arun Kumar Kuchibhotla¹
Department of Statistics, University of Pennsylvania

WHOA-PSI 2017

Working draft available at: https://statistics.wharton.upenn.edu/profile/arunku/#research

¹ Joint work with the Wharton PoSI Group, including Larry Brown, Andreas Buja, Richard Berk, Ed George, and Linda Zhao.

Arun Kumar, PoSI Group (Wharton School) Valid PoSI with Random X WHOA-PSI 2017 1 / 22


Outline

1. Introduction
   - Assumption-lean Linear Regression
   - Some Notation and Uniform Results
2. Post-selection Inference Problem
   - PoSI-Dantzig
   - PoSI-Lasso
   - PoSI-SqrtLasso
3. Multiplier Bootstrap
4. Some Comments and Extensions


Linear Regression

Suppose (X_i, Y_i) ∈ R^p × R, 1 ≤ i ≤ n, are random vectors available as data, and we apply linear regression to these data. The least squares estimator of the "slope" vector is β̂, given by

    β̂ = (E_n[X X^⊤])^{-1} E_n[X Y]

(E_n denotes the empirical mean). If the random vectors satisfy a weak law of large numbers, then this estimator converges in probability to

    β_0 = (E[X X^⊤])^{-1} E[X Y].

β_0 has an interpretation as the best linear predictor (slope) in terms of squared error loss.
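As a concrete illustration (a minimal numpy sketch, not from the talk; variable names are my own), β̂ can be computed directly from the empirical moments E_n[X X^⊤] and E_n[X Y], with no linear model assumed for Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5

# Simulate data; note that no linear model for Y is assumed.
X = rng.normal(size=(n, p))
Y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)

# beta_hat = (E_n[X X^T])^{-1} E_n[X Y]
Sigma_n = X.T @ X / n        # empirical E_n[X X^T]
Gamma_n = X.T @ Y / n        # empirical E_n[X Y]
beta_hat = np.linalg.solve(Sigma_n, Gamma_n)

# Sanity check: this agrees with ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_ols)
```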


Some Comments

- It is important to recognize that to use linear regression we do NOT need any model or distributional assumptions.
- We do NOT even require independence or identical distributions for the random vectors; a law of large numbers is enough.
- This perspective also works if the covariate vectors X_i are treated as fixed.
- The target β_0 coincides with the usual model parameter under a true linear model.
- CLT: If the random vectors (X_i, Y_i) are independent with finite fourth moments, then

      n^{1/2}(β̂ − β_0) →_L N(0, J^{-1} K J^{-1}),   ← efficiency

  with J = E[X X^⊤] and K = E[X X^⊤ (Y − X^⊤ β_0)²].
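The sandwich covariance J^{-1} K J^{-1} in the CLT above can be estimated by plug-in. A hedged sketch (the function name and the simulated setup are my own, not from the talk):

```python
import numpy as np

def sandwich_cov(X, Y):
    """Plug-in estimate of J^{-1} K J^{-1} / n for the OLS slope,
    where J = E[X X^T] and K = E[X X^T (Y - X^T beta_0)^2]."""
    n = X.shape[0]
    J_hat = X.T @ X / n
    beta_hat = np.linalg.solve(J_hat, X.T @ Y / n)
    resid = Y - X @ beta_hat
    # K_hat = E_n[X X^T (Y - X^T beta_hat)^2]
    K_hat = (X * resid[:, None] ** 2).T @ X / n
    J_inv = np.linalg.inv(J_hat)
    return J_inv @ K_hat @ J_inv / n

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
Y = X[:, 0] + rng.normal(size=400)
cov = sandwich_cov(X, Y)
```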



Some Notation

- We have p covariates, and model selection refers to choosing a subset M ⊆ {1, 2, …, p} stating which covariates to include in the model.
- For any vector v ∈ R^p and model M, the vector v(M) ∈ R^{|M|} denotes the sub-vector of v with indices in M. For instance, if v = (1, 0, 5, −1), M = {2, 4}, and p = 4, then v(M) = (0, −1).
- For any model M ⊆ {1, 2, …, p}, let

      Σ_n(M) := E_n[X(M) X^⊤(M)]   and   Σ(M) := E[X(M) X^⊤(M)],
      Γ_n(M) := E_n[X(M) Y]        and   Γ(M) := E[X(M) Y].
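The sub-vector notation translates directly into array indexing. A small sketch (0-based indices and variable names are my own):

```python
import numpy as np

# v(M) for v = (1, 0, 5, -1) and M = {2, 4}; in 0-based indexing M = [1, 3].
v = np.array([1, 0, 5, -1])
M = [1, 3]
v_M = v[M]                            # sub-vector (0, -1)

# Empirical moments for the sub-model, from data (X, Y):
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
Sigma_n_M = X[:, M].T @ X[:, M] / n   # E_n[X(M) X^T(M)], an |M| x |M| matrix
Gamma_n_M = X[:, M].T @ Y / n         # E_n[X(M) Y], a vector of length |M|
```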


Notation Contd.

- Define the least squares estimator and its target as

      β̂_M := argmin_{θ ∈ R^{|M|}} E_n[(Y − X^⊤(M) θ)²],
      β_M  := argmin_{θ ∈ R^{|M|}} E[(Y − X^⊤(M) θ)²].

- No matter whether M is fixed or random, β̂_M is estimating β_M; what it means is secondary.
- For any 1 ≤ k ≤ p, define

      M(k) := {M ⊆ {1, 2, …, p} : |M| ≤ k}.

- Define, finally,

      D_{2n} := ‖Σ_n − Σ‖_∞   and   D_{1n} := ‖Γ_n − Γ‖_∞.

- All the rates for β̂_M − β_M can be bounded in terms of D_{1n} and D_{2n} (possibly sub-optimally).
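For concreteness, M(k) can be enumerated and D_{1n}, D_{2n} computed when the population moments are known (purely an illustration with a known Gaussian design; the construction is my own, not from the talk):

```python
import numpy as np
from itertools import combinations

def models_up_to_k(p, k):
    """Enumerate M(k): all nonempty subsets M of {0, ..., p-1} with |M| <= k."""
    return [M for size in range(1, k + 1)
            for M in combinations(range(p), size)]

rng = np.random.default_rng(0)
n, p = 1000, 4
X = rng.normal(size=(n, p))                 # population Sigma = I_p
Y = X @ np.ones(p) + rng.normal(size=n)     # population Gamma = (1, ..., 1)

Sigma_n = X.T @ X / n
Gamma_n = X.T @ Y / n
D2n = np.max(np.abs(Sigma_n - np.eye(p)))   # ||Sigma_n - Sigma||_inf
D1n = np.max(np.abs(Gamma_n - np.ones(p)))  # ||Gamma_n - Gamma||_inf

# |M(2)| = C(4,1) + C(4,2) = 10 models of size at most 2.
models = models_up_to_k(p, 2)
```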


Uniform (in Models) Results

Table: Uniform rates of convergence (⋁_k ≡ sup_{M ∈ M(k)}).

    Quantity                          Rate bound                          Under sub-Gaussianity
    ⋁_k ‖β̂_M − β_M‖_1                k (D_{1n} + D_{2n} S_{1,k})         k √(log p / n)
    ⋁_k ‖β̂_M − β_M‖_2                k^{1/2} D_{1n} + k D_{2n} S_{2,k}   √(k log p / n)
    ⋁_k ‖Σ_n(M) − Σ(M)‖_op           k D_{2n}                            √(k log p / n)

Here

    S_{1,k} := sup_{M ∈ M(k)} ‖β_M‖_1   and   S_{2,k} := sup_{M ∈ M(k)} ‖β_M‖_2.


Two Famous Quotes …

… and two ways of invalidating inference!

John W. Tukey:
    "The greatest value of a picture is when it forces us to notice what we never expected to see."
    "… more emphasis needed to be placed on using data to suggest hypotheses to test."

George E. P. Box:
    "All models are wrong, but some are useful."


The PoSI Problem

We concentrate on the problem of constructing valid confidence regions, which readily yields valid tests of hypotheses. The PoSI problem is:

Problem. Construct a set of confidence regions (depending on α)

    {R̂_M : M ∈ M}

for some non-random set of models M, such that for any random model M̂ with P(M̂ ∈ M) → 1,

    lim inf_{n → ∞} P(β_{M̂} ∈ R̂_{M̂}) ≥ 1 − α.


The Basic Lemma

Lemma. For any set of confidence regions {R̂_M : M ∈ M} and any random model M̂ satisfying P(M̂ ∈ M) = 1, the following lower bound holds:

    P(β_{M̂} ∈ R̂_{M̂}) ≥ P( ⋂_{M ∈ M} {β_M ∈ R̂_M} ).

We will provide confidence regions such that the right-hand probability is bounded below by 1 − α (asymptotically). This Lemma is also the basis for valid PoSI in the fixed-covariate setting of Berk et al. (2013).
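The lemma is a simultaneity bound: whenever the regions cover every target in M at once, they cover whichever target the data happen to select. A toy Monte Carlo check of the inequality (the two-model normal construction is entirely my own):

```python
import numpy as np

# Two univariate "models", each with a normal estimate of its target, and a
# data-dependent selection rule. The lemma says: coverage of the selected
# model's interval is at least the simultaneous coverage over both models.
rng = np.random.default_rng(0)
reps, z = 20000, 1.96
targets = np.array([0.0, 1.0])
est = targets + rng.normal(size=(reps, 2))     # one estimate per model
covered = np.abs(est - targets) <= z           # per-model coverage events
M_hat = (est[:, 0] > est[:, 1]).astype(int)    # data-driven model choice
cover_selected = covered[np.arange(reps), M_hat].mean()
cover_simult = covered.all(axis=1).mean()
assert cover_selected >= cover_simult          # the lemma's lower bound
```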


Some More Notation

Recall

    D_{1n} := ‖Γ_n − Γ‖_∞   and   D_{2n} := ‖Σ_n − Σ‖_∞.

Define C_1(α) and C_2(α) by

    P(D_{1n} ≤ C_1(α) and D_{2n} ≤ C_2(α)) ≥ 1 − α.

In the following, we give three different confidence regions that satisfy validity: if k = o(√(n / log p)), then

    lim inf_{n → ∞} P( ⋂_{M ∈ M(k)} {β_M ∈ R̂_M} ) ≥ 1 − α.
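When the data-generating distribution is known, C_1(α) and C_2(α) can be calibrated by directly simulating D_{1n} and D_{2n}; taking each at level 1 − α/2 satisfies the joint requirement via a union bound. This is only a sketch of the definition (the talk's practical calibration uses the multiplier bootstrap discussed later; the setup here is my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, alpha, reps = 200, 4, 0.05, 2000
Sigma, Gamma = np.eye(p), np.zeros(p)       # known population moments
D1, D2 = np.empty(reps), np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=n)
    D1[r] = np.max(np.abs(X.T @ Y / n - Gamma))   # D_1n draw
    D2[r] = np.max(np.abs(X.T @ X / n - Sigma))   # D_2n draw

# Marginal (1 - alpha/2)-quantiles give joint coverage >= 1 - alpha.
C1 = np.quantile(D1, 1 - alpha / 2)
C2 = np.quantile(D2, 1 - alpha / 2)
joint = np.mean((D1 <= C1) & (D2 <= C2))
```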


PoSI-Dantzig

Define, for any model M ⊆ {1, 2, …, p},

    R̂_M := {θ ∈ R^{|M|} : ‖Σ_n(M)(β̂_M − θ)‖_∞ ≤ C_1(α) + C_2(α) ‖β̂_M‖_1}.

Note that the quantity on the left-hand side is

    Σ_n(M)(β̂_M − θ) = (1/n) ∑_{i=1}^{n} X_i(M)(Y_i − X_i^⊤(M) θ).

- Thus the confidence set is very similar to the constraint set of the Dantzig selector, hence the name PoSI-Dantzig.
- Validity requires neither independence nor identical distributions for the random vectors.
- It allows a diverging number of covariates: if the (X_i, Y_i) are independent and sub-Gaussian, then for k = o(√(n / log p)),

      Leb(R̂_M) = O_p( (√(|M| log p / n))^{|M|} )

  uniformly over M ∈ M(k).
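Membership in R̂_M can be checked pointwise via the displayed score identity. A minimal sketch (function name and the illustrative, uncalibrated C_1, C_2 values are my own):

```python
import numpy as np

def in_posi_dantzig(theta, X_M, Y, C1, C2):
    """Check theta in R_hat_M = {theta : ||Sigma_n(M)(beta_hat_M - theta)||_inf
       <= C1 + C2 * ||beta_hat_M||_1}, using the identity
       Sigma_n(M)(beta_hat_M - theta) = E_n[X(M)(Y - X^T(M) theta)]."""
    n = X_M.shape[0]
    score = X_M.T @ (Y - X_M @ theta) / n       # E_n[X(M)(Y - X^T(M) theta)]
    Sigma_n = X_M.T @ X_M / n
    beta_hat = np.linalg.solve(Sigma_n, X_M.T @ Y / n)
    return np.max(np.abs(score)) <= C1 + C2 * np.sum(np.abs(beta_hat))

rng = np.random.default_rng(0)
X_M = rng.normal(size=(100, 3))
Y = rng.normal(size=100)
Sigma_n = X_M.T @ X_M / 100
beta_hat = np.linalg.solve(Sigma_n, X_M.T @ Y / 100)
# beta_hat itself always lies in the region: its score vanishes.
assert in_posi_dantzig(beta_hat, X_M, Y, C1=0.1, C2=0.1)
```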

Page 42: Validity First, Optimality Next - arun-kuchibhotla.github.io · Arun Kumar, PoSI Group (Wharton School) Valid PoSI with Random X WHOA-PSI 2017 3 / 22. Linear Regression Suppose (Xi;Yi)

PoSI-Dantzig

Define for any model M ⊆ {1,2 . . . ,p},

R̂M :={θ ∈ R|M| :

∥∥∥Σn(M)(β̂M − θ

)∥∥∥∞≤ C1(α) + C2(α)

∥∥∥β̂M

∥∥∥1

}.

Note the left hand side is

Σn(M)(β̂M − θ

)=

1n

n∑i=1

Xi(M)(

Yi − X>i (M)θ).

Thus, the confidence set is very similar to the constraint set ofDantzig selector and so the name PoSI-Dantzig.Validity does not require either independence or identicaldistribution for the random vectors.It allows for diverging number of covariates. If (Xi ,Yi) areindependent and sub-Gaussian then for k = o(

√n/ log p),

Leb(R̂M

)= Op

(√|M| log p

n

)|M|uniformly for M ∈M(k).

Arun Kumar, PoSI Group (Wharton School) Valid PoSI with Random X WHOA-PSI 2017 15 / 22

Page 43: Validity First, Optimality Next - arun-kuchibhotla.github.io · Arun Kumar, PoSI Group (Wharton School) Valid PoSI with Random X WHOA-PSI 2017 3 / 22. Linear Regression Suppose (Xi;Yi)

PoSI-Dantzig

Define for any model M ⊆ {1,2 . . . ,p},

R̂M :={θ ∈ R|M| :

∥∥∥Σn(M)(β̂M − θ

)∥∥∥∞≤ C1(α) + C2(α)

∥∥∥β̂M

∥∥∥1

}.

Note the left hand side is

Σn(M)(β̂M − θ

)=

1n

n∑i=1

Xi(M)(

Yi − X>i (M)θ).

Thus, the confidence set is very similar to the constraint set ofDantzig selector and so the name PoSI-Dantzig.Validity does not require either independence or identicaldistribution for the random vectors.It allows for diverging number of covariates. If (Xi ,Yi) areindependent and sub-Gaussian then for k = o(

√n/ log p),

Leb(R̂M

)= Op

(√|M| log p

n

)|M|uniformly for M ∈M(k).

Arun Kumar, PoSI Group (Wharton School) Valid PoSI with Random X WHOA-PSI 2017 15 / 22

PoSI-Lasso

Define for M ⊆ {1, 2, ..., p},
\[
\hat{R}_M := \left\{\theta : \hat{L}_M(\theta) \le \hat{L}_M(\hat\beta_M) + 4C_1(\alpha)\left\|\hat\beta_M\right\|_1 + 2C_2(\alpha)\left\|\hat\beta_M\right\|_1^2\right\},
\]
where
\[
\hat{L}_M(\theta) = \mathbb{E}_n\left(Y - X^\top(M)\,\theta\right)^2.
\]
Note that C_2(α) is related to the quantile of \(\|\Sigma_n - \Sigma\|_\infty\); this is exactly zero if the covariates are treated as fixed.
In that case, the right-hand side is exactly the lasso objective function. Hence the name PoSI-Lasso.
Validity does not require either independence or identical distributions for the random vectors.
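The PoSI-Lasso region admits the same kind of membership check, now on the quadratic loss rather than the score. A minimal sketch under the same assumptions as before: `in_posi_lasso_region` is a hypothetical helper, and `C1`, `C2` are placeholders for C_1(α), C_2(α).

```python
import numpy as np

def in_posi_lasso_region(X_M, Y, theta, C1, C2):
    """Check theta in the PoSI-Lasso region:
       L_hat(theta) <= L_hat(beta_hat) + 4*C1*||beta_hat||_1 + 2*C2*||beta_hat||_1^2,
    with L_hat(theta) = mean((Y - X_M theta)^2).
    C1, C2 are placeholders for the quantile constants C1(alpha), C2(alpha)."""
    beta_hat = np.linalg.lstsq(X_M, Y, rcond=None)[0]
    L = lambda t: np.mean((Y - X_M @ t) ** 2)
    l1 = np.sum(np.abs(beta_hat))
    return L(theta) <= L(beta_hat) + 4 * C1 * l1 + 2 * C2 * l1 ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(size=100)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
# beta_hat minimizes L_hat, so it is always in the region
assert in_posi_lasso_region(X, Y, beta_hat, C1=0.1, C2=0.1)
```

When the covariates are treated as fixed, C_2(α) drops out and the threshold is exactly a lasso objective evaluated at θ.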

PoSI-SqrtLasso

Define the confidence set
\[
\hat{R}_M := \left\{\theta : \hat{L}_M^{1/2}(\theta) \le \hat{L}_M^{1/2}(\hat\beta_M) + 2C(\alpha)\left\|\hat\beta_M\right\|_1\right\},
\]
where
\[
C(\alpha) = \max\{C_1(\alpha), C_2(\alpha)\}.
\]
The right-hand side is exactly the objective function of the square-root lasso. Hence the name PoSI-SqrtLasso.
Again, validity does not require either independence or identical distribution for the random vectors.
All three of these confidence regions are valid as long as C_1(α) and C_2(α) are valid quantiles (or estimators thereof).
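The square-root variant replaces the quadratic loss by its root, so the threshold uses a single constant C(α). A minimal sketch with the same caveats as above (hypothetical helper, `C` standing in for C(α) = max{C_1(α), C_2(α)}):

```python
import numpy as np

def in_posi_sqrtlasso_region(X_M, Y, theta, C):
    """Check theta in the PoSI-SqrtLasso region:
       sqrt(L_hat(theta)) <= sqrt(L_hat(beta_hat)) + 2*C*||beta_hat||_1,
    with L_hat(theta) = mean((Y - X_M theta)^2).
    C is a placeholder for C(alpha) = max{C1(alpha), C2(alpha)}."""
    beta_hat = np.linalg.lstsq(X_M, Y, rcond=None)[0]
    rmse = lambda t: np.sqrt(np.mean((Y - X_M @ t) ** 2))
    return rmse(theta) <= rmse(beta_hat) + 2 * C * np.sum(np.abs(beta_hat))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = X @ np.array([1.5, -0.5]) + rng.normal(size=100)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
# beta_hat minimizes the root loss as well, so it is always in the region
assert in_posi_sqrtlasso_region(X, Y, beta_hat, C=0.1)
```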

Multiplier Bootstrap

All three confidence regions depend on C_1(α) and C_2(α) for α ∈ (0, 1).
For independent random vectors and certain dependence settings (with log p = o(n^{1/5})), the multiplier bootstrap works and gives correct estimators of (C_1(α), C_2(α)).
If w_1, w_2, ..., w_n are mean-zero independent random variables with \(\mathbb{E}w_i^2 = \mathbb{E}w_i^3 = 1\), then it can be proved that, with high probability,
\[
P\left(\left\|\mathbb{E}_n[Z] - \mathbb{E}[Z]\right\|_\infty \le t\right) \le P\left(\left\|\mathbb{E}_n\left[w\,(Z - \mathbb{E}_n Z)\right]\right\|_\infty \le t \,\middle|\, Z_1^n\right) + O_p\left(\left(\frac{(\log p)^5}{n}\right)^{1/6}\right).
\]
Here Z_1, Z_2, ..., Z_n are independent random vectors in \(\mathbb{R}^p\). The result applies to dependent variables with some changes in the rates.
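A quantile constant of this kind could be estimated by resampling the conditional statistic on the right-hand side. This is an illustrative sketch, not the authors' procedure: the function name is hypothetical, and the choice of Mammen's two-point multipliers is one convenient distribution satisfying the stated moment conditions Ew = 0, Ew² = Ew³ = 1.

```python
import numpy as np

def multiplier_bootstrap_quantile(Z, alpha, B=500, rng=None):
    """Multiplier-bootstrap estimate of the (1 - alpha) quantile of
    ||E_n[Z] - E[Z]||_inf, given the rows Z_1, ..., Z_n of Z (n x p).

    Resamples ||E_n[w (Z - E_n Z)]||_inf with i.i.d. Mammen two-point
    multipliers w_i, which satisfy E w = 0 and E w^2 = E w^3 = 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)                      # Z - E_n Z
    a, b = (1 - 5**0.5) / 2, (1 + 5**0.5) / 2    # Mammen's two support points
    p_a = (5**0.5 + 1) / (2 * 5**0.5)            # P(w = a)
    stats = np.empty(B)
    for j in range(B):
        w = np.where(rng.random(n) < p_a, a, b)  # multipliers
        stats[j] = np.max(np.abs(w @ Zc / n))    # ||E_n[w (Z - E_n Z)]||_inf
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 5))
q = multiplier_bootstrap_quantile(Z, alpha=0.05, rng=rng)
```

Crucially, the bootstrap statistic is computed once from the data; the same estimated quantiles then serve every model M.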

Summary and Extensions

We proved some new uniform rates of convergence results in linear regression with NO model/distributional assumptions.
These results apply to various structurally dependent random vectors, with rates accordingly affected.
We described three different confidence regions that have valid coverage no matter how the model is chosen.
All these methods are computationally efficient and only require computation of C_1(α), C_2(α) once for the dataset.
The construction of these confidence regions (for now) seems very specific to the quadratic structure of the objective function.
We also developed two different approaches (based on t- and F-tests) that apply to most M-estimation problems, including generalized linear models.

References

[1] Berk, R., Brown, L. D., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41, no. 2, 802–837.

[2] Deng, H. and Zhang, C.-H. (2017). Beyond Gaussian Approximation: Bootstrap for Maxima of Sums of Independent Random Vectors. arxiv.org/abs/1705.09528.

Thank You! Questions?
