Finding a compromise between information and regret in...

OptimalAllocation

AsyaMetelkinaand LucPronzato

Compromisebetweeninformationand ethics

Allocationrulesconverging tothe optimum

CARA andempiricalprocessallocation

Finding a compromise between information and regretin clinical trials

Asya Metelkina and Luc Pronzato

AM: Azoth Systems LP: Laboratoire I3S (CNRS and Univ. de Nice Sophia Antipolis)

MASCOT-NUM Annual Conference 19 March 2019

1 / 32

OptimalAllocation





Luc Pronzato I know Luc since 2014...

Sophia Antipolis, I3S, Algorithmes: Luc works in I3S since 1992...

Luc, Antibes, sea: leasure or work?

2 / 32

OptimalAllocation





Plan

Optimal allocations in clinical studies...an optimal design problem...

� 1. Compromise between information and ethicsClinical studies.Objectives of adaptive allocation.Information, ethics and compromise criterion.

� 2. Allocation rules converging to the optimumEquivalence theorem: characterization of optimal allocation.Oracle allocation rule.Randomized oracle rule.

� 3. CARA and empirical process allocationCARA type allocation.Allocation based on empirical measures.Compromise procedures proposed in literature.

� 4. Conclusion and perspectives

3 / 32

OptimalAllocation





Optimal designs

Design space X

Parametric or non parametric model(s): f (x ,θ), x ∈ X

Criterion G to maximize or minimize

Choose measure ξ =

{w1 . . . wkx1 . . . xk

}to optimize G(ξ)

Generate a sequence {x1, . . . ,xn} to (quasi-)optimize G(x1, . . . ,xn)

4 / 32

OptimalAllocation





Clinical trials

n subjects. i-th subject:

Xi , Ti , Yicovariates, allocation, outcome

Assumptions:

� E(Yi |Xi = x ,Ti = k) = ηk (x ,θk ), Yi ∈ Y ⊂ R, k = 1, . . . ,K .

� Xi are i.i.d. ∼ µ, (Xi ∈ X ⊂ Rd , µ(x)≥ 0,∫

X µ(dx) = 1)

� covariate-adaptive allocations

P(Ti = k |Xi = x) = πk (x)

Example 1 : Logistic regression.Y = {0,1},X = [−1,1], K = 2P(Yi = 1|Xi = x ,Ti = k) = ηk (x ,θk )

ηk (x ,θk ) =1

1 + e−θk x

θ1 = 5,θ2 =−5.

5 / 32

OptimalAllocation





Clinical trials



Assumptions:� E(Yi |Xi = x ,Ti = k) = ηk (x ,θk ),� Xi are i.i.d. ∼ µ,� covariate-adaptive allocations

P(Ti = k |Xi = x) = πk (x)

10 subjects6 / 32

OptimalAllocation





Clinical trials




P(Ti = k |Xi = x) = πk (x)

10 subjects allocated7 / 32

OptimalAllocation





Clinical trials



Assumptions:

� E(Yi |Xi = x ,Ti = k) = ηk (x ,θk ),

� Xi are i.i.d. ∼ µ,

� covariate-adaptive allocations

P(Ti = k |Xi = x) = πk (x)

Example 2: Linear regression with i.i.d errors.Y = R,X = [0,1], K = 2Yi = ηk (Xi ,θk ) + εi if Ti = k , εi |= Xiηk (x ,θk ) = ak + bk xa1 = 0, b1 = 10, a2 = 5, b2 =−4.

8 / 32

OptimalAllocation





Clinical trials




P(Ti = k |Xi = x) = πk (x)

10 subjects allocated9 / 32

OptimalAllocation





Objectives of clinical trial

Objectives:

� Information: to better estimate θ = (θ1, . . . ,θK )

� Ethics: to favor allocation to best treatment η∗(Xi ) = maxk=1...,K

ηk (Xi ,θk )

Yi |Ti = k are i.i.d with E(Yi |Ti = k) =∫X

ηk (x ,θk )πk (x)µ(dx)︸︷︷︸ξk (dx)

ηnewk (x ,θk ) = ηk (x ,θk )πk (x) or ξ = (ξ1, . . . ,ξK ) ∈ Ξ(µ)

modified model allocation measure

Ξ(µ) = {ξ = (ξ1, . . . ,ξK ) | ξk is µ−a.c., ∑Kk=1 ξk = µ, ξk ≥ 0}

is a convex set.

10 / 32

OptimalAllocation





Information

Objectives:

� Information: to better estimate θ = (θ1, . . . ,θK )

Maximize Ψ(M(ξ,θ)):

Ψ strictly concave, Lowener increasing, differentiable

M(ξ,θ) = ∑Kk=1

∫X Mk (x ,θk )ξk (dx) Fisher information.

Example: Ψ(M) = logdet(M), Ψ(M) =−tr(M−q), q ∈ (0,+∞).

Motivation:Asymptotic normality of ML estimator θ̂n,

√n(θ̂n−θ)→N (0,M(ξ,θ)−1)

Unbiased estimators: Cramer-Rao bound for variance.

11 / 32

OptimalAllocation





Information

Objectives:

� Information: to better estimate θ = (θ1, . . . ,θK ) ⇒ maxξ∈Ξ(µ)

Ψ(M(ξ,θ)).

Ψ concave, Ξ(µ) convex set⇒ ξ∗(x ,θ) = arg max

ξ∈Ξ(µ)Ψ(M(ξ,θ))

Example 1: logistic regressionY ∈ {0,1}, X = [−1,1], µ = U([−1,1]),θk ∈ R, no covariates in common.

⇒ M(ξ,θ) has a block structure:

M(ξ,θ) =

1∫0

x2

η(x ,θ1)(1−η(x ,θ1))ξ1(dx) 0

01∫0

x2

η(x ,θ2)(1−η(x ,θ2))ξ2(dx)

Ψ(M(ξ,θ)) = log(det(M(ξ,θ))) =

2∑

k=1log

(1∫0

x2πk (x)ηk (x ,θk )(1−ηk (x ,θk ))

dx

)

12 / 32

OptimalAllocation





Information

Objectives:


Ψ(M(ξ,θ)).



Example 2: linear regression with i.i.d. N(0,1) errors.µ = U([0,1]). ηk (x ,θk ) = ak + bk x , no common parameters.⇒ M(ξ,θ) has a block structure:

M(ξ) =

1∫0

ξ1(dx)1∫0

xξ1(x)dx 0 0

1∫0

xξ1(dx)1∫0

x2ξ1(dx) 0 0

0 01∫0

ξ2(dx)1∫0

xξ2(dx)

0 01∫0

xξ2(dx)1∫0

x2ξ2(dx)

13 / 32

OptimalAllocation





Information

Objectives:


Ψ(M(ξ,θ)).



Example 2: linear regression with i.i.d. N(0,1) errors.µ = U([0,1]). ηk (x ,θk ) = ak + bk x , no common parameters.⇒ M(ξ,θ) has a block structure:

M(ξ) =

m1 m2 0 0m2 m3 0 00 0 1−m1

12 −m2

0 0 12 −m2

13 −m3

m1 =

1∫0

π1(x)dx ,

m2 =1∫0

xπ1(x)dx ,

m3 =1∫0

x2π1(x)dx .

Ψ(M(ξ)) = logdet(M(ξ)) = log(m1 ·m3−m2

2

)+ log

((1−m1)( 1

3 −m3)− ( 12 −m2)2

)

14 / 32

OptimalAllocation





Information

Objectives:


Ψ(M(ξ,θ)).



Example 3: linear regression with i.i.d. N(0,1) errors.µ = U([−1,1]). ηk (x ,θk ) = a + bk x , with one common parameter.

M(ξ) =

1 m2 −m2m2 m3 0−m2 0 1

3 −m3

m2 = 12

1∫−1

xπ1(x)dx ,

m3 = 12

1∫0

x2π1(x)dx .

15 / 32

OptimalAllocation





Regret and reward

Objectives:

� Ethics: to favor allocation to best treatment η∗(Xi ) = maxk=1...,K

ηk (Xi ,θk )

Minimize the regret R(ξ,θ) =K∑

k=1

∫X

(η∗(x ,θ)−ηk (x ,θ))ξk (dx)

Maximize the reward Φ(ξ,θ) =K∑

k=1

∫X

ηk (x ,θ)ξk (dx)

Best treatment allocation: dξkdµ (x ,θ) = πk (x ,θ) =

1, if ηk (x ,θk ) = η∗(x ,θ)1m , if m ties

0, otherwise

16 / 32

OptimalAllocation





Information, ethics and compromise

Problem: information OR ethics? May be contradictory!

Example: Yi ∈ {0,1}, ηi (x ,θ) = θi , no covariates

Information Neyman allocation: P(Ti = 1) =

√θ1(1−θ1)√

θ1(1−θ1)+√

θ2(1−θ2):

maximizes the power of Wald’s test(

with statistics θ̂n−θ

V(θ̂n)

).

favors inferior treatment if θ1 + θ2 > 1.

Ethics Best treatment allocation: P(Ti = 1) = δθ1≥θ2 no information about the worthtreatment parameters.

Solution: Compromise criterion = convex combination of information and reward:

Maximize (1−α) ·Ψ(M(ξ,θ)) + α ·Φ(ξ,θ), α ∈ (0,1)

Equivalent to

{maxΨ(M(ξ,θ))

Φ(ξ,θ)≥ τ(α,θ).

17 / 32

OptimalAllocation





Optimal allocation measure

K = 2. Optimal allocation measure: ξ∗α

(θ) = arg maxξ∈Ξ(µ)

Hα(ξ,θ)

Asymptotic optimality criterion:

Hα(ξ,θ) = (1−α)Ψ

(2

∑k=1

Mk (ξk ,θ)

)︸︷︷︸

information

+α

2

∑k=1

φk (ξk ,θ)︸︷︷︸reward

.

Mk (θ,ξk ) =∫

X Mk (x ,θ)ξk (dx) and φk (θ,ξk ) =∫

X ηk (x ,θ)ξk (dx).

Convex set of allocation measures

Ξ(µ) = {ξ = (ξ1,ξ2) ∈M 2X | ξ1 + ξ2 = µ,ξ1 ≥ 0,ξ2 ≥ 0, ξ1,ξ2 a.c. wrt µ}.

Maximize concave function Hα(ξ,θ) on a convex set Ξ(µ)⇒ Equivalence Theorem

18 / 32

OptimalAllocation





Equivalence Theorem

Under differentiability conditions:ξ∗α

(θ) can be characterized through directional derivatives of Hα.

Equivalence theorem

ξ∗α

= arg maxξ∈Ξ(µ)

Hα(ξ,θ) is characterized by the function ∆12(ξ,x ,θ):

1 ∆12(ξ,x ,θ)≥ 0 ξ∗1-a.s., ∆12(ξ,x ,θ)≤ 0 ξ∗2-a.s.

2 There exist X1,X2 ⊂ X such thatξ∗1 = µ on X1, ξ∗2 = µ on X2,infx∈X1 ∆12(ξ,x ,θ)≥ 0 , supx∈X2

∆12(ξ,x ,θ)≤ 0∆12(ξ,x ,θ) = 0, if x ∈ X \(X1

⋂X2).

� Here ∆12(ξ,x ,θ) = G1(ξ,x ,θ)−G2(ξ,x ,θ).

� Directional derivatives (in direction of point mass at x)

G1(ξ,x ,θ) = limγ→0+

γ−1[Hα(ξ + γ(δx ,0),θ)−Hα(ξ,θ)]

G2(ξ,x ,θ) = limγ→0+

γ−1[Hα(ξ + γ(0,δx ),θ)−Hα(ξ,θ)]

19 / 32

OptimalAllocation





Logistic regression example

X = [0,1] and µ = U([0,1]). Ψ(M) = logdet(M).Logistic regression ηk (x ,θ) = 0.25 + 0.5 · 1

1+exp(−(θk ,1−x)θk ,2).

θ = (θ1,θ2), θk ∈ R2, k = 1,2.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

η1(x)

η2(x)

x0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

x

G(α

)1

(ξ∗,x

)−G

(α)

2(ξ

∗,x

)A B C

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

α

1

2

1

2

Figure : Left: Statistical model of response η1(x ,θ1) et η2(x ,θ2).Middle: Function ∆12(ξ,x ,θ) from Equivalence Theorem for α = 0.7.Right: Ensembles X1 (treatment 1) et X2 (treatment 2) as function of α.

20 / 32

OptimalAllocation





Oracle allocation rule

� Rule 1. Oracle allocation rule(necessitates the knowledge of µ, θ and construction of ξ

∗α

(θ) )

P(Tn+1 = 1|Xn+1 = x) = π∗α(x ,θ)

with π∗α(x ,ξ,θ) =dξ∗1,α

dµ (x ,θ).

Remark: π∗α(x ,θ) = 1 on a subset x ∈ X0 ⊂ X is possible.

Example:P(Ti = 2[Xi = x) = 1 if x ∈ [0,A]∩ (B,C),P(Ti = 1[Xi = x) = 1 if x ∈ (A,B)∩ (C,1].

21 / 32

OptimalAllocation





Empirical information and reward

Empirical compromise criterion:

(1−α) ·Ψ(Mn(θ)) + α · (Φn(θ)), α ∈ (0,1)

Empirical reward :

Φn(θ) =1n

2

∑k=1

n

∑i=1,Ti =k

ηk (Xi ,θ)

Empirical information

Ψ(Mn(θ)) with Mn(θ) =1n

2

∑k=1

n

∑i=1,Ti =k

Mk (Xi ,θ)

� δTi =1Xi are i.i.d. ∼ ξ∗1(θ).

� Empirical allocations ξ̂n

= 1n

n∑

i=1(δTi =1,δTi =2) are asymptotically optimal:

Hα(ξ̂n,θ)→ Hα(ξ

∗α

(θ),θ) a.s.

22 / 32

OptimalAllocation





Randomized oracle randomization

π∗α(x ,θ) = 1 for x ∈ X0 with µ(X0) > 0⇒ prediction bias.

Optimal allocation measure under a constraint of randomization level β ∈ (0,1) :

ξ∗α,β

(θ) = βξµ + ξ̃α,β

(θ) with

ξµ

= ( µ2 ,

µ2 ) and ξ̃

α,β(θ) = arg max

ξ∈Ξ(µ(1−β))Hα

(βξ

µ+ ξ,θ

).

Randomized Oracle Rule 1.

P(Tn+1 = 1|Xn+1 = x) =β

2+ π∗α,β(x ,θ)

with πα,β(x ,θ) =d ξ̃1,α,β

dµ (x ,θ).

23 / 32

OptimalAllocation





Example of logistic regression

X = [0,1]. θ = (θ1,θ2), θk ∈ R2, k = 1,2.Logistic regression ηk (x ,θ) = ak + 0.5 · 1

1+exp(−θk ,1−θk ,2x).

a1 = 0.1, a2 = 0.25

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

η1(x)

η2(x)

x0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

α

1

2

1

2

1

2

Figure : Left : Outcome models.Right: Randomization level β = 0.2, deterministic part of allocation function ξ̃

α,β(θ) as fonction

of α.

24 / 32

OptimalAllocation





Randomized oracle rule

α,β ∈ (0,1). Asymptotic properties of:

Empirical allocations ξ̂n = 1n ∑

ni=1 Ti Xi converge to ξ

∗α,β

Compromise criterion is asymptotically optimal: H(ξ̂n,θ)−−−→n→∞

H(ξ∗α,β

,θ).

Allocation proportions ρn

= 1n

n∑

i=1Ti are consistent and asymptotically normal:

ρn−−−→n→∞

ρ∗k (θ) where ρ

∗k (θ) := ξ

∗k (X ,θ).

√n(

ρn−ρ∗k (θ)

)d−−−→

n→∞N (0,Σ∗) Σ∗ = diag(ρ

∗k (θ))−ρ

∗k (θ)ρ

∗k (θ)t

Bounds on regret(reward) and information.

25 / 32

OptimalAllocation





Adaptive sequential allocations:

Subjects arrive sequentially⇒ can use past covariates, allocations, outcomesinformation.

Allocation functions: P(Tn = k |Xn = x ,Fn)where Fn = σ(X1, . . . ,Xn−1,T1, . . . ,Tn−1,Y1, . . . ,Yn−1)

The sequence (T1, . . . ,Tn) is a stochastic process.

Motivation:

K = 2, Y = {0,1}, ηk (x) = θk , θk ∈ [0,1], Nn,1 := ∑ni=1(Ti = 1).

variance of (Nn,1− n2 ) is of order n for P(Tn = 1) = 1

2 ,bounded Biased Coin Design BCD(p), p ∈ ( 1

2 ,1]

P(Tn+1 = 1|Fn) =

p, 2Nn,1 < n12 , 2Nn,1 = n(1−p), 2Nn,1 > n

[Efron1971],[Markaryan,Rosenberger 2010]26 / 32

OptimalAllocation





Covariate Adjusted Response Adaptive designs

µ known, θ unknown⇒, use θ̂n (well defined n > n0, β > 0) [Zhang et al.’07]

� Rule 2 Covariate-Adjusted Response Adaptive targeting ξ∗α

(θ):

P(Tn+1 = 1|Xn+1 = x ,F n) = π(x , θ̂n)

here π is computed from ξ∗α,β

(θ̂n).

Asymptotic properties:

Consistency and asymptotic normality of θ̂n:

Asymptotic optimality of ξ̂n

= 1n

n∑

n=1(δTi =1δXi (.),δTi =2δXi (.))

Asymptotic properties of proportions.

27 / 32

OptimalAllocation





Allocation based on empirical process.

� θ known, µ unknown⇒ replace ξ∗α

(θ) by ξ̂n

.

� Rule 3. Allocation based on empirical process

P(Tn+1 = 1|T n,Xn,Xn+1 = x) = πn(x ,θ)

with π(x , ξ̃,θ) = d ξ̃1dµ (x ,θ) et ξ̂

n= 1

n

n∑1

(δTi =1δXi ,δTi =2δXi ).

� Asymptotic optimality: Hα(ξ̂n,θ)→ Hα(ξ

∗α,β

(θ),θ) a.s.

� Simulations: ⇒ allocation proportions ρn

very less that for rules 1 and 2.

� Randomization β > 0, θ and µ are unknown.

Rule 4. Empirical procedure of CARA type use both ξ̂n

and θ̂n

P(Tn+1 = 1|F n,Xn+1 = x) = πn(x , θ̂n).

28 / 32

OptimalAllocation





Logistic regression example

β = 0, X = [0,1]. θ = (θ1,θ2), θk ∈ R2, k = 1,2.Logistic regression ηk (x ,θ) = 1

1+exp(−θk ,1−θk ,2x), θk = (2,10) .

−2 −1.5 −1 −0.5 0 0.5 1 1.5 20

0.5

1

1.5

n1/2(Nn,1/n− ρ∗1)

−2 −1.5 −1 −0.5 0 0.5 1 1.5 20

0.5

1

1.5

n1/2(Nn,2/n− ρ∗2)

Figure : Variability of treatments proportions:Up: treatment 1. Down: treatment 2.dotted = rule 3, red = rule 1, blue = limit law for rule 1.

29 / 32

OptimalAllocation





Comparison with procedures proposed in literature

Ad-hoc rules proposed in literature (logistic regression)

� Allocation based on covariates-adjusted odds ratio [Rosenberger et al.’01]

P(Tn+1 = 1 | Fn,Xn+1 = x) = η1(x ,θ̂n)(1−η2(x ,θ̂n))

(1−η1(x ,θ̂n))η2(x ,θ̂n)odds ratio.

� Modification of allocation rule from [Hu&Zhu&Hu’15]

P(Tn+1 = 1 | Fn,Xn+1 = x)da

1 ·eb1

da1 ·eb

1 + da2 ·eb

2

witha > 0,b > 0.Information dk = dk (ξ

n,x ,θ) = tr(M−1(ξ

n,θ)Mk (x ,θ))

Inverse of reward ek = ek (x ,θ) = 1/(1−ηk (x ,θ)) .

30 / 32

OptimalAllocation





Example: Logistic regression.

0 0.01 0.02 0.03 0.04 0.05 0.06−20.5

−20

−19.5

−19

−18.5

−18

−17.5

α = 0.95

α = 0

α = 1

R(ξ∗)

ψ(ξ

∗)

Figure : Red curve Ψ(M(ξ∗α

)) vs. R(ξ∗α

) parametrized by α.? = allocation based on odd rations [Rosenberger et al.’01].O = modified procedure [Hu&Zhu&Hu’15] (from left to right a = 1, b = 10,6,4,3.)

31 / 32

OptimalAllocation





Conclusion and perspectives

Fo more details see: [A. Metelkina, L. Pronzato “Information-regret compromise incovariate-adaptive treatment allocation” AoS’17]

Perspectives:

� Adaptation of efficient allocation procedure [Zhang&Hu AMJCU’09].

� Other forms of regret/reward, e.g. quadratic in (η∗−ηk ).

� Approximate computation of θ 7→∆12(x ,ξ∗α

(θ),θ).

� α = α(τ′) for R(ξ∗α

)≤ τ′, so Φ(ξ,θ)≥ τ(τ′,θ).

� Sequence of αn tending to 0. At which speed?

� Bayesian approach (wrt to θ)

32 / 32

Finding a compromise between information and regret in...

Documents

Transcript of Finding a compromise between information and regret in...