
Modernizing Statistics Through Treatment Regimes

Marie Davidian

Department of Statistics
North Carolina State University

2019 ICSA Applied Statistics Symposium, June 11, 2019

Outline

Treatment Regimes and Precision Medicine

A Brief History

Optimal Single Decision Regimes

Optimal Multiple Decision Regimes

Treatment Regimes in Practice


Premise

Patient heterogeneity:
• Genetic/genomic profile
• Demographic, physiological characteristics
• Clinical variables
• Medical history, concomitant conditions
• Environment, lifestyle factors
• Adverse reactions, adherence to prior treatments
• Preference
• . . .

Clinical decision-making:
• Key decision points in the disease/disorder process
• Multiple treatment options at each
• A patient's characteristics are implicated in which treatment options s/he should receive

Example: Acute leukemia

Two decision points:
• Decision 1: Induction chemotherapy (2 options: C1, C2)
• Decision 2:
  - Maintenance treatment for patients who respond (2 options: M1, M2)
  - Salvage chemotherapy for those who don't respond (2 options: S1, S2)

Clinical decision-making

How are treatment decisions made?
• Clinical judgment, practice guidelines
• Synthesis of all information on a patient up to the point of a decision to determine the next treatment action from among the feasible options
• Goal: Make the “best” decisions leading to the most beneficial expected outcome for the patient

Precision medicine: Inform clinical decision-making and make it evidence-based
• Evidence-based decision support

Informing clinical decision-making

Treatment regime:
• A set of decision rules, each corresponding to a decision point
• Static rule: The recommended treatment action does not depend on patient information
• Dynamic rule: Takes as input all available information on a patient up to that point and recommends a treatment action from among the possible options
• Also called a dynamic treatment regime, adaptive treatment strategy, adaptive intervention, or policy
• Formalizes clinical decision-making and defines an algorithm for treating any individual patient

Treatment regime

Simplest: E.g., acute leukemia
• Decision 1: Give C1
• Decision 2: If response, give M2; if nonresponse, give S1

Individualized rules: More complex rules incorporating patient information
• “Tailoring variables”
• Consistent with precision medicine

Treatment regime

For example: Acute leukemia
• Decision 1: If age < 50 years and WBC < 10.0 × 10³/µl, give chemotherapy C2; otherwise, give C1
• Decision 2:
  - If the patient responded: if baseline WBC < 11.2, current WBC < 10.5, no grade 3+ hematologic adverse event, and current ECOG Performance Status ≤ 2, give maintenance M1; otherwise, give M2
  - If the patient did not respond: if age > 60, current WBC < 11.0, and ECOG ≥ 2, give salvage S1; otherwise, give S2
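To make the example concrete, the two rules can be written as ordinary functions of the information available at each decision. This is a minimal Python sketch: the `Patient` record and its field names are hypothetical, WBC is assumed to be in units of 10³/µl, and the thresholds are the illustrative values above. As in the stated rule, `d2` does not use the Decision 1 chemotherapy even though it is part of the history.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record of the information available at each decision.
    age: float
    baseline_wbc: float           # assumed units: 10^3 per microliter
    responded: bool = False       # responder status, observed after Decision 1
    current_wbc: float = 0.0
    grade3_heme_ae: bool = False  # grade 3+ hematologic adverse event
    ecog: int = 0                 # current ECOG Performance Status


def d1(p: Patient) -> str:
    """Decision 1: choose induction chemotherapy."""
    return "C2" if (p.age < 50 and p.baseline_wbc < 10.0) else "C1"


def d2(p: Patient) -> str:
    """Decision 2: maintenance for responders, salvage for nonresponders."""
    if p.responded:
        eligible = (p.baseline_wbc < 11.2 and p.current_wbc < 10.5
                    and not p.grade3_heme_ae and p.ecog <= 2)
        return "M1" if eligible else "M2"
    eligible = p.age > 60 and p.current_wbc < 11.0 and p.ecog >= 2
    return "S1" if eligible else "S2"


young = Patient(age=45, baseline_wbc=8.0, responded=True, current_wbc=9.0, ecog=1)
older = Patient(age=65, baseline_wbc=12.0, responded=False, current_wbc=10.0, ecog=2)
print(d1(young), d2(young))  # C2 M1
print(d1(older), d2(older))  # C1 S1
```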

Two decision regime: Acute leukemia

• At baseline: Information $x_1$, accrued information $h_1 = x_1 \in \mathcal{H}_1$
• Decision 1: Set of options $\mathcal{A}_1 = \{\text{C1}, \text{C2}\}$; rule 1: $d_1(h_1): \mathcal{H}_1 \to \mathcal{A}_1$
• Between Decisions 1 and 2: Collect additional information $x_2$, including responder status
• Accrued information $h_2 = (x_1, \text{chemotherapy at Decision 1}, x_2) \in \mathcal{H}_2$
• Decision 2: Set of options $\mathcal{A}_2 = \{\text{M1}, \text{M2}, \text{S1}, \text{S2}\}$; rule 2: $d_2(h_2): \mathcal{H}_2 \to \{\text{M1}, \text{M2}\}$ (responder), $d_2(h_2): \mathcal{H}_2 \to \{\text{S1}, \text{S2}\}$ (nonresponder)
• Treatment regime: $d = \{d_1(h_1), d_2(h_2)\} = (d_1, d_2)$

In general

Treatment regime with $K$ decision points:
• Baseline information $x_1 \in \mathcal{X}_1$; intermediate information $x_k \in \mathcal{X}_k$ between Decisions $k-1$ and $k$, $k = 2, \ldots, K$
• Set of treatment options $\mathcal{A}_k$ at Decision $k$, elements $a_k \in \mathcal{A}_k$
• Accrued information or history
$$h_1 = x_1 \in \mathcal{H}_1, \qquad h_k = (x_1, a_1, \ldots, x_{k-1}, a_{k-1}, x_k) \in \mathcal{H}_k, \quad k = 2, \ldots, K$$
• Decision rules $d_1(h_1), d_2(h_2), \ldots, d_K(h_K)$, $d_k: \mathcal{H}_k \to \mathcal{A}_k$
• Treatment regime
$$d = \{d_1(h_1), \ldots, d_K(h_K)\} = (d_1, d_2, \ldots, d_K)$$
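The way the history accrues can be sketched as a loop that alternates between applying rule $d_k$ and observing new information. This is a hypothetical Python illustration, not an implementation from the talk: `next_info` stands in for whatever is observed between decisions, and the history is stored as a flat list $(x_1, a_1, x_2, a_2, \ldots)$.

```python
from typing import Any, Callable, List, Sequence

History = List[Any]               # flat list (x1, a1, x2, a2, ..., xk)
Rule = Callable[[History], Any]   # d_k : H_k -> A_k


def follow_regime(x1: Any, rules: Sequence[Rule],
                  next_info: Callable[[History], Any]) -> History:
    """Apply d = (d_1, ..., d_K) to one patient, accruing h_k along the way.

    `next_info(h)` is a hypothetical stand-in for the information observed
    after the most recent treatment, given the history so far.
    """
    h: History = [x1]                         # h_1 = x_1
    for k, d_k in enumerate(rules):
        a_k = d_k(list(h))                    # rule for Decision k+1 sees h only
        h.append(a_k)
        if k < len(rules) - 1:                # nothing new after the last decision
            h.append(next_info(list(h)))
    return h


# Toy two-decision regime: x1 is age, x2 is responder status.
rules = [
    lambda h: "C2" if h[0] < 50 else "C1",
    lambda h: "M1" if h[2] else "S1",
]
respond = lambda h: h[1] == "C2"   # hypothetical response mechanism
print(follow_regime(40, rules, respond))  # [40, 'C2', True, 'M1']
print(follow_regime(70, rules, respond))  # [70, 'C1', False, 'S1']
```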

Treatment regimes and precision medicine

Premise: There is an infinitude of possible regimes $d$
• $\mathcal{D}$ = class of all possible treatment regimes
• Given a health outcome of interest. . .
• Can we define an optimal treatment regime in $\mathcal{D}$ formalizing the clinician's goal to make the “best” decisions to achieve the most beneficial expected outcome for a patient?
• Can we estimate an optimal treatment regime from data?
• And thereby inform clinical decision-making and make it evidence-based?

Result: An explosion of statistical methodological research on estimation of (optimal) treatment regimes in the past decade

A Brief History

Causal inference framework

Jamie Robins

Robins, J. (1986). A new approach to causal inference in mortality studies with sustained exposure period–application to control of the health worker survivor effect. Mathematical Modeling, 7, 1393–1512 (and Addendum) – Framework for causal inference on effects of time-varying treatment

Causal inference framework

Robins, J. M. (1997). Causal inference from complex longitudinal data. In Berkane, M., editor, Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics (120), New York: Springer Verlag, 69–117 – Refines the causal inference framework

Robins, J. M., Hernán, M., and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560 – Modeling and estimation of the mean outcome if the patient population were to follow a static regime (inverse probability weighting)

Sequential treatment

Peter Thall

Thall, P., Millikan, R., and Sung, H. (2000). Evaluating multiple treatment courses in clinical trials. Statistics in Medicine, 19, 1011–1028 – Sequential treatments

Sequential treatment

Lavori, P. W. and Dawson, R. (2000). A design for testing clinical strategies: Biased adaptive within-subject randomization. JRSS-A, 163, 29–38 – Clinical trials for evaluating sequential treatments

Murphy, S. A., van der Laan, M. J., Robins, J. M., and CPPRG. (2001). Marginal mean models for dynamic regimes. JASA, 96, 1410–1423 – Estimation of mean outcome for dynamic regimes

Lunceford, J., Davidian, M., and Tsiatis, A. A. (2002). Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics, 58, 48–57 – Estimation of mean survival outcome for simple dynamic regimes

Optimal regimes

Susan Murphy

Murphy, S. (2003). Optimal dynamic treatment regimes (with discussion). JRSS-B, 65, 331–366 – Definition and estimation of an optimal treatment regime from data

Optimal regimes

Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Lin, D. Y. and Heagerty, P., editors, Proceedings of the Second Seattle Symposium on Biostatistics, 189–326, New York: Springer – Definition and estimation of an optimal treatment regime from data

Rosthøj, S., Fullwood, C., Henderson, R., and Stewart, S. (2006). Estimation of optimal dynamic anticoagulation regimes from observational data: A regret-based approach. Statistics in Medicine, 25, 4197–4215.

Moodie, E. E. M., Richardson, T. S., and Stephens, D. A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics, 63, 447–455.

Sequential multiple assignment randomized trials

Lavori, P. W. and Dawson, R. (2004). Dynamic treatment regimes: Practical design considerations. Clinical Trials, 1, 9–20.

Murphy, S. A. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24, 1455–1481 – Formal framework for SMARTs

Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W., Gnagy, B., et al. (2012). Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological Methods, 17, 457–477.

Estimation of optimal regimes, K = 1

Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. JASA, 107, 1106–1118.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68, 1010–1018.

Zhang, B., Tsiatis, A., Davidian, M., Zhang, M., and Laber, E. (2012). Estimating optimal treatment regimes from a classification perspective. Stat, 1, 103–114.

Estimation of optimal regimes, K ≥ 2

Orellana, L., Rotnitzky, A., and Robins, J. M. (2010a). Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: Main content. The International Journal of Biostatistics, 6.

Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W., Gnagy, B., et al. (2012). Q-learning: A data analysis method for constructing adaptive interventions. Psychological Methods, 17, 478–494.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika, 100, 681–694.

Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. JASA, 110, 583–598.

Today

Enormous body of literature on estimation of optimal single and multiple decision regimes. . .

Optimal Single Decision Regimes

Statistical framework

For simplicity: Two treatment options, $\mathcal{A}_1 = \{0, 1\}$

With $K = 1$: Baseline information $x_1 = h_1 \in \mathcal{H}_1$
• Treatment regime $d \in \mathcal{D}$: $d = \{d_1(h_1)\}$, $d_1: \mathcal{H}_1 \to \mathcal{A}_1$
• Example: Rules involving thresholds (with 0 = C1, 1 = C2)
$$d_1(h_1) = I(\text{age} < 50 \text{ and } \text{WBC} < 10)$$
• Example: Rules involving linear combinations
$$d_1(h_1) = I\{\text{age} + 8.7 \log(\text{WBC}) - 60 > 0\}$$

Convention: Larger outcomes are more beneficial
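Both example rules are members of parameterized families, which is what later makes it natural to search over the parameters. A minimal Python sketch; the argument names are hypothetical, `log` is assumed to be the natural logarithm, and the default parameter values are the illustrative ones above.

```python
import math


def d_threshold(age: float, wbc: float, eta=(50.0, 10.0)) -> int:
    """Threshold rule I(age < eta_1 and WBC < eta_2); 1 = C2, 0 = C1."""
    return int(age < eta[0] and wbc < eta[1])


def d_linear(age: float, wbc: float, eta=(1.0, 8.7, -60.0)) -> int:
    """Linear-combination rule I{eta_1 * age + eta_2 * log(WBC) + eta_3 > 0}."""
    return int(eta[0] * age + eta[1] * math.log(wbc) + eta[2] > 0)


print(d_threshold(45, 8), d_threshold(55, 8))  # 1 0
print(d_linear(50, 10), d_linear(30, 5))       # 1 0
```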

Statistical framework

For a randomly chosen individual from the population with history $H_1$:
• Potential outcomes $Y^*(0)$ and $Y^*(1)$ that would be achieved under options 0 and 1
• Potential outcome if assigned treatment according to $d \in \mathcal{D}$
$$Y^*(d) = Y^*(1) \, I\{d_1(H_1) = 1\} + Y^*(0) \, I\{d_1(H_1) = 0\}$$
• For general $\mathcal{A}_1$, $a_1 \in \mathcal{A}_1$
$$Y^*(d) = \sum_{a_1 \in \mathcal{A}_1} Y^*(a_1) \, I\{d_1(H_1) = a_1\}$$
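For a single individual whose potential outcomes were all known, $Y^*(d)$ simply picks out the potential outcome for the option the rule recommends. A small Python sketch of the general sum; representing the potential outcomes as a dictionary is an assumption for illustration (in practice at most one of them is ever observed).

```python
from typing import Callable, Dict, Hashable


def y_star_under_rule(y_star: Dict[Hashable, float],
                      d1: Callable[[float], Hashable], h1: float) -> float:
    """Y*(d) = sum over a1 in A1 of Y*(a1) * I{d1(h1) = a1}."""
    return sum(y * (d1(h1) == a1) for a1, y in y_star.items())


y_star = {0: 3.0, 1: 5.0}        # hypothetical Y*(0), Y*(1) for one person
rule = lambda h: int(h > 0)      # d1(h1) = I(h1 > 0)
print(y_star_under_rule(y_star, rule, 2.0))   # 5.0
print(y_star_under_rule(y_star, rule, -1.0))  # 3.0
```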

Value of a treatment regime

For regime $d \in \mathcal{D}$: $E\{Y^*(d)\}$ is the expected outcome if all individuals in the population were to receive treatment in $\mathcal{A}_1$ according to rule $d_1$ in $d$
• Referred to as the value of regime $d$
$$V(d) = E\{Y^*(d)\}$$

Definition of an optimal regime: $d^{opt} \in \mathcal{D}$ is a regime satisfying
$$d^{opt} = \arg\max_{d \in \mathcal{D}} E\{Y^*(d)\} = \arg\max_{d \in \mathcal{D}} V(d)$$
i.e., $V(d^{opt}) = E\{Y^*(d^{opt})\} \geq E\{Y^*(d)\} = V(d)$ for all $d \in \mathcal{D}$

Optimal treatment regime

Characterization of an optimal regime:
$$V(d) = E\{Y^*(d)\} = E\big[E\{Y^*(d) \mid H_1\}\big] = E\big[E\{Y^*(1) \mid H_1\} \, I\{d_1(H_1) = 1\} + E\{Y^*(0) \mid H_1\} \, I\{d_1(H_1) = 0\}\big]$$
• Maximizing the inner expression at every $h_1$ makes $V(d)$ as large as possible, which happens if
$$d_1(h_1) = 1 \text{ when } E\{Y^*(1) \mid H_1 = h_1\} > E\{Y^*(0) \mid H_1 = h_1\}$$
$$d_1(h_1) = 0 \text{ when } E\{Y^*(1) \mid H_1 = h_1\} \leq E\{Y^*(0) \mid H_1 = h_1\}$$
• Thus, $d^{opt}$ has rule
$$d^{opt}_1(h_1) = I\big[E\{Y^*(1) \mid H_1 = h_1\} > E\{Y^*(0) \mid H_1 = h_1\}\big]$$
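A quick Monte Carlo check of this characterization, under an assumed data-generating model in which $E\{Y^*(1) \mid H_1 = h_1\} = 1 + h_1$ and $E\{Y^*(0) \mid H_1 = h_1\} = 1 - h_1$ with $H_1 \sim \text{Uniform}(-1, 1)$, so that $d^{opt}_1(h_1) = I(h_1 > 0)$. Because the simulation generates both potential outcomes, $V(d)$ can be approximated directly for any rule. Under this model $V(d^{opt}) = E(1 + |H_1|) = 1.5$, while treating everyone, or no one, gives $V = 1$.

```python
import random


def simulate_values(n: int = 200_000, seed: int = 1) -> dict:
    """Monte Carlo approximation of V(d) = E{Y*(d)} for three rules."""
    rng = random.Random(seed)
    rules = {
        "opt": lambda h: int(h > 0),      # I[E{Y*(1)|h} > E{Y*(0)|h}]
        "always_1": lambda h: 1,
        "always_0": lambda h: 0,
    }
    totals = {name: 0.0 for name in rules}
    for _ in range(n):
        h = rng.uniform(-1, 1)
        y1 = 1 + h + rng.gauss(0, 1)      # potential outcome under option 1
        y0 = 1 - h + rng.gauss(0, 1)      # potential outcome under option 0
        for name, d in rules.items():
            totals[name] += y1 if d(h) == 1 else y0
    return {name: t / n for name, t in totals.items()}


values = simulate_values()
print({k: round(v, 2) for k, v in values.items()})
```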

Optimal treatment regime

In general:
$$d^{opt}_1(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\{Y^*(a_1) \mid H_1 = h_1\}$$
• Chooses the option in $\mathcal{A}_1$ with the maximum expected outcome given an individual's history $h_1$
• Thus, makes the best decision for such an individual given only knowledge of his history
• Thereby formalizes the goal of clinical decision-making

Estimating $d^{opt}$

Observed data: $(X_{1i}, A_{1i}, Y_i)$, $i = 1, \ldots, n$
• $H_{1i} = X_{1i}$ = history for individual $i$
• $A_{1i}$ = option in $\mathcal{A}_1$ actually received by $i$
• $Y_i$ = observed outcome
• From a clinical trial or observational study (RWD)

Goal: Estimate $d^{opt}$ based on these data
$$d^{opt}_1(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\{Y^*(a_1) \mid H_1 = h_1\}$$
• $d^{opt}$ is defined in terms of potential outcomes
• Must be able to express the definition of $d^{opt}$ in terms of the observed data

Identifiability assumptions

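SUTVA (consistency): $Y = Y^*(A_1) = \sum_{a_1 \in \mathcal{A}_1} Y^*(a_1) \, I(A_1 = a_1)$

No unmeasured confounders (NUC): $\{Y^*(1), Y^*(0)\} \perp\!\!\!\perp A_1 \mid H_1$

Positivity: $P(A_1 = a_1 \mid H_1 = h_1) > 0$, $a_1 \in \mathcal{A}_1$, for all $h_1 \in \mathcal{H}_1$ with $P(H_1 = h_1) > 0$

Under these assumptions: For $a_1 \in \mathcal{A}_1$
$$E\{Y^*(a_1) \mid H_1\} = E\{Y^*(a_1) \mid H_1, A_1 = a_1\} = E(Y \mid H_1, A_1 = a_1)$$
(the first equality uses NUC, the second SUTVA; positivity ensures the conditioning event has positive probability), so that
$$d^{opt}_1(h_1) = I\{E(Y \mid H_1 = h_1, A_1 = 1) > E(Y \mid H_1 = h_1, A_1 = 0)\}$$
• Regression $E(Y \mid H_1 = h_1, A_1 = a_1) = Q_1(h_1, a_1)$
$$d^{opt}_1(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1)$$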

Estimation of an optimal regime

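Q-learning: Posit and fit a regression model $Q_1(h_1, a_1; \beta_1)$, giving the estimate $\hat\beta_1$
• Substitution estimator
$$\hat{d}^{opt}_{Q,1}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1; \hat\beta_1), \qquad \hat{d}^{opt}_Q = \{\hat{d}^{opt}_{Q,1}(h_1)\}$$
• The form of the model $Q_1(h_1, a_1; \beta_1)$ induces a class of regimes indexed by $\beta_1$ to which the search is restricted
• If $Q_1(h_1, a_1; \beta_1)$ is misspecified, $\hat{d}^{opt}_Q$ could be far from $d^{opt}$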
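A minimal single-decision Q-learning sketch in Python. It assumes scalar $h_1$ and a linear working model $Q_1(h_1, a_1; \beta_1) = \beta_{10} + \beta_{11} h_1 + a_1(\beta_{12} + \beta_{13} h_1)$, fit by ordinary least squares through the normal equations (solved with a small hand-written elimination routine to stay dependency-free). The simulated data, with randomized treatment and true contrast $2h_1 - 0.5$, are purely illustrative. The model form induces the class of rules $I(\beta_{12} + \beta_{13} h_1 > 0)$.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def features(h, a):
    # Working model: Q1(h, a; beta) = b0 + b1*h + a*(b2 + b3*h)
    return [1.0, h, a, a * h]


def fit_q1(H, A, Y):
    """Least squares fit of the Q-function model via the normal equations."""
    X = [features(h, a) for h, a in zip(H, A)]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)]
    return solve(XtX, XtY)


def q1(h, a, beta):
    return sum(b * f for b, f in zip(beta, features(h, a)))


def d_q(h, beta):
    """Substitution estimator: the option with the larger fitted Q-value."""
    return int(q1(h, 1, beta) > q1(h, 0, beta))


rng = random.Random(2019)
H = [rng.uniform(-1, 1) for _ in range(5000)]
A = [rng.randint(0, 1) for _ in H]                      # randomized treatment
Y = [1 + h + a * (2 * h - 0.5) + rng.gauss(0, 1) for h, a in zip(H, A)]
beta_hat = fit_q1(H, A, Y)
print([round(b, 2) for b in beta_hat])   # should be near the true (1, 1, -0.5, 2)
print(d_q(0.8, beta_hat), d_q(-0.5, beta_hat))  # 1 0
```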

Estimation of an optimal regime

Alternative approach: Deliberately restrict to a class $\mathcal{D}_\eta \subset \mathcal{D}$ of regimes $d_\eta$ with rules of the form $d_1(h_1; \eta_1)$
• Chosen for interpretability, cost, etc.; e.g.,
$$d_1(h_1; \eta_1) = I(x_{11} < \eta_{11}, x_{12} < \eta_{12})$$
• Optimal restricted regime: $d^{opt}_\eta \in \mathcal{D}_\eta$ with rule $d_1(h_1; \eta^{opt}_1)$,
$$\eta^{opt}_1 = \arg\max_{\eta_1} V(d_\eta), \qquad d^{opt}_\eta = \{d_1(h_1; \eta^{opt}_1)\}$$

Policy search estimation: Estimate $V(d_\eta)$ by $\hat{V}(d_\eta)$ for fixed $\eta_1$
• Maximize $\hat{V}(d_\eta)$ in $\eta_1$ to obtain
$$\hat\eta^{opt}_1 = \arg\max_{\eta_1} \hat{V}(d_\eta), \qquad \hat{d}^{opt}_\eta = \{d_1(h_1; \hat\eta^{opt}_1)\}$$

Estimation of an optimal regime

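Define: $\pi_1(h_1, a_1) = P(A_1 = a_1 \mid H_1 = h_1)$

Can show: For any $d$ under SUTVA, NUC, and positivity
$$V(d) = E\{Y^*(d)\} = E\left[\frac{I\{A_1 = d_1(H_1)\} \, Y}{P\{A_1 = d_1(H_1) \mid H_1\}}\right] = E\left[\frac{I\{A_1 = d_1(H_1)\} \, Y}{\pi_1(H_1, A_1)}\right]$$
(the last equality holds because the indicator is nonzero only when $A_1 = d_1(H_1)$)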

Estimation of an optimal regime

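Suggests: With a model $\pi_1(h_1, a_1; \gamma_1)$ and estimate $\hat\gamma_1$
• Inverse probability weighted estimator (IPW)
$$\hat{V}_{IPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\} \, Y_i}{\pi_1(H_{1i}, A_{1i}; \hat\gamma_1)}$$
• Augmented inverse probability weighted estimator (AIPW)
$$\hat{V}_{AIPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \left[\frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\} \, Y_i}{\pi_1(H_{1i}, A_{1i}; \hat\gamma_1)} - \frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\} - \pi_1(H_{1i}, A_{1i}; \hat\gamma_1)}{\pi_1(H_{1i}, A_{1i}; \hat\gamma_1)} \, Q_{d_\eta,1}(H_{1i}; \eta_1, \hat\beta_1)\right]$$
$$Q_{d_\eta,1}(H_1; \eta_1, \beta_1) = Q_1\{H_1, d_1(H_1; \eta_1); \beta_1\}$$
• The AIPW estimator is doubly robust (consistent if either the propensity model or the Q-function model is correctly specified) and can be considerably more efficient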
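The AIPW estimator and policy search over the one-parameter class $d_1(h_1; \eta_1) = I(h_1 > \eta_1)$ can be sketched in Python as follows, under the same assumed simulation model as before (true $\eta^{opt}_1 = 0.25$, $V = 1.28125$). To illustrate double robustness, the Q-function working model is deliberately crude (a separate mean of $Y$ for each option, ignoring $h_1$), while the propensity is the correct, known one; the AIPW estimates should still be close to the truth. Each subject's contribution is computed for both options up front, so a new $\eta_1$ only requires picking the term for the recommended option. A coarse grid search stands in for a proper nonsmooth optimizer.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=11):
    """Data (H1, A1, Y, pi_1(H1, 1)) from the assumed confounded model."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        h = rng.uniform(-1, 1)
        p1 = expit(h)                              # true P(A1 = 1 | H1 = h)
        a = int(rng.random() < p1)
        y = 1 + h + a * (2 * h - 0.5) + rng.gauss(0, 1)
        out.append((h, a, y, p1))
    return out


def psi_terms(data):
    """(psi_0, psi_1) per subject, with a deliberately crude Q model."""
    q = {}
    for arm in (0, 1):                             # Q1(h, a) = mean of Y in arm a
        ys = [y for _, a, y, _ in data if a == arm]
        q[arm] = sum(ys) / len(ys)
    psi = []
    for h, a, y, p1 in data:
        row = []
        for arm, p in ((0, 1.0 - p1), (1, p1)):    # p = pi_1(h, arm)
            ind = 1.0 if a == arm else 0.0
            row.append(ind * y / p - (ind - p) / p * q[arm])
        psi.append(tuple(row))
    return psi


def v_aipw(psi, data, eta):
    """AIPW value of d1(h; eta) = I(h > eta): average of psi_{d1(H1i)}."""
    return sum(p[int(h > eta)] for p, (h, *_) in zip(psi, data)) / len(data)


data = simulate(50_000)
psi = psi_terms(data)
grid = [i / 20 for i in range(-20, 21)]            # candidate eta in [-1, 1]
values = {eta: v_aipw(psi, data, eta) for eta in grid}
eta_hat = max(values, key=values.get)
print(eta_hat, round(values[eta_hat], 3))          # expect eta near 0.25, value near 1.28
```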

Classification analogy

Challenge: Maximization of $\hat{V}_{IPW}(d_\eta)$ or $\hat{V}_{AIPW}(d_\eta)$ in $\eta_1$ is a nonsmooth optimization problem

Can show: With two options
• This maximization can be recast as minimization of a weighted classification error
• Can view the rule $d_1(h_1; \eta_1)$ as a classifier
• Can exploit existing algorithms for classification problems (for nonsmooth optimization)
• CART, SVM, etc.

Classification analogy

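Algebra: Can write
$$\hat{V}_{AIPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \big[\psi_1(H_{1i}, A_{1i}, Y_i) \, I\{d_1(H_{1i}; \eta_1) = 1\} + \psi_0(H_{1i}, A_{1i}, Y_i) \, I\{d_1(H_{1i}; \eta_1) = 0\}\big]$$
$$\psi_{a_1}(H_1, A_1, Y) = \frac{I(A_1 = a_1) \, Y}{\pi_1(H_1, a_1)} - \frac{I(A_1 = a_1) - \pi_1(H_1, a_1)}{\pi_1(H_1, a_1)} \, Q_1(H_1, a_1)$$
• Maximizing $\hat{V}_{AIPW}(d_\eta)$ is equivalent to minimizing in $\eta_1$
$$n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)| \, I\big[I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} \neq d_1(H_{1i}; \eta_1)\big]$$
$$C_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)$$
• In the form of a weighted classification error
• Similarly for $\hat{V}_{IPW}(d_\eta)$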
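The equivalence follows from a per-subject identity: with $C_1 = \psi_1 - \psi_0$, $\psi_1 I(d_1 = 1) + \psi_0 I(d_1 = 0) = \psi_0 + \max(C_1, 0) - |C_1| \, I\{I(C_1 > 0) \neq d_1\}$, and the first two terms do not involve the rule. The Python sketch below checks this numerically for every possible rule on a handful of made-up $\psi$ values (they are not estimates from any data).

```python
from itertools import product


def value(psi1, psi0, d):
    """n^{-1} sum [psi1 I(d = 1) + psi0 I(d = 0)]."""
    return sum(p1 if di == 1 else p0 for p1, p0, di in zip(psi1, psi0, d)) / len(d)


def weighted_error(psi1, psi0, d):
    """n^{-1} sum |C| I{I(C > 0) != d}, with contrast C = psi1 - psi0."""
    err = 0.0
    for p1, p0, di in zip(psi1, psi0, d):
        c = p1 - p0
        err += abs(c) * (int(c > 0) != di)
    return err / len(d)


def rule_free_constant(psi1, psi0):
    """n^{-1} sum [psi0 + max(C, 0)]; does not depend on the rule."""
    return sum(p0 + max(p1 - p0, 0.0) for p1, p0 in zip(psi1, psi0)) / len(psi1)


psi1 = [2.0, -1.0, 0.5, 1.5]
psi0 = [1.0, 0.5, 0.5, 3.0]
const = rule_free_constant(psi1, psi0)
for d in product((0, 1), repeat=len(psi1)):          # every possible rule
    assert abs(value(psi1, psi0, d) - (const - weighted_error(psi1, psi0, d))) < 1e-12
best = max(product((0, 1), repeat=4), key=lambda d: value(psi1, psi0, d))
print(best)  # (1, 0, 0, 0): treat exactly where C > 0
```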

Classification analogy

Decision function $f_1$: $d_1(h_1; \eta_1) = I\{f_1(h_1; \eta_1) > 0\}$
• Equivalently, with $\ell_{0\text{-}1}(x) = I(x \leq 0)$, minimize
$$n^{-1} \sum_{i=1}^{n} |C_1(H_{1i}, A_{1i}, Y_i)| \; \ell_{0\text{-}1}\Big(\big[2I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} - 1\big] f_1(H_{1i}; \eta_1)\Big)$$
• For SVM, replace the nonconvex $\ell_{0\text{-}1}(x)$ by the convex surrogate
$$\ell_{hinge}(x) = (1 - x)_+, \qquad x_+ = \max(0, x)$$
and impose a penalty for overfitting of $f_1$
• Outcome Weighted Learning (OWL)
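A minimal sketch of the surrogate-loss idea: a linear decision function $f_1(h_1) = b_0 + b_1 h_1$ fit by minimizing the weighted hinge loss plus a ridge penalty $\lambda b_1^2$, using plain full-batch subgradient descent. This is an illustrative stand-in, not the solver used in the OWL literature (which relies on SVM machinery); the labels and weights are built from an assumed contrast $C_1 = 2h_1 - 0.5$ on a grid of histories rather than estimated from data, and `lam` and `iters` are arbitrary choices.

```python
import math


def fit_weighted_hinge(h, z, w, lam=0.01, iters=3000):
    """Minimize n^{-1} sum w_i (1 - z_i f(h_i))_+ + lam * b1^2, f(h) = b0 + b1 h.

    z_i in {-1, +1} are the labels 2 I(C_i > 0) - 1; w_i = |C_i| are weights.
    """
    n = len(h)
    b0 = b1 = 0.0
    for t in range(iters):
        g0, g1 = 0.0, 2 * lam * b1                 # gradient of the penalty
        for hi, zi, wi in zip(h, z, w):
            if zi * (b0 + b1 * hi) < 1:            # hinge is active
                g0 -= wi * zi / n
                g1 -= wi * zi * hi / n
        step = 1.0 / math.sqrt(t + 1)              # diminishing step size
        b0, b1 = b0 - step * g0, b1 - step * g1
    return b0, b1


h = [-1 + 2 * i / 200 for i in range(201)]        # grid of histories in [-1, 1]
contrast = [2 * x - 0.5 for x in h]                # assumed C_1 for illustration
z = [1 if c > 0 else -1 for c in contrast]
w = [abs(c) for c in contrast]
b0, b1 = fit_weighted_hinge(h, z, w)
rule = lambda x: int(b0 + b1 * x > 0)              # d1(h; eta) = I{f1(h) > 0}
print(round(b0, 2), round(b1, 2), rule(-0.5), rule(0.8))
```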

Interpretability versus flexibility

Classification approach:
• Flexible, highly parameterized decision rules
• Pro: Can synthesize high-dimensional patient information and get close to the true $d^{opt} \in \mathcal{D}$
• Con: “Black box,” little scientific insight

Opposing view: Parsimony and interpretability
• Focus on $\mathcal{D}_\eta$ with understandable rules
• Pro: Accessibility, scientific insight
• Con: The optimal restricted regime $d^{opt}_\eta$ may not be close to the true $d^{opt} \in \mathcal{D}$

Optimal Multiple Decision Regimes

K decision treatment regime

• Baseline information $x_1 \in \mathcal{X}_1$; intermediate information $x_k \in \mathcal{X}_k$ between Decisions $k-1$ and $k$, $k = 2, \ldots, K$
• Set of treatment options $\mathcal{A}_k$ at Decision $k$, elements $a_k \in \mathcal{A}_k$
• Accrued information or history
$$h_1 = x_1 \in \mathcal{H}_1, \qquad h_k = (x_1, a_1, \ldots, x_{k-1}, a_{k-1}, x_k) = (\bar{x}_k, \bar{a}_{k-1}) \in \mathcal{H}_k, \quad k = 2, \ldots, K$$
where $\bar{x}_k = (x_1, \ldots, x_k)$, and $\bar{a}_k$ similarly, $k = 1, \ldots, K$
• Decision rules $d_1(h_1), d_2(h_2), \ldots, d_K(h_K)$, $d_k: \mathcal{H}_k \to \mathcal{A}_k$
• Treatment regime
$$d = \{d_1(h_1), \ldots, d_K(h_K)\} = (d_1, d_2, \ldots, d_K)$$
• Write $\bar{d}_k = (d_1, \ldots, d_k)$, $k = 1, \ldots, K$

Statistical framework

For a randomly chosen individual from the population with history $H_1 = X_1$:
• Potential outcomes if an individual were to receive $\bar{a}_K$
$$\{X^*_2(a_1), X^*_3(\bar{a}_2), \ldots, X^*_K(\bar{a}_{K-1}), Y^*(\bar{a}_K)\}$$
• All possible potential outcomes
$$W^* = \big\{X^*_2(a_1), X^*_3(\bar{a}_2), \ldots, X^*_K(\bar{a}_{K-1}), Y^*(\bar{a}_K), \text{ for } a_1 \in \mathcal{A}_1, a_2 \in \mathcal{A}_2, \ldots, a_{K-1} \in \mathcal{A}_{K-1}, a_K \in \mathcal{A}_K\big\}$$
• For regime $d \in \mathcal{D}$, can define in terms of $H_1$ and $W^*$ the potential outcomes under $d$
$$\{X^*_2(d_1), X^*_3(\bar{d}_2), \ldots, X^*_K(\bar{d}_{K-1}), Y^*(d)\}$$

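Potential outcomes for regime $d \in \mathcal{D}$

Formally:
• Define
$$\bar{X}^*_k(\bar{a}_{k-1}) = \{X_1, X^*_2(a_1), X^*_3(\bar{a}_2), \ldots, X^*_k(\bar{a}_{k-1})\}, \quad k = 2, \ldots, K$$
• Then, with $\bar{\mathcal{A}}_k = \mathcal{A}_1 \times \cdots \times \mathcal{A}_k$ and the convention $d_1\{\bar{X}^*_1(\bar{a}_0), \bar{a}_0\} = d_1(X_1)$,
$$X^*_2(d_1) = \sum_{a_1 \in \mathcal{A}_1} X^*_2(a_1) \, I\{d_1(X_1) = a_1\}$$
$$X^*_k(\bar{d}_{k-1}) = \sum_{\bar{a}_{k-1} \in \bar{\mathcal{A}}_{k-1}} X^*_k(\bar{a}_{k-1}) \prod_{j=1}^{k-1} I\big[d_j\{\bar{X}^*_j(\bar{a}_{j-1}), \bar{a}_{j-1}\} = a_j\big], \quad k = 3, \ldots, K$$
$$Y^*(d) = \sum_{\bar{a}_K \in \bar{\mathcal{A}}_K} Y^*(\bar{a}_K) \prod_{j=1}^{K} I\big[d_j\{\bar{X}^*_j(\bar{a}_{j-1}), \bar{a}_{j-1}\} = a_j\big]$$
• Also define
$$\bar{X}^*_k(\bar{d}_{k-1}) = \{X_1, X^*_2(d_1), X^*_3(\bar{d}_2), \ldots, X^*_k(\bar{d}_{k-1})\}, \quad k = 2, \ldots, K$$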
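For $K = 2$ the formula for $Y^*(d)$ can be evaluated directly for one individual whose potential outcomes are all known. A minimal Python sketch with binary options; the dictionaries `x2_star` (for $X^*_2(a_1)$) and `y_star` (for $Y^*(a_1, a_2)$) are hypothetical stand-ins for the unobservable $W^*$, and `d2` takes $(x_1, x_2, a_1)$ as its history.

```python
from itertools import product


def y_star_under_regime(x1, x2_star, y_star, d1, d2, A1=(0, 1), A2=(0, 1)):
    """Y*(d) = sum_{a1, a2} Y*(a1, a2) I{d1(x1) = a1} I[d2{x1, X2*(a1), a1} = a2]."""
    total = 0.0
    for a1, a2 in product(A1, A2):
        follows = d1(x1) == a1 and d2(x1, x2_star[a1], a1) == a2
        total += y_star[(a1, a2)] * follows
    return total


x2_star = {0: 1, 1: 0}          # responder status under each Decision 1 option
y_star = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
d1 = lambda x1: int(x1 > 0)
d2 = lambda x1, x2, a1: int(x2 == 1)   # option 1 for responders
print(y_star_under_regime(0, x2_star, y_star, d1, d2))  # 2.0
print(y_star_under_regime(1, x2_star, y_star, d1, d2))  # 3.0
```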

Value of a K-decision treatment regime

For regime $d \in \mathcal{D}$: $E\{Y^*(d)\}$ is the expected outcome if all individuals in the population were to receive treatment in $\mathcal{A}_1, \ldots, \mathcal{A}_K$ according to the rules in $d$
• The value of regime $d$
$$V(d) = E\{Y^*(d)\}$$

Definition of an optimal regime: $d^{opt} \in \mathcal{D}$ is a regime satisfying
$$d^{opt} = \arg\max_{d \in \mathcal{D}} E\{Y^*(d)\} = \arg\max_{d \in \mathcal{D}} V(d)$$
i.e., $V(d^{opt}) = E\{Y^*(d^{opt})\} \geq E\{Y^*(d)\} = V(d)$ for all $d \in \mathcal{D}$

Characterization of an optimal regime

Sketch for $K = 2$: Using backward induction
• For a randomly chosen individual

At Decision 2:
• If she started with $H_1 = X_1 = x_1$ and received $a_1 \in \mathcal{A}_1$ at Decision 1, she will already have achieved $X^*_2(a_1)$
• With $a_1$ and $\bar{X}^*_2(a_1) = \{X_1, X^*_2(a_1)\} = \bar{x}_2$ already determined, the optimal decision at Decision 2 is to choose the $a_2 \in \mathcal{A}_2$ resulting in the largest expected outcome given she is already at this point
$$V_2(h_2) = V_2(\bar{x}_2, a_1) = \max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid \bar{X}^*_2(a_1) = \bar{x}_2\}$$
• Optimal rule at Decision 2
$$d^{opt}_2(h_2) = d^{opt}_2(\bar{x}_2, a_1) = \arg\max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid \bar{X}^*_2(a_1) = \bar{x}_2\}$$

Characterization of an optimal regime

At Decision 1: If she starts with $H_1 = X_1 = x_1$, choose $a_1 \in \mathcal{A}_1$ to maximize the expected outcome given $X_1 = x_1$, taking into account that she will receive treatment at Decision 2 by following $d^{opt}_2$ in the future
• If $a_1 \in \mathcal{A}_1$ is selected now, she will arrive at Decision 2 with $\bar{X}^*_2(a_1) = \{x_1, X^*_2(a_1)\}$ and, with treatment selected by $d^{opt}_2$, will have expected outcome
$$V_2\{x_1, X^*_2(a_1), a_1\} = \max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid X^*_2(a_1), X_1 = x_1\}$$
• Optimal rule at Decision 1
$$d^{opt}_1(h_1) = d^{opt}_1(x_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\big[V_2\{x_1, X^*_2(a_1), a_1\} \mid X_1 = x_1\big]$$
selects the $a_1 \in \mathcal{A}_1$ that maximizes the maximum expected outcome that would result from choosing treatment optimally at Decision 2

Characterization of an optimal regime

Can be shown: $d = (d^{opt}_1, d^{opt}_2)$ defined this way satisfies the definition of an optimal regime
• And the reasoning extends to general $K$
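Backward induction can be carried out exactly when the relevant distributions are known. The Python sketch below uses a hypothetical toy model with binary $X_1$, $X_2$, and options: $P\{X^*_2(a_1) = 1 \mid X_1 = x_1\}$ comes from a small table, and $E\{Y^*(a_1, a_2) \mid X_1 = x_1, X^*_2(a_1) = x_2\} = 2x_2 + I(a_2 = x_2)$. It computes $V_2$ and $d^{opt}_2$ first, then $d^{opt}_1$ by averaging $V_2$ over $X^*_2(a_1)$; in this toy model the Decision 1 choice is driven entirely by its effect on the Decision 2 situation.

```python
from itertools import product

# Hypothetical toy model (all numbers are illustrative assumptions).
P_X2 = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.4}  # P{X2*(a1)=1 | x1}


def mean_y(x1, a1, x2, a2):
    """E{Y*(a1, a2) | X1 = x1, X2*(a1) = x2} in the toy model."""
    return 2 * x2 + (a2 == x2)


def backward_induction(p_x2, mean_fn, options=(0, 1)):
    # Decision 2: V2 and d2_opt for every history (x1, x2, a1).
    d2, v2 = {}, {}
    for x1, a1, x2 in product((0, 1), options, (0, 1)):
        best = max(options, key=lambda a2: mean_fn(x1, a1, x2, a2))
        d2[(x1, x2, a1)] = best
        v2[(x1, x2, a1)] = mean_fn(x1, a1, x2, best)
    # Decision 1: average V2 over X2*(a1) given X1 = x1, then maximize.
    q1, d1 = {}, {}
    for x1 in (0, 1):
        for a1 in options:
            p = p_x2[(x1, a1)]
            q1[(x1, a1)] = p * v2[(x1, 1, a1)] + (1 - p) * v2[(x1, 0, a1)]
        d1[x1] = max(options, key=lambda a1: q1[(x1, a1)])
    return d1, d2, q1


d1_opt, d2_opt, q1 = backward_induction(P_X2, mean_y)
print(d1_opt)                                       # {0: 1, 1: 0}
print(round(q1[(0, 1)], 2), round(q1[(1, 0)], 2))   # 2.2 2.4
```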

Estimating $d^{opt}$

Observed data: For $i = 1, \ldots, n$, i.i.d.
$$(X_{1i}, A_{1i}, X_{2i}, A_{2i}, \ldots, X_{Ki}, A_{Ki}, Y_i) = (\bar{X}_{Ki}, \bar{A}_{Ki}, Y_i) = (\bar{X}_i, \bar{A}_i, Y_i)$$
• $X_1$ = baseline information at Decision 1, taking values in $\mathcal{X}_1$
• $A_k$ = treatment option actually received at Decision $k$, $k = 1, \ldots, K$, taking values in $\mathcal{A}_k$
• $X_k$ = intervening information between Decisions $k-1$ and $k$, $k = 2, \ldots, K$, taking values in $\mathcal{X}_k$
• History $H_1 = X_1$, $H_k = (X_1, A_1, \ldots, X_{k-1}, A_{k-1}, X_k) = (\bar{X}_k, \bar{A}_{k-1})$, $k = 2, \ldots, K$
• $Y$ = observed outcome (after Decision $K$ or a function of $H_K$)
• Data sources discussed shortly

Goal: Estimate $d^{opt}$ based on these data
• Must be able to express $d^{opt}$ in terms of the observed data

Identifiability assumptions

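SUTVA (consistency):
$$Y = Y^*(\bar{A}_K) = \sum_{\bar{a}_K \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_K} Y^*(\bar{a}_K) \, I(\bar{A}_K = \bar{a}_K)$$
$$X_k = X^*_k(\bar{A}_{k-1}) = \sum_{\bar{a}_{k-1} \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_{k-1}} X^*_k(\bar{a}_{k-1}) \, I(\bar{A}_{k-1} = \bar{a}_{k-1}), \quad k = 2, \ldots, K$$

Sequential randomization assumption (SRA): Robins (1986)
$$W^* \perp\!\!\!\perp A_k \mid H_k, \quad k = 1, \ldots, K$$

Positivity: $P(A_k = a_k \mid H_k = h_k) > 0$, $a_k \in \mathcal{A}_k$, for all $h_k \in \mathcal{H}_k$ with $P(H_k = h_k) > 0$, $k = 1, \ldots, K$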

Identifiability assumptions

Under these assumptions: It is possible to identify the distribution of
$$\{X_1, X^*_2(d_1), X^*_3(\bar{d}_2), \ldots, X^*_K(\bar{d}_{K-1}), Y^*(d)\},$$
which depends on that of $(X_1, W^*)$, from the distribution of the observed data
$$(X_1, A_1, X_2, A_2, \ldots, X_K, A_K, Y)$$
• Robins (1986) g-computation algorithm

Characterization in terms of observed data

For $K = 2$ (generalizes to arbitrary $K$):

Decision 2: With $Q_2(h_2, a_2) = E(Y \mid H_2 = h_2, A_2 = a_2)$
$$d^{opt}_2(h_2) = \arg\max_{a_2 \in \mathcal{A}_2} Q_2(h_2, a_2), \qquad V_2(h_2) = \max_{a_2 \in \mathcal{A}_2} Q_2(h_2, a_2)$$

Decision 1: Define
$$Q_1(h_1, a_1) = Q_1(x_1, a_1) = E\{V_2(\bar{X}_2, a_1) \mid X_1 = x_1, A_1 = a_1\}$$
$$d^{opt}_1(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1)$$
• $Q_k(h_k, a_k)$ are the Q-functions

Suggests: Positing and fitting models for the Q-functions

Q-learning

Estimation of $d^{opt}$:
• Decision 2: Posit and fit a model $Q_2(h_2, a_2; \beta_2)$ by regressing $Y$ on $H_2, A_2$ (e.g., least squares) and estimate
$$\hat{d}^{opt}_{Q,2}(h_2) = d^{opt}_{Q,2}(h_2; \hat\beta_2) = I\{Q_2(h_2, 1; \hat\beta_2) > Q_2(h_2, 0; \hat\beta_2)\}$$
• For each $i$, form the “pseudo outcome”
$$\tilde{V}_{2i} = V_2(H_{2i}; \hat\beta_2) = \max\{Q_2(H_{2i}, 0; \hat\beta_2), Q_2(H_{2i}, 1; \hat\beta_2)\}$$
• Decision 1: Posit and fit a model $Q_1(h_1, a_1; \beta_1)$ by regressing $\tilde{V}_2$ on $H_1, A_1$ (e.g., least squares) and estimate
$$\hat{d}^{opt}_{Q,1}(h_1) = d^{opt}_{Q,1}(h_1; \hat\beta_1) = I\{Q_1(h_1, 1; \hat\beta_1) > Q_1(h_1, 0; \hat\beta_1)\}$$
• Estimated regime $\hat{d}^{opt}_Q = (\hat{d}^{opt}_{Q,1}, \hat{d}^{opt}_{Q,2})$
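A minimal two-decision Q-learning sketch in Python. To stay dependency-free, both Q-functions use saturated (cell-mean) working models, which is exactly what least squares on a full set of indicator variables gives when the history is discrete. The data come from a hypothetical SMART-like simulation with binary $X_1$ and $X_2$, both options randomized with probability 1/2 at each decision, $P(X_2 = 1 \mid x_1, a_1)$ taken from a small table, and $E(Y \mid \cdot) = 2x_2 + I(a_2 = x_2)$; under that assumed model the optimal rules are $d^{opt}_1(x_1) = 1 - x_1$ and $d^{opt}_2 = x_2$.

```python
import random
from collections import defaultdict

# Hypothetical generative model (illustrative numbers only).
P_X2 = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.4}  # P(X2=1 | x1, a1)


def simulate_smart(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        a1 = rng.randint(0, 1)                    # randomized at Decision 1
        x2 = int(rng.random() < P_X2[(x1, a1)])
        a2 = rng.randint(0, 1)                    # randomized at Decision 2
        y = 2 * x2 + (a2 == x2) + rng.gauss(0, 1)
        rows.append((x1, a1, x2, a2, y))
    return rows


def cell_means(keys, values):
    """Least squares for a saturated model = the mean within each cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def q_learning(rows, options=(0, 1)):
    # Decision 2: regress Y on H2 = (x1, a1, x2) and A2.
    q2 = cell_means([(x1, a1, x2, a2) for x1, a1, x2, a2, _ in rows],
                    [y for *_, y in rows])

    def d2(x1, a1, x2):
        return max(options, key=lambda a2: q2[(x1, a1, x2, a2)])

    # Pseudo outcome: V2_i = max over a2 of the fitted Q2(H2i, a2).
    v2 = [max(q2[(x1, a1, x2, a2)] for a2 in options)
          for x1, a1, x2, _, _ in rows]
    # Decision 1: regress the pseudo outcome on H1 = x1 and A1.
    q1 = cell_means([(x1, a1) for x1, a1, *_ in rows], v2)

    def d1(x1):
        return max(options, key=lambda a1: q1[(x1, a1)])

    return d1, d2, q1


d1, d2, q1 = q_learning(simulate_smart(20_000))
print(d1(0), d1(1))  # 1 0
```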

Restricted class of regimes

Q-learning: The Q-function models for $k = K-1, \ldots, 1$ are almost certainly misspecified

Alternative approach: Deliberately restrict to a class $\mathcal{D}_\eta \subset \mathcal{D}$ comprising regimes
$$d_\eta = \{d_1(h_1; \eta_1), \ldots, d_K(h_K; \eta_K)\}, \qquad \eta = (\eta_1^T, \ldots, \eta_K^T)^T$$
• Chosen for interpretability, cost, etc.
• Optimal restricted regime: $d^{opt}_\eta \in \mathcal{D}_\eta$ satisfies
$$d^{opt}_\eta = \{d_1(h_1; \eta^{opt}_1), \ldots, d_K(h_K; \eta^{opt}_K)\}, \qquad \eta^{opt} = (\eta^{opt\,T}_1, \ldots, \eta^{opt\,T}_K)^T = \arg\max_\eta V(d_\eta)$$

Restricted class of regimes

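Policy search estimation: Estimate $V(d_\eta)$ by $\hat{V}(d_\eta)$ for fixed $\eta$
• Maximize $\hat{V}(d_\eta)$ in $\eta$ to obtain
$$\hat\eta^{opt} = (\hat\eta^{opt\,T}_1, \ldots, \hat\eta^{opt\,T}_K)^T = \arg\max_\eta \hat{V}(d_\eta), \qquad \hat{d}^{opt}_\eta = \{d_1(h_1; \hat\eta^{opt}_1), \ldots, d_K(h_K; \hat\eta^{opt}_K)\}$$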

Estimation of an optimal regime

IPW estimator: Extension of the single decision case
$$\hat{V}_{IPW}(d_\eta) = n^{-1} \sum_{i=1}^{n} \frac{\prod_{k=1}^{K} I\{A_{ki} = d_k(H_{ki}; \eta_k)\} \, Y_i}{\prod_{k=1}^{K} \pi_k(H_{ki}, A_{ki}; \hat\gamma_k)}$$
• $\pi_k(h_k, a_k) = P(A_k = a_k \mid H_k = h_k)$; model as $\pi_k(h_k, a_k; \gamma_k)$
• An AIPW estimator is also possible; it is doubly robust and can be considerably more efficient
• Valid under SUTVA, SRA, and positivity
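In a SMART with known randomization probabilities, the IPW estimator needs no propensity modeling. The Python sketch below assumes a hypothetical two-decision SMART with binary options, each assigned with probability 1/2 regardless of history, binary $X_1$ and $X_2$, and $E(Y \mid \cdot) = 2x_2 + I(a_2 = x_2)$; it estimates the value of a dynamic regime and of a static one. Under this assumed model their true values are 2.3 and 1.5. With history-dependent randomization, `pi1` and `pi2` would become functions of $H_1$ and $H_2$.

```python
import random

P_X2 = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.4}  # hypothetical


def simulate_smart(n, seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        a1 = rng.randint(0, 1)                   # pi_1 = 1/2 by design
        x2 = int(rng.random() < P_X2[(x1, a1)])
        a2 = rng.randint(0, 1)                   # pi_2 = 1/2 by design
        y = 2 * x2 + (a2 == x2) + rng.gauss(0, 1)
        rows.append((x1, a1, x2, a2, y))
    return rows


def v_ipw(rows, d1, d2, pi1=0.5, pi2=0.5):
    """n^{-1} sum I{A1 = d1(H1)} I{A2 = d2(H2)} Y / (pi1 * pi2)."""
    total = 0.0
    for x1, a1, x2, a2, y in rows:
        if a1 == d1(x1) and a2 == d2(x1, a1, x2):
            total += y / (pi1 * pi2)
    return total / len(rows)


rows = simulate_smart(100_000)
dynamic = (lambda x1: 1 - x1, lambda x1, a1, x2: x2)
static = (lambda x1: 0, lambda x1, a1, x2: 0)
print(round(v_ipw(rows, *dynamic), 2))  # should be close to 2.3
print(round(v_ipw(rows, *static), 2))   # should be close to 1.5
```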

Estimation of an optimal regime

High-dimensional $\eta$: Direct maximization in $\eta$ is infeasible

Backward iterative implementation: Basic idea for $K = 2$
• Decision 2: History $H_2 = (\bar{X}_2, A_1)$ is already determined, so selection of the Decision 2 treatment is like a single decision problem with “baseline history” $H_2$ and single decision rule $d_2(h_2; \eta_2)$
• ⇒ Maximize a single decision estimator $\hat{V}(d_\eta)$ in $\eta_2$ to obtain $\hat\eta^{opt}_2$
• Decision 1: Maximize a two decision estimator $\hat{V}(d_{\eta,1}, d_{\eta,2})$ in $\eta_1$ with $d_{\eta,2}(h_2) = d_2(h_2; \eta_2)$ held fixed at $d_2(h_2; \hat\eta^{opt}_2)$
• Can be shown: Results in an estimator for $d^{opt}_\eta$
• Classification analogy at each stage
• Backward Outcome Weighted Learning (BOWL)

Treatment Regimes in Practice

Evidence-based decision support

Result: From these or other approaches
• An evidence-based regime based on formal statistical principles that can be used to inform selection of treatment at each decision point
• Insight on key characteristics (tailoring variables) that should be incorporated at each decision point
• Interpretability versus flexibility

Data sources

Data sources:
• Existing data from a longitudinal observational study or a previously conducted conventional clinical trial with follow-up (RWD)
• Prospectively collected data from a clinical trial conducted specifically for this purpose

Sequential Multiple Assignment Randomized Trial (SMART):
• Randomize at each decision point
• SRA, positivity automatically satisfied
• Collect rich baseline and intervening information to support estimation of an optimal regime

SMART for Acute Leukemia

[Figure: SMART schema for the acute leukemia example; randomization takes place at the points marked •]

Growing interest in SMARTs

SMARTs on which I am a collaborator: Optimizing
• Behavioral cancer pain management
• Colorectal screening
• HIV prevention, management
• Anti-epilepsy medication adherence

I-SPY 2+ platform trial in breast cancer

How to treat women with locally advanced breast cancer who do not respond to initial therapy?
• I-SPY 2: Adaptive phase II platform trial, collaborative effort of NCI, FDA, industry (FNIH Biomarkers Consortium)
• I-SPY 2+: SMART with re-randomization of nonresponders
• P01 CA210961, PI: Laura Esserman, UCSF

Remarks

“Modernizing statistics:”
• Questions of “what comes next?” and “when and for whom?” arise routinely
• Usually after the fact in a conventional clinical trial. . .
• These questions usually can be cast in terms of treatment regimes
• A formal statistical framework and methods to address them exist, as do methods for design of SMARTs
• We must promote thinking in terms of treatment regimes
• Prospectively rather than retrospectively

Pontification

“Modernizing statistics:”
• The research-practice gap is still huge
• Development of methods for estimating optimal treatment regimes continues
• But major conceptual questions remain unresolved
• E.g., how to characterize the contribution of a particular treatment option to a regime?
• E.g., what should be the regulatory path forward for a treatment that is a critical component of an overall regime?
• Challenge: Resolution of these and associated methodology

Shameless promotion

Coming in 2020:

Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine

Tsiatis, A. A., Davidian, M., Holloway, S. T., and Laber, E. B.
• Published by Chapman & Hall
• Dedicated website with software, code, worked examples

R package: DynTxRegime, available on CRAN and at https://www2.cscc.unc.edu/impact7/DynTxRegime

Course notes: Eric Laber and I taught the PhD course Introduction to Dynamic Treatment Regimes in Spring 2019 for the SAMSI Precision Medicine (PMED) program (www.samsi.info)

Acknowledgement

IMPACT – Innovative Methods Program for Advancing Clinical Trials

• A joint venture of Duke, UNC-Chapel Hill, NC State
• Supported by NCI Program Project P01 CA142538 (2010–2020)

http://impact.unc.edu

• Statistical methods for precision cancer medicine