Experimental Designs for different approaches of Simultaneous Equations

Víctor Casero-Alonso and Jesús López Fidalgo

Department of Mathematics, Institute of Mathematics Applied to Science and Engineering
University of Castilla-La Mancha, Spain

0. Abstract

- Models with simultaneous equations, widely used in economics, sociology, medicine and engineering, are considered.
- A model with two equations is considered.
- One explanatory (exogenous) variable of the first equation is the response (endogenous) variable of the second equation, which contains a controllable variable that is being designed.
- Plugging the second equation into the first one, the designable variable appears in both equations.
- This gives two different models, with different maximum likelihood estimators and therefore different information matrices and optimal designs.
- Optimal designs for both approaches are computed and compared, in both a discrete and a continuous design space.
- The cases of a completely known correlation and of an unknown correlation to be estimated are considered and compared.
- A sensitivity analysis is performed to assess the risk of choosing wrong nominal values of the parameters.

1. Motivating examples

- Conlisk (1979): An oil company wants to study a controlled variation in the prices of gas and repairs, Pg and Pr. (Other controlled variables: whether trading stamps are offered with gas and repair sales, respectively.) Two endogenous variables: quantity of gas and repairs sold. Two equations with all variables, exogenous and endogenous, included.

- Aigner and Balestra (1988): Designing electricity pricing experiments, which take place via an intervention in one or a succession of periods.

- Hahn, Hirano and Karlan (2011): Designing using the propensity score, i.e. the conditional probability of treatment given some observed characteristics of the individual (covariates).

- Surgery of lung carcinoma (2004): Exercise test to predict morbidity after lung resection: riding a static bicycle during a period of time (controlled variable, exogenous). An uncontrollable variable: % of maximum volume of expired air in the first second. Two endogenous variables: oxygen desaturation during the test (y1, e.g. linear regression) and a binary response, morbidity (y2, logistic model).

2. Different approaches of Simultaneous Equations

Poskitt and Skeels (2007): two formulations that are probabilistically equivalent but conceptually quite different. They lead to different MLEs, and hence to different information matrices and optimal designs.

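SES - Structural equation specification

\[
\begin{cases}
y = Y\beta + u, \\
Y = \Pi_{2a} + Z\,\Pi_{2b} + V,
\end{cases}
\]

- y, response variable,
- Y, explanatory/response variable,
- Z, controllable variable, which is being designed,
- β, Π2a and Π2b, unknown parameters,
- u and V, error terms, with

\[
\begin{pmatrix} u \\ V \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]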


RFS - Reduced form specification

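Plugging the second equation into the first one:

\[
\begin{cases}
y = \Pi_{2a}\beta + Z\,\Pi_{2b}\beta + \nu, \\
Y = \Pi_{2a} + Z\,\Pi_{2b} + V.
\end{cases}
\]

The designable variable Z is now in both equations. The error terms satisfy

\[
\begin{pmatrix} \nu \\ V \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]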


3. Optimal designs

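- Design space: Z = {0,1}.
- Information matrices, for ρ known (the upper-left 3×3 block, for β, Π2a and Π2b) and ρ unknown to be estimated (the whole matrix):

\[
M_{SES}[\xi(z)] = \frac{1}{1-\rho^2}
\begin{pmatrix}
1 + \Pi_{2a}^2 + 2p\,\Pi_{2a}\Pi_{2b} + p\,\Pi_{2b}^2 & -(\Pi_{2a} + p\,\Pi_{2b})\rho & -p(\Pi_{2a} + \Pi_{2b})\rho & 1 \\
-(\Pi_{2a} + p\,\Pi_{2b})\rho & 1 & p & 0 \\
-p(\Pi_{2a} + \Pi_{2b})\rho & p & p & 0 \\
1 & 0 & 0 & \frac{1+\rho^2}{1-\rho^2}
\end{pmatrix}
\]

\[
M_{RFS}[\xi(z)] = \frac{1}{1-\rho^2}
\begin{pmatrix}
\Pi_{2a}^2 + 2p\,\Pi_{2a}\Pi_{2b} + p\,\Pi_{2b}^2 & (\Pi_{2a} + p\,\Pi_{2b})(\beta - \rho) & p(\Pi_{2a} + \Pi_{2b})(\beta - \rho) & 0 \\
(\Pi_{2a} + p\,\Pi_{2b})(\beta - \rho) & 1 + \beta^2 - 2\beta\rho & p(1 + \beta^2 - 2\beta\rho) & 0 \\
p(\Pi_{2a} + \Pi_{2b})(\beta - \rho) & p(1 + \beta^2 - 2\beta\rho) & p(1 + \beta^2 - 2\beta\rho) & 0 \\
0 & 0 & 0 & \frac{1+\rho^2}{1-\rho^2}
\end{pmatrix}
\]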


- Optimal design:

\[
\xi^*(z) = \begin{Bmatrix} 0 & 1 \\ 1-p^* & p^* \end{Bmatrix}, \qquad p^* = \arg\min_{p \in [0,1]} \Phi\{M[\xi(z)]\},
\]

where Φ is the D–optimality or c–optimality criterion.
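The D–optimal weight can be found numerically from the matrices above. The following is a minimal sketch, not the authors' implementation: it assumes that p is the design weight at z = 1, that the parameter order is (β, Π2a, Π2b, ρ), and that D–optimality is handled by maximizing the determinant. The function names and the nominal value β = 1 used for the RFS model are hypothetical choices made only for illustration.

```python
import math

def _scale(m, rho, rho_known):
    """Apply the 1/(1-rho^2) prefactor; keep the 3x3 block if rho is known."""
    k = 3 if rho_known else 4
    return [[m[i][j] / (1 - rho**2) for j in range(k)] for i in range(k)]

def ses_info(p, pi2a, pi2b, rho, rho_known=True):
    """SES information matrix for a design with weight p at z = 1."""
    s = pi2a + pi2b
    m = [
        [1 + pi2a**2 + 2*p*pi2a*pi2b + p*pi2b**2, -(pi2a + p*pi2b)*rho, -p*s*rho, 1.0],
        [-(pi2a + p*pi2b)*rho, 1.0, p, 0.0],
        [-p*s*rho, p, p, 0.0],
        [1.0, 0.0, 0.0, (1 + rho**2) / (1 - rho**2)],
    ]
    return _scale(m, rho, rho_known)

def rfs_info(p, pi2a, pi2b, rho, beta, rho_known=True):
    """RFS information matrix for a design with weight p at z = 1."""
    s = pi2a + pi2b
    g = 1 + beta**2 - 2*beta*rho
    d = beta - rho
    m = [
        [pi2a**2 + 2*p*pi2a*pi2b + p*pi2b**2, (pi2a + p*pi2b)*d, p*s*d, 0.0],
        [(pi2a + p*pi2b)*d, g, p*g, 0.0],
        [p*s*d, p*g, p*g, 0.0],
        [0.0, 0.0, 0.0, (1 + rho**2) / (1 - rho**2)],
    ]
    return _scale(m, rho, rho_known)

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[piv][i] == 0.0:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def argmax_golden(f, lo=0.0, hi=1.0, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g*(hi - lo), lo + g*(hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:   # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g*(hi - lo)
            f2 = f(x2)
        else:         # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g*(hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2

def d_optimal_weight(info, *theta, **kw):
    """Weight p* at z = 1 maximizing det M for the given model."""
    return argmax_golden(lambda p: det(info(p, *theta, **kw)))

# Nominal values (Pi_2a, Pi_2b, rho) = (4, 1, 0.8); beta = 1 is hypothetical.
p_ses_known = d_optimal_weight(ses_info, 4.0, 1.0, 0.8, rho_known=True)
p_ses_unknown = d_optimal_weight(ses_info, 4.0, 1.0, 0.8, rho_known=False)
p_rfs_known = d_optimal_weight(rfs_info, 4.0, 1.0, 0.8, 1.0, rho_known=True)
p_rfs_unknown = d_optimal_weight(rfs_info, 4.0, 1.0, 0.8, 1.0, rho_known=False)
print(p_ses_known, p_ses_unknown, p_rfs_known, p_rfs_unknown)
```

Golden-section search is used here only because the determinant is unimodal in p on [0,1]; any one-dimensional optimizer would serve.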

4. Optimal weights

Similar D– and c–optimal weights p∗ for given nominal values. Example: (Π2a, Π2b, ρ) = (4, 1, 0.8):

            D–opt           cβ–opt      cΠ2a–opt       cΠ2b–opt           cρ–opt
            SES     RFS     SES  RFS    SES  RFS       SES      RFS       SES  RFS
ρ known     0.547   0.553   1    1      0    0.528     0.50092  0.5089    -    -

ρ unknown: 0.548, 0.50097, 1, 1 (the column positions of these four entries are not recoverable from the source layout).

D–, cΠ2a– (only for RFS) and cΠ2b–optimal designs for Z = {0,1} are also optimal for Z = [0,1], by the General Equivalence Theorem (GET).
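The c–optimal weights for the SES model with ρ known can be checked in the same spirit. This is a sketch under stated assumptions, not the authors' code: it takes c to be the unit vector of the parameter of interest, so that c^T M^{-1} c is a diagonal entry of M^{-1}, and it uses the SES information matrix of Section 3 with p the weight at z = 1. All function names are hypothetical.

```python
import math

def ses_info(p, pi2a, pi2b, rho):
    """SES information matrix (rho known), parameter order (beta, Pi_2a, Pi_2b)."""
    s = pi2a + pi2b
    m = [
        [1 + pi2a**2 + 2*p*pi2a*pi2b + p*pi2b**2, -(pi2a + p*pi2b)*rho, -p*s*rho],
        [-(pi2a + p*pi2b)*rho, 1.0, p],
        [-p*s*rho, p, p],
    ]
    return [[v / (1 - rho**2) for v in row] for row in m]

def det2(a, b, c, d):
    return a*d - b*c

def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * det2(m[1][1], m[1][2], m[2][1], m[2][2])
            - m[0][1] * det2(m[1][0], m[1][2], m[2][0], m[2][2])
            + m[0][2] * det2(m[1][0], m[1][1], m[2][0], m[2][1]))

def variance(p, j, theta):
    """c^T M^{-1} c for c = e_j, computed as (principal minor) / det M."""
    m = ses_info(p, *theta)
    r = [i for i in range(3) if i != j]
    minor = det2(m[r[0]][r[0]], m[r[0]][r[1]], m[r[1]][r[0]], m[r[1]][r[1]])
    return minor / det3(m)

def argmin_golden(f, lo=0.0, hi=1.0, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g*(hi - lo), lo + g*(hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 > f2:   # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g*(hi - lo)
            f2 = f(x2)
        else:         # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g*(hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2

theta = (4.0, 1.0, 0.8)  # nominal (Pi_2a, Pi_2b, rho) from the example
p_beta = argmin_golden(lambda p: variance(p, 0, theta))   # c_beta-optimal
p_pi2a = argmin_golden(lambda p: variance(p, 1, theta))   # c_Pi2a-optimal
p_pi2b = argmin_golden(lambda p: variance(p, 2, theta))   # c_Pi2b-optimal
print(p_beta, p_pi2a, p_pi2b)
```

The search only evaluates interior points of (0,1), so the boundary solutions for cβ and cΠ2a are approached to within the tolerance rather than attained exactly.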

Bounds for p∗ of D–optimal designs:

Figure: Values of p∗ for (Π2a, Π2b) with ρ = 0.8 for the D–optimal design in: a) SES model and b) RFS model.

Theorem (p∗ bounds)

For SES and RFS models (with ρ known or unknown), the weight of the D–optimal design in Z = {0,1} is bounded for all values of ρ, Π2a, Π2b and β:

p∗ ∈ (1/3,2/3).
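A short sketch of why such a bound is plausible in the ρ-known case. It comes from expanding the determinant of the 3×3 blocks of the information matrices in Section 3 and is not necessarily the authors' proof; the symbols f, c_0 and c_1 are introduced here only for illustration.

```latex
% Determinant of the 3x3 block (rho known), up to a positive factor
% that does not depend on p:
\[
|M[\xi]| \;\propto\; f(p) = p(1-p)\,\bigl[(1-p)\,c_0 + p\,c_1\bigr],
\]
% with, for the two models,
\[
\text{SES: } c_0 = 1 + \Pi_{2a}^2(1-\rho^2),\quad
             c_1 = 1 + (\Pi_{2a}+\Pi_{2b})^2(1-\rho^2);
\qquad
\text{RFS: } c_0 = \Pi_{2a}^2,\quad c_1 = (\Pi_{2a}+\Pi_{2b})^2 .
\]
% Derivative, written so that the sign is visible:
\[
f'(p) = c_0\,(1-p)(1-3p) + c_1\,p\,(2-3p).
\]
% If c_0, c_1 > 0, then f'(p) > 0 on (0, 1/3] and f'(p) < 0 on [2/3, 1),
% so the maximizer lies in (1/3, 2/3). Solving f'(p) = 0 gives
\[
p^* = \frac{c_0}{\,2c_0 - c_1 + \sqrt{c_0^2 - c_0 c_1 + c_1^2}\,}.
\]
% Check with (Pi_2a, Pi_2b, rho) = (4, 1, 0.8):
%   SES: c_0 = 6.76, c_1 = 10  =>  p^* = 6.76 / (3.52 + 8.837) \approx 0.547
%   RFS: c_0 = 16,   c_1 = 25  =>  p^* = 16 / (7 + 21.932)     \approx 0.553
```

Both values agree with the D–optimal weights reported for ρ known in the table of this section.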

5. Robustness of D–optimal designs

Lower bound for D–efficiency:

\[
D\text{-eff}_{\theta^*}(p_0) = \frac{|M_{\theta^*}(p^*)|^{-1}}{|M_{\theta^*}(p_0)|^{-1}} = \dots
\]




Figure: Values of D–effθ∗(p0) against p∗ for the SES model and different values of (Π2a, Π2b, ρ), with: a) ρ known and b) ρ unknown.

Theorem (D–efficiency lower bound)

The minimum D–efficiencies for SES and RFS models for all values of ρ, Π2a, Π2b and β are:
- for ρ known: min D-effθ∗(p0) = 1/2^{1/3},
- for ρ unknown: min D-effθ∗(p0) = 1/2^{1/4}.
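The two constants can be recovered with a short worst-case calculation. This is an illustrative sketch, not the authors' proof. It assumes the usual definition of D–efficiency with the exponent 1/k, where k is the number of estimated parameters (k = 3 for ρ known, k = 4 for ρ unknown), and it assumes that the determinant has the form p(1-p)[(1-p)c_0 + p c_1] with c_0, c_1 ≥ 0, as obtained by expanding the matrices of Section 3.

```latex
% Assumed definition (k = number of estimated parameters):
\[
D\text{-eff}_{\theta^*}(p_0) =
\left( \frac{|M_{\theta^*}(p_0)|}{|M_{\theta^*}(p^*)|} \right)^{1/k}.
\]
% Extreme case c_1 / c_0 -> 0: the determinant behaves like p(1-p)^2,
% whose maximizer is p^* = 1/3. The worst admissible nominal weight is
% p_0 -> 2/3, which gives
\[
\frac{|M(p_0)|}{|M(p^*)|} \;\to\;
\frac{\tfrac{2}{3}\bigl(\tfrac{1}{3}\bigr)^2}{\tfrac{1}{3}\bigl(\tfrac{2}{3}\bigr)^2}
= \frac{2/27}{4/27} = \frac{1}{2}.
\]
% Taking the k-th root:
\[
k = 3:\ \Bigl(\tfrac12\Bigr)^{1/3} = \frac{1}{2^{1/3}} \approx 0.794,
\qquad
k = 4:\ \Bigl(\tfrac12\Bigr)^{1/4} = \frac{1}{2^{1/4}} \approx 0.841 .
\]
```

The symmetric extreme case (c_0 / c_1 → 0, p∗ = 2/3, p0 → 1/3) gives the same ratio 1/2.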

Particular cases: If nominal values are θ0 but true values are θ∗...

Figure: D–eff (with ρ known) for a neighborhood of θ0 (axes: Π2a and Π2b): a) θ0 = (4,1,0.8) (p0 = 0.547), b) θ0 = (4,−3,0.8) (p0 = 0.368).

- D–eff at the point θ∗ = θ0 is 1.
- Three more points have D–eff = 1 (black points in the figures).
- D–eff is high for true values (Π∗2a, Π∗2b):
  a) both greater than (Π2a0, Π2b0),
  b) both less than (Π2a0, Π2b0).
- D–eff decays for true values in the direction:
  a) Π2a = −Π2b,
  b) Π2a = 0.
- The minimum D–eff in the neighborhood is:
  a) 91.46%,
  b) 83.60%.
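A sensitivity analysis of this kind can be sketched for the SES model with ρ known. The code below is an illustration under stated assumptions, not the authors' computation: it uses the factorization det M ∝ p(1-p)[(1-p)c0 + p c1] with c0 = 1 + Π2a²(1-ρ²) and c1 = 1 + (Π2a+Π2b)²(1-ρ²), obtained by expanding the 3×3 block of the SES matrix, it takes D–efficiency with the exponent 1/k (k = 3), and it scans a hypothetical grid Π2a, Π2b ∈ [-10, 10] with ρ fixed at 0.8. The grid is not necessarily the neighborhood used in the figures, so the minimum it finds need not match the reported percentages.

```python
import math

def c_coeffs(pi2a, pi2b, rho):
    """Coefficients c0, c1 of det M ~ p(1-p)[(1-p)c0 + p*c1] (SES, rho known)."""
    c0 = 1 + pi2a**2 * (1 - rho**2)
    c1 = 1 + (pi2a + pi2b)**2 * (1 - rho**2)
    return c0, c1

def det_factor(p, c0, c1):
    """Part of the determinant that depends on the design weight p."""
    return p * (1 - p) * ((1 - p) * c0 + p * c1)

def d_opt_weight(c0, c1):
    """Closed-form maximizer of det_factor on [0, 1]."""
    return c0 / (2*c0 - c1 + math.sqrt(c0*c0 - c0*c1 + c1*c1))

def d_efficiency(p0, theta_true, k=3):
    """D-efficiency of the weight p0 when the true parameters are theta_true."""
    c0, c1 = c_coeffs(*theta_true)
    p_star = d_opt_weight(c0, c1)
    return (det_factor(p0, c0, c1) / det_factor(p_star, c0, c1)) ** (1 / k)

# Nominal values from case a): theta0 = (Pi_2a, Pi_2b, rho) = (4, 1, 0.8).
theta0 = (4.0, 1.0, 0.8)
p0 = d_opt_weight(*c_coeffs(*theta0))

# Hypothetical grid of true values for (Pi_2a, Pi_2b); rho kept at 0.8.
grid = [x / 2 for x in range(-20, 21)]
effs = {(a, b): d_efficiency(p0, (a, b, 0.8)) for a in grid for b in grid}
worst = min(effs, key=effs.get)
print(p0, worst, effs[worst])
```

On this grid the least efficient point lies on the line Π2a = −Π2b, which is the direction of decay described for case a), and every efficiency stays above the lower bound 1/2^{1/3}.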


- In both models the optimal designs depend on the nominal values.
- For Z = {0,1}:
  - Similar SES optimal designs for ρ known or unknown.
  - The same RFS optimal designs for ρ known or unknown.
  - Similar optimal designs for SES and RFS, except for cΠ2a–optimality.

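Ex: (Π2a, Π2b) = (4, 1) (ρ = 0.8 known):

\[
\xi^*_{SES,\Pi_{2a}} = \begin{Bmatrix} 0 & 1 \\ 1 & 0 \end{Bmatrix}
\quad\text{and}\quad
\xi^*_{RFS,\Pi_{2a}} = \begin{Bmatrix} 0 & 1 \\ 1-0.528 & 0.528 \end{Bmatrix},
\]

but the relative efficiencies are quite good:

\[
\mathrm{eff}_{SES}(\xi^*_{RFS}) = \frac{c^T M^{-1}(\theta, \xi^*_{SES})\,c}{c^T M^{-1}(\theta, \xi^*_{RFS})\,c} = 91.1\%,
\qquad
\mathrm{eff}_{RFS}(\xi^*_{SES}) = \frac{c^T M^{-1}(\theta, \xi^*_{RFS})\,c}{c^T M^{-1}(\theta, \xi^*_{SES})\,c} = 95.2\%.
\]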

- Bounds for p∗ of the D–optimal design: p∗ ∈ (1/3, 2/3).

- Lower bound for D–efficiency: min D-effθ∗(p0) = 1/2^{1/3} (ρ known) or 1/2^{1/4} (ρ unknown).

