Common random numbers (CRN)
Simulation is often used to compare similar systems, e.g., for the purpose of optimization.
Suppose we want to estimate µ2 − µ1 by ∆ = X2 − X1, where µ1 = E[X1] and µ2 = E[X2]. We have

Var[∆] = Var[X1] + Var[X2] − 2 Cov[X1, X2].

If Xk has a (fixed) cdf Fk for k = 1, 2, then taking Xk = Fk^{-1}(U) for a single common r.v. U ~ U(0, 1) maximizes the covariance (Fréchet 1951).
For typical simulations, Fk^{-1}(U) is much too complicated to compute.
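To make the Fréchet bound concrete, here is a small stand-alone sketch (ours, not from the slides; class and method names are invented) that estimates Cov[X1, X2] by Monte Carlo when X1 and X2 are exponentials with means 1 and 2, generated by inversion first from one common uniform and then from independent uniforms. With a common U we get X2 = 2 X1 exactly, so the covariance attains the Fréchet upper bound 2 Var[X1] = 2.

```java
import java.util.Random;

public class FrechetDemo {
    // Inversion for an exponential with the given mean.
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // Monte Carlo estimate of Cov[X1, X2] from n paired observations.
    public static double cov(double[] x1, double[] x2) {
        int n = x1.length;
        double m1 = 0, m2 = 0, c = 0;
        for (int i = 0; i < n; i++) { m1 += x1[i]; m2 += x2[i]; }
        m1 /= n; m2 /= n;
        for (int i = 0; i < n; i++) c += (x1[i] - m1) * (x2[i] - m2);
        return c / (n - 1);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rng = new Random(12345);
        double[] a1 = new double[n], a2 = new double[n]; // common U (CRN)
        double[] b1 = new double[n], b2 = new double[n]; // independent U's (IRN)
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();
            a1[i] = expInv(1.0, u); a2[i] = expInv(2.0, u);  // same uniform
            b1[i] = expInv(1.0, rng.nextDouble());
            b2[i] = expInv(2.0, rng.nextDouble());           // independent uniforms
        }
        System.out.printf("Cov with common U:      %.3f%n", cov(a1, a2));
        System.out.printf("Cov with independent U: %.3f%n", cov(b1, b2));
    }
}
```

With the common uniform the estimated covariance comes out close to the bound 2; with independent uniforms it is close to 0.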
Common random numbers (CRNs)
What we can do is simulate the two systems with exactly the same streams of uniform random numbers. Important: make sure that the common random numbers (CRNs) are used for the same purpose in both systems (synchronization), and generate all r.v.'s by inversion.
Proposition. If X1 and X2 are monotone functions of each uniform, in the same direction, then Cov[X1, X2] > 0.
In general (non-monotone functions), we can still have Cov[X1, X2] > 0.
With independent random numbers (IRN), Cov[X1, X2] = 0.
Multiple comparisons: all of this applies if we want to compare several similar systems.
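The synchronization advice above can be mimicked without SSJ: dedicate one stream (here a seeded java.util.Random, standing in for an SSJ RandomStream) to each purpose, and reset both streams to the same seeds before simulating each configuration. The toy model below (a single queue via a Lindley-type waiting-time recursion; all names are our own) compares two service-time means with fully synchronized CRNs:

```java
import java.util.Random;

public class SyncCRN {
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // Average wait of m customers; meanServ is the parameter we vary.
    // One dedicated stream per purpose: arrivals and services.
    public static double avgWait(double meanServ, long seedArr, long seedServ, int m) {
        Random arr = new Random(seedArr);   // stream reserved for interarrival times
        Random serv = new Random(seedServ); // stream reserved for service times
        double w = 0, sum = 0;
        for (int i = 0; i < m; i++) {
            double a = expInv(1.0, arr.nextDouble());        // interarrival time
            double s = expInv(meanServ, serv.nextDouble());  // service time
            w = Math.max(0.0, w + s - a);                    // Lindley-type recursion
            sum += w;
        }
        return sum / m;
    }

    public static void main(String[] args) {
        int n = 2000, m = 100;
        double sum = 0, sum2 = 0;
        Random seeder = new Random(7);
        for (int i = 0; i < n; i++) {
            long s1 = seeder.nextLong(), s2 = seeder.nextLong();
            // Same seeds for both configurations: CRN, fully synchronized.
            double d = avgWait(0.9, s1, s2, m) - avgWait(0.8, s1, s2, m);
            sum += d; sum2 += d * d;
        }
        double mean = sum / n, var = (sum2 - n * mean * mean) / (n - 1);
        System.out.printf("mean diff = %.4f, variance = %.5f%n", mean, var);
    }
}
```

Because service times are generated by inversion, every realized service time (and hence every wait) is monotone in meanServ for fixed uniforms, so with CRNs the difference is nonnegative on every replication.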
Example: The stochastic activity network
[Figure: stochastic activity network. Source node 0, sink node 8, intermediate nodes 1-7; activity durations Y0, ..., Y12 on the arcs.]
Suppose we increase µ2 from 7.0 to 10.0, and µ4 from 16.5 to 18.5. What is the impact on the project duration T?
X1 = project duration T under the original distributions. X2 = project duration T under the modified distributions. We want to study the distribution of ∆ = X2 − X1 and estimate E[∆].
Suppose that Yj = Fj^{-1}(Uj) for X1 and Ỹj = Fj^{-1}(Ũj) for X2.
IRN: the Ũj are independent of the Uj. CRN: Ũj = Uj for each j.
We make n = 100,000 runs for each estimator.
With IRN, the realizations of ∆ = X2 − X1 range from −223.22 to 280.92; mean = 1.326, variance = 967.9, 95% CI on E[∆]: (1.133, 1.519).
With CRN, ∆ ranges from 0 to 49.88; mean = 1.528, variance = 9.1, 95% CI on E[∆]: (1.510, 1.547).
CRNs reduce the variance by a factor of (approximately) 106.
With CRN, 67,880 realizations of ∆ are 0 (because the modified Yj are not on the longest path) and the other 32,120 realizations of ∆ are > 0. Explanation: when we increase its mean µj, Yj = −µj ln(1 − Uj) (an exponential r.v.) cannot decrease for fixed Uj, because −ln(1 − Uj) > 0. Hence T cannot decrease.
With IRN, ∆ can take both positive and negative values.
[Figure: frequency histograms of the n realizations of ∆. Under IRN, ∆ spreads over roughly (−150, 150); under CRN, ∆ is concentrated on (0, 20).]
Example: a telephone call center
Open 13 hours a day.
nj = number of agents available during hour j.
Arrivals: Poisson at rate Bλj per hour during hour j, where B = busyness factor for the day; B ~ Gamma(10, 10); E[B] = 1, Var[B] = 0.1.
Expected number of arrivals: a = E[A] = E[B] Σ_{j=0}^{12} λj.
Service times: i.i.d. exponential with mean θ = 100 seconds. FIFO queue.
Patience time: 0 with prob. p = 0.1, exponential with mean 1000 with prob. 1 − p. If wait > patience: abandonment.
Let G = number of calls answered within 20 seconds on a given day.
Performance measure of interest: µ = long-run fraction of calls answered within 20 seconds.
Unbiased estimator of µ: X = G/a.
j    0   1   2   3   4   5   6   7   8   9  10  11  12
nj   4   6   8   8   8   7   8   8   6   6   4   4   4
λj 100 150 150 180 200 150 150 150 120 100  80  70  60
(Arrival rates are per hour and service times are in seconds.)
Let X1 = value of G with this configuration, and X2 = value of G with one more agent in periods 5 and 6.
We want to estimate µ2 − µ1 = E[X2 − X1] = E[∆].
Here, Var[∆] is about 225 times smaller with CRNs than with IRNs.
In an optimization algorithm, we may have to compare thousands of configurations (different staffings, routings of calls, etc.), and the efficiency gain can make a huge difference.
public class CallCenterCRN extends CallCenter {
   Tally statQoS1 = new Tally ("stats on QoS for config.1");
   Tally statDiffIndep = new Tally ("stats on difference with IRNs");
   Tally statDiffCRN = new Tally ("stats on difference with CRNs");
   int[] numAgents1, numAgents2;

   public CallCenterCRN (String fileName) throws IOException {
      super (fileName);
      numAgents1 = new int[numPeriods];
      numAgents2 = new int[numPeriods];
      for (int j = 0; j < numPeriods; j++)
         numAgents1[j] = numAgents2[j] = numAgents[j];
   }

   // Set the number of agents in each period j to the values in num.
   public void setNumAgents (int[] num) {
      for (int j = 0; j < numPeriods; j++) numAgents[j] = num[j];
   }
}
public void simulateDiffCRN (int n) {
   double value1, value2;
   statQoS1.init();  statDiffIndep.init();  statDiffCRN.init();
   for (int i = 0; i < n; i++) {
      setNumAgents (numAgents1);
      streamB.resetNextSubstream();
      streamArr.resetNextSubstream();
      streamPatience.resetNextSubstream();
      (genServ.getStream()).resetNextSubstream();
      simulateOneDay();   // Simulate config 1.
      value1 = (double) nGoodQoS / nCallsExpected;

      setNumAgents (numAgents2);
      streamB.resetStartSubstream();
      streamArr.resetStartSubstream();
      streamPatience.resetStartSubstream();
      (genServ.getStream()).resetStartSubstream();
      simulateOneDay();   // Simulate config 2 with CRN.
      value2 = (double) nGoodQoS / nCallsExpected;
      statQoS1.add (value1);
      statDiffCRN.add (value2 - value1);     // Stat. for CRN.

      simulateOneDay();   // Simulate config 2 independently.
      value2 = (double) nGoodQoS / nCallsExpected;
      statDiffIndep.add (value2 - value1);   // Stat. for IRN.
   }
}
public static void main (String[] args) throws IOException {
   int n = 1000;   // Number of replications.
   CallCenterCRN cc = new CallCenterCRN ("CallCenter.dat");
   cc.numAgents2[5]++;  cc.numAgents2[6]++;   // Config 2.
   cc.simulateDiffCRN (n);
   System.out.println (
       cc.statQoS1.reportAndCIStudent (0.9) +
       cc.statDiffIndep.reportAndCIStudent (0.9) +
       cc.statDiffCRN.reportAndCIStudent (0.9));
   double varianceIndep = cc.statDiffIndep.variance();
   double varianceCRN = cc.statDiffCRN.variance();
   // Print the variance reduction factor.
   System.out.println ("Variance ratio: " +
       PrintfFormat.format (10, 2, 3, varianceIndep / varianceCRN));
}
REPORT on Tally stat. collector ==> stats on QoS for config.1
min max average standard dev. num. obs
0.293 1.131 0.861 0.163 1000
90.0% confidence interval for mean: ( 0.853, 0.870 )
REPORT on Tally stat. collector ==> stats on difference with IRNs
min max average standard dev. num. obs
-0.763 0.775 9.9E-3 0.234 1000
90.0% confidence interval for mean: ( -0.002, 0.022 )
REPORT on Tally stat. collector ==> stats on difference with CRNs
min max average standard dev. num. obs
-0.013 0.101 9.9E-3 0.016 1000
90.0% confidence interval for mean: ( 0.009, 0.011 )
Variance ratio: 223.69
Derivative estimation for call center
Service times are exponential with mean θ = 100 seconds.
We would like to estimate the derivative of µ = E[G] w.r.t. θ.
For that, we simulate the system at θ = θ1 = 100 to get X1, then at θ = θ2 = 100 + δ to get X2, and estimate the derivative by D(θ, δ) = (X2 − X1)/δ.
We can simulate X1 and X2 either with CRNs or with IRNs.
We replicate this n times, independently, and compute the empirical mean and variance.
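The call-center model is too heavy to reproduce here, so the sketch below uses a simple stand-in (our own choice, not the slides' model): X(θ, U) is the average of m exponential lifetimes with mean θ, so µ(θ) = θ and the true derivative is 1. It replicates the finite difference D(θ, δ) n times with CRNs and with IRNs and reports the empirical mean and variance of each:

```java
import java.util.Random;

public class FiniteDiff {
    // X(θ, U): average of m exponentials with mean theta, by inversion.
    public static double sampleMean(double theta, long seed, int m) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < m; i++)
            sum += -theta * Math.log(1.0 - rng.nextDouble());
        return sum / m;
    }

    public static void main(String[] args) {
        double theta = 100.0, delta = 0.1;
        int n = 100_000, m = 10;
        Random seeder = new Random(1);
        double sC = 0, s2C = 0, sI = 0, s2I = 0;
        for (int i = 0; i < n; i++) {
            long s1 = seeder.nextLong(), s2 = seeder.nextLong();
            // CRN: the same seed (same uniforms) at theta and theta + delta.
            double dC = (sampleMean(theta + delta, s1, m) - sampleMean(theta, s1, m)) / delta;
            // IRN: a fresh seed for the second configuration.
            double dI = (sampleMean(theta + delta, s2, m) - sampleMean(theta, s1, m)) / delta;
            sC += dC; s2C += dC * dC;
            sI += dI; s2I += dI * dI;
        }
        double mC = sC / n, vC = (s2C - n * mC * mC) / (n - 1);
        double mI = sI / n, vI = (s2I - n * mI * mI) / (n - 1);
        System.out.printf("CRN: mean = %.3f, var = %.3f%n", mC, vC);
        System.out.printf("IRN: mean = %.3f, var = %.1f%n", mI, vI);
    }
}
```

With CRNs the variance stays near 1/m for any δ; with IRNs it grows like 2θ²/(m δ²) as δ shrinks, previewing the proposition that follows.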
How to implement CRNs?
Four types of random variates in this model, all generated by inversion:
(a) the busyness factor B for the day;
(b) the times between successive arrivals of calls;
(c) the call durations;
(d) the patience times.
Synchronization problem: when service times change, waiting times and abandonment decisions can change. For a given call, we may need to generate a patience time in one case and not in the other (if the call does not wait), or a service time in one case and not in the other (if the call abandons).
Possible strategies:
(a) generate a service time for all calls, or
(b) only for those who do not abandon.
Similarly, we can
(c) generate a patience time for all calls, or
(d) only for those who wait.
Experimental results, with n = 10^4. Sn² = empirical variance of D(θ, δ).

Method           δ = 10              δ = 1               δ = 0.1
                 D̄n(θ,δ)  δ²Sn²     D̄n(θ,δ)  δ²Sn²     D̄n(θ,δ)  δ²Sn²
IRN (a + c)      5.52     56913     4.98     45164      6.6     44046
IRN (a + d)      5.22     54696     7.22     45192    −18.2     45022
IRN (b + c)      5.03     56919     9.98     44241     15.0     45383
IRN (b + d)      5.37     55222     5.82     44659     13.6     44493
CRN, no sync.    5.60      3187     5.90      1204      1.9       726
CRN (a + c)      5.64      2154     6.29        37      6.2       1.8
CRN (a + d)      5.59      2161     6.08       158      7.4      53.8
CRN (b + c)      5.58      2333     6.25       104      6.3       7.9
CRN (b + d)      5.55      2323     6.44       143      5.9      35.3
Derivative estimation: theory
Suppose µ = µ(θ) = E[X(θ, U)] depends on a continuous parameter θ and we want to estimate

µ′(θ1) = ∂µ(θ)/∂θ |_{θ=θ1} = ∂E_θ[X(θ)]/∂θ |_{θ=θ1} = lim_{δ→0} (E_{θ1+δ}[X(θ1+δ)] − E_{θ1}[X(θ1)]) / δ.

For a vector of parameters: a gradient (vector).
Why estimate derivatives?
(a) To evaluate the relative importance of different model parameters (sensitivity analysis).
(b) To compute a confidence interval that accounts for estimation error in model parameters.
(c) To evaluate the effect of a change (sensitivity) in a decision parameter.
(d) A gradient estimator is often required in optimization algorithms.
(e) In finance, the derivatives of a contract price (the "Greeks") w.r.t. certain parameters are required to implement hedging strategies.
We want to estimate

µ′(θ1) = lim_{δ→0} (E_{θ1+δ}[X(θ1+δ)] − E_{θ1}[X(θ1)]) / δ.

Finite-difference estimator:

∆/δ = D(θ1, δ) = (X(θ1+δ, U2) − X(θ1, U1)) / δ = (X2 − X1)/δ,

for some δ > 0, where U1 and U2 are sequences of uniform r.v.'s.
This estimator is biased, but the bias β → 0 when δ → 0. Moreover,

Var[∆/δ] = Var[X2 − X1]/δ² = (Var[X1] + Var[X2] − 2 Cov[X1, X2]) / δ².

Requires at least d + 1 simulations to estimate a d-dimensional gradient.
Proposition.
(i) If U1 and U2 are independent, then

lim_{δ→0} δ² Var[D(θ, δ)] = 2 Var[X(θ)].

That is, Var[D(θ, δ)] blows up at rate 1/δ².
(ii) Suppose U1 = U2 = U (CRNs), X(θ, U) is continuous in θ and differentiable almost everywhere, and D(θ, δ) is uniformly integrable (uniformly in θ). Then Var[D(θ, δ)] remains bounded when δ → 0.
(iii) Suppose U1 = U2 = U and X(θ, U) is discontinuous in θ, but the probability that X(·, U) is discontinuous in (θ, θ + δ) converges to 0 as O(δ^β) when δ → 0, and X^{2+ε}(θ) is uniformly integrable for some ε > 0. Then Var[D(θ, δ)] = O(1 + δ^{β−2−ε}), for any ε > 0, when δ → 0.
Can improve efficiency by an arbitrarily large factor when δ → 0. For example, if δ = 10^{-4} (and we assume the hidden constants are 1), then Var[D(θ, δ)] is 200 million times larger with (i) than with (ii). So (i) needs 200 million times more runs for the same accuracy.
When (ii) holds, we may take the stochastic derivative

X′(θ, U) = ∂X(θ, U)/∂θ = lim_{δ→0} D(θ, δ)

as an estimator of µ′(θ), if it is not too hard to compute. This is infinitesimal perturbation analysis (IPA).
This estimator is unbiased iff

E[X′(θ, U)] := E[∂X(θ, U)/∂θ] =? ∂E[X(θ, U)]/∂θ =: µ′(θ).   (1)

Sufficient condition: the Lebesgue dominated convergence theorem. If there are a δ1 > 0 and a random variable Y such that

sup_{δ∈(0,δ1]} |X(θ1 + δ, U) − X(θ1, U)| / δ ≤ Y

and E[Y] < ∞, then the interchange in (1) is valid.
We may change the definition of X(θ) to make it continuous and benefit from case (ii), e.g., by replacing some r.v.'s by conditional expectations (conditional Monte Carlo).
For example, if X(θ) counts the customer abandonments, we may replace each indicator of abandonment (0 or 1) by the probability of abandonment given the waiting time.
Case (iii) shows that CRNs may provide substantial benefits even if X(θ) is discontinuous. In the call center example, we can prove that (iii) holds with β = 1.
Example: stochastic activity network, E[T].

[Figure: the stochastic activity network again: source node 0, sink node 8, activity durations Y0, ..., Y12 on the arcs.]
Yj = F_{j,θj}^{-1}(Uj).
Some Yj are exponential with mean θj = µj: Yj = Yj(θj) = −θj ln(1 − Uj).
Some Yj are normal with mean θj = µj and standard deviation θj/4: Yj = Yj(θj) = θj + (θj/4) Zj = θj + (θj/4) Φ^{-1}(Uj).
We want to estimate the derivative of E[T] w.r.t. each θj. We consider a single θj at a time. We write T = X(θ, U) where U = (U1, ..., U13) and ∂T/∂θj = X′j(θ, U).
We see that X′j(θ, U) = Y′j(θj) if arc j is on the longest path, and X′j(θ, U) = 0 otherwise.
If Yj is exponential, then Yj = Yj(θj) = −θj ln(1 − Uj), Y′j(θj) = −ln(1 − Uj), and

0 ≤ (X(θ + δ ej, U) − X(θ, U)) / δ ≤ −δ ln(1 − Uj) / δ = −ln(1 − Uj) = Ej,

where Ej ~ Exponential(1). The dominated convergence theorem applies: X′j(θ, U) is unbiased and has finite variance.
If Yj is normal, then Yj = Yj(θj) = θj + (θj/4) Φ^{-1}(Uj) and Y′j(θj) = 1 + Φ^{-1}(Uj)/4. Also unbiased.
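A two-arc miniature of this computation (our own toy network, not the 13-arc example): T = max(Y1 + Y2, Y3) with exponential Yj of means θj, generated by inversion. The IPA estimator of ∂E[T]/∂θ1 is Y1′(θ1) = −ln(1 − U1) when the path through arc 1 is the longest, and 0 otherwise:

```java
import java.util.Random;

public class SanIPA {
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // IPA estimate of dT/dθ1 for one realization (u1, u2, u3) of the uniforms.
    public static double ipa(double t1, double t2, double t3,
                             double u1, double u2, double u3) {
        double path1 = expInv(t1, u1) + expInv(t2, u2); // path through arcs 1, 2
        double path3 = expInv(t3, u3);                  // path through arc 3
        // Arc 1 contributes Y1'(θ1) = -ln(1-U1) only when its path is longest.
        return (path1 >= path3) ? -Math.log(1.0 - u1) : 0.0;
    }

    public static void main(String[] args) {
        double t1 = 7, t2 = 5, t3 = 14;
        Random rng = new Random(99);
        int n = 1_000_000;
        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += ipa(t1, t2, t3, rng.nextDouble(), rng.nextDouble(), rng.nextDouble());
        System.out.printf("IPA estimate of dE[T]/dtheta1: %.4f%n", sum / n);
    }
}
```

A single run per replication suffices, with no finite-difference parameter δ to choose.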
Example: stochastic activity network, P[T > x].
Now we want to estimate the derivative of P[T > x] w.r.t. θj. The standard estimator of P[T > x] is X(θ, U) = I[T > x], which is always 0 or 1.
Therefore the derivative X′j(θ, U) is always 0 or undefined (the latter occurs with probability 0). Thus, P[X′j(θ, U) = 0] = 1. This is a biased estimator of µ′(θj) = ∂P[T > x]/∂θj.
Dominated convergence does not apply here, because sup_{δ>0} [X(θ + δ ej, U) − X(θ, U)]/δ is not integrable: for every δ, this ratio equals 1/δ with positive probability w.r.t. U.
This problem can be solved (in this example) by replacing the estimator I[T > x] by the conditional expectation P[T > x | Y0, Y1, Y2, Y3, Y6, Y7, Y10, Y12], which is continuous in θ.
[Figure: a network with source and sink, nodes numbered 1 to 9, and arc lengths V1, ..., V13.]
CMC estimator: Xe = P[T > x | {Vj, j ∉ L}], computed as follows.
For each l ∈ L, say going from node al to node bl, we compute the length αl of the longest path from the source to al, then the length βl of the longest path from bl to the sink.
No path going through l is longer than x iff αl + Vl + βl ≤ x.
Conditionally on {Vj, j ∉ L}, this holds with probability P[Vl ≤ x − αl − βl] = Fl[x − αl − βl]. Since the Vl are independent, we obtain

Xe = 1 − P[Vl ≤ x − αl − βl for all l ∈ L] = 1 − ∏_{l∈L} Fl[x − αl − βl].
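A minimal instance of this CMC estimator (a hypothetical network of ours): T = Y0 + max(V1, V2), where V1 and V2 are the conditioned-out exponential arcs (the set L), so αl = Y0 and βl = 0 for both, and Xe = 1 − F1[x − Y0] F2[x − Y0]. The sketch compares Xe with the crude indicator I[T > x]; both estimate P[T > x], but Xe is continuous in the parameters:

```java
import java.util.Random;

public class CmcDemo {
    // Cdf of an exponential with the given mean (0 for nonpositive arguments).
    public static double expCdf(double mean, double y) {
        return (y <= 0.0) ? 0.0 : 1.0 - Math.exp(-y / mean);
    }

    // CMC estimator Xe = 1 - prod_l F_l[x - alpha_l - beta_l] for one draw of Y0.
    public static double xe(double x, double y0, double theta) {
        double f = expCdf(theta, x - y0);
        return 1.0 - f * f; // product over the two conditioned-out arcs
    }

    public static void main(String[] args) {
        double x = 10.0, theta = 3.0;
        Random rng = new Random(5);
        int n = 1_000_000;
        double cmc = 0, crude = 0;
        for (int i = 0; i < n; i++) {
            double y0 = -2.0 * Math.log(1.0 - rng.nextDouble()); // Y0 ~ Exp(mean 2)
            cmc += xe(x, y0, theta);                             // CMC estimator
            double t = y0 + Math.max(-theta * Math.log(1.0 - rng.nextDouble()),
                                     -theta * Math.log(1.0 - rng.nextDouble()));
            crude += (t > x) ? 1.0 : 0.0;                        // indicator estimator
        }
        System.out.printf("CMC: %.4f  indicator: %.4f%n", cmc / n, crude / n);
    }
}
```

The two averages agree up to simulation noise, but Xe takes values in [0, 1] rather than {0, 1}, which also reduces its variance.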
Discrete-event models
The stochastic derivative can be complicated to compute, because an infinitesimal change in θ may have complicated impacts on the sequence of events.
Propagation technique: infinitesimal perturbation analysis (IPA).
Sample average optimization
Suppose we have an optimization problem of the form

   min  E[H(y, U)]
   subject to  E[Gk(y, U)] ≥ bk for all k,
               y ∈ S (some set).

Simulate n copies of the functions H and Gk, with CRNs across y, and take averages. Sample average problem (deterministic in y):

   min  H̄n(y)
   subject to  Ḡk,n(y) ≥ bk for all k,
               y ∈ S.

This can be solved by a deterministic optimization method, but for each solution y, the objective and constraints are evaluated by simulation. Convergence: well-developed theory (CLTs, large deviations, etc.).
Well-synchronized CRNs are essential.
Example: staffing and scheduling in a multiskill call center.
More generally, the objective and constraints may contain functions of several expectations, quantiles, etc.
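A compact sample-average example (our own stand-in for the call-center problem, with no constraints): a newsvendor chooses an order quantity y to minimize the average of H(y, D) = c·y − p·min(y, D) over n simulated demands. The CRN point is that the same n demands are reused for every candidate y, so the sample-average objective is a deterministic function of y:

```java
import java.util.Random;

public class SampleAverageOpt {
    // Minimizes the sample average of H(y, D) = c*y - p*min(y, D) over y = 0..yMax.
    // The same demand sample is reused for every candidate y (CRN across y).
    public static int bestOrder(double[] demand, double c, double p, int yMax) {
        int best = 0;
        double bestVal = Double.POSITIVE_INFINITY;
        for (int y = 0; y <= yMax; y++) {
            double sum = 0;
            for (double d : demand) sum += c * y - p * Math.min(y, d);
            double avg = sum / demand.length;
            if (avg < bestVal) { bestVal = avg; best = y; }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 100_000;
        Random rng = new Random(42);
        double[] demand = new double[n];
        for (int i = 0; i < n; i++)
            demand[i] = -10.0 * Math.log(1.0 - rng.nextDouble()); // Exp(mean 10) demand
        System.out.println("best y = " + bestOrder(demand, 1.0, 3.0, 40));
    }
}
```

For exponential demand with mean 10, c = 1, p = 3, the minimizer of the true expected cost is 10 ln(p/c) ≈ 11, and the sample-average solution comes out close to that.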