Common random numbers (CRN)
Simulation is often used to compare similar systems, e.g., for the purpose of optimization.
Suppose we want to estimate µ2 − µ1 by ∆ = X2 − X1, where µ1 = E[X1] and µ2 = E[X2]. We have

Var[∆] = Var[X1] + Var[X2] − 2 Cov[X1, X2].

If Xk has a (fixed) cdf Fk for k = 1, 2, then taking Xk = Fk^{-1}(U) for a single common r.v. U ~ U(0, 1) maximizes the covariance (Fréchet 1951).
For typical simulations, Fk^{-1}(U) is much too complicated to compute.
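To make the Fréchet bound concrete, here is a small stand-alone sketch (ours, not from the slides; class and method names are invented) that estimates Cov[X1, X2] by Monte Carlo when X1 and X2 are exponentials with means 1 and 2, generated by inversion first from one common uniform and then from independent uniforms. With a common U we get X2 = 2 X1 exactly, so the covariance attains the Fréchet upper bound 2 Var[X1] = 2.

```java
import java.util.Random;

public class FrechetDemo {
    // Inversion for an exponential with the given mean.
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // Monte Carlo estimate of Cov[X1, X2] from n paired observations.
    public static double cov(double[] x1, double[] x2) {
        int n = x1.length;
        double m1 = 0, m2 = 0, c = 0;
        for (int i = 0; i < n; i++) { m1 += x1[i]; m2 += x2[i]; }
        m1 /= n; m2 /= n;
        for (int i = 0; i < n; i++) c += (x1[i] - m1) * (x2[i] - m2);
        return c / (n - 1);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rng = new Random(12345);
        double[] a1 = new double[n], a2 = new double[n]; // common U (CRN)
        double[] b1 = new double[n], b2 = new double[n]; // independent U's (IRN)
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();
            a1[i] = expInv(1.0, u); a2[i] = expInv(2.0, u);  // same uniform
            b1[i] = expInv(1.0, rng.nextDouble());
            b2[i] = expInv(2.0, rng.nextDouble());           // independent uniforms
        }
        System.out.printf("Cov with common U:      %.3f%n", cov(a1, a2));
        System.out.printf("Cov with independent U: %.3f%n", cov(b1, b2));
    }
}
```

With the common uniform the estimated covariance comes out close to the bound 2; with independent uniforms it is close to 0.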
Common random numbers (CRNs)
What we can do is simulate the two systems with exactly the same streams of uniform random numbers. Important: make sure that the common random numbers (CRNs) are used for the same purpose in both systems (synchronization), and generate all r.v.'s by inversion.
Proposition. If X1 and X2 are monotone functions of each uniform, in the same direction, then Cov[X1, X2] > 0.
In general (non-monotone functions), we can still have Cov[X1, X2] > 0.
With independent random numbers (IRN), Cov[X1, X2] = 0.
Multiple comparisons: all of this applies if we want to compare several similar systems.
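The synchronization advice above can be mimicked without SSJ: dedicate one stream (here a seeded java.util.Random, standing in for an SSJ RandomStream) to each purpose, and reset both streams to the same seeds before simulating each configuration. The toy model below (a single queue via a Lindley-type waiting-time recursion; all names are our own) compares two service-time means with fully synchronized CRNs:

```java
import java.util.Random;

public class SyncCRN {
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // Average wait of m customers; meanServ is the parameter we vary.
    // One dedicated stream per purpose: arrivals and services.
    public static double avgWait(double meanServ, long seedArr, long seedServ, int m) {
        Random arr = new Random(seedArr);   // stream reserved for interarrival times
        Random serv = new Random(seedServ); // stream reserved for service times
        double w = 0, sum = 0;
        for (int i = 0; i < m; i++) {
            double a = expInv(1.0, arr.nextDouble());        // interarrival time
            double s = expInv(meanServ, serv.nextDouble());  // service time
            w = Math.max(0.0, w + s - a);                    // Lindley-type recursion
            sum += w;
        }
        return sum / m;
    }

    public static void main(String[] args) {
        int n = 2000, m = 100;
        double sum = 0, sum2 = 0;
        Random seeder = new Random(7);
        for (int i = 0; i < n; i++) {
            long s1 = seeder.nextLong(), s2 = seeder.nextLong();
            // Same seeds for both configurations: CRN, fully synchronized.
            double d = avgWait(0.9, s1, s2, m) - avgWait(0.8, s1, s2, m);
            sum += d; sum2 += d * d;
        }
        double mean = sum / n, var = (sum2 - n * mean * mean) / (n - 1);
        System.out.printf("mean diff = %.4f, variance = %.5f%n", mean, var);
    }
}
```

Because service times are generated by inversion, every realized service time (and hence every wait) is monotone in meanServ for fixed uniforms, so with CRNs the difference is nonnegative on every replication.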
Example: The stochastic activity network
[Figure: stochastic activity network. Source node 0, sink node 8, intermediate nodes 1-7; activity durations Y0, ..., Y12 on the arcs.]
Suppose we increase µ2 from 7.0 to 10.0, and µ4 from 16.5 to 18.5. What is the impact on the project duration T?
X1 = project duration T under the original distributions. X2 = project duration T under the modified distributions. We want to study the distribution of ∆ = X2 − X1 and estimate E[∆].
Suppose that Yj = Fj^{-1}(Uj) for X1 and Ỹj = Fj^{-1}(Ũj) for X2.
IRN: the Ũj are independent of the Uj. CRN: Ũj = Uj for each j.
We make n = 100,000 runs for each estimator.
With IRN, the realizations of ∆ = X2 − X1 range from −223.22 to 280.92; mean = 1.326, variance = 967.9, 95% CI on E[∆]: (1.133, 1.519).
With CRN, ∆ ranges from 0 to 49.88; mean = 1.528, variance = 9.1, 95% CI on E[∆]: (1.510, 1.547).
CRNs reduce the variance by a factor of (approximately) 106.
With CRN, 67,880 realizations of ∆ are 0 (because the modified Yj are not on the longest path) and the other 32,120 realizations of ∆ are > 0. Explanation: when we increase its mean µj, Yj = −µj ln(1 − Uj) (an exponential r.v.) cannot decrease for fixed Uj, because −ln(1 − Uj) > 0. Hence T cannot decrease.
With IRN, ∆ can take both positive and negative values.
[Figure: frequency histograms of the n realizations of ∆. Under IRN, ∆ spreads over roughly (−150, 150); under CRN, ∆ is concentrated on (0, 20).]
Example: a telephone call center
Open 13 hours a day.
nj = number of agents available during hour j.
Arrivals: Poisson at rate Bλj per hour during hour j, where B = busyness factor for the day; B ~ Gamma(10, 10); E[B] = 1, Var[B] = 0.1.
Expected number of arrivals: a = E[A] = E[B] Σ_{j=0}^{12} λj.
Service times: i.i.d. exponential with mean θ = 100 seconds. FIFO queue.
Patience time: 0 with prob. p = 0.1, exponential with mean 1000 with prob. 1 − p. If wait > patience: abandonment.
Let G = number of calls answered within 20 seconds on a given day.
Performance measure of interest: µ = long-run fraction of calls answered within 20 seconds.
Unbiased estimator of µ: X = G/a.
j    0   1   2   3   4   5   6   7   8   9  10  11  12
nj   4   6   8   8   8   7   8   8   6   6   4   4   4
λj 100 150 150 180 200 150 150 150 120 100  80  70  60
(Arrival rates are per hour and service times are in seconds.)
Let X1 = value of G with this configuration, and X2 = value of G with one more agent in periods 5 and 6.
We want to estimate µ2 − µ1 = E[X2 − X1] = E[∆].
Here, Var[∆] is about 225 times smaller with CRNs than with IRNs.
In an optimization algorithm, we may have to compare thousands of configurations (different staffings, routings of calls, etc.), and the efficiency gain can make a huge difference.
public class CallCenterCRN extends CallCenter {
   Tally statQoS1 = new Tally ("stats on QoS for config.1");
   Tally statDiffIndep = new Tally ("stats on difference with IRNs");
   Tally statDiffCRN = new Tally ("stats on difference with CRNs");
   int[] numAgents1, numAgents2;

   public CallCenterCRN (String fileName) throws IOException {
      super (fileName);
      numAgents1 = new int[numPeriods];
      numAgents2 = new int[numPeriods];
      for (int j = 0; j < numPeriods; j++)
         numAgents1[j] = numAgents2[j] = numAgents[j];
   }

   // Set the number of agents in each period j to the values in num.
   public void setNumAgents (int[] num) {
      for (int j = 0; j < numPeriods; j++) numAgents[j] = num[j];
   }
}
public void simulateDiffCRN (int n) {
   double value1, value2;
   statQoS1.init();  statDiffIndep.init();  statDiffCRN.init();
   for (int i = 0; i < n; i++) {
      setNumAgents (numAgents1);
      streamB.resetNextSubstream();
      streamArr.resetNextSubstream();
      streamPatience.resetNextSubstream();
      (genServ.getStream()).resetNextSubstream();
      simulateOneDay();   // Simulate config 1.
      value1 = (double) nGoodQoS / nCallsExpected;

      setNumAgents (numAgents2);
      streamB.resetStartSubstream();
      streamArr.resetStartSubstream();
      streamPatience.resetStartSubstream();
      (genServ.getStream()).resetStartSubstream();
      simulateOneDay();   // Simulate config 2 with CRN.
      value2 = (double) nGoodQoS / nCallsExpected;
      statQoS1.add (value1);
      statDiffCRN.add (value2 - value1);     // Stat. for CRN.

      simulateOneDay();   // Simulate config 2 independently.
      value2 = (double) nGoodQoS / nCallsExpected;
      statDiffIndep.add (value2 - value1);   // Stat. for IRN.
   }
}
public static void main (String[] args) throws IOException {
   int n = 1000;   // Number of replications.
   CallCenterCRN cc = new CallCenterCRN ("CallCenter.dat");
   cc.numAgents2[5]++;  cc.numAgents2[6]++;   // Config 2.
   cc.simulateDiffCRN (n);
   System.out.println (
       cc.statQoS1.reportAndCIStudent (0.9) +
       cc.statDiffIndep.reportAndCIStudent (0.9) +
       cc.statDiffCRN.reportAndCIStudent (0.9));
   double varianceIndep = cc.statDiffIndep.variance();
   double varianceCRN = cc.statDiffCRN.variance();
   // Print the variance reduction factor.
   System.out.println ("Variance ratio: " +
       PrintfFormat.format (10, 2, 3, varianceIndep / varianceCRN));
}
REPORT on Tally stat. collector ==> stats on QoS for config.1
min max average standard dev. num. obs
0.293 1.131 0.861 0.163 1000
90.0% confidence interval for mean: ( 0.853, 0.870 )
REPORT on Tally stat. collector ==> stats on difference with IRNs
min max average standard dev. num. obs
-0.763 0.775 9.9E-3 0.234 1000
90.0% confidence interval for mean: ( -0.002, 0.022 )
REPORT on Tally stat. collector ==> stats on difference with CRNs
min max average standard dev. num. obs
-0.013 0.101 9.9E-3 0.016 1000
90.0% confidence interval for mean: ( 0.009, 0.011 )
Variance ratio: 223.69
Derivative estimation for call center
Service times are exponential with mean θ = 100 seconds.
We would like to estimate the derivative of µ = E[G] w.r.t. θ.
For that, we simulate the system at θ = θ1 = 100 to get X1, then at θ = θ2 = 100 + δ to get X2, and estimate the derivative by D(θ, δ) = (X2 − X1)/δ.
We can simulate X1 and X2 either with CRNs or with IRNs.
We replicate this n times, independently, and compute the empirical mean and variance.
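The call-center model is too heavy to reproduce here, so the sketch below uses a simple stand-in (our own choice, not the slides' model): X(θ, U) is the average of m exponential lifetimes with mean θ, so µ(θ) = θ and the true derivative is 1. It replicates the finite difference D(θ, δ) n times with CRNs and with IRNs and reports the empirical mean and variance of each:

```java
import java.util.Random;

public class FiniteDiff {
    // X(θ, U): average of m exponentials with mean theta, by inversion.
    public static double sampleMean(double theta, long seed, int m) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < m; i++)
            sum += -theta * Math.log(1.0 - rng.nextDouble());
        return sum / m;
    }

    public static void main(String[] args) {
        double theta = 100.0, delta = 0.1;
        int n = 100_000, m = 10;
        Random seeder = new Random(1);
        double sC = 0, s2C = 0, sI = 0, s2I = 0;
        for (int i = 0; i < n; i++) {
            long s1 = seeder.nextLong(), s2 = seeder.nextLong();
            // CRN: the same seed (same uniforms) at theta and theta + delta.
            double dC = (sampleMean(theta + delta, s1, m) - sampleMean(theta, s1, m)) / delta;
            // IRN: a fresh seed for the second configuration.
            double dI = (sampleMean(theta + delta, s2, m) - sampleMean(theta, s1, m)) / delta;
            sC += dC; s2C += dC * dC;
            sI += dI; s2I += dI * dI;
        }
        double mC = sC / n, vC = (s2C - n * mC * mC) / (n - 1);
        double mI = sI / n, vI = (s2I - n * mI * mI) / (n - 1);
        System.out.printf("CRN: mean = %.3f, var = %.3f%n", mC, vC);
        System.out.printf("IRN: mean = %.3f, var = %.1f%n", mI, vI);
    }
}
```

With CRNs the variance stays near 1/m for any δ; with IRNs it grows like 2θ²/(m δ²) as δ shrinks, previewing the proposition that follows.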
How to implement CRNs?
Four types of random variates in this model, all generated by inversion:
(a) the busyness factor B for the day;
(b) the times between successive arrivals of calls;
(c) the call durations;
(d) the patience times.
Synchronization problem: when service times change, waiting times and abandonment decisions can change. For a given call, we may need to generate a patience time in one case and not in the other (if the call does not wait), or a service time in one case and not in the other (if the call abandons).
Possible strategies:
(a) generate a service time for all calls, or
(b) only for those who do not abandon.
Similarly, we can
(c) generate a patience time for all calls, or
(d) only for those who wait.
Experimental results, with n = 10^4. Sn² = empirical variance of D(θ, δ).

Method           δ = 10              δ = 1               δ = 0.1
                 D̄n(θ,δ)  δ²Sn²     D̄n(θ,δ)  δ²Sn²     D̄n(θ,δ)  δ²Sn²
IRN (a + c)      5.52     56913     4.98     45164      6.6     44046
IRN (a + d)      5.22     54696     7.22     45192    −18.2     45022
IRN (b + c)      5.03     56919     9.98     44241     15.0     45383
IRN (b + d)      5.37     55222     5.82     44659     13.6     44493
CRN, no sync.    5.60      3187     5.90      1204      1.9       726
CRN (a + c)      5.64      2154     6.29        37      6.2       1.8
CRN (a + d)      5.59      2161     6.08       158      7.4      53.8
CRN (b + c)      5.58      2333     6.25       104      6.3       7.9
CRN (b + d)      5.55      2323     6.44       143      5.9      35.3
Derivative estimation: theory
Suppose µ = µ(θ) = E[X(θ, U)] depends on a continuous parameter θ and we want to estimate

µ′(θ1) = ∂µ(θ)/∂θ |_{θ=θ1} = ∂E_θ[X(θ)]/∂θ |_{θ=θ1} = lim_{δ→0} (E_{θ1+δ}[X(θ1+δ)] − E_{θ1}[X(θ1)]) / δ.

For a vector of parameters: a gradient (vector).
Why estimate derivatives?
(a) To evaluate the relative importance of different model parameters (sensitivity analysis).
(b) To compute a confidence interval that accounts for estimation error in model parameters.
(c) To evaluate the effect of a change (sensitivity) in a decision parameter.
(d) A gradient estimator is often required in optimization algorithms.
(e) In finance, the derivatives of a contract price (the "Greeks") w.r.t. certain parameters are required to implement hedging strategies.
We want to estimate

µ′(θ1) = lim_{δ→0} (E_{θ1+δ}[X(θ1+δ)] − E_{θ1}[X(θ1)]) / δ.

Finite-difference estimator:

∆/δ = D(θ1, δ) = (X(θ1+δ, U2) − X(θ1, U1)) / δ = (X2 − X1)/δ,

for some δ > 0, where U1 and U2 are sequences of uniform r.v.'s.
This estimator is biased, but the bias β → 0 when δ → 0. Moreover,

Var[∆/δ] = Var[X2 − X1]/δ² = (Var[X1] + Var[X2] − 2 Cov[X1, X2]) / δ².

Requires at least d + 1 simulations to estimate a d-dimensional gradient.
Proposition.
(i) If U1 and U2 are independent, then

lim_{δ→0} δ² Var[D(θ, δ)] = 2 Var[X(θ)].

That is, Var[D(θ, δ)] blows up at rate 1/δ².
(ii) Suppose U1 = U2 = U (CRNs), X(θ, U) is continuous in θ and differentiable almost everywhere, and D(θ, δ) is uniformly integrable (uniformly in θ). Then Var[D(θ, δ)] remains bounded when δ → 0.
(iii) Suppose U1 = U2 = U and X(θ, U) is discontinuous in θ, but the probability that X(·, U) is discontinuous in (θ, θ + δ) converges to 0 as O(δ^β) when δ → 0, and X^{2+ε}(θ) is uniformly integrable for some ε > 0. Then Var[D(θ, δ)] = O(1 + δ^{β−2−ε}), for any ε > 0, when δ → 0.
Can improve efficiency by an arbitrarily large factor when δ → 0. For example, if δ = 10^{-4} (and we assume the hidden constants are 1), then Var[D(θ, δ)] is 200 million times larger with (i) than with (ii). So (i) needs 200 million times more runs for the same accuracy.
When (ii) holds, we may take the stochastic derivative

X′(θ, U) = ∂X(θ, U)/∂θ = lim_{δ→0} D(θ, δ)

as an estimator of µ′(θ), if it is not too hard to compute. This is infinitesimal perturbation analysis (IPA).
This estimator is unbiased iff

E[X′(θ, U)] := E[∂X(θ, U)/∂θ] =? ∂E[X(θ, U)]/∂θ =: µ′(θ).   (1)

Sufficient condition: the Lebesgue dominated convergence theorem. If there are a δ1 > 0 and a random variable Y such that

sup_{δ∈(0,δ1]} |X(θ1 + δ, U) − X(θ1, U)| / δ ≤ Y

and E[Y] < ∞, then the interchange in (1) is valid.
We may change the definition of X(θ) to make it continuous and benefit from case (ii), e.g., by replacing some r.v.'s by conditional expectations (conditional Monte Carlo).
For example, if X(θ) counts the customer abandonments, we may replace each indicator of abandonment (0 or 1) by the probability of abandonment given the waiting time.
Case (iii) shows that CRNs may provide substantial benefits even if X(θ) is discontinuous. In the call center example, we can prove that (iii) holds with β = 1.
Example: stochastic activity network, E[T].

[Figure: the stochastic activity network again: source node 0, sink node 8, activity durations Y0, ..., Y12 on the arcs.]
Yj = F_{j,θj}^{-1}(Uj).
Some Yj are exponential with mean θj = µj: Yj = Yj(θj) = −θj ln(1 − Uj).
Some Yj are normal with mean θj = µj and standard deviation θj/4: Yj = Yj(θj) = θj + (θj/4) Zj = θj + (θj/4) Φ^{-1}(Uj).
We want to estimate the derivative of E[T] w.r.t. each θj. We consider a single θj at a time. We write T = X(θ, U) where U = (U1, ..., U13) and ∂T/∂θj = X′j(θ, U).
We see that X′j(θ, U) = Y′j(θj) if arc j is on the longest path, and X′j(θ, U) = 0 otherwise.
If Yj is exponential, then Yj = Yj(θj) = −θj ln(1 − Uj), Y′j(θj) = −ln(1 − Uj), and

0 ≤ (X(θ + δ ej, U) − X(θ, U)) / δ ≤ −δ ln(1 − Uj) / δ = −ln(1 − Uj) = Ej,

where Ej ~ Exponential(1). The dominated convergence theorem applies: X′j(θ, U) is unbiased and has finite variance.
If Yj is normal, then Yj = Yj(θj) = θj + (θj/4) Φ^{-1}(Uj) and Y′j(θj) = 1 + Φ^{-1}(Uj)/4. Also unbiased.
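A two-arc miniature of this computation (our own toy network, not the 13-arc example): T = max(Y1 + Y2, Y3) with exponential Yj of means θj, generated by inversion. The IPA estimator of ∂E[T]/∂θ1 is Y1′(θ1) = −ln(1 − U1) when the path through arc 1 is the longest, and 0 otherwise:

```java
import java.util.Random;

public class SanIPA {
    public static double expInv(double mean, double u) { return -mean * Math.log(1.0 - u); }

    // IPA estimate of dT/dθ1 for one realization (u1, u2, u3) of the uniforms.
    public static double ipa(double t1, double t2, double t3,
                             double u1, double u2, double u3) {
        double path1 = expInv(t1, u1) + expInv(t2, u2); // path through arcs 1, 2
        double path3 = expInv(t3, u3);                  // path through arc 3
        // Arc 1 contributes Y1'(θ1) = -ln(1-U1) only when its path is longest.
        return (path1 >= path3) ? -Math.log(1.0 - u1) : 0.0;
    }

    public static void main(String[] args) {
        double t1 = 7, t2 = 5, t3 = 14;
        Random rng = new Random(99);
        int n = 1_000_000;
        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += ipa(t1, t2, t3, rng.nextDouble(), rng.nextDouble(), rng.nextDouble());
        System.out.printf("IPA estimate of dE[T]/dtheta1: %.4f%n", sum / n);
    }
}
```

A single run per replication suffices, with no finite-difference parameter δ to choose.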
Example: stochastic activity network, P[T > x].
Now we want to estimate the derivative of P[T > x] w.r.t. θj. The standard estimator of P[T > x] is X(θ, U) = I[T > x], which is always 0 or 1.
Therefore the derivative X′j(θ, U) is always 0 or undefined (the latter occurs with probability 0). Thus, P[X′j(θ, U) = 0] = 1. This is a biased estimator of µ′(θj) = ∂P[T > x]/∂θj.
Dominated convergence does not apply here, because sup_{δ>0} [X(θ + δ ej, U) − X(θ, U)]/δ is not integrable: for every δ, this ratio equals 1/δ with positive probability w.r.t. U.
This problem can be solved (in this example) by replacing the estimator I[T > x] by the conditional expectation P[T > x | Y0, Y1, Y2, Y3, Y6, Y7, Y10, Y12], which is continuous in θ.
[Figure: a network with source and sink, nodes numbered 1 to 9, and arc lengths V1, ..., V13.]
CMC estimator: Xe = P[T > x | {Vj, j ∉ L}], computed as follows.
For each l ∈ L, say going from node al to node bl, we compute the length αl of the longest path from the source to al, then the length βl of the longest path from bl to the sink.
No path going through l is longer than x iff αl + Vl + βl ≤ x.
Conditionally on {Vj, j ∉ L}, this holds with probability P[Vl ≤ x − αl − βl] = Fl[x − αl − βl]. Since the Vl are independent, we obtain

Xe = 1 − P[Vl ≤ x − αl − βl for all l ∈ L] = 1 − ∏_{l∈L} Fl[x − αl − βl].
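A minimal instance of this CMC estimator (a hypothetical network of ours): T = Y0 + max(V1, V2), where V1 and V2 are the conditioned-out exponential arcs (the set L), so αl = Y0 and βl = 0 for both, and Xe = 1 − F1[x − Y0] F2[x − Y0]. The sketch compares Xe with the crude indicator I[T > x]; both estimate P[T > x], but Xe is continuous in the parameters:

```java
import java.util.Random;

public class CmcDemo {
    // Cdf of an exponential with the given mean (0 for nonpositive arguments).
    public static double expCdf(double mean, double y) {
        return (y <= 0.0) ? 0.0 : 1.0 - Math.exp(-y / mean);
    }

    // CMC estimator Xe = 1 - prod_l F_l[x - alpha_l - beta_l] for one draw of Y0.
    public static double xe(double x, double y0, double theta) {
        double f = expCdf(theta, x - y0);
        return 1.0 - f * f; // product over the two conditioned-out arcs
    }

    public static void main(String[] args) {
        double x = 10.0, theta = 3.0;
        Random rng = new Random(5);
        int n = 1_000_000;
        double cmc = 0, crude = 0;
        for (int i = 0; i < n; i++) {
            double y0 = -2.0 * Math.log(1.0 - rng.nextDouble()); // Y0 ~ Exp(mean 2)
            cmc += xe(x, y0, theta);                             // CMC estimator
            double t = y0 + Math.max(-theta * Math.log(1.0 - rng.nextDouble()),
                                     -theta * Math.log(1.0 - rng.nextDouble()));
            crude += (t > x) ? 1.0 : 0.0;                        // indicator estimator
        }
        System.out.printf("CMC: %.4f  indicator: %.4f%n", cmc / n, crude / n);
    }
}
```

The two averages agree up to simulation noise, but Xe takes values in [0, 1] rather than {0, 1}, which also reduces its variance.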
Discrete-event models
The stochastic derivative can be complicated to compute, because an infinitesimal change in θ may have complicated impacts on the sequence of events.
Propagation technique: infinitesimal perturbation analysis (IPA).
Sample average optimization
Suppose we have an optimization problem of the form

   min  E[H(y, U)]
   subject to  E[Gk(y, U)] ≥ bk for all k,
               y ∈ S (some set).

Simulate n copies of the functions H and Gk, with CRNs across y, and take averages. Sample average problem (deterministic in y):

   min  H̄n(y)
   subject to  Ḡk,n(y) ≥ bk for all k,
               y ∈ S.

This can be solved by a deterministic optimization method, but for each solution y, the objective and constraints are evaluated by simulation. Convergence: well-developed theory (CLTs, large deviations, etc.).
Well-synchronized CRNs are essential.
Example: staffing and scheduling in a multiskill call center.
More generally, the objective and constraints may contain functions of several expectations, quantiles, etc.
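A compact sample-average example (our own stand-in for the call-center problem, with no constraints): a newsvendor chooses an order quantity y to minimize the average of H(y, D) = c·y − p·min(y, D) over n simulated demands. The CRN point is that the same n demands are reused for every candidate y, so the sample-average objective is a deterministic function of y:

```java
import java.util.Random;

public class SampleAverageOpt {
    // Minimizes the sample average of H(y, D) = c*y - p*min(y, D) over y = 0..yMax.
    // The same demand sample is reused for every candidate y (CRN across y).
    public static int bestOrder(double[] demand, double c, double p, int yMax) {
        int best = 0;
        double bestVal = Double.POSITIVE_INFINITY;
        for (int y = 0; y <= yMax; y++) {
            double sum = 0;
            for (double d : demand) sum += c * y - p * Math.min(y, d);
            double avg = sum / demand.length;
            if (avg < bestVal) { bestVal = avg; best = y; }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 100_000;
        Random rng = new Random(42);
        double[] demand = new double[n];
        for (int i = 0; i < n; i++)
            demand[i] = -10.0 * Math.log(1.0 - rng.nextDouble()); // Exp(mean 10) demand
        System.out.println("best y = " + bestOrder(demand, 1.0, 3.0, 40));
    }
}
```

For exponential demand with mean 10, c = 1, p = 3, the minimizer of the true expected cost is 10 ln(p/c) ≈ 11, and the sample-average solution comes out close to that.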