COSC 3330/6308 Solutions to First Problem Set Jehan-François Pâris September 2012.
Review Session
Jehan-François Pâris
Agenda
  Statistical Analysis of Outputs
  Operational Analysis
  Case Studies
  Linear Regression
How to use this presentation
Most problems have
  One slide stating the problem
  One slide explaining how to solve the problem
  One slide allowing you to check your answer
You will learn more by trying first to do the problems on your own than by reading their solutions
Do not forget to review the problems in the original notes as well
Statistical Analysis of Outputs
The big picture
The problems
  Constructing confidence intervals
  Handling autocorrelated data
The tools
  Central-Limit Theorem
  Wilson’s formula
  Batch means (and regeneration)
  RNG tricks
Confidence Intervals
Distinguish between
  CIs for means
    CSIM does it for you
  CIs for proportions
    We are on our own
Major issue is independence of data points
  CSIM uses batch means
Central Limit Theorem
If the n mutually independent random variables x1, x2, …, xn have the same distribution, and if their mean μ and their variance σ² exist, then for large n the random variable

$$\frac{\frac{1}{n}\sum_{i=1}^{n} x_i - \mu}{\sigma/\sqrt{n}}$$

is approximately distributed according to the standard normal distribution (zero mean and unit variance).
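CI for means (I)
For large values of n, the 100(1-α)% confidence interval for μ is given by

$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

with

$$F(z_{\alpha/2}) = 1 - \frac{\alpha}{2}$$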
CI for means (II)
F(z) is taken from a table of the normal distribution
  F(1.96) = 0.975, so z0.025 = 1.96 for a 95% CI
For smaller values of n, we have to use Student’s t random variable
  Wider CIs
We replace σ by the sample standard deviation s
Example
We have
  100 observations for the waiting time
  xbar = 4.25 minutes
  s² = 25
What is the 95% confidence interval for the mean waiting time?
Answer is
  4.25 ± 1.96 sqrt(25/100) = 4.25 ± 0.98
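The calculation above can be sketched in a few lines of Python. The function name `mean_confidence_interval` and its parameters are illustrative choices, not part of CSIM or of the original slides, and the sketch assumes n is large enough for the normal approximation to hold.

```python
import math

def mean_confidence_interval(xbar, s2, n, z=1.96):
    """Large-sample CI for a mean: xbar +/- z * sqrt(s2 / n).

    xbar: sample mean, s2: sample variance, n: number of observations,
    z: normal quantile z_{alpha/2} (1.96 for a 95% CI).
    """
    half_width = z * math.sqrt(s2 / n)
    return xbar - half_width, xbar + half_width

# Values from the waiting-time example: n = 100, xbar = 4.25, s2 = 25
low, high = mean_confidence_interval(4.25, 25.0, 100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [3.27, 5.23]
```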
CI for proportions
A proportion represents the probability P(X < a) for some fixed threshold a
  97% of our customers have to wait less than one minute
Distributed according to a binomial law
Use Wilson’s formula
Wilson’s formula
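When n > 29, we can use Wilson’s interval

$$\frac{\hat{q} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{q}(1-\hat{q})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}} \le P \le \frac{\hat{q} + \dfrac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\dfrac{\hat{q}(1-\hat{q})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$

where $\hat{q}$ is the observed proportion and $z_{\alpha/2}$ = 1.96 for a 95% C.I.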
Example
We want to estimate the proportion of packets that wait more than four slots
  400 observations
  40 packets waited more than four slots
Answer
With an observed proportion of 40/400 = 0.1, n = 400 and z = 1.96:
  Divisor: 1 + 1.96²/400 = 1.0096 ≈ 1.01
  Central term: 0.1 + 1.96²/(2×400) = 0.1048 ≈ 0.105
  Half width: 1.96 sqrt( (0.1×0.9)/400 + 1.96²/(4×400²) ) = 1.96 sqrt(0.000225 + 0.000006) ≈ 1.96 × 0.0152 ≈ 0.030
Result is (0.105 ± 0.030)/1.01 ≈ 0.104 ± 0.030
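As a cross-check of the packet example, here is a minimal Python sketch of Wilson’s interval. The helper name `wilson_interval` is hypothetical; the formula is the standard Wilson score interval, with the z factor multiplying the square root and a 4n² term inside it.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    q = successes / n                      # observed proportion
    divisor = 1 + z * z / n
    center = q + z * z / (2 * n)
    half_width = z * math.sqrt(q * (1 - q) / n + z * z / (4 * n * n))
    return (center - half_width) / divisor, (center + half_width) / divisor

# 40 packets out of 400 waited more than four slots
low, high = wilson_interval(40, 400)
print(f"95% CI: [{low:.3f}, {high:.3f}]")  # 95% CI: [0.074, 0.133]
```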
Batch means (I)
Simulation data are often autocorrelated
  Packet delays in ALOHA
  Waiting times in queues
  …
Batch means reduce (but do not completely eliminate) that effect
Batch means (II)
Group measurements into fixed-size batches of consecutive data
Compute mean of each batch
If batches are large enough, these means will be nearly independent
  Can use central-limit theorem, …
In case of doubt, compute autocorrelation function for successive batch means
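The batching step and the autocorrelation check can be sketched as follows. This is an illustration only, not how CSIM implements batch means; the function names are hypothetical, and only the lag-1 autocorrelation is computed rather than the full autocorrelation function.

```python
def batch_means(data, batch_size):
    """Means of consecutive fixed-size batches (a trailing partial batch is dropped)."""
    n_batches = len(data) // batch_size
    return [
        sum(data[i * batch_size:(i + 1) * batch_size]) / batch_size
        for i in range(n_batches)
    ]

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; values near 0 suggest weak dependence."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0  # constant series: no variation to correlate
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var

print(batch_means([1, 2, 3, 4, 5, 6, 7], 2))  # [1.5, 3.5, 5.5]
```

If the lag-1 autocorrelation of the batch means is still far from zero, a larger batch size is the usual remedy.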
Regeneration (I)
The idea
  Partition simulation data into intervals such that
    Data measured inside the same interval might be correlated
    Data measured in different intervals are independent
Regeneration (II)
How?
  System goes to a regeneration point each time
    Its queues become empty
    All the disk drives are operational
    …
  Criterion is system specific
Streams
When you want to evaluate two different configurations of a system, it is often a good idea to use separate random number streams for arrivals and service times
  Arrival times remain unchanged when we change other parameters of the system
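A minimal sketch of the idea, using two independent `random.Random` generators from the Python standard library. The function name, the seed values and the exponential distributions are assumptions made for illustration; the slides do not prescribe them.

```python
import random

def simulate_inputs(n_jobs, arrival_rate, service_rate, seed=2012):
    """Generate arrival times and service times from two separate streams."""
    arrival_stream = random.Random(seed)      # used only for interarrival times
    service_stream = random.Random(seed + 1)  # used only for service times
    clock = 0.0
    arrivals, services = [], []
    for _ in range(n_jobs):
        clock += arrival_stream.expovariate(arrival_rate)
        arrivals.append(clock)
        services.append(service_stream.expovariate(service_rate))
    return arrivals, services

# Two configurations that differ only in the service rate
arrivals_a, services_a = simulate_inputs(100, 1.0, 2.0)
arrivals_b, services_b = simulate_inputs(100, 1.0, 4.0)
print(arrivals_a == arrivals_b)  # True: both configurations see the same arrivals
```

Because both runs consume the arrival stream identically, any difference in the results comes from the configuration change rather than from a different arrival pattern.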
Operational Analysis
Single server (I)
We can measure
  T the length of the observation period
  A the number of arrivals during the observation period
  B the total amount of busy time during the observation period
  C the number of completions during the observation period
Single server (II)
We can compute
  λ = A/T the arrival rate
  X = C/T the output rate
  U = B/T the utilization
  S = B/C the mean service time
There are two ways to compute U
  U = B/T = (C/T)(B/C) = XS
In general A ≠ C and λ ≠ X
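The single-server quantities can be computed directly from the four measurements. The numbers below are made up for illustration and do not come from the slides.

```python
def single_server_metrics(T, A, B, C):
    """Operational quantities for one server observed during T time units."""
    return {
        "arrival_rate": A / T,   # lambda
        "throughput": C / T,     # X
        "utilization": B / T,    # U
        "service_time": B / C,   # S
    }

# Hypothetical measurements: 60 s observed, 125 arrivals, 45 s busy, 120 completions
m = single_server_metrics(T=60.0, A=125, B=45.0, C=120)
print(m["utilization"], m["throughput"] * m["service_time"])  # both equal U
```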
Little’s law
If W is the total time spent by all tasks inside the system over the observation period, then
  N = W/T is the mean number of tasks in the system
  R = W/C is the mean response time
Since W/T = (C/T)(W/C) = XR, N = XR
  This is important
A problem
An ice-cream parlor
  Observed during 6 hours
  Visited by 120 customers
  Customers spend an average of 24 minutes inside
What is the average number of customers inside the parlor?
Answer
We compute X and apply Little’s Law
  X = 120/6 = 20 customers/hour
  R = 24 minutes = 0.4 hours
  N = XR = 8 customers
If you did not get it
The 120 customers spent a total of 120×24 customer×minutes or 48 customer×hours in the parlor
  48 customer×hours/6 hours = 8 customers
Same as having 8 customers spending six hours each inside the parlor
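Both ways of reaching the answer can be checked with a short sketch; the variable names are illustrative.

```python
# Ice-cream parlor: 6 hours observed, 120 customers, 24 minutes each on average
T_hours = 6.0
completions = 120
R_hours = 24 / 60                  # mean time inside, in hours

X = completions / T_hours          # throughput in customers/hour
N_little = X * R_hours             # Little's law: N = X R

W = completions * R_hours          # total customer-hours spent inside
N_direct = W / T_hours             # definition: N = W / T

print(round(N_little, 6), round(N_direct, 6))  # 8.0 8.0
```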
Network of servers (I)
[Figure: open network, with arrivals and departures]
Network of servers (II)
[Figure: closed network; the figure also carries the labels Arrivals and Departures]
Operational Quantities
Keep same quantities as before but add indices
  0 for whole system
  k for individual servers
Two changes
  We never care about the utilization of the whole system
  We add number of visits Vk of each server
Operational quantities
Over the observation period, we measure
  C = the number of job completions
  Ck = the number of tasks completed by device k
We define
  X0 = C/T = the system throughput
  Xk = Ck/T = the output rate at server k
  Vk = Ck/C = the visit count at server k
Important relationships
Ck = VkC
  Since each job requires Vk visits, there are Vk times more server completions than job completions
Xk = Vk X0
  Same property applies to throughputs
System response time (I)
We define
  Nbar = average number of jobs in the system
  nbari = average number of jobs at device i
Nbar = Σi nbari
System response time (II)
Applying Little’s law, we have
  R = Nbar/X0
and
  nbari = RiXi = RiViX0
Hence
  R = Σi ViRi
Note
This result is trivial
  The total time spent by a job in the system is the sum of the times spent at each server
  This includes the time spent waiting in the server queues
Problem 1
A job requires
  100 ms of CPU time
  9 disk accesses
Each disk access takes 7 ms
We want
  VCPU and SCPU
Answer
We know that jobs get the CPU first and last, so the 9 disk accesses separate 10 CPU visits
  VCPU = 10
Then
  SCPU = 100/10 = 10 ms
Bottleneck analysis (I)
A system has one CPU and one disk drive
It processes transactions such that
  VCPU = 12 and SCPU = 5 ms
  Vdisk = 11 and Sdisk = 8 ms
What is the maximum system throughput?
Bottleneck analysis (II)
We compute first the maximum device throughputs
  Maximum XCPU = 1/0.005 = 200 requests/s
  Maximum Xdisk = 1/0.008 = 125 requests/s
Since Xi = Vi X0
  Maximum throughput compatible with CPU workload is 200/12 = 16.7 transactions/s
  Maximum throughput compatible with disk workload is 125/11 = 11.4 transactions/s
Bottleneck analysis (III)
The disk is thus the bottleneck
  It has the highest ViSi product
    Identifying feature of any bottleneck device
Increasing the system throughput might require
  Sharing disk requests with a second disk
  Increasing the efficiency of the system I/O buffer
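The bottleneck analysis above can be sketched as a small function that finds the device with the largest ViSi product. The function name and the dictionary layout are assumptions made for this illustration.

```python
def bottleneck(devices):
    """devices maps a name to (visit count V, mean service time S in seconds).

    Returns the bottleneck device and the maximum system throughput 1/(V*S).
    """
    demands = {name: v * s for name, (v, s) in devices.items()}
    name = max(demands, key=demands.get)   # largest V*S product
    return name, 1.0 / demands[name]

# Values from the example: VCPU = 12, SCPU = 5 ms, Vdisk = 11, Sdisk = 8 ms
name, x0_max = bottleneck({"CPU": (12, 0.005), "disk": (11, 0.008)})
print(name, round(x0_max, 1))  # disk 11.4
```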
Problem 2
In Problem 1 (100 ms of CPU time and nine 7 ms disk accesses per job), which device was the bottleneck?
What would be the throughput of the system if the bottleneck utilization was 80%?
Answer
We compare
  VCPUSCPU = 100 ms
  VdiskSdisk = 9×7 = 63 ms
The CPU is the bottleneck
If the bottleneck was operating at 100% utilization,
  It could process one job each VCPUSCPU time units
  Or 1/(VCPUSCPU) job per time unit
At UCPU utilization,
  It will process UCPU/(VCPUSCPU) job per time unit
X0 = UCPU/(VCPUSCPU) = 0.80/(0.10 seconds) = 8 jobs/second
Systems with terminals
[Figure: M terminals connected to the whole system]
Interactive response time formula
We have
  M terminals
  Think time Z between the completion of a job and the submission of the next job
Applying Little’s law to the whole system
  M = (R + Z) X0
then
  R = M/X0 – Z
Very Important
Problem 3
We have
  M = 50 users
  Z = 20 s
  X0 = 2 transactions/s
What is the system response time?
Answer
We apply R = M/X0 – Z and obtain
  R = 50/2 – 20 = 5 seconds
Problem 4
A system
  Processes 5 transactions/second
  Has 60 users
  Achieves a response time of 4 seconds
What is the think time?
Answer
We apply R = M/X0 – Z, which gives
  Z = M/X0 – R = 60/5 – 4 = 8 seconds
Problem 5
We have
  M = 50 users
  Z = 20 s
  R = 4 s
What is the system throughput?
Answer
From R = M/X0 – Z, we have
  X0 = M/(R + Z)
Hence X0 = 50/(20 + 4) = 2.08 tasks/s
Problem 6
A system
  Can process up to 4 transactions/second
  Has 60 users
  User think time is 12 seconds
Can the system achieve a response time of 2 seconds?
Answer
Applying R = M/X0 – Z, we compute a lower bound for the response time
  Rmin = M/X0,max – Z = 60/4 – 12 = 3 seconds
Answer is no
Problem 7
Compute the response time of a system knowing the following parameters
  M = 50 users
  Z = 15 s
  VCPU SCPU = 200 ms
  UCPU = 50%
Answer
Since Xk = Uk/Sk and Xk = VkX0,
  X0 = Uk/(VkSk)
The response time is then given by
  R = M/X0 – Z
Let us compute first the throughput X0
  X0 = 0.50/0.200 = 2.5 interactions/s
The response time is then
  R = M/X0 – Z = 50/2.5 – 15 = 5 s
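All of the terminal problems are rearrangements of M = (R + Z) X0. A minimal sketch, with hypothetical function names, that solves the law for each unknown:

```python
def response_time(m, x0, z):
    """R = M/X0 - Z"""
    return m / x0 - z

def think_time(m, x0, r):
    """Z = M/X0 - R"""
    return m / x0 - r

def throughput(m, r, z):
    """X0 = M/(R + Z)"""
    return m / (r + z)

def throughput_from_utilization(u, demand):
    """X0 = Uk/(Vk Sk), where demand is the product Vk Sk in seconds."""
    return u / demand

print(response_time(50, 2, 20))   # M = 50, X0 = 2/s, Z = 20 s  -> 5.0 s
print(think_time(60, 5, 4))       # M = 60, X0 = 5/s, R = 4 s   -> 8.0 s
print(response_time(60, 4, 12))   # lower bound with X0,max = 4 -> 3.0 s
x0 = throughput_from_utilization(0.50, 0.200)
print(response_time(50, x0, 15))  # M = 50, Z = 15 s, U = 50%, VS = 200 ms
```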
Simulation Case Studies
A simple reminder
If interarrival times are
  Independent identically distributed (i.i.d.)
  According to an exponential law
then the number of arrivals during a fixed interval is distributed according to a Poisson law
Explanation (I)
Assume that
  The probability of one arrival during a small interval Δt is λΔt
  The probability of two arrivals during the same small time interval is negligible
[Figure: time axis divided into consecutive slots of length Δt]
Explanation (II)
The probability of having exactly k arrivals during n slots is

$$\binom{n}{k}(\lambda\Delta t)^k(1-\lambda\Delta t)^{n-k}$$

What would happen if the number of time intervals goes to infinity while their total duration T = nΔt remains constant?
Explanation (III)
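We rewrite the previous expression as

$$\binom{n}{k}(\lambda\Delta t)^k(1-\lambda\Delta t)^{n-k} = \frac{n!}{k!\,(n-k)!}\left(\frac{\lambda T}{n}\right)^{k}\left(1-\frac{\lambda T}{n}\right)^{n-k} = \frac{n!}{(n-k)!\,n^{k}}\cdot\frac{(\lambda T)^{k}}{k!}\cdot\left(1-\frac{\lambda T}{n}\right)^{n}\cdot\left(1-\frac{\lambda T}{n}\right)^{-k}$$

and compute separately the limits of its four factors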
Explanation (IV)
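$$\lim_{n\to\infty}\frac{n!}{(n-k)!\,n^{k}} = \lim_{n\to\infty}\frac{n(n-1)\cdots(n-k+1)}{n^{k}} = 1$$

$$\frac{(\lambda T)^{k}}{k!}\ \text{remains unchanged}$$

$$\lim_{n\to\infty}\left(1-\frac{\lambda T}{n}\right)^{n} = e^{-\lambda T}$$

$$\lim_{n\to\infty}\left(1-\frac{\lambda T}{n}\right)^{-k} = 1$$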
Explanation (V)
We obtain the Poisson distribution

$$P(k \text{ arrivals during } T) = \frac{(\lambda T)^{k}}{k!}\,e^{-\lambda T}$$

The probability that there are no arrivals in the same time interval T (or in any time interval of duration T) is

$$e^{-\lambda T}$$
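The limit can also be checked numerically: as the number of slots n grows with T fixed, the binomial probability approaches the Poisson probability. The values of λ, T and k below are arbitrary choices for illustration, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k arrivals in n slots, each with arrival probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mean):
    """Poisson probability of exactly k arrivals when the expected number is mean."""
    return mean**k / math.factorial(k) * math.exp(-mean)

rate, T, k = 3.0, 1.0, 2   # arbitrary illustration values
for n in (10, 100, 10_000, 1_000_000):
    p = rate * T / n       # lambda * delta_t, with delta_t = T / n
    print(n, binomial_pmf(k, n, p), poisson_pmf(k, rate * T))
```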
Explanation (VI)
This last expression is the probability that the time interval between two consecutive arrivals is greater than T
The probability that the time interval between two consecutive arrivals is less than or equal to T is

$$1 - e^{-\lambda T}$$

which is the cdf of the exponential distribution
A final observation
Use the Poisson distribution to generate the number of arrivals during a time interval
Use the exponential distribution to generate interarrival times
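A minimal sketch of the second recommendation, assuming inverse-transform sampling of the exponential cdf. The function names, the seed and the parameter values are illustrative only. Counting the arrivals that fall in intervals of length T should give an average close to λT, as the Poisson law predicts.

```python
import math
import random

def exponential_interarrival(rate, rng):
    """Inverse-transform sampling: solve u = 1 - exp(-rate * t) for t."""
    return -math.log(1.0 - rng.random()) / rate

def count_arrivals(rate, duration, rng):
    """Number of arrivals in [0, duration] with exponential interarrival times."""
    clock = exponential_interarrival(rate, rng)
    count = 0
    while clock <= duration:
        count += 1
        clock += exponential_interarrival(rate, rng)
    return count

rng = random.Random(42)    # arbitrary seed, for repeatability
counts = [count_arrivals(3.0, 1.0, rng) for _ in range(10_000)]
mean_count = sum(counts) / len(counts)
print(mean_count)          # should be close to rate * duration = 3
```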
Linear Regression
Most important point
Compute a regression line
Compute regression coefficient
Example
Linear Regression
We have
  One independent variable
  One dependent variable
We must find
  Y = α + βX
minimizing the sum of squares of errors
  Σi (yi − α − βxi)²
Formulas
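$$\beta = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} \qquad \alpha = \bar{y} - \beta\,\bar{x}$$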
Calculations (I)
Calculations (II)
Outcome
More notations
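$$S_{xx} = \sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2$$

$$S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i x_i y_i - \frac{1}{n}\sum_i x_i \sum_i y_i$$

$$S_{yy} = \sum_i (y_i - \bar{y})^2 = \sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2$$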
More notations (II)
Solution can be rewritten
$$\beta = \frac{S_{xy}}{S_{xx}} \qquad \alpha = \bar{y} - \beta\,\bar{x}$$
Coefficient of correlation
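$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{b\,S_{xx}}{\sqrt{S_{xx}S_{yy}}} = b\sqrt{\frac{S_{xx}}{S_{yy}}}$$

where b is the slope of the regression line (the coefficient of X)
  r = 1 would indicate a perfect fit
  r = 0 would indicate no linear dependency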
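The regression line and the coefficient of correlation can be computed from the three sums Sxx, Sxy and Syy. A minimal sketch follows; the function name and the data points are made up for illustration and are not the values used in the slides.

```python
import math

def linear_regression(xs, ys):
    """Least-squares line y = intercept + slope * x and correlation coefficient r."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    r = s_xy / math.sqrt(s_xx * s_yy)
    return intercept, slope, r

# Hypothetical measurements lying close to the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope, r = linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x, r = {r:.4f}")
```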
Calculations