cs1538

Course Notes for

CS 1538 Introduction to Simulation

By John C. Ramirez

Department of Computer Science, University of Pittsburgh


• These notes are intended for use by students in CS1538 at the University of Pittsburgh and no one else

• These notes are provided free of charge and may not be sold in any shape or form

• These notes are NOT a substitute for material covered during course lectures. If you miss a lecture, you should definitely obtain both these notes and notes written by a student who attended the lecture.

• Material from these notes is obtained from various sources, including, but not limited to, the following:

Discrete-Event System Simulation, Fourth Edition by Banks, Carson, Nelson and Nicol (Prentice Hall)
• Also (same title and authors) Third Edition
Object-Oriented Discrete-Event Simulation with Java by Garrido (Kluwer Academic/Plenum Publishers)
Simulation Modeling and Analysis, Third Edition by Law and Kelton (McGraw Hill)
A First Course in Monte Carlo by George S. Fishman (Thomson & Brooks/Cole)

Goals of Course

• To understand the basics of computer simulation, including:
Simulation concepts and terminology
When it is useful
Why it is useful
How to approach a simulation
How to develop / run a simulation
How to interpret / analyze the results

• To understand and utilize some of the mathematics required in simulations
Statistical models and probability distributions
• How various models are defined
• Which models are correct for which situations
Simple queuing theory
• Characteristics
• Performance measures
• Markovian models

Random number theory
• Generating and testing pseudo-random numbers
• Generating pseudo-random values within various distributions
Analysis / generation of input data
• How is input data generated?
• Is the data correct and appropriate for the simulation?
Analysis / measurement of output data
• What does the output data mean and what can be derived from it?
• How confident are we in our results?

• To implement some simulation tools and some simulation projects
What enhancements do typical programming languages need to facilitate simulation?
Programming will be done in Java
• Review if you are rusty
• Find / keep a good Java reference
• There are special-purpose simulation languages, but we will probably not be using them

Introduction to Simulation

• What is simulation?
Banks, et al:
• "A simulation is the imitation of the operation of a real-world process or system over time". It "involves the generation of an artificial history of a system, and the observation of that artificial history to draw inferences … "
Law & Kelton:
• "In a simulation we use a computer to evaluate a model (of a system) numerically, and data are gathered in order to estimate the desired true characteristics of the model"

More specifically (but still superficially)
• We develop a model of some real-world system that (we hope) represents the essential characteristics of that system
– Does not need to exactly represent the system – just the relevant parts
• We use a program (usually) to test / analyze that model
– Carefully choosing input and output
• We use the results of the program to make some deductions about the real-world system

• Why (or when) do we use simulation?
• This is fairly intuitive
Consider an arbitrary large system X
• Could be a computer system, a highway, a factory, a space probe, etc.
We'd like to evaluate X under different conditions
• Option 1: Build system X and generate the conditions, then examine the results
– This is not always feasible for many reasons:
> X may be difficult to build
> X may be expensive to build

> We may not want to build X unless it is "worthwhile"
> The conditions that we are testing may be difficult or expensive to generate for the real system
• For example:
– A company needs to increase its production and needs to decide whether it should build a new plant or it should try to increase production in the plants it already has
> Which option is more cost-effective for the company?
– Clearly, building the new plant would be very expensive and would not be desirable to do unless it is the more cost-effective solution
– But how can we know this unless we have built the new plant?

• Option 2: Model system X, simulate the conditions and use the simulation results to decide
– Continuing with the same example:
– Model both possibilities for increasing production and simulate them both
> We then choose the solution that is most economically feasible
• Clearly, this is itself not a trivial task
– Simulations are often large, complex and difficult to develop
– Just developing the correct system model can be a daunting task
– However, if a new plant costs hundreds of millions or even billions of dollars, spending on the order of thousands (or even hundreds of thousands) of dollars on a simulation could be a bargain

When is simulation NOT a good idea?
– See Section 1.2 of Banks text
• Don't use a simulation when the problem can be solved in a "simpler" or more exact way
– Some things that we think may have to be simulated can be solved analytically
– Ex: Given N rolls of a fair pair of dice, what are the relative expected frequencies of each of the possible values {2, 3, 4, … 12} ?
> We could certainly simulate this, "rolling" the dice N times and counting
> However, based on the probability of each possible result, we can derive a more exact answer analytically

> How many ways do we have of obtaining each outcome?
2:1, 3:2, 4:3, 5:4, 6:5, 7:6, 8:5, 9:4, 10:3, 11:2, 12:1
Total of 36 possible outcomes
For N "rolls", the expected frequency of value i is
N * (Pi) = N * (outcomes yielding i / total outcomes)
> For example, for 900 rolls, the expected number of 9s generated would be 900 * (4 / 36) = 100
> Note that the expected value may not be a whole number (nor should it necessarily be)
> Given 500 rolls, the expected number of 9s is 500 * (4 / 36) ≈ 55.56
– Note: You should be familiar with the general approach above from CS 0441
> We will be looking at some more complex analytical models later on
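The counting argument above can be checked with a few lines of Java. This is only a sketch: the class and method names (`DiceFrequencies`, `ways`, `expectedCount`) are made up for illustration and are not part of the course code.

```java
class DiceFrequencies {
    // Number of (die1, die2) outcomes whose sum equals the given value.
    static int ways(int sum) {
        int count = 0;
        for (int a = 1; a <= 6; a++) {
            for (int b = 1; b <= 6; b++) {
                if (a + b == sum) {
                    count++;
                }
            }
        }
        return count;
    }

    // Expected number of times `sum` appears in n rolls: n * (ways / 36).
    static double expectedCount(int n, int sum) {
        return n * (ways(sum) / 36.0);
    }

    public static void main(String[] args) {
        for (int sum = 2; sum <= 12; sum++) {
            System.out.printf("%2d: %d ways, expected in 900 rolls = %.2f%n",
                    sum, ways(sum), expectedCount(900, sum));
        }
    }
}
```

No random numbers are needed here, which is the point of the slide: the analytical answer is exact, while a simulated count would only approximate it.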

• Don't use a simulation if it is easier or cheaper to experiment directly on a real system
– Ex: The manager of a 24-hour supermarket wants to know how to best handle the cash register during the "midnight shift":
> Have one cashier at all times
> Have two cashiers at all times
> Have one cashier at all times, and a second cashier available (but only working as cashier if the line gets too long)
– Each of these can be done during operating hours
> An extra employee can be used to keep track of queue data (and would not be too expensive)
> Differences are (likely) not so drastic that customers will be alienated

• Don't use a simulation if the system is too complex to model correctly / accurately
– This is often not obvious
– Can depend on cost and alternatives as well
– Ex: Simulation of damage to the space shuttle – results were disputed but what was the alternative?

Some Definitions

• System
"A group of objects that are joined together in some regular interaction or interdependence toward the accomplishment of some purpose" (Banks et al)
• Note that this is a very general definition
• We will represent this system in our simulation using variables (objects) and operations
The state of a system is the set of variables (and their values) at one instant in time

• Discrete vs. Continuous Systems
Discrete System
• State variables change at discrete points in time
– Ex: Number of students in CS 1538
> When a registration or add is completed, number of students increases, and when a drop is completed, number of students decreases
Continuous System
• State variables change continuously over time
– Ex: Volume of CO2 in the atmosphere
> CO2 is being generated via people (breathing), industries and natural events and is being consumed by plants

– Models of continuous systems typically use differential equations to indicate rate of change of state variables
– Note that if we make the time increment and the unit of measurement small enough, we may be able to convert a continuous system into a discrete one
> However, this may not be feasible to do
> Why?
– Also note that systems are not necessarily exclusively discrete or exclusively continuous
• We will be primarily concerned with Discrete Systems in this course

• System Components
Entities
• Objects of interest within a system
– Typically "active" in some way
– Ex: Customers, Employees, Devices, Machines, etc
• Contain attributes to store information about them
– Ex: For Customer: items purchased, total bill
• May perform activities while in the system
– Ex: For Customer: shopping, paying bill
– In normal cases it is really just the period of time required to perform the activity
• Note how nicely this meshes with object-oriented programming

Events
• Instantaneous occurrences that may change the state of a system
– Note that the event itself does not take any time
– Ex: A customer arrives at a store
– Note that they "may" change the state of the system
> Example of when they would not?
• Endogenous event
– Events occurring within the system
– Ex: Customer moves from shopping to the check-out
• Exogenous event
– Events relating / connecting the system to the outside
– Ex: Customer enters or leaves the store

• System Model
A representation of the system to be used / studied in place of the actual system
• Allows us to study a system without actually building it (which, as we discussed previously, could be very expensive and time-consuming to do)
Physical Model
• A physical representation of the system (often scaled down) that is actually constructed
– Tests are then run on the model and the results used to make decisions about the system
– Ex: Development of the "bouncing bomb" in WWII
> http://www.bbc.co.uk/dna/ww2/A2163827
> http://www.computing.dundee.ac.uk/staff/irmurray/bigbounc.asp

Mathematical Model
• Representing the system using logical and mathematical relationships
• Simulations are run using the mathematical model, and, assuming it is valid, the results are interpreted for the system in question
• Simple ex: $d = v_0 t + \frac{1}{2} a t^2$
– This equation can be used to predict the distance traveled by an object at time t
– However, will acceleration always be the same?
• More often this model is fairly complex and defined by the entities and events
• So this is the model we will be using

• Analytical evaluation
– If the model is not too complex we can sometimes solve it in a closed form using analytical methods
– One type of analytical evaluation is the Markov process (or Markov chain)
– Nice simple example at: http://en.wikipedia.org/wiki/Examples_of_Markov_chains
– We will see this more in Section 6.4
– Often problems that are too complex, even if they can be modeled analytically, are too computation intensive to be practical
• Simulation evaluation
– More often we need to simulate the behavior of the model

Deterministic Model
• Inputs to the simulation are known values
– No random variables are used
– Ex: Customer arrivals to a store are monitored over a period of days and the arrival times are used as input to the simulation
Stochastic Model
• One or more random variables are used in the simulation
– Results can only be interpreted as estimates (or educated guesses) of the true behavior of the system
– Quality of the simulation depends heavily on the correctness of the random data distribution
> Different situations may require different distributions


– Ex: Customers arrive at a store with exponentially distributed interarrival times having a mean of 5 minutes

• In most cases we do not know all of the input data in advance, and at least some random data is required

– Thus, our simulations will typically use the stochastic model
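To make the exponential interarrival example concrete, here is a small Java sketch that draws interarrival times with a mean of 5 minutes using the inverse-transform method, X = -mean * ln(1 - U) with U uniform on [0, 1). The class name, method names and seed are assumptions chosen for illustration; generating values from distributions is covered properly later in the course.

```java
import java.util.Random;

class ExponentialInterarrival {
    // One exponentially distributed value with the given mean.
    // nextDouble() is in [0, 1), so 1 - u is in (0, 1] and the log is defined.
    static double sample(Random rng, double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    // Average of n samples; should be close to `mean` for large n.
    static double averageOf(int n, double mean, long seed) {
        Random rng = new Random(seed);
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += sample(rng, mean);
        }
        return total / n;
    }

    public static void main(String[] args) {
        System.out.printf("average of 100000 samples = %.3f%n",
                averageOf(100000, 5.0, 1538L));
    }
}
```

Each run with a different seed gives a slightly different average, which is why stochastic results can only be read as estimates.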

Static Model
• Models a system at a single point in time, rather than over a period of time
• Sometimes called Monte Carlo simulations
– We'll briefly discuss these shortly
Dynamic Model
• Models a system over time
• Our simulations will typically use this model
• In summary our models will typically be: discrete, mathematical, stochastic and dynamic

The Clock

• Since we are using the dynamic model, we need to represent the passage of time
We need to use a clock
Three fundamental approaches to time progression
• Next-event time advance
– Clock initialized to zero
– As the times of future events are determined, they are put into the future event list (FEL)
– Clock is advanced to the time of the next most imminent event, the event is executed and removed from the list
– See example in Section 3.1.1

Ex: People (P) using a MAC machine
• Event A == arrival of a customer at MAC machine
• Event C == completion of a transaction by a customer

Clock | FEL              | Event | Action
0     | (A2,t1), (C1,t2) | A1    | P1 arrives, is served; Events A2 and C1 generated, placed in FEL
t1    | (C1,t2), (A3,t3) | A2    | P2 arrives, waits; Event A3 generated, placed in FEL
t2    | (A3,t3), (C2,t4) | C1    | P1 completes; P2 is served; Event C2 generated, placed in FEL
t3    | (A4,t5), (C2,t4) | A3    | P3 arrives, waits; Event A4 generated, placed in FEL (note: t5<t4)
t5    | (C2,t4), (A5,t6) | A4    | P4 arrives, waits; Event A5 generated, placed in FEL
t4    | (A5,t6), (C3,t7) | C2    | P2 completes; P3 is served; Event C3 generated, placed in FEL

• Fixed-increment time advance (activity scanning)
– Clock initialized to zero
– Clock is incremented by a fixed amount (ex. 1)
– With each increment, list of events is checked to see which should occur (could be none)
– Clock is typically easier to implement in this way
– However, execution is less efficient
> Potentially many scans for each event
• Process-interaction approach
– Entities are associated with processes
– Processes interact as entities progress through system
– Could delay while waiting for a resource, or during an interaction with another process
> Can be implemented with multithreading or multiprocessing

Simple Example

• Let's consider a very simple example:
Single-Channel Queue (Example 2.1 in text)
• Small grocery store with a single checkout counter
• Customers arrive at the checkout at random between 1 and 8 minutes apart (uniform)
• Service times at the counter vary from 1 to 6 minutes
– P(1) = 0.1, P(2) = 0.2, P(3) = 0.3, P(4) = 0.25, P(5) = 0.1, P(6) = 0.05
• Start with first customer arriving at time 0
• Run for a given number of customers (text uses 100)
• Calculate some results that may be useful

The entities are the customers
The system is discrete since states are changed at specific points in time
• ex: a customer arrives or leaves
The model is mathematical (since we don't have real customers)
The model is stochastic since we are generating random arrivals and random service times
The model is dynamic since we are progressing in time

What results are we interested in?
• In this simple case we may want to know
– What fraction of customers have to wait in line
– What is the average amount of time that they wait
– What is the fraction of time the cashier is idle (or busy)
• We probably want to do several runs and get cumulative results over the runs (ex: averages)
• There are more complex statistics that may be relevant
– We will discuss some of these later

We can program this example, but in this simple case we could also use a table or spreadsheet to obtain our results
• Let's first look at an "Excel novice" approach to this
– See sim1.xls
• Although some of the spreadsheet formulas require some thought, this is fairly simple to do
• Note that each row in the spreadsheet depends only on some local data (generated in that row) and the data in the previous row
– We do not need a "memory" of all rows
• Authors have a much nicer spreadsheet with macros
– See http://www.bcnn.net
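The row-by-row idea can also be sketched directly in Java. The code below is not sim1.xls or the course's GrocerySim.java; it is a hypothetical stand-in (class `GroceryQueueSketch`, integer minutes, `java.util.Random` as the random source) that keeps only the previous customer's service-end time, just as each spreadsheet row depends only on the row before it.

```java
import java.util.Random;

class GroceryQueueSketch {
    // Cumulative service-time probabilities for 1..6 minutes
    // (0.1, 0.2, 0.3, 0.25, 0.1, 0.05 from the example).
    static final double[] SERVICE_CDF = {0.10, 0.30, 0.60, 0.85, 0.95, 1.00};

    // Map a uniform value u in [0, 1) to a service time of 1..6 minutes.
    static int serviceTime(double u) {
        for (int i = 0; i < SERVICE_CDF.length; i++) {
            if (u < SERVICE_CDF[i]) {
                return i + 1;
            }
        }
        return SERVICE_CDF.length;
    }

    // Returns {fraction of customers who waited, average wait, fraction of time idle}.
    static double[] run(int customers, long seed) {
        Random rng = new Random(seed);
        int arrival = 0;      // arrival time of the current customer
        int prevEnd = 0;      // time the previous customer finished service
        int waited = 0;
        long totalWait = 0;
        long idle = 0;
        for (int c = 0; c < customers; c++) {
            if (c > 0) {
                arrival += 1 + rng.nextInt(8);   // uniform interarrival, 1..8 minutes
            }
            int start = Math.max(arrival, prevEnd);
            int wait = start - arrival;
            if (wait > 0) {
                waited++;
            }
            totalWait += wait;
            idle += start - prevEnd;             // cashier idle before this service
            prevEnd = start + serviceTime(rng.nextDouble());
        }
        return new double[] {
            waited / (double) customers,
            totalWait / (double) customers,
            idle / (double) prevEnd
        };
    }

    public static void main(String[] args) {
        double[] r = run(100, 1538L);
        System.out.printf("waited: %.2f, avg wait: %.2f min, idle: %.2f%n", r[0], r[1], r[2]);
    }
}
```

Calling `run` with several different seeds and averaging the three results corresponds to the "several runs" suggested earlier.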

Programming a Simple Example

• If we do program it, how would we do it?
Using Java, it is logical to do it in an object-oriented way
Let's think about what is involved
• We need to represent our entities
– As text indicates, for this simple example we do not have to explicitly represent them
– However, we can do it if we want to – and have our Customers and CheckOut as simple Java objects
• We need to represent our events
– We need to store events in our Future Event List (FEL) and we have two different kinds of events (arrival of a customer, finish of a checkout)

> We need to distinguish between the different event types (since different actions are taken for different events)
> We need to order our events based on the simulation clock time that they will occur
– Thus we probably need to explicitly represent the events in some way
> Use classes and inheritance to represent the different events
> This enables events to share characteristics but also to be distinguished from each other
> So we need an event time instance variable and a method to compare event times
> Look at SimEvent.java, ArrivalEvent.java, CompletionEvent.java

Priority Queue to Represent the FEL

• We need to represent the FEL itself
– Since we are inserting items and then removing them based on priority (earliest next time of an event is removed first), we should use a priority queue (PQ) with the following operations:
> add (Object e) – add a new Object to the PQ
> remove() – remove and return the Object with the min (best) priority value
> peek() – return the Object with the min (best) priority value without removing it
– It's also a good idea to have some helper methods
> size() – how many items are in the PQ
> isEmpty() – is the PQ empty
– There are variations of these ops depending on the implementation, but the idea is the same
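As a rough illustration of these operations, the sketch below uses `java.util.PriorityQueue` as the FEL and advances the clock with the next-event approach. The nested `Event` class is a hypothetical stand-in, not the course's SimEvent.java, and the event names and times are invented.

```java
import java.util.PriorityQueue;

class FelSketch {
    // Hypothetical event record: a label plus the clock time at which it occurs.
    static class Event implements Comparable<Event> {
        final String name;
        final double time;

        Event(String name, double time) {
            this.name = name;
            this.time = time;
        }

        // Earlier time means better (smaller) priority.
        @Override
        public int compareTo(Event other) {
            return Double.compare(time, other.time);
        }
    }

    // Next-event time advance: repeatedly remove the most imminent event
    // and move the clock to its time. Returns the order of execution.
    static String drain(PriorityQueue<Event> fel) {
        StringBuilder order = new StringBuilder();
        double clock = 0.0;
        while (!fel.isEmpty()) {
            Event next = fel.remove();
            clock = next.time;
            order.append(next.name).append('@').append(clock).append(' ');
        }
        return order.toString().trim();
    }

    static String demo() {
        PriorityQueue<Event> fel = new PriorityQueue<>();
        fel.add(new Event("C1", 4.0));   // inserted out of time order on purpose
        fel.add(new Event("A2", 2.5));
        fel.add(new Event("A3", 6.0));
        return drain(fel);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real simulation, executing an event would usually add new events to the FEL inside the loop; that step is left out here to keep the sketch short.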

– How to efficiently implement a Priority Queue?
> How about an unsorted array or linked list?
> add is easy but remove is hard – why? – discuss
> How about a sorted array or linked list?
> remove is easy but add is hard – why? – discuss
– Neither implementation is adequate in terms of efficiency
> Note that the premise of a PQ is that everything that is inserted is eventually removed
> Thus, with N adds you have N removes
> Discuss / show on board overall time required for both implementations
> You may have seen this already in CS 1501
– Thus we need a better approach
> Implementation of choice is the Heap

Heap Implementation of a Priority Queue

– Idea of a Heap:
> Store data in a partially ordered complete binary tree such that the following rule holds for EACH node, V:
Priority(V) betterthan Priority(LChild(V))
Priority(V) betterthan Priority(RChild(V))
> This is called the HEAP PROPERTY
> Note that betterthan here often means smaller
> Note also that there is no ordering of siblings – this is why the overall ordering is only a partial ordering
– ex: [Figure: a heap drawn as a complete binary tree with root 10; the other nodes are 30, 20, 40, 70, 90, 45, 80, 35 and 85]

– How to do our operations?
> peek() is easy – return the root
> add() and remove() are not so obvious
> Let's look at them separately
– add(Object e)
> We want to maintain the heap property
> However, we don't know in advance where the new object will end up
> We also don't want a lot of rearranging or searching if we can avoid it – remember time is key
> Solution: Add new object at the next open leaf in the last level of the tree, then push the node UP the tree until it is in the proper location
> This operation is called upHeap
> See example on board

– remove()
> Clearly, the min node is the root
> However, removing it will disrupt the tree greatly
> How can we solve this problem?
• Remember BST delete?
– Did not actually delete the root, but rather the _______________ (fill in blank)
• We will do a similar thing with our Heap
– Copy the last leaf to the root and delete (easily) the leaf node
– Then re-establish the heap property by a downHeap
– See example on board

– Run-Time?
> Since our tree is complete, it is balanced and thus for N nodes has a height of ~ lg N
> Thus upHeap and downHeap require no more than ~ lg N time to complete
> Thus, if we have N adds and N removes, our total run-time will be ~ N lg N
> This is a SIGNIFICANT improvement over the simpler implementations, especially for a long simulation
> Ex: Compare N^2 with N lg N for N = 1M (= 2^20)
– Note:
> For our simple example, a heap is probably not necessary, since we have few items in our FEL at any given time
> However, for more complex simulations, with many different event types, a heap is definitely preferable

Implementing a Heap

– How to Implement a Heap?
> We could use a linked binary tree, similar to that used for BST
Will work, but we have overhead associated with dynamic memory allocation and access
> But note that we are maintaining a complete binary tree for our heap
> It turns out that we can easily represent a complete binary tree using an array
We simply must map the tree locations onto the array indexes in a reasonable / consistent way
– Idea:
> Number nodes row-wise starting at 0 (some implementations start at 1)
> Use these numbers as index values in the array

> Now, for node at index i
Parent(i) = floor((i-1)/2)
LChild(i) = 2i+1
RChild(i) = 2i+2
> See example on board
– Now we have the benefit of a tree structure with the speed of an array implementation
• So now should we write the code?
– No! Luckily, in JDK 1.5 a heap-based PriorityQueue class has been provided!
– It's still a good idea to understand the implementation, however
– Look at API
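For anyone who wants to see the index arithmetic in action, here is a minimal array-based min-heap of ints. It is a teaching sketch under simplifying assumptions (int priorities only, class name `ArrayMinHeap` invented here) and is not the JDK's PriorityQueue implementation.

```java
import java.util.Arrays;

class ArrayMinHeap {
    private int[] data = new int[16];
    private int size = 0;

    // Place the new value at the next open leaf, then push it up.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size] = value;
        upHeap(size);
        size++;
    }

    int peek() {
        if (size == 0) {
            throw new IllegalStateException("heap is empty");
        }
        return data[0];
    }

    // Copy the last leaf to the root, shrink, then push the root down.
    int remove() {
        if (size == 0) {
            throw new IllegalStateException("heap is empty");
        }
        int min = data[0];
        size--;
        data[0] = data[size];
        downHeap(0);
        return min;
    }

    int size() {
        return size;
    }

    boolean isEmpty() {
        return size == 0;
    }

    private void upHeap(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;            // Parent(i) = floor((i-1)/2)
            if (data[i] >= data[parent]) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    private void downHeap(int i) {
        while (true) {
            int left = 2 * i + 1;                // LChild(i) = 2i+1
            int right = 2 * i + 2;               // RChild(i) = 2i+2
            int smallest = i;
            if (left < size && data[left] < data[smallest]) {
                smallest = left;
            }
            if (right < size && data[right] < data[smallest]) {
                smallest = right;
            }
            if (smallest == i) {
                break;
            }
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) {
        int tmp = data[a];
        data[a] = data[b];
        data[b] = tmp;
    }
}
```

Both `upHeap` and `downHeap` move along a single root-to-leaf path, which is where the ~ lg N bound per operation comes from.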

Queue for Waiting Customers

• We need to represent the queue (or line) of customers waiting at the checkout
– This is a FIFO queue and can simply be implemented in various ways
> We can use a circular array
> We can use a linked-list
– You should be already familiar with queue implementations from CS 0445
– In JDK 1.5 Queue is an interface which is implemented by the LinkedList class
> See API
> Q: Would a similar approach using an ArrayList also be good?
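A short sketch of the waiting line using the `Queue` interface backed by `LinkedList` follows. The customer labels and the class name `WaitingLineSketch` are placeholders for illustration only.

```java
import java.util.LinkedList;
import java.util.Queue;

class WaitingLineSketch {
    static String demo() {
        Queue<String> line = new LinkedList<>();
        line.add("P1");                 // customers join at the back
        line.add("P2");
        line.add("P3");
        String served = line.remove();  // the front customer is served first (FIFO)
        return served + " then " + line.peek() + ", " + line.size() + " waiting";
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the variable is declared as `Queue<String>`, another implementation of the interface could be swapped in without changing the rest of the code.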

• We need to represent the clock
– This is fairly easy – we can do it with an integer
> In some cases it might be better to use a double
• We need to implement some activities
– These are actually better defined as the time required for activities to execute
– Typically interarrival times or service times, either specified exactly (with deterministic model) or by probability distributions (with stochastic model)
> In our case, we have the interarrival times of customers and the time required for checkout, specified by the distributions shown on p. 28 of the text
> We will discuss various distributions in more detail later

Let's put this all together: GrocerySim.java
• This is a fairly object-oriented implementation, using newer JDK 1.5 features
Note that there is also a Java version from authors in Chapter 4
• Look over this one as well
• Does not utilize JDK 1.5 and not quite as object-oriented
• The author also switches distributions in this implementation
– Uses an exponential distribution for arrivals
– Uses a normal distribution for service times
> We will look at these later

One More Example

• Newspaper Seller's Problem
Example 2.3 in text
Simple inventory problem
• Each day new inventory is produced and used, but is not carried over to successive days
• Thus, time is more or less removed from this problem
Used where goods are only useful for a short time
• Ex: newspaper, fresh food
In this case, our goal is to try to optimize our profit

Newspaper Seller's Problem

Specifics of the Newspaper Seller's Problem
• Seller buys N newspapers per day for 0.33 each
• Seller sells newspapers for 0.50 each
• Unused papers are "scrapped" for 0.05 each
• If seller runs out, lost revenue is 0.17 for each not sold paper
– Text says this is controversial, which is true
– How to predict how many would have been sold?
> Perhaps seller goes home when he/she runs out
> May be a goal to run out every day – easier than returning the papers for scrap
• See sim2.xls

In fact, do we really need to simulate this problem at all?
• The data is simple and highly mathematical
• Time is not involved
Let's try to come up with an analytical solution to this problem
• We have two distributions, the second of which utilizes the result of the first
• Let's calculate the expected values for random variables using these distributions
– For a given discrete random variable X, the expected value is
$E(X) = \sum_{\text{all } i} x_i \, p(x_i)$ (more soon in Chapter 5)

• Let our random variable, X, be the number of newspapers sold
– Let's first consider the expected value for each of the demands of good, fair and poor

Demand Probability Distribution

Demand | Good | Fair | Poor
40     | 0.03 | 0.10 | 0.44
50     | 0.05 | 0.18 | 0.22
60     | 0.15 | 0.40 | 0.16
70     | 0.20 | 0.20 | 0.12
80     | 0.35 | 0.08 | 0.06
90     | 0.15 | 0.04 | 0.00
100    | 0.07 | 0.00 | 0.00

• $E_{good}(X)$ = (40)(0.03) + (50)(0.05) + (60)(0.15) + (70)(0.20) + (80)(0.35) + (90)(0.15) + (100)(0.07) = 75.2
• $E_{fair}(X)$ = (40)(0.10) + (50)(0.18) + (60)(0.40) + (70)(0.20) + (80)(0.08) + (90)(0.04) + (100)(0.00) = 61
• $E_{poor}(X)$ = (40)(0.44) + (50)(0.22) + (60)(0.16) + (70)(0.12) + (80)(0.06) + (90)(0.00) + (100)(0.00) = 51.4
Now we need to use the second distribution (of good, fair and poor days) to determine the overall expected value

• $E(X) = (E_{good}(X))(0.35) + (E_{fair}(X))(0.45) + (E_{poor}(X))(0.20) = 64.05$
Now we utilize the expected number of newspapers sold to find results for each of the potential numbers that we stock
• Let sales = expected value calculated above
• Let stock = number vendor purchases
• Let left = stock - sales (only if stock > sales, else 0)
• Let lost = sales - stock (only if sales > stock, else 0)
• Profit = (Min(sales,stock))(0.5) - (stock)(0.33) + (left)(0.05) - (lost)(0.17)

Expected profit values for given stock amounts

Stock | Profit
40    | 2.71
50    | 6.11
60    | 9.51
70    | 9.2
80    | 6.4
90    | 3.6
100   | 0.82

Note that this table shows that 60 is the best choice (more or less agreeing with the simulation results)
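The expected-value calculation and the profit table can be reproduced with a short Java sketch. The class and method names are invented for illustration; the probabilities, day-type weights (0.35, 0.45, 0.20) and prices are the ones given above.

```java
class NewsvendorExpectedProfit {
    static final int[] DEMAND = {40, 50, 60, 70, 80, 90, 100};
    static final double[] GOOD = {0.03, 0.05, 0.15, 0.20, 0.35, 0.15, 0.07};
    static final double[] FAIR = {0.10, 0.18, 0.40, 0.20, 0.08, 0.04, 0.00};
    static final double[] POOR = {0.44, 0.22, 0.16, 0.12, 0.06, 0.00, 0.00};

    // E(X) for one type of day: sum of demand * probability.
    static double expected(double[] probabilities) {
        double sum = 0.0;
        for (int i = 0; i < DEMAND.length; i++) {
            sum += DEMAND[i] * probabilities[i];
        }
        return sum;
    }

    // Overall expected demand, weighting good / fair / poor days.
    static double expectedDemand() {
        return 0.35 * expected(GOOD) + 0.45 * expected(FAIR) + 0.20 * expected(POOR);
    }

    // Profit formula from the notes, with sales fixed at the expected demand.
    static double profit(int stock, double sales) {
        double sold = Math.min(sales, stock);
        double left = Math.max(stock - sales, 0.0);
        double lost = Math.max(sales - stock, 0.0);
        return sold * 0.50 - stock * 0.33 + left * 0.05 - lost * 0.17;
    }

    public static void main(String[] args) {
        double sales = expectedDemand();
        System.out.printf("expected demand = %.2f%n", sales);
        for (int stock = 40; stock <= 100; stock += 10) {
            System.out.printf("stock %3d: profit %.2f%n", stock, profit(stock, sales));
        }
    }
}
```

Note that this plugs a single expected demand into the profit formula, so it shares the simplification of the analytical table; it does not average the profit over the full demand distribution.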

Is this analytical solution correct?
• Not entirely

• We are using an expected value to derive another expected value – oversimplifying the actual analysis

• The variance from the expected value will cause our actual results to differ

• Note that the simulation results are almost identical to the analytical for small and large inventories

• In the middle there is more variation and this is where using the expected value is inadequate

• However, as a basis for choosing the best number of papers to stock, it still works

Other Simulation Examples

There are other examples in Chapters 2 and 3
• Read over them carefully

• We may look at some of these types of simulations later on in the term

Simulation Software

• Simulations can be written in any good programming language

• However, many things that need to be done in simulations can be built into languages to make them easier
Random values from various probability distributions
Tools for modeling
Tools for generating and analyzing output
Graphical tools for displaying results

Look at the various described languages
Our simple queueing example (Example 2.1) is shown using many of the languages
• Even if you don't completely understand all of the code, look it over to note some differences
We may look at one of these packages later in the term if we have time

Probability and Statistics in Simulation

• Why do we need probability and statistics in simulation?
Needed to validate the simulation model
Needed to determine / choose the input probability distributions
• Needed to generate random samples / values from these distributions
Needed to analyze the output data / results
Needed to design correct / efficient simulation experiments

Experiments and Sample Space

• Experiment
A process which could result in several different outcomes
• Sample Space
The set of possible outcomes of a given experiment
• Example:
Experiment: Rolling a single die
Sample Space: {1, 2, 3, 4, 5, 6}
• Another example?

Random Variables

• Random Variable
A function that assigns a real number to each point in a sample space
Example 5.2:
• Let X be the value that results when a single die is rolled
• Possible values of X are 1, 2, 3, 4, 5, 6
• Discrete Random Variable
A random variable for which the number of possible values is finite or countably infinite
• Example 5.2 above is discrete – 6 possible values

Random Variables and Probability Distribution

• Countably infinite means the values can be mapped to the set of integers
– Ex: Flip a coin an arbitrary number of times. Let X be the number of times the coin comes up heads
• Probability Distribution
For each possible value, $x_i$, for discrete random variable X, there is a probability of occurrence, $P(X = x_i) = p(x_i)$
$p(x_i)$ is the probability mass function (pmf) of X, and obeys the following rules:
1) $p(x_i) \ge 0$ for all i
2) $\sum_{\text{all } i} p(x_i) = 1$

The set of pairs $(x_i, p(x_i))$ is the probability distribution of X
Examples:
• For Example 5.2 (assuming a fair die):
– Probability Distribution:
> {(1, 1/6), (2, 1/6), (3, 1/6), (4, 1/6), (5, 1/6), (6, 1/6)}
• From Example 2.1 for Service Times
– Probability Distribution:
> {(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.25), (5, 0.1), (6, 0.05)}
• From Example 2.3 for Type of Newsday
– Probability Distribution:
> {(0, 0.35), (1, 0.45), (2, 0.20)}
> Note in this case we are assigning the values 0, 1, 2 to the outcomes somewhat arbitrarily

Cumulative Distribution

• Cumulative Distribution Function
The pmf gives probabilities for individual values $x_i$ of random variable X
The cumulative distribution function (cdf), F(x), gives the probability that the value of random variable X is <= x, or
$F(x) = P(X \le x)$
For a discrete random variable, this can be calculated simply by addition:
$F(x) = \sum_{x_i \le x} p(x_i)$

Properties of cdf, F:
1) F is non-decreasing
2) $\lim_{x \to -\infty} F(x) = 0$
3) $\lim_{x \to \infty} F(x) = 1$
and
$P(a < X \le b) = F(b) - F(a)$ for all a < b
Ex: Probability that a roll of two dice will result in a value > 7?
• Discuss
Ex: Probability that 10 flips of a fair coin will yield between 6 and 8 (inclusive) heads?
• Discuss
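One possible way to work the two discussion examples is to build each cdf by addition and then apply the properties of F. The sketch below does this in Java; the class and method names are hypothetical, and the binomial counts use the usual "n choose k" formula, which is assumed here rather than derived.

```java
class CdfExamples {
    // F(x) for the sum of two fair dice: P(sum <= x).
    static double twoDiceCdf(int x) {
        int count = 0;
        for (int a = 1; a <= 6; a++) {
            for (int b = 1; b <= 6; b++) {
                if (a + b <= x) {
                    count++;
                }
            }
        }
        return count / 36.0;
    }

    // Number of ways to choose k of n items (exact for small n).
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // F(x) for the number of heads in n flips of a fair coin: P(heads <= x).
    static double fairCoinCdf(int n, int x) {
        long ways = 0;
        for (int k = 0; k <= x; k++) {
            ways += choose(n, k);
        }
        return ways / Math.pow(2, n);
    }

    public static void main(String[] args) {
        // P(X > 7) = 1 - F(7)
        System.out.printf("two dice > 7: %.4f%n", 1.0 - twoDiceCdf(7));
        // P(6 <= X <= 8) = P(5 < X <= 8) = F(8) - F(5)
        System.out.printf("6 to 8 heads in 10 flips: %.4f%n",
                fairCoinCdf(10, 8) - fairCoinCdf(10, 5));
    }
}
```

The second calculation uses P(a < X <= b) = F(b) - F(a) with a = 5 and b = 8, since "between 6 and 8 inclusive" is the same event as 5 < X <= 8 for an integer-valued X.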

Expected Value

• Expected Value (for discrete random variables)
$E(X) = \sum_{\text{all } i} x_i \, p(x_i)$
Also called the mean
Ex: Expected value for roll of 2 fair dice?
E(X) = (2)(1/36) + (3)(2/36) + (4)(3/36) + (5)(4/36) + (6)(5/36) + (7)(6/36) + (8)(5/36) + (9)(4/36) + (10)(3/36) + (11)(2/36) + (12)(1/36) = 7
• Note that in this case the expected value is a value that can actually occur, but this is not necessarily so

Expected Value and Variance

If each value has the same "probability", we often add the values together and divide by the number of values to get the mean (average)
• Ex: Average score on an exam
• Variance
$V(X) = E[(X - E[X])^2] = E(X^2) - [E(X)]^2$ (Equation 5.10)
We won't prove the identity, but it is useful



• In the original definition, we need to subtract the mean from each of the X values before squaring

– So we need each X value to calculate the mean AND AFTER the mean has been calculated

> Must look at them twice

• In the right side of the equation (Equation 5.10 in the text), we need to calculate the mean of X and the mean of the squares of X

– We can do this as we process the individual X values and need to look at them only one time

• Ex: What is the variance of the following group of exam scores: { 75, 90, 40, 95, 80 }

– Since each value occurs once, we can consider this to have a uniform distribution

68


• V(X) using original definition:
E(X) = (75+90+40+95+80)/5 = 76
V(X) = E[(X – E[X])^2] = [(75-76)^2 + (90-76)^2 + (40-76)^2 + (95-76)^2 + (80-76)^2]/5
= (1 + 196 + 1296 + 361 + 16)/5
= 374

• V(X) using Equation 5.10
E(X) = (75+90+40+95+80)/5 = 76
E(X^2) = (5625+8100+1600+9025+6400)/5 = 6150
V(X) = 6150 – (76)^2 = 374
– Note that in this case we can add each number to one sum and its square to another, so we can calculate our overall answer with a single "look" at each number
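The single-"look" calculation can be sketched as a loop that keeps two running sums. This is only an illustration with hypothetical names (`OnePassVarianceSketch`, `meanAndVariance`); it treats every value as equally likely, as in the exam-score example.

```java
final class OnePassVarianceSketch {
    // Returns {mean, variance} using V(X) = E(X^2) - [E(X)]^2.
    // Each value is visited exactly once.
    static double[] meanAndVariance(double[] xs) {
        double sum = 0.0;    // running sum of the values
        double sumSq = 0.0;  // running sum of the squares
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        double variance = sumSq / xs.length - mean * mean;
        return new double[] { mean, variance };
    }

    public static void main(String[] args) {
        double[] scores = { 75, 90, 40, 95, 80 };
        double[] r = meanAndVariance(scores);
        System.out.printf("E(X) = %.1f, V(X) = %.1f%n", r[0], r[1]);
    }
}
```

For the exam scores this gives a mean of 76 and a variance of 374, matching the hand calculation. One caveat: when the mean is very large relative to the spread, subtracting two nearly equal numbers can lose floating-point precision, so the two-pass definition is sometimes preferred numerically.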

69

Discrete Distributions

• Discrete Distributions of interest:
Bernoulli Trials and the Bernoulli Distribution
• Consider an experiment with the following properties
– n independent trials are performed
– each trial has two possible results – success or failure
– the probability of success, p, and failure, q (= 1 – p), is constant from trial to trial
– for random variable X, X = 1 for a success and X = 0 for a failure
• Probability Distribution:
P(X = 1) = p
P(X = 0) = 1 – p = q
or 0 for all other values of X

70

Bernoulli Distribution

• Expected Value
– E(X) = (0)(q) + (1)(p) = p
• Variance
– V(X) = [0^2 q + 1^2 p] – p^2 = p(1 – p)
A single Bernoulli trial is not that interesting
• Typically, multiple trials are performed, from which we can derive other distributions:
– Binomial Distribution
– Geometric Distribution

71

Binomial Distribution

Binomial Distribution
• Given n Bernoulli trials, let random variable X denote the number of successes in those trials

$$p(x) = \begin{cases} \dbinom{n}{x} p^{x} q^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

• Note that the order of the successes is not important, just the number of successes
– Thus, we can achieve the same number of successes in various different ways
– Since the trials are independent, we can multiply the probabilities for each trial to get the overall probability for the sequence

72


• Recall that the number of combinations of n items taken x at a time is

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

• E(X) = np
– Discuss
• V(X) = npq
• Consider an example:
– Exercise 5.1
– Read
– Do solution on board

73


• Consider again the coin-flip example on slide 64

• Generally speaking, binomial distributions can be used to determine the probability of a given number of defective items in a batch, or the probability of a given number of people having a certain characteristic

– Ex: The trait of having a klinkled flooje occurs on average in 10% of Kreptoplomians (krep-tō-plō'-mē-əns). Given a group of 20 Kreptoplomians, what is the probability that 3 of them have klinkled floojes?

P(3) = (20 C 3)(0.1)^3(0.9)^17 = (1140)(0.001)(0.1668) ≈ 0.190
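Both binomial examples (the Kreptoplomians and the earlier coin-flip question) can be evaluated with a small pmf routine. This is a sketch under the definitions above; `BinomialPmfSketch`, `choose`, `pmf` and `probBetween` are illustrative names, not course code.

```java
final class BinomialPmfSketch {
    // Number of combinations of n items taken x at a time,
    // built up multiplicatively to avoid large factorials
    static double choose(int n, int x) {
        double c = 1.0;
        for (int i = 1; i <= x; i++) c = c * (n - x + i) / i;
        return c;
    }

    // p(x) = C(n, x) p^x q^(n-x) for x = 0..n, and 0 otherwise
    static double pmf(int n, int x, double p) {
        if (x < 0 || x > n) return 0.0;
        return choose(n, x) * Math.pow(p, x) * Math.pow(1.0 - p, n - x);
    }

    // P(lo <= X <= hi), by adding the individual probabilities
    static double probBetween(int n, int lo, int hi, double p) {
        double sum = 0.0;
        for (int x = lo; x <= hi; x++) sum += pmf(n, x, p);
        return sum;
    }

    public static void main(String[] args) {
        System.out.printf("3 of 20 Kreptoplomians: %.4f%n", pmf(20, 3, 0.1));
        System.out.printf("6 to 8 heads in 10 flips: %.4f%n", probBetween(10, 6, 8, 0.5));
    }
}
```

The coin-flip answer is (210 + 120 + 45)/1024 = 375/1024, about 0.366.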

74

Geometric Distribution

Geometric Distribution
• Given a sequence of Bernoulli trials, let X represent the number of trials required until the first success
– i.e. we have x – 1 failures, followed by a success

$$p(x) = \begin{cases} q^{x-1} p, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$

– Note that the maximum probability for this is at X = 1, regardless of p and q
• E(X) = 1/p
• V(X) = q/p^2
– We will omit the proofs of the above, since they are fairly complex (involving series solutions)

75


• Ex: What is the probability that the first Kreptoplomian found to have a klinkled flooje will be the 5th Kreptoplomian overall?
(0.9)^4(0.1) = 0.0656

• Ex: The probability that a certain computer will fail during any 1-hour period is 0.001
– What is the probability that the computer will survive at least 3 hours?
> Here p = 0.001 and q = (1 – p) = 0.999
– Using a geometric distribution, we want to solve
P(X >= 4) = 1 – P(1) – P(2) – P(3)
= 1 – (0.001) – (0.999)(0.001) – (0.999)^2(0.001) = 0.997
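As a cross-check on the computer example, surviving at least 3 hours is the same event as seeing no breakdown in each of the first three hourly trials, and each of those has probability q. The sum therefore collapses to a single power of q:

```latex
P(X \ge 4) = P(\text{no breakdown in hours 1, 2, 3}) = q^{3} = (0.999)^{3} \approx 0.997
```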

76


• The Geometric Distribution is memoryless
– Consider the following two scenarios where p = probability that a component will fail in the next hour. Assume the current hour is hour 0.
1) What is the probability that the component will fail by the end of hour 3?
2) What is the probability that the component will fail by the end of hour 6, given that it has not failed by the end of hour 3?
> For 1) the solution is P(1) + P(2) + P(3)
> For 2), since the component did NOT fail by the end of hour 3, and since the probability is for the next hour (whatever that hour may be), the solution is the same
– We can prove this property with fairly simple algebra
> First we need one additional definition

77


• The conditional probability of an event, A, given that another event, B, has occurred is defined to be:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

• Applying this to the geometric distribution we get

$$P(X > s+t \mid X > s) = \frac{P([X > s+t] \cap [X > s])}{P(X > s)}$$

– Clearly, if X > s+t, then X > s, so we get

$$P(X > s+t \mid X > s) = \frac{P(X > s+t)}{P(X > s)}$$

78


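– Consider that

$$P(X > s) = \sum_{j=s+1}^{\infty} q^{j-1} p = \sum_{j=s+1}^{\infty} q^{j-1}(1 - q) = \sum_{j=s+1}^{\infty} q^{j-1} - \sum_{j=s+1}^{\infty} q^{j}$$

$$= q^{s} - q^{s+1} + q^{s+1} - q^{s+2} + q^{s+2} - q^{s+3} + \cdots = q^{s}$$

– We can use similar logic to determine that P(X > s + t) = q^(s+t)
– Now our conditional probability becomes

$$P(X > s+t \mid X > s) = \frac{q^{s+t}}{q^{s}} = q^{t} = P(X > t)$$

– and thus we have shown that the geometric distribution is memoryless
> We will see shortly that the exponential distribution is also memoryless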

79

Poisson Distribution

Poisson Distribution
• Often used to model arrival processes with constant arrival rates
• Gives (probability of) the number of events that occur in a given period
• Formula looks quite complicated (and NOT discrete), but it is discrete and using it is not that difficult

$$p(x) = \begin{cases} \dfrac{e^{-\alpha} \alpha^{x}}{x!}, & x = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$

• Where α is the mean arrival rate. Note that α must be positive
• E(X) = V(X) = α

80


• The cdf is again obtained by addition:

$$F(x) = \sum_{i=0}^{x} \frac{e^{-\alpha} \alpha^{i}}{i!}$$

• Note: The Poisson Distribution is actually the limit of the Binomial Distribution as the number of trials, n, approaches infinity (with the mean np held fixed)
– If we think of n as the number of subintervals of a given unit of time
– As n → ∞, the subintervals get smaller and smaller
– We will skip the detailed math here
– One nice feature of this is that we can use a Poisson Distribution to approximate a Binomial Distribution when n is large

81


• Example
– Number of people logging onto a computer per minute is Poisson Distributed with mean 0.7
– What is the probability that 5 or more people will log onto the computer in the next 10 minutes?
– Solution?
> Must first convert the mean to the 10 minute period – if mean is 0.7 in 1 minute, it will be (0.7)(10) = 7 in a ten minute period
> Now we can plug in the formula
> P(X >= 5) = 1 – P(0) – P(1) – P(2) – P(3) – P(4)
= 1 – F(4) (where F is the cdf)
= 1 – 0.173 (from Table A.4 in the text)
= 0.827
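Instead of a table lookup, the cdf value F(4) can be computed directly by summing the pmf terms. A minimal sketch follows; the names `PoissonCdfSketch` and `cdf` are illustrative. Each term is obtained from the previous one, which avoids computing factorials explicitly.

```java
final class PoissonCdfSketch {
    // F(x) = sum over i = 0..x of e^(-alpha) alpha^i / i!
    static double cdf(int x, double alpha) {
        double term = Math.exp(-alpha); // the i = 0 term
        double sum = 0.0;
        for (int i = 0; i <= x; i++) {
            sum += term;
            term = term * alpha / (i + 1); // next term from the current one
        }
        return sum;
    }

    public static void main(String[] args) {
        double alpha = 0.7 * 10; // mean over the 10 minute period
        System.out.printf("P(X >= 5) = %.3f%n", 1.0 - cdf(4, alpha));
    }
}
```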

82

Continuous Random Variables

• Continuous Random Variable
Random variable X is continuous if its sample space is a range or collection of ranges of real values
More formally, there exists a non-negative function f(x), called the probability density function, such that for any set of real numbers, S
a) f(x) >= 0 for all x in the range space
b) $\int_{\text{range space}} f(x)\,dx = 1$ – the total area under f(x) is 1
c) f(x) = 0 for all x not in the range space
– Note that f(x) does NOT give the probability that X = x
– Unlike the pmf for discrete random variables

83


• The probability that X lies in a given interval [a,b] is

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

– We see this visually as the "area under the curve"
– Note that for continuous random variables, P(X = x) = 0 for any x (see from formula above)
– Rather we always look at the probability of x within a given range (although the range could be very small)
• The cumulative distribution function (cdf), F(x), is simply the integral from –∞ to x, or

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

– This gives us the probability up to x

84


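• Ex: Consider the uniform distribution on the range [a,b] (see text p. 166)

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

$$F(x) = \int_a^x f(y)\,dy = \int_a^x \frac{1}{b-a}\,dy = \frac{x-a}{b-a} \quad \text{if } a \le x \le b$$

– Look at plots on board for example range [0,1]
> What about F(x) when x < a or x > b?

Expected Value for a continuous random variable

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

– Compare to the discrete expected value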

85


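Variance for continuous random variables
• Defined in same way as for discrete variables
– Calculating it will clearly be different, however

Ex: Uniform Distribution

$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \left.\frac{x^2}{2(b-a)}\right|_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{a+b}{2}$$

$$V(X) = E(X^2) - [E(X)]^2 \;\;\text{(Eq. 5.10)} \;= \int_a^b \frac{x^2}{b-a}\,dx - [E(X)]^2 = \left.\frac{x^3}{3(b-a)}\right|_a^b - [E(X)]^2$$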

86


$$
\begin{aligned}
V(X) &= \left.\frac{x^3}{3(b-a)}\right|_a^b - [E(X)]^2 = \frac{b^3 - a^3}{3(b-a)} - \frac{(b+a)^2}{4} \\
&= \frac{4(b^3 - a^3) - 3(b-a)(b+a)^2}{12(b-a)} \\
&= \frac{4b^3 - 4a^3 - 3(b^3 + ab^2 - a^2 b - a^3)}{12(b-a)} \\
&= \frac{b^3 - 3ab^2 + 3a^2 b - a^3}{12(b-a)} = \frac{(b-a)^3}{12(b-a)} = \frac{(b-a)^2}{12}
\end{aligned}
$$

87

Uniform Distribution

• Generally speaking we will not be calculating these values from scratch (good news, in all likelihood!)

– However, it is good to see how it can be done for at least one (simple) distribution

The Uniform Distribution will be useful primarily in the generation of other distributions
• Ex: Most random number generators on computers minimally will provide a uniform value from [0,1)

– We will later see how that can be used to generate other random variates (See Chapter 8)

88

Exponential Distribution

Given a value λ > 0, the pdf of the Exponential Distribution is

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

• Since the exponent is negative, the pdf will decrease as x increases – see shape on board and p. 168
• λ is the rate: number of occurrences per time unit
– Ex: arrivals per hour; failures per day
• Note that 1/λ can thus be considered to be the mean time between events or the mean duration of events
– Ex: 10 arrivals per hour → 1/10 hr (6 min) average between arrivals
– Ex: 20 customers served per hour → 1/20 hr (3 min) average service time

89


• Some more definitions

$$E(X) = \frac{1}{\lambda} \qquad V(X) = \frac{1}{\lambda^2}$$

$$F(x) = \begin{cases} 0, & x < 0 \\ \displaystyle\int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

– See shape of cdf on board and p. 168

• Ex: Assume the hard drives manufactured by Herb's Hard Drives have a mean lifetime of 4 years, exponentially distributed
a) What is the probability that one of Herb's Hard Drives will fail within the first two years?
b) What is the probability that one of Herb's Hard Drives will last at least 8 years (or twice the mean)?
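One way to work the hard-drive example is to code the cdf directly. This is a sketch only; `ExponentialCdfSketch` and `cdf` are hypothetical names, and it assumes the rate is λ = 1/mean = 1/4 per year, with part a) read as F(2) and part b) read as 1 – F(8).

```java
final class ExponentialCdfSketch {
    // F(x) = 1 - e^(-lambda x) for x >= 0, and 0 for x < 0
    static double cdf(double x, double lambda) {
        return x < 0 ? 0.0 : 1.0 - Math.exp(-lambda * x);
    }

    public static void main(String[] args) {
        double lambda = 1.0 / 4.0; // mean lifetime of 4 years => rate 1/4 per year
        // a) fails within the first two years: F(2) = 1 - e^(-1/2)
        System.out.printf("a) %.4f%n", cdf(2.0, lambda));
        // b) lasts at least 8 years: 1 - F(8) = e^(-2)
        System.out.printf("b) %.4f%n", 1.0 - cdf(8.0, lambda));
    }
}
```

Under these assumptions the answers come out to roughly 0.39 for a) and 0.14 for b). Note that b) does not depend on the mean at all: the probability of lasting at least twice the mean is e^(-2) for any exponential distribution.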

90


• Like the geometric distribution, the exponential distribution is memoryless
– Implies that P(X > s+t | X > s) = P(X > t)
– The proof of this is similar in nature to that for the geometric distribution
> See p. 169 in text

• Ex: Exercise 5.19 in the text
– Component has exponential time-to-failure with mean 10,000 hrs
a) Component has already been in operation for its mean life. What is the prob. that it will fail by 15,000 hrs?
b) Component is still ok at 15,000 hrs. What is the prob. that it will operate for another 5,000 hrs?

The first thing we need to do here is be sure we understand the problem correctly

91


– First, let's determine our distribution
> X is the time to failure, and is exponentially distributed with mean 10,000 hours
> This gives us a cumulative distribution

$$F(x) = 1 - e^{-x/10000}, \quad x \ge 0$$

> Here the failure rate = λ = 1/10000

– a) is pretty clear – we want the probability that it lasts at most 15,000 given that it has lasted 10,000
> We want P(X <= 15000 | X > 10000)
> Due to the memoryless property of the exponential distribution, this reduces to P(X <= 5000) which is
1 – e^(-1/2) = 0.3935

92


– b) is trickier (and actually not phrased well)
> Do they mean the prob. that it will last exactly 5000 more hours?
If so, the probability is 0, since continuous distributions have 0 prob. for a specific value
> Do they mean it will last at most 5000 more hours?
If so, the answer is the same as a) due to the memoryless property
> Do they mean it will last at least 5000 more hours?
If so, we want 1 – F(5000) = 0.6065

93

Gamma Distribution

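Gamma Distribution
• More general than exponential
• Based on the gamma function, which is a continuous generalization of the factorial function

$$\Gamma(\beta) = (\beta - 1)\,\Gamma(\beta - 1), \qquad \text{base case } \Gamma(1) = 1$$

$$\Gamma(\beta) = (\beta - 1)! \quad \text{when } \beta \text{ is a positive integer}$$

– β is called the shape parameter
– The gamma distribution also has θ, the scale parameter
– This leads to the following pdf:

$$f(x) = \begin{cases} \dfrac{\beta\theta}{\Gamma(\beta)} (\beta\theta x)^{\beta-1} e^{-\beta\theta x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$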

94

Gamma Distribution

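– These formulas look complicated (and they are)
– However, they allow for more flexibility in the distributions
> Allow a wider variety of curves (See Fig. 5.10a)
– Note that if β = 1, the equations simplify to the exponential distribution with rate θ
> Look at the formulas to see this

$$E(X) = \frac{1}{\theta} \qquad V(X) = \frac{1}{\beta\theta^2}$$

$$F(x) = \begin{cases} 1 - \displaystyle\int_x^{\infty} \frac{\beta\theta}{\Gamma(\beta)} (\beta\theta t)^{\beta-1} e^{-\beta\theta t}\,dt, & x > 0 \\ 0, & x \le 0 \end{cases}$$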

95

Erlang Distribution

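When β is an arbitrary positive integer, k, the Gamma Distribution is also called the Erlang Distribution of order k
• In general terms, it represents the sum of k independent, exponentially distributed variables, each with rate kθ (= λ)
X = X1 + X2 + … + Xk
leading to

$$E(X) = \frac{1}{k\theta} + \frac{1}{k\theta} + \cdots + \frac{1}{k\theta} = \frac{1}{\theta}$$

$$V(X) = \frac{1}{(k\theta)^2} + \frac{1}{(k\theta)^2} + \cdots + \frac{1}{(k\theta)^2} = \frac{1}{k\theta^2}$$

$$F(x) = \begin{cases} 1 - \displaystyle\sum_{i=0}^{k-1} \frac{e^{-k\theta x}(k\theta x)^{i}}{i!}, & x > 0 \\ 0, & x \le 0 \end{cases}$$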

96

Erlang Distribution

• This allows us to determine probabilities for sequences of exponentially distributed events

– Note that the rates for all events in the sequence must be the same

• Ex: Exercise 5.21 in text
– Time to serve a customer at a bank is exponentially distributed with mean 50 sec
a) Probability that two customers in a row will each require less than 1 minute for their transaction
b) Probability that the two customers together will require less than 2 minutes for their transactions
– It is important to recognize the difference between these two problems

97

Erlang Distribution

– For a) we are looking at the probability that each of two independent events will be < 1 minute
> In this case, the probability overall is the product of the two probabilities, each of which is exponential
λ = rate = 1/mean = 1/50
We want P(X < 60) = F(60) = 1 – e^(-(1/50)(60)) = 0.6988
> Thus the total probability is (0.6988)^2 = 0.4883

– For b) we are looking at the probability that two events together will be < 2 minutes
> In this case the probability overall is an Erlang distribution with k = 2 and kθ = 1/50, and we want to determine P(X < 120) = F(120)
> Substituting into our equation for F, we get

98

Erlang Distribution

$$F(120) = 1 - \sum_{i=0}^{2-1} \frac{e^{-120/50}\,(120/50)^{i}}{i!} = 1 - (0.0907 + 0.2177) = 0.6916$$

– Note that these results are fairly intuitive
> Requiring both to be < 1 minute is a more restrictive condition than requiring the sum to be < 2 minutes, and would seem to have a lower probability

– How about if we add another part: Probability that the next 3 customers will have a cumulative time of more than 2.5 minutes?
> Now we want P(X > 150) = 1 – F(150)
> But F has changed since we now have 3 events
> Let's do this one on the board
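The Erlang cdf is a finite sum, so it is easy to evaluate in code. The sketch below uses hypothetical names (`ErlangCdfSketch`, `cdf`) and takes the per-phase rate kθ as a single parameter `rate`; it reproduces F(120) for two customers and, as one possible way to check the three-customer question, evaluates 1 – F(150) with k = 3.

```java
final class ErlangCdfSketch {
    // F(x) = 1 - sum over i = 0..k-1 of e^(-rate x) (rate x)^i / i!, for x > 0,
    // where rate = k*theta is the rate of each exponential phase
    static double cdf(double x, int k, double rate) {
        if (x <= 0) return 0.0;
        double a = rate * x;
        double term = Math.exp(-a); // the i = 0 term
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            sum += term;
            term = term * a / (i + 1); // next term from the current one
        }
        return 1.0 - sum;
    }

    public static void main(String[] args) {
        double rate = 1.0 / 50.0; // each service has mean 50 sec
        System.out.printf("b) F(120), k = 2: %.4f%n", cdf(120.0, 2, rate));
        System.out.printf("3 customers, P(X > 150): %.4f%n", 1.0 - cdf(150.0, 3, rate));
    }
}
```

With k = 1 the same routine reduces to the exponential cdf, which gives a quick sanity check against the value 0.6988 from part a).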

99

Normal Distribution

A very common distribution is the Normal Distribution
• It has some nice properties

$$\lim_{x \to -\infty} f(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} f(x) = 0$$

$$f(\mu - x) = f(\mu + x)$$

$$\max(f(x)) = f(\mu)$$

– Discuss these
• The pdf for the normal distribution is also quite complex
– We won't even show it here
– However, we can use tables and another nice property to allow solution for arbitrary normal distributions

100

Normal Distribution

• Define Φ(z) to be the normal distribution with mean (μ) 0 and variance (σ^2) 1 = N(0, 1)
– We call this the standard normal distribution
– This can be calculated using numerical methods, and its cdf is typically provided in tables in statistics (and simulation) textbooks (see Table A.3 in text)

– We use the notation Z ~ N(0, 1) to mean that Z is a random variable with a standard normal distribution

• Naturally, most normal distributions of interest will not be the standard normal distribution

– However, Eqs 5.42 & 5.43 in the text relate any normal distribution to the standard normal distribution in the following way

101

Normal Distribution

– Given an arbitrary normal distribution, X ~ N(μ, σ^2), let Z = (X – μ)/σ
– Through Eq. 5.43, we know that

$$F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$$

> where Φ(z) is the cumulative distribution function for the standard normal distribution

– Thus we can use the tabulated values of the standard normal distribution to determine probabilities for arbitrary normal distributions

– Ex: Student GPA's are approximately normally distributed with μ = 2.4 and σ = 0.8. What fraction of students will possess a GPA in excess of 3.0?

Example is from Mathematical Statistics with Applications, Second Edition by Mendenhall, Scheaffer and Wackerly

102

Normal Distribution

– Let Z = (X – 2.4)/0.8
– We want the area under the normal curve with mean 2.4 and standard deviation 0.8 where x > 3.0
This will be 1 – F(3.0)
F(3.0) = Φ[(3.0 – 2.4)/0.8] = Φ(0.75)
– Looking up Φ(0.75) in Table A.3 we find 0.77337
– Recall that we want 1 – F(3.0), which gives us our final answer of 1 – 0.77337 = 0.2266

• The idea in general is that we are moving from the mean in units of standard deviations

– The relationship of the mean to standard deviation is the same for all normal distributions, which is why we can use the method indicated
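The table lookup in the GPA example can also be reproduced numerically. The sketch below integrates the standard normal pdf with composite Simpson's rule, which is just one of several numerical methods that could be used; the names `StandardNormalSketch`, `pdf` and `cdf`, and the choice of 1000 subintervals, are assumptions made for illustration.

```java
final class StandardNormalSketch {
    // Standard normal pdf
    static double pdf(double z) {
        return Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
    }

    // Phi(z) = 0.5 + integral from 0 to z of pdf(t) dt,
    // approximated with composite Simpson's rule
    static double cdf(double z) {
        int n = 1000;          // number of subintervals (must be even)
        double h = z / n;      // a negative z gives a negative (signed) step
        double sum = pdf(0.0) + pdf(z);
        for (int i = 1; i < n; i++) {
            sum += pdf(i * h) * (i % 2 == 1 ? 4.0 : 2.0);
        }
        return 0.5 + sum * h / 3.0;
    }

    // F(x) for X ~ N(mu, sigma^2), by standardizing: F(x) = Phi((x - mu) / sigma)
    static double cdf(double x, double mu, double sigma) {
        return cdf((x - mu) / sigma);
    }

    public static void main(String[] args) {
        System.out.printf("P(GPA > 3.0) = %.4f%n", 1.0 - cdf(3.0, 2.4, 0.8));
    }
}
```

A fixed number of subintervals is adequate for the moderate z values found in typical tables; a production routine would more likely use a library error function.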

103

Other Distributions

There are a LOT of probability distributions
• More in the text that we did not discuss
• Many others not in the text
For simulation, the idea for using them is:

• How well does the distribution of choice model the actual distribution of events / times that are relevant to our model

• The more possibilities and variations, the more closely we can model our actual behavior

• However, we need to be able to determine if a distribution fits observed data

– We will look at this in Chapter 9

104

Poisson Arrival Process

Before we finish Ch. 5, let's revisit the Poisson Distribution

$$p(x) = \begin{cases} \dfrac{e^{-\alpha} \alpha^{x}}{x!}, & x = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases} \qquad F(x) = \sum_{i=0}^{x} \frac{e^{-\alpha} \alpha^{i}}{i!}$$

• In this case, α indicates the mean value, or number of arrivals (total)
– Does not factor in arrivals over time
– However, this can be done, and in this case we say the arrivals follow a Poisson Process
> In this case we are counting the number of arrivals over time
– However, some rules must be followed

105


1) Arrivals occur one at a time
2) The number of arrivals in a given time period depends only on the length of that period and not on the starting point
– i.e. the rate does not change over time
3) The number of arrivals in a given time period does not affect the number of arrivals in a subsequent period
– i.e. the number of arrivals in given periods are independent of each other

– Discuss if these are realistic for actual "arrivals"

• We can alter the Poisson distribution to include time

$$P[N(t) = n] = \begin{cases} \dfrac{e^{-\lambda t} (\lambda t)^{n}}{n!}, & n = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$

– Only difference is that λt is substituted for α

106


• Just like the Poisson Distribution, V = E = α = λt

• In fact, if you look at the example from slide 81, we are in fact using the Poisson Arrival Process there

The Poisson Arrival Process has some nice properties
• The three required from the previous slide (obviously)
– These imply that the interarrival times follow an exponential distribution

• Random Splitting
– Consider a Poisson Process N(t) with rate λ
– Assume that arrivals can be divided into two groups, A and B, with probability p and (1-p), respectively
> Show on board

107


– In this case N(t) = NA(t) + NB(t)

– NA is a Poisson Process with rate λp and NB is a Poisson Process with rate λ(1-p)

– Splitting can be used in situations where arrivals are subdivided to different queues in some way

> Ex: At immigration US citizens vs. non-US citizens

• Pooled Process
– Consider two Poisson Processes N1(t) and N2(t), with rates λ1 and λ2

– The sum of the two, N1,2(t), is also a Poisson Process with rate λ1 + λ2

– Pooling can be used in situations where multiple arrival processes feed a single queue

> Ex: Cars arrive in New York City from many bridges and tunnels, each at a different rate

108


• Ex: Exercise 5.28 in text:
– An average of 30 customers per hour arrive at the Sticky Donut Shop in accordance with a Poisson process. What is the probability that more than 5 minutes will elapse before both of the next two customers walk through the door?

> As usual, the first thing is to identify what it is that we are trying to solve.

> Discuss (and see Notes)

– [I added this part] If (on average) 75% of Sticky Donut Shop's customers get their orders to go, what is the probability that 3 or more new customers will sit in the dining room in the next 10 minutes?

> Discuss
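Both Sticky Donut questions can be reduced to counting arrivals in a Poisson process. The sketch below shows one possible reading, not necessarily the intended in-class solution. For the first question, "more than 5 minutes elapse before the second customer arrives" is taken to mean "at most 1 arrival in 5 minutes". For the second, random splitting is assumed to give the dine-in customers a rate of 25% of the total. The names `StickyDonutSketch` and `countAtMost` are hypothetical.

```java
final class StickyDonutSketch {
    // P[N(t) <= n] for a Poisson process with rate lambda (arrivals per unit time)
    static double countAtMost(int n, double lambda, double t) {
        double a = lambda * t;      // mean number of arrivals in [0, t]
        double term = Math.exp(-a); // the i = 0 term
        double sum = 0.0;
        for (int i = 0; i <= n; i++) {
            sum += term;
            term = term * a / (i + 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        double lambda = 30.0 / 60.0; // 30 per hour = 0.5 per minute

        // Second arrival comes after 5 minutes <=> at most 1 arrival in 5 minutes
        double p1 = countAtMost(1, lambda, 5.0);
        System.out.printf("More than 5 min for two customers: %.4f%n", p1);

        // Splitting: 75% go, so dine-in arrivals have rate 0.25 * lambda
        double p2 = 1.0 - countAtMost(2, 0.25 * lambda, 10.0);
        System.out.printf("3 or more dine-in in 10 min: %.4f%n", p2);
    }
}
```

The first probability could equally be written as 1 – F(5) for an Erlang distribution of order 2, since the time until the second arrival is the sum of two exponential interarrival times.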

109

Brief Intro. to Monte Carlo Simulation

• We discussed previously that our simulations will typically follow the dynamic model
Progress over time

• Stochastic simulations using the static model are often called Monte Carlo Simulations
Idea is to use random numbers to determine some quantity / value that could be very difficult to determine by other means
• Ex: Evaluating an integral that has no closed analytical form

110


• Before any formal definitions, let's consider a simple example
Let's assume we don't know the formula for the area of a circle, but we do know the formula for the area of a square
We'd like to somehow find the area of a circle of a given radius (let's say 1)

[Figure: circle of radius 1 inside a square with side length 2]

111


Let's generate a (large) number of random points known to be within the square
• We then test to see if each point is also within the circle
– Since we know the circle has a radius of 1, we can put its center at the origin and any random point a distance <= 1 from the origin is within the circle

• The ratio of points in the circle to total points generated should approximate the ratio of the area of circle to the area of the square

• We can then calculate the area of the circle by multiplying the area of the square by that ratio

• See Circle.java
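The steps above can be sketched in a few lines. This is not the course's Circle.java, just a minimal illustration with hypothetical names (`CircleAreaSketch`, `estimateArea`); it assumes the square spans [-1, 1] in both x and y, so its area is 4.

```java
import java.util.Random;

final class CircleAreaSketch {
    // Estimates the area of a circle of radius 1 centered at the origin
    // by sampling n uniform points in the enclosing square [-1,1] x [-1,1]
    static double estimateArea(int n, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < n; i++) {
            double x = 2.0 * rng.nextDouble() - 1.0;
            double y = 2.0 * rng.nextDouble() - 1.0;
            // Comparing squared distance to 1 avoids a square root
            if (x * x + y * y <= 1.0) inside++;
        }
        double squareArea = 4.0;
        // (area of circle) ~= (area of square) * (fraction of points in circle)
        return squareArea * inside / n;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.printf("N = %,d: area ~= %.5f%n", n, estimateArea(n, 1538L));
        }
    }
}
```

Fixing the seed makes a run repeatable; changing it gives a different estimate, and the spread between runs shrinks as N grows.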

112


• Some informal theory behind M.C.
Empirical probability
• Consider a random experiment with possible outcome C
• Run the experiment N times, counting the number of C outcomes, N_C
• The relative frequency of occurrence of C is the ratio N_C/N
• As N → ∞, N_C/N converges to the probability of C, or

$$p(C) = \lim_{N \to \infty} \frac{N_C}{N}$$

113


Axiomatic probability
• Set theoretic approach that determines probabilities of events based on the number of ways they can occur out of the total number of possible outcomes
• Gives the "true" probability of a given event, whereas empirical probability only gives an estimate (since we cannot actually have N be infinity)
• However, for complex situations this could be quite difficult to do

When axiomatic probability is not practical, empirical probability (via Monte Carlo sims) can often be a good substitute
• Can also be used to verify axiomatic results

114

Let's Make a Deal

• Ex: Famous Let's Make a Deal problem
Player is given choice of 3 curtains
• One has a grand prize
• Other two are duds
After player chooses a curtain, Monty shows one of the other two, which has a dud
• Now player has option to keep the same curtain or to switch to the remaining curtain
What should player do?
At first thought, it seems like it should not matter
• However, it DOES matter – player should always switch

115

Let's Make a Deal

We can look at this axiomatically
• Initially there is a 1/3 probability that the player's choice is correct and 2/3 that it is incorrect
• Revealing an incorrect curtain does not change that probability, so if the player does not switch, his/her chance of winning is still 1/3
• However, now what do we know?
– There is a 2/3 chance that the winning curtain is NOT the one originally picked
– Of that 2/3, there is a 0 chance that it is the curtain already revealed
– Therefore, there is a 2/3 chance that the remaining curtain is the winner, so we should switch to it

116

Let's Make a Deal

In case we are still skeptical, we can verify this result with a Monte Carlo Simulation
• See MontyMonte.java

Note that the larger our number of trials, the better our result agrees with the axiomatic result
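A simulation of the game might look like the following. This is a sketch, not the course's MontyMonte.java; the names `MontySketch` and `winRate` are hypothetical, and it assumes Monty always opens a dud curtain other than the player's pick, choosing at random when two are available.

```java
import java.util.Random;

final class MontySketch {
    // Fraction of games won over the given number of trials,
    // either always switching or always staying
    static double winRate(boolean switchCurtain, int trials, long seed) {
        Random rng = new Random(seed);
        int wins = 0;
        for (int i = 0; i < trials; i++) {
            int prize = rng.nextInt(3); // curtain hiding the grand prize
            int pick = rng.nextInt(3);  // player's initial choice
            // Monty opens a curtain that is neither the pick nor the prize
            int opened;
            do {
                opened = rng.nextInt(3);
            } while (opened == pick || opened == prize);
            // Curtains are numbered 0, 1, 2 (sum 3), so the remaining one is 3 - pick - opened
            int finalPick = switchCurtain ? 3 - pick - opened : pick;
            if (finalPick == prize) wins++;
        }
        return (double) wins / trials;
    }

    public static void main(String[] args) {
        int trials = 200_000;
        System.out.printf("Stay:   %.4f%n", winRate(false, trials, 1538L));
        System.out.printf("Switch: %.4f%n", winRate(true, trials, 1538L));
    }
}
```

With a large number of trials the two rates should settle near 1/3 and 2/3, in line with the axiomatic argument.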

117

Monte Carlo Integration

• Let's apply this idea to another common problem – evaluating an integral
Many integrals have no closed form and can also be very difficult to evaluate with "traditional" numerical methods

How can we utilize Monte Carlo simulation to evaluate these?

Let's look at this in a somewhat simplified way (i.e. we will be light on the theory)

118


Consider function f(x) that is defined and continuous on the range [a,b]
• The first mean value theorem for integral calculus states that there exists some number c, with a < c < b such that:

$$\frac{1}{b-a}\int_a^b f(x)\,dx = f(c) \quad \text{or} \quad \int_a^b f(x)\,dx = (b-a)\,f(c)$$

– The idea is that there is some point within the range (a,b) that is the "average" height of the curve
– So the area of the rectangle with length (b-a) and height f(c) is the same as the area under the curve

http://www.sosmath.com/calculus/integ/integ04/integ04.html

119


So now all we have to do is determine f(c) and we can evaluate the integral

We can estimate f(c) using Monte Carlo methods
• Choose N random values x1, … , xN in [a,b]
• Calculate the average (or expected) value, f̄(x), in that range:

$$\bar{f}(x) = \frac{1}{N}\sum_{i=1}^{N} f(x_i) \approx f(c)$$

• Now we can estimate the integral value as

$$\int_a^b f(x)\,dx \approx (b-a)\,\bar{f}(x)$$

120


There is some error in this, but as N → ∞ the error approaches 0
• It is inversely proportional to the square root of N
• Thus we may need a fairly large N to get satisfactory results

Let's look at a few simple examples
• In practice, these would be solved either analytically or through other numerical methods

• Monte Carlo methods are most useful for multiple integrals that are not analytically solvable

• See Monte.java
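The estimator (b – a) times the average of f at N uniform random points can be sketched as follows. This is not the course's Monte.java; `MonteCarloIntegralSketch` and `integrate` are hypothetical names, and the two test integrands are chosen only because their exact values (1/3 and 2) are known.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

final class MonteCarloIntegralSketch {
    // Estimates the integral of f over [a, b] as (b - a) * (average of f at n random points)
    static double integrate(DoubleUnaryOperator f, double a, double b, int n, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double x = a + (b - a) * rng.nextDouble(); // uniform point in [a, b)
            sum += f.applyAsDouble(x);
        }
        return (b - a) * (sum / n);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Integral of x^2 over [0, 1] is exactly 1/3
        System.out.printf("x^2 on [0,1]:    %.5f%n", integrate(x -> x * x, 0.0, 1.0, n, 1538L));
        // Integral of sin(x) over [0, pi] is exactly 2
        System.out.printf("sin x on [0,pi]: %.5f%n", integrate(Math::sin, 0.0, Math.PI, n, 1538L));
    }
}
```

Because the error shrinks only like one over the square root of N, gaining one more decimal digit of accuracy takes roughly 100 times as many points.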

121

Simple Queueing Theory

• Many simulations involve use of one or more queues
People waiting in line to be served
Jobs in a process or print queue
Cars at a toll booth
Orders to be shipped from a company

• Queueing Theory can get quite complex
We are interested in a few of the more important results / guidelines

122


• First, we should define queue characteristics in a consistent way
Standard Queueing Notation: A/B/c/N/K
• where A is the interarrival time distribution
• where B is the service-time distribution
– A and B can follow any distribution (ex: the ones we discussed in Chapter 5):
> D Deterministic
Distribution is not random (ex: real data that has been measured / calculated)

123


> M Exponential (Markov)
This is probably the most common and most studied of the random distributions – more on this below
> Ek Erlang of order k
> G General

So why is an exponential distribution called Markov?
• Relates to Markov processes (continuous time) and Markov chains (discrete time)

• Let's consider a Markov chain for now (it is easier to conceptualize)

124


• A set of random variables (or states) X1, X2, … forms a Markov Chain if the probability that we transition from state Xi to state Xi+1 does not depend on any of the previous states X1, … Xi-1

– In other words, the past history of the chain does not affect its future

– The idea is the same for a continuous time Markov process

• Let's consider now a random variable Y that describes how long it will be in one state before transitioning to a different state

– For example, in a queueing system, this could model how long before another arrival into the system (which changes the system state)

125


– This time should not depend on how long the process has been in the current state

> i.e. it must be memoryless

– As we discussed in Chapter 5, the exponential distribution is the only continuous distribution that is memoryless

– Thus, when arrival or service times are exponentially distributed, they are often called Markovian

– We will touch on a bit more of this theory later

– Now back to our description of the terminology… A/B/c/N/K

126


• where c is the number of parallel servers
– A single queue could feed into multiple servers
• where N is the system capacity
– Could be "infinite" if the queue can grow arbitrarily large (or at least larger than is ever necessary)
> Ex: a queue to go up the Eiffel Tower
– Space could be limited in the system
> Ex: a bank or any building
> This can affect the effective arrival rate, since some arrivals may have to be discarded
• where K is the size of the calling population
– How large is the pool of customers for the system?
– It could be some relatively small, fixed size
> Ex: The computers in a lab that may require service

127


– It could be very large (effectively infinite)
> Ex: The cars coming upon a toll booth

– The size of the calling population has an important effect on the arrival rate

> If the calling population is infinite, customers that are removed from the population and enter the queueing system do not affect the arrival rate of future customers (∞ – 1 = ∞)

> If the calling population is finite, removal of customers from the population (and putting them into and later out of the system) must affect future arrival rates
Ex: In a computer lab with 10 computers, each has a 10% chance of going down in a given day. If a computer goes down, the repair takes a mean of 2 days, exponentially distributed

128


In the first day, the expected number of failures is 10*0.1=1

However, once a failure occurs, the faulty computer is out of the calling population, so the expected number of failures in the next day is 9 * 0.1 = 0.9

Clearly this changes again if another computer fails

• What about the system you are testing in Programming Assignment 1?

– We cannot actually classify it with this notation, due to the single initial queue branching into multiple queues for the toll booths

– However, if we consider only the toll booth queues, we could make each queue M/M/1/10/∞

> However, since a single queue buffers all of the individual queues, the arrival rate into the queues will change if traffic backs up onto the single queue

129

Long-Run Measures of Performance

• What are some important queueing measurements?
L = long-run average number of customers in the system
LQ = long-run average number in queue
w = long-run average time spent in system
wQ = long-run average time spent in queue
ρ = server utilization (fraction of time server is busy)

130

Time-Average Number in System

• Let's discuss these in more detail
Time-Average Number in the System, L
• Given a queueing system operating for some period of time, T, we'd like to determine the time-weighted average number of people in the system, L̂
• We'd also like to know the time-weighted average number of people in the queue, L̂Q
– Note that for a single queue with a single server, if the system is always busy, L̂Q = L̂ – 1
> However, it is not the case when the server is idle part of the time
– The "hats" indicate that the values are "estimators" rather than analytically derived long-term values

131


• We can calculate L̂ for an interval [0,T] in a fairly straightforward manner using a sum:

$$\hat{L} = \frac{1}{T}\sum_{i=0}^{\infty} i\,T_i$$

– Note that each Ti here represents the total time that the system contained exactly i customers
> These may not be contiguous
– i is shown going to infinity, but in reality most queuing systems (especially stable queuing systems) will have all Ti = 0 for i > some value
> In other words, there is some maximum number in the system that is never exceeded
– See GrocerySimB.java

132


Let's think of this value in another way:
• Consider the number of customers in the system at any time, t
– L(t) = number of customers in system at time t
• This value changes as customers enter and leave the system
• We can graph this with t as the x-axis and L(t) as the y-axis
• Consider now the area under this plot from [0, T]
– It represents the sum of all of the customers in the system over all times from [0, T], which can be determined with an integral

$$\text{Area} = \int_0^T L(t)\,dt$$

133


• Now to get the time-average we just divide by T, or

$$\hat{L} = \frac{1}{T}\sum_{i=0}^{\infty} i\,T_i = \frac{1}{T}\int_0^T L(t)\,dt$$

– See on board

• For many stable systems, as T → ∞ (or, practically speaking, as T gets very large) L̂ approaches L, the long-run time-average number of customers in the system
– However, initial conditions may determine how large T must be before the long-run average is reached

The same principles can be applied to L̂Q, the time-average number in the queue, and LQ, the long-run time average number in the queue
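In a simulation, L(t) is piecewise constant, so the area under it can be accumulated one event at a time. The sketch below is not taken from GrocerySimB.java; the class name, the method name and the parallel-array event format (`times`, `delta`) are assumptions made for illustration.

```java
final class TimeAverageSketch {
    // times[j] = time of the j-th event, in increasing order
    // delta[j] = +1 for an arrival, -1 for a departure
    // Returns (1/T) * integral of L(t) over [0, T], with the system empty at time 0
    static double timeAverage(double[] times, int[] delta, double T) {
        double area = 0.0; // area under L(t) accumulated so far
        double last = 0.0; // time of the previous event
        int inSystem = 0;  // current value of L(t)
        for (int j = 0; j < times.length; j++) {
            area += inSystem * (times[j] - last); // rectangle since the previous event
            inSystem += delta[j];
            last = times[j];
        }
        area += inSystem * (T - last); // final stretch up to T
        return area / T;
    }

    public static void main(String[] args) {
        // Arrivals at t = 1 and t = 3, departures at t = 4 and t = 7, observed over [0, 10]
        double[] times = { 1.0, 3.0, 4.0, 7.0 };
        int[] delta = { +1, +1, -1, -1 };
        System.out.println(timeAverage(times, delta, 10.0));
    }
}
```

In this small example the system holds 0 customers for 4 time units, 1 customer for 5, and 2 customers for 1, so the sum form gives (0·4 + 1·5 + 2·1)/10 = 0.7, the same value as the area form.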

134

Average Time in System Per Customer

Average Time in System Per Customer, w

• This is also a straightforward calculation during our simulations

$$\hat{w} = \frac{1}{N}\sum_{i=1}^{N} W_i$$

– where N is the number of arrivals in the period [0,T]

– where each $W_i$ is the time customer i spends in the system during the period [0,T]

• If the system is stable, as N → ∞, ŵ → w

– w is the long-run average system time

• We can do similar calculations for the queue alone to get the values ŵQ and wQ

– We can think of these values as the observed delay and the long-run average delay per customer


135

Arrival Rates, Service Rates and Stability

We have seen a few times now

• "for stable systems…"

• What does this mean and what are the implications?

For simple queueing systems such as those we have been examining, stability can be determined in a fairly easy way

• The arrival rate, λ, must be less than the service rate

– i.e. customers must arrive with less frequency than they can be served

• Consider a simple single queue system with a single server (G/G/1/∞/∞)

– Define the service rate to be μ

– This system is stable if λ < μ

136


– If λ > μ, then, over a period of time there is a net rate of increase in the system of λ – μ

> This will lead to an increase in the number in the system (L(t)) without bound as t increases

– Note if λ == μ some systems (ex: deterministic) may be stable while others may not be

• If we have a system with multiple servers (ex: G/G/c/∞/∞) then it will be stable if the net service rate of all servers together is greater than the arrival rate

– If all servers have the same rate μ, then the system is stable if λ < cμ

• See GrocerySimB.java

137


• Note that if our system capacity or calling population (or both) are fixed, our system can be stable even if the arrival rate exceeds the service rate

– Ex: G/G/c/k/∞

> With a fixed system capacity, in a sense the system is unstable until it "fills" up to k. At this point, excess arrivals are not allowed into the system, so it is stable from that point on

> The idea here is that the net arrival rate decreases once the system has filled

– Ex: G/G/c/∞/k

> With a fixed calling population, we are in effect restricting the arrival rate

> As arrivals occur, the arrival rate decreases and the system again becomes stable

138

Conservation Law

An important law in queueing theory states

L = λw

• where L is the long-run number in the system, λ is the arrival rate and w is the long-run time in the system

– Discuss intuitively what this means

• Often called "Little's Equation"

• This holds for most queueing systems

• Text shows the derivation

– Read it but we will not cover it in detail here
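
The conservation law can be checked numerically on a small sample. The sketch below is only an illustration, assuming hypothetical arrival and departure times in which every customer both arrives and departs inside [0, T]; under that assumption the observed values satisfy the law exactly, because each customer adds their time in system to the area under L(t). The name `little_check` is made up for this example.

```python
def little_check(arrivals, departures, T):
    """arrivals[i], departures[i]: times customer i enters and leaves the system.
    Assumes every customer arrives and departs within [0, T]."""
    N = len(arrivals)
    waits = [d - a for a, d in zip(arrivals, departures)]
    lam_hat = N / T              # observed arrival rate
    w_hat = sum(waits) / N       # observed average time in system
    # Area under L(t) equals the total time all customers spent in the system
    L_hat = sum(waits) / T
    return L_hat, lam_hat, w_hat

# Hypothetical data: 5 customers observed over [0, 20]
arr = [1, 3, 6, 10, 14]
dep = [4, 7, 9, 15, 18]
L_hat, lam_hat, w_hat = little_check(arr, dep, 20)
print(L_hat, lam_hat * w_hat)
```

When customers are still in the system at time T the two sides differ slightly, and the gap shrinks as T grows for a stable system.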

139

Server Utilization

• What fraction of the time is the server busy?

• Clearly it is related to the other measures we have discussed (we will see the relationship shortly)

• As with our other measures we can calculate this for a given system (G/G/1/∞/∞)

$$\hat{\rho} = \frac{1}{T}\sum_{i=1}^{\infty} T_i$$

– We assume that if at least 1 customer is in the system, the server will be busy (which is why we start at $T_1$ rather than $T_0$)

• However, we can also calculate the server utilization based on the arrival and service rates

140

Server Utilization

G/G/1/∞/∞ systems

• Consider again the arrival rate λ and the service rate μ

• Consider only the server (without the queue)

• With a single server, it can be either busy or idle

• If it is busy, there is 1 customer in the "server system", otherwise there are 0 customers in the "server system" (excluding the queue)

– Thus we can define $L_s$ = ρ = average number of customers in the "server system"

• Using the conservation equation this gives us

– $L_s = \lambda_s w_s$

> where $\lambda_s$ is the rate of customers coming into the server and $w_s$ is the average time spent in the server

141

Server Utilization

– For the system to be stable, $\lambda_s = \lambda$, since we cannot serve faster than customers arrive and if we serve more slowly the line will grow indefinitely

– The average time spent in the server is simply 1/μ (i.e. 1/(rate of the server))

– Putting these together gives us

– $\rho = L_s = \lambda_s w_s = \lambda(1/\mu) = \lambda/\mu$

– Note that this indicates that a stable queueing system must have a server utilization of less than 1

> The closer we get to one, the less idle time for the server, but the longer the lines will be (probably)

> Actual line length depends a lot not just on the rates, but also on the variance – we will discuss shortly

– See GrocerySimB.java

142

Server Utilization

G/G/c/∞/∞ systems

• Applying the same techniques we did for the single server, recalling that for a stable system with c servers:

λ < cμ

• we end up with the final result ρ = λ/(cμ)

143

Markov Systems in Steady-State

• Consider stable M/G/c queueing systems

• Exponential interarrival times

• Arbitrary service times

• 1 or more servers

As long as they are kept relatively simple, we can calculate some of the long-run steady state performance measures for these analytically

May enable us to avoid a simulation for simple systems

May give us a good starting point even if the actual system is more complicated

144


First, what do we mean by steady-state?

• The probability that the system is in a particular state is not time-dependent

• For example, consider a stable queueing system S

– What is the probability that fewer than 10 people are in the queue?

– If we start with an empty queue, this probability is initially 1

– However, as the system runs over time, the length of the queue will approach its long-run average length LQ

> It no longer depends on the initial state

> The actual length will still vary, but only due to variations in the arrival and service times

145


Let's look at it another way

• Consider a stable M/G/1 queueing system with a given long-run average customer delay wQ

• We start the system with a certain number of customers, s, in the queue

• For each customer processed, we re-calculate the mean delay up to that point

– In other words (ŵQ)j = average customer delay up to and including customer j

• As j increases, (ŵQ)j → wQ, despite the different starting values of s

– When this occurs the steady-state has been reached

• See graph from Law text on board

146


We will not concentrate on the derivation of the formulas, but it is useful to know them and how to use them

M/G/1 Queue (see p. 222 of text for complete list)

$$L = \rho + \frac{\rho^2(1 + \sigma^2\mu^2)}{2(1-\rho)}$$

$$L_Q = \frac{\rho^2(1 + \sigma^2\mu^2)}{2(1-\rho)}$$

Note that ρ is as we previously defined it (and is equal to the average number in the server)

Note that σ² is the variance in the service times

Note that (intuitively) the long-run queue length is equal to the long-run number in the system minus the average number in the server

147


Let's look at these a bit more closely

• Consider first if σ² = 0

– i.e. the service times are all the same (= mean)

> For example a deterministic distribution

– In this case the equations for L and LQ are greatly simplified; for LQ we get

$$L_Q = \frac{\rho^2(1 + 0\cdot\mu^2)}{2(1-\rho)} = \frac{\rho^2}{2(1-\rho)}$$

– In this case LQ is dependent solely upon the server utilization, ρ

– Note as ρ → 0 (low server utilization) LQ → 0

– Note as ρ → 1 (high server utilization) LQ → ∞

148


• However, as σ² increases with a fixed utilization, LQ also increases

– Other measures such as w and wQ increase with σ² as well

– This indicates that all other factors being equal, a system with a lower variance will tend to have better performance

> In fact (See Ex. 6.9) in some cases a lower μ will give shorter lines than a higher μ, if it also has a lower σ²

Able: 1/μ = 24 min, σ² = 400 min²

Baker: 1/μ = 25 min, σ² = 4 min²

Able: LQ = 2.711, P0 = 0.20

Baker: LQ = 2.097, P0 = 0.167

> Note that Able has a longer long-run queue length despite his faster rate

> However, he also has a higher P0, indicating that more people experience no delay
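
The Able and Baker figures can be reproduced with a short sketch. It uses the M/G/1 queue-length formula in the equivalent form LQ = λ²(1/μ² + σ²) / (2(1 – ρ)), with P0 = 1 – ρ. The arrival rate is not stated on the slide; λ = 1/30 per minute is an assumption here, chosen because it is the value implied by P0 = 0.20 for Able (ρ = 24λ = 0.8). The function name `mg1_lq` is illustrative.

```python
def mg1_lq(lam, mean_service, var_service):
    """Long-run queue length and idle probability for a stable M/G/1 queue."""
    rho = lam * mean_service                  # rho = lambda / mu
    if rho >= 1:
        raise ValueError("unstable: rho must be < 1")
    lq = lam**2 * (mean_service**2 + var_service) / (2 * (1 - rho))
    return lq, 1 - rho                        # (L_Q, P_0)

lam = 1 / 30                    # arrivals per minute (assumed, see note above)
able = mg1_lq(lam, 24, 400)     # mean 24 min, variance 400 min^2
baker = mg1_lq(lam, 25, 4)      # mean 25 min, variance 4 min^2
print(able, baker)
```

Baker is slower on average but far more consistent, and the variance term in the numerator is what pulls his queue length below Able's.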

149


We can generalize this idea, comparing various distributions using the coefficient of variation, cv:

(cv)² = V(X)/[E(X)]²

– i.e. it is the ratio of the variance to the square of the expected value

• Distributions that have a larger cv have a larger LQ for a given server utilization, ρ

– Ex: Consider an exponential service distribution: V(X) = 1/μ² and E(X) = 1/μ, so (cv)² = (1/μ²)/[(1/μ)]² = 1

– See Figure 6.12 for chart of other distributions

150


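Consider again the special case of an exponential service distribution (M/M/1), then

• The mean service time is 1/μ and the variance on the service time is 1/μ²

This simplifies our equations to

$$L = \rho + \frac{\rho^2\left(1 + \frac{1}{\mu^2}\mu^2\right)}{2(1-\rho)} = \rho + \frac{\rho^2}{1-\rho} = \frac{\rho(1-\rho) + \rho^2}{1-\rho} = \frac{\rho}{1-\rho}$$

$$L_Q = \frac{\rho^2\left(1 + \frac{1}{\mu^2}\mu^2\right)}{2(1-\rho)} = \frac{\rho^2}{1-\rho}$$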

151


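• The M/M/1 queue gives us a closed form expression for a measure that the M/G/1 queue does not:

– $P_n$, the long-run probability that there will be exactly n customers in the system

$$P_n = (1-\rho)\rho^n$$

– Note: We can use this value to show L for this system

$$L = \sum_{i=0}^{\infty} i\,P_i = 0[1-\rho] + 1[1-\rho]\rho + 2[1-\rho]\rho^2 + 3[1-\rho]\rho^3 + \cdots$$

$$= \rho - \rho^2 + 2\rho^2 - 2\rho^3 + 3\rho^3 - 3\rho^4 + \cdots$$

$$= \rho + \rho^2 + \rho^3 + \cdots = \rho(1 + \rho + \rho^2 + \cdots) = \rho\left(\frac{1}{1-\rho}\right) = \frac{\rho}{1-\rho}$$

> The last equality is based on the solution to an infinite geometric series when the base is < 1

> We know ρ < 1 since system must be stable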

152


The text has formulas for a number of different queue possibilities:

• M/G/1 already discussed

• M/M/1 already discussed

• M/M/c Exponential arrivals and service with multiple servers

• M/G/∞ "Infinite" servers (or service capacity >> arrivals, or "how many servers are needed"?)

• M/M/c/N/∞ Limited capacity system

• M/M/c/K/K Finite population

Let's look at an example where these formulas would be used

153


Exercise 6.6

– Patients arrive for a physical exam according to a Poisson process at the rate of 1/hr

– The physical exam requires 3 stages, each one independently and exponentially distributed with a mean service time of 15 min

– A patient must go through all 3 stages before the next patient is admitted to the facility

– Determine the average number of delayed patients, LQ, for this system

– Hint: The variance of the sum of independent random variables is the sum of the variances

> Discuss and develop the solution
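
One possible way to set up the calculation, as a sketch rather than the official solution: treat the whole three-stage exam as a single M/G/1 server, so the service time is the sum of three independent exponential stages. The code assumes each stage has mean 15 min (0.25 hr) and uses the fact that an exponential with mean m has variance m². Variable names are illustrative.

```python
lam = 1.0                  # patients per hour
stage_mean = 0.25          # each stage: exponential with mean 15 min = 0.25 hr
stages = 3

mean_service = stages * stage_mean        # means add: 0.75 hr
# Exponential variance is mean^2, and independent variances add (the hint)
var_service = stages * stage_mean**2      # 0.1875 hr^2

rho = lam * mean_service                  # server utilization
lq = lam**2 * (mean_service**2 + var_service) / (2 * (1 - rho))
print(rho, lq)
```

Note that the total service time is not exponential: its variance (0.1875 hr²) is only a third of what an exponential with the same 0.75 hr mean would have, which is why the M/G/1 formula is needed rather than the M/M/1 one.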

154


Exercise 6.23

– Copy shop has a self-service copier

– Room in the shop for only 4 people (including the person using the machine)

> Others must line up outside the shop

> This is not desirable to the owners

– Customers arrive at rate of 24/hr

– Average use time is 2 minutes

– What impact will adding another copier have on this system?

• How to solve this?

– Determine the systems in question

– Calculate for each system the probability that 5 or more people will be in the system at steady-state
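
A sketch of that comparison, assuming the use times are exponential (the slide gives only the 2-minute average) so that the two systems are M/M/1 and M/M/2. It uses the standard M/M/c steady-state probabilities; the function name `mmc_prob_at_least` is made up for this example.

```python
from math import factorial

def mmc_prob_at_least(n_min, lam, mu, c):
    """P(N >= n_min) at steady state for an M/M/c queue with unlimited waiting room."""
    a = lam / mu                   # offered load
    rho = a / c                    # per-server utilization
    if rho >= 1:
        raise ValueError("unstable: lambda must be < c * mu")
    p0 = 1 / (sum(a**n / factorial(n) for n in range(c))
              + a**c / (factorial(c) * (1 - rho)))

    def p(n):                      # P(exactly n in system)
        if n <= c:
            return a**n / factorial(n) * p0
        return a**n / (factorial(c) * c**(n - c)) * p0

    return 1 - sum(p(n) for n in range(n_min))

lam, mu = 24, 30                   # per hour; a 2-minute mean use time gives mu = 30/hr
one_copier = mmc_prob_at_least(5, lam, mu, 1)
two_copiers = mmc_prob_at_least(5, lam, mu, 2)
print(one_copier, two_copiers)
```

With one copier the result equals ρ⁵ for ρ = 0.8, roughly a one-in-three chance that someone is waiting outside; the second copier cuts that to well under 2%.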

155

Markov Systems in Steady State

Exercise 6.21:

– A large consumer shopping mall is to be constructed

– During busy periods the expected rate of car arrivals is 1000 per hour

– Customers spend 3 hrs on average shopping

– Designers would like there to be enough spots in the lot 99.9% of the time

– How many spaces should they have?

• We will model this as an M/G/∞ queue, where the spaces are the servers

– We want to know how many servers, c, are necessary so that the probability of having c+1 or more customers in the system is < 0.001
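
A sketch of the search for c, relying on the standard result that the steady-state number in an M/G/∞ system is Poisson with mean λ/μ (here 1000 × 3 = 3000 cars). The Poisson probabilities are accumulated in log space because the individual terms overflow or underflow otherwise. The function name `spaces_needed` is illustrative.

```python
from math import exp, lgamma, log

def spaces_needed(lam, mean_stay, overflow_prob):
    """Smallest c with P(N > c) < overflow_prob, where N ~ Poisson(lam * mean_stay)."""
    m = lam * mean_stay
    cdf, c = 0.0, -1
    while 1 - cdf >= overflow_prob:
        c += 1
        # Poisson pmf e^-m * m^c / c!, evaluated via logs to stay in range
        cdf += exp(-m + c * log(m) - lgamma(c + 1))
    return c

c = spaces_needed(1000, 3, 0.001)
print(c)
```

As a sanity check, a normal approximation gives about 3000 + 3.09·√3000, i.e. a lot of roughly 3170 spaces, and the exact search lands in the same neighborhood.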

156


Text also has some examples showing the M/M/c/N Queue and the M/M/c/K/K Queue

• Idea is similar but the formulas and applications are different

157


• Before we finish with this topic…

Let's just talk a LITTLE bit about how these formulas are derived (being light on theory)

Let's look at the simplest case, an M/M/1 queue with arrival rate λ and service rate μ

Consider the state of the system to be the number of customers in the system

We can then form a state-transition diagram for this system

• A transition from state k to state k+1 occurs at rate λ

• A transition from state k+1 to state k occurs at rate μ

158


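[State-transition diagram over states 0, 1, 2, …, k-1, k, k+1, …]

From this we can obtain

$$P_k = P_0\prod_{i=0}^{k-1}\frac{\lambda}{\mu} = P_0\left(\frac{\lambda}{\mu}\right)^k \quad \text{(Eq. 138.1)}$$

We also know that

$$\sum_{k=0}^{\infty} P_k = 1$$

• Since the sum of the probabilities in the distribution must equal 1

• This will allow us to solve for P0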

159


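$$1 = \sum_{k=0}^{\infty} P_k = P_0 + \sum_{k=1}^{\infty} P_0\left(\frac{\lambda}{\mu}\right)^k = P_0\left[1 + \sum_{k=1}^{\infty}\left(\frac{\lambda}{\mu}\right)^k\right]$$

$$P_0 = \frac{1}{1 + \sum_{k=1}^{\infty}(\lambda/\mu)^k}$$

• All that is needed for this derivation is fairly simple algebra

• Before completing the derivation, we must note an important requirement: to be stable, the system utilization λ/μ < 1 must be true

• This allows the series to converge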

160


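$$P_0 = \frac{1}{1 + \sum_{k=1}^{\infty}(\lambda/\mu)^k} = \frac{1}{1 + \frac{\lambda/\mu}{1-(\lambda/\mu)}} = \frac{1}{\frac{1-(\lambda/\mu)+\lambda/\mu}{1-(\lambda/\mu)}} = 1 - \frac{\lambda}{\mu} = 1 - \rho$$

• Which is the solution for P0 from the M/G/1 Queue in Table 6.3

Utilizing Eq. 138.1, we can substitute back to get

$$P_k = \left(\frac{\lambda}{\mu}\right)^k P_0 = \rho^k(1-\rho)$$

• Which is the formula indicated in Table 6.4

• The other values can also be derived in a similar manner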

161


A lot of theory has been left out of this process

If you want to read more about queueing theory, there are a number of books on the topic:

• Queueing Systems Volume 1: Theory by L. Kleinrock (Wiley and Sons, 1975)

– Old but still relevant and still available – well known text on the topic

• Many newer textbooks as well

– Go to Amazon.com, select "Books" and search "Queueing Theory"

One final note: spelling!

• Both "Queueing Theory" and "Queuing Theory" seem to be acceptable (and both spellings are in dictionary)

– Search both to cover all options

162

Random Numbers

• Stochastic simulations require random data

If we want true random data, we CANNOT use an algorithm to derive it

• We must obtain it from some process that itself has random behavior

– http://en.wikipedia.org/wiki/Hardware_random_number_generator

• Ex: thermal noise

– http://noosphere.princeton.edu/reg.html

• Ex: atmospheric noise

– http://random.org/essay.html

• Ex: radioactive decay

– http://www.fourmilab.ch/hotbits/

– http://www.enginova.com/radioactive_random_number_genera.htm


163

Pseudo-Random Numbers

However, these require either purchase of hardware or a service

For stochastic simulations we also need very large amounts of random data quickly

• Possibly more than can be obtained from a true random source in a reasonable amount of time

Often in testing and running our simulations, we also want to reuse the same "random" values

• If we are debugging it helps to be able to see the execution repeatedly with the same data

• If we are comparing systems, we may want to use the same data on both systems

– Ex: Project 1

164


• Thus, more often than not, simulations rely on pseudo-random numbers

These numbers are generated deterministically (i.e. can be reproduced)

However, they have many (most, we hope) of the properties of true random numbers:

• Numbers are distributed uniformly on [0,1]

– Assuming a generator from [0,1), which is the most common

• Numbers should show no correlation with each other

– Must appear to be independent

– There are no discernible patterns in the numbers

165


Let's look at a decidedly non-random distribution in the range [0,1)

• Ex: Assume 99 numbers, X1…X99 are generated

$X_1 = 0.01$

$X_i = X_{i-1} + 0.01$

– This will generate the sequence of numbers 0.01, 0.02, 0.03, … , 0.99

– Clearly these numbers are uniformly distributed throughout the range of values

– However, also as clearly they are not independent and thus would not be considered to be "random"

– Often, uniformity and independence are not as obvious, and must be determined mathematically

> We will discuss how to test for each shortly

166

Linear Congruential Generators

• Linear Congruential Generators

These are perhaps the best known pseudo-random number generators

• Easy and fairly efficient

– Depending upon parameter choices

• Simple to reproduce sequences

• Give good results (for the most part, and when used properly)

– Again depending upon parameter choices

Based on mathematical operations of multiplication and modulus

167


Standard Equation:

$X_0$ = seed value

$X_{i+1} = (aX_i + c) \bmod m$ for i = 0, 1, 2, …

– where

a = multiplier

c = increment

m = modulus

• For c == 0 it is called a multiplicative congruential generator

• For c != 0 it is called a mixed congruential generator

– Both can achieve good results

• Initially proposed by Lehmer and studied extensively by Knuth

– See references at end of the chapter

168


Note that the generator as shown will produce integers

If we want numbers in the range [0,1) we will have to convert the integer to a float in some way

• This can be done in a fairly straightforward way by dividing by m

• However, sophisticated generators can do the conversion in a better (more efficient) way

• Clearly, however, the more possible integers, the denser the values will be in the range [0,1)

169


For now, consider three properties

• The density of the distribution

– How many different values in the range can be generated?

• The period of the generator

– How many numbers will be generated before the generator cycles?

> Since the values are deterministic this will inevitably happen

> Clearly, a large period is desirable, especially if a lot of numbers will be needed

> A large period also implies a denser distribution

• The ease of calculation

– We'd like the numbers to be generated reasonably quickly, with few complex operations

170


Consider a multiplicative linear congruential generator (i.e. c == 0)

$X_{i+1} = (aX_i) \bmod m$

• If m is prime, this will produce a maximum period of (m–1) if $a^k - 1$ is not divisible by m for all k < (m–1)

– The period and density here are good, but with m as a prime, the mod calculation is somewhat time-consuming

• If $m = 2^b$ for some b, this will produce a maximum period of $2^{b-2}$ if X0 is odd and either a = 3 + 8k or a = 5 + 8k for some k

– This allows more efficient calculation of the numbers, but gives a smaller period

– The importance of a good seed also makes this less practical (programmer must understand algorithm and seed requirements to use generator effectively)
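
The seed sensitivity of the power-of-two case is easy to see on a toy generator. The sketch below assumes illustrative parameters m = 2⁶ = 64 and a = 13 (which has the form 5 + 8k); these are far too small for real use and are chosen only so the cycle can be counted directly. The helper names `lcg` and `period` are made up for this example.

```python
def lcg(a, c, m, seed):
    """Yield X1, X2, ... from X_{i+1} = (a*X_i + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(a, c, m, seed):
    """Number of steps between the first repeated value and its earlier occurrence."""
    seen = {}
    for i, x in enumerate(lcg(a, c, m, seed)):
        if x in seen:
            return i - seen[x]
        seen[x] = i

odd_seed = period(13, 0, 64, 1)    # odd seed: the maximum 2^(6-2) = 16
even_seed = period(13, 0, 64, 2)   # even seed: a shorter cycle
# Scale the integers into [0, 1) by dividing by m
u = [x / 64 for _, x in zip(range(5), lcg(13, 0, 64, 1))]
print(odd_seed, even_seed, u)
```

Only 16 of the 64 possible values ever appear even in the best case, which is the "smaller period" cost noted above.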

171


Consider a mixed linear congruential generator (i.e. c != 0)

$X_{i+1} = (aX_i + c) \bmod m$

• If $m = 2^b$ for some b, this will produce a maximum period of $2^b$ if c and m are relatively prime and a = (1+4k) for some k

• This allows a good period, an easy mod calculation with the relatively small overhead of an addition

• This is the generator used by Java in the JDK

• Let's take a look at that in more detail

• See JDKRandom.java

• See TestRandom.java
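
For readers without the handouts, here is a rough Python sketch of the 48-bit mixed generator described in the `java.util.Random` documentation. It is not the contents of JDKRandom.java; the class and method names are invented for this example, and only `next_int` and `next_double` are imitated.

```python
class JavaLikeRandom:
    """Sketch of the 48-bit mixed LCG documented for java.util.Random."""
    A = 0x5DEECE66D           # multiplier, of the form 1 + 4k
    C = 0xB                   # increment, odd, so relatively prime to m
    MASK = (1 << 48) - 1      # m = 2^48, so "mod m" is just a bit mask

    def __init__(self, seed):
        self.state = (seed ^ self.A) & self.MASK    # initial seed scrambling

    def _next(self, bits):
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state >> (48 - bits)            # keep only the high-order bits

    def next_int(self):
        v = self._next(32)
        return v - (1 << 32) if v >= (1 << 31) else v   # reinterpret as signed 32-bit

    def next_double(self):
        # 53 random bits scaled into [0, 1)
        return ((self._next(26) << 27) + self._next(27)) / (1 << 53)

r = JavaLikeRandom(0)
first = r.next_int()
print(first)
```

Two design points are worth noticing: the modulus is a power of two so the mod is a cheap mask, and the low-order bits (which have short periods in a power-of-two LCG) are discarded by the shift.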

172

Quality of Linear Congruential Generators

The previous criteria for m, a and c can guarantee a full period of m (or m-1 for multiplicative congruential generators)

• However, this does not guarantee that the generator will be good

• We must also check for uniformity and independence in the values generated

• General criteria for good values of m, a and c are still not well established

• If you are creating a new LCG, a good rule of thumb is:

– Choose m, a and c to guarantee a good period

– Test the resulting generator for uniformity and independence

173

Chi Square Testing for Uniformity

• Idea:

Consider discrete random variable X, which has possible values x1, x2, …, xk and probabilities p1, p2, …, pk (which sum to 1)

Assume n random values for X are chosen

Then the expected number of times each value will occur is Ei = npi

Now assume n random values of distribution Y thought to be the same as X are chosen

• How can we tell if distribution Y is the same as X?

• We can at least get a good idea if the occurrences more or less match those of the Ei

174

Chi Square Testing for Uniformity

Ex: Consider our dice-throwing example from earlier this term (see Slides 12-13)

E2 = (n/36), E3 = (2n/36), E4 = (3n/36), E5 = (4n/36)

E6 = (5n/36), E7 = (6n/36), E8 = (5n/36), E9 = (4n/36)

E10 = (3n/36), E11 = (2n/36), E12 = (n/36)

• Now consider a sequence of n = 360 rolls with the following distribution

N2 = 8, N3 = 19, N4 = 35, N5 = 38, N6 = 53, N7 = 64

N8 = 46, N9 = 40, N10 = 27, N11 = 17, N12 = 13

– Question: Are these dice "fair"?

• We need a way of comparing empirical results with theoretical predictions

– We can use the Chi-Square (χ²) test for this

175

Chi Square Test

Consider the following formula:

$$V = \sum_{i=1}^{k}\frac{(Y_i - E_i)^2}{E_i}$$

• Where Yi is the observed number of occurrences of value xi and Ei is the expected number of occurrences of value xi

• This is showing the square of the differences between the observed values and the expected values (divided by the expected values to normalize it)

• Now consider two hypotheses:

– NULL Hypothesis, H0 : Y matches distribution of X

– Hypothesis H1: Y does not match distribution of X

176

Chi Square Test

• If V is too large, we reject the null hypothesis (i.e. the distributions do not match)

• If V is small(ish) we do not reject the null hypothesis (i.e. the distributions may match)

Staying with our dice example:

V = (8-10)²/10 + (19-20)²/20 + (35-30)²/30 + (38-40)²/40 + (53-50)²/50 + (64-60)²/60 + (46-50)²/50 + (40-40)²/40 + (27-30)²/30 + (17-20)²/20 + (13-10)²/10

= 4/10 + 1/20 + 25/30 + 4/40 + 9/50 + 16/60 + 16/50 + 0/40 + 9/30 + 9/20 + 9/10

= 228/60

• Ok, so what does this number mean?

• Should we accept the null hypothesis or reject it?
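
The arithmetic for the dice data is easy to check in code. This sketch computes V as the sum of (Yi – Ei)²/Ei using the observed counts listed for the n = 360 rolls; the function name `chi_square` is illustrative.

```python
def chi_square(observed, expected):
    """V = sum over categories of (Y_i - E_i)^2 / E_i."""
    return sum((y - e) ** 2 / e for y, e in zip(observed, expected))

n = 360
ways = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]      # ways to roll totals 2..12 with two dice
expected = [n * w / 36 for w in ways]          # E_2 .. E_12
observed = [8, 19, 35, 38, 53, 64, 46, 40, 27, 17, 13]
V = chi_square(observed, expected)
print(V)
```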

177

Chi Square Test

• To answer this question we need to look at the test a bit more closely

• The test is called the Chi Square Test because for n large enough (standard rule is that each Yi should be at least 5) the value for V gives a Chi Square distribution under the null hypothesis

– Since V is normalized it is not dependent on the original distributions – only the differences between them and the "degrees of freedom" (k – 1, where k is the number of possible values of the random variable).

– Thus a standard chart can be used – see p. 584 in text to see how it looks

– The idea is that, given the null hypothesis, it is more likely that V will fall into the "middle" part than either end

178

Chi Square Test

– The closer we get to the ends (the "tails") the less likely it is that V should fall here, given the null hypothesis

– Thus if V is too large (or in some cases small), we reject the null hypothesis

• How to determine values for "too large" or "too small"?

– We must set a level of significance, α = Pr(reject H0 | H0 true)

> Or, α is the probability that we reject the null hypothesis even though it is true (i.e. the probability that we reject it by mistake)

> Clearly, the lower the value for α the less likely we are to reject a distribution by mistake

> Ex: if α = 0.1, we are saying we are willing to accept a 10% chance of rejecting the null hypothesis when it is in fact true

179

Chi Square Test

– Mapping this onto our Chi Square graph, we locate the portion of the graph that corresponds to our level of significance, α

> For example, if α = 0.1, which values of V would cause us to reject the null hypothesis?

> This corresponds to the area in the Chi Square graph where there is less than 0.1 chance that V would legitimately be mapped

> See handout

– Depending upon the application, we can use the full α on one "tail" of the graph (one-sided test)

> Usually the right tail – in this case V can be 0 but should not be too large

– We may also want to split α between the left and right "tails" of the graph

> If it is either too big or too small we reject the null hypothesis

180

Chi Square Test

Consider again our dice-tossing experiment

• We have 10 degrees of freedom (since we have 11 possible values)

• Our value for V = 228/60 = 3.8

• Assume our level of significance is 0.05, using a two-tailed test (so 0.025 on each side)

– In other words, if the observed counts deviate from the expected values by too much, OR if they match them too closely, we will assume that the sequence of throws is not random

• Looking at the charts in the handout, we see:

– For the right tail, with tail probability 0.025, the critical value of V for 10 degrees of freedom is 20.483

> Since our value of V is much less than this, we do not reject the null hypothesis on this basis

181

Chi Square Test

– For the left tail, with tail probability 0.025, we actually want the critical value for (1 – 0.025) = 0.975

> The value of V here for 10 degrees of freedom is 3.247

> Since our value of V is GREATER than this (but not by a lot) we do not reject the null hypothesis on this basis either

• Note that under repeated testing, there is a reasonable chance that we WILL occasionally reject a null hypothesis (perhaps) improperly

– Ex: if α = 0.1 and we do 100 trials, we are likely to reject about 10 of them even if the distribution is valid

– We need to take this into account during testing

182

Chi Square Test

So how do we apply this test to our Uniform distribution?

• After all, the values shown are discrete, and the uniform distribution is continuous!

We simply divide U[0,1) into subranges that represent each discrete value

• Ex: For 10 subranges we can have {[0,0.1), [0.1,0.2), …, [0.9,1.0)}

• Then we count the values in each subrange and perform the Chi Square test as indicated

– Note that for a uniform distribution all of the Ei will be the same = n/(# of subranges)

• We probably want the two-tail test here (why?)
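
The binning step can be sketched as follows. This is not RandTest1.java; it is a minimal illustration that uses Python's built-in generator as the source of values, with an arbitrary fixed seed (1538) so the run is repeatable. The function name `chi_square_uniform` is made up here.

```python
import random

def chi_square_uniform(values, bins=10):
    """Chi-square statistic for values in [0, 1) against U[0, 1)."""
    counts = [0] * bins
    for u in values:
        # Subrange index 0 .. bins-1; min() guards against float round-up
        counts[min(int(u * bins), bins - 1)] += 1
    e = len(values) / bins            # same expected count in every subrange
    return sum((y - e) ** 2 / e for y in counts)

rng = random.Random(1538)             # arbitrary fixed seed
V = chi_square_uniform([rng.random() for _ in range(1000)])
print(V)   # compare with chi-square critical values for bins - 1 = 9 degrees of freedom
```

A perfectly even spread gives V = 0, which is exactly the "too good to be true" case that the left tail of a two-tail test is meant to catch.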

183

Chi Square Test

Look at handout RandTest1.java

• Here some linear congruential generators are tested with the Chi Square Test

• Note the comments and why some of the generators are poor in some situations

The Chi Square test is not perfect

• Doesn't work well for small number of trials

• As shown does not directly test independence

– Thus we need other tests as well before deciding upon quality of a random number generator

• However, Chi Square can be applied in conjunction with other tests – we may see this later

184

Kolmogorov-Smirnov Test

• Another test for uniformity is the Kolmogorov-Smirnov test

Has a different approach than Chi-Square

• Does not count number of values in each subrange of U[0,1)

• Instead it compares the empirical distribution (i.e. the numbers generated) to the cumulative distribution function (cdf) for a uniform distribution

– Recall from Chapter 5 and Slide 84: F(x) = x where 0 <= x <= 1

– Think about what this means

> Discuss

185


– Now consider the empirical distribution generated by our random number generator (assume N values are generated)

> Call it SN(x)

– Of the N total values generated by our random number generator, say that k of them are <= x

> Then SN(x) in this case is k/N

– For any value x, we can determine SN(x) by counting the number of values <= x and dividing by N

– For example, consider the following 10 values:

(0.275, 0.547, 0.171, 0.133, 0.865, 0.112, 0.806, 0.155, 0.572, 0.222)

SN(0.25) = 5/10 since 5 values above are <= 0.25

– K-S test checks to see how much SN(x) differs from F(x)

> If it is too much we reject the null hypothesis

186


More specifically:

• Sort the empirical data

• Calculate the maximum "SN(x) above F(x)" value, D+ for the empirical data

• Calculate the maximum "SN(x) below F(x)" value, D- for the empirical data

– Idea is that if the empirical data matches the uniform distribution, it should not ever be too much above or below the cdf – See Figure 7.2

• Test max(D+, D-) for the desired level of significance against the table of critical values for the given degree of freedom (see Table A.8)

– If it is too high, reject the null hypothesis

187


• Ex: For previous data, call it R

(0.275, 0.547, 0.171, 0.133, 0.865, 0.112, 0.806, 0.155, 0.572, 0.222)

• Sorted, and compared to F(x) (actually, i/N), we get

Ri: (0.112, 0.133, 0.155, 0.171, 0.222, 0.275, 0.547, 0.572, 0.806, 0.865)

i/N: (0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900, 1.0)

• i/N – Ri

( ----- , 0.067, 0.145, 0.229, 0.278, 0.325, 0.153, 0.228, 0.094, 0.135)

• Ri – (i-1)/N

(0.112, 0.033, ----- , ----- , ----- , ----- , ----- , ----- , 0.006, ----- )

• D+ = 0.325; D- = 0.112

• D = 0.325

• Assume α = 0.05

– N = 10, critical value = 0.41 (from Table A.8)

– So the null hypothesis is not rejected
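
The same calculation as a short sketch, using the ten values from the example and the 0.41 critical value quoted from Table A.8. The function name `ks_statistic` is illustrative.

```python
def ks_statistic(values):
    """Return (D+, D-, D) for values in [0, 1) against the U[0, 1) cdf F(x) = x."""
    r = sorted(values)
    n = len(r)
    # enumerate() is 0-based, so (i + 1)/n is the slide's i/N and i/n is (i-1)/N
    d_plus = max((i + 1) / n - x for i, x in enumerate(r))
    d_minus = max(x - i / n for i, x in enumerate(r))
    return d_plus, d_minus, max(d_plus, d_minus)

R = [0.275, 0.547, 0.171, 0.133, 0.865, 0.112, 0.806, 0.155, 0.572, 0.222]
d_plus, d_minus, d = ks_statistic(R)
reject = d > 0.41        # critical value for N = 10, alpha = 0.05
print(d_plus, d_minus, d, reject)
```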

188

Chi-Square vs. Kolmogorov-Smirnov

There is debate about which test is more effective

• Each seems to work better under certain circumstances

• It is probably best to run both tests on the data to thoroughly test for uniformity

189

Tests for Independence

• There are many tests for independence

There are a lot of ways data can correlate

From one point of view it could appear independent, while from another it could appear very non-independent

For best results, many different independence tests should be run on the data

• Only if it "passes" them all should the generator be accepted

We will briefly look at a few of these tests

• Most not taken from the text

– Many taken from previous (3rd) edition

190

Runs Tests

A run can be considered a sequence of events whose outcome is "the same"

• For example, a sequence of increasing values

– Old game show "Card Sharks" was based on this

> 2, 5, 8, J, Q, A

In random data the number and length of various runs should not be too great (or too small)

• If so, it indicates some non-independence of the data

Many types of runs can be tested

• We'll look at a few

191

Runs Tests

Runs up and Runs down

• In a random sequence of N values in U[0,1), some properties of the runs up and runs down are

Max = N – 1 (every other number switches)

Min = 1 (monotone increasing or decreasing)

μ = (2N – 1)/3

σ² = (16N – 29)/90

(don't worry about why this is true)

• If N > 20, this distribution is about normal

• Assume we observed r actual runs in our data

– If our data is random, r should not be too far from the mean

> If it is we reject it

192

Runs Tests

– We normalize using our standard formula:

Z0 = (r – μ)/σ

– And look up the result in our normal table

> If α is our level of significance, Z0 should not fall in either α/2 tail of the normal curve, i.e. we need –z(α/2) <= Z0 <= z(α/2)

> If it does we reject the null hypothesis

– See example in handout and show on board
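
A sketch of the runs up and runs down count, using the standard mean (2N – 1)/3 and variance (16N – 29)/90 for the number of runs. The ten sample values are hypothetical and chosen only to make the count easy to follow by hand; with N this small the normal approximation is not really justified, so Z0 here is for illustration only. The function name `runs_up_down` is made up.

```python
from math import sqrt

def runs_up_down(values):
    """Return (r, Z0): the number of runs up/down and its normalized value."""
    n = len(values)
    # Direction of each step: True for up, False for down (ties count as down)
    ups = [b > a for a, b in zip(values, values[1:])]
    # A new run starts every time the direction changes
    r = 1 + sum(1 for p, q in zip(ups, ups[1:]) if p != q)
    mu = (2 * n - 1) / 3
    sigma = sqrt((16 * n - 29) / 90)
    return r, (r - mu) / sigma

r, z0 = runs_up_down([0.41, 0.68, 0.89, 0.94, 0.74, 0.91, 0.55, 0.62, 0.36, 0.27])
print(r, z0)
```

Z0 would then be compared with the normal table value for α/2, e.g. 1.96 for α = 0.05.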

Runs above and below the mean

• Runs up and down don't indicate any absolute organization of the data (or lack thereof)

• We may also want to test the number of runs (i.e. consecutive values) above and below the mean

– Too many or too few is not a good sign

193

Runs Tests

• Calculation for mean and variance is more complex, but we handle this test in more or less the same way as the runs up and runs down test

– Don't want our number to be too far from the mean

– Use the standard normal distribution to test

– See handout for formulas

We can also test the lengths of runs for both of our previous options

• The math in this case is more complex, but the idea is similar

– Given a sequence of random numbers, we can expect a certain number of runs of each length 1, 2, 3, …

> Clearly the probability should get lower as the lengths increase

194

Runs Tests

– If the run lengths in our empirical tests differ too much from the expected distribution, we can reject the generator

– Idea is to determine the empirical distribution of runs of various lengths, then compare it with the expected distribution given random data

> The distributions are compared using the Chi-Square test

– See handout for details

195

Other Tests

Other tests for independence include

• Autocorrelation tests

– Tests relationship of items m locations apart

> See simple ex on board

• Gap tests

– Tests the gap between repetitions of the same digit

> For U[0,1) we can discretize it and test for the gap between values in the same bin

> See simple ex on board

• Poker test

– Tests frequencies of digits within numbers

– Looks at possible poker hands and their frequencies vs. the expected values

> See simple ex on board

196

Other Tests

• Serial Test

– Let's start with an example here:

– Given a sequence of random values in U[0,1):

X1, X2, X3, X4, …

– Consider the values in non-overlapping pairs

(X1, X2), (X3, X4), …

– Now let's consider the distribution of the pairs of values in a plane

> We can approach this in a similar way to what we did with the Chi-Square frequency test

> Now, however, rather than dividing U[0,1) into (for example) 10 bins we instead divide [0,1)² into (for example) 100 bins, one for each pair of subranges

> We then check the empirical results with the expected values for each bin (using Chi-Square)

197

Other Tests

– We can generalize this to an arbitrary number of dimensions

– However, note that the number of bins increases dramatically as we increase dimensions

> Recall the Chi-Square is only effective if the number of values in each bin is ~5 or more

> For large dimensions we have to generate an incredibly large number of values to obtain accurate results

– However, if our generator "passes" in 2 and 3 dimensions, for many purposes it should be ok

> As long as it passes other tests as well


Other Tests

There are many other empirical tests that we can run on our generators

There are also theoretical ways to test generators
• In this case we do not generate any actual values with a generator

• Rather, we use the generation algorithm to determine how well (or poorly) it "can perform"

Two famous theoretical tests are the spectral test and the lattice test

– Both are similar in what they are trying to do

• Measure the theoretical randomness / distribution of sequences of values generated


Spectral Test & Lattice Test

These tests are (more or less) theoretical extensions of the serial test

Let's look at a simple example with sets of pairs
• $(X_n, X_{n+1})$ for n = 1 up to period – 1.
• If we consider these sets in two-dimensional space, we get points in a plane
– These points tend to fall on a set of parallel lines
> Show on board

– The more lines required to touch all of the points, the better the distribution (for d = 2)

– If few lines are required, it indicates that the generator does not disperse the pairs well in 2 dimensions

– See handout
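The parallel-line structure can be seen with a tiny generator. The sketch below assumes a multiplicative LCG x → (a·x) mod m with a small prime modulus; the class name LatticeLines, the method countLines and the parameters m = 31, a = 3, a = 13 are made up for illustration. Every pair (x, next) satisfies next = a·x − j·m for some integer j, so it lies on the line y = a·x − j·m. The code counts how many distinct values of j occur. Note that this looks only at the family of lines with slope a, so it gives an upper bound on the number of lines needed, not the spacing measure the real spectral test computes.

```java
import java.util.TreeSet;

final class LatticeLines {
    // Counts the distinct lines y = a*x - j*m (slope a) that contain the
    // overlapping pairs (x_n, x_{n+1}) of the LCG x -> (a * x) mod m.
    // Assumes gcd(a, m) = 1 and 0 < seed < m so the sequence cycles back to seed.
    static int countLines(long m, long a, long seed) {
        TreeSet<Long> offsets = new TreeSet<>();
        long x = seed;
        for (long step = 0; step < m; step++) {   // a cycle has fewer than m steps
            long next = (a * x) % m;
            offsets.add((a * x - next) / m);      // the j in next = a*x - j*m
            x = next;
            if (x == seed) {
                break;                            // completed one full cycle
            }
        }
        return offsets.size();
    }

    public static void main(String[] args) {
        System.out.println("m = 31, a = 3:  " + countLines(31, 3, 1) + " lines of slope 3");
        System.out.println("m = 31, a = 13: " + countLines(31, 13, 1) + " lines of slope 13");
    }
}
```

With the small multiplier all of the points crowd onto very few lines, which is the kind of poor two-dimensional dispersion the test is meant to expose.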


Spectral Test & Lattice Test

• We can extend this test to an arbitrary number of dimensions

– Generally, for dimension d we will look at d-tuples and see how many (d–1)-dimensional hyperplanes are needed to capture all of the points

> Actually we are looking at "how far apart" the hyperplanes are, but it is more or less the same idea

• See handout from Knuth
– Note that the process is highly mathematical
– However, if a generator does not pass this test in at least the first few dimensions, it is probably not a good generator to use


Other Random Number Generators

• Other random number generation techniques have been tried, with varying degrees of success
Middle Square method
• Original method proposed by von Neumann
Start with a four-digit positive integer Z0 and square it to obtain an integer with up to eight digits; if necessary, append zeros to the left to make it exactly eight digits. Take the middle four digits of this eight-digit number as the next four-digit number, Z1. Place a decimal point at the left of Z1 to obtain the first U[0,1) random number, U1. Then let Z2 be the middle four digits of $Z_1^2$ and let U2 be Z2 with a decimal point at the left, and so on.

– Try an example

• Interesting idea and at first glance looks good, but has been shown to be quite poor
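A minimal Java sketch of the middle square step follows. The class and method names are invented for the example, and the seed 7182 is just an arbitrary four-digit starting value.

```java
final class MiddleSquare {
    // One step: square the four-digit value, treat the result as eight digits
    // (zero-padded on the left), and keep the middle four digits.
    static int next(int z) {
        long squared = (long) z * z;            // at most 9999^2 = 99,980,001
        return (int) ((squared / 100) % 10000); // drop last two digits, keep next four
    }

    // Place a decimal point to the left of the four digits.
    static double toUniform(int z) {
        return z / 10000.0;
    }

    public static void main(String[] args) {
        int z = 7182; // arbitrary seed chosen for illustration
        for (int i = 1; i <= 5; i++) {
            z = next(z);
            System.out.printf("Z%d = %04d, U%d = %.4f%n", i, z, i, toUniform(z));
        }
    }
}
```

Starting from 7182, the square is 51,581,124, so Z1 = 5811 and U1 = 0.5811. One weakness is easy to see by hand: the value 0100 squares to 00010000, whose middle four digits are 0100 again, so the sequence gets stuck.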


Other Random Number Generators

Combined LCGs
• In this case multiple LCGs are combined (in a specific way) to produce generators with longer periods
– See Section 7.3.2
• These must be handled very carefully
– Oddly enough, the best random generators tend to be those that follow strict mathematical formulas
Additive Generators
• These generators add previously generated values to generate new values

• They typically differ on how many values are added and which previous values are chosen to add

– These choices have a large effect on the quality!


Additive Generators

Famous initial attempt: Fibonacci Generator:
• Initialize X0 and X1 using an LCG
$X_i = (X_{i-1} + X_{i-2}) \bmod 2^{32}$

– The name is fairly obvious

• Unfortunately, this generator does not do too well in empirical tests

Researchers determined better choices for terms to add

Ex: The generator used for Unix BSD random()
• Initialize X0, X1, … X31 using an LCG
$X_i = X_{i-31} + X_{i-3}$
– Has a HUGE period – good for long simulations
– See handout for details
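Both recurrences above fit one pattern: add the value from a long lag to the value from a short lag. The Java sketch below implements just that pattern with a circular buffer. It is not the actual BSD random() source; the class name AdditiveGenerator, the reduction mod 2^32 through int overflow, and the LCG constants used to fill the initial values are assumptions for the example.

```java
final class AdditiveGenerator {
    private final int[] state;   // the last longLag values, oldest at index pos
    private final int shortLag;
    private int pos = 0;

    // initial.length is the long lag (2 for Fibonacci, 31 for the second example).
    AdditiveGenerator(int[] initial, int shortLag) {
        this.state = initial.clone();
        this.shortLag = shortLag;
    }

    // X_i = X_{i-longLag} + X_{i-shortLag}; int overflow wraps mod 2^32.
    int next() {
        int longLag = state.length;
        int value = state[pos] + state[(pos + longLag - shortLag) % longLag];
        state[pos] = value;              // newest value replaces the oldest
        pos = (pos + 1) % longLag;
        return value;
    }

    public static void main(String[] args) {
        int[] initial = new int[31];
        int x = 1;
        for (int i = 0; i < initial.length; i++) {
            x = x * 1103515245 + 12345;  // illustrative LCG (assumed constants)
            initial[i] = x;
        }
        AdditiveGenerator gen = new AdditiveGenerator(initial, 3);
        for (int i = 0; i < 5; i++) {
            // Read the 32 bits as unsigned and scale into [0,1).
            System.out.println(Integer.toUnsignedLong(gen.next()) / 4294967296.0);
        }
    }
}
```

Passing two initial values and a short lag of 1 gives the Fibonacci generator; passing 31 values and a short lag of 3 gives the second recurrence.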


Random Variate Generation

• Generating Random Variates
Once we have obtained / created and verified a quality random number generator for U[0,1), we can use that to obtain random values in other distributions
• Ex: Exponential, Normal, etc.
A variety of techniques exist for generating random variates
• Some are more efficient than others
• Some are easier to implement than others
We will briefly look at just a few


Random Variate Generation

• Inverse Transform Technique
Distributions whose cdf can be inverted in closed form can be generated in this way
Idea is that the inverse cdf $F^{-1}$ will map values from U[0,1) into the desired distribution
An obvious example is the exponential distribution that you used in Project 1
Let's look at this one in more detail
• Recall what the distribution is


Inverse Transform Technique

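Recall the cdf, as shown below

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Now consider a random value uniformly distributed over [0,1) – we will call this R
We then set F(x) = R and solve for x
• This will give us the transformation needed to convert R into a value x from the distribution F

$$\begin{aligned} 1 - e^{-\lambda x} &= R \\ e^{-\lambda x} &= 1 - R \\ -\lambda x &= \ln(1 - R) \\ x &= -\frac{1}{\lambda}\ln(1 - R) = -\frac{1}{\lambda}\ln(R) \end{aligned}$$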



The last equality seems incorrect, but note that if R is uniformly distributed on [0,1), then 1 – R is also uniformly distributed (on (0,1]), so we can use the slightly simpler form
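The transformation can be sketched in Java as shown here. The class name ExponentialVariate and the method fromUniform are invented for the example, and java.util.Random stands in for whatever U[0,1) generator is actually used. The sketch keeps the 1 – R form, since nextDouble() can return exactly 0 and ln(0) is undefined.

```java
import java.util.Random;

final class ExponentialVariate {
    // Solve 1 - e^(-lambda * x) = r for x, with r in [0,1).
    static double fromUniform(double r, double lambda) {
        return -Math.log(1.0 - r) / lambda;
    }

    public static void main(String[] args) {
        Random rng = new Random(1538);  // stand-in uniform generator (assumed)
        double lambda = 2.0;            // rate chosen for illustration
        int n = 100000;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += fromUniform(rng.nextDouble(), lambda);
        }
        // The sample mean should be close to 1/lambda.
        System.out.println("sample mean = " + (sum / n) + ", 1/lambda = " + (1.0 / lambda));
    }
}
```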

• Let's look at another example: The Uniform Distribution over an arbitrary range
• We could have it continuous, such as U(1,2) or U(0,4)
• We could have it discrete, such as
– Uniform distribution over integers 1-10
– Uniform distribution over integers 100-200


Uniform Distributions on Different Ranges

• Continuous uniform distributions
– Given R ~ U[0,1), we may need to do 2 different things to get an arbitrary uniform distribution
> Expand (or compress) the range to be larger (or smaller)
> Shift it to get a different starting point

– The text shows a derivation of the formula, but we can do it intuitively, based on the two goals above

> To expand the range we need to multiply R by the length of the range desired

> To get the correct starting point, we need to add (to 0) the starting point we want

– Consider the desired range [a,b]. Our transformations yield

X = a + (b – a)R



• Discrete uniform distributions
– The formula in this case is quite similar to that for the continuous case
– However, care must be paid to the range and any error due to truncation
– Consider a discrete uniform range [m, n] for integers m, m+1, … , n-1, n
– If we use X = m + (n – m)R (truncated to an integer)
> The minimum value is correct, since the minimum result for X is m (since R can be 0)
> Now we need to be sure of two things:
1) The maximum value is correct
2) The probabilities of all values are the same



– First let's consider the maximum value
> Although the continuous range (b – a) worked as we expected, the discrete range (n – m) does NOT work
> Since R ~ U[0,1), we know it can never actually be 1
> Thus, (n – m)R will always be strictly less than n – m and thus m + (n – m)R will always be strictly less than n (after truncation it will be at most n – 1, since it is discrete)
– To solve this problem, we need to increase the range by 1
> (n – m + 1)
– So we get overall
X = m + (n – m + 1)R
– Now are all values uniform (i.e. equal probability)?
> Consider breaking U[0,1) up into (n – m + 1) intervals



$$\left[0, \tfrac{1}{n-m+1}\right),\ \left[\tfrac{1}{n-m+1}, \tfrac{2}{n-m+1}\right),\ \ldots,\ \left[\tfrac{n-m}{n-m+1}, 1\right)$$

• Each sub-interval is equal in range and thus has equal probability
• Given R in each sub-interval, a distinct value in the range [m,n] will result
– Ex: Assume R is in the second sub-interval above
– We then have

$$m + (n-m+1)\cdot\tfrac{1}{n-m+1} = m + 1 \;\le\; X \;<\; m + (n-m+1)\cdot\tfrac{2}{n-m+1} = m + 2$$

– Since X is truncated to an integer, and is strictly less than m + 2, the entire sub-interval maps into m + 1
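Both uniform transformations can be written in a few lines of Java. In this sketch the class name UniformVariates and its method names are invented, and the uniform value r is passed in directly so the mapping is easy to check by hand. The discrete version makes the truncation explicit with Math.floor.

```java
final class UniformVariates {
    // Continuous uniform on [a, b): stretch by (b - a), then shift by a.
    static double continuous(double a, double b, double r) {
        return a + (b - a) * r;
    }

    // Discrete uniform on the integers m..n: the range is widened by 1 so that
    // truncation can reach n, and each of the n - m + 1 sub-intervals of [0,1)
    // maps to exactly one integer.
    static int discrete(int m, int n, double r) {
        return m + (int) Math.floor((n - m + 1) * r);
    }

    public static void main(String[] args) {
        System.out.println(continuous(1.0, 2.0, 0.25));  // a value in U(1,2)
        System.out.println(discrete(1, 10, 0.15));       // second sub-interval of ten
        System.out.println(discrete(100, 200, 0.9999));  // top of the range
    }
}
```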



The inverse transform technique can be applied to other distributions as well
• See text for more examples
Note that in some situations, the inverse transform technique may NOT be the most efficient way of generating random variates
• However for relatively simple cdfs such as the exponential distribution, it can be easily done


Convolution Technique

Recall that the Erlang distribution can be thought of as the sum of K independent exponential random variables, each with the same mean 1/(Kθ)
• Thus, since we can already generate exponential variates using the inverse transform technique, we should be able to easily also generate an Erlang distribution

• However, for efficiency, we can generate the answer in a slightly different way

• Let's look at it in some detail


Erlang Distribution

• From the text we see:

$$X = \sum_{i=1}^{k} X_i = \sum_{i=1}^{k}\left(-\frac{1}{k\theta}\ln R_i\right)$$

– where the contents of the parentheses is an exponential variate with mean 1/(kθ)
– We could calculate the sum directly, but that would require calculating a log k times
> Since logarithms are typically time-consuming, we'd like to calculate fewer if possible
– The equation continues in the text

$$X = \sum_{i=1}^{k} X_i = \sum_{i=1}^{k}\left(-\frac{1}{k\theta}\ln R_i\right) = -\frac{1}{k\theta}\ln\left(\prod_{i=1}^{k} R_i\right)$$

– Now only one logarithm is required
> But how did they get that last step?


Erlang Distribution

• Recall properties of logarithms
Given values x1 and x2
$\log_b(x_1) + \log_b(x_2) = \log_b(x_1 x_2)$
– Ex: $\log_{10}(1000) + \log_{10}(100) = \log_{10}(100000) = 5$

• Applying the rule multiple times gives us our desired result

• We haven't talked too much about efficiency up to this point, but note that for large simulations, very high numbers of variates may be required

– If we save even a few cycles in each generation overall it can add up to substantial savings

• In fact there are more efficient ways than this as well, which would probably be used in a large simulation
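The two ways of computing an Erlang variate can be compared side by side in Java. The names ErlangVariate, bySum and byProduct are invented for this sketch, and theta denotes the parameter θ from the text's formula (use theta = 1 if each exponential should simply have mean 1/k). The uniform values must lie in (0,1], so with a generator on [0,1) one would pass 1 – R.

```java
final class ErlangVariate {
    // Direct sum of k exponential variates, each with mean 1/(k*theta):
    // one logarithm per term.
    static double bySum(double[] r, double theta) {
        int k = r.length;
        double sum = 0.0;
        for (double ri : r) {
            sum += -Math.log(ri) / (k * theta);
        }
        return sum;
    }

    // Same value using ln(a) + ln(b) = ln(a*b): k - 1 multiplications, one logarithm.
    // For very large k the product can underflow to 0, so this is only a sketch.
    static double byProduct(double[] r, double theta) {
        int k = r.length;
        double product = 1.0;
        for (double ri : r) {
            product *= ri;
        }
        return -Math.log(product) / (k * theta);
    }

    public static void main(String[] args) {
        double[] r = {0.5, 0.25, 0.8};  // hypothetical uniform values
        System.out.println(bySum(r, 2.0));
        System.out.println(byProduct(r, 2.0));
    }
}
```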


Other Techniques

Not all distributions can be easily transformed
• Some may not have a closed form inverse

• We can still use the inverse transform technique if we are willing to approximate the distribution

• However, if we want to be more precise we may have to use other means

In these cases we can use alternative techniques to generate the variates
• There are many different techniques, some of which are quite specific to a particular distribution


Normal Variates

Recall that the cdf for a normal distribution is complex and cannot be inverted in a closed form
• However, with a different approach we can still generate normal variates with good results

• The text shows a polar technique that generates two normal values at a time

– There are many other techniques as well

Let's take a look at these four distributions, using the implementations just discussed
• Look at Variates.java
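One well-known method that produces two normal values at a time is the Box-Muller transformation, sketched below. Whether this matches the text's polar technique exactly is an assumption, and this is not the code in Variates.java; the class name NormalVariate and its methods are invented for the example. The first uniform value must be in (0,1], so with a generator on [0,1) one would pass 1 – R.

```java
final class NormalVariate {
    // Box-Muller: two independent uniform values give two independent N(0,1) values.
    // r1 must be in (0,1]; r2 may be anywhere in [0,1).
    static double[] standardPair(double r1, double r2) {
        double radius = Math.sqrt(-2.0 * Math.log(r1));
        double angle = 2.0 * Math.PI * r2;
        return new double[] { radius * Math.cos(angle), radius * Math.sin(angle) };
    }

    // Convert a standard normal value z into N(mu, sigma^2).
    static double scale(double z, double mu, double sigma) {
        return mu + sigma * z;
    }

    public static void main(String[] args) {
        double[] z = standardPair(0.3, 0.7);  // hypothetical uniform inputs
        System.out.println(scale(z[0], 10.0, 2.0));
        System.out.println(scale(z[1], 10.0, 2.0));
    }
}
```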


Variate Generation Summary

• In a professional simulation
Software used will have predefined functions for all of these variates
• However, it is good to know some of the theory for how they are derived
You may perhaps have to derive one
You may need to examine the implementation to perhaps improve the efficiency


Input Modeling

• In the real world, input data distributions are not always obvious or even clearly defined
Often we have some sample data but not enough for our entire simulation
Thus we'd like to determine the distribution of the sample data, then generate an arbitrary number of values within that distribution for our simulation
• However, the input data may not even be of a single distribution

• May differ at different times / days


Input Modeling

• Also, in some cases we may not be able to generate real data, because the means to do so do not yet exist

– Ex: If we are building a new network or road system, we don't yet have a system to provide us with the real data

Assuming we can get sample input data, determining the distribution is not easy
• Usually requires multiple steps, and a combination of computer and "by hand" work
Let's try to do this for a small real example

• Arrivals at Panera downstairs

• We'll monitor Panera for the rest of class


Input Modeling

• We will stand at the door and record the interarrival times of customers over a 15-20 minute period

– We could do this by hand, but we'll use a computer since that is easier and more precise

• We'll then look at the data to see if we can fit it to a distribution

– We'll do this next class


Input Modeling

Once we have the data, how do we fit it to a distribution?
• The first step typically involves creating one or more histograms of the data and graphing them, to see the basic "shape" of the distribution
• This requires some skill and finesse
– How "wide" is each group to be?
– How can we know if a shape is of a particular distribution?
> Sometimes we just "eyeball it"
> There are also some analytical techniques for verification

• If we have an idea of a possible distribution, then we can try to verify it
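Building the histogram counts is simple to sketch in Java. The class name Histogram, the method counts and the sample values below are hypothetical (they are not the collected class data). The group width is a parameter so that several widths can be tried.

```java
import java.util.Arrays;

final class Histogram {
    // Counts of non-negative values falling in [0,width), [width,2*width), ...
    // Anything beyond the last group is counted in the last group.
    static int[] counts(double[] data, double width, int groups) {
        int[] c = new int[groups];
        for (double x : data) {
            int index = (int) (x / width);
            c[Math.min(index, groups - 1)]++;
        }
        return c;
    }

    public static void main(String[] args) {
        // Hypothetical interarrival times in seconds, for illustration only.
        double[] times = {3.1, 0.4, 12.7, 5.0, 1.2, 8.8, 0.9, 2.3, 21.5, 4.4};
        System.out.println(Arrays.toString(counts(times, 5.0, 4)));
        System.out.println(Arrays.toString(counts(times, 2.0, 6)));
    }
}
```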


Input Modeling

• Estimate parameters based on the distribution
– Ex: lambda for exponential
– Ex: beta and theta for gamma
– Ex: mu and sigma for normal

• We can then try some goodness of fit tests to see if the distribution works

– As with the random number testers, we can use the Chi-Square test and the Kolmogorov-Smirnov test

Let's try this process with our collected data


Panera Example

Consider the initial data points acquired
• First try to form one or more histograms to “eyeball” the distribution versus known distributions

– In this case, the exponential distribution seems to best match the data

– Note that we may have to try a few different intervals for the histogram

• Once a family of distributions has been determined, we’d like to determine the parameter(s) to the distribution

– With exponential, we’d like to estimate the value of the rate, lambda

– Since this should be 1/mean, we can calculate it easily using the sample data
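The rate estimate is just the reciprocal of the sample mean, as this short Java sketch shows. The class name ExponentialFit and the data values are hypothetical.

```java
final class ExponentialFit {
    // Estimate lambda for an exponential distribution as 1 / (sample mean).
    static double estimateRate(double[] interarrivals) {
        double sum = 0.0;
        for (double t : interarrivals) {
            sum += t;
        }
        double mean = sum / interarrivals.length;
        return 1.0 / mean;
    }

    public static void main(String[] args) {
        // Hypothetical interarrival times in seconds, for illustration only.
        double[] times = {3.1, 0.4, 12.7, 5.0, 1.2, 8.8, 0.9, 2.3, 21.5, 4.4};
        System.out.println("estimated lambda = " + estimateRate(times) + " arrivals per second");
    }
}
```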


Panera Example

• Next we should try one or more “goodness of fit” tests to see if the proposed distribution does in fact reasonably match our sample data

– In this case we will use Chi-Square or Kolmogorov-Smirnov

– Consider in this case Chi-Square as shown in the panera.xls handout

– First we will consider an “uneven” distribution based on the histogram shown

> We generate the cumulative distribution with the given lambda at the interval endpoints

> Next we determine the expected values for each interval

> We then run the test on the results, with the following modifications:


Panera Example

> The degrees of freedom will be n – s – 1, where n is the number of intervals, and s is the number of estimated parameters (in this case s = 1)

> Groups with fewer than 5 expected values will be merged into a single group

– Note that the p-values described in the Banks text (section 9.4.4) can be easily calculated in Excel using the CHITEST function

> However there is an issue with degrees of freedom – see panera.xls

– As indicated in the Banks text (section 9.4.2) we should also consider an equal probability Chi Square test

> Note that for our data this test does not perform nearly as well as the “uneven” probability test (in fact it fails fairly badly)

• We could also use a K-S test to test the distribution

– See Example 9.16 in Banks text
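The Chi-Square steps for an exponential fit can be sketched in Java as follows. This is not the panera.xls calculation; the class ExponentialChiSquare, its method names and the observed counts in main are hypothetical. Merging groups whose expected count is below 5 is left to the caller.

```java
final class ExponentialChiSquare {
    static double cdf(double x, double lambda) {
        return 1.0 - Math.exp(-lambda * x);
    }

    // edges are the ascending interior endpoints e1 < e2 < ...; the intervals are
    // [0,e1), [e1,e2), ..., [last edge, infinity), so there is one more interval than edges.
    static double[] expected(double[] edges, double lambda, int total) {
        double[] e = new double[edges.length + 1];
        double previous = 0.0;
        for (int i = 0; i < edges.length; i++) {
            double c = cdf(edges[i], lambda);
            e[i] = total * (c - previous);
            previous = c;
        }
        e[edges.length] = total * (1.0 - previous);
        return e;
    }

    static double statistic(int[] observed, double[] expected) {
        double chi2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double d = observed[i] - expected[i];
            chi2 += d * d / expected[i];
        }
        return chi2;
    }

    // n intervals, s estimated parameters: n - s - 1 degrees of freedom.
    static int degreesOfFreedom(int intervals, int estimatedParams) {
        return intervals - estimatedParams - 1;
    }

    // Endpoints that give k intervals of equal probability 1/k (inverse cdf at j/k).
    static double[] equalProbabilityEdges(int k, double lambda) {
        double[] edges = new double[k - 1];
        for (int j = 1; j < k; j++) {
            edges[j - 1] = -Math.log(1.0 - (double) j / k) / lambda;
        }
        return edges;
    }

    public static void main(String[] args) {
        double lambda = 0.2;                  // hypothetical estimated rate
        int[] observed = {30, 20, 25, 25};    // hypothetical counts, 100 values in total
        double[] edges = equalProbabilityEdges(4, lambda);
        double chi2 = statistic(observed, expected(edges, lambda, 100));
        System.out.println("chi-square = " + chi2 + " with "
                + degreesOfFreedom(observed.length, 1) + " degrees of freedom");
    }
}
```

For the "uneven" test one would pass the histogram's own interval endpoints as edges instead of the equal-probability ones.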


Panera Example

If we find Exponential Distribution unacceptable (based on the even prob. Chi-Square test) it might behoove us to try a different distribution to see if it matches better
• Some options are
– Gamma
– Weibull
> Exponential is a special case of both of these
– Lognormal
> This is actually tried in the panera.xls spreadsheet, but it is not good

• Clearly this is not a simple process

• Also note that we really need more data here


Input Modeling

What if we don’t have sample data to use?
• We must do the best that we can, relying on
– Experts
> What do people in the area in question think, based on their knowledge and experience
– Engineering specs
> Ex: A device is built to have some mean time to failure, based on the production environment. We can use that value as a starting point for mean time to failure of the device in our real environment
– Similarity to something we already know
> Ex: What is the distribution of people wanting to ride the new “Super Wacky Crazy Loopmeister” ride? We can use as a starting point the measured distribution from last year’s “Bizarro Zany Upsidedowner” ride
