
A REVIEW OF OCCUPANCY PROBLEMS AND THEIR APPLICATIONS WITH A MATLAB DEMO

Samuel Khuvis, Undergraduate

Nagaraj Neerchal, Professor of Statistics

Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250

Abstract

Consider an experiment of randomly distributing r balls into n cells. One can conceive of several easily described probability problems related to this experiment. Obtaining the probability that no two adjacent cells are empty, finding the distribution of the number of balls occupying a given cell, and deriving the distribution of the smallest number of balls over all cells are a few examples of such problems, which are collectively referred to as occupancy problems. Solutions to some of these problems are non-trivial, and some naturally give rise to well-known probability distributions such as the binomial and multinomial distributions.

Occupancy problems have found important applications in many areas. The Bose-Einstein and Fermi-Dirac statistics are the most celebrated examples of such applications. More recently, questions from genetics involving non-randomness in the occurrence of mutagen-induced mutations across loci have also been connected to this general topic. In this poster, we provide a glimpse of the probability calculations underlying occupancy problems and demonstrate them using an interactive MATLAB program.

Examples

Applications

Fig. 1: A screenshot of the MATLAB demo used to visualize occupancy problems, with six different operations that may be selected on the right.

Fig. 2: Four realizations (panels A to D) generated by the MATLAB demo of an experiment in which 5 balls are thrown into 4 cells.

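Basic Calculations Concerning the Occupancy Problem

Each of the r balls can land in any of the n cells, so the sample space S has size

$|S| = \underbrace{n \cdot n \cdots n}_{r \text{ factors}} = n^r$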
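For small n and r the sample space can be listed outright, which makes the count easy to check. The following is a minimal Python sketch, not the poster's MATLAB code; the function name `sample_space` and the tuple representation (one cell index per ball) are assumptions made for illustration.

```python
from itertools import product

def sample_space(n, r):
    """All ways to place r distinguishable balls into n cells.

    Each outcome is a tuple (x_1, ..., x_r), where x_i is the
    cell (1..n) that receives ball i.
    """
    return list(product(range(1, n + 1), repeat=r))

outcomes = sample_space(n=4, r=5)
print(len(outcomes), 4 ** 5)  # both counts agree
```

Full enumeration is only practical for small cases, since the list grows as n^r.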

In the program, realizations were generated by:

    for i = 1 to r (the number of balls)
        draw a cell index uniformly at random from 1, ..., n (the number of cells)
        place ball i in that cell
    end
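The loop above can be sketched in Python as follows. This is an illustrative translation of the pseudocode, not the authors' MATLAB program; the names `generate_realization` and `occupancy_counts` are hypothetical.

```python
import random

def generate_realization(r, n, rng=random):
    """Place r balls into n cells; return the cell index (1..n) of each ball."""
    return [rng.randint(1, n) for _ in range(r)]

def occupancy_counts(cells, n):
    """Count how many balls landed in each of the n cells."""
    counts = [0] * n
    for c in cells:
        counts[c - 1] += 1
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
cells = generate_realization(r=5, n=4, rng=rng)
print(cells, occupancy_counts(cells, n=4))
```

Passing a seeded `random.Random` instance keeps a run reproducible; omitting it falls back to the module-level generator.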

Conclusion

This has been only a basic introduction to occupancy problems; many other calculations can be based on the experiment of throwing r balls into n cells.

These problems have many applications in the natural sciences, especially in physics, where more complex calculations are able to explain the behavior of elementary particles. Using simulation, we can begin to understand the probability distributions that arise from these models.

Binomial Distributions

-A binomial distribution describes the distribution of results of an experiment in which:

1. There is a sequence of n trials, where n is fixed in advance (here n is the number of trials, not the number of cells)

2. Each trial results in one of two possible outcomes, denoted a success or a failure

3. The trials are independent, so the outcome of any particular trial does not influence the outcome of any other trial

4. The probability of success (p) is constant from trial to trial

-The probability of x successes is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, …, n

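Binomial Distribution of 1st Cell

T = number of balls in cell 1, where T is a random variable

t = 0, 1, 2, …, r, where r is the total number of balls thrown

$\{T = t\} = \{(x_1, \ldots, x_r) : \text{exactly } t \text{ of the } x_i\text{'s are 1 and the others are not 1}\}$, where $x_i$ is the cell that receives ball i

Number of outcomes in $\{T = t\}$: $\binom{r}{t}(n-1)^{r-t}$

$P(T = t) = \binom{r}{t}(n-1)^{r-t} / n^r = \binom{r}{t}\left(\tfrac{1}{n}\right)^t \left(1 - \tfrac{1}{n}\right)^{r-t}$

So, T has a binomial distribution: $T \sim \mathrm{Bin}(r, 1/n)$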
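The counting argument for cell 1 is to choose which t balls land in cell 1, then send the remaining r - t balls to any of the other n - 1 cells. It can be checked against the Bin(r, 1/n) formula with exact rational arithmetic. The sketch below is illustrative Python rather than the poster's MATLAB; both function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def pmf_by_counting(t, r, n):
    """P(T = t) as (number of favourable outcomes) / n^r."""
    return Fraction(comb(r, t) * (n - 1) ** (r - t), n ** r)

def pmf_binomial(t, r, n):
    """P(T = t) from the Bin(r, 1/n) formula."""
    p = Fraction(1, n)
    return comb(r, t) * p ** t * (1 - p) ** (r - t)

r, n = 5, 4
for t in range(r + 1):
    # The two expressions agree exactly for every t.
    assert pmf_by_counting(t, r, n) == pmf_binomial(t, r, n)
print(sum(pmf_by_counting(t, r, n) for t in range(r + 1)))  # prints 1
```

Using `Fraction` avoids floating-point rounding, so the equality check is exact rather than approximate.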

Multinomial Distribution

-A multinomial distribution is similar to a binomial, except that instead of 2 possible outcomes there are more than 2

-Let $T_1$ = number of balls in cell 1 and $T_2$ = number of balls in cell 2

-The third outcome is a ball going into a cell other than cell 1 or cell 2

-$(T_1, T_2) \sim \mathrm{Multinomial}\left(r;\ \tfrac{1}{n},\ \tfrac{1}{n},\ \tfrac{n-2}{n}\right)$

Minimum Calculations

Y=minimum number of balls occurring in any cell

So,

For y > 0, it is non-trivial to calculate P(Y = y) without the use of simulation.
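The case Y = 0 (at least one empty cell) does have a closed form by the standard inclusion-exclusion argument, $P(Y = 0) = 1 - \sum_{k=0}^{n} (-1)^k \binom{n}{k} \left(1 - \tfrac{k}{n}\right)^r$. This is a textbook result offered here as a sketch, not necessarily the expression shown on the poster. The Python below compares it with a simulated distribution of Y; the function names, the seed, and the choice of Python over MATLAB are assumptions.

```python
import random
from collections import Counter
from math import comb

def prob_min_zero(r, n):
    """Exact P(Y = 0): at least one empty cell, by inclusion-exclusion."""
    all_occupied = sum((-1) ** k * comb(n, k) * (n - k) ** r for k in range(n + 1))
    return 1 - all_occupied / n ** r

def simulate_min(r, n, trials, seed=0):
    """Estimate the distribution of Y from simulated realizations."""
    rng = random.Random(seed)
    tally = Counter()
    for _trial in range(trials):
        counts = [0] * n
        for _ball in range(r):
            counts[rng.randrange(n)] += 1  # uniform cell index 0..n-1
        tally[min(counts)] += 1
    return {y: c / trials for y, c in sorted(tally.items())}

estimate = simulate_min(r=50, n=25, trials=10_000)
print(estimate)
print(prob_min_zero(50, 25))
```

The simulated relative frequency of Y = 0 should sit close to the exact value, while the frequencies for y > 0 are the quantities that are hard to obtain analytically.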

Statistical Mechanics

-We have r indistinguishable particles and a phase space subdivided into n small regions, or cells, with the particles randomly distributed among these cells

-It would seem that all arrangements are equally probable; however, physicists have shown that this is not the case. Instead, two statistics describe the behavior of particles:

-Fermi-Dirac Statistics

-In this model, no two particles may be in the same cell and all distinguishable arrangements have equal probabilities

-This means that r ≤ n, so an arrangement is chosen by randomly selecting which r of the n cells contain a particle. Each of the $\binom{n}{r}$ arrangements has probability $\binom{n}{r}^{-1}$. This model describes the behavior of electrons, protons, and neutrons.

-Bose-Einstein Statistics

-In this model, each of the $\binom{n+r-1}{r}$ distinguishable arrangements is given probability $\binom{n+r-1}{r}^{-1}$

-This has been shown experimentally to describe the behavior of photons, nuclei, and atoms that have an even number of elementary particles
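The two counting rules can be made concrete by listing the distinguishable arrangements for a small case. The Python sketch below is illustrative and assumes that a Fermi-Dirac arrangement can be represented as a set of occupied cells and a Bose-Einstein arrangement as a multiset of cells; the function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def fermi_dirac_arrangements(r, n):
    """Each arrangement is the set of r distinct occupied cells (needs r <= n)."""
    return list(combinations(range(1, n + 1), r))

def bose_einstein_arrangements(r, n):
    """Each arrangement is a multiset of cells: only occupancy numbers matter."""
    return list(combinations_with_replacement(range(1, n + 1), r))

r, n = 2, 4
fd = fermi_dirac_arrangements(r, n)
be = bose_einstein_arrangements(r, n)
print(len(fd), comb(n, r))          # 6 6
print(len(be), comb(n + r - 1, r))  # 10 10
```

Under either statistic, one arrangement is drawn uniformly from the corresponding list, for example with `random.choice(fd)`.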

Population Genetics

-Since genetic data are often analyzed through categorical observations, the computation of expected frequencies under different genetic models can be described as an occupancy problem

-These computations are important in genetics when testing the non-randomness of mutagen-induced mutations across loci

-The occupancy problem is applied to these analyses to address, combinatorially, the problem of an inadequate sample size

-In this application, r is the size of the random sample and n is the number of classes being analyzed in the sample

MATLAB Demo

Function 1: Generates one realization at a time for a given number of balls and cells.

Function 2: Simulates a large number of realizations and empirically computes probabilities.

Function 3: Allows the user to change the number of balls and cells.

Output 1: One arrangement of 50 balls and 25 cells

Output 2: Randomly generates birthdays of 50 people

Output 3: Displays the number of balls in each cell over 1,000 realizations

Output 4: Displays the minimum number of balls for each of 10,000 realizations, where each arrangement has 50 balls and 25 cells, as selected by the user

Output 5: Displays the distribution of the number of balls in the first cell over 1,000 realizations

Output 6: Shows how many days have a given number of births in common
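The birthday outputs treat people as balls and days as cells. A minimal Python sketch of that idea follows. It is not the demo's MATLAB code, and it assumes 365 equally likely days, which the poster does not state; both function names are hypothetical.

```python
import random
from collections import Counter

def simulate_birthdays(people=50, days=365, seed=None):
    """Assign each person a birthday uniformly at random (assumed: 365 equal days)."""
    rng = random.Random(seed)
    return [rng.randint(1, days) for _ in range(people)]

def days_by_birth_count(birthdays, days=365):
    """Map k -> number of days on which exactly k people were born."""
    per_day = Counter(birthdays)           # day -> number of births
    tally = Counter(per_day.values())      # births -> number of such days
    tally[0] = days - len(per_day)         # days with no births at all
    return dict(sorted(tally.items()))

birthdays = simulate_birthdays(seed=1)
print(days_by_birth_count(birthdays))
```

Any key of 2 or more in the result signals a shared birthday, which is the classical birthday problem stated in occupancy terms.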

Acknowledgments: I would like to thank Andrew Raim and all of the members of CIRC for their help.

