BIO5312 Biostatistics
R Session 03: Random Number and Probability Distributions
Dr. Junchao Xia
Center of Biophysics and Computational Biology
Fall 2016
9/13/2016 1 /12
Random Number Generator
Random number generators have many important applications in gambling, statistical sampling, computer simulations, and other areas where producing an unpredictable random sequence is desirable. A generator of genuinely random numbers is a mechanism for producing a sequence of random variables X1, X2, X3, …, Xn with the properties that
1) Each Xi is uniformly distributed between 0 and 1.
2) The Xi are mutually independent.
"True" vs. pseudo-random numbers
1) "True" random numbers: the first method measures some physical phenomenon that is expected to be random, such as atmospheric noise or thermal noise, and then compensates for possible biases in the measurement process.
2) Pseudo-random numbers: the second method uses computational algorithms that produce long sequences of apparently random results, which are in fact completely determined by a shorter initial value, known as the seed.
A linear congruential generator is a recurrence of the following form:
$I_{i+1} = (a I_i) \bmod m, \quad X_{i+1} = I_{i+1} / m$
where the multiplier a and the modulus m are integer constants that determine the values generated, given an initial value (seed) $I_0$.
1) Park and Miller method: a = 16807, m = 2^31 - 1 = 2147483647
2) L'Ecuyer method: a = 40692, m = 2147483399
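The recurrence can be tried out directly in R. The following is only a minimal sketch, assuming the Park and Miller constants a = 16807 and m = 2^31 - 1; the function name lcg_unif and the default seed are hypothetical choices for illustration, and this is not how R's own runif() is implemented.

```r
# Sketch of a linear congruential generator (assumed Park-Miller constants).
# lcg_unif is a hypothetical helper name, not a base R function.
lcg_unif <- function(n, seed = 12345, a = 16807, m = 2147483647) {
  stopifnot(seed > 0, seed < m)
  u <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    # a * state stays below 2^53, so double-precision arithmetic is exact here
    state <- (a * state) %% m
    # scale the integer state to the interval (0, 1)
    u[i] <- state / m
  }
  u
}

u <- lcg_unif(1000)
summary(u)   # values should spread roughly evenly over (0, 1)
```

Calling lcg_unif() twice with the same seed returns the same sequence, which is the sense in which the numbers are pseudo-random.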
General Sampling Methods
Assume we have a random number generator that produces a sequence of random variables U1, U2, U3, …, Un, which are mutually independent and uniformly distributed between 0 and 1. How can we obtain a sequence of variables that follow some other distribution, such as the normal distribution?
The inverse transform method is a simple but very important one among many others.
1) Suppose we want to sample from a cumulative distribution function F(x); i.e., we want to generate a random variable X with the property that P(X ≤ x) = F(x) for all x.
2) The inverse transform method sets X = F^(-1)(U), where U ~ Unif[0,1].
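As an illustration of the two steps above, here is a small sketch for the exponential distribution, whose CDF F(x) = 1 - exp(-rate * x) can be inverted by hand. The choice of distribution, the rate value, and the sample size are assumptions made for this example only.

```r
# Inverse transform sketch for an exponential distribution.
# The rate and sample size are arbitrary illustrative choices.
set.seed(12345)
rate <- 2
u <- runif(10000)              # U ~ Unif[0,1]
x <- -log(1 - u) / rate        # X = F^(-1)(U) for F(x) = 1 - exp(-rate * x)

mean(x)                        # should be close to the theoretical mean 1 / rate
mean(rexp(10000, rate))        # R's built-in exponential sampler, for comparison
```

Both sample means should land near 1 / rate = 0.5, which suggests that transforming uniform numbers through the inverse CDF gives draws from the target distribution.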
Generate Random Integers in R
Examples using the sample() function
# set the working directory
> setwd("C:/Users/Junchao/Desktop/Biostatistics_5312/2016/lab_03")
# generate a random integer between 1 and 20
> sample(1:20, 1)
# generate 10 random integers between 1 and 20, with repeats allowed
> sample(1:20, 10, replace=TRUE)
# select 10 states randomly without repeats
> sample(state.name, 10, replace=FALSE)
# try to sample 52 states without repeats: this stops with an error, because state.name has only 50 entries
> sample(state.name, 52, replace=FALSE)
# sample 52 states randomly with repeats
> sample(state.name, 52, replace=TRUE)
# sample 50 states randomly without repeats (a random permutation of all the states)
> sample(state.name, 50, replace=FALSE)
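The difference between the 50-state and 52-state calls can be checked in a script without stopping on the error. This is a sketch only; wrapping the call in tryCatch() is an addition for illustration and is not part of the original slide.

```r
# Sampling all 50 states without replacement gives a permutation of state.name.
set.seed(12345)
perm <- sample(state.name, 50, replace = FALSE)
setequal(perm, state.name)     # same set of names, in a shuffled order
anyDuplicated(perm)            # 0 means no name appears twice

# Asking for 52 names without replacement is impossible; capture the error message.
res <- tryCatch(
  sample(state.name, 52, replace = FALSE),
  error = function(e) conditionMessage(e)
)
res
```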
Generate Random Floats
Examples using the runif() function
# generate 10 random numbers between 0 and 1
> runif(10,0,1)
# generate 1000 random numbers between 1.5 and 10.5
> y=runif(1000,1.5,10.5)
# check the histogram
> hist(y)
# generate 10,000 random numbers between 1.5 and 10.5
> y=runif(10000,1.5,10.5)
# check the histogram, any difference?
> hist(y)
# set the seed for the random number generator
> set.seed(12345)
# generate 1000 random numbers and assign them to x
> x=runif(1000,1.5,10.5)
# generate another 1000 random numbers and assign them to y
> y=runif(1000,1.5,10.5)
# reset the random number seed to 12345
> set.seed(12345)
# generate another 1000 random numbers and assign them to z
> z=runif(1000,1.5,10.5)
# plot scatter plots for x-y and x-z
> plot(x,y,xlab="x",ylab="y")
> plot(x,z,xlab="x",ylab="z")
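The effect of resetting the seed can also be checked numerically rather than by eye. The sketch below repeats the x, y, z construction from this slide; the use of identical() and cor() is an addition for illustration.

```r
# Same seed, same sequence: z reproduces x exactly, while y is a fresh draw.
set.seed(12345)
x <- runif(1000, 1.5, 10.5)
y <- runif(1000, 1.5, 10.5)
set.seed(12345)
z <- runif(1000, 1.5, 10.5)

identical(x, z)   # TRUE: the x-z scatter plot is a straight line
cor(x, y)         # expected to be near 0: the x-y scatter plot shows no pattern
```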
Generate Random Floats: Continued
Example plots from the previous slide
Binomial Distribution
Examples using the dbinom(), pbinom(), rbinom() functions
# check the help
> help(dbinom)
# get a binomial distribution with n=10, p=0.05
> x=0:10
> y=dbinom(x,10,0.05)
> plot(x,y,xlab="k",ylab="Pr(k)",main="n=10,p=0.05")
# get a binomial distribution with n=10, p=0.95
> y=dbinom(x,10,0.95)
> plot(x,y,xlab="k",ylab="Pr(k)",main="n=10,p=0.95")
# get a binomial distribution with n=10, p=0.50
> y=dbinom(x,10,0.50)
> plot(x,y,xlab="k",ylab="Pr(k)",main="n=10,p=0.50")
# get the cumulative distribution function
> y=pbinom(x,10,0.5)
> plot(x,y,xlab="k",ylab="CDF of Pr(k)",main="n=10,p=0.50")
# generate 1000 random numbers from the binomial distribution
> z=rbinom(1000,10,0.5)
> hist(z)
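The relationship between the three functions can be checked with a short sketch: pbinom() should equal the running sum of dbinom(), and the frequencies of rbinom() draws should be close to dbinom(). The seed and the comparison table are assumptions added for illustration.

```r
# Relate dbinom(), pbinom(), and rbinom() for n=10, p=0.5.
x <- 0:10
pmf <- dbinom(x, 10, 0.5)
cdf <- pbinom(x, 10, 0.5)
all.equal(cumsum(pmf), cdf)    # the CDF is the cumulative sum of the probabilities

set.seed(12345)
z <- rbinom(1000, 10, 0.5)
# empirical frequency of each k, keeping values of k that were never drawn
emp <- as.numeric(table(factor(z, levels = x))) / length(z)
round(cbind(k = x, theoretical = pmf, empirical = emp), 3)
```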
Binomial Distribution: Continued
Some plots from the previous slide
Poisson Distribution
Examples using the dpois(), ppois(), rpois() functions
# check the help
> help(dpois)
# get a Poisson distribution with lambda*t=4.6
> x=0:10
> y=dpois(x,4.6)
> plot(x,y,xlab="k",ylab="Pr(k)",main="lambda*t=4.6")
# get a Poisson distribution with lambda*t=1.15
> y=dpois(x,1.15)
> plot(x,y,xlab="k",ylab="Pr(k)",main="lambda*t=1.15")
# get the cumulative distribution function
> y=ppois(x,4.6)
> plot(x,y,xlab="k",ylab="CDF of Pr(k)",main="lambda*t=4.6")
# generate 1000 random numbers from the Poisson distribution
> z=rpois(1000,4.6)
> hist(z)
# Poisson approximation to the binomial distribution
> x=0:20
> y=dbinom(x,100,0.05)
> z=dpois(x,5.0)
> plot(x,y,xlab="k",ylab="Pr(k)",col="red",main="Red: Binomial, n=100, p=0.05 \n green: Poisson, lambda*t=5.0")
> points(x,z,col="green")
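To put a number on how close the red and green points are, the largest pointwise gap between the two distributions can be computed. The second case, with n=1000 and p=0.005, is an assumption added here to show what happens when n grows while n*p stays at 5.

```r
# Largest pointwise gap between binomial and Poisson probabilities, n*p = 5.
x <- 0:20
gap_n100  <- max(abs(dbinom(x, 100, 0.05)   - dpois(x, 5)))
gap_n1000 <- max(abs(dbinom(x, 1000, 0.005) - dpois(x, 5)))
c(n100 = gap_n100, n1000 = gap_n1000)
gap_n1000 < gap_n100   # the approximation tightens for larger n and smaller p
```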
Poisson Distribution: Continued
Some plots from the previous slide
Normal Distribution
Examples using the dnorm(), pnorm(), rnorm() functions
# check the help
> help(dnorm)
# get a normal distribution with mean=2, sd=4
> x=-5:9
> y=dnorm(x,2,4)
> plot(x,y,xlab="x",ylab="density",main="normal, mean=2, sd=4")
# generate 1000 random numbers from the normal distribution
> z=rnorm(1000,2,4)
# plot the histogram of z on the density scale
> hist(z,freq=FALSE)
# add y values as red points
> points(x,y,col="red")
# normal approximation to the binomial distribution (n=25, p=0.4)
> x=0:20
> y=dbinom(x,25,0.4)
# normal distribution with mean=np=10, variance=npq=6
> z=dnorm(x,10,sqrt(6))
> plot(x,y,xlab="x",ylab="Pr(x)",col="red",main="red: binomial\n green: normal")
> points(x,z,col="green")
# normal approximation to the Poisson distribution (mean=variance=lambda*t=10)
> y=dpois(x,10)
> z=dnorm(x,10,sqrt(10))
> plot(x,y,xlab="x",ylab="Pr(x)",col="red",main="red: poisson\n green: normal")
> points(x,z,col="green")
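The normal approximation can also be compared on cumulative probabilities using pnorm(). The sketch below uses the same binomial (n=25, p=0.4) as the slide; the cutoff k=12 and the continuity correction (evaluating the normal CDF at k + 0.5) are additions for illustration and are not covered on the slide.

```r
# Compare Pr(X <= 12) for Binomial(25, 0.4) with two normal approximations.
n <- 25
p <- 0.4
mu <- n * p                       # mean = np
sigma <- sqrt(n * p * (1 - p))    # sd = sqrt(npq)

exact     <- pbinom(12, n, p)
plain     <- pnorm(12, mu, sigma)       # normal CDF at k
corrected <- pnorm(12.5, mu, sigma)     # normal CDF at k + 0.5
c(exact = exact, plain = plain, corrected = corrected)
```

In this case the corrected value lies closer to the exact binomial probability than the plain one, because the discrete probability at k=12 is spread over the interval from 11.5 to 12.5 on the continuous scale.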
Normal Distribution: Continued
Some plots from the previous slide
The End