Normal distribution

Probability and Statistics Mat 271E
Asst. Prof. Dr. Tarkan Erdik
Probability Distributions: Uniform and Normal Distributions (Week 7)


Probability Distributions

It has been observed that certain functions F(x) and f(x) can successfully express the distributions of many random variables.

Several continuous distributions play useful roles in engineering, as in numerous other disciplines. The more important ones are:

Uniform, Normal, Exponential, Gamma, Beta, Weibull, and Lognormal distributions.

What is the difficulty in choosing the distribution function? There are no general rules for this.

So how should we choose it? The engineer has to make a choice based on experience and knowledge of the properties of the commonly used distributions.

The comparison of the histogram of the observed data with the chosen probability density function helps in the decision making.
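As a rough sketch of this comparison, the snippet below overlays a candidate normal density on a density-scaled histogram. The data vector is synthetic and purely illustrative (an assumption, not course data), and the normal distribution is only one possible candidate; normpdf is the same Statistics Toolbox function used elsewhere in these slides.

```matlab
% Hypothetical sample: 200 synthetic observations (illustration only).
rng(1);                                  % fix the seed so the figure is repeatable
data = 50 + 5*randn(200, 1);

% Estimate the parameters of the candidate normal distribution.
m = mean(data);
s = std(data);

% Density-scaled histogram with the candidate pdf drawn on top.
figure;
histogram(data, 'Normalization', 'pdf');
hold on
xg = linspace(min(data), max(data), 200);
plot(xg, normpdf(xg, m, s), 'r', 'LineWidth', 1.5);
hold off
legend('observed data', 'candidate normal pdf');
```

If the bars follow the curve reasonably well, the candidate distribution is plausible; systematic gaps (for example a long right tail) suggest trying another distribution.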

Uniform Distribution

The simplest type of continuous distribution is the uniform. As implied by the name, the pdf is constant over a given interval (for example, from a to b, where a < b): f(x) = c and F(x) = c(x - a) for a ≤ x ≤ b, where c = 1/(b - a) is a constant.

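The density function of the continuous uniform random variable X on the interval [A, B] is

$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B \\ 0, & \text{elsewhere} \end{cases}$$

For example, on the interval [1, 3] the density is 1/(3 - 1) = 1/2.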

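The mean and variance of the uniform distribution are

$$\mu = \frac{A + B}{2}, \qquad \sigma^2 = \frac{(B - A)^2}{12}$$

Example: Assume that the length X of a conference has a uniform distribution on the interval [0, 4]. What is P[X ≥ 3]?

The density is f(x) = 1/4 for 0 ≤ x ≤ 4 and 0 elsewhere, so

$$P[X \ge 3] = \int_3^4 \frac{1}{4}\,dx = \frac{1}{4}$$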
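The uniform formulas can be checked numerically with a few lines of MATLAB. This is a minimal sketch for the conference example, assuming the garbled question asks for P[X ≥ 3]; the variable names are illustrative.

```matlab
% Uniform distribution on [A, B]; values follow the conference example.
A = 0;  B = 4;
f = 1/(B - A);             % constant density on [A, B]
mu = (A + B)/2;            % mean
sigma2 = (B - A)^2/12;     % variance

% P[X >= 3] is the area of the rectangle between 3 and B (assumed question).
p = (B - 3)*f;
fprintf('f = %.2f, mean = %.1f, variance = %.4f, P[X >= 3] = %.2f\n', ...
        f, mu, sigma2, p);
```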

Normal Distributions

The normal distribution arose originally in the study of experimental errors.

Such errors pertain to unavoidable differences between observations when an experiment is repeated under similar conditions. An alternative term is noise, which is used in telecommunication engineering and elsewhere when referring to the difference between the true state of nature and the signal received.

The uncertainties which are manifest in the errors may arise from different causes that are not easily identifiable.

THE NORMAL DISTRIBUTION IS AN IDEAL CANDIDATE TO REPRESENT SUCH ERRORS WHEN THEY ARE OF AN ADDITIVE NATURE.

A large number of random variables encountered in practical applications fit the Normal (Gaussian) distribution with the following probability density function:

$$f_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right], \qquad -\infty < x < \infty$$

This distribution is written briefly as $N(\mu_X, \sigma_X^2)$. It has two parameters: $\mu_X$ and $\sigma_X$. The normal distribution is symmetrical (Cs = 0), with a kurtosis coefficient equal to 3 (K = 3). The mode and median are equal to the mean because of the symmetry. The sample values $\bar{x}$ and $s_X$ can be taken as the estimates of the parameters $\mu_X$ and $\sigma_X$.

Figure: (a) probability density function, $f_X(x)$, and (b) probability distribution function, $F_X(x)$, of X for $\mu = 0$ and $\sigma = 1$.

Once $\mu$ and $\sigma$ are specified, the normal curve is completely determined. For example, if $\mu = 50$ and $\sigma = 5$, then the ordinates n(x; 50, 5) can be computed for various values of x and the curve drawn.

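Example: Let's draw two normal density functions with different means but the same standard deviation, n(x; 0, 1) and n(x; 0.5, 1), on the interval x = [-3, 3].

    x = -3:0.1:3;                 % grid of x values
    pdf1 = normpdf(x, 0, 1);      % mean 0, standard deviation 1
    pdf2 = normpdf(x, 0.5, 1);    % mean 0.5, standard deviation 1
    figure;
    plot(x, pdf1, 'r')
    hold on
    plot(x, pdf2, 'b')

Next, let's draw two normal density functions with the same mean but different standard deviations, n(x; 0, 1) and n(x; 0, 2), on the same interval.

    x = -3:0.1:3;                 % grid of x values
    pdf1 = normpdf(x, 0, 1);      % mean 0, standard deviation 1
    pdf2 = normpdf(x, 0, 2);      % mean 0, standard deviation 2
    figure;
    plot(x, pdf1, 'r')
    hold on
    plot(x, pdf2, 'b')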

The analytical form of the probability distribution function F(x) of the normal distribution cannot be obtained in closed form by integration, but it is tabulated numerically. A single table for the normal distribution can be prepared by standardizing the random variable as follows:

$$Z = \frac{X - \mu_X}{\sigma_X}$$

where the standard normal variable Z has mean 0 and standard deviation 1. The distribution N(0, 1) of the variable Z is called the STANDARD NORMAL DISTRIBUTION.
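A short sketch of the standardization step, using illustrative values (the numbers for mu, sigma and x are assumptions, not taken from the slides). It evaluates the standard normal cdf through the base MATLAB function erfc and compares the result with normcdf from the Statistics Toolbox.

```matlab
% Standardize x from N(mu, sigma^2) and evaluate the cdf two ways.
mu = 50;  sigma = 5;  x = 58;          % illustrative values
z = (x - mu)/sigma;                    % standard normal variable

Phi = @(t) 0.5*erfc(-t/sqrt(2));       % standard normal cdf via erfc
p_std = Phi(z);                        % F(x) from the standardized variable
p_dir = normcdf(x, mu, sigma);         % same probability computed directly

fprintf('z = %.2f, Phi(z) = %.4f, normcdf = %.4f\n', z, p_std, p_dir);
```

Both routes give the same probability, which is why one table for N(0, 1) is enough for every normal distribution.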

Probability distribution function of the standard normal distribution

Since the normal distribution is symmetrical, this table is prepared for the positive values of Z only.

The probabilities of Z exceeding a certain positive value z, F1(z) = A, are given.

For positive z we can compute the probability of nonexceedance as F(z) = 1 - F1(z), and for negative z we have F(z) = F1(|z|) because of symmetry around the mean 0.

The probability density function is bell-shaped around the mean $\mu_X$.

The mode and median are equal to the mean because of the symmetry.

The probabilities that a normal variable lies within one, two and three standard deviations of the mean are 0.683, 0.955 and 0.997 (nearly 1), respectively.
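These interval probabilities can be reproduced from the standard normal cdf. The sketch below assumes the Statistics Toolbox function normcdf is available.

```matlab
% Probability that a normal variable lies within k standard deviations
% of its mean, for k = 1, 2, 3 (the result does not depend on mu or sigma).
k = 1:3;
p = normcdf(k) - normcdf(-k);
fprintf('within %d sigma: %.4f\n', [k; p]);
```

Rounded to three decimals, the three values are 0.683, 0.954 (0.9545 to four decimals) and 0.997.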

The probability paper of the normal distribution

The ordinate axis of this probability paper is scaled such that the cumulative distribution function of the normal distribution appears as a straight line. The standard deviation can be computed as $\sigma_X = x_{0.84} - \mu_X$ or $\sigma_X = \mu_X - x_{0.16}$, because the probability that a normal variable lies within one standard deviation of the mean (an interval two standard deviations wide) is about 0.68.

Central Limit Theorem

The Central Limit Theorem states that the distribution of the variable

$$X = \sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$$

where the $X_i$ are independent random variables, approaches the normal distribution as n increases, whatever the distributions of the variables $X_i$ are.

The approach is rather fast, such that the normal distribution can be assumed for n ≥ 10.

Thus, if a random variable is affected by a large number of independent variables such that the effects are additive, then it can be assumed to be distributed normally.
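The additive effect can be illustrated with a small simulation. This is only a sketch under assumed settings: each of the n = 10 components is drawn from a uniform distribution on (0, 1), and the sample size N is arbitrary. The skewness and kurtosis coefficients of the sum are computed from sample moments.

```matlab
% Sum n independent U(0,1) variables and inspect the shape of the sum.
rng(0);                              % repeatable random numbers
n = 10;  N = 1e5;                    % assumed number of terms and sample size
X = sum(rand(n, N), 1);              % each column gives one realization of the sum

m = mean(X);                         % theoretical mean is n/2
s = std(X);                          % theoretical variance is n/12
zs = (X - m)/s;                      % standardized sums
Cs = mean(zs.^3);                    % skewness coefficient, near 0
K  = mean(zs.^4);                    % kurtosis coefficient, near 3

fprintf('mean = %.3f, std = %.3f, Cs = %.3f, K = %.3f\n', m, s, Cs, K);
```

Even though each component is far from normal, the sum already has nearly zero skewness and a kurtosis close to 3.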


Example: Given a standard normal distribution, find the area under the curve that lies
(a) to the right of z = 1.84 and
(b) between z = -1.97 and z = 0.86.

Solution (a)

P(Z > 1.84) = 0.0329

Solution (b)

P(Z > -1.97) = 1 - 0.0244 = 0.9756
P(-1.97 < Z < 0.86) = 0.9756 - 0.1949 = 0.7807

Example: Given a standard normal distribution, find the value of k such that
(a) P(Z > k) = 0.3015 and
(b) P(k < Z < -0.18) = 0.4197.

Solution (a)

k = 0.52

Solution (b)

P(Z < -0.18) - P(Z < k) = 0.4286 - P(Z < k) = 0.4197
P(Z < k) = 0.0089
k = -2.37

Example: Given a random variable X having a normal distribution with $\mu = 50$ and $\sigma = 10$, find the probability that X assumes a value between 45 and 62.

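Solution: The z values corresponding to x1 = 45 and x2 = 62 are

$$z_1 = \frac{45 - 50}{10} = -0.5, \qquad z_2 = \frac{62 - 50}{10} = 1.2$$

P(Z < 1.2) = 1 - 0.1151 = 0.8849
P(45 < X < 62) = P(-0.5 < Z < 1.2) = 0.8849 - 0.3085 = 0.5764

Example: Given a normal distribution with $\mu = 40$ and $\sigma = 6$, find the value of x that has
(a) 45% of the area to the left and
(b) 14% of the area to the right.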

Solution

(a) P(Z < -0.13) ≈ 0.45, so x = (6)(-0.13) + 40 = 39.22
(b) P(Z > 1.08) ≈ 0.14, so x = (6)(1.08) + 40 = 46.48

Example: A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year. Assuming that battery life is normally distributed, find the probability that a given battery will last less than 2.3 years.
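The transcript does not preserve the worked solution for the battery example, so the following is only a sketch of how it could be computed: standardize 2.3 years and read the nonexceedance probability from the standard normal cdf (normcdf, Statistics Toolbox).

```matlab
% Battery life: normal with mean 3.0 years and standard deviation 0.5 year.
mu = 3.0;  sigma = 0.5;  x = 2.3;
z = (x - mu)/sigma;            % standardized value
p = normcdf(z);                % P(X < 2.3) = P(Z < z)
fprintf('z = %.2f, P(X < %.1f) = %.4f\n', z, x, p);
```

The standardized value is z = -1.4, and the probability comes out at about 0.08.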

Example: An electrical firm manufactures light bulbs that have a life, before burn-out, that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

Solution: The z values corresponding to x1 = 778 and x2 = 834 are

$$z_1 = \frac{778 - 800}{40} = -0.55, \qquad z_2 = \frac{834 - 800}{40} = 0.85$$

P(Z > -0.55) = 1 - 0.2912 = 0.7088
P(778 < X < 834) = P(-0.55 < Z < 0.85) = 0.7088 - 0.1977 = 0.5111

Example: In an industrial process, the diameter of a ball bearing is an important measurement. The buyer sets specifications for the diameter to be 3.0 ± 0.01 cm. The implication is that no part falling outside these specifications will be accepted. It is known that in the process the diameter of a ball bearing has a normal distribution with mean $\mu = 3.0$ and standard deviation $\sigma = 0.005$. On average, how many manufactured ball bearings will be scrapped?

Solution: The specification limits x1 = 2.99 and x2 = 3.01 correspond to

$$z_1 = \frac{2.99 - 3.0}{0.005} = -2.0, \qquad z_2 = \frac{3.01 - 3.0}{0.005} = 2.0$$

P(Z < -2.0) + P(Z > 2.0) = 2(0.0228) = 0.0456

As a result, it is anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.

Example: Gauges are used to reject all components for which a certain dimension is not within the specification 1.50 ± d. It is known that this measurement is normally distributed with mean 1.50 and standard deviation 0.2. Determine the value d such that the specifications cover 95% of the measurements.

Solution: P(-1.96 < Z < 1.96) = 0.95, so

(x - 1.5)/0.2 = 1.96; x = 1.89 and d = 1.89 - 1.5 = 0.39
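The gauge calculation can be cross-checked with the inverse normal cdf instead of a table. This is a minimal sketch assuming norminv and normcdf from the Statistics Toolbox; the variable names are illustrative.

```matlab
% Find d so that 1.50 +/- d covers 95% of a N(1.50, 0.2^2) measurement.
mu = 1.50;  sigma = 0.2;  coverage = 0.95;
z = norminv(1 - (1 - coverage)/2);     % upper z value of the central 95%
d = z*sigma;                           % half-width of the specification

% Verify the coverage of the interval [mu - d, mu + d].
check = normcdf(mu + d, mu, sigma) - normcdf(mu - d, mu, sigma);
fprintf('z = %.2f, d = %.3f, coverage = %.4f\n', z, d, check);
```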

How can we tell if a variable is normally distributed?

1. The first check is to sketch the cumulative frequency distribution of the data on normal probability paper. We can assume a normal distribution if the plot is nearly a straight line.
2. The closeness of the sample skewness coefficient Cs to 0 (absolute value below 0.10 or 0.05) and of the kurtosis coefficient K to 3 (between 2.5 and 3.5) are further checks.
3. If the data pass these checks, the assumption of the normal distribution can be tested by statistical tests to be discussed in Chapter 6.

The normal distribution is certainly not valid in many cases because the variable is skewed.

Most hydrologic variables (such as the discharge in a stream, the precipitation depth at a location) are NOT symmetrically distributed.

For such variables, distributions other than normal must be used.
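The skewness and kurtosis checks can be sketched in a few lines. Both samples below are synthetic and purely illustrative (assumptions, not hydrologic data): one is drawn from a normal distribution, the other from a right-skewed distribution. The coefficients are computed from sample moments so that only base MATLAB functions are needed.

```matlab
% Synthetic samples (illustration only).
rng(2);
x_norm = 10 + 2*randn(20000, 1);      % approximately normal
x_skew = exp(randn(20000, 1));        % strongly right-skewed

% Moment-based sample skewness and kurtosis coefficients.
Cs = @(x) mean((x - mean(x)).^3) / std(x, 1)^3;
K  = @(x) mean((x - mean(x)).^4) / std(x, 1)^4;

% Thresholds quoted in the checks: |Cs| < 0.10 and 2.5 < K < 3.5.
looksNormal = @(x) abs(Cs(x)) < 0.10 && K(x) > 2.5 && K(x) < 3.5;

fprintf('normal sample: Cs = %.3f, K = %.3f, passes = %d\n', ...
        Cs(x_norm), K(x_norm), looksNormal(x_norm));
fprintf('skewed sample: Cs = %.3f, K = %.3f, passes = %d\n', ...
        Cs(x_skew), K(x_skew), looksNormal(x_skew));
```

For the graphical check, the Statistics Toolbox function normplot draws a sample on normal probability axes, where normal data fall close to a straight line.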

Exercises

Given a standard normal distribution, find the value of k such that
(a) P(Z > k) = 0.2946;
(b) P(Z < k) = 0.0427;
(c) P(-0.93 < Z < k) = 0.7235.

Answers: (a) k = 0.54; (b) k = -1.72; (c) P(Z > -0.93) = 1 - 0.1762 = 0.8238 and P(Z > k) = 0.8238 - 0.7235 = 0.1003, so k = 1.28.

EXAMPLE 4.3 (M. Bayazıt, page 83) The load of a footing consists of the sum of the dead load and moving load. These loads are assumed to be random variables. The dead load X has the mean 100 t, standard deviation 10 t. The moving load Y has the mean 40 t, standard deviation 10 t. The design load corresponds to the risk of exceedance of 5%. Let us determine the design load assuming that loads follow the normal distribution.
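The worked solution of Example 4.3 is not preserved in this transcript. One possible approach is sketched below under an explicit assumption that the example statement does not spell out: the dead load and moving load are taken to be independent, so that their variances add and the total load is again normal.

```matlab
% Design load for a footing: total load T = X + Y.
muX = 100;  sX = 10;                   % dead load (t)
muY = 40;   sY = 10;                   % moving load (t)

muT = muX + muY;                       % mean of the total load
sT  = sqrt(sX^2 + sY^2);               % std of the total (independence ASSUMED)

risk = 0.05;                           % probability of exceedance
designLoad = norminv(1 - risk, muT, sT);
fprintf('mean = %.0f t, std = %.2f t, design load = %.1f t\n', ...
        muT, sT, designLoad);
```

Under these assumptions the design load is roughly 163 t; a correlation between the two loads would change the standard deviation and therefore the result.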

EXAMPLE 4.5 (M. Bayazıt, page 85) Water is transmitted from A to B by two parallel pipelines. The capacities of these pipelines are assumed to be normal variables with parameters:

$\mu_X$ = 5 m³/s, $Cv_X$ = 0.10

$\mu_Y$ = 8 m³/s, $Cv_Y$ = 0.15

Find the probability that the total discharge is below 12 m³/s.
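No solution for Example 4.5 survives in the transcript. The sketch below shows one way to set it up, assuming the two pipeline capacities are independent (an assumption not stated in the example) and using the coefficient of variation Cv = standard deviation / mean to recover each standard deviation.

```matlab
% Total discharge of two parallel pipelines: T = X + Y.
muX = 5;  CvX = 0.10;                  % pipeline X (m^3/s)
muY = 8;  CvY = 0.15;                  % pipeline Y (m^3/s)

sX = CvX*muX;                          % standard deviation of X
sY = CvY*muY;                          % standard deviation of Y

muT = muX + muY;                       % mean of the total discharge
sT  = sqrt(sX^2 + sY^2);               % std of the total (independence ASSUMED)

p = normcdf(12, muT, sT);              % P(T < 12)
fprintf('mean = %.1f, std = %.2f, P(T < 12) = %.3f\n', muT, sT, p);
```

With these assumptions the total has mean 13 m³/s and standard deviation 1.3 m³/s, and the probability of a total discharge below 12 m³/s is about 0.22.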
