Post on 05-Apr-2018
7/31/2019 Appl Statistics
http://slidepdf.com/reader/full/appl-statistics 1/48
Statistics, Probability and
Applications
Statistics, Probability and Applications by - Dr. Abhijit Kar Gupta, kg.abhi@gmail.com
This is a short course [based on the lecture notes for the M.Sc. (Geography) students of Distance
Education] on the elements of Statistics and the concept of Probability, supplemented with
examples and illustrations. The purpose of this general and basic course is to serve as a guideline for practical use by any student.
Statistics, Probability and Applications
What is Statistics?
Statistics is a systematic presentation of data out of which we may conclude something
meaningful.
Just a collection of raw data is meaningless unless we are able to calculate some quantities out
of them. It is only interesting when some patterns emerge out of the data that are
representative of some event or measurement.
After we collect a set of data, the first thing we like to do is to obtain the central tendency of it.
Central Tendency
The central tendency of a data set is obtained by calculating mean, median and mode.
Mean:
There are various kinds of mean, (i) arithmetic, (ii) geometric, (iii) harmonic. We usually calculate
arithmetic mean and this we commonly call mean or average.
Suppose we have a set of $n$ data points: $x_1, x_2, \dots, x_n$.
Arithmetic mean (A.M.):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (1)
The arithmetic mean or average is the measure of the ‘middle’ of the data set.
Now suppose $x_1$ appears $f_1$ times, $x_2$ appears $f_2$ times and so on in the data set. Here
$f_1, f_2, \dots$ are called the frequencies. The arithmetic mean in this case is
$\bar{x} = \frac{1}{N}\sum_i f_i x_i$, (2)
where $N = \sum_i f_i$.
Formula (2) is called the weighted mean.
Note: In formula (2), if we put $f_i = 1$ for all $i$, we get back formula (1).
Formula (2) can also be written as
$\bar{x} = \sum_i \frac{f_i}{N} x_i = \sum_i p_i x_i$,
where $p_i = f_i / N$ is the relative frequency for each $x_i$ (each data point).
Example #1
The ages of father, mother, son and daughter in a family are 60 years, 55 years, 25 years
and 20 years respectively. What is the average age of the family members?
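Ans. Average age = $\frac{60 + 55 + 25 + 20}{4} = \frac{160}{4} = 40$ years.
Example #2
In the game of ‘Ludo’ (dice throwing), you obtain ‘1’ two times, ‘2’ five times, ‘3’ two times, ‘4’ six times, ‘5’ four times and ‘6’ only once from the random throwing of a die.
What is the average value you get?
Ans. Average value = $\frac{1\times 2 + 2\times 5 + 3\times 2 + 4\times 6 + 5\times 4 + 6\times 1}{2 + 5 + 2 + 6 + 4 + 1} = \frac{68}{20} = 3.4$.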
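The two examples above can be checked with a few lines of Python. This is only a sketch: the function name weighted_mean is hypothetical and introduced here for illustration; it evaluates the sum of (frequency × value) divided by the total frequency.

```python
def weighted_mean(values, frequencies):
    """Weighted arithmetic mean: sum(f_i * x_i) / sum(f_i)."""
    total = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / total

# Example #1: every age occurs once, so all frequencies are 1.
ages = [60, 55, 25, 20]
family_mean = weighted_mean(ages, [1] * len(ages))

# Example #2: faces of the die and how often each one turned up.
faces = [1, 2, 3, 4, 5, 6]
counts = [2, 5, 2, 6, 4, 1]
ludo_mean = weighted_mean(faces, counts)

print(family_mean, ludo_mean)
```

Setting all frequencies to 1 reduces the weighted mean to the ordinary arithmetic mean, which is why the same function serves both examples.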
Geometric mean (G.M.):
G.M. = $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$
Harmonic mean (H.M.):
H.M. = $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
It is useful to calculate arithmetic mean (A.M.) of any set of numbers unless they have
some special properties among them.
For example, if we are to find the mean of the following set of numbers: 2, 4, 8, 16, 32, it
is useful to calculate the geometric mean (G.M.).
G.M. = $(2 \times 4 \times 8 \times 16 \times 32)^{1/5} = (2^{15})^{1/5} = 8$
Note: The numbers 2, 4, 8…are in geometric progression.
If we are asked to find out the mean of the following numbers,
, it would be
interesting to find out the harmonic mean (H.M.):
H.M.=
.
Note: Here the numbers are in harmonic progression. In fact, the reciprocals of numbers that are
in harmonic progression are in arithmetic progression.
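As a rough check of the geometric and harmonic means, the sketch below uses Python's standard statistics module. The geometric progression 2, 4, 8, 16, 32 is taken from the text; the harmonic progression 1, 1/2, 1/3, 1/4 is a hypothetical set chosen only for illustration.

```python
from statistics import geometric_mean, harmonic_mean

# Geometric progression from the text.
gp_numbers = [2, 4, 8, 16, 32]
gm = geometric_mean(gp_numbers)

# Hypothetical harmonic progression: the reciprocals 1, 2, 3, 4
# are in arithmetic progression.
hp_numbers = [1, 1 / 2, 1 / 3, 1 / 4]
hm = harmonic_mean(hp_numbers)

print(round(gm, 6), round(hm, 6))
```

For the illustrative harmonic progression, the harmonic mean is 4 / (1 + 2 + 3 + 4), i.e. the reciprocal of the arithmetic mean of the reciprocals.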
Useful Method of Mean Calculation:
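In practical calculations, when we are to obtain the arithmetic mean (A.M.) of a set of big numbers, we follow a shortcut method:
Step I: We assume a mean by just looking at the numbers. Let this be $A$.
(This is our choice and we do this as per our convenience.)
Step II: Next, we calculate the deviation of each data point from this assumed mean: $d_i = x_i - A$.
Now, the mean of the deviations is
$\bar{d} = \frac{1}{n}\sum_i d_i = \frac{1}{n}\sum_i (x_i - A) = \bar{x} - A$.
The actual arithmetic mean, $\bar{x} = A + \bar{d}$.
Similarly, for data with frequencies, $\bar{d} = \frac{1}{N}\sum_i f_i d_i = \frac{1}{N}\sum_i f_i x_i - \frac{A}{N}\sum_i f_i = \bar{x} - A$, where $N = \sum_i f_i$.
Here also, we get the same formula as above, $\bar{x} = A + \bar{d}$.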
Example #1
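Consider the following table. We are to calculate the mean rainfall over seven days in the monsoon
season.
Day      Rainfall $x_i$ (mm)    Deviation $d_i = x_i - A$ (mm)
1        250                    50
2        240                    40
3        190                    -10
4        254                    54
5        225                    25
6        232                    32
7        170                    -30
Total    1561                   161
Here the assumed mean is $A = 200$ mm and $n = 7$.
Mean of deviations, $\bar{d} = \frac{1}{n}\sum_i d_i = \frac{161}{7} = 23$ mm. The actual mean, $\bar{x} = A + \bar{d} = 200 + 23 = 223$ mm.
Also, verify by direct calculation, $\bar{x} = \frac{1}{n}\sum_i x_i = \frac{1561}{7} = 223$ mm.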
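The assumed-mean shortcut for the rainfall table can be sketched in Python as follows. The assumed mean of 200 mm is inferred from the deviations listed in the table; the variable names are illustrative only.

```python
rainfall_mm = [250, 240, 190, 254, 225, 232, 170]
assumed_mean = 200  # inferred from the table: 250 -> 50, 190 -> -10, ...

# Step II: deviation of each data point from the assumed mean.
deviations = [x - assumed_mean for x in rainfall_mm]

# Mean of the deviations, then shift back by the assumed mean.
mean_deviation = sum(deviations) / len(deviations)
actual_mean = assumed_mean + mean_deviation

# Direct calculation for comparison.
direct_mean = sum(rainfall_mm) / len(rainfall_mm)

print(sum(deviations), mean_deviation, actual_mean, direct_mean)
```

The shortcut and the direct calculation must agree for any choice of assumed mean; a convenient choice only keeps the intermediate numbers small.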
Example #2
Calculate the mean of the following data with the help of assumed mean method.
Class interval    $x_i$    $f_i$    $d_i = x_i - A$    $f_i d_i$
10-20             45       4        5                  20
20-30             35       5        -5                 -25
30-40             48       3        8                  24
40-50             43       2        3                  6
50-60             40       1        0                  0
60-70             37       1        -3                 -3
70-80             39       4        -1                 -4
Total                      20                          18
Here the assumed mean, $A = 40$, and the number of data, $N = \sum_i f_i = 20$.
Mean of deviation, $\bar{d} = \frac{1}{N}\sum_i f_i d_i = \frac{18}{20} = 0.9$
The actual mean, $\bar{x} = A + \bar{d} = 40 + 0.9 = 40.9$
We can also check this from direct calculation,
$\bar{x} = \frac{1}{N}\sum_i f_i x_i = \frac{818}{20} = 40.9$
Median:
Median is the data in the middle when the data set is arranged in ascending or descending
order.
Example #1
9, 12, 6, 1, 11
After ordering, 1, 6, 9, 11, 12
Median = 9.
If the data set has an even number of entries, the median is the mean of the two data points in the
middle after the ordering.
Example #2
9, 12, 6, 1, 11, 13
After ordering, 1, 6, 9, 11, 12, 13
Median = $\frac{9 + 11}{2} = 10$
Mode:
Mode is the data value which has maximum frequency. This means this value occurs maximum
number of times in the data set.
Example:
0, 2, 5, 9, 3, 2, 6, 2, 3, 5, 4, 2, 1
In the above data set the number 2 occurs the maximum number of times (four times). Mode = 2.
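The median and mode examples above can be reproduced with Python's standard statistics module; this is a minimal sketch using the data sets from the text.

```python
from statistics import median, mode

odd_set = [9, 12, 6, 1, 11]
even_set = [9, 12, 6, 1, 11, 13]
mode_set = [0, 2, 5, 9, 3, 2, 6, 2, 3, 5, 4, 2, 1]

# median() sorts the data itself; for an even number of entries
# it averages the two middle values.
median_odd = median(odd_set)
median_even = median(even_set)

# mode() returns the most frequent value.
most_common = mode(mode_set)

print(median_odd, median_even, most_common)
```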
Usually, a data set approximately follows the empirical relation:
3 Median – Mode = 2 Mean
Measures of Position
It is often important to classify data!
In statistical data analysis, we often like to measure the position of a data point relative to
other values in the set. For example, we like to know the rank or position of a student relative
to others in a certain examination.
The measures are done for rank-ordered data, where the elements in the data set are arranged
in ascending order (from the smallest to the largest).
The following are the most common measures of position of the rank-ordered data:
Percentiles:
A percentile is the value of a variable below which a certain percent of observations fall. For
example, the 90th percentile is the value (or score) below which 90% of the data are to be found.
Suppose we have $N$ values. How is the percentile calculated?
1. First the data are rank-ordered (arranged in ascending order).
2. To calculate the $P$-th percentile we have to find the rank $n$: $n = \frac{P}{100} N + \frac{1}{2}$
3. Round off the above rank to the nearest integer and then take the value corresponding
to the integer rank.
Example:
Given the numbers 2, 5, 4, 9, 8, 1
Rank-ordered set: 1, 2, 4, 5, 8, 9
The rank of the 60th percentile, $n = \frac{60}{100} \times 6 + \frac{1}{2} = 4.1 \approx 4$ (rounded off to the nearest
integer).
The 60th percentile is 5 (the 4th member in the ordered list).
Note: The 100th percentile is defined to be the largest value in the given data set.
Quartiles:
A quartile is one of the three points that divide a rank-ordered data set into four equal groups.
First quartile ($Q_1$): Cuts off the lowest 25% of data ⇨ 25th percentile
Second quartile ($Q_2$): Divides the data set in half ⇨ 50th percentile
Third quartile ($Q_3$): Cuts off the lowest 75% (or highest 25%) of data ⇨ 75th percentile
Interquartile range = upper quartile – lower quartile = $Q_3 - Q_1$
Note: The 50th percentile = Median
Deciles:
Like percentiles, deciles are calculated to find the position of data out of 10 (instead of 100). So
all we have to do is to replace 100 by 10 in the above percentile formula.
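The percentile procedure can be sketched in Python. The rank rule n = (P/100) × N + 1/2, rounded to the nearest integer, is an assumption made here (it reproduces rank 4 for the 60th percentile in the example above); the function name percentile_by_rank is hypothetical.

```python
import math

def percentile_by_rank(data, p):
    """p-th percentile using the assumed rank rule n = p/100 * N + 1/2."""
    ordered = sorted(data)                 # step 1: rank-order the data
    n_values = len(ordered)
    if p >= 100:
        return ordered[-1]                 # 100th percentile: largest value
    rank = p / 100 * n_values + 0.5        # step 2: the rank
    rank = math.floor(rank + 0.5)          # step 3: round to nearest integer
    rank = min(max(rank, 1), n_values)     # keep the rank inside the list
    return ordered[rank - 1]               # ranks start at 1

numbers = [2, 5, 4, 9, 8, 1]
p60 = percentile_by_rank(numbers, 60)
p100 = percentile_by_rank(numbers, 100)
q1 = percentile_by_rank(numbers, 25)       # first quartile
q3 = percentile_by_rank(numbers, 75)       # third quartile

print(p60, p100, q1, q3, q3 - q1)
```

Other textbooks and software use slightly different rank or interpolation rules, so quartiles from this sketch may differ from those of a statistics package.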
Probability Theory and Applications
For randomly occurring events, we would like to know how many times we get a desired result
out of all trials. This means we would like to know the fraction of favourable events or trials.
Suppose we flip a coin a number of times. We may count how many times there is a
“Head” or a “Tail” out of all the flips.
Let
$n$ = number of favourable events and $N$ = total number of events.
$n/N$ = fraction of favourable events. We can also say this is the relative frequency in the usual
language of Statistics.
Now, if we do the trials a large number of times, this fraction $n/N$ tends to some fixed value
specific to the event. The limiting value of the fraction is what we call the probability $p$.
Note:
The total number of trials is also called the ‘sample size’ when we are drawing samples out of a total
‘population’. As the number of trials is increased, the sample becomes bigger.
Definition of Probability:
Probability is the ratio of the number of favourable events to the total number of events, provided
the total number of events is very large (actually infinite).
$p = \frac{n}{N}$, when $N \to \infty$ (infinity).
So by definition, $p$ is a fraction between 0 and 1: $0 \le p \le 1$.
$p = 0$: No favourable outcome.
$p = 1$: All the outcomes are in favour.
We can also think in the following way: $p$ = probability of an event occurring, $q$ = probability of
the event not occurring. Since the event will either occur or not occur, we must write: $p + q = 1$.
Therefore, we have $q = 1 - p$.
Example #1:
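In a coin toss, we know from our experience, $p = \frac{1}{2}$ (Head) and $q = 1 - p = \frac{1}{2}$ (Tail). So,
$p + q = 1$.
Example #2:
In a throw of a die, we know that the probability of the die facing “1” up, “2” up, “3” up etc.
will be $p(1) = \frac{1}{6}$, $p(2) = \frac{1}{6}$, $p(3) = \frac{1}{6}$ and so on.
Here, $p(1) + p(2) + p(3) + p(4) + p(5) + p(6) = 1$.
The probability of “1” not occurring is $1 - \frac{1}{6} = \frac{5}{6}$.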
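The idea that the relative frequency settles towards the probability as the number of trials grows can be illustrated with a small simulation. This is only a sketch: the coin is simulated with Python's pseudo-random generator, and the function name and the seed are arbitrary choices.

```python
import random

def relative_frequency(n_flips, seed=0):
    """Fraction of Heads in n_flips simulated tosses of a fair coin."""
    rng = random.Random(seed)              # fixed seed for repeatable runs
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The fraction of Heads fluctuates for few flips and
# approaches 1/2 as the number of flips increases.
for n_flips in (10, 100, 1000, 100_000):
    print(n_flips, relative_frequency(n_flips))
```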
Note:
The condition that the total probability of all the events has to be 1 is called normalization of
probabilities.
Rules of Probability:
When more than one event takes place, we need to calculate the joint probability for all the
events.
Mutually Exclusive Events
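Two events are mutually exclusive (or disjoint) when they cannot occur at the same time. Suppose the two events are A and B and the individual probabilities for them are designated as
$P(A)$ and $P(B)$. Mutually exclusive means
$P(A \text{ and } B) = 0$.
Addition Rule: $P(A \text{ or } B) = P(A) + P(B)$
Example #1: The probability of either Head or Tail occurring in a coin toss, $P(H \text{ or } T) = \frac{1}{2} + \frac{1}{2} = 1$.
Example #2: The probability of either “1” or “6” occurring in a die throw,
$P(1 \text{ or } 6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.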
Independent Events
When the occurrence of one event does not influence the other but they can occur at the same time, they are called independent. For example, rainfall today and Manchester United
winning a match.
Multiplication Rule: $P(A \text{ and } B) = P(A) \times P(B)$
Example #1: What is the probability that two Heads will occur when we toss two coins together?
$P(H) = \frac{1}{2}$ for the first coin and $P(H) = \frac{1}{2}$ for the second coin.
$P(H \text{ and } H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
Note that if we flip a single coin two times and ask for the probability of getting Heads twice, we
would get the same answer.
Example #2: Now we ask the question, what is the probability of getting one Head and one Tail
in the flipping of two coins together?
Consider the probability of obtaining Head on the first coin and Tail on the second coin:
$P(H \text{ and } T) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
And the probability of obtaining Tail on the first and Head on the second:
$P(T \text{ and } H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
Now the total probability of the above two events (they are mutually exclusive, so either one or the other occurs):
$\frac{1}{4} + \frac{1}{4} = \frac{1}{2}$.
Note that in the flipping of two coins together, there are 4 types of events: HH, HT, TH, TT. Out
of these, the relative occurrence of one Head and one Tail is 2/4 = 1/2.
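The two-coin results can be confirmed by listing all equally likely outcomes and counting the favourable ones. The sketch below does this with exact fractions; the helper name probability is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def probability(event):
    """Favourable outcomes divided by the total number of outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

two_heads = probability(lambda o: o == ("H", "H"))
one_head_one_tail = probability(lambda o: sorted(o) == ["H", "T"])

print(two_heads, one_head_one_tail)
```

Counting gives the same values as the multiplication rule (for HH) and the multiplication rule followed by the addition rule (for HT or TH).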
Events which are NOT Mutually Exclusive:
If the events are not mutually exclusive, there is some overlap. Suppose we designate an area
A corresponding to the probability of some event A and an area B corresponding to the probability of another
event B. The overlap between the two areas then represents the joint probability $P(A \text{ and } B)$. Note that for two mutually exclusive events the overlap would be zero.
Fig.
Addition Rule in this case: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Events that are NOT Independent:
Multiplication rule: $P(A \text{ and } B) = P(A) \times P(B/A) = P(B) \times P(A/B)$
$P(B/A)$ ⇨ “The probability of B given A”. This is a conditional probability, i.e., the probability of B occurring provided A occurs first.
Similarly, $P(A/B)$ ⇨ “The probability of A, given B”.
Note here that
$P(B/A) = P(B)$, when B does not depend on A, which means A and B are independent.
$P(A/B) = P(A)$, when A does not depend on B, which means A and B are independent.
So, we can write the formula for conditional probability: $P(B/A) = \frac{P(A \text{ and } B)}{P(A)}$
Now, to illustrate, consider the following table:
In a survey of 100 people, the question was asked whether they are graduates or not.
          Graduate    Non-graduate    Total
Male      40          20              60
Female    10          30              40
Total     50          50              100
Q.1 What is the probability that a randomly selected person is a male?
Ans. $P(\text{male}) = \frac{60}{100} = 0.6$
Q.2 What is the probability that a randomly selected person is a female?
Ans. $P(\text{female}) = \frac{40}{100} = 0.4$
Q.3 What is the probability that a randomly selected person is a male who is graduate?
Ans. $P(\text{male and graduate}) = \frac{40}{100} = 0.4$
[Also we can think, $P(\text{male and graduate}) = P(\text{male}) \times P(\text{graduate}/\text{male}) = 0.6 \times \frac{40}{60} = 0.4$]
Q.4 What is the probability that a randomly selected person is a female who is non-graduate?
Ans. $P(\text{female and non-graduate}) = \frac{30}{100} = 0.3$
[Also, $P(\text{female and non-graduate}) = P(\text{female}) \times P(\text{non-graduate}/\text{female}) = 0.4 \times \frac{30}{40} = 0.3$]
Q.5 What is the probability that the randomly selected person is either a male graduate or a
female non-graduate?
Ans. These two events are mutually exclusive and by the law of addition,
$0.4 + 0.3 = 0.7$.
Q.6 If we now select two persons, what is the probability that one of them is a male graduate
and another is a female non-graduate?
Ans. Two independent events are occurring together. So by the law of multiplication of
probabilities, $0.4 \times 0.3 = 0.12$.
Q.7 What is the probability that a randomly selected non-graduate is a female? [Prob. of
female among non-graduates]
Ans. This is the number of females out of total non-graduates, $\frac{30}{50} = 0.6$.
Q.8 What is the probability that a randomly selected graduate is a male?
Ans. This is the number of males out of total graduates, $\frac{40}{50} = 0.8$.
Note: In Q.7 & 8, each probability is a conditional probability. However, we gave the answers by
looking at the table directly. Now we answer them in terms of the law of conditional
probability.
Ans. to Q.8: Suppose A = graduate, B = male, $P(B/A)$ = probability of male given that they are
graduates.
We use the formula: $P(B/A) = \frac{P(A \text{ and } B)}{P(A)}$
Here, $P(A \text{ and } B)$ = Prob. of male graduates = $\frac{40}{100} = 0.4$, $P(A)$ = prob. of graduates = $\frac{50}{100} = 0.5$.
$P(B/A) = \frac{0.4}{0.5} = 0.8$.
Exercise: Q.7 can also be answered in terms of the conditional probability formula. Do this and check
yourself.
Q.9 What is the probability that the selected person is either male or graduate?
Ans. Here the two events can happen together (a person can be both male and graduate), so they are not mutually exclusive. So we
use the formula:
$P(\text{male or graduate}) = P(\text{male}) + P(\text{graduate}) - P(\text{male and graduate}) = 0.6 + 0.5 - 0.4 = 0.7$.
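The survey questions can also be worked through programmatically. The sketch below stores the four cells of the table (Male: 40 graduates and 20 non-graduates; Female: 10 graduates and 30 non-graduates) and computes joint, marginal and conditional probabilities as exact fractions; the helper name prob and the dictionary layout are illustrative choices.

```python
from fractions import Fraction

# Counts from the survey table: (sex, education) -> number of people.
survey = {
    ("male", "graduate"): 40,
    ("male", "non-graduate"): 20,
    ("female", "graduate"): 10,
    ("female", "non-graduate"): 30,
}
total = sum(survey.values())

def prob(sex=None, education=None):
    """Probability that a random person matches the given attributes."""
    count = sum(
        n for (s, e), n in survey.items()
        if (sex is None or s == sex) and (education is None or e == education)
    )
    return Fraction(count, total)

p_male = prob(sex="male")                                   # Q.1
p_male_grad = prob("male", "graduate")                      # Q.3
p_male_given_grad = p_male_grad / prob(education="graduate")  # Q.8
# Addition rule for events that are not mutually exclusive (Q.9).
p_male_or_grad = p_male + prob(education="graduate") - p_male_grad

print(p_male, p_male_grad, p_male_given_grad, p_male_or_grad)
```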
Probability Distributions
Let us think of the probabilities for a number of events marked $i = 1, 2, 3, \dots$ and so on.
For each event we can have $p_i$ and also, for all the events, $\sum_i p_i = 1$.
So we have a set of probabilities corresponding to a set of events. This collection is a probability
distribution for all those discrete events.
Suppose, instead of discrete events, we think that $x$ is a variable which can take continuous values
and there is the probability $p(x)$ for each value of $x$. Now if we plot $p(x)$ against $x$, we get a
continuous curve which is the continuous probability distribution curve (commonly referred to as
the probability distribution curve).
Fig.
The area under the curve (above the x-axis) can be obtained by summing up the areas of the approximate
rectangular bars (which we may easily find by plotting this on graph paper). The approximate area
of one such bar of width $\Delta x$ and height $p(x)$ is $p(x)\,\Delta x$. So, the approximate total area between the two end points $a$ and $b$ is $\sum p(x)\,\Delta x$.
To calculate exactly, we need the help of Integral Calculus, which essentially sums up the areas
of the rectangles (bars) of infinitesimally small width.
[NOTE: Those not familiar with Calculus need not worry much, as
the following explanation and symbols can be understood qualitatively and that may serve the
purpose for now.]
The area under the curve (between the two extreme points shown in the above figure) is the following definite integral:
Area = $\int_a^b p(x)\,dx = P$.
$P$ is the total probability for all the values of $x$ between the two limits. That is why $p(x)$ is often
referred to as the probability density. So, $p(x)\,dx$ is the probability (and the area of the bar of
height $p(x)$ and width $dx$) in between $x$ and $x + dx$, where $dx$ is the infinitesimally small (smaller
than you can think) range!
Note that for the discrete case, the above is the sum over all the mutually exclusive events.
[The sum, $\sum_i p_i$, becomes the integral, $\int p(x)\,dx$, for the continuous case.]
Also,
$\int_{-\infty}^{+\infty} p(x)\,dx = 1$ (Normalization)
The above means that the total area under the curve (extended from negative infinity to
positive infinity, that is, over the entire stretch of the curve) is unity. This is true as in the
discrete case we know that the sum of all the probabilities for all the events should be 1.
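The idea of summing the areas of thin rectangular bars can be tried out numerically. In the sketch below the bell-shaped density exp(-x²/2)/√(2π) is an assumed example curve chosen only for illustration, and bar_area is a hypothetical helper that adds up bars of equal width, each evaluated at its midpoint.

```python
import math

def density(x):
    """Example probability density (a bell-shaped curve)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def bar_area(p, a, b, n_bars):
    """Approximate area under p between a and b with n_bars rectangles."""
    width = (b - a) / n_bars
    # Height of each bar is taken at the midpoint of the bar.
    return sum(p(a + (k + 0.5) * width) * width for k in range(n_bars))

# A wide range stands in for "minus infinity to plus infinity".
total_area = bar_area(density, -8.0, 8.0, 2000)
central_area = bar_area(density, -1.0, 1.0, 2000)

print(round(total_area, 6), round(central_area, 4))
```

As the bars become narrower the sum approaches the definite integral, and over the whole range it approaches 1, which is the normalization condition.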
For discrete events, we calculated the relative frequency and then the Bar diagram from them.
Here for the continuous case, the bars merge together to form a continuous spectrum and that
is the probability distribution. The relative frequencies tend to the probabilities for
corresponding values of the variable for large number of events.
Now given the probability distribution curve, we would like to know about the shape and size of
the curve, some specific quantities that are representative of the character of the event.
For any discrete set of data collection, we measure the central tendency of the data set. We
commonly calculate mean, mean of square and variance.
Mean:
$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} = \frac{1}{N}\sum_i f_i x_i = \sum_i \frac{f_i}{N} x_i$,
where $f_i$ is the frequency of occurrence for event $x_i$ and we have the total frequency, $N = \sum_i f_i$.
[Note: $f_i / N$ = relative frequency]
Mean of Square:
$\overline{x^2} = \frac{1}{N}\sum_i f_i x_i^2 = \sum_i \frac{f_i}{N} x_i^2$
Variance:
Var($x$) = $\sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2$
$= \frac{1}{N}\sum_i f_i x_i^2 - 2\bar{x}\,\frac{1}{N}\sum_i f_i x_i + \bar{x}^2$
$= \frac{1}{N}\sum_i f_i x_i^2 - \left[\frac{1}{N}\sum_i f_i x_i\right]^2$.
Standard deviation is the square root of the variance.
Now for a large number of events, the ratio $f_i / N$ in each of the above formulas becomes
the corresponding probability $p_i$: $f_i / N \to p_i$ as $N$ becomes very large.
Mathematical Expectation and Mean
If the probabilities $p_1$, $p_2$, $p_3$ etc. are known for the values $x_1$, $x_2$, $x_3$ and
so on, we write
Expectation: $E(x) = \sum_i p_i x_i$, where $\sum_i p_i = 1$
However, when instead of probabilities we are given the frequencies
$f_1$, $f_2$, … for the quantities that appear in a data set, we calculate
Mean or average: $\bar{x} = \frac{1}{N}\sum_i f_i x_i$
Therefore, we write the above quantities in terms of probabilities:
Mean, $\mu = \sum_i p_i x_i$
Mean of Square, $\overline{x^2} = \sum_i p_i x_i^2$
Variance, $\sigma^2 = \sum_i p_i x_i^2 - \left(\sum_i p_i x_i\right)^2$
$\sigma$ = Standard deviation
Now we calculate the above quantities from the following dice throwing experiments.
Example #1 Throwing of a single die:
The chance of any side turning up is equal, which is 1 out of 6. We consider these a priori
probabilities for each case and find the mean and variance from the following table.
$x$        1      2      3      4       5       6       Total
$p$        1/6    1/6    1/6    1/6     1/6     1/6     1
$p x$      1/6    2/6    3/6    4/6     5/6     6/6     21/6
$p x^2$    1/6    4/6    9/6    16/6    25/6    36/6    91/6
From the table, we can calculate the mean, $\mu = \frac{21}{6} = 3.5$, and the
variance, $\sigma^2 = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{35}{12} \approx 2.92$.
If we plot $p$ against $x$, we obtain the probability distribution for this case. This distribution is
uninteresting as we can check that the probabilities for all values of $x$ are the same! The curve
obtained by joining the points will be a horizontal straight line.
Fig.
Now we do a similar experiment taking two dice.
Example #2 (Two Dice)
We look for the value of $x$ which is the sum of the two numbers on the top faces of the two dice. Here we shall have $6 \times 6 = 36$ possible combinations of events and $x$ can have a minimum
value, $x = 2$, and a maximum value, $x = 12$.
$x$        2       3       4       5        6        7        8        9        10       11       12       Total
$p$        1/36    2/36    3/36    4/36     5/36     6/36     5/36     4/36     3/36     2/36     1/36     1
$p x$      2/36    6/36    12/36   20/36    30/36    42/36    40/36    36/36    30/36    22/36    12/36    252/36
$p x^2$    4/36    18/36   48/36   100/36   180/36   294/36   320/36   324/36   300/36   242/36   144/36   1974/36
Mean, $\mu = \frac{252}{36} = 7$, Variance, $\sigma^2 = \frac{1974}{36} - 7^2 = \frac{35}{6} \approx 5.83$
Now if we plot $p$ against $x$, taking the values from the above table, we get an interesting symmetric
distribution around a peak! The peak is at $x = 7$ (the mean value).
Fig.
The distribution shows a peak in the middle and it is symmetric!
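The two dice tables can be rebuilt and checked with exact fractions. In this sketch the function name mean_and_variance is hypothetical; it applies mean = Σ p·x and variance = Σ p·x² − (Σ p·x)² to a dictionary that maps each value of x to its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def mean_and_variance(distribution):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in distribution.items())
    mean_of_square = sum(p * x * x for x, p in distribution.items())
    return mean, mean_of_square - mean ** 2

# Single die: six faces, each with probability 1/6.
one_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Two dice: count how many of the 36 combinations give each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
two_dice = {total: Fraction(count, 36) for total, count in sum_counts.items()}

print(mean_and_variance(one_die))
print(mean_and_variance(two_dice))
```

With two dice the mean and the variance are both exactly twice the single-die values, and the probabilities rise to a peak at the sum 7.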
We can go on doing such experiments with 3 or more dice together, ask for the sum of the values
occurring on all the dice and calculate the corresponding probabilities as above. We
can realize that the distribution would be smoother, retaining the symmetry with the peak value
at the mean.
In fact, the envelope of the probability values at different (joining the top of the height bars)
of the discrete distribution will slowly assume a continuous symmetric curve!
In the limit of a large number of events obtained from a large number of dice thrown
together, we tend to get a continuous bell-shaped symmetric distribution.
This is the Normal Distribution.
Fig.
For many naturally occurring events, and for random errors of measurement in
experiments, the distribution that occurs is approximately the Normal distribution. The bell-shaped symmetric
curve is called the Normal curve. If we calculate the height distribution or age distribution among a
population, the probability distribution often turns out to be close to Normal. The name ‘normal’ is given as it
occurs so commonly.
Properties of Normal Distribution:
Symmetric about the mean $\mu$
(mean at the centre or at the peak position)
Approximate area under the curve:
A = 68%, within $\mu \pm \sigma$
[within one standard deviation ($\sigma$) from the mean ($\mu$) on both sides]
A = 95%, within $\mu \pm 2\sigma$
A = 99.7%, within $\mu \pm 3\sigma$
Fig.
For the sum (or the average) of a large number of independent random events, the probability distribution tends to the Normal
distribution. This is called the Central Limit Theorem.
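The 68%, 95% and 99.7% figures can be verified with the error function from Python's math module. The sketch relies on the standard identity that the area of a Normal curve within k standard deviations of the mean equals erf(k/√2); the function name area_within is hypothetical.

```python
import math

def area_within(k):
    """Area under a Normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(100 * area, 1), "%")
```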
NOTE: The following part can be skipped by students who are not familiar with Calculus.
How Z-distribution is obtained from Normal distribution:
Mathematical Expression for Normal distribution:
$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, (1)
where $\mu$ = mean, $\sigma$ = standard deviation. The above expression is symmetric around the mean,
$x = \mu$.
[The value of the exponential, $e \approx 2.718$]
Normal distributions are often referred to by the symbol: $N(\mu, \sigma^2)$
The total area under the curve,
$\int_{-\infty}^{+\infty} p(x)\,dx = 1$.
If we put $z = \frac{x - \mu}{\sigma}$, we get $dx = \sigma\,dz$.
Thus we can write the rescaled probability,
$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ (2)
Now the above is a symmetric distribution around $z = 0$.
So the Normal distribution (1) has become a ‘Z-distribution’ in (2). This is nothing but a normal
distribution with mean = 0 and standard deviation = 1.
We have to remember that the area under the curve between two values of $z$ gives us the total
probability:
Area = $\int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz$.
Now, instead of actually doing the integration over $z$, we are supplied with the $z$-score and we
find the area under the curve (hence the total probability) between two limits from the table.
(See the z-score table.)
Consider the following typical situations where we have to calculate the areas
from the z-distribution:
Fig.
(Total area under the curve = 1)
Fig.
(Area between $z = -\infty$ and $z = 0$ is 0.5, or area between
$z = 0$ and $z = +\infty$ is 0.5, because of symmetry)
Fig.
(Area between $z = 0$ and any other value of $z$)
Fig.
(Area between two positive values of $z$ or between
two negative values)
Fig.
(Area between a negative value and a positive value)
Fig.
(Area less than a negative value or greater than a positive value)
Important:
In the z-score table we always look for the area between zero and any other value of $z$ (as the
integral is actually done that way). So, zero is always the reference point.
Finally, the area between any two values of $z$ is obtained by adding or subtracting the table areas
involving zero. This will be clear from the following examples.
Examples: (While solving the following problems, consult the z-score table given in the appendix.)
#1. In the Geography examination, the marks distribution is known to be Normal where the
mean is 52 and the standard deviation is 15. Determine the z-scores of students receiving
marks: (i) 40, (ii) 95, (iii) 52.
Solution: Here, $\mu = 52$, $\sigma = 15$ and $z = \frac{x - \mu}{\sigma}$.
(i) $z = \frac{40 - 52}{15} = -0.8$
(ii) $z = \frac{95 - 52}{15} \approx 2.87$
(iii) $z = \frac{52 - 52}{15} = 0$
So, we see the z-scores can be negative, positive or zero.
#2. Find the area under the normal curve in each of the following cases:
(i) Between $z = 0$ and $z = 1.2$
Area = 0.3849 from the table.
(ii) Between $z = -0.68$ and $z = 0$
Area = 0.2518
(Note: The area is equal to the area between $z = 0$ and $z = 0.68$ as the curve is symmetric.)
(iii) Area between $z = -0.46$ and $z = 2.21$
Area = (area between $z = 0$ and 2.21) + (area between $z = 0$ and -0.46)
= 0.4864 + 0.1772 = 0.6636
(Note: The areas are added as they are on both sides of $z = 0$.)
(iv) Area between $z = 0.81$ and $z = 1.94$
Required area = (area between $z = 0$ and 1.94) – (area between $z = 0$ and 0.81)
= 0.4738 – 0.2910 = 0.1828
(Note: There is the subtraction as the two areas are on the same side of $z = 0$.)
(v) To the left of $z = -0.6$
Required area = 0.5 – (area between $z = 0$ and $z = -0.6$)
= 0.5 – 0.2257 = 0.2743
(vi) To the right of $z = -1.28$
Required area = (area between $z = -1.28$ and $z = 0$) + 0.5
= (area between $z = 0$ and $z = 1.28$) + 0.5
= 0.3997 + 0.5 = 0.8997
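The table look-ups in Example #2 can be cross-checked without a printed table. The sketch below computes the area between 0 and z from the error function (area = ½·erf(|z|/√2)) and then adds or subtracts such areas exactly as described above. The z limits are the ones read from this example, assumed where the text does not show them clearly; the function names are hypothetical.

```python
import math

def area_zero_to(z):
    """Area under the z-distribution between 0 and z (what a z-table lists)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_between(z1, z2):
    """Area between two z values, built from the 0-to-z areas."""
    if z1 * z2 <= 0:
        # Opposite sides of zero (or one limit is zero): add the areas.
        return area_zero_to(z1) + area_zero_to(z2)
    # Same side of zero: subtract the smaller area from the larger.
    return abs(area_zero_to(z1) - area_zero_to(z2))

case_i = area_between(0, 1.2)
case_ii = area_between(-0.68, 0)
case_iii = area_between(-0.46, 2.21)
case_iv = area_between(0.81, 1.94)
case_v = 0.5 - area_zero_to(-0.6)       # to the left of z = -0.6
case_vi = area_zero_to(-1.28) + 0.5     # to the right of z = -1.28

for value in (case_i, case_ii, case_iii, case_iv, case_v, case_vi):
    print(round(value, 4))
```

Small differences in the last digit compared with a printed table come from the rounding used in the table.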
#3. Among 1000 students, the mean score in the final examination is 25 and the standard
deviation is 4.0. Assume the distribution is Normal. Find the following.
(a) How many students score between 22 and 27?
$\mu = 25$, $\sigma = 4.0$
$z_1 = \frac{22 - 25}{4} = -0.75$, $z_2 = \frac{27 - 25}{4} = 0.5$
So the probability is the area under the curve between -0.75 and 0.5
= (area between 0 and -0.75) + (area between 0 and 0.5)
= 0.2734 + 0.1915 = 0.4649
The number of students in this marks range = $1000 \times 0.4649 \approx 465$
(b) How many students score above 30?
$z = \frac{30 - 25}{4} = 1.25$
Probability = area to the right of $z = 1.25$
= 0.5 – (area between 0 and 1.25)
= 0.5 – 0.3944 = 0.1056
The number of students = $1000 \times 0.1056 \approx 106$
(c) How many students score below 15?
$z = \frac{15 - 25}{4} = -2.5$
Area = 0.5 – (area between $z = 0$ and -2.5) = 0.5 – 0.4938 = 0.0062
The number of students = $1000 \times 0.0062 \approx 6$
(d) How many score 24?
Here we have to calculate the area between 23.5 and 24.5. $z_1 = \frac{23.5 - 25}{4} \approx -0.38$, $z_2 = \frac{24.5 - 25}{4} \approx -0.13$
Area between $z_1$ and $z_2$
= (area between 0 and -0.38) – (area between 0 and -0.13)
= 0.1480 – 0.0517 = 0.0963
The number of students = $1000 \times 0.0963 \approx 96$.
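Example #3 can also be checked with the NormalDist class from Python's standard statistics module (available from Python 3.8), which gives the cumulative area to the left of a score directly. This is a sketch of an alternative to the z-table route, not the method of the notes; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=25, sigma=4.0)   # mean 25, standard deviation 4.0
n_students = 1000

# cdf(x) is the area under the curve to the left of x.
p_between_22_27 = scores.cdf(27) - scores.cdf(22)
p_above_30 = 1 - scores.cdf(30)
p_below_15 = scores.cdf(15)
p_score_24 = scores.cdf(24.5) - scores.cdf(23.5)

counts = [
    round(n_students * p)
    for p in (p_between_22_27, p_above_30, p_below_15, p_score_24)
]
print(counts)
```

The counts agree with the table-based answers after rounding to whole students, even though the table works with z values rounded to two decimals.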
Symmetry of Distribution, Skewness
We have seen that a Normal distribution is symmetric around its peak (most probable value or
the value for which the probability is the highest). In a symmetric distribution the mean,
median and mode are at the same position.
Skewness is any deviation from symmetry or, we can say, lack of symmetry.
Coefficient of skewness = $\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
The above coefficient can be positive or negative. Below are the two figures demonstrating
negative and positive skewness: the distributions are correspondingly called negatively skewed
and positively skewed distributions.
Figs.
(Negative Skewness: Mean < Mode) (Positive Skewness: Mean > Mode)
For a symmetric distribution, skewness is zero.
Note:
The distribution we are discussing is a unimodal distribution, that is, a distribution which
has a single mode or one peak. But in many practical cases, we can have a distribution with
many peaks or many modes. For example, a distribution with two peaks (in fig.) is called a
bimodal distribution.
Figs.
(Bimodal distribution)
Combination Rules:
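When we scale a variable, that is, we multiply a variable by a number or add a number to it, we need to know
how this scaled variable behaves. Does it have the same statistical measures? Does it follow the same kind
of distribution? Also, we ask the same questions for two or more variables when scaled and added
together to form a combined variable.
When $Y = aX$:
Mean: $\mu_Y = a\mu_X$    Variance: $\sigma_Y^2 = a^2\sigma_X^2$
When $Y = aX + b$:
Mean: $\mu_Y = a\mu_X + b$
Variance: $\sigma_Y^2 = a^2\sigma_X^2$
If $X$ has a Normal distribution, $Y$ also has a Normal distribution.
When $Z = aX + bY$ (with $X$ and $Y$ independent):
Mean: $\mu_Z = a\mu_X + b\mu_Y$
Variance: $\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$
If $X$ and $Y$ separately have Normal distributions, $Z = aX + bY$ then also has a Normal
distribution.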
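The combination rules can be tested exactly on a small discrete example. The sketch below assumes the standard rules: for Y = aX + b the mean becomes aμ + b and the variance a²σ², and for the sum of two independent variables the means and the variances add. A fair die is used as X; the constants a = 2 and b = 3 are hypothetical.

```python
from fractions import Fraction
from itertools import product

def moments(distribution):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in distribution.items())
    variance = sum(p * x * x for x, p in distribution.items()) - mean ** 2
    return mean, variance

die = {face: Fraction(1, 6) for face in range(1, 7)}
mu, var = moments(die)

# Scaled and shifted variable Y = a*X + b (a, b chosen for illustration).
a, b = 2, 3
scaled = {a * x + b: p for x, p in die.items()}

# Combined variable Z = X + Y for two independent dice.
combined = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    combined[x + y] = combined.get(x + y, 0) + px * py

print(moments(scaled), moments(combined))
```

The check covers only means and variances; it says nothing about the shape of the distribution, which stays Normal only when the original variables are Normal.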
Following the combination rules in the above box, we can solve the following problem.
Example:
The weight of individual people follows Normal distribution, . What will be the probability distribution of the weight of 10 people taken together?
Ans. Here, mean , .
Mean weight of 10 people, + = = 40
Variance, …+ = = 500
The probability distribution of weight of 10 people taking together, .
Binomial and Poisson Probability Distributions
Binomial Probability:
Suppose the probability of a certain event occurring is $p$ and of the event not occurring is
$q = 1 - p$. In a total of $N$ trials, the particular event occurs $n$ times, each with probability $p$, and
does not occur $N - n$ times, each with probability $q$. Also, we have to know which $n$ events
will occur out of the total $N$ events. The number of ways we can do that is the number of
combinations = $^{N}C_{n}$. Consider a variable $x$ which is equal to the relative frequency, $x = n/N$.
As the events are considered independent, the joint probability will be
$P(n) = {}^{N}C_{n}\, p^{n} q^{N-n}$
The above probability is called the binomial probability.
Note:
The meaning of the symbol $^{N}C_{n}$ is given in the box below.
[Those who are not familiar with the above mathematical notations and rules may consult the
necessary introduction given in the following Box.]
Factorial: $n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$
For example, five factorial, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$.
Factorials of negative integers have no meaning, and $0! = 1$.
Note that we can write $n! = n \times (n-1)!$
Permutation: In how many ways can $n$ different objects be arranged among themselves?
The answer is the permutation of $n$ objects, $n!$
For example, for three objects A, B, C, the different arrangements are ABC, ACB,
BCA, BAC, CAB, CBA: total 6 ways = $3!$
Combination: $^{n}C_{r}$ or $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
This is the number of ways $r$ objects can be selected from $n$ objects.
For example, if we want to know in how many ways 2 students can be selected from a total of 3
students, the answer is $^{3}C_{2} = \frac{3!}{2!\,1!} = 3$.
Also note for quick calculations, $^{n}C_{0} = \frac{n!}{0!\,n!} = 1$, $^{n}C_{1} = \frac{n!}{1!\,(n-1)!} = n$ and $^{n}C_{n} = \frac{n!}{n!\,0!} = 1$.
Now consider the following table based on the binomial probability:
$n$       0        1                2                                   ……..     $N$
$P(n)$    $q^N$    $N p q^{N-1}$    $\frac{N(N-1)}{2!} p^2 q^{N-2}$     --------  $p^N$
If we add all the terms of the second row above, we get the following binomial expansion:
$q^N + N p q^{N-1} + \frac{N(N-1)}{2!} p^2 q^{N-2} + \dots + p^N = (q + p)^N = 1$ (1)
From the expression (1) above, we can easily check the following known algebraic formulas:
$(q + p)^2 = q^2 + 2qp + p^2$
$(q + p)^3 = q^3 + 3q^2 p + 3q p^2 + p^3$
……………….
The coefficients of the terms on the right of the above can be arranged in the following
triangular form which is called Pascal’s triangle:
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
The Rule:
As indicated above, a number in a row (except the rightmost and leftmost ones) is the sum of the two
numbers on its two sides in the preceding row.
So, from the 8th
row in the Pascal’s triangle we can easily write the binomial expansion:
$$(q+p)^8 = q^8 + 8q^7p + 28q^6p^2 + 56q^5p^3 + 70q^4p^4 + 56q^3p^5 + 28q^2p^6 + 8qp^7 + p^8$$
Remember that each term represents a binomial probability. A binomial distribution is a
collection of these discrete binomial probabilities. Note: since $p + q = 1$, these probabilities add up to 1.
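The rule of Pascal's triangle is easy to try out. Below is a minimal sketch in Python; the function name `pascal_rows` is hypothetical, and the first row is taken as "1 1" only to match the triangle as printed above.

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle, starting from '1 1'."""
    rows = []
    row = [1, 1]
    for _ in range(n_rows):
        rows.append(row)
        # every inner number is the sum of its two neighbours in the preceding row
        inner = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        row = [1] + inner + [1]
    return rows

for r in pascal_rows(8):
    print(*r)
```

The last printed row gives the coefficients of the expansion of $(q+p)^8$.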
Example #1:
Five independent shots are fired at a target. The probability of a hit from each shot is 0.4.
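Q. What is the probability that two shots will hit the target?
Ans. Here $n = 5$, $p = 0.4$, $q = 1 - p = 0.6$, $r = 2$,
$$P(2) = \frac{5!}{2!\,3!}\,(0.4)^2 (0.6)^3 = 10 \times 0.16 \times 0.216 = 0.3456$$
Q. What is the probability that there will be more than two hits?
Ans. Prob. = $P(3) + P(4) + P(5)$
$$= \frac{5!}{3!\,2!}(0.4)^3(0.6)^2 + \frac{5!}{4!\,1!}(0.4)^4(0.6) + \frac{5!}{5!\,0!}(0.4)^5$$
$$= 0.2304 + 0.0768 + 0.01024$$
$$= 0.31744$$
Q. What is the expectation value of the hits (that is the mean value of hitting the targets out of
all five shots)?
Ans. For this we have to calculate the probabilities $P(0)$, $P(1)$, $P(2)$,….. for the corresponding number
of hits 0, 1, 2…..
The expectation value,
$$\langle r \rangle = 0 \times P(0) + 1 \times P(1) + 2 \times P(2) + 3 \times P(3) + 4 \times P(4) + 5 \times P(5)$$
$$= 0 + 0.2592 + 2 \times 0.3456 + 3 \times 0.2304 + 4 \times 0.0768 + 5 \times 0.01024$$
= 0.2592 + 0.6912 + 0.6912 + 0.3072 + 0.0512 = 2.0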
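The three answers of Example #1 can be verified numerically. The following is a short sketch in Python, assuming Python 3.8 or later for `math.comb`; the function name `binomial_probability` is our own.

```python
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 5, 0.4
probs = [binomial_probability(n, r, p) for r in range(n + 1)]

p_two = probs[2]                      # exactly two hits
p_more_than_two = sum(probs[3:])      # three, four or five hits
expectation = sum(r * pr for r, pr in enumerate(probs))

print(p_two, p_more_than_two, expectation)
```

The expectation value comes out equal to $np = 5 \times 0.4$, as it should for a binomial distribution.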
Example #2:
Now, imagine a situation where we toss 8 coins together or we toss one coin 8 times
consecutively. We measure the relative occurrence of Head in 8 trials. Let us attach values,
Head = 1 and Tail = 0. So, we can think of a variable $x$ which can take values 0, 1/8, 2/8, 3/8,
4/8…. and so on. Thus we can associate probabilities with the values of $x$ directly from Pascal’s
triangle (or by using the formula). Note that the probability of occurrence of Head is $p = 1/2$ and that of
non-occurrence of Head is $q = 1/2$.
$P(0) = P(1) = 1/256$,
$P(1/8) = P(7/8) = 8/256$,
$P(2/8) = P(6/8) = 28/256$,
$P(3/8) = P(5/8) = 56/256$,
$P(4/8) = 70/256$
If we now plot $P(x)$ against $x$, we get the following symmetric discrete distribution with the
peak value at $x = 1/2$.
Fig.
For a large number of trials, this distribution becomes the Normal distribution. Therefore, we can say
the following:
The Binomial probability distribution for a random variable becomes the Normal distribution
for a large number of trials.
Poisson Distribution:
Poisson distribution is applicable to random but extremely rare events. For example, if we
count the number of phone calls received in a span of 5 minutes over a day or count the
number of cars passing on the road in a time interval of 1 minute, we will have a distribution
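not quite symmetric like the Normal distribution although it is random. This distribution is Poisson
and it appears like a skewed one.
If $r$ is a variable that takes the values $r = 0, 1, 2, \ldots$, the Poisson probability is
$$P(r) = \frac{\lambda^{r} e^{-\lambda}}{r!},$$
where $\lambda$ = mean of the distribution.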
Some Characteristics of Poisson distribution:
Mean, $\mu = \lambda$
Variance, $\sigma^2 = \lambda$
One interesting thing is that for a Poisson distribution, the mean and the variance are the same. For a data
set, if the mean and the variance are not found to be approximately equal, then the Poisson distribution will
not be a suitable model.
We can arrive at the Poisson distribution from the Binomial distribution. How?
Consider $\lambda$ to be the mean value out of total $n$. We can then say that the probability of
occurrence is $p = \lambda/n$ and so,
$q = 1 - \lambda/n$.
For $n$ trials, we write the Binomial probability:
$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r}\, q^{\,n-r}$$
$$= \frac{n!}{r!\,(n-r)!} \left(\frac{\lambda}{n}\right)^{r} \left(1 - \frac{\lambda}{n}\right)^{n-r}$$
$$= \frac{n(n-1)\cdots(n-r+1)}{n^{r}}\, \frac{\lambda^{r}}{r!} \left(1 - \frac{\lambda}{n}\right)^{n-r}$$
$$\approx \frac{\lambda^{r}}{r!} \left(1 - \frac{\lambda}{n}\right)^{n}$$
[We can write this as $n$ becomes large.]
$$= \frac{\lambda^{r} e^{-\lambda}}{r!}$$
[In the limit of large $n$, $\left(1 - \lambda/n\right)^{n} \to e^{-\lambda}$]
Note:
In the above derivation, the approximation from the Binomial to the Poisson distribution is possible
only when we assume $n$ very large and $p$ very small (and thus $q$ very close to 1). The small value of $p$
means a ‘rare event’!
In the following figs. we demonstrate how a symmetric binomial distribution (which would
become a Normal distribution for a large number of events) becomes a Poisson distribution as
the value of $q$ is increased (and so $p$ is decreased).
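The passage from the Binomial to the Poisson distribution can also be seen numerically. The sketch below, in Python, keeps the mean fixed at an arbitrarily chosen value 2 (our assumption, not a value from the notes) and compares the two probabilities as the number of trials grows; the function names are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(n, r, p):
    """Binomial probability of r occurrences in n trials."""
    return comb(n, r) * p**r * (1.0 - p)**(n - r)

def poisson_pmf(lam, r):
    """Poisson probability of r occurrences for mean lam."""
    return lam**r * exp(-lam) / factorial(r)

lam = 2.0   # mean kept fixed; this value is only an illustration

def max_gap(n):
    """Largest difference between the two probabilities for r = 0..10."""
    p = lam / n   # rare event: p becomes small as n grows
    return max(abs(binomial_pmf(n, r, p) - poisson_pmf(lam, r))
               for r in range(11))

for n in (10, 100, 1000):
    print(n, max_gap(n))
```

The printed gap shrinks as the number of trials is increased, which is what the derivation above suggests.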
A Binomial distribution (with large $n$ and for a high value of $q$) is again plotted below along with an
actual Poisson distribution (continuous curve) of appropriately chosen λ.
It is often difficult to differentiate a Poisson distribution from a Normal distribution with the naked
eye. The mean and the variance of the Normal distribution are chosen suitably so as to match
with the Poisson distribution. Only a close examination can reveal the difference! Look at the
following graph.
The Measure of Correlation
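Let us first note that the variance of a set of data $x_1, x_2, \ldots, x_n$ is given by
$$\sigma_x^2 = \frac{1}{n}\sum_i x_i^2 - \left(\frac{1}{n}\sum_i x_i\right)^2 \qquad (1)$$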
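Variance of another set of data $y_1, y_2, \ldots, y_n$ is likewise,
$$\sigma_y^2 = \frac{1}{n}\sum_i y_i^2 - \left(\frac{1}{n}\sum_i y_i\right)^2 \qquad (2)$$
We can write the above two expressions in the following form:
$$\sigma_x^2 = \frac{1}{n}\sum_i x_i x_i - \frac{1}{n}\sum_i x_i \cdot \frac{1}{n}\sum_i x_i$$
and
$$\sigma_y^2 = \frac{1}{n}\sum_i y_i y_i - \frac{1}{n}\sum_i y_i \cdot \frac{1}{n}\sum_i y_i$$
Therefore, we can also define a similar kind of formula involving two variables,
$$\sigma_{xy} = \frac{1}{n}\sum_i x_i y_i - \frac{1}{n}\sum_i x_i \cdot \frac{1}{n}\sum_i y_i , \qquad (3)$$
which is called covariance of the two sets of data.
The linear correlation between two sets of data is defined by the following coefficient:
$$\mathrm{Corr}(x, y) = r = \frac{\sigma_{xy}}{\sigma_x\, \sigma_y}$$
The above coefficient is called Pearson’s correlation coefficient $r$. Correlation is to test how
strongly a pair of variables is related.
Note: In many books, the correlation coefficient is written in the form $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, where
$S_{xx} = \sum_i (x_i - \bar{x})^2$, $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $S_{yy} = \sum_i (y_i - \bar{y})^2$.
Properties of $r$:
The coefficient $r$ measures the strength of a linear
relationship.
The range: $-1 \le r \le +1$
+1 ⇨ perfect positive linear correlation
−1 ⇨ perfect negative linear correlation
0 ⇨ no linear correlation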
We can have an idea of the kind of correlation between two sets of data from the $(x, y)$
scatter plots:
Figs.
Correlation Matrix:
For the relations among more than two sets of variables, it is useful to present the correlation
coefficients between every two sets of variables in the form of a table. This is called the correlation
matrix.
For example, for three sets of variables, $X, Y, Z$, we have the following table:
      X           Y           Z
X     1           $r_{XY}$    $r_{XZ}$
Y     $r_{YX}$    1           $r_{YZ}$
Z     $r_{ZX}$    $r_{ZY}$    1
Note that in the above table, we have only three different entries. The reason is that the matrix
is symmetric, as the correlation between $X$ and $Y$ is the same as between $Y$ and $X$ and so on:
$r_{XY} = r_{YX}$, $r_{XZ} = r_{ZX}$, $r_{YZ} = r_{ZY}$. Also, the correlation of a variable with itself is trivial; it is always
the perfect correlation ($r_{XX} = r_{YY} = r_{ZZ} = 1$). So we have only three independent useful
quantities.
Practical Calculation of Correlation Coefficient:
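For practical calculations, we often use the following formula, obtained by multiplying the
numerator and the denominator of the formula for $r$ by $n^2$:
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\ \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$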
Example #1
In the following table, some values of two variables $x$ and $y$ are given in two columns. We
calculate the necessary quantities in the other columns to be put in the correlation formula.
No.      x     y     x^2    y^2    xy
1        2     5     4      25     10
2        4     9     16     81     36
3        5     11    25     121    55
4        6     10    36     100    60
5        8     12    64     144    96
Total    25    47    145    471    257
Here we find, $\sum x = 25$, $\sum y = 47$, $\sum x^2 = 145$, $\sum y^2 = 471$, $\sum xy = 257$ and $n = 5$.
The correlation coefficient,
$$r = \frac{5 \times 257 - 25 \times 47}{\sqrt{5 \times 145 - 25^2}\ \sqrt{5 \times 471 - 47^2}} = \frac{110}{\sqrt{100}\ \sqrt{146}} \approx 0.91$$
The above calculated value of the correlation coefficient is close to 1. Thus we may say, there is a
good (positive) correlation between the two sets of data.
Example #2
Calculate the correlation coefficient from the following height-weight data:
Height (cm):  170 172 181 157 150 168 166 175 177 165 163 152 161 173 175
Weight (kg):   65  66  69  55  51  63  61  75  72  64  61  52  60  70  72
Example #3
Following is a table that represents the data for shoe sizes vs. height achieved by Olympic
participants in a high jump event. Both the columns are measured in inches.
Shoe size:  12.0 7.0 4.5 11.0 8.5 5.0 12.0 7.5 8.5 5.5 9.5 5.5 10.5 12.0 14.0 7.0 7.0
Height:     72 64 62 70 69 65 72 65 65 65 68 61 69 77 73 65 67
Shoe size:  12.0 12.0 7.0 13.0 11.0 12.0 4.5 10.5 10.0 10.0 13.0 7.5 4.5 8.5 14.0 10.0 6.5
Height:     71 73 64 71 71 72 61 71 66 67 73 69 61 70 75 72 66
Follow the same procedure as is done in example #1 and calculate the correlation coefficient.
For a visual effect, we may have a scatter plot of the pair of data. The relationship between
them seems to be linear which should be well reflected in the correlation coefficient.
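The procedure of Example #1 can be sketched in a few lines of Python. The function name `pearson_r` is our own; it simply evaluates the practical formula with the five sums. The same function may be applied to the data of Examples #2 and #3.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from the sums used in the practical formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x**2) * sqrt(n * sum_yy - sum_y**2)
    return numerator / denominator

# data of Example #1
x = [2, 4, 5, 6, 8]
y = [5, 9, 11, 10, 12]
r = pearson_r(x, y)
print(round(r, 2))
```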
Rank Correlation:
Spearman’s rank correlation coefficient between two sets of ranked variables is defined below.
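Suppose, the original data sets for two variables $x$ and $y$ are rank-ordered to have two sets of ranks:
$X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$.
We calculate the differences $d_i = X_i - Y_i$ between the ranks of the two sets.
The rank correlation coefficient:
$$r_s = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$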
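As an illustration, the rank correlation can be computed as in the following Python sketch. It assumes that there are no tied values (with ties the simple ranking used here is not adequate), and the function names `ranks` and `spearman_rs` are hypothetical. The data are borrowed from Example #1 of the previous section only to have some numbers to work with.

```python
def ranks(values):
    """Rank of each value (1 = smallest); assumes no two values are equal."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for position, index in enumerate(order, start=1):
        rank[index] = position
    return rank

def spearman_rs(xs, ys):
    """Rank correlation from the squared differences of the ranks."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

x = [2, 4, 5, 6, 8]
y = [5, 9, 11, 10, 12]
print(ranks(x), ranks(y))
print(spearman_rs(x, y))
```

Here only the third and fourth values of $y$ exchange their ranks, so the sum of the squared differences is 2.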
Time Series, Auto Correlation
What is a Time Series?
A Time series is a set of observations generated sequentially in time. Any electrical signal, stock
exchange data (the daily trading curve), ECG curve, record of temperature or humidity over a
period etc. all are basically time series.
A time series can tell us a lot of things about what is happening in the system and this enables
us to predict with a certain degree of accuracy.
It is sometimes important to see if there is any correlation among the data points within a given
time series, that is, the correlation between the data taken at some time
and the data of the same series taken at another time. This we call autocorrelation. This can throw some light on the hidden
pattern inside the time series data.
Autocorrelation:
Remember, in the correlation formulas before (in the section on the measure of correlation), we considered the pair
of quantities $x_i$ and $y_i$ which correspond to the same parameter or serial number. Here we
would just have to consider a pair of values $x_i$ and $x_{i+k}$ of the same variable but at different
times. If index $i$ corresponds to a time $t$, $i+k$ will correspond to another time $t + \tau$.
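The idea can be sketched in Python as below: the series is paired with a copy of itself shifted by some lag, and the ordinary correlation coefficient of the pairs is calculated. The function names and the toy periodic series are our own assumptions, made only for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def autocorrelation(series, lag):
    """Correlation between the values at index i and at index i + lag (lag >= 1)."""
    return correlation(series[:-lag], series[lag:])

# a toy series that repeats itself after every 4 steps
series = [1, 2, 3, 2] * 5
for lag in (1, 2, 4):
    print(lag, autocorrelation(series, lag))
```

For this series the autocorrelation is largest at a lag of 4 steps, which reveals the hidden period.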
Regression
If two variables are related, that means there is a significant correlation between them; we can
make quantitative prediction of one variable for some value of the other. This is the basis of
regression analysis.
There are two types of regression analysis:
Linear regression ⇨ when the data approximately follow a straight line
Non-linear regression ⇨ when no linear relationship exists; in general, a
polynomial is considered to fit the data points.
A regression line is drawn through the scatter plot of two variables. The line is chosen so that it
passes as close as possible to all the points.
Regression analysis is widely used for prediction and forecasting.
Linear Regression:
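Suppose, we have a set of data for a pair of variables ($x, y$) and we predict that the dependent
variable $y$ can be obtained from the independent variable $x$, where they obey a linear
regression equation: $y = mx + b$, where the coefficients $m$ and $b$ are given by the following,
$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
[Derivations of the above formulas are given in the appendix.]
So the regression equation is the line with slope $m$ and intercept $b$ which passes through the
point $(\bar{x}, \bar{y})$ [mean values].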
Example:
We plot the data in Example #3 above and obtain a scatter plot. Next we calculate the values of
the parameters $m$ and $b$ by the above formulas. Then we can draw a straight line with the slope
= $m$ and intercept (on y-axis) = $b$. We can check that this straight line, superposed on the
scatter data, is the best fit line for the data points. This straight line fit is also called the least square fit.
Fig.
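The regression formulas can be tried on a small data set. The Python sketch below uses the five data pairs of Example #1 of the correlation section (chosen here only because the sums are already known); the function name `linear_fit` is our own.

```python
def linear_fit(xs, ys):
    """Slope m and intercept b of the least square line y = m x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    deno = n * sum_xx - sum_x * sum_x
    m = (n * sum_xy - sum_x * sum_y) / deno
    b = (sum_xx * sum_y - sum_x * sum_xy) / deno
    return m, b

x = [2, 4, 5, 6, 8]
y = [5, 9, 11, 10, 12]
m, b = linear_fit(x, y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
print(m, b)
print(m * mean_x + b, mean_y)   # the line passes through the mean point
```

With the sums 25, 47, 145 and 257 the formulas give $m = 110/100$ and $b = 390/100$.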
Sampling
Basic Concept:
What is sampling?
Sampling is to take a subsection of the population for a particular study. The aim is to
select the data sample in order to represent the total data set.
In statistics, population means the total collection of data. When the population or the
entire collection of data is studied, it is called census.
In short, population is the total set and the sample is the subset of it.
Why is sampling done?
When the number of elements in a population is large it is often not possible to
investigate the population completely due to lack of time, money and resources. This is
why the sampling is necessary.
Sampling is done in such a way that the subset of data represents the entire set.
If a TV channel wants to know the popularity of a program it would be expensive to ask
everybody’s opinion. Instead a subsection of viewers are interviewed and the data is
collected.
Methods of Sampling:
A sample of size $n$ means there are $n$ data points in the collection. A sample of size $n$ is
collected from a population of size $N$ in such a way that all the features of the population are
well represented by it.
If a sampling method over-represents or under-represents a feature of the population, it is
said to be biased. The aim of any selection method is to reduce the chance of bias as far as
possible.
There are several methods of sampling; among them the most common is the random
sampling.
Random sampling:
For a sample of size $n$, we collect $n$ data from the population. We collect many such
samples for our evaluation. If this is done randomly so that each group of size $n$ taken
from the population has an equal chance of getting selected, we call this random sampling.
Sometimes, it is called simple random sampling.
For a random sampling, the successive drawings have to be independent.
Let us suppose we want to select a sample of size 100 from a population of size 10000.
In the case of random sampling, we select the elements (that is, which element is to be
picked) with the help of a random number (generated in a computer) or by consulting a
random number table or by some kind of dice throwing.
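A sample of size 100 from a population of size 10000 can be drawn with computer-generated random numbers as in the following Python sketch. The population is taken here simply as the serial numbers 1 to 10000 and the seed is arbitrary; both are our assumptions. Note that `sample` picks the elements without repetition, so every group of 100 elements is equally likely to be selected.

```python
import random

rng = random.Random(2019)              # arbitrary seed, only for repeatability
population = list(range(1, 10001))     # serial numbers of the 10000 elements

sample = rng.sample(population, 100)   # simple random sample of size 100
print(sorted(sample)[:10])
```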
Systematic Sampling:
If simple random sampling from the population is not possible, systematic sampling may
be done. First, the population is enumerated from 1 onwards. If a sample of size $n$ from a
population of size $N$ is to be obtained, every $k$-th item is selected, where $k = N/n$. First a random
number between 1 and $k$ is selected and it is taken as the 1st
element. After this,
every $k$-th element is taken.
Sl no.   value
1        20
2        27
3        33
4        21
5        15
6        22
7        45
8        13
9        32
10       29
11       10
12       16
For a sample of size $n = 4$ (so that $k = 12/4 = 3$):
Select a random number between 1-3: choose 2, for example.
Start with #2 and then take the data numbered 5, 8, 11.
Stratified Sampling:
In this method, the population is first divided into groups (strata). Each element of the
sample belongs to one such group.
Divide the population into non-overlapping groups containing $N_1, N_2, \ldots$ data such
that $N_1 + N_2 + \cdots = N$. Next do simple random sampling to collect one or
a few elements from each group.
Suppose a population is classified into several groups according to age or something
like that. Then from each group random samples are collected.
Note: This is also called restricted random sampling.
Cluster Sampling:
In this method, like before, the population is divided into groups called clusters. Then
clusters are taken randomly and the elements are collected from them as sample.
Any method of sampling that uses (probabilistically) random selection is in general
called probability sampling.
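The systematic and the stratified methods described above can be sketched in Python as follows. The 12 values are those of the table in the systematic sampling example; the division of these values into three strata, the seed and the function names are hypothetical and serve only as an illustration.

```python
import random

def systematic_sample(data, sample_size, start):
    """Take every k-th element beginning from serial number `start` (1-based)."""
    k = len(data) // sample_size
    return data[start - 1::k][:sample_size]

def stratified_sample(strata, per_group, rng):
    """Simple random sampling of `per_group` elements from every stratum."""
    sample = []
    for group in strata:
        sample.extend(rng.sample(group, per_group))
    return sample

values = [20, 27, 33, 21, 15, 22, 45, 13, 32, 29, 10, 16]

# start with #2 and then take #5, #8, #11
systematic = systematic_sample(values, 4, start=2)
print(systematic)

# three non-overlapping groups of 4 elements each (an arbitrary grouping)
strata = [values[0:4], values[4:8], values[8:12]]
stratified = stratified_sample(strata, 1, random.Random(7))
print(stratified)
```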
Sampling variation:
When sampling from a population is done, we may take not one sample but different sets of
samples having the same size. The differences among such samples are called sampling variation.
In practice, we often draw only one sample or one set of data from a population.
But we may not be sure what may happen in case we draw several other samples. Will
we get the same result? The answer is No. If we look for the mean value, we see that the
mean is not the same for all the samples that we are able to draw. We then get a
distribution of the sample means.
$N$ = population size, $n$ = sample size, $n/N$ = the sample fraction.
Many samples of the same size yield a sampling distribution.
The sampling distributions are usually assumed to follow some well-known probability
distribution.
We look for various properties from the distribution curves.
It is seen how the variation of sample size can affect the properties.
From experience and theory, we can say that the variability of sampling
distributions decreases with increasing sample size.
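The last statement can be checked by a small computer experiment. In the Python sketch below the population is artificial (10000 random numbers between 0 and 100), and the number of samples, the seed and the function name are our own assumptions.

```python
import random
import statistics

rng = random.Random(1)   # arbitrary seed, only for repeatability
population = [rng.uniform(0, 100) for _ in range(10000)]

def spread_of_sample_means(size, n_samples=500):
    """Standard deviation of the means of many random samples of a given size."""
    means = [statistics.mean(rng.sample(population, size))
             for _ in range(n_samples)]
    return statistics.pstdev(means)

spread_10 = spread_of_sample_means(10)
spread_100 = spread_of_sample_means(100)
print(spread_10, spread_100)
```

The sample means scatter much less for the samples of size 100 than for those of size 10.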
Hypothesis Testing
What is Hypothesis?
On the basis of sample information, we make certain decisions about the population. In taking
such decisions we make certain assumptions. These assumptions are known as statistical
hypothesis.
[Note: A collected set of data points which is a part of the population (a small number of data)
is called a sample. The process of selection is called sampling. When all the data are considered
for a study, this is called the population.]
How to test Hypothesis?
Assuming the hypothesis to be correct, we calculate the probability of getting the observed sample. If
this probability is less than a certain assigned value, the hypothesis is rejected.
The hypothesis that there is no significant difference between the observed value and the expected value
is called the Null Hypothesis.
Test of significance:
The tests which enable us to decide whether to accept or to reject the null hypothesis are called
the tests of significance. If the differences between the sample values and the population
values are significantly large, the null hypothesis is to be rejected.
Student t-test:
Let $x_1, x_2, \ldots, x_n$ be the elements of a set of data from a random sampling. The sample is
drawn from a population that is assumed to obey the Normal distribution.
Let
$\mu$ = the actual mean of the distribution,
$\bar{x}$ = the sample mean.
A parameter $t$ is calculated as follows:
$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s},$$
where $s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$ and $n$ = sample size.
Example:
Q. The average life span of a citizen of India is 70. The average value obtained from a sample of
100 people is 75. The standard deviation is 40. Find if the claim is accepted using the level of
significance of 0.05.
Ans.
Here $\mu = 70$, $\bar{x} = 75$, $n = 100$, $s = 40$,
$$t = \frac{(75 - 70)\sqrt{100}}{40} = 1.25 .$$
Now at the level of significance 0.05, we know (from the standard table) that the tabular value of $t$ is about 1.98 (two-tailed, 99 degrees of freedom).
The calculated value of $t$ < the tabular value of $t$ [1.25 < 1.98]
Thus the claim is accepted at the 5% level of significance.
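The calculation of the above example is sketched below in Python. The function name `t_statistic` is our own, and the critical value 1.98 is the approximate two-tailed table value for 99 degrees of freedom at the 5% level, typed in by hand rather than computed.

```python
from math import sqrt

def t_statistic(sample_mean, population_mean, s, n):
    """t = (sample mean - population mean) * sqrt(n) / s"""
    return (sample_mean - population_mean) * sqrt(n) / s

t = t_statistic(sample_mean=75, population_mean=70, s=40, n=100)
t_critical = 1.98   # approximate table value, entered by hand

print(t)
print("claim accepted" if abs(t) < t_critical else "claim rejected")
```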
The $\chi^2$-test (Chi-square test):
Here we evaluate the following quantity:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$
$O_i$ = Observed frequency
$E_i$ = Expected frequency
Now let us define another parameter called ‘degree of freedom’.
Degree of freedom = No. of independent observations
= No. of observations – No. of independent constraints.
In practical calculations, we often estimate the degree of freedom from the number of columns
($c$) and number of rows ($r$) in a data table.
Degree of freedom = $(r - 1)(c - 1)$
Example:
Q. Given are the amounts of rainfall (in mm.) on different days in a week. Check if the rainfall is
uniformly distributed over the week.
Given that the values of $\chi^2$ significant at 5, 6, 7 degrees of freedom are respectively 11.07, 12.59, 14.07
at the 5% level of significance.
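Day:                 1    2    3    4    5    6    7
Rainfall (in mm.):   14   16   8    12   11   9    14
If the distribution is to be uniform, the expected frequency has to be $E = (14+16+8+12+11+9+14)/7 = 84/7 = 12$.
$$\chi^2 = \frac{(14-12)^2 + (16-12)^2 + (8-12)^2 + (12-12)^2 + (11-12)^2 + (9-12)^2 + (14-12)^2}{12} = \frac{50}{12} = 4.17$$
Here the degrees of freedom = $7 - 1 = 6$ and the tabulated value for 6
degrees of freedom is 12.59.
As the calculated value 4.17 < 12.59, we can accept the claim.
We then say that the Null Hypothesis (that the rainfall is uniformly distributed) is accepted.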
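The rainfall example can be reproduced with a few lines of Python. This is only a sketch; the variable names are ours, and the critical value 12.59 is the one quoted in the question.

```python
observed = [14, 16, 8, 12, 11, 9, 14]   # rainfall (in mm.) on the 7 days

# uniform distribution: the same expected value on every day
expected = sum(observed) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1
critical_value = 12.59   # given in the question for 6 degrees of freedom

print(round(chi_square, 2), degrees_of_freedom)
print("uniform" if chi_square < critical_value else "not uniform")
```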
Least Square Fit
(Regression Formulas)
Let us think that we are about to fit the set of data by a straight line.
The equation of a straight line is $y = mx + b$.
Consider the data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ ……. etc. If we know the two parameters
$m$ and $b$, we can draw a straight line with them.
Error is defined as
$$E(m, b) = \sum_{i=1}^{n} (m x_i + b - y_i)^2 .$$
For the best fit, this error should be minimum.
Therefore, we must have $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial b} = 0$.
[We take partial derivatives of the error function with respect to the parameters.]
Now,
$$\frac{\partial E}{\partial m} = \frac{\partial}{\partial m} \sum_{i=1}^{n} (m x_i + b - y_i)^2$$
$$= \sum_{i=1}^{n} 2\,(m x_i + b - y_i)\, \frac{\partial}{\partial m}(m x_i + b - y_i)$$
$$= 2 \sum_{i=1}^{n} (m x_i + b - y_i)\, x_i$$
$$= 2m \sum_{i=1}^{n} x_i^2 + 2b \sum_{i=1}^{n} x_i - 2 \sum_{i=1}^{n} x_i y_i = 0 \qquad (1)$$
Similarly,
$$m \sum_{i=1}^{n} x_i + nb - \sum_{i=1}^{n} y_i = 0 \qquad (2)$$
From (1) and (2),
$$\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}$$
Slope,
$$m = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
and Intercept,
$$b = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} .$$
Example:
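For the data points (1,2), (2,3), (3,4), (4,5)
$n = 4$, $\sum_{i=1}^{n} x_i = 1 + 2 + 3 + 4 = 10$, $\sum_{i=1}^{n} y_i = 2 + 3 + 4 + 5 = 14$,
$\sum_{i=1}^{n} x_i y_i = 1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + 4 \cdot 5 = 40$, $\sum_{i=1}^{n} x_i^2 = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 + 4 \cdot 4 = 30$
$$m = \frac{4 \cdot 40 - 10 \cdot 14}{4 \cdot 30 - 10^2} = \frac{160 - 140}{120 - 100} = \frac{20}{20} = 1, \qquad b = \frac{30 \cdot 14 - 10 \cdot 40}{4 \cdot 30 - 10^2} = \frac{420 - 400}{120 - 100} = \frac{20}{20} = 1 .$$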
FORTRAN Program:
C     Least Square fit
C
      open(1,file='xy.dat')
      open(2,file='fit.dat')
      write(*,*)'Number of Points?'
      read(*,*)n
C     Initialise the running sums
      sumx=0.0
      sumy=0.0
      sumsqx=0.0
      sumxy=0.0
      write(*,*)'Give data in the form: x,y'
C     Read the data, store them in xy.dat and accumulate the sums
      do i=1,n
         read(*,*)x,y
         write(1,*)x,y
         sumx=sumx+x
         sumy=sumy+y
         sumsqx=sumsqx+x*x
         sumxy=sumxy+x*y
      enddo
C     Slope and intercept from the least square formulas
      deno=n*sumsqx-sumx*sumx
      slope=(n*sumxy-sumx*sumy)/deno
      b=(sumsqx*sumy-sumx*sumxy)/deno
      write(*,*)'Slope, Intercept= ',slope,b
C
C     Write three points of the fitted line in fit.dat
      write(*,*)'Give a lower and upper limits of X'
      read(*,*)xmin, xmax
      x=xmin
      dx=(xmax-xmin)/2.0
      do i=1,3
         y=slope*x+b
         write(2,*)x,y
         x=x+dx
      enddo
      stop
      end
Fig.: Least square fit of the given data points. The straight line is drawn with the values of the
slope and the intercept obtained from the program.
Z-Score Table
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356 0.06749 0.07142 0.07535
0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257 0.10642 0.11026 0.11409
0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058 0.14431 0.14803 0.15173
0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724 0.18082 0.18439 0.18793
0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226 0.21566 0.21904 0.22240
0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537 0.24857 0.25175 0.25490
0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637 0.27935 0.28230 0.28524
0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511 0.30785 0.31057 0.31327
0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147 0.33398 0.33646 0.33891
1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543 0.35769 0.35993 0.36214
1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698 0.37900 0.38100 0.38298
1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617 0.39796 0.39973 0.40147
1.3 0.40320 0.40490 0.40658 0.40824 0.40988 0.41149 0.41308 0.41466 0.41621 0.41774
1.4 0.41924 0.42073 0.42220 0.42364 0.42507 0.42647 0.42785 0.42922 0.43056 0.43189
1.5 0.43319 0.43448 0.43574 0.43699 0.43822 0.43943 0.44062 0.44179 0.44295 0.44408
1.6 0.44520 0.44630 0.44738 0.44845 0.44950 0.45053 0.45154 0.45254 0.45352 0.45449
1.7 0.45543 0.45637 0.45728 0.45818 0.45907 0.45994 0.46080 0.46164 0.46246 0.46327
1.8 0.46407 0.46485 0.46562 0.46638 0.46712 0.46784 0.46856 0.46926 0.46995 0.47062
1.9 0.47128 0.47193 0.47257 0.47320 0.47381 0.47441 0.47500 0.47558 0.47615 0.47670
2.0 0.47725 0.47778 0.47831 0.47882 0.47932 0.47982 0.48030 0.48077 0.48124 0.48169
2.1 0.48214 0.48257 0.48300 0.48341 0.48382 0.48422 0.48461 0.48500 0.48537 0.48574
2.2 0.48610 0.48645 0.48679 0.48713 0.48745 0.48778 0.48809 0.48840 0.48870 0.48899
2.3 0.48928 0.48956 0.48983 0.49010 0.49036 0.49061 0.49086 0.49111 0.49134 0.49158
2.4 0.49180 0.49202 0.49224 0.49245 0.49266 0.49286 0.49305 0.49324 0.49343 0.49361
2.5 0.49379 0.49396 0.49413 0.49430 0.49446 0.49461 0.49477 0.49492 0.49506 0.49520
2.6 0.49534 0.49547 0.49560 0.49573 0.49585 0.49598 0.49609 0.49621 0.49632 0.49643
2.7 0.49653 0.49664 0.49674 0.49683 0.49693 0.49702 0.49711 0.49720 0.49728 0.49736
2.8 0.49744 0.49752 0.49760 0.49767 0.49774 0.49781 0.49788 0.49795 0.49801 0.49807
2.9 0.49813 0.49819 0.49825 0.49831 0.49836 0.49841 0.49846 0.49851 0.49856 0.49861
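The entries of the above table are the areas under the standard Normal curve between 0 and $z$. They can be reproduced, if needed, with the error function as in the following Python sketch; the function name `z_table_entry` is our own.

```python
from math import erf, sqrt

def z_table_entry(z):
    """Area under the standard Normal curve between 0 and z."""
    return 0.5 * erf(z / sqrt(2))

for z in (1.00, 1.96, 2.90):
    print(z, round(z_table_entry(z), 5))
```

For example, the row 1.9 and the column 0.06 of the table correspond to $z = 1.96$.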