M 2008 Meyer Folleto Statistics and Data Analysis in Geology
7/28/2019 M 2008 Meyer Folleto Statistics and Data Analysis in Geology
Franz Meyer: Statistics & Data Analysis in Geology
Dr. Franz J Meyer
Earth and Planetary Remote Sensing, University of Alaska Fairbanks
Statistics and Data Analysis in Geology 6. Normal Distribution
probability plots
central limit theorem
Normal Distribution
An Enormously Important Distribution
The normal distribution is the most commonly used distribution in statistics
Partly this is because the normal distribution is a reasonable description of many processes, from industrial processes to intelligence test scores
Also, under specific conditions one can assume that sampling distributions are normally distributed even if the samples are drawn from populations that are not normally distributed (this is discussed further when we talk about the Central Limit Theorem)
The normal distribution is also referred to as the bell curve
There are an infinite number of normal distributions that differ according to their mean ($\mu$) and variance ($\sigma^2$)
[Figure: examples of normal distributions with different means and variances]
Many natural processes approximately follow the normal distribution
The shape of a normal distribution corresponds to a binomial distribution with p = 0.5 (compare to the coin toss example of lecture 5)
As N becomes large, the function becomes continuous and can be represented by the following equation
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
it also can be thought of as
for p = 0.5
A normal distribution can be characterized by only two parameters, $\mu$ and $\sigma$
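The link between the binomial distribution with p = 0.5 and the normal curve can be checked numerically. The sketch below is only an illustration: the function names are made up for this example, and it assumes the normal curve is given the binomial's mean N·p and standard deviation sqrt(N·p·(1 - p)).

```python
import math

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def max_abs_difference(n, p=0.5):
    """Largest gap between the binomial probabilities and the matching normal density."""
    mu = n * p                          # mean of the binomial
    sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
               for k in range(n + 1))

# The gap shrinks as the number of trials N grows.
for n in (10, 50, 200):
    print(f"N = {n:3d}  largest difference = {max_abs_difference(n):.5f}")
```

The printed differences should get smaller as N increases, which is the sense in which the binomial "becomes" the normal curve.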
The Standard Normal Distribution or Z Distribution
It is often useful to standardize the variables so that populations can be compared. Standardization means that the mean, $\mu$, = 0 and the standard deviation, $\sigma$, = 1
Then the equation becomes:
$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{X^2}{2}}$$
and the curve is expressed in numbers of standard deviations from the mean
So you convert the normal distribution to the Z distribution by converting the original values to standard scores, which allows comparison among populations with different means and variances
That's interesting, as all normal distributions share the following characteristics:
Symmetry
Unimodality
Continuous range from $-\infty$ to $+\infty$
A total area under the curve of 1
A common value for the mean, median, and mode
We can make some assumptions about how the data are distributed within any normal distribution
About 68% of the data fall within $\pm 1\sigma$ of the mean
About 95% of all data fall within $\pm 2\sigma$
About 99.7% of all data fall within $\pm 3\sigma$
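The three percentages above can be reproduced without a table. This is a minimal sketch that assumes the standard identity between the cumulative standard normal and the error function, using `math.erf` from the Python standard library; the helper names are chosen just for this example.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve from minus infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```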
[Figure: standardization of normal random variables]
For any sample, the way to standardize the data is called Z-transformation.
For every point we calculate a Z-score, which is really a measure of how many standard deviations a point is from the mean:
$$Z_i = \frac{X_i - \bar{X}}{S} \quad \text{or} \quad Z_i = \frac{X_i - \mu}{\sigma}$$
depending on whether you are dealing with a sample or a population. Z scores can be positive or negative.
Example:
A shell specimen with a value of 12 mm (X = 12) is drawn from a population with $\mu$ = 10, $\sigma$ = 2. What is that specimen's Z score?
$$Z = \frac{12 - 10}{2} = \frac{2}{2} = 1$$
or the specimen is one standard deviation longer than the mean
What if that same specimen is drawn from a population with $\mu$ = 10, $\sigma$ = 1 (same mean, different variance)?
$$Z = \frac{12 - 10}{1} = \frac{2}{1} = 2$$
In absolute terms the specimen is the same distance from the mean; however, relative to the population as a whole, it is further away (more anomalous).
Example cont.:
What if a different specimen (X = 14) is drawn from the population in example 1 with $\mu$ = 10, $\sigma$ = 2?
$$Z = \frac{14 - 10}{2} = \frac{4}{2} = 2$$
So this specimen is in the same position relative to its population as that from example 2.
[Figure: normal curve with one axis in mm (4 to 16) and a second axis in Z score]
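The shell examples can be written as a few lines of code. This is a small sketch; `z_score` is a name chosen for this illustration, not something from the lecture.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(12, 10, 2))  # example 1: 1.0
print(z_score(12, 10, 1))  # example 2: 2.0
print(z_score(14, 10, 2))  # example 3: 2.0
```

Examples 2 and 3 give the same Z score even though the raw measurements differ, which is the point of standardizing.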
For each normal distribution, the area under the curve is equal to 1. That is, the total probability is equal to 1 (as it was with the binomial distribution).
Mathematically we can express this as:
$$\int_{-\infty}^{+\infty} f(X)\, dX = 1$$
For Z-transformed data this is:
$$\int_{-\infty}^{+\infty} f(X)\, dX = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{X^2}{2}}\, dX = 1$$
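As a quick sanity check, the area under the standardized curve can be approximated numerically. The sketch below uses a plain trapezoid rule and cuts the infinite range off at ±8 standard deviations; both choices are assumptions made for this illustration, since the curve is negligibly small beyond that range.

```python
import math

def standard_normal_pdf(x):
    """Density of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapezoid_area(f, lower, upper, steps):
    """Approximate the integral of f from lower to upper with the trapezoid rule."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width

area = trapezoid_area(standard_normal_pdf, -8.0, 8.0, 16000)
print(f"area under the curve: {area:.6f}")
```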
Similarly, we can calculate the probability of a sample being less than or equal to some preset value Z as
$$P(X \le Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{X^2}{2}}\, dX$$
A different way to represent the normal distribution is by cumulative probability: plots of the area under the curve versus X. They can be made for any distribution. These types of plots are called OGIVE PLOTS, and I will come back to them later.
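The cumulative probability has no closed form, but it can be written in terms of the error function. The sketch below assumes that standard identity and uses `math.erf`; the list name `ogive` is just a label for this example. It tabulates the values one would plot as an ogive for the standard normal distribution.

```python
import math

def standard_normal_cdf(z):
    """Probability that a standard normal value is less than or equal to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Points of the ogive from Z = -3 to Z = +3 in steps of 0.5.
ogive = [(step / 2, standard_normal_cdf(step / 2)) for step in range(-6, 7)]
for z, p in ogive:
    print(f"Z = {z:+.1f}  cumulative probability = {p:.4f}")
```

The values rise from close to 0 on the left to close to 1 on the right and pass through 0.5 at Z = 0.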
For the normal distribution, it is a
pain in the neck to calculate this
integral for every problem that we
are going to do, so tables have
been constructed.
The numbers in the table below
are answers to the question:
What is the Z value
corresponding to a particular
area under the curve?
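The same inverse look-up can be done in software instead of a printed table. This is a sketch using `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later); `z_for_area` is a name made up for this example, and the area is taken to be the cumulative area to the left of Z.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

def z_for_area(area):
    """Z value whose cumulative area (from minus infinity) equals `area`.

    `area` must lie strictly between 0 and 1.
    """
    return standard.inv_cdf(area)

for area in (0.025, 0.5, 0.84, 0.975):
    print(f"area = {area:.3f}  ->  Z = {z_for_area(area):+.3f}")
```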
Example of Cumulated Probability
Grades of chip samples from a body of ore have a normal distribution with a mean of 12% ($\mu$) and a standard deviation of 1.6% ($\sigma$).
(the curve to the right helps to visualize the distribution)
Problem 1: Find the probability of a specimen of 15% or less
Calculate the Z score
(15 - 12)/1.6 = +1.88
The chart on slide 13 gives cumulative probability from very small (minus infinity) to the value: +1.88 = 0.97 (we have to interpolate between +1.8 and +1.9)
Make a sketch to see if this makes sense
So the probability of finding a sample with 15% ore or less is 97%
Problem 2: What is the probability of finding ore greater than 14%?
Z = (14 - 12)/1.6 = +1.25
The probability associated with this Z score is 0.895. This is the probability of 14% or less.
The probability of 14% or more is 1 - 0.895 = 0.105
So the probability of finding a sample with more than 14% ore is 10.5%
Problem 3: What is the probability of finding ore
grade of less than 8%?
Z = (8-12)/1.6 = -2.5
the probability associated with this Z score is
0.0062
So the probability of finding a sample less
than 8% ore is 0.62%, not very likely
Problem 4: What is the probability of a sample being between 8% and 15%?
Calculate the Z scores for each value:
$Z_8$ = (8 - 12)/1.6 = -2.5 --> 0.62%
$Z_{15}$ = (15 - 12)/1.6 = +1.88 --> 97%
Subtract the smaller from the larger:
97 - 0.62 = 96.38%, so about 96% of all samples fall in that range.
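The four ore-grade problems can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the Python standard library with the stated mean of 12% and standard deviation of 1.6%; the variable names are chosen for this example. Because it evaluates the cumulative probability directly instead of interpolating a table, the results can differ from the slide values in the third decimal place.

```python
from statistics import NormalDist

ore = NormalDist(mu=12.0, sigma=1.6)  # ore grade in percent

p_15_or_less = ore.cdf(15)               # Problem 1
p_more_than_14 = 1.0 - ore.cdf(14)       # Problem 2
p_less_than_8 = ore.cdf(8)               # Problem 3
p_between_8_and_15 = ore.cdf(15) - ore.cdf(8)  # Problem 4

print(f"P(grade <= 15%)      = {p_15_or_less:.4f}")
print(f"P(grade > 14%)       = {p_more_than_14:.4f}")
print(f"P(grade < 8%)        = {p_less_than_8:.4f}")
print(f"P(8% < grade <= 15%) = {p_between_8_and_15:.4f}")
```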
Area under the curve:
$\pm 1\sigma$: 0.8413 - 0.1587 = 0.6826, about 68%
$\pm 2\sigma$: 0.9773 - 0.0228 = 0.9545, about 95.5%
$\pm 1.96\sigma$: 95%
$\pm 3\sigma$: 0.9987 - 0.0014 = 0.9973, about 99.7%
The Central Limit Theorem
If you draw a number of samples from a normally distributed population, the sample means will form a normal distribution
BUT we don't always know the distribution of the population
Central Limit Theorem:
The CLT states that, independent of their original statistical distribution, the re-averaged sum of a sufficiently large number of identically distributed independent random variables (with finite variance) will be approximately normally distributed.
In other words, if sufficiently large sets of random samples are taken from any population, and the means are calculated for those samples, then these sample means will tend to be normally distributed.
Central Limit Theorem:
Again in other words: if we take all possible samples of size n from any population with a mean of $\mu$ and a standard deviation of $\sigma$, the distribution of sample means will have:
a mean of $\mu$, also written as $\mu_{\bar{X}} = \mu$
a standard deviation of means, $s_{\bar{X}} = \sigma / \sqrt{n}$
This is also called the standard error of the mean, $s_e$
It will be normally distributed when the parent population is normal
It will approach a normal distribution as n approaches infinity regardless of the distribution of the parent population.
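A small simulation illustrates these two properties. The sketch below is an illustration only: it assumes a uniform parent population on [0, 1), which has mean 0.5 and standard deviation sqrt(1/12), and it draws a limited number of random samples of size n = 25 (with a fixed seed) instead of taking "all possible samples". The function and variable names are made up for this example.

```python
import math
import random
import statistics

def sample_means(n, trials, rng):
    """Means of `trials` random samples, each of size n, from a uniform [0, 1) population."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]

rng = random.Random(0)        # fixed seed so the run is repeatable
n = 25
means = sample_means(n, trials=5000, rng=rng)

parent_mean = 0.5
parent_sd = math.sqrt(1.0 / 12.0)
expected_se = parent_sd / math.sqrt(n)   # standard error of the mean

print(f"mean of sample means: {statistics.fmean(means):.4f} (parent mean {parent_mean})")
print(f"sd of sample means:   {statistics.stdev(means):.4f} (sigma/sqrt(n) = {expected_se:.4f})")
```

The mean of the sample means should sit close to the parent mean, and their spread should be close to $\sigma/\sqrt{n}$ even though the parent population is flat, not bell shaped.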
[Figure: distributions of sample means for n = 1, 2, 4, 25]
Some animated examples:
Uniform distribution:
Log-normal distribution:
Parabolic distributions:
http://www.statisticalengineering.com/central_limit_theorem.htm
This means that if we average enough, we can always reduce data of unknown statistics to data of known properties.
Practically, we can use our Z-statistic
$$Z_i = \frac{X_i - \mu}{\sigma} \qquad (1)$$
which is useful when we want to infer something from single values taken from a normal population ($X_i$ drawn from the population), and adapt it for the CLT for a sample of size n drawn from a population with known mean and standard deviation:
$$Z = \frac{\bar{X} - \mu}{\sigma\sqrt{1/n}} \qquad (2)$$
You can see that equation (1) is the same as (2) if n = 1 (a single sample)
So both equations are just more specific forms of the general equation
$$Z = \frac{\bar{X} - \mu}{s_e}$$
where $s_e$ is the standard deviation of means = $\sigma\sqrt{1/n}$
For the example from earlier:
A sample with a value of 14% (X = 14) is drawn from a population with $\mu$ = 12, $\sigma$ = 1.6. What is the probability of finding a single sample equal to or greater than 14% ore? First calculate that sample's Z score.
$$Z = \frac{14 - 12}{1.6} = \frac{2}{1.6} = 1.25$$
So the probability of finding one such sample or greater was about 10.5%.
For the example from earlier:
Now, what if we selected 4 samples (n = 4) and the mean of those specimens was 14%?
$$Z = \frac{14 - 12}{1.6\sqrt{1/4}} = \frac{2}{1.6\,(1/2)} = \frac{2}{0.8} = 2.5$$
And the probability of finding four specimens with a mean this high or higher is much less; in fact it is only 0.62%!!!
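The single-sample and four-sample cases can be compared side by side. This is a minimal sketch that assumes the population values from the ore example (mean 12%, standard deviation 1.6%) and uses `statistics.NormalDist` for the tail area; the function names are made up for this illustration.

```python
import math
from statistics import NormalDist

def z_for_mean(sample_mean, mu, sigma, n):
    """Z statistic for the mean of n values from a population with known mu and sigma."""
    standard_error = sigma * math.sqrt(1.0 / n)
    return (sample_mean - mu) / standard_error

def upper_tail(z):
    """Probability of a standard normal value greater than or equal to z."""
    return 1.0 - NormalDist().cdf(z)

for n in (1, 4):
    z = z_for_mean(14.0, mu=12.0, sigma=1.6, n=n)
    print(f"n = {n}: Z = {z:.2f}, P(mean >= 14%) = {upper_tail(z):.4f}")
```

With n = 1 this reduces to the ordinary Z score; with n = 4 the standard error halves, so the same mean of 14% lies twice as many standard errors above 12%.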