M 2008 Meyer Folleto Statistics and Data Analysis in Geology

  • 7/28/2019 M 2008 Meyer Folleto Statistics and Data Analysis in Geology

    Franz Meyer: Statistics & Data Analysis in Geology

    Dr. Franz J Meyer

    Earth and Planetary Remote Sensing, University of Alaska Fairbanks

    Statistics and Data Analysis in Geology 6. Normal Distribution

    probability plots

    central limit theorem

    Normal Distribution

    An Enormously Important Distribution

    The normal distribution is the most commonly used distribution in statistics.

    Partly this is because the normal distribution is a reasonable description of many processes, from industrial processes to intelligence test scores.

    Also, under specific conditions one can assume that sampling distributions are normally distributed even if the samples are drawn from populations that are not normally distributed (this is discussed further when we talk about the Central Limit Theorem).

    The normal distribution is also referred to as the bell curve (a few examples were plotted on the original slide).

    There are an infinite number of normal distributions, which differ according to their mean ($\mu$) and variance ($\sigma^2$).

    Many natural processes approximately follow the normal distribution.

    The shape of a normal distribution corresponds to a binomial distribution with p = 0.5 (compare to the coin toss example of lecture 5).

    As N becomes large, the function becomes continuous and can be represented by the following equation:

    $$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

    It can also be thought of as the large-N limit of the binomial distribution for p = 0.5.

    A normal distribution can be characterized by only two parameters, $\mu$ and $\sigma$.
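As a rough numerical illustration of the link between the binomial distribution with p = 0.5 and the normal curve, the following Python sketch compares binomial probabilities with the normal density that has the same mean and standard deviation. The value N = 100 and the function names are assumptions chosen for this illustration; they do not come from the lecture.

```python
import math

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

n, p = 100, 0.5                      # n is an illustrative choice
mu = n * p                           # mean of the binomial distribution
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial distribution

# Print binomial probability and normal density side by side for a few counts.
for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 5), round(normal_pdf(k, mu, sigma), 5))
```

For N this large the two columns should agree closely; with small N the discrete binomial bars are visibly coarser than the smooth curve.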

    The Standard Normal Distribution or Z Distribution

    It is often useful to standardize the variables so that populations can be compared. Standardization means that the mean is $\mu = 0$ and the standard deviation is $\sigma = 1$.

    Then the equation becomes:

    $$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-X^2/2}$$

    and the curve is expressed in numbers of standard deviations from the mean.

    So you convert the normal distribution to the Z distribution by converting the original values to standard scores, which allows comparison among populations with different means and variances.

    That's interesting, as all normal distributions share the following characteristics:

    Symmetry

    Unimodality

    Continuous range from $-\infty$ to $+\infty$

    A total area under the curve of 1

    A common value for the mean, median, and mode

    We can make some assumptions about how the data are distributed within any normal distribution:

    About 68% of the data fall within $\pm 1\sigma$

    About 95% of all data fall within $\pm 2\sigma$

    About 99.7% of all data fall within $\pm 3\sigma$
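The percentages for one, two and three standard deviations can be checked numerically. The sketch below is one possible way to do so in Python, using the error function from the standard library; the function names are assumptions made for this illustration.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The three areas come out near 0.6827, 0.9545 and 0.9973, so the three-sigma figure is closer to 99.7% than to 99.5%.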

    [Figure: standardization of normal random variables]

    For any sample, the way to standardize the data is called Z-transformation.

    For every point we calculate a Z-score, which is really a measure of how many standard deviations a point is from the mean:

    $$Z_i = \frac{X_i - \bar{X}}{S} \quad \text{or} \quad Z_i = \frac{X_i - \mu}{\sigma}$$

    depending on whether you are dealing with a sample or a population. Z-scores can be positive or negative.

    Example:

    A shell specimen with a value of 12 mm (X = 12) is drawn from a population with $\mu$ = 10, $\sigma$ = 2. What is that sample's Z-score?

    $$Z = \frac{12 - 10}{2} = \frac{2}{2} = 1$$

    or the sample is one standard deviation longer than the mean.

    What if that same sample is drawn from a population with $\mu$ = 10, $\sigma$ = 1 (same mean, different variance)?

    $$Z = \frac{12 - 10}{1} = \frac{2}{1} = 2$$

    In absolute terms the specimen is the same distance from the mean; however, relative to the population as a whole, it is further away (more anomalous).

    Example cont.:

    What if a different specimen (X = 14) is drawn from the population in example 1 with $\mu$ = 10, $\sigma$ = 2?

    $$Z = \frac{14 - 10}{2} = \frac{4}{2} = 2$$

    So this sample is in the same position relative to its population as the one from example 2.

    [Figure: normal curve with a Z-score axis and a length axis marked 4, 6, 8, 10, 12, 14, 16 mm]
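The three shell examples can be reproduced with a few lines of Python. This is a minimal sketch for the population form of the Z-score; the function name `z_score` is an assumption made for this illustration.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

# (X, mean, standard deviation) for the three shell examples, all in mm
examples = [(12, 10, 2), (12, 10, 1), (14, 10, 2)]

for x, mean, std in examples:
    print(f"X = {x} mm, mean = {mean}, std = {std}: Z = {z_score(x, mean, std):.1f}")
```

The second and third cases give the same Z-score even though the measured lengths differ, which is the point of standardizing.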

    For each normal distribution, the area under the curve is equal to 1. That is, the total probability is equal to 1 (as it was with the binomial distribution).

    Mathematically we can express this as:

    $$\int_{-\infty}^{+\infty} f(X)\,dX = 1$$

    For Z-transformed data this is:

    $$\int_{-\infty}^{+\infty} f(X)\,dX = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-X^2/2}\,dX = 1$$

    Similarly, we can calculate the probability of a sample being less than or equal to some preset value Z as

    $$P(X \le Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-X^2/2}\,dX$$

    A different way to represent the normal distribution is by cumulative probability: a plot of the area under the curve versus X. Such plots can be made for any distribution. They are called OGIVE PLOTS, and I will come back to them later.
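To make the cumulative integral concrete, the sketch below approximates it with the trapezoidal rule and compares the result with the closed form based on the error function. The lower integration limit of -10, the step count and the function names are assumptions chosen for this illustration.

```python
import math

def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_numeric(z: float, lower: float = -10.0, steps: int = 20000) -> float:
    """Trapezoidal approximation of the area under pdf from `lower` up to z."""
    h = (z - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(z))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

def cdf_erf(z: float) -> float:
    """Same cumulative probability expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Points of a cumulative (ogive) curve, computed both ways.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"Z = {z:+.1f}: numeric = {cdf_numeric(z):.4f}, erf = {cdf_erf(z):.4f}")
```

Evaluating `cdf_numeric` at a large upper limit such as 10 returns a value very close to 1, in line with the total area under the curve.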

    For the normal distribution, it is a pain in the neck to calculate this integral for every problem that we are going to do, so tables have been constructed.
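When a printed table is not at hand, a small table of cumulative probabilities can be generated directly. The following Python sketch is one way to do it; the range of Z values and the step of 0.5 are assumptions chosen for this illustration.

```python
import math

def phi(z: float) -> float:
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print("   Z    cumulative probability")
for i in range(-6, 7):          # Z from -3.0 to +3.0 in steps of 0.5
    z = i / 2.0
    print(f"{z:+5.1f}   {phi(z):.4f}")
```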

    The numbers in the table below are answers to the question: What is the Z value corresponding to a particular area under the curve?
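The same lookup, from an area under the curve back to a Z value, can be sketched with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The areas listed below are assumptions chosen for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# For each cumulative area, find the Z value below which that area lies.
for area in (0.025, 0.5, 0.8413, 0.975):
    z = standard_normal.inv_cdf(area)
    print(f"area {area:.4f} -> Z = {z:+.2f}")
```

An area of 0.975 corresponds to Z of about +1.96, which is why 95% of the data lie within 1.96 standard deviations of the mean.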

    Example of Cumulative Probability

    Grades of chip samples from a body of ore have a normal distribution with a mean of 12% ($\mu$) and a standard deviation of 1.6% ($\sigma$). (The curve on the original slide helps to visualize the distribution.)

    Problem 1: Find the probability of a specimen of 15% or less.

    Calculate the Z-score: Z = (15 - 12)/1.6 = +1.88

    The chart on slide 13 gives the cumulative probability from very small values (minus infinity) up to this value: for Z = +1.88 it is 0.97 (we have to interpolate between +1.8 and +1.9).

    Make a sketch to see if this makes sense.

    So the probability of finding a sample with 15% ore or less is 97%.

    Problem 2: What is the probability of finding ore greater than 14%?

    Z = (14 - 12)/1.6 = +1.25

    The probability associated with this Z-score is 0.895. This is the probability of 14% or less.

    The probability of 14% or more is 1 - 0.895, or 0.105.

    So the probability of finding a sample with more than 14% ore is 10.5%.

    Problem 3: What is the probability of finding an ore grade of less than 8%?

    Z = (8 - 12)/1.6 = -2.5

    The probability associated with this Z-score is 0.0062.

    So the probability of finding a sample with less than 8% ore is 0.62%, not very likely.

    Problem 4: What is the probability of a sample being between 8% and 15%?

    Calculate the Z-scores for each value:

    $Z_8$ = (8 - 12)/1.6 = -2.5 --> 0.62%

    $Z_{15}$ = (15 - 12)/1.6 = 1.88 --> 97%

    Subtract the smaller from the larger: 97 - 0.62 = 96.38%, so about 96% of all samples fall in that range.
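The four ore-grade problems can be cross-checked without a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean of 12% and standard deviation of 1.6% given in the example; the variable names are assumptions made for this illustration.

```python
from statistics import NormalDist

ore = NormalDist(mu=12.0, sigma=1.6)   # ore grade in percent

p_le_15 = ore.cdf(15.0)                    # Problem 1: 15% or less
p_gt_14 = 1.0 - ore.cdf(14.0)              # Problem 2: more than 14%
p_lt_8 = ore.cdf(8.0)                      # Problem 3: less than 8%
p_8_to_15 = ore.cdf(15.0) - ore.cdf(8.0)   # Problem 4: between 8% and 15%

for label, p in [("<= 15%", p_le_15), ("> 14%", p_gt_14),
                 ("< 8%", p_lt_8), ("8% to 15%", p_8_to_15)]:
    print(f"P(grade {label}) = {p:.4f}")
```

The computed values are roughly 0.97, 0.106, 0.0062 and 0.963. Small differences from the table-based answers come from rounding the Z-score and interpolating in the table.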

    Area under the curve:

    Range       Cumulative probabilities       Area
    ±1σ         0.8413 - 0.1587 = 0.6826       68%
    ±2σ         0.9773 - 0.0228 = 0.9545       95.5%
    ±1.96σ                                     95%
    ±3σ         0.9987 - 0.0014 = 0.9973       99.7%

    The Central Limit Theorem

    If you draw a number of samples from a normally distributed population, the sample means will form a normal distribution.

    BUT we don't always know the distribution of the population.

    Central Limit Theorem:

    The CLT states that, independent of their original statistical distribution, the re-averaged sum of a sufficiently large number of identically distributed independent random variables will be approximately normally distributed.

    In other words, if sufficiently large sets of random samples are taken from any population (with finite variance), and the means are calculated for those samples, then these sample means will tend to be normally distributed.

    Again in other words: if we take all possible samples of size n from any population with a mean of $\mu$ and a standard deviation of $\sigma$, the distribution of sample means will:

    have a mean of $\mu$, also written as $\mu_{\bar{X}} = \mu$

    have a standard deviation of means $s_{\bar{X}} = \sigma/\sqrt{n}$. This is also called the standard error of the mean, $s_e$

    be normally distributed when the parent population is normal

    approach a normal distribution as n approaches infinity, regardless of the distribution of the parent population.
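A small simulation can illustrate these properties of sample means. The sketch below draws repeated samples from a uniform population, which is clearly not normal, and compares the spread of the sample means with the standard deviation of the population divided by the square root of n. The uniform population, the sample sizes, the number of repetitions and the fixed seed are all assumptions chosen for this illustration.

```python
import math
import random
import statistics

def sample_means(n: int, n_samples: int, rng: random.Random) -> list:
    """Means of n_samples independent samples of size n drawn from Uniform(0, 1)."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(n_samples)]

rng = random.Random(0)           # fixed seed so that the run is repeatable
sigma = math.sqrt(1.0 / 12.0)    # standard deviation of Uniform(0, 1); its mean is 0.5

for n in (1, 2, 4, 25):
    means = sample_means(n, 5000, rng)
    print(f"n = {n:2d}: mean of means = {statistics.fmean(means):.3f}, "
          f"std of means = {statistics.pstdev(means):.4f}, "
          f"sigma/sqrt(n) = {sigma / math.sqrt(n):.4f}")
```

The mean of the sample means stays near the population mean of 0.5, while their standard deviation shrinks in step with the predicted standard error. A histogram of `means` for the larger n would also look increasingly bell-shaped.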

    [Figure: four panels labeled 1, 2, 4 and 25]

    Some animated examples:

    Uniform distribution:

    Log-normal distribution:

    Parabolic distributions:

    http://www.statisticalengineering.com/central_limit_theorem.htm

    This means that if we average enough, we can always reduce data of unknown statistics to data of known properties.

    Practically, we can use our Z-statistic

    $$Z_i = \frac{X_i - \mu}{\sigma} \qquad (1)$$

    which is useful when we want to infer something from single values taken from a normal population ($X_i$ drawn from the population), and adapt it for the CLT for a sample of size n drawn from a population with known mean and standard deviation:

    $$Z = \frac{\bar{X} - \mu}{\sigma\sqrt{1/n}} \qquad (2)$$

    You can see that equation (1) is the same as (2) if n = 1 (a single sample).

    So both equations are just more specific forms of the general equation

    $$Z = \frac{X - \mu}{s_e}$$

    where $s_e$ is the standard deviation of means, $s_e = \sigma\sqrt{1/n}$.

    For the example from earlier:

    A sample with a value of 14% (X = 14) is drawn from a population with $\mu$ = 12, $\sigma$ = 1.6. What is the probability of finding a single sample equal to or greater than 14% ore? First calculate that sample's Z-score:

    $$Z = \frac{14 - 12}{1.6} = \frac{2}{1.6} = 1.25$$

    So the probability of finding one such sample or greater was about 10.5%.

    For the example from earlier:

    Now, what if we selected 4 samples (n = 4) and the mean of those specimens was 14%?

    $$Z = \frac{14 - 12}{1.6\sqrt{1/4}} = \frac{2}{1.6\,(1/2)} = \frac{2}{0.8} = 2.5$$

    And the probability of finding four specimens with a mean this high or higher is much less: in fact it is only 0.62%!
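The single-sample and four-sample cases can be compared side by side in code. The sketch below assumes the ore population from the example (mean 12%, standard deviation 1.6%) and uses `statistics.NormalDist` from the Python standard library; the function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def z_for_mean(sample_mean: float, mu: float, sigma: float, n: int = 1) -> float:
    """Z statistic for the mean of n values; n = 1 gives the single-value Z-score."""
    standard_error = sigma * math.sqrt(1.0 / n)
    return (sample_mean - mu) / standard_error

def upper_tail(z: float) -> float:
    """Probability of a standard normal value greater than or equal to z."""
    return 1.0 - NormalDist().cdf(z)

for n in (1, 4):
    z = z_for_mean(14.0, 12.0, 1.6, n)
    print(f"n = {n}: Z = {z:.2f}, P(mean >= 14%) = {upper_tail(z):.4f}")
```

Going from one specimen to the mean of four halves the standard error, which doubles the Z statistic and makes a mean of 14% far less likely.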