Descriptive Statistics Handout

by RODOLFO B. MARIANO II, DMD, DDPH, MPD

  • DESCRIPTIVE STATISTICS

    describes the basic features of the data in a study

    summarizes the data in such a way that they can be more easily understood and interpreted

    presents quantitative descriptions in a manageable form; together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data

  • Birth Weights (g) of 20 Live-born Infants Born at a Private Hospital

    Baby  Weight    Baby  Weight    Baby  Weight    Baby  Weight
       1    3265       6    3323      11    2581      16    2759
       2    3260       7    3649      12    2841      17    3248
       3    3245       8    3200      13    3609      18    3314
       4    3484       9    3031      14    2838      19    3200
       5    4146      10    2069      15    3541      20    2834

  • Measures Of Central Tendency (Location): give a single description of the average or "typical" score in the distribution

    Measures Of Variation (Variability, Dispersion, Spread): describe how "spread out" the scores are in the distribution

  • Measures of central tendency locate the center of the distribution: the typical or average score that is representative of an entire distribution

  • Mode

    Median

    Mean

  • The mode for a set of measurements is defined to be the measurement(s) that occur(s) with the greatest frequency.

    The value (number) that appears the most. Notation: Mo

  • Birth Weights (g) of 20 Live-born Infants Born at a Private Hospital

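    Weights as recorded: 3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069, 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834

    Weights in order of magnitude: 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146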

  • It is possible to have more than one mode.

    12, 12, 10, 9, 9, 6, 6, 2, 0

    And it is possible to have no mode.

    12, 10, 9, 6, 2, 0

    If there is no mode, write "no mode"; do not write zero (0).
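
The cases above (one mode, several modes, no mode) can be sketched in Python. This is a minimal illustration, not part of the original handout; the function name `modes` and the convention of returning an empty list for "no mode" are assumptions made for the example.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] for "no mode"."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: report "no mode", never zero.
        return []
    return [value for value, count in counts.items() if count == top]

# Birth weights (g) of the 20 infants, in the order recorded in the table.
birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

print(modes(birth_weights))                    # [3200]
print(modes([12, 12, 10, 9, 9, 6, 6, 2, 0]))   # [12, 9, 6]
print(modes([12, 10, 9, 6, 2, 0]))             # []
```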

  • When To Use

    A distribution may have no mode, or it may have more than one mode. Thus, the mode has limited applicability.

    Nominal Data: the mode is the only meaningful typical score.

  • The median is the score that divides the distribution exactly in half.

    It is the precise midpoint. Notation: Mdn

  • The median (for an odd number of measurements) is the middle measurement when the measurements are arranged in order of magnitude.

    $\text{Mdn} = \left(\dfrac{N + 1}{2}\right)^{\text{th}}$ measurement

    Where: N / n = number of observations

    2, 4, 5, 6, 8

  • The median (for an even number of measurements) is the average of the two middle observations when the measurements are arranged in order of magnitude.

    $\text{Mdn} = \dfrac{(N/2)^{\text{th}} + (N/2 + 1)^{\text{th}}}{2}$

    12, 10, 9, 6, 2, 0

    12, 10, 6, 6, 2, 0

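  • Weights in order of magnitude: 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146

    $\text{Mdn} = \dfrac{(20/2)^{\text{th}} + (20/2 + 1)^{\text{th}}}{2} = \dfrac{(10)^{\text{th}} + (11)^{\text{th}}}{2} = \dfrac{3245 + 3248}{2} = \dfrac{6493}{2} = 3246.5\ \text{g}$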
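
As a rough sketch, the odd and even rules for the median can be written as one small Python function. The name `median_of` is an assumption for illustration; the standard library's `statistics.median` is expected to give the same results.

```python
def median_of(values):
    """Middle value for odd N; average of the two middle values for even N."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((N + 1) / 2)th measurement
    # Average of the (N / 2)th and (N / 2 + 1)th measurements.
    return (ordered[mid - 1] + ordered[mid]) / 2

birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

print(median_of([2, 4, 5, 6, 8]))   # 5
print(median_of(birth_weights))     # 3246.5
```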

  • When To Use

    Used alone, the median may lead to misconceptions and false deductions: two distributions might have the same median, yet be entirely different.

    4, 6, 7, 7, 10, 11, 14 (Mdn = 7)

    5, 6, 7, 7, 8, 9, 10 (Mdn = 7)

  • Ordinal Data: can have either a median or a mode; if given numerical values, it may be possible to derive a mean. The median is the most appropriate.

  • When the distribution is skewed, the median may provide a better representation because it is unaffected by the extreme scores.

  • Median Age

    This is the age at which exactly half the population is older and half is younger.

  • List of Highest Median Age (2010 est.)

    Median Age (Years)

    Rank  Country / Territory   Total   Male   Female
    1     Monaco                48.9    48.0   49.9
    2     Japan                 44.6    42.9   46.5
    3     Italy                 44.3    43.0   45.6
    4     Germany               43.7    42.3   45.3
    5     Jersey                43.4    42.5   44.2
    6     Hong Kong             42.8    42.4   43.2
    7     Austria               42.6    41.5   43.6
    8     Greece                42.2    41.1   43.2
    9     San Marino            42.1    41.3   42.8
    9     Slovenia              42.1    40.4   43.7
    10    Guernsey              42.0    41.8   43.5
    10    Belgium               42.0    40.7   43.3

          India                 25.9    25.4   26.6
          Philippines           22.7    22.2   23.2
          World                 28.4    27.7   29.0

  • List of Lowest Median Age (2010 est.)

    Median Age (Years)

    Country / Territory   Total   Male   Female
    Tanzania              17.3    17.0   17.6
    Benin                 17.3    16.9   17.8
    Mayotte               17.3    18.1   16.5
    Zambia                17.2    17.1   17.3
    Malawi                17.1    17.0   17.3
    Congo, R              16.9    16.7   17.2
    Ethiopia              16.8    16.5   17.2
    Burkina Faso          16.8    16.6   17.0
    Burundi               16.8    16.6   17.0
    Chad                  16.6    15.5   17.8
    Congo, DR             16.5    16.3   16.7
    Yemen                 16.4    16.8   16.0
    Mali                  16.2    15.8   16.6
    Niger                 15.2    15.0   15.4
    Uganda                15.0    14.9   15.1

  • The arithmetic mean, often called the average, is usually the most representative of all the scores in a distribution (by weight). It is always the center of balance of a distribution.

  • [Figure: two plots of Persons against Pens, one with Pens from 2 to 8 and one with Pens from 2 to 6]

  • The mean is the sum of a set of measurements divided by the number of measurements in the set.

    Notation:

    - The mean of a sample is $\bar{x}$ (x with a bar on top)

    - The mean of a population is $\mu$

  • $\bar{x} = \dfrac{\sum x}{n}$

    $\mu = \dfrac{\sum x}{N}$

    Where: $\bar{x}$ / $\mu$ = mean

    N / n = number of observations

    x = observations

    $\sum x$ = the sum of all observations

  • Weight (x), $x_1$ to $x_{20}$: 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146

    $\sum x = 63{,}437$

    $\mu = \dfrac{\sum x}{N} = \dfrac{x_1 + x_2 + \cdots + x_N}{N} = \dfrac{63{,}437}{20} = 3{,}171.85\ \text{g}$
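
The same arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation above; the variable names are chosen for the example.

```python
birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

total = sum(birth_weights)   # sum of all observations
n = len(birth_weights)       # number of observations
mean = total / n             # sum divided by the number of observations

print(total, n, mean)        # 63437 20 3171.85
```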

  • [Figure: birth weight in grams (axis from 0 to 4500) plotted for babies 1 to 20]

    Fig. 1. Birthweights (g) of 20 Live-born Infants Born at a Private Hospital.

  • When To Use

    The mean has the greatest reliability and enjoys the widest use. It considers all the score values in a distribution.

    The mean is the reference point in calculating the common measures of variation and testing for significance.

  • Interval or Ratio Data: can have a mean, a median, or a mode. The mean is generally the statistic of choice, and it is most appropriate when the data are normal or near normal.

  • Rule: In a unimodal, symmetrical distribution (such as a normal distribution), the mode, median and mean are the same value.

    [Figure: distribution curve labeled Mean, Median, Mode]

  • [Figure: plot of Persons against Pens, with Pens from 2 to 13]

  • If the mean is to the right of the median (and the mode), the distribution is said to be positively skewed (skewed right).

  • If the mean is to the left of the median (and the mode), the distribution is said to be negatively skewed (skewed left).
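
A minimal sketch of this rule of thumb in Python follows. It only compares the mean with the median and is not a formal skewness statistic. The function name `skew_direction` and the small right-skewed data set are made up for illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb only)."""
    m, mdn = mean(values), median(values)
    if m > mdn:
        return "positively skewed (skewed right)"
    if m < mdn:
        return "negatively skewed (skewed left)"
    return "mean equals median"

birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

# Mean 3171.85 g lies to the left of the median 3246.5 g.
print(skew_direction(birth_weights))       # negatively skewed (skewed left)
# Hypothetical data with one large value pulling the mean to the right.
print(skew_direction([2, 3, 4, 5, 20]))    # positively skewed (skewed right)
```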

  • Birth weights

    Mode: 3200 g

    Median: 3246.5 g

    Mean: 3171.85 g

  • Measures of variation locate the spread of the distribution; they help to describe how far from the center the data tend to range

  • Range

    Variance

    Standard Deviation

    Coefficient of Variation

  • The range of a set of measurements is defined to be the difference between the largest and the smallest measurements.

  • Weights in order of magnitude: 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146

    Range = largest value - smallest value

    = 4146 g - 2069 g

    Range = 2077 g
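
For completeness, the range calculation can be sketched in Python; `value_range` is a name chosen for this example.

```python
def value_range(values):
    """Difference between the largest and the smallest measurement."""
    return max(values) - min(values)

birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

print(value_range(birth_weights))   # 2077
```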

  • When To Use

    The range is an insensitive measure of variation (it does not change with a change in the distribution of data), and is not very informative.

    Data sets may be represented by the same range, but the data may be distributed very differently.

    12, 10, 9, 6, 2, 0 (Range = 12)

    14, 11, 9, 6, 4, 2 (Range = 12)

  • The range, however, is an adequate measure of variation for a small set of data (like class scores for a test).

  • The variance is the average squared deviation about the mean of a set of data.

  • Weight (x)   Weight - Mean (x - μ)         Squared Deviation (x - μ)²
    2069         2069 - 3171.85 = -1102.85     1,216,278.12
    2581         2581 - 3171.85 =  -590.85       349,103.72
    2759         2759 - 3171.85 =  -412.85       170,445.12
    2834         2834 - 3171.85 =  -337.85       114,142.62
    2838         2838 - 3171.85 =  -333.85       111,455.82
    2841         2841 - 3171.85 =  -330.85       109,461.72
    3031         3031 - 3171.85 =  -140.85        19,838.72
    3200         3200 - 3171.85 =    28.15           792.42
    3200         3200 - 3171.85 =    28.15           792.42
    3245         3245 - 3171.85 =    73.15         5,350.92
    3248         3248 - 3171.85 =    76.15         5,798.82
    3260         3260 - 3171.85 =    88.15         7,770.42
    3265         3265 - 3171.85 =    93.15         8,676.92
    3314         3314 - 3171.85 =   142.15        20,206.62
    3323         3323 - 3171.85 =   151.15        22,846.32
    3484         3484 - 3171.85 =   312.15        97,437.62
    3541         3541 - 3171.85 =   369.15       136,271.72
    3609         3609 - 3171.85 =   437.15       191,100.12
    3649         3649 - 3171.85 =   474.15       227,672.12
    4146         4146 - 3171.85 =   974.15       948,968.22

    Total: 63437                       0.00     3,764,410.55

  • The variance is computed as the sum of the squared deviation of each

    number from the mean divided by the number of observations.

  • The population variance can be computed as:

    $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

    where:

    $\sigma^2$ = the variance

    N = total number of observations

    $\sum (x - \mu)^2$ = the sum of the squared deviation of each observation from the mean

  • $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{3{,}764{,}410.55}{20}$

    $\sigma^2 = 188{,}220.53\ \text{g}^2$
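
The deviation method can be sketched step by step in Python. This is an illustration of the calculation above, not a prescribed implementation; the variable names are assumptions, and `statistics.pvariance` from the standard library should give the same figure.

```python
birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

n = len(birth_weights)
mu = sum(birth_weights) / n                       # population mean, 3171.85 g

deviations = [x - mu for x in birth_weights]      # x - mu for each weight
squared = [d ** 2 for d in deviations]            # (x - mu)^2 for each weight

sum_of_squares = sum(squared)                     # about 3,764,410.55
population_variance = sum_of_squares / n          # about 188,220.53 g^2

print(round(sum(deviations), 2))                  # deviations cancel out to about zero
print(round(sum_of_squares, 2))
print(round(population_variance, 2))
```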

  • The sample variance can be computed as:

    $S^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

    where:

    $S^2$ = the variance

    n = total number of observations

    $\sum (x - \bar{x})^2$ = the sum of the squared deviation of each observation from the mean

    n - 1 = number of degrees of freedom

  • $S^2 = \dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}$

    $\sigma^2 = \dfrac{\sum x^2 - (\sum x)^2 / N}{N}$

  • Weight (x)   x²
    2069          4,280,761
    2581          6,661,561
    2759          7,612,081
    2834          8,031,556
    2838          8,054,244
    2841          8,071,281
    3031          9,186,961
    3200         10,240,000
    3200         10,240,000
    3245         10,530,025
    3248         10,549,504
    3260         10,627,600
    3265         10,660,225
    3314         10,982,596
    3323         11,042,329
    3484         12,138,256
    3541         12,538,681
    3609         13,024,881
    3649         13,315,201
    4146         17,189,316

    Total: 63437   204,977,059

    $\sigma^2 = \dfrac{\sum x^2 - (\sum x)^2 / N}{N} = \dfrac{204{,}977{,}059 - (63{,}437)^2 / 20}{20} = \dfrac{204{,}977{,}059 - 4{,}024{,}252{,}969 / 20}{20} = \dfrac{204{,}977{,}059 - 201{,}212{,}648.45}{20} = \dfrac{3{,}764{,}410.55}{20}$

    $\sigma^2 = 188{,}220.53\ \text{g}^2$
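
The shortcut formula can be checked with a short Python sketch. It also applies the n - 1 divisor to show how the sample variance would differ for the same 20 weights; treating these weights as a sample is an assumption made only for the comparison, and the variable names are illustrative.

```python
birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

n = len(birth_weights)
sum_x = sum(birth_weights)                    # 63437
sum_x2 = sum(x * x for x in birth_weights)    # 204977059

# Numerator shared by both shortcut formulas: sum(x^2) - (sum(x))^2 / N
sum_of_squares = sum_x2 - sum_x ** 2 / n      # about 3,764,410.55

population_variance = sum_of_squares / n      # divide by N
sample_variance = sum_of_squares / (n - 1)    # divide by n - 1 (degrees of freedom)

print(sum_x, sum_x2)
print(round(population_variance, 2), round(sample_variance, 2))
```

Because the sample formula divides by n - 1 instead of N, it always gives the slightly larger of the two values.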

  • When To Use

    The variance measures how the observations are spread around the mean. The larger the variance, the more scattered the observations on average. It is hard to interpret, though, because when we square the values we also square the units.

  • The standard deviation is the most useful and most commonly used measure of variability. Like the variance, its computation considers all the scores in a distribution. The standard deviation is the positive square root of the variance.

    Notation:

    - For a sample, $S$

    - For a population, $\sigma$

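  • $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

    $S = \sqrt{S^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

  • $S^2 = \dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}$

    $\sigma^2 = \dfrac{\sum x^2 - (\sum x)^2 / N}{N}$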

  • $\sigma = \sqrt{\sigma^2} = \sqrt{188{,}220.53\ \text{g}^2}$

    $\sigma = 433.84\ \text{g}$
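
As a brief sketch, the square-root step can be reproduced with Python's standard library. `statistics.pvariance` and `statistics.pstdev` use the population (divide by N) formulas; the variable names are chosen for the example.

```python
import math
import statistics

birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

variance = statistics.pvariance(birth_weights)   # population variance, in g^2
sigma = math.sqrt(variance)                      # positive square root, in g

print(round(sigma, 2))                           # 433.84
```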

  • When To Use

    The standard deviation is a widely used measure of variation.

    It is a useful measure when your data distribution is very close to a normal curve. In this situation, the mean is the best measure of central tendency, and the standard deviation is the best measure of variation.

  • Weight (x): 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146 (total 63437)

    This distribution has a mean weight of 3,171.85 g and a standard deviation of 433.84 g.

    This distribution has a weight of 3,171.85 g ± 433.84 g.

  • In case of a normal distribution the following rules of thumb can be applied:

    - ($\mu \pm \sigma$) contains about 68% of the observations

    - ($\mu \pm 2\sigma$) contains about 95% of the observations

    - ($\mu \pm 3\sigma$) contains more than 99% of the observations
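
To see how a small data set compares with these rules of thumb, the sketch below counts how many of the 20 birth weights fall within one, two and three standard deviations of the mean. The helper name `share_within` is an assumption for illustration, and with only 20 observations the shares should not be expected to match 68%, 95% and 99% closely.

```python
import statistics

birth_weights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                 2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3200, 2834]

mu = statistics.mean(birth_weights)       # 3171.85 g
sigma = statistics.pstdev(birth_weights)  # about 433.84 g

def share_within(values, center, spread, k):
    """Fraction of values inside the interval center +/- k * spread."""
    inside = [v for v in values if center - k * spread <= v <= center + k * spread]
    return len(inside) / len(values)

for k in (1, 2, 3):
    print(k, share_within(birth_weights, mu, sigma, k))
# 1 0.75
# 2 0.9
# 3 1.0
```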

  • The coefficient of variation measures variability in relation to the mean and is used to compare the relative dispersion in one type of data with the relative dispersion in another type of data. The coefficient of variation is a calculation built on other calculations: the standard deviation and the mean.

    Notation: C.V.

  • $\text{C.V.} = \dfrac{S}{\bar{x}} \times 100$

    $\text{C.V.} = \dfrac{\sigma}{\mu} \times 100$

  • Scores (x): 64, 77, 79, 80, 86, 91, 92, 92, 97, 99

    $\text{C.V.} = \dfrac{\sigma}{\mu} \times 100 = \dfrac{10.18}{85.7} \times 100 = (0.1188)(100)$

    C.V. = 11.88%
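
A minimal Python sketch of this calculation is shown below. It assumes the population standard deviation (divide by N), which is the version that reproduces the 10.18 used above; the function name `coefficient_of_variation` is chosen for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

scores = [64, 77, 79, 80, 86, 91, 92, 92, 97, 99]

print(statistics.mean(scores))                       # 85.7
print(round(statistics.pstdev(scores), 2))           # 10.18
print(round(coefficient_of_variation(scores), 2))    # 11.88
```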

  • When To Use

    CV measures the relative variability of the same characteristic in two different populations, or of two related characteristics scored in different units in the same population.

  • Is the variability in body temperature and pulse rate in male adults the same? Body temperatures and pulse rates of 50 men were measured and analyzed to yield the following statistics:

                   Temperature   Pulse Rate
    Mean (x̄)       98.4          78
    S              1.5           9

  • for temperature, C.V. = (1.5/98.4)(100) = 1.52%

    for pulse rate, C.V. = (9/78)(100) = 11.54%

    Relative to its mean, pulse rate is far more variable than body temperature: the two coefficients of variation differ by about 10 percentage points (11.54% versus 1.52%).

  • Birth weights

    Range: 2077 g

    Variance: 188,220.53 g²

    Standard Deviation: 433.84 g

  • A numerical descriptive measure of a population is called a parameter.
