7 Measure Statistics


  • 8/2/2019 7 Measure Statistics


    VII. MEASURE - STATISTICS

    DO NOT PUT YOUR FAITH IN WHAT STATISTICS SAY UNTIL YOU HAVE CAREFULLY CONSIDERED WHAT THEY DO NOT SAY.

    WILLIAM W. WATT

    CSSBB 2007 VII -1 QUALITY COUNCIL OF INDIANA


    Measure - Statistics

    Measure - Statistics is described in the following topic areas:

      • Basic statistics
      • Probability
      • Process capability

    Basic Statistics

    Basic statistics is presented in the following topic areas:

      • Basic terms
      • Central limit theorem
      • Descriptive statistics
      • Graphical methods
      • Statistical conclusions

    Basic Terms

    Continuous Distribution: A distribution containing infinite (variable) data points that may be displayed on a continuous measurement scale. Examples: normal, uniform, exponential, and Weibull distributions.

    Discrete Distribution: A distribution resulting from countable (attribute) data that has a finite number of possible values. Examples: binomial, Poisson, and hypergeometric distributions.

    Parameter: The true numeric population value, often unknown, estimated by a statistic.

    Population: All possible observations of similar items from which a sample is drawn.

    Statistic: A numerical data value taken from a sample that may be used to make an inference about a population.

    (Omdahl, 1997)12


    Central Limit Theorem

    If a random variable, $X$, has mean $\mu$ and finite variance $\sigma^2$, then as $n$ increases, $\bar{X}$ approaches a normal distribution with mean $\mu$ and variance $\sigma_{\bar{X}}^2$, where

    $$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$

    and $n$ is the number of observations on which each mean is based.

    [Figure 7.1 Distributions of Individuals Versus Means: the normal distribution of sample means drawn over the wider distribution of individuals, both centered on $\mu$. Vertical axis: F(x).]

    The Central Limit Theorem states:

      • The sample means $\bar{X}_i$ will be more normally distributed around $\mu$ than individual readings $X_j$. The distribution of sample means approaches normal regardless of the shape of the parent population. This is why $\bar{X}$ - R control charts work!

      • The spread in sample means $\bar{X}_i$ is less than the spread in $X_j$, with the standard deviation of $\bar{X}_i$ equal to the standard deviation of the population (individuals) divided by the square root of the sample size. $s_{\bar{X}}$ is referred to as the standard error of the mean:

    $$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}, \quad \text{which is estimated by} \quad s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$$

    Example 7.1: Assume the following are weight variation results: $\bar{X}$ = 20 grams and $\sigma$ = 0.124 grams. Estimate $\sigma_{\bar{X}}$ for a sample size of 4.

    Solution:

    $$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{0.124}{\sqrt{4}} = 0.062 \text{ grams}$$
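    The standard error calculation in Example 7.1 can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is an illustrative choice and does not come from the primer.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


# Example 7.1: sigma = 0.124 grams, sample size n = 4
se = standard_error(0.124, 4)
print(f"standard error = {se:.3f} grams")  # standard error = 0.062 grams
```

    Quadrupling the sample size halves the standard error, which is why averages of even small subgroups vary much less than individual readings.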


    The significance of the central limit theorem on control charts is that the distribution of sample means approaches a normal distribution. Refer to Figure 7.2 below:

    [Figure 7.2 Illustration of Central Tendency (Lapin, 1982)9: several differently shaped population distributions, each shown above its sampling distribution of $\bar{X}$ for n = 4 and n = 25.]

    In Figure 7.2, a variety of population distributions approach normality for the sampling distribution of $\bar{X}$ as n increases. For most distributions, but not all, a near normal sampling distribution is attained with a sample size of 4 or 5.
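    The behavior described for Figure 7.2 can be reproduced with a small simulation. The sketch below assumes an exponential parent population with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for illustration, not taken from the figure) and compares the observed spread of the sample means with $\sigma / \sqrt{n}$.

```python
import random
import statistics


def sample_means(n, trials, rng):
    """Return `trials` sample means, each the average of n exponential draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(7)  # fixed seed so the run is repeatable

spread = {}
for n in (4, 25):
    means = sample_means(n, 20_000, rng)
    spread[n] = statistics.stdev(means)
    # Observed spread of the means versus the theoretical sigma / sqrt(n)
    print(f"n={n:2d}  mean of means={statistics.fmean(means):.3f}  "
          f"sd of means={spread[n]:.3f}  sigma/sqrt(n)={1 / n ** 0.5:.3f}")
```

    The standard deviation of the means should land close to 0.5 for n = 4 and 0.2 for n = 25, even though the parent population is far from normal.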


    Descriptive Statistics

    Descriptive statistics include measures of central tendency, measures of dispersion, probability density function, frequency distributions, and cumulative distribution functions.

    Measures of Central Tendency

    Measures of central tendency represent different ways of characterizing the central value of a collection of data. Three of these measures will be addressed here: mean, mode, and median.

    The Mean (X-bar, $\bar{X}$)

    The mean is the total of all data values divided by the number of data points.

    $$\bar{X} = \frac{\sum X}{n}$$

    Where:
      $\bar{X}$ is the mean
      $\sum$ means summation
      $X$ represents each number
      $n$ is the sample size

    Example 7.2: For the following 9 numbers, find $\bar{X}$:

    5 3 7 9 8 5 4 5 8

    Answer: 6

    The arithmetic mean is the most widely used measure of central tendency.

    Advantages of using the mean:
      • It is the center of gravity of the data
      • It uses all data
      • No sorting is needed

    Disadvantages of using the mean:
      • Extreme data values may distort the picture
      • It can be time-consuming
      • The mean may not be the actual value of any data points


    The Mode

    The mode is the most frequently occurring number in a data set.

    Example 7.3: Find the mode of the following data set (9 numbers):

    5 3 7 9 8 5 4 5 8

    Answer: 5

    Note: It is possible for groups of data to have more than one mode.

    Advantages of using the mode:
      • No calculations or sorting are necessary
      • It is not influenced by extreme values
      • It is an actual value
      • It can be detected visually in distribution plots

    Disadvantage of using the mode:
      • The data may not have a mode, or may have more than one mode

    The Median (Midpoint)

    The median is the middle value when the data is arranged in ascending or descending order. For an even set of data, the median is the average of the middle two values.

    Examples 7.4: Find the median of the following data sets:

    (9 Numbers) 2 2 3 4 5 7 8 8 9
    (10 Numbers) 2 2 2 3 4 6 7 7 8 9

    Answer: 5 for both examples


    Advantages of using the median:
      • Provides an idea of where most data is located
      • Little calculation required
      • Insensitivity to extreme values

    Disadvantages of using the median:
      • The data must be sorted and arranged
      • Extreme values may be important
      • Two medians cannot be averaged to obtain a combined distribution median
      • The median will have more variation (between samples) than the average
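    Examples 7.2 through 7.4 can be verified with Python's standard `statistics` module. This is a sketch for checking the arithmetic, assuming Python 3.8 or later (for `multimode`); the variable names are illustrative.

```python
import statistics

# Data set used in Examples 7.2 and 7.3
data = [5, 3, 7, 9, 8, 5, 4, 5, 8]

mean = statistics.mean(data)        # total of all values divided by the count
modes = statistics.multimode(data)  # list of all most-frequent values

# Data sets used in Examples 7.4 (odd and even number of values)
median_odd = statistics.median([2, 2, 3, 4, 5, 7, 8, 8, 9])
median_even = statistics.median([2, 2, 2, 3, 4, 6, 7, 7, 8, 9])

print(mean)         # 6
print(modes)        # [5]
print(median_odd)   # 5
print(median_even)  # 5.0 (average of the middle two values, 4 and 6)
```

    `multimode` returns a list because, as noted above, a data set may have more than one mode.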

    [Figure 7.3 A Comparison of Central Tendency in Normal and Skewed Distributions. For a normal distribution: MEAN = MEDIAN = MODE. For a skewed distribution: the MODE and the MEDIAN are marked at separate points on the curve.]


    Measures of Dispersion

    Other than central tendency, the other important parameter to describe a set of data is spread or dispersion. Three main measures of dispersion will be reviewed: range, variance, and standard deviation.

    Range (R)

    The range of a set of data is the difference between the largest and smallest values.

    Example 7.5: Find the range of the following data set (9 numbers):

    5 3 7 9 8 5 4 5 8

    Answer: 9 - 3 = 6

    Variance ($\sigma^2$, $s^2$)

    The variance, $\sigma^2$ or $s^2$, is equal to the sum of the squared deviations from the mean, divided by the number of observations (N for a population, n - 1 for a sample). The formula for variance is:

    $$\text{Population: } \sigma^2 = \frac{\sum (X - \mu)^2}{N} \qquad \text{Sample: } s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

    The variance is equal to the standard deviation squared.

    Standard Deviation ($\sigma$, $s$)

    The standard deviation is the square root of the variance.

    $$\text{Population: } \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad \text{Sample: } s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

    N is used for a population and n - 1 for a sample to remove potential bias in relatively small samples (less than 30).

    Coefficient of Variation (COV)

    The coefficient of variation equals the standard deviation divided by the mean and is expressed as a percentage.

    $$COV = \frac{s}{\bar{X}} (100\%) \quad \text{or} \quad COV = \frac{\sigma}{\mu} (100\%)$$
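    The dispersion measures above can be illustrated on the nine numbers from Example 7.5. This is a sketch using the standard `statistics` module; only the range is worked in the text, so the variance, standard deviation, and COV values printed here are computed by the code rather than quoted from the primer.

```python
import statistics

data = [5, 3, 7, 9, 8, 5, 4, 5, 8]

data_range = max(data) - min(data)         # largest minus smallest value
pop_variance = statistics.pvariance(data)  # divides by N (population formula)
sample_variance = statistics.variance(data)  # divides by n - 1 (sample formula)
s = statistics.stdev(data)                 # square root of the sample variance
cov_percent = 100 * s / statistics.mean(data)

print(data_range)                 # 6
print(round(pop_variance, 3))     # 3.778
print(sample_variance)            # 4.25
print(round(s, 3))                # 2.062
print(round(cov_percent, 1))      # 34.4
```

    The sample variance is larger than the population variance for the same data because the divisor is n - 1 instead of N.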


    The Classic Method of Calculating Standard Deviation

    Calculate the standard deviation of the following data set using the formula:

    $$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

    Example 7.6: Determine s from the following data:

    SAMPLE     $X$    $\bar{X}$    $(X - \bar{X})$    $(X - \bar{X})^2$
       1       162      146           +16                 256
       2       176      146           +30                 900
       3       160      146           +14                 196
       4       142      146            -4                  16
       5       125      146           -21                 441
       6       159      146           +13                 169
       7       145      146            -1                   1
       8       167      146           +21                 441
       9       114      146           -32                1024
      10       120      146           -26                 676
      11       119      146           -27                 729
      12       180      146           +34                1156
      13       154      146            +8                  64
      14       125      146           -21                 441
      15       142      146            -4                  16

    $\sum X = 2190$                    $\sum (X - \bar{X})^2 = 6526$

    Steps:
      • Calculate the average: $\bar{X} = \frac{\sum X}{n} = \frac{2190}{15} = 146$
      • Compute the deviation $(X - \bar{X})$
      • Square each deviation $(X - \bar{X})^2$
      • Sum the squares of the deviations $\sum (X - \bar{X})^2$
      • Calculate the standard deviation:

    $$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{6526}{14}} = \sqrt{466.1} = 21.6$$

    Summary:
      $\bar{X}$ = 146
      n = 15
      s = 21.6
      R = 66

    s is the standard deviation of the sample (21.6), which is used as an estimate for the population from which the sample was taken.
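    The steps of the classic method map directly onto code. The sketch below walks the Example 7.6 data through each step; the variable names are illustrative.

```python
import math

# Example 7.6 data (15 samples)
data = [162, 176, 160, 142, 125, 159, 145, 167,
        114, 120, 119, 180, 154, 125, 142]

n = len(data)
mean = sum(data) / n                          # step 1: the average
deviations = [x - mean for x in data]         # step 2: deviation of each value
squared = [d ** 2 for d in deviations]        # step 3: square each deviation
sum_of_squares = sum(squared)                 # step 4: sum the squares
s = math.sqrt(sum_of_squares / (n - 1))       # step 5: sample standard deviation
data_range = max(data) - min(data)

print(sum(data), mean)          # 2190 146.0
print(sum_of_squares)           # 6526.0
print(round(s, 1), data_range)  # 21.6 66
```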


    Shortcut Formula for Standard Deviation

    $$s = \sqrt{\frac{n \left( \sum X^2 \right) - \left( \sum X \right)^2}{n (n - 1)}}$$

    This formula will yield the same results as the classic method. It is called a "shortcut" because it is convenient to use with some computers and calculators when working with messy data.
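    That the shortcut formula agrees with the classic method can be confirmed numerically. This is a sketch using the Example 7.6 data; the two function names are illustrative.

```python
import math


def stdev_classic(values):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def stdev_shortcut(values):
    """Sample standard deviation from the running totals sum(X) and sum(X^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


data = [162, 176, 160, 142, 125, 159, 145, 167,
        114, 120, 119, 180, 154, 125, 142]

print(round(stdev_classic(data), 1))   # 21.6
print(round(stdev_shortcut(data), 1))  # 21.6
```

    The shortcut form needs only two running totals, so the mean does not have to be known before the data is entered. With floating-point data whose mean is large relative to its spread, the subtraction in the numerator can lose precision, so the classic form is generally the safer choice in software.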

    Determine $\bar{X}$ and s Using a Calculator

    Formerly the authors attempted to instruct students on how to determine $\bar{X}$ and standard deviation on a Sharp calculator. However, many varieties of Texas Instruments, Casio, Hewlett Packard, and Sharp calculators can accomplish this task. The functions on all of these calculators are subject to change. It should be recognized that most technical people determine the mean and dispersion using a calculator. The following general procedures apply:

    1. Turn on the calculator. Put it in statistical mode.
    2. Enter all observation values following the model instructions.
    3. Determine the sample mean ($\bar{X}$).
    4. Determine the population standard deviation, $\sigma$, or the sample standard deviation, s.

    Alternative Methods to Determine Standard Deviation

    Standard deviation can be determined using probability paper. However, since the advent of computer programs, this is rarely done. Standard deviation can also be estimated from control charts using $\bar{R}$. This technique is discussed later in this Section and relates to the determination of process capability.

    The control chart method of estimating standard deviation makes the big assumption that the process being charted is in control, and many processes are not. Using a calculator or software program to determine s from individual data is often more accurate.


    Probability Density Function

    The probability density function, f(x), describes the behavior of a random variable. Typically, the probability density function is viewed as the "shape" of the distribution. It is normally a grouped frequency distribution. Consider the histogram for the length of a product shown in Figure 7.4.

    [Figure 7.4 Example Histogram and Figure 7.5 Histogram with Overlaid Model, shown side by side. Horizontal axis: Length.]

    A histogram is an approximation of the distribution's shape. The histogram shown in Figure 7.4 appears symmetrical. Figure 7.5 shows this histogram with a smooth curve overlaying the data. The smooth curve is the statistical model that describes the population; in this case, the normal distribution.

    When using statistics, the smooth curve represents the population. The differences between the sample data represented by the histogram and the population data represented by the smooth curve are assumed to be due to sampling error. In reality, the difference could also be caused by lack of randomness in the sample or an incorrect model.


    Cumulative Distribution Function

    The cumulative distribution function, F(x), denotes the area beneath the probability density function to the left of x. This is demonstrated in Figure 7.7.

    [Figure 7.7 Cumulative Distribution Function for Length: the upper panel shows the probability density function with the region to the left of x = 190 shaded; the lower panel shows the cumulative distribution function. Horizontal axis: Length.]

    The area of the shaded region of the probability density function in Figure 7.7 is 0.2525, which corresponds to the cumulative distribution function at x = 190. Mathematically, the cumulative distribution function is equal to the integral of the probability density function to the left of x.

    $$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

    Example 7.7: A random variable has the probability density function f(x) = 0.125x, where x is valid from 0 to 4. Find the probability of x being less than or equal to 2.

    Solution:

    $$F(2) = \int_{0}^{2} 0.125x\,dx = \left. \frac{0.125x^2}{2} \right|_{0}^{2} = \left. 0.0625x^2 \right|_{0}^{2} = 0.25$$
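    The integral in Example 7.7 can also be approximated numerically, which is how a cumulative distribution function is often evaluated when no closed form is available. The sketch below uses a simple midpoint rule; the function names and the step count are illustrative assumptions.

```python
def pdf(x: float) -> float:
    """Probability density from Example 7.7: f(x) = 0.125x on [0, 4], else 0."""
    return 0.125 * x if 0.0 <= x <= 4.0 else 0.0


def cdf(x: float, steps: int = 10_000) -> float:
    """Approximate F(x) by summing midpoint rectangles under pdf from 0 to x."""
    upper = min(max(x, 0.0), 4.0)  # the density is zero outside [0, 4]
    width = upper / steps
    return sum(pdf((i + 0.5) * width) for i in range(steps)) * width


print(round(cdf(2.0), 4))  # 0.25
print(round(cdf(4.0), 4))  # 1.0 (total area under a valid density)
```

    The second line is a useful sanity check: the area under any probability density function over its full range must equal 1.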


    Graphical Methods

    Graphical methods include boxplots, stem and leaf plots, scatter diagrams, run charts, histograms, and normal probability plots. Additional information on properties and applications of probability distributions is provided later in this Section.

    Boxplots

    One of the simplest and most useful ways of summarizing data is the boxplot. This technique is credited to John W. Tukey (1977)16. The boxplot is a five number summary of the data. The data median is a line dividing the box. The upper and lower quartiles of the data define the ends of the box. The minimum and maximum data points are drawn as points at the end of lines (whiskers) extending from the box. A simple boxplot is shown in Figure 7.8 below.

    Boxplots can be more complex. See Figure 7.9 below. They can be notched to indicate variability of the median. The notch widths are calculated so that if two median notches do not overlap, the medians are different at a 5% significance level. Boxplots can also have variable widths, proportional to the log of the sample size. Outliers can also be identified as points (asterisks) more than 1.5 times the interquartile distance from each quartile. Some computer programs can automatically generate boxplots for data analysis.

    [Figure 7.8 A Simple Boxplot and Figure 7.9 Complex Boxplots, shown side by side.]
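    The five number summary and the 1.5 times interquartile distance rule for outliers can be sketched in Python. The example reuses the Example 7.6 data purely for illustration. Quartile conventions differ between textbooks and software packages; the "inclusive" method of `statistics.quantiles` (Python 3.8 or later) is an assumption made here, not a convention stated in the primer, and the extra value of 250 is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def outliers(values):
    """Points more than 1.5 interquartile distances beyond either quartile."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


data = [162, 176, 160, 142, 125, 159, 145, 167,
        114, 120, 119, 180, 154, 125, 142]

print(five_number_summary(data))  # (114, 125.0, 145.0, 161.0, 180)
print(outliers(data))             # []
print(outliers(data + [250]))     # [250] (hypothetical extreme reading)
```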


    Stem and Leaf Plots

    The stem and leaf diagram (Tukey, 1977)16 is a convenient, manual method for plotting data sets. These diagrams are effective in displaying both variable and categorical data sets. The diagram consists of grouping the data by class intervals, as stems, and the smaller data increments as leaves. Stem and leaf plots permit data to be read directly, whereas histograms lose the individual data values as frequencies within class intervals.

    Example 7.8: Shear Strength, 50 observations:

    51.4  49.9  46.5  47.5  42.5  40.8  46.8  47.2  49.1  43.6
    43.1  48.0  46.8  43.1  44.6  52.2  49.6  48.8  47.3  45.8
    45.6  48.9  47.7  46.4  45.2  46.4  46.5  50.6  42.6  47.5
    44.9  50.4  43.2  44.4  40.2  43.8  44.1  48.9  42.4  45.4
    49.6  48.8  44.2  51.0  47.4  47.0  47.0  48.6  49.8  50.3

    Show the above data in a histogram format.

    [Histogram of the shear strength data. Horizontal axis: Strength, with class marks from 41 to 53; vertical axis: Frequency, from 0 to 14.]


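    Example 7.8 (continued): Show the same data in a stem and leaf diagram:

                Leaf | Stem
                 8 2 | 40
                     | 41
               5 6 4 | 42
           6 1 1 2 8 | 43
           6 9 4 1 2 | 44
             8 6 2 4 | 45
         5 8 8 4 4 5 | 46
     5 2 3 7 5 4 0 0 | 47
         0 8 9 9 8 6 | 48
           9 1 6 6 8 | 49
               6 4 3 | 50
                 4 0 | 51
                   2 | 52

    Each stem is the whole-number part of a reading and each leaf is the tenths digit (for example, leaf 8 on stem 40 represents 40.8).

    Figure 7.11 Shear Strength Stem and Leaf Plot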
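    A stem and leaf plot such as Figure 7.11 is simple to generate in software. The sketch below groups the Example 7.8 readings by whole-number stem and tenths-digit leaf, keeping the leaves in the order the data was recorded; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Example 7.8: shear strength, 50 observations
shear_strength = [
    51.4, 49.9, 46.5, 47.5, 42.5, 40.8, 46.8, 47.2, 49.1, 43.6,
    43.1, 48.0, 46.8, 43.1, 44.6, 52.2, 49.6, 48.8, 47.3, 45.8,
    45.6, 48.9, 47.7, 46.4, 45.2, 46.4, 46.5, 50.6, 42.6, 47.5,
    44.9, 50.4, 43.2, 44.4, 40.2, 43.8, 44.1, 48.9, 42.4, 45.4,
    49.6, 48.8, 44.2, 51.0, 47.4, 47.0, 47.0, 48.6, 49.8, 50.3,
]


def stem_and_leaf(values):
    """Map each whole-number stem to its list of tenths-digit leaves."""
    leaves = defaultdict(list)
    for value in values:
        # Work in integer tenths to avoid floating-point digit errors
        stem, leaf = divmod(round(value * 10), 10)
        leaves[stem].append(leaf)
    # Include empty stems so gaps in the data remain visible
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


plot = stem_and_leaf(shear_strength)
for stem, leaf_list in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaf_list)}")
```

    Unlike the histogram, every original reading can be recovered from the output by joining a stem with one of its leaves.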

    Scatter Diagrams

    A scatter diagram is a graphic display of many XY coordinate data points which represent the relationship between two different variables. It is also referred to as a correlation chart. For example, temperature changes cause contraction or expansion of many materials. Both time and temperature in a kiln will affect the retained moisture in wood. Examples of such relationships on the job are abundant.

    Knowledge of the nature of these relationships can often provide a clue to the solution of a problem. Scatter diagrams can help determine if a relationship exists and how to control the effect of the relationship on the process.

    In most cases, there is an independent variable and a dependent variable. Traditionally, the dependent variable is represented by the vertical axis and the independent variable is represented by the horizontal axis.

    The ability to meet specifications in many processes is dependent upon controlling two interacting variables and, therefore, it is important to be able to control the effect one variable has on another. For instance, if the amount of heat applied to plastic liners affects their durability, then control limits must be set to consistently apply the right amount of heat. Through the use of scatter diagrams, the proper temperature can be determined, ensuring a quality product.


    The dependent variable can be controlled if the relationship is understood. Correlation originates from the following:

      • A cause-effect relationship
      • A relationship between one cause and another cause
      • A relationship between one cause and two or more other causes

    Not all scatter diagrams reveal a linear relationship. The examples below definitely portray a relationship between the two variables, even though they do not necessarily produce a straight line. If a center line can be fitted to a scatter diagram, it will be possible to interpret it. To use scatter diagrams, one must be able to decide what factors will best control the process within the specifications.

    [Figure 7.12 Scatter Diagram Examples. Panels: Low-positive, High-positive, High-negative, No correlation, and two panels labeled Non-linear Relationship.]


    Sample Correlation Coefficient

    A sample correlation coefficient "r" can be calculated to determine the degree of association between two variables.

    $$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] \left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right]}}$$

    Interpret the Relationship Between Two Variables

    r = -1.0    strong negative    when X increases, Y decreases
    r = -0.5    slight negative    when X increases, Y generally decreases
    r = 0       no correlation     the two variables show no linear relationship
    r = +0.5    slight positive    when X increases, Y generally increases
    r = +1.0    strong positive    when X increases, Y increases
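    The correlation coefficient formula translates directly into code. The sketch below implements it term by term; the three small data sets are hypothetical and chosen only to produce the extreme and zero values of r.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]                   # hypothetical independent variable
r_up = correlation(x, [2, 4, 6, 8, 10])    # Y rises with X
r_down = correlation(x, [10, 8, 6, 4, 2])  # Y falls as X rises
r_curve = correlation(x, [4, 1, 0, 1, 4])  # symmetric curve, no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_curve, 3))  # 1.0 -1.0 0.0
```

    The third case shows why a scatter diagram should accompany the statistic: Y depends strongly on X along a curve, yet r is zero because the relationship is not linear.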

    Concluding Comments

      • A correlation analysis seeks to uncover relationships. Common sense must be liberally applied. There is such a thing as a nonsense correlation whereby two variables that are not related can show correlation. For example, every time the car is washed, it rains.

      • The line of "best fit" can be obtained by calculating a "regression line." However, to determine whether a relationship exists or not, the line can be "eyeballed." Simply draw a straight line through the points, attempting to have approximately one-half above and one-half below. Study the points for trends and confirm that the line drawn fits appropriately.

      • Scatter diagrams should always be analyzed prior to making decisions in correlation statistics.


    Run (Trend) Charts

    The average human brain is not good at comparing more than a few numbers at a time. Therefore, a large amount of data is often difficult to analyze unless it is presented in some easily digested format.

    Data can be presented in either summary (static) or time sequence (dynamic) fashion. Important elements of most processes can change over time. These changes can be presented graphically by use of control charts or by the use of run or trend charts. For many business activities, trend charts will show patterns that indicate if a process is running normally or whether desirable or undesirable changes are occurring.

    It should be noted that normal convention has time increasing across the page (from left to right) and the measurement value increasing up the page. Dependent upon the process measurement, values can be "good" if they go up or down the page or remain as close as possible to some target value. Consider the following examples:

    [Figure 7.13 Examples of Trends. Panels: Upward Trend, Downward Trend, Process Shift, Unusual Values, Cycles, Increasing Variability. Each panel plots a measurement value (0 to 100) against time (5 to 20).]


    Histograms

    Histograms have the following characteristics:

      • Frequency column graphs that display a static picture of process behavior. Histograms require a minimum of 50-100 data points.

      • A histogram is characterized by the number of data points that fall within a given bar or interval. This is commonly referred to as "frequency."

      • A stable process is frequently characterized by a histogram exhibiting unimodal or bell-shaped curves. A stable process is predictable.

      • An unstable process is often characterized by a histogram that does not exhibit a bell-shaped curve. Obviously other more exotic distribution shapes (like exponential, lognormal, gamma, beta, Weibull, Poisson, binomial, hypergeometric, geometric, etc.) exist as stable processes.

      • When the bell curve is the approximate distribution shape, variation around the bell curve is chance or natural variation. Other variation is due to special or assignable causes.

    Histogram Construction

    MEASUREMENT (INCHES)    FREQUENCY TALLY (COUNT)
           .50                      1
           .51                      3
           .52                      6
           .53                     10
           .54                     16
           .55                     21
           .56                     25
           .57                     27
           .58                     28
           .59                     26
           .60                     21
           .61                     15
           .62                     10
           .63                      7
           .64                      3
           .65                      1

    SPECIFICATION LIMITS: 0.50 - 0.60

    [Figure 7.14 Histogram Construction Example: the frequency tally shown beside the resulting histogram. Horizontal axis: Measurement (inches); vertical axis: Frequency, from 2 to 28.]


    Histogram Examples

    [Figure 7.15 Histogram Examples. Panels: Column Graph, Bar Graph, Normal Histogram, Histogram with Special Causes, Bimodal Histogram (May Also be Polymodal), Negatively Skewed, Truncated Histogram (After 100% Inspection) with LSL and USL marked.]


    Drawing Valid Statistical Conclusions *

    Analytical (Inferential) Studies

    The objective of statistical inference is to draw conclusions about population characteristics based on the information contained in a sample. Statistical inference in a practical situation contains two elements: (1) the inference and (2) a measure of its validity. The steps involved in statistical inference are:

      • Define the problem objective precisely
      • Decide if the problem will be evaluated by a one-tail or two-tail test
      • Formulate a null hypothesis and an alternate hypothesis
      • Select a test distribution and a critical value of the test statistic reflecting the degree of uncertainty that can be tolerated (the alpha, $\alpha$, risk)
      • Calculate a test statistic value from the sample information
      • Make an inference about the population by comparing the calculated value to the critical value. This step determines if the null hypothesis is to be rejected. If the null is rejected, the alternate must be accepted.
      • Communicate the findings to interested parties

    Every day, in our personal and professional lives, individuals are faced with decisions between choice A or choice B. In most situations, relevant information is available, but it may be presented in a form that is difficult to digest. Quite often, the data seems inconsistent or contradictory. In these situations, an intuitive decision may be little more than an outright guess. While most people feel their intuitive powers are quite good, the fact is that decisions made on gut-feeling are often wrong.

    * A substantial portion of the material throughout this Section is from the CQE Primer (2006)17. The student should note that portions of this subject are covered in more depth in Section VIII.


    Null Hypothesis and Alternate Hypothesis

    The null hypothesis is the hypothesis to be tested. The null hypothesis directly stems from the problem statement and is denoted as $H_0$.

    The alternate hypothesis must include all possibilities which are not included in the null hypothesis and is designated $H_1$.

    Examples of null and alternate hypothesis:

    Null hypothesis:
    Alternate hypothesis:

    A null hypothesis can only be rejected, or fail to be rejected; it cannot be accepted because of a lack of evidence to reject it.

    Test Statistic

    In order to test a null hypothesis, a test calculation must be made from sample information. This calculated value is called a test statistic and is compared to an appropriate critical value. A decision can then be made to reject or not reject the null hypothesis.

    Types of Errors

    When formulating a conclusion regarding a population based on observations from a small sample, two types of errors are possible:

      • Type I error: This error results when the null hypothesis is rejected when it is, in fact, true.

      • Type II error: This error results when the null hypothesis is not rejected when it should be rejected.

    The degree of risk ($\alpha$) is normally chosen by the concerned parties ($\alpha$ is normally taken as 5%) in arriving at the critical value of the test statistic.
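    The comparison of a test statistic with a critical value can be sketched for one common case. The example below assumes a two-tail z-test with a known population standard deviation, which is only one of the test distributions that could be selected; the hypothesized means (19.9 and 19.8 grams) are hypothetical, and the remaining values are borrowed from Example 7.1 for illustration.

```python
import math
from statistics import NormalDist


def two_tail_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?) for H0: mu = mu_0."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))  # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # alpha split over two tails
    return z, z_crit, abs(z) > z_crit


# Sample mean 20 g from n = 4 readings, sigma = 0.124 g, alpha risk of 5%
z, z_crit, reject = two_tail_z_test(20.0, 19.9, 0.124, 4)
print(round(z, 3), round(z_crit, 3), reject)   # 1.613 1.96 False

z2, _, reject2 = two_tail_z_test(20.0, 19.8, 0.124, 4)
print(round(z2, 3), reject2)                   # 3.226 True
```

    In the first case the null hypothesis fails to be rejected, which, as noted above, is not the same as accepting it. Choosing a smaller alpha raises the critical value and lowers the Type I risk at the cost of a higher Type II risk.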


    Enumerative (Descriptive) Studies

    Enumerative data is data that can be counted. For example: the classification of things, the classification of people into intervals of income, age, health. A census is an enumerative collection and study. Useful tools for tests of hypothesis conducted on enumerative data are the chi square, binomial, and Poisson distributions.

    Deming, in 1975, defined a contrast between enumeration and analysis:

    Enumerative study: A study in which action will be taken on the universe.

    Analytical study: A study in which action will be taken on a process to improve performance in the future.

    Numerical descriptive measures create a mental picture of a set of data. The measures calculated from a sample are called statistics. When these measures describe a population, they are called parameters.

    Measures              Statistics    Parameters
    Mean                  $\bar{X}$     $\mu$
    Standard Deviation    $s$           $\sigma$

    Table 7.16 Statistics and Parameters

    Table 7.16 shows examples of statistics and parameters for the mean and standard deviation. These two important measures are called central tendency and dispersion.

    Summary of Analytical and Enumerative Studies

    Analytical studies start with the hypothesis statement made about population parameters. A sample statistic is then used to test the hypothesis and either reject or fail to reject the null hypothesis. At a stated level of confidence, one is then able to make inferences about the population.


    Probability

    Probability is presented in the following topic areas:

      • Basic concepts
      • Commonly used distributions
      • Other distributions

    Most quality theories use statistics to make inferences about a population based on information contained in samples. The mechanism one uses to make these inferences is probability.

    Conditions for Probability

    The probability of any event, E, lies between 0 and 1. The sum of the probabilities of all possible events in a sample space, S, equals 1.

    Simple Events

    An event that cannot be decomposed is a simple event, E. The set of all sample points for an experiment is called the sample space, S. If an experiment is repeated a large number of times, N, and the event, E, is observed $n_E$ times, the probability of E is approximately:

    $$P_E \approx \frac{n_E}{N}$$

    Example 7.9: The probability of observing 3 on the toss of a single die is:

    $$P_{E_3} = \frac{1}{6}$$

    Example 7.10: What is the probability of getting 1, 2, 3, 4, 5, or 6 by throwing a die?

    $$P_{E_T} = P(E_1) + P(E_2) + P(E_3) + P(E_4) + P(E_5) + P(E_6)$$

    $$P_{E_T} = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1$$
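    The relative frequency definition of probability lends itself to a quick simulation. The sketch below computes the exact values from Examples 7.9 and 7.10 with fractions and then estimates $P_{E_3}$ from repeated simulated tosses; the seed and the number of tosses are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact values: each face of a fair die has probability 1/6
p_three = Fraction(1, 6)                             # Example 7.9
p_any_face = sum(Fraction(1, 6) for _ in range(6))   # Example 7.10

# Relative frequency estimate: toss the die N times and count the threes
rng = random.Random(2007)  # fixed seed so the run is repeatable
N = 60_000
n_E = sum(1 for _ in range(N) if rng.randint(1, 6) == 3)
estimate = n_E / N

print(p_three, p_any_face)  # 1/6 1
print(f"estimated P(3) = {estimate:.4f}, exact = {float(p_three):.4f}")
```

    The estimate approaches 1/6 as N grows, which is the sense in which the formula above is described as approximate.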


    Compound Events

    Compound events are formed by a composition of two or more events. They consist of more than one point in the sample space. For example, if two dice are tossed, what is the probability of getting an 8? A die and a coin are tossed. What is the probability of getting a 4 and a tail? The two most important probability theorems are the additive and multiplicative laws (covered later in this Section). For the following discussion, $E_A = A$ and $E_B = B$.

    I. Composition. Consists of two possibilities: a union or an intersection.

    A. Union of A and B
    If A and B are two events in a sample space, S, the union of A and B ($A \cup B$) contains all sample points in event A, B, or both.

    Example 7.11: In the die toss in Example 7.10, consider the following: If A = $E_1$, $E_2$, and $E_3$ (numbers less than 4) and B = $E_1$, $E_3$, and $E_5$ (odd numbers), then $A \cup B$ = $E_1$, $E_2$, $E_3$, and $E_5$.

    B. Intersection of A and B
    If A and B are two events in a sample space, S, the intersection of A and B ($A \cap B$) is composed of all sample points that are in both A and B.

    Example 7.12: Refer to Example 7.11. $A \cap B$ = $E_1$ and $E_3$.

    [Figure 7.17 Venn Diagrams Illustrating Union ($A \cup B$) and Intersection ($A \cap B$)]
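    Union and intersection map directly onto set operations. The sketch below restates Examples 7.11 and 7.12 with Python sets; the names S, A, and B follow the text, while the variable names `union` and `intersection` are illustrative.

```python
# Sample space for a single die toss.
S = {1, 2, 3, 4, 5, 6}

A = {n for n in S if n < 4}       # numbers less than 4: E1, E2, E3
B = {n for n in S if n % 2 == 1}  # odd numbers: E1, E3, E5

union = A | B         # sample points in A, B, or both
intersection = A & B  # sample points in both A and B

print(sorted(union), sorted(intersection))
```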


    Compound Events (Continued)

    II. Event Relationships. There are three relationships involved in finding the probability of an event: complementary, conditional, and mutually exclusive.

    A. Complement of an Event
    The complement of event A is all sample points in the sample space, S, that are not in A. The probability of the complement of A is $1 - P_A$.

    Example 7.13: If $P_A$ (cloudy days) is 0.3, the probability of the complement of A would be $1 - P_A = 0.7$ (clear).

    B. Conditional Probabilities
    The conditional probability of event A occurring, given that event B has occurred, is:

    $$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$$

    Example 7.14: If the probability of event A (rain) is 0.2 and the probability of event B (cloudiness) is 0.3, what is the probability of rain on a cloudy day? (Note, it will not rain without clouds, so $P(A \cap B) = P(A) = 0.2$.)

    $$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.3} = 0.67$$

    Two events A and B are said to be independent if either:

    $$P(A|B) = P(A) \quad \text{or} \quad P(B|A) = P(B)$$

    However, P(A|B) = 0.67 and P(A) = 0.2 (no equality), and P(B|A) = 1.00 and P(B) = 0.3 (no equality).

    Therefore, the events are said to be dependent.
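    The arithmetic of Example 7.14 can be checked in a few lines. This is a sketch only; the variable names are illustrative, and the step P(A and B) = P(A) relies on the example's stated assumption that it never rains without clouds.

```python
p_rain = 0.2    # P(A)
p_cloudy = 0.3  # P(B)

# Assumption from the example: rain never occurs without clouds,
# so every "rain" outcome is also a "cloudy" outcome.
p_rain_and_cloudy = p_rain  # P(A and B)

p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy  # P(A|B)
p_cloudy_given_rain = p_rain_and_cloudy / p_rain    # P(B|A)

# Independence requires P(A|B) = P(A) or P(B|A) = P(B).
independent = (
    abs(p_rain_given_cloudy - p_rain) < 1e-12
    or abs(p_cloudy_given_rain - p_cloudy) < 1e-12
)

print(round(p_rain_given_cloudy, 2), p_cloudy_given_rain, independent)
```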


    Compound Events (Continued)

    C. Mutually Exclusive Events
    If event A contains no sample points in common with event B, then they are said to be mutually exclusive.

    Example 7.15: Obtaining a 3 and obtaining a 2 on the toss of a single die are mutually exclusive events. The probability of observing both events simultaneously is zero. The probability of obtaining either a 3 or a 2 is:

    $$P_{E_2} + P_{E_3} = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$

    D. Testing for Event Relationships

    Example 7.16: Refer to Example 7.11.
    Event A: $E_1$, $E_2$, $E_3$
    Event B: $E_1$, $E_3$, $E_5$

    Are events A and B mutually exclusive, complementary, independent, or dependent? Events A and B contain two sample points in common, so they are not mutually exclusive. They are not complementary because event B does not contain all points in S that are not in event A. To determine if they are independent requires a check.

    Does P(A|B) = P(A)?

    $$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{1/2} = \frac{2}{3}, \qquad P(A) = \frac{1}{2}$$

    Therefore $P(A|B) \neq P(A)$. By definition, events A and B are dependent.
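    The checks in Example 7.16 can be sketched with exact fractions. The helper `prob` and the flag names are hypothetical, introduced only for this illustration; they assume equally likely sample points.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one die toss
A = {1, 2, 3}           # Event A: E1, E2, E3
B = {1, 3, 5}           # Event B: E1, E3, E5


def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(S))


p_a_given_b = prob(A & B) / prob(B)  # P(A|B) = P(A and B) / P(B)

mutually_exclusive = len(A & B) == 0
complementary = mutually_exclusive and (A | B) == S
independent = p_a_given_b == prob(A)

print(p_a_given_b, prob(A), mutually_exclusive, complementary, independent)
```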


    The Additive Law

    If the two events are not mutually exclusive:

    1. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    Note that $P(A \cup B)$ is shown in many texts as P(A + B) and is read as the probability of A or B.

    Example 7.17: If one owns two cars and the probability of each car starting on a cold morning is 0.7, what is the probability of getting to work? (The two cars are treated as independent, so $P(A \cap B) = 0.7 \times 0.7$.)

    $$P(A \cup B) = 0.7 + 0.7 - (0.7 \times 0.7) = 1.4 - 0.49 = 0.91 = 91\%$$

    If the two events are mutually exclusive, the law reduces to:

    2. $P(A \cup B) = P(A) + P(B)$, also written P(A + B) = P(A) + P(B)

    Example 7.18: If the probability of finding a black sock in a dark room is 0.4 and the probability of finding a blue sock is 0.3, what is the chance of finding a blue or black sock?

    $$P(A \cup B) = 0.4 + 0.3 = 0.7 = 70\%$$

    Note: The problem statements center around the word "or." Will car A or B start? Will one get a black or blue sock?
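    Both forms of the additive law fit in one small function, sketched below. The function name `p_union` and its default argument are assumptions for illustration; passing no joint probability corresponds to the mutually exclusive case.

```python
def p_union(p_a, p_b, p_both=0.0):
    """P(A or B) = P(A) + P(B) - P(A and B); p_both is 0 for mutually exclusive events."""
    return p_a + p_b - p_both


# Example 7.17: two cars, each starts with probability 0.7 (treated as independent).
p_work = p_union(0.7, 0.7, 0.7 * 0.7)

# Example 7.18: black sock (0.4) or blue sock (0.3), mutually exclusive.
p_sock = p_union(0.4, 0.3)

print(round(p_work, 2), round(p_sock, 2))
```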


    The Multiplicative Law

    If events A and B are dependent, the probability of event A influences the probability of event B. This is known as conditional probability and the sample space is reduced. For any two events, A and B, such that $P(B) \neq 0$:

    1. $P(A|B) = \frac{P(A \cap B)}{P(B)}$ and $P(A \cap B) = P(A|B)\,P(B)$

    Note in some texts $P(A \cap B)$ is shown as $P(A \cdot B)$ and is read as the probability of A and B. P(B|A) is read as the probability of B given that A has occurred.

    Example 7.19: If a shipment of 100 TV sets contains 30 defective units and two samples are obtained, what is the probability of finding both defective? (Event A is the first sample and the sample space is reduced, and event B is the second sample.)

    $$P(A \cap B) = \frac{30}{100} \times \frac{29}{99} = \frac{870}{9900} = 0.088 = 8.8\%$$

    If events A and B are independent:

    2. $P(A \cap B) = P(A) \times P(B)$

    Example 7.20: One relay in an electric circuit has a probability of working equal to 0.9. Another relay in series has a chance of 0.8. What's the probability that the circuit will work?

    $$P(A \cap B) = 0.9 \times 0.8 = 0.72 = 72\%$$

    Note: The problem statements center around the word "and." Will TV A and B work? Will relay A and B operate?
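    The two multiplicative cases can be sketched as follows. The variable names are illustrative; exact fractions are used for the TV example so the reduced sample space (99 sets, 29 defective) is visible in the arithmetic.

```python
from fractions import Fraction

# Example 7.19 (dependent events): sampling without replacement.
p_first_defective = Fraction(30, 100)    # P(first is defective)
p_second_given_first = Fraction(29, 99)  # P(second is defective | first was defective)
p_both_defective = p_first_defective * p_second_given_first

# Example 7.20 (independent events): two relays in series.
p_circuit = 0.9 * 0.8

print(p_both_defective, round(float(p_both_defective), 3), round(p_circuit, 2))
```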


    VII. MEASURE - STATISTICS
    PROBABILITY/COMMON DISTRIBUTIONS

    Commonly Used Distributions

    Commonly used distributions include the following:

    Normal
    Binomial
    Poisson
    Chi square
    Student's t
    F distribution

    Normal Distribution

    The normal distribution has numerous applications. It is useful when it is equally likely that readings will fall above or below the average.

    When a sample of several random measurements is averaged, the distribution of such repeated sample averages tends to be normally distributed regardless of the distribution of the measurements being averaged. Mathematically, if

    $$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

    the distribution of the $\bar{x}$ values becomes normal as n increases. If the set of samples being averaged have the same mean and variance, then the mean of the $\bar{x}$ values is equal to the mean ($\mu$) of the individual measurements, and the variance of the $\bar{x}$ values is:

    $$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

    where $\sigma^2$ is the variance of the individual variables being averaged.

    The tendency of sums and averages of independent observations, from populations with finite variances, to become normally distributed as the number of variables being summed or averaged becomes large is known as the central limit theorem. For distributions with little skewness, summing or averaging as few as 3 or 4 variables will result in a normal distribution. For highly skewed distributions, more than 30 variables may have to be summed or averaged to obtain a normal distribution. The normal probability density function is:

    $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$

    where $\mu$ is the mean and $\sigma$ is the standard deviation.
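    The behavior of sample averages described above can be explored with a small simulation. This is a sketch under assumed settings: the individual measurements are drawn from an exponential distribution with mean 1 and variance 1 (a skewed choice made only for illustration), and the seed, sample size, and replicate count are arbitrary.

```python
import random
import statistics

random.seed(11)  # assumed seed, only for repeatability

n = 30             # assumed number of measurements per average
replicates = 5000  # assumed number of sample averages

# Each entry is the average of n skewed (exponential, mean 1) measurements.
means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(means)   # should be near the individual mean, 1.0
var_of_means = statistics.pvariance(means)  # should be near sigma^2 / n = 1/30

print(round(mean_of_means, 3), round(var_of_means, 4), round(1 / n, 4))
```

    The simulated averages should center on the mean of the individuals, with a variance close to the individual variance divided by n.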


    Normal Distribution (Continued)

    The normal probability density function is not skewed, as shown in Figure 7.18.

    [Figure 7.18: normal probability density function]


    Normal Distribution (Continued)

    Example 7.21: A battery is produced with an average voltage of 60 and a standard deviation of 4 volts. If 9 batteries are selected at random, what is the probability that the total voltage of the 9 batteries is greater than 530? What is the probability that the average voltage of the 9 batteries is less than 62?

    Solution Part A: The expected total voltage for nine batteries is 540. The expected standard deviation of the voltage of the total of nine batteries is:

    $$s_{TOTAL}^2 = 9 \times (4)^2 = 144, \qquad s_{TOTAL} = 12$$

    Transforming to standard normal:

    $$z = \frac{530 - 540}{12} = -0.833$$

    From the standard normal table, the area to the right of z is 0.7976.

    Solution Part B: The expected value is 60. The standard deviation is:

    $$s_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{9}} = 1.333$$

    Thus:

    $$z = \frac{62 - 60}{1.333} = 1.5$$

    The area to the left of z is 1 - 0.0668 = 0.9332.

    The probability density function of the voltage of the individual batteries and of the average of nine batteries is shown in Figure 7.19. The distribution of the averages has less variance because the standard deviation of the averages is equal to the standard deviation of the individuals divided by the square root of the sample size.

    [Figure 7.19: probability density versus voltage (50 to 70) for individuals and for the average of 9]
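    Example 7.21 can be reproduced without a z table using `statistics.NormalDist` from the Python standard library. The sketch below follows the example's numbers; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60.0, 4.0, 9  # battery mean, standard deviation, sample size

# Part A: the total of 9 batteries has mean 540 and standard deviation 12.
total = NormalDist(mu * n, sigma * sqrt(n))
p_total_gt_530 = 1 - total.cdf(530)

# Part B: the average of 9 batteries has mean 60 and standard deviation 4/3.
average = NormalDist(mu, sigma / sqrt(n))
p_avg_lt_62 = average.cdf(62)

print(round(p_total_gt_530, 4), round(p_avg_lt_62, 4))
```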


    Binomial Distribution

    The binomial distribution is one of several distributions used to model discrete data. Some situations call for discrete data, such as the number of missiles required to destroy a target or the number of defectives in a lot of 1,000 items.

    The binomial distribution applies when the population is large (N > 50) and the sample size is small compared to the population. The binomial is best applied when the sample size is less than 10% of N (n < 0.1N). Binomial sampling is with replacement. It is most appropriate to use when the proportion defective is equal to or greater than 0.1.

    The binomial is an approximation to the hypergeometric. The normal distribution approximates the binomial when $np \geq 5$. The Poisson distribution can be used to approximate the binomial distribution when p is small (generally, less than 0.1) and n is large (generally, $n \geq 16$) by using np as the mean of the Poisson distribution.

    $$P(r) = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$$

    Where: n = sample size
    r = occurrences or number of defectives
    p = probability or proportion defective
    q = 1 - p

    There is a limited binomial probability table in the Appendix. The binomial distribution, using different p values, is shown in Figure 7.20.

    [Figure 7.20 Binomial Distribution Example: P(r) versus r]
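    The binomial formula is straightforward to evaluate with `math.comb`. In the sketch below the function name `binomial_pmf` and the values n = 10 and p = 0.1 are assumptions for illustration, not taken from the Primer. It also checks numerically the standard results that the binomial mean is np and the standard deviation is the square root of np(1 - p).

```python
from math import comb, sqrt


def binomial_pmf(r, n, p):
    """Probability of exactly r occurrences in n trials with probability p per trial."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 10, 0.1  # illustrative values only
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and standard deviation computed directly from the probabilities.
mean = sum(r * pr for r, pr in enumerate(pmf))
sigma = sqrt(sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf)))

print(round(pmf[0], 4), round(mean, 4), round(sigma, 4))
```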


    Binomial Distribution (Continued)

    The binomial distribution is used to model situations having only 2 possible outcomes, usually labeled as success or failure. For a random variable to follow a binomial distribution, the number of trials must be fixed, and the probability of success must be equal for all trials. The binomial probability density function is:

    $$P(x, n, p) = \binom{n}{x} p^x (1 - p)^{n-x}$$

    where P(x, n, p) is the probability of exactly x successes in n trials with a probability of success equal to p on each trial. Note that:

    $$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

    This notation is referred to as "n choose x," and is equal to the number of combinations of size x made from n possibilities. This function is found on most calculators.

    The binomial distribution mean and standard deviation, sigma, can be obtained from the following calculations when the event of interest is the count of defined occurrences in the population, e.g., the number of defectives or effectives.

    The binomial mean = $\mu = np$
    The binomial sigma = $\sigma = \sqrt{np(1-p)}$

    Poisson Distribution

    The Poisson distribution is one of several distributions used to model discrete data and has numerous applications in industry. The Poisson distribution can be an approximation to the binomial when p is equal to or less than 0.1, and the sample size n is fairly large.

    $$P(r) = \frac{\mu^r e^{-\mu}}{r!}$$

    Where: $\mu$ = np = the population mean
    r = number of defectives
    e = 2.71828, the base of natural logarithms
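    How well the Poisson approximates the binomial can be checked numerically. The sketch below uses assumed values n = 100 and p = 0.02 (small p, fairly large n); the function names are hypothetical helpers for this illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """Exact binomial probability of r occurrences."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r, mu):
    """Poisson probability of r occurrences with mean mu."""
    return mu**r * exp(-mu) / factorial(r)


n, p = 100, 0.02  # illustrative values only
mu = n * p        # Poisson mean taken as np

# Largest difference between the two distributions over r = 0..10.
max_diff = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, mu)) for r in range(11))

print(round(binomial_pmf(2, n, p), 4), round(poisson_pmf(2, mu), 4), round(max_diff, 4))
```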


    Poisson Distribution (Continued)

    The Poisson distribution using different p values is shown in Figure 7.21.

    $$P(r) = \frac{(np)^r e^{-np}}{r!}$$

    where n = sample size, r = number of occurrences, p = probability, and np = $\mu$ = average

    [Figure 7.21 Poisson Distribution Example: P(r) versus r]

    The Poisson is used as a distribution for defect counts and can be used as an approximation to the binomial. For np


    Chi Square Distribution

    The chi square, t, and F distributions are formed from combinations of random variables. Because of this, they are generally not used to model physical phenomena, like time to fail, but are used to make decisions and construct confidence intervals. These three distributions are considered sampling distributions. The student should be advised that there are numerous applications of these distributions in Section VIII.

    The chi square distribution is formed by summing the squares of standard normal random variables. For example, if z is a standard normal random variable, then:

    $$Y = z_1^2 + z_2^2 + z_3^2 + \cdots + z_n^2$$

    is a chi square random variable (statistic) with n degrees of freedom. A chi square statistic is also created by summing two or more independent chi square statistics; its degrees of freedom equal the sum of the individual degrees of freedom. A distribution having this property is regenerative. The chi square distribution is a special case of the gamma distribution with a scale parameter of 2 and a shape parameter equal to half the number of degrees of freedom of the corresponding chi square distribution. The chi square probability density function is:

    $$f(x) = \frac{x^{(\nu/2 - 1)} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, \quad x > 0$$

    where $\nu$ is the degrees of freedom, and $\Gamma(x)$ is the gamma function. The chi square probability density function is shown in Figure 7.22.

    [Figure 7.22 Chi square Probability Density Function: probability density versus X (0 to 20)]


    Chi Square Distribution (Continued)

    The critical values of the chi square distribution are given in the Appendix.

    Example 7.22: A chi square random variable has 7 degrees of freedom. What is the critical value if 5% of the area under the chi square probability density is desired in the right tail?

    Solution: When hypothesis testing, this is commonly referred to as the critical value with 5% significance, or $\alpha$ = 0.05. From the chi square table in the Appendix, this value is 14.067.
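    The table lookup in Example 7.22 can be approximated numerically with the standard library alone. This is a sketch, not the Primer's method: it assumes a series expansion of the regularized lower incomplete gamma function for the chi square cumulative distribution and a simple bisection search. The function names and the search bounds are hypothetical.

```python
from math import exp, gamma


def chi2_cdf(x, dof):
    """Chi square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, half_x = dof / 2, x / 2
    term = half_x**a * exp(-half_x) / gamma(a + 1)  # first term of the series
    total, k = term, 0
    while term > 1e-16 * total:
        k += 1
        term *= half_x / (a + k)  # ratio of consecutive series terms
        total += term
    return total


def chi2_critical(alpha, dof, lo=0.0, hi=200.0):
    """Value with area alpha in the right tail, found by bisection on the CDF."""
    target = 1 - alpha
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical(0.05, 7)
print(round(critical, 3))
```

    The result should agree with the tabulated critical value to about three decimal places.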

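    F Distribution

    If X is a chi square random variable with $\nu_1$ degrees of freedom, and Y is a chi square random variable with $\nu_2$ degrees of freedom, and if X and Y are independent, then:

    $$F = \frac{X/\nu_1}{Y/\nu_2}$$

    is an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. The F distribution is used extensively to test for equality of variances from two normal populations.

    The F probability density function is:

    $$f(x) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1}}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)\left(1 + \frac{\nu_1 x}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}, \quad x > 0$$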


    F Distribution (Continued)

    The F probability density function is shown in Figure 7.23.

    [Figure 7.23 The F Probability Density Function: probability density versus X (0 to 3) for several combinations of $\nu_1$ and $\nu_2$]

    The F cumulative distribution function is given in Tables VII and VIII in the Appendix. Both the lower and upper tails are listed, but most texts only give one tail, and require the other tail to be computed using the expression:

    $$F_{\alpha, n_1, n_2} = \frac{1}{F_{1-\alpha, n_2, n_1}}$$

    Example 7.23: Given that $F_{0.05}$ with $\nu_1$ = 8 and $\nu_2$ = 10 is 3.07, find the value of $F_{0.95}$ with $\nu_1$ = 10 and $\nu_2$ = 8.

    Answer:

    $$F_{0.95, 10, 8} = \frac{1}{F_{0.05, 8, 10}} = \frac{1}{3.07} = 0.326$$


    Student's t Distribution

    The Student's t distribution is formed by combining a standard normal random variable and a chi square random variable. If z is a standard normal random variable, and $\chi^2$ is a chi square random variable with $\nu$ degrees of freedom, then a random variable with a t distribution is:

    $$t = \frac{z}{\sqrt{\chi^2/\nu}}$$

    Like the normal distribution, when random variables are averaged, the distribution of the average tends to be normal, regardless of the distribution of the individuals. The square of a t random variable with $\nu$ degrees of freedom follows the F distribution with 1 and $\nu$ degrees of freedom.

    The t distribution is commonly used for hypothesis testing and constructing confidence intervals for means. It is used in place of the normal distribution when the standard deviation is unknown. The t distribution compensates for the error in the estimated standard deviation. If the sample size is large, n > 100, the error in the estimated standard deviation is small, and the t distribution is approximately normal. The t probability density function is:

    $$f(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < x < \infty$$

    where $\nu$ is the degrees of freedom. The t probability density function is shown in Figure 7.24.

    [Figure 7.24 Student's t Probability Density Function: probability density versus x (-4 to 4)]


    Student's t Distribution (Continued)

    The mean and variance of the t distribution are:

    $$\mu = 0, \qquad \sigma^2 = \frac{\nu}{\nu - 2}, \quad \nu \geq 3$$

    From a random sample of n items, the probability that:

    $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

    falls between any two specified values is equal to the area under the t probability density function between the corresponding values on the x-axis with n - 1 degrees of freedom.

    Example 7.24: The burst strength of 15 randomly selected seals is given below. What is the probability that the mean burst strength of the population is greater than 500?

    480 489 491 508 501
    500 486 499 479 496
    499 504 501 496 498

    Solution: The mean of these 15 data points is 495.13. The sample standard deviation of these 15 data points is 8.467. The probability that the population mean is greater than 500 is equal to the area under the t probability density function, with 14 degrees of freedom, to the left of:

    $$t = \frac{495.13 - 500}{8.467/\sqrt{15}} = -2.227$$

    From the t table in the Appendix, the area under the t probability density function, with 14 degrees of freedom, to the left of -2.227 is 0.0214. This value must be interpolated (2.227 falls between the 0.025 value of 2.145 and the 0.010 value of 2.624) but can be computed directly using electronic spreadsheets or calculators. Simply stated, making an inference from the sample of 15 data points, there is a 2.14% possibility that the true population mean is greater than 500.
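    The numbers in Example 7.24 can be recomputed directly. The sketch below is an illustration only: it evaluates the tail area by integrating the t probability density function with Simpson's rule, which is an assumed numerical shortcut rather than the table lookup the example uses, and the helper names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

burst = [480, 489, 491, 508, 501, 500, 486, 499, 479, 496, 499, 504, 501, 496, 498]
n = len(burst)
dof = n - 1            # degrees of freedom
x_bar = mean(burst)    # sample mean
s = stdev(burst)       # sample standard deviation (n - 1 divisor)
t_stat = (x_bar - 500) / (s / sqrt(n))


def t_pdf(x, v):
    """Student's t probability density with v degrees of freedom."""
    return gamma((v + 1) / 2) / (sqrt(pi * v) * gamma(v / 2)) * (1 + x * x / v) ** (-(v + 1) / 2)


def t_area_left_of_negative(t, v, steps=2000):
    """Area to the left of a negative t: 0.5 minus the Simpson integral from 0 to |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, v) + t_pdf(upper, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3


p_left = t_area_left_of_negative(t_stat, dof)
print(round(x_bar, 2), round(s, 3), round(t_stat, 3), round(p_left, 3))
```

    The computed area should land close to the interpolated table value of roughly 0.021.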


    VII. MEASURE - STATISTICS
    PROBABILITY/OTHER DISTRIBUTIONS

    Other Distributions

    Other less commonly used distributions include the following:

    Hypergeometric
    Bivariate
    Exponential
    Lognormal
    Weibull

    Hypergeometric Distribution

    The hypergeometric distribution is used to model discrete data. The hypergeometric distribution applies when the population size, N, is small compared to the sample size, or stated another way, when the sample, n, is a relatively large proportion of the population (n > 0.1N). Sampling is done without replacement. The hypergeometric distribution is a complex combination calculation and is used when the defined occurrences are known or can be calculated.

    The number of successes, r, in the sample follows the hypergeometric function:

    $$P(r) = \frac{C_r^{d}\; C_{n-r}^{N-d}}{C_n^{N}}$$

    Where:
    N = population size
    n = sample size
    d = number of occurrences in the population
    N - d = number of non-occurrences in the population
    r = number of occurrences in the sample

    The term x is used instead of r in many texts.

    The hypergeometric distribution is similar to the binomial distribution. Both are used to model the number of successes given a fixed number of trials and two possible outcomes on each trial. The difference is that the binomial distribution requires the probability of success to be the same for all trials, while the hypergeometric distribution does not.


    Hypergeometric Distribution (Continued)

    Example 7.25: From a group of 20 products, 10 are selected at random for testing. What is the probability that the 10 selected contain the 5 best units?

    N = 20, n = 10, d = 5, (N - d) = 15, and r = 5

    (note that $C_r^n = \frac{n!}{r!\,(n-r)!}$)

    $$P(r) = \frac{\left(\frac{5!}{5!\,0!}\right)\left(\frac{15!}{5!\,10!}\right)}{\frac{20!}{10!\,10!}} = \left(\frac{15!}{5!\,10!}\right)\left(\frac{10!\,10!}{20!}\right) = 0.0163 = 1.63\%$$

    The hypergeometric distribution using different r values is shown in Figure 7.25.

    $$P(r) = \frac{\binom{d}{r}\binom{N-d}{n-r}}{\binom{N}{n}}$$

    where n = sample size, r = number of occurrences, d = occurrences in population, and N = population size

    [Figure 7.25 Hypergeometric Distribution Example: P(r) versus r]

    The mean and the variance of the hypergeometric distribution are:

    $$\mu = \frac{nd}{N}, \qquad \sigma^2 = \left(\frac{nd}{N}\right)\left(1 - \frac{d}{N}\right)\left(\frac{N-n}{N-1}\right)$$
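    Example 7.25 and the hypergeometric mean and variance can be checked with `math.comb`. In this sketch the function name `hypergeom_pmf` is hypothetical, and the symbols N, n, d, and r follow the definitions in the text.

```python
from math import comb


def hypergeom_pmf(r, N, n, d):
    """Probability of r occurrences in a sample of n drawn without replacement
    from a population of N that contains d occurrences."""
    return comb(d, r) * comb(N - d, n - r) / comb(N, n)


N, n, d = 20, 10, 5
p_all_five = hypergeom_pmf(5, N, n, d)  # all 5 best units in the sample of 10

# Full distribution over r = 0..5, then mean and variance from the probabilities.
pmf = [hypergeom_pmf(r, N, n, d) for r in range(min(n, d) + 1)]
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print(round(p_all_five, 4), round(mean, 4), round(variance, 4))
```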


    Choosing the Correct Discrete Distribution

    To determine the correct discrete distribution, ask the following questions.

    1. Is a rate being modeled, such as defects per car, and is there no upper bound on the number of possible occurrences? If the answer is yes, the Poisson distribution is probably the appropriate distribution. If the answer is no, go to question 2.
    2. Is there a fixed number of trials? If yes, go to question 3. If no, is there a fixed number of successes with the number of trials being the random variable? If the answer is yes, use either the geometric or negative binomial distributions.
    3. Is the probability of success the same on all trials? If yes, use the binomial distribution; if no, use the hypergeometric distribution.

    These questions are summarized in the flow chart shown in Figure 7.26.

    [Figure 7.26 Discrete Distribution Flow Chart]

    The geometric and negative binomial distributions identified in Figure 7.26 are not summarized in this Primer.
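    The three questions can be written as a small decision function. This is a sketch of the questions as stated, not a reproduction of Figure 7.26; the function name and argument names are hypothetical, and the case the text does not address (no fixed trials and no fixed successes) returns None.

```python
def choose_discrete_distribution(unbounded_rate, fixed_trials=False,
                                 fixed_successes=False, constant_p=False):
    """Walk through questions 1 to 3 and return the suggested distribution name."""
    # Question 1: a rate with no upper bound on the number of occurrences?
    if unbounded_rate:
        return "Poisson"
    # Question 2: a fixed number of trials?
    if not fixed_trials:
        if fixed_successes:
            return "geometric or negative binomial"
        return None  # the text gives no recommendation for this case
    # Question 3: same probability of success on every trial?
    return "binomial" if constant_p else "hypergeometric"


print(choose_discrete_distribution(True))
print(choose_discrete_distribution(False, fixed_trials=True, constant_p=False))
```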


    Bivariate Normal Distribution

    The joint distribution of two variables is called a bivariate distribution. Bivariate distributions may be discrete or continuous. There may be total independence of the two independent variables, or there may be a covariance between them.

    The graphical representation of a bivariate distribution is a three dimensional plot, with the x and y-axis representing the independent variables and the z-axis representing the frequency for discrete data or the probability for continuous data.

    A special case of the bivariate distribution is the bivariate normal distribution, in which there are two random variables. For this case, the bivariate normal density is given by Freund (1962) as:

    $$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2}{2(1-\rho^2)}\right\}$$

    Where: $\mu_1$ and $\mu_2$ are the two means
    $\sigma_1$ and $\sigma_2$ are the two standard deviations and are each > 0
    $\rho$ is the correlation coefficient of the two random variables

    The bivariate normal distribution surface is shown in Figure 7.27. Note that the maximum occurs at $x_1 = \mu_1$ and $x_2 = \mu_2$.

    [Figure 7.27 Bivariate Normal Distribution Surface (Freund, 1962)]

    Additional information on bivariate distributions may be found in Duncan (1986).


    Exponential Distribution

    The exponential distribution applies to the useful life cycle of many products. The exponential distribution is used to model items with a constant failure rate.

    The exponential distribution is closely related to the Poisson distribution. If the number of occurrences per interval follows a Poisson distribution with rate $\lambda$, then the time between successive occurrences is exponentially distributed with mean $1/\lambda$, and vice versa. Because of this behavior, the exponential distribution is usually used to model the mean time between occurrences, such as arrivals or failures, and the Poisson distribution is used to model occurrences per interval, such as arrivals, failures, or defects. The exponential probability density function is:

    $$f(x) = \frac{1}{\theta} e^{-x/\theta} = \lambda e^{-\lambda x}, \quad x \geq 0$$

    Where: $\lambda$ is the failure rate and $\theta$ is the mean

    From the equation above, it can be seen that $\lambda = 1/\theta$. The exponential probability density function is shown in Figure 7.28.

    [Figure 7.28 Exponential Probability Density Function: probability density versus x]

    The variance of the exponential distribution is equal to the mean squared, hence:

    $$\sigma = \theta = \frac{1}{\lambda}$$

    The exponential distribution is characterized by its hazard function, which is constant. Because of this, the exponential distribution exhibits a lack of memory. That is, the probability of survival for a time interval, given survival to the beginning of the interval, is dependent only on the length of the interval.
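    The lack-of-memory property can be verified numerically from the exponential survival function, exp(-rate * t). In the sketch below the failure rate, interval length, and ages are assumed values chosen only for illustration, and the function name is hypothetical.

```python
from math import exp, isclose


def survival(t, failure_rate):
    """Probability that an exponentially distributed life exceeds t."""
    return exp(-failure_rate * t)


failure_rate = 0.01  # assumed failure rate (mean life of 100)
interval = 50.0      # assumed length of the interval of interest

# Probability of surviving the interval, given survival to various ages.
conditional = [
    survival(age + interval, failure_rate) / survival(age, failure_rate)
    for age in (0.0, 100.0, 500.0)
]

print([round(c, 4) for c in conditional])
```

    Every conditional probability equals the unconditional probability of surviving an interval of that length, whatever the age at the start.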


    Lognormal Distribution

    If a data set is known to follow a lognormal distribution, transforming the data by taking a logarithm yields a data set that is approximately normally distributed. This is shown in Table 7.29.

    Original Data    Normalized Data
    12               ln(12) = 2.48
    28               ln(28) = 3.33
    87               ln(87) = 4.47
    143              ln(143) = 4.96

    Table 7.29 Transformation of Lognormal Data

    The most common transformation is made by taking the natural logarithm, but any base logarithm also yields an approximate normal distribution. The remaining discussion will use the natural logarithm denoted as "ln".

    When random variables are summed, as the sample size increases, the distribution of the sum becomes a normal distribution, regardless of the distribution of the individuals. Since lognormal random variables are transformed to normal random variables by taking the logarithm, when random variables are multiplied, as the sample size increases, the distribution of the product becomes a lognormal distribution regardless of the distribution of the individuals. This is because the logarithm of the product of several variables is equal to the sum of the logarithms of the individuals. This is shown below:

    $$y = x_1 x_2 x_3$$

    $$\ln y = \ln x_1 + \ln x_2 + \ln x_3$$

    The standard lognormal probability density function is:

    $$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}, \quad x > 0$$

    Where: $\mu$ is the location parameter or mean of the natural logarithms of the individual values
    $\sigma$ is the scale parameter or standard deviation of the natural logarithms of the individual values. Some references show $\sigma$ as the shape parameter.
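    The transformation in Table 7.29 and the product rule for logarithms can be reproduced in a few lines. This is a sketch; the variable names are illustrative, and the three values multiplied together are simply taken from the table.

```python
from math import isclose, log

original = [12, 28, 87, 143]  # original data from Table 7.29
normalized = [round(log(x), 2) for x in original]  # natural logarithms

# The log of a product equals the sum of the logs of the factors.
x1, x2, x3 = 12, 28, 87
log_of_product = log(x1 * x2 * x3)
sum_of_logs = log(x1) + log(x2) + log(x3)

print(normalized, round(log_of_product, 4), round(sum_of_logs, 4))
```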


    Lognormal Distribution (Continued)

    The location parameter is the mean of the data set after transformation by taking the logarithm, and the scale (or shape) parameter is the standard deviation of the data set after transformation.

    The lognormal distribution takes on several shapes depending on the value of the shape parameter. The lognormal distribution is skewed right, and the skewness increases as the value of $\sigma$ increases. This is shown in Figure 7.30.

    [Figure 7.30 Lognormal Probability Density Function: probability density versus x]

    The mean of the lognormal distribution can be computed from its parameters:

    $$\text{mean} = e^{(\mu + \sigma^2/2)}$$

    The variance of the lognormal distribution is:

    $$\text{variance} = e^{(2\mu + \sigma^2)}\left(e^{\sigma^2} - 1\right)$$

    where $\mu$ and $\sigma^2$ are the mean and variance of the natural log values.

    (Dovich, 2002)


    Weibull Distribution

    The Weibull distribution is one of the most widely used distributions in reliability and statistical applications. It is commonly used to model time to fail, time to repair, and material strength. There are two common versions of the Weibull distribution, the two parameter Weibull and the three parameter Weibull. The difference is the three parameter Weibull distribution has a location parameter when there is some non-zero time to first failure.

    The three parameter Weibull probability density function is:

    $$f(x) = \frac{\beta}{\theta}\left(\frac{x-\delta}{\theta}\right)^{\beta-1} \exp\left[-\left(\frac{x-\delta}{\theta}\right)^{\beta}\right], \quad \text{for } x \geq \delta$$

    Where: $\beta$ is the shape parameter
    $\theta$ is the scale parameter
    $\delta$ is the location parameter

    The three parameter Weibull distribution can also be expressed as:

    $$f(x) = \frac{\beta}{\eta}\left(\frac{x-\gamma}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{x-\gamma}{\eta}\right)^{\beta}\right], \quad \text{for } x \geq \gamma$$

    Where: $\beta$ is the shape parameter
    $\eta$ is the scale parameter (determines the width of the distribution)
    $\gamma$ is the non-zero location parameter (the point below which there are no failures)

    Note: The Weibull discussion on the following pages will use $\theta$ for the scale parameter and $\delta$ for the location parameter.

    The shape parameter is what gives the Weibull distribution its flexibility. By changing the value of the shape parameter, the Weibull distribution can model a wide variety of data. If $\beta$ = 1, the Weibull distribution is identical to the exponential distribution; if $\beta$ = 2, the Weibull distribution is identical to the Rayleigh distribution; if $\beta$ is between 3 and 4, the Weibull distribution approximates the normal distribution.

    The Weibull distribution approximates the lognormal distribution for several values of $\beta$. For most populations, more than fifty samples are required to differentiate between the Weibull and lognormal distributions. The effect of the shape parameter on the Weibull distribution is shown in Figure 7.31.


    Weibull Distribution (Continued)

    [Figure 7.31 Effect of the Weibull Shape Parameter, $\beta$ (with $\theta$ = 100 and $\delta$ = 0): probability density versus X (0 to 200)]

    The scale parameter determines the range of the distribution. The scale parameter is also known as the characteristic life if the location parameter is equal to zero. If $\delta$ does not equal zero, the characteristic life is equal to $\theta + \delta$; 63.2% of all values fall below the characteristic life regardless of the value of the shape parameter.

    [Figure 7.32 Effect of the Weibull Scale Parameter: probability density versus X (0 to 200)]
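    The 63.2% statement follows from the Weibull cumulative distribution, 1 - exp(-((x - location) / scale) ** shape), evaluated at the characteristic life. The sketch below assumes that standard form of the cumulative distribution; the function name and the scale, location, and shape values are chosen only for illustration.

```python
from math import exp, isclose


def weibull_cdf(x, shape, scale, location=0.0):
    """Fraction of values at or below x for a three parameter Weibull distribution."""
    if x <= location:
        return 0.0  # no failures occur before the location parameter
    return 1 - exp(-(((x - location) / scale) ** shape))


scale, location = 100.0, 20.0            # illustrative parameter values
characteristic_life = scale + location   # scale plus location

# Fraction below the characteristic life for several shape parameters.
fractions = [
    weibull_cdf(characteristic_life, shape, scale, location)
    for shape in (0.5, 1.0, 2.5, 3.5)
]

print([round(f, 3) for f in fractions])
```

    At the characteristic life the exponent is always -1, so the fraction is 1 - 1/e for every shape value.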


    Weibull Distribution (Continued)

    The location parameter is used to define a failure-free zone. The probability of failure when x is less than $\delta$ is zero. When $\delta$ > 0, there is a period when no failures can occur. When $\delta$


    VII. MEASURE - STATISTICS
    PROCESS CAPABILITY/CAPABILITY STUDIES

    Process Capability

    Process Capability is presented in the following topic areas:

    Capability studies
    Capability indices
    Performance indices
    Short-term vs. long-term
    Non-normal data
    Attributes data
    Performance metrics

    The above topics are presented in a slightly different order than the ASQ BOK.

    Process Capability Studies

    The determination of process capability requires a predictable pattern of statistically stable behavior (most frequently a bell-shaped curve) where the chance causes of variation are compared to the engineering specifications. A capable process is a process whose spread on the bell-shaped curve is narrower than the tolerance range or specification limits. USL is the upper specification limit and LSL is the lower specification limit.

    Figure 7.34 A Comparison of Process Spread to Tolerance Range

    It is often necessary to compare the process variation with the engineering or specification tolerances to judge the suitability of the process. Process capability analysis addresses this issue. A process capability study includes three steps:

    - Planning for data collection
    - Collecting data
    - Plotting and analyzing the results


    The objective of process quality control is to establish a state of control over the manufacturing process and then maintain that state of control through time. Actions that change or adjust the process are frequently the result of some form of capability study. When the natural process limits are compared with the specification range, any of the following possible courses of action may result:

    - Do nothing. If the process limits fall well within the specification limits, no action may be required.

    - Change the specifications. The specification limits may be unrealistic. In some cases, specifications may be set tighter than necessary. Discuss the situation with the final customer to see if the specifications may be relaxed or modified.

    - Center the process. When the process spread is approximately the same as the specification spread, an adjustment to the centering of the process may bring the bulk of the product within specifications.

    - Reduce variability. This is often the most difficult option to achieve. It may be possible to partition the variation (stream-to-stream, within piece, batch-to-batch, etc.) and work on the largest offender first. For a complicated process, an experimental design may be used to identify the leading source of variation.

    - Accept the losses. In some cases, management must be content with a high loss rate (at least temporarily). Some centering and reduction in variation may be possible, but the principal emphasis is on handling the scrap and rework efficiently.

    Other capability applications:

    - Providing a basis for setting up a variables control chart
    - Evaluating new equipment
    - Reviewing tolerances based on the inherent variability of a process
    - Assigning more capable equipment to tougher jobs
    - Performing routine process performance audits
    - Determining the effects of adjustments during processing

    Modified from Juran (1999)8


    Identifying Characteristics

    The identification of characteristics to be measured in a process capability study should meet the following requirements:

    - The characteristic should be indicative of a key factor in the quality of the product or process
    - It should be possible to adjust the value of the characteristic
    - The operating conditions that affect the measured characteristic should be defined and controlled

    If a part has fourteen different dimensions, process capability would not normally be performed for all of these dimensions. Selecting one, or possibly two, key dimensions provides a more manageable method of evaluating the process capability. For example, in the case of a machined part, the overall length or the diameter of a hole might be the critical dimension. The characteristic selected may also be determined by the history of the part and the parameter that has been the most difficult to control or has created problems in the next higher level of assembly.

    Customer purchase order requirements or industry standards may also determine the characteristics that are required to be measured. In the automotive industry, the Production Part Approval Process (PPAP) (AIAG, 1995)1 states "An acceptable level of preliminary process capability must be determined prior to submission for all characteristics designated by the customer or supplier as safety, key, critical, or significant, that can be evaluated using variables (measured) data." Chrysler, Ford and General Motors use symbols to designate safety and/or government regulated characteristics and important performance, fit, or appearance characteristics.

    (AIAG, 1995)1


    Identifying Specifications/Tolerances

    The process specifications or tolerances are determined by customer requirements, industry standards, or the organization's engineering department. Various process capability indices are described later in this Section.

    The process capability study is used to demonstrate that the process is centered within the specification limits and that the process variation predicts the process is capable of producing parts within the tolerance requirements.

    When the process capability study indicates the process is not capable, the information is used to evaluate and improve the process in order to meet the tolerance requirements. There may be situations where the specifications or tolerances are set too tight in relation to the achievable process capability. In these circumstances, the specification must be reevaluated. If the specification cannot be opened, then the action plan is to perform 100% inspection of the process, unless inspection testing is destructive.

    Developing Sampling Plans

    The appropriate sampling plan for conducting process capability studies depends upon the purpose and whether there are customer or standards requirements for the study. Ford and General Motors specify that process capability studies for PPAP submissions be based on data taken from a significant production run of a minimum of 300 consecutive pieces. (AIAG, 1995)1

    If the process is currently running and is in control, control chart data may be used to calculate the process capability indices. If the process fits a normal distribution and is in statistical control, then the standard deviation can be estimated from:

    $$\sigma_R = \frac{\bar{R}}{d_2}$$
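As a rough illustration of this estimate, the sketch below computes the average subgroup range and divides it by the d2 constant for the subgroup size. The d2 values are the standard control chart constants for subgroup sizes 2 through 6; the function name `sigma_from_ranges` and the measurements are hypothetical.

```python
# Standard d2 control chart constants, keyed by subgroup size.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def sigma_from_ranges(subgroups):
    """Estimate the process standard deviation as R-bar / d2."""
    n = len(subgroups[0])
    if any(len(group) != n for group in subgroups):
        raise ValueError("all subgroups must have the same size")
    # Average of the subgroup ranges (max minus min in each subgroup)
    r_bar = sum(max(group) - min(group) for group in subgroups) / len(subgroups)
    return r_bar / D2[n]

# Hypothetical control chart data: three subgroups of five pieces each
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.1],
]
sigma_hat = sigma_from_ranges(subgroups)
print(f"estimated sigma = {sigma_hat:.4f}")
```

In practice far more than three subgroups would be used, and the estimate is only meaningful once the chart shows the process to be in statistical control.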

    For new processes, for example for a project proposal, a pilot run may be used to estimate the process capability. The disadvantage of using a pilot run is that the estimated process variability is most likely less than the process variability expected from an ongoing process.

    Process capability studies conducted for the purpose of improving the process may be performed using a design of experiments (DOE) approach, in which the objective is to find the optimum values of the process variables that yield the lowest process variation.


    Verifying Stability and Normality

    If only common causes of variation are present in a process, then the output of the process forms a distribution that is stable over time and is predictable. If special causes of variation are present, the process output is not stable over time. (AIAG, 1995)2

    Figure 7.35 depicts an unstable process with both process average and variation out-of-control. Note, the process may also be unstable if either the process average or variation is out-of-control.

    Figure 7.35 Unstable Process with Average and Variation Out-of-control

    Common causes of variation refer to the many sources of variation within a process that has a stable and repeatable distribution over time. This is called a state of statistical control and the output of the process is predictable. Special causes refer to any factors causing variation that are not always acting on the process. If special causes of variation are present, the process distribution changes and the process output is not stable over time. (AIAG, 1995)2

    When plotting a process on a control chart, lack of process stability can be shown by several types of patterns including: points outside the control limits, trends, points on one side of the center line, cycles, etc.
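Two of the patterns listed above, points outside the control limits and points on one side of the center line, are simple to screen for in code. The following is a minimal sketch: the function names, the data, and the control limits are hypothetical, and the run length that counts as a signal (often quoted as seven or eight points) is a convention the text above does not specify.

```python
def points_outside_limits(values, lcl, ucl):
    """Return the indices of plotted points beyond either control limit."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

def longest_run_one_side(values, center):
    """Return the longest run of consecutive points on one side of the center line."""
    longest = current = 0
    last_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            current += 1      # run continues on the same side
        elif side != 0:
            current = 1       # run starts on a new side
        else:
            current = 0       # a point on the center line breaks the run
        last_side = side
        longest = max(longest, current)
    return longest

# Hypothetical plotted values with center line 10 and limits 7 and 13
values = [10.2, 9.9, 10.4, 10.6, 10.3, 10.5, 10.1, 10.7, 10.2, 10.4, 13.5, 9.8]
outside = points_outside_limits(values, lcl=7.0, ucl=13.0)
run = longest_run_one_side(values, center=10.0)
print("points outside limits:", outside)
print("longest run on one side:", run)
```

Trends and cycles need additional checks that are not sketched here.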


    The validity of the normality assumption may be tested using the chi square hypothesis test. To perform this test, the data is partitioned into data ranges. The number of data points in each range is then compared with the number predicted from a normal distribution. Using the hypothesis test with a selected confidence level, a conclusion can be made as to whether the data follows a normal distribution. The chi square hypothesis test is:

    H0: The data follows a specified distribution
    H1: The data does not follow a specified distribution

    and is tested using the following test statistic:

    $$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$ (NIST, 2001)11

    Refer to Section VIII for the definition of terms in the above equation and additional chi square information.

    Continuous data may be tested using the Kolmogorov-Smirnov goodness-of-fit test. It has the same hypothesis test as the chi square test, and the test statistic is given by (NIST, 2001)11:

    $$D = \max_{1 \le i \le N} \left| F(Y_i) - \frac{i}{N} \right|$$

    Where D is the test statistic and F is the theoretical cumulative distribution of the continuous distribution being tested. An attractive feature of this test is that the distribution of the test statistic does not depend on the underlying cumulative distribution function being tested. Limitations of this test are that it only applies to continuous distributions and that the distribution must be fully specified. The location, scale, and shape parameters must be specified and not estimated from the data.

    The Anderson-Darling test is a modification of the Kolmogorov-Smirnov test and gives more weight to the tails of the distribution. See the (NIST, 2001)11 reference at the end of this Section for further discussion of distribution tests.

    If the data does not fit a normal distribution, the chi square hypothesis test may also be used to test the fit to other distributions such as the exponential or binomial distributions. Refer to non-normal data transformations discussed later in this Section.
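Both test statistics can be computed with a few lines of code. The sketch below is an illustration under stated assumptions: the hypothesized distribution is a fully specified normal (mean 150, standard deviation 20, chosen arbitrarily), the sample is an idealized set of evenly spaced quantiles rather than real measurements, and the function names are hypothetical. The Kolmogorov-Smirnov statistic is coded as the largest |F(Y_i) - i/N| over the sorted data; many software packages use a slightly different two-sided form.

```python
from statistics import NormalDist

def chi_square_statistic(data, edges, dist):
    """Sum of (O_i - E_i)^2 / E_i over the ranges separated by `edges`."""
    n = len(data)
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        expected = n * (dist.cdf(hi) - dist.cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat

def ks_statistic(data, dist):
    """Largest |F(Y_i) - i/N| over the ordered observations Y_1..Y_N."""
    ordered = sorted(data)
    n = len(ordered)
    return max(abs(dist.cdf(y) - (i + 1) / n) for i, y in enumerate(ordered))

# Fully specified hypothesized distribution (parameters not estimated from data)
hypothesized = NormalDist(mu=150.0, sigma=20.0)

# Idealized illustrative sample: 20 evenly spaced quantiles of that distribution
n = 20
sample = [hypothesized.inv_cdf((i + 0.5) / n) for i in range(n)]

# Four data ranges separated at the quartiles, so 5 points are expected in each
edges = [hypothesized.inv_cdf(p) for p in (0.25, 0.50, 0.75)]

chi2 = chi_square_statistic(sample, edges, hypothesized)
d_stat = ks_statistic(sample, hypothesized)
print(f"chi square statistic = {chi2:.4f}")
print(f"K-S statistic D      = {d_stat:.4f}")
```

Either statistic is then compared with a critical value for the selected confidence level; that lookup is not shown here.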


    The Normal Distribution

    When all special causes of variation are eliminated, many variable data processes, when sampled and plotted, produce a bell-shaped distribution. If the base of the histogram is divided into six (6) equal lengths (three on each side of the average), the amount of data in each interval exhibits the following percentages:

    Figure 7.36 The Normal Distribution (horizontal axis marked in steps of 1σ from the mean μ out to μ + 3σ; 99.73% of the data lies within μ ± 3σ)

    The Z Value

    The area outside of specification for a normal curve can be determined by a Z value.

    $$Z_{LOWER} = \frac{\bar{X} - LSL}{S} \qquad Z_{UPPER} = \frac{USL - \bar{X}}{S}$$

    The Z transformation formula is:

    $$Z = \frac{x - \mu}{\sigma}$$

    Where: x = data value (the value of concern)
           μ = mean
           σ = standard deviation

    This transformation will convert the original values to the number of standard deviations away from the mean. The result allows one to use a single standard normal table to describe areas under the curve (probability of occurrence).
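The Z values for the two specification limits can be turned into an estimated fraction outside of specification by adding the two tail areas. The sketch below uses the complementary error function in place of a printed table; the function names and the process values are hypothetical.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fraction_out_of_spec(x_bar, s, lsl, usl):
    """Return Z_lower, Z_upper, and the total area outside both limits."""
    z_lower = (x_bar - lsl) / s
    z_upper = (usl - x_bar) / s
    total = upper_tail(z_lower) + upper_tail(z_upper)
    return z_lower, z_upper, total

# Hypothetical process: average 10.0, standard deviation 0.5, limits 8.5 and 11.0
z_lower, z_upper, total = fraction_out_of_spec(x_bar=10.0, s=0.5, lsl=8.5, usl=11.0)
print(f"Z lower = {z_lower:.2f}, Z upper = {z_upper:.2f}")
print(f"estimated fraction out of specification = {total:.4f}")
```

The result is only as good as the normality assumption and the estimates of the average and standard deviation.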


    There are several ways to display the normal (standardized) distribution:

    1. As a number under the curve, up to the Z value.
       P(Z = -∞ to 1) = 0.8413

    2. As a number beyond the Z value.
       P(Z = 1 to +∞) = 0.1587

    3. As a number under the curve, and at a distance from the mean.
       P(Z = 0 to 1) = 0.3413

    The standard normal table in this Primer uses the second method of calculating the probability of occurrence.


    Z Value Examples

    Example 7.26: Tenth grade students' weights follow a normal distribution with a mean μ = 150 lb and a standard deviation of 20 lb. What is the probability of a student weighing less than 100 lb?

    μ = 150, x = 100, σ = 20

    $$Z = \frac{x - \mu}{\sigma} = \frac{100 - 150}{20} = -\frac{50}{20} = -2.5$$

    Since the normal table has values about the mean, a Z value of -2.5 can be treated as 2.5.

    P(Z = -∞ to -2.5) = 0.0062. That is, 0.62% of the students will weigh less than 100 lb.

    Example 7.27: Using the data from Example 7.26, what is the probability of a student weighing between 120 lb and 160 lb?

    The best technique to solve this problem, using the standard normal table in this Primer, would be to determine the tail area values, and to subtract them from the total probability of 1.


    Example 7.27 (continued):

    First, determine the Z value and probability below 120 lb.

    $$Z = \frac{x - \mu}{\sigma} = \frac{120 - 150}{20} = -\frac{30}{20} = -1.5$$

    P(Z = -∞ to -1.5) = 0.0668

    Second, determine the Z value and probability above 160 lb.

    $$Z = \frac{x - \mu}{\sigma} = \frac{160 - 150}{20} = \frac{10}{20} = 0.5$$

    P(Z = 0.5 to +∞) = 0.3085

    Third, the total probability - below - above = probability between 120 and 160 lb.

    1 - 0.0668 - 0.3085 = 0.6247

    Thus, 62.47% of the students will weigh more than 120 lb but less than 160 lb.
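Examples 7.26 and 7.27 can be cross-checked without a printed table. This sketch uses Python's `statistics.NormalDist` with the mean and standard deviation given in the examples; the variable names are illustrative.

```python
from statistics import NormalDist

# Student weights from Example 7.26: mean 150 lb, standard deviation 20 lb
weights = NormalDist(mu=150.0, sigma=20.0)

# Example 7.26: probability of weighing less than 100 lb
z_100 = (100.0 - weights.mean) / weights.stdev
p_below_100 = weights.cdf(100.0)

# Example 7.27: subtract both tail areas from the total probability of 1
p_below_120 = weights.cdf(120.0)
p_above_160 = 1.0 - weights.cdf(160.0)
p_between = 1.0 - p_below_120 - p_above_160

print(f"Z for 100 lb = {z_100:.1f}, P(below 100 lb) = {p_below_100:.4f}")
print(f"P(below 120 lb) = {p_below_120:.4f}, P(above 160 lb) = {p_above_160:.4f}")
print(f"P(between 120 and 160 lb) = {p_between:.4f}")
```

The computed probabilities agree with the table lookups in the examples to four decimal places.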


    Capability Index Failure Rates

    There is a direct link between the calculated Cp (and Pp) values and the standard normal (Z value) table. A Cp of 1.0 corresponds to the loss suffered at a Z value of 3.0 (doubled, since the table is one sided). Refer to Table 7.37 below.

    Cp      Z value    ppm
    0.33    1.00       317,311
    0.67    2.00       45,500
    1.00    3.00       2,700
    1.10    3.30       967
    1.20    3.60       318
    1.30    3.90       96
    1.33    4.00       63
    1.40    4.20       27
    1.50    4.50       6.8
    1.60    4.80       1.6
    1.67    5.00       0.57
    1.80    5.40       0.067
    2.00    6.00       0.002

    Table 7.37 Failure Rates for Cp and Z Values

    In Table 7.37, ppm equals parts per million of nonconformance (or failure) when the process:

    - Is centered on X̄
    - Has a two-tailed specification
    - Is normally distributed
    - Has no significant shifts in average or dispersion

    When the Cp, Cpk, Pp, and Ppk values are 1.0 or less, Z values and the standard normal table can be used to determine failure rates. With the drive for increasingly dependable products, there is a need for failure rates in the Cp range of 1.5 to 2.0.
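The ppm column of Table 7.37 can be approximated directly from the Z value: take the one-sided tail area beyond Z, double it for the two-tailed specification, and scale to parts per million. The sketch below assumes the conditions listed above (centered, normal, no shifts); the function name `ppm_for_z` is hypothetical.

```python
import math

def ppm_for_z(z):
    """Two-tailed nonconformance in parts per million for a centered normal process."""
    one_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # area beyond z on one side
    return 2.0 * one_tail * 1_000_000

# Z values from Table 7.37; for a centered process Cp = Z / 3
for z in (1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 6.0):
    print(f"Cp = {z / 3:.2f}  Z = {z:.2f}  ppm = {ppm_for_z(z):.4g}")
```

The complementary error function is used instead of `1 - cdf` so that the very small tail areas at Z values of 5 and 6 are not lost to rounding.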


    Process Capability Indices

    To determine process capability, an estimation of sigma is necessary:

    $$\sigma_R = \frac{\bar{R}}{d_2}$$

    σR is an estimate of process capability sigma and comes from a control chart. The capability index is defined as:

    $$C_p = \frac{USL - LSL}{6\sigma_R}$$

    As a rule of thumb:

    Cp > 1.33            Capable
    Cp = 1.00 to 1.33    Capable with tight control
    Cp < 1.00            Incapable

    The capability ratio is defined as:

    $$C_R = \frac{6\sigma_R}{USL - LSL}$$

    As a rule of thumb:

    CR < 0.75            Capable
    CR = 0.75 to 1.00    Capable with tight control
    CR > 1.00            Incapable

    Note, this rule of thumb logic is somewhat out of step with the six sigma assumption of a 1.5 sigma shift. The above formulas only apply if the process is centered, stays centered within the specifications, and Cp = Cpk.


    Cpk is the ratio giving the smallest answer between:

    $$C_{pk} = \frac{USL - \bar{X}}{3\sigma_R} \quad \text{or} \quad \frac{\bar{X} - LSL}{3\sigma_R}$$

    Process Capability Exercise

    Example 7.28: For a process with X̄ = 12, σR = 2, USL = 16, and LSL = 4, determine Cp and Cpk min:

    $$C_p = \frac{USL - LSL}{6\sigma_R} = \frac{16 - 4}{6(2)} = \frac{12}{12} = 1$$

    $$C_{pk\,upper} = \frac{USL - \bar{X}}{3\sigma_R} = \frac{16 - 12}{3(2)} = \frac{4}{6} = 0.667$$

    $$C_{pk\,lower} = \frac{\bar{X} - LSL}{3\sigma_R} = \frac{12 - 4}{3(2)} = \frac{8}{6} = 1.333$$

    $$C_{pk\,min} = C_{pk\,upper} = 0.667$$
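The arithmetic in Example 7.28 can be reproduced with a short function. This is a minimal sketch; the function name `capability_indices` and the dictionary keys are illustrative, and the inputs are the values given in the example.

```python
def capability_indices(x_bar, sigma_r, lsl, usl):
    """Return Cp, CR, and both one-sided Cpk ratios for the given inputs."""
    cp = (usl - lsl) / (6.0 * sigma_r)
    cr = (6.0 * sigma_r) / (usl - lsl)
    cpk_upper = (usl - x_bar) / (3.0 * sigma_r)
    cpk_lower = (x_bar - lsl) / (3.0 * sigma_r)
    return {
        "cp": cp,
        "cr": cr,
        "cpk_upper": cpk_upper,
        "cpk_lower": cpk_lower,
        "cpk": min(cpk_upper, cpk_lower),  # the smaller ratio is reported as Cpk
    }

# Values from Example 7.28
result = capability_indices(x_bar=12.0, sigma_r=2.0, lsl=4.0, usl=16.0)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

Because the process average sits closer to the upper limit, the upper ratio governs, and Cpk is lower than Cp.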

    Cpm Index

    The Cpm index is defined as:

    $$C_{pm} = \frac{USL - LSL}{6\sqrt{(\mu - T)^2 + \sigma^2}}$$

    Where: USL = upper specification limit
           LSL = lower specification limit
           μ = process mean
           T = target value
           σ = process standard deviation

    Cpm is based on the Taguchi index, which places more emphasis on process centering on the target. (Breyfogle, 1999)4


    Cpm Index Exercise

    Example 7.29: For a process with μ = 12, σ = 2, T = 10, USL = 16, and LSL = 4, determine Cpm:

    $$C_{pm} = \frac{USL - LSL}{6\sqrt{(\mu - T)^2 + \sigma^2}} = \frac{16 - 4}{6\sqrt{(12 - 10)^2 + 2^2}} = 0.707$$
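A short check of Example 7.29 also shows how the index penalizes an off-target mean. The function name `cpm_index` is illustrative; the second call uses the same process with the mean moved onto the target, which is an added comparison and not part of the example.

```python
import math

def cpm_index(mu, sigma, target, lsl, usl):
    """Cpm: specification width over six times the spread about the target."""
    return (usl - lsl) / (6.0 * math.sqrt((mu - target) ** 2 + sigma ** 2))

# Values from Example 7.29
cpm_off_target = cpm_index(mu=12.0, sigma=2.0, target=10.0, lsl=4.0, usl=16.0)

# Same process with the mean on target (added for comparison only)
cpm_on_target = cpm_index(mu=10.0, sigma=2.0, target=10.0, lsl=4.0, usl=16.0)

print(f"Cpm with mean 12, target 10 = {cpm_off_target:.3f}")
print(f"Cpm with mean on target     = {cpm_on_target:.3f}")
```

When the mean equals the target, the (μ - T) term vanishes and the expression reduces to (USL - LSL) / 6σ.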

    Process Performance Indices

    To determine process performance, an estimation of sigma is necessary:

    $$\sigma_i = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

    σi is a measure of total data sigma and generally comes from a calculator or computer. The performance index is defined as:

    $$P_p = \frac{USL - LSL}{6\sigma_i}$$

    The performance ratio is defined as:

    $$P_R = \frac{6\sigma_i}{USL - LSL}$$

    Ppk is the ratio giving the smallest answer between:

    $$P_{pk} = \frac{USL - \bar{X}}{3\sigma_i} \quad \text{or} \quad \frac{\bar{X} - LSL}{3\sigma_i}$$
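The performance indices follow the same pattern as the capability indices, except that sigma comes from the overall sample standard deviation (n - 1 divisor) of all the data. In the sketch below, the function name, the measurements, and the specification limits are hypothetical.

```python
from statistics import mean, stdev

def performance_indices(data, lsl, usl):
    """Return Pp, PR, and Ppk using the sample standard deviation of all data."""
    x_bar = mean(data)
    sigma_i = stdev(data)  # square root of sum((X - X-bar)^2) / (n - 1)
    pp = (usl - lsl) / (6.0 * sigma_i)
    pr = (6.0 * sigma_i) / (usl - lsl)
    ppk = min((usl - x_bar) / (3.0 * sigma_i), (x_bar - lsl) / (3.0 * sigma_i))
    return pp, pr, ppk

# Hypothetical measurements with specification limits 4 and 14
data = [9, 11, 10, 12, 8, 10]
pp, pr, ppk = performance_indices(data, lsl=4.0, usl=14.0)
print(f"Pp = {pp:.3f}, PR = {pr:.3f}, Ppk = {ppk:.3f}")
```

A real study would use far more than six observations; the small sample only keeps the arithmetic easy to follow.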


    Short-Term and Long-Term Capability

    Up to this point, process capability has been discussed in terms of stable processes, with assignable causes removed. In fact, the process average and spread are dependent upon the number of units measured or the duration over which the process is measured.

    When a process capability is determined using one operator on one shift, with one piece of equipment, and a homogeneous supply of materials, the process variation is relatively small. As factors for time, multiple operators, various lots of material, environmental changes, etc. are added, each of these contributes to increasing the process variation. Control limits based on a short-term process evaluation are closer together than control limits based on the long-term process.

    Smith (2001)15 describes a short run with respect to time and a small run, where there is a small number of pieces produced. When a small amount of data is available, there is generally less variation than is found with a larger amount of data. Control limits based on the smaller number of samples will be narrower than they should be, and control charts will produce false out-of-control patterns.

    Smith suggests a modified X̄ and R chart for short runs, running an initial 3 to 10 pieces without adjustment. A calculated value is compared with a critical value and either the process is adjusted or an initial number of subgroups is run. Inflated D4 and A2 values are used to establish control limits. Control limits are recalculated after additional groups are run.

    For small runs, with a limited amount of data, Smith recommends the use of the X and MR chart. The X represents individual data values, not an average, and the MR is the moving range, a measure of piece-to-piece variability. Process capability or Cpk values determined from either of these methods must be considered preliminary information. As the number of data points increases, the calculated process capability will approach the true capability.

    When comparing attribute with variable data, variable data generally provides more information about the process, for a given number of data points. Using variables data, a reasonable estimate of the process mean and variation can be made with 25 to 30 groups of five samples each, whereas a comparable estimate using attribute data may require 25 groups of 50 samples each. Using variables data is preferable to using attribute data for estimating process capability.

    Information on rational subgrouping and breakdown of variation is given in Section X.


    Process Capability for Non-Normal Data

    In the real world, data does not always fit a normal distribution, and when it does not, the standard capability indices do not give valid information because they are based on the normal distribution. The first step is a visual inspection of a histogram of the data. If all data values are well within the specification limits, the process would appear to be capable. One additional strategy is to make non-normal data resemble normal data by using a transformation. The question is which one to select for the specific situation. Unfortunately, the choice of the "best" transformation is generally not obvious. (NIST, 2001)11

    In the NIST (2001)11 reference, a family of power transformations for positive data values is attributed to G.E.P. Box and D.R. Cox. The Box-Cox power transformations are given by:

    $$x(\lambda) = \frac{x^{\lambda} - 1}{\lambda} \quad \text{for } \lambda \neq 0$$

    $$x(\lambda) = \ln(x) \quad \text{for } \lambda = 0$$

    Given data observations x1, x2, ..., xn, select the power λ that maximizes the logarithm of the likelihood function:

    $$f(x, \lambda) = -\frac{n}{2} \ln\left[ \sum_{i=1}^{n} \frac{\left( x_i(\lambda) - \bar{x}(\lambda) \right)^2}{n} \right] + (\lambda - 1) \sum_{i=1}^{n} \ln(x_i)$$

    Where the arithmetic mean of the transformed data is:

    $$\bar{x}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} x_i(\lambda)$$ (NIST, 2001)11
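The selection of λ can be sketched as a simple grid search over the log-likelihood above. The code below is an illustration under stated assumptions: the function names are hypothetical, the grid of candidate powers (-2 to 2 in steps of 0.1) is an arbitrary choice, and the sample is a synthetic right-skewed data set built by exponentiating evenly spaced normal quantiles, so a power near zero (the log transformation) is the expected answer.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox power transformation of a single positive value."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """Log-likelihood f(x, lambda) for a candidate power lambda."""
    n = len(data)
    transformed = [box_cox(x, lam) for x in data]
    t_bar = sum(transformed) / n
    variance = sum((t - t_bar) ** 2 for t in transformed) / n
    return -(n / 2.0) * math.log(variance) + (lam - 1.0) * sum(math.log(x) for x in data)

def best_lambda(data, grid):
    """Return the grid value with the largest log-likelihood."""
    return max(grid, key=lambda lam: log_likelihood(data, lam))

# Synthetic right-skewed sample (hypothetical): exp() of normal quantiles
n = 25
data = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]

grid = [k / 10.0 for k in range(-20, 21)]  # candidate powers from -2.0 to 2.0
lam_hat = best_lambda(data, grid)
print(f"selected lambda = {lam_hat:.1f}")
```

Statistical software typically refines the search with a numerical optimizer and reports a confidence interval for λ; neither is attempted here.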

    Process capability indices and formulas described elsewhere in this Section are based on the assumption that the data are normally distributed. The validity of the normality assumption may be tested using the chi square hypothesis test.

    Breyfogle (1999)4 states that one approach to address the non-normal distribution is to make transformations to "normalize" the data. This may be done with statistical software that performs the Box-Cox transformation. As an alternative approach, when the data can be represented by a probability plot (e.g., a Weibull distribution), one should use the 0.135 and 99.865 percentiles to describe the spread of the data.
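For a normal distribution, the 0.135 and 99.865 percentiles sit at the mean minus and plus three standard deviations, so the distance between them plays the role of the 6σ spread. The sketch below computes those two percentiles for a Weibull distribution from its inverse cumulative function. The function names and the parameter values are hypothetical, and the ratio of specification width to percentile spread at the end is a commonly used analogue of Pp that is assumed here rather than taken from the Primer.

```python
import math

def weibull_quantile(p, beta, theta, delta=0.0):
    """Value below which a fraction p of a three-parameter Weibull falls."""
    return delta + theta * (-math.log(1.0 - p)) ** (1.0 / beta)

def percentile_spread(beta, theta, delta=0.0):
    """Return the 0.135 and 99.865 percentiles of the distribution."""
    lower = weibull_quantile(0.00135, beta, theta, delta)
    upper = weibull_quantile(0.99865, beta, theta, delta)
    return lower, upper

# Hypothetical fitted Weibull: shape 1 (exponential), scale 100, location 0
lower, upper = percentile_spread(beta=1.0, theta=100.0)
print(f"0.135 percentile = {lower:.3f}, 99.865 percentile = {upper:.2f}")

# Assumed analogue of Pp: specification width over the percentile spread
lsl, usl = 0.0, 800.0  # hypothetical specification limits
pp_equivalent = (usl - lsl) / (upper - lower)
print(f"percentile-based Pp analogue = {pp_equivalent:.3f}")
```

The quality of the result depends on how well the fitted distribution describes the tails, which is where the two percentiles lie.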


    It is often necessary to identify non-normal data distributions and to transform them into near normal distributions to determine process capabilities or failure rates. Assume that a process capability study has been conducted. Some 30 data points from a non-normal distribution are shown in Table 7.38 below. An investigator can check the data for normality using techniques such as the dot plot, histogram, and normal probability plot.

       1.46     19.45     23.43     92.35    104.86
     118.59    282.58    311.17    341.88    374.81
     410.06    676.52    731.16    789.05    850.31
     915.06    983.45   1055.60   1131.65   1384.58
    1477.63   1575.30   1677.72   1785.06   2541.17
    2687.39   2839.82   4304.67   4521.22   6857.50

    Table 7.38 Sample Non-normal Data

    A histogram displaying the above non-normal data indicates a distribution that is skewed to the right.

    Table 7.39 Histogram of Non-normal Data (horizontal axis: Data, 0 to 7000)


    Process Capability for Non-normal Data (Continued)A probability plot can also be used to display the non-normal data. The data pointsare clustered to the left with some extreme points to the right. Since this is a non-normal di