ITK 226 2 Statistics


  • 8/3/2019 ITK 226 2 Statistics

    1/42

    Dicky Dermawan

    www.dickydermawan.net78.net

    In statistics we are concerned with methods for designing and evaluating experiments to obtain information about practical problems.

    In most cases the inspection of each item of the population would be too expensive, time-consuming, or even impossible. Hence a few samples are drawn at random, and from their inspection conclusions about the population are inferred.

    POPULATION | SAMPLE

    Size: large number N | Size: small number n

    Probability function/density f(x) | Relative frequency function

    Distribution function F(x) | Cumulative frequency function

    Mean: $\mu = \sum_{j=1}^{N} x_j f(x_j)$ | Average: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

    Variance: $\sigma^2 = \sum_{j=1}^{N} (x_j - \mu)^2 f(x_j)$ | Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
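As a minimal sketch of the sample-side formulas, the average and the sample variance (with the n - 1 divisor) can be computed as follows. The five values and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt


def sample_mean(values):
    """Average: sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance S^2, using the (n - 1) divisor."""
    n = len(values)
    y_bar = sample_mean(values)
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


# Hypothetical small sample, for illustration only.
sample = [320, 380, 340, 410, 380]
y_bar = sample_mean(sample)
s2 = sample_variance(sample)
print(f"average = {y_bar}, variance = {s2}, std dev = {sqrt(s2):.2f}")
```

The n - 1 divisor is what distinguishes the sample variance from the population variance, which weights squared deviations by f(x).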

    Sample of 100 Values of the Splitting Tensile Strength (lb/in²)

    320 380 340 410 380 340 360 350 320 370

    350 340 350 360 370 350 380 370 300 420

    370 390 390 440 330 390 330 360 400 370

    320 350 360 340 340 350 350 390 380 340

    400 360 350 390 400 350 360 340 370 420

    420 400 350 370 330 320 390 380 400 370

    390 330 360 380 350 330 360 300 360 360

    360 390 350 370 370 350 390 370 370 340

    370 400 360 350 380 380 360 340 330 370

    340 360 390 400 370 410 360 400 340 360
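The frequency plots that summarize this sample can be tabulated directly from the data. The sketch below is one possible way to do it with the standard library; the variable names are illustrative and not taken from the slides.

```python
from collections import Counter
from itertools import accumulate

# The 100 splitting tensile strength values (lb/in^2) from the table above.
data = [
    320, 380, 340, 410, 380, 340, 360, 350, 320, 370,
    350, 340, 350, 360, 370, 350, 380, 370, 300, 420,
    370, 390, 390, 440, 330, 390, 330, 360, 400, 370,
    320, 350, 360, 340, 340, 350, 350, 390, 380, 340,
    400, 360, 350, 390, 400, 350, 360, 340, 370, 420,
    420, 400, 350, 370, 330, 320, 390, 380, 400, 370,
    390, 330, 360, 380, 350, 330, 360, 300, 360, 360,
    360, 390, 350, 370, 370, 350, 390, 370, 370, 340,
    370, 400, 360, 350, 380, 380, 360, 340, 330, 370,
    340, 360, 390, 400, 370, 410, 360, 400, 340, 360,
]

n = len(data)
counts = Counter(data)
values = sorted(counts)                       # distinct strengths, ascending
absolute = [counts[v] for v in values]        # absolute frequency
relative = [c / n for c in absolute]          # relative frequency
cum_absolute = list(accumulate(absolute))     # cumulative absolute frequency
cum_relative = [c / n for c in cum_absolute]  # cumulative relative frequency

print("value  abs   rel  cum_abs  cum_rel")
for v, a, r, ca, cr in zip(values, absolute, relative, cum_absolute, cum_relative):
    print(f"{v:5d} {a:4d} {r:5.2f} {ca:8d} {cr:8.2f}")
```

The relative frequencies play the role of f(x) for the sample, and the cumulative relative frequencies play the role of F(x).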

    [Figure: absolute frequency vs. tensile strength (300 to 440 lb/in²), sample of 100 values of the splitting tensile strength]

    [Figure: relative frequency vs. tensile strength, same sample]

    [Figure: cumulative absolute frequency vs. tensile strength, same sample]

    [Figure: cumulative relative frequency vs. tensile strength, same sample]


    Min

    Lower Quartile

    Middle Quartile = Median

    Upper Quartile

    Interquartile range

    Max
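The quantities listed above are the ingredients of a box-and-whisker plot. A minimal sketch of computing them is shown here, assuming Python's `statistics.quantiles` with the "inclusive" method; quartile conventions differ between textbooks and software, so other tools may give slightly different quartiles. The data are the first ten values of the tensile strength sample, used only as an illustration.

```python
from statistics import quantiles

# First ten values of the tensile strength sample (lb/in^2), for illustration.
first_ten = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]

# Three cut points splitting the data into four parts (inclusive convention).
q1, median, q3 = quantiles(first_ten, n=4, method="inclusive")

minimum = min(first_ten)
maximum = max(first_ten)
iqr = q3 - q1  # interquartile range

print(f"min={minimum}, Q1={q1}, median={median}, Q3={q3}, max={maximum}, IQR={iqr}")
```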


    DOX 6E Montgomery 10

    Experimental error

    Hypothesis testing: null hypothesis, alternative hypothesis

    Type I error: rejecting a true hypothesis

    Type II error: accepting a false hypothesis

    One-tail test vs. two-tail test

    Confidence level = 1 − significance level

    P-value

    Confidence interval

    So far we have regarded the values y1, y2, ..., yn of a sample as n observed values of a single random variable Y. We may equally well regard these n values as single observations of n random variables Y1, Y2, ..., Yn that have the same distribution and are independent.

    If Y1, ..., Yn are independent normal random variables, each of which has mean $\mu$ and variance $\sigma^2$, then the random variable

    $\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + Y_3 + \dots + Y_n)$

    is normal with mean $\mu$ and variance $\sigma^2/n$, and the random variable

    $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$

    is normal with mean 0 and variance 1.

    The confidence interval for $\mu$ is

    $\mathrm{CONF}\left\{ \bar{y} - \frac{c\,\sigma}{\sqrt{n}} \le \mu \le \bar{y} + \frac{c\,\sigma}{\sqrt{n}} \right\}$

    where c is the critical value of the standard normal distribution for the chosen confidence level.

    A vendor submits lots of fabric to a textile manufacturer.

    The manufacturer wants to know if the lot average breaking strength exceeds 200 psi. If so, she wants to accept the lot.

    Past experience indicates that a reasonable value for the variance of breaking strength is 100 (psi)².

    Four specimens are randomly selected, and the average breaking strength observed is $\bar{y} = 214$ psi.

    The hypotheses to be tested are:

    $H_0: \mu = 200$

    $H_1: \mu > 200$

    This is a one-sided alternative hypothesis.

    The value of the test statistic is:

    $Z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} = \frac{214 - 200}{10/\sqrt{4}} = 2.80$

    If a confidence level of 95% is chosen, i.e. type I error = 0.05, we find $Z_{0.05} = 1.645$.

    Since $Z_0 = 2.80 > 1.645$, the difference is significant: H0 is rejected and we conclude that the lot average breaking strength exceeds 200 psi. Thus, we accept the lot.

    The interval $205.8 \le \mu \le 222.2$, computed as $\bar{y} \pm 1.645\,\sigma/\sqrt{n}$, does not contain 200. Its lower limit, 205.8, is the one-sided 95% lower confidence bound for $\mu$.

    The P-value is 0.0026.
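The numbers in this example can be checked with a short script. This is a sketch using `statistics.NormalDist` from the Python standard library; the function name and return layout are illustrative choices, not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(y_bar, mu0, sigma, n, alpha=0.05):
    """Upper-tail Z test of H0: mu = mu0 against H1: mu > mu0, sigma known."""
    std_error = sigma / sqrt(n)
    z0 = (y_bar - mu0) / std_error
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-alpha critical value
    p_value = 1 - NormalDist().cdf(z0)         # upper-tail probability
    lower_bound = y_bar - z_crit * std_error   # one-sided lower confidence bound
    return z0, z_crit, p_value, lower_bound


# Fabric example: y_bar = 214 psi, mu0 = 200 psi, variance = 100 psi^2, n = 4.
z0, z_crit, p_value, lower_bound = one_sided_z_test(214, 200, sqrt(100), 4)
print(f"Z0 = {z0:.2f}, critical value = {z_crit:.3f}")
print(f"P-value = {p_value:.4f}, lower bound = {lower_bound:.1f}")
print("reject H0" if z0 > z_crit else "do not reject H0")
```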

    The same as previous, but we use:

    the t distribution instead of the normal distribution

    the sample standard deviation S instead of $\sigma$

    The test statistic is

    $t_0 = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$

    at (n − 1) degrees of freedom.

    The confidence interval is

    $\mathrm{CONF}\left\{ \bar{y} - \frac{t\,S}{\sqrt{n}} \le \mu \le \bar{y} + \frac{t\,S}{\sqrt{n}} \right\}$

    where t is the critical value of the t distribution with (n − 1) degrees of freedom.
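A minimal sketch of the one-sample t procedure follows. The Python standard library has no t distribution, so the critical value is passed in from a t table (2.262 is the tabulated two-sided 95% value for 9 degrees of freedom). The sample is the first ten tensile strength values, and the hypothesized mean of 350 is an assumption made only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0, t_crit):
    """Return t0, degrees of freedom, and the confidence interval for mu."""
    n = len(values)
    y_bar = mean(values)
    std_error = stdev(values) / sqrt(n)   # stdev uses the (n - 1) divisor
    t0 = (y_bar - mu0) / std_error
    half_width = t_crit * std_error
    return t0, n - 1, (y_bar - half_width, y_bar + half_width)


sample = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]
T_CRIT_9 = 2.262  # t table value for alpha/2 = 0.025, 9 degrees of freedom

t0, dof, (lower, upper) = one_sample_t(sample, mu0=350, t_crit=T_CRIT_9)
print(f"t0 = {t0:.3f} with {dof} degrees of freedom")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```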

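    If Variance Known

    Each sample average is normal, $\bar{y}_i \sim N(\mu_i, \sigma_i^2/n_i)$, so $\bar{y}_1 - \bar{y}_2$ is normal with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

    The test statistic is

    $Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

    The confidence interval is

    $\mathrm{CONF}\left\{ \bar{y}_1 - \bar{y}_2 - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right\}$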

    If Variance Unknown, but $\sigma_1^2 = \sigma_2^2$

    The test statistic is

    $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

    with the pooled variance

    $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$

    and degrees of freedom

    $\nu = n_1 + n_2 - 2$

    The reference distribution is the t distribution with $\nu$ degrees of freedom, which replaces the normal distribution used when the variances are known.

    Choose a confidence level, usually 95%, then find the critical t value at the associated degrees of freedom, i.e. $t_{\alpha/2,\nu}$.

    If $|t_0| > t_{\alpha/2,\nu}$, we have enough reason to reject the null hypothesis and conclude that the two methods differ significantly.

    Alternatively, calculate the P-value, i.e. the risk of wrongly rejecting the null hypothesis.

    Or set up a confidence interval and reject the null hypothesis if 0 is not included in the interval.
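The pooled procedure can be sketched as below. Both samples are hypothetical numbers invented for illustration, and the critical value 2.306 is the tabulated two-sided 95% t value for 8 degrees of freedom, supplied by hand because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t statistic assuming equal (unknown) variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t0 = (mean(sample1) - mean(sample2)) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2


# Hypothetical measurements from two methods.
method_a = [16.8, 16.4, 17.2, 16.6, 17.0]
method_b = [17.5, 17.9, 17.3, 18.1, 17.7]
T_CRIT_8 = 2.306  # t table value for alpha/2 = 0.025, 8 degrees of freedom

t0, dof = pooled_t(method_a, method_b)
print(f"t0 = {t0:.2f} with {dof} degrees of freedom")
print("reject H0" if abs(t0) > T_CRIT_8 else "do not reject H0")
```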

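    If Variance Unknown, $\sigma_1^2 \ne \sigma_2^2$

    The test statistic is

    $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$

    The reference distribution is the t distribution with degrees of freedom

    $\nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$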
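As an illustration of the unequal-variance case, the sketch below computes the test statistic and the approximate degrees of freedom using the common Welch-Satterthwaite form with (n - 1) terms; that this is exactly the form on the original slide is an assumption, since the slide's formula did not survive extraction cleanly. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_t(sample1, sample2):
    """t statistic and approximate degrees of freedom, variances not pooled."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1   # S1^2 / n1
    v2 = variance(sample2) / n2   # S2^2 / n2
    t0 = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu


# Hypothetical measurements; the second sample is visibly more variable.
method_a = [16.8, 16.4, 17.2, 16.6, 17.0]
method_b = [17.1, 18.3, 16.9, 18.5, 17.7]

t0, nu = unequal_variance_t(method_a, method_b)
print(f"t0 = {t0:.3f}, approximate degrees of freedom = {nu:.2f}")
```

In practice the non-integer degrees of freedom are usually rounded down before looking up the critical value in a t table.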


    Tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. He collected 10 observations (Table 2.1).

    Plot the dot diagram.

    Plot the box-and-whisker plot.

    Are the two formulations really different?

    Or perhaps the observed difference is the result of sampling fluctuation and the two formulations are really identical?


    Blocking is a design technique used to improve the precision with which the comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, i.e. factors that may influence the experimental response but in which we are not interested.

    The term block refers to a relatively homogeneous experimental unit, and the block represents a restriction on complete randomization because the treatment combinations are only randomized within the block. Blocking is carried out by making comparisons within matched pairs of experimental material.

    The confidence interval based on paired analysis is usually much narrower than that from the independent analysis. This illustrates the noise reduction property of blocking.

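    Statistical model for complete randomization:

    $y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, \dots, n_i$

    with $(n_1 + n_2 - 2)$ degrees of freedom, i.e. $(2n - 2)$ when both samples have size n

    Statistical model with blocking:

    $y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, \dots, n$

    with only $(n - 1)$ degrees of freedom, where n is the number of pairs

    The paired differences are

    $d_j = y_{1j} - y_{2j}$

    The test statistic:

    $t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$

    The confidence interval for the 2-sided test:

    $\bar{d} \pm t_{\alpha/2,\,n-1}\, S_d/\sqrt{n}$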

    Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. Two different tips are available for this machine, and it is suspected that one tip produces different hardness readings than the other.

    The test could be performed as follows: a number of metal specimens could be randomly selected. Half are tested by tip 1 and the other half by tip 2.

    The metal specimens might be cut from different bar stock that were not exactly identical in their hardness. To protect against this possibility, an alternative experimental design should be considered: divide each specimen into two parts and randomly assign each tip to one half of each specimen.

    - Use the paired data to determine a 95% confidence interval for the difference.

    - What if we use pooled or independent analysis?

    Specimen  Tip 1  Tip 2

    1 7 6

    2 3 3

    3 3 5

    4 4 3

    5 8 8

    6 3 2

    7 2 4

    8 9 9

    9 5 4

    10 4 5
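Both questions can be explored numerically. The sketch below computes the paired interval and, for comparison, the pooled (independent) interval from the same readings. The critical values 2.262 (9 degrees of freedom) and 2.101 (18 degrees of freedom) are two-sided 95% values taken from a t table, since the standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hardness readings for specimens 1..10 from the table above.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)

T_CRIT_9 = 2.262    # t table, alpha/2 = 0.025, n - 1 = 9 degrees of freedom
T_CRIT_18 = 2.101   # t table, alpha/2 = 0.025, 2n - 2 = 18 degrees of freedom

# Paired analysis: work with the within-specimen differences.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar = mean(d)
se_paired = stdev(d) / sqrt(n)
t0 = d_bar / se_paired
paired_half = T_CRIT_9 * se_paired

# Pooled analysis: treat the two columns as independent samples.
sp = sqrt((variance(tip1) + variance(tip2)) / 2)   # equal n, so a plain average
pooled_half = T_CRIT_18 * sp * sqrt(2 / n)

print(f"paired: {d_bar:.2f} +/- {paired_half:.2f} (t0 = {t0:.3f})")
print(f"pooled: {d_bar:.2f} +/- {pooled_half:.2f}")
```

The paired interval is much narrower than the pooled one even though it has fewer degrees of freedom, because differencing removes the specimen-to-specimen variability.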


    In some experiments it is the comparison of variability in the data that is important.

    For example, in chemical laboratories, we may wish to compare the variability of two analytical methods.

    Unlike the tests on means, the procedures for tests on variances are rather sensitive to the normality assumption.

    Suppose we wish to test the hypothesis of whether or not the variance of a normal population equals a constant, viz. $\sigma_0^2$. The test statistic is:

    $\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{(n-1) S^2}{\sigma_0^2}$

    The appropriate distribution for $\chi_0^2$ is the chi-square distribution with (n − 1) degrees of freedom. The confidence interval for $\sigma^2$ is

    $\frac{(n-1) S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{1-\alpha/2,\,n-1}}$
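A minimal sketch of the chi-square procedure is given below. The chi-square percentage points 19.02 and 2.70 (upper and lower 2.5% points for 9 degrees of freedom) are read from a table and passed in by hand, because the Python standard library has no chi-square distribution. The sample is the first ten tensile strength values, and the hypothesized variance of 400 is an assumption for illustration.

```python
from statistics import variance


def variance_chi_square(values, sigma0_sq, chi_upper, chi_lower):
    """Test statistic for H0: sigma^2 = sigma0^2 and the interval for sigma^2."""
    n = len(values)
    ss = (n - 1) * variance(values)       # corrected sum of squares
    chi0 = ss / sigma0_sq
    interval = (ss / chi_upper, ss / chi_lower)
    return chi0, interval


sample = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]
CHI_UPPER_9 = 19.02   # chi-square table, upper 0.025 point, 9 degrees of freedom
CHI_LOWER_9 = 2.70    # chi-square table, upper 0.975 point, 9 degrees of freedom

chi0, (lower, upper) = variance_chi_square(sample, 400, CHI_UPPER_9, CHI_LOWER_9)
print(f"chi-square statistic = {chi0:.3f}")
print(f"95% interval for the variance: {lower:.1f} to {upper:.1f}")
reject = chi0 > CHI_UPPER_9 or chi0 < CHI_LOWER_9
print("reject H0" if reject else "do not reject H0")
```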

    Suppose we wish to test equality of the variances of two normal populations. If independent random samples of size n1 and n2 are taken from populations 1 and 2, respectively, the test statistic for

    $H_0: \sigma_1^2 = \sigma_2^2$

    $H_1: \sigma_1^2 \ne \sigma_2^2$

    is the ratio of the sample variances:

    $F_0 = \frac{S_1^2}{S_2^2}$

    The appropriate distribution for $F_0$ is the F distribution with (n1 − 1) numerator degrees of freedom and (n2 − 1) denominator degrees of freedom. The null hypothesis would be rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.

    The confidence interval for $\sigma_1^2/\sigma_2^2$ is

    $\frac{S_1^2}{S_2^2} F_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2} F_{\alpha/2,\,n_2-1,\,n_1-1}$

    Note: $F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$
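The F test can be sketched as follows, reusing the hardness readings for the two tips from the earlier example as the two samples. The critical value 4.03 is the tabulated upper 2.5% point of the F distribution with (9, 9) degrees of freedom, supplied by hand since the standard library has no F distribution; the lower critical value is obtained from the reciprocal identity in the note.

```python
from statistics import variance

# Hardness readings for the two tips (ten readings each).
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]

F_UPPER_9_9 = 4.03             # F table, upper 0.025 point, (9, 9) degrees of freedom
F_LOWER_9_9 = 1 / F_UPPER_9_9  # reciprocal identity; degrees of freedom are equal here

f0 = variance(tip1) / variance(tip2)   # ratio of the sample variances

# Confidence interval for sigma1^2 / sigma2^2.
lower = f0 * F_LOWER_9_9
upper = f0 * F_UPPER_9_9

reject = f0 > F_UPPER_9_9 or f0 < F_LOWER_9_9
print(f"F0 = {f0:.3f}")
print(f"95% interval for the variance ratio: {lower:.3f} to {upper:.3f}")
print("reject H0" if reject else "do not reject H0")
```

Because both samples have the same size, the swapped degrees of freedom in the reciprocal identity make no difference here; with unequal sizes the lower point must be looked up with the degrees of freedom reversed.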

    Probability plotting is a graphical technique for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data.

    To construct a probability plot, the observations in the sample are first ranked from smallest to largest. That is, the sample y1, y2, ..., yn is arranged as y(1), y(2), ..., y(n), where y(1) is the smallest observation and y(n) the largest.

    The ordered observations y(j) are then plotted against their observed cumulative frequency (j − 0.5)/n.

    The cumulative frequency scale is arranged so that if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line. Judging whether they do is usually subjective.
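The construction described above can be sketched for a normal probability plot. Instead of special probability paper, this version converts each cumulative frequency (j - 0.5)/n to a standard normal quantile, so the points can be drawn on ordinary axes; that substitution is a common equivalent and an assumption here, not something the slides state. The data are hypothetical.

```python
from statistics import NormalDist

# Hypothetical sample, for illustration only.
sample = [357, 342, 368, 351, 349, 360, 355, 346, 363, 353]

ordered = sorted(sample)                 # y(1) <= y(2) <= ... <= y(n)
n = len(ordered)

# Observed cumulative frequency of the j-th ordered observation.
cum_freq = [(j - 0.5) / n for j in range(1, n + 1)]

# Standard normal quantile matching each cumulative frequency.
z_scores = [NormalDist().inv_cdf(p) for p in cum_freq]

print("  j   y(j)  (j-0.5)/n      z")
for j, (y, p, z) in enumerate(zip(ordered, cum_freq, z_scores), start=1):
    print(f"{j:3d} {y:6d} {p:10.2f} {z:7.3f}")
```

Plotting y(j) against z and checking by eye whether the points fall near a straight line is the visual judgment the text refers to.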


    An experiment is a test or a series of tests

    Experiments are used widely in the engineering world

        Process characterization & optimization

        Evaluation of material properties

        Product design & development

        Component & system tolerance determination

    All experiments are designed experiments; some are poorly designed, some are well-designed


    Randomization

    Running the trials in an experiment in random order

    Notion of balancing out effects of lurking variables

    Replication

    Sample size (improving precision of effect estimation, estimation of error or background noise)

    Replication versus repeat measurements? (see page 13)

    Blocking

    Dealing with nuisance factors


    Best-guess experiments

        Used a lot

        More successful than you might suspect, but there are disadvantages

    One-factor-at-a-time (OFAT) experiments

        Sometimes associated with the scientific or engineering method

        Devastated by interaction, also very inefficient

    Statistically designed experiments

        Based on Fisher's factorial concept