TESCO Evaluation of Non-Normal Meter Data

Evaluation of Non Normal Meter Test Data. The Eastern Specialty Company, February 28, 2012. Frank Garcia, Fred Rispoli

description

Understand the requirements of normality in ANSI/ASQ Z1.9 and how they affect analysis of meter test data. Review typical meter test data distributions and how to determine whether meter test data is normal. Introduction to working with non-normal data.

Transcript of TESCO Evaluation of Non-Normal Meter Data

Page 1: TESCO Evaluation of Non-Normal Meter Data

Evaluation of Non Normal Meter Test Data


The Eastern Specialty Company

February 28, 2012

Frank Garcia, Fred Rispoli

Page 2: TESCO Evaluation of Non-Normal Meter Data

Session Objectives

Understand:

• Requirements of normality in ANSI/ASQ Z1.9 and how that affects analysis of meter test data.

• Review typical meter test data distributions and how to determine if meter test data is normal.

• Introduction to working with non-normal data.

• Assessing risk of using ANSI/ASQ Z1.9 with non-normal meter test data.

Any Other Issues or Items of Interest?

Page 3: TESCO Evaluation of Non-Normal Meter Data

We Assume That Population Fits a Statistical Model

• The statistical model being used for the sampling/testing plan needs to match the actual distribution of the population.

• In most circumstances, one is looking at a normal or Gaussian distribution (i.e. a Bell curve) based on sampling theory.

• We like ANSI/ASQ Z1.9 for ease of use and universal acceptance as a sampling plan.

Page 4: TESCO Evaluation of Non-Normal Meter Data

What Does ANSI/ASQ Z1.9 Say About Normality?

Paragraph A8 states: “This standard assumes the underlying distribution of individual measurements to be normal in shape. Failure of this assumption to be valid will affect the Operating Characteristic (OC) curves and probabilities based on these curves. In particular it will affect the estimate of percentage non conforming calculated from the mean and standard deviation of the distribution. The assumption should be verified prior to use of this standard.”

Page 5: TESCO Evaluation of Non-Normal Meter Data

ANSI/ASQ Z1.9

Sampling Procedures and Tables for Inspection by Variables for Percent Nonconforming

• Various methods (Variability Unknown - Standard Deviation Method, Variability Unknown - Range Method, and Variability Known Method)

• All methods can be used with single or double specification limits.

Page 6: TESCO Evaluation of Non-Normal Meter Data

Z1.9 Calculations

EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9-2003 STANDARD DEVIATION METHOD WITH DOUBLE SPECIFICATION LIMITS

Weighted Average Calculated Data

Example meter group has a population of 475 meters and 2% accuracy. Use AQL = 2.5.

Line | Information Needed | Value Obtained | Explanation
1 | Sample Size: n | 25 | From Table I
2 | Sum of % Registrations: ∑X | 2506.1 |
3 | Sum of Squared % Registrations: ∑X² | 251222.03 |
4 | Correction Factor (CF): (∑X)²/n | 251221.49 | (2506.1)²/25
5 | Corrected Sum of Squares (SS): ∑X² - CF | 0.5416 |
6 | Variance (V): SS/(n-1) | 0.0226 | 0.5416/24
7 | Estimate of Lot Std. Deviation (S): √V | 0.1503 | √0.0226
8 | Sample Mean (X bar): (∑X)/n | 100.24 | 2506.1/25
9 | Upper Specification Limit: U | 102.0 |
10 | Lower Specification Limit: L | 98.0 |
11 | Quality Index (upper): QU = (U - X bar)/S | 11.71 | (102.0-100.24)/0.1503
12 | Quality Index (lower): QL = (X bar - L)/S | 14.90 | (100.24-98.0)/0.1503
13 | Est. of Lot % Out of Limits Above U: Pu | 0.00% | From Table V
14 | Est. of Lot % Out of Limits Below L: Pl | 0.00% | From Table V
15 | Total Est. % Out of Limits: P = Pu + Pl | 0.00% | 0.00% + 0.00%
16 | Maximum Allowable % Out of Limits: M | 5.98% | From Table II and IV using AQL = 2.5
17 | Acceptability Criterion: Pu + Pl < M | 0.00% < 5.98% | Therefore, the meter group is acceptable for continued service.

Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either Qu or Ql or both are negative, then the lot does not meet the acceptability criterion.
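The arithmetic in lines 4 to 12 of the worksheet can be sketched in Python as follows. This is only an illustration: the function names are made up for this sketch, and Pu, Pl and M still have to be read from the Z1.9 tables. Because the code carries full precision instead of rounding the mean to 100.24 and S to 0.1503, it gives quality indices of about 11.69 and 14.94 rather than the 11.71 and 14.90 shown above.

```python
import math

def z19_std_dev_method(sum_x, sum_x2, n, upper, lower):
    """Worksheet lines 4 to 12: S, sample mean and both quality indices."""
    cf = sum_x ** 2 / n              # line 4: correction factor
    ss = sum_x2 - cf                 # line 5: corrected sum of squares
    variance = ss / (n - 1)          # line 6
    s = math.sqrt(variance)          # line 7: estimate of lot std. deviation
    mean = sum_x / n                 # line 8
    return {
        "S": s,
        "mean": mean,
        "QU": (upper - mean) / s,    # line 11
        "QL": (mean - lower) / s,    # line 12
    }

def lot_is_acceptable(p_upper, p_lower, m, q_upper, q_lower):
    """Acceptability criterion as worded on the slide (P <= M, no negative Q)."""
    if q_upper < 0 or q_lower < 0:
        return False
    return (p_upper + p_lower) <= m

# Values from the worksheet; Pu, Pl and M are table lookups, entered by hand.
result = z19_std_dev_method(2506.1, 251222.03, 25, upper=102.0, lower=98.0)
accepted = lot_is_acceptable(0.00, 0.00, 5.98, result["QU"], result["QL"])
print(result, accepted)
```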

Page 7: TESCO Evaluation of Non-Normal Meter Data

Operating Characteristic Curves - OCCs

Ideal OC Curve

When sampling, you face the risk of rejecting lots of AQL quality as well as the risk of accepting lots of poorer than AQL quality. We are interested in knowing how an acceptance sampling plan will accept or not accept lots over various lot qualities. A curve showing the probability of acceptance over various lot or process qualities is called the operating characteristic (OC) curve.

The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling.

Page 8: TESCO Evaluation of Non-Normal Meter Data

Challenge of Meter Accuracy Test Data Distributions

• Electromechanical electric meter accuracy test data tends to have more variability and tends to be normal.

• Electronic (AMR) and digital (AMI) meter accuracy is very high, and the test data tends to have a higher percentage of test points around the mean.
– Has a different flatness, or what we call “kurtosis”. Results in a non-normal distribution.
– AMI meter test data is even more concentrated.

Page 9: TESCO Evaluation of Non-Normal Meter Data

Skew and Kurtosis

Positive skew (+): Mean > Median

Negative skew (-): Mean < Median
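As a rough illustration of how skew and kurtosis figures like those on the following slides can be computed, here is a sketch using the bias-adjusted sample formulas found in common statistics packages. Whether the presenters' software used exactly these formulas is an assumption, and the readings below are hypothetical, not taken from the presentation.

```python
import math

def _mean_and_sd(data):
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n, mean, sd

def sample_skewness(data):
    """Bias-adjusted sample skewness; negative means a longer left tail."""
    n, mean, sd = _mean_and_sd(data)
    z3 = sum(((x - mean) / sd) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(data):
    """Bias-adjusted excess kurtosis; about 0 for normal data."""
    n, mean, sd = _mean_and_sd(data)
    z4 = sum(((x - mean) / sd) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    offset = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - offset

# Hypothetical % registration readings with one low outlier.
readings = [99.8, 99.9, 100.0, 100.0, 100.1, 100.1, 100.2, 97.5]
print(sample_skewness(readings), sample_excess_kurtosis(readings))
```

A single slow meter in an otherwise tight group is enough to drive the skew strongly negative and the kurtosis strongly positive, which is the pattern the AMI and gas meter slides show.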

Page 10: TESCO Evaluation of Non-Normal Meter Data

AMR Meter Accuracy Test Data Distribution

Skew: -0.2 Kurtosis: +0.85

Page 11: TESCO Evaluation of Non-Normal Meter Data

AMI Meter Accuracy Test Data Distribution

Skew: -12.2 Kurtosis: +230.05

Page 12: TESCO Evaluation of Non-Normal Meter Data

Gas Meter Accuracy Test Data Distribution

Skew: -0.33 Kurtosis: +4.85

Page 13: TESCO Evaluation of Non-Normal Meter Data

The Normal Curve

The normal distribution is the most recognized distribution in statistics. The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.

It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately normally distributed.

Page 14: TESCO Evaluation of Non-Normal Meter Data

Normal Distribution

Each combination of mean and standard deviation generates a unique normal curve

“Standard” Normal Distribution

– Has a μ = 0, and σ = 1

– Data from any normal distribution can be made to fit the standard normal by converting raw scores to standard scores.

– Z-scores measure how many standard deviations from the mean a particular data-value lies.

Page 15: TESCO Evaluation of Non-Normal Meter Data

Empirical Rule

The Empirical Rule:

[Figure: normal curve with the horizontal axis marked from -6 to +6 standard deviations.]

68.27% of the data will fall within +/- 1 standard deviation
95.45% of the data will fall within +/- 2 standard deviations
99.73% of the data will fall within +/- 3 standard deviations
99.9937% of the data will fall within +/- 4 standard deviations
99.999943% of the data will fall within +/- 5 standard deviations
99.9999998% of the data will fall within +/- 6 standard deviations
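These percentages follow directly from the normal distribution: the fraction within +/- k standard deviations is erf(k/√2). A minimal sketch using only the Python standard library (the function name is illustrative):

```python
import math

def coverage_within(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

for k in range(1, 7):
    print(f"+/- {k} standard deviations: {100 * coverage_within(k):.7f} %")
```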

Page 16: TESCO Evaluation of Non-Normal Meter Data

Why Assess Normality?

While many processes behave according to the normal distribution, many distributions in meter testing are not normal.

There are many types of distributions:

There are many statistical tools that assume normal distribution properties in their calculations, such as Z1.9.

So understanding just how “normal” the data is will impact how we look at the data.

Page 17: TESCO Evaluation of Non-Normal Meter Data

Tools for Assessing Normality

The shape of any normal curve can be calculated based on the normal probability density function.

Tests for normality basically compare the shape of the calculated curve to the actual distribution of your data points.

For the purposes of this training, we will focus on using the Anderson-Darling test and Normal Probability plots in MINITAB™ to assess normality.

Watch that curve!

Page 18: TESCO Evaluation of Non-Normal Meter Data

Goodness-of-Fit

The Anderson-Darling test uses an empirical cumulative distribution function.

[Figure: Cumulative Percent (0 to 100) versus Raw Data Scale (3.0 to 5.5), showing the curve expected for a normal distribution and the actual data.]

The figure shows the departure of the actual data from the expected normal distribution. The Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
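To make the "Observed minus Expected" idea concrete, here is a sketch of the usual Anderson-Darling A² statistic against a normal distribution fitted with the sample mean and standard deviation. It is not Minitab's code, and the two data sets are synthetic, built from evenly spaced quantiles purely for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """A-squared for a normal fit using the sample mean and std. deviation."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    # Clamp so log() never sees exactly 0 or 1 for extreme outliers.
    cdf = [min(max(fitted.cdf(v), 1e-300), 1.0 - 1e-16) for v in x]
    total = 0.0
    for i in range(n):
        # (2i + 1) here is (2i - 1) in the usual 1-based formula.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n

# Synthetic examples: near-normal quantiles versus strongly right-skewed ones.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [-math.log(1.0 - (i + 0.5) / 50) for i in range(50)]
a2_normal_like = anderson_darling_normal(normal_like)
a2_skewed = anderson_darling_normal(skewed)
print(a2_normal_like, a2_skewed)
```

The larger the statistic, the further the data's cumulative curve sits from the fitted normal curve, with extra weight on the tails.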

Page 19: TESCO Evaluation of Non-Normal Meter Data

The Normal Probability Plot

[Figure: Probability Plot of Amount (Normal). Horizontal axis: Amount (60 to 110). Vertical axis: Percent (0.1 to 99.9, probability scale). Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684.]

The Anderson-Darling test is a good litmus test for normality: if the P-value is more than 0.05, your data are normal enough for most purposes.

Notice the scale on the vertical axis.
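For readers without Minitab, the P-value can be approximated from the AD statistic and sample size with the D'Agostino and Stephens formulas commonly used for the normal case. That this is what the software does internally is an assumption, but the sketch below gives about 0.68 for the plot above (AD 0.265, N 70), in line with the reported 0.684.

```python
import math

def ad_p_value(a2, n):
    """Approximate P-value for a normality test from A-squared and n."""
    a = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a > 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)

p = ad_p_value(0.265, 70)
print(f"P-value = {p:.3f}; normal enough at the 0.05 level: {p > 0.05}")
```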

Page 20: TESCO Evaluation of Non-Normal Meter Data

Anderson-Darling Caveat

Use the Anderson Darling column to generate these graphs.

In this case, both the Histogram and the Normality Plot look very “normal”. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from normal will cause the p-value to be very low.

Examples: Centron & AM 250 data

[Figure: Probability Plot of Anderson Darling (Normal). Vertical axis: Percent (0.1 to 99.9, probability scale). Horizontal axis: 35 to 65. Mean 50.03, StDev 4.951, N 500, AD 0.177, P-Value 0.921.]

[Figure: Summary for Anderson Darling. Histogram (36 to 60) with 95% confidence intervals for the mean and median.]

Anderson-Darling Normality Test: A-Squared 0.18, P-Value 0.921

Mean 50.031, StDev 4.951, Variance 24.511, Skewness -0.061788, Kurtosis -0.180064, N 500

Minimum 35.727, 1st Quartile 46.800, Median 50.006, 3rd Quartile 53.218, Maximum 62.823

95% Confidence Interval for Mean: 49.596 to 50.466
95% Confidence Interval for Median: 49.663 to 50.500
95% Confidence Interval for StDev: 4.662 to 5.278

Page 21: TESCO Evaluation of Non-Normal Meter Data

If the Data Is Not Normal, Don’t Panic!

• There are lots of meaningful statistical tools you can use to analyze your data.

• It just means you may have to think about your data in a slightly different way.

Don’t touch that button!

Page 22: TESCO Evaluation of Non-Normal Meter Data

Non Normality

Why do we care if a data set is normally distributed?

– When it is necessary to make inferences about the true nature of the population based on random samples drawn from the population.

– For problem solving purposes, because we don’t want to make a bad decision; having normal data is so critical that with EVERY statistical test, the first thing we do is check for normality of the data.

Some of the primary causes for non-normal data:

– Skewness – Natural and Artificial Limits
– Mixed Distributions - Multiple Modes
– Kurtosis

Page 23: TESCO Evaluation of Non-Normal Meter Data

Non Normal Data Analysis

What happens if the process is not normally distributed?

This is usually a concern when doing process capability analysis or hypothesis testing, but it can also be a factor in using Z1.9.

• The Box-Cox or Johnson transformations are used to try to transform the data so that they become approximately normal.

• Find another known distribution that fits the data.

• Evaluate the risk of assuming a normal distribution using histograms, empirical cdfs, and OCCs.

Page 24: TESCO Evaluation of Non-Normal Meter Data

Box Cox Transformations

• Box-Cox transformations are used to try to convert non-normal data into normal data.

• Box-Cox transforms the input data, denoted by g, using W = g^λ, where λ is any number, typically between -5 and 5.

• The trick is to choose the λ that produces a curve that is as close to normal as possible.

• Statistics software recommends a range for λ, often with a confidence interval.

• The user may choose values for λ and observe the resulting curves.

• Limited to positive data values and assumes the data is in subgroups.

• Problem using with meter test data.
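The search for λ can be sketched as a simple grid search. The sketch below uses W = g^λ as on the slide, with the natural log at λ = 0 (the usual convention), and picks λ by maximizing the standard Box-Cox profile log-likelihood. That criterion is an assumption made for this illustration, since the slides only show a software plot of StDev versus λ, and the data set is synthetic.

```python
import math
from statistics import NormalDist

def box_cox(values, lam):
    """W = g**lam as on the slide; natural log at lam = 0."""
    if any(g <= 0 for g in values):
        raise ValueError("Box-Cox is limited to positive data values")
    if lam == 0:
        return [math.log(g) for g in values]
    return [g ** lam for g in values]

def profile_log_likelihood(values, lam):
    """Standard Box-Cox profile log-likelihood (constant terms dropped)."""
    n = len(values)
    w = box_cox(values, lam)
    if lam != 0:
        w = [v / lam for v in w]   # rescale so variances are comparable across lam
    mean_w = sum(w) / n
    var_w = sum((v - mean_w) ** 2 for v in w) / n
    return -0.5 * n * math.log(var_w) + (lam - 1.0) * sum(math.log(g) for g in values)

def best_lambda(values, lo=-5.0, hi=5.0, steps=100):
    """Grid search over lam between lo and hi."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

# Synthetic lognormal-shaped data: the log (lam = 0) should come out best.
lognormal_like = [math.exp(NormalDist().inv_cdf((i + 0.5) / 50)) for i in range(50)]
lam_hat = best_lambda(lognormal_like)
print(lam_hat)
```

A "best" λ only means the best of the power family. The transformed data still has to pass a normality check, which is where tightly clustered meter data tends to fail.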

Page 25: TESCO Evaluation of Non-Normal Meter Data

Exercise: Box Cox Transformations

Examples:

Dimensions Example

AM 250 Data

[Figure: Box-Cox Plot of AM 250 Sort_1 no out. StDev (0.001320 to 0.001340) versus Lambda (-5.0 to 5.0), with Lower CL and Upper CL limits marked. Lambda (using 95.0% confidence): Estimate 0.06, Lower CL -0.50, Upper CL 0.71, Rounded Value 0.00.]

AM 250 data does not transform using Box-Cox. Most meter data will not transform with Box-Cox.

Page 26: TESCO Evaluation of Non-Normal Meter Data

Johnson Transformations

• Johnson transformations are used to try to convert non-normal data into normal data.

• The Johnson transformation is chosen from three different functions.

• Does not assume data is in subgroups.

• The Johnson transformation is more powerful than a Box-Cox transformation, hence it works more often with meter data files.

Page 27: TESCO Evaluation of Non-Normal Meter Data

Exercise: Johnson Transformations

1. Use G1 data

2. Based on the probability plots what can you say about the original data and the transformed data?

3. Using the transformed data in Z1.9

Page 28: TESCO Evaluation of Non-Normal Meter Data

G1 Data & Histogram

[Figure: Normal Histogram of G1 Sorted Data. Horizontal axis: FPL G1 Sorted (78 to 102). Vertical axis: Frequency (0 to 60). Mean 99.41, StDev 3.344, N 57.]

Variable | Total Count | N | Mean | StDev | Minimum | Median | Maximum | Range | Skewness | Kurtosis
G1 | 57 | 57 | 99.410 | 3.344 | 74.650 | 99.890 | 100.110 | 25.460 | -7.51 | 56.61

Data is Non Normal

Page 29: TESCO Evaluation of Non-Normal Meter Data

G1 Johnson Transformed Data

[Figure: Johnson Transformation for G1 Sorted_1, three panels.]

Probability Plot for Original Data (99.0 to 100.5): N 56, AD 3.003, P-Value <0.005

Probability Plot for Transformed Data (-2 to 2): N 56, AD 0.259, P-Value 0.703

Select a Transformation: P-Value for AD test (0.0 to 0.8) versus Z Value (0.2 to 1.2), with Ref P marked. (P-Value = 0.005 means <= 0.005)

P-Value for Best Fit: 0.702832
Z for Best Fit: 0.6
Best Transformation Type: SU
Transformation function equals 0.576014 + 0.884646 * Asinh( ( X - 99.9377 ) / 0.0840756 )

Transformed Data Now Satisfies the Normality Requirement
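The fitted SU function reported above is easy to apply outside the statistics package. A minimal sketch follows; the constant names use the conventional Johnson labels, which are a choice made here and not the slide's, and the inverse function is included only to show that the mapping is reversible.

```python
import math

# Parameters read from the transformation function reported on the slide.
GAMMA = 0.576014      # offset added after the asinh
ETA = 0.884646        # multiplier on the asinh
EPSILON = 99.9377     # location subtracted from X
LAMBDA = 0.0840756    # scale dividing (X - EPSILON)

def johnson_su(x):
    """Transform a % registration reading to the approximately normal scale."""
    return GAMMA + ETA * math.asinh((x - EPSILON) / LAMBDA)

def johnson_su_inverse(z):
    """Map a transformed value back to % registration."""
    return EPSILON + LAMBDA * math.sinh((z - GAMMA) / ETA)

for reading in (99.5, 99.9, 100.0, 100.1):   # illustrative readings
    print(reading, round(johnson_su(reading), 4))
```

The same function has to be applied to the specification limits before they are used in Z1.9, so that readings and limits are on the same scale.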

Page 30: TESCO Evaluation of Non-Normal Meter Data

Z1.9 Calculations Using Johnson Transformed G1 Data

EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9 STANDARD DEVIATION METHOD WITH DOUBLE SPECIFICATION LIMITS

Full Load Data Using Johnson Transformed Data

Example meter group has a population of 3,000 meters and 2% accuracy. Use AQL = 0.4%.

Line | Information Needed | Value Obtained | Explanation
1 | Sample Size: n | 57 | From Table I
2 | Sum of % Registrations: ∑X | |
3 | Sum of Squared % Registrations: ∑X² | |
4 | Correction Factor (CF): (∑X)²/n | |
5 | Corrected Sum of Squares (SS): ∑X² - CF | |
6 | Variance (V): SS/(n-1) | |
7 | Estimate of Lot Std. Deviation (S): √V | 1.089 |
8 | Sample Mean (X bar): (∑X)/n | 0.012 |
9 | Upper Specification Limit: U | 3.49830 | (102.0)
10 | Lower Specification Limit: L | -2.47898 | (98.0)
11 | Quality Index (upper): QU = (U - X bar)/S | 3.207 | (3.49830-0.012)/1.089
12 | Quality Index (lower): QL = (X bar - L)/S | 2.292 | (0.012-(-2.47898))/1.08
13 | Est. of Lot % Out of Limits Above U: Pu | 0.035% | From Table V
14 | Est. of Lot % Out of Limits Below L: Pl | 0.954% | From Table V
15 | Total Est. % Out of Limits: P = Pu + Pl | 0.989% | 0.035% + 0.954%
16 | Maximum Allowable % Out of Limits: M | 1.16% | From Table II and IV using AQL = 0.4
17 | Acceptability Criterion: Pu + Pl < M | 0.989% < 1.16% | Therefore, the meter group is acceptable for continued service.

Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either Qu or Ql or both are negative, then the lot does not meet the acceptability criterion.
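The Pu and Pl lookups can be approximated in code. The sketch below assumes the table values follow the minimum-variance unbiased estimator usually associated with the standard deviation method: a symmetric beta distribution with parameters n/2 - 1, evaluated at max(0, 1/2 - Q√n / (2(n - 1))). That link to the tables is an assumption, not something stated in the presentation, and the integration is a plain Simpson's rule written for illustration. For QL = 2.292 and n = 57 it gives a value close to the 0.954% in line 14.

```python
import math

def estimated_fraction_beyond_limit(q, n, intervals=4000):
    """Estimated lot fraction beyond one limit, from quality index q and n.

    Assumes the symmetric-beta estimator; `intervals` must be even.
    """
    if n < 5:
        raise ValueError("this sketch assumes n >= 5")
    a = n / 2.0 - 1.0
    x = min(1.0, max(0.0, 0.5 - q * math.sqrt(n) / (2.0 * (n - 1))))
    if x == 0.0:
        return 0.0
    log_beta = 2.0 * math.lgamma(a) - math.lgamma(2.0 * a)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1.0) * (math.log(t) + math.log(1.0 - t)) - log_beta)

    # Simpson's rule on [0, x].
    h = x / intervals
    total = density(0.0) + density(x)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0

p_lower = estimated_fraction_beyond_limit(2.292, 57)
print(f"Estimated % below L: {100 * p_lower:.3f}%")
```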

Page 31: TESCO Evaluation of Non-Normal Meter Data

Checking for Alternative Distributions

• Find another known distribution that fits the data:
– Normal
– Lognormal
– Exponential
– Weibull
– Smallest or Largest Extreme Value
– Gamma
– Logistic
– Loglogistic

• Can use Minitab or another statistics software package.

• Calculate probabilities based on the distribution and determine how to accept the results.

• Very difficult to do.

Page 32: TESCO Evaluation of Non-Normal Meter Data

Exercise: Check for Alternative Distribution

See Centron data

[Figure: Probability Plot for Centron FL, four panels with 95% CI: Normal, Lognormal, 3-Parameter Lognormal, Exponential. Vertical axis: Percent (0.01 to 99.99, probability scale).]

Goodness of Fit Test

Distribution | AD | P-Value
Normal | 2.576 | < 0.005
Lognormal | 2.587 | < 0.005
3-Parameter Lognormal | 2.565 | *
Exponential | 564.693 | < 0.003

No distributions fit the data because it is basically a normal distribution with kurtosis.

Page 33: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk – Histograms & CDF

• Develop a histogram of the data.

• Develop the cumulative probability distribution and compare it to the normal probability distribution.

• Calculate and plot the empirical cdf using Minitab and compare it to the normal cdf.

• How close are the data plot and the normal plot?

Sampling is frequently from a population that is approximately normal. If the deviation from normality is not large, the best procedure may be to proceed with the standard Z1.9 methods and interpret the results with some degree of caution.

See AM 250 example (Excel and Minitab).

Page 34: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk – AM 250 Data Histograms & CDF

[Figure: Histogram (with Normal Curve) of AM 250 Sort_1 no out. Horizontal axis: 94.4 to 105.6. Vertical axis: Frequency (0 to 1000). Mean 99.85, StDev 0.8336, N 7638.]

Data is not normal and does not transform using Box-Cox or Johnson.

Page 35: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk - AM 250 Histogram CRF Data

Class | Bins | Frequency | Relative Frequency | Cumulative Relative Frequency (Data) | Cumulative Relative Frequency (Normal)
93.95 to 94.95 | 94.95 | 2 | 0.03% | 0.03% | 0.000%
94.95 to 95.95 | 95.95 | 7 | 0.09% | 0.12% | 0.000%
95.95 to 96.95 | 96.95 | 12 | 0.16% | 0.27% | 0.026%
96.95 to 97.95 | 97.95 | 65 | 0.85% | 1.13% | 1.147%
97.95 to 98.95 | 98.95 | 871 | 11.40% | 12.53% | 14.107%
98.95 to 99.95 | 99.95 | 3269 | 42.80% | 55.33% | 54.914%
99.95 to 100.95 | 100.95 | 2938 | 38.47% | 93.79% | 90.701%
100.95 to 101.95 | 101.95 | 392 | 5.13% | 98.93% | 99.416%
101.95 to 102.95 | 102.95 | 45 | 0.59% | 99.52% | 99.990%
102.95 to 103.95 | 103.95 | 19 | 0.25% | 99.76% | 100.000%
103.95 to 104.95 | 104.95 | 10 | 0.13% | 99.90% | 100.000%
104.95 to 105.95 | 105.95 | 8 | 0.10% | 100.00% | 100.000%

Count: 7638
Range: 11.4
Min: 94.3
Max: 105.7
St Dev: 0.83355

Compare Data CRF to Normal
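The comparison of the data CRF to the normal CRF can be reproduced from the bin counts alone. The sketch below uses the frequencies and standard deviation from this slide and the mean of 99.85 printed on the AM 250 histogram. Because that mean is rounded, the normal column it produces differs slightly in the later decimals from the one shown here. The "largest gap" summary is an addition for illustration, not a step the presenters describe.

```python
from statistics import NormalDist

# Upper bin edges and frequencies from the AM 250 slide, lowest bin first.
bin_upper_edges = [94.95, 95.95, 96.95, 97.95, 98.95, 99.95,
                   100.95, 101.95, 102.95, 103.95, 104.95, 105.95]
counts = [2, 7, 12, 65, 871, 3269, 2938, 392, 45, 19, 10, 8]
MEAN, ST_DEV = 99.85, 0.83355   # mean as printed on the histogram (rounded)

def compare_crf_to_normal(edges, freqs, mean, st_dev):
    """Return (edge, data CRF, normal CDF, absolute gap) for each bin."""
    total = sum(freqs)
    normal = NormalDist(mean, st_dev)
    rows, running = [], 0
    for edge, freq in zip(edges, freqs):
        running += freq
        data_crf = running / total
        normal_cdf = normal.cdf(edge)
        rows.append((edge, data_crf, normal_cdf, abs(data_crf - normal_cdf)))
    return rows

rows = compare_crf_to_normal(bin_upper_edges, counts, MEAN, ST_DEV)
for edge, data_crf, normal_cdf, gap in rows:
    print(f"{edge:7.2f}  data {data_crf:8.3%}  normal {normal_cdf:8.3%}  gap {gap:7.3%}")

# Largest vertical distance between the two cumulative curves.
worst_edge, _, _, worst_gap = max(rows, key=lambda r: r[3])
print(f"Largest gap: {worst_gap:.2%} at bin edge {worst_edge}")
```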

Page 36: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk – Empirical CDF of AM 250

[Figure: Empirical CDF of AM 250 Sort_1 no out (Normal). Horizontal axis: 95.0 to 105.0. Vertical axis: Percent (0 to 100). Mean 99.85, StDev 0.8336, N 7638. Blue curve = normal CDF; red curve = AM 250 data CDF.]

The curves are nearly identical: there is little risk in using Z1.9 even though the data is not normal.

Page 37: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk - OCCs

Similar to Empirical cdf but difficult to evaluate

Typical OC Curve for AQL ~ 1%.

OCC shape will change depending on characteristics of the non normality.

The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling.

Page 38: TESCO Evaluation of Non-Normal Meter Data

Assessment of Risk - OCCs

ANSI/ASQ Z1.9 OCCs for Larger Population Size

Page 39: TESCO Evaluation of Non-Normal Meter Data

Can Always Punt – Use Z1.4

• Normality is not a requirement. OCCs are usually calculated with Poisson, hypergeometric, or binomial distributions.

• Go/No Go evaluation of the sample. Just count the number of meters that fail the test.

• Downside: requires a larger sample size than Z1.9.
– A sample of 75 meters using Z1.9 requires 200 using Z1.4.
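Because an attributes plan only counts failures, its OC curve can be computed directly from the binomial distribution without any normality assumption. In the sketch below the sample size of 200 comes from the slide, but the acceptance number of 5 is hypothetical and not taken from the Z1.4 tables. The real value depends on the AQL and inspection level chosen.

```python
from math import comb

def prob_accept(n, c, p):
    """Binomial probability of at most c failures in a sample of n
    when the lot fraction nonconforming is p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

SAMPLE_SIZE = 200     # sample size quoted on the slide for Z1.4
ACCEPT_NUMBER = 5     # hypothetical acceptance number, for illustration only

# A few points on the OC curve: probability of acceptance versus lot quality.
for pct in (0.5, 1.0, 2.0, 3.0, 5.0):
    pa = prob_accept(SAMPLE_SIZE, ACCEPT_NUMBER, pct / 100)
    print(f"{pct:4.1f}% nonconforming: probability of acceptance {pa:.3f}")
```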

Page 40: TESCO Evaluation of Non-Normal Meter Data

Summary

• There are lots of meaningful statistical tools you can use to analyze your data.

• It just means you may have to think about your data in a slightly different way. Best choices:

– Use the Johnson transformation to try to transform the data so that it becomes approximately normal. Use Z1.9.

– Using histograms and empirical cdfs, evaluate the risk of assuming a normal distribution and using Z1.9.

– Use Z1.4 and a larger sample.

If the Data Is Not Normal, Don’t Panic!