TESCO Evaluation of Non-Normal Meter Data
Evaluation of Non-Normal Meter Test Data
The Eastern Specialty Company
February 28, 2012
Frank Garcia, Fred Rispoli
Session Objectives
Understand:
• Requirements of normality in ANSI/ASQ Z1.9 and how that affects analysis of meter test data.
• Review typical meter test data distributions and how to determine if meter test data is normal.
• Introduction to working with non-normal data.
• Assessing the risk of using ANSI/ASQ Z1.9 with non-normal meter test data.
Any Other Issues or Items of Interest?
We Assume That Population Fits a Statistical Model
• The statistical model being used for the sampling/testing plan needs to match the actual distribution of the population.
• In most circumstances, one is looking at a normal or Gaussian distribution (i.e. a Bell curve) based on sampling theory.
• We like ANSI/ASQ Z1.9 for ease of use and universal acceptance as a sampling plan.
What does ANSI/ASQ Z1.9 Say About Normality?
Paragraph A8 states: “This standard assumes the underlying distribution of individual measurements to be normal in shape. Failure of this assumption to be valid will affect the Operating Characteristic (OC) curves and probabilities based on these curves. In particular it will affect the estimate of percentage non conforming calculated from the mean and standard deviation of the distribution. The assumption should be verified prior to use of this standard.”
ANSI/ASQ Z1.9
Sampling Procedures and Tables for Inspection by Variables for Percent
Nonconforming
• Various methods (Variability Unknown - Standard Deviation Method, Variability Unknown - Range Method, and Variability Known Method)
• All methods can be used with single or double specification limits.
Z1.9 Calculations
EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9-2003 STANDARD DEVIATION METHOD WITH DOUBLE
SPECIFICATION LIMITS
Weighted Average Calculated Data
Example meter group has a population of 475 meters and 2% accuracy. Use AQL = 2.5
1. Sample Size: n = 25 (from Table I)
2. Sum of % Registrations: ∑X = 2506.1
3. Sum of Squared % Registrations: ∑X² = 251222.03
4. Correction Factor (CF): (∑X)²/n = 251221.49  [(2506.1)²/25]
5. Corrected Sum of Squares (SS): ∑X² − CF = 0.5416
6. Variance (V): SS/(n − 1) = 0.0226  [0.5416/24]
7. Estimate of Lot Std. Deviation (S): √V = 0.1503  [√0.0226]
8. Sample Mean (X bar): ∑X/n = 100.24  [2506.1/25]
9. Upper Specification Limit: U = 102.0
10. Lower Specification Limit: L = 98.0
11. Quality Index (upper): QU = (U − X bar)/S = 11.71  [(102.0 − 100.24)/0.1503]
12. Quality Index (lower): QL = (X bar − L)/S = 14.90  [(100.24 − 98.0)/0.1503]
13. Est. of Lot % Out of Limits Above U: Pu = 0.00% (from Table V)
14. Est. of Lot % Out of Limits Below L: Pl = 0.00% (from Table V)
15. Total Est. % Out of Limits: P = Pu + Pl = 0.00% (0.00% + 0.00%)
16. Maximum Allowable % Out of Limits: M = 5.98% (from Tables II and IV using AQL = 2.5)
17. Acceptability Criterion: Pu + Pl < M; 0.00% < 5.98%, therefore the meter group is acceptable for continued service.
Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either Qu or Ql or both are negative, then the lot does not meet the acceptability criterion.
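To make the arithmetic above concrete, here is a minimal Python sketch of the standard deviation method for double specification limits. It is only an illustration: the sample data are hypothetical, and the Table V lookups for Pu and Pl (and the Table II/IV lookup for M) still have to come from the ANSI/ASQ Z1.9 tables.

```python
import math

def z19_quality_indices(registrations, U, L):
    """Compute the Z1.9 standard-deviation-method quantities (lines 1-12 above)."""
    n = len(registrations)                          # 1. sample size
    sum_x = sum(registrations)                      # 2. sum of % registrations
    sum_x2 = sum(x * x for x in registrations)      # 3. sum of squared % registrations
    cf = sum_x ** 2 / n                             # 4. correction factor
    ss = sum_x2 - cf                                # 5. corrected sum of squares
    v = ss / (n - 1)                                # 6. variance
    s = math.sqrt(v)                                # 7. estimate of lot std. deviation
    xbar = sum_x / n                                # 8. sample mean
    qu = (U - xbar) / s                             # 11. upper quality index
    ql = (xbar - L) / s                             # 12. lower quality index
    return xbar, s, qu, ql

# Hypothetical sample of 25 % registration readings
sample = [100.2, 100.3, 100.1, 100.4, 100.2] * 5
xbar, s, qu, ql = z19_quality_indices(sample, U=102.0, L=98.0)
print(f"X bar = {xbar:.2f}, S = {s:.4f}, QU = {qu:.2f}, QL = {ql:.2f}")
# Pu and Pl are then read from Table V using QU, QL, and n; the lot is acceptable
# if Pu + Pl <= M (from Tables II and IV) and neither quality index is negative.
```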
Operating Characteristic Curves - OCCs
Ideal OC Curve
When sampling, you face the risk of rejecting lots of AQL quality as well as the risk of accepting lots of poorer than AQL quality. We are interested in knowing how an acceptance sampling plan will accept or not accept lots over various lot qualities. A curve showing the probability of acceptance over various lot or process qualities is called the operating characteristic (OC) curve.
The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling
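As an illustration of how an OC curve is constructed, consider a simple single-sampling attributes plan with sample size n and acceptance number c (the variables-plan OC curves used by Z1.9 are tabulated in the standard rather than computed from this formula):

```latex
P_a(p) \;=\; \sum_{d=0}^{c} \binom{n}{d}\, p^{d} (1-p)^{n-d}
```

where p is the lot fraction nonconforming and P_a(p) is the probability the lot is accepted. Plotting P_a against p gives the OC curve.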
Challenge of Meter Accuracy Test Data Distributions
• Electromechanical electric meter accuracy test data tends to have more variability and tends to be normally distributed.
• Electronic (AMR) and digital (AMI) meter accuracy is very high, and the test data tend to have a higher percentage of test points around the mean.
– The distribution has a different flatness, or what we call “kurtosis”, which results in a non-normal distribution.
– AMI meter test data is even more concentrated.
Skew and Kurtosis
[Figure: skewed distributions – positive skew (mean > median) and negative skew (mean < median)]
AMR Meter Accuracy Test Data Distribution
Skew: - 0.2 Kurtosis: +0.85
AMI Meter Accuracy Test Data Distribution
Skew: -12.2 Kurtosis: +230.05
Gas Meter Accuracy Test Data Distribution
Skew: - 0.33 Kurtosis: +4.85
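A minimal Python sketch of how skewness and kurtosis figures such as those above can be computed; the readings array is hypothetical, and scipy reports excess kurtosis (approximately 0 for normal data), matching the convention used in the figures above.

```python
import numpy as np
from scipy import stats

# Hypothetical % registration readings from a meter test
readings = np.array([99.98, 100.01, 100.00, 99.97, 100.02, 100.00, 99.99, 95.20])

print(f"Skew: {stats.skew(readings):+.2f}")          # negative: tail toward low registrations
print(f"Kurtosis: {stats.kurtosis(readings):+.2f}")  # excess kurtosis; ~0 for normal data
```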
The Normal Curve
The normal distribution is the most recognized distribution in statistics. The normal curve is a smooth, symmetrical, bell-shaped curve generated by the density function.
It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately normally distributed.
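For reference, the density function referred to above is:

```latex
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
```

where μ is the mean and σ is the standard deviation.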
Normal Distribution
Each combination of mean and standard deviation generates a unique normal curve
“Standard” Normal Distribution
– Has a μ = 0, and σ = 1
– Data from any normal distribution can be made to fit the standard normal by converting raw scores to standard scores.
– Z-scores measure how many standard deviations from the mean a particular data-value lies.
Empirical Rule
The Empirical Rule…
• 68.27% of the data will fall within ±1 standard deviation
• 95.45% of the data will fall within ±2 standard deviations
• 99.73% of the data will fall within ±3 standard deviations
• 99.9937% of the data will fall within ±4 standard deviations
• 99.999943% of the data will fall within ±5 standard deviations
• 99.9999998% of the data will fall within ±6 standard deviations
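These coverage figures follow directly from the standard normal CDF; a minimal Python check:

```python
from scipy.stats import norm

for k in range(1, 7):
    coverage = norm.cdf(k) - norm.cdf(-k)   # P(mean - k*sigma < X < mean + k*sigma)
    print(f"+/-{k} sigma: {coverage * 100:.7f}%")
```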
Why Assess Normality?
While many processes behave according to the normal distribution, many distributions in meter testing are not normal.
There are many types of distributions:
There are many statistical tools, such as Z1.9, that assume normal distribution properties in their calculations.
So understanding just how “normal” the data is will impact how we look at the data.
Tools for Assessing Normality
The shape of any normal curve can be calculated based on the normal probability density function.
Tests for normality basically compare the shape of the calculated curve to the actual distribution of your data points.
For the purposes of this training, we will focus on using the Anderson-Darling test and normal probability plots in MINITAB™ to assess normality.
Watch that curve!
Goodness-of-Fit
The Anderson-Darling test uses an empirical cumulative distribution function.
[Figure: cumulative percent vs. raw data scale, comparing the actual data to the values expected for a normal distribution]
Departure of the actual data from the expected normal distribution: the Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
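A minimal Python sketch of an Anderson-Darling normality check on a hypothetical data set. Note that scipy reports the A-squared statistic with critical values rather than the p-value that Minitab prints on its probability plots.

```python
import numpy as np
from scipy import stats

# Hypothetical % registration readings
rng = np.random.default_rng(1)
data = rng.normal(loc=100.0, scale=0.15, size=100)

result = stats.anderson(data, dist="norm")
crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
print(f"A-squared = {result.statistic:.3f}, 5% critical value = {crit_5pct:.3f}")
print("Consistent with normality" if result.statistic < crit_5pct
      else "Reject normality at the 5% level")
```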
The Normal Probability Plot
[Figure: Probability Plot of Amount (Normal) – Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684]
The Anderson-Darling test is a good litmus test for normality: if the P-value is more than 0.05, your data are normal enough for most purposes. Notice the percent scale on the vertical axis.
Anderson-Darling Caveat
Use the Anderson Darling column to generate these graphs.
In this case, both the Histogram and the Normality Plot look very “normal”. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from normal will cause the p-value to be very low.
Examples: Centron & AM 250 data
Anderson Darling
[Figure: Probability Plot of Anderson Darling (Normal) – Mean 50.03, StDev 4.951, N 500, AD 0.177, P-Value 0.921]
[Figure: Summary for Anderson Darling – Anderson-Darling Normality Test: A-Squared 0.18, P-Value 0.921; Mean 50.031, StDev 4.951, Variance 24.511, Skewness −0.062, Kurtosis −0.180, N 500; Minimum 35.727, 1st Quartile 46.800, Median 50.006, 3rd Quartile 53.218, Maximum 62.823; 95% confidence intervals – Mean: 49.596 to 50.466, Median: 49.663 to 50.500, StDev: 4.662 to 5.278]
If the Data Is Not Normal, Don’t Panic!
• There are lots of meaningful statistical tools you can use to analyze your data.
• It just means you may have to think about your data in a slightly different way.
Don’t touch that button!
Non Normality
Why do we care if a data set is normally distributed?
– When it is necessary to make inferences about the true nature of the population based on random samples drawn from the population.
– For problem-solving purposes, because we don’t want to make a bad decision; having normal data is so critical that with EVERY statistical test, the first thing we do is check for normality of the data.
Some of the primary causes of non-normal data:
– Skewness
– Natural and artificial limits
– Mixed distributions (multiple modes)
– Kurtosis
Non Normal Data Analysis
What happens if the process is not normally distributed?
We are usually concerned about this when doing process capability analysis or hypothesis testing, but it can also be a factor when using Z1.9.
• Use the Box-Cox or Johnson transformations to try to transform the data so that they become approximately normal.
• Find another known distribution that fits the data.
• Evaluate the risk of assuming a normal distribution using histograms, empirical CDFs, and OCCs.
Box Cox Transformations
• Box-Cox transformations are used to try to convert non-normal data into normal data.
• The Box-Cox procedure transforms the input data, denoted g, using W = g^λ, where λ is any number, typically between −5 and 5 (a minimal sketch follows this list).
• The trick is to choose a λ that produces a curve that is as close to normal as possible.
• Statistics software recommends a range for λ, often with a confidence interval.
• The user may choose values for λ and observe the resulting curves.
• Limited to positive data values, and assumes the data are in subgroups.
• This is a problem when using it with meter test data.
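A minimal Python sketch of a Box-Cox fit. Here scipy estimates λ by maximum likelihood rather than Minitab’s subgrouped standard-deviation plot, and, as noted above, the data must be strictly positive; the data set is hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive data
rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.5, size=200)

transformed, lam = stats.boxcox(data)   # W = data**lam (log(data) when lam is 0)
print(f"Estimated lambda: {lam:.3f}")

# Did the transformation help? Compare Anderson-Darling statistics (lower is better).
print("Original A-squared:   ", round(stats.anderson(data, "norm").statistic, 3))
print("Transformed A-squared:", round(stats.anderson(transformed, "norm").statistic, 3))
```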
Exercise: Box Cox Transformations
Examples:
Dimensions Example
AM 250 Data
[Figure: Box-Cox Plot of AM 250 Sort_1 no out – StDev vs. lambda; Estimate 0.06, Lower CL −0.50, Upper CL 0.71, Rounded Value 0.00 (using 95.0% confidence)]
The AM 250 data does not transform using Box-Cox. Most meter data will not transform with Box-Cox.
Johnson Transformations
• Johnson transformations are used to try to convert non-normal data into normal data.
• The Johnson transformation is chosen from three different functions.
• Does not assume the data are in subgroups.
• The Johnson transformation is more powerful than a Box-Cox transformation, so it works more often with meter data files.
Exercise: Johnson Transformations
1. Use G1 data
2. Based on the probability plots what can you say about the original data and the transformed data?
3. Use the transformed data in Z1.9.
G1 Data & Histogram
[Figure: Normal Histogram of G1 Sorted Data (FPL G1 Sorted) – Mean 99.41, StDev 3.344, N 57]
Variable G1: Total Count 57, N 57, Mean 99.410, StDev 3.344, Minimum 74.650, Median 99.890, Maximum 100.110, Range 25.460, Skewness −7.51, Kurtosis 56.61
The data is non-normal.
G1 Johnson Transformed Data
[Figure: Johnson Transformation for G1 Sorted_1 –
Probability plot for original data: N 56, AD 3.003, P-Value < 0.005 (P-Value = 0.005 means <= 0.005);
Probability plot for transformed data: N 56, AD 0.259, P-Value 0.703;
Select a transformation: P-Value for best fit 0.702832, Z for best fit 0.6, best transformation type SU;
Transformation function: 0.576014 + 0.884646 * Asinh((X − 99.9377) / 0.0840756)]
Transformed Data Now Satisfies the Normality Requirement
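A minimal Python sketch applying a Johnson SU transformation of the form Minitab reports above. The four constants are copied from the transformation function in the output; the g1 array is a hypothetical stand-in for the G1 readings, which are not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the G1 % registration readings
g1 = np.array([99.90, 99.95, 100.00, 99.88, 99.91, 74.65, 99.97, 100.05, 99.93, 99.96])

# Johnson SU transformation: Z = gamma + eta * asinh((X - epsilon) / lam)
gamma, eta, epsilon, lam = 0.576014, 0.884646, 99.9377, 0.0840756
transformed = gamma + eta * np.arcsinh((g1 - epsilon) / lam)

# The transformed values (not the raw readings) are what feed the Z1.9 calculation
print(stats.anderson(transformed, "norm").statistic)
```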
Z1.9 Calculations Using Johnson Transformed G1 Data
EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9 STANDARD DEVIATION METHOD WITH DOUBLE
SPECIFICATION LIMITS
Full Load Data Using Johnson Transformed Data
Example meter group has a population of 3,000 meters and 2% accuracy. Use AQL = 0.4%
1. Sample Size: n = 57 (from Table I)
2. Sum of % Registrations: ∑X
3. Sum of Squared % Registrations: ∑X²
4. Correction Factor (CF): (∑X)²/n
5. Corrected Sum of Squares (SS): ∑X² − CF
6. Variance (V): SS/(n − 1)
7. Estimate of Lot Std. Deviation (S): √V = 1.089
8. Sample Mean (X bar): ∑X/n = 0.012
9. Upper Specification Limit: U = 3.49830 (transformed from 102.0)
10. Lower Specification Limit: L = −2.47898 (transformed from 98.0)
11. Quality Index (upper): QU = (U − X bar)/S = 3.207  [(3.49830 − 0.012)/1.089]
12. Quality Index (lower): QL = (X bar − L)/S = 2.292  [(0.012 − (−2.47898))/1.089]
13. Est. of Lot % Out of Limits Above U: Pu = 0.035% (from Table V)
14. Est. of Lot % Out of Limits Below L: Pl = 0.954% (from Table V)
15. Total Est. % Out of Limits: P = Pu + Pl = 0.989% (0.035% + 0.954%)
16. Maximum Allowable % Out of Limits: M = 1.16% (from Tables II and IV using AQL = 0.4)
17. Acceptability Criterion: Pu + Pl < M; 0.989% < 1.16%, therefore the meter group is acceptable for continued service.
Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either Qu or Ql or both are negative, then the lot does not meet the acceptability criterion.
Checking for Alternative Distributions
• Find another known distribution that fits the data: Normal, Lognormal, Exponential, Weibull, Smallest or Largest Extreme Value, Gamma, Logistic, Loglogistic.
• Can use Minitab or another statistics software package (a minimal sketch follows this list).
• Calculate probabilities based on the fitted distribution and determine how to accept the results.
• Very difficult to do.
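A minimal Python sketch of screening a few candidate families by their Anderson-Darling statistics; scipy’s anderson() only covers a handful of distributions, so families such as lognormal, Weibull, or gamma would be fit separately (for example with stats.lognorm.fit) and judged from probability plots, as Minitab’s Individual Distribution Identification does. The data set is hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical meter accuracy readings
rng = np.random.default_rng(3)
data = rng.normal(100.0, 0.3, size=500)

# A lower A-squared statistic indicates a better fit to that family
for family in ("norm", "expon", "logistic", "gumbel"):
    result = stats.anderson(data, dist=family)
    print(f"{family:8s} A-squared = {result.statistic:.3f}")
```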
Exercise: Check for Alternative Distribution
See Centron data
[Figure: Probability Plot for Centron FL, four panels with 95% CI (Normal, Lognormal, 3-Parameter Lognormal, Exponential) – Goodness-of-Fit Test: Normal AD = 2.576, P-Value < 0.005; Lognormal AD = 2.587, P-Value < 0.005; 3-Parameter Lognormal AD = 2.565, P-Value = *; Exponential AD = 564.693, P-Value < 0.003]
No distribution fits the data, because it is basically a normal distribution with kurtosis.
Assessment of Risk – Histograms & CDF
• Develop a histogram of the data.
• Develop the cumulative probability distribution and compare it to the normal probability distribution.
• Calculate and plot the empirical CDF using Minitab and compare it to the normal CDF.
• How close are the data plot and the normal plot?
Sampling is frequently from a population that is approximately normal. If the deviation from normality is not large, the best procedure may be to proceed with the standard Z1.9 methods and interpret the results with some degree of caution.
See the AM 250 example in Excel and Minitab.
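A minimal Python sketch of the empirical CDF versus normal CDF comparison described above; hypothetical data stands in for the AM 250 file.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the AM 250 % registration data
rng = np.random.default_rng(11)
data = np.sort(rng.normal(99.85, 0.83, size=7638))

# Empirical CDF: fraction of observations at or below each sorted value
ecdf = np.arange(1, len(data) + 1) / len(data)

# Normal CDF with the same mean and standard deviation as the data
normal_cdf = stats.norm.cdf(data, loc=data.mean(), scale=data.std(ddof=1))

# A small maximum gap between the curves suggests little risk in assuming normality
print(f"Maximum CDF difference: {np.abs(ecdf - normal_cdf).max():.4f}")
```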
Assessment of Risk – AM 250 Data Histograms & CDF
[Figure: Histogram (with Normal Curve) of AM 250 Sort_1 no out – Mean 99.85, StDev 0.8336, N 7638]
The data is not normal and does not transform using Box-Cox or Johnson.
Assessment of Risk - AM 250 Histogram CRF Data
Class               Bin      Frequency   Relative Freq.   Cum. Rel. Freq. (Data)   Cum. Rel. Freq. (Normal)
93.95 to 94.95      94.95          2         0.03%              0.03%                    0.000%
94.95 to 95.95      95.95          7         0.09%              0.12%                    0.000%
95.95 to 96.95      96.95         12         0.16%              0.27%                    0.026%
96.95 to 97.95      97.95         65         0.85%              1.13%                    1.147%
97.95 to 98.95      98.95        871        11.40%             12.53%                   14.107%
98.95 to 99.95      99.95       3269        42.80%             55.33%                   54.914%
99.95 to 100.95    100.95       2938        38.47%             93.79%                   90.701%
100.95 to 101.95   101.95        392         5.13%             98.93%                   99.416%
101.95 to 102.95   102.95         45         0.59%             99.52%                   99.990%
102.95 to 103.95   103.95         19         0.25%             99.76%                  100.000%
103.95 to 104.95   104.95         10         0.13%             99.90%                  100.000%
104.95 to 105.95   105.95          8         0.10%            100.00%                  100.000%
Count: 7638   Min: 94.3   Max: 105.7   Range: 11.4   St Dev: 0.83355
Compare the data cumulative relative frequency (CRF) to the normal CRF.
Assessment of Risk – Empirical CDF of AM 250
[Figure: Empirical CDF of AM 250 Sort_1 no out, with fitted Normal CDF – Mean 99.85, StDev 0.8336, N 7638]
Blue Curve = Normal CDF Red Curve = AM 250 Data CDF
The curves are nearly identical: there is little risk in using Z1.9 even though the data is not normal.
Assessment of Risk - OCCs
Similar to the empirical CDF approach, but more difficult to evaluate.
Typical OC Curve for AQL ~ 1%.
OCC shape will change depending on characteristics of the non normality.
The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling.
Assessment of Risk - OCCs
ANSI/ASQ Z1.9 OCCs for Larger Population Size
Can Always Punt – Use Z1.4
• Normality is not a requirement; OCCs are usually calculated with Poisson, hypergeometric, or binomial distributions (see the sketch after this list).
• Go/No-Go evaluation of the sample: just count the number of meters that fail the test.
• Downside: requires a larger sample size than Z1.9 – a sample of 75 meters using Z1.9 requires 200 using Z1.4.
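A minimal Python sketch of the go/no-go logic and the binomial OC-curve calculation behind it; the sample size and acceptance number used here are hypothetical, not values taken from the Z1.4 tables.

```python
from math import comb

def prob_accept(n, c, p):
    """Probability of accepting a lot with fraction nonconforming p
    when n meters are tested and at most c failures are allowed."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

# Hypothetical attributes plan: test 200 meters, accept the lot if 2 or fewer fail
n, c = 200, 2
for p in (0.001, 0.005, 0.01, 0.02, 0.04):
    print(f"Lot {p:.1%} nonconforming -> P(accept) = {prob_accept(n, c, p):.3f}")
```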
Summary
• There are lots of meaningful statistical tools you can use to analyze your data.
• It just means you may have to think about your data in a slightly different way. Best choices:
– Use the Johnson transformation to try to transform the data so that it becomes approximately normal, then use Z1.9.
– Using histograms and empirical CDFs, evaluate the risk of assuming a normal distribution and using Z1.9.
– Use Z1.4 and a larger sample.
If the Data Is Not Normal, Don’t Panic!