2010 - Estimating Reference Intervals

Am J Clin Pathol 2010;133:175-177. DOI: 10.1309/AJCPQ4N7BRZQVHAL. © American Society for Clinical Pathology

AJCP / Editorial

Estimating Reference Intervals

Gary L. Horowitz, MD

As indicated in the thought-provoking article in this issue of the Journal by Katayev and colleagues,¹ there are few things more important than the reference intervals we report along with our laboratory measurements. Unfortunately, we laboratory professionals give them far too little attention, adopting manufacturers’ recommended intervals, often without even verifying them ourselves and rarely, if ever, establishing our own values. Thus, the article “Establishing Reference Intervals for Clinical Laboratory Test Results: Is There a Better Way?” is certain to attract much attention.

Katayev and colleagues¹ make several assertions that bear comment. They claim that there is no clear guideline as to which technique to use, but the recommendations of the International Federation of Clinical Chemistry and Laboratory Medicine² and the CLSI³ are quite clear. If one collects samples carefully from 120 vetted healthy people, then the technique of choice is nonparametric analysis. The reason for this is that the nonparametric technique requires no knowledge of, and makes no assumptions about, the nature of the data distribution. In other words, the reference interval values obtained are valid no matter what the underlying distribution is. If one has fewer samples, again from carefully screened, apparently healthy people, one can use a parametric technique with as few as 40 points, so long as the original data (or some transformed version of the data) exhibit a gaussian distribution. And with even fewer than 40 data points, one can use robust techniques to get an estimate of the reference interval.³,⁴ Notwithstanding the assertions by Katayev and colleagues,¹ the reference intervals obtained from properly collected and analyzed data do not vary depending on the technique used.³

Katayev and colleagues¹ also make the point that it becomes prohibitively difficult to collect sufficient data for all the potential partitions for which one might want reference intervals. They mention specifically sex and age (eg, deciles), but one might also include others (eg, fasting and race or ancestry). In this regard, it is particularly interesting that Katayev and colleagues¹ did not specifically mention differences (or lack thereof) for any partitions, with the exception of sex for hemoglobin and creatinine. In his original article, Hoffman⁵ demonstrated the use of his technique with “just” 500 points. Clearly, Katayev and colleagues¹ had more than enough data to look at every analyte by sex and by age. One wonders, for example, whether there was any effect of sex and age on calcium or on thyroid-stimulating hormone (TSH).

The fact that something is difficult to do does not negate the importance, or usefulness, of doing it. As a particularly good example, a group in the Netherlands recently did a superb reference interval study.⁶ By collecting data for 1,444 people and using the recommended nonparametric method of analysis, they determined that creatine kinase reference intervals varied tremendously not only by sex but also by race/ancestry. Specifically, the 97.5th percentile for women varied from 201 to 313 to 414 IU/L for “white Europeans, South Asians, and blacks,” respectively; for men, the corresponding values were 322, 641, and 801 IU/L. By using the manufacturer’s reference intervals (defined as a single partition), the authors showed that the proportion of healthy women whose values were “abnormal” was 8% for white Europeans, 16% for South Asians, and 42% for blacks; for men, the corresponding values were 17%, 32%, and 62%.
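As a minimal sketch of the nonparametric analysis recommended for 120 or more carefully vetted reference subjects, the limits can be read directly from the ranked values with no assumption about the shape of the distribution. The function name, the `min_n` parameter, and the rank rule r = p(n + 1) with linear interpolation are illustrative assumptions here, not text quoted from the guidelines.

```python
def nonparametric_reference_interval(values, lower=0.025, upper=0.975, min_n=120):
    """Return (low, high) limits taken from ranked reference values.

    Assumption: percentile ranks are computed as r = p * (n + 1), with
    linear interpolation between neighboring ranked values.
    """
    xs = sorted(values)
    n = len(xs)
    if n < min_n:
        raise ValueError(f"need at least {min_n} reference values, got {n}")

    def value_at(p):
        # 1-based rank, clamped so it never falls outside the observed data
        r = min(max(p * (n + 1), 1.0), float(n))
        k = int(r)  # whole part of the rank
        if k >= n:
            return xs[-1]
        # interpolate between the k-th and (k+1)-th ranked values
        return xs[k - 1] + (r - k) * (xs[k] - xs[k - 1])

    return value_at(lower), value_at(upper)


# Hypothetical data: 120 evenly spaced results, for illustration only.
low, high = nonparametric_reference_interval(range(1, 121))
print(low, high)
```

With exactly 120 values, the limits fall just past the 3rd lowest and just short of the 118th ranked result, which is why so few extreme observations end up determining each limit.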


The truth, though, is that Katayev et al¹ are absolutely correct in pointing out that few laboratories are capable of undertaking a reference interval study of this magnitude. What can we do? As recommended in the CLSI document,³ we can verify, rather than establish from scratch, a reference interval established elsewhere by collecting samples from just 20 carefully vetted healthy people. If no more than 2 of these 20 samples have values outside the proposed interval, then it is statistically valid to adopt the proposed interval. Any laboratories that attempted to do this for creatine kinase would likely have discovered that there was a major problem with the manufacturer’s proposed interval.
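The 20-sample verification rule is simple enough to sketch in a few lines. The function names below are hypothetical, and the second function adds an assumption that is not in the text: it treats the 20 subjects as independent and uses a binomial model to show how likely a proposed interval is to pass when a given fraction of healthy people truly fall outside it.

```python
from math import comb


def verify_reference_interval(values, low, high, max_outside=2, required_n=20):
    """Apply the verification rule: at most `max_outside` of 20 results outside."""
    if len(values) != required_n:
        raise ValueError(f"expected {required_n} results, got {len(values)}")
    outside = sum(1 for v in values if v < low or v > high)
    return outside <= max_outside, outside


def pass_probability(p_outside, n=20, max_outside=2):
    """Binomial chance of passing if a fraction p_outside of healthy people
    truly fall outside the proposed interval (assumes independent subjects)."""
    return sum(
        comb(n, k) * p_outside**k * (1 - p_outside) ** (n - k)
        for k in range(max_outside + 1)
    )


# A correct 95% interval (5% outside) versus an interval that labels 42% of
# healthy people "abnormal", the figure reported for one creatine kinase group.
print(pass_probability(0.05))
print(pass_probability(0.42))
```

Under these assumptions, a correct interval passes most of the time, whereas an interval that flags 42% of healthy people passes well under 1% of the time, which is consistent with the point that verification would have exposed the creatine kinase problem.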

Of course, if you find that you cannot adopt the manufacturer’s reference interval, what do you do next? What if you could find other laboratories that used the same method, each of which collected data from its own set of 20 reference subjects? If you pooled the data from 6 laboratories, you would have 120 different values, enough to establish a reference interval using the recommended technique (at least for 1 partition). In the same way, if you could pool data from 20 or even 50 laboratories, all using the same method, you would have 400 or even 1,000 different values, which might allow you to establish reference intervals for many different partitions. As it turns out, the College of American Pathologists offers, among its proficiency testing program services, a service that does just this for its participants.⁷ It is, at least in my opinion, an excellent, although underused, service.

It is important, too, that laboratory professionals realize that, for many analytes where national or international guidelines apply, they should not attempt to establish or verify their own traditional reference intervals; they should use the indicated “decision limits.” Examples include cholesterol, high-density lipoprotein cholesterol, triglycerides, glucose, glycated hemoglobin (hemoglobin A1c), and neonatal bilirubin. In these cases, the laboratory’s job is to ensure accuracy: that its results on patient specimens match those that were used to establish the guidelines. In many laboratories, it is assumed that the methods are accurate, which is not entirely unreasonable if homogeneous systems are used in which the reagents, calibrators, and instruments have been validated by the manufacturers, who have done remarkably good jobs with many, if not all, of these analytes. Some laboratories may go further by subscribing to proficiency survey programs that use commutable materials and establish values by reference methods.⁸,⁹ The main point, though, is that efforts that would have been directed to establishing or verifying reference intervals should, for these analytes, be directed instead toward verifying accuracy of the method. By definition, roughly 50%, not 2.5%, of cholesterol values from apparently healthy Americans not taking cholesterol-lowering medication will be higher than the decision limit of 200 mg/dL.¹⁰

In this connection, it is interesting that Katayev et al¹ included creatinine among the analytes in their report. Because of differences in sex, age, and race, establishing a reference interval for creatinine is fraught with problems. Rather, laboratories should report the estimated glomerular filtration rate (GFR), using the Modification of Diet in Renal Disease Study equation, whenever they report serum creatinine values. As noted on the National Kidney Disease Education Program Web site, estimated GFR “provides a more clinically useful measure of kidney function than serum creatinine alone.”¹¹ For example, a 55-year-old non–African American woman with a creatinine level of 1.0 mg/dL, a value at the upper end of the reference interval, has an estimated GFR of 58 mL/min/1.73 m², indicative of chronic kidney disease.¹² The “normal” creatinine value is clinically misleading.
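The worked example can be reproduced with a short calculation. The sketch below assumes the four-variable MDRD Study equation in its IDMS-traceable form for conventional units (coefficient 175), which appears to be the form behind the calculator cited in reference 12. The text names only the equation, so the coefficients and the function name are assumptions here and should be checked against the calculator itself.

```python
def mdrd_egfr(creatinine_mg_dl, age_years, female, african_american):
    """Estimated GFR in mL/min/1.73 m^2 from serum creatinine in mg/dL.

    Assumed form (IDMS-traceable MDRD Study equation):
    175 * Scr^-1.154 * age^-0.203 * 0.742 (if female) * 1.212 (if African American)
    """
    egfr = 175.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if african_american:
        egfr *= 1.212
    return egfr


# The patient from the text: 55-year-old non-African American woman, 1.0 mg/dL.
print(round(mdrd_egfr(1.0, 55, female=True, african_american=False)))  # prints 58
```

The result falls below 60 mL/min/1.73 m², which illustrates how a creatinine value inside the reference interval can still correspond to reduced kidney function.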

The inclusion of TSH in the article by Katayev et al¹ is interesting as well. Because the incidence of so-called subclinical hypothyroidism is higher among women than among men, why did they not evaluate values for women and men separately? And, to the extent that subclinical hypothyroidism also increases with age, why did they not look at their data as a function of age as well as sex? Assuming their data had confirmed these phenomena, they could have questioned the validity of the current reference intervals, as others have. That is, if a TSH value in an asymptomatic older woman is outside the central 95% of values in younger people of both sexes, what should a concerned clinician do? Many would argue that it may be wise to do nothing.¹³⁻¹⁵ If that is the case, is the value genuinely “abnormal”?

And this raises the question of why we typically use 97.5% as the upper limit of the reference interval. To the extent that physicians are screening (ie, ordering tests for people without symptoms or signs), then, by definition, 2.5% of “normal” people will have “abnormal” results and, therefore, possibly be subjected to additional, unnecessary testing. Is there any real medical benefit in following up on a calcium level in the 98th percentile in the absence of symptoms or signs? This may be part of what drives up health care costs: follow-up on tests that are outside the central 95% of results but that, by themselves, warrant no further action. As laboratory professionals, we need to think beyond simply providing central 95% reference intervals and more about what we want clinicians to do with the information we provide. Perhaps the reference interval for calcium should include the central 99.0% of calcium values.
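The arithmetic behind this suggestion is easy to make explicit. The sketch below assumes a symmetric interval with equal tails and a population of healthy people having a single test; the function names and the figure of 10,000 people screened are hypothetical.

```python
def upper_tail_fraction(central_fraction):
    """Fraction of healthy people above the upper limit of a symmetric interval."""
    return (1.0 - central_fraction) / 2.0


def expected_high_flags(n_screened, central_fraction):
    """Expected number of healthy people flagged as high among n_screened."""
    return n_screened * upper_tail_fraction(central_fraction)


# Compare a central 95% interval with a central 99.0% interval.
for central in (0.95, 0.99):
    print(central, expected_high_flags(10_000, central))
```

Under these assumptions, widening the interval from the central 95% to the central 99.0% cuts the number of healthy people flagged as high from about 250 to about 50 per 10,000 screened.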

The issue of what constitutes abnormal becomes even more controversial when we turn to prostate-specific antigen (PSA) screening. A PSA value of 5.0 ng/mL is typically flagged as abnormal, but what does it mean in a man with no symptoms and no signs? Without doubt, as a result of screening, we detect prostate cancer at earlier stages, but it is not clear that, overall, patients live longer or benefit in any other ways. Are too many men undergoing biopsies, and even therapy, the benefits of which are less certain than the costs and adverse effects?¹⁶,¹⁷

Even if one grants the attractiveness of the proposal by Katayev et al,¹ it does nothing to help individual laboratories that may use different methods, or that may serve populations with different backgrounds, or that may not have the computing power to replicate their analyses. What are they to do? I would submit that they can still use the Hoffman technique as a powerful quality assurance tool.

As Katayev and colleagues¹ point out, the huge advantage of the Hoffman technique is that it does not require that samples be obtained from healthy people. Indeed, in his article, Hoffman⁵ demonstrated the technique using sample sets with admixtures of 20% abnormal values. Furthermore, the Hoffman technique does not require sophisticated statistics and computer analyses. In the original description, Hoffman⁵ plotted, on gaussian probability paper, the cumulative frequency of test values vs concentration. (It should be noted that this is very different from Figure 1 in the article by Katayev and colleagues¹; presumably, the computer techniques used make Figure 1 equivalent to the original description.) The effect of gaussian probability paper is to give much more weight to the central part of the distribution, lessening the contributions of values as they deviate from the center, values that are progressively less likely to be from healthy people.
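A computational analogue of the probability-paper plot can be sketched as follows. This is an illustration under stated assumptions, not Hoffman’s or Katayev’s exact procedure: the plotting position (i − 0.5)/n, the choice of the central 25% to 75% of results as the “linear portion,” the least-squares fit, and the function name are all assumptions, and the mixed sample in the demo is synthetic.

```python
from statistics import NormalDist


def hoffman_style_interval(values, fit_range=(0.25, 0.75)):
    """Estimate a central 95% interval from unselected results.

    Each ranked value is paired with the standard normal quantile of its
    cumulative frequency (the probability-paper axis); a straight line is
    fitted to the central portion and extrapolated to the 2.5th and 97.5th
    percentiles.
    """
    xs = sorted(values)
    n = len(xs)
    std = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # cumulative frequency (assumed plotting position)
        if fit_range[0] <= p <= fit_range[1]:
            points.append((std.inv_cdf(p), x))
    if len(points) < 2:
        raise ValueError("not enough points in the fitted range")

    # Ordinary least-squares line: value = intercept + slope * z
    mean_z = sum(z for z, _ in points) / len(points)
    mean_x = sum(x for _, x in points) / len(points)
    slope = sum((z - mean_z) * (x - mean_x) for z, x in points) / sum(
        (z - mean_z) ** 2 for z, _ in points
    )
    intercept = mean_x - slope * mean_z

    z975 = std.inv_cdf(0.975)
    return intercept - z975 * slope, intercept + z975 * slope


# Synthetic demo: 160 "healthy" results plus a 20% admixture of high values.
healthy = [NormalDist(100, 10).inv_cdf((i - 0.5) / 160) for i in range(1, 161)]
abnormal = [130.0 + i for i in range(40)]
print(hoffman_style_interval(healthy + abnormal))
```

Because only the central portion of the curve is fitted, the extreme values barely influence the line. A one-sided admixture still shifts the central portion somewhat, so the choice of the linear region remains a judgment call, much as it was on paper.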

If the reference intervals generated in this way are strikingly different from the reference intervals in use (for example, if 30% of samples are labeled as abnormal), the laboratory needs to do some troubleshooting. Maybe the original reference interval study was flawed, or maybe the method was not implemented correctly, or maybe the local population is different, or maybe the method has drifted over time, or maybe the “reference interval” is really a “decision limit.” Whatever the cause, it is incumbent on laboratory professionals to make sure they understand why the apparent reference interval by the Hoffman technique is so different from the reference interval in use. As noted at the outset, I believe that this exercise is at least as important as reviewing quality control and proficiency testing.

My assessment is that the proposal by Katayev and colleagues¹ is not a “better way” to establish reference intervals, but I am indebted to them for making me think hard about the whole issue of reference intervals, for encouraging me to read in detail Hoffman’s⁵ classic article, and for enabling me to see that there are other, equally important applications of their ideas from which we can all benefit.

From the Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA.

References

1. Katayev A, Balciza C, Seccombe D. Establishing reference intervals for clinical laboratory test results: is there a better way? Am J Clin Pathol. 2010;133:180-186.

2. Clinical and Laboratory Standards Institute. Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory: Approved Guideline, Third Edition. CLSI document C28-A3. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.

3. Solberg HE. Approved recommendations (1987) on the theory of reference values, part 5: statistical treatment of collected reference values: determination of reference limits. J Clin Chem Clin Biochem. 1987;25:645-656.

4. Horn PS, Pesce AJ. Reference Intervals: A User’s Guide. Washington, DC: AACC Press; 2005.

5. Hoffman RG. Statistics in the practice of medicine. JAMA. 1963;188:864-873.

6. Brewster LM, Mairuhu G, Sturk A, et al. Distribution of creatine kinase in the general population: implications for statin therapy. Am Heart J. 2007;154:655-661.

7. College of American Pathologists. 2010 Surveys & Anatomic Pathology Education Programs [online catalog]. Reference Range Services [page 101]. http://www.cap.org/apps/docs/proficiency_testing/surveys_catalog/2010_surveys_catalog.pdf. Accessed December 6, 2009.

8. Canadian External Quality Assessment Laboratory (CEQAL). Proficiency Testing (PT) Samples for the Measurement of Lipids. http://www.ceqal.com/services.php. Accessed December 6, 2009.

9. College of American Pathologists. 2010 Surveys & Anatomic Pathology Education Programs [online catalog]. Accuracy Based Surveys [pages 85-88]. http://www.cap.org/apps/docs/proficiency_testing/surveys_catalog/2010_surveys_catalog.pdf. Accessed December 6, 2009.

10. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III): Final Report. Circulation. 2002;106:3143-3421.

11. National Kidney Disease Education Program. Estimating and reporting GFR. http://www.nkdep.nih.gov/labprofessionals/estimate_report_gfr.htm. Accessed December 6, 2009.

12. National Kidney Disease Education Program. GFR MDRD calculators for Adults (Conventional units). http://www.nkdep.nih.gov/professionals/gfr_calculators/idms_con.htm. Accessed December 6, 2009.

13. US Preventive Services Task Force. Screening for thyroid disease: recommendation statement. Ann Intern Med. 2004;140:125-127.

14. Helfand M. Screening for subclinical thyroid dysfunction in nonpregnant adults: a summary of the evidence for the US Preventive Services Task Force. Ann Intern Med. 2004;140:128-141.

15. Surks MI, Ortiz E, Daniels GH, et al. Subclinical thyroid disease: scientific review and guidelines for diagnosis and management. JAMA. 2004;291:228-238.

16. Lin K, Lipsitz R, Miller T, et al. Benefits and harms of prostate-specific antigen screening for prostate cancer: an evidence update for the US Preventive Services Task Force. Ann Intern Med. 2008;149:192-199.

17. Barry MJ. The PSA conundrum [editorial]. Arch Intern Med. 2006;166:7-8.

Page 2: 2010 - Estimating Reference Intervals

8192019 2010 - Estimating Reference Intervals

httpslidepdfcomreaderfull2010-estimating-reference-intervals 23

176 Am J Clin Pathol 2010133175-177176 DOI 101309AJCPQ4N7BRZQVHAL

copy American Society for Clinical Pathology

Horowitz E983155983156983145983149983137983156983145983150983143 R983141983142983141983154983141983150983139983141 I983150983156983141983154983158983137983148983155

The truth though is that Katayev et al1 are absolutely

correct in pointing out that few laboratories are capable of

undertaking a reference interval study of this magnitude

What can we do As recommended in the CLSI document3

we can verify rather than establish from scratch a reference

interval established elsewhere by collecting samples from

just 20 carefully vetted healthy people If no more than 2 of

these 20 samples have values outside the proposed interval

then it is statistically valid to adopt the proposed interval

Any laboratories that attempted to do this for creatine kinase

would likely have discovered that there was a major problem

with the manufacturerrsquos proposed interval

Of course if you find that you cannot adopt the manufac-

turerrsquos reference interval what do you do next What if you

could find other laboratories that used the same method each

of which collected data from its own set of 20 reference sub-

jects If you pooled the data from 6 laboratories you would

have 120 different values enough to establish a reference

interval using the recommended technique (at least for 1 parti-

tion) In the same way if you could pool data from 20 or even

50 laboratories all using the same method you would have

400 or even 1000 different values which might allow you

to establish reference intervals for many different partitions

As it turns out the College of American Pathologists offers

among its proficiency testing program services a service that

does just this for its participants7 It is at least in my opinion

an excellent although underused service

It is important too that laboratory professionals real-

ize that for many analytes where national or international

guidelines apply they should not attempt to establish orverify their own traditional reference intervals they should

use the indicated ldquodecision limitsrdquo Examples include cho-

lesterol high-density lipoprotein cholesterol triglycerides

glucose glycated hemoglobin (hemoglobin A1c

) and neona-

tal bilirubin In these cases the laboratoryrsquos job is to ensure

accuracymdashthat its results on patient specimens match those

that were used to establish the guidelines In many laborato-

ries it is assumed the its methods are accurate which is not

entirely unreasonable if homogeneous systems are used in

which the reagents calibrators and instruments have been

validated by the manufacturers who have done remarkablygood jobs with many if not all of these analytes Some

laboratories may go further by subscribing to proficiency

survey programs that use commutable materials and estab-

lish values by reference methods89 The main point though

is that efforts that would have been directed to establishing

or verifying reference intervals should for these analytes

be directed instead toward verifying accuracy of the method

By definition roughly 50 not 25 of cholesterol values

from apparently healthy Americans not taking cholesterol-

lowering medication will be higher than the decision limit

of 200 mgdL

10

In this connection it is interesting that Katayev et

al1 included creatinine among the analytes in their report

Because of differences in sex age and race establishing a

reference interval for creatinine is fraught with problems

Rather laboratories should report the estimated glomeru-

lar filtration rate (GFR) using the Modification of Diet in

Renal Disease Study equation whenever they report serum

creatinine values As noted on the National Kidney Disease

Education Program Web site estimated GFR ldquoprovides

a more clinically useful measure of kidney function than

serum creatinine alonerdquo11 For example a 55-year-old nonndash

African American woman with a creatinine level of 10 mg

dL a value at the upper end of the reference interval has an

estimated GFR of 58 mLmin173 m2 indicative of chronic

kidney disease12 The ldquonormalrdquo creatinine value is clinically

misleading

The inclusion of TSH in the article by Katayev et al1 is

interesting as well Because the incidence of so-called sub-

clinical hypothyroidism is higher among women than among

men why did they not evaluate values for women and men

separately And to the extent that subclinical hypothyroid-

ism also increases with age why did they not look at their

data as a function of age as well as sex Assuming their data

had confirmed these phenomena they could have questioned

the validity of the current reference intervals as others have

That is if a TSH value in an asymptomatic older woman is

outside the central 95 of values in younger people of both

sexes what should a concerned clinician do Many would

argue that it may be wise to do nothing13-15 If that is the

case is the value genuinely ldquoabnormalrdquoAnd this begs the question of why we typically use

975 as the upper limit of the reference interval To the

extent that physicians are screening (ie ordering tests for

people without symptoms or signs) then by definition

25 of ldquonormalrdquo people will have ldquoabnormalrdquo results and

therefore possibly be subjected to additional unnecessary

testing Is there any real medical benefit in following up on

a calcium level in the 98th percentile in the absence of symp-

toms or signs This may be part of what drives up health

care costsmdashfollow-up on tests that are outside the central

95 of results but that by themselves warrant no furtheraction As laboratory professionals we need to think beyond

simply providing central 95 reference intervals and more

about what we want clinicians to do with the information we

provide Perhaps the reference interval for calcium should

include the central 990 of calcium values

The issue of what constitutes abnormal becomes even

more controversial when we turn to prostate-specific antigen

(PSA) screening A PSA value of 50 ngmL is typically

flagged as abnormal but what does it mean in a man with

no symptoms and no signs Without doubt as a result of

screening we detect prostate cancer at earlier stages but it

8192019 2010 - Estimating Reference Intervals

httpslidepdfcomreaderfull2010-estimating-reference-intervals 33

Am J Clin Pathol 2010133175-177 177177 DOI 101309AJCPQ4N7BRZQVHAL 177

copy American Society for Clinical Pathology

AJCP E983140983145983156983151983154983145983137983148

is not clear that overall patients live longer or benefit in

any other ways Are too many men undergoing biopsies and

even therapy the benefits of which are less certain than the

costs and adverse effects1617

Even if one grants the attractiveness of the proposal by

Katayev et al1 it does nothing to help individual laboratories

that may use different methods or that may serve populations

with different backgrounds or that may not have the comput-

ing power to replicate their analyses What are they to do I

would submit that they can still use the Hoffman technique

as a powerful quality assurance tool

As Katayev and colleagues1 point out the huge advan-

tage of the Hoffman technique is that it does not require

that samples be obtained from healthy people Indeed in his

article Hoffman5 demonstrated the technique using sample

sets with admixtures of 20 abnormal values Furthermore

the Hoffman technique does not require sophisticated sta-

tistics and computer analyses In the original description

Hoffman5 plotted on gaussian probability paper the cumu-

lative frequency of test values vs concentration (It should be

noted that this is very different from Figure 1 in the article

by Katayev and colleagues1 presumably the computer

techniques used make Figure 1 equivalent to the original

description) The effect of gaussian probability paper is to

give much more weight to the central part of the distribution

lessening the contributions of values as they deviate from the

center values that are progressively less likely to be from

healthy people

If the reference intervals generated in this way are

strikingly different from the reference intervals in use (forexample if 30 of samples are labeled as abnormal) the

laboratory needs to do some troubleshooting Maybe the

original reference interval study was flawed or maybe the

method was not implemented correctly or maybe the local

population is different or maybe the method has drifted over

time or maybe the ldquoreference intervalrdquo is really a ldquodecision

limitrdquo Whatever the cause it is incumbent on laboratory

professionals to make sure they understand why the apparent

reference interval by the Hoffman technique is so different

from the reference interval in use As noted at the outset I

believe that this exercise is at least as important as reviewingquality control and proficiency testing

My assessment is that the proposal by Katayev and col-

leagues1 is not a ldquobetter wayrdquo to establish reference intervals

but I am indebted to them for making me think hard about

the whole issue of reference intervals for encouraging me to

read in detail Hoffmanrsquos5 classic article and for enabling me

to see that there are other equally important applications of

their ideas from which we can all benefit

From the Department of Pathology Beth Israel Deaconess

Medical Center Boston MA

References

1. Katayev A, Balciza C, Seccombe D. Establishing reference intervals for clinical laboratory test results: is there a better way? Am J Clin Pathol. 2010;133:180-186.

2. Clinical and Laboratory Standards Institute. Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory: Approved Guideline, Third Edition. CLSI document C28-A3. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.

3. Solberg HE. Approved recommendations (1987) on the theory of reference values, part 5: statistical treatment of collected reference values: determination of reference limits. J Clin Chem Clin Biochem. 1987;25:645-656.

4. Horn PS, Pesce AJ. Reference Intervals: A User’s Guide. Washington, DC: AACC Press; 2005.

5. Hoffman RG. Statistics in the practice of medicine. JAMA. 1963;188:864-873.

6. Brewster LM, Mairuhu G, Sturk A, et al. Distribution of creatine kinase in the general population: implications for statin therapy. Am Heart J. 2007;154:655-661.

7. College of American Pathologists. 2010 Surveys & Anatomic Pathology Education Programs [online catalog]. Reference Range Services [page 101]. http://www.cap.org/apps/docs/proficiency_testing/surveys_catalog/2010_surveys_catalog.pdf. Accessed December 6, 2009.

8. Canadian External Quality Assessment Laboratory (CEQAL). Proficiency Testing (PT) Samples for the Measurement of Lipids. http://www.ceqal.com/services.php. Accessed December 6, 2009.

9. College of American Pathologists. 2010 Surveys & Anatomic Pathology Education Programs [online catalog]. Accuracy-Based Surveys [pages 85-88]. http://www.cap.org/apps/docs/proficiency_testing/surveys_catalog/2010_surveys_catalog.pdf. Accessed December 6, 2009.

10. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III): Final Report. Circulation. 2002;106:3143-3421.

11. National Kidney Disease Education Program. Estimating and reporting GFR. http://www.nkdep.nih.gov/labprofessionals/estimate_report_gfr.htm. Accessed December 6, 2009.

12. National Kidney Disease Education Program. GFR MDRD calculators for Adults (Conventional units). http://www.nkdep.nih.gov/professionals/gfr_calculators/idms_con.htm. Accessed December 6, 2009.

13. US Preventive Services Task Force. Screening for thyroid disease: recommendation statement. Ann Intern Med. 2004;140:125-127.

14. Helfand M. Screening for subclinical thyroid dysfunction in nonpregnant adults: a summary of the evidence for the US Preventive Services Task Force. Ann Intern Med. 2004;140:128-141.

15. Surks MI, Ortiz E, Daniels GH, et al. Subclinical thyroid disease: scientific review and guidelines for diagnosis and management. JAMA. 2004;291:228-238.

16. Lin K, Lipsitz R, Miller T, et al. Benefits and harms of prostate-specific antigen screening for prostate cancer: an evidence update for the US Preventive Services Task Force. Ann Intern Med. 2008;149:192-199.

17. Barry MJ. The PSA conundrum [editorial]. Arch Intern Med. 2006;166:7-8.


is not clear that, overall, patients live longer or benefit in any other ways. Are too many men undergoing biopsies and even therapy, the benefits of which are less certain than the costs and adverse effects?16,17

Even if one grants the attractiveness of the proposal by Katayev et al,1 it does nothing to help individual laboratories that may use different methods, or that may serve populations with different backgrounds, or that may not have the computing power to replicate their analyses. What are they to do? I would submit that they can still use the Hoffman technique as a powerful quality assurance tool.

As Katayev and colleagues1 point out, the huge advantage of the Hoffman technique is that it does not require that samples be obtained from healthy people. Indeed, in his article, Hoffman5 demonstrated the technique using sample sets with admixtures of 20% abnormal values. Furthermore, the Hoffman technique does not require sophisticated statistics and computer analyses. In the original description, Hoffman5 plotted, on gaussian probability paper, the cumulative frequency of test values vs concentration. (It should be noted that this is very different from Figure 1 in the article by Katayev and colleagues1; presumably, the computer techniques used make Figure 1 equivalent to the original description.) The effect of gaussian probability paper is to give much more weight to the central part of the distribution, lessening the contributions of values as they deviate from the center, values that are progressively less likely to be from healthy people.
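The probability-paper plot described above can be approximated numerically. The following is only a minimal sketch, not Hoffman's procedure as published: it converts each cumulative frequency to a standard normal deviate (the scale that gaussian probability paper provides), fits a least-squares line to the central part of the plot in place of a line drawn by eye, and extrapolates that line to the 2.5th and 97.5th percentiles. The function name `hoffman_interval`, the plotting position (i + 0.5)/n, and the central band of 25% to 75% are all assumptions made for illustration.

```python
from statistics import NormalDist


def hoffman_interval(values, central=(0.25, 0.75)):
    """Estimate 2.5th and 97.5th percentile limits from unselected results.

    `central` is the cumulative-frequency band used for the line fit; it is
    an illustrative assumption, not a value taken from Hoffman's article.
    """
    xs = sorted(values)
    n = len(xs)
    std = NormalDist()  # standard normal: the probability-paper axis
    z, y = [], []
    for i, x in enumerate(xs):
        p = (i + 0.5) / n  # cumulative frequency (assumed plotting position)
        if central[0] <= p <= central[1]:
            z.append(std.inv_cdf(p))  # position on the gaussian axis
            y.append(x)               # test value (concentration)
    if len(z) < 2:
        raise ValueError("too few points in the central band to fit a line")

    # Ordinary least-squares line y = intercept + slope * z through the
    # central points (a stand-in for fitting the line by eye).
    mean_z = sum(z) / len(z)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_z) ** 2 for a in z)
    slope = sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y)) / sxx
    intercept = mean_y - slope * mean_z

    # Extrapolate the fitted line to the 2.5th and 97.5th percentiles.
    z975 = std.inv_cdf(0.975)
    return intercept - z975 * slope, intercept + z975 * slope


# Hypothetical data: 1,000 gaussian "healthy" results (mean 100, SD 10)
# mixed with 50 grossly abnormal results.
healthy = NormalDist(100, 10)
results = [healthy.inv_cdf((i + 0.5) / 1000) for i in range(1000)]
results += [200.0] * 50
low, high = hoffman_interval(results)
```

Because only the central band enters the fit, the abnormal results at 200 shift the estimated limits modestly, whereas the raw 97.5th percentile of the same mixed data would fall among the abnormal values.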

If the reference intervals generated in this way are strikingly different from the reference intervals in use (for example, if 30% of samples are labeled as abnormal), the laboratory needs to do some troubleshooting. Maybe the original reference interval study was flawed, or maybe the method was not implemented correctly, or maybe the local population is different, or maybe the method has drifted over time, or maybe the “reference interval” is really a “decision limit.” Whatever the cause, it is incumbent on laboratory professionals to make sure they understand why the apparent reference interval by the Hoffman technique is so different from the reference interval in use. As noted at the outset, I believe that this exercise is at least as important as reviewing quality control and proficiency testing.
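The check described above can be sketched as a simple tally of how many patient results fall outside the reference interval in use. The function names and the 30% threshold here are hypothetical; the threshold merely echoes the example figure in the text and is not a recommended cut-off.

```python
def fraction_flagged(values, low, high):
    """Fraction of results falling outside the reference interval [low, high]."""
    if not values:
        raise ValueError("no results supplied")
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)


def needs_troubleshooting(values, low, high, threshold=0.30):
    """True when the flagged fraction reaches the (hypothetical) threshold."""
    return fraction_flagged(values, low, high) >= threshold


# Hypothetical results checked against an interval in use of 3 to 8.
results = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flagged = fraction_flagged(results, 3, 8)
```

A high flagged fraction does not say which of the listed causes applies; it only signals that the interval in use deserves a closer look.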

My assessment is that the proposal by Katayev and colleagues1 is not a “better way” to establish reference intervals, but I am indebted to them for making me think hard about the whole issue of reference intervals, for encouraging me to read in detail Hoffman’s5 classic article, and for enabling me to see that there are other equally important applications of their ideas from which we can all benefit.
