Reliable assessment of faecal loading in older adults by abdominal radiograph

Mark Yates, Keren Day, Jim Mullany
Ballarat Health Services, Ballarat, Victoria, Australia

Jack Harvey
School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Victoria, Australia

Australasian Journal on Ageing, Vol 23 No 1 March 2004, Research 13–18. Blackwell Publishing, Ltd.

Correspondence to: Dr Jack Harvey, School of Information Technology and Mathematical Sciences, University of Ballarat. Email: [email protected]

Objective: To produce a measure of faecal loading using plain abdominal radiograph that has both face validity and reliability. This formed part of the Ballarat Constipation Study, which aimed to establish a suite of objective assessment tools for the identification of constipation in residential and extended care facilities.

Methods: A 20-point loading scale (five levels of loading × four segments of colon) was evaluated using 75 plain abdominal films of patients older than 65 years that were taken for various purposes. These were randomly ordered and five radiologists, following appropriate training, rated the films. Each was blinded to the others’ responses. To establish intra-rater reliability, each radiologist rated 25 of the films for a second time.

Results: Reliability was assessed using Q-type correlations for raw scores and Cohen’s kappa for dichotomised scores. Inter-rater correlations ranged from 0.57 (confidence interval (CI) 0.38, 0.72) to 0.83 (CI 0.74, 0.90). Inter-rater kappas ranged from 0.28 (CI 0.06, 0.50) to 0.72 (CI 0.50, 0.94). Intra-rater correlations ranged from 0.68 (CI 0.38, 0.84) to 0.92 (CI 0.82, 0.96) and intra-rater kappas ranged from 0.26 (CI −0.08, 0.60) to 0.90 (CI 0.70, 0.99).

Conclusion: This method of assessing and reporting faecal loading in older people has an acceptable level of reliability for four of the five radiologists. The scale was considered appropriate for use in the larger study, where its validity was tested.

Key words: abdominal radiograph, constipation, elderly, faecal loading, scale reliability.

Introduction
Constipation is a difficult condition to assess because of its subjective, symptom-based nature and its complex pathophysiology. In most cases, a thorough history forms the basis for therapy. However, it is important to have objective measures, particularly for groups of patients who are limited in their ability to report symptoms clearly. Even where cognition is not impaired, patient reports can be unreliable [1,2] and can involve different understandings of constipation between the patient and health professionals [3]. How best to evaluate the presence of constipation and faecal impaction, and the relationship between the experience of constipation and the observable phenomenon of faecal loading, is unclear. Imaging studies may be used to confirm the presence of a suspected abnormality such as faecal loading. In the elderly, plain abdominal radiographs are commonly used in clinical practice for this purpose. However, the reliability and validity of this investigation have not been established for this population, although this has been attempted in other contexts [4–7].

If a reliable method of evaluating stool retention in older people on plain abdominal radiograph could be established and its relationship to constipation further explored, then this could be used as a first-line investigation into constipation prior to more extensive and invasive investigations such as transit time studies [8,9]. Alternatively, if reliability and validity cannot be established, then use of the radiograph, with its cost and bother for older people with constipation, may justifiably be discouraged.

The present research formed part of the Ballarat Constipation Study, which aimed to establish objective assessment tools for use with the elderly in residential and extended care facilities. The aim in this phase of the study was to produce a measure of faecal loading using plain abdominal radiograph that has both face validity and acceptable inter-rater and intra-rater reliability.

Various scoring procedures have been proposed for objectively assessing colonic loading from plain abdominal radiographs [4–7]. Reliability of any such scoring procedure is most commonly assessed by measures of correlation or concordance. The commonly used Q-type Pearson’s correlation (r) [10] indicates the strength of the linear relationship between the raw scores allocated to a number of cases by two raters or by one rater on two occasions. However, the correlation coefficient is an imperfect indicator of repeatability. Whilst a low value certainly indicates unreliability, a high correlation suggests, but does not necessarily imply, good reliability; a strong relationship and hence a high correlation can occur even if there has been a change in the average level or in the spread of scores. For this reason it is necessary to also check for any such changes. If the scores are used as the basis of a diagnostic categorisation (such as loaded vs not loaded), then statistics such as Cohen’s kappa (κ) [11] can be used. Cohen’s kappa is based on the number of concordances (agreed categorisations or diagnoses) between two raters or one rater on two occasions. It is the difference between the observed number of concordances and the number expected under chance allocation, expressed as a proportion of the maximum improvement over chance that is theoretically possible (that is, the improvement needed to reach perfect agreement).
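The caveat about correlation can be illustrated with a minimal Python sketch. The scores are hypothetical (not data from the study) and `pearson_r` is a helper written only for this illustration. A second rater who scores every film exactly 3 points higher than the first gives a perfect correlation, so the shift only becomes visible when the mean and spread are checked separately.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores for eight films (not data from the study).
rater_a = [2, 3, 4, 5, 6, 7, 8, 9]
# Rater B ranks the films identically but scores every film 3 points higher.
rater_b = [score + 3 for score in rater_a]

r = pearson_r(rater_a, rater_b)
mean_shift = mean(rater_b) - mean(rater_a)   # change in average level
sd_ratio = stdev(rater_b) / stdev(rater_a)   # change in spread

print(f"r = {r:.2f}, mean shift = {mean_shift:.1f}, SD ratio = {sd_ratio:.2f}")
```

With a diagnostic cut-off applied to these scores, the two raters would disagree on several films despite the perfect correlation, which is why a concordance statistic such as kappa is reported alongside r.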

Barr et al. developed a complex scoring method for assessing the severity of stool retention in children using plain abdominal radiographs [4]. Four faecal quantity criteria and two faecal quality criteria, each relating to particular segments of the colon, were scored on differently weighted 3-point and 4-point scales, from which a total score on a scale of 0–25 was derived. The technique was clinically validated by evaluating films of children with known stool retention before and after therapy and a control group of films taken of other children for other reasons (45 children in total). A receiver operating characteristic (ROC) curve showing the relationship between sensitivity and specificity was constructed and from it an optimal diagnostic cut-off score (positive if raw score ≥10) was determined. However, reliability was assessed only in terms of the raw scores, being reported (apparently as correlations) as inter-observer ≥0.80 and intra-observer ≥0.85.

Rockney et al. used the scale of Barr et al. and the same diagnostic cut-off score of 10 in a study of children with encopresis [5]. They reported reliability in terms of 72 diagnostic categorisations by three radiologists using Cohen’s kappa. Overall inter-rater κ was 0.65 (P < 0.0001); the intra-rater kappas ranged from 0.52 to 0.63, but as these were based on much smaller sample sizes, they were not significantly different from zero (chance).

Bruera et al. reviewed the plain abdominal radiographs of 122 consecutive terminal cancer patients admitted to a palliative care unit [6]. Two physicians scored the radiographs using a simpler 12-point scale based on the amount of stool in the four segments of the colon. Inter-rater correlation was r = 0.78 (P = 0.0001).

Leech et al., in another study of faecal loading in children [7], used a different basis for scoring plain abdominal radiographs. Rather than identifying segments of the colon, the abdomen was divided into three segments by three lines drawn by joining specified points on the spine and pelvis. In each of these segments, faecal quantity was given a score on a 6-point scale from 0 to 5. These were summed to give a total score on a scale from 0 to 15. One hundred radiographs (33 constipated and 67 control) were assessed by three observers on two occasions. Leech et al. assessed reliability in terms of intra-observer and inter-observer variation. A Wilcoxon matched pairs signed rank test showed no significant variation between the scores of the same observers on two occasions. A Friedman two-way analysis of variance showed significant differences between the mean scores for different observers (χ²(2) = 44.2, P < 0.05). A ROC curve was constructed and a diagnostic cut-off score (positive if raw score ≥9) was determined, but no assessment of the reliability of this diagnostic dichotomy was reported.

The majority of these scales were developed for children, and they differed in the way the colon was divided, the number of grades of loading, and the weighting given to the degree of loading depending on the segment. The reliability of these scales has been reported differently, and only the Barr scale has been used to determine a dichotomy. No scale was tested across more than three reporters.

Method
In light of the previous research reviewed and the experience of the radiologist on the research team, it was decided to evaluate a scoring procedure which used the colon segment approach of Barr et al. and Bruera et al. and a simple scoring method with five gradations (intermediate between the four of Barr et al. and Bruera et al. and the six of Leech et al.). This will be referred to as the Ballarat faecal loading scale or simply the Ballarat scale. For brevity, the other scales will be referred to as the Barr, Bruera and Leech scales.

Comparison of the Barr, Bruera, Leech and Ballarat scales

Barr scale (range: 0–25)
The Barr scoring scheme is much more complex than the other three scales. The quantity score (ranging from 0 to 17) is based on a 4-point verbal scale: ‘little or none’, ‘moderate amount’, ‘large amount’ and ‘large/dilated’. This scale is applied to four segments of the colon: ascending, transverse, descending and rectum. However, scoring for the larger amounts is dependent on location, so that the total score is differentially weighted (Table 1). There are also two minor variations in the verbal descriptions. In the case of the ascending colon, ‘little or none’ is replaced by ‘small amount’; and in the case of the rectum, the last two categories are combined and labelled ‘large amount/full distally’. The Barr scale also includes two quality components relating to the distribution of stools of ‘granular’ and ‘rocky’ appearance. The range of the quality score is 0–8, giving a total score in the range 0–25.

Bruera scale (range: 0–12)
Four-point scale applied to four segments of colon: ascending, transverse, descending and recto-sigmoid. 0, no stool; 1, stool occupying <50% of the lumen; 2, stool occupying >50% of the lumen; and 3, stool completely occupying the lumen.

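Table 1: Comparison of Ballarat, Bruera and Barr scoring rules

| Scale | Segment | None | Small | Moderate | Large | Large-dilated |
|---|---|---|---|---|---|---|
| Ballarat | All four segments | 0 | 1 | 2 | 3 | 4 |
| Bruera | All four segments | 0 | 1 | 2 | 3 | 3 |
| Barr quantity | Ascending | 0 | 0 | 1 | 2 | 2 |
| Barr quantity | Transverse | 0 | 0 | 3 | 4 | 5 |
| Barr quantity | Descending | 0 | 0 | 3 | 4 | 5 |
| Barr quantity | Rectum | 0 | 0 | 2 | 5 | 5 |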

The verbal descriptions shown are those used in the Ballarat procedure for coding the amount of stool. The Barr scale uses essentially the same verbal descriptions (see text), although the first two Ballarat categories are combined, and the amounts are differently weighted depending on location. The Bruera scale uses rather different verbal descriptions (see text), and hence the apparent correspondence between four of the five categories on the Ballarat and Bruera scales is only approximate.


Leech scale (range: 0–15)
Six-point scale applied to three abdominal segments defined in relation to skeletal features. 0, none; 1, scanty; 2, mild; 3, moderate; 4, severe; and 5, severe with dilation.

Ballarat scale (range: 0–16)
Five-point scale applied to four segments of colon: ascending, transverse, descending and recto-sigmoid. 0, none; 1, small amount/traces; 2, moderate amount; 3, large amount/full; and 4, very large amount/full with dilation.

Relationship between the four scales
The Leech scale is more finely graded in regard to the intermediate quantities, but it is not comparable spatially with the other three scales. The Bruera categories can be put into approximate correspondence with the Ballarat categories, as shown in Table 1. Because the Barr and Ballarat scoring rules use almost identical verbal descriptions (see above), the differentially weighted Barr quantity scores for the various segments can be generated at the same time as the simpler Ballarat scores, using the correspondences shown in Table 1.
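The correspondences in Table 1 can be sketched as a small lookup in Python. This is an illustration only: the function name `score_film` and the input format are assumptions, the example film is hypothetical, and the fourth segment is keyed as `rectum` even though the Ballarat and Bruera scales describe it as recto-sigmoid. The Barr quality component (0–8) is not derivable from Table 1 and is left out.

```python
SEGMENTS = ("ascending", "transverse", "descending", "rectum")

# Scores for Ballarat categories 0-4 (none, small, moderate, large,
# large-dilated), transcribed from Table 1.
BALLARAT = (0, 1, 2, 3, 4)
BRUERA = (0, 1, 2, 3, 3)  # approximate correspondence only (see Table 1 note)
BARR_QUANTITY = {
    "ascending": (0, 0, 1, 2, 2),
    "transverse": (0, 0, 3, 4, 5),
    "descending": (0, 0, 3, 4, 5),
    "rectum": (0, 0, 2, 5, 5),
}


def score_film(categories):
    """Return (Ballarat, Bruera, Barr quantity) totals for one film.

    `categories` maps each segment name to a Ballarat category 0-4.
    """
    for segment in SEGMENTS:
        if categories[segment] not in range(5):
            raise ValueError(f"category for {segment} must be 0-4")
    ballarat = sum(BALLARAT[categories[s]] for s in SEGMENTS)
    bruera = sum(BRUERA[categories[s]] for s in SEGMENTS)
    barr_q = sum(BARR_QUANTITY[s][categories[s]] for s in SEGMENTS)
    return ballarat, bruera, barr_q


# Hypothetical film: moderate loading in the ascending and transverse colon,
# a small amount in the descending colon, large loading in the rectum.
film = {"ascending": 2, "transverse": 2, "descending": 1, "rectum": 3}
print(score_film(film))  # (8, 8, 9)
```

Rating every segment in the top category gives totals of 16, 12 and 17, matching the stated ranges of the Ballarat scale, the Bruera scale and the Barr quantity score.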

Procedure
The reliability study involved a retrospective review of selected abdominal films. From a large stock of plain abdominal films of patients aged 65 years and above that had been taken for various purposes, a sample of 75 films, representative of the range of faecal loading, was selected by an experienced radiologist. The research radiologist, using a set of four additional films, described the loading scale to the test radiologists, using standard anatomical markings for the bowel segments [12].

The 75 films were then presented in a different random order to each radiologist, with ratings being recorded on score sheets designed to enable both Ballarat and Barr scores to be calculated. Each radiologist was blinded to the others’ responses. To establish intra-rater reliability, the set of 75 films was randomly partitioned into three sets of 25 films, which were then randomly reordered. Each radiologist rated one of the sets for a second time using the same scale.
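One way such a presentation plan could be generated is sketched below in Python. The paper does not say how the randomisation was carried out, so the function `plan_presentation`, the rater labels and the fixed seed are all assumptions made for illustration.

```python
import random


def plan_presentation(film_ids, raters, n_sets=3, seed=0):
    """Return per-rater first-pass orders and the sets used for re-rating."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability

    # First pass: an independent random order of all films for each rater.
    first_pass = {r: rng.sample(film_ids, len(film_ids)) for r in raters}

    # Second pass: shuffle once, then cut into equal, non-overlapping sets.
    shuffled = rng.sample(film_ids, len(film_ids))
    size = len(film_ids) // n_sets
    repeat_sets = [shuffled[i * size:(i + 1) * size] for i in range(n_sets)]

    # Re-order each set again before it is presented for re-rating.
    for subset in repeat_sets:
        rng.shuffle(subset)
    return first_pass, repeat_sets


films = list(range(1, 76))  # hypothetical film identifiers 1-75
first_pass, repeat_sets = plan_presentation(films, ["A", "B", "C"])
print([len(s) for s in repeat_sets])  # [25, 25, 25]
```

How the three sets were allocated to individual radiologists is not described in the text, so the sketch stops at producing the sets.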

A fourth radiologist, the member of the research team who devised the scale, also rated the 75 films and one of the sets of 25 films, under the same blinded conditions. Figure 1 shows four films together with the Ballarat scores assigned by this radiologist. Subsequently, a fifth radiologist from a different hospital also went through the same rating procedure under the same conditions. Due to some clerical difficulties, only 66 of the original 75 films were identifiable and available on this occasion.

Figure 1: Plain abdominal radiographs illustrating a range of Ballarat faecal loading scores (BFLS). (a) BFLS = 2; (b) BFLS = 6; (c) BFLS = 9; (d) BFLS = 13

Reliability was assessed using both correlations for pairs of raw scores, and Cohen’s kappa statistics for pairs of dichotomous categorisations (loaded vs not loaded) derived from the raw scores. In each case 95% confidence intervals (CI) were calculated, because placing error bounds on the estimates of reliability is more informative than testing the hypothesis of zero reliability.
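The paper does not state how its confidence intervals were computed. As a minimal sketch, and only as an assumption, a common approximation for a Pearson correlation is the Fisher z transformation, shown here in Python; `correlation_ci` is a hypothetical helper and the inputs are illustrative values, not results from the study.

```python
from math import atanh, sqrt, tanh


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r)           # Fisher transformation of r
    se = 1 / sqrt(n - 3)   # approximate standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Same illustrative correlation at the two sample sizes used in the study.
lo75, hi75 = correlation_ci(0.80, 75)
lo25, hi25 = correlation_ci(0.80, 25)
print(f"n=75: ({lo75:.2f}, {hi75:.2f})   n=25: ({lo25:.2f}, {hi25:.2f})")
```

Under this approximation the interval for 25 films is wider than the interval for 75 films at the same correlation, and both intervals are asymmetric about r.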

In the absence of independent clinical evidence for direct validation of the diagnostic performance of the Ballarat scale in this phase of the study, an indirect validation was carried out by comparing Ballarat and Barr scores. Ballarat and Barr faecal loading scores were calculated from each rater’s assessment of each film. For the Ballarat scale, diagnostic cut-offs in the range 7–9 were investigated, these being perceived a priori as roughly equivalent to the established Barr cut-off score of 10. It was found that although the results were not highly sensitive to choice of cut-off, the highest level of reliability for the Ballarat dichotomy and the highest level of concordance between the Ballarat and Barr dichotomies were obtained using a cut-off of 8 (i.e. loaded if raw score ≥8). The diagnostic results reported are based on this figure.
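The comparison of candidate cut-offs can be sketched in Python as follows. The helper names `cohen_kappa` and `scan_cutoffs` are hypothetical, and the paired scores are invented and deliberately constructed so that the pattern is easy to follow; they are not data from the study and say nothing about which cut-off is best in practice.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equally long lists of True/False diagnoses."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def scan_cutoffs(ballarat, barr, cutoffs=(7, 8, 9), barr_cutoff=10):
    """Kappa between 'Ballarat >= c' and 'Barr >= barr_cutoff' for each c."""
    barr_loaded = [s >= barr_cutoff for s in barr]
    return {c: cohen_kappa([s >= c for s in ballarat], barr_loaded)
            for c in cutoffs}


# Hypothetical paired scores for ten films (not data from the study).
ballarat_scores = [2, 4, 5, 6, 7, 8, 8, 9, 11, 13]
barr_scores = [3, 5, 6, 8, 9, 10, 11, 12, 15, 19]

for cutoff, kappa in scan_cutoffs(ballarat_scores, barr_scores).items():
    print(f"Ballarat cut-off {cutoff}: kappa = {kappa:.2f}")
```

For these invented scores the kappas are 0.80, 1.00 and 0.60 for cut-offs 7, 8 and 9. The same scan applied to two raters' Ballarat scores, rather than to Ballarat and Barr scores, would give the reliability of the Ballarat dichotomy at each cut-off.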

Results
The distributions of the raw scores and the diagnoses of the five raters on the first occasion are summarised in Table 2. Table 3 shows a similar summary for the repeated assessments of each rater. Results of the correlation and kappa analyses are summarised in Table 4. In each case, the minimum and maximum values of the relevant pair-wise statistic for both raw scores and diagnostic categorisations are shown, together with an approximate 95% CI for each. No diagnoses were made on the basis of the quantitative and qualitative subscales of the Barr scale.

Discussion
Table 2 shows that the assessments of rater two differed somewhat from those of the other four raters: a combination of high mean score, large standard deviation and less pronounced positive skew led to the highest proportion of ‘loaded’ assessments. Table 3 shows that the same rater was also much less consistent than the other raters on the repeated assessments. This rater exhibited consistently lower intra- and inter-rater reliability than the other four raters, and contributed to most of the minimum values of the measures summarised in Table 4, for both Barr and Ballarat scales (the exceptions being Ballarat intra-rater correlations, where the minimum value was tied; the Barr qualitative scores, for which the reliability of all raters was low; and the interscale reliability, which was high for all raters). For this reason, Table 4 includes two sets of minimum values, one with all five raters included and one with the divergent rater excluded. Clearly, four raters, including one from a different radiological practice in another hospital, were much more consistent in all respects than rater two. Nevertheless, it is considered that the variation exhibited by the divergent rater may be representative of the scope of professional judgement among radiologists. Furthermore, this divergence applied equally to both Ballarat and Barr scales.

The presence of this divergent rater would suggest that some certification of competency would be necessary for clinical use of the Ballarat scale. Considering that this rater exhibited poor intra-rater and inter-rater reliability, we consider that such certification should involve as a minimum a demonstration of acceptable intra-rater reliability on a test sample of films.

Table 2: Summary statistics for five raters: Inter-rater comparisons

| Rater | n | Min | Max | Mean | SD | Skewness | Per cent loaded |
|---|---|---|---|---|---|---|---|
| 1 | 75 | 0 | 14 | 5.05 | 2.76 | 0.474 | 16 |
| 2 | 75 | 0 | 13 | 5.52 | 3.33 | 0.287 | 32 |
| 3 | 75 | 0 | 12 | 4.57 | 2.70 | 0.325 | 15 |
| 4 | 75 | 0 | 13 | 5.48 | 2.59 | 0.495 | 20 |
| 5 | 66 | 0 | 14 | 4.85 | 3.50 | 0.513 | 23 |

Table 3: Summary statistics for five raters: Intra-rater comparisons

| Rater | Assessment | n | Min | Max | Mean | SD | Skewness | Per cent loaded |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 25 | 1 | 12 | 5.32 | 2.641 | 0.286 | 20 |
| 1 | 2 | 25 | 1 | 11 | 5.72 | 2.923 | 0.464 | 32 |
| 2 | 1 | 25 | 0 | 13 | 6.28 | 3.129 | 0.276 | 32 |
| 2 | 2 | 25 | 0 | 10 | 3.52 | 2.931 | 0.970 | 12 |
| 3 | 1 | 24 | 0 | 8 | 4.17 | 2.353 | −0.021 | 13 |
| 3 | 2 | 24 | 0 | 9 | 4.04 | 2.742 | 0.201 | 17 |
| 4 | 1 | 25 | 1 | 11 | 5.72 | 2.836 | 0.261 | 28 |
| 4 | 2 | 25 | 1 | 13 | 5.20 | 3.055 | 1.157 | 20 |
| 5 | 1 | 25 | 0 | 14 | 5.08 | 4.020 | 0.639 | 28 |
| 5 | 2 | 25 | 0 | 13 | 5.04 | 3.680 | 0.609 | 24 |

In general, the CIs for intra-rater comparisons are wider than those for inter-rater comparisons and interscale comparisons because of the smaller sample size (25 vs 75). Kappa values in the range 0.40–0.75 indicate good diagnostic reproducibility, and values over 0.75 indicate excellent reproducibility [4].
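The bands quoted above can be written as a small Python helper. The name `describe_kappa` is hypothetical, and because the text gives no label for values under 0.40, the wording used for that case is an assumption made for the sketch.

```python
def describe_kappa(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "good"
    # The text gives no label below 0.40; this wording is an assumption.
    return "below the 'good' band"


# Ballarat kappas from Table 4: inter-rater minimum (5 raters), inter-rater
# minimum (4 raters), inter-rater maximum, and intra-rater maximum.
for value in (0.28, 0.56, 0.72, 0.90):
    print(value, describe_kappa(value))
```

A value of exactly 0.75 is treated as 'good' here, since the text reserves 'excellent' for values over 0.75.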

These results indicate that, on the basis of both raw scores and diagnostic dichotomies, the Ballarat faecal loading scale exhibited levels of inter-rater and intra-rater reliability similar to those reported in other studies for the Barr, Bruera and Leech scales.

In the present study, the Ballarat scale, the overall Barr scale and the quantitative subscale of the Barr scale all had similar levels of reliability. The qualitative subscale of the Barr scale was much less reliable. Scores on the Ballarat scale and the Barr scale were highly correlated, and the concordances between the diagnoses based on the two scales were, in Rockney’s terminology [5], good to excellent. As the Barr scale has been validated in earlier studies (albeit for a juvenile population), this provides indirect evidence that the Ballarat scale also produces valid diagnoses of faecal loading. Consequently, on the basis of the principle of parsimony, the simpler Ballarat scoring procedure is to be preferred to the more complex Barr procedure.

It was concluded that, because four out of the five participating radiologists demonstrated acceptable reliability in using the Ballarat scale, the immediate objective of developing a scale for use in a study of the assessment of constipation in the elderly was attained. Because the test set of films was chosen retrospectively from an archived collection, and had not been validated against other more direct measures of faecal loading, direct validation of the Ballarat scale was not possible. However, validity of the Ballarat scale has been provisionally and indirectly established by comparison with the Barr scale, which had been previously validated for a different, juvenile population. In the subsequent broader study, the validity in an older population will be directly tested against reported symptoms and direct observation of frequency, amount and quality of bowel motions.

For the Ballarat scale to be applied in clinical practice, it would be advantageous for radiologists to receive specific instructions and to test their reliability. A competency-based training package is being developed for this purpose.

Acknowledgements
The Ballarat Constipation Study was supported by a grant from the Victorian Department of Human Services, Quality and Care Continuity Branch, Acute Health Division. The research team also acknowledges the contribution of the participating radiologists.

Key Points

• Faecal loading in the elderly can be assessed using plain abdominal radiography.

• The Ballarat faecal loading scale is a simple tool for quantifying faecal loading in the elderly.

• Acceptable levels of inter-rater and intra-rater reliability have been demonstrated for the Ballarat faecal loading scale.

• The Ballarat faecal loading scale has been indirectly validated by comparison with a previously validated but more complex scale.

• This new tool has the potential to assist in the assessment of constipation in the elderly.

Table 4: Reliability of Ballarat and Barr faecal loading scales

Each entry is the coefficient followed by its 95% CL in parentheses. The first three value columns give the reliability of raw scores (correlation coefficients); the last three give the reliability of categorisations (kappa coefficients).

| Comparison | Scale | Correlation: min (5 raters) | Correlation: min (4 raters) | Correlation: max (5 raters) | Kappa: min (5 raters) | Kappa: min (4 raters) | Kappa: max (5 raters) |
|---|---|---|---|---|---|---|---|
| Inter-rater | Ballarat | 0.57 (0.38, 0.72) | 0.74 (0.69, 0.83) | 0.83 (0.74, 0.90) | 0.28 (0.06, 0.50) | 0.56 (0.31, 0.81) | 0.72 (0.50, 0.94) |
| Inter-rater | Barr | 0.60 (0.42, 0.74) | 0.72 (0.57, 0.81) | 0.84 (0.75, 0.91) | 0.17 (−0.04, 0.38) | 0.49 (0.27, 0.70) | 0.68 (0.45, 0.91) |
| Inter-rater | Barr Q | 0.55 (0.35, 0.70) | 0.72 (0.57, 0.81) | 0.85 (0.76, 0.91) | – | – | – |
| Inter-rater | Barr qual. | 0.40 (0.18, 0.58) | 0.43 (0.22, 0.62) | 0.67 (0.51, 0.78) | – | – | – |
| Intra-rater | Ballarat | 0.68 (0.38, 0.84) | 0.68 (0.38, 0.84) | 0.92 (0.82, 0.96) | 0.23 (−0.15, 0.61) | 0.57 (0.19, 0.95) | 0.90 (0.70, 0.99) |
| Intra-rater | Barr | 0.63 (0.29, 0.82) | 0.74 (0.44, 0.86) | 0.93 (0.84, 0.97) | 0.26 (−0.08, 0.60) | 0.42 (0.05, 0.79) | 1.00 (–) |
| Intra-rater | Barr Q | 0.59 (0.25, 0.79) | 0.72 (0.25, 0.79) | 0.91 (0.79, 0.96) | – | – | – |
| Intra-rater | Barr qual. | 0.31 (−0.11, 0.62) | 0.31 (−0.11, 0.62) | 0.79 (0.57, 0.90) | – | – | – |
| Inter-scale | Ballarat/Barr | 0.86 (0.77, 0.91) | 0.86 (0.77, 0.91) | 0.95 (0.92, 0.97) | 0.69 (0.51, 0.87) | 0.69 (0.51, 0.87) | 0.95 (0.84, 0.99) |
| Inter-scale | Ballarat/Barr Q | 0.89 (0.82, 0.93) | 0.89 (0.82, 0.93) | 0.95 (0.92, 0.97) | – | – | – |

CL, confidence limits; Q, quantity; qual., quality.

References
1 Manning AP, Wyman JB, Heaton KM. An examination of the reliability of reported stool frequency in the diagnosis of idiopathic constipation. British Medical Journal 1976; 2: 213–214.

2 Ashraf W, Park F, Lof J, Quigley EMM. An examination of the reliability of reported stool frequency in the diagnosis of idiopathic constipation. American Journal of Gastroenterology 1996; 91 (4): 26–32.

3 Herz MJ, Kahn E, Zalevski S, Aframian R, Kuznitz D, Reichman S. Constipation: a different entity for patients and doctors. Family Practice 1996; 13 (2): 156–159.

4 Barr RG, Levine MD, Wilkinson RH, Mulvihill D. Chronic and occult stool retention. Clinical Paediatrics 1979; 18: 674–686.

5 Rockney RM, McQuade WH, Days AL. The plain abdominal roentgenogram in the management of encopresis. Archives of Paediatric and Adolescent Medicine 1995; 149: 623–627.

6 Bruera E, Suarez-Almazor M, Velasco A, Bertolino M, MacDonald S, Hanson J. The assessment of constipation in terminal cancer patients admitted to a palliative care unit: a retrospective review. Journal of Pain and Symptom Management 1994; 9: 515–519.

7 Leech SC, McHugh K, Sullivan PB. Evaluation of a method of assessing faecal loading on plain abdominal radiographs in children. Paediatric Radiology 1999; 29: 255–258.

8 Camilleri M, Thompson WG, Fleshman JW, Pemberton JH. Clinical management of intractable constipation. Annals of Internal Medicine 1994; 121: 520–528.

9 Drossman DA, Corazziari E, Talley NJ, Thompson WG, Whitehead WE. Rome II: The Functional Gastrointestinal Disorders Diagnosis, Pathophysiology and Treatment: A Multinational Consensus. McLean: Degnon Associates, 2000.

10 Jobson JD. Applied Multivariate Data Analysis. New York: Springer-Verlag, 1992.

11 Agresti A. Categorical Data Analysis. New York: John Wiley & Sons, 1990.

12 Margulis A, Burhenne H. Alimentary Tract Roentgenology. St Louis: Mosby, 1973.