The Journal of Arthroplasty Vol. 23 No. 6 2008

A Comparison Between Patient Recall and Concurrent Measurement of Preoperative Quality of Life Outcome in Total Hip Arthroplasty

Jonathan Howell, MD,* Min Xu, MB, MSc,† Clive P. Duncan, MB, FRCSC,* Bassam A. Masri, MD, FRCSC,* and Donald S. Garbuz, MD, MHSc, FRCSC

Abstract: The objective is to evaluate the reliability of patients' recall of preoperative pain and function during the immediate postoperation period after total hip arthroplasty. A prospective cohort of 104 patients completed a survey about their quality of life before operation, and recalled preoperative status at 3 days, 6 weeks, and 12 weeks after operation. Quality of life was measured by the Western Ontario and McMaster University Osteoarthritis Index, the Oxford-12 hip score, and the 12-item Short-Form score. The intraclass correlation coefficient and Spearman correlation coefficient were used to compare preoperative quality of life scores to the scores recalled. The reliability of recall remained high up to 3 months postoperation. Patients are able to accurately recall their preoperative function for up to 3 months after total hip arthroplasty.

Key words: recall, reliability, quality of life outcome, total hip arthroplasty.

© 2008 Elsevier Inc. All rights reserved.

From the *Division of Adult Lower Limb Reconstruction & Oncology, Department of Orthopaedics, University of British Columbia, Vancouver, BC, Canada; and †The Arthritis Research Centre of Canada, Vancouver, BC, Canada.

Submitted March 12, 2007; accepted July 30, 2007.
No benefits or funds were received in support of the study.
Reprint requests: Donald S Garbuz, MD, MHSc, FRCSC, Division of Adult Lower Limb Reconstruction & Oncology, Department of Orthopaedics, University of British Columbia, Room 3114, 910 West 10th Avenue, Vancouver, BC V5Z 4E3, Canada.
0883-5403/08/2306-0009$34.00/0
doi:10.1016/j.arth.2007.07.020

Total hip arthroplasty is an operation that has been shown to lead to significant improvements in physical function, psychological well-being, life satisfaction, and self-rated health [1,2]. Several studies have shown that a patient's preoperative function is strongly associated with their long-term functional outcome [1,3,4]. Therefore, it is important for orthopedic researchers to report preoperative pain and function in outcome studies on total hip arthroplasty.

However, it may not always be possible to collect the data immediately before the operation, even in prospective studies. In today's practice of orthopedics, resources are limited and patients are increasingly admitted on the day of surgery, which may interfere with preoperative data collection. In addition, there are circumstances such as after a periprosthetic fracture in which it is not feasible to collect quality of life (QOL) data before the operation.

Collection of preoperative QOL status with validated outcome tools, after the operation, would be a useful alternative, if patient recall is reliable. This would facilitate collection of data on a higher percentage of patients enrolled in databases and prospective studies. In addition, for urgent cases such as those mentioned above, collection of preoperative data relies on patient recall in the immediate postoperative period.

Little is known of patients' ability to recall their preoperative pain and function after total hip arthroplasty. Mancuso and Charlson [5] studied patients 2 and a half years after hip arthroplasty and generally found only poor to fair agreement between responses before and after hip arthroplasty. Overall, patients tended to recall more pain, better walking, better function, and worse impact of hip arthritis on health than they reported before surgery. Lingard et al [6] studied recall 3 months after knee arthroplasty and found only poor to fair agreement for Western Ontario and McMaster University Osteoarthritis Index (WOMAC) pain and Short-Form 36, with patients again tending to recall more pain. However, there is no study available in the literature about the short-term recall of QOL after total hip arthroplasty.

The objective of this prospective study was to evaluate the reliability of patients' recall of preoperative pain and function during the immediate postoperation period after total hip arthroplasty. The primary hypothesis was that patients' recollection of preoperative pain and function is reliable if collected during the first 3 days postoperation. The secondary hypothesis was that the reliability of recollection data does not decline during the first 3 months postoperation.

Materials and Methods

Patients

Between September 2002 and January 2003, patients admitted to a tertiary hospital for primary and revision total hip arthroplasties were prospectively recruited into this study. Patients were excluded if (1) they were unable to complete a questionnaire written in English without assistance; (2) they underwent bilateral total hip arthroplasty during the same hospital admission; (3) their revision arthroplasty was carried out for an emergency cause such as infection, dislocation, or periprosthetic fracture; or (4) another surgical intervention occurred in the study period.

Data Collection

All patients admitted for elective joint arthroplasty attend a preoperative assessment clinic. During their clinic visit, QOL status is collected using a questionnaire package that is composed of 3 validated assessment tools: the WOMAC Likert scale [7], the Oxford-12 hip score [8], and the 12-item Short-Form (SF-12) score [9]. All patients in this study attended the preoperative clinic. To minimize any learning effect, patients were not recruited until 3 days after their hip arthroplasty. Informed consent was obtained from all patients enrolled in the study. Patients were instructed to recall their preoperative status and complete the same questionnaire package at 3 days, 6 weeks, and 12 weeks after surgery. The preoperative package was completed in clinic on the day of the preoperative visit, and the 3-day postoperative package was completed in hospital. The 6- and 12-week packages were mailed to patients and returned by mail, with a follow-up phone call if no mailed response was received. In addition, demographic data including age and sex were collected for all participants.

Quality of Life Measurement Instruments

The WOMAC is a disease-specific outcome score for osteoarthritis [7,10], composed of separate dimensions for pain (5 items), stiffness (2 items), and function (17 items). At present, it is the most frequently used measure of subjective pain and self-rated disability among total joint arthroplasty patients. The Oxford-12 questionnaires for hip and knee are new self-administered assessment tools that, unlike the WOMAC, are designed specifically to capture joint arthroplasty outcomes. The global score is the sum of 12 Likert item responses (each scored 1-5) concerning joint pain, function, and mobility. The Oxford-12 is validated in the UK and Sweden [8,11,12]. There is considerable interest in broadening the use of the Oxford-12 scores in total joint arthroplasty outcomes research [13,14]. The SF-12 is a self-administered generic measure of health-related quality of life with 2 component scores: mental and physical. The SF-12 is widely used and has been shown to be reliable and valid across a broad spectrum of medical conditions [15,16].
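The scoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the scoring software used in the study. The function names are hypothetical, the WOMAC item range of 0 to 4 is an assumption (the text gives only the item counts per dimension), and the raw sums shown here may differ from any normalized scores the study reported.

```python
# Item counts per WOMAC dimension, as stated in the text.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}


def oxford12_score(items):
    """Sum 12 Likert responses, each scored 1-5 (possible range 12-60)."""
    if len(items) != 12:
        raise ValueError("Oxford-12 expects exactly 12 item responses")
    if any(not 1 <= v <= 5 for v in items):
        raise ValueError("each Oxford-12 item must be scored 1-5")
    return sum(items)


def womac_scores(items_by_dimension, item_min=0, item_max=4):
    """Return raw subscale sums plus a global sum.

    The 0-4 item range is an assumption for this sketch.
    """
    scores = {}
    for dimension, n_items in WOMAC_ITEMS.items():
        items = items_by_dimension[dimension]
        if len(items) != n_items:
            raise ValueError(f"{dimension} expects {n_items} items")
        if any(not item_min <= v <= item_max for v in items):
            raise ValueError(f"{dimension} item out of range")
        scores[dimension] = sum(items)
    # Global score as the sum of the three subscales.
    scores["global"] = sum(scores.values())
    return scores


if __name__ == "__main__":
    print(oxford12_score([3] * 12))
    print(womac_scores({"pain": [2] * 5, "stiffness": [1] * 2, "function": [2] * 17}))
```

Validating the item count and range before summing guards against partially completed questionnaires being scored as if they were complete.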

Statistical Analysis

This study compared preoperative QOL scores obtained before surgery with those recalled afterward. Paired t tests were used to compare the magnitude of difference between preoperative QOL scores and postoperative recalled scores. The responses to the preoperative questionnaires were compared with the responses given at each of the postoperative intervals using the intraclass correlation coefficient (ICC) and Spearman correlation coefficient.
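As an illustration of the paired comparison, the following sketch computes the mean paired difference and the paired t statistic using only the Python standard library. The function name, the made-up sample scores, and the sign convention (recalled minus preoperative) are assumptions for this example; the study itself used SPSS, and the P value would be read from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t(pre, recalled):
    """Return (mean difference, t statistic, degrees of freedom).

    Differences are taken as recalled minus preoperative; this sign
    convention is an assumption of the sketch.
    """
    if len(pre) != len(recalled) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [r - p for p, r in zip(pre, recalled)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical scores for five patients, for illustration only.
    pre = [40, 35, 50, 45, 30]
    recalled = [38, 36, 47, 44, 27]
    print(paired_t(pre, recalled))
```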


The nonparametric Spearman correlation test does not assume that the data are normally distributed. The subjects are ranked on each of the 2 variables, and the 2 sets of ranks are then compared. The Spearman correlation tests the direction and strength of the association between the 2 variables. The result lies between +1 and −1. A value of +1 indicates perfect positive correlation: as one variable increases, so does the other. A value of −1 indicates perfect negative correlation: as one variable increases, the other decreases. Values close to zero indicate little or no monotonic relationship between the 2 variables.

The intraclass correlation coefficient measures agreement beyond chance for continuous data. The ICC is the ratio of the variability between participants to the total variability [17]. It is calculated as

ICC = (between-groups MS − within-groups MS) / [between-groups MS + (n − 1) × within-groups MS],

where MS is the mean square and n is the number of cases in the responses measured at each time point [18]. An absolute value of correlation less than 0.3 indicates a weak relationship; 0.3 to 0.7 is moderate; and more than 0.7 is strong. Subgroup analyses were also performed by age group (<65 and >65 years old), sex (female and male), number of joints (unilateral and bilateral), and type of operation (primary and revision). Statistical analysis was performed with the SPSS software package (version 14.0; SPSS Inc, Chicago, Ill).
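The two reliability statistics can be sketched as follows, using only the Python standard library. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study. The function names are hypothetical. The ICC shown is the one-way form of the mean-square formula above, with the multiplier taken as the number of measurements per patient (k = 2: preoperative and recalled); which ICC model SPSS was configured to use is not stated in the text.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_oneway(pre, recalled):
    """One-way ICC for 2 measurements per patient.

    (MS_between - MS_within) / (MS_between + (k - 1) * MS_within), k = 2.
    """
    n, k = len(pre), 2
    grand = (sum(pre) + sum(recalled)) / (n * k)
    subject_means = [(p + r) / 2 for p, r in zip(pre, recalled)]
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (p - m) ** 2 + (r - m) ** 2
        for p, r, m in zip(pre, recalled, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical scores for four patients, for illustration only.
    pre = [10, 20, 30, 40]
    recalled = [12, 18, 33, 39]
    print(spearman_rho(pre, recalled), icc_oneway(pre, recalled))
```

Unlike the Spearman coefficient, the ICC is lowered by a systematic shift between the two measurements, which is why it is the stricter test of whether recalled scores reproduce the preoperative ones.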

Results

One hundred four patients were included in the study, with a mean age of 61 years and a range of 29 to 86 years. There were 57 female and 47 male patients. Seventy-nine patients had primary hip arthroplasties and 25 had revision surgery; 58 had unilateral hip disease and 46 had bilateral disease. The 79 primary cases included 54 patients with osteoarthritis, 13 patients with degenerative disease secondary to dysplasia, 8 with osteonecrosis, 2 with rheumatoid arthritis, 1 with slipped capital femoral epiphysis, and 1 with multiple epiphyseal dysplasia. Among the revision patients, the primary diagnoses included osteoarthritis in 16 cases, 3 posttraumatic cases, 2 cases with ankylosing spondylitis, and 1 each of rheumatoid arthritis, dysplasia, Perthes disease, and septic arthritis. The response rate was 100% (104/104) at 3 days postoperation, 87% (90/104) at 6 weeks postoperation, and 73% (76/104) at 3 months postoperation.

Table 1 shows the results of paired t test between postoperative recollection and preoperative collection of QOL scores. Between preoperative scores and 3 days postoperative recollection, there are significant differences measured by Oxford score, WOMAC function and pain subscale, and SF-12 mental component score. Recalled Oxford scores are significantly worse than preoperative Oxford score (Δ = 1.58, P = .001). Recalled WOMAC pain scores are significantly worse than preoperative collection (Δ = −2.21, P = .029). Recalled WOMAC function scores are significantly worse than preoperative collection (Δ = −4.64, P < .001). Recalled SF-12 mental component scores are also significantly worse (Δ = −4.82, P < .001).

Between preoperative scores and 6 weeks postoperative recollection, recalled SF-12 mental component scores are also significantly worse (Δ = −2.79, P = .01). However, there are no significant differences observed between recalled score at 6 weeks postoperation and preoperation collection measured by all other QOL instruments including WOMAC, Oxford score, and SF-12 physical component (P > .05).

Furthermore, between preoperative scores and 3 months postoperative recollection, there are no significant differences measured by all QOL instruments including WOMAC, Oxford score, and both SF-12 physical and mental components (P > .05).

Table 1. Paired t Test Between Postoperative Recollection and Preoperation Collection

                    3 d Postoperative    6 wk Postoperative    3 mo Postoperative
                    vs Preoperative      vs Preoperative       vs Preoperative
Oxford-12            1.58 **              0.85                  0.02
WOMAC global        −3.08 *              −1.46                 −1.14
WOMAC pain          −2.53 *              −1.34                 −1.60
WOMAC stiffness     −2.21                 3.23                  0.99
WOMAC function      −4.64 **             −1.97                 −0.76
SF-12 physical       0.34                 0.23                 −0.05
SF-12 mental        −4.82 **             −2.79 **               0.24

* P < .05.
** P < .001.

Fig. 1. Intraclass correlation coefficient of recalled QOL scores at 3 days, 6 weeks, and 3 months postoperation vs preoperative QOL scores.

Fig. 2. Spearman correlation coefficient of recalled QOL scores at 3 days, 6 weeks, and 3 months postoperation vs preoperative QOL scores.


Overall, as shown in Fig. 1 and Table 2, the ICCs between preoperative scores and recalled scores were strong at each time interval. The reliability of recall remained high out to 3 months after operation. The only moderate correlation was between WOMAC stiffness scores obtained before surgery and 6 weeks after surgery. The Spearman correlation coefficient was also high at all time points for the 3 outcome tools used in this study (Fig. 2).

Subgroup analysis for ICC showed that the trend of correlation between preoperative scores and recalled scores at 3 days postoperation was very similar between female and male patients (Fig. 3), and therefore patients' sex had no effect on the recall ability.

However, subgroup analysis for ICC showed that the patient's age did have an effect on recall ability. The correlation between preoperative scores and recalled scores at 3 days postoperation remained strong in patients younger than 65 years, whereas it varied from moderate to strong in patients older than 65 years (Fig. 4).

Subgroup analysis showed that the correlation between preoperative and recalled scores demonstrated a similar trend in unilateral and bilateral hip diseases (Fig. 5). Therefore, laterality of the surgery had no effect on the recall ability.
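The interpretation thresholds given in the Statistical Analysis section (less than 0.3 weak, 0.3 to 0.7 moderate, more than 0.7 strong) can be applied mechanically to the ICC values reported in Table 2. The sketch below does so; the ICC values are transcribed from Table 2, while the function and variable names are hypothetical.

```python
def strength(r):
    """Classify a correlation using the cutoffs stated in the text."""
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a <= 0.7:
        return "moderate"
    return "strong"


TIME_POINTS = ("3 d", "6 wk", "3 mo")

# ICC values from Table 2, ordered as in TIME_POINTS.
ICC = {
    "Oxford-12": (0.905, 0.876, 0.958),
    "WOMAC global": (0.857, 0.878, 0.930),
    "WOMAC pain": (0.894, 0.861, 0.914),
    "WOMAC stiffness": (0.803, 0.687, 0.847),
    "WOMAC function": (0.911, 0.856, 0.937),
    "SF-12 physical": (0.830, 0.769, 0.897),
    "SF-12 mental": (0.863, 0.836, 0.926),
}

# Collect every (instrument, time point) pair that is not "strong".
not_strong = [
    (name, when)
    for name, values in ICC.items()
    for when, value in zip(TIME_POINTS, values)
    if strength(value) != "strong"
]

if __name__ == "__main__":
    print(not_strong)
```

With these values, the only entry that falls outside the strong band is WOMAC stiffness at 6 weeks (ICC 0.687), which matches the statement in the text.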

All instruments showed a strong correlation in patients having primary hip arthroplasty. All instruments also showed a moderate to strong correlation in patients having revision hip arthroplasty (Fig. 6).

Table 2. Reliability of recalled QOL scores vs Preoperative QOL scores

                    Preoperative vs         Preoperative vs         Preoperative vs
                    3 d Postoperative       6 wk Postoperative      3 mo Postoperative
                    Spearman ρ   ICC        Spearman ρ   ICC        Spearman ρ   ICC
Oxford-12           0.818        0.905      0.803        0.876      0.923        0.958
WOMAC global        0.795        0.857      0.781        0.878      0.865        0.930
WOMAC pain          0.809        0.894      0.779        0.861      0.865        0.914
WOMAC stiffness     0.651        0.803      0.530        0.687      0.743        0.847
WOMAC function      0.844        0.911      0.736        0.856      0.867        0.937
SF-12 physical      0.659        0.830      0.537        0.769      0.758        0.897
SF-12 mental        0.772        0.863      0.714        0.836      0.762        0.926

Discussion

This study demonstrated moderate to strong correlation of preoperative QOL scores obtained before operation and 3 days, 6 weeks, and 3 months postoperation. We have used different statistical methods to assess the reliability of patients' recall of preoperative function and pain. All methods have demonstrated moderate to strong correlation between preoperative scores and those recalled for up to 3 months after hip arthroplasty. Both function and pain scores showed strong correlation between preoperative collection and postoperative recollection. Stiffness and mental component scores showed moderate to strong correlation.

In the present study, we have found that patients are able to accurately recall their preoperative function for up to 3 months after total hip arthroplasty, but not all components of the questionnaire were equally recalled. The site-specific Oxford-12 score was particularly reliable, as was the disease-specific global WOMAC score, although the stiffness domain of the latter was less reliable. This domain contains only 2 questions, which may explain the variable results that it produced. We found that the general health score SF-12 was the least reliable of the 3 assessment tools.

Little is known of patients' ability to recall their preoperative QOL status immediately after total hip arthroplasty. One previous study by Mancuso and Charlson [5] reported patients' ability to recall their function after total hip arthroplasty. They studied patients 2 and a half years after surgery using the hip-rating questionnaire and generally found only poor to fair agreement between responses before and after surgery. Overall, their patients tended to recall more pain, better walking, better function, and worse impact of hip arthritis on health than they had reported before surgery. The authors concluded that relying on patients' recall does not provide an accurate measure of preoperative status. A patient's assessment of their postoperative result may change with time as a result of postoperative experiences that lead to a so-called response shift [19,20], and this may explain why, in the study of Mancuso and Charlson [5], patients' recall of the preoperative function was poor. However, in our study, recall was assessed in the first 3 months after surgery, and this probably explains why the agreement in our study was high in comparison to that of Mancuso and Charlson [5].

Fig. 3. Effect of sex on recall reliability.

Fig. 4. Effect of age on recall reliability.

Fig. 5. Effect of laterality on recall reliability.

Fig. 6. Effect of operation type on recall reliability.

Short-term recall of function after knee arthroplasty has been reported by Lingard et al [6], who studied recall of preoperative function 3 months after surgery. In this study, the authors found poor to fair agreement for pain and function, with patients again tending to recall more pain. They concluded that researchers must exercise caution when using recall data to derive preoperative status. In 2 further orthopedic studies, one in low back pain [21] and one in foot and ankle surgery [22], patients were again shown to have only poor to moderate recall of preoperative status; although in the low back pain study, patients underestimated their degree of preoperative pain. In their assessment of reliability of patients' responses, these 3 studies all used the κ statistic. The κ statistic is useful for measuring agreement for categorical variable outcomes with 2 or more possibilities [17]. However, current measurements of QOL outcome use continuous scales, and transforming continuous scales into categorical outcomes will result in the loss of information and more variability. This will result in lower levels of agreement when the κ statistic is used. In our study, the ICC was used, which does not categorize the data; and this may explain why we were able to show higher levels of agreement than were reported by Lingard et al.

Bedard et al [17] compared measures of assessing reliability under different statistical conditions including those of perfect agreement, systematic bias, and random bias. They have found that the ICC is able to distinguish between reliable data and data with both systematic and random bias [17]. They suggest that it is the best statistical method for measuring reliability, and Rousson et al [23] have confirmed that for intra- and interrater reliability, the ICC is the preferred method. Therefore, we evaluated ICC for the subgroup analysis. The ability of patients to recall their preoperative ability was not affected by a number of demographic measures including sex, disease laterality, or operation type, although older patients did have slightly poorer recall of their function.

Paired t tests were used to measure the change between postoperative recollection and preoperative collection of QOL scores. There were statistically significant differences measured between 3 days postoperative recollection and preoperative collection. At 3 days postoperation, patients recalled more pain and worse function before operation. However, the difference is not clinically significant. For example, the biggest difference detected is for WOMAC function score (Δ = −4.64), which is less than the minimal clinical improvement (9.3 points) [24]. Also, there were neither statistically nor clinically significant differences observed between 6 weeks postoperative recollection and preoperative collection. Similarly, no significant differences were observed between 3 months postoperative recollection and preoperative collection.

There are limitations in our study as well. In the present study, patients may have had a heightened awareness of their preoperative function as a result of being tested repeatedly at 3 different postoperative intervals. We minimized learning effects by recruiting patients after their total hip arthroplasty, thus preventing them from memorizing their preoperative scores. Preoperative data were collected at a routine preadmission clinic that was held within the 42 days before a patient's operation and the first postoperative data were collected 3 days afterward. It is possible that some of the patients may have recalled their preoperative responses and used this in subsequent answers for the 6-week and 3-month intervals. However, we compared the reliability of patients who had their preoperative assessment within 21 days of their operation and found that this was no different from the reliability of patients who had their preoperative data collected more than 21 days before surgery (P > .05), and so it seems that this retrospective learning effect was probably negligible. Nevertheless, it may have been advantageous to have tested 3 different groups of patients, one at 3 days, one at 6 weeks, and one at 3 months to minimize this effect.
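The clinical-significance argument made earlier in the Discussion, that the largest recalled difference (WOMAC function, Δ = −4.64) falls below the 9.3-point minimal clinical improvement from reference [24], amounts to a simple comparison. The sketch below makes it explicit. The function names are hypothetical, and the sketch assumes, as the text does, that the difference and the threshold are expressed on the same scale.

```python
# Threshold quoted in the text from reference [24].
WOMAC_FUNCTION_THRESHOLD = 9.3


def exceeds_threshold(delta, threshold=WOMAC_FUNCTION_THRESHOLD):
    """True if the absolute difference reaches the clinical threshold."""
    return abs(delta) >= threshold


def fraction_of_threshold(delta, threshold=WOMAC_FUNCTION_THRESHOLD):
    """Absolute difference expressed as a fraction of the threshold."""
    return abs(delta) / threshold


if __name__ == "__main__":
    delta_function_3d = -4.64  # Table 1, WOMAC function, 3 days vs preoperative
    print(exceeds_threshold(delta_function_3d))
    print(round(fraction_of_threshold(delta_function_3d), 2))
```

On these figures the 3-day difference is about half of the threshold, so it is statistically detectable but below the level the cited work treats as perceptible to patients.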

Another limitation of the study is that the response at 6 weeks and 3 months postoperation was collected by mail and phone. The response rate was 87% (90/104) at 6 weeks postoperation and 73% (76/104) at 3 months postoperation. However, the demographic factors such as age, sex, and preoperative scores between responders and nonresponders were not significantly different (P > .05). Therefore, there does not appear to be a response bias in this study.

In outcomes research in total hip arthroplasty, obtaining preoperative QOL scores is imperative. The authors believe the gold standard is obtaining the scores immediately before surgery. However, in cases where this cannot be done, this study supports obtaining these scores up to 3 months after surgery.

References

1. Cleary LJ, Byrne JH. Identification and characterization of a multifunction neuron contributing to defensive arousal in Aplysia. J Neurophysiol 1993;70:1767.

2. Petrie K, Chamberlain K, Azariah R. The psychological impact of hip arthroplasty. Aust N Z J Surg 1994;64:115.

3. Fortin PR, Clarke AE, Joseph L, et al. Outcomes of total hip and knee replacement: preoperative functional status predicts outcomes at six months after surgery. Arthritis Rheum 1999;42:1722.

4. Fortin PR, Penrod JR, Clarke AE, et al. Timing of total joint replacement affects clinical outcomes among patients with osteoarthritis of the hip or knee. Arthritis Rheum 2002;46:3327.

5. Mancuso CA, Charlson ME. Does recollection error threaten the validity of cross-sectional studies of effectiveness? Med Care 1995;33(Suppl 4):AS77.

6. Lingard EA, Wright EA, Sledge CB. Pitfalls of using patient recall to derive preoperative status in outcome studies of total knee arthroplasty. J Bone Joint Surg Am 2001;83-A:1149.

7. Bellamy N, Buchanan WW, Goldsmith CH, et al. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833.

8. Dawson J, Fitzpatrick R, Murray D, et al. Comparison of measures to assess outcomes in total hip replacement surgery. Qual Health Care 1996;5:81.

9. Ware Jr JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473.

10. Bellamy N, Kirwan J, Boers M, et al. Recommendations for a core set of outcome measures for future phase III clinical trials in knee, hip, and hand osteoarthritis. Consensus development at OMERACT III. J Rheumatol 1997;24:799.

11. Fitzpatrick R, Morris R, Hajat S, et al. The value of short and simple measures to assess outcomes for patients of total hip replacement surgery. Qual Health Care 2000;9:146.

12. Garbuz DS, Xu M, Sayre EC. Patients' outcome after total hip arthroplasty: a comparison between the Western Ontario and McMaster Universities Index and the Oxford 12-item Hip Score. J Arthroplasty 2006;21:998.

13. Dunbar MJ, Robertsson O, Ryd L, et al. Translation and validation of the Oxford-12 item knee score for use in Sweden. Acta Orthop Scand 2000;71:268.

14. Dunbar MJ, Robertsson O, Ryd L, et al. Appropriate questionnaires for knee arthroplasty. Results of a survey of 3600 patients from The Swedish Knee Arthroplasty Registry. J Bone Joint Surg Br 2001;83:339.

15. Jenkinson C, Wright L, Coulter A. Criterion validity and reliability of the SF-36 in a population sample. Qual Life Res 1994;3:7.

16. McHorney CA, Kosinski M, Ware Jr JE. Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care 1994;32:551.

17. Bedard M, Martin NJ, Krueger P, et al. Assessing reproducibility of data obtained with instruments based on continuous measurements. Exp Aging Res 2000;26:353.

18. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:619.

19. Sprangers MA. Response-shift bias: a challenge to the assessment of patients' quality of life in cancer clinical trials. Cancer Treat Rev 1996;22(Suppl A):55.

20. Sprangers MA, Schwartz CE. Integrating response shift into health-related quality of life research: a theoretical model. Soc Sci Med 1999;48:1507.

21. Dawson EG, Kanim LE, Sra P, et al. Low back pain recollection versus concurrent accounts: outcomes analysis. Spine 2002;27:984.

22. Toolan BC, Wright QV, Cunningham BJ, et al. An evaluation of the use of retrospectively acquired preoperative AOFAS clinical rating scores to assess surgical outcome after elective foot and ankle surgery. Foot Ankle Int 2001;22:775.

23. Rousson V, Gasser T, Seifert B. Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med 2002;21:3431.

24. Ehrich EW, Davies GM, Watson DJ, et al. Minimal perceptible clinical improvement with the Western Ontario and McMaster Universities osteoarthritis index questionnaire and global assessments in patients with osteoarthritis. J Rheumatol 2000;27:2635.