Measurement Properties of the Whiplash Disability
Questionnaire in Acute Whiplash-associated Disorders
by
Maja Stupar
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Institute of Health Policy, Management and Evaluation, University of Toronto
© Copyright by Maja Stupar, 2013
Measurement Properties of the Whiplash Disability Questionnaire
in acute Whiplash-associated Disorders
Maja Stupar
Doctor of Philosophy (Clinical Epidemiology)
Institute of Health Policy, Management and Evaluation University of Toronto
2013
Abstract
Whiplash-associated disorders (WAD) include physical and psychological symptoms that may
lead to disability. However, measuring disability following whiplash injuries is challenging
because we lack valid and reliable measurement tools. The assessment of WAD-related
disability relies on self-reported instruments that are specific to neck pain and do not
comprehensively target the constructs associated with WAD-related disability. Designing new
tools and evaluating their measurement properties is challenging because of the apparent
inconsistencies in the theoretical frameworks (psychometrics and clinimetrics) used in
instrument development and in reliability, validity and responsiveness evaluation.
A scoping review design was used to develop a conceptual theory on the difference between
clinimetrics and psychometrics in order to provide recommendations for future application. The
scoping review of psychometric and clinimetric methods suggested that the two frameworks are
not as divergent as reflected in the current protracted debates. Content analysis revealed that
differences only exist in the scope of what is measured and in instrument development methods
with no operational differences in the testing phases. Based on content analysis, I developed a
new framework that bridges the two measurement schools with an overlapping informed zone
between them.
I designed a cohort study of 130 participants with acute WAD to assess the measurement
properties of the Whiplash Disability Questionnaire (WDQ). The WDQ is a recently developed
instrument designed to capture the broad construct of WAD-related disability. The WDQ
measurement properties were determined in adults with WAD recruited within 21 days of their
collision. My study indicates that the WDQ and its subscales are reliable and valid for clinical
and research use. The WDQ can demonstrate change over time as a single scale or as the daily
activities subscale. However, WDQ users should be aware of its measurement error when
demonstrating change over time. Furthermore, the emotional subscale should not be used alone
to demonstrate change over six weeks because it was not responsive.
My thesis proposes a unified framework for studying the measurement properties of assessment
tools used in clinical practice. I also demonstrated that the WDQ possesses the necessary
properties to be used in patients with acute WAD.
Acknowledgments
The journey of life is not a journey taken alone. I would like to thank everyone who has
contributed to my journey through this doctoral program.
First, I would like to thank my supervisor, Dr. Pierre Côté, for his mentorship, support and
availability throughout my doctoral journey. I thank him for challenging me and helping to mold
me into the young investigator that I have become. His infectious enthusiasm for research and
excellence in scientific rigor continue to be inspiring. I was also privileged to work with a
dedicated and supportive advisory committee, Dr. Dorcas Beaton, Dr. Eleanor Boyle and Dr. J.
David Cassidy. I am thankful for their guidance and tireless feedback.
Several individuals and teams contributed, directly and indirectly, to the completion of my
thesis. I would like to thank everyone who contributed to the recruitment, data collection,
processing and completion of the UHN Whiplash Intervention Trial that, in turn, helped the
completion of this thesis project.
Without funding support, the completion of my doctoral program would not have been possible. I would
like to thank the Canadian Institutes of Health Research (CIHR) for providing three years of
financial support toward my doctoral studies through the Vanier Canada Graduate Scholarship.
My doctoral education experience was also enriched with the opportunity to study abroad at
Karolinska Institute in Stockholm, Sweden through the support of the CIHR Michael Smith
Foreign Study Supplement. I am thankful to the Department of Clinical Epidemiology and
Health Care Research within the Institute of Health Policy, Management and Evaluation at the
University of Toronto for providing additional support. Finally, without AVIVA Canada’s
vision of investing in research to improve business practices, the UHN Whiplash Intervention
Trial would not have been possible and, in turn, my thesis projects would not have been completed. I
thank all these institutions for making it possible for me to dedicate time to my doctoral
education.
I thank my personal friends for their smiles and laughter that made my journey that much more
enjoyable and for their support during those more challenging times.
I am thankful to my family for their unwavering support; to my wonderful parents Milica and
Ilija Stupar, for guiding me through the rollercoaster of life and teaching me the value of hard
work; and to my dear sister Biljana for always standing by me with an attentive ear and for all
her insightful advice.
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
List of abbreviations
Preface
Chapter 1: Introduction
1.1 Measuring disability in health research
1.2 Epidemiology of Whiplash-associated Disorders
1.2.1 Definition
1.2.2 The burden of whiplash-associated disorders in the population
1.2.3 Prognosis of Whiplash-associated Disorders
1.2.4 Treatment of Whiplash-associated Disorders
1.2.5 Outcome measures currently used in WAD research
1.3 The measurement divide
1.4 Objectives
1.4.1 General Objectives
1.4.2 Specific Objectives
1.5 Structure of the Thesis
Chapter 2: Measurement Properties: A new framework to contribute to the debate
between the field of clinimetrics and psychometrics
2.1 Introduction
2.2 Methods
2.2.1 Research question
2.2.2 Search for relevant studies
2.2.3 Study selection
2.2.4 Data charting
2.2.5 Collation, summarizing and reporting results including synthesis
2.3 Results
2.3.1 Literature search
2.3.2 Study selection
2.3.3 Data charting
2.3.4 Collation, summarizing and reporting of results
2.3.5 Synthesis
2.4 Discussion
2.5 Conclusion
Chapter 3: Can Recovery from Whiplash-associated Disorders be Measured Reliably in
Patients with Acute Whiplash-Associated Disorders? A Test-retest Reliability Study of the
Whiplash Disability Questionnaire
3.1 Introduction
3.2 Methods
3.2.1 Participants
3.2.2 Procedure
3.2.3 Data
3.2.4 Sample Size
3.2.5 Analysis
3.2.5.1 Test-Retest Reliability
3.2.5.2 Minimal detectable change
3.2.5.3 Sensitivity Analyses
3.3 Results
3.3.1 Descriptive statistics
3.3.2 Completeness of WDQ
3.3.3 Test-retest reliability
3.3.4 Individual item test-retest reliability
3.3.5 Minimal detectable change
3.4 Discussion
3.5 Conclusion
3.6 Acknowledgement
Chapter 4: Exploratory Factor Analysis, Validity and Responsiveness of the Whiplash
Disability Questionnaire in Adults with Acute Whiplash-associated Disorders
4.1 Introduction
4.2 Methods
4.2.1 Participants and Procedures
4.2.2 Data Collection
4.2.2.1 Whiplash Disability Questionnaire
4.2.2.2 Numerical Pain Rating Scale
4.2.2.3 Neck Disability Index
4.2.2.4 Neck Bournemouth Questionnaire
4.2.2.5 CES-D
4.2.2.6 SF-36 Health Survey
4.2.2.7 Self-report Recovery
4.2.3 Analysis
4.2.3.1 Descriptive statistics
4.2.3.2 Factor Structure
4.2.3.3 Validity
4.2.3.4 Responsiveness
4.2.4 Sample Size
4.3 Results
4.3.1 Sample characteristics
4.3.2 Data completion
4.3.3 Factor structure
4.3.4 Validity
4.3.5 Responsiveness
4.4 Discussion
4.5 Conclusion
4.6 Acknowledgement
Chapter 5: Discussion
5.1 Context and summary of the thesis
5.2 Contribution of the research to the whiplash literature
5.3 Implications of the research
5.4 Future research
5.4.1 Content validity using qualitative methods
5.4.2 Minimizing measurement error
5.4.3 Longitudinal and structural construct validity
5.4.4 Predictive validity
5.4.5 Direct comparison with other relevant instruments
5.4.6 Applicability of the conceptual framework
References
Appendices
List of Tables
Table 2.1: Position statement of our framework and the evidence that is in support of the framework
Table 2.2a: Studies using empirical methods to test differences between clinimetric and psychometric methods
Table 2.2b: Studies using empirical methods to test differences between clinimetric and psychometric methods
Table 3.1: Baseline demographic characteristics of patients with acute whiplash-associated disorders
Table 3.2: Intra-class Correlation Coefficient for the Total Summary Score categorized by the report of no recovery on the change in neck pain question and memory effects
Table 3.3: Sensitivity Analysis for the Intra-class Correlation Coefficient for the Total Summary Score
Table 3.4: Intra-class Correlation Coefficient for individual items of the WDQ
Table 4.1: Baseline demographic characteristics of patients with acute whiplash-associated disorders
Table 4.2: Baseline means, medians and normality values of WDQ total score and individual items
Table 4.3: Model fit statistics for the models with different number of factors in the WDQ
Table 4.4: Factor analysis of the WDQ: The 2-factor solution
Table 4.5: Results of construct validation (n=130). A priori expected Pearson correlations between the WDQ, its subdomains and constructs shown (E) followed by observed/achieved results (A)
Table 4.6: Effect size, Guyatt’s responsiveness statistic (RS) and standardized response mean (SRM) for participants reporting recovery on the global recovery question (N=62)
Table 4.7: Spearman’s rank correlations and AUCs for responsiveness based on the a priori hypotheses
List of Figures
Figure 1.1: Data collection and data use in analysis addressing objectives two to six
Figure 2.1: Literature search for the measurement divide scoping review
Figure 2.2: Latent construct relationship with causal and indicator variables
Figure 2.3: Conceptual framework bridging clinimetrics and psychometrics
Figure 4.1: Total WDQ baseline distribution
Figure 4.2: Factor analysis scree plot
List of Appendices
Appendix 1: Questionnaires
A-1.1: Baseline Questionnaire
A-1.2: Three-to-Five Day Follow-up Questionnaire
A-1.3: Six-week Follow-up Questionnaire
A-1.4: Addition to WIT Baseline Questionnaire
A-1.5: Addition to WIT Six-week Follow-up Questionnaire
Appendix 2: Ethics Certificates
A-2.1: University Health Network Ethics Approval
A-2.2: University of Toronto Ethics Approval
Appendix 3: Baseline WDQ Distributions
Appendix 4: COSMIN Checklist completed with criteria relevant to this thesis
List of abbreviations
AIC: Akaike information criteria
ANOVA: Analysis of Variance
AUC: Area under the curve
CES-D: Center for Epidemiologic Studies Depression Scale
CINAHL: Cumulative Index to Nursing and Allied Health
COSMIN: COnsensus-based Standards for the selection of health Measurement INstruments
DASH: Disabilities of the Arm, Shoulder and Hands
EFA: Exploratory Factor Analysis
ES: Effect Size
GTA: Greater Toronto Area
ICC: Intra-class Correlation Coefficient
ICF: International Classification of Functioning, Disability and Health framework
IRT: Item Response Theory
KMO: Kaiser-Meyer-Olkin
MCID: Minimal Clinically Important Difference
MDC: Minimal Detectable Change
MeSH: Medical Subject Heading
NDI: Neck Disability Index
NPTF: Neck Pain Task Force
NRS: Numerical Rating Scale
PCA: Principal Component Analysis
QTF: Quebec Task Force
RMSR: Root Mean Square Residual
ROC: Receiver operating characteristic
SBC: Schwarz Bayesian criteria
SEM: Standard Error of Measurement
SF-36: Short-Form Health Survey containing 36 items from the Medical Outcomes Study
SRM: Standardized Response Mean
TLRC: Tucker and Lewis Reliability coefficient
UHN: University Health Network
VAS: Visual Analog Scale
WAD: Whiplash-associated Disorders
WDQ: Whiplash Disability Questionnaire
WHO: World Health Organization
WIT: Whiplash Intervention Trial
WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index
Preface
Background
The general purpose of my thesis was to determine the measurement properties of the Whiplash
Disability Questionnaire (WDQ) in patients with acute Whiplash-Associated Disorders (WAD)
and to develop a conceptual theory on the difference between clinimetrics and psychometrics. In
order to meet these goals, I performed a scoping literature review and I designed a cohort study
that involved primary data collection. Potential participants for this cohort study were recruited
alongside the University Health Network (UHN) Whiplash Intervention Trial (WIT) but
participation for this study was offered regardless of their eligibility for the trial.[26] The UHN
WIT investigated the effectiveness of programs of care in improving recovery of patients with
recent WAD. The recruited population for the UHN WIT included adults who made an
insurance claim for traffic injuries to a large Ontario insurer (Aviva Canada) between February
2008 and June 2012 with WAD diagnoses Grades I-II[113] of less than 3 weeks duration.
Participants were given the opportunity to participate in both studies but the cohort study also
included WAD Grade III and had a shorter recruitment period from February 2008 to August
2009. The UHN WIT was led by Dr. Pierre Côté. I was a clinical research coordinator of the
UHN WIT and the cohort study as well as one of the co-authors.
The objectives of the research conducted for my doctoral dissertation are separate from the
randomized controlled trial led by Dr. Pierre Côté. The UHN WIT provided the infrastructure
for recruiting participants within 21 days of their collision. Without this infrastructure, recruiting
participants with acute WAD would not have been possible in Ontario for a small cohort study. Within
this infrastructure, claims adjusters identified potential study subjects when policy holders
contacted AVIVA’s claim center to report an injury. A short screening tool was designed to
assist adjusters in identifying eligible participants. The tool prompted the adjusters to ask
claimants about their location of residence (GTA, Barrie, Brantford, Burlington, Cambridge, Guelph,
Hamilton, Kitchener-Waterloo, Newmarket, Oshawa, and surrounding towns); their age (18
years or older); whether they were making an injury claim; and whether their collision had occurred
within 21 days of reporting the injury. If they satisfied these conditions, the adjusters invited them to
enter a study at UHN, and asked permission to release their name and phone number to the UHN
research team. If they agreed, the claimant was referred immediately to one of the clinical
research coordinators and booked for eligibility assessment. The offer to participate in both
studies was given if potential participants were determined to be eligible after the history,
physical exam and, if needed, a radiological exam performed by the clinical research
coordinators. Informed consent was obtained separately for each study. Some baseline and
six-week follow-up data were the same for both studies; these data were collected only once for
participants in both studies, with only a few additional questions asked for the cohort study.
These data collection procedures reduced the burden on study participants and provided the
cohort with a rich dataset appropriate for analysis of measurement properties of the WDQ in
acute WAD.
Roles and Responsibilities
As a clarification of the roles and responsibilities, my specific tasks in the conduct of this
research over the past six and a half years are outlined below:
i. Designed the study and defended the protocol in May 2008;
ii. Wrote the ethics applications to the University Health Network and University of
Toronto;
iii. Coordinated participant recruitment and data collection including baseline, 3-5 day
reliability study follow-up and the 6-week responsiveness study follow-up;
iv. Developed, cleaned, validated and managed databases used for the cohort study;
v. Conducted the analysis for the test-retest reliability, factor analysis, construct validity
and responsiveness;
vi. Designed, led and contributed to the scoping review of literature as one of two
reviewers;
vii. Conceptualized the scoping review framework based on content analysis with one
other author;
viii. Served as the primary author and lead writer of all the papers presented in this thesis.
Chapter 1: Introduction
1.1 Measuring disability in health research
In the era of evidence-based medicine and health care accountability, measuring outcomes with
validated outcome measures is the essential building block for developing evidence and
implementing it into practice.[52,104] How health outcomes are assessed directly affects the
development of effective therapies and the evaluation of their cost-effectiveness. Outcome
measures need to be validated for use in research and in clinical practice.[33] Validation of an
instrument means that measurement properties have been tested to ensure that the instrument
measures what it purports to measure and that it can accurately demonstrate change over a
clinically relevant period of time. Without adequate measurement properties, results of clinical
trials would be biased and change demonstrated in clinical settings would be inaccurate. Results
from clinical studies are only as good as the instruments used to measure outcomes in those
studies.
Unlike weight or blood pressure, many health outcomes cannot be directly measured. Therefore,
many outcomes are defined and measured as latent constructs. The development and testing of
instruments used to measure latent constructs is complex. Developers of instruments need to
consider the definition of the construct that is being measured, the time component of what is
measured (e.g. change over time, current state), items that should be included in self-report
outcome measures to capture the scope of the construct and how to score the measure.[33] Once
developed, outcome measures must be tested to establish reliability, validity, and responsiveness.
Evaluating an instrument’s ability to accurately measure latent constructs can be complicated by
the lack of consistency in the terminology and methods used in the field of measurement. I used
the definitions provided by the consensus-based standards for the selection of health
measurement instruments (COSMIN) group because they used Delphi methods to reach
consensus on taxonomy, terminology and definitions related to measurement.[89] They defined
reliability as ‘the extent to which scores for patients who have not changed are the same for
repeated measurement under several conditions’. These conditions can be categorized into
different types of reliability, including: 1. internal consistency (e.g. using different sets of items
from the same health-related patient-reported outcome); 2. test-retest (e.g. repeating the
measurement over time); 3. inter-rater (e.g. measurement by different persons on the same occasion); and 4.
intra-rater reliability (e.g. measurement by the same raters on different occasions).[89]
Validity was defined as ‘the degree to which an instrument measures the construct(s) it purports
to measure’. Validity can also be assessed using different criteria and was categorized into: 1.
content validity (e.g. if the instrument adequately reflects the construct); 2. construct validity
(e.g. if the instrument is consistent with hypotheses relating to internal and external instrument
relationships and relevant group differences demonstrating measurement of the construct); and 3.
criterion validity (e.g. if the instrument is an adequate reflection of a gold standard).
Responsiveness was defined as ‘the ability of an instrument to detect change over time in the
construct to be measured’. Finally, the outcome measure must be interpretable, meaning that a
qualitative meaning can be assigned to the quantitative or change scores to some degree.[89]
Adequate measurement properties including reliability, validity, responsiveness and
interpretability are necessary when using an instrument in clinical and research settings to ensure
accurate measurement of outcomes. However, measurement properties are specific to the
condition, setting and population in which the instrument is assessed.[119] Therefore,
researchers and clinicians must consider the conditions, settings and populations in which
instruments are to be used or to which the instrument needs to be applicable when determining
their properties.
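To make the internal consistency component of reliability more concrete, the following minimal sketch computes Cronbach's alpha, a statistic commonly used to summarize internal consistency. The choice of statistic, the function name `cronbach_alpha` and the toy item scores are assumptions made for illustration; they are not taken from the COSMIN definitions cited above.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(n_items)]
    # Variance of the total (summed) score across respondents
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents answering 3 items scored 0-10
scores = [
    [2, 3, 2],
    [5, 5, 6],
    [8, 7, 8],
    [1, 2, 1],
    [6, 6, 7],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Values close to 1 indicate that the items vary together across respondents, which is what would be expected if they reflect the same construct.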
1.2 Epidemiology of Whiplash-associated Disorders
1.2.1 Definition
The Quebec Task Force (QTF) on Whiplash-Associated Disorders defined whiplash as an
acceleration-deceleration mechanism of energy transfer to the neck that commonly occurs in
motor vehicle collisions and may result in bony or soft tissue injuries.[113] The resulting
whiplash-associated disorders (WAD) are defined as a “clinical manifestation of, or the disability
caused by, whiplash injury and may include biologic, psychological, and social symptoms of the
potential tissue damage”.[99,113] Common WAD symptoms include neck pain, back pain,
headache, dizziness, arm pain, concentration problems and depression.[13,19,113]
1.2.2 The burden of whiplash-associated disorders in the population
Whiplash injuries are common following motor vehicle collisions. In the United States,
whiplash-related injuries were reported as the most common emergency department-treated
motor vehicle injury in 2000 with an incidence of 328 visits per 100,000 inhabitants.[101] In
2008, a systematic review of literature on the burden of neck pain and associated disorders such
as WAD was published by The 2000–2010 Bone and Joint Decade Task Force on Neck Pain and
Its Associated Disorders.[61] This systematic review estimated the annual incidence to be at
least 300 per 100,000 inhabitants in North America and western Europe.[61] It also reported that
the incidence of WAD differed substantially between countries. Similarly, a 2008 study by the
European Insurance Committee found that the incidence of minor cervical trauma (defined as a
percentage of overall claims) varied widely across ten European countries with the lowest
incidence found in France (3% of all bodily injuries) and the highest in Great Britain (76% of all
bodily injuries).[20] This study also found that the cost of minor cervical trauma varied greatly
between countries with Switzerland having higher costs (average cost of 35000 euro per claim)
compared to other European countries (average cost of 9000 euro per claim). However, these
cost differences did not reflect the difference in incidence across countries.
The incidence of WAD also varies across Canadian provinces. In Saskatchewan, the six-month
incidence of WAD was approximately 300 cases per 100,000 inhabitants in 1995.[19] WAD
were reported by 83% of all eligible participants in this cohort.[19,25] In contrast, the 12-month
incidence was 70 cases per 100,000 inhabitants in Quebec.[113] Different compensation
systems have been shown to influence the incidence and prognosis of WAD and may provide
part of the explanation for the varied reporting of injuries across provinces and countries.[19]
Multiple studies have reported that the incidence of WAD is higher in women and in
younger people.[25,27,61,113] It can also be influenced by personal, societal and
environmental risk factors.[61]
1.2.3 Prognosis of Whiplash-associated Disorders
Whiplash injuries are an important cause of persistent disability. Although the QTF originally
reported that WAD is a self-limiting condition with a favourable prognosis, subsequent studies
found that the course of WAD varies greatly between jurisdictions and insurance
systems.[28,113] In Saskatchewan, the median time to recovery decreased from 433 days in
1994 for claimants under the tort system to 200 days in 1995 for those insured under the no-fault
system.[19] In contrast, the original QTF study on WAD in 1987 reported the median
time on compensation to be 30 days, with 4.1% of individuals receiving compensation one
year after the collision.[28,113] A review of literature published by the NPTF on the course and
prognosis of WAD reported that approximately 50% of those with WAD will report neck pain
symptoms one year after their injuries.[15]
1.2.3.1 Prognostic factors for Recovery From WAD
The prognosis of WAD is complex and influenced by physical and psychological factors.
Studies have found that greater initial pain intensity, more symptoms and greater initial disability
predict slower recovery from WAD.[15,135] Pre- and post-injury psychosocial factors such as
passive coping, depressed mood and fear of movement are also predictive of slower
recovery.[17,97,114] Other studies reported that sociodemographic factors (e.g. female gender,
lower education), general health before the injury and insurance/compensation systems under
which benefits can be claimed were associated with WAD recovery.[15,19,28,135] In addition,
an individual’s expectation of recovery is an important prognostic factor for delayed recovery
with those reporting poor expectations showing much slower rates of recovery than those who
expect to get better soon after their injury.[62,94,95]
1.2.4 Treatment of Whiplash-associated Disorders
Identification of effective therapies through research studies is important in providing evidence-
based care that can influence the prognosis of WAD. The NPTF systematic review on the
treatment of neck pain reported that there is evidence that educational videos, mobilization and
exercises appear more beneficial than usual care or physical modalities in promoting the
recovery of patients with WAD.[67] However, the role of education in the management of
WAD is being debated as evidenced by two recent systematic reviews that reached different
conclusions.[53,130] Moreover, evidence from observational studies suggests that early
intensive management of WAD may delay recovery.[29,30] Similarly, a population-based
cohort study from Saskatchewan has shown that individuals receiving fitness training and
outpatient rehabilitation had a 19-50% slower recovery from WAD.[18] The effectiveness of
rehabilitation, training programs and other health care services commonly provided to patients
with WAD needs to be determined in randomized controlled trials.[71]
For clinical trials to accurately demonstrate therapy effectiveness, appropriate outcome measures
must be used to evaluate the clinical evolution of a condition. Currently used measures in WAD
clinical trials have focused on the assessment of disability related to the neck. Considering that
WAD commonly present with a constellation of symptoms, currently used measures may be
missing the full spectrum of disability and recovery from WAD.[59] Furthermore, clinical
outcome measures must demonstrate good reliability, validity and responsiveness in order to be
useful clinically and for research purposes.[119]
1.2.5 Outcome measures currently used in WAD research
The construct of disability is difficult to define and measure. It is a concept that is not physically
tangible and can be highly contextualized; therefore, it may differ from person to person and
from situation to situation.[5] While previous definitions focused on activity limitations, the
most current International Classification of Functioning (ICF) framework proposes that disability
includes impairments, activity limitations, and participation restrictions.[142] The new ICF
model attempts to capture aspects of the condition covered not only by impairment and activity
limitation but also its effect on the individual’s participation in life events. Because it
encompasses the effect of the disability on all aspects of the individual, the ICF is a useful model
to base the measurement of WAD disability on. To be valid, self-report outcome measures need
to capture all components of a construct. Most measures currently used to measure WAD-
related disability do not have a body of evidence that supports their construct definition,
comprehensiveness, validity or reproducibility.
A commonly used outcome measure in whiplash research is the Neck Disability Index (NDI).
The NDI was developed to capture neck-specific disability and consists of 10 items, each with 6
response options rated from 0 (no disability) to 5 (maximal disability).[56,116,131,139] The
items include questions on pain intensity and on the effect of neck pain on function
relevant to personal care, lifting, reading, headaches, concentration, work, driving, sleeping and
recreation.[131] The NDI has been reported to have good construct validity, reliability and
responsiveness in different populations.[56,116,139] However, it was not designed for WAD
and therefore it does not capture all aspects of WAD disability. A review of the published
literature demonstrated that the NDI omits important components of WAD disability because it
centers on neck pain.[63] Specifically, only three of nine disability items (i.e. work, driving, and
sleep) identified by WAD patients as being important are included in the NDI.[63] Other items
important to WAD patients that are not included in the NDI include fatigue, participation in
sports, depression, socializing with friends, frustration and anger.[63] Furthermore, neck pain
patients have been found to have lower general health scores based on the SF-36 outcome
measure specifically in the energy/fatigue, mental health and role-emotional domains compared
to those without neck pain.[70] Therefore, a comprehensive instrument that includes a range of
items that are important to patients would measure the construct of WAD disability more
accurately and perform better as an outcome measure.
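As a brief illustration of the NDI item structure described above (10 items, each rated from 0 to 5), the sketch below sums the item ratings into a total score. Summing to a raw score out of 50 and expressing it as a percentage is the conventional way of scoring the NDI, but it is stated here as an assumption because the text above gives only the item count and rating range. The function name `ndi_total` and the responses are hypothetical.

```python
def ndi_total(item_scores):
    """Sum 10 NDI item scores, each rated 0 (no disability) to 5 (maximal disability)."""
    if len(item_scores) != 10:
        raise ValueError("the NDI has 10 items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each item is rated from 0 to 5")
    return sum(item_scores)

# Hypothetical responses for one patient
responses = [2, 1, 3, 0, 2, 1, 4, 3, 2, 1]
total = ndi_total(responses)   # raw score out of 50
percent = 100 * total / 50     # same score expressed as a percentage
print(total, percent)
```

Because every item contributes equally to the total, any aspect of WAD disability that has no corresponding item cannot influence the score.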
An instrument recently developed to measure WAD disability is the Whiplash Disability
Questionnaire (WDQ). The WDQ was developed based on the ICF framework of disability.[99]
However, the developers have shown that in chronic WAD patients, the WDQ only includes one
domain/factor, suggesting that it does not fully represent the ICF framework. The
psychometric properties of the WDQ were studied in Australian patients with chronic
WAD.[49,99,140] In this population, the WDQ demonstrated good validity, reliability and
responsiveness. Recently, a German translation of the WDQ was also shown to have adequate
measurement properties for patients with chronic whiplash injuries.[87,110] However, the
WDQ’s reliability, validity and responsiveness in patients with acute WAD remain unknown.
To be useful clinically and in research, an outcome measure must have strong measurement
properties throughout the course of the condition. Moreover, because validation of an outcome
measure is specific to the population and setting studied, it is therefore necessary to establish its
measurement properties in a population of patients with acute WAD.[119]
1.3 The measurement divide
Different schools of measurement can have different approaches to instrument development and
evaluation. Two schools relevant to health care, clinimetrics and psychometrics, have been the
source of some debate.[46,92] The international COSMIN research group recently developed a
set of measurement standards using Delphi methods to reach consensus on taxonomy,
terminology and definitions related to measurement properties for health-related patient-reported
outcomes.[89] However, this group consisted largely of researchers adhering to clinimetric
measurement methods. Clinimetrics is a measurement school developed by Feinstein and
focused on measurement relevant to clinical outcomes.[46] Psychometrics is an older school of
measurement developed in psychology with a focus on personal and interpersonal behaviour and
educational testing or examination.[92] Many of the psychometric measurement methods are
clinically relevant and have, therefore, been applied across health care. Inconsistency in methods
used to develop and evaluate instruments is complicated by the existence of these different
schools of measurement mainly because they use different theoretical and empirical
methods.[31,118,143] The debate between the two schools, clinimetrics and psychometrics, has
led to confusion about the appropriateness of various instruments used to measure health
outcomes. I compared and contrasted the clinimetric and psychometric methods to advance this
debate and provide suggestions on the use of outcome measures in the future. As demonstrated
in the next chapter, this comparison led to the development of a conceptual framework that
integrated the theories and methods of both schools. I, therefore, use the term ‘measurement
properties’ in this thesis instead of ‘psychometric’ or ‘clinimetric properties’ when discussing the
evaluation of the WDQ. I propose that using ‘measurement properties’ will minimize confusion
and focus the discussion on properties of the instrument rather than the school of measurement.
1.4 Objectives
1.4.1 General Objectives
My first objective is to determine the measurement properties of the Whiplash Disability
Questionnaire (WDQ) in a cohort of patients with acute WAD. My second objective is to
analyze the divide between clinimetrics and psychometrics and develop a conceptual framework
for the evaluation of measurement properties.
1.4.2 Specific Objectives
In a clinical cohort of patients with recent WAD (less than 21 days duration), we aim to:
1.4.2.1 Clarify conceptual differences between psychometric and clinimetric methods;
1.4.2.2 Determine the short-term test-retest reliability of the WDQ;
1.4.2.3 Determine the factor structure of the WDQ;
1.4.2.4 Determine the internal consistency of the WDQ;
1.4.2.5 Determine the construct validity of the WDQ using the Neck Disability Index and
Short Form General Health Status Survey (SF-36);
1.4.2.6 Determine the short-term responsiveness of the WDQ using the global perceived
improvement question as an indicator of improvement.
1.5 Structure of the Thesis
This thesis is presented as a multiple-paper dissertation with five chapters: an overall
introduction, three papers that address specific objectives of the thesis and an overall discussion.
The sequence of papers was ordered to address conceptual issues first and follow the traditional
order determining measurement properties of an outcome measure. Specifically, the papers were
ordered to present reliability first, followed by validity, factor structure, internal consistency and
responsiveness. Each of the three manuscripts was written in a publishable format and includes
an introduction, a methods section, a results section and a discussion.
The thesis consists of the following three papers. Chapter Two: “Measurement Properties: A new
framework to contribute to the debate between the field of clinimetrics and psychometrics”
addresses the conceptual issues relevant to two fields within measurement leading to a new
conceptual framework which was the first objective of this thesis. Chapter Three: “Can
Recovery from Whiplash-associated Disorders be Measured Reliably in Patients with Acute
Whiplash-Associated Disorders? A Test-retest Reliability Study of the Whiplash Disability
Questionnaire” presents information relevant to the second objective of the thesis; specifically,
the 3-5 day test-retest reliability of the WDQ in adults with acute WAD (Figure 1.1). Chapter
Four: “Exploratory Factor Analysis, Validity and Responsiveness of the Whiplash Disability
Questionnaire in Adults with Acute Whiplash-associated Disorders” includes information
relevant to specific objectives three through six (Figure 1.1). Specifically, the factor structure,
internal consistency and construct validity of the WDQ (i.e. objective three through five) were
determined using baseline data from 130 participants with acute WAD. Objective six (i.e. short-
term responsiveness over six weeks) was established using baseline and six-week follow-up data
(Figure 1.1). All three papers will be submitted for publication before or soon after the doctoral
examination.
Figure 1.1: Data collection and data use in analysis addressing objectives two to six
This dissertation also includes several appendices. The first appendix includes interviewer-
administered baseline and follow-up questionnaires used to address the different specific
objectives of the thesis. Appendix 2 includes ethics approval certificates for this thesis from the
University Health Network and the University of Toronto.
Chapter 2:
Measurement Properties: A new framework to contribute to the
debate between the field of clinimetrics and psychometrics
2.1 Introduction
Measurement is a core science located at the heart of many intersecting health disciplines.
Consequently, different measurement paradigms have been developed to support the “metrics”
used in the various disciplines. Most common to health research are two fields: psychometrics
and clinimetrics. Psychometrics is the measurement of phenomena that are best measured by
multiple items or attributes reflecting a specific construct (e.g., anxiety or
depression).[57,92,112,145] While psychometrics is popular in health care, the second most
prevalent measurement school is clinimetrics, which was introduced by Feinstein in the early
1980s.[46] Clinimetrics focuses on prognostic and diagnostic indices, which may combine
different constructs (e.g., blood pressure, symptoms or previous risk factors) to create a
composite weighted score of risk of a distinct construct. The Apgar score is an example of a robust,
well-used clinimetric index using distinct constructs (e.g. grimace, appearance, pulse) to
diagnose/classify the health status of newborns and identify newborns in need of medical
attention.[2]
Various disciplines have developed their “own” measurement theories and methodologies (e.g.,
sociometrics, biometrics, anthropometrics) which address measurement concepts central to their
discipline or area of research.[4,83,123] As previously mentioned, the two most prevalent
schools in health research are psychometrics and clinimetrics. The theories and methodologies
promoted by these two schools are mainly used to design indicators or outcome measures for
research and clinical purposes. They will, therefore, be the focus of this paper. These two
schools are often described as discordant with proponents of each school promoting the strengths
of their approach and highlighting the limitations of the other framework.[31,118,143] We will
deconstruct these similarities and differences in order to move beyond the current debates.
This debate is more acute in the current period of patient-centered care where reimbursement is
often contingent on patient outcomes. Furthermore, regulators are increasingly aware that
measurement standards must be met before the benefits of an intervention are established.[128]
The “Era of Health Care Accountability” described by Relman is dependent on good
measurement, reinforcing Nunnally’s call that accurate measurement of key variables sets the pace
of scientific progress.[92,104,141] However, clinicians and researchers who need to select
patient-based outcomes are confronted with the tensions that exist between the two paradigms of
measurement: clinimetrics and psychometrics. This polarized debate often leads to ambiguity by
making some tools appear inadequate in their development or manifest properties when
approached from one perspective compared to the other. In this paper, we argue that
psychometrics and clinimetrics have more similarities than differences. For example, both fields
emphasize that outcome measures should be standardized, reproducible and accurate.[47,92]
They also share a common interest in the measurement of various latent constructs such as pain,
disability, self-efficacy, appraisal, perceptions and depression.
Despite their common interests, the co-existence of clinimetrics and psychometrics has given rise
to a hearty debate leading some psychometricians to question the need for clinimetrics.[39,118]
These proponents of psychometric theory suggest that the existence of clinimetrics is redundant
because it is not substantially different from the older field of psychometrics.[39,118] They
suggest that, like any classification, distinctions should continue to exist only if they facilitate
accurate communication about clinical and other outcome measures.[39,76] Alternatively,
proponents of clinimetrics suggest that their school is necessary because it offers clinically-based
methods to construct measures even if the measures are of latent constructs such as pain, anxiety
or functioning.[31,143] We propose that the current division between clinimetrics and
psychometrics creates an unneeded schism in an area where much work is needed: measurement
of key health constructs and variables. The debate has led to an unnecessary confusion in
clinical research and has created a barrier for the appropriate choice and use of measures.
Moreover, it has kept the fields separate and limited the advancement of measurement
methodology. Reconciliation between the two schools might lie in revisiting the roots of each
school rather than in continuing to debate the differences.
The purpose of our paper was twofold. First, we performed a scoping literature review to
describe the attributes of the clinimetric-psychometric divide. The aim of a scoping study is ‘to
map rapidly the key concepts underpinning a research area... especially where an area… has not
been reviewed comprehensively before’.[3] Second, we synthesized the findings and developed
a revised framework that highlights the similarities and differences of each, respecting the nature
of the measurement theory.
2.2 Methods
We conducted a scoping review of the literature. Our review included five stages: a)
development of a research question; b) search for relevant studies; c) study selection; d) data
charting; and e) collation, summarizing and reporting the results.[3]
2.2.1 Research question
What are the methodological similarities and differences between clinimetrics and psychometrics
in the development and evaluation of a clinical measure?
2.2.2 Search for relevant studies
We performed a literature search in Medline between 1950 and March 2012 using a combination
of the MeSH terms ‘psychometrics’ (exploded) and ‘health status’ (exploded) and text terms
‘clinimetric*’ and ‘psychometric*’. The terms were combined in the search using ‘and’ as the
combination link (i.e., MeSH ‘psychometrics’ and ‘clinimetric*’ or ‘psychometric*’ and
‘clinimetric*’). The search was limited to publications in English. We performed a similar
search in PsycINFO, CINAHL and Embase databases using the same subject headings as search
terms. Finally, we performed a textbook (title) search in the University of Toronto catalogue
using terms ‘clinimetric*’, ‘psychometric*’, ‘measurement’ and ‘health’. Article bibliographic
reference lists were also searched for relevant literature.
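The Boolean logic of the Medline search described above can be sketched with set operations. This is only an illustration of how the ‘and’/‘or’ links combine result sets: the record IDs are hypothetical, and the code does not reproduce the syntax of any database interface.

```python
# Hypothetical record IDs returned by each individual term (not real search output)
mesh_psychometrics = {1, 2, 3, 4}
text_clinimetric = {3, 4, 5, 6}
text_psychometric = {4, 6, 7}

# (MeSH 'psychometrics' AND 'clinimetric*') OR ('psychometric*' AND 'clinimetric*')
hits = (mesh_psychometrics & text_clinimetric) | (text_psychometric & text_clinimetric)
print(sorted(hits))
```

Every retained record matches the clinimetric text term, which reflects the intent of locating literature that discusses both schools.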
2.2.3 Study selection
The lead author (MS) reviewed all titles and abstracts and selected articles relevant to the
research question. An article was considered relevant if the major theme of the article was on the
comparison of clinimetrics and psychometrics.
2.2.4 Data charting
Through a series of iterative meetings (MS, DEB), we performed a content analysis of relevant
articles to identify emerging themes and stances taken by different authors. Our data charting
was guided by previous frameworks used to assess health indices and the measurement of
disease-specific quality of life.[55,78] These frameworks identified several categories that were
useful in categorizing our data when investigating the instrument development stages: item
selection, reduction, scaling and questionnaire formatting.[55,78] Moreover, these frameworks
included stages for the process of instrument testing and evaluation including
reliability/reproducibility, validity and responsiveness. We used these categories to chart
similarities and differences between clinimetric and psychometric methods. For example, our
content analysis of articles included identification of similarities and differences in methods used
by clinimetrics and psychometrics in the stages of item selection and reduction.
2.2.5 Collation, summarizing and reporting results including synthesis
We synthesized the themes and findings from the relevant literature through iterative consensus
meetings between two of the authors (MS, DEB). This synthesis led to the development of a
position statement for our framework. We verified the results by revisiting each article to extract
features supporting or contradicting our position statement and presented it to the larger author
group (PC, JDC, EB) for debate and critique. Results from articles demonstrating empirical
testing were also summarized in a separate table to provide more relevant elements in reporting
of empirically based studies.
2.3 Results
2.3.1 Literature search
The Medline search using a combination of the MeSH term ‘psychometrics’ and the key word
‘clinimetric*’ or key words ‘psychometric*’ and ‘clinimetric*’ yielded 90 results (Figure 2.1).
A Medline search using the MeSH term ‘health status’ and the keyword ‘clinimetric*’ yielded 41
results. CINAHL, PsycINFO and EMBASE searches did not identify any new articles and
therefore are not represented in Figure 2.1.
[Flowchart not reproduced. Recoverable labels: Article search in Medline (MeSH ‘Psychometrics’ AND text word clinimetric*, OR text word clinimetric* AND text word psychometric*: 90 results; MeSH ‘Health Status’ AND text word clinimetric*: 41 results). Textbook search (text word clinimetric: 3; text word psychometric: 180; text words measurement AND health: 711). Selection based on relevance to the debate (main topic of the article is the differences between clinimetrics and psychometrics): 15. 7 articles added from bibliography search. Remaining boxes: ‘Duplicates or not relevant’; counts 22 and 0. Reasons for exclusion: 55 articles assessed a specific instrument; 20 articles not on main topic for other reasons; 894 textbooks not on main topic.]
Figure 2.1: Literature search for the measurement divide scoping review
2.3.2 Study selection
Our search yielded 15 relevant articles (Table 2.1). All articles were published in the early 1990s
following Feinstein’s description of clinimetrics in the 1980s. Additional articles were obtained
from searching article bibliographies (five articles and two replies to included articles). Five of
the relevant articles used empirical methods to test the proposed methodological differences
between clinimetrics and psychometrics (Table 2.2). The textbook search yielded three citations
(one textbook and two theses) using the term clinimetric, 180 citations using the term
psychometric and 711 citations for the combination of terms measurement and health. Textbook
citations were focused on the methods of each field and not on the differences between
psychometrics and clinimetrics. Therefore, we selected relevant textbooks of each field to
inform our article review.[33,47,92] Most articles (61%) were excluded because they focused on
evaluating the measurement properties of specific measures without comparing clinimetric and
psychometric methods.
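The reported counts can be checked with simple arithmetic. The sketch below assumes that the 61% refers to the 55 articles excluded for assessing a specific instrument, taken as a share of the 90 results of the first Medline search; the text does not state the denominator, so this reading is an assumption. It also confirms that the three textbook counts sum to the 894 textbooks listed as excluded in Figure 2.1.

```python
# Counts reported in the literature search results and Figure 2.1
medline_psychometrics_hits = 90
excluded_specific_instrument = 55
textbook_hits = {"clinimetric": 3, "psychometric": 180, "measurement AND health": 711}

# Assumption: the 61% is 55 exclusions out of the 90 results of the first Medline search
share_excluded = 100 * excluded_specific_instrument / medline_psychometrics_hits
total_textbooks = sum(textbook_hits.values())
print(round(share_excluded), total_textbooks)
```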
2.3.3 Data charting
Based on our content analysis, we found several emerging themes and positions by different
authors. Specifically, several proponents of clinimetrics reported that the construction of
clinimetric indexes involves a deliberate combination of multiple attributes that are not expected
to produce a homogenous measure (Table 2.1).[31,32,41,42,91,143,146] They also suggested
that clinimetric measures usually contain fewer items than psychometric measures and that the
items are chosen to combine multiple clinical constructs in a single index. This contrasts with
the psychometric approach where different facets of the same construct are preferable.
Furthermore, proponents of clinimetrics propose that “dissected intuition” (defined as
stakeholder or expert input including clinician or patient input) is the fundamental distinction
between the two metric fields in the construction of instruments.[31,143] In contrast,
proponents of psychometrics suggest that there is an equal amount of “dissected intuition”
involved in constructing psychometric instruments, and that the need for the homogeneity
amongst items (i.e. internal consistency) in the outcome measure is dependent on the purpose of
the measure.[118] However, a more purist psychometric instrument development would perhaps
use less opinion or appraisal from stakeholders and focus on indicators of similarity in response
patterns (correlations, factor analysis) to determine items to keep.
Another important distinction between clinimetrics and psychometrics is the nature of the
variables included in an instrument. Clinimetricians suggest that psychometric approaches
include indicator variables (variables that result from the measured construct) rather than causal
variables (variables that may induce change in the measured construct rather than be the result of
it).[32,44,45] In contrast, clinimetric approaches include mainly causal variables.[32,44,45] As
pointed out by Fayers et al., this is significant because changes in the latent (measured) construct
should be directly and proportionally reflected by the indicator variables, but not necessarily by
the causal variables (Figure 2.2). Therefore, from a statistical perspective, a causal variable may
behave differently from an indicator variable with respect to demonstrating change in the
construct of interest.[44,45]
Figure 2.2: Latent construct relationship with causal and indicator variables
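The statistical distinction between indicator and causal variables can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the sample size, noise levels, seed and variable names are all hypothetical and chosen only to show the expected pattern, not to reproduce any analysis from the reviewed articles.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000

# Indicator variables: each one reflects the same latent construct plus noise
latent = [rng.gauss(0, 1) for _ in range(n)]
indicator_a = [v + rng.gauss(0, 0.5) for v in latent]
indicator_b = [v + rng.gauss(0, 0.5) for v in latent]

# Causal variables: independent causes that jointly determine another construct
cause_a = [rng.gauss(0, 1) for _ in range(n)]
cause_b = [rng.gauss(0, 1) for _ in range(n)]
formed_construct = [a + b for a, b in zip(cause_a, cause_b)]

r_indicators = pearson(indicator_a, indicator_b)          # expected to be high
r_causes = pearson(cause_a, cause_b)                      # expected to be near zero
r_cause_construct = pearson(cause_a, formed_construct)    # each cause still relates to the construct
print(round(r_indicators, 2), round(r_causes, 2), round(r_cause_construct, 2))
```

In this toy setting the two indicators correlate strongly with each other because they share a common source, whereas the two causes are unrelated to each other even though each contributes to the construct they form.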
In the measurement evaluation phase of an instrument, the two schools did not demonstrate
differences. Both the proponents of clinimetrics and psychometrics suggested that measurement
properties are important and should meet accepted standards (validity, reliability, responsiveness
and interpretability) regardless of the type of development.[32,118]
2.3.4 Collation, summarizing and reporting of results
Feinstein proposed that the conceptual distinction between clinimetrics and psychometrics must
be based on what is being measured; specifically, whether a clinical phenomenon or a
psychosocial/educational construct is being measured.[47] However, the differences may be
more methodological in nature and relate to how measures are developed irrespective of what is
being measured. We present the results of our scoping review as a framework that collates and
summarizes the information extracted from reviewed articles. We divide our framework into three
phases: a) the development and scoring phase; b) the structure/precision phase; and c) the
measurement performance phase (Figure 2.3).
In the item development and scoring phase (Figure 2.3), the difference between the schools is
based on whether the instrument has a targeted criterion (e.g. death, neonatal survival) or an
untargeted criterion (e.g. depression, anxiety). Specifically, the main difference was the use of
clinical consensus (clinimetric) or statistical (psychometric) methods. An instrument measuring
a targeted criterion addresses a clinically-relevant, tangible concept with items that do not
correlate highly, representing multiple constructs within that concept (e.g. causes of death can be
defined by several constructs that are not related to each other, such as traffic accidents or
hypertension). In contrast, an instrument measuring an untargeted criterion aims to measure a
clinically-relevant, intangible concept containing a single construct with highly-correlated items
representing attributes of that construct (e.g. guilt, insomnia and agitation are attributes of
depression which correlate highly with each other).
The Apgar score, a measure of the condition of a newborn baby, is the most notable example of a
purely clinimetric index.[2,48] In developing the scale, Virginia Apgar used her clinical
experience and knowledge to select five objective signs to define the newborn baby’s
condition.[2,48] The five items (heart rate, respiratory effort, reflex irritability, muscle tone and
skin colour) are distinct constructs that are not expected to be highly correlated. These items
were chosen to define a condition that is not itself a single construct. However, the scale provides a
total clinical score that is predictive of neonatal survival. Other examples of clinimetric scales
are the multi-construct disease activity indexes used in rheumatology (e.g., the Rapid Assessment of
Disease Activity in Rheumatology [RADAR] questionnaire).[86,108] Statistical tests were not used in
their development to demonstrate high correlation between subscales, or to develop a final
homogeneous instrument with highly correlated items measuring a single construct.
In contrast to clinimetric instruments, psychometric scales such as the Hamilton depression and
anxiety scales were developed using statistical methods for item selection.[57,112,145] The
Hamilton depression scale was developed using items relevant to the construct of depression
(e.g. depressed mood, suicidal thoughts, agitation, hypochondriasis) that correlate highly and that
were selected using factor analysis to produce the 17-item instrument.[57] The development of
psychometric scales relied heavily on factor analysis to identify items that are strongly
correlated. Developers focused on producing a homogeneous set of items that measured a single
construct. These scales contain multiple related items or attributes (e.g., guilt, insomnia,
agitation) as indicators of a single concept (e.g., depression).[57] In psychometric scales, internal
consistency is expected to be high.
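One common index of internal consistency is Cronbach's alpha. The following is a minimal sketch on simulated data, not an analysis of any instrument discussed here: it assumes a homogeneous scale whose items all reflect one latent trait, and a heterogeneous index whose items are unrelated. The function name, sample size and item count are hypothetical.

```python
import random
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

rng = random.Random(7)  # fixed seed so the sketch is reproducible
n, k = 2000, 5

# Homogeneous scale: every item reflects one shared latent trait plus noise.
trait = [rng.gauss(0, 1) for _ in range(n)]
homogeneous = [[t + rng.gauss(0, 1) for t in trait] for _ in range(k)]

# Heterogeneous index: items are statistically unrelated to each other.
heterogeneous = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]

alpha_homogeneous = cronbach_alpha(homogeneous)
alpha_heterogeneous = cronbach_alpha(heterogeneous)

print(f"alpha, homogeneous items:   {alpha_homogeneous:.2f}")
print(f"alpha, heterogeneous items: {alpha_heterogeneous:.2f}")
```

In this simulation alpha is high for the homogeneous item set and close to zero for the unrelated items, which is consistent with the expectation that internal consistency is informative for single-construct scales but not for multi-construct indexes.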
We found scales that used hybrid methods. These scales are located on the continuum between
the two poles presented in Figure 2.3. For example, the Whiplash Disability Questionnaire
(WDQ) was constructed using information collected from clinician expert opinion and
patients.[99] However, a more psychometric method of principal component analysis was
applied to finalize the instrument. In contrast, the Disabilities of the Arm, Shoulder and Hand
(DASH) questionnaire was developed using statistical methods.[64] Specifically, the 30 items
included in the DASH were selected by subjecting 70 items to the equidiscriminatory item-total
correlation method, which selects items that are highly correlated but discriminate well
between individuals throughout the range of scores.[64] However, the statistical process was
supplemented (at the stage of final item retention) by patient opinion on the importance of items
and by clinician expert opinion. Therefore, the DASH development initially relied on
psychometric methods, but a parallel clinimetric approach was used to complete its development.
The authors intentionally used both measurement schools. The WDQ and the DASH are
examples of instruments located on the methodological continuum between the purely clinimetric
and purely psychometric approaches; the WDQ is located closer to clinimetrics while the DASH is
closer to psychometrics (Figure 2.3).
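Statistical item selection of this kind generally relies on how strongly each item relates to the rest of the scale. The sketch below shows a generic corrected item-total correlation on simulated data. It is not the equidiscriminatory procedure used for the DASH, whose details are not given here; the data, function names and the choice of four related items plus one unrelated item are assumptions made for illustration.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(items):
    """Correlate each item with the sum of the remaining items."""
    results = []
    for j, item in enumerate(items):
        # Total score per respondent with item j removed.
        rest = [sum(row) - row[j] for row in zip(*items)]
        results.append(pearson(item, rest))
    return results

rng = random.Random(11)  # fixed seed so the sketch is reproducible
n = 3000

# Four hypothetical items reflecting one trait, plus one item unrelated to it.
trait = [rng.gauss(0, 1) for _ in range(n)]
related_items = [[t + rng.gauss(0, 1) for t in trait] for _ in range(4)]
unrelated_item = [rng.gauss(0, 1) for _ in range(n)]

item_total_r = corrected_item_total(related_items + [unrelated_item])
for idx, r in enumerate(item_total_r, start=1):
    print(f"item {idx}: corrected item-total r = {r:.2f}")
```

A purely statistical rule would drop the fifth item because of its low item-total correlation. Supplementing the statistics with patient and clinician opinion, as described for the DASH, allows such an item to be retained when it is judged clinically important.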
Several studies have compared clinimetric and psychometric development and evaluation
methods (Table 2.2b).[6,73,85,106,127] Four studies suggested that the application of
clinimetric and psychometric development methods leads to the retention of different
items,[6,73,85,127] while one study reported that it resulted in the selection of different domains
within the instrument.[106] Interestingly, four studies reported that the same types of
measurement properties are assessed in both fields (i.e., reliability, validity and
responsiveness).[6,73,85,106,127] The authors found that the measurement properties were
similar regardless of how the measure was created. The clinimetrically-developed instrument
(described as the concept-retention instrument) was recommended in the study by Beaton et al
due to its similarity to the original DASH but the authors demonstrated that all three instruments
would perform similarly in terms of their measurement properties.[6] Our analysis confirms that
the strategies for evaluating the performance of outcome measures are similar for clinimetric,
psychometric and hybrid scales.[32,41] This suggests that there is no divide when assessing the
measurement properties of instruments. The same measurement properties are necessary for
both clinimetric and psychometric instruments regardless of the use of statistical methods in
development. However, some authors have suggested that face and content validity are more
important in the development of clinimetric than of psychometric instruments since there is
more focus on clinical intuition.[31,146] Other authors suggested that the definition of construct
validity differed between the fields but these authors erroneously defined construct validity as
the assessment of internal consistency.[36] In contrast, Streiner suggested that a priori
hypotheses should be formed to assess validity based on the goal of the instrument and that this
is equally important to both fields.[118]
Figure 2.3: Conceptual framework bridging clinimetrics and psychometrics
Table 2.1: Position statement of our framework and the evidence that is in support of the framework
Position statement of our Framework
Current Thesis
We propose that measurement is a large field that consists of several subfields including clinimetrics and psychometrics which may overlap in some areas and be distinct in others. We propose that the differences between clinimetrics and psychometrics are in the development of outcome measures and not in the measurement performance stage. Furthermore, we propose that there is an overlap (or an informed zone) in the use of development techniques between the field specific techniques at either extreme.
Defined by: | Support for our statement | Contradictory to our statement | Comments
Wright et al 1992
Suggest that clinimetric indexes are heterogeneous while psychometric indexes aim for homogeneity
Clinimetric indexes tend to have fewer items to facilitate ease of scale usage while psychometric scales include large number of items to improve homogeneity
Suggest that dissected intuition (clinical reasoning) is used in clinimetrics but not in psychometrics and that development of psychometric scales depends almost completely on empirical/statistical decisions of what gets included in the index
These authors create polar distinctions between clinimetric and psychometric methods of scale development
The authors do not discuss if any overlap between the two fields exists
Juniper et al 1997
Direct assessment of the two approaches in developing a measure lead to the retention of different items
None presented | Empirical evaluation of the clinimetric and psychometric development of measures
Measurement properties not assessed
Fayers et al 1997 p. 139
Suggest that exploratory factor analysis (EFA) is inappropriate in the development of clinimetric indices because EFA cannot model indicators that have a causal effect on the latent construct
Suggest that EFA can lead to inconsistent results across studies when applied to indices with causal indicators because correlations between causal indicators may not reflect the manifestation of a change in the latent factor
None presented | In support of differential use of factor analysis depending on the types of items included in the instrument which may reflect more clinimetric instruments but to continue its use in psychometrics
Used literature on two different instruments as examples to demonstrate support for inappropriateness of EFA for clinimetric instruments
Fayers et al 1997 p. 393
Suggest that quality of life instruments have a combination of causal and effect indicators suggesting that they are neither purely clinimetric nor purely psychometric
Used a global quality of life question to demonstrate correlation with the effect indicators and a lack of correlation with potentially causal indicators
Suggest that statistical methods such as factor analysis are not appropriate when assessing causal indicators because causal indicators do not reflect the latent construct (i.e. they may be side-effects of a condition causing the changes in the latent construct)
None presented | Support for the informed zone and the differences in development of clinimetric and psychometric instruments
Admit that there is no definite way to determine if indicators are causal but suggest some methods that are suggestive of these items behaving differently than effect indicators when submitted to statistical analysis
Although the methods using data are not necessarily empirical proof of differences, suggest that there is a difference in the types of instruments and that certain statistical methods should not be applied in every case
Feinstein 1999
Using the Apgar score as an example of a clinimetric index, suggests that dissected intuition or clinical judgment, ease of understanding and ease of application are important in clinimetrics but not as much in psychometrics
Heterogeneity important in clinimetrics but homogeneity important in psychometrics
Suggests that there is no evidence that one method is better than the other but that clinicians should not rely solely on statistical methods in developing measures since successful methods of one field are not necessarily transportable to another field

Psychometric methods reject diversity and simple ratings in measures
Argues that face validity is crucial in clinimetrics but not in psychometrics because pertinent topics may be omitted due to poor statistical correlation with the retained items
Co-authored Wright et al paper and used similar polar distinctions between clinimetric and psychometric methods of scale development
Stated that there is no evidence that one method is better than the other but that clinical judgment should play a role in developing measures
Support for differences in development stages
Marx et al 1999
Different items are retained by different methods resulting in slightly different final measures
The measurement properties were assessed and satisfactory for both final measures
Dissected intuition was used by both methods to finalize the measures
Once ‘dissected intuition’ was used to change the items to finalize the scale, the two methods resulted in similar scales with similar measurement properties

None presented | Items retained in measures developed by the two methods differed but were more similar once clinical judgment was used in both cases to finalize the instruments (supportive of our informed zone)
Methods to assess measurement properties were the same irrespective of the development method used to create the instrument (support for the measurement performance stages)
Zyzanski 1999
Agree with Feinstein that clinimetrics is significant on its own and that it is an appropriate starting point for statistical scale development because diversity based on clinical knowledge should be included to satisfy face and content validity
Initial adherence to statistical and psychometric methods may be appropriate even if the ultimate goal is a clinimetric measure
Skills and insights of both the clinician and the psychometrician are required in development of measures since neither approach alone is sufficient
Clinical insight is the essential ingredient missing from psychometric methods which sophisticated statistics cannot provide
Editorial in support for the developmental differences between clinimetrics and psychometrics
Supports our framework in terms of the informed zone because suggests that methods used both by clinicians and psychometricians are required in measure development
Fayers et al 2002
Suggest that causal variables may account for the difference in clinimetric and psychometric instruments and explain that both are needed
Argue that causal variables affect the latent variable, not simply indicators of it (as effect indicators used in psychometric scales would be)
Also argue that causal variables are sufficient but not necessary component causes of a latent construct (using standardized equation modeling terminology)
Suggest that traditional psychometric statistical approaches (i.e. factor analysis) may be inappropriate for clinimetric index development because covariances between causal variables may exist regardless of their relationship with the underlying factor which may suggest additional factors distinct from the indicators
Suggest that no data analysis is necessary to decide how to combine individual items of a clinimetric model and to determine the relative importance of items but that this is not a shortcoming because the aim of clinimetric and psychometric instrument development is different
Although suggest that construction and assessment differ between methods, the same types of performance methods are used to assess both (i.e. validity, reliability, responsiveness)
Support that different instruments will likely result from uncritical application of either method and either might be inadequate
Suggest that methods of validating instruments depend on the type of instrument and that inter-item correlation and internal consistency should not be expected to be high in instruments containing causal items
Suggest that content validity may be more important in clinimetrics because omission of an item that is not correlated to other items because it is causal may have a greater impact on the performance of the instrument than in psychometric instruments where omission of one item tapping into the same underlying concept may have no effect on performance
Suggest that differential item functioning can be used to differentiate between indicator and causal variables and that scales should in general include one or the other, not both
Suggest that psychometric methods can be used in multi-item scales to examine subsets of indicator variables but not combined with causal variables
Support for the difference in development stages of our framework with an explanation using causal and indicator variables to explain why some statistical methods (i.e. factor analysis) may fail for clinimetric indices
Support for the measurement performance stage of our framework because the same type of measure assessments are suggested by both methods
Bartholomew 2002 (reply to Fayers et al 2002)
Collaboration between statisticians and practitioners in psychology and medicine (those constructing clinical and psychometric instruments) is needed to improve understanding of methods
Factor analysis does not work when applied to a mixture of causal and indicator variables
Suggest that effect indicators are indicators of both the latent variable and the causal indicators (mediating the latent variable) and suggest that causal indicators do not need to be included in instruments as long as all appropriate effect indicators are included
Support for the informed zone in the sense that different experts have to work together to improve methods
Suggest that statistical and psychometric approaches are similar and that factor analysis is not appropriate if causal factors are included
Dijkers et al 2003
Suggest that clinimetrics and psychometrics differ in development stages because psychometrics aims for unidimensional scales including only effect indicators while clinimetric indices include multiple attributes that could be weighted differently for greater responsiveness and include causal indicators as well as effect indicators
Suggest that the clinimetric approach is more appropriate in assessing environments and that difference in development may result in differences in the results of measurement properties
Suggest that content validity may be more important to clinimetrics but it is important in both fields and that internal consistency may be irrelevant to clinimetric scales but validity and reliability need to be assessed for all instruments
Suggest that clinimetrics uses many procedures developed in psychometrics
Argue that suitability of statistical procedures needs to be considered instead of just applied because of familiarity when developing new scales for assessment of environments
None presented | Critiqued an article validating a measure assessing environments to discuss differences between clinimetrics and psychometrics
Support for difference in the development stages of our framework and the type of assessments performed in our measurement performance stage
Support for the informed zone since clinimetrics uses many of the procedures developed in psychometrics
Diamond 2003 (reply to Dijkers)
Psychometricians are just as interested in combining multiple measures into a single score as are clinicians and the important goal in psychometrics is the estimation of the validity of inferences made from measurements, not just the estimation of reliability
Early stages of any search for valid inferences should include content experts of the specific field of interest (i.e. clinician for health care), not just psychometricians (considered to be quantitative)
None presented | Argues against the divide of Dijkers et al (2002) by arguing in support of our informed zone by bringing the stakeholder opinion into the development of a psychometric scale
Marion 2003 (reply to Dijkers)
Suggest that development of measuring instruments can be done with no or very few mathematical or theoretical tools as long as the developers define what is being measured
Agrees with Dijkers that assessment of the measure’s utility, consistency and meaning does not require elaborate analysis or models (just clinical judgment)
None presented | Supports difference between methods in our development stages in suggesting that clinical intuition should play a significant role in developing clinical indices
de Vet et al 2003 p. 1137
Consider clinimetrics and psychometrics as distinct and related fields but use ‘clinimetric properties’ as a term for measurement properties including those previously used in psychometrics (i.e. validity, reliability)
Suggest that measures developed using IRT are less dependent on population and situations but that IRT use in clinimetrics may be limited because it aims to develop unidimensional instruments
Suggest that the aim of the measure determines which measurement properties are important
Suggest that a close collaboration between clinicians, statisticians, epidemiologists and psychologists is needed to improve future use of clinimetrics in clinical research and practice
Claim that all measures in clinical research are clinimetric
Suggest that face validity is more important in clinimetrics
Support for the difference in the development and informed zone by suggesting that experts from different fields are needed in clinimetrics but also the performance stage by suggesting that the aim of the measure determines the type of measure evaluation, not the method of development
Streiner et al 2003
Suggest that just as much ‘dissected intuition’ is used in psychometrics as in clinimetrics
Recognize that there is a difference that they describe as more qualitative based on how much dependence there is on using only clinical judgement in item selection (suggesting that it is high in clinimetrics and low in psychometrics)
Argue against the suggestion that all questionnaires in psychometrics are unidimensional and that all medical ones are heterogeneous because that would overlook the diversity of both fields
Suggest that the use of the term clinimetrics has led to ignorance of the literature on measurement properties that was developed in psychometrics and this supports our section on measurement properties in that this area should not differ between fields and that information from both fields should be used
Suggest that scales that should be homogeneous should be evaluated with an index of internal consistency but that not all scales are homogeneous
Suggest that internal consistency and indices of reproducibility (i.e. test-retest or inter-rater reliability) are not alternatives but that they measure different performance aspects of a scale and that both could be used in both fields
Suggest that the use of the term clinimetrics has led to a misunderstanding of the procedures used to develop scales
Suggest that the field of clinimetrics should not exist as a separate field from psychometrics because it is only a subset of psychometrics with concepts that do not differ substantially
Conclusion: “clinimetrics is not describing a new family of techniques that should be used with a unique type of scale, but is simply another word for a portion of what is done in psychometrics” (p. 1144)
Streiner presents a search strategy to show that most measurement property articles can be found with psychometrics and less with clinimetrics pointing out that this vast information can be missed by searching for clinimetrics alone and that ignorance and misunderstanding can result but we would suggest to generalize the terminology to include all literature by indexing them in databases as measurement properties (not clinimetric or psychometric or any other measurement subfield)
Supportive of our diagram in the sense that not all clinimetric measures are heterogeneous and not all psychometric ones are unidimensional; there is a transition between the fields (support for our informed zone)
Grading of the use of clinical judgement in developing measures is also supportive of our informed zone
de Vet et al 2003 p. 1146 (reply to Streiner)
Measurement defined as ‘metrics’ that can be subfields of psychometrics, clinimetrics, biometrics
Suggest that psychometrics and clinimetrics are not contradictory but have different aims which are more content driven for clinimetrics (i.e. measure multiple constructs with a single index) and more statistically driven for psychometrics (i.e. measure a single construct using multiple items)
Suggest that differences are mostly evident in development stages with psychometric measurement instruments including only indicator variables (that correlate with the underlying construct to be measured but do not alter or influence the construct, not causal) while clinimetric measurement instruments may include indicator and causal variables
Suggest that the differences between clinimetric and psychometric approaches are less obvious in the evaluation of the outcome measure
Suggest that both metric disciplines may use the same methodologic and statistical approaches, depending on the goal and subject of measurement
None presented | Reply to Streiner pointing out differences between clinimetrics and psychometrics in support that they should be different but make use of methods from one another when required by the goal of the measure
Blended approaches and our informed zone supported
Columns: Defined by | Support for our statement | Contradictory to our statement | Comments

Defined by: Bech 2004
Concordance between clinical and statistical coherence of symptoms is essential to clinimetrics and has been demonstrated in a shortened depression questionnaire, the HAM-D6 (with improved responsiveness)
Classical psychometrics can result in an alternative to clinical thinking which is not recommended but modern psychometrics is more integrative
Cronbach's alpha is dependent on the number of items in the questionnaire (which makes it an indicator of length of the measure) and not relevant in clinimetrics
Discuss clinimetrics and psychometrics as they relate to psychiatry
Support for the informed zone because clinical and statistical methods need to be in concordance for better measure development and modern psychometrics is more integrative in agreement with that

Defined by: Emmelkamp 2004
Although clinimetrics has introduced sensitivity to change as an important measurement concept, psychometric methods should not be ignored
Reliability and validity, including functional analysis, should be investigated rather than assumed
Contradictory to our statement: None presented
Discuss clinimetrics and psychometrics as they relate to psychiatry
Support for the performance measurement stage because all measures need to be assessed using the same assessment methods

Defined by: Fava et al 2004
Psychometric techniques should be used as part of clinimetrics but using psychometrics alone could lead to misleading effects in clinical research
Clinimetric methods differ from psychometric methods in the development of a scale
Contradictory to our statement: None presented
Discusses the use of clinimetrics in psychiatry
Support for difference in development of scales but suggest to integrate methods of other fields (in support of our informed zone)
Defined by: Beaton et al 2005
Compared three item-reduction approaches to create quickDASH and found that the concept-retention (judgment-based) approach produced a comparable, and slightly better measure than statistically-driven approaches
Contradictory to our statement: None presented
Empirical evaluation of the clinimetric and psychometric development of a shorter version of the DASH in the same population as Marx et al assessed the original DASH
The clinimetric measure had the strongest ranking in terms of measurement properties but very similar, often assumed to depend on solid psychometric foundations

Defined by: Ribera et al 2006
Clinimetric and psychometric approaches to domain development compared for a myocardial infarction questionnaire and found that the clinimetrically scored version had better overall measurement properties and should be used until a better psychometrically scored version is developed
Methods should be integrated (i.e. if factor analytic techniques give ambiguous results in different datasets, clinimetrics may help)
Clinimetric and psychometric approaches to developing multi-item HRQOL instruments may be complementary because the first improves validity and responsiveness while the second improves reliability
Empirical evaluation of the clinimetric and psychometric development of measure domains (not item selection)
Support for the informed zone because the results suggest that the integration of the two approaches in development and evaluation of measures might be the best approach
Suggest that results of measurement properties differ based on development method but that the same properties should be assessed
Defined by: Turner et al 2009
Comparing clinimetric and psychometric methods to assign weights to a pediatric gastroenterology index and found that different items were retained
Mathematical approach to weighting the index was superior to the judgemental approach because of feasibility (i.e. lab items omitted)
Measurement properties were equally good for both methods (even though psychometrics is often criticized for a lack of face validity and sensibility)
Results suggest that a Delphi technique may not be able to substitute large expensive prospective studies needed for weighting measures using psychometric methods (i.e. exclusion of laboratory items created a more feasible measure that would be quicker to complete)
Empirical evaluation of the clinimetric and psychometric weighting of measures
Suggest that a thoughtful combination of both mathematical and clinical methods should be used in development of both clinimetric and psychometric measures (in support of our informed zone)

Defined by: Fava et al 2012
Suggest that homogeneity of components is not requested for clinimetrics as it may be for psychometrics
Suggest that weighting of individual items may differ as long as the measure is able to discriminate between different groups of subjects and to reflect changes in experimental settings (i.e. drug trials)
Insinuate that clinimetrics has a set of rules that differs from psychometrics in development stages
Suggest that clinimetric principles should guide the selection of methods for a specific assessment but do not describe principles that differ from psychometrics in the evaluation of properties
Suggest that global rating and involvement of the patient are unique features of clinimetric scales
Insinuate that clinimetrics has its own set of rules in evaluation stages
Domains included in 'clinimetric' latent traits described are traditionally more psychometric (i.e. psychosocial characteristics)
The term clinimetric used for all measures relevant to clinical practice
Measurement properties labeled clinimetric properties (ignoring psychometric contributions)
Support for the difference in development stages of our framework and the measurement performance stage since all the same assessment methods are recommended but renaming of the psychometric properties as clinimetric is not supportive of the framework
Table 2.2a: Studies using empirical methods to test differences between clinimetric and psychometric methods
Columns: Study | Instrument | Comparison of development methods | Results

Study: Juniper et al, 1997
Instrument: Asthma Quality of Life Questionnaire
Comparison of development methods:
Clinical impact versus factor analysis item reduction of 152 items in 150 adults with symptomatic asthma
Clinical impact used impact as the product of frequency and importance of items as rated by patients on a 5-point scale
Factor analysis based on Hyland and Marks excluding items with a frequency of <40%, then items with item-total correlations of less than 0.4 and then items with item inter-correlation >0.7. Finally, loadings less than 0.4 on the first factor of a principal component analysis were excluded
Results:
Different instruments resulted from using different item reduction approaches (with 20 items being in common between 32-item and 36-item final instruments)
Psychometric method discarded the highest impact emotional function and environmental items and included fatigue-related items instead
Measurement properties were not assessed or compared

Study: Marx et al, 1999
Instrument: Disabilities of Arm, Shoulder and Hand (DASH) Questionnaire
Comparison of development methods:
Clinimetric versus psychometric item selection/reduction of 70 items to a 30-item scale in 407 patients with upper extremity disorders
Clinimetric methods involved the highest mean additive scores of importance and severity using 5-point patient ratings
Psychometric methods involved equidiscriminatory item total correlation (EITC) which involves dividing scores into 25th, 50th and 75th percentiles and selecting items with the highest correlations with the overall score using the item-total correlations from each subgroup
Results:
Different instruments resulted from different item selection/reduction approaches (with 16 out of 30 items in common)
Cronbach's alpha was high for both instruments and there was a high ICC between instruments
Final scores did not differ much after clinician input to finalize both instruments

Study: Beaton et al, 2005
Instrument: QuickDASH Questionnaire (obtained from the same data/population as Marx et al)
Comparison of development methods:
Compared 3 item reduction methods including concept-retention and 2 statistically-based approaches (EITC and item response theory) to shorten the 30-item DASH questionnaire into an 11-item QuickDASH in 407 patients with various upper-limb conditions
Measurement properties assessed in a longitudinal study of 200 patients with upper-limb disorders
Concept-retention is a judgmental approach geared toward retaining concepts and selecting items from each domain
Highest ranking items of EITC method are meant to detect disability across the full range of scores
Rasch modeling for the item response theory method selects items based on difficulty that are equally spaced and calibrated along scale length
Results:
Different methods resulted in retention of different items with EITC and Rasch retaining the most items in common (7) while concept-retention had 5 unique items
Two items are retained in common by all three methods
All three methods remained similarly reliable, valid and responsive as the original, longer DASH
Concept-retention quickDASH was selected because it had the strongest ranking in terms of measurement properties

Study: Ribera et al, 2006
Instrument: McNew Quality of Life after Myocardial Infarction (QLMI) Questionnaire
Comparison of development methods:
A modified psychometrically derived version compared to the original clinimetrically derived version in terms of domain construction/scoring
The original clinimetric instrument was derived using clinician input and patient importance ratings and the 5 subdomains were selected on the basis of conceptual links and summarized into 2 domains (emotional and physical health)
The modified psychometric version was derived by removing 2 original items, adding 3 new items and developing a scoring method using exploratory factor analysis which resulted in 3 domains (physical, emotional and social)
Results:
Cronbach's alpha and the standardized response mean for both clinimetrically and psychometrically derived instruments was similar but validity was better for the clinimetrically derived instrument based on a priori hypothesized inter-correlations between instruments and external anchor correlations with the SF-36 domains

Study: Turner et al, 2009
Instrument: Pediatric Ulcerative Colitis Activity Index (PUCAI)
Comparison of development methods:
Differences in mathematical (psychometric) and judgmental (clinimetric) weighting of a gastroenterology index assessed
Originally weighted mathematically using multivariate regression modeling on 157 children
Independently, judgmental approach used a Delphi group of 36 experts to provide weights for the index which retained laboratory items excluded by the mathematical modeling
Results:
Weights of instruments developed using different approaches were similar
A difference in agreement was demonstrated by Bland and Altman plots due to the difference in laboratory item inclusion/exclusion
Construct validity and responsiveness results were equally good for both instruments but the mathematically derived version may be a more feasible index because laboratory items are excluded
Table 2.2b: Studies using empirical methods to test differences between clinimetric and psychometric methods
Columns: Study | Instrument | Did methods lead to a difference in content? | Did choice of measurement properties tested differ based on type of development method? | Supports our framework? | Were results of measurement properties superior for one method compared to the other?
Juniper et al, 1997 | Asthma Quality of Life Questionnaire | Yes | No | Yes | Not assessed
Marx et al, 1999 | Disabilities of Arm, Shoulder and Hand (DASH) Questionnaire | Yes | No | Yes | No
Beaton et al, 2005 | QuickDASH Questionnaire (obtained from the same data/population as Marx et al) | Yes | No | Yes | No
Ribera et al, 2006 | McNew Quality of Life after Myocardial Infarction (QLMI) Questionnaire | Yes | No | Yes | No
Turner et al, 2009 | Pediatric Ulcerative Colitis Activity Index (PUCAI) | Yes | No | Yes | No
2.3.5 Synthesis
We proposed a new conceptual framework (Figure 2.3) that links the two dominant schools of
measurement encountered in health research. Our scoping review highlighted that the science of
measurement is a specialty that includes several subspecialties which can overlap in some areas
while being distinct in others (Table 2.1). We found that the differences between clinimetrics
and psychometrics lie in the item development and precision/structure phases but not in the
measurement performance stage. Furthermore, we suggest that an overlap exists between the
different approaches of instrument development, with psychometrics and clinimetrics sitting at
opposing poles of the blended zone.
We identified three phases of instrument development. The first phase (item development and
scoring) includes the previously defined categories of item generation, reduction, definition of
response categories and scoring. The second phase (structure/precision) includes the verification
of the structure and content. The third phase (measurement performance) includes the evaluation
of the measurement properties of the instrument.
Our framework distinguishes between development stages of clinimetric and psychometric
measures. Specifically, clinimetric methods include target criterion indicators (e.g., diagnosis or
death within 24 hours) (Figure 2.3). They also include more clinical consensus regarding the
scope and structure of the measure in the structure and precision phase of measure development
(Figure 2.3). In contrast, the psychometric measures include untargeted scales that rely on
statistical approaches. Psychometric approaches favour techniques such as item-total
correlations, factor analysis and internal consistency over clinical consensus in decisions about
content.
We found several measures that were developed using a combination of both clinimetric and
psychometric principles. In some cases, consensus, clinical expert and patient opinion were
combined with psychometric-based statistical methods such as factor analysis or item-total
correlations.[99] In other cases, statistical methods and expert opinion were used to select items
that were subsequently subjected to clinical expert decisions on final item retention.[85] We
suggest that the development stages of these types of measures could be blended into a mutually
“informed zone” of overlap that merges the principles of both fields (Figure 2.3).
The third phase of our framework refers to testing the performance of the instrument. This phase
includes the assessment of reliability, validity and responsiveness of the instrument. The process
involves testing the performance of the numeric scores or classifications obtained from the
instrument. Both psychometrics and clinimetrics agree on the importance of this phase. Both
fields describe similar methods and analyses to study the reliability, construct validity,
responsiveness and interpretability of instruments. For example, both schools agree that
construct validity should be based on a priori hypotheses against which observed relationships
are compared and that similar analytical approaches can be used (e.g. correlations or known
groups) to test the relationships. However, differences in construct validity may exist based on the
structure of the instrument (e.g. one continuous measure compared to a multidimensional
instrument). Furthermore, it is important to note that statistical methods do not apply equally to
both schools. For example, internal consistency is useful in psychometrics to ensure that the
multiple items belong to the same construct. This is not the case for clinimetric tools because the
items do not need to be correlated. The measures developed in the informed zone are more
challenging. They may include multiple items of the same construct, but may also include the
full range of the clinical experiences that do not represent a single factor (e.g. Asthma Quality of
Life Questionnaire).
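To make the point about internal consistency concrete, the following is a minimal sketch of Cronbach's alpha, the statistic most often used for it. The function name `cronbach_alpha` and the example data are hypothetical and are not taken from any study reviewed here; the sketch only shows why alpha rewards correlated items and is therefore less informative for clinimetric indexes whose items need not correlate.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed total score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that move together perfectly
correlated = [[1, 1, 1], [2, 2, 2], [4, 4, 4], [5, 5, 5]]
# Hypothetical data: two items with zero covariance
uncorrelated = [[0, 0], [2, 0], [0, 2], [2, 2]]

print(cronbach_alpha(correlated))
print(cronbach_alpha(uncorrelated))
```

With perfectly correlated items alpha approaches 1, while items with no covariance yield an alpha of 0 even if each item is clinically important.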
2.4 Discussion
We conducted a scoping literature review and proposed a new conceptual framework that unites
clinimetrics and psychometrics. We suggest that the ongoing debate between clinimetrics and
psychometrics is unnecessary and creates uncertainty as to the value of well-developed outcome
measures. We proposed to resolve this debate by identifying the unique strengths of each field in
order to specify situations when they should remain unique or be combined for use by both
fields.
Our framework is based on the literature that compares clinimetrics and psychometrics. Several
authors have demonstrated that the nature and structure of an instrument can be highly dependent
on the measurement school that influenced its development (Table 2.2b). Others point to the
differences between clinimetrics and psychometrics but also to the overlapping informed zone
that is common to proponents of both fields. Specifically, de Vet et al and Streiner et al
suggested that clinimetric indexes are more content driven and psychometric indexes are more
statistically driven although both approaches are used by both fields.[32,118]
In the current “Era of Assessment and Accountability”, measuring outcomes to identify
appropriate and cost-effective treatment approaches is a major focus in health care. Our search
demonstrates that the amount of literature published during this era is growing rapidly and that
an increasing level of attention focuses on the psychometric-clinimetric divide. In a recently
published textbook, de Vet et al stated that appropriate methods must be used by both fields
regardless of the clinimetric-psychometric distinction.[33] These authors refrained from
distinguishing measurement properties as clinimetric or psychometric. This suggests that clinical
measurement has already started moving toward integrating methods. Furthermore, this
international multi-disciplinary research group (largely clinimetric in focus) has reached
consensus on the taxonomy, terminology, and definitions used in the measurement field.[89]
They are currently developing critical appraisal tools to identify studies of high methodological
quality in systematic reviews on measurement properties which will help provide
recommendations for standardized methods in future publications (www.cosmin.nl/). Our
framework can serve to guide the development of such evaluation tools leading to
standardization by providing clarity on distinctions in instrument development that should be
graded differently.
Our study has strengths and limitations. We used scoping study methods to obtain information
relevant to the measurement debate. Using these established methods to perform our review is a
strength of our study. A scoping study differs from a systematic review because authors do not
assess the quality of included studies.[3,82] However, scoping studies usually address broader
topics in which different types of articles may be applicable in contrast to systematic reviews
which address specific questions through a relatively narrow range of quality-assessed
studies.[3,82] Our literature search highlighted that indexing of articles on the topic of
clinimetrics was limited. We used keywords instead of subject headings or other database-
specific indexing descriptors to search the literature on clinimetrics. Indexing descriptors
provide a controlled vocabulary for use in bibliographic records. This controlled vocabulary
leads to results with more specificity in literature searches. In our search, no indexing
descriptors relevant to clinimetrics were available (e.g. no relevant Medical Subject Heading
[MeSH] in Medline). This likely limited our ability to capture articles that were specifically on
clinimetrics. In fact, five of our included studies were obtained from article bibliographies as our
search could not target them. In addition, some journals were not consistently indexed in
databases (e.g. our database searches missed two of the included articles because only selected
issues of the journal were indexed). We propose that clinimetrics should be introduced as a
MeSH term in Medline to assist with more accurate indexing of articles on the topic. However,
Streiner pointed out that searching specifically for clinimetric articles might misdirect
researchers and clinicians and lead to ignorance of the vast contributions of psychometrics to
measurement.[118] Therefore, we propose that ‘measurement properties’ should be the indexing
descriptor for all measurement fields assessing instrument properties. This comprehensive term
would improve database searches on the topic by capturing all relevant literature from both fields
in electronic database literature searches. Until the indexing methods are changed, a
combination of all relevant terms needs to be used in comprehensive searches to avoid missing
articles. As apparent from one of our included studies (Table 2.1), some authors continue to use
terminology such as ‘clinimetric properties’ even in 2012 in a way that may overlook
contributions from the psychometric field.[43] We suggest that our framework should be used as
a guide in future instrument development and evaluation to avoid the continuation of this divide.
The development of our framework may have been limited by the available literature.
Specifically, a limitation that affected our search strategy is the limited number of relevant
MeSH terms available to conduct our search. This may have led to missing relevant articles
since subject heading searches are designed to capture relevant literature more accurately.
However, we used variations of keywords to compensate for this limitation. Similar problems
were encountered by other authors in searching for literature on measurement properties.[124]
PubMed search filters to capture articles on instrument measurement properties have been
developed for this reason.[124] These filters should be used by researchers performing
systematic reviews and by clinicians scoping the literature for instruments with sound
measurement properties to use clinically. Our search aimed to capture articles on the clinimetric-
psychometric debate excluding articles discussing measurement properties of specific
instruments with no discussion of the difference between fields. Therefore, these filters were not
appropriate for our scoping review. Other schools of measurement, such as sociometrics and
anthropometrics, may also have unique features, but we have not reviewed them in this paper.
Literature on item response theory (IRT) was not included in our literature review or in the
proposed framework because it is based on a different theory. However, IRT likely belongs in
the informed overlap zone of instrument development. Specifically, IRT aims to develop
unidimensional instruments (similar to psychometric methods), but it provides different score
weights for items of different difficulty levels within an instrument. This suggests that a single
instrument can include multiple constructs rather than only highly inter-correlated attributes of a
single construct (similar to clinimetrics). However, this literature was not evaluated in the
current study and needs to be assessed in future research to determine its relationship and
contribution to the presented conceptual framework. Considering that the two measurement
fields (i.e. clinimetrics and psychometrics) demonstrate both uniqueness and an overlap, future
research should revisit other measurement fields. In doing so, we can determine if all
measurement fields have unique development stages and similar ways of evaluating the
performance of their outcome measures.
2.5 Conclusion
Our scoping review provides the supporting evidence for a new framework for measurement
development and evaluation. The framework proposes a shift in the early conceptual
foundations and development of a measure but finds a converging point in the measurement
performance of an instrument (e.g. reliability, validity, responsiveness). Our framework
highlights that many measures used in clinical medicine blend features of clinimetric (focus on
patient and expert input) and psychometric approaches (focusing on statistical analysis).
Assessing the quality of blended measures is challenging compared to assessing purely
clinimetric or psychometric measures. However, we found that all measures converge to the
same nomenclature and methods in the measurement performance stage. Therefore, labeling
measurement properties as clinimetric or psychometric may not be useful. Our new framework
will help scientists understand the measurement methodology by bringing together information
from both sides of a protracted debate.
Chapter 3:
Can Recovery from Whiplash-associated Disorders be Measured
Reliably in Patients with Acute Whiplash-Associated Disorders?
A Test-retest Reliability Study of the Whiplash Disability
Questionnaire
3.1 Introduction
More than 80% of individuals injured in traffic collisions suffer from whiplash-associated
disorders (WAD) and 50% of those will experience neck pain one year later.[15,19] Moreover,
whiplash is an important source of disability.[15] However, disability is difficult to define and
measure because it is highly contextualized and varies from person to person, place to place, and
from situation to situation.[5] Most of the current measures used to measure WAD-related
disability lack comprehensiveness and they have only been studied in sub-acute and chronic
patients.[63,98]
To be clinically useful, self-report outcome measures must be valid and reliable. This is
important to understand the day to day variability in score (reliability) and to quantify true
changes in state.[119] The minimal detectable change (MDC) assesses this variability as the
minimal change an instrument can detect over the day-to-day variability of individuals with a
stable condition.[24,138] High reliability is necessary when interpreting results such as change
scores and individual responses to interventions to ensure that change detected by the instrument
is due to true change in state beyond the daily variability of individuals with a stable condition
(i.e. >MDC).[24,138]
The WDQ is a self-report questionnaire based on the International Classification of Functioning
(ICF) framework of disability and includes items from the Neck Disability Index (NDI) such as
pain intensity, personal care, lifting and work.[99,131] It also includes items important to WAD
patients (fatigue, participation in sports, depression, social activities and anger).[63,99] The
WDQ was designed as an evaluative tool to measure response to treatment and its psychometric
properties have been studied in chronic WAD patients.[78,99] Its test-retest reliability was
reported to be excellent (ICC[3,1]=0.93 over 1 month) and its MDC was adequate (MDC=15
points out of 130 with 90% confidence).[49,99,140] Development in a chronic population may
restrict its use in patients with acute injuries because their disability status is likely to change
more rapidly.[111]
The test-retest reliability and the MDC of the WDQ are unknown in patients with acute WAD.
The purpose of our study was to determine the short-term test-retest reliability of the WDQ and
its MDC in a cohort of patients with acute WAD. We also aimed to determine whether the WDQ
test-retest reliability varied with WAD grade and with a participant’s recall of their baseline
WDQ responses.
3.2 Methods
3.2.1 Participants
Eligible participants made an insurance claim for traffic injuries to AVIVA Canada between
February 2008 and August 2009. Participants were included if they: 1) were at least 18 years of
age; 2) resided or worked in the Greater Toronto area, Burlington, Cambridge or the Kitchener
area; 3) were diagnosed with WAD Grades I-III[113] by two trained study
coordinators/chiropractors; and 4) had WAD of less than 3 weeks in duration. Participants were
excluded if they: 1) were unable to provide written informed consent; 2) were unable to complete
the interview in English; or 3) had a history of neck surgery.
3.2.2 Procedure
Potential participants were recruited alongside the University Health Network (UHN) Whiplash
Intervention Trial but participation for this study was offered regardless of their eligibility for the
trial.[26] Informed consent was obtained from all participants prior to enrolment in the study.
The University Health Network and University of Toronto Research Ethics Boards approved the
study.
3.2.3 Data
Data collection: The WDQ, change in neck pain question and a memory question (i.e. “Do you
remember your answers to the questions asked three days ago?”) were administered in a
standardized in-person interview at baseline and again 3-5 days later. A 3-5 day period was
selected as a reasonable time frame within which participants would not remember their previous
responses and would be unlikely to experience a change.
Stability indicator: We determined whether a participant’s condition was stable (indicator of
stability) using the self-rated change in neck pain question: “How do you feel your neck pain has
changed since the injury?”.[90] This question included seven Likert response options ranging
from ‘Very much better’ to ‘Very much worse’. We chose this question because it has good test-
retest reliability and contains a time anchor.[90] We used two indicators of stability: 1) the ‘No
change’ response option and 2) an expanded definition of stability including response options
‘Slightly better’, ‘No change’, and ‘Slightly worse’. A similar expanded definition was used in
previous research.[72]
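The two stability definitions can be expressed as a simple classification rule. The sketch below is illustrative only; the function name `is_stable` and the exact response strings are assumptions based on the wording above, not the coding used in the study database.

```python
# Response options treated as "stable" under each definition (labels assumed from the text)
STRICT_STABLE = {"No change"}
EXPANDED_STABLE = {"Slightly better", "No change", "Slightly worse"}


def is_stable(response, expanded=False):
    """Return True if a change-in-neck-pain response counts as stable."""
    allowed = EXPANDED_STABLE if expanded else STRICT_STABLE
    return response in allowed


print(is_stable("Slightly better"))                 # strict definition
print(is_stable("Slightly better", expanded=True))  # expanded definition
```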
Whiplash Disability Questionnaire (WDQ): The WDQ includes 13 items that measure the effect
of whiplash (Table 4).[99] Each item is scored from 0 (no impact) to 10 (greatest impact) on a
numerical scale. The responses are summed from 0 (no disability) to 130 (complete
disability).[99] As recommended by developers, missing item values were considered zeros in
the summation to obtain a total WDQ score.[99,140]
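As a minimal sketch of the scoring rule just described, assuming item responses are held in a list with `None` marking a missing item (the function name `wdq_total` is hypothetical and not part of the WDQ documentation):

```python
def wdq_total(items):
    """Sum 13 WDQ items scored 0-10; missing items (None) count as zero."""
    if len(items) != 13:
        raise ValueError("the WDQ has 13 items")
    total = 0
    for value in items:
        if value is None:
            continue  # missing item contributes zero to the total
        if not 0 <= value <= 10:
            raise ValueError("item scores must lie between 0 and 10")
        total += value
    return total


print(wdq_total([10] * 13))         # complete disability
print(wdq_total([5] * 12 + [None])) # one missing item treated as zero
```

Treating missing items as zero can only lower the total, which is one reason the sensitivity analyses described later impute alternative values.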
3.2.4 Sample Size
The sample size required to detect an Intra-class Correlation Coefficient (ICC) [model 2,1] of 0.9
using a lowest acceptable ICC value of 0.8 at a 0.05 level of significance is 46.[81,111,133]
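The text does not state which formula or power level produced this figure. One approximation that reproduces it is the method of Walter, Eliasziw and Donner for reliability studies, assuming two administrations per participant, a one-sided significance level of 0.05 and 80% power. These settings, and the function name `icc_sample_size`, are assumptions made for illustration rather than details confirmed by the source.

```python
from math import ceil, log
from statistics import NormalDist


def icc_sample_size(rho0, rho1, k=2, alpha=0.05, power=0.80):
    """Approximate number of participants needed to show ICC = rho1 exceeds rho0.

    Uses the Walter-Eliasziw-Donner approximation with k measurements each.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / ((log(c0) ** 2) * (k - 1))
    return ceil(n)


print(icc_sample_size(0.8, 0.9))
```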
3.2.5 Analysis
3.2.5.1 Test-Retest Reliability
We used Shrout and Fleiss Model 2,1 for multiple-raters to calculate the ICC.[111,119,138]
Model 2,1 reflects the repeated structure of the data and allows the results to be interpreted for
more than our specific testing situation.[111,138] We used an ICC value of 0.8 or above as a
standard for a reasonable level of reliability for group level analyses and 0.9 as a reasonable level
for analysis at the individual level.[100] ICCs and 95% confidence intervals (CI) were
calculated for overall scores and for individual items. Missing values were excluded from
individual item ICC calculations. We used a memory question (i.e. ‘Do you remember your
answers to the questions [asked three days ago]?’) to determine if the ICCs were sensitive to
memory. Finally, we calculated the ICCs in participants with different WAD grades to
determine if test-retest reliability values were affected by WAD grade.
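For readers unfamiliar with the Shrout and Fleiss notation, the point estimate of ICC(2,1) can be computed from the mean squares of a two-way ANOVA (participants by administrations). The sketch below is illustrative only: the analyses in this chapter were run in SAS, the function name `icc_2_1` is an assumption, and confidence intervals are not computed.

```python
def icc_2_1(scores):
    """ICC(2,1) point estimate (two-way random effects, single measurement).

    scores: one row per participant, each row holding k repeated measurements.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between administrations
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

Unlike Model 3, the denominator includes the between-administration variance (`jms`), so a systematic shift in scores between baseline and follow-up lowers the coefficient.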
3.2.5.2 Minimal detectable change
The MDC statistic at 95% confidence was calculated using the standard error of measurement for
repeated measures.[24] The value of the MDC95 represents the change above which there is 95%
confidence that the change is greater than the day to day variability of a stable
participant.[24,138] The MDC is, therefore, determined in participants reporting no change. We
calculated the MDC using two methods. First, we calculated the standard error of measurement
(SEM) from the standard deviation of the baseline total WDQ score and the ICC for the no
change group, and then the MDC at 95% confidence. Second, we obtained the SEM directly
from a repeated measures analysis of variance (ANOVA) as the root mean square
error, which avoids potential errors in MDC calculations based on the standard deviation and
the ICC.
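Both routes to the SEM, and the step from SEM to MDC, can be sketched as below. The formulas used are the conventional ones (SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM); the thesis does not print them, so treat this as an assumed reconstruction. The function names are hypothetical. For two administrations, the root mean square error of the repeated measures ANOVA equals the standard deviation of the paired differences divided by √2, which is what the second function uses.

```python
import math
from statistics import NormalDist, stdev


def sem_from_icc(sd_baseline, icc):
    """Method 1: SEM from the baseline standard deviation and the ICC."""
    return sd_baseline * math.sqrt(1 - icc)


def sem_from_differences(baseline, followup):
    """Method 2 (two administrations only): ANOVA root mean square error,
    computed as SD of the paired differences divided by sqrt(2)."""
    diffs = [b - f for b, f in zip(baseline, followup)]
    return stdev(diffs) / math.sqrt(2)


def mdc(sem, confidence=0.95):
    """Minimal detectable change at the given two-sided confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(2) * sem
```

With a baseline standard deviation of 18.6 and an ICC of 0.83, `sem_from_icc` gives an SEM of about 7.7, and `mdc` then gives roughly 21 points at 95% confidence.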
3.2.5.3 Sensitivity Analyses
We performed sensitivity analyses to determine the impact of missing data on the test-retest
reliability. We repeated the analyses after imputing the midpoint WDQ item value (5), the
highest item value (10), and the mean of other WDQ items for the individual.[65] A complete
case analysis was also performed by calculating an ICC (2,1) for the participants in the entire
sample and the participants in the group reporting no change who had no missing values.
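The imputation rules compared in the sensitivity analyses can be sketched as a single helper. The function name `impute_items`, the strategy labels and the use of `None` for a missing item are assumptions made for this illustration.

```python
from typing import List, Optional, Sequence


def impute_items(items: Sequence[Optional[float]], strategy: str) -> List[float]:
    """Fill missing WDQ items (None) according to one sensitivity-analysis rule."""
    observed = [x for x in items if x is not None]
    if strategy == "zero":           # rule recommended by the WDQ developers
        fill = 0.0
    elif strategy == "midpoint":     # midpoint of the 0-10 item scale
        fill = 5.0
    elif strategy == "highest":      # highest possible item value
        fill = 10.0
    elif strategy == "person_mean":  # mean of the person's answered items
        if not observed:
            raise ValueError("no observed items to average")
        fill = sum(observed) / len(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else float(x) for x in items]
```

Recomputing the total score and the ICC under each rule, and comparing the results with the complete case analysis, shows whether the reliability estimate depends on how missing items are handled.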
All statistical analyses were performed using SAS software (SAS 9.1 for Windows, SAS
Institute Inc., Cary, NC, USA).
3.3 Results
WDQ data were obtained from all 66 participants at both administrations. At follow-up, 62
participants were asked whether they remembered their baseline answers, and change in neck pain was
obtained from 54 participants. These questions were added to the follow-up after data collection
was initiated. Therefore, there were 4 missing values for remembering the baseline and 12
missing values for the change in neck pain question because the questions were not administered.
3.3.1 Descriptive statistics
On average, participants were enrolled 5.6 days following their collision. The sociodemographic
characteristics of the sample are presented in Table 3.1. The mean age of the sample was 41.6
years and 71.2% were female. The mean WDQ score was 49.3 out of a total possible score of
130 at baseline and 46.5 at follow-up. Of the 54 participants who were asked the change in neck pain
question, 15 (27.8%) reported no change, 31 (57.4%) reported being slightly to very much better
and eight (14.8%) reported getting worse. The mean WDQ score for participants
reporting no change was 56.9 at baseline and 56.6 at follow-up.
3.3.2 Completeness of WDQ
At baseline, 24.2% (16/66) had one missing item and one participant (1.5%) had two missing
items. At follow-up, 16.7% (11/66) of the sample had one missing item. The most common
question with missing values was the effect of whiplash injury on sporting activities (10.6% of
the entire sample and 26.7% of the sample with no change) followed by the effect of whiplash
injury on driving or using public transportation (1.5% of the entire sample and 6.7% of the sample with no
change). Complete case analysis (with no missing values) included 46 participants in the entire
sample and 11 in the group reporting no change. Demographics of participants with missing
values did not differ compared to participants with complete data.
Table 3.1: Baseline demographic characteristics of patients with acute whiplash-associated
disorders.
Characteristic | Entire Sample (N = 66) | No Change Subgroup (N = 15)
Female, no. (%) | 47 (71.2) | 8 (53.3)
Age, years: mean (SD); range | 41.6 (12.7); 19.6-73.5 | 43.3 (12.3); 19.6-66.5
Time since injury, days: mean (SD); median; range | 5.6 (4.4); 4.0; 0-19 | 4.8 (3.5); 4.0; 1-15
WAD grade, no. (%)
  I | 19 (28.8) | 4 (26.7)
  II | 47 (71.2) | 11 (73.3)
  III | 0 (0) | 0 (0)
WDQ total score: mean (SD); median; range | 49.33 (28.8); 48.5; 2-116 | 56.93 (18.6); 53.0; 25-92
Highest level of education, no. (%)
  High school or less | 11 (16.7) | 5 (33.3)
  Post secondary or some university | 18 (27.3) | 3 (20.0)
  Technical school graduate | 11 (16.7) | 2 (13.3)
  University graduate | 26 (39.4) | 5 (33.3)
Income, no. (%)
  $0-$49,999 | 34 (51.5) | 5 (33.3)
  $50,000-$59,999 | 11 (16.7) | 3 (20.0)
  $60,000-$79,999 | 7 (10.6) | 2 (13.3)
  $80,000+ | 12 (18.2) | 4 (26.7)
  Did not respond | 2 (3.0) | 1 (6.7)
Lawyer involvement in the claim, no. (%) | 0 (0) | 0 (0)
Pain intensity, mean (SD)*
  Neck | 5.74 (2.0) | 6.40 (1.3)
  Shoulder | 4.32 (3.0) | 5.20 (2.7)
  Low back | 3.97 (3.4) | 2.67 (3.4)
  Headache | 3.75 (3.2) | 3.73 (3.2)
  Arm | 2.28 (2.8) | 2.53 (2.9)
Abbreviations: SD = standard deviation
* Numeric rating scale of 0-10 (0 = no pain and 10 = worst pain ever)
3.3.3 Test-retest reliability
The ICC(2,1) for the total WDQ score was 0.89 (95% CI 0.85-0.92) [Table 3.2]. Participants
who remembered their responses had similar test-retest reliability compared to participants who
did not remember their previous responses. The ICC was similar across WAD grades and for
those who reported a change or no change in their neck pain [Table 3.2]. Sensitivity analysis
demonstrated that the ICCs were not influenced by missing data [Table 3.3].
Table 3.2: Intra-class Correlation Coefficient for the Total Summary Score categorized by the
report of no recovery on the change in neck pain question and memory effects
Total Summary Score of the WDQ | n | ICC (2,1) | 95% CI
All participants | 66 | 0.89 | 0.85-0.92
WAD grade | 66 | |
  Grade I | 19 | 0.82 | 0.70-0.91
  Grade II | 47 | 0.88 | 0.83-0.92
Question about remembering the baseline responses answered | 62 | |
  Participants reporting that they remember their responses | 36 | 0.89 | 0.84-0.94
  Participants reporting that they do not remember their responses | 26 | 0.85 | 0.76-0.92
Change in neck pain question answered | 54 | |
  Participants reporting no change | 15 | 0.83 | 0.69-0.93
  Participants reporting slight to no change | 32 | 0.85 | 0.77-0.91
3.3.4 Individual item test-retest reliability
The test-retest reliability for each of the 13 WDQ items ranged from ICC = 0.60 (95% CI
0.49-0.71) for non-sporting leisure activity to ICC = 0.85 (95% CI 0.80-0.90) for the anxiety-related
question [Table 3.4]. The questions on whiplash-related pain (ICC = 0.66; 95% CI 0.56-0.75) and
on the effect of the whiplash injury on sporting activities (ICC = 0.67; 95% CI 0.56-0.78) also had
lower ICCs. The results for participants reporting ‘No change’ on the change in neck pain
question were similar for most items, but some estimates differed. For example,
the reliability was lower for items related to work/home/study duties, tired/fatigued, non-sporting
leisure activity and anger [Table 3.4]. While this may reflect less precise estimates
because of the small sample size (n=15), it is also possible that the reliability of these items is worse
for participants who report no change in neck pain.
Table 3.3: Sensitivity Analysis for the Intra-class Correlation Coefficient for the Total Summary
Score
Imputed value | n | ICC (2,1) | 95% CI
Entire sample
  0 | 66 | 0.89 | 0.85-0.92
  5 | 66 | 0.89 | 0.85-0.92
  10 | 66 | 0.88 | 0.84-0.92
  Mean of individual | 66 | 0.89 | 0.85-0.92
  Excluding missing values | 46 | 0.89 | 0.85-0.92
Sample reporting no change
  0 | 15 | 0.83 | 0.69-0.93
  5 | 15 | 0.85 | 0.73-0.94
  10 | 15 | 0.87 | 0.76-0.95
  Mean of individual | 15 | 0.84 | 0.72-0.94
  Excluding missing values | 11 | 0.89 | 0.77-0.96
Table 3.4: Intra-class Correlation Coefficient for individual items of the WDQ
Columns 3-5: entire sample. Columns 6-8: subgroup reporting no change.
Item | Item theme | n* | ICC (2,1) | 95% CI | n* | ICC (2,1) | 95% CI
1 | Pain | 66 | 0.66 | 0.56-0.75 | 15 | 0.76 | 0.59-0.90
2 | Personal care | 66 | 0.81 | 0.75-0.87 | 15 | 0.84 | 0.72-0.94
3 | Work/home/study duties | 65 | 0.75 | 0.67-0.82 | 15 | 0.54 | 0.27-0.79
4 | Driving/Public transport use | 64 | 0.78 | 0.70-0.84 | 14 | 0.71 | 0.50-0.88
5 | Sleep | 66 | 0.78 | 0.71-0.84 | 15 | 0.74 | 0.55-0.89
6 | Tired/Fatigued | 66 | 0.72 | 0.64-0.80 | 15 | 0.19 | -0.11 to 0.56
7 | Social activity | 65 | 0.84 | 0.78-0.89 | 15 | 0.77 | 0.60-0.90
8 | Sporting activity | 50 | 0.67 | 0.56-0.78 | 11 | 0.58 | 0.31-0.82
9 | Non-sporting leisure activity | 65 | 0.60 | 0.49-0.71 | 15 | 0.44 | 0.15-0.73
10 | Sadness/depression | 66 | 0.81 | 0.75-0.87 | 15 | 0.62 | 0.37-0.83
11 | Anger | 66 | 0.74 | 0.66-0.82 | 15 | 0.35 | 0.05-0.68
12 | Anxiety | 66 | 0.85 | 0.80-0.90 | 15 | 0.78 | 0.62-0.91
13 | Concentration | 66 | 0.75 | 0.67-0.82 | 15 | 0.81 | 0.67-0.92
* Missing items were excluded in individual item ICC calculations
3.3.5 Minimal detectable change
For the 15 participants who reported no change in neck pain, the MDC95 was 21.4 (SD=14.9).
The parameters used in this calculation were the baseline WDQ SD (18.6) and the ICC for the
no change group (0.83), which produced an SEM of 7.7. The MDC and SEM obtained from the
ANOVA were 21.9 and 7.9, respectively. These results suggest that the WDQ requires a change
of 22 out of 130 points before one can be 95% confident that the change was beyond the daily
variability of an individual with a stable condition.
3.4 Discussion
In our study of participants with acute WAD, the WDQ demonstrated very good reliability and
moderate boundaries of error (i.e. one sixth of the scale). Our stratified analysis demonstrated
that the WDQ remained reliable regardless of WAD grade, memory effects or the report of no
change in neck pain.
Our results agree with the previous study of short- and medium-term test-retest reliability in
patients with chronic WAD.[140] In that study, the ICC(3,1) measured over one month was 0.86
(n=52) overall and 0.93 in the 24 participants who reported no change in their condition. In
contrast to our analysis, Willis et al. used a Model 3 ICC to compute reliability.[140] They
justified this model on the grounds that a single questionnaire (the WDQ) was administered, even
though it was administered at two time points. This may have led to an overestimation of the
test-retest reliability measured from the same patient at two time points. We were interested in
estimating the stability of the WDQ over time; therefore, Model 2 (random effects of time
interval) was indicated instead of Model 3 (fixed effects).[138]
We found that some WDQ items demonstrated adequate reliability (i.e. anxiety, social activity,
sadness/depression and personal care) while others did not (i.e. pain, sporting activity, non-
sporting leisure activity). Most items (on their own) were less reliable than the total WDQ score.
It is known that the ICC statistic can be affected by the range of possible scores.[138] This
likely led to differences in ICCs because the range of possible scores is smaller for individual
items (out of 10) than for the overall score (out of 130). Furthermore, the sample size in the
group reporting no change was small and the ICC estimates for individual items therefore had
poorer precision in that subgroup. The follow-up period of 3-5 days was chosen to ensure WAD
stability in the entire sample. Specifically, minimal to no change was expected during this very
short interval of time in the acute phase of WAD. Therefore, the results of the entire sample were
a good demonstration of reliability in this population of stable participants with acute WAD.
Previous research in Australia has found MDC values similar to our results.[140] Willis et al.
reported the 90% MDC of the WDQ over one month to be 15 points for 24 subjects in the
population with chronic symptoms.[140] Based on convention, we reported the MDC at the
95% confidence level.[138] Our MDC95 was larger (22 points) because we used a higher
confidence level to compute the MDC and because there is more variability in symptoms in a
sample of acute WAD participants than in chronic ones. In our sample, the MDC90 was 18
points (n=15), which is similar to the Australian figure.
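Because the MDC is a normal quantile multiplied by √2 and the SEM, an MDC reported at one confidence level can be rescaled to another by the ratio of the two quantiles. The sketch below assumes this conventional definition; the function name `rescale_mdc` is hypothetical.

```python
from statistics import NormalDist


def rescale_mdc(mdc_value, from_conf, to_conf):
    """Convert an MDC between two-sided confidence levels (e.g. 0.95 to 0.90)."""
    z_from = NormalDist().inv_cdf(1 - (1 - from_conf) / 2)
    z_to = NormalDist().inv_cdf(1 - (1 - to_conf) / 2)
    return mdc_value * z_to / z_from
```

Applied to the MDC95 of 21.4 points, `rescale_mdc(21.4, 0.95, 0.90)` gives approximately 18 points, which is the scale on which the comparison with the Australian MDC90 of 15 points should be made.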
Our study had several strengths: 1) the recruitment of a sample with acute WAD injuries; 2) a perfect
follow-up rate (100%) and few missing values; 3) the use of the appropriate statistical model to
compute the ICCs (ICC model 2,1); and 4) the use of the conventional 95% confidence intervals
to estimate the MDC boundaries of error.[35,100,138] However, it also had limitations. One of
the limitations was missing data, specifically with the sporting activity item. This may be related
to the fact that we studied an acute sample of patients who may not have been able to return to or
attempt their sporting activity. We conducted a sensitivity analysis and found that the ICC
remained stable using complete case analysis and with the imputation of means and extreme
values. Therefore, we found no evidence that missing data biased our results. Another limitation
was a short period of time for repeat administration of the same questionnaires (3-5 days). This
may have affected the results if participants remembered their original answers; however, we
found that memory did not have a significant effect. Since the environment differed between
administrations (i.e. in-person interview at baseline and telephone interview at follow-up), this
can also be considered a limitation. However, both administrations were interviewer-
administered (with the interviewer verbally asking participants WDQ questions). Therefore, the
type of administration should be considered similar for both interviews and this additional
variability in conditions would likely have biased the reliability estimates toward the null. Since
the statistics were nonetheless satisfactory, the true reliability values are likely better than those
reported in this study and measurement error may be overestimated. Finally, the change in
neck pain question was used to detect change over time in reliability testing. However, this
question asked if the participants’ neck pain had changed since the injury, not since the baseline
questionnaire was administered. It is possible that most of the symptom change occurred since
the injury, but before the baseline was administered. This would result in a portion of the sample
being falsely classified as changed when they have not changed between test administrations.
The ICC, therefore, was limited for the group reporting no change due to the inaccuracy of the
change in neck pain question, but adequate reliability was, nevertheless, demonstrated in both
groups.
Future research assessing WDQ responsiveness needs to consider our MDC results in their
calculations and in their assessment of the minimal clinically important change. Also, while
reliability is necessary in research because it allows accurate interpretation of results by
minimizing measurement error in statistical analysis, it is not sufficient on its own to establish
the usefulness of a measure.[24] Therefore, studying construct validity is the next step in
establishing the psychometric properties of this measure.
3.5 Conclusion
The results of this study suggest that the WDQ has very good test-retest reliability in individuals
with acute WAD. The reliability of the WDQ remains stable for participants reporting no change
which supports its use in research and in clinical practice. The WDQ had wide boundaries of
error based on the MDC value. Therefore, it may be limited in detecting true change in an
individual patient.
3.6 Acknowledgement
This study was funded by an industry grant from AVIVA Canada Incorporated to the University
Health Network for the UHN Whiplash Intervention Trial. Maja Stupar was funded by a Vanier
Canada Scholar Canadian Institutes of Health Research award. The authors declare no conflicts
of interest.
Chapter 4:
Exploratory Factor Analysis, Validity and Responsiveness of the
Whiplash Disability Questionnaire in Adults with Acute Whiplash-
associated Disorders
4.1 Introduction
Although the burden of WAD-related disability is significant in society, there are few WAD-
specific measures with sound measurement properties that can be used to describe its impact on
individuals and society.[61,63] One of the available instruments is the Whiplash Disability
Questionnaire (WDQ), a 13-item disability measure that has been validated in adults with
chronic WAD and that can be used to provide a summative total score of whiplash-related
disability.[99,140] The development of the WDQ was based on the disability framework of the
International Classification of Functioning, Disability and Health (ICF) which includes the
constructs of impairment, activity limitations and participation restriction.[121,142] In addition
to the items included in previous instruments (e.g., neck pain, impact on personal care, lifting,
concentration) the WDQ includes items deemed important by individuals with WAD (e.g.
fatigue, participation in sports, depression, socializing with friends).[63,99] Therefore, the WDQ
includes items that cover multiple concepts, suggesting that it may contain distinct factors or
constructs. However, previous research in the chronic WAD population indicates that the WDQ
contains only one factor.[99] It is important to note that this determination was made using
principal component analysis, a method that is not designed to determine the factor structure of
measurement tools. To our knowledge, no one has used factor analysis to determine the factor
structure of the WDQ in the acute WAD population.
The measurement properties of the WDQ have been studied in a chronic and stable WAD
population.[99,140] Willis et al reported that the WDQ has excellent test-retest reliability (ICC
= 0.90 over 24 hours and ICC = 0.96 over one month) in chronic patients.[140] Similarly, we
found that the short-term test-retest reliability was adequate [i.e. ICC > 0.85] (n=66) in adults
with acute WAD (within 21 days of their accident). Face validity and responsiveness of the
WDQ were only studied in chronic WAD; its validity and responsiveness in acute WAD samples
are unknown.[99,140] Pinfold et al reported that according to a multidisciplinary medical
committee panel, the WDQ has reasonable face validity.[99]
Given the lack of standard methods to determine responsiveness, the responsiveness of the WDQ
has been studied using various methods. Specifically, Willis et al: 1) reported the responsiveness
statistic for the subgroup of participants reporting recovery or worsening on a global recovery
question; 2) reported the effect size and standardized response means for the overall chronic
study population; and 3) correlated overall WDQ change scores with the global recovery
question as the external anchor.[140] The responsiveness statistic was reported as 1.06 for
participants who improved over one month and -1.86 for those who got worse.[140] These
values suggest reasonable change over time in the appropriate direction for those reporting
change by demonstrating values that were not close to zero.[100,144] In contrast, effect size and
standardized response means calculated for the overall study population were almost zero, which
suggests that the overall study population had minimal to no change in symptoms over one
month.[140] Although effect size and standardized response means can be used similarly to the
responsiveness statistic to demonstrate change over time using an external anchor, Willis et al
only reported the lack of change in the overall study population using these statistics.
Spearman’s rank correlation between the WDQ change scores and patient-perceived recovery
(scale ranging from -5 to +5) was adequate (r = 0.67).[140] Although correlations of 0.7 are
often used to support responsiveness, it must be remembered that values below 0.7 are common
because of the measurement error associated with each instrument when measuring change over
time (i.e., compared to using actual scores of each instrument).[33]
The validity and responsiveness of the WDQ need to be assessed in the acute WAD population
to determine if the instrument can be used in patients with recent injuries. The purpose of this
study was to determine the factor structure, construct validity and responsiveness of the WDQ in
a sample of individuals with acute WAD.
4.2 Methods
4.2.1 Participants and Procedures
Eligible participants made an insurance claim for traffic injuries to AVIVA Canada between
February 2008 and August 2009. Participants were included if they: 1) were at least 18 years of
age; 2) resided or worked in the Greater Toronto area, Burlington, Cambridge or the Kitchener
area at the time of their motor vehicle collision; 3) were diagnosed with WAD Grades I-III[113]
by a trained study coordinator; and 4) had WAD of less than 3 weeks in duration. Participants
were excluded if they: 1) were unable to provide written informed consent; 2) were unable to
complete the interview in English; or 3) had a history of neck surgery.
Potential participants were recruited and assessed by two study coordinators who determined
their eligibility for participation in the University Health Network (UHN) Whiplash Intervention
Trial.[26] The eligibility assessment included three telephone screening questions (i.e., age, self-
report of neck pain intensity on an 11-point numerical rating scale (NRS) and time since injury).
Those individuals who met the telephone screening criteria were invited to a clinical assessment.
This assessment included a history, physical examination and imaging when necessary. Potential
participants were asked to participate in the current study regardless of their eligibility for the
trial. The eligibility for the trial was slightly different in that only participants with WAD Grade I
and II were included and those participants were randomized to different treatment groups. The
current cohort study also included WAD grade III and participants who were not randomized
into the trial. The Quebec Task Force defined WAD grade I clinical presentation as a neck
complaint of pain, stiffness or tenderness only and WAD grade II as a neck complaint with
musculoskeletal signs such as decreased range of motion and point tenderness.[113] WAD grade
III also includes neurological signs (e.g. decreased tendon reflexes, weakness, sensory deficits).
Informed consent was obtained from all participants prior to enrolment in the study. The UHN
and the University of Toronto Research Ethics Boards approved the study.
4.2.2 Data Collection
Participants completed an in-person, interviewer-administered questionnaire at baseline and a
6-week follow-up interview conducted in person or by telephone. At baseline, we collected data on
demographics, whiplash disability (WDQ), pain intensity (Numerical Rating Scale), neck
disability (Neck Disability Index [NDI] and the Bournemouth Questionnaire), mental health
(CES-D), and general health (SF-36). Whiplash disability (WDQ) and a global recovery
question were also collected by interviewer-administered questionnaire at 6 weeks.
4.2.2.1 Whiplash Disability Questionnaire
The Whiplash Disability Questionnaire (WDQ) consists of 13 items that measure the effect of
whiplash on pain, personal care, work/home/study duties, driving/public transportation, sleep,
tiredness/fatigue, social activity, sporting activity, non-sporting leisure activity,
depression/sadness, anxiety, anger and concentration.[99] Each item response is rated on a
numerical scale from 0 (no impact) to 10 (greatest impact). The questionnaire responses are
summed for a maximum possible total of 130 points (designating complete disability) and a
minimum possible score of 0 (designating no disability).[99] The WDQ was conceptualized using
the ICF and the selection of its items was inspired by the Neck Disability Index (NDI) and a list
of WAD features deemed important to patients.[63,99,142] Items such as pain intensity,
personal care, lifting and work were obtained from the NDI, which is a 10-item measure of neck
disability.[131] In addition, the WDQ contains items found to be important to WAD patients
such as fatigue, participation in sports, depression, social activities and anger.[63,99] The WDQ
was designed as an evaluative tool to measure change in whiplash-related disability over
time.[78,99]
4.2.2.2 Numerical Pain Rating Scale
The numerical rating scale (NRS) is widely used in pain research.[60] Its psychometric
properties were found to be adequate in musculoskeletal injuries.[21,22,96] Specifically, it has
moderate test-retest reliability (ICC=0.76, 95% CI 0.51-0.87; period=2.5 ± 0.96 days) and
adequate responsiveness (AUC=0.85, 95% CI 0.78-0.93) in patients with neck pain.[22] In
patients with acute injuries, the NRS performs well with scores that are highly correlated with
the pain intensity measured with the Visual Analogue Scale (VAS).[8] The NRS has good
discriminant qualities in patients with acute pain and better reliability than the VAS in patients
with trauma.[7]
4.2.2.3 Neck Disability Index
The Neck Disability Index (NDI) is a functional status, self-report questionnaire with 10 items.
Each item has six possible responses describing the level of severity. Level 0 in each item
denotes no pain/disability and level 5 describes maximal pain/disability. The scores of each item
are summed to a maximum total of 50 points. The summed score can be used as a measure of
disability based on the originally proposed scale (0-4 no disability; 5-14 mild; 15-24 moderate;
25-34 severe; >35 complete disability), or multiplied by two to obtain a percentage ranging from
no disability (0%) to complete disability (100%).[131] The NDI was designed to measure
change over time of neck pain-related disability in whiplash and persistent neck pain patients in
response to treatment.
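The NDI scoring rule described above can be sketched as follows. The function name `ndi_score` is hypothetical. The category bands follow the text; because the text gives the top band as ">35" and leaves a raw score of exactly 35 unassigned, the sketch assumes that 35 and above counts as complete disability.

```python
def ndi_score(items):
    """Return (raw score out of 50, percentage, disability category)
    for ten NDI items, each scored 0 to 5."""
    if len(items) != 10:
        raise ValueError(f"expected 10 items, got {len(items)}")
    if any(not 0 <= x <= 5 for x in items):
        raise ValueError("each item must be scored from 0 to 5")
    raw = sum(items)
    percent = raw * 2  # 50 raw points correspond to 100%
    if raw <= 4:
        category = "no disability"
    elif raw <= 14:
        category = "mild"
    elif raw <= 24:
        category = "moderate"
    elif raw <= 34:
        category = "severe"
    else:  # assumption: a raw score of 35 or more is treated as complete disability
        category = "complete disability"
    return raw, percent, category
```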
The NDI has been widely used as a measure of pain and disability for neck pain and WAD in
clinical and research settings. The NDI is a reliable and responsive measure in patients with
acute neck pain similar to the WAD patients under investigation in this thesis.[22,56,116,139]
The construct validity of the NDI is reported to be adequate.[56,139] However, the face and
content validity of this region-specific measure may not capture all aspects of whiplash-related
disability. Although neck pain is the cardinal symptom of whiplash, it is certainly not the only
symptom experienced by patients with WAD. Furthermore, the ordinal scaling of this measure
at the item-level may not satisfy the interval-level scaling requirement needed for commonly
applied statistical methods of analysis used in clinical research.[129]
4.2.2.4 Neck Bournemouth Questionnaire
The Neck Bournemouth Questionnaire (NBQ) was developed in 2002 to measure neck-related
disability.[10] The authors report that it attempts to capture the affective and cognitive
dimensions of neck pain that are overlooked by other commonly used neck pain disability
measures (i.e. Northwick Neck Pain Questionnaire, NDI, Copenhagen Neck Pain (CNP) and
Neck Pain and Disability Scale).[10] The NBQ consists of seven items, each with response
options on a scale from 0 (no impairment) to 10 (complete impairment). The questions address
neck pain intensity, activity limitations (i.e., work, daily activities, recreational, social and
family), emotional symptoms (i.e., anxiety and depression), effect of work on neck pain, and
ability of the subject to control the neck pain.[10] Preliminary studies suggest that it has good
internal consistency (Cronbach’s alpha=0.9), moderate test-retest reliability (ICC=0.65), and
acceptable responsiveness (Cohen and Kazis effect sizes greater than one and better than NDI
and CNP) in individuals with nonspecific neck pain.[9,10,66] The NBQ has acceptable construct
validity when compared with the NDI (Pearson’s r of 0.51-0.71 for overall NBQ), SF-36 Health
Survey (moderate Pearson’s r correlations of -0.43 to -0.59 with individual items of the NBQ for
the physical health, mental health and social functioning SF-36 domains) and the Copenhagen
Neck Functional Disability Scale (Pearson’s r of 0.48-0.63).[10]
We chose the Neck Bournemouth Questionnaire instead of other possible neck disability
instruments because the Neck Bournemouth Questionnaire aims to capture affective and
cognitive symptoms related to neck pain. Moreover, other measures such as the Northwick Neck
Pain Questionnaire have items identical or very similar to the NDI items.[132] Since the WDQ
was developed from the NDI, the Neck Bournemouth Questionnaire was chosen to avoid
potentially artificial inflation of correlations between the constructs based on identical items.
4.2.2.5 CES-D
It is well documented that depressive symptomatology has a negative effect on recovery from
whiplash injuries.[14,107] The CES-D is a widely used questionnaire with adequate
psychometric properties developed to measure the frequency of depressive symptoms.[102] It is
a 20-item scale developed by The National Institute of Mental Health with responses in a Likert
format ranging from 0 (lasting less than one day out of a week) to 3 (lasting all the time; 5-7 days
within a week). Total summative scores range from 0 to 60, with higher scores reflecting greater
levels of depressive symptoms and lower scores reflecting lower levels of symptoms. The CES-
D has 4 separate factors: depressive affect, somatic symptoms, positive affect, and interpersonal
relations.
4.2.2.6 SF-36 Health Survey
The SF-36 Health Survey is a 36-item questionnaire that assesses functional impairment and
symptoms due to medical health problems. It was developed for the Medical Outcomes Study,
and has been tested and validated extensively across a range of disorders.[88,103,115,137] It has
strong psychometric properties in some musculoskeletal and degenerative neck
conditions.[1,77,80,105] We used SF-36 Version 2.0 (SF-36v2) for acute injuries (1-week
recall).[136] The SF-36 is a generic measure that is intended to capture health-related quality of
life across a range of disorders. It consists of 8 domains (General Health, Physical Functioning,
Social Functioning, Role Physical, Role Emotional, Mental Health, Vitality, Bodily Pain) whose
item scores can be added up for a total SF-36 score, or assessed within the individual domains.
The SF-36 can be used to determine construct validity by comparing domains that are similar to
WDQ items.
4.2.2.7 Self-report Recovery
Self-reported recovery is widely used in clinical research as a global measure of improvement
following injury or disease.[93] We used a global recovery question that had seven response
options to the question: “How well do you feel you are recovering from your injuries?”. The
seven response options were: 1. ‘completely better’; 2. ‘much improved’; 3. ‘slightly improved’;
4. ‘no change’; 5. ‘slightly worse’; 6. ‘much worse’; 7. ‘worse than ever’. This global recovery
question is reliable in acute WAD patients.[90] Self-reported recovery has been compared with
the SF-36 for minor musculoskeletal injuries and with neck pain NRS, Pain Disability Index and
CES-D in whiplash injuries.[18,93] Physical aspects of functional health status were more
strongly associated with self-reported recovery than were emotional or social aspects.
Incrementally poorer recovery ratings on the recovery question were also associated with greater
neck pain NRS, functional limitations, poorer physical health, depression and being off
work.[16]
4.2.3 Analysis
All statistical analyses were performed using SAS software (SAS 9.1 for Windows, SAS
Institute Inc., Cary, NC, USA).
4.2.3.1 Descriptive statistics
We examined the distributions of individual WDQ items and the total WDQ score to determine
the adequacy of the data for factor analysis and to determine which correlation statistic should be
used in construct validity. We also screened the data for missing values. Data distribution was
examined by visually inspecting plots, comparing the mean and median values as well as
examining the values of skewness and kurtosis for each item and the overall WDQ. Skewness
and kurtosis values between 0 and 1 were considered to demonstrate adequate normal
distribution of the data.
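The screening step described above can be sketched in a few lines of Python. This is an illustration only, not the SAS code used in the study. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis that many statistical packages report, and it interprets the criterion as an absolute value of at most 1, which is an assumption. The function names are hypothetical.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed formula, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def roughly_normal(values, limit=1.0):
    """Assumed reading of the criterion: |skewness| and |kurtosis| <= limit."""
    return (abs(sample_skewness(values)) <= limit
            and abs(sample_excess_kurtosis(values)) <= limit)
```

Applied to each WDQ item and to the total score, a check of this kind would flag which variables call for rank-based correlations.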
4.2.3.2 Factor Structure
We performed an exploratory factor analysis (EFA) of the WDQ. We considered inter-item
correlations to be adequate for factor analysis when correlations were between 0.3 and 0.7.[58]
Sampling adequacy of data for factor analysis was assessed by applying the Bartlett’s Test of
Sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. We considered
the data adequate for factor analysis if the Bartlett’s Test of Sphericity had a significant p-value
(<0.05) rejecting the null hypothesis of an identity matrix, and if the KMO value was above 0.8
demonstrating sampling adequacy.[38,51,75]
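For reference, the two adequacy statistics can be written in their standard textbook form. The notation is ours and is not taken from the cited sources: R is the inter-item correlation matrix for p items from n participants, r_ij are its off-diagonal elements and a_ij are the corresponding partial correlations.

```latex
% Bartlett's Test of Sphericity (H0: R is an identity matrix)
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad df = \frac{p(p-1)}{2}

% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

For the 13-item WDQ, the Bartlett statistic has 13(12)/2 = 78 degrees of freedom. KMO approaches 1 when the partial correlations are small relative to the raw correlations.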
Responses to the WDQ were subjected to an EFA using squared multiple correlations as prior
communality estimates. The maximum likelihood factor method was used to extract the factors
followed by a varimax (orthogonal) or promax (oblique) rotation. We used a combination of
several criteria to determine the number of factors to be retained: scree plot, proportion of
variance explained and clinical interpretability/meaningfulness.[58] The interpretability and
meaningfulness criterion meant that there had to be at least 3 items per factor that share a
conceptual meaning, that items loading on different factors measure different constructs, and that
the rotated factor pattern had a simple structure with no complex loadings.[58]
We used the likelihood ratio (chi-squared) test, root mean square residual (RMSR), the Akaike
information criteria (AIC), the Schwarz Bayesian criteria (SBC) and the Tucker and Lewis
Reliability coefficient (TLRC) to assess the goodness of fit of the models. We considered model
fit values adequate if they satisfied the arbitrary cut-points recommended for RMSR (< 0.08) and
[79,109] Finally, we considered models with lower AIC and SBC values and
chi-square model fit values with significant p-values (at p<0.05) to have better fit.[79]
An item was considered to load on a factor if its factor loading was ≥0.40 for that factor and
<0.40 for the other factors.[58] We performed sensitivity analyses on the factor solution by: 1)
excluding items with the most missing values; 2) excluding items resulting in complex factor
loading; 3) imputing ‘0’ for missing values.
We measured internal consistency using Cronbach’s alpha for the overall scale and for each of
the factors. For items measuring one construct, Cronbach’s alpha was expected to be > 0.7.
[120,122] In cases where the Cronbach’s alpha level is > 0.9, item redundancy was considered,
and the number of items reduced accordingly.[120,122] For multi-construct instruments,
Cronbach’s alpha is not expected to satisfy these minimum values but should be assessed for
individual constructs.[117]
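As an illustration of the internal consistency calculation, the following is a minimal Python sketch of Cronbach's alpha for complete cases. It is not the SAS procedure used in this study, and the function name and input layout (one list of item scores per participant) are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from complete-case data.

    rows: one list of item scores per participant, all of equal length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summative score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

The same function can be applied to the full 13-item scale or to the items of a single factor, which is how alpha would be assessed for individual constructs.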
4.2.3.3 Validity
We developed a priori hypotheses for the expected correlations between the WDQ total score,
items or domains and the various measures of constructs relevant to whiplash-related symptoms.
Our hypotheses included correlation estimations with constructs for pain intensity (Numerical
Rating Scale), neck disability (Neck Disability Index [NDI] and Bournemouth Questionnaire),
mental health (CES-D), and general health (SF-36). We hypothesized that the WDQ should
correlate with other measures of neck-related disability (i.e. NDI, Neck Bournemouth
Questionnaire). Therefore, we tested this potential association with a hypothesis that the total
WDQ score would correlate strongly (i.e. r=0.6-0.8) with the Neck Bournemouth Questionnaire
and the NRS for the neck area, and it would correlate very strongly with the NDI (i.e. r=0.7-0.9).
Our hypothesis of very strong correlations with the NDI was based on the fact that the WDQ was
developed using NDI items. We also hypothesized that the WDQ daily activities items and daily
activities domain would correlate strongly (i.e. 0.6-0.8) with the Neck Bournemouth
Questionnaire, SF-36 physical function and role physical, the NDI and the neck/head NRS. The
total WDQ score was hypothesized to correlate moderately (i.e. 0.4-0.6) with the SF-36 physical
function and role physical, and NRS for the upper and lower limbs. Similarly, emotional items
and the WDQ emotional domain were hypothesized to correlate moderately (i.e. 0.4-0.6) with
the SF-36 mental health and role emotional, as well as the NDI and NRS of the upper and lower
limb. Finally, the total WDQ score was hypothesized to correlate poorly with the CES-D (i.e.
0.2-0.4) and poorly-to-moderately (i.e. 0.3-0.5) with the SF-36 mental health and role emotional
along with trunk region NRS.
We calculated Pearson’s correlation coefficients for normally distributed data and
Spearman’s rank correlation coefficients for skewed data.
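The choice between the two coefficients can be sketched as follows. This Python example is illustrative only: the helper names are hypothetical, and the `skewed` flag stands in for the distribution screening described in Section 4.2.3.1.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation(x, y, skewed):
    """Use Spearman for skewed data and Pearson otherwise."""
    return spearman(x, y) if skewed else pearson(x, y)
```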
4.2.3.4 Responsiveness
We computed responsiveness based on a priori hypotheses of WDQ score changes using the
report of recovery on the global recovery question, or a change of 3/10 or more points on the
neck pain NRS as external anchors. We hypothesized that participants with acute WAD would
demonstrate change over the six-week period. We tested this hypothesis using responsiveness
statistics, a priori hypothesized correlations between recovery and WDQ change scores, and the
receiver operating characteristic curve approach. In accordance with previous research, we considered a
change of 3 or more points on an 11-point NRS to be a moderate improvement that would
demonstrate responsiveness.[37] The report of recovery on the 6-week global recovery question
was defined by dichotomizing response options. Responses were dichotomized as recovered if
the options ‘completely better’ or ‘much improved’ were selected.[90] They were classified as
not recovered if any of the other response options were selected (including the ‘slightly
improved’, ‘no change’, ‘slightly worse’, ‘much worse’ and ‘worse than ever’ responses). Our
definition excluded the ‘slightly improved’ response option from the recovered group. This is in
accordance with previous research which suggested that excluding categories adjacent to the ‘no
change’ category may provide the most accurate definition of recovery.[74] By excluding
participants who may be very close to reporting no change, our dichotomization of recovery may
provide the least contaminated groups of patients who have recovered clearly differentiated from
the non-recovered group. Previous studies have used the same dichotomous definition of global
recovery.[18]
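The two external anchors defined above can be expressed as simple classification rules. The Python sketch below is illustrative. The option numbering follows the order in which the response options are listed in Section 4.2.2.7, and the function names are hypothetical.

```python
# Response options of the global recovery question, in the order listed
# in Section 4.2.2.7.
RECOVERY_OPTIONS = {
    1: "completely better",
    2: "much improved",
    3: "slightly improved",
    4: "no change",
    5: "slightly worse",
    6: "much worse",
    7: "worse than ever",
}


def is_recovered(option):
    """Recovered = 'completely better' (1) or 'much improved' (2)."""
    if option not in RECOVERY_OPTIONS:
        raise ValueError(f"unknown response option: {option}")
    return option in (1, 2)


def improved_on_nrs(baseline_nrs, follow_up_nrs, threshold=3):
    """True when neck pain NRS decreased by at least `threshold` points."""
    return (baseline_nrs - follow_up_nrs) >= threshold
```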
We assessed responsiveness by calculating the effect size, Guyatt’s responsiveness statistic and
the standardized response mean defining recovery using the recovery question.[23,54,68] Effect
size has been defined as the difference between mean scores at baseline and follow-up divided
by the standard deviation of baseline scores.[23,68] The standardized response mean uses a
similar ratio except that it uses a standard deviation of the change scores in the denominator.[68]
Guyatt’s responsiveness statistic, on the other hand, uses a different numerator. The numerator
is supposed to represent the smallest difference between baseline and follow-up scores that has a
meaningful benefit (such as a minimal clinically important difference (MCID)).[54,68] In the
absence of a proposed MCID, Guyatt et al suggested that a mean change can also be used.[54]
The Guyatt responsiveness statistic also differs from the other statistics because the denominator
is computed based on the error estimated from a sample of stable patients.[54,144]
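The three statistics differ only in their numerator and denominator, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than the analysis code used here: change is computed as baseline minus follow-up so that a decrease in disability is positive, and the denominator of Guyatt's statistic is taken as the standard deviation of change scores in a stable group. The function names are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Paired change scores; positive values mean less disability."""
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def guyatt_responsiveness(baseline, follow_up, stable_changes, mcid=None):
    """MCID (or mean change if no MCID is given) over the SD of change
    in stable participants (assumed form of the error term)."""
    numerator = mcid if mcid is not None else mean(
        change_scores(baseline, follow_up))
    return numerator / stdev(stable_changes)
```

Because the denominators differ, the three indices can rank the same instruments differently, which is one reason to report all of them.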
Some authors have suggested that the relative size of change for responsiveness statistics can be
categorized into small, medium and large effect size.[144] However, these categorizations are
only helpful for head-to-head comparisons between tools. The magnitude of the statistic can
depend on several other factors such as intervening time or intervention. We considered
evidence of responsiveness to be demonstrated based on our a priori hypotheses without reliance
on the magnitude of the statistic. We presented all three statistics because different indices of
responsiveness may provide different results or a different responsiveness rank order.[144]
We also hypothesized that strong positive associations >0.8 would exist between a report of
recovery (i.e. using the recovery question or the NRS) and the summative total WDQ change
score between baseline and six weeks. We hypothesized that those participants reporting
recovery would demonstrate the highest WDQ change scores over the six-week period (i.e.
highest decrease in disability scores). For the two domains (daily activities and emotional),
moderate positive associations >0.5 were expected within the participants reporting recovery.
Finally, we demonstrated responsiveness using the receiver operator characteristic (ROC) curve
approach.[34] We reported AUCs for dichotomous variables because they provide a measure of
the instrument’s ability to discriminate between participants who improved and those who have
not, based on the external anchor.[33] Change thresholds in WDQ scores were plotted against
their ability to discriminate between participants who recovered and those who have not (using
change in neck pain NRS and the recovery question as external anchors). Sensitivity and
1-specificity were plotted for each change threshold and the area under the curve was calculated.
De Vet et al have suggested that an area under the curve (AUC) of at least 0.7 is suggestive of
good discriminative ability, and hence good responsiveness when a reasonable external anchor
(criterion indicator) is used.[33] While this threshold is reasonable for the overall WDQ and the
physical domain, we hypothesized that the emotional domain will demonstrate change over six
weeks that is less than the threshold proposed by De Vet et al (i.e. AUC = 0.6).
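The ROC approach can be sketched in Python as follows. This is a minimal illustration, not the software routine used in the study. It assumes the AUC is computed in its pairwise (Mann-Whitney) form, which equals the area under the empirical ROC curve, and that a larger positive change score means greater improvement. The function names are hypothetical.

```python
def roc_points(recovered_changes, not_recovered_changes):
    """(1 - specificity, sensitivity) at each candidate change threshold.

    A participant is classified as improved when change >= threshold.
    """
    thresholds = sorted(set(recovered_changes) | set(not_recovered_changes))
    points = []
    for t in thresholds:
        sensitivity = sum(c >= t for c in recovered_changes) / len(recovered_changes)
        false_positive_rate = sum(c >= t for c in not_recovered_changes) / len(not_recovered_changes)
        points.append((false_positive_rate, sensitivity))
    return points


def auc_from_change_scores(recovered_changes, not_recovered_changes):
    """Probability that a recovered participant has a larger change score
    than a non-recovered participant (ties count as one half)."""
    if not recovered_changes or not not_recovered_changes:
        raise ValueError("both groups must contain at least one participant")
    wins = 0.0
    for r in recovered_changes:
        for n in not_recovered_changes:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recovered_changes) * len(not_recovered_changes))
```

An AUC of 0.5 from this calculation indicates no discrimination between the anchor groups, and values of 0.7 or more meet the threshold cited above.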
4.2.4 Sample Size
The sample size required to adequately perform an exploratory factor analysis is at least 5-10
participants per variable being analyzed.[58,84] The WDQ has 13 items, which corresponds to 65-130
participants; we therefore considered a sample size of 130 participants to be adequate for analysis.
4.3 Results
4.3.1 Sample characteristics
We enrolled 130 participants with acute WAD. Of those, 91 (70%) were female and the mean age
was 42.1 (SD= 13.2) years [Table 4.1]. Participants were enrolled a mean of 6.5 days post-injury
(SD=4.9). Thirty-four (26%) participants had Grade I WAD, 95 (73%) had Grade II, and one
(0.8%) had Grade III. The majority of participants (87.8%) had education that was higher than
high school and almost half (47.7%) had an income of less than $50,000. Two participants
(1.5%) had lawyers involved in the claim. The mean WDQ score was 49.8 (SD=29.1) at
baseline (n=130). The NDI was completed by 125 participants with a mean score of 17.5
(SD=8.1) and 129 participants completed the Neck Bournemouth Questionnaire with a baseline
mean score of 30.6 (SD=16.4). The SF-36 was completed by all participants with mean
standardized scores of 62.4 (SD=26.0) for physical functioning and 68.7 (SD=21.5) for the
mental health domain. Finally, the CES-D was also completed by all participants with a mean
score of 20.2 (SD=7.0).
Seventy-eight percent (n=101) of participants responded to the six-week follow-up telephone
interview. However, one participant did not complete the WDQ. The mean follow-up
WDQ score was 32.0 (SD=31.2). At six weeks, 15 participants (14.9%) reported being
‘completely recovered’, 48 (47.5%) reported being ‘much improved’, 30 (29.7%) were ‘slightly
improved’, three (3%) reported ‘no change’ and five (5%) reported being worse.
Table 4.1: Baseline demographic characteristics of patients with acute whiplash associated
disorders.
Characteristic | N | Baseline
Female, no. (%) | 130 | 91 (70.0)
Age, years, mean (SD); range | 130 | 42.1 (13.2); 19.6-81.6
Time since injury, days, mean (SD); median; range | 130 | 6.5 (4.9); 5.00; 0-25
WAD grade | 130 |
  I, no. (%) | | 34 (26.2)
  II, no. (%) | | 95 (73.1)
  III, no. (%) | | 1 (0.8)
WDQ Total Score, mean (SD); median; range | 130 | 49.8 (29.1); 46; 2-119
Neck Disability Index, mean (SD); median; range | 125 | 17.5 (8.1); 16; 0-41
Neck Bournemouth Questionnaire, mean (SD); median; range | 129 | 30.6 (16.4); 28; 2-65
SF-36 Physical Functioning, mean (SD); median; range | 130 | 62.4 (26.0); 65; 0-100
SF-36 Mental Health, mean (SD); median; range | 130 | 68.7 (21.5); 75; 10-100
CES-D, mean (SD); median; range | 130 | 20.2 (7.0); 18; 8-42
Highest level of education, no. (%) | 130 |
  High school or less | | 17 (12.3)
  Post secondary or some university | | 43 (33.1)
  Technical school graduate | | 21 (16.2)
  University graduate | | 50 (38.5)
Income, no. (%) | 129 |
  $0-$49,999 | | 62 (47.7)
  $50,000-$59,999 | | 17 (13.1)
  $60,000-$79,999 | | 21 (16.2)
  $80,000+ | | 27 (20.8)
  Did not respond | | 2 (1.5)
Lawyer Involvement in the Claim (%) | 130 | 2 (1.5)
Pain Intensity, Mean (SD)*; Median | 130 |
  Neck | | 5.5 (2.0); 5.5
  Shoulder | | 4.6 (2.8); 5.0
  Low Back | | 3.9 (3.3); 4.0
  Headache | | 3.7 (3.2); 4.0
  Arm | | 2.2 (2.7); 0
4.3.2 Data completion
Responses to the WDQ demonstrated few missing values. The item with the highest number of
missing values was the sporting activities item (19 missing out of 130 at baseline and 14 out of
100 at follow up) [Table 4.2]. At baseline, 22 participants had one missing item and one
participant had 2 missing items. Distributions of individual items at baseline are provided in
Appendix 3. Table 4.2 and distributions in Appendix 3 demonstrate that the emotional items and
items related to social activities and driving or using public transportation may contribute less to
the overall WDQ score because they are more skewed toward no disability. The emotional
subscale and its items demonstrate a floor effect.
At follow up, 13 participants had one missing item and two participants had two missing items.
There were no participants with more than 2 missing items.
Table 4.2: Baseline means, medians and normality values of WDQ total score and individual
items
Variable | N | Mean | Std Dev | Median | % min | % max | Kurtosis | Skewness
Pain | 130 | 5.7 | 2.1 | 6.0 | 0 | 2 | -0.6 | -0.2
Personal care | 130 | 3.0 | 2.9 | 2.0 | 33 | 0 | -1.0 | 0.6
Work/home/study duties | 129 | 4.8 | 3.1 | 5.0 | 12 | 8 | -1.2 | 0.0
Driving/using public trans | 127 | 3.8 | 3.0 | 3.0 | 18 | 2 | -1.2 | 0.3
Sleep | 130 | 4.8 | 3.4 | 6.0 | 21 | 5 | -1.4 | -0.1
Fatigue/tiredness | 130 | 5.4 | 2.9 | 6.0 | 9 | 6 | -0.8 | -0.4
Social activities | 129 | 3.5 | 3.2 | 3.0 | 29 | 2 | -1.2 | 0.5
Sporting activities | 111 | 6.3 | 3.5 | 7.0 | 10 | 26 | -1.1 | -0.5
Non-sporting leisure activities | 130 | 3.1 | 3.0 | 2.5 | 33 | 2 | -0.9 | 0.6
Depression/sadness | 130 | 2.3 | 3.0 | 1.0 | 48 | 1 | -0.2 | 1.1
Anger | 130 | 2.2 | 3.0 | 0.0 | 54 | 2 | 0.1 | 1.1
Anxiety | 130 | 3.2 | 3.0 | 2.0 | 27 | 2 | -1.0 | 0.6
Concentration | 130 | 2.9 | 3.0 | 2.0 | 36 | 2 | -0.8 | 0.6
Summative WDQ score | 130 | 49.8 | 29.1 | 46.0 | 0 | 0 | -0.7 | 0.4
Daily activities subscale score | 130 | 39.2 | 21.0 | 38.0 | 0 | 0 | -0.8 | 0.2
Emotional subscale score | 130 | 10.6 | 10.2 | 7.0 | 47.5 | 0 | -0.2 | 0.9
4.3.3 Factor structure
The total WDQ score distribution satisfied the normal distribution assumption (skewness=0.4;
kurtosis=-0.7) (Figure 4.1). Most of the individual items were skewed toward less disability (i.e.
means and medians being mostly below 5/10) [Table 4.2]. Ninety-six percent of inter-item
correlations were above 0.3 and 97.4% were below 0.7. The Bartlett’s Test of Sphericity and
Kaiser-Meyer-Olkin measure supported that the inter-item correlations were adequate for factor
analysis with a chi-squared value of 116.3 (p<0.0001) and the KMO value of 0.92.
Figure 4.1: Total WDQ baseline distribution
EFA was performed on baseline data collected from 107 participants with complete WDQ scores
(i.e. no missing values on the correlation matrix). The proportion of variance explained, the
Scree plot and clinical interpretability suggested that two factors should be retained [Figure 4.2].
However, the model fit criteria suggested that a three-factor model was appropriate but the three-
factor model resulted in complex loading of the concentration item (WDQ13) [Table 4.3].
Therefore, the three-factor model failed to satisfy the clinical interpretability criteria that we set a
priori, which required a simple factor solution with no complex loading of items (i.e. no items
loading on more than one factor in the final solution).
[Scree plot image not reproduced: eigenvalues (y-axis) plotted against factor number (x-axis).]
Figure 4.2: Factor analysis scree plot
Using promax rotation, we determined that two factors should be retained with nine items
loading on the daily activities factor and four loading on the emotion factor [Table 4.4].
Specifically, depression/sadness, anger, anxiety and concentration items group together
conceptually and statistically to describe the emotional subscale of the WDQ [Table 4.4]. The
nine items loading on the other factor share a conceptual meaning of activities involved in daily
life along with pain and fatigue/tiredness, which can be considered closely associated with daily
life activities. Personal care, work/home/study duties, driving/using public transportation, sleep
and social, sporting and non-sporting leisure activities share the conceptual meaning of daily
activities. Statistically and theoretically, they group well with pain and fatigue/tiredness because
individuals with pain or fatigue often have activity limitations caused by the pain/fatigue, but
may also experience more pain/fatigue with activities. In a practical sense, it may often be difficult to
determine if pain/fatigue caused the activity limitation, or if the limitation is the cause of the
pain/fatigue. Statistically, these concepts group together with strong correlations to describe the
daily activities subscale of the WDQ [Table 4.4]. The correlation between the two
factors (daily activities, emotional) in the promax rotation was 0.63. This high correlation
suggests that emotions are associated with daily activity limitations in adults with acute WAD
and should not be ignored when measuring outcomes in research studies and in clinical practice.
Table 4.3: Model fit statistics for the models with different number of factors in the WDQ
# of Factors | LR χ² | df | P-value | RMSR | AIC | SBC | TLRC
One | 189.4 | 65 | <0.0001 | 0.082 | 70.4 | -103.3 | 0.82
Two | 109.2 | 53 | <0.0001 | 0.053 | 10.3 | -131.4 | 0.90
Three | 66.3 | 42 | 0.0098 | 0.037 | -12.9 | -125.2 | 0.94
* LR=Likelihood Ratio; df=degrees of freedom; RMSR=root mean square residual; AIC=Akaike information criteria; SBC=Schwarz Bayesian criteria; TLRC=Tucker and Lewis Reliability coefficient
We performed sensitivity analyses on the factor solution. First, we excluded the sporting item
since it was the item with the most missing values. The two-factor solution remained stable with
this exclusion (n=125). Second, we excluded the concentration item because it resulted in a
complex 3-factor solution by loading on two factors. The two-factor solution remained stable on
removal of the concentration item (n=107). Finally, we imputed ‘0’ for missing values as
suggested by the WDQ developers, and the imputation did not alter the two-factor solution
(n=130).
The Cronbach’s alpha coefficient was 0.93 overall (n=107), 0.92 for the activity limitation factor (n=107) and
0.88 for the emotion factor (n=130). Imputing zero for missing values resulted in adjusted
Cronbach’s alphas of 0.93, 0.91 and 0.88, respectively.
Table 4.4: Factor analysis of the WDQ: The 2-factor solution
Variable* | Factor Pattern: Daily Activities | Factor Pattern: Emotional | Factor Structure: Daily Activities | Factor Structure: Emotional
1 Pain | .61 | .11 | .68 | .49
2 Personal care | .69 | .03 | .71 | .47
3 Work/home/study duties | .85 | .00 | .85 | .54
4 Driving/using public transportation | .60 | .23 | .75 | .61
5 Sleep | .59 | .21 | .72 | .58
6 Fatigue/tiredness | .46 | .32 | .67 | .61
7 Social activities | .69 | .18 | .80 | .61
8 Sporting activities | .83 | -.17 | .72 | .35
9 Non-sporting leisure activities | .72 | .12 | .79 | .57
10 Depression/sadness | .20 | .72 | .65 | .84
11 Anger | -.04 | .67 | .38 | .64
12 Anxiety | -.09 | .94 | .50 | .88
13 Concentration | .28 | .56 | .63 | .73
4.3.4 Validity
Our analysis suggested that the WDQ has adequate construct validity. The strong Pearson’s
correlations hypothesized a priori were observed between the WDQ and the NDI,
Bournemouth questionnaire, SF-36 physical function and numerical pain rating scales (for the
neck, shoulder, mid and low back pain) [Table 4.5]. Moderate correlations (as theorized) were
found for the CES-D and the SF-36 mental health domain [Table 4.5]. The NRS scores for
abdomen, hand, leg, foot and face pain intensity and the SF-36 mental health and role emotional
subscales scores were not normally distributed. Therefore, these correlations were computed
using Spearman’s rank correlations. All other distributions were normal and their correlations
were reported as Pearson’s correlations.
Table 4.5: Results of construct validation (n=130). A priori expected Pearson correlations
between the WDQ, its subdomains and constructs shown (E) followed by observed/achieved
results (A).
Comparison measure | E/A | N | WDQ Overall | WDQ Domain daily activities | WDQ Domain emotional
Neck Bournemouth Questionnaire | E | | 0.6-0.8 | 0.6-0.8 | -
 | A | 129 | 0.89 | 0.86 |
SF-36 Physical function | E | | 0.4-0.6 | 0.6-0.8 | -
 | A | 130 | 0.72 | 0.74 |
SF-36 Role Physical | E | | 0.4-0.6 | 0.6-0.8 | -
 | A | 130 | 0.68 | 0.72 |
SF-36 Mental Health* | E | | 0.3-0.4 | - | 0.4-0.6
 | A | 130 | 0.58 | | 0.70
SF-36 Role emotional* | E | | 0.3-0.4 | - | 0.4-0.6
 | A | 130 | 0.57 | | 0.66
CES-D | E | | 0.2-0.4 | - | 0.6-0.8
 | A | 130 | 0.67 | | 0.73
NDI | E | | 0.7-0.9 | 0.6-0.8 | 0.4-0.6
 | A | 125 | 0.80 | 0.82 | 0.60
NRS upper limb | E | | 0.4-0.6 | 0.4-0.6 | 0.4-0.6
  NRS shoulder | A | 130 | 0.49 | 0.51 | 0.33
  NRS arm* | A | 130 | 0.42 | 0.37 | 0.40
  NRS hand* | A | 130 | 0.26 | 0.21 | 0.29
NRS lower limb | E | | 0.4-0.6 | 0.4-0.6 | 0.4-0.6
  NRS leg* | A | 130 | 0.22 | 0.23 | 0.15
  NRS foot* | A | 130 | 0.24 | 0.26 | 0.21
NRS trunk | E | | 0.3-0.5 | 0.3-0.5 | 0.3-0.5
  NRS midback | A | 130 | 0.48 | 0.46 | 0.42
  NRS low back pain | A | 130 | 0.40 | 0.37 | 0.35
  NRS abdominal* | A | 130 | 0.30 | 0.29 | 0.27
NRS neck/head | E | | 0.6-0.8 | 0.6-0.8 | 0.6-0.8
  NRS neck | A | 130 | 0.64 | 0.65 | 0.48
  NRS head* | A | 130 | 0.44 | 0.45 | 0.36
All correlations significant at p<0.05 (except between leg pain and emotional domain (p=0.097))
E=expected; A=achieved
* Spearman rank correlations reported due to distributions skewed toward a value of zero
4.3.5 Responsiveness
The effect size, Guyatt’s responsiveness statistic and SRM for recovered participants (n=62)
demonstrated change in WDQ scores at 6 weeks for the overall WDQ and daily activities
subscale, but less for the emotional domain [Table 4.6]. As described in the methods, the exact
values of the responsiveness statistics should not be used to demonstrate severity of change since
arbitrary cutoff points are not useful. However, all responsiveness statistics demonstrate an
improvement. The mean change in scores also reflects that improvement in the overall WDQ
score and each of the subscales over 6 weeks [Table 4.6].
Table 4.6: Effect size, Guyatt’s responsiveness statistic (RS) and standardized response mean
(SRM) for participants reporting recovery on the global recovery question (N=62)
Variable | N | Mean | SD | SD** | Effect Size | RS | SRM
Baseline total WDQ score | 101 | 47.61 | 29.34 | | | |
Follow-up total WDQ score | 100 | 31.95 | 31.15 | | | |
Baseline daily activities score | 101 | 37.25 | 21.18 | | | |
Follow-up daily activities score | 100 | 23.99 | 22.83 | | | |
Baseline emotional score | 101 | 10.37 | 10.26 | | | |
Follow-up emotional score | 100 | 7.96 | 9.86 | | | |
Change in total WDQ score | 100 | 15.99 | 23.12 | 19.76 | 0.78 | 1.16 | 1.04
Change in daily activities score | 100 | 13.48 | 17.9 | 21.13 | 0.92 | 0.92 | 1.14
Change in emotional score | 100 | 2.51 | 7.52 | 6.08 | 0.34 | 0.57 | 0.49
SD = standard deviation; SD** = standard deviation of the group reporting no change
The total WDQ and daily activities subscale had moderate correlations with the global recovery
question at six weeks and with the change in NRS neck pain over the six-week follow-up [Table
4.7]. For the same comparisons, the emotional subscale demonstrated poor correlations. The
total WDQ score and its subscales did not reach a priori hypothesized correlations with the
7-category global recovery question or the change in 11-category NRS neck pain continuous
scores [Table 4.7]. A priori hypothesized correlations between the global recovery question at
six weeks, a change in pain intensity and the change in WDQ over six weeks were not achieved
for the dichotomized variables [Table 4.7]. However, AUCs for the total summative WDQ
change scores and the daily activities domain demonstrated a value over 0.7 for both the
recovery question and the change in NRS neck pain of three or more points, thus demonstrating
adequate responsiveness [Table 4.7]. The emotional domain AUC was below the hypothesized
0.6 for all comparisons.
Table 4.7: Spearman’s rank correlations and AUCs for responsiveness based on the a priori
hypotheses
External anchor comparison | Statistic | N | WDQ Overall | WDQ Domain daily activities | WDQ Domain emotional
Continuous variable comparisons
Global recovery: 7-category recovery question at 6 weeks | Corr | 100 | -0.41; p<0.0001 | -0.41; p<0.0001 | -0.19; p=0.057
Change in neck pain: Change in 11-point NRS for neck pain | Corr | 100 | 0.54; p<0.0001 | 0.54; p<0.0001 | 0.27; p=0.007
Dichotomized external anchor comparisons
Recovery status: Completely recovered vs other responses of the recovery question | Corr# | 100 | 0.29 | 0.30 | 0.17
 | AUC | Recovered n=15 | * | 0.68 | 0.50
Recovery status: Completely better and Much Improved vs other responses of recovery question | Corr# | 100 | 0.52 | 0.57 | 0.24
 | AUC | Recovered n=62 | 0.73 | 0.75 | 0.50
Pain severity: NRS neck pain change of 3 or more | Corr# | 100 | 0.50 | 0.54 | 0.26
 | AUC | Recovered n=47 | 0.73 | 0.76 | 0.60
*unable to construct; # Biserial Correlation
4.4 Discussion
We found that in adults with acute WAD, the WDQ consists of two factors or subscales: daily
activities and emotional. Our sensitivity analyses demonstrated that the factor structure is stable
when we excluded the item with the most missing values (i.e. sporting activities) or the item that
had a complex loading (i.e. concentration item) in a three-factor model. Our two-factor model
was not affected by missing values. The total summative score and the two subscales of the
WDQ had good internal consistency. Our analysis also demonstrated that the WDQ had good
construct validity. Specifically, strong correlations were found between the total WDQ score and
the NDI, the Bournemouth questionnaire, the SF-36 physical function and the numerical pain
rating scales (for the neck, shoulder, mid and low back pain). Moderate correlations were also
demonstrated with the CES-D and the SF-36 mental health domain. Finally, the confirmation of our a
priori mini-theories supports the responsiveness of the overall summative score and the daily activities
subscale of the WDQ. However, responsiveness was not established for the
emotional subscale of the WDQ.
Previous work by Pinfold et al. found that the WDQ had only one factor in patients with chronic
WAD.[99] While it is possible that the WDQ has different factor structures in acute and chronic
populations, this finding may also be attributable to the analytical methods used to derive the
factors. Pinfold et al. used principal component analysis (PCA) instead of factor analysis.
Statistically, the two methods differ since the goal of PCA is to account for the total variance in
the sample.[40,50,58] PCA does not differentiate between the common and unique variance and
defines principal components as linear combinations of measured variables. In contrast, the
common factor model used in factor analysis assumes that total variance can be divided into
common and unique variance of each variable. It uses only common variance in determining the
number of factors to retain, and the measured variables are linear composites of estimated latent
factors in EFA. This can lead to different results because PCA uses total variance to estimate the
number of factors to extract from the solution, while EFA uses only the common variance of
each variable (i.e., a portion of the total variance that is shared among variables).[40,50,58]
Furthermore, PCA is mainly recommended as a data reduction technique, whereas factor
analysis is appropriate for both data reduction and for determining the factor structure of an
instrument.[50,58] Therefore, we chose to use EFA instead of PCA.
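To make the distinction between total and common variance concrete, the following minimal Python sketch partitions the variance of standardized items into common and unique parts under a common factor model with orthogonal factors. The loadings, item names and function name are hypothetical and chosen for illustration only; they are not the WDQ loadings estimated in this study.

```python
# Hypothetical loadings of four standardized items on two orthogonal factors.
# These numbers are illustrative only and are NOT results from this thesis.
LOADINGS = {
    "item_a": (0.80, 0.10),
    "item_b": (0.70, 0.20),
    "item_c": (0.15, 0.75),
    "item_d": (0.40, 0.45),  # loads on both factors (a "complex" loading)
}


def variance_partition(loadings):
    """Return {item: (communality, uniqueness)} for standardized items.

    Under the common factor model with orthogonal factors, the communality
    is the sum of squared loadings and the uniqueness is the remainder of
    the item's unit variance.
    """
    result = {}
    for item, lams in loadings.items():
        communality = sum(lam ** 2 for lam in lams)
        result[item] = (communality, 1.0 - communality)
    return result


partition = variance_partition(LOADINGS)
for item, (common, unique) in partition.items():
    print(f"{item}: common={common:.3f}, unique={unique:.3f}")

# EFA works with the common part only; PCA works with the total variance
# (1.0 per standardized item, so 4.0 here).
total_common = sum(common for common, _ in partition.values())
total_variance = float(len(LOADINGS))
print(f"common variance = {total_common:.3f} of {total_variance:.1f}")
```

In this sketch, only the common portion would inform an EFA solution, while a PCA would attempt to account for the full variance of every item, which is one reason the two methods can retain different numbers of factors.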
Compared to our results, a higher internal consistency (alpha=0.96) for the total summative scale
was previously reported in chronic WAD. This finding may be attributable to the homogeneity
of the chronic population.[99] However, the authors did not report a high inter-item correlation
(>0.85), which would have suggested item redundancy.[99,120,122] Based on our results, the
WDQ has two subscales each with adequate internal consistency and no item redundancy. The
two-factor WDQ also demonstrated adequate overall internal consistency in this sample of acute
WAD.
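The two quantities discussed above, internal consistency and item redundancy, can be sketched as follows. This is a minimal illustration using the standard formula for Cronbach's alpha and a Pearson inter-item correlation check against the 0.85 threshold mentioned in the text. The response data are hypothetical, the function names are assumptions, and the code is not the analysis script used in this thesis.

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def redundant_pairs(items, threshold=0.85):
    """Index pairs of items whose inter-item correlation exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(items)), 2)
        if pearson(items[i], items[j]) > threshold
    ]


# Hypothetical 0-10 responses from six respondents on three items.
ITEMS = [
    [2, 4, 5, 7, 8, 9],
    [1, 5, 4, 6, 9, 8],
    [3, 3, 6, 5, 7, 10],
]
alpha = cronbach_alpha(ITEMS)
print(f"alpha = {alpha:.2f}")
print("possibly redundant pairs:", redundant_pairs(ITEMS))
```

A very high alpha combined with several flagged pairs would point towards item redundancy rather than towards a better scale.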
Our results demonstrated construct validity in participants with acute WAD by confirming our a
priori mini-theory correlations between the WDQ and other relevant constructs. Strong
correlations with other physical ability and pain constructs as well as moderate correlations with
the emotional constructs suggest that the WDQ is valid for capturing the full spectrum of
symptoms relevant to WAD. To our knowledge, the construct validity of the WDQ has not been
previously assessed. Face validity was previously found to be reasonable in chronic WAD by a
medical committee panel (i.e. practitioners involved in musculoskeletal rehabilitation, clinical
psychology and psychiatry).[99] No changes to the questionnaire were requested by this
multidisciplinary panel for the population with chronic WAD during the development of the
WDQ.[99]
Our responsiveness results suggest that the total WDQ and the daily activities subscale
demonstrate change over 6 weeks, but that the emotional domain is not responsive in acute
WAD. The lack of responsiveness in the emotional domain may be due to a lack of change in
emotion (i.e. depression, anxiety) over a 6-week period. With a floor effect of almost 50%, the
emotional subscale lacks the range of symptom severity needed to demonstrate change, since
almost half of this sample reported no emotional disability from the onset. A previous study
reported an adequate correlation between the total WDQ score and a transition recovery question
over one month in chronic WAD (Spearman’s r=0.67).[140] Our lower correlations in acute
WAD were potentially due to the external anchor which had fewer response options. Our global
recovery question was a 7-response option Likert scale while Willis et al’s question was an 11-
category Likert scale. Although a 7-response option Likert scale is often considered adequately
continuous for research purposes, an 11-category scale offers more response points and therefore
behaves more like a continuous variable in statistical analyses. For dichotomous variables, we reported AUCs
because they provide a measure of the instrument’s ability to discriminate between participants
who improved and those who have not, based on the external anchor.[33] Our results
demonstrate satisfactory AUCs (AUC>0.7) for the total summative WDQ score and the daily
activities, but not the emotional subscale. To our knowledge, there are no other reports of AUCs
for WDQ responsiveness. Willis et al also reported the effect size (ES=0.02) and the SRM
(SRM=0.05) demonstrating that there was minimal change in WAD disability in their overall
study sample over one month (n=52).[140] We assessed similar responsiveness statistics to
Willis et al. for those reporting improvement on our recovery question using the recovery status
definition (i.e. the top 2 recovery response options) in acute WAD (n=62). In contrast to Willis
et al, we found that all three responsiveness statistics demonstrated change over six weeks in
acute WAD for those reporting change. However, their assessment was performed in a chronic
population and no change was expected.[140] Responsiveness statistics were not assessed for
the participants reporting worsening since deterioration was not a frequent occurrence (n=8) as
expected. Caution should be used in demonstrating responsiveness on the WDQ because a
change of one sixth of the scale was needed to demonstrate significant change.
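The responsiveness statistics discussed above can be illustrated with a short Python sketch. It uses commonly cited textbook definitions: the AUC as the probability that an improved participant has a larger change score than a non-improved one, the effect size (ES) as mean change over the baseline standard deviation, and the standardized response mean (SRM) as mean change over the standard deviation of change. These definitions are an assumption and may differ in detail from the formulas applied in this thesis; all data and function names are hypothetical.

```python
from statistics import mean, stdev


def auc(improved, not_improved):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    Probability that a randomly chosen improved participant has a larger
    change score than a randomly chosen non-improved one; ties count 0.5.
    """
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical change scores (baseline minus follow-up, so positive = improved),
# grouped by a dichotomized external anchor. Not data from this thesis.
improved_change = [30, 25, 40, 18, 22]
stable_change = [5, 20, 10, 25, 0]
print(f"AUC = {auc(improved_change, stable_change):.2f}")

# Hypothetical baseline and follow-up scores for participants reporting improvement.
baseline = [60, 50, 70, 40]
follow_up = [40, 35, 45, 30]
print(f"ES  = {effect_size(baseline, follow_up):.2f}")
print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
```

An AUC of 0.5 corresponds to no discrimination between improved and non-improved participants, which is why values above 0.7 are treated as satisfactory in the text.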
Our study had several strengths. We had a large sample of participants with acute WAD
recruited within 21 days of their injury. We also had few missing data and a follow-up rate of
almost 80%. However, our results may be limited by several factors. Our EFA demonstrated two
factors. It is possible that a different set of decision criteria to determine a factor structure may
have led to different results (e.g., a minimum factor loading of 0.35 instead of 0.4 may result in
a different number of factors extracted from the factor solution). However, our decisions on
factor structure were based on a priori determined criteria, and we applied standards that are
commonly used and recommended in the field in terms of such criteria.[58] Missing values may
also lead to different factor structures. However, we performed sensitivity analyses and the factor
structure remained stable. There is no criterion for measuring WAD disability. Therefore, we
used a priori hypotheses to determine construct validity, which is the acceptable method of
assessing validity.[33] It is possible that other instruments measuring constructs relevant to
WAD disability (e.g., Hamilton Rating Scale for Depression, or the Northwick Neck Pain
Questionnaire) may provide different results; however, we used validated and common self-report
outcome measures for WAD that are representative of the relevant constructs. There is
controversy on how responsiveness should be assessed and determined.[33] We reported
correlations, AUCs and various responsiveness statistics using a priori hypotheses in order to
confirm changes in WAD disability over six weeks experienced by our participants (and
expressed through external anchors of change). Although the total WDQ and daily activities
subscale were responsive, the total WDQ still required a change of at least one sixth of the scale
over six weeks to demonstrate change beyond the daily variability of individuals reporting no
change. Therefore, clinicians and researchers should be cautious in using the total WDQ change
scores below 22 points.
We recommend that the daily activities subscale and the total summative WDQ be used.
However, the emotional subscale on its own is not responsive to change in participants with
acute WAD and should be used with caution until it undergoes more rigorous testing in a subset
of people expected to have more change in emotional functioning than our sample had. Future
research should address the lack of responsiveness of the emotional subscale and suggest
potential modifications that could improve the response to emotional factors in WAD disability.
4.5 Conclusion
Our results demonstrate that the WDQ has two factors (daily activities and emotional) and
adequate construct validity. While the WDQ demonstrated responsiveness to recovery over six
weeks, a change of almost one sixth of the WDQ (MDC=22 points) is required to demonstrate
change beyond the daily variability of individuals reporting no change. Furthermore, the
emotional subscale was not able to detect change during the first six weeks of the acute phase of
WAD. The WDQ can, therefore, be used in clinical settings to determine disability status and to
demonstrate change over time. However, only the overall score and the daily activities subscale
should be used to demonstrate change over time since the emotional subscale was not responsive
in this sample.
4.6 Acknowledgement
This study was funded by an industry grant from AVIVA Canada Incorporated to the University
Health Network for the UHN Whiplash Intervention Trial. Maja Stupar was funded by a Vanier
Canada Scholar Canadian Institutes of Health Research award. The authors declare no conflicts
of interest.
Chapter 5: Discussion
5.1 Context and summary of the thesis
The primary focus of this thesis was to investigate the measurement properties of a disability
outcome measure, the Whiplash Disability Questionnaire (WDQ), in a population of adults with
acute WAD. Specifically, I used classical test theory to determine the test-retest reliability,
construct validity, factor structure and responsiveness of the WDQ.[33] This is the first
evaluation of the WDQ in injured adults within 21 days of their motor vehicle collision. The
WDQ had only been validated in patients with chronic WAD; its applicability to adults with
acute WAD remained unknown. It is important to validate this instrument for use in acute
injuries because accurate measurement of outcomes early after a traffic collision and during
treatment may assist in preventing chronicity.
The WDQ is a comprehensive instrument compared to other outcome measures currently used in
clinical research and clinical practice to monitor individuals with WAD.[63] A comprehensive
instrument is preferable because it has better content validity than a less comprehensive
instrument.[125] Proper evaluation of disability throughout the course of whiplash injuries is
necessary to study treatment effectiveness and prevent chronic disability.
Different methodologies are available to assess the measurement properties of outcome
measures. A close look at these methodologies (and at their differences) led to the conceptual
paper (Chapter 2) that focused on resolving the debate between clinimetrics and psychometrics.
I found that differences were more prevalent for the development of outcome measures while the
evaluation involved similar methods for both clinimetrics and psychometrics.
When examining the measurement properties of an instrument, it is important to first establish
the instrument’s reliability and then its validity and responsiveness.[78] The WDQ was
developed with an evaluative purpose meaning that it was developed to measure the magnitude
of change in symptoms/disability over time.[78] I found that the WDQ is reliable in acute
whiplash injuries (Chapter 3) which led me to assess its factor structure, construct validity and
responsiveness. The exploratory factor analysis indicated that WDQ has two factors: daily
activities and emotional subscales (Chapter 4). Therefore, the construct validity and
responsiveness were assessed using both the total summative scale and the subscales. The WDQ
had reasonable construct validity for the total summative and subscale scores but the emotional
subscale did not demonstrate adequate responsiveness (Chapter 4). Finally, the reliability of the
subscales was evaluated a posteriori and was found to be adequate. In the 66 participants
reporting minimal to no change in their whiplash symptoms over 3-5 days, the intra-class
correlation coefficient (ICC) for the daily activities subscale was 0.85 (95% CI 0.80-0.90) and
0.87 (95% CI 0.82-0.91) for the emotional subscale. The results of subscale reliability complete
the validation of the WDQ in acute WAD and link the results of Chapters 3 and 4.
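For readers unfamiliar with the ICC, the following Python sketch shows one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from the mean squares of a two-way analysis of variance. The ICC form used in this thesis is not restated in this section, so the choice of ICC(2,1) is an assumption; the function name and the test-retest scores are hypothetical.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of rows, one per participant, each holding the
    scores from the k administrations (here test and retest).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]

    # Sums of squares for participants (rows), occasions (columns) and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest subscale scores for five stable participants.
test_retest = [[12, 14], [30, 27], [45, 48], [8, 10], [22, 21]]
print(f"ICC(2,1) = {icc_2_1(test_retest):.2f}")
```

Because this form penalizes systematic shifts between administrations, it is suited to test-retest designs in which participants are expected to remain stable.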
5.2 Contribution of the research to the whiplash literature
The construct of disability is difficult to measure in WAD patients because it cannot be measured
with biological or patho-physiological measures. Instead, this construct is mostly measured
using self-reported outcome measures that can be perceived to contain bias due to the subjective
nature of the reporting of the outcome. In order to standardize the definition of disability, the
World Health Organization (WHO) developed the International Classification of Functioning,
Disability and Health (ICF).[121,142] According to this classification, disability ‘serves as an
umbrella term for impairments, activity limitations and participation restriction’ and this includes
environmental and personal factors that can interact with these constructs.[142] Therefore, self-reported
outcome measures should be multi-faceted to capture all domains relevant to the definition of
disability as proposed by the ICF. However, in the WAD literature, the construct of recovery is
often equated to the absence of neck pain. This is problematic because WAD is a much broader
construct than neck pain.[59,134] Furthermore, WAD recovery lacks a standardized definition
which partly contributes to reporting of varying recovery rates in research. The WDQ was
recently identified as the most rigorously-developed whiplash-specific disability questionnaire
currently available to monitor patients because of its comprehensive scope.[134] Developers of
the WDQ combined the use of the ICF, expert opinion and a problem elicitation technique to
interview WAD patients to identify items for inclusion in the WDQ.[99,134]
My thesis provides validation of this promising tool. Its use may improve the measurement of
WAD disability, which will help standardize the measurement of disability in future research.
Standardizing the measurement of disability in acute WAD is important to document the impact
of whiplash injuries on patients’ activities. This measurement is also important to study the
clinical course of WAD and identify those who may be at risk of developing chronic disability.
While my thesis was underway, the COSMIN research group published a consensus document
on measurement terminology and proposed a list of evaluation criteria for the critical appraisal of
measurement studies.[125] Included in the criteria are items on adequate sample size, reporting
and handling of missing data that are relevant to all studies assessing measurement properties.
The criteria also included property specific items such as adequate follow-up period for test-
retest reliability and setting appropriate a priori hypotheses for construct validity and
responsiveness (the complete COSMIN criteria checklist is available online at
http://www.cosmin.nl/the-cosmin-checklist_8_5.html). It is important to note that the COSMIN
criteria have not yet been evaluated. However, the methods used in my thesis satisfy most of the
criteria suggested by this group of clinimetricians (Appendix 4).
Development of standardized criteria by the COSMIN group may lead to standardization of
methods and improvement of the quality of published research in the measurement field.
However, some of their criteria rating levels are arbitrary and may need to be modified during
the validation process. For example, the COSMIN group used a rule of thumb on sample size for rating the
quality of a study (Appendix 4).[125] This arbitrary criterion rating level does not take into
account that sample sizes vary according to the research questions and the parameters to be
estimated.[11,133]
While developing the evaluative criteria, the COSMIN research group also performed a literature
review of neck pain and disability measures, applied their criteria, and found that methodology
and reporting of literature on this topic could be improved in several areas.[126] Specifically,
Terwee et al reported that the most important methodological aspects that need improvement
included assessing unidimensionality in internal consistency analysis and using stable patients
and similar test conditions in studies on reliability and measurement error. Furthermore, they
suggested that more emphasis should be placed on the relevance and comprehensiveness of the
items in content validity studies and that construct validity and responsiveness studies should be
based on predefined hypotheses.[126] Therefore, this thesis has contributed methodologically
strong results on the measurement properties of the WDQ for acute injuries. Consequently, the
WDQ can be used in clinical and research settings as a more comprehensive condition-specific
outcome measure in whiplash injuries compared to the current commonly used neck-specific
measures. Although the responsiveness of the emotional subscale needs to be improved, the
WDQ has adequate measurement properties for use in clinical and research settings. Moreover,
its use will decrease the burden on patients and research participants by providing one short
instrument to assess a construct that would otherwise require multiple instruments.
5.3 Implications of the research
The results of this thesis have implications for several stakeholders including clinicians,
researchers, health policy makers, insurers and most importantly patients. The main implication
is that the WDQ, a comprehensive, condition-specific outcome measure, has measurement
properties that support its use in patients with acute WAD. This complements already published
information on its validity in chronic WAD. A questionnaire that is valid in different stages of a
disorder can be used to demonstrate significant changes in disability status over the full duration
of the condition and is not limited to just cases that have become chronic. In turn, the WDQ can
be helpful in studying how to prevent chronic WAD disability by assisting with the identification
of effective treatments.
Clinicians and researchers must be cautious about the responsiveness of the WDQ in acute WAD
patients, specifically when using the emotional subscale. We demonstrated responsiveness using
within-person and between-person scores for those reporting change using an external anchor.
This makes the responsiveness results relevant clinically (where within-person change is
assessed) and in research or for public health purposes (where between-person change is often
relevant such as differences between recovered and unrecovered groups). Furthermore, the
WDQ is reliable for evaluative purposes in research settings (i.e. adequate ICC for group-level
analysis) and in clinical settings (i.e. MDC of 22 points for changes in individual patients).
Aside from the limitations within the emotional subscale responsiveness, the WDQ is reliable,
valid and responsive for use in research and in clinical settings.
Validated instruments are necessary for accurate measurement of treatment effectiveness in
research and clinically. Without validated outcome measures, research results may be biased and
this bias can have a major impact on health policy as inaccurate research results can translate into
flawed health policies. Research results and health policy both influence the administration of
insurer coverage for injuries following motor vehicle collisions. Therefore, validated
instruments are the necessary building blocks for effective treatment of patients with WAD
because they help develop sound research results, effective health policy and effective insurance
policy.
5.4 Future research
The results of this thesis have led to several questions that should be answered in future research
studies. These include questions resulting from both the quantitative assessment of the WDQ
measurement properties and the conceptual measurement paper.
5.4.1 Content validity using qualitative methods
The item with the most missing values inquired about ‘sporting activities’. The relevance of this
item needs to be established in patients with acute WAD. For example, a participant with an
acute injury may not have had time to attempt a sporting activity, which may lead to missing values
because of the time of administration. The meaning of the sporting activity item needs to be
reconsidered. In Australia, where the WDQ was developed, the definition of sport participation
broadly includes participation in organized sport and non-organized sport plus physical
activities.[69] In Canada, sport is defined as an organized physical activity such as aerobics,
walking clubs or baseball.[12] Walking to work or bicycling for leisure would not be included
as a sporting activity. Survey studies have shown that 49% of Canadian adults over the age of 20
walk at least 30 minutes per day but only 34% of them report that they participate in sport.[12]
Therefore, a Canadian participant may answer that the sporting activity item is not applicable
simply because they have not attempted an organized physical activity instead of answering how
it affects their daily physical activity. While there may not have been time to attempt
participation in organized sports in acute WAD, participation in everyday physical activities
would likely be attempted. However, these participants may not answer the sporting activity
item based on the wording of the item. While this may differ in other countries or regions, we
suspect that this item had the most missing values in our study because information on daily
activities was not considered when answering the item.
The sporting activity item should be modified, removed or substituted and the modified WDQ tested
both qualitatively to determine the appropriateness of the item and quantitatively to determine if
the measurement properties continue to hold or improve with the modified WDQ compared to
the original.
5.4.2 Minimizing measurement error
The WDQ requires a minimal change of 22 points before we can consider real change occurring
beyond the normal variations of a stable individual with WAD. This is a change of
approximately one sixth of the total score. This minimal detectable change coincides with the
minimal clinically important change and change beyond normal variations can therefore be
considered important as well. However, future research should examine how to decrease the
measurement error of the WDQ so that smaller changes in WAD disability can be detected.
There may be two potential ways to reduce measurement error: 1. modify the WDQ, and 2.
modify WDQ scoring. One approach may be to examine the items of the WDQ
conceptually and determine if the sporting item (with the most missing values) and concentration
item (with complex loading in the three-factor model) should be modified or substituted with
other items or if more life participation items should be added. Another method may be to
examine the scoring of the WDQ. The WDQ may need to have weighted scoring or the
individual item scoring may need to be modified if a shorter item scale is found to be more
sensitive.
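The relationship between reliability, measurement error and the minimal detectable change discussed in this section can be sketched in Python. The sketch assumes the commonly used formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM, and it assumes a total WDQ score range of 0 to 130; neither assumption is restated in this section, so both are labelled in the code. Only the 22-point MDC comes from the text; the standard deviation and ICC values in the loop are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability (ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# Reported in the text: an MDC of 22 points on the total WDQ.
REPORTED_MDC = 22.0
# ASSUMPTION: the total WDQ score ranges from 0 to 130.
ASSUMED_SCALE_MAX = 130.0
print(f"MDC as a fraction of the scale: {REPORTED_MDC / ASSUMED_SCALE_MAX:.3f}")

# ASSUMPTION: the reported MDC is an MDC95. Working backwards gives the
# SEM that such an MDC would imply.
implied_sem = REPORTED_MDC / (1.96 * math.sqrt(2.0))
print(f"implied SEM: {implied_sem:.1f} points")

# Hypothetical illustration: for a fixed SD of 25 points, a higher ICC
# yields a smaller MDC, so smaller changes become detectable.
for icc in (0.85, 0.90, 0.95):
    print(icc, round(mdc95(sem_from_icc(sd=25.0, icc=icc)), 1))
```

Under these assumptions, reducing measurement error (that is, raising reliability) is the direct route to a smaller MDC, which is the rationale for the modifications proposed above.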
5.4.3 Longitudinal and structural construct validity
Our sample included 130 participants who were assessed within 21 days of their motor vehicle
collision. Although this sample size is adequate for the assessment of WDQ measurement
properties, a larger sample may provide more accurate answers. Specifically, confirmatory factor
analysis is needed to confirm the 2-factor structure of the WDQ. Our sample was too small to be
split into two smaller sub-samples to perform exploratory and confirmatory factor analysis.
However, the UHN Whiplash Intervention Trial (WIT) which had similar inclusion and
exclusion criteria to this thesis’ cohort study and which shares a significant proportion of
participants can be used to perform such an analysis.[26] The UHN WIT can provide WDQ
baseline data on 340 participants. This sample size would be adequate for splitting to perform
both a CFA and another EFA to determine if issues encountered in our study such as
missing values for the sporting activities item or complex loading of the concentration item in
the factor analysis would be relevant in a larger population or if a larger population could
potentially yield a 3-factor structure that would reflect domains of the ICF more closely.
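The split-sample approach described above could be set up as in the following minimal Python sketch, which randomly divides participant identifiers into an exploratory half and a confirmatory half. The function name, the seed and the integer identifiers are hypothetical; only the sample size of 340 comes from the text.

```python
import random


def split_sample(participant_ids, seed=2013):
    """Randomly split IDs into two halves: one for EFA, one for CFA."""
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# 340 is the number of participants with baseline WDQ data cited in the text;
# the integer IDs themselves are placeholders.
efa_ids, cfa_ids = split_sample(range(1, 341))
print(len(efa_ids), len(cfa_ids))
```

Fixing the seed keeps the two halves identical across reruns, so the factor structure explored in one half is always confirmed in the same independent half.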
The larger sample size would also be useful for retesting the emotional subscale responsiveness
over 6 weeks. However, the emotional subscale may need to be refined with modification or
inclusion of more emotional items to improve responsiveness over time. Modifications to the
WDQ would have to be tested in a separate study since the UHN WIT will only have the
original, unmodified WDQ.
5.4.4 Predictive validity
While the WDQ was developed with an evaluative purpose, it may have predictive properties
that could be helpful clinically in preventing chronic symptoms. The predictive utility of the
WDQ should be evaluated for identifying patients at risk for developing chronic WAD in a
similar population to determine if specific items, subscales or the total WDQ score can predict
cases that may require modifications in management. If the WDQ (including its items or
subscales) has good discriminant validity in predicting cases that recover, then the instrument
would be useful to identify patients at risk of developing chronic disability. Furthermore, the
WDQ should be tested for inclusion in a clinical prediction rule that includes other prognostic
factors such as demographic or clinical factors.
5.4.5 Direct comparison with other relevant instruments
The measurement properties of the WDQ should be directly compared to other outcome
measures used in WAD to determine if the WDQ outperforms other instruments. The sample
size should be large enough to do sub-analyses based on chronicity since measurement properties
can differ based on time of administration.
5.4.6 Applicability of the conceptual framework
We have developed a conceptual framework to assist in the use of instruments relevant to
measuring clinical outcomes. Our framework provides a basis for differences in development
and consistency in evaluation of measurement properties between clinimetrics and
psychometrics. Once the framework is applied, it will be tested in terms of its applicability to
the field of measurement. The application of the framework will help determine if it needs
further modifications.
Other measurement fields may follow a similar pattern of conceptual similarities and differences
with clinimetrics and psychometrics but, to our knowledge, this has not been studied. Future
research should determine if other measurement fields that could be relevant to clinical
measurement, such as biometrics, can fit into a similar framework or if the conceptual
differences may demonstrate a different development and evaluation framework pattern.
References
[1] Angst F, Aeschlimann A, Steiner W, Stucki G. Responsiveness of the WOMAC
osteoarthritis index as compared with the SF-36 in patients with osteoarthritis of the legs
undergoing a comprehensive rehabilitation intervention. Ann Rheum Dis 2001;60:834-840.
[2] Apgar V. A proposal for a new method of evaluation of the newborn infant. Curr Res Anesth
Analg 1953;32:260-267.
[3] Arksey H, O'Malley L. Scoping Studies: Towards a Methodological Framework.
International Journal of Social Research Methodology: Theory & Practice 2005;8:19-32.
[4] Armitage P, David HA. Advances in biometry: 50 years of the International Biometric
Society. New York: Wiley, 1996.
[5] Beaton DE, Tarasuk V, Katz JN, Wright JG, Bombardier C. "Are you better?" A qualitative
study of the meaning of recovery. Arthritis Rheum 2001;45:270-279.
[6] Beaton DE, Wright JG, Katz JN, Upper Extremity Collaborative Group. Development of the
QuickDASH: comparison of three item-reduction approaches. J Bone Joint Surg Am
2005;87:1038-1046.
[7] Berthier F, Potel G, Leconte P, Touze MD, Baron D. Comparative study of methods of
measuring acute pain intensity in an ED. Am J Emerg Med 1998;16:132-136.
[8] Bijur PE, Latimer CT, Gallagher EJ. Validation of a verbally administered numerical rating
scale of acute pain for use in the emergency department. Acad Emerg Med 2003;10:390-392.
[9] Bolton JE. Sensitivity and specificity of outcome measures in patients with neck pain:
detecting clinically significant improvement. Spine 2004;29:2410-2417.
[10] Bolton JE, Humphreys BK. The Bournemouth Questionnaire: a short-form comprehensive
outcome measure. II. Psychometric properties in neck pain patients. J Manipulative Physiol Ther
2002;25:141-148.
[11] Bonett DG. Sample size requirements for estimating intraclass correlations with desired
precision. Stat Med 2002;21:1331-1335.
[12] Canadian Fitness and Lifestyle Research Institute. Section A: Physical Activity and Sport
Participation Rates in Canada. Statistics Canada 2008;2002/2003 Canadian Community Health
Survey.
[13] Carroll LJ, Cassidy JD, Côté P. Frequency, timing, and course of depressive
symptomatology after whiplash. Spine 2006;31:E551-6.
[14] Carroll LJ, Cassidy JD, Côté P. The role of pain coping strategies in prognosis after
whiplash injury: passive coping predicts slowed recovery. Pain 2006;124:18-26.
[15] Carroll LJ, Holm LW, Hogg-Johnson S, Côté P, Cassidy JD, Haldeman S, Nordin M,
Hurwitz EL, Carragee EJ, van der Velde G, Peloso PM, Guzman J, Bone and Joint Decade
2000-2010 Task Force on Neck Pain and Its Associated Disorders. Course and prognostic factors for
neck pain in whiplash-associated disorders (WAD): results of the Bone and Joint Decade 2000-
2010 Task Force on Neck Pain and Its Associated Disorders. Spine 2008;33:S83-92.
[16] Carroll LJ, Jones DC, Ozegovic D, Cassidy JD. How well are you recovering? The
association between a simple question about recovery and patient reports of pain intensity and
pain disability in whiplash-associated disorders. Disabil Rehabil 2012;34:45-52.
[17] Carstensen TB, Frostholm L, Oernboel E, Kongsted A, Kasch H, Jensen TS, Fink P. Post-
trauma ratings of pre-collision pain and psychological distress predict poor outcome following
acute whiplash trauma: a 12-month follow-up study. Pain 2008;139:248-259.
[18] Cassidy JD, Carroll LJ, Côté P, Frank J. Does multidisciplinary rehabilitation benefit
whiplash recovery?: results of a population-based incidence cohort study. Spine 2007;32:126-
131.
[19] Cassidy JD, Carroll LJ, Côté P, Lemstra M, Berglund A, Nygren A. Effect of eliminating
compensation for pain and suffering on the outcome of insurance claims for whiplash injury. N
Engl J Med 2000;342:1179-1186.
[20] Chappuis G, Soltermann B, CEA, AREDOC, CEREDOC. Number and cost of claims linked to
minor cervical trauma in Europe: results from the comparative study by CEA, AREDOC and
CEREDOC. Eur Spine J 2008;17:1350-1357.
[21] Childs JD, Piva SR, Fritz JM. Responsiveness of the numeric pain rating scale in patients
with low back pain. Spine 2005;30:1331-1334.
[22] Cleland JA, Childs JD, Whitman JM. Psychometric properties of the Neck Disability Index
and Numeric Pain Rating Scale in patients with mechanical neck pain. Arch Phys Med Rehabil
2008;89:69-74.
[23] Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: L. Erlbaum
Associates, 1988.
[24] Copay AG, Subach BR, Glassman SD, Polly DW Jr, Schuler TC. Understanding the
minimum clinically important difference: a review of concepts and methods. Spine J 2007;7:541-
546.
[25] Côté P, Cassidy JD. The epidemiology of neck pain: what we have learned from our
population-based studies. Journal of the Canadian Chiropractic Association 2003;47:284-290.
[26] Côté P, Cassidy JD, Carette S, Boyle E, Shearer HM, Stupar M, Ammendolia C, van der
Velde G, Hayden JA, Yang X, van Tulder M, Frank JW. Protocol of a randomized controlled
trial of the effectiveness of physician education and activation versus two rehabilitation programs
for the treatment of Whiplash-associated Disorders: The University Health Network Whiplash
Intervention Trial. Trials 2008;9:75.
[27] Côté P, Cassidy JD, Carroll LJ. The factors associated with neck pain and its related
disability in the Saskatchewan population. Spine 2000;25:1109-1117.
[28] Côté P, Cassidy JD, Carroll LJ, Frank JW, Bombardier C. A systematic review of the
prognosis of acute whiplash and a new conceptual framework to synthesize the literature. Spine
2001;26:E445-58.
[29] Côté P, Hogg-Johnson S, Cassidy JD, Carroll LJ, Frank JW, Bombardier C. Initial patterns
of clinical care and recovery from whiplash injuries: a population-based cohort study. Arch
Intern Med 2005;165:2257-2263.
[30] Côté P, Soklaridis S. Does early management of whiplash-associated disorders assist or
impede recovery? Spine 2011;36:S275-9.
[31] de Vet HCW, Terwee CB, Bouter LM. Clinimetrics and psychometrics: two sides of the
same coin. J Clin Epidemiol 2003;56:1146-1147.
[32] de Vet HCW, Terwee CB, Bouter LM. Current challenges in clinimetrics. J Clin Epidemiol
2003;56:1137-1141.
[33] de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine: A Practical
Guide. New York, U.S.A.: Cambridge University Press, 2011.
[34] Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change:
An analogy to diagnostic test performance. J Chronic Dis 1986;39:897-906.
[35] Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status
measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S-158S.
[36] Dijkers MP. Psychometrics and clinimetrics in assessing environments. A comment
suggested by Mackenzie et al., 2002. J Allied Health 2003;32:38-43.
[37] Dworkin RH, Turk DC, Wyrwich KW, Beaton D, Cleeland CS, Farrar JT, Haythornthwaite
JA, Jensen MP, Kerns RD, Ader DN, Brandenburg N, Burke LB, Cella D, Chandler J, Cowan P,
Dimitrova R, Dionne R, Hertz S, Jadad AR, Katz NP, Kehlet H, Kramer LD, Manning DC,
McCormick C, McDermott MP, McQuay HJ, Patel S, Porter L, Quessy S, Rappaport BA,
Rauschkolb C, Revicki DA, Rothman M, Schmader KE, Stacey BR, Stauffer JW, von Stein T,
White RE, Witter J, Zavisic S. Interpreting the clinical importance of treatment outcomes in
chronic pain clinical trials: IMMPACT recommendations. Journal of Pain 2008;9:105-121.
[38] Dziuban CD, Shirkey EC. When is a correlation matrix appropriate for factor analysis?
Some decision rules. Psychol Bull 1974;81:358-361.
[39] Emmelkamp PM. The additional value of clinimetrics needs to be established rather than
assumed. Psychother Psychosom 2004;73:142-144.
[40] Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory
factor analysis in psychological research. Psychol Methods 1999;4:272-299.
[41] Fava GA, Belaise C. A discussion on the role of clinimetrics and the misleading effects of
psychometric theory. J Clin Epidemiol 2005;58:753-756.
[42] Fava GA, Ruini C, Rafanelli C. Psychometric theory is an obstacle to the progress of
clinical research. Psychother Psychosom 2004;73:145-148.
[43] Fava GA, Tomba E, Sonino N. Clinimetrics: the science of clinical measurements. Int J
Clin Pract 2012;66:11-15.
[44] Fayers PM, Hand DJ. Factor analysis, causal indicators and quality of life. Quality of Life
Research: An International Journal of Quality of Life Aspects of Treatment, Care &
Rehabilitation 1997;6:139-150.
[45] Fayers PM, Hand DJ, Bjordal K, Groenvold M. Causal indicators in quality of life research.
Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care
& Rehabilitation 1997;6:393-406.
[46] Feinstein AR. T. Duckett Jones Memorial Lecture. The Jones criteria and the challenges of
clinimetrics. Circulation 1982;66:1-5.
[47] Feinstein AR. Clinimetrics. New Haven: Yale University Press, 1987.
[48] Feinstein AR. Multi-item "instruments" vs Virginia Apgar's principles of clinimetrics. Arch
Intern Med 1999;159:125-128.
[49] Ferrari R, Russell A, Kelly AJ. Assessing whiplash recovery--the Whiplash Disability
Questionnaire. Aust Fam Physician 2006;35:653-654.
[50] Floyd FJ, Widaman KF. Factor analysis in the development and refinement of clinical
assessment instruments. Psychol Assess 1995;7:286-299.
[51] Gorsuch RL. Using Bartlett's Significance Test to determine the number of factors to
extract. Educational and Psychological Measurement 1973;33:361-364.
[52] Gray JAM. Evidence-based healthcare. New York: Churchill Livingstone, 2001.
[53] Gross A, Forget M, St George K, Fraser MM, Graham N, Perry L, Burnie SJ, Goldsmith
CH, Haines T, Brunarski D. Patient education for neck pain. Cochrane Database Syst Rev
2012;3:CD005106.
[54] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of
evaluative instruments. J Chronic Dis 1987;40:171-178.
[55] Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in
clinical trials. CMAJ 1986;134:889-895.
[56] Hains F, Waalen J, Mior S. Psychometric properties of the neck disability index. Journal of
Manipulative & Physiological Therapeutics 1998;21:75-80.
[57] Hamilton M. The assessment of anxiety states by rating. Br J Med Psychol 1959;32:50-55.
[58] Hatcher L. A step-by-step approach to using the SAS system for factor analysis and
structural equation modeling. Cary, NC: SAS Institute, 1994.
[59] Hincapié CA, Cassidy JD, Côté P, Carroll LJ, Guzman J. Pain localization after traffic
collisions: analysis of a population-based inception cohort study. In: World
Congress on Neck Pain, 2008. p. 100.
[60] Hjermstad M, Fayers P, Haugen D, Caraceni A, Hanks G, Loge J, Fainsinger R, Aass N,
Kaasa S, European Palliative Care Research Collaborative (EPCRC). Studies comparing
Numerical Rating Scales, Verbal Rating Scales, and Visual Analogue Scales for assessment of
pain intensity in adults: a systematic literature review. J Pain Symptom Manage 2011;41:1073-
1093.
[61] Holm LW, Carroll LJ, Cassidy JD, Hogg-Johnson S, Côté P, Guzman J, Peloso P, Nordin
M, Hurwitz E, van der Velde G, Carragee E, Haldeman S, Bone and Joint Decade 2000-2010
Task Force on Neck Pain and Its Associated Disorders. The burden and determinants of neck
pain in whiplash-associated disorders after traffic collisions: results of the Bone and Joint
Decade 2000-2010 Task Force on Neck Pain and Its Associated Disorders. Spine 2008;33:S52-9.
[62] Holm LW, Carroll LJ, Cassidy JD, Skillgate E, Ahlbom A. Expectations for recovery
important in the prognosis of whiplash injuries. PLoS Med 2008;5:e105.
[63] Hoving JL, O'Leary EF, Niere KR, Green S, Buchbinder R. Validity of the neck disability
index, Northwick Park neck pain questionnaire, and problem elicitation technique for measuring
disability associated with whiplash-associated disorders. Pain 2003;102:273-281.
[64] Hudak P, Amadio PC, Bombardier C, Beaton DE, Cole D, Davis AM, Hawker GA, Katz
JN, Makela M, Marx RG, Punnett L, Wright JG. Development of an upper extremity outcome
measure: The DASH (disabilities of the arm, shoulder, and hand). Am J Ind Med 1996;29:602-608.
[65] Huntington JL, Dueck A. Handling missing data. Curr Probl Cancer 2005;29:317-325.
[66] Hurst H, Bolton J. Assessing the clinical significance of change scores recorded on
subjective outcome measures. J Manipulative Physiol Ther 2004;27:26-35.
[67] Hurwitz EL, Carragee EJ, van der Velde G, Carroll LJ, Nordin M, Guzman J, Peloso PM,
Holm LW, Côté P, Hogg-Johnson S, Cassidy JD, Haldeman S, Bone and Joint Decade 2000-
2010 Task Force on Neck Pain and Its Associated Disorders. Treatment of neck pain:
noninvasive interventions: results of the Bone and Joint Decade 2000-2010 Task Force on Neck
Pain and Its Associated Disorders. Spine 2008;33:S123-52.
[68] Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a
critical review and recommendations. J Clin Epidemiol 2000;53:459-468.
[69] Ifedi F. Sport participation in Canada, 2005. [S.l.]: Culture, 2008.
[70] Jette DU, Jette AM. Physical therapy and health outcomes in patients with spinal
impairments [erratum appears in Phys Ther 1997 Jan;77(1):113]. Phys Ther
1996;76:930-941.
[71] Jull GA, Soderlund A, Stemper BD, Kenardy J, Gross AR, Côté P, Treleaven J, Bogduk N,
Sterling M, Curatolo M. Toward optimal early management after whiplash injury to lessen the
rate of transition to chronicity: discussion paper 5. Spine 2011;36:S335-42.
[72] Juniper EF, Guyatt GH, Feeny DH, Ferrie PJ, Griffith LE, Townsend M. Measuring quality
of life in children with asthma. Qual Life Res 1996;5:35-46.
[73] Juniper EF, Guyatt GH, Streiner DL, King DR. Clinical impact versus factor analysis for
quality of life questionnaire construction. J Clin Epidemiol 1997;50:233-238.
[74] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in
a disease-specific Quality of Life Questionnaire. J Clin Epidemiol 1994;47:81-87.
[75] Kaiser HF. A revised measure of sampling adequacy for factor-analytic data matrices.
Educational and Psychological Measurement 1981;41:379-381.
[76] Katz JN, Liang MH. Classification criteria revisited. Arthritis Rheum 1991;34:1228-1230.
[77] King JT,Jr, Roberts MS. Validity and reliability of the Short Form-36 in cervical
spondylotic myelopathy. J Neurosurg 2002;97:180-185.
[78] Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic
Dis 1985;38:27-36.
[79] Kline RB. Principles and practice of structural equation modeling. New York: Guilford
Press, 2005.
[80] Kosinski M, Keller SD, Hatoum HT, Kong SX, Ware JE. The SF-36 Health Survey as a
generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis:
tests of data quality, scaling assumptions and score reliability. Med Care 1999;37:MS10-22.
[81] Kraemer HC, Korner AF. Statistical alternatives in assessing reliability, consistency, and
individual differences for quantitative measures: Application to behavioral measures of neonates.
Psychol Bull 1976;83:914-921.
[82] Levac D, Colquhoun H, O'Brien KK. Scoping studies: advancing the methodology.
Implement Sci 2010;5:69.
[83] Lohman TG, Roche AF, Martorell R. Anthropometric standardization reference manual.
Champaign, IL: Human Kinetics Books, 1988.
[84] MacCallum RC, Widaman KF, Zhang S, Hong S. Sample Size in Factor Analysis. Psychol
Methods 1999;4:84-99.
[85] Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric
strategies for development of a health measurement scale. J Clin Epidemiol 1999;52:105-111.
[86] Mason JH, Anderson JJ, Meenan RF, Haralson KM, Lewis-Stevens D, Kaine JL. The rapid
assessment of disease activity in rheumatology (radar) questionnaire. Validity and sensitivity to
change of a patient self-report measure of joint count and clinical status. Arthritis Rheum
1992;35:156-162.
[87] McCaskey M, Ettlin T, Schuster C. German version of the whiplash disability
questionnaire: reproducibility and responsiveness. Health Qual Life Outcomes 2013;11:36.
[88] McHorney CA, Ware JE,Jr, Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health
Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse
patient groups. Med Care 1994;32:40-66.
[89] Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de
Vet HCW. The COSMIN study reached international consensus on taxonomy, terminology, and
definitions of measurement properties for health-related patient-reported outcomes. J Clin
Epidemiol 2010;63:737-745.
[90] Ngo T, Stupar M, Côté P, Boyle E, Shearer H. A study of the test-retest reliability of the
self-perceived general recovery and self-perceived change in neck pain questions in patients with
recent whiplash-associated disorders. Eur Spine J 2010;19:957-962.
[91] Nierenberg AA, Sonino N. From clinical observations to clinimetrics: a tribute to Alvan R.
Feinstein, MD. Psychother Psychosom 2004;73:131-133.
[92] Nunnally JC, Bernstein IH. Psychometric theory. New York: McGraw-Hill, Inc., 1994.
[93] Ottosson C, Pettersson H, Johansson S-, Nyren O, Ponzer S. Recovery after minor traffic
injuries: A randomized controlled trial. PLoS Clin Trials 2007;2:e14.
[94] Ozegovic D, Carroll LJ, Cassidy JD. Factors associated with recovery expectations
following vehicle collision: a population-based study. J Rehabil Med 2010;42:66-73.
[95] Ozegovic D, Carroll LJ, Cassidy JD. Does expecting mean achieving? The association
between expecting to return to work and recovery in whiplash associated disorders: a population-
based prospective cohort study. Eur Spine J 2009;18:893-899.
[96] Pengel LH, Refshauge KM, Maher CG. Responsiveness of pain, disability, and physical
impairment outcomes in patients with low back pain. Spine 2004;29:879-883.
[97] Phillips LA, Carroll LJ, Cassidy JD, Côté P. Whiplash-associated disorders: who gets
depressed? Who stays depressed? Eur Spine J 2010;19:945-956.
[98] Pietrobon R, Coeytaux R, Carey T, Richardson W, DeVellis R. Standard Scales for
Measurement of Functional Outcome for Cervical Pain or Dysfunction: A Systematic Review.
Spine 2002;27:515-522.
[99] Pinfold M, Niere KR, O'Leary EF, Hoving JL, Green S, Buchbinder R. Validity and
internal consistency of a whiplash-specific disability measure. Spine 2004;29:263-268.
[100] Portney LG. Foundations of clinical research: applications to practice. Upper Saddle
River, N.J.: Prentice Hall Health, 2000.
[101] Quinlan KP, Annest JL, Myers B, Ryan G, Hill H. Neck strains and sprains among motor
vehicle occupants-United States, 2000. Accident Analysis & Prevention 2004;36:21-27.
[102] Radloff LS. The CES-D Scale: A self-report depression scale for research in the general
population. Applied Psychological Measurement 1977;1:385-401.
[103] Rebbeck TJ, Refshauge KM, Maher CG, Stewart M. Evaluation of the core outcome
measure in whiplash. Spine 2007;32:696-702.
[104] Relman AS. Assessment and accountability: the third revolution in medical care. N Engl J
Med 1988;319:1220-1222.
[105] Revicki DA, Rentz AM, Luo MP, Wong RL. Psychometric characteristics of the short
form 36 health survey and functional assessment of chronic illness Therapy-Fatigue subscale for
patients with ankylosing spondylitis. Health Qual Life Outcomes 2011;9:36.
[106] Ribera A, Permanyer-Miralda G, Alonso J, Cascant P, Soriano N, Brotons C. Is
psychometric scoring of the McNew Quality of Life after Myocardial Infarction questionnaire
superior to the clinimetric scoring? A comparison of the two approaches. Qual Life Res
2006;15:357-365.
[107] Richter M, Ferrari R, Otte D, Kuensebeck HW, Blauth M, Krettek C. Correlation of
clinical findings, collision parameters, and psychological factors in the outcome of whiplash
associated disorders. J Neurol Neurosurg Psychiatry 2004;75:758-764.
[108] Salaffi F, Carotti M, Grassi W. Health-related quality of life in patients with hip or knee
osteoarthritis: comparison of generic and disease-specific instruments. Clin Rheumatol
2005;24:29-37.
[109] Schumacker RE. A beginner's guide to structural equation modeling. Mahwah, N.J.:
Lawrence Erlbaum Associates, 2004.
[110] Schuster C, McCaskey M, Ettlin T. German translation, cross-cultural adaptation and
validation of the whiplash disability questionnaire. Health Qual Life Outcomes 2013;11:45.
[111] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol
Bull 1979;86:420-428.
[112] Snaith RP, Baugh SJ, Clayden AD, Husain A, Sipple MA. The Clinical Anxiety Scale: an
instrument derived from the Hamilton Anxiety Scale. Br J Psychiatry 1982;141:518-523.
[113] Spitzer WO, Skovron ML, Salmi LR, Cassidy JD, Duranceau J, Suissa S, Zeiss E.
Scientific monograph of the Quebec Task Force on Whiplash-Associated Disorders: redefining
"whiplash" and its management. Spine 1995;20:1S-73S.
[114] Sterling M, Carroll LJ, Kasch H, Kamper SJ, Stemper B. Prognosis after whiplash injury:
where to from here? Discussion paper 4. Spine 2011;36:S330-4.
[115] Stewart M, Maher CG, Refshauge KM, Bogduk N, Nicholas M. Responsiveness of pain
and disability measures for chronic whiplash. Spine 2007;32:580-585.
[116] Stratford PW, Riddle DL, Binkley JM, Spadoni G, Westaway MD, Padfield B. Using the
Neck Disability Index to make decisions concerning individual patients. Physiotherapy Canada
1999;51:107-112.
[117] Streiner DL. Being inconsistent about consistency: when coefficient alpha does and
doesn't matter. J Pers Assess 2003;80:217-222.
[118] Streiner DL. Clinimetrics vs. psychometrics: an unnecessary distinction. J Clin Epidemiol
2003;56:1142-1145.
[119] Streiner DL. Health measurement scales: a practical guide to their development and use.
Toronto: Oxford University Press, 2003.
[120] Streiner DL. Starting at the beginning: an introduction to coefficient alpha and internal
consistency. J Pers Assess 2003;80:99-103.
[121] Stucki G. International Classification of Functioning, Disability, and Health (ICF): a
promising framework and classification for rehabilitation medicine. Am J Phys Med Rehabil
2005;84:733-740.
[122] Tavakol M, Dennick R. Making sense of Cronbach's alpha. International Journal of
Medical Education 2011;2:53-55.
[123] Terry R. Recent advances in measurement theory and the use of sociometric techniques.
New Dir Child Adolesc Dev 2000:27-53.
[124] Terwee CB, Jansma EP, Riphagen II, de Vet HCW. Development of a methodological
PubMed search filter for finding studies on measurement properties of measurement instruments.
Qual Life Res 2009;18:1115-1123.
[125] Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the
methodological quality in systematic reviews of studies on measurement properties: a scoring
system for the COSMIN checklist. Qual Life Res 2012;21:651-657.
[126] Terwee CB, Schellingerhout JM, Verhagen AP, Koes BW, de Vet HC. Methodological
quality of studies on the measurement properties of neck pain and disability questionnaires: a
systematic review. J Manipulative Physiol Ther 2011;34:261-272.
[127] Turner D, Griffiths AM, Steinhart AH, Otley AR, Beaton DE. Mathematical weighting of
a clinimetric index (Pediatric Ulcerative Colitis Activity Index) was superior to the judgmental
approach. J Clin Epidemiol 2009;62:738-744.
[128] U.S. Food and Drug Administration. Guidance for Industry and Food and Drug
Administration Staff - Factors to Consider When Making Benefit-Risk Determinations in
Medical Device Premarket Approvals and De Novo Classifications. 2011.
[129] van der Velde G, Beaton D, Hogg-Johnson S, Hurwitz E, Tennant A. Rasch analysis
provides new insights into the measurement properties of the neck disability index. Arthritis
Rheum 2009;61:544-551.
[130] Verhagen AP, Scholten-Peeters GG, van Wijngaarden S, de Bie RA, Bierma-Zeinstra SM.
Conservative treatments for whiplash. Cochrane Database Syst Rev 2007:CD003338.
[131] Vernon H, Mior S. The Neck Disability Index: a study of reliability and validity [erratum
appears in J Manipulative Physiol Ther 1992 Jan;15(1)]. J Manipulative Physiol Ther
1991;14:409-415.
[132] Vernon H, Mior S. The Northwick Park Neck Pain Questionnaire, devised to measure
neck pain and disability [comment]. Br J Rheumatol 1994;33:1203-1204.
[133] Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies.
Stat Med 1998;17:101-110.
[134] Walton D. A review of the definitions of 'recovery' used in prognostic studies on whiplash
using an ICF framework. Disabil Rehabil 2009;31:943-957.
[135] Walton DM, Pretty J, MacDermid JC, Teasell RW. Risk factors for persistent problems
following whiplash injury: results of a systematic review and meta-analysis. J Orthop Sports
Phys Ther 2009;39:334-350.
[136] Ware JE,Jr. SF-36 health survey update. Spine 2000;25:3130-3139.
[137] Ware JE,Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I.
Conceptual framework and item selection. Med Care 1992;30:473-483.
[138] Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and
the SEM. J Strength Cond Res 2005;19:231-240.
[139] Westaway MD, Stratford PW, Binkley JM. The patient-specific functional scale:
validation of its use in persons with neck dysfunction. Journal of Orthopaedic & Sports Physical
Therapy 1998;27:331-338.
[140] Willis C, Niere KR, Hoving JL, Green S, O'Leary EF, Buchbinder R. Reproducibility and
responsiveness of the Whiplash Disability Questionnaire. Pain 2004;110:681-688.
[141] Wolf SM. Quality assessment of ethics in health care: the accountability revolution. Am J
Law Med 1994;20:105-128.
[142] World Health Organization. International classification of functioning, disability and
health: ICF. Geneva, 2001.
[143] Wright JG, Feinstein AR. A comparative contrast of clinimetric and psychometric
methods for constructing indexes and rating scales. J Clin Epidemiol 1992;45:1201-1218.
[144] Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin
Epidemiol 1997;50:239-246.
[145] Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand
1983;67:361-370.
[146] Zyzanski SJ, Perloff E. Clinimetrics and psychometrics work hand in hand. Arch Intern
Med 1999;159:1816-1817.
Appendices
Appendix 1: Questionnaires
A-1.1: Baseline Questionnaire
The University Health Network WDQ Validation Study
Baseline Questionnaire
STUDY NUMBER: _________________________ TODAY’S DATE: _______, _______, ________ (day) (month) (year)
Baseline v. Apr29.08 – Page 1
PLEASE PRINT ALL ANSWERS
SECTION A: In this section, we will be asking you questions about your traffic accident.
1. When did your traffic accident happen? Please provide the date of the accident:
Day___ Month___ Year 20____
SECTION B: In this section, we will be asking you a question about previous accidents and injuries that may have happened in the past 2 years. Please do not include your most recent accident and its related injuries when responding to this question.
1. Excluding any pain caused by the present accident, have you had any neck pain in the past two years?
No
Yes
SECTION C: In this section, we will be asking questions about your pain and its intensity as caused by your accident-related injuries.
1. Do you have neck pain caused by your car accident?
No: Skip to Question 2.
Yes: Please rate your average neck pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
2. Do you have shoulder pain caused by your car accident?
No: Skip to Question 3.
Yes: Please rate your average shoulder pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
3. Do you have low back pain caused by your car accident?
No: Skip to Question 4.
Yes: Please rate your average low back pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
4. Do you have a headache caused by your car accident?
No: Skip to Question 5.
Yes: Please rate your average headache pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
5. Other parts of your body?
a. Do you have pain in your arm(s) caused by your car accident?
No: Skip to Question 5b.
Yes: Please rate your average arm pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
b. Do you have pain in your hand(s) caused by your car accident?
No: Skip to Question 5c.
Yes: Please rate your average hand pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
c. Do you have pain in your face caused by your car accident?
No: Skip to Question 5d.
Yes: Please rate your average face pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
d. Do you have pain in your leg(s) caused by your car accident?
No: Skip to Question 5e.
Yes: Please rate your average leg pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
e. Do you have pain in your foot/feet caused by your car accident?
No: Skip to Question 5f.
Yes: Please rate your average foot pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
f. Do you have pain in your mid back caused by your car accident?
No: Skip to Question 5g.
Yes: Please rate your average mid back pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
g. Do you have pain in your abdomen, chest, or groin caused by your car accident?
No: Skip to Question 6.
Yes: Please rate your average abdomen/chest/groin pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
6. Did the accident cause any of the following symptoms? (check any that apply)
Anxiety or worry
Anger
Concentration or attention problems
Difficulty moving your neck
Dizziness or unsteadiness
Feeling of numbness, tingling or pain in arms or hands
Feeling of numbness, tingling or pain in legs or feet
Hearing problems
Memory problems or forgetfulness
Pain when your neck is moved
Sleep problems
Sore jaw
Unusual fatigue or tiredness
Vision problems
SECTION D: In this section, we will be asking you questions about past and current health issues that are unrelated to your recent accident.
1. In general, would you say your health is:
Excellent
Very Good
Good
Fair
Poor
2. Compared to one week ago, how would you rate your health in general now?
Much better than one week ago
Somewhat better now than one week ago
About the same as one week ago
Somewhat worse now than one week ago
Much worse now than one week ago
3. The following questions are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much?
(Response options for each item: Yes, limited a lot / Yes, limited a little / No, not limited at all)
a. Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports
b. Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf
c. Lifting or carrying groceries
d. Climbing several flights of stairs
e. Climbing one flight of stairs
f. Bending, kneeling, or stooping
g. Walking more than a mile
h. Walking several hundred yards
i. Walking one hundred yards
j. Bathing or dressing yourself
4. During the past week, how much of the time have you had any of the following problems with your work or other regular daily activities as a result of your physical health?
(Response options for each item: All of the time / Most of the time / Some of the time / A little of the time / None of the time)
a. Cut down on the amount of time you spent on work or other activities
b. Accomplished less than you would like
c. Were limited in the kind of work or other activities
d. Had difficulty performing the work or other activities (for example, it took extra effort)
5. During the past week, how much of the time have you had any of the following problems with your work or other regular daily activities as a result of any emotional problems (such as feeling depressed or anxious)?
(Response options for each item: All of the time / Most of the time / Some of the time / A little of the time / None of the time)
a. Cut down on the amount of time you spent on work or other activities
b. Accomplished less than you would like
c. Did work or activities less carefully than usual
6. During the past week, to what extent has your physical health or emotional problems interfered with your normal social activities with family, friends, neighbours or groups?
Not at all
Slightly
Moderately
Quite a bit
Extremely
7. How much bodily pain have you had during the past week?
None
Very mild
Mild
Moderate
Severe
Very severe
8. During the past week, how much did pain interfere with your normal work (including both work outside the home and housework)?
Not at all
Slightly
Moderately
Quite a bit
Extremely
9. These questions are about how you have felt and how things have been with you during the past week. For each question, please give the one answer that comes closest to the way you have been feeling.
How much of the time during the past week…
(Response options for each item: All of the time / Most of the time / Some of the time / A little of the time / None of the time)
a. Did you feel full of life?
b. Have you been very nervous?
c. Have you felt so down in the dumps that nothing could cheer you up?
d. Have you felt calm and peaceful?
e. Did you have a lot of energy?
f. Have you felt downhearted and depressed?
g. Did you feel worn out?
h. Have you been happy?
i. Did you feel tired?
10. During the past week, how much of the time has your physical health or emotional problems interfered with your social activities (like visiting friends, relatives, etc.)?
All of the time
Most of the time
Some of the time
A little of the time
None of the time
11. How TRUE or FALSE is each of the following statements for you?
(Response options for each item: Definitely true / Mostly true / Don’t know / Mostly false / Definitely false)
a. I seem to get sick a little easier than other people
b. I am as healthy as anybody I know
c. I expect my health to get worse
d. My health is excellent
SECTION E: In this section, we will be asking you a question about your expectations of recovery.
1. Do you think that your injury will…
get better soon
get better slowly
never get better
don’t know
SECTION F: In this section, we will be asking you a question regarding how well you believe your recovery is progressing.
7. How well do you feel you are recovering from your injuries?
Completely better
Much improved
Slightly improved
No change
Slightly worse
Much worse
Worse than ever
8. How do you feel your neck pain has changed since the injury?
Very much better
Better
Slightly better
No change
Slightly worse
Worse
Very much worse
SECTION G:
The questionnaire has been designed to give us information as to how your NECK PAIN has affected your ability to manage in everyday life. Please answer every question and mark in each section ONLY THE ONE BOX which applies to you. We realize you may consider that two of the statements in any one section relate to you, but PLEASE JUST MARK THE BOX WHICH MOST CLOSELY DESCRIBES YOUR PROBLEM.
SECTION 1: Pain Intensity
I have no pain at the moment
The pain is very mild at the moment
The pain is moderate at the moment
The pain is fairly severe at the moment
The pain is very severe at the moment
The pain is the worst imaginable at the moment
SECTION 2: Personal Care (Washing, Dressing etc.)
I can look after myself normally without causing extra pain
I can look after myself normally but it causes extra pain
It is painful to look after myself and I am slow and careful
I need some help but manage most of my personal care
I need help every day in most aspects of my personal care
I do not get dressed, I wash with difficulty, and stay in bed
SECTION 3: Lifting
I can lift heavy weights without extra pain
I can lift heavy weights but it causes extra pain
Pain prevents me from lifting heavy weights off the floor but I can manage if they are conveniently positioned (e.g. on a table)
Pain prevents me from lifting heavy weights, but I can manage light to medium weights if they are conveniently positioned
I can lift only very light weights
I cannot lift or carry anything at all
SECTION 4: Reading
I can read as much as I want with no pain in my neck
I can read as much as I want with slight pain in my neck
I can read as much as I want with moderate pain in my neck
I cannot read as much as I want because of moderate pain in my neck
I can hardly read at all because of severe pain in my neck
I cannot read at all
SECTION 5: Headache
I have no headaches at all I have slight headaches which occur infrequently I have moderate headaches which occur infrequently I have moderate headaches which occur frequently I have severe headaches which occur frequently I have headaches almost all the time
SECTION 6: Concentration
I can concentrate fully when I want to with no difficulty I can concentrate fully when I want to with slight difficulty I have a fair degree of difficulty in concentrating when I want to I have a lot of difficulty in concentrating when I want to I have a great deal of difficulty in concentrating when I want to I cannot concentrate at all
SECTION 7: Work
I can do as much work as I want to I can do my usual work, but no more I can do most of my usual work, but not all I cannot do my usual work I can hardly do any work at all I cannot do any work at all
SECTION 8: Driving
I can drive my car without any neck pain I can drive my car as long as I want with slight pain in my neck I can drive my car as long as I want with moderate pain in my neck I cannot drive my car as long as I want because of moderate pain in my neck I can hardly drive at all because of severe pain in my neck I cannot drive at all
SECTION 9: Sleeping
I have no trouble sleeping My sleep is slightly disturbed (less than 1 hour sleepless) My sleep is mildly disturbed (1-2 hours sleepless) My sleep is moderately disturbed (2-3 hours sleepless) My sleep is greatly disturbed (3-5 hours sleepless) My sleep is completely disturbed (5-7 hours sleepless)
SECTION 10: Recreation
I am able to engage in all my recreation activities with no neck pain at all I am able to engage in all my recreation activities with some neck pain I am able to engage in most, but not all, my recreation activities because of pain in my neck I am able to engage in few of my recreation activities because of pain in my neck I can hardly do any recreation activities because of pain in my neck I cannot do any recreation activities at all
The following scales have been designed to find out about your neck pain and how it’s affecting you. Please answer ALL the scales, and mark ONE number on EACH scale that best describes how you feel.
1. Over the past week, on average, how would you rate your neck pain?
No pain 0 1 2 3 4 5 6 7 8 9 10 Worst pain possible
2. Over the past week, how much has your neck pain interfered with your daily activities (housework, washing, dressing, lifting, reading, driving)?
No interference 0 1 2 3 4 5 6 7 8 9 10 Unable to carry out activity
3. Over the past week, how much has your neck pain interfered with your ability to take part in recreational, social, and family activities?
No interference 0 1 2 3 4 5 6 7 8 9 10 Unable to carry out activity
4. Over the past week, how anxious (tense, uptight, irritable, difficulty in concentrating/relaxing) have you been feeling?
Not at all anxious 0 1 2 3 4 5 6 7 8 9 10 Extremely anxious
5. Over the past week, how depressed (down-in-the-dumps, sad, in low spirits, pessimistic, unhappy) have you been feeling?
Not at all depressed 0 1 2 3 4 5 6 7 8 9 10 Extremely depressed
6. Over the past week, how have you felt your work (both inside and outside the home) has affected (or would affect) your neck pain?
Have made it no worse 0 1 2 3 4 5 6 7 8 9 10 Have made it much worse
7. Over the past week, how much have you been able to control (reduce/help) your neck pain on your own?
Completely control it 0 1 2 3 4 5 6 7 8 9 10 No control whatsoever
SECTION H: In this section, we will be asking you questions specifically about your daily activities and your feelings that may have been affected by your whiplash injuries.
Please circle a number in each section to indicate how you have been affected by the whiplash injury and symptoms. If one or more questions are not relevant to you, please leave that section blank.
1. How much pain do you have today?
No pain 0 1 2 3 4 5 6 7 8 9 10 Worst pain imaginable
2. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
3. How much do your whiplash symptoms interfere with your work/home/study duties?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
4. How much do your whiplash symptoms interfere with driving or using public transport?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to travel in car/use public transport
5. How much do your whiplash symptoms interfere with sleep?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Cannot sleep
6. How tired/fatigued do you feel as a result of your whiplash injury/symptoms?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Extreme tiredness/fatigue all the time
7. How much do your whiplash symptoms interfere with social activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to socialize
8. How much do your whiplash symptoms interfere with sporting activities?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
9. How much do your whiplash symptoms interfere with non-sporting leisure activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
10. How much sadness/depression do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme sadness/depression
11. How much anger do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anger
12. How much anxiety do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anxiety
13. How much difficulty do you have concentrating as a result of your whiplash injury/symptoms?
No difficulty 0 1 2 3 4 5 6 7 8 9 10 Unable to concentrate
SECTION I: In this section, we will be asking you questions about your mood.
Below is a list of some of the ways you may have felt or behaved. Please indicate how often you have felt this way during the past week by checking the appropriate space.
Rarely or none of the time (less than 1 day)
Some or a little of the time (1-2 days)
Occasionally or a moderate amount of time (3-4 days)
Most or all of the time (5-7 days)
1. I was bothered by things that usually don't bother me. 0 1 2 3
2. I did not feel like eating; my appetite was poor. 0 1 2 3
3. I felt that I could not shake off the blues even with help from my family or friends. 0 1 2 3
4. I felt that I was just as good as other people. 0 1 2 3
5. I had trouble keeping my mind on what I was doing. 0 1 2 3
6. I felt depressed. 0 1 2 3
7. I felt that everything I did was an effort. 0 1 2 3
8. I felt hopeful about the future. 0 1 2 3
9. I thought my life had been a failure. 0 1 2 3
10. I felt fearful. 0 1 2 3
11. My sleep was restless. 0 1 2 3
12. I was happy. 0 1 2 3
13. I talked less than usual. 0 1 2 3
14. I felt lonely. 0 1 2 3
15. People were unfriendly. 0 1 2 3
16. I enjoyed life. 0 1 2 3
17. I had crying spells. 0 1 2 3
18. I felt sad. 0 1 2 3
19. I felt that people disliked me. 0 1 2 3
20. I could not get "going." 0 1 2 3
SECTION J: In this last section, we would like to know a little about you.
1. Age: ___________
2. Sex: Male Female
3. Marital Status:
Single, never married Living common-law Widowed Married Divorced Separated
4. Please check your highest level of education:
Grade 8 or less Higher than grade 8, but did not graduate from high school High school graduate Post secondary or some university Technical school graduate University graduate
5. What is your combined total family unit/household income per year?
$0 - $49,999 $50,000 - $59,999 $60,000 - $79,999 Above $80,000
6. Have you hired a lawyer or paralegal to help you with your claim?
No Yes
A-1.2: Three-to-Five Day Follow-up Questionnaire
Three-day Follow-up Study ID: _____________
SECTION J: In this section, we will be asking you questions specifically about
your daily activities and your feelings that may have been affected by your whiplash injuries.
Please select a number to indicate how you have been affected by the whiplash injury and symptoms. If one or more questions are not relevant to you, please leave that section blank.
1. How much pain do you have today?
No pain 0 1 2 3 4 5 6 7 8 9 10 Worst pain imaginable
2. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
3. How much do your whiplash symptoms interfere with your work/home/study duties?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
4. How much do your whiplash symptoms interfere with driving or using public transport?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to travel in car/use public transport
5. How much do your whiplash symptoms interfere with sleep?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Cannot sleep
6. How tired/fatigued do you feel as a result of your whiplash injury/symptoms?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Extreme tiredness/fatigue all the time
7. How much do your whiplash symptoms interfere with social activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to socialize
8. How much do your whiplash symptoms interfere with sporting activities?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
9. How much do your whiplash symptoms interfere with non-sporting leisure activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
10. How much sadness/depression do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme sadness/depression
11. How much anger do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anger
12. How much anxiety do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anxiety
13. How much difficulty do you have concentrating as a result of your whiplash injury/symptoms?
No difficulty 0 1 2 3 4 5 6 7 8 9 10 Unable to concentrate
14. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
15. How much do your whiplash symptoms interfere with your work duties?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
16. How much do your whiplash symptoms interfere with your home duties?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
17. How much do your whiplash symptoms interfere with driving?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to travel in car
18. How much do your whiplash symptoms interfere with using public transport?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to use public transport
19. How tired do you feel as a result of your whiplash injury/symptoms?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Extreme tiredness all the time
20. How fatigued do you feel as a result of your whiplash injury/symptoms?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Extreme fatigue all the time
21. How much sadness do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme sadness
22. How much depression do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme depression
23. How do you feel your neck pain has changed since the injury?
Very much better Better Slightly better No change Slightly worse Worse Very much worse
A-1.3: Six-week Follow-up Questionnaire
The University Health Network WDQ Validation Study
Follow-Up Questionnaire (Six weeks)
STUDY NUMBER: _________________________
TODAY’S DATE: _______, _______, ________ (day) (month) (year)
6 week Follow-up - v. Apr29.08 – Page 125
We would like to remind you that this information is confidential and will not be released to AVIVA or anyone else. If you do not wish to answer a question, please tell me and we will go on to the next question. If you do not understand a question and/or instruction, please ask me to explain it to you.
SECTION C: In this section, we will be asking questions about your pain and its
intensity as caused by your accident-related injuries.
1. Which part(s) of your body was injured by the accident? (Check all that may apply)
Neck Face Abdomen/chest/groin Head Low back Leg(s) Shoulder(s) Hand(s) Arm(s) Foot/feet Mid back
2. Please rate your average neck pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
3. Please rate your average shoulder pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
4. Please rate your average low back pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
5. Please rate your average headache pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
6. Please rate your average arm pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
7. Please rate your average hand pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
1. Please rate your average face pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
2. Please rate your average leg pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
3. Please rate your average foot pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
4. Please rate your average mid back pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
5. Please rate your average abdomen/chest/groin pain in the past 24 hours on a pain scale of 0 to 10 where 0 means no pain at all and 10 means pain as bad as it could be.
No Pain 0 1 2 3 4 5 6 7 8 9 10 Pain as bad as could be
6. Did the accident cause any of the following symptoms? (check any that apply)
Anxiety or worry Anger Concentration or attention problems Difficulty moving your neck Dizziness or unsteadiness Feeling of numbness, tingling or pain in arms or hands Feeling of numbness, tingling or pain in legs or feet Hearing problems Memory problems or forgetfulness Pain when your neck is moved Sleep problems Sore jaw Unusual fatigue or tiredness Vision problems
SECTION D: In this section, we will be asking you questions about past and current health issues that are unrelated to your recent accident.
1. In general, would you say your health is:
Excellent Very Good Good Fair Poor
2. Compared to one week ago, how would you rate your health in general now?
Much better than one week ago
Somewhat better now than one week ago
About the same as one week ago
Somewhat worse now than one week ago
Much worse now than one week ago
3. The following questions are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much?
Yes, limited a lot
Yes, limited a little
No, not limited at all
a Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports
b Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf
c Lifting or carrying groceries
d Climbing several flights of stairs
e Climbing one flight of stairs
f Bending, kneeling, or stooping
g Walking more than a mile
h Walking several hundred yards
i Walking one hundred yards
j Bathing or dressing yourself
4. During the past week, how much of the time have you had any of the following problems with your work or other regular daily activities as a result of your physical health?
All of the time
Most of the time
Some of the time
A little of the time
None of the time
a Cut down on the amount of time you spent on work or other activities
b Accomplished less than you would like
c Were limited in the kind of work or other activities
d Had difficulty performing the work or other activities (for example, it took extra effort)
5. During the past week, how much of the time have you had any of the following problems with your work or other regular daily activities as a result of any emotional problems (such as feeling depressed or anxious)?
All of the time
Most of the time
Some of the time
A little of the time
None of the time
a Cut down on the amount of time you spent on work or other activities
b Accomplished less than you would like
c Did work or activities less carefully than usual
6. During the past week, to what extent has your physical health or emotional problems interfered with your normal social activities with family, friends, neighbours or groups?
Not at all Slightly Moderately Quite a bit Extremely
7. How much bodily pain have you had during the past week?
None Very mild Mild Moderate Severe Very severe
8. During the past week, how much did pain interfere with your normal work (including both work outside the home and housework)?
Not at all Slightly Moderately Quite a bit Extremely
9. These questions are about how you have felt and how things have been with you during the past week. For each question, please give the one answer that comes closest to the way you have been feeling.
How much of the time during the past week…
All of the time
Most of the time
Some of the time
A little of the time
None of the time
a Did you feel full of life?
b Have you been very nervous?
c Have you felt so down in the dumps that nothing could cheer you up?
d Have you felt calm and peaceful?
e Did you have a lot of energy?
f Have you felt downhearted and depressed?
g Did you feel worn out?
h Have you been happy?
i Did you feel tired?
10. During the past week, how much of the time has your physical health or emotional problems interfered with your social activities (like visiting friends, relatives, etc.)?
All of the time Most of the time Some of the time A little of the time None of the time
11. How TRUE or FALSE is each of the following statements for you?
Definitely true
Mostly true
Don’t know
Mostly false
Definitely false
a I seem to get sick a little easier than other people
b I am as healthy as anybody I know
c I expect my health to get worse
d My health is excellent
SECTION E: In this section, we will be asking you a question about your
expectations of recovery.
12. Do you think that your injury will…
get better soon get better slowly never get better don’t know
SECTION F: In this section, we will be asking you a question regarding how well
you believe your recovery is progressing.
1. How well do you feel you are recovering from your injuries?
Completely better Much improved Slightly improved No change Slightly worse Much worse Worse than ever
SECTION H: In this section, we will be asking you questions specifically about your daily activities and your feelings that may have been affected by your whiplash injuries.
Please circle a number in each section to indicate how you have been affected by the whiplash injury and symptoms. If one or more questions are not relevant to you, please leave that section blank.
2. How much pain do you have today?
No pain 0 1 2 3 4 5 6 7 8 9 10 Worst pain imaginable
3. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
4. How much do your whiplash symptoms interfere with your work/home/study duties?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to perform
5. How much do your whiplash symptoms interfere with driving or using public transport?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to travel in car/use public transport
6. How much do your whiplash symptoms interfere with sleep?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Cannot sleep
7. How tired/fatigued do you feel as a result of your whiplash injury/symptoms?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Extreme tiredness/fatigue all the time
8. How much do your whiplash symptoms interfere with social activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to socialize
9. How much do your whiplash symptoms interfere with sporting activities?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
10. How much do your whiplash symptoms interfere with non-sporting leisure activity?
Not at all 0 1 2 3 4 5 6 7 8 9 10 Unable to participate
11. How much sadness/depression do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme sadness/depression
12. How much anger do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anger
13. How much anxiety do you experience as a result of your whiplash injury/symptoms?
None 0 1 2 3 4 5 6 7 8 9 10 Extreme anxiety
14. How much difficulty do you have concentrating as a result of your whiplash injury/symptoms?
No difficulty 0 1 2 3 4 5 6 7 8 9 10 Unable to concentrate
SECTION I: In this section, we will be asking you questions about your mood.
Below is a list of some of the ways you may have felt or behaved. Please indicate how often you have felt this way during the past week by checking the appropriate space.
Rarely or none of the time (less than 1 day)
Some or a little of the time (1-2 days)
Occasionally or a moderate amount of time (3-4 days)
Most or all of the time (5-7 days)
1. I was bothered by things that usually don't bother me. 0 1 2 3
2. I did not feel like eating; my appetite was poor. 0 1 2 3
3. I felt that I could not shake off the blues even with help from my family or friends. 0 1 2 3
4. I felt that I was just as good as other people. 0 1 2 3
5. I had trouble keeping my mind on what I was doing. 0 1 2 3
6. I felt depressed. 0 1 2 3
7. I felt that everything I did was an effort. 0 1 2 3
8. I felt hopeful about the future. 0 1 2 3
9. I thought my life had been a failure. 0 1 2 3
10. I felt fearful. 0 1 2 3
11. My sleep was restless. 0 1 2 3
12. I was happy. 0 1 2 3
13. I talked less than usual. 0 1 2 3
14. I felt lonely. 0 1 2 3
15. People were unfriendly. 0 1 2 3
16. I enjoyed life. 0 1 2 3
17. I had crying spells. 0 1 2 3
18. I felt sad. 0 1 2 3
19. I felt that people disliked me. 0 1 2 3
20. I could not get "going." 0 1 2 3
SECTION J: In this last section, we would like to know a little about you.
1. Have you hired a lawyer or paralegal to help you with your claim?
No Yes
2. Are you currently working? No Yes
3. How do you feel your neck pain has changed since the injury?
Very much better Better Slightly better No change Slightly worse Worse Very much worse
Thank you for participating in this part of the study.
A-1.4: Addition to WIT Baseline Questionnaire
Baseline Study ID: _____________
THIS QUESTIONNAIRE IS DESIGNED TO HELP US BETTER UNDERSTAND HOW YOUR NECK PAIN AFFECTS YOUR ABILITY TO MANAGE EVERYDAY-LIFE ACTIVITIES. PLEASE MARK IN EACH SECTION THE ONE BOX THAT APPLIES TO YOU. ALTHOUGH YOU MAY CONSIDER THAT TWO OF THE STATEMENTS IN ANY ONE SECTION RELATE TO YOU, PLEASE MARK THE BOX THAT MOST CLOSELY DESCRIBES YOUR PRESENT-DAY SITUATION.
SECTION 1: Pain Intensity
I have no pain at the moment The pain is very mild at the moment The pain is moderate at the moment The pain is fairly severe at the moment The pain is very severe at the moment The pain is the worst imaginable at the moment
SECTION 2: Personal Care (Washing, Dressing etc.)
I can look after myself normally without causing extra pain I can look after myself normally but it causes extra pain It is painful to look after myself and I am slow and careful I need some help but manage most of my personal care I need help every day in most aspects of my personal care I do not get dressed, I wash with difficulty, and stay in bed
SECTION 3: Lifting
I can lift heavy weights without extra pain I can lift heavy weights but it causes extra pain Pain prevents me from lifting heavy weights off the floor but I can manage if they are conveniently positioned (e.g. on a table) Pain prevents me from lifting heavy weights, but I can manage light to medium weights if they are conveniently positioned I can lift only very light weights I cannot lift or carry anything at all
SECTION 4: Reading
I can read as much as I want with no pain in my neck I can read as much as I want with slight pain in my neck I can read as much as I want with moderate pain in my neck I cannot read as much as I want because of moderate pain in my neck I can hardly read at all because of severe pain in my neck I cannot read at all
Addition to WIT baseline - v. Oct06.08 – Page 137
SECTION H: In this section, we will be asking you questions regarding your neck pain and how it affects your everyday life.
WDQ Addition to UHN Whiplash Intervention Trial
SECTION 5: Headache
I have no headaches at all I have slight headaches which occur infrequently I have moderate headaches which occur infrequently I have moderate headaches which occur frequently I have severe headaches which occur frequently I have headaches almost all the time
SECTION 6: Concentration
I can concentrate fully when I want with no difficulty I can concentrate fully when I want to with slight difficulty I have a fair degree of difficulty in concentrating when I want to I have a lot of difficulty in concentrating when I want to I have a great deal of difficulty in concentrating when I want to I cannot concentrate at all
SECTION 7: Work
I can do as much work as I want to I can do my usual work, but no more I can do most of my usual work, but not all I cannot do my usual work I can hardly do any work at all I cannot do any work at all
SECTION 8: Driving
I can drive my car without any neck pain I can drive my car as long as I want with slight pain in my neck I can drive my car as long as I want with moderate pain in my neck I cannot drive my car as long as I want because of moderate pain in my neck I can hardly drive at all because of severe pain in my neck I cannot drive at all
SECTION 9: Sleeping
I have no trouble sleeping My sleep is slightly disturbed (less than 1 hour sleepless) My sleep is mildly disturbed (1-2 hours sleepless) My sleep is moderately disturbed (2-3 hours sleepless) My sleep is greatly disturbed (3-5 hours sleepless) My sleep is completely disturbed (5-7 hours sleepless)
SECTION 10: Recreation
I am able to engage in all my recreation activities with no neck pain at all I am able to engage in all my recreation activities with some neck pain I am able to engage in most, but not all, my recreation activities because of pain in my neck I am able to engage in few of my recreation activities because of pain in my neck I can hardly do any recreation activities because of pain in my neck I cannot do any recreation activities at all
SECTION I:
The following scales have been designed to find out about your neck pain and how it’s affecting you. Please answer ALL the scales, and mark ONE number on EACH scale that best describes how you feel.
8. Over the past week, on average, how would you rate your neck pain?
No pain 0 1 2 3 4 5 6 7 8 9 10 Worst pain possible
9. Over the past week, how much has your neck pain interfered with your daily activities (housework, washing, dressing, lifting, reading, driving)?
No interference 0 1 2 3 4 5 6 7 8 9 10 Unable to carry out activity
10. Over the past week, how much has your neck pain interfered with your ability to take part in recreational, social, and family activities?
No interference 0 1 2 3 4 5 6 7 8 9 10 Unable to carry out activity
11. Over the past week, how anxious (tense, uptight, irritable, difficulty in concentrating/relaxing) have you been feeling?
Not at all anxious 0 1 2 3 4 5 6 7 8 9 10 Extremely anxious
12. Over the past week, how depressed (down-in-the-dumps, sad, in low spirits, pessimistic, unhappy) have you been feeling?
Not at all depressed 0 1 2 3 4 5 6 7 8 9 10 Extremely depressed
13. Over the past week, how have you felt your work (both inside and outside the home) has affected (or would affect) your neck pain?
Have made it no worse 0 1 2 3 4 5 6 7 8 9 10 Have made it much worse
14. Over the past week, how much have you been able to control (reduce/help) your neck pain on your own?
Completely control it 0 1 2 3 4 5 6 7 8 9 10 No control whatsoever
14. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
15. How much do your whiplash symptoms interfere with your work duties?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
16. How much do your whiplash symptoms interfere with your home duties?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
17. How much do your whiplash symptoms interfere with driving?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to travel in car)
18. How much do your whiplash symptoms interfere with using public transport?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to use public transport)
19. How tired do you feel as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Extreme tiredness all the time)
20. How fatigued do you feel as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Extreme fatigue all the time)
21. How much sadness do you experience as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = None, 10 = Extreme sadness)
22. How much depression do you experience as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = None, 10 = Extreme depression)
SECTION G: In this section, we will be asking you a question regarding how well you believe your recovery is progressing.
9. How do you feel your neck pain has changed since the injury?
Very much better
Better
Slightly better
No change
Slightly worse
Worse
Very much worse
A-1.5: Addition to WIT Six-week Follow-up Questionnaire
WDQ Addition to UHN Whiplash Intervention Trial
Six-week Follow-up
Study ID: _____________
SECTION G: In this section, we will be asking you a question regarding how well you believe your recovery is progressing.
4. How do you feel your neck pain has changed since the injury?
Very much better
Better
Slightly better
No change
Slightly worse
Worse
Very much worse
5. How much do your whiplash symptoms interfere with your personal care (washing, dressing, etc)?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
6. How much do your whiplash symptoms interfere with your work duties?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
7. How much do your whiplash symptoms interfere with your home duties?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to perform)
8. How much do your whiplash symptoms interfere with driving?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to travel in car)
9. How much do your whiplash symptoms interfere with using public transport?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Unable to use public transport)
10. How tired do you feel as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Extreme tiredness all the time)
11. How fatigued do you feel as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = Not at all, 10 = Extreme fatigue all the time)
12. How much sadness do you experience as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = None, 10 = Extreme sadness)
13. How much depression do you experience as a result of your whiplash injury/symptoms?
0 1 2 3 4 5 6 7 8 9 10 (0 = None, 10 = Extreme depression)
Addition to WIT 6wk follow-up - v. Oct06.08
Appendix 2: Ethics Certificates
A-2.1: University Health Network Ethics Approval
A-2.1: University Health Network Ethics Approval (continued)
A-2.2: University of Toronto Ethics Approval
Appendix 3: Baseline WDQ Distributions
Appendix 4: COSMIN Checklist completed with criteria relevant to this thesis
COSMIN checklist with 4-point scale
Contact
CB Terwee, PhD, VU University Medical Center, Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, 1081 BT Amsterdam, The Netherlands
Website: www.cosmin.nl, www.emgo.nl
E-mail: cb.terwee@vumc.nl
Instructions
This version of the COSMIN checklist is recommended for use in systematic reviews of measurement properties. With this version it is possible to calculate overall methodological quality scores per study on a measurement property. A methodological quality score per box is obtained by taking the lowest rating of any item in a box ('worse score counts'). For example, if for a reliability study one item in the box 'Reliability' is scored poor, the methodological quality of that reliability study is rated as poor. The Interpretability box and the Generalizability box are mainly used as data extraction forms. We recommend to use the Interpretability box to extract all information on the interpretability issues described in this box (e.g. norm scores, floor-ceiling effects, minimal important change) of the instruments under study from the included articles. Similar, we recommend to use the Generalizability box to extract data on the characteristics of the study population and sampling procedure. Therefore no scoring system was developed for these boxes.
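The 'worse score counts' rule in the instructions above amounts to taking the minimum over an ordered rating scale. The following is a minimal sketch of that idea, not part of the COSMIN materials; the function name box_quality and the example ratings are hypothetical and chosen only for illustration.

```python
# Ratings ordered from worst to best, following the 4-point scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_quality(item_ratings):
    """Return the lowest rating among the rated items of one box.

    Hypothetical helper: `item_ratings` is a list of strings drawn
    from RATING_ORDER, one per rated checklist item.
    """
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # The position in RATING_ORDER defines which rating is "worse".
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical reliability study: a single 'poor' item makes the box 'poor'.
reliability_items = ["excellent", "good", "excellent", "poor", "good"]
print(box_quality(reliability_items))  # poor
```

Items that have no rating options (for example, a relevance question that is not used to rate quality) would simply be left out of the list passed to the helper.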
This scoring system is described in this paper:
Terwee CB, Mokkink LB, Knol DL, Ostelo RWJG, Bouter LM, de Vet HCW. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Quality of Life Research 2011, July 6 [epub ahead of print].
Step 1. Evaluated measurement properties in the article
Internal consistency: Box A
Reliability: Box B
Measurement error: Box C
Content validity: Box D
Structural validity: Box E
Hypotheses testing: Box F
Cross-cultural validity: Box G
Criterion validity: Box H
Responsiveness: Box I
Step 2. Determining if the statistical methods used in the article are based on CTT or IRT
Box General requirements for studies that applied Item Response Theory (IRT) models
(ratings: excellent, good, fair, poor)
1  Was the IRT model used adequately described? e.g. One Parameter Logistic Model (OPLM), Partial Credit Model (PCM), Graded Response Model (GRM)
   excellent: IRT model adequately described
   good: IRT model not adequately described
2  Was the computer software package used adequately described? e.g. RUMM2020, WINSTEPS, OPLM, MULTILOG, PARSCALE, BILOG, NLMIXED
   excellent: Software package adequately described
   good: Software package not adequately described
3  Was the method of estimation used adequately described? e.g. conditional maximum likelihood (CML), marginal maximum likelihood (MML)
   excellent: Method of estimation adequately described
   good: Method of estimation not adequately described
4  Were the assumptions for estimating parameters of the IRT model checked? e.g. unidimensionality, local independence, and item fit (e.g. differential item functioning (DIF))
   excellent: assumptions of the IRT model checked
   good: assumptions of the IRT model partly checked
   fair: assumptions of the IRT model not checked or unknown

To obtain a total score for the methodological quality of studies that use IRT methods, the 'worse score counts' algorithm should be applied to the IRT box in combination with the box of the measurement property that was evaluated in the IRT study. For example, if IRT methods are used to study internal consistency and item 4 in the IRT box is scored fair, while the items in the internal consistency box (box A) are all scored as good or excellent, the methodological quality score for internal consistency will be fair. However, if any of the items in box A is scored poor, the methodological quality score for internal consistency will be poor.
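The combination rule for IRT studies can be illustrated with a short sketch that pools the IRT box with the box of the measurement property before taking the lowest rating. The function names and the item ratings below are hypothetical; they mirror the worked example in the text (item 4 of the IRT box rated fair) rather than any study assessed in this thesis.

```python
# Ratings ordered from worst to best, following the 4-point scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst(ratings):
    """Lowest rating in a non-empty list of ratings."""
    return min(ratings, key=RATING_ORDER.index)


def irt_study_quality(irt_box_items, property_box_items):
    """Pool the IRT box with the measurement-property box, then take the worst.

    Hypothetical helper illustrating the combined 'worse score counts' rule.
    """
    return worst(list(irt_box_items) + list(property_box_items))


# IRT box: items 1-3 rated good or excellent, item 4 rated fair.
irt_box = ["excellent", "excellent", "good", "fair"]
# Box A (internal consistency): all items good or excellent.
box_a = ["good", "excellent", "good", "excellent"]
print(irt_study_quality(irt_box, box_a))  # fair

# If any Box A item is poor, the combined score drops to poor.
box_a_with_poor = box_a[:-1] + ["poor"]
print(irt_study_quality(irt_box, box_a_with_poor))  # poor
```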
Step 3. Determining if a study meets the standards for good methodological quality
Box A. Internal consistency
(ratings: excellent, good, fair, poor)
1  Does the scale consist of effect indicators, i.e. is it based on a reflective model?
Design requirements
2  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
3  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
4  Was the sample size included in the internal consistency analysis adequate?
   excellent: Adequate sample size (≥100)
   good: Good sample size (50-99)
   fair: Moderate sample size (30-49)
   poor: Small sample size (<30)
5  Was the unidimensionality of the scale checked? i.e. was factor analysis or IRT model applied?
   excellent: Factor analysis performed in the study population
   good: Authors refer to another study in which factor analysis was performed in a similar study population
   fair: Authors refer to another study in which factor analysis was performed, but not in a similar study population
   poor: Factor analysis NOT performed and no reference to another study
6  Was the sample size included in the unidimensionality analysis adequate?
   excellent: 7* #items and ≥100
   good: 5* #items and ≥100 OR 6-7* #items but <100
   fair: 5* #items but <100
   poor: <5* #items
7  Was an internal consistency statistic calculated for each (unidimensional) (sub)scale separately?
   excellent: Internal consistency statistic calculated for each subscale separately
   poor: Internal consistency statistic NOT calculated for each subscale separately
8  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
9  for Classical Test Theory (CTT), continuous scores: Was Cronbach's alpha calculated?
   excellent: Cronbach's alpha calculated
   fair: Only item-total correlations calculated
   poor: No Cronbach's alpha and no item-total correlations calculated
10  for CTT, dichotomous scores: Was Cronbach's alpha or KR-20 calculated?
   excellent: Cronbach's alpha or KR-20 calculated
   fair: Only item-total correlations calculated
   poor: No Cronbach's alpha or KR-20 and no item-total correlations calculated
11  for IRT: Was a goodness of fit statistic at a global level calculated? E.g. χ2, reliability coefficient of estimated latent trait value (index of (subject or item) separation)
   excellent: Goodness of fit statistic at a global level calculated
   poor: Goodness of fit statistic at a global level NOT calculated
NB. Item 1 is used to determine whether internal consistency is relevant for the instrument under study. It is not used to rate the quality of the study.
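As an illustration of how the sample-size item in this box maps a study's sample size onto the 4-point scale (100 or more, 50-99, 30-49, fewer than 30), the thresholds can be written as a small function. This is a sketch only; the name sample_size_rating is hypothetical, and the same cut-offs appear in the generic sample-size items of several other boxes.

```python
def sample_size_rating(n):
    """Rate a sample size using the generic checklist cut-offs.

    Hypothetical helper: >=100 -> excellent, 50-99 -> good,
    30-49 -> fair, <30 -> poor.
    """
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n >= 100:
        return "excellent"
    if n >= 50:
        return "good"
    if n >= 30:
        return "fair"
    return "poor"


# Hypothetical sample sizes, one per rating band.
for n in (120, 75, 40, 12):
    print(n, sample_size_rating(n))
```

The rating returned here is only one item of the box; the overall box score still follows the lowest rating across all rated items.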
Box B. Reliability: relative measures (including test-retest reliability, inter-rater reliability and intra-rater reliability)
(ratings: excellent, good, fair, poor)
Design requirements
1  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
2  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
3  Was the sample size included in the analysis adequate?
   excellent: Adequate sample size (≥100)
   good: Good sample size (50-99)
   fair: Moderate sample size (30-49)
   poor: Small sample size (<30)
4  Were at least two measurements available?
   excellent: At least two measurements
   poor: Only one measurement
5  Were the administrations independent?
   excellent: Independent measurements
   good: Assumable that the measurements were independent
   fair: Doubtful whether the measurements were independent
   poor: measurements NOT independent
6  Was the time interval stated?
   excellent: Time interval stated
   fair: Time interval NOT stated
7  Were patients stable in the interim period on the construct to be measured?
   excellent: Patients were stable (evidence provided)
   good: Assumable that patients were stable
   fair: Unclear if patients were stable
   poor: Patients were NOT stable
8  Was the time interval appropriate?
   excellent: Time interval appropriate
   fair: Doubtful whether time interval was appropriate
   poor: Time interval NOT appropriate
9  Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
   excellent: Test conditions were similar (evidence provided)
   good: Assumable that test conditions were similar
   fair: Unclear if test conditions were similar
   poor: Test conditions were NOT similar
10  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
11  for continuous scores: Was an intraclass correlation coefficient (ICC) calculated?
   excellent: ICC calculated and model or formula of the ICC is described
   good: ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred
   fair: Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred
   poor: No ICC or Pearson or Spearman correlations calculated
12  for dichotomous/nominal/ordinal scores: Was kappa calculated?
   excellent: Kappa calculated
   poor: Only percentage agreement calculated
13  for ordinal scores: Was a weighted kappa calculated?
   excellent: Weighted Kappa calculated
   fair: Unweighted Kappa calculated
   poor: Only percentage agreement calculated
14  for ordinal scores: Was the weighting scheme described? e.g. linear, quadratic
   excellent: Weighting scheme described
   good: Weighting scheme NOT described
Box C. Measurement error: absolute measures
(ratings: excellent, good, fair, poor)
Design requirements
1  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
2  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
3  Was the sample size included in the analysis adequate?
   excellent: Adequate sample size (≥100)
   good: Good sample size (50-99)
   fair: Moderate sample size (30-49)
   poor: Small sample size (<30)
4  Were at least two measurements available?
   excellent: At least two measurements
   poor: Only one measurement
5  Were the administrations independent?
   excellent: Independent measurements
   good: Assumable that the measurements were independent
   fair: Doubtful whether the measurements were independent
   poor: measurements NOT independent
6  Was the time interval stated?
   excellent: Time interval stated
   fair: Time interval NOT stated
7  Were patients stable in the interim period on the construct to be measured?
   excellent: Patients were stable (evidence provided)
   good: Assumable that patients were stable
   fair: Unclear if patients were stable
   poor: Patients were NOT stable
8  Was the time interval appropriate?
   excellent: Time interval appropriate
   fair: Doubtful whether time interval was appropriate
   poor: Time interval NOT appropriate
9  Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions
   excellent: Test conditions were similar (evidence provided)
   good: Assumable that test conditions were similar
   fair: Unclear if test conditions were similar
   poor: Test conditions were NOT similar
10  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
11  for CTT: Was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) calculated?
   excellent: SEM, SDC, or LoA calculated
   good: Possible to calculate LoA from the data presented
   poor: SEM calculated based on Cronbach's alpha, or on SD from another population
Box D. Content validity (including face validity)
(ratings: excellent, good, fair, poor)
General requirements
1  Was there an assessment of whether all items refer to relevant aspects of the construct to be measured?
   excellent: Assessed if all items refer to relevant aspects of the construct to be measured
   fair: Aspects of the construct to be measured poorly described AND this was not taken into consideration
   poor: NOT assessed if all items refer to relevant aspects of the construct to be measured
2  Was there an assessment of whether all items are relevant for the study population? (e.g. age, gender, disease characteristics, country, setting)
   excellent: Assessed if all items are relevant for the study population in adequate sample size (≥10)
   good: Assessed if all items are relevant for the study population in moderate sample size (5-9)
   fair: Assessed if all items are relevant for the study population in small sample size (<5)
   poor: NOT assessed if all items are relevant for the study population OR target population not involved
3  Was there an assessment of whether all items are relevant for the purpose of the measurement instrument? (discriminative, evaluative, and/or predictive)
   excellent: Assessed if all items are relevant for the purpose of the application
   good: Purpose of the instrument was not described but assumed
   fair: NOT assessed if all items are relevant for the purpose of the application
4  Was there an assessment of whether all items together comprehensively reflect the construct to be measured?
   excellent: Assessed if all items together comprehensively reflect the construct to be measured
   fair: No theoretical foundation of the construct and this was not taken into consideration
   poor: NOT assessed if all items together comprehensively reflect the construct to be measured
5  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Box E. Structural validity
(ratings: excellent, good, fair, poor)
1  Does the scale consist of effect indicators, i.e. is it based on a reflective model?
Design requirements
2  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
3  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
4  Was the sample size included in the analysis adequate?
   excellent: 7* #items and ≥100
   good: 5* #items and ≥100 OR 5-7* #items but <100
   fair: 5* #items but <100
   poor: <5* #items
5  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study (e.g. rotation method not described)
   poor: Other important methodological flaws in the design or execution of the study (e.g. inappropriate rotation method)
Statistical methods
6  for CTT: Was exploratory or confirmatory factor analysis performed?
   excellent: Exploratory or confirmatory factor analysis performed and type of factor analysis appropriate in view of existing information
   good: Exploratory factor analysis performed while confirmatory would have been more appropriate
   poor: No exploratory or confirmatory factor analysis performed
7  for IRT: Were IRT tests for determining the (uni-)dimensionality of the items performed?
   excellent: IRT test for determining (uni)dimensionality performed
   poor: IRT test for determining (uni)dimensionality NOT performed
Box F. Hypotheses testing
(ratings: excellent, good, fair, poor)
Design requirements
1  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
2  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
3  Was the sample size included in the analysis adequate?
   excellent: Adequate sample size (≥100 per analysis)
   good: Good sample size (50-99 per analysis)
   fair: Moderate sample size (30-49 per analysis)
   poor: Small sample size (<30 per analysis)
4  Were hypotheses regarding correlations or mean differences formulated a priori (i.e. before data collection)?
   excellent: Multiple hypotheses formulated a priori
   good: Minimal number of hypotheses formulate a priori
   fair: Hypotheses vague or not formulated but possible to deduce what was expected
   poor: Unclear what was expected
5  Was the expected direction of correlations or mean differences included in the hypotheses?
   excellent: Expected direction of the correlations or differences stated
   good: Expected direction of the correlations or differences NOT stated
6  Was the expected absolute or relative magnitude of correlations or mean differences included in the hypotheses?
   excellent: Expected magnitude of the correlations or differences stated
   good: Expected magnitude of the correlations or differences NOT stated
7  for convergent validity: Was an adequate description provided of the comparator instrument(s)?
   excellent: Adequate description of the constructs measured by the comparator instrument(s)
   good: Adequate description of most of the constructs measured by the comparator instrument(s)
   fair: Poor description of the constructs measured by the comparator instrument(s)
   poor: NO description of the constructs measured by the comparator instrument(s)
8  for convergent validity: Were the measurement properties of the comparator instrument(s) adequately described?
   excellent: Adequate measurement properties of the comparator instrument(s) in a population similar to the study population
   good: Adequate measurement properties of the comparator instrument(s) but not sure if these apply to the study population
   fair: Some information on measurement properties (or reference to a study on measurement properties) of the comparator instrument(s) in any study population
   poor: No information on the measurement properties of the comparator instrument(s)
9  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study (e.g. only data presented on a comparison with an instrument that measures another construct)
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
10  Were design and statistical methods adequate for the hypotheses to be tested?
   excellent: Statistical methods applied appropriate
   good: Assumable that statistical methods were appropriate, e.g. Pearson correlations applied, but distribution of scores or mean (SD) not presented
   fair: Statistical methods applied NOT optimal
   poor: Statistical methods applied NOT appropriate
Box G. Cross-cultural validity
(ratings: excellent, good, fair, poor)
Design requirements
1  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
2  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
3  Was the sample size included in the analysis adequate?
   excellent: CTT: 7* #items and ≥100; IRT: ≥200 per group
   good: CTT: 5* #items and ≥100 OR 5-7* #items but <100; IRT: ≥200 in 1 group and 100-199 in 1 group
   fair: CTT: 5* #items but <100; IRT: 100-199 per group
   poor: CTT: <5* #items; IRT: <100 in 1 or both groups
4  Were both the original language in which the HR-PRO instrument was developed, and the language in which the HR-PRO instrument was translated described?
   excellent: Both source language and target language described
   poor: Source language NOT known
5  Was the expertise of the people involved in the translation process adequately described? e.g. expertise in the disease(s) involved, expertise in the construct to be measured, expertise in both languages
   excellent: Expertise of the translators described with respect to disease, construct, and language
   good: Expertise of the translators with respect to disease or construct poor or not described
   fair: Expertise of the translators with respect to language not described
6  Did the translators work independently from each other?
   excellent: Translators worked independent
   good: Assumable that the translators worked independent
   fair: Unclear whether translators worked independent
   poor: Translators worked NOT independent
7  Were items translated forward and backward?
   excellent: Multiple forward and multiple backward translations
   good: Multiple forward translations but one backward translation
   fair: One forward and one backward translation
   poor: Only a forward translation
8  Was there an adequate description of how differences between the original and translated versions were resolved?
   excellent: Adequate description of how differences between translators were resolved
   good: Poorly or NOT described how differences between translators were resolved
9  Was the translation reviewed by a committee (e.g. original developers)?
   excellent: Translation reviewed by a committee (involving other people than the translators, e.g. the original developers)
   good: Translation NOT reviewed by (such) a committee
10  Was the HR-PRO instrument pre-tested (e.g. cognitive interviews) to check interpretation, cultural relevance of the translation, and ease of comprehension?
   excellent: Translated instrument pre-tested in the target population
   good: Translated instrument pre-tested, but unclear if this was done in the target population
   fair: Translated instrument pre-tested, but NOT in the target population
   poor: Translated instrument NOT pre-tested
11  Was the sample used in the pre-test adequately described?
   excellent: Sample used in the pre-test adequately described
   fair: Sample used in the pre-test NOT (adequately) described
12  Were the samples similar for all characteristics except language and/or cultural background?
   excellent: Shown that samples were similar for all characteristics except language/culture
   good: Stated (but not shown) that samples were similar for all characteristics except language/culture
   fair: Unclear whether samples were similar for all characteristics except language/culture
   poor: Samples were NOT similar for all characteristics except language/culture
13  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
14  for CTT: Was confirmatory factor analysis performed?
   excellent: Multiple-group confirmatory factor analysis performed
   poor: Multiple-group confirmatory factor analysis NOT performed
15  for IRT: Was differential item function (DIF) between language groups assessed?
   excellent: DIF between language groups assessed
   poor: DIF between language groups NOT assessed
Box H. Criterion validity
(ratings: excellent, good, fair, poor)
Design requirements
1  Was the percentage of missing items given?
   excellent: Percentage of missing items described
   good: Percentage of missing items NOT described
2  Was there a description of how missing items were handled?
   excellent: Described how missing items were handled
   good: Not described but it can be deduced how missing items were handled
   fair: Not clear how missing items were handled
3  Was the sample size included in the analysis adequate?
   excellent: Adequate sample size (≥100)
   good: Good sample size (50-99)
   fair: Moderate sample size (30-49)
   poor: Small sample size (<30)
4  Can the criterion used or employed be considered as a reasonable 'gold standard'?
   excellent: Criterion used can be considered an adequate 'gold standard' (evidence provided)
   good: No evidence provided, but assumable that the criterion used can be considered an adequate 'gold standard'
   fair: Unclear whether the criterion used can be considered an adequate 'gold standard'
   poor: Criterion used can NOT be considered an adequate 'gold standard'
5  Were there any important flaws in the design or methods of the study?
   excellent: No other important methodological flaws in the design or execution of the study
   fair: Other minor methodological flaws in the design or execution of the study
   poor: Other important methodological flaws in the design or execution of the study
Statistical methods
6  for continuous scores: Were correlations, or the area under the receiver operating curve calculated?
   excellent: Correlations or AUC calculated
   poor: Correlations or AUC NOT calculated
7  for dichotomous scores: Were sensitivity and specificity determined?
   excellent: Sensitivity and specificity calculated
   poor: Sensitivity and specificity NOT calculated
Box I. Responsiveness
Rating scale: excellent, good, fair, poor

Design requirements
1. Was the percentage of missing items given?
   Excellent: Percentage of missing items described
   Good: Percentage of missing items NOT described

2. Was there a description of how missing items were handled?
   Excellent: Described how missing items were handled
   Good: Not described but it can be deduced how missing items were handled
   Fair: Not clear how missing items were handled

3. Was the sample size included in the analysis adequate?
   Excellent: Adequate sample size (≥100)
   Good: Good sample size (50-99)
   Fair: Moderate sample size (30-49)
   Poor: Small sample size (<30)

4. Was a longitudinal design with at least two measurements used?
   Excellent: Longitudinal design used
   Poor: No longitudinal design used

5. Was the time interval stated?
   Excellent: Time interval adequately described
   Poor: Time interval NOT described
6. If anything occurred in the interim period (e.g. intervention, other relevant events), was it adequately described?
   Excellent: Anything that occurred during the interim period (e.g. treatment) adequately described
   Good: Assumable what occurred during the interim period
   Fair: Unclear or NOT described what occurred during the interim period

7. Was a proportion of the patients changed (i.e. improvement or deterioration)?
   Excellent: Part of the patients were changed (evidence provided)
   Good: NO evidence provided, but assumable that part of the patients were changed
   Fair: Unclear if part of the patients were changed
   Poor: Patients were NOT changed

Design requirements for hypotheses testing
For constructs for which a gold standard was not available:

8. Were hypotheses about changes in scores formulated a priori (i.e. before data collection)?
   Excellent: Hypotheses formulated a priori
   Fair: Hypotheses vague or not formulated but possible to deduce what was expected
   Poor: Unclear what was expected

9. Was the expected direction of correlations or mean differences of the change scores of HR-PRO instruments included in these hypotheses?
   Excellent: Expected direction of the correlations or differences stated
   Good: Expected direction of the correlations or differences NOT stated

10. Was the expected absolute or relative magnitude of correlations or mean differences of the change scores of HR-PRO instruments included in these hypotheses?
   Excellent: Expected magnitude of the correlations or differences stated
   Good: Expected magnitude of the correlations or differences NOT stated
11. Was an adequate description provided of the comparator instrument(s)?
   Excellent: Adequate description of the constructs measured by the comparator instrument(s)
   Fair: Poor description of the constructs measured by the comparator instrument(s)
   Poor: NO description of the constructs measured by the comparator instrument(s)

12. Were the measurement properties of the comparator instrument(s) adequately described?
   Excellent: Adequate measurement properties of the comparator instrument(s) in a population similar to the study population
   Good: Adequate measurement properties of the comparator instrument(s) but not sure if these apply to the study population
   Fair: Some information on measurement properties (or a reference to a study on measurement properties) of the comparator instrument(s) in any study population
   Poor: NO information on the measurement properties of the comparator instrument(s)

13. Were there any important flaws in the design or methods of the study?
   Excellent: No other important methodological flaws in the design or execution of the study
   Fair: Other minor methodological flaws in the design or execution of the study (e.g. only data presented on a comparison with an instrument that measures another construct)
   Poor: Other important methodological flaws in the design or execution of the study

Statistical methods
14. Were design and statistical methods adequate for the hypotheses to be tested?
   Excellent: Statistical methods applied appropriate
   Fair: Statistical methods applied NOT optimal
   Poor: Statistical methods applied NOT appropriate
Design requirement for comparison to a gold standard
For constructs for which a gold standard was available:

15. Can the criterion for change be considered as a reasonable gold standard?
   Excellent: Criterion used can be considered an adequate 'gold standard' (evidence provided)
   Good: No evidence provided, but assumable that the criterion used can be considered an adequate 'gold standard'
   Fair: Unclear whether the criterion used can be considered an adequate 'gold standard'
   Poor: Criterion used can NOT be considered an adequate 'gold standard'

16. Were there any important flaws in the design or methods of the study?
   Excellent: No other important methodological flaws in the design or execution of the study
   Fair: Other minor methodological flaws in the design or execution of the study
   Poor: Other important methodological flaws in the design or execution of the study

Statistical methods
17. For continuous scores: Were correlations between change scores, or the area under the Receiver Operator Curve (ROC) curve calculated?
   Excellent: Correlations or Area under the ROC Curve (AUC) calculated
   Poor: Correlations or AUC NOT calculated

18. For dichotomous scales: Were sensitivity and specificity (changed versus not changed) determined?
   Excellent: Sensitivity and specificity calculated
   Poor: Sensitivity and specificity NOT calculated
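
The statistics named in the responsiveness items above (sensitivity and specificity for changed versus not changed, and the area under the ROC curve for continuous change scores) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the gold standard is assumed to be available as a true/false "changed" label per patient, and higher change scores are assumed to indicate more change. The AUC is computed as the proportion of (changed, not changed) patient pairs in which the changed patient has the higher change score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    changed_truth: Sequence[bool], flagged: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of an instrument's changed/not-changed
    classification (flagged) against a gold standard (changed_truth)."""
    if len(changed_truth) != len(flagged):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(changed_truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # changed, flagged
    fn = sum(1 for t, f in pairs if t and not f)      # changed, missed
    tn = sum(1 for t, f in pairs if not t and not f)  # unchanged, not flagged
    fp = sum(1 for t, f in pairs if not t and f)      # unchanged, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one changed and one unchanged patient")
    return tp / (tp + fn), tn / (tn + fp)


def auc(change_scores: Sequence[float], changed_truth: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Assumption: higher change scores indicate more change.
    """
    if len(change_scores) != len(changed_truth):
        raise ValueError("inputs must have the same length")
    pos = [s for s, t in zip(change_scores, changed_truth) if t]
    neg = [s for s, t in zip(change_scores, changed_truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one changed and one unchanged patient")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of patients, which is acceptable for the sample sizes discussed in the box; a rank-based implementation would be preferable for large data sets.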
Interpretability

We recommend using the Interpretability box to extract all information on the interpretability issues described in this box of the instruments under study from the included articles.

Box Interpretability
Percentage of missing items
Description of how missing items were handled
Distribution of the (total) scores
Percentage of the respondents who had the lowest possible (total) score
Percentage of the respondents who had the highest possible (total) score
Scores and change scores (i.e. means and SD) for relevant (sub)groups, e.g. for normative groups, subgroups of patients, or the general population
Minimal Important Change (MIC) or Minimal Important Difference (MID)
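
Two of the interpretability items above, the percentage of respondents with the lowest and with the highest possible (total) score, are simple to compute from raw totals. The sketch below shows one way to do so; the function name and the score range in the usage example are hypothetical and must be replaced by the actual range of the instrument under study.

```python
from typing import Sequence, Tuple


def floor_ceiling_percentages(
    total_scores: Sequence[float], min_possible: float, max_possible: float
) -> Tuple[float, float]:
    """Return the percentage of respondents at the lowest possible score
    (floor) and at the highest possible score (ceiling)."""
    if not total_scores:
        raise ValueError("no scores supplied")
    n = len(total_scores)
    at_floor = sum(1 for s in total_scores if s == min_possible)
    at_ceiling = sum(1 for s in total_scores if s == max_possible)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


if __name__ == "__main__":
    # Hypothetical totals on an assumed 0-100 scale.
    floor_pct, ceiling_pct = floor_ceiling_percentages([0, 0, 40, 100], 0, 100)
    print(floor_pct, ceiling_pct)
```
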
Generalizability

We recommend using the Generalizability box to extract data on the characteristics of the study populations and sampling procedures of the included studies.

Box Generalizability
Median or mean age (with standard deviation or range)
Distribution of sex
Important disease characteristics (e.g. severity, status, duration) and description of treatment
Setting(s) in which the study was conducted (e.g. general population, primary care or hospital/rehabilitation care)
Countries in which the study was conducted
Language in which the HR-PRO instrument was evaluated
Method used to select patients (e.g. convenience, consecutive, or random)
Percentage of missing responses (response rate)