Post on 23-Feb-2018
NeuroCog FX® - computerized screening of cognitive functions in epilepsy patients
Christian Hoppe*,1, Klaus Fliessbach1, Uwe Schlegel2, Christian E. Elger1, Christoph Helmstaedter1
1 Department of Epileptology, University of Bonn Medical Centre, Sigmund-Freud-Str. 25,
53105 Bonn, Germany. 2 Department of Neurology, Knappschaftskrankenhaus, Ruhr-University, Bochum, Germany.
* Corresponding author. Fax: ++49 (0)228 / 287-16294.
E-mail address: christian.hoppe@ukb.uni-bonn.de (C. Hoppe).
Word count: 6,742 words (text body only)
NeuroCog FX® - Hoppe et al.
Abstract (145 words)
NeuroCog FX®, a computerized neuropsychological screening for serial examinations of patients
with epilepsy and other neurological diseases, was developed to fill the gap between unspecific
ratings and comprehensive assessments. Eight subtests address attention, working memory,
verbal and figural memory, and language. The test duration is less than 30 minutes. In research
contexts, the test can be applied at multiple sites by non-academic personnel. Normative data
were recorded from healthy subjects (N=244; age range=16-75 years; retest: N=44; validation:
N=40) and unselected patients from an Epileptology unit (N=212; retest: N=94; validation:
N=126). Psychometric analyses confirmed sufficient reliability and concurrent validity,
particularly in patients. NeuroCog FX® memory and overall performance scores showed “fair” to
“good” diagnostic utility with regard to deficits revealed by established tests. NeuroCog FX®
provides reliable and valid measures of cognitive performance and may be used in clinical and
research contexts as a screening instrument.
Keywords:
Epilepsy, cognition, memory, computer-based testing
1. Introduction
The maintenance or restitution of cognitive functions is a major therapeutic aim in the treatment
of epilepsy and other neurological diseases. In epilepsy, cognitive performance may be impaired
by relatively stable factors such as focal brain lesions (e.g. congenital malformation) but also by
dynamic factors such as underlying progressive diseases (e.g. encephalitis, tumor), seizure
activity, adverse effects of antiepileptic drugs, and effects of epilepsy surgery [1,2,3]. Besides
seizure control, neuropsychological functioning is a key determinant of health-related quality of
life and, as such, a major factor in treatment success. Furthermore, neuropsychological alterations
may be important indicators of latent disease dynamics. In other neurological conditions such as
brain tumors [4], dementia [5], multiple sclerosis [6], or Parkinson’s disease [7] the role of
cognitive functioning and the need for adequate neuropsychological evaluation have been
recognized as well.
For valid individual diagnostic evaluation (e.g. the presurgical work-up in epilepsy),
comprehensive neuropsychological assessment by experienced neuropsychologists is essential
[8]. However, in other clinical or research contexts this ‘gold standard’ may not be required. For
example, neurologists in private practice may want to select patients for more comprehensive
neuropsychological examination based on an objective and economical measure. Likewise,
administering a complete test battery is impractical for multiple follow-up examinations of
cognitive performance. Similarly, in neuropharmacological research, serial extensive individual
neuropsychological evaluations may be inappropriate at an early stage of drug development.
There is a need for economical yet objective, reliable, valid, standardized, and
appropriate screening instruments for serial cognitive examinations of patients with epilepsy and
other neurological diseases. Notably, screening instruments are not developed to replace
established neuropsychological tests but to offer alternatives for potentially inadequate
‘measures’ of cognitive performance and change (e.g. global change rating scales). In particular,
the test duration should be short (below 30 minutes) and non-academic personnel (e.g. study
nurses, doctor’s assistants, medical students) should be able to administer, score, and file the test
results. However, the individual diagnostic evaluation generally requires professional
neuropsychological education and experience.
Computerized testing appears promising for the purpose of screening patients and might fill the
gap between unspecific ratings and comprehensive neuropsychological assessments [9,10,11]. The
software defines the test procedure, allowing highly standardized administration by different
testers at different sites. Scoring and filing of the data are automated, which increases
objectivity and facilitates scientific use. Furthermore, random selection of items from a larger
pool allows multiple serial examinations at short follow-up intervals. In contrast, most
established paper-pencil tests have few, if any, validated parallel versions.
Several computer-based test systems have been published during the last two decades [12].
However, these tools have so far had little impact on neuropsychological research in epilepsy. A
PUBMED search identified only eleven studies using computerized cognitive testing in epilepsy
patients during the last five years (up to July 15, 2009; search term: ‘computer* cogniti*
epilep*’; identified batteries: BARS, CalCAP, CDR, CNS Vital Signs, FePsy, RTB [12]), although the
number of scientific publications is not necessarily representative of clinical use.
Here we report on the development and psychometric evaluation of NeuroCog FX®
(‘neurocognitive effects’), a computer-based neuropsychological screening battery for serial
clinical or scientific examinations of individual patients. Based on previous data, the test was
introduced in a German paper in 2006 [13]. The eight subtests refer to well-established
neuropsychological paradigms and address four separate cognitive domains which were selected
for their clinical relevance in patients with epilepsy and brain tumors (primary CNS lymphomas):
attention, working memory, memory (comprising verbal and figural learning and recognition),
and language [14,15]. The two memory subtests are based on an earlier version developed in our
unit, which has been shown to be sensitive to the immediate effects of high-intensity vagus nerve
stimulation and to postictal material-specific memory disturbance after lateralized focal seizures
[16,17]. Besides its application in epilepsy patients, the test is presently used in studies of the
Glioma Network, a multicentre research consortium of the German Cancer Foundation, and in
several other neurological studies (e.g. myotonic dystrophy).
2. Methods
2.1. NeuroCog FX®
Table 1 shows the functional domains being addressed, descriptions of the computerized tasks,
and the measures being recorded and standardized.
> Table 1
The test system was programmed in Borland Delphi 6 by one of the authors (C. Hoppe) and runs
on current PCs (or laptops) operating under MS Windows 95 or higher. Hardware requirements
(e.g. graphic resolution) are specified in the technical manual. At the beginning of each subtest a
short instruction is shown which may be complemented by the examiner according to the test
manual. The Two-Back test instruction includes a short demonstration. All other tests are
administered right after the instruction without further familiarization or practice. In the most
recent version, all responses of the subject are recorded via the keyboard, i.e. no paper or
additional hardware (e.g. mouse, touchpad) is needed. To accommodate subjects with motor
impairment, the spacebar (the largest key) was defined as the standard response key; only the
Digit Span test requires data entry via the number keys. The examiner is permitted to assist the
subject if necessary. The test is administered in German; an English version is in preparation.
The test was administered in patients and healthy subjects by non-academic personnel (doctoral
medical students) under the supervision of experienced neuropsychologists. Patients were tested
in a seated position in the neuropsychological lab rooms of our department. Control subjects
underwent computerized testing in various quiet and undisturbed locations, whereas their
established neuropsychological tests were performed in the lab rooms. Test administrators were
instructed to ensure optimal and constant lighting conditions, always remained present during
testing, and did not perform group tests.
The subtests provide raw scores, i.e. numbers of correct responses and errors, or reaction times,
respectively. The Digit Span test is scored as the maximum length of correctly reproduced digit
sequences (i.e., span). The recognition memory scores are calculated as hits minus (0.5*false
alarms), with hits being correctly confirmed items and false alarms being erroneously confirmed
distracters. Thus, random or indiscriminate response strategies (e.g. confirming or rejecting
every item) likewise result in a score of 0; perfect performance equals the number of learning
items (12 or 7, respectively); and scores markedly below 0 may indicate possible malingering [18].
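As a quick check of the scoring rule, the sketch below computes the recognition score in Python. It is illustrative only (the test itself was written in Delphi), and the function names are hypothetical. The distracter counts in the example are an assumption: the text does not state them, but confirming every item yields a score of 0 only if there are twice as many distracters as learning items.

```python
def recognition_score(hits: int, false_alarms: int) -> float:
    """Recognition score = hits - 0.5 * false alarms (rule as described in the text)."""
    if hits < 0 or false_alarms < 0:
        raise ValueError("counts must be non-negative")
    return hits - 0.5 * false_alarms


def score_for_uniform_strategy(n_targets: int, n_distracters: int, confirm_all: bool) -> float:
    """Score of a respondent who confirms (or rejects) every single item."""
    if confirm_all:
        # every target is a hit, every distracter is a false alarm
        return recognition_score(hits=n_targets, false_alarms=n_distracters)
    # rejecting everything produces neither hits nor false alarms
    return recognition_score(hits=0, false_alarms=0)


if __name__ == "__main__":
    # 12 learning items (Verbal Memory); 24 distracters is a hypothetical count
    print(recognition_score(hits=12, false_alarms=0))              # perfect performance: 12.0
    print(score_for_uniform_strategy(12, 24, confirm_all=True))    # 0.0
    print(score_for_uniform_strategy(12, 24, confirm_all=False))   # 0.0
    print(recognition_score(hits=2, false_alarms=10))              # -3.0, markedly below 0
```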
For the evaluation of validity and reliability, the Phonematic Fluency test was initially
administered in written form (i.e., patients writing words on a sheet of paper) with a standard
initial letter (P) according to an established German testing procedure (Leistungs-Prüf-System,
Subtest 6, by Horn [19]). However, written administration would exclude patients with motor
impairments from being assessed. Furthermore, serial follow-up assessments require randomly
changing initial letters. According to our previous neuropsychological experience with epilepsy
patients, test performance remains more or less unaffected by the test mode (oral vs. written)
because the requirement to write the words is not a limiting factor. However, the choice of initial
letter does influence performance. In its current version, Phonematic Fluency is administered
orally, i.e. patients say words and the tester counts the words by clicking a button on the
screen. The initial letters are randomly selected by the computer (set: L, P, or S). Since then,
further consecutive patients from our outpatient clinic have been included, allowing statistical
analyses of the effects of the different test modes (see below).
Based on exploratory Principal Component Analyses (PCA; see below), two measures of overall
performance were defined instead of a single total score. SCORE was defined as the mean of the
domain means of the standard values of the scored subtests on working memory (Digit Span,
Two-Back), memory (Verbal and Figural Memory), and verbal fluency (Phonematic Fluency). RTT
was defined as the mean of the standard values of the three reaction-time-based tests (Simple
Reaction, Go/No Go, Inverted Go/No Go).
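The two composites can be sketched as follows. The subtest-to-domain mapping follows the text; the dictionary keys, function names, and example values are hypothetical, and the inputs are assumed to be age-corrected standard values (SV).

```python
from statistics import mean

# Domains of the scored subtests, as listed in the text (keys are hypothetical labels)
SCORE_DOMAINS = {
    "working_memory": ["Digit Span", "Two-Back"],
    "memory": ["Verbal Memory", "Figural Memory"],
    "verbal_fluency": ["Phonematic Fluency"],
}
# The three reaction-time-based subtests
RTT_SUBTESTS = ["Simple Reaction", "Go/No Go", "Inverted Go/No Go"]


def overall_score(sv: dict[str, float]) -> float:
    """SCORE: mean of the per-domain mean standard values."""
    domain_means = [mean(sv[name] for name in names) for names in SCORE_DOMAINS.values()]
    return mean(domain_means)


def overall_rtt(sv: dict[str, float]) -> float:
    """RTT: mean of the standard values of the reaction-time-based subtests."""
    return mean(sv[name] for name in RTT_SUBTESTS)


if __name__ == "__main__":
    example = {  # hypothetical standard values for one subject
        "Digit Span": 100, "Two-Back": 110, "Verbal Memory": 90, "Figural Memory": 100,
        "Phonematic Fluency": 85, "Simple Reaction": 100, "Go/No Go": 95,
        "Inverted Go/No Go": 105,
    }
    print(overall_score(example))  # domain means 105, 95, 85 -> 95
    print(overall_rtt(example))    # 100
```

Averaging domain means rather than all five scored subtests gives the single-test fluency domain the same weight as each two-test domain; in the example, a plain mean of the five values would be 97 rather than 95.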
2.2. Healthy subjects
Psychometric evaluation and standardization were primarily done in healthy subjects. None of
these subjects had a history of neurological or psychiatric disease. All healthy subjects (and the
parents of minors) gave informed consent.
CON-TOTAL sample: Normative data were derived from N=244 healthy subjects with a mean
age of 42.1 years (median: 40.6; range=17-80, SD=17.7; gender male/female: 107/137;
handedness right/left/ambidextrous: 215/13/16). Subjects were recruited according to the
predefined ranges of the normative age groups: adolescents (16-29 years; n=87); younger adults
(30-44 years; n=57); older adults (45-59 years; n=48); and seniors (60-75 years; n=52).
The level of education of subjects from the normative sample was estimated with a paper-pencil
multiple-choice word/pseudoword vocabulary test (Mehrfachwahl-Wortschatztest, MWT-B
[20]). The mean MWT-B IQ of the normative sample was 115.4 (SD=15.3). This high mean IQ
reflects the outdated norms (1977) and actually corresponds to present-day average intelligence
levels, as indicated by almost identical MWT-B results in a recent standardization study of 235
other healthy adult subjects (unpublished data; cf. Flynn effect [21]).
CON-RELIABLE subsample: Subtest reliabilities were estimated by test-retest correlations. A
subgroup of N=44 healthy subjects with a mean age of 36.8 years (median=28.4, range=17-72,
SD=16.3; male/female: 17/27; handedness right/left/ambidextrous: 40/1/3) underwent the
computerized test battery twice with a mean interval of 2.0 months (median=2.0, range=0.9-3.7,
SD=0.5). To evaluate practice effects over further repetitions, nineteen of these subjects were
assessed four times in total with a mean inter-test interval of 1.36 months.
CON-VALIDATE subsample: Subtest validities were estimated by concurrent validation, i.e.
correlations between newly introduced and established neuropsychological measures. A subgroup
of N=40 healthy subjects with a mean age of 41.3 years (median=39.9, range=17-66, SD=14.6;
male/female: 13/27; handedness right/left: 36/4) were tested with both the computerized test and
an established comprehensive neuropsychological test battery.
2.3. Patients
Additional data for psychometric evaluation and preliminary clinical evaluation including the
analysis of sensitivity/specificity of the test scores were recorded in unselected consecutive
adolescent and adult patients from the Bonn Department of Epileptology. All patients (and
parents of the minors) gave their informed consent to participate in the study.
PAT-TOTAL sample: This sample comprised all data from the first application of
NeuroCog FX® in N=212 consecutive patients with a mean age of 38.1 years (median=37.0;
range=15-74, SD=13.7; male/female: 100/112; handedness right/left/ambidextrous: 178/16/18).
The first N=40 patients were enrolled prospectively, while the subsequent N=172 patients had been
referred for neuropsychological testing for diverse clinical indications (e.g. subjective
complaints, medication change, baseline/outcome evaluation of in-house neuropsychological
training).
PAT-RELIABLE subsample: A subgroup of N=94 patients with a mean age of 38.1 years
(median=35.1, range=16-74, SD=14.1; gender male/female: 42/52; handedness
right/left/ambidextrous: 79/6/9) underwent NeuroCog FX® assessment at least twice within a
maximum interval of three months (test-retest interval in days: mean=13.9, median=6.0,
range=0-105, SD=24.9) for clinical indications.
PAT-VALIDATE subsample: A subgroup of N=126 patients with a mean age of 38.6 years
(median=40.0, range=16-70, SD=13.1; gender male/female: 60/66; handedness
right/left/ambidextrous: 101/6/19) underwent NeuroCog FX® and were also referred for
comprehensive neuropsychological testing with an established test battery for diverse clinical
reasons. The neuropsychological standard battery could not be completed in all patients; patient
selection for this study was based on completed verbal and figural memory assessment (see
below).
2.4. Factor analyses
In many assessment batteries, subtest scores are ultimately summed into a single total score (e.g.
full-scale IQ). To obtain empirically grounded parameters of overall test performance,
exploratory PCA was applied to the NeuroCog FX® subtest raw scores from the total sample of all
subjects (CON- and PAT-TOTAL, N=379; extraction criterion: eigenvalue>1; factor rotation:
VARIMAX with Kaiser normalization). Notably, PCA was not performed to test whether the
neuropsychological functional domains addressed by the computerized tasks emerge as
statistically independent factors.
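The extraction and rotation steps can be illustrated with NumPy. This is a sketch under stated assumptions, not the SPSS routine used in the study: components of the correlation matrix with eigenvalue > 1 are retained (Kaiser criterion) and then VARIMAX-rotated after Kaiser row normalization, using a common SVD-based iteration. The function names and the simulated two-factor data are hypothetical.

```python
import numpy as np


def kaiser_pca(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    `data` has subjects in rows and variables (subtest raw scores) in columns.
    Returns the retained eigenvalues and the unrotated loadings.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                             # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], loadings


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal VARIMAX rotation with Kaiser normalization of the rows."""
    h = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))  # square roots of communalities
    lam = loadings / h                                        # Kaiser normalization
    p, k = lam.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        rotated = lam @ rot
        grad = lam.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        new_crit = s.sum()
        if crit != 0.0 and new_crit < crit * (1 + tol):
            break
        crit = new_crit
    return (lam @ rot) * h                                    # undo the normalization


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    f1, f2 = rng.standard_normal(n), rng.standard_normal(n)   # two simulated latent factors
    columns = [f1 + 0.5 * rng.standard_normal(n) for _ in range(3)]
    columns += [f2 + 0.5 * rng.standard_normal(n) for _ in range(3)]
    eigvals, loadings = kaiser_pca(np.column_stack(columns))
    print(len(eigvals), "components retained")
    print(np.round(varimax(loadings), 2))
```

Rotation leaves each variable's communality unchanged; it only redistributes the loadings so that each variable loads mainly on one factor, which is what makes factors such as SCORE and RTT interpretable.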
2.5. Psychometric evaluation
Reliabilities and practice effects. Subtest reliabilities were estimated by Pearson’s product-
moment correlation of test and retest scores (r12) in the CON-RELIABLE sample. Additionally,
Spearman’s rank correlation coefficients were calculated to rule out possible effects of the non-
normal data distribution. Practice effects were estimated by the mean differences (MD) between
group means of the test and retest raw scores in the CON-RELIABLE subsample.
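The reliability and practice-effect estimates described above can be sketched in plain Python. The helper names and example data are hypothetical; tied values receive averaged ranks in the Spearman coefficient, and MD is taken as retest minus test (so for reaction times a negative MD means faster responses). On Python 3.10+, `statistics.correlation` offers a built-in Pearson coefficient.

```python
from math import sqrt
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation r12 of paired test and retest scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def practice_effect(test: list[float], retest: list[float]) -> float:
    """MD: difference of group means, retest minus test."""
    return mean(retest) - mean(test)


if __name__ == "__main__":
    test = [10, 12, 9, 14, 11]     # hypothetical raw scores, first session
    retest = [12, 13, 10, 15, 13]  # hypothetical raw scores, second session
    print(round(pearson(test, retest), 3), round(spearman(test, retest), 3))
    print(round(practice_effect(test, retest), 2))  # 1.4
```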
In addition, effects of repeated testing were separately evaluated in the PAT-RELIABLE sample.
In most of these patients, re-evaluation with NeuroCog FX® was intended to assess the cognitive
effects of clinical interventions such as antiepileptic medication changes or cognitive training
performed during the test-retest interval. Although retrospective and confounded by a variety of
treatments, these data provide important additional information on reliability and practice
effects in a clinical target population of the test.
Concurrent validities. NeuroCog FX® was validated in the CON-VALIDATE subsample of
healthy subjects (N=40; for sample characteristics see above) and, separately, in the PAT-
VALIDATE patient sample (N=126). All subjects underwent computerized testing and a well-
established comprehensive neuropsychological assessment battery; Table 2 lists the established
tests (for detailed descriptions, see [22]).
> Table 2
Computerized and established testing were administered in a single session in healthy subjects,
whereas in patients the two batteries were often separated by an interval (N=126; NeuroCog FX®
followed by the standard test battery: N=28, range=1-291 days, median: 90 days; both batteries
on the same day: N=57; standard test battery followed by NeuroCog FX®: N=41, range=1-326 days,
median: 8 days).
Concurrent validities of NeuroCog FX® subtests were evaluated by two different approaches.
Firstly, Pearson’s product-moment correlations of NeuroCog FX® subtests and the domain-
related subtests from the established neuropsychological battery were calculated as validity
estimates; possible effects of non-normal data distribution were controlled for by also calculating
the respective Spearman’s rank correlations. Secondly, PCA with VARIMAX rotation was
applied to raw scores from both instruments to test the functional coherence of NeuroCog FX®
measures and respective established tests. Both analyses were performed separately for healthy
subjects (CON-VALIDATE) and patients (PAT-VALIDATE).
2.6. Individual diagnostic evaluation
Standardization. Data were standardized to allow age-related individual diagnostic evaluation and
comparisons between different subtests. Normal distribution was tested with the Kolmogorov-
Smirnov goodness-of-fit test. To preserve the usual correspondence between standard values (SV;
mean=100, SD=10) and percentile ranks (PR; e.g. SV=90 corresponds to PR=16) despite the
non-normal distribution of raw scores, SV were assigned to raw scores separately for the
predefined age groups based on a set of selected PR (plane transformation [23]). The PR/SV pairs
were: 0/60, 1/70, 3/80, 10/85, 16/90, 20/92, 30/95, 40/97, 50/100, 60/103, 70/105, 80/108,
84/110, 90/115, 97/120, 99/130, and 100/140, with PR 100/SV 140 being reserved for future
performance exceeding that of the normative sample.
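The PR/SV assignment can be sketched as a lookup table. Two details are assumptions not stated in the text: the percentile rank counts ties as half (one common definition), and a raw score's PR is mapped to the SV of the highest anchor PR it reaches (a step lookup rather than interpolation). Function names are hypothetical; for reaction times, where lower values are better, the sign is flipped before ranking.

```python
from bisect import bisect_right

# PR/SV anchor pairs as listed in the text
PR_ANCHORS = [0, 1, 3, 10, 16, 20, 30, 40, 50, 60, 70, 80, 84, 90, 97, 99, 100]
SV_ANCHORS = [60, 70, 80, 85, 90, 92, 95, 97, 100, 103, 105, 108, 110, 115, 120, 130, 140]


def percentile_rank(raw: float, norm_scores: list[float], higher_is_better: bool = True) -> float:
    """Percentage of the normative age group scoring worse than `raw` (ties count half).

    Assumed definition; the paper does not specify how PR was computed.
    """
    if not higher_is_better:  # e.g. reaction times: negate so that larger means better
        raw, norm_scores = -raw, [-s for s in norm_scores]
    below = sum(1 for s in norm_scores if s < raw)
    ties = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def standard_value(pr: float) -> int:
    """SV of the highest anchor PR not exceeding `pr` (assumed step lookup)."""
    idx = bisect_right(PR_ANCHORS, pr) - 1
    return SV_ANCHORS[max(idx, 0)]


if __name__ == "__main__":
    norms = [1, 2, 3, 4]  # hypothetical normative raw scores for one age group
    print(standard_value(percentile_rank(1, norms)))  # PR 12.5 -> SV 85
    print(standard_value(percentile_rank(5, norms)))  # beyond all norms: PR 100 -> SV 140
```

With this percentile definition, PR 100 (and hence SV 140) is reached only by a raw score better than every normative score, which matches the reserved top category described above.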
Reliable change. The statistical evaluation of differences between test scores in individual
patients (e.g. changes from test to retest in the same measure) requires the calculation of a
confidence interval based on critical differences (∆-crit) and practice effects (shift of the
expected value) of the respective score [23]. Here, ∆-crit refers to a defined significance level α
of the confidence interval and reflects the imperfect reliability of the measure ($r_{rr} < 1$).
∆-crit is defined as

$$\Delta_{crit} = \pm z_\alpha \cdot SE_\Delta, \qquad SE_\Delta = SD_{Test}\,\sqrt{2\,(1 - r_{rr})},$$

where $SE_\Delta$ is the standard error of change and $SD_{Test}$ the standard deviation of the test
score. Here, ∆-crit was calculated for α=.10 ($z_\alpha = 1.64$), and the reliability $r_{rr}$ was
estimated by Pearson’s product-moment correlation of test and retest scores, $r_{12}$:

$$\Delta_{crit} = \pm 1.64 \cdot SD_{Test}\,\sqrt{2\,(1 - r_{12})}.$$

The 90% confidence interval for each score was then calculated as MD ± ∆-crit, with MD, the
test-retest difference of group means, serving as an estimate of the true practice effect. The
outer limits of the confidence intervals are reported, i.e. the score differences that already
indicate significant decline or improvement, respectively.
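The reliable-change rule above can be expressed directly in code. This is a minimal sketch: function names and example values are hypothetical, a difference is treated as reliable when it falls strictly outside MD ± ∆-crit, and the `higher_is_better` flag (an added convenience, not from the text) swaps the labels for measures such as reaction times.

```python
from math import sqrt


def delta_crit(sd_test: float, r12: float, z_alpha: float = 1.64) -> float:
    """Critical difference: z_alpha * SD_test * sqrt(2 * (1 - r12))."""
    if not -1.0 <= r12 <= 1.0:
        raise ValueError("r12 must lie in [-1, 1]")
    return z_alpha * sd_test * sqrt(2.0 * (1.0 - r12))


def change_limits(md: float, sd_test: float, r12: float, z_alpha: float = 1.64) -> tuple[float, float]:
    """Outer limits MD - delta_crit and MD + delta_crit of the confidence interval."""
    d = delta_crit(sd_test, r12, z_alpha)
    return md - d, md + d


def classify_change(test: float, retest: float, md: float, sd_test: float, r12: float,
                    z_alpha: float = 1.64, higher_is_better: bool = True) -> str:
    """Label an individual retest-minus-test difference against the interval."""
    lower, upper = change_limits(md, sd_test, r12, z_alpha)
    diff = retest - test
    if lower <= diff <= upper:
        return "no reliable change"
    increased = diff > upper
    if higher_is_better:
        return "improvement" if increased else "decline"
    return "decline" if increased else "improvement"


if __name__ == "__main__":
    # hypothetical score: SD_test = 10, r12 = 0.5, practice effect MD = 2
    print(round(delta_crit(10.0, 0.5), 2))              # 16.4
    print(classify_change(100, 120, 2.0, 10.0, 0.5))    # improvement
    print(classify_change(100, 90, 2.0, 10.0, 0.5))     # no reliable change
```

Centring the interval on MD rather than on zero means that an expected practice gain is not mistaken for genuine improvement.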
Determination of diagnostic thresholds. To identify optimal diagnostic thresholds, diagnostic
classifications (affected versus unaffected) based on scores from the computerized screening
battery were compared with categorizations based on established neuropsychological tests. The
analysis focused on memory deficits, which are particularly important in epilepsy but also in
other neurological diseases [1,2,24]. Based on the assumption that a functional deficit is reliably
indicated by age-corrected below-average scores in established tests (SV<85 = mean - 1.5 SD,
with ∆-crit taken into account), the following coefficients were calculated for different thresholds
of the computerized scores (SV<80, <85, <90, <95) in the merged PAT- and CON-VALIDATE
subsamples: the total rate of type I and type II classification errors, the relative reduction of
incorrect classifications compared with random classification, positive and negative predictive
values, positive and negative likelihood ratios, sensitivity, specificity, and correlation
coefficients (ϕ, χ²). The same coefficients of diagnostic utility were calculated for the
NeuroCog FX® overall performance score, SCORE, with regard to the identification of other
cognitive deficits (SV<85) in at least two non-memory tests from the established assessment
battery.
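Most of these coefficients follow from a 2×2 cross-classification of screening result against reference category. The sketch below is illustrative; class and function names and the example counts are hypothetical. "Relative error reduction" is not defined in the text, so it is assumed here to compare the observed error rate with that expected from random classification with the same marginal rates. The reference criterion is simplified to SV<85 without the ∆-crit adjustment.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class TwoByTwo:
    """Screening result (SV below cut-off) vs. reference deficit (established tests)."""
    tp: int  # screening positive, reference deficit present
    fp: int  # screening positive, no reference deficit
    fn: int  # screening negative, reference deficit present
    tn: int  # screening negative, no reference deficit

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def utility(self) -> dict[str, float]:
        sens = self.tp / (self.tp + self.fn)
        spec = self.tn / (self.tn + self.fp)
        error_rate = (self.fp + self.fn) / self.n
        # Assumed definition: error rate of random classification with the observed marginals
        p_pos = (self.tp + self.fp) / self.n
        p_deficit = (self.tp + self.fn) / self.n
        random_error = p_pos * (1 - p_deficit) + (1 - p_pos) * p_deficit
        phi = (self.tp * self.tn - self.fp * self.fn) / sqrt(
            (self.tp + self.fp) * (self.fn + self.tn) * (self.tp + self.fn) * (self.fp + self.tn))
        return {
            "sensitivity": sens,
            "specificity": spec,
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
            "lr_negative": (1 - sens) / spec,
            "error_rate": error_rate,
            "relative_error_reduction": (random_error - error_rate) / random_error,
            "phi": phi,
            "chi2": self.n * phi ** 2,  # Pearson chi-square of a 2x2 table, no continuity correction
        }


def cross_classify(screen_sv: list[float], reference_sv: list[float],
                   screen_cut: float, reference_cut: float = 85) -> TwoByTwo:
    """Count the four cells; 'positive' and 'deficit' mean SV below the respective cut-off."""
    table = TwoByTwo(0, 0, 0, 0)
    for s, r in zip(screen_sv, reference_sv):
        positive, deficit = s < screen_cut, r < reference_cut
        if positive and deficit:
            table.tp += 1
        elif positive:
            table.fp += 1
        elif deficit:
            table.fn += 1
        else:
            table.tn += 1
    return table


if __name__ == "__main__":
    for name, value in TwoByTwo(tp=20, fp=5, fn=10, tn=65).utility().items():  # hypothetical counts
        print(f"{name}: {value:.3f}")
```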
2.7. Cognitive performance in patients
In addition to the psychometric analyses, group differences between patients and controls were
analyzed by MANOVA and MANCOVA (age as covariate). Findings from the computerized
measures were compared with the respective findings based on established tests.
2.8. Statistics
The significance level was set to α=.05 for all tests. Statistical analyses were performed with
SPSS 17 (SPSS Inc., 2008).
3. Results
The median test duration was 24 minutes in healthy subjects and 28 minutes in patients
(maximum: 35 minutes). Some patients had difficulty understanding the Two-Back test despite
the integrated demonstration; consequently, this test could not be performed in 3 of 212
patients.
3.1. Factor analyses
Explorative PCA on raw scores from the total sample of subjects (CON- and PAT-TOTAL,
N=379) extracted two factors (eigenvalue>1, VARIMAX rotation): SCORE, comprising all
scored subtests (Digit Span, Two-Back, Verbal and Figural Memory, and Phonematic Fluency);
and RTT, comprising the three reaction-time based tests (Simple Reaction, Go/No Go, and
Inverted Go/No Go). All factor loadings >.30 are shown in Table 3.
> Table 3
The model explained 60% of the variance. Excluding the Two-Back test from the analysis because
of its low reliability (see below) yielded a slightly improved model with identical factors
explaining 64% of the variance. Based on these data, two measures of overall performance,
SCORE and RTT, were defined instead of a single total score. The SCORE and RTT standard
values shared about 18% of their variance (r=0.42, P<.001).
3.2. Psychometric evaluation
Reliability and validity were estimated based on Pearson’s product-moment correlation.
Importantly, non-parametric Spearman’s rank correlation generally yielded similar results.
Reliability and practice effects. Table 4A shows the practice effect estimates (i.e., group mean
differences between test and retest raw scores, MD) and the reliability estimates calculated by
Pearson’s product-moment correlations (r12) of test and retest raw scores in the sample of healthy
subjects.
> Table 4A
With the exception of the Two-Back and Simple Reaction tests, all subtests yielded significant
practice effects, i.e. improvements from test to retest. SCORE, but not RTT, also showed
significantly increased scores from test to retest. For those 19 subjects who underwent the test
four times, further improvements occurred in single measures from second to third application
(reaction times in the Simple Reaction and the Go/No Go test) but not between third and fourth
application (data not shown) indicating sufficient test stability.
All but one of the test-retest correlations were significant (5 scores P<.001, 2 scores P<.01). The
correlations were large for the overall performance standard values (SCORE: r12=0.71; RTT:
r12=0.55) and medium to large for the majority of subtests (0.45 ≤ r12 ≤ 0.69). However, the
Two-Back test showed no significant test-retest correlation (r12=0.21, P>.05). Near-maximum
mean values and the small variance of this measure indicate a possible ceiling effect.
In addition, we estimated practice effects and reliabilities based on patient data from the PAT-
RELIABLE sample (Table 4B).
> Table 4B
Practice effects were clearly smaller in patients and reached significance only for Digit Span
(improved) and Simple Reaction (slowed). Furthermore, the test-retest correlations were large for
SCORE and all subtests (0.70 ≤ r12 ≤ 0.84), indicating high reliability of all measures when
estimated from patient data. However, owing to the larger variance, the critical differences for
significant change are not smaller than those derived from healthy subjects.
Concurrent validity. Table 5 shows the concurrent validity estimates, i.e. the Pearson’s product-
moment correlation coefficients of corresponding measures from NeuroCog FX® and the
established test battery, separately for healthy subjects (CON-VALIDATE) and patients (PAT-
VALIDATE).
> Table 5
In healthy subjects, reaction times and the Two-Back score did not correlate with tests from the
validation battery, which, however, did not include directly comparable counterparts.
Significant correlations were obtained for the tests on working memory (small to medium),
memory (medium to large), and verbal fluency (large). In patients, all but one subtest from the
computerized battery showed medium to large correlations with their established counterparts.
However, Verbal Memory and one of the VLMT scores (retention) showed only a small
correlation in patients (r=-0.19, P<.05; healthy subjects: r=-0.49).
Concurrent validity was also tested by explorative PCA on the raw scores of computerized and
established measures separately in healthy subjects and patients (CON- and PAT-VALIDATE
subsamples). The Two-Back test was excluded from this analysis due to its low reliability. From
the established battery, the Maze Test and Verbal Semantic Fluency were excluded because of
frequent missing values. Tables 6A and 6B show all VARIMAX-rotated factor loadings >.30 for
healthy subjects and patients (CON- and PAT-VALIDATE), respectively. Measures and factors are
arranged according to the NeuroCog FX® subtests.
> Table 6A and 6B
Analyses in both samples extracted 6 factors (Eigenvalue > 1). The computerized reaction time
tests loaded on a single factor but were not associated with established measures. The other
subtests loaded on different factors and were associated with their established neuropsychological
counterparts. For example, Verbal Memory loaded on a factor together with scores from the
Verbal Learning and Memory Test. However, in patients the verbal retention score loaded on a
factor together with Digit Span instead of Verbal Memory. Thus, PCA confirmed concurrent
validity of the NeuroCog FX® subtests but with a limitation regarding verbal retention.
3.3. Individual diagnostic evaluation
Standardization. In healthy subjects, the raw score distributions of the following subtests were
skewed, indicating ceiling effects: Two-Back, Go/No Go, Inverted Go/No Go, and, to a lesser
extent, Verbal Memory. Possible ceiling effects had already been considered in the preceding
reliability and validation analyses. Standardization was based on the first-administration data
from healthy subjects for the four predefined age groups. Group-wise descriptive statistics and
results of univariate ANOVAs on age group effects (with post-hoc Scheffé tests) are shown in
Table 7. Post-hoc tests revealed no significant differences between younger adults (30-44 years)
and older adults (45-59 years). Performance of seniors was significantly lower in all subtests.
> Table 7
Notably, with the exception of Figural Memory, neither the subtest raw scores nor the subtest
standard values were distributed normally (Kolmogorov-Smirnov goodness-of-fit test, P<.05).
The overall standard values, SCORE and RTT, showed normal distribution in patients, healthy
subjects, and in the total sample (P>.05). Standard values were assigned according to plane
transformation (see Methods section) [23]. The age-related normative data have since been
integrated into an upgrade of the computerized test, which now provides automatic scoring and
age-related normative evaluation together with a graphical performance profile (Figure 1 shows
an example from the most recent test release).
> Figure 1
Reliable change. Tables 4A and 4B show the critical differences ∆-crit (α=.10) and the reliable
change indices indicating the outer limits of significant changes (decline/improvement) based on
90%-confidence intervals for raw scores in individual subjects. Measures of overall performance,
SCORE and RTT, may be regarded as changed if test and retest standard values differ by
approximately one SD (SV ±10).
Identification of diagnostic thresholds. Tables 8A-C show several coefficients of the diagnostic
utility of NeuroCog FX® memory subtests and the overall performance score, SCORE, at
different thresholds of the age-corrected standard values (SV<80, <85, <90, <95) with regard to
established indicators of memory and other cognitive deficits (threshold: SV<85). The analyses
are based on the combined data from patients and healthy subjects (PAT- and CON-VALIDATE),
but, notably, only a few control subjects showed neuropsychological deficits according to this
definition.
> Tables 8A, 8B, & 8C
Regarding the different aspects of verbal memory, applying a strict threshold of SV<80 (2 SD below
the mean, corresponding to about 8 items in four trials; see Table 4A) to the computerized scores
yielded favorable diagnostic properties such as the lowest rate of classification errors, the
highest relative error reduction, and a “fair” positive likelihood ratio (Table 8A). However, at this
threshold the negative likelihood ratio is unfavorable and the sensitivity is rather low, i.e. the rate
of unidentified affected patients is rather high. For the verbal learning score and the recognition
score, slightly better values were achieved than for the final score and the retention score.
Regarding figural learning (Table 8B), the diagnostic utility appears higher than for verbal
learning and memory, with the best outcome achieved when a more lenient threshold of SV<90
(1 SD below the mean, corresponding to about 6 figures in four trials; see Table 4A) is applied to
the Figural Memory standard value.
Finally, the overall performance of the scored tests (including the computerized memory tests),
represented by SCORE, was explored regarding its utility to diagnose other cognitive deficits,
irrespective of possible memory deficits (Table 8C). A balance of sensitivity (0.69) and
specificity (0.72) might be achieved at SV<95 (i.e., 0.5 SD below the mean), but with regard to
the rate of classification errors, relative error reduction, and positive likelihood ratio, SCORE
shows similar or more favorable values at the more usual threshold of SV<90.
3.4. Neuropsychological performance in patients
A MANCOVA on the NeuroCog FX® data from the merged PAT-TOTAL and CON-TOTAL
sample (N=379, complete data) yielded significant effects of the covariate age (Wilks λ=0.73,
F(8, 369)=17.0, P<.001) and of the group factor (Wilks λ=0.74, F(8, 369)=16.5, P<.001),
indicating general cognitive impairment in patients irrespective of age at the group level;
univariate post-hoc analyses confirmed the group effect for each single measure. A MANOVA on
the age-corrected standard values of overall performance (SCORE, RTT) also showed a significant
main effect of the group factor (N=361; Wilks λ=0.68, F(2, 358)=83.4, P<.001); post-hoc
univariate analyses confirmed the group effect for both parameters.
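For an effect with a single hypothesis degree of freedom, such as the two-level group factor or the age covariate, Wilks' λ converts exactly to an F statistic. The sketch below assumes that the error degrees of freedom equal N minus the number of model parameters (intercept, group, and, in the MANCOVA, age); under that assumption it reproduces the reported degrees of freedom, F(8, 369) and F(2, 358). The recomputed F values (about 16.2 and 84.2) differ slightly from the reported 16.5 and 83.4, presumably because λ is reported to only two decimals. The function name is hypothetical.

```python
from typing import Tuple


def wilks_to_f(wilks_lambda: float, n_dvs: int, error_df: int) -> Tuple[float, int, int]:
    """Exact conversion of Wilks' lambda to F for an effect with one hypothesis df.

    F(p, error_df - p + 1) = ((1 - lambda) / lambda) * (error_df - p + 1) / p,
    where p is the number of dependent variables.
    """
    df1 = n_dvs
    df2 = error_df - n_dvs + 1
    f = (1.0 - wilks_lambda) / wilks_lambda * df2 / df1
    return f, df1, df2


# MANCOVA: N=379, parameters intercept + group + age -> error df = 379 - 3 = 376
f, df1, df2 = wilks_to_f(0.74, n_dvs=8, error_df=379 - 3)
print(f"F({df1}, {df2}) = {f:.1f}")  # F(8, 369) = 16.2
```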
Of the eight available scores, patients had on average 2.8 below-average scores (SV<90; SD: 2.2;
median 2.0), compared with 1.1 in controls (SD: 1.3; median 1.0; χ2=95.5, P<.001). At the
individual level, however, patients showed rather specific profiles of impaired and unimpaired
functions. Figure 2 shows the subtest performance profiles in terms of the percentage of subjects
with below-average standard values. SCORE showed greater group differences than RTT. Verbal
memory and phonematic fluency appeared most susceptible to cognitive deterioration.
> Figure 2
In order to compare the findings from the computerized tests with those from the established
neuropsychological assessment battery, further group analyses were performed on established
measures from the merged CON- and PAT-VALIDATE samples (N=103). Multivariate analysis
again yielded a main effect of the group factor, but no effect of age was obtained in this sample.
Post-hoc univariate analyses yielded effects of age as a covariate only for Figural Memory
(F=11.7, P<.01) and Simple Reaction (F=4.1, P<.05), as well as for 3 of 15 established measures
(CIT Interference, VLMT learning score, DCS learning score). Thus, no evidence was found that
computerized testing is generally inappropriate for older subjects, who presumably have less
computer experience. Significant main effects of the group factor (corrected for age) were
obtained for 6 of 8 single measures, but not for Simple Reaction and Inverted Go/No Go, which,
however, showed significant group differences in the TOTAL samples. The established measures
revealed a similar pattern of general cognitive deterioration in epilepsy patients: patients performed
significantly worse on all established tests and measures except the Corsi Block Tapping
(P<.001 for most measures).
3.5. Phonematic fluency
The effects of changing the administration conditions of the Phonematic Fluency test were
assessed in the PAT-VALIDATE sample. As hypothesized, patients who completed the written
form of the Phonematic Fluency test in the early stages of test development (N=42) did not differ
(raw score, standard value) from the orally tested patients enrolled later (N=82; Mann-Whitney
test, raw score: P=.86, standard value: P=.30). However, the initial letter significantly affected
performance, with ‘S’ yielding higher scores than ‘L’ and ‘P’ (‘L’: N=52, mean: 10.0; ‘P’: N=92,
mean: 9.7; ‘S’: N=48, mean: 12.6; ANOVA: F=6.95, P=.001; post-hoc Scheffé tests: ‘S’>‘L’:
P=.018, ‘S’>‘P’: P=.002). Therefore, the following correction for the initial letter ‘S’ has since
been integrated into the program: the raw score is reduced by 20% (truncated) before the
age-corrected standard value is assigned. After this correction, the mean standard values of the
three initial-letter groups no longer differed (ANOVA, F=0.01, P=.91).
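The initial-letter correction can be expressed as a small function. The text does not say whether the truncation applies to the subtracted 20% or to the corrected score; the sketch assumes the former (for a raw score of 12 with ‘S’ this gives 12 - 2 = 10, whereas truncating 0.8 × 12 = 9.6 would give 9). Integer division avoids floating-point rounding at the truncation step. The function name is hypothetical and not part of the NeuroCog FX® program.

```python
def corrected_fluency_raw(raw_score: int, initial_letter: str) -> int:
    """Initial-letter correction for Phonematic Fluency (hypothetical helper).

    For 'S', 20% of the raw score is subtracted, with the subtracted amount
    truncated to an integer (assumption); 'L' and 'P' are left unchanged.
    """
    if raw_score < 0:
        raise ValueError("raw score must be non-negative")
    if initial_letter.upper() == "S":
        return raw_score - raw_score // 5  # raw_score // 5 == truncated 20% of the score
    return raw_score


print(corrected_fluency_raw(12, "S"))  # 10
```

The corrected raw score would then be passed to the age-group norms for the standard value.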
4. Discussion
NeuroCog FX® is a PC-based cognitive screening instrument that was developed to fill the
diagnostic gap between unspecific ratings and comprehensive neuropsychological test batteries.
The economical identification of patients who need more comprehensive neuropsychological
examinations, frequent follow-up examinations of cognitive performance, or the preliminary
scientific evaluation of cognitive drug effects, for example, may motivate and justify the use of a
screening instrument. NeuroCog FX® can be administered in about 30 minutes by non-academic
personnel (e.g. medical students). The most recent version of the test system integrates scoring
(including the determination of age-corrected standard values) and electronic filing for later
scientific use. The tasks address four cognitive domains: attention (psychomotor speed), working
memory (capacity, manipulation), verbal and figural memory (learning and recognition), and
language (phonematic fluency). These functions are related to quality of life and are known to be
sensitive indicators of treatment effects and latent disease dynamics in epilepsy and brain tumor
diseases [1-7]. Two measures of overall performance, SCORE (scored tests) and RTT
(reaction-time-based tests), were defined according to the results of the PCA. Individual
diagnostics can be performed on the basis of the age-related normative data and the provided
thresholds for clinical classification, but strictly require professional neuropsychological
expertise. The test is also intended for use in other neurological diseases and is presently being
evaluated in diverse studies (e.g. primary CNS lymphoma, glioma, myotonic dystrophy, and
septicemia).
To avoid effects of the non-normal distribution of raw scores, an area transformation was applied,
i.e. an age-group-related, percentile-based “manual” assignment of standard values to the eight
subtest raw scores (normative sample: N=244; age range = 16-75 years; four age groups with
N≥48) [23]. No differences were found between the middle-aged groups (30-44 years versus
45-59 years), but the oldest group (60-75 years) showed lower performance in all subtests. No
education-related norms are provided. In early-onset diseases, such as many types of epilepsy, a
correction for the lower educational level would mask neurocognitive deficits. In late-onset
diseases, individual diagnostic evaluations must carefully account for the patient's educational
background and, where appropriate, should refer mainly to the intraindividual performance
profile (Fig. 1).
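The percentile-based area transformation can be approximated programmatically. The sketch assumes a standard value scale with a mean of 100 and an SD of 10 (consistent with this paper's use of SV<80 as 2 SD and SV<90 as 1 SD) and converts a mid-rank percentile within one age group's normative sample into a normal score. The published norms were assigned "manually" from percentile tables, so values from this sketch may differ at the margins; the function name and the tie-handling and clamping conventions are assumptions.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist
from typing import Sequence


def standard_value(raw: float, norm_scores: Sequence[float], higher_is_better: bool = True) -> int:
    """Map a raw score to a standard value (mean 100, SD 10) via its percentile rank.

    Hypothetical helper: the percentile rank is computed within one age group's
    normative raw scores (mid-rank for ties), then converted to a normal score.
    """
    ranked = sorted(norm_scores)
    n = len(ranked)
    below = bisect_left(ranked, raw)              # normative scores strictly below raw
    ties = bisect_right(ranked, raw) - below      # normative scores equal to raw
    pr = (below + 0.5 * ties) / n
    if not higher_is_better:                      # e.g. reaction times: lower is better
        pr = 1.0 - pr
    pr = min(max(pr, 0.5 / n), 1.0 - 0.5 / n)     # inv_cdf is undefined at 0 and 1
    return round(100 + 10 * NormalDist().inv_cdf(pr))


norms = [float(x) for x in range(1, 101)]  # hypothetical normative raw scores of one age group
print(standard_value(84.5, norms))  # 110 (84th percentile, about +1 SD)
```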
Based on a sample of healthy subjects (CON-RELIABLE), the test-retest correlations of all but
one NeuroCog FX® subtest appeared medium to large, confirming sufficient reliability. The
Two-Back score, however, showed no significant test-retest correlation together with a
near-maximum group mean and small variance (r12=.22, P=.15; mean=8.8, SD=2.3), indicating a
ceiling effect that may have contributed to the low test-retest correlation. In contrast, patients
showed “large” test-retest correlations for all scores (0.70≤r12≤0.85), including the Two-Back
test (r12=.70, P<.001; mean=5.4, SD=4.0). We conclude that NeuroCog FX® is suited for serial use
in epilepsy patients but may be inappropriate for studies in high-functioning subjects.
From a clinical perspective, an individual classification of the course of performance during
follow-up examinations may be required. Reliable change indices (90% confidence intervals)
indicated that significant individual change in the NeuroCog FX® subtests corresponds to a raw
score difference of approximately one normative standard deviation (i.e., 10 standard value
points). This is comparable to the reliabilities and confidence intervals of established
neuropsychological measures. For example, for the VLMT scores used in this study, the critical
differences (P=.10) were about 1 SD for the learning score, final score, and recognition score,
and 1.6 SD for the retention score (unpublished data from a comprehensive normative study
performed in 2003, N=81 retested healthy subjects).
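The details of the critical differences are given in the method section. A common reliable-change formulation, Δcrit = z(0.95) · SD1 · √(2(1 − r12)), appears to reproduce the tabulated values, e.g. 2.9 for Digit Span in controls (SD=2.2, r12=.68; Table 4A) and 9.1 for Verbal Memory in patients (SD=10.1, r12=.85; Table 4B). The sketch below assumes this formula. The second helper, also an assumption, centers the interval on the mean practice effect (M2 − M1) and rounds outward to whole raw-score points; this matches the −3/+4 interval for Digit Span but has not been checked against every row. Function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def critical_difference(sd_baseline: float, r12: float, alpha: float = 0.10) -> float:
    """Two-sided critical difference: z(1 - alpha/2) * SD * sqrt(2 * (1 - r12))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for a 90% interval
    return z * sd_baseline * math.sqrt(2 * (1 - r12))


def change_bounds(mean_change: float, crit: float) -> Tuple[int, int]:
    """Practice-adjusted interval limits, rounded outward to whole raw-score points."""
    return math.floor(mean_change - crit), math.ceil(mean_change + crit)


crit = critical_difference(2.2, 0.68)  # Digit Span, CON-RELIABLE (Table 4A)
print(round(crit, 1), change_bounds(0.8, crit))  # 2.9 (-3, 4)
```

Because the critical difference scales with √(1 − r12), the higher retest reliabilities in patients keep the critical differences close to one SD even though patient SDs are larger.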
Concurrent validity was estimated from correlations between computerized and established
neuropsychological measures. In patients, all computerized tests showed specific correlations
with their established neuropsychological counterparts (small to large, 0.19≤|r|≤0.67). Validity
estimates appeared slightly lower for healthy subjects, probably due to ceiling effects like those
noted above for the reliability analysis. Explorative PCAs, performed separately in both samples
(CON- and PAT-VALIDATE), confirmed that the computerized subtests (except the reaction times)
loaded on distinct factors together with their established neuropsychological counterparts. We
conclude that the NeuroCog FX® subtests address the different cognitive functions as intended
and thus cover important aspects of attention, working memory, verbal and figural memory, and
language. However, this screening instrument is not suited (and was not intended) to replace
established tests.
The clinical validity and the utility of individual diagnostic applications were further explored by
evaluating classification error rates, predictive values, likelihood ratios, sensitivity, and
specificity at different standard value thresholds of the computerized test scores. The data on
diagnostic utility allow users to define categorical thresholds according to their specific
requirements (Tables 8A-C). Taking critical differences into account, a score below the
normative mean minus 1.5 standard deviations (SV<85) on established measures appears to be a
reasonable criterion for diagnosing functional impairment. The more detailed analysis focused on
the detection of memory deficits, which are an important comorbidity of epilepsy. Importantly,
only very few healthy subjects showed neuropsychological deficits according to this definition,
and NeuroCog FX® classified almost 100% of the healthy subjects correctly. The diagnostic utility
of the Figural Memory test appeared high at a threshold of SV<90 (Table 8B). For the Verbal
Memory test, the classification error rate was lowest at the rather strict threshold of SV<80
(Table 8A). For screening purposes, the rate of affected but unidentified patients (1 - sensitivity)
could be decreased by using a more lenient (higher) SV threshold, though, of course, at the
expense of lower specificity. We conclude that the computerized memory tests cover, to some
extent, important aspects of verbal and figural learning and memory as captured by established
tests.
However, a substantial proportion of subjects (about 30%) performed normally on the
computerized tests while failing established tests, and vice versa. The computerized tests differ
strongly from the established tests with regard to presentation and test modalities (e.g.
recognition versus free recall; verbal test: reading versus listening), which likely contributes to
this dissociation. In group studies, failure in verbal delayed free recall, which may remain
undetected by recognition tests, is correlated with (left) hippocampal dysfunction [24,25]. A
dissociation of intact recognition but impaired recall is also known from amnesic conditions [26].
Although the diagnostic properties of the established retention score with regard to hippocampal
lesions are unknown, NeuroCog FX® might be inappropriate, or require supplementation with
further tests, in studies on hippocampal function or dysfunction. Notably, in patients, the verbal
retention score did not load on the same factor as the Verbal Memory score (Table 6B). Besides
memory deficits, the overall performance measure, SCORE, showed sufficient diagnostic utility
for the detection of other cognitive deficits (e.g. working memory, verbal fluency).
After the early stages of test development (standardization, reliability/validity analyses; N=40
patients), further patients were selected for computerized and neuropsychological testing for
diverse clinical indications such as subjective complaints, medication changes, and
baseline/outcome evaluations of neuropsychological training. Some of these patients were still
undergoing diagnostic procedures and thus did not have a clear diagnosis of epilepsy. Therefore,
the clinical data, although recorded from unselected Epileptology unit patients, cannot be
attributed unequivocally to epilepsy. For this mixed patient population, group-level analyses of
NeuroCog FX® scores as well as of established measures revealed a pattern of general cognitive
impairment. However, individual patients showed rather distinct profiles of affected and
unaffected functions (median: 2 of 8 NeuroCog FX® subtests with SV<90). In terms of the
percentage of subjects with below-average scores (SV<90; Fig. 2), Verbal Memory (patients vs.
healthy subjects: 44% vs. 14%) and Phonematic Fluency (45% vs. 13%) showed the greatest
group differences, which is consistent with earlier findings based on established measures
[1,2,24].
We would like to close the discussion with some remarks on the benefits and pitfalls of
computerized screenings, though a full review of this issue is beyond the scope of this article. The
most important risk of any screening approach, whether computerized or not, is that the tasks fail
to address clinically relevant cognitive functions. For example, the cognitive effects of topiramate
were discussed controversially on the basis of data from computerized screening until
established tests properly addressing language and other cognitive functions associated with the
frontal lobes were applied [27,28]. During the last two decades, several computer-based or
computer-assisted neuropsychological test systems have been developed and applied, also in
studies on epilepsy [12]. Some systems established for research on neurological diseases, such as
the Automated Neuropsychological Assessment Metrics (ANAM), have, to our knowledge, not
yet been used in epilepsy research [29]. Like the NeuroCog FX® subtests, most computerized
tasks are computer adaptations of well-established neuropsychological paradigms. Consequently,
computerized testing has provided few theoretical or conceptual advances to the field. The
selection of tasks in NeuroCog FX® was based on the neuropsychological expertise of our group
in epilepsy and brain tumors [13-17,22,24,25]. Concurrent validation analyses confirmed that the
tasks actually address the intended cognitive domains. At the group level, NeuroCog FX® is likely
to detect systematic alterations in attention, working memory, verbal and figural memory, or
language.
The possible benefits of computerized testing, such as high standardization, ease of
administration, and automated scoring and filing, have already been mentioned above. Most recent
reviews of computerized testing arrive at a positive overall evaluation [10,11,29]. Beyond the
established psychometric criteria, important computer-specific issues are the application in
computer-naïve populations (e.g. older subjects) and technical issues related to hardware and
software specifications. Notably, analyses of age effects on NeuroCog FX® performance did not
reveal any computer-specific difficulties with the procedure in elderly subjects. Furthermore, the
user requirements for NeuroCog FX® are very low: all responses are recorded via the keyboard,
most of them via the spacebar. If necessary, the tester may assist the testee. Neither patients nor
healthy subjects complained about the usability of the computerized test. The major technical
concerns relate to exact time measurement and stimulus timing as required for event-related
psychophysiological experiments [30]. NeuroCog FX® was evaluated on different hardware and
operating system platforms and is intended to be used flexibly. Users are advised to close all
other programs when running the test. Nevertheless, time measurement and timing will not be
exact to the millisecond. Psychometric analyses nonetheless confirmed sufficient reliability of the
median reaction times despite the error variance caused by this technical shortcoming. Scores
from reaction-time-based tests are not combined with the other test scores into a single total
score.
While the software guarantees a high standardization of the test procedure and the scoring
process, the hardware platforms and devices as well as relevant environmental conditions may
vary across sites and times. We recommend that subjects be tested under comparable conditions
during follow-up examinations (i.e., same hardware, especially the same screen; same room,
especially similar lighting conditions). Some further technical issues of computerized testing, for
example monitor flickering, have lost much of their former relevance due to new technical
developments. However, different lighting conditions might influence contrast and, ultimately,
performance (e.g. visuomotor reaction times). Therefore, our testers were instructed to ensure
optimal lighting and viewing conditions. Graphics requirements (e.g. resolution) are specified in
the technical manual. Notably, the data for the psychometric analyses of NeuroCog FX® were
recorded under real-life conditions, i.e. these analyses already account for the error variance
caused by technical or environmental shortcomings and variation.
In their review, Korczyn and Aharonson (2007) favor self-explanatory systems that allow complete
self-administration in non-demented subjects, even those with aphasia [14]. Although NeuroCog
FX® would lend itself to group administration (e.g. it uses no auditory signals) and even to
self-administration, we do not recommend this use. In our experience, the neuropsychological
examination of patients requires an assistant who ensures that the tasks are understood correctly,
keeps the patient motivated, and provides all necessary support (‘testing to the limits’). Usability
criteria proposed for self-administered computerized tests (the so-called “controlled” or
“supervised” mode) do not apply to NeuroCog FX®, which is recommended for the so-called
“managed” mode, i.e. administration under a high level of human supervision [31].
5. Conclusion
NeuroCog FX® is an economical screening tool that allows repeated, science-based testing
during individual treatments or in multicenter group studies when more comprehensive
neuropsychological evaluations are inappropriate or unavailable. The test provides objective,
standardized, and sufficiently reliable and valid measures of clinically relevant cognitive
functions in epilepsy. It does not replace a comprehensive neuropsychological evaluation.
The tool is presently being evaluated for use in patients with other neurological diseases [32].
Conflict of Interest Statement
With the permission of the University of Bonn Medical Centre, NeuroCog FX® is marketed by
three of the co-authors (C. Hoppe, K. Fließbach, C. Helmstaedter).
Acknowledgment
Thanks to Nina Stephanie Lehnen, Alexander Höinghaus, Frederike Adler, and Johanna Michel
for data recording in patients and healthy subjects during their MD theses. We are also grateful to
all participants of the studies. Special thanks to the anonymous reviewers for their detailed and
extraordinarily helpful comments.
Tables and Figures
Table 1: NeuroCog FX®: subtests, functions, task descriptions, and measures.
Note: Measures not selected for standardization are shown in parentheses. Tests are
shown according to the predefined standard administration sequence.
a In case of motor impairment, the examiner is allowed to assist the subject in typing.
b The background for the different administration rules applied to this test is explained in
the method section. No significant effects of type of administration (oral versus
written) were obtained in the patient samples; effects of different initial letters (L, P, S)
are corrected (see results section).
Table 2: Established neuropsychological assessment battery.
Note: The tests are described in more detail in [22].
Table 3: PCA on NeuroCog FX® scores (PAT- and CON-TOTAL, N=379).
Note: VARIMAX rotation. All factor loadings ≥0.30 are shown.
Table 4A: Reliability, practice effects and critical differences in healthy controls (CON-
RELIABLE).
a T-test for paired samples.
r12: Pearson’s product-moment test-retest correlation (estimate of reliability); M1: group
mean of first test application; M2: group mean of retest; SD = standard deviation; ∆-crit:
critical differences (for details see method section, α=.10); C.I. = 90%-confidence
intervals for raw scores (declined/improved).
ns = non-significant, * p<.05, ** p<.01, *** p<.001
Table 4B: Reliability, practice effects and critical differences in patients (PAT-
RELIABLE).
For notes, see Table 4A.
Table 5: Concurrent validity estimates (CON- and PAT-VALIDATE).
+ p<.10, * p<.05, ** p<.01, *** p<.001
Note: Pearson’s product-moment correlation coefficients.
Table 6A: Concurrent validation (PCA) in healthy controls (CON-VALIDATE).
Note: VARIMAX rotation. All factor loadings ≥0.30 are shown. The Two-Back test has
been excluded from this analysis.
Table 6B: Concurrent validation (PCA) in patients (PAT-VALIDATE, N=82).
For notes, see Table 6A.
Table 7: Performance in age groups: Means, standard deviations and results from
ANOVAs.
Table 8A: Detecting verbal memory deficits (CON- and PAT-VALIDATE, N=156).
a Deficits were indicated by age-corrected below-average scores of established
neuropsychological measures (SV<85, critical differences being considered).
b Including false alarms (type I errors) and misses (type II errors).
Table 8B: Detecting figural memory deficits (CON- and PAT-VALIDATE, N=138).
For notes, see Table 8A.
Table 8C: Detecting other cognitive deficits (CON- and PAT-VALIDATE, N=149).
a The median number of below-average scores was 1.0. Cognitive deficits were indicated
by two or more age-corrected below-average scores in neuropsychological tests exclusive
of scores from verbal and figural list learning (SV<85, critical differences being
considered).
b Including false alarms (type I errors) and misses (type II errors).
Table 1
NeuroCog FX®: subtests, functions, tasks, and measures.
Digit Span. Function: verbal short-term memory. Task: successive visual presentation of single digits (1/second) from digit sequences of increasing length (3-9), 2 trials per span; immediately recall the digit sequence by typing it (number keys) a. Measures: score: number of correct responses; (digit span: maximal length of correctly reproduced digit sequence).
Two-Back. Function: working memory. Task: continuous presentation of single digits (1/second); press the spacebar as fast as possible if the current digit equals the digit presented two positions earlier. Measures: score: hits minus false alarms; (reaction time: median of reaction times for hits).
Simple Reaction. Function: alertness. Task: press the spacebar as fast as possible when a blue circle appears. Measures: reaction time: median of reaction times.
Go/No Go. Function: selective attention. Task: press the spacebar (Go) as fast as possible if a blue circle appears, but ignore yellow circles (No Go). Measures: (hits); (false alarms); reaction time: median of reaction times for hits.
Inverted Go/No Go. Function: susceptibility to interference effects and cognitive flexibility. Task: vice versa: press the spacebar (Go) as fast as possible if a yellow circle appears, but ignore blue circles (No Go). Measures: (hits); (false alarms); reaction time: median of reaction times for hits.
Table 1, continued
Verbal Memory. Function: verbal learning and recognition. Task: 3 trials of word list learning (12 nouns from 6 predefined lists or random selection); learning: visual presentation (1 item per second); subsequent yes/no recognition tests (item:distracter ratio = 1:2, paced presentation, spacebar press indicates ‘yes’, max. reaction interval: 2 seconds, same distracters but word sequence re-arranged from trial to trial); plus a delayed yes/no recognition test (retention interval filled by Figural Memory). Pool: 72 items (word frequency <5 per million) and 140 distracters (word frequency <6 per million) from the CELEX database (Max Planck Institute for Psycholinguistics, Nijmegen/Netherlands). Measures: (hits); (false alarms); total score: hits – false alarms/2; (reaction time: median reaction time for hits); (reaction time: median reaction time for false alarms).
Figural Memory. Function: figural learning and recognition. Task: 3 trials of figure list learning (7 checkerboard patterns with 4 highlighted yellow squares in a 3x3 blue matrix, from 6 predefined lists or random selection); learning: visual presentation (1 item per 2 seconds); subsequent yes/no recognition tests (item:distracter ratio = 1:2, paced presentation, spacebar press indicates ‘yes’, max. reaction interval: 2 seconds, same distracters but pattern sequence re-arranged from trial to trial); plus a delayed yes/no recognition test (retention interval filled by Verbal Memory/delayed recognition). Pool: 126 possible patterns (42 items, 84 distracters). Measures: (hits); (false alarms); total score: hits – false alarms/2; (reaction time: median reaction time for hits); (reaction time: median reaction time for false alarms).
Phonematic Fluency b. Function: phonematic literal word fluency. Task: former version (CON-TOTAL, N=42 from PAT-VALIDATE): write words with the initial letter P (paper-pencil test); present version (N=82 from PAT-VALIDATE): name words with a randomly selected initial letter (L, P, or S); all types of words are permitted except counting, conjunctions, or declensions; time: 1 minute; the program shows the initial letter and the elapsed time on the screen and allows the examiner to count correct words via button clicks. Measures: score: number of correct words.
Table 2
Established neuropsychological assessment battery.
Entries: function; test(s): subtest(s) (measure).
Attention and executive functions
Test für cerebrale Insuffizienz (c.I.T.) [Test for cerebral insufficiency]: Counting of Symbols (time for completion); AB-Interference (time for completion)
Trail Making Test (TMT): Forms A and B (time for completion)
Maze Test (from Chapuis) (time for completion)
Short-term memory and working memory
Span tests from Wechsler Memory Scale (WMS-III): Digit Span/Forward (span); Digit Span/Reversed (span); Corsi Block Tapping Forward (block span)
Verbal memory
Verbaler Lern- und Merkfähigkeitstest (VLMT) [Rey Auditory Verbal Learning Test]:
learning score: total of recalled words during learning (trials 1-5)
final score: recalled words after retention (trial 7)
retention score: loss of words from trial 5 to trial 7 (negative)
recognition score: yes/no recognition, hits minus false alarms
Table 2, continued
Figural memory
Diagnosticum für Cerebralschädigung - revidiert (DCS) [DCS – a visual learning and memory test for neuropsychological evaluation]:
learning score: total recalled figures during learning (trials 1-5)
final score: number of recalled items in trial 5
Word fluency, phonematic
Leistungs-Prüf-System [Performance-Test-System]: Subtest 6, Word fluency (written version) (number of correct words)
Word fluency, semantic
Demenz-Test [Dementia-Test]: Supermarket Test (written) (number of correct words)
Table 3
PCA on NeuroCog FX® scores (CON- and PAT-TOTAL, N=379).
Factors
Tests – Measures SCORE RTT
NeuroCog FX®
Digit Span – score .736
Two Back – score .708
Verbal Memory – score .568 -.333
Figural Memory – score .682
Phonematic Fluency – score .714
Simple Reaction – reaction time .792
Go/No Go – reaction time .873
Inverted Go/No Go – reaction time .867
Table 4A
Reliability, practice effects and critical differences in healthy controls (CON-RELIABLE).
Subtest – Measure (max. raw score) n r12 M1 (SD) M2 (SD) M2 - M1 a 90%-∆-crit C.I.
Digit Span – score (9) 44 0.68 *** 7.4 (2.2) 8.2 (2.1) +0.8 ** 2.9 -3 / +4
Two-Back – score (10) 41 0.21 ns 8.8 (2.3) 9.2 (1.4) +0.4 ns 4.8 -5 / +6
Simple Reaction – reaction time/ms 44 0.45 ** 262 (54) 261 (48) -1 ns 93 +93 / -95
Go/No Go – reaction time/ms 40 0.54 *** 362 (58) 342 (57) -20 * 92 +73 / -113
Inverted Go/No Go – reaction time/ms 44 0.57 *** 373 (63) 349 (49) -24 ** 96 +73 / -121
Verbal Memory – total score (48) 44 0.45 ** 41.2 (4.1) 43.6 (3.8) +2.4 *** 7.1 -5 / +10
Figural Memory – total score (28) 44 0.52 *** 14.6 (5.5) 16.5 (5.3) +2.0 * 8.9 -8 / +11
Phonematic Fluency – score (“P”) 44 0.69 *** 14.2 (3.1) 16.0 (3.3) +1.8 *** 4.0 -3 / +6
SCORE – standard value 36 0.62 *** 102.4 (6.2) 107.1 (6.1) +4.7 *** 8.9 -5 / +14
RTT – standard value 36 0.55 *** 100.1 (8.9) 102.6 (7.5) +2.5 ns 13.9 -12 / +16
Table 4B
Reliability, practice effects and critical differences in patients (PAT-RELIABLE).
Subtest – Measure (max. raw score) n r12 M1 (SD) M2 (SD) M2 - M1 a 90%-∆-crit 90%-CI
Digit Span – score (9) 85 0.82 *** 5.1 (2.4) 5.5 (2.2) +0.4 * 2.4 -3 / +3
Two-Back – score (10) 82 0.70 *** 5.4 (4.0) 5.1 (4.3) -0.3 ns 5.1 -5 / +6
Simple Reaction – reaction time/ms 89 0.81 *** 299 (90) 314 (111) +14 * 92 +79 / -107
Go/No Go – reaction time/ms 88 0.74 *** 411 (107) 409 (117) -2 ns 127 +130 / -126
Inverted Go/No Go – reaction time/ms 89 0.80 *** 427 (109) 415 (113) -12 ns 114 +127 / -103
Verbal Memory – total score (48) 82 0.85 *** 32.7 (10.1) 32.3 (10.5) -0.4 ns 9.1 -10 / +9
Figural Memory – total score (28) 82 0.72 *** 8.7 (6.0) 9.3 (6.2) +0.7 ns 7.4 -6 / +9
Phonematic Fluency – corrected score 82 0.81 *** 8.8 (4.5) 8.9 (4.6) +0.02 ns 4.6 -5 / +5
SCORE – standard value 75 0.84 *** 87.2 (10.8) 87.5 (10.6) +0.3 ns 10.1 -10 / +11
RTT – standard value 80 0.78 *** 91.4 (12.3) 92.2 (12.8) +0.8 ns 13.5 -13 / +15
Table 5
Concurrent validity estimates.
Columns: NeuroCog FX® measure; established neuropsychological measure; CON-VALIDATE (N=40): r; PAT-VALIDATE: N, r
Digit Span Digit Span/Forward 0.30 + 96 0.54 ***
Digit Span/Reversed 0.36 * 112 0.50 ***
Two-Back TMT Form B - time -0.04 ns 99 -0.48 ***
CIT Interference – time +0.12 ns 98 -0.40 ***
Maze Test – time -0.36 * 76 -0.33 **
Simple Reaction - time CIT Symbol Counting – time 0.10 ns 82 0.42 ***
TMT Form A – time 0.26 * 105 0.32 **
Verbal Memory VLMT – learning score (trials 1-5) 0.47 ** 124 0.62 ***
VLMT – final score (trial 7) 0.56 *** 124 0.49 ***
VLMT – retention score (∆5-7) -0.49 ** 124 -0.19 *
VLMT – recognition score 0.49 ** 123 0.54 ***
Figural Memory DCS – learning score (trials 1-5) 0.46 ** 110 0.58 ***
DCS – final score (trial 5) 0.38 * 109 0.54 ***
Phonematic Fluency Verbal Fluency, phonematic literal 0.60 *** 86 0.67 ***
Verbal Fluency, semantic (supermarket) 0.45 ** 37 0.34 *
Table 6A
Concurrent validation (PCA) in healthy controls (CON-VALIDATE).
Factors
Tests - Measures 5 2 1 4 3 6
NeuroCog FX®
Digit Span – score .664 -.507
Simple Reaction .755 -.338
Go/No Go .782 -.339
Inverted Go/No Go .859
Verbal Memory .590 .308 .338
Figural Memory -.362 .542
Phonematic Fluency .782
Established test battery
Digit Span/Forward – span .629
Digit Span/Reversed – span .809
VLMT – learning score .759 -.424
VLMT – final score .933
VLMT – retention score -.778
VLMT – recognition .776
DCS – learning score .453 .599 .391
DCS – final score .599 .513
Verbal Fluency, phonematic – score .817
Corsi Block Tapping – span .704
TMT Form A – time -.796 .336
TMT Form B – time -.534 .556
CIT Counting Symbols – time .781
CIT Interference – time .423 .589
Table 6B
Concurrent validation (PCA) in patients (PAT-VALIDATE, N= 82).
Factors
Tests - Measures 4 2 1 3 5 6
NeuroCog FX®
Digit Span – score .676 -.318
Simple Reaction .821
Go/No Go .865
Inverted Go/No Go .839
Verbal Memory .621 .368
Figural Memory .306 .679
Phonematic Fluency .815
Established test battery
Digit Span/Forward – span .606 .479
Digit Span/Reversed – span .690 .307
CIT Counting Symbols – time -.735 .351
CIT Interference - time -.492 .349 -.444
VLMT – retention score -.410 -.360 .361
VLMT – learning score .811
VLMT – final score .861
VLMT – recognition .884
DCS – learning score .885
DCS – final score .873
Word Fluency, phonematic – score .801
TMT Form A – time -.486 .436
TMT Form B – time -.390 -.366 .542
Corsi Block Tapping – span -.842
Table 7
Performance in age groups: Means, standard deviations and results from ANOVAs.
Columns: age group A (16-29 yrs., N=87); B (30-44 yrs., N=57); C (45-59 yrs., N=48); D (60-75 yrs., N=52); ANOVA P; post-hoc Scheffé tests.
Digit Span 7.7 (2.0) 7.4 (2.1) 6.7 (2.4) 6.3 (2.1) .001 AB > D
Two Back 8.5 (2.6) 8.1 (2.7) 7.9 (2.3) 5.6 (3.6) .000 ABC > D
Simple Reaction (ms) 256 (44) 281 (65) 273 (51) 330 (80) .000 ABC > D
Go/No Go (ms) 348 (54) 360 (57) 376 (61) 440 (98) .000 ABC > D
Inverted Go/No Go (ms) 356 (51) 379 (55) 389 (64) 435 (79) .000 ABC < D A < C
Verbal Memory 42.5 (3.7) 41.4 (4.4) 40.9 (4.1) 34.4 (8.9) .000 ABC > D
Figural Memory 17.3 (4.8) 13.9 (5.3) 13.1 (6.8) 7.6 (5.7) .000 A > BC > D
Phonematic Fluency 13.0 (3.7) 13.2 (4.8) 14.8 (4.3) 11.7 (3.3) .003 C > D
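As a rough plausibility check, the ANOVA column of Table 7 can be approximated from the summary statistics alone. The Python sketch below is our illustration, not the authors' analysis: it computes a one-way ANOVA F ratio from group means, standard deviations and sizes, assuming that the SDs are sample SDs and that every subject of each age group contributed to the subtest. Because the tabled means and SDs are rounded, the result is only approximate. For the Digit Span row it gives F(3, 240) ≈ 5.68, which is in line with the reported P = .001. Converting F into a P value needs the F distribution (for example `scipy.stats.f.sf`), left out here to keep the sketch dependency-free. The names `Group` and `anova_from_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    mean: float
    sd: float  # assumed: sample standard deviation (n - 1 denominator)
    n: int


def anova_from_summary(groups):
    """One-way ANOVA F ratio and degrees of freedom from per-group summary statistics."""
    total_n = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / total_n
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: pooled spread inside each group.
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Digit Span row of Table 7, age groups A-D.
digit_span = [Group(7.7, 2.0, 87), Group(7.4, 2.1, 57), Group(6.7, 2.4, 48), Group(6.3, 2.1, 52)]
f_ratio, df_b, df_w = anova_from_summary(digit_span)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # F(3, 240) = 5.68
```

Other rows can be checked the same way by substituting their means, SDs and group sizes.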
Table 8A
Detecting verbal memory deficits (CON- and PAT-VALIDATE, N= 156).
Verbal Learning Score Verbal Final Score Verbal Retention Score Verbal Recognition Score
Affected patients/controls (%) a 35 / 2 (24.3) 48 / 1 (32.0) 28 / 0 (17.9) 43 / 1 (28.4)
NeuroCog FX® Verbal Memory (SV thresholds)
<80 <85 <90 <80 <85 <90 <80 <85 <90 <80 <85 <90
Positively tested (%) 20.4 35.5 46.1 20.9 35.3 46.4 20.5 35.3 46.8 20.6 34.8 46.5
Classification errors (%) b 21.1 28.3 33.6 28.1 32.0 34.0 24.4 34.0 41.7 23.2 29.7 33.5
Relative error reduction (%) b -39.5 -33.6 -30.1 -28.9 -28.4 -30.2 -21.7 -16.2 -13.1 -37.8 -31.7 -30.8
Positive predictive value 0.58 0.44 0.40 0.59 0.50 0.48 0.34 0.27 0.25 0.63 0.48 0.44
Negative predictive value 0.84 0.87 0.89 0.75 0.78 0.82 0.86 0.87 0.88 0.80 0.82 0.86
Likelihood ratio positive 4.30 2.49 2.07 3.10 2.12 1.95 2.40 1.71 1.50 4.21 2.34 2.02
Likelihood ratio negative 0.58 0.48 0.38 0.70 0.61 0.48 0.73 0.68 0.63 0.61 0.55 0.43
Sensitivity 0.49 0.65 0.76 0.39 0.55 0.69 0.39 0.54 0.64 0.45 0.59 0.73
Specificity 0.89 0.74 0.63 0.88 0.74 0.64 0.84 0.69 0.57 0.89 0.75 0.64
Correlation (ϕ) 0.40 0.35 0.34 0.30 0.28 0.32 0.22 0.18 0.16 0.39 0.32 0.33
χ2 24.05 18.38 17.27 13.90 12.38 15.31 7.38 5.01 4.19 23.08 15.92 17.05
χ2-Test (df=1), significance (P) 0.000 0.000 0.000 0.000 0.000 0.000 0.007 0.025 0.041 0.000 0.000 0.000
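The likelihood ratios and predictive values in Table 8A are linked through the proportion of affected subjects, via Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The Python sketch below is our illustration of that relation, not part of the original analysis; the function name `post_test_probability` is hypothetical. With the Verbal Learning Score at the SV < 80 threshold (24.3% affected, positive likelihood ratio 4.30, negative likelihood ratio 0.58) it recovers the tabled positive predictive value of 0.58 and negative predictive value of 0.84 to rounding.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds * likelihood ratio."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Verbal Learning Score, SV < 80 column of Table 8A.
base_rate = 0.243  # proportion of affected subjects
ppv = post_test_probability(base_rate, 4.30)        # probability of a deficit after a positive result
npv = 1.0 - post_test_probability(base_rate, 0.58)  # probability of no deficit after a negative result
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.58, NPV = 0.84
```

Because the predictive values depend on this base rate, they would change in a sample with a different proportion of affected subjects, whereas sensitivity, specificity and the likelihood ratios do not depend on it directly.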
Table 8B
Detecting figural memory deficits (CON- and PAT-VALIDATE, N= 138).
Figural Learning Score Figural Final Score
Affected patients/controls (%) a 48 / 1 (35.5) 49 / 0 (35.5)
NeuroCog FX® Figural Memory (SV thresholds)
<80 <85 <90 <80 <85 <90
Positively tested (%) 5.8 28.3 34.1 6.0 28.4 34.3
Classification errors (%) b 31.2 21.7 18.8 33.6 24.6 21.6
Relative error reduction (%) b -16.2 -50.3 -58.5 -12.0 -44.3 -52.7
Positive predictive value 0.88 0.74 0.74 0.75 0.71 0.72
Negative predictive value 0.68 0.80 0.85 0.66 0.77 0.82
Likelihood ratio positive 12.71 5.27 5.30 5.20 4.26 4.40
Likelihood ratio negative 0.87 0.46 0.33 0.90 0.52 0.39
Sensitivity 0.14 0.59 0.71 0.12 0.55 0.67
Specificity 0.99 0.89 0.87 0.98 0.87 0.85
Correlation (ϕ) 0.27 0.51 0.59 0.20 0.45 0.53
χ2 10.02 35.83 47.25 5.42 27.19 37.36
χ2-Test (df=1), significance (P) 0.002 0.000 0.000 0.020 0.000 0.000
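Every row of Tables 8A to 8C follows from a single 2×2 cross-tabulation of the NeuroCog FX® result (positive/negative) against the established tests (affected/unaffected). The Python sketch below computes these statistics from the four cell counts; the function name `diagnostic_metrics` is hypothetical. The counts used (29 true positives, 10 false positives, 20 false negatives, 79 true negatives) are not reported in the paper. They are our back-calculation from the Figural Learning Score, SV < 85 column of Table 8B (N = 138), and they reproduce that column to rounding. The χ² here is the Pearson statistic without continuity correction, which for a 2×2 table equals N·ϕ²; the tabled values appear to have been computed this way.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Screening statistics from a 2x2 table (test result x reference standard)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # affected subjects who test positive
    specificity = tn / (tn + fp)  # unaffected subjects who test negative
    # Phi coefficient (correlation between two dichotomous variables).
    phi = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return {
        "positively_tested_pct": 100 * (tp + fp) / n,
        "classification_error_pct": 100 * (fp + fn) / n,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "phi": phi,
        "chi2": n * phi ** 2,  # Pearson chi-square, no continuity correction, df = 1
    }


# Hypothetical cell counts back-calculated from Table 8B, Figural Learning Score, SV < 85.
metrics = diagnostic_metrics(tp=29, fp=10, fn=20, tn=79)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.2f}")
```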
Table 8C
Detecting other cognitive deficits (CON- and PAT-VALIDATE, N= 149).
Established non-memory tests
Affected patients/controls (%) a 63 / 8 (47.7)
NeuroCog FX® SCORE (overall performance) (SV thresholds)
<80 <85 <90 <95
Positively tested (%) 10.7 19.5 32.2 47.7
Classification errors (%) b 42.3 37.6 28.9 29.5
Relative error reduction (%) b -12.2 -22.6 -41.3 -40.8
Positive predictive value 0.75 0.76 0.79 0.69
Negative predictive value 0.56 0.59 0.67 0.72
Likelihood ratio positive 3.30 3.45 4.17 2.45
Likelihood ratio negative 0.88 0.76 0.53 0.43
Sensitivity 0.17 0.31 0.54 0.69
Specificity 0.95 0.91 0.87 0.72
Correlation (ϕ) 0.19 0.28 0.44 0.41
χ2 5.37 11.49 28.20 24.81
χ2-Test (df=1), significance (P) 0.020 0.001 0.000 0.000
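Footnote b, which defines the relative error reduction, is not reproduced in this section. The tabled values appear consistent with the following reading, which is our inference and not the authors' stated definition: the observed classification error is compared with the error expected if test results were assigned at random at the same marginal rates (proportion affected, proportion positively tested). The Python sketch below implements that reading; `chance_error` and `relative_error_reduction` are hypothetical names. For Table 8C (47.7% affected) it reproduces all four reported values to within about 0.1 percentage points.

```python
def chance_error(p_affected, p_positive):
    """Expected misclassification rate if test results were independent of the reference standard."""
    false_negative = p_affected * (1 - p_positive)
    false_positive = (1 - p_affected) * p_positive
    return false_negative + false_positive


def relative_error_reduction(p_affected, p_positive, observed_error):
    """Percent change of the observed error relative to chance error (negative = fewer errors)."""
    expected = chance_error(p_affected, p_positive)
    return 100 * (observed_error - expected) / expected


# Table 8C: (proportion positively tested, observed classification error) per SV threshold.
columns = {"<80": (0.107, 0.423), "<85": (0.195, 0.376), "<90": (0.322, 0.289), "<95": (0.477, 0.295)}
for threshold, (p_pos, err) in columns.items():
    print(threshold, round(relative_error_reduction(0.477, p_pos, err), 1))
```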
Figure 1: NeuroCog FX® cognitive performance profile.
Figure 2: Performance profiles: CON- and PAT-TOTAL.
Percentage of subjects with SV<90 in the respective NeuroCog FX® subtest (Chi-square
tests, P<.001 for each subtest).
References
1 Elger CE, Helmstaedter C & Kurthen M. Chronic epilepsy and cognition. Lancet Neurol 2004;3:663-672.
2 Motamedi G & Meador K. Epilepsy and cognition. Epilepsy Behav 2003;4 Suppl 2:S25-S38.
3 Meador KJ, Gilliam FG, Kanner AM & Pellock JM. Cognitive and behavioral effects of
antiepileptic drugs. Epilepsy Behav 2001;2:SS1-SS17.
4 Taphoorn MJ & Klein M. Cognitive deficits in adult patients with brain tumours. Lancet
Neurol 2004;3:159-168.
5 Sparks DL, Sabbagh MN, Connor DJ, Lopez J, Launer LJ, Browne P, Wasser D, Johnson-
Traver S, Lochhead J & Ziolwolski C. Atorvastatin for the treatment of mild to moderate
Alzheimer disease: preliminary results. Arch Neurol 2005;62:753-757.
6 Panitch H, Miller A, Paty D & Weinshenker B. Interferon beta-1b in secondary progressive
MS: results from a 3-year controlled study. Neurology 2004;63:1788-1795.
7 Ravina B, Putt M, Siderowf A, Farrar JT, Gillespie M, Crawley A, Fernandez HH,
Trieschmann MM, Reichwein S & Simuni T. Donepezil for dementia in Parkinson's
disease: a randomised, double blind, placebo controlled, crossover study. J Neurol Neurosurg
Psychiatry 2005;76:934-939.
8 Brodie MJ, Shorvon SD, Canger R, Halász P, Johannessen S, Thompson P, Wieser HG &
Wolf P. Commission on European Affairs: Appropriate standards of epilepsy care across
Europe. Epilepsia 1997;38:1245-1250.
9 Wilken JA, Sullivan CL, Lewandowski A & Kane RL. The use of ANAM to assess the side-
effect profiles and efficacy of medication. Arch Clin Neuropsychol 2007;22 Suppl 1:S127-
S133.
10 Wild K, Howieson D, Webbe F, Seelye A & Kaye J. Status of computerized cognitive testing
in aging: a systematic review. Alzheimers Dement 2008;4:428-437.
11 Korczyn AD & Aharonson V. Computerized methods in the assessment and prediction of
dementia. Curr Alzheimer Res 2007;4:364-369.
12 First peer-reviewed publications on test systems that have already been applied in epilepsy
research (sorted by year of publication): Cambridge Neuropsychological Test Automated
Battery CANTAB (Sahakian BJ, Morris RG, Evenden JL, Heald A, Levy R, Philpot MP &
Robbins TW. A comparative study of visuospatial memory and learning in Alzheimer-type
dementia and Parkinson's disease. Brain 1988;111:695-718); Automated Neuropsychological
Assessment Metrics ANAM (Bleiberg J, Garmoe W, Cederquist J, Reeves D & Lux W.
Effects of Dexedrine on performance consistency following brain injury: a double-blind
placebo crossover case study. Neuropsychiatr Neuropsychol Behav Neurol 1993;6:245-248);
FePSY ‘The Iron Psyche’ (Aldenkamp AP, Alpherts WC, Diepman L, van 't Slot B, Overweg
J & Vermeulen J. Cognitive side-effects of phenytoin compared with carbamazepine in
patients with localization-related epilepsy. Epilepsy Res 1994;19:37-43); Cognitive Drug
Research CDR (Mohr E, Knott V, Sampson M, Wesnes K, Herting R & Mendis T. Cognitive
and quantified electroencephalographic correlates of cycloserine treatment in Alzheimer's
disease. Clin Neuropharmacol 1995;18:28-38); MicroCog (Powell Ass.; Di Sclafani V, Clark
HW, Tolou-Shams M, Bloomer CW, Salas GA, Norman D & Fein G. Premorbid brain size is
a determinant of functional reserve in abstinent crack-cocaine and
crack-cocaine-alcohol-dependent adults. J Int Neuropsychol Soc 1998;4:559-565); Headminder (Erlanger DM,
Feldman DJ, Theodoracopulos A & Kaplan D. Development and validation of the cognitive
stability index, a web-based protocol for monitoring change in cognitive function. Arch
Clin Neuropsychol 2000;15:293-316); Mindstreams (NeuroTrax Inc.; Elstein D, Guedalia J,
Doniger GM, Simon ES, Antebi V, Arnon Y & Zimran A. Computerized cognitive testing in
patients with type I Gaucher disease: effects of enzyme replacement and substrate reduction.
Genet Med 2005;7:124-130); California Computerized Assessment Package CALCAP (Chang
L, Ernst T, Speck O, Patel H, DeSilva M, Leonido-Yee M & Miller EN. Perfusion MRI and
computerized cognitive test abnormalities in abstinent methamphetamine users. Psychiatry
Res 2002;114:65-79); Behavioral Assessment and Research System BARS (Rohlman DS,
Gimenes LS, Eckerman DA, Kang SK, Farahat FM & Anger WK. Development of the
Behavioral Assessment and Research System (BARS) to detect and characterize
neurotoxicity in humans. Neurotoxicology 2003;24:523-531); CNS Vital Signs (Gualtieri CT
& Johnson LG. Reliability and validity of a computerized neurocognitive test battery, CNS
Vital Signs. Arch Clin Neuropsychol 2006;21:623-643); Rochester Test Battery RTB
(Davidson PW, Weiss B, Beck C, Cory-Slechta DA, Orlando M, Loiselle D, Young EC,
Sloane-Reeves J & Myers GJ. Development and validation of a test battery to assess subtle
neurodevelopmental differences in children. Neurotoxicology 2006;27:951-969); NexAde
(Korczyn AD & Aharonson V. Computerized methods in the assessment and prediction of
dementia. Curr Alzheimer Res 2007;4:364-369).
13 Fliessbach K, Hoppe C, Schlegel U, Elger CE & Helmstaedter C. [NeuroCogFX - a
computer-based neuropsychological assessment battery for the follow-up examination of
neurological patients]. Fortschr Neurol Psychiatr 2006;74:643-650.
14 Fliessbach K, Helmstaedter C, Urbach H, Althaus A, Pels H, Linnebank M, Juergens A,
Glasmacher A, Schmidt-Wolf IG, Klockgether T & Schlegel U. Neuropsychological
outcome after chemotherapy for primary CNS lymphoma: a prospective study. Neurology
2005;64:1184-1188.
15 Jünemann H, Helmstaedter C & Elger CE. [Possible use of computerized memory testing in
presurgical epilepsy diagnostics]. In: Scheffner D, ed. [Epilepsy 91]. Reinbek: Einhorn-Presse
Verlag; 1992. p. 449-452.
16 Helmstaedter C, Hoppe C & Elger CE. Memory alterations during acute high-intensity vagus
nerve stimulation. Epilepsy Res 2001;47:37-42.
17 Helmstaedter C, Elger CE & Lendt M. Postictal courses of cognitive deficits in focal
epilepsies. Epilepsia 1994;35:1073-1078.
18 Gualtieri CT & Johnson LG. Reliability and validity of a computerized neurocognitive test
battery, CNS Vital Signs. Arch Clin Neuropsychol 2006;21:623-643.
19 Horn W. [Leistungsprüfsystem L-P-S (Performance Testing System)]. Göttingen: Hogrefe; 1983.
20 Lehrl S. [Multiple choice vocabulary test. Form B]. Erlangen: Straube; 1977.
21 Flynn JR. The mean IQ of Americans: massive gains 1932 to 1978. Psychol Bull 1984;95:29-51.
22 Hoppe C, Helmstaedter C, Scherrmann J & Elger CE. No evidence for cognitive side effects
after 6 months of vagus nerve stimulation in epilepsy patients. Epilepsy Behav 2001;2:351-
356.
23 Krauth J. [Test construction and test theory]. Weinheim: Psychologie Verlags Union; 1995.
24 Hoppe C, Elger CE & Helmstaedter C. Long-term memory impairment in patients with focal
epilepsy. Epilepsia 2007;48 Suppl 9:26-29.
25 Gleissner U, Helmstaedter C, Schramm J & Elger CE. Memory outcome after selective
amygdalohippocampectomy in patients with temporal lobe epilepsy: one-year follow-up.
Epilepsia 2004;45:960-962.
26 Goodrich-Hunsaker NJ & Hopkins RO. Word memory test performance in amnesic patients with
hippocampal damage. Neuropsychology 2009;23:529-534.
27 Aldenkamp AP. Cognitive effects of topiramate, gabapentin, and lamotrigine in healthy
young adults (comment). Neurology 2000;54:271-272.
28 Kockelmann E, Elger CE & Helmstaedter C. Significant improvement in frontal lobe
associated neuropsychological functions after withdrawal of topiramate in epilepsy patients.
Epilepsy Res 2003;54:171-178.
29 For example, the special supplement of Arch Clin Neuropsychol 2007;22 Suppl. 1.
30 Cernich AN, Brennan DM, Barker LM & Bleiberg J. Sources of error in computerized
neuropsychological assessment. Arch Clin Neuropsychol 2007;22 Suppl 1:S39-S48.
31 International Test Commission (ITC). International guidelines on computer-based and
Internet-delivered testing
[http://www.intestcom.org/Downloads/ITC%20Guidelines%20on%20Computer%20-%20version%202005%20approved.pdf; accessed July 16, 2009]. ITC; 2005.
32 A demo version of the software can be requested from the corresponding author.