
Cognitive Measurements In the HRS/ADAMS Surveys

Jack McArdle, USC & LRI

IOM Meeting, Washington DC April 10, 2014; Panel 3, 2:30-2:35pm.

Overview

1. Background Theory and Early Experiments
2. Current Cognitive Measures in the HRS
3. Contemporary Analyses of Cognition in HRS
4. Cognition Research in the HRS-ADAMS
5. Future Plans, Summary & Conclusions

1. Background Theory and Early Experiments

The US National Growth and Change Studies (NGCS, 1980 to present; now called CogUSC)

NGCS/CogUSC is not a single study, but rather a program of research on the “growth and decline of multiple intellectual abilities.” The NGCS/CogUSC was started at the University of Denver in 1980 by Jack McArdle and John Horn, moved to the University of Virginia until 2005, and now is at USC. Our main substantive interest was to use all available collections of psychological tests to better describe and understand the many changes that seem to occur to people over the adult ages of the life-span (18-95+). NGCS/CogUSC research has been continuously funded by the National Institute on Aging (NIA) since 1980.

Most of the credit goes to Dr. John L. Horn, USC Professor of Psychology & Gerontology, 1987-2006

Gc = Acquired Knowledge

“Shhh, Zogf!….Here comes one now!”

Gf = Novel Reasoning

Cattell (1942) & Horn’s (1967) Theory of Cognitive Changes

Multilevel longitudinal age curves (and 95% confidence boundaries) for Gf and Gc abilities (Rasch W-units) from CogNGCS longitudinal data (McArdle et al., Developmental Psychology, 2002)

[Figure, two panels. Panel 1: General Fluid Ability (Gf) score as a function of age. Panel 2: General Crystallized Ability (Gc) score as a function of age. x-axis: Age-at-Testing (0 to 100); y-axis: ability score (-60 to 60).]

Gf: max (dy/dt) at age 23; min (d²y/dt²) at age 46.

Gc: max (dy/dt) at age 36; min (d²y/dt²) at age 71.

Woodcock’s (1990) Multiple Abilities “toolkit” of what is now termed the “Cattell-Horn-Carroll” (CHC) Tests in the Woodcock-Johnson (WJ) Battery (Note: factor Go not included)

We then did a few HRS-based Experiments

•  Several experiments in cognitive CATI have recently been used to measure Gf using some form of the WJ-III Number Series (NS) test:

1. Boker & McArdle (1998) on Internet testing;
2. the NGCS-HRS (2004-06) modules;
3. the McArdle, Fisher, Kadlec (2007) HRS analyses;
4. the CogUSA (2007-2014) multi-wave data collection;
5. the RAND-ALP Experiment #102.

•  In sum, the WJ-III NS test with 47 items was chosen because it had been developed to have a high internal consistency reliability (ric > 0.95) over the adult age range. The HRS-2010 and later waves use BATS versions.

2. Current Cognitive Measures In the HRS

The Health and Retirement Study

The University of Michigan Health and Retirement Study (HRS) surveys N > 30,000 non-institutionalized Americans over the age of 50 every 2 years.

Supported by the National Institute on Aging (NIA), the study paints an emerging portrait of an aging America's physical and mental health, insurance coverage, financial status, family support systems, labor market status, and retirement planning.

The Memory tests are based on 4 intentionally short questions (to be discussed), but are designed to be indicators of survey validity, dementia, and impairment.

The full scope of the study and the available data are described on the web at HRSONLINE.ISR.UMICH.EDU

Cognitive functioning measures included in different waves of HRS/AHEAD

Waves (table columns): HRS 92; AHEAD 93; HRS 94; AHEAD 95; HRS 96; HRS 98 and later waves

Measures (table rows):
•  Immediate recall (20 items)
•  Immediate recall (10 items)
•  Delayed recall (20 items)
•  Delayed recall (10 items)
•  Serial 7 subtraction
•  Backward, Dates, Names
•  WAIS-Similarities
•  WAIS-Vocabulary
•  Rating of memory
•  Dementia diagnoses

Q#1: “How is My Memory?” Over Age

Q#2: “How has your memory changed?”

Q#3: “Immediate Word Recall” Over Age

Q#4: “Delayed Word Recall” Over Age

TICS Mental Alertness Measurement

There are classic definitions based on the Mini-Mental State Exam (MMSE), the Clinical Rating Scale, and the more recent Telephone Interview of Cognitive Status (TICS) versions. Current HRS/ADAMS Mental Status:

Q#1: Please tell me today’s date. Month? Day of the month? Year? Day of the week?
Q#2: What do people usually use to cut paper?
Q#3: What do you call the kind of prickly plant that grows in the desert?
Q#4: Who is the President of the United States?
Q#5: Who is the Vice President of the US?

This differs substantially from the Montreal Cognitive Assessment (MoCA), mainly because the MoCA uses pictures.

3. Contemporary Analyses of the Cognitive Measures in the HRS

Age can be the Basis of Change

•  The major classification of cognitive aging studies is a separation of the way inferences are made about aging:
1. Cross-Sectional “Age Differences” = made from differences between people at different ages, versus
2. Longitudinal “Age Changes” = additionally made from differences within the same people at specific ages.

•  Recent advances in longitudinal statistical methods have led to remarkable flexibility in model fitting (e.g., latent growth, multi-level, hierarchical models), including the ability to deal with unbalanced and incomplete data.

•  If a study includes two or more repeated observations of the same persons, these statistical models allow the direct examination of “rates of change over age” (Δy[t]) without random error, and with correction for non-random dropout.

•  One recent example is our 2007 study on the nature and sources of Age Changes in Episodic Memory in the HRS, and another is our 2011 study on multivariate dynamics.
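The distinction between cross-sectional age differences and longitudinal age changes can be illustrated with a small sketch. The records below are invented for illustration (they are not HRS values), and the function names are hypothetical. The sketch uses raw differences only; unlike the latent growth and multilevel models mentioned above, it does not remove random error or correct for dropout.

```python
from statistics import mean

# Hypothetical toy records: (person_id, age_at_test, score). Not HRS data.
records = [
    ("p1", 60, 12.0), ("p1", 62, 11.5),
    ("p2", 70, 10.0), ("p2", 72, 9.0),
    ("p3", 80, 7.0), ("p3", 82, 5.5),
]


def cross_sectional_slope(rows):
    """Least-squares slope of score on age, using each person's first test only."""
    first = {}
    for pid, age, score in rows:
        if pid not in first:
            first[pid] = (age, score)
    ages = [a for a, _ in first.values()]
    scores = [s for _, s in first.values()]
    a_bar, s_bar = mean(ages), mean(scores)
    num = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, scores))
    den = sum((a - a_bar) ** 2 for a in ages)
    return num / den


def longitudinal_rates(rows):
    """Within-person change per year between each person's first and last test."""
    by_person = {}
    for pid, age, score in rows:
        by_person.setdefault(pid, []).append((age, score))
    rates = {}
    for pid, obs in by_person.items():
        if len(obs) < 2:
            continue  # a single observation carries no within-person change
        obs.sort()
        (a0, s0), (a1, s1) = obs[0], obs[-1]
        rates[pid] = (s1 - s0) / (a1 - a0)
    return rates


print("between-person slope per year:", cross_sectional_slope(records))
print("within-person rates per year:", longitudinal_rates(records))
```

In this toy data set the between-person slope is shallower than the average within-person rate, which is the kind of discrepancy that separates the two forms of inference.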

A Multilevel Model with Nonlinear Changes in HRS Cognition Scores given AGE of Measurement (N=14,250; D=32,665; McArdle et al., 2007)

Two Cognitive Factors are well measured by the current HRS measures, but they are not Gf-Gc

(HRS data only, from McArdle et al., 2007)

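[Path diagram: two correlated common factors for the six HRS measures.]

Indicator   Factor                  Loading   Unique variance
IM          Episodic Memory (M)     .94       .1
DM          Episodic Memory (M)     .85       .3
S7          Mental Alertness (A)    .67       .6
BC          Mental Alertness (A)    .47       .8
NA          Mental Alertness (A)    .70       .5
DA          Mental Alertness (A)    .53       .7

Correlation between factors M and A: .63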
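The numbers in the two-factor diagram can be checked for internal consistency. The sketch below assumes a standardized solution. It also assumes that IM and DM load on Episodic Memory and that S7, BC, NA, and DA load on Mental Alertness, and that loadings and unique variances pair up in the order printed. That pairing is a reading of the diagram, not something the slide states, and the function names are hypothetical.

```python
# Values read from the diagram; the indicator-to-factor assignment is an assumption.
loadings = {"IM": 0.94, "DM": 0.85, "S7": 0.67, "BC": 0.47, "NA": 0.70, "DA": 0.53}
uniqueness = {"IM": 0.1, "DM": 0.3, "S7": 0.6, "BC": 0.8, "NA": 0.5, "DA": 0.7}
factor_of = {"IM": "M", "DM": "M", "S7": "A", "BC": "A", "NA": "A", "DA": "A"}
factor_corr = 0.63  # correlation between factors M and A


def total_variance(name):
    """Common plus unique variance; close to 1 in a standardized solution."""
    return loadings[name] ** 2 + uniqueness[name]


def implied_correlation(x, y):
    """Model-implied correlation between two indicators."""
    if x == y:
        return 1.0
    phi = 1.0 if factor_of[x] == factor_of[y] else factor_corr
    return loadings[x] * phi * loadings[y]


for name in loadings:
    print(name, round(total_variance(name), 2))
print("IM-DM:", round(implied_correlation("IM", "DM"), 2))
print("IM-S7:", round(implied_correlation("IM", "S7"), 2))
```

Each indicator's squared loading plus its unique variance comes out close to 1, which supports this reading of the diagram. The one-decimal unique variances are rounded, so small departures from 1 are expected.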

Results from a Multilevel Approach on HRS Cognition (in McArdle et al., 2007)

1.  The cross-sectional memory scores expected at Age 65 are higher for people who are: (b1) Younger; (b2) Women; (b3) Born after 1920; (b4) More Educated; (b5) Living as a Couple; and (b6) In Good Cardiovascular Health.

2.  The longitudinal changes in memory scores at any age with no random error are expected to be larger or positive for people who are: (w1) Younger; (w2) Born before 1920 (but overall scores are lower); (w3) Less Educated (they can gain more); (w4) In Good Cardiovascular Health; and who (w5) have Taken the Same Tests At Least Once Before.

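[Path diagram: the two-factor model at two occasions. Indicators IR, DR, S7, BC, NA, and DA, each with a unique factor (UIR, UDR, US7, UBC, UNA, UDA), are measured at occasions [1] and [2] and load on Episodic Memory (M[1], M[2]) and Mental Alertness (A[1], A[2]). Coefficients shown in the diagram: .53, .24, -.07, .99, .62, .06, 1, 1, .50, .01.]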

Of importance is that the loadings of the two HRS cognitive factors are invariant over time and mode of testing (McArdle, 2011), but the factor scores are not.

T1 = Face-to-Face Testing; T2 = 2 years later + Telephone Testing

From all the HRS longitudinal data, we created a Vector Field Plot of Dynamic Results (Δy[t]= α + β y[t-1] + γ x[t-1]; McArdle, 2011).
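The change equation quoted above can be sketched in a few lines. The parameter values are invented for illustration and are not the published estimates, and the function names are hypothetical. A full vector field plot would also need a companion equation for Δx[t], which is not given on the slide, so only the y-component is computed here.

```python
def delta_y(y_prev, x_prev, alpha, beta, gamma):
    """Expected change: Δy[t] = α + β·y[t-1] + γ·x[t-1]."""
    return alpha + beta * y_prev + gamma * x_prev


def project(y0, xs, alpha, beta, gamma):
    """Accumulate expected changes in y along a given sequence of x values."""
    ys = [y0]
    for x_prev in xs:
        ys.append(ys[-1] + delta_y(ys[-1], x_prev, alpha, beta, gamma))
    return ys


def y_component_field(y_grid, x_grid, alpha, beta, gamma):
    """Expected Δy at every (x, y) grid point: the vertical part of each arrow."""
    return {(x, y): delta_y(y, x, alpha, beta, gamma)
            for x in x_grid for y in y_grid}


# Hypothetical parameters, chosen only to show the mechanics.
alpha, beta, gamma = 1.0, -0.5, 0.25
print(project(0.0, [0.0, 0.0], alpha, beta, gamma))
print(y_component_field([0.0, 2.0], [0.0, 4.0], alpha, beta, gamma))
```

With a negative β, the expected change shrinks as y rises and reaches zero where y = -(α + γ·x)/β, so the arrows in such a plot point toward that line.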

We added a Fluid Reasoning (Gf) measure to the HRS, as in the WJ-III definition:

The ability to reason and solve problems that often involve unfamiliar information or procedures. Manifested in the reorganization, transformation, and extrapolation of information.

Examples: Number Series task (47 items) 4 6 7

23 26 30 35

2 3 6 11 8 12 24 44

Issues in “Adaptive” testing

•  For almost a century, it has been clear that increasing the number of items on a test can lead to increased reliability of the total test score (see Spearman & Brown, 1910).

•  What has also been clear is that the number of items (M) can be cut down with only a minor loss of total test score reliability.

•  This leads to considerations of “adaptive testing,” where items are presented in a strategic fashion to obtain the maximum reliability with the fewest items.

•  Two practical problems of using multiple-measure testing with an elderly population are unwanted “cognitive fatigue” and “practice effects.” In this case, shortened tests are essential.
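The trade-off between test length and reliability can be sketched with the Spearman-Brown formula. This is an illustration, not the procedure used for the HRS. It assumes all items are parallel (equally informative), which adaptive item selection is designed to improve on. It starts from the reliability of 0.95 quoted earlier for the 47-item Number Series test. The function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


def length_factor_needed(reliability: float, target: float) -> float:
    """Length multiplier needed to move from `reliability` to `target`."""
    return (target * (1.0 - reliability)) / (reliability * (1.0 - target))


full_items = 47          # length of the full Number Series test
full_reliability = 0.95  # lower bound quoted for the full test
for k_items in (6, 15, 30):
    r = spearman_brown(full_reliability, k_items / full_items)
    print(f"{k_items:2d} items -> predicted reliability {r:.2f}")
```

Under these assumptions a randomly chosen 6-item form would be predicted to have a reliability of roughly 0.7. Strategic item selection aims to do better than that with the same number of items.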

Using the Unpublished Items (McArdle & Woodcock, 2009)

1. These items are rank-ordered (~20 W units apart) within each set, so the total score has reasonable precision (SEM~10)

2. Due to original Rasch scaling, the total score (0-15) is parallel with the Rasch W score and it can be reported.

3. Items should be administered one screen at a time, and the entered values and response time (RT) should be saved.

4. If administering the full WJ we would save time by starting at item 4 (basal), and go up by 1 item until 3 in a row are incorrect (ceiling). [If any R fails 4,5,6, go down to 1,2,3]. This requires knowledge of initial correct answers.

5. The use of these items also permits a wide range of adaptive test strategies, but these have not been fully tested.
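The basal and ceiling rule in point 4 can be sketched as follows. Several details are assumptions made for illustration and are not stated on the slide: items below the starting point are credited as correct unless the respondent fails the first three administered items, in which case items 3, 2, and 1 are administered; and `answer` is a hypothetical callback that reports whether a given item was answered correctly.

```python
from typing import Callable, Dict


def administer(n_items: int, answer: Callable[[int], bool],
               start: int = 4, ceiling_run: int = 3) -> Dict[int, bool]:
    """Give items upward from `start` until `ceiling_run` consecutive errors."""
    responses: Dict[int, bool] = {}
    run = 0
    item = start
    while item <= n_items and run < ceiling_run:
        correct = answer(item)
        responses[item] = correct
        run = 0 if correct else run + 1
        item += 1

    # Assumed reading of "fails 4,5,6": all of the first three items were missed.
    first_block = range(start, start + ceiling_run)
    failed_start = all(responses.get(i) is False for i in first_block)
    for i in range(start - 1, 0, -1):
        # Go down to the easier items only when the start block was failed;
        # otherwise credit them as correct (an assumption about basal scoring).
        responses[i] = answer(i) if failed_start else True
    return responses


def raw_score(responses: Dict[int, bool]) -> int:
    """Number of items answered correctly or credited."""
    return sum(responses.values())


# Hypothetical respondent who can solve items 1 through 8 of a 15-item set.
result = administer(15, lambda i: i <= 8)
print(sorted(result), raw_score(result))
```

For this hypothetical respondent, eight items are actually presented (items 4 through 11), compared with fifteen for the full set.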

Good fit (χ²=5, df=3, εa=.023) for the Revised “Very Simple Structure” result for the complete HRS 2004 cognitive measures

[Path diagram: three common factors. EM: IM (.94), DM (.85). MS: S7 (.67), BC (.47), NA (.70), DA (.53). NS: NS (.99). Unique variances: IM .1, DM .3, S7 .6, BC .8, NA .5, DA .7. Factor correlations shown: .50, .20, .30.]

Empirical Correlations with NS47 Full Score from Fewer Items (K) and Selections

Conclusions about Number Series

•  This NS test can measure aspects of fluid intelligence (Gf) useful for numerical reasoning.

•  Respondents can do about 5-6 items in a three-minute period, either over TEL or FTF.

•  Items do not need to be repeated from one year to the next because the item pool is large.

•  With such a small number of items (5-7) there is no substantial advantage to using adaptation, and a fixed set of 6-7 items each year will be “nearly optimal.”

•  The fixed items can be scored in the W-metric using a basic template, so no new scoring issues arise.

•  Adaptation will help save some time in the FTF setting, as long as a few more items are administered, and this will be comparable to the longer test and the fixed TEL.

New CogUSA Measurement Strategy

We think it is possible to measure key aspects of cognition (1) in-person (FTF), (2) over the telephone (TEL), or maybe (3) over the internet (WEB), or (4) on hand-held tablets (TAB). In the current CogUSA research we are re-measuring most of the same people as before (n>1,000), now using some combination of all modes. This use of telephone and internet testing has the potential to make any new study much less expensive, so even more participants can be included, but the question of mode equivalence needs to be addressed directly. This will also put a premium on high-quality measures, and we will choose the best ones.

New Internet Measurement Strategy

If it is possible to measure the same key aspects of cognition (1) in-person, (2) over the telephone, or (3) on the internet, then the scores should not depend on the mode of testing.

But the Internet model is not yet well established.

Any use of internet testing has the potential to make the new study much less expensive. This means more participants can be included. It also increases the diversity of tests (i.e., visual materials) that can be presented in an adaptive fashion.

We have had some prior experience with Internet testing (see Boker & McArdle, 1998).

ALP 2010 Internet Testing

•  The American Life Panel (RAND, Kapteyn, 2004) was used to evaluate the UNS-BAT (with n>2,500 adults).

•  We obtained full sets of scores on all 15 items in Set A and all 15 items in Set B to evaluate the benefits of, and the different kinds of bias due to, adaptive testing strategies (i.e., for HRS, is BAT better than FAT?).

•  We gave both tests in counterbalanced order in ALP 2010 to check the relationship of the parallel forms termed A and B. We wanted to know: can we give out either A or B?

•  We checked the utility of adding “speed of response” as well (definitely for item selection, and maybe for new scoring tables); we save RT in ALP.

•  We checked the use of a filler between sets A and B: the “Need for Cognition” items (8 simple yes-or-no items).

Histogram of W-score (N=2548, median=520)

NS Item 7 Response Time Distribution

Selected ALP Results (09/2011)

1.  The mean score over all the ALP participants was well over 85% correct, indicating this was a “special” group.

2.  The internal consistency reliability, established using 30 items and Cronbach’s alpha, was r=0.88.

3.  The correlation between all 15 items in Set A and all 15 items in Set B was rab=0.66. The counterbalanced order of A-B showed little effect (but the intercepts were slightly higher).

4.  The adaptive 6-item UNS-BAT had an overall correlation of rta=0.67-0.74, but higher with short forms, rfa=0.75-0.80.

5.  We checked the utility of adding “speed of response” (we save RT in ALP): definitely useful for item selection, and maybe for new scoring tables.

6.  The Need For Cognition filler between sets A and B was useful.
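Two of the statistics reported above, Cronbach's alpha and the correlation between forms A and B, can be sketched as follows. The item responses are invented (they are not ALP data) and the function names are hypothetical. The step-up calculation assumes that A and B are strictly parallel forms, which the ALP analyses were designed to check and not to presume.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    n_items = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_vars) / total_var)


def step_up_two_forms(r_ab):
    """Spearman-Brown reliability of the combined A+B score from the A-B correlation."""
    return 2.0 * r_ab / (1.0 + r_ab)


# Invented 0/1 item responses (4 respondents x 3 items), not ALP data.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))      # 0.75
print(round(step_up_two_forms(0.66), 2))  # 0.8
```

The two estimators rest on different assumptions, so they need not give the same value for the same 30 items.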

Reconsider HRS Cognitive Tasks in 2010

We have had several discussions with the other HRS Co-PIs, and we had new empirical data from CogUSA and other studies, so a consensus was formed to:

1. Retain the current HRS Cognitive Measures, due to their ability to measure important individual differences in factors EM and MA, and for longitudinal simplicity (comparisons).

2. Try to extract more reliable information about Speed of Processing by using the previously unavailable Timing Data (plus a second trial of BC), following the advice of Lachman & Spiro (2002).

3. Include only the Newly Adaptive NS and one Retrieval Fluency (Animals) measure. This was due to the clearest validity evidence, but other measures were added in 2012.

4. Results on Dementia and Aging from HRS-ADAMS

Why Worry About Cognitive Declines?

Cognitive decline reaching a point of “no-return to normality” is an important cause of decreased quality of life, disability, and loss of independence among older adults, and it even makes people refuse testing.

Dementia is defined behaviorally as a loss of cognitive control leading to “functional impairment.” Dementia of the Alzheimer’s type (AD) can be a behavioral indicator of serious brain damage.

Thus, identifying individuals in the earliest stages of cognitive decline may allow medical and behavioral interventions to be targeted at those most likely to benefit from these treatments.

Clinical and (less commonly) population-based studies of early cognitive decline and mild cognitive impairments have generated interest in recent years.

The Aging, Demographics & Memory (ADAMS) Study (see HRSONLINE)

The ADAMS study was based on 1,700 people from HRS.

We took the cognitive screening tasks (EM+MA) and subdivided people over the age of 70 into cognitive strata based on their performance in 2000-2002 (ranging from “low functioning” to “high normal” on a 0-35 scale), and then tested them on a full battery of cognitive tasks (in their homes) with interviewers who were blind to the strata.

By a case-consensus design (where raters did not know the people) these HRS people were then classified as (a) Demented, (b) Non-Clinical, or (c) Normal, non-case.

The analyses that followed were aimed at matching (for n=856) the prior HRS scores to the in-person testing.

Dementia Prevalence and Incidence

The newest statistics (see Plassman et al., 2007) come from a nationally representative study of the USA for HRS people over 71 years of age (ADAMS-HRS).

The overall dementia rate is 14% among individuals aged 71 and older, projected to represent 3.4 million persons in the US in 2002.

The rate is about 10% for Alzheimer’s Disease, projected to represent 2.4 million people in the US in 2002.

Dementia estimates change with age, from 5% for those aged 71-74 to 34% for those aged 90 and above.

Alternative estimates are higher due to different definitions of “degree of functional impairment.”

Mild Cognitive Impairment (MCI)

Multiple studies of memory-based MCI in clinical samples have found an increased risk for progression to dementia compared to those with normal cognitive function.

However, in population-based samples memory-based MCI has not been clearly associated with increased risk of progression to dementia.

Key Statistic: About 22% of USA people over age 71 (5.4 million) have Cognitive Impairment Without Dementia (CIND; Plassman et al., 2008)
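As a back-of-the-envelope check, each quoted rate and projected count implies a size for the US population aged 71 and older. The sketch below uses only the rounded figures quoted on these slides, so the implied totals are approximate. The variable and function names are hypothetical.

```python
# (rate, projected number of people) as quoted on the slides; all values are rounded.
estimates = {
    "dementia": (0.14, 3.4e6),
    "Alzheimer's disease": (0.10, 2.4e6),
    "CIND": (0.22, 5.4e6),
}


def implied_population(rate, count):
    """Population size implied by a prevalence rate and a projected count."""
    return count / rate


for label, (rate, count) in estimates.items():
    millions = implied_population(rate, count) / 1e6
    print(f"{label}: about {millions:.1f} million people aged 71+")
```

All three pairs imply a base of roughly 24 million people, so the rounded rates and counts are mutually consistent.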

Combined Growth-Survival Model Based “Probability of Dementia” based only on HRS scores at each Age for ADAMS participants (n=822)

[Figure: Probability of AD (y-axis, 0.0 to 1.0) by Age at Measurement (x-axis, 60 to 100).]

Causes and Control of Dementia?

As of this date, there are no clear direct controls of dementia, or AD in particular, but many people are looking.

Dementia is a brain disorder, so much of the evidence is based on Beta-Amyloid increases and the resulting neuro-degeneration. These have been linked to specific gene markers (APOE) and drug therapies, but little is certain.

Improved cardiovascular health has recently led to the expected consequence of longer life, but the risk of incidence of AD still increases with age.

Women live longer, so the risk is higher (60%), but the age-related changes may be different for males and females; many informative studies of hormone therapy are under way.

No single behavioral therapy (i.e., training, using mind games, weight lifting, etc.) has demonstrated any form of clinical efficacy for preventing or delaying the onset of AD.

5. Future Plans, Summary & Conclusions

Summary of HRS Cognitive Results

#1. Our analyses show multiple common factors are measured in the current HRS, including Episodic Memory, Mental Status, and Vocabulary. These all behaved similarly to other research studies of Cognition and Aging.

#2. These factors at the initial testing are strongly related to Age (-.7), Gender (+.5f), Education (+.3), Cohort (-.3), and Couple (+.1), but are not related to the format of the interview (by Telephone or Face-to-Face) up to now.

#3. The longitudinal mixed effects age models for EM[t] have specific confounds: (a) positive retest/practice effects exist and (b) the groups of complete and incomplete cases are not strictly equivalent because the dropouts have lower initial scores. This is like other studies in Cognition and Aging.

#4. The longitudinal mixed effects age models for EM[t] show that cognitive status at age 65 is related to Gender (+5f), Education (+2), and Birth Cohort (-12).

#5. The longitudinal mixed effects age models for EM[t] show that the cognitive changes over age are related to Education (-.3) and Birth Cohort (+2). Dynamic models have been fit.

Summary of CogUSA Findings

#1: These Cognitive tests show strong diversity (not a single G) when measured over the telephone or in-person.

#2: Cognitive tests administered over the telephone and in-person are not identical, but they are more related than any other tests. There is a small practice effect.

#3: Not all 30 items are needed for the Wscore, and the BAT is as good as the FAT.

#4: The NS Wscores are negatively related to Age and positively related to Education.

#5: The NS Wscores are related to other Economic Outcomes (see McArdle, Smith & Willis, 2009; NBER)

#6: The Repeated Testing should yield Trait, Trait-Change, and State-like components, and we can measure Gf.

Summary of ALP 2010 Internet Findings

#1: Not all 30 items are needed for the Wscore, and the BAT is as good as the FAT. (We note that in ALP the start point is item 4 and we stop after 2 consecutive failures or a 1-2 minute RT delay.)

#2: Test forms A and B of 15 items each yield the same Wscore (SEW), but there may be a practice effect.

#3: Wscores are negatively related to Age and positively related to Education.

#4: Wscores are related to other Economic Outcomes in ALP (see McArdle, Smith & Willis, 2009; NBER)

#5: Wscores are not related to Need for Cognition scores.

#6: This Testing will yield Trait, Trait-Change, and State-like components, and we can measure Gf over the WEB.

New HRS Measurement Data

The HRS has kept all prior Cognitive Measures in place. These measures seem to do a very good job in determining (1) Episodic Memory (M[t]) and (2) Mental Alertness (A[t]), but not other key functions.

Also available are all the cognitive data we have collected on NS, and all other cognitive data in the HRS modules, ADAMS, CogUSA, and the ALP.

Many people have been measured on the NS in HRS 2010 and it is now available in W-score form. Since it is one of the few measures of Gf in the HRS battery, it should be a useful addition (McArdle, Smith & Willis, 2011).

The Fluency task added was based only on the popular “Animals” question (i.e., How many animals can you name in 45 seconds?), and a “Verbal Analogies” task was added too.

Other tests are being considered based on “Speed of Mental Processing” (e.g., Speed of Response to BC).

Final IOM Conclusions

1. There is not one thing (a construct) called Adult Intelligence or Adult Cognition; there are at least two things that are useful, and probably many more.

2. So we must be specific in what we hope to measure: cognition in the normal range (HRS) or cognition in non-normative disease states (ADAMS).

3. There are many good reasons to do a broad surveillance of “dementia” now (similar to BRFSS), especially among the younger people.

WITH THANKS!

•  The continuing NIA support (AG-07137-22; No. AG-07137-17)
•  The CogUSC Team (John, Kelly, Kelly, & Yan)
•  The UM-HRS Team (Gwen, Brooke, Jessica, Bill, Bob, David)
•  The RAND-USC-ALP team (Arie, Tania, Bas, Andy)

Author References

Horn, J.L. & McArdle, J.J. (2007). Understanding human intelligence since Spearman. In R. Cudeck & R. MacCallum (Eds.), Factor Analysis at 100 Years (pp. 205-247). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

McArdle, J.J. & Woodcock, R.W. (Eds.) (1998). Human Abilities in Theory and Practice. Mahwah, NJ: Erlbaum.

McArdle, J.J., Ferrer-Caja, E., Hamagami, F. & Woodcock, R.W. (2002). Comparative longitudinal multilevel structural analyses of the growth and decline of multiple intellectual abilities over the life-span. Developmental Psychology, 38 (1), 115-142.

McArdle, J.J., Fisher, G.G. & Kadlec, K.M. (2007). Latent Variable Analysis of Age Trends in Tests of Cognitive Ability in the Health and Retirement Survey, 1992-2004. Psychology and Aging, 22 (3), 525-545.

McArdle, J.J. (2011). Longitudinal Panel Analysis of the HRS Cognitive Data. In H. Haupt (Ed.), Adv. Stat. Anal., 95 (4), 453-480. Berlin: Springer-Verlag.

Plassman, B.L., Langa, K.M., Fisher, G.G., Heeringa, S.G., Weir, D.R., Ofstedal, … et al. (2008). Prevalence of Cognitive Impairment without Dementia in the United States. Annals of Internal Medicine, 148, 427-434.