Neuroscience on a Population Scale: Design, Measurement ...

Page 1: Neuroscience on a Population Scale: Design, Measurement ...

Steven G. Heeringa

Senior Research Scientist and Associate Director

Survey Research Center, Institute for Social Research

University of Michigan

Annual Workshop

Michigan Program in Survey Methods

University of Michigan

October 25, 2019

Neuroscience on a Population Scale: Design, Measurement, Data Integration and Analysis.

Outline and focus of today’s talk

• Unique challenges in population neuroscience studies
  ◦ Design
  ◦ Measurement
  ◦ Data types and integration
  ◦ Statistical analysis and interpretation

• Adolescent Brain Cognitive Development (ABCD) Study

Design Challenges: Subject recruitment, sampling plans for population neuroscience research

Both internal and external validity are needed

Study participation is demanding; obtaining study consent and response is a major challenge.

Portability/geographic reach of the measurement technology

Full, conventional probability sample designs like those employed in epidemiological studies such as the National Health and Nutrition Examination Survey (NHANES) may not be feasible.

Internal Validity, External Validity and “Population Representativeness” in Epidemiological Studies.

Keiding and Louis (2016), Perils and potentials of self-selected entry to epidemiological studies and surveys. J. R. Statist. Soc. A 179, Part 2, pp. 319–376

However, self-selection dramatically departs from the traditional, so-called gold standard approaches of targeted enrolment to scientific studies and sampling frame-based surveys. Traditionalists argue that we must adhere to the values of planned accrual and follow-up for all studies and identification of a sampling frame for surveys and possibly also for epidemiological and other such studies. Others propose that we should stop worrying about it and open up accrual, using modern approaches (covariate adjustments, find instrumental variables, ‘big data’) to make the necessary adjustments.

Elliott, M.R. and Valliant, R.L. (2017), Inference for nonprobability samples. Statistical Science 32(2), pp. 249–264.

Additional design/measurement challenges in population neuroscience studies.

The Brain R² Problem: Multi-level (brain, individual, social networks, community, environment, culture).

Sample size and power to detect small effects and interactions.
• Important outcomes are likely a function of many small effects.
• Analogy to trends in analysis of genetic/epigenetic contributions to health and other outcomes.

Types of Measurement in Population Neuroscience Studies

DNA

Epigenetics

Biomarkers

Child Assessments

Parent Interviews

Family History

Environment

School and Admin Records

MRI

DTI

fMRI

Epidemiological, Statistical and Computational Challenges in Population Neuroscience

• Descriptive/Analytical?, “left brain”/“right brain”
• Informative sampling, subject recruitment design
• Selection mechanisms (e.g. bias) in recruitment
• Lack of independence, correlated observations
  ◦ Geographic clustering of subjects
  ◦ Clustering in measurement (centers, interviewers, instruments)
• Latent Constructs
• Missing data
• Longitudinal Analysis of growth, change
• Analysis of Brain Image Data
  ◦ Spatial correlation (3D array of voxels)
  ◦ Temporal correlation (time series, fMRI)
  ◦ Image correction, registration/standardization, smoothing
  ◦ Dimension reduction, creating summary measures

Analytic Aims for Population Neuroscience

Brain as Brain: Anatomy, networks

Brain as Outcome: e.g. Does regular marijuana use (screen time) during adolescent development affect brain development (morphology) or function?

Brain as Predictor: e.g. Educational attainment.

Computational and Statistical Tools

• Whole Brain Computations and Graphical Display
  ◦ e.g. AFNI
• Cortical Surface Analysis
  ◦ e.g. SUMA, FreeSurfer, SurfStat
• GLMs, GAMMs
  ◦ R, SAS, Stata, etc.

• Machine learning (?)

• Pace of development and enhancement of methodology and software is rapid!

ADOLESCENT BRAIN COGNITIVE DEVELOPMENT STUDY

https://abcdstudy.org/index.html

Overview of ABCD

• ABCD is the largest long-term study of brain development in the United States.
• Baseline cohort of n=11,875 children age 9-10 recruited 09/2016-10/2018.
• Recruitment and data collection in 21 sites nationally
• Coordinating site: University of California, San Diego
• Special twin sample recruitment (n=1727) in 4 sites

• Longitudinal design to follow baseline cohort until age 20.

• “Open science model” for hypothesis generation and data sharing.

• NIH and CDC Partners. NIH/NIDA Funded.

Distribution of ABCD Study Sites, https://abcdstudy.org/

ABCD Design Strategies to Maximize External Validity and Statistical Efficiency for Population-based Analysis

21 Sites and Catchment Areas
• Nationally distributed
• Demographically and socio-economically diverse
• May be treated as a “pseudo” sample of primary stage units

Probability sample of schools and students within individual catchment areas
• Introduces randomization and “representativeness” to the recruitment
• Still highly vulnerable to selection bias due to noncooperation by schools and parents within schools

Demographic controls (targets) for site specific baseline samples and the national aggregate.
• Achieve minimum sample sizes and covariate balance with respect to the U.S. population of eligible children.
• Minimize weighting inefficiencies in descriptive analysis
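One common way to quantify a "weighting inefficiency" is Kish's approximate design effect due to unequal weights, n·Σw² / (Σw)², and the corresponding effective sample size. The sketch below is illustrative only: the function name and the toy weights are assumptions, not ABCD values or the study's own procedure.

```python
def kish_weighting_summary(weights):
    """Return (design effect due to weighting, effective sample size).

    Uses Kish's approximation: deff_w = n * sum(w^2) / (sum(w))^2,
    which equals 1 + CV^2 of the weights, and n_eff = n / deff_w.
    """
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    deff_w = n * total_sq / (total * total)
    n_eff = (total * total) / total_sq
    return deff_w, n_eff


# Hypothetical toy weights (not ABCD data): equal weights lose nothing,
# one very large weight sharply reduces the effective sample size.
equal = kish_weighting_summary([690.0, 690.0, 690.0, 690.0])
unequal = kish_weighting_summary([300.0, 300.0, 300.0, 4800.0])
print("equal weights:   deff=%.3f n_eff=%.2f" % equal)
print("unequal weights: deff=%.3f n_eff=%.2f" % unequal)
```

Demographic recruitment targets that keep the sample close to the population distribution keep the weights close to equal, which keeps this design effect close to 1.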

Informative Features of the ABCD Sample Design for Population-based Estimation and Inference

• Clustering of observations on ABCD children
  ◦ Sites, schools, families, interviewers, imaging equipment
  ◦ Observations not independent: Intra-class correlations. Approaches:
    (1) Model the clustering as random effects (multilevel models, MLM).
    (2) Use distribution-free robust methods that account for clustering in variance estimation.

• Selectivity (random and nonrandom) of the sample selection/recruitment
  ◦ Site selection, sample stratification, school consent, parental consent
  ◦ Covariate adjustment in analysis models
  ◦ Calibration weighting to established population controls

Estimation of ABCD Population Descriptive Statistics

• Examples of descriptive statistics
  ◦ Means, quantiles of continuous variables: BMI, NIH Tool Box test scores, polygenic scores, hippocampus volume, measures of neuron activity.
  ◦ Categorical proportions for binary, multinomial and count variables.

• Employ software specifically developed for design-based estimation from clustered sample data, e.g. the R survey package.

• Employ propensity-based population weight factor in estimation of the descriptive statistics.
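To make the idea of design-based estimation concrete, here is a minimal sketch of a weighted mean with a Taylor-linearized standard error that treats clusters (for example, sites) as the sampling units. It assumes a single stratum and sampling with replacement at the cluster level. The function name and toy data are hypothetical, and this is not the code used for the ABCD estimates, which the slides attribute to survey software such as R.

```python
import math
from collections import defaultdict


def weighted_mean_clustered_se(y, w, cluster):
    """Weighted mean and a linearized SE using between-cluster variation."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Sum the linearized residuals w_i * (y_i - mean) / sum(w) within clusters.
    z_by_cluster = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        z_by_cluster[ci] += wi * (yi - mean) / total_w

    n_clusters = len(z_by_cluster)
    z_bar = sum(z_by_cluster.values()) / n_clusters
    ss = sum((z - z_bar) ** 2 for z in z_by_cluster.values())
    variance = n_clusters / (n_clusters - 1) * ss
    return mean, math.sqrt(variance)


# Toy data (not ABCD): the same four values, first treated as independent
# observations, then as two clusters of two similar observations each.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
mean_ind, se_ind = weighted_mean_clustered_se(y, w, ["a", "b", "c", "d"])
mean_cl, se_cl = weighted_mean_clustered_se(y, w, ["s1", "s1", "s2", "s2"])
print(mean_ind, round(se_ind, 4))
print(mean_cl, round(se_cl, 4))
```

With one observation per cluster and equal weights the estimator reduces to the familiar s/√n. Grouping similar observations into the same cluster increases the standard error, which is the pattern visible when unweighted and design-corrected standard errors are compared.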

Propensity-based Population Weight for ABCD Participants

• The 2011-2015 American Community Survey (ACS) serves as a benchmark for characterizing U.S. children age 9 and 10.
  ◦ n=376,370 observations of 9- and 10-year-olds and their families
  ◦ Key demographic and SES variables with consistent measurement in ACS and ABCD baseline are identified: age, sex, race/ethnicity, family income, family type, household size, Census region, parent employment

• Use logistic regression (with weight) to model the logit of the probability that Y=1, that the case belongs to ABCD vs. ACS

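$$\text{(i)}\quad \operatorname{logit}\left[\Pr(Y=1 \mid X)\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_P X_P$$

$$\text{(ii)}\quad \hat{p}(Y=1 \mid X) = \frac{e^{X\hat{\beta}}}{1 + e^{X\hat{\beta}}}$$

$$\text{(iii)}\quad \text{Weight} = 1 / \hat{p}(Y=1 \mid X)$$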

Fitted ABCD Propensity Model (Logistic)

Predictor           Category             β̂        exp(β̂)   LCL     UCL
Intercept                                -5.11    -        -       -
Age                 9                     0.112   1.12     0.95    1.32
Sex                 Male                  0.037   1.04     0.88    1.22
Race/Ethnicity      White                -0.787   0.46     0.34    0.61
                    Black                -0.293   0.75     0.52    1.07
                    Hispanic             -0.849   0.43     0.31    0.60
                    Asian                -1.570   0.21     0.12    0.36
Family Income       <$25K                -0.711   0.49     0.34    0.71
                    $25K-$49K            -0.830   0.44     0.31    0.62
                    $50K-$74K            -0.705   0.49     0.35    0.69
                    $75K-$99K            -0.366   0.69     0.50    0.97
                    $100K-$199K          -0.149   0.86     0.64    1.15
Family Type         Married               1.136   *        *       *
Parent Employment   Married, 2 in LF     -0.846   *        *       *
                    Married, 1 in LF     -1.037   *        *       *
                    Married, 0 in LF     -1.281   *        *       *
                    Single, in LF         0.027   *        *       *
Region              Northeast            -0.424   0.65     0.51    0.84
                    Midwest              -0.489   0.61     0.48    0.78
                    South                -0.712   0.49     0.39    0.61
Household size      2-3                   0.008   1.01     0.72    1.41
                    4                    -0.115   0.89     0.66    1.20
                    5                    -0.105   0.90     0.66    1.23
                    6                     0.070   1.07     0.76    1.51

Individual Weight Examples

• Mean weight:

$$\left(\frac{11{,}874}{8{,}211{,}605}\right)^{-1} \approx (0.00145)^{-1} \approx 690$$

• Example 1: 9 year old African-American girl from New England who lives in a family of 4 with two working parents and a family income of $100K-$199K per year:

$$W_i = \left[\frac{\exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)}{1 + \exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)}\right]^{-1} = (.003372)^{-1} = 296.60$$

• Example 2: 10 year old girl of Asian ancestry residing in the South in a 4 person family with two parents who are not working and $25K-$49K total annual income:

$$W_i = \left[\frac{\exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)}{1 + \exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)}\right]^{-1} = (.000207)^{-1} = 4828.09$$
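The two worked examples can be checked numerically. The sketch below copies the relevant coefficients from the fitted propensity model table. Reference categories (age 10, female, the omitted income, region and household-size levels) contribute zero. The dictionary keys and the function name are hypothetical labels chosen for this illustration.

```python
import math

# Coefficients from the fitted propensity model table (subset used here).
COEF = {
    "intercept": -5.11,
    "age_9": 0.112,
    "race_black": -0.293,
    "race_asian": -1.570,
    "income_25_49k": -0.830,
    "income_100_199k": -0.149,
    "married": 1.136,
    "married_2_in_lf": -0.846,
    "married_0_in_lf": -1.281,
    "region_northeast": -0.424,
    "region_south": -0.712,
    "hhsize_4": -0.115,
}


def propensity_weight(active_terms):
    """Return (estimated propensity, weight = 1 / propensity)."""
    eta = COEF["intercept"] + sum(COEF[t] for t in active_terms)
    p = math.exp(eta) / (1.0 + math.exp(eta))
    return p, 1.0 / p


# Example 1: age 9, Black, $100K-$199K, married with 2 in LF, Northeast, household of 4.
p1, w1 = propensity_weight(["age_9", "race_black", "income_100_199k", "married",
                            "married_2_in_lf", "region_northeast", "hhsize_4"])

# Example 2: age 10, Asian, $25K-$49K, married with 0 in LF, South, household of 4.
p2, w2 = propensity_weight(["race_asian", "income_25_49k", "married",
                            "married_0_in_lf", "region_south", "hhsize_4"])

print(round(p1, 6), round(w1, 2))  # slide reports .003372 and 296.60
print(round(p2, 6), round(w2, 2))  # slide reports .000207 and 4828.09
```

Because the weight is the inverse of a small probability, modest differences in the linear predictor translate into large differences in weight, which is why the second child counts roughly sixteen times as much as the first.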

Distribution of ABCD Baseline Analysis Weights

ABCD October 25, 2018 Data Set.

[Figure: density plot of the baseline analysis weights; x-axis rpwgtmeth1 (0 to 2000), y-axis Density.]

Distribution of ABCD Analysis Weights by Sex of Child

ABCD Final Baseline Data Set. N=11,873.

Distributions of ABCD Analysis Weights by Family Income Category

ABCD Final Baseline Data Set. N=11,873.

ABCD Demographic Distributions.* Data through 10/25/2018.

Demographic/SES                    ABCD Sample                          ACS (Weighted)
Characteristic    Category         n        Unweighted %   Weighted** %   %
Sex               Male             6064     52.3%          51.2%          51.2%
                  Female           5530     47.7%          48.8%          48.8%
Age               9                6036     52.1%          49.6%          49.6%
                  10               5558     47.9%          50.4%          50.4%
Race/Ethnicity    Hispanic         2379     20.5%          24.0%          24.0%
                  NH White         6104     52.6%          52.4%          52.4%
                  NH Black         1683     14.5%          13.4%          13.4%
                  Asian            253      2.2%           3.6%           4.7%
                  All Other        1175     10.1%          6.4%           5.5%
Total                              11594    100.0%         100.0%         100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS Proc MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.

ABCD SES Distributions.* Data through 10/25/2018.

Demographic/SES                         ABCD Sample                          ACS (Weighted)
Characteristic    Category              n        Unweighted %   Weighted** %   %
Family Income     <$25,000              1782     15.4%          20.2%          21.5%
                  $25,000-$49,999       1735     15.0%          20.7%          21.7%
                  $50,000-$74,999       1628     14.0%          17.5%          17.0%
                  $75,000-$99,999       1646     14.2%          13.1%          12.5%
                  $100,000-$199,999     3500     30.2%          21.6%          20.5%
                  >=$200,000            1303     11.2%          7.0%           6.8%
Total**                                 11594    100.0%         100%           100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS Proc MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.

ABCD: Population Estimates by SES

                                              ABCD (Unweighted)   ABCD (Weighted, Design Corrected)
Characteristic      Category                  % (se)              Pooled % (se)    Not-Pooled % (se)
Family Income       <$25K                     16.2 (0.34)         20.0 (2.4)       20.0 (2.3)
                    $25K-$49K                 14.9 (0.33)         20.5 (1.6)       20.1 (1.6)
                    $50K-$74K                 13.8 (0.3)          17.5 (0.9)       17.0 (0.9)
                    $75K-$99K                 14.3 (0.3)          13.2 (0.8)       13.2 (0.8)
                    $100K-$199K               29.6 (0.4)          21.7 (2.2)       22.4 (2.2)
                    $200K +                   11.2 (0.3)          7.1 (1.0)        7.4 (1.1)
Parent Employment   Married, 2 in LF          50.3 (0.5)          41.9 (1.9)       42.2 (1.9)
                    Married, 1 in LF          21.9 (0.4)          23.1 (1.7)       23.2 (1.8)
                    Married, 0 in LF          1.3 (0.1)           2.0 (0.3)        1.9 (0.2)
                    Single, M, in LF          1.6 (0.1)           1.9 (0.2)        2.0 (0.2)
                    Single, M, Not in LF      0.4 (0.1)           0.4 (0.1)        0.4 (0.1)
                    Single, F, in LF          19.5 (0.4)          24.0 (1.6)       22.4 (2.2)
                    Single, F, Not in LF      5.1 (0.2)           6.7 (0.6)        7.4 (1.1)

ABCD Final Baseline Data Set. N=11,873.

ABCD: Estimates of the Population Distribution of NIH Tool Box Flanker and Reading Test Scores (uncorrected)

ABCD Final Baseline Data Set. n=11,873

                                               ABCD             ABCD (Weighted, Design Corrected)
Variable                    Statistic          (Unweighted)     Pooled            Not-Pooled
NIH Tool Box Flanker Test   n                  11,712           11,712            9,999
(Uncorrected)               Mean               94.0 (0.08)      93.83 (0.30)      93.83 (0.27)
                            Q5                 75.98 (0.34)     75.52 (0.96)      75.62 (0.99)
                            Q25                88.95 (0.14)     88.70 (0.50)      88.72 (0.48)
                            Q50 (Median)       94.94 (0.10)     94.79 (0.26)      94.76 (0.22)
                            Q75                99.79 (0.09)     99.65 (0.19)      99.64 (0.16)
                            Q95                105.77 (0.11)    105.74 (0.18)     105.77 (0.16)
NIH Tool Box Reading Test   n                  11,704           11,704            9,991
(Uncorrected)               Mean               90.86 (0.06)     90.60 (0.07)      90.74 (0.25)
                            Q5                 79.22 (0.18)     78.77 (0.49)      78.71 (0.48)
                            Q25                88.69 (0.09)     86.40 (0.28)      86.46 (0.29)
                            Q50 (Median)       90.25 (0.04)     90.05 (0.10)      90.20 (0.11)
                            Q75                94.26 (0.06)     94.00 (0.31)      94.25 (0.16)
                            Q95                101.37 (0.12)    101.25 (0.25)     101.47 (0.25)

ABCD: Population estimates by Number of Lifetime ER Visits

ABCD Early Release Data Set. Recruitment to 09/2017.

                                       ABCD (Unweighted)   ABCD (Weighted, Design Corrected)
Variable             Count of Visits   % (se)              Pooled % (se)    Not-Pooled % (se)
Lifetime ER Visits   0                 45.2 (0.5)          43.9 (1.4)       43.9 (1.5)
                     1                 25.5 (0.4)          25.1 (0.6)       25.1 (0.6)
                     2                 15.9 (0.3)          16.1 (0.4)       16.2 (0.7)
                     3                 10.8 (0.3)          11.8 (0.6)       11.6 (0.7)
                     4                 2.0 (0.1)           2.4 (0.3)        2.5 (0.3)
                     5                 0.5 (0.1)           0.7 (0.1)        0.7 (0.1)

Poisson Regression of LT ER Visits.

                                               Model Coefficient           Relative Risk Ratio
Regression       Regression                    Parameter    Standard       Relative
Parameter        Method                        Estimate     Error          Risk       LCI     UCI
Sex: Female      MLE                           -0.148       0.019          0.86       0.83    0.90
                 Design: Pooled                -0.135       0.020          0.87       0.84    0.91
                 Design: Not Pooled            -0.130       0.020          0.88       0.84    0.91
                 Model: 2 Level, All sites     -0.149       0.016          0.86       0.83    0.89
                 Model: 2 Level, No twin       -0.151       0.016          0.86       0.83    0.89
                 Model: 3 Level (DEAP)         -0.145       0.021          0.87       0.83    0.90
FamInc: 25-49k   OLS                            0.131       0.033          1.14       1.07    1.22
                 Design: Pooled                 0.113       0.041          1.12       1.03    1.21
                 Design: Not Pooled             0.120       0.045          1.13       1.03    1.23
                 Model: 2 Level, All sites      0.144       0.039          1.15       1.07    1.25
                 Model: 2 Level, No twin        0.152       0.039          1.16       1.08    1.26
                 Model: 3 Level (DEAP)          0.141       0.040          1.15       1.06    1.25
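The relative risk columns in the Poisson regression table appear consistent with exponentiating each coefficient and a Wald-type 95% interval, exp(b ± 1.96·se). The sketch below reproduces the first row for each parameter under that assumption. The function name is hypothetical and the 1.96 multiplier is inferred from the reported values rather than stated on the slide.

```python
import math


def relative_risk_ci(coef, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald confidence limits."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))


# Sex: Female, first row of the table: estimate -0.148, standard error 0.019.
rr_sex, lci_sex, uci_sex = relative_risk_ci(-0.148, 0.019)

# FamInc: 25-49k, first row of the table: estimate 0.131, standard error 0.033.
rr_inc, lci_inc, uci_inc = relative_risk_ci(0.131, 0.033)

print("%.2f (%.2f, %.2f)" % (rr_sex, lci_sex, uci_sex))
print("%.2f (%.2f, %.2f)" % (rr_inc, lci_inc, uci_inc))
```

The point estimates barely move across estimation methods; what changes is the standard error, and therefore the width of the interval.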

Multivariate Modeling of ABCD Cross-sectional Relationships and Longitudinal Outcomes

• Multi-level modeling (ABCD DEAP)

Three levels with abcd_site and family defining the random effects at Level 3 and Level 2 (DEAP method)

Include key demographic and SES measures as Level 1 fixed effects/ covariates

Explore scientifically-justified first level interactions between key demographic and SES covariates

No current evidence to support recommendation on use of weights in multilevel analysis

Full model-based approach to analyzing ABCD data on developmental outcomes (DEAP GAMM4 model)

Level 1 Model:

$$Y_{ijk} = \beta_{0jk} + \beta_{1jk}\, x_{ijk} + R_{ijk}$$

where:
i = 1, ... indexes the individual cohort member;
j = 1, ..., N indexes the cohort member's family;
k = 1, ..., 21 indexes the ABCD site/imaging center.

Level 2 Model Intercept:

$$\beta_{0jk} = \delta_{00k} + U_{0jk}$$

Level 2 Model Slope:

$$\beta_{1jk} = \delta_{10k} + U_{1jk}$$

Level 3 Model Intercept:

$$\delta_{00k} = \gamma_{000} + V_{00k}$$

Level 3 Model Slope:

$$\delta_{10k} = \gamma_{100} + V_{10k}$$
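As a rough illustration of what a three-level structure implies, the sketch below simulates children nested in families nested in sites, using random intercepts only: a site deviation V, a family deviation U, and an individual residual R added to a fixed intercept and slope. All parameter values, the names gamma_000 and gamma_100, and the 20% sibling rate are assumptions made for the example. They are not ABCD estimates, and the random slopes of the full model are omitted for brevity.

```python
import random


def simulate_three_level(n_sites=21, families_per_site=50, seed=0,
                         gamma_000=90.0, gamma_100=2.0,
                         sd_site=1.0, sd_family=3.0, sd_resid=5.0):
    """Simulate (site, family, child, x, y) rows from a random-intercept model."""
    rng = random.Random(seed)
    rows = []
    for k in range(n_sites):
        v_00k = rng.gauss(0.0, sd_site)          # site-level deviation
        for j in range(families_per_site):
            u_0jk = rng.gauss(0.0, sd_family)    # family-level deviation
            n_children = 2 if rng.random() < 0.2 else 1
            for i in range(n_children):
                x = rng.gauss(0.0, 1.0)
                r_ijk = rng.gauss(0.0, sd_resid)  # individual residual
                y = gamma_000 + gamma_100 * x + v_00k + u_0jk + r_ijk
                rows.append((k, j, i, x, y))
    return rows


def implied_icc(sd_site, sd_family, sd_resid):
    """Intra-class correlations implied by the variance components."""
    total = sd_site ** 2 + sd_family ** 2 + sd_resid ** 2
    same_family = (sd_site ** 2 + sd_family ** 2) / total
    same_site_only = sd_site ** 2 / total
    return same_family, same_site_only


rows = simulate_three_level()
icc_family, icc_site = implied_icc(1.0, 3.0, 5.0)
print(len(rows), round(icc_family, 3), round(icc_site, 3))
```

Two siblings share both the site and the family deviation, so their outcomes are more strongly correlated than those of two unrelated children at the same site. This is the dependence that the site and family random effects are meant to absorb.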

ABCD Data Exploration and Analysis Portal

DEAP: “Explore” Interface

DEAP: Multi-level Analysis Interface

ABCD Data Access

NIMH Data Archive (NDA): https://ndar.nih.gov/study.html?id=576

Thank you! Questions?

Supplemental slides

Standardized Measures: NIH Tool Box

http://www.healthmeasures.net/explore-measurement-systems/nih-toolbox

MRI Technology

How is this implemented?
◦ Big magnet
◦ Focuses on magnetic properties of water
◦ Cells have lots of water
◦ Blood can be more or less magnetic depending on how much oxygen is in it

◦ Background:
◦ MRI machines vary in:
  ◦ Imaging strength – 3T is the research standard
  ◦ Bore size – how big the hole is
  ◦ If optimized for brain imaging
  ◦ Head coil
  ◦ Many now “research dedicated”
◦ Musts:
  ◦ Researchers – have biophysics support/expertise
  ◦ Participants: Lie still, tolerate the noise

Source: Luke Hyde, University of Michigan

MRI: Types of data collected

Structural/Anatomical MRI
◦ Amount of grey matter
◦ Brain region size, shape
◦ Amount of cortical folding

◦ Gives you
  ◦ Development of the brain over time – our understanding of adolescents and normal/abnormal development
  ◦ Individual differences in size/shape/density/function of brain areas across individuals.
  ◦ e.g., SES effects on neural structure, amygdala structure for children with and without autism
  ◦ For better or worse, this is compelling to the public/policy makers
  ◦ Predictor of a health or behavioral outcome

Source: Luke Hyde, University of Michigan

DTI: Types of data collected

Structural: White Matter
◦ Diffusion Tensor Imaging (DTI)
◦ White matter consists of axons carrying information from one area to another, often in bundles
◦ Maps the “highways” of information
◦ Uses the direction water is flowing

◦ Gives you:
  ◦ Individual level tracts across whole brain
  ◦ Look at individual differences in the development of these tracts
  ◦ How they correlate with predictors or outcomes

Source: Luke Hyde, University of Michigan

fMRI: Types of data collected

Functional MRI
◦ Uses BOLD (Blood Oxygen Level Dependent) signal
◦ Blood changes in magnetization when oxygenated
◦ See which brain areas are activated as participants do a task – anything that involves thinking!
◦ Not clear if input or output blood flow
◦ Relatively “slow” – only every 2 seconds
◦ Relatively “large” – 1 voxel = 2 x 2 x 2 mm
  ◦ Millions of neurons
◦ Indirect measure of brain activity
◦ Can access most of the brain!
◦ Can be:
  ◦ Task-based
  ◦ “resting”
◦ Analyzed as:
  ◦ Specific brain areas (brain mapping)
  ◦ Networks and how they cohere

Source: Luke Hyde, University of Michigan

Understanding complex pathways

[Figure: diagram with significance annotations (****, ns).]

Source: Luke Hyde, University of Michigan

Brain as predictor: So why use it?

Can be a better predictor than self-report
◦ Berkman & Falk (2013). Beyond brain mapping: Using neural measures to predict real-world outcomes. Current Directions in Psychological Science.

◦ Predicting treatment success◦ Neurofeedback

Can tell us more about the underlying thought process that can’t be reflected on.

◦ E.g., emotion versus cognitive areas

◦ Adolescents

Source: Luke Hyde, University of Michigan

Whole Brain and Cortical Surface Analysis

AFNI: https://afni.nimh.nih.gov/

FreeSurfer: https://surfer.nmr.mgh.harvard.edu/