Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme...

Page 1

Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme

Jane Elliott

Director of the Centre for Longitudinal Studies and Director of CLOSER

[email protected]


Page 2

Summary

• A brief overview of CLOSER
• Early progress on harmonisation work packages
  – Biological structure
  – Socioeconomic status and qualifications
• Uniform Search Platform
• Contextual database
• Benefits of cross-cohort analysis

Page 3

Cohorts and Longitudinal Studies Enhancement Resources = CLOSER

Nine longitudinal studies:
• Hertfordshire Cohort Study
• 1946 British Birth Cohort
• 1958 British Birth Cohort
• 1970 British Birth Cohort
• ALSPAC – Avon Longitudinal Study of Parents and Children
• Millennium Cohort Study
• Southampton Women’s Survey
• Life Study
• Understanding Society

Funded by ESRC and MRC

Page 4

Objectives & timetable

Maximise the use, value and impact of data collected through a portfolio of key UK longitudinal studies

• Stimulate interdisciplinary research across major longitudinal studies

• Provide common resources for research

• Assist with training and development

• Share information and expertise between study teams

1st October 2012 – 30th September 2017


Page 5

Work streams

4 work packages on data harmonisation

3 work packages on data linkage

Core work on:
• Impact – led by the British Library
• Training and capacity building
• Uniform Search Platform

Leadership team contributing to strategic planning, sharing of best practice, funders’ strategies

See our website: www.CLOSER.ac.uk for further information

Twitter: @CLOSER_UK


Page 6

[Organisation diagram. Labels shown:]

Leadership team

Studies: 1946 cohort, 1958 cohort, 1970 cohort, ALSPAC, MCS, Understanding Society, SWS, HCS, Life Study

Core work: Metadata, Uniform Search Platform, Training and capacity building, Impact

Data linkage:
• WP5: Data linkage – administrative data
• WP6: Data linkage – geography
• WP7: Data linkage – health data

Page 7

Vision for the USP

• Portal to discovery of hundreds of thousands of variables, questions and data collection instruments across the nine longitudinal studies:
  – covering survey and biomedical data collection
  – promoting CLOSER harmonisation work
  – state-of-the-art searching tool
  – focus on improving visibility of associations between (currently) disparate metadata items
  – shared subject/topic classification

• We should remember that this is massively ambitious; something that matches or surpasses the best multi-study metadata repository out there:
  – RAND Survey Meta Data Repository covering the HRS family of studies: https://mmicdata.rand.org/megametadata/

Page 8

Why do it?

• Benefits to users:
  – single resource discovery portal – replacing a fractured resource discovery landscape
  – lowers barriers to conducting cross-cohort analysis
  – increased visibility of cohort data and resources

• Benefits to data managers:
  – standardised metadata management workflows – currently curated in isolation
  – workflows in place for future ‘joiners’

• Benefits to Principal Investigators/survey commissioners:
  – make prospective harmonisation easier
  – promotion and re-use of tested questions and instruments

Page 9

Assumptions, constraints

• Not a data repository

• Not a major software development project:

• major £££ is for metadata creation/enhancement

• DDI-L agreed as standard for metadata exchange:

• covers subject areas (bio and soc science) and data collection methods (‘hard’ instrument and survey)

• designed for marking-up longitudinal/repeated metadata items

• Colectica Designer selected as preferred metadata ingest/editing software

Page 10

Challenges

• Legacy metadata:
  – elderly and decrepit!
  – not always designed for equivalence within a study, much less across studies
  – differing or non-existent naming conventions
  – substantial (manual) effort required to establish equivalences and level of equivalence

• Metadata managed by five or six different units: different formats, workflows, vocabularies

• Relative lack of familiarity with DDI-L:
  – uneven knowledge across study units

Page 11

Metadata: State of play

• >200k variables

• c.150 data collections:
  – CAI, PAPI, nurse visit, clinic-based protocol, biosamples, etc.

• c.85 validated survey instruments
  – GHQ, AUDIT, Malaise Inventory, etc.
  – c.10 instruments used in >1 study

• c.20 validated clinical measures
  – blood pressure, bone density, lung function, etc.
  – range of instruments used

• c.15 cognitive or physical tests

Page 12

How to do it?

• USP will be a web interface that sits on top of a central repository fed by metadata created and delivered both by the individual study units and the CLOSER core

• Study units continue to curate metadata as they see fit; but not in conflict with proposed USP metadata profile

• Substantial metadata creation and enhancement to be undertaken by the study units: inputting historical questionnaires; mapping between data items and data collection

• CLOSER core responsible for identifying common (cross-study) variable and question schemes, allowing studies to reference these and also any agreed controlled vocabularies (concept, life stage etc.)
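
The arrangement described on this slide (study units deliver metadata, shared concept and life-stage vocabularies tie the items together, and a search layer sits on top) can be pictured with a small Python sketch. It is illustrative only: the `VariableRecord` fields, the vocabulary terms and the variable names are hypothetical and do not reflect the actual USP metadata profile or DDI-L.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableRecord:
    """One variable-level metadata item from a study unit (hypothetical fields)."""
    study: str        # e.g. "NCDS"
    name: str         # study-specific variable name (made up here)
    label: str        # human-readable description
    concept: str      # term from a shared controlled vocabulary
    life_stage: str   # term from a shared controlled vocabulary


class Repository:
    """Central store that indexes delivered records by shared concept."""

    def __init__(self):
        self._by_concept = defaultdict(list)

    def ingest(self, records):
        for record in records:
            self._by_concept[record.concept].append(record)

    def search(self, concept, life_stage=None):
        hits = self._by_concept.get(concept, [])
        if life_stage is not None:
            hits = [r for r in hits if r.life_stage == life_stage]
        return sorted(hits, key=lambda r: (r.study, r.name))


# Made-up records standing in for deliveries from two study units.
repo = Repository()
repo.ingest([
    VariableRecord("NCDS", "wt_age11", "Weight at age 11", "body weight", "childhood"),
    VariableRecord("BCS70", "wt_age10", "Weight at age 10", "body weight", "childhood"),
    VariableRecord("NCDS", "wt_age42", "Weight at age 42", "body weight", "adulthood"),
])

for hit in repo.search("body weight", life_stage="childhood"):
    print(hit.study, hit.name, hit.label)
```

The point of the sketch is that a cross-study search only works because every record points at the same agreed vocabulary terms; the study-specific names stay untouched.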

Page 13

Contextual database - rationale

Life course approach stresses the importance of the connection between individuals and the historical and socioeconomic context in which these individuals lived

But some research based on cohort studies pays little attention to the social, economic or historical context that helps shape the lives of individuals

Some data on social change and social context will come from the studies themselves (e.g. breastfeeding)

Aim of the contextual database is to provide a central source of key indicators over time likely to be of direct relevance to cohort research


Page 14

Source: Changing Britain, Changing Lives: Three generations at the turn of the century, Table 8.3 (Wadsworth et al.)

Page 15

Proportion of women in paid employment, by age and cohort

Source: Jenny Neuburger - Paper presented at CLS June 2008

Page 16

Contextual database - elements

• Economic indicators
• Qualifications & Education
• Demography
• Health & health behaviour
• Inequality & poverty
• Labour market and unemployment
• Housing
• Digital economy

Also want to include policy narratives and a bibliography

Page 17

Work package 1: Biological structure and function

Two years: March 2013 – February 2015

William Johnson & Rebecca Hardy, MRC Unit for Lifelong Health and Ageing

Blood pressure

Cognitive performance

Physical capability

Body size and composition

Page 18

Research priority

Body size - because of the obesity epidemic and the long term consequences of adiposity on health & well-being

Need for harmonisation:

Body size data from a single study | Harmonised body size data across multiple studies
Restricted N and power | Larger N and greater power
Results may not be generalizable | Replication of results and quantification of heterogeneity
Modelling capability dependent on study data | Modelling capability increased by pooling data
Age and period effects confounded | Decompose age and period effects
No cohort effects (secular trend) | Investigate cohort effects
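
As a rough reminder of why a larger N buys power, take the simplest case of estimating a mean. The worked equation below assumes k studies of equal size n, a common standard deviation and fully comparable (harmonised) measures; these are simplifying assumptions made for illustration and are not taken from the slides.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{N}},
\qquad
\frac{\mathrm{SE}_{\text{pooled}}}{\mathrm{SE}_{\text{single}}}
  = \frac{\sigma / \sqrt{k n}}{\sigma / \sqrt{n}}
  = \frac{1}{\sqrt{k}}
```

With k = 5 studies the standard error falls to about 0.45 of its single-study value. Between-study heterogeneity, which this simple ratio ignores, reduces the gain in practice, which is why quantifying heterogeneity appears alongside pooling in the comparison.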

Page 19

First papers

Compare body size distributions and mean trajectories, across different phases of the life course, between cohorts

Investigate how SEP inequalities in body size trajectories, across different phases of the life course, differ between cohorts

Li L et al. Am J Epidemiol. 2008

Howe LD et al. JECH. 2012

Page 20

Studies

[Figure: timeline of data collection points (ages in years), one row per study]

0 2 4 6 7 11 15 20 26 36 43 53 60-64

0 7 11 16 23 33 42 44 50

0 5 10 16 26 30 34

0 7 8 9 10 11 12 13 15 18

0 1 3 5 7

Page 21

Data

Page 22

Challenges

Between studies:
• Data covering different age ranges
• Data increasingly positively skewed in more recent studies

Within individuals:
• Different number of observations at different exact ages
• Different precision of data

Within and between individuals:
• Both measured and self-report data
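
One small piece of these challenges, differing units and precision alongside a mix of measured and self-reported values, can be sketched in Python. This is a minimal illustration, not the work package's method: the record layout and field names are hypothetical, and only the unit conversion factors are standard.

```python
LB_TO_KG = 0.45359237   # exact by definition of the pound
IN_TO_M = 0.0254        # exact by definition of the inch


def weight_kg(stones=0.0, pounds=0.0):
    """Convert a weight in stones and pounds to kilograms."""
    return (stones * 14 + pounds) * LB_TO_KG


def height_m(feet=0.0, inches=0.0):
    """Convert a height in feet and inches to metres."""
    return (feet * 12 + inches) * IN_TO_M


def harmonise(record):
    """Map one raw observation to a common layout (hypothetical field names)."""
    if record["units"] == "imperial":
        w = weight_kg(record["stones"], record["pounds"])
        h = height_m(record["feet"], record["inches"])
    else:
        w, h = record["kg"], record["m"]
    return {
        "age_years": record["age_years"],
        "weight_kg": round(w, 1),
        "height_m": round(h, 3),
        "bmi": round(w / h ** 2, 1),
        # Keep the mode of collection so analysts can model or adjust for it.
        "self_report": record["self_report"],
    }


raw = {"units": "imperial", "stones": 10, "pounds": 0, "feet": 5, "inches": 6,
       "age_years": 42, "self_report": True}
print(harmonise(raw))
```

Carrying the `self_report` flag through, instead of silently mixing the two kinds of value, is the design choice worth noting: harmonising the units does not remove the difference in how the data were obtained.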


Page 24

What we are aiming to achieve:

1) Demonstration research project focussing on socioeconomic differences in growth and obesity across cohorts

2) A harmonised dataset, with accompanying documentation for other users

Page 25

Socio-economic data harmonisation work package

Claire Crawford, Brian Dodgeon, Tim Morris, Sam Parsons, Anna Vignoles (lead)

Two years: April 2013 – March 2015

Page 26

What measures?

Measures to be harmonised are:
• parental education level
• cohort member level of education
• socio-economic (occupation) status
• household equivalised income
• home ownership

Cohorts: NSHD; NCDS; BCS; ALSPAC; MCS
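
For readers unfamiliar with equivalised income, the Python sketch below shows the idea: household income is divided by a factor that grows with household size and composition, so that households of different sizes can be compared. The slides do not say which equivalence scale the work package uses; the modified OECD scale is assumed here purely for illustration.

```python
def modified_oecd_factor(n_aged_14_plus, n_under_14):
    """Modified OECD scale: 1.0 for the first adult, 0.5 for each further
    person aged 14 or over, 0.3 for each child under 14."""
    if n_aged_14_plus < 1:
        raise ValueError("a household needs at least one member aged 14 or over")
    return 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_under_14


def equivalised_income(household_income, n_aged_14_plus, n_under_14):
    """Household income adjusted for household size and composition."""
    return household_income / modified_oecd_factor(n_aged_14_plus, n_under_14)


# Two adults and two children under 14 give a factor of 2.1.
print(modified_oecd_factor(2, 2))
print(equivalised_income(42000, 2, 2))
```

Under this assumed scale a two-adult, two-child household on 42,000 is treated as equivalent to a single adult on 20,000. Harmonising across cohorts also requires agreeing the income definition and price year, which the sketch leaves out.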

Page 27

Priority measures agreed

• Highest qualification (vocational/academic separately) held at every age
• Age left full-time education
• Whether the person went past compulsory schooling
• Average GCSE score or equivalent
• GCSE grades in mathematics and English (not for all cohorts)
• For cohort member parents: age left full-time education and highest qualification at birth of CM
• Grandparents’ age left school
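
The measure "whether the person went past compulsory schooling" shows why a harmonised variable is more than a renamed one: the same raw answer means different things in different cohorts. The Python sketch below derives such an indicator. The leaving ages in the lookup table are illustrative assumptions that would need checking for each cohort and country, and the function name is hypothetical.

```python
# Illustrative minimum school-leaving ages; assumptions to verify per cohort and country.
COMPULSORY_LEAVING_AGE = {"NSHD": 15, "NCDS": 16, "BCS70": 16}


def stayed_past_compulsory(cohort, age_left_fte):
    """Return True or False, or None when the age is missing."""
    if age_left_fte is None:
        return None
    return age_left_fte > COMPULSORY_LEAVING_AGE[cohort]


# Leaving full-time education at 16 is post-compulsory in one cohort but not another.
print(stayed_past_compulsory("NSHD", 16))
print(stayed_past_compulsory("NCDS", 16))
```

Returning `None` for a missing age keeps "unknown" distinct from "did not stay on", which matters when the amount of missing data differs between cohorts.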

Page 28

Measures available by cohort

Cohorts: NSHD, NCDS, BCS70, ALSPAC, MCS

Cohort member
• Highest qualification (each age): ✔ ✔ ✔ ✔
• Age left full-time education: ✔ ✔ ✔ ✔
• Post compulsory education: ✔ ✔ ✔ ✔
• Maths grade [O’level, CSE, GCSE]: ✔ ✔ ✔ ✔
• English grade [O’level, CSE, GCSE]: ✔ ✔ ✔ ✔
• Exam total score [O’level, CSE, GCSE]: ✔ ✔ ✔ ✔

Parent
• Age left full-time education: ✔ ✔ ✔ ✔ ✔
• Highest qualification [birth or nearest data collection point]: ✔ ✔ ✔ ✔

Grandparent
• Age left full-time education: ✔

Page 29

The value of cross-cohort analysis

1) A meta-narrative of societal change over time

2) Creating a synthetic life course – understanding life time trajectories

3) Investigate cohort effects - examining the impact of different social and policy contexts

4) Replication of results – checking the robustness of models

5) Larger N and greater power

6) Decompose age and period effects
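
Point 6 rests on a simple identity, written out below. The notation (a for age, p for calendar period, c for birth year, and the additive model) is introduced here for illustration and is not taken from the slides.

```latex
p = c + a
\qquad\text{and}\qquad
y_{a,c} = \mu + \alpha_a + \pi_{p} + \gamma_c + \varepsilon_{a,c}
```

Within a single cohort c is fixed, so age and period move in lockstep and their effects cannot be told apart. With several cohorts, the same age a is observed in different periods p, which is what allows age and period effects to be separated. The identity still holds across cohorts, so separating all three effects at once requires a further assumption, for example a constraint on the cohort effects.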


Page 30

Lifetime systolic blood pressure trajectories and velocities (predicted means)

[Figure panels: Men, Women]

Source: Wills et al. PLOS Med, 2011