Handling social science data: Challenges and responses

Paul Lambert, University of Stirling

DAMES research Node, www.dames.org.uk

DIR workshop: Handling Social Science Data, 17/MAR/2010


What is social science data?


Example: Accessing surveys via UK Data Archive

• Shibboleth authentication

• Download and analyse in Stata, SPSS, etc.


Principal forms of data…

• ‘Large and complex social surveys’: longitudinal; cross-national; hierarchical

• Small scale social surveys

• Administrative data (e.g. ADMIN node; ADLS; commercial data)

• Supplementary (digital) data, e.g. ‘GESDE’ services at DAMES

• Qualitative material: audio / video / textual


Large and complex social surveys

• Several thousand variables

• Tens of thousands of cases (micro-data)

• Additional complex survey data features (e.g. household clustering)


Complex data example: British Household Panel Survey dataset [SN 5151]

. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1991 |     10,264        4.57        4.57
       1992 |      9,845        4.38        8.95
       1993 |      9,600        4.27       13.23
       1994 |      9,481        4.22       17.45
       1995 |      9,249        4.12       21.56
       1996 |      9,438        4.20       25.77
       1997 |     11,193        4.98       30.75
       1998 |     10,906        4.86       35.60
       1999 |     15,623        6.96       42.56
       2000 |     15,603        6.95       49.51
       2001 |     18,867        8.40       57.91
       2002 |     16,597        7.39       65.29
       2003 |     16,238        7.23       72.52
       2004 |     15,791        7.03       79.55
       2005 |     15,627        6.96       86.51
       2006 |     15,392        6.85       93.36
       2007 |     14,910        6.64      100.00
------------+-----------------------------------
      Total |    224,624      100.00

. xtdes, i(pid) t(year)

     pid:  10002251, 10004491, ..., 1.794e+08                n =      31877
    year:  1991, 1992, ..., 2007                             T =         17
           Delta(year) = 1 unit
           Span(year)  = 17 periods
           (pid*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       2         6         9      17      17

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-------------------
      4294    13.47   13.47 |  11111111111111111
      2726     8.55   22.02 |  ........111111111
      2032     6.37   28.40 |  ..........1111111
      1224     3.84   32.24 |  ......11111......
       964     3.02   35.26 |  1................
       840     2.64   37.90 |  ..........1......
       632     1.98   39.88 |  ........1........
       631     1.98   41.86 |  ................1
       593     1.86   43.72 |  11...............
     17941    56.28  100.00 | (other patterns)
 ---------------------------+-------------------
     31877   100.00         |  XXXXXXXXXXXXXXXXX

• This example shows the BHPS being analysed in Stata. The BHPS re-contacts its subjects annually (since 1991).

• 4,294 respondents were interviewed as adults in every one of the 17 years.

• Analysis methods, and measurement issues over time, are challenging.
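The wave-participation patterns that `xtdes` reports can also be tabulated outside Stata. Below is a minimal Python sketch, assuming long-format data with one (pid, year) row per interview; the function name `participation_patterns` and the toy records are hypothetical illustrations, not BHPS data. It follows the `xtdes` display convention of '1' for an interviewed wave and '.' for a missed one, but unlike `xtdes` it lists every pattern rather than folding rare ones into "(other patterns)".

```python
from collections import Counter


def participation_patterns(records, first_year, last_year):
    """Count how many people share each wave-participation pattern.

    records: iterable of (pid, year) pairs, one per interview (long format).
    Each pattern string has one character per year from first_year to
    last_year: '1' if the person was interviewed that year, '.' if not.
    """
    # Gather the set of interview years for each person.
    years_by_pid = {}
    for pid, year in records:
        years_by_pid.setdefault(pid, set()).add(year)

    # Turn each person's years into a pattern string and count duplicates.
    patterns = Counter()
    for years in years_by_pid.values():
        pattern = "".join(
            "1" if y in years else "." for y in range(first_year, last_year + 1)
        )
        patterns[pattern] += 1
    return patterns


# Hypothetical toy panel: three people observed over five annual waves.
toy = [
    (101, 1991), (101, 1992), (101, 1993), (101, 1994), (101, 1995),
    (102, 1991), (102, 1992),
    (103, 1993), (103, 1994), (103, 1995),
]

counts = participation_patterns(toy, 1991, 1995)
total = sum(counts.values())
for pattern, freq in counts.most_common():
    print(pattern, freq, round(100 * freq / total, 2))
```

Person 101 is the balanced case (the analogue of the 4,294 BHPS respondents present in all 17 waves); 102 and 103 show attrition and late entry respectively.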


Supplementary (digital) data

• E.g. ‘Occupational information resources’ (OIRs): data files containing information about occupations, which can be usefully linked to micro-data that record respondents’ occupations

e.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk

Such resources are often not widely known about, but they can enhance analysis
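Linking an OIR to survey micro-data usually amounts to a merge on a shared occupational code. The following is a minimal Python sketch, assuming both files use the same occupational coding scheme; the codes, scores, field names and the `link_oir` function are hypothetical and are not taken from GEODE or any real OIR.

```python
def link_oir(microdata, oir, code_field="occ_code", score_field="occ_score"):
    """Attach an occupation-level value from an OIR to each survey record.

    microdata: list of dicts, one per respondent, each holding code_field.
    oir: dict mapping occupational code -> value (the OIR's content).
    Behaves like a left join: respondents whose code is absent from the
    OIR are kept and receive None, so unmatched cases remain visible.
    """
    linked = []
    for record in microdata:
        enriched = dict(record)  # copy so the input records are unchanged
        enriched[score_field] = oir.get(record.get(code_field))
        linked.append(enriched)
    return linked


# Hypothetical data: three respondents and a two-row occupational lookup.
respondents = [
    {"pid": 1, "occ_code": "1001"},
    {"pid": 2, "occ_code": "1002"},
    {"pid": 3, "occ_code": "9999"},  # code not covered by the lookup
]
oir_scores = {"1001": 71.2, "1002": 18.5}

for row in link_oir(respondents, oir_scores):
    print(row)
```

Keeping unmatched respondents (rather than dropping them) makes it easy to count how many micro-data codes the chosen OIR fails to cover before any analysis is run.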


Example: Qualitative data used by ‘Digital Records for e-Social Science’ (DReSS)

• Transcribed talk

• Audio / video

• Digital records

• System logs

• Location

[Screenshot: transcript, code tree, video and system log panels]


Three well-known challenges

• We’re data rich, but analyst poor
  • UK Data Forum (2007); Wiles et al. (2009)
  • Under-use of suitably complex statistical models

• Coordination and communication on data processing
  • Recodes / standardisation / harmonisation / documentation
  • Not rewarded/incentivised for researchers

• Lack of generic/accessible representation of tasks
  • Limited disciplinary/project/researcher cross-over when dealing with data
  • Specific software orientations

These are not generally problems of scale, but of organisation.


‘Managed’ responses?

• Data handling/analysis capacity-building: ESRC programmes (NCRM, RDI, RMP); training workshops/materials; P/G funds; strategic research grant investment

• Documentation/replication policies: Dale (2006)

• Software for data access and analysis: NESSTAR (UK Data Archive data/metadata browser); Long (2009) on the Stata software; remote access to data (e.g. SDS)


..train and/or constrain the analysts..

Train them ->


..constrain the analysis..


Non-hierarchical responses?

Technological collaborative services might support effective, unmanaged data access, coordination and exploitation (in principle)

UK e-Social Science investment in data-oriented social science research support: NeISS; E-Stat; DAMES; Obesity e-Lab; CQeSS


..some examples..

NeISS: National e-Infrastructure for Social Simulation

• Expert-led simulation demonstrations

• Combining data resources

• Workflows for the simulation analysis

• Modify and re-specify existing simulation templates

E-Stat

• Design a tool to specify complex statistical models in generic / visual terms

• Multilevel models

• Multiple data permutations and analytical alternatives

• Ready access to a suite of complex modelling tools


DAMES – online services for data coordination/organisation

Tools for handling variables in social science data

Recoding measures; standardisation / harmonisation; linking; curating
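Recoding for harmonisation is typically driven by a crosswalk from each source's own categories to a shared target scheme. A minimal Python sketch, assuming a three-level target scheme for education; the survey names, codes, the `CROSSWALKS` table and the `harmonise` function are all hypothetical and do not reproduce any DAMES tool or published classification.

```python
# Hypothetical crosswalks: each survey's own education codes mapped onto
# one shared, coarser target scheme (1 = low, 2 = medium, 3 = high).
CROSSWALKS = {
    "survey_a": {1: 1, 2: 1, 3: 2, 4: 3},
    "survey_b": {10: 1, 20: 2, 30: 2, 40: 3, 50: 3},
}


def harmonise(source, values, crosswalks=CROSSWALKS):
    """Recode a list of source-specific codes to the harmonised scheme.

    Raises KeyError naming the first unmapped code, so gaps in the
    crosswalk surface immediately instead of becoming silent missing data.
    """
    mapping = crosswalks[source]
    out = []
    for v in values:
        if v not in mapping:
            raise KeyError(f"{source}: no harmonised category for code {v!r}")
        out.append(mapping[v])
    return out


print(harmonise("survey_a", [1, 3, 4]))
print(harmonise("survey_b", [20, 50]))
```

Holding the crosswalk as data rather than as scattered recode statements is one way to make the recode itself something that can be documented, shared and re-used across projects.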


GESDE – Search and browse supplementary data on occupations; educational qualifications; ethnicity


• Data curation tool (for collecting metadata)


Handling data: analysis-oriented data management priorities

• {Data collection or creation}

• Data preservation or curation

• Data enhancement/modification

• Data analysis

• Multiple permutations of related analyses

• Documentation and replication


Ideas on the future of social science research data

• Enduring challenges of documentation for replication, and coordination

• More and more comparative analysis

• Harmonisation and standardisation

• Data linkage and data enhancement

• Models for complex multiprocess systems

• Fluency: increasing uptake by more users


References and Links

• ADLS: http://www.adls.ac.uk/

• ADMIN Node: http://www.ncrm.ac.uk/about/organisation/Nodes/ADMIN/

• DAMES Node: http://www.dames.org.uk/

• DReSS: http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/

• Secure Data Service: http://securedata.ukda.ac.uk/

• UK Data Archive: http://www.data-archive.ac.uk/

• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

• Wiles, R., Bardsley, N., & Powell, J. L. (2009). Consultation on research needs in research methods in the UK social sciences. Southampton: University of Southampton / ESRC National Centre for Research Methods. http://eprints.ncrm.ac.uk/810/
