Review and consultation: Next steps in supporting data on ethnicity
DAMES workshop on ‘Data on ethnicity in social survey research’, 28th January 2010, University of Stirling
Some preliminary comments:
i. E-Social Science
ii. Challenges/principles
iii. Ethnicity research agendas
Further comments/discussions/questions
i) What makes this ‘e-Social Science’?
Attention to data management in the context of:
• Standards setting
• Metadata
• Portal framework
Liferay portal to various DAMES resources
iRODS system for ‘GE*DE’ specialist data
Controlled data access under security limits
Use of workflows
‘Data Management’
‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […the DAMES Node..]
Usually performed by social scientists (post-release)
Most overt in quantitative survey data analysis
Usually a substantial component of the work process
Here we differentiate data management from archiving / controlling the data itself
‘Data Management through e-Social Science’
DAMES – www.dames.org.uk
ESRC Node funded 2008-2011
Aim: Useful social science provisions
• Specialist data topics – occupations; education qualifications; ethnicity; social care; health
• Mainstream packages and accessible resources
• Engage with existing provisions (e.g. ESDS; CESSDA)
Programme of case studies and provisions – more later
‘The significance of data management for social survey research’
Data management is a major component of the social survey research workload
Pre-release manipulations performed by distributors / archivists
• Coding measures into standard categories; dealing with missing records
Post-release manipulations performed by researchers
• Re-coding measures into simple categories
• All serious researchers perform extended post-release management (and have the scars to show for it)
We do have existing tools, facilities and expert experience to help us…but we don’t make a good job of using them efficiently or consistently
So the ‘significance’ of DM is about how much better research might be if we did things more effectively…
Data Management through e-Social Science www.dames.org.uk
1.1) Grid Enabled Specialist Data Environments (‘GE*DE’)
1.2) Data resources for micro-simulation on social care data
1.3) Linking e-Health and social science databases
1.4) Training and interfaces for management of complex survey data
2.1) Description, discovery & service use through metadata and data abstraction
2.2) Techniques to handle data from multiple sources
2.3) Workflow modelling for social science
2.4) Security driven data management
Challenges/principles
Data manipulation skills and inertia
I would speculate that around 80% of applications using key variables don’t consult literature and evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset
Data supply decisions (‘what is on the archive version’) are critical
Much of the explanation lies with lack of confidence in data manipulation / linking data
Too many under-used resources – cf. www.esds.ac.uk
Software issues
Stata seems to be the superior package for secondary survey data analysis:
o Advanced data management and data analysis functionality
o Supports easy evaluation of alternative measures (e.g. est store)
o Culture of transparency of programming/data manipulation
Problems…
o Not available to all users
o Not easily incorporated in generic services
Variables and functional form
Functional form = the way in which measures are arithmetically incorporated in quantitative analysis
With occupations, education, ethnicity, and elsewhere, we tend to be too willing to make simplifying categorisations
o Multiple categorisations are possible
o As are scaling approaches – better suited for complex analytical procedures
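To illustrate the difference between these two functional forms, here is a minimal Python sketch. The category labels and the scale scores are hypothetical placeholders, not taken from any published classification or from the DAMES resources. The same categorical measure is entered either as a set of 0/1 indicators (a categorisation) or as a single numeric score (a scaling approach).

```python
# Hypothetical categories and scale scores, for illustration only.
CATEGORIES = ["cat_a", "cat_b", "cat_c"]
SCORES = {"cat_a": 10.0, "cat_b": 25.0, "cat_c": 40.0}


def as_dummies(value, categories=CATEGORIES, reference="cat_a"):
    """Categorical form: one 0/1 indicator per non-reference category."""
    return {f"is_{c}": int(value == c) for c in categories if c != reference}


def as_score(value, scores=SCORES):
    """Scaled form: a single numeric score assigned to each category."""
    return scores[value]


respondent = "cat_b"
dummy_form = as_dummies(respondent)  # several terms enter the model
score_form = as_score(respondent)    # one continuous term enters the model
```

With many categories the indicator form adds one term per category, whereas the scaled form stays a single term, which is one reason scaling can be easier to use in complex models.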
Good habits: Keep clear records of DM activities
Reproducible (for self)
Replicable (for all)
Paper trail for whole lifecycle
Cf. Dale 2006; Freese 2007
In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)
Syntax Examples: www.longitudinal.stir.ac.uk
Principle: Use existing standards and previous research
Variable operationalisations
Use recognised recodes / standard classifications
• NSI harmonisation standards (e.g. ONS)
• Cross-national standards [Hoffmeyer-Zlotnick & Wolf 2003; Harkness et al. 2005; Jowell et al. 2007]
• Research reviews [e.g. Shaw et al. 2007]
• Common vs best practices (e.g. dichotomisations)
Use reproducible recodes / classifications (paper trail)
Other data file manipulations
• Missing data treatments
• Matching data files (finding the right data)
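One possible way to keep a recode reproducible is to hold the mapping in a single, explicit lookup table rather than in scattered ad hoc edits. The Python sketch below assumes hypothetical source codes, target labels and missing-value codes. None of them comes from a real survey or from an official harmonisation standard.

```python
# Hypothetical recode table: source code -> target category.
# In practice this would be copied from a documented standard.
RECODE = {1: "group_a", 2: "group_a", 3: "group_b", 4: "other"}

# Hypothetical codes that the data supplier uses for missing answers.
MISSING_CODES = {-9, -8}


def recode(value, mapping=RECODE, missing=MISSING_CODES):
    """Apply the documented recode; fail loudly on undocumented codes."""
    if value is None or value in missing:
        return None  # explicit missing-data treatment
    if value not in mapping:
        raise ValueError(f"unmapped code: {value!r}")
    return mapping[value]


raw = [1, 3, -9, 2, 4]
recoded = [recode(v) for v in raw]
```

Raising an error on an unmapped code, rather than silently passing it through, means any gap in the documented recode shows up the first time the file is processed.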
Principle: Do something, not nothing
We currently put much more effort into data collection and data analysis, and neglect data manipulation
Survey research – the influence of ‘what was on the archive version’
…In my experience, a common reason why people didn’t do more DM was because they were frightened to…
Principle: Support linking data
Complex data (complex research) is distributed across different files. In surveys, use key linking variables for:
One-to-one matching
SPSS: match files /file="file1.sav" /file="file2.sav" /by=pid.
Stata: merge pid using file2.dta (Stata 11 and later: merge 1:1 pid using file2.dta)
One-to-many matching (‘table distribution’)
SPSS: match files /file="file1.sav" /table="file2.sav" /by=pid.
Stata: merge pid using file2.dta (Stata 11 and later: merge m:1 pid using file2.dta)
Many-to-one matching (‘aggregation’)
SPSS: aggregate outfile="file3.sav" /break=pid /meaninc=mean(income).
Stata: collapse (mean) meaninc=income, by(pid)
Many-to-many matches
Related cases matching
(For the SPSS match files commands and the older Stata merge syntax, both files must first be sorted by the linking variable, pid.)
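The first three kinds of match can also be sketched independently of any statistics package. The Python example below is an illustrative sketch using only the standard library. The three small files and the helper names (index_unique, match_one_to_one, distribute_table, aggregate_mean) are hypothetical and are not part of SPSS, Stata or the DAMES tools.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical files, all keyed on the linking variable "pid".
persons = [{"pid": 1, "sex": "f"}, {"pid": 2, "sex": "m"}]
attitudes = [{"pid": 1, "score": 4}, {"pid": 2, "score": 2}]
jobs = [
    {"pid": 1, "income": 100},
    {"pid": 1, "income": 300},
    {"pid": 2, "income": 200},
]


def index_unique(rows, key):
    """Index rows by key, refusing duplicate key values."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate key: {row[key]!r}")
        index[row[key]] = row
    return index


def match_one_to_one(left, right, key):
    """One-to-one: each key value occurs once in both files."""
    index_unique(left, key)  # checks uniqueness on the left side
    lookup = index_unique(right, key)
    return [{**row, **lookup.get(row[key], {})} for row in left]


def distribute_table(many, table, key):
    """Table distribution: copy one table row onto every matching record."""
    lookup = index_unique(table, key)
    return [{**row, **lookup.get(row[key], {})} for row in many]


def aggregate_mean(rows, key, var, newvar):
    """Aggregation: collapse many records per key into one mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[var])
    return [{key: k, newvar: mean(v)} for k, v in sorted(groups.items())]


linked = match_one_to_one(persons, attitudes, "pid")
jobs_with_sex = distribute_table(jobs, persons, "pid")
mean_income = aggregate_mean(jobs, "pid", "income", "meaninc")
```

Checking that the key is unique where it is expected to be is the step most often skipped in practice. A duplicated pid in a supposedly person-level file would otherwise pass unnoticed and produce wrong matches.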
Challenges..
Agreeing about variable constructions
Unresolved debates about optimal measures and variables
Esp. in comparative research such as across time, between countries
In DAMES, we have particular interests in comparability for:
• Longitudinal comparability (http://www.longitudinal.stir.ac.uk/variables/)
• Scaling / scoring categories to achieve ‘meaning equivalence’ or ‘specific measures’
Challenges..
Incentivising documentation / replicability
There is little to press researchers to better document DM, but much to press them not to
• Make DM and its documentation easier?
• Reward documentation (e.g. citations)?
iii) Ethnicity research agendas
Our impression
o More data on more referents
o Controlled access to data
o Increasing recognition of intergenerational change
o Mixed identities
Other views…?