Cross-national data in DAMES and GE*DE
Paul Lambert, University of Stirling
Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009
This talk presents materials from the DAMES Node, an ESRC funded research Node of the National Centre for e-Social Science (www.dames.org.uk)
2
Some recent history: Atkinson (1996: 47)
3
Stewart et al. (2009: 5)
4
Today’s workshop: ‘Where next?’
Problems / challenges with cross-national survey analysis
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data
The contribution of e-social science
5
Why is e-Science relevant?
• e-Science models cover distributed computing and the enabling of collaborations [e.g. Foster et al., 2001]
• e-Social Science is directed to research infrastructures for collaboration, and for supporting the lifecycle of data-oriented research [e.g. Halfpenny & Procter, 2009]
• Cross-national survey projects include complex distributed data and a clear need for collaborations…
• Hitherto, cross-national survey projects have not generally made use of e-Science initiatives
6
Part 1: What is e-Social Science doing for cross-national survey research?
Projects on the research lifecycle
• data collection
• data management [DAMES]
• data analysis
Projects on a national scale
Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]
7
The example of DAMES and GE*DE
www.dames.org.uk
1.1) Grid Enabled Specialist Data Environments (‘GE*DE’)
1.2) Data resources for micro-simulation on social care data
1.3) Linking e-Health and social science databases
1.4) Training and interfaces for management of complex survey data
2.1) Description, discovery & service use through metadata and data abstraction
2.2) Techniques to handle data from multiple sources
2.3) Workflow modelling for social science
2.4) Security driven data management
8
‘Data management’ means… ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..]
Usually performed by social scientists themselves
Most overt in quantitative survey data analysis
• Preparing or ‘enabling’ survey analysis
Usually a substantial component of the work process
• But not explicitly rewarded (and sometimes penalised)
Here we differentiate data management from archiving / controlling the data itself
9
‘The significance of data management for social survey research’
(see http://www.esds.ac.uk/news/eventdetail.asp?id=2151)
The data management studied across the DAMES Node is a major component of the social survey research workload
Pre-release manipulations performed by distributors / archivists
• Coding measures into standard categories
• Dealing with missing records
Post-release manipulations performed by researchers
• Re-coding measures into simple categories
We do have existing tools, facilities and expert experience to help us… but we don’t make a good job of using them efficiently or consistently
So the ‘significance’ of DM is about how much better research might be if we did things more effectively…
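The recoding tasks named on this slide (coding into standard categories, dealing with missing records, re-coding into simple categories) can be illustrated with a minimal Python sketch. The codes, category labels and missing-value codes below are hypothetical and are not taken from any DAMES or GE*DE resource; the point is only that an explicit, reusable recode table makes the manipulation consistent and visible.

```python
# Hypothetical codes and labels, for illustration only.
MISSING_CODES = {-9, -8}  # assumed survey codes for "not answered"

# Assumed rule table: detailed code -> simple category.
RECODE = {
    1: "managerial/professional",
    2: "managerial/professional",
    3: "intermediate",
    4: "intermediate",
    5: "routine/manual",
    6: "routine/manual",
}


def recode(value, mapping=RECODE, missing=MISSING_CODES):
    """Return the simple category for a detailed code, or None if missing.

    Codes with no rule raise an error rather than being dropped silently,
    so every recoding decision has to be written down in the table.
    """
    if value is None or value in missing:
        return None
    if value not in mapping:
        raise ValueError(f"no recode rule for value {value!r}")
    return mapping[value]


detailed = [1, 4, -9, 6, 2]
simple = [recode(v) for v in detailed]
print(simple)
```

Keeping the rule table as data (rather than scattered if-statements) is one way to make the same recode reusable across projects.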
10
In GE*DE, we’re developing
Services for accessing and depositing specialist data
• Occupations, educational qualifications, ethnicity
• UK Administrative data (with ADLS)
Materials specifically oriented to comparative analytical approaches
• Data resources often from major cross-national studies
• Producing new cross-national data resources
• (see also talk on standardization of categorical data in session 4a)
11
GEODE v1: Organising and distributing specialist data resources (on occupations)
12
Cross-national data in DAMES and GE*DE
1. New specialist data on occupations, education and ethnicity
a. Curation and re-release of existing data
b. Generation of new data (and/or metadata), with focus on standardisation / harmonisation
2. Conduit to existing resources
3. Generic resources for workflow documentation and replication
13
E.g. (1a) Occupations [cf. Leiulfsrud et al. 2005]
14
E.g. (1b) Ethnicity / Migration
[Figure: ethnic / migration group labels plotted for Canada 2001, Mexico 2000 and USA 2000. Points show mean occupational advantage score for employed adults using US 2000 CAMSIS, for ethnic groups with >= 1000 census responses. Source: IPUMS International (Minnesota Population Center, 2009).]
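The summary shown in this figure (a mean score per group, restricted to groups with at least 1000 responses) could be computed roughly as follows. This is a sketch using only the Python standard library; the records are invented, and the function name and the `min_n` parameter are illustrative assumptions rather than part of the original analysis.

```python
from collections import defaultdict


def group_means(records, min_n=1000):
    """Mean score per group, keeping only groups with at least min_n cases."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, score in records:
        totals[group] += score
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in counts if counts[g] >= min_n}


# Invented records: (group label, occupational score).
records = [("A", 40.0), ("A", 60.0), ("B", 55.0)]

# With a threshold of 2, group B (one case) is excluded.
print(group_means(records, min_n=2))
```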
15
E.g. (2): Occupations
16
E.g. (3): Workflow documentation
17
Part 2: The contribution of e-Science
The contribution should concern:
• Navigating complex data
• Security
• Workflows
Compare with current issues for cross-national surveys:
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data
18
(a) Quantity of data (& metadata)
…current trends
Moving beyond macro-data analysis* to exploiting large-scale micro-data
*Country level analysis, e.g. Fuchs (2009)
• Interest in / access to secure micro-data
• Exploitation of complex micro-data
o Longitudinal data and the life-course [Mayer, 2005]
o Micro-data and links with macro-data
o Metadata about the quality of the micro-data
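As an illustration of linking micro-data with macro-data, the sketch below attaches country-level indicators to individual records and reports which records could not be matched. All values, field names and the `link` function are invented for this example; it is not a description of any particular project's linkage procedure.

```python
# Invented country-level (macro) indicators, keyed by country code.
macro = {
    "GB": {"gini": 0.34},
    "DE": {"gini": 0.29},
}

# Invented individual-level (micro) records.
micro = [
    {"id": 1, "country": "GB", "income": 2100},
    {"id": 2, "country": "DE", "income": 2500},
    {"id": 3, "country": "FR", "income": 2300},  # no macro record available
]


def link(micro_rows, macro_table):
    """Attach macro indicators to each micro record.

    Returns the linked rows plus the ids of rows with no macro match,
    so unmatched cases are reported rather than lost silently.
    """
    linked, unmatched = [], []
    for row in micro_rows:
        context = macro_table.get(row["country"])
        if context is None:
            unmatched.append(row["id"])
            continue
        linked.append({**row, **context})
    return linked, unmatched


linked, unmatched = link(micro, macro)
print(len(linked), unmatched)
```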
19
(a) … can be helped by…
Interest in / access to secure micro-data
• E-Science projects building portals for secure access to data (e.g. Sinnott 2008)
Exploitation of complex micro-data
• Services for organising complex data (e.g. GE*DE)
• Metadata provision on data resources (e.g. PolicyGrid)
• Comparative standardisations (e.g. GE*DE)
• Tools for complex analysis (e.g. e-Stat)
• Tools for simulation (e.g. NeISS)
• Tools for visualisation of complex data (e.g. Maptube)
• Tools for workflow records for research lifecycle (cf. MyExperiment)
20
(b) Harmonisation, equivalence and data quality
Variable manipulations require standardization through measurement or meaning equivalence, and adequate documentation / justification for those manipulations
E-Science resources support
• Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE
• Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE
• Facilitating manipulations and standardizations, e.g. user-friendly services on variables at GE*DE to enable plurality of alternative measures
? Pluralistic/open source vs quality control
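One simple way to approach meaning equivalence is to express a measure relative to its own national distribution, for example by standardising scores within each country. The sketch below shows that idea only; it is an assumption chosen for illustration and is not the scaling method used for the GE*DE educational scales. Data values and field names are invented.

```python
from statistics import mean, pstdev


def standardise_within(rows, group_key, value_key):
    """Add a z-score computed within each group (e.g. within each country)."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row[value_key])
    # Population standard deviation per group; 0 if all values are equal.
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    out = []
    for row in rows:
        m, s = stats[row[group_key]]
        z = (row[value_key] - m) / s if s > 0 else 0.0
        out.append({**row, "z": z})
    return out


# Invented data: years of education in two countries.
rows = [
    {"country": "UK", "years": 10},
    {"country": "UK", "years": 12},
    {"country": "UK", "years": 14},
    {"country": "DE", "years": 9},
    {"country": "DE", "years": 13},
    {"country": "DE", "years": 17},
]
result = standardise_within(rows, "country", "years")
for r in result:
    print(r["country"], r["years"], round(r["z"], 3))
```

In this invented example, 14 years in the first country and 17 years in the second receive the same standardised score, because each sits at the same relative position in its own distribution.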
21
More on GE*DE and issues of data quality
GE*DE covers Occupations; Educational qualifications; Ethnicity and migration
These are ‘key variables’ in social science research
• Regularly measured
• Link to concepts of central interest
• Multivariate context (critical relations with gender, age cohort, etc.)
22
Key variables: concepts and measures
Variable     | Concept                                     | Measure (e.g.)                          | Something useful
Occupation   | Class; stratification; unemployment         | Occupation-based social classification  | www.geode.stir.ac.uk
Education    | Credentials; Ability; Merit                 | Qualification based educational level   | www.equalsoc.org/8 [Schneider, 2008]
Ethnic group | Ethnicity; religion; race; national origins | Minority ethnic group indicators        | [Bosveld et al 2006]
Age          | Age; life course stage; cohort              | Polynomial age function                 | [Abbott 2006]
Gender       | Gender; household / family context          |                                         | www.genet.ac.uk
Income       | Income; wealth; poverty;                    | Monthly income; income groups; …        | www.data-archive.ac.uk [SN 3909]
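The ‘polynomial age function’ listed as a measure for age usually means entering age into a model as several power terms (age, age squared, and so on) rather than as a single linear term. A minimal sketch follows; the centring value of 40 and the function name are arbitrary assumptions for illustration.

```python
def age_terms(age, centre=40, degree=2):
    """Return [c, c**2, ..., c**degree] where c = age - centre.

    Centring is a common convenience that reduces the correlation
    between the power terms; the centre used here is arbitrary.
    """
    c = age - centre
    return [c ** p for p in range(1, degree + 1)]


print(age_terms(45))             # [5, 25]
print(age_terms(30, degree=3))   # [-10, 100, -1000]
```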
23
(c) Access to data
…need for
• Facilities for granting access to data, including new [potentially secure] data
• Distribution of suitably detailed metadata [cf. highly selective approach of existing projects, and benefits of pre-harmonisation accordingly]
E-Social science contributions
• Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data
• Services for organising / distributing metadata
24
The contribution of e-Science - reflections
The contribution should concern:
• Navigating complex data
• Security
• Workflows
But, generally, it isn’t taken up
(cf. existing networks, e.g. LIS, IPUMS, ESS, etc)
25
Possible explanations
E-science tools and services too heavyweight compared to ad hoc sharing solutions
• Overheads in adopting e-Science tools (cf. existing working models)
• E-science tools are unduly generic (cf. ongoing focussed projects and related resources)
Working habits: Experts and software
• Major cross-national projects pre-date e-Science initiatives
• Key role of project-specific experts
• Many projects are ‘small N’ and don’t seem to require heavyweight inputs
• Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)
26
Conclusions – will things change?
Overheads of e-Science engagement might decline
• GE*DE aims: user friendly services, service delivery emphasis, training workshops, mainstream software
Existing ad hoc practices could become insufficient
• Data of greater scale and complexity
• Data with security limits
• Need for integrated access and complex analysis
• Need for plurality in analyses of multiple measures (even in ‘Small N’ comparisons)
• Need for documentation for replication
27
References cited
Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press.
Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press.
Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222.
Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58.
Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4).
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.
Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota.
Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.
Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press.
Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.