Cross-national data in DAMES and GE*DE

Paul Lambert, University of Stirling

Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009

This talk presents materials from the DAMES Node, an ESRC-funded research Node of the National Centre for e-Social Science (www.dames.org.uk)

Some recent history: Atkinson (1996: 47)

Stewart et al. (2009: 5)

Today’s workshop: ‘Where next?’

Problems / challenges with cross-national survey analysis
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data

The contribution of e-social science

Why is e-Science relevant?

e-Science models cover distributed computing and the enabling of collaborations [e.g. Foster et al., 2001]

e-Social Science directed to research infrastructures for collaboration, and for supporting the lifecycle of data oriented research [e.g. Halfpenny & Procter, 2009]

Cross-national survey projects include complex distributed data & a clear need for collaborations…

Hitherto, cross-national survey projects have not generally made use of e-science initiatives

Part 1: What is e-Social Science doing for cross-national survey research?

Projects on the research lifecycle
• data collection
• data management [DAMES]
• data analysis

Projects on a national scale

Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]

The example of DAMES and GE*DE

www.dames.org.uk

1.1) Grid Enabled Specialist Data Environments (‘GE*DE’)

1.2) Data resources for micro-simulation on social care data

1.3) Linking e-Health and social science databases

1.4) Training and interfaces for management of complex survey data

2.1) Description, discovery & service use through metadata and data abstraction

2.2) Techniques to handle data from multiple sources

2.3) Workflow modelling for social science

2.4) Security driven data management

‘Data management’ means… ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..]

Usually performed by social scientists themselves

Most overt in quantitative survey data analysis
• Preparing or ‘enabling’ survey analysis

Usually a substantial component of the work process
• But not explicitly rewarded (and sometimes penalised)

Here we differentiate from archiving / controlling data itself
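The 'linking related data resources' task in the definition above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the occupational unit codes, class labels, scores and the function name `link_occupation_info` are invented for illustration and are not taken from GE*DE or from any real classification.

```python
# Hypothetical occupation-level resource keyed by occupational unit code.
# Codes, class labels and scores are invented for illustration only.
OCCUPATION_INFO = {
    "1110": {"social_class": "Managerial", "score": 62.0},
    "2310": {"social_class": "Professional", "score": 71.5},
    "9130": {"social_class": "Routine", "score": 28.4},
}

# Hypothetical survey micro-data: one dict per respondent.
SURVEY = [
    {"id": 1, "occ_code": "2310"},
    {"id": 2, "occ_code": "9130"},
    {"id": 3, "occ_code": "0000"},  # code absent from the lookup resource
]


def link_occupation_info(records, lookup):
    """Attach occupation-level measures to each survey record.

    Returns (linked_records, unmatched_ids) so that failed links are
    reported rather than silently dropped.
    """
    linked, unmatched = [], []
    for rec in records:
        info = lookup.get(rec["occ_code"])
        if info is None:
            unmatched.append(rec["id"])
            linked.append({**rec, "social_class": None, "score": None})
        else:
            linked.append({**rec, **info})
    return linked, unmatched


linked, unmatched = link_occupation_info(SURVEY, OCCUPATION_INFO)
for row in linked:
    print(row)
print("Unmatched respondent ids:", unmatched)
```

Returning the unmatched identifiers alongside the linked records is one possible design choice: it keeps a record of where the link failed, which is the kind of information that tends to be lost when such steps are done ad hoc.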

‘The significance of data management for social survey research’

(see http://www.esds.ac.uk/news/eventdetail.asp?id=2151)

The data management studied across the DAMES Node is a major component of the social survey research workload

Pre-release manipulations performed by distributors / archivists
• Coding measures into standard categories
• Dealing with missing records

Post-release manipulations performed by researchers
• Re-coding measures into simple categories

We do have existing tools, facilities and expert experience to help us…but we don’t make a good job of using them efficiently or consistently

So the ‘significance’ of DM is about how much better research might be if we did things more effectively…

In GE*DE, we’re developing

Services for accessing and depositing specialist data
• Occupations, educational qualifications, ethnicity
• UK Administrative data (with ADLS)

Materials specifically oriented to comparative analytical approaches
• Data resources often from major cross-national studies
• Producing new cross-national data resources
• (see also talk on standardization of categorical data in session 4a)

GEODE v1: Organising and distributing specialist data resources (on occupations)

Cross-national data in DAMES and GE*DE

1. New specialist data on occupations, education and ethnicity

a. Curation and re-release of existing data

b. Generation of new data (and/or metadata), with focus on standardisation / harmonisation

2. Conduit to existing resources

3. Generic resources for workflow documentation and replication

E.g. (1a) Occupations [cf. Leiulfsrud et al. 2005]

E.g. (1b) Ethnicity / Migration

[Figure: mean occupational advantage scores by ethnic group, with panels for Canada 2001, Mexico 2000 and USA 2000.]

Source: IPUMS International (Minnesota Population Center, 2009). Points show mean occupational advantage score for employed adults using US 2000 CAMSIS (for ethnic groups with >= 1000 census responses).
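The summary described in the figure note (a mean score per ethnic group, reported only for groups with at least 1000 responses) could be computed along the following lines. This is a sketch under stated assumptions, not the code used for the figure: the function name `group_means` and the toy data are invented, and the records are assumed to be already restricted to employed adults.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 1000  # threshold stated in the figure note


def group_means(records, min_size=MIN_GROUP_SIZE):
    """Mean score per group, keeping only groups with >= min_size cases.

    `records` is an iterable of (group_label, score) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, score in records:
        totals[group] += score
        counts[group] += 1
    return {
        g: totals[g] / counts[g]
        for g in counts
        if counts[g] >= min_size
    }


# Invented toy data; a small threshold is used so the filter is visible.
toy = [("A", 50.0)] * 3 + [("A", 60.0)] + [("B", 40.0)] * 2
print(group_means(toy, min_size=3))
```

With the toy data, group "B" has only two cases and is dropped, which mirrors how small groups are excluded from the plotted points.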

E.g. (2): Occupations

E.g. (3): Workflow documentation

Part 2: The contribution of e-Science

The contribution should concern:
• Navigating complex data
• Security
• Workflows

Compare with current issues for cross-national surveys:
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data

(a) Quantity of data (& metadata)

…current trends

Moving beyond macro-data analysis* to exploiting large-scale micro-data

*Country level analysis, e.g. Fuchs (2009)

Interest in / access to secure micro-data

Exploitation of complex micro-data
o Longitudinal data and the life-course [Mayer, 2005]
o Micro-data and links with macro-data
o Metadata about the quality of the micro-data

(a) … can be helped by…

Interest in / access to secure micro-data
• E-Science projects building portals for secure access to data (e.g. Sinnott 2008)

Exploitation of complex micro-data
• Services for organising complex data (e.g. GE*DE)
• Metadata provision on data resources (e.g. PolicyGrid)
• Comparative standardisations (e.g. GE*DE)
• Tools for complex analysis (e.g. e-Stat)
• Tools for simulation (e.g. NeISS)
• Tools for visualisation of complex data (e.g. Maptube)
• Tools for workflow records for research lifecycle (cf. MyExperiment)

(b) Harmonisation, equivalence and data quality

Variable manipulations require standardization through measurement or meaning equivalence, and adequate documentation / justification for those manipulations

E-Science resources support
• Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE
• Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE
• Facilitating manipulations and standardizations, e.g. user-friendly services on variables at GE*DE to enable plurality of alternative measures

? Pluralistic/open source vs quality control
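One way to make an ex post harmonisation documentable and replicable is to hold the recode as data rather than as scattered syntax. The Python sketch below illustrates the idea only: the country names, national codes, common categories and function names are all hypothetical, and it does not represent the GE*DE syntax databases.

```python
# Each mapping records its source, so the recode can be cited and replicated.
# Countries, codes and categories are invented for illustration only.
HARMONISATION = {
    "country_a": {
        "source": "hypothetical national scheme A",
        "map": {"1": "low", "2": "medium", "3": "high"},
    },
    "country_b": {
        "source": "hypothetical national scheme B",
        "map": {"basic": "low", "upper": "medium",
                "degree": "high", "phd": "high"},
    },
}


def harmonise(country, code, table=HARMONISATION):
    """Return the common category for a national code, or None if unmapped."""
    return table[country]["map"].get(code)


def harmonise_all(records, table=HARMONISATION):
    """Recode (country, code) pairs and count codes that could not be mapped."""
    out, unmapped = [], {}
    for country, code in records:
        common = harmonise(country, code, table)
        if common is None:
            key = (country, code)
            unmapped[key] = unmapped.get(key, 0) + 1
        out.append((country, code, common))
    return out, unmapped


records = [("country_a", "2"), ("country_b", "phd"), ("country_b", "other")]
result, unmapped = harmonise_all(records)
print(result)
print("Unmapped codes:", unmapped)
```

Because the mapping is an ordinary data structure, an alternative measure can be tried by passing a different `table`, which is one simple way to support a plurality of measures while keeping each one documented.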

More on GE*DE and issues of data quality

GE*DE covers
• Occupations
• Educational qualifications
• Ethnicity and migration

These are ‘key variables’ in social science research
• Regularly measured
• Link to concepts of central interest
• Multivariate context (critical relations with gender, age cohort, etc.)

Key variables: concepts and measures

Variable | Concept | Measure (e.g.) | Something useful
Occupation | Class; stratification; unemployment | Occupation-based social classification | www.geode.stir.ac.uk
Education | Credentials; ability; merit | Qualification based educational level | www.equalsoc.org/8 [Schneider, 2008]
Ethnic group | Ethnicity; religion; race; national origins | Minority ethnic group indicators | [Bosveld et al 2006]
Age | Age; life course stage; cohort | Polynomial age function | [Abbott 2006]
Gender | Gender; household / family context | | www.genet.ac.uk
Income | Income; wealth; poverty | Monthly income; income groups; … | www.data-archive.ac.uk [SN 3909]

(c) Access to data

…need for

Facilities for granting access to data
• Including new [potentially secure] data

Distribution of suitably detailed metadata [cf. Highly selective approach of existing projects, and benefits of pre-harmonisation accordingly]

E-Social science contributions
• Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data
• Services for organising / distributing metadata

The contribution of e-Science - reflections

The contribution should concern:
• Navigating complex data
• Security
• Workflows

But, generally, it isn’t taken up

(cf. existing networks, e.g. LIS, IPUMS, ESS, etc)

Possible explanations

E-science tools and services too heavyweight compared to ad hoc sharing solutions
• Overheads in adopting e-Science tools (cf. existing working models)
• E-science tools are unduly generic (cf. ongoing focussed projects and related resources)

Working habits: Experts and software
• Major cross-national projects pre-date e-Science initiatives
• Key role of project-specific experts
• Many projects are ‘small N’ and don’t seem to require heavyweight inputs
• Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)

Conclusions – will things change?

Overheads of e-Science engagement might decline
• GE*DE aims: user friendly services, service delivery emphasis, training workshops, mainstream software

Existing ad hoc practices could become insufficient
• Data of greater scale and complexity
• Data with security limits
• Need for integrated access and complex analysis
• Need for plurality in analyses of multiple measures (even in ‘Small N’ comparisons)
• Need for documentation for replication

References cited

Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press.

Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press.

Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.

Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222.

Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58.

Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4).

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.

Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota.

Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.

Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press.

Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
