Dealing with data on ethnicity: Principles and practice
Paul Lambert, University of Stirling
Talk presented to the DAMES Node workshop on ‘Data on ethnicity in social survey research’, Stirling, 28th Jan 2010.
DAMES (www.dames.org.uk) is an ESRC funded research Node working on ‘Data Management through e-Social Science’
..dealing with data on ethnicity
1) Handling/enhancing categorical data
(‘data management’)
2) Handling/enhancing data on ethnicity
Categorical data is important..
Principal social survey datum
o Basis of most social research reports/analyses/comparisons
It’s rich and complex
o We’re often interested in very fine levels of detail / difference
o We usually recode categories in some way for analysis
…how categorical data is managed is of great consequence to the results of analysis…
o Choices about recoding, boundaries, contrasts made [e.g. RAE analysis: Lambert & Gayle 2009]
EFFNATIS sample (1999): Subjective ethnic identity
Q.129 Which of the following best describes how you would describe yourself?

Freq.  Percent    Cum.  Category
  137    16.65   16.65  1. British
   79     9.60   26.25  2. English
   25     3.04   29.28  3. Pakistani
   10     1.22   30.50  4. Indian
   10     1.22   31.71  5. Bangladeshi
   73     8.87   40.58  6. Pakistani-British
   34     4.13   44.71  7. Indian-British
   22     2.67   47.39  8. Bangladeshi-British
    6     0.73   48.12  9. Asian
    7     0.85   48.97  10. White
    6     0.73   49.70  11. European
   60     7.29   56.99  13. White-British
   65     7.90   64.88  15. Asian-British
    7     0.85   65.74  16. Cosmopolitan
   12     1.46   67.19  17. Moslem
    2     0.24   67.44  18. Indian, Asian (4,9)
    6     0.73   68.17  19. British Moslem
    1     0.12   68.29  21. Indian & British (1,4)
    9     1.09   69.38  22. Indian-British, Asian-British (7,15
   54     6.56   75.94  23. English & White (2,10)
   41     4.98   80.92  24. English, White-British (2,13)
    5     0.61   81.53  25. White-British & Cosmopolitan (13,16
    8     0.97   82.50  26. English & White-British (2,13)
    1     0.12   82.62  27. White-Italian
    7     0.85   83.48  28. European and White-British (11,13)
   20     2.43   85.91  29. English, European and White-British
    1     0.12   86.03  30. English, White, Cosmopolitan (2,10,
    1     0.12   86.15  31. English, Indian-British, Asian-Brit
    6     0.73   86.88  32. Pakistani-British, Asian-British (6
    3     0.36   87.24  33. Pakistani, Asian (3,9)
    4     0.49   87.73  34. Kashmiri
    1     0.12   87.85  35. Pakistani-Moslem living in Great Br
    1     0.12   87.97  36. Pakistani-British, Black-British, A
    1     0.12   88.09  37. Pakistani-British, Asian, Black-Bri
    1     0.12   88.21  38. Pakistani, Black (3,12)
    1     0.12   88.34  39. Pakistani-born British
    2     0.24   88.58  40. Pakistani-British, Asian-British (6
    2     0.24   88.82  41. Black, Asian-British (12,15)
    6     0.73   89.55  42. Pakistani-British, Asian (6,9)
    1     0.12   89.67  43. English, Indian-British, Asian-Brit
    2     0.24   89.91  44. English, Pakistani-British, Indian-
    1     0.12   90.04  45. Neapolitan
    1     0.12   90.16  46. Indian, Asian, Asian-British (4,9,1
    5     0.61   90.77  47. Indian-British, Asian (7,9)
    2     0.24   91.01  48. English, Indian-British, Black, Asi
    2     0.24   91.25  49. English, Indian-British (2,7)
    2     0.24   91.49  50. English, Asian-British (2,15)
    1     0.12   91.62  51. Pakistani-British, Indian-British (
    1     0.12   91.74  52. English, Indian, Asian-British (2,4
    9     1.09   92.83  53. British, English (1,2)
    2     0.24   93.07  54. British, White (1,10)
    1     0.12   93.20  55. Scottish
    2     0.24   93.44  56. English, European (2,11)
    2     0.24   93.68  58. Individual
    3     0.36   94.05  59. English, European, White-British, C
    1     0.12   94.17  60. British, White, European, Cosmopoli
    2     0.24   94.41  61. English, Pakistani-British (2,6)
    2     0.24   94.65  62. Pakistani-British, Black, Asian-Bri
    1     0.12   94.78  63. Bangladeshi, Asian, Black (5,9,12)
    2     0.24   95.02  64. British, European (1,11)
    3     0.36   95.38  65. Lancastrian
    1     0.12   95.50  66. Humanoid
    1     0.12   95.63  67. Pakistani, Asian-British (3,15)
    1     0.12   95.75  68. Pakistani, Asian-British, Moslem, F
    1     0.12   95.87  69. English, White, Irish
    3     0.36   96.23  70. English, White, European (2,10,11)
    1     0.12   96.35  71. Irish-English
    2     0.24   96.60  72. English, White-British, Cosmopolita
    1     0.12   96.72  73. British with Baltic-Slav origins
    3     0.36   97.08  77. English, Pakistani (2,3)
    1     0.12   97.21  78. Pakistani-British, Asian, Moslem
    1     0.12   97.33  79. English, Indian-British, White, Asi
    2     0.24   97.57  80. Human being
    1     0.12   97.69  82. English, Pakistani-British, Asian (
    1     0.12   97.81  83. Pakistani-British, Italian
    1     0.12   97.93  85. European, White-British, Irish
    4     0.49   98.42  86. English, Indian (2,4)
    1     0.12   98.54  87. Indian-British, Black-British (7,14
    1     0.12   98.66  88. Indian, Black, Asian-British (4,12,
    1     0.12   98.78  89. Indian, Asian-British (4,15)
    1     0.12   98.91  90. White-British, Ukrainian
UK EFFNATIS survey (1999)
[Heckmann et al 2001]
Favourite type of music, by ethnicity

Columns: 1. Autochthonous; 2. Pakistani; 3. Indian; 4. Bangladeshi

Autoch  Pakist  Indian  Bangla   Total  music1b
   107      32      20       4     163  1. Pop
    27       2       0       0      29  2. Chart
    94       5       9       1     109  3. Dance
     8      31      10       3      52  4. Soul
    19      16       5       5      45  5. R'n'B
     5       6       2       0      13  6. Hip Hop
     0      11      12       2      25  7. Asian Music
     0       0       1       0       1  8. Trance
    18       0       1       0      19  9. House
     1       0       0       0       1  10. Techno
     0       2       1       0       3  11. Garage
     3       0       0       0       3  12. Hardcore
     1       0       0       0       1  13. Acid Jazz
     4      11       2       1      18  14. Rap
    46       4       3       1      54  15. Indie / Britpop
    32       2       2       0      36  16. Rock / Heavy Meta
     1       0       0       0       1  17. Motown
     1       0       0       0       1  18. Folk
     2       1       1       0       4  20. Jazz
     4       2       1       1       8  21. Classical
     4       0       0       0       4  22. Sixties
     3       0       0       0       3  23. Seventies
     2       0       0       0       2  24. Eighties
     1       0       0       0       1  26. Grunge
     3       0       0       0       3  27. Easy Listening
     0       3       2       1       6  28. Reggae
     2       0       0       0       2  30. Country
     3       0       0       0       3  31. Punk
     0      11       3       1      15  35. Indian
     0       5       2       0       7  36. Bangra
     1       0       0       0       1  37. Jungle
     0       2       0       0       2  38. Asian Pop Music
    19       5       2       0      26  40. All
     3       0       0       0       3  41. Drum 'n' Bass
     2       0       0       0       2  43. Alternative
     2       1       0       0       3  44. Soft Rock
     1       0       0       0       1  45. Rock 'n' Roll
     0       2       0       0       2  46. Love songs / ball
     1       0       0       0       1  47. Blues
     0       1       0       0       1  48. Swing
     1       1       0       0       2  50. Break Beat
     1       0       0       0       1  51. Goth
     2       0       0       0       2  60. Dance Floor
     0       2       0       0       2  63. English
     0       1       1       0       2  64. Classical Hindi
     0       1       2       0       3  65. Hindi
     0       2       0       0       2  68. Urdu
     1       0       0       0       1  72. Lo-fi
     1       0       0       0       1  75. Ambient
     0       0       1       0       1  77. Bollywood
     0       0       4       0       4  78. Dramatic
     0       0       1       0       1  79. Islamic

Columns: 1. Ethnic; 2. No EM

Ethnic   No EM   Total  uketh2
     1     426     427  1. Autochthonous
    58     105     163  2. Pakistani
    32      56      88  3. Indian
     6      14      20  4. Bangladeshi
    97     601     698  Total
[Figure: two panels, ‘Men's jobs (frequencies)’ (maximum: 335) and ‘Women's jobs (frequencies)’ (maximum: 895).]
Source: British Household Panel Survey, last reported current jobs of adults, waves 1-17, N Males = 10223; N Females = 9934. X-axis shows ISCO-88 Sub-Major group of job; Y-axis shows ISCO-88 3rd and 4th digit codes.
Data management and categorical data
In DAMES, we identify three important categorical variables (occupations, educational qualifications, ethnicity), and collect information about them in order to improve ‘data management’ and hence exploitation of such data
o ‘Key’ social science variables
o Existing resources (and metadata & support on those resources)
o UK and beyond
‘Occupational Information Resources’
Small databases (square electronic files) linking lists of occupational positions (occupational unit groups) with information about those positions
Many existing resources already used in academic research (> 1000)
Educational information resources
Small databases (often on paper) linking lists of educational qualifications with information about them
Many existing resources (>500), but less communication between them
[Part of UK scheme from ONS (2008)]
Ethnic Minority/Migration Information Resources
Data which links measures of ethnicity / migration status with other information
In high demand, but few existing resources (? < 500)
Standardizing categorical data
‘Standardization’ refers to treating variables for the purposes of analysis, in order to aid comparison between variables
o {In the terminology of survey research analysts}
1. Arithmetic standardization to re-scale metric values [$z_i = (x_i - \bar{x}) / \mathrm{sd}$]
2. Ex-ante harmonisation (during data production) [ensuring measures of the same concept, collected from different contexts, are recorded in coordinated taxonomies]
3. Ex-post harmonisation [adapting measures of the same concept, collected from different contexts, using a coordinated re-coding procedure]
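As an illustration of items 1 and 3 above, here is a minimal Python sketch. It is not taken from the talk: the function names, the survey codes and the shared labels are all hypothetical, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Arithmetic standardisation: z_i = (x_i - mean) / sd within one context."""
    m = mean(values)
    sd = pstdev(values)  # population sd (assumption); statistics.stdev gives the sample sd
    return [(x - m) / sd for x in values]


# Ex-post harmonisation: two surveys coded the same concept differently,
# so each gets its own documented recode into one shared taxonomy.
# All codes and labels here are hypothetical.
SURVEY_A_TO_COMMON = {1: "white", 2: "asian", 3: "black", 4: "other"}
SURVEY_B_TO_COMMON = {10: "white", 20: "asian", 21: "asian", 30: "black", 99: "other"}


def harmonise(codes, mapping):
    """Apply a recode; codes missing from the mapping become None, not a guess."""
    return [mapping.get(code) for code in codes]


scores = z_scores([20, 30, 40, 50])
common_a = harmonise([1, 2, 4], SURVEY_A_TO_COMMON)
common_b = harmonise([20, 21, 30, 55], SURVEY_B_TO_COMMON)
```

Returning None for an unmapped code keeps gaps in the recode visible instead of silently assigning them to a residual category.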
The big issue: standardization for comparisons
‘Comparisons are the essence’ [Treiman, 2009: 382]
↔ to make statements about differences [in measures] over contexts
Categorical data is highly problematic..
o Can’t immediately conduct arithmetic standardization
o Struggle to enforce harmonised data collection
  • ..which may not in any case be suitable..
o Struggle to achieve ex-post harmonisation
  • Non-linear relations between categories
  • Shifting underlying distributions
Two conventional ways to make comparisons [e.g. van Deth 2003]
Measurement equivalence = ex ante harmonisation (or ex post harmonisation)
Meaning equivalence = arithmetic standardisation (or ex ante or ex post harmonisation)
Much comparative research flounders on an insufficient recognition of strategies for equivalence
(“One size doesn’t fit all, so we can’t go on”)
Measurement equivalence
Measurement equivalence by assertion
Measurement equivalence can go wrong
[Figure (tabplot): Goldthorpe class scheme harmonised over time. Categories: Salariat; Non-manual; Petty-bourg.; Skilled manual; Unskilled. Cells show percent of year category.]
Source: Females from LFS/GHS, using data from Li and Heath (2008)
Meaning equivalence
For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence
(because of non-linear relations between categories and shifting underlying distributions)
(even if measurement equivalence seems possible)
Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context
For categorical data, this can be achieved by scaling categories in one or more dimensions of difference
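One simple way to scale categories is to score each one by the mean of an external variable among its members, then standardise those scores within the current sample. The sketch below assumes that approach. The function names and the toy data are hypothetical, and this is not the procedure used to produce the results in the talk.

```python
from collections import defaultdict
from statistics import mean, pstdev


def category_mean_scores(categories, outcome):
    """Score each category by the mean of an external outcome among its members."""
    groups = defaultdict(list)
    for cat, y in zip(categories, outcome):
        groups[cat].append(y)
    return {cat: mean(ys) for cat, ys in groups.items()}


def standardise_scores(categories, scores):
    """Rescale category scores so the case-level values have mean 0 and sd 1."""
    case_values = [scores[c] for c in categories]
    m, sd = mean(case_values), pstdev(case_values)
    return {cat: (s - m) / sd for cat, s in scores.items()}


# Hypothetical toy data: a qualification category and a parental
# occupational score for six respondents.
quals = ["degree", "degree", "a_level", "a_level", "none", "none"]
father_score = [60, 50, 40, 30, 20, 10]

raw = category_mean_scores(quals, father_score)
z = standardise_scores(quals, raw)
```

Because the scores are standardised against the sample at hand, the same category can take a different value in another year or country, which is the sense of relative position described above.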
[Figure: ‘Effect proportional scaling’ using parents’ occupational advantage. Category means of father's Cambridge Scale score (axis roughly 20 to 50) are plotted for three variables:
Occupation: Managers and Administrators; Professional; Associate professional and technical; Clerical and secretarial; Craft and related; Personal and protective services; Sales; Plant and machine operatives; Other occupations
Education: higher degree; first degree; teaching qf; other higher qf; nursing qf; gce a levels; gce o levels or equiv; commercial qf, no o levels; cse grade 2-5, scot grade 4-5; apprenticeship; other qf; no qf
Ethnic group: white; black-carib; black-african; black-other; indian; pakistani; bangladeshi; chinese; other ethnic grp]
Source: British Household Panel Survey 2007, adults aged 18+ and father's Cambridge Scale score. Points at 1-3 show category mean. Points at 0 show individual values (scaled mean=28, sd=6; pop. mean=28, sd=18).
What we do and what we ought to do (when standardizing categories)
Research applications tend to select a favoured categorisation of a concept and stick with it
o Due to coordinated instructions [e.g. Blossfeld et al. 2006]
o Due to perceived lack of available alternatives
o Due to perceived convenience
To make statistical analyses more robust we should…
o Operationalise and deploy various scalings and arithmetic measures
o Try out various categorisations and explore their distributional properties
o … and keep a replicable trail of all these activities..
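The recommendations above could look like this in practice. This is a minimal Python sketch with hypothetical scheme names and category labels; it applies two alternative categorisations to the same detailed codes and records one summary row per scheme.

```python
from collections import Counter

# Two alternative groupings of the same detailed codes (all labels hypothetical).
SCHEMES = {
    "binary": {
        "white": "white", "indian": "minority", "pakistani": "minority",
        "black_caribbean": "minority", "chinese": "minority",
    },
    "four_group": {
        "white": "white", "indian": "south_asian", "pakistani": "south_asian",
        "black_caribbean": "black", "chinese": "other",
    },
}


def shares(values):
    """Proportion of cases falling in each category."""
    counts = Counter(values)
    n = len(values)
    return {cat: count / n for cat, count in counts.items()}


def compare_schemes(values, schemes):
    """Return one summary row per scheme, in the order the schemes are listed."""
    rows = []
    for name, mapping in schemes.items():
        dist = shares([mapping[v] for v in values])
        rows.append({
            "scheme": name,
            "n_categories": len(dist),
            "min_share": min(dist.values()),  # flags sparse categories
        })
    return rows


sample = ["white"] * 6 + ["indian", "pakistani", "black_caribbean", "chinese"]
summary = compare_schemes(sample, SCHEMES)
```

Keeping the schemes as data rather than as scattered recode statements means the list of rows doubles as a record of what was tried.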
2) Handling data on ethnicity & standardizing categorical data
GESDE projects are concerned with allowing social science researchers to navigate, and exploit, heterogeneous information resources
o Occupational Information Resources (GEODE)
o Educational Information Resources (GEEDE)
o Ethnic minority/Migration Information Resources (GEMDE)
Plenty of interest, and data, on ‘ethnic minority groups’, ‘immigration’, ‘immigrants’
Data includes:
o Generic & specialist studies collecting ethnic ‘referents’: ‘ethnic identity’; ‘nationality’; parents’ nationality; country of birth; language spoken; religion; ‘race’
National research and data management:
o Most countries have evolving standard definitions of ethnic groups
International research and data management:
o Seen as highly problematic in many fields except immigration data
o Lambert, P.S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.
…but working with ethnicity data in surveys is hard…!
- It’s sparse
- It’s collinear (e.g. with age)
- It’s dynamic (cf. comparative research)
Ethnic group in the World Values Survey - Britain
Count, by wave

1981-1984  1989-1993  1994-1999  1999-2004      Total  Ethnic group
       18          0          0          0         18  -5 Missing; Unknown
        0       1484          0        999       2483  -4 Not asked
        0          0          1          0          1  -1 Don't know
       15          0          0          0         15  40 Asian
        1          0          0          0          1  70 Asian - Central (Arabic)
        0          0          3          0          3  80 Asian - East (Chinese, Japanese)
        0          0         11          0         11  90 Asian - South (Indian, Hindu, Pakistani, Bangladeshi)
        0          0          1          0          1  130 Bangladeshi
        0          0          4          0          4  200 Black African
        0          0         12          0         12  210 Black-Caribbean
        9          0          2          0         11  220 Black-Other / Black
        0          0          7          0          7  810 Pakistani
     1124          0       1044          0       2168  1400 White / Caucasian White
        0          0          8          0          8  8000 Other
     1167       1484       1093        999       4743  Total
Ethnic group in the World Values Survey - Mexico
Count, by wave

1989-1993  1994-1999  1999-2004      Total  Ethnic group
        0          1          0          1  -5 Missing; Unknown
        0          0         29         29  -2 No answer
        0        832          0        832  -1 Don't know
        0        364          0        364  70 Asian - Central (Arabic)
        5          8          0         13  80 Asian - East (Chinese, Japanese)
        0         84          0         84  90 Asian - South (Indian, Hindu, Pakistani, Bangladeshi)
        7         14          3         24  220 Black-Other / Black
      544          0          0        544  310 Coloured (medium)
      240          0        564        804  320 Coloured (dark)
      346          0        648        994  330 Coloured (light)
       86          0          0         86  630 Indian (American)
        0          0         25         25  640 Indigenous
      303        335        254        892  1400 White / Caucasian White
        0        685         12        697  8000 Other
     1531       2323       1535       5389  Total
UK: ONS & ESDS data guides
Input harmonisation within decades
Output harmonisation between decades
o Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
Academic strategies – ad hoc ‘black’ group, etc
Addition of extra categories over time
Mixed ethnicities, marriages…
UK focus on ‘ethnic identity’, lack of attention to alternative referents
Comparative research solutions?
Measurement equivalence might be achieved by:
o Survey data collection
o Connecting related groups
o Longitudinal linkage
Functional equivalence for categories:
o Simplified categorical distinctions
o Immigrant cohorts
o Scaling ethnic categories
…Principles and practice…
3 themes in DAMES ought, in our perspective, to help here
1) Replicability / transparency
2) Plurality of approaches
3) Ease of access (to off-putting operations)
Replicability / transparency
o Document your own recodes
o Access somebody else’s recodes
o Identify commonly used recodes (& use them..!)
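One possible way to make a recode documentable and shareable is to store it as data alongside its metadata, rather than burying it in analysis code. The Python sketch below is only an illustration: the record layout, field names and codes are hypothetical and do not describe a DAMES file format.

```python
import json


def make_recode_record(name, source_variable, mapping, note):
    """Bundle a recode with the metadata someone else needs to reuse it."""
    return {
        "name": name,
        "source_variable": source_variable,
        "mapping": mapping,
        "note": note,
    }


def apply_recode(values, record):
    """Apply the recode exactly as documented; fail loudly on undocumented codes."""
    mapping = record["mapping"]
    missing = sorted(set(values) - set(mapping))
    if missing:
        raise KeyError(f"codes not covered by recode '{record['name']}': {missing}")
    return [mapping[v] for v in values]


# Hypothetical example: string codes, because JSON object keys are strings.
record = make_recode_record(
    name="ethnic_3cat",
    source_variable="ethgrp",
    mapping={"1": "white", "2": "asian", "3": "other"},
    note="Illustrative three-category collapse; not an official scheme.",
)

text = json.dumps(record, indent=2, sort_keys=True)  # what would be saved or shared
restored = json.loads(text)
recoded = apply_recode(["1", "3", "2"], restored)
```

Raising an error on unmapped codes is a deliberate choice here: a recode borrowed from someone else should not silently drop categories that its author never saw.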
Plurality of approaches
Diminishing excuses for not trying out multiple operationalisations…
[Figure: two panels, Britain and Sweden, showing ‘Increase in R-squared’ and ‘Increase in BIC’ for each occupation-based scheme. Scheme labels on the axis: ES5, ES2, E9, E6, E5, E3, E2, G13, G11, G10, G7, G5, G3, G2, K4, R7, WR, WR9, O17, O8, O4, MN, I9, I99, CM, CF, CM2, CF2, CG, ISEI, SIOP, AWM, WG1, WG2, WG3, GN1.]
Source: BHPS and LNU 1991, adults aged 23-55 in work in 1991, N=4536 Britain, 2504 Sweden. Model 1: ISEI = linear age + gender; Model 2: ISEI = (Model 1) + occupation-based social classification. Graph shows improvement in R-squared for OLS regression, Model 2 vs Model 1, plus scaled BIC statistic ((Model 2 BIC - Model 1 BIC) / Model 1 BIC). Unweighted data.
Explanatory power of schemes in predicting father's ISEI
Figure 12: R-squared and BIC for father's ISEI
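The two statistics in the figure can be illustrated with a deliberately simplified Python sketch. It makes three assumptions that differ from the analysis reported above: the baseline model is intercept-only (no age or gender terms), BIC is computed in its Gaussian form with additive constants dropped, and the data are hypothetical. The scaled BIC depends on how BIC is defined, so values from this sketch are not comparable with the figure.

```python
import math
from collections import defaultdict


def rss_intercept_only(y):
    """Residual sum of squares when every case is predicted by the overall mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def rss_category_means(y, cats):
    """Residual sum of squares when each case is predicted by its category mean."""
    groups = defaultdict(list)
    for v, c in zip(y, cats):
        groups[c].append(v)
    rss = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        rss += sum((v - m) ** 2 for v in vals)
    return rss, len(groups)


def gaussian_bic(rss, n, k):
    """BIC for a linear model with k mean parameters (additive constants dropped)."""
    return n * math.log(rss / n) + k * math.log(n)


def compare(y, cats):
    """Return (gain in R-squared, scaled BIC change) for Model 2 vs Model 1."""
    n = len(y)
    rss1 = rss_intercept_only(y)
    rss2, k2 = rss_category_means(y, cats)
    r2_gain = (rss1 - rss2) / rss1  # Model 1 has R-squared 0 in this simplification
    bic1 = gaussian_bic(rss1, n, 1)
    bic2 = gaussian_bic(rss2, n, k2)
    return r2_gain, (bic2 - bic1) / bic1


# Hypothetical toy data: an outcome score and a three-category scheme.
y = [10, 12, 30, 32, 50, 52]
cats = ["a", "a", "b", "b", "c", "c"]
r2_gain, scaled_bic = compare(y, cats)
```

Running `compare` once per candidate scheme on the same outcome gives a like-for-like ranking of schemes, which is the comparison the figure makes.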
Making complex things easier
Organising complex categorical data
o Labelling, recoding, etc
Effect proportional scaling
Standardisation
Interaction terms
[Figure: SOR model dimension scores for BHPS ethnic groups, identified principally by age, gender attitudes and household income. Groups plotted: 1. White; 2. Mixed; 3. Indian; 4. Pakistani; 5. Bangladeshi; 6. Other Asian; 7. Black-Caribbean; 8. Black African; 9. Other Black; 10. Chinese; 11. Other ethnic group. Axis runs from -2 to 2.]
Source: BHPS wave 17, n = 12626, % 'White' = 97.3
Data used
Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704.
Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg.
Inglehart, R. (2000). World Values Surveys and European Values Surveys 1981-4, 1990-3, 1995-7 [Computer file] (Vol. 2000). Ann Arbor, MI: Institute for Social Research [Producer]; Inter-university Consortium for Political and Social Research [Distributor].
Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.
University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.
References
Agresti, A. (2002). Categorical Data Analysis, 2nd Edition. New York: Wiley.
Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk).
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
van Deth, J. W. (2003). Using Published Survey Data. In J. A. Harkness, F. J. R. van de Vijver & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp. 329-346). New York: Wiley.