Metrics, research award grades, and the REF
Harvey Goldstein
University of Bristol
With support from Mary Day, Ian Diamond and Phil Sooben
The context
• REF proposal to use metrics:
  – Journal impact factors and citations
  – Research income
  – Research students
  – Research council grant application grades
• Little discussion so far of the technical measurement issues associated with Research Council awards
The database
• All ESRC applications 2001-2007
• Details of applicants; reviewer, assessor and board grades
• Identification of departments and HEIs
• Award amounts (not considered)
• Final analysis of 2698 applications, 1698 departments
A naïve analysis
• Consider the discipline of Education
  – Note that we have not been able to assign departments to RAE disciplines, so the 'principal discipline' is used.
• Similar results for other disciplines
• Final award grade converted to a numeric score
• All award types considered – similar results if fellowships excluded
• PI weighted more than Co-applicants: the same award score given to each applicant
• Weighted analysis of these scores in a 3-level model (a fitting sketch follows below):
  – Application within Applicant within HEI
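A minimal sketch of how such a three-level variance components model could be set up, assuming a pandas data frame with hypothetical columns 'score', 'applicant' and 'hei'; the PI/co-applicant weighting is omitted here, and the original analysis used specialist multilevel software:

```python
# Sketch: three-level variance components model
# (application within applicant within HEI).
# 'esrc_education.csv', 'score', 'applicant' and 'hei' are
# hypothetical names; the PI/co-applicant weighting is omitted.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("esrc_education.csv")

# HEI is the top-level grouping (level 3); applicants enter as a
# variance component within HEI (level 2); the residual is level 1.
model = smf.mixedlm(
    "score ~ 1",
    data=df,
    groups="hei",
    vc_formula={"applicant": "0 + C(applicant)"},
)
fit = model.fit(reml=False)  # maximum likelihood, as in Table 2
print(fit.summary())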
Results of 3-level model
• Insensitive to specific weighting system
Table 2. Three-level variance components model with numerical final grade as response. Maximum likelihood estimates.
Parameter Estimate Standard error
Intercept 6.58 0.10
Level 3 variance 0.58 0.14
Level 2 variance 0.31 0.18
Level 1 variance 2.39 0.23
Deviance 5712.8
VPC (level 3) 17.7%
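As a check, the VPC quoted above follows directly from the variance estimates in Table 2:

```python
# VPC at HEI level: share of total variance attributable to HEIs,
# using the estimates from Table 2.
level3, level2, level1 = 0.58, 0.31, 2.39
vpc = level3 / (level3 + level2 + level1)
print(f"VPC (level 3) = {vpc:.1%}")  # -> 17.7%
```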
Problems
• Invalid analysis, since the scores are not independent:
  – Imagine a situation where we have N applications, each of which has a different pair of applicants drawn from two particular HEIs, A and B, where for each application each applicant is given the application's awarded score. A simple analysis would compare the mean score for HEI A with the mean score for HEI B, but these mean scores are equal by definition. Thus this analysis contains no information about HEI differences, as opposed to the case where for each pair we have a score derived separately for each applicant. (See the illustration below.)
• Applicants may also come from different departments, not associated with the principal discipline
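A minimal illustration of the two-HEI thought experiment above, with simulated (hypothetical) award scores, showing that copying the application score to both applicants makes the HEI means equal by construction:

```python
# Each of n applications has one applicant from HEI A and one from
# HEI B; both applicants receive the application's awarded score.
import numpy as np

rng = np.random.default_rng(0)
n = 100
scores = rng.normal(7, 1.5, size=n)   # hypothetical award scores

hei = np.array(["A", "B"] * n)        # alternating A/B per application
applicant_scores = np.repeat(scores, 2)

# The HEI means are identical by construction, so a naive comparison
# carries no information about HEI differences.
print(applicant_scores[hei == "A"].mean())
print(applicant_scores[hei == "B"].mean())
```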
A more valid analysis
• We reconceptualise the data as follows:
  – We assume each applicant contributes a level of 'quality' to the application
  – The application score is just the average of these contributions, weighted according to whether the applicant is PI or Co-applicant
  – Some applicants appear on more than one application, associated with different combinations of other applicants; this allows us, in principle, to assign (estimate) a score for each applicant
  – This is known as a multiple membership (MM) model
• Formally:
  – $i$ indexes application, $j$ indexes applicant, $y_i$ is the application score
  – $y_i = \beta_0 + \sum_{j \in \mathrm{applicant}(i)} w_{ij} u_j + e_i$
  – $u_j$ is the applicant 'quality' effect, $e_i$ is the application-level residual, and the membership weights satisfy $\sum_j w_{ij} = 1$, with the PI weighted above co-applicants
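A minimal sketch of how this MM model could be specified, assuming PyMC; the original estimates were obtained by MCMC in specialist multilevel software, and W, y and the priors below are placeholders:

```python
# Multiple membership model: y_i = b0 + sum_j w_ij * u_j + e_i.
# W is the (applications x applicants) weight matrix with rows
# summing to 1 (PI weighted above co-applicants); placeholder data.
import numpy as np
import pymc as pm

n_apps, n_applicants = 454, 989        # Education figures from the text
W = np.zeros((n_apps, n_applicants))   # fill with membership weights
y = np.zeros(n_apps)                   # numerical final grades

with pm.Model() as mm_model:
    b0 = pm.Normal("b0", mu=0, sigma=10)
    sigma_u = pm.HalfNormal("sigma_u", sigma=2)  # applicant-level sd
    sigma_e = pm.HalfNormal("sigma_e", sigma=2)  # application-level sd
    u = pm.Normal("u", mu=0, sigma=sigma_u, shape=n_applicants)
    mu = b0 + pm.math.dot(W, u)        # weighted sum over members
    pm.Normal("y", mu=mu, sigma=sigma_e, observed=y)
    idata = pm.sample()                # MCMC, as in the tables below
```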
Another serious problem
• For Education there are 454 applications but 989 applicants, and in general there are more applicants than applications.
• This means that we cannot use the MM model to score individual applicants – the model is non-identifiable.
• However, there are only 98 HEIs, so we can fit a model that identifies the HEI only (aggregating all applicants from one HEI within an application – this will lead to some overestimation of the separation of HEIs).
• This provides HEI/department scores (the aggregation is sketched below).
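A minimal sketch of the aggregation step, with hypothetical placeholder arrays: within each application, the weights of all applicants from the same HEI are summed, collapsing the 989 applicant columns to 98 HEI columns:

```python
# Collapse the applicant-level weight matrix W to HEI level.
# 'applicant_hei' (hypothetical) maps each applicant to an HEI index.
import numpy as np

n_apps, n_applicants, n_heis = 454, 989, 98
W = np.zeros((n_apps, n_applicants))               # applicant weights
applicant_hei = np.zeros(n_applicants, dtype=int)  # applicant -> HEI map

W_hei = np.zeros((n_apps, n_heis))
for j in range(n_applicants):
    W_hei[:, applicant_hei[j]] += W[:, j]

# In a real analysis the rows of W sum to 1, so the rows of W_hei do
# too, and the same MM model can be fitted with 98 HEI effects.
```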
Results
Table 4. Two-level multiple membership model for Education with numerical final grade as response. MCMC estimates.
Parameter Estimate Standard error
Intercept 6.74 0.12
Level 2 variance (HEI) 0.23 0.16
Level 1 variance (Application) 2.94 0.21
DIC 1788.6
VPC 7.2%
Note that the HEI variance (0.23) is now less than half of the earlier estimate (0.58), and the VPC has fallen from 17.7% to 7.2%.
Caterpillar plot
• Note how all confidence intervals overlap zero, so no separation from the overall mean is possible.
• Also, of the four highest HEIs in the 'naïve' analysis, only one is among the four highest here.
• Similar result if fellowships are excluded.
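A caterpillar plot of this kind can be drawn from the estimated HEI residuals; a minimal matplotlib sketch with simulated placeholder residuals and standard errors:

```python
# Caterpillar plot: HEI residuals ranked by estimate, with 95% CIs.
# Residuals and standard errors below are simulated placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
resid = rng.normal(0, 0.3, size=98)   # hypothetical HEI residuals
se = np.full(98, 0.45)                # hypothetical standard errors

order = np.argsort(resid)
x = np.arange(98)
plt.errorbar(x, resid[order], yerr=1.96 * se[order], fmt="o", ms=3)
plt.axhline(0, color="grey")          # overall mean
plt.xlabel("HEI rank")
plt.ylabel("Residual (95% CI)")
plt.show()
```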
It’s even more complicated
• So far all applicants on an application have been assigned to the principal discipline.
• We need to assign each applicant to their actual discipline/department, which implies a joint analysis of all applications
• Again, there are 2698 applications but only 1698 departments
• So we can fit an MM model and estimate scores for each department
Results
The between-department variance is now larger (19%). Only 0.5% of departments have CIs that do not overlap the mean.
Including the principal discipline in the model indicates (moderate) discipline differences in award grading (see below).
Table 6. Two-level multiple membership model for all applications with numerical final grade as response. MCMC estimates.
Parameter Estimate Standard error
Intercept 7.08 0.04
Level 2 variance (HEI/department) 0.57 0.10
Level 1 variance (Application) 2.47 0.08
VPC 18.8%
One hundred lowest and highest ranked residuals for multiple membership model using all departments, with 95% confidence intervals.
MM model with selected principal disciplines (>100 applications)
Parameter Estimate Standard error
Intercept (Econ) 7.17 0.11
Management -0.69 0.17
Social Policy -0.54 0.20
Education -0.24 0.11
Sociology 0.06 0.15
Human Geog 0.10 0.18
Psychology 0.16 0.13
Level 2 variance (HEI/department) 0.45 0.11
Level 1 variance (Application) 2.37 0.09
VPC 16.0%
Using the results
• Given the uncertainty, how useful are they?
• Can they be combined (formally) with citations to provide greater precision?
• The technical limitations of the analyses are likely to apply to citation analyses also
  – E.g. analysis of the NAS 2001 database shows 2,600 papers with 13,000 unique authors (Borner et al., 2004)
• What are the side effects – perverse incentives?
Perverse incentives
• All high-stakes performance monitoring systems encourage 'gaming' – some possibilities:
  – Large numbers of co-applicants squeezed into applications
  – Discouragement of cross-disciplinary applications
  – HEI behaviour would change over time, with a destabilising and distorting effect
  – Encouragement of many small and short-term grants rather than fewer large and long-term ones
  – Distortion of the behaviour of referees and board members (How?)
Comparisons with RAE 2008 scores
• Results for Economics and Education:
• Simple (4,3,2,1,0) RAE scoring system
• Insensitive to other scorings
• Dept. results (residuals) from the ESRC analysis (weighted) averaged to RAE HEI categories (a scoring sketch follows below).
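A minimal sketch of the comparison, with hypothetical quality profiles and residuals: RAE 2008 profiles are scored with the simple (4,3,2,1,0) weights and then correlated with the department residuals from the MM model:

```python
# Score RAE 2008 quality profiles (shares at 4*, 3*, 2*, 1*, U) with
# the simple (4,3,2,1,0) weights, then correlate with ESRC residuals.
# All data below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_heis = 27                                        # Economics comparison size
profiles = rng.dirichlet(np.ones(5), size=n_heis)  # hypothetical profiles
rae_scores = profiles @ np.array([4, 3, 2, 1, 0])

esrc_residuals = rng.normal(0, 0.5, size=n_heis)   # hypothetical MM residuals
r, p = pearsonr(rae_scores, esrc_residuals)
print(f"r = {r:.2f}, P = {p:.2f}")
```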
Correlations between RAE and ESRC scores – selected disciplines
Discipline Correlation
Sociology 0.25
Economics 0.50
Education 0.30
Psychology 0.19
Management 0.07
Economics
• 27 HEIs. Correlation = 0.50 (P<0.01)
• The highest 7 RAE scores are (from the top): LSE, UCL, Warwick, Oxford, Essex, Nottingham, Bristol
Economics RAE ranks
Education
• 37 HEIs. Correlation = 0.30 (P=0.07)
• The top 7 are: IOE=Oxford, Cambridge=Kings, Bristol=Leeds, Exeter
Education
What next?
• Incorporation of other research councils in a combined analysis
• Include citation data in a combined model:
  – In the REF, it can be argued that an analysis at least as complex as the present one is unavoidable for validity
  – Using citations encounters the same identifiability issue: more authors than papers/books.