Metrics, research award grades, and the REF
Harvey Goldstein
University of Bristol
With support from Mary Day, Ian Diamond and Phil Sooben
The context
• REF proposal to use metrics:
  – Journal impact factors and citations
  – Research income
  – Research students
  – Research council grant application grades
• Little discussion so far of the technical measurement issues associated with Research Council awards
The database
• All ESRC applications 2001-2007
• Details of applicants; reviewer, assessor and board grades
• Identification of departments and HEIs
• Award amounts (not considered)
• Final analysis of 2698 applications, 1698 departments
A naïve analysis
• Consider the discipline of Education
  – Note that we have not been able to assign departments to RAE disciplines, so the 'principal discipline' is used.
• Similar results for other disciplines
• Final award grade converted to a numeric score
• All award types considered – similar results if fellowships excluded
• PI weighted more than Co-applicants: the same award score given to each applicant
• Weighted analysis of these scores in a 3-level model (a fitting sketch follows below):
  – Application within Applicant within HEI
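A minimal sketch of how such a three-level variance components model could be set up, assuming a pandas data frame with hypothetical columns 'score', 'applicant' and 'hei'; the PI/co-applicant weighting is omitted here, and the original analysis used specialist multilevel software:

```python
# Sketch: three-level variance components model
# (application within applicant within HEI).
# 'esrc_education.csv', 'score', 'applicant' and 'hei' are
# hypothetical names; the PI/co-applicant weighting is omitted.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("esrc_education.csv")

# HEI is the top-level grouping (level 3); applicants enter as a
# variance component within HEI (level 2); the residual is level 1.
model = smf.mixedlm(
    "score ~ 1",
    data=df,
    groups="hei",
    vc_formula={"applicant": "0 + C(applicant)"},
)
fit = model.fit(reml=False)  # maximum likelihood, as in Table 2
print(fit.summary())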
Results of 3-level model
• Insensitive to specific weighting system
Table 2. Three-level variance components model with numerical final grade as response. Maximum likelihood estimates.
Parameter Estimate Standard error
Intercept 6.58 0.10
Level 3 variance 0.58 0.14
Level 2 variance 0.31 0.18
Level 1 variance 2.39 0.23
Deviance 5712.8
VPC (level 3) 17.7%
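As a check, the VPC quoted above follows directly from the variance estimates in Table 2:

```python
# VPC at HEI level: share of total variance attributable to HEIs,
# using the estimates from Table 2.
level3, level2, level1 = 0.58, 0.31, 2.39
vpc = level3 / (level3 + level2 + level1)
print(f"VPC (level 3) = {vpc:.1%}")  # -> 17.7%
```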
Problems
• Invalid analysis, since the scores are not independent:
  – Imagine a situation where we have N applications, each of which has a different pair of applicants drawn from two particular HEIs, A and B, where for each application each applicant is given the application's awarded score. A simple analysis would compare the mean score for HEI A with the mean score for HEI B, but these mean scores are equal by definition. Thus this analysis contains no information about HEI differences, as opposed to the case where for each pair we have a score derived separately for each applicant. (See the illustration below.)
• Applicants may also come from different departments, not associated with the principal discipline
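A minimal illustration of the two-HEI thought experiment above, with simulated (hypothetical) award scores, showing that copying the application score to both applicants makes the HEI means equal by construction:

```python
# Each of n applications has one applicant from HEI A and one from
# HEI B; both applicants receive the application's awarded score.
import numpy as np

rng = np.random.default_rng(0)
n = 100
scores = rng.normal(7, 1.5, size=n)   # hypothetical award scores

hei = np.array(["A", "B"] * n)        # alternating A/B per application
applicant_scores = np.repeat(scores, 2)

# The HEI means are identical by construction, so a naive comparison
# carries no information about HEI differences.
print(applicant_scores[hei == "A"].mean())
print(applicant_scores[hei == "B"].mean())
```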
A more valid analysis
• We reconceptualise the data as follows:
  – We assume each applicant contributes a level of 'quality' to the application
  – The application score is just the average of these contributions, weighted according to whether the applicant is PI or Co-applicant
  – Some applicants appear on more than one application, associated with different combinations of other applicants; this allows us, in principle, to assign (estimate) a score for each applicant
  – This is known as a multiple membership (MM) model
• Formally:
  – $i$ indexes application, $j$ indexes applicant, $y_i$ is the application score
  – $y_i = \beta_0 + \sum_{j \in \mathrm{applicant}(i)} w_{ij} u_j + e_i$
  – $u_j$ is the applicant 'quality' effect, $e_i$ is the application-level residual, and the membership weights satisfy $\sum_j w_{ij} = 1$, with the PI weighted above co-applicants
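A minimal sketch of how this MM model could be specified, assuming PyMC; the original estimates were obtained by MCMC in specialist multilevel software, and W, y and the priors below are placeholders:

```python
# Multiple membership model: y_i = b0 + sum_j w_ij * u_j + e_i.
# W is the (applications x applicants) weight matrix with rows
# summing to 1 (PI weighted above co-applicants); placeholder data.
import numpy as np
import pymc as pm

n_apps, n_applicants = 454, 989        # Education figures from the text
W = np.zeros((n_apps, n_applicants))   # fill with membership weights
y = np.zeros(n_apps)                   # numerical final grades

with pm.Model() as mm_model:
    b0 = pm.Normal("b0", mu=0, sigma=10)
    sigma_u = pm.HalfNormal("sigma_u", sigma=2)  # applicant-level sd
    sigma_e = pm.HalfNormal("sigma_e", sigma=2)  # application-level sd
    u = pm.Normal("u", mu=0, sigma=sigma_u, shape=n_applicants)
    mu = b0 + pm.math.dot(W, u)        # weighted sum over members
    pm.Normal("y", mu=mu, sigma=sigma_e, observed=y)
    idata = pm.sample()                # MCMC, as in the tables below
```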
Another serious problem
• For Education there are 454 applications but 989 applicants, and in general there are more applicants than applications.
• This means that we cannot use the MM model to score individual applicants – the model is non-identifiable.
• However, there are only 98 HEIs, so we can fit a model that identifies the HEI only (aggregating all applicants from one HEI within an application – this will lead to some overestimation of the separation of HEIs).
• This provides HEI/department scores (the aggregation is sketched below).
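A minimal sketch of the aggregation step, with hypothetical placeholder arrays: within each application, the weights of all applicants from the same HEI are summed, collapsing the 989 applicant columns to 98 HEI columns:

```python
# Collapse the applicant-level weight matrix W to HEI level.
# 'applicant_hei' (hypothetical) maps each applicant to an HEI index.
import numpy as np

n_apps, n_applicants, n_heis = 454, 989, 98
W = np.zeros((n_apps, n_applicants))               # applicant weights
applicant_hei = np.zeros(n_applicants, dtype=int)  # applicant -> HEI map

W_hei = np.zeros((n_apps, n_heis))
for j in range(n_applicants):
    W_hei[:, applicant_hei[j]] += W[:, j]

# In a real analysis the rows of W sum to 1, so the rows of W_hei do
# too, and the same MM model can be fitted with 98 HEI effects.
```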
Results
Table 4. Two-level multiple membership model for Education with numerical final grade as response. MCMC estimates.
Parameter Estimate Standard error
Intercept 6.74 0.12
Level 2 variance (HEI) 0.23 0.16
Level 1 variance (Application) 2.94 0.21
DIC 1788.6
VPC 7.2%
Note that the HEI variance (0.23) is now less than half of the earlier estimate (0.58), and the VPC has fallen from 17.7% to 7.2%.
Caterpillar plot
• Note how all confidence intervals overlap zero, so no separation from the overall mean is possible.
• Also, of the four highest HEIs in the 'naïve' analysis, only one is among the four highest here.
• Similar result if fellowships are excluded.
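A caterpillar plot of this kind can be drawn from the estimated HEI residuals; a minimal matplotlib sketch with simulated placeholder residuals and standard errors:

```python
# Caterpillar plot: HEI residuals ranked by estimate, with 95% CIs.
# Residuals and standard errors below are simulated placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
resid = rng.normal(0, 0.3, size=98)   # hypothetical HEI residuals
se = np.full(98, 0.45)                # hypothetical standard errors

order = np.argsort(resid)
x = np.arange(98)
plt.errorbar(x, resid[order], yerr=1.96 * se[order], fmt="o", ms=3)
plt.axhline(0, color="grey")          # overall mean
plt.xlabel("HEI rank")
plt.ylabel("Residual (95% CI)")
plt.show()
```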
It’s even more complicated
• So far all applicants on an application have been assigned to the principal discipline.
• We need to assign each applicant to their actual discipline/department, which implies a joint analysis of all applications
• Again, there are 2698 applications but only 1698 departments
• So we can fit an MM model and estimate scores for each department
Results
The between-department variance is now larger (19%). Only 0.5% of departments have CIs that do not overlap the mean.
Including the principal discipline in the model indicates (moderate) discipline differences in award grading (see below).
Table 6. Two-level multiple membership model for all applications with numerical final grade as response. MCMC estimates.
Parameter Estimate Standard error
Intercept 7.08 0.04
Level 2 variance (HEI/department) 0.57 0.10
Level 1 variance (Application) 2.47 0.08
VPC 18.8%
One hundred lowest and highest ranked residuals for multiple membership model using all departments, with 95% confidence intervals.
MM model with selected principal disciplines (>100 applications)
Parameter Estimate Standard error
Intercept (Econ) 7.17 0.11
Management -0.69 0.17
Social Policy -0.54 0.20
Education -0.24 0.11
Sociology 0.06 0.15
Human Geog 0.10 0.18
Psychology 0.16 0.13
Level 2 variance (HEI/department) 0.45 0.11
Level 1 variance (Application) 2.37 0.09
VPC 16.0%
Using the results
• Given the uncertainty, how useful are they?
• Can they be combined (formally) with citations to provide greater precision?
• The technical limitations of the analyses are likely to apply to citation analyses also
  – E.g. analysis of the NAS 2001 database shows 2,600 papers with 13,000 unique authors (Borner et al., 2004)
• What are the side effects – perverse incentives?
Perverse incentives
• All high-stakes performance monitoring systems encourage 'gaming' – some possibilities:
  – Large numbers of co-applicants squeezed into applications
  – Discouragement of cross-disciplinary applications
  – HEI behaviour would change over time, with a destabilising and distorting effect
  – Encouragement of many small and short-term grants rather than fewer large and long-term ones
  – Distortion of the behaviour of referees and board members (How?)
Comparisons with RAE 2008 scores
• Results for Economics and Education:
• Simple (4,3,2,1,0) RAE scoring system
• Insensitive to other scorings
• Dept. results (residuals) from the ESRC analysis (weighted) averaged to RAE HEI categories (a scoring sketch follows below).
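A minimal sketch of the comparison, with hypothetical quality profiles and residuals: RAE 2008 profiles are scored with the simple (4,3,2,1,0) weights and then correlated with the department residuals from the MM model:

```python
# Score RAE 2008 quality profiles (shares at 4*, 3*, 2*, 1*, U) with
# the simple (4,3,2,1,0) weights, then correlate with ESRC residuals.
# All data below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_heis = 27                                        # Economics comparison size
profiles = rng.dirichlet(np.ones(5), size=n_heis)  # hypothetical profiles
rae_scores = profiles @ np.array([4, 3, 2, 1, 0])

esrc_residuals = rng.normal(0, 0.5, size=n_heis)   # hypothetical MM residuals
r, p = pearsonr(rae_scores, esrc_residuals)
print(f"r = {r:.2f}, P = {p:.2f}")
```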
Correlations between RAE and ESRC scores – selected disciplines
Discipline Correlation
Sociology 0.25
Economics 0.50
Education 0.30
Psychology 0.19
Management 0.07
Economics
• 27 HEIs. Correlation = 0.50 (P<0.01)
• The highest 7 RAE scores are (from the top): LSE, UCL, Warwick, Oxford, Essex, Nottingham, Bristol
Economics RAE ranks
Education
• 37 HEIs. Correlation = 0.30 (P=0.07)
• The top 7 are: IOE=Oxford, Cambridge=Kings, Bristol=Leeds, Exeter
Education
What next?
• Incorporation of other research councils in a combined analysis
• Include citation data in a combined model:
  – In the REF, it can be argued that an analysis at least as complex as the present one is unavoidable for validity
  – Using citations encounters the same identifiability issue: more authors than papers/books.