Data matters-bournemouth-2015

Data Matters Alan Dix Talis & University of Birmingham http://alandix.com/ref2014/

Transcript of Data matters-bournemouth-2015

Page 1: Data matters-bournemouth-2015

Data Matters

Alan Dix
Talis & University of Birmingham

http://alandix.com/ref2014/

Page 2: Data matters-bournemouth-2015

University of Birmingham

Tiree

Tiree Tech Wave, 22-26 October 2015

Page 3: Data matters-bournemouth-2015

today I am not talking about …

• intelligent internet interfaces
• visualisation and sampling
• situated displays, eCampus, small device – large display interactions
• fun and games, virtual crackers, artistic performance, slow time
• creativity and Bad Ideas
• modelling dreams and regret and the emergence of self

Page 4: Data matters-bournemouth-2015

… or even lots of lights

http://www.hcibook.com/alan/projects/firefly/

Page 5: Data matters-bournemouth-2015

I am talking about ...

REF data analysis

long tail of small data

Page 6: Data matters-bournemouth-2015

REF

Page 7: Data matters-bournemouth-2015

REF 2014
Research Excellence Framework

approx 5 yearly research assessment in the UK

not just about the UK … many countries are thinking of doing something similar ... and looking to REF as an example

Page 8: Data matters-bournemouth-2015

REF elements

three elements:

outputs (mainly papers)

impact

environment

focus of this work

Page 9: Data matters-bournemouth-2015

REF panels

4 main panels, 36 sub-panels, ~200K outputs

sub-panel 11: computer science and informatics

I was on this panel, but NO confidential data here: everything is in the public domain

Page 10: Data matters-bournemouth-2015

REF profiles

every output graded: 4* / 3* / 2* / 1*

individual grades confidential and destroyed

each ‘Unit of Assessment’ (dept) given a profile

http://results.ref.ac.uk/Results/ByUoa/11/Outputs

Page 11: Data matters-bournemouth-2015

sub-area profiles

N.B. computing only

each output given ACM code

originally to enable allocation to panelists

… but, also used to create sub-area profiles …

Page 12: Data matters-bournemouth-2015

sub-area profiles

From Morris Sloman’s slides & panel report

theoretical areas: 30-40% 4*

applied/human areas: 10-20% 4*

Page 13: Data matters-bournemouth-2015

data not information

sub-panel report warning: "These data should be treated with circumspection …"

however, already affecting institutional policy: hiring, internal investment

… and may influence research council policy

Page 14: Data matters-bournemouth-2015

possible reasons for variation …

1. best applied work is weak – including HCI :-/

2. long tail – weak researchers choose applied areas

3. latent bias – despite panel’s efforts to be fair

can bibliometrics disentangle these?

Page 15: Data matters-bournemouth-2015

metrics and assessment

citation metrics known to be good post-hoc correlates of sophisticated measures

… but not for individuals and small cohorts, and there is a danger of gaming and policy distortion

suitable for verifying large-scale patterns (and HEFCE is using them for this)

Page 16: Data matters-bournemouth-2015

data used for analysis: all in public domain

(virtually) complete list of outputs:
– excluding a few confidential ones
– for each: name, doi, ACM topic area, Scopus citations

Google Scholar citations for each
– gathered after REF (not used in assessment)

UoA and sub-area profiles

Page 17: Data matters-bournemouth-2015

metrics used

Scopus (late 2013 census)
– with/without 2012/13 as few citations

‘Normalised Scopus’
– using ‘contextual data’, corrects for different citation patterns between areas
– places output in top 1%, 5%, 10% of its area worldwide

Google Scholar (late 2014 census)
– with/without 2012/13; zero treated as zero/missing

seven variants – all give similar results
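As a rough illustration of what a normalised measure of this kind does, the sketch below places an output in a percentile band using per-area citation thresholds. The threshold numbers and the function name `percentile_band` are invented for illustration; they are not the Scopus contextual data used in the analysis.

```python
# Hypothetical thresholds for ONE subject area: the minimum citation
# count needed to reach each worldwide percentile band.
# These numbers are invented for illustration only.
AREA_THRESHOLDS = {
    "top 1%": 120,
    "top 5%": 45,
    "top 10%": 25,
}


def percentile_band(citations, thresholds):
    """Return the most selective band the output reaches, or None."""
    # Check bands from the most demanding threshold to the least.
    for band, minimum in sorted(thresholds.items(), key=lambda kv: -kv[1]):
        if citations >= minimum:
            return band
    return None
```

Because each area has its own thresholds, the same raw citation count can land in different bands in different areas, which is the point of the correction.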

Page 18: Data matters-bournemouth-2015

results … massive differences

[Chart by sub-area: % citations in top quartile, % REF 4*, and their ratio; ‘winners’ and ‘losers’ marked]

Page 19: Data matters-bournemouth-2015

‘scatter’ graph

[Axes: % outputs in top quartile for citations; % outputs awarded REF 4*]

Page 20: Data matters-bournemouth-2015

rank scores

winners

losers

diagram thanks to Andrew Howes

Page 21: Data matters-bournemouth-2015

Another way of looking at it …

world ranking within own field

Page 22: Data matters-bournemouth-2015

recall REF …

Page 23: Data matters-bournemouth-2015

for example, HCI research (web similar) …

on average …

• HCI/CSCW paper needs to be in top 0.5% worldwide to get 4*

• logic/algorithms paper just needs to be in top 5%

10-fold difference

Page 24: Data matters-bournemouth-2015

and just as you thought it was all over …

… institutional effects

look at +/- 25% REF compared with citations

N.B. use high-end weighted measure as money is focused (4:1:0:0)

of 35 losers, 25 are post-1992 universities

of 17 winners, 16 are pre-1992 universities
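The slide does not spell out how the comparison was computed, so the following is only a sketch under stated assumptions: a profile is weighted 4:1:0:0, and a unit counts as a winner or loser when its REF score is more than 25% above or below a score predicted from citations. The citation-predicted profile and the function names are hypothetical.

```python
# Weights for the 4*, 3*, 2*, 1* shares of a profile, following the
# 4:1:0:0 weighting mentioned on the slide.
WEIGHTS = (4, 1, 0, 0)


def weighted_score(profile):
    """profile: percentages of outputs at (4*, 3*, 2*, 1*)."""
    return sum(w * p for w, p in zip(WEIGHTS, profile))


def classify(ref_profile, citation_profile, margin=0.25):
    """Label a unit 'winner', 'loser' or 'neutral'.

    citation_profile is a hypothetical profile predicted from
    citations; how it would be derived is an assumption here.
    """
    ref = weighted_score(ref_profile)
    predicted = weighted_score(citation_profile)
    if predicted == 0:
        return "neutral"  # nothing to compare against
    ratio = ref / predicted
    if ratio > 1 + margin:
        return "winner"
    if ratio < 1 - margin:
        return "loser"
    return "neutral"
```

With this weighting only the 4* and 3* shares contribute, so two units with very different 2*/1* splits can still score the same.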

Page 25: Data matters-bournemouth-2015

an example …

XXXXXXX – a new university

YYYYYYYY – an old university

World Rankings

REF

Page 26: Data matters-bournemouth-2015

and Gender?

Female authors in main panel B were significantly less likely to achieve a 4* output than male authors with the same metrics ratings. When considered in the UOA models, women were significantly less likely to have 4* outputs than men whilst controlling for metric scores in the following UOAs: Psychology, Psychiatry and Neuroscience; Computer Science and Informatics; Architecture, Built Environment and Planning; Economics and Econometrics.

The Metric Tide (HEFCE, 2015)

Page 27: Data matters-bournemouth-2015

implicit bias?

HEFCE analysis: male staff in computing are 1/3 more likely to get a 4* than female staff

areas and types of institution disadvantaged by REF are often those with more women

… implications for future recruitment?

Page 28: Data matters-bournemouth-2015

future for research assessment?

• pure metrics?

• metrics as part (e.g. older outputs)

• metrics as under-girding (burden of proof)

• human process – metrics for in-process feedback

Page 29: Data matters-bournemouth-2015
Page 30: Data matters-bournemouth-2015


long tail of small data

Page 31: Data matters-bournemouth-2015

Big Data

everyone is talking about it

Twitter, Google, Facebook, NSA, universities, … and funding

Big Data does it with MapReduce

Semantic Data does it with RDF

Page 32: Data matters-bournemouth-2015

the long tail

size of data set

a few very large data sets, e.g. Twitter, streams, Open Govt., OS, geonames, dbpedia

the small data of ordinary life: from local bus timetables to squash club league tables

Page 33: Data matters-bournemouth-2015

stories of small data …

Walking Wales

Learning analytics

Open Data Islands and Communities

Musicology

Page 34: Data matters-bournemouth-2015
Page 35: Data matters-bournemouth-2015

Alan Walks Wales

1058 miles (1700 km)
3 million footfalls
3½ months
April-July 2013

focus on IT at the margins

one thousand miles of poetry, technology and community

Page 36: Data matters-bournemouth-2015

vision

personal: encircling, encompassing, pilgrimage, homecoming

practical: IT for the walker & IT for local communities

philosophical: reflections on walking and space, locality and identity

research: personal agenda and living lab

lots of data

Page 37: Data matters-bournemouth-2015

data

location: GPX ... batteries ... sporadic signals ....

bio-sensing: ECG (heart), EDA (skin) and accelerometers

audio and images: in the moment

text: after the event

implicit

explicit

The largest ECG trace in the public domain

Page 38: Data matters-bournemouth-2015

challenges (1)

location: GPX – merging and mending

bio-sensing: ECG & EDA – special formats & volume

audio and images: volume, transcription and annotation

text: semantic markup, synchronising sources
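To give a flavour of what "merging and mending" location data can involve, here is a minimal sketch that merges several track fragments into one time-ordered list and flags long gaps where the signal was lost. It assumes points have already been read out of the GPX files as `(time, lat, lon)` tuples; the function names and the five-minute gap threshold are illustrative choices, not the processing actually used for the walk data.

```python
from datetime import datetime, timedelta


def merge_tracks(*tracks):
    """Merge lists of (time, lat, lon) points into one time-ordered list.

    Points with an identical timestamp are treated as duplicates and
    only the first one seen is kept.
    """
    merged = {}
    for track in tracks:
        for time, lat, lon in track:
            merged.setdefault(time, (time, lat, lon))
    return [merged[t] for t in sorted(merged)]


def find_gaps(points, max_gap=timedelta(minutes=5)):
    """Return (start, end) time pairs where consecutive points are
    further apart than max_gap, e.g. after a flat battery."""
    gaps = []
    for (t0, _, _), (t1, _, _) in zip(points, points[1:]):
        if t1 - t0 > max_gap:
            gaps.append((t0, t1))
    return gaps
```

Mending would then mean deciding what to do with each flagged gap, for example interpolating or filling it from another device's log.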

Page 39: Data matters-bournemouth-2015

challenges (2)

documentation: methodology of creation, data formats; for other people to use!

meta-data: for machines to use

PR: telling the world about it!

academic culture: we do not value data!

Page 40: Data matters-bournemouth-2015

an offer

multiple synchronisable data streams
– largest public domain ECG trace

post-hoc analysis
– simulate real use
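One simple way to work with synchronisable streams is to resample one stream onto the timestamps of another by nearest neighbour. The sketch below assumes each stream is a sorted list of timestamps plus a parallel list of values; the function names are hypothetical and nothing here reflects the actual file formats of the walk data.

```python
from bisect import bisect_left


def nearest_sample(times, values, t):
    """Return the value whose timestamp is closest to t.

    times must be sorted ascending and the same length as values.
    """
    if not times:
        raise ValueError("empty stream")
    i = bisect_left(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    before, after = times[i - 1], times[i]
    # Prefer the earlier sample on an exact tie.
    return values[i - 1] if t - before <= after - t else values[i]


def align(reference_times, times, values):
    """Resample one stream onto the timestamps of another."""
    return [nearest_sample(times, values, t) for t in reference_times]
```

For example, `align(gps_times, ecg_times, ecg_values)` would attach the closest heart reading to each location fix, which is enough for a first post-hoc look before anything more careful such as interpolation.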

please use it!

Page 41: Data matters-bournemouth-2015
Page 42: Data matters-bournemouth-2015

Learning analytics

macro-analytics: university strategy, MOOCs

micro-analytics: individual course, student, resource

Page 43: Data matters-bournemouth-2015

time frames for learning analytics

days and hours: email, during lectures and labs, student meetings, gaps

week: preparing for teaching, exercises

months/mid-semester: reporting points, staff meetings, cohort/student progress

end of semester/term/year: exams, exam boards, course review

start of semester/term/year: preparing for new courses or re-runs, rollover!

years: new courses, professional development, appraisal, promotion

Page 44: Data matters-bournemouth-2015
Page 45: Data matters-bournemouth-2015

Open Data

everyone is doing it

Governments, Cities, local gov.

In C21 Data is Power

Page 46: Data matters-bournemouth-2015

why not an island?

Page 47: Data matters-bournemouth-2015

island data flows

[Diagram: the community (groups and individuals), the rest of the world and other communities, linked by four numbered data flows (1-4)]

Page 48: Data matters-bournemouth-2015

island data flows: from community to world

[Diagram: flow 1, between the community (groups and individuals) and the rest of the world]

• visibility and control
• identity and empowerment
• level of detail
• local knowledge

Page 49: Data matters-bournemouth-2015

island data flows: from world to community

[Diagram: flow 2, between the rest of the world and the community (groups and individuals)]

• making the most of open data
• local decision making
• lobbying and negotiation

Page 50: Data matters-bournemouth-2015

island data flows: within the community

[Diagram: flow 3, within the community (groups and individuals)]

• gossip is not enough!
• sparse, dispersed population
• social cohesion and economic benefits

Page 51: Data matters-bournemouth-2015

island data flows: between communities

[Diagram: flow 4, between the community (groups and individuals) and other communities]

• sharing best practice
• brand presence
• interlinked data

Page 52: Data matters-bournemouth-2015

benefits to …

the community
• empowerment and control
• availability of information
• communication within and between communities

the world
• improved quality of data
• level of detail of data
• local knowledge and understanding

Page 53: Data matters-bournemouth-2015
Page 54: Data matters-bournemouth-2015

In Concert

Concert ephemera
– 1750–1800 Calendar of London Concerts
– 1815–1895 Concert Life in London
– 1894–1944 Concert Programme Exchange (BL)

External sources
– MusicBrainz
– MBz id to connect into Linked Data, BBC, etc.

Authoritative sources (future)
– e.g. British Library BNB, Concert Programmes metadata

Page 55: Data matters-bournemouth-2015
Page 56: Data matters-bournemouth-2015
Page 57: Data matters-bournemouth-2015

concert database

classic digital humanities?

[Diagram. Stages: original sources; selected sources (systematic sample); digitised sources; authoritative data; academic publication. Steps: transcription & extraction (medium expertise); interpretation (high expertise); analysis & use (high expertise). Also: large digital archive (e.g. BBC), possibly create linkage.]

Page 58: Data matters-bournemouth-2015

Barriers to progress

effort and expertise
authority and quality
digital acontextuality
openness

Page 59: Data matters-bournemouth-2015

Openness and Reward

Career development
Leverhulme & REF
Building the discipline?

Page 60: Data matters-bournemouth-2015

Re-envisioning the Digital Archive: Curation and Use

Page 61: Data matters-bournemouth-2015

big bang to incremental

digitised sources

authoritative data

academic publication

...

Page 62: Data matters-bournemouth-2015

big bang to incremental

problem focused augmentation
transform cost-benefit

digital archive

academic publications

...

partial enhancement & interpretation

Page 63: Data matters-bournemouth-2015

scenario-focused investigations

Page 64: Data matters-bournemouth-2015

=> reflection and requirements

digital symbiosis: suggestion and confirmation

provenance and authority

spreadsheet as user interface

semantics through interaction

Page 65: Data matters-bournemouth-2015
Page 66: Data matters-bournemouth-2015

themes and take-aways ...

data in context

heterogeneity and linking

value and values

ethics and empowerment

… and please use my data

Page 67: Data matters-bournemouth-2015