Data matters-bournemouth-2015

Post on 07-Jan-2017


Data Matters

Alan Dix

Talis & University of Birmingham

http://alandix.com/ref2014/

University of Birmingham

Tiree

Tiree Tech Wave

22-26 October 2015

today I am not talking about …

• intelligent internet interfaces
• visualisation and sampling
• situated displays, eCampus, small device – large display interactions
• fun and games, virtual crackers, artistic performance, slow time
• creativity and Bad Ideas
• modelling dreams and regret and the emergence of self

… or even lots of lights

http://www.hcibook.com/alan/projects/firefly/

I am talking about ...

REF data analysis

long tail of small data

REF

REF 2014

Research Excellence Framework

approx 5 yearly research assessment in the UK

not just about the UK … lots of countries are thinking of doing something similar ... and looking to REF as an example

REF elements

three elements:

outputs (mainly papers)

impact

environment

focus of this work

REF panels

4 main panels, 36 sub-panels, ~200K outputs

sub-panel 11: computer science and informatics

I was on this panel but NO confidential data here

everything public domain

REF profiles

every output graded: 4* / 3* / 2* / 1*

individual grades confidential and destroyed

each ‘Unit of Assessment’ (dept) given a profile

http://results.ref.ac.uk/Results/ByUoa/11/Outputs

sub-area profiles

N.B. computing only

each output given ACM code

originally to enable allocation to panelists

… but, also used to create sub-area profiles …

sub-area profiles

From Morris Sloman’s slides & panel report

theoretical areas: 30-40% 4*

applied/human areas: 10-20% 4*

data not information

sub-panel report warning: "These data should be treated with circumspection …"

however already affecting institutional policy: hiring, internal investment

… and may influence research council policy

possible reasons for variation …

1. best applied work is weak – including HCI :-/

2. long tail – weak researchers choose applied areas

3. latent bias – despite panel’s efforts to be fair

can bibliometrics disentangle these?

metrics and assessment

citation metrics known to be good post-hoc correlates of sophisticated measures

… but not for individuals and small cohorts, and there is a danger of gaming and policy distortion

suitable for verifying large-scale patterns (and HEFCE is using them for this)

data used for analysis

all in public domain

(virtually) complete list of outputs:
– excluding a few confidential ones
– for each: name, DOI, ACM topic area, Scopus citations

Google Scholar citations for each
– gathered after REF (not used in assessment)

UoA and sub-area profiles

metrics used

Scopus (late 2013 census)
– with/without 2012/13 as few citations

‘Normalised Scopus’
– using ‘contextual data’, corrects for different citation patterns between areas
– places output in top 1%, 5%, 10% of its area worldwide

Google Scholar (late 2014 census)
– with/without 2012/13; zero treated as zero/missing

seven variants – all give similar results
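A minimal sketch of the kind of measure plotted on the following slides: the share of each sub-area's outputs that fall in the top citation quartile. The `(area, citations)` layout, the function name and the choice to take the quartile cut-off over all outputs together are assumptions for illustration, not details given in the talk. The `zero_as_missing` switch mirrors the "zero treated as zero/missing" variants.

```python
from collections import defaultdict

def top_quartile_share(outputs, zero_as_missing=False):
    """Return {area: % of that area's outputs in the top citation quartile}.

    outputs: list of (area, citations) pairs (hypothetical layout).
    The quartile cut-off is taken over all outputs together (an assumption).
    """
    if zero_as_missing:
        # Treat a zero count as "no data" and drop the output entirely.
        outputs = [(area, c) for area, c in outputs if c > 0]
    counts = sorted(c for _, c in outputs)
    if not counts:
        return {}
    # Smallest citation count that still lies inside the top 25%.
    cutoff = counts[len(counts) - max(1, len(counts) // 4)]
    total = defaultdict(int)
    top = defaultdict(int)
    for area, c in outputs:
        total[area] += 1
        if c >= cutoff:
            top[area] += 1
    return {area: 100.0 * top[area] / total[area] for area in total}
```

Ties at the cut-off are all counted as top quartile, so slightly more than 25% of outputs can qualify.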

results … massive differences

[chart: for each sub-area, % citations in top quartile, % REF 4*, and their ratio; sub-areas marked as winners and losers]

‘scatter’ graph

[chart: % outputs in top quartile for citations against % outputs awarded REF 4*]

rank scores

[chart: sub-areas ranked by citations and by REF 4*, showing winners and losers]

diagram thanks to Andrew Howes
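One way to read the rank-score comparison, as a rough sketch: rank the sub-areas once by citation measure and once by REF 4* share, then look at how far each moves. The function names and input dictionaries are hypothetical, and the talk does not say how ties were handled; here they simply keep input order.

```python
def rank_positions(scores):
    """Map each key to its rank, where 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: pos for pos, key in enumerate(ordered, start=1)}

def rank_shift(citation_pct, ref_pct):
    """Rank by citations minus rank by REF 4*, per area.

    Positive: the area ranks higher on REF than on citations ('winner').
    Negative: it ranks lower on REF than on citations ('loser').
    """
    by_cites = rank_positions(citation_pct)
    by_ref = rank_positions(ref_pct)
    return {area: by_cites[area] - by_ref[area] for area in citation_pct}
```

Both dictionaries are expected to cover the same set of areas.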

Another way of looking at it …

world ranking within own field

recall REF …

for example, HCI research (web similar) …

on average …

• HCI/CSCW paper needs to be in top 0.5% worldwide to get 4*

• logic/algorithms paper just needs to be in top 5%

a 10-fold difference

and just as you thought it was all over …

… institutional effects

look at +/- 25% REF compared with citations

N.B. use high-end weighted measure as money is focused (4:1:0:0)

of 35 losers, 25 are post-1992 universities

of 17 winners, 16 are pre-1992 universities
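The 4:1:0:0 weighting and the +/- 25% band can be sketched as follows. The weights come from the slide; everything else is an assumption for illustration: the profile layout, the function names, and the idea that the comparison is a simple ratio of the REF score to the score that citations would predict.

```python
# Funding-style weights for 4*, 3*, 2*, 1* (the 4:1:0:0 from the slide).
WEIGHTS = {"4*": 4, "3*": 1, "2*": 0, "1*": 0}

def weighted_score(profile):
    """profile: {grade: percentage of outputs}. Grades without a weight count as 0."""
    return sum(WEIGHTS.get(grade, 0) * pct for grade, pct in profile.items())

def classify(ref_score, citation_score, band=0.25):
    """Label an institution by how its REF score compares with its citation-based score.

    The ratio test is an assumed reading of '+/- 25% REF compared with citations'.
    """
    ratio = ref_score / citation_score
    if ratio > 1 + band:
        return "winner"
    if ratio < 1 - band:
        return "loser"
    return "within band"
```

A profile of 20% 4* and 50% 3* scores 4 × 20 + 1 × 50 = 130; 2* and 1* outputs add nothing, which is why the measure is called high-end.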

an example …

XXXXXXX – a new university

YYYYYYYY – an old university

World Rankings

REF

and Gender?

Female authors in main panel B were significantly less likely to achieve a 4* output than male authors with the same metrics ratings. When considered in the UOA models, women were significantly less likely to have 4* outputs than men whilst controlling for metric scores in the following UOAs: Psychology, Psychiatry and Neuroscience; Computer Science and Informatics; Architecture, Built Environment and Planning; Economics and Econometrics.

The Metric Tide (HEFCE, 2015)

implicit bias?

HEFCE analysis: male staff in computing are 1/3 more likely to get a 4* than female staff

areas and types of institution disadvantaged by REF are often those with more women

… implications for future recruitment?

future for research assessment?

• pure metrics?

• metrics as part (e.g. older outputs)

• metrics as under-girding (burden of proof)

• human process – metrics for in-process feedback


long tail of small data

Big Data

everyone is talking about it

Twitter, Google, Facebook, NSA, universities, … and funding

Big Data does it with MapReduce

Semantic Data does it with RDF

the long tail

size of data set

a few very large data sets

e.g. Twitter, streams, Open Govt., OS, geonames, dbpedia

the small data of ordinary life:

from local bus timetables to squash club league tables

stories of small data …

Walking Wales

Learning analytics

Open Data Islands and Communities

Musicology

Alan Walks Wales

1058 miles (1700 km)

3 million footfalls

3 ½ months

April-July 2013

focus on IT at the margins

one thousand miles of poetry, technology and community

vision

personal: encircling, encompassing, pilgrimage, homecoming

practical: IT for the walker & IT for local communities

philosophical: reflections on walking and space, locality and identity

research: personal agenda and living lab

lots of data

data

location: GPX ... batteries ... sporadic signals ....

bio-sensing: ECG (heart), EDA (skin) and accelerometers

audio and images: in the moment

text: after the event

implicit

explicit

The largest ECG trace in the public domain

challenges (1)

location: GPX – merging and mending

bio-sensing: ECG & EDA – special formats & volume

audio and images: volume, transcription and annotation

text: semantic markup, synchronising sources
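As an illustration of what "merging and mending" GPX can involve, here is a minimal sketch using only the Python standard library. It is not the tooling used for the walk. It assumes GPX 1.1 input with whole-second UTC timestamps ending in `Z`; the function names are hypothetical. Points with no timestamp are dropped, and when two files contain a fix for the same instant the first one read is kept.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace of the GPX 1.1 schema.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def read_trackpoints(gpx_text):
    """Yield (time, lat, lon) for every timestamped <trkpt> in a GPX 1.1 string."""
    root = ET.fromstring(gpx_text)
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        stamp = pt.find("gpx:time", GPX_NS)
        if stamp is None or not stamp.text:
            continue  # mending: skip points that cannot be placed in time
        when = datetime.strptime(stamp.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
        yield when, float(pt.get("lat")), float(pt.get("lon"))

def merge_tracks(gpx_texts):
    """Merge several GPX documents into one time-ordered list of fixes."""
    seen = {}
    for text in gpx_texts:
        for when, lat, lon in read_trackpoints(text):
            # First fix wins when two devices logged the same instant.
            seen.setdefault(when, (when, lat, lon))
    return [seen[t] for t in sorted(seen)]
```

Real logs would also need handling for fractional seconds, clock drift between devices and gaps from flat batteries, none of which this sketch attempts.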

challenges (2)

documentation: methodology of creation, data formats – for other people to use!

meta-data: for machines to use

PR: telling the world about it!

academic culture: we do not value data!

an offer

multiple synchronisable data streams

largest public domain ECG trace

post-hoc analysis

simulate real use
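A small sketch of what synchronising streams can mean in practice: for each timestamp in one stream (say a GPS fix), pick the nearest sample from another (say a bio-sensor reading). The function names are hypothetical and the streams are assumed to be non-empty, already sorted by time and on a shared clock; this is not taken from the published data set's own tooling.

```python
from bisect import bisect_left

def nearest_sample(times, values, t):
    """Return the value whose timestamp is closest to t.

    times must be non-empty and sorted ascending; values is parallel to times.
    """
    i = bisect_left(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    before, after = times[i - 1], times[i]
    # On an exact tie the earlier sample is kept.
    return values[i] if after - t < t - before else values[i - 1]

def align(reference_times, times, values):
    """Resample one stream onto the timestamps of another."""
    return [nearest_sample(times, values, t) for t in reference_times]
```

Binary search keeps each lookup logarithmic in the stream length, which matters when one stream is a months-long sensor trace.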

please use it!

Learning analytics

macro-analytics: university strategy, MOOCs

micro-analytics: individual course, student, resource

time frames for learning analytics

days and hours: email, during lectures and labs, student meetings, gaps

week: preparing for teaching, exercises

months/mid-semester: reporting points, staff meetings, cohort/student progress

end of semester/term/year: exams, exam boards, course review

start of semester/term/year: preparing for new courses or re-runs, rollover!

years: new courses, professional development, appraisal, promotion

Open Data

everyone is doing it

Governments, Cities, local gov.

In C21 Data is Power

why not an island?

island data flows

Community

groups and individuals

rest of the world

other communities

[diagram: data flows numbered 1 to 4]

island data flows: from community to world

Community

groups and individuals

rest of the world

flow 1:
• visibility and control
• identity and empowerment
• level of detail
• local knowledge

island data flows: from world to community

Community

groups and individuals

rest of the world

flow 2:
• making the most of open data
• local decision making
• lobbying and negotiation

island data flows: within the community

Community

groups and individuals

flow 3:
• gossip is not enough!
• sparse, dispersed population
• social cohesion and economic benefits

island data flows: between communities

Community

groups and individuals

other communities

flow 4:
• sharing best practice
• brand presence
• interlinked data

benefits to …

the community
• empowerment and control
• availability of information
• communication within and between communities

the world
• improved quality of data
• level of detail of data
• local knowledge and understanding

In Concert

Concert ephemera
• 1750–1800 Calendar of London Concerts
• 1815–1895 Concert Life in London
• 1894–1944 Concert Programme Exchange (BL)

External sources
• MusicBrainz
• MBz id as connection into Linked Data, BBC, etc.

Authoritative sources (future)
• e.g. British Library BNB, Concert Programmes metadata

concert database

classic digital humanities?

original sources

selected sources

systematic sample

transcription & extraction (medium expertise)

interpretation (high expertise)

digitised sources

authoritative data

analysis & use (high expertise)

academic publication

large digital archive (e.g. BBC)

possibly create linkage

Barriers to progress

• effort and expertise
• authority and quality
• digital acontextuality
• openness

Openness and Reward

• Career development
• Leverhulme & REF
• Building the discipline?

Re-envisioning the Digital Archive: Curation and Use

big bang to incremental

digitised sources

authoritative data

academic publication

...

big bang to incremental

problem-focused augmentation

transform cost-benefit

digital archive

academic publications

...

partial enhancement & interpretation

scenario-focused investigations

=> reflection and requirements

digital symbiosis

suggestion and confirmation

provenance and authority

spreadsheet as user interface

semantics through interaction

themes and take-aways ...

data in context

heterogeneity and linking

value and values

ethics and empowerment

… and please use my data