John MacColl
European Director, RLG Partnership

9 June 2010

The Role of Libraries in Data Curation
Or ‘How do we even get started?’

The role of libraries in data curation. RLG Partnership Annual Meeting, Chicago, June 2010

What I want to talk about

• The importance of data
• Institutional vs domain solutions
• Skills needs
• Our project
• Reward structures

The importance of data

It’s the data, stupid

• ‘astronomers are just as likely to point a software query tool at a digital sky survey as to point a telescope at the stars’ (The Economist, Feb 2010)

• “It's like the invention of the telescope,” Franco Moretti, a Stanford professor of English and comparative literature, says of Google Books. “All of a sudden, an enormous amount of matter becomes visible.” (The Chronicle, ‘The humanities go Google’, May 28 2010)

DataVerse (Gary King, 2007)

“Data sometimes exist on individual researchers’ Web sites, without professional backups, off-site replication, plans for format conversion and migration, or professional cataloging.”

Pious hopes (Carole Palmer)

• 60% ‘archive’ generated or collected data (no offsite backup)

• 61% expect to keep more than 10 years

Data lost, and data never born (U Wisconsin Summary Report of the Research Data Management Study Group (2009))

‘In some cases, inadequate storage capacity is leading to loss of data: forcing some researchers to discard data from past experiments in order to make room for current ones or to avoid certain types of experiments and research altogether’

Data and their uses

Primary data: sensory, numeric, digitised, geospatial, etc

Secondary artifacts: statistical and pattern analyses; subset extractions; visualisations; simulations; discovery environments

Ancillary data: questionnaires, fieldnotes, lab notebooks, data dictionaries, annotations, lecture notes, etc

[Diagram labels: freely available; locked away; embargoed; shared with collaborators; transformations]

Don’t try this at home?

Institutional vs domain solutions

Blue Ribbon Task Force on Sustainable Digital Preservation and Access: on aggregation

‘Creating economies of scale among archives when possible is always desirable, and may be critical when the materials under stewardship require particular kinds of expertise that are scarce. This is the case for much scientific data.’

Qualified gravitational pull (Green and Gutmann)

‘Most institutional repositories do not and cannot offer support for managing dataset formats over time … Policies for long-term stewardship vary among institutions, but many have developed a sliding scale of preservation promises’

Oxford University: Research data management services: findings of the consultation with service providers (September 2008)

Cornell DataStaR: a ‘staging repository’

Datasets in Cornell IR

Monash approach (institutional) (Treloar)

U Wisconsin proposal

‘Solutions comprised solely of expensive technology will fail, because of the underlying need to establish long-lasting cultural stability within and between the research, library, and IT communities on campus.’

Curation responsibilities (Carlson, The Chronicle, 2006)

“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”

[Diagram labels: big science data, small science data, institution?, domain?]

Experiments … failures …

• NSF DataNet – Data Conservancy project. $20m awarded. Led by JHU. Includes social sciences.

• U. Va. Mellon grant $870k. Programmers and archivists. Includes Stanford, Yale and Hull. To create a model for digital collection management ‘that can be easily shared among research libraries’.

• UKRDS

Meanwhile …

Specialist data archives

Skills needs

Is this possible (Gabridge)?

‘libraries can develop existing liaisons with interest, passion, and strong analytical skills; or they can recruit domain experts, and teach them about excellent information science practices.’

ARL study: Scott Brandt

Our project

Joint OCLC Research-LIBER

• Binghamton
• Brigham Young
• Cambridge
• Leeds
• Melbourne
• Nijmegen
• Oxford

Deliverables

• Desk research
• Case studies
• Interviews with researchers
• Report and recommendations

Project Aim

‘It has been frequently asserted in the literature on data curation that there are new service roles for research libraries emerging. This project will seek to test this hypothesis by considering the data curation requirements of a number of recently completed research projects in a sample group of North American and European universities …’

Method

‘Each university partner will produce two or three case studies of projects in which data has been generated, and consider the data curation implications of these … The project will conclude with an assessment of the potential role of the research library in general in relation to such datasets, based on the examples of good practice discovered via the case studies.’

Project Approach

‘The proposed project will adopt a “bottom-up” approach and be grounded in the realities of data storage and preservation behaviour as exemplified in a number of real instances’

Scale again …

‘We consider that the question of how to arrive at an articulation between the institutional library and domain or funder data archives is one of the most urgent requirements in this area, and the project will explore it carefully.’

Environments: data

Timescapes (Leeds)

Nyman/Jones Archive (Leeds)

The Australian Women’s Register (Melbourne)

Life Patterns (Melbourne)

Incremental Project (Cambridge)

What do we expect?

• Not a great deal!
• Need to adjust our timescales?
• Signs of progress?
• Indications of favourable organisational frameworks?
• Indications of favourable policies?
• A taking of stock …

Reward structures

Day’s understatement

Being excited about being cited (DataVerse, King)

‘Articles with accessible data are cited twice as often as otherwise equivalent articles that do not provide data access.’

‘Articles in journals with replication policies that make data available are cited thrice as frequently as otherwise equivalent articles without accessible data’

Library neutrality (Steinhart, 2007)?

‘There is ample evidence that even when appropriate data repositories exist for a particular discipline, researchers often fail to take full advantage of them … This lack of participation in data sharing and archival activities suggests an opportunity for academic libraries to provide a much-needed service’

Thinning the library

• No longer just about capture of outputs at the endpoint
• The library has to be involved in the whole process of research and scholarship, throughout its lifecycle
• This involves ‘thinning out’ the library
• Rethinking the point of engagement
• The library becomes engineering …
• … and people

Ten Questions to Begin a Conversation With Your Faculty About Data Curation (Witt & Carlson)

1. What is the story of your data?
2. What form and format are the data in?
3. What is the expected lifespan of your data?
4. How could your data be used, reused, and repurposed?
5. How large is your dataset, and what is its rate of growth?
6. Who are potential audiences for your data?
7. Who owns the data?
8. Does the dataset include any sensitive information?
9. What publications or discoveries have resulted from the data?
10. How should the data be made accessible?

Repositories at present are the wrong model (Green and Gutmann)

‘repositories position themselves at or near the end of the scientific research life cycle. Their goal is less to partner with researchers or with domain-specific repositories throughout the research life cycle than … to garner the value of the institution’s productivity’

Appraisal (Cornell)

‘The archivist can no longer wait “passively at the end of the life cycle for records to arrive at the archives when their creators no longer wanted them – or were dead” (Cook 2000).’

Discussion!

John MacColl

Next up

Lunch and then… 1:00

Framing Libraries and the Environment
Lorcan Dempsey, OCLC Research

Buckingham