Www.monash.edu.au Andrew Treloar, ARCHER Project Director Cathrine Harboe-Ree, University Librarian...

www.monash.edu.au Andrew Treloar, ARCHER Project Director Cathrine Harboe-Ree, University Librarian Alan McMeekin, Executive Director ITS Dancing with data down under CNI Winter 2007 Project Briefing

www.monash.edu.au

Andrew Treloar, ARCHER Project Director
Cathrine Harboe-Ree, University Librarian
Alan McMeekin, Executive Director ITS

Dancing with data down under

CNI Winter 2007 Project Briefing

O is for Overview

• Drivers for what we are presenting
• Research case study overview
• Challenges and solutions
• Australian national developments

D is for Drivers

D: Monash – a distinctive and internationalised university

• Established 1960

• Research intensive, doctoral granting

• 55,000 students from more than 100 countries

• 6.1% of student load is graduate

• 3,500 academic staff (6,800 total EFT staff)

• 10 faculties

• Campuses in Australia (six), Malaysia, South Africa, centre in Prato

• Partnerships – India, Hong Kong, Singapore, China

• Total research income $186 mill. (2006)

D: Information Management Strategy

• 2 year initiative to develop an overarching strategy for the whole university
• Took holistic view of information
• Informed by views of range of information management professionals and stakeholders
• Report available at: www.monash.edu.au/staff/information-management/
• Based on set of ten principles that have been extended into the research data domain

D: Monash data management environment

• High level support
– DVC (Research), Prof Edwina Cornish
– Establishment of E-Research Centre
• Need to manage growing deluge
– Leading E-researchers in some disciplines
– Synchrotron (1 TB per day)
– Shoah Archives (12 TB)
– And others
• Need to respond to Australian Code for the Responsible Conduct of Research
– www.nhmrc.gov.au/publications/synopses/r39syn.htm

D: Three inter-related national projects

[Figure: the entire e-research life cycle, encompassing experimentation, analysis, publication, research and learning. Labelled elements: Grid; E-Researchers; Graduate Students; E-Experimentation; Virtual Learning Environment; Digital Library; Institutional Archive; Local Web Publisher; Holdings; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies. The diagram is also labelled with the three projects: DART, ARROW and ARCHER.]

Source: Adapted from Liz Lyon, eBank UK Presentation

R is for Research case study

R: Structure determines function

Sequence: unfolded protein is chain of amino acids
• Highly mobile
• Inactive

Structure: folded protein
• Precise shape
• Stable
• Highly ordered
• Active

Function: depends on protein shape
• Specific associations
• Precise reactions

R: Flow of biological Information

R: How to solve a structure

• Diffraction intensities + phases → Fourier synthesis → electron density → 3D structure
• Phases are obtained by either:
– Experimental methods = back to lab
– Use known structures (molecular replacement)
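The role of the phases can be illustrated with a toy one-dimensional Fourier synthesis. This is only a sketch, not the software used in the case study: the eight-point "density", its values, and the function names are all made up for illustration. It shows that amplitudes combined with the correct phases recover the density, while the same amplitudes with the phases discarded do not.

```python
import cmath
import math


def structure_factors(density):
    """Forward transform: F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def fourier_synthesis(amplitudes, phases):
    """Inverse transform: rho(x) = (1/N) * sum_h |F(h)| * exp(i*phi(h)) * exp(-2*pi*i*h*x/N)."""
    n = len(amplitudes)
    return [
        sum(
            a * cmath.exp(1j * p) * cmath.exp(-2j * math.pi * h * x / n)
            for h, (a, p) in enumerate(zip(amplitudes, phases))
        ).real / n
        for x in range(n)
    ]


# Toy 1D "electron density" with two peaks (made-up values, not real data).
true_density = [0.0, 3.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

factors = structure_factors(true_density)
amplitudes = [abs(f) for f in factors]      # recoverable from measured intensities
phases = [cmath.phase(f) for f in factors]  # not measured directly in the experiment

# Synthesis with the correct phases reproduces the density ...
with_phases = fourier_synthesis(amplitudes, phases)
# ... while setting every phase to zero puts the peaks in the wrong places.
without_phases = fourier_synthesis(amplitudes, [0.0] * len(amplitudes))
```

Comparing `with_phases` and `without_phases` against `true_density` makes the point of the slide: the intensities alone are not enough, so the phases have to come from further experiments or from a known, related structure.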

R: Resulting publication in Science

R: Access Statistics: 23/8/2007 to 1/12/2007

• Views: 918 total
– 257 from library staff
– 152 from other Monash addresses
– 509 from non-Monash addresses
• Downloads: 498 total
– 87 from library staff
– 62 from other Monash addresses
– 349 from non-Monash addresses
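As a quick check on these figures, the breakdowns can be summed and turned into percentage shares. The numbers come from the slide; the variable and function names are illustrative only.

```python
views = {"library staff": 257, "other Monash addresses": 152, "non-Monash addresses": 509}
downloads = {"library staff": 87, "other Monash addresses": 62, "non-Monash addresses": 349}


def shares(counts):
    """Return each group's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


view_shares = shares(views)
download_shares = shares(downloads)

# Downloads as a percentage of views over the same period.
download_rate = round(100 * sum(downloads.values()) / sum(views.values()), 1)
```

The breakdowns add up to the stated totals (918 and 498), and non-Monash addresses account for roughly 55% of views and 70% of downloads.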

R: Why he cares about data

• Raw data are sacred
• Data validation for reviewers and by peers
• His data are now safe and secure
• Store of examples for those doing methods development
• Some data cannot be processed by him; why not let others have a go?

C is for Challenges and Solutions

• Laboratory data management practice
• Institutional data management planning
• Sustainable storage provision
• Data curation across data stores
• Data in institutional repositories

C: Laboratory data management practice

• Challenge
– Infrequent and deficient backup
– No commitment to long-term preservation
– Poor recording of metadata (descriptive/provenance)
• Solution
– Embed IM professionals with research teams
– Provide sustainable storage for backup
– Improve laboratory data capture systems

C: Institutional data management planning

• Challenge
– No systematic organisation-wide approach
– No way of engaging with researchers

S: Institutional forum to discuss issues

• Membership
– Library
– ITS
– Records and Archives
– Research Office
– e-Research Centre
• Outputs
– Policy and Plan (print trial, web production)
– Outreach activities

S: Data Management Plan – objectives

• Assists both researcher and institution
• Is completed at beginning of research project, updated as necessary
– May become mandatory in future
• Captures some technical, access and descriptive metadata at the beginning of research project
• Is not onerous
• Delivers visible benefits
• Assists in providing complete research data solutions

S: Data Management Plan – components

• Originators and owners of the data
• Description of project
• Metadata used (schema, standards)
• Types of data to be collected
• Volume of data (initial estimate)
• Retention requirements (guidelines provided)
• Format/s of and software used in creation and use of the data
• Access policies and provisions
• IP constraints
• Confidentiality requirements
• Storage, preservation and archiving of data
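One way to picture these components is as a single structured record per project. The sketch below is hypothetical: the class, field names, types and sample values are assumptions made for illustration and do not describe the actual Monash plan template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the plan components listed on the slide."""

    originators_and_owners: List[str]
    project_description: str
    metadata_standards: List[str]
    data_types: List[str]
    estimated_volume_gb: float
    retention_requirements: str
    formats_and_software: List[str]
    access_policy: str
    ip_constraints: Optional[str] = None
    confidentiality_requirements: Optional[str] = None
    storage_and_archiving: Optional[str] = None

    def missing_components(self) -> List[str]:
        """Names of components still empty, to prompt the researcher at the next update."""
        return [name for name, value in vars(self).items() if value in (None, "", [])]


# Placeholder values only; none of these describe a real project.
plan = DataManagementPlan(
    originators_and_owners=["Example research group"],
    project_description="Example project",
    metadata_standards=[],
    data_types=["diffraction images"],
    estimated_volume_gb=36.0,
    retention_requirements="as per institutional guidelines",
    formats_and_software=["example image format"],
    access_policy="open after publication",
    storage_and_archiving="central research data store",
)
todo = plan.missing_components()
```

A plan filled in at the start of a project can leave some components empty; `missing_components` lists what still needs attention when the plan is updated, which fits the aim of keeping the plan useful without making it onerous.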

C: Sustainable storage provision

• Challenge
– Need sustainable way to provide large (terabyte) amounts of storage for researchers
– Make this more financially attractive than JBOD under desk
• Solution
– Large Research Data Storage (LaRDS)

C: LaRDS requirements

• Addresses institutional and researcher needs
• Formulates a set of principles to guide cost modelling and sustainable funding options
• Assumes commitment to storage in perpetuity
– or “as long as required”, whichever comes first ;-)
• Adopts a central storage model …
– Centrally funded basic allowance, plus
– Directly charged excess allowance
• … in parallel with decentralised storage
• 700 TB and growing
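The "basic allowance plus charged excess" model can be sketched as a simple function. The slides give no allowance size or price, so the function name, parameters and figures below are placeholders, not LaRDS policy.

```python
def storage_charge(used_tb: float, basic_allowance_tb: float, rate_per_tb: float) -> float:
    """Charge only for storage above the centrally funded basic allowance."""
    if used_tb < 0 or basic_allowance_tb < 0 or rate_per_tb < 0:
        raise ValueError("inputs must be non-negative")
    excess_tb = max(0.0, used_tb - basic_allowance_tb)
    return excess_tb * rate_per_tb


# Placeholder figures, chosen only to exercise the function.
within_allowance = storage_charge(used_tb=0.5, basic_allowance_tb=1.0, rate_per_tb=100.0)
over_allowance = storage_charge(used_tb=3.5, basic_allowance_tb=1.0, rate_per_tb=100.0)
```

Under these assumed numbers a group inside its allowance pays nothing, and a group using 3.5 TB pays only for the 2.5 TB above the allowance.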

C: Different stores for different domains

C: Data in institutional repositories

• Challenge
– Most IRs are designed for document objects
– Many data objects are large
> 2QP2 produced 36 GB of image data
– HTTP download metaphor doesn’t scale
• Solution
– Trialling both managed content and externally referenced content at present
– Investigating custom disseminators on server
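A rough calculation shows why a single HTTP download is awkward at this scale. The 36 GB figure is from the slide; the link speeds are assumptions chosen for illustration, and the estimate ignores protocol overhead and interrupted transfers.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over a link at a sustained rate."""
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    size_megabits = size_gb * 1000 * 8
    return size_megabits / link_mbit_per_s / 3600


# 36 GB data set from the slide; both link speeds are assumed values.
hours_at_10 = transfer_hours(36, 10)    # sustained 10 Mbit/s
hours_at_100 = transfer_hours(36, 100)  # sustained 100 Mbit/s
```

At a sustained 10 Mbit/s the data set takes about eight hours to fetch, and still close to an hour at 100 Mbit/s, which helps explain the interest in externally referenced content and server-side disseminators.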

A is for Australian national developments

A: Australian e-Research Infrastructure

• Term ≈ Cyberinfrastructure
• National Collaborative Research Infrastructure Strategy (A$555M, 5 yrs)
– 15 research capabilities
– and Platforms for Collaboration
• Platforms for Collaboration (A$75M, 4.5 yrs)
– National Computation Infrastructure
– Interoperation and Collaboration Infrastructure
– Australian National Data Service

A: Australian National Data Service

• Monash University is leading a project to establish ANDS

• ANU and CSIRO to be other members of collaborative partnership

• Tasks to be distributed more widely
• Four platforms:
– Frameworks (policy)
– Utilities
– Repositories
– Researcher Practice

• http://www.pfc.org.au/twiki/bin/view/Main/Data

Q is for Questions!

[email protected]

[email protected]

[email protected]

• http://arrow.edu.au/

• http://dart.edu.au/

• http://archer.edu.au/

* Thanks to Dr Ashley Buckle and colleagues at Monash for the use of the protein crystallography slides and movies

Federating Data

• The Australian Repository for Diffraction ImageS
– http://www.tardis.edu.au/
• National activity to support communities of protein crystallographers
• Ideal place to hook into the eCrystals Federation
– http://wiki.ecrystals.chem.soton.ac.uk/