Digital | Curation | Centre
An Introduction to the UK Digital Curation Centre
Dr Liz Lyon
DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK
Funded by:
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
CURL/SCONUL Workshop
December 2005
2
Overview
• About the Digital Curation Centre
  – Organisation and structure
• What is digital curation?
  – e-Research cycle
• DCC activities
  – Development activity
  – Research agenda
  – Advisory services
  – Outreach programme
3
UK Digital Curation Centre
• Development activities
• Research agenda
• Delivering services
• Outreach Programme
• http://www.dcc.ac.uk/
4
DCC people (some of them…)
• Management & Co-ordination
  – Director Chris Rusbridge (University of Edinburgh)
• Community Support & Outreach
  – Led by Dr Liz Lyon (UKOLN, University of Bath)
• Service Definition & Delivery
  – Led by Professor Seamus Ross (HATII, University of Glasgow)
• Development
  – Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)
• Research
  – Led by Professor Peter Buneman (University of Edinburgh)
5
What is digital curation?
• Data preservation: static; for later use?
• Data curation: dynamic; in use now (and the future)?
“maintaining and adding value to a trusted body of digital information for current and future use”
6
(Very simple) e-Research Cycle and Data Curation
• Formulate hypothesis / ideas, test, experiment, observe: data creation, collection & capture
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
• Adding value: data linking, annotation, visualisation, simulation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
• Scholarly communications: data disclosure, publication, citation, discovery, re-use
• Data processing
• e-Infrastructure, open access, collaboration
9
Engineering Product Information
EPSRC Grand Challenge Project, Prof Chris McMahon, University of Bath
10
– Access Grid
– Collaborative telematic art
– Modify spaces for performers
– Interplay: Hallucinations
11
Data capture & integration into research workflows
• R4L, Repository for the Laboratory project (JISC-funded): automated data capture from instrumentation, deposit of results (chemistry)
• SMART TEA: electronic laboratory notebook + annotations
13
The scholarly knowledge cycle. Liz Lyon, Ariadne, July 2003.
Workflows:
• Learning & Teaching workflows
• Research & e-Science workflows
Components:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Repositories: institutional, e-prints, subject, data, learning objects
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Peer-reviewed publications: journals, conference proceedings
• Quality assurance bodies
• Learning object creation, re-use
Flows:
• Deposit / self-archiving
• Harvesting metadata
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Publication
• Validation
© Liz Lyon (UKOLN, University of Bath), 2005
14
Disciplinary data-centres
15
eBank UK Project
• Two key themes:
  – Open access to datasets
  – Linking research data to publications and to learning
• UKOLN, University of Southampton, University of Manchester
• e-Science application ‘Combechem’: Grid-enabled combinatorial chemistry + National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
http://www.ukoln.ac.uk/projects/ebank-uk/
16
A data repository entry
17
Access to the underlying data: complex objects
ecrystals.chem.soton.ac.uk
18
Data descriptions
• Validation, publication & discovery of data models & schema
• Managing complex objects
• Metadata packaging standards
  – METS
  – MPEG-21 DIDL
• Semantic descriptions
  – Formal controlled vocabularies
  – High-level and domain ontologies
  – Inter-disciplinary discovery
• Informal approaches: Web 2.0 “folksonomies”
19
Trusted digital repositories
• Audit Checklist for Certification
• Draft Report published August 2005
• Research Libraries Group RLG-NARA Taskforce
• Defined criteria under 4 categories
  – Organisation
  – Functions, processes & procedures
  – Designated community & usability
  – Technologies & technical infrastructure
20
OAIS Reference Model
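[Figure: OAIS functional entities]
• Actors: PRODUCER, MANAGEMENT, CONSUMER
• Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access
• Information packages: SIP (Producer to Ingest), AIP (Ingest to Archival Storage, Archival Storage to Access), DIP (Access to Consumer)
• Other flows: Descriptive Info, queries, result sets, orders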
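The package flow in the OAIS figure can be sketched in a few lines of Python. This is only an illustrative toy, not part of the OAIS standard or any DCC software: the class names, the `Archive` methods and the `aip-N` identifier scheme are all hypothetical, and Administration and Preservation Planning are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package sent by a Producer."""
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package held in Archival Storage."""
    identifier: str
    content: bytes


@dataclass
class DIP:
    """Dissemination Information Package delivered to a Consumer."""
    identifier: str
    content: bytes


@dataclass
class Archive:
    # Archival Storage: identifier -> AIP (hypothetical in-memory store)
    storage: dict = field(default_factory=dict)
    # Data Management: identifier -> Descriptive Info
    catalogue: dict = field(default_factory=dict)

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP plus Descriptive Info."""
        identifier = f"aip-{len(self.storage) + 1}"  # made-up naming scheme
        self.storage[identifier] = AIP(identifier, sip.content)
        self.catalogue[identifier] = sip.description
        return identifier

    def query(self, term: str) -> list:
        """Access: a Consumer query returns a result set of identifiers."""
        return [i for i, d in self.catalogue.items() if term.lower() in d.lower()]

    def order(self, identifier: str) -> DIP:
        """Access: a Consumer order is fulfilled with a DIP built from the AIP."""
        aip = self.storage[identifier]
        return DIP(aip.identifier, aip.content)


archive = Archive()
aip_id = archive.ingest(SIP(b"raw crystallography data", "Crystal structure dataset"))
results = archive.query("crystal")
dip = archive.order(results[0])
print(aip_id, results, dip.content)
```

The point of the sketch is the separation of concerns: the Producer only ever sees SIPs, the Consumer only ever sees result sets and DIPs, and the AIP stays inside the archive.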
21
DCC: Development
• “DCC Approach to Digital Curation” based on the Reference Model for an Open Archival Information System (OAIS), ISO standard 14721:
  – Monitoring international standards
  – Development of a Representation Information (RI) registry/repository (DCC-RR)
  – Recommendations for tools and methods for generating Representation Information
  – Creating test-beds for digital curation tools
Development info: see http://dev.dcc.ac.uk for details of Wiki and email list open to all
23
Persistent identifiers for data citation
• Identify use cases: depositor, author, service provider, reader, publisher, ?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Added value services: CrossRef, resolution service, integration (Globus), look-up service
• Domain identifiers: e.g. International Chemical Identifier (InChI) codes
• Google molecules using InChIs demo: Peter Murray-Rust, University of Cambridge
• DCC Workshop June 2005 Glasgow
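As a rough illustration of "express as http URIs", the sketch below prefixes an identifier with the base URL of a resolver. The function name `to_http_uri` and the `RESOLVERS` table are hypothetical, and the resolver hostnames are assumptions about today's public DOI and Handle proxies; the slides do not name any resolver. ARK and PURL are omitted.

```python
from urllib.parse import quote

# Resolver base URLs: assumed public proxies, not taken from the slides.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
}


def to_http_uri(scheme: str, identifier: str) -> str:
    """Express a persistent identifier as an http(s) URI via its resolver."""
    key = scheme.lower()
    if key not in RESOLVERS:
        raise ValueError(f"no resolver configured for scheme: {scheme}")
    # Keep "/" intact because DOIs and Handles use it as a separator;
    # other reserved characters are percent-encoded.
    return RESOLVERS[key] + quote(identifier, safe="/")


uri = to_http_uri("DOI", "10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p")
print(uri)
```

Keeping the scheme-to-resolver mapping in one table means a citation can store the bare identifier and pick up a different resolver later without being rewritten.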
24
One approach to data citation using DOIs
• Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany; STD-DOI Project http://www.std-doi.de
• DOI registry for datasets
• Data publication agents: World Data Center Climate, GeoForschungsZentrum Potsdam
• Data requirements: quality control, long-term curation, use DOI resolver
• Exemplar data citation:
  – Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p
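The exemplar citation can be rebuilt from structured fields, which is roughly what a data publication agent would need to hold for each dataset. This is a minimal sketch: the `DatasetCitation` class and its field names are hypothetical, and treating "GFZ Potsdam" as the publisher is an assumption read off the citation string rather than something the STD-DOI project states here.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    authors: list      # names as "Family, Initial"
    year: int
    title: str
    publisher: str     # assumed role of "GFZ Potsdam" in the exemplar
    doi: str

    def format(self) -> str:
        """Render the citation in the layout used by the exemplar."""
        names = "; ".join(self.authors)
        return f"{names} ({self.year}): {self.title}, {self.publisher}. doi:{self.doi}"


citation = DatasetCitation(
    authors=["Kamm, H", "Machon, L", "Donner, S"],
    year=2004,
    title="Gas chromatography (KTB Field Lab)",
    publisher="GFZ Potsdam",
    doi="10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p",
)
text = citation.format()
print(text)
```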
26
Adding value: eBank linking data to publications
27
Linking research to learning - embedding eBank aggregator service in a science portal for student learners
28
Adding value through annotation
DCC Research at the University of Edinburgh
• Scientific databases: Annotation scoping report
• AstroDAS: distributed annotation servers in astronomy
• New annotation model + prototype: top-ranked demonstration at recent DB conference
29
DCC Research agenda
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities
  – “Public domain, public interest, public funding” paper, Waelde & McGinley
31
Facilitate “post-processing” and knowledge extraction
Enable the acquisition of newly-derived information and knowledge
• Run complex algorithms over primary datasets
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological)
• Analysis (statistical, lexical, pattern matching, gene)
33
DCC Case Study published: Wide Field Astronomy Unit
34
Supporting the community
• DCC Outreach & Services:
  – HELPDESK@dcc.ac.uk (legal - technical guidance)
  – Curation Manual (45 chapters planned), Briefing Papers
  – Workshops: Future-proofing Institutional Web sites, Jan 19-20, London
  – Information Days: regional
  – 1st International DCC Conference, Bath, Sept 2005
  – PV2005, November, Edinburgh
  – 2nd International Conference, November 2006, Glasgow (tbc)
35
• www.ijdc.net
• Peer-review Editorial Board
• Peter Buneman Editor (research)
• Production editor Richard Waller
• Papers for submission are very welcome!
• 1st issue soon….
36
Associates Network
Goals
Develop understanding, share best practice, advance research, promote recognition, develop consensus
Membership
International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals……
Benefits
Early access to R&D outputs, advisory services, training, input to definition and design, community participation
Discussion Forum www.dcc.ac.uk Please join us!
37
Developing skills & collaboration
• NSF Report: “Data scientist”
• Develop hybrid skills
• Embed in u/g, p/g curriculum
• Facilitate community collaboration:
  – Researchers
  – Data centres
  – Libraries & archives
• New roles???
• Achieve cultural change
Thank you. Questions?
e.lyon@ukoln.ac.uk
Join the DCC Associates Network at www.dcc.ac.uk