Liz Lyon Associate Director, Outreach Chris Rusbridge, DCC Director
Liz Lyon Associate Director, Outreach
Chris Rusbridge, DCC Director
UK Digital Curation Centre One Year On
Digital Curation Centre: a centre of support for data curation and preservation
Overview
• Why is digital curation important?
• What are the challenges that the DCC faces?
• About the people and our collaborative approach
• Addressing the issues
• How can you contribute to the DCC?
Curation?
“maintaining and adding value to a trusted body of digital information for
current and future use”
Digital curation continuum
Data preservation: static, for later use?
Data curation: dynamic, in use now (and the future)?
Assuring permanent access to the records of science & the humanities?
Long term access to primary data
• Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications
• Changing research paradigm: data-driven science, “big science”
• Observational data, simulations, large-scale experimentation
• Multi-media resources, statistical data, surveys, geo-spatial data……
Facilitate “post-processing” and knowledge extraction
Enable the acquisition of newly-derived information and knowledge
• Run complex algorithms over primary datasets
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological)
• Analysis (statistical, lexical, pattern matching, gene)
• Presentation (visualisation, rendering)
Provide additional functionality beyond digital preservation processes
Annotations
• Gene and protein sequences
• e-Lab books (Smart Tea Project in chemistry)
The scholarly knowledge cycle: linking research data to publications
eBank UK Project: http://www.ukoln.ac.uk/projects/ebank-uk/
Emerging policy on open access to data
Diagram components:
• Research & e-Science workflows
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data curation: databases & databanks
• Data analysis, transformation, mining, modelling
• Repositories: institutional, e-prints, subject, data, learning objects
• Peer-reviewed publications: journals, conference proceedings
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
Diagram links: deposit / self-archiving; validation; publication; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding; linking
DCC people (some of them…)
• Management & Co-ordination
– Director Chris Rusbridge (University of Edinburgh)
• Community Support & Outreach
– Led by Dr Liz Lyon (UKOLN, University of Bath)
• Service Definition & Delivery
– Led by Professor Seamus Ross (HATII [ERPANET], University of Glasgow)
• Development
– Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)
• Research
– Led by Professor Peter Buneman (Informatics, University of Edinburgh)
The challenges we face
Standards
• Interoperability issues: technical & hopefully soluble
Scale
• Volume and diversity of datasets
Culture
• Bringing communities together
• Library/information science/archives “document tradition”
• Domain research (chemists, astronomers, biologists)
• Computer science (databases)
• Commercial suppliers (storage technology)
More challenges……
Process
• Highly-distributed organisation: use collaborative tools
Skills
• Distributed amongst the 4 partners & beyond
Engagement
• Lots of existing work and many significant players
Impact
• Visible & measurable, in the short & long-term
Meeting expectations (which are high…..)
• Of the community and our funders
User requirements analysis
Commissioned study
• Leona Carpenter
• Reporting now
• Desk-based research
• Focus groups
• Interviews
Results will inform research, development, service definition / delivery, and outreach
Recommendations and priority tasks
Some sound bytes…
R&D issues: Annotation services, Ontology development, Automating metadata creation, Tools and toolkits, Data Format Description Language, Identifiers, Registries, Economic and cost-benefits studies
Advisory services: “Ask-a-Curator”, FAQs, reports, briefings, awareness-raising materials, best practice guidance, Storage media, “Like Erpanet”, advise Government, Research Councils, funding bodies
Professional development: Short courses, conferences, seminars, workshops, secondments to DCC and to working repository services
Outreach: Leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links
Taxonomy of “Users”
Outline Taxonomy of digital curation users by role
1. Data Creators
2. Data Curators
3. Data Re-users
4. Policy makers
-funding bodies
-other leaders
Data Preservers
Data publishers
Outline Taxonomy by significant function of organisational entity
1. Research
2. Service provision
3. Learning & teaching
4. Funders
5. Policy / strategy makers
“Designated communities”
Commercial
Service definition & delivery
• Advisory services
– Responses to queries, from legal to technical guidance: [email protected]
– Site visits (National Institute of Environmental eScience)
• Information Services
– Briefing Documents: Freedom of Information by Mags McGinley
– DIGITAL CURATION MANUAL
– 20 chapters written by community experts, e.g. Metadata written by Michael Day, UKOLN
– Peer-reviewed
– Checklist for Compliance with best practices and standards
– Technology Watch
Services: workshops
• 2005 Programme
– Preservation of medical databases: 24-25 May at the Gulbenkian Institute, Lisbon, in collaboration with ERPANET & the Wellcome Trust
– Institutional repositories: 6 July at the University of Cambridge, UK, in collaboration with DSpace
– Cost models: in collaboration with the Digital Preservation Coalition, July, at the British Library
– Persistent identifiers: liaising with NISO, summer, UK location tbc
Development approach
• OAIS (Open Archival Information System) linkage: focus on representation information
– Link to global work on format registries?
– Concentrate on scientific data formats?
• Repository
– Representation Information
– Standards and Tools
– Aim for OAIS compliance
• Persistent identifiers
• Certification… RLG task force
• Open development wiki and email list
OAIS Reference Model – Functional Model
[Figure: OAIS functional model. Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. External roles: PRODUCER, CONSUMER, MANAGEMENT. Labelled flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders.]
How relevant to curation?
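The package flow in the functional model can be illustrated with a small sketch. This is only a toy under stated assumptions: the class name `ToyArchive`, the field names and the `aip-N` identifier scheme are hypothetical, and Administration and Preservation Planning are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SIP:
    """Submission Information Package handed over by a producer."""
    producer: str
    content: bytes
    description: str


@dataclass(frozen=True)
class AIP:
    """Archival Information Package held in archival storage."""
    aip_id: str
    content: bytes
    representation_info: str


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package delivered to a consumer."""
    aip_id: str
    content: bytes
    representation_info: str


class ToyArchive:
    """Hypothetical in-memory archive covering Ingest, Data Management and Access."""

    def __init__(self) -> None:
        self.archival_storage: Dict[str, AIP] = {}   # aip_id -> AIP
        self.data_management: Dict[str, str] = {}    # aip_id -> descriptive info

    def ingest(self, sip: SIP, representation_info: str) -> str:
        # Ingest: turn the SIP into an AIP and record its descriptive info.
        aip_id = f"aip-{len(self.archival_storage) + 1}"  # assumed ID scheme
        self.archival_storage[aip_id] = AIP(aip_id, sip.content, representation_info)
        self.data_management[aip_id] = sip.description
        return aip_id

    def query(self, term: str) -> List[str]:
        # Access: a consumer query returns a result set of matching AIP IDs.
        return [aip_id for aip_id, text in self.data_management.items()
                if term.lower() in text.lower()]

    def order(self, aip_id: str) -> DIP:
        # Access: a consumer order is answered with a DIP built from the AIP.
        aip = self.archival_storage[aip_id]
        return DIP(aip.aip_id, aip.content, aip.representation_info)
```

A producer would call `ingest`, and a consumer would call `query` followed by `order`. The sketch treats the archived content as fixed once ingested, which is one way of seeing why the slide asks how well the model fits data that is still changing.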
Representation Net
Representation Information More detail
How does this relate to format registries?
High Level View
Example of use of Representation Information Labelling
Registry issues?
• Trusted repository of Representation Information
– Authenticity of information
– Access control
– Certificates/Digests (are they trustable over the long term?)
• Findability
– Persistent IDs
• What can we rely on?
– Labels (to support automated processing)
• Extensibility
• Distributed
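Several of these issues (digests, persistent IDs, labels) can be made concrete with a minimal sketch. It is not the DCC registry design: the class names, the `ri:example/1` identifier and the `format:csv` label are hypothetical, and SHA-256 is simply an assumed choice of digest.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RepInfoEntry:
    """One piece of Representation Information held by the registry."""
    persistent_id: str
    label: str        # machine-readable label to support automated processing
    payload: bytes    # the representation information itself
    sha256: str       # digest recorded at registration time


class RepInfoRegistry:
    """Hypothetical in-memory registry keyed by persistent identifier."""

    def __init__(self) -> None:
        self._by_id: Dict[str, RepInfoEntry] = {}

    def register(self, persistent_id: str, label: str, payload: bytes) -> RepInfoEntry:
        # A persistent ID must never be silently reassigned.
        if persistent_id in self._by_id:
            raise KeyError(f"identifier already registered: {persistent_id}")
        digest = hashlib.sha256(payload).hexdigest()
        entry = RepInfoEntry(persistent_id, label, payload, digest)
        self._by_id[persistent_id] = entry
        return entry

    def resolve(self, persistent_id: str) -> RepInfoEntry:
        # Authenticity check: recompute the digest before handing the entry out.
        entry = self._by_id[persistent_id]
        if hashlib.sha256(entry.payload).hexdigest() != entry.sha256:
            raise ValueError(f"digest mismatch for {persistent_id}")
        return entry

    def find_by_label(self, label: str) -> List[str]:
        # Findability: look entries up by label rather than by identifier.
        return [e.persistent_id for e in self._by_id.values() if e.label == label]
```

The digest only detects accidental or unauthorised change for as long as the hash function itself remains trustworthy, which is the long-term question the slide raises and the sketch does not answer.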
Registry development
• Simple PHP prototype
• Scoping study: unification
– Formats, standards, tools
• More robust prototype in development
– Based on ebXML & JAXR
– Potentially distributed, cooperative maintenance model
Development Roadmap
• Registry: complete prototype, link to PRONOM, GDFR etc, handover to service
• Representation information: describe CCLRC (science) data using EAST, etc
• Certification work continues
• Additional tools: metadata extraction
• Testbeds, interactions with others
Research approaches
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities
The database picture
Source data Curated data: classified, cleaned, annotated, integrated, cross-linked
Curated Databases are Central
Much/most scientific data is now in databases
• They often do not contain source experimental data. Sometimes just annotation/metadata
• They borrow extensively from, and refer to, other databases
• You are now judged by your data as well as your (paper) publications!!
• These databases are built and maintained with a great deal of human or computational effort.
What makes a database?
– It has internal structure or it changes. Size alone doesn’t qualify
Archiving (preserving) volatile databases
• How do you preserve something that changes every hour or minute?
– Important for the scientific record: someone might have cited your data at time t.
• Current practice
– Create versions (how often?)
– Log changes
– Use diffs
– Do nothing (common!)
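The "log changes" practice can be sketched as an append-only change log from which the state at any time t is rebuilt by replay. This is a minimal illustration, not a DCC tool: `LoggedDatabase`, the integer timestamps and the record key `P1` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    """One logged change to a single record."""
    timestamp: int
    key: str
    value: str = ""
    deleted: bool = False


class LoggedDatabase:
    """Hypothetical key-value store that keeps every change instead of overwriting."""

    def __init__(self) -> None:
        self._log: List[Change] = []

    def _append(self, change: Change) -> None:
        # Keep the log in time order so that replay is a simple forward scan.
        if self._log and change.timestamp < self._log[-1].timestamp:
            raise ValueError("changes must be logged in time order")
        self._log.append(change)

    def put(self, timestamp: int, key: str, value: str) -> None:
        self._append(Change(timestamp, key, value))

    def delete(self, timestamp: int, key: str) -> None:
        self._append(Change(timestamp, key, deleted=True))

    def state_at(self, t: int) -> Dict[str, str]:
        # Replay every change made up to and including time t.
        state: Dict[str, str] = {}
        for change in self._log:
            if change.timestamp > t:
                break
            if change.deleted:
                state.pop(change.key, None)
            else:
                state[change.key] = change.value
        return state
```

With this design, a record cited at time t can still be recovered after it has been revised or deleted. The cost is that the log grows without bound; periodic versions (snapshots) or diffs trade replay time against storage.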
Curated databases – some issues
• Integrating and publishing data so that someone else can use it.
• Annotating existing data and moving annotations to other databases
• Provenance: where did this data come from?
• Archiving: how do you preserve something that is constantly changing?
How do we cite data?
• A URL or citation to an article is already unsatisfactory.
– DCC client complaint: “I spend a lot of time searching [electronic documents] for the part that is relevant to the citation.”
• The problem is much worse when you are citing something in a very large database.
• How do you use a citation to locate data?
• How do you ensure that the citation persists?
– Connections with DB archiving and DOIs
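One way to picture a citation that both locates data and persists is to make it name the database, the record and the archived version. The sketch below assumes a made-up citation syntax, `database/record@vN`, and a dictionary of archived snapshots; neither is an existing standard or a DCC proposal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical fine-grained citation: database, record, archived version."""
    database: str
    record: str
    version: int

    def __str__(self) -> str:
        return f"{self.database}/{self.record}@v{self.version}"


def parse_citation(text: str) -> DataCitation:
    # Split "database/record@vN" into its three parts (assumed syntax).
    locator, _, version = text.rpartition("@v")
    database, _, record = locator.partition("/")
    if not (database and record and version.isdigit()):
        raise ValueError(f"not a valid data citation: {text!r}")
    return DataCitation(database, record, int(version))


def resolve(citation: DataCitation, snapshots: Dict[int, Dict[str, str]]) -> str:
    # Look the record up in the archived snapshot the citation refers to,
    # not in the current state of the database.
    return snapshots[citation.version][citation.record]
```

Because the citation carries a version, resolving it depends on the cited states having been archived, which is the connection with database archiving noted on the slide.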
Research approaches
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities
– “Public domain, public interest, public funding” paper, Waelde & McGinley
www.dcc.ac.uk
• www.ijdc.net
• Launch planned June/July
• Peer-reviewed contributions
• Peter Buneman Editor (research)
• Production editor Philip Hunter
Sample issue
Full papers
Invited articles
News & views
Papers for submission are very welcome!
1st DCC International Conference
• Location - Bath UK
• 29-30 September 2005
• Keynote speakers
– Cliff Lynch, CNI
– Graham Cameron, European Bioinformatics Institute
• DCC Research update
• Social highlights
Associates Network
Goals
Develop understanding, share best practice, advance research, promote recognition, develop consensus
Membership
International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals……
Benefits
Early access to R&D outputs, advisory services, training, input to definition and design, community participation
Discussion Forum www.dcc.ac.uk Please join us!
[Figure: map of DCC partners and related organisations. Group labels: Research Councils; HEIs & FE; Research Institutes; International Collaborations; Standards Bodies.]
Organisations shown: CCLRC, UKOLN, UofG, UofE, CMS-Bristol, NIEeS, RG, Durham, WT-CFG, Leicester, IC, Maastricht, Oxford, Dutch NA, Swiss NA, Urbino, UNC, Salzburg, SDSC, NEODC, CEH, RI, NCS, RLG, Innogen, NHS, Capri, NTUA, INRIA, HUJ, UPC, Max-Planck, MIMAS, IASSIST, LDC, ACM, Data Archive, EDG, GridPP, EGEE, Cambridge, Jodrell Bank, DLI (US), DPC, DELOS, ESA, NASA, NARA, CNES, BNSC, TU Vienna, UPenn, EBI, MRC HGU, Kyoto, USC, GSK, Roslin, IBM Almaden, JHU, CSIRO, Caltech, CDS, ESO, OCLC, AHDS, Microsoft, IBM, Oracle, BT, STK, BADC, BODC, IVOA, ILRT, Council for Museums, Archives & Libraries, RDN, So’ton, OAI, NOF, NLA, NeSC
Acknowledgements
Slides from Peter Buneman, David Giaretta and others used with thanks.
How you can help us
How does OAIS relate to curation?
How do format registries relate to representation information?
Who else is working across these areas?
What outcomes would you like to see?