Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
![Page 1: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/1.jpg)
Building Communities and Services in Support of Data-Intensive Research
Stephen Abrams
University of California Curation Center
California Digital Library
August 20, 2013
![Page 2: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/2.jpg)
Topics
Data curation
UC3 services: DMPTool, DataUp, EZID, Merritt, WAS
Collaborative initiatives: DataShare, Research Hub
Conclusions
![Page 3: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/3.jpg)
Why is data curation important?
Integrity: Enabling appropriate scrutiny, debate, reproduction, and verification of results
Efficiency: Avoiding needless duplication of effort
Policy: Complying with institutional policies, publication requirements, and funder mandates
“[Data] is a valuable national asset whose value is multiplied when it is made easily accessible to the public”
– Office of Science and Technology Policy
![Page 4: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/4.jpg)
Why is data curation important?
Catalyzing: Promoting progress through new collaborations and creative (re)use of data
“If I have seen further it is by standing on the shoulders of giants”
– Isaac Newton, 1676
![Page 5: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/5.jpg)
What is the library’s role?
A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time
Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, http://conference.ifla.org/past/ifla78/116-tenopir-en.pdf
Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles
[Figure: the scholarly (research) lifecycle and the information (curation) lifecycle shown together, with the stages Create, Collect, Discover, Share, Publish, Reuse, Preserve, and Access.]
![Page 6: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/6.jpg)
Why is data curation hard?
Ever increasing number, size, and diversity of content
Inevitability of disruptive change
Resources not keeping pace with growth
Stakeholders outside of traditional cultural heritage domains, with lots of questions
Who can give me advice on what I should do?
How should I describe and package my data?
How can I cite my data in order to receive credit for it?
How can I share my data?
What can I do with web-published data?
…
![Page 7: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/7.jpg)
DMPTool – guidance and resources
Finalist, 2012 DPC Award for Research and Innovation
http://dmptool.org/
Create, edit, and share data management plans
Meet funder requirements
Provide institutional guidance
Links to local resources
![Page 8: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/8.jpg)
![Page 9: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/9.jpg)
DMPTool – guidance and resources
Two recently funded projects
Functional enhancements and open source community development (Sloan Foundation)
Training and outreach (IMLS)
http://dmptool.org/
New options for DMP collaboration and formal and ad hoc review
Stronger administrative control and customization
![Page 10: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/10.jpg)
DataUp – description and packaging
http://dataup.cdlib.org/
http://www.dataup.org/
“It’s easier to augment systems than it is to change behavior”
Curation for tabular datasets
Excel add-in
Azure cloud service
![Page 11: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/11.jpg)
DataUp – description and packaging
http://dataup.cdlib.org/
http://www.dataup.org/
Best practices check
Data description
Identifier and citation generation
Repository submission to ONEShare
Curation for tabular datasets
Excel add-in
Azure cloud service
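To make the idea of a "best practices check" on tabular data concrete, here is a minimal sketch in Python. The slides do not say which rules DataUp applies, so the three checks below (missing or duplicate column headers, ragged rows, empty cells) and the function name `check_table` are illustrative assumptions, not DataUp's actual behavior.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of human-readable warnings for a small CSV table.

    The rules here are hypothetical examples of tabular-data hygiene,
    not the checks DataUp itself performs.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]

    header, body = rows[0], rows[1:]
    warnings = []

    # Every column should have a unique, non-blank header.
    seen = set()
    for index, name in enumerate(header, start=1):
        label = name.strip()
        if not label:
            warnings.append(f"column {index}: missing header")
        elif label.lower() in seen:
            warnings.append(f"column {index}: duplicate header '{label}'")
        seen.add(label.lower())

    # Every data row should match the header width and contain no empty cells.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {line_no}: expected {len(header)} cells, found {len(row)}"
            )
        for index, cell in enumerate(row, start=1):
            if not cell.strip():
                warnings.append(f"row {line_no}, column {index}: empty cell")

    return warnings


sample = "site,temp_c,temp_c\nA,12.5,\nB,13.1,13.0,extra\n"
for warning in check_table(sample):
    print(warning)
```

Running checks like these before description and submission surfaces problems while the researcher is still working in the spreadsheet, in the spirit of augmenting the existing tool rather than changing behavior.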
![Page 12: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/12.jpg)
DataUp – description and packaging
http://dataup.cdlib.org/
http://www.dataup.org/
What researchers don’t need to know
Schema definition and XML syntax
Identifier registration procedures
Citation format
Repository packaging and submission
Harvesting for aggregation
2013 Innovation Award winner
Recently funded project
Functional enhancements and open source community development (NSF)
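As an illustration of the kind of detail a tool can hide from researchers, the sketch below assembles a data citation from a few metadata fields. The layout "Creator (Year): Title. Publisher. Identifier" is assumed here to follow DataCite's commonly recommended citation pattern; the function name and all sample values (including the test-style DOI) are hypothetical, and nothing in the slides says this is the format DataUp generates.

```python
def format_citation(creators, year, title, publisher, identifier):
    """Build a citation string in the assumed pattern
    'Creator (Year): Title. Publisher. Identifier'."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {identifier}"


# Hypothetical sample values, for illustration only.
citation = format_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2013,
    title="Stream temperature observations",
    publisher="Example Repository",
    identifier="doi:10.5072/FK2EXAMPLE",
)
print(citation)
```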
![Page 13: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/13.jpg)
EZID – identification and citation
http://n2t.net/ezid/
UC3 is a founding member of the DataCite consortium
Mint DOI and ARK
Add descriptive metadata
Receive QR code
Global resolution
Aggregated discovery
Updatable resolution URLs
Establish and maintain persistent two-way linkages between the literature and the data that underlies its results
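The value of "updatable resolution URLs" is that a citation keeps working when the data moves. The toy resolver below sketches that idea in Python under stated assumptions: `ToyResolver`, its methods, the identifier string, and the example.org URLs are all hypothetical and are not EZID's API, which is not described in these slides.

```python
class ToyResolver:
    """In-memory stand-in for an identifier resolver (hypothetical, not EZID)."""

    def __init__(self):
        self._records = {}

    def mint(self, identifier, target_url, **metadata):
        """Register a new identifier with its target URL and descriptive metadata."""
        if identifier in self._records:
            raise ValueError(f"{identifier} already exists")
        self._records[identifier] = {
            "target": target_url,
            "metadata": dict(metadata),
        }
        return identifier

    def update_target(self, identifier, new_url):
        """Point an existing identifier at a new location; the identifier is unchanged."""
        self._records[identifier]["target"] = new_url

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._records[identifier]["target"]


resolver = ToyResolver()
pid = resolver.mint(
    "ark:/99999/fk4demo",                      # made-up identifier
    "https://old-host.example.org/dataset/1",  # made-up location
    title="Example dataset",
)
print(resolver.resolve(pid))

# The dataset moves; only the resolver record changes, not the published citation.
resolver.update_target(pid, "https://new-host.example.org/dataset/1")
print(resolver.resolve(pid))
```

Because articles and landing pages cite the identifier rather than the URL, the two-way links between literature and data survive a change of repository or hostname.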
![Page 14: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/14.jpg)
EZID – identification and citation
UC3 is a founding member of the DataCite consortium
Mint DOI and ARK
Add descriptive metadata
Receive QR code
Global resolution
Updatable resolution URLs
Link to dataset in repository
http://n2t.net/ezid/
![Page 15: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/15.jpg)
EZID – identification and citation
UC3 is a founding member of the DataCite consortium
Mint DOI and ARK
Add descriptive metadata
Receive QR code
Global resolution
Updatable resolution URLs
Link from dataset landing page to article citing the data
![Page 16: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/16.jpg)
EZID – identification and citation
UC3 is a founding member of the DataCite consortium
Mint DOI and ARK
Add descriptive metadata
Receive QR code
Global resolution
Updatable resolution URLs
Link from article back to dataset
![Page 17: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/17.jpg)
EZID – identification and citation
UC3 is a founding member of the DataCite consortium
Aggregated discovery via DataShare and Ex Libris Primo
Later this year, aggregation via the Thomson Reuters Data Citation Index
![Page 18: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/18.jpg)
EZID – identification and citation
UC3 is a founding member of the DataCite consortium
SEO for public visibility in leading search engines
![Page 19: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/19.jpg)
Merritt – preservation and access
Content agnostic, model free
Micro-service architecture
UI and RESTful API
26 curatorial units
271 collections
325,000 objects
450,000 versions
4,500,000 files
13 TB
http://merritt.cdlib.org/
Enforceable Data Use Agreements (DUAs) in response to concerns over potential loss of control over dissemination and reuse
Open to the UC community and external partners
Dark archive for long-term assurance
Bright archive for sharing
Integration with preservation grids
Integration with public access portals
Integration with CMS
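The holdings figures on this slide can be turned into a few rough averages. The short calculation below uses only the numbers quoted above; the one assumption is that "13 TB" means decimal terabytes (10^12 bytes), which the slide does not specify.

```python
# Figures quoted on the slide.
objects = 325_000
versions = 450_000
files = 4_500_000
total_bytes = 13 * 10**12  # assumption: decimal terabytes

versions_per_object = versions / objects
files_per_object = files / objects
mean_file_mb = total_bytes / files / 10**6
mean_object_mb = total_bytes / objects / 10**6

print(f"versions per object: {versions_per_object:.2f}")
print(f"files per object:    {files_per_object:.1f}")
print(f"mean file size:      {mean_file_mb:.2f} MB")
print(f"mean object size:    {mean_object_mb:.0f} MB")
```

On these assumptions the typical object has between one and two versions and about fourteen files, and averages roughly 40 MB.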
![Page 20: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/20.jpg)
Merritt – preservation and access
Content agnostic, model free
Micro-service architecture
UI and RESTful API
26 curatorial units
271 collections
325,000 objects
450,000 versions
4,500,000 files
13 TB
For curatorially-designated collections and objects, a download request triggers …
Open to the UC community and external partners
Dark archive for long-term assurance
Bright archive for sharing
Integration with preservation grids
Integration with public access portals
Integration with CMS
![Page 21: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/21.jpg)
Merritt – preservation and access
Content agnostic, model free
Micro-service architecture
UI and RESTful API
26 curatorial units
271 collections
325,000 objects
450,000 versions
4,500,000 files
13 TB
Open to the UC community and external partners
Dark archive for assurance
Bright archive for sharing
Integration with preservation grids
Integration with public access portals
Integration with CMS
Click-through DUA; acceptance of terms of use triggers …
![Page 22: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/22.jpg)
Merritt – preservation and access
Content agnostic, model free
Micro-service architecture
UI and RESTful API
26 curatorial units
271 collections
325,000 objects
450,000 versions
4,500,000 files
13 TB
Open to the UC community and external partners
Dark archive for assurance
Bright archive for sharing
Integration with preservation grids
Integration with public access portals
Integration with CMS
From: [email protected]
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following statements:
(1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement
...
Email notification to consumer and curator
Delivery of requested content
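The DUA sequence across these slides (download request, click-through agreement, notification, delivery) can be sketched as a small gate in front of content delivery. This is a simplified model under stated assumptions: the `DownloadRequest` class, the `process` function, and the sample addresses and object name are hypothetical, and Merritt's actual implementation is not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DownloadRequest:
    consumer: str
    curator: str
    object_id: str
    requires_dua: bool      # set by the curator for designated collections/objects
    accepted: bool = False  # becomes True once the consumer accepts the terms
    log: list = field(default_factory=list)


def process(request):
    """Return True when content may be delivered, recording each step in request.log."""
    if request.requires_dua and not request.accepted:
        # Curatorially designated content: show the agreement and stop here.
        request.log.append("present click-through DUA")
        return False
    if request.requires_dua:
        # Acceptance triggers notification to both parties before delivery.
        request.log.append(f"notify {request.consumer}")
        request.log.append(f"notify {request.curator}")
    request.log.append(f"deliver {request.object_id}")
    return True


request = DownloadRequest(
    consumer="consumer@example.org",
    curator="curator@example.org",
    object_id="example-imaging-dataset",
    requires_dua=True,
)
print(process(request))   # first attempt: the agreement is presented
request.accepted = True   # the consumer clicks through the terms of use
print(process(request))   # second attempt: notifications, then delivery
print(request.log)
```

Content without a designated DUA skips straight to delivery, which keeps the extra step confined to the collections whose curators asked for it.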
![Page 23: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/23.jpg)
Web Archiving Service
http://was.cdlib.org/
Collect, describe, manage, preserve, and provide access to web sites
Analysis tools
Full-text search
27 curatorial units
185 collections
10,772 web sites
97,121 captures
64 TB
“You can’t study life in our time without the Internet, so we must preserve it”
– René Voorburg, KB
Initially developed as part of the NDIIPP-funded Web at Risk project
The web has become the publication platform of choice
Source of important primary and secondary research data
![Page 24: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/24.jpg)
Web Archiving Service
http://was.cdlib.org/
Collect, describe, manage, preserve, and provide access to web sites
Analysis tools
Full-text search
27 curatorial units
185 collections
10,772 web sites
97,121 captures
64 TB
“You can’t study life in our time without the Internet, so we must preserve it”
– René Voorburg, KB
Initially developed as part of the NDIIPP-funded Web at Risk project
For example, California water district web sites supplement UC Davis source water assessment and protection (SWAP) Merritt collections
![Page 25: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/25.jpg)
Connecting to communities of practice
Engage with new user communities where and how they already work
Shifting user roles, shifting expectations
Institutional → individual researcher
Behavioral expectations set by the commercial/mobile web
![Page 26: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/26.jpg)
DataShare – catalyzing science
UCSF Clinical and Translational Science Institute, http://ctsi.ucsf.edu/
UCSF Library, http://www.library.ucsf.edu/
UCSF Center for Imaging of Neurodegenerative Disease, http://www.radiology.ucsf.edu/cind/
http://datashare.ucsf.edu/
“Making data transparent and available is going to accelerate all of science; it's a relatively inexpensive way to get more value out of all of the work that we do”
– Michael Weiner, UCSF
Pilot project in biomedical imaging
“The goal is to catalyze widespread sharing of scientific research data”
Prepare Describe Upload Curate Discover Share
![Page 27: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/27.jpg)
DataShare – catalyzing science
UCSF-developed submission client, supporting intuitive drag & drop operation and metadata entry
EZID for DOIs; Merritt for preservation
XTF-based faceted search/browse portal
http://xtf.cdlib.org/
http://datashare.ucsf.edu/
“Making data transparent and available is going to accelerate all of science; it's a relatively inexpensive way to get more value out of all of the work that we do”
– Michael Weiner, UCSF
Pilot project in biomedical imaging
“The goal is to catalyze widespread sharing of scientific research data”
Prepare Describe Upload Curate Discover Share
![Page 28: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/28.jpg)
Research Hub – content mgmt and collaboration
3,900 users
770 projects
Alfresco CMS
Desktop sync
Mobile apps
Adobe Creative Suite
Personal file management
Project collaboration
Departmental resource pooling
Research data sharing
“Powerful tools for management and collaboration”
Create Organize and enrich Keep safe Share
http://hub.berkeley.edu/
UC Berkeley Information Services & Technologies, http://ist.berkeley.edu/
![Page 29: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/29.jpg)
Research Hub – content mgmt and collaboration
3,900 users
770 projects
Alfresco CMS
Desktop sync
Mobile apps
Adobe Creative Suite
Personal file management
Project collaboration
Departmental resource pooling
Research data sharing
“Powerful tools for management and collaboration”
Create Organize and enrich Keep safe Share
http://hub.berkeley.edu/
Primary discovery and access via Research Hub
EZID for DOIs; Merritt for preservation
Merritt access called for in succession plans
![Page 30: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/30.jpg)
Data curation
“Access to and sharing of data are essential for the conduct and advancement of science”
— Arzberger et al. (2004), “Promoting access to public research data for scientific, economic, and social development,” Data Science Journal 3: 135-52, doi:10.2481/dsj.3.135
Pro-active curation of research outputs is necessary to ensure their ongoing viability and use
Good for research; good for researchers
Quicker, more innovative science; higher impact factor
Increasingly necessary for conformance to institutional policies, publication requirements, and funder mandates
![Page 31: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/31.jpg)
Data curation
Widespread adoption is dependent on outreach, education, and minimal intrusion into existing disciplinary workflows and common community practices
The most effective – and sustainable – curation services are composed from best-of-breed components
Libraries are a natural curation partner for the research community
![Page 32: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research](https://reader035.fdocuments.us/reader035/viewer/2022062614/5462830eb1af9f71408b513e/html5/thumbnails/32.jpg)
For more information
UC Curation Center
http://www.cdlib.org/uc3/
[email protected]
Stephen Abrams, Patricia Cruse, Shirin Faenza, Scott Fisher, Erik Hetzner, Joshua Hubbard, Greg Janée, John Kunze, Rosalie Lack, David Loy, Mark Reyes, Joan Starr, Carly Strasser, Marisa Strong, Bhavitavya Vedula, Kenneth Weiss, Perry Willet
DataShare
http://datashare.ucsf.edu/
Geoffrey Boushey, Anirvan Chatterjee, Maninder Kahlon, Julia Kochi, Megan Laurance, Angela Rizk-Jackson, Michael Weiner
Research Hub
http://hub.berkeley.edu/
Ian Crew, Michael McCarthy, Patrick McGrath, Noah Wittman