Infrastructure, Standards, and Policies for Research Data Management
Jian Qin, School of Information Studies, Syracuse University, USA
COINFO 2013, Wuhan, China, 2013-10-26
About this presentation
10/26/2013 COINFO2013, Wuhan, China 2
1. Concepts about data infrastructure services
2. Problems & gaps in data management services
3. Problems and gaps
4. Data management infrastructure service dimensions
Some background about the topic
Infrastructure, standards, and policy
Infrastructure
The underlying foundation or basic framework (as of a system or organization).
The system of public works of a country, state, or region.
The resources (as personnel, buildings, or equipment) required for an activity.
http://www.merriam-webster.com/dictionary/infrastructure
Data infrastructure
“a sustainable data infrastructure that will be discoverable, searchable, accessible, and usable to the entire research and education community.”
“usable by multiple scientific disciplines…”
“…that can support and provide data solutions to a broader range of scientific disciplines while reducing duplicative efforts.”
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504776
Standards
§ Scientific data formats
§ Metadata standards for scientific data
Data policies
§ Access and use
§ Management
§ Storage and backup
§ Metadata
§ Sharing
§ Preservation
§ Intellectual property rights
§ Security
Examples of data infrastructure services
§ The Institute for Quantitative Social Science repository: http://www.iq.harvard.edu/
§ Inter-University Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/icpsrweb/landing.jsp
§ The Dryad Digital Repository: http://datadryad.org/
§ Data Observation Network for Earth (DataONE): http://www.dataone.org/
§ Databib: http://databib.org/ (a registry/directory/catalog of research data repositories)
§ Registry of Research Data Repositories: hVp://www.re3data.org/
Major problems
§ “Challenges and opportunities,” introduction to the special section “Dealing with Data.” Science, 11 February 2011, Vol. 331, pp. 692-693.
§ 20% of respondents regularly use or analyze data sets exceeding 100 GB
§ 7% use data sets exceeding 1 TB
§ About 50% store their data only in their laboratories
§ Lack of common metadata and archives for using and storing data
§ No funding to support archiving
Gaps in data management services
Community data repositories
Institutional data repositories
Laptops, personal hard drives, etc.
Data lifecycle
Raw data, active data
Verified, derived, calculated, … data
Verified, archived data
Gaps: lack of standards and tools to support managing active data (data staging services)
Gaps: lack of time, lack of staff support, and lack of tools for creating meaningful metadata (data products development services)
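The lifecycle stages and the two gaps above can be read as a simple pipeline in which each transition needs a specific service. The Python sketch below models that reading. It assumes that the data staging gap applies when active data is turned into verified or derived data, and that the data products development gap applies when derived data moves into an archive. The slide layout that placed the gaps was lost, so this placement, along with the names `Stage`, `GAP_SERVICES`, and `services_needed`, is hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    """Lifecycle stages as labelled on the slide (enum names are hypothetical)."""
    RAW_ACTIVE = "Raw data, active data"
    DERIVED = "Verified, derived, calculated, … data"
    ARCHIVED = "Verified, archived data"


# Assumed left-to-right order of the lifecycle.
LIFECYCLE = [Stage.RAW_ACTIVE, Stage.DERIVED, Stage.ARCHIVED]

# Hypothetical mapping: the service that closes the gap when data leaves a stage.
# The placement of each gap is an assumption, not taken from the slide layout.
GAP_SERVICES = {
    Stage.RAW_ACTIVE: "data staging services",
    Stage.DERIVED: "data products development services",
}


def services_needed(start: Stage, end: Stage) -> list[str]:
    """Return the services a data set passes through on its way from start to end."""
    i, j = LIFECYCLE.index(start), LIFECYCLE.index(end)
    if i > j:
        raise ValueError("data does not move backwards in this simplified model")
    # Every stage the data leaves (start inclusive, end exclusive) has a gap service.
    return [GAP_SERVICES[stage] for stage in LIFECYCLE[i:j]]
```

For example, `services_needed(Stage.RAW_ACTIVE, Stage.ARCHIVED)` returns both services in order, which reflects the slide's point that data kept on laptops or lab drives cannot reach a repository unless both gaps are addressed.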
Why the gaps?
Raw data, active data
Calculated, derived … data
Verified, archived data
Technical factors
§ Lack of tools to support data management (DM) at different stages of a research lifecycle
§ Data repositories do not always provide tools for pre-submission staging
Organizational factors
§ Lack of repeatable, reliable practices to ensure effective DM
§ Lack of institutional policies to support and assess DM practices
§ Lack of DM training programs
Behavioral factors
§ Researchers have no time for performing DM tasks
§ No motivation to invest time in DM
§ Concerns about losing competitive advantages
Research data management
A series of services that an organization develops and implements through institutionalized data policies, technological infrastructures, and information standards.
Image credit: DataONE best practices, http://www.dataone.org/best-practices
Principle of Infrastructure as a Service (IaaS)
“a standardized, highly automated offering, where compute resources, complemented by storage and networking capabilities are owned and hosted by a service provider and offered to customers on-demand.”
Gartner, “IT glossary,” http://www.gartner.com/it-glossary/infrastructure-as-a-service-iaas/
Nature of an infrastructure
§ Embeddedness. Infrastructure is sunk into, inside of, other structures, social arrangements, and technologies.
§ Transparency. Infrastructure does not have to be reinvented each time or assembled for each task, but invisibly supports those tasks.
§ Reach or scope beyond a single event or a local practice.
§ Learned as part of membership.
§ Links with conventions of practice.
§ Embodiment of standards.
§ Built on an installed base.
§ Becomes visible upon breakdown.
§ Is fixed in modular increments, not all at once or globally.
Star, S.L. & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research, 7(1): 111-134.
Three dimensions of data infrastructure services
Infrastructure
Networks, systems, databases, software tools, data services
What is institutionalization?
Why do you need to institutionalize research data management?
How can you institutionalize RDM?
How much do you know about data and metadata?
How does the nature of data affect metadata?
How does metadata affect data access, sharing, reuse, and long-term preservation?
What are data infrastructure and data infrastructure services?
Why do you need to build a data infrastructure?
What is the key to building a data infrastructure?
Data infrastructure services and research libraries
Research librarianship
Data science
IT management
Data infrastructure services
Data librarianship
Data infrastructure
Library IT
Need more R&D
Building data infrastructure services
http://www.arl.org/storage/documents/publications/2012-hrsym-pres-neal-j.pdf
• To change in composition or structure (what we are/what we do)
• To change the outward form or appearance (how we are viewed/understood)
• To change in character or condition (how we do it)
The keyword for data infrastructure services is:
In summary…
That includes:
• Institutionalizing DM
• Developing and implementing standards for DM
• Developing and implementing data infrastructure