Today’s Research Data Environment
The context for Social Science Data
2
International Polar Year (IPY) experience
3
Data managers’ perspectives of IPY
“A Conceptual Framework for Managing Very Diverse Data for Complex, Interdisciplinary Science” (reading assignment)
“This emphasis on huge data volumes has underplayed another dimension of the fourth paradigm that presents an equally daunting challenge – the diversity of interdisciplinary data and the need to interrelate these data to understand complex problems such as environmental change and its impact.”
National Science Board’s three categories of data collections:
Research collections: project-level data
Resource collections: community-level data
Reference collections: multiple communities
4
Data managers’ perspectives of IPY
“As data managers for IPY, we find that while technology is a critical factor to addressing the interdisciplinary dimension of the fourth paradigm, the technologies developing for exa-scale data volumes are not the same as what is needed for extremely distributed and heterogeneous data. Furthermore, as with any sociotechnical change, the greater challenges are more socio-cultural than technical.”
5
Lessons learned from the IPY
Established a data policy around five data principles: Discoverable, Open, Linked, Useful, Safe
“[M]ust consider the data ecosystem as a whole.”
Need for a “keystone species” in the data ecosystem
6
Lessons learned from the IPY
Data realities:
“data will be highly distributed and housed at many different types of institutions,”
“the use and users of data will be very diverse and even unpredictable,”
“the types, formats, units, contexts and vocabularies of the data will continue to be very complex if not chaotic.”
7
Local research data landscapes
Large data centres for single projects
Project-level repositories (e.g., Islandora)
Institutional and domain repositories
Government agencies with data
Data library services
Researchers without infrastructure
A patchwork of “entities” that are largely unconnected
8
Global research data landscape
Networks of data archives
Inter- and non-governmental organizations with warehouses of data
International social science projects
National and pan-national statistical organizations
A patchwork of “entities” that are loosely connected
9
Data landscape entities
Preservation Function
Columns: Individual Centric, Domain Centric, Institutional Centric
Long-term preservation: Domain archives, Institutional repositories
Short to mid-term preservation: Data warehouses, Data centres, Staging repositories
No preservation responsibilities: Website, FTP site, Research web portals, Data libraries
10
Data landscape entities
Access Function
Columns: Individual Centric, Domain Centric, Institutional Centric
Rows: Long-term access, Short to mid-term access, Immediate access
Entities: Websites, FTP sites, Domain web portals, Data centres, Domain archives, Data libraries, Staging repositories, Institutional repositories, Warehouses
Axis: Sustainability
11
Data repository relationships
“[T]he next step in the evolution of digital repository strategies should be an explicit development of partnerships between researchers, institutional repositories, and domain-specific repositories.”
Ann Green and Myron Gutmann, “Building partnerships among social science researchers, institution-based repositories and domain specific data archives,” OCLC Systems & Services, Vol. 23 (1), pp. 35-53.
12
How does it all fit together?
Diagram: Data centres, Websites, Data libraries and OAIS repositories
13
A research data infrastructure
Diagram: four OAIS repositories
14
Connect data repositories
Diagram: four OAIS repositories
15
Distribute OAIS functions
Diagram labels: AIP, AIP, DIP, SIP
SIP: submission information package
AIP: archival information package
DIP: dissemination information package
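The three package types named on this slide can be illustrated with a minimal sketch. It assumes the usual OAIS reading in which a SIP is ingested into an AIP and a DIP is derived from an AIP on request. The function names (`ingest`, `replicate`, `disseminate`), the field names, and the idea that the diagram's two AIPs represent a copy held at a second repository are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission information package: what a producer hands to the archive."""
    producer: str
    files: Dict[str, bytes]


@dataclass
class AIP:
    """Archival information package: what the archive stores and preserves."""
    identifier: str
    files: Dict[str, bytes]
    provenance: List[str] = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination information package: what a consumer receives."""
    source_aip: str
    files: Dict[str, bytes]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Hypothetical ingest step: turn a submission into an archival package."""
    return AIP(identifier, dict(sip.files), [f"ingested from {sip.producer}"])


def replicate(aip: AIP, site: str) -> AIP:
    """Hypothetical transfer: copy an AIP to another repository and record it."""
    return AIP(aip.identifier, dict(aip.files),
               aip.provenance + [f"replicated to {site}"])


def disseminate(aip: AIP, wanted: List[str]) -> DIP:
    """Hypothetical access step: deliver only the requested files."""
    return DIP(aip.identifier,
               {name: aip.files[name] for name in wanted if name in aip.files})
```

Because each step takes and returns a package, ingest, storage and access could in principle run at different repositories, which is one way to read "distribute OAIS functions".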
16
Share OAIS services
Diagram: four OAIS repositories and a Community Cloud
Shared services: Delivery, Protection, Interpretation, Application, Interoperation, Authentication, Find, Method, Linkage
17
GRDI2020 Digital Science Ecosystem
18
Cyberinfrastructure
19
Data Services and Infrastructure
Data Services
• Local
• Technology
• Social
• Global
Distributed Preservation Backbone
Data Management Plans
Data Citation Training
DataVerse Instance
20
Jim Gray’s e-Science Vision