Reference Frameworks for Assessing Maturity of Earth Science Data Products: Part 2

AGU’s Data Management Assessment Program: Pilot Out-brief

ESIP Summer Meeting, Durham, July 19, 2016

Shelley Stall
AGU Assistant Director, Enterprise Data Management
[email protected]

AGU Data Position Statement: https://sciencepolicy.agu.org/files/2013/07/AGU-Data-Position-Statement-Final-2015.pdf

Data Management Maturity (DMM) Model

The DMM is a process improvement and capability maturity model for the management of an organization’s data assets and corresponding activities. It contains best practices for establishing, building, sustaining, and optimizing effective data management across the data lifecycle, from creation through curation, delivery, maintenance, and preservation.

AGU Data Management Assessment

Tools
•  Data Management Maturity (DMM) process model
•  Assessment and Scoring Methodology

Scope
•  Organizational processes in place that support and manage data assets.

Objective
•  Determine the level of awareness of best practices and to what extent they are performed.
•  Characterize that level in terms of capability and maturity.

AGU DMM Assessments - Completed

#1 USGS ScienceBase – Data Release Team
Viv Hutchison, Drew Ignizio, Michelle Chang, Madison Langseth, Ben Wheeler, Tamar Norkin, Brandon Serna, Tim Kern, Dell Long, Kevin Raney, Sean Pedigo

#2 The Biological and Chemical Oceanography Data Management Office (BCO-DMO) Team
Cyndy Chandler, Peter Wiebe, Bob Groman, David Glover, Danie Kinkade, Shannon Rauch, Molly Dicky Allison, Nancy Copley, Pingyu Qiao, Adam Shepherd, Eric Cunningham

Video of Assessment Out-brief - Jan 2016

Winter ESIP combined session with CDF: https://youtu.be/naSWpQUInqM

ScienceBase Assessment Scope – Data Release Team

•  Targeted the Data Release Team and their challenge of ensuring that the data release capability for USGS was solid and scalable.
•  Did not include the 100s of other project workspaces housed in ScienceBase.

ScienceBase - Data Management Assessment Objectives

•  Establish an objective baseline for data management practices.
•  Ensure that ScienceBase complies with and supports USGS data policies.
•  Ensure that ScienceBase is recognized by publishers as a repository for research data.
•  Ensure that users have confidence in ScienceBase.
•  Ensure that ScienceBase has a strong data release process that
•  Ensure ScienceBase is adequately connecting to the USGS Library resources.

Newly Formed Team – Opportunity to have a common understanding of the work.

ScienceBase Assessment Experience (1 of 2)

•  “The onsite assessment was an extremely engaging process”
•  “Everyone was involved.”
•  “It was safe. We all felt we could say what we needed to say.”
•  “We came to consensus on what we do, how we do it. And. We also did not come to consensus and worked on those outside the assessment.”

ScienceBase Assessment Experience (2 of 2)

•  “Really helped us to get organized. We didn’t have our objectives actually written out and formalized. It’s really key to making progress.”
•  “Organized the documentation we did have.”
•  “Our limited documentation was not a huge failing.”
•  “We thought the DMM was about documentation, but it’s not true.”

ScienceBase Assessment Outcomes (1 of 2)

•  Communication
–  “Finalizing a user agreement – lay out our expectations and what we provide for users.”
–  “Quarterly Update (Newsletter) – Keep stakeholders aware of ScienceBase improvements, tips, updates in general.”
–  “Working on building a user feedback from data providers about their experiences and to improve our process.”

ScienceBase Assessment Outcomes (2 of 2)

•  Metrics - need to keep more granular metrics and use those metrics more effectively.
–  “Significant growth in services is expected as a result of the open data policies”
–  “We need to track [metrics]”
–  “How large are the datasets”
–  “What are the actual costs of managing ScienceBase and working with users.”
–  “Use that information to get funding support”
–  “Advertise capabilities”
–  “Manage Risk”

ScienceBase – Assessment Experience Summary

•  “The DMM enabled us to prioritize our efforts based on what we were already doing. Figure out what was important and what would have impact on our system and our users.”

•  “We have near term and longer term activities [defined in the final report].”

BCO-DMO Assessment Scope

•  Full set of data services as defined in their NSF Grant.

BCO-DMO Data Management Assessment Objectives

•  Pre Mid-Term Review Preparation
•  Training
•  Planning for the Future

BCO-DMO Assessment Experience

•  “Large focus on consensus building”
•  “We do everything in our mission statement pretty well.”
•  “We are fairly confident we are doing a good job.”
•  “We are all on board with the organization consensus process. Sometimes it was painful dragging us all along. We did bring everybody to the same goals. Very valuable.”
•  “Corporate memory sits in the Senior PI and manager brains. We’re lacking in this documentation.”

BCO-DMO – Assessment Experience Summary

•  “For the specific objectives we had, it was really successful.”
•  “We learned where we are strong, where we are deficient, and where we can improve.”
•  “We on-boarded everybody.”
•  “We have a road map now for what we need to do.”

Best Practices for Data Management

3.5 Years of Development
70 Peer Reviewers
25 Process Areas
350+ Practice Statements

Key User Groups

•  Institutions
•  Industry
•  Large Data Facilities and Repositories
•  Small Data Facilities and Repositories
•  Research Teams/Projects

Data Manager, Data Steward, Data Architect, Data Analyst, Data Owner, Researcher, Scientist, Metadata Guru, Data Curator, Program Manager, Principal Investigator, Chief Data Officer, Modeler, Publisher, Statistician, ...and more

DMM Best Practices

Data Management Strategy, Grant Strategy/Business Case, Funding, Data Lifecycle Management, Communications, Data Management Function, Data Profiling & Assessment, Data Cleansing, Curation, Contribution Management, Governance Management, Architectural Approach, Metadata Standards, Open Linked Data, Data Management Platform, Data Archive & Preservation, Disaster Recovery, Data Integration, Interoperability, Data Citation, Data Requirements, Data Quality Strategy, Metadata Management, Vocabulary/Taxonomy/Semantics, Measurement & Analysis, Process Management, Process Quality Assurance, Risk Management, Configuration Management

Data Management Strategy Process Areas: Encompasses process areas designed to focus on development, strengthening, and enhancement of the overall data management program.

•  Data Management Strategy
–  Defines the vision, goals, and objectives for the data management program and ensures that relevant stakeholders are aligned on program priorities, implementation and management.
•  Communications
–  Ensures that policies, progress announcements, and other data management communications are published, enacted, understood, and adjusted based on feedback.
•  Data Management Function
–  Provides guidance for data management leadership and staff to ensure that data is managed as an asset.
•  Grant Strategy/Business Case
–  Provides a rationale for determining which data management initiatives should be funded, and ensures the sustainability of data management by making decisions based on resource considerations and benefits to the organization.
•  Funding
–  Ensures the availability of adequate and sustainable financing to support the data management program.

Data Governance Process Areas: Identifies important data assets, defines and implements processes to manage the assets, and formally manages them throughout the organization.

•  Governance Management
–  Develops the ownership, stewardship, and operational structure needed to ensure that data is managed as a critical asset and implemented in an effective and sustainable manner.
•  Vocabulary/Glossary
–  Supports a common understanding of terms and definitions about structured and unstructured data supporting the community for all stakeholders.
•  Metadata Management
–  Establishes the processes and infrastructure for specifying and extending clear and organized information about the structured and unstructured data assets under management, fostering and supporting data sharing [to include data discoverability, data understandability, data interoperability], ensuring compliant use of data, improving responsiveness to community changes, and reducing data-related risks.

Data Quality Process Areas: Defines a collaborative approach for receiving, assessing, cleansing, and curating data to ensure fitness for intended use in the scientific community. This includes ensuring metadata content and standards are met, data submissions are complete, and data is accessible at the right time.

•  Data Quality Strategy
–  Defines an integrated, organization-wide strategy to achieve and maintain the level of data quality required to support the organization’s goals and objectives. Where data quality guidelines are defined at a domain or community level, the strategy incorporates that compliance.
•  Data Profiling
–  Develops an understanding of the content, quality, and rules of a specified set of data under management.
–  This is the first step taken when a new data set is being reviewed. It provides a basic quantitative understanding. For example, profiling can report the types or number of distinct values in a column; the number or percent of zero, blank, or null values; string lengths; date ranges; and data patterns.
•  Data Quality Assessment
–  Provides a systematic approach to measure and evaluate data quality according to processes, techniques, and against data quality rules.
•  Data Cleansing and Curation
–  Defines the mechanisms, rules, processes, and methods to validate and correct data (and metadata) as appropriate.
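
The profiling measures listed under Data Profiling can be illustrated with a small sketch. This is not part of the DMM or of any AGU tooling; the function names (`profile_column`, `_pattern`, `_as_date`), the sample values, and the choice to treat empty strings and `None` as missing are all assumptions made for illustration, using only the Python standard library.

```python
from collections import Counter
from datetime import date


def _pattern(text):
    """Reduce a string to a shape: digits -> 9, letters -> A, others kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in text
    )


def _as_date(text):
    """Return a date for ISO-formatted text (YYYY-MM-DD), else None."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


def profile_column(values):
    """Summarize one column of raw string values (None means missing)."""
    total = len(values)
    # Assumption: blank strings and None both count as missing values.
    missing = [v for v in values if v is None or v.strip() == ""]
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    dates = [d for d in map(_as_date, present) if d is not None]
    lengths = [len(v) for v in present]
    return {
        "count": total,
        "distinct": len(set(present)),
        "missing": len(missing),
        "missing_pct": round(100 * len(missing) / total, 1) if total else 0.0,
        "zeros": sum(1 for v in present if v in ("0", "0.0")),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "date_range": (min(dates), max(dates)) if dates else None,
        "patterns": Counter(_pattern(v) for v in present).most_common(3),
    }


# Hypothetical column of sampling dates with typical defects.
sample_dates = ["2015-06-01", "2015-06-03", "", None, "2015-06-03", "n/a"]
report = profile_column(sample_dates)
for key, value in report.items():
    print(f"{key}: {value}")
```

On the sample column, the pattern summary separates the well-formed dates from the stray `n/a` entry, which is the kind of finding a profiling pass would hand to the assessment and cleansing steps.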

Data Operations Process Areas: Ensures data requirements are fully specified and data is traceable with documented provenance, manages data changes, and manages data contributions.

•  Data Requirements Definition
–  Ensures the data submitted and accessed by the scientific community will satisfy organizational objectives, is understood by all relevant stakeholders, and is consistent with the processes that receive, curate and make data discoverable and accessible.
•  Data Lifecycle Management
–  Ensures that the organization understands, maps, inventories, and controls its data flows through processes throughout the data lifecycle from creation or acquisition to curation, archive, preservation and access.
•  Contribution / Provider Management
–  Optimizes internal and external contribution of data to satisfy organizational requirements and to manage data access agreements consistently.

Platform & Architecture: Ensures the implemented data management platform successfully integrates, archives, and preserves data assets to support the organization and/or scientific community objectives.

•  Architectural Approach
–  Designs and implements an optimal data layer that enables the acquisition, curation, storage, archive, preservation, and access of data to meet organizational and technical objectives.
•  Architectural Standards
–  Provides an approved set of expectations for governing architectural elements supporting approved data representations, data access, and data distribution, fundamental to data asset control and the efficient use and exchange of information.
•  Data Management Platform
–  Ensures that an effective platform is implemented and managed to meet organizational needs.
•  Data Integration
–  Reduces the need for the organization to obtain data from multiple sources, and improves data availability for organizational processes that require data consolidation and aggregation, such as analytics.
•  Data Archiving and Preservation
–  Ensures that data maintenance will satisfy organizational and federal requirements for scientific research data availability, and that legal and regulatory requirements for data archiving, preservation and disaster recovery of data are met.

Supporting Processes: Foundational processes that support adoption, execution, sustainment, and improvement of data management processes.

•  Measurement and Analysis
–  Develop and sustain a measurement capability and analytical techniques to support managing and improving data management activities.
•  Process Management
–  Establish and maintain a usable set of organizational process assets, and plan, implement, and deploy organizational process improvements informed by the business goals and objectives and the current gaps in the organization’s processes.
•  Process Quality Assurance
–  Provide staff and management with objective insight into process execution and the associated work products.
•  Risk Management
–  Identify and analyze potential problems in order to take appropriate action to ensure objectives can be achieved.
•  Configuration Management
–  Establish and maintain the integrity of the operational environment using configuration identification, control, status accounting, and audits.

AGU Data Management Program: http://dataservices.agu.org/dmm/

Contact Information:

Shelley Stall [email protected]

AGU Data Management Program: http://dataservices.agu.org/dmm/
