Http://. Overview of Today’s session – DataCommons@PSU background – Overview of capabilities...

http://www.datacommons.psu.edu


Page 1:

http://www.datacommons.psu.edu

Page 2:

Overview of Today’s session

– DataCommons@PSU background
– Overview of capabilities
– Case studies & data partners
– Findings

Page 3:

Why Develop a DataCommons?

• Data management plans and curation are now a requirement of funding agencies like NSF and NIH.

• Research data issues have been featured in journals such as Science, and have been discussed and supported by international scientific societies such as the Royal Society in the UK (Science as an Open Enterprise) and by organizations such as the European Commission.

Page 4:

Why Develop a DataCommons?

• The issues related to data acquisition, collection, curation, and access have not only become centrally important to funding agencies; they have also been recognized as vital to research, collaboration, and teaching.

• The February 2011 special issue of Science highlights the importance of these issues:

“Scientific innovation has been called on to spur economic recovery; science and technology are essential to improving public health and welfare and to inform sustainability; and the scientific community has been criticized for not being sufficiently accountable and transparent. Data collection, curation, and access are central to all of these issues.”

Furthermore:

“Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”

Page 5:

Science: The State of Research Data

Page 6:

Background on the DataCommons@PSU

• 2005
– Concept first presented by PSIEE as an environmental data library and repository for geospatial information created by PSU faculty, researchers, and collaborators.
– Received a $5K grant from PSU to explore the concept and acquire data.

• 2010
– PSIEE presented this idea to the PSIEE director, the director of the Institute for CyberScience, and the director of the High Performance Computing Center in spring 2010.
– They recognized that this was a common need and that they had similar goals and interests already underway.
– Growing interest and support over the next six months led to a PSU data community meeting sponsored by the Institute for CyberScience and PSIEE in September 2010.

• 2011
– Over the next few months, the DataCommons site and search/retrieval mechanism were developed and tested, and the first new research data was acquired.
– The DataCommons@psu site was officially launched in April 2011 and has grown to include a wide array of data, including geospatial data, tabular data and databases, documents, models, and protocols.

Page 7:

Why is this important to Penn State?

• Access to information is vital to much of the research, teaching, and outreach conducted by the Penn State Community.

• Data also demonstrates research productivity, which is usually represented only in dollars.

• The DataCommons@PSU provides a picture of PSU research.

Page 8:

Purpose

• The purpose of the datacommons@psu is to serve as a portal to data, applications, and resources that support efforts across the Penn State community.

• The datacommons@psu facilitates interdisciplinary collaboration by connecting people and resources through:

– Data Discovery & Access
– Data Archiving and Preservation
– Support of Data Sharing
– Development of Data and Application Documentation (Metadata)
– Support for development of agency-required data management plans
– Metadata development seminars (new)

The datacommons@psu does not replace existing programs or projects but highlights them by making their information and websites/data accessible via the datacommons@psu search engine.

Page 9:

Purpose…

• Highlight data, applications, models, and projects created by members of the university community.

• Support collaboration and data sharing across those efforts and communities.

• Support the development of large scale research proposals and provide the data infrastructure to build research gateways.

• Reduce costs by providing widespread access to data needed by multiple projects and programs, and reduce redundant data acquisition efforts and storage of data (Core Data).

• Enhance the ability to develop research proposals, publish results, and aid in supporting the educational/outreach component of major funders.

• Provide a unifying tool that promotes cooperation and the development of cross college/cross campus initiatives by linking individuals and groups with similar interests and information needs together.

Page 10:

What are other universities doing?

Page 11:

Capabilities

• Data storage
• Metadata development
• Data search, retrieval, and access
• Visualization of compatible data
• Core data
• Documentation and access to apps created by PSU
• Documentation of models and protocols
• Creation of Digital Object Identifiers (DOIs)
• Links to existing data repositories with PSU data
• References and links to publications based on the data

Page 12:

Search Engine & Data Discovery Portal

Enhanced Data Discovery Options

Search by PSU College/Dept/Center/Institute

Page 13:

Enhanced Data Discovery Options

Search by PSU Researcher

Page 14:

Enhanced Data Discovery Options

Search by Research Theme

Page 15:

Search Results

Page 16:

• Researcher: Gabrielle Alpirez de Davie, Education
Data: Validity of *ONET Work Importance Profile web version for Spanish-speaking populations

• Researchers: John Reichendorfer, OPP; Tom Flynn, OPP Landscape
Data: Aerial Photography, Tree Database, PSU vector data

• Researcher: Dennis Decoteau, Horticulture
Data: Ambient air monitoring for Pennsylvania

• Researcher: Marc Abrams, Department of Ecosystem Management
Data: Impacts of contrasting land-use history on composition, soils, and development of mixed-oak, coastal plain forests on Shelter Island, New York

• Researcher: Kim Steiner, Department of Ecosystem Management
Data: Oak Forest Regeneration

• Researcher: Dr. Robert P. Brooks, Geography Department, Riparia
Data: Pocono Birds--Presence & Proportion on Lakes

• Researcher: Dr. Eric Post, Department of Biology
Data: Trophic Mismatch--Caribou Phenology

• Researcher: Andrew Patterson, Huck Life Sciences Institute
Data: Metabolomics, Ant Tissue

Page 17:

Data Summary Page: Downloadable Data

Page 18:

Data Summary Page: App

Page 19:

Apps & Tools

• Links to Data in Application
• Links to Data in Thematic Database

Page 20:

Data Summary Page: Multiple Options

Multiple Data Viewing/Download Options

GIS Enabled Data

Page 21:

Case Studies: Metabolomics Data

• Currently hosting data for the Center for Molecular Toxicology and Carcinogenesis.

• Data and protocols can be accessed from Data Commons and from the Metabolomics Explorer.

Page 22:

Case Studies: Arboretum at Penn State

Collaboration across departments

• DataCommons is working with the PSU Arboretum and OPP to acquire and provide access to data via the DataCommons as well as an interactive application.

• Goals are to provide both access to data and usable apps for the public, for teaching at PSU, and for research.

• The PSU campus as a living lab!

Page 23:

Findings

• Need for long-term preservation.

• Need for a plan to transition or upgrade to new versions of software.

• Need for metadata education.

• Curation of sensitive data.

• Large datasets (video, astronomical observations, remotely sensed data) need to be housed and preserved.

Page 24:

Findings continued…

• Data storage for projects that already provide public access to data but need a centralized permanent home.

• Need for data to be accessed by multiple interfaces.

• Identity can be important to some providers.

• Cross campus workgroups that have common data, platform/software needs but no place to store the data.

Page 25:

Questions?