Data Management Plan - Mississippi EPSCoR Program · Data Management Plan EPS 0903787 ... Git, and...

Data Management Plan EPS 0903787

1 Overview

The Mississippi EPSCoR RII project entitled “Modeling and Simulation of Complex Systems” is supported by the National Science Foundation. The project includes researchers and educators from Mississippi’s four research institutions: Jackson State University (JSU), Mississippi State University (MSU), the University of Mississippi (UM), including the University of Mississippi Medical Center (UMMC), and the University of Southern Mississippi (USM). Participants from other state institutions of higher learning are also involved. This data management plan specifies how the data, tools, and associated metadata produced by the project will be managed and secured throughout the life of the project, shared with other researchers and the public, and preserved after the project ends.

The members of the Cyberinfrastructure/Data Management Committee and the EPSCoR RII Management Team (Table 1) developed this plan with input from EPSCoR research faculty and from Dr. N. Radhakrishnan, a liaison from our National Advisory Board. The format of the plan is based on the Data Management Planning Guide published by Cornell University (https://confluence.cornell.edu/display/rdmsgweb/services/) and on NSF's data management requirements, as described in NSF's information on Dissemination and Sharing of Research Results (http://www.nsf.gov/bfa/dias/policy/dmp.jsp).

Table 1: Committee members responsible for developing and implementing the Data Management Plan.

Cyberinfrastructure/Data Management Committee:
Jason Hale, University of Mississippi (chair)
Dawn Wilkins, University of Mississippi
Brian Hopkins, Director of the Mississippi Center for Supercomputing Research
Yogi Dandass, Mississippi State University
Natarajan Meghanathan, Jackson State University
Chaoyang “Joe” Zhang, University of Southern Mississippi

EPSCoR RII Management:
Sandra Harpole, Project Director
Teresa Gammill, Project Administrator
Katie Echols, Education and Outreach Coordinator
Susan Bridges, Science Coordinator

The components of this data management plan map very closely to the NSF-wide data management plan instructions in the Grant Proposal Guide (GPG) as shown in Table 2.

Table 2: Mapping of NSF data management components to MS EPSCoR data management components. (Adapted from: https://confluence.cornell.edu/display/rdmsgweb/services/)  

NSF component | MS EPSCoR component(s)

The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project | Description

The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies) | Content and Format

Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements | Protection and Intellectual Property; Access

Policies and provisions for re-use, re-distribution, and the production of derivatives | Protection and Intellectual Property

Plans for archiving data, samples, and other research products, and for preservation of access to them | Preservation and Transfer of Responsibility

2 Description of Data

This project will produce public-use data and tools for the study of modeling and simulation of complex systems. Data include experimental data and metadata used to construct and validate models; the models, their parameters, and other metadata; and model outputs. Tools include stand-alone and web-based tools for processing data inputs, tools for building and executing models, and tools for analyzing and visualizing model outputs. Quantitative survey data and curricular materials will also be generated. This plan requires that data be released through a restricted-use website and, when appropriate, deposited in the standard repositories for each discipline. The Cyberinfrastructure/Data Management Committee created a web-based survey to identify the types of data and computational tools being created by the investigators and to inform planning on how best to manage, protect, and share those data and tools for maximum impact. For this plan, “this project” should be interpreted to mean “any active Mississippi NSF Track I RII EPSCoR project.” A copy of the survey can be found in Appendix 1.

2.1 Key EPSCoR Data Repositories

The major repository for EPSCoR data will be the Mississippi Center for Supercomputing Research (MCSR) at UM. The Provenance, Archival, and Retrieval System (PARS), which is being developed as part of this project, will be the Web interface to this repository and will also reside at UM. The MS EPSCoR Web site (http://www.msepscor.org) will serve as a portal to the broader scientific community and the public. In addition to MCSR at UM, other EPSCoR universities maintain computing and archival facilities for their EPSCoR researchers. These facilities will be used for short-term storage of data, models, and metadata.

Many EPSCoR researchers at MSU participate in centers that are part of the High Performance Computing Collaboratory (HPC²). These centers include the Center for Advanced Vehicular Systems (CAVS), the Institute for Genomics, Biocomputing, and Biotechnology (IGBB), and the Center for Computational Sciences (CCS). Disk storage at HPC² is provided by high-performance RAID-enabled disk systems, including a Sun Microsystems-based SAN storage system, with a total of approximately 600 terabytes of disk space available to HPC² researchers. In addition to the online disk space, a Sun StorageTek SL8500 tape library provides 9 petabytes of near-line storage capacity. Data written to disk are automatically backed up to tape shortly afterward (currently within about 30 minutes). Full archival backups are performed on the first Saturday of each month.

The HumMod project at the University of Mississippi Medical Center uses GitHub, a web-based hosting service, to manage source code and distributions. GitHub combines the Git version control system with social networking features. Git is a distributed version control system: every user holds a complete copy of the repository, and merges between repositories and the resolution of conflicts are handled in a distributed manner. The “social coding” model supported by GitHub allows open-source projects to be “forked”: a user creates a separate copy of the repository under his or her own ownership and can then extend the code independently of the original. Each fork also provides its own support features, including issue tracking, branching, web page hosting, and package distribution. Because GitHub provides backup and version control, and because of the size of the data, the HumMod data stored at GitHub are not also stored at MCSR.
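The HPC² rule that full archival backups run on the first Saturday of each month is the kind of calendar logic a backup scheduler has to encode. The following is a minimal Python sketch of that computation; `first_saturday` is a hypothetical helper for illustration, not part of the HPC² backup software.

```python
import calendar
import datetime


def first_saturday(year: int, month: int) -> datetime.date:
    """Return the date of the first Saturday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0; calendar.SATURDAY is 5.
    offset = (calendar.SATURDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


if __name__ == "__main__":
    # List the monthly full-backup dates for one example year.
    for month in range(1, 13):
        print(first_saturday(2012, month).isoformat())
```

The modulo keeps the offset between 0 and 6, so a month that begins on a Saturday returns its first day.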
EPSCoR researchers in the School of Computing at USM use two Dell PowerEdge M710 blade servers to support research in computational biology and bioinformatics. Both servers are backed up nightly to a system of fifteen 1 TB SATA hard drives arranged in a RAID 6 array with two hot spares.

EPSCoR researchers at JSU are primarily in the Department of Chemistry and the Department of Biological Sciences. Both departments provide cluster computing facilities with large amounts of disk space for storing running and completed jobs. Backups are performed once per week.

2.2 Scientific Datasets

We will produce between 15 and 160 terabytes of scientific data from approximately 40 data sources within the general categories of computational simulation data, observational data, and input/output data from wet lab experiments. Ninety percent of these data sources will be small enough to fit on a commodity desktop hard drive; only four are estimated to require more than 1 terabyte each.

Appendix 2 contains supplementary tables summarizing the results of the survey.
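As a rough cross-check of the 15 to 160 TB estimate, the maximum sizes of the large data sources listed in Supplemental Table 1 (Appendix 2) can be totaled and compared with the stated range. The sketch below assumes only those published figures; the function names are illustrative.

```python
# Maximum sizes (TB) copied from Supplemental Table 1 in Appendix 2.
LARGE_SOURCES_TB = {
    "RNA-Seq Reads (Nanduri)": 30,
    "Protein & Ligand CompChem Data (Doerksen)": 12,
    "CuO Proteomics Data (Edelmann)": 30,
    "Molecular Dynamics Trajectories (Wadkins)": 10,
    "Spectroscopic Data (Hammer)": 5,
}

# Range for all scientific data stated in Section 2.2.
ESTIMATE_LOW_TB = 15
ESTIMATE_HIGH_TB = 160


def total_size_tb(sizes):
    """Sum the maximum sizes (TB) of the listed data sources."""
    return sum(sizes.values())


def within_estimate(total_tb, low=ESTIMATE_LOW_TB, high=ESTIMATE_HIGH_TB):
    """True if a total falls inside the plan's stated range."""
    return low <= total_tb <= high


if __name__ == "__main__":
    total = total_size_tb(LARGE_SOURCES_TB)
    print(f"Listed large sources: {total} TB; within estimate: {within_estimate(total)}")
```

With the values in Supplemental Table 1, the listed maxima sum to 87 TB, which lies within the 15 to 160 TB range.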

2.2.1 Computational Chemistry (CompChem)

Computational chemistry datasets typically consist of a few very common file types, most notably ASCII plain-text files and XML files. These files are portable between computers of all types and require no special readers or specific software versions to access. A well-organized computational chemistry dataset can therefore be retrieved and read by other scientists on virtually any computer.

2.2.2 Computational Biology (CompBio)

The computational biology focus group produces experimental data, models, and tools. Table 3 lists the major types of data produced, the standard formats used to store and disseminate these data and associated metadata, and the repositories where they will be distributed.

Table 3: Repositories for experimental data for Computational Biology.

Data Type | Format | Repository

Proteomics datasets | PRIDE format | PRIDE database maintained by EMBL-EBI (http://www.ebi.ac.uk/pride/)

Microarray datasets | MIAME-compliant data format (http://www.ncbi.nlm.nih.gov/geo/info/MIAME.html) | NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)

RNA-Seq and ChIP-Seq datasets | Raw data formatted using the Sequin program (http://www.ncbi.nlm.nih.gov/genbank/eukaryotic_genome_submission/) | Sequence Read Archive (SRA) at NCBI (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?)

2.2.3 Biological Systems Simulation (BioSim)

BioSim will produce two major types of data: (1) computational fluid dynamics (CFD) models, including their inputs and outputs, and (2) the HumMod physiological model and associated data. CFD meshes will be stored in the standard ANSYS Fluent .MSH and .CAS formats. Storing meshes in .OMSH, a new open format developed in-house, will also be investigated. CFD data sets will be stored in the standard ANSYS Fluent .DAT format, and some future-oriented data may also be stored in the Loci-CHEM .VOG format. All mesh and data sets will be stored in both ASCII and binary formats. The HumMod model and associated data are stored on GitHub, as described in Section 2.1.
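Keeping both ASCII and binary copies trades readability for compactness. The sketch below does not implement the ANSYS Fluent .MSH/.CAS/.DAT or Loci-CHEM .VOG formats; it assumes a hypothetical layout of three double-precision coordinates per mesh node, to show that both encodings can reproduce the values exactly when the ASCII writer emits enough digits.

```python
import struct

Node = tuple[float, float, float]  # hypothetical x, y, z coordinates of one mesh node


def to_ascii(nodes: list[Node]) -> str:
    """One node per line; repr() gives the shortest text that parses back to the same float."""
    return "\n".join(" ".join(repr(c) for c in node) for node in nodes)


def from_ascii(text: str) -> list[Node]:
    return [tuple(float(v) for v in line.split()) for line in text.splitlines()]


def to_binary(nodes: list[Node]) -> bytes:
    """Three little-endian IEEE 754 doubles (24 bytes) per node."""
    return b"".join(struct.pack("<3d", *node) for node in nodes)


def from_binary(data: bytes) -> list[Node]:
    return [tuple(values) for values in struct.iter_unpack("<3d", data)]


if __name__ == "__main__":
    nodes = [(0.0, 0.1, 0.2), (1.5e-3, -2.25, 3.0)]
    text, blob = to_ascii(nodes), to_binary(nodes)
    # Both copies reproduce the original values exactly.
    assert from_ascii(text) == nodes and from_binary(blob) == nodes
    print(f"{len(text.encode())} bytes as ASCII, {len(blob)} bytes as binary")
```

The binary copy has a fixed size per node, while the ASCII copy can be inspected with any text editor.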

2.3 Project Management Data

Project management data are stored in a secure web-based database at the Mississippi State University Office of Research and Economic Development. A limited-access project management system has been developed to collect and archive project information. Appropriate sections, such as publications, will be posted on the Mississippi EPSCoR website. Project management documents and work products will be created in standard office formats such as Word, PDF, and Excel.

2.4 Computational Tools

We estimate that approximately 15 computational tools will be developed, many of which will be shared with a variety of potential user groups. Source code, instructions for generating executables, detailed documentation, and user manuals will be deposited in PARS. Instructions for accessing the tools (source code or binary code), including metadata and links to download sites, will be made available on the MS EPSCoR Web page beginning at the end of the fourth project year and will remain available for at least three years after project completion.

3 Data Protection and Preservation

3.1 Sensitive Data Protection

Informed consent statements will use language that allows the data to be shared with the research community when appropriate. The research activities envisioned present minimal risk to human subjects. Data collection activities that require IRB approval will be conducted according to the established IRB protocols of the investigator’s home institution and will comply with all federal regulations. During data analysis, identifying information in sensitive data will be accessible only to certified members of the project team. The responsible investigators will remove all direct identifiers from the data before depositing the data on any server, including PARS, MCSR, HPCC, the MS EPSCoR Web site, or any institutional, inter-institutional, or public data server or repository. Where identifiable sensitive information must be stored temporarily, it will be encrypted and will meet all compliance requirements of the responsible investigator’s institution. The EPSCoR Steering Committee and the Science Coordinator will ensure that each investigator implements the IRB-approved protocol. We do not anticipate collecting any data that require IACUC approval.
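The requirement to remove direct identifiers before depositing data can be illustrated for tabular (CSV) data. The sketch below assumes hypothetical identifier column names; the actual list comes from each study's IRB-approved protocol, and dropping columns addresses only direct identifiers, not every re-identification risk.

```python
import csv
import io

# Hypothetical direct-identifier column names; the real list comes from each
# study's IRB-approved protocol.
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone", "student_id"})


def remove_direct_identifiers(csv_text: str, identifiers=DIRECT_IDENTIFIERS) -> str:
    """Return a copy of a CSV table with direct-identifier columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [f for f in (reader.fieldnames or []) if f.strip().lower() not in identifiers]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the identifier fields from each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    raw = "name,email,score\nA. Student,a@example.edu,88\n"
    print(remove_direct_identifiers(raw), end="")
```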
3.2 Data Archival, Backup and Recovery

By the end of the third project year, and for at least three years after the end of the project period, each data source will be archived at least semi-annually to a location outside the building where the data are primarily kept and used. To meet this requirement, the Provenance, Archival, and Retrieval System (PARS) will be made available as the default archival system for digital scientific data and tools created on this project, and as the default mechanism for sharing those data and tools within the project and with named, approved collaborators outside the project (see Section 4.1, Data Sharing). Investigators who “opt out” of PARS must submit plans to, and receive approval from, the Steering Committee to use alternative local, institutional, or national repositories for archiving their data.

In April 2012, each investigator will receive training on how to archive scientific data using PARS and will be given a PARS account. Through the PARS Web interface, investigators will be able to upload files of any format, organize those files in folders, and share files and folders with other PARS account holders, granting read and update permissions to individuals and groups as needed. When additional collaborators at any of the participating institutions request PARS accounts, those accounts will be approved automatically based on the domain suffix of the requester’s e-mail address (e.g., @olemiss.edu, @msstate.edu, etc. for each of the participating institutions). Requests for PARS accounts from other new collaborators will be approved or denied by the Steering Committee. Account and password policies will be enforced in the PARS system (a Unix server with a Web interface).

Unless other archival and sharing mechanisms have been arranged and approved, each investigator responsible for creating datasets and tools will deposit the latest versions of those data and tools into PARS twice per year, upon publication of the research results, or upon departure from the EPSCoR project or the participating institution. Only EPSCoR data will be archived within PARS.
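The PARS account rule, automatic approval by e-mail domain suffix for participating institutions and Steering Committee review for everyone else, can be sketched as a small routing function. This is a hypothetical illustration, not PARS code; only the two domains named in this plan are listed, and the complete list is an assumption left to the PARS administrators.

```python
# Domain suffixes taken from the examples in Section 3.2; the complete list for
# all participating institutions is an assumption left to the PARS administrators.
PARTICIPATING_DOMAINS = ("olemiss.edu", "msstate.edu")


def route_account_request(email: str, domains=PARTICIPATING_DOMAINS) -> str:
    """Return "auto-approve" for participating-institution addresses and
    "steering-committee" for every other request."""
    if email.count("@") != 1:
        return "steering-committee"  # malformed address: send to manual review
    domain = email.split("@")[1].strip().lower()
    for suffix in domains:
        # Accept the domain itself or a subdomain (e.g., cs.msstate.edu), but
        # not look-alikes such as notolemiss.edu.
        if domain == suffix or domain.endswith("." + suffix):
            return "auto-approve"
    return "steering-committee"
```

Matching on a dot-prefixed suffix, rather than a bare `endswith`, prevents a look-alike domain from being approved automatically.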

The backend of the PARS system will be a multi-terabyte digital storage system housed at the Mississippi Center for Supercomputing Research in the Data Center on the University of Mississippi campus. As with all systems in the UM Data Center, this storage system is monitored around the clock by skilled computer operators and maintained by professional system administrators. In addition, the hardware and the data stored on it will be protected by a variety of facility safeguards, including adequate cooling capacity; an uninterruptible power supply (UPS) to protect against power fluctuations; a backup generator system to protect against longer outages; and a 24-hour electronic passkey system to prevent unauthorized access to the building and machine room. Data stored on the system will be protected by weekly incremental backups and monthly full backups. This backend system is online and available now, with an initial allocation of 2 TB of disk space. The disk array will be expanded as needed and may grow to as large as 90 to 100 TB over the life of the project.
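With weekly incremental and monthly full backups, restoring data as of a given date requires the most recent full backup on or before that date plus every incremental taken after it. The sketch below assumes each incremental records changes since the previous backup of any kind; the `Backup` type and `restore_chain` function are hypothetical, not MCSR tooling.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    taken: datetime.date
    kind: str  # "full" or "incremental"


def restore_chain(backups: list[Backup], target: datetime.date) -> list[Backup]:
    """Backups needed to restore data as of `target`: the latest full backup
    on or before `target`, then every later incremental up to `target`."""
    eligible = sorted((b for b in backups if b.taken <= target), key=lambda b: b.taken)
    fulls = [b for b in eligible if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target date")
    base = fulls[-1]
    incrementals = [b for b in eligible if b.kind == "incremental" and b.taken > base.taken]
    return [base] + incrementals


if __name__ == "__main__":
    d = datetime.date
    history = [Backup(d(2012, 5, 1), "full"), Backup(d(2012, 5, 8), "incremental"),
               Backup(d(2012, 5, 15), "incremental"), Backup(d(2012, 6, 1), "full")]
    for b in restore_chain(history, d(2012, 5, 20)):
        print(b.taken, b.kind)
```

The longer the gap since the last full backup, the more incrementals must be applied, which is one reason full backups are taken monthly.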

Access to the data in the storage array will be mediated by the front end of the PARS system. Researchers at non-UM campuses will have high-speed network connections to the PARS archive via the Mississippi Optical Network (MissiON), a 10 Gbps managed-wave network connecting all of Mississippi’s research institutions.
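A back-of-the-envelope calculation shows what a 10 Gbps link means for the larger datasets. The sketch uses decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s); the `efficiency` parameter is a hypothetical knob for the fraction of line rate actually achieved, since real transfers are slowed by protocol overhead and disk throughput.

```python
def transfer_hours(size_tb: float, link_gbps: float = 10.0, efficiency: float = 1.0) -> float:
    """Estimated hours to move size_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    `efficiency` is an assumed fraction of line rate actually achieved;
    1.0 gives the theoretical lower bound.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 30 TB matches the largest entries in Supplemental Table 1 (Appendix 2).
    for eff in (1.0, 0.5):
        print(f"30 TB at {eff:.0%} of 10 Gbps: {transfer_hours(30, efficiency=eff):.1f} h")
```

At the full line rate, 30 TB is 2.4 × 10^14 bits, which takes 24,000 seconds (about 6.7 hours) at 10^10 bits/s; the initial 2 TB allocation would take about 27 minutes.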

3.3 Boilerplate Language for IRB Approval Applications

Access to sensitive data [defined as data for which “any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation.” 45 CFR §46.101(b)(2)(ii)] will be restricted to the investigator and those team members who require access to conduct or analyze the research. The investigator will ensure that all electronically stored copies of these data are encrypted on every storage device. The MS EPSCoR Provenance, Archival, and Retrieval System (PARS) at UM/MCSR will be used only to store non-sensitive or de-identified data for this research, and the investigator will ensure that all sensitive data are removed before data are stored in PARS or any other shared repository.

4. Access

4.1 Data Sharing

Subsets of MS EPSCoR data will be shared with other MS EPSCoR researchers; with other researchers and collaborators, K-12 teachers, and other government agencies or private entities on a request/approval basis; and/or with the general public and scientific community. Planning is currently underway to share ten of these data sources. By the end of project year 4, specific plans will be developed for sharing all data sources as appropriate (e.g., where sharing does not infringe upon intellectual property rights), and by the end of project year 5, these data sharing plans will have been implemented.

For data sources whose audience is the general public, the MS EPSCoR Web page will be the default vehicle for data sharing. For data sources to be shared within the EPSCoR community or with specifically approved collaborators or partners, PARS will be the default sharing tool. Investigators who plan to use other sharing mechanisms must have those plans approved by the Steering Committee. A central clearinghouse web page will be established on the MS EPSCoR Web site identifying the datasets that are available, the sharing mechanism for each, and the metadata needed for sharing, with links to external sites, as needed, where the data can be requested, accessed, or downloaded. This clearinghouse will be initiated by the end of project year 3 and completed by the end of project year 4.
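The default-sharing rules in Section 4.1 reduce to a small decision: public audiences use the MS EPSCoR Web page, the EPSCoR community and approved collaborators use PARS, and any other mechanism applies only once the Steering Committee has approved it. A minimal sketch follows; the `Audience` enum and the `approved_alternative` parameter are hypothetical names for illustration.

```python
from enum import Enum
from typing import Optional


class Audience(Enum):
    GENERAL_PUBLIC = "general public"
    EPSCOR_COMMUNITY = "MS EPSCoR community"
    APPROVED_COLLABORATOR = "approved collaborator or partner"


def default_sharing_mechanism(audience: Audience,
                              approved_alternative: Optional[str] = None) -> str:
    """Pick the sharing vehicle described in Section 4.1.

    `approved_alternative` stands for a mechanism the Steering Committee has
    already approved (hypothetical parameter); it overrides the default.
    """
    if approved_alternative is not None:
        return approved_alternative
    if audience is Audience.GENERAL_PUBLIC:
        return "MS EPSCoR Web page"
    # EPSCoR community and approved collaborators/partners default to PARS.
    return "PARS"
```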

4.2 Intellectual Property

Any potential intellectual property will be handled in accordance with the policies of the technology transfer office at each participating university and will be inventoried by the EPSCoR project management team.

5. Preservation and Transfer of Responsibility

MCSR and other designated repositories will maintain archival data for three years after the termination of the project. At that point, responsibility for data preservation will return to the individual investigators.
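The three-year retention period is a simple date calculation, with one edge case: a project ending on February 29. A minimal sketch follows; `retention_end` is a hypothetical helper, and the end dates used with it are placeholders rather than the project's actual termination date.

```python
import datetime


def retention_end(project_end: datetime.date, years: int = 3) -> datetime.date:
    """Date on which the archival retention period ends: `years` after project end.

    A February 29 end date maps to February 28 when the target year is not a
    leap year.
    """
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        return project_end.replace(year=project_end.year + years, day=28)
```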

Appendix 1 Data Survey of EPSCoR Researchers  

   

Appendix 2 Summary of Survey Results

The tables below summarize the results of the EPSCoR data survey.

Supplemental Table 1: Large Data Sets

Data Source | Max Size | Planned Storage Location | Planned Backup Location
RNA-Seq Reads (Nanduri) | 30 TB | Institutional Server & Public Repository | MCSR
Protein & Ligand CompChem Data (Doerksen) | 12 TB | MCSR | MCSR
CuO Proteomics Data (Edelmann) | 30 TB | Institutional Server | On-Site, Separate Building
Molecular Dynamics Trajectories (Wadkins) | 10 TB | MCSR | MCSR
Spectroscopic Data (Hammer) | 5 TB | Work computer | On-Site, In Building

   

Supplemental Table 2: Types of Data

Type of Data | Number
Input/Output of Computational Simulations | 14
Observational Data | 11
Input/Output from Wet Lab Experiments | 10
Data Compiled from Other Public or Private Sources | 3
Data Resulting from Manipulation/Processing of Raw Data | 1
Total | 39

Supplemental Table 3: Data Sources

Institution | Focus group (BioSim, CompBio, CompChem, Ed. & Outreach) | Data Set Description | Data type (Simulation I/O, Observational Data, Compilations, Web, Lab I/O, Processed Data)
USM | X X | HumMod Simulation Results (J. Chen) | X
USM | X X | Experimental Data (J. Chen) | X
USM | X X | Source Code (J. Chen) | X
MSU | X | RNA-seq reads (Nanduri) | X
MSU | X | Proteomic (DDF-MuDPIT) data (Nanduri) | X
UM | X | Spectroscopic Data (Hammer) | X
MSU | X X | Raw chemical sensor readings (Misna) | X
MSU | X X | Processed chemical sensor readings (Misna) | X
UM | X X | Recombinant Clones Seq Data (Liljegren) | X
UM | X X | Microscope Images (Liljegren) | X
UM | X X | Proteomics data (Liljegren) | X
MSU | X | CompChem output (Gwaltney) | X
UM | X | Quantum Chem output (Tschumper) | X
UMMC | X X | Physiological Simulation Responses (Hester) | X
UM | X | Redox-sensitive drug release (Jo) | X
UM | X | Electrochemistry data (Ritchie) | X
UM | X | Observation Notebooks (Hollis) | X
USM | X | Instructional Materials (Herron) | X
USM | X | Student Test data (Herron) | X
MSU | X | Teacher Workshop Evaluations (Echols) | X
UM | X | Protein Kinase Data (Doerksen) | X
UM | X | Protein & Ligand CompChem Data (Doerksen) | X
MSU | X | DNA & Amino Acid Seq Data (Hoffman) | X
MSU | X | Raman Spectra Data spreadsheet (Zhang) | X
USM | X | Full/Deep RNA transcripts (McLean) | X
USM | X | mRNA sequence transcripts (McLean) | X
MSU | X | Mined Gene Ontologies (Bridges) | X
MSU | X | Epigenetics Analysis Data (Bridges) | X
MSU | X | CFD mesh of lung components (Burgreen) | X
MSU | X | Viral protein Interaction Model Data (Meyer) | X
MSU | X | Viral data proteomic data (Meyer) | X
MSU | X | CuO Proteomics Data (Edelmann) | X
MSU | X | Cytotoxicity Assay (Edelmann) | X
MSU | X | Geometrical Airway Descriptions (Thompson) | X
MSU | X | CFD mesh for Airways (Thompson) | X
MSU | X | Upper (Thompson) | X
UM | X | Molecular Dynamics Trajectories (Wadkins) | X
UM | X | Protein Sequence Alignments (Wadkins) | X
UM | X | Mass Spec Data (Dass) | X

Supplemental Table 4: Sensitive Datasets and Compliance Responsibilities

Data Source | Responsible Investigator | Responsible Compliance Office | Responsible Scientific Lead
HumMod Simulation Results | J. Chen | USM IRB | BioSim Focus Group Leader (K. Walters)
Experimental Data | J. Chen | USM IRB | BioSim Focus Group Leader (K. Walters)
RNA-Seq Reads | B. Nanduri | MSU IRB | CompBio Focus Group Leader (R. Isokpehi)
Proteomic (DDF-MuDPIT) Data | B. Nanduri | MSU IRB | CompBio Focus Group Leader (R. Isokpehi)
Instructional Materials | S. Herron | USM IRB | Ed/Outreach Coordinator (K. Echols)
Student Test Data | S. Herron | USM IRB | Ed/Outreach Coordinator (K. Echols)
Teacher Workshop Evaluations | K. Echols | MSU IRB | Ed/Outreach Coordinator (K. Echols)

Supplemental Table 5: Data Sharing Potential

Name | MS EPSCoR researchers | Other researchers | K-12 teachers | College students, instructors | Gov. agencies or private entities | The public

HumMod Simulation Results (J. Chen) X X X X X X

Experimental Data (J. Chen) X X X X X X

Source Code (J. Chen) X X X X X X

RNA-seq reads (Nanduri) X X

Proteomic (DDF-MuDPIT) data (Nanduri) X X

Spectroscopic Data (Hammer)

Raw chemical sensor readings (Misna) X X

Processed chemical sensor readings (Misna) X X

Recombinant Clones Seq Data (Liljegren)

Microscope Images (Liljegren) X X

Proteomics data (Liljegren) X X

CompChem output (Gwaltney) X X X X

Quantum Chem output (Tschumper) X X

Physiological Simulation Responses (Hester) X X X X

Redox-sensitive drug release (Jo) X X X

Electrochemistry data (Ritchie) X

Observation Notebooks (Hollis) X

Instructional Materials (Herron) X X X X X

Student Test data (Herron) X X X X X X

Teacher Workshop Evaluations (Echols) X X

Protein Kinase Data (Doerksen) X X

Protein & Ligand CompChem Data (Doerksen) X X X X

DNA & Amino Acid Seq Data (Hoffman) X X

Raman Spectra Data spreadsheet (Zhang) X X

Full/Deep RNA transcripts (McLean) X X X

~400bp mRNA sequence transcripts (McLean) X X X X

Mined Gene Ontologies (Bridges) X X X

Epigenetics Analysis Data (Bridges) X X X

CFD mesh of lung components (Burgreen) X X X X X

Viral protein Interaction Model Data (Meyer) X

Viral data proteomic data (Meyer) X

CuO Proteomics Data (Edelmann) X X X X

Cytotoxicity Assay (Edelmann) X X X X X

Geometrical Airway Descriptions (Thompson) X X X X X

CFD mesh for Airways (Thompson) X X X X X

Upper (Thompson) X X X X X

Molecular Dynamics Trajectories (Wadkins) X X X X X

Protein Sequence Alignments (Wadkins) X X X X X

Mass Spec Data (Dass)

Supplemental Table 6: Data Sharing Approaches and Early Plans

Name | Secure SFTP | Portable media | Web interface | Desktop client | I don't know | Web | E-Mail | HumMod Repos. | NCBI | GitHub

HumMod Simulation Results (J. Chen) X X

Experimental Data (J. Chen) X

Source Code (J. Chen) X

RNA-seq reads (Nanduri) X

Proteomic (DDF-MuDPIT) data (Nanduri) X

Spectroscopic Data (Hammer)

Raw chemical sensor readings (Misna) X

Processed chemical sensor readings (Misna) X

Recombinant Clones Seq Data (Liljegren) X

Microscope Images (Liljegren) X

Proteomics data (Liljegren) X

CompChem output (Gwaltney) X X

Quantum Chem output (Tschumper) X X

Physiological Simulation Responses (Hester) X X X

Redox-sensitive drug release (Jo) X X

Electrochemistry data (Ritchie) X

Observation Notebooks (Hollis) X X X

Instructional Materials (Herron) X

Student Test data (Herron) X

Teacher Workshop Evaluations (Echols) X

Protein Kinase Data (Doerksen) X X

Protein & Ligand CompChem Data (Doerksen) X

DNA & Amino Acid Seq Data (Hoffman) X

Raman Spectra Data spreadsheet (Zhang) X X X

Full/Deep RNA transcripts (McLean) X

~400bp mRNA sequence transcripts (McLean) X

Mined Gene Ontologies (Bridges) X

Epigenetics Analysis Data (Bridges) X X X

CFD mesh of lung components (Burgreen) X X X

Viral protein Interaction Model Data (Meyer) X

Viral data proteomic data (Meyer) X

CuO Proteomics Data (Edelmann) X X

Cytotoxicity Assay (Edelmann) X X X

Geometrical Airway Descriptions (Thompson) X X X

CFD mesh for Airways (Thompson) X X X

Upper (Thompson) X X X

Molecular Dynamics Trajectories (Wadkins) X X X

Protein Sequence Alignments (Wadkins) X X X

Mass Spec Data (Dass) X

Supplemental Table 7: Computational Tools, Development Environments, and Execution Environments

Computational Tool | Development Environment | Execution Environment (Windows/PC, Windows Mobile, Mac OS desktop/laptop, Web Browser, Unix, Linux, HPC)
VisBubbles (J. Chen) | Windows, Linux, Mac | X X X X
Electronic Structure Methods (Tschumper) | C/C++/Perl/Python/shell scripts/awk | X X X
Post-Hoc analysis of simulation data sets (Hester) | Java | X X
Real time analysis (Hester) | C# | X
Virtual screening funnel (Doerksen) | | X
Cross ontology data mining tool (Bridges) | Perl | X X X
Mesh generation tools for human lungs (Burgreen) | C++, Linux, Windows | X X X X X
Machine Learning Rule Extraction (Y. Chen) | | X X
Machine Learning Feature Selection (Y. Chen) | | X X
Machine Learning Instance Based Learning (Y. Chen) | | X X
Python scripts for Analyzing Protein Structure (Wadkins) | Python | X X X X X
Provenance, Archive and Retrieval System (Wilkins) | Unix/Ruby | X X X
Bio/Chem Rule Learning Techniques (Wilkins) | MATLAB and C | X X X X X

   

Supplemental Table 8: Sharing of Computational Tools

Tool name | MS EPSCoR researchers | Other researchers | K-12 teachers | College instructors & students | Gov. agencies & private partners | Share binary code | Share source code | Web download | Other plans for sharing
VisBubbles (J. Chen) | X X X X X X X | web interface
Electronic Structure Methods (Tschumper) | X X X
Post-Hoc analysis of simulation data sets (Hester) | X X X X X X X
Real time analysis (Hester) | X X X X X X X | GitHub site
Virtual screening funnel (Doerksen) | X X X X
Cross ontology data mining tool (Bridges) | X X X X
Mesh generation tools for human lungs (Burgreen) | X X X X X
Machine Learning Rule Extraction Software (Y. Chen) | X X X X
Machine Learning Feature Selection Software (Y. Chen) | X X X X
Machine Learning Instance Based Learning Software (Y. Chen) | X X
Python scripts for Analyzing Protein Structure (Wadkins) | X X X X | Email
PARS: Provenance, Archive and Retrieval System (Wilkins) | X X
Bio/Chem Rule Learning Techniques (Wilkins) | X | Algorithms in Publications