The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management...
The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management
Kristin Vanderbilt
Department of Biology
University of New Mexico
Albuquerque, New Mexico USA

LTER Network
– Six LTER sites were established in 1980 with funding from NSF
– Now there are 26 sites with 1800 associated scientists and students
– Research themes for all sites:
  – Biodiversity
  – Net primary productivity
  – Disturbance patterns
  – Biogeochemistry

Goals of the LTER Network
– Understand general ecological phenomena that occur over long temporal and broad spatial scales
– Conduct major data syntheses
– Provide information for the identification and solution of societal problems
– Create a legacy of well-designed and documented long-term experiments and observations for use by future generations

Data are the legacy of the LTER Network
– Integration of information management with research has been part of the LTER mandate since the beginning
– The goal of information management is to make high quality, well-documented data easy to find, access, and use now and in the future

LTER Data Release Policy
http://www.lternet.edu/data/netpolicy.html
Data and information derived from publicly funded research in the U.S. LTER Network … are to be made available online with as few restrictions as possible, on a nondiscriminatory basis.
There are two data types:
– Type I – data are to be released to the general public within 2 years from collection and no later than the publication of the main findings from the dataset and,
– Type II – data are to be released to restricted audiences according to terms specified by the owners of the data. Type II data are considered to be exceptional and should be rare in occurrence. Some examples of Type II data restrictions may include: locations of rare or endangered species, data that are covered under prior licensing or copyright (e.g., SPOT satellite data), or covered by the Human Subjects Act.

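
The Type I rule combines two deadlines, so it can help to see it as a small calculation. The sketch below is one reading of the policy text, not an official LTER tool: it assumes "within 2 years from collection and no later than the publication of the main findings" means the earlier of the two dates. The function name and its parameters are hypothetical.

```python
from datetime import date
from typing import Optional


def type1_release_deadline(collected: date, published: Optional[date] = None) -> date:
    """Latest public-release date for a Type I data set (assumed reading of the policy).

    The deadline is two years after collection, or the publication date of
    the main findings if that comes first.
    """
    try:
        two_years_on = collected.replace(year=collected.year + 2)
    except ValueError:
        # Collected on 29 February: the date two years later does not exist,
        # so fall back to 28 February of that year.
        two_years_on = collected.replace(year=collected.year + 2, month=2, day=28)

    if published is None:
        return two_years_on
    return min(two_years_on, published)


if __name__ == "__main__":
    # Data collected in June 2004, findings published in March 2005.
    print(type1_release_deadline(date(2004, 6, 1), date(2005, 3, 15)))
```

Type II data are deliberately left out: their release terms are set by the data owners and cannot be computed from dates alone.
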
Information Management responsibilities differ at site and network levels:
– Quality data and metadata must be archived and made accessible at each LTER Site
– And tools must be created to integrate site databases across the LTER Network

26 LTER sites = 26 different approaches to information management
– Structure and content of metadata vary
– Mechanisms for storing and manipulating data differ
  – flat ASCII files, RDBMS; Matlab, SAS, SQL
– Different search mechanisms for data

Typical LTER Site Information Management System
– Designated information manager (IM)
– Standard procedures for working with data to ensure adequate documentation and quality assurance
– Hardware for data storage
– A web page
– Back-up system

Information Management Procedures at the Sevilleta LTER
1. Project initiation: scientist submits protocols and project description to IM
2. Data collection
3. Data entry and QA/QC by scientist
4. Metadata submitted by scientist in Excel template
5. Metadata and data reviewed by IM
6. ASCII file created containing data and metadata
7. Files archived on Sevilleta server

Sevilleta data are stored in an ASCII text file with a defined structure:
\log      History of the data file (when created, updated, errors fixed, etc.)
\doc      Metadata
\header   Variable names
\data     Data records

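
The four-part file layout above lends itself to a simple reader. The following is a minimal sketch only: it assumes each marker (\log, \doc, \header, \data) sits alone on its own line and that variable names and data records are comma-delimited, neither of which the slide states. The function names and the sample file contents are hypothetical.

```python
from typing import Dict, List

SECTIONS = ("log", "doc", "header", "data")

# Invented sample content, for illustration only.
SAMPLE = r"""\log
2004-10-01 file created
\doc
Title: hypothetical example data set
\header
date,plot,biomass
\data
2004-09-15,A,12.5
2004-09-15,B,9.8
"""


def parse_sevilleta_file(text: str) -> Dict[str, List[str]]:
    """Split the file text into its \\log, \\doc, \\header and \\data sections."""
    sections: Dict[str, List[str]] = {name: [] for name in SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A marker line such as "\doc" switches the active section.
        if line.startswith("\\") and line[1:] in SECTIONS:
            current = line[1:]
            continue
        if current is None:
            raise ValueError(f"content before first section marker: {line!r}")
        sections[current].append(line)
    return sections


def data_records(sections: Dict[str, List[str]], delimiter: str = ",") -> List[Dict[str, str]]:
    """Pair each data record with the variable names from the header section."""
    names = [name.strip() for name in sections["header"][0].split(delimiter)]
    return [
        dict(zip(names, (value.strip() for value in row.split(delimiter))))
        for row in sections["data"]
    ]


if __name__ == "__main__":
    parsed = parse_sevilleta_file(SAMPLE)
    for record in data_records(parsed):
        print(record)
```

Keeping the history, metadata, variable names, and records in one text file means a data set can be read with nothing more than a text editor or a few lines of code like these.
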
Sevilleta Data Archive
Data sets are stored in directories clustered by theme:
[Directory tree: archive at the top; below it nutrient, animal, plant; below those phenology, npp, transect, decomposition, soilN]

Information Management Procedures at the Sevilleta LTER
– Final step, after files are archived on the Sevilleta server: data made web accessible

Ecological Information Management System at Sevilleta illustrates:
– An IM system can be built with readily available tools
  – Excel
  – ASCII text files
  – Web page design tool
– IT genius not required; these tools are easy to learn

Long Term Ecological Research Network Information Management System

Objective of the NIS
– Enhance discovery and access to data across LTER sites in order to facilitate synthesis and collaborative ecological research

Strategies for enhancing data discovery and access:
– Adopt Ecological Metadata Language (EML) as standard
– Establish an all-site data catalog based on EML
– Establish a single-portal web interface for querying catalog

Adoption of a metadata standard is key to building NIS
– Ecological Metadata Language (EML) is an XML-based standard
  – XML is similar to HTML, except the tags are customized to reflect the information content

Why are metadata standardized?
– Use a common set of understandable terms; information is described in a similar way
– Location of information (metadata descriptors) can be found in the same place, facilitating entry & retrieval
– "Tools" can be built; e.g., metadata entry and searches can be automated

Example of EML
<eml>
  <dataset>
    <title> Sevilleta LTER Net Primary Productivity Data 2004</title>
    <creator>
      <individualName>
        <salutation> Dr. </salutation>
        <givenName> Esteban </givenName>
        <surName> Muldavin </surName>
      </individualName>
      <eMailAddress> [email protected] </eMailAddress>
    </creator>
    <abstract>Net primary production (NPP) is a fundamental ecological variable that measures rates of carbon consumption and fixation. </abstract>
    <keywordSet>
      <keyword>biomass</keyword>
      <keyword>ANPP</keyword>
      <keyword>LTER</keyword>
    </keywordSet>
    <contact>
      <positionName> Information Manager </positionName>
      <eMailAddress> [email protected] </eMailAddress>
    </contact>
  </dataset>
</eml>

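
Because the tags name their content, a program can pull specific fields out of an EML record without guessing at layout. The sketch below uses Python's standard xml.etree.ElementTree module on a trimmed copy of the slide's example (e-mail, abstract, and contact elements left out). It handles only the simplified, namespace-free structure shown on the slide; the summarize_eml function is a hypothetical helper, not part of any EML toolkit.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Trimmed copy of the EML example from the slide.
EML_EXAMPLE = """<eml><dataset>
  <title> Sevilleta LTER Net Primary Productivity Data 2004</title>
  <creator><individualName>
    <salutation> Dr. </salutation>
    <givenName> Esteban </givenName>
    <surName> Muldavin </surName>
  </individualName></creator>
  <keywordSet>
    <keyword>biomass</keyword><keyword>ANPP</keyword><keyword>LTER</keyword>
  </keywordSet>
</dataset></eml>"""


def summarize_eml(eml_text: str) -> Dict[str, Union[str, List[str]]]:
    """Extract title, creator name and keywords from a simplified EML record."""
    dataset = ET.fromstring(eml_text).find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")

    def text_of(path: str) -> str:
        # findtext returns None when the element is missing.
        return (dataset.findtext(path) or "").strip()

    name_parts = [
        text_of("creator/individualName/givenName"),
        text_of("creator/individualName/surName"),
    ]
    return {
        "title": text_of("title"),
        "creator": " ".join(part for part in name_parts if part),
        "keywords": [k.text.strip() for k in dataset.findall("keywordSet/keyword") if k.text],
    }


if __name__ == "__main__":
    print(summarize_eml(EML_EXAMPLE))
```
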
Morpho: A tool for creating EML
http://knb.ecoinformatics.org/index.jsp

LTER Metacat
[Diagram: EML documents from the SEV, CAP, JRN, and CWT sites linked to the LTER Metacat]

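
The idea of an all-site catalog built on EML can be illustrated with a toy example. This is not Metacat's interface: the sketch below simply holds a few simplified EML records in memory, grouped by site code, and searches their keywords. The make_eml and search_catalog functions, and the JRN record, are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def make_eml(title: str, keywords: List[str]) -> str:
    """Build a simplified, namespace-free EML record as a string."""
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for word in keywords:
        ET.SubElement(keyword_set, "keyword").text = word
    return ET.tostring(root, encoding="unicode")


def search_catalog(catalog: Dict[str, List[str]], keyword: str) -> List[Tuple[str, str]]:
    """Return (site, title) pairs for every record tagged with the keyword."""
    wanted = keyword.strip().lower()
    hits = []
    for site, documents in sorted(catalog.items()):
        for eml_text in documents:
            root = ET.fromstring(eml_text)
            keywords = {(k.text or "").strip().lower() for k in root.iter("keyword")}
            if wanted in keywords:
                title = (root.findtext("dataset/title") or "").strip()
                hits.append((site, title))
    return hits


# Toy catalog keyed by site code; the JRN record is invented.
CATALOG = {
    "SEV": [make_eml("Sevilleta LTER Net Primary Productivity Data 2004",
                     ["biomass", "ANPP", "LTER"])],
    "JRN": [make_eml("Hypothetical JRN productivity data set", ["ANPP", "LTER"])],
}

if __name__ == "__main__":
    for site, title in search_catalog(CATALOG, "anpp"):
        print(site, title)
```

The search works across sites only because every record stores its keywords in the same place, which is the practical payoff of a shared metadata standard.
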
Excerpt from: Report on the CODATA Workshop on Archiving Scientific and Technical (S&T) Data, 20-21 May 2002, Pretoria, South Africa
Managing S&T Data: Preservation in the Broader Context: Needs for future access of data
– Shared standards and best practices for repository management, federated archives, and metadata
  LTER approach: LTER Network data release policy, NIS, EML
– An integrated framework … that provides a shared way of communicating and standard technology
  LTER approach: LTER Metacat metadata database and centralized data catalog