Special Considerations for Archiving Data from Field Observations A Presentation for...

Page 1: Special Considerations for Archiving Data from Field Observations A Presentation for “International Workshop on Strategies for Preservation of and Open.

Special Considerations for Archiving Data from Field Observations

A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”

June 24, 2004

Beijing, China

Raymond McCord

Oak Ridge National Laboratory*

Oak Ridge, Tennessee, USA

*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Presumptions

• Archives depend on logical rules for information structures and consistent codes for metadata.

• “Field Observations” will contain unexpected variations that will challenge rules.

• Containment of this problem can be accomplished with some planning.

Presentation Strategy

• This presentation focuses on special issues for “data management planning”.

• Archives must either:
  – Determine how these special issues were resolved in the original data management plan, OR
  – Resolve these issues by further processing or documentation as data are added to the Archive.

Challenges from Field Data

• Multiple schemes for location information

• Temporary changes in methods

• Unmeasurable events will occur

• Evolving reference lists

• Find a containment strategy for exceptions

Location Information - Coordinates

• Multiple geographic coordinate systems
  – Local / engineering systems
    • Unprojected
    • Rotational differences
  – Global systems
    • Which projection?
    • Which projection parameters?

• Conversions may be “mathematically irreversible”

• Be careful and test changes before large-scale conversions are made
  – Visualization capability is essential!!
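
A minimal sketch of the "test before converting" advice, assuming a local engineering grid that differs from the global grid only by a rotation and an offset (real datum or projection changes should go through a dedicated geodetic library such as PROJ). The rotation angle, origin coordinates, and sample points are hypothetical. The sketch converts a few points, rounds them to the precision the archive would store, converts them back, and reports the worst round-trip error; rounding to stored precision is one common way a conversion becomes irreversible in practice.

```python
import math

# Hypothetical local-grid parameters: rotation of the local grid relative to
# the global grid (degrees) and the global coordinates of the local origin.
ROTATION_DEG = 12.5
ORIGIN_E, ORIGIN_N = 500_000.0, 3_980_000.0

def local_to_global(x, y, decimals=None):
    """Rotate and translate local (x, y) into global (easting, northing)."""
    t = math.radians(ROTATION_DEG)
    e = ORIGIN_E + x * math.cos(t) - y * math.sin(t)
    n = ORIGIN_N + x * math.sin(t) + y * math.cos(t)
    if decimals is not None:  # simulate the precision actually stored
        e, n = round(e, decimals), round(n, decimals)
    return e, n

def global_to_local(e, n):
    """Exact inverse of local_to_global (before any rounding)."""
    t = math.radians(ROTATION_DEG)
    dx, dy = e - ORIGIN_E, n - ORIGIN_N
    x = dx * math.cos(t) + dy * math.sin(t)
    y = -dx * math.sin(t) + dy * math.cos(t)
    return x, y

def max_round_trip_error(points, decimals):
    """Largest distance between original and round-tripped local points."""
    worst = 0.0
    for x, y in points:
        x2, y2 = global_to_local(*local_to_global(x, y, decimals))
        worst = max(worst, math.hypot(x2 - x, y2 - y))
    return worst

# Check a small sample before converting the whole collection.
sample = [(0.0, 0.0), (123.45, 678.9), (1500.2, -42.7)]
print(max_round_trip_error(sample, decimals=3))  # millimetre storage
print(max_round_trip_error(sample, decimals=0))  # whole-metre storage
```

Plotting the original and round-tripped points together is a quick visual check that catches a wrong rotation sign or origin, which a single error number can hide.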

Location Information – Place Names

• Multiple naming schemes
  – “Folk” names (unofficial)
  – Divergence in official naming schemes at local, regional, and national scales
  – Connecting historical name changes

• Avoid including a measurable parameter in a “location code”
  – Stream mile story
    • Names for sampling stations included the “stream mile”.
    • Names were changed for part of the data after higher-resolution mapping occurred and the “stream mile” changed.
    • Using the entire collection of information was complicated.
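
One way to avoid the stream-mile problem, sketched under assumptions: every station gets an opaque, permanent identifier; the stream mile becomes an ordinary attribute recorded per mapping version; and every name ever used (official, historical, or "folk") is kept in an alias table that points to the permanent identifier. The station IDs, names, map labels, and mile values below are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Station:
    """A sampling station; station_id is opaque and never changes (hypothetical scheme)."""
    station_id: str
    # Stream-mile values keyed by the mapping that produced them, so a
    # remapping adds a value instead of renaming the station.
    stream_mile: dict = field(default_factory=dict)

# Hypothetical stations and mile values.
stations = {
    "STN-0001": Station("STN-0001", {"1985 map": 2.3, "2001 map": 2.41}),
    "STN-0002": Station("STN-0002", {"1985 map": 5.0, "2001 map": 5.12}),
}

# Hypothetical alias table: every name ever used resolves to one permanent ID.
aliases = {
    "Example Creek 2.3": "STN-0001",   # original name containing the stream mile
    "Example Creek 2.41": "STN-0001",  # name used after higher-resolution mapping
    "Mill Bridge": "STN-0001",         # unofficial "folk" name
    "Example Creek 5.0": "STN-0002",
}

def resolve(name):
    """Return the permanent station ID for a known name (KeyError if unknown)."""
    return aliases[name]

# Records labelled with old, new, and folk names combine without relabelling.
records = [("Example Creek 2.3", 7.1), ("Example Creek 2.41", 6.8), ("Mill Bridge", 7.4)]
by_station = {}
for name, value in records:
    by_station.setdefault(resolve(name), []).append(value)
print(by_station)
```

Because the identifier carries no measurable information, a later remapping only adds a new stream-mile entry; no archived record has to be renamed.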

Temporary Changes in Methods

• The field sampling protocol is judged insufficient (too much or too little).
  – Need to structure the metadata to record the temporary change.

• Field observations are ongoing at a remote site.
  – Some of the instruments break down.
  – The remaining instruments continue operating.
  – Need a robust scheme for missing value representation.
    • Data analyses must correctly exclude missing values.
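
A minimal sketch of a missing-value scheme, assuming each reading stores either a value or None, a reason code explaining why it is missing, and the method in effect (so a temporary protocol change is recorded with the data). The record layout, reason codes, and method labels are hypothetical. For contrast, it also shows how a numeric sentinel such as -9999 stored in the value column corrupts a mean when an analysis forgets to filter it.

```python
from statistics import mean

# Hypothetical readings from one remote site: (instrument, value, reason, method).
# value is None when missing; reason explains why; method records the protocol
# in effect, so a temporary change is visible in the data itself.
readings = [
    ("temp_1", 14.2, None, "protocol-A"),
    ("temp_1", 14.8, None, "protocol-A"),
    ("temp_1", None, "INSTRUMENT_FAILURE", "protocol-A"),
    ("temp_1", 15.1, None, "protocol-A-temporary"),  # temporary method change
]

def valid_values(rows):
    """Values usable for analysis: missing entries are excluded, not zeroed."""
    return [v for _, v, _, _ in rows if v is not None]

print(mean(valid_values(readings)))

# The same data with a -9999 sentinel shows the risk of a naive analysis.
sentinel = [v if v is not None else -9999.0 for _, v, _, _ in readings]
print(mean(sentinel))  # badly wrong unless the sentinel is filtered out
```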

Unmeasurable Events (too little)

• How do you code the results from well water when the well is dry at the sampling time?
  – Need a robust scheme for missing values; be consistent!!
  – Need a decision rule that skips the entire record

• How do you record values that are below the detection limit (but not zero)?
  – Set all values to the minimum detection limit, or
  – Set all values to the midpoint between the detection limit and zero, or
  – Set all values to zero, or
  – Retain the estimated value, but include a quality flag.

• Select one of these strategies and document the choice.
  – The choice can have significant impacts on summary statistics.
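
The four below-detection-limit strategies can be compared directly on the same data. This sketch assumes a hypothetical detection limit of 1.0 and a handful of made-up results, each carrying the laboratory's estimate and a below-limit flag (the estimate is only used by the "retain and flag" strategy). It prints the mean under each strategy, which shows concretely why the choice must be documented.

```python
from statistics import mean

DETECTION_LIMIT = 1.0  # hypothetical detection limit (units assumed, e.g. mg/L)

# Hypothetical results: (lab estimate, below_detection_limit flag).
results = [(3.2, False), (0.4, True), (2.5, False), (0.7, True), (0.1, True)]

def substitute(value, below, strategy):
    """Apply one documented below-detection-limit strategy to a single value."""
    if not below:
        return value
    if strategy == "detection_limit":
        return DETECTION_LIMIT
    if strategy == "half_limit":
        return DETECTION_LIMIT / 2  # midpoint between zero and the limit
    if strategy == "zero":
        return 0.0
    if strategy == "estimate_flagged":
        return value  # keep the estimate; the below-limit flag stays with the record
    raise ValueError(f"unknown strategy: {strategy}")

for strategy in ("detection_limit", "half_limit", "zero", "estimate_flagged"):
    values = [substitute(v, below, strategy) for v, below in results]
    print(f"{strategy:17s} mean = {mean(values):.3f}")
```

With three of five values below the limit, the means range from 1.14 (zero) to 1.74 (detection limit) on identical measurements.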

Unmeasurable Events (too much)

• How do you record the biological population when there are too many individuals to count?
  – Record some arbitrary large number, or
  – Flag as unmeasurable

• Similar problems can occur with wide ranges in chemical concentrations.

• Different schemes may have impacts on:
  – Results from statistical analyses
  – Setting quality assurance limits
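
A small sketch comparing the two schemes for uncountable samples, assuming made-up counts, a hypothetical flag value, and 9999 as the "arbitrary large number". Replacing the flag with a number drags the mean far away from every real count, while keeping the flag lets the analysis summarize the counted samples and report the uncountable ones separately.

```python
from statistics import mean, median

TOO_MANY = "TOO_NUMEROUS_TO_COUNT"  # hypothetical flag value

# Hypothetical counts of individuals per sample; one sample could not be counted.
flagged = [12, 30, 18, TOO_MANY, 25]

# Scheme 1: replace the flag with an arbitrary large number.
arbitrary = [9999 if c == TOO_MANY else c for c in flagged]
print("arbitrary number:", mean(arbitrary), median(arbitrary))

# Scheme 2: keep the flag; summarize counted samples and report the rest.
counted = [c for c in flagged if c != TOO_MANY]
n_uncountable = len(flagged) - len(counted)
print("flagged:", mean(counted), median(counted), f"+{n_uncountable} uncountable")
```

The same distortion carries over to quality assurance limits derived from the data, such as a mean plus a multiple of the standard deviation.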

Evolving Reference Lists

• Taxonomic lists*
  – Infrequent individuals may not be fully identifiable
    • Need other lifecycle stages
  – Later samples enable fuller identification
  – Need to recode earlier records to match newer identification (??)
  – How do you analyze or summarize the entire data collection?

• Chemical constituents*
  – Chemicals with low concentrations may be measured as a group
  – Additional and later locations contain higher concentrations, and fuller chemical speciation is determined.
  – How do you analyze or summarize the entire series of measurements?

• *(Assumes agreement on a single, accepted classification scheme; may not be true!!)
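
One possible answer to "how do you summarize the entire collection?", sketched under assumptions: leave every archived identification exactly as recorded, and at analysis time roll each record up to the finest level that all records support. The lookup assumes a single agreed classification, as the footnote above cautions; the taxon names, levels, and years are hypothetical placeholders. The same idea applies to chemicals first measured as a group and later speciated.

```python
from collections import Counter

# Hypothetical hierarchy (finest to coarsest), assuming one agreed classification.
LEVELS = ["species", "genus", "family"]

# Hypothetical lookup from each recorded identification to its known levels.
HIERARCHY = {
    "GenusA species1": {"species": "GenusA species1", "genus": "GenusA", "family": "FamilyA"},
    "GenusA species2": {"species": "GenusA species2", "genus": "GenusA", "family": "FamilyA"},
    "GenusA": {"genus": "GenusA", "family": "FamilyA"},
    "FamilyA": {"family": "FamilyA"},  # early record: only the family was identifiable
}

# Archived identifications stay exactly as recorded: (year, identification).
records = [
    ("1995", "FamilyA"),
    ("1996", "GenusA"),
    ("2003", "GenusA species1"),
    ("2003", "GenusA species2"),
]

def finest_common_level(names):
    """Finest level at which every identification in `names` is known."""
    for level in LEVELS:
        if all(level in HIERARCHY[n] for n in names):
            return level
    return None

def roll_up(name, level):
    """Name at the requested level (None if that level is unknown for it)."""
    return HIERARCHY[name].get(level)

# Summarize the whole collection at a level every record supports, without
# recoding the archived records themselves.
level = finest_common_level([name for _, name in records])
counts = Counter(roll_up(name, level) for _, name in records)
print(level, counts)
```

The trade-off is resolution: a single family-level record forces the whole-collection summary to family level, so analyses of later years alone may still use the finer identifications.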

Containment Strategy for Exceptions

• “90 / 10” rule
  – ~90% of the data can be described by a few logical rules.
  – ~10% of the data cannot be described by rules and contains numerous, isolated exceptions.

• Guidance for decisions
  – Consider how many rules can be explained to future data users.
  – Put the information that cannot be described by rules in an alternative structure that can:
    • be labeled as “user beware”.
    • support detailed and varied documentation.
    • accommodate and communicate numerous exceptions.
    • (Query logic vs. cataloged directory tree)
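
A minimal sketch of the containment idea, assuming the "few logical rules" can be written as validation functions. Records that pass every rule go into the main structured collection; records that fail any rule go into a separate "user beware" store that keeps the raw record, the rules it failed, and any free-text notes. The rule names, fields, units, and records are hypothetical.

```python
# Hypothetical rules: each returns True when a record conforms.
RULES = {
    "has_station": lambda r: bool(r.get("station_id")),
    "value_numeric": lambda r: isinstance(r.get("value"), (int, float)),
    "known_units": lambda r: r.get("units") in {"mg/L", "ug/L"},
}

def partition(records):
    """Split records into rule-conforming rows and documented exceptions."""
    main, exceptions = [], []
    for rec in records:
        failed = [name for name, rule in RULES.items() if not rule(rec)]
        if failed:
            # "User beware" store: keep the raw record plus why it was set aside.
            exceptions.append({"record": rec, "failed_rules": failed,
                               "notes": rec.get("field_notes", "")})
        else:
            main.append(rec)
    return main, exceptions

records = [
    {"station_id": "STN-0001", "value": 3.2, "units": "mg/L"},
    {"station_id": "STN-0002", "value": 0.8, "units": "ug/L"},
    {"station_id": "STN-0003", "value": "trace", "units": "mg/L",
     "field_notes": "analyst reported trace only"},
]
main, exceptions = partition(records)
print(len(main), "conforming;", len(exceptions), "exception(s)")
for e in exceptions:
    print(e["failed_rules"], "-", e["notes"])
```

Keeping the rules in one named table also makes it easy to count how many there are, which is the number that has to stay explainable to future data users.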

Evaluation of Containment Strategy

• More guidance for rule decisions
  – When there are “too many” logical rules, the archiving process becomes inefficient and tedious.
  – Adjust as needed.

• Eliminate spurious variation in codes and logic
  – For example: inconsistent abbreviations, punctuation, and capitalization
  – This minimizes the number of exceptions that must be contained.
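
Spurious variation in codes can often be removed automatically before any rules are applied. This sketch assumes a small, hypothetical abbreviation table agreed for the collection; the normalization function uppercases, turns punctuation into spaces, collapses whitespace, and expands abbreviations, so spelling variants of the same code compare equal. The example codes are made up.

```python
import re

# Hypothetical abbreviation table agreed for this collection.
ABBREVIATIONS = {"CR": "CREEK", "CRK": "CREEK", "TRIB": "TRIBUTARY", "STA": "STATION"}

def normalize_code(raw):
    """Map spelling variants of one code onto a single canonical form."""
    text = raw.upper()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    words = text.split()                  # also collapses runs of whitespace
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

variants = ["Example Cr. Sta 3", "EXAMPLE CRK STA. 3", "example creek, station 3"]
print({v: normalize_code(v) for v in variants})
print(len({normalize_code(v) for v in variants}))  # distinct codes remaining
```

Here three variants collapse to one canonical code, so none of them has to be handled as an exception.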

Comments and Questions…