VOCABULARIES A data management presentation. Data management best practices Inventory of...


Page 1: VOCABULARIES A data management presentation. Data management best practices Inventory of resources/datasets – Database level or series of datasets/collections.

VOCABULARIES A data management presentation


Data management best practices

• Inventory of resources/datasets
  – Database level or series of datasets/collections in dbases

• Archived
  – Standardized format & schema
  – Standards and vocabulary
  – QC procedures – tests and tools

• Discoverable
  – metadata

• Accessible
• Restrictions; rights and rights holder


Presentation Overview

1. What are controlled vocabularies and why are they needed?

2. How do you create a vocabulary?

3. Which fields in the OBIS Schema would benefit from standardization?

4. How can OBIS and the OBIS community contribute to vocabulary activities?


What are controlled vocabularies & why are they needed?

A controlled vocabulary is a collection of terms that are:

• Accepted: The term must adhere to community practices.

• Defined: The terms are precisely characterized. Typically, this means the terms have rigorous definitions.

• Managed: In general, there will be a body of experts that create and maintain the controlled vocabulary. The controlled vocabulary maintenance will involve periodic review, addition of new terms, modification of terms, and occasionally deprecation of terms.

• Persistent: Once created, terms should never be deleted, and modification should be kept to a minimum. A definition may be edited to make it clearer, but never to make it narrower.

Source: https://marinemetadata.org/guides/vocabs/vocdef
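The four properties above can be illustrated with a minimal Python sketch. It is not taken from the presentation or from any existing vocabulary server: the class names, the identifiers NET001/NET002 and the placeholder definitions are all assumptions made for illustration. The point it shows is that terms carry a definition, are changed only through managed operations, and are deprecated rather than deleted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    """One vocabulary entry: a stable identifier, a label and a definition."""
    identifier: str
    label: str
    definition: str
    deprecated: bool = False
    replaced_by: Optional[str] = None


class ControlledVocabulary:
    """Hypothetical in-memory vocabulary; there is deliberately no delete method."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        # Defined: refuse terms that arrive without a definition.
        if not term.definition.strip():
            raise ValueError("every term needs a definition")
        # Persistent: an identifier, once issued, is never reused.
        if term.identifier in self._terms:
            raise ValueError(f"{term.identifier} already exists")
        self._terms[term.identifier] = term

    def deprecate(self, identifier: str, replaced_by: Optional[str] = None) -> None:
        # Managed: retire a term but keep it resolvable for old records.
        if replaced_by is not None and replaced_by not in self._terms:
            raise KeyError(replaced_by)
        term = self._terms[identifier]
        term.deprecated = True
        term.replaced_by = replaced_by

    def get(self, identifier: str) -> Term:
        return self._terms[identifier]

    def active_labels(self) -> List[str]:
        return sorted(t.label for t in self._terms.values() if not t.deprecated)


# Illustrative entries only; identifiers and definitions are placeholders.
vocab = ControlledVocabulary("plankton nets (example)")
vocab.add(Term("NET001", "Bongo net", "Placeholder definition for illustration."))
vocab.add(Term("NET002", "Bongo", "Placeholder definition for illustration."))
vocab.deprecate("NET002", replaced_by="NET001")
```

After the deprecation, `vocab.get("NET002")` still resolves and points to its replacement, so records that used the old term remain interpretable.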


What are controlled vocabularies & why are they needed?

Controlled vocabularies provide these abilities by:

• establishing the permissible terms to be used

• maintaining the proper and agreed-upon spelling of the terms

• clarifying terms for those who are new to the community

• eliminating the use of arbitrary terms that can cause inconsistencies and confusion

Source: https://marinemetadata.org/guides/vocabs/vocdef
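A rough sketch of how the points above might look when checking a data column, assuming a small life-stage list. The permitted terms and the variant spellings are hypothetical and are not an official OBIS or Darwin Core vocabulary.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical permitted terms and known variant spellings (assumptions).
PERMITTED = {"egg", "larva", "juvenile", "adult"}
VARIANTS: Dict[str, str] = {"larvae": "larva", "juv.": "juvenile", "adults": "adult"}


def normalize(raw: str) -> Optional[str]:
    """Return the agreed spelling of a term, or None if it is not permitted."""
    key = " ".join(raw.split()).lower()  # trim, collapse whitespace, lower-case
    key = VARIANTS.get(key, key)         # map known variants to the agreed term
    return key if key in PERMITTED else None


def check_column(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split raw values into normalized terms and rejected (arbitrary) ones."""
    accepted: List[str] = []
    rejected: List[str] = []
    for value in values:
        term = normalize(value)
        if term is None:
            rejected.append(value)
        else:
            accepted.append(term)
    return accepted, rejected


accepted, rejected = check_column(["  Larvae ", "Juv.", "grown-up", "adult"])
```

Rejected values are returned rather than silently dropped, so that someone can decide whether they are typos or candidate new terms for the vocabulary managers.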


IODE GE-BICH: Group of Experts on Biological and Chemical Data Management & Exchange Practices.

WoRMS did not exist!


Plan A

https://sites.google.com/site/gebichwiki/vocabularies-2/plankton-nets


The British Oceanographic Data Centre

https://www.bodc.ac.uk/data/codes_and_formats/parameter_codes/


BODC

http://seadatanet.maris2.nl/v_bodc_vocab_v2/welcome.asp


Which fields in the OBIS Schema would benefit from standardization?

http://rs.tdwg.org/dwc/terms/

Taxon and location terms


How can OBIS and the OBIS community contribute to vocabulary activities?


IODE GE-BICH

https://sites.google.com/site/gebichwiki/vocabularies-2/plankton-nets


EMODnet – standardised species attributes vocabulary

http://terms.tdwg.org/wiki/MarineSpeciesTraits


GBIF

https://code.google.com/p/darwincore/wiki/DwCTypeVocabulary

http://lists.tdwg.org/mailman/listinfo/tdwg-content


GBIF

http://lists.tdwg.org/mailman/listinfo/tdwg-content


Compile terms & definitions and identify/develop vocabularies

Terms for data record fields and for metadata descriptions and for analysis/statistics

• DwC Basis of record
• DwC Life stages
• DwC Gender & Reproductive condition
• DwC Sampling protocol; gear, deployment
• DwC Dynamic Properties & Keyword pairs
• DwC Measurement types
• Metadata Cruise numbers; ship names


Compile terms & definitions and identify/develop vocabularies

Terms for data record fields and for metadata descriptions and for analysis/statistics

• Taxon name… http://www.marinespecies.org/

• Location, Locality, WaterBody http://www.marineregions.org/


Require forum for discussion

Compile info, discuss, resolve/adopt, promote usage through the OBIS manual


Photogallery

http://www.marinespecies.org/carms/vocabulary.php


Photogallery

http://www.marinespecies.org/carms/vocabulary.php

Photo gallery images

• title
• description or definition
• hyperlink to a standard (?)
• Citation
• Private/public
• May be grouped into albums


Comments?


Dataset mapping example


NAFO dataset example

ICNAF Summ.Doc. No. 76/VI/28

Tagging Activities Reported by Member Countries for 1975

• Identified dataset of interest
• Report in pdf format only, data not digitized
• Digitized tables and created spreadsheet with data records
• Reviewed fields and content
• Is there a vocabulary for tags?
• Consulted local expert. He provided FAO report which referenced ICES
• Contacted ICES to see if they had terms in a vocabulary
• Visited local expert and took pictures of tags. Created photogallery
• Are location place names in a gazetteer? Have a dbase for Canadian Placenames…
• Are area definitions in gazetteer?
• Marineregions.org NAFO areas (update)
• Need tool to bulk download MRGIDs and nominal positions
• Can’t get EXCEL macro to work…need help!
• Map common names to scientific names (salmon!!!)
• Map scientific names to LSID
• DwC fields
• Issues – potential for replicate submissions??
• Who has the captures?

FAO Fisheries Technical Paper No. 190: Materials and methods used in marking experiments in fishery research

J. Cons. int. Explor. Mer-1965-A Guide to Fish Marks-87-160 (2).
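The two name-mapping steps in the list above (common name to scientific name, then scientific name to LSID) might be sketched as follows. This is only an illustration under stated assumptions: the lookup tables are hypothetical, and the AphiaIDs are placeholders that would have to be looked up in WoRMS. The LSID prefix is the form WoRMS uses for taxon names. A bare "salmon" is deliberately ambiguous here and is flagged for review rather than guessed.

```python
from typing import Dict, List, Optional

WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"

# Hypothetical lookup: one common name may match several species.
COMMON_TO_SCIENTIFIC: Dict[str, List[str]] = {
    "atlantic salmon": ["Salmo salar"],
    "atlantic cod": ["Gadus morhua"],
    "salmon": ["Salmo salar", "Oncorhynchus keta"],  # ambiguous on purpose
}

# PLACEHOLDER AphiaIDs, not real values; look up the real ones in WoRMS.
PLACEHOLDER_APHIA_IDS: Dict[str, int] = {
    "Salmo salar": 111111,
    "Gadus morhua": 222222,
}


def resolve(common_name: str) -> Dict[str, object]:
    """Map a common name to Darwin Core style name fields, flagging doubt."""
    candidates = COMMON_TO_SCIENTIFIC.get(common_name.strip().lower(), [])
    scientific: Optional[str] = candidates[0] if len(candidates) == 1 else None
    aphia_id = PLACEHOLDER_APHIA_IDS.get(scientific) if scientific else None
    lsid = f"{WORMS_LSID_PREFIX}{aphia_id}" if aphia_id is not None else None
    return {
        "vernacularName": common_name,
        "scientificName": scientific,
        "scientificNameID": lsid,
        "needsReview": lsid is None,  # unknown or ambiguous names go to a person
    }


resolved = [resolve(name) for name in ["Atlantic salmon", "salmon", "dogfish"]]
```

Keeping the original common name in `vernacularName` preserves what the source report actually said, whatever scientific name is assigned later.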


NAFO dataset processing questions

• Original dataset has limited info – common name and general area and place name – is it suitable for OBIS?
• Who holds the rights? Contact and rights holder is NAFO, not the individual countries.
• Should individual countries be listed as associated parties?
• What if the institutions no longer exist?
• Basis of record – human observation – but this is a summary table and not individual observations
• Tagging datasets should populate the IndividualID field – not suitable for this dataset, as it reports series of tags released. If we find the source dataset, we can create a new resource with release and capture info. So where do we indicate that this is a ‘summary or product or …’?
• Location info
• How do we reference MarineRegions.org and indicate that the decimal Latitude and Longitude values are ‘fuzzy’?
• In the metadata under taxonomic extents we can mention that the source dataset contained common names only
• How do we acknowledge WoRMS?
• Tags and vocabulary
• How can we point to the photo gallery or vocabulary list for tags?
• Add FAO and ICES references to citation section under bibliography
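
One possible way to express the ‘fuzzy’ location question in a record, offered as a sketch for discussion and not as an agreed OBIS practice: use the Darwin Core terms locationID, coordinateUncertaintyInMeters and georeferenceRemarks alongside the nominal position. The helper name, the MRGID value, the coordinates and the uncertainty below are all placeholders.

```python
from typing import Dict, Union

MRGID_BASE = "http://marineregions.org/mrgid/"


def fuzzy_location(mrgid: int, lat: float, lon: float,
                   uncertainty_m: float, remark: str) -> Dict[str, Union[str, float]]:
    """Build location fields for a nominal position taken from a gazetteer."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    if uncertainty_m <= 0:
        raise ValueError("uncertainty must be positive")
    return {
        "locationID": f"{MRGID_BASE}{mrgid}",             # points at the gazetteer entry
        "decimalLatitude": lat,                            # nominal position only
        "decimalLongitude": lon,
        "coordinateUncertaintyInMeters": uncertainty_m,    # how 'fuzzy' it is
        "georeferenceRemarks": remark,
    }


# Placeholder values: 12345 is NOT a real MRGID for any NAFO area.
location = fuzzy_location(
    12345, 47.0, -52.0, 250000.0,
    "Nominal position of reporting area; source gives area name only.",
)
```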