Microbial resources data standards and WDCM MDS
Outline
International data standards relevant to microbial resources information
OECD best practice guidelines and CABRI
MINE
ABCD and Darwin Core
MIGS/MIMS/MIMARKS
StrainInfo and MCL
How to incorporate these standards in WDCM dataset design
WDCM minimum datasets and recommended datasets
Standards Architecture
Biodiversity Information Standards (TDWG) Principles
Biodiversity data will be modelled as a graph of identifiable objects
Objects are defined by an ontology that is understandable by humans and computers
Globally unique identifiers are required to link objects across the network
A transport protocol is required to ‘wrap’ the biodiversity data for transport: TAPIR
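As a rough illustration of the first three principles, the sketch below models records as a graph of objects that refer to each other only through globally unique identifiers. The class names, the relation name `identifiedAs`, the strain number, and the choice of `urn:uuid` identifiers are assumptions made for this example, not something TDWG prescribes; the TAPIR transport step is not shown.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class BioObject:
    """One identifiable node in the graph (hypothetical structure)."""
    kind: str          # ontology class name, e.g. "Strain" (illustrative)
    properties: dict   # descriptive values attached to the object
    guid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    links: dict = field(default_factory=dict)  # relation name -> target GUID


class ObjectGraph:
    """In-memory graph in which objects are linked by GUID only."""

    def __init__(self):
        self._objects = {}

    def add(self, obj):
        """Store an object and return its GUID."""
        self._objects[obj.guid] = obj
        return obj.guid

    def link(self, source_guid, relation, target_guid):
        """Record a named relation from one object to another."""
        self._objects[source_guid].links[relation] = target_guid

    def resolve(self, guid):
        """Look up an object by its GUID."""
        return self._objects[guid]

    def follow(self, guid, relation):
        """Resolve the object that a relation points to."""
        return self.resolve(self.resolve(guid).links[relation])


graph = ObjectGraph()
taxon = graph.add(BioObject("TaxonName", {"name": "Bacillus subtilis"}))
strain = graph.add(BioObject("Strain", {"strain_number": "EXAMPLE 0001"}))
graph.link(strain, "identifiedAs", taxon)
print(graph.follow(strain, "identifiedAs").properties["name"])
```

Because the link stores an identifier and not the object itself, the target could equally live in another collection's database, which is the reason the identifiers have to be globally unique.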
OECD best practice guidelines
Organism type              MDS (RDS)
Filamentous fungi          10 (6)
Yeasts                     10 (6)
Microalgae                 8 (3)
Bacteria                   11 (6)
Cyanobacteria              11 (5)
Archaea                    11 (5)
Protozoa                   11 (5)
Plasmids                   13 (17)
Phages                     11 (1)
Viruses                    12
cDNA and gDNA libraries    6
Common Access to Biological Resources and Information (CABRI)
Partner collections: BCCM, CABI, CBS, CRBIP, DSMZ, ICLC, NCCB, NCIMB,
28 catalogues
Minimum datasets, Recommended datasets, Full datasets
The Minimum and Recommended datasets conform to the OECD best practice guidelines
Full Datasets
Substrate
Genotype
Pathogenicity
Enzyme Production
Metabolite Production
Remarks
Price Code
Full Datasets
Sexual state
Pathogenicity
Enzyme Production
Metabolite Production
Catalogue entry
Remarks
Price Code
Plasmids
Microbial Information Network Europe (MINE)
Microbial Information Network Europe (MINE) is being constructed by a number of major microbial culture collections in countries of the European Community, with the support of the Biotechnology Action Programme (BAP) of the Commission of the European Community.
Species records
Strain records
Synonym records
Alternative morphonym records
Microbial Information Network Europe (MINE)
Minimum datasets of 30 fields
Full datasets: 99 fields, grouped in 12 blocks:
1. Internal administration
2. Name
3. Strain administration
4. Status
5. Environment and history
6. Biological interactions
7. Sexuality
8. Properties (cytology, biomolecular data)
9. Genotype and genetics
10. Growth conditions
11. Chemistry and enzymes
12. Practical applications
Biodiversity Information Standards (TDWG), previously the Taxonomic Databases Working Group, is an international not-for-profit group that develops standards and protocols for sharing biological data…
TDWG Groups
Biological descriptions
Geospatial
Global identifiers
Imaging
Invasive species
Literature
Observations and specimens
TDWG access protocol for information retrieval (TAPIR)
Taxon names and concepts
Technical Architecture
Access to Biological Collections Data Standard (ABCD)
The ABCD Schema was developed within the BioCASE project (Biological Collections Access Service for Europe)
A standard for access to and exchange of data about specimens and observations, including living and preserved specimens.
ABCD is much more complex than Darwin Core, containing more than 1300 fields.
ABCD elements can be mapped to Darwin Core elements so that data can be shared between systems.
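The mapping idea can be sketched as a lookup table from ABCD element paths to Darwin Core terms. The four pairs below are illustrative only: the ABCD paths are abbreviated, and the pairings should be checked against the published ABCD and Darwin Core documentation before use. The function name, the sample record, and the element `Unit/SomeCultureSpecificElement` are hypothetical.

```python
# Illustrative pairs only: abbreviated ABCD paths -> Darwin Core terms.
ABCD_TO_DWC = {
    "Unit/UnitID": "catalogNumber",
    "Unit/SourceInstitutionID": "institutionCode",
    "Unit/Identifications/Identification/Result/TaxonIdentified/"
    "ScientificName/FullScientificNameString": "scientificName",
    "Unit/Gathering/Country/Name": "country",
}


def abcd_to_dwc(abcd_record):
    """Translate a flat {ABCD path: value} dict to Darwin Core terms.

    Returns the translated record and the list of ABCD paths that have
    no entry in the mapping table, so that nothing is dropped silently.
    """
    dwc, unmapped = {}, []
    for path, value in abcd_record.items():
        term = ABCD_TO_DWC.get(path)
        if term is None:
            unmapped.append(path)
        else:
            dwc[term] = value
    return dwc, unmapped


record = {
    "Unit/UnitID": "EXAMPLE 0001",
    "Unit/Gathering/Country/Name": "Japan",
    "Unit/SomeCultureSpecificElement": "value",  # hypothetical element
}
dwc_record, unmapped_paths = abcd_to_dwc(record)
print(dwc_record)
print(unmapped_paths)
```

Since ABCD has far more fields than Darwin Core has terms, a translation in this direction is lossy; reporting the unmapped paths makes that loss visible.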
Darwin Core
The Darwin Core is a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections.
It includes 184 terms.
Widely used in global and regional projects such as GBIF
Lacks fields specific to cultures, such as Restrictions, Toxicity, Identification, Deposition and Isolation data, Conditions for growth, Storage Methods, Race, Mutant, Serovar
XML schema of Darwin Core
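A minimal sketch of how a record might be serialised with Darwin Core terms in XML, using Python's standard library. The namespace `http://rs.tdwg.org/dwc/terms/` and the four terms are Darwin Core's own; the wrapper element `record`, the function name, and the sample values are assumptions for illustration and do not follow the official XML schema.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("dwc", DWC_NS)  # serialise terms with the dwc: prefix


def to_dwc_xml(record):
    """Serialise a {Darwin Core term: value} dict to an XML string."""
    root = ET.Element("record")  # hypothetical wrapper element
    for term, value in record.items():
        child = ET.SubElement(root, f"{{{DWC_NS}}}{term}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dwc_xml({
    "institutionCode": "EXAMPLE",
    "catalogNumber": "EXAMPLE 0001",
    "scientificName": "Bacillus subtilis",
    "country": "Japan",
})
print(xml_text)
```

Culture-specific values such as growth conditions have no term in this namespace, so they would need an extension or a separate schema.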
Genomic Standards Consortium (GSC)
…towards richer descriptions of our collection of genomes, metagenomes and marker genes… to promote mechanisms for standardizing the description of (meta)genomes, including the exchange and integration of (meta)genomic data.
MIGS/MIMS/MIMARKS
The Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification describes genomic and metagenomic sequences.
MIGS/MIMS has been extended and adapted for describing environmental sequences: MIMARKS
MCL
10 classes, nearly 100 fields:
1. Culture
2. Strain
3. Sample
4. Isolation
5. Medium
6. Publication
7. Deposit
8. CatalogDescription
9. BRC
10. StrainInfo
Standards for Journals and Publications
How to incorporate these standards in WDCM dataset design
Taxonomic Info
Strain Info
Environment and history
Properties/Phenotypic info
Sequence and genomic info
Reference
WFCC Global Catalogue of Microorganisms
Taxon Concept Schema (TCS)
Darwin Core, MINE, MIMARKS, GenBank Schema
MIGS/MIMS
PubMed
EndNote
WDCM Minimum Data Sets and recommended datasets
ATCC JCM NBRC CBS DSMZ BCC …
Strain number, Name, Organism type, Date of deposition, History, Isolated from, Geographic origin, Condition for growth, Other collection numbers, Application, Reference
WDCM minimal datasets
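A check of a catalogue record against the minimal dataset listed above could look like the sketch below. The snake_case keys, the function name, and the sample record are assumptions for illustration, and treating every listed field as mandatory is also an assumption; the slides name the fields but do not say how WDCM validates them.

```python
# Field names follow the minimal dataset listed above; the snake_case
# spelling is an assumption made for this example.
WDCM_MINIMUM_FIELDS = (
    "strain_number", "name", "organism_type", "date_of_deposition",
    "history", "isolated_from", "geographic_origin",
    "condition_for_growth", "other_collection_numbers",
    "application", "reference",
)


def missing_minimum_fields(record):
    """Return the minimal-dataset fields that are absent or blank."""
    missing = []
    for field_name in WDCM_MINIMUM_FIELDS:
        value = record.get(field_name)
        if value is None or not str(value).strip():
            missing.append(field_name)
    return missing


record = {
    "strain_number": "EXAMPLE 0001",
    "name": "Bacillus subtilis",
    "organism_type": "Bacteria",
    "isolated_from": "soil",
    "geographic_origin": "  ",  # blank values count as missing
}
print(missing_minimum_fields(record))
```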
Indexing System
Isolation source
Original Location
Application
WDCM recommended datasets
Environment package
Application package
Sequence information package
Biochemical and Physiological package
Searching by
Isolation source: human related, soil, water…
Application and products: enzyme, biofuel…
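One way to support this kind of search is to assign each record to index categories and keep an inverted index from category to strain numbers. The sketch below is hypothetical: the keyword lists, function names, and sample records are invented for illustration, and the slides do not describe how the WDCM indexing system actually classifies free-text values.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists for the isolation-source categories.
SOURCE_KEYWORDS = {
    "human related": {"human", "patient", "clinical"},
    "soil": {"soil"},
    "water": {"water", "river", "sea"},
}


def classify(text, keywords):
    """Return the categories whose keywords occur as whole words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keys in keywords.items() if words & keys)


def build_index(records, field_name, keywords):
    """Build an inverted index: category -> set of strain numbers."""
    index = defaultdict(set)
    for rec in records:
        for category in classify(rec.get(field_name, ""), keywords):
            index[category].add(rec["strain_number"])
    return index


records = [
    {"strain_number": "EX 1", "isolated_from": "Garden soil"},
    {"strain_number": "EX 2", "isolated_from": "Hot spring water"},
    {"strain_number": "EX 3", "isolated_from": "Human skin"},
    {"strain_number": "EX 4", "isolated_from": "Lake water sediment"},
]
source_index = build_index(records, "isolated_from", SOURCE_KEYWORDS)
print(sorted(source_index["water"]))
```

The same function could index an application field by passing a different field name and keyword table, for example one with enzyme and biofuel categories.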
OECD Guidelines Darwin Core ABCD code
MCL
JSCC ABRCN EBRCN CABRI ….
WDCM experts working group
Extremophiles type: high temperature, pH
Geographic characteristics: hot spring, salt lake
OECD JCM NBRC DSMZ CBS ATCC STRAININFO ABRCN JSCC WDCM
Accession number √ √ √ √ √ √ √ √ Strain number
Other collection numbers √ √ √ √ √ √ √ √
Name √ √ √ √ √ √ √ √ Genus Name, Species_epithet
√ √ Date of deposition
Organism type √ √ √ √ √ √ √
Restrictions √ √ √ Status √ √ √ √ √
History of deposit √ √ √ √ √ √
Condition for growth √ √ √ √ √ √ Condition for growth: Temperature/medium
Form of supply √ √ shipped √
Geographic origin √ √ √ √ √ √ Misapplied names
Isolated from √ √ √ √ √ √ √ Mutant
Literature √ √ √ √ √ √ √ Sexual state
Race
Production Application Application Application Application Synonym Application
Biochemistry/Physiology sequence Patents Designations sequence
Cell wall Synonymous Name Deposited by Depositor sample info Depositor
Fatty acid Rehydration Fluid Biosafety Level medium info
Quinone Biosafety Level depositor info
G+C content
Mating Type
Phylogeny Genetic Marker
Plant Quarantine No.
Animal Quarantine No.
Herbarium No.
Restriction
Implementation of data standards in WDCM data management system
EML