Grassbase: the data volume challenge

Grassbase: the data volume challenge Maria Vorontsova 26 May 2011

Page 1: Grassbase: the data volume challenge

Grassbase: the data volume challenge

Maria Vorontsova, 26 May 2011

Page 2: Grassbase: the data volume challenge

Grassbase: the first botanical agglomerate database behemoth?

Strengths: coherent classification system; complete and up to date

Weaknesses: divided stakeholder community; dated software; poor web functionality; limited usefulness for identification; no plan for data exploitation

CATE Araceae: 2,000 descriptions

Solanaceae Source: 1,000 descriptions

Grassbase: 11,161 descriptions

area of boxes proportional to number of descriptions included
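
The scaling rule in the caption can be illustrated with a short sketch. It assumes the boxes are squares (the slide does not say so), in which case each side length is proportional to the square root of the description count. The function name `box_sides` and the `unit` parameter are illustrative, not part of the original slide.

```python
import math

# Description counts as given on the slide.
descriptions = {
    "Solanaceae Source": 1_000,
    "CATE Araceae": 2_000,
    "Grassbase": 11_161,
}

def box_sides(counts, unit=1.0):
    """Side lengths of squares whose areas are proportional to the counts.

    Assumption: square boxes, so side = unit * sqrt(count).
    """
    return {name: unit * math.sqrt(n) for name, n in counts.items()}

sides = box_sides(descriptions)
for name, side in sides.items():
    # side ** 2 recovers the count, so areas stay proportional.
    print(f"{name}: side {side:.1f}, area {side ** 2:.0f}")
```

Under this assumption the Grassbase box is about 11 times the area of the Solanaceae Source box but only about 3.3 times as wide.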

Page 3: Grassbase: the data volume challenge

Grass genera are defined by the variability in spikelet and floret composition

pooid spikelet similar to Bromus (Holcus)

panicoid spikelet common in tropical grasses (Panicum)

Spikelet parts labelled: glume 1, glume 2, lower lemma, lower palea, upper lemma, upper palea

Page 4: Grassbase: the data volume challenge

FAO 2009: global consumption of 10 major vegetal foods (2003-2005)

Page 5: Grassbase: the data volume challenge

Grass taxonomy at Kew

Otto Stapf: Flora of Tropical Africa. 1934.

Charles Hubbard: Grasses (of Britain). 1954.

C.R. Metcalfe: Anatomy of the Monocotyledons: Gramineae. 1960.

N.L. Bor: The Grasses of India. 1960.

Derek Clayton: Flora of Tropical East Africa. 1970. Flora of West Tropical Africa. 1972. Genera Graminum. 1986. World Grass Flora: “The Kew View” 1985 onwards.

Page 6: Grassbase: the data volume challenge
Page 7: Grassbase: the data volume challenge

Evolutionary reconstruction of DELTA grass datasets

1977: Clifford & Watson, Australian Grass Genera; 332 characters

1985: Lazarides, Clayton & Palmer, World Grass Species; 600 characters

1992: Watson & Dallwitz, Grass Genera of the World

2011: Grassbase, GrassWorld; 1017 characters

Australian National University morphological data

Clayton Genera Graminum dataset

25 years full time data entry

Page 8: Grassbase: the data volume challenge

NAMES (species + infra): 63,000

NAMES (generic): 2,000

DESCRIPTIONS (species)

DESCRIPTIONS (genera)

INTKEY (species)

INTKEY (genera)

ACCEPTED SPECIES: 11,000

ACCEPTED GENERA: 700

Page 9: Grassbase: the data volume challenge

Access SYNON groups names into homotypic groups
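
A minimal sketch of grouping names into homotypic groups, in Python rather than Access. It assumes each name record carries its basionym (the name on which it is ultimately based), so that names sharing a basionym share a type. The record layout and the function name `homotypic_groups` are hypothetical and not taken from Grassbase.

```python
from collections import defaultdict

# Hypothetical records: (name, basionym).
# A name that is not a new combination is listed as its own basionym.
names = [
    ("Bromus sterilis", "Bromus sterilis"),
    ("Anisantha sterilis", "Bromus sterilis"),
    ("Holcus lanatus", "Holcus lanatus"),
]

def homotypic_groups(records):
    """Group names that share a basionym, keeping input order within groups."""
    groups = defaultdict(list)
    for name, basionym in records:
        groups[basionym].append(name)
    return dict(groups)

groups = homotypic_groups(names)
for basionym, members in groups.items():
    print(basionym, "->", members)
```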

Page 10: Grassbase: the data volume challenge

an average of 88 pieces of information per species in DELTA descriptive language
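
The overall data volume implied by this average can be estimated with simple arithmetic. The sketch below assumes the 11,161 species descriptions quoted earlier in the talk and treats 88 as an exact mean, so the result is only an order-of-magnitude figure.

```python
# Figures taken from the slides; the product is an estimate, not a reported total.
species_descriptions = 11_161
items_per_species = 88

total_items = species_descriptions * items_per_species
print(f"approx. {total_items:,} coded pieces of information")
```

That is close to one million coded data items in the species descriptions alone.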

Page 11: Grassbase: the data volume challenge

species description webpages linked by a single index page

Page 12: Grassbase: the data volume challenge

Page 13: Grassbase: the data volume challenge

NAMES (species + infra): 63,000

NAMES (generic): 2,000

DESCRIPTIONS (species)

DESCRIPTIONS (genera)

INTKEY (species)

INTKEY (genera)

ACCEPTED SPECIES: 11,000

ACCEPTED GENERA: 700

TRIBES

TYPES (species + infra)

AREAS (TDWG, countries)

Page 14: Grassbase: the data volume challenge

programs to check coding and tidy descriptions

programs for renaming files

programs to output natural language and INTKEY files

ca. 50 permanent and temporary queries

buttons to output species and generic lists

code to check data presence and spelling within and between tables

simplified version for website generated with one button

special program for putting name lists into two columns!
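
The presence and spelling checks listed above might look roughly like the following sketch. It is written in Python for illustration and is not the Grassbase code; the table contents, variable names, and the helper `missing_between` are all hypothetical. Spelling suggestions use the standard library's `difflib.get_close_matches`.

```python
import difflib

def missing_between(source, target):
    """Return sorted entries of `source` that do not occur in `target`."""
    return sorted(set(source) - set(target))

# Hypothetical table contents.
accepted_species = ["Holcus lanatus", "Panicum miliaceum", "Anisantha sterilis"]
described_species = ["Holcus lanatus", "Panicum miliaceum"]
accepted_genera = ["Holcus", "Panicum"]

# Presence check between tables: accepted species lacking a description.
no_description = missing_between(accepted_species, described_species)

# Presence check between tables: genus part of a species name not in the genus table.
unknown_genera = missing_between(
    (name.split()[0] for name in accepted_species), accepted_genera
)

# Spelling check: suggest the closest known genus for a mistyped one.
suggestion = difflib.get_close_matches("Panicun", accepted_genera, n=1)

print(no_description, unknown_genera, suggestion)
```

Set difference keeps each check to one line, which suits the many table pairs a database of this kind has to reconcile.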

Page 15: Grassbase: the data volume challenge

IPNI updates for newly described names only

coding from literature; holding files for new items

custom set of programs to sync Access and DELTA

multi-stage import procedure via a series of tables

macros for different data type imports

Page 16: Grassbase: the data volume challenge

Grassbase: coding a new species

Page 17: Grassbase: the data volume challenge

Grassbase: coding a new species

Page 18: Grassbase: the data volume challenge

Grassbase: coding a new species

Program “Check” confirms internal consistency of data
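
The slides do not show what “Check” tests, so the following is only a sketch of one plausible kind of internal consistency test: every coded character must exist in the character list, and every coded state must fall within that character's range. The character list `n_states`, the description layout, and the function `check_item` are hypothetical.

```python
# Hypothetical character list: character number -> number of allowed states.
n_states = {1: 2, 2: 3, 3: 4}

def check_item(item):
    """Return a list of problems found in one coded description.

    `item` maps character numbers to the list of states coded for a taxon.
    """
    problems = []
    for char, states in item.items():
        if char not in n_states:
            problems.append(f"unknown character {char}")
            continue
        for state in states:
            if not 1 <= state <= n_states[char]:
                problems.append(f"character {char}: state {state} out of range")
    return problems

# Character 2 has only 3 states, and character 9 is not defined.
problems = check_item({1: [1], 2: [4], 9: [1]})
print(problems)
```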

Page 19: Grassbase: the data volume challenge

Grassbase: coding a new species

Page 20: Grassbase: the data volume challenge

Recent changes in grass names affect common species

Bromus sterilis → Anisantha sterilis

Page 21: Grassbase: the data volume challenge

“Panicum” as used by Grassbase includes numerous evolutionary lineages with simple panicoid spikelets

Page 22: Grassbase: the data volume challenge

widening gap between morphological and phylogenetic classifications: ca. 15% of species and generic names differ

Grassbase: The Kew View, an authoritative system with morphologically defined genera

Grass Phylogeny Working Group, GrassWorld, and others: multiple research groups in USA and Australia
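
A figure such as the ca. 15% above could be obtained by comparing the accepted name each system gives to the same taxon. The sketch below assumes both systems can be keyed by a shared taxon identifier; the identifiers, the placeholder names, and the function `share_differing` are hypothetical, and the toy data are not meant to reproduce the 15%.

```python
def share_differing(system_a, system_b):
    """Fraction of shared taxa whose accepted name differs between two systems."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 0.0
    differing = sum(1 for key in shared if system_a[key] != system_b[key])
    return differing / len(shared)

# Placeholder data: taxon identifier -> accepted name in each system.
morphological = {"t1": "Genus-a one", "t2": "Genus-a two", "t3": "Genus-b one", "t4": "Genus-c one"}
phylogenetic = {"t1": "Genus-a one", "t2": "Genus-d two", "t3": "Genus-b one", "t4": "Genus-c one"}

gap = share_differing(morphological, phylogenetic)
print(f"{gap:.0%} of shared taxa carry different names")
```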