TDWG 2013 Vesper
Martin Graham & Jessie Kennedy
Edinburgh Napier University
VESPER: Visual Exploration of Species-Referenced Repositories
• VESPER – an exploration into data quality issues for Darwin Core Archives (DWCA)
• DWCAs are files for storing detailed species-based data sets
• How does a user know which data sets are useful and complete?
Introduction
• GBIF has tools to test DWCA validity
• This work is about visualising data we assume is “valid” but whose “usefulness” is uncertain
– Taxonomy is broken
– Dates are wrong
– Lions in the sea
• In many cases the usefulness of such data is only seen when visualised in context
Valid vs. Useful
• Web-based visualisation of DWCAs
– Uses HTML5 (SVG, CSS3, FileWriters, ArrayBuffers)
– D3 toolkit
– Client side only
• Visualise basic dimensions of data
– Taxonomy
– Geography
– Time
– Miscellaneous stats
Approach
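Darwin Core Archives
[Diagram: structure of a Darwin Core Archive]
• Meta files (XML): Meta.xml and Eml.xml, which describe the data files
• Data files (CSV): exactly one core file (taxa or occurrence data) and zero or more extension files
• Extension rows link to core rows where Extension ID == Core ID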
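The Extension ID == Core ID link can be sketched as a simple join. This is a minimal illustration, not VESPER's code: the function name attachExtensions and the row properties id and coreId are hypothetical, and rows are assumed to be already parsed from the CSV files into plain objects.

```javascript
// Hypothetical helper: group extension rows under the core record they point to.
// Assumes each core row has a unique `id` and each extension row has a `coreId`.
function attachExtensions(coreRows, extensionRows) {
  const byId = new Map();
  for (const row of coreRows) {
    byId.set(row.id, { ...row, extensions: [] });
  }
  const orphans = []; // extension rows whose coreId matches no core record
  for (const ext of extensionRows) {
    const core = byId.get(ext.coreId);
    if (core) {
      core.extensions.push(ext);
    } else {
      orphans.push(ext);
    }
  }
  return { records: [...byId.values()], orphans };
}
```

Orphaned extension rows are returned separately because they are themselves a possible data quality signal.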
• Zip files make things smaller
– Good for network transport
– But analysing the data means we have to make things big again
Zapped by Zip
Expand a lot
Expand even more (string copying, UTF-16 etc.)
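A rough feel for the UTF-16 expansion can be had by comparing byte counts. This is a sketch under assumptions: the function name estimateSizes is hypothetical, and it assumes two bytes per UTF-16 code unit, whereas real JavaScript engines may store some strings more compactly.

```javascript
// Hypothetical helper: compare the UTF-8 size of a string (roughly what the
// unzipped file holds) with a naive UTF-16 estimate of the in-memory string.
function estimateSizes(text) {
  const utf8Bytes = new TextEncoder().encode(text).length;
  const utf16Bytes = text.length * 2; // String length counts UTF-16 code units
  return { utf8Bytes, utf16Bytes };
}
```

For mostly ASCII data such as estimateSizes("Panthera leo"), the estimate is 12 bytes as UTF-8 against 24 bytes as UTF-16, before any copies made while splitting lines and fields.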
• Partial unzip
• Analyse fields listed in meta file
– Disregard verbose fields
• Find combinations of fields that can be used to generate a visualisation
• List the visualisations available for a given meta.xml and extract only the fields the chosen ones need
Zip Zapped
Implicit taxonomy: acceptedNameUsageID, parentNameUsageID
Explicit taxonomy: any of kingdom, order, family, genus etc.
Map: decimalLongitude, decimalLatitude
Timeline: eventDate
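The field-to-visualisation mapping above can be sketched as a small rule table. This is an illustration, not VESPER's implementation: VIS_RULES, availableVisualisations and fieldsToExtract are hypothetical names, the explicit taxonomy rule lists only the four ranks the slide names, and treating both implicit taxonomy fields as required is an assumption.

```javascript
// Hypothetical rule table: "all" means every field is needed, "any" means one is enough.
const VIS_RULES = [
  { name: "Implicit Taxonomy", mode: "all", fields: ["acceptedNameUsageID", "parentNameUsageID"] },
  // The slide says "etc"; only the ranks it names are listed here.
  { name: "Explicit Taxonomy", mode: "any", fields: ["kingdom", "order", "family", "genus"] },
  { name: "Map", mode: "all", fields: ["decimalLongitude", "decimalLatitude"] },
  { name: "Timeline", mode: "all", fields: ["eventDate"] },
];

// Which visualisations can be built from the field names listed in meta.xml?
function availableVisualisations(fieldNames) {
  const present = new Set(fieldNames.map((f) => f.toLowerCase()));
  const has = (f) => present.has(f.toLowerCase());
  return VIS_RULES.filter((rule) =>
    rule.mode === "all" ? rule.fields.every(has) : rule.fields.some(has)
  ).map((rule) => rule.name);
}

// Which of the archive's fields need extracting for the chosen visualisations?
function fieldsToExtract(chosenNames, fieldNames) {
  const present = new Map(fieldNames.map((f) => [f.toLowerCase(), f]));
  const wanted = new Set();
  for (const rule of VIS_RULES) {
    if (!chosenNames.includes(rule.name)) continue;
    for (const field of rule.fields) {
      const actual = present.get(field.toLowerCase());
      if (actual) wanted.add(actual);
    }
  }
  return [...wanted];
}
```

Everything outside the returned field list, such as long free-text remarks, could then be skipped during the partial unzip.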
• Sunburst / Icicle plot
– Some difficulties with high fan-out taxa
– Though a lot of these are data quality issues
Taxonomy
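A sunburst or icicle plot needs a nested tree, which for the implicit taxonomy has to be assembled from parent links. A minimal sketch, assuming each row carries a unique taxonID plus an optional parentNameUsageID; buildTaxonTree is a hypothetical name and this is not taken from VESPER's source.

```javascript
// Hypothetical helper: turn flat taxon rows into nested { id, name, children } nodes.
// Assumes taxonID values are unique within the archive.
function buildTaxonTree(rows) {
  const nodes = new Map();
  for (const row of rows) {
    nodes.set(row.taxonID, { id: row.taxonID, name: row.scientificName, children: [] });
  }
  const roots = [];
  for (const row of rows) {
    const node = nodes.get(row.taxonID);
    const parent = nodes.get(row.parentNameUsageID);
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      // No parent given, or the parent is missing from the archive.
      roots.push(node);
    }
  }
  return roots;
}
```

The nested name/children shape is the kind of structure D3's hierarchical layouts consume. An unexpectedly large number of roots suggests broken parent links.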
• Based on the popular Leaflet.js library
– And the Markercluster plugin
– Some adaptations to show selected items
Geography
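Before records reach the map, their coordinate fields have to be turned into numbers. The sketch below is an assumption about how that preparation might look, not VESPER's code; toMapPoints and the id property are hypothetical. Range checks only catch impossible coordinates; a lion in the sea has perfectly valid ones and still needs the map to be spotted.

```javascript
// Hypothetical helper: parse decimalLatitude / decimalLongitude strings and
// separate plottable points from records with missing or impossible values.
function toMapPoints(rows) {
  const points = [];
  const rejected = [];
  for (const row of rows) {
    const lat = Number.parseFloat(row.decimalLatitude);
    const lng = Number.parseFloat(row.decimalLongitude);
    const ok =
      Number.isFinite(lat) &&
      Number.isFinite(lng) &&
      Math.abs(lat) <= 90 &&
      Math.abs(lng) <= 180;
    if (ok) {
      points.push({ id: row.id, lat, lng });
    } else {
      rejected.push(row.id);
    }
  }
  return { points, rejected };
}
```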
• Simple bar chart
– With range slider
– Zoom in and see yearly patterns (e.g. not much at Christmas)
Temporal
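The bar chart's data amounts to counting records per time bin. A minimal sketch, assuming eventDate values start with an ISO-style year or year-month prefix (for example 2001-12-25); countByPeriod is a hypothetical name, and values that do not match are simply skipped.

```javascript
// Hypothetical helper: count eventDate values per year or per month.
function countByPeriod(eventDates, unit = "year") {
  const pattern = unit === "month" ? /^\d{4}-\d{2}/ : /^\d{4}/;
  const counts = new Map();
  for (const value of eventDates) {
    const text = value == null ? "" : String(value).trim();
    const match = pattern.exec(text);
    if (!match) continue; // empty or unparseable date
    counts.set(match[0], (counts.get(match[0]) || 0) + 1);
  }
  // Keys share one fixed-width format, so string order is chronological.
  return [...counts]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([period, count]) => ({ period, count }));
}
```

Switching unit from "year" to "month" corresponds to zooming in with the range slider to see seasonal dips.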
• Sanity check - Empty data count
Miscellaneous
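An empty data count is straightforward to sketch. The function name countEmptyValues is hypothetical, and treating whitespace-only strings as empty is an assumption rather than something the slides state.

```javascript
// Hypothetical helper: for each field, count rows where the value is missing or blank.
function countEmptyValues(rows, fieldNames) {
  const counts = {};
  for (const field of fieldNames) {
    counts[field] = 0;
  }
  for (const row of rows) {
    for (const field of fieldNames) {
      const value = row[field];
      if (value === undefined || value === null || String(value).trim() === "") {
        counts[field] += 1;
      }
    }
  }
  return counts;
}
```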
• Taxonomic fan-out for hollow curve anomalies
• Export selected IDs
– These can be saved or sent somewhere else
Miscellaneous
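The fan-out statistic can be sketched as a histogram of how many taxa have one child, two children and so on. This is an illustration under assumptions: fanOutDistribution is a hypothetical name and rows are assumed to carry parentNameUsageID. The usual expectation is a hollow curve with many low fan-out taxa and a long tail, so isolated spikes are worth a closer look.

```javascript
// Hypothetical helper: histogram of the number of direct children per parent taxon.
function fanOutDistribution(rows) {
  const childCounts = new Map();
  for (const row of rows) {
    const parent = row.parentNameUsageID;
    if (!parent) continue; // root taxa have no parent
    childCounts.set(parent, (childCounts.get(parent) || 0) + 1);
  }
  const histogram = new Map();
  for (const fanOut of childCounts.values()) {
    histogram.set(fanOut, (histogram.get(fanOut) || 0) + 1);
  }
  return [...histogram]
    .sort((a, b) => a[0] - b[0])
    .map(([fanOut, taxa]) => ({ fanOut, taxa }));
}
```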
• Selections in one view are reflected in the other views for the same data
– Multiple views, linking
Selection
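Linked selection across views is commonly built on a shared model that notifies each view of changes. The sketch below shows that general pattern; createSelectionModel and its methods are hypothetical and not VESPER's API.

```javascript
// Hypothetical shared selection model: views subscribe, and any view can set the selection.
function createSelectionModel() {
  let selected = new Set();
  const listeners = [];
  return {
    // Register a callback that receives the selected record IDs as an array.
    subscribe(listener) {
      listeners.push(listener);
    },
    // Replace the current selection and notify every subscribed view.
    select(ids) {
      selected = new Set(ids);
      for (const listener of listeners) {
        listener([...selected]);
      }
    },
    // Current selection, e.g. for exporting the selected IDs.
    getSelected() {
      return [...selected];
    },
  };
}
```

A map view and a taxonomy view would each subscribe once; a brush in the timeline would call select with the matching record IDs and both views would redraw their highlights.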
• JavaScript visualisations for DWCAs
• Quickly shows areas with data quality issues
• Can handle large archives if only key fields are analysed
Conclusion
• http://www.soc.napier.ac.uk/~cs22/vesperDemo/vesper/demoNew.html
– Feedback welcome
• Thanks to GBIF, Canadensys, EMBL for data
• Funded by BBSRC
• Ask for a demo
Fin