Simon Cox, Bruce Simons, Nick Car
12 March 2015
LAND AND WATER FLAGSHIP
Some problems with standard geospatial metadata
This presentation
• Asks some questions
• Does not provide all the answers
• … but suggests some directions …
Problems with metadata | Nick Car 2 |
Outline
• ANZLIC and GeoNetwork
• Where did ANZLIC come from?
• Records
• Uses of metadata
• UML vs XML
• RDF
• RDF vocabularies
ANZLIC Metadata
Where did ANZLIC come from?
● ANZLIC a profile of ISO 19115:2003
● ISO 19115 designed by a committee (horse designed by committee = camel)
○ US FGDC metadata a strong precedent
○ requirements collected in the 1990s
○ image and map librarians
> dawn of the internet, dataset == file
> 10,000s datasets in standard series, metadata == digital ‘index cards’
Problem #1: Data ≠ Datasets?
• When cataloguing books, maps, images, even files, the card-index metaphor is OK
  – A discrete record for each item of data
• Now we expect to access data at a variety of granularities, so the dataset/metadata record paradigm no longer applies
• It is a sea of data, and should be matched by a sea of metadata (maybe in the same place)
Breaking it down
• Structural decomposition
• Functional decomposition
Lawrence, Lowry, Miller, Snaith & Woolf, Information in environmental data grids. Phil. Trans. A, 2009
Problem #2: One record can’t serve all purposes
• But one ‘record’ is all you’ve got!
ISO metadata was formalized as UML classes
GeoNetwork stores metadata as XML documents in a text database (Lucene)
Problem #3: Documents package text, not objects
• Instances of UML classes = Objects
• XML document = serialization for transport
• Treating the XML document as ‘canonical’ makes a basic category error:
  ➢ XML validation ≠ quality control
  ➢ if you only intend to manage it as text, why bother with a UML analysis?
For object-oriented behavior, the serialized form must be ‘un-marshalled’ for processing
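To make the point concrete, here is a minimal sketch of un-marshalling in Python. The XML layout, the `MetadataRecord` and `Contact` classes and the `unmarshal` function are all hypothetical and heavily simplified; real ISO 19139 documents are namespaced and far more deeply nested. It only illustrates that behaviour attaches to the objects, not to the text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified record (assumption: not a real ISO 19139 layout).
RECORD_XML = """
<metadata>
  <title>Soil moisture grid</title>
  <contact>
    <name>Example Custodian</name>
    <email>custodian@example.org</email>
  </contact>
  <keyword>soil</keyword>
  <keyword>moisture</keyword>
</metadata>
"""


@dataclass
class Contact:
    name: str
    email: str


@dataclass
class MetadataRecord:
    title: str
    contact: Contact
    keywords: List[str] = field(default_factory=list)

    def has_keyword(self, word: str) -> bool:
        """Behaviour lives on the object, not on the serialized text."""
        return word.lower() in (k.lower() for k in self.keywords)


def unmarshal(xml_text: str) -> MetadataRecord:
    """Turn the transport serialization back into typed objects."""
    root = ET.fromstring(xml_text)
    contact_el = root.find("contact")
    if contact_el is None:
        raise ValueError("record has no contact element")
    contact = Contact(
        name=contact_el.findtext("name", default=""),
        email=contact_el.findtext("email", default=""),
    )
    return MetadataRecord(
        title=root.findtext("title", default=""),
        contact=contact,
        keywords=[k.text or "" for k in root.findall("keyword")],
    )


record = unmarshal(RECORD_XML)
print(record.title, record.has_keyword("SOIL"))
```

Note that a document can pass XML validation and still fail a check like the `ValueError` above, or any richer rule written against the objects.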
Metadata creation
Problem #4: Index cards are not infrastructure
• Metadata-entry paradigm encourages record counting as a KPI
• Surely there are better measures of usefulness?
• How can we know, if it is not part of a joined-up architecture?
What does everyone else do?
1. Specialist systems for specialized communities
   – Is spatial special? Do we want our spatial data in the mainstream?
2. Don’t bother with metadata, just index the content
   – The original strategy of the search engines
   – Google Knowledge Graph now works with entities, not text
   – (shame the entities don’t have persistent URIs …)
3. Metadata annotations
   – schema.org
   – semantic-web-lite
4. What about the Data Repositories?
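As an illustration of item 3, a schema.org annotation for a dataset can be sketched in a few lines of Python. The dataset name, description and URL are placeholders, and `dataset_annotation` is a hypothetical helper; only the `Dataset` type and the `name`, `description`, `url` and `keywords` properties come from schema.org.

```python
import json


def dataset_annotation(name, description, url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


annotation = dataset_annotation(
    name="Example soil moisture grid",                # hypothetical dataset
    description="Illustrative record only.",
    url="https://example.org/dataset/soil-moisture",  # placeholder URL
    keywords=["soil", "moisture"],
)

# JSON-LD like this is typically embedded in the dataset's landing page.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(annotation, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The annotation travels with the web page that describes the data, so there is no separate catalogue record to keep in sync.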
Research Data Repositories
• Still a lot of variation
  – RIF-CS
  – MARC
  – Dublin Core
  – Data Catalog Vocabulary (DCAT)
• RDF vocabularies?
  – DC, DCAT
  – FOAF, PROV-O, VoID, SKOS, ADMS, LOCN
INSPIRE profile of DCAT-AP
INSPIRE metadata record as RDF
RDF benefits
• Standard vocabularies used in the broader community
• Intrinsically object/resource oriented
• URIs for keys - linked data
• Open world – missing information doesn’t make it invalid
• No intrinsic granularity
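A rough sketch of these properties, using plain Python tuples as a stand-in for a real RDF library. The `https://example.org/` resources are placeholders, and `Literal`, `Triple` and `to_ntriples` are hypothetical helpers; the DCAT and Dublin Core terms are the published ones.

```python
from typing import NamedTuple, Set, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
EX = "https://example.org/"  # placeholder namespace (assumption)


class Literal(str):
    """Marks a plain string value, as opposed to a URI."""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Union[str, Literal]


graph: Set[Triple] = {
    # URIs are the keys: every resource and property is globally named.
    Triple(EX + "dataset/1", RDF_TYPE, DCAT + "Dataset"),
    Triple(EX + "dataset/1", DCT + "title", Literal("Example soil moisture grid")),
    Triple(EX + "dataset/1", DCAT + "distribution", EX + "dataset/1/csv"),
    # No intrinsic granularity: a finer-grained resource sits in the same graph.
    Triple(EX + "dataset/1/csv", RDF_TYPE, DCAT + "Distribution"),
    Triple(EX + "dataset/1/csv", DCAT + "downloadURL", EX + "files/soil.csv"),
    # Open world: a dataset with no title is incomplete, not invalid.
    Triple(EX + "dataset/2", RDF_TYPE, DCAT + "Dataset"),
}


def to_ntriples(triples) -> str:
    """Serialize the triples as simple N-Triples lines."""
    lines = []
    for s, p, o in sorted(triples):
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"'
        else:
            obj = "<" + o + ">"
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


print(to_ntriples(graph))
```

Adding a statement about a single file, a single variable or the whole collection is the same operation: one more triple in the graph.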
Summary
ANZLIC + GeoNetwork:
☹ Record-oriented metadata doesn’t match granularity of data
☹ Each record must serve multiple functions
☹ Object oriented design, but serialization-oriented processing
☹ Incentive to create records, not architecture
☹ Not aligned with anyone else’s metadata
RDF?:
☺ Graph of metadata to match graph of data
☺ Targeted metadata subsets can be constructed using SPARQL
☺ Intrinsically resource-oriented
☺ Part of web of Linked Data
☺ Standard RDF vocabularies
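The "targeted subsets" point can be sketched as follows. This is a plain-Python stand-in for a SPARQL CONSTRUCT, not a SPARQL engine: `construct_subset` is a hypothetical helper, the example triples are invented, and the query string is shown only to indicate roughly what the equivalent SPARQL would look like.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
EX = "https://example.org/"  # placeholder namespace (assumption)

# One graph holds metadata serving several purposes at once.
graph = {
    (EX + "dataset/1", DCT + "title", "Example soil moisture grid"),
    (EX + "dataset/1", DCAT + "keyword", "soil"),
    (EX + "dataset/1", DCT + "license", EX + "licence/cc-by"),
    (EX + "dataset/1", DCT + "modified", "2015-03-12"),
}

# Roughly the same request in SPARQL (not executed here):
SPARQL_EQUIVALENT = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
CONSTRUCT { ?d ?p ?o }
WHERE {
  ?d ?p ?o .
  VALUES ?p { dct:title dcat:keyword }
}
"""


def construct_subset(triples, predicates):
    """Return only the triples whose predicate is in the wanted set."""
    wanted = set(predicates)
    return {t for t in triples if t[1] in wanted}


# A 'discovery' view needs titles and keywords, not licence or update dates.
discovery_view = construct_subset(graph, [DCT + "title", DCAT + "keyword"])
for triple in sorted(discovery_view):
    print(triple)
```

Each purpose (discovery, access, management) gets its own view of the graph, instead of one record trying to serve them all.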
LAND AND WATER FLAGSHIP
Thank you
Land and Water Flagship
Nick Car, Research Engineer
t +61 7 3833 5600 e nicholas.car@csiro.au
Land and Water Flagship
Simon Cox, Research Scientist
t +61 3 9252 6342 e simon.cox@csiro.au w people.csiro.au/C/S/Simon-Cox