Building a names backbone
![Page 1: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/1.jpg)
Building a “names backbone”
Nicky Nicolson, RBG Kew
![Page 2: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/2.jpg)
A names backbone
== “an environment for the management of multiple
overlapping classifications and tracking how these
change over time”
Not a monolith:
• Built on a layered view of the domain – clearly
separating names and taxonomy
• Names form the objective basis for higher layers
![Page 3: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/3.jpg)
The current situation…
Many overlapping systems, few links
![Page 4: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/4.jpg)
… and what we’re aiming for:
Authoritative data, reduced duplication, many more links
![Page 5: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/5.jpg)
Names backbone: a layered environment
![Page 6: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/6.jpg)
Name occurrence layer, AKA “Nomen-clutter”
== any attempt at the transcription of a name.
![Page 7: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/7.jpg)
Names layer
Holds objective published facts about a name:
- Orthography
- Authorship
- Protologue reference
- Type citation
- Objective synonymy
![Page 8: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/8.jpg)
Concepts layer
Hypotheses draw names together to form concepts via heterotypic synonymy
![Page 9: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/9.jpg)
The (current) problem:
Most people want to operate at concept level…
![Page 10: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/10.jpg)
The (current) problem:
… but have to start right down at the lowest level
![Page 11: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/11.jpg)
The problem:
![Page 12: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/12.jpg)
Solving the problem…
We need to give people better ways to navigate between the layers and to focus their efforts, e.g. by building classifications on the same objective basis.
We started with a blank sheet of paper: it’s hard to get existing systems to conform to the layering that we need.
![Page 13: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/13.jpg)
Drawbacks of data models used to date
• conflate the storage of names and concepts
• store only a single classification
• store only the end product of a thought process, not work in progress
• are difficult to version
• are difficult to query effectively (for hierarchies etc.)
![Page 14: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/14.jpg)
A new (graph) model
• Stores data as graphs – composed of nodes and
directed relationships
• Both nodes and relationships can hold data as
properties
• Supports highly interconnected data
• Supports self-referential data
• Optimised for queries on relationships
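The graph model described on this slide can be illustrated with a few lines of Python. This is a minimal sketch, not the system's actual implementation: the class names (`Node`, `Relationship`, `Graph`), the method names, and the property keys are all assumptions made for illustration. The example names are the placeholder names used later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node; `properties` holds arbitrary key/value data."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed link from `source` to `target` with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph (illustration only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, source, target, rel_type, **properties):
        # Both ends must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a relationship must be existing nodes")
        rel = Relationship(source, target, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_type=None):
        # Relationships that start at `node_id`, optionally filtered by type.
        return [
            r for r in self.relationships
            if r.source == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


g = Graph()
g.add_node("n1", name="Aus bus", author="Jones")
g.add_node("n2", name="Xus yus", author="Smith")
# The relationship, not the node, carries the taxonomic opinion.
g.relate("n1", "n2", "synonym", classification="WCS")
```

Nodes and relationships both carry properties, so a query such as `g.outgoing("n1", "synonym")` works on the links themselves rather than on fields stored with a name.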
![Page 15: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/15.jpg)
Using a graph model to hold concept data: Attempt #1
Two nodes, with name + status properties, and an “accepted_as” link.
== a naïve use of the graph model: status is stored in 2 places (explicitly in the status property, implicitly by participation in the relationship)
![Page 16: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/16.jpg)
Using a graph model to hold concept data: Attempt #2
More strict about the separation of the nomenclatural information (the nodes) and the taxonomic information (the relationships between nodes), but the link is still very sparse…
![Page 17: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/17.jpg)
Using a graph model to hold concept data: Attempt #3
Add an attribute to indicate which classification asserts this subjective relationship.
Taxonomic status of a name is inferred from its participation in a subjective taxonomic relationship.
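The idea that status is inferred rather than stored can be sketched as follows. This is an illustration under stated assumptions: the `Synonymy` tuple, the `taxonomic_status` function, the "unplaced" label and the second classification name "OtherList" are hypothetical, and only synonymy relationships are considered (a fuller model would also look at relationships such as parent_child).

```python
from collections import namedtuple

# One subjective relationship: `synonym` is treated as a synonym of `accepted`,
# according to the named classification. Field names are assumptions.
Synonymy = namedtuple("Synonymy", ["synonym", "accepted", "classification"])

relationships = [
    Synonymy("Aus bus Jones", "Xus yus Smith", "WCS"),
]


def taxonomic_status(name, classification, relationships):
    """Infer a name's status from the relationships it takes part in.

    Nothing is stored on the name itself, so the same name can have a
    different status in each classification.
    """
    in_scope = [r for r in relationships if r.classification == classification]
    if any(r.synonym == name for r in in_scope):
        return "synonym"
    if any(r.accepted == name for r in in_scope):
        return "accepted"
    # Hypothetical label for a name this classification says nothing about.
    return "unplaced"


print(taxonomic_status("Aus bus Jones", "WCS", relationships))
print(taxonomic_status("Xus yus Smith", "WCS", relationships))
print(taxonomic_status("Aus bus Jones", "OtherList", relationships))
```

Because the status is computed per classification, the duplication seen in Attempt #1 (a status property plus a link) cannot arise.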
![Page 18: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/18.jpg)
Links become more interesting than the nodes
Expand the data held on the subjective relationship to allow it to be computationally assessed
![Page 19: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/19.jpg)
Multiple opinions – using the same name nodes
Store multiple opinions by reusing the same name nodes (the basic facts)
![Page 20: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/20.jpg)
Relationships held
Objective, e.g.:
• Combination-basionym
• Later_homonym
• Alternative_name_for
• …
Subjective, e.g.:
• Parent_child (taxonomic placement)
• Synonym (heterotypic synonymy)
• …
![Page 21: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/21.jpg)
Objective relationships “stronger” than subjective
![Page 22: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/22.jpg)
Supporting versioning
We keep all relationships; modifications to the data just mark relationships as no longer current.
We can always resurrect the state of the graph
== persistent identification of taxon concepts
![Page 23: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/23.jpg)
Versioning = name id +
classification + state
We can always resurrect the state of the graph.
Versioning enables remote curation of the data
![Page 24: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/24.jpg)
![Page 25: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/25.jpg)
Versioning = name id +
classification + state
We can always resurrect the state of the graph.
Versioning enables remote curation of the data
State 1, according to WCS:
Xus yus Smith (A)
= Aus bus Jones (S)
State 2, according to WCS:
Xus zus White (A)
= Xus yus Smith (S)
= Aus bus Jones (S)
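The two states above can be reproduced with a small versioning sketch. It assumes each relationship records the state in which it was created and, once superseded, the state in which it was retired; the field names `created_in` and `retired_in`, and the helpers `retire` and `snapshot`, are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionedSynonymy:
    """A synonymy relationship that is never deleted, only retired."""
    synonym: str
    accepted: str
    classification: str
    created_in: int                   # state in which the relationship appeared
    retired_in: Optional[int] = None  # state in which it stopped being current

    def is_current_in(self, state):
        started = self.created_in <= state
        not_yet_retired = self.retired_in is None or state < self.retired_in
        return started and not_yet_retired


def retire(rel, state):
    # A modification marks the relationship as no longer current; it stays in the data.
    rel.retired_in = state


def snapshot(relationships, classification, state):
    """Return {accepted name: sorted synonyms} as the graph stood in `state`."""
    view = {}
    for rel in relationships:
        if rel.classification == classification and rel.is_current_in(state):
            view.setdefault(rel.accepted, []).append(rel.synonym)
    return {accepted: sorted(synonyms) for accepted, synonyms in view.items()}


# State 1: Aus bus Jones is a synonym of Xus yus Smith.
history = [VersionedSynonymy("Aus bus Jones", "Xus yus Smith", "WCS", created_in=1)]

# State 2: Xus zus White becomes the accepted name for both.
retire(history[0], 2)
history.append(VersionedSynonymy("Xus yus Smith", "Xus zus White", "WCS", created_in=2))
history.append(VersionedSynonymy("Aus bus Jones", "Xus zus White", "WCS", created_in=2))

print(snapshot(history, "WCS", 1))
print(snapshot(history, "WCS", 2))
```

Since nothing is deleted, any earlier state can be rebuilt from the same list, which is what lets a client refer to a concept by name id + classification + state.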
![Page 26: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/26.jpg)
What can be done with this kind of
data model?
• Client systems can reliably connect to a version of a
concept
• We can see how concepts change over time
• Researchers can query the data to compare
classifications and identify areas of dispute
Longer term:
• Examine the “computed acceptance” rules used in
TPL - could these be run on the relationships in the
names backbone?
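Comparing classifications to find areas of dispute could look like the sketch below. It is illustrative only: the tuple layout, both function names and the second classification "ListB" are assumptions, and the opinions shown are invented example data using the deck's placeholder names.

```python
def placements_by_classification(relationships):
    """Map each classification to {synonym: accepted name}."""
    placements = {}
    for synonym, accepted, classification in relationships:
        placements.setdefault(classification, {})[synonym] = accepted
    return placements


def disputed_names(relationships, first, second):
    """Names that the two classifications place differently.

    A name treated as a synonym in one classification but not mentioned as a
    synonym in the other also counts as disputed.
    """
    placements = placements_by_classification(relationships)
    a = placements.get(first, {})
    b = placements.get(second, {})
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# (synonym, accepted name, classification); the same name nodes are reused.
opinions = [
    ("Aus bus Jones", "Xus zus White", "WCS"),
    ("Xus yus Smith", "Xus zus White", "WCS"),
    ("Aus bus Jones", "Xus zus White", "ListB"),  # hypothetical second classification
]

print(disputed_names(opinions, "WCS", "ListB"))
```

Both classifications agree on Aus bus Jones, so only Xus yus Smith is reported: the comparison is possible because both opinions point at the same name nodes.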
![Page 27: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/27.jpg)
Building it: we first focussed on
the top two layers…
![Page 28: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/28.jpg)
… but we need a way to manage
the name occurrences
![Page 29: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/29.jpg)
Building the name occurrence layer:
Populating it:
• Seed it with an authoritative set of names
• Add the version history of these names – how were
these names transcribed in the past?
Using it:
• Load candidate name occurrences and match them,
storing metrics on the match.
Reviewing – a “data improvement” team to:
• Verify the matches, focussing on ambiguity (that
which can’t be done computationally) == annotation
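The load, match and review steps above might be sketched like this. The slides do not say which matching method is used; this example swaps in simple string similarity from Python's standard `difflib` module purely for illustration. The function name, the 0.8 threshold, the status labels and the result layout are all assumptions.

```python
from difflib import SequenceMatcher


def match_occurrence(occurrence, authoritative_names, threshold=0.8):
    """Match one transcribed name against the seeded names, keeping the scores."""
    candidates = []
    for name in authoritative_names:
        score = SequenceMatcher(None, occurrence.lower(), name.lower()).ratio()
        if score >= threshold:
            candidates.append((name, round(score, 3)))
    # Best match first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)

    if not candidates:
        status = "unmatched"
    elif len(candidates) == 1:
        status = "matched"
    else:
        # More than one plausible name: left for the data improvement team.
        status = "ambiguous"
    return {"occurrence": occurrence, "candidates": candidates, "status": status}


seed_names = ["Xus yus", "Xus zus", "Aus bus"]

for transcription in ["Aus bus", "Xus yus", "Zzz"]:
    print(match_occurrence(transcription, seed_names))
```

Storing the score with each candidate lets reviewers concentrate on the "ambiguous" results, leaving clear matches to the computation.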
![Page 30: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/30.jpg)
Services: name occurrence layer
- Data input / output: DwCA
- Linking and reviewing links
- RSS feeds to indicate activity
![Page 31: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/31.jpg)
Services: names layer
- Data input / output: TCS
- Propose addition / edit of names
- RSS feeds to indicate activity
![Page 32: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/32.jpg)
Services: concepts layer
- Data input / output: TCS
- Create classifications using names
- Propose addition / edit of names to names layer
- RSS feeds
![Page 33: Building a names backbone](https://reader034.fdocuments.us/reader034/viewer/2022042615/55ad05361a28abe7468b46a3/html5/thumbnails/33.jpg)
The names backbone is an
extensible environment:
• Links “name occurrences” to names
• Separates curation of names and concepts
• Supports building concepts on the same objective
basis: enables sharing and reuse of foundation data.
• Allows many relationships to form concepts – supports
multiple overlapping classifications
• Allows distributed curation of the concepts.