CleanTAX, Dave Thau, Stanford Research Institute Artificial Intelligence Center Seminar, 8/16/2007
CleanTAX: An Infrastructure for Reasoning about Biological Taxonomies
Dave Thau and Bertram Ludäscher
keywords: knowledge management, automatic reasoning, semantic integration, biological classification
Outline
• Brief Overview of Taxonomies
• Impact of Different Taxonomic Views on Data Analysis
• Taxonomies and Relations Between Them
• Using Logic to Determine Inconsistencies and Discover New Relations
• Initial Results of Large-Scale Analysis
• Some Optimizations
• Future Work
Beginnings of Biological Taxonomy
Egypt, 1500 BC: Ebers medical papyrus, classification of medicinal plants
China, 350 BC: Erh-ya dictionary (second century BC) – classifies trees, grasses, herbs, grains, vegetables
Greece, 300 BC: Theophrastus, Historia plantarum and Causae plantarum – 500 plants – trees, herbs, fruiting plants, perennials
Taxonomies are Everywhere: Systematics
kingdom: Plantae
phylum: Tracheophyta
class: Magnoliopsida
order: Ranunculales
family: Ranunculaceae
genus: Ranunculus
species: Ranunculus asiaticus
Taxonomies are Everywhere: The Dewey Decimal System
000 Computers and general reference
100 Philosophy and psychology
200 Religion
300 Social sciences
400 Language
500 Science
600 Technology
700 Arts and Recreation
800 Literature
900 History and geography
Taxonomies are Everywhere: Phylogenies
From Thomas D. Als, Roger Vila, Nikolai P. Kandul, David R. Nash, Shen-Horn Yen, Yu-Feng Hsu, André A. Mignault, Jacobus J. Boomsma and Naomi E. Pierce. Nature 432, 386-390.
Taxonomies are Everywhere: Protein Structure
From Ed Green http://compbio.berkeley.edu/people/ed/SeqCompEval/
Taxonomies are Useful, But Slippery
• In all of these cases, taxonomies
  – help us organize information
  – allow us to make inferences at many levels of generality
• However, taxonomies are simply "views" of real data
  – Dewey Decimal or Library of Congress?
  – Benson's view of Ranunculus or Kartesz's view?
  – Conflicting phylogenies are common
  – SCOP versus CATH
Different Taxonomies Can Lead To Different Results
[Figure: two maps of the predicted distribution of Anhinga melanogaster, one based on Clements 4th Edition and one based on Clements 5th Edition. Each is shown with the corresponding classification of Anhinga (Anhinga rufa, Anhinga nova., Anhinga melanogaster, linked by "is a" edges) and "contained in" articulations between the two classifications. Articulations by Santa Barbara Software Products; photo by David Behrens.]
Different Taxonomies Complicate Data Analysis
What was the average number of Ranunculus arizonicus seen in transect 1 in 2005?
Reasoning With Taxonomic Concepts
• Peet05 articulates relations between Benson’48 and Kartesz’04 names …
• Is that articulation consistent?
• Can we infer additional information?
Problem Statement
• What are taxonomies, anyway?
• How do you know a taxonomy makes sense?
• Given some articulations meant to translate between taxonomies:
  – do they make sense, or are there internal contradictions?
  – have they left out anything that may be inferred logically?
What are Taxonomies?
A simple definition: a directed acyclic graph of nodes and edges, where the edges represent a "subtype" relation.
[Figure: example taxonomy over Anhinga, Anhinga rufa, Anhinga nova., and Anhinga melanogaster, connected by "is a" edges]
Potential additional constraints:
• children are disjoint (child disjointness, D)
• children together cover their parents (coverage, C)
• nodes are non-empty (non-emptiness, N)
We call these "latent taxonomic assumptions" (LTAs).
• More than one LTA may apply
• 8 combinations: none, C, D, N, CD, CN, DN, CDN
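A minimal Python sketch of this definition, assuming a taxonomy is stored as a dict from each node to its list of parents (its "isa" targets). The names `PARENTS`, `is_taxonomy`, and `LTA_COMBINATIONS` are illustrative, not part of CleanTAX. It uses the standard library's `graphlib` (Python 3.9+) to reject cycles and lists the eight LTA combinations.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations

# Hypothetical example: B and C are children of A.
PARENTS = {"A": [], "B": ["A"], "C": ["A"]}


def is_taxonomy(parents):
    """True if the isa edges form a directed acyclic graph."""
    try:
        # TopologicalSorter takes node -> predecessors and raises CycleError on a cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Every subset of the three latent taxonomic assumptions, smallest first ("" = none).
LTA_COMBINATIONS = ["".join(c) for k in range(4) for c in combinations("CDN", k)]

print(is_taxonomy(PARENTS))                    # True
print(is_taxonomy({"A": ["B"], "B": ["A"]}))   # False: A isa B isa A is a cycle
print(LTA_COMBINATIONS)                        # ['', 'C', 'D', 'N', 'CD', 'CN', 'DN', 'CDN']
```

The enumeration order happens to match the slide's list: none, then the single LTAs, then pairs, then all three.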
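Inconsistency in a Taxonomy
Inconsistent under the ND (non-emptiness and disjoint children) LTA.
[Figure: A has children B and C; D is a child of both B and C.]
If B and C are children of A, then they must be disjoint. However, both contain the elements of D, and D is non-empty, so B and C share at least one element.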
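In the spirit of a finite model searcher such as Mace4 (this is not Mace4 itself), the sketch below looks for extensions of A, B, C, and D over domains of up to three elements. The names `find_model`, `ISA`, and `SIBLINGS` are hypothetical. A bounded search finding no model does not prove inconsistency in general; here it merely agrees with the argument above.

```python
from itertools import product

# The slide's example: B and C are children of A; D is a child of both B and C.
NODES = ("A", "B", "C", "D")
ISA = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]  # (child, parent)
SIBLINGS = [("B", "C")]  # the only pair of nodes sharing a parent


def find_model(ltas, max_size=3):
    """Return a dict node -> extension satisfying isa plus the given LTAs, or None."""
    for size in range(1, max_size + 1):
        # Every subset of the domain {0, ..., size-1}, as a frozenset.
        subsets = [frozenset(i for i in range(size) if mask >> i & 1)
                   for mask in range(2 ** size)]
        for exts in product(subsets, repeat=len(NODES)):
            ext = dict(zip(NODES, exts))
            if not all(ext[c] <= ext[p] for c, p in ISA):  # isa: child is a subset of parent
                continue
            if "D" in ltas and any(ext[a] & ext[b] for a, b in SIBLINGS):
                continue
            if "N" in ltas and not all(ext.values()):
                continue
            return ext
    return None


print(find_model({"N", "D"}))  # None: no model with up to three elements
print(find_model({"N"}))       # every node maps to frozenset({0})
```

Dropping either assumption makes a model appear immediately, which is why the slide attributes the inconsistency to the ND combination.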
How do Taxonomies Relate?
Articulations relate nodes between taxonomies.
Between any two nodes M and N in the taxonomies, one, and only one, of the following five relations must hold:
(i) congruence: M ≡ N
(ii) proper inclusion: M > N
(iii) proper inverse inclusion: M < N
(iv) partial overlap: M o N
(v) exclusion: M x N
[Figure: Venn diagrams of M and N for each of the five relations]
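If the nodes are read as sets of specimens, the relation between two known extensions can be computed directly. This is a small sketch with a hypothetical function name; note that the "one and only one" property needs both sets to be non-empty (an empty M would satisfy both M < N and M x N), so the sketch rejects empty inputs.

```python
def base_relation(m: frozenset, n: frozenset) -> str:
    """Name the base relation between two non-empty extensions (sets of specimens)."""
    if not m or not n:
        # With an empty set, e.g. both M < N and M x N would hold, so exclusivity fails.
        raise ValueError("the five relations are mutually exclusive only for non-empty sets")
    if m == n:
        return "≡"  # congruence
    if m < n:
        return "<"  # proper inverse inclusion: M lies properly inside N
    if m > n:
        return ">"  # proper inclusion: M properly contains N
    if m & n:
        return "o"  # partial overlap: shared, M-only, and N-only elements all exist
    return "x"      # exclusion: no shared elements


print(base_relation(frozenset({1, 2}), frozenset({2, 3})))  # o
```

Python's `<` and `>` on sets are proper subset and superset tests, which is exactly what (ii) and (iii) require.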
Many Possible Articulation Sets
[Figure: two taxonomies with example "<" articulations between them. FNA-03, 1997: Ranunculus aquatilis, with R.a. var. aquatilis, R.a. var. diffusus, and R.a. var. hispidulus. Benson, 1948: Ranunculus aquatilis, with R.a. var. capillaceus and R.a. var. calvescens.]
Five relationships, plus an "unknown/unstated" relation, over 3 × 4 node pairs result in $6^{12}$ (over 2 billion) possible sets of articulations.
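The count works out as follows: 3 Benson nodes times 4 FNA nodes gives 12 cross-taxonomy pairs, and each pair independently takes one of six labels.

$$3 \times 4 = 12 \text{ node pairs}, \qquad 6^{12} = \left(6^{6}\right)^{2} = 46{,}656^{2} = 2{,}176{,}782{,}336 > 2 \times 10^{9}$$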
Articulations: Some Make Sense
Taxonomy 1: B isa A, C isa A
Taxonomy 2: E isa D, F isa D
Articulations: B < F, A < D, C ≡ E
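Articulations: Some Are Impossible
Taxonomy 1: B isa A, C isa A
Taxonomy 2: E isa D, F isa D
Articulations: B < F, C > F
Assuming the non-emptiness and disjoint children LTAs: together the articulations give B ⊂ F ⊂ C, so the non-empty B would share elements with its sibling C.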
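Articulations: Some Imply Other Articulations
Taxonomy 1: B isa A, C isa A
Taxonomy 2: E isa D, F isa D
Articulations: A ≡ D, C ≡ E
Implies: B ≡ F
Assuming the non-emptiness, disjoint children, and coverage LTAs: coverage and disjointness make B exactly the part of A outside C, and F exactly the part of D outside E, so A ≡ D and C ≡ E force B ≡ F.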
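CleanTAX hands questions like these to Prover9 and Mace4. As a self-contained alternative for toy inputs, the sketch below decides them by brute force. It relies on a standard fact: for first-order formulas built only from one-place predicates (and no equality), whether a formula holds depends only on which membership "types" (combinations of taxa an element belongs to) are inhabited. It assumes the congruence symbols lost from these slides are ≡. All function names are illustrative, and the search is exponential in the number of nodes, so it only suits small examples like these.

```python
from itertools import combinations, product

# Universal (per-element) part of each base relation; "≡" is congruence.
POINTWISE = {
    "≡": lambda m, n: m == n,
    "<": lambda m, n: n or not m,
    ">": lambda m, n: m or not n,
    "o": lambda m, n: True,
    "x": lambda m, n: not (m and n),
}


def holds(model, m, rel, n):
    """Evaluate the LTax formula for `m rel n` in a model given as its inhabited types."""
    both = any(t[m] and t[n] for t in model)
    m_only = any(t[m] and not t[n] for t in model)
    n_only = any(t[n] and not t[m] for t in model)
    return {"≡": not m_only and not n_only,
            "<": not m_only and n_only,
            ">": m_only and not n_only,
            "o": both and m_only and n_only,
            "x": not both}[rel]


def models(nodes, edges, ltas, articulations):
    """Yield each model, i.e. each non-empty set of inhabited membership types."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    types = []  # membership vectors allowed by all universal axioms
    for bits in product([False, True], repeat=len(nodes)):
        t = dict(zip(nodes, bits))
        if any(t[c] and not t[p] for c, p in edges):  # isa
            continue
        if "D" in ltas and any(t[a] and t[b] for kids in children.values()
                               for a, b in combinations(kids, 2)):
            continue
        if "C" in ltas and any(t[p] and not any(t[k] for k in kids)
                               for p, kids in children.items()):
            continue
        if all(POINTWISE[r](t[m], t[n]) for m, r, n in articulations):
            types.append(t)
    for size in range(1, len(types) + 1):
        for model in combinations(types, size):
            if "N" in ltas and not all(any(t[x] for t in model) for x in nodes):
                continue
            if all(holds(model, m, r, n) for m, r, n in articulations):
                yield model


def possible_relations(nodes, edges, ltas, articulations, m, n):
    """Base relations m-n true in some model; an empty result means inconsistency."""
    return {r for model in models(nodes, edges, ltas, articulations)
            for r in POINTWISE if holds(model, m, r, n)}


NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("B", "A"), ("C", "A"), ("E", "D"), ("F", "D")]  # (child, parent)

# "Some are impossible": B < F and C > F under non-emptiness and disjointness.
print(possible_relations(NODES, EDGES, {"N", "D"},
                         [("B", "<", "F"), ("C", ">", "F")], "A", "D"))  # set()
# "Some imply others": A ≡ D and C ≡ E under N, D, and C leave only B ≡ F.
print(possible_relations(NODES, EDGES, {"N", "D", "C"},
                         [("A", "≡", "D"), ("C", "≡", "E")], "B", "F"))  # {'≡'}
```

A single surviving relation means it is entailed; several surviving relations form exactly the disjunction that can be proven.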
The Relation Lattice
• Sometimes, the single relation between two nodes is unknown.
• The relation lattice shows all 32 possible combined relations.
• Each node represents a disjunction of relations.
[Figure: the relation lattice, whose nodes are disjunctions of the five base relations (≡, <, >, o, x), from single relations such as < or o up to the full disjunction ≡<>ox]
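The lattice can be generated as the subsets of the five base relations, ranked by size. This sketch assumes the slide's count of 32 includes the empty disjunction (which can only be "proven" from inconsistent axioms); `LATTICE` and `more_informative` are illustrative names.

```python
from itertools import combinations

BASE = ("≡", "<", ">", "o", "x")

# Rank k of the lattice: every disjunction of exactly k base relations.
LATTICE = {k: ["".join(c) for c in combinations(BASE, k)] for k in range(len(BASE) + 1)}

print([len(LATTICE[k]) for k in LATTICE])      # [1, 5, 10, 10, 5, 1]
print(sum(len(v) for v in LATTICE.values()))   # 32
print(LATTICE[5])                              # ['≡<>ox']: the top, "relation unknown"


def more_informative(r1: str, r2: str) -> bool:
    """r1 lies below r2 if every disjunct of r1 is also a disjunct of r2."""
    return set(r1) < set(r2)
```

Lower nodes are more informative: knowing "<" says more than knowing "< or o".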
The Complexity of Developing Articulations
The Ranunculus data set: 9 taxonomies, 654 taxa, 704 articulations
[Figure: visualization by Martin Graham]
Example Articulation Set
[Figure: an articulation set between Kartesz, 2004 and Benson, 1948, over nodes A through M. Legend: is included in, equals, overlaps (O), disjoint (X). A: R. petiolaris; B: R. macranthus; C: R. fascicularis.]
Goal – To Help Bob Know
• that the taxonomies he's working with are consistent
• when he's introduced an articulation that leads to inconsistency
• when an articulation is implied by others
• about ambiguous articulations
Berendsohn et al., 2003: MoReTaX
Logic-Based Approach
• Devise a language, LTax
  – first-order logic constraints on single-place predicates, where each predicate is a "taxon"
• Render taxonomies and the articulations between them as a set of first-order formulas
• Then we can ask:
  – does a taxonomy follow your definition of taxonomy?
  – is a pair of taxonomies plus the articulations between them consistent?
  – are there unstated articulations?
Translating Taxonomy into Logic
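Taxonomy and LTA Formulas
isa: for each edge M isa N, add $\forall x\,(M(x) \rightarrow N(x))$
Non-Emptiness (N): for each node N, add $\exists x\, N(x)$
Child Disjointness (D): for each two children $N_1, N_2$ of M, add $\forall x\,(N_1(x) \rightarrow \neg N_2(x))$
Coverage (C): for each node M with children $N_1, \ldots, N_L$, add $\forall x\,(M(x) \rightarrow N_1(x) \lor \cdots \lor N_L(x))$
Articulation Formulas
Congruence, $M \equiv N$: $\forall x\,(M(x) \leftrightarrow N(x))$
Proper Inclusion, $M > N$: $\forall x\,(N(x) \rightarrow M(x)) \land \exists a\,(M(a) \land \neg N(a))$
Proper Inverse Inclusion, $M < N$: $\forall x\,(M(x) \rightarrow N(x)) \land \exists a\,(N(a) \land \neg M(a))$
Partial Overlap, $M \mathbin{\mathrm{o}} N$: $\exists a, b, c\,(M(a) \land N(a) \land M(b) \land \neg N(b) \land \neg M(c) \land N(c))$
Exclusion, $M \mathbin{\mathrm{x}} N$: $\forall x\,\neg(M(x) \land N(x))$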
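A sketch of rendering the taxonomy and LTA formulas as text. It assumes Prover9's concrete syntax (`all`, `exists`, `->`, `|`, `-` for negation, each formula ending in a period). The dict-of-children input and the name `taxonomy_formulas` are hypothetical, and the files CleanTAX actually generates may differ.

```python
from itertools import combinations


def taxonomy_formulas(children, ltas):
    """Render isa edges plus the chosen LTAs ("N", "D", "C") as Prover9-style formulas.

    children maps each parent node to its list of child nodes (hypothetical format).
    """
    out = []
    for parent, kids in children.items():
        for kid in kids:  # isa: every member of the child is a member of the parent
            out.append(f"all x ({kid}(x) -> {parent}(x)).")
        if "D" in ltas:  # child disjointness: one formula per pair of siblings
            for a, b in combinations(kids, 2):
                out.append(f"all x ({a}(x) -> -{b}(x)).")
        if "C" in ltas and kids:  # coverage: the children jointly cover the parent
            disjunction = " | ".join(f"{k}(x)" for k in kids)
            out.append(f"all x ({parent}(x) -> ({disjunction})).")
    if "N" in ltas:  # non-emptiness for every node
        nodes = set(children) | {k for kids in children.values() for k in kids}
        out.extend(f"exists x {node}(x)." for node in sorted(nodes))
    return out


for formula in taxonomy_formulas({"A": ["B", "C"]}, {"N", "D", "C"}):
    print(formula)
```

For the two-child taxonomy this emits two isa formulas, one disjointness formula, one coverage formula, and three non-emptiness formulas.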
Theorem Proving
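$\Gamma = \{\forall x\,(B.Rac(x) \rightarrow B.Ra(x)),\ \forall x\,(B.Rat(x) \rightarrow B.Ra(x)),\ \forall x\,(B.Ra(x) \leftrightarrow K.Ra(x)),\ \forall x\,(B.Rat(x) \rightarrow K.Ra(x)),\ \ldots\}$
$\varphi = \forall x\,(B.Rac(x) \rightarrow K.Ra(x)) \land \exists a\,(K.Ra(a) \land \neg B.Rac(a))$
Want to show that $\Gamma \models \varphi$, that is, that $\varphi$ holds in every model of $\Gamma$.
To prove it, show: $\Gamma \cup \{\neg\varphi\} \vdash \bot$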
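Prover9 proves by refutation: it negates the goal and searches for a contradiction, as described above. This sketch assembles such an input file. It assumes Prover9's `formulas(assumptions).` / `formulas(goals).` / `end_of_list.` layout, and replaces the dots in names like `B.Rac` with underscores (an assumption about identifier syntax). The two axioms marked hypothetical stand in for the slide's "…"; the function names are illustrative.

```python
def articulation(m, rel, n):
    """Prover9-style rendering of one articulation formula; rel is one of ≡ < > o x."""
    formulas = {
        "≡": f"all x ({m}(x) <-> {n}(x))",
        "<": f"(all x ({m}(x) -> {n}(x))) & (exists y ({n}(y) & -{m}(y)))",
        ">": f"(all x ({n}(x) -> {m}(x))) & (exists y ({m}(y) & -{n}(y)))",
        "o": (f"exists x exists y exists z ({m}(x) & {n}(x) & {m}(y) & -{n}(y)"
              f" & -{m}(z) & {n}(z))"),
        "x": f"all x ({m}(x) -> -{n}(x))",
    }
    return formulas[rel] + "."


def prover9_input(assumptions, goal):
    """Assemble an input file; the prover negates the goal and seeks a refutation."""
    return "\n".join(["formulas(assumptions).", *assumptions, "end_of_list.", "",
                      "formulas(goals).", goal, "end_of_list.", ""])


gamma = [
    "all x (B_Rac(x) -> B_Ra(x)).",
    "all x (B_Rat(x) -> B_Ra(x)).",
    articulation("B_Ra", "≡", "K_Ra"),
    "all x (B_Rat(x) -> K_Ra(x)).",
    "exists x B_Rat(x).",              # non-emptiness of B_Rat (hypothetical)
    "all x (B_Rac(x) -> -B_Rat(x)).",  # B_Rac and B_Rat are disjoint siblings (hypothetical)
]
print(prover9_input(gamma, articulation("B_Rac", "<", "K_Ra")))
```

The two hypothetical axioms matter: without a non-empty sibling of B.Rac inside K.Ra, the existential half of $\varphi$ would not follow.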
CleanTAX Methodology
Given a set of taxonomies and articulations between them:
1. Check each taxonomy under each LTA set to see if it is consistent.
2. Check the articulations under each LTA set to see if they are consistent.
3. Check the taxonomies plus the articulations under the LTA sets from above, and make sure the combination is consistent.
4. If so, for each pair-wise combination of nodes, try to prove each possible relationship under each consistent LTA set.
Implemented in Python. The theorem prover Prover9 and the model searcher Mace4 are used to prove relationships and check consistency.
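The four steps can be sketched as a driver loop. This is hypothetical: `is_consistent` and `proves` are stand-in callbacks where a real system would run Mace4 or Prover9, and representing each taxonomy as a dict with a "nodes" list is likewise an assumption.

```python
from itertools import combinations

BASE = ("≡", "<", ">", "o", "x")
# All eight LTA combinations: none, N, D, C, ND, NC, DC, NDC.
LTA_SETS = [frozenset(c) for k in range(4) for c in combinations("NDC", k)]


def clean_tax(taxonomies, articulations, is_consistent, proves):
    """Sketch of the four-step methodology; returns {lta_set: {(m, n): [proven relations]}}.

    is_consistent(parts, ltas) and proves(parts, ltas, m, rel, n) are hypothetical
    callbacks standing in for reasoner runs.
    """
    # Step 1: keep LTA sets under which every taxonomy is consistent on its own.
    ok = [l for l in LTA_SETS if all(is_consistent([t], l) for t in taxonomies)]
    # Step 2: ... under which the articulations alone are consistent.
    ok = [l for l in ok if is_consistent([articulations], l)]
    # Step 3: ... under which taxonomies plus articulations are consistent together.
    everything = [*taxonomies, articulations]
    ok = [l for l in ok if is_consistent(everything, l)]
    # Step 4: for every cross-taxonomy pair of nodes, try to prove each base relation.
    results = {}
    for ltas in ok:
        proven = {}
        for t1, t2 in combinations(taxonomies, 2):
            for m in t1["nodes"]:
                for n in t2["nodes"]:
                    proven[(m, n)] = [r for r in BASE if proves(everything, ltas, m, r, n)]
        results[ltas] = proven
    return results


# Toy usage with stub reasoners: LTA sets containing coverage count as inconsistent,
# and only "<" is ever provable.
taxonomies = [{"nodes": ["B.Ra"]}, {"nodes": ["K.Ra"]}]
results = clean_tax(taxonomies, "articulations",
                    is_consistent=lambda parts, ltas: "C" not in ltas,
                    proves=lambda parts, ltas, m, rel, n: rel == "<")
print(sorted("".join(sorted(l)) for l in results))  # ['', 'D', 'DN', 'N']
```

Keeping the reasoner behind two callbacks mirrors the infrastructure's goal of plugging in a variety of reasoners.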
The CleanTAX Infrastructure
• Features
  – Designed to plug in a variety of reasoners
  – Works with computer clusters (Sun Grid Engine)
  – Can work with whole taxonomies or subsets
• Command-line options
  – Specify taxonomies and articulation sets to test
  – Specify relations to test
  – Specify LTAs to test
  – Specify nodes to test
  – Pass parameters to the reasoners
• Inputs
  – Taxonomic Concept Schema (an XML spec)
  – Individual reasoner files
  – Internal representation
• Example reports
  – Which taxonomies are consistent under which LTAs
  – For each pair of nodes tested, for each relation, under each LTA, whether or not it can be proven true
  – For each set of taxonomies and articulations, under each LTA, a graph showing newly inferred relations
Initial Results
We ran two Ranunculus taxonomies (Benson 1948, 218 taxa, and Kartesz 2004, 142 taxa) and 206 articulations from Peet 2005.
When the taxonomies and the articulations were analyzed as a whole, only two LTA combinations were provably consistent: no LTAs, and non-emptiness alone.
This involved 928,680 judgments and took 46.0 hours.
To get a better sense of the impact of LTAs, the combined taxonomies and articulations were divided into 82 connected subgraphs.
Among these we found 5 inconsistencies and 1946 new articulations.
This involved 166,920 judgments and took 4.8 hours.
Discovered Inconsistent Mapping under the {coverage, disjointness, non-emptiness} LTA set
Peet, 2005:
• B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
• B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
• B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
[Figure: Benson, 1948: Ranunculus hydrocharoides, with R.h. var. natans, R.h. var. stolonifer, and R.h. var. typicus. Kartesz, 2004: Ranunculus hydrocharoides, with R.h. var. stolonifer and R.h. var. typicus.]
Under coverage and disjointness, these congruences leave no room for Benson's R.h. var. natans, which contradicts non-emptiness.
The most likely fix here is to change the congruence relation between the top two nodes to instead state that Benson's R. hydrocharoides includes Kartesz's.
Formal Proof of Inconsistency
Inferring Additional Knowledge
Does C = E? Or, is C > E?
[Figure: Benson, 1948 (nodes A to D) and Kartesz, 2004 (nodes E to K), with taxonomy-provided isa edges, articulated proper inverse inclusions (<), and articulated congruences (≡).]
A: Ranunculus hispidus; B: R.h. var. caricetorum; C: R.h. var. hispidus; D: R.h. var. nitidus; E: Ranunculus hispidus; F: R.h. var. eurylobus; G: R.h. var. greenmanii; H: R.h. var. marilandicus; I: R.h. var. typicus; J: R. septentrionalis; K: R. carolinianus
Most Informative Relation (MIR)
[Figure: the relation lattice of disjunctions of the five base relations]
The MIR between two nodes is the most specific relation in the lattice (the lowest one) that can be proven to hold between them.
Latent Taxonomic Assumptions vs New Maximally Informative Relations
Novel provably true relations within 75 sub-taxonomies:
LTAs | The basic five relations | The other 28 relations
No LTAs | 245 | 304
All three LTAs | 475 | 74
Main finding: more constraints lead to more specificity in provably true relations.
Optimizations
LTA Optimization
[Figure: the lattice of LTA sets: N, D, C; ND, NC, DC; NDC]
If a set of axioms is inconsistent under one LTA set, it will be inconsistent under all supersets of that LTA set.
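Because adding axioms can never restore consistency, LTA sets can be visited smallest first while skipping supersets of any set already found inconsistent. A sketch, where `is_consistent` is a hypothetical stand-in for one reasoner run:

```python
from itertools import combinations

LTAS = ("N", "D", "C")


def consistent_lta_sets(is_consistent):
    """Visit LTA sets smallest first; skip supersets of any set already found inconsistent.

    is_consistent(ltas) is a hypothetical callback standing in for one reasoner run.
    Returns the consistent sets and the number of reasoner calls actually made.
    """
    consistent, inconsistent, calls = [], [], 0
    for size in range(len(LTAS) + 1):
        for combo in combinations(LTAS, size):
            ltas = frozenset(combo)
            if any(bad <= ltas for bad in inconsistent):
                continue  # a superset of an inconsistent set is inconsistent too
            calls += 1
            (consistent if is_consistent(ltas) else inconsistent).append(ltas)
    return consistent, calls


# Stub mirroring the reported whole-taxonomy result: only "no LTAs" and "N" are consistent.
ok, calls = consistent_lta_sets(lambda ltas: ltas <= {"N"})
print(sorted("".join(sorted(s)) for s in ok), calls)  # ['', 'N'] 4
```

With this stub, four reasoner calls replace eight: once D and C fail, ND, NC, DC, and NDC are never checked.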
Finding the MIR
Algorithm 1: Bottom Up (A↑)
[Figure: the relation lattice]
Try the relations on the bottom rank in order; then, if none is provably true, go to the next rank.
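A sketch of the bottom-up search. `provable(rels)` is a hypothetical stand-in for asking the prover whether the disjunction of the relations in `rels` follows from the (already consistent) axioms; the simulated prover at the end is for illustration only.

```python
from itertools import combinations

BASE = ("≡", "<", ">", "o", "x")


def mir_bottom_up(provable):
    """A-up: try each lattice rank from the bottom; return the first provable disjunction."""
    calls = 0
    for rank in range(1, len(BASE)):
        for rels in combinations(BASE, rank):
            calls += 1
            if provable(frozenset(rels)):
                return frozenset(rels), calls
    # The full disjunction is always true, since exactly one base relation holds.
    return frozenset(BASE), calls


# Simulated prover: the true MIR is "< or o", so a disjunction is provable
# exactly when it contains both of those relations.
truth = frozenset({"<", "o"})
mir, calls = mir_bottom_up(lambda rels: truth <= rels)
print(sorted(mir), calls)  # ['<', 'o'] 11
```

Since exactly one base relation holds, two provable disjunctions imply their intersection, so the first hit on the lowest rank is the unique MIR. The search stops after a single call when the MIR is the first base relation tried, but may need up to 30 calls (5 + 10 + 10 + 5).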
Finding the MIR
Algorithm 2: Top Down (A↓)
[Figure: the relation lattice]
Just check the relations in the penultimate rank. With A to E standing for the five base relations:
$(A \lor B \lor C \lor D) \leftrightarrow \neg E$
$(B \lor C \lor D \lor E) \leftrightarrow \neg A$
so proving both leaves $(B \lor C \lor D)$.
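A sketch of the top-down search, using the same hypothetical `provable(rels)` stand-in: proving "every base relation except r" rules r out, and the MIR is whatever survives.

```python
BASE = frozenset({"≡", "<", ">", "o", "x"})


def mir_top_down(provable):
    """A-down: ask only about the five penultimate-rank disjunctions (five prover calls)."""
    ruled_out = {r for r in BASE if provable(BASE - {r})}
    return BASE - ruled_out, len(BASE)


# Simulated prover: the true MIR is "< or o".
truth = frozenset({"<", "o"})
mir, calls = mir_top_down(lambda rels: truth <= rels)
print(sorted(mir), calls)  # ['<', 'o'] 5
```

The cost is fixed at five calls per pair of nodes. An empty result would mean every relation was ruled out, which can only happen when the axioms are inconsistent.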
Relation Lattice Optimization Results 1
Algorithm | A0 | A↑ | A↓
Number of judgments | 928,680 | 912,779 | 154,780
Time (hours) | 46.0 | 45.3 | 7.8 (a 5.8x speedup)
Logical steps (millions) | 2,634 | 2,589 | 442
Comparing the two full taxonomies under the non-emptiness LTA shows a strong improvement for the top-down optimization.
Relation Lattice Optimization Results 2
Algorithm | A0 | A↑ | A↓
Number of judgments | 17,019 | 2,194 | 2,745
Time (seconds) | 574.59 | 83.61 (a 6.9x speedup) | 100.47 (a 5.7x speedup)
Logical steps (thousands) | 2,484 | 384 | 394
Under more restrictive constraints, the bottom-up optimization improves. Results are for 75 sub-taxonomies under the NDC LTA.
Summary: Contributions To Date
• Represented taxonomies and articulations between them in logic
• Clarified and represented latent taxonomic assumptions
• Created an infrastructure capable of applying reasoners to large taxonomies and articulation sets
  – discovering inconsistencies
  – discovering interesting new relations
  – elucidating the impact of LTAs on reasoning
• Described and tested three optimizations
Future Work: Applications
Paul Craig and Jessie Kennedy (2007), School of Computing, Napier University, Edinburgh
Future Work: Suggesting Fixes
[Figure: Benson, 1948: Ranunculus hydrocharoides, with R.h. var. natans, R.h. var. stolonifer, and R.h. var. typicus. Kartesz, 2004: Ranunculus hydrocharoides, with R.h. var. stolonifer and R.h. var. typicus.]
Inconsistency found, suggested fixes:
1. Change the relation between Ranunculus hydrocharoides (Benson, 1948) and Ranunculus hydrocharoides (Kartesz, 2004) from ≡ to >.
2. Relax the Non-Emptiness constraint, allowing Ranunculus hydrocharoides var. natans to be empty.
3. Relax the Coverage constraint, allowing R. hydrocharoides to contain specimens not contained in its children.
4. …
Future Work: Other Logics – DL (Description Logics)
[Figure: Ranunculus taxonomies from Benson, 1948 and Kartesz, 2004, including Ranunculus petiolaris and Ranunculus macranthus, with a < articulation between them]
Other Future Work
• Better parallelization
• Better interfaces (GUI, Web Services)
• Applications to other domains
• Enhancing reporting tools to better support data curation
Conclusions
• Taxonomies are more complicated than you may have thought.
• Logic is a useful tool for discovering inconsistencies and new relations in taxonomies and articulations between them.
• This is an interesting interdisciplinary line of research combining elements from systematics, artificial intelligence, and high-performance computing.
Thanks!
Acknowledgements
SEEK is supported by the National Science Foundation under awards 0225676, 0225665, 0225635, and 0533368.
Invaluable Consultation: Bertram Ludäscher and Shawn Bowers
Ranunculus Data Set: Bob Peet
Visualization Tools: Jessie Kennedy, Martin Graham and Paul Craig
Niche Modeling: Kirsten Menger-Anderson
Funding and Context: The SEEK project
References
D. Thau and B. Ludäscher. Reasoning about Taxonomies in First-Order Logic. Journal of Ecological Informatics (accepted for publication, 2007).
D. Thau and B. Ludäscher. Toward Optimizing CleanTAX: An Automated Reasoning Method for Taxonomies and Articulations. Submitted to the 2007 IEEE/WIC/ACM International Conference on Web Intelligence.