Ontology-driven KDD Process Composition
Claudia Diamantini, Domenico Potena, Emanuele Storti
{diamantini, potena, storti}@diiga.univpm.it
www.diiga.univpm.it
Università Politecnica delle Marche
DIIGA, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione
Ancona, Italy
IDA'09, Lyon, Aug 31
Introduction
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Fayyad et al., 1996]
Many sources of complexity:
- iterative/interactive process
- many tasks and phases
- several algorithms available for each phase, each with specific characteristics, interfaces, preconditions/postconditions and performances
Hence the need for systems that support users in composing algorithms into valid and useful KDD processes
Aim of the work
Idea: adding semantics to KDD algorithms to support an automatic KDD process composition procedure
- Formalizing the knowledge of KDD experts into an ontology describing algorithms, their interfaces and their relations
- Defining techniques for matching algorithms with compatible interfaces
- Defining a goal-oriented composition procedure, which starts from user requests and produces a list of valid processes ranked according to some criteria
Input: goal, dataset, constraints
Output: processes
Framework
KDDVM project: a service-oriented system for sharing, discovering, accessing and executing Data Mining and KDD tools
Separation of information into 3 logical layers:
- KDD Algorithm: abstract algorithm
- KDD Tool: specific implementation of an algorithm
- KDD Service: tool running on a specific machine
Output at the Algorithm level = prototype KDD processes
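The three layers can be pictured as nested records. The following is only an illustrative sketch, not the KDDVM data model: the class fields, the `services_implementing` helper, and the example names (`ExampleClassifier`, `example-tool-1.0`, the `.example` hosts) are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KDDAlgorithm:
    """Abstract algorithm, independent of any implementation."""
    name: str


@dataclass(frozen=True)
class KDDTool:
    """Specific implementation of an algorithm."""
    name: str
    algorithm: KDDAlgorithm


@dataclass(frozen=True)
class KDDService:
    """Tool running on a specific machine."""
    tool: KDDTool
    host: str  # hypothetical machine identifier


def services_implementing(algorithm, services):
    """Return the services whose tool implements the given abstract algorithm."""
    return [s for s in services if s.tool.algorithm == algorithm]


# Hypothetical example data.
classifier = KDDAlgorithm("ExampleClassifier")
tool = KDDTool("example-tool-1.0", classifier)
services = [KDDService(tool, "host-a.example"), KDDService(tool, "host-b.example")]
```

A prototype process composed at the Algorithm level only mentions `KDDAlgorithm` entries; binding each of them to a tool and a service is a separate, later step.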
KDD Ontology (1)
KDDONTO is an ontology formalizing the domain of KDD algorithms:
- developed following a formal methodology [Noy, 2002] (concept definition, logic modeling, translation into OWL, evaluation)
- taking into account quality requirements [Gruber, 1995]
Main classes and relations:
- Algorithm, Method
- Task, Phase
- Data, DataFeature
- Performance
- has_input/has_output
- ...
KDD Ontology (2)
KDDONTO is conceived to support process composition
Properties useful for representing algorithms' interfaces:
- has_condition: pre/postcondition for some input/output data
- in_module/out_module: suggestions about composable algorithms
- not_with/not_before: explicit incompatibilities between methods
Properties useful for representing relations among data:
- part_of/has_part: relations between a compound datum and its subcomponents
- in_contrast: explicit incompatibilities between conditions
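To make the role of these properties concrete, here is a minimal sketch that stores a few facts as (subject, property, object) triples and uses in_contrast to detect a precondition that clashes with a dataset feature. It is not the OWL encoding of KDDONTO: the `TinyOntology` class, the `conflicting` helper, the algorithm `ExampleClassifier`, its output `ExampleModel`, and the condition `no_missing_values` are assumptions introduced only for illustration.

```python
from collections import defaultdict


class TinyOntology:
    """Toy triple store: (subject, property) -> set of objects."""

    def __init__(self):
        self._facts = defaultdict(set)

    def add(self, subject, prop, obj):
        self._facts[(subject, prop)].add(obj)

    def objects(self, subject, prop):
        return set(self._facts.get((subject, prop), ()))

    def holds(self, subject, prop, obj):
        return obj in self.objects(subject, prop)


def conflicting(onto, required, actual_features):
    """Required conditions that are in_contrast with some actual feature."""
    return {
        cond
        for cond in required
        if any(
            onto.holds(cond, "in_contrast", feat) or onto.holds(feat, "in_contrast", cond)
            for feat in actual_features
        )
    }


onto = TinyOntology()
# Interface of a hypothetical algorithm.
onto.add("ExampleClassifier", "has_input", "LabeledDataset")
onto.add("ExampleClassifier", "has_output", "ExampleModel")
# Precondition attached to one (algorithm, input) pair; hypothetical condition name.
onto.add(("ExampleClassifier", "LabeledDataset"), "has_condition", "no_missing_values")
# Relations among data and among conditions.
onto.add("VQ", "part_of", "LVQ")
onto.add("no_missing_values", "in_contrast", "missing_values")

required = onto.objects(("ExampleClassifier", "LabeledDataset"), "has_condition")
clashes = conflicting(onto, required, {"float", "normalized", "missing_values"})
```

Here `clashes` contains `no_missing_values`, signalling that the dataset would need preprocessing before this hypothetical algorithm could be applied.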
Algorithm Matchmaking
Linking algorithms with compatible interfaces
Exact Match: interfaces share the same data
- equivalence only
$\mathrm{match}_E(\{A_1, A_2\}, B)$:
$in_B^1 \equiv_o out_{A_1}^1$
$in_B^2 \equiv_o out_{A_1}^2$
$in_B^3 \equiv_o out_{A_2}^1$
Approximate Match: interfaces share similar data
- is-a and part-of relations
- inferential reasoning on KDDONTO
$\mathrm{match}_A(\{A_1, A_2\}, B)$:
$\mathrm{DATASET}_B \equiv_o \mathrm{DATASET}_{A_2}$
$\mathrm{VQ}_B \ \mathit{part\_of} \ \mathrm{LVQ}_{A_1}$
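The difference between the two kinds of match can be sketched in a few lines. This is a simplified illustration, not the matching functions of the original work: it assumes that an output satisfies a required input when the two are the same concept (exact), or when the output is-a the input or the input is part_of the output, followed transitively (approximate). The `IS_A` table and the function names are assumptions; the VQ/LVQ and DATASET example follows the slide.

```python
# Toy ontology fragments: child -> parent and part -> whole.
IS_A = {"LabeledDataset": "DATASET"}  # hypothetical is-a link
PART_OF = {"VQ": "LVQ"}               # from the slide: VQ part_of LVQ


def ancestors(concept, relation):
    """Follow a child->parent (or part->whole) relation transitively."""
    found = []
    while concept in relation and relation[concept] not in found:
        concept = relation[concept]
        found.append(concept)
    return found


def exact(needed, provided):
    """Exact match: equivalence only."""
    return needed == provided


def approximate(needed, provided):
    """Approximate match: equivalence, is-a, or part-of (assumed directions)."""
    return (
        exact(needed, provided)
        or needed in ancestors(provided, IS_A)     # provided is-a needed
        or provided in ancestors(needed, PART_OF)  # needed part_of provided
    )


def match(providers, consumer_inputs, compatible):
    """Link every input of the consumer to some provider output, or return None.

    providers: dict mapping algorithm name -> list of output concepts.
    """
    links = {}
    for needed in consumer_inputs:
        for name, outputs in providers.items():
            hit = next((o for o in outputs if compatible(needed, o)), None)
            if hit is not None:
                links[needed] = (name, hit)
                break
        else:
            return None  # some input of the consumer cannot be satisfied
    return links


providers = {"A1": ["LVQ"], "A2": ["DATASET"]}
inputs_of_B = ["DATASET", "VQ"]
exact_links = match(providers, inputs_of_B, exact)
approx_links = match(providers, inputs_of_B, approximate)
```

With these inputs the exact match fails, because no algorithm outputs a VQ, while the approximate match succeeds by using the part_of link between VQ and LVQ.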
Composition Procedure (1)
Goal-driven procedure for composing KDD processes, exploiting KDDONTO and the matching functionalities
- produces a subset of all possible valid processes
Three phases:
I. Definition of dataset, goal and user constraints
- Dataset: a Dataset type and a set of instances of the DataFeature class
  e.g.: LabeledDataset {float, balanced, normalized, missing_values}
- Goal: an instance of the Task class
  e.g.: CLASSIFICATION
- User constraints: pruning criteria
  • max number of algorithms in a process
  • max cost of a process
  • max computational complexity
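The three inputs of phase I could be captured in a small request object. The sketch below is an assumption about how such a request might be represented, not the KDDComposer input format: the class and field names and the `max_cost` value are invented for illustration, and the computational-complexity criterion is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    kind: str            # a Dataset type, e.g. "LabeledDataset"
    features: frozenset  # instances of the DataFeature class


@dataclass(frozen=True)
class Constraints:
    max_algorithms: int  # max number of algorithms in a process
    max_cost: float      # max cost of a process (hypothetical scale)


@dataclass(frozen=True)
class Request:
    dataset: Dataset
    task: str            # an instance of the Task class
    constraints: Constraints


def violates(process, cost, constraints):
    """True if a candidate process must be pruned under the user constraints."""
    return len(process) > constraints.max_algorithms or cost > constraints.max_cost


request = Request(
    dataset=Dataset(
        "LabeledDataset",
        frozenset({"float", "balanced", "normalized", "missing_values"}),
    ),
    task="CLASSIFICATION",
    constraints=Constraints(max_algorithms=5, max_cost=10.0),
)
```

During composition, `violates` would be evaluated on every partial process so that candidates exceeding a limit are discarded as early as possible.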
Composition Procedure (2)
II. Process building
Starts from the task and goes backwards iteratively: at each iteration, algorithms are added to processes by exploiting the matching functionalities
[Diagram: processes grow backwards from the task towards the dataset (ds)]
Stop conditions:
- no process can be further expanded
- some process constraints are violated
Output: only valid processes:
- satisfying the user goal (task)
- compatible with the given dataset
III. Process ranking
Cost function takes into account: kind of match (exact / approximate), precondition relaxation, algorithm performances, ...
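Phases II and III can be sketched as a backward search followed by a sort. The code below is a deliberately reduced illustration, not the procedure of the original work: it assumes linear chains with one input and one output per algorithm, uses only is-a for approximate matching, and takes the number of approximate links as the cost. The catalog entries, the `IS_A` table, and all function names are hypothetical; only CLASSIFICATION and LabeledDataset come from the slides.

```python
# Hypothetical catalog: name -> (input concept, output concept, task or None).
CATALOG = {
    "FillMissing": ("Dataset", "CompleteDataset", None),
    "Normalize": ("CompleteDataset", "NormalizedDataset", None),
    "Classifier": ("NormalizedDataset", "Model", "CLASSIFICATION"),
    "TreeClassifier": ("LabeledDataset", "Model", "CLASSIFICATION"),
}
# Hypothetical is-a hierarchy: child -> parent.
IS_A = {
    "LabeledDataset": "Dataset",
    "CompleteDataset": "Dataset",
    "NormalizedDataset": "CompleteDataset",
}


def ancestors(concept):
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found


def match_cost(needed, provided):
    """0 for an exact match, 1 for an approximate (is-a) match, None otherwise."""
    if needed == provided:
        return 0
    if needed in ancestors(provided):
        return 1
    return None


def compose(task, dataset, max_algorithms):
    """Build processes backwards from the task, then rank them by cost."""
    # A partial process is (chain, cost); chain[0] is the earliest algorithm.
    frontier = [([name], 0) for name, (_, _, t) in CATALOG.items() if t == task]
    valid = []
    while frontier:
        chain, cost = frontier.pop()
        needed = CATALOG[chain[0]][0]
        link = match_cost(needed, dataset)
        if link is not None:
            valid.append((cost + link, chain))  # compatible with the dataset
        if len(chain) == max_algorithms:
            continue  # pruning: the process cannot grow any further
        for name, (_, produced, _) in CATALOG.items():
            link = match_cost(needed, produced)
            if link is not None and name not in chain:
                frontier.append(([name] + chain, cost + link))
    return sorted(valid)  # phase III: cheapest processes first


ranked = compose("CLASSIFICATION", "LabeledDataset", max_algorithms=5)
pruned = compose("CLASSIFICATION", "LabeledDataset", max_algorithms=2)
```

In this toy catalog the single-step process ranks first because it matches the dataset exactly, while the three-step chain relies on one approximate link; lowering `max_algorithms` to 2 prunes the longer chain before it becomes valid.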
KDDComposer
A prototype implementing the composition procedure
Example scenario:
- Task: CLASSIFICATION
- Dataset: LabeledDataset
- Dataset features: {float, normalized, missing_values, ...}
- Constraints: max 5 algorithms, etc.
Results: a ranked list of many valid processes
Compared to a non-ontological approach:
- more valid processes (inference)
- fewer invalid processes (ontological and non-ontological pruning)
Conclusion
Procedure for composing valid KDD processes
- semantic representation of algorithms and data
Advantages
- KDDONTO: resulting processes are valid; supports complex pruning strategies
- Approximate Match: more valid results (novel w.r.t. other works in the literature)
- Ranking: according to both ontological and non-ontological criteria
- Prototype processes can themselves be considered valid, unknown and useful knowledge, valuable for both novice and expert users
Future work
- translating each prototype process into a concrete workflow of KDD Web Services
Emanuele Storti
Project website: http://boole.diiga.univpm.it