Ontology-driven KDD Process Composition

Ontology-Driven KDD Process Composition
Claudia Diamantini, Domenico Potena, Emanuele Storti
{diamantini, potena, storti}@diiga.univpm.it
www.diiga.univpm.it
UNIVERSITA’ POLITECNICA DELLE MARCHE
DIIGA – Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione
Ancona, Italy
IDA'09, Lyon, Aug 31

description

Full paper: http://boole.diiga.univpm.it/paper/ida09.pdf

One of the most interesting challenges in the Knowledge Discovery in Databases (KDD) field is supporting users in composing tools into a valid and useful KDD process. This activity requires users both to choose tools suitable for their knowledge discovery problem and to compose them into a KDD process. To this end, they need expertise and knowledge about the functionalities and properties of all KDD algorithms implemented in the available tools. To support users in this demanding activity, in this paper we introduce a goal-driven procedure for automatically composing algorithms. The proposed procedure is based on KDDONTO, an ontology formalizing the domain of KDD algorithms, which allows us to generate valid and non-trivial processes.

Transcript of Ontology-driven KDD Process Composition

Page 1: Ontology-driven KDD Process Composition

Ontology-Driven KDD Process Composition

Claudia Diamantini, Domenico Potena, Emanuele Storti
{diamantini, potena, storti}@diiga.univpm.it

www.diiga.univpm.it

UNIVERSITA’ POLITECNICA DELLE MARCHE
DIIGA – Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione
Ancona, Italy

IDA'09, Lyon, Aug 31

Page 2: Ontology-driven KDD Process Composition

Introduction

Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Fayyad et al., 1996]

Emanuele Storti, IDA'09, Lyon, Aug 31

Many sources of complexity:
- iterative/interactive process
- many tasks and phases
- several algorithms available for each phase, with specific:
  - characteristics, interfaces
  - preconditions/postconditions
  - performances

Page 3: Ontology-driven KDD Process Composition


Need for systems that support users in composing algorithms to produce valid and useful KDD processes

Page 9: Ontology-driven KDD Process Composition

Aim of the work

Idea: adding semantics to KDD algorithms for supporting an automatic KDD process composition procedure

- Formalizing knowledge of KDD experts into an ontology for describing algorithms, their interfaces and their relations
- Defining techniques for matching algorithms with compatible interfaces
- Defining a goal-oriented composition procedure which starts from user requests and produces a list of valid processes ranked according to some criteria

[Diagram: goal, dataset and constraints as inputs; processes as output]

Page 10: Ontology-driven KDD Process Composition

Framework

KDDVM project: service-oriented system for sharing, discovering, accessing, executing Data Mining and KDD tools

Separation of information into 3 logical layers:

- KDD Algorithm: abstract algorithm
- KDD Tool: specific implementation of an algorithm
- KDD Service: tool running on a specific machine


Algorithm level output = prototype KDD processes
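The three layers can be pictured as nested records. What follows is only a minimal sketch under stated assumptions: the class names, the fields and the example endpoint are hypothetical and are not taken from the KDDVM project; the C4.5/J48 pairing is used purely as a familiar illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KDDAlgorithm:
    """Abstract algorithm, independent of any implementation."""
    name: str


@dataclass(frozen=True)
class KDDTool:
    """Specific implementation of an algorithm."""
    name: str
    implements: KDDAlgorithm


@dataclass(frozen=True)
class KDDService:
    """Tool running on a specific machine (endpoint is a made-up URL)."""
    tool: KDDTool
    endpoint: str


def algorithm_of(service: KDDService) -> KDDAlgorithm:
    """Walk from the service layer back up to the abstract algorithm layer."""
    return service.tool.implements


# Hypothetical example: one algorithm, one tool, one deployed service.
c45 = KDDAlgorithm("C4.5")
j48 = KDDTool("J48", implements=c45)
service = KDDService(j48, endpoint="http://example.org/services/j48")
```

A prototype process, in this picture, would mention only `KDDAlgorithm` values; choosing tools and services is a later step.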

Page 12: Ontology-driven KDD Process Composition

KDD Ontology (1)


KDDONTO is an ontology formalizing the domain of KDD algorithms:
- developed following a formal methodology [Noy, 2002]
- taking into account quality requirements [Gruber, 1995]
- translation in OWL
- logic modeling (concept definition ... evaluation)

Main classes and relations:
- Algorithm, Method
- Task, Phase
- Data, DataFeature
- Performance
- has_input/has_output
- ...

Page 13: Ontology-driven KDD Process Composition

KDD Ontology (2)

KDDONTO is conceived for supporting process composition

Properties useful for representing algorithms' interfaces:
- has_condition: pre/postcondition for some input/output data
- in_module/out_module: suggestions about composable algorithms
- not_with/not_before: explicit incompatibilities between methods

Properties useful for representing relations among data:
- part_of/has_part: relations between a compound datum and its subcomponents
- in_contrast: explicit incompatibilities between conditions
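To make these properties concrete, here is a toy stand-in that stores a few of them as plain Python sets of pairs. It is a sketch, not KDDONTO itself: apart from the VQ/LVQ pair, which appears on the matchmaking slides, every instance name is hypothetical, and treating not_with as symmetric is an assumption.

```python
# part_of: (component, compound datum). has_part is read as its inverse.
PART_OF = {("VQ", "LVQ")}

# not_with: explicit incompatibility between two methods (hypothetical names).
NOT_WITH = {("MethodA", "MethodB")}

# in_contrast: explicit incompatibility between two conditions (hypothetical).
IN_CONTRAST = {("normalized", "not_normalized")}


def has_part(compound: str) -> set:
    """Inverse lookup of part_of: components of a compound datum."""
    return {part for part, whole in PART_OF if whole == compound}


def incompatible(method_a: str, method_b: str) -> bool:
    """True if the two methods are declared not_with (assumed symmetric)."""
    return (method_a, method_b) in NOT_WITH or (method_b, method_a) in NOT_WITH


def conflicting(cond_a: str, cond_b: str) -> bool:
    """True if the two conditions are declared in_contrast (assumed symmetric)."""
    return (cond_a, cond_b) in IN_CONTRAST or (cond_b, cond_a) in IN_CONTRAST
```

In the real ontology these would be OWL properties subject to inference; the sets above only show what kind of facts each property records.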

Page 19: Ontology-driven KDD Process Composition

Algorithm Matchmaking

Linking algorithms with compatible interfaces

Exact Match: $\mathit{match}_E(\{A_1, A_2\}, B)$
Interfaces share the same data
- equivalence only

$\mathit{in}_B^1 \equiv_o \mathit{out}_{A_1}^1$
$\mathit{in}_B^2 \equiv_o \mathit{out}_{A_1}^2$
$\mathit{in}_B^3 \equiv_o \mathit{out}_{A_2}^1$

Approximate Match: $\mathit{match}_A(\{A_1, A_2\}, B)$
Interfaces share similar data
- is-a and part-of relations
- inferential reasoning on KDDONTO

$\mathit{DATASET}_{A_2} \equiv_o \mathit{DATASET}_B$
$\mathit{VQ}_{A_1}$ part_of $\mathit{LVQ}_B$
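The difference between the two kinds of match can be sketched in a few lines. This is an illustration under assumptions, not the matchmaking functions of the paper: concepts are plain strings, the is-a table is hypothetical, and requiring every input of B to be covered by some output of the preceding algorithms is an assumed reading of the slide. Only the VQ part_of LVQ fact comes from the slide's example.

```python
IS_A = {"LabeledDataset": "DATASET"}  # child -> parent (hypothetical)
PART_OF = {"VQ": "LVQ"}               # component -> compound (slide example)


def ancestors(concept: str) -> list:
    """All concepts reachable by following is-a upwards."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found


def datum_match(out_datum: str, in_datum: str):
    """Return 'exact', 'approximate' or None for one output/input pair."""
    if out_datum == in_datum:
        return "exact"
    if in_datum in ancestors(out_datum) or PART_OF.get(out_datum) == in_datum:
        return "approximate"
    return None


def match(producer_outputs: dict, consumer_inputs: list):
    """Match a set of algorithms (name -> outputs) against B's inputs."""
    kinds = []
    for needed in consumer_inputs:
        hits = [datum_match(out, needed)
                for outputs in producer_outputs.values() for out in outputs]
        hits = [kind for kind in hits if kind is not None]
        if not hits:
            return None  # one input of B cannot be supplied
        kinds.append("exact" if "exact" in hits else "approximate")
    return "exact" if all(kind == "exact" for kind in kinds) else "approximate"
```

With `{"A1": ["VQ"], "A2": ["DATASET"]}` as producers and `["LVQ", "DATASET"]` as the inputs of B, the sketch reports an approximate match, mirroring the slide's example.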

Page 23: Ontology-driven KDD Process Composition

Composition Procedure (1)

Goal-driven procedure for composing KDD processes:
- exploiting KDDONTO and matching functionalities
- produces a subset of all possible valid processes

Three phases:

I. Definition of dataset, goal and user constraints

- Dataset: a Dataset type and a set of instances of the DataFeature class
  e.g.: LabeledDataset {float, balanced, normalized, missing_values}
- Goal: an instance of the Task class
  e.g.: CLASSIFICATION
- Pruning Criteria:
  • max number of algorithms in a process;
  • max cost of a process;
  • max computational complexity
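The request defined in phase I (dataset, goal, constraints) can be sketched as a small record plus a pruning check. The field names and the `within_limits` helper are hypothetical; only the example values (CLASSIFICATION, LabeledDataset and its features, a cap on the number of algorithms) echo the slides, and the complexity criterion is left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Request:
    """User request: goal, dataset description and optional pruning limits."""
    task: str                              # instance of the Task class
    dataset_type: str                      # a Dataset type
    features: FrozenSet[str]               # instances of DataFeature
    max_algorithms: Optional[int] = None   # pruning: process length
    max_cost: Optional[float] = None       # pruning: process cost


def within_limits(request: Request, n_algorithms: int, cost: float) -> bool:
    """True if a candidate process respects the user's pruning criteria."""
    if request.max_algorithms is not None and n_algorithms > request.max_algorithms:
        return False
    if request.max_cost is not None and cost > request.max_cost:
        return False
    return True


request = Request(
    task="CLASSIFICATION",
    dataset_type="LabeledDataset",
    features=frozenset({"float", "balanced", "normalized", "missing_values"}),
    max_algorithms=5,
)
```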

Page 30: Ontology-driven KDD Process Composition

Composition Procedure (2)

II. Process building

Starts from task and goes backwards iteratively

At each iteration, algorithms are added to processes by exploiting matching functionalities

Stop conditions:
- no process can be further expanded
- some process constraints are violated

Output: only valid processes:
- satisfying the user goal (task)
- compatible with the given dataset

[Diagram: chain of algorithms (A) linking the dataset (ds) to the task]

III. Process ranking

Cost function takes into account: kind of match (exact / approximate), precondition relaxation, algorithm performances, ...
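Phases II and III can be illustrated with a deliberately simplified backward search. This is a sketch under strong assumptions, not the procedure of the paper: each algorithm has a single input and a single output, only exact matches (string equality) are used, the catalog and its costs are invented, and the ranking is a plain sum of per-algorithm costs rather than the richer cost function described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algo:
    name: str
    task: str       # task the algorithm addresses
    inp: str        # single input datum (simplification)
    out: str        # single output datum (simplification)
    cost: float = 1.0


# Hypothetical catalog of algorithms.
CATALOG = [
    Algo("Classifier", "CLASSIFICATION", "CleanDataset", "Model"),
    Algo("FillMissing", "PREPROCESSING", "LabeledDataset", "CleanDataset"),
    Algo("DirectClassifier", "CLASSIFICATION", "LabeledDataset", "Model", cost=3.0),
]


def compose(task: str, dataset: str, max_algorithms: int) -> list:
    """Build processes backwards from the task, then rank them by total cost."""
    open_processes = [[a] for a in CATALOG if a.task == task]  # start from the goal
    valid = []
    while open_processes:
        process = open_processes.pop()
        head = process[0]
        if head.inp == dataset:
            valid.append(process)          # compatible with the given dataset
            continue
        if len(process) >= max_algorithms:
            continue                       # pruning: constraint would be violated
        for candidate in CATALOG:
            if candidate.out == head.inp and candidate not in process:
                open_processes.append([candidate] + process)  # expand backwards
    return sorted(valid, key=lambda p: sum(a.cost for a in p))


ranked = [[a.name for a in p] for p in compose("CLASSIFICATION", "LabeledDataset", 5)]
```

The loop ends when no open process can be expanded further, and processes that exceed the length limit are dropped, which corresponds to the two stop conditions listed above.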

Page 36: Ontology-driven KDD Process Composition

KDDComposer

A prototype implementing the composition procedure

Example scenario:
- Task: CLASSIFICATION
- Dataset: LabeledDataset
- Dataset features: {float, normalized, missing_values, ...}
- Constraints: max 5 algorithms, etc.

Results: a ranked list of many valid processes

Compared to a non-ontological approach:
- more valid processes (inference)
- fewer invalid processes (ontological and non-ontological pruning)

Page 37: Ontology-driven KDD Process Composition

Conclusion

Procedure for composing valid KDD processes
- semantic representation of algorithms and data

Advantages
- KDDONTO: resulting processes are valid; supports complex pruning strategies
- Approximate Match: more valid results (novel w.r.t. other works in the literature)
- Ranking according to both ontological and non-ontological criteria
- Prototype processes can themselves be considered as valid, unknown and useful knowledge, valuable for both novice and expert users

Future work
- translating each prototype process into a concrete workflow of KDD Web Services

Page 38: Ontology-driven KDD Process Composition

Emanuele Storti

Project website: http://boole.diiga.univpm.it

