Topic Maps for Association Rule Mining


Tomáš Kliegr, Jan Zemánek, Marek Ovečka

Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics

University of Economics, Prague

Data Mining using CRISP-DM

The goal of data mining is to obtain useful non-trivial patterns from the data.

Analytical Report

Common data mining tasks

- Clustering
- Classification
- Association rules

Example rule: Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)

Association Rule Mining

Unlike clustering and classification, association rules provide true “nuggets”: rules that meet selected interest measures.

EXAMPLE
Duration(2y+) and District(Prague) => Loan Quality(good)
(antecedent: Duration(2y+) and District(Prague); consequent: Loan Quality(good))
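Support and confidence are the most common interest measures for rules like the one above. The Python sketch below computes both for the example rule. It assumes each record is a dictionary of already-discretized attribute values; the four records are hypothetical toy data, not taken from the PKDD'99 dataset.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(record: Record, conditions: Dict[str, str]) -> bool:
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def support_confidence(records: List[Record],
                       antecedent: Dict[str, str],
                       consequent: Dict[str, str]) -> Tuple[float, float]:
    """Support: share of all records matching antecedent AND consequent.
    Confidence: that count divided by the number matching the antecedent."""
    n_ante = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, {**antecedent, **consequent}) for r in records)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical toy records, for illustration only.
records = [
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "bad"},
    {"Duration": "<1y", "District": "Brno", "LoanQuality": "good"},
]

# Rule: Duration(2y+) and District(Prague) => LoanQuality(good)
sup, conf = support_confidence(records,
                               {"Duration": "2y+", "District": "Prague"},
                               {"LoanQuality": "good"})
print(f"support={sup:.2f} confidence={conf:.2f}")  # prints: support=0.50 confidence=0.67
```

A rule is reported only when both values clear user-chosen thresholds, which is exactly where the threshold-tuning problem discussed next comes from.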

THE QUEST FOR TOPIC MAPS


Automatically select the really interesting rules from the output. Help the user search through the results.

THE PROBLEM WITH INTEREST MEASURES
It is usually not possible to tweak the interest-measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired.

The quest

- Past results

- Background knowledge

- Redundant rules

Discovered nuggets, obtained through
- more precise tasks, or
- automatic rule filtering
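One common criterion for automatically discarding redundant rules is to drop a rule when a strictly more general rule with the same consequent (its antecedent is a proper subset) reaches at least the same confidence. The slides do not say which criterion is used, so treat this Python sketch as one possible choice. The `Rule` class and the confidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:  # hypothetical minimal rule representation
    antecedent: FrozenSet[str]   # e.g. {"Duration(2y+)", "District(Prague)"}
    consequent: str              # e.g. "LoanQuality(good)"
    confidence: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep a rule only if no strictly more general rule with the same
    consequent has a confidence at least as high."""
    kept = []
    for r in rules:
        redundant = any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent      # proper subset: more general
            and g.confidence >= r.confidence
            for g in rules
        )
        if not redundant:
            kept.append(r)
    return kept


# Hypothetical mined rules.
rules = [
    Rule(frozenset({"District(Prague)"}), "LoanQuality(good)", 0.80),
    Rule(frozenset({"District(Prague)", "Sex(M)"}), "LoanQuality(good)", 0.78),
    Rule(frozenset({"District(Prague)", "Duration(2y+)"}), "LoanQuality(good)", 0.90),
]
for r in prune_redundant(rules):
    print(sorted(r.antecedent), "=>", r.consequent, r.confidence)
```

The second rule is dropped: adding Sex(M) to District(Prague) does not raise the confidence. The third rule stays, because the extra condition Duration(2y+) does.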

The lingua franca for exchange of data mining models is PMML

PMML (Predictive Model Markup Language)
• XML Schema
• PMML is the leading standard for statistical and data mining models
• Supported by over 20 vendors and organizations
• Covers the technical part of the CRISP-DM cycle

http://www.dmg.org/pmml_examples/index.html









PMML is “just” an XML Schema

• Developed for deploying mining models
• Good for migration from one data mining environment to another

But:
• No explicit links between nodes
• Verbose
• Self-contained; lacks support for
– interlinking multiple PMML documents
– interlinking PMML with other information
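The "no explicit links" point shows up as soon as you read association rules out of PMML: rules point to itemsets, and itemsets point to items, only through `id` attribute values that the consumer has to resolve. The fragment below is a simplified, hand-written example modeled on the PMML `AssociationModel` element. It is not a complete, valid document (it omits the Header, DataDictionary and MiningSchema), and the namespace URI depends on the PMML version.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written PMML fragment (not a complete, valid document).
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <AssociationModel functionName="associationRules" numberOfTransactions="4"
      minimumSupport="0.1" minimumConfidence="0.5"
      numberOfItems="3" numberOfItemsets="2" numberOfRules="1">
    <Item id="1" value="Duration(2y+)"/>
    <Item id="2" value="District(Prague)"/>
    <Item id="3" value="LoanQuality(good)"/>
    <Itemset id="1"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
    <Itemset id="2"><ItemRef itemRef="3"/></Itemset>
    <AssociationRule antecedent="1" consequent="2" support="0.5" confidence="0.67"/>
  </AssociationModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}


def read_rules(pmml_text):
    """Return (antecedent items, consequent items, confidence) per rule."""
    root = ET.fromstring(pmml_text)
    model = root.find("p:AssociationModel", NS)
    # Links are plain id-valued attributes, so they are resolved by hand.
    items = {i.get("id"): i.get("value") for i in model.findall("p:Item", NS)}
    itemsets = {
        s.get("id"): [items[ref.get("itemRef")] for ref in s.findall("p:ItemRef", NS)]
        for s in model.findall("p:Itemset", NS)
    }
    return [
        (itemsets[r.get("antecedent")], itemsets[r.get("consequent")],
         float(r.get("confidence")))
        for r in model.findall("p:AssociationRule", NS)
    ]


for ante, cons, conf in read_rules(PMML):
    print(" and ".join(ante), "=>", " and ".join(cons), conf)
```

In a topic map, the same rule-to-itemset and itemset-to-item links become typed associations, so they no longer need to be rebuilt from id strings.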

Association Rule Mining Ontology (AROn)

The ontology is a “semantization” of the PMML XML Schema.

DESIGN GUIDELINES
The key design principle was to allow easy transformation of data from PMML to AROn.

SCOPE
The ontology is limited to the subset of PMML relevant to association rule mining: 60 topic types, 50 association types and 20 occurrence types.

USE
No automatic transformation is available yet; we are working on one using the OKS framework. Currently, data can be entered using Ontopoly.

Design guidelines: Elements

• xs:element is mapped to a topic type
• Topics are assigned the same names as the PMML nodes
– but with spaces between words and with capitalization respected
• Superclasses are introduced for semantically similar XML nodes
• Named child elements that carry most of the semantics of their parent are merged with the parent
• If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as the subject identifier
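The naming and identifier guidelines for elements can be sketched as two small functions. This assumes PMML element names are CamelCase and that the subject identifier is the schema URI plus a fragment naming the element. The base URI below is a hypothetical placeholder; the identifiers AROn actually uses may be formed differently.

```python
import re

# Hypothetical schema location, for illustration only.
SCHEMA_URI = "http://example.org/pmml-schema.xsd"


def topic_type_name(element_name: str) -> str:
    """'AssociationRule' -> 'Association Rule': insert a space before each
    capital letter that follows a lowercase letter or digit, keeping the
    original capitalization."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", element_name)


def subject_identifier(element_name: str) -> str:
    """Hypothetical scheme: schema URI plus a fragment naming the element."""
    return f"{SCHEMA_URI}#{element_name}"


for name in ["AssociationModel", "AssociationRule", "Itemset", "ItemRef"]:
    print(f"{name:18} -> {topic_type_name(name):20} {subject_identifier(name)}")
```

Names without an internal lowercase-to-uppercase boundary, such as `Itemset` or `PMML`, pass through unchanged.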

Design guidelines: Attributes
• An enumeration restriction on an attribute is mapped to a topic type with an enumeration superclass (a workaround for missing TMCL support in OKS)
• Attributes that can be interpreted as references to other elements become associations
• Other attributes become occurrence types
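The three attribute guidelines can be read as a small decision procedure. In the Python sketch below, the `AttrDef` record is hypothetical. So is the hand-maintained set of attribute names treated as references to other elements: deciding that an attribute "could be interpreted as a reference" usually needs human judgment, so a fixed list is only an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttrDef:  # hypothetical description of one schema attribute
    name: str
    enumeration: List[str] = field(default_factory=list)


# Hypothetical list of attributes known to point at other elements.
REFERENCE_ATTRIBUTES = {"antecedent", "consequent", "itemRef"}


def map_attribute(attr: AttrDef) -> str:
    """Apply the attribute guidelines in order."""
    if attr.enumeration:
        # Enumeration -> topic type with an enumeration superclass;
        # the allowed values become its instances.
        return "topic type (enumeration): " + ", ".join(attr.enumeration)
    if attr.name in REFERENCE_ATTRIBUTES:
        return "association type"
    return "occurrence type"


attrs = [
    # A subset of the values allowed for functionName in PMML.
    AttrDef("functionName", ["associationRules", "sequences", "classification"]),
    AttrDef("antecedent"),
    AttrDef("confidence"),
]
for a in attrs:
    print(f"{a.name:12} -> {map_attribute(a)}")
```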

Design guidelines: Associations
• Names for association types are chosen freely, to be as descriptive as possible
• Introduce fewer rather than more associations
– minimizes the effort of populating the ontology from PMML
– avoids unnecessary inflation of the topic map
• Link only the semantically closest topics
– additional “soft” relations can be introduced with inference statements or derived with tolog
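Outside a topic map engine, the difference between stored and derived ("soft") relations looks like this. The Python sketch stores only two hypothetical direct association types: rule-to-itemset and itemset-to-item. It derives a rule-mentions-item relation by joining them, roughly what a tolog inference rule would compute. All association and topic names are made up for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# Direct associations (hypothetical names): only the closest topics are linked.
rule_has_antecedent: Set[Pair] = {("rule1", "itemset1")}
rule_has_consequent: Set[Pair] = {("rule1", "itemset2")}
itemset_contains_item: Set[Pair] = {
    ("itemset1", "Duration(2y+)"),
    ("itemset1", "District(Prague)"),
    ("itemset2", "LoanQuality(good)"),
}


def rule_mentions_item() -> Set[Pair]:
    """Derived 'soft' relation: a rule mentions every item of its antecedent
    or consequent itemset. It is computed by a join, never stored."""
    rule_itemset = rule_has_antecedent | rule_has_consequent
    return {
        (rule, item)
        for rule, itemset in rule_itemset
        for owner, item in itemset_contains_item
        if itemset == owner
    }


print(sorted(rule_mentions_item()))
```

Keeping the derived relation out of the stored topic map is what avoids the inflation the guidelines warn about.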

Design guidelines: Role types

• Topic types used to map PMML elements are used as role types
– unless multiple topics are permitted at an association end; in that case the superclass is used as the role type, or a new role type is introduced

Two alternative association rule representations:
- Apriori-based (Item-Itemset)
- GUHA-based (Boolean Attributes)
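The two representations differ in what a rule is built from. In the Python sketch below, an Apriori-style rule links two itemsets of attribute=value items. A GUHA-style rule is a conjunction of Boolean attributes, each a basic attribute plus a coefficient (a set of categories, any of which makes the literal true). The class names are hypothetical and far simpler than the ontology's topic types.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Apriori-style: an item is an attribute=value pair; a rule links two itemsets.
Item = Tuple[str, str]


@dataclass(frozen=True)
class ItemsetRule:  # hypothetical
    antecedent: FrozenSet[Item]
    consequent: FrozenSet[Item]


# GUHA-style: a Boolean attribute is a basic attribute with a coefficient.
@dataclass(frozen=True)
class Literal:  # hypothetical
    attribute: str
    coefficient: FrozenSet[str]


@dataclass(frozen=True)
class GuhaRule:  # hypothetical
    antecedent: FrozenSet[Literal]   # conjunction of literals
    succedent: FrozenSet[Literal]


def items_to_literals(items: FrozenSet[Item]) -> FrozenSet[Literal]:
    """Each item becomes a literal with a one-category coefficient."""
    return frozenset(Literal(a, frozenset({v})) for a, v in items)


def to_guha(rule: ItemsetRule) -> GuhaRule:
    return GuhaRule(items_to_literals(rule.antecedent),
                    items_to_literals(rule.consequent))


r = ItemsetRule(frozenset({("Duration", "<24;inf>"), ("district", "Prague")}),
                frozenset({("statusAgg", "Good")}))
g = to_guha(r)
for lit in sorted(g.antecedent, key=lambda lit: lit.attribute):
    print(lit.attribute, sorted(lit.coefficient))

# A GUHA literal can cover several categories at once, which a single
# Apriori item cannot: Duration(<13;23> or <24;inf>).
wider = Literal("Duration", frozenset({"<13;23>", "<24;inf>"}))
```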

Ongoing work

• Support for background knowledge: “already known association rules”
• Support for schema mapping: “linking of background knowledge with mining results”
• Both are already in the ontology, distinguished by the base of the subject identifier:
– Schema Mapping: http://keg.vse.cz/sma/XXX
– Background Knowledge: http://keg.vse.cz/bko/xxx
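Telling these kinds of topics apart by subject-identifier base is a simple prefix check. The two prefixes in this Python sketch come from the slide. The local names in the usage lines are hypothetical, and classifying every other identifier as mining-result content is an assumption.

```python
# Prefixes from the slide; XXX/xxx there stand for local names.
PREFIXES = {
    "http://keg.vse.cz/sma/": "schema mapping",
    "http://keg.vse.cz/bko/": "background knowledge",
}


def classify(subject_identifier: str) -> str:
    """Return the kind of topic, judged only by its identifier base."""
    for prefix, kind in PREFIXES.items():
        if subject_identifier.startswith(prefix):
            return kind
    return "other (e.g. mining results)"  # assumption: everything else


# Hypothetical identifiers, for illustration only.
for si in ["http://keg.vse.cz/bko/rule1",
           "http://keg.vse.cz/sma/duration-map",
           "http://example.org/results/rule42"]:
    print(si, "->", classify(si))
```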


Data Mining Use case

PREDICT LOAN QUALITY
Find client characteristics that could be used to predict their attitude towards repaying a loan.

BASED ON PAST RECORDS
Input data: records of loans already granted

The data

• 6181 clients in the PKDD’99 financial dataset

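Data were preprocessed, e.g.:
• District → district: values kept as they are (Prague → Prague, Brno → Brno, …)
• duration → Duration: many distinct values in <0;100>, binned into <0;12>, <13;23> and <24;inf>
• status → statusAgg: the values A, B, C and D are aggregated into Good, Medium and Bad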

ID    sex     age  duration [months]  district  Loan quality
5464  male    54   12                 Prague    A
5489  female  20   6                  Ostrava   E
…     …       …    …                  …         …
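The binning of loan duration into <0;12>, <13;23> and <24;inf> can be sketched as a small function, applied here to the two sample records. It assumes durations are whole months. The slides do not say how values falling between the intervals (e.g. 12.5) were handled, and this sketch simply places them in the next interval.

```python
def bin_duration(months: float) -> str:
    """Map a loan duration in months to the intervals used in the slides.
    Assumes whole months; a value such as 12.5 lands in <13;23>."""
    if months < 0:
        raise ValueError("duration must be non-negative")
    if months <= 12:
        return "<0;12>"
    if months <= 23:
        return "<13;23>"
    return "<24;inf>"


raw = [
    {"ID": 5464, "sex": "male", "age": 54, "duration": 12, "district": "Prague"},
    {"ID": 5489, "sex": "female", "age": 20, "duration": 6, "district": "Ostrava"},
]
for row in raw:
    row["Duration"] = bin_duration(row["duration"])
    print(row["ID"], row["duration"], "->", row["Duration"])
```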

• … and perhaps 9,997 other association rules

Preprocessed data → Association Rule Learner

WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT

ASK THE CLIENT WHAT HE KNOWS

If the loan duration is more than two years and the loan was granted in the Prague district, we can expect good loan quality.

…background knowledge

Semantize the results

Formalize Background Knowledge

Schema Mapping
• Background knowledge can use a different “vocabulary” from the data
• If we are to use background knowledge in querying, we need to interlink it with the data

The same approach would apply if we interlinked several mining models (PMML documents).
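A schema mapping can be pictured as a lookup from the client's terms to the attributes and values that actually occur in the data. The dictionary entries below are hypothetical. For example, they assume that "Duration(2y+)" corresponds to the interval <24;inf> and that "Loan Quality(good)" corresponds to statusAgg = Good.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Item = Tuple[str, str]

# Hypothetical mapping from the client's vocabulary to data attributes/values.
SCHEMA_MAPPING: Dict[str, Item] = {
    "Duration(2y+)": ("Duration", "<24;inf>"),
    "District(Prague)": ("district", "Prague"),
    "Loan Quality(good)": ("statusAgg", "Good"),
}


def translate(terms: Iterable[str]) -> FrozenSet[Item]:
    """Translate background-knowledge terms into data items; fail loudly
    on any term that has no mapping rather than silently dropping it."""
    terms = list(terms)
    missing = [t for t in terms if t not in SCHEMA_MAPPING]
    if missing:
        raise KeyError(f"no mapping for {missing}")
    return frozenset(SCHEMA_MAPPING[t] for t in terms)


bk_antecedent = translate(["Duration(2y+)", "District(Prague)"])
bk_consequent = translate(["Loan Quality(good)"])
print(sorted(bk_antecedent), "=>", sorted(bk_consequent))
```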

Deleting information with Topic Maps

• Find association rules that subsume background knowledge
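The Python sketch below illustrates the search, assuming the background knowledge has already been expressed in the data vocabulary. It adopts one possible reading of "subsume": a mined rule subsumes the background rule when it has the same consequent and an antecedent that is a subset of (at most as restrictive as) the background rule's antecedent. The `Rule` class and the mined rules are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]


@dataclass(frozen=True)
class Rule:  # hypothetical
    antecedent: FrozenSet[Item]
    consequent: FrozenSet[Item]


def subsumes(general: Rule, specific: Rule) -> bool:
    """Assumed reading: same consequent, and the antecedent of 'general'
    is a subset of the antecedent of 'specific'."""
    return (general.consequent == specific.consequent
            and general.antecedent <= specific.antecedent)


# Background knowledge, already expressed in the data vocabulary.
bk = Rule(frozenset({("Duration", "<24;inf>"), ("district", "Prague")}),
          frozenset({("statusAgg", "Good")}))

# Hypothetical mined rules.
mined: List[Rule] = [
    Rule(frozenset({("district", "Prague")}), frozenset({("statusAgg", "Good")})),
    Rule(frozenset({("Duration", "<24;inf>"), ("district", "Prague")}),
         frozenset({("statusAgg", "Good")})),
    Rule(frozenset({("sex", "female")}), frozenset({("statusAgg", "Good")})),
]

known = [r for r in mined if subsumes(r, bk)]
novel = [r for r in mined if not subsumes(r, bk)]
print(len(known), "rule(s) subsume background knowledge;", len(novel), "left to show")
```

Whether the subsuming rules are hidden as already known or highlighted as confirmations is a choice for the analyst.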

Visualization of a tolog query

Summary

• Methodology for transferring XML Schema to Topic Maps

• Association Rule Mining Ontology based on PMML
• Easily extensible to other data mining algorithms
• Initial attempts to formalize background knowledge
• Initial attempts to use Topic Maps for schema mapping

AROn On-Line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr
