
Unsupervised Rare Pattern Mining: A Survey

Yun Sing Koh, University of Auckland
Sri Devi Ravana, University of Malaya

Association rule mining was first introduced to examine patterns among frequent items. The original motivation for seeking these rules arose from the need to examine customer purchasing behaviour in supermarket transaction data. It seeks to identify combinations of items, or itemsets, whose presence in a transaction affects the likelihood of the presence of another specific item or itemset. In recent years, there has been an increasing demand for rare association rule mining. Detecting rare patterns in data is a vital task, with numerous high-impact applications in areas including medicine, finance, and security. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for rare pattern mining. We investigate the problems in finding rare rules using traditional association rule mining. As rare association rule mining has not been well explored, there is still specific groundwork that needs to be established. We discuss some of the major issues in rare association rule mining and also look at current algorithms. As a contribution we give a general framework for categorizing algorithms: Apriori based and tree based. We highlight the differences between these methods. Finally, we present several real-world applications of rare pattern mining in diverse domains. We conclude our survey with a discussion of open and practical challenges in the field.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications – Data mining; A.1 [Introductory and Survey]

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Association Rule Mining, Infrequent Patterns, Rare Rules

ACM Reference Format:
Yun Sing Koh and Sri Devi Ravana, 2016. Unsupervised Rare Pattern Mining: A Survey. ACM Trans. Knowl. Discov. Data. 0, 0, Article 0 (2016), 31 pages.
DOI: http://dx.doi.org/10.1145/0000000.0000000

1. INTRODUCTION
The main goal of association rule mining is to discover relationships among sets of items in a transactional database. Association rule mining, introduced by Agrawal et al. [1993], aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases or other data repositories. The relationships are not based on intrinsic properties of the data themselves but rather on the co-occurrence of the items within the database. These associations between items are also known as association rules.

Two major types of rules can be found in a database: frequent and rare rules. Both frequent and rare association rules present different information about the database in which they are found. Frequent rules focus on patterns that occur frequently, while rare rules focus on patterns that occur infrequently.

Author's addresses: Y. Koh, Department of Computer Science, University of Auckland. S.D. Ravana, Department of Information Systems, Faculty of Computer Science & Information Technology, University of Malaya.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2016 ACM 1556-4681/2016/-ART0 $15.00
DOI: http://dx.doi.org/10.1145/0000000.0000000


This is a draft version of the paper accepted at ACM TKDD in May 2016.


In many domains, events that occur frequently may be less interesting than events that occur rarely, since frequent patterns represent the known and expected, while rare patterns may represent unexpected or previously unknown associations, which are useful to domain experts. Examples of mining infrequent itemsets include identifying relatively rare diseases, predicting telecommunication equipment failure, and finding associations between infrequently purchased supermarket items. In the area of medicine, the expected, frequent responses to medications are less interesting than exceptional, rare responses, which may indicate adverse reactions or drug interactions. Suppose a database of patient symptoms contains a rare itemset {elevated heart rate, fever, skin bruises, low blood pressure}, where all items other than low blood pressure are frequent items. This itemset will produce a rule, elevated heart rate, fever, skin bruises → low blood pressure, that highlights the association between the three former symptoms and low blood pressure, which together are symptoms of severe sepsis [Tsang et al. 2011]. Motivated by the crucial and critical information that may be derived from rare patterns, there has been a spur of research in this area.

The main difficulty in finding rare patterns or rare association rules is that the mining process can be likened to finding a needle in a haystack [Weiss 2004]. The difficulty of searching for a small needle is multiplied by the fact that the needle is obscured by a large amount of hay. Likewise, rare rules pose difficulties for data mining systems for a variety of reasons. The fundamental problem is the lack of data associated with rare cases. Rare rules tend to cover only a few examples. The lack of data makes detecting rare cases difficult and, even when rare cases are detected, it is hard to meaningfully generalize the results. Generalization is a difficult task when we are tasked with identifying regularities from few data points.

In classical association rule mining, all frequent itemsets are found, where an itemset is said to be frequent if it appears in at least a given percentage of all transactions, called the minimum support (minsup). Association rules are then found in the form A → B, where AB is a frequent itemset. Strong association rules are derived from frequent itemsets and constrained by a minimum confidence (minconf). Confidence is the percentage of transactions containing A that also contain B. As an example of an association rule A → B, suppose 90% of transactions that contain item A also contain item B, 30% of all transactions contain item A, and 27% of all transactions contain the two items together. Here the support of the rule is 27% and the confidence is 90%. Confidence is the conditional probability that a transaction containing A also contains B. Currently there are many association rule mining algorithms dedicated to frequent itemset mining. These algorithms are defined in such a way that they only find rules with high support and high confidence. Most of these approaches adopt an Apriori-like approach, which is based on the anti-monotone Apriori heuristic [Agrawal and Srikant 1994]: if any length-k itemset is not frequent in the database, its superset length-(k+1) itemsets can never be frequent. The Apriori algorithm iteratively generates sets of candidate itemsets and checks their corresponding occurrence frequencies in the database.
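To make the level-wise candidate generation and anti-monotone pruning described above concrete, the following minimal Python sketch (ours, not taken from any of the surveyed papers; the toy transactions and variable names are purely illustrative) enumerates frequent itemsets from a small transaction list:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent itemset mining with downward-closure pruning.
    transactions: list of sets of items; minsup: fraction in [0, 1]."""
    n = len(transactions)
    support = lambda X: sum(1 for t in transactions if X <= t) / n

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    frequent = {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}
    all_frequent, k = set(frequent), 1
    while frequent:
        k += 1
        # Join step: merge (k-1)-itemsets that share k-2 items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent (anti-monotone property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c for c in candidates if support(c) >= minsup}
        all_frequent |= frequent
    return all_frequent

# Example: with minsup = 0.5, {A, B} and {B, C} are frequent in this toy database.
print(apriori([{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}], 0.5))
```

The same scan-and-count loop is what becomes prohibitive when the threshold is lowered far enough to reach rare items, which is the scalability issue discussed next.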

Common association rule mining algorithms, such as Apriori, discover valid rules by exploiting support and confidence requirements and using the minimum support threshold to prune the combinatorial search space. Two major problems may arise when we apply these techniques. Intuitively, to discover rare rules every transaction has to be inspected to generate the support of all combinations of items. However, this significantly increases the workload during candidate set generation, construction of tree nodes, and testing. The number of rules generated would be considerable, and not all the rules generated may be interesting. Patterns with items having significantly different support levels would be produced, which usually have weak correlations. However, if we set the minimum support threshold too high, interesting rare rules will never be generated: rare rules with low support would never be considered in the itemset generation.

Using frequent pattern mining approaches, items that rarely occur appear in very few transactions and are normally pruned out. Infrequent itemsets therefore warrant special attention, as they are more difficult to find using traditional data mining techniques, and new approaches need to be developed to cater for these specific types of itemsets. Recently, there has been growing interest in developing techniques to mine rare patterns.

1.1. Types of Rare Patterns
Overall there are two major categories of rare patterns: basic and extended. In Figure 1 we show the roadmap of the categories of rare patterns. In the basic category, we have association rule mining and rare patterns. In the extended category, there is a range of other patterns including subgraph, probabilistic, and sequence patterns. Based on pattern diversity, rare pattern mining can thus be classified into basic and extended patterns.

Basic Rare Patterns: Rare patterns can be mapped into basic association rules, or other kinds of rare rules based on constraints or interestingness measures. The major advantage of focusing on the basic type of patterns and this category of mining techniques is that they are widely applicable in different application domains. Furthermore, we can also utilize those techniques on different data types. For example, we can apply rare pattern techniques on a sequential dataset. Sequential datasets have additional temporal information compared to a normal transactional dataset. Using basic approaches on sequential datasets has its disadvantages, in that we would strip away some underlying information, for example, the temporal information in a sequential dataset. In this survey we concentrate on the basic type of rare patterns and the current mining techniques in the area. We aim to present and discuss a large variety of research in the area.

Extended Rare Patterns: Overall, most of the current research in the field concentrates on finding the basic type of rare patterns; comparatively little research concentrates on finding patterns in the extended category. The main reason behind this could be the computational complexity of these mining techniques in detecting rare occurrences of events. For example, frequent graph mining techniques can suffer from costly subgraph isomorphism testing and generate an enormous number of candidates during the mining process. This problem is exacerbated when we try to look at rare graph mining. Rare subgraph pattern techniques have been used to find useful substructures in a graph database. For example, network security administrators can conduct pattern (subgraph) matching over network traffic data to detect possible malicious attacks, which are relatively uncommon compared to normal network traffic.

Some rare patterns may involve sequences and structures. For example, by studying the order in which rare events occur we may be able to find sequential patterns, that is, rare subsequences in a sequence of ordered events. For example, when diagnosing a patient's condition, the context is reflected in the sequence of conditions that has appeared across a time period. By mining sequential patterns of symptoms we can capture a patient's condition. In this way, a patient's condition can be characterized at a more abstract level and it becomes easier to track and detect changes in the patient's condition. For instance, high temperature → low blood pressure → high pulse rate in a woman who has just given birth could indicate the onset of a secondary postpartum hemorrhage, a relatively rare condition which can cause mortality.

In some datasets we also need to consider the existential uncertainty of rare itemsets, that is, the probability that an itemset occurs in a transaction, which makes traditional techniques inapplicable. For example, a medical dataset may contain a table of patient records, each of which contains a set of symptoms and/or illnesses that a patient suffers from. Applying rare association analysis on such a dataset allows us to discover any potential correlations among the symptoms and illnesses for rare diseases. In these cases, symptoms would best be represented by probabilities based on historical data statistics that indicate their presence in the patients' records. This type of data is known as uncertain data.

Fig. 1. Roadmap for types of rare patterns. (Figure: the basic category contains rare association rules / rare patterns; the extended category contains sequence patterns, probabilistic patterns, and subgraphs. Extended data types include uncertain, spatial co-location, structural/graph/network, temporal (evolutionary), and sequential time-series data.)

We can also apply those techniques on other kinds of data types. For example, we can apply rare pattern techniques on a sequential dataset [Hu et al. 2014]. Sequential datasets have additional temporal information. In sequential time-series data analysis, researchers can discretize time-series values into multiple levels or intervals. In doing so, small fluctuations and differences in value can be ignored. The data can then be summarized into sequential patterns, which can be indexed to facilitate similarity search or comparative analysis, as normally used in recommender systems. We can also use temporal or evolutionary data, whereby the data is dynamic and may evolve over time [Huang et al. 2014]. Suppose an analyst is modelling abnormal user behaviours for detecting fraudulent call patterns; the cause of frauds can change dynamically as fraudsters continuously find new ways to break the system.

We may also mine rare structural patterns [Lee et al. 2015], which are substructures from a structured dataset. Note that structure is a general concept that covers many different kinds of structural forms including directed graphs, undirected graphs, networks, lattices, trees, sequences, sets, or combinations of such structures. A general pattern may contain different elements such as a subsequence, subtree, or subgraph.

Pattern analysis is useful in the analysis of spatial co-location data. Rare spatial co-location patterns [Huang et al. 2006] represent rare relationships among events happening in different and possibly nearby locations. For example, these can help determine whether a certain rare but deadly disease is geographically co-located with certain areas, to establish patterns of potential epidemics.


1.2. Our Contributions
This survey is an attempt to provide a structured and broad overview of extensive research in rare pattern mining spanning different application domains. In this survey we take an in-depth look at the issues in rare pattern mining, applications of rare pattern mining, current rare pattern mining approaches, an evaluation of the techniques, and a discussion of the open challenges in the area.

We divide the current rare pattern mining techniques into two types: static and dynamic. Techniques that deal with static data are divided into four categories: variable support threshold, without support threshold, consequent constraint-based rule mining, and tree based approaches. Techniques that deal with dynamic data are divided into two categories: with and without drift detection.

We identify the advantages and disadvantages of the techniques in each category. We also provide a detailed discussion of the application domains where rare pattern mining techniques have been used. We discuss the similarity between rare pattern mining and the related research area of anomaly detection. We also look at the open challenges in rare pattern mining.

1.3. Organization
In Section 2 we identify the various aspects that determine the formulation of the problem and highlight the complexity associated with rare pattern mining. In Section 3 we briefly describe the different application domains where rare pattern mining has been applied. In subsequent sections we provide a categorization of rare pattern mining techniques based on the research area to which they belong. The majority of techniques can be categorized as techniques for detecting rare rules in static data, and are cataloged in Section 4. The techniques for detecting rare rules in dynamic data are cataloged in Section 5. We compare and contrast the characteristics of the techniques in Section 6. We present some discussion on the similarities and differences between rare pattern mining techniques and anomaly detection in Section 7. Section 8 contains our concluding remarks.

2. BASIC CONCEPT: RARE PATTERN PROBLEM
The most popular pattern discovery method in data mining is association rule mining. The associations between items are also known as association rules. Association rule mining was also initially known as frequent pattern mining [Agrawal et al. 1993]. The following is a formal statement of association rule mining for a transaction database.

Let I = {i1, i2, . . . , im} be the universe of items. A set X ⊆ I of items is called an itemset or pattern. In particular, an itemset with k items is called a k-itemset. In association rule mining, the main function of a unique transaction ID is to allow an itemset to occur more than once in a database. Every transaction contains a unique transaction ID tid. A transaction t = (tid, X) is a tuple where X is an itemset. A transaction t = (tid, X) is said to contain itemset Y if Y ⊆ X. Let T = {t1, t2, . . . , tn} be the set of all possible transactions. A transaction database D is a set of transactions, such that D ⊆ T; in effect, D is really a multi-set of itemsets. The count of an itemset X in D, denoted by count(X, D), is the number of transactions in D containing X. The support of an itemset X in D, denoted by sup(X, D), is the proportion of transactions in D that contain X, i.e.,

sup(X, D) = count(X, D) / |D|

where |D| is the total number of transactions in D.
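As an illustration of these definitions, the short Python sketch below (ours; the toy database and function names are illustrative, not from the surveyed work) computes count(X, D) and sup(X, D) over transactions stored as (tid, itemset) tuples:

```python
def count(X, D):
    """Number of transactions in D that contain every item of X."""
    return sum(1 for (tid, items) in D if set(X) <= set(items))

def sup(X, D):
    """Support of X in D: fraction of transactions containing X."""
    return count(X, D) / len(D)

# Toy database: each transaction is a (tid, itemset) tuple.
D = [(1, {"a", "b"}), (2, {"a", "c"}), (3, {"a", "b", "c"}), (4, {"d"})]
print(sup({"a", "b"}, D))  # 2 of 4 transactions contain {a, b} -> 0.5
```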


Definition 2.1 (Rare Itemset). Given a user-specified minimum support threshold minsup ∈ [0, 1], X is called a rare itemset or rare pattern in D if sup(X, D) ≤ minsup.

Typically the problem of mining rare patterns is to find the meaningful set of rare patterns in a transaction database with respect to a given support threshold.

Definition 2.2 (Association Rule). An association rule is an implication of the form X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅.

The left-hand side (LHS) X is referred to as the antecedent of the rule, and the right-hand side (RHS) Y as the consequent. The rule X → Y has support s in the transaction set D if s = sup(XY, D). XY refers to an itemset that contains both X and Y, and is commonly used in the association rule mining literature instead of X ∪ Y. The rule X → Y holds in the transaction set D with confidence c, where c = conf(X → Y, D) and

conf(X → Y, D) = sup(XY, D) / sup(X, D).

Given a transaction database D, a support threshold minsup and a confidence threshold minconf, the task of association mining is to generate all association rules that have support and confidence above the user-specified thresholds. This is known as the support-confidence framework [Agrawal et al. 1993].
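The following self-contained Python fragment (ours; data and thresholds are illustrative) computes rule confidence and applies the support-confidence filter to a single candidate rule:

```python
def support(X, transactions):
    """Fraction of transactions that contain every item of X."""
    return sum(1 for t in transactions if set(X) <= t) / len(transactions)

def confidence(X, Y, transactions):
    """conf(X -> Y) = sup(XY) / sup(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def is_strong(X, Y, transactions, minsup, minconf):
    """Support-confidence framework: keep the rule only if both thresholds are met."""
    return (support(set(X) | set(Y), transactions) >= minsup
            and confidence(X, Y, transactions) >= minconf)

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"d"}]
# {a} -> {b}: support 0.5 and confidence 2/3, so it passes thresholds (0.4, 0.6).
print(is_strong({"a"}, {"b"}, transactions, minsup=0.4, minconf=0.6))  # True
```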

However, it is difficult to find association rules with low support but high confidence using Apriori-like methods. In order to find these rules, the minimum support threshold would need to be set quite low to enable rare items to be included. The Apriori heuristic is able to reduce the size of candidate itemsets if the minimum support is set reasonably high. However, in situations with abundant frequent patterns, long patterns, or a low minimum support threshold, an Apriori-like algorithm may still suffer from a nontrivial cost: most of the items would be allowed to participate in itemset generation. This affects the scalability of the Apriori algorithm. It is costly to handle a huge number of candidate items, and it is time consuming to repeatedly scan the database and check each of the candidate itemsets generated. The complexity of the computation increases exponentially with the number of itemsets generated.

Traditional association rule analysis, such as Apriori, focuses on mining association rules in large databases with a single minsup. Since a single threshold is used for the whole database, it assumes that all items in the database are of the same nature and/or have similar frequencies. As such, Apriori works best when all items have approximately the same frequency in the data. Apriori exploits the downward closure property, which states that if an itemset is frequent so are all its subsets. This leads to a weakness in Apriori when supports are not uniform. Consider the case where the user-defined minsup of {A,B,C} is 2% and the user-defined minsup of {A,B}, {A,C}, and {B,C} is 5%. It is possible for {A,B,C} to be frequent with respect to its minsup but none of {A,B}, {A,C}, {B,C} to be frequent with respect to theirs: {A,B}, {A,C}, and {B,C} may have support below 5% while {A,B,C} is still above 2%. In this case itemset {A,B,C} is considered frequent because of the different minsup values. In reality, some items may be very frequent while others may rarely appear. Hence minsup should not be uniform, because deviations and exceptions generally have a much lower support than general trends. Note that support requirements vary as the support of items contained in an itemset varies.

Given that the existing Apriori algorithm assumes a uniform support, rare itemsets can be hard to find. Items that rarely occur are in very few transactions and will be pruned because they do not meet the minsup threshold. In data mining, rare itemsets may be obscured by common cases. Weiss [2004] calls this relative rarity. This means that items may not be rare in the absolute sense but are rare relative to other items. This is especially a problem when data mining algorithms rely on greedy search heuristics that examine one item at a time. Since rare cases may depend on the conjunction of many conditions, analysing any single condition alone may not be interesting [Weiss 2004].

As a specific example of the problem, consider the association mining problem of determining whether there is an association between buying a food processor and buying a cooking pan [Liu et al. 1999b]. The problem is that both items are rarely purchased in a supermarket. Thus, even if the two items are almost always purchased together when either one is purchased, this association may not be found. Modifying the minsup threshold to take into account the importance of the items is one way to ensure that rare items remain in consideration. In order to find this association, minsup must be set low. However, setting this threshold low would cause a combinatorial explosion in the number of itemsets generated. Frequently occurring items will be associated with one another in an enormous number of ways simply because the items are so common that they cannot help but appear together. This is known as the rare item problem [Liu et al. 1999b]. It means that using the Apriori algorithm, we are unlikely to generate rules that may indicate rare events of potentially dramatic consequence. For example, we might prune out rules that indicate the symptoms of a rare but fatal disease because their frequency of occurrence does not reach the minsup threshold.

As rare rule mining is still an area that has not been well explored, there is some groundwork that needs to be established. A real dataset will contain noise, typically at low support levels. In Apriori, setting a high minimum support threshold would cut the noise out. Here, however, we are inherently looking for rules with low support, which could make them indistinguishable from coincidences (that is, situations where items fall together no more often than they would by chance). Apriori is the most commonly used association mining technique, but it is not efficient when we try to find low support rules. Using Apriori, we would still need to wade through thousands of itemsets (often having high support) to find the rare itemsets that are of interest to us. Another approach is to use a tree based approach. Most tree based rare pattern mining approaches follow the traditional FP-Tree [Han et al. 2000] approach. This is a two-pass approach and is affordable when mining a static dataset; however, in a data stream environment a two-pass approach is not suitable. An overview of the mining techniques in Sections 4 and 5 is shown in Table I.

Table I. Categorization of Rare Pattern Mining Approaches

            Apriori Based       Tree Based
Static      Sections 4.1-4.3    Section 4.4
Dynamic     -                   Section 5

In the next section we look at some application areas where rare pattern mining has been utilized. Rare pattern mining has been used in many different applications, including detecting suspicious uses and behaviours in the context of web applications, finding infrequent patterns in pre-processed wireless connection records, and finding patterns that point to adverse drug reactions.

3. APPLICATION OF RARE PATTERN MINING IN VARIOUS DOMAINS
Rare patterns can be applied in various domains including biology, medicine and security. In the security field, normal behaviour is very frequent, whereas abnormal or suspicious behaviour is less frequent. When analysing a database where the behaviour of people in sensitive places such as airports is recorded, if we model those behaviours, a possible outcome would be to find that normal behaviours can be represented by frequent patterns and strange or suspicious behaviours by rare patterns. In the medical field, by analysing clinical databases a user can discover rare patterns or trends that will assist doctors in making decisions about clinical care. In this section, we discuss some of the applications of rare pattern mining techniques.

3.1. Detecting Suspicious Web Usage
Adda et al. [2012] adopted an approach to find rare patterns to detect suspicious uses and behaviours. The mining system Rare Pattern Mining for Suspicious Use Detection (RPMSUD) [Adda et al. 2012] is mainly composed of an engine that unobtrusively analyses the usage data of a running web application and detects suspicious activities. The authors made the assumption that web application usage is dominated by repetitive access/requests and that rare access patterns are a source of potential abnormal usage that may be related to security issues.

The algorithm behind the RPMSUD system is built around an event queue that is processed periodically: the events that occur in a given period of time, called a mining cycle, are analysed together. They consider a usage pattern suspicious if it is detected during different mining cycles. The user has to determine the duration of a cycle and the number of cycles to consider before issuing an alert. After each cycle the queue is emptied. The number of cycles multiplied by the duration of a cycle is called the mining window. After each mining window, the set of mined patterns is saved and then emptied.

3.2. Detecting Network Attacks from Wireless Connection Records
WiFi Miner [Rahman et al. 2010] finds infrequent patterns in pre-processed wireless connection records using an infrequent pattern mining algorithm based on Apriori. The authors proposed the Online Apriori-Infrequent algorithm, which does not use the confidence parameter and uses only the frequent and rare patterns in a record to compute, on the fly, an anomaly score that determines whether the record is anomalous. The algorithm includes an anomaly score calculation that assigns a score to each wireless packet.

Computationally, the proposed Online Apriori-Infrequent algorithm improves the join and prune steps of the traditional Apriori algorithm with a constraint. The constraint avoids joining itemsets not likely to produce frequent itemsets as their results, thereby improving efficiency and run times significantly. An anomaly score is assigned to each packet (record) based on whether the record has more frequent or infrequent patterns. Connection records with positive anomaly scores have more infrequent patterns than frequent patterns and are considered anomalous packets.

To test the application, the authors created a wireless network with two other PCs, where one was the victim and the other was the attacker PC. Within a five minute time window they captured around 19,500 wireless packets, which were generated as a result of innocent activities. They gathered around 500 anomalous packets which contained different kinds of crafted attacks such as passive attacks, active attacks, and man-in-the-middle attacks. They launched these attacks from the attacker PC to the victim PC, tested the system with the crafted intrusions, compared it with two other benchmark systems, and found their system to be more efficient.

3.3. Detecting Adverse Drug Reactions
Ji et al. [2013] proposed a framework to mine potential causal associations in electronic patient data sets where the drug-related events of interest occur infrequently. They developed and incorporated an exclusion mechanism that can effectively reduce the undesirable effects caused by frequent events. Specifically, they proposed a novel interestingness measure, exclusive causal-leverage, based on a computational, fuzzy recognition-primed decision model. On the basis of this new measure, a data mining algorithm was developed to mine the causal relationship between drugs and their associated adverse drug reactions. The proposed exclusive causal-leverage measure evaluates itemset pairs <X, Y>, denoting that X has causality with Y. A large value indicates a stronger causal association; a negative value indicates a reverse causality; and a value close to zero shows no causal association between X and Y.

To evaluate the algorithm, a dataset needs to be preprocessed to obtain different types of information: a list of all drugs in the database and the support count for each drug, and a list of all symptoms in the database and the support count for each symptom. The lists of drugs and symptoms are needed to form all possible drug-symptom pairs whose causal strengths will be assessed. In the experiment, the data was obtained from the Veterans Affairs Medical Center in Detroit, Michigan. "Event" data such as dispensing of drugs, office visits, and certain laboratory tests were retrieved for all the patients. The results showed that the exclusive causal-leverage measure outperformed traditional interestingness measures like risk ratio and leverage because of its ability to capture suspect causal relationships and exclude undesirable effects caused by frequent events.

3.4. Detecting Abnormal Learning Problems in Students from Educational Dataset
Romero et al. [2010] explore the extraction of rare association rules when gathering student usage data from a Moodle system. In the education domain, infrequent associations can be of great interest since they are related to rare but crucial cases. The rules produced can allow the instructor to determine frequent/abnormal learning problems that should be taken into account when dealing with students with special needs. Thus, this information could help the instructor to discover a minority of students who may need specific support with their learning process. The authors used the Apriori-Inverse [Koh and Rountree 2005] and Apriori-Rare [Szathmary et al. 2007] algorithms detailed in Section 4.1. In the experiments, the authors evaluated the relation/influence between the on-line activities and the final mark obtained by the students.

4. DETECTING RARE RULES IN STATIC DATA
In this section we look at techniques that work on a static dataset. Classic pattern mining techniques, such as Apriori, rely on a uniform minimum support. These algorithms either miss the rare but interesting rules or suffer from congestion in itemset generation caused by low support. Driven by such shortcomings, research has been carried out in developing new rule discovery algorithms to mine rare rules. Figure 2 shows the taxonomy of the current techniques.

There have been several approaches taken to ensure that rare items are considered during itemset generation. One of the approaches is association rule mining with a variable support threshold. In this approach, each itemset may have a different support threshold. The support threshold for each itemset is dynamically lowered to allow some rare items to be included in the rule generation. Section 4.1 discusses some of this research [Liu et al. 1999a; Yun et al. 2003; Tao et al. 2003; Wang et al. 2003; Seno and Karypis 2001; Koh and Rountree 2005; Pillai et al. 2013; Szathmary et al. 2007; Sadhasivam and Angamuthu 2011; Hoque et al. 2013].

A uniform minimum support threshold is not effective for datasets with a skewed distribution, because uniform minimum support thresholds tend to generate many trivial patterns or miss potential low-support patterns. Hence another approach uses association rule mining without a support threshold, but usually introduces another constraint to solve the rare item problem.


Fig. 2. Static rare pattern mining. (Figure: taxonomy of static rare pattern mining approaches. With variable support threshold: MSApriori, RSAA, Adaptive Apriori, WARM, LPMiner, Apriori-Inverse, Rarity, ARIMA, AfRIM, FRIMA, HURI, Adaptive Apriori Rare. Without support threshold: Min-Hashing, Confidence-Based Pruning, H-Confidence, MINIT. Consequent constraint: Dense-Miner, Direction Setting, Emerging Pattern, Fixed-Consequent, MIISR. Tree: RP-Tree, IFP-Tree, MCCFP Growth, TRARM-RelSup.)

In Section 4.2 we discuss some of the algorithms that focus on mining low support rules without using support thresholds [Cohen et al. 2001; Wang et al. 2001; Xiong et al. 2003; Haglin and Manning 2007].

Another way to encourage low-support items to take part in candidate rule generation is by imposing item constraints. For example, it is possible to provide a list of those items that may or may not take part in a rule and then modify the mining process to take advantage of that information [Srikant et al. 1997; Ng et al. 1998; Bayardo et al. 2000; Rahal et al. 2004; Li et al. 1999]. One of the restrictions that may be imposed is called consequent constraint-based rule mining, discussed in Section 4.3.

Recently, tree based approaches have been developed to find rare association rules. Section 4.4 discusses some of the algorithms focusing on rare pattern mining using tree based approaches [Tsang et al. 2011; Gupta et al. 2012; Kiran and Krishna Reddy 2012; Lavergne et al. 2012].

4.1. With Variable Support Threshold
In this approach, the minimum support threshold is lowered using different techniques to allow some rare items to be included in rule generation. Each itemset is given a different minimum support threshold. The support of a rare itemset would be lower than that of frequent itemsets, so the minimum support threshold for the rare itemset is set lower to allow the itemset to be considered.

For example, suppose sales of an espresso machine appear in 1% of a department store's transactions, sales of a coffee grinder appear in about 1.2%, and the two items appear together in 0.8% of transactions, so that the rule espresso machine → coffee grinder holds with a confidence of 80%. If the minimum support threshold were set to 1%, we would not be able to detect this potentially interesting rule. Under a variable support threshold, each item, espresso machine or coffee grinder, may instead be given its own minimum support threshold. For example, if we set the minimum support threshold for coffee grinder to 0.5% and the minimum support threshold for espresso machine to 0.8%, then the itemset {espresso machine, coffee grinder} will be considered for rule generation. Another approach uses two different thresholds, considering items that fall below a given support threshold; using this variation we can also find the rule espresso machine → coffee grinder, as both items in the rule would be considered during rule generation.

4.1.1. Multiple Minimum Support Thresholds (MSApriori). Liu et al. [1999a] deal with the rare item problem by using multiple minimum support thresholds. In their research premise, they note that some individual items can have such low support that they cannot contribute to rules generated by Apriori, even though they may participate in rules that have very high confidence. They overcome this problem with a technique whereby each item in the database can have its own minimum item support (MIS). By providing a different MIS for different items, a higher minimum support can be set for rules that involve frequent items and a lower minimum support for rules that involve less frequent items. The minimum support of an itemset is the lowest MIS among those items in the itemset. For example, let MIS(i) denote the MIS value of item i. The minimum support of a rule R is the lowest MIS value of items in the rule. A rule R: AB → C satisfies its minimum support if the rule has an actual support greater than or equal to min(MIS(A), MIS(B), MIS(C)).

However, consider four items in a dataset, A, B, C, and D, with MIS(A) = 10%, MIS(B) = 20%, MIS(C) = 5%, and MIS(D) = 4%. If we find that {A,B} has 9% support at the second iteration, then it does not satisfy min(MIS(A), MIS(B)) and is discarded, and the potentially interesting itemsets {A,B,C} and {A,B,D} will not be generated in the next iteration. By sorting the items in ascending order of their MIS values, the minimum support of an itemset never decreases as the length of the itemset grows, making the support a more general support constraint. In general, it means that a frequent itemset is only extended with an item having a higher (or equal) MIS value. The MIS for each data item i is generated by first specifying LS (the lowest allowable minimum support) and a value β, 0 ≤ β ≤ 1.0. MIS(i) is then set according to the following formula:

MIS(i) = max(β · sup(i, D), LS)

The advantage of the MSApriori algorithm is that it has the capability of finding some rare-itemset rules. However, the actual criterion of discovery is determined by the user's value of β rather than the frequency of each data item.
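A minimal sketch of the MIS assignment and the rule-level check, in Python (ours; the single-item supports and parameter values are illustrative, and the full MSApriori candidate generation over MIS-sorted items is omitted):

```python
def mis(item_support, beta, ls):
    """Minimum item support: MIS(i) = max(beta * sup(i, D), LS)."""
    return max(beta * item_support, ls)

def rule_minsup(items, supports, beta, ls):
    """The minimum support of a rule is the lowest MIS of its items."""
    return min(mis(supports[i], beta, ls) for i in items)

# Illustrative single-item supports.
supports = {"A": 0.30, "B": 0.25, "C": 0.01}
beta, ls = 0.5, 0.005
# Rule AB -> C must reach min(MIS(A), MIS(B), MIS(C)) = MIS(C) = 0.005.
threshold = rule_minsup(["A", "B", "C"], supports, beta, ls)
actual_support = 0.008  # assumed support of itemset {A, B, C}
print(actual_support >= threshold)  # True: the rare-item rule survives
```

Under a single uniform minsup of, say, 0.05, the same rule would have been pruned at the first level because item C alone falls below the threshold.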


4.1.2. Relative Support Apriori (RSAA). Determining the optimal value for β can be tedious, especially in a database with many items where manual assignment is not feasible. Yun et al. [2003] therefore proposed the RSAA algorithm to generate rules in which significant rare itemsets take part, without any user-defined thresholds. This technique uses relative support: for any dataset, with the support of item i represented as sup(i, D), relative support (RSup) is defined as:

RSup({i1, i2, . . . , ik}, D) = sup({i1, i2, . . . , ik}, D) / min(sup(i1, D), sup(i2, D), . . . , sup(ik, D))

Thus this algorithm increases the support threshold for items that have low frequency and decreases the support threshold for items that have high frequency.
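The relative support measure itself is straightforward to compute; the self-contained Python sketch below (ours; the toy transactions are illustrative) shows how an itemset whose rarest member almost always co-occurs with the others obtains a high relative support even when its absolute support is low:

```python
def support(X, transactions):
    """Fraction of transactions containing every item of X."""
    return sum(1 for t in transactions if set(X) <= t) / len(transactions)

def rsup(items, transactions):
    """Relative support: itemset support divided by the smallest
    support of any single item it contains."""
    return support(items, transactions) / min(support({i}, transactions) for i in items)

# Item y is relatively infrequent here, but whenever it occurs it co-occurs
# with x, so the itemset's relative support is 1.0 despite a low absolute support.
transactions = [{"x", "y"}, {"x", "y"}, {"x"}, {"z"}] * 5
print(rsup({"x", "y"}, transactions))  # 0.5 / 0.5 = 1.0
```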

Using a non-uniform minimum support threshold leads to the problem of choosing a suitable minimum support threshold for a particular itemset, since each item within the itemset may have a different minimum support threshold. MSApriori and RSAA sort the items within the itemset in non-decreasing order of support. Here the support of a particular itemset never increases and the minimum support threshold never decreases as the itemset grows.

4.1.3. Adaptive Apriori. Wang et al. [2003] proposed Adaptive Apriori, which has a variable minimum support threshold. Adaptive Apriori introduces the notion of support constraints (SC) as a way to specify general constraints on minimum support. In particular, they associate a support constraint with each of the itemsets. They consider support constraints of the form SCi(B1, . . . , Bs) ≥ θi, where s ≥ 0. Each Bj, called a bin, is a set of items that need not be distinguished by the specification of minimum support. θi is a minimum support in the range [0 . . . 1], or a function that produces a minimum support. If more than one constraint is applicable to an itemset, the constraint specifying the lowest minimum support is chosen. For example, given SC1(B1, B3) ≥ 0.2, SC2(B3) ≥ 0.4, SC3(B2) ≥ 0.5, and SC0() ≥ 0.9, if an itemset contains {B1, B2, B3}, the minimum support used is 0.2. However, if the itemset only contains {B2, B3} then the minimum support is 0.4.

The key idea of this approach is to push the support constraint following the dependency chain of itemsets in the itemset generation. For example, suppose we want to generate itemset {B0B1B2}, which uses SC3, i.e. 0.5. {B0B1B2} is generated by using {B0B1} with SC0 and {B1B2} with SC3. This requires the minsup of {B0B1B2}, which is 0.5, to be pushed down to {B0B1}, and then pushed down to {B0} and {B1}. The pushed minimum support is 0.5, which is lower than the specified minsup for {B0B1}, {B0}, or {B1}, which is 0.9. The pushed minimum support of each itemset is forced to be equal to the support value corresponding to the longest itemset.
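As a rough illustration of how the lowest applicable support constraint is selected for an itemset, the Python sketch below (ours; it covers only the constraint-lookup step, not the constraint pushing itself, and the bin names and thresholds simply mirror the example above) picks the minimum θ among the constraints whose bins all appear in the itemset:

```python
def applicable_minsup(itemset_bins, constraints, default):
    """Pick the lowest minimum support among constraints whose bins are
    all contained in the itemset's bins; fall back to the default SC0()."""
    thresholds = [theta for bins, theta in constraints if bins <= itemset_bins]
    return min(thresholds) if thresholds else default

# Constraints from the example: SC1(B1, B3) >= 0.2, SC2(B3) >= 0.4, SC3(B2) >= 0.5.
constraints = [({"B1", "B3"}, 0.2), ({"B3"}, 0.4), ({"B2"}, 0.5)]
default = 0.9  # SC0()

print(applicable_minsup({"B1", "B2", "B3"}, constraints, default))  # 0.2
print(applicable_minsup({"B2", "B3"}, constraints, default))        # 0.4
```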

4.1.4. Weighted Association Rule Mining (WARM). We can determine the minimum support threshold of each itemset by using a weighted support measurement. Each item or itemset is assigned a weight based on its significance; itemsets that are considered interesting are assigned a larger weight. Weighted association rule mining (WARM) [Tao et al. 2003] is based on a weighted support measurement with a weighted downward closure property. They propose two types of weights: item weight and itemset weight. Item weight w(i) is assigned to an item to represent its significance, whereas itemset weight w(X) is the mean of the item weights. The goal of using weighted support is to make use of the weight in the mining process and prioritise the selection of targeted itemsets according to their significance in the dataset, rather than by their frequency alone. The weighted support of an itemset can be defined as the product of the total weight of the itemset (the sum of the weights of its items) and the fraction of transactions in which the itemset occurs. In WARM, itemsets are no longer simply counted as they appear in a transaction. The change in the counting mechanism makes it necessary to adapt the traditional support to a weighted support. An itemset is significant if its weighted support is above a pre-defined minimum weighted support threshold. Tao et al. [2003] also proposed a weighted downward closure property, as the adjusted support values violate the original downward closure property in Apriori. The rules generated in this approach rely heavily on the weights used. Thus to ensure the results generated are useful, we have to determine a way to assign the item weights effectively.
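A hedged sketch of the weighted support idea follows (ours; here the itemset weight is taken as the mean of the item weights, one of the two weightings mentioned above, and the weights, items, and transactions are illustrative):

```python
def support(X, transactions):
    """Fraction of transactions containing every item of X."""
    return sum(1 for t in transactions if set(X) <= t) / len(transactions)

def weighted_support(items, weights, transactions):
    """Weighted support: itemset weight (mean of item weights) times the
    fraction of transactions in which the itemset occurs."""
    itemset_weight = sum(weights[i] for i in items) / len(items)
    return itemset_weight * support(items, transactions)

# Illustrative weights: the rarely sold 'defibrillator' is marked as significant.
weights = {"bandage": 0.1, "defibrillator": 0.9}
transactions = [{"bandage"}, {"bandage", "defibrillator"}, {"bandage"}, {"bandage"}]
# The rare but heavily weighted pair outranks the frequent low-weight item.
print(weighted_support({"bandage", "defibrillator"}, weights, transactions))  # 0.5 * 0.25 = 0.125
print(weighted_support({"bandage"}, weights, transactions))                   # 0.1 * 1.0  = 0.1
```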

4.1.5. LPMiner. The previous approaches vary the minimum support constraint using a particular weighting method based on either the frequency or the significance of the itemsets. Here we discuss a different approach, LPMiner [Seno and Karypis 2001], which also varies the minimum support threshold. It uses a pattern length-decreasing support constraint, which reduces the required support as itemsets grow so that smaller itemsets with higher counts are favoured over larger itemsets with lower counts. Seno and Karypis propose a support threshold that decreases as a function of itemset length. A frequent itemset that satisfies the length-decreasing support constraint can be frequent even if its subsets are infrequent; hence the downward closure property does not hold. To overcome this problem, they developed a property called smallest valid extension (SVE): for an infrequent itemset to be considered, it must be extendable to at least a minimum pattern length before it can potentially become frequent. Exploiting this pruning property, Seno and Karypis propose LPMiner based on the FP-tree algorithm [Han et al. 2000]. This approach favours smaller itemsets; however, longer itemsets could be interesting, even if they are less frequent. In order to find longer itemsets, one would have to lower the support threshold, which would lead to an explosion in the number of short itemsets found.
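A minimal sketch of what a length-decreasing support constraint might look like (ours; LPMiner does not prescribe this particular function, so the linear decay and its parameters below are purely illustrative):

```python
def length_decreasing_minsup(length, start=0.10, floor=0.01, decay=0.02):
    """Required support shrinks as the itemset gets longer,
    but never drops below an absolute floor."""
    return max(start - decay * (length - 1), floor)

# Short itemsets must be common; long itemsets are allowed to be rare.
for k in (1, 2, 4, 8):
    print(k, length_decreasing_minsup(k))
# 1 -> 0.10, 2 -> 0.08, 4 -> 0.04, 8 -> 0.01 (floor)
```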

4.1.6. Apriori-Inverse. Apriori-Inverse, proposed by Koh and Rountree [2005], is used to mine rare rules, which they call perfectly sporadic rules: rules built from itemsets that consist only of items below a maximum support threshold (maxSup). Apriori-Inverse is similar to Apriori, except that at initialisation only 1-itemsets that fall below maxSup are used for generating 2-itemsets. Since Apriori-Inverse inverts the downward-closure property of Apriori, all rare itemsets generated must have support below maxSup. In addition, itemsets must also meet an absolute minimum support in order to be used for candidate generation.

Perfectly sporadic rules consist only of items falling below the maxSup threshold; they may not have any subsets with support above maxSup. They define perfectly sporadic rules as follows: A → B in transaction database D is perfectly sporadic for maxSup s and minconf c if and only if:

conf(A → B, D) ≥ c, and ∀x ∈ AB : sup(x, D) < s

That is, support must be under maxSup, confidence must be at least minconf, and no member of the set AB may have support above maxSup. Perfectly sporadic rules thus consist of antecedents and consequents that occur rarely (that is, less often than maxSup) but, when they do occur, tend to occur together (with at least minconf confidence). For instance, suppose we had an itemset AB with sup(A, D) = 0.12, sup(B, D) = 0.12, and sup(AB, D) = 0.12, with maxSup = 0.12 and minconf = 0.75. Both A → B (confidence = 1.0) and B → A (confidence = 1.0) are perfectly sporadic in that they have low support and high confidence.
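A small Python check of the perfectly sporadic condition (ours; it takes the item and itemset supports as given rather than scanning a database, and the maxSup value is nudged slightly above 0.12 because the definition uses a strict < comparison):

```python
def perfectly_sporadic(sup_a, sup_b, sup_ab, maxsup, minconf):
    """Checks whether A -> B is perfectly sporadic: confidence >= minconf
    and every member of AB (here both A and B) has support below maxsup."""
    confidence = sup_ab / sup_a
    return confidence >= minconf and sup_a < maxsup and sup_b < maxsup

# In the spirit of the example above: A -> B has confidence 1.0 and both items are rare.
print(perfectly_sporadic(0.12, 0.12, 0.12, 0.125, 0.75))   # True
# With sup(B) = 0.16 (the case discussed next) the rule is sporadic but not perfectly so.
print(perfectly_sporadic(0.12, 0.16, 0.12, 0.125, 0.75))   # False
```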

While this is a useful definition of a particularly interesting type of rule, it certainly does not cover all cases of rules that have support lower than maxSup. For instance, suppose we had an itemset AB with sup(A, D) = 0.12, sup(B, D) = 0.16, and sup(AB, D) = 0.12, with maxSup = 0.12 and minconf = 0.75. Both A → B (confidence = 1.0) and B → A (confidence = 0.75) are sporadic in that they have low support and high confidence, but neither is perfectly sporadic, due to B's support being too high. Since the set of perfectly sporadic rules may only be a small subset of rare itemsets, Koh et al. also proposed several modifications that allow Apriori-Inverse to find near-perfect rare itemsets. The methods are based on increasing maxSup during itemset generation, but using the original maxSup during rule generation.

4.1.7. A Rare Itemset Miner Algorithm (ARIMA). Szathmary et al. [2007] proposed two algorithms that can be used together to mine rare itemsets. As part of those two algorithms, Szathmary et al. define three types of itemsets: minimal generators (MG), which are itemsets with a lower support than their subsets; minimal rare itemsets (mRI), which are itemsets with non-zero support and whose subsets are all frequent; and minimal zero generators (MZG), which are itemsets with zero support and whose subsets all have non-zero support.

Their approach mines rare itemsets using three algorithms: Apriori-Rare, MRG-Exp, and ARIMA. Apriori-Rare is a modification of the Apriori algorithm used to mine frequent itemsets; it retains the set of all minimal rare itemsets, which correspond to the itemsets usually pruned by Apriori when seeking frequent itemsets. The MRG-Exp algorithm finds all mRIs by using MGs for candidate generation in each layer in a bottom-up fashion. The mRIs represent a border that separates the frequent and rare itemsets in the search space: all itemsets above this border must be rare according to the anti-monotonic property. ARIMA takes as input the set of mRIs and generates the set of all rare itemsets, split into the rare itemsets with zero support and the rare itemsets with non-zero support. Rare itemsets are generated by growing each itemset in the mRI set; if an itemset is rare, then any extension of that itemset is also rare. To distinguish between zero and non-zero support itemsets, the support of each extended itemset is calculated. Essentially, ARIMA generates the complete set of rare itemsets by merging two k-itemsets with k − 1 items in common into a (k + 1)-itemset. ARIMA stops the search for non-zero rare itemsets when the MZG border is reached, above which there are only zero-support rare itemsets.
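The expansion step can be pictured with the following rough sketch, which grows one item at a time rather than merging k-itemsets as ARIMA does; the data layout and helper names are assumptions made purely for illustration.

```python
# Rough sketch of the expansion idea: every extension of a rare itemset is
# itself rare, so rare itemsets can be enumerated by growing the minimal rare
# itemsets (mRIs) and computing each extension's support to separate zero-
# from non-zero-support itemsets.

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def expand_rare_itemsets(mris, items, transactions):
    rare_nonzero, rare_zero = set(), set()
    frontier = {frozenset(m) for m in mris}
    while frontier:
        next_frontier = set()
        for itemset in frontier:
            count = support_count(itemset, transactions)
            if count > 0:
                rare_nonzero.add(itemset)
                # only non-zero itemsets are grown further, mirroring the stop
                # at the MZG border for non-zero rare itemsets
                for item in set(items) - itemset:
                    next_frontier.add(itemset | {item})
            else:
                rare_zero.add(itemset)
        frontier = next_frontier
    return rare_nonzero, rare_zero
```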

4.1.8. Apriori Based Framework for Rare Itemset Mining (AfRIM). Adda et al. [2007] proposed AfRIM, which uses a top-down approach. Rare itemset search in AfRIM begins with the itemset that contains all items found in the database. Candidate generation occurs by finding common k-itemset subsets between all combinations of rare (k + 1)-itemset pairs in the previous level. Candidates are pruned in a similar way to the Rarity algorithm, and those that remain after pruning are rare itemsets. Note that AfRIM traverses the search space from the very top and examines itemsets that have zero support, which may be inefficient.

4.1.9. Rarity. Troiano et al. [2009] note that rare itemsets are at the top of the search space, as in AfRIM, so bottom-up algorithms must first search through many layers of frequent itemsets. To avoid this, Troiano et al. proposed the Rarity algorithm, which begins by identifying the longest transaction within the database and uses it to perform a top-down search for rare itemsets, thereby avoiding the lower layers that contain only frequent itemsets. In Rarity, potentially rare itemsets (candidates) are pruned in two ways. Firstly, all k-itemset candidates that are subsets of any frequent (k + 1)-itemset are removed, since they must be frequent according to the downward closure property. Secondly, the remaining candidates have their supports calculated, and only those with support below the threshold are used to generate the (k − 1)-candidates; the candidates with supports above the threshold are used to prune (k − 1)-candidates in the next iteration.


4.1.10. Frequent Rare Itemset Mining Algorithm (FRIMA). FRIMA, proposed by Hoque et al. [2013], uses a bottom-up approach and finds both frequent and rare itemsets based on the downward closure property of Apriori. FRIMA takes a dataset and minsup as input and outputs the frequent and rare itemsets. It initially scans the dataset once and finds all frequent and rare single itemsets. Based on the support count, it categorizes the single itemsets as zero itemsets (support count of zero), frequent itemsets (support count greater than minsup), and rare itemsets (support count less than minsup). The algorithm then generates three candidate lists: the first from the frequent single itemsets, the second from the rare single itemsets, and the third by combining frequent and rare itemsets. These three lists are combined into a single list, and one database scan is made to find the zero, frequent, and rare itemsets of size 2. The frequent and rare 1- and 2-itemsets are maintained in a dynamic structure and used to generate frequent and rare itemsets of size greater than 2.

4.1.11. Mining High Utility Rare Itemsets (HURI). The High Utility Rare Itemset Mining (HURI) algorithm was proposed by Pillai et al. [2013] to generate high utility rare itemsets of users' interest. HURI is a two-phase algorithm: Phase 1 generates rare itemsets and Phase 2 generates high utility rare itemsets, according to users' interest. In the first phase, rare itemsets are generated by considering those itemsets whose support is less than the maximum support threshold, using Apriori-Inverse [Koh and Rountree 2005]. In the second phase, the itemsets found in Phase 1 are pruned based on their utility value: only high utility rare itemsets whose utility exceeds the minimum utility threshold are retained.

4.1.12. Automated Apriori-Rare. Sadhasivam and Angamuthu [2011] proposed Automated Apriori-Rare, a method that adopts both the Apriori and MSApriori algorithms for frequent and rare item generation. It mines items belonging to three different groups: the Most interesting Group (MiG), the Somewhat interesting Group (SiG), and the Rare interesting Group (RiG). MiG and SiG use calculated level-wise automated support thresholds like Apriori, whereas RiG uses item-wise support thresholds like MSApriori. The itemsets whose support is greater than the average support of items at a level are added to the MiG at that level.

The itemsets with support below the average support at a level may turn out to be somewhat interesting, and such itemsets are noted as somewhat interesting itemsets. The threshold used for filtering the SiG is known as MedianSup. Rare interesting items are items with low support and high confidence; itemsets whose support is below both AvgSup and MedianSup are considered when searching for interesting rare itemsets. The transactions that contain rare itemsets are separated into a group of rare-item transactions, and the rare itemsets are found by scanning the itemsets in that group. The remaining transactions are used to find the MiGs and SiGs.
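A minimal sketch of how such level-wise automated thresholds might be computed is given below; the exact grouping rules are our reading of the description above rather than the authors' implementation, and the function and variable names are assumptions.

```python
# Hedged sketch of level-wise automated thresholds: AvgSup and MedianSup are
# computed from the supports of the itemsets at one level and used to split
# them into most-interesting, somewhat-interesting, and rare candidates.

from statistics import mean, median

def split_by_automated_thresholds(level_supports):
    """level_supports maps each itemset at this level to its support."""
    avg_sup = mean(level_supports.values())
    median_sup = median(level_supports.values())
    mig = {i for i, s in level_supports.items() if s > avg_sup}
    sig = {i for i, s in level_supports.items() if median_sup <= s <= avg_sup}
    rare_candidates = {i for i, s in level_supports.items() if s < median_sup}
    return mig, sig, rare_candidates
```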

4.1.13. Observation. The approaches in this section vary the support constraint in some fashion to allow some rare items to be included in frequent itemset generation. MSApriori, RSAA, Adaptive Apriori, WARM, ARIMA, FRIMA, and AfRIM all use some form of weighting to adapt or customize the support threshold, whereas LPMiner uses a pattern length-decreasing support constraint, and Apriori-Inverse, HURI, and FRIMA constrain the rules generated using two different support bounds. Like Apriori, MSApriori, RSAA, Adaptive Apriori, WARM, LPMiner, MCCFP-growth, AfRIM, and Automated Apriori-Rare are exhaustive in their generation of rules, and so they spend time looking for rules with high support and high confidence. If the varied minimum support value is set close to zero, they take a similar amount of time to Apriori to generate low-support rules in amongst the high-support rules. These methods generate all rules that have high confidence and high support. In order to include rare items, the minsup threshold must be lowered, which consequently generates an enormous set of rules consisting of both frequent and infrequent items.

4.2. Without Support Threshold

The previous section discussed approaches that use a variable support threshold to include some rare items in rule generation. But to ensure that every rare item is considered, the minimum support threshold must still be pushed low, resulting in a combinatorial explosion in the number of rules generated. To overcome this problem, some researchers have proposed removing the support-based threshold altogether and instead using another constraint, such as similarity or confidence-based pruning.

Consider the rule espresso machine → coffee grinder with confidence 80%, and two further rules: coffee tamper → coffee grinder with confidence 80% and coffee tamper, espresso machine → coffee grinder with confidence 90%. Under techniques that drop the minimum support constraint and rely on confidence pruning, both of these rules would be valid, since the more specific rule improves on the confidence of the simpler one. However, suppose we instead obtained decalcifying tablets → coffee grinder with confidence 80% and decalcifying tablets, espresso machine → coffee grinder with confidence 70%. The second rule would not meet the confidence constraint because its confidence is lower than that of the first rule, so it provides no additional information.

4.2.1. Min-Hashing and its variations. Variations of the Min-Hashing technique were introduced by Cohen et al. [2001] to mine significant rules without any constraint on support. Transactions are stored as a 0/1 matrix with as many columns as there are unique items. Rather than searching for pairs of columns that have high support or high confidence, Cohen et al. [2001] search for columns that have high similarity, where similarity is defined as the fraction of rows that have a 1 in both columns out of those that have a 1 in either column. This is also known as latent semantic indexing in Information Retrieval. Although this is easy to do by brute force when the matrix fits into main memory, it is time-consuming when the matrix is disc-resident. Their solution is to compute a hashing signature for each column of the matrix in such a way that the probability that two columns have the same signature is proportional to their similarity. After signatures are calculated, candidate pairs are generated and then checked against the original matrix to ensure that they do indeed have strong similarity. It should be noted that the hashing solution will produce many rules that have high support and high confidence, since only a minimum acceptable similarity is specified. It is not clear whether the method will extend to rules that contain more than two or three items, since (m choose k) checks for similarity must be done, where m is the number of unique items in the set of transactions and k is the number of items that might appear in any one rule.
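The signature idea can be sketched as follows. The permutation-based hashing, the parameter values, and the use of full random permutations (rather than the cheaper hash families used in practice) are simplifying assumptions for illustration.

```python
# Illustrative MinHash signatures: the fraction of positions on which two
# column signatures agree approximates the columns' similarity
# |rows with 1 in both| / |rows with 1 in either|.

import random

def minhash_signature(column_rows, num_hashes=100, num_rows=1000, seed=0):
    """column_rows: set of row indices where this column has a 1.
    Columns to be compared must use the same seed and num_rows so that the
    same permutations are applied to each of them."""
    rng = random.Random(seed)
    perms = [rng.sample(range(num_rows), num_rows) for _ in range(num_hashes)]
    return [min(perm[r] for r in column_rows) for perm in perms]

def estimated_similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two columns that share most of their rows get a high estimated similarity.
sig_x = minhash_signature({1, 5, 9, 42})
sig_y = minhash_signature({1, 5, 9, 77})
print(estimated_similarity(sig_x, sig_y))
```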

Removing the support requirement entirely is an elegant solution, but it comes at a high cost in space: for n transactions containing an average of k items over m possible items, the matrix requires n × m bits, whereas the primary data structure for Apriori-based algorithms requires n × log2 m × k bits. Note that itemsets with low similarity may still produce interesting rules. For example, suppose dataset D contains items A, B, and C with sup(A, D) = 0.50, sup(B, D) = 0.50, and sup(C, D) = 0.50. If sup(A¬BC, D) = 0.25 and sup(¬ABC, D) = 0.25, then the rules R1 : A → B and R2 : B → C should be interesting, yet no pair of items is strongly similar to another.


4.2.2. Confidence-Based Pruning. Another constraint, known as confidence-based pruning, was proposed by Wang et al. [2001]. It finds all rules that satisfy a minimum confidence, but not necessarily a minimum support threshold; rules that satisfy this requirement are called confident rules. The problem with mining confident rules is that, unlike support, confidence does not have a downward closure property. Wang et al. [2001] therefore proposed a confidence-based pruning strategy that uses the confidence requirement during rule generation. Given three rules R1 : A → B, R2 : AC → B, and R3 : AD → B, the rules R2 and R3 are two specialisations of R1, having additional items C and D. C and D are exclusive and exhaustive in the sense that exactly one of them holds in each itemset, but they never appear together in the same itemset. The confidence of at least one of R2 and R3 must then be greater than or equal to that of R1, so we can prune R1 if neither R2 nor R3 is confident. This yields a universal-existential upward closure: if a rule of size k is above the given minimum confidence threshold, then for every other attribute not in the rule (C and D in the given example), some specialisation of size k + 1 using that attribute must also be confident. They exploit this property to generate rules without using any support constraint.

4.2.3. H-Confidence. Xiong et al. [2003] improve on the previous confidence-based pruning method by proposing the h-confidence measure to mine hyperclique patterns. A hyperclique pattern is a type of association containing objects that are highly affiliated with each other; that is, every pair of objects in a hyperclique pattern is guaranteed to have a cosine similarity (uncentered correlation coefficient) above a certain level. They show that h-confidence has a cross-support property, which is useful for eliminating candidate patterns containing items with widely different supports. The h-confidence of an itemset P = {i1, i2, . . . , im} in a database D, denoted hconf(P, D), is a measure that reflects the overall affinity among the items within the itemset.
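For reference, h-confidence is commonly written as follows (our restatement of the definition in the notation above; see Xiong et al. [2003] for the formal statement):

hconf(P, D) = sup(P, D) / max{sup({i1}, D), . . . , sup({im}, D)} = min{conf({ik} → P \ {ik}, D) : 1 ≤ k ≤ m}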

A hyperclique pattern P is a strong-affinity association pattern because the presence of any item x ∈ P in a transaction strongly implies the presence of P\{x} in the same transaction¹. To that end, the h-confidence measure is designed specifically to capture such strong affinity relationships. Nevertheless, even when hyperclique patterns are included in rule generation, interesting patterns can still be missed. For example, an itemset {A, B, C} that produces the low confidence rules A → BC, B → AC, and C → AB, but a high confidence rule AB → C, would never be identified.

4.2.4. Minimally Infrequent Itemset (MINIT). Haglin and Manning [2007] designed an algorithm called MINIT (Minimally Infrequent Itemset) for mining minimal infrequent itemsets. The algorithm works by sorting and ranking items based on support: initially, a ranking is prepared by computing the support of each item and creating a list of items in ascending order of support. At each step, the items with support less than the minimum support are chosen, and only the transactions containing those items are selected for further processing. Minimal τ-infrequent itemsets are discovered by considering each item in rank order, recursively calling MINIT on the support set of the dataset with respect to that item (considering only items of higher rank), and then checking each candidate minimal infrequent itemset against the original dataset. To consider only higher-ranking items in the recursion, a vector is maintained indicating which items remain feasible at each level of the recursion.

4.2.5. Observation. Similar to the approaches in the previous section, these algorithms suffer from the drawback of generating all the frequent rules as well as the rare rules.

¹ The \ denotes exclusion, where P\{x} returns the pattern P excluding {x}.


In most of these approaches, a post-pruning method is needed to filter out the frequent or trivial rules produced.

4.3. Consequent Constraint-Based Approach

Using a variable support threshold or no support threshold generates frequent rules as well as rare rules. Here we discuss an approach that tries to generate only rare rules. The research below imposes a restriction called consequent constraint-based rule mining, in which an item constraint requires mined rules to satisfy a given constraint on the consequent. In this section we discuss five slightly different methods that use this approach: Dense-Miner, DS (Direction Setting) rules, EP (Emerging Pattern), Fixed-Consequent ARM (Association Rule Mining), and MIISR (Mining Interesting Imperfectly Sporadic Rules).

With this technique, we detect rules whose consequent items are predetermined. For example, given a medical dataset we may wish to detect symptoms that lead to a rare disease such as meningitis, which appears in less than 1.38 × 10−4% of the population. We set the consequent of the rule to Meningitis and then produce rules that fulfil this constraint; the items in the antecedent may be rare or frequent. In this example, we obtain the rule stiff neck, rash, headaches, aversion to light → Meningitis. In this particular rule, we can expect the symptoms stiff neck, rash, and headaches to occur relatively frequently in the dataset, whereas aversion to light is a less common symptom. Together, all four symptoms point to meningitis.

4.3.1. Dense-Miner. Bayardo et al. [2000] noted that the candidate frequent itemsets generated in dense data are too numerous, even when using an item constraint. A dense dataset has many frequently occurring items, strong correlations between several items, and many items in each record. Bayardo et al. [2000] therefore use a consequent constraint-based rule mining approach called Dense-Miner, which requires mined rules to have a given consequent C specified by the user.

They also introduce an additional metric called improvement. The key idea is to extract rules whose confidence exceeds, by at least a minimum improvement value, the confidence of any simplification of the rule. A simplification of a rule is formed by removing one or more items from its antecedent. Any positive minimum improvement value prevents unnecessarily complex rules from being generated; a rule is considered overly complex if simplifying its antecedent results in a rule with higher confidence. The improvement of a rule A → C is defined as the minimum difference between its confidence and the confidence of any proper sub-rule with the same consequent:

improvement(A → C, D) = conf(A → C, D) − max{conf(A′ → C, D) | A′ ⊂ A}

If the improvement of a rule is greater than 0, then removing any non-empty combination of items from the antecedent will lower the confidence by at least the improvement. Thus every item, and every combination of items, present in the antecedent of a rule with a large improvement is an important contributor to its predictive ability. In contrast, a negative improvement is undesirable, as it suggests that the extra elements in the antecedent detract from the rule's predictive power.
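As a worked illustration, the following sketch computes improvement directly from a small set of transactions. The helper functions and the toy data are assumptions made for the example, not part of Dense-Miner.

```python
# Improvement of A -> C: conf(A -> C) minus the best confidence achieved by
# any proper sub-rule A' -> C with A' a proper subset of A (including the
# empty antecedent).

from itertools import combinations

def improvement(antecedent, consequent, transactions):
    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def conf(a):
        base = sup(a)
        return sup(a | consequent) / base if base else 0.0

    consequent = frozenset(consequent)
    sub_confs = [conf(frozenset(a))
                 for r in range(len(antecedent))
                 for a in combinations(antecedent, r)]
    return conf(frozenset(antecedent)) - max(sub_confs)

D = [{"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b"}]
print(improvement({"a", "b"}, {"c"}, D))  # negative here: {a} -> {c} alone is more confident
```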

In their approach, rules with consequent C must meet the user-specified minimum support, confidence, and improvement thresholds. A rule is considered to have a large improvement if its improvement is at least the specified minimum improvement. Dense-Miner's advantage over previous systems is that it can exploit rule constraints to efficiently mine consequent-constrained rules at low support.


4.3.2. Direction Setting. Liu et al. [1999b] proposed a rule pruning technique similar to Dense-Miner. Their approach uses a chi-square significance test to prune out spurious and insignificant rules that appear by coincidence rather than by genuine association. They also introduce direction setting (DS) rules. If the antecedent and consequent of a rule are positively associated, the direction is 1; if they are negatively associated, the direction is −1; and if they are independent, the direction is 0. The expected directions for R : AB → C are the directions of R1 : A → C and R2 : B → C. If R1 and R2 both have direction −1 but R has direction 1, then R is considered surprising. The rule R is a direction setting rule if it has a positive association (direction of 1) and its direction is not an element of the set of expected directions; that is, for R to be a DS rule, neither R1 nor R2 may have direction 1.

4.3.3. Emerging Pattern. The Emerging Pattern (EP) method was proposed by Li et al. [1999]. Given a known consequent C, a dataset partitioning approach is used to find top rules, zero-confidence rules, and µ-level confidence rules. The dataset D is divided into sub-datasets D1 and D2, where D1 consists of the transactions containing the known consequent and D2 consists of the transactions that do not. All items in C are then removed from the transactions in D1 and D2. Using the transformed dataset, EP finds all itemsets X that occur in D1 but not in D2; for each such X, the rule X → C is a top rule in D with confidence 100%. On the other hand, for any itemset Z that occurs only in D2, all transactions in D containing Z must not contain C, so Z → C has a negative association and is a zero-confidence rule. For µ-level confidence rules Y → C, the confidence must be greater than or equal to 1 − µ and must satisfy:

sup(Y, D1)|D1| / (sup(Y, D1)|D1| + sup(Y, D2)|D2|) ≥ 1 − µ

Note that sup(Y, D1)|D1| is the number of times the itemset Y ∪ C appears in dataset D, and sup(Y, D1)|D1| + sup(Y, D2)|D2| is the number of times the itemset Y appears in dataset D. This approach is considered efficient as it needs only one pass through the dataset to partition and transform it. Of course, in this method one must supply C.
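The µ-level test can be written as a simple check over the partitioned data; the data representation below is an assumption made for illustration.

```python
# Sketch of the mu-level confidence test from the Emerging Pattern method.
# D1 holds transactions that contained the consequent C (with C removed),
# D2 holds the remaining transactions; Y is a candidate antecedent (a set).

def is_mu_level_rule(Y, D1, D2, mu):
    count_with_c = sum(1 for t in D1 if Y <= t)      # = sup(Y, D1) * |D1|
    count_without_c = sum(1 for t in D2 if Y <= t)   # = sup(Y, D2) * |D2|
    total = count_with_c + count_without_c
    return total > 0 and count_with_c / total >= 1 - mu
```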

4.3.4. Fixed-Consequent Association Rule Mining. Rahal et al. [2004] proposed a slightly different approach that generates the highest-support rules matching a user-specified minimum confidence threshold, without requiring the user to specify any support threshold. Fixed-consequent association rule mining generates confident minimal rules using two kinds of trees [Rahal et al. 2004]. Given two rules R1 and R2 with confidence above the confidence threshold, where R1 is A → C and R2 is AB → C, R1 is preferred because the antecedent of R2 is a superset of the antecedent of R1, and the support of R1 is necessarily greater than or equal to that of R2. R1 is referred to as a minimal rule ("simplest" in the notation of Bayardo et al. [2000]) and R2 as a non-minimal (more complex) rule.

4.3.5. Mining Interesting Imperfectly Sporadic Rules (MIISR). Koh et al. [2006] proposed an approach to find rare rules with candidate itemsets that fall below a maximum support (maxsup) level but above a minimum absolute support value. They had earlier introduced the Apriori-Inverse algorithm to find sporadic rules efficiently, for example a rare association of two common symptoms indicating a rare disease. They later proposed MIISR, in which the consequent of a rule is an item below the maxsup threshold and the antecedent has support below maxsup but may consist of individual items above maxsup. In both approaches, a minimum absolute support (minabssup) threshold derived from an inverted Fisher's exact test is used to prune out noise. At the low levels of co-occurrence that must be evaluated to generate rare rules, there is a possibility that such co-occurrences happen purely by chance and are not statistically significant; Fisher's exact test provides a statistically rigorous method of evaluating the significance of co-occurrences and is thus an integral part of their approach.
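As an illustration of this kind of significance check (not the authors' exact minabssup derivation), a co-occurrence count can be tested with Fisher's exact test on the 2 × 2 contingency table of the two items, here using SciPy:

```python
# Hedged sketch: is an observed co-occurrence of two items statistically
# significant, or could it plausibly have arisen by chance?

from scipy.stats import fisher_exact

def cooccurrence_is_significant(n_ab, n_a, n_b, n_total, alpha=0.01):
    """n_ab: transactions containing both A and B; n_a, n_b: marginal counts."""
    table = [[n_ab, n_a - n_ab],
             [n_b - n_ab, n_total - n_a - n_b + n_ab]]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value < alpha

# e.g. 5 co-occurrences of two items seen 10 and 12 times in 1000 transactions
print(cooccurrence_is_significant(5, 10, 12, 1000))
```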

4.3.6. Observation. These algorithms are only useful when we have prior knowledge that a particular consequent is of interest. Most rare items occur infrequently and may go undetected, which makes this approach unsuitable for generating rare item rules in general, because we want to generate rules without needing prior knowledge of which consequents ought to be interesting.

4.4. Tree Based Approach

All of the above algorithms use the fundamental generate-and-test approach of Apriori, which has potentially expensive candidate generation and pruning steps. In addition, these algorithms attempt to identify all possible rare itemsets and, as a result, require a significant amount of execution time.

The techniques in this section produce rules that correspond to subsets of those produced by the techniques discussed in Sections 4.1 to 4.3. The main difference is that the techniques in this section use a tree based data structure, similar to FP-growth [Han et al. 2000], which removes the need for expensive candidate generation.

4.4.1. Rare Pattern Tree (RP-Tree). The RP-Tree algorithm was proposed by Tsang et al. [2011] as a solution to these issues. RP-Tree avoids the expensive itemset generation and pruning steps by using a tree data structure, based on the FP-Tree, to find rare patterns in a two-pass approach. RP-Tree performs one database scan to count item support. During the second scan, it uses only the transactions that include at least one rare item to build the initial tree and prunes the others, since transactions that contain only non-rare items cannot contribute to the support of any rare-item itemset.
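The second-pass filter can be sketched as follows; the rare-item definition with a noise floor is an assumption used only to make the example self-contained.

```python
# Minimal sketch of RP-Tree's second-pass filter: only transactions containing
# at least one rare item are inserted into the tree.

def rare_items(item_counts, n_transactions, rare_threshold, noise_floor=0.0):
    """Items whose support lies between a noise floor and the rare threshold."""
    return {i for i, c in item_counts.items()
            if noise_floor < c / n_transactions < rare_threshold}

def transactions_for_tree(transactions, rare):
    # transactions with no rare item cannot support any rare-item itemset
    return [t for t in transactions if t & rare]
```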

The RP-Tree algorithm improves over the existing algorithms in three ways. Firstly, it avoids the expensive itemset generation and pruning steps by using a tree data structure, based on the FP-Tree, to find rare patterns. Secondly, it focuses on rare-item itemsets, which generate interesting rules, and does not spend time looking for uninteresting non-rare-item itemsets. Thirdly, it is based on FP-Growth, which is efficient at finding long patterns since the task is divided into a series of searches for short patterns; this is especially beneficial because rare patterns tend to be longer than frequent patterns. As RP-Tree uses a two-pass approach, it is not suitable in a data stream environment. Moreover, RP-Tree is designed to generate rules from rare-item itemsets, that is, itemsets consisting either of only rare items or of both rare and frequent items. It thus generates only a specific type of rare rule, and misses rules generated from itemsets that are rare overall but consist of individually frequent items.

4.4.2. Inverse FP-Tree (IFP-Tree). The Inverse FP-tree (IFP-tree) [Gupta et al. 2012] is a variant of the FP-tree. The technique mines minimally infrequent itemsets (MIIs), extending MINIT [Haglin and Manning 2007]. An itemset is an MII if and only if it is infrequent and all its subsets are frequent. The algorithm produces the same type of rules as MINIT, but the data structure used is different. The IFP-min algorithm, which uses the IFP-tree, recursively mines minimally infrequent itemsets by dividing the IFP-tree into two sub-trees: a projected tree and a residual tree. The projected database corresponds to the set of transactions that contain a particular item; a potential minimal infrequent itemset mined from the projected tree must not have any infrequent subset. The itemset itself is such a subset, since it is actually the union with the item of the projected tree under consideration. The IFP-min algorithm considers the projected tree of only the least frequent item (lf-item). A residual tree for a particular item is a tree representation of the residual database for that item, i.e., the entire database with the item removed. As with the projected tree, IFP-min considers the residual tree of only the lf-item. The use of residual trees reduces the computation time. It was shown that the IFP-tree is more suitable for large dense datasets, whereas it does not work as well as MINIT on small dense datasets.

4.4.3. Maximum Constraint based Conditional Frequent Pattern-growth (MCCFP). Another multiple minimum support framework, the maximum constraint model, was proposed by Kiran and Krishna Reddy [2012]. In this work, each item has a specified support constraint, called its minimum item support (MIS). The minsup of a pattern is then the maximal MIS value over all items contained in the pattern, so each pattern can satisfy a different minsup depending on the items within it. Their model builds on an Apriori-like approach with granular computation of bit strings for frequent pattern mining. Their proposed approach, Maximum Constraint based Conditional Frequent Pattern-growth (MCCFP-growth), utilizes user-defined prior knowledge of the items' MIS values to generate frequent patterns from the transactional database in a single pass.

MCCFP-growth is an FP-growth-like approach. The initial tree is constructed from all items in the transaction dataset, and multiple minsups are used to discover the complete set of frequent patterns with a single scan of the dataset. MCCFP-growth constructs the tree with items in descending order of their MIS values, and therefore requires relatively more memory. Despite this, it is still more efficient than Apriori-based approaches because it avoids the combinatorial explosion of candidate generation and multiple scans of the transactional dataset.

4.4.4. Targeted Rare Association Rule Mining Using Itemset Trees and the Relative Support Measure (TRARM-RelSup). Lavergne et al. [2012] proposed a targeted association mining approach to rare rule mining using the itemset tree data structure (TRARM-RelSup). The algorithm combines the efficiency of targeted association mining queries with the capabilities of rare rule mining, resulting in more focused frequent and rare rules for the user while keeping the complexity manageable. TRARM-RelSup combines the efficiency of the itemset tree for targeted association mining with the use of relative support to efficiently find targeted rare association rules. Targeted association mining discovers rules based on the interests of users and can therefore generate more focused results; the itemset tree data structure was developed to facilitate this targeted rule mining process.

4.4.5. Observation. Tree based approaches are more efficient than the other types of approaches because they do not require the candidate generation phase used elsewhere. However, building the tree for rare pattern mining is more complicated.

4.5. Key Differences

In this section we highlight the key differences between the approaches, looking at the advantages of each approach compared to the others.


— With Variable Support Threshold. Most of the techniques in this approach produce a more general set of rules than the techniques in the other approaches (without support threshold and consequent constraint-based). They often produce a large number of rules compared to the other two approaches, which works well for general exploratory mining: the large number of rules provides broader coverage of the information that can be found in the dataset.

— Without Support Threshold. This approach normally adopts specific additional pruning constraints, such as confidence-based pruning. With these additional constraints, the rules generated are normally more actionable than the more generalized rules of the previous approach; the techniques in this approach use particular constraints to prune rules deemed too trivial or that provide no additional information.

— Consequent Constraint-Based Approach. This approach works well when the users have prior knowledge of the rare items to target, which makes it more efficient in terms of time complexity since it produces only a specific set of rules.

— Tree Based Approach. The biggest advantage of the tree based approach is that it is a variant of FP-Tree approaches, which is a more efficient tactic for generating rare rules given that the minimum support threshold has to be set low. Overall, the techniques in this approach also assimilate variations of the different constraints used in the other approaches.

5. DETECTING RARE RULES IN DYNAMIC DATA

Ever since its inception, data stream mining has remained one of the more challenging problems within the data mining discipline, mainly because data streams are continuous and unbounded in size, as opposed to traditional databases where data is static and stable. Online retail applications such as online auctions, telecommunications call data, credit card transactions, sensor data, and climate data are a few examples of applications that generate vast quantities of data on a continuous basis. Data produced by such applications are highly volatile, with new patterns and trends emerging continuously. The unbounded size of data streams is the main obstacle to processing them: it is infeasible to store the entire data on disk, and we also want to process data streams in near real time. This raises two issues. Firstly, a multi-pass algorithm cannot be used, because the entire dataset would need to be stored before mining could commence, and scanning the input data more than once is considered costly and inefficient. Secondly, obtaining the exact set of rules, including both frequent and rare rules, from a data stream is too expensive.

Furthermore, it is not feasible to use static mining techniques, even on a recent batch of data, because current static techniques require multiple passes. Due to the high speed nature of online data streams, they need to be processed as fast as possible; the speed of the mining algorithm should exceed the incoming data rate, otherwise data approximation techniques such as sampling must be applied, which decreases the accuracy of the mining results [Jiang and Gruenwald 2006]. The mining method must also adapt to the stream's changing data distribution; otherwise it will suffer from the concept drift problem. Forcing mining onto the most recent batch of data therefore risks unnecessary mining, and additional system resources, when there is no change in the distribution and no new interesting rules to be found. Moreover, if the batch size is set too large, a concept drift may have occurred, meaning that new rare patterns have emerged in the stream, but we would only detect them when mining is carried out at the end of the batch, increasing the delay in detecting the new rules. We now formally define the notion of concept drift.

Definition 5.1 (Drift Detection). Let S1 = (t1, t2, ..., tm) and S2 = (tm+1, ..., tn), with 0 < m < n, represent two samples of instances from a stream with population means µ1 and µ2 respectively. The drift detection problem can then be expressed as testing the null hypothesis H0 that µ1 = µ2, i.e., that the two samples are drawn from the same distribution, against the alternative hypothesis H1 that they arrive from different distributions with µ1 ≠ µ2.

In practice, the underlying data distribution is unknown and a test statistic based on sample means must be constructed by the change detector M. If the null hypothesis is accepted incorrectly when a change (drift) has occurred, a false negative is said to have taken place. Conversely, if M accepts H1 when no change has occurred in the data distribution, a false positive is said to have occurred.
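As an illustration only, a change detector M could compare the means of two windows using a Hoeffding-style bound; the bound, the parameters, and the assumption of values bounded in a known range are made purely for this sketch and do not correspond to a specific detector surveyed here.

```python
# Illustrative two-sample drift check following Definition 5.1: flag drift
# when the difference between two window means exceeds a confidence bound.

import math

def drift_detected(sample1, sample2, delta=0.05, value_range=1.0):
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    m = 1.0 / (1.0 / n1 + 1.0 / n2)                       # harmonic sample size
    epsilon = value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * m))
    return abs(mean1 - mean2) > epsilon                   # reject H0: mu1 == mu2
```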

In data streams we can look at each transaction only once, so a one-pass approach is necessary. This rules out building a tree based on the frequency of items within the data stream, because the frequency of an item may change as the stream progresses. There are four scenarios we need to consider in a stream [Tsang et al. 2011]:

Scenario 1. A frequent item x at time T1 may become rare at time T2.
Scenario 2. A rare item x at T1 may become frequent at time T2.
Scenario 3. A frequent item x at T1 may remain frequent at time T2.
Scenario 4. A rare item x at T1 may remain rare at time T2.

Here T1 represents a point in time and T2 a later point in time. Moreover, concepts may change over time in data streams, and most current pattern mining techniques for data streams do not detect these changes. Figure 3 illustrates a sketch of the taxonomy.

Fig. 3. Dynamic Rare Pattern Mining (taxonomy: Without Drift — Infrequent Item Mining in Multiple Data Streams, SRP-Tree, MIP-DS; With Drift — RPDD)

5.1. Detecting Rare Rules in Dynamic Data without Drift

In this section we discuss approaches that detect rare rules in data streams without considering drift. In these algorithms, mining is carried out at fixed intervals of time, thus adapting to changes in the data. The accuracy of such models depends on the ability to set the correct interval.

5.1.1. Infrequent Item Mining in Multiple Data Streams. Saha et al. [2007] addressed two major issues in infrequent pattern mining. The first is scalability, in terms of memory requirements and pattern selection over time horizons of varying span. They developed a framework for detecting infrequent patterns in stream data that allows patterns to be discovered at different time resolutions. In general, the framework selects the infrequent patterns and stores them in a data structure that keeps unique patterns from across the stream at the top of the structure. One very important aspect of this framework is its ability to handle multiple data streams and to discover infrequent patterns across them.

The underlying approach uses an online hierarchical structure, which provides a concise description of the infrequent patterns within the data stream and allows the stream to be processed incrementally.

5.1.2. Streaming Rare Pattern Tree (SRP-Tree). The Streaming Rare Pattern Tree (SRP-Tree) [Huang et al. 2012] captures the complete set of rare rules in a data stream using only a single scan of the data with a sliding window approach. SRP-Tree is a one-pass strategy capable of mining rare patterns in a static database or in a dynamic data stream. When mining data streams, it can mine at any given point in the stream and with different window and block sizes. The authors propose two variations of the technique: the first uses a novel data structure called the Connection Table, which efficiently constrains the search in the tree and keeps track of items within the window; the second improves upon the first and requires the tree to be restructured before mining with FP-Growth.

5.1.3. Mining Minimal Infrequent Patterns from Data Streams (MIP-DS). A multi-pass algorithm, MIP-DS, was proposed by Hemalatha et al. [2015] to mine minimal infrequent patterns. The technique works in two phases. In Phase 1, a sliding window is used to bound the dataset, and minimal infrequent patterns are mined from the observed stream. In Phase 2, a minimal infrequent pattern deviation factor is computed for each minimal infrequent pattern, capturing the deviation of its support from a user-defined minimum support threshold. The sum of the deviation factors of the minimal infrequent patterns in each transaction is then computed and weighted using a transaction weighting factor.

5.2. Detecting Rare Rules in Dynamic Data with Drift

With the previous techniques, we run the risk of either mining too frequently, which is expensive, or too infrequently, so that changes in the data stream are only detected with significant latency. Finding the appropriate "sweet spot" for mining is close to impossible without incorporating a drift detector. The mining method must adapt to the stream's changing data distribution; otherwise its results may be inaccurate due to concept drift. In an environment of continuous, high-speed data with changing distribution characteristics, the analysis results of data streams keep changing as well. Therefore, mining of data streams should be an incremental process that keeps up with the changes in the stream.

5.2.1. Rare Pattern Drift Detector (RPDD). The Rare Pattern Drift Detector (RPDD) [Huang et al. 2014] algorithm is designed to find rare pattern changes in unlabeled transactional data streams intended for unsupervised association rule mining. The framework has two components: the processing of stream data and the actual drift detection component. The authors introduce a novel M measure, a consolidated numerical measure that represents the state of item association. The rare items selected by the stream processing component are individually tracked and monitored; essentially, each selected item has a separate process that monitors its associations with other items in the stream using the M measure.

5.3. Key Differences

In this section we highlight and compare the key differences between the two approaches.

— Dealing with dynamic data without drift. Most of the current stream mining methods in this approach have user-defined parameters, such as the size of the sliding window, that must be set before execution; however, most of them do not describe how users can adjust these parameters while the system is running. It is not desirable or feasible for users to wait for a mining algorithm to complete before they can reset the parameters. Moreover, none of the current techniques in this approach addresses the concept drift problem, which is crucial for effective stream mining.

— Dealing with dynamic data with drift. The technique in this approach allows users to detect changes of pattern in the stream when there is a change in the data distribution. The main drawback of the current proposed technique is that it focuses on detecting drifts or changes in the data distribution of targeted rare items, and thus monitors only a subset of the possible rare rules.

6. EVALUATION OF THE DIFFERENT TECHNIQUES

In this section we compare and contrast the different techniques discussed above. We evaluate the techniques by comparing them on five characteristics: complexity, base algorithm, constraints, input data, and output itemsets.

Table II. Summary of the Different Techniques. (Columns: complexity as number of passes — single, two, or multiple; base algorithm — candidate generation, variation of FP-Growth, or other; constraints — multiple support thresholds, pruning constraints, or weights/ranking; input data — static or stream; output itemsets — frequent and rare, or rare only. Rows: MSApriori, RSAA, WARM, LPMiner, Apriori-Inverse, ARIMA, AfRIM, Rarity, FRIMA, HURI, Automated Apriori-Rare, Min-Hashing, Confidence-Based Pruning, H-Confidence, MINIT, Dense-Miner, Direction Setting, Emerging Pattern, MIISR, RP-Tree, IFP-Tree, MCCFP, TRARM-RelSup, Infrequent Item Mining in Multiple Data Streams, SRP-Tree, MIP-DS, and RPDD.)


In terms of complexity, we look at the number of passes the technique makes over the dataset. For the base algorithm, we look at the underlying algorithm the technique is built on, such as Apriori-like candidate generation, a variation of FP-Growth, or an inverted index. For constraints, we look at whether the technique requires multiple support thresholds, additional rule constraints, or additional weights. We then compare the type of input accepted by each technique and the type of itemsets it produces. Table II shows the evaluation of the techniques; an 'x' marks each characteristic fulfilled by a technique.

7. RARE PATTERN MINING VS ANOMALY DETECTION

An area similar to rare association rule mining is anomaly detection. The two areas share some characteristics but are distinctly different. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour. Like rare pattern mining, anomaly detection techniques make the implicit assumption that normal instances are far more frequent than anomalies in the test data; since frequent events are usually considered normal, anomalies are usually rare. Based on the normal events, data mining methods can automatically generate models that represent normalcy and identify deviations, which may be considered anomalies [Chandola et al. 2009].

We can compare anomaly detection and rare association rule mining on three aspects: input, method, and output. For anomaly detection, the input is generally a collection of data instances (also referred to as objects, records, points, vectors, patterns, events, cases, samples, observations, or entities). Each data instance can be described by a set of attributes, which may be of different types such as binary, categorical, or continuous. In contrast, rare association rules are normally generated from a categorical dataset, although the input to rare pattern mining can also be represented as vectors, with each itemset a vector and each item one dimension of that vector. The methods used in anomaly detection can be divided into supervised, semi-supervised, and unsupervised techniques, whereas the methods used in rare association rule mining are typically unsupervised. The outputs produced by anomaly detection techniques are normally of one of two types, scores or labels, whereas rare pattern mining produces descriptive rules that represent dependencies. For anomaly detection techniques to generate patterns like association rules, we can use the joint distribution and conditional distribution, both of which can be provided by the statistical methods underlying some anomaly detection techniques.

8. CONCLUSIONS AND OPEN CHALLENGES

In this survey, our aim has been to provide a comprehensive review of rare pattern mining. We have described various problems associated with mining rare patterns and methods for addressing these problems. Overall, the techniques can be categorized into two major groups: detection techniques for static data and detection techniques for dynamic data.

Following our taxonomy in Figure 2, we surveyed various static rare pattern mining techniques. The static detection methods were divided into four categories: (i) variable support threshold, (ii) without support threshold, (iii) consequent constraint-based approach, and (iv) tree based approach. The first category pushes constraints onto the minimum support threshold, whereas the second category uses statistical measures to pre-determine the support threshold. The third category pushes constraints onto the structure of the rules generated, defining the type of rules produced. The last category uses a tree based approach, which is a more efficient type of data structure in terms of performance. Figure 3 illustrates the dynamic rare pattern mining techniques, which were divided into two categories: (i) with drift detection and (ii) without drift detection.

8.1. Open Challenges

While a significant amount of research on rare pattern mining is available, much of this work is still in its infancy; there are no well-established, proven methods for generally handling rare cases. We expect research on this topic to continue, and accelerate, as increasingly difficult data mining tasks are tackled. Below we discuss open challenges in the area.

— Rare pattern mining on data streams. While rare pattern mining has been explored in static data, only a few techniques work in a dynamic stream environment. It is difficult to differentiate a rare pattern from an emerging pattern that may become frequent in the future. We are interested in patterns that remain rare or infrequent over either the processed chunk of the data stream or the entire stream. For example, cold and flu cases tend to be less frequent at the beginning of autumn than in winter, so reporting a rule such as Vitamin → Codral (a cold and flu medication) as rare would be premature: it is an emerging trend rather than a rare pattern. Current stream mining techniques are unable to signal the difference between these two types of rules. Conversely, patterns may become frequent during a different time period and then become rare again.

— Distinguishing noise from signal. When dealing with rare patterns, we need to distinguish between chance occurrences and genuinely rare patterns. Noise is an inherent problem when dealing with rare occurrences of instances; this problem does not arise in frequent pattern mining.

— Choosing the right time window. Many algorithms for time-evolving data require a time window for the computation of normal/frequent patterns. One open question is how to choose the window so as to discover the different rare patterns: if the window is set too small, rare patterns may be obscured and indistinguishable from noise, whereas if it is set too large, rare patterns will only be found after the entire window has been processed, causing a delay between when a rare pattern occurs and when it is found.

— Scalable real time rare pattern mining. One of the most important future challenges is to develop scalable approaches for streaming data. Specifically, research should focus on algorithms that are sub-linear in the input size, or at the very least linear.

— Rare Pattern Mining in Probabilistic Datasets. In uncertain transaction databases, the support of an itemset is uncertain; it is defined by a discrete probability distribution function, where each itemset has a frequentness probability. There has been research on mining frequent itemsets in uncertain data [Aggarwal et al. 2009; Leung and Brajczuk 2010; Tong et al. 2012], but it would pay to harvest probabilistic rare rules as well. Detecting probabilistic rare rules is by far more complicated, as probability calculations for rare itemsets are less consistent than those for frequent itemsets.

ACKNOWLEDGMENTS

This work was partially supported by the University of Auckland (FRDF Grant 3702232) and the University of Malaya and Ministry of Education (High Impact Research Grant UM.C/625/1/HIR/MOHE/FCSIT/14).

REFERENCES

Mehdi Adda, Lei Wu, and Yi Feng. 2007. Rare Itemset Mining. In Proceedings of the Sixth International Conference on Machine Learning and Applications (ICMLA '07). IEEE Computer Society, Washington, DC, USA, 73–80.

Mehdi Adda, Lei Wu, Sharon White, and Yi Feng. 2012. Pattern Detection with Rare Item-set Mining. CoRR abs/1209.3089 (2012).

Charu C. Aggarwal, Yan Li, Jianyong Wang, and Jing Wang. 2009. Frequent Pattern Mining with Uncertain Data. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 29–38.

Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. 1993. Mining Association Rules Between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93). ACM, New York, NY, USA, 207–216.

Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Databases (VLDB '94), Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (Eds.). 487–499.

Roberto J. Bayardo, Rakesh Agrawal, and Dimitrios Gunopulos. 2000. Constraint-Based Rule Mining in Large, Dense Databases. Data Mining and Knowledge Discovery 4, 2/3 (2000), 217–240.

Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detection: A Survey. ACM Computing Surveys (CSUR) 41, 3, Article 15 (2009), 58 pages.

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang. 2001. Finding Interesting Association Rules Without Support Pruning. IEEE Transactions on Knowledge and Data Engineering 13 (2001), 64–78.

Ashish Gupta, Akshay Mittal, and Arnab Bhattacharya. 2012. Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees. CoRR abs/1207.4958 (2012).

David J. Haglin and Anna M. Manning. 2007. On Minimal Infrequent Itemset Mining. In Proceedings of the 2007 International Conference on Data Mining (DMIN 2007), June 25-28, 2007, Las Vegas, Nevada, USA, Robert Stahlbock, Sven F. Crone, and Stefan Lessmann (Eds.). CSREA Press, 141–147.

Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining Frequent Patterns Without Candidate Generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00). ACM Press, 1–12.

C. Sweetlin Hemalatha, V. Vaidehi, and R. Lakshmi. 2015. Minimal Infrequent Pattern Based Approach for Mining Outliers in Data Streams. Expert Systems with Applications 42, 4 (2015), 1998–2012.

N. Hoque, B. Nath, and D.K. Bhattacharyya. 2013. An Efficient Approach on Rare Association Rule Mining. In Proceedings of the Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Jagdish Chand Bansal, Pramod Kumar Singh, Kusum Deep, Millie Pant, and Atulya K. Nagar (Eds.). Advances in Intelligent Systems and Computing, Vol. 201. Springer India, 193–203.

Zhongyi Hu, Hongan Wang, Jiaqi Zhu, Maozhen Li, Ying Qiao, and Changzhi Deng. 2014. Discovery of Rare Sequential Topic Patterns in Document Stream. Chapter 60, 533–541.

David Huang, Yun Sing Koh, and Gillian Dobbie. 2012. Rare Pattern Mining on Data Streams. In Data Warehousing and Knowledge Discovery, Alfredo Cuzzocrea and Umeshwar Dayal (Eds.). Lecture Notes in Computer Science, Vol. 7448. Springer Berlin Heidelberg, 303–314.

David Tse Jung Huang, Yun Sing Koh, Gillian Dobbie, and Russel Pears. 2014. Detecting Changes in Rare Patterns from Data Streams. In Advances in Knowledge Discovery and Data Mining, Vincent S. Tseng, Tu Bao Ho, Zhi-Hua Zhou, Arbee L.P. Chen, and Hung-Yu Kao (Eds.). Lecture Notes in Computer Science, Vol. 8444. Springer International Publishing, 437–448.

Yan Huang, Jian Pei, and Hui Xiong. 2006. Mining Co-Location Patterns with Rare Events from Spatial Data Sets. GeoInformatica 10, 3 (2006), 239–260.

Yanqing Ji, Hao Ying, John Tran, Peter Dews, Ayman Mansour, and R. Michael Massanari. 2013. A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs. IEEE Transactions on Knowledge and Data Engineering 25, 4 (April 2013), 721–733.

Nan Jiang and Le Gruenwald. 2006. Research Issues in Data Stream Association Rule Mining. SIGMOD Rec. 35, 1 (March 2006), 14–19.

R. Uday Kiran and Polepalli Krishna Reddy. 2012. An Efficient Approach to Mine Rare Association Rules Using Maximum Items Support Constraints. In Data Security and Security Data (Lecture Notes in Computer Science), Lachlan M. MacKinnon (Ed.), Vol. 6121. Springer Berlin Heidelberg, 84–95.

Yun Sing Koh and Nathan Rountree. 2005. Finding Sporadic Rules Using Apriori-Inverse. In PAKDD (Lecture Notes in Computer Science), Tu Bao Ho, David Cheung, and Huan Liu (Eds.), Vol. 3518. Springer, 97–106.

Yun Sing Koh, Nathan Rountree, and Richard O’Keefe. 2006. Mining Interesting Imperfectly Sporadic Rules. In PAKDD (Lecture Notes in Computer Science), Wee Keong Ng, Masaru Kitsuregawa, Jianzhong Li, and Kuiyu Chang (Eds.), Vol. 3918. Springer, 473–482.

Jennifer Lavergne, Ryan Benton, and Vijay V. Raghavan. 2012. TRARM-RelSup: Targeted Rare Association Rule Mining Using Itemset Trees and the Relative Support Measure. In Foundations of Intelligent Systems, Li Chen, Alexander Felfernig, Jiming Liu, and Zbigniew W. Ras (Eds.). Lecture Notes in Computer Science, Vol. 7661. Springer Berlin Heidelberg, 61–70.

Gangin Lee, Unil Yun, Heungmo Ryang, and Donggyu Kim. 2015. Multiple Minimum Support-Based Rare Graph Pattern Mining Considering Symmetry Feature-Based Growth Technique and the Differing Importance of Graph Elements. Symmetry 7, 3 (2015), 1151.

Carson Kai-Sang Leung and Dale A. Brajczuk. 2010. Efficient Algorithms for the Mining of Constrained Frequent Patterns from Uncertain Data. SIGKDD Explor. Newsl. 11, 2 (May 2010), 123–130.

Jinyan Li, Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao, and Qun Sun. 1999. Efficient Mining of High Confidence Association Rules without Support Thresholds. In Principles of Data Mining and Knowledge Discovery (Lecture Notes in Computer Science), Jan M. Zytkow and Jan Rauch (Eds.), Vol. 1704. Springer Berlin Heidelberg, 406–411.

Bing Liu, Wynne Hsu, and Yiming Ma. 1999a. Mining Association Rules with Multiple Minimum Supports. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 337–341.

Bing Liu, Wynne Hsu, and Yiming Ma. 1999b. Pruning and Summarizing the Discovered Associations. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’99. ACM Press, 125–134.

Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, and Alex Pang. 1998. Exploratory mining and pruning optimizations of constrained associations rules. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, SIGMOD ’98. ACM Press, 13–24.

Jyothi Pillai, O.P. Vyas, and Maybin Muyeba. 2013. HURI: A Novel Algorithm for Mining High Utility Rare Itemsets. In Advances in Computing and Information Technology, Natarajan Meghanathan, Dhinaharan Nagamalai, and Nabendu Chaki (Eds.). Advances in Intelligent Systems and Computing, Vol. 177. Springer Berlin Heidelberg, 531–540.

Imad Rahal, Dongmei Ren, Weihua Wu, and William Perrizo. 2004. Mining Confident Minimal Rules with Fixed-Consequents. In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, ICTAI ’04. IEEE Computer Society, 6–13.

Ahmedur Rahman, C.I. Ezeife, and A.K. Aggarwal. 2010. WiFi Miner: An Online Apriori-Infrequent Based Wireless Intrusion System. In Knowledge Discovery from Sensor Data, Mohamed Medhat Gaber, Ranga Raju Vatsavai, Olufemi A. Omitaomu, Joao Gama, Nitesh V. Chawla, and Auroop R. Ganguly (Eds.). Lecture Notes in Computer Science, Vol. 5840. Springer Berlin Heidelberg, 76–93.

Cristobal Romero, Jose Raul Romero, Jose María Luna, and Sebastian Ventura. 2010. Mining Rare Association Rules from e-Learning Data. Educational Data Mining (2010), 171–180.

Kanimozhi SC Sadhasivam and Tamilarasi Angamuthu. 2011. Mining rare itemset with automated support thresholds. Journal of Computer Science 7, 3 (2011), 394.

B. Saha, M. Lazarescu, and S. Venkatesh. 2007. Infrequent Item Mining in Multiple Data Streams. In Seventh IEEE International Conference on Data Mining Workshops. 569–574.

Masakazu Seno and George Karypis. 2001. LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint. In Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM, Nick Cercone, Tsau Young Lin, and Xindong Wu (Eds.). IEEE Computer Society, 505–512.

Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal. 1997. Mining Association Rules with Item Constraints. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, KDD ’97. 67–73.

Laszlo Szathmary, Amedeo Napoli, and Petko Valtchev. 2007. Towards Rare Itemset Mining. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 01 (ICTAI ’07). IEEE Computer Society, Washington, DC, USA, 305–312.

Feng Tao, Fionn Murtagh, and Mohsen Farid. 2003. Weighted Association Rule Mining using weighted support and significance framework. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03. ACM Press, 661–666.

Yongxin Tong, Lei Chen, Yurong Cheng, and Philip S. Yu. 2012. Mining Frequent Itemsets over Uncertain Databases. Proc. VLDB Endow. 5, 11 (July 2012), 1650–1661.

Luigi Troiano, Giacomo Scibelli, and Cosimo Birtolo. 2009. A Fast Algorithm for Mining Rare Itemsets. In Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications (ISDA ’09). IEEE Computer Society, Washington, DC, USA, 1149–1155.

Sidney Tsang, Yun Sing Koh, and Gillian Dobbie. 2011. RP-Tree: Rare Pattern Tree Mining. In DaWaK (Lecture Notes in Computer Science), Alfredo Cuzzocrea and Umeshwar Dayal (Eds.), Vol. 6862. Springer, 277–288.

Ke Wang, Yu He, and David W. Cheung. 2001. Mining confident rules without support requirement. In Proceedings of the Tenth International Conference on Information and Knowledge Management. ACM Press, 89–96.

Ke Wang, Yu He, and Jiawei Han. 2003. Pushing Support Constraints Into Association Rules Mining. IEEE Transactions on Knowledge and Data Engineering 15, 3 (2003), 642–658.


Gary M. Weiss. 2004. Mining with rarity: a unifying framework. SIGKDD Explorations Newsletter 6, 1 (2004), 7–19.

Hui Xiong, Pang-Ning Tan, and Vipin Kumar. 2003. Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution. In ICDM. IEEE Computer Society, 387–394.

Hyunyoon Yun, Danshim Ha, Buhyun Hwang, and Keun Ho Ryu. 2003. Mining Association Rules on Significant Rare Data Using Relative Support. The Journal of Systems and Software 67, 3 (15 September 2003), 181–191.
