Decision Support Systems 56 (2013) 234–246

journal homepage: www.elsevier.com/locate/dss

A prediction framework based on contextual data to support Mobile Personalized Marketing

Heng Tang a,*, Stephen Shaoyi Liao b, Sherry Xiaoyun Sun b

a Faculty of Business Administration, University of Macao, Macao, China
b Department of Information Systems, City University of Hong Kong, Hong Kong, China

* Corresponding author at: Av. Padre Tomás Pereira, Taipa, Macao, China. Tel.: +853 83974172; fax: +853 83978320.

E-mail addresses: [email protected] (H. Tang), [email protected] (S.S. Liao), [email protected] (S.X. Sun).

0167-9236/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.dss.2013.06.004

Article info

Article history:
Received 17 December 2010
Received in revised form 8 June 2013
Accepted 10 June 2013
Available online 19 June 2013

Keywords:
Multidimensional rule
Sequential rule
Activity prediction
Mobile Personalized Marketing
Data mining

Abstract

Personalized marketing via mobile devices, also known as Mobile Personalized Marketing (MPM), has become an increasingly important marketing tool because the ubiquity, interactivity and localization of mobile devices offer great potential for understanding customers' preferences and quickly advertising customized products or services. A tremendous challenge in MPM is to factor a mobile user's context into the prediction of the user's preferences. This paper proposes a novel framework with a three-stage procedure to discover the correlation between contexts of mobile users and their activities for better predicting customers' preferences. Our framework helps not only to discover sequential rules from contextual data, but also to overcome a common barrier in mining contextual data, i.e. elimination of redundant rules that occur when multiple dimensions of contextual information are used in the prediction. The effectiveness of our framework is evaluated through experiments conducted on a mobile user's context dataset. The results show that our framework can effectively extract patterns from a mobile customer's context information for improving the prediction of his/her activities.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Personalized Marketing (PM), also known as one-to-one marketing, is the process of delivering targeted products and services to a customer based on the customer's profile [36,41]. The main objective of PM is to identify the needs of a customer and offer products and services that appeal to that particular customer. Recently, with the dazzling proliferation of mobile commerce, personalized marketing via mobile devices (Mobile Personalized Marketing, MPM) has become an increasingly important marketing tool because the ubiquity, interactivity, and localization of mobile devices offer great potential for collecting customers' information, understanding their preferences and quickly advertising customized products [16,37,60]. Recent studies have predicted that the volume of business transactions associated with MPM will soon become the primary contributor to revenue growth in one-to-one marketing [39].

In personalized marketing, it is important to consider the contextual information, i.e. the environment where a customer is located, in order to understand the needs of the customer. Since it is possible to collect mobile device carriers' geographical positions, value-added services can be delivered via mobile devices, based on the location of a customer, which is often referred to as Location-based Services, or LBS for short [44]. The correlations between a specific location and the actual activities of a customer can be identified by analysis of his/her short-term location log and then used to predict his/her preferences at certain locations [11,24,44]. However, location is only one aspect of a context [9]. In practice, predicting a mobile user's possible activities simply based on location may not achieve satisfactory accuracy in many cases. As empirical studies have shown [40], a view of a customer's activities from multiple perspectives can enhance the predictive accuracy of data-based methods of customer analysis. Thus, dimensions of contextual information other than location, e.g., time of the day and weather, can also be useful in predicting activities of a mobile user. In order to accurately predict customer preferences, we need to take into account multiple dimensions of a customer's context. Let us look at the following example.

Example 1. When a customer is in a shopping mall, there is a 60% possibility of him/her being interested in redeeming a mobile coupon in a shop. The estimation of this possibility can significantly vary with extra contextual information. When it is a rainy weekend, the possibility that the service is preferred by the customer in a shopping mall can increase to 95%, and in other contexts the possibility can be as low as 15% because people tend to be indoors when the weather is not favorable for outdoor activities.

In the example above, multiple dimensions of the contextual information (e.g., location, weather and time) provide important clues to the customer's preferences under a more specific circumstance. As such, the accuracy of predicting whether a customer is likely to accept an offer of a service can be improved when such multidimensional information about the customer's context is incorporated into a prediction method for personalized marketing.

The estimation of a customer's preference for a service can be regarded as a mapping from the customer, context, and service to a probability, i.e. p = f(Customer, Context, Service). In recommender systems, the extent to which a customer prefers a service is reflected by the User Rating [2]. In this study, the probability of a customer preferring a service is obtained through analyzing the correlation between a sequence of contexts and the activities of a customer based on historical data. Then, for a given context, the service or product with the highest probability of being preferred can be proactively offered to the customer. The correlations between a series of contexts and activities of a customer are represented as sequential rules, often written as "x leads to y", indicating that y happens after x has happened [5]. Users' activities, notably, can also be viewed as an important type of contextual information, since they can offer valuable clues for predicting future moves. This sequential rule based solution enables the service provider not only to tailor services for customers, but also to deliver the services in advance.

Example 2. A simple sequential rule is given as follows, showing the correlation between the contexts and the activities of the customer in Example 1.

{office, afternoon}, {shopping mall, night} leads to redeeming a mobile coupon for the food court (probability = 70%).

This sequential rule indicates that the customer is likely to accept a mobile coupon when going from the office to a shopping mall at night. Location and time are the involved dimensions of the two contexts in sequence. As this rule has a comparatively high probability, it can be used to make predictions. Then, whenever the antecedent, i.e. the contexts {office, afternoon} and {shopping mall, night}, occurs again, a prediction can be made that a mobile coupon for the food court will be preferred by the customer.

Given that, in practice, a huge amount of contextual information with various dimensions can be collected using mobile devices, it is a great challenge to effectively identify the sequential rules that are most useful for prediction of customer preferences. Moreover, in order to proactively address customers' needs, it is also critical to quickly identify situations where a sequential rule is applicable.

As different combinations of dimensions of a context can be used for prediction of customer preferences, the accuracy of the prediction depends on the set of dimensions chosen for the various contexts. Rather than enhancing predictiveness, incorporation of additional contextual dimensions sometimes results in redundancy [63], which is generally known as a phenomenon wherein parts of knowledge are in fact corollaries of other parts of knowledge [38].

Example 3. Seven sequential rules shown as follows are derived from the sequential rule in Example 2 but incorporate one new dimension, i.e. day of the week:

(1) {office, afternoon, Monday}, {shopping mall, night, Monday} leads to redeeming a mobile coupon,

(2) {office, afternoon, Tuesday}, {shopping mall, night, Tuesday} leads to redeeming a mobile coupon,

...

(7) {office, afternoon, Sunday}, {shopping mall, night, Sunday} leads to redeeming a mobile coupon.

The probabilities of these rules are found to be very close to each other. Thus, the additional dimension, day of the week, does not introduce new knowledge to the original rule.

As illustrated in Example 3, redundancy of sequential rules needs to be taken into account. It is worth noting that the number of sequential rules may increase dramatically after adding a new dimension. Thus, in order to reduce the complexity of the rule base and optimize prediction efficiency, the redundancy issue needs to be addressed.

In summary, the following problems are important and need to be addressed when mining multidimensional contextual data; these problems have motivated the work reported in this paper. First, how can we efficiently discover sequential rules that can achieve high prediction accuracy from multidimensional data? Second, how can we reduce knowledge redundancy in identified rules and, moreover, using those rules, how can a prediction be made based on the context of a customer?

In this study, we propose a data mining based framework to extract and apply sequential rules for a proactive MPM solution. This framework enables incorporation of multidimensional contextual information into sequential rule mining; a new concept, i.e. snapshot, is proposed to capture contextual information. Under our framework, the existing Apriori-like mining methods [4] can be easily applied to predict the activities of mobile users. Moreover, we propose a post-pruning method to help reduce rule redundancy, based on the multidimensional nature of contextual information. In addition, we propose an online mining algorithm that detects the situations where certain services can be delivered according to the probability of being preferred by a customer, thus enabling real-time predictions based on the extracted rules. The proposed framework follows a 3-stage process comprising rule learning, selection and matching, as summarized in Fig. 1.

The learning stage starts with analysis of contextual data to extract sequential rules. The proposed rule-learning algorithm is underpinned by the classical Apriori method [5], which generates candidate rules in a level-wise manner and then eliminates unqualified candidates using support as the criterion. The purpose of the rule reduction stage is to screen out rules conveying redundant dimensional knowledge from the generated rule base. This stage helps diminish the number of rules so as to optimize the efficiency of the matching process. The rule matching process monitors the ongoing context changes, evaluates the probability of a user event occurring, based on previously extracted rules, and identifies events with a high probability of being preferred.

The multidimensionality of contextual data has posed many challenges for mining useful rules. The first challenge is to consider the multidimensional setting in the mining algorithms [25]. In this paper, in order to handle the multidimensional data, we propose to take snapshots along a continuous dimension (such as time), and then identify the co-occurrence relation between the snapshots and the user's actions. With our formulation of the problem, the data mining algorithm handling single-dimension mining, i.e. WINEPI [33], is extended for multidimensional sequential rule mining. Another challenge is to alleviate the rule base complexity, for which we propose an information-entropy-based post-pruning method to identify redundant rules. Our framework can be applied in proactive MPM, and throughout the rest of the paper, we use the MPM scenario as a running example to demonstrate the efficacy of our proposed methods. This framework, however, is generalizable to many other business applications characterized by multidimensional data.

Overall, the main contributions of this paper include:

• Presenting a generic framework with detailed procedures to take into account contextual information in predicting activities of customers;

• Proposing the concepts of snapshot and event to deal with multidimensionality of contextual information using the existing rule learning approaches with extensions;

• Proposing a reduction method along with a novel redundancy measure to tackle the challenge of information redundancy, which is inherently caused by the multidimensional nature of contextual information, and demonstrating the synergetic effect of combining different reduction methods; and

• Demonstrating that contextual information other than location also matters in predicting activities of a mobile user, i.e. effectively using multidimensional contexts can outperform location-based predictions.

The remainder of the paper is structured as follows. In Section 2, we provide a brief review of the relevant literature. In Section 3, we formulate the rule-learning problem and outline the learning algorithm. The rule reduction method is introduced in Section 4. In Section 5, we describe the matching algorithm. The experiments are presented in Section 6. Section 7 summarizes the paper and outlines future research directions. The symbols and notations used in more than one place in the paper are summarized in Appendix A.

2. Literature review

This section provides a review of related works in several areas, including sequential pattern mining, multidimensional sequence, rule reduction and association rule based prediction.

2.1. Sequential pattern mining

Data sequence, a set of data records generated sequentially, has found many applications in different business areas, such as investment, auctions and banking. Pattern mining from data sequences has aroused consistent interest in the data mining community [52]. Agrawal et al. [5] addressed the problem of discovering frequent sequential patterns, and the approach they proposed was further improved in [45]. Thereafter, research in this area gained momentum, with studies falling into two broad streams, based on the forms of input dataset. The first stream focuses on developing effective algorithms to detect sequential patterns from transactional or sequence databases. Major studies in this stream include [35,43,50,59,61]. The second stream of research focuses on mining one sequence, which stores the succession of data items, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences [21]. Mining from a transactional database and mining from a sequence are different. The former aims to identify patterns from multiple sequence segments to predict the preference of a customer based on what other customers with similar preferences have done, while the latter intends to discover recurring patterns from a single sequence to predict the activities of a customer. Predicting the activities of a customer is unique in that people's activities are more closely related to their personal schedule, pattern of life, and places of living (home, office, entertainment, etc.), which can differ considerably from one individual to another.

Fig. 1. A 3-stage framework (learning, selection, and matching).

The mining algorithm we propose in this paper attempts to deal with the second type of data format, i.e. a single sequence, which is more applicable for context-specific MPM. In this category, Mannila and Toivonen [32] use Episode to describe frequently recurring subsequences and propose two efficient algorithms, WINEPI and MINEPI. Many others have also focused on episode mining, including [8,10,27,33]. For example, Bettini et al. [10] address the problem of mining event structures with multiple time granularity; Laxman et al. [27] extend the episode mining approach by explicitly bringing event duration constraints into the concept of episode. The difference between episode mining techniques and ours is that the former are not directly applicable to multidimensional sequence mining.

2.2. Multidimensional sequence

The term "multidimensional" used in this paper comes originally from the multidimensional data model used for data warehousing and On-Line Analytical Processing (OLAP) [13]. The problem of discovering multidimensional sequential rules for prediction studied in this paper is a new issue. To the best of our knowledge, no related work has directly addressed this. However, the general concept of mining multidimensional sequential rules has been addressed in several studies, and dimension is also referred to as attribute [47]. Yu and Chen [59] investigate the episode mining problem for a multidimensional sequence. However, the term "multidimensional" used in their work refers to multiple granularities in terms of the time dimension of occurrence of events. It is thus a concept different from the way the term is used in our study. Attempts to detect sequential patterns from a multidimensional transactional database have been made by Pinto et al. [42], in which "sequence" refers to purchase sequence segments of a certain customer. The research problem discussed in this paper is different in that our approaches aim to extract patterns from an entire sequence rather than from a database of short sequence segments.

2.3. Rule reduction

Knowledge redundancy is known as a common problem of knowledge-based systems, as it reduces maintainability and efficiency of the knowledge base [49]. In this paper, we concentrate on the redundancy problem associated with rule-based systems.

In general, a rule base with problematic rules, including redundant ones, can be validated via post-analysis conducted either by domain experts or through an automatic process. The approach of involving domain experts is known to be flexible and highly applicable. For example, Adomavicius and Tuzhilin [3] propose an expert-driven framework for validation of a given rule base. The automatic process for redundancy check normally relies on precisely defining the measure of redundancy. For instance, Zaki [62] proposes the concept of Closed Itemset based on Formal Concept Analysis, and proves that a closed itemset can be used to capture all information about a conventional frequent itemset. Moreover, a redundant rule is defined to be a super rule with the same frequency and confidence as its sub-rules [62]. Similar redundancy definitions have also been proposed in some other studies [31,55]. In Ashrafi et al.'s approach [6,7], given a rule r, if r's sub-rules with a higher confidence are found in the rule base, then the rule r should be regarded as redundant. The studies in [14,15] define a δ-tolerance association rule (δ-TAR) mining task. The itemset identified using the δ-TAR method only includes items with dramatic frequency changes, and rules excluded from the δ-TAR are viewed as redundant.

Despite these prior efforts on redundancy reduction, to the best of our knowledge, few works have addressed this issue in multidimensional settings. Specifically, our research investigates the redundancy problem introduced by the multidimensionality of sequential rules.

2.4. Association rule based prediction

The proposed matching approach is based on the n-gram method, in which an n-gram refers to a succession of n items from a given sequence [28]. The n-gram has been widely used in statistical natural language processing [34] and genetic sequence analysis [54]. Many works attempt to build n-gram models using association mining techniques. For example, Yang et al. [56] attempt to discover association rules from web user sessions to estimate the conditional probability of accessing web documents for caching optimization. Similarly, the WhatNext system developed in [46] generates simple n-grams through association mining. We extend the n-gram based prediction by borrowing the concept of alignment from string comparison algorithms [16], such that gaps in the input sequence are allowed in matching. In addition, time constraints are also taken into account.

The framework proposed in this paper differs from time series forecasting [12,53] in two regards. First, forecasting in the time series area mainly studies the prediction problem with continuous data, whereas the problem to be solved in this paper is a prediction of categorical data across multiple dimensions. Second, to apply time series forecasting approaches, an appropriate multivariate model needs to be determined in advance [12]. In contrast, data mining approaches such as ours are essentially problem-oriented, aiming at exploring a large amount of data without many restrictions associated with the preset model [12].

3. Learning rules from multidimensional data sequence

We will first introduce the relevant definitions and formulate the problem of learning sequential rules in Section 3.1. The learning algorithm will be described in Section 3.2.

3.1. Problem statement

The input contextual data are considered as a sequence of data items with multiple dimensions. We use the term "snapshot" to describe a mapping from time domain to context, in order that we can apply conventional rule mining methods in a multidimensional setting for rule extraction.

Definition 1. Snapshot

Given a function mapping $S: T \to D$ from a discrete time domain $T = \{t_1, t_2, \ldots, t_n\}$ to an $(m+1)$-dimensional state space $D = D_1 \times D_2 \times \cdots \times D_{m+1}$, the mapping $S(t) = \{v_1, \ldots, v_{m+1}\}$, $t \in T$, is called a snapshot, where $v_j$, $j = 1, \ldots, m+1$, indicates the state in the $j$-th dimension.

In particular, $D_1, D_2, \ldots, D_m$ are the dimensions of context, called contextual dimensions, whereas $D_{m+1}$ is the dimension of customer action, referred to as the actional dimension. In this research, domains in all dimensions are required to be categorical in order to form a discrete state space; therefore, a discretization method needs to be applied in advance in the case of a continuous state space.

A snapshot describes the contextual state in every dimension. In practice, though, the recurrence of context can only be found in some dimensions, while the states in other dimensions are random. For example, a user is in the office on most weekdays, while the temperature on those days could be nearly random. We hence define the concept of event, which can be viewed as the template of state vectors, as follows.

Definition 2. Event

An event $e$ is a subset of the state space, denoted $e \subseteq D$.

For instance, $e_1 = \{(v_1, \ldots, v_{m+1}) \mid v_1 = \text{Street}, v_2 = \text{Morning}, v_3 = \text{Raining}\}$ is an event characterized by the context of location, time, and weather, and $e_1.dim = \{D_1, D_2, D_3\}$ is called the (restricted) dimension set of $e_1$. As another example, the event $e_2 = \{(v_1, \ldots, v_{m+1}) \mid v_{m+1} = \text{Purchase}\}$ describes a purchase action of a customer, for which $e_2.dim = \{D_{m+1}\}$. For simplicity, the above two events are also written as $e_1 = \{v_1 = \text{Street}, v_2 = \text{Morning}, v_3 = \text{Raining}\}$ and $e_2 = \{v_{m+1} = \text{Purchase}\}$, respectively.
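As an illustration of Definitions 1 and 2, the following Python sketch treats a snapshot as a full assignment of states to dimensions and an event as a partial template over them. The dictionary encoding, the dimension names, and the helper names `occurs_in` and `dim_set` are assumptions made here for illustration; they do not come from the paper.

```python
from typing import Dict, FrozenSet

# Assumed encoding: a snapshot assigns one categorical state to every dimension.
Snapshot = Dict[str, str]
# Assumed encoding: an event constrains only some dimensions (its dimension set);
# every unconstrained dimension may take any state.
Event = Dict[str, str]


def occurs_in(event: Event, snapshot: Snapshot) -> bool:
    """True if the snapshot belongs to the subset of the state space described by the event."""
    return all(snapshot.get(dim) == value for dim, value in event.items())


def dim_set(event: Event) -> FrozenSet[str]:
    """The (restricted) dimension set of an event, i.e. e.dim."""
    return frozenset(event)


# Snapshot on a 4-dimensional state space (three contextual dimensions, one actional).
s_t1 = {"location": "Street", "time": "Morning", "weather": "Raining", "action": "Purchase"}

e1 = {"location": "Street", "time": "Morning", "weather": "Raining"}  # contextual event
e2 = {"action": "Purchase"}                                           # actional event
e3 = {"location": "Office"}                                           # does not match s_t1

print(occurs_in(e1, s_t1), occurs_in(e2, s_t1), occurs_in(e3, s_t1))
```

Both `e1` and `e2` match the same snapshot, which reflects the remark that more than one event can occur at the same time point.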

A customer's context recorded in a stream of state vectors forms a sequence, which is defined as follows.

Definition 3. Sequence

Given $T = \{t_1, t_2, \ldots, t_n\}$, a sequence is a list $seq_T = \{S(t), t \in T\}$ whose elements are ordered ascendingly by $t$. Assuming $t_1 < t_2 < \cdots < t_n$, the overall time span of $seq_T$ is $span(seq_T) = t_n - t_1$.

We say an event $e$ occurs in sequence $seq_T$ at time $t$, denoted $e \in_t seq_T$, if there exists a time point $t$ such that $S(t) \in e$. For brevity it is also denoted $e \in seq_T$ when the occurring time does not matter. For example, given $S(t_1) = \{\text{Street}, \text{Morning}, \text{Raining}, \text{Purchase}\}$ defined on a 4-dimensional state space $D$ and $e_2 = \{v_4 = \text{Purchase}\}$, since $S(t_1) \in e_2$, we say $e_2$ occurs in $seq_T$, denoted $e_2 \in seq_T$. Note that, in general, more than one event can occur at the same time point.

To simplify the notation in the paper, we use a tuple $(e, t)$ to represent an event occurring in a sequence at a specific time point, where $e$ is the event label taking its value from a finite alphabet and $t$ is the occurring time of $e$. Thus a sequence could be denoted by, for example, $seq_{T1} = \langle (x,11), (x,12), (x,13), (w,14), (y,15), (x,16), (z,17), (x,18) \rangle$, where $w$, $x$, $y$ and $z$ are events occurring in $seq_{T1}$.

A rule is a type of pattern representing the hidden correlation among events in a sequence, defined as follows.

Definition 4. Rule

A rule is a list of events denoted by $r = \langle e_1, e_2, \ldots, e_l \rangle$, where $e_1, \ldots, e_l \subseteq D$.

Having introduced the notion of a rule, we are now able to define what is meant by its occurrence in a sequence so as to formulate its significance. In short, an occurrence of a rule $r$ is considered as a series of time points recording the occurring time of the corresponding events of $r$ in a sequence. For practical purposes, we should allow some time gap between the adjacent events in an occurrence; hence two thresholds $g$ and $w$ are introduced into our formulation: $g$ (or gap) is the maximum allowed time difference between the occurring times of any two neighboring event types, while $w$ (or width) is the maximum allowed time difference between the occurring times of the first and the last event types. The rigorous definition is provided as follows.

Definition 5. Occurrence

Given a sequence $seq_T$ and a rule $r = \langle e_1, e_2, \ldots, e_l \rangle$, a list of time points $o_r = \langle t_1^o, t_2^o, \ldots, t_l^o \rangle$, $t_i^o \in T$, is called a $(g,w)$-occurrence of $r$ in $seq_T$ if and only if: (1) $e_i \in_{t_i^o} seq_T$ for $i = 1, \ldots, l$; (2) $t_1^o \le t_2^o \le \cdots \le t_l^o$; (3) $t_{i+1}^o - t_i^o \le g$ for $i = 1, \ldots, l-1$; and (4) $t_l^o - t_1^o \le w$. The set of all $(g,w)$-occurrences of $r$ in $seq_T$ is denoted $Occr(r, seq_T, g, w) = \{o_r \mid o_r = \langle t_1^o, t_2^o, \ldots, t_l^o \rangle$ is a $(g,w)$-occurrence of $r$ in $seq_T$ for $t_1^o, t_2^o, \ldots, t_l^o \in T\}$.

For instance, given $seq_{T1}$ in the previous example, if we consider a rule $r = \langle x, y, z \rangle$ and thresholds $g = 2$ and $w = 5$, then $\langle 13, 15, 17 \rangle \in Occr(r, seq_{T1}, 2, 5)$ is an occurrence of $r$, while $\langle 12, 15, 17 \rangle \notin Occr(r, seq_{T1}, 2, 5)$, since it violates constraint (3) in the above definition. Likewise, $\langle 11, 15, 17 \rangle \notin Occr(r, seq_{T1}, 2, 5)$ as it violates both constraints (3) and (4).
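The four constraints on a (g,w)-occurrence can be checked mechanically. The Python sketch below is a brute-force illustration rather than the authors' implementation: the function name `occurrences` and the list-of-tuples encoding of the sequence are assumptions made here, and the ordering constraint (2) is read as non-strict.

```python
from itertools import product
from typing import List, Sequence, Tuple

LabeledSequence = List[Tuple[str, int]]  # (event label, occurring time)


def occurrences(rule: Sequence[str], seq: LabeledSequence, g: int, w: int) -> List[Tuple[int, ...]]:
    """Enumerate all (g, w)-occurrences of `rule` in `seq` by exhaustive search."""
    # Constraint (1): candidate time points at which each event of the rule occurs.
    times = [sorted(t for label, t in seq if label == e) for e in rule]
    found = []
    for combo in product(*times):
        pairs = list(zip(combo, combo[1:]))
        ordered = all(a <= b for a, b in pairs)      # constraint (2), assumed non-strict
        gaps_ok = all(b - a <= g for a, b in pairs)  # constraint (3)
        width_ok = combo[-1] - combo[0] <= w         # constraint (4)
        if ordered and gaps_ok and width_ok:
            found.append(combo)
    return found


seq_t1 = [("x", 11), ("x", 12), ("x", 13), ("w", 14),
          ("y", 15), ("x", 16), ("z", 17), ("x", 18)]

print(occurrences(["x", "y", "z"], seq_t1, g=2, w=5))  # [(13, 15, 17)]
```

With g = 2 and w = 5 only the occurrence at time points 13, 15 and 17 survives; the candidates starting at 11 and 12 are rejected by the gap constraint, as in the text.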

We adopt the time window concept introduced in [33] to measure the significance of a rule. A window $win$ is a half-open time interval in the span of $seq_T$, denoted $win_T = [t_i, t_j)$, if $t_j > t_1$ and $t_i < t_n$. The width of the window is $|win_T| = t_j - t_i$. Let $W(seq_T, w)$ be the set of all windows with width $w$ in the span of $seq_T$, where $w$ is the aforementioned time threshold, i.e., $W(seq_T, w) = \{win_T = [t_i, t_j) \mid |win_T| = w$ for $t_i, t_j \in T\}$. We assume, without loss of generality, that time points in $T$ are consecutive integers; $W(seq_T, w)$ thus has the cardinality $|W(seq_T, w)| = span(seq_T) + w - 1$. For example, for the same $seq_{T1}$, we have $|W(seq_{T1}, 5)| = 7 + 5 - 1 = 11$, where $W(seq_{T1}, 5) = \{[7,12), [8,13), [9,14), [10,15), [11,16), [12,17), [13,18), [14,19), [15,20), [16,21), [17,22)\}$. Notice that windows defined on $seq_{T1}$ can extend out of its span.
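The window set can be enumerated directly from the two boundary conditions. The sketch below assumes integer time points, as the text does; the function name `windows` and the tuple encoding of a half-open interval are illustrative choices made here.

```python
from typing import List, Tuple


def windows(times: List[int], w: int) -> List[Tuple[int, int]]:
    """All half-open windows [ti, ti + w) that overlap the span of the sequence."""
    t1, tn = min(times), max(times)
    # A window must end after t1 (ti + w > t1) and start before tn (ti < tn).
    return [(ti, ti + w) for ti in range(t1 - w + 1, tn)]


times_t1 = list(range(11, 19))  # time points 11..18 of the running example
ws = windows(times_t1, 5)

print(len(ws))        # 11, i.e. span + w - 1 = 7 + 5 - 1
print(ws[0], ws[-1])  # (7, 12) (17, 22)
```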

The number of windows containing a rule's occurrences can be used to gauge its frequency. A window $win = [t_i, t_j)$ contains an occurrence $o_r = \langle t_1^o, t_2^o, \ldots, t_l^o \rangle$ if and only if $t_i \le t_1^o$ and $t_l^o < t_j$, denoted $o_r \subseteq win$. Of all windows in $W(seq_T, w)$, those containing any occurrence of $r$ are defined as $W_r(seq_T, g, w) = \{win \mid win \in W(seq_T, w)$ and $o_r \subseteq win$ and $o_r \in Occr(r, seq_T, g, w)\}$.

Let rule $r = \langle x, y, z \rangle$. According to the definition of $W_r$, we have $W_r(seq_{T1}, 2, 5) = \{[13,18)\}$. The cardinality of the set $W_r$ can be used to measure the frequency of rule $r$; the frequency of $r = \langle x, y, z \rangle$, subject to $g$ and $w$, is thus $|W_r(seq_{T1}, 2, 5)| = 1$.

We adopt the classical support-confidence framework [4,33] to quantify the significance of a rule, in which for a rule $r: X \to Y$, $supp(r) = P(XY)$ and $conf(r) = P(XY) / P(X)$ are two key measures. The support of a sequential rule $r = \langle e_1, e_2, \ldots, e_l \rangle$ is defined as the ratio

\[ supp(r, seq_T, g, w) = \frac{|W_r(seq_T, g, w)|}{|W(seq_T, w)|}, \]

implying the probability that the rule may occur in any window. Note that the numerator $|W_r(seq_T, g, w)|$ only counts the number of windows containing $r$'s occurrences, regardless of how many of them are found in the same window. Accordingly, the confidence of rule $r$ is the support of the entire rule $r$ over that of the antecedent, namely,

\[ conf(r, seq_T, g, w) = \frac{supp(r, seq_T, g, w)}{supp(\langle e_1, e_2, \ldots, e_{l-1} \rangle, seq_T, g, w)}. \]

All in all, the problem of mining a sequential rule with length $l$ from $seq_T$, subject to time thresholds $g$ and $w$, is to identify all rules satisfying the following two conditions:

\[ supp(r, seq_T, g, w) \ge min\_supp \tag{1} \]

\[ conf(r, seq_T, g, w) \ge min\_conf. \tag{2} \]
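To make the two measures concrete, the following Python sketch computes window-based support and confidence for the running example. It is a brute-force illustration under stated assumptions, not the authors' implementation: the helper names, the list-of-tuples sequence encoding, integer time points, and the non-strict ordering of time points within an occurrence are all choices made here.

```python
from fractions import Fraction
from itertools import product


def occurrences(rule, seq, g, w):
    """All (g, w)-occurrences of `rule` in `seq` (exhaustive search)."""
    times = [sorted(t for label, t in seq if label == e) for e in rule]
    result = []
    for combo in product(*times):
        pairs = list(zip(combo, combo[1:]))
        if all(a <= b and b - a <= g for a, b in pairs) and combo[-1] - combo[0] <= w:
            result.append(combo)
    return result


def windows(seq, w):
    """All half-open windows [ti, ti + w) overlapping the span of `seq`."""
    t1 = min(t for _, t in seq)
    tn = max(t for _, t in seq)
    return [(ti, ti + w) for ti in range(t1 - w + 1, tn)]


def support(rule, seq, g, w):
    """Fraction of windows that contain at least one occurrence of `rule`."""
    occs = occurrences(rule, seq, g, w)
    wins = windows(seq, w)
    hits = [win for win in wins
            if any(win[0] <= o[0] and o[-1] < win[1] for o in occs)]
    return Fraction(len(hits), len(wins))


def confidence(rule, seq, g, w):
    """Support of the whole rule over the support of its antecedent."""
    antecedent = support(rule[:-1], seq, g, w)
    return support(rule, seq, g, w) / antecedent if antecedent else Fraction(0)


seq_t1 = [("x", 11), ("x", 12), ("x", 13), ("w", 14),
          ("y", 15), ("x", 16), ("z", 17), ("x", 18)]
rule = ("x", "y", "z")

print(support(rule, seq_t1, 2, 5))     # 1/11
print(confidence(rule, seq_t1, 2, 5))  # 1/3
```

Here the antecedent x, y has a single occurrence at time points 13 and 15, which is contained in the three windows starting at 11, 12 and 13, so its support is 3/11 and the confidence of the full rule is 1/3.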

The above conditions define the minimum required support and confidence of qualified rules. In addition, for any rule $r$, we say that $r$ is a frequent rule if and only if condition (1) is satisfied.

3.2. The mining algorithm

Since the ultimate goal of rule mining is to anticipate the occurrence of customers' actions (rather than that of other contextual events), we only need to consider rules whose end event is actional. Such a rule is called an actional rule, denoted $r.tail.dim = \{D_{m+1}\}$, where $D_{m+1}$ is the dimension of customer action. An actional rule can also be written in the implication form $r: \langle e_1, e_2, \ldots, e_{l-1} \rangle \to e_l$. Note that the definition of a rule so far does not prohibit distinct dimension sets in different events of a rule. Such flexibility makes it possible to express general rules like $\langle$weather is good, called shopping buddy, shopping$\rangle$. However, this flexibility in dimensions poses a great challenge to the rule learning phase of an MPM system: in addition to the massive search space to be dealt with when enumerating nearby events and growing candidate rules, we have to consider a vast number of dimension combinations for each event, leading to a combinatorial explosion of the search space. Hence, in this paper, we only consider a restricted version of actional rules whose antecedent events all share an identical dimension set.

Definition 6. Sequential rule

A rule $r: \langle e_1, e_2, \ldots, e_{l-1} \rangle \to e_l$ is called a sequential rule if it is an actional rule and $e_1.dim = e_2.dim = \cdots = e_{l-1}.dim$. The common dimension set of the antecedent is denoted $r.dim$. Rule $r$ is said to be $k$-dimensional if $|r.dim| = k$.

The problem of mining sequential rules addressed in this paper is similar to the Episode Mining problem studied in [8,10,27,32,33]. In this paper, we modify the WINEPI algorithm proposed in [33] by incorporating thresholds $g$ and $w$ in order to enhance the pruning process for reducing the search space.

The algorithm is outlined in Fig. 2 (Algorithm 1). It adopts the level-wise strategy used in the Apriori algorithm in that the rules with length $k$ are generated in the $k$-th iteration (level). Initially, in the level 1 procedure, rules with length 1 are counted and stored (line 1 in Fig. 2). Each new rule in the level $k$ candidate set is generated by concatenating two concatenatable rules with length $k - 1$ (line 5 in Fig. 2). Hence, the length of the rule will grow by one in each iteration. Concatenation is the basic operation in the rule-growing process, formulated as follows.

Definition 7. Concatenation of overlapping rules

Given two rules $r_1 = \langle e_{1,1}, e_{1,2}, \ldots, e_{1,l} \rangle$ and $r_2 = \langle e_{2,1}, e_{2,2}, \ldots, e_{2,l} \rangle$, where $l \ge 2$ is their length. If for any $i = 2, \ldots, l$ we have $e_{1,i} = e_{2,i-1}$, we say that $r_1$ and $r_2$ are concatenatable. The outcome of concatenation is $concat(r_1, r_2) = \langle e_{1,1}, e_{1,2}, \ldots, e_{1,l}, e_{2,l} \rangle$, which has the length $l + 1$. $r_1$ and $r_2$ are referred to as the left and right rules of $concat(r_1, r_2)$, respectively.

The concatenation operation generates a new rule with length $l + 1$ by stitching together two rules with length $l$ and hence prevents extensive combinatorial explosion. Because this operation conforms to the downward-closure property [4,33], it exhausts all frequent rules with length $l + 1$. As a special case, initially two events (i.e., rules with length 1) can be directly concatenated together to form a new rule with length 2. The concatenation operation is performed iteratively to generate all possible rules, which are stored in the candidate set (line 5 in Fig. 2).
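The concatenation test reduces to a comparison of two overlapping slices. The short Python sketch below illustrates this; representing a rule as a tuple of event labels and the function names `concatenatable` and `concat` are assumptions made here for illustration.

```python
from typing import Tuple

Rule = Tuple[str, ...]  # assumed encoding: a rule is a tuple of event labels


def concatenatable(r1: Rule, r2: Rule) -> bool:
    """r1 without its first event must equal r2 without its last event."""
    return len(r1) == len(r2) and r1[1:] == r2[:-1]


def concat(r1: Rule, r2: Rule) -> Rule:
    """Stitch the left rule r1 and the right rule r2 into a rule one event longer."""
    if not concatenatable(r1, r2):
        raise ValueError("rules are not concatenatable")
    return r1 + r2[-1:]


print(concat(("x", "y"), ("y", "z")))          # ('x', 'y', 'z')
print(concat(("x",), ("y",)))                  # ('x', 'y'), the length-1 special case
print(concatenatable(("x", "y"), ("x", "z")))  # False
```

For rules of length 1 both slices are empty, so any two events are concatenatable, which matches the special case described above.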

The algorithm in Fig. 2 is explained as follows. The dimension set of the sequential rules to be extracted, denoted dimset, needs to be specified in advance. Given a data sequence seqT, the algorithm extracts rules with maximum length l and dimension set dimset. The aforementioned thresholds min_supp, min_conf, g, and w are parameters. Cand_i is used to temporarily store candidate rules with length i that are not yet pruned. In line 1, Cand_1 is initialized with the occurred events that are either actional or have the dimension set dimset. In the loop between lines 2 and 6, Freq_i is used to store frequent rules with length i, and the pruning method (denoted prune() in line 4) is then applied to eliminate rules violating either of the thresholds g or w. Each concatenatable rule pair is then merged to generate a new candidate with length i + 1. The above procedure is iterated until all frequent rules with length l are identified. In addition, since the goal is to extract sequential rules in which actional events do not appear in the antecedent, rules ending with an actional event will not be selected as the left rule when conducting concatenation (line 5, denoted $r_1.tail.dim \neq \{D_{m+1}\}$). This restriction on rule pair selection is another effective pruning step in the algorithm. Subsequently, lines 7 and 8 compute Freq_l so as to finalize the loop.

Ultimately, only sequential rules with a minimum length of 2 are chosen (line 9 in Fig. 2). Since the supports of all rules have been calculated and stored, computing their confidence is straightforward. Rules with confidence greater than min_conf are considered valid rules and are output.
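The level-wise growth described above can be sketched in Python as follows. This is only an illustrative outline, not the algorithm of Fig. 2: the `support` callable is a hypothetical stand-in that is assumed to already account for the thresholds g and w, the pruning step is folded into that callable, confidence filtering is omitted, and the toy sequence and toy support function exist solely to make the sketch runnable.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Rule = Tuple[Hashable, ...]  # assumption: a rule is a tuple of events


def frequent_rules(
    events: Sequence[Hashable],
    support: Callable[[Rule], float],        # hypothetical support function
    is_actional: Callable[[Hashable], bool],
    min_supp: float,
    max_len: int,
) -> Dict[int, List[Rule]]:
    """Return {length: frequent rules of that length} for lengths 1..max_len."""
    candidates: List[Rule] = [(e,) for e in events]
    frequent: Dict[int, List[Rule]] = {}
    for length in range(1, max_len + 1):
        frequent[length] = [r for r in candidates if support(r) >= min_supp]
        if length == max_len:
            break
        # Concatenate overlapping pairs; a rule ending with an actional
        # event is never used as the left rule.
        candidates = [
            left + (right[-1],)
            for left in frequent[length]
            if not is_actional(left[-1])
            for right in frequent[length]
            if left[1:] == right[:-1]
        ]
    return frequent


# Toy data (illustrative only): "x" plays the role of the actional event.
seq = list("ABxABxCB")


def toy_support(rule: Rule) -> float:
    """Fraction of positions at which the rule occurs contiguously in seq."""
    n = len(rule)
    hits = sum(1 for i in range(len(seq) - n + 1) if tuple(seq[i:i + n]) == rule)
    return hits / len(seq)


levels = frequent_rules("ABCx", toy_support, lambda e: e == "x", 0.2, 3)
```

In this toy run, the only frequent rule of length 3 is `("A", "B", "x")`, whose antecedent contains no actional event.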

4. Rule reduction

We now consider the redundancy problem motivated by the examples in Section 1. By specifying different dimension sets, the rule learning algorithm can identify sequential rules with various configurations of contextual dimensions. Rules with high dimensionality could be of great interest because they offer a more specific and accurate description of context; they may nevertheless be considered redundant if they do not carry more knowledge than their lower-dimensional variations. Traditionally, redundant dimensions can be spotted by various dimension (feature) selection methods, such as applying heuristics [19,26], so that they can be excluded from rule extraction in the first place. However, conventional dimension selection methods consider the redundancy problem at the dimension level, but ignore the fact that a contextual dimension that is valueless for some rules could be valuable for others. To this end, this study focuses on the redundancy problem at the rule level.

Fig. 2. Algorithm 1: Extracting rules from a data sequence.

In association rule mining research, a number of criteria have been proposed to determine whether a rule r1: XY → e is redundant with regard to its closure r2: X → e, where X and Y are items. The most representative criteria fall into two categories. Association rule r1 is redundant in terms of r2 if and only if:

(1) conf(r1) ≤ conf(r2) [6,7], or
(2) conf(r1) = conf(r2) and supp(r1) ≥ supp(r2) [31,55,62]

The above criteria can be straightforwardly applied in multi-dimensional settings if states in different dimensions are considered as items. In other words, to determine whether a state Y ∈ Dk is necessary for a given sequential rule r2, we can simply compare the confidence and support of r2 with those of its specialization on Y, i.e. r1. These criteria consider the relationship between r2 and r1 but ignore the overall effect of the dimension Dk that Y comes from. As shown in Example 3 in Section 1, the set of sequential rules derived using the newly added dimension "Day of the week" should be considered together, rather than as individual sequential rules, in order to infer whether the new dimension brings additional knowledge into the rule base. As a remedy, we have developed a new measure based on the concept of specialization defined below. Note that since the consequent of a sequential rule is single-dimensional, only the antecedent part needs to be examined. Hence, for convenience, we here assume that the length of a sequential rule is l + 1, such that l is the length of the antecedent. For simplicity, the term "rule" specifically refers to a sequential rule hereafter in this paper.

Definition 8. Specialization

Let $r: \langle e_1, e_2, \ldots, e_l \rangle \rightarrow e_{l+1}$ be a sequential rule with dimension set $r.dim = \{D_1, D_2, \ldots, D_{m-1}\}$, where each $e_i = \{v_1 = p_{i1}, v_2 = p_{i2}, \ldots, v_{m-1} = p_{i(m-1)}\}$, $i = 1, \ldots, l$, is an event with m − 1 restricted dimensions. Let $r': \langle e'_1, e'_2, \ldots, e'_l \rangle \rightarrow e_{l+1}$ be a rule with $r'.dim = \{D_1, D_2, \ldots, D_m\}$, where each $e'_i = \{v_1 = q_{i1}, v_2 = q_{i2}, \ldots, v_m = q_{im}\}$, $i = 1, \ldots, l$, is an event with m restricted dimensions. If $p_{ij} = q_{ij}$ for any $i = 1, \ldots, l$ and $j = 1, \ldots, m-1$, we say that r′ is a specialization of r on dimension Dm. The set of all possible specializations of r on Dm is called the specialization set of r on Dm, denoted spec(r,Dm). Conversely, rule r is called the generalization of spec(r,Dm).
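As a small illustration of Definition 8, the following Python sketch checks whether one rule is a specialization of another on a newly added dimension. The representation is an assumption made for this example only: an event is a dictionary from restricted dimension names to values, and a rule is a pair of an antecedent list and a consequent. The helper `spec_set_size` mirrors the count $|dom(D_m)|^l$ used later in this section.

```python
from typing import Any, Dict, List, Tuple

# Assumed representation for this sketch only.
Event = Dict[str, Any]           # restricted dimension -> value
Rule = Tuple[List[Event], Any]   # (antecedent events, consequent)


def is_specialization(r_spec: Rule, r_gen: Rule, new_dim: str) -> bool:
    """True if r_spec restricts exactly one extra dimension (new_dim) in every
    antecedent event and agrees with r_gen on all other dimensions."""
    ante_s, cons_s = r_spec
    ante_g, cons_g = r_gen
    if cons_s != cons_g or len(ante_s) != len(ante_g):
        return False
    for e_s, e_g in zip(ante_s, ante_g):
        if new_dim in e_g or set(e_s) != set(e_g) | {new_dim}:
            return False
        if any(e_s[d] != v for d, v in e_g.items()):
            return False
    return True


def spec_set_size(domain_size: int, antecedent_length: int) -> int:
    """Number of possible specializations: |dom(Dm)| ** l."""
    return domain_size ** antecedent_length


general = ([{"v1": "office"}], "e")
on_saturday = ([{"v1": "office", "v2": "Sat"}], "e")
elsewhere = ([{"v1": "home", "v2": "Sat"}], "e")
```

Here `on_saturday` is a specialization of `general` on `v2`, whereas `elsewhere` is not, because it changes the value of `v1`.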

The level of redundancy of spec(r,Dm) can be gauged by how uniform the frequency distribution of the rules in spec(r,Dm) is. An information entropy-based measure, which is widely used to quantify the diversity of a probability distribution and the amount of information [30], is adopted in this paper.

The entropy of a specialization set spec(r,Dm) is the sum of two parts: the entropy of frequent specializations, which can be calculated directly, and the entropy of infrequent specializations, which is unavailable because infrequent rules are pruned in the rule-learning phase. Suppose that, with the inclusion of dimension Dm, the set of m-dimensional frequent rules with respect to r is denoted $R(r,D_m) = \{r_j \mid r_j \in spec(r,D_m) \text{ and } r_j \text{ is frequent}\}$. The information amount of frequent specializations can then be calculated by

$$I_{freq}(r, D_m) = -\sum_{r_j \in R(r, D_m)} \frac{\mathrm{supp}(r_j)}{\mathrm{supp}(r)} \log \frac{\mathrm{supp}(r_j)}{\mathrm{supp}(r)}. \tag{4}$$

each rule.

Assume that $r: \langle e_1, e_2, \ldots, e_l \rangle \rightarrow e_{l+1}$ is the rule to be compared with, and r[k] denotes the k-th event $e_k$ in rule r. Given a matched portion $\langle e_1, e_2, \ldots, e_i \rangle$ of r, $i = 1, \ldots, l$, and an incoming snapshot S(t), if $S(t) \in e_{i+1}$, the current input state S(t) is considered to extend the matched portion. Consequently, the conditional probability $p(e \mid e_1, e_2, \ldots, e_{i+1})$ is examined: if it is no less than the predefined threshold, the entire rule r is considered a confident match, and hence its corresponding actional event e can be triggered (lines 7–9 in Fig. 4). Otherwise, we record the current match and wait for the next incoming state (line 12 in Fig. 4), since a longer match is expected to achieve higher confidence [56]. Whenever either of the time thresholds g or w is violated, the current matching with rule r is considered a failure (line 4 in Fig. 4); we can then start over by resetting the matching status maintained in the three tables and waiting for the new incoming state (line 5 in Fig. 4). In particular, owing to the level-wise nature of the rule extraction algorithm during the learning stage, the conditional probabilities associated with all prefixes

On the other hand, based on the Principle of Indifference [20,23], we assume that the remaining infrequent specializations are equally probable. Thus the probability of each infrequent specialization is the average of the remaining probability, which can be calculated by

$$p_{inf} = \frac{1 - \sum_{r_j \in R(r, D_m)} \frac{\mathrm{supp}(r_j)}{\mathrm{supp}(r)}}{|spec(r, D_m)| - |R(r, D_m)|},$$

where the denominator is the estimated number of infrequent specializations of r. Because spec(r,Dm) is the set of all combinations of m-dimensional specializations with length l, we can calculate $|spec(r, D_m)| = |dom(D_m)|^l$, where dom(Dm) is the value domain of Dm.

Therefore, the total information amount of all infrequent specializations can be estimated by

$$I_{inf}(r, D_m) = -p_{inf} \log p_{inf} \cdot \left( |spec(r, D_m)| - |R(r, D_m)| \right). \tag{5}$$

Using the bounds $[0, \log |spec(r, D_m)|]$, we normalize the entropy-based redundancy degree into [0,1] as the ratio of the overall entropy of the specializations to the upper bound, that is,

$$Redun(r, D_m) = \frac{I_{freq}(r, D_m) + I_{inf}(r, D_m)}{\log |spec(r, D_m)|}. \tag{6}$$

A large value of the above measure (i.e., close to 1) is undesirable because it implies that spec(r,Dm) conveys little knowledge beyond that implied by r.

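We use the scenario in the three examples discussed in Section 1 to illustrate the redundancy calculation in Eqs. (4), (5), and (6). To avoid lengthy calculation, we start from a one-dimensional frequent rule r with length 2, $\langle \{v_1 = \text{office}\} \rangle \rightarrow e$, and assume that the support of r is supp(r) = 0.2.

Suppose the frequent specializations on the dimension "Day of the week" are the following:

$$\begin{aligned}
\mathrm{supp}(\langle \{v_1 = \text{office}, v_2 = \text{Sat}\} \rangle \rightarrow e) &= 0.08 \\
\mathrm{supp}(\langle \{v_1 = \text{office}, v_2 = \text{Sun}\} \rangle \rightarrow e) &= 0.09 \\
\mathrm{supp}(\langle \{v_1 = \text{office}, v_2 = \text{Mon}\} \rangle \rightarrow e) &= 0.02.
\end{aligned}$$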

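Thus, we have $|spec(r, D_m)| = 7^1 = 7$ (i.e., the number of days in a week), and

$$p_{inf} = \frac{1 - \frac{0.08}{0.2} - \frac{0.09}{0.2} - \frac{0.02}{0.2}}{7 - 3} = 0.0125,$$

$$I_{inf} = -0.0125 \log 0.0125 \times (7 - 3) \approx 0.316,$$

$$I_{freq} = -\frac{0.08}{0.2} \log \frac{0.08}{0.2} - \frac{0.09}{0.2} \log \frac{0.09}{0.2} - \frac{0.02}{0.2} \log \frac{0.02}{0.2} \approx 1.379.$$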

Then the redundancy of the dimension "Day of the week" with regard to r is calculated as: Redun(r,Dm) = (0.316 + 1.379)/log 7 ≈ 0.604.
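The calculation in Eqs. (4)–(6) can be reproduced with a short Python sketch. The function name `redundancy` and its argument layout are illustrative assumptions rather than part of the original algorithm. The logarithm base is not stated in the text; base 2 is assumed here because it reproduces the figures 0.316, 1.379, and 0.604 of the worked example.

```python
import math
from typing import Sequence


def redundancy(supp_r: float, freq_supps: Sequence[float],
               dom_size: int, length: int) -> float:
    """Entropy-based redundancy of spec(r, Dm), following Eqs. (4)-(6).

    supp_r     -- support of the general rule r
    freq_supps -- supports of the frequent specializations R(r, Dm)
    dom_size   -- |dom(Dm)|
    length     -- antecedent length l, so |spec(r, Dm)| = dom_size ** length
    """
    n_spec = dom_size ** length
    if n_spec < 2:
        raise ValueError("log|spec| is zero; the measure is undefined")
    probs = [s / supp_r for s in freq_supps]
    i_freq = -sum(p * math.log2(p) for p in probs)            # Eq. (4)
    n_inf = n_spec - len(probs)
    remaining = 1.0 - sum(probs)
    i_inf = 0.0
    if n_inf > 0 and remaining > 0:
        p_inf = remaining / n_inf          # Principle of Indifference
        i_inf = -p_inf * math.log2(p_inf) * n_inf              # Eq. (5)
    return (i_freq + i_inf) / math.log2(n_spec)                # Eq. (6)


# Worked example: supp(r) = 0.2, three frequent specializations, 7 weekdays.
example = redundancy(0.2, [0.08, 0.09, 0.02], dom_size=7, length=1)
```

Under these assumptions, `example` evaluates to approximately 0.604, in agreement with the value derived above.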

From the calculation above, we find that the new dimension "Day of the week" carries extra knowledge for the given rule r because its redundancy level is significantly smaller than 1 (based on our trial-and-error simulations, when dom(Dm) is not a large set, a Redun value greater than 0.9 could be considered significantly redundant). In fact, the supports given above already show intuitively that the sequential occurrence of "Office" and actional event e is significantly more frequent on 2 days (Sun and Sat) than on the other days of the week. Additionally, a byproduct of the above process is the generalization of context in the conceptual hierarchy, i.e. some values in a contextual dimension could be generalized (e.g., Monday–Friday to Weekdays) if they result in a high information amount. This topic is beyond the scope of the current paper and will be discussed in our other works.

Based on the definition of the redundancy measure, the algorithm for rule reduction is sketched in Fig. 3. Various rules with different dimension sets are stored in the original rule base. First, each rule r and its specializations in the original rule base are examined (line 3), where D \ (r.dim) is the set of dimensions not involved in rule r. If the redundancy level of the specialization set being tested does not exceed a given threshold, the frequent rules in this specialization set are added into RBr (line 5 in Fig. 3); otherwise rule r is incorporated while its specialization set is discarded (line 7 in Fig. 3). The threshold in line 4 can be determined by trial and error through simulations.
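The reduction step described above can be sketched as follows. This is a literal reading of the prose rather than a reproduction of Fig. 3: the mapping `spec_sets`, the callable `redun`, and the rule identifiers are hypothetical, and the sketch makes no assumption beyond the two branches stated in the text.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


def reduce_rule_base(
    spec_sets: Dict[Tuple[Hashable, Hashable], List[Hashable]],
    redun: Callable[[Hashable, Hashable], float],   # hypothetical Redun(r, Dm)
    threshold: float,
) -> Set[Hashable]:
    """Build a reduced rule base from (rule, dimension) -> frequent specializations.

    If the redundancy of a specialization set does not exceed the threshold,
    its frequent rules are kept; otherwise the general rule is kept and the
    specialization set is discarded.
    """
    reduced: Set[Hashable] = set()
    for (rule, dim), frequent_specs in spec_sets.items():
        if redun(rule, dim) <= threshold:
            reduced.update(frequent_specs)
        else:
            reduced.add(rule)
    return reduced


# Toy input (illustrative identifiers and scores only).
toy_sets = {("r", "Day"): ["r_sat", "r_sun"], ("q", "Day"): ["q_mon"]}
toy_scores = {("r", "Day"): 0.604, ("q", "Day"): 0.95}
toy_reduced = reduce_rule_base(toy_sets, lambda r, d: toy_scores[(r, d)], 0.9)
```

With a threshold of 0.9, the specializations of `r` are retained, while `q` is kept in place of its specialization set.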

5. Generating prediction by rule matching

In order to predict an actional event, we need to match the data stream with previously identified rules relevant to the actional event. In this study, we have developed a generic approach that allows inconsecutive matching while factoring in the time thresholds (g and w). This approach is underpinned by the n-gram prediction mechanism, which was originally used for sequence comparison and has been applied to predict Web requests on the basis of historical access patterns [46,58]. In order to allow neighboring events in a rule to find their corresponding occurrences inconsecutively in the incoming sequence, approximate matching [17] is implemented in the matching process. The rule matching process is sketched in Algorithm 3 in Fig. 4, which is an infinite procedure that keeps monitoring and processing the next incoming event. Suppose that RB is the rule base with k rules, all of which have sufficiently high support and confidence. Three tables are used to maintain the current matching status of each rule: matched[1…k] maintains the last matched event of each rule, time_last[1…k] maintains the time point of the last matched event of each rule, and time_first[1…k] records the time point of the first matched event of

Fig. 3. Algorithm 2: Reducing a rule base.

of a rule are readily available, so there is no need to calculate them during the matching process. For each new state, the algorithm scans all rules in the rule base and attempts to find a match from the current matching position of each rule stored in the table matched.

Note that multiple rules can be applicable simultaneously, since more than one rule can have a conditional probability no less than the threshold in one scan iteration. While the algorithm in Fig. 4 only chooses the first matched rule (line 9 in Fig. 4), it can easily be extended to adopt different criteria, such as the shortest rule, the largest probability, or a predefined profit/cost matrix [1,57], to determine the most applicable rules.

Fig. 4. Algorithm 3: Matching input sequence to generate prediction.

For real-time MPM applications, an online processing method that handles the input serially, piece by piece, as depicted in Algorithm 3, is preferable. In traditional string comparison algorithms, inconsecutively comparing elements in two strings can be solved by using Dynamic Programming [17] with time complexity O(MN), where M and N are the lengths of the two strings. Algorithm 3 attempts to align an input sequence with a fixed rule incrementally, hence the maximum time complexity is reduced to O(N), where N is the length of the input sequence. As such, the time complexity of comparing one incoming state with a rule becomes constant, and comparing a state with all rules in a rule base has complexity O(|RB|), where |RB| is the size of the rule base RB.
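The incremental matching idea can be sketched in Python as below. This is a hedged illustration and not Algorithm 3 itself: the class `OnlineMatcher`, the rule layout (antecedent, precomputed prefix confidences, action), and the helper `satisfies` are assumptions introduced for this example. The table `matched` stores the number of matched antecedent events instead of the last matched event, and resetting a rule after it fires is also an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

Event = Dict[str, Any]                        # restricted dimension -> value
Rule = Tuple[List[Event], List[float], Any]   # (antecedent, prefix confidences, action)


def satisfies(snapshot: Dict[str, Any], event: Event) -> bool:
    """A snapshot matches an event if it agrees on every restricted dimension."""
    return all(snapshot.get(dim) == val for dim, val in event.items())


class OnlineMatcher:
    def __init__(self, rules: List[Rule], g: float, w: float, min_conf: float):
        self.rules, self.g, self.w, self.min_conf = rules, g, w, min_conf
        self.matched = [0] * len(rules)   # matched antecedent events per rule
        self.time_first: List[Optional[float]] = [None] * len(rules)
        self.time_last: List[Optional[float]] = [None] * len(rules)

    def _reset(self, k: int) -> None:
        self.matched[k], self.time_first[k], self.time_last[k] = 0, None, None

    def feed(self, t: float, snapshot: Dict[str, Any]) -> Optional[Any]:
        """Process one incoming state; constant work per rule, O(|RB|) overall."""
        for k, (antecedent, prefix_conf, action) in enumerate(self.rules):
            i = self.matched[k]
            # Gap (g) or width (w) violated: the partial match fails.
            if i > 0 and (t - self.time_last[k] > self.g
                          or t - self.time_first[k] > self.w):
                self._reset(k)
                continue
            if not satisfies(snapshot, antecedent[i]):
                continue                  # inconsecutive matching: skip this state
            if i == 0:
                self.time_first[k] = t
            self.time_last[k] = t
            self.matched[k] = i + 1
            if prefix_conf[i] >= self.min_conf:
                self._reset(k)            # assumption: restart after firing
                return action             # first confident rule wins
            if self.matched[k] == len(antecedent):
                self._reset(k)            # fully matched but not confident
        return None


matcher = OnlineMatcher(
    [([{"loc": "A"}, {"loc": "B"}], [0.5, 0.9], "call")], g=10, w=20, min_conf=0.8
)
outputs = [
    matcher.feed(0, {"loc": "A", "day": 1}),
    matcher.feed(5, {"loc": "C"}),
    matcher.feed(8, {"loc": "B"}),
]
```

In this toy run the unrelated state at time 5 is skipped, and the action is triggered at time 8, once the confidence of the matched prefix reaches the threshold.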

6. Evaluation

To validate the methods proposed for extracting, selecting and ranking sequential rules, we conduct experiments on a context database referred to as Nokia Context Data [18]. The dataset consists of a sequence of contextual data. In addition to validating the framework proposed in this study, the dataset is used as a manifestation of the MPM scenario mentioned in Section 1. The evaluation encompasses two experiments:

Experiment I. In order to examine the ability of the proposed framework to identify effective multi-dimensional rules, we apply the learning and matching algorithms to the Nokia Context Data to generate rule bases and make predictions. The experiment shows that by taking additional contextual dimensions into consideration, the yielded rule bases may outperform the location-only rule base in predictiveness. However, the actual performance results depend largely upon the contextual dimensions that have been used.

Experiment II. We apply several rule-level reduction methods to a rule base produced in Experiment I and compare the newly generated rule bases with the original one. To examine how the predictiveness of a rule base is affected after the reduction, we apply the matching algorithm to the reduced rule bases and compare their prediction performance with that of the original one.

The remainder of this section introduces the dataset and the adopted prediction measures, and then reports the experiment setup and evaluation results. In this paper we use bar charts instead of lines when reporting the performance, since many series are close in value.

6.1. Data description and measures

The Nokia context dataset [18] consists of a set of feature files for 43 different recording sessions. In each session, the same user carrying a number of devices is going from home to the workplace or vice versa, during which he may choose different means of transportation. Portable sensors were used to record the carrier's contextual information, such as atmospheric pressure and temperature. GSM positioning technology was used to locate the user's current geographical position. There are a total of 15 dimensions in this dataset, of which 14 are contextual dimensions and one records actions, including interactions with the mobile phone such as calls, short messages and Web pages accessed. The interactions in this dataset are viewed as the actional events in our experiments. A summary of the dataset with exemplar dimensions is presented in Table 1.

The measures of prediction effectiveness used in this research are precision, recall and F1, which have been adopted widely in previous studies [22,56]. Precision represents the probability that a predicted actional event actually occurs, whereas recall represents the probability that an actional event is predicted. Let TP be the number of actually occurred actional events that were predicted, FP be the number of actional events that were predicted but did not actually occur, and FN be the number of actional events that were not predicted but actually occurred; thus, Precision = TP / (TP + FP), and Recall = TP / (TP + FN). Another widely used measure that combines precision and recall into a single metric is F1, which can be calculated by F1 = 2 × Precision × Recall / (Precision + Recall) [48]. In all experiments, 6-fold cross-validation is adopted.
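For concreteness, the three measures can be computed as in the following Python sketch. The function name `precision_recall_f1` is an illustrative choice, and returning 0.0 when a denominator is zero is a convention assumed here, not something specified in the text. The counts in the example are made up for illustration and are not experimental results.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2PR/(P+R)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 correct predictions, 2 false alarms, 8 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these counts, precision is 0.8 and recall is 0.5, and F1 lies between the two, closer to the lower value.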

6.2. Experiment I

We employ the rule learning algorithm on various dimension sets. The identified rules are denoted in the format of, for example, {v1 =

The setting of thresholds is specific to different applications and datasets. The two time thresholds in the learning process of our experiments are set to g = 36,000 s and w = 70,000 s (approximately 10 and 20 h, respectively), such that we will not miss the co-occurrence of two neighboring events occurring in different periods of a single day (e.g., morning, afternoon, etc.). Note that the effect of window size on the performance of the rule mining algorithm has been discussed in [33]. In order to compare the results of multi-context prediction and location-based prediction, we have conducted this experiment in the following three steps.

Step 1: we have compared the predictiveness of different rule bases, each of which is generated using one contextual dimension available in the dataset. It has been confirmed that the dimension of location is indeed an effective dimension, which

Table 1
Data samples from the Nokia context dataset.

| Record type | Exemplar dimensions | Value domains/format | Example |
|---|---|---|---|
| Context | v1: Location | Area Code, Cell ID | 1,3 |
| | v2: Day name | 1 ~ 7 (Mon, …, Sun) | 1 |
| | v3: Day period | 1 ~ 4 (Night, Morning, Afternoon, Evening) | 1 |
| v4: Actional Event | Launch an application | a,(0–31) | a,2 |
| | Access a Web page | b,(0–13) | b,13 |
| | Initiate communication | c,(0–4) | c,3 |

⟨1,1⟩, v2 = 1, v3 = 4}{v1 = ⟨1,3⟩, v2 = 1, v3 = 4}{v1 = ⟨1,4⟩, v2 = 1, v3 = 4} → {v4 = ⟨c,1⟩}, in which {v4 = ⟨c,1⟩} is an actional event meaning making a phone call to a certain number. This rule implies that if on Monday evening the mobile device carrier visits three places ⟨1,1⟩, ⟨1,3⟩ and ⟨1,4⟩ in turn, it is very likely that s/he will call a number afterwards (in the same time window). Using the matching algorithm, prediction of an actional event (i.e. calling a number) is done by comparing the sequence of context that has been received with the extracted rules. As another example, the extracted rule {v1 = ⟨3,46⟩, v3 = 2} → {v4 = ⟨a,12⟩} implies that if the device carrier visits location ⟨3,46⟩ in the morning, s/he tends to launch a certain application afterwards, denoted ⟨a,12⟩.

Fig. 5. Performance of rule bases versus different min_conf thresholds (in a, b, and c) and min_supp thresholds (in d, e, and f).

can be used to generate a single-dimensional rule base with the best precision, recall and F1 among all available contextual dimensions.

Step 2: since location is the context adopted extensively by various LBS applications [44], location-based prediction is used as the baseline for comparison. We thus need to examine whether the use of multi-dimensional contextual information achieves better predictiveness than the baseline. Specifically, we have compared the performance of rule bases generated by Location (denoted L), Location + Day of a Week (denoted LD), Location + Period of a Day (denoted LP), and the mix of all the 3 aforementioned dimensions (denoted LDP). Fig. 5(a)–(c) shows the performance (precision, recall and F1) of the four rule bases versus different confidence thresholds (0.7, 0.8, 0.9, and 0.95, respectively, while min_supp is fixed to 0.1). It can be seen

multidimensional rules so as to achieve better performance. Here we have examined the additional improvement in predictiveness caused by introducing extra contextual dimensions into the baseline rule base. For instance, if we consider the predictiveness of Day along with Location in addition to Location only, the overall performance of the combinations Day, Location, and Location + Day should be considered as a whole. In the experiment we have observed the performance of 5 rule bases with a variety of dimension combinations: RB1 (Location), RB2 (Location, Location + Day), RB3 (Location, Location + Period), RB4 (Location, Location + Day + Period) and RB5 (Location, Location + Period, Location + Day, Location + Period + Day), so that the additional improvement introduced by Period and Day can be scrutinized. Note that time-related dimensions are with regular-

tively, while min_supp is fixed to 0.1). As shown in Fig. 6(a) and (b),

Table 2
Number of rules in rule bases at different min_supp levels (when min_conf = 0.8).

| min_supp | # of rules: L | LD | LP | LDP |
|---|---|---|---|---|
| 0.08 | 191 | 150 | 209 | 67 |
| 0.09 | 98 | 70 | 139 | 25 |
| 0.1 | 39 | 31 | 83 | 3 |

that the precisions of rule bases generated by multiple dimensions, i.e., LP, LD, and LDP, are higher than that of L at all min_conf levels. The figure also shows that the recall of L, on the contrary, is significantly better than that of the other 3 rule bases. Such a result is natural to expect, since a predictive model with more variables normally outperforms one with a single variable in prediction precision; on the other hand, given the same training dataset, the former generally yields fewer rules, and hence may result in lower coverage of all actional events (lower recall). Dragged down by low recall, the overall performance (measured by F1) of LD and LDP is thus worse than the baseline. Fig. 5(d)–(f) depicts the performance versus different min_supp thresholds (0.08, 0.09, 0.1, while min_conf is fixed to 0.8), in which a similar trend can be observed. In order to investigate the impact of the support threshold on the number of extracted rules, we have inspected the size of rule bases at different min_supp levels (the confidence threshold is set to 0.8, as shown in Table 2). As expected, the number of rules in each rule base increases dramatically when the support threshold goes down. Interestingly, the number of rules in the 2-dimensional rule base LP is larger than that of the single-dimensional rule base L. The reason is that, while high-dimensional rules are rarer than low-dimensional rules, rules with dimensions Location + Period generally have higher confidence, hence fewer of them are screened out by the specified confidence threshold. This finding implies that, when used in conjunction with Location, Period is indeed an informative contextual dimension.

Step 3: while it has been shown in step 2 that multi-context prediction does not necessarily outperform location-based prediction in overall predictiveness, using extra contextual dimensions could still be beneficial. When various contextual dimensions are available, a compound rule base can be utilized. That is, a rule base can be constructed by mixing single dimensional rules and

Fig. 6. Performance of compound rule bases versus different min_conf thresholds.

while rule bases RB3 and RB5, involving the combination of Location and Period, do not improve dramatically on RB1 in terms of precision, they achieve significantly better recall, indicating that a larger portion of actional events can be predicted when additional contextual dimensions are utilized. Consequently, better overall effectiveness measured by F1 is observed for RB3 and RB5 in Fig. 6(c).

The result leads to the following implications. First, contextual dimensions other than Location matter in predicting the activities of a customer. Therefore, by utilizing extra informative contexts, the overall predictiveness of various existing LBS systems can be improved. Second, incorporating more contextual dimensions does not necessarily lead to a better result. For instance, though RB4 uses one more dimension (Day) than RB3, the F1 of RB3 significantly exceeds that of RB4 (Fig. 6(c)), since the dimension Day does not contribute as much additional information as Period when used in conjunction with Location. This also explains the lack of difference between RB1 and RB2. A plausible reason is that the user's activity patterns do not differ much across the days of a week.

Overall, the findings in this experiment support the claim that, given a multidimensional sequence of contextual data, the proposed framework and the associated algorithms are able to effectively identify rules with improved predictiveness, compared with using the location dimension alone.

6.3. Experiment II

In this experiment, we compare the proposed entropy-based method (referred to as ENT) for rule reduction with the two methods

city per se, hence single-dimensional rules using either Day or Period are spurious and thus should not be considered.

Fig. 6(a)–(c) depict the performance of compound rule bases RB1–RB5 versus different min_conf thresholds (0.7, 0.8, 0.9, and 0.95 respec-

discussed in Section 4. One is referred to as RM1 [6,7] and the other as RM2 [31,62]. The three reduction methods are applied to rule base RB5 only; the reduction results are reported in Table 3. Note that

reduction does not deteriorate the overall prediction performance (F1) while the size of the rule base is reduced. On the other hand, a quality rule base, in which redundancy is minimized and precision is guaranteed, can indeed help detect the rare regularity of mobile customers. Furthermore, at almost all confidence levels, the combination of ENT and RM1 outperforms ENT or RM1 alone in precision, implying that using the two methods together can produce a smaller rule base with better precision than using them separately.

7. Conclusion

This research aims to provide a novel solution to enable context-dependent MPM. We propose a prediction framework to detect temporal correlations between the mobile user's context and his/her preferences. This framework consists of a series of algorithms that process a multidimensional contextual data sequence. As many challenges

Table 3
Reduction results after applying different approaches on various rule bases.

| Method | L | LD | LP | LDP | Overall |
|---|---|---|---|---|---|
| ORIGINAL | 39 | 31 | 83 | 3 | 156 |
| RM1 | 39 | 28 (9.68%) | 79 (4.82%) | 3 (0.00%) | 149 (4.49%) |
| RM2 | 39 | 31 (0.00%) | 83 (0.00%) | 3 (0.00%) | 156 (0.00%) |
| ENT | 39 | 29 (6.45%) | 74 (10.84%) | 2 (33.33%) | 144 (7.69%) |
| RM1 + ENT | 39 | 26 (16.13%) | 73 (12.05%) | 2 (33.33%) | 140 (10.26%) |

Note: The integers indicate the number of rules remaining after reduction, and the percentages in parentheses indicate the ratio of reduction compared with the original rule base. Labels L, LD, LP, and LDP have the same meaning as described in Experiment I. Numbers in bold indicate the sizes of rule bases that are actually reduced.
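The reduction ratios in Table 3 follow directly from the rule counts. The short Python sketch below recomputes the "Overall" column from the per-rule-base counts in the table; the dictionary layout and the helper name `reduction_pct` are illustrative choices.

```python
# Rule counts copied from Table 3.
original = {"L": 39, "LD": 31, "LP": 83, "LDP": 3}
remaining = {
    "RM1": {"L": 39, "LD": 28, "LP": 79, "LDP": 3},
    "RM2": {"L": 39, "LD": 31, "LP": 83, "LDP": 3},
    "ENT": {"L": 39, "LD": 29, "LP": 74, "LDP": 2},
    "RM1 + ENT": {"L": 39, "LD": 26, "LP": 73, "LDP": 2},
}


def reduction_pct(before: int, after: int) -> float:
    """Share of rules removed, in percent, rounded to two decimals."""
    return round(100.0 * (before - after) / before, 2)


total_before = sum(original.values())
overall = {
    method: reduction_pct(total_before, sum(counts.values()))
    for method, counts in remaining.items()
}
```

For example, `reduction_pct(31, 26)` gives the 16.13% reported for LD under RM1 + ENT.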

RM2 has no impact on the rule base at all because the condition supp(r1) < supp(r2) almost consistently holds in datasets with a large number of records, making the reduction criterion too loose to be applicable in screening multidimensional sequential rules.

We have scrutinized the results and found that some rules which survive the screening of RM1 are filtered by ENT. For example, a rule {v1 = ⟨2,12⟩}{v1 = ⟨2,13⟩} → {v4 = ⟨a,12⟩} has two frequent specializations, {v1 = ⟨2,12⟩, v3 = 2}{v1 = ⟨2,13⟩, v3 = 2} → {v4 = ⟨a,12⟩} and {v1 = ⟨2,12⟩, v3 = 3}{v1 = ⟨2,13⟩, v3 = 3} → {v4 = ⟨a,12⟩}, in the rule base. Although the two specializations are retained by RM1 because their confidence is higher than that of their generalization, they are filtered by the ENT reduction method.

Experiment results in Table 3 also lead to an important implication in practice: the two different reduction methods, ENT and RM1, complement each other and can generate synergistic reduction results when used together. This is because they define the concept of redundancy from different perspectives. RM1 views the higher-dimensional rule with lower confidence as a redundant rule, whereas ENT determines redundancy by examining the frequency distribution of the rules in a specialization set.

To study whether the reduction has affected prediction effectiveness, we examine the prediction results using the reduced rule base. Fig. 7 depicts precision, recall and F1 scores at different confidence levels, both before and after the rule reduction process.

Fig. 7(a) shows that at almost all confidence levels, the redundancy-removed rule base has higher precision than the original one, especially when both methods (RM1 + ENT) are used. Fig. 7(b) shows that the recall is reduced after some rules are removed from the original rule base, which reveals a side-effect of redundancy elimination: a relatively smaller rule base tends to cover a smaller portion of all the involved actional events. As such, rule reduction has a negative effect on recall. The overall prediction performance (F1) is compared in Fig. 7(c), in which the improvement of precision is neutralized, in large part, by the decrease of recall. Nevertheless, as shown in Fig. 7(c), the rule

Fig. 7. Comparison of Precision, Recall, and F1 in reduced rule bases.

have to be addressed before multidimensional sequential rules can be utilized to their full potential, this paper describes not only ways to discover multidimensional sequential rules, but also ways to overcome the common barrier in mining multidimensional sequential rules, i.e. eliminating rules with redundant dimensional information. We first formulate the task of multidimensional sequential rule mining into the classical association rule-mining problem by introducing the concepts of snapshot and event, and then develop the corresponding rule-learning algorithm based on an Apriori-like method to identify multidimensional sequential rules. Furthermore, we develop an entropy-based measure to quantify the redundancy level of multidimensional rules in order to select rules effective for prediction. The approximate string matching method is adapted to allow online prediction.

To evaluate our framework and demonstrate its applicability, we validate the proposed algorithms using the Nokia context dataset, which contains mobile device carriers' contextual information along with their interactions with mobile devices. The experimental results indicate that the proposed approach can effectively identify multidimensional sequential rules. Moreover, rule reduction coupled with redundancy measures can effectively remove redundant rules while preserving the predictiveness of the rule base. Additionally, although it has been widely and intuitively assumed that multidimensional rules can predict with better accuracy than single-dimensional rules, we have empirically shown that the actual performance depends on the dimensions used.

Other than the MPM domain, the proposed framework for mining multidimensional data sequences may have extensive applications in other areas where multidimensionality needs to be incorporated in the mining process to enhance the prediction of future events.

Our approach does not attempt to make precise predictions about people's daily activities, which could be extremely difficult due to the inherent randomness of human behavior. Instead, we seek to discover significant patterns from a customer's contextual information, based on which we hope some interesting actional events can be anticipated.


Hence, the effectiveness of the prediction model, in large part, relies on the extent of regularity of a customer's daily activities.

We plan to further enhance this study as follows. The rule-learning method used in this research is based on the Apriori support-confidence filters. This mechanism, however, cannot spot rules with low support but high confidence, i.e. sparse patterns with high business value. One solution to this problem is to filter rules based on the confidence constraint only. This idea is used in traditional association mining studies [29,51], but has not attracted much attention from the perspective of sequential rule mining. Additionally, with the Apriori rule generation mechanism, if most of the rules in a specialization set are infrequent, the calculation of its redundancy can be inaccurate, because infrequent rules will not be generated in the first place and thus their information amount has to be estimated by assuming that they are equally probable. A confidence-based filter, consequently, may also help address this problem and thereby further enhance the proposed reduction method.

Acknowledgment

We would like to thank the anonymous reviewers for their constructive reviews; their comments were of great help in improving the manuscript. This research was mainly supported by grant MYRG005(Y1-L1)-FBA12-TH of the University of Macau. It was also supported by grants from the National Natural Science Foundation of China (71090402 & 71002064).

Appendix A. Table of symbols and denotations

Symbol                          Definition
T = {t1, t2, …, tn}             A discrete time domain
D = D1 × D2 × … × Dm+1          An (m + 1)-dimensional state space
S(t) = {v1, …, vm+1}, t ∈ T     A snapshot mapping S: T → D
e                               Event, a subset of the state space D
e.dim                           The dimension set of e
seqT                            A sequence defined on T: seqT = {S(t), t ∈ T}
r = <e1, e2, …, el>             A rule in which e1, …, el ⊆ D, also written as <e1, e2, …, el−1> → el
r.dim                           The common dimension set of the antecedent of r
g                               The maximum time difference allowed between the occurring times of any two neighboring events (gap)
w                               The maximum time difference allowed between the occurring times of the first and the last events (width)
or = <t1^o, t2^o, …, tl^o>      An occurrence of rule r subject to g and w, also called a (g,w)-occurrence
Occr(r, seqT, g, w)             The set of all (g,w)-occurrences of r in seqT
win = [ti, tj)                  A time window defined by the time points at borders ti and tj
W(seqT, w)                      The set of all windows with width w in the span of seqT
Wr(seqT, g, w)                  The set of windows in W(seqT, w) that contain any occurrence of r
concat(r1, r2)                  Concatenation of rules r1 and r2
Candi                           The set of candidate rules with length i
Freqi                           The set of frequent rules with length i
supp(r, seqT, g, w)             Support of rule r
min_supp                        The threshold of minimum support
conf(r, seqT, g, w)             Confidence of rule r
min_conf                        The threshold of minimum confidence
spec(r, Dm)                     The set of all possible specializations of r on Dm
R(r, Dm)                        The set of m-dimensional frequent rules with respect to r
Redun(r, Dm)                    The redundancy of spec(r, Dm)
matched[]                       The table to maintain the last matched snapshot of each rule
time_last[]                     The table to maintain the time point of the last matched snapshot of each rule
time_first[]                    The table to maintain the time point of the first matched snapshot of each rule

References

[1] A.S. Abrahams, A. Becker, D. Fleder, I.C. MacMillan, Handling generalized cost functions in the partitioning optimization problem through sequential binary programming, Proceedings of the 5th IEEE International Conference on Data Mining, IEEE Computer Society, Houston, Texas, 2005, pp. 3–9.

[2] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating contextual information in recommender systems using a multidimensional approach, ACM Transactions on Information Systems 23 (1) (2005) 103–145.

[3] G. Adomavicius, A. Tuzhilin, Expert-driven validation of rule-based user models in personalization applications, Data Mining and Knowledge Discovery 5 (1) (2001) 33–58.

[4] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proc. of the 20th Internat. Conf. on Very Large Data Bases, Morgan Kaufmann Publishers Inc., Santiago, Chile, 1994, pp. 487–499.

[5] R. Agrawal, R. Srikant, Mining sequential patterns, Proc. of the 11th Internat. Conf. on Data Engrg., 1995, pp. 3–14.

[6] M.Z. Ashrafi, D. Taniar, K. Smith, A new approach of eliminating redundant association rules, Database and Expert Systems Appl.: 15th Internat. Conf., Springer, Zaragoza, Spain, 2004, pp. 465–474.

[7] M.Z. Ashrafi, D. Taniar, K. Smith, Redundant association rules reduction techniques, AI 2005: Adv. in Artificial Intelligence: 18th Australian Joint Conf. on Artificial Intelligence, Springer, Sydney, Australia, 2005, pp. 29–63.

[8] M. Atallah, R. Gwadera, W. Szpankowski, Detection of significant sets of episodes in event sequences, Proc. of the 4th Internat. Conf. on Data Mining, 2004, pp. 3–10.

[9] M. Baldauf, S. Dustdar, F. Rosenberg, A survey on context-aware systems, International Journal of Ad Hoc and Ubiquitous Computing 2 (4) (2007) 263–277.

[10] C. Bettini, X.S. Wang, S. Jajodia, J.L. Lin, Discovering frequent event patterns with multiple granularities in time sequences, IEEE Transactions on Knowledge and Data Engineering 10 (2) (1998) 222–237.

[11] I. Bose, X. Chen, A framework for context sensitive services: a knowledge discovery based approach, Decision Support Systems 48 (1) (2009) 158–168.

[12] C. Chatfield, Time-series Forecasting, Chapman & Hall/CRC, 2001.

[13] S. Chaudhuri, U. Dayal, An overview of data warehousing and OLAP technology, ACM Sigmod Record 26 (1) (1997) 65–74.

[14] J. Cheng, Y. Ke, W. Ng, Delta-tolerance closed frequent itemsets, Proc. of the 6th Internat. Conf. on Data Mining, IEEE Computer Society, Hong Kong, China, 2006, pp. 139–148.

[15] J. Cheng, Y. Ke, W. Ng, Effective elimination of redundant association rules, Data Mining and Knowledge Discovery 16 (2) (2008) 221–249.

[16] D.-Y. Choi, Personalized local internet in the location-based mobile web search, Decision Support Systems 43 (1) (2000) 31–45.

[17] M. Crochemore, C. Hancart, T. Lecroq, Algorithms on Strings, Cambridge University Press, 2007.

[18] A. Flanagan, Nokia Context Data, 2004. http://www.pervasive.jku.at/Research/Context_Database/index.php (retrieved December 13, 2010).

[19] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003) 1157–1182.

[20] I. Hacking, Equipossibility theories of probability, The British Journal for the Philosophy of Science 22 (4) (2004) 339–355.

[21] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.

[22] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS) 22 (1) (2004) 5–53.

[23] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003.

[24] H.A. Karimi, X. Liu, A predictive location model for location-based services, Proc. of the 11th ACM Internat. Symposium on Adv. in Geographic Information Systems, ACM, New Orleans, Louisiana, USA, 2003, pp. 126–133.

[25] J. Kogan, C. Nicholas, M. Teboulle, Grouping Multidimensional Data: Recent Advances in Clustering, Springer-Verlag, New York, 2006.

[26] R. Kohavi, G.H. John, Wrappers for feature subset selection, Artificial Intelligence 97 (1–2) (1997) 273–324.

[27] S. Laxman, P.S. Sastry, K.P. Unnikrishnan, Discovering frequent generalized episodes when events persist for different durations, IEEE Transactions on Knowledge and Data Engineering 19 (9) (2007) 1188–1201.

[28] K.F. Lee, H.W. Hon, R. Reddy, An overview of the SPHINX speech recognition system, IEEE Transactions on Acoustics, Speech, and Signal Processing 38 (1) (1990) 35–45.

[29] J. Li, X. Zhang, G. Dong, K. Ramamohanarao, Q. Sun, Efficient mining of high confidence association rules without support thresholds, Lecture Notes in Computer Science 1704/1999 (1999) 406–411.

[30] J. Lin, Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory 37 (1) (1991) 145–151.

[31] C. Loglisci, D. Malerba, Mining multiple level non-redundant association rules through two-fold pruning of redundancies, Proc. of the 6th Internat. Conf. on Machine Learn. and Data Mining in Pattern Recognition, Springer, 2009, p. 265.

[32] H. Mannila, H. Toivonen, Discovering generalized episodes using minimal occurrences, Proc. of the 2nd Internat. Conf. on Knowledge Discovery in Databases and Data Mining, 1996, pp. 146–151.

[33] H. Mannila, H. Toivonen, A. Inkeri Verkamo, Discovery of frequent episodes in event sequences, Data Mining and Knowledge Discovery 1 (3) (1997) 259–289.

[34] D. Mladenic, M. Grobelnik, Feature selection on hierarchy of web documents, Decision Support Systems 35 (1) (1994) 45–87.

[35] B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, M.C. Hsu, Mining sequential patterns by pattern-growth: the PrefixSpan approach, IEEE Transactions on Knowledge and Data Engineering 16 (11) (2004) 1424–1440.

[36] B.P.S. Murthi, S. Sarkar, The role of the management sciences in research on personalization, Management Science 49 (10) (2003) 1344–1362.

[37] E.W.T. Ngai, A. Gunasekaran, A review for mobile commerce research and applications, Decision Support Systems 43 (1) (2007) 3–15.

[38] T.A. Nguyen, W.A. Perkins, T.J. Laffey, D. Pecora, Knowledge-base verification, AI Magazine 8 (2) (1987) 69–75.

[39] D. O'Shea, Small screen for rent, Telephony Online, 2007. http://telephonyonline.com/mag/telecom_small_screen_rent/ (retrieved December 13, 2010).

[40] B. Padmanabhan, Z. Zheng, S. Kimbrough, An empirical analysis of the value of complete information for eCRM models, MIS Quarterly 30 (2) (2006) 247–267.

[41] D. Peppers, M. Rogers, Enterprise One to One: Tools for Competing in the Interactive Age, Currency-Doubleday, New York, 1997.

[42] H. Pinto, J. Han, J. Pei, K. Wang, Q. Chen, U. Dayal, Multi-dimensional sequential pattern mining, Proc. of the 10th Internat. Conf. on Inform. and Knowledge Management, ACM Press, New York, 2001, pp. 81–88.

[43] M. Plantevit, T. Charnois, J. Klema, C. Rigotti, B. Cremilleux, Combining sequence and itemset mining to discover named entities in biomedical texts: a new type of pattern, International Journal of Data Mining, Modelling and Management 1 (2) (2009) 119–148.

[44] J.H. Schiller, A. Voisard, Location-based Services, Morgan Kaufmann, San Francisco, 2004.

[45] R. Srikant, R. Agrawal, Mining sequential patterns: generalizations and performance improvements, Proc. of the 5th Internat. Conf. on Extending Database Tech., Springer, Avignon, France, 1996, pp. 3–17.

[46] Z. Su, Q. Yang, Y. Lu, H. Zhang, WhatNext: a prediction system for Web requests using n-gram sequence models, Proc. of the 1st Internat. Conf. on Web Inform. Systems Engrg., 2000, pp. 214–221.

[47] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005.

[48] C.J. Van Rijsbergen, Information Retrieval, Butterworth-Heinemann, Newton, MA, USA, 1979.

[49] A.I. Vermesan, Quality assessment of knowledge-based software: some certification considerations, Proc. of the 3rd Internat. Software Engrg. Standards Sympos., 1997, pp. 1–6.

[50] J. Wang, J. Han, C. Li, Frequent closed sequence mining without candidate maintenance, IEEE Transactions on Knowledge and Data Engineering 19 (8) (2007) 1042–1056.

[51] K. Wang, Y. He, D.W. Cheung, Mining confident rules without support requirement, Proc. of the 10th Internat. Conf. on Inform. and Knowledge Management, ACM, Atlanta, Georgia, 2001, pp. 89–96.

[52] W. Wang, J. Yang, Mining Sequential Patterns From Large Data Sets, Springer, 2005.

[53] W.W.S. Wei, Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley, 1989.

[54] O. White, T. Dunning, G. Sutton, M. Adams, J.C. Venter, C. Fields, A quality control algorithm for DNA sequencing projects, Nucleic Acids Research 21 (16) (1993) 3829–3838.

[55] X. Yan, J. Han, R. Afshar, CloSpan: mining closed sequential patterns in large datasets, SIAM Internat. Conf. on Data Mining, Society for Industrial and Applied Mathematics, San Francisco, CA, 2003.

[56] Q. Yang, T. Li, K. Wang, Building association-rule based sequential classifiers for Web-document prediction, Data Mining and Knowledge Discovery 8 (3) (2004) 253–273.

[57] Q. Yang, J. Yin, C. Ling, R. Pan, Extracting actionable knowledge from decision trees, IEEE Transactions on Knowledge and Data Engineering 18 (16) (2006).

[58] Q. Yang, H.H. Zhang, Web-log mining for predictive Web caching, IEEE Transactions on Knowledge and Data Engineering 15 (4) (2003) 1050–1053.

[59] C.C. Yu, Y.L. Chen, Mining sequential patterns from multidimensional sequence data, IEEE Transactions on Knowledge and Data Engineering 17 (1) (2005) 136–140.

[60] S.T. Yuan, C. Cheng, Ontology-based personalized couple clustering for heterogeneous product recommendation in mobile marketing, Expert Systems with Applications 26 (4) (2004) 461–476.

[61] M.J. Zaki, SPADE: an efficient algorithm for mining frequent sequences, Machine Learning 42 (1) (2001) 31–60.

[62] M.J. Zaki, Mining non-redundant association rules, Data Mining and Knowledge Discovery 9 (3) (2004) 223–248.

[63] Y. Zhao, S. Zhang, Generalized dimension-reduction framework for recent-biased time series analysis, IEEE Transactions on Knowledge and Data Engineering 18 (2) (2006) 231–244.

Heng Tang received his Ph.D. degree in the Department of Information Systems, City University of Hong Kong. He is currently an assistant professor at the Department of Accounting and Information Management, University of Macau. His research interests are in the areas of Data Mining, Recommender Systems, and Mobile Computing.

Stephen Shaoyi Liao is currently a professor at the Department of Information Systems, City University of Hong Kong. He obtained his Ph.D. degree in Information Systems from Aix-Marseille University, France. His main research areas include mobile applications and Intelligent Transportation Systems.

Sherry Xiaoyun Sun received M.S. and Ph.D. degrees in Management Information Systems from Eller College of Management, The University of Arizona, Tucson, Arizona, USA. She is currently an assistant professor at the Department of Information Systems, City University of Hong Kong. Her research mainly focuses on business process management, electronic commerce, and service oriented computing. Her research work has been published in Information Systems Research, IEEE Transactions on Systems, Man, and Cybernetics, Journal of Systems and Software, Information Systems Frontier, Knowledge-Based Systems and others.

A prediction framework based on contextual data to support Mobile Personalized Marketing
1. Introduction
2. Literature review
2.1. Sequential pattern mining
2.2. Multidimensional sequence
2.3. Rule reduction
2.4. Association rule based prediction
3. Learning rules from multidimensional data sequence
3.1. Problem statement
3.2. The mining algorithm
4. Rule reduction
5. Generating prediction by rule matching
6. Evaluation
6.1. Data description and measures
6.2. Experiment I
6.3. Experiment II
7. Conclusion
Acknowledgment
Appendix A. Table of symbols and denotations
References
