Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

11
Chapter 60 DATA MINING FOR CRM Kurt Thearling Abstract Data Mining technology allows marketing organizations to better understand their customers and respond to their needs. This chapter describes how Data Mining can be combined with customer relationship management to help drive improved interactions with customers. An example showing how to use Data Mining to drive customer acquisitionactivities is presented. Keywords: Customer Relationship Management (CRM), campaign management, customer acquisition, scoring 1. What is CRM? It is now a cliche that in the days of the corner market, shopkeepers had no trouble understanding their customers and responding quickly to their needs. The shopkeepers would simply keep track of each customer in their heads, and would know what to do when a customer walked into the store. But today's shopkeepersface a much more complex situation. More customers, more prod- ucts, more competitors, and less time to react means that understanding your customers is now much harder to do. This is where customer relationship man- agement (CRM) comes in. CRM lets companies design, manage, and execute strategies for interacting with customers (and potential customers). CRM can be applied to the complete customer life-cycle, from acquisition, to ongoing account management, to cross-selling, to customer retention and attrition. The goal of CRM is to allow marketing organizations to tune the customer interaction strategies to the specific needs of each individual, giving customers what they want, when they want it. Instead of interacting with large numbers of customers en masse (consider billboards or magazine advertisements), the new role of marketing is to interact with individual customers. This involves identifying and understanding unique customer patterns as well as being able to create customized offers for small customer groups that correspond to those patterns. For example, a pattern might be that 71% of cell-phone customers

Transcript of Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Page 1: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Chapter 60

DATA MINING FOR CRM

Kurt Thearling

Abstract Data Mining technology allows marketing organizations to better understand their customers and respond to their needs. This chapter describes how Data Mining can be combined with customer relationship management to help drive improved interactions with customers. An example showing how to use Data Mining to drive customer acquisition activities is presented.

Keywords: Customer Relationship Management (CRM), campaign management, customer acquisition, scoring

1. What is CRM? It is now a cliche that in the days of the corner market, shopkeepers had no

trouble understanding their customers and responding quickly to their needs. The shopkeepers would simply keep track of each customer in their heads, and would know what to do when a customer walked into the store. But today's shopkeepers face a much more complex situation. More customers, more prod- ucts, more competitors, and less time to react means that understanding your customers is now much harder to do. This is where customer relationship man- agement (CRM) comes in. CRM lets companies design, manage, and execute strategies for interacting with customers (and potential customers). CRM can be applied to the complete customer life-cycle, from acquisition, to ongoing account management, to cross-selling, to customer retention and attrition.

The goal of CRM is to allow marketing organizations to tune the customer interaction strategies to the specific needs of each individual, giving customers what they want, when they want it. Instead of interacting with large numbers of customers en masse (consider billboards or magazine advertisements), the new role of marketing is to interact with individual customers. This involves identifying and understanding unique customer patterns as well as being able to create customized offers for small customer groups that correspond to those patterns. For example, a pattern might be that 71% of cell-phone customers

Page 2: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

1250 DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK

that make five or more calls to customer support in the first month cancel their service. A marketing manger could use this pattern to identify dissatisfied customers and proactively respond to their needs before they cancel.

As a result of the complex interactions that are now possible, the function of marketing is increasingly becoming tied to technology, ranging from com- plex Data Mining algorithms to campaign management software applications. Campaign management software allows marketing professionals to segment groups of customers (and prospective customers) into smaller groups and then specify the interaction that should take place with those individuals. For ex- ample, consider a marketing manager for a cellular phone company that is focusing on customer retention. There might be a large number of reasons that a customer chooses to leave their cellular provider, and the marketing man- ager is responsible for identifying ways to reduce this problem. One group of customers might be leaving because they are experiencing technical problems (e.g., frequent dropped calls) while another group might be leaving because the plan they are signed up for does not match their current calling patterns (e.g., a local calling plan with a large number of national calls).

A user of campaign management software would define these segments by selecting customers in the database that have the desired characteristics. For the customers with technical problems, the marketer could create a customer segment that selects those customers who have had more than five dropped calls within the last month. Once the segment is defined, it needs to be as- sociated with offers that will be communicated to the customers in order to improve retention. In the case of customers with technical problems, the offer might be a rebate of one month's charges and a promise to improve service quality. This offer could be communicated by a call from customer service or via a piece of direct mail (email might be a third option, if the customer's email address is available). The campaign management application would take the segment and split it into two groups, half receiving a phone call and the other half receiving a piece of direct mail (which half a customer fell into would be by random selection). Once the the segmentation is defined and the marketing manager is satisfied with the campaign, it would need to be executed. This would be handled by a scheduler that executes the campaign at regular inter- vals (e.g., every night at 2am). Upon execution of the campaign, the segments associated with the phone call would be passed to the call-center software sys- tem, which would queue up the customers who are supposed to receive the offer along with the specifics of the script that the operator is supposed to use (full or half month rebate). The direct mail segments would likely be handled differently, possibly by using an external vendor (a "mail shop") that would take a list of customers and produce the actual envelopes that would be mailed. In this case, the campaign management system would generate a file listing each of the customers, including their address and offer type.

Page 3: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Data Mining for CRM 125 1

2. Data Mining and Campaign Management In the above discussion of campaign management, the selection criteria used

to define customer segments ("five dropped calls within the last month") was static, based on historical values stored in a database. Alternatively, some deci- sions might be based on predicted values (scores) that are output by Data Min- ing models. Scores can take just about any form, from numbers to strings to entire data structures, but the most common scores are numbers (for example, the probability of responding to a particular promotional offer). These scores can be combined with static values to select the most appropriate prospects for a targeted marketing campaign.

The actual execution of a Data Mining model (scoring) is distinct from the process that creates the model. Typically, a model is used multiple times af- ter it is created to score data in different marketing campaigns. For example, consider a model that has been created to predict the probability that a cus- tomer will respond to the cell-phone retention campaign. The model would be built by using historical data from customers and calls, as well as the responses those customers had to various retention offers. After the model has been cre- ated based on historical data, it can then be scored on new data in order to make predictions about unseen behavior. This is what Data Mining is all about.

Scoring is the unglamorous workhorse of Data Mining. It doesn't have the sexiness of a neural network or a genetic algorithm, but without it, data min- ing is pretty useless. At the end of the day, after your Data Mining tools have given you a great predictive model, there's still a lot of work to be done. Scor- ing models against a customer database can be a time-consuming, error-prone activity, so the key is to integrate it smoothly with the rest of the CRM process. In the past, when a marketer wanted to run a campaign based on model scores, he or she would call the model builder to have the model manually run against a database so that a score file could be created. The marketer then had to solicit the help of an IT staffer to merge the scores with the marketing database. This disjointed process was fraught with problems and errors and could take several weeks. Often, by the time the models were integrated with the database, either the models were outdated or the campaign opportunity had passed.

The current solution is to integrate tightly Data Mining and campaign man- agement technologies. Under this scenario, marketers can invoke statistical models from within the campaign management application, score customer segments on the fly, and quickly create campaigns targeted at customer seg- ments offering the greatest potential. The past few years have seen significant improvements by CRM vendors with respect to integrating Data Mining into the CRM process. This trend is expected to continue resulting in CRM ap- plications driving more and more marketing activities based on Data Mining results.

Page 4: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

1252 DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK

3. An Example: Customer Acquisition For most businesses, the primary means of growth involves acquiring new

customers. This could involve finding customers who previously were not aware of your product, were not candidates for purchasing your product (for example, baby diapers for new parents), or customers who in the past have bought from your competitors. Some of these customers might have been your customers previously, which could be an advantage (more data might be avail- able about them) or a disadvantage (they might have switched as a result of poor service). In any case, Data Mining can often help segment these prospec- tive customers and increase the response rates that an acquisition marketing campaign can achieve.

The traditional approach to customer acquisition involved a marketing man- ager developing a combination of mass marketing (magazine advertisements, billboards, etc.) and direct marketing (telemarketing, mail, etc.) campaigns based on their knowledge of the particular customer base that was being tar- geted. In the case of a marketing campaign trying to influence new parents to purchase a particular brand of diapers, the mass marketing advertisements might be focused in parenting magazines (naturally). The ads could also be placed in more mainstream publications whose readership demographics (age, marital status, gender, etc.) were similar to those of new parents.

In direct marketing, a marketing manager would select the demographics that they are interested in (which could very well be the same characteristics used for mass market advertising), and then work with a data vendor (some- times known as a service bureau) to obtain lists of customers meeting those criteria. Service bureaus have large databases containing millions of prospec- tive customers that can be segmented based on specific demographic criteria (age, gender, interest in particular subjects, etc.). To prepare for the "diapers" direct mail campaign, the marketing manager might request a list of prospects from a service bureau. This list could contain people, aged 18 to 30, who have recently purchased a baby stroller or crib (this information might be collected from people who have returned warranty cards for strollers or cribs). The ser- vice bureau would then provide the marketer with a computer file containing the names and addresses for these customers so that the diaper company can contact these customers with their marketing message.

It should be noted that because of the number of possible customer charac- teristics, the concept of "similar demographics" has traditionally been an art rather than a science. There usually are no hard-and-fast rules about whether two groups of customers share the same characteristics. In the end, much of the segmentation that took place in traditional direct marketing involved hunches on the part of the marketing professional. In the case of 18-to-30 year old purchasers of baby strollers, the hunch might be that people who purchase a

Page 5: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Data Mining for CRM 1253

stroller in this age group are probably making the purchase before the arrival of their first child (because strollers are saved and used for additional children) and they also haven't yet decided which brand of diapers to use. Seasoned veterans of the marketing game know their customers well and are often quite successful in making these kinds of decisions.

3.1 How Data Mining and Statistical Modeling Changes Things

Although a marketer with a wealth of experience can often choose rele- vant demographic selection criteria, the process becomes more difficult as the amount of data increases. The complexities of the patterns increase, both with the number of customers being considered and the increasing detail known about each customer. The past few years have seen tremendous growth in con- sumer databases, so the job of segmenting prospective customers is becoming overwhelming.

Data Mining can help this process, but it is by no means a solution to all of the problems associated with customer acquisition. The marketer will need to combine the potential customer list that Data Mining generates with offers that people are interested in. Deciding what is an interesting offer is where the art of marketing comes in.

3.2 Defining Some Key Acquisition Concepts Before the process of customer acquisition begins, it is important to think

about the goals of the marketing campaign. In most situations, the goal of an acquisition marketing campaign is to turn a group of potential customers into actual customers of your product or service. This is where things can get a bit fuzzy. There are usually many kinds of customers, and it can often take a significant amount of time before someone becomes a valuable customer. When the results of an acquisition campaign are evaluated, there are often different kinds of responses requiring consideration.

The responses that come in as a result of a marketing campaign are called "response behaviors". The use of the word "behavior" is important because the way in which different people respond to a particular marketing message can vary. This variation needs to be taken into consideration by the campaign and will likely result in different follow-up actions. A response behavior defines a distinct kind of customer action and categorizes the different possibilities so that they can be further analyzed and reported on.

Binary response behaviors are the simplest kind of response. With a binary response behavior, the customer response is either a yes or no. If someone is sent a catalog, did they buy something from the catalog or not? At the highest level, this is often the kind of response that is talked about. Binary response

Page 6: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

1254 DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK

behaviors do not convey any subtle distinctions between customer actions, and these distinctions are not always necessary for effective marketing campaigns.

Beyond binary response behaviors are categorical response behaviors. As you would expect, a categorical response behavior allows for multiple behav- iors to be defined. The rules that define the behaviors are arbitrary and are based on the kind of business in which you are involved. Returning to the example of sending out catalogs, one response behavior might be defined to match if the customer purchased women's clothing from the catalog, whereas a different behavior might match when the customer purchased men's cloth- ing. These behaviors can be refined a far as deemed necessary (for example, "purchased men's red polo shirt."

It should be noted that it is possible for different response behaviors to over- lap. A behavior might be defined for customers that purchased over $100 from the catalog. This couId overlap with the "purchased men's clothing" behav- ior if the clothing that was purchased cost more than $100. Overlap can also be triggered if the customer purchases more than one item (both men's and women's shirts, for example) as a result of a single offer. Although the use of overlapping behaviors can tend to complicate analysis and reporting, the use of overlapping categorical response behaviors tends to be richer and therefore will provide a better understanding of your customers in the future.

Figure 60.1. Example Response Analysis Broken Down by Behavior.

There are usually several different kinds of positive response behaviors that can be associated with an acquisition marketing campaign. This assumes that the goal of the campaign is to increase customer purchases, as opposed to an informational marketing campaign in which customers are simply told of your company's existence. Some of the general categories of response behaviors (see Figure 60.1) are the following:

Page 7: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Data Mining for CRM 1255

Customer inquiry: The customer asks for more information about your products or services. This is a good start. The customer is definitely interested in your products - it could signal the beginning of a long- term customer relationship. You might also want to track conversions, which are follow-ups to inquiries that result in the purchase of a product.

rn Purchase of the offered product or products: This is the usual defini- tion of success. You offered your products to someone, and they decided to buy one or more of them. Within this category of response behaviors, there can be many different kinds of responses. As mentioned earlier, both "purchased men's clothing" and "purchased women's clothing" fit within this category.

rn Purchase of a product different than the ones offered: Despite the fact that the customer purchased one of your products, it wasn't the one you offered. You might have offered the deluxe product and they chose to purchase the standard model (or vice-versa). In some sense, this is very valuable response because you now have data on a cus- tomerlproduct combination that you would not otherwise have collected.

There are also typically two kinds of negative responses. The first is a non- response. This is not to be confused with a definite refusal of your offer. For example, if you contacted the customer via direct mail, there may be any num- ber of reasons why there was no response (wrong address, offer misplaced, etc.). Other customer contact channels (outbound telemarketing, email, etc.) can also result in ambiguous non-responses. The fact there was no response does not necessarily mean that the offer was rejected. As a result, the way you interpret a non-response as part of additional data analysis will need further consideration(more on this later).

A rejection by the prospective customer is the other kind of negative re- sponse. Depending on the offer and the contact channel, you can often deter- mine exactly whether or not the customer is interested in the offer (for exam- ple, an offer made via outbound telemarketing might result in a definitive "no, I'm not interested" response). Although it probably does not seem useful, the definitive "no'' response is often as valuable as the positive response when it comes to further analysis of customer interests.

3 3 It All Begins with the Data One of the differences between customer acquisition and most other mar-

keting applications of Data Mining revolves around the data that is used to build predictive models. The amount of information that you have about peo- ple with whom you do not yet have a relationship is much more limited than the information you have about your existing customers. In some cases, the

Page 8: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

1256 DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK

data might be limited to their address andlor phone number. The key to this process is finding a relationship between the information that you do have and the behaviors you want to model.

Most acquisition marketing campaigns begin with the prospect list. A prospect list is simply a list of customers that have been selected because they are likely to be interested in your products or services. There are numerous companies around the world that will sell lists of customers, often with a par- ticular focus (for example, new parents, retired people, new car purchasers, etc.).

Sometimes, it is necessary to add additional information to a prospect list by overlaying data from other sources. For example, consider a prospect list that contains only names and addresses. In terms of a potential data mining analysis, the information contained in the prospect list is very weak. There might be some patterns in the city, state, or Zip code fields, but they would be limited in their predictive power. To augment the data, information about customers on the prospect list could be matched with external data. One simple overlay involves combining the customer's ZIP code with U.S. census data about average income, average age, and so on. This can be done manually or, as is often the case with overlays, your list provider can take care of this automatically.

More complicated overlays are also possible. Customers can be matched against purchase, response, and other detailed data that the data vendors col- lect and refine. This data comes from a variety of sources including retailers, state and local governments, and the customers themselves. If you are mailing out a car accessories catalog, it might be useful to overlay information (make, model, year) about any known cars that people on the prospect list might have registered with their department of motor vehicles.

3.4 Test Campaigns Once you have a list of prospective customers, there is still some work that

needs to be done before you can create predictive models for customer acqui- sition. Unless you have data available from previous acquisition campaigns, you will need to send out a test campaign in order to collect data for analysis. Besides the customers you have selected for your prospect list, it is impor- tant to include some other customers in the campaign, so that the data is as rich as possible for future analysis. For example, assume that your prospect list (that you purchased from a list broker) was composed of men over age 30 who recently purchased a new car. If you were to market to these prospective customers and then analyze the results, any patterns found by Data Mining would be limited to sub-segments of the group of men over 30 who bought a new car. What about women or people under age 30? By not including these

Page 9: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Data Mining for CRM 1257

people in your test campaign, it will be difficult to expand future campaigns to include segments of the population that are not in your initial prospect list. The solution is to include a small random selection of customers whose de- mographics differ from the initial prospect list. This random selection should constitute only a small percentage of the overall marketing campaign, but it will provide valuable information for data mining. You will need to work with your data vendor in order to add a random sample to the prospect list. More sophisticated techniques than random selection do exist, such as those found in statistical design of experiments (DOE).

Although this circular process (customer interaction -t data collection --, Data Mining --+ customer interaction) exists in almost every application of Data Mining to marketing, there is more room for refinement in customer ac- quisition campaigns. Not only do the customers that are included in the cam- paigns change over time, but the data itself can also change. Additional overlay information can be included in the analysis when it becomes available. Also, by using random selection in the test campaigns, new segments of people can be added to your customer pool.

Once you have started your test campaign, the job of collecting and catego- rizing the response behaviors begins. Immediately after the campaign offers go out, responses must be tracked. The nature of the response process is such that responses tend to trickle in over time, which means that the campaign can drag on forever. In most real-world situations, though, there is a threshold after which you no longer look for responses. At that time, any customers on the prospect list that have not responded are deemed "non-responses." Before the threshold, customers who have not responded are in a state of limbo, some- where between a response and a non-response.

3.5 Building Data Mining Models Using Response Behaviors

With the test campaign response data in hand, the actual mining of customer response behaviors can begin. The first part of this process requires you to choose which behaviors you are interested in predicting, and at what level of detail. The level at which the predictive models work should reflect the kinds of offers that you can make, not the kinds of responses that you can track. It might be useful (for reporting purposes) to track catalog clothing purchases down to the level of color and size. If all catalogs are the same, however, the specifics of a customer purchase don't really matter for the Data Mining analysis. In this case (all catalogs are the same), binary response prediction is the way to go. If separate men's and women's catalogs are available, analyzing response behaviors at the gender level would be appropriate. In either case, it is a straightforward process to turn the lower-level categorical behaviors into

Page 10: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

1258 DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK

a set of responses at the desired level of granularity. If there are overlapping response behaviors, the duplicates should be removed prior to mining.

In some circumstances, predicting individual response behaviors might be an appropriate course of action. With the movement toward one-to-one cus- tomer marketing, the idea of catalogs that are custom-produced for each cus- tomer is moving closer to reality. Existing channels such as the Internet or outbound telemarketing also allow you to be more specific in the ways you target the exact wants and needs of your prospective customers. A significant drawback of the modeling of individual response behaviors is that the analyti- cal processing power required can grow dramatically because the Data Mining process needs to be carried our multiple times, once for each response behavior that you are interested in.

How you handle negative responses also needs to be thought out prior to the data analysis phase. As discussed previously, there are two kinds of negative responses: rejections and non-responses. Rejections, by their nature, corre- spond to specific records in the database that indicate the negative customer response. Non-responses, on the other hand, typically do not represent records in the database. Non-responses usually correspond to the absence of a response behavior record in the database for customers who received the offer.

There are two ways in which to handle non-responses. The most common way is to translate all non-responses into rejections, either explicitly (by creat- ing rejection records for the non-responding customers) or implicitly (usually a function of the Data Mining software used). This approach will create a data set comprised of all customers who have received offers, with each customer's response being positive (inquiry or purchase) or negative (rejections and non- responses).

The second approach is to leave non-responses out of the analysis data set. This approach is not typically used because it throws away so much data, but it might make sense if the number of actual rejections is large (relative to the number of non-responses); experience has shown that non-responses do not necessarily correspond to a rejection of your product or services offering.

Once the data has been prepared, the actual Data Mining can be performed. The target variable that the Data Mining software will predict is the response behavior type at the level you have chosen (binary or categorical). Because some Data Mining applications cannot predict non-binary variables, some fi- nessing of the data will be required if you are modeling categorical responses using non-categorical software. The inputs to the Data Mining system are the input variables and all of the demographic characteristics that you might have available, especially any overlay data that you combined with your prospect list.

In the end, a model (or models, if you are predicting multiple categorical response behaviors) will be produced that will predict the response behaviors

Page 11: Data Mining and Knowledge Discovery Handbook || Data Mining for CRM

Data Mining for CRM 1259

that you are interested in. The models can then be used to score lists of prospect customers in order to select only those who are likely to response to your offer. Depending on how the data vendors you work with operate, you might be able to provide them with the model, and have them send you only the best prospects. In the situation in which you are purchasing overlay data in order to aid in the selection of prospects, the output of the modeling process should be used to determine whether all of the overlay data is necessary. If a model does not use some of the overlay variables, in the interests of economy, you might consider leaving out these unused variables the next time you purchase a prospect list.