Credit scoring for profitability objectives


European Journal of Operational Research 202 (2010) 528–537


journal homepage: www.elsevier.com/locate/ejor

Innovative Applications of O.R.


Steven Finlay *

Department of Management Science, Lancaster University, Lancaster, UK

Article info

Article history: Received 11 August 2008; Accepted 18 May 2009; Available online 24 May 2009

Keywords: OR in banking; Credit scoring; Genetic algorithms; Profitability

0377-2217/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.ejor.2009.05.025

* Tel.: +44 07964 316465. E-mail addresses: [email protected], [email protected]

Abstract

In consumer credit markets lending decisions are usually represented as a set of classification problems. The objective is to predict the likelihood of customers ending up in one of a finite number of states, such as good/bad payer, responder/non-responder and transactor/non-transactor. Decision rules are then applied on the basis of the resulting model estimates. However, this represents a misspecification of the true objectives of commercial lenders, which are better described in terms of continuous financial measures such as bad debt, revenue and profit contribution. In this paper, an empirical study is undertaken to compare predictive models of continuous financial behaviour with binary models of customer default. The results show models of continuous financial behaviour to outperform classification approaches. They also demonstrate that scoring functions developed to specifically optimize profit contribution, using genetic algorithms, outperform scoring functions derived from optimizing more general functions such as sum of squared error.


1. Introduction

The first research into credit scoring (the application of quantitative methods to consumer risk assessment) was undertaken by Durand, who used quadratic discriminant analysis to classify credit applications as good or bad payers (Durand, 1941). Since then, the most popular approaches to consumer risk assessment have continued to treat lending decisions as binary classification problems (Hand and Henley, 1997; Thomas et al., 2002; Finlay, 2008a). Data about individuals are collected from the time lending decisions are made, and their behaviour is observed over a period of a few months or years. On the basis of their behaviour, individuals are classified as good or bad payers. Classification or regression methods are then applied to create predictive models that are applied to new credit applications in the future. Those that the model predicts have a high likelihood of being a good payer are accepted, while those with a low likelihood are rejected. In more recent times lenders have moved on to apply classification approaches to other aspects of customer behaviour, constructing models to predict behaviours such as the likelihood of responding to a mailing, the propensity to revolve a balance on a credit card or the likelihood of attrition to a rival product (Thomas et al., 2002). It is therefore becoming increasingly common to create and use a number of different models of customer behaviour in combination to make decisions about who to lend to and on what terms.

There are a number of arguments that can be raised against applying traditional classification approaches to credit scoring problems, and in this paper two of them are explored. First, the loss function of interest to the user is often different from the loss function used during model development. This has long been recognized as an issue for forecasting problems in general (Fildes and Makridakis, 1988) and has been discussed in a number of more recent papers in relation to credit scoring (Finlay, 2005, 2009b; Hand, 2005; Hand et al., 2008; Andreeva et al., 2007). Consider logistic regression, the most popular technique used for constructing credit scoring models (Thomas et al., 2001a; Crook et al., 2007). The dependent variable, y, can take values of 1 or 0 and a model is derived that maximizes the likelihood over the set of n observed cases:

\[
\prod_{i=1}^{n} P_i^{y_i}\,(1 - P_i)^{1 - y_i}, \tag{1}
\]

where P_i is the posterior probability that y_i = 1, calculated as a function of the independent variables, x_i. Yet, for many practitioners likelihood is of little interest, and the accuracy of point estimates for individual observations is not important. Instead, practitioners are primarily interested in the properties of the distribution of model scores and the accurate ranking of scores within this distribution (Thomas et al., 2001a).
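The distinction between likelihood and ranking can be made concrete with a small sketch. In the illustrative example below (synthetic data; none of the values come from the paper), two probability models that order applicants identically receive different likelihoods under Eq. (1), while a purely rank-based measure such as the AUC cannot tell them apart:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic good/bad outcomes and two sets of model probabilities that
# rank observations identically but are calibrated differently.
y = rng.integers(0, 2, size=1000)
score = rng.normal(size=1000) + y          # latent score: higher for y = 1
p_a = 1.0 / (1.0 + np.exp(-score))         # model A probabilities
p_b = 1.0 / (1.0 + np.exp(-0.5 * score))   # model B: same ranking, different calibration

def log_likelihood(y, p):
    """Log of the Bernoulli likelihood in Eq. (1)."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def auc(y, s):
    """Probability that a random good outranks a random bad (rank-based AUC)."""
    order = np.argsort(s)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(s) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# The likelihoods differ, but the rank-based measure is identical because
# p_b is a monotone transform of p_a.
print(log_likelihood(y, p_a), log_likelihood(y, p_b))
print(auc(y, p_a), auc(y, p_b))
```

This is the practitioner's point in miniature: a measure driven only by the ordering of scores is indifferent to the calibration that the likelihood in Eq. (1) rewards.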

The second argument against using classification approaches is more strategic in nature. The construction of binary classification models of behaviour is accepted practice within the credit scoring community, but often it is not a true representation of a commercial lender's main objectives. A model that has been constructed to minimize the misclassification of good and bad payers, based on default behaviour, is at best a crude approximation to the primary objective of identifying those customers that will generate a positive contribution to profit. At worst, it might be a misspecification of the problem, if those classified as good or bad actually generate a loss or profit respectively, despite their eventual repayment classification. For example, consider a credit card account that was highly utilized and maintained a revolving balance, but defaulted for a small sum at the end of the observation period. The account would be classified as bad under most definitions of good/bad payer, but may have generated a positive contribution to profit over the period of observation.

To summarise, this paper looks at two issues concerning the appropriateness of applying binary classification methodologies to the development of credit scoring model(s):

(1) The objective function does not represent customer behaviour in terms of meaningful financial measures. Instead, crude measures such as good/bad, response/non-response etc. are used as substitutes.

(2) The modelling process does not take into account prior information about the lender's decision making criteria. Instead, the objective function used to generate most credit scoring models maximizes/minimizes some generalised measure such as likelihood or the sum of squared errors.

The remainder of this paper is in three parts. First, the treatment of continuous financial measures for consumer risk assessment is discussed. Second, an empirical study is undertaken comparing the properties of continuous models of financial behaviour with a traditional good/bad payer model constructed using logistic regression. Third, a genetic algorithm (GA) is used to construct models of financial behaviour that take into account prior information about the decision rules (the cut-off strategy) to be employed. The GA derived models are then compared with the first set of models developed using more traditional methods. It is worth pointing out that GAs have been applied to credit scoring problems before (Fogarty and Ireson, 1993; Desai et al., 1997; Yobas et al., 2000; Finlay, 2005). However, a key element that differentiates this study from previous ones is that it is the first to consider the use of a GA to optimize continuous financial measures.

1.1. The treatment of continuous financial objectives in credit scoring

This is by no means the first paper to discuss the need to consider financial measures of customer behaviour when making credit granting decisions, and a number of approaches have been proposed. The traditional approach is to weight the results produced from a binary model of behaviour with prior information about the portfolio. For example, with linear discriminant analysis (Lachenbruch, 1982) the discriminant score function, S, is defined as:

\[
S_i = \left[ x_i - 0.5\,(u_1 + u_2) \right]^{T} \Sigma^{-1} (u_1 - u_2), \tag{2}
\]

where x_i is the vector for observation i with k independent variables, u_1 and u_2 are the k-element mean vectors for goods and bads respectively, and Σ is the common covariance matrix.

If the objective is to use the score to accept only those likely to generate a positive contribution, the rule to assign observations to each class (the decision to accept or reject them) is augmented with average profit and loss information (Thomas et al., 2002):

Classify applicant i as good (and accept the application) if:

\[
S_i > \ln\!\left( \frac{L\,P_2}{R\,P_1} \right).
\]

Otherwise classify as bad (and reject the application), where P_i is the prior probability of an observation being in class i, with i = 1 for good payers and i = 2 for bad payers; L is the average loss from a bad payer, L ≥ 0, and R is the average return from a good payer, R > 0.
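The discriminant score of Eq. (2) and the profit-adjusted cut-off above can be sketched together as follows. The data are synthetic Gaussians with a shared covariance (as linear discriminant analysis assumes), and the values of R, L and the priors are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic applicants: goods (class 1) and bads (class 2) drawn from
# Gaussians that differ in mean but share a covariance structure.
n, k = 500, 3
goods = rng.normal(loc=0.5, size=(n, k))
bads = rng.normal(loc=-0.5, size=(n, k))
X = np.vstack([goods, bads])

u1, u2 = goods.mean(axis=0), bads.mean(axis=0)  # class mean vectors
sigma = np.cov(X.T)    # covariance estimated on the pooled sample (a rough stand-in)
w = np.linalg.solve(sigma, u1 - u2)

# Discriminant score of Eq. (2): S_i = [x_i - 0.5(u1 + u2)]' Sigma^{-1} (u1 - u2)
S = (X - 0.5 * (u1 + u2)) @ w

# Profit-adjusted cut-off: accept if S_i > ln(L * P2 / (R * P1)).
# R, L, P1 and P2 are illustrative values, not taken from the paper.
R, L = 100.0, 400.0
P1, P2 = 0.9, 0.1
cutoff = np.log((L * P2) / (R * P1))
accept = S > cutoff
print(f"cut-off = {cutoff:.3f}, accepted {accept.mean():.1%} of applicants")
```

Note how the cut-off moves with R and L while the score function itself, and hence the ranking of applicants, is unchanged; that is exactly the limitation the next paragraphs address.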

In general, for any method that generates probability estimates of class membership, the general form of the decision rule to optimize the lending decision will be:

\[
S_i = R \cdot P(G/x_i) - L\,\bigl(1 - P(G/x_i)\bigr). \tag{3}
\]

Accept applicant i if S_i > 0, where P(G/x_i) is the posterior probability of observation i being a good payer. However, this strategy is simplistic because R and L are assumed to take the same values for all model scores, and in situations where the ranking of accounts is of primary importance, there is no change to the ranking of accounts to adjust for those that generate greater/lesser contribution; i.e. the underlying model remains the same regardless of the values of R and L chosen. In practice, R and L may vary, in which case a more appropriate formulation of the score function is:

\[
S_i = E(R/x_i, G)\,P(G/x_i) - E(L/x_i, B)\,\bigl(1 - P(G/x_i)\bigr), \tag{4}
\]

where E(R/x, G) and E(L/x, B) are the expected revenue/loss given x and the good/bad status respectively. Therefore, a practical implementation of (4) would involve building models to estimate P(G/x), E(R/x, G) and E(L/x, B). However, R and L are not always easy to determine, and this may be why (4) has received little attention to date in terms of empirical study relating to customer decision making.
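The practical difference between Eqs. (3) and (4) can be illustrated with a few hypothetical applicants (all figures below are invented for the example, not drawn from the paper). Under Eq. (3) the ranking follows P(G/x) alone; under Eq. (4) a riskier but more lucrative applicant can outrank a safer, low-revenue one:

```python
import numpy as np

def score_eq3(p_good, R, L):
    """Eq. (3): a single average return R and loss L for every applicant."""
    return R * p_good - L * (1.0 - p_good)

def score_eq4(p_good, exp_return, exp_loss):
    """Eq. (4): per-applicant expected return/loss, as would come from
    separate models of E(R/x, G) and E(L/x, B)."""
    return exp_return * p_good - exp_loss * (1.0 - p_good)

# Three hypothetical applicants (illustrative values only):
p_good = np.array([0.95, 0.80, 0.99])
exp_return = np.array([50.0, 400.0, 20.0])   # per-applicant E(R/x, G)
exp_loss = np.array([300.0, 300.0, 300.0])   # per-applicant E(L/x, B)

s3 = score_eq3(p_good, R=150.0, L=300.0)
s4 = score_eq4(p_good, exp_return, exp_loss)

# Eq. (3) ranks by default risk alone; Eq. (4) promotes the high-revenue
# revolver (applicant 1) above the safer but unprofitable applicants.
print(np.argsort(-s3))  # ranking under Eq. (3)
print(np.argsort(-s4))  # ranking under Eq. (4)
```

The reordering is the whole point: once R and L vary with x, the decision rule is no longer a monotone transform of P(G/x).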

Eq. (4) is an improvement on Eq. (3), but it still represents an incorrect specification of the problem. Normally, the good/bad definition is based on delinquency states. Goods and bads are considered to be mutually exclusive, so that P(G/x) + P(B/x) = 1 (see Li and Hand (2002) and Finlay (2008b) for notable exceptions to this). Correspondingly, it is assumed that the return, R, applies only to goods and the loss, L, applies only to bads. The possibility of bad payers generating a return prior to default is ignored, as is the possibility of loss from a good. This feature can be accommodated within (4) by allowing R and L to take values in the range ±∞, or by extending (4) so that the score function becomes:

\[
S_i = P(G/x_i)\left[ E(R_G/x_i, G) - E(L_G/x_i, G) \right] + \left[ 1 - P(G/x_i) \right]\left[ E(R_B/x_i, B) - E(L_B/x_i, B) \right], \tag{5}
\]

where the subscripts G and B on R and L denote the return/loss on good and bad payers respectively. However, if in the final analysis one is only interested in the return or loss generated from an account, the eventual delinquency status becomes something of a moot point. P(G/x) becomes redundant, leading to a much simpler expression of the score function:

\[
S_i = E(R/x_i) - E(L/x_i) = E(C/x_i), \tag{6}
\]

where C is the net contribution (R − L).

A number of other approaches to dealing with financial outcomes have also been proposed. Cyert et al. (1962) adopted a Markov chain approach to examine the movement between different delinquency states over time, in order to estimate the bad debt component of account contribution. The idea of using Markov processes was developed further by Thomas et al. (2001b), who proposed a profit maximization model based on a Markov process/dynamic programming approach to determine optimal credit limits to assign to accounts. Oliver and Wells (2001) and Beling et al. (2005) suggested the definition of efficient frontier strategies based on profitability objectives, and an approach described by Thomas et al. (2002) is to develop binary models of different aspects of customer behaviour such as default, usage, acceptance and attrition. These are then used in combination to segment a population, with decisions made on the basis of the financial properties of each segment. While offering the potential to improve the decision making process, none of these approaches model continuous financial measures of behaviour directly. All of them are based on classification methodologies to a greater or lesser extent. Financial analysis is only applied as a second stage, based on the resulting model output(s) from the chosen classification approach(es).
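In its simplest linear form, the direct formulation of Eq. (6) reduces to an ordinary regression of realised contribution on applicant characteristics, with no classification stage at all. A minimal sketch on synthetic data (coefficients, noise level and the 30% reject rate are all invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic portfolio: net contribution C generated from applicant
# characteristics x plus noise. All parameters are illustrative.
n, k = 2000, 4
X = rng.normal(size=(n, k))
beta_true = np.array([40.0, -25.0, 10.0, 0.0])
C = X @ beta_true + rng.normal(scale=30.0, size=n)  # net contribution R - L

# OLS estimate of E(C/x): minimises the sum of squared errors.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, C, rcond=None)
score = X1 @ beta_hat  # S_i = E(C/x_i), the score of Eq. (6)

# Accept the applicants with the highest expected contribution,
# here under an example 30% reject-rate strategy.
accept = score > np.quantile(score, 0.3)
print(f"mean contribution accepted: {C[accept].mean():.1f}, "
      f"rejected: {C[~accept].mean():.1f}")
```

Good/bad status never enters: the model is fitted to, and the cut-off applied to, the financial quantity of interest directly.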

1.2. Arguments against the use of profitability based objective functions

It has been reported that a few lenders have tried constructing models to optimize profitability and other financial measures (Thomas et al., 2005). However, there are no case studies describing the success or failure of such approaches within the public domain. From the academic community there has been little theoretical or empirical research either. The obvious questions that need to be raised are: why has the modelling of continuous financial measures received so little coverage, and why do classification models continue to dominate consumer lending decision making? The original arguments against modelling financial measures of behaviour centred around the lack of data lenders held about their customers. As Hopper and Lewis noted, at a portfolio level generating measures of profitability is relatively straightforward (Hopper and Lewis, 1992). If financial information is available, such as the revenue received, debt written-off, marketing spend and infrastructure costs, then it is relatively straightforward to calculate the profitability of a portfolio. However, at the level of an individual credit agreement the analysis is more complex, requiring far more detailed information to be maintained. Consider credit cards, where hundreds of account transactions can occur for each card account each year. To create an accurate and precise measure of profitability requires detailed records of each transaction to be maintained, including the different interest rates charged at different points in time for cash advances as opposed to retail purchases, account charges and late fees, as well as marketing and operational costs. Historically, lenders did not have the IT or reporting infrastructures to be able to do this, and while information systems have improved markedly, many organizations still struggle to define an accurate definition of the profitable customer, based on data held within their accounting systems (Finlay, 2009a).
Matters can be further complicated by political issues and accounting practices leading to alternative and/or somewhat subjective views about how account profitability should be calculated (Finlay, 2008b). For example, application of IAS39 and US GAAP accounting standards can result in different provision estimates for impaired loans.

One can counter the lack of information argument by adopting the position that full and accurate information about an account, or a precise definition of the measure(s) of interest, is not a requirement for developing usable (although not optimal) forecasting models of financial behaviour. One can think of the decision making process as lying along an 'information spectrum'. At one end, very little is known about customer behaviour, perhaps nothing more than the final default/non-default status of accounts. Faced with this situation a lender will build models and make decisions on the basis of this information; that is, models of default/non-default behaviour. At the other end of the spectrum, where everything possible is known about customer behaviour over the forecast horizon, a lender should use this information to build models and make informed decisions on the basis of profitability estimates. The question as to whether or not an individual defaults becomes irrelevant (it is acknowledged that there are operational reasons for estimating numbers of default cases, such as setting staffing levels within debt recovery departments and as input to capital requirements calculations). The situation facing lenders in the early part of the 21st century is probably somewhere in the middle of these two extremes. In this case, the lender's strategy should be to use the additional information available to improve the precision of the modelling objective, in order to provide the best possible approximation to the organization's financial objectives.

The second line of reasoning against the development and use of continuous models of financial behaviour was based on the claim that where attempts had been made to forecast such measures directly, they did not prove to give any benefit over that provided by classification models of good/bad payer behaviour (Lucas, 2001). Lucas's argument was that complex profit measures were far more sensitive to events that occurred over the forecast horizon than simple definitions of good and bad payer, based on the delinquency status of the account. For example, increasing the credit limit on a credit card might have little or no effect on the eventual delinquency (and hence good/bad) status of an account, but profitability will be highly correlated with the limit increase if the customer makes use of the additional credit facility. Therefore, the relationship between the customer profile at the start of the outcome period and the profit generated by the end of the outcome period will be weak. Taking this argument further, one might conclude that the more action taken to control the customer's behaviour during the outcome period, by varying account features such as the credit line or interest rate, the worse any forecast of customer profitability will be (Finlay, 2009a). This has led some to suggest that simple forecasting may be inappropriate. Instead, a dynamic approach that incorporates possible future events should be applied; for example, the use of decision trees to provide a view as to the revenues and losses that result from a known sequence of actions over the forecast horizon (Crowder et al., 2005), or, alternatively, the application of survival analysis to predict the timing of important events that impact on the profitability of accounts (Stepanova and Thomas, 2001, 2002; Andreeva et al., 2007).

However, for revolving credit, if the customer's current borrowing is unconstrained by the existing terms of the credit offer, the profitability generated from the customer may be relatively invariant to any actions taken by a lender to extend more credit or otherwise alter the terms of the agreement. Hence, the profitability of accounts may be no more sensitive to a lender's actions than the good/bad payer status. This is particularly likely to apply to the relative profitability of accounts in terms of their rank within the population, and it has been demonstrated that good predictive models of account contribution can be constructed using standard regression approaches (Finlay, 2008b). Relative ranking is also likely to be stable if the lender's current lending strategy has evolved over many years and is therefore already close to optimal. The effect of new strategies applied to accounts acts only to maintain the current status quo, not to provide any large performance increment. The estimated 68 percent of credit card users who do not revolve their balances (MORI Market Dynamics, 2004) is a group whose relative contribution is likely to be fairly static regardless of the interest rate or credit limit strategies applied to them.

Another argument put forward by Lucas was the 'share of wallet' issue. A lender's share of the customer's total borrowing requirement (and hence the profit they make) is dependent upon the actions of other lenders who also have a financial relationship with the customer. Given that lenders do not tend to have information about the marketing and operational strategies of their competitors, a credit scoring model designed to predict the financial behaviour of accounts will inevitably generate poor forecasts due to this lack of information. This may be true for some customers, but again, if we consider the credit card case where most customers do not maintain a revolving credit balance, the offers and incentives put forward by the competition, such as low interest rates, balance transfer options, high credit limits etc., are unlikely to prove attractive to the majority of a lender's book. Anecdotal evidence of this is presented by considering the case of Barclaycard. For many years Barclaycard was the only major provider of VISA credit cards in the UK. A huge wave of competition arrived in the late 1980s and early 1990s, and today the UK market is an extremely competitive one. However, despite many competitors offering far superior product terms to Barclaycard's (in terms of interest rates and credit limits), Barclaycard has retained its position as the market leader by a considerable margin. The success of the new entrants in the market can largely be attributed to the growth in the market, not to acquiring business from Barclaycard.

What must also be recognised when discussing 'share of wallet' is the difference between accounts and customers, and the difference between forecasting individual behaviour and forecasting portfolio behaviour. If an individual is in the market for, say, a credit card, a mortgage and an unsecured personal loan, then the actions taken by different lenders to attract that customer may be very influential on the eventual outcome as to which products the customer chooses from which lenders, and hence the share of wallet each lender obtains and the size and composition of their portfolio. However, at the account level, actual credit usage may be very similar, regardless of the products chosen. So, if an individual has a credit card with one lender, whether or not they choose to take out a mortgage with the same lender or another lender will, arguably, have little impact on their credit card usage. One might also speculate that credit managers overestimate the impact of their account management decisions on customer behaviour compared to other areas of the business. For example, changes in operational strategies that result in a marked increase/decrease in service levels may be far more influential on customer behaviour, in terms of attrition rates, cross-sell opportunities etc., than the actions controlled by the credit department.

The main difficulty in weighing the merits of the above arguments is that they are mostly just that: arguments. Apart from Finlay (2008b), very little empirical research has been presented showing how models of continuous financial measures compare to classification approaches. I argue that more research needs to be undertaken in this area. In the remainder of this paper some of the themes introduced by Finlay (2008b) are expanded upon, and I provide evidence that binary models of customer behaviour should be superseded by continuous models, with decisions taken directly using the scores that result, as defined in Eqs. (4)–(6). I then compare these results with those from models that have been constructed to maximize a lender's objectives directly at portfolio level, using a heuristic search methodology based on genetic algorithms.

1.3. Methodology Stage 1

First, a standard good/bad payer model was constructed to predict P(G/x) using stepwise logistic regression. Logistic regression was chosen because it is the most commonly applied method for developing credit scoring models (Crook et al., 2007) and has been demonstrated to yield very similar performance to a wide range of classification/regression techniques when applied to credit scoring problems (Baesens et al., 2003a). A significance level of 1% was set for model entry, given the size of the data set and results from preliminary modelling that found no improvement from using significance levels ranging from 0.1% to 10%. The relationship between the model score and R, L, R_G and R_B was then examined. This was to see how well the assumption that R and L are the same for all model scores, as made for Eqs. (2) and (3), held true. A 10-fold cross-validation methodology was adopted, with observations within each fold selected using stratified random sampling by good/bad status.
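The fold construction described above can be sketched as follows. This is a structural illustration only: the data are synthetic, and the logistic model is fitted by plain gradient ascent rather than the stepwise procedure used in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def stratified_folds(y, n_folds=10):
    """Assign each observation to one of n_folds folds, sampling at random
    within the good and bad groups so every fold has the same class mix."""
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

def fit_logistic(X, y, iters=200, lr=0.1):
    """Plain gradient-ascent maximum-likelihood logistic regression
    (a stand-in for the stepwise procedure described in the paper)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

# Synthetic data with an intercept column; y depends on the first predictor.
n, k = 1000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 1]))).astype(int)

# 10-fold cross-validated, out-of-fold estimates of P(G/x).
folds = stratified_folds(y)
p_oof = np.empty(n)
for f in range(10):
    w = fit_logistic(X[folds != f], y[folds != f])
    p_oof[folds == f] = 1.0 / (1.0 + np.exp(-X[folds == f] @ w))
```

Each observation receives a score from a model that never saw it during fitting, and the stratification keeps the good/bad mix constant across folds.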

1.4. Methodology Stage 2

For the second stage of the analysis, models of the other components of Eqs. (4)–(6) were developed using two different regression approaches. The set of models produced from each approach is designated A and B, respectively:

- A: OLS regression models of C, R, R_G, R_B and L_B
- B: Neural network models of P(G/x), C, R, R_G, R_B and L_B

No accounts classified as good generated a loss. Therefore, models of L_G were not constructed and all values of L_G were taken to be zero. For models in set A, a stepwise OLS regression methodology was applied, with a 1% significance level for model entry.

For all experiments in Stage 2 a 10-fold cross-validation methodology was applied, with observations assigned to the same folds as in Stage 1. The neural network models in set B were developed using the SAS proc neural procedure, using an MLP architecture with a single hidden layer. The network models were trained to minimize the sum of squared errors using the quasi-Newton algorithm with a maximum of 500 training epochs. To avoid over-fitting, the data from the nine folds used to derive the network weights was further segmented 80/20 train/test. Training occurred using the 80% development sample, with model selection based on the network performance as measured on the 20% test set. The number of hidden units in each network was selected after performing a number of preliminary experiments. To further reduce the chance of over-fitting, the preliminary experiments were performed using a smaller sample than that used for the main experiments. Stratified random sampling by good/bad status was applied to create a sample containing 50% of the observations from the full population. This sample was then segmented 80/20 train/test. T − 1 models were then created using 2, 3, ..., T units in the hidden layer, where T is the number of inputs. A 3-fold cross-validation methodology was applied, with the average performance of each network calculated across each of the validation folds. The network structure with the best average performance was then used for the main experiments.
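The network set-up described above can be sketched as follows. This is a structural illustration only: it uses plain batch gradient descent rather than proc neural's quasi-Newton optimiser, synthetic data, and a smaller range of candidate hidden-layer sizes than the 2, ..., T grid in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_mlp(X, y, hidden, iters=1500, lr=0.1):
    """Single-hidden-layer MLP minimising the sum of squared errors by
    plain batch gradient descent (a sketch of the paper's MLP set-up)."""
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    w2 = rng.normal(scale=0.5, size=hidden)
    for _ in range(iters):
        H = np.tanh(X @ W1)                          # hidden-layer activations
        err = H @ w2 - y                             # output residuals
        w2 -= lr * H.T @ err / len(y)
        dH = (err[:, None] * w2) * (1.0 - H ** 2)    # back-propagated error
        W1 -= lr * X.T @ dH / len(y)
    return W1, w2

def predict(model, X):
    W1, w2 = model
    return np.tanh(X @ W1) @ w2

# Synthetic regression target standing in for a financial measure such as C.
n = 800
X = rng.normal(size=(n, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# 80/20 train/test split; the hidden-layer size is chosen on the 20% test
# partition, mirroring the model-selection step described above.
cut = int(0.8 * n)
results = []
for h in (2, 3, 4):  # candidate hidden-layer sizes (illustrative)
    model = fit_mlp(X[:cut], y[:cut], hidden=h)
    mse = np.mean((predict(model, X[cut:]) - y[cut:]) ** 2)
    results.append((mse, h))
best_mse, best_h = min(results)
print(f"selected {best_h} hidden units, test MSE {best_mse:.3f}")
```

The point is the protocol, not the optimiser: weights are fitted on the training partition only, and the architecture decision is taken on held-out data.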

Models in sets A and B were developed using two different regression approaches to ensure that the results reflected features within the data, and were not a result of the chosen data pre-processing/modelling technique. The primary reason for using a network methodology, in addition to the linear modelling approach, was as a validation tool: to ensure that if there were any non-linear features in the data then they received appropriate treatment within the network. It should be noted, however, that in real world credit scoring applications neural networks are not popular (Baesens et al., 2003b). This is because any small benefit neural networks might provide in terms of their ability to discriminate between good/bad credit risks tends to be insufficient to outweigh the operational benefits provided by simpler linear models.

Models in sets A and B were used to generate scores resulting from their application to Eqs. (4)–(6). The properties of the resulting scores were then compared to those of the traditional good/bad payer model; that is, the logistic regression/neural network models of P(G/x). Each score was compared on the basis of the sum of contribution, C, from accepted applicants, for 19 different reject rate strategies ranging from 5% to 95% in intervals of 5%.
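The reject-rate comparison can be illustrated with a short routine that, for each reject rate, accepts the highest-scoring applicants and totals their contribution. The scores and contribution figures below are invented for illustration.

```python
import numpy as np

def contribution_at_reject_rates(score, contribution, reject_rates):
    """For each reject rate (in %), accept the highest-scoring
    applicants and sum their contribution, as in the Stage 2 comparison."""
    order = np.argsort(-score)          # highest scores first
    c_sorted = contribution[order]
    n = len(score)
    results = {}
    for q in reject_rates:
        n_accept = int(round(n * (1 - q / 100.0)))
        results[q] = c_sorted[:n_accept].sum()
    return results

score = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
contrib = np.array([10., 8., 6., 5., 4., 3., 2., 1., -2., -9.])
out = contribution_at_reject_rates(score, contrib, range(5, 100, 5))
print(out[50])  # top 5 applicants accepted: 10+8+6+5+4 = 33.0
```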

1.5. Methodology Stage 3

532 S. Finlay / European Journal of Operational Research 202 (2010) 528–537

For the third stage of analysis, additional models were produced using a genetic algorithm (GA). A GA is a data-driven, non-parametric heuristic search process, where the training algorithm can be chosen to optimize a wide range of objective functions. Therefore, GAs have the potential to generate models that outperform other approaches to scorecard development if the GA optimizes the specific measure that is of interest to the user, while competing approaches optimize a more general function, such as minimized squared error or maximum likelihood. GA theory was developed by John Holland and his associates in the late 1960s and early 1970s (Holland, 1975), and GAs were quickly adopted as a heuristic approach applicable to a wide range of optimization problems (Hollstien, 1971; De Jong, 1975).

The general principles of GAs are analogous to the Darwinian principles of natural selection and survival of the fittest, and the terminology employed to describe GA training and selection is taken from the biological analogy. A set of solutions to a given problem is analogous to a population of individuals in nature. The goal of the GA is to combine and mutate different solutions so that over time fitter (better) solutions evolve. Many different methods have been proposed for the design of GAs. For reasons of expediency I do not enter into a detailed discussion of the various alternatives, other than to comment that overall GAs tend to be remarkably robust, and a wide range of different GA methodologies will lead to good solutions for many problems (Reeves, 1995). This is particularly true for problems where the flat maximum effect is prevalent – a situation that is known to exist for credit scoring problems (Lovie and Lovie, 1986; Overstreet et al., 1992). Therefore, when applying a GA approach to real world problems it is not usually necessary to go to great lengths to find the best possible GA design. Instead, one needs to identify the range of parameter values for which good solutions exist, and then choose values that lie somewhere within these ranges. For those wishing to know more about approaches to GA design, Coley (1999) provides a good practical introduction while Reeves and Rowe (2003) provides a more theoretical treatment.

In this paper, the goal was to use the GA to create linear scoring functions of the form S = a + b1x1 + . . . + bnxn that maximized the sum of contribution, C, from individuals scoring above a chosen cut-off:

MAX Σ (i = k+1 to n) Ci,          (7)

where there are n observations ranked in ascending order by S, and k is chosen such that 100 * k/n yields the desired reject rate. Hence separate GA-derived models are generated for each reject rate of interest. It should be noted that the resulting score, S, for the GA-derived models cannot be interpreted as an estimate of individual financial performance, but merely as the relative score of each observation within the dataset. A linear function was chosen for the GA-derived models because, as discussed earlier, linear scoring rules are favoured by practitioners and perform well when compared to a wide variety of alternative modelling techniques (Baesens et al., 2003a).
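A fitness function implementing Eq. (7) might look as follows; the toy data and coefficient vector are purely illustrative.

```python
import numpy as np

def fitness(coeffs, X, contribution, reject_rate):
    """Fitness under Eq. (7): score S = a + b.x, rank ascending by S,
    and sum the contribution of the n - k highest-scoring observations,
    where 100 * k/n equals the desired reject rate."""
    a, b = coeffs[0], coeffs[1:]
    s = a + X @ b
    n = len(s)
    k = int(round(n * reject_rate))
    order = np.argsort(s)               # ascending by score
    return contribution[order][k:].sum()

X = np.array([[1.0], [2.0], [3.0], [4.0]])
c = np.array([-5.0, 1.0, 2.0, 3.0])
# Reject the lowest-scoring 25% of cases: contribution 1 + 2 + 3
print(fitness(np.array([0.0, 1.0]), X, c, reject_rate=0.25))  # 6.0
```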

For the GA a 16-bit binary string was used to represent each parameter coefficient of the model, giving a string length of L = 16 + 16 * 133 = 2144. This should provide more than a sufficient level of granularity, given that most commercial credit scoring systems generate integer scores in the range 0–1000 (Finlay, 2008b). To select the parameters for GA training, a number of preliminary experiments were performed, examining population sizes ranging from 500 to 2000, mutation rates of between 0.5/L and 2/L, and crossover probabilities of between 0.125 and 1. The results from these preliminary experiments were very similar, suggesting that the performance of the GA is relatively insensitive to the choice of parameters. The final parameters were therefore chosen on a somewhat arbitrary basis, to lie within the range of values investigated in the preliminary experiments. The population size was set equal to 1000 and selection was performed using linear rank selection. Parameterized uniform crossover was applied, with a crossover probability of 0.25 at each locus of the string. The mutation rate was set at 1/L for each character in the string. Elitism was applied, with the best 1% of models automatically propagated forward from one generation to the next without mutation.
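A minimal sketch of the string encoding and operators described above. The coefficient range [-100, 100] is an assumption for illustration; the paper does not state the range the 16-bit segments map onto.

```python
import numpy as np

rng = np.random.default_rng(42)
BITS = 16

def decode(bitstring, lo=-100.0, hi=100.0):
    """Map each 16-bit segment of the string to a real coefficient in
    [lo, hi] (the range is an illustrative assumption)."""
    n_coef = len(bitstring) // BITS
    coeffs = np.empty(n_coef)
    for j in range(n_coef):
        seg = bitstring[j * BITS:(j + 1) * BITS]
        v = int("".join(map(str, seg)), 2)        # 0 .. 2**16 - 1
        coeffs[j] = lo + (hi - lo) * v / (2 ** BITS - 1)
    return coeffs

def uniform_crossover(p1, p2, p_swap=0.25):
    """Parameterized uniform crossover: swap each locus with prob p_swap."""
    mask = rng.random(len(p1)) < p_swap
    c1, c2 = p1.copy(), p2.copy()
    c1[mask], c2[mask] = p2[mask], p1[mask]
    return c1, c2

def mutate(s):
    """Flip each bit with probability 1/L."""
    flip = rng.random(len(s)) < 1.0 / len(s)
    return np.where(flip, 1 - s, s)

s = np.zeros(2 * BITS, dtype=int)
s[BITS:] = 1                                      # second coefficient all ones
print(decode(s))  # [-100.  100.]
```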

As with the neural network models, the development data was segmented 80/20 train/test. The performance of the best model in each generation was then evaluated on the test set, and the model yielding the best test set performance selected. The GA was run for a maximum of 500 generations, or until there had been 50 generations without improvement on the test sample. Ten-fold cross-validation was again applied, with observations assigned to the same folds as defined previously for Stages 1 and 2. The scores resulting from the GA-derived models were then compared with the scores generated from Eq. (6) using linear regression and neural networks from Stage 2.

1.6. Empirical data

A behavioural scoring data set was supplied by a large UK catalogue retailer that provides revolving credit. The operation of an account is similar to that of a store card, in that it can be used to purchase goods from the company, but it does not provide cash withdrawal or balance transfer facilities. Credit is provided interest free, but the prices that customers pay for goods are higher than typical high street prices, to account for the overheads associated with credit provision. The profit margin on items varies somewhat, but overall the retailer looks to obtain a similar margin across the majority of items it sells. Consequently, a working assumption used by the company is that contribution can be taken to be a fixed proportion of the payments that are made against an account. Note that this data set is similar to that described by Finlay (2008b). It was provided by the same organization and was extracted from the same database, but is from a different credit portfolio. The data set contained 54 predictor variables. These are typical behavioural scoring variables such as balance, current and historic arrears status and payment to balance ratios. Account performance was observed for 12 months and used to classify accounts as good or bad payers. Goods were those up-to-date or less than one cycle delinquent; bads were three cycles or more delinquent. All other account conditions were classified as indeterminate. This is consistent with good/bad definitions commonly quoted as being used in industry (Lewis, 1992; Hand and Henley, 1997; McNab and Wynn, 2003). After excluding cases such as newly opened accounts, indeterminates, dormant accounts and accounts already seriously delinquent, the sample contained 105,134 observations classified as good and 17,109 classified as bad.

The data also contained financial information about account behaviour over the 12 month outcome period; in particular, the total income received from the customer and the outstanding balance at the end of the outcome period. For accounts classified as bad, the loss, L, on the account was taken to be a fixed proportion, b, of the outstanding account balance, K, to take into account debt recovery action resulting in some of the debt being recovered. The net return, R, was taken to be equal to a fixed proportion, a, of N, the gross payments received from each customer over the outcome period. An approximate value of the contribution from each account, i, was calculated as:

Ci = aNi − bKi = Ri − Li.          (8)

It is acknowledged that (8) does not represent a true contribution measure, but it comprises the two most important components of contribution for a credit product; that is, bad debt, L, and net revenue, R. It should therefore act as an acceptable approximation for decision making purposes, where the priority is to rank accounts on the basis of their relative performance. This is a similar approximation to customer value to that proposed by Thomas et al. (2001) and applied by Andreeva et al. (2007). If more detailed information about costs and revenues were available then it would be a simple matter to extend (8) to incorporate these. For reasons of commercial sensitivity the values of a and b are not disclosed. However, histograms showing the distribution of revenue and loss are shown in Fig. 1.
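As an illustration of Eq. (8), with purely invented values of a and b (the true values are not disclosed):

```python
def contribution(gross_payments, balance_at_default, a=0.25, b=0.5):
    """Approximate contribution per Eq. (8): C = a*N - b*K.
    The values of a and b here are purely illustrative; the paper
    does not disclose the real ones."""
    return a * gross_payments - b * balance_at_default

# A good payer (no loss) and a bad payer with a 400 pound charge-off
print(contribution(300.0, 0.0))    # 75.0
print(contribution(150.0, 400.0))  # -162.5
```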


[Fig. 1 comprises four histograms of account-level values (£): Revenue From All Accounts (R): Mean: 48.35, Median: 33.61, St Dev: 48.40, Range: (0, 420); Revenue From Good Payers (RG): Mean: 48.36, Median: 34.99, St Dev: 50.01, Range: (0, 420); Revenue From Bad Payers (RB): Mean: 36.94, Median: 26.97, St Dev: 34.51, Range: (0, 375); Loss From Bad Payers (LB): Mean: 365.31, Median: 254.94, St Dev: 361.77, Range: (0, 2277).]

Fig. 1. Distributions of revenue and loss.


From Fig. 1 it can be seen that the net revenue generated from bad accounts (RB) is lower than that for the goods, but the values are significant. The average bad generates 76% of the revenue of an average good. Overall, the total revenue generated from the bads represents 10.7% of the total revenues from the portfolio as a whole. If one also considers the losses on bads at the end of the outcome period (LB), compared to the total contribution generated from bads over the outcome period (using Eq. (8)), then the total contribution from bad accounts is −£5.62 m. This compares to the charge-off value of £6.25 m. This difference is important because what lenders often consider when evaluating portfolios is loss given default, which is widely used within Value at Risk calculations for capital requirements planning. However, from a profitability perspective, one must also consider the revenue received over the forecast horizon if an accurate forecast is to be made; otherwise an overly pessimistic view of account profitability will result.

1.7. Data pre-processing

For the logistic regression, OLS and genetic algorithm derived models, the independent variables were pre-processed to code them as a set of dummy variables. This is standard practice in the development of credit scoring models (Hand and Henley, 1997; Crook et al., 2007; Finlay, 2008a) and is generally a good way to deal with non-linear relationships within the data (Fox, 2000; Hand et al., 2005). For categorical variables, each category was coded as a separate dummy variable. In cases where a category contained relatively few data points, the categories were combined based on the expert opinion (of someone with more than 10 years of commercial scorecard building experience) as to which larger category the smaller category was most similar, or were considered to be spurious data and excluded from the analysis. These decisions could have been made using purely mechanical means, for example by considering univariate measures such as the information value for alternative attribute combinations, but expert opinion was applied so that business-sensible groupings resulted. For continuous and semi-continuous variables, dummy variable definitions were initially defined for each 10% of the population. Where appropriate, dummy variables were redefined to yield business-sensible groupings. For example, for the variable credit limit utilization, the view was expressed that a value <=100% (indicating an account is within its credit limit) may be indicative of very different underlying relationships within the data than values >100% (indicating the account is over its credit limit). Therefore, the dummy boundaries were restricted to occur at 100%.
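The forced boundary at 100% utilisation can be sketched with pandas. The bin counts and data below are illustrative, not the paper's actual attribute definitions.

```python
import pandas as pd

def utilisation_dummies(util, n_below=5, n_above=2):
    """Quantile-style dummy coding for credit limit utilisation, with
    bin boundaries forced to break at 100% so that accounts within and
    over their limit never share a dummy. Bin counts are illustrative."""
    below = util <= 100.0
    labels = pd.Series(index=util.index, dtype=object)
    for mask, prefix, n_bins in ((below, "le100_", n_below),
                                 (~below, "gt100_", n_above)):
        codes = pd.qcut(util[mask], n_bins, labels=False, duplicates="drop")
        labels[mask] = prefix + pd.Series(codes,
                                          index=util[mask].index).astype(str)
    return pd.get_dummies(labels)

util = pd.Series([5.0, 20, 35, 50, 65, 80, 95, 100, 105, 130, 160, 200])
d = utilisation_dummies(util)
print(sorted(d.columns))
```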

As described in the methodology sections, separate models were developed for P(G/x), C, R, RG, RB and LB. Models of P(G/x), C and R were developed for the entire population, models of RG for just the goods, and RB and LB for just the bads. Therefore, pre-processing was performed separately for each of the three populations (All, Goods and Bads). However, in order to provide a fair comparison between models and to ensure that any differences in model performance did not result from features introduced during data pre-processing, all of the dummy variables defined for each modelling objective were allowed to feature in any of the resulting models.

For the neural network models, each interval within categorical variables was coded as a dummy variable, and continuous input variables were standardized by subtracting the mean and dividing by the standard deviation.

1.8. Results Stage 1

For reference purposes the score from the logistic regression model of P(G/x), produced during the first stage of analysis, is referred to as SLog. The score from the neural network model of P(G/x) from the second stage is referred to as SNet. The other scores from the second and third stages are referenced as SXXXY. The subscript XXX identifies the procedure used to generate the models; that is, Lin, Net or GAS for logistic/linear regression, neural networks and genetic algorithms, respectively. The subscript Y identifies the equation from which the final score is derived; that is, Eqs. (4)–(6).

We begin by considering how revenue and loss are distributed across the range of model scores for SLog, the logistic regression model of P(G/x). For information, the average GINI coefficient for SLog from the ten validation folds was 0.643. This is well within the usual range of 0.4–0.8 for models of this type (Finlay, 2008a). Fig. 2 shows the mean and 99% confidence interval for the distributions of revenue and loss by score decile.
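The GINI coefficient quoted here is the standard credit-scoring discrimination measure, computable as twice the area under the ROC curve minus one; a minimal sketch using scikit-learn on toy data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(y_true, score):
    """GINI coefficient as commonly used in credit scoring: 2*AUC - 1."""
    return 2.0 * roc_auc_score(y_true, score) - 1.0

# Toy example: 1 = good payer, 0 = bad payer, higher score = lower risk
y = np.array([1, 1, 1, 0, 0])
s = np.array([0.9, 0.8, 0.4, 0.5, 0.2])
print(round(gini(y, s), 4))  # 0.6667
```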

[Fig. 2 comprises four panels plotting the mean and 99% confidence interval of revenue/loss (£) by score decile (1–10): Revenue from All Accounts (R); Revenue from Good Payers (RG); Revenue from Bad Payers (RB); Loss on bad payers (LB).]

Fig. 2. Distributions of revenue and loss by SLog.

From Fig. 2 it can be seen that the values of R, RG, RB and LB vary by score, and the relationships contain non-linear components. Application of the non-parametric Kruskal–Wallis test of between-group means provides support for this, with the null hypothesis of

equal group means rejected at the 99.5% significance level in all four cases. This contradicts the assumptions made in relation to Eqs. (2) and (3) that debt and revenue take the same value for all model scores, and supports the case for using approaches such as those suggested by (4)–(6). One interesting feature is that high scoring bads generate less bad debt than low scoring bads. This is surprising because high scoring bads would be expected to have higher credit limits than low scoring ones and therefore default for higher amounts, but this is not the case. This suggests there may be underlying differences in the causes of default and/or customers' responses to default, in terms of their willingness and ability to deal with it. Another interesting feature is that revenue for both good and bad payers drops off towards the higher deciles, with the greatest revenues for bads found towards the middle of the distribution. Overall, Fig. 2 suggests that the relationship between risk and contribution is weak. This is supported by the Spearman rank correlation coefficient between contribution, C, and estimated risk, SLog, which is just 0.033.
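Both tests used in this section are available in scipy; a sketch on synthetic stand-in data, since the real portfolio data is not public:

```python
import numpy as np
from scipy.stats import kruskal, spearmanr

rng = np.random.default_rng(0)

# Stand-in for revenue grouped by score decile: ten groups whose
# distributions genuinely differ, so the Kruskal-Wallis null is rejected
deciles = [rng.gamma(shape=2.0, scale=15 + d, size=200) for d in range(10)]
H, p = kruskal(*deciles)
print(p < 0.005)  # True: the decile distributions differ

# Stand-in for the weak risk/contribution link (rho reported as 0.033)
score = rng.normal(size=500)
contribution_c = 0.05 * score + rng.normal(size=500)
rho, _ = spearmanr(score, contribution_c)
print(round(rho, 2))
```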

1.9. Results Stage 2

Note to Fig. 2: Deciles are calculated separately for each population. Therefore, deciles for RB and LB are calculated using only observations classified as bad. Likewise, only goods are used to create the score deciles for RG. All observations are used to create score deciles for R.

Figs. 3 and 4 show the percentage of the theoretical maximum contribution that is identified by applying Eqs. (4)–(6) to a range of reject rate strategies. For a given reject rate, Q, the percentage of the maximum possible contribution is calculated as 100 * CQm/CQp, where CQm is the sum of contribution for all accepted observations

[Fig. 3 plots the percentage of maximum contribution (y-axis, 0–50%) against reject rate (x-axis, 0–100%) for scores SLog, SLin4, SLin5 and SLin6.]

Fig. 3. Contribution by reject rate (linear models).

[Fig. 4 plots the same measures for scores SNet, SNet4, SNet5 and SNet6.]

Fig. 4. Contribution by reject rate (neural network models).

[Fig. 5 plots the same measures for scores SNet6, SLin6 and SGAS6.]

Fig. 5. Contribution by reject rate (SGAS6, SLin6 and SNet6).

when observations are ranked by the scores generated from model m, and CQp is the sum of contribution for accepted observations where observations are ranked in known contribution order; i.e. the ranking that would be given by a perfect model.
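The percentage-of-maximum-contribution measure can be computed as follows (the scores and contributions are toy data):

```python
import numpy as np

def pct_max_contribution(score, contribution, reject_rate):
    """Percentage of maximum contribution: 100 * CQm / CQp, where CQm
    sums contribution over accepted cases ranked by model score and CQp
    sums it over the same number of cases ranked by known contribution."""
    n = len(score)
    n_accept = n - int(round(n * reject_rate))
    by_model = contribution[np.argsort(-score)][:n_accept].sum()
    by_perfect = np.sort(contribution)[::-1][:n_accept].sum()
    return 100.0 * by_model / by_perfect

score = np.array([0.1, 0.9, 0.8, 0.2, 0.7])
contrib = np.array([5.0, 4.0, 1.0, 2.0, 3.0])
# Accepts the 3 best-scoring cases: 100 * (4+1+3) / (5+4+3)
print(pct_max_contribution(score, contrib, reject_rate=0.4))
```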

Figs. 3 and 4 show similar, but not identical, features. The main area where the results concur is the performance of scores SLin4 and SNet4, and SLin5 and SNet5, generated from Eqs. (4) and (5). These display almost identical predictive ability and both dominate SLog and SNet. This supports the case that it is possible to construct good predictive models of revenues and loss, and hence contribution, and that such models have the capacity to outperform models of P(G/x). What is also interesting is that these scores also dominate SLin6 and SNet6, generated from Eq. (6). This suggests that the factors

predictive of risk, loss and revenue are different, meaning that for an aggregate function of contribution (such as Eq. (6)) the relationships are non-linear and hence not well modelled using linear methods such as OLS. For the neural network models the difference in performance between SNet4/SNet5 and SNet6 is also significant, but to a lesser extent, suggesting that the network has managed to deal with some, but not all, of the non-linearity. Therefore, other network configurations that have greater capacity to deal with non-linear features of the data set, for example networks with more units in the hidden layer and/or an additional hidden layer, may yield better performance. Another feature of interest is the relationship between SLin6 and SLog. Neither model dominates the other. In particular, for reject rates in the 15–40% range, SLog is the better of the two. Given that this is the region within which many lenders apply their decision strategies, it may be one reason why Lucas (2001) reported that profitability models do not perform well, if the comparison(s) he refers to considered aggregate measures of contribution based on linear modelling techniques. The final comment on Figs. 3 and 4 is that in general the OLS models marginally outperform the network models, which may be attributable to the different data pre-processing applied. The exception to this is the performance of SNet6, which dominates SLin6, providing further support for aggregate measures of contribution displaying significant non-linearity in their relationship with the predictor variables.

1.10. Stage 3 results

The GA was coded using Microsoft Visual C++, and each run of the GA required approximately 45 minutes of CPU time, using one core of an Intel Core 2 Duo E6750 based PC with 4 GB RAM. 190 runs of the GA were required (19 different reject rates and a 10-fold development/validation methodology), resulting in around 142.5 hours of processing time. In nearly all cases the best performance of each GA was found within 50–150 generations, and in no case were the maximum 500 generations required. Fig. 5 shows the results for the GA models, SGAS6, compared to SLin6 and SNet6.

We shall begin by considering the relative performance of the scores produced from the two linear modelling approaches: SLin6 and SGAS6. As shown in Fig. 5, SGAS6 dominates SLin6 across all reject rate strategies examined. For 18 of the 19 reject rate strategies, the difference in performance between SGAS6 and SLin6 is significant at the 99% level, based on the non-parametric Wilcoxon signed rank test, applied to the results from each of the 10 validation samples. For the remaining case (a reject rate of 35%) the difference is significant at the 95% level but not the 99% level. This provides evidence that specific business objectives, such as contribution maximization, are not perfectly correlated with more general objective functions such as minimized sum of squared error. Therefore, in such circumstances modelling methodologies that are capable of optimizing specific business objectives may be more appropriate than methodologies that cannot.
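The paired comparison across the 10 validation folds can be reproduced with scipy's Wilcoxon signed-rank test; the fold-level figures below are invented for illustration.

```python
import numpy as np
from scipy.stats import wilcoxon

# Illustrative paired fold results: contribution captured by two scores
# on the same 10 validation folds (figures are made up)
ga_score  = np.array([41.2, 40.8, 42.0, 41.5, 40.9, 41.7, 42.3, 41.1, 40.6, 41.9])
ols_score = np.array([39.8, 40.1, 40.9, 40.2, 40.0, 40.5, 40.8, 40.1, 39.8, 40.3])

stat, p = wilcoxon(ga_score, ols_score)
print(p < 0.01)  # True: all 10 paired differences favour the GA here
```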

The primary objective of this research has been to compare the performance of scores generated from various linear models. However, it is also worth commenting on the relative performance of the scores generated from the linear model produced using the GA (SGAS6) and the scores generated from the neural network models of contribution (SNet6), which are also shown in Fig. 5. The main feature is that neither model dominates the other. For reject rates up to 15%, both scores perform similarly. For reject rates in the region 20–35%, SNet6 outperforms SGAS6, but the differences are not statistically significant. However, for higher reject rates SGAS6 outperforms SNet6, and for all reject rates above 55% the differences are significant at the 99% level.

1.11. Concluding remarks

In this paper, several approaches to modelling continuous financial measures have been examined. The results suggest three things. First, scores generated from modelling individual components of profit contribution, such as default probability, bad debt and revenue, which are then combined as in Eqs. (4) and (5), dominate single aggregated models of profit contribution such as those in Eq. (6). Second, such approaches better identify profitable accounts than traditional classification models used to generate estimates of the likelihood of default. This is not surprising given that different modelling objectives have been applied, but it does provide evidence that goes some way towards discounting the arguments that have been made against modelling profit based objectives in this way. Third, modelling techniques that can be tailored to optimize specific business objectives, such as genetic algorithms, can outperform techniques that optimize more general objective functions, such as likelihood or minimized sum of squared errors.

With regard to future research, it would be desirable to repeat the research using additional datasets, for different credit portfolios, should they become available. A second area of investigation would be to look at the sensitivity of the profit objective. Some work along these lines was undertaken by Finlay (2008b), which concluded that the ranking that observations received was relatively invariant to the profit definition that was applied. However, there is further work to be done in this area, particularly in terms of the assumptions that are made about how the revenue/loss components of profit are calculated, and how variant these are over different forecast horizons. A third line of investigation would be to examine the application of a GA approach to produce separate linear models of each of the components of Eqs. (4) and (5), which would then be combined. Finally, it would be interesting to compare the performance of GA derived models with other optimization techniques, such as linear programming and simulated annealing.

Acknowledgements

The author would like to thank the ESRC for their support for this research, and Professor Robert Fildes of Lancaster University for his comments on the paper, as well as the detailed comments and suggestions made by the three journal reviewers. The author would also like to express his gratitude to the organization and its staff that provided the data set used for the empirical study, but which has requested that its identity remain anonymous.

References

Andreeva, G., Ansell, J., Crook, J., 2007. Modelling profitability using survival combination scores. European Journal of Operational Research 183, 1537–1549.

Baesens, B., Gestel, T.V., Viaene, S., Stepanova, M., Suykens, J., Vanthienen, J., 2003a. Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society 54 (5), 627–635.

Baesens, B., Setiono, R., Mues, C., Vanthienen, J., 2003b. Using neural network rule extraction and decision tables for credit-risk evaluation. Management Science 49 (3), 312–329.

Beling, P.A., Covaliu, Z., Oliver, R.M., 2005. Optimal scoring cutoff policies and efficient frontiers. Journal of the Operational Research Society 56 (9), 1016–1029.

Coley, D.A., 1999. An Introduction to Genetic Algorithms for Scientists and Engineers. World Scientific.

Crook, J.N., Edelman, D.B., Thomas, L.C., 2007. Recent developments in consumer credit risk assessment. European Journal of Operational Research 183, 1447–1465.

Crowder, M., Hand, D.J., Krzanowski, W.J., 2005. On customer lifetime value. Credit Scoring and Credit Control IX, Edinburgh, Credit Research Centre, University of Edinburgh School of Management.

Cyert, R.M., Davidson, H.J., Thompson, G.L., 1962. Estimation of the allowance for doubtful accounts by Markov chains. Management Science 8 (3), 287–303.

De Jong, K.A., 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. University of Michigan, Ann Arbor, Michigan.

Desai, V.S., Conway, D.G., Crook, J., Overstreet, G., 1997. Credit-scoring models in the credit union environment using neural networks and genetic algorithms. IMA Journal of Mathematics Applied in Business and Industry 8 (4), 323–346.

Durand, D., 1941. Risk Elements in Consumer Instalment Financing. National Bureau of Economic Research, New York.

Fildes, R., Makridakis, S., 1988. Forecasting and loss functions. International Journal of Forecasting 4 (4), 545–550.

Finlay, S.M., 2005. Using Genetic Algorithms to Develop Scoring Models for Alternative Measures of Performance. Credit Scoring and Credit Control IX, Edinburgh.

Finlay, S., 2008a. The Management of Consumer Credit: Theory and Practice. Palgrave Macmillan, Basingstoke, UK.

Finlay, S.M., 2008b. Towards profitability: A utility approach to the credit scoring problem. Journal of the Operational Research Society 59 (7), 921–931.

Finlay, S., 2009a. Consumer Credit Fundamentals. Palgrave Macmillan, Basingstoke.

Finlay, S.M., 2009b. Are we modelling the right thing? The impact of incorrect problem specification in credit scoring. Expert Systems with Applications 36 (5), 9065–9071.

Fogarty, T.C., Ireson, N.S., 1993. Evolving Bayesian classifiers for credit control – a comparison with other machine learning methods. IMA Journal of Mathematics Applied in Business and Industry 5 (1), 63–75.

Fox, J., 2000. Nonparametric Simple Regression. Sage, Newbury Park.

Hand, D.J., 2005. Good practice in retail credit scorecard assessment. Journal of the Operational Research Society 56 (9), 1109–1117.

Hand, D.J., Henley, W.E., 1997. Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society, Series A (Statistics in Society) 160 (3), 523–541.

Hand, D.J., Sohn, S.Y., Kim, Y., 2005. Optimal bipartite scorecards. Expert Systems with Applications 29 (3), 684–690.

Hand, D.J., Whitrow, C., Adams, N.M., Juszczak, P., Weston, D., 2008. Performance criteria for plastic card fraud detection tools. Journal of the Operational Research Society 59 (7), 956–962.

Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan.

Hollstien, R.B., 1971. Artificial Genetic Adaptation in Computer Control Systems. University of Michigan, Michigan.

Hopper, M.A., Lewis, E.M., 1992. Development and use of credit profit measures for account management. IMA Journal of Mathematics Applied in Business and Industry 4, 3–17.

Lachenbruch, P.A., 1982. Discriminant analysis. In: Kotz, S., Johnson, N.L. (Eds.), Encyclopaedia of Statistical Sciences, vol. 2. Wiley & Sons Inc.

Lewis, E.M., 1992. An Introduction to Credit Scoring. Athena Press, San Rafael.

Li, H.G., Hand, D.J., 2002. Direct versus indirect credit scoring classifications. Journal of the Operational Research Society 53 (6), 647–654.

Lovie, A., Lovie, P., 1986. The flat maximum effect and linear scoring models for prediction. Journal of Forecasting 5 (3), 159–168.

Lucas, A., 2001. Statistical challenges in credit card issuing. Applied Stochastic Models in Business and Industry 17 (1), 83–92.

McNab, H., Wynn, A., 2003. Principles and Practice of Consumer Risk Management.

MORI Market Dynamics, 2004. Credit Card Debt 'Overstated, Over Reported And Largely A Myth'. From <http://www.morimarketdynamics.com/news_credit-card-myth-pr.php> (Retrieved 08.11.04).

Oliver, R.M., Wells, E., 2001. Efficient frontier cutoff policies in credit portfolios. Journal of the Operational Research Society 52 (9), 1025–1033.

Overstreet, G.A., Bradley, E.L., Kemp, R.S., 1992. The flat maximum effect and generic linear scoring models: A test. IMA Journal of Mathematics Applied in Business and Industry 4, 97–109.

Reeves, C.R., 1995. Modern Heuristic Techniques for Combinatorial Problems. McGraw-Hill.

Reeves, C.R., Rowe, J.E., 2003. Genetic Algorithms – Principles and Perspectives: A Guide to GA Theory. Kluwer Academic Publishers.

Stepanova, M., Thomas, L.C., 2001. PHAB scores: Proportional hazards analysis behavioural scores. Journal of the Operational Research Society 52 (9), 1007–1016.

Stepanova, M., Thomas, L.C., 2002. Survival analysis methods for personal loan data. Operations Research 50 (2), 277–289.

Thomas, L.C., Banasik, J., Crook, J.N., 2001a. Recalibrating scorecards. Journal of the Operational Research Society 52 (9), 981–988.

Thomas, L.C., Ho, J., Scherer, W.T., 2001b. Time will tell: Behavioural scoring and the dynamics of consumer credit assessment. IMA Journal of Management Mathematics 12 (1), 89–103.

Thomas, L.C., Edelman, D.B., Crook, J.N., 2002. Credit Scoring and its Applications. SIAM, Philadelphia.

Thomas, L.C., Oliver, R.W., Hand, D.J., 2005. A survey of the issues in consumer credit modelling research. Journal of the Operational Research Society 56 (9), 1006–1015.

Yobas, M.B., Crook, J., Ross, P., 2000. Credit scoring using neural and evolutionary techniques. IMA Journal of Mathematics Applied in Business and Industry 11 (2), 111–125.