How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to...

6
data mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics Volume 15, Number 10 www.dmreview.com Information Is Your Business

Transcript of How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to...

Page 1: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

data mining

Howto BuyDataMiningA Framework forAvoiding Costly ProjectPitfalls in PredictiveAnalytics

Volume 15, Number 10 www.dmreview.com

Information Is Your Business

Page 2: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

HA Frameworkfor AvoidingCostly Project

Pitfalls in PredictiveAnalytics

By Eric A. King

How does someone purchase an intangible, cryptic,seemingly immeasurable technology? Beyond the inher-ent up-front risks of engaging in what is essentially adiscovery process, just identifying a starting point can beintimidating and mystifying. Despite its elusive nature,data mining technology has surpassed the flash-in-the-pan “miracle tool” stigma with widespread and sustainedsuccess stories highlighted in mainstream publications,along with recurring case studies of improved opera-tional efficiencies, enhanced business intelligence andresidual payback. For any organization with annual rev-enues more than $50 million, employing data miningtechnology is not a matter of whether, but when.

Data mining has been seeping into mainstreambusiness applications for more than two decades.Numerous case studies may be quickly referenced via asimple Internet or publication search. Its progress isunstoppable, propelled by sustained value justifications- yet stinted by the complexities of development,interpretation, integration and adoption. Thisarticle will suggest how to properlyapproach the starting line and

data mining

How to Buy

Page 3: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

how to implement a purposefully flexibleframework for establishing an efficientand effective organizational data miningprocess.

Cutting Through The BuzzLet’s first make sure that we’re on thesame page when talking about data min-ing. It is not wholly incorrect to labeldata mining as retrospective searcheson a large database for specificcriteria, otherwise known asonline analytical processing

(OLAP) or SQL queries. An example ofOLAP or SQL queries would entail min-ing a large repository to identify femalesbetween the ages of 28 and 45 fromNew York, New Jersey and Delawarewith incomes between $65,000 to$90,000 who purchased blue slacks

between July 1 and August 15. For thisquery, we know the exact question to ask ofthe database. This practice typically exploresjust 5 to 15 percent of a large database.

For the purposes of this article, datamining shall refer to computer-aided pat-tern discovery of previously unknowninterrelationships and recurrences acrossseemingly unrelated attributes in order topredict actions, behaviors and outcomes.Simply put, when referring to data miningin this article, we are looking at predic-tion derived from information hiddenwithin large volumes of data rather thanretrospection drawn from an OLAP orSQL query.

It is important to recognize andrelate much of the popular terminologythat is thrown about in order to providecontext going forward. Data mining tech-nology is not new. Methods forautomating pattern discovery and pre-diction have existed for decades. Despitea considerable level of hype and strategicmisuse, data mining has not only perse-vered but also matured and adapted forpractical use in the business world. Howcould a community that is so data-rich,yet information-poor and profit-drivenabandon a tool that can validate its ownability to predict customer behavior?

Alongside the technology, terminol-ogy has evolved over the last fourdecades. Names from 40 years ago arestill recognizable as common phrasestoday. In the ‘70s and ‘80s, names such asartificial intelligence and machine learn-ing that implied the computer had itsown consciousness were somewhatoversold (perhaps even “over-souled”).The names of various data mining build-ing blocks such as neural networks,genetic algorithms and evolutionarycomputing deservedly carry Darwinist

Data Mining

Page 4: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

tones of natural selection, as the underly-ing mathematics emulates biologicalprocesses. From a mathematician’s per-spective, these processes may be viewedas statistics on steroids.

In the ‘90s through the early ‘00s, thetechnology has been commonly referredto as data mining and knowledge discov-ery. However, due to the duality of theterm data mining often referring to bothOLAP and pattern discovery, a shift israpidly moving toward far more descrip-tive and accurate nomenclature such aspredictive modeling and predictive ana-lytics. In fact, this will probably be one ofthe last articles I write using the label datamining - which is too mainstream toabandon just yet.

Just What is Data Mining?Is data mining considered a service? Is ithardware? Software? A scored file? A sys-tem or a process? A customized solution?There does not seem to be a consensus,which makes data mining all the harderto visualize, define, manage ... and pur-chase. Two people may discuss data min-ing and have entirely different concepts inmind. Of course, all of the previouslymentioned descriptions are technicallycorrect. While the business communitymay appropriately view data mining as aproductive, value-driven solution, thatperspective focuses on the destination,not the journey. If credit were given to thebest definition of data mining, processwould score the point.

Viewing data mining asa process encompasses allthe hard and softresources, and impliesa structured yetongoing approachto an evolvingoptimization prob-lem. When viewedas a process, datamining projects maybe planned andimplemented in a pro-cedural way that all butensures success. As well,expectations should be inher-ently leveled to never expect a “finalanswer” nor anticipate a single pass.When implemented properly, productiveresults should be expected early and con-tinually improved.

How Not to Buy Data MiningIt is far too common for organizations toadapt their data mining project design to

a blend of their perception of what datamining is with a standard corporate prac-tice for evaluating and purchasing prod-ucts and services. The result is a popularyet doomed approach:1. Collect product literature from data

mining tool vendors at industry eventsor as advertised in journals.

2. Invite vendors whose retail price of theirflagship product fits within availablediscretionary budgets to visit on site.

3. Gain a free education in data miningthrough subjective presentations at thevendor’s expense (too many are anx-ious to chase any sales bait, qualifiedor otherwise).

4. Purchase a data mining tool from thevendor who presented last.

5. Throw some data at the tool and awaitmagical results.

6. Stare at the numbers or even visualiza-tions thereof, wondering why an angelicchorus did not accompany the results.

7. Without knowing whether the resultsare useless or phenomenal, data min-ing is dismissed as hyped and/or pie-in-the-sky technology.

The ultimate cost of a failed first passcan be tremendous. Not only will theorganization suffer opportunity costs fromvalue never realized, but competitors willalso have a greater window to capitalize onthe benefits. Furthermore, morale will beadversely affected, which can wreak untoldhavoc on any organization.

Ultimately, data mining will be uti-lized by all medium and large

organizations in one formor another. Not

employing predictiveanalytics against alarge repository ofdata (which allmedium andlarge businesseshave) would beanalogous tobuilding drilling

p l a t f o r m s ,pipelines and stor-

age tanks with nointentions for a refinery.

Although the term data min-ing may fade, the technology will

not. If a company makes a failedapproach now, it will only need to repeatthe attempt later. The question is whetherthe organization will repeat its mistakes.

A Best-Practice Approach to Data Mining

The recommended approach for data min-

ing presented in this article has a perfecttrack record of matching performance toexpectations. Data mining is essentially a dis-covery process, which requires a purpose-fully flexible framework with numerouscheckpoints for assessment and adjustment.Be wary of any vendor who proposes todeliver a fully implemented data miningsolution without early decision points.Consulting firms with reputable names willoften win sizable data mining contracts andproceed with a weak strategy free ofcheckpoints and adjustable stages. Theproject quickly migrates into an exercise ofpost-justifications, blame casting, contract-wiggling and backpedaling.

The following five stages provide afoundation to drive a successful datamining strategy and implementation.

1. TrainingThe best results in data mining areachieved when a data mining expert com-bines experience with an organizationaldomain expert. While neither needs to befully proficient in the other’s field, it is cer-tainly beneficial to have a basic ground-ing across areas of focus. Even if a datamining project is entirely outsourced,substantial advantages await the organi-zation whose principals are trained to rec-ognize elusive pitfalls, speak confidentlyabout data mining methods, appreciatetradeoffs between accuracy and explain-ability, collaborate more effectively fordata preparation and interpret moreaccurately the model’s results. Suchknowledge can also serve well towardevaluating vendors, interacting with proj-ect managers and effectively questioningany suspect results or methods.

Numerous data mining conferencesand public training courses exist. Manytool vendors have excellent instructors andworthwhile courses, particularly for theircustomers. Most times, however, coursesoffered by tool vendors restrict the scope ofcontent to highlight the capabilities of theirproduct(s). Because tools should not beconsidered until later in the process, try toidentify vendor-neutral conferences andcourses to receive an unbiased, broad andnonpromotional presentation.

If staff or time simply does not exist totrain internally, consider hiring an inde-pendent data mining expert who may actas a liaison and third-party project advo-cate between your organization and themain project vendor. The consultant shouldhold three qualities in combination: 1. He or she should be well-steeped in

the data mining process with a strong

Page 5: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

track record of application success. 2. He or she should be multilingual -

able to converse fluently with analysts,IT staff, users, directors and executivemanagement.

3. Most importantly, he or she should bebusiness-oriented, not rushing to ana-lyze the data, but focusing first onamassing a comprehensive under-standing and assessment of the client’sbusiness model and all availableresources, as well as any applicablehistory, benchmarks and objectives.

Whether conducting your projectinternally or outsourcing, it pays to incor-porate the direction of a data miningexpert. Working in concert with an orga-nizational domain expert, the data miningconsultant will form a symbiotic relation-ship that provides the benefits of knowl-edge transfer, redundancy and inherentreinforcement training. This accumulatedknowledge drives well-informed choices,validates sound judgment and combinesthe perspective that practically assures asolid definition of data mining projectsuccess, and then achieves it.

2. AssessmentThis is the stage in which the true buy fordata mining occurs. Unfortunately, manyorganizations are reluctant to engage in adata mining project assessment (DMPA),because they have been burned on assess-ments by services companies who basi-cally charge to exploit opportunity fromtheir client. When done properly, however,the DMPA is an essential component of asuccessful data mining project.

From the client’s perspective, anyassessment is risky. The value of the resultsis unknown in advance. A full data miningimplementation cannot be estimated indollars or time prior to this exercise. Thereare far too many unknown factors that candramatically affect the approach and scaleof a data mining project. Further, a DMPAmay reveal that an organization is not evenat the starting line - thus saving substantialtime and money resources by preventing apremature project. When performed by areputable services company, this aspect ofthe assessment can arrive at the preciseopposite of exploiting opportunity by sav-ing needless effort and expense in advance.

The DMPA should offer a compre-hensive situational report of findings thatsupport a draft overarching plan (laterdescribed as the recommendations report).The findings report should manifest thereadiness of numerous factors that need tobe present for a successful data mining

implementation. To name a few:l Data Certification: A topical survey

of the structure and nature of the datato support predictive analytics.

l Existing Resources: Additionaltools may be recommended to sup-port or replace existing products. Arethe skills available in house to sup-port the modeling process afterdeployment? What other technolo-gies or methods have been used inthe past? Are previous performancebenchmarks available?

l Stakeholder Objectives: Arethe questions to whichexecutives seek answersaligned with theresources amassedin the findings?Are there desiredand/or requiredp e r f o rman c elevels? Are theb e n c h m a r k srealistic from thec o n s u l t a n t ’ s experience?

l F u n c t i o n a lManagers: There aremany situations in whichcompanies are either unable orunwilling to take the actions recom-mended by the model. (In the wordsof Jack Nicholson in A Few Good Men, itshould be determined in advance if“You can’t handle the truth!”)

l Constraints: Are there hard bound-aries that must be identified and builtinto the decision process - either beforeor after the model’s implementation?Because virtually all data miningmethods present a tradeoff betweenaccuracy and explainability, a pointon the scale should be defined. Whatare tolerable levels of false positivesor negatives from the model?

l User Buy-in: If they won’t adopt it,why build it? How may the system bedesigned to encourage dedicated use?

l IT Support: While usually not a deal-killer, IT is typically far more willing tosupport the model’s function whenthey are included in the strategy andare invited to become data miningadvocates. If IT is going to supportanother project that requires dataaccess, it helps if they can also appre-ciate the high-level vision and benefitsto the organization.

Without the DMPA exercise, somemodeling projects can be carried to com-pletion tactically, but the results ultimately

do not pass the “so what” test strategically.In this situation, the client and consultantstare at each other, wondering whetherthey arrived at outstanding results oroverall failure.

The DMPA should be created inde-pendently, allowing the organization tofreely choose how the resulting plan will bedeveloped and implemented - whether bythe same services company who submittedit, another third-party vendor or the clientitself. The consultant who conducts theDMPA should not incorporate proprietary

components or aspects thatsubjectively commit the

resulting build-outexclusively to the

DMPA author. Thevalue of theDMPA is all inthe strategy, notthe tactics.

The recom-m e n d a t i o n sreport from the

DMPA will producean overarching

project plan. Earlystages may be firmly

priced. However, laterstages may only be estimated

because it cannot be known in advancewhat information will be derived from thedata and how it should be leveraged.Newly discovered information can drivethe remaining project in slightly differentdirections. Most times, there is not a significant departure from the overarchingplan, but it is not realistic to ever fix theprice of a data mining project beyond a fewnear-term tasks. This does not make datamining any easier to buy, but risk cannotbe effectively managed without anadjustable, staged approach to a projectthat by its nature is about discovery of theunknown. A flexible structure and repeat-able process with sound guidelines must bedesigned to effectively manage discovery.

3. Strategy Data mining strategy is far too often over-looked or retrofitted to a resulting model.Nearly all neophytes to the data miningprocess are anxious to run straight to thedata and push whatever is readily avail-able into an analytical tool. While moderntools may help to some degree in datapreparation, exploration and visualiza-tion, even the best tools on the marketcannot anticipate, interpret or implementaround environmental and politicalaspects of model integration. Moreover,

Page 6: How to Buy Data Mining - Predictive Modeling & Data Mining ... · PDF filedata mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics

the modeler may take poor results andproceed as though they were superior, orobtain great results (thanks to modernsoftware’s automation and wizards) andnot know it - essentially building excellentmodels that answer the wrong questions.

Most of the strategy framework isestablished during the DMPA. As discover-ies unfold and unforeseen information fromthe data is interpreted, the strategic directionmay adjust somewhat, but usually not dra-matically. This is why planning for a flexibleframework is such a critical component fora successful data mining implementation.Any vendor who claims to have a completeframework to fit your organization’s overallsituation and forego a DMPA should beregarded with some suspicion. The purposeof the DMPA is to assess the overall situa-tion and resources for data mining anddraft the overarching strategy to direct theproject to completion - and beyond.

4. ImplementationThanks to automated software with effec-tive wizards, the implementation isarguably the easiest and least risky part ofa full-scale data mining project. It is farbetter to have a mediocre model withsolid strategy than the inverse.

One misconception about selectingan external data mining consultant is thatthe consultant should also be an expert inyour industry. It may be helpful for theconsultant to have background in orderto speak and interpret industry lingowhile appreciating the competitive envi-ronment and primary drivers, but unlikebuilding a knowledge base, it is actuallypreferable not to have the industry’sstrongest domain expert who also hap-pens to do some data mining. While theconsultant may appear impressive at the

outset, too much industry expertise canintroduce subjectivity and preconceivednotions that may skew the way modelsare developed and interpreted.

Models by their nature are objective,and the consultant should be, too. Thebest results are achieved when the datamining expert drives the model-buildingprocess but not the results. The data min-ing consultant should then work withyour organization’s domain expert tomutually interpret the results, validatethem and determine the most effectiveway to make them useful.

5. IterationMany industry standard and best practiceprocess diagrams show data mining as alinear process, ending with deployment.Rather, an ongoing process toward ana-lytic enlightenment is a more realisticexpectation and effective mind-set fromwhich to work.

As part of the assessment stage, a feed-back strategy should be derived to capturevaluable performance results (in marketing,this is referred to as a solicitation file). Notonly is the results data used for model val-idation, but it is also valuable fuel for thenext iteration of model building. The modelshould be updated and enhanced with thelatest performance data, perhaps evenweighting the latest feedback more heavilyto encourage a greater recency effect fornovel behavioral patterns.

The closing of this loop from resultsinterpretation to model update is anexcellent opportunity for reinforcementtraining. Reconvene with your data min-ing trainer or consultant to review thefirst pass and prepare for the next itera-tion. Advanced or alternative approachesfor the next model update should be con-

sidered at this milestone, encouraging aprogression of knowledge transfer whilepushing model innovation.

While this article has presentedframeworks for both failing and succeed-ing in data mining, the most critical phaseis the data mining project assessment. Allother aspects of the process are forgivingand easily repairable. Foregoing a datamining expert’s comprehensive situationaland goal-driven assessment is a costlyway to arrive at the false conclusion thatpredictive analytics is overrated.

Once you have gained a base educa-tion in data mining strategy and methodsand have commissioned a thorough projectassessment, you will be well on your waytoward data mining success. When thestarting line is measured and the rightframework is established, the rest of thejourney is relatively straightforward. Youmay proceed through the discovery processwith a structured yet flexible plan for nearlyany scenario, confident that you will reaptremendous rewards for having taken theright approach to data mining.

Eric A. King is president and founderof The Modeling Agency (TMA).TMA is a structured team of senior

consultants that provides training and consulting inpredictive modeling for those who are data-rich, yetinformation-poor. King holds a BS in computer sciencefrom the University of Pittsburgh and has focused on datamining business development and project managementsince 1990. Prior to TMA, King worked for NeuralWare, aneural network tools company, and American HeuristicsCorporation, an artificial intelligence consulting firm.He may be reached at [email protected] or(281) 667-4200 x210.

Editor’s Note: The Modeling Agency offerson-line, on-site, and public vendor-neutral courses inpredictive modeling for practitioners and managerswho are ready to implement data mining solutions. Forinfo: www.the-modeling-agency.com/dmr-special

DMR

©2008 SourceMedia, Inc. and DM Review. All rights reserved. SourceMedia, One State Street Plaza, New York, N.Y. 10004 (800) 367-3989