Marketing Analytics: A Practical Guide to Real Marketing Science

Praise for Marketing Analytics

‘For those MBAs who barely passed their quantitative marketing and statistics classes without truly understanding the content, Marketing Analytics provides everything managers and executives need to know presented as a conversation with examples to boot! You’ll definitely sound smarter in the boardroom after reading this book!’

James Mourey, PhD and assistant professor of marketing at DePaul University (Chicago)

‘Marketing Analytics is a must-read for analytics practitioners and marketing managers seeking a comprehensive overview of the most actionable techniques that virtually any organization can apply to gain immediate benefits. Rather than complicate the book with technical details that may not be of interest to all readers, Dr Grigsby succinctly illustrates the concepts with real examples and provides references for analysts needing deeper guidance or theory. I wish Marketing Analytics had been published 15 years ago – it would’ve saved me a lot of independent research!’

W Dean Vogt, Jr, marketing research and analytics practitioner

‘Marketing Analytics is a practical guidebook written in a conversational tone that makes complex theories easily understood. The author’s experience in the industry combined with his inherent gift for explaining everything a successful marketing analyst needs to know makes this book a must-read.’

Katy Richardson, Founder and Principal, 214 Creative

‘This is a great book for practitioners who have learned plenty of theories and want to learn how to apply methodologies. It is also a great, easy-to-read resource for anyone who does not have a deep theoretical background but wants to learn how analytics work in real life.’

Ingrid Guo, VP, Analytics, and Managing Director, Javelin Marketing Group (Beijing)

‘Mike’s writing is straightforward and entertaining. He brings a conversational and relatable tone and approach to some fairly complex material. Sometimes marketers can take themselves a little too seriously, especially when it comes to the mathematical side of things. Mike’s work reminds us to lighten up and have fun with it.’

Katy Rollings, PhD, loyalty analyst at GameStop

‘The book summarizes all the critical topics in a consumer-focused analytic approach, and the cases are fun to read.’

Ernan Haruvy, PhD, Professor of Marketing, UT Dallas

‘This book gives a broad overview of marketing analytics to people who don’t have any related background… Examples are explained to give readers a clearer idea. I think the book is worth a read for anyone who wants to become a marketing analyst.’

Yuan Fang, MSc (marketing analytics candidate)

‘In one sentence, the role of marketing is to determine who the organization can serve and how it can best be done. To this end, Mike Grigsby escorts the reader through the difficult process of understanding, explaining, and anticipating customer behaviour, aptly delivered with the no-nonsense authority earned by veterans of marketing success. If Marketing Analytics is the class, I’m sitting front row!’

Allyn White, PhD

‘In his book Marketing Analytics, Mike Grigsby takes passionate marketing strategists on a practical, real-life journey for solving common marketing challenges. By combining the concepts and knowledge areas of statistics, marketing strategy and consumer behaviour, Mike recommends scientific and innovative solutions to common marketing problems in the current business environment. Every chapter is an interesting journey for the reader.

What I like most about the book is its simplicity and how it applies to real work-related situations in which almost all of us have been involved while practising marketing of any sort. I also like how Mike talks about tangible measurements of strategic recommended marketing solutions as well as how they add value to companies’ strategic endeavours. I highly recommend reading this book as it adds a completely new dimension to marketing science.’

Kristina Domazetoska, project manager and implementation consultant at Insala – Talent Development and Mentoring Solutions

‘Mike’s book is the right blend of theory applied to the real world, large-scale data problems of marketing. It’s exactly the book I wish I’d had when I started out in this field.’

Jeff Weiner, Senior Director, Channel and Employee Analytics – US Region, Aimia

‘I love your book! It offers a truly accessible guide to the basics and practice of marketing analytics. I especially like how you bring in your correct insights on, e.g., the overreliance on competitive (vs consumer) behavior in marketing strategy.’

Koen H Pauwels, Associate Professor at the Tuck School of Business, Dartmouth and Özyeğin University, Istanbul

‘I found Marketing Analytics interesting and easy to comprehend. It has lucid descriptions along with the illustrations, which complement the text. Even a layman can understand, as there is no jargon or technical language used.’

Sunpreet Kaur Sahni, Assistant Professor at GNIMT, PhD (marketing), Ludhiana, Punjab, India

‘This is an excellent read for people in the industry who work in strategy and marketing. This is one of the first books that I have read that covers the entire spectrum from demand, segmentation, targeting, and how results can be calculated. In an age where marketing is becoming more and more sophisticated, this book provides the tools and the mathematics behind the facts. Marketing Analytics is written with a scientific voice, but was very readable, with the science wrapped into everyday activities, based on a character we can all relate to, that are derived from these formulas, ultimately driving ROI.’

Elizabeth Johnson, VP, Shopper Marketing – Digital Solutions Retailigence

‘I strongly recommend Marketing Analytics to both beginners and folks who don’t have much background in statistics. A very precise book. Complicated topics around statistics, marketing and modelling are condensed very well in a much-simplified language, along with real-world examples and business cases, which makes it amusing to read and gives clear understanding about applications of the concepts. The book sets the ground with exactly what one needs to know from statistics as well as marketing, and runs through how these two, coupled with analytics, can help solve real-world business problems. Later, it also covers Market Research topics and concludes with the Capstone, covering application of all the methodologies to Digital Analytics. I believe that Marketing Analytics will be a handy reference or manual for students as well as marketing analytics professionals.’

Sasmit Khokale, MS (MIS), Analytics Practitioner


This ebook published in 2015 by

Kogan Page Limited

2nd Floor, 45 Gee Street

London EC1V 3RS

United Kingdom

www.koganpage.com

© Mike Grigsby, 2015

E-ISBN 9780749474188


Contents

Foreword

Preface

Introduction

PART ONE Overview

01 A (little) statistical review

Measures of central tendency

Measures of dispersion

The normal distribution

Relations among two variables: covariance and correlation

Probability and the sampling distribution

Conclusion

Checklist: You’ll be the smartest person in the room if you…

02 Brief principles of consumer behaviour and marketing strategy

Introduction

Consumer behaviour as the basis for marketing strategy

Overview of consumer behaviour

Overview of marketing strategy

Conclusion


PART TWO Dependent variable techniques

03 Modelling dependent variable techniques (with one equation): what are the things that drive demand?

Introduction

Dependent equation type vs inter-relationship type statistics

Deterministic vs probabilistic equations

Business case

Results applied to business case

Modelling elasticity

Technical notes

Highlight: Segmentation and elasticity modelling can maximize revenue in a retail/medical clinic chain: field test results

Abstract

The problem and some background

Description of the data set

First: segmentation

Then: elasticity modelling

Last: test vs control

Discussion

Conclusion


04 Who is most likely to buy and how do I target?

Introduction

Conceptual notes

Business case

Results applied to the model

Lift charts

Using the model – collinearity overview

Variable diagnostics

Highlight: Using logistic regression for market basket analysis

Abstract

What is a market basket?

Logistic regression

How to estimate/predict the market basket

Conclusion


05 When are my customers most likely to buy?

Introduction

Conceptual overview of survival analysis

Business case

More about survival analysis

Model output and interpretation

Conclusion

Highlight: Lifetime value: how predictive analysis is superior to descriptive analysis

Abstract

Descriptive analysis

Predictive analysis

An example


06 Modelling dependent variable techniques (with more than one equation)

Introduction

What are simultaneous equations?

Why go to the trouble of using simultaneous equations?

Desirable properties of estimators

Business case

Conclusion


PART THREE Inter-relationship techniques

07 Modelling inter-relationship techniques: what does my (customer) market look like?

Introduction

Introduction to segmentation

What is segmentation? What is a segment?

Why segment? Strategic uses of segmentation

The four Ps of strategic marketing

Criteria for actionable segmentation

A priori or not?

Conceptual process


08 Segmentation: tools and techniques

Overview

Metrics of successful segmentation

General analytic techniques

Business case

Analytics

Comments/details on individual segments

K-means compared to LCA

Highlight: Why Go Beyond RFM?

Abstract

What is RFM?

What is behavioural segmentation?

What does behavioural segmentation provide that RFM does not?

Conclusion

Segmentation techniques


PART FOUR Other

09 Marketing research

Introduction

How is survey data different than database data?

Missing value imputation

Combating respondent fatigue

A far too brief account of conjoint analysis

Structural equation modelling (SEM)


10 Statistical testing: how do I know what works?

Everyone wants to test

Sample size equation: use the lift measure

A/B testing and full factorial differences

Business case


PART FIVE Capstone

11 Capstone: focusing on digital analytics

Introduction

Modelling engagement

Business case

Model conception

How do I model multiple channels?

Conclusion

PART SIX Conclusion

12 The Finale: what should you take away from this?

Any other stories/soapbox rants?

What things have I learned that I’d like to pass on to you?

What other things should you take away from all this?

Glossary

Bibliography and further reading

Index

Test banks and data sets relating to chapters are available online at: www.koganpage.com/MarketingAnalytics

Foreword

In Marketing Analytics Mike Grigsby provides a new way of thinking about solving marketing and business problems, with a practical set of solutions. This relevant guide is intended for practitioners across a variety of fields, but is rigorous enough to satisfy the appetite of scholars as well.

I can certainly appreciate Mike’s motivations for the book. This book is his way of giving back to the analytics community by offering advice and step-by-step guidance for ways to solve some of the most common situations, opportunities, and problems in marketing. He knows what works for entry, mid-level, and very experienced career analytics professionals, because this is the kind of guide he would have liked at these stages.

While Mike’s education includes a PhD in Marketing Science, he also pulls from his vast experiences from his start as an Analyst, through his journey to VP of Analytics, to walk the reader through the types of questions and business challenges we face in the analytics field on a regular basis. His authority on the subject matter is obvious, and his enthusiasm is contagious, and best captured by my favourite sentence of his book: ‘Now let’s look at some data and run a model, because that’s where all the fun is.’

What this education and experience means for the rest of us is that we have a well-informed author providing us with insight into the realities of what is needed from the exciting work we do, and how we can not only provide better decision making, but also move the needle on important theoretical and methodological approaches in Analytics.

More specifically, Marketing Analytics covers both inter-relational and dependency-driven analytics and modelling to solve marketing problems. In a light and conversational style (both engaging and surprising) Mike argues that, ultimately, all markets rely on a strong understanding of the ever-changing, difficult to predict, sometimes fuzzy, and elusive minds and hearts of consumers. Anything we can do to better arm ourselves as marketers to develop this understanding is certainly time well spent. Consumers can and should be the focal point of great strategy, operational standards of excellence and processes, tactical decisions, product design, and so much more, which is why it makes perfect sense to better understand not just consumer behaviours, but also consumer thoughts, opinions, and feelings, particularly related to your vertical, competitors, and brand.

After a review of seminal work on consumer behaviour, and an overview of general statistics and statistical techniques, Marketing Analytics dives into realistic business scenarios with the clever use of corporate dialogue between Scott, our fictitious analyst, and his boss. As our protagonist progresses through his career, we see an improvement in his toolkit of analytical techniques. He moves from an entry level analyst in a cubicle to a senior leader of analytics with staff. The problems become more challenging, and the process for choosing the analytics to apply to the situations presented is an uncanny reflection of reality – at least based on my experiences.

What I appreciate absolutely most about this work though is the full spectrum of problem solving, not just analytics in a vacuum. Mike walks us from the initial moment when a problem is identified, through communication of that problem, framing by the Analytics team, technique selection and execution (from the straightforward to somewhat advanced), communication of results, and usefulness to the company. This rare and certainly more complete picture warrants a title such as Problem Solving using Marketing Analytics in lieu of the shorter title Mike chose.

Marketing Analytics will have you rethinking your methods, developing more innovative ways to progress your marketing analytics techniques, and adjusting your communication practices. Finally, a book we all can use!

Dr Beverly Wright, VP, Analytics, BKV Consulting

Preface

We’ll start by trying to get a few things straight. I did not set out to write a (typical) textbook. I’ll mention some textbooks down the line that might be helpful in some areas, but this is too slim for an academic tome. Leaf through it and you’ll not find any mathematical proofs, nor are there pages upon pages of equations. This is meant to be a gentle overview – more conceptual than statistical – for the marketing analyst who just needs to know how to get on with their job. That is, it’s for those who are, or hope to be, practitioners. This is written with practitioners in mind.

Introduction

Who is the intended audience for this book?

This is not meant to be an academic tome filled with mathematical minutiae and cluttered with statistical mumbo-jumbo. There will need to be an equation now and then, but if your interest is econometric rigour, you’re in the wrong place. A couple of good books for that are Econometric Analysis by William H. Greene (1993) and Econometric Models, Techniques and Applications by Michael Intriligator, Ronald G. Bodkin and Cheng Hsiao (1996). So, this book is not aimed at the statistician, although there will be a fair amount of verbiage about statistics.

This is not meant to be a replacement for a programming manual, even though there will be SAS code sprinkled in now and then. If you’re all about BI (business intelligence), which means mostly reporting and visualizing data, this is not for you.

This will not be a marketing strategy guide, but be aware that as mathematics is the handmaiden of science, marketing analytics is the handmaiden of marketing strategy. There is no point to analytics unless it has a strategic payoff. It’s not what is interesting to the analyst, but what is impactful to the business that is the focus of marketing science.

So, at whom is this book aimed? Not necessarily at the professional econometrician/statistician, but there ought to be some satisfaction here for them. Primarily, the aim is at the practitioner (or those who will be). The intended audience is the business analyst who has to pull a targeted list, the campaign manager who needs to know which promotion worked best, the marketer who must DE-market some segment of her customers to gain efficiency, the marketing researcher who needs to design and implement a satisfaction survey, the pricing analyst who has to set optimal prices between products and brands, etc.

What is marketing science?

As alluded to above, marketing science is the analytic arm of marketing. Marketing science (interchangeable with marketing analytics) seeks to quantify causality. Marketing science is not an oxymoron (like military intelligence, happily married or jumbo shrimp) but is a necessary (although not sufficient) part of marketing strategy. It is more than simply designing campaign test cells. Its overall purpose is to decrease the chance of marketers making a wrong decision. It cannot replace managerial judgment, but it can offer boundaries and guardrails to inform strategic decisions. It encompasses areas from marketing research all the way to database marketing.

Why is marketing science important?

Marketing science quantifies the causality of consumer behaviour. If you don’t know already, consumer behaviour is the centre-point, the hub, the pivot around which all marketing hinges. Any ‘marketing’ that is not about consumer behaviour (understanding it, incenting it, changing it, etc.) is probably heading down the wrong road.

Marketing science gives input/information to the organization. This information is necessary for the very survival of the firm. Much like an organism requires information from its environment in order to change, adapt and evolve, an organization needs to know how its operating environment changes. To not collect and act and evolve based on this information would be death. To survive, for both the organization and the organism, insights (from data) are required. Yes, this is reasoning by analogy but you see what I mean.

Marketing science teases out strategy. Unless you know what causes what, you will not know which lever to pull. Marketing science tells you, for instance, that this segment is sensitive to price, this cohort prefers this marcom (marketing communication) vehicle, this group is under competitive pressure, this population is not loyal, and so on. Knowing which lever to pull (by different consumer groups) allows optimization of your portfolio.

What kind of people in what jobs use marketing science?

Most people in marketing science (also called decision science, analytics, CRM, direct/database marketing, insights, research, etc.) have a quantitative bent. Their education is typically some combination involving statistics, econometrics/economics, mathematics, programming/computer science, business/marketing/marketing research, strategy, intelligence, operations, etc. Their experience certainly touches any and all parts of the above. The ideal analytic person has a strong quantitative orientation as well as a feel for consumer behaviour and the strategies that affect it. As in all marketing, consumer behaviour is the focal point of marketing science.

Marketing science is usually practised in firms that have a CRM or direct/database marketing component, or firms that do marketing research and need to undertake analytics on the survey responses. Forecasting is a part of marketing science, as well as design of experiments (DOE), web analytics and even choice behaviour (conjoint). In short, any quantitative analysis applied to economic/marketing data will have a marketing science application. So while the subjects of analysis are fairly broad, the number of (typical) analytic techniques tends to be fairly narrow. See Consumer Insight by Stone, Bond and Foss (2004) to get a view of this in action.

Why do I think I have something to say about marketing science?

Fair question. My whole career has been involved in marketing analytics. For more than 25 years I’ve done direct marketing, CRM, database marketing, marketing research, decision sciences, forecasting, segmentation, design of experiments and all the rest. While my BBA and MBA are in finance, my PhD is in marketing science. I’ve published a few trade and academic articles, I’ve taught school at both graduate and undergraduate levels and I’ve spoken at conferences, all involved in marketing science. I’ve done all this for firms like Dell, HP, the Gap and Sprint, as well as consultancies like Targetbase. Over the years I’ve gathered a few opinions that I’d like to share with y’all. And yes, I’ve been in Texas for over 15 years.

What is the approach/philosophy of this book?

As with most non-fiction writers, I wrote this because I would have loved to have had it, or something like it, earlier. What I had in mind did not actually exist, as far as I knew.

I had been a practitioner for decades and there were times I just wanted to know what I should do, what analytic technique would best solve the problem I had. I did not need a mathematically-oriented econometrics textbook (like Greene’s, or Kmenta’s Elements of Econometrics (1986), as great as they each are). I did not need a list of statistical techniques (like Multivariate Data Analysis by Hair et al (1998) or Multivariate Statistical Analysis by Sam Kash Kachigan (1991)), as great as each of them also is. What I needed was a (simple) explanation of which technique would address the marketing problem I was working on. I wanted something direct, accessible, and easy to understand so I could use it and then explain it. It was okay if the book went into more technical details later, but first I needed something conceptual to guide me in solving a particular problem. What I needed was a marketing-focused book explaining how to use statistical/econometric techniques on marketing problems. It would be ideal if it showed examples and case studies doing just that. Voila.

Generally this book has the same point of view as books like Peter Kennedy’s A Guide to Econometrics (1998) and Glen L. Urban and Steven H. Star’s Advanced Marketing Strategy (1991). That is, the techniques will be described in two or three levels. The first is really just conceptual, devoid of mathematics, and the aim is to understand. The next level is more technical, and will use SAS or something else as needed to illustrate what is involved, how to interpret it, etc. Then the final level, if there is one, will be rather technical and aimed really only at the professional. And there will be business cases to offer examples of how analytics solves marketing questions.

One thing I like about Stephan Sorger’s 2013 book, Marketing Analytics, is that in the opening pages he champions action-ability. Marketing science has to be about action-ability. I know some academic purists will read the following pages and gasp that I occasionally allow ‘bad stats’ to creep in. (For example, it is well known that forecasting often is improved if collinear independent variables are included. Shock!) But the point is that even an imperfect model is far more valuable than waiting for academic ivory-tower purity. Business is about time and money and even a cloudy insight can help improve targeting. Put simply, this book, and marketing science, is ultimately about what works, not what will be published in an academic research paper.

All of the above will be cast in terms of business problems, that is, in terms of marketing questions. For example, a marketer, say, needs to target his market and he has to learn to do segmentation. Or she has to manage a group that will do segmentation for her (a consultant) and needs to know something about it in order to question it intelligently. The problem will be addressed in terms of what segmentation is, what it means to strategy, why do it, etc. Then several analytic techniques used for segmentation will be described in detail. Then a fairly involved and technical discussion will show additional statistical output, and an example or two will be shown. This output will use SAS (or SPSS, etc.) as necessary. This will also help guide students as they prepare to become analysts.

Therefore, the philosophy is to present a business case (a need to answer the marketing questions) and describe conceptually various marketing science techniques (in two or three increasingly detailed levels) that can answer those questions. Then output, say from SAS, will be developed that shows how the technique works, how to interpret it and how to use it to solve the business problem. Finally, more technical details may be shown, as needed. Okay?

So, on to a little statistical review.

Part one

Overview

01

A (little) statistical review

Measures of central tendency

Measures of dispersion

The normal distribution

Relations among two variables: covariance and correlation

Probability and the sampling distribution

Conclusion

Checklist: You’ll be the smartest person in the room if you…

You knew we had to do this: a general review of basic statistics. I promise it’ll be mostly conceptual, a gentle reminder of what we learned in Introductory Statistics. Also note the Definition Boxes helping to describe key terms, point out jargon, etc.

Measures of central tendency

First we’ll deal with simple descriptive statistics, confined to one variable. We’ll start with measures of central tendency.

Measures of central tendency include the mean, median and mode.

Mean: a descriptive statistic, a measure of central tendency, the mean is a calculation summing up the value of all the observations and dividing by the number of observations.

The mean is calculated as:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

That is, sum all the observations up (all the individual Xs) and then divide by the number of observations (n). This is commonly called ‘the average’ but I’d like to offer a different view of ‘average’.

Average: the most representative measure of central tendency, NOT necessarily the mean.

Average is the measure of central tendency, the number most likely to occur, the most representative number. That is, it might not be the mean; it could be the median or even the mode. This is our first incursion into a statistical way of thinking.

I’d like to persuade you that it’s possible, for example, that the median is more representative than the mean in some cases – and that in those cases the median is the average, the most representative number.

Median: the middle observation when there is an odd number of observations, or the mean of the middle two observations when there is an even number.

The median is, by definition, the number in the middle, the 50th percentile, the value that has just as many observations above it as below it.

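Consider the home sales prices in Figure 1.1. The mean is 141,000 but the median is 110,000. Which number is most representative? I submit it is not the mean, but the median: a relatively small number of high-priced sales pull the mean upward, while the median does not care how extreme those few sales are. I also submit that the best measure of central tendency, in this example, is the median. Therefore the median is the average. I know that’s not what you learned in third grade, but get used to it. Statistics has a way of turning one slightly askew.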

Figure 1.1 Home sales prices

Just to be clear, I suggest that the measure of central tendency that best describes the histogram above should be called ‘average’. Mode is the number that appears most often, median is the observation in the middle and mean is the sum of the observations divided by their count.
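To see how a few large values pull the mean away from the median, here is a minimal Python sketch using the standard-library `statistics` module. The prices are invented for illustration (they are NOT the Figure 1.1 data); two expensive homes are enough to reproduce the pattern.

```python
from statistics import mean, median, mode

# Hypothetical home sale prices, invented for illustration (NOT the Figure 1.1 data).
# Most homes sell for roughly 95,000-120,000; two expensive homes form a right tail.
prices = [95_000, 100_000, 105_000, 110_000, 110_000,
          115_000, 120_000, 250_000, 400_000]

print("mean:  ", round(mean(prices)))  # dragged upward by the two expensive homes
print("median:", median(prices))       # the 5th of 9 sorted observations
print("mode:  ", mode(prices))         # the only value that appears twice
```

With these made-up prices the mean comes out around 156,000 while the median and mode both sit at 110,000, which is the Figure 1.1 pattern in miniature: the median is the more representative number.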

Mode: the number that appears most often.

Average is the most representative number. Of course it doesn’t help this argument that Excel uses =AVERAGE() as the function to calculate the mean instead of =MEAN(). I’ve tried asking Bill about it but he’s not returned my calls, so far.

Measures of dispersion

Measures of central tendency alone do not adequately describe the variable (a variable is a thing that varies, like home sales prices). The other dimension of a variable is dispersion, or spread.

There are three measures of dispersion: range, variance and standard deviation.

Range: a measure of dispersion or spread, calculated as the maximum value less the minimum value.

Range is easy. It’s simply the minimum (smallest value) observation subtracted from the maximum (largest value). It’s not particularly useful, especially in a marketing context, because it depends on only the two most extreme observations.

Variance is another measure of dispersion or spread.

Variance: a measure of spread, calculated as the sum of the squared differences between each observation and the mean, divided by the count of observations less one.

Conceptually it takes each observation and subtracts the mean of all the observations from it, then squares each difference and adds up the squares. That quantity is divided by n – 1, the total number of observations less one. The formula is below. Note this is the sample formula, not the formula for the population.

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

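(Note that X-bar is the symbol for the sample mean, while µ would be the symbol to use for the population mean; s would be the symbol to use for the sample standard deviation and σ would be the symbol to use for the population standard deviation.)

Now, what does variance tell us? Unfortunately, not much, partly because it is expressed in squared units (squared dollars, say), which are hard to interpret. It says that (from Table 1.1) this variable of 18 observations has a mean of 25 and a variance, or spread, of 173.6. But variance gets us to the standard deviation, which DOES mean something.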

Table 1.1 Variance

X       X – mean    (X – mean) squared
2       –23         529.3
5       –20         400.3
8       –17         289.2
10.9    –14.1       199.3
13.9    –11.1       123.6
16.9    –8.1        65.9
19.9    –5.1        26.2
22.9    –2.1        4.5
25.9    0.9         0.8
28.9    3.9         15.1
31.9    6.9         47.4
33      8           63.9
34      9           80.9
35      10          99.9
36      11          120.9
39      14          195.8
42      17          288.8
45      20          399.7

Mean = 25.0          Sum = 2,951.3
Count = 18           Variance = 173.6
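The Table 1.1 arithmetic can be reproduced in a few lines of Python. This is a sketch, since the book itself works in SAS. It computes the range, the mean, the sum of squared deviations, the sample variance (dividing by n – 1) and the standard deviation.

```python
import math

# The 18 observations from Table 1.1
x = [2, 5, 8, 10.9, 13.9, 16.9, 19.9, 22.9, 25.9, 28.9,
     31.9, 33, 34, 35, 36, 39, 42, 45]

n = len(x)
x_bar = sum(x) / n                          # mean
value_range = max(x) - min(x)               # range: maximum less minimum
ss = sum((xi - x_bar) ** 2 for xi in x)     # sum of squared deviations from the mean
variance = ss / (n - 1)                     # sample variance divides by n - 1
std_dev = math.sqrt(variance)               # standard deviation

print(f"n={n} mean={x_bar:.1f} range={value_range} "
      f"sum of squares={ss:.1f} variance={variance:.1f} sd={std_dev:.2f}")
```

Run as written, the sum of squares comes out near 2,950.7 rather than the table’s 2,951.3, presumably because the X values are printed rounded. The variance still rounds to 173.6 and the standard deviation to 13.17. The standard library’s `statistics.variance` and `statistics.stdev` use the same n – 1 divisor and give the same figures.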

Standard deviation: the square root of variance.

Standard deviation is calculated by taking the square root of variance. In this case the square root of 173.6 is 13.17. Now, what does 13.17 mean? It describes spread or dispersion in the same units as the variable itself, rather than in the squared units of variance. That is, there are known qualities of a standard deviation. In a fairly normal distribution, dispersion is spread symmetrically around the mean (which equals the mode, which equals the median). Here that means a symmetrical spread around the mean of 25: 25 +/– 13.17. That means that, for a roughly normal variable, the band within one standard deviation (+/– 13.17) of the mean will contain about 68% of all observations: see Figure 1.2. In a normal (bell-shaped) curve, 50% of all observations fall to the left of the mean and 50% of all observations fall to the right of the mean. (These percentages describe a normal distribution; the central limit theorem, covered below, is about the distribution of sample means, not of the raw observations.) Knowing the standard deviation gives information about the variable that the mean alone cannot.

Figure 1.2 Standard deviation

So, saying a variable has a mean of 25 and a standard deviation of 13.17 means, if the variable is roughly normal, that about 68% of all observations are between 11.8 and 38.2. This immediately tells me that if I find an observation that is < 11.8, it is a little rare, or unusual, given that 68% will be > 11.8 (and < 38.2) and only about 16% will fall below 11.8.

So, the first standard deviation accounts for 34% below the mean and 34% above the mean. The second standard deviation accounts for about another 14% and the third for about another 2% (more precisely 34.1%, 13.6% and 2.1%). This means that three standard deviations to the left of the mean account for 34% + 14% + 2%, or nearly 50% of all observations. Likewise for the positive/right side of the mean.
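The 34%, 14% and 2% figures are rounded areas under the standard normal curve. Python’s `statistics.NormalDist` (Python 3.8+) can compute them directly from its cumulative distribution function, `cdf`. This is a sketch for checking the rule of thumb, not part of the book’s own material.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area in each one-standard-deviation band on one side of the mean
bands = [std_normal.cdf(k) - std_normal.cdf(k - 1) for k in (1, 2, 3)]
for k, share in zip((1, 2, 3), bands):
    print(f"band {k} above the mean: {share:.1%}")

print(f"mean to +3 sd: {sum(bands):.1%}")
print(f"within +/-1 sd: {std_normal.cdf(1) - std_normal.cdf(-1):.1%}")
```

This prints 34.1%, 13.6% and 2.1% for the three bands, 49.9% for the mean to three standard deviations, and 68.3% for the band one standard deviation either side of the mean.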

As an example, it is well known that IQ has a mean of 100 and a standard deviation of about 15. This means that 34% of the population should fall between 100 and 115, because the mean is 100 and one standard deviation above the mean is 100 + 15 = 115. The second standard deviation accounts for another 14%, so 48% (34% + 14%) of the population should be between 100 and 130. Finally, about 2% will fall in the third standard deviation, between 130 and 145, and only about 0.1% will score above 145. So you see how useful the standard deviation is. It immediately gives more information about the spread, or how likely or unusual particular observations are. For example, if we had an IQ test that showed 150, this is a VERY rare event, in that it is more than three standard deviations above the mean: 115 is 1, 130 is 2, 145 is 3 and 150 is (150 – 100)/15 = 3.33 standard deviations above the mean.
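The IQ bands can be checked the same way, assuming, as the text does, a normal distribution with mean 100 and standard deviation 15. The sketch uses `statistics.NormalDist`, whose `mean` and `stdev` attributes hold those two parameters.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # IQ scale as described in the text

share_100_115 = iq.cdf(115) - iq.cdf(100)   # first standard deviation above the mean
share_100_130 = iq.cdf(130) - iq.cdf(100)   # first two standard deviations
share_above_145 = 1 - iq.cdf(145)           # beyond three standard deviations

z_150 = (150 - iq.mean) / iq.stdev          # standard deviations between 150 and the mean
share_above_150 = 1 - iq.cdf(150)

print(f"100-115: {share_100_115:.1%}   100-130: {share_100_130:.1%}")
print(f"above 145: {share_above_145:.2%}")
print(f"IQ 150: z = {z_150:.2f}, share scoring higher = {share_above_150:.3%}")
```

This gives about 34.1% for 100–115, about 47.7% for 100–130, roughly 0.13% above 145, and a z of 3.33 for an IQ of 150. Only about 0.04% of a normal population would score higher than that.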

The normal distribution

I’ve already mentioned the normal distribution but let’s say a couple more clarifying things about it. The normal distribution is the traditional bell-shaped curve. One characteristic of a normal distribution is that the mean, the median and the mode are the same number (in a sample from a normal population, virtually the same). The normal distribution is symmetric about the measure of central tendency (mean, median and mode) and the standard deviation describes the spread, as above.

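Let’s also mention the central limit theorem. This says that as n, the count in each sample, increases, the distribution of the sample mean approaches a normal distribution, whatever the shape of the variable itself (provided its variance is finite). It does not make the variable itself normal, but it does allow us to use the normal distribution when making inferences about means.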
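A small simulation can make the central limit theorem concrete. The sketch below assumes a strongly right-skewed variable: an exponential distribution with mean 1, chosen purely for illustration. It compares the skewness of raw draws with the skewness of 20,000 sample means, each computed from a sample of 50.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def draw() -> float:
    """One observation from a right-skewed variable (exponential, mean 1, sd 1)."""
    return random.expovariate(1.0)

def skewness(values) -> float:
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

raw = [draw() for _ in range(20_000)]
# 20,000 sample means, each from a sample of n = 50
means = [statistics.fmean(draw() for _ in range(50)) for _ in range(20_000)]

print(f"raw values:   skewness {skewness(raw):.2f}")
print(f"sample means: skewness {skewness(means):.2f}, sd {statistics.stdev(means):.3f}")
```

The raw draws should show a skewness near 2. The sample means come out much closer to 0 (about 2/√50 ≈ 0.28 in theory) with a standard deviation near 1/√50 ≈ 0.14. Larger samples push the means closer still to normal, even though the underlying variable stays just as skewed.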

Now for a quick word about z-scores, as this will be handy later.

Z-score: a metric describing how many standard deviations an observation is from its mean.

A z-score is a measure of the number of standard deviations an observation is relative to its mean. It converts an observation into the number of standard deviations above or below the mean by taking the observation (Xi), subtracting the mean from it and then dividing that quantity by the standard deviation. In terms of IQ, an observation of 107.5 will have a z-score of (107.5 – 100)/15, or 0.5. This means that an IQ of 107.5 is one-half a standard deviation above the mean. About 34% of observations lie between the mean and one standard deviation above it (100–115), but they are not spread evenly: the curve is tallest near the mean, so the first half of that standard deviation holds about 19% rather than 17%. This means this observation is about 19% above average (which is 50%), or greater than about 69% of the population. The remaining 31% or so (15% + 14% + 2%) are above this observation.
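Here is the z-score calculation as a short Python sketch. The percentile comes from the standard normal curve (`statistics.NormalDist().cdf`) rather than from interpolating linearly within the 34% band.

```python
from statistics import NormalDist

mean_iq, sd_iq = 100, 15
x = 107.5

z = (x - mean_iq) / sd_iq          # z-score: standard deviations from the mean
below = NormalDist().cdf(z)        # share of a normal population below this z-score

print(f"z = {z}")
print(f"share between the mean and x: {below - 0.5:.1%}")
print(f"share below x: {below:.1%}   share above x: {1 - below:.1%}")
```

This prints a z of 0.5, about 19.1% between the mean and 107.5, and about 69.1% below versus 30.9% above. That is why the half-band holds more than half of the 34%.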

Relations among two variables: covariance and correlation

All of the above descriptive discussions were about one variable. Remember that a variable is an item that takes on multiple values. That is, a variable is a thing that varies. Now let’s talk about having two variables and the descriptive measures of them.

Covariance

Covariance, like variance, is a measure of spread, but it describes how one variable varies in terms of another variable.

Covariance: the joint dispersion or spread of two variables.

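It, like variance, does not mean much; it’s just a number. It has no fixed scale (its units are the units of X multiplied by the units of Y), nor boundaries, and interpretation is minimal beyond its sign: positive when X and Y tend to move together, negative when they move in opposite directions. The formula is:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}$$

It merely describes how each X observation varies from its mean, in terms of how each Y observation varies from its mean: multiply the two deviations for each pair of observations, then sum these products up and divide by n, the count. (Many texts and packages divide by n – 1 for a sample covariance, to match the variance formula above; the 77.05 in Table 1.2 divides by n.) Again, the number is nearly irrelevant.

Say we have the data set in Table 1.2. Note the covariance is 77.05, which again means very little.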

Table 1.2 Covariance and correlation

X     Y
2     3
4     5
6     7
8     9
9     9
11    11
11    8
13    10
15    12
17    14
19    16
21    22
22    22
24    11
26    12
28    22
30    24
32    26
33    28
33    39

Covariance = 77.05
Correlation = 87.90%
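A sketch of the covariance formula applied to the Table 1.2 data, dividing by n as the text does. For comparison it also shows the n – 1 (sample) version, which some packages report by default. The choice of divisor is the only difference between the two.

```python
# Table 1.2 data
x = [2, 4, 6, 8, 9, 11, 11, 13, 15, 17, 19, 21, 22, 24, 26, 28, 30, 32, 33, 33]
y = [3, 5, 7, 9, 9, 11, 8, 10, 12, 14, 16, 22, 22, 11, 12, 22, 24, 26, 28, 39]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Multiply each pair of deviations from the means, sum the products, divide by n
cov_n = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
# The sample version divides the same sum by n - 1 instead
cov_sample = cov_n * n / (n - 1)

print(f"covariance (divide by n):     {cov_n:.2f}")
print(f"covariance (divide by n - 1): {cov_sample:.2f}")
```

This prints 77.05, matching the table, and 81.11 for the n – 1 version. Python 3.10’s `statistics.covariance(x, y)` returns the n – 1 figure.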

Correlation

Correlation, like standard deviation, does have a meaning, and an important one.

Correlation: a measure of both strength and direction, calculated as the covariance of X and Y divided by the standard deviation of X times the standard deviation of Y.

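Correlation expresses both the strength and the direction of the relationship between two variables. It ranges from –100% to +100%. A negative correlation means that as, say, X goes up, Y tends to go down. A very strong positive correlation (say 80% or 90%) means that the points lie close to an upward-sloping straight line: as X goes up by one standard deviation, Y tends to go up by about 0.8 or 0.9 of its own standard deviation. (Correlation does not say how many units Y moves for each unit of X; that is the slope, which comes from regression.) Note that in Table 1.2 the correlation is 87.9%, which is a very strong relationship between X and Y. To go from covariance to correlation, the covariance of X and Y is divided by the standard deviation of X multiplied by the standard deviation of Y, using the same divisor (n or n – 1) throughout. This removes the units and bounds the result between –100% and +100%. The formula is:

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}$$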
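And the correlation, again on the Table 1.2 data. This sketch divides the covariance by the product of the two standard deviations, using the same divisor n for all three so that the divisors cancel properly.

```python
import math

# Table 1.2 data
x = [2, 4, 6, 8, 9, 11, 11, 13, 15, 17, 19, 21, 22, 24, 26, 28, 30, 32, 33, 33]
y = [3, 5, 7, 9, 9, 11, 8, 10, 12, 14, 16, 22, 22, 11, 12, 22, 24, 26, 28, 39]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Same divisor (n) for the covariance and both standard deviations
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
sd_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
sd_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)

r = cov_xy / (sd_x * sd_y)   # unit-free, bounded between -1 and +1
print(f"correlation: {r:.1%}")
```

This prints 87.9%, matching the table. Python 3.10 and later also offer `statistics.correlation(x, y)`, which returns the same Pearson coefficient as a fraction.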

ProbabilityandthesamplingdistributionProbabilityisanimportantconceptinstatisticsofcourseandI’llonlytouchonithere.

First,let’stalkabouttwokindsofthinking:deductiveandinductive.Deductivethinkingiswhatyouaremostfamiliarwith:basedonrulesoflogicandconclusionsfromcausality.Becauseofthisthing,thisconclusionmustbetrue.However,statisticalthinkingisinductive,notdeductive.Inductivethinkingreasonsfromsampletopopulation.Thatis,statisticsisaboutinferencesandgeneralizingtheconclusion.Thisiswhereprobabilitycomesin.Typically,inmarketing,weneverhavethewholepopulationofadataset:wehaveasample.

Here’s where it gets a little theoretical. Say we have a sample of data on X that contains 1,000 observations with a mean of 50. Now, theoretically, we could have drawn any of an infinite number of samples, with a variety of means. Indeed, we never know where our sample (with its mean of 50) sits among all the possible samples. If we did draw a large number of samples from the population and calculated the mean of each, those means would constitute a sampling distribution.

For example, say we have a barrel containing 100,000 marbles. That is the whole population. Ten per cent of these marbles are red and 90% are white. We can only draw a sample of 100 at a time and calculate the share (the mean) of red marbles.
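The barrel experiment is easy to simulate. The Python sketch below is illustrative only: the seed and the 10,000 repeated draws are arbitrary choices, not part of the example. It builds the 100,000-marble population, draws samples of 100 without replacement, and records the share of red marbles in each. The collection of recorded shares is the sampling distribution: centred on 10%, with individual samples scattered well above and below it.

```python
import random
import statistics

random.seed(42)  # arbitrary seed so the run is repeatable

# The barrel: 100,000 marbles, 10% red (coded 1) and 90% white (coded 0)
barrel = [1] * 10_000 + [0] * 90_000


def sample_share_red(sample_size=100):
    """Draw one sample without replacement and return the share that is red."""
    return sum(random.sample(barrel, sample_size)) / sample_size


# Repeat the draw many times; the collected shares form the sampling distribution
shares = [sample_share_red() for _ in range(10_000)]
exact_share = shares.count(0.10) / len(shares)  # samples that hit exactly 10% red

print(f"mean of the sample shares:       {statistics.mean(shares):.3f}")
print(f"lowest and highest sample share: {min(shares):.2f} and {max(shares):.2f}")
print(f"samples exactly at 10% red:      {exact_share:.1%}")
```

Exact figures vary with the seed, but the mean should sit very close to 10%, and only around 13% of the samples should land exactly on it.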

In this case (contrived as it is) we KNOW the average share of red marbles drawn, overall, will be 10%. But note – and this is important – there is no guarantee that any one of our samples of 100 will actually be 10% red. It could be 5% (3.39% of the time it will be) and it could be 14% (5.13% of the time it will be). It will, of course, on average, be 10%. Indeed, only 13.19% of the samples drawn will be exactly 10% red! The binomial distribution tells us the above facts.
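The binomial figures quoted above can be reproduced in a few lines. This Python sketch treats each of the 100 draws as independent with a 10% chance of red; strictly, drawing without replacement from a finite barrel follows the hypergeometric distribution, but with 100 marbles out of 100,000 the difference is negligible.

```python
from math import comb


def binomial_pmf(k, n=100, p=0.10):
    """Probability of exactly k red marbles in a sample of n when the red share is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in (5, 10, 14):
    print(f"P(exactly {k} red out of 100) = {binomial_pmf(k):.2%}")
# P(exactly 5 red out of 100) = 3.39%
# P(exactly 10 red out of 100) = 13.19%
# P(exactly 14 red out of 100) = 5.13%
```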

Therefore, we could have drawn an unusual sample that had only 5% red marbles. This would occur 3.39% of the time, roughly 1 out of 30. That’s not that rare. And in actuality we have no way to really know how close the sample we have is to the population mean of 10%. This is where confidence intervals come in, which will be dealt with later in statistical testing.

Conclusion

That’s all I want to mention in terms of statistical background. More will be applied later. Now let’s get on with the fun.

Checklist

You’ll be the smartest person in the room if you:

Remember three measures of central tendency: mean, median and mode.

Remember three measures of dispersion: range, variance and standard deviation.

Constantly point out the real definition of average as ‘the most representative number’, that is, it might NOT necessarily be the mean.

Always look at a metric in terms of both central tendency and dispersion.

Think of a z-score as a measure of the likelihood of an observation occurring.

Observe that correlation is about two dimensions: strength and direction.

02

Brief principles of consumer behaviour and marketing strategy

Introduction



Overview of marketing strategy

Conclusion


Introduction

You will note that I have tied two subjects together in this chapter: consumer behaviour and marketing strategy. That’s because marketing strategy is all about understanding consumer behaviour and incentivizing it in such a way that the firm and the consumer both win. I know a lot of marketers will be saying, ‘But what about competitors? Are they not part of marketing strategy?’ And the answer is, ‘No, not really.’ I am aware of the gasps this will cause.

Part of understanding consumer behaviour will come from the experience consumers have with competitors, but the focus is on consumer, not competitive, behaviour. I know John Nash and his work in game theory take a back seat in my view, but this is on purpose. Much like the financial motto ‘watch the pennies and the dollars will follow’, I say, ‘focus on the consumer and competitive understanding will follow’.

Just to be clear, marketing science should be at the consumer level, NOT the competitive level. By focusing on competitors you automatically move from a marketing point of view toward a financial/economic point of view.


In marketing, the consumer is central

I like to use Steven P. Schnaars’ Marketing Strategy because of its focus on consumer behaviour (Schnaars, 1997). And because he’s right. A marketing orientation is consumer-centric; anything else is by definition NOT marketing. Marketing drives financial results, and in order to be marketing-oriented there must be a consumer-centric focus. That means all marketing activities are geared toward learning and understanding consumer (and ultimately customer) behaviour.

The marketing concept does not mean giving the consumer (only) what they want, because:

1. the consumer’s wants can be widely divergent;
2. the consumer’s wants may contradict the firm’s minimum needs; and
3. the consumer might not know what they want.

It is marketing’s job to learn, understand and incentivize consumer behaviour toward a win-win position.

The objection from product-centric marketers

As a fair argument, consumer-centricity runs counter to how product managers work. Product managers focus on developing products and THEN finding consumers to buy them. (Immediate examples that spring to mind come from technology, such as the original HP, Apple, etc.) This sometimes works, but often it does not. The poster child for product focus regardless of what consumers think they want is Chrysler’s minivan strategy. The story is that Chrysler chief Lee Iacocca wanted to design and produce the minivan, but the market research told him there was no demand for it. Consumers were confused by a vehicle ‘halfway between a car and a conversion (full-size) van’ and were not interested in it. Iacocca went ahead and designed and built it, and it basically saved Chrysler. What is the point? One point is that consumers do not always know what they want, especially with a new/innovative product they have no experience with. The second point is that not everyone has the genius of Lee Iacocca.


Background of consumer behaviour

A simple view of consumer behaviour is best understood in the microeconomic analysis of ‘the consumer problem’. This is generally summarized in three questions:

1. What are consumers’ preferences (in terms of goods/services)?
2. What are consumers’ constraints (allocating limited budgets)?
3. Given limited resources, what are consumers’ choices?

This assumes that consumers are rational and have a desire to maximize their satisfaction.

Let’s talk about general assumptions of consumer preferences. The first is that preferences are complete, meaning consumers can compare and rank all products. The second assumption is that preferences are transitive. This is the mathematical requirement that if X is preferred to Y and Y is preferred to Z, then X is preferred to Z. The third assumption is that products are desirable (a ‘good’ is good, or of value). This means that more is better (costs notwithstanding).
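The consumer problem can be made concrete with a toy search. Everything in this Python sketch is a hypothetical assumption for illustration (two goods, their prices, the budget and a Cobb-Douglas utility function); it simply lists every affordable whole-unit bundle and picks the one with the highest utility, which is the ‘preferences, constraints, choice’ logic in miniature.

```python
from itertools import product

# Hypothetical prices, budget and utility, chosen only to illustrate the consumer problem
PRICE_X, PRICE_Y, BUDGET = 2, 4, 40


def utility(x, y):
    """Assumed Cobb-Douglas utility: more of either good is always better."""
    return (x * y) ** 0.5


def best_bundle():
    """Preferences (utility), constraint (budget), choice (the best affordable bundle)."""
    affordable = (
        (x, y)
        for x, y in product(range(BUDGET // PRICE_X + 1), range(BUDGET // PRICE_Y + 1))
        if PRICE_X * x + PRICE_Y * y <= BUDGET
    )
    return max(affordable, key=lambda bundle: utility(*bundle))


print(best_bundle())  # (10, 5): half the budget goes to each good
```

Note how ‘more is better’ shows up in the answer: the chosen bundle spends the entire budget.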

A quick look at the assumptions above makes it clear that they are made in order to do the mathematics. This ultimately means that curves will be produced (the bane of most microeconomics students) that lend themselves to simple graphics. This immediately leads into using calculus for analytic reasons. Calculus requires smooth curves and twice-differentiable functions in order to work. THIS means that some heroic assumptions are indeed required, especially ceteris paribus (holding all other things constant).

The decision process

Consumers go through a shopping-purchasing process, using decision analytics to come to a choice. It should be recognized that not all decisions are equally important or complex. Based on the risk of a wrong choice, either extended problem solving or limited problem solving will tend to be used.

Extended problem solving is used when the cost of the product is high, or the product will be lived with for a long time, or it’s the initial purchase, etc. Something about the choice requires more thought, evaluation and rigour.

Limited problem solving is, of course, the opposite. When products are inexpensive, short-lived, not really important or carry a low risk of a ‘wrong’ decision, limited problem solving is used. Often one or more of the steps below are omitted. The choice is more automatic. The choice is usually reduced to a rule: what experience the consumer has had before, what brand they have disliked, what price is low enough, what their neighbours have told them, etc.

The typical decision process in terms of consumer behaviour (for example, see Consumer Behavior by Engel, Blackwell and Miniard, 1995) consists of need recognition, search for information, information processing, alternative evaluation, purchase and post-purchase evaluation. There are marketing opportunities along each step to influence and incent.

Need recognition

The initiator of the consumer decision process is need recognition. This is a realization that there is a ‘cognitive dissonance’ between some ideal state and the current state. There is much advertising around need arousal. From educating consumers on real needs (survival, satisfaction) to informing consumers about pseudo-needs (‘jump on the bandwagon – all of your friends have already done it!’), need arousal is where it starts.

Search for information

Now the consumer recalls what they have heard or what they know about the product so that they are able to make a decision; how much searching they do depends on whether the product requires limited or extensive engagement. Obviously advertising and branding come into play here, informing consumers of benefits, differentiation, etc.

Information processing

The next step is for the consumer to absorb what information they have and what facts they know. Most marketing messaging strategies prefer for consumers NOT to process information, but to recall such things as positive brand exposure, satisfaction from previous interactions or emotional loyalty. If consumers do not ‘process’ information (ie critically evaluate costs and benefits), then they can use brand equity/satisfaction to make the shorthand decision. It is marketing science’s job to find those who are still considering, as distinct from those who have ‘already decided’.

Pre-purchase alternative evaluation

Now, after information has been processed, comes the critical final comparison: does the potential product have attributes that meet or exceed the consumer’s standards? That is, given budgetary standards, what is the product likely to offer in terms of satisfaction (economic utilization) once the consumer has decided it meets the minimum qualifications?

Purchase

Finally, the whole point of the marketing funnel is purchase. A sale is the last piece. This is the decision of the consumer based on the shopping process described above. The actual purchase action carries within it all the above (and below) processes and all of the actual and perceived product attributes.

Post-purchase evaluation

But the consumer decision process does not (usually) end with purchase. Generally the consumer compares the utilization they thought (hoped) they would gain from consuming the product with the actual (perceived) satisfaction they received from it. That is, the creation of loyalty starts post-purchase.

Now, with consumer behaviour centrally located, let’s think about a firm’s strategy. Keep the differences between competitive moves and consumer behaviour firmly in mind.

Overview of marketing strategy

The above focused on consumer behaviour. Marketing, to be marketing, is about understanding and incentivizing consumer behaviour in such a way that both the consumer and the firm get what they want. Consumers want a product that they need, when they need it, at a price that gives them value, through a channel they prefer. Firms want loyalty, customer satisfaction and growth. Since a market is a place where buyers and sellers meet, marketing is the function that moves the buyers and sellers toward each other.

Given the above, it should be noted that marketing strategy has evolved (primarily via microeconomics) into a firm vs. firm rivalry. That is, marketing strategy is in danger of forgetting the focus on consumer behaviour and jumping deep into something like game theory, wherein one firm competes with another firm.

Everything that follows about marketing strategy can be thought of this way: firm vs. firm rivalry is the indirect consequence, and the focus on consumer behaviour is the direct cause. That is, fighting a firm means incentivizing consumers. Think of it as an iceberg: what is seen (firms competing) is the tip above the surface, but what really moves the iceberg is unseen (from the other firm’s point of view) below the surface (incentivizing consumers).

Types of marketing strategy

Everyone should be aware of Michael Porter and his monumental article and book about competitive strategy (Porter, 1979/1980). This is where marketing strategy became a discipline.

First, Porter detailed the factors creating competitive intensity. (To make an obvious point: what are firms competing over? Consumer loyalty.) These factors are the bargaining power of suppliers, the bargaining power of buyers, the threat of new entrants, the rivalry among existing firms and the threat of substitute products:

The bargaining power of buyers means firms lose profit from powerful buyers demanding lower prices. This means consumers are sensitive to price.

The bargaining power of suppliers means firms lose profit due to potentially increased factor (input) prices. Suppliers only have bargaining power because a firm’s margins are low, because a firm cannot raise prices, because consumers are sensitive to price.

The threat of new entrants lowers profits due to new competitors entering the market. Again, consumers are sensitive to price and very informed about other firms’ offerings.

The intensity of rivalry causes lower prices because of the zero-sum game supplied by consumers. There are only a certain number of potential loyal customers, and if a firm gains one then another firm loses that one.

The threat of substitute products invites consumers to choose among the lower-priced products.

Note how all of this strategy (which appears like firms fighting other firms) is actually based on consumer behaviour. Am I putting too fine a point on this? Maybe, but it does help us focus, right?

Based on these factors a firm can ascertain the intensity of competition. The more competitive the industry is, the more a firm must be a price taker; that is, it has little market power, meaning little control over price. This affects the amount of profit each firm in the industry can expect. Given this, a firm can evaluate its strengths and weaknesses and decide how to compete. Or not.

Porter then did a brilliant thing: based on the above, he devised three generic strategies. A firm can compete on costs (be the low-cost provider), a firm can differentiate and focus on high-end products, or a firm can segment and focus on a smaller, niche part of the market. The point is the firm needs to create and adhere to a particular strategy. Often firms are diluted and do everything at once.

However, Treacy and Wiersema took Porter’s framework and evolved it (Treacy and Wiersema, 1997). They too came up with three strategies (disciplines): operational excellence (basically a focus on lower costs), product leadership (a focus on higher-end differentiated products) and customer intimacy (a differentiation/segmentation strategy). You can see their use and extension of Porter’s ideas. Both have the same bottom line: firms should be disciplined and concentrate their efforts corporate-wide on primarily one (and only one) strategic focus.

Applied to consumer behaviour

Stephan Sorger’s excellent Marketing Analytics (Sorger, 2013) has a brief description of competitive moves, both offensive and defensive. Summaries of each move, applied via consumer behaviour, follow.

Defensive reactions to competitor moves

Bypass attack (the attacking firm expands into one of our product areas): the correct counter is for us to constantly explore new areas. Remember Theodore Levitt’s Marketing Myopia (Levitt, 1960)? If not, re-read it; you know you had to in school.

Encirclement attack (the attacking firm tries to overpower us with larger forces): the correct counter is to message how our products are superior/unique and of more value. This requires constant monitoring of message effectiveness.

Flank attack (the attacking firm tries to exploit our weaknesses): the correct counter is to not have any weaknesses. This again requires monitoring and messaging the uniqueness/value of our products.

Frontal attack (the attacking firm aims at our strength): the correct counter is to attack back in the attacker’s territory. Obviously this is a rarely used technique.

Offensive actions

New market segments: this uses behavioural segmentation (see the later chapters on segmentation) and incents consumer behaviour for a win-win relationship.

Go-to-market approaches: this learns about consumers’ preferences in terms of bundling, channels, buying plans, etc.

Differentiating functionality: this approach extends consumers’ needs by offering the product and purchase combinations most compelling to potential customers.

Conclusion

The above was a brief introduction to both consumer behaviour and how that behaviour applies to marketing strategy. The over-arching point is that marketing science (and marketing research, marketing strategy, etc.) should all be focused on consumer behaviour. Good marketing is consumer-centric. Have you heard that before?

Checklist


Remember that in marketing, the consumer is central, NOT THE FIRM.

Point out the consumer’s problem is always how to maximize utilization/satisfaction while managing a limited budget.

Think about the consumer’s decision process while undertaking all analytic projects.

Recall that strategy is a focus on consumer behaviour, not competitive behaviour.

Remember that both Porter and Treacy and Wiersema provide three general strategies.

Observe that competitive combat can be thought of in terms of consumer behaviour.

Part two

Dependent variable techniques

03

Modelling dependent variable techniques (with one equation)

What are the things that drive demand?

Introduction

Dependent equation type vs inter-relationship type statistics

Deterministic vs probabilistic equations

Business case

Results applied to business case

Modelling elasticity

Technical notes

Highlight: Segmentation and elasticity modelling can maximize revenue in a retail/medical clinic chain: field test results


Introduction

Now, on to the first marketing problem: determining and quantifying those things that drive demand. Marketing is about consumer behaviour (which I’ve touched on but about which I will have more to say later), and the point of marketing is incentivizing consumers to purchase. These purchases (typically units) are what economists call demand. (By the way, finance is more about supply, and the two together are supply and demand. Remember back in Beginning Economics?)

Dependent equation type vs inter-relationship type statistics

Before we dive into the problem at hand, it might be good to back up and give some simple definitions. There are two kinds of (general) statistical techniques: the dependent equation type and the inter-relationship type. Dependent type statistics deal with explicit equations (which can be either deterministic or probabilistic; see below). Inter-relationship techniques are not equations; they examine the variance shared between variables. These will be covered and defined later, but they include types of factor analysis and segmentation. Clearly this current chapter is about an equation.

Deterministic vs probabilistic equations

Now let’s talk about two kinds of equations: deterministic and probabilistic. A deterministic equation is algebraic (y = mx + b): the left side exactly equals the right side. For example:

Profit = Revenue – Expenses.

If you know two of the quantities you can algebraically solve for the third. This is NOT the kind of equation dealt with in statistics. Of course not.

Statistics deals with probabilistic equations:

$$Y = a + bX_i + e$$

Here Y is the dependent variable (say, sales, units or transactions), a is the constant or intercept, X is some independent variable(s) (say, price, advertising, seasonality), b is the coefficient or slope and e is the random error term. It’s this random error term that makes this equation a probabilistic one. Y does not exactly equal a + bX_i because there is some random disturbance (e) that must be accounted for. Think of it as Y, on average, equals some intercept plus bX_i.

As an example, say Sales = constant + price * slope + error, that is, Sales = a + Price * b + e. Note that Y (sales) depends on price, plus or minus the random error.
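To see what the error term does, the Python sketch below generates sales from a known line plus random noise and then recovers a and b with the one-variable least-squares formulas, b = Cov(X, Y) / Var(X) and a = mean(Y) – b * mean(X). The true intercept, slope, price range, noise level and seed are all invented for illustration. The estimates land close to, but not exactly on, the true values, and no individual observation sits exactly on the fitted line: that is what makes the equation probabilistic.

```python
import random

random.seed(7)  # arbitrary seed for a repeatable run

# Hypothetical data-generating process: Sales = a + b * Price + e
TRUE_A, TRUE_B = 100.0, -0.04

prices = [random.uniform(900, 1500) for _ in range(500)]
errors = [random.gauss(0, 3) for _ in prices]  # e: random disturbance with mean zero
sales = [TRUE_A + TRUE_B * p + e for p, e in zip(prices, errors)]

# Least-squares estimates for a single X: b = Cov(X, Y) / Var(X), a = mean(Y) - b * mean(X)
n = len(prices)
mean_p, mean_s = sum(prices) / n, sum(sales) / n
cov_ps = sum((p - mean_p) * (s - mean_s) for p, s in zip(prices, sales)) / n
var_p = sum((p - mean_p) ** 2 for p in prices) / n
b_hat = cov_ps / var_p
a_hat = mean_s - b_hat * mean_p

print(f"estimated a = {a_hat:.2f}, estimated b = {b_hat:.4f}")
# The error term pushes each observation off the line
print(f"first observation: actual {sales[0]:.2f} vs line {a_hat + b_hat * prices[0]:.2f}")
```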

BUSINESS CASE

Ok, say we have a guy, Scott, who’s an analytic manager at a PC manufacturing firm. Scott has an MS in economics and has been doing analytics for four years. He started mostly as a SAS programmer and has only recently been using statistical analysis to give insights to drive marketing science.

Scott is called into his boss’s office. His boss is a good strategist with a direct marketing background but is not well versed in econometrics/analytics, etc.

‘Scott,’ the boss says, ‘we need to find a way to predict our unit sales. More than that, we need something to help us understand what drives our unit sales. Something that we can use as a lever to help increase sales over the quarter.’

‘A demand model,’ Scott says. ‘Units are a function of, what, price, advertising?’

‘Sure.’

Scott gulps and says, ‘I’ll see what I can do.’

That night he thinks about it and has some ideas. He’ll first have to think about causality (‘Demand is caused by…’) and then he’ll have to get appropriate data.

It’s smart to formulate a theoretic model first, regardless of what data you may or may not have. First, try to understand the data-generating process (‘this is caused by that, and maybe that, etc.’) and then see what data, or proxies for data, can be used to actually construct the model.

It’s also wise to hypothesize the signs of the (independent, right-hand side) variables you think significant in causing your dependent variable to vary. Remember that the dependent variable (left side of the equation) is dependent upon the independent variable(s) (right side of the equation).

For instance, it’s well known that price is probably a significant variable in a unit-demand model and that the sign should be negative. That is, as price goes up, units, on average, should go down. This is the law of demand, the only law in all of economics – except the one that most economic forecasts will be wrong. (‘Economists have predicted 12 of the last 7 recessions.’)

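(For you sticklers, yes, there is a ‘Giffen good’. This is an odd product whereby an increasing price causes an increase in demand. Giffen goods are inferior (non-normal) goods, classically a staple food such as bread for very poor households; luxury goods like fine art or wine, whose appeal can rise with price, are a different case, usually called status or ‘Veblen’ goods. For the vast majority of products most marketers work on, however, the law of demand rules: price goes up, quantity (units) goes down.)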

So Scott thinks that price and advertising spend are important in generating demand. He also thinks there should be something about the season. He’s on the consumer side of the business and it has strong back-to-school and Christmas seasonal spikes.

He thinks he can easily get the number of units sold and the average price of those units. Seasonality is easy; it’s just a variable to account for time of the year, say quarterly. Advertising spend (for the consumer market) might be a little tougher, but let’s say he is able to twist some arms and eventually secure a guess as to the average amount of advertising spent on the consumer market, by quarter.

This will be a time series model, since it has quarterly units, average prices and advertising spend, plus season, by time period. (There will be some econometric suggestions on time series modelling in the technical section, particularly pertaining to serial correlation.)

For now, let’s make sure there’s a good grasp of the problem. Scott will use a dependent variable technique called ordinary regression (ordinary least squares, OLS) to understand (quantify) how season, advertising spend and price cause (explain the movement of) units sold. This is called a structural analysis: he is trying to understand the structure of the data-generating process. He is attempting to quantify how price, advertising spend and season explain, or cause, (most of) the movement in unit sales.

When he’s through he’ll be able to say whether or not advertising spend is significant in causing unit sales (he’ll have to make certain no advertisers are in earshot when he does) and whether December is positive and January negative in terms of moving unit sales, etc.

Now, Scott is ready to design the ordinary regression model.

Conceptual notes

Ordinary regression is a common, well-understood and well-researched statistical technique that has been around for over 200 years. Remember that regression is a dependent variable technique, Y = a + bX_i + e, where e is a random error term not specifically seen but whose impact is felt in the distribution of the variables.

Ordinary regression: a statistical technique whereby a dependent variable depends on the movement of one or more independent variables (plus an error term).

Simple regression has one independent variable and multiple regression has more than one independent variable, that is:

$$y = a + b_1x_1 + b_2x_2 + \dots + b_nx_n$$

Scott’s model for his boss will use multiple regression because he has more than one independent variable.

The output of the model will have estimates of how much each variable matters (we’ll see its coefficient, or slope) and whether it’s significant or not (based on its standard error). This is the heart of structural analysis: quantifying the structure of the demand for PCs.

So, Scott collected data (see Table 3.1) and ran the model

Units = price + advertising

and now sees how the model fits.

Table 3.1 Demand model data

Quarter   Unit sales   Avg price   Ad spend
1         50           1,400       6,250
2         52.5         1,250       6,565
3         55.7         1,199       6,999
4         62.3         1,099       7,799
1         52.5         1,299       6,555
2         59           1,200       7,333
3         58.2         1,211       7,266
..        ..           ..          ..

There is one general measure of goodness of fit: R². R² is the square of the correlation coefficient, in this case the correlation of actual units and predicted units. While correlation measures strength and direction, R² measures shared variance (explanatory power) and ranges from 0% to 100%.
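Both readings of R² can be checked directly. The Python sketch below fits Units = price + advertising by ordinary least squares using only the seven rows printed in Table 3.1 (the rest of the data set is not shown, so the coefficients will not match Table 3.2), then computes R² two ways: as 1 – SSE/SST, the share of variation explained, and as the squared correlation between actual and predicted units. The small linear-algebra helper is a generic sketch, not any particular package’s API; in practice SAS or a statistics package does this step.

```python
import math

# Only the seven rows printed in Table 3.1 (illustration of the mechanics, not Table 3.2)
units = [50, 52.5, 55.7, 62.3, 52.5, 59, 58.2]
price = [1400, 1250, 1199, 1099, 1299, 1200, 1211]
adspend = [6250, 6565, 6999, 7799, 6555, 7333, 7266]


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *columns):
    """OLS with a constant: slopes from the centred normal equations, constant from the means."""
    n = len(y)
    means = [sum(col) / n for col in columns]
    mean_y = sum(y) / n
    centred = [[v - m for v in col] for col, m in zip(columns, means)]
    cy = [v - mean_y for v in y]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in centred] for ci in centred]
    xty = [sum(a * b for a, b in zip(ci, cy)) for ci in centred]
    slopes = solve(xtx, xty)
    const = mean_y - sum(b * m for b, m in zip(slopes, means))
    fitted = [const + sum(b * col[i] for b, col in zip(slopes, columns)) for i in range(n)]
    return [const, *slopes], fitted


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


coefs, fitted = ols(units, price, adspend)

mean_u = sum(units) / len(units)
sse = sum((u - f) ** 2 for u, f in zip(units, fitted))  # unexplained variation
sst = sum((u - mean_u) ** 2 for u in units)             # total variation
r2_fit = 1 - sse / sst
r2_corr = corr(units, fitted) ** 2

print("constant, price, ad spend:", [round(c, 4) for c in coefs])
print(f"R² as 1 - SSE/SST:              {r2_fit:.4f}")
print(f"R² as corr(actual, predicted)²: {r2_corr:.4f}")
```

For a least-squares fit that includes a constant, the two numbers agree.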

(An interesting but rather useless bit of trivia is why R² is called R². Yes, R² is the square of R, and R is the correlation coefficient. Correlation is symbolized as the Greek letter rho, ρ. Why? In Greek numerals α = 1, β = 2, etc., and ρ = 100 (kind of like Roman numerals, I = 1, II = 2, C = 100, etc.). Remember that the range of correlation is from –100% to +100%. ρ = rho, and in English = R. Now impress your analytic friends.)

Note the data is quarterly, which we’ll address soon enough. Scott runs ordinary regression and finds the output shown in Table 3.2.

Table 3.2 Ordinary regression

              Ad spend   Avg price   Constant
Coefficient   0.0007     –0.0412     101.83
Std err       0.0003     0.0047
t-ratio       2.72       –8.67
R²            83%

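The first row is the estimated coefficient, or slope. Note that price is negative, as hypothesized. The second row is the standard error: an estimate of the standard deviation of the coefficient itself, that is, of how much the estimated coefficient would vary from sample to sample. It is a measure of dispersion.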

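Standard error: an estimate of the standard deviation of a statistic (such as a mean or a regression coefficient) across repeated samples; for a sample mean it is calculated as the standard deviation divided by the square root of the number of observations.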

Let’s talk about significance, shall we? In marketing we typically operate at 95% confidence. Remember z-scores? 1.96 is the z-score for 95% confidence, which is the same as a (two-sided) p-value < 0.05. So, if the absolute value of a t-ratio (which in this case is the coefficient divided by its standard error) is greater than 1.96, the variable is considered significant. The t-ratio tests the hypothesis that the variable’s true impact is 0: significance means that, if the impact really were 0, there would be less than a 5% chance of seeing a t-ratio this far from 0. 95% of all standard-normal observations will be within +/–1.96 z-scores.

Notice that advertising spend has a coefficient of 0.0007 (rounded) and a standard error of 0.0003 (rounded). The t-ratio (coefficient divided by its standard error) is 2.72, which is > 1.96, so it is said to be positive and significant. (‘Whew,’ the advertisers say.) Likewise price is significant (< –1.96) and negative, as expected.

Now let’s mention fit: how well the model does with just these two variables. R² is the general measure of goodness of fit and in this case is 83%. That is, 83% of the variance between actual and predicted units is shared, or 83% of the movement of the actual dependent variable is ‘explained’ by the independent variables. This can be interpreted as 83% of the movement in unit sales being attributable to price and advertising spend. This seems pretty good; that’s a fairly high amount of explanatory power. That’s probably why Scott’s boss wanted him to do this model.

The next step is for Scott to add seasonality, which he hypothesized to be a variable that impacts consumer PC units sold. Scott has quarterly data, so this is easy to do. The new model will be units = price + advertising + season.

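Let’s talk about dummy variables (binary variables, those with only two values, 1 or 0). These are often called ‘slope shifters’ because their purpose (when turned ‘on’ as a 1) is to shift the regression line up or down; strictly speaking, a dummy entered this way shifts the intercept rather than the slope. The idea of a binary variable is to account for changes in two states of nature: on or off, yes or no, purchase or not, respond or not, q1 or not, etc.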

Scott’s model is a quarterly model, so rather than use one variable called quarter with four values (1, 2, 3, 4), he uses a model with three dummy (binary) variables, q2, q3 and q4, each 0 or 1. This allows him to quantify the impact of the quarter itself. Table 3.3 shows part of the data set.

Table 3.3 Quarterly model

Quarter   Unit sales   Avg price   Ad spend   Q2   Q3   Q4
1         50           1,400       6,250      0    0    0
2         52.5         1,250       6,565      1    0    0
3         55.7         1,199       6,999      0    1    0
4         62.3         1,099       7,799      0    0    1
1         52.5         1,299       6,555      0    0    0
2         59           1,200       7,333      1    0    0
3         58.2         1,211       7,266      0    1    0
4         64.8         999         8,111      0    0    1
1         55           1,299       6,877      0    0    0
2         61.5         1,166       7,688      1    0    0
..        ..           ..          ..         ..   ..   ..

A brief technical note

When using binary variables that form a system, you cannot use them all. That is, for a quarterly model you have to drop one of the quarters. Otherwise the model won’t solve (effectively trying to divide by 0) and you will have fallen into the ‘dummy trap’. So Scott decides to drop q1, which means each quarter’s coefficient is interpreted as a comparison with q1. That is, q1 is the baseline.
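The ‘divide by 0’ problem can be shown directly. Regression needs to invert the cross-product matrix of the data, which is impossible when one column is an exact combination of others. The Python sketch below builds design matrices from the quarter column of Table 3.3 and computes their exact rank: with a constant plus all four quarterly dummies, the dummies always sum to the constant column, so the rank falls short of the column count.

```python
from fractions import Fraction

# Quarter column from the first ten rows of Table 3.3
quarters = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]


def design_matrix(qs, dummy_quarters):
    """Constant column plus one 0/1 dummy per quarter listed in dummy_quarters."""
    return [[1] + [1 if q == k else 0 for k in dummy_quarters] for q in qs]


def rank(matrix):
    """Exact matrix rank via Gauss-Jordan elimination on fractions (no rounding error)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot: this column is a combination of earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


trap = design_matrix(quarters, [1, 2, 3, 4])      # q1 + q2 + q3 + q4 equals the constant
baseline_q1 = design_matrix(quarters, [2, 3, 4])  # q1 dropped and used as the baseline

for label, X in (("all four quarters", trap), ("q1 dropped", baseline_q1)):
    print(f"{label}: {len(X[0])} columns, rank {rank(X)}")
# expected: all four quarters -> 5 columns, rank 4; q1 dropped -> 4 columns, rank 4
```

Dropping any one quarter restores full rank; dropping q1 simply makes it the baseline.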

Now let’s talk about the new model’s output (Table 3.4) and diagnostics. Note first that R² improved to 95%, which means adding quarterly data improved the fit of the model. That is, price, advertising spend and season now explain 95% of the movement in unit sales, which is outstanding. It’s a better model. Note the change in the price and advertising coefficients.

Table 3.4 Regression output

              Q4      Q3      Q2      Ad spend   Avg price   Constant
Coefficient   3.825   2.689   1.533   0.0011     –0.0275     80.7153
Std err       1.36    1.157   0.997   0.0003     0.0064      9.8496
t-ratio       2.81    2.32    1.54    4.1        –4.3        8.19
R²            95%

Next, the results of the output are applied: what they mean and how they can be used.

Results applied to business case

So now, what does all this tell us? Analytics without application to an actionable strategy is meaningless, much like special effects in a movie without a plot. Looking at the output again, Scott can make some actionable and important structural comments.

Again, the R² as a measure of fit is 95%, which means the independent variables do a very good job of explaining the movement of unit sales. All of the variables are significant at the 95% level (where the absolute t-ratio is greater than 1.96) except q2. The coefficients on the variables all have the expected signs. Comparing the quarters to q1 (which was dropped to avoid the dummy trap), Scott sees that they are all positive, which means they are all greater than q1, on average.
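Scott’s significance checks can be reproduced from the printed table. This Python sketch divides each coefficient in Table 3.4 by its standard error and applies the 1.96 rule, with a two-sided p-value from the normal approximation. Two caveats: the inputs are the rounded figures shown in the table, so the ad spend t-ratio comes out near 3.7 rather than the printed 4.1 (it clears 1.96 either way); and with a small quarterly sample, a t-distribution cutoff with the model’s degrees of freedom would be somewhat above 1.96.

```python
import math

# Coefficients and standard errors exactly as printed (rounded) in Table 3.4
table_3_4 = {
    "Q4": (3.825, 1.36),
    "Q3": (2.689, 1.157),
    "Q2": (1.533, 0.997),
    "Ad spend": (0.0011, 0.0003),
    "Avg price": (-0.0275, 0.0064),
    "Constant": (80.7153, 9.8496),
}


def t_ratio(coef, std_err):
    """Coefficient divided by its standard error."""
    return coef / std_err


def two_sided_p(t):
    """Two-sided p-value from the standard normal distribution: P(|Z| > |t|)."""
    return math.erfc(abs(t) / math.sqrt(2))


for name, (coef, se) in table_3_4.items():
    t = t_ratio(coef, se)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"{name:10s} t = {t:6.2f}   p = {two_sided_p(t):.3f}   {verdict}")
```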

The powerful thing about ordinary regression is that it parcels out the impact OF each independent variable, taking into account all the other variables. That is, it holds all other variables constant and quantifies the impact of each and every variable, one at a time. This means that, when taking all variables into account, q4 tends to add about 3.825 units more than q1. This is why a binary variable is called a slope shifter; just turning ‘on’ q4 adds 3.825 units, regardless of what else is happening in price or advertising spend. Given the very strong seasonal pattern of unit sales, these quarterly estimates seem reasonable.

Advertising has a significant and positive impact on unit sales. A coefficient of 0.0011 means every 1,000 increase in advertising spend tends to increase units by 1.1.

Now let’s look at price. The price coefficient is negative, as expected, at –0.0275. When price moves up by, say, 100, units tend to decrease by 2.75. Now, how can this be useful? Just knowing the quantification is valuable, but more important is using it to calculate price elasticity.

Modelling elasticity

Elasticity is a microeconomic calculation that shows the percent change in response given a percent change in stimulus, or in this case, the percent change in units sold given a percent change in price.

Elasticity: a metric with no scale or dimension, calculated as the percent change in an output variable given a percent change in an input variable.

With a linear regression equation, the elasticity (evaluated at the means) is calculated as: price coefficient * average price / average quantity (units).

Average price is 1,102 and the average quantity of units sold is 63, so the price elasticity calculated here is:

–0.0275 * 1,102 / 63 = –0.48

This means that if price increases by, say, 10%, units sold will decrease by about 4.8%. This is strategically lucrative information, allowing Scott and his team to optimize pricing to maximize units sold. There will be more on this topic later.
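The elasticity arithmetic is a one-liner. A minimal Python sketch using the Table 3.4 price coefficient and the means quoted above:

```python
def elasticity_at_means(coefficient, mean_x, mean_y):
    """Point elasticity of a linear model evaluated at the means: b * mean(X) / mean(Y)."""
    return coefficient * mean_x / mean_y


price_elasticity = elasticity_at_means(-0.0275, 1102, 63)
print(f"price elasticity: {price_elasticity:.2f}")  # -0.48
print(f"10% price rise -> units change of about {price_elasticity * 10:.1f}%")  # -4.8%
```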

As a quick review, remember that there are two types of elasticity: elastic and inelastic.

Elastic demand: a place on the demand curve where a percentage change in an input variable produces a larger percentage change in an output variable.

Inelasticity means that an X% increase in price causes a decrease of less than X% in units sold.

Inelastic demand: a place on the demand curve where a percentage change in an input variable produces a smaller percentage change in an output variable.

That is, if price were to increase by, say, 10%, units would decrease (remember the law of demand: if price goes up, quantity goes down) by less than 10%. Meaning, if |elasticity| < 1.00 the demand is inelastic (think of it as units being insensitive to a price change). If |elasticity| > 1.00 the demand is elastic.

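The simple reason why elasticity is important to know is that it tells what happens to total revenue when price changes. On an inelastic demand curve, total revenue follows price: if price were to increase, total revenue would increase. On an elastic demand curve, total revenue moves opposite to price, so a price increase lowers total revenue. See Table 3.5 below for a mathematical example.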

Table 3.5 Elasticity, inelasticity, and total revenue

Inelastic (elasticity 0.075), price increased by 10.00%
                     Before    After     Change
Price (p)            10.00     11.00     10.00%
Units (u)            1,000     993       –0.75%
Total revenue (tr)   10,000    10,918    9.20%

Elastic (elasticity 1.25), price increased by 10.00%
                     Before    After     Change
Price (p)            10.00     11.00     10.00%
Units (u)            1,000     875       –12.50%
Total revenue (tr)   10,000    9,625     –3.80%
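Table 3.5 can be rebuilt with the same simple rule it uses: the percent change in units is minus the elasticity times the percent change in price. In the Python sketch below, the elasticities, starting price and starting units are the table’s; the function name is just illustrative. The table rounds its results (992.5 units is shown as 993, and the revenue changes of 9.175% and –3.75% appear as 9.20% and –3.80%).

```python
def price_change_effect(elasticity, price, units, price_change):
    """Apply the table's rule: % change in units = -elasticity * % change in price.

    `elasticity` is a positive magnitude, as in Table 3.5.
    Returns (new units, new total revenue, % change in total revenue).
    """
    new_price = price * (1 + price_change)
    new_units = units * (1 - elasticity * price_change)
    old_revenue, new_revenue = price * units, new_price * new_units
    return new_units, new_revenue, new_revenue / old_revenue - 1


for label, elasticity in (("Inelastic", 0.075), ("Elastic", 1.25)):
    u2, tr2, tr_change = price_change_effect(elasticity, price=10.00, units=1_000, price_change=0.10)
    print(f"{label:9s} units {u2:,.1f}   revenue {tr2:,.1f}   revenue change {tr_change:+.2%}")
```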

Let me add one quick note about elasticity modelling, something which is a common mistake. It is well known that if the natural logarithm is taken of all data (dependent as well as independent variables), then the elasticity calculation does not have to be done. Elasticity can be read right off the coefficient. That is, the beta coefficient IS the elasticity.

$$\ln(y) = a + b_1\ln(x_1) + b_2\ln(x_2) + \dots + b_n\ln(x_n)$$

The problem with this is that, while the calculation is easier (taking the price means, the unit means, etc. is not required), modelling all the data in natural logs specifically assumes a constant elasticity. This assumption seems heroic indeed. To say there is the same response to a 5% price change as there is to a 25% price change would strike most marketers as inappropriate. A model in logs produces a demand curve that is convex to the origin with the same elasticity at every point along it. For more on modelling elasticity from a marketing point of view, see an article I wrote that appeared in the Canadian Journal of Marketing Research, called ‘Modeling Elasticity’ (Grigsby, 2002).
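The constant-elasticity point can be seen numerically. The Python sketch below compares two demand curves that both pass through the means quoted earlier (price 1,102, units 63): a linear one using the Table 3.4 price slope, and a hypothetical log-log one whose elasticity is fixed at –0.48. Both curves are constructed for illustration, not estimated. Measuring the elasticity over a 5% and over a 25% price rise, the log-log curve gives the same answer both times, while the linear curve’s responsiveness depends on the size of the move.

```python
import math

# Linear demand: Table 3.4 price slope, line assumed to pass through the quoted means
SLOPE, MEAN_PRICE, MEAN_UNITS = -0.0275, 1102, 63
INTERCEPT = MEAN_UNITS - SLOPE * MEAN_PRICE

# Log-log demand: hypothetical constant elasticity B, curve also through the means
B = -0.48
A = MEAN_UNITS / MEAN_PRICE ** B


def linear_units(price):
    return INTERCEPT + SLOPE * price


def loglog_units(price):
    return A * price ** B


def arc_elasticity(units_fn, price, pct_change):
    """Elasticity over a price move: log change in units divided by log change in price."""
    new_price = price * (1 + pct_change)
    return math.log(units_fn(new_price) / units_fn(price)) / math.log(new_price / price)


for pct in (0.05, 0.25):
    lin = arc_elasticity(linear_units, MEAN_PRICE, pct)
    ll = arc_elasticity(loglog_units, MEAN_PRICE, pct)
    print(f"{pct:.0%} price rise: linear {lin:.3f}, log-log {ll:.3f}")
```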

Using the model

How is the ordinary regression equation used? That is, how are predicted units calculated?

Note that Figure 3.1 shows the actual as well as the predicted unit sales. The graph shows how well the predicted sales fit the actual sales. The equation is:

$$Y = a + b_1x_1 + b_2x_2 + \dots + b_nx_n$$

or

Units = constant + b1 * q2 + b2 * q3 + b3 * q4 + b4 * price + b5 * advert

Figure 3.1 Actual and predicted unit sales

For the second observation (Table 3.6) this means:

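80.7153 + (1.533 * 1) + (2.689 * 0) + (3.825 * 0) + (–0.0275 * 1,250) + (0.0011 * 6,565) ≈ 55.1

(Table 3.6 shows 55.2 for this row; the small gap presumably comes from the coefficients being printed in rounded form.)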

Table 3.6 Average price and ad spend

Quarter   Unit sales   Avg price   Ad spend   Q2   Q3   Q4   Predicted sales
1         50.0         1,400       6,250      0    0    0    49.2
2         52.5         1,250       6,565      1    0    0    55.2
3         55.7         1,199       6,999      0    1    0    58.2
4         62.3         1,099       7,799      0    0    1    63.0
1         52.5         1,299       6,555      0    0    0    52.3
2         59.0         1,200       7,333      1    0    0    57.5
3         58.2         1,211       7,266      0    1    0    58.2
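Scoring the model for every row is the same arithmetic repeated. The Python sketch below plugs each row of Table 3.6 into the Table 3.4 equation, with q1 as the baseline. Its predictions sit roughly 0.1 to 0.2 units below the table’s predicted column, presumably because the printed coefficients are rounded.

```python
# Coefficients from Table 3.4 (as printed, i.e. rounded)
COEF = {"const": 80.7153, "q2": 1.533, "q3": 2.689, "q4": 3.825,
        "price": -0.0275, "adspend": 0.0011}


def predict_units(quarter, price, adspend):
    """Plug one row into the fitted equation; q1 is the baseline, so all dummies are 0 in Q1."""
    dummies = {f"q{k}": 1 if quarter == k else 0 for k in (2, 3, 4)}
    return (COEF["const"]
            + sum(COEF[name] * value for name, value in dummies.items())
            + COEF["price"] * price
            + COEF["adspend"] * adspend)


# Table 3.6 rows: (quarter, actual units, avg price, ad spend, predicted units shown in the table)
rows = [(1, 50.0, 1400, 6250, 49.2), (2, 52.5, 1250, 6565, 55.2), (3, 55.7, 1199, 6999, 58.2),
        (4, 62.3, 1099, 7799, 63.0), (1, 52.5, 1299, 6555, 52.3), (2, 59.0, 1200, 7333, 57.5),
        (3, 58.2, 1211, 7266, 58.2)]

for quarter, actual, price, adspend, shown in rows:
    predicted = predict_units(quarter, price, adspend)
    print(f"Q{quarter}: actual {actual:5.1f}   predicted {predicted:5.2f}   (table: {shown})")
```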

Technical notes

We’ll go over some detailed background information involving modelling in general and regression in particular now. This will be a little more technical and is only necessary for a fuller understanding.

First, be aware that regression carries with it some ‘baggage’: assumptions that, if violated (and some almost always are, to some extent), leave the model with shortcomings, bias, etc. As alluded to earlier, one of the best books on econometrics is Peter Kennedy’s 1998 work A Guide to Econometrics, because he explains things first conceptually and then adds more technical/statistical detail for those who want or need it. He covers the assumptions of regression, and the failings of those assumptions, as well as anyone. My philosophy in this book is similar, and this section will add some technical, but not necessarily mathematical, details.

The assumptions

The first assumption – dealing with functional form – is that the dependent variable (unit sales, above) can be modelled as a linear equation. This dependent variable ‘depends’ on the independent variables (season, price and advertising, as above) and some random error term.

The second assumption – dealing with the error term – is that the average value of the error term is zero.

The third assumption – dealing with the error term – has two parts. First, the error term has the same variance across all values of the independent variables (homoscedasticity). Second, the error term in one period is not correlated with the error term in another (later) period (no serial correlation, also called autocorrelation).

The fourth assumption – dealing with the independent variables – is that the independent variables are fixed in repeated samples.

The fifth assumption – dealing with the independent variables – is that there is no exact linear relationship among the independent variables (no perfect collinearity).

Each of these assumptions is required for the regression model to work: to be interpretable, unbiased, efficient, consistent, etc. A failure of any of these assumptions means something has to be done to the model to account for the consequences of the violation. That is, good model building requires a test for every assumption and, if the model fails the test, a correction to the model must be applied. All this requires an understanding of the consequences of violating every assumption.

All of these will be dealt with as we go through the business cases. But for now, let’s just deal with serial correlation. Serial correlation means the error term in period t is correlated with the error term in period t+1, throughout the whole data set. Serial correlation is very common in time series and must be dealt with.

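A simple test, called the Durbin-Watson (D-W) test, is easy to run in SAS to ascertain the extent of serial correlation. The statistic ranges from 0 to 4. A result of about 2.00 means there is not enough first-order autocorrelation to worry about. Values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.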
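The Durbin-Watson statistic is simple enough to compute from a model’s residuals. It is the sum of squared differences between successive residuals, divided by the sum of squared residuals. This Python sketch is an illustration of the formula, not the SAS routine, and the function name is hypothetical. It shows why slowly drifting residuals produce a low value while residuals that flip sign every period push the statistic towards 4.

```python
import numpy as np


def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; ranges from 0 to 4."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Residuals that drift slowly (positive autocorrelation) give a low DW
print(durbin_watson([1.0, 0.9, 0.8, 0.6, 0.3, -0.2, -0.5, -0.7]))
# Residuals that flip sign every period (negative autocorrelation) push DW towards 4
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```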

The consequence of violating the assumption of no error term correlation (with the usual, positive kind) is that the standard errors are biased downward; that is, they tend to be smaller than they should be. This means that the t-ratios (measures of significance) will be larger (appear more significant) than they really are. This is a problem.

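The correction for serial correlation (at least for a 1-period correlation) is called Cochrane-Orcutt. The SAS output actually does a Yule-Walker estimate, which simply means it has a way to put the first observation back into the data set. The correction transforms all the data using the estimated correlation between the error term and its 1-period lag. Each variable is replaced by its current value minus that correlation times its previous value. The model is re-run, the Durbin-Watson test is re-run, and those results are used.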
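Below is a minimal sketch of the iterative Cochrane-Orcutt idea in Python with NumPy, assuming AR(1) errors. It estimates rho from the OLS residuals and quasi-differences every variable (y_t – rho*y_{t–1}, and likewise for each column of X, including the intercept column). It then refits and repeats until rho settles. Unlike the Yule-Walker estimate SAS reports, this classic version simply drops the first observation. The function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y = X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cochrane_orcutt(X, y, max_iter=50, tol=1e-8):
    """Iterated Cochrane-Orcutt for AR(1) errors.

    X must already contain a column of ones for the intercept.
    Returns (beta, rho), with beta on the original (untransformed) scale.
    """
    beta = ols(X, y)
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta                      # residuals on the original scale
        new_rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
        y_star = y[1:] - new_rho * y[:-1]         # quasi-difference; first obs is lost
        X_star = X[1:] - new_rho * X[:-1]         # intercept column becomes (1 - rho)
        beta = ols(X_star, y_star)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho


# Simulated example: y = 2 + 3x + e, with e_t = 0.7 * e_{t-1} + noise
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
noise = rng.normal(size=n)
e = np.zeros(n)
e[0] = noise[0]
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + noise[t]
y = 2 + 3 * x + e
X = np.column_stack([np.ones(n), x])

beta, rho = cochrane_orcutt(X, y)
print("beta:", beta.round(3), "rho:", round(rho, 3))
```

Because the intercept column is quasi-differenced along with everything else, the coefficients from the transformed regression can be read directly as the original model’s coefficients.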

Tables 3.7 and 3.8 show the D-W statistic moving from 1.08 to 1.93, near 2.0. This seems to indicate the model transformation worked. Note the change in coefficients: price went from –0.0256 to –0.0274. Note also that the standard error went from 0.006 to 0.004 and significance increased.

Table 3.7 Serial correlation (before correction)

Variable        Estimate    Standard error   t value
Intercept       78.47       6.41             12.24
Price           –0.0256     0.006            –4.27
Advertising     0.001109    0.00019          5.65
Q2              1.5723      0.7422           2.12
Q3              2.9698      1.0038           2.96
Q4              4.357       0.8948           4.87
R²              98.61%
Durbin-Watson   1.08

Table 3.8 Serial correlation (after correction)

Variable        Estimate    Standard error   t value
Intercept       78.47       6.41             12.24
Price           –0.0274     0.004            –6.17
Advertising     0.001109    0.00019          5.65
Q2              1.5723      0.7422           2.12
Q3              2.9698      1.0038           2.96
Q4              4.357       0.8948           4.87
R²              98.61%
Durbin-Watson   1.93

Now that the serial correlation has been taken care of, confidence in interpreting the model’s impacts has improved. A quick note, though, about serial correlation and the diagnostics and corrections I’ve just mentioned. Most serial correlation is lagged one period (called an autoregressive 1, or AR(1), process), but other serial correlation problems are possible. Part of it is about the kind of data given. With daily data, each Monday tends to be correlated with the other Mondays, and so on. So there will often be an AR(7) process, meaning there is stronger correlation between periods lagged by 7 than between periods lagged by 1. Likewise, monthly data tends to be correlated at a lag of 12 observations and quarterly data at a lag of 4. Keep in mind that the D-W statistic is really only appropriate for an AR(1) process.
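Because the D-W statistic only looks at lag 1, a quick complementary check is to compute the residual autocorrelation at several lags and see where it peaks. Below is a minimal Python sketch using a hypothetical daily residual series with a spike every seventh day (say, every Monday). The function name is illustrative.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag (r_k uses pairs k periods apart)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return {k: np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)}


# Hypothetical daily residuals with a spike on the same weekday every week
residuals = np.tile([5.0, 0, 0, 0, 0, 0, 0], 20)
acf = autocorrelation(residuals, max_lag=8)
for lag, r in acf.items():
    print(f"lag {lag}: {r:+.2f}")
```

Here the lag-1 value is small and negative while the lag-7 value is close to 1. A D-W test alone would miss this weekly pattern.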

HIGHLIGHT

SEGMENTATION AND ELASTICITY MODELLING CAN MAXIMIZE REVENUE IN A RETAIL/MEDICAL CLINIC CHAIN: FIELD TEST RESULTS

Abstract

Most medical products or services are thought to be insensitive to price. That does not mean the best way to maximize revenue is to raise every price indiscriminately, for all products or services, in all clinics, in all regions. There should be some customers, regions, segments, clinics, products or services that are sensitive to price. Marketing analytics needs to give guidance to exploit these opportunities.

Using transactional and survey data from a large national retail/medical chain, I collected information that included, by customer and by clinic, the number of units, price paid and revenue realized for each product/service purchased over a two-year period. A telephone survey contacted three competing clinics around each of the firm’s clinics to ascertain the prices they charged for certain ‘shopped’ products/services. Thus, a data set was created that had both own- and cross-prices for several products or services.

Much of a customer’s purchasing behaviour could be attributed to clinic differences (staffing, employee courtesy, location, growth, operational discounts, etc.), so clinic segmentation was done. To emphasize, the segmentation was created to account for clinics influencing (causing) some customer behaviour other than responses to own- and cross-price. For example, one segment proved to be large (in terms of number of clinics), suburban and serving mostly loyal customers. Another segment was fairly small and urban, serving rather sick patients, with customers who were mostly dissatisfied and a high number of defectors. Controlling for these differences was obviously important.

After segmentation, elasticity modelling was done on each segment for selected products or services. This output showed that some segments and some products or services are sensitive to price and others are not. This shows why simply raising prices on all products/services across the chain is ineffective. To maximize revenue, prices should be lowered on a product in a clinic where demand for that product is sensitive to price. This sensitivity comes from lack of loyalty, lack of long-term commitment, knowledge of competing prices, a customer’s budget, etc.

After the analysis was finished and shown to the firm’s management, they put a 90-day test vs control in place. They chose selected (shopped) products, segments and regions to test. After 90 days, the test clinics outperformed the control clinics in average net revenue by more than 10%. This suggests that there are analytic ways to exploit price sensitivity in order to maximize revenue.

The problem and some background

In this national chain of retail/medical clinics, pricing practices were notoriously simplistic: raise prices on nearly every product or service, for every clinic, in every region, by about the same amount, every year. Growth was achieved for a time, but over the last handful of years several problems emerged:

customer satisfaction began to dip, defections increased and loyalty decreased;

employee satisfaction and courtesy decreased;

price increases became more and more difficult to enforce operationally;

the firm overall had minimal growth and made ever larger use of discounts.

Much of the deterioration in these metrics was traced back to pricing policies. So the primary marketing problem was to understand to what extent pricing affected total revenue. That is, could the firm discover price sensitivity that differed by segment, by region and across products or services, and then exploit those differences?

Pricing mostly followed one of two practices. The first, cost-plus, is a financial decision based on the input cost of the products or services, with margin built into the final price. This is the typical approach, especially for products or services thought to be insensitive to price (eg emergencies, radiology, major surgery, etc.). The other pricing avenue is for shopped products or services. These are thought to be possibly more sensitive to price (exams, discretionary vaccines, etc.). For these products or services a survey was created: three competing clinics around each of the firm’s clinics were called and asked what prices they charged. The firm then typically increased its own prices (operationally, very much as cost-plus) but with an understanding of where the competition priced those same products or services. It sometimes listened to an individual clinic’s request or protest for a less-than-typical price increase.

Description of the data set

The transactional database provided own-firm behavioural data at the customer level. This could be rolled up to the clinic level. The transactional data included: products/services purchased, price paid for each, discount applied, total revenue, number of visits, time between visits, ailment/complaint, clinic visited, staffing, etc.

The clinic data included aggregations of the above, as well as trade area, location (rural vs urban), staffing and demographics from the census data mapped to zip code level. Also available was certain market research survey data. This included customer satisfaction/loyalty and defection surveys, employee satisfaction surveys, etc.

Most interesting was the competitive survey data. This survey asked three competitors near each of the firm’s clinics what prices they charged for shopped products. Shopped products are those believed to be more price sensitive and included exams, vaccines, minor surgery, etc. Thus, for each of the firm’s clinics, the data held two kinds of prices. The first was the own prices paid by customers for every product/service (both shopped and other). The second was the three competitors’ prices for selected shopped products/services. The own-price data allowed elasticity modelling to be undertaken, and the cross-price data captured competitive pressure as a possible cause of demand. Sometimes these competitive pressures made a difference to own-price sensitivity and sometimes not. This provided lucrative opportunities for marketing strategy.

First: segmentation

Why segment?

The first step was to do clinic segmentation.

Segmentation: a marketing strategy aimed at dividing the market into sub-markets, wherein each member of a segment is very similar, by some measure, to the other members of that segment and very dissimilar to members of all other segments.

This is because consumers’ behaviour may, in some ways, be caused by a clinic’s performance, staffing, culture, etc. That is, what looks like a consumer’s choice might be caused more by a clinic’s firmographics. The data set contained all revenue and product transactions, which could be rolled up by clinic. This meant that year-over-year growth, discounting changes, customer visits, etc., could be useful metrics. Also important was the location of a clinic (rural, urban, etc.). So there was a lot of knowledge about each clinic and its performance, and it was these things that needed to be controlled for in the elasticity models.

Because latent class analysis (LCA) has become the gold standard over the last ten years, LCA was used as the segmentation technique. It has proven far superior to typical techniques (such as k-means, a segmentation algorithm discussed later), especially in producing maximally differentiated segments. An obvious point: the more differentiated the segments are, the more unique the marketing strategies that can be created for each segment.

Profile output

After running LCA on the clinic data, the profile in Table 3.9 was created. A couple of comments on the segments, particularly those used in the field test. Segment 1 is the largest (in terms of number of clinics included) and has the largest percentage of annual revenue. Segment 1 is most heavily situated in suburban areas, and market research shows it to have the most loyal customers. Segment 2 is the next largest but brings in only about half its fair share of revenue. Segment 4, while small, represents more than 20% of overall revenue and is mostly in urban areas. Market research reveals this segment to be the least satisfied, with the most defectors. These differences help account for customers’ sensitivity to price, as the models later show.

Table 3.9 Elasticity modelling

               Segment 1   Segment 2   Segment 4
% Market       36%         34%         7%
% Revenue      41%         19%         21%
# of clients   5,743       3,671       15,087
Rev/visit      135         120         215
% Suburb       56%         51%         45%
% Rural        13%         20%         3%
% Urban        31%         29%         52%

Then: elasticity modelling

Overview of elasticity modelling

Let’s go back to beginning microeconomics. Price elasticity is the metric that measures the percent change in an output variable (typically units) resulting from a percent change in an input variable, in this case (net) price. If the percent change in units is larger than the percent change in price (an elasticity above 1.00 in absolute value), demand is called elastic. If it is smaller (below 1.00 in absolute value), demand is called inelastic. These are unfortunate terms; the clearer concept is one of sensitivity. That is, how sensitive are the customers who purchase units to a change in price? Suppose there is, say, a 10% change in price. If customers respond by changing their purchases by less than 10%, they are clearly insensitive to price. If they respond by changing their purchases by more than 10%, they are sensitive to price.

But this is not the key point, at least in terms of marketing strategy. The law of demand is that price and units are inversely related (remember the downward-sloping demand curve?). Units will always move in the opposite direction of a price change. But the real issue is what happens to revenue. Since revenue is price × units, revenue follows the direction of price if demand is inelastic and follows the direction of units if demand is elastic. Thus, to increase revenue on an inelastic demand curve, price should increase. To increase revenue on an elastic demand curve, price should decrease.
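The revenue rule can be checked with a small Python sketch. It assumes a local approximation in which units move by elasticity × price change, which is only reasonable for small price changes. The function name and the elasticities of –0.5 and –2.0 are illustrative.

```python
def revenue_change(elasticity, price_change):
    """Approximate % change in revenue for a given % price change.

    Assumes units respond by elasticity * price_change (a local, linear
    approximation that is only reasonable for small price changes).
    """
    unit_change = elasticity * price_change
    return (1 + price_change) * (1 + unit_change) - 1


for label, e in [("inelastic", -0.5), ("elastic", -2.0)]:
    for dp in (0.10, -0.10):
        print(f"{label:9s} e={e:+.1f}, price {dp:+.0%}: revenue {revenue_change(e, dp):+.1%}")
```

On the inelastic curve, a 10% price rise lifts revenue by about 4.5%. On the elastic curve, the same rise cuts revenue by 12%, while a 10% cut raises it by 8%.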

From point elasticity to modelling elasticity

Most of us were taught the simple idea of point elasticity in microeconomics. Point elasticity is the percent difference between (x, y) points: that is, the percent change in units given a percent change in price. Say price goes from 9 to 11, and units go from 1,000 to 850. The point elasticity is calculated as [(850 – 1,000)/1,000]/[(11 – 9)/9], which is –0.68. Note that the change in units is –15%, from a change in price of about 22%. Units change by less (are less sensitive) than price, so this (point) demand is inelastic. That is, demand at this point on the demand curve is insensitive to price. Note that as we move along the demand curve from a high price to a low price, the sensitivity changes. This is the key marketing strategy issue.
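The point-elasticity arithmetic above, written as a small Python function. The function name is illustrative, and percentages are measured from the starting point, as in the text.

```python
def point_elasticity(p0, p1, q0, q1):
    """Percent change in units divided by percent change in price (base = starting point)."""
    pct_units = (q1 - q0) / q0
    pct_price = (p1 - p0) / p0
    return pct_units / pct_price


e = point_elasticity(p0=9, p1=11, q0=1000, q1=850)
print(round(e, 3))  # -0.675: units fall 15% while price rises about 22%
print("elastic" if abs(e) > 1 else "inelastic")  # inelastic
```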

Thus elasticity is a marginal function over an average function. The mathematical concept of ‘marginal’ is the slope of a curve, which is its derivative. So calculating the overall average elasticity requires the derivative of the units-by-price function (ie, the demand curve), evaluated at the means:

$$\text{Elasticity} = \frac{dQ}{dP} \times \frac{\text{average price}}{\text{average units}}$$

So, mathematically, the derivative represents the slope of the demand function. In a statistical model (one that accounts for random error) the same concept applies: a marginal function over an average function. In a statistical (regression) model the beta coefficient is the average slope, thus:

$$\text{Elasticity} = \beta_{\text{Price}} \times \frac{\text{average price}}{\text{average units}}$$
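The same calculation from data can be sketched in a few lines of Python with NumPy. This assumes a simple linear demand model with price as the only driver; the function name and the weekly numbers are hypothetical. The sketch fits the line by OLS, then scales its slope by average price over average units.

```python
import numpy as np


def elasticity_at_means(price, units):
    """Fit units = a + b*price by OLS and return b * mean(price) / mean(units)."""
    price = np.asarray(price, dtype=float)
    units = np.asarray(units, dtype=float)
    slope, intercept = np.polyfit(price, units, deg=1)  # highest power first
    return slope * price.mean() / units.mean()


# Hypothetical weekly observations
prices = [10, 12, 14, 16, 18, 20]
units = [82, 77, 71, 68, 62, 58]
print(round(elasticity_at_means(prices, units), 3))
```

With several drivers in the model (season, advertising, competitor prices), the idea is the same: take the price coefficient from the multiple regression and scale it by the means.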

A quick note on a mathematically correct but practically incorrect concept: modelling elasticity in logs. It’s true that if the natural log is taken of both demand and price, there is no calculation at the means; the beta coefficient is the elasticity. However – and this is important – running a model in natural logs also implies a very wrong assumption: constant elasticity. This means there is the same impact from a small price change as from a large price change, and no marketer believes that. Thus, modelling in natural logs is never recommended.
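Below is a short Python illustration of the contrast, using made-up curves. On a linear demand curve, Q = 100 – 2P, the elasticity changes from point to point. A log-log fit, by contrast, returns a single slope and treats it as the elasticity everywhere.

```python
import numpy as np


def linear_demand_elasticity(p, a=100.0, b=2.0):
    """Point elasticity on the linear demand curve Q = a - b*P."""
    return -b * p / (a - b * p)


for p in (10, 25, 40):
    print(f"P={p}: elasticity {linear_demand_elasticity(p):+.2f}")

# A log-log fit returns one slope, which it treats as the elasticity everywhere
P = np.array([10.0, 20.0, 30.0, 40.0])
Q = 500 * P ** -1.5                      # a constant-elasticity demand curve
slope, _ = np.polyfit(np.log(P), np.log(Q), deg=1)
print(f"log-log elasticity: {slope:+.2f}")
```

The linear curve moves from inelastic (–0.25) through unit elastic (–1.00) to elastic (–4.00) as price rises. The log-log specification cannot show that.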

Own-price vs cross-price and substitutes

Now comes the interesting part of this data set: it has competitor prices! A survey asked the three competitors nearest each clinic what prices they charged for ‘shopped products’. These products are assumed to be generally price sensitive. I took the highest competitor price and the lowest competitor price and used those as cross-price data for every (shopped) product. Thus the demand model (by segment) for each shopped product will be:

$$\text{Units} = f(\text{own price}, \text{high cross-price}, \text{low cross-price}, \ldots)$$
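Below is a minimal sketch of that demand model for one segment and one product, assuming a linear specification fit by OLS with NumPy. Each price’s elasticity is then its coefficient times that price’s mean over mean units. The column names (`own`, `comp_hi`, `comp_lo`) and the simulated data are hypothetical.

```python
import numpy as np


def price_elasticities(units, prices):
    """OLS of units on an intercept plus each price column; elasticities at the means.

    prices: dict mapping a name (eg 'own', 'comp_hi', 'comp_lo') to a 1-D array.
    """
    units = np.asarray(units, dtype=float)
    names = list(prices)
    cols = [np.asarray(prices[name], dtype=float) for name in names]
    X = np.column_stack([np.ones(len(units))] + cols)
    beta, *_ = np.linalg.lstsq(X, units, rcond=None)
    return {name: beta[i + 1] * col.mean() / units.mean()
            for i, (name, col) in enumerate(zip(names, cols))}


rng = np.random.default_rng(1)
n = 200
own = rng.uniform(24, 32, n)
comp_hi = rng.uniform(34, 44, n)
comp_lo = rng.uniform(18, 24, n)
# Hypothetical demand: own price hurts, the high competitor acts as a substitute
units = 120 - 2.0 * own + 0.8 * comp_hi + rng.normal(0, 2, n)

for name, e in price_elasticities(units, {"own": own, "comp_hi": comp_hi, "comp_lo": comp_lo}).items():
    print(f"{name:8s} {e:+.3f}")
```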

Competitive prices are so interesting for two reasons. First, competitive prices are causes of behaviour. Second, if a competitor is a strong substitute, strategic choices reveal themselves.

A competitor is regarded as a substitute if the coefficient on its cross-price is positive. This means its price is positively correlated with the firm’s own demand. Thus, if a competitor is a substitute and chooses to raise its prices, our own demand will increase, because its customers will tend to flow to us (with our now relatively lower prices). If the competitor is a substitute and chooses to lower its prices, our own demand will decrease, because our customers will tend to flow to it (away from our relatively higher prices). Thus, knowing whether a competitor is a substitute gives explanatory power to the model as well as a potential strategic lever.

But the real issue is how strong a substitute a competitor is. This strength is revealed in the cross-price coefficients. Say for a particular demand model the coefficient (elasticity) on own price is –1.50 and the coefficient on high cross-price is +1.10. Own price has the expected negative correlation (own price goes up, own units go down). High cross-price is positive, meaning in this case the high-price competitor is a substitute. If own demand is price sensitive and we lower our prices, the high competitor can lower its prices as well, decreasing our demand. But note that it is not a strong substitute. A strong substitute will not only have a positive coefficient; that coefficient will also be larger than the absolute value of the own-price coefficient. In the above example, if we lower our prices by 10% we expect our demand to increase by 15%. If the competitor matches our price change and lowers its price by 10%, that will decrease our demand by 11%; that is, it was not a strong substitute.

However, if our own-price coefficient was –1.50 and the high-price competitor’s coefficient was instead +3.00, a very different story unfolds. If we lower our prices by 10% our demand will go up by 15%. But the strong substitute can lower its price by just 5% and cut our units by 15% (5% × 3.00 = 15%). Or if it matches our 10% cut, that will cut our units by 30%! Clearly this strong competitor is far more powerful than the one in the first scenario. Note also that none of this ‘game theory’ knowledge is possible without cross-prices.
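The rule of thumb and the two scenarios can be written down in Python. The sketch assumes that the effects of our price change and a competitor’s price change simply add, as in the arithmetic above. The function names are illustrative.

```python
def competitor_role(own_elasticity, cross_elasticity):
    """Classify a competitor using the text's rule of thumb.

    Positive cross-price elasticity -> substitute; 'strong' if it exceeds
    the absolute own-price elasticity.
    """
    if cross_elasticity <= 0:
        return "not a substitute"
    if cross_elasticity > abs(own_elasticity):
        return "strong substitute"
    return "weak substitute"


def demand_change(own_elasticity, cross_elasticity, own_price_change, comp_price_change):
    """Approximate % change in our units from our own and a competitor's price moves."""
    return own_elasticity * own_price_change + cross_elasticity * comp_price_change


# The two scenarios: we cut price 10% and the high-price competitor matches
for cross in (1.10, 3.00):
    print(competitor_role(-1.50, cross),
          f"{demand_change(-1.50, cross, -0.10, -0.10):+.0%}")
```

Against the weak substitute a matched cut still leaves us about 4% ahead. Against the strong one it leaves us about 15% behind.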

Modelling output by segment

The next four tables show elasticity modelling results by segment for four selected shopped products. (In the field test only vaccines (two), minor surgery and exams were used.) Following each table are notes on strategic uses.

Table 3.10 Elasticity modelling

Vaccine x            Seg 1     Seg 2     Seg 4
Vaccine x firm       –0.377    –1.842    –3.702
Vaccine x comp hi    –0.839     0.062     1.326
Vaccine x comp lo    –0.078    –0.167    –0.757

Segment 1: An elasticity below 1.00 in absolute terms (0.377) means demand for this product in this segment is inelastic. This segment is loyal (via market research) and no competitor is a substitute (no positive cross-price elasticity). Therefore increase price.

A few details on the segment 1 vaccine x calculations follow. For own-price elasticity, the firm’s price was 28, the own-price coefficient was –1.2 and the average units were 89. Thus own-price elasticity is –0.377 = –1.2 × 28/89. High-competitor price elasticity is calculated as –0.839 = –1.915 × 39/89, and low-competitor price elasticity is –0.078 = –0.33 × 21/89. All other calculations are similar.
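The same three calculations in Python, using only the figures quoted above. Computed exactly, the own-price figure is –0.3775, which Table 3.10 shows as –0.377. The function name is illustrative.

```python
def elasticity(coefficient, avg_price, avg_units):
    """Elasticity at the means: coefficient * average price / average units."""
    return coefficient * avg_price / avg_units


AVG_UNITS = 89  # segment 1, vaccine x
print(round(elasticity(-1.2, 28, AVG_UNITS), 3))    # own price: -0.378
print(round(elasticity(-1.915, 39, AVG_UNITS), 3))  # high competitor: -0.839
print(round(elasticity(-0.33, 21, AVG_UNITS), 3))   # low competitor: -0.078
```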

Segment 2: An elasticity above 1.00 in absolute terms (1.842) means demand for this product in this segment is elastic. The high competitor is a weak substitute (0.062). Therefore decrease price.

Segment 4: An elasticity above 1.00 in absolute terms (3.702) means demand for this product in this segment is elastic. This segment tends to be dissatisfied, with a high number of defectors (via market research). The high competitor is a weak substitute (1.326, below 3.702). Therefore decrease price.

Table 3.11 Further elasticity modelling

Vaccine y            Seg 1     Seg 2     Seg 4
Vaccine y firm       –0.214    –0.361    –0.406
Vaccine y comp hi     0.275     0.018     0.109
Vaccine y comp lo     0.196     0.123     0.864

Segment 1: An elasticity below 1.00 in absolute terms (0.214) means demand for this product in this segment is inelastic. This segment is loyal (via market research) and the low competitor is a weak substitute. The high competitor is a strong substitute. Note that the positive 0.275 is greater than the absolute 0.214, meaning the high competitor can match or retaliate against the firm with a smaller price decrease. Therefore test increasing price (remember this segment is loyal).

Segment 2: An elasticity below 1.00 in absolute terms (0.361) means demand for this product in this segment is inelastic. While both competitors are substitutes, each is weak. Therefore test increasing price.

Segment 4: An elasticity below 1.00 in absolute terms (0.406) means demand for this product in this segment is (surprisingly) inelastic. This segment tends to be dissatisfied, with a high number of defectors (via market research). While both competitors are substitutes, the low competitor is a strong substitute (0.864). Therefore cautiously test increasing price.


Minor surgery        Seg 1     Seg 2     Seg 4
Min surg firm        –0.57     –0.17     –1.09
Min surg comp hi      0.202     0.475    –0.59
Min surg comp lo     –0.06      0.291     0.215

Segment 1: An elasticity below 1.00 in absolute terms (0.573) means demand for this product in this segment is inelastic. This segment is loyal (via market research) and the high competitor is a weak substitute. Therefore test increasing price.

Segment 2: An elasticity below 1.00 in absolute terms (0.173) means demand for this product in this segment is inelastic. Both competitors are strong substitutes. Therefore (cautiously) test increasing price.

Segment 4: An elasticity above 1.00 in absolute terms (1.090) means demand for this product in this segment is (barely) elastic. This segment tends to be dissatisfied, with a high number of defectors (via market research). The low competitor is a weak substitute. Therefore test decreasing price.


Exams                Seg 1     Seg 2     Seg 4
Exam firm            –0.1      –0.03     –0.1
Exam comp hi          0.008     0.075     0.096
Exam comp lo         –0.02     –0.03      0.023

Segment 1: An elasticity below 1.00 in absolute terms (0.100) means demand for this product in this segment is inelastic. This segment is loyal (via market research) and the high competitor is a weak substitute. Therefore test increasing price.

Segment 2: An elasticity below 1.00 in absolute terms (0.025) means demand for this product in this segment is inelastic. The high competitor is a strong substitute. Therefore test increasing price.

Segment 4: An elasticity below 1.00 in absolute terms (0.095) means demand for this product in this segment is inelastic. This segment tends to be dissatisfied, with a high number of defectors (via market research). Both competitors are substitutes, and the high competitor is a strong substitute. Therefore (cautiously) test increasing price.

The above analysis shows how elasticity can be used as a strategic weapon. Because it involves both own-price (customers’ sensitivity) and cross-price (potential competitor retaliation), the strategic levers it reveals can be lucrative.

Example of elasticity guidance

Now let’s talk about transferring the modelling from the segment level to the clinic level, where pricing guidance needs to be. The basic idea was to take the segment model’s price coefficient and apply it to the elasticity calculation for each clinic. That is, elasticity at the segment level is:

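$$\text{Segment elasticity} = \text{segment price coefficient} \times \frac{\text{segment average price}}{\text{segment average quantity}}$$

Translating elasticity to each clinic:

$$\text{Clinic elasticity} = \text{segment price coefficient} \times \frac{\text{clinic average price}}{\text{clinic average quantity}}$$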
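The translation step can be sketched in Python. The segment’s price coefficient is held fixed while the averages come from each clinic. The coefficient is borrowed from the segment 1 vaccine x example, but the clinic figures are hypothetical. They were chosen only to show how two clinics in one segment can land on different sides of 1.00. The guidance rule here deliberately ignores substitutes, which the real guidance also weighed.

```python
def clinic_elasticity(segment_price_coef, clinic_avg_price, clinic_avg_units):
    """Segment price coefficient rescaled by one clinic's own averages."""
    return segment_price_coef * clinic_avg_price / clinic_avg_units


SEGMENT_COEF = -1.2  # borrowed from the segment 1 vaccine x example

# Hypothetical clinics: (name, average price, average units)
clinics = [("clinic_a", 26.0, 110.0), ("clinic_b", 31.0, 30.0)]
for name, avg_price, avg_units in clinics:
    e = clinic_elasticity(SEGMENT_COEF, avg_price, avg_units)
    guidance = "test a price decrease" if abs(e) > 1 else "test a price increase"
    print(f"{name}: elasticity {e:+.2f} -> {guidance}")
```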

Now let’s look at a particular clinic’s test results. This clinic is in segment 4, a very price-sensitive segment. Guidance for vaccine x (at this clinic) was to decrease price by 6%. This decrease brought the clinic’s price position down from the highest (compared with the surrounding competitors) to a middle-priced option. The high competitor was a weak substitute, so strong retaliation was thought unlikely.

For the vaccine x product, during the 90-day field test, this clinic generated 2,292 in vaccine x revenue and sold 84 units, making average net revenue of 27.28. The matched control cell was 25.86, giving a 5.48% test-over-control result. This comes from two things interacting: first, this segment in general is sensitive to price and, second, this clinic has no (strong) substitutes. Thus guidance was to decrease price with no fear of retaliation from the competitors.

Look at another particular clinic’s test results. This clinic is in segment 1, a price-insensitive segment. Guidance for exams (at this clinic) was to increase price by 2%. This increase brought the clinic’s price position up from the middle (compared with the surrounding competitors) to the highest-priced option. Remember this segment tends to be very loyal. The high competitor was a weak substitute, so strong retaliation was thought unlikely.

For the exam product, during the 90-day field test, this clinic generated 27,882 in exam revenue and sold 499 units, making average net revenue of 55.88. The matched control cell was 47.41, giving a 17.85% test-over-control result. This comes from two things interacting: first, this segment in general is insensitive to price and, second, this segment and this clinic have no (strong) substitutes. Thus guidance was to increase price with no fear of retaliation from either the customers or the competitors.

Last: test vs control

There were nearly 100 clinics that met the criteria to be part of the field test: about 25 test clinics and 75 control clinics. The test clinics would get the elasticity guidance and the control clinics would continue business as usual.

Matched cells by region and by segment were designed. The test metric was average net revenue (by region, by segment, by product, etc.). The overall result was that the test clinics outperformed the control clinics in average net revenue by more than 10% in 90 days. Of course, results varied across regions, segments and products. One region was extremely positive and another slightly negative. One segment (segment 1, the loyal segment) was very positive, while segment 4 (the dissatisfied segment) was less so. Such a strong overall result indicates that elasticity analysis can help guide optimal pricing.

Discussion

Is there game theory in the medical services world? Most practitioners would probably say not really; their job is more about patient care than competition. However, one interesting example from this study might contradict that common wisdom.

There happened to be two clinics, call them X and Y, from the same region and the same segment (segment 4). One had a strong-substitute (low) competitor and the other did not. For exams, both clinics were given a price decrease of 4%. The clinic that faced the strong competitor (clinic X) had only half the average net revenue gain vs control of clinic Y. This might indicate the low competitor around clinic X also lowered its exam prices (the next survey will verify). Because it was a strong substitute, it only needed to lower its price by 1% to undercut the effect of the firm’s 4% price decrease.

It seems that, at least for shopped products, demand in the medical services area is NOT so insensitive to price. It also seems that some kind of ‘game theory’ might go on, especially in close locales, with competitors responding to and retaliating against price changes. That was probably why the competitive survey was done in the first place.

Conclusion

Why is elasticity modelling so rarely done? In my nearly 30 years of marketing analysis across a wide variety of firms in many different industries, elasticity modelling (as discussed here) is virtually never done. Often there are surveys on prices and purchasing, etc. But these are self-reported and probably self-serving (‘Yes, your prices are too high!’). Another common and slightly better marketing research technique is conjoint analysis. It is somewhat artificial and still self-reported, but it analytically controls for such things.

My point is this. The transactional database holds real behaviour: real purchasing responses to real price changes. Why would THOSE data elements not be the best way to measure price sensitivity? The answer seems to be that translating what was learned in microeconomics into statistical analysis is a wide step, and one not usually taught. That is, the knowledge needed to go from point elasticity to statistically modelling elasticity is not easily gained. Note, however, that the steps are quite straightforward and the modelling is not difficult. Perhaps this chapter is one way to get elasticity modelling used more in practice, especially given the potential benefits.

Checklist


Remember there are two types of statistical analysis: dependent variable types and inter-relationship types.

Recall that there are two types of equations: deterministic and probabilistic.

Observe that regression (ordinary least squares, OLS) is a dependent variable type of analysis, using independent variables to explain the movement in a dependent variable.

Point out that R² is a measure of goodness of fit; it shows both explanatory power and shared variance between the actual dependent variable and the predicted dependent variable.

Remember that the t-ratio is an indication of statistical significance.

Always avoid the ‘dummy trap’: use one fewer binary variable than there are categories (eg, in a quarterly model use only three quarters, not four).

Think in terms of the two kinds of elasticity: inelastic and elastic demand curves.

Focus on the real issue of elasticity: what impact it has on total revenue (not units).

Remember price and units are negatively correlated. On an inelastic demand curve total revenue follows price; on an elastic demand curve total revenue follows units. To increase total revenue on an inelastic demand curve, price should increase; to increase total revenue on an elastic demand curve, price should decrease.

Remember that regression comes with assumptions.

04

Who is most likely to buy and how do I target?

Introduction

Conceptual notes

Business case

Lift charts

Variable diagnostics

Highlight: Using logistic regression for market basket analysis


Introduction

The next marketing question is around targeting, particularly who is likely to buy. Note that this question is statistically the same as ‘Who is likely to respond (to a message, an offer, etc.)?’ This probability question is at the centre of marketing science in that it involves understanding choice behaviour. The typical technique involved (especially for database/direct marketing) is logistic regression.

Conceptual notes

Logistic regression has a lot of similarities to ordinary regression. They both have a dependent variable, they both have independent variables, they are both single equations, and they both have diagnostics around the impact of independent variables on the dependent variable as well as ‘fit’ diagnostics.

But their differences are also many. Logistic regression has a dependent variable that takes on only two values (as opposed to continuous values): 0 or 1; that is, it’s binary. To calculate the coefficients, logistic regression does not use the criterion of ‘minimizing the sum of the squared errors’ (which is ordinary least squares, or OLS). It uses maximum likelihood instead, found by an iterative numerical search (such as Newton-Raphson or Fisher scoring). The interpretation of the coefficients is different: odds ratios (e^β) are typically used. And fit is not about a predicted vs an actual dependent variable.

Maximum likelihood: an estimation technique (as opposed to ordinary least squares) that finds the estimates that maximize the likelihood of observing the given sample.

As a slight detail, another important difference between logistic regression and ordinary regression is that logistic regression actually models the ‘logit’ rather than the dependent variable. A logit is the log of p/(1 – p), where p is the probability of the event; that is, it is the log of the odds of the event occurring. Recall that ordinary regression models the dependent variable itself.
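The logit and its inverse are one-liners. This small Python sketch (function names illustrative) shows the round trip between a probability and its log-odds.

```python
import math


def logit(p):
    """Log of the odds: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


print(logit(0.5))                  # 0.0: even odds
print(round(logit(0.8), 3))        # odds of 4 to 1
print(inverse_logit(logit(0.8)))   # back to (approximately) 0.8
```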

(By the way, yes, there is a technique that can model more than two values, though not continuous ones. That is, the dependent variable might have 3 or 4 or 5, etc., values. This technique is called multinomial logit (discriminant analysis will do this as well), but we will not cover it except to say it’s the same as logistic regression, except that the dependent variable has codes for multiple different values rather than only 0 or 1.)

All of the above means that the output of logistic regression is a probability between 0% and 100%, whereas the output of ordinary regression is an estimated (predicted) value to fit the actual dependent variable. Figure 4.1 shows a plot of actual events (the 0s and the 1s) as well as the logistic s-curve.

Figure 4.1 Actual events and logistics

Now let’s look at some data and run a model, because that’s where all the fun is.

BUSINESS CASE

Now Scott’s boss, very impressed with what he did on demand modelling, calls Scott into his office.

‘Scott, we need to better target those likely to buy our products. We send out millions of catalogues, based on magazine subscriber lists, but the response rate is too small. What can we do to make our mailing ROI better?’

Scott thinks for a minute. The response rate was too small? Response rate is the number of those who responded (purchased) divided by the total number who got the communication. It’s an overall metric of success.

‘We want to target those likely to purchase based on a collection of characteristics. We have both customers and non-customers in our database – from the subscriber lists we’ve been mailing – so we could model the probability to respond based on clone or lookalike modelling.’

‘What does that mean?’ the boss asks.

‘I’ll have to dig into it a bit more, but I know we can develop a regression-type model that scores the database with a different probability to purchase for each name. We can sort the database by probability to purchase and only mail as deep as ROI limits.’

‘Sounds good. Get to work on that and get back to me when you have something.’ With that the boss swivels in his chair, so Scott knows the conversation is over.
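Scott’s plan to ‘sort the database by probability to purchase and only mail as deep as ROI limits’ can be sketched as a simple cutoff rule. The rule mails in descending score order while the expected margin from a name exceeds the cost of mailing it. The scores, margin and cost below are hypothetical, and a real ROI rule might also account for fixed campaign costs.

```python
def mailing_depth(probabilities, margin_per_order, cost_per_piece):
    """Mail names in descending probability while expected profit stays positive.

    Returns the number of names to mail and the total expected profit.
    """
    ranked = sorted(probabilities, reverse=True)
    depth, profit = 0, 0.0
    for p in ranked:
        expected = p * margin_per_order - cost_per_piece
        if expected <= 0:
            break  # every remaining name has a lower probability, so stop here
        depth += 1
        profit += expected
    return depth, profit


scores = [0.02, 0.15, 0.40, 0.07, 0.01, 0.22, 0.05]  # hypothetical model scores
print(mailing_depth(scores, margin_per_order=50.0, cost_per_piece=3.0))
```

With a 50 margin and a 3 cost per piece, the break-even probability is 6%, so only the four names scored above it are mailed.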


Table 4.1 shows the simplified data set. This is a list of customers who purchased and those who did not. Scott has data on which campaigns each of them received, as well as some demographics. The objective is to figure out which of the non-purchasers ‘look like’ those who did purchase and re-mail them. This could be with the same campaign (if we find one that was effective) or with a newly designed campaign.

Table 4.1 Simplified data set

Id     Revenue  Purchase  Campaign a  Campaign b  Campaign c  Income   Size hh  Educ
999    1500     1         1           0           1           150000   1        19
1001   1400     1         1           0           1           137500   1        19
1003   1250     1         1           0           0           125000   2        15
1005   1100     1         1           0           0           112500   2        13
1007   2100     1         0           1           0           145000   3        16
1009   849      1         0           0           0           132500   3        17
1010   750      1         0           0           0           165000   3        16
1011   700      1         0           0           0           152500   3        9
1013   550      1         1           0           1           140000   4        15
1015   850      1         1           0           1           127500   4        18
1017   450      1         1           0           1           115000   4        17
1019   0        0         0           0           1           102500   5        16
1021   0        0         0           0           1           99000    6        15
1023   0        0         0           1           1           86500    7        16
1025   0        0         0           1           1           74000    6        15
1027   0        0         0           1           1           61500    5        14
1029   0        0         0           1           1           49000    4        13
1033   0        0         1           0           1           111000   4        12
1034   0        0         0           0           1           98500    3        11
1035   0        0         0           0           1           86000    3        10

The end result will be to score the database with ‘probability to purchase’ in order to understand what (statistically) works and strategize what to do next time. This is the cornerstone of direct (database) marketing.

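Using the (contrived) data set, you can run PROC LOGISTIC with the DESCENDING option in SAS, so that the model predicts the probability that Purchase = 1. See Table 4.2 for the coefficient output. These coefficients are not interpreted the same way as in ordinary regression.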
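PROC LOGISTIC is the tool used here. As a rough, non-SAS illustration of what maximum likelihood estimation does for a logistic model, below is a Newton-Raphson fit in Python with NumPy on simulated data. It is a sketch, not a replacement for a tested package: it reports no standard errors and has no safeguards. In particular, on perfectly separated data the likelihood has no finite maximum and the coefficients grow without bound. (In Table 4.1, for instance, every purchaser has a higher income than every non-purchaser.) The variable names and true coefficients are assumptions made for the simulation.

```python
import numpy as np


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must include a column of ones for the intercept; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))                 # predicted probabilities
        gradient = X.T @ (y - p)                         # score of the log-likelihood
        information = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(42)
n = 5000
educ = rng.integers(9, 20, n).astype(float)        # hypothetical years of education
size_hh = rng.integers(1, 8, n).astype(float)      # hypothetical household size
X = np.column_stack([np.ones(n), educ, size_hh])
true_beta = np.array([-3.0, 0.2, -0.3])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

beta = fit_logistic(X, y)
print("coefficients:", beta.round(3))
print("odds ratios: ", np.exp(beta[1:]).round(3))
```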

Table 4.2 Coefficient output

Intercept     –57.9
Campaign a    –8.48
Campaign b    16.52
Campaign c    –9.96
Income        0.001
Size hh       –3.41
Education     0.2

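Because logistic regression is curvilinear and bounded by 0 and 1, the independent variables affect the dependent variable differently than in ordinary regression. The coefficients are on the log-odds scale, so the actual impact of a variable is e^coefficient. This is the factor by which the odds of the event are multiplied for a one-unit increase in that variable.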

Forexample,education’scoefficientis0.200.Theimpactwouldbe:

e.200=1.225,thatis(2.71828^.200).

Thismeansthatforeveryyearofaddededucation,theincreaseinprobabilityis22.5%.Thismetriciscalledtheoddsratio.Thisobviouslyhastargetingimplications:aimourproductatthehighesteducatedfamiliesaspossible.Notethattwoofthethreecampaignsarenegative(whichtendtodecreaseprobabilitytopurchase)sothisalsoaddscredencetoneedingbettertargeting.
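To make the exponentiation concrete, here is a minimal Python sketch that turns the Table 4.2 coefficients into odds ratios and percentage changes in the odds. The dictionary keys are my own labels for the table rows. The helper `probability_after_change` and its 50% starting probability are assumptions added for illustration: they show that a 22% change in the odds is not a 22% change in the probability.

```python
import math

# Coefficients from Table 4.2 (intercept omitted: it has no odds-ratio reading).
COEFFICIENTS = {
    "Campaign_a": -8.48,
    "Campaign_b": 16.52,
    "Campaign_c": -9.96,
    "Income": 0.001,
    "Size_hh": -3.41,
    "Education": 0.2,
}


def odds_ratio(beta):
    """Multiplicative change in the odds of the event per one-unit increase."""
    return math.exp(beta)


def pct_change_in_odds(beta):
    """Percentage change in the odds: (e^beta - 1) * 100."""
    return (math.exp(beta) - 1.0) * 100.0


def probability_after_change(baseline_prob, beta):
    """Probability after a one-unit increase, starting from baseline_prob (hypothetical helper)."""
    odds = baseline_prob / (1.0 - baseline_prob) * math.exp(beta)
    return odds / (1.0 + odds)


for name, beta in COEFFICIENTS.items():
    print(f"{name:12s} OR={odds_ratio(beta):.4g}  change in odds={pct_change_in_odds(beta):+.1f}%")

# One extra year of education, starting from an assumed 50% probability:
print(round(probability_after_change(0.5, COEFFICIENTS["Education"]), 4))
```

Starting from 50%, one extra year of education moves the probability to roughly 55%, which is why the odds ratio is best read as a change in odds.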

For logistic regression, there is not really a goodness-of-fit measure like R² in OLS. Logit outputs a probability (between 0 and 1) that the dependent variable equals 1. Often the ‘confusion matrix’ is used instead, and predictive accuracy is a sign of a good model. (The confusion matrix from SAS uses ‘ctable’ as an option.) For illustration, say there are 10,000 observations; Table 4.3 shows the confusion matrix of the above model.

Table 4.3 Confusion matrix

                       Actual non-events   Actual events
Predicted non-events   1,000               1,750
Predicted events       500                 6,750

The total number of events (purchases) is 6,750 + 1,750, or 8,500. The model predicted only 6,750 + 500, or 7,250. The total accuracy of the model is the actual events predicted correctly plus the actual non-events predicted correctly, meaning 6,750 + 1,000, or 7,750/10,000 = 77.5%. The number of false positives is 500 (the model predicted that 500 people would have the event who did not). This is an important measure in direct marketing, in terms of the cost of a wrong mailing.
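The arithmetic above can be checked with a small Python sketch. `summarize` works from the four cells of a confusion matrix. `confusion_counts` shows how those cells would be tallied from scored data; its 0.5 cutoff is an assumption for illustration, not a value from the text.

```python
def confusion_counts(actual, predicted_prob, cutoff=0.5):
    """Tally (tn, fn, fp, tp) from 0/1 outcomes and predicted probabilities.

    A case is predicted as an event when its probability is >= cutoff.
    """
    tn = fn = fp = tp = 0
    for y, p in zip(actual, predicted_prob):
        predicted_event = p >= cutoff
        if predicted_event and y == 1:
            tp += 1
        elif predicted_event:
            fp += 1  # predicted a purchase that did not happen
        elif y == 1:
            fn += 1  # missed a real purchaser
        else:
            tn += 1
    return tn, fn, fp, tp


def summarize(tn, fn, fp, tp):
    """The quantities discussed in the text, from the four cells of the matrix."""
    total = tn + fn + fp + tp
    return {
        "actual_events": tp + fn,
        "predicted_events": tp + fp,
        "accuracy": (tp + tn) / total,
        "false_positives": fp,
    }


# Cells of Table 4.3 (before) and Table 4.5 (after adding the outlier flag).
before = summarize(tn=1000, fn=1750, fp=500, tp=6750)
after = summarize(tn=1250, fn=1000, fp=250, tp=7500)
print(before)
print(after)
```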

As an analytic ‘trick’, it often helps to determine if the dependent variable (sales, in this case) has any abnormal observations. Remember the z-score? This is a fast and simple way to check if an observation is ‘out of bounds’. The z-score is calculated as (observation – mean)/standard deviation.

Let’s say the mean of revenue is 358.45 and the standard deviation of revenue is 569.72. If you run this calculation for all the observations on revenue in Table 4.1, you will see that id #1007 has a z-score of (2,100 – 358.45)/569.72 ≈ 3.057. This means that observation is more than 3 standard deviations from the mean, a very non-normal observation. It is common to add a new variable, call it ‘positive outlier’, which takes the value 0 as long as the z-score on sales is < 3.00 and the value 1 if the z-score is > 3. Use this new variable as another independent variable to help account for outliers. Some of the coefficients should change and the fit usually improves. This new variable can be seen as a flag for an influential observation.
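Here is a minimal Python sketch of the z-score flag applied to the revenue column of Table 4.1. It uses the mean and standard deviation the text asks us to ‘say’ (358.45 and 569.72). These are not computed from the 20 rows shown, which average about 575. The 3.0 threshold is the text’s rule of thumb.

```python
# Revenue column of Table 4.1, keyed by customer id.
REVENUE = {
    999: 1500, 1001: 1400, 1003: 1250, 1005: 1100, 1007: 2100,
    1009: 849, 1010: 750, 1011: 700, 1013: 550, 1015: 850,
    1017: 450, 1019: 0, 1021: 0, 1023: 0, 1025: 0,
    1027: 0, 1029: 0, 1033: 0, 1034: 0, 1035: 0,
}

MEAN, SD = 358.45, 569.72  # the "let's say" values from the text


def z_score(x, mean, sd):
    """(observation - mean) / standard deviation."""
    return (x - mean) / sd


def positive_outlier_flag(x, mean, sd, threshold=3.0):
    """1 if the observation is more than `threshold` SDs above the mean, else 0."""
    return 1 if z_score(x, mean, sd) > threshold else 0


flags = {cid: positive_outlier_flag(rev, MEAN, SD) for cid, rev in REVENUE.items()}
print({cid: f for cid, f in flags.items() if f})  # only id 1007 is flagged
print(round(z_score(2100, MEAN, SD), 3))
```

The resulting 0/1 column is what would be added to the model as the extra independent variable.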

Table 4.4 New variables

Intercept    –51.9
Influence    15.54
Campaign_a   –6.06
Campaign_b   16.6
Campaign_c   –9.07
Income       0.002
Size_hh      –1.65
Education    0.211

Note that most of the coefficients change only slightly. Even so, predictive accuracy ought to increase. Note the updated confusion matrix below.

Table 4.5 Updated confusion matrix

                       Actual non-events   Actual events
Predicted non-events   1,250               1,000
Predicted events       250                 7,500

The total number of events (purchases) is still 8,500, but note the shift in accuracy. The model now predicts 7,500 + 250 = 7,750 events. The total accuracy of the model is the actual events predicted correctly plus the actual non-events predicted correctly, meaning 7,500 + 1,250, or 8,750/10,000 = 87.5%. The number of false positives is now 250 (the model predicted that 250 people would have the event who did not), so half as many mailings are wasted as before. The model improved because it accounts for influential observations.

Lift charts

A common and important tool, especially in direct/database marketing, is the lift (or gain) chart.

Lift/gains chart: a visual device to aid in interpreting how a model performs. It compares, by decile, the model’s predictive power to random selection.

This is a simple analytic device to ascertain general fit, as well as a targeting aid in terms of how deep to mail.

The general procedure is to run the model and output the probability to respond. Sort the database by probability to respond and divide it into 10 equal ‘buckets’ (deciles). Then count the number of actual responders in each decile. If the model is a good one, there will be many more responders in the upper deciles and many fewer responders in the lower deciles.

As an example, say the average response rate is 5%. We have 10,000 total observations (customers). Each decile has 1,000 customers in it; some of them have responded and some have not. Overall there are 500 responders (500/10,000 = 5%). So, at random, we would expect on average 50 in each decile. Instead, because the model works, say there are 250 in decile 1, and the number decreases until the bottom decile has only one responder in it. The ‘lift’ is defined as the number of responders in each decile divided by the average (expected) number of responders. In decile 1 this means 250/50 = 500%. This shows us that the first decile has a lift of 5X: there are five times more responders there than average. It also says that those in the top decile who did not respond are very good targets since, again, they all ‘look alike’. This is an indication that the model can discriminate the responders from the non-responders.

Figure 4.2 Lift chart

Note that each decile has 1,000 customers, and 250 have already responded in decile 1. All of the customers in decile 1 are in the top 10% of predicted probability to respond, so there are 750 more potential targets in decile 1 that have NOT responded. This is the place to focus targeting, and this is why it’s called ‘clone modelling’.
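The decile procedure can be sketched in a few lines of Python. The text fixes only three numbers: 500 responders in total, 250 in decile 1 and 1 in decile 10. The responder counts for deciles 2 to 9 in `RESPONDERS` are therefore hypothetical, and so are the scores built by `build_example`. They only need to be ordered so that each block of 1,000 customers lands in its own decile.

```python
def decile_lift(scores, responded, n_buckets=10):
    """Sort by score (high to low), cut into equal buckets and compute lift.

    Returns (bucket, responders, lift) tuples, where
    lift = responders / expected and expected = overall rate * bucket size.
    Assumes at least one responder overall.
    """
    rows = sorted(zip(scores, responded), key=lambda r: r[0], reverse=True)
    n = len(rows)
    total_resp = sum(r for _, r in rows)
    out = []
    for b in range(n_buckets):
        bucket = rows[b * n // n_buckets:(b + 1) * n // n_buckets]
        resp = sum(r for _, r in bucket)
        expected = total_resp * len(bucket) / n
        out.append((b + 1, resp, resp / expected))
    return out


# Hypothetical responders per decile (sum = 500, as in the text).
RESPONDERS = [250, 100, 60, 40, 20, 12, 8, 5, 4, 1]


def build_example():
    """10,000 made-up customers: 1,000 per decile, one shared score per decile."""
    scores, responded = [], []
    for d, k in enumerate(RESPONDERS):
        for i in range(1000):
            scores.append(1.0 - d / 10)
            responded.append(1 if i < k else 0)
    return scores, responded


lifts = decile_lift(*build_example())
for decile, resp, lift in lifts:
    print(decile, resp, f"{lift:.0%}")
```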

To briefly address the database marketing question ‘How deep do I mail?’, let’s look at the lift chart above. This is an accumulation of actual responders compared to expected responders. Depending on budget, etc., this lift chart helps to target. Most database marketers will mail as far as any decile out-responds the average, that is, until the lift is < 100%. Another way of saying this is to mail until the maximum distance between the curves is achieved. However, as a practical matter, most direct marketers (especially cataloguers) have a set budget and can only AFFORD to mail so deep, regardless of the statistical performance of the model. Thus, most of the attention is on the first one or two deciles.
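The ‘how deep do I mail?’ rules can be expressed directly in code. This Python sketch reuses the same hypothetical per-decile responder counts (only the 500 total, the 250 in decile 1 and the 1 in decile 10 come from the text). It computes the cumulative gains curve, the depth at which decile lift first drops below 100%, and the decile where the gains curve sits furthest above the random diagonal. The budget cap in `affordable_depth` uses an invented budget and cost per piece, purely to show how the practical limit overrides the statistical one.

```python
RESPONDERS = [250, 100, 60, 40, 20, 12, 8, 5, 4, 1]  # hypothetical, sum = 500


def cumulative_gains(responders_per_decile):
    """Share of all responders captured after mailing the top 1, 2, ... deciles."""
    total = sum(responders_per_decile)
    out, running = [], 0
    for r in responders_per_decile:
        running += r
        out.append(running / total)
    return out


def mail_depth(responders_per_decile):
    """Number of top deciles to mail under the 'mail while lift >= 100%' rule."""
    expected = sum(responders_per_decile) / len(responders_per_decile)
    depth = 0
    for r in responders_per_decile:
        if r / expected < 1.0:
            break
        depth += 1
    return depth


def max_gap_decile(responders_per_decile):
    """Decile where the gains curve is furthest above random (the diagonal)."""
    gains = cumulative_gains(responders_per_decile)
    n = len(gains)
    gaps = [g - (i + 1) / n for i, g in enumerate(gains)]
    return gaps.index(max(gaps)) + 1


def affordable_depth(budget, cost_per_piece, decile_size, n_deciles=10):
    """How many whole deciles the budget can pay for (hypothetical inputs)."""
    return min(n_deciles, int(budget // (cost_per_piece * decile_size)))


gains = cumulative_gains(RESPONDERS)
print([round(g, 2) for g in gains])
print(mail_depth(RESPONDERS), max_gap_decile(RESPONDERS))
print(min(mail_depth(RESPONDERS), affordable_depth(1500, 0.75, 1000)))
```

With these numbers both statistical rules stop at decile 3, while the invented budget only stretches to 2 deciles.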


Another very common issue that must be dealt with in (especially) regression modelling is collinearity.

Collinearity: a measure of how variables are correlated with each other.

Collinearity is defined as one or more independent variables that are more correlated with each other than either of them is with the dependent variable. That is, if there are, say, two independent variables in the model, damaging collinearity exists if X1 and X2 are more correlated than X1 and Y and/or X2 and Y. Mathematically:

$$\rho(X_1, X_2) > \rho(Y, X_1) \ \text{or} \ \rho(Y, X_2), \quad \text{where } \rho = \text{correlation}$$

The consequence of collinearity is that, while the parameter estimates of each independent variable remain unbiased, their standard errors are too wide. This means that when significance testing is calculated (parameter estimate/standard error of the estimate) for a t-ratio (or a Wald ratio), these variables tend to show less significance than they really have, because the standard error is too large. Collinearity can also switch the signs of coefficients, which returns nonsensical results. Thus, collinearity must be tested for and dealt with.

A quick note follows on overly simplistic ‘diagnostics’ I’ve seen in practice. It’s possible to run a correlation matrix on the variables and obtain the (simple Pearson) correlation coefficient for each pair. This does NOT check for damaging collinearity; it is a check for simple (linear) correlation. I’ve seen analysts just run the matrix and drop (yes, drop!) an independent variable just because the correlation between it and another independent variable is, say, greater than 80%. (Where did they get 80%? This is arbitrary and beneath anyone calling themselves analytic.) OK, off the soapbox.

The above ‘testing’ is irksome because real testing (with SAS and SPSS) is relatively easy. VIF is the most common. Run proc reg and include ‘/VIF’ as an option on the model statement. VIF is the variance inflation factor. Basically it regresses each independent variable on all the other independent variables and displays a metric, 1/(1 – R²). If this metric is > 10.0 (indicating an R² of > 90%), then, as a rule of thumb, some variable is too collinear to ignore. That is, if there are three independent variables in the model, x1, x2 and x3, VIF will regress x1 = f(x2, x3) and use its R², then x2 = f(x1, x3) and use its R², and last x3 = f(x1, x2) and use its R².
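Here is a minimal NumPy sketch of that recipe: regress each column on the others (with an intercept), take R² and report 1/(1 – R²). It is not SAS’s implementation, just the same calculation. The synthetic data are an assumption: x3 is built to be nearly x1 + x2, and x4 is unrelated noise.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n rows x k predictors).

    Each column is regressed (with an intercept) on all the other columns;
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centred = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic data (assumption): x3 is almost x1 + x2, x4 is independent.
rng = np.random.default_rng(0)
n = 500
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])
print(np.round(vif(X), 1))  # x1..x3 far above 10, x4 close to 1
```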

Note that we are not really testing for collinearity (because there will nearly ALWAYS be some collinearity). We are testing for collinearity bad enough to cause a problem (called ill conditioning).

The recommended approach is to include variables that make theoretical sense. If VIF indicates a variable is causing a problem but there is a strong reason for that variable to be included, one of the other variables should be examined instead. (It is important to note that dropping a variable is NOT the first course of action. Simply dropping a variable is arbitrary and very simplistic analytics.) That is, a stronger, more defensible model results from a strategic understanding of the data-generating process, not from statistical diagnostics. The science of modelling would emphasize diagnostics; the art of modelling would emphasize balance and business impact. Did I mention that sometimes, in a practical business environment, ‘bad statistics’ are allowed when balanced against running a business? Gasp!

Depending on the issues, data, etc., other possible solutions exist. Putting all the independent variables into a factor matrix would keep the variables’ variance intact, but the factors are, by definition, orthogonal (uncorrelated).
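The text doesn’t say which factoring method it has in mind. This NumPy sketch uses principal components, one common way to build orthogonal factors, so treat PCA as my stand-in. It checks both claims: the component scores are uncorrelated, and their variances add up to the total variance of the standardized variables. The collinear x1/x2 data are synthetic.

```python
import numpy as np


def principal_component_scores(X):
    """Principal-component scores of the standardized columns of X.

    The components are orthogonal by construction, and their variances add
    up to the number of (standardized) variables.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T


# Synthetic data (assumption): x2 is strongly collinear with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])
scores = principal_component_scores(X)
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # identity matrix
```

The score columns would then replace the original variables as regressors, at the cost of coefficients that are harder to read as business levers.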

Another (correcting) technique is called ridge regression (typically using Stein estimates). It requires special software (in SAS, ‘proc reg data=x.x outvif outest=xx ridge=0 to 1 by 0.01; model y=x1 x2;’ etc.) and expertise to use. In general, it trades collinearity for bias in the parameter estimates. Again, the balance is in knowing that the coefficients are now biased but that a drastic reduction in collinearity results. Is it worth it? Sorry, but the answer is, it depends.
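To see the bias-for-stability trade, here is a NumPy sketch of a ridge trace. It solves (R + kI)β = r for k from 0 to 1 in steps of 0.01, mirroring the ridge=0 to 1 by 0.01 range in the SAS example. R is the predictor correlation matrix and r holds the scaled cross-products with y. It is not guaranteed to match SAS’s output or scaling exactly. The near-duplicate predictors are synthetic.

```python
import numpy as np


def ridge_trace(X, y, ks):
    """Ridge coefficients on the correlation scale for each ridge constant k.

    Predictors are standardized and y is centred, so the intercept is not
    penalized; beta(k) solves (R + k I) beta = r.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    R = Z.T @ Z / n  # correlation matrix of the predictors
    r = Z.T @ yc / n
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])


# Synthetic data (assumption): x2 is nearly a copy of x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 2 * x1 + 1 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])
ks = np.arange(0, 1.01, 0.01)
trace = ridge_trace(X, y, ks)
print(np.round(trace[[0, 10, 100]], 3))  # k = 0 (OLS), 0.1 and 1.0
```

At k = 0 the result is ordinary least squares. As k grows, the coefficients shrink and pull towards each other, which is the bias the text warns about.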

While VIF is helpful, the condition index has become (since Belsley, Kuh and Welsch’s 1980 book Regression Diagnostics) the state of the art in collinearity diagnostics. The maths behind it is fascinating, but many textbooks will illuminate that. We will focus on an example. The approach, without getting TOO mathematical, is to calculate the condition index of each variable. The condition index is the square root of the largest eigenvalue (called the characteristic root) divided by each variable’s eigenvalue, that is, $\sqrt{\lambda_{\max}/\lambda_j}$. (An eigenvalue is the variance of each principal component when used in the correlation matrix.) The eigenvalues add up to the number of variables (including the intercept): see Table 4.6 below. This is a powerful diagnostic because a set of eigenvalues of relatively equal magnitude indicates that there is little collinearity. A small number of large eigenvalues indicates that a small number of component variables describe most of the variability of the variables. A zero eigenvalue implies perfect collinearity and, this is important, very small eigenvalues mean there is severe collinearity. Again, an eigenvalue near 0.00 indicates collinearity. As a rule of thumb, a condition index > 30 indicates severe collinearity.
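Here is a NumPy sketch of the calculation in the Belsley–Kuh–Welsch style: add an intercept column, scale every column to unit length (not centred), take the eigenvalues of X′X, and form √(λmax/λj). With this scaling the eigenvalues sum to the number of columns including the intercept, as the text says. The synthetic data are an assumption: x3 is almost exactly x1 + x2.

```python
import numpy as np


def condition_indices(X):
    """Eigenvalues (largest first) and condition indices sqrt(max / each).

    An intercept column is added and every column, intercept included, is
    scaled to unit length without centring before forming X'X.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    A = A / np.linalg.norm(A, axis=0)
    eig = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigvalsh is ascending; reverse it
    return eig, np.sqrt(eig[0] / eig)


# Synthetic data (assumption): a near-exact linear dependency.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
eig, ci = condition_indices(np.column_stack([x1, x2, x3]))
print(np.round(eig, 4))
print(np.round(ci, 1))  # the last index is far above 30
```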

Table 4.6 Variance

Ind var  Eigenvalue  Cond index   Prop inter  Prop X1  Prop X2  Prop X3  Prop X4  Prop X5  Prop X6
Inter    6.861       1.000        0.000       0.000    0.000    0.000    0.000    0.000    0.000
X1       0.082       9.142        0.000       0.000    0.000    0.091    0.014    0.000    0.000
X2       0.046       12.256       0.000       0.000    0.000    0.064    0.001    0.000    0.000
X3       0.011       25.337       0.000       0.000    0.000    0.427    0.065    0.001    0.000
X4       0.000       230.420      0.000       0.000    0.000    0.115    0.006    0.016    0.456
X5       0.000       1048.100     0.000       0.000    0.831    0.000    0.225    0.328    0.504
X6       0.000       432750.000   0.999       1.000    0.160    0.320    0.689    0.655    0.038

Common outputs along with the VIF and condition index are the proportions of variance (see Table 4.6). The proportion of variance shows the percentage of the variance of each coefficient associated with each eigenvalue. A high proportion of variance reveals a strong association with that eigenvalue.

Let’s talk about Table 4.6. First look at the condition index. The eigenvalue on the intercept is 6.86, and the first condition index is the square root of 6.86/6.86 = 1.00. The second condition index is the square root of 6.86/0.082, about 9.14. The diagnostics indicate that there are as many collinearity problems as there are condition indexes > 30, so in this case there may be three problems (230.42, 1048.1 and 432,750). Look to the proportion of variance table. Any proportion > 0.50 is a red flag. Look at the last row, X6. Variable X6 is related to the intercept, X1, X4 and X5. X5 is related to X2 (0.831) and X6 (0.504). This indicates X6 is the most problematic variable. Something ought to be done about that.
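The proportions in Table 4.6 come from the same decomposition. Here is a NumPy sketch of the Belsley-style variance-decomposition proportions. Rows are dimensions, with the largest singular value first, and columns are coefficients, each column summing to 1. It also includes a small helper that applies the text’s red-flag rules (condition index > 30 and proportion > 0.50). The data are synthetic, with x3 ≈ x1 + x2, so the flags point at columns 1 to 3 (x1, x2, x3) on the last dimension.

```python
import numpy as np


def variance_decomposition(X):
    """Condition indices and variance-decomposition proportions.

    Returns (ci, props): props[k, j] is the share of var(b_j) tied to
    dimension k; column 0 is the intercept, then X's columns.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    A = A / np.linalg.norm(A, axis=0)  # unit-length columns
    _, d, Vt = np.linalg.svd(A, full_matrices=False)
    phi = (Vt.T ** 2) / d ** 2  # phi[j, k] = v_jk^2 / d_k^2
    props = phi / phi.sum(axis=1, keepdims=True)
    return d[0] / d, props.T


def red_flags(ci, props, ci_cut=30.0, prop_cut=0.5):
    """For each dimension with condition index > ci_cut, list columns with proportion > prop_cut."""
    return {
        k: np.flatnonzero(props[k] > prop_cut).tolist()
        for k in range(len(ci))
        if ci[k] > ci_cut
    }


# Synthetic data (assumption): x3 is almost exactly x1 + x2.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
ci, props = variance_decomposition(np.column_stack([x1, x2, x3]))
print(np.round(ci, 1))
print(np.round(props, 3))
print(red_flags(ci, props))
```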

Possible solutions might mean combining X5 and X6 into a factor and using the resulting factor as a variable instead of X5 and X6 as currently measured. This is because factors are by construction uncorrelated (we call it orthogonal). Another option would be to transform (especially) X6, either taking its exponent, or square root, or something else. The point is to try to find an X6-like variable correlated with the dependent variable but LESS CORRELATED with, especially, X5. Are you able to get a larger sample? Can you take differences in X6, rather than just the raw measure? And yes, if there is a theoretical reason, you can drop X6, re-run the model and see what you have. Dropping a variable is a last resort. Have I mentioned that?

A brief procedural note

Most of the analytic techniques we’ll talk about have certain assumptions built in. That is, regression has many assumptions about linearity, normality, etc. While for OLS I mentioned that one assumption (especially for time series data) was no serial correlation, this same assumption applies to logistic regression as well. Most regression techniques use most of these assumptions. So while in logit I showed how to test and correct for collinearity, this same test needs to be applied in OLS as well. It just happened to come up during our discussion of logistic regression.

This means that, in reality, for every regression technique used, every assumption should be checked, and every violation of assumptions should be tested for and corrected, if possible. This goes for OLS, logit and anything else. Okay?

Variable diagnostics

As in all regression, a significance test is performed on the independent variables, but because logit is non-linear, the t-test becomes the Wald test (which is the t-ratio squared, so 1.96² = 3.84 at 95%). The p-value still needs to be < 0.05.
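The Wald statistic and its p-value need nothing beyond the standard library. For one degree of freedom, the upper tail of the chi-square distribution equals erfc(√(W/2)). The estimate and standard error below are illustrative inputs, not values from the text.

```python
import math


def wald_test(estimate, std_error):
    """Wald chi-square (1 df) and p-value for a single coefficient.

    W = (estimate / std_error)^2; for 1 degree of freedom the upper-tail
    chi-square probability equals erfc(sqrt(W / 2)).
    """
    w = (estimate / std_error) ** 2
    return w, math.erfc(math.sqrt(w / 2.0))


w, p = wald_test(1.96, 1.0)
print(round(w, 4), round(p, 4))  # 3.8416 0.05
```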

Pseudo R²

Logistic regression does not have an R² statistic. This freaks a lot of people out, but that’s why I showed the ‘confusion matrix’, which is a measure of goodness of fit. Remember (from OLS) that R² is the shared variance between the actual dependent variable and the predicted dependent variable. The more variance these two share, the closer the predicted and actual dependent variables are. Remember that OLS outputs an estimated dependent variable. Logistic regression does NOT output an estimated dependent variable. The actual dependent variable is 0 or 1. The ‘logit’ is the natural log of the odds, p/(1 – p), where p is the probability of the event. So there can be no ‘estimated’ dependent variable. If you HAVE to have some measure of goodness of fit, I’d suggest using the log likelihood on the intercept and covariates. SPSS and SAS both output the –2LL on the intercept only and the –2LL on the intercept and covariates. Think of the –2LL on the intercept only as TSS (total sum of squares) and the –2LL on the intercept and covariates as the unexplained (residual) sum of squares. Just as OLS R² is 1 – (residual sum of squares)/TSS, the pseudo-R² is 1 – (–2LL intercept and covariates)/(–2LL intercept only), usually called McFadden’s pseudo-R². This gives an indication of fit for those that need that metric.
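Here is a small Python sketch of McFadden’s pseudo-R² from the two –2LL values. It includes a helper that computes –2LL from 0/1 outcomes and predicted probabilities. The outcomes and ‘fitted’ probabilities are hypothetical numbers, used only to exercise the functions. The intercept-only model predicts the overall event rate for everyone.

```python
import math


def neg2_log_likelihood(y, p):
    """-2 log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return -2.0 * sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p)
    )


def mcfadden_pseudo_r2(neg2ll_intercept_only, neg2ll_full):
    """1 - (-2LL intercept and covariates) / (-2LL intercept only)."""
    return 1.0 - neg2ll_full / neg2ll_intercept_only


# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p_model = [0.9, 0.8, 0.7, 0.2, 0.3, 0.85, 0.1, 0.6]
p_null = [sum(y) / len(y)] * len(y)  # intercept-only model: the event rate

null_ll = neg2_log_likelihood(y, p_null)
full_ll = neg2_log_likelihood(y, p_model)
print(round(null_ll, 3), round(full_ll, 3), round(mcfadden_pseudo_r2(null_ll, full_ll), 3))
```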

Scoring the database with the probability formula

Typically after a logistic regression is run, especially in a database marketing process, the model has to be applied to score the database. Yes, SAS now has ‘proc score’, but I want you to be able to do it yourself and to understand what’s happening. It’s old-fashioned, but you will know more.

Say we have the model below (Table 4.7) with probability to purchase. That is, the dependent variable is purchase = 1 for the event and purchase = 0 for the non-event. Because the logistic curve is bounded between 0 and 1, the formula is probability = $1/(1 + e^{-Z})$, where $Z = \alpha + \beta X_i$. For this model, this means:

$$\text{Probability} = \frac{1}{1 + e^{-(4.566 - 0.003X_1 + 1.265X_2 + 0.003X_3)}}$$

This returns a probability between 0% and 100% for each customer (2.71828 = e). So apply this formula to your database, and each customer will have a score for probability to purchase (which can be used for a lift chart; see above).

Table 4.7 Probability to purchase

Independent variable   Parameter estimate
Intercept              4.566
X1                     –0.003
X2                     1.265
X3                     0.003
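Doing the scoring yourself takes only a few lines. This Python sketch applies the Table 4.7 estimates to a tiny database and sorts it by probability to purchase, ready for the decile cut of a lift chart. The customer records are hypothetical. The function uses a numerically stable form of 1/(1 + e^–Z), so very negative scores do not overflow.

```python
import math

# Parameter estimates from Table 4.7.
INTERCEPT = 4.566
BETAS = {"x1": -0.003, "x2": 1.265, "x3": 0.003}


def probability_to_purchase(customer):
    """1 / (1 + e^-Z) with Z = intercept + sum(beta * x), computed stably."""
    z = INTERCEPT + sum(BETAS[name] * customer[name] for name in BETAS)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical customer records; the field values are invented for illustration.
database = [
    {"id": "A", "x1": 1200, "x2": 0, "x3": 40},
    {"id": "B", "x1": 300, "x2": 1, "x3": 10},
    {"id": "C", "x1": 2500, "x2": 0, "x3": 5},
]
for row in database:
    row["p_purchase"] = probability_to_purchase(row)
database.sort(key=lambda r: r["p_purchase"], reverse=True)
for row in database:
    print(row["id"], round(row["p_purchase"], 3))
```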

HIGHLIGHT

USING LOGISTIC REGRESSION FOR MARKET BASKET ANALYSIS

Abstract

In general, market basket analysis is a backward-looking exercise. It uses descriptive analysis (frequencies, correlation, mathematical KPIs, etc.) and outputs those products that tend to be purchased together. That gives no insight into what marketers should do with that output. Predictive analytics, using logistic regression, shows how much the probability of a product purchase increases/decreases given another product purchase. This gives marketers a strategic lever to use in bundling, etc.

What is a market basket?

In economics, a market basket is a fixed collection of items that consumers buy. This is used for metrics like the CPI (inflation), etc. In marketing, a market basket is any two or more items bought together.

Market basket analysis is used, especially in retail/CPG, to bundle and offer promotions and to gain insight into shopping/purchasing patterns. ‘Market basket analysis’ does not, by itself, describe HOW the analysis is done. That is, there is no technique associated with those words.

How is it usually done?

There are three general uses of data: descriptive, predictive and prescriptive. Descriptive is about the past; predictive uses statistical analysis to calculate a change in an output variable (eg, sales) given a change in an input variable (say, price); and prescriptive is a system that tries to optimize some metric (typically profit, etc.). Descriptive data (means, frequencies, KPIs, etc.) is a necessary, but not usually sufficient, step. Always get to at least the predictive step as soon as possible. Note that predictive here does not necessarily mean forecasted into the future. Structural analysis uses models to simulate the market and estimate (predict) what causes what to happen. That is, using regression, a change in price shows the estimated (predicted) change in sales.

Market basket analysis often uses descriptive techniques. Sometimes it is just a ‘report’ of what percent of items are purchased together. Affinity analysis (a slight step above) is mathematical, not statistical. Affinity analysis simply calculates the percent of time combinations of products are purchased together. Obviously there is no probability involved. It is concerned with the rate of products purchased together, and not with a distribution around that association. It is very common and very useful but NOT predictive, and therefore NOT so actionable.
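For contrast with the logistic approach that follows, here is what the descriptive affinity calculation amounts to: the share of baskets containing each pair of categories. The baskets are hypothetical. Note that nothing here carries a probability model or a significance test, which is the text’s point.

```python
from collections import Counter
from itertools import combinations


def pair_affinity(baskets):
    """Share of baskets containing each pair of categories (no probability model)."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical baskets, for illustration only.
baskets = [
    {"furniture", "home decor"},
    {"furniture", "home decor", "entertainment"},
    {"consumer electronics", "entertainment"},
    {"furniture"},
]
aff = pair_affinity(baskets)
for pair, share in sorted(aff.items(), key=lambda kv: -kv[1]):
    print(pair, share)
```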

Logistic regression

Let’s talk about logistic regression. This is an ancient and well-known statistical technique, probably the analytic pillar upon which database marketing has been built. It is similar to ordinary regression in that there is a dependent variable that depends on one or more independent variables. There is a coefficient (although the interpretation is not the same) and there is a (type of) t-test around each independent variable for significance.

The differences are that the dependent variable is binary (having two values, 0 or 1) in logistic regression and continuous in ordinary regression, and that interpreting the coefficients requires exponentiation. Because the dependent variable is binary, the result is heteroskedasticity. There is no (real) R², and ‘fit’ is about classification.

How to estimate/predict the market basket

The use of logistic regression for market baskets becomes obvious when it is understood that the predicted dependent variable is a probability. The formula to estimate probability from logistic regression is:

$$P(i) = \frac{1}{1 + e^{-Z}}$$

where $Z = \alpha + \beta X_i$. This means that the independent variables can be products purchased in a market basket, used to predict the likelihood of purchasing another product as the dependent variable. Note that there is no issue of causality here, ie, no presupposing that one (independent) product causes the purchase of the dependent product, only that they are associated. Specifically, this means taking each (major) category of product (focus driven by strategy) and running a separate model for each, putting in all significant other products as independent variables. For example, say we have only three products, x, y and z. The idea is to design three models and test the significance of each, meaning, using logistic regression:

x=f(y,z)

y=f(x,z)

z=f(x,y).

Of course, other variables can go into the model as appropriate, but the interest is whether or not the independent (product) variables are significant in predicting (and to what extent) the probability of purchasing the dependent product variable. Of course, after significance is achieved, the insights generated are around the sign of the independent variable, ie, does the independent product increase or decrease the probability of purchasing the dependent product?
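Here is a NumPy sketch of the three-model design x = f(y, z), y = f(x, z), z = f(x, y). Each logit is fitted by Newton–Raphson (maximum likelihood), and each product effect is reported as e^β – 1, the proportional change in the odds of buying the target product. The baskets are synthetic (buying y makes x more likely; z is unrelated). To keep it short, the sketch omits the standard errors and significance tests the text calls for, so in practice you would still check the Wald statistics before reading the signs.

```python
import numpy as np


def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson (maximum likelihood); adds an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def basket_models(purchases, names):
    """One model per product: product_j = f(all other products).

    Returns {target: {other: e^beta - 1}}, the proportional change in the
    odds of buying `target` when `other` was also bought.
    """
    results = {}
    for j, target in enumerate(names):
        others = [i for i in range(len(names)) if i != j]
        beta = fit_logit(purchases[:, others], purchases[:, j])
        results[target] = {names[i]: float(np.exp(b) - 1.0) for i, b in zip(others, beta[1:])}
    return results


# Synthetic baskets (assumption): buying y makes x more likely; z is unrelated.
rng = np.random.default_rng(1)
n = 5000
y = (rng.random(n) < 0.3).astype(float)
x = (rng.random(n) < np.where(y == 1, 0.5, 0.2)).astype(float)
z = (rng.random(n) < 0.4).astype(float)
res = basket_models(np.column_stack([x, y, z]), ["x", "y", "z"])
for target, effects in res.items():
    print(target, {k: f"{v:+.0%}" for k, v in effects.items()})
```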

An example

As a simple example, say we are analysing a retail store with categories of products like consumer electronics, women’s accessories, newborn and infant items, etc. Using logistic regression, a series of models should be run. That is:

consumer electronics = f(women’s accessories, jewellery and watches, furniture, entertainment, etc.)

This means the independent variables are binary, coded as ‘1’ if the customer bought that category and ‘0’ if not. Table 4.8 details the output for all of the models. Note that other independent variables can be included in the model, if significant. These would often be seasonality, consumer confidence, promotions sent, etc.

Table 4.8 Associated probabilities

                        Consumer electronics  Women’s accessories  Newborn, infant, etc.  Jewellery, watches  Furniture  Home décor  Entertainment
Consumer electronics    XXX                   Insig                Insig                  –23%                34%        26%         98%
Women’s accessories     Insig                 XXX                  39%                    68%                 22%        21%         Insig
Newborn, infant, etc.   Insig                 43%                  XXX                    –11%                –21%       –31%        29%
Jewellery, watches      –29%                  71%                  –22%                   XXX                 12%        24%         –11%
Furniture               31%                   18%                  –17%                   9%                  XXX        115%        37%
Home décor              29%                   24%                  –37%                   21%                 121%       XXX         31%
Entertainment           85%                   Insig                31%                    –9%                 41%        29%         XXX
Sporting goods          18%                   –37%                 –29%                   –29%                24%        9%          33%

To interpret, look at, say, the home décor model. If a customer bought consumer electronics, that increases the probability of buying home décor by 29%. If a customer bought newborn/infant items, that decreases the probability of buying home décor by 37%. If a customer bought furniture, that increases the probability of buying home décor by 121%. This has implications, especially for bundling and messaging. That is, offering, say, home décor and furniture together makes great sense, but offering home décor and newborn/infant items together does not.

And here is a special note about products purchased together. If it is known, via the above, that home décor and furniture tend to go together, these can and should be bundled together, messaged together, etc. But there is no reason to PROMOTE them together or to discount them together, because they are purchased together anyway.

Conclusion

The above detailed a simple (and more powerful) way to do market basket analysis. If given a choice, always go beyond mere descriptive techniques and apply predictive techniques.

Checklist


Can differentiate between logistic and ordinary regression. Logistic and ordinary regression are similar in that both are single equations having a dependent variable explained by one or more independent variables. They are dissimilar in that ordinary regression has a continuous dependent variable while logistic regression has a binary one; ordinary regression uses least squares to estimate the coefficients while logistic regression uses maximum likelihood.

Remember that logistic regression predicts the probability of an event.

Always test for outliers/influential observations using z-scores.

Point out that the ‘confusion matrix’ is a means of assessing goodness of fit.

Observe that lift/gain charts are used as a measure of modelling efficacy as well as (eg in direct mail) depth of mailing.

Remember to always check/correct for collinearity.

Suggest logistic regression as a way to model market baskets.

05

When are my customers most likely to buy?

Introduction

Conceptual overview of survival analysis

Business case



Conclusion

Highlight: Lifetime value: how predictive analysis is superior to descriptive analysis


Introduction

Survival analysis is an especially interesting and powerful technique. In terms of marketing science it is relatively new, mostly getting exposure in these last 20 years or so. It answers a very important and particular question: ‘WHEN is an event (purchase, response, churn, etc.) most likely to occur?’ I’d submit this is a more relevant question than ‘HOW LIKELY is an event (purchase, response, churn, etc.) to occur?’ That is, a customer may be VERY likely to purchase, but not for 10 months. Is timing information of value? Of course it is; remember, time is money.

Beware, though. Given the increase in actionable information, it should be no surprise that survival analysis is more complex than logistic regression. Remember how much more complex logistic regression was than ordinary regression?

Conceptual overview of survival analysis

Survival analysis (via proportional hazards modelling) was essentially invented by Sir David Cox in 1972 with his seminal and oft-quoted paper, ‘Regression Models and Life-Tables’, in the Journal of the Royal Statistical Society (Cox, 1972). It’s important to note that this technique was specifically designed to study time-until-event problems. This came out of biostatistics, and the event of study was typically death. That’s why it’s called ‘survival analysis’. Get it?

The general use case was in drug treatment. There would be, say, a drug study where a panel was divided into two groups; one group got the new drug and the other group did not. Every month the test subjects were called and basically asked, ‘Are you still alive?’, and their survival was tracked. Two curves would be developed, one following the treatment group and another following the non-treatment group. If the treatment tended to work, the time until the event (death) was increased.

One major issue involved censored observations. On its own, it’s an easy matter to compare the average survival times of the treatment vs the non-treatment group.

Censored observation: an observation whose status we do not know. Typically the event has not occurred yet or the subject was lost in some way.

But what about those subjects that dropped out of the study because they moved away or lost contact? Or the study ended and not everyone had died yet? Each of these involves censored observations. The question of what to do with these kinds of observations is why Cox regression was created: a semi-parametric partial likelihood technique, which he called proportional hazards. It deals with censored observations, which are those patients that have an unknown time-until-event status. This unknown time until event can be caused either by not having had the event at the time of the analysis or by losing contact with the patient.

What about those subjects that died from another cause and not the cause the test drug was treating? Are there other variables (covariates) that influence (increase or decrease) the time until the event? These questions involve extensions of the general survival model. The first is about competing risks and the second is about regression involving independent variables. These will be dealt with soon enough.

BUSINESS CASE

At the end of the year Scott called his team and the marketing organization together for a review and brainstorming exercise. This is something Scott believed every smart analytics pro should do. He was especially interested in how the analytic team was perceived as providing value last year and what might be done differently in the upcoming year.

During the meeting the marketing managers complimented Scott and his team for providing actionable insights. The results gave most of them a good bonus, and they wanted to get another one this year. They did not all completely understand the technical details, and Scott made the culture around that okay. He tried to have his team viewed as consultants: accessible, conversational and engaged with the broader organization.

‘Thanks’, Scott said, and turned to the director of consumer marketing, Stacy. ‘Where can we improve? What targeting would help you and your team?’

‘Well, we have a pretty good process now. We pull lists based on likelihood to respond. It’s worked well.’

‘Yeah, I’m glad of that. The lift charts from logit helped us mail only as deep as we needed to.’

‘This gives us the best ROI in the company.’

‘But is that all we can do? Just target those most likely to respond?’ Scott asked.

‘What else is there?’ Stacy asked, checking her phone.

‘Yeah, I’m not sure’, Scott said. ‘What do you need to know to do your job? What if there were no restrictions on data or feasibility or anything else? You have a magic button that, if you push it, tells you the one thing that would allow you to do your job better, better than ever before, a knowledge that gives you a tremendous advantage.’

‘Easy!’ Kristina said. ‘If I knew what product each customer would purchase in what order, that is, if I knew WHEN he would purchase a desktop, or a notebook, I would not send a lot of useless catalogues or e-mails to him. I’d send him the most compelling marcom at just the right time with just the right promotion and just the right messaging to maximize his purchase.’

They all looked at her. Then they nodded their heads. Kristina had talked with Scott about joining his team after she graduated.

‘It sounds like science fiction’, Stacy said. ‘We would get a list of customers with a most likely time to purchase each product?’

Scott rubbed his chin. ‘Yes. It’s a prediction of when each customer is going to purchase each product.’

‘But’, said Mark, ‘what does that mean? Before?’ Mark was an analyst on Scott’s team. ‘We want to predict when they’ll purchase?’

‘I think so’, Scott said. ‘Predict when they’ll buy a desktop, when they’ll buy a notebook, etc.’

‘Imagine having the database scored with the number of days until each customer is likely to buy personal electronics, a desktop, etc.’, Kristina said. ‘We’d just sort the database by product, and those more likely to buy sooner would get the communication.’

‘But does that mean using regression, or logit, or what?’

‘I don’t know’, Scott said. ‘What do we do about predicting those who have not purchased a product? Is this probability to buy at each distinct time period?’

They all left the meeting excited about the new metric (time until purchase), but Scott was wondering what technique would answer that question. If they used ordinary regression, the dependent variable would be ‘number of days until purchase of a desktop’ based on some zero day, say January first two years ago. Those that purchased a desktop would have the event at that many days. Those that did not purchase a desktop gave Scott a choice. Either he would cap the number of days at now, say two years from the zero date, which means, say, 725, if they were on file from the zero date onward. That is, those that have not purchased a desktop would be forced to have the event at 725 days. Not a good choice. The other option would be to delete those that did not purchase a desktop. Also not a good choice.

Rule numero uno: never, ever, under any circumstances delete data. Never. Ever. This is an ‘Off with their heads!’ crime (unless, of course, the data is wrong or an outlier).

Ignoring the time-until-event dependent variable would lead to logistic regression instead. That is, customers would receive a 1 if they did purchase a desktop and a 0 if they did not. This puts him right back into probability, and they all agreed that timing was a more strategic option. So Scott concluded that both OLS and logit have severe faults in terms of time-until-event problems.

It’s important to make a clarification about a trap a lot of people fall into. Survival analysis is a technique specifically designed to estimate and understand time-until-event problems. The trap rests on an underlying assumption that each time period is independent of every other time period; that is, the prediction has no ‘memory’. Some under-educated/under-experienced analysts think that if we are, say, trying to predict in what month an event will happen, they can do 12 logits and have one model for January, another for February, etc. The collected data would have a 1 if the customer purchased in January and a 0 if not; likewise, if the model was for February, a customer would have a 1 if they purchased in February and a 0 if not. This seems like it would work, right? Wrong. February is not independent of January. In order for the customer to buy in February, they had to decide NOT to buy in January. See? This is why logit is inappropriate.

Now, for you academicians: yes, logistic regression is appropriate for a small subset of a particular problem. If the data is periodic (an event that can only occur at regular and specific intervals), then, yes, logistic regression can be used to estimate survival analyses. This requires a whole different kind of dataset, one where each row is not a customer but a time period with an event. Even then, I’d still suggest: why not just use survival analysis (in SAS, proc lifereg or proc phreg)?


As mentioned, survival analysis came from biostatistics in the early 1970s, where the subject studied was an event: death. Survival analysis is about modelling the time until an event. In biostatistics the event is typically death, but in marketing the event can be response, purchase, churn, etc.

Due to the nature of survival studies, there are a couple of characteristics that are endemic to this technique. As alluded to earlier, the dependent variable is time until event, so time is built into the analysis. The second thing endemic to survival analysis is observations that are censored. A censored observation is either an observation that has not had the event or an observation that was lost to the study, with no knowledge of whether it had the event or not; but we do know that, up to some point in time, the observation had not had the event.

In marketing it’s common for the event to be a purchase. Imagine scoring a database of customers with time until purchase. That is far more actionable than the probability of purchase from logistic regression.

Let’s talk about censored observations. What can be done about them? Remember, we do not know what happened to these observations. We could delete them. That would be simple, but depending on how many there are, that might be throwing away a lot of data. Also, they might be the most interesting data of all, so deleting them is probably a bad idea. (And remember the ‘Off with their heads!’ crime mentioned previously.) We could just give the maximum time until an event to all those that have not had the event. This would also be a bad idea, especially if a large portion of the sample is censored, as is often the case. (It can be shown that throwing away a lot of censored data will bias any results.) Thus, we need a technique that can deal with censored data. Also, deleting censored observations ignores a lot of information. While we don’t know when (or even if) the customer, say, purchased, we do know that as of a certain time they did NOT purchase. So we have part of their curve, part of their information, part of their behaviour. This should never be deleted. This is why Cox invented partial likelihood.

Figure 5.1 General survival curve

The above is a general survival curve. The vertical axis is a count of those in the ‘risk set’, and it starts out at 100%. That is, at time 0 everyone is ‘at risk’ of having the event and no one has had the event. At day 1, that is, after one day, one person died (had the event) and there are now 99 that are at risk. No one died for 3 days, until 9 had the event at day 5, etc. Note that by about day 12, 29 had had the event.

Now note Figure 5.2. One survival curve is the same as above, but the other one is ‘further out’. Note that the first curve reaches 50% at 14 days, but the second curve does not reach 50% until 28 days. That is, they ‘live longer’.

Figure 5.2 Survival analysis

Survival analysis is a type of regression, but with a twist. The most common form of survival analysis, proportional hazards, does not use ordinary maximum likelihood but partial likelihood. The dependent variable now has two parts: the time until the event and whether the event has occurred or not. This allows the use of censored observations.

The above graphs are survival graphs. Much of Cox regression is not about the survival curve but about the hazard rate. The hazard is closely tied to the survival curve: it is the rate at which the curve falls, relative to those still at risk. It ends up as the instantaneous risk that an event will occur at some particular time, given that it has not occurred yet. Think of metrics like miles per hour as analogous to the hazard rate. At 40 miles per hour you will travel 40 miles in one hour if the speed remains the same. The hazard quantifies the rate of the event in each period of time.
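To see how censored customers still contribute, and how the hazard and the survival curve fit together, here is a Python sketch of the Kaplan–Meier estimator. Note that this is not Cox regression or lifereg; I’ve swapped in the simplest nonparametric survival curve for illustration. At each event time the discrete hazard is events/at-risk, and survival is the running product of (1 – hazard). Censored customers stay in the risk set up to their own time. The days and purchase flags are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve with censoring.

    durations: time to the event or to censoring; events: 1 = event observed,
    0 = censored. Returns rows of (time, at_risk, events, hazard, survival),
    where hazard = events / at_risk at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        hazard = d / at_risk
        survival *= 1.0 - hazard
        rows.append((t, at_risk, d, hazard, survival))
        at_risk -= d + c  # censored cases leave the risk set after their time
    return rows


def median_survival(rows):
    """First time at which the survival curve drops to 50% or below (None if never)."""
    for t, _, _, _, s in rows:
        if s <= 0.5:
            return t
    return None


# Hypothetical days until purchase; 0 = not purchased yet (censored).
days = [1, 2, 3, 4, 5]
purchased = [1, 0, 1, 1, 0]
curve = kaplan_meier(days, purchased)
for row in curve:
    print(row)
print(median_survival(curve))
```

The median survival time plays the same role as the 14-day versus 28-day comparison of the two curves in Figure 5.2.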

SAS does both survival modelling (with proc lifereg) and hazard modelling (with proc phreg). SPSS only does hazard modelling (as Cox regression). Lifereg handles left and interval censoring while phreg handles only right censoring (this is not usually an issue for marketing). With lifereg a distribution must be specified, but with phreg (as it’s semi-parametric) no distribution needs to be specified. This is one of the advantages of phreg. The other advantage of phreg is that it incorporates time-varying independent variables, while lifereg does not. (This also is not usually much of an issue for marketing.)

I typically use lifereg as it easily outputs a time-until-event prediction, it is on the survival curve and it is relatively easy to understand and interpret. That’s what we’ll demonstrate here.
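Lifereg fits accelerated failure time models of the form log T = b0 + b1·x + σW. In the special case of the exponential distribution with one 0/1 covariate, the maximum likelihood estimates have a closed form, which makes the role of censoring easy to see. The Python sketch below is illustrative only (the function name and toy data are hypothetical, and this is not lifereg output): censored customers add time at risk but no events, and e^b1 is the ratio of the two groups’ mean times until purchase.

```python
import math

def exponential_aft_binary(times, events, x):
    """MLE of log T = b0 + b1*x + W, W standard extreme-value (exponential
    model), for a single 0/1 covariate x. Returns (b0, b1)."""
    def mean_time(group):
        exposure = sum(t for t, g in zip(times, x) if g == group)
        n_events = sum(e for e, g in zip(events, x) if g == group)
        # Censored MLE of the mean: total time at risk / observed events
        return exposure / n_events
    b0 = math.log(mean_time(0))
    b1 = math.log(mean_time(1)) - b0
    return b0, b1

# Toy data (hypothetical): weeks until purchase, event 0 = censored,
# x = 1 if the customer received an e-mail
times = [10, 12, 8, 20, 5, 6, 9, 4]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = exponential_aft_binary(times, events, x)
print(f"b1 = {b1:.3f}, e^b1 = {math.exp(b1):.2f}")  # e^b1 = 0.48
```

Here e^b1 = 0.48, so the e-mailed group’s mean time until purchase is about 52% shorter. Dropping the two censored customers instead would give group means of 10 and 5 weeks, understating how long both groups actually wait.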

I might mention that survival analysis is not just about the time-until-event prediction. As with all regressions, the independent variables are strategic levers. Say we find that for every 1,000 e-mails we send, purchases tend to happen three days sooner. Do you see the financial implications here? How valuable is it to know you have incentivized a group of customers to make purchases earlier? If this does not interest you then you are in the wrong career field.

So Scott’s team investigated survival analysis and concluded it was worth a shot. It seemed to give a way to answer the key question, ‘WHEN is a customer most likely to purchase a desktop?’

Table 5.1 Final desktop model, lifereg

Independent variable       Beta     e^B   (e^B)–1   Avg TTE
Any previous purchase    –0.001   0.999    –0.001    –0.012
Recent online visit      –0.014   0.987    –0.013    –0.148
# Direct mails            0.157    1.17      0.17     1.865
# E-mails opened         –0.011   0.989    –0.011     –0.12
# E-mails clicked        –0.033   0.968    –0.032    –0.352
Income                   –0.051    0.95     –0.05    –0.547
Size household           –0.038   0.963    –0.037    –0.408
Education                –0.023   0.977    –0.023    –0.249
Blue collar occupation    0.151   1.163     0.163     1.792
# Promotions sent        –0.006   0.994    –0.006    –0.066
Purch desktop < year       2.09   8.085     7.085    77.934

The table above lists the final desktop model using lifereg. The variables are all significant at the 95% level. The first column is the name of the independent variable. The interpretation of lifereg coefficients requires transformations, which get the parameter estimates into a form suitable for strategic interpretation.

The next column is the beta coefficient. This is what SAS outputs but, as with logistic regression, it is not very meaningful on its own. A negative coefficient tends to bring the event (a desktop purchase) in; a positive coefficient tends to push the event out. This is a regression output, so in that regard interpretation is the same, ceteris paribus.

To get percent impacts on time until event (TTE), each beta coefficient must be exponentiated, e^B. That’s the third column. The next column subtracts 1 from it, giving the proportional change in TTE. Note that, for example, ‘recent online visit’ has an e^B of 0.987, a multiplicative impact on time; subtracting 1 shows a 1.3% decrease in average TTE. To convert that to a time scale (say the average is 11 weeks), multiply by 11: the table shows –0.148 weeks (small differences from –0.013 × 11 come from rounding the displayed coefficients). The interpretation is that if a customer had a recent online visit, that tends to pull in (shorten) TTE by 0.148 weeks. Not really impactful, but it makes sense, right?

Notice the last variable, ‘purch desktop < year’. See how it’s positive, 2.09? This means that if the customer has purchased a desktop in the last year, the time until (another) desktop purchase is pushed out by ((e^B) – 1) × 11 = 77.934 weeks. See how this works? See how strategically insightful survival analysis can be? You can build a business case around marcom sent (cost of marcom) and decreasing the time until purchase (revenue realized sooner).
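The column arithmetic in Table 5.1 can be reproduced in a few lines. This Python sketch uses the 11-week average TTE from the text and a few of the table’s betas; because the displayed betas are rounded, recomputed values can differ from the table in the third decimal place.

```python
import math

AVG_TTE_WEEKS = 11  # average time until desktop purchase used in the text

def tte_effect(beta, avg_tte=AVG_TTE_WEEKS):
    """Return (e^B, (e^B)-1, change in weeks at the average TTE)."""
    ratio = math.exp(beta)     # multiplicative effect on time until event
    change = ratio - 1         # proportional change (x100 for a percentage)
    return ratio, change, change * avg_tte

for name, beta in [("Income", -0.051),
                   ("# Direct mails", 0.157),
                   ("Purch desktop < year", 2.09)]:
    ratio, change, weeks = tte_effect(beta)
    print(f"{name:<22}{beta:>8.3f}{ratio:>8.3f}{change:>8.3f}{weeks:>9.3f}")
```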

As typically used on a database, each customer is scored with time until the event, in this case, time until a desktop purchase. The database is sorted and a list is designed with those most likely to purchase next (see Table 5.2 below). This time until event (TTE) is the median (the 50th percentile).

Table 5.2 Time until event (in weeks)

Customer ID    TTE
1000         3.365
1002         3.702
1004         4.072
1006         4.479
1011         5.151
1013         5.923
1015         6.812
1017         7.834
1022         9.009
1024         10.36
1026         12.43
1030         14.92

Note that customer 1000 is expected to purchase a desktop in about 3.4 weeks and customer 1030 is expected to purchase a desktop in about 14.9 weeks. Using survival analysis (in SAS, proc lifereg) allowed Scott’s team to score the database and rank those likely to purchase sooner. This list is more actionable than one from logistic regression, where the score is just the probability to purchase.
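Scoring a database this way means computing each customer’s predicted median time until purchase and sorting ascending. The sketch below assumes a Weibull model in lifereg’s accelerated failure time form, where the median works out to exp(μ)·(ln 2)^σ; the intercept, scale σ, coefficients and customer values are all made up for illustration.

```python
import math

def weibull_median_tte(x, coefs, intercept, sigma):
    """Median time until event for log T = intercept + x.coefs + sigma*W,
    W standard (minimum) extreme-value: S(t) = exp(-exp((log t - mu)/sigma)),
    so S(t) = 0.5 at t = exp(mu) * (ln 2) ** sigma."""
    mu = intercept + sum(c * v for c, v in zip(coefs, x))
    return math.exp(mu) * math.log(2) ** sigma

# Hypothetical fitted model and customer attributes (illustration only)
INTERCEPT, SIGMA = 2.3, 0.8
COEFS = [-0.05, 0.16, 2.1]   # e.g. income band, # direct mails, bought < 1 yr
customers = {1: [3, 0, 0], 2: [1, 1, 0], 3: [2, 0, 1]}

# Score and sort: soonest expected purchasers first
ranked = sorted((weibull_median_tte(x, COEFS, INTERCEPT, SIGMA), cid)
                for cid, x in customers.items())
for tte, cid in ranked:
    print(f"customer {cid}: median TTE {tte:.2f} weeks")
```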

Now let’s talk about competing risks. While survival analysis is about death, the study usually is interested in ONE kind of death, or death from ONE cause. That is, the biostat study is about, say, death by heart attack and not about death by cancer or death by a car accident. But it’s true that in a study of death by heart attack a patient is also at risk for other kinds of death. This is called competing risks.

In the marketing arena, while the focus might be on a purchase event for, say, a desktop PC, the customer is also ‘at risk’ for purchasing other things, like a notebook or consumer electronics. Fortunately, this is an easy job of just coding the events of interest. That is, Scott can code an event as a DT (desktop) purchase, with all else coded as a non-event. He can do another model with a purchase event of, say, notebooks, where all else is a non-event, that is, all other things are censored. Thus Table 5.3 shows three models, having a purchase event for desktop, notebook and consumer electronics.

Table 5.3 Three model comparison

Customer ID  TT desktop purch  TT notebook purch  TT consumer electronics purch
1000         3.365             75.66              39.51
1002         3.702             88.2               45.95
1004         4.072             111.2              55.66
1006         4.479             15.05              19.66
1011         5.151             13.07              9.109
1013         5.923             9.945              7.934
1015         6.812             22.24              144.5
1017         7.834             3.011              5.422
1022         9.009             2.613              5.811
1024         10.36             1.989              6.174
1026         12.43             4.448              8.44
1030         14.92             0.602              7.76
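The recoding behind Table 5.3 is simple: the same customer records are reused once per product, with the target product’s purchase flagged as the event and any other purchase (or no purchase) treated as censored at that time. A Python sketch with hypothetical records and a hypothetical function name:

```python
def cause_specific(records, target):
    """Recode (customer_id, time, product) rows for one competing risk.

    product is what the customer bought at `time`, or None if nothing was
    bought before the window closed. Only `target` counts as the event;
    everything else is censored, as described in the text.
    """
    return [(cid, t, 1 if product == target else 0)
            for cid, t, product in records]

records = [
    (1, 4, "desktop"),
    (2, 3, "notebook"),
    (3, 9, "electronics"),
    (4, 20, None),
]
datasets = {p: cause_specific(records, p)
            for p in ("desktop", "notebook", "electronics")}
# Each dataset then gets its own survival model (one per column of Table 5.3).
```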

A little technical background

First, something to note about lifereg is that it requires you to give it a distribution. (Phreg does not require that you give it a distribution, something a lot of analysts like.) In using lifereg, I’d suggest testing all the distributions; the one that fits best (lowest BIC, or highest log-likelihood) is the one to use. Another view would be to acknowledge that the distribution has a shape and ascertain what shape makes sense given the data you’re using.
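As a sketch of ‘test the distributions and keep the best fit’, the Python below compares just two candidates, exponential and Weibull, on censored data by BIC. For a fixed Weibull shape the best scale has a closed form, so a simple grid over the shape is enough; the grid search, the function names and the toy data are illustrative assumptions, not how lifereg itself fits models.

```python
import math

def weibull_profile_loglik(times, events, shape):
    """Censored Weibull log-likelihood at the best scale for a given shape.
    With S(t) = exp(-(t/scale)^shape), the optimal scale^shape is
    sum(t^shape)/d, where d is the number of observed events.
    shape == 1 is the exponential distribution."""
    d = sum(events)
    s = sum(t ** shape for t in times)
    log_t_events = sum(math.log(t) for t, e in zip(times, events) if e)
    return d * math.log(shape) - d * math.log(s / d) + (shape - 1) * log_t_events - d

def bic(loglik, n_params, n_obs):
    return -2 * loglik + n_params * math.log(n_obs)

def compare_fits(times, events):
    n = len(times)
    ll_exp = weibull_profile_loglik(times, events, 1.0)
    # Grid over shape 0.20..5.00; k = 100 gives exactly 1.0 (the exponential)
    ll_wei = max(weibull_profile_loglik(times, events, k / 100) for k in range(20, 501))
    return ll_exp, ll_wei, {"exponential": bic(ll_exp, 1, n), "weibull": bic(ll_wei, 2, n)}

times = [2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 12]
events = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0]
ll_exp, ll_wei, scores = compare_fits(times, events)
best = min(scores, key=scores.get)  # lowest BIC wins
print(scores, "->", best)
```

BIC charges the Weibull for its extra shape parameter, so it only wins when the better fit is worth the added complexity.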

Pseudo R²

While R² as a metric makes no sense here (same as with logistic regression), a lot of analysts like some kind of R². To review, R² in OLS is the shared variance between the actual dependent variable and the predicted dependent variable. In survival analysis there is no single predicted dependent variable. Most folks use the median as the prediction and that’s okay. I’d suggest running a simple model with, and without, covariates. That is, in SAS with proc lifereg, run the model without the covariates (independent variables) and collect the –2 log likelihood statistic. Then run the model with the covariates, collect the –2LL statistic and divide it by the first one. By analogy, this ratio is the share left unexplained; one minus it is the share ‘explained’.
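A sketch of the suggested calculation in Python, using an exponential model because its maximized log-likelihood has a closed form (the choice of model, the names and the toy data are assumptions for illustration). The ratio of the two –2LL values is the share left ‘unexplained’; one minus it is McFadden’s pseudo R².

```python
import math

def exp_max_loglik(times, events):
    """Maximized censored exponential log-likelihood: d*log(d/T) - d,
    with d observed events and T total time at risk."""
    d, total = sum(events), sum(times)
    return d * math.log(d / total) - d

def pseudo_r2(times, events, x):
    """McFadden-style pseudo R^2 for one 0/1 covariate x."""
    ll_null = exp_max_loglik(times, events)          # no covariates
    # With one binary covariate the exponential model fits a separate
    # rate per group, so its maximized log-likelihood is the sum over groups.
    ll_full = sum(
        exp_max_loglik([t for t, g in zip(times, x) if g == grp],
                       [e for e, g in zip(events, x) if g == grp])
        for grp in (0, 1))
    return 1 - (-2 * ll_full) / (-2 * ll_null)

times = [10, 12, 8, 20, 5, 6, 9, 4]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]
print(f"pseudo R2 = {pseudo_r2(times, events, x):.3f}")
```

The ratio only reads as a share when both log-likelihoods are negative; for continuous-time models that depends on the units time is measured in, so it is worth checking before reporting.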

Conclusion

Survival analysis is not a common topic in marketing analytics, and it should be. While it’s true that marketers and biostatisticians (where survival analysis originated) do not move in the same circles, I’ve now given you some of the basics, so go and get to work.

HIGHLIGHT

LIFETIME VALUE: HOW PREDICTIVE ANALYSIS IS SUPERIOR TO DESCRIPTIVE ANALYSIS

Abstract

Typically lifetime value (LTV) is but a calculation using historical data. This calculation makes some rather heroic assumptions to project into the future but gives no insights into why a customer is lower valued, or how to make a customer higher valued. Using predictive techniques (here, survival analysis) gives an indication as to what causes purchases to happen sooner, and thus how to increase LTV.

Descriptive analysis

Lifetime value (LTV) is typically done as just a calculation, using past (historical) data. That is, it’s only descriptive.

While there are many versions of LTV (depending on data, industry, interest, etc.), the following conceptually applies to all. LTV, via descriptive analysis, works as follows:

1. It uses historical data to sum up each customer’s total revenue.
2. This sum then has some costs subtracted from it: typically cost to serve, cost to market, cost of goods sold, etc.
3. This net revenue is then converted into an annual average amount and depicted as a cash flow.
4. These cash flows are assumed to continue into the future and diminish over time (depending on durability, sales cycle, etc.), often decreasing arbitrarily by, say, 10% each year until they are effectively zero.
5. These (future, diminished) cash flows are then discounted (usually by weighted average cost of capital) and summed up to get their net present value (NPV).
6. This NPV is called LTV. This calculation is applied to each customer.
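Steps 3 to 6 amount to a short loop. In this Python sketch the 10% annual decay, the 9% discount rate and the cut-off at which a cash flow counts as ‘effectively zero’ are all example assumptions; steps 1 and 2 (summing revenue and subtracting costs) are assumed to have produced the annual net cash flow already.

```python
def descriptive_ltv(annual_net, decay=0.10, rate=0.09, floor=1.0):
    """Descriptive LTV: an annual net cash flow that shrinks by `decay`
    each year until it falls below `floor`, each year's flow discounted
    at `rate` and summed to a net present value."""
    ltv, year, cash = 0.0, 1, annual_net
    while cash >= floor:
        ltv += cash / (1 + rate) ** year   # discount this year's cash flow
        cash *= 1 - decay                  # arbitrary yearly decline
        year += 1
    return ltv

print(round(descriptive_ltv(36_662)))
```

Every input here is an assumption applied identically to every customer, which is exactly the arbitrariness the following paragraphs criticize.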

Thus each customer has a value associated with them. The typical use is for marketers to find the ‘high-valued’ customers (based on past purchases). These high-valued customers get most of the communications, promotions/discounts and marketing efforts. Descriptive analysis is merely about targeting those already engaged, much like RFM (recency, frequency, monetary), which we will discuss later.

This seems to be a good starting point but, as is usual with descriptive analysis, it contributes little insight. Why is one customer more valuable, and will they continue to be? Is it possible to extract additional value, and at what cost? Is it possible to garner more revenue from a lower valued customer because they are more loyal or cost less to serve? What part of the marketing mix is each customer most sensitive to? LTV (as described above) gives no implications for strategy. The only strategy is to offer and promote to (only) the high-valued customers.

Predictive analysis

How would LTV change using predictive analysis instead of descriptive analysis? First note that while LTV is a future-oriented metric, descriptive analysis uses historical (past) data and the entire metric is built on that, with assumptions about the future applied uniformly to every customer. Predictive analysis specifically thrusts LTV into the future (where it belongs) by using independent variables to predict the next time until purchase. Since the major customer behaviours driving LTV are the timing, amount and number of purchases, a statistical technique is needed that predicts time until an event. (Ordinary regression predicting the LTV amount ignores timing and number of purchases.)

Survival analysis is a technique designed specifically to study time-until-event problems. It has timing built into it and thus a future view is already embedded in the algorithm. This removes much of the arbitrariness of typical (descriptive) LTV calculations.

So, what about using survival analysis to see which independent variables, say, bring in a purchase? Decreasing time until purchase tends to increase LTV. While survival analysis can predict the next time until purchase, the strategic value of survival analysis is in using the independent variables to CHANGE the timing of purchases. That is, descriptive analysis shows what happened; predictive analysis gives a glimpse of what might CHANGE the future.

Strategy using LTV dictates understanding the causes of customer value: why a customer purchases, what increases/decreases the time until purchase, the probability of purchasing at future times, etc. Then, when these insights are learned, marketing levers (shown as independent variables) are exploited to extract additional value from each customer. This means knowing that one customer is, say, sensitive to price and that a discount will tend to decrease their time until purchase. That is, they will purchase sooner (and may purchase larger total amounts, and more often) with a discount. Another customer prefers, say, product X and product Y bundled together, which increases their probability of purchase and decreases their time until purchase. This insight allows different strategies for different customer needs and sensitivities. Survival analysis applied to each customer yields insights to understand and incentivize changes in behaviour.

This means that just assuming past behaviour will continue into the future (as descriptive analysis does), with no idea why, is no longer necessary. It’s possible for descriptive and predictive analysis to give contradictory answers, which is why ‘crawling’ might be detrimental to ‘walking’.

If a firm can get a customer to purchase sooner, there is an increased chance of adding purchases, depending on the product. But even if the number of purchases is not increased, getting revenue sooner adds to the firm’s financial value (time is money).

Also, a business case can be created by showing the trade-off in giving up, say, margin but obtaining revenue faster. This means strategy can revolve around balancing marketing cost against customer value.

The idea is to model the next time until purchase, the baseline, and see how to improve it. How is this carried out? A behaviourally-based method would be to segment the customers (based on behaviour), apply a survival model to each segment and score each individual customer. By behaviour we typically mean purchasing metrics (amount, timing, share of products, etc.) and marcom responses (opens and clicks, direct mail coupons, etc.).

An example

Let’s use an example. Table 5.4 shows two customers from two different behavioural segments. Customer XXX purchases every 88 days, with an annual revenue of 43,958 and costs of 7,296, for a net revenue of 36,662. Say the second year is exactly the same. So year one discounted at 9% has an NPV of 33,635, and year two discounted at 9% for two years is 30,857, for a total LTV of 64,492. Customer YYY has similar calculations, for an LTV of 87,898.

Table 5.4 Comparison of customers from different behavioural segments

Customer  Days between purchases  Annual purchases  Total revenue  Total costs  Net rev YR1  Net rev YR2  YR1 Disc  YR2 Disc  LTV at 9%
XXX       88                      4.148             43,958         7,296        36,662       36,662       33,635    30,857    64,492
YYY       58                      6.293             62,289         12,322       49,967       49,967       45,842    42,056    87,898

The above (using descriptive analysis) would have marketers targeting customer YYY, with more than 23,000 of value over customer XXX. But do we know anything about WHY customer XXX is so much lower valued? Is there anything that can be done to make them higher valued?

Applying a survival model to each segment outputs independent variables and shows their effect on the dependent variable. In this case the dependent variable is (average) time until purchase. Say the independent variables (which defined the behavioural segments) are things like price discounts, product bundling, seasonal messages, adding additional direct mail catalogues and offering online exclusives. The segmentation should separate customers based on behaviour, and the survival models should show how different levels of the independent variables drive different strategies.

Table 5.5 below shows results of survival modelling on the two different customers, who come from two different segments. The independent variables are price discounts of 10%, product bundling, etc. TTE is time until event, and the table shows what happens to time until purchase (in days) when one of the independent variables changes. For example, for customer XXX, giving a price discount of 10% on average decreases their time until purchase by 14 days. Giving YYY a 10% discount decreases their time until purchase by only 2 days. This means XXX is far more sensitive to price than YYY, which would not be known by descriptive analysis alone. Likewise, giving XXX more direct mail catalogues pushes out their TTE, but pulls in YYY’s by 2 days. Note also that none of the marketing levers pulls YYY’s purchases in by more than 2 days. We are already getting nearly all from YYY that we can, and no marketing effort does very much to bring in their TTE. However, with XXX there are several things that can be done to bring in their purchases. Again, none of these would be known without survival modelling on each behavioural segment.

Table 5.5 Results of survival modelling

Variable              XXX TTE (days)   YYY TTE (days)
Price discount 10%               –14               –2
Product bundling                  –4               12
Seasonal message                   6                5
Five more catalogues              11               –2
Online exclusive                 –11                3

Table 5.6 below shows new LTV calculations on XXX after using survival modelling results. We decreased TTE by 24 days by using some combination of discounts, bundling, online exclusives, etc. Note now that the LTV for XXX (after using predictive analysis) is greater than YYY’s.

Table 5.6 LTV calculations

Customer  Days between purchases  Annual purchases  Total revenue  Total costs  Net rev YR1  Net rev YR2  YR1 Disc  YR2 Disc  LTV at 9%
XXX       64                      5.703             60,442         10,032       50,410       50,410       46,248    42,429    88,677
YYY       58                      6.293             62,289         12,322       49,967       49,967       45,842    42,056    87,898
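The arithmetic behind Tables 5.4 and 5.6 can be checked in a few lines of Python. The rescaling step assumes revenue and cost per purchase stay the same when the purchase cycle shortens, so annual totals scale by 88/64; that assumption is inferred from the tables (it reproduces the 60,442 and 10,032 shown) rather than stated in the text.

```python
DAYS_PER_YEAR = 365

def two_year_ltv(revenue, costs, rate=0.09):
    """Same net revenue in years 1 and 2, each discounted at `rate`."""
    net = revenue - costs
    return sum(net / (1 + rate) ** year for year in (1, 2))

def rescale_cycle(revenue, costs, old_days, new_days):
    """Annual revenue and costs if purchases come every `new_days`
    instead of `old_days`, holding revenue and cost per purchase fixed."""
    factor = old_days / new_days
    return revenue * factor, costs * factor

xxx_before = two_year_ltv(43_958, 7_296)            # about 64,492 (Table 5.4)
yyy = two_year_ltv(62_289, 12_322)                  # about 87,898
rev, cost = rescale_cycle(43_958, 7_296, 88, 64)    # 24 days sooner
xxx_after = two_year_ltv(rev, cost)                 # about 88,677 (Table 5.6)
print(DAYS_PER_YEAR / 64, round(rev), round(cost), round(xxx_after))
```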

What survival analysis offers, in addition to marketing strategy levers, is a financially optimal scenario, particularly in terms of costs to market. That is, customer XXX responds to a discount. It’s possible to calculate and test the (just) needed threshold of discount to bring a purchase in by so many days, with the estimated level of revenue. This ends up being a cost/benefit analysis that makes marketers think about strategy. This is the advantage of predictive analysis: giving marketers strategic options.

Checklist

Point out that ‘time until an event’ is a more relevant marketing question than ‘probability of an event’.

Remember that survival analysis came out of biostatistics and is somewhat rare in marketing, but very powerful.

Observe that there are two ‘flavours’ of survival analysis: lifereg and proportional hazards. Lifereg models the survival curve and proportional hazards models the hazard rate.

Champion competing risks, a natural output of survival analysis. In marketing, this gives time until various events, or time until multiple products are purchased, etc.

Understand that predictive lifetime value (using survival analysis) is more insightful than descriptive lifetime value.

06

Modelling dependent variable techniques (with more than one equation)

Introduction

What are simultaneous equations?

Why go to the trouble of using simultaneous equations?

Desirable properties of estimators

Business case

Introduction

So far we’ve dealt with one equation, a rather simple point of view. Of course, consumer behaviour is anything but simple. Marketing science is designed to understand, predict and ultimately incentivize/change consumer behaviour. This requires techniques that are as complicated as that behaviour is sophisticated. This is where simultaneous equations come in, as a more realistic model of behaviour.

Simultaneous equations: a system of more than one dependent variable-type equation, often sharing several independent variables.

What are simultaneous equations?

Simply put, simultaneous equations are systems of equations. You had this in algebra. It’s important. This begins to build a simulation of an entire process. It’s done in macroeconomics (remember the Keynesian equations?) and it can be done in marketing.

Predetermined and exogenous variables

There are two kinds of variables: predetermined (lagged endogenous and exogenous) and endogenous variables. Generally, exogenous variables are determined OUTSIDE the system of equations and endogenous variables are determined INSIDE the system of equations. (Think of endogenous variables as being explained by the model.) This comes in handy to know when using the rule for the identification problem below. (The identification problem is a GIANT pain in the neck, but the model cannot be estimated without going through these hoops.)

This is important because a predetermined variable is one that is contemporaneously uncorrelated with the error term in its equation. Note how this ties up with causality. If Y is caused by X then Y cannot be an independent variable in contemporaneously predicting/explaining X.

Say we have a system common in economics:

Q(demand) = D(I) + D(price) + Income + D(error)

Q(supply) = S(I) + S(price) + S(error)

Note that the variables Q and price are endogenous (computed within the system) and income is exogenous. That is, income is given. (D(I) is the intercept in the demand equation and S(I) is the intercept in the supply equation.) These equations are called the structural form of the model. Algebraically, the structural form can be solved for the endogenous variables, giving the reduced form of the equations.

Reduced form equations: in econometrics, models solved for the endogenous variables in terms of the predetermined variables and error terms.

That is:
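Written out with explicit coefficients, the reduced form follows from setting quantity demanded equal to quantity supplied and solving for P and Q. The symbols here are introduced for illustration rather than taken from the text: α0 plays the role of D(I) and β0 of S(I), α1 and β1 are the price coefficients, α2 is the income coefficient, Y is income, and u_D and u_S are the demand and supply errors.

```latex
\begin{aligned}
\text{Demand: } Q &= \alpha_0 + \alpha_1 P + \alpha_2 Y + u_D \\
\text{Supply: } Q &= \beta_0 + \beta_1 P + u_S \\[4pt]
P &= \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}
   - \frac{\alpha_2}{\alpha_1 - \beta_1}\,Y
   + \frac{u_S - u_D}{\alpha_1 - \beta_1} \\
Q &= \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}
   - \frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1}\,Y
   + \frac{\alpha_1 u_S - \beta_1 u_D}{\alpha_1 - \beta_1}
\end{aligned}
```

Both endogenous variables end up depending only on income and the two error terms, and P contains both u_D and u_S.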

The reduced form of the equations shows how the endogenous variables (those determined within the system) DEPEND on the predetermined variables and error terms. That is, the values of Q and P are explicitly determined by income and the errors. This means that income is given to us.

Note that the endogenous variable price appears as an independent variable in each equation. In fact, it is NOT independent: it depends on income and the error terms, and this is the issue. It is specifically correlated with its own (contemporaneous) error term. Correlation of an independent variable with its error term leads to inconsistent estimates.
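A small simulation makes the inconsistency concrete. The Python sketch below generates data from a supply-demand system of the form above (all parameter values are made up), then estimates the supply slope two ways: by ordinary least squares, which ignores that price is correlated with the supply error, and by the simple instrumental-variables estimator using income, which appears in demand but is excluded from supply (with one instrument this is the same as two-stage least squares).

```python
import random

def simulate(n=20_000, seed=1):
    """Demand: Q = a0 + a1*P + a2*Y + uD; supply: Q = b0 + b1*P + uS.
    Price is whatever clears the market, so it contains both errors."""
    a0, a1, a2 = 100.0, -2.0, 0.5   # made-up demand parameters
    b0, b1 = 10.0, 1.5              # made-up supply parameters
    rng = random.Random(seed)
    prices, quantities, incomes = [], [], []
    for _ in range(n):
        y = rng.uniform(50, 150)                      # income: exogenous
        ud, us = rng.gauss(0, 10), rng.gauss(0, 10)
        p = (b0 - a0 - a2 * y + us - ud) / (a1 - b1)  # reduced form for price
        prices.append(p)
        incomes.append(y)
        quantities.append(b0 + b1 * p + us)
    return prices, quantities, incomes

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

prices, quantities, incomes = simulate()
ols_slope = cov(prices, quantities) / cov(prices, prices)    # inconsistent
iv_slope = cov(incomes, quantities) / cov(incomes, prices)   # consistent for b1
print(f"true supply slope 1.5, OLS {ols_slope:.2f}, IV {iv_slope:.2f}")
```

With these settings the OLS slope settles near 0.64 no matter how large n gets, while the IV estimate converges to the true 1.5.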

Why go to the trouble of using simultaneous equations?

First, because it’s fun. Also note that if a system should be modelled with simultaneous equations and IS NOT, the parameter estimates are INCONSISTENT! Lastly, insights are more realistic. The simulation suggests the appropriate complexity.

Conceptual basics

Generally, any economic model has to have the number of variables whose values are to be explained equal to the number of independent relationships in the model. This is the identification problem.

Many textbooks (Kmenta, Kennedy, Greene, etc.) give the mathematical derivation for the solution of simultaneous equations. The general problem is that there have to be enough known variables to ‘fix’ each unknown quantity estimated. That is, there needs to be a rule. The good news is that there is. Here is the rule for solving the identification problem:

The number of predetermined variables excluded in the equation MUST be >= the number of endogenous variables included in the equation, less one.

Let’s use this rule on the supply-demand equations above:

Q(demand) = D(I) + D(price) + Income + D(error)

Q(supply) = S(I) + S(price) + S(error)

Demand: the number of predetermined variables excluded = zero. Income is the only predetermined variable and it IS NOT excluded from the demand equation. The number of endogenous variables included less one = 2 – 1 = 1. The two endogenous variables are quantity and price. So the number of predetermined variables excluded from the equation (0) is < the number of endogenous variables included in the equation, less one (1). Therefore the demand equation is under-identified.

Supply: the number of predetermined variables excluded = one. Income is the only predetermined variable and it is excluded from the supply equation. The number of endogenous variables included less one = 2 – 1 = 1. The two endogenous variables are quantity and price. So the number of predetermined variables excluded from the equation (1) equals the number of endogenous variables included in the equation, less one (1). Therefore the supply equation is exactly identified.
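The counting rule is mechanical enough to code. This Python sketch (the function names are hypothetical) applies it to the supply-demand example. It checks only this counting rule, usually called the order condition; econometrics texts pair it with a rank condition, which counting alone cannot verify.

```python
def order_condition(excluded_predetermined, included_endogenous):
    """Compare excluded predetermined variables with included endogenous
    variables less one."""
    needed = included_endogenous - 1
    if excluded_predetermined < needed:
        return "under-identified"
    if excluded_predetermined == needed:
        return "exactly identified"
    return "over-identified"

def classify(equation_vars, endogenous, predetermined):
    excluded = len(predetermined - equation_vars)
    included = len(endogenous & equation_vars)
    return order_condition(excluded, included)

ENDOGENOUS = {"Q", "price"}
PREDETERMINED = {"income"}
print("demand:", classify({"Q", "price", "income"}, ENDOGENOUS, PREDETERMINED))
print("supply:", classify({"Q", "price"}, ENDOGENOUS, PREDETERMINED))
```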

Desirable properties of estimators

We have not talked about (and it’s about time we did) what the desirable properties of estimators are. That is, we have spent effort estimating coefficients on, say, price and advertising, but have not discussed how to know if the estimator is ‘good’. That is the purpose of the following brief description. If you need a fuller (more theoretically statistical) background, virtually any econometrics textbook will suffice. (As mentioned in the introduction to this book, I personally like Kmenta’s Elements of Econometrics and Kennedy’s A Guide to Econometrics.)

Unbiasedness

A desirable property most econometricians agree on is unbiasedness. Unbiasedness has to do with the sampling distribution (remember the statistical introduction chapter? You didn’t think that would ever be mentioned again, did you?).

If we take an unlimited number of samples, estimate whatever coefficient we’re estimating in each sample, and plot the distribution of those estimates, what we end up with is the sampling distribution of the beta coefficient of that variable. For an unbiased estimator, the average of these estimates is the correct value of the beta coefficient. Honest. Now what does this mean? It means the estimator of beta is said to be unbiased if the mean of the sampling distribution (over a very large number of samples) is the same value as the true beta coefficient. That is, if the average value of the estimated beta in repeated sampling is beta, then the estimator for beta is unbiased. Note that this does NOT mean that any one estimated value of beta is the correct value of beta. It means ON AVERAGE the estimated value of beta will be the value of beta. Sounds like double talk, huh?
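A quick Monte Carlo in Python shows what ‘unbiased on average’ means. The setup (true slope 2.0, normal errors, sample size 30, fixed X values) is an arbitrary example: the OLS slope is re-estimated on thousands of fresh samples, the estimates average out close to the true value, yet individual estimates scatter widely around it.

```python
import random
import statistics

def ols_slope(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def sampling_distribution(true_beta=2.0, n=30, reps=5_000, seed=7):
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]   # X fixed in repeated samples
    estimates = []
    for _ in range(reps):
        # Fresh errors each time: a new sample from the same population
        y = [1.0 + true_beta * xi + rng.gauss(0, 3) for xi in x]
        estimates.append(ols_slope(x, y))
    return estimates

est = sampling_distribution()
print(f"mean of estimates {statistics.fmean(est):.3f}, "
      f"min {min(est):.2f}, max {max(est):.2f}")
```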

The obvious question is how do you know if your estimator is unbiased? That is unfortunately a very mathematically complex discussion. The short answer is: it depends on how the data is generated, and it depends a lot on the distribution of the error term of the model. Remember, statistics uses inductive thinking (not deductive thinking), so it is viewed from inferences, indirectly. That is, an estimator, say, via regression, is designed with these properties in mind. Thus these properties produce assumptions that take into account how the data is generated, what that does to the disturbance, and hence what that means for the sampling distribution. As an example, for regression, the assumptions are:

1. The dependent variable actually DEPENDS on a linear combination of independent variables and coefficients.
2. The average of the error term is zero.
3. The error terms have no serial correlation and have the same variance (for all values of the independent variables).
4. The independent variables are fixed in repeated samples, often called non-stochastic X.
5. There is no perfect collinearity between the independent variables.

In a very real way, econometric modelling is all about dealing with (detecting and correcting) violations of the above assumptions. Just to make the obvious point: these assumptions are made so that the sampling distributions of the parameter estimates have desirable properties, such as unbiasedness. Now, how important is unbiasedness? Some econometricians claim it is VERY important, and they spend all their time and effort around that (and other properties). I myself take little comfort in unbiasedness. I want to know if the estimators are biased or not, maybe even a guess as to how much, but in the real world it is not often of much practical matter. This is because you could theoretically have any number of samples, and while on average the sampling distribution IS centred on the real beta, you never really know which sample you have. It’s possible you have an unusually bad sample. And in the real world you are not usually able to take many samples; indeed you usually only have ONE, the one in front of you.

Efficiency

What is often more meaningful, after unbiasedness, is efficiency: an estimator that has the minimum variance of all the unbiased estimators. In simple terms, it means that estimator, of all the unbiased estimators, has the smallest variance.

Consistency

Unbiasedness and efficiency are about the sampling distribution of the estimated coefficient and do not depend on the size of the sample. Asymptotic properties are about the sampling distribution of the estimated coefficient in large samples. Consistency is an asymptotic (large sample) property.

Because the sampling distribution changes as the sample size increases, the mean and the variance can change. Consistency is the property that the sampling distribution of the estimated beta collapses to the point of the true (population) beta value as the sample size increases to infinity.
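Consistency can be separated from unbiasedness with a classic example, sketched here in Python with arbitrary settings: estimating a variance by dividing by n instead of n – 1. That estimator is biased (its average is (n – 1)/n times the true variance, so 0.8 at n = 5) but consistent, because its sampling distribution closes in on the true value as n grows.

```python
import random
import statistics

def variance_divide_by_n(sample):
    """Biased in small samples, but consistent."""
    m = statistics.fmean(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

def simulate(n, reps, seed=3):
    """Sampling distribution of the estimator; the true variance is 1."""
    rng = random.Random(seed)
    return [variance_divide_by_n([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(reps)]

for n, reps in [(5, 4_000), (50, 1_000), (2_000, 200)]:
    est = simulate(n, reps)
    print(f"n={n:>5}: mean {statistics.fmean(est):.3f}, "
          f"std dev {statistics.stdev(est):.3f}")
```

Both the bias and the spread shrink as n grows, which is the ‘collapse to a point’ that consistency describes.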

Consistency is something I like a lot, because (in database marketing, for example) we typically work with very large samples and therefore can take comfort in the sampling properties of the estimators.

Why am I bringing all the above up now? Because in simultaneous equations, the only property the estimators can have (because the independent variables will NOT be fixed in repeated samples, that is, the non-stochastic X assumption is violated) will be consistency.

BUSINESS CASE

Scott’s boss called him into his office. The subject of the meeting invite was ‘Cannibalization?’

‘Scott, our pricing teams are always at war, as you know. We have always felt that one product could cannibalize another with wild pricing from the product teams.’

‘Yeah, we talk about that every quarter.’

‘What I wondered was, given your success at quantifying so much of our marketing operations, can we do something about cannibalization?’

‘What do you mean, “do something about it”?’

‘Can we put together some model of optimization? What prices SHOULD the three product teams charge, in order to maximize our overall revenue?’

‘So it’s pricing for the enterprise instead of pricing for the product. That sounds like a very complicated problem.’

‘But it is similar to the elasticity modelling that you did, especially in terms of substitutes, right?’

‘Yeah, I think so. I’m not sure how to get the demand of each product into the regression. I’ll have to research it.’

‘Great, thanks. E-mail me your ideas tomorrow.’

Scott looked at him and blinked. His boss turned his chair around and went back to looking over his other e-mails. Scott got up and went back to his place, a little bewildered.

Could it be just having a demand equation for, say, desktops that included the price of desktops as well as the prices of notebooks and servers? That did not seem like it took into account all of the information available. That is, there must be cross-equation correlation, meaning consumers feel the prices of notebooks change as they shop for a desktop, etc. What Scott needed was a way to simultaneously model the impact of each product’s price on each product’s demand.

The above is a demand system. It is a set of three simultaneous equations that are solved (naturally) simultaneously. This set of equations posits that the demand (quantity) of each product is impacted by the own-price of the product as well as the cross-prices of the other products.
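A sketch of what such a system might look like for desktops (DT), notebooks (NB) and workstations (WS), the three products in the data Scott later collects. The symbols are introduced here for illustration: b_ii is an own-price coefficient (expected to be negative), b_ij is a cross-price coefficient (positive when product j is a substitute for product i), z_i collects the other variables in equation i (marketing and seasonality), and the errors ε may be correlated across equations, which is the cross-equation correlation mentioned above.

```latex
\begin{aligned}
Q_{DT} &= a_{DT} + b_{DT,DT}\,P_{DT} + b_{DT,NB}\,P_{NB} + b_{DT,WS}\,P_{WS} + \gamma_{DT}^{\top} z_{DT} + \varepsilon_{DT} \\
Q_{NB} &= a_{NB} + b_{NB,DT}\,P_{DT} + b_{NB,NB}\,P_{NB} + b_{NB,WS}\,P_{WS} + \gamma_{NB}^{\top} z_{NB} + \varepsilon_{NB} \\
Q_{WS} &= a_{WS} + b_{WS,DT}\,P_{DT} + b_{WS,NB}\,P_{NB} + b_{WS,WS}\,P_{WS} + \gamma_{WS}^{\top} z_{WS} + \varepsilon_{WS}
\end{aligned}
```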

Note that the approach here will be fairly brief and econometrically oriented. For a detailed mathematical and microeconomically oriented treatment, see Angus Deaton and John Muellbauer’s outstanding 1980 work Economics and Consumer Behavior. In that book they thoroughly detail consumer demand and demand systems, wherein they ultimately posit the (unfortunately named) Almost Ideal Demand System (AIDS).

So Scott researched simultaneous equations. Right away it was obvious that this technique violates the OLS assumption of independent variables fixed in repeated samples, or non-stochastic X. That is, the value of each price (an independent variable) depended on the values of the variables in the other equations. This ultimately meant the only desirable property (not unbiasedness, not efficiency) was consistency. That is, simultaneous equations estimators have desirable asymptotic properties.

Scott found another issue resulting from simultaneous equations: the identification problem. He had to apply the rule (mentioned above) that each equation be at least exactly identified. Recall the rule for identification is:

The number of predetermined variables excluded in the equation must be >= the number of endogenous variables included in the equation, less one.

Now Scott had to put together the equations from the data he collected. He got weekly data on desktop, notebook and workstation sales (units) for the last three years. He got total revenue of each as well, which would give him average price (price = total revenue / units). He would use seasonality and consumer confidence. He collected the number of direct mails sent and the number of e-mails sent, opened and clicked by week.

Scott put together the results from the model (Table 6.1). Note the identification status on all is ‘over-identified’. For desktops: the number of predetermined variables excluded is 4 (number of e-mails, number of visits, January and October) and the number of endogenous variables included, less one, is 3 (quantity of desktops, price of desktops, price of notebooks and price of workstations: four, less one). Thus, 4 > 3. For notebooks: the number of predetermined variables excluded is 4 (number of direct mails, consumer confidence, December and October) and the number of endogenous variables included, less one, is 3 (quantity of notebooks, price of desktops, price of notebooks and price of workstations). Thus, 4 > 3. For workstations: the number of predetermined variables excluded is 6 (number of e-mails, number of direct mails, number of visits, consumer confidence, December and August) and the number of endogenous variables included, less one, is 3 (quantity of workstations, price of desktops, price of notebooks and price of workstations). Thus, 6 > 3.

Table 6.1 Model results

              Price DT  Price NB  Price WS  # DMs  # EMs  # Visits  Cons conf   Jan   Dec   Oct   Aug
Quantity DT       –1.2       2.3       0.4    3.7     XX        XX        5.3    XX   1.2    XX   0.5
Quantity NB        1.1        –2       0.2     XX    6.2       2.2         XX  –0.8    XX    XX   2.9
Quantity WS        0.2       0.8      –2.6     XX     XX        XX         XX  –1.1    XX  –1.9    XX

XX = variable excluded from that equation.

Now, what does Table 6.1 mean? This was designed as an optimal pricing problem. What does the model tell Scott?

First, since the focus is on pricing and specifically cannibalization, look at the desktop model. The price coefficient is negative, as we’d expect: price goes up, quantity goes down. Now notice the coefficient on notebook price. It’s positive (+2.3). This means notebooks are seen (by desktop buyers) as a potential substitute. Note that, because the relationship is positive, if notebook prices go down the quantity of desktops will GO DOWN as well. This is key strategic information. It means the pricing people cannot (and never could) price in a vacuum. Remember Hazlitt’s book Economics in One Lesson (1979)? The lesson was that everything is (directly or indirectly) connected. What happens with notebook prices affects what happens to desktop demand. This means a portfolio approach should be taken and not a silo approach. Note as well that, in the desktop equation, workstations are also a substitute, but a weaker one (+0.4). It’s obvious that this information can be used to maximize total profit. It might be that one particular brand (or product) will subsidize others, but a successful firm will operate as an enterprise. Similar conclusions hold for the other products, in terms of pricing.
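To see the portfolio point numerically, the Python sketch below applies Table 6.1’s price coefficients to a price change. It treats the coefficients as linear marginal effects with everything else held constant, which is an assumption: the text does not state the units or functional form of the model.

```python
# Price coefficients from Table 6.1: equation (quantity) -> price variable
PRICE_COEFS = {
    "DT": {"DT": -1.2, "NB": 2.3, "WS": 0.4},
    "NB": {"DT": 1.1, "NB": -2.0, "WS": 0.2},
    "WS": {"DT": 0.2, "NB": 0.8, "WS": -2.6},
}

def quantity_changes(price_changes):
    """Change in each product's quantity for the given price changes,
    summing own-price and cross-price effects."""
    return {product: sum(coefs[p] * dp for p, dp in price_changes.items())
            for product, coefs in PRICE_COEFS.items()}

# Notebook team cuts its price by one unit
print(quantity_changes({"NB": -1.0}))
```

A one-unit notebook price cut raises notebook quantity by 2.0 but lowers desktop quantity by 2.3 and workstation quantity by 0.8, which is the cannibalization the pricing teams were arguing about.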

The other independent variables are interpreted likewise. Consumer confidence and number of direct mails are positive in influencing desktop sales but do not enter the other products’ equations. For notebooks, e-mails and visits are positive, but January seasonality is negative (August is positive). For workstations both January and October are negative. All of this is strategically lucrative. For example, don’t send e-mails to desktop targets, don’t send direct mails to notebook targets and don’t do much marcom in January.

Scott used the above model to help reorganize the pricing teams. They began to price as an enterprise and not in silos. Not all of them liked it at first, but the increases in revenue (which translated into bonuses for them) helped to assuage their misgivings.

Conclusion

Simultaneous equations can quantify phenomena and can give answers impossible to get otherwise. Yes, it’s difficult, and requires specialized software and a high level of expertise. But, as the business case above shows, how else would the firm know about optimizing prices across products or brands? In short, the price is worth it.

Checklist

Learn to enjoy the added complexity that simultaneous equations bring to analytics: it better matches consumer behaviour.

Remember that simultaneous equations use two kinds of variables: predetermined (lagged endogenous and exogenous) and endogenous variables.

Point out that estimators have desirable properties: unbiasedness, efficiency, consistency, etc.

Observe that econometrics is really all about detecting and correcting violations of assumptions (linearity, normality, spherical error terms, etc.).

Prove that simultaneous equations can be used for optimal pricing and understanding cannibalization between products, brands, etc.

Part three

Inter-relationship techniques

07

Modelling inter-relationship techniques

What does my (customer) market look like?

Introduction

Introduction to segmentation

What is segmentation? What is a segment?

The four Ps of strategic marketing

Criteria for actionable segmentation

A priori or not?

Conceptual process


IntroductionAsmentionedearlier,therearetwogeneraltypesofmultivariateanalysis:dependentvariabletechniquesandinter-relationshiptechniques.Mostofthefirstpartofthisbookhasbeenconcernedwithdependentvariabletechniques.Theseincludeallofthetypesofregression(ordinary,logistic,survivalmodelling,etc.),aswellasdiscriminateanalysis,conjointanalysis,etc.

The point of dependent variable techniques is to understand to what extent the dependent variable depends on the independent variables. For example, how does price affect units? Here units is the dependent variable (something we are trying to understand or explain) and price is the independent variable, a variable that is hypothesized to cause the movement in the dependent variable.

Inter-relationship techniques have a completely different point of view. These include multivariate algorithms like factor analysis, segmentation, multi-dimensional scaling, etc. Inter-relationship techniques try to understand how variables (price, product purchases, advertising spend, etc.) interact (inter-relate) with each other. Remember how factor analysis was used to correct for collinearity in regression? It did this by extracting the variance of the independent variables in such a way that each factor (which contained the variables) was uncorrelated with all other factors; that is, the inter-relationship between the independent variables was used to construct the factors.

This section will spend considerable effort on an inter-relationship technique that is of utmost interest and importance to marketing: segmentation.

Introduction to segmentation

Ok. This introductory chapter is designed to detail some of the strategic uses and necessities of segmentation. The chapter following this one will dive into more of the analytic techniques and what segmentation output may look like. Segmentation is often the biggest analytic project available and one that potentially provides more strategic insights than any other. Plus, it’s fun!

What is segmentation? What is a segment?

A good place to start is to make sure we know what we’re talking about. Radical, I know. By definition, segmentation is a process of taxonomy, a way to divide something into parts, a way to separate a market into sub-markets. It can be called things like ‘clustering’ or ‘partitioning’. Thus, a market segment (cluster) is a sub-set of the market (or customer market, or database, etc.).

Segmentation: in marketing strategy, a method of sub-dividing the population into similar sub-markets for better targeting, etc.

The general definition of a segment is that members are ‘homogeneous within and heterogeneous between’. That means that in a good segmentation solution, all the members (say, customers) within a segment are very similar to each other but very dissimilar to all members of all other segments. Homogeneous means ‘same’ and heterogeneous means ‘different’.

It’s possible to use very advanced statistical algorithms to accomplish this, or it can be a very crude business rule. The next chapter will mention a few statistical techniques for doing segmentation. Note that a business rule could simply be, ‘Separate the database into four parts: highest use, medium use, low use and no use of our product’. This managerial fiat has been (and still is) used by many companies.

RFM (recency, frequency and monetary variables) is another simple business rule: separate the database into, say, deciles based on three metrics: how recently a customer purchased, how frequently a customer purchased and how much money a customer spent. Many companies are not doing much more than this in terms of segmentation. These companies are certainly not marketing companies, because techniques like RFM are really from a financial, and not a customer, point of view. In general, then, a segment is that entity wherein all members assigned to that segment are, by some definition, alike.
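To make the decile rule concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed implementation: the `Customer` fields are hypothetical, a score of 10 is assumed to mark the "best" tenth (most recent, most frequent, highest spend), and ties are broken arbitrarily by sort order.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields; real RFM inputs come from the transaction file.
    customer_id: str
    days_since_last_purchase: int
    purchase_count: int
    total_spend: float


def decile_scores(values, higher_is_better=True):
    """Score each value 1..10 by rank, where 10 is the 'best' tenth."""
    n = len(values)
    # Sort worst-to-best so that rank 0 is the worst value.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 10 // n + 1  # map ranks evenly onto 1..10
    return scores


def rfm_scores(customers):
    """Return {customer_id: (R, F, M)} decile scores."""
    # Recency: fewer days since the last purchase is better.
    r = decile_scores([c.days_since_last_purchase for c in customers], higher_is_better=False)
    f = decile_scores([c.purchase_count for c in customers])
    m = decile_scores([c.total_spend for c in customers])
    return {c.customer_id: (r[i], f[i], m[i]) for i, c in enumerate(customers)}


customers = [Customer(f"c{i}", i + 1, i, 100.0 * i) for i in range(10)]
print(rfm_scores(customers)["c0"])  # (10, 1, 1): most recent, least frequent, lowest spend
```

Every input here is a transactional, financial metric; nothing in the three scores says why a customer behaves as they do, which is the firm-centric point of view described above.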


So, why segment at all? There are three typical uses of segmentation: finding similar members, making modelling better and – most important – using marketing strategy to attack each segment differently.

Finding homogeneous members is a valuable use of a statistical technique. The business problem tends to be: find all those that are ‘alike’ and see how, say, satisfaction differs between them, or find all those that are ‘homogeneous’ by some measure and see how usage varies between them.

A simple example might be in, say, telecommunications, where we are looking at churn (attrition) rates. We want to understand the motivation of churn: what behaviour can predict churn. So, conduct segmentation to identify customers in each segment that are alike in all ways important to the business (products, usage, demographics, channel preferences, etc.) and show different churn rates by segment. Note that churn is not the variable that all segments are alike on; churn is what we are trying to understand. Thus we control for several influences (all members within a segment are alike) and now can see high versus low churners, after all other significant variables have been controlled for.

A second usage, also sophisticated and nuanced, is to use segmentation to improve modelling. In the above churn example, say segmentation was done and we want to predict churn. We run a separate regression model for each segment and find that different independent variables affect churn differently. This will be far more accurate (and actionable) than one (average) model applied to everyone without segmentation. This approach takes advantage of the different reasons for churning. One segment might churn due to dropped calls, another might churn because of the price of the plan and another is sensitive to their bill based on calls, minutes and data used. Thus, each model will exploit these differences and be far more accurate than otherwise. The more accurate the model, the greater the insights; the greater the understanding, the more obvious the strategy of how to combat churn in each segment.

But from a marketing point of view, the reason to segment is the simple answer that not everyone is alike; not all customers are the same. One size does not fit all.

I’d even offer a tweak on ‘segmentation’ at this point. Market segmentation uses the marketing concept, where the customer is king and strategy is therefore customer-centric. Note that an algorithm like RFM is from the firm’s (financial) point of view, with metrics that are important to the firm. RFM is about designing value tiers based on a financial perspective (see Chapter 8 highlight, ‘Why go beyond RFM?’).

Since marketing segmentation should be from the customer’s point of view, why do segmentation? That is, how does ‘one size does not fit all’ operate in terms of customer-centricity?

Generally, it’s based on recognizing that different customers have different sensitivities. These different sensitivities cause them to behave differently because they are motivated differently.

This means considerable effort needs to be applied to learn what makes each behavioural segment a segment. (The specific techniques to do this are explained in the next chapter.) It means developing a strategy to exploit these different sensitivities and motivations.

Usually there is a segment sensitive to price and a segment not sensitive to price. Often there is a segment that prefers one channel (say online) and a segment that prefers another channel (say offline). Typically one segment will have high penetration of product X while another segment will have high penetration of product Y. One segment needs to be communicated to differently (style, imaging, messaging, etc.) than another segment. Note that this is far more involved than a simple business rule.

The idea is that if a segment is sensitive to, say, price, then those members should get a discount or a better offer in order to maximize their probability to purchase (they face an elastic demand curve). The segment that is not sensitive to price (because they are loyal, wealthy, have no substitutes available, etc.) should not be given the discount because they don’t need it in order to purchase.

I know the above adds complexity to the analysis. But note that consumer behaviour IS complex. Behaviour incorporates simultaneous motivations and multidimensional factors, and is sometimes nearly irrational (remember Dan Ariely’s book, Predictably Irrational?).

Understanding consumer behaviour requires a complex, sophisticated solution if the goal is to do marketing, if the goal is to be customer-centric. A simpler solution won’t work. It is much like the problem that happens when we take a three-dimensional globe of the earth and spread it out over a two-dimensional space: Greenland is now way off in size; the world is wrong. Being overly simplistic produces wrong results, just as applying a univariate solution to a multivariate problem will produce wrong results.

For the MBA (who seems to need a list à la PowerPoint) I’d suggest the following as benefits of segmentation:

Marketing Research: learning WHY. Segmentation provides a rationale for behaviour.

Marketing Strategy: targeting by product, price, promotion and place. Strategy uses the marketing mix by exploiting segment differences.

Marketing Communications: messaging and positioning. Some segments need a transactional style of communication; other segments need a relationship style of communication. One size does not fit all.

Marketing Economics: imperfect competition leads to price makers. With the firm communicating just the right product at just the right price in just the right channel at just the right time to the most needy target, such compelling offers give the firm nearly monopolistic power.

The four Ps of strategic marketing

Segmentation is part of a strategic marketing process called the four Ps of strategic marketing, coined by Philip Kotler. Kotler is probably the most widely recognized marketing guru in the world, essentially creating the discipline of marketing as separate from economics and psychology. He wrote many textbooks including Marketing Management (1967), now in its 14th edition, which has been used for decades as the pillar of all marketing education.

Most marketers are aware of the four Ps of tactical marketing: product, price, promotion and place. These are often called the ‘marketing mix’. But before these are applied, a marketing strategy should be developed, based on the four Ps of strategic marketing.

Partition

The first step is to partition the market by applying a (behavioural) segmentation algorithm to divide the market into sub-markets. This means recognizing strategically that one size does not fit all, and understanding that each segment requires a different treatment to maximize revenue/profit or satisfaction/loyalty.

Probe

This second step is usually about additional data. Often this may come from marketing research, probing for attitudes about the brand, its competitors, shopping and purchasing behaviour, etc. Sometimes it can come from demographic overlay data, which is especially valuable if it includes lifestyle information. Last, probing data can come from variables created from the database itself. These tend to be around velocity (time between purchases) or share of products penetrated (what percent the customer buys of category X, what percent of category Y, etc.), seasonality, consumer confidence and inflation, etc.

Prioritize

This step is a financial analysis of the resulting segments. Which are most profitable, which are growing fastest, which require more effort to keep or cost to serve, etc.? Part of the point of this step is to find those that we might decide to DE-market, that is, those that are not worth the effort to communicate to.

Position

Positioning is about using all of the above insights and applying an appropriate message, or the correct look and feel and style. This is the tool that allows the creation of compelling messages based on a segment’s specific sensitivities. This marketing communication is often called marcom. This incorporates the four Ps of tactical marketing.

Criteria for actionable segmentation

I’ve always thought the list below guided a segmentation project that ended up being actionable. This too probably came from Philip Kotler (as do most things that are good and important in modern marketing).

Identifiability. In order to be actionable, each segment has to be identifiable. Often this is the process of scoring the database, with each customer having a probability of belonging to each segment.

Substantiality. Each segment needs to be substantial enough (large enough) to make marketing to it worthwhile. Thus there’s a balance between distinctiveness and size.

Accessibility. Not only do the members of the segment have to be identifiable, they have to be accessible. That is, there has to be a way to get to them in terms of marketing efforts. This typically requires having contact info: e-mail, direct mail, SMS, etc.

Stability. Segment membership should not change drastically. The things that define the segments should be stable so that marketing strategy is predictable over time. Segmentation assumes there will be no drastic shocks in demand, or radical changes in technology, etc., in the foreseeable future.

Responsiveness. To be actionable, the segmentation must drive responses. If marcom data is one of the segmentation dimensions, this is usually achievable.

A priori or not?

As this is a practitioner’s guide to marketing science, it should come as no surprise that I advocate statistical analysis to perform segmentation. However, it’s a fact that sometimes there are (top-down) dictums that define segments. These are managerial fiats that demand a market be defined (a priori) by managerial judgment, rather than by some analytic technique. The usual dimension(s) managers want to artificially define their market by tend to be usage, profit, satisfaction, size, growth, etc. Analytically, this is a univariate approach to what is clearly a multivariate problem.

In my opinion, there is a place for managerial judgment, but it is NOT in segment definition. After the segments are defined, managerial judgment should ascertain whether the solution makes sense and whether the segments themselves are actionable.

Conceptual process

Settle on a (marketing/customer) strategy

The general first step in behavioural segmentation is one of strategy. After the firm establishes goals, a strategy needs to be in place to reach those goals. There should be a champion, a business leader, a stakeholder who is the ultimate user of the segmentation.

Analytics needs to recognize that a segmentation not driven by strategy is akin to a body without a skeleton. Strategy supports everything. A very different segmentation should result if the strategy is about market share as opposed to a strategy about net margin.

A strategy discussion should revolve around customer behaviour. What is the customer’s mindset? What is the behaviour we are trying to understand? What incentive are we employing? Any good segmentation solution should tie together customer behaviour and marketing strategy. Remember, marketing is customer-centric.

Collect appropriate (behavioural) data

The next analytic step in behavioural segmentation is to collect appropriate (behavioural) data. This tends to be generally around transactions (purchases) and marcom responses.

A few comments ought to be made about what is meant by ‘behavioural data’. My theory of consumer behaviour (and it’s okay if you don’t agree) is to envision four levels (see Figure 7.1 overleaf): primary motivations, experiential motivations, behaviours and results. Results (typically financial) are caused by behaviours (usually some kind of transaction: purchases and marcom responses), which are caused by one or both (primary and experiential) motivations. Primary motivations (price valuation, attitudes about lifestyle, tastes and preferences, etc.) are generally psychographic and not really seen. They are motivational causes (searching, need arousal, etc.) without brand interaction. Experiential motivations tend to have brand interaction and are another motivator to additional behaviours that ultimately cause (financial) results. These motivations are things like loyalty, engagement, satisfaction, etc. Note that engagement is an experiential cause (there has been interaction with the brand) and is not a behaviour. Engagement would be measured by metrics like recency and frequency. There will be more on this topic when we discuss RFM (see Chapter 8 highlight). I’ll warn you this is one of my soapboxes.

Figure 7.1 (surviving labels: Revenue, Growth, Margin)

Usually transactions and marcom responses (from direct mail, e-mail, etc.) are the main dimensions of behavioural segmentation. Often additional variables are created from these dimensions.

We want to know how many times a customer purchased, how much each time, what products were purchased, what categories each product purchased belonged to, etc. Often valuable profiling variables go along with this, including net margin on each purchase, cost of goods sold, etc. We want to know the number of transactions over a period of time, the number of units and whether any discounts were applied to these transactions.

In terms of marcom responses, we want to collect what kind of vehicle (direct mail, e-mail, etc.), opens, clicks, website visits, store purchases, discounts used, etc. We want to know when each vehicle was sent and what category of product was featured on each vehicle. Any versioning needs to be collected, and any offers/promotions, etc., need to be annotated in the database. All of this data surrounding transactions and responses is the basis of customer behaviour.

Generally we expect to find a segment that is heavily penetrated in one type of category (broad products purchased) but not another, and this will differ across more than one segment. As bears repeating, one segment is heavily penetrated by category X, while another is heavily penetrated by category Y, etc. We also expect to find one or more segments that prefer e-mail or online but not direct mail, or vice versa. We typically find a segment that is sensitive to price and one that is not sensitive to price. These insights come from different behavioural dimensions.

Create/use additional data

Now comes the fun part. Here you can create additional data. At a minimum, this data includes seasonality variables, time between each purchase, time between categories purchased, peaks and valleys of transactions, units and revenue, share of categories (percent of baby products compared to total, percent of entertainment categories compared to total), etc. There should be metrics like number of units and transactions per customer, percent of discounts per customer, top two or three categories purchased per customer, etc. All of these can be used/tested in the segmentation.
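As an illustration of creating such variables, the Python sketch below derives three of them per customer: transaction count, average days between purchases (velocity) and revenue share by category. It assumes a hypothetical transaction layout of `(customer_id, purchase_date, category, revenue)` tuples; a real database will have its own schema.

```python
from collections import defaultdict
from datetime import date


def behavioural_features(transactions):
    """Derive per-customer velocity and category-share variables.

    transactions: iterable of (customer_id, purchase_date, category, revenue)
    tuples, a hypothetical layout used only for illustration.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, category, revenue in transactions:
        by_customer[customer_id].append((purchase_date, category, revenue))

    features = {}
    for customer_id, rows in by_customer.items():
        rows.sort(key=lambda row: row[0])  # chronological order
        dates = [row[0] for row in rows]
        # Velocity: days between consecutive purchases.
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        revenue_by_category = defaultdict(float)
        for _, category, revenue in rows:
            revenue_by_category[category] += revenue
        total = sum(revenue_by_category.values())
        share = {cat: (rev / total if total else 0.0) for cat, rev in revenue_by_category.items()}
        features[customer_id] = {
            "transactions": len(rows),
            "mean_days_between": sum(gaps) / len(gaps) if gaps else None,
            "category_share": share,
            "top_category": max(share, key=share.get),
        }
    return features


tx = [
    ("A", date(2024, 1, 1), "baby", 30.0),
    ("A", date(2024, 1, 11), "entertainment", 10.0),
    ("A", date(2024, 1, 31), "baby", 0.0),
]
print(behavioural_features(tx)["A"]["mean_days_between"])  # 15.0
```

A customer with a single purchase gets `None` for velocity rather than a misleading zero; how to treat such customers in the segmentation is a judgment call.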

As for marcom, there should be a host of metrics around marcom type and offer and time until purchase. There should be business rules tying a campaign to a purchase. There should be variables indicating categories featured on the cover, or subject lines, or offers and promotions.

Note how all of the above expand behavioural data. But there are other sources of data as well. Often primary marketing research is used. This tends to be around satisfaction or loyalty, something about competitive substitutes, maybe marcom awareness or importance of each marcom vehicle.

Third party overlay data is a rich source of additional insights for fleshing out the segments. This is often matched data like demographics, interests, attitudes, lifestyles, etc. This data is typically most helpful when it deals with attitudes or lifestyle, but demographics can be interesting as well. Again, all of this additional data is about fleshing out the segments and trying to understand the mindset/rationale of each segment.

Run the algorithm

As mentioned, the algorithm discussion will be covered in depth in the next chapter, but a few comments can be made now, particularly in terms of process. Note that the algorithm is guided by strategy and uses (defining or segmenting) variables based on strategy.

The algorithm is the analytic guts of segmentation and care should be taken in choosing which technique to use. The algorithm should be fast and non-arbitrary. Analytically, we are trying to achieve maximum separation (segment distinctiveness).

The ultimate idea of segmentation is to level a different strategy against each segment. Therefore each segment should have a different reason for BEING a segment. The algorithm needs to provide diagnostics to guide optimization. The general metric of success is ‘homogeneous within and heterogeneous between’ segments. There have been many such metrics offered (SAS, via proc discrim, uses ‘the logarithm of the determinant of the covariance matrix’ as a metric of success). In the profiling, the differentiation of each segment should make itself clear.

Just to stack the deck, let me define what a good algorithm for segmentation should be. It should be multivariable, multivariate and probabilistic. It should be multivariable because consumer behaviour is most certainly explained by more than one variable, and it should be multivariate because the variables affecting consumer behaviour act simultaneously, interacting with each other. It should be probabilistic because consumer behaviour is probabilistic; it has a distribution and at some point that behaviour can even be irrational. Gasp!

Profile the output

Profiling is what we show to other people to prove that the solution does discriminate between segments. Generally the means and/or frequencies of each key variable (especially transactions and marcom responses) are shown to quickly gauge differences by each segment. Note that the more distinct each segment is, the more obvious a strategy (for each segment) becomes.

Showing the means of KPIs (key performance indicators) by segment is common, but often another metric teases out differences better. Using indexes often makes distinctiveness apparent much more quickly. That is, take each segment’s mean and divide by the total mean. For example, say segment one has average revenue of 1,500 and segment two has average revenue of 750 and the total average (all segments together) is 1,000. Dividing segment one by the total gives 1,500/1,000 = 1.5, that is, segment one has revenue 50% above average. Note also that segment two is 750/1,000 = 0.75, meaning that segment two’s revenue is 25% below average. Applying indexes to all metrics by segment immediately shows differences. This is especially obvious where small numbers are concerned. As another example, say segment one has a response rate of 1.9% and the overall grand total response rate is 1.5%. While these numbers (segment one versus total) are only 0.4 percentage points apart, note that the index of segment one to total is 1.9%/1.5% ≈ 1.27, showing that segment one is 27% greater than average. This is why we like to (and should) use indexes.
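The index calculation is simple enough to sketch directly. This minimal Python version assumes one KPI value and one segment label per customer; the three-customer data below is contrived (one customer in segment one, two in segment two) purely so that the overall mean comes out at the 1,000 used in the text.

```python
from collections import defaultdict


def kpi_indexes(values, segments):
    """Index each segment's mean KPI against the overall mean (1.0 = average).

    values: one KPI value per customer; segments: that customer's segment label.
    """
    overall_mean = sum(values) / len(values)
    sums, counts = defaultdict(float), defaultdict(int)
    for value, segment in zip(values, segments):
        sums[segment] += value
        counts[segment] += 1
    return {seg: (sums[seg] / counts[seg]) / overall_mean for seg in sums}


# Revenue example from the text: segment means 1,500 and 750, overall mean 1,000.
revenue = kpi_indexes([1500, 750, 750], ["one", "two", "two"])
print(revenue)  # {'one': 1.5, 'two': 0.75}

# Response-rate example: 1.9% against an overall 1.5%.
print(round(0.019 / 0.015, 2))  # 1.27
```

Because the overall mean is computed across all customers, larger segments pull it toward their own mean, which matches the "all segments together" total in the text rather than an unweighted average of segment means.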

While seeing drastic differences in each segment is very satisfying, the most enjoyable part of profiling often is the NAMING of each segment. First, you must realize that naming a segment helps distinguish the segments. The more segments you have, the more important this becomes.

I have a couple of suggestions about naming segments; take them as you see fit. Sometimes the naming of segments is left to the creative department and that’s okay. But usually analytics has to come up with names.

Each name should be only two or three words, if possible. Names should be more informative than something like ‘High Revenue Segment’ or ‘Low Response Segment’. They should incorporate two or three similar dimensions. Either keep most of them to product and marcom-response dimensions, or keep them along a strategic dimension or two (high growth, cost to serve, net margin, etc.). It’s tempting to name them playfully, but the names still have to be usable. That is, while ‘Bohemian Mix’ is fun, what does it mean strategically or from a marketing point of view?

Model to score database (if from a sample)

The next step, if the segmentation was done on a sample, is to score the database with each customer’s probability to belong to each segment. This is often carried out quickly with discriminant analysis. Apply (in SAS) proc discrim to the sample and get the equations that score each customer into a segment. (Discriminant analysis is a common technique, once categories (segments) are defined, to fit variables in equations to predict category (segment) membership.) Then run these equations against the database.

If this is accurate enough (whatever ‘accurate enough’ means) then you’re good to go. But discrim sometimes is NOT accurate enough. I myself think this is because you have to use the same variables (although with different weights) on each segment. This can be inefficient. There is also the assumption inherent in proc discrim that every segment has the same variance (covariance structure), which is hardly ever true, so you may need to turn to another technique.

I have often settled for logistic regression, where a different equation scores each segment. That is, if I have five segments, the first logit will have a binary dependent variable: 1 if the customer is in segment one and 0 if not. The second logit will have a 1 if the customer is in segment two and a 0 if not. Then I put in variables to maximize the probability of each segment, remove those variables that are insignificant and run all equations against all customers. Each customer will have a probability to belong to each segment and the maximum score wins, ie, the segment that has the highest probability is the segment to which the customer is assigned.
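The "maximum score wins" rule can be sketched as follows, assuming the per-segment logits have already been fitted elsewhere. The coefficients and variable names (`email_clicks`, `store_visits`) are hypothetical, and a variable dropped from a segment's model as insignificant simply has no coefficient there.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def assign_segment(features, segment_models):
    """Score one customer with every segment's logit; the highest probability wins.

    segment_models: {segment: (intercept, {variable: coefficient})}.
    A variable missing from the customer's features is treated as 0 (an assumption).
    """
    probabilities = {}
    for segment, (intercept, coefficients) in segment_models.items():
        z = intercept + sum(coef * features.get(var, 0.0) for var, coef in coefficients.items())
        probabilities[segment] = logistic(z)
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# Hypothetical coefficients for two segments.
models = {
    "segment one": (-1.0, {"email_clicks": 0.8}),
    "segment two": (0.5, {"email_clicks": -0.6, "store_visits": 0.4}),
}
print(assign_segment({"email_clicks": 3.0, "store_visits": 1.0}, models)[0])  # segment one
```

Because each logit is fitted separately, a customer's probabilities across segments need not sum to 1; only their ranking is used here. A multinomial logit is an alternative that produces probabilities summing to 1, at the cost of a single shared model.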

Test and learn

The typical last step is to create a test and learn plan. This is generally a broad-based test design, aimed at learning which elements drive results, which is directly informed by the segmentation insights.

Note Chapter 10 on design of experiments (DOE). The overall idea here is to develop a testing plan to take advantage of segmentation. The first thing to test is typically selection/targeting. That is, pull a sample of those likely to belong to a very highly profitable, heavy usage segment, do a mailing to them and compare revenue and responses to some general control group. These high-end segments should drastically out-perform the business as usual (BAU) group.
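For the response side of such a test, one common way to compare a targeted cell with a BAU control is the response-rate lift plus a pooled two-proportion z statistic. The Python sketch below shows that calculation; the counts are illustrative only, and the z test is a standard technique swapped in here, not one the chapter prescribes.

```python
import math


def compare_to_control(test_responders, test_size, control_responders, control_size):
    """Response-rate lift of a targeted test cell over a BAU control group,
    with a pooled two-proportion z statistic."""
    p_test = test_responders / test_size
    p_control = control_responders / control_size
    pooled = (test_responders + control_responders) / (test_size + control_size)
    se = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    return {
        "test_rate": p_test,
        "control_rate": p_control,
        "lift": p_test / p_control - 1,
        "z": (p_test - p_control) / se,
    }


# Illustrative numbers only: 190 of 10,000 respond in test, 150 of 10,000 in control.
result = compare_to_control(190, 10_000, 150, 10_000)
print(round(result["lift"], 3), round(result["z"], 2))  # 0.267 2.19
```

Under the normal approximation, |z| above about 1.96 corresponds to a two-sided 5% significance level. Revenue per contact is a continuous measure and would need a different comparison, such as a t-test.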

A common next step (depending on strategy, etc.) might be promotional testing. This would usually be paired with elasticity modelling by segment. Often one or more segments are found to be insensitive to price and one or more segments are found to be sensitive to price. The test here is to offer promotions and determine whether the segment insensitive to price will still purchase even with a lower discount. This means the firm does not have to give away margin to get the same amount of purchases.
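A quick way to label a segment as price sensitive or insensitive from two promotional price points is the midpoint (arc) price elasticity. The sketch below is a simple stand-in for fuller elasticity modelling, and the unit numbers are hypothetical. An absolute elasticity above 1 means elastic (price sensitive); below 1 means inelastic.

```python
def arc_elasticity(price_before, price_after, qty_before, qty_after):
    """Midpoint (arc) price elasticity of demand between two observed points."""
    pct_qty = (qty_after - qty_before) / ((qty_before + qty_after) / 2)
    pct_price = (price_after - price_before) / ((price_before + price_after) / 2)
    return pct_qty / pct_price


# Hypothetical promotion results: price cut from 10 to 8 in two segments.
segments = {"price sensitive": (100, 150), "loyal": (100, 105)}
for name, (before, after) in segments.items():
    e = arc_elasticity(10, 8, before, after)
    label = "elastic" if abs(e) > 1 else "inelastic"
    print(name, round(e, 2), label)
# price sensitive -1.8 elastic
# loyal -0.22 inelastic
```

The "loyal" segment barely responds to the discount, which is exactly the case where the text suggests testing a smaller discount to protect margin.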

Other typical tests revolve around product categories, channel preference and messaging. A full factorial design could get much learning immediately and then marcom could be aimed appropriately. The general idea is that if a segment is, say, heavily penetrated in product X, send them a product X message. If a segment might have a propensity for product Y (given product X), do a test and see how to incentivize broader category purchases. The next chapter will go through a detailed example of what this testing might mean.

Checklist


Point out that segmentation is a strategic, not an analytic, exercise.

Remember that segmentation is mostly a marketing construct.

Argue that segmentation is about what’s important to a consumer, not what’s important to a firm.

Recall that segmentation gives insights into marketing research, marketing strategy, marketing communications and marketing economics.

Observe the four Ps of strategic marketing: partition, probe, prioritize and position.

Uncompromisingly demand that RFM be viewed as a service to the firm, not a service to the consumer.

Require each segment to have its own story, a rationale for why it is a segment. There should be a different strategy levelled at each segment; otherwise there is no point in being a segment.

08

Segmentation: tools and techniques

Overview

Metrics of successful segmentation

Business case

Analytics

Comments/details on individual segments

K-means compared to LCA

Highlight: Why Go Beyond RFM?



Overview

The previous chapter was meant to be a general/strategic overview of segmentation. This chapter is designed to show its analytic aspects, which are the heart of the segmentation process. Analytics is the fulcrum of the whole project.

A few books to note, in terms of the analytics of segmentation, would be Segmentation and Positioning for Strategic Marketing Decisions by James H. Myers (1996), Market Segmentation by Michel Wedel and Wagner A. Kamakura (1998) and Advanced Methods of Market Research, edited by Richard P. Bagozzi (2002), especially the chapters ‘The CHAID Approach to Segmentation Modelling’ and ‘Cluster Analysis in Market Research’. Note also the papers of Jay Magidson (2002) from the Statistical Innovations website (www.statisticalinnovations.com).

Metrics of successful segmentation

As mentioned earlier, the general idea of successful segmentation is ‘homogeneous within and heterogeneous between’. There are several possible approaches to quantifying this goal. Generally, the spread of the members within each segment is compared with the spread across all members, and the smaller that ratio, the better. This helps us to compare a 3-segment solution with a 4-segment solution, or a 4-segment solution using variables a–f with a 4-segment solution using variables d–j. SAS (via proc discrim) has the ‘log of the determinant of the covariance matrix’. This is a good metric to use in comparing solutions, even if it’s a badly named one.
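The SAS log-determinant statistic is not reproduced here. Instead, the Python sketch below computes a simpler, commonly used stand-in for the same idea: the within-segment sum of squares divided by the total sum of squares (equivalently, 1 minus the R² of the segment labels). Smaller values mean tighter segments relative to the overall spread.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sum_of_squares(rows, center):
    return sum((x - c) ** 2 for row in rows for x, c in zip(row, center))


def within_to_total_ratio(points, labels):
    """Within-segment sum of squares divided by total sum of squares.

    Smaller means tighter ('homogeneous within') segments relative to the overall spread.
    """
    total = sum_of_squares(points, centroid(points))
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    within = sum(sum_of_squares(rows, centroid(rows)) for rows in groups.values())
    return within / total


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = within_to_total_ratio(points, ["a", "a", "b", "b"])
bad = within_to_total_ratio(points, ["a", "b", "a", "b"])
print(round(good, 3), round(bad, 3))  # 0.01 0.99
```

One caveat: this ratio falls mechanically as segments are added (it reaches 0 when every customer is their own segment), so it is most useful for comparing solutions with the same number of segments, or for spotting where adding a segment stops buying much improvement.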


Business rules

There may be a place for business-rule segmentation. If data is sparse or underpopulated, or very few dimensions are available, there’s little point trying to do an analytic segmentation. There’s nothing for the algorithm to operate on.

I (again) caution against a managerial fiat. I have had managers who invested themselves in the segmentation design. They have told me how to define the segments. This is typically flawed. I wouldn’t say to ignore management’s knowledge/intuition of their market and their customers. My advice is to go through the segmentation process, do the analytics and see what the results look like. Typically the analytic results are appealing and more compelling than managerial judgment. This is because a manager’s dictum is around one or two or at most three dimensions, arbitrarily defined. But the analytic output optimizes the variables and the separation is the mathematical ‘best’. It would be unlikely that one person’s intuition could out-perform a statistical algorithm. I would even say that if an analytic output is very different from a manager’s point of view, that manager has a lot to learn about his own market. The statistical algorithm encourages learning. Most often managerial fiat is about usage (high, medium and low), satisfaction, net profit, etc. None of these require/allow much investigation into WHY the results are what they are. None of these require an understanding of consumer behaviour.

This is why RFM (recency, frequency and monetary) is so insidious. It is a business rule, it’s appealing, it is based on data and it works. It is ultimately a (typically financial) manager’s point of view. It does not encourage learning. Marketing strategy is reduced to nothing more than migrating lower value tiers into higher value tiers.

A good overview of segmentation, from the managerial role and not the analytical role, is Art Weinstein’s book, Market Segmentation (1994), which provides a good discussion of segmentation based on business rules.

CHAID

CHAID (chi-squared automatic interaction detection) is an improvement over AID (automatic interaction detection). Strictly speaking, CHAID is a dependent variable technique, NOT an inter-relationship technique. I’m including it here because CHAID is often used as a segmentation solution.

This brings us to the first question: ‘Why use a dependent variable technique in terms of segmentation?’ My answer is that it is inappropriate. A dependent variable technique is designed to understand (predict) what causes a dependent variable to move. By definition, segmentation is not about explaining the movement in some dependent variable.

OK. How does it work? While there are many variations of the algorithm, in general it works the following way. CHAID takes the dependent variable, looks at the independent variables and finds the one independent variable that ‘splits’ the dependent variable best. ‘Best’ here is per the chi-squared test. (AID was based on the F-test, which is the ratio of explained variance over unexplained variance and is used (in modelling) as a threshold that proves the model is better than random.) It then takes each group formed by that first split and searches the remaining independent variables for the one that best splits the dependent variable within that group (the second level). It does this until the number of levels assigned is reached, or until no further split improves the result.
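The split-and-recurse logic just described can be sketched in Python. This is a simplified illustration, not a full CHAID implementation: it uses a categorical (yes/no) target instead of revenue, compares raw Pearson chi-squared statistics (real CHAID also merges categories and compares Bonferroni-adjusted p-values, which matters when predictors have different numbers of categories), and the toy data is hypothetical.

```python
from collections import Counter


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the cross-tab of xs against ys."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, target):
    """Pick the predictor whose categories best separate the target."""
    ys = [row[target] for row in rows]
    scores = {p: chi_squared([row[p] for row in rows], ys) for p in predictors}
    return max(scores, key=scores.get), scores


def grow(rows, predictors, target, max_depth=2, min_stat=3.84):
    """Split recursively; a leaf is a Counter of target values.

    min_stat=3.84 is the 5% chi-squared critical value for 1 degree of freedom,
    so it only suits binary predictors with a binary target.
    """
    if max_depth == 0 or not predictors:
        return Counter(row[target] for row in rows)
    var, scores = best_split(rows, predictors, target)
    if scores[var] < min_stat:
        return Counter(row[target] for row in rows)
    remaining = [p for p in predictors if p != var]
    return {
        var: {
            value: grow([r for r in rows if r[var] == value], remaining, target, max_depth - 1, min_stat)
            for value in sorted({row[var] for row in rows})
        }
    }


rows = [
    {"income": "high", "region": "north", "bought": "yes"},
    {"income": "high", "region": "south", "bought": "yes"},
    {"income": "low", "region": "north", "bought": "no"},
    {"income": "low", "region": "south", "bought": "no"},
]
print(grow(rows, ["income", "region"], "bought"))
# {'income': {'high': Counter({'yes': 2}), 'low': Counter({'no': 2})}}
```

With only four rows, the chi-squared approximation is meaningless statistically; the example exists only to show the mechanics of choosing a split and then searching the remaining variables within each branch.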

Below is a simple example (Figure 8.1). Product revenue is the dependent variable; CHAID is run and the best split is found to be income. Income is split into two groups: high income and low income. The next best variable is response rate, where each income level has two different response rates. High income is split in terms of response rate >9% and response rate >6% and <9%. Low income is split between <2% and >2% and <6%. Thus this simplified example would show four segments: high income high response, high income medium response, low income medium response and low income low response.

Figure 8.1 CHAID output

The advantages of CHAID are that it is simple, easy to use and easy to explain. It provides a stunning visual to show how to interpret its output.

The disadvantages are many. First, it is not a model in the statistical/mathematical sense of the word, but a heuristic, a guide. This means the analysis tends to be unstable; that is, different samples can produce wildly different results. There are no coefficients that show significance, there are no signs on the variables (positive or negative) and there is no real measure of fit.

CHAID is a popular technique due to its ease and simplicity. I would offer that it is not appropriate for segmentation. Its best use is probably in terms of data exploration. I would caution, however, that this can become a crutch and might encourage you to bypass your own brain. I remember when someone who worked for me was assigned to build a regression model. She had CHAID on her PC, so she was running all kinds of CHAID output and had many pages of tree diagrams. After a while I asked how it was going and she was still exploring the data. She had hundreds of variables and she said she had no real idea about what caused what. She claimed she needed CHAID to mine the data because she had no clue what variables might cause/explain the movement in the dependent variable. I told her that if she, as the analyst, truly had no idea whatsoever as to what might cause or explain the movement in the dependent variable (in this case sales), then she was not the right person to do the model. As an analyst you MUST have some idea of the data-generating process and you MUST have some idea about ‘this causes that’, eg price changes cause changes in demand. So, use CHAID for designing structure, not explaining causality.

Hierarchical clustering

Hierarchical clustering IS an inter-relationship technique. It also has a graphical display but, unlike CHAID, it is NOT visually appealing.

Hierarchical clustering calculates a ‘nearness metric’, a type of similarity via some inter-relationship variables. There are many options for how to do this, but conceptually the idea is that some observations (say customers) are ‘close to each other’ based on some similar variables. Then a dendrogram (a horizontal tree structure) is produced and the analyst chooses how to divide the resultant graphic. See Figure 8.2.
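The joining process behind a dendrogram can be sketched with a naive agglomerative algorithm in Python. This sketch assumes Euclidean distance and average linkage (single linkage is an option); both are only two of the many choices available. The `cut` step is where the analyst "breaks the clusters off" at a chosen k, which is the arbitrary choice discussed below. The naive search is roughly cubic in the number of observations, so it is only suitable for small samples.

```python
import math
from itertools import combinations


def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance, new_id);
    leaves are numbered 0..n-1 and each merge creates the next id.
    """
    clusters = {i: [i] for i in range(len(points))}

    def distance(a, b):
        ds = [math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
        return min(ds) if linkage == "single" else sum(ds) / len(ds)

    merges, next_id = [], len(points)
    while len(clusters) > 1:
        # Join the two closest clusters, as in one step of the dendrogram.
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        d = distance(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


def cut(merges, n_points, k):
    """'Break the tree off' at k clusters by replaying the first n - k merges."""
    members = {i: [i] for i in range(n_points)}
    for a, b, _, new_id in merges[: n_points - k]:
        members[new_id] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
history = agglomerate(points)
print(cut(history, len(points), 3))  # [[0, 1], [2, 3], [4]]
```

Nothing in the merge history says whether three clusters or two is "right"; the same history cut at k=2 gives `[[0, 1, 2, 3], [4]]`.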

Figure 8.2 Hierarchical clustering – dendrogram

Note that, for instance, observations 34 and 56 are joined together (because they are similar) and these are next joined to observation 111. Now there are three observations in this cluster. As the number of observations increases, the graphic is less and less usable. One disadvantage is that the analyst is required to (arbitrarily) decide where to break the clusters off. That is, it ultimately is up to the analyst to choose how many and which observations are in the final clusters. Arbitrary choice is NOT based on analytics, but on intuition.

An advantage of hierarchical clustering is that it calculates the distance of every observation from all other observations, so the starting ‘seeds’ are mathematically distinct. Often hierarchical clustering is used for nothing else than these starting seeds as an input into another algorithm. Note well James H. Myers’ book on segmentation (Myers, 1996), which has a very good and conceptual treatment of hierarchical clustering.

K-means clustering

K-means is probably the most popular (analytic) segmentation technique. SAS (using proc fastclus) and SPSS (using partitioning) have very powerful algorithms to do K-means clustering. K-means is easy to do, fairly easy to understand and explain, and the output is compelling. K-means works and has been in use for over 50 years.

K-means was invented by zoologists in the 1960s for phylum classification. While E W Forgy, R C Jancey and M R Anderberg were early algorithm designers (1960s), it was James MacQueen (1967) who coined the term ‘K-means’. It’s called K-means because K is the number of clusters and the centroids are the means of the clusters. Note they were trying to decide, based on an animal’s (particularly a butterfly’s) characteristics, to which phylum it belonged. They wanted an algorithm for taxonomy.

The general algorithm (and, as with all other techniques, there are various versions) is as follows:

1. Set-up: choose the number of clusters, choose some kind of ‘maximum distance’ to define cluster membership and choose which clustering variables to use.

2. Find the first observation that has all the clustering variables populated and call this cluster 1.

3. Find the next observation that has all the clustering variables populated and test how far away this observation is from the first observation. If it’s far enough away, call this cluster 2.

4. Find the next observation that has all the clustering variables populated and test how far away this observation is from the first and second observations (clusters). If it’s far enough away, call this cluster 3. Keep repeating this step, testing each new candidate against all the clusters found so far, until the chosen number of clusters has been defined.

5. Go to the next observation, test which cluster it is closest to and assign that observation to that cluster.

6. Continue with step 5 until all observations that have the clustering variables populated have been assigned.
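The six steps can be written almost line for line. This is a minimal Python sketch of the procedure exactly as described above, not of any vendor's implementation; the function name, the `None`-for-missing convention and the data are my own assumptions. Production K-means (for example Lloyd's algorithm) goes further and repeatedly recomputes each cluster's mean and reassigns observations until the assignments stop changing; this sketch stops where the six steps stop.

```python
import math


def seed_and_assign(rows, k, min_seed_distance):
    """Steps 1-6 from the text.

    rows: tuples of clustering variables, with None marking a missing value.
    Step 1 is the caller's choice of k, min_seed_distance and the variables.
    Returns (seed_indices, labels); labels[i] is None for incomplete rows.
    """
    complete = [i for i, r in enumerate(rows) if None not in r]
    seeds = []
    # Steps 2-4: walk down the list, keeping a row as a new seed only if it
    # is far enough from every seed already chosen.
    for i in complete:
        if len(seeds) == k:
            break
        if all(math.dist(rows[i], rows[s]) >= min_seed_distance for s in seeds):
            seeds.append(i)
    if len(seeds) < k:
        raise ValueError("not enough observations far enough apart to seed k clusters")
    # Steps 5-6: put every complete row in the cluster whose seed is closest.
    labels = [None] * len(rows)
    for i in complete:
        labels[i] = min(range(k), key=lambda c: math.dist(rows[i], rows[seeds[c]]))
    return seeds, labels


# Hypothetical customers; the second one is missing a clustering variable
rows = [(1.0, 1.0), (None, 2.0), (1.5, 1.2), (6.0, 6.0), (6.2, 5.8), (1.1, 0.9)]
seeds, labels = seed_and_assign(rows, k=2, min_seed_distance=3.0)
print(seeds, labels)
```

`math.dist` needs Python 3.8 or later.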

There are several good things about this algorithm. It is very fast and can handle a large amount of data. It works. It will achieve some kind of separation.

There are many disadvantages. Personally, I HATE the arbitrariness of what the analyst must decide. As stated above, the analyst tells the algorithm how many clusters to form (as if he knows). There is little (analytically) to base this important criterion on. Second, he has to tell the algorithm what variables to use to define the clusters. Again, as if he knows which variables matter. This is an extremely important choice. The clusters are DEFINED based on this arbitrary choice.

Another disadvantage with K-means is that there are no real diagnostics on how well it fits, how well it predicts and how well it scores those observations (customers) into each segment. Because it is based on (squared) Euclidean distance, each observation is simply placed in the segment it is ‘closest to’. There is no likelihood metric. Suppose a customer is new on file, or has some unusual behaviour. This customer might not exhibit real segment behaviour but is placed somewhere, regardless.

Because of these arbitrary choices (and the fact that K-means gives no diagnostics to aid these choices), most clustering projects end up with the analyst generating many solutions. He will do a 4-, a 5-, a 6-, a 7- and an 8-cluster solution. In each he will use variables 1–5, then variables 5–10, then variables 10–12, etc. Because there are no real diagnostics to guide him, he will output reams of paper, share these piles of profiles with his peers and the ultimate users of the segmentation, basically throw up his hands and say, ‘What do you think? Which of these 20 outputs do you like the best?’ And then maybe somebody will decide what they like, typically for strategic reasons. Note the subjectivity here?

Another obvious disadvantage (given the algorithm above) is that if the order of the data set is different, the K-means solution will be different. Some algorithms improve on this by not just going down the list, but taking a random observation as each starting seed. This is better, but the same problem remains. Re-order, or re-run, the algorithm (with the same number of clusters and the same variables) and the output will be (very) different. This should strike all analytic people as a great problem.
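The order problem is easy to demonstrate. The sketch below uses hypothetical one-variable data and the same seed-then-assign rule as the steps above (Python 3.8+ for `math.dist`). It runs the same data, the same K and the same variable twice, once in file order and once reversed, and compares the resulting segments.

```python
import math


def spread_seeds(rows, k, min_dist):
    """Steps 2-4: keep a row as a seed if it is min_dist from every seed so far."""
    seeds = []
    for r in rows:
        if all(math.dist(r, s) >= min_dist for s in seeds):
            seeds.append(r)
        if len(seeds) == k:
            break
    return seeds


def partition(rows, seeds):
    """Steps 5-6: group rows by nearest seed; returns a set of frozensets."""
    groups = {}
    for r in rows:
        nearest = min(range(len(seeds)), key=lambda c: math.dist(r, seeds[c]))
        groups.setdefault(nearest, set()).add(r)
    return {frozenset(g) for g in groups.values()}


data = [(0.0,), (1.0,), (5.0,), (6.0,), (10.0,)]
as_filed = partition(data, spread_seeds(data, k=2, min_dist=3.0))
reordered = partition(data, spread_seeds(data[::-1], k=2, min_dist=3.0))
print(as_filed == reordered)   # False: same data, same K, different segments
```

In file order the segments are {0, 1} and {5, 6, 10}; reversed, they become {0, 1, 5, 6} and {10}.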

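A last problem with K-means is that it is not a statistical optimizing algorithm. The most it does is reduce the within-cluster sum of squared distances for whatever number of clusters and variables it is given. It has no likelihood or other controlling objective that could tell you whether those choices were right.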

Therefore, I would suggest that K-means is not a viable option for actionable segmentation. The algorithm is too arbitrary and the output is subjective, something most good analysts abhor.

Latent class analysis

Latent class analysis (LCA) is a massive improvement on all the above. It is now the state of the art in segmentation. To me, the best software for this is Latent Gold from Statistical Innovations. Jay Magidson is a genius and has written some of the best articles on it. Especially see ‘A nontechnical introduction to latent class models’ (2002) and ‘Latent class models for clustering: a comparison with K-means’ (2002).

LCA takes a completely different view of segmentation. In K-means the variables define the segments; LCA instead assumes the scores on the variables are caused by the (hidden) segment. That is, LCA posits a latent (categorical) variable (segment membership) that maximizes the likelihood of observing the scores seen on the variables.

It then runs this taxonomy and creates a probability of each observation belonging to each segment. The segment that has the highest probability is the segment into which the observation is placed. This means LCA is a statistical technique, not a purely mathematical technique like hierarchical or K-means clustering.
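The scoring step is just Bayes' rule. Below is a minimal Python sketch for the simplest latent class model: yes/no indicators that are assumed to be independent of one another within a segment (the usual ‘local independence’ assumption). The segment sizes and item probabilities are hypothetical; in practice software such as Latent Gold estimates them from the data, which this sketch does not attempt.

```python
def posterior_membership(answers, segment_sizes, item_probs):
    """P(segment | answers) for binary indicators under local independence.

    segment_sizes[s]  : share of the market in segment s (sums to 1)
    item_probs[s][j]  : P(indicator j = 1 | segment s)
    """
    joint = []
    for size, probs in zip(segment_sizes, item_probs):
        likelihood = size
        for answer, p in zip(answers, probs):
            likelihood *= p if answer == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)                    # P(answers), summed over segments
    return [j / total for j in joint]


# Hypothetical two-segment model; indicators are
# (buys online, opens e-mail, buys games)
sizes = [0.6, 0.4]
items = [[0.9, 0.8, 0.1],
         [0.2, 0.3, 0.7]]
post = posterior_membership((1, 1, 0), sizes, items)
segment = max(range(len(post)), key=post.__getitem__)   # modal assignment
print([round(p, 3) for p in post], segment)
```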

There are some disadvantages of LCA. SAS does not do it, at least not as a proc. SPSS does not do it either: you have to buy special software. Statistical Innovations created Latent Gold, which has probably become the gold standard (get it, ‘gold’?). It also requires some training and some expertise, but Latent Gold is menu driven and very easy to use. Also, like the light bulb, it is not true that you have to understand all of the intricate details in order to use it. Some training is required, but the results are well worth it.

The advantages have been alluded to but, just to be clear, LCA has a LOT of advantages. Ultimately, segmentation’s usefulness is about strategy. The more distinct the segments, the more obvious the strategy to be levelled at each one.

In addition, there are several important analytic advantages, especially in the way Latent Gold articulates the algorithm. First, LCA tells you the optimal number of segments. You do not have to guess. LCA uses the BIC (Bayes Information Criterion), the –LL (negative log likelihood) and the error rate to give you diagnostics as to what is the ‘best’ number of segments, given these scores on these variables and this data set.
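For readers who want the formula behind the BIC: it is minus twice the log likelihood plus a penalty of ln(n) for every estimated parameter, so an extra segment has to improve the fit by enough to pay for the parameters it adds. A minimal Python sketch, with hypothetical log likelihoods and parameter counts (not output from any real model):

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits on 9,000 customers: segments -> (log likelihood, parameters)
fits = {2: (-46_000.0, 20), 3: (-39_500.0, 31),
        4: (-39_400.0, 42), 5: (-39_380.0, 53)}
scores = {k: bic(ll, p, 9_000) for k, (ll, p) in fits.items()}
best = min(scores, key=scores.get)
print({k: round(v) for k, v in scores.items()}, "best:", best)
```

Here the five-segment fit has a higher log likelihood than the four-segment fit, but not by enough to pay for its eleven extra parameters, so the BIC prefers four.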

Second, LCA gives indications as to which variables are significant in the segmentation solution. You do not have to guess. Any variable that has an R² < 10% can be deemed insignificant.

Third, LCA produces an output that scores every observation with the probability of belonging to each segment. If observation #1 has a probability of belonging to segment 1 of 95% and a probability of belonging to segment 2 of 5%, it’s pretty obvious to which segment that observation belongs. Observation #1 exhibits very strong segment 1 behaviour. But what about observation #2, which has a probability of belonging to segment 1 of 55% and a probability of belonging to segment 2 of 45%? This observation does not demonstrate very strong segment behaviour, for any segment. Under K-means this observation would likely be assigned to segment 1. But LCA gives you a diagnostic. Typically some assumption should be made. It’s usually something like: any observation that does not score at least a 70% likelihood of belonging to some segment should be eliminated from the output. Those observations are placed in some other bucket to be dealt with in some other way. There should not be more than 5% of these outliers, given most marketing models are at 95% confidence. A good solution will have far fewer than 5% outliers.
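The 70% rule and the 5% check are easy to automate. A minimal sketch, assuming the posterior probabilities have already been produced by the LCA software; the first two rows are observations #1 and #2 from the text (segments numbered from 0 here), the other two are hypothetical.

```python
def assign_with_threshold(posteriors, threshold=0.70):
    """Modal segment if its probability clears the threshold, else None."""
    labels = []
    for probs in posteriors:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else None)
    share_unassigned = labels.count(None) / len(labels)
    return labels, share_unassigned


posteriors = [[0.95, 0.05],   # observation #1 in the text
              [0.55, 0.45],   # observation #2 in the text
              [0.10, 0.90],   # hypothetical
              [0.72, 0.28]]   # hypothetical
labels, share = assign_with_threshold(posteriors)
print(labels, share)
if share > 0.05:
    print("More than 5% unassigned: revisit the solution")
```

On a four-row toy file the 5% check fails trivially; on a real scored file of thousands of customers it is a meaningful test of the solution.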

These diagnostics make the analytics very fast and very clean. They also make the segmentation solution very distinct. As mentioned, this is the hallmark of a good segmentation solution: distinctiveness. But this is not just valuable for the analyst; it is of utmost importance to the strategist. The more distinct the segmentation solution, the clearer each strategy becomes.

Table 8.1 Latent class analysis

RFM CHAID K-means LCA

Multivariable XX XX XX XX

Customer-centric XX

Multivariate XX XX

Probabilistic XX

BUSINESS CASE

Scott’s boss called him into the office. He looked around while his boss played with the phone, which always irritated Scott.

‘So Scott’, his boss said, grudgingly looking up from his smartphone. ‘We are ready to make a major push in consumer strategy. We’ve added consumer electronics to our product mix and now want to dive deeper.’

‘That sounds good. What does that mean for my group?’

‘We’d like to explore versioning our direct mail catalogues, positioning our e-mails more strategically, etc. We all remember your ONE SIZE DOES NOT FIT ALL speech at the offsite last quarter.’

‘Yeah, sorry, there had been a few cocktails and…’

‘No, it’s right on. We’re talking about initiating a customer market segmentation project and you are slated to lead it.’

Scott gulped. That would be a lot of work. It would also be a lot of fun and very visible. ‘I’ll start putting a team together and begin to go through the process.’

Scott went back to his office (he’d been promoted by now) and sketched out a process for producing a segmentation based on consumer behaviour. He wrote a list of steps on his whiteboard and then invited stakeholders to a series of meetings. They were starting a big project: customer segmentation.

Strategize

The first step in behavioural segmentation is to strategize. This tends to be a view through two lenses: marketing strategy and consumer behaviour. These two should not be contradictory.

Scott’s team met and there was some discussion, but Scott provided leadership on goals based on the mantra of Peter Drucker, the legendary management guru who created business management as a distinct and separate discipline. Drucker said there are only three metrics that make any business sense: increasing revenue, increasing customer satisfaction and decreasing expenses. If you are working on a project that cannot tie to at least one of these metrics, you should ask yourself whether you really should be doing that project. Scott’s team decided their marketing strategy for the segmentation would be increasing net profit margin. The whole point for each segment was strategizing cross-sell/up-sell opportunities. This was a departure from last year’s strategy of mostly acquiring customers. They realized how expensive acquisition can be.

In terms of consumer behaviour, Scott’s team hypothesized potential consumer segments. There would likely be one or more generally sensitive to price, one or more having different product penetrations, one or more reacting to compelling messages designed for them and one or more that prefer one channel over another. This is just using tactical marketing (product, price, promotion and place) differentially against each segment.

The real issue was in terms of behaviour. They talked at length about what caused the behaviours they would see. They reasoned there might be a consumer segment heavily into games and entertainment, or another consumer segment that was very high tech/web-centric/early adopters, etc. There might be another segment needing a relationship, more on the low-tech side, needing their hands held through the techno-babble. They knew most of their (behavioural) data would be transactions and marcom responses.

So the team thought that, given the marketing strategy of increasing net revenue and the various potential consumer behaviour segments, a strategy could be levelled differently at each segment. That is, a completely different communication style would be used on, say, a price-sensitive, low-tech consumer as opposed to a heavy gamer. Scott thought there was a lot of excitement and buy-in for this output.

Collect behavioural data

Scott went to his database team and they talked about what data they had. First they had to define a consumer (as opposed to a small business, eg, a sole proprietorship), but that was fairly straightforward. Then they talked about data.

Scott wanted behavioural data, specifically transactions and marcom responses. They talked about two or three years of history. The PC consumer business has strong seasonality (peaking in August and even more in December) and Scott had already learned how seasonality had to be taken into account.

In terms of transactions, the issue was what kind of granularity was needed. They decided they needed only broad product categories (laptops, desktops and workstations; very few consumers would buy a server) and would go only one level below this, eg, high-end desktop versus scaled-back desktop, and so on. They’d add consumer electronics, which included televisions, printers, software (personal productivity, games, etc.), digital cameras, accessories, etc. They’d include product details as well as gross revenue and discounts applied, net revenue, number of purchases, time between purchases, months the product(s) were purchased, etc.

Thinking about marcom responses (a sign of behaviour and an indication of engagement), they talked about both direct mail and e-mail. They would mostly ignore social media/in-bound marketing because of the difficulty in matching customers, and web banner/advertising (again, it cannot be tied directly to a particular customer). They knew to whom they sent a catalogue, when they sent it, what was on the cover and what offers/promotions were inside each one. Each catalogue had a unique 800 phone number, so when customers rang, the call centre would know which catalogue had driven (at least) that inquiry. Promotions used online were also unique to each catalogue. The same data was available for e-mail. Each was sent to a particular e-mail address and they could keep track of each open and click, etc. So again, there was a lot of data.

Collect additional data

The next step was to collect additional data. This could come from several possible sources. It could come from creating/deriving data from the database. It could come from overlay data and from primary market research data.

From the consumer database they created additional variables. These included monthly dummy variables for seasonality. They calculated time between purchases, they derived typical market baskets and they put together share of products, that is, percent of desktops, percent of consumer electronics, and so on.
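A sketch of those derived variables for a single customer, in Python. The record layout (purchase date plus broad product category), the category names and the decision to express timing in days are assumptions for illustration; the 0/1 month indicators are one reasonable reading of ‘monthly dummy variables’.

```python
from collections import Counter
from datetime import date


def derive_features(purchases, categories):
    """Derived variables for ONE customer.

    purchases: list of (purchase_date, category) tuples.
    """
    purchases = sorted(purchases)
    dates = [d for d, _ in purchases]
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    counts = Counter(cat for _, cat in purchases)
    features = {"avg_days_between_purch": sum(gaps) / len(gaps) if gaps else None}
    # Share of products: fraction of this customer's purchases in each category.
    for cat in categories:
        features[f"share_{cat}"] = counts[cat] / len(purchases)
    # Month indicators (0/1) for seasonality.
    months = {d.month for d in dates}
    for m in range(1, 13):
        features[f"month_{m:02d}"] = int(m in months)
    return features


# Hypothetical purchase history for one customer
history = [(date(2014, 8, 15), "desktop"),
           (date(2014, 12, 1), "electronics"),
           (date(2015, 2, 1), "electronics")]
f = derive_features(history, ["desktop", "laptop", "electronics"])
print(f["avg_days_between_purch"], f["share_electronics"], f["month_12"])
```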

They purchased overlay data. This included both demographics (such as age, education, income, gender, size of household and occupation) as well as lifestyle and interest variables. They hoped these would flesh out the segments. This data was pretty well matched, at about 80%, to their consumer database.

There was a limited amount of primary marketing research, but Scott found a few studies that could be helpful (especially in the Probe phase of the four Ps of strategic marketing). They had done a customer satisfaction study and an awareness study. These studies each took customer names from the database and, while not well represented, could be matched to the transaction file.

Analytics

Collect data and sample

Note there are two kinds of variables in this environment: segmenting variables and profiling variables. Segmenting variables are those used to create the segments, while profiling variables are everything else. The primary marketing research data will be profiling variables, as they are too underpopulated to be used as segmenting variables. Most of the demographics will be profiling variables, as demographics are typically not useful in defining segments. But the other (behavioural) variables will go through the algorithm and be tested as to whether or not they are significant and, if so, will be kept as segmenting variables. Note that anything that is not a segmenting variable will be a profiling variable.

What’s next is what Scott has been most looking forward to: the analytics. There are several steps in this process and they are all enjoyable.

So first he has to take a sample. LCA cannot operate on millions (or even hundreds of thousands) of records. The algorithm would take years to converge. So he chooses a random sample of, say, 20,000 customer records. These records have been matched with transactions and marcom responses, derived data and overlay data and (where possible) primary marketing research data.

Usually there is no need to worry about oversampling (a certain variable) or stratifying, etc.

Oversampling: a sampling technique forcing a particular metric to be overrepresented (larger) in the sample than it would be in simple random sampling. This is done because a simple random sample would produce too few of that particular metric.

Stratifying: a sampling technique choosing observations based on the distribution of another metric. This is done to ensure the sample contains adequate observations of that particular metric.

In typical consumer marketing a simple random sample is fine. Take a look at any good general statistics book for sampling, etc., such as Statistical Analysis for Decision Making, by Morris Hamburg (1987).
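For completeness, here is what the two approaches look like with Python's standard library. The customer IDs, the fixed seed (so a sample can be reproduced) and the stratifying rule are hypothetical. The stratified version uses proportional allocation, so each stratum keeps its share of the file; when shares do not divide evenly, the rounded stratum sizes may not add up exactly to n.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, seed=2014):
    """Every record has the same chance of selection."""
    return random.Random(seed).sample(records, n)


def stratified_sample(records, n, stratum_of, seed=2014):
    """Proportional stratified sample: each stratum keeps its share of the file."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(records))   # rounding may shift the total slightly
        sample.extend(rng.sample(members, k))
    return sample


customers = list(range(100_000))          # hypothetical customer IDs
srs = simple_random_sample(customers, 20_000)
strat = stratified_sample(customers, 20_000, stratum_of=lambda c: c % 4 == 0)
print(len(srs), len(strat))
```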

Normalize

Now, even though it is not strictly necessary, is the time to weed out non-normality. I like to do this step to guard against strange or weird data elements. There are two stages.

The first stage is simply to test every variable for ‘non-normality’. This generally means taking the z-score of (standardizing) each variable, then deleting any observation that has a score beyond ±3.0 standard deviations. (Three standard deviations either side of the mean covers about 99.7% of the observations in a normal distribution, so anything beyond that is very NON-normal.) These are clearly non-normal data elements and there should not be very many of them. Some people replace these outliers with the mean, but if there are enough observations this is not necessary, and it is a little too arbitrary for my taste.
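A minimal sketch of the first stage, assuming each customer record is a dict of numeric variables; the variable names and the tiny data set are hypothetical. Variables with no spread at all are skipped, since a z-score is undefined for them.

```python
import statistics


def drop_univariate_outliers(rows, columns, z_max=3.0):
    """Drop any row whose z-score on any listed column exceeds +/- z_max."""
    stats = {}
    for col in columns:
        values = [r[col] for r in rows]
        stats[col] = (statistics.mean(values), statistics.pstdev(values))
    return [
        r for r in rows
        if all(sd == 0 or abs(r[col] - mean) / sd <= z_max
               for col, (mean, sd) in stats.items())
    ]


rows = [{"num_purch": 5, "em_opens": 1} for _ in range(20)]
rows.append({"num_purch": 100, "em_opens": 1})     # one extreme buyer
kept = drop_univariate_outliers(rows, ["num_purch", "em_opens"])
print(len(rows), "->", len(kept))
```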

For the second stage I will have to ask you to make sure you’re sitting down. Remember how I’ve clamoured about how bad K-means is and how it’s not a good solution? Well, now I’m asking you to use K-means to test for normality.

The idea is to run K-means with a LOT of clusters, like 100 or so. Use the (typically behavioural) variables that make most sense to you in defining the clusters. All we are trying to do is form clusters that are unusual in terms of behavioural motivations. So now with, say, 100 clusters, those clusters that are very small (having only a few customers in them, say) are by multivariate definition ‘unusual’. These observations should be eliminated. The point is that, while we’ve already looked at any single variable being unusual, this technique uses a multivariable approach to find a group of customers moving in such a way as to be non-normal. That’s why these observations (customers) are deleted from further analysis.
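A minimal pure-Python sketch of this second stage. It uses plain Lloyd's K-means; to keep the example deterministic it seeds the clusters with a farthest-first rule, which is my assumption rather than what any particular package does. The data (a tight cloud of 100 ordinary customers plus two unusual ones), k = 5 and the ‘fewer than 5 members’ cut-off are hypothetical stand-ins for the 100 or so clusters described above.

```python
import math
from collections import Counter


def farthest_first(points, k):
    """Deterministic seeds: start with the first point, then keep adding the
    point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign to nearest centroid, move centroids to means."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its old centroid.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def unusual_customers(points, k, min_size):
    """Indices of points that land in clusters with fewer than min_size members."""
    labels = kmeans(points, k)
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_size]


cloud = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
oddballs = [(20.0, 20.0), (20.5, 20.0)]
print(unusual_customers(cloud + oddballs, k=5, min_size=5))
```

The two oddballs end up alone in their own tiny cluster and are flagged for deletion; every cluster in the ordinary cloud is large.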

Note that we are trying to understand the normal market. That’s why effort is put into detecting non-normality. Because we have a sample, it’s even more important to identify unusual scores on variables, or unusual customer behaviour, and eliminate them.

So, let’s say that Scott and his team did the above process and their sample went from 20,000 to 18,000. Then he randomly splits these 18,000 into two files, A and B. These will be a test file and a validation file for later.
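A minimal sketch of the split, with stand-in IDs for the 18,000 cleaned records and a fixed (hypothetical) seed so the same A and B files can be recreated later.

```python
import random


def split_in_two(records, seed=42):
    """Shuffle a copy and cut it in half: file A (build) and file B (validate)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


cleaned = list(range(18_000))      # stand-in IDs for the 18,000 cleaned records
file_a, file_b = split_in_two(cleaned)
print(len(file_a), len(file_b))
```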

Run LCA

Now Scott feeds test file A into the software and is ready to run LCA. He first chooses to run solutions with 2 through 9 segments, just to narrow down where things are. LCA shows diagnostics (BIC, LL, etc.; see above) to help identify the optimal number of segments (see Table 8.2). Note that the BIC goes down and is at a minimum at six segments. This tells Scott six segments is probably the right number. The BIC is the Bayes Information Criterion: minus twice the log likelihood plus a penalty for the number of parameters estimated. Think of it as an area of error, with the smaller the area the better. Whichever solution has the smallest BIC is the best.

Table 8.2 Bayes Information Criterion

            BIC
2 clusters  92,454
3 clusters  79,546
4 clusters  61,565
5 clusters  59,605
6 clusters  58,456
7 clusters  58,989
8 clusters  59,650
9 clusters  60,056
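Picking the number of segments from Table 8.2 is then a matter of finding where the BIC bottoms out. A trivial Python check using the values in the table:

```python
# BIC by number of clusters, first model (Table 8.2)
bic_by_k = {2: 92_454, 3: 79_546, 4: 61_565, 5: 59_605,
            6: 58_456, 7: 58_989, 8: 59_650, 9: 60_056}
best_k = min(bic_by_k, key=bic_by_k.get)
# Improvement in BIC from adding each extra cluster (negative = worse)
drops = {k: bic_by_k[k - 1] - bic_by_k[k] for k in list(bic_by_k)[1:]}
print(best_k)   # 6
print(drops)
```

The drops also show diminishing returns: going from five to six clusters improves the BIC by 1,149, and every step after six makes it worse.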

Now he runs the second model, after deleting those variables that are insignificant, and comes up with Table 8.3.

Table 8.3 Bayes Information Criterion: second model

            BIC
3 clusters  64,466
4 clusters  56,550
5 clusters  41,058
6 clusters  40,611
7 clusters  57,089
8 clusters  58,067

The variables he uses also give diagnostics as to which are significant. Note Table 8.4 below, showing R² < 10% for most of the demographics. These Scott removes.

Table 8.4 List of variables removed

Variable                   R²
Age                        0.05
Education (years)          0.07
Income                     0.01
Size household             0.02
Occupation – blue collar   0.05
Occupation – white collar  0.04
Occupation – agriculture   0.02
Occupation – government    0.01
Occupation – unemployed    0.02
Ethnicity – Asian          0.02
Ethnicity – white          0.02
Ethnicity – black          0.01
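The R² < 10% rule applied to the values in Table 8.4, as a short Python sketch (the function name is my own):

```python
R2_CUTOFF = 0.10   # the rule of thumb from the text

# R-squared values from Table 8.4
r_squared = {
    "Age": 0.05, "Education (years)": 0.07, "Income": 0.01,
    "Size household": 0.02, "Occupation - blue collar": 0.05,
    "Occupation - white collar": 0.04, "Occupation - agriculture": 0.02,
    "Occupation - government": 0.01, "Occupation - unemployed": 0.02,
    "Ethnicity - Asian": 0.02, "Ethnicity - white": 0.02, "Ethnicity - black": 0.01,
}


def split_by_r2(r2_by_variable, cutoff=R2_CUTOFF):
    """Return (keep, drop) lists of variable names."""
    keep = [v for v, r2 in r2_by_variable.items() if r2 >= cutoff]
    drop = [v for v, r2 in r2_by_variable.items() if r2 < cutoff]
    return keep, drop


keep, drop = split_by_r2(r_squared)
print(len(drop), "dropped as segmenting variables; still available for profiling")
```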

This is part of the modelling exercise: put variables in, run the segment solutions, see where BIC is best, look at significance and remove those that are insignificant, etc. While this seems time consuming, it ends up being far faster than, say, K-means, mostly because there is absolutely a good solution at the end, not an arbitrary quagmire of undifferentiated clusters.

The variables that end up being significant include:

Figure 8.3 Significant variables

Note that these variables are behavioural, as expected. Revenue variables are not even tested, as they are the RESULT of behaviour. Demographics typically are not significant and are also not behavioural. Of course, any and all of these variables can be used for profiling.

The next step is to correct for leftover association between pairs of variables (local dependence), using bivariate residuals. This step adds a large number of parameters and will slow the analysis down. Way down. Analytically, all three dimensions are nudged simultaneously: find the number of segments, find the significant variables and correct with bivariate residuals.

The next step is to mark those bivariate residuals. These are indications of some pattern remaining that the model is not accounting for. The bivariate residuals should be checked down to about 3.84. This is the 95% level of confidence: remember the 95% z-score for linear models is 1.96, and 3.84 = 1.96 × 1.96, the matching cut-off for a chi-square statistic with one degree of freedom.
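A minimal Python sketch of that check. The variable pairs and bivariate residual values are hypothetical (in practice they come from the LCA software's output). The p-value function relies on the fact that a chi-square statistic with one degree of freedom is a squared standard normal, which is where 3.84 = 1.96² comes from.

```python
import math

CUTOFF = 1.96 ** 2    # 3.8416: the 95% cut-off for a chi-square with 1 df


def chi2_1df_upper_tail(x):
    """P(X > x) for X ~ chi-square(1), i.e. P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


def flag_bivariate_residuals(bvr_by_pair, cutoff=CUTOFF):
    """Pairs of variables whose bivariate residual exceeds the cut-off, largest first."""
    flagged = [(pair, bvr) for pair, bvr in bvr_by_pair.items() if bvr > cutoff]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical bivariate residuals from one model run
bvrs = {
    ("Number EM open", "Number EM click"): 12.7,
    ("Number DM sent", "Number DM call"): 2.1,
    ("Q4 purchase", "Num total purch"): 4.2,
}
for pair, value in flag_bivariate_residuals(bvrs):
    print(pair, value, round(chi2_1df_upper_tail(value), 4))
```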

The common last step is to run the second file through, using the same number of segments (six) and the same variables found to be significant. Check the bivariate residuals and look at the two outputs. They should appear essentially the same. I usually do not statistically ‘test’ this sameness; I just look at it. I have never seen the two results be different in any meaningful way.

Profile and output

The profile generally uses all the variables. Often there is a ‘top-down’ view and a ‘bottom-up’ view, or a strategy view and a tactical view, or a general view and a specific view. Below is the strategic, top-down or general view of the six segments. This lens puts the segments together, to compare and contrast them all at once, looking at KPIs.

Table 8.5 General view of six segments

                                           Seg 1   Seg 2   Seg 3   Seg 4   Seg 5   Seg 6
% of market                                  30%     24%     19%     15%      9%      3%
% of revenue                                 32%     39%      9%     17%      2%      0%
# Total purch                              14.49   25.64    8.88   18.17    7.95    9.65
Rev DT purch                               3,150   4,730     999   2,592     352      81
Rev NB purch                               2,320     720     680   1,152     630     168
Rev total purch                            6,281   9,786   2,742   6,811   1,393   1,154
# DM sent                                   13.5     9.1    19.5     5.6     6.8     9.5
# EM sent                                   15.9    17.8     9.1    12.9    15.5    12.8
# EM open                                    1.4     3.2     0.4     4.5     1.7     2.6
# EM click                                   0.1     0.4       0     2.3     0.3     0.2
# Prod purch call centre                     3.6     2.6       8     0.9       2     3.9
# Prod purch online                         10.9    23.1     0.9    17.3       6     5.8
Education (years)                           19.1    12.9    11.8    17.9    13.8    13.8
$ Income                                    185K     60K     45K    125K     15K     75K
% Q4 purchase                                25%     70%     83%     14%     15%     41%
Avg time between purch (months)              6.5     3.1    16.5     4.2     9.4    15.4
Avg time between web visits (weeks)          3.2     2.1     9.5     1.9     3.9     8.5

A few quick comments can be made on the above output. First, some demographics are shown. This is typical. Remember that while demographics are not statistically significant in designing the segmentation, they might still be of use in fleshing out the segments (and advertisers seem to love demographics). The first stage is partitioning and the second stage is probing. Adding additional data is part of the probing stage.

Let’s look at the segmentation solution. Segment 1 is the largest in terms of market and each segment is successively smaller, with segment 6 the smallest at 3%. The story is how segment size compares to percent of revenue generated. Note that segment 2 contributes 39% of the revenue with only 24% of the market. Note that segment 5, conversely, is not pulling its fair share, having 9% of the market but generating only 2% of the revenue. These metrics begin to let Scott know where he should put his resources and which segments are ‘worth’ marketing to. See the graph below.

Figure 8.4 % of market vs % of revenue

*Does not add to 100% due to rounding.
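One quick way to quantify ‘fair share’ is an index: percent of revenue divided by percent of market, so 1.0 means a segment generates revenue exactly in proportion to its size. A short Python sketch using the (rounded) figures from Table 8.5; the index itself is my illustration, not a KPI the table reports.

```python
pct_market = {1: 30, 2: 24, 3: 19, 4: 15, 5: 9, 6: 3}     # Table 8.5
pct_revenue = {1: 32, 2: 39, 3: 9, 4: 17, 5: 2, 6: 0}     # Table 8.5 (rounded)

fair_share_index = {s: pct_revenue[s] / pct_market[s] for s in pct_market}
for seg, idx in sorted(fair_share_index.items(), key=lambda kv: -kv[1]):
    print(f"Segment {seg}: {idx:.2f}")
```

Segment 2 comes out at about 1.6 (well above its fair share) and segment 5 at about 0.2, which is the same story the text tells in words.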

Another story displays itself around channel preference. Segments 2 and 4 seem to be very web-centric, while segment 3 is NOT one that pursues online purchases. Segment 4 opens 4.5 of the 12.9 e-mails sent to them, whereas segment 3 opens 0.4 of the 9.1 e-mails sent to them. Segment 2 purchases 23.1 of their 25.64 products online (and segment 4 purchases 17.3 of their 18.17 products online), but again segment 3 purchases only 0.9 of their 8.88 products online. These are clear behavioural differences.

Segment 1 has the highest income and segment 5 (mostly students; see details below) the lowest. Segment 1 has the most education and segment 3 the least. The figures below show occupations and other demographics.

Comments/details on individual segments

A few notes and observations on each segment follow.

Segment 1

Segment 1 is the largest segment (30% of the market) and contributes 32% of the revenue.

Segment 1 purchases more desktops (3.5) and notebooks (2.9) than any other segment. They have a high penetration of productivity software (twice the average) and are probably heavily invested in smartphone and tablet ownership, which means they are very comfortable with high tech.

Segment 1 receives the second-highest number of direct mails and e-mails sent. It’s interesting to note, however, that they have the next-to-lowest ratio of e-mails clicked to e-mails opened, at about 7% (0.1 clicks per 1.4 opens).

Segment 1 has the largest household size (4.1) and the most (70%) white collar occupations. They have the highest income and highest education. They are young-ish and could probably be called yuppies.

Segment 2

Segment 2 is the next-to-largest segment (24% of the market) and contributes more than its fair share of the revenue, at 39%.

Segment 2 pays by far the highest desktop prices (75% above average) and has nearly four times the average gaming software purchases. They make almost no productivity purchases, but a lot of accessory (nearly three times average) and phone purchases (nearly twice average).

Segment 2 shows the next-to-highest number of e-mail opens and the highest number of products purchased online, 88% above average. This segment calls the call centre from the catalogue the next-to-lowest number of times, but has the highest number of calls from e-mails and the most online configurations.

This segment is the gamers! They tend to be young and single, with the next-to-smallest household size. They purchase all of the gaming accessories: headphones, joysticks, etc.

Segment 3

Segment 3 makes up 19% of the customer market but only accounts for 9% of the revenue. This segment does not come close to pulling its weight.

Segment 3 purchases a large number of digital cameras (nearly twice average) and 50% more phones. When they do purchase, they tend to buy low-end, entry-level technology, which is one reason their revenue contribution is so low.

Segment 3 receives the highest number of catalogues and the lowest number of e-mails. This segment opens fewer e-mails and clicks less than any other. Segment 3 needs a (direct mail) discount in order to purchase.

Segment 3 calls from direct mail more, and purchases from the call centre more, than any other segment. Conversely, this segment calls from e-mail less, and purchases online less, than any other segment.

Segment 3 needs hand-holding. They are low tech and need a relationship to foster a purchase. They tend to be African-American, with a high percentage of blue collar and government occupations. This segment has the least education. They call the call centre with complaints more than any other segment and tend to purchase mostly during the Christmas season.

Segment 4

Segment 4 is 15% of the market and generates 17% of the revenue.

Segment 4 purchases the next-to-most desktops and next-to-most notebooks. They are very high tech, purchasing the most TVs, cameras, network and other accessories.

This segment has the highest number of e-mail opens and by far the most e-mail clicks (over four times average) of any segment. They purchase fewer products from the call centre than any other segment, purchase the next-to-most products online and have the shortest time between web visits.

Segment 4 is very web-centric and probably believes ‘print is dead!’ They tend to be Asian, very high tech, with engineering white collar occupations. They would be early adopters, with the next-to-highest education compared to other segments. They ignore direct mail and make most of their purchases online.

Segment 5

This segment is the least successful, being 9% of the market but only pulling 2% of the revenue.

Segment 5 purchases low-end products (few desktops, largely notebooks), mostly during back-to-school sales and usually with a discount. They purchase nearly zero consumer electronics.

Segment 5 receives the next-to-least number of direct mails and makes the next-to-least call centre purchases.

Segment 5 appears to be mostly students: single, unemployed, low income, etc.

Segment 6

Segment 6 is only 3% of the market and generates <1% of the revenue.

Segment 6 really only purchases accessories and occasional items, spare parts, etc.

This segment is not really engaged with our brand and does not really respond to communications. Segment 6 does not visit our website much and has the next-to-longest time between purchases (only segment 3 waits longer). This segment might be a target to DE-market to. Note the high percentage of agricultural occupations.

Tables 8.6 and 8.7 present some details by segment, as referenced above.

Table 8.6 Details by segment

                                          Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6
% of market                                     30%        24%        19%        15%         9%         3%
% of revenue                                    32%        39%         9%        17%         2%         0%
Num DT purch                                    3.5        2.2       1.11       2.88       0.88       0.09
Num NB purch                                    2.9        1.2       0.85       1.44       1.05       0.21
Num electronics – TV purch                     0.11       1.15       0.09       1.35       0.05       0.21
Num electronics – camera purch                 0.02       0.05       1.06       1.88       0.24       0.45
Num electronics – printer purch                1.38       1.06       1.15       1.19       1.09       0.29
Num electronics – accessory purch               1.2        5.5       0.08       1.08       0.29       1.87
Num electronics – phone purch                  0.03       1.21       0.99       0.89       0.09       0.35
Num electronics – sw – game purch              0.02       9.55       0.08       0.09       0.68       0.65
Num electronics – sw – productive purch         4.1       0.09       1.06       2.21       0.24       0.87
Num other – network purch                       1.1       1.02       1.54       2.89       1.98       0.87
Num other – accessories purch                  0.11       1.55       0.22       1.59       1.08       1.54
Num other – other purch                        0.02       1.06       0.65       0.68       0.28       2.25
Num total purch                               14.49      25.64       8.88      18.17       7.95       9.65
Rev DT purch                                  3,150      4,730        999      2,592        352         81
Rev NB purch                                  2,320        720        680      1,152        630        168
Rev electronics – TV purch                      127      1,811        104      1,553         30        242
Rev electronics – camera purch                    7         15        371        658         60        158
Rev electronics – printer purch                 207        105        173        179         82         44
Rev electronics – accessory purch                90        853          6         81         19        140
Rev electronics – phone purch                     7        454        223        200         14         79
Rev electronics – sw – game purch                 1        716          5          6         37         42
Rev electronics – sw – productive purch         308          2         80        166         18         65
Rev other – network purch                        61         97         85        159        109         48
Rev other – accessories purch                     4        271          8         56         38         54
Rev other – other purch                           0         12         10         10          4         34
Rev total purch                               6,281      9,786      2,742      6,811      1,393      1,154

Table 8.7 Additional details by segment

                                          Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6
Number DM sent                                 13.5        9.1       19.5        5.6        6.8        9.5
Number EM sent                                 15.9       17.8        9.1       12.9       15.5       12.8
Number EM open                                  1.4        3.2        0.4        4.5        1.7        2.6
Number EM click                                 0.1        0.4          0        2.3        0.3        0.2
Number prod purch call centre                   3.6        2.6          8        0.9          2        3.9
Number prod purch online                       10.9       23.1        0.9       17.3          6        5.8
Number DM discount                              8.1        5.5       11.7        3.4        4.1        5.7
Number EM discount                             11.1       12.5        6.4          9       10.9          9
Number DM call                                  1.2        0.8       15.9        0.2        3.9        9.5
Number EM call                                  9.4       12.8        2.1        3.4        8.4        4.8
Num online config                               5.5       21.5        0.7       16.5       12.6        0.4
Number call centre purch                        3.6        2.6          8        0.9          2        3.9
Number call centre complaint                    2.1        0.9        5.6        3.2        1.2        0.5
Age                                            28.9       25.5       41.9       30.1       21.2       38.9
Education (years)                              19.1       12.9       11.8       17.9       13.8       13.8
Income                                      185,000     60,000     45,000    125,000     15,250     75,000
Size hh                                         4.1        1.2        3.9        3.7        1.1        3.1
Occupation – blue collar                        20%        19%        60%        18%        13%        25%
Occupation – white collar                       70%        38%         1%        65%         5%        35%
Occupation – agriculture                         4%         5%         2%         1%         5%        18%
Occupation – government                          3%        28%        25%        15%        15%        11%
Occupation – unemployed                          1%         8%        10%         1%        60%        10%
Ethnicity – Asian                               15%         5%         2%        21%         7%         1%
Ethnicity – white                               55%        65%        35%        41%        70%        80%
Ethnicity – black                               20%        15%        35%         8%        10%        11%
Q1 purchase                                     30%         4%         6%        20%         5%         1%
Q2 purchase                                     25%        10%         5%        31%         5%         3%
Q3 purchase                                     20%        15%         5%        33%        75%        55%
Q4 purchase                                     25%        70%        83%        14%        15%        41%
Avg time between purch (months)                 6.5        3.1       16.5        4.2        9.4       15.4
Avg time between web visits (weeks)             3.2        2.1        9.5        1.9        3.9        8.5

Naming the segments

One of the most enjoyable exercises ever is the naming of the segments. A common way to do it is through revenue and products: this is the desktop segment and this is the low-tech segment, etc. Another possibility is with marcom: these are the direct mail responders and this is the e-mail preference segment, etc. Both of these are probably too simplistic.

Each segment name should have only two or three words to describe it: desktop devotees, gamers, life starters, web-centrics, etc. The idea is to be descriptive as well as memorable.

K-means compared to LCA

The comparison below came from Scott’s debate with other analytic folks. Some of them had learned K-means and, because LCA was new to them, did not really understand or trust it. Therefore Scott ran LCA and told the K-means team the number of segments he found and which variables to use. Note that these two pieces of information (how many segments, and which variables are significant) are not something K-means would ever provide on its own. Thus he gave the K-means team two HUGE advantages. Each team ran the algorithm and produced the KPIs in Table 8.8.

Table 8.8 KPIs

LCA output                                Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6   hi/low
% of market                                     30%        24%        19%        15%         9%         3%       12
% of revenue                                    32%        39%         9%        17%         2%         0%    81.44
Num total purch                               14.49      25.64       8.88      18.17       7.95       9.65     3.23
Rev DT purch                                  3,150      4,730        999      2,592        352         81     58.4
Rev NB purch                                  2,320        720        680      1,152        630        168    13.81
Rev total purch                               6,281      9,786      2,742      6,811      1,393      1,154     8.48
Number DM sent                                 13.5        9.1       19.5        5.6        6.8        9.5     3.48
Number EM sent                                 15.9       17.8        9.1       12.9       15.5       12.8     1.96
Number EM open                                  1.4        3.2        0.4        4.5        1.7        2.6     12.4
Number EM click                                 0.1        0.4          0        2.3        0.3        0.2   124.04
Number prod purch call centre                   3.6        2.6          8        0.9          2        3.9      8.8
Number prod purch online                       10.9       23.1        0.9       17.3          6        5.8    25.99
Education (years)                              19.1       12.9       11.8       17.9       13.8       13.8     1.62
Income                                      185,000     60,000     45,000    125,000     15,250     75,000    12.13
Q4 purchase                                     25%        70%        83%        14%        15%        41%     5.93
Time between purch (months)                     6.5        3.1       16.5        4.2        9.4       15.4     5.32
Time between visits (weeks)                     3.2        2.1        9.5        1.9        3.9        8.5        5

K-means output                            Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6   hi/low
% of market                                     24%        19%        17%        16%        15%         9%     2.67
% of revenue                                    19%        15%        17%        19%        18%        13%     1.45
Num total purch                                14.1       17.7       16.2       14.8       16.9       17.2     1.26
Rev DT purch                                  1,901      2,490      3,498      4,021      2,011      2,666     2.12
Rev NB purch                                  1,344      1,108      1,655      1,100      1,100        911     1.82
Rev total purch                               4,992      5,006      6,271      7,509      7,489      9,200     1.84
Number DM sent                                 10.1         11       11.2       12.8       12.9       15.1      1.5
Number EM sent                                 11.9       15.2       16.4       15.2       14.9         15     1.38
Number EM open                                  1.8        2.2        2.3        2.2        2.1        2.8     1.56
Number EM click                                0.61       0.66       0.54       0.52       0.51       0.26     2.54
Number prod purch call centre                   3.1        3.6        3.7        3.9        3.4        4.9     1.58
Number prod purch online                        9.1       10.2       12.4       17.1       13.5       13.6     1.88
Num total purch                                12.2       13.8       16.1       21.0       16.9       18.5     1.73
Education (years)                              16.3       16.4       15.1       13.1       15.3       15.5     1.25
Income                                      109,655    109,166     98,066     98,054     97,112     88,055     1.25
Q4 purchase                                     39%        34%        61%        44%        44%        55%     1.79
Time between purch (months)                     6.6        7.5        7.7        9.1        8.1        7.9     1.38
Time between visits (weeks)                     3.8        4.1        4.5        4.6        3.5        4.9      1.4

Notice in the top (LCA) part of the table the variable ‘Num total purch’. This table shows the averages by segment. Segment 2 on average purchases the most items, with 25.64, and segment 5 purchases the fewest items on average, with 7.95. Look at the last column to see the high/low: 25.64/7.95 = 3.23. That is a measure of range, or dispersion.

See the lower part of the table, which uses K-means. It is the same data, the same number of segments and the same variables used as significant. The high/low of Num total purch is far smaller than that from LCA: a high of 17.7 and a low of 14.1 give a ratio of only 1.26. This is a typical difference. K-means output would work; LCA is simply better, more distinct and ultimately produces a clearer strategy.
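The hi/low column is simply the largest segment average divided by the smallest. A short Python check using the Num total purch rows of Table 8.8. A zero in a row (such as segment 6's rounded 0% of revenue) makes the ratio undefined, so the table's ratios for rows like that were presumably computed from unrounded figures.

```python
def hi_low(values):
    """Largest segment average divided by the smallest (a dispersion measure)."""
    low = min(values)
    if low == 0:
        raise ValueError("hi/low is undefined when a segment average is zero")
    return max(values) / low


lca = [14.49, 25.64, 8.88, 18.17, 7.95, 9.65]      # Num total purch, LCA
kmeans = [14.1, 17.7, 16.2, 14.8, 16.9, 17.2]      # Num total purch, K-means
print(round(hi_low(lca), 2), round(hi_low(kmeans), 2))   # 3.23 1.26
```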

Another fairly common finding when comparing K-means to LCA concerns segment size. LCA produces segments ranging from 30% down to 3%, but K-means ranges only from 24% to 9%. This is because K-means produces roughly spherical clusters, which tend to be of similar size. There is no marketing theory that would hypothesize that segments should be of about the same size.

Scott convinced the team that the LCA output was the obvious way to go.

Elasticity modelling

One very natural and helpful exercise after segmentation is elasticity modelling. (Remember that Chapter 3, on demand, went through the modelling detail.) This shows the different price sensitivities by segment. That is, one segment will likely be sensitive to price, another segment will likely NOT be sensitive to price, and so on. This allows for very lucrative strategies. Review earlier chapters for how elasticity modelling is typically done.
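To show what "price sensitivity by segment" means numerically, here is a minimal Python sketch. It assumes a constant-elasticity (log-log) demand curve, ln(q) = a + b·ln(p), fitted separately within each segment, so that the slope b is that segment's price elasticity. This is one common specification and not necessarily the exact model from Chapter 3. The function names and the data are made up for illustration.

```python
import math

def ols_slope(xs, ys):
    """Slope of a simple least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def elasticity_by_segment(rows):
    """rows: (segment, price, quantity) tuples. Returns {segment: elasticity}."""
    logs = {}
    for segment, price, qty in rows:
        lp, lq = logs.setdefault(segment, ([], []))
        lp.append(math.log(price))
        lq.append(math.log(qty))
    # In a log-log model the slope is the (constant) price elasticity.
    return {seg: ols_slope(lp, lq) for seg, (lp, lq) in logs.items()}

# Made-up data: segment 1 barely reacts to price, segment 3 reacts strongly.
prices = [10, 12, 14, 16, 18]
rows = [(1, p, 100 * p ** -0.3) for p in prices] + [(3, p, 100 * p ** -2.5) for p in prices]
print(elasticity_by_segment(rows))
```

An absolute elasticity below 1 (segment 1 here) marks an inelastic segment that should not need discounts. An absolute elasticity above 1 (segment 3) marks a segment that responds strongly to promotions.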

What Scott found was that segment 1 is not sensitive to price. This segment does not require a discount in order to purchase. Conversely, he found that segments 3 and 5 are very sensitive to price. These are the segments that will only buy with some kind of promotion.

Test and learn plan

The last step tends to be putting together some kind of testing plan. We will cover the statistical details later in the book, but the concept is straightforward.

The idea is to corroborate the sensitivities the segmentation found. That is, if a segment is sensitive to price, test that. If a segment prefers a particular channel, test that, and so on.

Usually selection is tested first, then promotion, and then channel or product category, etc. These tests are usually set up as test versus control.
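The statistics come later in the book. As a preview, one standard way to read a test-versus-control result is a two-proportion z-test on response rates. The Python sketch below uses only the standard library. The counts are hypothetical, and the two-sided p-value uses the normal approximation via `math.erfc`.

```python
import math

def two_proportion_z(resp_test, n_test, resp_ctrl, n_ctrl):
    """Two-sided z-test for a difference in response rates (pooled standard error)."""
    p_test, p_ctrl = resp_test / n_test, resp_ctrl / n_ctrl
    pooled = (resp_test + resp_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    z = (p_test - p_ctrl) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical promotion test on a price-sensitive segment:
# 120 of 2,000 responded in the test cell, 90 of 2,000 in the control cell.
z, p = two_proportion_z(120, 2000, 90, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value (conventionally below 0.05) suggests that the lift in the test cell is unlikely to be chance. That corroborates the sensitivity the segmentation suggested.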

HIGHLIGHT

WHY GO BEYOND RFM? (This article was published in a different format in Marketing Insights, April 2014)

Abstract

While RFM (recency, frequency and monetary) is used by many firms, it in fact has limited marketing usage. It is really only about engagement. It is valuable for a short-term, financial orientation, but as organizations grow and become more complex, a more sophisticated analytic technique is needed. RFM requires no marketing strategy, and as firms increase in complexity there needs to be an increase in strategic planning. Segmentation is the right tool for both.

RFM has been a pillar of database marketing for 75 years. It can easily identify your 'best' customers. It works. So why go beyond RFM? To answer that, let's make sure we all know what we're talking about.

What is RFM?

One definition could be, 'An essential tool for identifying an organization's best customers is the recency/frequency/monetary formula.' RFM came about more than 75 years ago for use by direct marketers. It was especially popular when database marketing pioneers (such as Stan Rapp, Tom Collins, David Shepherd and Arthur Hughes) started writing their books and advocating database marketing (as the next generation of direct marketing) nearly 50 years ago. It became a popular way to make a database build (an expensive project) return a profit. Thus, the most pressing need was to satisfy finance.

Jackson and Wang wrote, 'In order to identify your best customers, you need to be able to look at customer data using recency, frequency and monetary analysis (RFM)…' (Jackson and Wang, 1997). Again the focus is on identifying your best customers. But it is not marketing's job just to identify your 'best' customers. 'Best' is a continuum and should be based on far more than merely past financial metrics.

Although there are an infinite number of permutations, the usual way RFM is put into place ends up incorporating three scores. First, sort the database by most recent transaction and score the top 20%, say, with a 5 and so on down to the bottom 20% with a 1. Then re-sort the database on frequency, perhaps the number of transactions in a year. Again, the top 20% get a 5 and the bottom 20% get a 1. The last step is to re-sort the database on, say, sales dollar volume. The top 20% get a 5 and the bottom 20% get a 1. Now sum the three columns (R + F + M), and each customer will have a total ranging from 3 to 15. The highest scores are the 'best' customers.

Table 8.9 Customer totals

Customer ID   R   F   M   Total
999           3   2   1   6
1001          5   3   3   11
1003          4   4   2   10
1005          1   5   2   8
1007          1   4   1   6
1009          2   4   3   9
1010          3   4   4   11
1012          2   3   5   10
1014          3   1   5   9
1016          4   1   4   9
1017          5   2   3   10
1018          4   3   4   11
1020          4   4   3   11
1022          3   5   3   11
1024          2   4   2   8
1026          1   3   5   9
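The three-sort scoring described above is easy to express in code. This is a minimal Python sketch under stated assumptions. Quintiles are assigned by rank position, and ties are broken by input order, which is a simplification. The field names `days_since_last`, `num_purchases` and `sales` are hypothetical, and the customers are made up.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 5 (top 20%) down to 1 (bottom 20%) by its rank position.
    Ties keep their input order, which is a simplification."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 5 - (rank * 5) // n
    return scores

def rfm_scores(customers):
    """customers: dicts with 'id', 'days_since_last', 'num_purchases', 'sales'.
    Returns {id: (R, F, M, total)}; totals run from 3 (worst) to 15 (best)."""
    # Recency: fewer days since the last purchase is better, so the most recent get a 5.
    r = quintile_scores([c["days_since_last"] for c in customers], higher_is_better=False)
    f = quintile_scores([c["num_purchases"] for c in customers])
    m = quintile_scores([c["sales"] for c in customers])
    return {c["id"]: (ri, fi, mi, ri + fi + mi) for c, ri, fi, mi in zip(customers, r, f, m)}

# Ten made-up customers
customers = [
    {"id": 1001 + i, "days_since_last": 10 * (i + 1), "num_purchases": 10 - i, "sales": 1000 - 100 * i}
    for i in range(10)
]
scores = rfm_scores(customers)
print(scores[1001], scores[1010])  # (5, 5, 5, 15) (1, 1, 1, 3)
```

Note what the code never looks at. No field explains why a customer buys, which is exactly the limitation discussed next.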

Note that this 'best' is entirely from the firm's point of view. The focus is not on customer behaviour: not on what the customer needs, why those with a high score are so involved or why those with a low score are not so engaged. The point is to make a (financial) return on the database, not to understand customer behaviour. That is, the motivation is financial and not marketing.

RFM works as a method of finding those most engaged. It works to a certain extent, and that extent is selection and targeting. RFM is simple and easy to use, easy to understand, easy to explain and easy to implement. It requires no analytic expertise. It doesn't really even require marketers, only a database and a programmer.

Say you re-score the database every month, in anticipation of sending out the new catalogue. That means that every month each customer potentially changes RFM value tiers. After every time period a new score is run and a new migration emerges. Note that you cannot learn why a customer changed their purchasing patterns, why they decreased their buying, why they made fewer purchases or why the time between purchases changed. Much like the tip of an iceberg, only the blatant results are seen, and RFM gives nothing in the way of understanding the underlying motivations that caused the resultant actions. There can be no rationale as to customer behaviour, because the purpose of the algorithm was not to understand customer behaviour. RFM uses the three financial metrics and does not use an algorithm that differentiates customer behaviour.

Because RFM cannot increase engagement (it only benefits from whatever level of involvement, brand loyalty, satisfaction, etc. you inherited at the time, with no idea WHY), it tends to make marketers passive. There is no relationship building because there is no customer understanding. That is, because RFM cannot provide a rationale for what makes one value tier behave the way it does, marketing strategists cannot actively incentivize deeper engagement.

RFM is a good first step, but making a great step requires something beyond RFM. Marketers require behavioural segmentation in order to practise marketing.

What is behavioural segmentation?

Behavioural segmentation (BS) quickly followed RFM, out of frustration that RFM produced good, but not great, results. As with most things, complex analysis requires complex analytic tools and expertise. BS was put into place to apply marketing concepts when using a database for marketing purposes.

In order to institute a marketing strategy, there needs to be a process. Kotler recommended the four Ps of strategic marketing: Partition, Probe, Prioritize and Position. Partitioning is the process of segmentation.

While it's mathematically true that partitioning only requires a business rule (RFM is a business rule) to divide the market into sub-markets, behavioural segmentation is a specific analytic strategy. It uses customer behaviour to define the segments, and it uses a statistical technique that maximally differentiates the segments. James H. Myers even says, 'Many people believe that market segmentation is the key strategic concept in marketing today'.

BS is from the customer's point of view, using customer transactions and marcom response data to specifically understand what's important to customers. It is based on the marketing concept of customer-centricity. BS works for all strategic marketing activities: selection/targeting, optimal price discounting, channel preference/customer journey, product penetration/category management, etc. BS allows a marketer to do more than mere targeting.

An important point might be made here. Behaviours are caused by motivations, both primary and experiential. Behaviours are purchases, visits, product usage and penetration, opens, clicks and marcom responses, etc. These behaviours cause financial results: revenue, growth, lifetime value and margin.

Primary motivations would be unseen things like attitudes, tastes and preferences, lifestyle, value set on price, channel preferences, benefits or need arousal. There are also experiential, secondary causes of behaviour, typically based on some brand exposure. These are not behaviours, but they cause subsequent behaviours. These secondary causes would be things like loyalty, engagement, satisfaction, courtesy or velocity. Note that RFM uses recency and frequency, which are metrics of engagement, a secondary cause. RFM also uses monetary metrics, which are resultant financial measures. Thus RFM does not use behavioural data, but engagement and financial data. These are very different from the behavioural data used in BS. One simple way to distinguish behavioural data from secondary data is that behaviours are nouns: purchases, responses, etc. Note that secondary causes are adjectives: engagement metrics, loyal customers, recent transactions, frequently purchased, etc.

BS typically requires analytic expertise to implement. Behavioural segmentation is a statistical output (see the box on page 164).

One critical difference between BS and RFM is that in a behavioural segmentation, members typically do not change groups. That is, the behaviour that defines a segment evolves very slowly. For example, if one person is sensitive to price, her defining behaviour will not really change. She is sensitive to price even after she has a baby, as she ages, if she gets a puppy or if she buys a new house. The products she purchases might change, and her interest in certain campaigns might change, but her defining behaviour will not change. This is one of the advantages of BS over RFM. This is what drives your learning about the segments. BS provides such insights that each segment generates a rationale, a story, as to why it's unique enough to BE a segment.

While RFM uses only three dimensions, BS uses any and all behavioural dimensions that best differentiate the segments. It typically takes far more than three variables to optimally distinguish a market.

Because marketing mix testing can be done on each segment (using product, price, promotion and place), the insights generated make for differentiated marketing strategies for each segment. Testing whether RFM tiers drive behaviour is probably inappropriate, because tier membership potentially changes every time period. Much like studies that proclaim, 'women who smoke give birth to babies with low birth weight', there is spurious correlation going on. Another dimension (socio-economic, culture, etc.) might be the real (unseen) cause of the low birth weight, and NOT necessarily (only) the smoking. In the same way, other unseen dimensions of behaviour are at work when RFM is used to explain, say, campaign responses. That is, the response is not caused by the RFM tier, but by some other motivation.

In short, BS goes far beyond RFM. The insights and resultant strategies are typically worth it.

What does behavioural segmentation provide that RFM does not?

As mentioned, BS delivers a cohort of segment members that are maximally differentiated from the members of other segments. Because these members typically do not change segments, various marketing strategies can be levelled at each segment to maximize cross-sell, up-sell, ROI, margin, loyalty, satisfaction, etc.

BS identifies the variables that optimally define each segment's unique sensitivities. For example, one segment might be defined by channel preference, another by price sensitivity, another by differing product penetrations and another by a preferred marcom vehicle. This knowledge, in and of itself, generates vast insights into segment motivations. These insights allow each segment to be positioned differently, based on its key differentiators. You get away from trying to incentivize customers out of the 'bad' tiers and into the 'good' tiers. In BS, there are no good or bad tiers. Your job is now to understand how to maximize each segment based on what drives that segment's behaviour, rather than to focus only on migration. Thus, BS gives you a test-and-learn plan.

Because of the insights provided, you gain knowledge of each segment's prime pain points. This means that each segment can be treated with the right message, at the right time, with the right offer and at the right price. This kind of positioning creates a 'segment of one' in the customer's mind. This uniqueness differentiates the firm, perhaps even to the extent of moving it away from heavy competition and toward monopolistic competition. This means you approach a degree of market power in which the firm becomes a price maker.

Because BS provides such insights, it tends to make marketers very active in understanding motivations. This tends to generate very lucrative strategies for each segment.

Conclusion

What are the advantages of RFM? It's fast, simple and easy to use, explain and implement. What are the disadvantages of behavioural segmentation? It requires analytic expertise to generate, is more costly and takes longer to do.

BS takes behavioural variables and uses them for the purpose of understanding customer behaviour, and it uses a statistical algorithm to maximally differentiate each segment based on behaviour (see box overleaf). As mentioned, the vast majority of marketers that evolve from RFM to BS say it's worth it, and their margins agree.


There are three characteristics that distinguish behavioural segmentation from RFM. BS (typically) uses more behavioural data, BS uses the data for the specific purpose of understanding customer behaviour, and BS uses statistical techniques to maximally separate the segments.

There are two general philosophies in analysis: supervised and unsupervised techniques. Unsupervised techniques almost eliminate the analyst from the analysis. These are neural networks, machine learning, chaos theory, etc. Philosophically, it seems to be on the wrong track to run a technique that requires little analytic strategy. It's also well known that neural network techniques suffer from over-fitting and from difficulty in explaining what the model means (usually because of the hundreds of additional/transformational variables neural networking tends to create). Therefore, unsupervised techniques are not recommended.

Of the techniques that require some kind of analytic input, a short comparison from RFM to CHAID to K-means to latent class is instructive. RFM is multivariable (typically using three variables), but it is not multivariate: it does not use the three dimensions simultaneously. RFM is mathematical and could not be a statistically valid option.

CHAID (chi-squared automatic interaction detection) is sometimes offered as a segmentation solution. It is a tree-like structure that splits the nodes based on the chi-square test. While CHAID is fast and simple (and probably better than RFM), it cannot be optimal. CHAID is not a statistical model but a heuristic, a guideline. It brings with it no diagnostics and little intelligence.
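"Splits the nodes based on the chi-square test" can be made concrete. At each node, a CHAID-style split compares the candidate predictors by how strongly each is associated with the outcome. The Python sketch below is much simplified: real CHAID also merges categories and uses Bonferroni-adjusted p-values. The predictor names and counts are hypothetical. The sketch computes Pearson's chi-square for each candidate and picks the largest. Both candidates here are 2x2 tables with one degree of freedom, so the largest statistic is also the smallest p-value.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows = predictor categories; columns = (responders, non-responders). Made-up counts.
candidates = {
    "channel": [[30, 70], [60, 40]],
    "region": [[45, 55], [48, 52]],
}
stats = {name: chi_square_stat(t) for name, t in candidates.items()}
best = max(stats, key=stats.get)
print(best, round(stats[best], 2))  # channel 18.18
```

This greedy, one-variable-at-a-time choice is why the text calls CHAID a heuristic. Nothing in it checks whether the resulting leaves form an optimal set of segments.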

K-means (also called partition, iterative, or clustering) is another fast and simple technique. The typical algorithm requires you to decide on the number of clusters (as if you know) and decide which variables to use to design the clusters (as if you know). K-means gives no diagnostics to aid in these important decisions, leaving them to your arbitrary intuition.

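So, once the number of clusters and the variables to use for clustering have been decided, the algorithm goes to the first observation (eg a customer on the data set) that has all the variables populated. It uses that observation as the initial centroid of cluster 1 (a centroid is the average of all the variables in multidimensional space). It then goes to the next populated observation and calculates how far away it is from the first, based on the Euclidean distance (the square root of the summed squared differences). If it's 'far enough' away (based on criteria the analyst gives, or a default), it becomes the seed of its own cluster. The algorithm continues through the data set until the number of clusters supplied has been created. Each observation is then assigned to its nearest centroid, the centroids are recalculated as the averages of their members, and the assignment is repeated until it settles. In the end, every observation is classified into one (mutually exclusive) cluster.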

Note: 1) It is not statistical, but mathematical: it uses the Euclidean distance to assign cluster membership. 2) Cluster centroids (and hence clusters) are highly dependent on the order of the data set. If the data set is re-sorted, there will likely be very different segments. 3) It offers little in the way of diagnostics. 4) Because the clusters are naturally spherical (owing to assignments based on distance from a centroid), they tend to be of similar size, which seems an unlikely assumption in a real market. While K-means is a step above RFM and CHAID, it clearly suffers from many shortcomings.
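The procedure described above can be sketched in a few lines of Python: seeds are taken in data-set order, then observations are assigned to their nearest centroid. This is a minimal illustration, not SAS's implementation. The `radius` seed threshold stands in for the analyst's 'far enough' criterion and is a hypothetical parameter. After seeding, the sketch runs the usual K-means loop of reassigning points and recomputing centroids until nothing changes.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Average of each variable across the points (the cluster centre)."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, radius=0.0, max_iter=100):
    # 1) Seeds in data-set order: a point becomes a new seed if it is farther
    #    than `radius` from every seed chosen so far (this is why order matters).
    seeds = []
    for p in points:
        if all(euclidean(p, s) > radius for s in seeds):
            seeds.append(tuple(p))
            if len(seeds) == k:
                break
    if len(seeds) < k:
        raise ValueError("could not find k seeds at least `radius` apart")
    centroids = seeds
    labels = []
    for _ in range(max_iter):
        # 2) Assign every observation to its nearest centroid (mutually exclusive).
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # 3) Recompute each centroid; keep the old one if a cluster empties out.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centres = kmeans(points, k=2, radius=3.0)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

On this tidy toy data, reversing the row order only swaps the cluster numbers. On real, overlapping data, different seeds can converge to genuinely different clusters, which is the order dependence noted above.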

Latent class analysis (LCA) has been around for 50 years, but in the last 20 it has really caught on. LCA is a statistical technique, typically estimated by maximum likelihood, that uses Bayes' rule to compute each customer's probability of belonging to each segment. Because customer behaviour is probabilistic (even irrational), a statistical technique matches behaviour better than a mathematical technique does. It has diagnostics to find the optimal number of segments. It also has diagnostics to find which variables are significant for the segmentation.

LCA applies a probability score to every observation (customer on the data set) for belonging to each segment. For example, it's one thing if customer A is 95% likely to belong to segment 1 and only 5% likely to belong to segment 2. There is an obvious conclusion. But what if the customer is scored as 55% likely to belong to segment 1 and 45% likely to belong to segment 2, because they are newer on file or have displayed some unusual patterns? This is not so clear. LCA gives you the ability to remove from the segment assignments any customers who do not show strong segment behaviour. This should typically be a very small percentage of the file, but the ability to 'know' where each customer most likely belongs is very important strategically.
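To show where these membership probabilities come from, here is a minimal Python latent class sketch for yes/no behaviour indicators (for example 'bought online' or 'responded to e-mail'). It is estimated by maximum likelihood with the EM algorithm. It is a bare-bones illustration under simplifying assumptions: binary items, independence of items within a class, a fixed number of iterations and one random start. It is not the algorithm of any particular LCA package.

The sketch returns each customer's posterior probability of belonging to each segment, plus a BIC. BIC is one common diagnostic for comparing numbers of segments, and lower is better. The helper `assign` leaves a customer unassigned when no segment reaches a chosen threshold, mirroring the 55%/45% case above. All function names and the 0.6 threshold are hypothetical.

```python
import math
import random

def _e_step(data, weights, theta):
    """Posterior segment probabilities for every row, plus the log-likelihood."""
    post, loglik = [], 0.0
    for row in data:
        joint = []
        for w, probs in zip(weights, theta):
            lik = w
            for x, t in zip(row, probs):
                lik *= t if x else 1 - t  # items assumed independent within a class
            joint.append(lik)
        total = sum(joint)
        loglik += math.log(total)
        post.append([v / total for v in joint])
    return post, loglik

def lca_binary(data, k, n_iter=200, seed=0):
    """Latent class model for 0/1 items fitted by EM.
    Returns class weights, item probabilities per class, posteriors per row and BIC."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    weights = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(k)]
    for _ in range(n_iter):
        post, _ = _e_step(data, weights, theta)
        for c in range(k):  # M-step: re-estimate class sizes and item probabilities
            rc = max(sum(p[c] for p in post), 1e-12)
            weights[c] = rc / n
            theta[c] = [
                min(max(sum(p[c] * row[j] for p, row in zip(post, data)) / rc, 1e-6), 1 - 1e-6)
                for j in range(n_items)
            ]
    post, loglik = _e_step(data, weights, theta)
    n_params = (k - 1) + k * n_items
    bic = -2 * loglik + n_params * math.log(n)
    return weights, theta, post, bic

def assign(posteriors, threshold=0.6):
    """Most likely segment per customer, or None when no segment is likely enough."""
    out = []
    for probs in posteriors:
        best = max(range(len(probs)), key=probs.__getitem__)
        out.append(best if probs[best] >= threshold else None)
    return out

# Made-up data: two clearly different behaviour patterns across four items.
data = [[1, 1, 1, 1]] * 50 + [[0, 0, 0, 0]] * 50
_, _, post2, bic2 = lca_binary(data, k=2)
_, _, _, bic1 = lca_binary(data, k=1)
print(bic2 < bic1, assign([[0.95, 0.05], [0.55, 0.45]]))
```

In practice you would fit several values of k and keep the one with the lowest BIC. Customers returned as `None` are the small, ambiguous slice of the file that the text suggests holding out of segment assignments.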

It has been proved often, but by none better than Jay Magidson and Jeroen K. Vermunt, that LCA is vastly superior to K-means in terms of segment identification and separation (Magidson and Vermunt, 2002). Given the advantages of LCA seen above, it should be regarded as the first and best choice.

Checklist


Remember that SAS gives a metric of an optimal segmentation solution as the 'log of the determinant of the covariance matrix'.

Recall a variety of segmentation techniques: business rules, CHAID, hierarchical clustering, K-means, latent class analysis (LCA), etc.

Point out that LCA provides the optimal number of segments and a diagnosis of which variables are significant, and calculates a probability score for every member belonging to every segment: nothing is arbitrary!

Use the behavioural segmentation process: strategize, collect behavioural data, create/use additional data, run the chosen algorithm and profile the segment output.

Prove that RFM is from the firm's point of view and not the consumer's.

Preach that RFM incites no strategy except migration.

Part four

Other

09

Marketing research

Introduction

How is survey data different than database data?

Missing value imputation

Combating respondent fatigue

A far too brief account of conjoint analysis

Structural equation modelling (SEM)


Introduction

Why stick in a chapter on marketing research? Most of the analytic techniques discussed so far apply to both marketing research and database marketing. The reason for a separate chapter is that, while there is overlap, the function and goal of marketing research are different from those of database marketing.

Database marketing exists in order to drive purchases from customers. Marketing research exists in order to understand consumer behaviour.

Database marketing is populated with programmers, econometricians and marketers. Marketing research is populated with psychologists, statisticians and marketers. Database marketing is applied analytics. Marketing research is exploratory analytics. Database marketing is tactical and fast. Marketing research is strategic and thorough.

Merlin Stone's book Consumer Insight (Stone, 2004) covers both database marketing and marketing research well. This overview includes CRM, marketing systems/operations, loyalty, etc.

How is survey data different than database data?

This is a good question, and more involved than it may seem at first glance. Of course, survey data comes from a survey and database data comes from a database. But the key thing is that the source of survey data is (typically) the consumer: it is self-reported and may even include opinions, etc. The source of database data is (typically) a system, transactional or otherwise, and it is real data: real behaviour, real responses. That is, it is NOT self-reported.

Marketing research as a discipline tends to focus on survey data, whereas direct marketing, of course, tends to focus on database data. You've seen how many marketing science techniques are applicable to both. This chapter picks out the techniques that are used mostly in marketing research. You cannot really do, for example, a conjoint on database data; it is not designed that way.

This is one area of contention alluded to earlier, especially in terms of pricing. Marketing research would suggest a survey that asks customers/potential customers about pricing policies. These responses are subjective/self-reported and tend to reach the same conclusion: 'Your prices are too high!' Conjoint is designed to get around that in some manner, but it is still artificial compared with a real buying/choice decision. That's why I recommend using database data, which captures real reactions from real transactions facing real choices at real prices. Real cool, right? But there is a place for surveys, conjoint, etc. Just see below.

Missing value imputation

A common issue in survey data (as well as in database data, but less so) is what to do about missing values. It is a typical practice (but, as is the case with most typical practices, not a good idea) to just replace the missing value with the mean value. That is, say we have survey data on demographics, including age. Say that in this case age is important to what we're studying. If a very small percentage of ages is missing, maybe replacing the missing values with the overall mean is not so bad. But it's still stupid.

A better possibility is to do a segmentation (even K-means is a decent choice) based on, say, income or size of household, and replace the missing age values with the mean of each segment. This assumes that age is correlated with income or size of household, and that's probably not a bad assumption.

The best idea would be to use ordinary regression, within each segment, to predict age from the above demographics. This would add variation, rather than using only the (segment) mean value.

This is all based on a subjective rule of thumb that depends on the percentage of values missing. If, say, <5% is missing, replacing with the overall mean value might be acceptable. If, say, between 5% and 25% is missing, replacing with the mean value by segment is better. If between 25% and 50% is missing, modelling the missing value with regression by segment is best. If >50% is missing, no imputation should be attempted.
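The rule of thumb above can be written down directly. Below is a minimal Python sketch under stated assumptions. The segment labels already exist from some earlier segmentation, and a single numeric predictor such as income feeds the regression option. The boundaries fall as coded: exactly 5% counts as segment-mean territory, and so on, which the text leaves open. Function names and data are hypothetical.

```python
from statistics import mean

def imputation_method(share_missing):
    """Map the share of missing values to the approach suggested in the text."""
    if share_missing < 0.05:
        return "overall_mean"
    if share_missing <= 0.25:
        return "segment_mean"
    if share_missing <= 0.50:
        return "segment_regression"
    return "none"  # more than half missing: do not impute

def impute(values, segments, predictor):
    """values: list with None for missing; segments and predictor are parallel lists.
    Returns (filled values, method used)."""
    known = [v for v in values if v is not None]
    method = imputation_method(1 - len(known) / len(values))
    filled = list(values)
    if method == "none":
        return filled, method
    for i, v in enumerate(values):
        if v is not None:
            continue
        if method == "overall_mean":
            filled[i] = mean(known)
            continue
        # Known (predictor, value) pairs from the same segment as the missing row
        pairs = [(predictor[j], values[j]) for j in range(len(values))
                 if segments[j] == segments[i] and values[j] is not None]
        ys = [y for _, y in pairs]
        if method == "segment_mean" or len(pairs) < 2:
            filled[i] = mean(ys) if ys else mean(known)  # fall back if the segment is thin
            continue
        xs = [x for x, _ in pairs]
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx if sxx else 0.0
        filled[i] = my + slope * (predictor[i] - mx)  # prediction varies within the segment
    return filled, method

ages = [20, 30, 40, None, None, 60]       # one third missing, so regression by segment
segs = ["a", "a", "a", "a", "b", "b"]
income = [40, 60, 80, 100, 50, 70]        # hypothetical income, in thousands
print(impute(ages, segs, income))
```

The fallback to a mean when a segment has fewer than two known values is a design choice of this sketch, not something the text prescribes.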

Combating respondent fatigue

Marketing surveys should be short. (I don't know exactly what I mean by short, but they should require little effort, thinking or time.) If they are too long (whatever too long means), fatigue will set in (or, worse, irritation) and responses will begin to be erroneous/nonsensical.

The first suggestion for combating this problem is to design surveys that are short. It's better to have two or three surveys instead of one long survey. Otherwise the answers are meaningless.

An analytic suggestion is to rotate and model questions. This requires some thinking and design, but the results are usually very good.

The general idea is to use some questions to model the answers to other questions. Obviously these modelled questions would not be asked. That is, say the survey is in three sections, A, B and C, well designed for modelling. Only one fourth of the respondents (randomly chosen) would get the entire survey. One fourth would get one half of A and one half of B, another fourth would get one half of A and one half of C, and the last fourth would get one half of B and one half of C. For these last three fourths of the respondents, the survey is less than half as long.

Now the idea is to model the other halves of the sections that were not given. That is, use answers from A and B to model the missing C, B and C to model A, and A and C to model B. See? From my experience, the errors from the model in the rotate-and-model scenario are far smaller than the errors from fatigue. That means that, with the models at 95% confidence, those answers are better than giving the entire long survey to 100% of the respondents, which would introduce fatigue-induced errors into their answers.
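The rotation and the modelling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Respondents are split at random into the four versions, and an unasked item is predicted with a one-predictor straight-line fit trained on the full-survey respondents. In practice you would use several predictors from the answered sections. The version labels, function names and scores are hypothetical.

```python
import random
from statistics import mean

# The four versions described above (labels are hypothetical).
VERSIONS = ("full survey", "half A + half B", "half A + half C", "half B + half C")

def assign_versions(respondent_ids, seed=0):
    """Randomly split respondents into four (near-)equal groups, one per version."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    return {rid: VERSIONS[i % len(VERSIONS)] for i, rid in enumerate(ids)}

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

assignment = assign_versions(range(100))
print({v: list(assignment.values()).count(v) for v in VERSIONS})

# Train on full-survey respondents: an A-section score predicting a C-section item
# (made-up scores), then predict C for someone who only saw halves of A and B.
a_scores, c_item = [2, 3, 4, 5], [5, 7, 9, 11]
intercept, slope = fit_line(a_scores, c_item)
print(intercept + slope * 6)
```

Random assignment matters here. It keeps the full-survey quarter representative, so the fitted relationship should carry over to the other three quarters.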

A far too brief account of conjoint analysis

To be fair, if you're reading this book in order to learn all about conjoint analysis, you are reading the wrong book. There are dozens of (entire) books detailing all the cool types and techniques of conjoint. I will barely mention it here because conjoint is a vast subject and I am not much of a conjoint guy.

To elaborate on the last point, I think conjoint serves an important purpose, especially in marketing research, and especially in product design (before the product is introduced). My main problem (as mentioned above) with surveys overall is that they are self-reported and artificial. Conjoint sets up a contrived situation for each respondent (customer) and asks them to make choices. These choices are typically about purchasing a product. You know I'm an econ guy, and these customers are not really purchasing. They are not weighing real choices. They are not using their own money. They are not buying products in a real economic arena. The artificiality is why I do not advocate conjoint for much other than new product design. That is, if you have real data, use it. If you need (potential) customers' input in designing a new product, use conjoint for that. Also, please recognize that conjoint analysis is not actually an 'analysis' (like regression, etc.) but a framework for parsing out simultaneous choices. Conjoint means 'considered jointly'.

The general process of conjoint is to design choices, depending on what is being studied. Marketing researchers are trying to understand which attributes (independent variables) are more or less important to (typically) customers purchasing a product. So a collection of experiments is designed to ask customers how they'd rate a product (how likely they would be to purchase it) given varying product attributes.

In terms of, say, PC manufacturing, choice 1 might be: a PC costing 800, a 17 inch monitor, a 1 Gig hard drive, 1 Gig of RAM, etc. Choice 2 might be: a PC costing 850, a 19 inch monitor, a 1 Gig hard drive, 1 Gig of RAM, etc. Enough choices are designed to show each customer that 'part-worths' can be calculated, showing how much they value different product attributes. This is supposed to give marketers and product designers an indication of market size and of the optimal design for the new product.

Note that it is important to design the types and number of levels of each attribute so that the independent variables are orthogonal (not correlated) to each other. These choice design characteristics are critical to the process. At the end, an ordinary regression is used to optimally calculate the values of the part-worths. It is these estimated values that make conjoint strategically useful.

Now let's take a slightly deeper dive into the analytics of conjoint. The idea is to present choices to responders (in such a way that they are random and orthogonal), and the responders rank these choices. The choice rankings are a responder's judgment about the 'value' (economists call it utility) of the product or service evaluated. It is assumed that this total value breaks down into the attributes that make up the choices. These attributes are the independent variables, and their estimated values are the part-worths of the model. That is:

$$U_i = x_{11} + x_{12} + x_{21} + x_{22} + \cdots + x_{mn}$$

where $U_i$ = total worth for product/service $i$ and

$x_{11}$ = part-worth estimate for level 1 of attribute 1

$\vdots$

$x_{mn}$ = part-worth estimate for level $m$ of attribute $n$.
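The part-worth idea can be illustrated with a tiny ratings-based example in Python. This is a minimal sketch under stated assumptions. It uses two attributes from the PC example (price 800 or 850; monitor 17 or 19 inches). Each attribute is dummy coded with its first level as the baseline, so that level's part-worth is fixed at 0. The ratings are made up, and the ordinary least squares fit is solved through the normal equations. Choice-based conjoint designs are usually estimated differently (for example with logit models), so treat this as an illustration of the additive model, not a production method.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Columns: constant, price is 850 (vs 800), monitor is 19 inch (vs 17). Made-up ratings.
profiles = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
ratings = [6.0, 4.0, 7.5, 5.5]
base, pw_price_850, pw_monitor_19 = ols(profiles, ratings)

# Relative importance: each attribute's part-worth range over the sum of ranges.
ranges = {"price": abs(pw_price_850), "monitor": abs(pw_monitor_19)}
importance = {a: r / sum(ranges.values()) for a, r in ranges.items()}
print(round(pw_price_850, 2), round(pw_monitor_19, 2), {a: round(v, 2) for a, v in importance.items()})
```

The full-factorial 2x2 design keeps the two dummy columns uncorrelated, which is the orthogonality the text insists on. The range-based importance share is one common summary of part-worths, not the only one.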

As mentioned above, my view (and many will violently disagree) is that conjoint is appropriate for new product/service evaluations, and that's about all. It is not appropriate in the way it is typically used, especially for pricing, except, as mentioned, for a new product: a product for which there is no real data. (I even prefer, say, van Westendorp pricing schemes over conjoint. In these, the survey asks respondents what price is so high they would not consider a purchase and what price is so low they would suspect a quality issue. The point where the 'too expensive' and 'too cheap' curves cross is hypothesized to be the optimal price.)
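The crossing point can be found by building the two cumulative curves from the survey answers. Below is a minimal Python sketch with made-up answers. It assumes only the two questions mentioned here; the full van Westendorp method asks four questions and reads several crossings. It also uses a simple scan over a price grid rather than interpolation.

```python
def share_too_expensive(price, too_expensive_answers):
    """Share of respondents for whom this price is at or above their 'too expensive' point."""
    return sum(a <= price for a in too_expensive_answers) / len(too_expensive_answers)

def share_too_cheap(price, too_cheap_answers):
    """Share of respondents for whom this price is at or below their 'too cheap' point."""
    return sum(a >= price for a in too_cheap_answers) / len(too_cheap_answers)

def crossing_price(too_expensive_answers, too_cheap_answers, grid):
    """First grid price where the rising 'too expensive' curve meets the falling 'too cheap' one."""
    for p in sorted(grid):
        if share_too_expensive(p, too_expensive_answers) >= share_too_cheap(p, too_cheap_answers):
            return p
    return None

too_expensive = [60, 70, 80, 90, 100]
too_cheap = [20, 30, 40, 50, 60]
print(crossing_price(too_expensive, too_cheap, grid=range(0, 121, 5)))  # 60
```

A finer grid, or linear interpolation between grid points, gives a more precise crossing. With real survey data the curves are usually much smoother than in this five-respondent toy.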

Anyway, for an existing product, it is possible to design a conjoint analysis and put price levels in as choice variables. I have had marketing researchers tell me that this price variable derives an elasticity function. You should know by now how I feel about that. I disagree for the following reasons. 1) Those estimates are NOT real economic data. They are contrived and artificial. 2) The sample they are derived from is too small to make real corporate strategic choices. 3) The data is self-reported. Those respondents are not spending their own money on real products in a real economic arena. 4) Using real data is far superior to using conjoint data. Have I said this enough yet? Ok, the rant will now stop.

Structural equation modelling (SEM)

This will unfortunately (also) be a far-too-brief account of SEM. SEM is in the domain of marketing research, rather than direct/database marketing (where we've spent most of our time), but it is so powerful and so much fun that a quick tour has to be done.

There are some similarities between SEM and simultaneous equations (covered earlier). Both are about systems of equations, and several similarities follow from that. Both deal with endogenous and exogenous variables. Both require the algebraic solution of fixed variables and enough observations to calculate variance. Of course, both require the analyst to think through cause and effect. This is because both techniques are about cause and effect and can be conceptualized as regressions.

As mentioned, SEM is a marketing research tool while simultaneous equations are an econometric tool. This is the first difference. Another (major) difference is that simultaneous equations are (only) about blatant variables, while SEM can contain both blatant and latent variables. This is in fact, in my view, the most important (and exciting) difference. Another difference is that simultaneous equations operate on each (raw) observation (say, each row is a customer), while in SEM an observation is an element of a covariance matrix. Whew. So, with that, let's go on to a few definitions of SEM as a different kind of animal.

Figure 9.1 Units and price cause revenue

In the contrived example above, note that both units and price CAUSE revenue. Revenue is a dependent variable. That's equation 1. Note also that both price and marcom CAUSE units. Units are the dependent variable in equation 2. Obviously units are both an independent and a dependent variable. There are two equations. All of these are blatant (manifest) variables. They can be measured for what they are.

Revenue = f(units, price)

Units = f(price, marcom)

It is true in this case that while price and marcom statistically impact units (with stochastic error), revenue is NOT statistically driven by units and price with a random error. Revenue is algebraically caused by units * price, with no error. It's just an example. It also shows that SEM is often diagrammed using paths, and we will do the same. The examples will revolve around path analysis. In SAS this is done with proc calis.

Let's go over some terminology, as SEM has its own language, jargon, etc. As noted, there are two kinds of variables: manifest and latent. Manifest variables are blatant: directly measured, directly observed. These are things like responses, sales, units, price or days between purchases. The second kind of variable is latent. These are (indirectly) estimated through observable data. These are things like satisfaction, loyalty and intelligence. That is, while there is no quantitative observable metric of, say, satisfaction, it can be inferred from observable behaviour.

Now let's mention exogenous and endogenous variables again. Exogenous variables are outside the system; they are independent variables (not caused by anything in the model) and can be either latent or manifest. Endogenous variables are typically (at least) dependent variables and are caused by something else. They also can be either latent or manifest. Okay? Now we're ready to do SEM.

Comparing regression to SEM

For a simple example, let's use proc reg for revenue = f(units, price) and then proc calis for revenue = f(units, price).

This is far too simple a use of SEM, but it will illustrate some important things. Note that all variables are manifest and we have only one equation. Let's say we run proc reg and get the following:

Table 9.1 Proc reg

Variable     Parm estimate   Standard error   T value
Intercept    -8862
Units        73.24           7.4              9.98
Price        111.25          19.03            5.84

Now if we run proc calis:

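proc calis data=xx.xx meanstr;   /* MEANSTR adds the mean structure, so intercepts are estimated */
   path
      rev <--- units n_price;    /* revenue regressed on units and price */
run;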

Table 9.2 Proc calis

Path      Variable     Parm estimate   Standard error   T value
revenue   Intercept    -8863
          Units        73.24           1.48             49.39
          Price        111.25          2.07             53.81

Proc calis gives a lot more results (not shown here). The only point here is that SEM and OLS give the same (single-equation, manifest) output in terms of parameter estimates. The t-values differ because regression uses a different denominator for the standard error than SEM does.

Calculating impacts

Now let's see what happens when we include more complexity and more realism. Most marketers want to know the impact of their marcom (and price) on revenue. Say we ran a regression model revenue = f(units, price, e-mail, direct mail). (We will ignore the algebraic issue of having both price and units as independent variables.) The interest here is in marcom impacts.

Table 9.3 Regression model revenue

Variable     Parm estimate   Standard error   T value
Intercept    -9368
Units        77.08           7.569            9.79
Price        115.24          20.112           5.73
Email        9.089           2.969            3.06
Direct mail  3.99            1.88             2.12

This indicates that every e-mail sent drives 9.089 in revenue, and every direct mail sent brings in 3.99 in revenue. Looks like marcom is really rockin'! This means that sending 100 of each drives 909 and 399, or 1,308 in total revenue. This model implicitly assumes the impact of marcom is directly on revenue and not on units. The R2 here is 57%.

Now let's go a step further, and the results will be more interesting. We will use the path of two equations from above:

Revenue = f(units, price)

Units = f(price, email, direct mail)

where marcom is the number of e-mails and direct mails sent. One hypothesis here is that units and price directly (algebraically, in this case) impact revenue. The other hypothesis is that price and marcom (EM and DM) directly impact units, which then indirectly impact revenue. That is, units are both a dependent and an independent variable. That means that revenue comes from both price and units, and that units come from price, EM and DM.

This means the total impact on revenue is:

Table 9.4 Total impact on revenue

Path      Variable     Parm estimate   Standard error   T value
revenue   Intercept    -8863
          Units        73.24           1.48             49.39
          Price        111.25          2.07             53.81
units     Intercept    259
          Price        -2.53           0.082            -30.88
          Email        1.266           0.299            4.23
          Direct mail  1.141           0.089            12.82

Most importantly, note that the impact of marcom is through units, and not directly on revenue. The impact of one e-mail is now 1.266 of revenue, and that of every direct mail is now 1.141. Now sending 100 of each totals only 241 in revenue. This is far more realistic than the above model. The R2 here is 78%. While this is a contrived, overly simplistic model, its complexity more closely matches reality.

Use of latent variables

Now let's talk about where the real power of SEM comes in: the use of latent variables. In this case let's put together a framework for loyalty. Note that there is actually no such thing as a blatant entity called or quantified as 'loyalty'. It is a latent variable. The idea is that it is like intelligence, which also cannot be quantified directly. Intelligence can only be measured indirectly, through something like a score on an IQ test, which in turn measures dimensions of intelligence: spatial ability, logic, mathematics, verbal skills, etc. The same is true for loyalty. It can be seen and surmised from other actions.

Let's say we have a behavioural segmentation in place based on customer transactions and responses to marcom. We are interested in how loyal each segment is, which is not necessarily the same thing as how much they spend or how many transactions they have. So we do primary marketing research and ask questions about opinions/attitudes around price, value, quality and satisfaction. These metrics will show a range of loyalty. We also ask about share of voice, competitive density and the convenience of our stores compared with our competitors'.

Themodelabovetriestoputaframeworktogetherthatsaysconsumerbehaviour(transactions,responses,etc.)iscausedbyaspectrumofloyalty(fromnonetotransactionaltoemotional)whichisinturncausedbyattitudesaroundprice,value,satisfactionandqualityaswellasopinions/metricsofoperationallogisticslikeconvenience,shareofvoiceandcompetitivedensity.

Figure 9.2 Marcom responses, transactions

So the general analytic idea is that there are no such metrics/quantities as emotional or transactional loyalty. These are latent variables. But adding these variables helps explain the behaviour of customers purchasing and customers responding. The latent variable is discovered by a factor-analysis-type technique used in SEM. That is, the manifest variables indirectly show the influence of the latent variable, and that latent variable is ‘teased out’ and labelled.

A quick note about the difference between transactional and emotional loyalty should clarify this important point. It is possible for a customer to appear very loyal in terms of buying a lot of products, having a short time between purchases, responding to marcom, etc., but not in fact be loyal. They may be heavy purchasers because there might not be any competitors around, or our stores are very convenient, or our share of voice is comparatively large. Thus it’s important to know how ‘loyal’ customers are. That is, a transactionally loyal customer may jump ship if competitors move in near their location or change their share of voice.

The results below are from applying the loyalty model to two different segments, say X and Y. The segments were defined by behaviour (transactions and marcom responses). The question is how loyal they are (what kind of loyalty) and what can be done about it. Let’s say that each segment has generally the same metrics on transactions and responses. Segment X scores as a transactional loyalty segment. Note that the parameter estimates of convenience and competitive density are very high and significant, while share of voice is strong and negative. These are traditional indications of the transactional loyalty segment. Note also the high and positive impacts of attitudes around price and quality, and recognize that most of the variables on the emotional path are insignificant.

Now, a segment that scores as a strong transactional-loyalty-only segment is a bit of a red flag. This is especially true if they LOOK like they are loyal based on their number and amount of purchases.

How can we use the above model to move the segment from mere transactional loyalty to emotional loyalty? The answer is in the emotional loyalty path. The single largest impact there is share of voice, and that is a metric we can (somewhat) control. There is a business case around the cost of spending to increase our relative share of voice, weighed against the added security (and perhaps increased purchasing) of a segment that evolves into emotional loyalty. See that share of voice is negative in the transactional path? As SOV increases, a customer is less transactional and more emotional.

Table 9.5 Segment X, transactional loyalty

Path/variable    Parm est   St error   T value
Transactional
Price            5.65       3.23       1.75
Quality          6.21       1.65       3.75
Value            3.03       2.07       1.47
Satisfaction     1.35       0.66       2.05
Convenience      5.22       0.75       6.96
Competition      2.66       0.99       2.68
Share of voice   –1.55      1.03       –1.51
Emotional
Price            0.03       2.66       0.01
Quality          0.56       1.07       0.53
Value            1.04       2.36       0.44
Share of voice   2.55       1.69       1.51
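Each t value in these tables is simply the parameter estimate divided by its standard error, and a common rule of thumb treats |t| > 1.96 as significant at roughly 95% confidence. The short Python sketch below recomputes that check for the segment X estimates in Table 9.5. Because it works from the rounded published numbers, its t values can differ from the printed ones in the second decimal place; the 1.96 cutoff and the dictionary layout are illustrative choices, not something the SEM output prescribes.

```python
# Sketch: recompute t = estimate / standard error for Table 9.5 (segment X)
# and list the paths whose |t| clears a two-sided ~95% cutoff.

CRITICAL_T = 1.96  # rule-of-thumb cutoff, assumed for illustration

# (path, variable) -> (parameter estimate, standard error), from Table 9.5
SEGMENT_X = {
    ("Transactional", "Price"): (5.65, 3.23),
    ("Transactional", "Quality"): (6.21, 1.65),
    ("Transactional", "Value"): (3.03, 2.07),
    ("Transactional", "Satisfaction"): (1.35, 0.66),
    ("Transactional", "Convenience"): (5.22, 0.75),
    ("Transactional", "Competition"): (2.66, 0.99),
    ("Transactional", "Share of voice"): (-1.55, 1.03),
    ("Emotional", "Price"): (0.03, 2.66),
    ("Emotional", "Quality"): (0.56, 1.07),
    ("Emotional", "Value"): (1.04, 2.36),
    ("Emotional", "Share of voice"): (2.55, 1.69),
}


def t_value(estimate: float, std_error: float) -> float:
    return estimate / std_error


def significant_paths(table, critical=CRITICAL_T):
    """Return the (path, variable) keys whose |t| exceeds the cutoff."""
    return [key for key, (est, se) in table.items() if abs(t_value(est, se)) > critical]


if __name__ == "__main__":
    for (path, variable), (est, se) in SEGMENT_X.items():
        print(f"{path:<13} {variable:<15} t = {t_value(est, se):6.2f}")
    print("Significant:", significant_paths(SEGMENT_X))
```

Run against Table 9.5, only the transactional quality, satisfaction, convenience and competition estimates clear 1.96; the share of voice estimates on both paths sit at about ±1.5.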

Now let’s look at the opposite kind of loyalty, the brand/emotional kind. These are customers that love our brand, no matter what. View the output below for segment Y, which scores mostly as an emotionally loyal group. Note that on the emotional path convenience and competitive density are negative. This segment is so connected to the brand that even if it is inconvenient to go to our store they go anyway, and even if more competition moves in, these customers come to our store anyway. This is emotional loyalty. You see also that on the emotional path, while price is positive it is insignificant, and quality is very small. It should be no surprise that both value and satisfaction are high. On the transactional path none of those metrics are significant.

Table 9.6 Segment Y, emotional loyalty

Path/variable    Parm est   St error   T value
Transactional
Price            –1.27      5.65       –0.22
Quality          2.07       6.24       0.33
Value            2.07       1.65       1.25
Share of voice   –2.65      1.54       –1.72
Emotional
Price            3.25       3.04       1.07
Quality          0.24       0.12       2.06
Value            1.26       0.76       1.67
Convenience      –3.65      1.26       –2.91
Competition      –2.07      0.56       –3.66
Share of voice   1.27       0.87       1.45

This is the power of SEM: hypothesizing and testing a latent variable. This latent variable accounts for movement in customer transactions and customer responses. If only a blatant/manifest model had been used, the fit would not have been so good and the insights (differentiating between the two kinds of loyalty) would not have been realized. So is that cool, or what?

Checklist


Point out that marketing research and database marketing use many similar marketing science/analytic techniques.

Remember that survey data and database data are different in many ways:
• survey data is typically a few hundred or a few thousand responses, whereas perhaps millions of consumers have transactions on a database;
• survey data is self-reported/opinions, whereas database data is real events;
• survey data is a sample of some kind, whereas database data can be the whole relevant population (eg all of a firm’s customers).

Take great care in imputing missing values. Under some circumstances replacing a missing value with the mean is appropriate; other times a model may be called for.

Recall that conjoint analysis is best suited for new products, because of the artificial nature of the simulated purchase.

Differentiate between structural equation models (SEM) and simultaneous equations. SEM and simultaneous equations are both systems of equations, but SEM does not require only blatant variables.

Argue that the power of SEM is in uncovering latent variables.

10

Statistical testing: How do I know what works?

Everyone wants to test

Sample size equation: use the lift measure

A/B testing and full factorial differences

Business case


Everyone wants to test

Statistical testing (design of experiments, DOE) seems to decrease the risk of making a mistake.

Design of experiments: an inductive way of creating a statistical test using a stimulus, taking into account variance, confidence, etc., through randomization and comparison to a control group.

I’ll tell you right now, I myself am not really a testing guy. I see its worth, but the times that a test is actually ‘clean’, can be measured and is measuring what it was designed to measure, are very few. This is because of a couple of things. First, companies do not want to design for test vs. control – why would they want to take potential buyers out of the treatment (ie the control group does not get the stimulus – the test)? The marketing science answer is that ‘you must invest in the test!’ So firms usually fight to make the control group small, actually too small, so that a statistical test (t-test, z-test, etc.) cannot (reliably) be performed.

Another reason is that most of the time the test is ‘dirty’. We rarely manage to keep customers who were supposed to get only a certain kind (or no kind) of treatment (stimulus) to just that treatment. Say a customer is supposed to get treatment X so they can be measured against treatment Y (that is the test). However, accidentally, that customer also gets stimuli from other parts of the company, and the number one rule of testing is: only one thing can be different in measuring test vs. control. If a customer was supposed to get only treatment X and they (or some of them) also got stimulus A, treatment B, promotion C, etc., the test cannot be done; you cannot measure (in a DOE framework) multiple differences (without designing for that). That is why the design is critical.

Very few companies are disciplined enough to actually carry out a test. Most of the time, at the end of the test, everyone shrugs their shoulders and acknowledges seasonality or competition or changing tastes and preferences, or hypothesizes that something systematic affected the test results. So they want to test again. And again: never really learning in order to act, just testing. More about that later.

Sample size equation: use the lift measure

Testing questions always begin with sample size. The idea is to have a sample large enough – and with enough variation – to be confident about generalizing to the population. Remember, statistics uses inductive reasoning. That is the point of testing: take a small sample (so as not to (publicly) ruin anything) and simulate the population. That’s important. What you’re trying to do is design a laboratory that looks (and acts) just like the population. You experiment on the (sampled) laboratory and find what seems to work, and then you have to thrust these findings onto the population, which you hope will act as the sample did. That’s inductive reasoning.

So we have to revisit the normal distribution, z-scores and the confidence interval. That was a long time ago, so go back if you need to. I did.

Remember that the normal distribution (although somewhat theoretical) is the model that we use (mostly) for testing. We assume a normal distribution. The normal distribution is characterized by two things: 1) the mean, median and mode are all the same number and 2) the distribution is symmetrical about that number. Now, within one standard deviation of the mean of a normal distribution lie about 68% of all the observations; the second standard deviation adds about 13.6% to each side, roughly 27% more, for a total of about 95% of all observations within two standard deviations. See Figure 10.1. Now let’s think about z-scores. Remember the formula is

z = (observation – mean)/standard deviation.

Figure 10.1 Z-scores

In terms of IQ, where the mean is 100 and the standard deviation is 15, 68% of all observations are between 85 and 115. Said another way, an IQ of +1 standard deviation (115) is a z-score of 1.00, which is greater than nearly 84% of the population (34 + 34 + 14 + 1.9). A z-score of +2.0 is greater than nearly 98% of the population. See? This is actually the key to the sample size needed, and to testing overall.

By sample I mean a subset of the population. Even if you do not really have the whole, entire population, we’ll pretend. What else can we do? So we generally take a simple random sample (SRS) of the population. But how large a sample do we need in order to simulate the population?

Sample size needs to take into account (in terms of DOE) variation, which affects confidence. We are trying to be pretty confident that our sample will mirror the population when the testing is done and the results are generalized to the population. That is, if you took the mean of the population and found it to be 50.0 and then took an SRS and found its mean to be 40.0, would you be confident that your sample mirrored the population? The answer is, ‘Maybe, depending on the variation’. Say you knew the population had a mean of 50.0 but a standard deviation of 25.50. It’s possible your SRS is representative of the population. The z-score is (40.0 – 50.0)/25.50 = –0.392, which might not be THAT unusual.
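The z-score arithmetic in this section can be checked with a few lines of Python. The sketch below uses only the standard library (math.erf gives the normal cumulative distribution) to reproduce the IQ example (a score of 115 is z = 1.00, above roughly 84% of the population), the share within two standard deviations, and the sample-mean example (z ≈ –0.392). The helper names are illustrative.

```python
import math


def z_score(observation: float, mean: float, std_dev: float) -> float:
    """(observation - mean) / standard deviation."""
    return (observation - mean) / std_dev


def normal_cdf(z: float) -> float:
    """Share of a standard normal distribution that lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# IQ example: mean 100, standard deviation 15
z_iq = z_score(115, 100, 15)        # 1.0
share_below_iq = normal_cdf(z_iq)   # about 0.841

# Share of observations within two standard deviations of the mean
within_two_sd = normal_cdf(2.0) - normal_cdf(-2.0)  # about 0.95

# Sample-mean example: population mean 50.0, sd 25.50, SRS mean 40.0
z_sample = z_score(40.0, 50.0, 25.50)  # about -0.392

if __name__ == "__main__":
    print(f"IQ 115: z = {z_iq:.2f}, above {share_below_iq:.1%} of the population")
    print(f"Within two standard deviations: {within_two_sd:.1%}")
    print(f"SRS mean 40.0: z = {z_sample:.3f}")
```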

So, the formula I’d advocate for sample size needs to take into account the standard deviation of the population, how confident you want to be of generalizing your results to the population after the test, what sensitivity you want to measure (ie lift detection) and the expected response. That is:

where n is sample size, Z is the z-score for the confidence level, r is response rate and l is lift detection. As an example, say we have an expected response rate of 28%, a desired confidence of 90% (z-score = 1.64) and a minimal lift detection of 5%: the sample size needed in each cell is 5,566. That is, to be 90% confident your results will generalize to the population (9 out of 10 times they will, theoretically), with a usual 28% response rate and wanting to detect a difference only if it is at least 5% (that is, a response outside 26.6%–29.4%), you need a total sample of 11,131. That is, for A/B testing you need 5,566 in each (test and control) cell. See?
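The formula itself did not survive in this copy of the text. One form that reproduces every figure quoted around it (5,566 per cell and 11,131 in total at a 28% response rate, and a cell of over 200,000 at a 1% response rate) treats l as a relative lift on r:

$$ n = \frac{2 Z^2 \, r(1-r)}{(l \cdot r)^2} \quad \text{per cell} $$

with Z the two-sided critical value (about 1.645 at 90%). Treat this as a reconstruction under that assumption, not the author’s printed equation. The Python sketch below takes Z from the standard library’s statistics.NormalDist; the function names are illustrative.

```python
import math
from statistics import NormalDist


def two_sided_z(confidence: float) -> float:
    """Critical z for a two-sided interval, e.g. 0.90 -> about 1.645."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def cell_size(response_rate: float, lift: float, confidence: float) -> float:
    """Per-cell sample size, assuming n = 2 Z^2 r(1 - r) / (l * r)^2.

    `lift` is relative: 0.05 on a 28% rate means detecting 26.6%-29.4%.
    This functional form is a reconstruction, not a quoted formula.
    """
    z = two_sided_z(confidence)
    return 2 * z**2 * response_rate * (1 - response_rate) / (lift * response_rate) ** 2


if __name__ == "__main__":
    for rate in (0.28, 0.01):
        n = cell_size(rate, lift=0.05, confidence=0.90)
        print(f"r = {rate:.0%}: {math.ceil(n):,} per cell, "
              f"{round(2 * n):,} in total, about {n * rate:,.0f} responders per cell")
```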

I have to mention a silly thing that is still going on; I hear it all the time. The answer to the question ‘How large a sample size do I need?’ is often ‘380’. (If not exactly 380, it is very close to 380.) Let me show you where this comes from and why it is wrong. Even stupid.

The formula this uses is:

Often marketers test at 95% confidence (a z-score of 1.96), a 1% response rate is assumed, and they only want to accept a 1% error, which translates this formula into a sample size of 380. Now think about this. A 1% assumed response rate means that of the 380 in the cell only 3.8 will respond. I guarantee that 3.8 (okay, round it up to 4 people) is NOT enough to be confident about. At all. Or if they say 380 are responses, then that cell actually had 38,000 in it, right? See the folly?

Isn’t this the same problem with the formula I recommend above? No, it is not. With a cell size of 5,566 and a response rate of 28%, there will be about 1,558 responders, and I can be confident with that. Or even at a 1% response rate (still 90% confidence and 5% lift) the cell size is over 200,000. And 2,000 responses are enough to test and be confident about. So, do not let them tell you 380 is an adequate sample size. Is it any wonder corporations are in a nosedive?
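The formula behind the 380 is also missing here. It is very likely the textbook sample size for estimating a single proportion to within a margin of error e, n = Z²r(1−r)/e²; with Z = 1.96, r = 1% and e = 1% it gives 380.3. The Python sketch below (helper name illustrative) reproduces that number and the 3.8 expected responders.

```python
from statistics import NormalDist


def proportion_sample_size(response_rate: float, margin_of_error: float,
                           confidence: float) -> float:
    """Textbook n = Z^2 r(1 - r) / e^2 for estimating one proportion to within +/- e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    return z**2 * response_rate * (1 - response_rate) / margin_of_error**2


if __name__ == "__main__":
    n = proportion_sample_size(0.01, 0.01, 0.95)
    print(f"n = {n:.1f}, expected responders = {n * 0.01:.1f}")
```

Note what the formula is built for: it sizes a sample to estimate one response rate, and nothing in it concerns detecting a difference between two cells, which is why lift never enters it.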

A/B testing and full factorial differences

A couple of quick notes on very common testing follow. Did I mention I am not really a testing guy?

We always talk about A/B testing (sometimes called ‘champion/challenger’), and this simply means comparing two cells against each other (even as test vs. control). The idea is that we randomly choose the participants in each cell and (this is important) the only difference (get that? The only difference) between them is that the test cell has the test treatment and the control cell does not.

Then we measure the average responses of cell A vs. cell B and, if they are different enough, we say they are statistically/significantly different. That means we have confidence (typically 95%) that when we generalize this to the population the same results happen, on a larger scale. The formula I usually use for response testing is the z-score:

At 95% confidence, if this statistic is > 1.96 then the A response rate is statistically, significantly (and positively – yes, this is very important!) different from the B response rate.

As an example, let’s say for the A test we have 1,200 responses and we sent 10,000. For B we have 950 responses and we sent 5,000. rA means responses from A; nA means the number sent in cell A. (rA = 1,200, nA = 10,000, rB = 950 and nB = 5,000.) This calculates to a z-score of –11.53, which is statistically and significantly different, with B outperforming A at 95% confidence.
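The z-score formula did not survive in this copy either. The standard pooled two-proportion z-test, z = (p_A − p_B)/√(p(1−p)(1/n_A + 1/n_B)), with p_A = r_A/n_A, p_B = r_B/n_B and the pooled rate p = (r_A + r_B)/(n_A + n_B), reproduces the –11.53 above, so this Python sketch assumes that form (function name illustrative).

```python
import math


def two_proportion_z(r_a: int, n_a: int, r_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for response rates r_a/n_a vs r_b/n_b."""
    p_a, p_b = r_a / n_a, r_b / n_b
    pooled = (r_a + r_b) / (n_a + n_b)  # the common rate if A and B did not differ
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_error


if __name__ == "__main__":
    z = two_proportion_z(1_200, 10_000, 950, 5_000)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"z = {z:.2f} ({verdict} at 95% confidence)")
```

A negative z means B’s rate (19%) beats A’s (12%): the sign gives the direction, and |z| against 1.96 says whether the gap is larger than chance alone would plausibly produce.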

Let me make another point that marketers (especially retailers) have a hard time with. In order to effectively calculate and monitor incremental marcom, there needs to be a universal control group (UCG). This means a group of customers that never (ever) get promoted to. This can be a small group, but it must still be large enough to support a statistically valid test. If you do not have a UCG you can only test one treatment compared to another, and never know if it’s incremental (or detrimental, for that matter). I realize I’m asking you to set aside a group of customers that will never get a promotion, never get a brand message, etc. This is called investing in the test. If knowledge (or proof) that your marcom is driving incremental revenue to your business is important (and no one would disagree that it is), then you need to invest in the test. Every campaign needs to be designed at least as test vs. control, and the control is the UCG. If you do a business case on the potential revenue you’ll lose from the UCG and compare that to the insight you’ll have about which campaigns are actually increasing the bottom line, investing in a UCG wins every time. Remember, the point of analytics is to decrease the chance of making a mistake, and a UCG is all about that.

BUSINESS CASE

Scott walked into the little conference room, knowing he would again have to explain and struggle with Becky, the director of consumer marketing. Every month she had many ideas about test-and-learn plans and what she wanted to learn from a series of mailings. Every month Scott had to explain to her the concepts of testing, especially the idea of only changing one dimension at a time in order to test. He had thought that maybe if he recorded last month’s conversation he could just send the recording and have her press play to hear it again.

He arrived first. He always arrived first. He estimated that in a year he wasted 53 hours waiting for a meeting/phone call to start while everybody else eventually wandered in. Becky and her team joined him about six minutes past the hour.

‘So Scott, we’d like to test our messages again. Really get some learning.’

‘Great, all for it’, Scott said. He always said this.

‘I’ve thought about what you’ve been saying and have put a table together. We’d like to test discounts against different audiences.’ She showed him the table. Note that each discount level is applied only once. (See Table 10.1.)

Table 10.1 Testing discounts against different audiences

Cell A   5% discount    Desktop purchase
Cell B   10% discount   Online exclusive
Cell C   15% discount   Purchased >$2,500
Cell D   20% discount   Adding a printer

Scott sighed. ‘Becky, this is the same idea we’ve had before. Compare two customers, one in cell A and another in cell B. If cell B has a higher response/more revenue, is it because of the 10% discount or because of the online exclusive?’

‘I would say both’, she smiled.

‘But the point of a test is to isolate just one treatment, in order to quantify that stimulus.’ He looked at them. They all smiled, all nodded. ‘What is needed to test this is not a 4-cell but a 16-cell matrix. Like this.’ (He drew Table 10.2.)

Table 10.2 Testing discounts against different audiences in a 16-cell matrix

Audience            5% discount   10% discount   15% discount   20% discount
Desktop purchase    Cell A        Cell E         Cell I         Cell M
Online exclusive    Cell B        Cell F         Cell J         Cell N
Purchased >$2,500   Cell C        Cell G         Cell K         Cell O
Adding a printer    Cell D        Cell H         Cell L         Cell P

‘Wow’, Becky said. ‘That makes sense. We will need a far greater sample size though, right?’

‘That’s right. This is called full factorial and will detect all interactions. The benefit is in the confidence of the learnings and the cost is in the sample size, which means both time and money. It’s a trade-off, as always.’
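Enumerating the full factorial design makes its sample size cost concrete. This Python sketch crosses the four discounts with the four audiences from Table 10.2, letters the cells in the same order (A–D down the 5% column, E–H down the 10% column, and so on) and multiplies by a per-cell size. The per-cell figure of 5,566 is borrowed from the earlier lift example purely for illustration; the real number depends on each cell’s expected response rate.

```python
from itertools import product
from string import ascii_uppercase

DISCOUNTS = ["5% discount", "10% discount", "15% discount", "20% discount"]
AUDIENCES = ["Desktop purchase", "Online exclusive", "Purchased >$2,500", "Adding a printer"]
PER_CELL_N = 5_566  # hypothetical: reused from the 28%-response lift example


def full_factorial(discounts, audiences):
    """Map cell letters to (discount, audience), discount-major like Table 10.2."""
    combos = product(discounts, audiences)  # audience varies fastest within each discount
    return {letter: combo for letter, combo in zip(ascii_uppercase, combos)}


if __name__ == "__main__":
    cells = full_factorial(DISCOUNTS, AUDIENCES)
    for letter, (discount, audience) in cells.items():
        print(f"Cell {letter}: {discount:<13} x {audience}")
    print(f"{len(cells)} cells x {PER_CELL_N:,} = {len(cells) * PER_CELL_N:,} customers")
```

Compared with the four cells of Table 10.1, that is four times the customers, which is the trade-off Scott describes.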

‘Okay, we’ll redesign. Let’s also talk about the results of last month’s test.’

‘Great.’

‘Well, in this case the control cell out-performed the test cell. So the test did not work.’

‘What were we testing?’

‘This was to past desktop purchasers. The control was a 10% discount and the test was a 20% discount. In the past the 10% discount has been pretty standard, so we wanted to see how many more sales happen with a 20% discount.’

‘Makes sense’, Scott said. ‘It seems so weird that the 10% would out-perform the 20%. By how much?’

‘By almost 50% more response, that is, number of purchases.’

‘These were randomly chosen?’

‘Yep’, Becky said. ‘I guess it means our target audience does not need a deeper discount, which is a good thing. They are very loyal and will act without a deeper stimulus. But somehow I doubt it.’

‘So do I. It does not make economic sense. We should investigate the list, make sure both sides got the single treatment, try to see if something was amiss. Each cell was about the same size?’

‘Yeah, very close.’

‘But’, Kristina said, ‘how did we make sure both cells only got this treatment?’

‘What do you mean?’ Scott asked.

‘Nothing happened that I know of to pull these customers out so they would only get this month’s deal.’

‘And last month the “Get a Free Printer” went out.’

‘And the desktop bundle went out.’

‘And since far more of our customers get the 10% discount than anything else, those that got the 10% discount in this test may also have received one or both of the other stimuli. Right?’

‘Yeah, I think so.’

‘Well, if true, that could explain it’, Scott said. ‘Our 10% control cell may have got at least three stimuli, not one.’

Becky sighed. ‘So the test has to be done again?’

‘Probably. If it was important to know what that treatment drove, then the answer is yes.’

‘Well, yeah, it was. And we’ve had such difficulty with testing anyway – I mean the design of it – that going back and re-testing will be a hard sell.’

Scott looked at her. ‘I don’t know how helpful it might be, but we could possibly do a multivariate exercise to try to isolate this test.’

‘What do you mean?’

‘I’m not sure. We might be able to build a model that accounts for all the treatments and still, ceteris paribus, measures just this campaign.’

Kristina looked up. ‘You mean an ANOVA of some kind?’ (Analysis of variance is a general statistical technique to analyse the differences within and between group means.)

‘Yeah, although I’m an econ guy, so I’m more comfortable with regression. But some technique that accounts for multiple simultaneous sources of stimuli on revenue.’

Scott went to the whiteboard and drew Table 10.3.

Table 10.3 Multiple sources model

CustID   60 day review   Printer promo   DT bundle promo   20% disc promo   # opens   # clicks   # web visits   # calls   Past rev
X        0               1               0                 1                7         3          9              0         1800
Y        900             0               1                 1                8         1          5              2         490
Z        0               0               0                 0                11        4          4              1         800

‘Now’, Scott said, ‘we can include any and all promotions, etc., that we can track and put in this model. The idea is to measure the dollar value of all stimuli.’

‘What if we don’t or can’t get all the information?’

‘We will always miss something. It’s important to include all we know, all we can know, from both a theoretical and an actual causality standpoint. There is a fine line between including too much and missing something important.’

‘Can you explain a bit about that? I’m not sure what you mean’, Kristina asked. She had always had an interest in the modelling process, especially on the more technical side of things.

‘From an econometric point of view, excluding a relevant variable will bias the parameter estimates (whenever that variable is correlated with the ones included), so we need to ensure we have all the important, theoretically sound independent variables. Including an irrelevant variable increases the standard errors of the parameter estimates, meaning that while they are unbiased, their variation is larger than it should be, so the t-ratios (beta/standard error of beta) will appear smaller than they should be. Thus, it behooves modellers to design a theoretically sound model and collect relevant data.’

They all looked at him. ‘Sounds good’, Becky said. ‘Let’s talk with IT and collect the data you need, and you can put this together for us?’

So Scott got the data together and ran the model, and they found the various campaigns’ contributions to revenue, accounting for most other important factors. This type of analysis allowed Scott’s team to offer campaign valuation outside of a strictly testing environment. While each point of view has pluses and minuses, Scott’s valuation method could specifically take into account other (dirty) data issues. Also, his results tied directly to sales, something A/B testing did not do. As mentioned, a background in economics is valuable for a marketing science function.
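Here is a hedged illustration of why a regression like Scott’s can rescue a ‘dirty’ test, and of the omitted-variable bias he warns Kristina about. The Python sketch (NumPy required) simulates a test in which the 20% discount truly adds 30 in revenue, but customers in the 10% control cell were also more likely to get the printer and desktop bundle promos. Every number in it (effect sizes, exposure rates, noise) is made up. The simple test-vs-control comparison comes out negative, echoing Becky’s result, while ordinary least squares with the other promotions included recovers the effect; drop those promotions from the regression and the bias returns.

```python
import numpy as np


def simulate_dirty_test(n: int = 20_000, seed: int = 7):
    """Simulated, hypothetical campaign data with a contaminated control cell."""
    rng = np.random.default_rng(seed)
    treated = rng.integers(0, 2, size=n).astype(float)  # 1 = 20% discount, 0 = 10% control
    # Contamination: control customers were also more likely to get the other promos.
    printer = (rng.random(n) < np.where(treated == 1, 0.1, 0.6)).astype(float)
    bundle = (rng.random(n) < np.where(treated == 1, 0.1, 0.5)).astype(float)
    past_rev = rng.normal(1_000, 200, size=n)
    revenue = (50 + 0.1 * past_rev + 30 * treated + 60 * printer + 40 * bundle
               + rng.normal(0, 50, size=n))
    return treated, printer, bundle, past_rev, revenue


def naive_difference(treated, revenue):
    """Plain test-minus-control difference in mean revenue."""
    return revenue[treated == 1].mean() - revenue[treated == 0].mean()


def ols(y, *columns):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    treated, printer, bundle, past_rev, revenue = simulate_dirty_test()
    print(f"Naive test vs control:      {naive_difference(treated, revenue):7.1f}")
    print(f"Regression, all promos:     {ols(revenue, treated, printer, bundle, past_rev)[1]:7.1f}")
    print(f"Regression, promos omitted: {ols(revenue, treated, past_rev)[1]:7.1f}")
```

The regression only fixes what it can see: a promotion that went out but was never tracked would bias the estimate in exactly the same way, which is Scott’s point about including everything you know.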

Checklist


Remind everyone that they must ‘Invest in the test!’ This typically means using a large enough sample for a control group to allow a meaningful test.

Point out that it’s difficult to actually control for everything. Simple random selection is only a blunt instrument.

Remember that experiment design and A/B testing (champion vs. challenger) will not give the impact of individual dimensions (what impact price has, or message, or competition changes, etc.).

Demand that the sample size equation incorporate lift.

Make fun of the silly answer (‘N = 380’) to the question ‘How large a sample do we need?’

Shout loud that in all testing each cell can only differ by one thing (one dimension).

Recommend using ordinary regression to account for ‘dirty’ testing.

Part five

Capstone

11

Capstone: focusing on digital analytics

Introduction

Modelling engagement

Business case

Model conception

Conclusion

Introduction

This chapter is a capstone of most of what we’ve done before. It’s meant to be a practical application of traditional techniques to different kinds of (non-traditional) data.

Since the mid-1990s, when the World Wide Web became available, many marketing scientists and others panicked because of the new kind of data. Clickstreams/web logs were becoming available and many people thought that the new data would need new techniques. They forgot it is still marketing. They forgot it is still consumer behaviour.

You’ve probably surmised, as I mentioned elsewhere, that I am not in favour of unsupervised techniques, and it was these that many data analysts began to run to. Unsupervised techniques include things like neural networks, various machine learnings, chaos/catastrophe theory, etc. (If you HAVE to learn these things you will easily find a bucketload of new-fangled algorithms online.) But why would new data require new techniques? When direct mail became available, did we invent new techniques? When e-mail became available, did we invent new techniques? Regression is still worthwhile regardless of the kinds of data used.

The above is not to say that digital data IS NOT very different from traditional data. I LOVE clickstream data (such as Omniture’s page views) that shows just what page a consumer views, for how long and in what order. That is an amazing tracking of consumer behaviour. And the new social media is bringing about a paradigm shift from outbound marketing to inbound marketing. It’s different kinds of data, but why would it require new statistical techniques? Consumers are still behaving, shopping, buying. Right?

New data (BIG DATA!) is bringing about panic because it is MORE data (both in terms of size (including increased variety) and additional behavioural dimensions). New data still tracks a consumer’s awareness, familiarity, consideration, shopping and purchase. So I’d suggest NOT using neural networks and Taguchi methods as a reaction to new data. There might be a place for these things, but it is NOT just because the data is new.

I’m not against new algorithms when needed. I typically do not think they are needed. I am also philosophically opposed to many of the conceptions that seem to be behind these new techniques, in that they try to remove the analyst from the analysis. Many of them are virtually marketed as a voodoo/black box and advocate not really needing analytic expertise running the operations. That seems to me a formula for massive failure. Not to mention that when these things have been put into the field, I have never seen them do better than traditional econometric techniques. Never. I have had many debates and bets on this very issue over the years. (You know who you are!)

Modelling engagement

When it comes down to it, a firm can only really be successful if it can engage consumers. This is why RFM (recency, frequency, monetary) works, to a certain extent: it (simplistically) finds those customers that tend to be most engaged. The real issue is quantifying engagement: what behaviour is most valuable?

Why quantify engagement? Because engagement is by definition psychological (its impact is seen in overt behaviour), the metric ‘engagement’ has to be derived indirectly. That is, engagement is a motivator, a stimulus that shows itself in certain overt behaviours. Because engagement is an indicator of interest, and depending on the problem solving required for the product needed, interest (in the shopping phase) is key to moving the consumer to the purchasing phase. Quantifying engagement can lead to specific marketing actions.

What are the hypothesized factors that drive purchases? There are several things that cause purchases. Some of these are pricing, seasonality, competition, consumer confidence, campaigns and engagement. Some are blatant and some are latent. Some are internal and some are external. Some are marketing levers and some are consumers’ need arousal. But engagement (interest in the product) is certainly a precursor to any purchase, regardless of the level of decision making.

What are the issues around designing an engagement model? Figure 11.1 shows an ‘issue tree’, a technique sometimes used in designing a project. The idea is that the key issues/requirements are stated and solutions or other issues are detailed. This way, the focus is on the big picture, and all ‘trouble spots’ as well as necessities are planned for. Yes, this comes from McKinsey.

Figure 11.1 Issue tree

What should an engagement model look like? Because engagement is latent, there needs to be a technique that accounts for the interactions and discovery of this hidden motivator. But the model must ultimately quantify engagement. It should show what explanatory power engagement has (given seasonality, competition, pricing, marcom, etc.) and how much engagement is worth to the firm. That is, the model must give both a structural analysis in shared variance and an impact on revenue. Remember Peter Drucker’s admonition: if your project is not increasing satisfaction, decreasing expense or increasing revenue, you should consider NOT doing it.

Since engagement is about both hidden motivations and outright behaviours, what does this mean analytically? It means factor analysis will be used to find the latent motivations. Factor analysis is an inter-relationship technique borrowed from psychologists. The idea is that it extracts variance from variables that ‘load’ (correlate together) and then makes a new factor. That is, variables load high or low, depending on the underlying (hidden) factor.

Recall that we used factor analysis to combine independent variables into others (factors) that were by definition uncorrelated. That is, the resultant factors are uncorrelated with each other, but the collection of factors maintains the (distinct, non-overlapping) variance of the independent variables. This is why it tends to work as a correction for collinearity.

Another (and more typical) use of factor analysis is to divine underlying motivations. Conceptually this means that if blatant variables load high onto a factor, it is because they are each motivated by a latent dimension. Then another latent dimension comes into play to motivate the other variables. For example, if we have variables like GPA, income, education, job title, etc. that load high onto one factor, we might call that factor ‘intelligence’. There is no variable called ‘intelligence’; we label the factor as such based on which variables correlate together. Thus the same analytic strategy can be applied to engagement. This is the technique that structural equation models (SEM) use.
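To make the idea of ‘loading’ concrete, here is a minimal NumPy sketch of one common way to run an exploratory factor analysis: extract principal-component factors from the correlation matrix, then apply a varimax (orthogonal) rotation so that each variable loads mainly on one factor while the factors stay uncorrelated. The data are simulated from two hypothetical latent drivers, and the variable names merely echo the engagement case that follows; dedicated software may use other extraction methods (maximum likelihood, principal axis) and give somewhat different loadings.

```python
import numpy as np

VARIABLES = ["opens", "clicks", "page_views", "time_on_site",
             "webinar", "white_paper_dl", "software_dl", "depth_product_pages"]


def simulate_behaviours(n: int = 2_000, seed: int = 11) -> np.ndarray:
    """Hypothetical data: first four columns driven by one latent factor, last four by another."""
    rng = np.random.default_rng(seed)
    latent_a = rng.normal(size=n)
    latent_b = rng.normal(size=n)
    # Different noise levels keep the two factors distinct in size.
    cols = [latent_a + rng.normal(scale=0.5, size=n) for _ in range(4)]
    cols += [latent_b + rng.normal(scale=0.8, size=n) for _ in range(4)]
    return np.column_stack(cols)


def principal_factors(data: np.ndarray, n_factors: int):
    """Unrotated loadings from the correlation matrix, plus share of variance explained."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return loadings, eigvals[top].sum() / corr.shape[0]


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation (standard SVD-based iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous and criterion / previous < 1 + tol:
            break
    return loadings @ rotation


if __name__ == "__main__":
    raw, explained = principal_factors(simulate_behaviours(), n_factors=2)
    rotated = varimax(raw)
    print(f"Two factors explain {explained:.0%} of the variance")
    for name, row in zip(VARIABLES, rotated):
        print(f"{name:<20} {row[0]:6.2f} {row[1]:6.2f}")
```

Which factor comes out first, and its sign, is arbitrary; naming the factors is still the analyst’s job, exactly as with ‘intelligence’ above.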

BUSINESS CASE

Scott was ‘loaned’ to the online software sales team at the end of the year. This team was new and primarily marketed software for small businesses. The software would keep track of the firm’s network, ensuring security and connectivity were up to date. It also recommended certain hardware products to upgrade performance, etc.

Scott reported to the GM of the software group.

‘Hi Scott, good to see you’, he said, and stood up and shook Scott’s hand. ‘I’ve heard good things about you and frankly we need your help.’

‘Any way I can’, Scott said.

‘Good. We need to understand what online actions indicate interest. When our potential customers come to our website they can browse for the software, click on product demos, download a trial version, download a webinar, chat with a sales engineer, etc. We are trying to quantify those actions that are most indicative of purchase, and then exploit those actions.’

Scott nodded.

‘So’, the GM continued, ‘when a potential customer opts in to receive e-mails, or to join a community, we know that behaviour is obviously one of engagement. We want to know what that engagement is worth. Does only opt-in behaviour provide the path to purchase, or are there other things?’

‘So you want to quantify those clicks – those behaviours – that lead to purchase.’

‘That’s right. Not all behaviours are equally important in indicating engagement. We want to know where number of opens, number of page views, time on site, etc. sit in the purchasing chain.’

‘Sure, I see. Which behaviours are bigger drivers of purchasing than others? Which are shopping and latent, and which are precursors to purchasing and blatant? Sounds fun.’

Scott already had an idea as he left the GM’s office. He called his team together and they organized access to data. The main dimensions would be clickstream/page views, primarily white paper downloads, webinars, trial software downloads, number of opens, number of clicks, number of page views, time on site, and width and depth of product pages. Opens and clicks refer to e-mail engagement; width of product pages indicates the various software options available, and depth of product pages indicates an investigation of all of the specifics of a particular software product. Width and depth are important and different views of customer behaviour. Think of width as shopping for jeans and tops and shoes and coats. Think of depth as shopping for jeans: whitewashed jeans, different sized jeans, return policy, store location, product reviews of jeans, etc.

Most of the internal clients believed that only gated/registered items (white paper download, trial software download, webinars, etc.) had any real engagement to quantify. These are obviously deeper behaviours than, say, number of opens and number of clicks. Scott wondered if there were any other behaviours (particularly non-gated) that would show as much engagement as the behaviours requiring opt-in.

So he collected the data and ran factor analysis. Two factors accounted for 86% of all the variation of the independent variables. Given the loadings below (Table 11.1), Scott called factor one ‘Window Shopping’ and factor two ‘Try it On’. That is, the behaviours of opens, clicks and number of page views, for example, are hypothesized to be motivated by ‘Window Shopping’. Likewise the behaviours of depth of product pages, white paper download and webinars are motivated by a desire to ‘Try it On’. While this seems ultimately intuitive, the way the analysis puts these two latent factors together to explain the blatant behaviours is compelling.

Table 11.1 Factor analysis

Variable               Factor 1 (Window Shopping)   Factor 2 (Try it On)
Opens                  0.76                         0.26
Clicks                 0.84                         0.12
Webinar                0.10                         0.88
White paper download   0.12                         0.82
Software download      0.29                         0.86
Page views             0.90                         0.11
Time on site           0.77                         0.14
Width product pages    0.03                         0.09
Depth product pages    0.16                         0.77

It’s important to note (for business insights) that the factor ‘Try it On’ is not only gated items, but includes depth of product pages at 0.77. This means there is high engagement in depth of product pages, almost as high as the opt-in behaviours.

Model conception

This gave Scott an obvious functional form of the model:

Purchase = f(window shopping, try it on)

That is, he would regress purchase spend on the two factors (which in turn account for the variation of all the other independent variables and are themselves orthogonal, that is, uncorrelated with each other). When he did that, using the factors as the two independent variables, he achieved an adjusted R² of over 37% and both factors were significant at the 95% level. This means that, in driving revenue, engagement by itself accounts for more than one third of the variation. The ‘Try it On’ coefficient was 17,573 and the ‘Window Shopping’ coefficient was 5,448. This means that ‘Try it On’ has about three times the impact on revenue that ‘Window Shopping’ does. The intercept was 9,801.

Examples applied to customers

Table 11.2 shows three examples of how it works. Note that contact 1050 attended many webinars, did many white paper downloads, downloaded the trial software and searched the website product pages to a significant depth. They obviously opted in and fall into the ‘try it on’ motivation, and have high predicted revenue.

Table 11.2 Examples applied to customers

Contact   Engaged revenue   Window shopping   Try it on   Opens   Clicks   Webinar   Whitepaper dl   Trial sw dl   Page views   Time on site   W_prod pages
1050      90,451            –0.005            4.591       34      22       5         7               1             222          666            8
1061      51,423            4.453             0.988       77      71       1         6               1             620          1860           4
1269      37,145            3.445             0.488       55      8        0         0               0             559          111            5

Let’s calculate contact 1050’s engaged revenue using the model:

Engaged revenue = intercept + (Window Shopping coeff × window shopping score) + (Try it On coeff × try it on score)

90,451 = 9,801 + (5,448 × –0.005) + (17,573 × 4.591)
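The same arithmetic fits in a tiny scoring function. This Python sketch hard-codes the intercept and the two coefficients reported for Scott’s regression and applies them to the factor scores of the three contacts in Table 11.2; the function and constant names are illustrative, and in practice the factor scores themselves would come out of the factor analysis step.

```python
INTERCEPT = 9_801
COEF_WINDOW_SHOPPING = 5_448
COEF_TRY_IT_ON = 17_573

# contact id -> (window shopping score, try it on score), from Table 11.2
CONTACTS = {1050: (-0.005, 4.591), 1061: (4.453, 0.988), 1269: (3.445, 0.488)}


def engaged_revenue(window_shopping: float, try_it_on: float) -> float:
    """Intercept + (Window Shopping coeff x score) + (Try it On coeff x score)."""
    return INTERCEPT + COEF_WINDOW_SHOPPING * window_shopping + COEF_TRY_IT_ON * try_it_on


if __name__ == "__main__":
    for contact, (ws, tio) in CONTACTS.items():
        print(f"{contact}: {engaged_revenue(ws, tio):,.0f}")
```

Rounded to the dollar, the three scores come out at 90,451, 51,423 and 37,145.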

Second, note that contact 1061 has a different behaviour. They had many opens and clicks (indeed they clicked on nearly every open), a smaller number of download actions, but a high number of page views and time on site. They exhibit the window shopping behaviour and thus have smaller predicted revenue.

Last, note contact 1269. They have the smallest number of clicks, the smallest number of downloads, the least time on site and no depth of product pages. Therefore their predicted revenue is lowest.

Scott got his team together, as well as the stakeholders, for the output presentation. He wanted to talk about marketing actions. They came up with the following list:

Sales/hot leads: given the score, these contacts could be turned over to the sales team; that is, engagement can be used as a ‘qualifier’ of a hot lead.

Operations/strategy: given the vastly more valuable ‘Try it On’ behaviour, everything possible should be done to remove barriers to ‘Try it On’.

Marcom/campaigns: message that ‘Try it On’ is available; let every potential contact know that they can download trial software, read a white paper, etc., to get comfortable with the buying decision.

Atthequarterlyanalyticoperationsmeeting,ScottandhisteamwerecalledoutbytheVPfortheirworkonengagementmodelling.Thiswasagroupofallthemarketinganalystsinthecompany.

There had been a test put in place based on that analysis and the results were overwhelming: when campaigns mentioned the availability of ‘Try it On’ before purchase, purchases were ultimately 3.5 times higher than among those that did not get the message. This translates to huge increases in software revenue. The audience smiled and nodded their heads.

‘I’m a little surprised’, the VP said. ‘This is extremely meaningful to us; we’ve found a simple way to extract millions in extra revenue, based on an analytic project.’

The crowd looked at him.

The VP huffed. ‘When we have a functional breakfast or an after-work get-together, you guys are laughing and clapping and making all kinds of noise. At sports events you scream and cheer. But when hearing of an analytic result that is very positive, you just nod your head.’

Now the audience squirmed a bit.

‘I just mean’, the VP continued, ‘I would think – given you all work in analytics, and have spent years educating yourselves about analytics – that when you see an exciting result proving analytics, there would be a lot more hoopla. It’s okay to be glad that your chosen career field really does add value.’

Let me reiterate what this VP is saying. Analytic folks, overall, tend to be a bit quiet – sure, let’s say it’s the logic/rational-dominated side of their brain.

How do you know if you’re an analytic person? You love the simple joy that comes when seeing a variable that should be significant, proved in the data. The satisfied look of wonder pervades your face when the world makes sense. That replaces the constant, cynical, caveat-laden weariness we usually have to carry around. That’s what got us into analytics in the first place, right? People are confusing, full of irrational grey areas, but data is data, truth is truth. When well-understood relationships make sense it’s comforting; when insights are found, it’s exciting. Murder solved! Puzzle completed! And it’s consumer behaviour we are trying to predict – this helps us believe that maybe people are NOT so confusing. Okay, infomercial over, back to the VP’s meeting.

‘It’s okay’, the VP said, ‘to acknowledge that analytics works.’

Scott stood up and clapped. ‘Yeah, analytics rocks!’

Most of the audience looked at their watches, a few clapped or cheered a little, some coughed, one or two rolled their eyes. The VP shrugged his shoulders and they all went back to work. Scott sat back down and sighed.


Simultaneous equations are the answer to that question. This includes blogs, positive ratings, direct mail, e-mail, etc.

Social media has become THE THING lately, of course. While everyone seems to jump on the revolutionary bandwagon, and rightfully so, there have been other revolutionary bandwagons. In the mid-1990s the internet/WWW became available and widespread. In the mid-1970s it was personal computers and in the 1960s mainframe computers – each of these had huge data implications. So while social media IS a different kind of data, analytically it merely allows more understanding of consumer behaviour. Of course the most exciting aspect of social media (in terms of marketing science) is that for the first time INBOUND marketing is possible.

As such, the ability to model social media is critical. This does not mean it will require new techniques; it is just a different source of data. It does shed light on shopping channels, that is, what does social media have to do with online purchases as opposed to offline purchases? Since everyone is demanding to know how much advertising budget to assign to social media, the impact of social media on purchasing by channel is critical.

That’s what Scott knew was going to happen when he was called into the office of the newly created VP of digital media.

The VP put down her phone and shook Scott’s hand. Scott smiled.

‘I bet I know what you’re going to say’, Scott said. ‘You’d like to know what impact social media has on sales.’

‘Sure, but one complication: we have two sales channels, online and offline. We’d like to know to what extent social media impacts sales in both the online and the offline channel.’

Scott gulped. ‘Well, that’s a little more complicated.’

She smiled. ‘But not too hard for someone who won the Executive Award last year, right?’

‘We’ll do what we can’, Scott said. ‘I’ll get connected with your data people and we’ll see what we can find out.’

‘The issue is important’, she pointed out. ‘All of us are being asked to cut our advertising budgets. We have a portfolio approach. Do we spend in direct mail, e-mail, online or social? Your analysis can help us optimize our budgets.’

‘I see. No pressure.’

‘And we’ll need it in two weeks, to meet our marcom plans.’ She smiled and picked up her phone, the meeting over. Scott walked to his office and knew that the next two weeks would be difficult.

His team collected weekly sales data, both online and offline. Scott would do a time series model. He would use simultaneous equations to model the impact of the marketing mix (product, price, promotions and place) on sales. He’d do a separate model for desktops, notebooks and workstations.

For example, in the desktop model, he wanted to know what price does to the sales of desktops in each (online and offline) channel. What about promotions, like e-mail and direct mail? And what about social media: blogs, positive mentions, share of voice, etc.? It would be interesting to find out how differently these independent variables moved units in each channel. E-mail and direct mail could be thought of as outbound marketing, whereas social media could be thought of as inbound marketing. From a strategic point of view, the objective was to optimize the budget, and Scott thought that if this model worked that would be a very real use.

Because Scott had already decided on a time series model, i.e., each row is a weekly aggregation, he did not have to deal with sparse data on a consumer level. That is, if he took the ‘each row is a consumer’ approach, there would be so few matches (especially in terms of social media) that he would not have a large enough sample. Likewise he was going to model units sold as the dependent variable against the whole marketing mix, NOT just use social media as independent variables. That would place far too much attention on just social media and would fly in the face of all the other things known to move consumer behaviour, such as price, season, marcom vehicles, etc.

So the theoretical conception of the model would be:

ONLINE UNITS = f(# direct mails, # e-mails, online price, offline price, social media, etc.)

OFFLINE UNITS = f(# direct mails, # e-mails, online price, offline price, social media, consumer confidence, etc.)

He would have to consider the identification problem and all the other modelling issues, but the above looked like what he needed.
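A system like the one above is estimated jointly. As a minimal sketch, assuming made-up weekly data and only a handful of the regressors, here is textbook three-stage least squares (3SLS), the estimator the text later runs in SAS, written out in NumPy. It is not SAS’s implementation; the names three_sls, X_off and X_on and all of the data are hypothetical. Direct mail and e-mail are in thousands, so their coefficients read as units per 1,000 pieces.

```python
import numpy as np

def three_sls(ys, Xs, Z):
    """Textbook three-stage least squares for a system of linear equations.

    ys: list of (n,) dependent variables, one per equation
    Xs: list of (n, k_i) regressor matrices, one per equation
    Z:  (n, m) matrix of all exogenous variables, used as instruments
    """
    n = Z.shape[0]
    # Stage 1: project each equation's regressors onto the instrument space
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    Xhats = [P @ X for X in Xs]
    # Stage 2: equation-by-equation 2SLS, then the cross-equation residual covariance
    b2sls = [np.linalg.solve(Xh.T @ X, Xh.T @ y) for Xh, X, y in zip(Xhats, Xs, ys)]
    resid = np.column_stack([y - X @ b for y, X, b in zip(ys, Xs, b2sls)])
    s_inv = np.linalg.inv(resid.T @ resid / n)
    # Stage 3: GLS on the stacked system, weighting blocks by the inverse covariance
    g = len(ys)
    A = np.block([[s_inv[i, j] * (Xhats[i].T @ Xhats[j]) for j in range(g)] for i in range(g)])
    c = np.concatenate([sum(s_inv[i, j] * (Xhats[i].T @ ys[j]) for j in range(g)) for i in range(g)])
    b = np.linalg.solve(A, c)
    # Split the stacked estimates back into one coefficient vector per equation
    return np.split(b, np.cumsum([X.shape[1] for X in Xs])[:-1])

rng = np.random.default_rng(42)
n = 260  # five years of hypothetical weekly data
ones = np.ones(n)
direct_mail = rng.uniform(0, 50, n)   # thousands of pieces
email = rng.uniform(0, 200, n)        # thousands of sends
offline_price = rng.normal(950, 50, n)
online_price = rng.normal(900, 50, n)
confidence = rng.normal(100, 10, n)

# Correlated shocks across the two channels are what joint estimation exploits
shocks = rng.multivariate_normal([0, 0], [[400, 150], [150, 300]], size=n)
offline_units = (50_000 + 46 * direct_mail + 25 * email - 3.4 * offline_price
                 + 1.8 * online_price + 21 * confidence + shocks[:, 0])
online_units = (12_000 + 80 * direct_mail + 113 * email + 0.12 * offline_price
                - 5.7 * online_price + shocks[:, 1])

X_off = np.column_stack([ones, direct_mail, email, offline_price, online_price, confidence])
X_on = np.column_stack([ones, direct_mail, email, offline_price, online_price])
Z = X_off  # every regressor is exogenous here, so the instruments are all of them

b_off, b_on = three_sls([offline_units, online_units], [X_off, X_on], Z)
print("offline:", np.round(b_off, 2))
print("online: ", np.round(b_on, 2))
```

With every regressor exogenous, 3SLS reduces to seemingly unrelated regression. If a price were instead endogenous (set in response to demand), the same function applies only when Z contains enough excluded instruments, which is exactly the identification problem.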

An added thing Scott had to address: the lag structure. It’s well known that many things (especially marketing communication vehicles) have a lag effect on, say, demand. (By lag is meant a weekly variable is moved down one week, so that instead of its actual occurrence on Jan 7, for example, it is lagged to happen on Jan 14.) The actual shape, amplitude and length of that lag structure is the subject of hundreds of academic papers. So the problem is, to restate: what impact do marketing levers (price, website visits, marcom vehicles (including the lag structure), social media) and other effects (seasonality, consumer confidence) have on moving units in both the online and offline channels? This should be seen as a BIG problem, and very important to quantify.
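Building a lagged variable is a simple shift. A minimal sketch in plain Python, assuming a short made-up direct mail series; the year and the helper name lag are hypothetical.

```python
from datetime import date, timedelta

def lag(series, periods=1, fill=None):
    """Move a weekly series down by `periods` rows.

    The value observed in week t is reassigned to week t + periods; the first
    `periods` weeks have no earlier observation and receive `fill`.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    return ([fill] * periods + list(series))[: len(series)]

weeks = [date(2014, 1, 7) + timedelta(weeks=i) for i in range(4)]
direct_mail = [12_000, 0, 8_000, 15_000]  # hypothetical pieces dropped each week
direct_mail_lag1 = lag(direct_mail, 1)

for week, current, lagged in zip(weeks, direct_mail, direct_mail_lag1):
    print(week, current, lagged)
```

The 12,000 pieces dropped in the week of Jan 7 show up in the lag-1 column on Jan 14, which is the shift the text describes; a lag-2 column is just lag(direct_mail, 2).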

So Scott collected the data and began working on the model. He settled on SAS’s 3SLS (three-stage least squares) estimation. For social media their ‘listening group’ came up with several variables: number of blogs about the company as well as competitors, share of voice (percent mentions about the company divided by total mentions of all competitors), forums, positive mentions, etc. For the lag structure Scott used SAS’s %pdl macro, which allows the model to include the number of lags and the amplitude of the lags.
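The %pdl macro fits a polynomial distributed lag (an Almon lag): rather than estimating a free weight for every lag, the weights are forced onto a low-order polynomial in the lag number. A minimal NumPy sketch of that idea, not SAS’s code, assuming simulated weekly data with four lags and a quadratic (hump-shaped) weight curve; almon_design and all data are hypothetical.

```python
import numpy as np

def almon_design(x, n_lags, degree):
    """Build Almon polynomial-distributed-lag regressors.

    The weight on x[t - j] is constrained to w_j = sum_k a_k * j**k, so only
    degree + 1 coefficients are estimated instead of n_lags + 1 lag weights.
    """
    x = np.asarray(x, float)
    n = len(x)
    # Column j holds x lagged j periods (rows correspond to t = n_lags, ..., n - 1)
    lagged = np.column_stack([x[n_lags - j : n - j] for j in range(n_lags + 1)])
    # H[j, k] = j**k maps polynomial coefficients a_k to lag weights w_j
    H = np.vander(np.arange(n_lags + 1), degree + 1, increasing=True)
    return lagged @ H, H

rng = np.random.default_rng(1)
x = rng.uniform(0, 50, 300)              # hypothetical weekly direct mail, thousands of pieces
a_true = np.array([40.0, 10.0, -4.0])    # hypothetical polynomial: w_j = 40 + 10j - 4j^2
D, H = almon_design(x, n_lags=4, degree=2)
true_w = H @ a_true                      # lag weights for lags 0..4
y = 1_000 + D @ a_true + rng.normal(scale=10, size=len(D))

design = np.column_stack([np.ones(len(D)), D])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
w_hat = H @ coefs[1:]                    # lag weights recovered from the fitted polynomial
print("true lag weights:     ", true_w)
print("estimated lag weights:", np.round(w_hat, 1))
```

The number of lags sets how long the effect lasts and the polynomial degree sets how flexible its shape can be, which is roughly what the text calls the number and amplitude of the lags.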

Table 11.3 (offline) and Table 11.4 (online) show the output of the simultaneous (desktop) models. There are several notes about each. First, the offline model has an adjusted fit of 80%; that is, the listed independent variables (significant at the 95% level) account for 80% of the movement in the offline channel.

Table 11.3 Impact on offline units

OFFLINE  R-Square 86%  Adj R-Sq 80%

Variable             Parameter estimate  Units per 1,000
Intercept            52,289
Blogs                0.055               +55 units
Direct mails         0.046               +46 units
Direct mails_lag1    0.039               +39 units
E-mails              0.025               +25 units
E-mails_lag2         –0.04               –40 units
Visits               0.048               +48 units
Offline price        –3.417
Online price         1.801
Consumer confidence  21.158
Q4                   192,668

The marcom (direct mail and e-mail) shows a lag effect. Direct mail lags 0–4 periods in its impact and e-mail also lags 0–4 periods in its impact.

Price is interesting. The offline price (in the offline model) is, as expected, negative. This again is the ‘law of demand’: price goes up and units go down. The online price is positive. This means the online price is a substitute; that is, if the online price increased by, say, 10%, the OFFline demand would increase by 18%.

Now an interpretation is needed, especially of social media and marcom in terms of units. The ‘+units’ figures next to the coefficients show how many units are expected, on average, from each 1,000 items. That is, multiplying the coefficient by 1,000 means, for example, that if there are 1,000 blogs, on average the offline channel benefits by about 55 units. When direct mail is dropped, each 1,000 pieces add 46 units to the offline channel. Note the e-mail lags are both positive and negative, meaning the amplitude has a different shape. E-mail only has a positive impact when it is first dropped, but over time it is negative (this might reflect e-mail fatigue). The above seems to indicate that direct mail is more impactful than e-mail in the offline channel. Note also how impactful Q4 is in the offline channel. This is part of the insight that only an econometric model gives.
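The per-1,000 conversion is simply the coefficient times 1,000. A minimal sketch using the offline coefficients from Table 11.3; the helper name units_per is illustrative, and reading lagged terms as additive over weeks is the usual linear-model interpretation rather than something the text spells out.

```python
# Offline-model coefficients from Table 11.3 (units per item mailed, posted or visited)
offline_coefs = {
    "Blogs": 0.055,
    "Direct mails": 0.046,
    "Direct mails_lag1": 0.039,
    "E-mails": 0.025,
    "E-mails_lag2": -0.04,
    "Visits": 0.048,
}

def units_per(coef, volume=1_000):
    """Expected change in units for `volume` items, holding everything else constant."""
    return coef * volume

for name, coef in offline_coefs.items():
    print(f"{name:<18} {units_per(coef):+.0f} units per 1,000")

# Current week plus the one-week lag: the cumulative direct mail effect over two weeks
print("Direct mail, weeks 0-1:", round(units_per(0.046) + units_per(0.039)))
```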

Now take a look at the online model. The adjusted R² is a little better. Now observe prices. The online price is again negative, as expected, but note that while the offline price is positive (indicating substitutability), it is far less impactful than the online price is in the offline model. That is, in the online model a 10% increase in the offline price brings about only a 1.2% change in online units (compared to an 18% impact in the offline model).
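The text reads the price coefficients as elasticities, that is, the percent change in units per percent change in price. Taking that reading as given, the cross-price comparison between the two channels is one multiplication per model. A minimal sketch; the function name is illustrative.

```python
def pct_change_in_units(elasticity, pct_change_in_price):
    """Percent change in units implied by an elasticity, holding all else constant."""
    return elasticity * pct_change_in_price

# Cross-price coefficients, read as elasticities as in the text
cross_price = {
    "offline units, online price up 10%": 1.801,   # Table 11.3
    "online units, offline price up 10%": 0.121,   # Table 11.4
}

for label, elasticity in cross_price.items():
    print(f"{label}: {pct_change_in_units(elasticity, 10):+.1f}%")
```

The asymmetry (about 18% versus about 1.2%) is what drives the pricing recommendation that follows.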

It should be no surprise that web visits are far more impactful on online units, but look how much more powerful e-mail is. While this also is probably no surprise, please note that this marcom channel can now be quantified. Observe likewise that in the online model direct mail is now negative.

Now let’s interpret the social media. It is much more significant in the online model. Share of voice, forums, how many followers the firm has and positive mentions all contribute to online units. This would probably indicate the firm should do what it can to invest in achieving positive mentions, followers, increasing share of voice, etc.

The last task is to look at the seasonality. Because Q4 is dropped (remember the dummy trap?) all the other quarters are measured relative to it. Note all three are negative (compared to Q4), with Q2 being the most negative. This helps for planning purposes.
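The dummy trap is avoided by leaving one quarter out: with an intercept plus all four quarter dummies, the dummies always sum to 1 and are perfectly collinear with the intercept. A minimal sketch of the coding and of reading the online seasonal coefficients from Table 11.4 relative to Q4; the helper names are illustrative.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
REFERENCE = "Q4"  # omitted category, so the dummies are not collinear with the intercept

def quarter_dummies(quarter):
    """One 0/1 indicator per quarter, except the reference quarter."""
    return {q: int(quarter == q) for q in QUARTERS if q != REFERENCE}

# Online-model seasonal coefficients from Table 11.4 (units relative to Q4)
seasonal = {"Q1": -1_947, "Q2": -2_323, "Q3": -170}

def seasonal_shift(quarter):
    """Units relative to Q4, all else constant; Q4 itself is the baseline (0)."""
    return sum(seasonal[q] * flag for q, flag in quarter_dummies(quarter).items())

for q in QUARTERS:
    print(q, quarter_dummies(q), seasonal_shift(q))
```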

The overall message would seem to be: direct mail and consumer confidence are powerful in impacting offline units, but e-mail and social media are not. In the online channel e-mail, social media and website visits are much more impactful. While again this is intuitively compelling, it had not been quantified before.

So, given the above model, what are the strategic implications Scott can give? In terms of price: since the online channel is much more of a substitute for offline purchasers, raise the offline price to drive more buyers online and think about adding online exclusives.

In terms of e-mail: decrease the number of e-mails sent to those who only/mostly purchase offline. Increase the number of e-mails sent to those who only/mostly purchase online.

In terms of direct mail: decrease the amount of direct mail sent to those who only/mostly purchase online. Increase the amount of direct mail sent to those who only/mostly purchase offline.

In terms of social media: engage in inbound marketing (find X advocates/champions of the firm, institute a blog strategy of community, etc.). Offer promotions in social space to purchase the firm’s online products.

Note all the strategic implications from this model. It addresses most of the marketing mix (product, price, promotion and place) and offers strategies based on quantifying causality.

Table 11.4 Impact on online units

ONLINE  R-Square 88%  Adj R-Sq 83%

Variable             Parameter estimate  Units per 1,000
Intercept            11,805
SOV                  46.92
Forums               0.0037              +3 units
Followers            0.0592              +59 units
Positive mentions    0.016               +16 units
Direct mails         0.08                +80 units
Direct mails_lag3    –0.073              –73 units
Direct mails_lag4    –0.043              –43 units
E-mails              0.113               +113 units
E-mails_lag1         0.013               +13 units
E-mails_lag4         0.009               +9 units
Visits               0.165               +165 units
Offline price        0.121
Online price         –5.704
Q1                   –1,947
Q2                   –2,323
Q3                   –170

Conclusion

Simultaneous equations provide a powerful (and sophisticated) way of quantifying important (and well-known) interactions. Oversimplification is the bane of good analytics.

Part six

Conclusion

12

The Finale

What should you take away from this? Any other stories/soapbox rants? What things have I learned that I’d like to pass on to you?


What things have I learned that I’d like to pass on to you?

Wow, we’re here at the end. I hope it was worthwhile and maybe a little fun. If so, tell your friends.

One thing I’d like the rest of the corporate world to know is what a marketing analyst does. That is, not the technical details but what is their function, what is their purpose, why are they important?

Now, I know that if we take a random sample of people all across a number of corporations and ask them, ‘What are the first two words that come to mind when you think of marketing analysts?’

Most of them will answer, ‘Smouldering sexuality’.

I know it’s true, we deal with real data, we see campaign effectiveness, we can forecast, it is no doubt the sexiest thing in the building. But that is not what I would want them to think about us, top of mind. I would hope that this book – and many like it – and y’all, will help them to think of us as ‘QUANTIFYING CAUSALITY’.

We are able to think in terms of ‘this causes that’, this variable (price) changes that variable (sales), and then – most importantly – quantify it so marketing strategy can act on it. We quantify causality.

I don’t want to hear, ‘Correlation is not causality’ because who cares; we are not talking about correlation, and we hardly ever talk about correlation. Granger causality (invented by economist Clive Granger) asserts that if an X variable comes before the Y variable, and if the Y variable does not come before the X variable, and if, in removing the X variable, the accuracy of the prediction deteriorates, then X causes Y. And we can state it as causality.
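The ‘remove X and see whether the prediction deteriorates’ step is usually run as an F-test comparing two regressions. A minimal NumPy sketch, assuming simulated series in which y responds to last period’s x; granger_f and the data are hypothetical.

```python
import numpy as np

def granger_f(y, x, p=1):
    """F statistic for 'x Granger-causes y' with p lags.

    Restricted model: y on its own p lags. Unrestricted model: y on its own p lags
    plus p lags of x. A large F means past x improves the prediction of y.
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    target = y[p:]
    own_lags = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    restricted = np.hstack([ones, own_lags])
    unrestricted = np.hstack([ones, own_lags, x_lags])

    def ssr(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_den = (n - p) - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / df_den)

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # Hypothetical: y persists and responds to last period's x, plus noise
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

f_xy = granger_f(y, x, p=2)
f_yx = granger_f(x, y, p=2)
print("F(x -> y):", round(f_xy, 1))
print("F(y -> x):", round(f_yx, 1))
```

To turn F into a p-value, compare it with an F distribution with p and (n - p) - (2p + 1) degrees of freedom, for example via scipy.stats.f.sf.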

So, a couple of things I’ve learned that I’d like to pass on to you. These are anecdotes that helped me focus on important things and I hope these stories will help you.

Anecdote #1

My first job was as a salesman in a shoe store. I was 16 and that at least meant I thought everyone over 30 was out of touch and un-cool (it was the mid-1970s).

One day the boss was out and left Ben and me in charge of the store. Ben was a part-time sales guy, had known the boss and his family for years and was semi-retired, over 60, and Jewish.

A woman came in dragging two toddlers with her. Ben was at the counter and the woman set down a pair of shoes and said the strap broke. Ben said he’d help her get a replacement. I saw right away those were NOT our shoes. That woman was about to get a free pair of shoes because of a befuddled, half-addled, maybe senile and confused salesman. I was not able to get his attention to explain to him the error of his ways. He got her another pair of shoes and she also bought a pair for one of her toddlers. I watched them as she paid and checked out and Ben waved at her and smiled.

I went up to him. ‘Ben, what are you doing!? Those were not our shoes!’

‘Oh, you mean for Mrs. Rasmun?’

‘Yes, you gave her a pair of shoes, for free!’

‘Yes, I know her. She’s a returning customer, has about five kids, comes in here all the time.’

‘But, you GAVE her a pair of shoes.’

He looked at me. ‘Yes. If I told her those were not our shoes she would have disagreed and walked out, unhappy, maybe not to ever come back. Maybe not buy her kids their shoes here. I did give her a pair of shoes. I also sold her another pair of shoes, and ensured she was satisfied and would continue to come back.’

I gulped. ‘Oh…’. So much for my coolness.

What I took away from that, other than my narrow-minded profiling, was that smartness is always about focusing on the customer. It’s not what is ‘right’ financially; what drives a business is customer-centricity. That’s probably why I ended up in marketing, a discipline that (is supposed to) put customers first.

Now, does this mean the customer is always right? Of course not, see above. The customer CAN be crazy. Remember Gary Becker’s irrational demand curve (Becker, 1962). But, according to Peter Drucker, the purpose of a business is to create and keep a customer – get it? KEEP a customer. This means understanding a customer, and this means using analytics.

What to get out of this: being customer-centric is always right.

Anecdote #2

I worked early on as an analyst at a PC manufacturing firm. I was also finishing my PhD; in fact, writing my dissertation. It involved a fairly novel kind of mathematics, called tensor analysis (used more in physics/engineering than in marketing/economics), and was about modelling multi-dimensional demand. My boss (while not very analytic, he was very strategic – including promoting his group and himself to all of his bosses) was impressed with the idea.

Somehow he got an appointment with the head guy, three levels above himself, to show my dissertation. This was not about the differential geometry of manifold tensors, but what could be done for the PC manufacturing company in terms of better estimates of demand. So the big meeting was set, about five weeks in advance. This was to give us time to prepare, because – my god! – this was an audience with the CEO, the BIG BOSS. So we (my boss, call him Bob, and I) worked hard on the PowerPoint presentation, spending days on the words and graphics, trying to focus on the use cases of demand for PCs. HR and the big boss’s secretary even made us rehearse, that is, practise our delivery in front of them, to make sure there were no offending phrases or comments (this was probably directed at me – I was seen as somewhat of a loose cannon) and they had to approve it. Finally it was all done and we had our time with the BIG BOSS.

We went in and the office was like a museum, glass and brass and marble – it was a corporate temple.

‘So’, my boss, Bob, began, ‘thanks so much for some of your time. Mike here has a very interesting PC model to show you. Mike?’

I cleared my throat and pointed to the overhead projection. ‘Demand is usually modelled as units being a function of several things, including price. It is always about holding everything else constant.’

‘So Bob’, the BIG BOSS said, ‘how are we going to beat the competition on these server wars?’

I looked at him. What?

‘Oh’, Bob stammered, ‘we have some ideas in mind.’

The next 45 minutes were about Bob and the BIG BOSS talking about the server wars and our competition. At the end we shook hands and left. The BIG BOSS had a limp, damp handshake.

What to get out of this: success comes from focusing on what’s important, especially on what’s important to people several levels above you.

Anecdotes #3 and #4

This anecdote is important, because anyone doing marketing science has faced it. And those not in marketing science wonder about it. I’m talking about altering the data, editing the output file, changing the results to be (more) intuitive.

This is the underbelly of marketing science. I know those in other functions wonder if we change the data. Do we make stuff up?

I was talking with a client recently and they told me about a consultant who was predicting the lift they would get on a particular campaign. The consultant estimated a 16% increase, which was WAY MORE than anything ever achieved before. The consultant was sketchy on what the key drivers of this phenomenal success were. The client frankly did not believe it and said so. The consultant asked what it should be and the client replied that about one-tenth of his estimate would be believable. The next week the consultant came back with a revised estimate of, wait for it, 2%. Honest to God! One-tenth of what their analytics had predicted earlier. Now I’m here to tell you that there is no way a model would predict 16% and then revise it to realistically be 2%, assuming real analytics were done.

That is one of the only instances I know of where they simply changed the output file. By the way, the client did not believe it either (did not trust their analytics) and fired them. Rightfully so.

So, do we change the output file? The answer is no. We can’t. It’s not just about intellectual integrity, it’s about COA (covering our asses!). Altering the data cannot be hidden; changing the results cannot be buried deep enough to never be found. That is, you will be found out, you will be caught and they will know that you altered the results. You will never have credibility again. Ever. It cannot be hidden. Trust me, it will (eventually) be discovered. This is because all data is interrelated, one metric drives another, and one piece affects another because one variable fits together with another to tell the whole story. Changing one part of it will affect all other parts and it will NOT add up. That does not mean you have to broadcast it to everyone though. You can emphasize this or direct the conversation to focus on that.

The biggest mistake I’ve ever made (that I know of) was ridiculously simple but very costly. I was a database marketing analyst and my job was to do a model and produce a list of customers most likely to purchase. We sent out over a million catalogues a month (at a cost of about 0.40 each).

I developed a logistic regression model to score the database with probability to buy and used SAS PROC RANK. I was supposed to give them the top three deciles. Now, SAS PROC RANK has decile output labelled from 0 to 9, with 0 the highest (the best). I accidentally sent deciles 7, 8 and 9 – the lowest, the worst. They were the highest-numbered deciles, get it? Easy mistake to make, right? Well, the campaign that month did not do well. So I sent a message to everyone that I was working on a new model that I thought might be better for next month. My message was designed as a preemptive strike showing that I was engaged and working on the problem. That’s what they saw: I was making it better. When the time arrived the following month I used the same model but this time picked deciles 0, 1 and 2 (the best). That campaign worked well. I was congratulated on improving the model. Of course my team knew it was the same model but the right deciles were chosen. Key takeaway: be careful and be upfront and honest (as need be).
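The trap is easy to reproduce. A minimal plain-Python sketch, not SAS, that labels deciles 0 to 9 by descending score (0 = best, the labelling the text describes) and then pulls both the intended and the mistaken selections; the customer ids and scores are hypothetical.

```python
import random

def assign_deciles(scores):
    """Label deciles 0..9 by descending score, so decile 0 holds the best prospects."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n
    return deciles

def select(ids, deciles, wanted):
    """Return the ids whose decile label is in `wanted`."""
    return [cid for cid, d in zip(ids, deciles) if d in wanted]

random.seed(3)
ids = list(range(1, 1001))               # hypothetical customer ids
probs = [random.random() for _ in ids]   # hypothetical probability-to-buy scores
deciles = assign_deciles(probs)

best = select(ids, deciles, {0, 1, 2})   # the intended top three deciles
worst = select(ids, deciles, {7, 8, 9})  # highest labels, lowest scores
print(len(best), len(worst))
print("mean score, deciles 0-2:", round(sum(probs[i - 1] for i in best) / len(best), 3))
print("mean score, deciles 7-9:", round(sum(probs[i - 1] for i in worst) / len(worst), 3))
```

Printing the mean score of whatever is about to be mailed, as the last two lines do, is a cheap sanity check against exactly this mistake.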

Another anecdote from early in my career was about demand estimation. My job was to forecast call volume, and based on that volume different load-balancing (among other things) sites were designed. Well, the company had decided to build another site (in Florida) to handle all the calls. They had bought the land and got a building and were hiring people to staff it. Eventually someone thought maybe they should predict how many calls would go there, that is, estimate demand. It so happened that my boss was a well-respected and long-time econometrician and our job was to put up the demand numbers. Everyone knew the demand was huge; the question was just how huge. So I collected data, macro and micro variables, competition, new products, time series trends, etc. The forecast I got was low – way lower than expected. I gulped and looked at it again. The model was forecasting less than half of what was needed for a new site. I met with my boss and we went over everything but could only assume, in the best scenario, 60% of what was needed. We gave the real estate team our estimates and they said thanks and then carried on with the building and the hiring for the new site. A year later that site was closed – there was not enough call volume to support it.

Now it would have been easy and acceptable for us to just double the output, right? It would have been easy to make heroic assumptions that made no sense in order to get the demand forecast way higher, right? In this case we just showed the output, shrugged our shoulders and called it a conservative, worst case scenario.

To have altered it would have been akin to what Einstein called The Biggest Blunder of His Life (not that I’m comparing myself to him!). Einstein’s relativity equations showed that because of gravity the universe should be expanding (or contracting). Since no one believed that, including Einstein himself, he added a ‘cosmological constant’ to his equations, in effect a mathematical way to cancel out the expansion. A few years later Hubble discovered that the universe was indeed expanding. Einstein edited the output file! The key takeaway? If it did not work for Einstein it will not work for you. Do not change the results.


Have an implementation plan!

The best analytics in the world is of no use if it is not implemented. Often I have been accused (often rightly so) of doing analytics that is too advanced, so that no one understands what it means or how to use it. This is after I have done it, shown the results and put together a PowerPoint presentation explaining what it is and how it helps. It was typically the nature of my job to do a project and then, basically, go away. Theodore Levitt (who, it could be argued, basically invented marketing as a discipline with his Marketing Myopia article) said that people do not want a one-inch drill; they want to make a hole, one inch wide. I was often guilty of expounding on the coolness of the drill, the wonderful details and specifications of the drill, how the drill would help make a hole, why this drill is better than that drill, etc. I needed to focus on the need, not the tool. Therefore I’d suggest some of the following after analytics has been done.

Set up tactical use cases. Put together scenarios of before and after, with and without the analytics.

Train the staff, maybe even with real data. Design simulations or use past data and show how the analytics will be implemented. This may mean designing a tracking report and focusing on the new metrics. It ought to mean actually showing data, the score on the database and the strategic implications of the new insights. Take away the abstract black box: analytics is not voodoo.

Get stakeholders together and talk about their goals (especially those their bonuses are dependent on). Show how the new analytics directly impacts these metrics, and then decide upon stretch goals. I have typically found the bar is rather low. Most firms, even Fortune 100 firms, have little idea what’s going on, have few insights and do not know their customers or competition. They typically market with a shotgun approach and throw money around hoping for the best. A few well-designed analytic projects can make a drastic difference. That’s how you become a superstar.

You should set up check-ins at 30 days after, 90 days after, 180 days after, etc., to get back together and see how it’s going, what has been happening. You are a consultant and are there to help answer questions and to ensure the models are working and are being used correctly.

It’s common to set up test vs. control groups, so make sure you are part of this. Remember, everyone wants to test, but almost no one knows how to design a statistical test.

Find a way to make analytics central to as many divisions and senior people as possible. Get in front of as many decision makers as feasible. Never talk about the technical aspects of the analytics; always talk about the downstream resultant (typically financial) metrics. Instead of saying the t-ratio is significant and positive, tell them that net profit can increase by 2.5% next quarter. That will make them put their phones down and listen.
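Since so much of this rides on reading test vs. control results correctly, here is a minimal sketch of one standard read for a simple two-cell campaign test: a two-proportion z-test on response rates. The counts are hypothetical and the function name is illustrative; the design work (sample size, randomization) still has to come first.

```python
from math import erfc, sqrt

def two_proportion_z_test(resp_test, n_test, resp_control, n_control):
    """Two-sided z-test for a difference in response rates, using a pooled standard error."""
    p_test, p_control = resp_test / n_test, resp_control / n_control
    pooled = (resp_test + resp_control) / (n_test + n_control)
    se = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical campaign: 150 responders of 10,000 in test vs 100 of 10,000 in control
z, p = two_proportion_z_test(150, 10_000, 100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Then, following the advice above, translate the lift into the money metric the stakeholders care about before presenting it.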

Take a class or read a book (or two) on abnormal psychology

Success in the corporate world depends more on your ability to work with people and get them to do what needs to be done than on your technical skills. This book has been about adding tools but really you need to understand people. Everyone is different, the same things do not work on all people, and people evolve and change over time. Just like kids.

All business emotions come from either fear or greed. Discover the primary motivator of the people above you and the people below you. Generally speaking, lower-level folks are tactic-oriented; they need a list of tasks to complete. As they rise in the corporate ranks they tend to become less tactical and more strategic. This means, generally, lower-level folks are motivated by fear (did they get the job done, was it done correctly, can they be blamed?) and higher-level people are motivated by greed (they run the organization and get a bonus, they get perks, newspaper clippings mention their name). As they reach a very high level they are motivated again by fear because they can be blamed for everything.

So you need to know people well enough (especially those under you) that you understand if they are going through a divorce, having trouble with their kids, dealing with drug problems, or are just plain crazy. Some people would prefer recognition to a raise, a flexible schedule to an increase in title, one-on-one time with you instead of the forced frivolities of department off-sites. (BTW, not everyone loves bowling or paintball!) So, invest and discover.

Consumer behaviour is predictable enough

What marketing science deals with is quantifying causality. That is, measuring how one variable impacts another variable. This means predicting consumer behaviour.

I like to point out that the weatherman, every day, predicts the weather. Every day it’s wrong. (Maybe it’s right enough, but you decide how often you have made fun of the bad predictions.) Meteorologists have decades of data and use mainframe computers to develop models. The data they deal with are dew points, temperature, wind, pressure, precipitation, etc. That is, they deal with inanimate objects. All of this, and they still can’t get it right!

We marketing science folks typically have only a handful of years of data to work with. We do this on a PC or so, maybe a server. And we deal with irrational, animate consumers. We have no chance to be ‘right’.

But the techniques you’ve seen here help, and they help to get it right often enough. It’s often enough to move the needle on a corporation’s financial performance. And by the way, how good does the model have to be? I’ve had a boss not use a model because it was not 100% accurate. (Yes, he was an idiot.)

I like to use the analogy of the evolution of the human eye. Millions of years ago our ancestors were blind and at high risk among predators. Eventually some mutations formed and we developed an ‘eye bud’ that did not allow perfect vision but could detect light from dark, could sense shadowy movements ahead, etc. I propose that while this eye bud was nowhere near perfect (not 100%) the insight (get it, sight?) was enough to allow them to make smarter decisions. Its visual acuity would grow and develop over time but at least it could now slightly ‘see’ large creatures coming toward it, it could tell day from night, maybe find food more easily, etc. I propose this was enough to survive.

So, aim high. We came out of the mud.

The bar is low. We can only go up from here. Go get ’em!

Glossary

Average: the most representative measure of central tendency, NOT necessarily the mean.

Censored observation: an observation whose status we do not know. Typically the event has not occurred yet or the observation was lost in some way.

Collinearity: a measure of how (independent) variables are correlated with each other.

Correlation: a measure of both the strength and the direction of a linear relationship, calculated as the covariance of X and Y divided by (the standard deviation of X × the standard deviation of Y).

Covariance: a measure of how two variables vary (disperse) together.

Design of experiments: an inductive way of creating a statistical test using a stimulus, taking into account variance, confidence, etc., by randomization and comparison to a control group.

Elastic demand: a place on the demand curve where a percentage change in an input variable produces a larger percentage change in an output variable.

Elasticity: a metric with no scale or dimension, calculated as the percent change in an output variable given a percent change in an input variable.

Inelastic demand: a place on the demand curve where a percentage change in an input variable produces a smaller percentage change in an output variable.

Lift/gains chart: a visual device to aid in interpreting how a model performs. It compares, by deciles, the model’s predictive power to random selection.

Maximum likelihood: an estimation technique (as opposed to ordinary least squares) that finds the estimators that maximize the likelihood of observing the given sample.

Mean: a descriptive statistic and measure of central tendency, calculated by summing the values of all the observations and dividing by the number of observations.

Median: the middle observation (once sorted) in an odd number of observations, or the mean of the middle two observations in an even number of observations.

Mode: the value that appears most often.

Ordinary regression: a statistical technique whereby a dependent variable depends on the movement of one or more independent variables (plus an error term).

Oversampling: a sampling technique forcing a particular metric to be overrepresented (larger) in the sample than it would be in simple random sampling. This is done because a simple random sample would produce too few of that particular metric.

Range: a measure of dispersion or spread, calculated as the maximum value less the minimum value.

Reduced form equations: in econometrics, models in which each endogenous variable is solved for in terms of the exogenous (predetermined) variables only.

Segmentation: a marketing strategy aimed at dividing the market into sub-markets, wherein each member of a segment is very similar, by some measure, to the other members of that segment and very dissimilar to members of all other segments.

Simultaneous equations: a system of more than one dependent-variable equation, often sharing several independent variables.

Standard deviation: the square root of variance.

Standard error: the standard deviation of a sample statistic; for the mean it is estimated as the standard deviation divided by the square root of the number of observations.

Stratifying: a sampling technique choosing observations based on the distribution of another metric. This is done to ensure the sample contains adequate observations of that particular metric.

Variance: a measure of spread, calculated as the sum of the squared differences between each observation and the mean, divided by the number of observations less one.

Z-score: a metric describing how many standard deviations an observation is from its mean.
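Several of the glossary entries are formulas. A minimal sketch writing them out in plain Python on a small made-up sample; the helper names are illustrative, and Python’s statistics module already supplies mean, median and mode.

```python
from math import sqrt
from statistics import mean, median, mode

def variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    return sqrt(variance(xs))

def std_error(xs):
    """Standard error of the mean: standard deviation / sqrt(n)."""
    return std_dev(xs) / sqrt(len(xs))

def z_scores(xs):
    """How many standard deviations each observation sits from the mean."""
    m, s = mean(xs), std_dev(xs)
    return [(x - m) / s for x in xs]

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance of X and Y divided by (SD of X * SD of Y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

spend = [20, 35, 35, 50, 80]   # hypothetical weekly spend
visits = [2, 3, 4, 5, 9]       # hypothetical site visits

print("mean", mean(spend), "median", median(spend), "mode", mode(spend))
print("range", max(spend) - min(spend))
print("variance", variance(spend), "sd", round(std_dev(spend), 3), "se", round(std_error(spend), 3))
print("correlation", round(correlation(spend, visits), 3))
```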

BibliographyandfurtherreadingAriely,Dan(2008)PredictablyIrrational:Thehiddenforcesthatshapeourdecisions,HarperCollins

Bagozzi,Richard(ed)(2002)AdvancedMethodsofMarketingResearch,Blackwell

Baier,Martin,Ruf,KurtisandChakraborty,Goutam(2002)ContemporaryDatabaseMarketing:Conceptsandapplications,RacomCommunications

Becker,Gary(1962)Irrationalbehaviourandeconomictheory,JournalofPoliticalEconomy,70(1),pp1–13

Belsley,David,Kuh,EdwinandWelsch,Roy(1980)RegressionDiagnostics:Identifyinginfluentialdataandsourcesofcollinearity,JohnWileyandSons

Binger,BrianandHoffman,Elizabeth(1998)MicroeconomicswithCalculus,AddisonWesley

Birn,RobinJ(2009)TheEffectiveUseofMarketResearch:Howtodriveandfocusbetterbusinessdecisions,KoganPage

Brown,WilliamS.(1991)IntroducingEconometrics,WestPublishingCompany

Chiang,Alpha(1984)FundamentalMethodsofMathematicalEconomics,McGrawHill

Cox,David(1972)Regressionmodelsandlifetables,JournalofRoyalStatisticalSociety,34(2),pp187–220

Deaton,AngusandMuellbauer,John(1980)EconomicsandConsumerBehavior,CambridgeUniversityPress

Engel,James,Blackwell,RogerandMiniard,Paul(1995)ConsumerBehavior,DrydenPress

Greene,WilliamH(1993)EconometricAnalysis,PrenticeHall

Grigsby, Mike (2002) Modeling elasticity, Canadian Journal of Marketing Research, 20(2), p 72

Grigsby, Mike (2014) Rethinking RFM, Marketing Insights

Hair, Joseph, Anderson, Rolph, Tatham, Ronald and Black, William (1998) Multivariate Data Analysis, Prentice Hall

Hamburg, Morris (1987) Statistical Analysis for Decision Making, Harcourt Brace Jovanovich

Hazlitt, Henry (1979) Economics in One Lesson: The shortest and surest way to understand basic economics, Crown Publishers

Hughes, Arthur M (1996) The Complete Database Marketer, McGraw Hill

Intriligator, Michael, Bodkin, Ronald and Hsiao, Cheng (1996) Econometric Models, Techniques and Applications, Prentice Hall

Jackson, Rob and Wang, Paul (1997) Strategic Database Marketing, NTC Business Books

Kachigan, Sam (1991) Multivariate Statistical Analysis: A conceptual introduction, Radius Press

Kennedy, Peter (1998) A Guide to Econometrics, MIT Press

Kmenta, Jan (1986) Elements of Econometrics, Macmillan

Kotler, Philip (1967) Marketing Management: Analysis, planning and control, Prentice Hall

Kotler, Philip (1989) From mass marketing to mass customization, Planning Review, 17(5), pp 10–47

Lancaster, Kelvin (1971) Consumer Demand, Columbia University Press

Leeflang, Peter SH, Wittink, Dick, Wedel, Michel and Naert, Philippe (2000) Building Models for Marketing Decisions, Kluwer Academic Publishers

Levitt, Theodore (1960) Marketing myopia, Harvard Business Review, 38, pp 24–47

Lilien, Gary, Kotler, Philip and Moorthy, K Sridhar (2002) Marketing Models, Prentice-Hall International editions

Lindsay, Cotton Mather (1982) Applied Price Theory, Dryden Press

MacQueen, JB (1967) Some methods for classification and analysis of multivariate observations, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press

Magidson, Jay and Vermunt, Jeroen (2002) A nontechnical introduction to latent class models, Statistical Innovations white paper [online] http://statisticalinnovations.com/technicalsupport/lcmodels2.pdf

Magidson, Jay and Vermunt, Jeroen (2002) Latent class models for clustering: a comparison with K-means, Canadian Journal of Marketing Research, 20, pp 37–44

Myers, James (1996) Segmentation and Positioning for Strategic Marketing Decisions, American Marketing Association

Porter, Michael (1979) How competitive forces shape strategy, Harvard Business Review, March/April, pp 137–45

Porter, Michael (1980) Competitive Strategy, The Free Press

Samuelson, Paul (1947) Foundations of Economic Analysis, Harvard University Press

Schnaars, Steven P (1997) Marketing Strategy: Customers & competition, The Free Press

Silberberg, Eugene (1990) The Structure of Economics: A mathematical analysis, McGraw Hill

Sorger, Stephan (2013) Marketing Analytics, Admiral Press

Stone, Merlin, Bond, Alison and Foss, Bryan (2004) Consumer Insight: How to use data and market research to get closer to your customer, Kogan Page

Sudman, Seymour and Blair, Edward (1998) Marketing Research: A problem solving approach, McGraw Hill

Takayama, Akira (1993) Analytical Methods in Economics, University of Michigan Press

Treacy, Michael and Wiersema, Fred (1997) The Discipline of Market Leaders: Choose your customers, narrow your focus, dominate your market, Addison Wesley

Urban, Glen and Star, Steven (1991) Advanced Marketing Strategy: Phenomena, analysis and decisions, Prentice Hall

Varian, Hal (1992) Microeconomic Analysis, W. W. Norton & Company

Wedel, Michel and Kamakura, Wagner (1998) Market Segmentation: Conceptual and methodological foundations, Kluwer Academic Publishers

Weinstein, Art (1994) Market Segmentation: Using demographics, psychographics and other niche marketing techniques to predict and model customer behavior, Irwin Professional Publishing

Index

Note: italics indicate a figure or table in the text.

A/B testing (i), (ii)

abnormal psychology (i)

advertising (i)

affinity analysis (i)

AID (automatic interaction detection) (i)

Almost Ideal Demand System (AIDS) (i)

average (i), (ii), (iii)

definition (i), (ii)

Bayes Information Criterion (BIC) (i), (ii), (iii)

Becker, Gary (i)

behavioural segmentation (BS) (i)

difference to RFM (i), (ii)

techniques (i)

see also segmentation

branding (i)

causality (i), (ii)

see also Granger causality

censored observation (i), (ii)

central limit theorem (i)

CHAID (chi-squared automatic interaction detection) (i), (ii)

advantages (i)

disadvantages (i)

output (i)

uses (i)

‘champion/challenger’ see A/B testing

Chrysler (i)

Cochrane-Orcutt test (i)

collinearity (i)

definition (i)

condition index (i), (ii)

confidence intervals (i), (ii)

‘confusion matrix’ (i), (ii)

conjoint analysis (i), (ii)

consumer see customer behaviour

correlation (i)

definition (i)

negative (i)

positive (i)

serial (i), (ii)

covariance (i), (ii)

definition (i)

Cox, Sir David (i), (ii)

customer behaviour (i), (ii), (iii), (iv), (v)

background (i)

choices (i), (ii)

constraints (i)

data (i), (ii)

decision-process (i), (ii)

engagement (i)

example (i)

experiential motivations (i), (ii)

information processing (i)

loyalty (i), (ii), (iii)

marketing strategy and (i), (ii)

need recognition (i)

predicting (i)

preferences (i)

primary motivations (i), (ii)

post-purchase evaluation (i)

pre-purchase alternative evaluation (i)

purchasing (i)

share of voice (i)

underlying motivations (i)

customer loyalty (i), (ii)

emotional (i), (ii)

transactional (i), (ii)

data (i)

behavioural (i), (ii)

big (i)

clickstream (i)

database (i)

digital (i)

survey (i)

uses of (i)

Deaton, Angus (i)

deductive thinking (i)

demand (i)

drivers (i)

elastic (i), (ii), (iii)

estimation (i)

inelastic (i), (ii), (iii)

descriptive analysis (i)

design of experiments (DOE) (i), (ii), (iii)

digital analytics (i)

Drucker, Peter (i), (ii), (iii)

‘dummy trap’ (i), (ii)

dummy variables (i), (ii)

see also variables

Durbin-Watson test (i), (ii)

econometrics (i), (ii), (iii)

elastic demand (i)

elasticity (i)

elasticity modelling (i), (ii), (iii)

output by segment (i)

overview (i)

own price vs competitors (i)

point elasticity (i)

segmentation (i)

see also demand

engagement (i)

issue tree (i)

model (i)

purpose (i)

see also customer behaviour

equations

deterministic (i), (ii)

probabilistic (i), (ii)

reduced form (i)

simultaneous (i), (ii)

estimators (i), (ii)

consistency (i)

efficiency (i)

unbiasedness (i)

game theory (i), (ii), (iii), (iv)

general survival curve (i)

glossary (i)

Granger causality (i)

Hamburg, Morris (i)

hierarchical clustering (i)

dendrogram (i)

Iacocca, Lee (i)

ill conditioning (i)

inductive thinking (i)

inelastic demand (i)

Kennedy, Peter (i)

K-means clustering (i), (ii), (iii), (iv)

advantages (i)

disadvantages (i)

Kotler, Philip (i), (ii), (iii)

latent class analysis (LCA) (i), (ii), (iii)

advantages (i)

disadvantages (i)

Latent Gold (i)

Levitt, Theodore (i), (ii)

lifetime value (LTV) (i)

descriptive analysis (i)

example calculations (i)

predictive analysis (i)

lift/gains chart (i), (ii)

logistic regression (i), (ii), (iii), (iv), (v), (vi), (vii), (viii)

market basket analysis and (i)

logit (i), (ii)

MacQueen, James (i)

Magidson, Jay (i), (ii)

marcom see marketing communications

market basket analysis (i)

estimating/predicting (i), (ii)

marketing (i), (ii), (iii), (iv)

consumer-centric (i)

customer behaviour and (i)

database (i), (ii), (iii)

demand (i)

partition (i), (ii)

position (i), (ii)

prioritize (i), (ii)

probe (i), (ii)

product-centric (i)

strategic (i), (ii)

tactical (i)

marketing communications (marcom) (i), (ii), (iii), (iv)

business case (i)

impact on revenue (i)

responses transactions (i)

marketing economics (i)

marketing research (i), (ii)

marketing strategy (i), (ii), (iii), (iv), (v)

competitive threats (i)

consumer behaviour and (i)

defensive reactions (i)

lifetime value (LTV) and (i)

offensive reactions (i)

types (i)

maximum likelihood (i), (ii)

mean (i), (ii), (iii), (iv)

definition (i)

measures of central tendency (i), (ii), (iii)

measures of dispersion (i), (ii)

median (i), (ii), (iii), (iv)

definition (i)

mode (i), (ii), (iii), (iv)

definition (i)

modelling

dependent variable techniques (i), (ii)

engagement (i)

inter-relationship techniques (i)

segmentation and (i)

structural equation (i)

Muellbauer, John (i)

multiple regression (i)

Myers, James H (i), (ii)

Nash, John (i)

net present value (NPV) (i)

normal distribution (i), (ii), (iii)

Omniture (i)

ordinary regression (i), (ii), (iii), (iv), (v), (vi), (vii)

definition (i)

oversampling (i), (ii)

partial likelihood (i)

point elasticity (i)

Porter, Michael (i)

predictive analysis (i)

pricing (i), (ii), (iii), (iv)

probability (i), (ii), (iii)

example (i)

proportional hazards modelling (i)

see also survival analysis

range (i), (ii)

definition (i)

reduced form equations (i)

regression (i), (ii), (iii), (iv), (v)

revenue growth margin (i)

RFM (recency, frequency, monetary) (i), (ii), (iii), (iv), (v), (vi)

definition (i)

ridge regression (i)

sample size equation (i), (ii)

sampling (i)

distribution (i), (ii)

Schnaars, Steven P (i)

segmentation (i), (ii), (iii)

accessibility (i)

actionable (i)

algorithm (i)

behavioural (i)

behavioural data (i)

benefits (i)

business rules (i)

definition (i), (ii)

example (i)

identifiability (i)

marketing strategy (i), (ii)

metrics (i)

naming segments (i), (ii)

pricing and (i)

reference books (i)

responsiveness (i)

scoring database (i)

stability (i)

strategic uses of (i), (ii)

substantiality (i)

test and learn plan (i), (ii)

tools and techniques (i)

significance (i), (ii)

simple regression (i)

simultaneous equations (i), (ii), (iii), (iv), (v), (vi)

definition (i)

‘slope shifters’ see binary variables

standard deviation (i), (ii), (iii), (iv)

definition (i)

standard error (i), (ii)

Statistical Innovations (i)

statistical techniques

assumptions (i), (ii)

dependent equation types (i), (ii), (iii)

inter-relationship types (i), (ii), (iii)

segmentation (i)

statistical testing (i)

A/B testing (i)

sample size equation (i)

Stone, Merlin (i)

strategic marketing see marketing

stratifying (i), (ii)

structural equation modelling (SEM) (i), (ii)

latent variables (i)

supply (i)

surveys

data (i), (ii)

design (i)

respondent fatigue (i)

survival analysis (i)

business case (i)

targeting (i)

‘time until an event’ (i)

t-ratio (i), (ii)

universal control group (UCG) (i)

variables (i), (ii), (iii), (iv)

binary (i), (ii)

endogenous (i), (ii)

exogenous (i), (ii)

inter-relationship techniques (i)

latent (i), (ii)

predetermined (i)

see also correlation, covariance, modelling

variance (i), (ii), (iii), (iv)

variance inflation factor (VIF) (i)

Vermunt, Jeroen K (i)

Yule-Walker estimate (i)

z-score (i), (ii), (iii), (iv), (v), (vi), (vii)

formula (i)

Publisher’s note

Every possible effort has been made to ensure that the information contained in this book is accurate at the time of going to press, and the publisher and author cannot accept responsibility for any errors or omissions, however caused. No responsibility for loss or damage occasioned to any person acting, or refraining from action, as a result of the material in this publication can be accepted by the editor, the publisher or the author.

First published in Great Britain and the United States in 2015 by Kogan Page Limited

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licences issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned addresses:

2nd Floor, 45 Gee Street, London EC1V 3RS, United Kingdom, www.koganpage.com

1518 Walnut Street, Suite 1100, Philadelphia PA 19102, USA

4737/23 Ansari Road, Daryaganj, New Delhi 110002, India

© Mike Grigsby, 2015

The right of Mike Grigsby to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

ISBN 9780749474171

E-ISBN 9780749474188

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data

Grigsby, Mike.

Marketing analytics: a practical guide to real marketing science / Mike Grigsby.

pages cm

ISBN 978-0-7494-7417-1 (paperback) – ISBN 978-0-7494-7418-8 (ebk) 1. Marketing research. 2. Marketing. I. Title.

HF5415.2.G754 2015

658.8’3–dc23

2015016002

Typeset and eBook by Graphicraft Limited, Hong Kong

Print production managed by Jellyfish

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

http://www.koganpage.com

http://www.graphicraft.com.hk
