Marketing Analytics: A Practical Guide to Real Marketing Science
Praise for Marketing Analytics
‘For those MBAs who barely passed their quantitative marketing and statistics classes without truly understanding the content, Marketing Analytics provides everything managers and executives need to know presented as a conversation with examples to boot! You’ll definitely sound smarter in the boardroom after reading this book!’
James Mourey, PhD and assistant professor of marketing at DePaul University (Chicago)
‘Marketing Analytics is a must-read for analytics practitioners and marketing managers seeking a comprehensive overview of the most actionable techniques that virtually any organization can apply to gain immediate benefits. Rather than complicate the book with technical details that may not be of interest to all readers, Dr Grigsby succinctly illustrates the concepts with real examples and provides references for analysts needing deeper guidance or theory. I wish Marketing Analytics had been published 15 years ago – it would’ve saved me a lot of independent research!’
W Dean Vogt, Jr, marketing research and analytics practitioner
‘Marketing Analytics is a practical guidebook written in a conversational tone that makes complex theories easily understood. The author’s experience in the industry combined with his inherent gift for explaining everything a successful marketing analyst needs to know makes this book a must-read.’
Katy Richardson, Founder and Principal, 214 Creative
‘This is a great book for practitioners who have learned plenty of theories and want to learn how to apply methodologies. It is also a great, easy-to-read resource for anyone who does not have a deep theoretical background but wants to learn how analytics work in real life.’
Ingrid Guo, VP, Analytics, and Managing Director, Javelin Marketing Group (Beijing)
‘Mike’s writing is straightforward and entertaining. He brings a conversational and relatable tone and approach to some fairly complex material. Sometimes marketers can take themselves a little too seriously, especially when it comes to the mathematical side of things. Mike’s work reminds us to lighten up and have fun with it.’
Katy Rollings, PhD, loyalty analyst at GameStop
‘The book summarizes all the critical topics in a consumer-focused analytic approach, and the cases are fun to read.’
Ernan Haruvy, PhD, Professor of Marketing, UT Dallas
‘This book gives a broad overview of marketing analytics to people who don’t have any related background … Examples are explained to give readers a clearer idea. I think the book is worth a read for anyone who wants to become a marketing analyst.’
Yuan Fang, MSc (marketing analytics candidate)
‘In one sentence, the role of marketing is to determine who the organization can serve and how it can best be done. To this end, Mike Grigsby escorts the reader through the difficult process of understanding, explaining, and anticipating customer behaviour, aptly delivered with the no-nonsense authority earned by veterans of marketing success. If Marketing Analytics is the class, I’m sitting front row!’
Allyn White, PhD
‘In his book Marketing Analytics, Mike Grigsby takes passionate marketing strategists on a practical, real-life journey for solving common marketing challenges. By combining the concepts and knowledge areas of statistics, marketing strategy and consumer behaviour, Mike recommends scientific and innovative solutions to common marketing problems in the current business environment. Every chapter is an interesting journey for the reader.
What I like most about the book is its simplicity and how it applies to real work-related situations in which almost all of us have been involved while practising marketing of any sort. I also like how Mike talks about tangible measurements of strategic recommended marketing solutions as well as how they add value to companies’ strategic endeavours. I highly recommend reading this book as it adds a completely new dimension to marketing science.’
Kristina Domazetoska, project manager and implementation consultant at Insala – Talent Development and Mentoring Solutions
‘Mike’s book is the right blend of theory applied to the real world, large-scale data problems of marketing. It’s exactly the book I wish I’d had when I started out in this field.’
Jeff Weiner, Senior Director, Channel and Employee Analytics – US Region, Aimia
‘I love your book! It offers a truly accessible guide to the basics and practice of marketing analytics. I especially like how you bring in your correct insights on e.g. the overreliance on competitive (vs consumer) behavior in marketing strategy.’
Koen H Pauwels, Associate Professor at the Tuck School of Business, Dartmouth and Özyeğin University, Istanbul
‘I found Marketing Analytics interesting and easy to comprehend. It has lucid descriptions along with the illustrations, which complement the text. Even a layman can understand, as there is no jargon or technical language used.’
Sunpreet Kaur Sahni, Assistant Professor at GNIMT, PhD (marketing) Ludhiana, Punjab, India
‘This is an excellent read for people in the industry who work in strategy and marketing. This is one of the first books that I have read that covers the entire spectrum from demand, segmentation, targeting, and how results can be calculated. In an age where marketing is becoming more and more sophisticated, this book provides the tools and the mathematics behind the facts. Marketing Analytics is written with a scientific voice, but was very readable, with the science wrapped into everyday activities, based on a character we can all relate to, that are derived from these formulas, ultimately driving ROI.’
Elizabeth Johnson, VP, Shopper Marketing – Digital Solutions Retailigence
‘I strongly recommend Marketing Analytics to both beginners and folks who don’t have much background in statistics. A very precise book. Complicated topics around statistics, marketing and modelling are condensed very well in a much-simplified language, along with real-world examples and business cases, which makes it amusing to read and gives clear understanding about applications of the concepts. The book sets the ground with exactly what one needs to know from statistics as well as marketing, and runs through how these two, coupled with analytics, can help solve real-world business problems. Later, it also covers Market Research topics and concludes with the Capstone, covering application of all the methodologies to Digital Analytics. I believe that Marketing Analytics will be a handy reference or manual for students as well as marketing analytics professionals.’
Sasmit Khokale, MS (MIS), Analytics Practitioner
Note on the Ebook Edition
For an optimal reading experience, please view large tables and figures in landscape mode.
This ebook published in 2015 by
Kogan Page Limited
2nd Floor, 45 Gee Street
London EC1V 3RS
United Kingdom
www.koganpage.com
© Mike Grigsby, 2015
E-ISBN 9780749474188
Full imprint details
Contents
Foreword
Preface
Introduction
PART ONE Overview
01 A (little) statistical review
Measures of central tendency
Measures of dispersion
The normal distribution
Relations among two variables: covariance and correlation
Probability and the sampling distribution
Conclusion
Checklist: You’ll be the smartest person in the room if you…
02 Brief principles of consumer behaviour and marketing strategy
Introduction
Consumer behaviour as the basis for marketing strategy
Overview of consumer behaviour
Overview of marketing strategy
Conclusion
Checklist: You’ll be the smartest person in the room if you…
PART TWO Dependent variable techniques
03 Modelling dependent variable techniques (with one equation): what are the things that drive demand?
Introduction
Dependent equation type vs inter-relationship type statistics
Deterministic vs probabilistic equations
Business case
Results applied to business case
Modelling elasticity
Technical notes
Highlight: Segmentation and elasticity modelling can maximize revenue in a retail/medical clinic chain: field test results
Abstract
The problem and some background
Description of the data set
First: segmentation
Then: elasticity modelling
Last: test vs control
Discussion
Conclusion
Checklist: You’ll be the smartest person in the room if you…
04 Who is most likely to buy and how do I target?
Introduction
Conceptual notes
Business case
Results applied to the model
Lift charts
Using the model – collinearity overview
Variable diagnostics
Highlight: Using logistic regression for market basket analysis
Abstract
What is a market basket?
Logistic regression
How to estimate/predict the market basket
Conclusion
Checklist: You’ll be the smartest person in the room if you…
05 When are my customers most likely to buy?
Introduction
Conceptual overview of survival analysis
Business case
More about survival analysis
Model output and interpretation
Conclusion
Highlight: Lifetime value: how predictive analysis is superior to descriptive analysis
Abstract
Descriptive analysis
Predictive analysis
An example
Checklist: You’ll be the smartest person in the room if you…
06 Modelling dependent variable techniques (with more than one equation)
Introduction
What are simultaneous equations?
Why go to the trouble of using simultaneous equations?
Desirable properties of estimators
Business case
Conclusion
Checklist: You’ll be the smartest person in the room if you…
PART THREE Inter-relationship techniques
07 Modelling inter-relationship techniques: what does my (customer) market look like?
Introduction
Introduction to segmentation
What is segmentation? What is a segment?
Why segment? Strategic uses of segmentation
The four Ps of strategic marketing
Criteria for actionable segmentation
A priori or not?
Conceptual process
Checklist: You’ll be the smartest person in the room if you…
08 Segmentation: tools and techniques
Overview
Metrics of successful segmentation
General analytic techniques
Business case
Analytics
Comments/details on individual segments
K-means compared to LCA
Highlight: Why Go Beyond RFM?
Abstract
What is RFM?
What is behavioural segmentation?
What does behavioural segmentation provide that RFM does not?
Conclusion
Segmentation techniques
Checklist: You’ll be the smartest person in the room if you…
PART FOUR Other
09 Marketing research
Introduction
How is survey data different than database data?
Missing value imputation
Combating respondent fatigue
A far too brief account of conjoint analysis
Structural equation modelling (SEM)
Checklist: You’ll be the smartest person in the room if you…
10 Statistical testing: how do I know what works?
Everyone wants to test
Sample size equation: use the lift measure
A/B testing and full factorial differences
Business case
Checklist: You’ll be the smartest person in the room if you…
PART FIVE Capstone
11 Capstone: focusing on digital analytics
Introduction
Modelling engagement
Business case
Model conception
How do I model multiple channels?
Conclusion
PART SIX Conclusion
12 The Finale: what should you take away from this?
Any other stories/soapbox rants?
What things have I learned that I’d like to pass on to you?
What other things should you take away from all this?
Glossary
Bibliography and further reading
Index
Test banks and data sets relating to chapters are available online at: www.koganpage.com/MarketingAnalytics
Foreword
In Marketing Analytics Mike Grigsby provides a new way of thinking about solving marketing and business problems, with a practical set of solutions. This relevant guide is intended for practitioners across a variety of fields, but is rigorous enough to satisfy the appetite of scholars as well.
I can certainly appreciate Mike’s motivations for the book. This book is his way of giving back to the analytics community by offering advice and step-by-step guidance for ways to solve some of the most common situations, opportunities, and problems in marketing. He knows what works for entry, mid-level, and very experienced career analytics professionals, because this is the kind of guide he would have liked at these stages.
While Mike’s education includes a PhD in Marketing Science, he also pulls from his vast experiences from his start as an Analyst, through his journey to VP of Analytics, to walk the reader through the types of questions and business challenges we face in the analytics field on a regular basis. His authority on the subject matter is obvious, and his enthusiasm is contagious, and best captured by my favourite sentence of his book: ‘Now let’s look at some data and run a model, because that’s where all the fun is.’
What this education and experience means for the rest of us is that we have a well-informed author providing us with insight into the realities of what is needed from the exciting work we do, and how we can not only provide better decision making, but also move the needle on important theoretical and methodological approaches in Analytics.
More specifically, Marketing Analytics covers both inter-relational and dependency-driven analytics and modelling to solve marketing problems. In a light and conversational style (both engaging and surprising) Mike argues that, ultimately, all markets rely on a strong understanding of the ever-changing, difficult to predict, sometimes fuzzy, and elusive minds and hearts of consumers. Anything we can do to better arm ourselves as marketers to develop this understanding is certainly time well spent. Consumers can and should be the focal point of great strategy, operational standards of excellence and processes, tactical decisions, product design, and so much more, which is why it makes perfect sense to better understand not just consumer behaviours, but also consumer thoughts, opinions, and feelings, particularly related to your vertical, competitors, and brand.
After a review of seminal work on consumer behaviour, and an overview of general statistics and statistical techniques, Marketing Analytics dives into realistic business scenarios with the clever use of corporate dialogue between Scott, our fictitious analyst, and his boss. As our protagonist progresses through his career, we see an improvement in his toolkit of analytical techniques. He moves from an entry-level analyst in a cubicle to a senior leader of analytics with staff. The problems become more challenging, and the process for choosing the analytics to apply to the situations presented is an uncanny reflection of reality – at least based on my experiences.
What I appreciate absolutely most about this work though is the full spectrum of problem solving, not just analytics in a vacuum. Mike walks us from the initial moment when a problem is identified, through communication of that problem, framing by the Analytics team, technique selection and execution (from the straightforward to somewhat advanced), communication of results, and usefulness to the company. This rare and certainly more complete picture warrants a title such as Problem Solving using Marketing Analytics in lieu of the shorter title Mike chose.
Marketing Analytics will have you rethinking your methods, developing more innovative ways to progress your marketing analytics techniques, and adjusting your communication practices. Finally, a book we all can use!
Dr Beverly Wright, VP, Analytics, BKV Consulting
Preface
We’ll start by trying to get a few things straight. I did not set out to write a (typical) textbook. I’ll mention some textbooks down the line that might be helpful in some areas, but this is too slim for an academic tome. Leaf through it and you’ll not find any mathematical proofs, nor are there pages upon pages of equations. This is meant to be a gentle overview – more conceptual than statistical – for the marketing analyst who just needs to know how to get on with their job. That is, it’s for those who are, or hope to be, practitioners. This is written with practitioners in mind.
Introduction
Who is the intended audience for this book?
This is not meant to be an academic tome filled with mathematical minutiae and cluttered with statistical mumbo-jumbo. There will need to be an equation now and then, but if your interest is econometric rigour, you’re in the wrong place. A couple of good books for that are Econometric Analysis by William H. Greene (1993) and Econometric Models, Techniques and Applications by Michael Intriligator, Ronald G. Bodkin and Cheng Hsiao (1996). So, this book is not aimed at the statistician, although there will be a fair amount of verbiage about statistics.
This is not meant to be a replacement for a programming manual, even though there will be SAS code sprinkled in now and then. If you’re all about BI (business intelligence), which means mostly reporting and visualizing data, this is not for you.
This will not be a marketing strategy guide, but be aware that as mathematics is the handmaiden of science, marketing analytics is the handmaiden of marketing strategy. There is no point to analytics unless it has a strategic payoff. It’s not what is interesting to the analyst, but what is impactful to the business that is the focus of marketing science.
So, to whom is this book aimed? Not necessarily at the professional econometrician/statistician, but there ought to be some satisfaction here for them. Primarily, the aim is at the practitioner (or those who will be). The intended audience is the business analyst who has to pull a targeted list, the campaign manager who needs to know which promotion worked best, the marketer who must DE-market some segment of her customers to gain efficiency, the marketing researcher who needs to design and implement a satisfaction survey, the pricing analyst who has to set optimal prices between products and brands, etc.
What is marketing science?
As alluded to above, marketing science is the analytic arm of marketing. Marketing science (interchangeable with marketing analytics) seeks to quantify causality. Marketing science is not an oxymoron (like military intelligence, happily married or jumbo shrimp) but is a necessary (although not sufficient) part of marketing strategy. It is more than simply designing campaign test cells. Its overall purpose is to decrease the chance of marketers making a wrong decision. It cannot replace managerial judgment, but it can offer boundaries and guardrails to inform strategic decisions. It encompasses areas from marketing research all the way to database marketing.
Why is marketing science important?
Marketing science quantifies the causality of consumer behaviour. If you don’t know already, consumer behaviour is the centre-point, the hub, the pivot around which all marketing hinges. Any ‘marketing’ that is not about consumer behaviour (understanding it, incenting it, changing it, etc.) is probably heading down the wrong road.
Marketing science gives input/information to the organization. This information is necessary for the very survival of the firm. Much like an organism requires information from its environment in order to change, adapt and evolve, an organization needs to know how its operating environment changes. To not collect and act and evolve based on this information would be death. To survive, for both the organization and the organism, insights (from data) are required. Yes, this is reasoning by analogy but you see what I mean.
Marketing science teases out strategy. Unless you know what causes what, you will not know which lever to pull. Marketing science tells you, for instance, that this segment is sensitive to price, this cohort prefers this marcom (marketing communication) vehicle, this group is under competitive pressure, this population is not loyal, and so on. Knowing which lever to pull (by different consumer groups) allows optimization of your portfolio.
What kind of people in what jobs use marketing science?
Most people in marketing science (also called decision science, analytics, CRM, direct/database marketing, insights, research, etc.) have a quantitative bent. Their education is typically some combination involving statistics, econometrics/economics, mathematics, programming/computer science, business/marketing/marketing research, strategy, intelligence, operations, etc. Their experience certainly touches any and all parts of the above. The ideal analytic person has a strong quantitative orientation as well as a feel for consumer behaviour and the strategies that affect it. As in all marketing, consumer behaviour is the focal point of marketing science.
Marketing science is usually practised in firms that have a CRM or direct/database marketing component, or firms that do marketing research and need to undertake analytics on the survey responses. Forecasting is a part of marketing science, as well as design of experiments (DOE), web analytics and even choice behaviour (conjoint). In short, any quantitative analysis applied to economic/marketing data will have a marketing science application. So while the subjects of analysis are fairly broad, the number of (typical) analytic techniques tends to be fairly narrow. See Consumer Insight by Stone, Bond and Foss (2004) to get a view of this in action.
Why do I think I have something to say about marketing science?
Fair question. My whole career has been involved in marketing analytics. For more than 25 years I’ve done direct marketing, CRM, database marketing, marketing research, decision sciences, forecasting, segmentation, design of experiments and all the rest. While my BBA and MBA are in finance, my PhD is in marketing science. I’ve published a few trade and academic articles, I’ve taught at both graduate and undergraduate levels and I’ve spoken at conferences, all involving marketing science. I’ve done all this for firms like Dell, HP, the Gap and Sprint, as well as consultancies like Targetbase. Over the years I’ve gathered a few opinions that I’d like to share with y’all. And yes, I’ve been in Texas for over 15 years.
What is the approach/philosophy of this book?
As with most non-fiction writers, I wrote this because I would have loved to have had it, or something like it, earlier. What I had in mind did not actually exist, as far as I knew.
I had been a practitioner for decades and there were times I just wanted to know what I should do, what analytic technique would best solve the problem I had. I did not need a mathematically-oriented econometrics textbook (like Greene’s, or Kmenta’s Elements of Econometrics (1986), as great as they each are). I did not need a list of statistical techniques (like Multivariate Data Analysis by Hair et al (1998) or Multivariate Statistical Analysis by Sam Kash Kachigan (1991)), as great as each of them also are. What I needed was a (simple) explanation of which technique would address the marketing problem I was working on. I wanted something direct, accessible, and easy to understand so I could use it and then explain it. It was okay if the book went into more technical details later, but first I needed something conceptual to guide me in solving a particular problem. What I needed was a marketing-focused book explaining how to use statistical/econometric techniques on marketing problems. It would be ideal if it showed examples and case studies doing just that. Voila.
Generally this book has the same point of view as books like Peter Kennedy’s A Guide to Econometrics (1998) and Glenn L. Urban and Steven H. Star’s Advanced Marketing Strategy (1991). That is, the techniques will be described in two or three levels. The first is really just conceptual, devoid of mathematics, and the aim is to understand. The next level is more technical, and will use SAS or something else as needed to illustrate what is involved, how to interpret it, etc. Then the final level, if there is one, will be rather technical and aimed really only at the professional. And there will be business cases to offer examples of how analytics solves marketing questions.
One thing I like about Stephan Sorger’s 2013 book, Marketing Analytics, is that in the opening pages he champions action-ability. Marketing science has to be about action-ability. I know some academic purists will read the following pages and gasp that I occasionally allow ‘bad stats’ to creep in. (For example, it is well known that forecasting is often improved if collinear independent variables are included. Shock!) But the point is that even an imperfect model is far more valuable than waiting for academic ivory-tower purity. Business is about time and money and even a cloudy insight can help improve targeting. Put simply, this book, and marketing science, is ultimately about what works, not what will be published in an academic research paper.
All of the above will be cast in terms of business problems, that is, in terms of marketing questions. For example, a marketer, say, needs to target his market and he has to learn to do segmentation. Or she has to manage a group that will do segmentation for her (a consultant) and needs to know something about it in order to question it intelligently. The problem will be addressed in terms of what segmentation is, what it means to strategy, why do it, etc. Then several analytic techniques used for segmentation will be described. Then a fairly involved and technical discussion will show additional statistical output, and an example or two will be shown. This output will use SAS (or SPSS, etc.) as necessary. This will also help guide students as they prepare to become analysts.
Therefore, the philosophy is to present a business case (a need to answer the marketing questions) and describe conceptually various marketing science techniques (in two or three increasingly detailed levels) that can answer those questions. Then output from, say, SAS will be developed that shows how the technique works, how to interpret it and how to use it to solve the business problem. Finally, more technical details may be shown, as needed. Okay?
So, on to a little statistical review.
Part one
Overview
01
A (little) statistical review
Measures of central tendency
Measures of dispersion
The normal distribution
Relations among two variables: covariance and correlation
Probability and the sampling distribution
Conclusion
Checklist: You’ll be the smartest person in the room if you…
You knew we had to do this, have a general review of basic statistics. I promise, it’ll be mostly conceptual, a gentle reminder of what we learned in Introductory Statistics. Also note the Definition Boxes helping to describe key terms, point out jargon, etc.
Measures of central tendency
First we’ll deal with simple descriptive statistics, confined to one variable. We’ll start with measures of central tendency.
Measures of central tendency include the mean, median and mode.
Mean: a descriptive statistic, a measure of central tendency, the mean is a calculation summing up the value of all the observations and dividing by the number of observations.
The mean is calculated as:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
That is, sum all the observations up (all the individual Xs) and then divide by the number of observations (Xs). This is commonly called ‘the average’ but I’d like to offer a different view of ‘average’.
Average: the most representative measure of central tendency, NOT necessarily the mean.
Average is the measure of central tendency, the number most likely to occur, the most representative number. That is, it might not be the mean; it could be the median or even the mode. This is our first incursion into a statistical way of thinking.
I’d like to persuade you that it’s possible, for example, that the median is more representative than the mean, in some cases – and that in those cases the median is the average, the most representative number.
Median: the middle observation in an odd number of observations, or the mean of the middle two observations in an even number of observations.
The median is, by definition, the number in the middle, the 50th percentile, that value that has just as many observations above it as below it.
Consider the home sales prices in Figure 1.1. The mean is 141,000 but the median is 110,000. Which number is most representative? I submit it is not the mean, but the median. I also submit that the best measure of central tendency, in this example, is the median. Therefore the median is the average. I know that’s not what you learned in third grade, but get used to it. Statistics has a way of turning one slightly askew.
Figure 1.1 Home sales prices
Just to be clear, I suggest that the measure of central tendency that best describes the histogram above should be called ‘average’. Mode is the number that appears most often, median is the observation in the middle and mean is the observations summed over their count.
Mode: the number that appears most often.
Average is the most representative number. Of course it doesn’t help this argument that Excel uses =AVERAGE() as the function to calculate the mean instead of =MEAN(). I’ve tried asking Bill about it but he’s not returned my calls, so far.
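To make the mean-versus-median argument concrete, here is a minimal Python sketch. The nine prices are hypothetical, chosen only so that the mean and median land on the figures quoted for the home sales example; they are not the data behind Figure 1.1.

```python
from statistics import mean, median, mode

# Hypothetical home sales prices (illustrative only): a single expensive
# home drags the mean upward, while the median barely notices it.
prices = [85_000, 95_000, 100_000, 110_000, 110_000,
          120_000, 130_000, 150_000, 369_000]

avg_mean = mean(prices)      # sum of observations divided by their count
avg_median = median(prices)  # middle observation once the list is sorted
avg_mode = mode(prices)      # value that appears most often

print(f"mean={avg_mean:,.0f} median={avg_median:,.0f} mode={avg_mode:,.0f}")
```

With a right-skewed variable like this, most homes sit near the median, which is why the median can be the more representative ‘average’.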
Measures of dispersion
Measures of central tendency alone do not adequately describe the variable (a variable is a thing that varies, like home sales prices). The other dimension of a variable is dispersion, or spread.
There are three measures of dispersion: range, variance and standard deviation.
Range: a measure of dispersion or spread, calculated as the maximum value less the minimum value.
Range is easy. It’s simply the minimum (smallest value) observation subtracted from the maximum (largest value). It’s not particularly useful, especially in a marketing context.
Variance is another measure of dispersion or spread.
Variance: a measure of spread, calculated as the summed square of each observation less the mean, divided by the count of observations less one.
Conceptually it takes each observation and subtracts the mean of all the observations from it, then squares each of those differences and adds up the squares. That quantity is divided by n – 1, the total number of observations, less one. The formula is below. Note this is the sample formula, not the formula for the population.
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
(Note that X-bar is the symbol for sample mean, while µ would be the symbol to use for population mean; s would be the symbol to use for sample standard deviation and σ would be the symbol to use for population standard deviation.)
Now, what does variance tell us? Unfortunately, not much. It says that (from Table 1.1) this variable of 18 observations has a mean of 25 and a variance, or spread, of 173.6. But variance gets us to the standard deviation, which DOES mean something.
Table 1.1 Variance
X        X – mean    Squared
2        –23         529.3
5        –20         400.3
8        –17         289.2
10.9     –14.1       199.3
13.9     –11.1       123.6
16.9     –8.1        65.9
19.9     –5.1        26.2
22.9     –2.1        4.5
25.9     0.9         0.8
28.9     3.9         15.1
31.9     6.9         47.4
33       8           63.9
34       9           80.9
35       10          99.9
36       11          120.9
39       14          195.8
42       17          288.8
45       20          399.7
Mean = 25.0          Sum = 2,951.3
Count = 18           Variance = 173.6
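The arithmetic in Table 1.1 can be reproduced with a short Python sketch. It uses the X column as printed (so results agree with the table only to rounding) and the sample convention of dividing by n – 1; the variable names are my own.

```python
from statistics import mean, variance, stdev

# X column of Table 1.1, as printed
x = [2, 5, 8, 10.9, 13.9, 16.9, 19.9, 22.9, 25.9,
     28.9, 31.9, 33, 34, 35, 36, 39, 42, 45]

x_bar = mean(x)

# Sample variance by hand: squared deviations from the mean,
# summed, then divided by n - 1
sum_sq = sum((xi - x_bar) ** 2 for xi in x)
var_by_hand = sum_sq / (len(x) - 1)

# statistics.variance and statistics.stdev use the same n - 1 convention
var_lib = variance(x)
sd = stdev(x)  # square root of the sample variance

print(round(x_bar, 1), round(var_by_hand, 1), round(sd, 2))
```

The hand calculation and the library call should agree; the standard deviation is simply the square root of the variance.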
Standard deviation: the square root of variance.
Standard deviation is calculated by taking the square root of variance. In this case the square root of 173.6 is 13.17. Now, what does 13.17 mean? It describes spread or dispersion in the same units as the variable itself, and in a way that can be interpreted. That is, there are known qualities of a standard deviation. In a fairly normal distribution dispersion is spread around the mean (which equals the mode which equals the median). That is, there is a symmetrical spread around the mean of 25. In this case the spread is 25 +/– 13.17. That means that, in general, one standard deviation (+/– 13.17) from the mean will contain about 68% of all observations: see Figure 1.2. This rule of thumb assumes the variable is roughly normal. In a normal (bell-shaped) curve, 50% of all observations fall to the left of the mean and 50% of all observations fall to the right of the mean. Knowing the standard deviation gives information about the variable that cannot be obtained any other way.
Figure 1.2 Standard deviation
So, saying a variable has a mean of 25 and a standard deviation of 13.17 means that, if the variable is roughly normal, about 68% of all observations are between 11.8 and 38.2. This immediately tells me that if I find an observation that is < 11.8, it is a little rare, or unusual, given that 68% will be > 11.8 (and < 38.2).
So, one standard deviation accounts for 34% below the mean and 34% above the mean. The second standard deviation accounts for about 14% and the third standard deviation accounts for about 2%. This means that three standard deviations to the left of the mean account for 34% + 14% + 2%, or nearly 50% of all observations. Likewise for the positive/right side of the mean.
As an example, it is well known that IQ has a mean of 100 and a standard deviation of about 15. This means that 34% of the population should fall between 100 and 115, because one standard deviation above the mean is 100 + 15, or 115. The second standard deviation accounts for another 14%. So 48% (34% + 14%) of the population should be between 100 and 130. Finally, only about 2% will be more than 2 standard deviations above the mean, that is, have an IQ > 130. So you see how useful the standard deviation is. It immediately gives more information about the spread, or how likely or unusual particular observations are. For example, if we had an IQ test that showed 150, this is a VERY rare event, in that it’s in the realm of > 3 standard deviations: 100–115 is 1, 115–130 is 2, 130–145 is 3 and 150 is 3.33 standard deviations above the mean.
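The 34% / 14% / 2% bands can be checked against an exact normal curve. The sketch below uses Python’s standard-library `NormalDist`; the IQ scale (mean 100, standard deviation 15) is taken from the text, and the assumption that IQ is exactly normal is an idealization.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of observations in each one-standard-deviation band above the mean
band_1 = z.cdf(1) - z.cdf(0)  # roughly 34%
band_2 = z.cdf(2) - z.cdf(1)  # roughly 14%
band_3 = z.cdf(3) - z.cdf(2)  # roughly 2%

# IQ scale from the text, treated as exactly normal for illustration
iq = NormalDist(mu=100, sigma=15)
share_above_130 = 1 - iq.cdf(130)  # more than 2 standard deviations above
z_150 = (150 - 100) / 15           # how many standard deviations 150 is above

print(f"{band_1:.1%} {band_2:.1%} {band_3:.1%}")
print(f"IQ > 130: {share_above_130:.1%}; IQ 150 is {z_150:.2f} sd above the mean")
```

The bands are symmetric, so the same shares apply below the mean.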
The normal distribution
I’ve already mentioned the normal distribution but let’s say a couple more clarifying things about it. The normal distribution is the traditional bell-shaped curve. One characteristic of a normal distribution is that the mean and the median and the mode are virtually the same number. The normal distribution is symmetric about the measure of central tendency (mean, median and mode) and the standard deviation describes the spread, as above.
Let’s also mention the central limit theorem. This says that as n, or the count, increases, the distribution of sample means approaches a normal distribution, whatever the shape of the underlying variable. This is what allows us to lean on the normal distribution so often, even though not every variable is itself normal.
Now for a quick word about z-scores as this will be handy later.
Z-score: a metric describing how many standard deviations an observation is from its mean.
A z-score is a measure of the number of standard deviations an observation is from its mean. It converts an observation into the number of standard deviations above or below the mean by taking the observation (Xi) and subtracting the mean from it and then dividing that quantity by its standard deviation, that is, $z = \frac{X_i - \bar{X}}{s}$. In terms of IQ, an observation of 107.5 will have a z-score of (107.5 – 100)/15, or 0.5. This means that an IQ of 107.5 is one-half a standard deviation above the mean. About 34% of observations lie between the mean and one standard deviation above it (100–115), but they are bunched nearer the mean, so the first half of that band holds about 19% rather than 17%. This means this observation is about 19% above average (which is 50%), or greater than about 69% of the population. Note that the remaining 31% or so are above this observation.
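A z-score is quick to compute and, if the variable is assumed to be normal, easy to turn into a share of the population. The helper name `z_score` below is hypothetical, and the conversion relies on that normality assumption.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# IQ example from the text: mean 100, standard deviation 15
z = z_score(107.5, mean=100, sd=15)

# Share of a normal population that falls below this observation
share_below = NormalDist().cdf(z)

print(f"z={z:.2f} share below={share_below:.1%}")
```

Because the bell curve is densest near its centre, half a standard deviation above the mean covers slightly more than half of the 34% band.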
Relations among two variables: covariance and correlation
All of the above descriptive discussions were about one variable. Remember that a variable is an item that takes on multiple values. That is, a variable is a thing that varies. Now let’s talk about having two variables and the descriptive measures of them.
Covariance
Covariance, like variance, is about spread: it is how one variable varies in terms of another variable.
Covariance: the joint dispersion or spread of two variables.
It, like variance, does not mean much; it’s just a number. It has no standard scale, nor boundaries, and interpretation is minimal. The formula is:
$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}$
It merely describes how each X observation varies from its mean, in terms of how each Y observation varies from its mean. Multiply each pair of deviations together, then sum these products up and divide by n, the count. Again, the number is nearly irrelevant.
Say we have the data set in Table 1.2. Note the covariance is 77.05, which again means very little.
Table 1.2 Covariance and correlation
X        Y
2        3
4        5
6        7
8        9
9        9
11       11
11       8
13       10
15       12
17       14
19       16
21       22
22       22
24       11
26       12
28       22
30       24
32       26
33       28
33       39
Covar = 77.05
Correl = 87.90%
Correlation
Correlation, like standard deviation, does have a meaning, and an important one.
Correlation: a measure of both strength and direction, calculated as the covariance of X and Y divided by the standard deviation of X * the standard deviation of Y.
Correlation expresses both strength and direction of the relationship between the two variables. It ranges from –100% to +100%. A negative correlation means that as, say, X goes up, Y tends to go down. A very strong positive correlation (say 80% or 90%) means that as X goes up, Y very consistently goes up as well. (Correlation says how reliably the two move together, not by how many units Y changes for each unit of X.) Note that in Table 1.2 the correlation is 87.9%, which is a very strong relationship between X and Y. To go from covariance to correlation, covariance is divided by the standard deviation of X multiplied by the standard deviation of Y, with the standard deviations computed using the same divisor as the covariance. The formula is:
$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$
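The covariance and correlation reported in Table 1.2 can be reproduced with a few lines of Python. This is a sketch that divides by n throughout, which is the convention that matches the table’s covariance of 77.05; the variable names are my own.

```python
from math import sqrt

# X and Y columns of Table 1.2
x = [2, 4, 6, 8, 9, 11, 11, 13, 15, 17, 19, 21, 22, 24, 26, 28, 30, 32, 33, 33]
y = [3, 5, 7, 9, 9, 11, 8, 10, 12, 14, 16, 22, 22, 11, 12, 22, 24, 26, 28, 39]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Covariance: average product of the paired deviations (dividing by n)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

# Standard deviations with the same divisor, so the units cancel cleanly
sd_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
sd_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)

# Correlation: covariance rescaled to lie between -1 and +1
corr = cov / (sd_x * sd_y)

print(round(cov, 2), round(corr, 3))
```

Dividing by the two standard deviations is what strips the units out of covariance and bounds the result between –1 and +1 (–100% and +100%).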
Probability and the sampling distribution
Probability is an important concept in statistics, of course, and I’ll only touch on it here.
First, let’s talk about two kinds of thinking: deductive and inductive. Deductive thinking is what you are most familiar with: based on rules of logic and conclusions from causality. Because of this thing, this conclusion must be true. However, statistical thinking is inductive, not deductive. Inductive thinking reasons from sample to population. That is, statistics is about inferences and generalizing the conclusion. This is where probability comes in. Typically, in marketing, we never have the whole population of a data set: we have a sample.
Here’s where it gets a little theoretical. Say we have a sample of data on X that contains 1,000 observations with a mean of 50. Now, theoretically, we could have an infinite number of samples that have a variety of means. Indeed, we never know where our sample is (with its mean of 50) in the total possibility of samples. If we did have a large number of samples drawn from the population and we calculated the means of those samples, those means would constitute a sampling distribution.
For example, say we have a barrel containing 100,000 marbles. That is the whole population. 10% of these marbles are red and 90% of these marbles are white. We can only draw a sample of 100 at a time and calculate the share of red marbles in it.
In this case (contrived as it is) we KNOW the average share of red marbles drawn, overall, will be 10%. But note (and this is important) there is no guarantee that any one of our samples of 100 will actually be 10%. It could be 5% (3.39% of the time it will be) and it could be 14% (5.13% of the time it will be). It will, of course, on average, be 10%. Indeed, only 13.19% of the samples drawn will actually be exactly 10%! The binomial distribution tells us the above facts.
Therefore, we could have drawn an unusual sample that had only 5% red marbles. This would occur 3.39% of the time, roughly 1 out of 30. That’s not that rare. And we have in actuality no way to really know how close the sample we have is to the population mean of 10%. This is where confidence intervals come in, which will be dealt with later in statistical testing.
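The marble percentages can be checked directly. The sketch below assumes each draw of 100 is well approximated by a binomial distribution with n = 100 and p = 0.10, which is reasonable when the barrel holds 100,000 marbles; the function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.10  # sample size and share of red marbles in the barrel

p5 = binomial_pmf(5, n, p)    # sample shows 5% red
p10 = binomial_pmf(10, n, p)  # sample shows exactly 10% red
p14 = binomial_pmf(14, n, p)  # sample shows 14% red

for reds, prob in [(5, p5), (10, p10), (14, p14)]:
    print(f"{reds} red marbles: {prob:.2%} of samples")
```

The three probabilities land at about 3.39%, 13.19% and 5.13%, the figures quoted above.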
Conclusion
That’s all I want to mention in terms of statistical background. More will be applied later. Now let’s get on with the fun.
Checklist
You’ll be the smartest person in the room if you:
Remember three measures of central tendency: mean, median and mode.
Remember three measures of dispersion: range, variance and standard deviation.
Constantly point out the real definition of average as ‘the most representative number’, that is, it might NOT necessarily be the mean.
Always look at a metric in terms of both central tendency as well as dispersion.
Think of a z-score as a measure of the likelihood of an observation occurring.
Observe that correlation is about two dimensions: strength and direction.
02
Brief principles of consumer behaviour and marketing strategy
Introduction
Consumer behaviour as the basis for marketing strategy
Overview of consumer behaviour
Overview of marketing strategy
Conclusion
Checklist: You’ll be the smartest person in the room if you…
Introduction
You will note that I have tied two subjects together in this chapter: consumer behaviour and marketing strategy. That’s because marketing strategy is all about understanding consumer behaviour and incentivizing it in such a way that the firm and the consumer both win. I know a lot of marketers will be saying, ‘But what about competitors? Are they not part of marketing strategy?’ And the answer is, ‘No, not really.’ I am aware of the gasps this will cause.
By understanding consumer behaviour, part of that insight will come from what experience consumers have with competitors, but the focus is on consumer, not competitive, behaviour. I know John Nash and his work in game theory take a back seat in my view, but this is on purpose. Much like the financial motto ‘watch the pennies and the dollars will follow’, I say, ‘focus on the consumer and competitive understanding will follow’.
Just to be clear, marketing science should be at the consumer level, NOT the competitive level. By focusing on competitors you automatically move from a marketing point of view toward a financial/economic point of view.
Consumer behaviour as the basis for marketing strategy
In marketing, the consumer is central
I like to use Steven P. Schnaars’ Marketing Strategy because of the focus on consumer behaviour (Schnaars, 1997). And because he’s right. A marketing orientation is consumer-centric; anything else is by definition NOT marketing. Marketing drives financial results and in order to be marketing-oriented there must be a consumer-centric focus. That means all marketing activities are geared to learn and understand consumer (and ultimately customer) behaviour.
The marketing concept does not mean giving the consumer (only) what they want, because:
1. the consumer’s wants can be widely divergent;
2. the consumer’s wants can contradict the firm’s minimum needs; and
3. the consumer might not know what they want.
It is marketing’s job to learn and understand and incentivize consumer behaviour to a win-win position.
The objection from product-centric marketers
As a fair argument, consumer-centricity runs contrary to how product managers work. Product managers focus on developing products and THEN finding consumers to buy them. (Immediate examples that spring to mind come from technology, such as original HP, Apple, etc.) This sometimes works, but often it does not. The poster child for product focus regardless of what consumers think they want is Chrysler’s minivan strategy. The story is that Chrysler chief Lee Iacocca wanted to design and produce the minivan but the market research they did told him there was no demand for it. Consumers were confused by the ‘halfway between a car and a conversion (full-size) van’ and were not interested in it. Iacocca went ahead and designed and built it and it basically saved Chrysler. What is the point? One point is that consumers do not always know what they want, especially with a new/innovative product they have no experience with. The second point is that not everyone has the genius of Lee Iacocca.
Overview of consumer behaviour
Background of consumer behaviour
A simple view of consumer behaviour is best understood in the microeconomic analysis of ‘the consumer problem’. This is generally summarized in three questions:
1. What are consumers’ preferences (in terms of goods/services)?
2. What are consumers’ constraints (allocating limited budgets)?
3. Given limited resources, what are consumers’ choices?
This assumes that consumers are rational and have a desire to maximize their satisfaction.
Let’s talk about general assumptions of consumer preferences. The first is that preferences are complete, meaning consumers can compare and rank all products. The second assumption is that preferences are transitive. This is the mathematical requirement that if X is preferred to Y and Y is preferred to Z then X is preferred to Z. The third assumption is that products are desirable (a ‘good’ is good or of value). This means that more is better (costs notwithstanding).
A quick look into the assumptions above makes it clear that they are made in order to do the mathematics. This ultimately means that curves will be produced (the bane of most microeconomics students) that lend themselves to simple graphics. This immediately leads into using the calculus for analytic reasons. Calculus requires smooth curves and twice differentiability in order to work. THIS means that some heroic assumptions indeed are required, especially ceteris paribus (holding all other things constant).
The decision process
Consumers go through a shopping-purchasing process, using decision analytics to come to a choice. It should be recognized that not all decisions are equally important or complex. Based on the risk of a wrong choice, either extended problem solving or limited problem solving will tend to be used.
Extended problem solving is used when the cost of the product is high, or the product will be lived with for a long time, or it’s the initial purchase, etc. Something about the choice requires more thought, evaluation and rigour.
Limited problem solving is of course the opposite. When products are inexpensive, short lived, not really important or with low risk of a ‘wrong’ decision, limited problem solving is used. Often one or more of the (below) steps are omitted. The choice is more automatic. The choice is usually reduced to a rule: what experience the consumer has had before, what brand they have disliked, what price is low enough, what their neighbours have told them, etc.
The typical decision process in terms of consumer behaviour (for example, see Consumer Behavior by Engel, Blackwell and Miniard, 1995) is about need recognition, search for information, information processing, alternative evaluation, purchase and post-purchase evaluation. There are marketing opportunities along each step to influence and incent.
Need recognition
The initiator of the consumer decision process is need recognition. This is a realization that there is a ‘cognitive dissonance’ between some ideal state and the current state. There is much advertising around need arousal. From educating consumers on real needs (survival, satisfaction) to informing consumers about pseudo-needs (‘jump on the bandwagon – all of your friends have already done it!’) need arousal is where it starts.
Search for information
Now the consumer recalls what they have heard or what they know about the product to infer, depending on whether the product requires limited or extensive engagement, an ability to make a decision. Obviously advertising and branding come into play here, informing consumers of benefits, differentiation, etc.
Information processing
The next step is for the consumer to absorb what information they have and what facts they know. Most marketing messaging strategies prefer for consumers to NOT process information, but to recall such things as positive brand exposure, satisfaction from previous interactions or emotional loyalty. If consumers do not ‘process’ information (ie critically evaluate costs and benefits) then they can use brand equity/satisfaction to make the shorthand decision. It is marketing science’s job to find those that are considering, distinct from those that have ‘already decided’.
Pre-purchase alternative evaluation
Now, after information has been processed, comes the critical final comparison: does the potential product have attributes the consumer considers greater than the consumer’s standards? That is, given budgetary standards, what is the product likely to offer in terms of satisfaction (economic utility) after the consumer has decided it is above minimum qualifications?
Purchase
Finally, the whole point of the marketing funnel is purchase. A sale is the last piece. This is the decision of the consumer based on the shopping process described above. The actual purchase action carries within it all the above (and below) processes and all of the actual and perceived product attributes.
Post-purchase evaluation
But the consumer decision process does not (usually) end with purchase. Generally there is a comparison of what the consumer thought (hoped) would be the utility gained from consuming the product with what actual (perceived) satisfaction was received from the product. That is, the creation of loyalty starts post purchase.
Now, with consumer behaviour centrally located, let’s think about a firm’s strategy. Keep the differences between competitive moves and consumer behaviour firmly in mind.
Overview of marketing strategy
The above was to focus on consumer behaviour. Marketing, to be marketing, is about understanding and incentivizing consumer behaviour in such a way that both the consumer and the firm get what they want. Consumers want a product that they need when they need it at a price that gives them value through a channel they prefer. Firms want loyalty, customer satisfaction and growth. Since a market is a place where buyers and sellers meet, marketing is the function that moves the buyers and sellers toward each other.
Given the above, it should be noted that marketing strategy has evolved (primarily via microeconomics) to a firm vs. firm rivalry. That is, marketing strategy is in danger of forgetting the focus on consumer behaviour and jumping deep into something like game theory wherein one firm competes with another firm.
Everything that follows about marketing strategy can be thought of as an indirect consequence of firm vs. firm based on a direct consequence of focusing on consumer behaviour. That is, fighting a firm means incentivizing consumers. Think of it as an iceberg: what is seen (firms competing) is the tip above the surface, but what is really happening that moves the iceberg is unseen (from the other firm’s point of view) below the surface (incentivizing consumers).
Types of marketing strategy
Everyone should be aware of Michael Porter and his monumental article and book about competitive strategy (Porter, 1979/1980). This is where marketing strategy became a discipline.
First Porter detailed factors creating competitive intensity. (To make an obvious point: what are firms competing over? Consumer loyalty.) These factors are the bargaining power of suppliers, the bargaining power of buyers, the threat of new entrants, the rivalry among existing firms and the threat of substitute products:
The bargaining power of buyers means firms lose profit from powerful buyers demanding lower prices. This means consumers are sensitive to price.
The bargaining power of suppliers means firms lose profit due to potential increased factor (input) prices. Suppliers only have bargaining power because a firm’s margins are low, because a firm cannot raise prices, because consumers are sensitive to price.
The threat of new entrants lowers profits due to new competitors entering the market. Again, consumers are sensitive to price and very informed about the other firm’s offerings.
The intensity of rivalry causes lower prices because of the zero sum game supplied by consumers. There are only a certain number of potential loyal customers and if a firm gains one then another firm loses that one.
The threat of substitute products invites consumers to choose among the lower-priced products.
Note how all of this strategy (which appears like firms fighting other firms) is actually based on consumer behaviour. Am I putting too fine a point on this? Maybe, but it does help us focus, right?
Based on these factors a firm can ascertain the intensity of competition. The more competitive the industry is, the more a firm must be a price taker, that is, they have little market power, meaning little control over price. This affects the amount of profit each firm in the industry can expect. Given this, a firm can evaluate their strengths and weaknesses and decide how to compete. Or not.
Porter then did a brilliant thing: he devised, based on the above, three generic strategies. A firm can compete on costs (be the low-cost provider), a firm can differentiate and focus on high-end products or a firm can segment and focus on a smaller, niche part of the market. The point is the firm needs to create and adhere to a particular strategy. Often firms are diluted and do everything at once.
However, Treacy and Wiersema took Porter’s framework and evolved it (Treacy and Wiersema, 1997). They too came up with three strategies (disciplines): operational excellence (basically a focus on lower costs), product leadership (a focus on higher-end differentiated products) and customer intimacy (a differentiation/segmentation strategy). You can see their use and extension of Porter’s ideas. Both have the same bottom line: firms should be disciplined and concentrate their efforts corporate-wide on primarily one (and only one) strategic focus.
Applied to consumer behaviour
Stephan Sorger’s excellent Marketing Analytics (Sorger, 2013) has a brief description of competitive moves, both offensive and defensive. Summaries of each move, but applied via consumer behaviour, are now considered.
Defensive reactions to competitor moves
Bypass attack (the attacking firm expands into one of our product areas): the correct counter is for us to constantly explore new areas. Remember Theodore Levitt’s Marketing Myopia (Levitt, 1960)? If not, re-read it; you know you had to in school.
Encirclement attack (the attacking firm tries to overpower us with larger forces): the correct counter is to message how our products are superior/unique and of more value. This requires a constant monitoring of message effectiveness.
Flank attack (the attacking firm tries to exploit our weaknesses): the correct counter is to not have any weaknesses. This again requires monitoring and messaging the uniqueness/value of our products.
Frontal attack (the attacking firm aims at our strength): the correct counter is to attack back in the attacking firm’s territory. Obviously this is a rarely used technique.
Offensive actions
New market segments: this uses behavioural segmentation (see the latter chapters on segmentation) and incents consumer behaviour for a win-win relationship.
Go-to-market approaches: this learns about consumers’ preferences in terms of bundling, channels, buying plans, etc.
Differentiating functionality: this approach extends consumers’ needs by offering product and purchase combinations most compelling to potential customers.
Conclusion
The above was a brief introduction on both consumer behaviour and how that behaviour applies to marketing strategy. The over-arching point is that marketing science (and marketing research, marketing strategy, etc.) should all be focused on consumer behaviour. Good marketing is consumer-centric. Have you heard that before?
Checklist
You’ll be the smartest person in the room if you:
Remember that in marketing, the consumer is central, NOT THE FIRM.
Point out the consumer’s problem is always how to maximize utility/satisfaction while managing a limited budget.
Think about the consumer’s decision process while undertaking all analytic projects.
Recall that strategy is a focus on consumer behaviour, not competitive behaviour.
Remember that both Porter and Treacy and Wiersema provide three general strategies.
Observe that competitive combat can be thought of in terms of consumer behaviour.
Part two
Dependent variable techniques
03
Modelling dependent variable techniques (with one equation)
What are the things that drive demand?
Introduction
Dependent equation type vs inter-relationship type statistics
Deterministic vs probabilistic equations
Business case
Results applied to business case
Modelling elasticity
Technical notes
Highlight: Segmentation and elasticity modelling can maximize revenue in a retail/medical clinic chain: field test results
Checklist: You’ll be the smartest person in the room if you…
Introduction
Now, onto the first marketing problem: determining and quantifying those things that drive demand. Marketing is about consumer behaviour (which I’ve touched on but about which I will have more to say later) and the point of marketing is about incentivizing consumers to purchase. These purchases (typically units) are what economists call demand. (By the way, finance is more about supply and the two together are supply and demand. Remember back in Beginning Economics?)
Dependent equation type vs inter-relationship type statistics
Before we dive into the problem at hand, it might be good to back up and give some simple definitions. There are two kinds of (general) statistical techniques: the dependent equation type and the inter-relationship type. Dependent type statistics deal with explicit equations (which can either be deterministic or probabilistic, see below). Inter-relationship techniques are not about equations, but about the variance shared between variables. These will be covered/defined later but are types of factor analysis and segmentation. Clearly this current chapter is about an equation.
Deterministic vs probabilistic equations
Now let’s talk about two kinds of equations: deterministic and probabilistic. Deterministic is algebraic (y = mx + b) and the left side exactly equals the right side.
Profit = Revenue – expenses.
If you know two of the quantities you can algebraically solve for the third. This is NOT the kind of equation dealt with in statistics. Of course not.
Statistics deals with probabilistic equations:
Y = a + bXi + e.
Here Y is the dependent variable (say, sales, units or transactions), a is the constant or intercept, X is some independent variable(s) (say, price, advertising, seasonality), b is the coefficient or slope and e is the random error term. It’s this random error term that makes this equation a probabilistic one. Y does not exactly = a + bXi because there is some random disturbance (e) that must be accounted for. Think of it as Y, on average, equals some intercept plus bXi.
As an example, say Sales = constant + price * slope + error, that is, Sales = a + Price * b + e. Note that Y (sales) depends on price, +/–.
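The difference between the two kinds of equation is easy to see in a small simulation. Everything in this sketch is made up for illustration: the intercept, the slope, the price range and the size of the random error are assumptions, not estimates from the book.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

a, b = 100.0, -0.03  # hypothetical intercept and price slope
prices = [rng.uniform(1000, 1400) for _ in range(1000)]

# Deterministic part: the left side exactly equals the right side.
deterministic = [a + b * p for p in prices]

# Probabilistic version: the same line plus a random disturbance e.
errors = [rng.gauss(0, 2.0) for _ in prices]
sales = [d + e for d, e in zip(deterministic, errors)]

mean_error = sum(errors) / len(errors)
exact_matches = sum(1 for s, d in zip(sales, deterministic) if s == d)
print(f"average error: {mean_error:.3f}, exact matches: {exact_matches}")
```

No single observation sits exactly on the line, yet the errors average out close to zero, which is what "Y, on average, equals a + bX" means.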
BUSINESS CASE
Ok, say we have a guy, Scott, who’s an analytic manager at a PC manufacturing firm. Scott has an MS in economics and has been doing analytics for four years. He started mostly as an SAS programmer and has only recently been using statistical analysis to give insights to drive marketing science.
Scott is called into his boss’s office. His boss is a good strategist with a direct marketing background but is not well versed in econometrics/analytics, etc.
‘Scott’, the boss says, ‘we need to find a way to predict our unit sales. More than that, we need something to help us understand what drives our unit sales. Something that we can use as a lever to help increase sales over the quarter.’
‘A demand model.’ Scott says. ‘Units are a function of, what, price, advertising?’
‘Sure.’
Scott gulps and says, ‘I’ll see what I can do.’
That night he thinks about it and has some ideas. He’ll first have to think about causality (‘Demand is caused by …’) and then he’ll have to get appropriate data.
It’s smart to formulate a theoretic model first, regardless of what data you may or may not have. First, try to understand the data-generating process (‘this is caused by that, and maybe that, etc.’) and then see what data, or proxies for data, can be used to actually construct the model.
It’s also wise to hypothesize the signs of the (independent, right-hand side) variables you think significant in causing your dependent variable to vary. Remember that the dependent variable (left side of the equation) is dependent upon the independent variable(s) (right side of the equation).
For instance, it’s well known that price is probably a significant variable in a unit-demand model and that the sign should be negative. That is, as price goes up, units, on average, should go down. This is the law of demand, the only law in all of economics – except the one that most economic forecasts will be wrong. (‘Economists have predicted 12 of the last 7 recessions.’)
(For you sticklers, yes, there is a ‘Giffen good’. This is an odd product whereby an increasing price causes an increase in demand. Strictly, Giffen goods are inferior staples; the luxury goods that behave this way (like fine art or wine) are usually called Veblen goods. For the vast majority of products most marketers work on, however, the law of demand rules: price goes up, quantity (units) goes down.)
So Scott thinks that price and advertising spend are important in generating demand. Also that there should be something about the season. He’s on the consumer side of the business and it has strong back-to-school and Christmas seasonal spikes.
He thinks he can easily get the number of units sold and the average price of those units. Seasonality is easy; it’s just a variable to account for time of the year, say quarterly. Advertising spend (for the consumer market) might be a little tougher but let’s say he is able to twist some arms and eventually secure a guess as to the average amount of advertising spent on the consumer market, by quarter.
This will be a time series model since it has season and quarterly units, average prices and advertising spend, by time period, quarterly. (There will be some econometric suggestions on time series modelling in the technical section, particularly pertaining to serial correlation.)
For now, let’s make sure there’s a good grasp of the problem. Scott will use a dependent variable technique called ordinary regression (ordinary least squares, OLS) to understand (quantify) how season, advertising spend and price cause (explain the movement of) units sold. This is called a structural analysis: he is trying to understand the structure of the data-generating process. He is attempting to quantify how price, advertising spend and season explain, or cause (most of) the movement in unit sales.
When he’s through he’ll be able to say whether or not advertising spend is significant in causing unit sales (he’ll have to make certain no advertisers are in earshot when he does) and whether December is positive and January negative in terms of moving unit sales, etc.
Now, Scott is ready to design the ordinary regression model.
Conceptual notes
Ordinary regression is a common, well-understood and well-researched statistical technique that has been around over 200 years. Remember that regression is a dependent variable technique, Y = a + bXi + e, where e is a random error term not specifically seen but whose impact is felt in the distribution of the variables.
Ordinary regression: a statistical technique whereby a dependent variable depends on the movement of one or more independent variables (plus an error term).
Simple regression has one independent variable and multiple regression has more than one independent variable, that is:
y = a + b1x1 + b2x2 … + bnxn, etc.
Scott’s model for his boss will use multiple regression because he has more than one independent variable.
The output of the model will have estimates of how large each variable’s impact is (we’ll see its coefficient or slope) and whether it’s significant or not (based on its variance). This is the heart of structural analysis, quantifying the structure of the demand for PCs.
So, Scott collected data (see Table 3.1) and ran the model
Units = price + advertising
and now sees how the model fits.
Table 3.1 Demand model data
Quarter   Unit sales   Avg price   Ad spend
1         50           1,400       6,250
2         52.5         1,250       6,565
3         55.7         1,199       6,999
4         62.3         1,099       7,799
1         52.5         1,299       6,555
2         59           1,200       7,333
3         58.2         1,211       7,266
..        ..           ..          ..
There is one general measure of goodness of fit: R². R² is the square of the correlation coefficient, in this case the correlation of actual units and predicted units. While correlation measures strength and direction, R² measures shared variance (explanatory power) and ranges from 0% to 100%.
(An interesting but rather useless bit of trivia is why R² is called R². Yes, R² is the square of R, and R is the correlation coefficient. Correlation is symbolized as the Greek letter rho, ρ. Why? In Greek numerals α = 1, β = 2, etc., and ρ = 100 (kind of like Roman numerals, I = 1, II = 2, C = 100, etc.). Remember that the range of correlation is from –100% to +100%. ρ = rho and in English = R. Now impress your analytic friends.)
Note the data is quarterly, which we’ll address soon enough. Scott runs ordinary regression and finds the output as Table 3.2.
Table 3.2 Ordinary regression
              Ad spend   Avg price   Constant
Coefficient   0.0007     –0.0412     101.83
Stand err     0.0003     0.0047
R²            83%
t-ratio       2.72       –8.67
The first row is the estimated coefficient, or slope. Note that price is negative, as hypothesized. The second row is the standard error, an estimate of the standard deviation of the coefficient estimate, which is a measure of its dispersion.
Standard error: an estimate of standard deviation; for a mean, it is calculated as the standard deviation divided by the square root of the number of observations.
Let’s talk about significance, shall we? In marketing we operate at 95% confidence. Remember z-scores? 1.96 is the z-score for 95% confidence, which is the same as a p-value < 0.05. So, if the absolute value of a t-ratio (which in this case is the coefficient divided by its standard error) is > 1.96 the variable is considered significant. The t-ratio tests the hypothesis that the variable’s impact is 0; significance means that, if the impact really were 0, there would be less than a 5% chance of seeing a t-ratio this far from 0. 95% of all standard-normal observations will be within +/–1.96 z-scores.
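The link between 1.96 and 95% can be checked with Python’s standard library. This is only a sketch of the normal-distribution arithmetic behind the rule of thumb; the helper name `is_significant` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of standard-normal observations within +/-1.96
coverage = z.cdf(1.96) - z.cdf(-1.96)

# The cut-off that leaves 2.5% in each tail
critical = z.inv_cdf(0.975)


def is_significant(coefficient, std_error, threshold=1.96):
    """Apply the |t-ratio| > 1.96 rule of thumb."""
    return abs(coefficient / std_error) > threshold


print(f"coverage: {coverage:.4f}, critical value: {critical:.4f}")
```

The coverage comes out at about 0.95 and the critical value at about 1.96, so the 95% confidence level and the 1.96 cut-off are two statements of the same thing.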
Notice that advertising spend has a coefficient of 0.0007 (rounded) and a standard error of 0.0003 (rounded). The t-ratio (coefficient divided by its standard error) is 2.72 which is > 1.96 so it is said to be positive and significant. (‘Whew’ the advertisers say.) Likewise price is significant (< –1.96) and negative, as expected.
Now let’s mention fit; how well the model does with just these two variables. R² is the general measure of goodness of fit and in this case is 83%. That is, 83% of the variance between actual and predicted units is shared, or 83% of the movement of the actual dependent variable is ‘explained’ by the independent variables. This can be interpreted as 83% of the movement in the unit sales can be attributed to price and advertising spend. This seems pretty good; that’s a fairly high amount of explanatory power. That’s probably why Scott’s boss wanted him to do this model.
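To see where a slope and an R² come from, here is a minimal sketch of a simple (one-variable) regression of units on price. It uses only the seven rows of Table 3.1 that are visible in the text, so the numbers it produces are illustrative and will not match Table 3.2, which was fitted on the full data set with two variables.

```python
# Seven visible rows of Table 3.1 (the table is truncated in the text)
units = [50, 52.5, 55.7, 62.3, 52.5, 59, 58.2]
price = [1400, 1250, 1199, 1099, 1299, 1200, 1211]


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(actual, predicted):
    """Square of the correlation between actual and predicted values."""
    ma, mp = mean(actual), mean(predicted)
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap**2 / (saa * spp)


intercept, slope = simple_ols(price, units)
predicted = [intercept + slope * p for p in price]
r2 = r_squared(units, predicted)
print(f"slope: {slope:.4f}, R-squared: {r2:.1%}")
```

Even on this handful of rows the price slope is negative, as the law of demand suggests it should be.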
The next step is for Scott to add seasonality, which he hypothesized to be a variable that impacts consumer PC units sold. Scott has quarterly data so this is easy to do. The new model will be units = price + advertising + season.
Let’s talk about dummy variables (binary variables, those with only two values, 1 or 0). These are often called ‘slope shifters’ because their purpose (when turned ‘on’ as a 1) is to shift the fitted line up or down; strictly, a dummy entered on its own like this shifts the intercept (the level) rather than the slope. The idea of a binary variable is to account for changes in two states of nature: on or off, yes or no, purchase or not, respond or not, q1 or not, etc.
Scott’s model is a quarterly model so rather than use one variable called quarter with four values (1, 2, 3, 4) he uses a model with three dummy (binary) variables, q2, q3 and q4, each 0 or 1. This allows him to quantify the impact of the quarter itself. Table 3.3 shows part of the data set.
Table 3.3 Quarterly model
Quarter   Unit sales   Avg price   Ad spend   Q2   Q3   Q4
1         50           1,400       6,250      0    0    0
2         52.5         1,250       6,565      1    0    0
3         55.7         1,199       6,999      0    1    0
4         62.3         1,099       7,799      0    0    1
1         52.5         1,299       6,555      0    0    0
2         59           1,200       7,333      1    0    0
3         58.2         1,211       7,266      0    1    0
4         64.8         999         8,111      0    0    1
1         55           1,299       6,877      0    0    0
2         61.5         1,166       7,688      1    0    0
..        ..           ..          ..         ..   ..   ..
A brief technical note
When using binary variables that form a system, you cannot use them all. That is, for a quarterly model you have to drop one of the quarters. Otherwise the model won’t solve (effectively trying to divide by 0) and you will have fallen into the ‘dummy trap’. So Scott decides to drop q1, which means the interpretation of the coefficients on the quarters amounts to comparing each quarter to q1. That is, q1 is the baseline.
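Building the dummies is mechanical. The sketch below (the function name and the dictionary layout are illustrative choices, not anything prescribed by the text) turns a quarter number into q2, q3 and q4 with q1 as the dropped baseline, and also shows why keeping all four causes trouble.

```python
def quarter_dummies(quarter, baseline=1):
    """Return 0/1 dummies for every quarter except the baseline."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return {f"q{q}": int(quarter == q) for q in (1, 2, 3, 4) if q != baseline}


quarters = [1, 2, 3, 4, 1, 2]
rows = [quarter_dummies(q) for q in quarters]
for q, row in zip(quarters, rows):
    print(q, row)

# The dummy trap: if all four dummies are kept, they add up to 1 in every
# row, which duplicates the constant term exactly (perfect collinearity).
all_four_sums = [sum(int(q == k) for k in (1, 2, 3, 4)) for q in quarters]
print(all_four_sums)
```

A quarter-1 row has all three dummies at 0, so its prediction rests on the constant alone; that is what "q1 is the baseline" means in practice.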
Now let’s talk about the new model’s (Table 3.4) output and diagnostics. Note first that R² improved to 95%, which means adding quarterly data improved the fit of the model. That is, price, advertising spend and season now explain 95% of the movement in unit sales, which is outstanding. It’s a better model. Note the change in price and advertising coefficients.
Table 3.4 Regression output
              Q4      Q3      Q2      Ad spend   Avg price   Constant
Coefficient   3.825   2.689   1.533   0.0011     –0.0275     80.7153
Stand err     1.36    1.157   0.997   0.0003     0.0064      9.8496
R²            95%
t-ratio       2.81    2.32    1.54    4.1        –4.3        8.19
Now, for what it means and how it can be used, the results of the output will be applied next.
Results applied to business case
So now, what does all this tell us? Analytics without application to an actionable strategy is meaningless, much like special effects in a movie without a plot. Looking at the output again, Scott can make some actionable and important structural comments.
Again the R² as a measure of fit is 95%, which means the independent variables do a very good job explaining the movement of unit sales. All of the variables are significant at the 95% level (where the absolute value of the t-ratio is > 1.96) except q2. The coefficients on the variables all have the expected signs. Comparing the quarters to q1 (which was dropped to avoid the dummy trap), Scott sees that they are all positive, which means they are all greater than q1, on average.
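The significance check can be reproduced from the rounded figures in Table 3.4. Because the table reports rounded coefficients and standard errors, a recomputed t-ratio can differ a little from the printed one (advertising spend comes out nearer 3.7 than 4.1), but the conclusions are the same. The dictionary keys are illustrative names.

```python
# Rounded coefficients and standard errors as printed in Table 3.4
coefficients = {
    "q4": 3.825, "q3": 2.689, "q2": 1.533,
    "ad_spend": 0.0011, "avg_price": -0.0275, "constant": 80.7153,
}
std_errors = {
    "q4": 1.36, "q3": 1.157, "q2": 0.997,
    "ad_spend": 0.0003, "avg_price": 0.0064, "constant": 9.8496,
}

# t-ratio = coefficient divided by its standard error
t_ratios = {name: coefficients[name] / std_errors[name] for name in coefficients}

# Significant at the 95% level when the absolute t-ratio exceeds 1.96
significant = {name: abs(t) > 1.96 for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    flag = "significant" if significant[name] else "not significant"
    print(f"{name:>9}: t = {t:6.2f} ({flag})")
```

Only q2 falls short of the 1.96 cut-off, which matches the reading of the table given above.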
The powerful thing about ordinary regression is that it parcels out the impact OF each independent variable, taking into account all the other variables. That is, it holds all other variables constant and quantifies the impact of each and every variable, one at a time. This means that, when taking all variables into account, q4 tends to add about 3.825 units more than q1. This is why a binary variable is called a slope shifter; just turning ‘on’ q4 adds 3.825 units, regardless of what else is happening in price or advertising spend. Given the very strong seasonal pattern of unit sales these quarterly estimates seem reasonable.
Advertising has a significant and positive impact on unit sales. 0.0011 as a coefficient means every 1,000 increase in advertising spend tends to increase units by 1.1.
Now let’s look at price. The price coefficient is negative, as expected, at –0.0275. When price moves up by, say, 100, units tend to decrease by 2.75. Now, how can this be useful? Just knowing the quantification is valuable but more important is to calculate price elasticity.
Modelling elasticity
Elasticity is a microeconomic calculation that shows the percent change in response given a percent change in stimulus, or in this case, the percent change in units sold given a percent change in price.
Elasticity: a metric with no scale or dimension, calculated as the percent change in an output variable given a percent change in an input variable.
Using a regression equation means the calculation of elasticity is: price coefficient * average price over average quantity (units).
Average price is 1,102 and average quantity of units sold is 63 so the price elasticity calculated here is:
–0.0275 * 1,102 / 63 = –0.48
This means that if price increases by, say 10%, units sold will decrease by about 4.8%. This is strategically lucrative information allowing Scott and his team to optimize pricing to maximize revenue. There will be more on this topic later.
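The elasticity arithmetic above can be written as a one-line function. This sketch simply restates the calculation in the text; the function name is illustrative.

```python
def point_elasticity(coefficient, avg_x, avg_y):
    """Elasticity at the averages: slope * (average input / average output)."""
    return coefficient * avg_x / avg_y


# Price coefficient from Table 3.4, average price and average units from the text
price_elasticity = point_elasticity(-0.0275, 1102, 63)

# Implied percent change in units for a 10% price increase
pct_change_units = price_elasticity * 10

print(f"elasticity: {price_elasticity:.2f}")
print(f"10% price increase -> {pct_change_units:.1f}% change in units")
```

The same function works for advertising: swap in the advertising coefficient and the average advertising spend to get an advertising elasticity, provided those averages are available.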
As a quick review, remember that there are two types of elasticity: elastic and inelastic.
Elasticity means that an X% increase in price causes a > X% decrease in units sold.
Elastic demand: a place on the demand curve where a change in an input variable produces more than that change in an output variable.
Inelasticity means that an X% increase in price causes a < X% decrease in units sold.
Inelastic demand: a place on the demand curve where a change in an input variable produces less than that change in an output variable.
That is, if price were to increase by, say, 10%, units would decrease (remember the law of demand: if price goes up, quantity goes down) by less than 10%. Meaning, if |elasticity| < 1.00 the demand is inelastic (think of it as units being insensitive to a price change). If |elasticity| > 1.00 the demand is elastic.
The simple reason why elasticity is important to know is that it tells what happens to total revenue, in terms of pricing. In an inelastic demand curve total revenue follows price. So if price were to increase, total revenue would increase. In an elastic demand curve the opposite holds: if price were to increase, total revenue would decrease. See Table 3.5 below for a mathematical example.
Table 3.5 Elasticity, inelasticity, and total revenue
Inelastic   0.075     Increase price by   10.00%
p1          10.00     p2      11.00       10.00%
u1          1,000     u2      993         –0.75%
tr1         10,000    tr2     10,918      9.18%
Elastic     1.25      Increase price by   10.00%
p1          10.00     p2      11.00       10.00%
u1          1,000     u2      875         –12.50%
tr1         10,000    tr2     9,625       –3.75%
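The two halves of Table 3.5 follow from one small calculation. In this sketch the elasticities are entered with their negative sign (–0.075 and –1.25), whereas the table shows their magnitudes; the function name is illustrative.

```python
def revenue_after_price_change(price, units, elasticity, pct_price_change):
    """Apply a percent price change and the implied percent change in units."""
    new_price = price * (1 + pct_price_change)
    new_units = units * (1 + elasticity * pct_price_change)
    return new_price, new_units, new_price * new_units


results = {}
for label, elasticity in [("inelastic", -0.075), ("elastic", -1.25)]:
    p2, u2, tr2 = revenue_after_price_change(10.00, 1000, elasticity, 0.10)
    results[label] = (p2, u2, tr2)
    change = tr2 / (10.00 * 1000) - 1
    print(f"{label:>9}: p2={p2:.2f} u2={u2:.1f} tr2={tr2:,.1f} ({change:+.2%})")
```

With inelastic demand the 10% price rise lifts revenue; with elastic demand the same rise lowers it, which is the sense in which total revenue "follows price" only when demand is inelastic.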
Let me add one quick note about elasticity modelling, something which is a common mistake. It is well known that if the natural logarithm is taken for all data (dependent as well as independent variables) then the elasticity calculation does not have to be done. Elasticity can be read right off the coefficient. That is, the beta coefficient IS the elasticity.
ln(y) = b1 ln(x1) + b2 ln(x2) … + bn ln(xn)
The problem with this is that, while the calculation is easier (taking the price means and the unit means, etc. is not required), modelling all the data in natural logs specifically assumes a constant elasticity. This assumption seems heroic indeed. To say there is the same response to a 5% price change as there is to a 25% price change would strike most marketers as inappropriate. A model in logs would have a curve that is convex to the origin throughout. For more on modelling elasticity from a marketing point of view, see an article I wrote that appeared in the Canadian Journal of Marketing Research, called ‘Modeling Elasticity’ (Grigsby, 2002).
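The constant-elasticity point can be illustrated numerically. Both demand curves in this sketch are hypothetical: each is forced through the averages quoted in the text (price 1,102, units 63), the linear one borrows the price slope from Table 3.4 and the log-log one uses b = –0.48. Neither is a fitted model.

```python
from math import exp, log


def numeric_elasticity(demand, price, step=1e-6):
    """Percent change in units divided by a tiny percent change in price."""
    base = demand(price)
    bumped = demand(price * (1 + step))
    return ((bumped - base) / base) / step


# Hypothetical linear demand: units = a + b * price
def linear_demand(price):
    return 93.305 - 0.0275 * price


# Hypothetical log-log demand: ln(units) = A + B * ln(price)
B = -0.48
A = log(63) - B * log(1102)


def loglog_demand(price):
    return exp(A + B * log(price))


prices = [900, 1102, 1300]
linear = [numeric_elasticity(linear_demand, p) for p in prices]
loglog = [numeric_elasticity(loglog_demand, p) for p in prices]

for p, lin, ll in zip(prices, linear, loglog):
    print(f"price {p}: linear {lin:.3f}, log-log {ll:.3f}")
```

The log-log curve returns about –0.48 at every price, while the linear curve becomes more elastic as price rises. That is the trade-off: the log model is convenient, but it rules out elasticity changing along the curve.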
Using the model
How is the ordinary regression equation used? That is, how are predicted units calculated?
Note Figure 3.1 shows the actual as well as the predicted unit sales. The graph shows how well the predicted sales fit the actual sales. The equation is:
Y = a + b1x1 + b2x2 … + bnxn, or
Units = constant + b1*q4 + b2*q3 + b3*q2 + b4*advert + b5*price
Figure 3.1 Actual and predicted unit sales
For the second observation (Table 3.6) this means:
80.7153 + (3.825*0) + (2.689*0) + (1.533*1) + (0.0011*6,565) – (0.0275*1,250) ≈ 55.1
(Table 3.6 shows 55.2; the small difference comes from rounding in the coefficients reported in Table 3.4.)
Table 3.6 Average price and ad spend
Quarter   Unit sales   Avg price   Ad spend   Q2   Q3   Q4   Predicted sales
1         50.0         1,400       6,250      0    0    0    49.2
2         52.5         1,250       6,565      1    0    0    55.2
3         55.7         1,199       6,999      0    1    0    58.2
4         62.3         1,099       7,799      0    0    1    63.0
1         52.5         1,299       6,555      0    0    0    52.3
2         59.0         1,200       7,333      1    0    0    57.5
3         58.2         1,211       7,266      0    1    0    58.2
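Scoring a row with the fitted equation is just a weighted sum. The sketch below uses the rounded coefficients printed in Table 3.4, so its predictions can differ from the ‘Predicted sales’ column by a tenth or so; the function and dictionary names are illustrative.

```python
# Rounded coefficients as printed in Table 3.4
COEFFICIENTS = {
    "constant": 80.7153,
    "q2": 1.533,
    "q3": 2.689,
    "q4": 3.825,
    "ad_spend": 0.0011,
    "avg_price": -0.0275,
}


def predict_units(avg_price, ad_spend, quarter):
    """Predicted units = constant + seasonal shift + ad effect + price effect."""
    c = COEFFICIENTS
    # q1 is the baseline, so it adds nothing beyond the constant
    seasonal = {1: 0.0, 2: c["q2"], 3: c["q3"], 4: c["q4"]}[quarter]
    return (c["constant"] + seasonal
            + c["ad_spend"] * ad_spend
            + c["avg_price"] * avg_price)


second = predict_units(avg_price=1250, ad_spend=6565, quarter=2)
fourth = predict_units(avg_price=1099, ad_spend=7799, quarter=4)
print(f"second observation: {second:.1f}, fourth observation: {fourth:.1f}")
```

The same function can be used for what-if questions, such as how predicted units respond when price is moved by 100 while quarter and advertising are held fixed.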
TechnicalnotesWe’llgooversomedetailedbackgroundinformationinvolvingmodellingingeneralandregressioninparticularnow.Thiswillbealittlemoretechnicalandonlynecessaryforafullerunderstanding.
First,beawarethatregressioncarrieswithitsome‘baggage’,someassumptionsthatif
violated(andthey/somealmostalwaysaretosomeextent)themodelhasshortcomings,bias,etc.Asalludedtoearlier,oneofthebestbooksoneconometricsisPeterKennedy’s1998workAGuidetoEconometrics.Thisisbecauseheexplainsthingsfirstconceptuallyandthenaddsmoretechnical/statisticaldetail,forthosethatwant/needit.Hecoverstheassumptionsandfailingsoftheassumptionsofregressionaswellasanyone.Myphilosophyinthisbookissimilarandthissectionwilladdsometechnical,butnotnecessarilymathematical,details.
The assumptions
The first assumption – dealing with functional form – is that the dependent variable (unit sales, above) can be modelled as a linear equation. This dependent variable ‘depends’ on the independent variables (season, price and advertising, as above) and some random error term.
The second assumption – dealing with the error term – is that the average value of the error term is zero.
The third assumption – dealing with the error term – is that the error term has similar variance scattered across all the independent variables (homoscedasticity) and that the error term in one period is not correlated with an error term in another (later) period (no serial (or auto) correlation).
The fourth assumption – dealing with independent variables – is that the independent variables are fixed in repeated samples.
The fifth assumption – dealing with independent variables – is that there is no exact correlation between the independent variables (no perfect collinearity).
Each of these assumptions is required for the regression model to work, to be interpretable, to be unbiased, efficient, consistent, etc. A failure of any of these assumptions means something has to be done to the model in order to account for the consequences of a violation of the assumption(s). That is, good model building requires a test for every assumption and, if the model fails the test, a correction to the model must be applied. All this requires an understanding of the consequences of violating every assumption.
All of these will be dealt with as we go through the business cases. But for now, let’s just deal with serial correlation. Serial correlation means the error term in period x is correlated with the error term in period x + 1, all the way through the whole data set. Serial correlation is very common in time series and must be dealt with.
A simple test, called the Durbin-Watson test, is easy to run in SAS to ascertain the extent of serial correlation. If the result of the test is about 2.00 there is not enough autocorrelation to worry about.
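For readers working outside SAS, the statistic itself is simple to compute from a model’s residuals. The sketch below uses NumPy and simulated residuals (the simulated data, seed and variable names are assumptions made purely for illustration): uncorrelated errors give a value near 2, while positively correlated AR(1) errors pull it well below 2.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared one-period changes in the residuals over their sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


rng = np.random.default_rng(0)
n = 5000
white_noise = rng.normal(size=n)  # residuals with no serial correlation

# AR(1) residuals: each error carries over 80% of the previous period's error.
rho = 0.8
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = rho * ar1[t - 1] + white_noise[t]

dw_white = durbin_watson(white_noise)
dw_ar1 = durbin_watson(ar1)
print(round(dw_white, 2), round(dw_ar1, 2))
```

A handy rule of thumb is that the statistic is roughly 2 * (1 – rho), where rho is the one-period correlation of the residuals.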
The consequence of a violation of the assumption of no error term correlation is that the standard errors are biased downward, that is, the standard errors tend to be smaller than they should be. This means that the t-ratios (measures of significance) will be larger (appear more significant) than they really are. This is a problem.
The correction for serial correlation (at least for a 1-period correlation) is called Cochrane-Orcutt (although the SAS output actually does a Yule-Walker estimate, which simply means it has ways to put the first observation back into the data set) and it basically transforms all the data by the correlation of the 1-period lag of the error term. The model is re-run and Durbin-Watson is re-run and those results used.
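The transformation can be sketched in a few lines. This is a single-pass Cochrane-Orcutt step on simulated data, not the Yule-Walker routine SAS runs; the function name, the simulated series and the seed are all assumptions for illustration. It estimates the 1-period correlation of the residuals, subtracts that fraction of the previous period from every variable (dropping the first observation), and re-fits.

```python
import numpy as np


def cochrane_orcutt_step(y, X):
    """One Cochrane-Orcutt pass. X must include a column of ones for the constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Correlation of the error term with its own 1-period lag.
    rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
    # Quasi-difference every variable; the first observation is lost.
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]
    beta_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return rho, beta_star


# Simulated example: true slope 3.0, errors follow an AR(1) with rho = 0.7.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
shocks = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.7 * errors[t - 1] + shocks[t]
y = 2.0 + 3.0 * x + errors
X = np.column_stack([np.ones(n), x])

rho_hat, beta_star = cochrane_orcutt_step(y, X)
print(round(rho_hat, 2), np.round(beta_star, 2))
```

Because the constant column is transformed along with everything else, the re-fitted coefficients are on the same scale as the original model and can be read the same way.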
See Tables 3.7 and 3.8 for D-W being near 2.0 (from 1.08 to 1.93). This seems to indicate the model transformation worked. Note the change in coefficients: price went from –.0256 to –.0274. Note the standard error went from .006 to .004 and significance increased.
Table 3.7 Serial correlation
Variable       Estimate   Standard error  T value
Intercept      78.47      6.41            12.24
Price          –0.0256    0.006           –4.27
Advertising    0.001109   0.00019         5.65
Q2             1.5723     0.7422          2.12
Q3             2.9698     1.0038          2.96
Q4             4.357      0.8948          4.87
R2             98.61%
Durbin-Watson  1.08
Table 3.8 Serial correlation
Variable       Estimate   Standard error  T value
Intercept      78.47      6.41            12.24
Price          –0.0274    0.004           –6.17
Advertising    0.001109   0.00019         5.65
Q2             1.5723     0.7422          2.12
Q3             2.9698     1.0038          2.96
Q4             4.357      0.8948          4.87
R2             98.61%
Durbin-Watson  1.93
Now that the serial correlation has been taken care of, confidence in interpretation of the impacts of the model has improved. A quick note though about serial correlation and the diagnostics/corrections I’ve just mentioned. While most serial correlation is lagged on one period (called an autoregressive 1 or AR(1) process) this does not mean that there cannot be other serial correlation problems. Part of it is about the kind of data given. If it is daily data there will often be an AR(7) process: each Monday tends to be correlated with all other Mondays, etc., so there is stronger correlation between periods lagged by 7 than periods lagged by 1. If there is monthly data there will often be an AR(12) process, and quarterly data tends to be lagged by 4. Thus, keep in mind the D-W stat is really only appropriate for an AR(1).
HIGHLIGHT
SEGMENTATION AND ELASTICITY MODELLING CAN MAXIMIZE REVENUE IN A RETAIL/MEDICAL CLINIC CHAIN: FIELD TEST RESULTS
Abstract
Most medical products or services are thought to be insensitive to price. This does not mean the best way to maximize revenue is to unilaterally raise every price indiscriminately for all regions in all clinics for all products or services. There should be some customers, some regions, some segments, some clinics, some products or services that are sensitive to price. Marketing analytics needs to give guidance to exploit these opportunities.
Using transactional and survey data from a large national retail/medical chain, I collected information that included, by customer and by clinic, the number of units, price paid and revenue realized for each product/service purchased over a two-year period. There was a telephone survey administered to contact three competing clinics around each of the firm’s clinics and ascertain competitive prices charged for certain ‘shopped’ products/services. Thus, a data set was created that had both own- and cross-price of several products or services.
Because much of a customer’s purchasing behaviour could be attributed to clinic differences (staffing, employee courtesy, location, growth, operational discounts, etc.) clinic segmentation was done. To emphasize, this was created to account for clinics influencing (causing) some customer behaviour other than responses to own- and cross-price. For example, one segment proved to be large (in terms of number of clinics), suburban and serving mostly loyal customers. Another segment was fairly small, urban and serving rather sick patients with customers who were mostly dissatisfied and had a high number of defectors. Obviously controlling for these differences was important.
After segmentation, elasticity modelling was done on each segment for selected products or services. This output showed that some segments and some products or services are sensitive to price; others are not. This details the ineffectiveness of simply raising prices on all products/services across the chain. In order to maximize revenue, prices should be lowered on a product in a clinic that is sensitive to price. This sensitivity comes from lack of loyalty, lack of long-term commitment, knowledge of competing prices, a customer’s budget, etc.
After the analysis was finished and shown to the firm’s management, they put a 90-day test vs control in place. They chose selected (shopped) products/segments and regions to test. After 90 days, the test clinics out-performed the control clinics, in terms of average net revenue, by >10%. This seems to indicate that there are analytic ways to exploit price sensitivity in order to maximize revenue.
The problem and some background
Given a particular chain of retail/medical clinics across the nation, pricing practices were notoriously simplistic: raise prices on nearly every product or service, for every clinic, in every region, about the same amount, every year. Growth was achieved for a time but over the last handful of years customer satisfaction began to dip, defections increased, loyalty decreased, employee satisfaction/courtesy decreased, it was more and more difficult to operationally enforce price increases and the firm overall had minimal growth and larger and larger uses of discounts, etc. Much of the deterioration in these metrics was root-caused back to pricing policies. So the primary marketing problem was to understand to what extent pricing affected total revenue. That is, could price sensitivity be discovered differently by segment or region, for different products or services, to allow the firm to exploit those differences?
Pricing is mostly around one of two practices. The first, cost-plus, is a financial decision based on the input cost of the products or services and incorporating margin into the final price. This is the typical approach, especially in terms of products or services thought to be insensitive to price (eg emergencies, radiology, major surgery, etc.). The other pricing avenue is for shopped products or services. These are products or services thought to possibly be more sensitive to price (exams, discretionary vaccines, etc.). For these products or services a survey was created and three competing clinics around each of the firm’s clinics were called and asked what prices they charged. Then the firm typically increased their own prices (very much operationally as cost-plus) but with an understanding where the competition priced those same products or services. They sometimes listened to an individual clinic’s request or protest for a less-than-typical price increase.
Description of the data set
The transactional database provided own-firm behavioural data at the customer level. This could be rolled up to the clinic level. The transactional data included: products/services purchased, price paid for each, discount applied, total revenue, number of visits, time between visits, ailment/complaint, clinic visited, staffing, etc.
The clinic data included aggregations of the above, as well as trade area, location (rural vs urban), staffing and demographics from the census data mapped to zip code level. Also available was certain market research survey data. These included customer satisfaction/loyalty and defection surveys, employee satisfaction surveys, etc.
Most interesting was the competitive survey data. This survey asked three competitors near each of the firm’s clinics what prices they charged for shopped products. Shopped products are those believed to be more price sensitive and included exams, vaccines, minor surgery, etc. Thus, for each of the firm’s clinics, they looked at own prices paid by customers for every product/service (both shopped and other) as well as three competitors’ prices charged for selected shopped products/services. The own-price data allowed elasticity modelling to be undertaken, and the cross-price data showed an interesting cause from competitive pressures. Sometimes these competitive pressures made a difference on own price sensitivity and sometimes not. This provided lucrative opportunities for marketing strategy.
First: segmentation
Why segment?
The first step was to do clinic segmentation.
Segmentation: a marketing strategy aimed at dividing the market into sub-markets, wherein each member in each segment is very similar by some measure to each other and very dissimilar to members in all other segments.
This is because consumers’ behaviour, in some ways, may be caused by a clinic’s performance, staffing, culture, etc. That is, what might look like a consumer’s choice might be more caused by a clinic’s firmographics. The data set contained all revenue and product transactions that could be rolled up by clinic. This meant that year-over-year growth, discounting changes, customer visits, etc., could be useful metrics. Also important was the location of a clinic (rural, urban, etc.). So there was a lot of knowledge about the clinic and its performance and it was these things that it was necessary to control for in the elasticity models.
Because latent class analysis (LCA) has become the gold standard these last ten years, LCA was used as a segmentation technique. It has proven far superior to typical (k-means, a segmentation algorithm discussed later) techniques, especially in outputting maximally differentiated segments. An obvious point: the more differentiated segments are the more unique marketing strategies can be created for each segment.
Profile output
After running LCA on the clinic data, the profile below was created (see Table 3.9). A couple of comments on the segments, particularly those to be used in the field test. Segment 1 is the largest (in terms of number of clinics included) and has the largest percent of annual revenue. Segment 1 is most heavily situated in suburban areas and market research shows them to have the most loyal customers. Segment 2 is the next-to-largest but only brings in about half of their fair share of revenue. Segment 4, while small, represents >20% of overall revenue and is mostly in urban areas. Market research reveals this segment to be the least satisfied and to contain the most defectors. These differences help account for customers’ sensitivity to price, as will be shown in the models later.
Table 3.9 Elasticity modelling
              Segment 1  Segment 2  Segment 4
% Market      36%        34%        7%
% Revenue     41%        19%        21%
# of clients  5,743      3,671      15,087
Rev/visit     135        120        215
% Suburb      56%        51%        45%
% Rural       13%        20%        3%
% Urban       31%        29%        52%
Then: elasticity modelling
Overview of elasticity modelling
Let’s go back to beginning microeconomics: price elasticity is the metric that measures the percent change in an output variable (typically units) from a percent change in an input variable, in this case (net) price. If the percent change in units is greater than the percent change in price (an elasticity above 1.00 in absolute value), that demand is called elastic. If it is smaller (an elasticity below 1.00 in absolute value), that demand is called inelastic. This is an unfortunate term. The clear concept is one of sensitivity. That is, how sensitive are customers who purchase units to a change in price? If there is, say, a 10% change in price and customers respond by changing their purchases by less than 10%, they are clearly insensitive to price. If there is, say, a 10% change in price and customers respond by changing their purchases by more than 10%, they are sensitive to price.
But this is not the key point, at least in terms of marketing strategy. The law of demand is that price and units are inversely correlated (remember the downward sloping demand curve?). Units will always go the opposite direction of a price change. But the real issue is what happens to revenue. Since revenue is price * units, if demand is inelastic, revenue will follow the price direction. If demand is elastic, revenue will follow the unit direction. Thus, to increase revenue in an inelastic demand curve, price should increase. To increase revenue in an elastic demand curve, price should decrease.
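The revenue rule is easy to verify with a few numbers. The sketch below is illustrative only: the starting price of 100, the 100 units and the two elasticities are made-up values, and the function assumes units respond proportionally to the price change.

```python
def revenue_change(price, units, pct_price_change, elasticity):
    """Change in revenue after a price move, given a (negative) price elasticity."""
    new_price = price * (1 + pct_price_change)
    new_units = units * (1 + elasticity * pct_price_change)
    return new_price * new_units - price * units


# Inelastic demand (elasticity -0.5): a 10% price increase raises revenue.
inelastic_up = revenue_change(100, 100, 0.10, -0.5)
# Elastic demand (elasticity -2.0): the same increase lowers revenue...
elastic_up = revenue_change(100, 100, 0.10, -2.0)
# ...while a 10% price cut raises it.
elastic_down = revenue_change(100, 100, -0.10, -2.0)

print(round(inelastic_up), round(elastic_up), round(elastic_down))
```

In each case units move opposite to price; what differs is whether the price effect or the unit effect dominates revenue.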
From point elasticity to modelling elasticity
Most of us were taught in microeconomics the simple idea of point elasticity. Point elasticity is the percent difference between (x, y) points. That is, the percent change in units given a percent change in price. Say price goes from 9 to 11, and units go from 1,000 to 850. The point elasticity is calculated as [(850 – 1,000)/1,000] / [(11 – 9)/9], which is –0.68 (–68%). Note the percent change in units is 15%, from a percent change in price of 22%. Obviously units are a smaller change (less sensitive) than the price change so this (point) demand is inelastic. That is, the elasticity at this point on the demand curve is insensitive to price. Note that as the demand curve goes from a high price to a low price, the slope changes and the sensitivity changes. This is the key marketing strategy issue.
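The point elasticity arithmetic can be checked in a couple of lines. This is a minimal sketch using the price move from 9 to 11 and the unit move from 1,000 to 850; the function name is illustrative.

```python
def point_elasticity(p0, p1, q0, q1):
    """Percent change in units divided by percent change in price."""
    pct_units = (q1 - q0) / q0
    pct_price = (p1 - p0) / p0
    return pct_units / pct_price


e = point_elasticity(p0=9, p1=11, q0=1000, q1=850)
print(round(e, 3))
```

Units fall 15% while price rises about 22%, so the ratio is below 1.00 in absolute value and demand at this point is inelastic.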
Thus elasticity is a marginal function over an average function. The overall mathematical concept of ‘marginal’ is the average slope of a curve which is a derivative. So to calculate the overall average elasticity requires the derivative of the units by price function (ie, the demand curve) measured at the means, meaning:
Elasticity = dQ/dP * average price / average units.
So mathematically the derivative represents the average slope of the demand function. In a statistical model (that accounts for random error) the same concept applies: a marginal function over an average function. In a statistical (regression) model the beta coefficient is the average slope, thus:
Elasticity = βPrice * average price / average units.
A quick note on a mathematically correct but practically incorrect concept: modelling elasticity in logs. While it’s true that if the natural log is taken both of the demand and price, there is no calculation at the means; the beta coefficient is the elasticity. However – and this is important – running a model in natural logs also implies a very wrong assumption: constant elasticity. This means there is the same impact at a small price change as at a large price change and no marketer believes that. Thus, modelling in natural logs is never recommended.
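For readers who want to see where the constant-elasticity property comes from, a short derivation follows. The symbols (Q for units, P for price, and the coefficients α, β, a, b) are notation introduced here for illustration.

```latex
% Log-log model: the price coefficient is itself the elasticity, at every point.
\ln Q = \alpha + \beta \ln P
\quad\Longrightarrow\quad
\frac{d \ln Q}{d \ln P} = \frac{dQ}{dP}\cdot\frac{P}{Q} = \beta

% Linear model: the slope is constant, so the elasticity varies along the curve.
Q = a + bP
\quad\Longrightarrow\quad
\text{Elasticity} = \frac{dQ}{dP}\cdot\frac{P}{Q} = b\,\frac{P}{Q}
```

In the linear model the ratio P/Q changes as price moves, which is why the elasticity is evaluated at the means.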
Own-price vs cross-price and substitutes
Now comes the interesting part of this data set. It has competitor prices! A survey was done asking three competitors nearest to each clinic the prices they charged for ‘shopped products’. These products are assumed to be generally price sensitive. I took the highest competitor price and the lowest competitor price and used that as cross-price data for every (shopped) product. Thus the demand model (by segment) for each shopped product will be:
Units = f(own-price, high cross-price, low cross-price, etc.)
The reason competitive prices are so interesting is because of two things. First, competitive prices are causes of behaviour. Second, if a competitor is a strong substitute, strategic choices reveal themselves.
A competitor is regarded as a substitute if the coefficient on their cross-price is positive. This means there is a positive correlation with a firm’s own demand. Thus, if the competition is a substitute and chooses to raise their prices, our own demand will increase because their customers will tend to flow to our demand (with lower prices). If the competitor is a substitute and chooses to lower their prices, our own demand will decrease because their customers will tend to flow out of our demand (with higher prices). Thus, knowing if a competitor is a substitute gives explanatory power to the model as well as a potential strategic lever.
But the real issue is how strong a substitute a competitor is. This strength is revealed in the cross-price coefficients. Say for a particular demand model the coefficient on own price is –1.50 and the coefficient on high cross-price is +1.10. Own price has the expected negative correlation (own price goes up, (own) units go down). High cross-price is positive, meaning in this case the high-price competitor is a substitute. If own elasticity is price sensitive and we lower our prices, the high competitors can lower their prices as well, decreasing our demand. But note that they are not a strong substitute. A strong substitute will not only have a positive coefficient but that coefficient will be (absolute value) > own price coefficient. Meaning, in the above example, if we lower our prices by 10% we expect our demand to increase by 15%. If the competitor matches our price change and lowers by 10%, that will affect our demand by 11%, that is, they were not a strong substitute.
However, if our own price coefficient was –1.50 and the high-price competitor coefficient was instead +3.00, a very different story unfolds. If we lower our prices by 10% our demand will go up by 15%. But the strong substitute can lower their price by 5% and impact our units by 15% (5% * 3 = 15%). Or if they also lower by 10% and match us that will impact our units by 30%! Clearly this strong competitor is far more powerful than the first scenario. Note also that none of this ‘game theory’ knowledge is possible without cross prices.
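The two scenarios above can be written as a small helper. This is a sketch under the same simplification the text uses, treating the coefficients as elasticities; the function names and labels are illustrative, not part of the original analysis.

```python
def classify_competitor(own_coef, cross_coef):
    """Label a competitor using the sign and relative size of the cross-price coefficient."""
    if cross_coef <= 0:
        return "not a substitute"
    if abs(cross_coef) > abs(own_coef):
        return "strong substitute"
    return "weak substitute"


def demand_impact(coef, pct_price_change):
    """Percent change in our units from a percent change in the given price."""
    return coef * pct_price_change


print(classify_competitor(-1.50, 1.10))   # weak substitute
print(classify_competitor(-1.50, 3.00))   # strong substitute

our_gain = demand_impact(-1.50, -0.10)      # we cut our price 10%
weak_response = demand_impact(1.10, -0.10)  # weak substitute matches with 10%
strong_small = demand_impact(3.00, -0.05)   # strong substitute cuts only 5%
strong_match = demand_impact(3.00, -0.10)   # strong substitute matches with 10%
print(round(our_gain, 2), round(weak_response, 2),
      round(strong_small, 2), round(strong_match, 2))
```

The strong substitute erases a 15% gain with only a 5% cut of its own, which is the strategic point of measuring cross-price effects.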
Modelling output by segment
The next four tables show elasticity modelling results by segment by four selected shopped products. (In the field test only vaccines (two), minor surgery and exams were used.) Following each table are notes on strategic uses.
Table 3.10 Elasticity modelling
Vaccine x          Seg 1   Seg 2   Seg 4
Vaccine x firm     –0.377  –1.842  –3.702
Vaccine x comp hi  –0.839  0.062   1.326
Vaccine x comp lo  –0.078  –0.167  –0.757
Segment 1: An elasticity < |1.00| (0.377, in absolute terms) means this product for this segment has a demand that is inelastic. This segment is loyal (via market research) and no competitor is a substitute (no positive cross-price elasticity). Therefore increase price.
A few details on segment 1 vaccine x calculations follow. For own-price elasticity, the firm’s price was 28 and the own price coefficient was –1.2 and the average units were 89. Thus own price elasticity is –0.377 = –1.2 * 28/89. High competitor price elasticity is calculated as –0.839 = –1.915 * 39/89 and low price competitor elasticity is –0.078 = –0.33 * 21/89. All other calculations are similar.
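The segment 1 vaccine x figures can be reproduced with a one-line function. A minimal sketch follows; the function name is illustrative, while the coefficients, prices and average units are the ones quoted above.

```python
def elasticity_at_means(coefficient, avg_price, avg_units):
    """Regression coefficient scaled by average price over average units."""
    return coefficient * avg_price / avg_units


AVG_UNITS = 89

own = elasticity_at_means(-1.2, 28, AVG_UNITS)         # firm's own price
high_comp = elasticity_at_means(-1.915, 39, AVG_UNITS)  # high competitor price
low_comp = elasticity_at_means(-0.33, 21, AVG_UNITS)    # low competitor price

print(round(own, 4), round(high_comp, 4), round(low_comp, 4))
```

Note that each cross-price elasticity uses the competitor’s average price in the numerator but the firm’s own average units in the denominator.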
Segment 2: An elasticity > |1.00| (1.842, in absolute terms) means this product for this segment has a demand that is elastic. The high competitor is a weak substitute (0.062). Therefore decrease price.
Segment 4: An elasticity > |1.00| (3.702, in absolute terms) means this product for this segment has a demand that is elastic. This segment tends to be dissatisfied with a high number of defectors (via market research). The high competitor is a weak substitute (1.326). Therefore decrease price.
Table 3.11 Further elasticity modelling
Vaccine y          Seg 1   Seg 2   Seg 4
Vaccine y firm     –0.214  –0.361  –0.406
Vaccine y comp hi  0.275   0.018   0.109
Vaccine y comp lo  0.196   0.123   0.864
Segment 1: An elasticity < |1.00| (0.214, in absolute terms) means this product for this segment has a demand that is inelastic. This segment is loyal (via market research) and the low competitor is a weak substitute. The high competitor is a strong substitute. Note the positive 0.275 is > absolute 0.214 meaning the high competitor can match/retaliate against the firm with a smaller price decrease. Therefore test (remember this segment is loyal) increasing price.
Segment 2: An elasticity < |1.00| (0.361, in absolute terms) means this product for this segment has a demand that is inelastic. While both competitors are substitutes, they each are weak. Therefore test increasing price.
Segment 4: An elasticity < |1.00| (0.406, in absolute terms) means this product for this segment has a demand that is (surprisingly) inelastic. This segment tends to be dissatisfied with a high number of defectors (via market research). While both competitors are substitutes, the low competitor is a strong substitute. Therefore cautiously test increasing price.
Table 3.12 Further elasticity modelling
Minor surgery     Seg 1  Seg 2  Seg 4
Min surg firm     –0.57  –0.17  –1.09
Min surg comp hi  0.202  0.475  –0.59
Min surg comp lo  –0.06  0.291  0.215
Segment 1: An elasticity < |1.00| (0.573, in absolute terms) means this product for this segment has a demand that is inelastic. This segment is loyal (via market research) and the high competitor is a weak substitute. Therefore test increasing price.
Segment 2: An elasticity < |1.00| (0.173, in absolute terms) means this product for this segment has a demand that is inelastic. Both competitors are strong substitutes. Therefore (cautiously) test increasing price.
Segment 4: An elasticity > |1.00| (1.090, in absolute terms) means this product for this segment has a demand that is (barely) elastic. This segment tends to be dissatisfied with a high number of defectors (via market research). The low competitor is a weak substitute. Therefore test decreasing price.
Table 3.13 Further elasticity modelling
Exams         Seg 1  Seg 2  Seg 4
Exam firm     –0.1   –0.03  –0.1
Exam comp hi  0.008  0.075  0.096
Exam comp lo  –0.02  –0.03  0.023
Segment 1: An elasticity < |1.00| (0.100, in absolute terms) means this product for this segment has a demand that is inelastic. This segment is loyal (via market research) and the high competitor is a weak substitute. Therefore test increasing price.
Segment 2: An elasticity < |1.00| (0.025, in absolute terms) means this product for this segment has a demand that is inelastic. The high competitor is a strong substitute. Therefore test increasing price.
Segment 4: An elasticity < |1.00| (0.095, in absolute terms) means this product for this segment has a demand that is inelastic. This segment tends to be dissatisfied with a high number of defectors (via market research). Both competitors are substitutes and the high competitor is a strong substitute. Therefore (cautiously) test increasing price.
The above analysis shows how elasticity can be used as a strategic weapon. Because it involves both own-price (customer’s sensitivity) as well as cross-price (potential competitor’s retaliation) the strategic levers are lucrative.
Example of elasticity guidance
Now let’s talk about transferring the modelling from the segment level to the clinic level, where pricing guidance needs to be. The basic idea was to use the segment model’s price coefficient and apply that to the elasticity calculation by clinic. That is, elasticity at the segment level:
Segment elasticity = segment price-coefficient * segment average price / segment average quantity.
Translating elasticity to (each) clinic:
Clinic elasticity = segment price-coefficient * clinic average price / clinic average quantity.
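As a sketch of that translation, the snippet below applies one segment coefficient to several clinics. The two clinics and their average prices and quantities are hypothetical values invented for illustration; only the –1.2 coefficient comes from the segment 1 vaccine x example.

```python
def elasticity(price_coef, avg_price, avg_quantity):
    """Price coefficient scaled by average price over average quantity."""
    return price_coef * avg_price / avg_quantity


SEGMENT_PRICE_COEF = -1.2  # own-price coefficient from the segment model

# Hypothetical clinics in the segment: (average price, average quantity).
clinics = {
    "clinic_a": (30.0, 60.0),
    "clinic_b": (26.0, 120.0),
}

clinic_elasticities = {
    name: elasticity(SEGMENT_PRICE_COEF, avg_price, avg_quantity)
    for name, (avg_price, avg_quantity) in clinics.items()
}
print(clinic_elasticities)
```

The same segment coefficient yields a different elasticity at each clinic because each clinic sits at a different point on the demand curve.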
Now let’s look at a particular clinic’s test results. This clinic is in segment 4, a very price sensitive segment. Guidance for vaccine x (at this clinic) was to decrease price by 6%. This decrease brought the clinic’s price position down from the highest (compared to the surrounding competitors) to a middle-priced option. The high competitor was a weak substitute, so strong retaliation was thought unlikely.
For the vaccine x product, during the 90-day field test, this clinic generated 2,292 in vaccine x revenue and sold 84 units, making average net revenue of 27.28. The matched control cell was 25.86, giving a 5.48% test-over-control result. This comes from two things interacting: first, this segment in general is sensitive to price and second, this clinic has no (strong) substitutes. Thus guidance was to decrease price with no fear of retaliation from the competitors.
Look at another particular clinic’s test results. This clinic is in segment 1, a price insensitive segment. Guidance for exams (at this clinic) was to increase price by 2%. This increase brought the clinic’s price position up from the middle (compared to the surrounding competitors) to the highest-priced option. Remember this segment tends to be very loyal. The high competitor was a weak substitute, so strong retaliation was thought unlikely.
For the exam product, during the 90-day field test, this clinic generated 27,882 in exam revenue and sold 499 units making average net revenue of 55.88. The matched control cell was 47.41 giving a 17.85% test-over-control result. This comes from two things interacting: first, this segment in general is insensitive to price and second, this segment and this clinic have no (strong) substitutes. Thus guidance was to increase price with no fear of retaliation from either the customers or competitors.
Last: test vs control
There were nearly 100 clinics that met criteria to be part of the field test. There were about 25 test clinics and 75 control clinics. The test clinics would get the elasticity guidance and the control clinics would continue business as usual.
Matched cells by region by segment were designed. The test metric was average net revenue (by region, by segment, by product, etc.). The overall result was that the test clinics outperformed the control clinics, in terms of average net revenue, by >10% in 90 days. Of course regions and segments and products had a distribution of results. One region was extremely positive, another region was slightly negative, one segment (segment 1, the loyal segment) was very positive and segment 4 (the dissatisfied segment) was less so. Such a strong overall result indicates elasticity analysis can help guide optimal pricing.
Discussion
Is there game theory in the medical services world? Most practitioners would probably say not really, their job is more about patient care than competition. However, one interesting example that might contradict common wisdom comes from this study.
There happened to be two clinics, call them X and Y, which each came from the same region, the same segment 4, but one had a strong substitute (low) competitor and the other did not. For exams, both clinics were given a price decrease of 4%. The clinic that faced the strong competitor (clinic X) had one half the average net revenue gains vs control as clinic Y. This might indicate the low competitor around clinic X also lowered their exam prices (next survey will verify) but because they were a strong substitute they only needed to lower by 1% to negatively impact the firm’s 4% price decrease.
It seems that at least for the shopped products, prices in the medical services area are NOT so insensitive. It also seems that some kind of ‘game theory’ might go on, especially in close locale, to respond and retaliate with price changes. That was probably why the competitive survey was done in the first place.
Conclusion
Why is elasticity modelling so rarely done? In my nearly 30 years of marketing analysis over a wide variety of firms in many different industries, elasticity modelling (as discussed here) is virtually never done. Often there are surveys on prices and purchasing, etc. But this is self-reported and probably self-serving (‘Yes, your prices are too high!’). Another common and slightly better marketing research technique is conjoint analysis. It is somewhat artificial and still self-reported but analytically controls for such things.
My point is that if there is real behaviour – real purchasing responses based on real price changes – in the transactional database, why would THOSE data elements not be best to measure price sensitivity? The answer seems to be that translating what was learned in microeconomics into statistical analysis is a wide step and not usually taught. That is, going from point elasticity to statistically modelling elasticity is knowledge not easily gained. Note, however, the steps are quite straightforward and the modelling is not difficult. Perhaps this chapter is one way to get elasticity modelling used more in practice, especially given the potential benefits.
Checklist
You’ll be the smartest person in the room if you:
Remember there are two types of statistical analysis: dependent variable types and inter-relationship types.
Recall that there are two types of equations: deterministic and probabilistic.
Observe that regression (ordinary least squares, OLS) is a dependent variable type analysis using independent variables to explain the movement in a dependent variable.
Point out that R2 is a measure of goodness of fit; it shows both explanatory power and shared variance between the actual dependent variable and the predicted dependent variable.
Remember that the t-ratio is an indication of statistical significance.
Always avoid the ‘dummy trap’; keep one less binary variable in a system (eg, in a quarterly model only use three not four quarters).
Think in terms of the two kinds of elasticity: inelastic and elastic demand curves.
Focus on the real issue of elasticity: what impact it has on total revenue (not units).
Remember price and units are negatively correlated. In an inelastic demand curve total revenue follows price; in an elastic demand curve total revenue follows units. To increase total revenue in an inelastic demand curve price should increase; to increase total revenue in an elastic demand curve price should decrease.
Remember that regression comes with assumptions.
04
Who is most likely to buy and how do I target?
Introduction
Conceptual notes
Business case
Results applied to the model
Lift charts
Using the model – collinearity overview
Variable diagnostics
Highlight: Using logistic regression for market basket analysis
Checklist: You’ll be the smartest person in the room if you…
Introduction
The next marketing question is around targeting, particularly who is likely to buy. Note that this question is statistically the same as ‘Who is likely to respond (to a message, an offer, etc.)?’ This probability question is the centre of marketing science in that it involves understanding choice behaviour. The typical technique involved (especially for database/direct marketing) is logistic regression.
Conceptual notes
Logistic regression has a lot of similarities to ordinary regression. They both have a dependent variable, they both have independent variables, they are both single equations, and they both have diagnostics around the impact of independent variables on the dependent variable as well as ‘fit’ diagnostics.
But their differences are also many. Logistic regression has a dependent variable that takes on only two (as opposed to continuous) values: 0 or 1, that is, it’s binary. Logistic regression does not use the criteria of ‘minimizing the sum of the squared errors’ (which is ordinary least squares, or OLS) to calculate the coefficients, but rather maximum likelihood via an iterative search. The interpretation of the coefficients is different. Odds ratios (e^β) are typically used and fit is not about a predicted vs. an actual dependent variable.
Maximum likelihood: an estimation technique (as opposed to ordinary least squares) that finds estimators that maximize the likelihood of observing the given sample.
As a slight detail, another important difference between logistic regression and ordinary regression is that logistic regression actually models the ‘logit’ rather than the dependent variable. A logit is the log of [probability of the event / (1 – probability of the event)], that is, the log of the odds of the event occurring. Recall that ordinary regression models the dependent variable itself.
(By the way, yes there is a technique that can model > two values, but not continuous. That is, the dependent variable might have 3 or 4 or 5, etc., values. This technique is called multinomial logit (discriminant analysis will do this as well) but we will not cover it except to say it’s the same as logistic regression, but the dependent variable has codes for multiple different values, rather than only 0 or 1.) All of the above means that the output of logistic regression is a probability between 0% and 100%, whereas the output of ordinary regression is an estimated (predicted) value to fit the actual dependent variable. Figure 4.1 shows a plot of actual events (the 0s and the 1s) as well as the logistic (s-curve).
Figure 4.1 Actual events and logistics
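The logit and the s-curve are two directions of the same transformation. The sketch below is illustrative (the function names are mine): `logit` turns a probability into log-odds, and `inverse_logit` turns any log-odds value back into a probability between 0 and 1.

```python
import math


def logit(p):
    """Log of the odds of an event that has probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """The logistic s-curve: maps any real number back to a probability."""
    return 1 / (1 + math.exp(-z))


for z in (-4, -2, 0, 2, 4):
    print(z, round(inverse_logit(z), 3))
```

However large or small the linear combination of independent variables becomes, the predicted probability stays inside the 0 to 1 bounds, which is what an ordinary regression line cannot guarantee.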
Now let’s look at some data and run a model, because that’s where all the fun is.
BUSINESS CASE
Now Scott’s boss, very impressed with what he did on demand modelling, calls Scott into his office.
‘Scott, we need to better target those likely to buy our products. We send out millions of catalogues, based on magazine subscriber lists, but the response rate is too small. What can we do to make our mailing ROI better?’
Scott thinks for a minute. The response rate was too small? Response rate is the rate of response, which is the number of those that responded (purchased), divided by the total number that got the communication. It’s an overall metric of success.
‘We want to target those likely to purchase based on a collection of characteristics. We have both customers and non-customers in our database – from the subscriber lists we’ve been mailing – so we could model the probability to respond based on clone or lookalike modelling.’
‘What does that mean?’ the boss asks.
‘I’ll have to dig into it a bit more but I know we can develop a regression-type model that scores the database with different probabilities to purchase for each name. We can sort the database by probability to purchase and only mail as deep as ROI limits.’
‘Sounds good. Get to work on that and get back to me when you have something.’ With that the boss swivels in his chair so Scott knows the conversation is over.
Results applied to the model
Note Table 4.1, which shows the simplified data set. This is a list of customers that purchased and those that did not purchase. Scott has data on which campaigns they each received, as well as some demographics. The objective is to figure out which of the non-purchasers ‘look like’ those that did purchase and re-mail them, perhaps with the same campaign (if we find one that was effective) or design another campaign.
Table 4.1 Simplified data set
Id Revenue Purchase Campaigna Campaignb Campaignc Income Sizehh Educ
999 1500 1 1 0 1 150000 1 19
1001 1400 1 1 0 1 137500 1 19
1003 1250 1 1 0 0 125000 2 15
1005 1100 1 1 0 0 112500 2 13
1007 2100 1 0 1 0 145000 3 16
1009 849 1 0 0 0 132500 3 17
1010 750 1 0 0 0 165000 3 16
1011 700 1 0 0 0 152500 3 9
1013 550 1 1 0 1 140000 4 15
1015 850 1 1 0 1 127500 4 18
1017 450 1 1 0 1 115000 4 17
1019 0 0 0 0 1 102500 5 16
1021 0 0 0 0 1 99000 6 15
1023 0 0 0 1 1 86500 7 16
1025 0 0 0 1 1 74000 6 15
1027 0 0 0 1 1 61500 5 14
1029 0 0 0 1 1 49000 4 13
1033 0 0 1 0 1 111000 4 12
1034 0 0 0 0 1 98500 3 11
1035 0 0 0 0 1 86000 3 10
The end result will be to score the database with ‘probability to purchase’ in order to understand what (statistically) works and strategize what to do next time. This is the cornerstone of direct (database) marketing.
Using the (contrived) data set, you can run proc logistic descending in SAS. See Table 4.2 for the output of the coefficients. These coefficients are not interpreted the same way as in ordinary regression.
Table 4.2 Coefficient output
Intercept –57.9
Campaigna –8.48
Campaignb 16.52
Campaignc –9.96
Income 0.001
Sizehh –3.41
Education 0.2
Because logistic regression is curvilinear and bound by 0 and 1, the independent variables do not affect the dependent variable the way they do in ordinary regression. The actual impact is
e^coefficient.
For example, education’s coefficient is 0.200. The impact would be:
e^0.200 = 1.221, that is (2.71828^0.200).
This means that for every year of added education, the odds of purchase increase by 22.1%. This metric is called the odds ratio. This obviously has targeting implications: aim our product at the most highly educated families possible. Note that two of the three campaigns are negative (which tends to decrease probability to purchase) so this also adds credence to needing better targeting.
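The exponentiation step is easy to check by hand or in a few lines of code. The sketch below is only an illustration: the coefficients are copied from Table 4.2, while the names `coefficients` and `odds_ratio` are made up for this example and are not part of the SAS output.

```python
import math

# Coefficients copied from Table 4.2 (the contrived dataset).
coefficients = {
    "campaign_a": -8.48,
    "campaign_b": 16.52,
    "campaign_c": -9.96,
    "income": 0.001,
    "size_hh": -3.41,
    "education": 0.2,
}


def odds_ratio(beta):
    """Return e^beta: the factor by which the odds change per one-unit increase."""
    return math.exp(beta)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    # (ratio - 1) * 100 is the percentage change in the odds, not in the probability.
    print(f"{name:<12} odds ratio = {ratio:.4g}  change in odds = {(ratio - 1) * 100:+.4g}%")
```

An odds ratio above 1 (education, campaign b) pushes the odds of purchase up; one below 1 (campaigns a and c, household size) pushes them down.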
For logistic regression, there is not really a goodness of fit measure like R2 in OLS. Logit outputs a probability between 0 and 1 for a dependent variable coded 0 or 1. Often the ‘confusion matrix’ is used, and predictive accuracy is a sign of a good model. Table 4.3 shows the confusion matrix of the above model. (The confusion matrix from SAS uses ‘ctable’ as an option.) Say there are 10,000 observations.
Table 4.3 Confusion matrix
Actual non-events Actual events
Predicted non-events 1,000 1,750
Predicted events 500 6,750
The total number of events (purchases) is 6,750 + 1,750 or 8,500. The model predicted only 6,750 + 500 or 7,250. The total accuracy of the model is the actual events predicted correctly and the actual non-events predicted correctly, meaning 6,750 + 1,000 or 7,750/10,000 = 77.5%. The number of false positives is 500 (the model predicted 500 people would have the event that did not). This is an important measure of direct marketing, in terms of the cost of a wrong mailing.
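The same arithmetic can be written down once and reused for any confusion matrix. This is a minimal sketch using the counts from Table 4.3. The function name `confusion_summary` is hypothetical, and the ‘precision’ figure (the share of predicted buyers who actually bought) is an extra metric added here for illustration; the text does not compute it.

```python
# Counts copied from Table 4.3 (10,000 observations).
true_negatives = 1_000   # predicted non-event, actual non-event
false_negatives = 1_750  # predicted non-event, actual event
false_positives = 500    # predicted event, actual non-event
true_positives = 6_750   # predicted event, actual event


def confusion_summary(tn, fn, fp, tp):
    """Summarize a 2x2 confusion matrix with the measures discussed in the text."""
    total = tn + fn + fp + tp
    return {
        "actual_events": tp + fn,
        "predicted_events": tp + fp,
        "accuracy": (tp + tn) / total,
        "false_positives": fp,
        # Not in the text: share of predicted events that were real events.
        "precision": tp / (tp + fp),
    }


summary = confusion_summary(true_negatives, false_negatives,
                            false_positives, true_positives)
for name, value in summary.items():
    print(name, value)
```

In a mailing context the false positives are the wasted pieces of mail, which is why that count gets singled out.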
As an analytic ‘trick’ it often helps to determine if the dependent variable (sales, in this case) has any abnormal observations. Remember the z-score? This is a fast and simple way to check if an observation is ‘out of bounds’. The z-score is calculated as ((observation – mean) / standard deviation).
Let’s say the mean of revenue is 358.45 and the standard deviation of revenue is 569.72. So, if you run this calculation for all the observations on revenue you will see that (Table 4.1) id #1007 ((2,100 – 358.45) / 569.72) = 3.057. This means that observation is more than 3 standard deviations from the mean, a very non-normal observation. It is common to add a new variable, call it ‘positive outlier’, that takes the value of 0 as long as the z-score on sales is 3.00 or less and the value of 1 if the z-score is > 3. Use this new variable as another independent variable to help account for outliers. Some of the coefficients should change and the fit usually improves. This new variable can be seen as flagging an influential observation.
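Here is a rough sketch of that flagging step. The revenues come from Table 4.1, but the mean and standard deviation are the ‘let’s say’ values quoted above rather than statistics recomputed from those 20 rows, and the names `z_score` and `positive_outlier` are invented for the example.

```python
# Revenue by customer id, copied from Table 4.1.
revenue_by_id = {
    999: 1500, 1001: 1400, 1003: 1250, 1005: 1100, 1007: 2100,
    1009: 849, 1010: 750, 1011: 700, 1013: 550, 1015: 850,
    1017: 450, 1019: 0, 1021: 0, 1023: 0, 1025: 0,
    1027: 0, 1029: 0, 1033: 0, 1034: 0, 1035: 0,
}

# Assumed values taken from the text, not recomputed from the rows above.
ASSUMED_MEAN = 358.45
ASSUMED_STD_DEV = 569.72


def z_score(observation, mean, std_dev):
    """(observation - mean) / standard deviation."""
    return (observation - mean) / std_dev


def positive_outlier(observation, mean, std_dev, threshold=3.0):
    """Return 1 if the observation is more than `threshold` SDs above the mean."""
    return 1 if z_score(observation, mean, std_dev) > threshold else 0


outlier_flags = {
    customer_id: positive_outlier(revenue, ASSUMED_MEAN, ASSUMED_STD_DEV)
    for customer_id, revenue in revenue_by_id.items()
}

flagged = [cid for cid, flag in outlier_flags.items() if flag == 1]
print("flagged ids:", flagged)
```

The resulting 0/1 column is what would be added to the model as the extra independent variable.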
Table 4.4 New variables
Intercept –51.9
Influence 15.54
Campaigna –6.06
Campaignb 16.6
Campaignc –9.07
Income 0.002
Sizehh –1.65
Education 0.211
Note the mostly slight changes in coefficients. This ought to mean predictive accuracy increases. Note the updated confusion matrix below.
Table 4.5 Updated confusion matrix
Actual non-events Actual events
Predicted non-events 1,250 1,000
Predicted events 250 7,500
The total number of events (purchases) is still 8,500 but note the shift in accuracy. The model now predicts 7,500 + 250 = 7,750. The total accuracy of the model is the actual events predicted correctly and the actual non-events predicted correctly, meaning 7,500 + 1,250 or 8,750/10,000 = 87.5%. The number of false positives is 250 (the model predicted 250 people would have the event that did not). This is an important measure of direct marketing, in terms of the cost of a wrong mailing. The model improved because of accounting for influential observations.
Lift charts
A common and important tool, especially in direct/database marketing, is the lift (or gain) chart.
Lift/gains chart: a visual device to aid in interpreting how a model performs. It compares by deciles the model’s predictive power to random.
This is a simple analytic device to ascertain general fit as well as a targeting aid in terms of how deep to mail.
The general procedure is to run the model and output the probability to respond. Sort the database by probability to respond and divide into 10 equal ‘buckets’. Then count the number of actual responders in each decile. If the model is a good one, there will be a lot more responders in the upper deciles and a lot fewer responders in the lower deciles.
As an example, say the average response rate is 5%. We have 10,000 total observations (customers). Each decile has 1,000 customers in it; some of them have responded and some of them have not. Overall there are 500 responders (500/10,000 = 5%). So, randomly, we would expect on average 50 in each decile. Instead, because the model works, say there are 250 in decile 1 and it decreases until the bottom decile has only one responder in it. The ‘lift’ is defined as the number of responders in each decile divided by the average (expected) number of responders. In decile 1 this means 250/50 = 500%. This shows us that the first decile has a lift of 5X, that there are five times more responders there than average. It also says that those in the top decile who did not respond are very good targets, since again, they all ‘look alike’. This is an indication the model can discriminate the responders from the non-responders.
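The sort, bucket and count procedure can be sketched as follows. Treat it as an illustration only: the function names are hypothetical, and of the ten responder counts only decile 1 (250) and decile 10 (1) come from the example above. Deciles 2 to 9 are made-up values chosen so the total is 500.

```python
def decile_counts(scored_customers, buckets=10):
    """Count responders per bucket after sorting by score, highest first.

    scored_customers is a list of (probability, responded) pairs, where
    responded is 1 or 0. Assumes the list length divides evenly by buckets.
    """
    ranked = sorted(scored_customers, key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // buckets
    return [
        sum(responded for _, responded in ranked[i * size:(i + 1) * size])
        for i in range(buckets)
    ]


def lift_by_decile(responders_per_decile):
    """Lift = responders in the decile / average responders per decile."""
    expected = sum(responders_per_decile) / len(responders_per_decile)
    return [count / expected for count in responders_per_decile]


# Decile 1 and decile 10 are from the text; deciles 2-9 are hypothetical.
responders = [250, 100, 60, 40, 20, 12, 8, 6, 3, 1]
lifts = lift_by_decile(responders)

# One common rule: mail every decile whose lift is at least 1.0 (100%).
mail_depth = sum(1 for lift in lifts if lift >= 1.0)

for decile, lift in enumerate(lifts, start=1):
    print(f"decile {decile:2d}: lift = {lift:.0%}")
print("deciles out-responding the average:", mail_depth)
```

With these illustrative counts the first three deciles out-respond the average, so a mailer following the 100% rule would stop there, budget permitting.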
Figure 4.2 Lift chart
Note that in each decile there are 1,000 customers. 250 already responded in decile 1. All of the customers in decile 1 have a high probability of responding (they are the top 10% by score). There are 750 more potential targets in decile 1 that have NOT responded. This is the place to focus targeting and this is why it’s called ‘clone modelling’.
To briefly address the database marketing question, ‘How deep do I mail?’ let’s look at the lift chart above. This is an accumulation of actual responders compared to expected responders. Depending on budget, etc., this lift chart helps to target. Most database marketers will mail as far as any decile out-responds the average. That is, until the lift is < 100%. Another way of saying this is to mail until the maximum distance between the curves is achieved. However, as a practical matter, most direct marketers (especially cataloguers) have a set budget and can only AFFORD to mail so deep, regardless of the statistical performance of the model. Thus, most of the attention is on the first one or two deciles.
Using the model – collinearity overview
Another very common issue that must be dealt with in (especially) regression modeling is collinearity.
Collinearity: a measure of how variables are correlated with each other.
Collinearity is defined as one or more independent variables that are more correlated with each other than either of them is with the dependent variable. That is, if there are, say, two independent variables in the model, damaging collinearity is if X1 and X2 are more correlated than X1 and Y and/or X2 and Y. Mathematically:
ρ(X1,X2) > ρ(Y,X1) or ρ(Y,X2), where ρ = correlation.
The consequences of collinearity are that, while the parameter estimates of each independent variable remain unbiased, the standard errors are too wide. This means when significance testing is calculated (parameter estimate / standard error of the estimate) for a t-ratio (or a Wald ratio) these variables tend to show less significance than they really have. This is because the standard error is too large. Collinearity can also flip the signs of coefficients, which returns nonsensical results. Thus, collinearity must be tested and dealt with.
A quick note on overly simplistic ‘diagnostics’ I’ve seen in practice follows. It’s possible to run a correlation matrix on the variables and obtain the (simple Pearson) correlation coefficient for each pair. This does NOT check for damaging collinearity; this is a check for simple (linear) correlation. I’ve seen analysts just run the matrix and drop (yes, drop!) an independent variable just because the correlation of it and another independent variable is, say, greater than 80%. (Where did they get 80%? This is arbitrary and beneath anyone calling themselves analytic.) Ok, off the soapbox.
The above ‘testing’ is irksome because real testing (with SAS and SPSS) is relatively easy. VIF is the most common. In SAS, run proc reg and include ‘/VIF’ as an option. VIF is the variance inflation factor. Basically it regresses each independent variable on all other independent variables and displays a metric. This metric is 1/(1 – R2). If this metric is > 10.0 (indicating an R2 of > 90%) then as a rule of thumb, some variable is too collinear to ignore. That is, if there are three independent variables in the model, x1, x2 and x3, VIF will regress x1 = f(x2 and x3) and take that R2, then x2 = f(x1 and x3) and take that R2 and last x3 = f(x1 and x2) and take that R2.
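To see the 1/(1 – R2) metric at work without any statistical package, here is a deliberately simplified sketch. It assumes only two independent variables, in which case the R2 from regressing one on the other is just the squared Pearson correlation. With three or more variables a full auxiliary regression is needed, which this example does not attempt. The income and household size columns are taken from Table 4.1; the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Simple Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_from_r2(r_squared):
    """Variance inflation factor: 1 / (1 - R2)."""
    return 1.0 / (1.0 - r_squared)


# Income and Sizehh columns copied from Table 4.1.
income = [150000, 137500, 125000, 112500, 145000, 132500, 165000, 152500,
          140000, 127500, 115000, 102500, 99000, 86500, 74000, 61500,
          49000, 111000, 98500, 86000]
size_hh = [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 7, 6, 5, 4, 4, 3, 3]

r = pearson_r(income, size_hh)
# Two-variable special case: the auxiliary R2 equals r squared.
vif = vif_from_r2(r ** 2)

print(f"r = {r:.3f}, VIF = {vif:.2f}, above rule of thumb: {vif > 10.0}")
print(f"VIF at R2 = 0.90: {vif_from_r2(0.90):.1f}")
```

The last line simply confirms the rule of thumb quoted above: an R2 of 90% corresponds to a VIF of 10.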
Note that we are not really testing for collinearity (because there will nearly ALWAYS be some collinearity). We are testing for collinearity bad enough to cause a problem (called ill conditioning).
The recommended approach is to include variables that make theoretic sense. If VIF indicates a variable is causing a problem but there is a strong reason for that variable to be included, one of the other variables should be examined instead. (It is important to note that dropping a variable is NOT the first course of action. Simply dropping a variable is arbitrary (and very simplistic) analytics.) That is, a stronger, more defendable model results from a strategic understanding of the data generating process, not based on statistical diagnostics. The science of modelling would emphasize diagnostics; the art of modelling would emphasize balance and business impact. Did I mention sometimes in a practical business environment ‘bad statistics’ are allowed balanced on running a business? Gasp!
Depending on the issues and data, etc., other possible solutions exist. Putting all the independent variables in a factor matrix would keep the variables’ variance intact but the factors are, by definition, orthogonal (uncorrelated).
Another (correcting) technique is called ridge regression (typically using Stein estimates) and requires special software (in SAS ‘proc reg data=x.x outvif outest=xx ridge=0 to 1 by 0.01; model y=x1 x2’ etc.) and expertise to use. In general, it trades collinearity for bias in the parameter estimates. Again, the balance is in knowing the coefficients are now biased but a drastic reduction in collinearity results. Is it worth it? Sorry, but the answer is, it depends.
While VIF is helpful, the condition index has become (since Belsley, Kuh and Welsch’s 1980 book Regression Diagnostics) the state of the art in collinearity diagnostics. The maths behind it is fascinating but many textbooks will illuminate that. We will focus on an example. The approach, without getting TOO mathematical, is to calculate the condition index of each variable. The condition index is the square root of the largest eigenvalue (called the characteristic root) divided by each variable’s eigenvalue. (An eigenvalue is the variance of each principal component when used in the correlation matrix.) The eigenvalues add up to the number of variables (including the intercept): see Table 4.6 below. This is a powerful diagnostic because a set of eigenvalues of relatively equal magnitude indicates that there is little collinearity. A small number of large eigenvalues indicates that a small number of component variables describe most of the variability of the variables. A zero eigenvalue implies perfect collinearity and – this is important – very small eigenvalues mean there is severe collinearity. Again, an eigenvalue near 0.00 indicates collinearity. As a rule of thumb, a condition index > 30 indicates severe collinearity.
Table 4.6 Variance
Indvar Eigenvalue Condindex Propinter PropX1 PropX2 PropX3 PropX4 PropX5 PropX6
Inter 6.861 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
X1 0.082 9.142 0.000 0.000 0.000 0.091 0.014 0.000 0.000
X2 0.046 12.256 0.000 0.000 0.000 0.064 0.001 0.000 0.000
X3 0.011 25.337 0.000 0.000 0.000 0.427 0.065 0.001 0.000
X4 0.000 230.420 0.000 0.000 0.000 0.115 0.006 0.016 0.456
X5 0.000 1048.100 0.000 0.000 0.831 0.000 0.225 0.328 0.504
X6 0.000 432750.000 0.999 1.000 0.160 0.320 0.689 0.655 0.038
Common outputs along with the VIF and condition index are the proportions of variance (see Table 4.6). This proportion of variance shows the percentage of the variance of the coefficient associated with each eigenvalue. A high proportion of variance reveals a strong association with the eigenvalue.
Let’s talk about Table 4.6. First look at the condition index. The eigenvalue on the intercept is 6.86 and the first condition index is the square root of 6.86/6.86 = 1.00. Now the second condition index is the square root of 6.86/0.082 = 9.142. The diagnostics indicate that there are as many collinearity problems as there are condition indexes > 30, or in this case there may be three problems (230.42, 1048.1 and 432750). Look to the proportion of variance table. Any proportion > 0.50 is a red flag. Look at the last X6 variable. Variable X6 is related to the intercept, X1, X4 and X5. X5 is related to X2 (0.8306) and X6 (0.504). This indicates X6 is the most problematic variable. Something ought to be done about that.
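The reading of Table 4.6 described above can be mechanized. The sketch below is an assumption-laden illustration, not SAS output: it hard-codes the printed table, applies the two rules of thumb (condition index > 30, proportion > 0.50), and recomputes one condition index from the rounded eigenvalues. Because the printed eigenvalues are rounded, the recomputed value is only close to the table’s 9.142, and the last three eigenvalues (printed as 0.000) cannot be recomputed at all.

```python
import math

COLUMNS = ["Inter", "X1", "X2", "X3", "X4", "X5", "X6"]

# (row label, eigenvalue, condition index, variance proportions) from Table 4.6.
TABLE_4_6 = [
    ("Inter", 6.861, 1.000, [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000]),
    ("X1", 0.082, 9.142, [0.000, 0.000, 0.000, 0.091, 0.014, 0.000, 0.000]),
    ("X2", 0.046, 12.256, [0.000, 0.000, 0.000, 0.064, 0.001, 0.000, 0.000]),
    ("X3", 0.011, 25.337, [0.000, 0.000, 0.000, 0.427, 0.065, 0.001, 0.000]),
    ("X4", 0.000, 230.420, [0.000, 0.000, 0.000, 0.115, 0.006, 0.016, 0.456]),
    ("X5", 0.000, 1048.100, [0.000, 0.000, 0.831, 0.000, 0.225, 0.328, 0.504]),
    ("X6", 0.000, 432750.000, [0.999, 1.000, 0.160, 0.320, 0.689, 0.655, 0.038]),
]


def condition_index(largest_eigenvalue, eigenvalue):
    """Square root of the largest eigenvalue divided by this eigenvalue."""
    return math.sqrt(largest_eigenvalue / eigenvalue)


def red_flags(table, index_cutoff=30.0, proportion_cutoff=0.50):
    """For each row with a severe condition index, list the high-proportion columns."""
    flags = {}
    for label, _eigenvalue, cond_index, proportions in table:
        if cond_index > index_cutoff:
            flags[label] = [
                column for column, proportion in zip(COLUMNS, proportions)
                if proportion > proportion_cutoff
            ]
    return flags


flags = red_flags(TABLE_4_6)
print(flags)
print(round(condition_index(6.861, 0.082), 3))
```

Three rows exceed the cutoff of 30, matching the ‘three problems’ noted in the text, and the X6 row carries the most red-flag proportions.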
Possible solutions might mean combining X5 and X6 into a factor and using the resulting factor as a variable instead of X5 and X6 as currently measured. This is because factors are by construction uncorrelated (we call it orthogonal). Another option would be to transform (especially) X6, either taking its exponent, or square root, or something else. The point is to try to find an X6-like variable correlated with the dependent variable but LESS CORRELATED with, especially, X5. Are you able to get a larger sample? Can you take differences in X6, rather than just the raw measure? And yes, if there is a theoretical reason, you can drop X6 and re-run the model and see what you have. Dropping a variable is a last resort. Have I mentioned that?
A brief procedural note
On probably most of the analytic techniques we’ll talk about, certain assumptions are built in. That is, regression has many assumptions about linearity, normality, etc. While for OLS I mentioned one assumption (especially for time series data) was no serial correlation, this same assumption is applied to logistic regression as well. Most regression techniques use most of these assumptions. So while in logit I showed how to test and correct for collinearity, this same test needs to be applied in OLS as well. It just happened to come up during our discussion of logistic regression.
This means that in reality, for every regression technique used, every assumption should be checked and every violation of assumptions should be tested for and corrected, if possible. This goes for OLS, logit and anything else. Okay?
Variable diagnostics
As in all regression, a significance test is performed on the independent variables but because logit is non-linear, the t-test becomes the Wald test (which is the t-test squared, so 1.96^2 = 3.84, at 95%). The p-value still needs to be < 0.05.
Pseudo R2
Logistic regression does not have an R2 statistic. This freaks a lot of people out but that’s why I showed the ‘confusion matrix’, which is a measure of goodness of fit. Remember (from OLS) R2 is the shared variance between the actual dependent variable and the predicted dependent variable. The more variance these two share the closer the predicted and actual dependent variables are. Remember OLS outputs an estimated dependent variable. Logistic regression does NOT output an estimated dependent variable. The actual dependent variable is 0 or 1. The ‘logit’ is the natural log of the odds, p/(1 – p), where p is the probability of the event. So there can be no ‘estimated’ dependent variable. If you HAVE to have some measure of goodness of fit I’d suggest using the log likelihood on the covariate and intercept. SPSS and SAS both output the –2LL on the intercept only and the –2LL on the intercept and covariates. Think of the –2LL on intercept only as TSS (total sum of squares) and the –2LL on intercept and covariates as the part the model still leaves unexplained (the error sum of squares). The pseudo-R2 is then 1 – (–2LL on intercept and covariates)/(–2LL on intercept only), and this will give an indication for those that need that metric.
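One common way to turn the two –2LL figures into a single number is McFadden’s pseudo-R2, which is 1 minus the ratio of the model’s –2LL to the intercept-only –2LL. The sketch below shows the mechanics on a tiny, entirely hypothetical set of outcomes and predicted probabilities. It relies on the fact that an intercept-only logit predicts the overall event rate for everyone.

```python
import math


def log_likelihood(actuals, probabilities):
    """Sum of log(p) for events and log(1 - p) for non-events."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(actuals, probabilities)
    )


def pseudo_r2(actuals, model_probabilities):
    """McFadden pseudo-R2: 1 - (-2LL model) / (-2LL intercept only)."""
    base_rate = sum(actuals) / len(actuals)
    # The intercept-only model predicts the base rate for every observation.
    neg2ll_intercept = -2.0 * log_likelihood(actuals, [base_rate] * len(actuals))
    neg2ll_model = -2.0 * log_likelihood(actuals, model_probabilities)
    return 1.0 - neg2ll_model / neg2ll_intercept


# Hypothetical data: three purchasers, three non-purchasers.
actuals = [1, 1, 1, 0, 0, 0]
model_probabilities = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

r2 = pseudo_r2(actuals, model_probabilities)
print(f"pseudo-R2 = {r2:.4f}")
```

A model that predicts no better than the base rate scores 0 on this measure, and the value approaches 1 as the predicted probabilities move towards the actual 0s and 1s.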
Scoring the database with probability formula
Typically after a logistic regression is run, especially in a database marketing process, the model has to be applied to score the database. Yes, SAS now has ‘proc score’ but I want you to be able to do it yourself and to understand what’s happening. It’s old fashioned but you will know more.
Say we have the below (Table 4.7) model with probability to purchase. That is, the dependent variable is purchase = 1 for the event and purchase = 0 for the non-event. Because of the logistic curve bounding between 0 and 1, the formula is probability = 1/(1 + e^–Z) where Z = α + βXi. For the model in Table 4.7 this means:
Probability = 1/(1 + 2.71828^–(4.566 + x1*–0.003 + x2*1.265 + x3*0.003))
This returns a probability between 0% and 100% for each customer (2.71828 = e). So apply this formula to your database and each customer will have a score (that can be used for a lift chart, see above) for probability to purchase.
Table 4.7 Probability to purchase
Independent variable Parameter estimate
Intercept 4.566
X1 –0.003
X2 1.265
X3 0.003
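Doing the scoring yourself takes very little code. In this sketch the intercept and parameter estimates are the ones in Table 4.7; the three customers and their x values are invented purely to show the mechanics, and `probability_to_purchase` is a hypothetical helper name.

```python
import math

# Parameter estimates copied from Table 4.7.
INTERCEPT = 4.566
BETAS = {"x1": -0.003, "x2": 1.265, "x3": 0.003}


def probability_to_purchase(customer):
    """Apply probability = 1 / (1 + e^-Z), with Z = intercept + sum(beta * x)."""
    z = INTERCEPT + sum(BETAS[name] * customer[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical customers; the x values are made up for illustration.
customers = [
    {"id": "A", "x1": 2000, "x2": 0, "x3": 100},
    {"id": "B", "x1": 1500, "x2": 1, "x3": 50},
    {"id": "C", "x1": 3000, "x2": 0, "x3": 0},
]

# Score every customer and sort from most to least likely to purchase.
scored = sorted(
    ((probability_to_purchase(c), c["id"]) for c in customers),
    reverse=True,
)

for probability, customer_id in scored:
    print(f"{customer_id}: {probability:.1%}")
```

The sorted list is exactly what feeds the decile buckets of a lift chart.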
HIGHLIGHT
USING LOGISTIC REGRESSION FOR MARKET BASKET ANALYSIS
Abstract
In general, market basket analysis is a backward-looking exercise. It uses descriptive analysis (frequencies, correlation, mathematical KPIs, etc.) and outputs those products that tend to be purchased together. That gives no insights into what marketers should do with that output. Predictive analytics, using logistic regression, shows how much the probability of a product purchase increases/decreases given another product purchase. This gives marketers a strategic lever to use in bundling, etc.
What is a market basket?
In economics, a market basket is a fixed collection of items that consumers buy. This is used for metrics like CPI (inflation) etc. In marketing, a market basket is any two or more items bought together.
Market basket analysis is used, especially in retail/CPG, to bundle and offer promotions and gain insight in shopping/purchasing patterns. ‘Market basket analysis’ does not, by itself, describe HOW the analysis is done. That is, there is no associated technique with those words.
How is it usually done?
There are three general uses of data: descriptive, predictive and prescriptive. Descriptive is about the past, predictive uses statistical analysis to calculate a change on an output variable (eg, sales) given a change in an input variable (say, price) and prescriptive is a system that tries to optimize some metric (typically profit, etc.). Descriptive data (means, frequencies, KPIs, etc.) is a necessary, but not usually sufficient, step. Always get to at least the predictive step as soon as possible. Note that predictive here does not necessarily mean forecasted into the future. Structural analysis uses models to simulate the market, and estimate (predict) what causes what to happen. That is, using regression, a change in price shows what is the estimated (predicted) change in sales.
Market basket analysis often uses descriptive techniques. Sometimes it is just a ‘report’ of what percent of items are purchased together. Affinity analysis (a slight step above) is mathematical, not statistical. Affinity analysis simply calculates the percent of time combinations of products are purchased together. Obviously there is no probability involved. It is concerned with the rate of products purchased together, and not with a distribution around that association. It is very common and very useful but NOT predictive – therefore NOT so actionable.
Logistic regression
Let’s talk about logistic regression. This is an ancient and well-known statistical technique, probably the analytic pillar upon which database marketing has been built. It is similar to ordinary regression in that there is a dependent variable that depends on one or more independent variables. There is a coefficient (although interpretation is not the same) and there is a (type of) t-test around each independent variable for significance.
The differences are that the dependent variable is binary (having two values, 0 or 1) in logistic and continuous in ordinary regression and to interpret the coefficients requires exponentiation. Because the dependent variable is binary, the result is heteroskedasticity. There is no (real) R2, and ‘fit’ is about classification.
How to estimate/predict the market basket
The use of logistic regression in terms of market basket becomes obvious when it is understood that the predicted dependent variable is a probability. The formula to estimate probability from logistic regression is:
P(i) = 1/(1 + e^–Z)
where Z = α + βXi. This means that the independent variables can be products purchased in a market basket to predict likelihood to purchase another product as the dependent variable. Note that there is not an issue of causality here, ie, presupposing that one (independent product) causes the purchase of the dependent product, only that they are associated together. The above means to specifically take each (major) category of product (focus driven by strategy) and run a separate model for each, putting in all significant other products as independent variables. For example, say we have only three products, x, y and z. The idea is to design three models and test significance of each, meaning using logistic regression:
x=f(y,z)
y=f(x,z)
z=f(x,y).
Of course other variables can go into the model as appropriate but the interest is whether or not the independent (product) variables are significant in predicting (and to what extent) the probability of purchasing the dependent product variable. Of course, after significance is achieved, the insights generated are around the sign of the independent variable, ie, does the independent product increase or decrease the probability of purchasing the dependent product.
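To get a feel for the sign and size of such an association without a full model, consider the simplest possible case: one dependent product and a single binary independent product. In that special case the logistic regression coefficient is simply the natural log of the odds ratio from the 2x2 table of baskets, so it can be computed by counting. The sketch below uses eleven invented baskets and hypothetical function names; the multi-product models described above need an iterative maximum likelihood fit, which this example does not perform.

```python
import math


def two_by_two(baskets, target, predictor):
    """Count baskets by whether they contain the predictor and the target."""
    both = sum(1 for b in baskets if predictor in b and target in b)
    predictor_only = sum(1 for b in baskets if predictor in b and target not in b)
    target_only = sum(1 for b in baskets if predictor not in b and target in b)
    neither = sum(1 for b in baskets if predictor not in b and target not in b)
    return both, predictor_only, target_only, neither


def odds_ratio(baskets, target, predictor):
    """Odds of buying the target with the predictor vs without it.

    Assumes none of the four cells is empty.
    """
    both, predictor_only, target_only, neither = two_by_two(baskets, target, predictor)
    return (both * neither) / (predictor_only * target_only)


# Hypothetical baskets, made up for illustration.
baskets = (
    [{"furniture", "home_decor"}] * 4
    + [{"furniture"}]
    + [{"home_decor"}] * 2
    + [{"newborn"}] * 3
    + [{"newborn", "home_decor"}]
)

furniture_or = odds_ratio(baskets, "home_decor", "furniture")
newborn_or = odds_ratio(baskets, "home_decor", "newborn")

# In a one-predictor logit, the coefficient equals log(odds ratio).
print("furniture:", furniture_or, "coefficient:", round(math.log(furniture_or), 3))
print("newborn:  ", round(newborn_or, 3), "coefficient:", round(math.log(newborn_or), 3))
```

A positive coefficient (odds ratio above 1) says the two products tend to travel together; a negative one says they tend not to.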
An example
As a simple example, say we are analysing a retail store, with categories of products like consumer electronics, women’s accessories, newborn and infant items, etc. Thus, using logistic regression, a series of models should be run. That is:
consumer electronics = f(women’s accessories, jewellery and watches, furniture, entertainment, etc.)
This means the independent variables are binary, coded as a ‘1’ if the customer bought that category and a ‘0’ if not. Table 4.8 details the output for all of the models. Note that other independent variables can be included in the model, if significant. These would often be seasonality, consumer confidence, promotions sent, etc.
Table 4.8 Associated probabilities
Model                  Consumer     Women’s      Newborn,      Jewellery,  Furniture  Home    Entertainment
                       electronics  accessories  infant, etc.  watches                décor
Consumer electronics   XXX          Insig        Insig         –23%        34%        26%     98%
Women’s accessories    Insig        XXX          39%           68%         22%        21%     Insig
Newborn, infant, etc.  Insig        43%          XXX           –11%        –21%       –31%    29%
Jewellery, watches     –29%         71%          –22%          XXX         12%        24%     –11%
Furniture              31%          18%          –17%          9%          XXX        115%    37%
Home décor             29%          24%          –37%          21%         121%       XXX     31%
Entertainment          85%          Insig        31%           –9%         41%        29%     XXX
Sporting goods         18%          –37%         –29%          –29%        24%        9%      33%
To interpret, look at, say, the home décor model. If a customer bought consumer electronics, that increases the probability of buying home décor by 29%. If a customer bought newborn/infant items, that decreases the probability of buying home décor by 37%. If a customer bought furniture, that increases the probability of buying home décor by 121%. This has implications, especially for bundling and messaging. That is, offering, say, home décor and furniture together makes great sense, but offering home décor and newborn/infant items does not make sense.
And here is a special note about products purchased together. If it is known, via the above, that home décor and furniture tend to go together, these can be and should be bundled together, messaged together, etc. But there is no reason to PROMOTE them together or to discount them together because they are purchased together anyway.
Conclusion
The above detailed a simple (and more powerful) way to do market basket analysis. If given a choice, always go beyond mere descriptive techniques and apply predictive techniques.
Checklist
You’ll be the smartest person in the room if you:
Can differentiate between logistic and ordinary regression. Logistic and ordinary regression are similar in that both are single equations having a dependent variable explained by one or more independent variables. They are dissimilar in that ordinary regression has a continuous dependent variable while logistic regression has a binary variable; ordinary regression uses least squares to estimate the coefficients while logistic regression uses maximum likelihood.
Remember that logistic regression predicts a probability of an event.
Always test for outliers/influential observations using z-scores.
Point out that the ‘confusion matrix’ is a means of goodness of fit.
Observe that lift/gain charts are used as a measure of modelling efficacy as well as (eg in direct mail) depth of mailing.
Remember to always check/correct for collinearity.
Suggest logistic regression as a way to model market baskets.
05
When are my customers most likely to buy?
Introduction
Conceptual overview of survival analysis
Business case
More about survival analysis
Model output and interpretation
Conclusion
Highlight: Lifetime value: how predictive analysis is superior to descriptive analysis
Checklist: You’ll be the smartest person in the room if you…
Introduction
Survival analysis is an especially interesting and powerful technique. In terms of marketing science it is relatively new, mostly getting exposure in these last 20 years or so. It answers a very important and particular question: ‘WHEN is an event (purchase, response, churn, etc.) most likely to occur?’ I’d submit this is a more relevant question than ‘HOW LIKELY is an event (purchase, response, churn, etc.) to occur?’ That is, a customer may be VERY likely to purchase but not for 10 months. Is timing information of value? Of course it is; remember, time is money.
Beware though. Given the increase in actionable information, it should be no surprise that survival analysis is more complex than logistic regression. Remember how much more complex logistic regression was than ordinary regression?
Conceptual overview of survival analysis
Survival analysis (via proportional hazards modelling) was essentially invented by Sir David Cox in 1972 with his seminal and oft-quoted paper, ‘Regression Models and Life Tables’ in the Journal of the Royal Statistical Society (Cox, 1972). It’s important to note this technique was specifically designed to study time until event problems. This came out of biostatistics and the event of study was typically death. That’s why it’s called ‘survival analysis’. Get it?
The general use case was in drug treatment. There would be, say, a drug study where a panel was divided into two groups; one group got the new drug and the other group did not. Every month the test subjects were called and basically asked, ‘Are you still alive?’ and their survival was tracked. There would be two curves developed, one following the treatment group and another following the non-treatment group. If the treatment tended to work the time until event (death) was increased.
One major issue involved censored observations. It’s an easy matter to compare the average survival times of the treatment vs. the non-treatment group.
Censored observation: that observation wherein we do not know its status. Typically the event has not occurred yet or was lost in some way.
But what about those subjects that dropped out of the study because they moved away or lost contact? Or the study ended and not everyone has died yet? Each of these involves censored observations. The question about what to do with these kinds of observations is why Cox regression was created; a semi-parametric partial likelihood technique, which he called proportional hazards. It deals with censored observations, which are those patients that have an unknown time until event status. This unknown time until event can be caused by either not having the event at the time of the analysis or losing contact with the patient.
What about those subjects that died from another cause and not the cause the test drug was treating? Are there other variables (covariates) that influence (increase or decrease) the time until the event? These questions involve extensions of the general survival model. The first is about competing risks and the second is about regression involving independent variables. These will be dealt with soon enough.
BUSINESS CASE
At the end of the year Scott called his team and the marketing organization together for a review and brainstorming exercise. This is something Scott believed every smart analytics pro should do. He was especially interested in how the analytic team was perceived as providing value last year and what might be done differently in the upcoming year.
During the meeting the marketing managers complimented Scott and his team for providing actionable insights. The results gave most of them a good bonus and they wanted to get another one this year. They did not all completely understand the technical details and Scott made the culture around that okay. He tried to make his team viewed as consultants; accessible, conversational and engaged with the broader organization.
‘Thanks’, Scott said and turned to the director of consumer marketing, Stacy. ‘Where can we improve? What targeting would help you and your team?’
‘Well, we have a pretty good process now. We pull lists based on likelihood to respond. It’s worked well.’
‘Yeah, I’m glad of that. The lift charts from logit helped us mail only as deep as we needed to.’
‘This gives us the best ROI in the company.’
‘But is that all we can do? Just target those most likely to respond?’ Scott asked.
‘What else is there?’ Stacy asked, checking her phone.
‘Yeah, I’m not sure’, Scott said. ‘What do you need to know to do your job? What if there were no restrictions on data or feasibility or anything else? You have a magic button that if you push it you would know the one thing that would allow you to do your job better, better than ever before, a knowledge that gives you a tremendous advantage.’
‘Easy!’ Kristina said. ‘If I knew what product each customer would purchase in what order, that is, if I knew WHEN he would purchase a desktop, or a notebook, I would not send a lot of useless catalogues or e-mails to him. I’d send to him the most compelling marcom at just the right time with just the right promotion and just the right messaging to maximize his purchase.’
They all looked at her. Then they nodded their heads. Kristina had talked with Scott about joining his team after she graduated.
‘It sounds like science fiction’, Stacy said. ‘We would get a list of customers with a most likely time to purchase each product?’
Scott rubbed his chin. ‘Yes. It’s a prediction of when each customer is going to purchase each product.’
‘But’, said Mark, ‘what does that mean? Before?’ Mark was an analyst on Scott’s team. ‘We want to predict when they’ll purchase?’
‘I think so’, Scott said. ‘Predict when they’ll buy a desktop, when they’ll buy a notebook, etc.’
‘Imagine having the database scored with the number of days until each customer is likely to buy personal electronics, a desktop, etc.’ Kristina said. ‘We’d just sort the database by products and those more likely to buy sooner would get the communication.’
‘But does that mean using regression, or logit, or what?’
‘I don’t know’, Scott said. ‘What do we do about predicting those who have not purchased a product? Is this probability to buy at each distinct time period?’
They all left the meeting excited about the new metric (time until purchase) but Scott was wondering what technique would answer that question. If they used ordinary regression, the dependent variable would be ‘number of days until purchase of a desktop’ based on some zero-day, say January first two years ago. Those that purchased a desktop would have the event at that many days. Those that did not purchase a desktop gave Scott a choice. Either he would cap the number of days at now, say two years from the zero date, which means, say, 725, if they were on file from the zero date onward. That is, those that have not purchased a desktop would be forced to have the event at 725 days. Not a good choice. The other option would be to delete those that did not purchase a desktop. Also not a good choice.
Rule numero uno: never ever under any circumstances delete data. Never. Ever. This is an ‘Off with their heads!’ crime (unless of course the data is wrong or an outlier).
Ignoring the time-until-event dependent variable could give rise to logistic regression. That is, those receiving a 1 if they did purchase a desktop and a 0 if they did not. This puts him right back into probability, and they all agreed that timing was a more strategic option. So Scott concluded that both OLS and logit have severe faults in terms of time until event problems.
It’s important to make a clarification about a trap a lot of people fall into. Survival analysis is a technique specifically designed to estimate and understand time until event problems. The trap rests on the assumption that each time period is independent of every other time period, that is, that the prediction has no ‘memory’. Some under-educated/under-experienced analysts think that if we are, say, trying to predict what month an event will happen they can do 12 logits and have one model for January, another for February, etc. The collected data would have a 1 if the customer purchased in January and a 0 if not; likewise, if the model was for February a customer would have a 1 if they purchased in February and a 0 if not. This seems like it would work, right? Wrong. February is not independent of January. In order for the customer to buy in February they had to decide NOT to buy in January. See? This is why logit is inappropriate.
Now for you academicians, yes, logistic regression is appropriate for a small subset of a particular problem. If the data is periodic (an event that can only occur at regular and specific intervals) then, yes, logistic regression can be used to estimate survival analyses. This requires a whole different kind of dataset, one where each row is not a customer but a time period with an event. I’d still suggest even then, why not just use survival analysis (in SAS lifereg or phreg)?
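That ‘whole different kind of dataset’ is often called a person-period layout: one row per customer per time period, with the event flag set only in the period where the purchase happened. The sketch below shows one way the expansion might look. The customers, field names and the monthly period are all assumptions made for illustration.

```python
def person_period_rows(customer_id, last_month, purchased):
    """Expand one customer into one row per month observed.

    last_month is the month of purchase, or the last month observed for a
    customer who has not purchased (a censored customer). The event flag is
    1 only in the purchase month.
    """
    rows = []
    for month in range(1, last_month + 1):
        event = 1 if (purchased and month == last_month) else 0
        rows.append({"customer_id": customer_id, "month": month, "event": event})
    return rows


# Hypothetical customers: (id, last month observed, purchased?).
customers = [
    ("C001", 3, True),    # bought in month 3
    ("C002", 12, False),  # still has not bought after 12 months (censored)
]

dataset = []
for customer_id, last_month, purchased in customers:
    dataset.extend(person_period_rows(customer_id, last_month, purchased))

for row in dataset[:4]:
    print(row)
```

Notice that a customer who buys in month 3 contributes two ‘did not buy’ rows first, which is how this layout respects the point that buying in February requires not having bought in January.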
More about survival analysis
As mentioned, survival analysis came from biostatistics in the early 1970s, where the subject studied was an event: death. Survival analysis is about modelling the time until an event. In biostatistics the event is typically death but in marketing the event can be response, purchase, churn, etc.
Due to the nature of survival studies, there are a couple of characteristics that are endemic to this technique. As alluded to earlier, the dependent variable is time until event, so time is built into the analysis. The second endemic thing to survival analysis is observations that are censored. A censored observation is either an observation that has not had the event or an observation that was lost to the study and there is no knowledge of having the event or not – but we do know at some point in time that the observation has not had the event.
In marketing it’s common for the event to be a purchase. Imagine scoring a database of customers with time until purchase. That is far more actionable than, from logistic regression, probability of purchase.
Let’s talk about censored observations. What can be done about them? Remember we do not know what happened to these observations. We could delete them. That would be simple, but depending how many there are that might be throwing away a lot of data. Also, they might be the most interesting data of all, so deleting them is probably a bad idea. (And, remember the ‘Off with their heads!’ crime mentioned previously.) We could just give the maximum time until an event to all those that have not had the event. This would also be a bad idea, especially if a large portion of the sample is censored, as is often the case. (It can be shown that throwing away a lot of censored data will bias any results.) Thus, we need a technique that can deal with censored data. Also, deleting censored observations ignores a lot of information. While we don’t know when (or even if) the customer, say, purchased, we do know as of a certain time that they did NOT purchase. So we have part of their curve, part of their information, part of their behaviour. This should not ever be deleted. This is why Cox invented partial likelihood.
Figure 5.1 General survival curve
The above is a general survival curve. The vertical axis is a count of those in the ‘risk set’ and it starts out with 100%. That is, at time 0 everyone is ‘at risk’ of having the event and no one has had the event. At day 1, that is, after one day, one person died (had the event) and there are now 99 that are at risk. No one died for 3 days until 9 had the event at day 5, etc. Note that at about day 12, 29 had the event.
Now note Figure 5.2. One survival curve is the same as above, but the other one is ‘further out’. Note that 50% of the first curve is reached at 14 days, but the second curve does not reach 50% until 28 days. That is, they ‘live longer’.
Figure 5.2 Survival analysis
Survival analysis is a type of regression, but with a twist. It does not use maximum likelihood, but partial likelihood. (The most common form of survival analysis, proportional hazards, uses partial likelihood.) The dependent variable is now two parts: time until the event and whether the event has occurred or not. This allows the use of censored observations.
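To make the two-part dependent variable concrete, here is a small sketch that builds (time, event) pairs and computes a survival curve from them. Note the swap: it uses the Kaplan-Meier product-limit estimate, a simple descriptive method, not the partial likelihood regression discussed here. The event counts loosely follow the description of Figure 5.1 (1 event at day 1, 9 at day 5, and, on one reading of the text, 29 at day 12); censoring the remaining 61 people at day 30 is an assumption made for the example.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time where at least one event occurred.

    durations: time until the event or until censoring, per observation.
    events: 1 if the event occurred at that time, 0 if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        censored = sum(1 for d, e in zip(durations, events) if d == t and e == 0)
        if deaths:
            # Share of those still at risk who survive past time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        at_risk -= deaths + censored
    return curve


# 100 people: 1 event at day 1, 9 at day 5, 29 at day 12 (assumed reading),
# and 61 censored at day 30 (hypothetical).
durations = [1] + [5] * 9 + [12] * 29 + [30] * 61
events = [1] * 39 + [0] * 61

curve = kaplan_meier(durations, events)
for day, surviving in curve:
    print(f"day {day:2d}: {surviving:.0%} still at risk")
```

The censored observations are never deleted: they stay in the risk set for as long as they were observed, which is exactly the partial information the text says must not be thrown away.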
Theabovegraphsaresurvivalgraphs.MuchofCoxregressionisnotaboutthesurvivalcurve,butthehazardrate.Thehazardisnearlythereciprocalofthesurvivalcurve.Thisendsupastheinstantaneousriskthataneventwilloccuratsomeparticulartime.Thinkofmetricslikemilesperhourasanalogoustothehazardrate.At40milesperhouryouwilltravel40milesinonehourifspeedremainsthesame.Thehazardquantifiestherateoftheeventineachperiodoftime.
SAS does both survival modelling (with proc lifereg) and hazard modelling (as proc phreg). SPSS only does hazard modelling (as Cox regression). Lifereg does left and interval censoring while phreg does only right censoring (this is not usually an issue for marketing). With lifereg a distribution must be specified, but with phreg (as it’s semi-parametric) there is no distribution. This is one of the advantages of phreg. The other advantage of phreg is that it incorporates time-varying independent variables, while lifereg does not. (This also is not usually much of an issue for marketing.)
I typically use lifereg as it easily outputs a time-until-event prediction, it is on the survival curve and it is relatively easy to understand and interpret. That’s what we’ll demonstrate here.
I might mention that survival analysis is not just about the time until event prediction. As with all regressions the independent variables are strategic levers. Say we find that for every 1,000 e-mails we send purchases tend to happen three days sooner. Do you see the financial implications here? How valuable is it to know you have incentivized a group of customers in making purchases earlier? If this does not interest you then you are in the wrong career field.
Model output and interpretation
So Scott’s team investigated survival analysis and concluded it was worth a shot. It seemed to give a way to answer the key question, ‘WHEN is a customer most likely to purchase a desktop?’
Table 5.1 Final desktop model, lifereg
Independent variables    Beta     e^B     (e^B)-1   Avg TTE
Any previous purchase    –0.001   0.999   –0.001    –0.012
Recent online visit      –0.014   0.987   –0.013    –0.148
# Direct mails           0.157    1.17    0.17      1.865
# E-mails opened         –0.011   0.989   –0.011    –0.12
# E-mails clicked        –0.033   0.968   –0.032    –0.352
Income                   –0.051   0.95    –0.05     –0.547
Size household           –0.038   0.963   –0.037    –0.408
Education                –0.023   0.977   –0.023    –0.249
Blue collar occupation   0.151    1.163   0.163     1.792
# Promotions sent        –0.006   0.994   –0.006    –0.066
Purch desktop < year     2.09     8.085   7.085     77.934
The table above lists the final desktop model using lifereg. The variables are all significant at the 95% level. The first column is the name of the independent variable. The interpretation of lifereg coefficients requires transformations. This gets the parameter estimates into a form to make strategic interpretation.
The next column is the beta coefficient. This is what SAS outputs but, as with logistic regression, is not very meaningful. A negative coefficient tends to bring the event of a desktop purchase in; a positive coefficient tends to push the event (desktop purchase) out. This is a regression output so in that regard interpretation is the same, ceteris paribus.
To get percent impacts on time until event (TTE), each beta coefficient must be exponentiated, e^B. That’s the third column. The next column subtracts 1 from it, which gives the proportional change in TTE. Note that, for example, ‘recent online visit’ e^B is a 0.987 impact on time, or, if 1 is subtracted, shows a 1.3% decrease in average TTE. To convert that to a scale (say the average is 11 weeks) this means roughly –0.013 * 11, shown in the table as –0.148 weeks (the table values appear to be computed from unrounded coefficients). The interpretation is that if a customer had a recent online visit that tends to pull in (shorten) TTE by 0.148 weeks. Not real impactful but it makes sense, right?
Notice the last variable, ‘purch desktop < year’. See how it’s positive, 2.09? This means if the customer has purchased a desktop in the last year the time until (another) desktop purchase is pushed out by ((e^B) – 1) * 11 = 77.934 weeks. See how this works? See how strategically insightful survival analysis can be? You can build a business case around marcom sent (cost of marcom) and decreasing the time until purchase (revenue realized sooner).
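The transformation chain (beta, then e^B, then (e^B) – 1, then the shift in weeks) can be sketched in a few lines of Python. The function name is hypothetical and the 11-week average is the one assumed in the text; because the betas in Table 5.1 are rounded, small coefficients may not reproduce the table to the third decimal.

```python
import math


def tte_impact(beta, avg_tte):
    """Translate a lifereg-style coefficient into a shift in time until event."""
    multiplier = math.exp(beta)        # e^B: multiplicative effect on TTE
    pct_change = multiplier - 1.0      # (e^B) - 1: proportional change in TTE
    return multiplier, pct_change, pct_change * avg_tte


AVG_TTE_WEEKS = 11  # average time until event assumed in the text

for name, beta in [("Recent online visit", -0.014), ("Purch desktop < year", 2.09)]:
    mult, pct, shift = tte_impact(beta, AVG_TTE_WEEKS)
    print(f"{name}: e^B={mult:.3f}, change={pct:+.3f}, shift={shift:+.3f} weeks")
```

A negative shift pulls the purchase in; a positive shift pushes it out.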
As typically used on a database, each customer is scored with time until the event, in this case, time until a desktop purchase. The database is sorted and a list is designed with those most likely to purchase next (see Table 5.2 below). This time until event (TTE) is at the median (the 50% point of the survival curve).
Table 5.2 Time until event (in weeks)
Customer ID  TTE
1000 3.365
1002 3.702
1004 4.072
1006 4.479
1011 5.151
1013 5.923
1015 6.812
1017 7.834
1022 9.009
1024 10.36
1026 12.43
1030 14.92
Note that customer 1000 is expected to purchase a desktop in 3.3 weeks and customer 1030 is expected to purchase a desktop in 14.9 weeks. Using survival analysis (in SAS, proc lifereg) allowed Scott’s team to score the database with those likely to purchase sooner. This list is more actionable than using logistic regression, where the score is just probability to purchase.
Now let’s talk about competing risks. While survival analysis is about death, the study usually is interested in ONE kind of death, or death from ONE cause. That is, the biostat study is about, say, death by heart attack and not about death by cancer or death by a car accident. But it’s true that in a study of death by heart attack a patient is also at risk for other kinds of death. This is called competing risks.
In the marketing arena, while the focus might be on a purchase event for, say, a desktop PC, the customer is also ‘at risk’ for purchasing other things, like a notebook or consumer electronics. Fortunately, this is an easy job of just coding the events of interest. That is, Scott can code for an event as DT (desktop) purchase, with all else coded as a non-event. He can do another model as a purchase event of, say, notebooks, and all else is a non-event, that is, all other things are censored. Thus Table 5.3 shows three models, having a purchase event for desktop, notebook and consumer electronics.
Table 5.3 Three model comparison
Customer ID  TT desktop purch  TT notebook purch  TT consumer electronics purch
1000 3.365 75.66 39.51
1002 3.702 88.2 45.95
1004 4.072 111.2 55.66
1006 4.479 15.05 19.66
1011 5.151 13.07 9.109
1013 5.923 9.945 7.934
1015 6.812 22.24 144.5
1017 7.834 3.011 5.422
1022 9.009 2.613 5.811
1024 10.36 1.989 6.174
1026 12.43 4.448 8.44
1030 14.92 0.602 7.76
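The event coding behind the three models can be sketched as follows. This is a minimal illustration in Python, not the SAS step Scott would run; the record layout, the function name and the sample observations are all hypothetical.

```python
def code_events(records, target_product):
    """Recode purchase records for one model: target purchase = event, else censored.

    Each record is (customer_id, weeks_observed, product_purchased_or_None).
    Returns (customer_id, weeks_observed, event_flag) with event_flag 1 or 0.
    """
    coded = []
    for customer_id, weeks, product in records:
        event = 1 if product == target_product else 0   # all else is censored
        coded.append((customer_id, weeks, event))
    return coded


# Hypothetical observations, not taken from the tables above.
records = [
    (1000, 3.0, "desktop"),
    (1017, 2.5, "notebook"),
    (1022, 6.0, "consumer electronics"),
    (1030, 12.0, None),          # no purchase yet: censored in every model
]

for product in ("desktop", "notebook", "consumer electronics"):
    print(product, code_events(records, product))
```

The same records feed all three models; only the event flag changes from one model to the next.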
A little technical background
First, something to note about lifereg is that it requires you to give it a distribution. (Phreg does not require that you give it a distribution, something a lot of analysts like.) In using lifereg, I’d suggest testing all distributions, and the one that fits the best (lowest BIC or lowest –2 log likelihood) is the one to use. Another view would be to acknowledge that the distribution has a shape and ascertain what shape makes sense given the data you’re using.
Pseudo R2
While R2 as a metric makes no sense (same as with logistic regression) a lot of analysts like some kind of R2. To review, R2 in OLS is the shared variance between the actual dependent variable and the predicted dependent variable. In survival analysis there is no predicted dependent variable. Most folks use the median as the prediction and that’s okay. I’d suggest running a simple model with, and without, covariates. That is, in SAS with proc lifereg, run the model without the covariates (independent variables) and collect the –2 log likelihood stat. Then run the model with the covariates and collect the –2LL stat and divide the second by the first. This metric (by analogy) shows the percent of explained over the percent unexplained.
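One way to turn the two –2LL statistics into a single number is a McFadden-style ratio, one minus the with-covariates value over the without-covariates value. That particular form is an assumption here (the text only says to divide), and the input values below are hypothetical rather than output from a real lifereg run.

```python
def pseudo_r2(neg2ll_null, neg2ll_model):
    """1 - (-2LL with covariates) / (-2LL without covariates)."""
    if neg2ll_null <= 0:
        # The ratio is only interpretable when the null -2LL is positive.
        raise ValueError("null -2LL must be positive for this ratio")
    return 1.0 - neg2ll_model / neg2ll_null


# Hypothetical -2 log likelihood values from two runs of the same model,
# first without covariates and then with them.
print(round(pseudo_r2(neg2ll_null=1000.0, neg2ll_model=800.0), 3))
```

The closer the with-covariates –2LL is to the without-covariates value, the closer this metric is to zero, meaning the covariates add little.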
Conclusion
Survival analysis is not a common topic in marketing analytics and it should be. While it’s true that marketers and biostatisticians (where survival analysis originated) do not move in the same circles, I’ve now given you some of the basics, so go and get to work.
HIGHLIGHT
LIFETIME VALUE: HOW PREDICTIVE ANALYSIS IS SUPERIOR TO DESCRIPTIVE ANALYSIS
Abstract
Typically lifetime value (LTV) is but a calculation using historical data. This calculation makes some rather heroic assumptions to project into the future but gives no insights into why a customer is lower valued, or how to make a customer higher valued. Using predictive techniques, here survival analysis, gives an indication as to what causes purchases to happen sooner, and thus how to increase LTV.
Descriptive analysis
Lifetime value (LTV) is typically done as just a calculation, using past (historical) data. That is, it’s only descriptive.
While there are many versions of LTV (depending on data, industry, interest, etc.) the following is conceptually applied to all. LTV, via descriptive analysis, works as follows:
1. It uses historical data to sum up each customer’s total revenue.
2. This sum then has subtracted from it some costs: typically cost to serve, cost to market, cost of goods sold, etc.
3. This net revenue is then converted into an annual average amount and depicted as a cash flow.
4. These cash flows are assumed to continue into the future and diminish over time (depending on durability, sales cycle, etc.) often decreasing arbitrarily by say 10% each year until they are effectively zero.
5. These (future, diminished) cash flows are then summed up and discounted (usually by weighted average cost of capital) to get their net present value (NPV).
6. This NPV is called LTV. This calculation is applied to each customer.
Thus each customer has a value associated with it. The typical use is for marketers to find the ‘high-valued’ customers (based on past purchases). These high-valued customers get most of the communications, promotions/discounts and marketing efforts. Descriptive analysis is merely about targeting those already engaged, much like RFM (recency, frequency, monetary), which we will discuss later.
This seems to be a good starting point but, as is usual with descriptive analysis, contributes nothing informative. Why is one customer more valuable, and will they continue to be? Is it possible to extract additional value, but at what cost? Is it possible to garner more revenue from a lower valued customer because they are more loyal or cost less to serve? What part of the marketing mix is each customer most sensitive to? LTV (as described above) gives no implications for strategy. The only strategy is to offer and promote to (only) the high-valued customers.
Predictive analysis
How would LTV change using predictive analysis instead of descriptive analysis? First note that while LTV is a future-oriented metric, descriptive analysis uses historical (past) data and the entire metric is built on that, with assumptions about the future applied unilaterally to every customer. Predictive analysis specifically thrusts LTV into the future (where it belongs) by using independent variables to predict the next time until purchase. Since the major customer behaviour driving LTV is timing, amount and number of purchases, a statistical technique needs to be used that predicts time until an event. (Ordinary regression predicting the LTV amount ignores timing and number of purchases.)
Survival analysis is a technique designed specifically to study time until event problems. It has timing built into it and thus a future view is already embedded in the algorithm. This removes much of the arbitrariness of typical (descriptive) LTV calculations.
So, what about using survival analysis to see which independent variables, say, bring in a purchase? Decreasing time until purchase tends to increase LTV. While survival analysis can predict the next time until purchase, the strategic value of survival analysis is in using the independent variables to CHANGE the timing of purchases. That is, descriptive analysis shows what happened; predictive analysis gives a glimpse of what might CHANGE the future.
Strategy using LTV dictates understanding the causes of customer value: why a customer purchases, what increases/decreases the time until purchase, probability of purchasing at future times, etc. Then when these insights are learned, marketing levers (shown as independent variables) are exploited to extract additional value from each customer. This means knowing that one customer is, say, sensitive to price and that a discount will tend to decrease their time until purchase. That is, they will purchase sooner (maybe purchase larger total amounts and maybe purchase more often) with a discount. Another customer prefers, say, product X and product Y bundled together to increase the probability of purchase and this bundling decreases their time until purchase. This insight allows different strategies for different customer needs and sensitivities. Survival analysis applied to each customer yields insights to understand and incentivize changes in behaviour.
This means just assuming the past behaviour will continue into the future (as descriptive analysis does) with no idea why, is no longer necessary. It’s possible for descriptive and predictive analysis to give contradictory answers, which is why ‘crawling’ might be detrimental to ‘walking’.
If a firm can get a customer to purchase sooner, there is an increased chance of adding purchases, depending on the product. But even if the number of purchases is not increased, the firm getting revenue sooner will add to their financial value (time is money).
Also a business case can be created by showing the trade-off in giving up, say, margin but obtaining revenue faster. This means strategy can revolve around optimizing cost balanced against customer value.
The idea is to model next time until purchase, the baseline, and see how to improve that. How is this carried out? A behaviourally-based method would be to segment the customers (based on behaviour) and apply a survival model to each segment and score each individual customer. By behaviour we typically mean purchasing (amount, timing, share of products, etc.) metrics and marcom (open and click, direct mail coupons, etc.) responses.
An example
Let’s use an example. Table 5.4 shows two customers from two different behavioural segments. Customer XXX purchases every 88 days with an annual revenue of 43,958, costs of 7,296 for a net revenue of 36,662. Say the second year is exactly the same. So year one discounted at 9% is NPV of 33,635 and year two discounted at 9% for two years is 30,857 for a total LTV of 64,492. Customer YYY has similar calculations for LTV of 87,898.
Table 5.4 Comparison of customers from different behavioural segments
Customer  Days between purchases  Annual purchases  Total revenue  Total costs  Net rev YR1  Net rev YR2  YR1 Disc  YR2 Disc  LTV at 9%
XXX       88                      4.148             43,958         7,296        36,662       36,662       33,635    30,857    64,492
YYY       58                      6.293             62,289         12,322       49,967       49,967       45,842    42,056    87,898
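The descriptive LTV arithmetic in Table 5.4 can be reproduced with a short Python sketch. The function and variable names are hypothetical; the 9% rate and the two identical years come from the example. The table appears to sum rounded yearly figures, so totals computed this way may differ from it by a unit.

```python
def descriptive_ltv(net_revenue_by_year, discount_rate):
    """Sum of yearly net revenue, each year discounted back to present value."""
    return sum(
        net / (1.0 + discount_rate) ** year
        for year, net in enumerate(net_revenue_by_year, start=1)
    )


customers = {
    "XXX": {"revenue": 43_958, "costs": 7_296},
    "YYY": {"revenue": 62_289, "costs": 12_322},
}

ltv = {}
for name, c in customers.items():
    net = c["revenue"] - c["costs"]                 # annual net revenue
    ltv[name] = descriptive_ltv([net, net], 0.09)   # year two assumed identical
    print(name, net, round(ltv[name]))
```

Nothing in this calculation says why XXX is worth less than YYY, which is the point the text goes on to make.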
The above (using descriptive analysis) would have marketers targeting customer YYY with >23,000 value over customer XXX. But do we know anything about WHY customer XXX is so much lower valued? Is there anything that can be done to make them higher valued?
Applying a survival model to each segment outputs independent variables and shows their effect on the dependent variable. In this case the dependent variable is (average) time until purchase. Say the independent variables (which defined the behavioural segments) are things like price discounts, product bundling, seasonal messages, adding additional direct mail catalogues and offering online exclusives. The segmentation should separate customers based on behaviour and the survival models should show how different levels of independent variables drive different strategies.
Table 5.5 overleaf shows results of survival modelling on the two different customers that come from two different segments. The independent variables are price discounts of 10%, product bundling, etc. The TTE is time until event and shows what happens to time until purchase based on changing one of the independent variables. For example, for customer XXX, giving a price discount of 10% on average decreases their time until purchase by 14 days. Giving YYY a 10% discount decreases their time until purchase by only 2 days. This means XXX is far more sensitive to price than YYY, which would not be known by descriptive analysis alone. Likewise giving XXX more direct mail catalogues pushes out their TTE but pulls in YYY by 2 days. Note also that very few of the marketing levers affect YYY very much. We are already getting nearly all from YYY that we can, and no marketing effort does very much to impact the TTE. However, with XXX there are several things that can be done to bring in their purchases. Again, none of these would be known without survival modelling on each behavioural segment.
Table 5.5 Results of survival modelling
Variables              XXX TTE  YYY TTE
Price discount 10%     –14      –2
Product bundling       –4       12
Seasonal message       6        5
Five more catalogues   11       –2
Online exclusive       –11      3
Table 5.6 below shows new LTV calculations on XXX after using survival modelling results. We decreased TTE by 24 days, by using some combinations of discounts, bundling and online exclusives, etc. Note now the LTV for XXX (after using predictive analysis) is greater than YYY.
Table 5.6 LTV calculations
Customer  Days between purchases  Annual purchases  Total revenue  Total costs  Net rev YR1  Net rev YR2  YR1 Disc  YR2 Disc  LTV at 9%
XXX       64                      5.703             60,442         10,032       50,410       50,410       46,248    42,429    88,677
YYY       58                      6.293             62,289         12,322       49,967       49,967       45,842    42,056    87,898
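The step from Table 5.4 to Table 5.6 can be sketched in Python. The key assumption, labelled here because the text does not state it, is that annual revenue and costs scale in proportion to purchase frequency when the cycle shortens from 88 to 64 days; the function name is hypothetical.

```python
def rescale_for_new_cycle(amount, old_days, new_days):
    """Scale an annual amount when the purchase cycle changes length."""
    return amount * old_days / new_days


OLD_DAYS, NEW_DAYS = 88, 64          # XXX before and after the 24-day reduction
RATE = 0.09

annual_purchases = 365 / NEW_DAYS
net_before = 43_958 - 7_296          # XXX net revenue from Table 5.4
net_after = rescale_for_new_cycle(net_before, OLD_DAYS, NEW_DAYS)

# Two identical years, discounted at 9% as in the descriptive example.
ltv_after = net_after / (1 + RATE) + net_after / (1 + RATE) ** 2

print(round(annual_purchases, 3), round(net_after), round(ltv_after))
```

Under that assumption the sketch lands on the net revenue and LTV shown for XXX in Table 5.6.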
What survival analysis offers, in addition to marketing strategy levers, is a financial optimal scenario, particularly in terms of costs to market. That is, customer XXX responds to a discount. It’s possible to calculate and test what is the (just) needed threshold of discounts to bring a purchase in by so many days with the estimated level of revenue. This ends up being a cost/benefit analysis that makes marketers think about strategy. This is the advantage of predictive analysis: giving marketers strategic options.
Checklist
You’ll be the smartest person in the room if you:
Point out that ‘time until an event’ is a more relevant marketing question than ‘probability of an event’.
Remember that survival analysis came out of biostatistics and is somewhat rare in marketing, but very powerful.
Observe that there are two ‘flavours’ of survival analysis: lifereg and proportional hazards. Lifereg models the survival curve and proportional hazards models the hazard rate.
Champion competing risks, a natural output of survival analysis. In marketing, this gives time until various events or time until multiple products purchased, etc.
Understand that predictive lifetime value (using survival analysis) is more insightful than descriptive lifetime value.
06
Modelling dependent variable techniques (with more than one equation)
Introduction
What are simultaneous equations?
Why go to the trouble of using simultaneous equations?
Desirable properties of estimators
Business case
Checklist: You’ll be the smartest person in the room if you…
Introduction
So far we’ve dealt with one equation, a rather simple point of view. Of course, consumer behaviour is anything but simple. Marketing science is designed to understand, predict and ultimately incentivize/change consumer behaviour. This requires techniques that are as complicated as that behaviour is sophisticated. This is where simultaneous equations come in, as a more realistic model of behaviour.
Simultaneous equations: a system of more than one dependent variable-type equation, often sharing several independent variables.
What are simultaneous equations?
Simply put, simultaneous equations are systems of equations. You had this in algebra. It’s important. This begins to build a simulation of an entire process. It’s done in macroeconomics (remember the Keynesian equations?) and it can be done in marketing.
Predetermined and exogenous variables
There are two kinds of variables: predetermined (lagged endogenous and exogenous) and endogenous variables. Generally, exogenous are variables determined OUTSIDE the system of equations and endogenous are determined INSIDE the system of equations. (Think of endogenous variables as being explained by the model.) This comes in handy to know when using the rule in the identification problem below. (The identification problem is a GIANT pain in the neck but the model cannot be estimated without going through these hoops.)
This is important because a predetermined variable is one that is contemporaneously uncorrelated with the error term in its equation. Note how this ties up with causality. If Y is caused by X then Y cannot be an independent variable in contemporaneously predicting/explaining X.
Say we have a system common in economics:
Q(demand) = D(I) + D(price) + Income + D(error)
Q(supply) = S(I) + S(price) + S(error)
Note that the variables Q and price are endogenous (computed within the system) and income is exogenous. That is, income is given. (D(I) is the intercept in the demand equation and S(I) is the intercept in the supply equation.) These equations are called structural forms of the model. Algebraically, these structural forms can be solved for endogenous variables giving a reduced form of the equations.
Reduced form equations: in econometrics, models solved in terms of endogenous variables.
That is:
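As a sketch, write the structural forms with assumed symbols (these are not the author’s notation): $\alpha_0$ and $\beta_0$ for the intercepts D(I) and S(I), $\alpha_1$ and $\beta_1$ for the price coefficients, $\alpha_2$ for an income coefficient, and $u$ and $v$ for the demand and supply errors. Setting demand equal to supply and solving for price and quantity gives one version of the reduced form:

```latex
\begin{aligned}
Q &= \alpha_0 + \alpha_1 P + \alpha_2\,\text{Income} + u && \text{(demand)}\\
Q &= \beta_0 + \beta_1 P + v && \text{(supply)}\\[4pt]
P &= \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1}
   + \frac{\alpha_2}{\beta_1 - \alpha_1}\,\text{Income}
   + \frac{u - v}{\beta_1 - \alpha_1}\\[4pt]
Q &= \frac{\alpha_0\beta_1 - \alpha_1\beta_0}{\beta_1 - \alpha_1}
   + \frac{\alpha_2\beta_1}{\beta_1 - \alpha_1}\,\text{Income}
   + \frac{\beta_1 u - \alpha_1 v}{\beta_1 - \alpha_1}
\end{aligned}
```

Both endogenous variables end up written only in terms of income and the two error terms, and the price equation contains the demand error $u$, which is why price cannot be treated as independent in the demand equation.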
The reduced form of the equations shows how the endogenous variables (those determined within the system) DEPEND on the predetermined variables and error terms. That is, the values of Q and P are explicitly determined by income and errors. This means that income is given to us.
Note that the endogenous variable price appears as an independent variable in each equation. In fact, it is NOT independent, it depends on income and error terms and this is the issue. It is specifically correlated with its own (contemporaneous) error term. Correlation of an independent variable and its error terms leads to inconsistent results.
Why go to the trouble of using simultaneous equations?
First, because it’s fun. Also note that if a system should be modelled with simultaneous equations and IS NOT, the parameter estimates are INCONSISTENT! Lastly, insights are more realistic. The simulation suggests the appropriate complexity.
Conceptual basics
Generally, any economic model has to have the number of variables with values to be explained to be equal to the number of independent relationships in the model. This is the identification problem.
Many textbooks (Kmenta, Kennedy, Greene, etc.) can give the mathematical derivation for the solution of simultaneous equations. The general problem is that there have to be enough known variables to ‘fix’ each unknown quantity estimated. That is, there needs to be a rule. The good news is that there is. Here is the rule for solving the identification problem:
The number of predetermined variables excluded from the equation MUST be >= the number of endogenous variables included in the equation, less one.
Let’s use this rule on the supply-demand equation above:
Q(demand) = D(I) + D(price) + Income + D(error)
Q(supply) = S(I) + S(price) + S(error)
Demand: the number of predetermined variables excluded = zero. Income is the only predetermined variable and it IS NOT excluded from the demand equation. The number of endogenous variables included less one = 2 – 1 = 1. The two endogenous variables are quantity and price. So the number of predetermined variables excluded from the equation = 0 and this is < the number of endogenous variables included in the equation, less one (1). Therefore the demand equation is under-identified.
Supply: the number of predetermined variables excluded = one. Income is the only predetermined variable and it is excluded from the supply equation. The number of endogenous variables included less one = 2 – 1 = 1. The two endogenous variables are quantity and price. So the number of predetermined variables excluded from the equation = 1 and this is = the number of endogenous variables included in the equation, less one (1). Therefore the supply equation is exactly identified.
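The counting rule is mechanical enough to write down as a small Python function. The function name and the status labels are hypothetical, and the sketch covers only this counting rule as stated in the text, nothing more.

```python
def identification_status(excluded_predetermined, included_endogenous):
    """Apply the counting rule to one equation of a simultaneous system."""
    needed = included_endogenous - 1
    if excluded_predetermined < needed:
        return "under-identified"
    if excluded_predetermined == needed:
        return "exactly identified"
    return "over-identified"


# Demand includes income, so no predetermined variable is excluded.
print("demand:", identification_status(excluded_predetermined=0, included_endogenous=2))
# Supply excludes income, the only predetermined variable in the system.
print("supply:", identification_status(excluded_predetermined=1, included_endogenous=2))
```

Each equation is checked on its own, so a system can mix under-identified, exactly identified and over-identified equations.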
Desirable properties of estimators
We have not talked about (and it’s about time we did) what are the desirable properties of estimators. That is, we have spent effort estimating coefficients on, say, price and advertising but have not discussed how to know if the estimator is ‘good’. That is the purpose of the following brief description. If you need a fuller (more theoretically statistical) background virtually any econometrics textbook will suffice. (As mentioned in the introduction to this book, I personally like Kmenta’s Elements of Econometrics and Kennedy’s A Guide to Econometrics.)
Unbiasedness
A desirable property most econometricians agree on is unbiasedness. Unbiasedness has to do with the sampling distribution (remember the statistical introduction chapter? You didn’t think that would ever be mentioned again, did you?).
If we take an unlimited number of samples, estimate the coefficient on each of these samples and plot the distribution of those estimates, what we would end up with is the sampling distribution of the beta coefficient of that variable. If the estimator is unbiased, the average of these estimates is the correct value of the beta coefficient. Honest. Now what does this mean? It means the estimator of beta is said to be unbiased if the mean of the sampling distribution (over a very large number of samples) is the same value as the true beta coefficient. That is, if the average value of the estimated beta in repeated sampling is beta, then the estimator for beta is unbiased. Note that this does NOT mean that the estimated value of beta is the correct value of beta. It means ON AVERAGE the estimated value of beta will be the value of beta. Sounds like double talk, huh?
The obvious question is how do you know if your estimator is unbiased? That is unfortunately a very mathematically complex discussion. The short answer is: it depends on how the data is generated and it depends a lot on the distribution of the error term of the model. Remember statistics uses inductive thinking (not deductive thinking) so it is viewed from inferences, indirectly. That is, an estimator, say, via regression, is designed with these properties in mind. Thus these properties produce assumptions to take into account how the data is generated and what that does to the disturbance and hence what that means for the sampling distribution. As an example, for regression, the assumptions are:
1. The dependent variable actually DEPENDS on a linear combination of independent variables and coefficients.
2. The average of the error term is zero.
3. The error terms have no serial correlation and have the same variance (with all independent variables).
4. The independent variables are fixed in repeated samples, often called non-stochastic X.
5. There is no perfect collinearity between the independent variables.
In a very real way, econometric modelling is all about dealing with (detecting and correcting) violations of the above assumptions. Just to make the obvious point: these assumptions are made so that the sampling distribution of the parameter estimates have desirable properties, such as unbiasedness. Now, how important is unbiasedness? Some econometricians claim it is VERY important and they spend all their time and effort around that (and other properties). I myself take little comfort in unbiasedness. I want to know if the estimators are biased or not, maybe even a guess as to how much, but in the real world, it is not often of much practical matter. This is because you could have theoretically any number of samples and while on average the sampling distribution IS the real beta estimate, you never really know which sample you have. It’s possible you have an unusually bad sample. And in the real world you are not usually able to take many samples, indeed you usually only have ONE, the one in front of you.
Efficiency
What is often more meaningful, after unbiasedness, in many cases, is efficiency. That is, an estimator that has minimum variance of all the unbiased estimators. In simple terms it means that estimator, of all the unbiased estimators, has the smallest variance.
Consistency
Unbiasedness and efficiency are about the sampling distribution of the estimated coefficient and do not depend on the size of the sample. Asymptotic properties are about the sampling distribution of the estimated coefficient in large samples. Consistency is an asymptotic (large sample) property.
Because the sampling distribution changes as the sample size increases, the mean and the variance can change. Consistency is the property that the sampling distribution of the estimated beta will collapse to the point of the population beta value, as sample size increases to infinity.
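A small simulation can make the sampling distribution tangible. The sketch below is a hypothetical single-equation setup (made-up data, true beta of 2.0, ordinary least squares with one independent variable) and uses only the Python standard library; it is not a simultaneous-equations estimator.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of a one-variable least squares fit."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sampling_distribution(n, true_beta=2.0, replications=200, seed=42):
    """Estimate beta on many independent samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [1.0 + true_beta * x + rng.gauss(0, 1) for x in xs]
        estimates.append(ols_slope(xs, ys))
    return estimates


small = sampling_distribution(n=20)
large = sampling_distribution(n=1000)
print("n=20  :", round(statistics.fmean(small), 3), round(statistics.pstdev(small), 4))
print("n=1000:", round(statistics.fmean(large), 3), round(statistics.pstdev(large), 4))
```

The average of the estimates sits near the true beta at both sample sizes, while the spread of the estimates shrinks as the sample grows, which is the collapsing behaviour that consistency describes.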
Consistency is something I like a lot, because (in database marketing, for example) we typically work with very large samples and therefore can take comfort in the sampling properties of the estimators.
Why am I bringing all the above up now? Because in simultaneous equations, the only property the estimators can have (because the independent variables will NOT be fixed in repeated samples, that is, the non-stochastic X assumption is violated) will be consistency.
BUSINESS CASE
Scott’s boss called him into his office. The subject of the meeting invite was ‘Cannibalization?’
‘Scott, our pricing teams are always at war, as you know. We have always felt that one product could cannibalize another with wild pricings from the product teams.’
‘Yeah, we talk about that every quarter.’
‘What I wondered was, given your success at quantifying so much of our marketing operations, can we do something about cannibalization?’
‘What do you mean, “do something about it”?’
‘Can we put together some model of optimization? What prices SHOULD the three product teams charge, in order to maximize our overall revenue?’
‘So it’s pricing for the enterprise instead of pricing for the product. That sounds like a very complicated problem.’
‘But it is similar to the elasticity modelling that you did, especially in terms of substitutes, right?’
‘Yeah, I think so. I’m not sure how to get the demand of each product into the regression. I’ll have to research it.’
‘Great, thanks. E-mail me tomorrow your ideas.’
Scott looked at him and blinked. His boss turned his chair around and went back to looking over his other e-mails. Scott got up and went back to his place, a little bewildered.
Could it be just having a demand equation for, say, desktops that included the price of desktops as well as the prices of notebooks and workstations? That did not seem like it took into account all of the information available. That is, there must be cross-equation correlation, meaning consumers feel the prices of notebooks change as they shop for a desktop, etc. What Scott needed was a way to simultaneously model the impact of each product’s price on each product’s demand.
The above is a demand system. It is a set of three simultaneous equations that are solved (naturally) simultaneously. This set of equations posits that the demand (quantity) of each product is impacted by the own-price of the product as well as the cross-price of the other products.
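A linear demand system of this kind could be written as follows. The symbols are assumptions for illustration only: $Q$ and $P$ are quantity and price for desktops (DT), notebooks (NB) and workstations (WS, the third product as labelled in Table 6.1), $b_{ij}$ is the effect of product $j$’s price on product $i$’s quantity, $\mathbf{z}_i$ collects the predetermined variables in equation $i$ (marcom, seasonality, consumer confidence) and $e_i$ is the error term.

```latex
\begin{aligned}
Q_{DT} &= a_{1} + b_{11} P_{DT} + b_{12} P_{NB} + b_{13} P_{WS} + \mathbf{c}_{1}'\mathbf{z}_{1} + e_{1}\\
Q_{NB} &= a_{2} + b_{21} P_{DT} + b_{22} P_{NB} + b_{23} P_{WS} + \mathbf{c}_{2}'\mathbf{z}_{2} + e_{2}\\
Q_{WS} &= a_{3} + b_{31} P_{DT} + b_{32} P_{NB} + b_{33} P_{WS} + \mathbf{c}_{3}'\mathbf{z}_{3} + e_{3}
\end{aligned}
```

The diagonal terms $b_{11}$, $b_{22}$ and $b_{33}$ are the own-price effects; the off-diagonal terms are the cross-price effects that capture cannibalization.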
Note that the approach here will be fairly brief and econometrically oriented. For a detailed mathematical and microeconomically oriented treatment, see Angus Deaton and John Muellbauer’s outstanding 1980 work Economics and Consumer Behavior. In that book they thoroughly detail consumer demand and demand systems wherein they ultimately posit the (unfortunately named) Almost Ideal Demand System (AIDS).
So Scott researched simultaneous equations. Right away it was obvious that this technique violates the OLS assumption of independent variables fixed in repeated samples, or non-stochastic X. That is, the independent variables solution depended on the values of the independent variables in the other equations. This ultimately meant the only desirable property (not unbiasedness, not efficiency) was consistency. That is, simultaneous equations have desirable asymptotic properties.
Scott found another issue resulting from simultaneous equations: the problem of identification. He had to apply the rule (mentioned above) that each equation be at least just identified. Recall the rule for identification is:
The number of predetermined variables excluded from the equation be >= the number of endogenous variables included in the equation, less one.
Now Scott had to put together the equations from the data he collected. He got weekly data on desktop, notebook and workstation sales (units) for the last three years. He got total revenue of each as well, which would give him average price (price = total revenue / units). He would use seasonality and consumer confidence. He collected number of direct mails sent and the number of e-mails sent, opened and clicked by week.
Scott put together the results overleaf from the model (Table 6.1). Note the identification status on all is ‘overidentified’. For desktops: the number of predetermined variables excluded is 4 (number of e-mails, number of visits, January and October) and the number of endogenous variables included (less one) is 3 (quantity of desktops, price of desktops, price of notebooks and price of workstations). Thus, 4 > 3. For notebooks: the number of predetermined variables excluded is 4 (number of direct mails, consumer confidence, December and October) and the number of endogenous variables included (less one) is 3 (quantity of notebooks, price of desktops, price of notebooks and price of workstations). Thus, 4 > 3. For workstations: the number of predetermined variables excluded is 6 (number of e-mails, number of direct mails, number of visits, consumer confidence, December and August) and the number of endogenous variables included (less one) is 3 (quantity of workstations, price of desktops, price of notebooks and price of workstations). Thus, 6 > 3.
Table 6.1 Model results
             Price DT  Price NB  Price WS  # DMs  # EMs  # Visits  Cons conf  Jan   Dec  Oct   Aug
Quantity DT  –1.2      2.3       0.4       3.7    XX     XX        5.3        XX    1.2  XX    0.5
Quantity NB  1.1       –2        0.2       XX     6.2    2.2       XX         –0.8  XX   XX    2.9
Quantity WS  0.2       0.8       –2.6      XX     XX     XX        XX         –1.1  XX   –1.9  XX
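The exclusion counts quoted for each equation can be checked directly against Table 6.1. In this Python sketch the data layout and names are hypothetical; `None` stands for an XX cell, and the count of four included endogenous variables (own quantity plus three prices) is taken from the text.

```python
PREDETERMINED = ["#DMs", "#EMs", "#Visits", "Cons conf", "Jan", "Dec", "Oct", "Aug"]

# None marks a variable excluded from the equation (XX in Table 6.1).
equations = {
    "Quantity DT": [3.7, None, None, 5.3, None, 1.2, None, 0.5],
    "Quantity NB": [None, 6.2, 2.2, None, -0.8, None, None, 2.9],
    "Quantity WS": [None, None, None, None, -1.1, None, -1.9, None],
}

ENDOGENOUS_INCLUDED = 4   # own quantity plus the three prices

excluded_counts = {}
for name, coefs in equations.items():
    excluded = [var for var, c in zip(PREDETERMINED, coefs) if c is None]
    excluded_counts[name] = len(excluded)
    over = len(excluded) > ENDOGENOUS_INCLUDED - 1
    print(name, len(excluded), excluded, "over-identified" if over else "check")
```

Each equation excludes more predetermined variables than the three required, which matches the over-identified status reported for all of them.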
Now, what does Table 6.1 mean? This was designed as an optimal pricing problem. What does the model tell Scott?
First, since the focus is on pricing and specifically cannibalization, look at the desktop model. The price coefficient is negative, as we’d expect: price goes up, quantity goes down. Now notice the coefficient on notebooks. It’s positive (+2.3). This means it is seen (by desktop buyers) as a potential substitute. Note that if notebook prices go down that is positively correlated with the demand for desktops and the quantity of desktops will GO DOWN as well. This is key strategic information. It means the pricing people cannot (and never could) price in a vacuum. Remember Hazlitt’s book Economics in One Lesson (1979)? The lesson was that everything is (directly or indirectly) connected. What happens with notebook prices affects what happens to desktop demand. This means a portfolio approach should be taken and not a silo approach. Note as well that, in the desktop equation, the prices of workstations are also a substitute, but less. It’s obvious that this information can be used to maximize total profit. It might be that one particular brand (or product) will subsidize others, but a successful firm will operate as an enterprise. Similar conclusions are for the other products, in terms of pricing.
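The own-price and cross-price coefficients from Table 6.1 can be put to work in a few lines of Python. This is a sketch under assumptions: the text does not state the units of the coefficients, so the output is only the direction and relative size of each quantity change, and the function name is hypothetical.

```python
PRODUCTS = ["DT", "NB", "WS"]

# Rows: quantity equations; columns: price of DT, NB, WS (Table 6.1).
PRICE_COEFS = {
    "DT": [-1.2, 2.3, 0.4],
    "NB": [1.1, -2.0, 0.2],
    "WS": [0.2, 0.8, -2.6],
}


def quantity_changes(price_changes):
    """Change in each product's quantity for a given change in each price."""
    return {
        product: sum(coef * price_changes[other] for coef, other in zip(coefs, PRODUCTS))
        for product, coefs in PRICE_COEFS.items()
    }


# One-unit cut in the notebook price, other prices unchanged.
effects = quantity_changes({"DT": 0.0, "NB": -1.0, "WS": 0.0})
print(effects)
```

A notebook price cut lifts notebook quantity but pulls desktop and workstation quantities down, which is the cannibalization the pricing teams need to weigh at the enterprise level.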
The other independent variables are interpreted likewise. Consumer confidence and number of direct mails are positive in influencing desktop sales but not in the other products. For notebooks, e-mails and visits are positive, as is August seasonality, but January seasonality is negative. For workstations both January and October are negative. All of this is strategically lucrative. For example, don’t send e-mails to desktop targets, don’t send direct mails to notebook targets and don’t do much marcom in January.
Scott used the above model to help reorganize the pricing teams. They began to price as an enterprise and not in silos. Not all of them liked it at first but the increases in revenue (which translated into bonuses for them) helped to assuage their misgivings.
Conclusion
Simultaneous equations can quantify phenomena and can give answers impossible to get otherwise. Yes, it’s difficult, requires specialized software and a high level of expertise. But, as the business case above shows, how else would the firm know about optimizing prices across products or brands? In short, the price is worth it.
Checklist
You’ll be the smartest person in the room if you:
Learn to enjoy the added complexity that simultaneous equations bring to analytics; it better matches consumer behaviour.
Remember that simultaneous equations use two kinds of variables: predetermined (lagged endogenous and exogenous) and endogenous variables.
Point out that estimators have desirable properties: unbiasedness, efficiency, consistency, etc.
Observe that econometrics is really all about detecting and correcting violations of assumptions (linearity, normality, spherical error terms, etc.).
Prove that simultaneous equations can be used for optimal pricing and understanding cannibalization between products, brands, etc.
Part three
Inter-relationship techniques
07
Modelling inter-relationship techniques
What does my (customer) market look like?
Introduction
Introduction to segmentation
What is segmentation? What is a segment?
Why segment? Strategic uses of segmentation
The four Ps of strategic marketing
Criteria for actionable segmentation
A priori or not?
Conceptual process
Checklist: You’ll be the smartest person in the room if you…
Introduction
As mentioned earlier, there are two general types of multivariate analysis: dependent variable techniques and inter-relationship techniques. Most of the first part of this book has been concerned with dependent variable techniques. These include all of the types of regression (ordinary, logistic, survival modelling, etc.), as well as discriminant analysis, conjoint analysis, etc.
The point of dependent variable techniques is to understand to what extent the dependent variable depends on the independent variables. That is, how does price impact units, where units is the dependent variable (something we are trying to understand or explain) and price is the independent variable, a variable that is hypothesized to cause the movement in the dependent variable.
Inter-relationship techniques have a completely different point of view. These include multivariate algorithms like factor analysis, segmentation, multi-dimensional scaling, etc. Inter-relationship techniques are trying to understand how variables (price, product purchases, advertising spend, etc.) interact (inter-relate) together. Remember how factor analysis was used to correct for collinearity in regression? It did this by extracting the variance of the independent variables in such a way that each factor (which contained the variables) was uncorrelated with all other factors, that is, the inter-relationship between the independent variables was constructed to form factors.
This section will spend considerable effort on an inter-relationship technique that is of utmost interest and importance to marketing: segmentation.
Introduction to segmentation
OK. This introductory chapter is designed to detail some of the strategic uses and necessities of segmentation. The chapter following this will dive into more of the analytic techniques and what segmentation output may look like. Segmentation is often the biggest analytic project available and one that provides potentially more strategic insights than any other. Plus, it’s fun!
What is segmentation? What is a segment?
A good place to start is to make sure we know what we’re talking about. Radical, I know. By definition, segmentation is a process of taxonomy, a way to divide something into parts, a way to separate a market into sub-markets. It can be called things like ‘clustering’ or ‘partitioning’. Thus, a market segment (cluster) is a sub-set of the market (or customer market, or database, etc.).
Segmentation: in marketing strategy, a method of sub-dividing the population into similar sub-markets for better targeting, etc.
The general definition of a segment is that members are ‘homogeneous within and heterogeneous between’. That means that a good segmentation solution will have all the members (say, customers) within a segment be very similar to each other but very dissimilar to all members of all other segments. Homogeneous means ‘same’ and heterogeneous means ‘different’.
It’s possible to have very advanced statistical algorithms to accomplish this, or it can be a very crude business rule. The next chapter will mention a few statistical techniques for doing segmentation. Note that a business rule could simply be, ‘Separate the database into four parts: highest use, medium use, low use and no use of our product’. This managerial fiat has been (and still is) used by many companies.
RFM (recency, frequency and monetary variables) is another simple business rule: separate the database into, say, deciles based on three metrics: how recently a customer purchased, how frequently a customer purchased and how much money a customer spent. Many companies are not doing much more than this, in terms of segmentation. These companies are certainly not marketing companies because techniques like RFM are really from a financial, and not a customer, point of view. Whatever the method, a segment is that entity wherein all members assigned to it are, by some definition, alike.
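To make the RFM business rule concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the customer records, the scoring date, the helper name tier_scores and the choice of five tiers rather than deciles (the sample is far too small for ten). It simply ranks customers on each of the three metrics and cuts each ranking into equal-count tiers.

```python
from datetime import date

# Hypothetical customer summaries: (id, last purchase date, number of purchases, total spend)
customers = [
    ("c1", date(2024, 6, 1), 12, 900.0),
    ("c2", date(2024, 1, 15), 3, 150.0),
    ("c3", date(2023, 11, 2), 1, 40.0),
    ("c4", date(2024, 5, 20), 7, 1200.0),
    ("c5", date(2024, 3, 9), 5, 300.0),
]
AS_OF = date(2024, 6, 30)  # assumed scoring date
N_TIERS = 5                # the text suggests deciles (10); five suits this tiny sample


def tier_scores(values, n_tiers, higher_is_better=True):
    """Rank values and cut them into n_tiers equal-count tiers (n_tiers is the best tier)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_tiers // len(values) + 1
    return scores


days_since = [(AS_OF - last).days for _, last, _, _ in customers]
r = tier_scores(days_since, N_TIERS, higher_is_better=False)  # fewer days is better
f = tier_scores([n for _, _, n, _ in customers], N_TIERS)
m = tier_scores([spend for _, _, _, spend in customers], N_TIERS)

# One (recency, frequency, monetary) tier triple per customer
rfm = {cust[0]: (r[i], f[i], m[i]) for i, cust in enumerate(customers)}
for cust_id, score in rfm.items():
    print(cust_id, score)
```

Notice that nothing in the rule asks why a customer behaves as they do; the three inputs are all metrics that matter to the firm.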
Why segment? Strategic uses of segmentation
So, why segment at all? There are three typical uses of segmentation: finding similar members, making modelling better and – most important – using marketing strategy to attack each segment differently.
Finding homogeneous members is a valuable use of a statistical technique. The business problem tends to be: find all those that are ‘alike’ and see how, say, satisfaction differs between them, or find all those that are ‘homogeneous’ by some measure and see how usage varies between them.
A simple example might be in, say, telecommunications, where we are looking at churn (attrition) rates. We want to understand the motivation of churn, what behaviour can predict churn. So, conduct segmentation and identify customers in each segment that are alike in all important ways to the business (products, usage, demographics, channel preferences, etc.) and show different churn rates by segment. Note that churn is not the variable that all segments are alike on; churn is what we are trying to understand. Thus we control for several influences (all members within a segment are alike) and now can see high versus low churners, after all other significant variables have been eliminated.
A second usage, also sophisticated and nuanced, is to use segmentation to improve modelling. In the above churn example, say segmentation was done and we want to predict churn. We run a separate regression model for each segment and find that different independent variables affect churn differently. This will be far more accurate (and actionable) than one (average) model applied to everyone without segmentation. This approach takes advantage of the different reasons to churn. One segment might churn due to dropped calls, another might churn because of the price of the plan and another is sensitive to their bill based on calls, minutes and data used. Thus, each model will exploit these differences and be far more accurate than otherwise. The more accurate the model, the greater the insights; the greater the understanding, the more obvious the strategy of how to combat churn in each segment.
But from a marketing point of view, the reason to segment is the simple answer that not everyone is alike; not all customers are the same. One size does not fit all.
I’d even offer a tweak on ‘segmentation’ at this point. Market segmentation uses the marketing concept, where the customer is king and strategy is therefore customer-centric. Note that an algorithm like RFM is from the firm’s (financial) point of view with metrics that are important to the firm. RFM is about designing value tiers based on a financial perspective (see Chapter 8 highlight, ‘Why go beyond RFM?’).
Since marketing segmentation should be from the customer’s point of view, why do segmentation? That is, how does ‘one size does not fit all’ operate in terms of customer-centricity?
Generally, it’s based on recognizing that different customers have different sensitivities. These different sensitivities cause them to behave differently because they are motivated differently.
This means considerable effort needs to be applied to learn what makes each behavioural segment a segment. (The specific techniques to do this are explained in the next chapter.) It means developing a strategy to exploit these different sensitivities and motivations.
Usually there is a segment sensitive to price, and a segment not sensitive to price. Often there is a segment that prefers one channel (say online) and a segment that prefers another channel (say offline). Typically one segment will have high penetration of product X while another segment will have high penetration of product Y. One segment needs to be communicated to differently (style, imaging, messaging, etc.) than another segment. Note that this is far more involved than a simple business rule.
The idea is that if a segment is sensitive to, say, price, then those members should get a discount or a better offer, in order to maximize their probability to purchase (they face an elastic demand curve). The segment that is not sensitive to price (because they are loyal, wealthy, no substitutes available, etc.) should not be given the discount because they don’t need it in order to purchase.
I know the above adds complexity to the analysis. But note that consumer behaviour IS complex. Behaviour incorporates simultaneous motivations and multidimensional factors, sometimes nearly irrational (remember Dan Ariely’s book, Predictably Irrational?).
Understanding consumer behaviour requires a complex, sophisticated solution, if the goal is to do marketing, if the goal is to be customer-centric. A simpler solution won’t work. It is much like the problem that happens when we take a three-dimensional globe of the earth and spread it out over a two-dimensional space. Greenland is now way off in size; the world is wrong. Being overly simplistic produces wrong results, just as applying a univariate solution to a multivariate problem will produce wrong results.
For the MBA (which seems to need a list à la PowerPoint) I’d suggest the following as benefits of segmentation:
Marketing Research: learning WHY. Segmentation provides a rationale for behaviour.
Marketing Strategy: targeting by product, price, promotion and place. Strategy uses the marketing mix by exploiting segment differences.
Marketing Communications: messaging and positioning. Some segments need a transactional style of communication; other segments need a relationship style of communication. One size does not fit all.
Marketing Economics: imperfect competition leads to price makers. With the firm communicating just the right product at just the right price in just the right channel at just the right time to the most needy target, such compelling offers give the firm nearly monopolistic power.
The four Ps of strategic marketing
Segmentation is part of a strategic marketing process called the four Ps of strategic marketing, coined by Philip Kotler. Kotler is probably the most widely recognized marketing guru in the world, essentially creating the discipline of marketing as separate from economics and psychology. He wrote many textbooks including Marketing Management (1967), now in its 14th edition, which has been used for decades as the pillar of all marketing education.
Most marketers are aware of the four Ps of tactical marketing: product, price, promotion and place. These are often called the ‘marketing mix’. But before these are applied, a marketing strategy should be developed, based on the four Ps of strategic marketing.
Partition
The first step is to partition the market by applying a (behavioural) segmentation algorithm to divide the market into sub-markets. This means recognizing strategically that one size does not fit all, and understanding that each segment requires a different treatment to maximize revenue/profit or satisfaction/loyalty.
Probe
This second step is usually about additional data. Often this may come from marketing research, probing for attitudes about the brand, its competitors, shopping and purchasing behaviour, etc. Sometimes it can come from demographic overlay data, which is especially valuable if it includes lifestyle information. Last, probing data can come from created variables from the database itself. These tend to be around velocity (time between purchases) or share of products penetrated (what percent does the customer buy of category X, what percent of category Y, etc.), seasonality, consumer confidence and inflation, etc.
Prioritize
This step is a financial analysis of the resulting segments. Which are most profitable, which are growing fastest, which require more effort to keep or cost to serve, etc.? Part of the point of this step is to find those that we might decide to DE-market, that is, those that are not worth the effort to communicate to.
Position
Positioning is about using all of the above insights and applying an appropriate message, or the correct look and feel and style. This is the tool that allows the creation of compelling messages based on a segment’s specific sensitivities. This marketing communication is often called marcom. This incorporates the four Ps of tactical marketing.
Criteria for actionable segmentation
I’ve always thought the list below guided a segmentation project that ended up being actionable. This too probably came from Philip Kotler (as do most things that are good and important in modern marketing).
Identifiability. In order to be actionable each segment has to be identifiable. Often this is the process of scoring the database with each customer having a probability of belonging to each segment.
Substantiality. Each segment needs to be substantial enough (large enough) to make marketing to it worthwhile. Thus there’s a balance between distinctiveness and size.
Accessibility. Not only do the members of the segment have to be identifiable, they have to be accessible. That is, there has to be a way to get to them in terms of marketing efforts. This typically requires having contact info, e-mail, direct mail, SMS, etc.
Stability. Segment membership should not change drastically. The things that define the segments should be stable so that marketing strategy is predictable over time. Segmentation assumes there will be no drastic shocks in demand, or radical changes in technology, etc., in the foreseeable future.
Responsiveness. To be actionable, the segmentation must drive responses. If marcom data is one of the segmentation dimensions, this is usually achievable.
A priori or not?
As this is a practitioner’s guide to marketing science, it should come as no surprise that I advocate statistical analysis to perform segmentation. However, it’s a fact that sometimes there are (top-down) dictums that define segments. These are managerial fiats that demand a market be segmented (a priori) on managerial judgment, rather than some analytic technique. The usual dimension(s) managers want to artificially define their market by tend to be usage, profit, satisfaction, size, growth, etc. Analytically, this is a univariate approach to what is clearly a multivariate problem.
In my opinion, there is a place for managerial judgment, but it is NOT in segment definition. After the segments are defined, then managerial judgment should ascertain if the solution makes sense, if the segments themselves are actionable.
Conceptual process
Settle on a (marketing/customer) strategy
The general first step in behavioural segmentation is one of strategy. After the firm establishes goals, a strategy needs to be in place to reach those goals. There should be a champion, a business leader, a stakeholder that is the ultimate user of the segmentation.
Analytics needs to recognize that a segmentation not driven by strategy is akin to a body without a skeleton. Strategy supports everything. A very different segmentation should result if the strategy is about market share as opposed to a strategy about net margin.
A strategy discussion should revolve around customer behaviour. What is the customer’s mindset? What is the behaviour we are trying to understand? What incentive are we employing? Any good segmentation solution should tie together customer behaviour and marketing strategy. Remember, marketing is customer-centric.
Collect appropriate (behavioural) data
The next analytic step in behavioural segmentation is to collect appropriate (behavioural) data. This tends to be generally around transactions (purchases) and marcom responses.
A few comments ought to be made about what is meant by ‘behavioural data’. My theory of consumer behaviour (and it’s okay if you don’t agree) is to envision four levels (see Figure 7.1): primary motivations, experiential motivations, behaviours and results. Results (typically financial) are caused by behaviours (usually some kind of transaction purchases and marcom responses), which are caused by one or both (primary and experiential) motivations. Primary motivations (price valuation, attitudes about lifestyle, tastes and preferences, etc.) are generally psychographic and not really seen. They are motivational causes (searching, need arousal, etc.) without brand interaction. Experiential motivations tend to have brand interaction and are another motivator to additional behaviours that ultimately cause (financial) results. These motivations are things like loyalty, engagement, satisfaction, etc. Note that engagement is an experiential cause (there has been interaction with the brand) and is not a behaviour. Engagement would be metrics like recency and frequency. There will be more on this topic when we discuss RFM (see Chapter 8 highlight). I’ll warn you this is one of my soapboxes.
Figure 7.1 (figure not reproduced; surviving labels: Revenue, Growth, Margin)
Usually transactions and marcom responses (from direct mail, e-mail, etc.) are the main dimensions of behavioural segmentation. Often additional variables are created from these dimensions.
We want to know how many times a customer purchased, how much each time, what products were purchased, what categories each product purchased belonged to, etc. Often valuable profiling variables go along with this, including net margin on each purchase, cost of goods sold, etc. We want to know the number of transactions over a period of time, the number of units and if any discounts were applied to these transactions.
In terms of marcom responses we want to collect what kind of vehicle (direct mail, e-mail, etc.), opens, clicks, website visits, store purchases, discounts used, etc. We want to know when each vehicle was sent and what category of product was featured on each vehicle. Any versioning needs to be collected, and any offers/promotions, etc., need to be annotated in the database. All of this data surrounding transactions and responses is the basis of customer behaviour.
Generally we expect to find a segment that is heavily penetrated in one type of category (broad products purchased) but not another and this will be different by more than one segment. As bears repeating, one segment is heavily penetrated by category X, while another is heavily penetrated by category Y, etc. We also expect to find one or more segments that prefer e-mail or online but not direct mail, or vice versa. We typically find a segment that is sensitive to price and one that is not sensitive to price. These insights come differently from these behavioural dimensions.
Create/use additional data
Now comes the fun part. Here you can create additional data. At the least, this data takes the form of seasonality variables, calculated time between each purchase, time between categories purchased, peaks and valleys of transactions and units and revenue, share of categories (percent of baby products compared to total, percent of entertainment categories compared to total), etc. There should be metrics like number of units and transactions per customer, percent of discounts per customer, top two or three categories purchased per customer, etc. All of these can be used/tested in the segmentation.
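As a rough illustration of what creating additional data can look like, the sketch below derives three of the variables mentioned above from a raw transaction list: number of transactions, average days between purchases and share of spend by category. The transaction rows, field layout and the function name customer_features are all hypothetical, and the sketch assumes every customer has positive total revenue.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer id, purchase date, category, revenue)
transactions = [
    ("c1", date(2024, 1, 5), "baby", 40.0),
    ("c1", date(2024, 1, 25), "baby", 60.0),
    ("c1", date(2024, 3, 5), "entertainment", 100.0),
    ("c2", date(2024, 2, 1), "entertainment", 80.0),
    ("c2", date(2024, 2, 11), "entertainment", 20.0),
]


def customer_features(rows):
    """Build per-customer velocity and category-share variables."""
    by_customer = defaultdict(list)
    for cust, day, category, revenue in rows:
        by_customer[cust].append((day, category, revenue))

    features = {}
    for cust, items in by_customer.items():
        items.sort(key=lambda t: t[0])  # order purchases by date
        days = [d for d, _, _ in items]
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        total = sum(rev for _, _, rev in items)
        spend = defaultdict(float)
        for _, category, rev in items:
            spend[category] += rev
        features[cust] = {
            "n_transactions": len(items),
            # None when there is only one purchase, so no gap exists
            "avg_days_between": sum(gaps) / len(gaps) if gaps else None,
            "category_share": {c: s / total for c, s in spend.items()},
        }
    return features


features = customer_features(transactions)
for cust, values in features.items():
    print(cust, values)
```

Each of these derived columns is a candidate segmenting or profiling variable; whether it earns a place in the solution is something to test, not assume.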
As for marcom, there should be a host of metrics around marcom type and offer and time until purchase. There should be business rules tying a campaign to a purchase. There should be variables indicating categories featured on the cover, or subject lines, or offers and promotions.
Note how all of the above expand behavioural data. But there are other sources of data as well. Often primary marketing research is used. This tends to be around satisfaction or loyalty, something about competitive substitutes, maybe marcom awareness or importance of each marcom vehicle.
Third party overlay data is a rich source of additional insights into fleshing out the segments. This is often matched data like demographics, interests, attitudes, lifestyles, etc. This data is typically most helpful when it deals with attitudes or lifestyle, but demographics can be interesting as well. Again all of this additional data is about fleshing out the segments and trying to understand the mindset/rationale of each segment.
Run the algorithm
As mentioned, the algorithm discussion will be covered in depth in the next chapter, but a few comments can be made now, particularly in terms of process. Note that the algorithm is guided by strategy and uses (defining or segmenting) variables based on strategy.
The algorithm is the analytic guts of segmentation and care should be taken in choosing which technique to use. The algorithm should be fast and non-arbitrary. Analytically, we are trying to achieve maximum separation (segment distinctiveness).
The ultimate idea of segmentation is to level a different strategy against each segment. Therefore each segment should have a different reason for BEING a segment. The algorithm needs to provide diagnostics to guide optimization. The general metric of success is ‘homogeneous within and heterogeneous between’ segments. There have been many such metrics offered (SAS, via proc discrim, uses ‘the logarithm of the determinant of the covariance matrix’ as a metric of success). In the profiling, the differentiation of each segment should make itself clear.
Just to stack the deck, let me define what a good algorithm for segmentation should be. It should be multivariable, multivariate, and probabilistic. It should be multivariable because consumer behaviour is most certainly explained by more than one variable, and it should be multivariate because these variables impact consumer behaviour simultaneously, interacting with each other. It should be probabilistic because consumer behaviour is probabilistic; it has a distribution and at some point that behaviour can even be irrational. Gasp!
Profile the output
Profiling is what we show to other people to prove that the solution does discriminate between segments. Generally the means and/or frequencies of each key variable (especially transactions and marcom responses) are shown to quickly gauge differences by each segment. Note that the more distinct each segment is the more obvious a strategy (for each segment) becomes.
To show the means of KPIs (key performance indicators) by segment is common, but often another metric teases out differences better. Using indexes often brings out distinctiveness faster. That is, take each segment’s mean and divide by the total mean. For example, say segment one has average revenue of 1,500 and segment two has average revenue of 750 and the total average (all segments together) is 1,000. Dividing segment one by the total is 1,500 / 1,000 = 1.5, that is, segment one has revenue 50% above average. Note also that segment two is 750 / 1,000 = 0.75, meaning that segment two contributes revenue 25% less than average. Applying indexes to all metrics by segment immediately shows differences. This is especially obvious where small numbers are concerned. As another example, say segment one has a response rate of 1.9% and the overall grand total response rate is 1.5%. While these numbers (segment one to total) are only 0.4 percentage points different, note that the index of segment one / total is 1.9% / 1.5% = 1.27, showing that segment one is 27% greater than average. This is why we like to (and should) use indexes.
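The index arithmetic above is easy to check for yourself. The short Python sketch below reproduces the two worked examples; the function name index_vs_total and the dictionary layout are my own labels for illustration, not anything from a particular package.

```python
def index_vs_total(segment_value, total_value):
    """Index a segment's metric against the overall (all segments) metric."""
    return segment_value / total_value


# Average revenue example: segment means indexed against the total mean of 1,000
revenue_index = {
    "segment one": index_vs_total(1500, 1000),
    "segment two": index_vs_total(750, 1000),
}

# Response rate example: 1.9% for segment one against 1.5% overall
response_index = index_vs_total(1.9, 1.5)

for name, idx in revenue_index.items():
    print(f"{name}: index {idx:.2f} ({(idx - 1) * 100:+.0f}% vs average)")
print(f"segment one response: index {response_index:.2f} "
      f"({(response_index - 1) * 100:+.0f}% vs average)")
```

An index of 1.0 means the segment sits exactly at the average, so anything well above or below 1.0 jumps out of a profile table at a glance.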
While seeing drastic differences in each segment is very satisfying, the most enjoyable part of profiling often is the NAMING of each segment. First you must realize that naming a segment helps distinguish the segments. The more segments you have the more important this becomes.
I have a couple of suggestions about naming segments; take them as you see fit. Sometimes the naming of segments is left to the creative department and that’s okay. But usually analytics has to come up with names.
Each name should be only two or three words, if possible. They should be more informative than something like ‘High Revenue Segment’ or ‘Low Response Segment’. They should incorporate two or three similar dimensions. Either keep most of them to product or marcom response dimensions, or keep them along a strategic dimension or two (high growth, cost to serve, net margin, etc.). It’s tempting to name them playfully but this still has to be usable. That is, while ‘Bohemian Mix’ is fun, what does it mean strategically or from a marketing point of view?
Model to score database (if from a sample)
The next step, if the segmentation was done on a sample, is to score the database with each customer’s probability to belong to each segment. This is often carried out quickly with discriminant analysis. Apply (in SAS) proc discrim to the sample and get the equations that score each customer into a segment. (Discriminant analysis is a common technique, once categories (segments) are defined, to fit variables in equations to predict category (segment) membership.) Then run these equations against the database.
If this is accurate enough (whatever ‘accurate enough’ means) then you’re good to go. But discrim sometimes is NOT accurate enough. I myself think this is because you have to use the same variables (although with different weights) on each segment. This can be inefficient. There is also the assumption inherent in proc discrim about the same variance across segments, which is hardly ever true, so you may need to turn to another technique.
I have often settled for logistic regression, where a different equation scores each segment. That is, if I have five segments, the first logit will be with a binary dependent variable: 1 if the customer is in segment one and 0 if not. The second logit will be a 1 if the customer is in segment two and a 0 if not. Then I put in variables to maximize probability of each segment and I remove those variables that are insignificant and run all equations against all customers. Each customer will have a probability to belong to each segment and the maximum score wins, ie, the segment that has the highest probability is the segment to which the customer is assigned.
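Here is a minimal Python sketch of the ‘maximum score wins’ step, assuming the separate logits have already been estimated elsewhere. The coefficients, variable names and the three segments below are invented purely for illustration; in practice they would come from whatever package fitted the models. Note that each equation is allowed its own set of variables.

```python
import math

# Hypothetical fitted logits, one per segment: an intercept plus a weight per variable.
# Each segment may use a different set of variables.
segment_models = {
    "segment one": {"intercept": -1.0, "weights": {"orders": 0.8, "email_clicks": 0.1}},
    "segment two": {"intercept": 0.5, "weights": {"orders": -0.4, "email_clicks": 0.6}},
    "segment three": {"intercept": 0.0, "weights": {"discount_share": 2.0}},
}


def segment_probabilities(customer, models):
    """Run every segment's logit against one customer."""
    probs = {}
    for name, model in models.items():
        z = model["intercept"] + sum(
            weight * customer.get(var, 0.0) for var, weight in model["weights"].items()
        )
        probs[name] = 1.0 / (1.0 + math.exp(-z))  # logistic function
    return probs


def assign_segment(customer, models):
    """Assign the customer to the segment with the highest probability."""
    probs = segment_probabilities(customer, models)
    return max(probs, key=probs.get), probs


customer = {"orders": 5, "email_clicks": 1, "discount_share": 0.1}
winner, probabilities = assign_segment(customer, segment_models)
print(winner, {name: round(p, 3) for name, p in probabilities.items()})
```

Because the logits are fitted separately, the probabilities for one customer need not sum to one; only their ranking is used for the assignment.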
Test and learn
The typical last step is to create a test and learn plan. This is generally a broad-based test design, aimed at learning which elements drive results, which is directly informed by the segmentation insights.
Note Chapter 10 on design of experiments (DOE). The overall idea here is to develop a testing plan to take advantage of segmentation. The first thing to test is typically selection/targeting. That is, pull a sample of those likely to belong to a very highly profitable, heavy usage segment and do a mailing to them and compare revenue and responses to some general control group. These high-end segments should drastically out-perform the business as usual (BAU) group.
A common next step (depending on strategy, etc.) might be promotional testing. This would usually follow with elasticity modelling by segment. Often one or more segments are found to be insensitive to price and one or more segments are found to be sensitive to price. The test here is to offer promotions and determine if the segment insensitive to price will still purchase even with a lower discount. This means the firm does not have to give away margin to get the same amount of purchases.
Other typical tests revolve around product categories, channel preference and messaging. A full factorial design could get much learning immediately and then marcom could be aimed appropriately. The general idea is that if a segment is, say, heavily penetrated in product X, send them a product X message. If a segment might have a propensity for product Y (given product X) do a test and see how to incentivize broader category purchases. The next chapter will go through a detailed example of what this testing might mean.
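For readers who have not laid out a full factorial design before, the sketch below enumerates one in Python. The three factors and their two levels each are assumptions chosen to echo the tests just described (product featured, channel and message style); a real plan would use whatever factors the strategy calls for.

```python
from itertools import product

# Hypothetical test factors and levels
factors = {
    "product_featured": ["X", "Y"],
    "channel": ["e-mail", "direct mail"],
    "message_style": ["transactional", "relationship"],
}

# A full factorial design has one test cell for every combination of levels
test_cells = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for number, cell in enumerate(test_cells, start=1):
    print(number, cell)
```

With three two-level factors this gives 2 x 2 x 2 = 8 cells, and the count multiplies quickly as factors or levels are added, which is worth weighing against the substantiality of each segment being tested.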
Checklist
You’ll be the smartest person in the room if you:
Point out that segmentation is a strategic, not an analytic, exercise.
Remember that segmentation is mostly a marketing construct.
Argue that segmentation is about what’s important to a consumer, not what’s important to a firm.
Recall that segmentation gives insights into marketing research, marketing strategy, marketing communications and marketing economics.
Observe the four Ps of strategic marketing: partition, probe, prioritize and position.
Uncompromisingly demand that RFM be viewed as a service to the firm, not a service to the consumer.
Require each segment to have its own story or rationale for why it is a segment. There should be a different strategy levelled at each segment, otherwise there is no point in being a segment.
08
Segmentation: tools and techniques
Overview
Metrics of successful segmentation
General analytic techniques
Business case
Analytics
Comments/details on individual segments
K-means compared to LCA
Highlight: Why Go Beyond RFM?
Segmentation techniques
Checklist: You’ll be the smartest person in the room if you…
Overview
The previous chapter was meant to be a general/strategic overview of segmentation. This chapter is designed to show the analytic aspects of it, which is the heart of the segmentation process. Analytics is the fulcrum of the whole project.
A few books to note, in terms of the analytics of segmentation, would be Segmentation and Positioning for Strategic Marketing Decisions by James H. Myers (1996), Market Segmentation by Michel Wedel and Wagner A. Kamakura (1998) and Advanced Methods of Market Research, edited by Richard P. Bagozzi (2002), especially the chapters ‘The CHAID Approach to Segmentation Modelling’ and ‘Cluster Analysis in Market Research’. Note also the papers of Jay Magidson (2002) from the Statistical Innovations website (www.statisticalinnovations.com).
Metrics of successful segmentation
As mentioned earlier, the general idea of successful segmentation is ‘homogeneous within and heterogeneous between’. There are several possible approaches to quantifying this goal. Generally, a ratio compares those members in the segment with all those members not in the segment, and the smaller the better. This helps us to compare a 3-segment solution with a 4-segment solution, or a 4-segment solution using variables a–f with a 4-segment solution using variables d–j. SAS (via proc discrim) has the ‘log of the determinant of the covariance matrix’. This is a good metric to use in comparing solutions even if it’s a badly-named one.
General analytic techniques
Business rules
There may be a place for business-rule segmentation. If data is sparse, under populated, or very few dimensions are available, there’s little point trying to do an analytic segmentation. There’s nothing for the algorithm to operate on.
I (again) caution against a managerial fiat. I have had managers who invested themselves in the segmentation design. They have told me how to define the segments. This is typically flawed. I wouldn’t say to ignore management’s knowledge/intuition of their market and their customers. My advice is to go through the segmentation process, do the analytics and see what the results look like. Typically the analytic results are appealing and more compelling than managerial judgment. This is because a manager’s dictum is around one or two or at most three dimensions, arbitrarily defined. But the analytic output optimizes the variables and separation is the mathematical ‘best’. It would be unlikely that one person’s intuition could out-perform a statistical algorithm. I would even say that if an analytic output is very different from a manager’s point of view, that manager has a lot to learn about his own market. The statistical algorithm encourages learning. Most often managerial fiat is about usage (high, medium and low), satisfaction, net profit, etc. None of these require/allow much investigation into WHY the results are what they are. None of these require an understanding of consumer behaviour.
This is why RFM (recency, frequency, and monetary) is so insidious. It is a business rule, it’s appealing, it is based on data and it works. It is ultimately a (typically financial) manager’s point of view. It does not encourage learning. Marketing strategy is reduced to nothing more than migrating lower value tiers into higher value tiers.
A good overview of segmentation, from the managerial role and not the analytical role, is Art Weinstein’s book, Market Segmentation (1994), which provides a good discussion of segmentation based on business rules.
CHAID
CHAID (chi-squared automatic interaction detection) is an improvement over AID (automatic interaction detection). Strictly speaking, CHAID is a dependent variable technique, NOT an inter-relationship technique. I’m including it here because CHAID is often used as a segmentation solution.
This brings us to the first question: ‘Why use a dependent variable technique in terms of segmentation?’ My answer is that it is inappropriate. A dependent variable technique is designed to understand (predict) what causes a dependent variable to move. By definition, segmentation is not about explaining the movement in some dependent variable.
OK. How does it work? While there are many variations of the algorithm, in general it works the following way. CHAID takes the dependent variable, looks at the independent variables and finds the one independent variable that ‘splits’ the dependent variable best. ‘Best’ here is per the chi-squared test. (AID was based on the F-test, which is the ratio of explained variance over unexplained variance and is used (in modelling) as a threshold that proves the model is better than random.) It then takes that (second level) variable and searches the remaining independent variables to test which one best splits that second level variable. It does this until the number of levels assigned is reached, or until there is no improvement in convergence.
Below is a simple example (Figure 8.1). Product revenue is the dependent variable and CHAID is run and the best split is found to be income. Income is split into two groups: high income and low income. The next best variable is response rate, where each income level has two different response rates. High income is split into response rate > 9% and response rate between 6% and 9%. Low income is split into response rate < 2% and response rate between 2% and 6%. Thus this simplified example would show four segments: high income high response, high income medium response, low income medium response and low income low response.
Figure 8.1 CHAID output
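To give a feel for how the ‘best split’ is chosen, here is a deliberately simplified Python sketch of one CHAID-style step. It is not a full CHAID implementation: it assumes the dependent variable has already been binned into categories (high and low revenue), it compares the raw Pearson chi-squared statistic across candidate predictors, and it skips the category merging and adjusted significance tests that real CHAID software applies. Comparing raw statistics like this is only fair when the candidates have the same number of categories, as they do here. The data and variable names are made up.

```python
from collections import Counter

# Hypothetical customers with a binned dependent variable (revenue) and two candidate predictors
rows = (
    [{"income": "high", "channel": "online", "revenue": "high"}] * 30
    + [{"income": "high", "channel": "offline", "revenue": "low"}] * 10
    + [{"income": "low", "channel": "online", "revenue": "low"}] * 25
    + [{"income": "low", "channel": "offline", "revenue": "high"}] * 15
)


def chi_squared(data, predictor, target):
    """Pearson chi-squared statistic for the predictor x target contingency table."""
    cells = Counter((r[predictor], r[target]) for r in data)
    row_totals = Counter(r[predictor] for r in data)
    col_totals = Counter(r[target] for r in data)
    n = len(data)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat


def best_split(data, predictors, target):
    """Pick the predictor with the largest chi-squared statistic against the target."""
    return max(predictors, key=lambda p: chi_squared(data, p, target))


first_split = best_split(rows, ["income", "channel"], "revenue")
print(first_split, round(chi_squared(rows, "income", "revenue"), 2),
      round(chi_squared(rows, "channel", "revenue"), 2))
```

A tree is then grown by repeating the same search inside each branch of the chosen split, using the remaining predictors.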
The advantages of CHAID are that it is simple, easy to use and easy to explain. It provides a stunning visual to show how to interpret its output.
The disadvantages are many. First, it is not a model in the statistical/mathematical sense of the word, but a heuristic, a guide. This means the analysis tends to be unstable; that is, different samples can produce wildly different results. There are no coefficients that show significance, there are no signs on the variables (positive or negative) and there is no real measure of fit.
CHAID is a popular technique, due to its ease and simplicity. I would offer it is not appropriate for segmentation. Its best use is probably in terms of data exploration. I would caution, however, that this can become a crutch and might encourage you to bypass your own brain. I remember when someone who worked for me was assigned to build a regression model. She had CHAID on her PC so she was running all kinds of CHAID output and had many pages of tree diagrams. After a while I asked how it was going and she was still exploring the data. She had hundreds of variables and she said she had no real idea about what caused what. She claimed she needed CHAID to mine the data because she had no clue what variables might cause/explain the movement in the dependent variable. I told her that if she, as the analyst, truly had no idea whatsoever as to what might cause or explain the movement in the dependent variable (in this case sales) then she was not the right person to do the model. As analyst you MUST have some idea of the data-generating process and you MUST have some idea about ‘this causes that’, eg price changes cause changes in demand. So, use CHAID for designing structure, not explaining causality.
Hierarchical clustering
Hierarchical clustering IS an inter-relationship technique. It also has a graphical display but unlike CHAID it is NOT visually appealing.
Hierarchical clustering calculates a ‘nearness metric’, a type of similarity via some inter-relationship variables. There are many options for how to do this but conceptually the idea is that some observations (say customers) are ‘close to each other’ based on some similar variables. Then a dendrogram (a horizontal tree structure) is produced and the analyst chooses how to divide the resultant graphics. See Figure 8.2.
Figure 8.2 Hierarchical clustering – dendrogram
Note that, for instance, observations 34 and 56 are joined together (because they are similar) and these are next joined to observation 111. Now there are three observations in this cluster. As the number of observations increases the graphic is less and less usable. One disadvantage is that the analyst is required to (arbitrarily) decide where to break the clusters off. That is, it ultimately is up to the analyst to choose how many and which observations are in the final clusters. Arbitrary choice is NOT based on analytics, but intuition.
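The joining process just described can be sketched in a few lines of Python. This is one common variant only (agglomerative clustering with single linkage and Euclidean distance), chosen here for brevity; the coordinates attached to observations 34, 56 and 111, and the extra observation 7, are invented so that the merge order mirrors the example.

```python
import math

# Hypothetical observations: label -> values on two clustering variables
points = {34: (1.0, 1.0), 56: (1.2, 1.1), 111: (2.0, 1.5), 7: (8.0, 8.0)}


def single_linkage(obs):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = {label: [label] for label in obs}
    history = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(obs[p], obs[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        history.append((sorted(clusters[a]), round(d, 3)))
    return history


merges = single_linkage(points)
for members, distance in merges:
    print(members, distance)
```

The merge history is exactly what a dendrogram draws. The algorithm itself never says where to stop; deciding at which distance to cut the tree is the arbitrary choice complained about above.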
An advantage of hierarchical clustering is it calculates the distance of every observation from all other observations, so the starting ‘seeds’ are mathematically distinct. Often hierarchical clustering is used for nothing else than these starting seeds as an input into another algorithm. Note well James H. Myers’ book on segmentation (Myers, 1996), which has a very good and conceptual treatment of hierarchical clustering.
K-means clustering
K-means is probably the most popular (analytic) segmentation technique. SAS (using proc fastclus) and SPSS (using partitioning) have very powerful algorithms to do K-means clustering. K-means is easy to do, fairly easy to understand and explain and the output is compelling. K-means works and has been in use for over 50 years.
K-means was invented by zoologists in the 1960s for phylum classification. While E W Forgy, R C Jancey and M R Anderberg were early algorithm designers (1960s) it was James MacQueen (1967) who coined the term ‘K-means’. It’s called K-means because K is the number of clusters and the centroids are the means of the clusters. Note they were trying to decide, based on an animal’s (particularly a butterfly’s) characteristics, to which phylum they belonged. They wanted an algorithm for taxonomy.
The general algorithm (and as with all other techniques, there are various versions) is as follows:
1. Set up: choose the number of clusters, choose some kind of ‘maximum distance’ to define cluster membership and choose which clustering variables to use.
2. Find the first observation that has all the clustering variables populated and call this cluster 1.
3. Find the next observation that has all the clustering variables populated and test how far away this observation is from the first observation. If it’s far enough away then call this cluster 2.
4. Find the next observation that has all the clustering variables populated and test how far away this observation is from the first and second observations (clusters). If it’s far enough away then call this cluster 3. Continue with steps 2–4 until the chosen number of clusters is defined.
5. Go to the next observation, test which cluster it is closest to and assign that observation to that cluster.
6. Continue with step 5 until all observations that have the clustering variables populated have been assigned.
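The six steps above can be sketched in a few lines of Python. This is only an illustration of the single-pass version described here, assuming Euclidean distance; the function name, the `min_seed_distance` parameter and the sample rows are hypothetical, and packaged routines typically go further by recomputing each cluster's mean and reassigning.

```python
from math import dist

def seed_and_assign(rows, k, min_seed_distance):
    """Single-pass clustering following steps 1-6 as described in the text."""
    # Only observations with every clustering variable populated are eligible
    complete = [r for r in rows if all(v is not None for v in r)]
    seeds = []
    for r in complete:                      # steps 2-4: walk down the file
        if len(seeds) == k:
            break
        # a new seed must be far enough from every seed chosen so far
        if all(dist(r, s) >= min_seed_distance for s in seeds):
            seeds.append(r)
    # steps 5-6: assign each complete observation to its nearest seed
    labels = [min(range(len(seeds)), key=lambda c: dist(r, seeds[c]))
              for r in complete]
    return seeds, labels

# Hypothetical customers on two clustering variables; one has missing data
rows = [(1, 1), (1.5, 1), (9, 9), (None, 2), (5, 5), (8.5, 9), (1, 2)]
seeds, labels = seed_and_assign(rows, k=3, min_seed_distance=3.0)
print(seeds)   # [(1, 1), (9, 9), (5, 5)]
print(labels)  # [0, 0, 1, 2, 1, 0]

# Same data, reversed order: different seeds are picked
print(seed_and_assign(rows[::-1], k=3, min_seed_distance=3.0)[0])
```

Reversing the file changes which observations become seeds, so the clusters depend on the order of the data.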
There are several good things about this algorithm. It is very fast and can handle a large amount of data. It works. It will achieve some kind of separation.
There are many disadvantages. Personally, I HATE the arbitrariness of what the analyst must decide. As stated above, the analyst tells the algorithm how many clusters to form (as if he knows). There is little (analytically) to base this important criterion on. Second, he has to tell the algorithm what variables to use to define the clusters. Again, as if he knows which variables matter. This is an extremely important choice. The clusters are DEFINED based on this arbitrary choice.
Another disadvantage with K-means is that there are no real diagnostics on how well it fits, how well it predicts and how well it scores those observations (customers) into each segment. Because it is based on Euclidean distance (the square root of the summed squared differences across the clustering variables), each observation is simply placed in the segment it is ‘closest to’. There is no likelihood metric. Suppose a customer is new on file, or has some unusual behaviour. This customer might not exhibit real segment behaviour but is placed somewhere, regardless.
Because of these arbitrary choices (and the fact that K-means gives no diagnostics to aid these choices) most clustering projects end up with the analyst generating many solutions. He will do a 4 and a 5 and a 6 and a 7 and an 8-cluster solution. He will use in each variables 1–5 and then variables 5–10 and then variables 10–12, etc. Because there are no real diagnostics to guide him he will output reams of paper and share these piles of profiles with his peers and the ultimate users of the segmentation and basically throw up his hands and say, ‘What do you think? Which of these 20 outputs do you like the best?’ And then maybe somebody will decide what they like, typically for strategic reasons. Note the subjectivity here?
Another obvious disadvantage (given the algorithm above) is that if the order of the dataset is different, the K-means solution will be different. Some algorithms improve on this by not just going down the list, but taking a random observation as each starting seed. This is better, but the same problem remains. Re-order, or re-do, the algorithm – with the same number of clusters and the same variables – and the output will be (very) different. This should strike all analytic people as a great problem.
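A last problem with K-means, at least in the single-pass form described above, is that it is not an optimizing algorithm in any global sense. The pass assigns each observation to its nearest seed and stops; it has no generally controlling objective. (Iterative versions do reduce the within-cluster sum of squares, but only to a local minimum that depends on the starting seeds.)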
Therefore, I would suggest that K-means is not a viable option for actionable segmentation. The algorithm is too arbitrary and the output is subjective, something most good analysts abhor.
Latent class analysis
Latent class analysis (LCA) is a massive improvement on all the above. It is now the state of the art in segmentation. To me, the best software for this is Latent Gold from Statistical Innovations. Jay Magidson is a genius and has written some of the best articles on it. Especially see ‘A nontechnical introduction to latent class models’ (2002) and ‘Latent class models for clustering: a comparison with K-means’ (2002).
LCA takes a completely different view of segmentation. In K-means the variables define the segments; LCA instead assumes the scores on the variables are caused by the (hidden) segment. That is, LCA posits a latent (categorical) variable (segment membership) that maximizes the likelihood of observing the scores seen on the variables.
It then runs this taxonomy and creates a probability of each observation belonging to each segment. The segment that has the highest probability is the segment into which the observation is placed. This means LCA is a statistical technique and not a mathematical technique (like hierarchical or K-means clustering).
There are some disadvantages of LCA. SAS does not do it, at least not as a proc. SPSS does not do it either: you have to buy special software. Statistical Innovations created Latent Gold, which has probably become the gold standard (get it, ‘gold’?). It also requires some training and some expertise, but Latent Gold is menu driven and very easy to use. Also, as with the light bulb, you do not have to understand all of the intricate details in order to use it. Some training is required, but the results are well worth it.
The advantages have been alluded to but, just to be clear, LCA has a LOT of advantages. Ultimately segmentation’s usefulness is about strategy. The more distinct the segments, the more obvious the strategy to be levelled at each segment becomes.
However, there are several important analytic advantages, especially in the way Latent Gold articulates the algorithm. First, LCA tells you the optimal number of segments. You do not have to guess. LCA uses the BIC (Bayesian Information Criterion), the –LL (negative log likelihood) and the error rate to give you diagnostics as to what is the ‘best’ number of segments given these scores on these variables and this dataset.
Second, LCA gives indications as to which variables are significant in the segmentation solution. You do not have to guess. Any variable that has an R² < 10% can be deemed insignificant.
Third, LCA produces an output that scores every observation with the probability of belonging to each segment. If observation #1 has a probability of belonging to segment 1 of 95% and a probability of belonging to segment 2 of 5%, it’s pretty obvious to which segment that observation belongs. Observation #1 exhibits very strong segment 1 behaviour. But what about observation #2, which has a probability of belonging to segment 1 of 55% and a probability of belonging to segment 2 of 45%? This observation does not demonstrate very strong segment behaviour, for any segment. Under K-means this observation would likely be assigned to segment 1. But LCA gives you a diagnostic. Typically some assumption should be made. It’s usually something like: any observation that does not score at least 70% likelihood of belonging to any segment should be eliminated from the output. Those observations are placed in some other bucket to be dealt with in some other way. There should not be more than 5% of these outliers, given most marketing models are at 95% confidence. A good solution will have far less than 5% outliers.
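The assignment rule just described is easy to express in code. The sketch below assumes the posterior probabilities have already been exported from whatever LCA software was used; the function name and the list-of-lists layout are hypothetical, and the 70% threshold is the rule of thumb from the text.

```python
def assign_segments(posteriors, min_probability=0.70):
    """Place each observation in its most probable segment, if probable enough.

    posteriors: one list of segment-membership probabilities per observation.
    Returns 1-based segment numbers, or None for observations that do not
    reach the threshold for any segment (the 'other bucket').
    """
    assignments = []
    for probs in posteriors:
        best = max(range(len(probs)), key=lambda s: probs[s])
        assignments.append(best + 1 if probs[best] >= min_probability else None)
    return assignments

posteriors = [[0.95, 0.05], [0.55, 0.45]]  # observations #1 and #2 from the text
assignments = assign_segments(posteriors)
outlier_share = assignments.count(None) / len(assignments)
print(assignments)    # [1, None]
print(outlier_share)  # 0.5 in this two-row toy; a good solution is far below 0.05
```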
These diagnostics make the analytics very fast and very clean. They also make the segmentation solution very distinct. As mentioned, this is the hallmark of a good segmentation solution: distinctiveness. But this is not just valuable for the analyst; it is of utmost importance to the strategist. The more distinct the segmentation solution, the clearer each strategy becomes.
Table 8.1 Latent class analysis
RFM CHAID K-means LCA
Multivariable XX XX XX XX
Customer-centric XX
Multivariate XX XX
Probabilistic XX
BUSINESS CASE
Scott’s boss called him into the office. He looked around while his boss played with the phone, which always irritated Scott.
‘So Scott’, his boss said, grudgingly looking up from his smartphone. ‘We are ready to make a major push in consumer strategy. We’ve added consumer electronics to our product mix and now want to dive deeper.’
‘That sounds good. What does that mean for my group?’
‘We’d like to explore versioning our direct mail catalogues, positioning our e-mails more strategically, etc. We all remember your ONE SIZE DOES NOT FIT ALL speech at the offsite last quarter.’
‘Yeah, sorry, there had been a few cocktails and…’
‘No, it’s right on. We’re talking about initiating a customer market segmentation project and you are slated to lead it.’
Scott gulped. That would be a lot of work. It would be a lot of fun and very visible. ‘I’ll start putting a team together and begin to go through the process.’
Scott went back to his office (he’d been promoted by now) and sketched out a process, outputting a segmentation based on consumer behaviour. He wrote on his whiteboard a list of steps and then invited stakeholders to a collection of meetings. They were starting a big project: customer segmentation.
Strategize
The first step in behavioural segmentation is to strategize. This tends to be a view from two lenses: marketing strategy and consumer behaviour. These two should not be contradictory.
Scott’s team met and there was some discussion, but Scott provided leadership on goals based on the mantra of Peter Drucker, the legendary management guru who created business management as a distinct and separate discipline. Drucker said there are only three metrics that make any business sense: increasing revenue, increasing customer satisfaction and decreasing expenses. If you are working on a project that cannot tie to at least one of these metrics you should ask yourself whether you really should be doing that project. Scott’s team decided their marketing strategy for the segmentation would be increasing net profit margin. The whole point for each segment was strategizing cross-sell/up-sell opportunities. This was a departure from last year’s strategy of mostly acquiring customers. They realized how expensive acquisition can be.
In terms of consumer behaviour, Scott’s team hypothesized potential consumer segments. There would likely be one or more generally sensitive to price, one or more having different product penetrations, one or more reacting to compelling messages designed for them and one or more that prefer one channel over another. This is just using tactical marketing (product, price, promotion and place) differentially against each segment.
The real issue was in terms of behaviour. They talked long about what caused the behaviours they would see. They rationalized there might be a consumer segment heavily into games and entertainment, or another consumer segment very high tech/web-centric/early adopters, etc. There might be another segment needing a relationship, more on the low-tech side, needing their hands held through the techno-babble. They knew most of their (behavioural) data would be transactions and marcom responses.
So the team thought that, given the marketing strategy of increasing net revenue and the various potential consumer behaviour segments, a strategy could be levelled differently at each segment. That is, a completely different communication style would be used on, say, a price-sensitive, low-tech consumer as opposed to a heavy gamer. Scott thought there was a lot of excitement and buy-in for this output.
Collect behavioural data
Scott went to his database team and they talked about what data they had. First they had to define a consumer (as opposed to a small business, eg, a sole proprietorship) but that was fairly straightforward. Then they talked about data.
Scott wanted behavioural data, specifically transactions and marcom responses. They talked about two or three years of history. The PC consumer business has a strong seasonality (peaking in August and even more in December) and Scott had already learned how seasonality had to be taken into account.
In terms of transactions, the issue was what kind of granularity was needed. They decided they needed only broad product categories – laptops, desktops and workstations (very few consumers would buy a server) – and would only go one level below this, eg, high-end desktop versus scaled-back desktop, and so on. They’d add consumer electronics, which included televisions, printers, software (personal productivity, games, etc.), digital cameras, accessories, etc. They’d include product details as well as gross revenue and discounts applied, net revenue, number of purchases, time between purchases, months the product(s) were purchased, etc.
Thinking about marcom responses (a sign of behaviour and an indication of engagement) they talked about both direct mail and e-mail. They would mostly ignore social media/in-bound marketing because of difficulty in matching customers, and web banner/advertising (again, it cannot be tied directly to a particular customer). They knew to whom they sent a catalogue, when they sent it, what was on the cover and what offers/promotions were inside each one. Each catalogue had a unique 800 phone number, so when the customers rang, the call centre would know which catalogue had driven (at least) that inquiry. Promotions used online were also unique to each catalogue. The same data was available for e-mail. Each was sent to a particular e-mail address and they could keep track of each open and click, etc. So again, there was a lot of data.
Collect additional data
The next step was to collect additional data. This could come from several possible sources. It could come from creating/deriving data from the database. It could come from overlay data and from primary market research data.
From the consumer database they created additional variables. These included monthly dummy variables for seasonality. They calculated time between purchases, they derived typical market baskets and they put together share of products, that is, percent of desktops, percent of consumer electronics, and so on.
They purchased overlay data. This included both demographics (such as age, education, income, gender, size of household and occupation) as well as lifestyle and interest variables. They hoped these would flesh out the segments. This data was pretty well matched, at about 80% to their consumer database.
There was a limited amount of primary marketing research but Scott found a few studies that could be helpful (especially in the Probe phase of the four Ps of strategic marketing). They had done a customer satisfaction study and an awareness study. These studies each took customer names from the database and, while not well represented, could be matched to the transaction file.
Analytics
Collect data and sample
Note there are two kinds of variables in this environment: segmenting variables and profiling variables. Segmenting variables are those used to create the segments, while profiling variables are everything else. The primary marketing research data will be profiling variables, as they are too underpopulated to be used as segmenting variables. Most of the demographics will be profiling variables, as demographics are typically not useful in defining segments. But the other (behavioural) variables will go through the algorithm and be tested as to whether or not they are significant and, if so, will be kept as segmenting variables. Note that anything that is not a segmenting variable will be a profiling variable.
What’s next is what Scott has been most looking forward to: the analytics. There are several steps in this process and they are all enjoyable.
So first he would have to take a sample. LCA cannot operate on millions (or even hundreds of thousands) of records. The algorithm would take years to converge. So he chooses a random sample of, say, 20,000 customer records. These records have been matched with transactions and marcom responses, derived data and overlay data and (where possible) primary marketing research data.
Usually there is no need to worry about oversampling (a certain variable) or stratifying, etc.
Oversampling: a sampling technique forcing a particular metric to be overrepresented (larger) in the sample than in simple random sampling. This is done because a simple random sample would produce too few of that particular metric.
Stratifying: a sampling technique choosing observations based on the distribution of another metric. This is done to ensure the sample contains adequate observations of that particular metric.
In typical consumer marketing a simple random sample is fine. Take a look at any good general statistics book for sampling, etc., such as Statistical Analysis for Decision Making, by Morris Hamburg (1987).
Normalize
Now, even though not strictly necessary, is the time to weed out non-normality. I like to do this step to ensure against strange or weird data elements. So, there are two stages.
The first stage is simply to test every variable for ‘non-normality’. This generally means standardizing each variable (taking its z-score), then deleting any observation that has a score beyond +/–3.0 standard deviations. (In a normal distribution about 99.7% of the observations fall within three standard deviations of the mean, so anything beyond that is very NON-normal.) These are clearly non-normal data elements and there should not be very many of them. Some people replace these outliers with the mean but if there are enough observations this is not necessary and a little too arbitrary for my taste.
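A minimal sketch of this first stage, using only the Python standard library. It assumes the population standard deviation and a hard cutoff of 3.0; the function name and the toy rows are hypothetical.

```python
from statistics import mean, pstdev

def drop_univariate_outliers(rows, cutoff=3.0):
    """Drop any row with a z-score beyond +/- cutoff on any variable."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    kept = []
    for row in rows:
        # a constant column has zero spread, so treat its z-scores as 0
        z = [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        if all(abs(score) <= cutoff for score in z):
            kept.append(row)
    return kept

# Hypothetical sample: 19 ordinary customers and one extreme value
rows = [(10 + i % 3, 5) for i in range(19)] + [(500, 5)]
kept = drop_univariate_outliers(rows)
print(len(rows), "->", len(kept))  # 20 -> 19
```

In a small sample a single extreme value inflates the standard deviation it is judged against, so the rule works best on the thousands of records used here.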
For the second stage I will have to ask you to make sure you’re sitting down. Remember how I’ve clamoured about how bad K-means is and how it’s not a good solution? Well now I’m asking you to use K-means to test for normality.
The idea is to run K-means with a LOT of clusters, like 100 or so. Use the (typically behavioural) variables that make most sense to you in defining the clusters. All we are trying to do is form clusters that are unusual in terms of behavioural motivations. So now with, say, 100 clusters, those clusters that are very small (like having only a few customers in them) are by multivariate definition ‘unusual’. These observations should be eliminated. The point is that while we’ve looked at any single variable being unusual, this technique uses a multivariable approach to find a group of customers moving in such a way as to be non-normal. That’s why these observations (customers) are deleted from further analysis.
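Once the many-cluster K-means run has produced a cluster label for each customer, removing the tiny clusters is a simple count-and-filter. In this sketch the labels are assumed to come from whatever K-means routine was run; the `min_size` of 5 is a hypothetical reading of ‘only a few customers’.

```python
from collections import Counter

def drop_small_clusters(labels, min_size=5):
    """Return indices of observations whose cluster has at least min_size members."""
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] >= min_size]

# Hypothetical labels: two ordinary clusters and one with only three customers
labels = [0] * 40 + [1] * 57 + [2] * 3
keep = drop_small_clusters(labels)
print(len(labels), "->", len(keep))  # 100 -> 97
```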
Note that we are trying to understand the normal market. That’s why there is effort put forth to detect non-normality. Because we have a sample it’s even more important to ascertain unusual scores on variables or unusual customer behaviour and eliminate it.
So, let’s say that Scott and his team did the above process and their sample went from 20,000 to 18,000. Then he randomly splits this 18,000 into two files, A and B. This will be a test file and a validation file for later.
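The random split into files A and B might look like the following. The fixed seed is an assumption added so the split can be reproduced; the customer IDs are stand-ins.

```python
import random

def split_in_two(records, seed=42):
    """Shuffle a copy of the records and cut it in half."""
    rng = random.Random(seed)   # fixed seed: the same split every run
    shuffled = records[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

customer_ids = list(range(18000))          # stand-in for the cleaned sample
file_a, file_b = split_in_two(customer_ids)
print(len(file_a), len(file_b))            # 9000 9000
```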
Run LCA
Now Scott feeds test file A into the software and is ready to run LCA. He first chooses to run solutions with 2 through 9 segments, just to narrow down where things are. LCA shows diagnostics (BIC, LL, etc., see above) to help with the optimal number of segments (see Table 8.2). Note that the BIC goes down and is at a minimum at six segments. This tells Scott six segments are probably the right number. The BIC is the Bayesian Information Criterion. Think of it as a measure of error: it starts from the negative log likelihood and adds a penalty for the number of parameters, so the smaller the value the better. Whichever solution has the smallest BIC (the least error in terms of predicting membership) is the better one.
Table 8.2 Bayesian Information Criterion
Solution     BIC
2 cluster    92,454
3 cluster    79,546
4 cluster    61,565
5 cluster    59,605
6 cluster    58,456
7 cluster    58,989
8 cluster    59,650
9 cluster    60,056
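As a quick illustration of how the BIC is read, the sketch below uses the common textbook definition, BIC = –2LL + p·ln(n), where p is the number of parameters and n the number of observations. Individual software packages may report it slightly differently, so treat the function as an assumption. The values in the dictionary are the ones from Table 8.2.

```python
from math import log

def bic(log_likelihood, n_parameters, n_observations):
    """Textbook BIC: fit (-2 * LL) plus a penalty for model size."""
    return -2.0 * log_likelihood + n_parameters * log(n_observations)

# BIC by number of segments, copied from Table 8.2
bic_by_segments = {2: 92454, 3: 79546, 4: 61565, 5: 59605,
                   6: 58456, 7: 58989, 8: 59650, 9: 60056}

# The preferred solution is the one with the smallest BIC
best_k = min(bic_by_segments, key=bic_by_segments.get)
print(best_k)  # 6
```

Because the penalty grows with every added parameter, the BIC turns back up after six segments even though the likelihood itself keeps improving.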
Now he runs the second model, after deleting those variables that are insignificant, and comes up with Table 8.3.
Table 8.3 Bayesian Information Criterion: second model
Solution     BIC
3 cluster    64,466
4 cluster    56,550
5 cluster    41,058
6 cluster    40,611
7 cluster    57,089
8 cluster    58,067
The variables he uses also give diagnostics as to which are significant. Note Table 8.4 below, showing R² < 10% for most of the demographics. These Scott removes.
Table 8.4 List of variables removed
Variable                     R²
Age                          0.05
Education (years)            0.07
Income                       0.01
Size household               0.02
Occupation – blue collar     0.05
Occupation – white collar    0.04
Occupation – agriculture     0.02
Occupation – government      0.01
Occupation – unemployed      0.02
Ethnicity – Asian            0.02
Ethnicity – white            0.02
Ethnicity – black            0.01
This is part of the modelling exercise: put variables in, run the segment solutions, see where BIC is best, look at significance and remove those that are insignificant, etc. While this seems time consuming, it ends up being far faster than, say, K-means, mostly because there is absolutely a good solution at the end, not an arbitrary quagmire of undifferentiated clusters.
The variables that end up being significant include:
Figure 8.3 Significant variables
Note that these variables are behavioural, as expected. Revenue variables are not even tested, as they are the RESULT of behaviour. Demographics typically are not significant and are also not behavioural. Of course, any and all of these variables can be used for profiling.
The next step is to correct for white noise, using bivariate residuals. This step adds a large number of parameters and will slow the analysis down. Way down. Analytically, all three dimensions are nudged simultaneously: find the number of segments, find the significant variables and correct with bivariate residuals.
The next step is to mark those bivariate residuals. These are indications of some pattern remaining that the independent variables are not eliminating. The bivariate residuals should be checked down to about 3.84. This is the 95% level of confidence (remember the 95% z-score for linear models is 1.96, and 3.84 = 1.96 * 1.96, which is the corresponding chi-square critical value with one degree of freedom).
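The 3.84 cutoff and the marking step can be sketched as follows. The residual values and variable names are hypothetical, and the dictionary keyed by variable pair is only an assumed layout for whatever the LCA software exports.

```python
Z_95 = 1.96
CHI2_CUTOFF = Z_95 ** 2   # about 3.84, the 95% chi-square value with 1 df

def flag_bivariate_residuals(residuals, cutoff=CHI2_CUTOFF):
    """Return the variable pairs whose bivariate residual exceeds the cutoff."""
    return {pair: value for pair, value in residuals.items() if value > cutoff}

# Hypothetical residuals for three pairs of segmenting variables
residuals = {
    ("em_open", "em_click"): 12.7,
    ("dm_call", "cc_purch"): 4.1,
    ("em_open", "dm_call"): 0.9,
}
print(round(CHI2_CUTOFF, 2))                      # 3.84
print(sorted(flag_bivariate_residuals(residuals)))
```

The two pairs above the cutoff are the ones that still share some pattern the segments have not explained.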
The common last step is to run the second file through using the same number of segments, six, and the same variables found to be significant. Check the bivariate residuals and look at the two outputs. They should appear essentially the same. I usually do not statistically ‘test’ this sameness, I just look at it. I have never seen the two results to be different in any meaningful way.
Profile and output
The profile generally uses all the variables. Often there is a ‘top-down’ view and a ‘bottom-up’ view, or a strategy view and a tactical view, or a general view and a specific view. Below is the strategic, top-down or general view of the six segments. This lens puts the segments together, to compare and contrast, all at once, looking at KPIs.
Table 8.5 General view of six segments
                                      Seg 1    Seg 2    Seg 3    Seg 4    Seg 5    Seg 6
% of market                           30%      24%      19%      15%      9%       3%
% of revenue                          32%      39%      9%       17%      2%       0%
# Total purch                         14.49    25.64    8.88     18.17    7.95     9.65
Rev DT purch                          3,150    4,730    999      2,592    352      81
Rev NB purch                          2,320    720      680      1,152    630      168
Rev total purch                       6,281    9,786    2,742    6,811    1,393    1,154
# DM sent                             13.5     9.1      19.5     5.6      6.8      9.5
# EM sent                             15.9     17.8     9.1      12.9     15.5     12.8
# EM open                             1.4      3.2      0.4      4.5      1.7      2.6
# EM click                            0.1      0.4      0        2.3      0.3      0.2
# Prod purch call centre              3.6      2.6      8        0.9      2        3.9
# Prod purch online                   10.9     23.1     0.9      17.3     6        5.8
Education (years)                     19.1     12.9     11.8     17.9     13.8     13.8
$ Income                              185K     60K      45K      125K     15K      75K
% Q4 purchase                         25%      70%      83%      14%      15%      41%
Avg time between purch (months)       6.5      3.1      16.5     4.2      9.4      15.4
Avg time between web visits (weeks)   3.2      2.1      9.5      1.9      3.9      8.5
A few quick comments can be made on the above output. First is that some demographics are shown. This is typical. Remember that while demographics are not statistically significant in designing the segmentation, they might still be of use in fleshing out the segments (and advertisers seem to love demographics). The first stage is partitioning and the second stage is probing. Adding additional data is part of the probing stage.
Let’s look at the segmentation solution. Segment 1 is the largest in terms of market and each segment is successively smaller, with segment 6 the smallest at 3%. The story is how segment size compares to percent of revenue generated. Note that segment 2 contributes 39% of the revenue with only 24% of the market. Note that segment 5, conversely, is not pulling its fair share, having 9% of the market but generating only 2% of the revenue. These metrics begin to let Scott know where he should put his resources and which segments are ‘worth’ marketing to. See the graph below.
Figure 8.4 % of market vs % of revenue
* Does not add to 100% due to rounding.
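One simple way to put a number on ‘pulling its fair share’ is to divide each segment’s share of revenue by its share of market. The index itself is an illustrative addition, not something the text defines; the shares are taken from Table 8.5.

```python
# Percent of market and percent of revenue by segment, from Table 8.5
market = {1: 30, 2: 24, 3: 19, 4: 15, 5: 9, 6: 3}
revenue = {1: 32, 2: 39, 3: 9, 4: 17, 5: 2, 6: 0}

# An index above 1.0 means the segment earns more than its size would suggest
index = {seg: revenue[seg] / market[seg] for seg in market}

for seg in sorted(index, key=index.get, reverse=True):
    print(f"Segment {seg}: {index[seg]:.2f}")
```

Segment 2 comes out on top and segments 3, 5 and 6 fall well below 1.0, which is the same story the graph tells.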
Another story displays itself around channel preference. Segment 2 and segment 4 seem to be very web-centric, while segment 3 is NOT one that pursues online purchases. Segment 4 opens 4.5 of the 12.9 e-mails sent to them, whereas segment 3 opens 0.4 of the 9.1 e-mails sent to them. Segment 2 purchases 23.1 of their 25.64 products online (and segment 4 purchases 17.3 of their 18.17 products online) but again segment 3 purchases only 0.9 of their 8.88 products online. These are clear behavioural differences.
Segment 1 has the highest and segment 5 (mostly students, see below details) has the lowest income. Segment 1 has the most education and segment 3 the least education. The figures below show occupations and other demographics.
Comments/details on individual segments
A few notes and observations on each segment follow.
Segment 1
Segment 1 is the largest segment (30% of the market) and contributes 32% of the revenue.
Segment 1 purchases more desktops (3.5) and notebooks (2.9) than any other segment. They have a high penetration of productivity software (twice the average) and are probably heavily invested in smartphone and tablet ownership, which means they are very comfortable with high tech.
Segment 1 receives the second-highest number of direct mails and e-mails sent. It’s interesting to note, however, that they have the next-to-lowest ratio of e-mails clicked to e-mails opened, at about 7% (0.1 clicks on 1.4 opens).
Segment 1 has the largest size household (4.1) and the most (70%) white collar occupations. They have the highest income and highest education. They are youngish and probably could be called yuppies.
Segment 2
Segment 2 is the next-to-largest segment (24% of the market) and contributes more than their fair share of the revenue at 39%.
Segment 2 pays by far the highest desktop prices (75% above average) and has nearly four times higher than average gaming software purchases. They make almost no productivity purchases, but a lot of accessory (nearly three times average) and phone purchases (nearly twice average).
Segment 2 shows the next-to-highest number of e-mail opens and the highest number of products purchased online, 88% above average. This segment calls the call centre the next-to-lowest number of times from the catalogue but has the highest number of calls from e-mails, and they have the most online configurations.
This segment is the gamers! They tend to be young and single with the next-to-smallest size of household. They purchase all of the gaming accessories: headphones, joystick, etc.
Segment 3
Segment 3 makes up 19% of the customer market but only accounts for 9% of the revenue. This segment does not come close to pulling its weight.
Segment 3 purchases a large number of digital cameras (nearly twice average) and 50% more phones. When they do purchase they tend to buy low-end entry-level technology, which is one reason their revenue contribution is so low.
Segment 3 receives the highest number of catalogues and the lowest number of e-mails. This segment opens fewer and clicks less than any other. Segment 3 needs a (direct mail) discount in order to purchase.
Segment 3 calls from direct mail more and purchases from the call centre more than any other segment. Conversely, this segment calls from e-mail less and purchases online less than any other segment.
Segment 3 needs hand-holding. They are low tech and need a relationship to foster a purchase. They are more likely than any other segment to be African-American (35%), with a high percentage of blue collar and government occupations. This segment has the least education. They call the call centre with complaints more than any other segment and tend to purchase mostly during the Christmas season.
Segment 4
Segment 4 is 15% of the market and generates 17% of the revenue.
Segment 4 purchases the next-to-most desktops and next-to-most notebooks. They are very high tech, purchasing the most TVs, cameras, network and other accessories.
This segment has the highest e-mail opens and by far the highest e-mail clicks (over four times average) of any segment. They purchase fewer products from the call centre than any other segment and have the next-to-most products purchased online. They have the shortest time between web visits.
Segment 4 is very web-centric and probably believes ‘print is dead!’ They are more likely than any other segment to be Asian (21%), very high tech, with engineering white collar occupations. They would be early adopters, with the next-to-highest education compared to other segments. They ignore direct mail and make most of their purchases online.
Segment 5
This segment is the least successful, being 9% of the market but only pulling 2% of the revenue.
Segment 5 purchases low-end products (few desktops, largely notebooks), mostly during back-to-school sales and usually with a discount. They purchase nearly zero consumer electronics.
Segment 5 receives the next-to-least number of direct mails and makes the next-to-least call centre purchases.
Segment 5 appears to be mostly students, single, unemployed, low income, etc.
Segment 6
Segment 6 is only 3% of the market and generates <1% of the revenue.
Segment 6 really only purchases accessories and occasional items, spare parts, etc.
This segment is not really engaged in our brand, does not really respond to communications, etc. Segment 6 does not visit our website much and has one of the longest times between purchases (15.4 months, second only to segment 3). This segment might be a target to DE-market to. Note the high percentage of agricultural occupations.
Tables 8.6 and 8.7 present some details by segment, as referenced above.
Table 8.6 Details by segment
                                          Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6
% of market                               30%        24%        19%        15%        9%         3%
% of revenue                              32%        39%        9%         17%        2%         0%
Num DT purch                              3.5        2.2        1.11       2.88       0.88       0.09
Num NB purch                              2.9        1.2        0.85       1.44       1.05       0.21
Num electronics – TV purch                0.11       1.15       0.09       1.35       0.05       0.21
Num electronics – camera purch            0.02       0.05       1.06       1.88       0.24       0.45
Num electronics – printer purch           1.38       1.06       1.15       1.19       1.09       0.29
Num electronics – accessory purch         1.2        5.5        0.08       1.08       0.29       1.87
Num electronics – phone purch             0.03       1.21       0.99       0.89       0.09       0.35
Num electronics – sw – game purch         0.02       9.55       0.08       0.09       0.68       0.65
Num electronics – sw – productive purch   4.1        0.09       1.06       2.21       0.24       0.87
Num other – network purch                 1.1        1.02       1.54       2.89       1.98       0.87
Num other – accessories purch             0.11       1.55       0.22       1.59       1.08       1.54
Num other – other purch                   0.02       1.06       0.65       0.68       0.28       2.25
Num total purch                           14.49      25.64      8.88       18.17      7.95       9.65
Rev DT purch                              3,150      4,730      999        2,592      352        81
Rev NB purch                              2,320      720        680        1,152      630        168
Rev electronics – TV purch                127        1,811      104        1,553      30         242
Rev electronics – camera purch            7          15         371        658        60         158
Rev electronics – printer purch           207        105        173        179        82         44
Rev electronics – accessory purch         90         853        6          81         19         140
Rev electronics – phone purch             7          454        223        200        14         79
Rev electronics – sw – game purch         1          716        5          6          37         42
Rev electronics – sw – productive purch   308        2          80         166        18         65
Rev other – network purch                 61         97         85         159        109        48
Rev other – accessories purch             4          271        8          56         38         54
Rev other – other purch                   0          12         10         10         4          34
Rev total purch                           6,281      9,786      2,742      6,811      1,393      1,154
Table 8.7 Additional details by segment
                                      Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6
Number DM sent                        13.5       9.1        19.5       5.6        6.8        9.5
Number EM sent                        15.9       17.8       9.1        12.9       15.5       12.8
Number EM open                        1.4        3.2        0.4        4.5        1.7        2.6
Number EM click                       0.1        0.4        0          2.3        0.3        0.2
Number prod purch call centre         3.6        2.6        8          0.9        2          3.9
Number prod purch online              10.9       23.1       0.9        17.3       6          5.8
Number DM discount                    8.1        5.5        11.7       3.4        4.1        5.7
Number EM discount                    11.1       12.5       6.4        9          10.9       9
Number DM call                        1.2        0.8        15.9       0.2        3.9        9.5
Number EM call                        9.4        12.8       2.1        3.4        8.4        4.8
Num online config                     5.5        21.5       0.7        16.5       12.6       0.4
Number call centre purch              3.6        2.6        8          0.9        2          3.9
Number call centre complaint          2.1        0.9        5.6        3.2        1.2        0.5
Age                                   28.9       25.5       41.9       30.1       21.2       38.9
Education (years)                     19.1       12.9       11.8       17.9       13.8       13.8
Income                                185,000    60,000     45,000     125,000    15,250     75,000
Size hh                               4.1        1.2        3.9        3.7        1.1        3.1
Occupation – blue collar              20%        19%        60%        18%        13%        25%
Occupation – white collar             70%        38%        1%         65%        5%         35%
Occupation – agriculture              4%         5%         2%         1%         5%         18%
Occupation – government               3%         28%        25%        15%        15%        11%
Occupation – unemployed               1%         8%         10%        1%         60%        10%
Ethnicity – Asian                     15%        5%         2%         21%        7%         1%
Ethnicity – white                     55%        65%        35%        41%        70%        80%
Ethnicity – black                     20%        15%        35%        8%         10%        11%
Q1 purchase                           30%        4%         6%         20%        5%         1%
Q2 purchase                           25%        10%        5%         31%        5%         3%
Q3 purchase                           20%        15%        5%         33%        75%        55%
Q4 purchase                           25%        70%        83%        14%        15%        41%
Avg time between purch (months)       6.5        3.1        16.5       4.2        9.4        15.4
Avg time between web visits (weeks)   3.2        2.1        9.5        1.9        3.9        8.5
Naming the segments
One of the most enjoyable exercises ever is the naming of the segments. A common way to do it is through revenue and products. This is the desktop segment and this is the low-tech segment, etc. Another possibility is with marcom. This is the direct mail responders and this is the e-mail preference segment, etc. Both of these are probably too simplistic.
Each segment name should have only two or three words to describe it: desktop devotees, gamers, life starters, web-centrics, etc. The idea is to be descriptive as well as memorable.
K-means compared to LCA
The comparison below came from Scott’s debate with other analytic folks. Some of them had learned K-means and, because LCA was new to them, did not really understand or trust it. Therefore Scott ran LCA and told the K-means team the number of segments he found and which variables to use. Note that these two pieces of information (how many segments there are and which variables are significant) would never be information K-means would have. Thus he gave the K-means team two HUGE advantages. Each team ran the algorithm and produced the KPIs in Table 8.8.
Table 8.8 KPIs
LCA output                        Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6  hi/low
% of market                       30%        24%        19%        15%        9%         3%         12
% of revenue                      32%        39%        9%         17%        2%         0%         81.44
Num total purch                   14.49      25.64      8.88       18.17      7.95       9.65       3.23
Rev DT purch                      3,150      4,730      999        2,592      352        81         58.4
Rev NB purch                      2,320      720        680        1,152      630        168        13.81
Rev total purch                   6,281      9,786      2,742      6,811      1,393      1,154      8.48
Number DM sent                    13.5       9.1        19.5       5.6        6.8        9.5        3.48
Number EM sent                    15.9       17.8       9.1        12.9       15.5       12.8       1.96
Number EM open                    1.4        3.2        0.4        4.5        1.7        2.6        12.4
Number EM click                   0.1        0.4        0          2.3        0.3        0.2        124.04
Number prod purch call centre     3.6        2.6        8          0.9        2          3.9        8.8
Number prod purch online          10.9       23.1       0.9        17.3       6          5.8        25.99
Education (years)                 19.1       12.9       11.8       17.9       13.8       13.8       1.62
Income                            185,000    60,000     45,000     125,000    15,250     75,000     12.13
Q4 purchase                       25%        70%        83%        14%        15%        41%        5.93
Time between purch (months)       6.5        3.1        16.5       4.2        9.4        15.4       5.32
Time between visits (weeks)       3.2        2.1        9.5        1.9        3.9        8.5        5

K-means output                    Segment 1  Segment 2  Segment 3  Segment 4  Segment 5  Segment 6  hi/low
% of market                       24%        19%        17%        16%        15%        9%         2.67
% of revenue                      19%        15%        17%        19%        18%        13%        1.45
Num total purch                   14.1       17.7       16.2       14.8       16.9       17.2       1.26
Rev DT purch                      1,901      2,490      3,498      4,021      2,011      2,666      2.12
Rev NB purch                      1,344      1,108      1,655      1,100      1,100      911        1.82
Rev total purch                   4,992      5,006      6,271      7,509      7,489      9,200      1.84
Number DM sent                    10.1       11         11.2       12.8       12.9       15.1       1.5
Number EM sent                    11.9       15.2       16.4       15.2       14.9       15         1.38
Number EM open                    1.8        2.2        2.3        2.2        2.1        2.8        1.56
Number EM click                   0.61       0.66       0.54       0.52       0.51       0.26       2.54
Number prod purch call centre     3.1        3.6        3.7        3.9        3.4        4.9        1.58
Number prod purch online          9.1        10.2       12.4       17.1       13.5       13.6       1.88
Num total purch                   12.2       13.8       16.1       21.0       16.9       18.5       1.73
Education (years)                 16.3       16.4       15.1       13.1       15.3       15.5       1.25
Income                            109,655    109,166    98,066     98,054     97,112     88,055     1.25
Q4 purchase                       39%        34%        61%        44%        44%        55%        1.79
Time between purch (months)       6.6        7.5        7.7        9.1        8.1        7.9        1.38
Time between visits (weeks)       3.8        4.1        4.5        4.6        3.5        4.9        1.4
Notice in the top LCA table the variable ‘Num total purch’. This table shows the averages by segment. Segment 2 on average purchases the most items, with 25.64, and segment 5 purchases the fewest items on average, with 7.95. Look at the last column and see the high/low ratio: 25.64/7.95 = 3.23. That is a measure of range, or dispersion.
See the lower part of the table, which uses K-means. It is the same data, same number of segments and same variables used as significant. The high and low of Num total purch are much closer together than those from LCA. A high of 17.7 and a low of 14.1 give a ratio of only 1.26. This is a typical difference. K-means output would work; LCA is simply better, more distinct and ultimately produces a clearer strategy.
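The hi/low column is just the largest segment average divided by the smallest. A short check using the ‘Num total purch’ rows from Table 8.8 (the function name is hypothetical):

```python
def hi_low(values):
    """Ratio of the largest segment average to the smallest."""
    return max(values) / min(values)

# 'Num total purch' by segment, from Table 8.8
lca_total_purch = [14.49, 25.64, 8.88, 18.17, 7.95, 9.65]
kmeans_total_purch = [14.1, 17.7, 16.2, 14.8, 16.9, 17.2]

print(round(hi_low(lca_total_purch), 2))     # 3.23
print(round(hi_low(kmeans_total_purch), 2))  # 1.26
```

A row whose smallest rounded average is zero, such as the LCA e-mail clicks, cannot be checked this way without the unrounded figures.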
AnotherfairlycommonfindingcomparingK-meanstoLCAisintermsofsegmentsize.LCAproducessegmentsrangingfrom30%to3%,butK-meansrangesonlyfrom24%to9%.BecauseK-meansproducesroughlysphericalclustersandtheytendtobeofsimilarsize.Thereisnomarketingtheorythatwouldhypothesizethesegmentsshouldbeofaboutthesamesize.
Scott convinced the team that the LCA output was the obvious way to go.
Elasticity modelling
One very natural and helpful exercise after segmentation is to do elasticity modelling. (Remember Chapter 3 on demand went through the modelling detail.) This shows different price sensitivities by segment. That is, one segment will likely be sensitive to price and another segment will likely NOT be sensitive to price, etc. This allows for very lucrative strategies. Review earlier chapters for how elasticity modelling is typically done.
What Scott found was that segment 1 is not sensitive to price. This segment does not require a discount in order to purchase. He found conversely that segments 3 and 5 are very sensitive to price. These are the segments that will only buy with some kind of promotion.
Test and learn plan
The last step tends to be putting together some kind of testing plan. We will cover statistical details later in the book, but the concept is straightforward.
The idea is to corroborate the sensitivities the segmentation found. That is, if a segment is sensitive to price, test that. If a segment prefers a particular channel, test that, etc.
Usually selection is tested first, then promotion and then channel or product category, etc. These are usually in a test versus control situation.
HIGHLIGHT
WHY GO BEYOND RFM? (This article was published in a different format in Marketing Insights, April 2014)
Abstract
While RFM (recency, frequency and monetary) is used by many firms, it in fact has limited marketing usage. It is really only about engagement. It is valuable for a short-term, financial orientation but as organizations grow and become more complex a more sophisticated analytic technique is needed. RFM requires no marketing strategy and as firms increase in complexity there needs to be an increase in strategic planning. Segmentation is the right tool for both.
RFM has been a pillar of database marketing for 75 years. It can easily identify your ‘best’ customers. It works. So why go beyond RFM? To answer that, let’s make sure we all know what we’re talking about.
What is RFM?
One definition could be, ‘An essential tool for identifying an organization’s best customers is the recency/frequency/monetary formula.’ RFM came about more than 75 years ago for use by direct marketers. It was especially popular when database marketing pioneers (such as Stan Rapp, Tom Collins, David Shepherd and Arthur Hughes) started writing their books and advocating database marketing (as the next generation of direct marketing) nearly 50 years ago. It became a popular way to make a database build (an expensive project) return a profit. Thus, the most pressing need was to satisfy finance.
Jackson and Wang wrote, ‘In order to identify your best customers, you need to be able to look at customer data using recency, frequency and monetary analysis (RFM)…’ (Jackson and Wang, 1997). Again the focus is on identifying your best customers. But, it is not marketing’s job to just identify your ‘best’ customers. ‘Best’ is a continuum and should be based on far more than merely past financial metrics.
The usual way RFM is put into place, although there are an infinite number of permutations, ends up incorporating three scores. First, sort the database in terms of most recent transactions and score the top 20%, say, with a 5 and on down to the bottom 20% with a 1. Then re-sort the database based on frequency, maybe with the number of transactions in a year. Again, the top 20% get a 5 and the bottom 20% get a 1. The last step is to re-sort the database on, say, sales dollar volume. The top 20% get a 5 and the bottom 20% get a 1. Now, sum the three columns (R + F + M) and each customer will have a total ranging from 15 to 3. The highest scores are the ‘best’ customers.
Table 8.9 Customer totals
Customer ID R F M Total
999 3 2 1 6
1001 5 3 3 11
1003 4 4 2 10
1005 1 5 2 8
1007 1 4 1 6
1009 2 4 3 9
1010 3 4 4 11
1012 2 3 5 10
1014 3 1 5 9
1016 4 1 4 9
1017 5 2 3 10
1018 4 3 4 11
1020 4 4 3 11
1022 3 5 3 11
1024 2 4 2 8
1026 1 3 5 9
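The scoring routine described before Table 8.9 can be sketched in a few lines. This is a minimal illustration, assuming five hypothetical customers with made-up recency, frequency and monetary values (none of these numbers come from the table), and assuming ties and uneven quintiles are handled by simple rank order. The helper name `quintile_scores` is an invention for this example.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 5 (top 20% by rank) down to 1 (bottom 20%)."""
    n = len(values)
    # Indices ordered from best to worst.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 5 - (rank * 5) // n
    return scores


# Hypothetical data for five customers (not taken from Table 8.9).
days_since_last_purchase = [10, 200, 45, 90, 365]  # recency: fewer days is better
transactions_per_year = [12, 3, 8, 1, 5]           # frequency
sales_volume = [900, 1500, 300, 100, 700]          # monetary

r = quintile_scores(days_since_last_purchase, higher_is_better=False)
f = quintile_scores(transactions_per_year)
m = quintile_scores(sales_volume)

# Sum the three columns (R + F + M); totals range from 3 to 15.
totals = [ri + fi + mi for ri, fi, mi in zip(r, f, m)]
print(totals)
```

Note that nothing in the routine looks at why a customer scored as they did, which is the limitation the article goes on to discuss.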
Note that this ‘best’ is entirely from the firm’s point of view. The focus is not about customer behaviour, not about what the customer needs, why those with a high score are so involved or why those with a low score are not so engaged. The point is to make a (financial) return on the database, not to understand customer behaviour. That is, the motivation is financial and not marketing.
RFM works as a method of finding those most engaged. It works to a certain extent, and that extent is selection and targeting. RFM is simple and easy to use, easy to understand, easy to explain and easy to implement. It requires no analytic expertise. It doesn’t really even require marketers, only a database and a programmer.
Say you re-score the database every month, in anticipation of sending out the new catalogue. That means that every month each customer potentially changes RFM value tiers. After every time period a new score is run and a new migration emerges. Note that you cannot learn why a customer changed their purchasing patterns, why they decreased their buying, why they made fewer purchases or why the time between purchases changed. Much like the tip of an iceberg, only the blatant results are seen and RFM gives nothing in the way of understanding the underlying motivations that caused the resultant actions. There can be no rationale as to customer behaviour because the purpose of the algorithm used was not for understanding customer behaviour. RFM uses the three financial metrics and does not use an algorithm that differentiates customer behaviour.
Because RFM cannot increase engagement (it only benefits from whatever level of involvement, brand loyalty, satisfaction, etc. you inherited at the time – with no idea WHY) it tends to make marketers passive. There is no relationship building because there is no customer understanding. That is, because RFM cannot provide a rationale as to what makes one value tier behave the way they do, marketing strategists cannot actively incentivize deeper engagement.
RFM is a good first step, but to make a great step requires something beyond RFM. Marketers require behavioural segmentation in order to practise marketing.
What is behavioural segmentation?
Behavioural segmentation (BS) quickly followed RFM, due to the frustrations that RFM produced good, but not great, results. As with most things, complex analysis requires complex analytic tools and expertise. BS was put into place to apply marketing concepts when using a database for marketing purposes.
In order to institute a marketing strategy, there needs to be a process. Kotler recommended the four Ps of strategic marketing: Partition, Probe, Prioritize and Position. Partitioning is the process of segmentation.
While it’s mathematically true that partitioning only requires a business rule (RFM is a business rule) to divide the market into sub-markets, behavioural segmentation is a specific analytic strategy. It uses customer behaviour to define the segments and it uses a statistical technique that maximally differentiates the segments. James H. Myers even says, ‘Many people believe that market segmentation is the key strategic concept in marketing today’.
BS is from the customer’s point of view, using customer transactions and marcom response data to specifically understand what’s important to customers. It is based on the marketing concept of customer-centricity. BS works for all strategic marketing activities: selection targeting, optimal price discounting, channel preference/customer journey, product penetration/category management, etc. BS allows a marketer to do more than mere targeting.
An important point might be made here. Behaviours are caused by motivations, both primary and experiential. Behaviours are purchases, visits, product usage and penetration, opens, clicks and marcom responses, etc. These behaviours cause financial results, revenue, growth, lifetime value and margin.
Primary motivations would be unseen things like attitudes, tastes and preferences, lifestyle, value set on price, channel preferences, benefits or need arousal. There are experiential, secondary causes of behaviour, typically based on some brand exposure. These are not behaviours, but cause subsequent behaviours. These secondary causes would be things like loyalty, engagement, satisfaction, courtesy or velocity. Note that RFM uses recency and frequency, metrics of engagement, which is a secondary cause. RFM also uses monetary metrics, which are resultant financial measures. Thus RFM does not use behavioural data, but engagement and financial data. These are very different than behavioural data used in BS. One simple way to distinguish behavioural data from secondary data is that behaviours are nouns: purchases, responses, etc. Note that secondary causes are adjectives: engagement metrics, loyal customers, recent transactions, frequently purchased, etc.
BS typically requires analytic expertise to implement. Behavioural segmentation is a statistical output (see the box on page 164).
One critical difference between BS and RFM is that in a behavioural segmentation members typically do not change groups. That is, the behaviour that defines a segment evolves very slowly. For example, if one person is sensitive to price, her defining behaviour will not really change. She is sensitive to price even after she has a baby, she is sensitive to price as she ages, or if she gets a puppy, or buys a new house. Her products purchased might change, her interests in certain campaigns might change, but her defining behaviour will not change. This is one of the advantages of BS over RFM. This is what drives your learning about the segments. BS provides such insights that each segment generates a rationale, a story, as to why it’s unique enough to BE a segment.
While RFM uses only three dimensions, BS uses any and all behavioural dimensions that best differentiate the segments. It typically requires far more than three variables to optimally distinguish a market.
Because marketing mix testing can be done on each segment (using product, price, promotion and place) the insights generated make for differentiated marketing strategies for each segment. To test if RFM tiers drive behaviour is probably inappropriate, because tier membership potentially changes every time period. Much like studies that proclaim, ‘women who smoke give birth to babies with low birth weight’, there is spurious correlation going on. Just as another dimension (socio-economic, culture, etc.) might be the real (unseen) cause of the low birth weight and NOT necessarily (only) the smoking, so there are other dimensions of (unseen) behaviour using RFM to explain, say, campaign responses. That is, the response is not caused by the RFM tier, but some other motivation.
In short, BS goes far beyond RFM. The insights and resultant strategies are typically worth it.
What does behavioural segmentation provide that RFM does not?
As mentioned, BS delivers a cohort of segment members that are maximally differentiated from other segment members. Because these members typically do not change segments, various marketing strategies can be levelled at each segment to maximize cross-sell, up-sell, ROI, margin, loyalty, satisfaction, etc.
BS identifies variables that optimally define each segment’s unique sensitivities. For example, one segment might be defined by channel preference, another by price sensitivity, another by differing product penetrations and another by a preferred marcom vehicle. This knowledge, in and of itself, generates vast insights into segment motivations. These insights allow for a differentiated positioning of each segment based on each segment’s key differentiators. You get away from trying to incentivize customers out of the ‘bad’ tiers and into the ‘good’ tiers. In BS, there are no good or bad tiers. Your job is now to understand how to maximize each segment based on what drives that segment’s behaviour, rather than focus on only migration. Thus, BS gives you a test-and-learn plan.
Because of the insights provided, knowledge is gained of each segment’s prime pain points, which means that each segment can be treated with the right message, at the right time, with the right offer and at the right price. This kind of positioning creates a ‘segment of one’ in the customer’s mind. This uniqueness differentiates the firm, perhaps even to the extent of moving it away from heavy competition and toward monopolistic competition. This means you approach a degree of market power that is becoming a price maker.
Because BS provides such insights it tends to make marketers very active in understanding motivations. This tends to generate very lucrative strategies for each segment.
Conclusion
What are the advantages of RFM? It’s fast, simple and easy to use, explain and implement. What are the disadvantages of behavioural segmentation? It requires analytic expertise to generate, is more costly and takes longer to do.
BS takes behavioural variables and uses them for the purpose of understanding customer behaviour, and it uses a statistical algorithm to maximally differentiate each segment based on behaviour (see box overleaf). As mentioned, the vast majority of marketers that evolve from RFM to BS say it’s worth it, and their margins agree.
Segmentation techniques
There are three characteristics that distinguish behavioural segmentation from RFM: BS uses (typically) more behavioural data, BS uses the data for the specific purpose of understanding customer behaviour and BS uses statistical techniques to maximally separate the segments.
There are two general philosophies in analysis: supervised and unsupervised techniques. Unsupervised techniques almost eliminate the analyst from the analysis. These are neural networks, machine learning, chaos theory, etc. Philosophically, it seems on the wrong track to run a technique requiring little analytic strategy. It’s also well known that neural network techniques suffer from over-fitting and difficulty in explaining what the model means (usually because of the hundreds of additional/transformational variables neural networking tends to create). Therefore, unsupervised techniques are not recommended.
Of those techniques that require some kind of analytic input, a short comparison from RFM to CHAID to K-means to Latent Class is instructive. RFM is multivariable (typically using three variables) but it is not multivariate – simultaneously using the three dimensions. RFM is mathematical and could not be a statistically valid option.
CHAID (chi-squared automatic interaction detection) is sometimes offered as a segmentation solution. It is a tree-like structure that splits the nodes based on the chi-square test. While CHAID is fast and simple (and probably better than RFM) it cannot be optimal. CHAID is not a statistical model but a heuristic, a guideline. It brings with it no diagnostics and little intelligence.
K-means (also called partition, iterative, or clustering) is another fast and simple technique. The typical algorithm requires you to decide on the number of clusters (as if you know) and decide which variables to use to design the clusters (as if you know). K-means gives no diagnostics to aid in these important criteria, leaving it to your arbitrary intuition.
So, after the number of clusters is decided, along with which variables to use for clustering, the algorithm goes to the first observation (eg customer on the dataset) that has all the variables populated, calculates the centroid (average of all the variables in dimensional space) and labels this cluster 1. It goes to the next observation that is populated, calculates the centroid and ascertains how far away (based on the square root Euclidean distance) the second observation is from the first. If it’s ‘far enough’ away (based on criteria the analyst gives or a default) to be defined as its own cluster, it is. It continues through the dataset until the number of clusters supplied is created and all of the observations are classified into one (mutually exclusive) cluster.
Note: 1) It is not statistical, but mathematical. It uses the square root Euclidean distance to assign cluster membership. 2) Cluster centroids (and hence clusters) are highly dependent on the order of the dataset. If the dataset is re-sorted there will likely be very different segments. 3) It offers little in the way of diagnostics. 4) Because the clusters are naturally spherical (owing to assignments based on distance from a centroid) the clusters tend to be of similar size, which seems an unlikely assumption in a real market. While K-means is a step above RFM and CHAID, it clearly suffers from many shortcomings.
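The order dependence in point 2 is easy to see in a toy example. The sketch below is a deliberately simplified version of the first pass described above: it walks the file in order, starts a new cluster whenever an observation is ‘far enough’ from every existing seed, and then assigns each observation to its nearest seed by Euclidean distance. It is not a full K-means routine (there is no iterative centroid update), and the data, the `min_dist` threshold and the function names are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_seeds(points, k, min_dist):
    """Walk the data in file order; a point becomes a new seed if it is
    at least min_dist away from every seed chosen so far."""
    seeds = []
    for p in points:
        if len(seeds) == k:
            break
        if all(euclidean(p, s) >= min_dist for s in seeds):
            seeds.append(p)
    return seeds


def clusters(points, k, min_dist):
    """Assign every point to its nearest seed and return the groups."""
    seeds = pick_seeds(points, k, min_dist)
    groups = {}
    for p in points:
        nearest = min(range(len(seeds)), key=lambda j: euclidean(p, seeds[j]))
        groups.setdefault(nearest, set()).add(p)
    return {frozenset(g) for g in groups.values()}


# Five hypothetical customers on a single line, in two different file orders.
data = [(0, 0), (1, 0), (3, 0), (5, 0), (6, 0)]
original = clusters(data, k=2, min_dist=2.5)
reordered = clusters(list(reversed(data)), k=2, min_dist=2.5)

print(original)
print(reordered)
print("Same segments after re-sorting?", original == reordered)
```

In this toy case the middle customer at (3, 0) lands with the high group in one ordering and with the low group in the other, purely because of which observations were read first.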
Latent class analysis (LCA) has been around for 50 years, but in the last 20 has really caught on. LCA is a Bayesian (maximum likelihood) technique which is statistical in nature. Because customer behaviour is probabilistic (even irrational) a statistical technique better matches behaviour than a mathematical technique. It has diagnostics to find the optimal number of segments. It has diagnostics to find which variables are significant for the segmentation.
LCA applies a probability score to every observation (customer on the dataset) to belong to each segment. For example, it’s one thing if customer A is 95% likely to belong to segment 1 and only 5% likely to belong to segment 2. There is an obvious conclusion. But what if, owing to the customer as either newer on file or having displayed some unusual patterns, it is scored at 55% likely to belong to segment 1 and 45% likely to belong to segment 2? This is not so clear. LCA gives you the ability to remove from the segment assignments any of those that do not figure strong segment behaviour. This should typically be a very small percentage of the file but the ability to ‘know’ where each customer most likely belongs is very important strategically.
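Once the probability scores exist, holding back the unclear cases is a small step. The sketch below uses the two customers from the example above and assumes a cut-off of 70%; that threshold is purely illustrative, since the text does not say where the line should be drawn, and the function name `assign_segments` is hypothetical.

```python
def assign_segments(posteriors, min_prob=0.70):
    """Map each customer to their most likely segment (numbered from 1),
    or to None when the highest probability is below min_prob."""
    assignments = {}
    for customer, probs in posteriors.items():
        best = max(range(len(probs)), key=lambda j: probs[j])
        assignments[customer] = best + 1 if probs[best] >= min_prob else None
    return assignments


# Probability of belonging to segment 1 and segment 2, as in the example.
posteriors = {
    "A": [0.95, 0.05],  # clearly segment 1
    "B": [0.55, 0.45],  # not so clear
}

assignments = assign_segments(posteriors)
print(assignments)
```

Customers mapped to `None` can be left out of segment-level strategy until more behaviour accumulates.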
It has been proved often, but by none better than Jay Magidson and Jeroen K. Vermunt, that LCA is vastly superior to K-means in terms of segment identification and separation (Magidson and Vermunt, 2002). Given the advantages of LCA as seen above, it should be seen as the first and best choice.
Checklist
You’ll be the smartest person in the room if you:
Remember SAS gives a metric of an optimal segmentation solution as the ‘log of the determinant of the covariant matrix’.
Recall a variety of segmentation techniques: business rules, CHAID, hierarchical clustering, K-means, latent class analysis (LCA), etc.
Point out that LCA provides the optimal number of segments, diagnosis of which variables are significant and calculates a probability score for every member belonging to every segment – nothing is arbitrary!
Use the behavioural segmentation process: strategize, collect behavioural data, create/use additional data, run the chosen algorithm and profile segment output.
Prove RFM is from the firm’s point of view and not the consumer’s.
Preach RFM incites no strategy except migration.
Part four
Other
09
Marketing research
Introduction
How is survey data different than database data?
Missing value imputation
Combating respondent fatigue
A far too brief account of conjoint analysis
Structural equation modelling (SEM)
Checklist: You’ll be the smartest person in the room if you…
Introduction
Why stick in a chapter on marketing research? Most of the analytic techniques (discussed so far) apply to both marketing research and database marketing. It’s because, while there is overlap, the function and goal of marketing research is different than that of database marketing.
Database marketing exists in order to drive purchases from customers. Marketing research exists in order to understand consumer behaviour.
Database marketing is populated with programmers, econometricians and marketers. Marketing research is populated with psychologists, statisticians and marketers. Database marketing is applied analytics. Marketing research is exploratory analytics. Database marketing is tactical and fast. Marketing research is strategic and thorough.
Merlin Stone’s book Consumer Insight (Stone, 2004) details well database marketing and marketing research. This overview includes CRM, marketing systems/operations, loyalty, etc.
How is survey data different than database data?
This is a good question, and more involved than it may seem at first glance. Of course, survey data comes from a survey and database data comes from a database. But the key thing is that survey data has a source that is (typically) the consumer and it is self-reported and may even include opinions, etc. Database data has a source that (typically) is a system (transactional or otherwise) and it is real data, real behaviour, real responses; that is, NOT self-reported.
Marketing research as a discipline tends to focus on survey data, whereas direct marketing, of course, tends to focus on database data. You’ve seen how many marketing science techniques are applicable to both. This chapter scrapes off those techniques that are mostly used in marketing research. You cannot really do, for example, a conjoint on database data; it is not designed that way.
This is one area of contention alluded to earlier, especially in terms of pricing. Marketing research would suggest a survey and ask customers/potential customers about pricing policies. These responses are subjective/self-reported and tend to have the same conclusion: ‘Your prices are too high!’ Conjoint is designed to get around that in some manner but it is still artificial in terms of a real buying/choice decision. That’s why I recommend using database data which is real reactions from real transactions facing real choices in terms of real prices. Real cool, right? But there is a place for surveys and conjoint, etc. Just see below.
Missing value imputation
A common issue in survey data (as well as database data, but less so) is what to do about missing values. It is a typical practice – but, as is the case with most typical practices, not a good idea – to just replace the missing value with the mean value. That is, say we have survey data around demographics, including age. Say that in this case age is important to what we’re studying. If a very small percent of age is missing, maybe replacing the missing values with the overall mean is not so bad. But it’s still stupid.
A better possibility is to do segmentation (even K-means is a decent choice) and based on, say, income or size of household, replace the missing age values with the mean of each segment. This indicates that age is correlated with income or size of household, and that’s probably not a bad assumption.
The best idea would be to model, using ordinary regression, the predicted age based on the above demographics by each segment. This would add variation, rather than only the (segment) mean value.
This is all based on a subjective idea that depends on the percent of whatever value is missing. If, say, <5% is missing, replacing with the overall mean value might be acceptable. If, say, between 5% and 25% is missing, replacing with the mean value by segment is better. If between 25% and 50% is missing, modelling the missing value with regression by segment is the best. If >50% is missing no imputation should be attempted.
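The rule of thumb above, plus the segment-mean replacement, can be sketched as follows. The handling of the exact boundaries (5%, 25%, 50%) is an assumption, since the text gives ranges rather than strict cut-offs, and the ages, segment labels and function names are hypothetical.

```python
from statistics import mean


def imputation_strategy(share_missing):
    """Pick an approach from the share of values missing (0.0 to 1.0)."""
    if share_missing > 0.50:
        return "none"
    if share_missing >= 0.25:
        return "regression by segment"
    if share_missing >= 0.05:
        return "segment mean"
    return "overall mean"


def impute_segment_mean(ages, segments):
    """Replace each missing age (None) with the mean age of its segment.
    Assumes every segment has at least one observed age."""
    observed = {}
    for age, seg in zip(ages, segments):
        if age is not None:
            observed.setdefault(seg, []).append(age)
    seg_mean = {seg: mean(vals) for seg, vals in observed.items()}
    return [seg_mean[seg] if age is None else age
            for age, seg in zip(ages, segments)]


# Hypothetical survey ages with two missing values, in two segments.
ages = [30, None, 40, 60, None, 70]
segments = [1, 1, 1, 2, 2, 2]

filled = impute_segment_mean(ages, segments)
print(filled)
print(imputation_strategy(0.10))
```

Regression by segment would replace the lookup of `seg_mean` with a predicted value per respondent, which is what adds back the variation the segment mean removes.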
Combating respondent fatigue
Marketing surveys should be short (I don’t know what I mean by short, but they should require little effort, thinking or time). If they are too long (whatever too long means) fatigue will set in (or worse, irritation) and responses will begin to be erroneous/nonsensical.
The first suggestion to combat this problem is to design surveys that are short. It’s better to have two or three surveys instead of one long survey. Otherwise the answers are meaningless.
An analytic suggestion is to rotate and model questions. This requires some thinking and design but the results are usually very good.
The general idea is to use some questions to model the answers to other questions. Obviously these modelled questions would not be asked. That is, say the survey is in three (well designed for modelling) sections, A, B and C. Only one fourth of the respondents (randomly chosen) would get the entire survey. One fourth would get one half of A and one half of B, another fourth would get one half of A and one half of C, and the last fourth would get one half of B and one half of C. The survey is half as long for these last three fourths of the respondents.
Now the idea is to model the other half of those sections that were not given. That is, use answers from A and B to model missing C, B and C to model A and A and C to model B. See? From my experience the errors from the model in the rotate-and-model scenario are far smaller than the errors from fatigue. That means that the models are at 95% confidence and those answers are better than giving the entire long survey to 100% of the respondents, which will introduce fatigue-induced errors.
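The random split into four groups is the only mechanical part of the design, and a sketch of it follows. The version labels mirror the description above; the seed, the respondent IDs and the function name are assumptions for illustration, and the modelling step itself is not shown.

```python
import random
from collections import Counter

# The four survey versions described above.
VERSIONS = [
    "entire survey",
    "half of A + half of B",
    "half of A + half of C",
    "half of B + half of C",
]


def assign_versions(respondent_ids, seed=0):
    """Shuffle respondents, then deal them into the four versions in turn
    so that each version gets (as near as possible) one fourth."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    return {rid: VERSIONS[i % len(VERSIONS)] for i, rid in enumerate(ids)}


# 100 hypothetical respondents.
plan = assign_versions(range(100))
counts = Counter(plan.values())
print(counts)
```

Shuffling before dealing keeps the assignment random while guaranteeing the group sizes, which matters when the sample is small.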
A far too brief account of conjoint analysis
To be fair, if you’re reading this book in order to know all about conjoint analysis, you are reading the wrong book. There are dozens of (entire) books detailing all the cool types and techniques of conjoint. I will barely mention this here because conjoint is a vast subject and I am not much of a conjoint guy.
To elaborate the last point, I think conjoint serves an important purpose, especially in marketing research, especially in product design (before the product is introduced). My main problem (as mentioned above) with surveys overall is that they are self-reported and artificial. Conjoint sets up a contrived situation for each respondent (customer) and asks them to make choices. The customer makes choices and these choices are typically in terms of purchasing a product. You know I’m an econ guy and these customers are not really purchasing. They are not weighing real choices. They are not using their own money. They are not buying products in a real economic arena. The artificialness is why I do not advocate conjoint for much else other than new product design. That is, if you have real data use it. If you need (potential) customers’ input in designing a new product use conjoint for that. Also, please recognize that conjoint analysis is not actually an ‘analysis’ (like regression, etc.) but a framework for parsing out simultaneous choices. Conjoint means ‘considered jointly’.
The general process of conjoint is to design choices, depending on what is being studied. Marketing researchers are trying to understand what attributes (independent variables) are more/less important in terms of (typically) customers purchasing a product. So a collection of experiments is designed to ask customers how they’d rate a product (how likely they would be to purchase) given varying product attributes.
In terms of, say, PC manufacturing, choice 1 might be: an 800 cost of PC, 17 inch monitor, 1 Gig hard drive, 1 Gig RAM, etc. Choice 2 might be: an 850 cost of PC, 19 inch monitor, 1 Gig hard drive, 1 Gig RAM, etc. There are enough choices designed to show each customer in order to calculate ‘part-worths’ that show how much they value different product attributes. This is supposed to give marketers and product designers an indication of market size and optimal design for the new product.
Note that it is important to design the types and number of levels of each attribute so that the independent variables are orthogonal (not correlated) to each other. These choice design characteristics are critical to the process. At the end an ordinary regression is used to optimally calculate the value of part-worths. It is this estimated value that makes conjoint strategically useful.
Now let’s take a slightly deeper dive into the analytics of conjoint. Note that the idea is to present to responders choices (in such a way that they are random and orthogonal) and the responders rank these choices. The choice rankings are a responder’s judgment about the ‘value’ (economists call it utility) of the product or service evaluated. It is assumed that this total value is broken down into the attributes that make up the choices. These attributes are the independent variables and these are the part-worths of the model. That is:
Ui = X11 + X12 + X21 + X22 + … + Xmn
where Ui = total worth for product/service and
X11 = part-worth estimate for level 1 of attribute 1
X12 = part-worth estimate for level 1 of attribute 2
X21 = part-worth estimate for level 2 of attribute 1
X22 = part-worth estimate for level 2 of attribute 2
Xmn = part-worth estimate for level m of attribute n.
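In practice only the part-worths for the levels actually present in a given choice contribute to its total worth. A small sketch, using the PC example from earlier: the part-worth values below are invented for illustration (they are not estimates from any study), only price and monitor size are included, and the function name `total_worth` is hypothetical.

```python
# Hypothetical part-worth estimates, keyed by (attribute, level).
part_worths = {
    ("price", "800"): 1.5,
    ("price", "850"): 0.5,
    ("monitor", "17 inch"): 0.25,
    ("monitor", "19 inch"): 1.0,
}


def total_worth(choice):
    """Sum the part-worths of the levels that make up one choice."""
    return sum(part_worths[(attribute, level)]
               for attribute, level in choice.items())


choice_1 = {"price": "800", "monitor": "17 inch"}
choice_2 = {"price": "850", "monitor": "19 inch"}

worth_1 = total_worth(choice_1)
worth_2 = total_worth(choice_2)
print(worth_1, worth_2)
```

With these made-up numbers the cheaper PC with the smaller monitor comes out slightly ahead, which is the kind of trade-off the part-worths are meant to expose.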
As mentioned above, my view (and many will violently disagree) is that conjoint is appropriate for new product/service evaluations, and that’s about all. It is not appropriate in the typical way usually used, especially in terms of pricing, except, as mentioned, in a new product – a product where there is no real data. (I even prefer, say, van Westendorp pricing schemes over conjoint. These are where the survey asks respondents what price is so high you would not consider purchase and what price is so low you would suspect a quality issue. The intersection of where ‘too expensive’ and ‘too cheap’ cross is hypothesized as optimal price.)
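The crossing point can be approximated from the two survey questions just described. The sketch below is a simplified reading of the idea, using only the ‘too cheap’ and ‘too expensive’ answers; the respondent answers, the grid of candidate prices and the function name are all hypothetical.

```python
def crossing_price(too_cheap, too_expensive, candidate_prices):
    """Return the candidate price at which the share of respondents who
    find it too cheap is closest to the share who find it too expensive."""
    n = len(too_cheap)

    def gap(price):
        # Too cheap for anyone whose 'suspect quality' price is at or above it.
        share_cheap = sum(1 for t in too_cheap if price <= t) / n
        # Too expensive for anyone whose 'would not consider' price is at or below it.
        share_expensive = sum(1 for t in too_expensive if price >= t) / n
        return abs(share_cheap - share_expensive)

    return min(candidate_prices, key=gap)


# Hypothetical answers from four respondents.
too_cheap = [6, 8, 10, 12]        # price so low a quality issue is suspected
too_expensive = [8, 10, 12, 14]   # price so high purchase is not considered

price = crossing_price(too_cheap, too_expensive, range(5, 16))
print(price)
```

With real survey data the two curves rarely meet exactly on a grid point, so the closest candidate is taken as the approximate intersection.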
Anyway, for an existing product, it is possible to design a conjoint analysis and put price levels in as choice variables. I have had marketing researchers tell me that this price variable derives an elasticity function. You should know by now how I feel about that. I disagree for the following reasons. 1) Those estimates are NOT real economic data. They are contrived and artificial. 2) The size of the sample it is derived from is too small to make real corporate strategic choices. 3) The data is self-reported. Those respondents are not responding with their own money in a real economic arena purchasing real products. 4) Using real data is far superior to using conjoint data. Have I said this enough yet? Ok, the rant will now stop.
Structural equation modelling (SEM)
This will unfortunately (also) be a far-too-brief account of SEM. SEM is in the domain of marketing research, rather than direct/database marketing (where we’ve spent most of our time) but it is so powerful and so fun that a quick tour has to be done.
There are some similarities between SEM and simultaneous equations (covered earlier). They each are about systems of equations and thus several similarities follow. They each deal with endogenous and exogenous variables. They each require the algebraic solution of fixed variables and enough observations to calculate variance. Of course they each require the analyst to think through cause and effect. This is because both techniques are about cause and effect and can be conceptualized as regressions.
As mentioned, SEM is a marketing research tool while simultaneous equations are an econometric tool. This is the first difference. Another (major) difference is that simultaneous equations are (only) about blatant variables while SEM can contain both blatant as well as latent variables. This is in fact, in my view, the most important (and exciting) difference. Another difference is that simultaneous equations operate on each (raw) observation (say, each row is a customer) but SEM operates on an observation being an element of a covariance matrix. Whew. So, with that, let’s go on to a few definitions of SEM as a different kind of animal.
Figure 9.1 Units and price cause revenue
In the contrived example above, note that both units and price CAUSE revenue. Revenue is a dependent variable. That’s equation 1. Note also that both price and marcom CAUSE units. Units are a dependent variable in equation 2. Obviously units are both an independent and a dependent variable. There are two equations. All of these are blatant (manifest) variables. They can be measured for what they are.
Revenue = f(units, price)
Units = f(price, marcom)
It is true in this case that while price and marcom statistically impact units (with stochastic error), revenue is NOT statistically driven by units and price with a random error. Revenue is algebraically caused by units * price. This would be a straight line with no error. It’s just an example. It also shows that SEM is often diagrammed using paths. We will do the same. Examples will revolve around path analysis. In SAS it will be with proc calis.
Let’s go over some terminology, as SEM has its own language, jargon, etc. As noted, there are two kinds of variables: manifest and latent. Manifest variables are blatant, directly measured, directly observed. These are things like responses, sales, units, price or days between purchases. The second kind of variable is latent. These are (indirectly) estimated through observable data. These are things like satisfaction, loyalty and intelligence. That is, while there is no quantitative observable metric of, say, satisfaction, it can be inferred by observable behaviour.
Now let’s mention again exogenous and endogenous variables. Exogenous variables are outside the system; they are independent variables (not caused) but can be either latent or manifest. Endogenous variables are typically (at least) dependent variables and are caused by something else. They also can be either latent or manifest. Okay? Now we’re ready to do SEM.
Comparing regression to SEM
For a simple example let’s use proc reg revenue = f(units, price) and then proc calis revenue = f(units, price).
This is far too simple a use of SEM but it will illustrate some important things. Note that all variables are manifest and we have only one equation. Let’s say we run proc reg and get the following:
Table 9.1 Proc reg
Variable Parm estimate Standard error T value
Intercept –8862
Units 73.24 7.4 9.98
Price 111.25 19.03 5.84
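Now if we run proc calis:
/* xx.xx is a placeholder for the library and data set name */
/* meanstr asks for the mean structure, so an intercept is estimated */
proc calis data=xx.xx meanstr;
   path
      rev <- units n_price;   /* revenue is caused by units and price */
run;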
Table 9.2 Proc calis
Path Variable Parm estimate Standard error T value
Revenue Intercept –8863
Revenue Units 73.24 1.48 49.39
Revenue Price 111.25 2.07 53.81
Proc calis gives a lot more (but not shown here) results. The only point here is that SEM and OLS show the same (single equation, manifest) output, in terms of parameter estimates. The difference in t-value calculation is that regression uses a different denominator for standard error than SEM.
Calculating impacts
Now let’s see what happens when we include more complexity and more realism. Most marketers want to know the impact of their marcom (and price) on revenue. Say we did a regression model revenue = f(units, price, e-mail, direct mail). (We will ignore the algebraic issue of having both price and units as independent variables.) The interest here is marcom impacts.
Table 9.3 Regression model revenue
Variable Parm estimate Standard error T value
Intercept –9368
Units 77.08 7.569 9.79
Price 115.24 20.112 5.73
Email 9.089 2.969 3.06
Direct mail 3.99 1.88 2.12
This indicates that every e-mail sent drives 9.089 in revenue and for every direct mail sent we get 3.99 in revenue. Looks like marcom is really rockin’! This means that sending 100 each drives 909 and 399 or 1,308 in total revenue. This model implicitly assumes the impact of marcom is directly on revenue and not on units. The R2 here is 57%.
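The arithmetic behind those totals is just coefficient times volume. A quick sketch, using the e-mail and direct mail estimates from Table 9.3 (the variable names are illustrative):

```python
# Parameter estimates from the single-equation regression (Table 9.3).
coef_email = 9.089
coef_direct_mail = 3.99

# Send 100 of each.
n_sent = 100

email_revenue = coef_email * n_sent
direct_mail_revenue = coef_direct_mail * n_sent
total_revenue = email_revenue + direct_mail_revenue

print(round(email_revenue), round(direct_mail_revenue), round(total_revenue))
```

This reading takes each coefficient as a direct, per-piece effect on revenue, which is exactly the assumption the single-equation model builds in.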
Now let’s go a step further, and the results will be more interesting. We will use the above path of two equations:
Revenue = f(units, price)
Units = f(price, email, direct mail)
where marcom will be number of e-mails and direct mails sent. The hypothesis here is that units and price directly (algebraically in this case) impact revenue. The other hypothesis is that price and marcom (EM and DM) directly impact units which then indirectly impact revenue. That is, units are both a dependent and an independent variable. That means that revenue comes from both price and units and that units come from price and EM and DM.
This means the total impact on revenue is:
Table 9.4 Total impact on revenue
Path Variable Parm estimate Standard error T value
Revenue Intercept –8863
Revenue Units 73.24 1.48 49.39
Revenue Price 111.25 2.07 53.81
Units Intercept 259
Units Price –2.53 0.082 –30.88
Units Email 1.266 0.299 4.23
Units Direct mail 1.141 0.089 12.82
Most importantly note the impact of marcom is through units, and not to revenue. The impact of one e-mail is now 1.266 of revenue and every direct mail is now 1.141. Now sending 100 each only totals 241 in revenue. This is far more realistic than the above model. The R2 here is 78%. While this is a contrived, overly simplistic model it has complexity that more closely matches reality.
UseoflatentvariablesNowlet’stalkaboutwheretherealpowerofSEMcomesin:theuseoflatentvariables.Inthiscaselet’sputtogetheraframeworkforloyalty.Notethatthereisactuallynosuchthingasablatantentitycalled/quantifiedas‘loyalty’.Itisalatentvariable.Theideaisthatitislikeintelligence,whichisalsounquantifiableasitself;itcanonlybeindirectlymeasuredassomethinglikeascoreonanIQtest,whichinturnmeasuresdimensionsofintelligence:spatialability,logic,mathematics,verbalskills,etc.Sameistrueforloyalty.Itcanbeseenandsurmisedbyotheractions.
Let’ssaywehaveabehaviouralsegmentationinplacebasedoncustomertransactionsandresponsestomarcom.Weareinterestedinhowloyaleachsegmentis,whichisnotnecessarilythesamethingashowmuchtheyspendorhowmanytransactionstheyhave.Sowedoprimarymarketingresearchandaskquestionsaboutopinions/attitudesaroundprice,value,qualityandsatisfaction.Thesemetricswillshowarangeofloyalty.Wealsoaskaboutshareofvoice,competitivedensityandtheconvenienceofourstorescomparedtoourcompetitors.
Themodelabovetriestoputaframeworktogetherthatsaysconsumerbehaviour(transactions,responses,etc.)iscausedbyaspectrumofloyalty(fromnonetotransactionaltoemotional)whichisinturncausedbyattitudesaroundprice,value,satisfactionandqualityaswellasopinions/metricsofoperationallogisticslikeconvenience,shareofvoiceandcompetitivedensity.
Figure 9.2 Marcom responses, transactions
So the general analytic idea is that there are no such metrics/quantities as emotional or transactional loyalty. These are latent variables. But adding these variables helps explain the behaviour of customers purchasing and customers responding. This latent variable is discovered by a factor analysis-type technique used in SEM. That is, the manifest variables indirectly show the influence of the latent variable and that latent variable is ‘teased out’ and labelled.
A quick note about the difference between transactional and emotional loyalty should clarify this important point. It is possible for a customer to appear very loyal in terms of buying a lot of products, having a short time between purchases, responding to marcom, etc., but not in fact be loyal. These are heavy purchasers because there might not be any competitors around, or our stores are very convenient or our share of voice is comparatively large. Thus it’s important to know how ‘loyal’ customers are. That is, a transactionally loyal customer may jump ship if competitors move in near their location, or change their share of voice.
The results below are from applying the loyalty model to two different segments, say X and Y. The segments were defined by behaviour (transactions and marcom responses). The question is how loyal (what kind of loyalty) they are and what can be done about it. Let’s say that each segment has generally the same metrics on transactions and responses. Segment X scores as a transactional loyalty customer. Note the parameter estimates of convenience and competitive density are very high and significant while share of voice is strong and negative. These are traditional indications of the transactional loyalty segment. Note also high and positive impacts of attitudes around price and quality, and recognize that most of the variables on the emotional path are insignificant.
Now, a segment that scores as a strong transactional loyalty-only segment is a bit of a red flag. This is especially true if they LOOK like they are loyal based on their number and amount of purchases.
How can we use the above model to move the segment from mere transactional loyalty to emotional loyalty? The answer is in the emotional loyalty path. The single largest impact is share of voice and that is a metric we can (somewhat) control. There is a business case here: weigh the cost of spending to increase our relative share of voice against the added security (and perhaps increased purchasing) of a segment that evolves into emotional loyalty. See that share of voice is negative in the transactional path? As SOV increases a customer is less transactional and more emotional.
Table 9.5 Segment X, transactional loyalty
Path            Variable         Parm est   St error   T value
Transactional   Price            5.65       3.23       1.75
                Quality          6.21       1.65       3.75
                Value            3.03       2.07       1.47
                Satisfaction     1.35       0.66       2.05
                Convenience      5.22       0.75       6.96
                Competition      2.66       0.99       2.68
                Share of voice   –1.55      1.03       –1.51
Emotional       Price            0.03       2.66       0.01
                Quality          0.56       1.07       0.53
                Value            1.04       2.36       0.44
                Satisfaction     1.66       1.03       1.62
                Convenience      1.99       1.66       1.2
                Competition      0.66       2.04       0.32
                Share of voice   2.55       1.69       1.51
Now let’s look at the opposite kind of loyalty, the brand/emotional kind. These are customers that love our brand, no matter what. View the output below for segment Y, which scores mostly as an emotionally loyal group. Note on the emotional path convenience and competitive density are negative. This segment is so connected to the brand that even if it is inconvenient to go to our store they go anyway and even if more competition moves in these customers come to our store anyway. This is emotional loyalty. You see also that on the emotional path, while price is positive it’s insignificant and quality is very small. It should be no surprise that both value and satisfaction are high. On the transactional path none of those metrics are significant.
Table 9.6 Segment Y, emotional loyalty
Path            Variable         Parm est   St error   T value
Transactional   Price            –1.27      5.65       –0.22
                Quality          2.07       6.24       0.33
                Value            2.07       1.65       1.25
                Satisfaction     0.03       5.07       0.01
                Convenience      0.23       0.2        1.17
                Competition      0.04       0.02       1.8
                Share of voice   –2.65      1.54       –1.72
Emotional       Price            3.25       3.04       1.07
                Quality          0.24       0.12       2.06
                Value            1.26       0.76       1.67
                Satisfaction     3.23       1.23       2.63
                Convenience      –3.65      1.26       –2.91
                Competition      –2.07      0.56       –3.66
                Share of voice   1.27       0.87       1.45
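Reading these tables comes down to dividing each parameter estimate by its standard error and comparing the result to a cut-off. A minimal sketch follows, using segment Y’s emotional path from Table 9.6. The 1.96 cut-off (two-sided, 95%, large sample) is an assumption for illustration and the helper name is made up; recomputed ratios can differ slightly from the printed T values because the inputs are rounded.

```python
# (estimate, standard error) pairs copied from Table 9.6, emotional path.
SEGMENT_Y_EMOTIONAL = {
    "Price": (3.25, 3.04),
    "Quality": (0.24, 0.12),
    "Value": (1.26, 0.76),
    "Satisfaction": (3.23, 1.23),
    "Convenience": (-3.65, 1.26),
    "Competition": (-2.07, 0.56),
    "Share of voice": (1.27, 0.87),
}


def significant_variables(rows, critical=1.96):
    """Return the variables whose |estimate / standard error| exceeds the cut-off."""
    return [name for name, (est, se) in rows.items() if abs(est / se) > critical]


flagged = significant_variables(SEGMENT_Y_EMOTIONAL)
print(flagged)
```

Convenience and competition come out significant and negative, which is the signature of emotional loyalty described above.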
This is the power of SEM: hypothesizing and testing a latent variable. This latent variable accounts for movement in the customer transactions and customer responses. If only a blatant/manifest model was used the fit would not have been so good and the insights (differentiating between the two kinds of loyalty) would not be realized. So is that cool, or what?
Checklist
You’ll be the smartest person in the room if you:
Point out that marketing research and database marketing use many similar marketing science/analytic techniques.
Remember that survey data and database data are different in many ways:
• survey data is typically a few hundred or thousand responses, whereas perhaps millions of consumers have transactions on a database;
• survey data is self-reported/opinions whereas database data is real events;
• survey data is a sample of some kind whereas database data can be the whole relevant population (eg all of a firm’s customers).
Take great care in imputing missing values. Under some circumstances replacing a missing value with the mean is appropriate, other times maybe a model is called for.
Recall that conjoint analysis is best suited for new products, because of the artificial nature of the simulated purchase.
Differentiate between structural equations models (SEM) and simultaneous equations. SEM and simultaneous equations are both systems of equations, but SEM does not require only blatant variables.
Argue that the power of SEM is in uncovering latent variables.
10
Statistical testing
How do I know what works?
Everyone wants to test
Sample size equation: use the lift measure
A/B testing and full factorial differences
Business case
Checklist: You’ll be the smartest person in the room if you…
Everyone wants to test
Statistical testing (design of experiments, DOE) seems to decrease the risk of making a mistake.
Design of experiments: an inductive way of creating a statistical test using a stimulus, taking into account variance, confidence, etc., by randomization and comparison to a control group.
I’ll tell you right now, I myself am not really a testing guy. I see its worth, but the times that the test is actually ‘clean’, can be measured and is measuring what it was designed to measure, are very few. This is because of a couple of things. First, companies do not want to design for test vs. control – why would they want to take potential buyers out of the treatment (ie the control group does not get the stimulus – the test)? The marketing science answer is that ‘you must invest in the test!’ So firms usually fight to make the control group so small, actually too small, that a statistical test (t-test, z-test, etc.) cannot (reliably) be performed.
Another reason is that most of the time the test is ‘dirty’. We never seem to get customers that were to get only a certain kind (or no kind) of treatment (stimulus). Say a customer is supposed to get treatment X so they can be measured against treatment Y (that is the test). However, accidentally, that customer also gets stimuli from other parts of the company and the number one rule of testing is: only one thing can be different in measuring test vs. control. If a customer was supposed to get only treatment X and they (or some of them) also got stimulus A and treatment B, promotion C, etc., the test cannot be done; you cannot measure (in a DOE framework) multiple differences (without designing for that). That is why the design is critical.
Very few companies are disciplined enough to actually carry out a test. Most of the time, at the end of the test, everyone shrugs their shoulders and acknowledges seasonality or competition or changing tastes and preferences, or hypothesizes that something systematic affected the test results. So they want to test again. And again: never really learning in order to act, just testing. More about that later.
Sample size equation: use the lift measure
Testing questions always begin with sample size. The idea is to have a sample large enough – and with enough variation – in order to be confident about generalizing to the population. Remember statistics uses inductive reasoning. That is the point of testing: take a small sample (so as not to (publicly) ruin anything) and simulate the population. That’s important. What you’re trying to do is design a laboratory that looks (and acts) just like the population. You experiment on the (sampled) laboratory and find what seems to work and then you have to thrust these onto the population, which you hope will act as the sample did. That’s inductive reasoning.
So we have to revisit the normal distribution, z-scores and the confidence interval. That was a long time ago, so go back if you need to. I did.
Remember that the normal distribution (although kind of theoretic) is the model that we use (mostly) for testing. We assume a normal distribution. The normal distribution is characterized by two things: 1) the mean and median and mode are all the same number and 2) the distribution is symmetrical about that number. Now, within the first standard deviation of a normal distribution are contained 68% of all the observations; the second standard deviation adds about 14% to each side, aggregating 28% more, for a total between two deviations of roughly 95% of observations (96% with these rounded figures). See Figure 10.1. Now let’s think about z-scores. Remember the formula is
z = (observation – mean) / standard deviation.
Figure 10.1 Z-scores
In terms of IQ, where the mean is 100 and the standard deviation is 15, 68% of all observations are between 85 and 115. Said another way, an IQ of 115, one standard deviation above the mean, is a z-score of 1.00, which is greater than (34 + 34 + 14 + 1.9) nearly 84% of the population. A z-score of +2.0 is greater than nearly 98% of the population. See? This is actually the key to sample size needed and overall testing.
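As a quick check on those percentages, here is a minimal sketch using only the standard library. The function names are illustrative; the normal CDF is written with math.erf rather than taken from a statistics package.

```python
import math


def z_score(observation: float, mean: float, sd: float) -> float:
    """(observation - mean) / standard deviation."""
    return (observation - mean) / sd


def normal_cdf(z: float) -> float:
    """Share of a normal population that falls below a given z-score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_iq = z_score(115, 100, 15)                        # one standard deviation above the mean
share_below_115 = normal_cdf(z_iq)                  # roughly 0.84
within_one_sd = normal_cdf(1) - normal_cdf(-1)      # roughly 0.68
within_two_sd = normal_cdf(2) - normal_cdf(-2)      # roughly 0.95
print(z_iq, round(share_below_115, 3), round(within_two_sd, 3))
```

The same z_score function gives –0.392 for a sample mean of 40 against a population mean of 50 and standard deviation of 25.50.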
By sample I mean a subset of the population. Even if you do not really have the whole, entire population, we’ll pretend. What else can we do? So we generally take a simple random sample (SRS) of the population. But how large a sample do we need in order to simulate the population?
Sample size needs to take into account (in terms of DOE) variation, which affects confidence. We are trying to be pretty confident that the size of our sample will mirror the population when the testing is done and then generalized to the population. That is, if you took the mean of the population and found it to be 50.0 and then took an SRS and found that mean to be 40.0, would you be confident that your sample mirrored the population? The answer is, ‘Maybe, depending on the variation’. Say you knew the population had a mean of 50.0 but a standard deviation of 25.50. It’s possible your SRS is representative of the population. The z-score is –0.392, which might not be THAT unusual.
So, the formula I’d advocate for sample size needs to take into account the standard deviation of the population, how confident you want to be of generalizing your results to the population after the test, what sensitivity you want to measure (ie lift detection) and expected response. That is:
n = 2 × Z² × r × (1 – r) / (r × l)²
where n is the sample size per cell, Z is the z-score for the chosen confidence level, r is the response rate and l is the lift detection. As an example, say we have an expected response rate of 28%, a confidence wanted of 90% (z-score = 1.645) and a minimal lift detection of 5%; the sample size needed in each cell is 5,566. That is, to be 90% confident your results will generalize to the population (9 out of 10 times it will, theoretically), with a usual response rate of 28%, and not wanting to detect a difference unless it is at least 5% of that response (that is, outside 26.6%–29.4%), you need a total sample of 11,131. That is, for A/B testing you need 5,566 in each (test and control) cell. See?
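The printed equation did not survive in this transcript, so the sketch below uses a form that reproduces the worked numbers in the text: per-cell n = 2 × Z² × r × (1 – r) / (r × l)², with Z = 1.645 for 90% confidence. Treat both the form and the function name as assumptions inferred from those numbers, not as a quotation of the original formula.

```python
def cell_size_with_lift(z: float, response_rate: float, lift: float) -> float:
    """Per-cell sample size for detecting a relative lift in a response rate."""
    margin = response_rate * lift                    # absolute difference to detect
    variance = response_rate * (1.0 - response_rate)
    return 2.0 * z ** 2 * variance / margin ** 2


n_28 = cell_size_with_lift(1.645, 0.28, 0.05)   # the 28% response example
n_01 = cell_size_with_lift(1.645, 0.01, 0.05)   # the 1% response example
print(int(n_28), int(n_28 * 0.28))   # cell size and expected responders
print(int(n_01), int(n_01 * 0.01))
```

Note how the required cell size grows as the response rate falls: the lift is relative to the response, so a 5% lift on a 1% response is a very small absolute difference to detect.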
I have to mention a silly thing that is still going on; I hear it all the time. The answer to the question ‘How large a sample size do I need?’ is often ‘380’. (If not exactly 380 it is very close to 380.) Let me show you where this comes from and why it is wrong. Even stupid.
The formula this uses is:
n = Z² × r × (1 – r) / e²
where e is the error the marketer is willing to accept. Often marketers test at 95% confidence (a z-score of 1.96), a 1% response rate is assumed and they only want to accept a 1% error, which translates this formula into a sample size of 380. Now think about this. A 1% assumed response rate means that of the 380 in the cell only 3.8 will respond. I guarantee that 3.8 (okay, round it up to 4 people) is NOT enough to be confident about. At all. Or if they say 380 are responses, then that cell actually had 38,000 in it, right? See the folly?
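The ‘380’ answer is easy to reproduce. A minimal sketch, assuming it comes from the textbook margin-of-error formula n = Z² × r × (1 – r) / e² (an assumption that matches the numbers quoted, since the printed formula is missing from this transcript):

```python
def cell_size_fixed_error(z: float, response_rate: float, error: float) -> float:
    """Sample size for estimating a proportion to within +/- error."""
    return z ** 2 * response_rate * (1.0 - response_rate) / error ** 2


n_380 = cell_size_fixed_error(1.96, 0.01, 0.01)
expected_responders = n_380 * 0.01
print(round(n_380), round(expected_responders, 1))
```

The error term here is an absolute one percentage point on a 1% response rate, which is why so few responders end up in the cell.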
Isn’t this the same problem with the formula I recommend above? No, it is not. Of the 5,566 cell size and a response rate of 28% that means there will be 1,558 responders and I can be confident with that. Or even at a 1% response rate (still 90% confidence and 5% lift) the cell size is over 200,000. And 2,000 responses are enough to test and be confident about. So, do not let them tell you 380 is an adequate sample size. Is it any wonder corporations are in a nosedive?
A/B testing and full factorial differences
A couple of quick notes on very common testing will follow. Did I mention I am not really a testing guy?
We always talk about A/B testing (sometimes called ‘champion/challenger’) and this simply means comparing (even as test vs. control) two cells against each other. The idea is that we randomly chose the participants in each cell and (this is important) the only difference (get that? The only difference) between them is that the test cell has the test treatment and the control cell does not.
Then we measure the average responses of cell A vs. cell B and if they are different enough we say they are statistically/significantly different. That means we have confidence (typically 95%) that when we generalize this to the population the same results happen, on a larger scale. The formula I usually use for response testing is the z-score:
z = (rA / nA – rB / nB) / √(p × (1 – p) × (1 / nA + 1 / nB))
where p = (rA + rB) / (nA + nB). At 95% confidence if this formula is > 1.96 then the A response rate is statistically, significantly (and positively – yes this is very important!) different than the B response rate.
As an example, let’s say for the A test we have responses of 1,200 and we sent 10,000. For B we have responses of 950 and we sent 5,000. rA means responses from A, nA means the number sent in A. (rA = 1,200, nA = 10,000, rB = 950 and nB = 5,000.) This calculates to a z-score of –11.53, which is statistically and significantly different: B is outperforming A at 95% confidence.
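The –11.53 in this example is what a pooled two-proportion z-test produces. The sketch below assumes that pooled form (the printed formula is missing from this transcript; an unpooled standard error would give about –10.9 instead), and the function name is illustrative.

```python
import math


def response_z_score(r_a: int, n_a: int, r_b: int, n_b: int) -> float:
    """Pooled two-proportion z-score for the response rate of A versus B."""
    rate_a = r_a / n_a
    rate_b = r_b / n_b
    pooled = (r_a + r_b) / (n_a + n_b)   # overall response rate across both cells
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (rate_a - rate_b) / std_error


z = response_z_score(1200, 10000, 950, 5000)
print(round(z, 2))
```

The sign carries the direction: a negative value means cell B (19% response) beat cell A (12% response).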
Let me make another point that marketers (especially retailers) have a hard time with. In order to effectively calculate and monitor incremental marcom, there needs to be a universal control group (UCG). This means a group of customers that never (ever) get promoted to. This can be a small group, but still large enough to support a statistical test. If you do not have a UCG you can only test one treatment compared to another, and never know if it’s incremental (or detrimental for that matter). I realize I’m asking you to set aside a group of customers that will never get a promotion, never get a brand message, etc. This is called investing in the test. If knowledge (or proof) that your marcom is driving incremental revenue to your business is important (and no one would disagree that it is) then you need to invest in the test. Every campaign needs to be designed at least as a test vs. control and the control is the UCG. If you do a business case on the potential revenue you’ll lose from the UCG and compare that to the insight you’ll have about which campaigns are actually increasing the bottom line, investing in a UCG wins every time. Remember the point of analytics is to decrease the chance of making a mistake and UCG is all about that.
BUSINESS CASE
Scott walked into the little conference room, knowing he would again have to explain and struggle with Becky, the director of consumer marketing. Every month she had many ideas about test-and-learn plans and what she wanted to learn from a series of mailings. Every month Scott had to explain to her the concepts of testing, especially the idea of only changing one dimension at a time in order to test. He had thought that if he had recorded last month’s conversation he could just send the recording and have her press play to re-hear it.
He arrived first. He always arrived first. He estimated in a year he wasted 53 hours waiting for a meeting/phone call to start while everybody else eventually wandered in. Becky and her team joined him about six minutes past the hour.
‘So Scott, we’d like to test our messages again. Really get some learning.’
‘Great, all for it’, Scott said. He always said this.
‘I’ve thought about what you’ve been saying and have put a table together. We’d like to test discounts against different audiences.’ She showed him the table. Note that each discount level is applied only once. (See Table 10.1.)
Table 10.1 Testing discounts against different audiences
Cell A   5% discount    Desktop purchase
Cell B   10% discount   Online exclusive
Cell C   15% discount   Purchased > $2,500
Cell D   20% discount   Adding a printer
Scott sighed. ‘Becky, this is the same idea we’ve had before. Compare two customers; one in cell A and another in cell B. If cell B has a higher response/more revenue, is it because of the 10% discount or because of the online exclusive?’
‘I would say both’, she smiled.
‘But the point of a test is to isolate just one treatment, in order to quantify that stimulus.’ He looked at them. They all smiled, all nodded. ‘What is needed to test this is not a 4 cell but a 16 cell matrix. Like this.’ (He drew Table 10.2.)
Table 10.2 Testing discounts against different audiences in a 16 cell matrix
                     5% discount   10% discount   15% discount   20% discount
Desktop purchase     Cell A        Cell E         Cell I         Cell M
Online exclusive     Cell B        Cell F         Cell J         Cell N
Purchased > $2,500   Cell C        Cell G         Cell K         Cell O
Adding a printer     Cell D        Cell H         Cell L         Cell P
‘Wow’, Becky said. ‘That makes sense. We will need a far greater sample size though, right?’
‘That’s right. This is called full factorial and will detect all interactions. The benefit is in the confidence of the learnings and the cost is in the sample size, which means both time and money. It’s a trade-off, as always.’
‘Okay, we’ll redesign. Let’s also talk about the results of last month’s test.’
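Scott’s 16 cell matrix is simply every discount crossed with every audience. A minimal sketch of generating such a design, using the labels from Table 10.2 (the variable names are illustrative):

```python
from itertools import product
from string import ascii_uppercase

DISCOUNTS = ["5% discount", "10% discount", "15% discount", "20% discount"]
AUDIENCES = ["Desktop purchase", "Online exclusive", "Purchased > $2,500", "Adding a printer"]

# Full factorial: one cell for every (discount, audience) combination,
# lettered down each discount column as in Table 10.2.
cells = {
    ascii_uppercase[i]: combo
    for i, combo in enumerate(product(DISCOUNTS, AUDIENCES))
}

for letter, (discount, audience) in cells.items():
    print(f"Cell {letter}: {discount} x {audience}")
```

Each added dimension multiplies the cell count, and every cell needs its own adequate sample, which is the cost Scott is warning about.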
‘Great.’
‘Well, in this case the control cell out-performed the test cell. So the test did not work.’
‘What were we testing?’
‘This was to past desktop purchasers. The control was a 10% discount and the test was a 20% discount. In the past the 10% discount is pretty standard so we wanted to see how many more sales happen with a 20% discount.’
‘Makes sense’, Scott said. ‘It seems so weird that the 10% would out-perform the 20%. By how much?’
‘By almost 50% more response, that is, number of purchases.’
‘These were randomly chosen?’
‘Yep’, Becky said. ‘I guess it means our target audience does not need a deeper discount, which is a good thing. They are very loyal and will act without a deeper stimulus. But somehow I doubt it.’
‘So do I. It does not make economic sense. We should investigate the list, make sure both sides got the single treatment, try to see if something was amiss. Each cell was about the same size?’
‘Yeah, very close.’
‘But’, Kristina said, ‘how did we make sure both cells only got this treatment?’
‘What do you mean?’ Scott asked.
‘Nothing happened that I know of to pull these customers out and only get this month’s deal.’
‘And last month the “Get a Free Printer” went out.’
‘And the desktop bundle went out.’
‘And since far more of our customers get the 10% discount than anything else, those that got the 10% discount in this test may also have received one or both of the other stimuli. Right?’
‘Yeah, I think so.’
‘Well, if true, that could explain it’, Scott said. ‘Our 10% cell may have got at least three stimuli, not one.’
Becky sighed. ‘So the test has to be done again?’
‘Probably. If it was important to know what that treatment drove then the answer is yes.’
‘Well, yeah it was. And we’ve had such difficulty with testing anyway – I mean the design of it – to go back and re-test will be a hard sell.’
Scott looked at her. ‘I don’t know how helpful it might be, but we possibly could do a multivariate exercise to try to isolate this test.’
‘What do you mean?’
‘I’m not sure. We might be able to do a model that accounts for all the treatments and still, ceteris paribus, measures just this campaign.’
Kristina looked up. ‘You mean an ANOVA of some kind?’ (Analysis of variance is a general statistical technique to analyse the differences within and between group means.)
‘Yeah, although I’m an econ guy so I’m more comfortable with regression. But some technique that accounts for multiple simultaneous sources of stimuli on revenue.’
Scott went to the whiteboard and drew Table 10.3.
Table 10.3 Multiple sources model
Cust ID   60 day review   Printer promo   DT bundle promo   20% disc promo   # opens   # clicks   # web visits   # calls   Past rev
X         0               1               0                 1                7         3          9              0         1800
Y         900             0               1                 1                8         1          5              2         490
Z         0               0               0                 0                11        4          4              1         800
‘Now’, Scott said, ‘we can include any and all promotions, etc., that we can track and put in this model. The idea is to measure the dollar value of all stimuli.’
‘What if we don’t or can’t get all the information?’
‘We will always miss something. It’s important to include all we know, all we can know, from both a theoretical as well as actual causality assumption. There is a fine line between including too much and missing something important.’
‘Can you explain a bit about that? I’m not sure what you mean’, Kristina asked. She had always had an interest in the modelling process, especially on the more technical side of things.
‘From an econometric point of view, to exclude a relevant variable will bias the parameter estimates, so we need to ensure we have all important theoretically sound independent variables. To include an irrelevant variable increases the standard error of the parameter estimates, meaning that while they are unbiased the variation is larger than it should be so the t-ratios (beta/standard error of beta) will appear smaller than they should be. Thus, it behooves modellers to design a theoretically sound model and collect relevant data.’
They all looked at him. ‘Sounds good’, Becky said. ‘Let’s talk with IT and collect the data you need and you can put this together for us?’
So Scott got the data together and ran the model and they found the various campaigns’ contribution to revenue that accounted for most other important factors. This type of analysis allowed Scott’s team to offer campaign valuation outside of a strictly testing environment. While each point of view has pluses and minuses, Scott’s valuation method could specifically take into account other (dirty) data issues. Also, his results directly tied to sales, something A/B testing did not do. As mentioned, a background in economics is valuable for a marketing science function.
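To make Scott’s idea concrete, here is a minimal sketch of a regression that separates overlapping treatments. Everything in it is made up for illustration: the dollar effects, the tiny noise-free data set and the hand-rolled least-squares solver stand in for whatever data and statistical package a real team would use.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical dollar effects: baseline, printer promo, DT bundle promo, 20% discount.
TRUE_EFFECTS = [100.0, 50.0, 30.0, 80.0]

# One synthetic customer for every combination of the three promotions (0 = no, 1 = yes).
x_rows = [[1.0, *combo] for combo in product([0.0, 1.0], repeat=3)]
revenue = [sum(b * x for b, x in zip(TRUE_EFFECTS, row)) for row in x_rows]

estimates = ols(x_rows, revenue)
print([round(e, 2) for e in estimates])
```

Even though many of these customers received two or three promotions at once, the regression assigns each promotion its own dollar value, which is what a ‘dirty’ A/B comparison cannot do. With real data there is noise, so the estimates come with standard errors rather than matching exactly.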
Checklist
You’ll be the smartest person in the room if you:
Remind everyone that they must ‘Invest in the test!’ This typically means using a large enough sample for a control group that will allow a meaningful test.
Point out that it’s difficult to actually control for everything. Simple random selection is only a blunt instrument.
Remember that experiment design, A/B testing (champion vs. challenger) will not give the impact of individual dimensions (what impact price has, or message, or competition changes, etc.).
Demand that the sample size equation incorporates lift.
Make fun of the silly answer (‘N = 380’) to the question ‘How large a sample do we need?’
Shout loud that in all testing each cell can only differ by one thing (one dimension).
Recommend using ordinary regression to account for ‘dirty’ testing.
Part five
Capstone
11
Capstone: focusing on digital analytics
Introduction
Modelling engagement
Business case
Model conception
How do I model multiple channels?
Conclusion
Introduction
This chapter is a capstone of most of what we’ve done before. It’s meant to be a practical application of traditional techniques applied to different kinds of (non-traditional) data.
Since the mid-1990s when the World Wide Web became available, many marketing scientists and others panicked because of the new kind of data. Clickstreams/web logs were becoming available and many people thought that the new data would need new techniques. They forgot it is still marketing. They forgot it is still consumer behaviour.
You’ve probably surmised, as I mentioned elsewhere, I am not in favour of unsupervised techniques and it was these that many data analysts began to run to. Unsupervised techniques include things like neural networks, various machine learnings, chaos/catastrophe theory, etc. (If you HAVE to learn these things you will easily find a bucketload of new-fangled algorithms online.) But why would new data require new techniques? When direct mail became available did we invent new techniques? When e-mail became available did we invent new techniques? Regression is still worthwhile regardless of the kinds of data used.
The above is not to say that digital data IS NOT very different from traditional data. I LOVE clickstream data (such as Omniture’s page views) that shows just what page a consumer views, for how long and in what order. That is an amazing tracking of consumer behaviour. And the new social media is bringing about a paradigm shift from outbound marketing to inbound marketing. It’s different kinds of data but why would it require new statistical techniques? Consumers are still behaving, shopping, buying. Right?
New data (BIG DATA!) is bringing about panic because it is MORE data (both in terms of size (including increased variety) and additional behavioural dimensions). New data still tracks a consumer’s awareness, familiarity, consideration, shopping and purchase. So I’d suggest NOT using neural networks and Taguchi methods as a reaction to new data. There might be a place for these things, but it is NOT just because the data is new.
I’m not against new algorithms when needed. I typically do not think they are needed. I am also philosophically opposed to many of the conceptions that seem to be behind these new techniques, in that they try to remove the analyst from the analysis. Many of them are virtually marketed as a voodoo/black box and advocate not really needing an analytic expert running the operations. That seems to me a formula for massive failure. Not to mention that when these things have been put into the field, I have never seen them do better than traditional econometric techniques. Never. I have had many debates and bets on this very issue over the years. (You know who you are!)
Modelling engagement
When it comes down to it, a firm can only really be successful if it can engage consumers. This is why RFM (recency, frequency, monetary) works, to a certain extent: it (simplistically) finds those customers that tend to be most engaged. The real issue is quantifying engagement: what behaviour is most valuable?
Why quantify engagement?
Because engagement is by definition psychological (its impact is seen in overt behaviour), the metric ‘engagement’ has to be derived indirectly. That is, engagement is a motivator, a stimulus that shows itself in certain overt behaviours. Because engagement is an indicator of interest, depending on the problem solving for the product needed, interest (in the shopping phase) is key to moving the consumer to the purchasing phase. Quantifying engagement can lead to specific marketing actions.
What are the hypothesized factors to drive purchases?
There are several things that cause purchases. Some of these are pricing, seasonality, competition, consumer confidence, campaigns and engagement. These are both blatant as well as latent. These are both internal and external. These are both marketing levers and consumers’ need arousal. But engagement (interest in the product) is certainly a precursor before any purchasing can be made, regardless of the level of decision making.
What are the issues around designing an engagement model?
Figure 11.1 shows an ‘issue tree’, a technique sometimes used in designing a project. The idea is that the key issues/requirements are stated and solutions or other issues are detailed. This way, focus is on the big picture, and all ‘trouble spots’ as well as necessities are planned for. Yes, this comes from McKinsey.
Figure 11.1 Issue tree
What should an engagement model look like?
Because engagement is latent, there needs to be a technique that accounts for the interactions and discovery of this hidden motivator. But the model must ultimately quantify engagement. It should show what explanatory power engagement has (given seasonality, competition, pricing, marcom, etc.) and how much engagement is worth to the firm. That is, the model must both give a structural analysis in shared variance as well as impact to revenue. Remember Peter Drucker’s admonition: if your project is not increasing satisfaction, decreasing expense or increasing revenue, you should consider NOT doing it.
Since engagement is about both hidden motivations and outright behaviours, what does this mean analytically? It means factor analysis will be used to find the latent motivations. Factor analysis is an inter-relationship technique stolen from psychologists. The idea is that it extracts variance from variables that ‘load’ (correlate together) and then makes a new factor. That is, variables load high or low, depending on the underlying (hidden) factor.
Recall that we used factor analysis to combine independent variables into other variables (factors) that were by definition non-correlated. That is, the resultant factors are uncorrelated with each other but the collection of factors maintains the (distinct, non-overlapping) variance of the independent variables. This is why it tends to work as a correction for collinearity.
Another (and more typical) use of factor analysis is to divine underlying motivations. Conceptually this means that if blatant variables load high onto a factor, it is because they are each motivated by a latent dimension. Then another latent dimension comes into play to motivate the other variables. For example, if we have variables like GPA, income, education, job title, etc. that load high onto one factor we might call that factor ‘intelligence’. There is no variable called ‘intelligence’; we label the factor as such based on which variables correlate together. Thus the same analytic strategy can be used for engagement. This is the technique that structural equation models (SEM) use.
BUSINESS CASE
Scott was ‘loaned’ to the online software sales team at the end of the year. This team was new and primarily marketed software for small businesses. The software would keep track of the firm’s network, ensuring security and connectivity was updated. It also recommended certain hardware products to upgrade performance, etc.
Scott reported to the GM of the software group.
‘Hi Scott, good to see you’, he said and stood up and shook Scott’s hand. ‘I’ve heard good things about you and frankly we need your help.’
‘Any way I can’, Scott said.
‘Good. We need to understand what online actions indicate interest. When our potential customers come to our website they can browse for the software, click on product demos, download a trial version, download a webinar, chat with a sales engineer, etc. We are trying to quantify those actions that are most indicative of purchase, and then exploit those actions.’
Scott nodded.
‘So’, the GM continued, ‘when a potential customer opts in to receive e-mails, or to join a community, we know that behaviour is obviously one of engagement. We want to know what that engagement is worth. Does only opt-in behaviour provide the path to purchase, or are there other things?’
‘So you want to quantify those clicks – those behaviours – that lead to purchase.’
‘That’s right. Not all behaviours are equally important in indicating engagement. We want to know where in the purchasing chain are number of opens, number of page views, and time on site, etc.’
‘Sure, I see. Which behaviours are bigger drivers of purchasing than others? Which are shopping and latent, which are precursors to purchasing and are blatant? Sounds fun.’
ScottalreadyhadanideaashelefttheGM’soffice.Hecalledhisteamtogetherandtheyorganizedaccesstodata.Themaindimensionswouldbeclickstream/pageviews,primarilywhitepaperdownloads,webinars,trialsoftwaredownloads,numberofopens,numberofclicks,numberofpageviews,timeonsiteandwidthanddepthofproductpages.Opensandclicksrefertoe-mailengagement,widthofproductpagesindicatesthevarioussoftwareoptionsavailableanddepthofproductpagesindicatesaninvestigationofallofthespecificsforaparticularsoftwareproduct.Widthanddepthareimportantanddifferentviewsofcustomerbehaviour.Thinkofwidthasifshoppingforjeansandtopsandshoesandcoats.Thinkofdepthasifshoppingforjeans,whitewashedjeans,differentsizedjeans,returnpolicy,storelocation,productreviewofjeans,etc.
Mostoftheinternalclientsbelievedthatonlygated/registereditems(whitepaperdownload,trialsoftwaredownload,webinars,etc.)hadanyrealengagementtoquantify.Thisisanobviouslydeeperbehaviourthan,say,numberofopensandnumberofclicks.Scottwonderediftherewereanyotherbehaviours(particularlynon-gated)thatwouldquantifyasengagedastheopt-inrequiredbehaviours.
Sohecollectedthedataandranfactoranalysis.Twofactorsaccountedfor86%ofallthevariationoftheindependentvariables.Giventhebelowloadings(Table11.1),Scottcalledfactorone‘WindowShopping’andfactortwohecalled‘TryitOn’.Thatis,thebehavioursofopens,clicksandnumberofpageviews,forexample,arehypothesizedtobemotivatedby‘WindowShopping’.Likewisethebehavioursofdepthofproductpages,whitepaperdownloadandwebinarsaremotivatedbyadesireto‘TryitOn’.Whilethisseemsultimatelyintuitive,thewaytheanalysisputsthesetwolatentfactorstogethertoexplaintheblatantbehavioursiscompelling.
Table 11.1 Factor analysis
Variable               Factor 1          Factor 2
                       Window Shopping   Try it On
Opens                  0.76              0.26
Clicks                 0.84              0.12
Webinar                0.10              0.88
White paper download   0.12              0.82
Software download      0.29              0.86
Page views             0.90              0.11
Time on site           0.77              0.14
Width product pages    0.03              0.09
Depth product pages    0.16              0.77
It’s important to note (for business insights) that the factor ‘Try it On’ is not only gated items, but includes depth of product pages at 0.77. This means there is high engagement in depth of product pages, almost as high as the opt-in behaviours.
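Labelling factors amounts to seeing which variables load high on which column. A minimal sketch using the loadings in Table 11.1 follows; the 0.5 cut-off for a ‘high’ loading is an assumption chosen for illustration, and the helper is not a factor analysis routine, only a reader of its output.

```python
FACTORS = ("Window Shopping", "Try it On")

# (Factor 1, Factor 2) loadings copied from Table 11.1.
LOADINGS = {
    "Opens": (0.76, 0.26),
    "Clicks": (0.84, 0.12),
    "Webinar": (0.10, 0.88),
    "White paper download": (0.12, 0.82),
    "Software download": (0.29, 0.86),
    "Page views": (0.90, 0.11),
    "Time on site": (0.77, 0.14),
    "Width product pages": (0.03, 0.09),
    "Depth product pages": (0.16, 0.77),
}


def assign_factors(loadings, threshold=0.5):
    """Map each variable to the factor it loads highest on, or None if no loading is high."""
    assigned = {}
    for variable, values in loadings.items():
        best = max(range(len(values)), key=lambda i: abs(values[i]))
        assigned[variable] = FACTORS[best] if abs(values[best]) >= threshold else None
    return assigned


assignment = assign_factors(LOADINGS)
for variable, factor in assignment.items():
    print(f"{variable}: {factor}")
```

Depth of product pages lands with the gated items under ‘Try it On’, while width of product pages loads on neither factor.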
Model conception
This gave Scott an obvious functional form of the model:
Purchase = f(Window Shopping, Try it On).
That is, he would regress purchase spend on the two factors (which in turn account for the variation of all the other independent variables and are themselves orthogonal, that is, uncorrelated with each other). When he did that, using the factors as the two independent variables, he achieved an adjusted R2 of over 37% and both factors were significant at the 95% level. This means that engagement itself accounts for more than one third of the variation in purchase revenue. The ‘Try it On’ coefficient was 17,573 and the ‘Window Shopping’ coefficient was 5,448. This means that ‘Try it On’ has more than three times the impact on revenue that ‘Window Shopping’ does. The intercept was 9,801.
Examples applied to customers
Table 11.2 shows three examples of how it works. Note that contact 1050 has a large number of webinars, did many white paper downloads, downloaded the trial software and searched the website product pages to a significant depth. They obviously opted in and fall into the ‘Try it On’ motivation and have high predicted revenue.
Table 11.2 Examples applied to customers
Contact   Engaged revenue   Window shopping   Try it on   Opens   Clicks   Webinar   White paper dl   Trial sw dl   Page views   Time on site   W_prod pages
1050      90,451            –0.005            4.591       34      22       5         7                1             222          666            8
1061      51,523            4.453             0.988       77      71       1         6                1             620          1860           4
1269      37,145            3.445             0.488       55      8        0         0                0             559          111            5
Let’s calculate contact 1050’s engaged revenue using the model.
Engaged revenue = intercept + (Window Shopping coeff * window shopping indep var) + (Try it On coeff * try it on indep var).
90,451 = 9,801 + (5,448 * –0.005) + (17,573 * 4.591).
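The same scoring can be applied to all three contacts. This is a minimal sketch using the coefficients quoted in the text and the factor scores in Table 11.2; the function name is illustrative. Contacts 1050 and 1269 reproduce the tabled engaged revenue, while contact 1061 computes to about 51,423 against a tabled 51,523.

```python
INTERCEPT = 9801.0
WINDOW_SHOPPING_COEFF = 5448.0
TRY_IT_ON_COEFF = 17573.0

# (window shopping score, try it on score) per contact, from Table 11.2.
FACTOR_SCORES = {
    1050: (-0.005, 4.591),
    1061: (4.453, 0.988),
    1269: (3.445, 0.488),
}


def engaged_revenue(window_shopping: float, try_it_on: float) -> float:
    """Predicted revenue from the two engagement factors."""
    return (INTERCEPT
            + WINDOW_SHOPPING_COEFF * window_shopping
            + TRY_IT_ON_COEFF * try_it_on)


predicted = {contact: engaged_revenue(*scores) for contact, scores in FACTOR_SCORES.items()}
for contact, revenue in predicted.items():
    print(contact, round(revenue))
```

Contact 1050 scores highest despite a near-zero window shopping score, because the ‘Try it On’ coefficient is more than three times larger.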
Second, note contact 1061 has a different behaviour. They had many opens and clicks (indeed they clicked on nearly every open), a smaller number of download actions, but a high number of page views and time on site. They exhibit the window shopping behaviour and thus have smaller predicted revenue.
Last, note contact 1269. They have the smallest number of clicks, smallest number of downloads, least time on site and no depth of product pages. Therefore their predicted revenue is lowest.
Scott got his team together, as well as the stakeholders, for the output presentation. He wanted to talk about marketing actions. They came up with the following list:
Sales/hot leads: given the score, these contacts could be turned over to the sales team, that is, engagement can be used as a ‘qualifier’ of a hot lead.
Operations/strategy: given the vastly more valuable ‘Try it On’ behaviour, everything possible should be done to remove barriers to ‘Try it On’.
Marcom/campaigns: message that ‘Try it On’ is available, let every potential contact know that they can download trial software, read a white paper, etc., to get comfortable with the buying decision.
At the quarterly analytic operations meeting, Scott and his team were called out by the VP for their work on engagement modelling. This was a group of all the marketing analysts in the company.
There had been a test put in place based on that analysis and the results were overwhelming: when campaigns mentioned the availability of ‘Try it On’ before purchase, purchase was ultimately 3.5 times more than with those that did not get the message. This translates to huge increases in software revenue. The audience smiled and nodded their heads.
‘I’m a little surprised’, the VP said. ‘This is extremely meaningful to us; we’ve found a simple way to extract millions in extra revenue, based on an analytic project.’
The crowd looked at him.
The VP huffed. ‘When we have a functional breakfast or an after-work get-together, you guys are laughing and clapping and making all kinds of noise. At sports events you scream and cheer. But when hearing of an analytic result that is very positive, you just nod your head.’
Now the audience squirmed a bit.
‘I just mean’, the VP continued, ‘I would think – given you all work in analytics, and have spent years educating yourself about analytics – that when you see an exciting result proving analytics, there would be a lot more hoopla. It’s okay to be glad that your chosen career field really does add value.’
Let me reiterate what this VP is saying. Analytic folks, overall, tend to be a bit quiet – sure, let’s say it’s the logic/rational-dominated side of their brain.
How do you know if you’re an analytic person? You love the simple joy that comes when seeing a variable that should be significant, proved in the data. The satisfied look of wonder pervades your face when the world makes sense. That replaces the constant, cynical caveat-laden weariness we usually have to carry around. That’s what got us into analytics in the first place, right? People are confusing, full of irrational grey areas, but data is data, truth is truth. When well-understood relationships make sense it’s comforting; when insights are found, it’s exciting. Murder solved! Puzzle completed! And it’s consumer behaviour we are trying to predict – this helps us believe that maybe people are NOT so confusing. Okay, infomercial over, back to the VP’s meeting.
‘It’s okay’, the VP said, ‘to acknowledge that analytics works.’
Scott stood up and clapped. ‘Yeah, analytics rocks!’
Most of the audience looked at their watches, a few clapped or cheered a little, some coughed, one or two rolled their eyes. The VP shrugged his shoulders and they all went back to work. Scott sat back down and sighed.
How do I model multiple channels?
Simultaneous equations are the answer to that question. The channels in question include blogs, positive ratings, direct mail, e-mail, etc.
Social media has become THE THING lately, of course. While everyone seems to jump on the revolutionary bandwagon, and rightfully so, there have been other revolutionary bandwagons. In the mid-1990s the internet/WWW became available and widespread. In the mid-1970s it was personal computers and in the 1960s mainframe computers – each of these had huge data implications. So while social media IS a different kind of data, analytically it merely allows more understanding of consumer behaviour. Of course the most exciting aspect of social media (in terms of marketing science) is that for the first time INBOUND marketing is possible.
As such, the ability to model social media is critical. This does not mean it will require new techniques; it is just a different source of data. It does shed light on shopping channels, that is, what does social media have to do with online purchases as opposed to offline purchases? Since everyone is demanding to know how much advertising budget to assign to social media, the impact of social media on purchasing by channel is critical.
That’s what Scott knew was going to happen when he was called into the office of the newly created VP of digital media.
The VP put down her phone and shook Scott’s hand. Scott smiled.
‘I bet I know what you’re going to say’, Scott said. ‘You’d like to know what impact social media has on sales.’
‘Sure, but one complication: we have two sales channels, online and offline. We’d like to know to what extent social media impacts on sales in both the online and offline channel.’
Scott gulped. ‘Well, that’s a little more complicated.’
She smiled. ‘But not too hard for someone that won the Executive Award last year, right?’
‘We’ll do what we can’, Scott said. ‘I’ll get connected with your data people and we’ll see what we can find out.’
‘The issue is important’, she pointed out. ‘All of us are being asked to cut our advertising budgets. We have a portfolio approach. Do we spend in direct mail, e-mail, online or social? Your analysis can help us optimize our budgets.’
‘I see. No pressure.’
‘And we’ll need it in two weeks, to meet our marcom plans.’ She smiled and picked up her phone, the meeting over. Scott walked to his office and knew that the next two weeks would be difficult.
His team collected weekly sales data, both online and offline. Scott would do a time series model. He would use simultaneous equations to model the impact of the marketing mix (product, price, promotions and place) on sales. He’d do a separate model for desktops, notebooks and workstations.
For example, in the desktop model, he wanted to know what price does to explain the sales of desktops by each (online and offline) channel. What about promotions, like e-mail and direct mail? And what about social media: blogs, positive mentions, share of voice, etc? It would be interesting to find out the differences these independent variables had on moving units differently by channel. E-mail and direct mail could be thought of as outbound marketing, whereas social media could be thought of as inbound marketing. From a strategic point of view, the objective was to optimize the budget, and Scott thought that if this model worked that would be a very real use.
Because Scott had already decided on a time series model, ie each row is a weekly aggregation, he did not have to deal with sparse data on a consumer level. That is, if he took the ‘each row is a consumer’ approach, there would be so few matches (especially in terms of social media) that he would not have a large enough sample. Likewise he was going to model units sold as the dependent variable against the whole marketing mix, NOT just use social media as independent variables. That would place far too much attention on just social media and would fly in the face of all the other things known to move consumer behaviour, such as price, season, marcom vehicles, etc.
So the theoretic conception of the model would be:
ONLINE UNITS = f(# direct mails, # e-mails, online price, offline price, social media, etc.)
OFFLINE UNITS = f(# direct mails, # e-mails, online price, offline price, social media, consumer confidence, etc.)
He would have to consider the identification problem and all the other modelling issues, but the above looked like what he needed.
An added thing Scott had to address: the lag structure. It’s well known that many things (especially marketing communication vehicles) have a lag effect on, say, demand. (By lag is meant that a weekly variable is moved down one week, so that instead of its actual occurrence on Jan 7, for example, it is lagged to happen on Jan 14.) The actual shape, amplitude and length of that lag structure are the subject of hundreds of academic papers. So the problem is, to restate: what impact do marketing levers (price, website visits, marcom vehicles (including the lag structure), social media) and other effects (seasonality, consumer confidence) have on moving units in both the online and offline channels? This should be seen as a BIG problem, and very important to quantify.
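To make the idea of a lagged variable concrete, here is a minimal Python sketch of shifting a weekly series down by one or more weeks. The weekly e-mail counts and the function name `lag` are hypothetical, chosen only for illustration; this is not the SAS %pdl macro mentioned later, just the basic shift it builds on.

```python
from typing import List, Optional


def lag(series: List[float], periods: int) -> List[Optional[float]]:
    """Shift a weekly series down by `periods` weeks.

    The first `periods` weeks have no earlier value to inherit,
    so they are filled with None.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    if periods == 0:
        return list(series)
    return [None] * min(periods, len(series)) + list(series[:-periods])


# Hypothetical weekly e-mail volumes (illustrative numbers only).
weeks = ["Jan 7", "Jan 14", "Jan 21", "Jan 28"]
emails = [1000, 0, 500, 0]

emails_lag1 = lag(emails, 1)
for week, actual, lagged in zip(weeks, emails, emails_lag1):
    print(week, actual, lagged)
```

The 1,000 e-mails dropped in the week of Jan 7 appear in the lagged column against Jan 14, so a regression on the lagged column measures their effect one week later.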
So Scott collected the data and began working on the model. He settled on SAS’s 3SLS procedure. For social media their ‘listening group’ came up with several variables: number of blogs about the company as well as competitors, share of voice (percent mentions about the company divided by total mentions of all competitors), forums, positive mentions, etc. For the lag structure Scott used SAS’s macro (%pdl) that allows modelling to include the number of lags and amplitude of lags.
Table 11.3 shows the output of the simultaneous (desktop) models. There are several notes about each. First the offline model has an adjusted fit of 80%; that is, the listed independent variables (significant at the 95% level) account for 80% of the movement in the offline channel.
Table 11.3 Impact on offline units

OFFLINE (R-Square 86%, Adj R-Sq 80%)

Variable             Parameter estimate  Units per 1,000
Intercept            52,289
Blogs                0.055               +55 units
Direct mails         0.046               +46 units
Direct mails_lag1    0.039               +39 units
Direct mails_lag2    0.012               +12 units
Direct mails_lag3    0.009               +9 units
Direct mails_lag4    0.004               +4 units
E-mails              0.025               +25 units
E-mails_lag2         –0.04               –40 units
E-mails_lag3         –0.065              –65 units
E-mails_lag4         –0.012              –12 units
Visits               0.048               +48 units
Offline price        –3.417
Online price         1.801
Consumer confidence  21.158
Q4                   192,668
The marcom (direct mail and e-mail) shows a lag effect. Direct mail lags 0–4 periods in its impact and e-mail also lags 0–4 periods in its impact.
Price is interesting. The offline price (in the offline model) is, as expected, negative. This again is the ‘law of demand’; price goes up and units go down. The online price is positive. This means the online channel is a substitute; that is, if the online price increased by, say, 10%, the OFFline demand would increase by 18%.
Now an interpretation is needed, especially of social media and marcom in terms of units. The right-hand column (highlighted in grey in the original table) shows how many units are expected, on average, from each 1,000 items. That is, multiplying the coefficient by 1,000, for example, means that if there are 1,000 blogs, on average the offline channel benefits by about 55 units. When direct mail is dropped, for each 1,000 pieces there are 46 units added to the offline channel. Note the e-mail lags are both positive and negative, meaning the amplitude has a different shape. E-mail only has a positive impact when it is first dropped, but over time it is negative (this might reflect e-mail fatigue). The above seems to indicate that direct mail is more impactful than e-mail in the offline channel. Note also how impactful Q4 is in the offline channel. This is part of the insight that only an econometric model gives.
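The ‘units per 1,000’ reading can be reproduced with a short Python sketch using the parameter estimates from Table 11.3. Adding the lag coefficients together to get a cumulative effect is an assumption made here for illustration (it treats the weekly effects as simply additive); the helper names are likewise hypothetical.

```python
# Parameter estimates copied from Table 11.3 (offline model).
offline_coefficients = {
    "Blogs": 0.055,
    "Direct mails": 0.046,
    "Direct mails_lag1": 0.039,
    "Direct mails_lag2": 0.012,
    "Direct mails_lag3": 0.009,
    "Direct mails_lag4": 0.004,
    "E-mails": 0.025,
    "E-mails_lag2": -0.040,
    "E-mails_lag3": -0.065,
    "E-mails_lag4": -0.012,
    "Visits": 0.048,
}


def units_per_thousand(coefficient: float) -> float:
    """Expected change in units for 1,000 items of the variable."""
    return coefficient * 1_000


def cumulative_units_per_thousand(coefficients: dict, prefix: str) -> float:
    """Sum a variable's current and lagged coefficients (assumed additive)."""
    total = sum(
        value for name, value in coefficients.items() if name.startswith(prefix)
    )
    return units_per_thousand(total)


print("Blogs:", round(units_per_thousand(offline_coefficients["Blogs"])))
for channel in ("Direct mails", "E-mails"):
    print(channel, round(cumulative_units_per_thousand(offline_coefficients, channel)))
```

Under that additive assumption, direct mail stays positive across all of its lags, while the negative e-mail lags outweigh the initial positive effect, which is consistent with the e-mail fatigue reading above.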
Now take a look at the online model. The adjusted R² is a little better. Now observe prices. The online price is again negative as expected but note that while the offline price is positive (indicating substitutability) its cross-price effect is far smaller than the one in the offline model. That is, in the online model a 10% increase in the offline price brings about only a 1.2% change in the online units (compared to an 18% impact in the offline model).
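The 18% and 1.2% figures follow from reading the price coefficients as elasticities, that is, as the percent change in units per one percent change in price. That reading is an assumption inferred from how the text uses the numbers; the sketch below simply applies it to the price coefficients in Tables 11.3 and 11.4, with hypothetical names.

```python
# Price coefficients from Table 11.3 (offline) and Table 11.4 (online),
# read here as elasticities (an assumption based on the text's usage).
offline_model = {"Offline price": -3.417, "Online price": 1.801}
online_model = {"Offline price": 0.121, "Online price": -5.704}


def percent_change_in_units(elasticity: float, percent_price_change: float) -> float:
    """Percent change in units implied by a percent change in price."""
    return elasticity * percent_price_change


# Cross-price effects of a 10% price increase in the other channel.
offline_gain = percent_change_in_units(offline_model["Online price"], 10)
online_gain = percent_change_in_units(online_model["Offline price"], 10)

print(f"Online price +10% -> offline units {offline_gain:+.1f}%")
print(f"Offline price +10% -> online units {online_gain:+.1f}%")
```

The asymmetry is the point: a price rise online pushes far more demand offline than an offline price rise pushes online.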
It should be no surprise that web visits are far more impactful to online units but look how much more powerful e-mail is. While this also is probably no surprise please note that this marcom channel can be quantified. Observe likewise that in the online model direct mail now turns negative at lags 3 and 4.
Now let’s interpret the social media. It is much more significant in the online model. Share of voice, forums, how many followers the firm has and positive mentions all contribute to the online units. This would probably indicate the firm should do what they can to invest in achieving positive mentions, followers, increasing share of voice, etc.
The last task is to look at the seasonality. Because Q4 is dropped (remember the dummy trap?) all the other quarters are referencing that. Note all three are negative (compared to Q4) with Q2 being the most negative. This helps planning purposes.
The overall message would seem to be: direct mail and consumer confidence are powerful in impacting offline units, but e-mail and social media are not. In the online channel e-mail, social media and website visits are much more impactful. While again this is intuitively compelling, it had not been quantified before.
So, given the above model, what are the strategic implications Scott can give? In terms of price: since the online channel is much more of a substitute for offline purchasers, raise the offline price to drive more buyers online and think about adding online exclusives.
In terms of e-mail: decrease the amount of e-mails sent to those that only/mostly purchase offline. Increase the amount of e-mails sent to those that only/mostly purchase online.
In terms of direct mail: decrease the amount of direct mail sent to those that only/mostly purchase online. Increase the amount of direct mail sent to those that only/mostly purchase offline.
In terms of social media: engage in inbound marketing (find X advocates/champions of the firm, institute a blog strategy of community, etc.). Offer promotions in social space to purchase the firm’s online products.
Note all the strategic implications from this model. It addresses most of the marketing mix (product, price, promotion and place) and offers strategies based on quantifying causality.
Table 11.4 Impact on online units

ONLINE (R-Square 88%, Adj R-Sq 83%)

Variable             Parameter estimate  Units per 1,000
Intercept            11,805
SOV                  46.92
Forums               0.0037              +3 units
Followers            0.0592              +59 units
Positive mentions    0.016               +16 units
Direct mails         0.08                +80 units
Direct mails_lag3    –0.073              –73 units
Direct mails_lag4    –0.043              –43 units
E-mails              0.113               +113 units
E-mails_lag1         0.013               +13 units
E-mails_lag4         0.009               +9 units
Visits               0.165               +165 units
Offline price        0.121
Online price         –5.704
Q1                   –1,947
Q2                   –2,323
Q3                   –170
Conclusion
Simultaneous equations provide a powerful (and sophisticated) way of quantifying important (and well-known) interactions. Oversimplification is the bane of good analytics.
Part six
Conclusion
12
The Finale
What should you take away from this? Any other stories/soapbox rants?
What things have I learned that I’d like to pass on to you?
What other things should you take away from all this?
What things have I learned that I’d like to pass on to you?
Wow, we’re here at the end. I hope it was worthwhile and maybe a little fun. If so, tell your friends.
One thing I’d like the rest of the corporate world to know is what a marketing analyst does. That is, not the technical details but what is their function, what is their purpose, why are they important?
Now, I know that if we take a random sample of people all across a number of corporations and ask them, ‘What are the first two words that come to mind, when you think of marketing analysts?’
Most of them will answer, ‘Smouldering sexuality’.
I know it’s true, we deal with real data, we see campaign effectiveness, we can forecast, it is no doubt the sexiest thing in the building. But that is not what I would want them to think about us, top of mind. I would hope that this book – and many like it – and y’all, will help them to think of us as ‘QUANTIFYING CAUSALITY’.
We are able to think in terms of ‘this causes that’, this variable (price) changes that variable (sales) and then – most importantly – quantify it so marketing strategy can act on it. We quantify causality.
I don’t want to hear, ‘Correlation is not causality’ because who cares; we are not talking about correlation, and we hardly ever talk about correlation. Granger causality (invented by economist Clive Granger) asserts that if an X variable comes before the Y variable, and if the Y variable does not come before the X variable, and if, in removing the X variable, the accuracy of the prediction deteriorates, then X causes Y. And we can state it as causality.
So, a couple of things I’ve learned that I’d like to pass on to you. These are anecdotes that helped me focus on important things and I hope these stories will help you.
Anecdote #1
My first job was as a salesman in a shoe store. I was 16 and that at least meant I thought everyone over 30 was out of touch and un-cool (it was the mid-1970s).
One day the boss was out and left Ben and me in charge of the store. Ben was a part-time sales guy, had known the boss and his family for years and was semi-retired, over 60, and Jewish.
A woman came in dragging two toddlers with her. Ben was at the counter and the woman set down a pair of shoes and said the strap broke. Ben said he’d help her get a replacement. I saw right away those were NOT our shoes. That woman was about to get a free pair of shoes because of a befuddled, half-addled, maybe senile and confused salesman. I was not able to get his attention to explain to him the error of his ways. He got her another pair of shoes and she also bought a pair for one of her toddlers. I watched them as she paid and checked out and Ben waved at her and smiled.
I went up to him. ‘Ben, what are you doing!? Those were not our shoes!’
‘Oh, you mean for Mrs. Rasmun?’
‘Yes, you gave her a pair of shoes, for free!’
‘Yes, I know her. She’s a returning customer, has about five kids, comes in here all the time.’
‘But, you GAVE her a pair of shoes.’
He looked at me. ‘Yes. If I told her those were not our shoes she would have disagreed and walked out, unhappy, maybe not to ever come back. Maybe not buy her kids their shoes here. I did give her a pair of shoes. I also sold her another pair of shoes, and ensured she was satisfied and would continue to come back.’
I gulped. ‘Oh…’. So much for my coolness.
What I took away from that, other than my narrow-minded profiling, was that smartness is always about focusing on the customer. It’s not what is ‘right’ financially, but what drives a business is customer-centricity. That’s probably why I ended up in marketing, a discipline that (is supposed to) put customers first.
Now, does this mean the customer is always right? Of course not, see above. The customer CAN be crazy. Remember Gary Becker’s irrational demand curve (Becker, 1962). But, according to Peter Drucker, the purpose of a business is to create and keep a customer – get it? KEEP a customer. This means understanding a customer, and this means using analytics.
What to get out of this: being customer-centric is always right.
Anecdote #2
I worked early on as an analyst at a PC manufacturing firm. I was also finishing my PhD; in fact, writing my dissertation. It involved a fairly novel kind of mathematics, called tensor analysis (more used in physics/engineering than marketing/economics) and was about modelling multi-dimensional demand. My boss (while not very analytic, he was very strategic – including promoting his group and himself to all of his bosses) was impressed with the idea.
Somehow he got an appointment with the head guy, three levels above himself, to show my dissertation. This was not about the differential geometry of manifold tensors, but what could be done for the PC manufacturing company in terms of better estimates of demand. So the big meeting was set, about five weeks in advance. This was to give us time to prepare, because – my god! – this was an audience with the CEO, the BIG BOSS. So we (my boss, call him Bob, and I) worked hard on the PowerPoint presentation, spending days on the words and graphics, trying to focus on the use cases of demand for PCs. HR and the big boss’s secretary even made us rehearse, that is, practise our delivery in front of them, to make sure there were no offending phrases or comments (this was probably directed at me – I was seen as somewhat a loose cannon) and they had to approve it. Finally it was all done and we had our time with the BIG BOSS.
We went in and the office was like a museum, glass and brass and marble – it was a corporate temple.
‘So’, my boss, Bob, began, ‘thanks so much for some of your time. Mike here has a very interesting PC model to show you. Mike?’
I cleared my throat and pointed to the overhead projection. ‘Demand is usually modelled as units being a function of several things, including price. It is always about holding everything else constant.’
‘So Bob’, the BIG BOSS said, ‘how are we going to beat the competition on these server wars?’
I looked at him. What?
‘Oh’, Bob stammered, ‘we have some ideas in mind.’
The next 45 minutes were about Bob and the BIG BOSS talking about the server wars and our competition. At the end we shook hands and left. The BIG BOSS had a limp, damp handshake.
What to get out of this: success comes from focusing on what’s important, especially on what’s important to people several levels above you.
Anecdotes #3 and #4
This anecdote is important, because anyone doing marketing science has faced it. And those not in marketing science wonder about it. I’m talking about altering the data, editing the output file, changing the results to be (more) intuitive.
This is the underbelly of marketing science. I know those in other functions wonder if we change the data. Do we make stuff up?
I was talking with a client recently and they told me about a consultant who was predicting the lift they would get on a particular campaign. The consultant estimated a 16% increase, which was WAY MORE than anything ever achieved before. The consultant was sketchy on what the key drivers of this phenomenal success were. The client frankly did not believe it and said so. The consultant asked what it should be and the client replied that about one-tenth of his estimate would be believable. The next week the consultant came back with a revised estimate of, wait for it, 2%. Honest to God! About one-tenth of what their analytics had predicted earlier. Now I’m here to tell you that there is no way a model would predict 16% and then revise it to realistically be 2%, assuming real analytics were done.
That is one of the only instances I know of where they simply changed the output file. By the way, the client did not believe it either (did not trust their analytics) and fired them. Rightfully so.
So, do we change the output file? The answer is no. We can’t. It’s not just about intellectual integrity, it’s about COA (covering our asses!). Altering the data cannot be hidden; changing the results cannot be buried deep enough to never be found. That is, you will be found out, you will be caught and they will know that you altered the results. You will never have credibility again. Ever. It cannot be hidden. Trust me, it will (eventually) be discovered. This is because all data is interrelated, one metric drives another, and one piece affects another because one variable fits together with another to tell the whole story. Changing one part of it will affect all other parts and it will NOT add up. That does not mean you have to broadcast it to everyone though. You can emphasize this or direct the conversation to focus on that.
The biggest mistake I’ve ever made (that I know of) was ridiculously simple but very costly. I was a database marketing analyst and my job was to do a model and produce a list for customers most likely to purchase. We sent out over a million catalogues a month (at a cost of about 0.40 each).
I developed a logistic regression model to score the database with probability to buy and used SAS proc rank. I was supposed to give them the top three deciles. Now, SAS proc rank has decile output labelled from 0 to 9, with 0 the highest (the best). I accidentally sent deciles 7, 8 and 9 – the lowest, the worst. Although these were the highest (numbered) deciles, get it? Easy mistake to make, right? Well, the campaign that month did not do well. So I sent a message to everyone that I was working on a new model that I thought might be better for next month. My message was designed as a pre-emptive strike that I was engaged and working on the problem. That’s what they saw, I was making it better. When the time arrived the following month I used the same model but this time picked deciles 0, 1 and 2 (the best). That campaign worked well. I was congratulated on improving the model. Of course my team knew it was the same model but the right deciles were chosen. Key takeaway: be careful and be upfront and honest (as need be).
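The decile mix-up is easy to guard against in code. The sketch below is a plain-Python stand-in, not SAS proc rank: it labels scores 0 to 9 with 0 as the best decile, as in the story, and picks the top three. The scores and function names are hypothetical, and the final check (the selected group should score above the overall average) is one simple safeguard, assumed here rather than taken from the text.

```python
from typing import List


def assign_deciles(scores: List[float]) -> List[int]:
    """Label each score 0-9, where 0 is the highest-scoring tenth."""
    n = len(scores)
    # Indices ordered from highest score to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, index in enumerate(order):
        deciles[index] = rank * 10 // n
    return deciles


def select_top_deciles(scores: List[float], how_many: int = 3) -> List[int]:
    """Return the indices of customers in the best `how_many` deciles."""
    deciles = assign_deciles(scores)
    return [i for i, decile in enumerate(deciles) if decile < how_many]


# Hypothetical purchase probabilities for 20 customers.
scores = [i / 20 for i in range(1, 21)]
selected = select_top_deciles(scores)

# Safeguard: the mailed group should score above the overall average.
mean_selected = sum(scores[i] for i in selected) / len(selected)
mean_all = sum(scores) / len(scores)
assert mean_selected > mean_all, "selected the wrong end of the ranking"
print(selected)
```

A one-line comparison like this would have caught deciles 7, 8 and 9 before a million catalogues went out.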
Another anecdote from early in my career was about demand estimation. My job was to forecast call volume and based on that volume different load-balancing (among other things) sites were designed. Well, the company had decided to build another site (in Florida) to handle all the calls. They had bought the land and got a building and were hiring people to staff it. Eventually someone thought maybe they should predict how many calls would go there, that is, estimate demand. It so happened that my boss was a well-respected and long-time econometrician and our job was to put up the demand numbers. Everyone knew the demand was huge; the question was just how huge. So I collected data, macro and micro variables, competition, new products, time series trends, etc. The forecast I got was low – way lower than expected. I gulped and looked at it again. The model was forecasting less than half what was needed for a new site. I met with my boss and we went over everything but could only assume, in the best scenario, 60% of what was needed. We gave the real estate team our estimates and they said thanks and then carried on with the building and the hiring for the new site. A year later that site was closed – there was not enough call volume to support it.
Now it would have been easy and acceptable for us to just double the output, right? It would have been easy to make heroic assumptions that made no sense in order to get the demand forecast way higher, right? In this case we just showed the output and shrugged our shoulders and called it a conservative, worst case scenario.
To have altered it would have been akin to what Einstein called The Biggest Blunder of His Life (not that I’m comparing myself to him!). Einstein’s relativity equations showed that because of gravity the universe should be expanding (or contracting). Since no one believed that, including Einstein himself, he added a ‘cosmological constant’ to his equations, in effect a mathematical way to cancel out the expansion. About a decade later Hubble discovered that the universe was indeed expanding. Einstein edited the output file! The key takeaway? If it did not work for Einstein it will not work for you. Do not change the results.
What other things should you take away from all this?
Have an implementation plan!
The best analytics in the world is of no use if it is not implemented. Often I have been accused (often rightly so) of doing analytics that is too advanced, and no one understands what it means, no one understands how to use it. This is after I have done it, shown the results and put together a PowerPoint presentation explaining what it is and how it helps. It was typically the nature of my job to do a project and then, basically, go away. Theodore Levitt (who, it could be argued, basically invented marketing as a discipline with his Marketing Myopia article) said that people do not want a one-inch drill; they want to make a hole, one inch wide. I was often guilty of expounding on the coolness of the drill, the wonderful details and specifications of the drill, how the drill would help make a hole, why this drill is better than that drill, etc. I needed to focus on what was the need, not the tool. Therefore I’d suggest some of the following after analytics has been done.
Set up tactical use cases. Put together scenarios of before and after, with and without the analytics.
Train the staff, maybe even with real data. Design simulations or use past data and show how the analytics will be implemented. This may mean designing a tracking report and focusing on the new metrics. It ought to mean actually showing data, the score on the database and the strategic implications of the new insights. Take away the abstract black box: analytics is not voodoo.
Get stakeholders together and talk about their goals (especially those their bonuses are dependent on). Show how the new analytics directly impacts these metrics, and then decide upon stretch goals. I have typically found the bar is rather low. Most firms, even Fortune 100 firms, have little idea what’s going on, have few insights and do not know their customers or competition. They typically market with a shotgun approach and throw money around hoping for the best. A few well-designed analytic projects can drastically make a difference. That’s how you become a superstar.
You should set up check-ins at 30 days after, 90 days after, and 180 days after, etc., to get back together and see how it’s going, what has been happening. You are a consultant and are there to help answer questions, ensure the models are working and are being used correctly.
It’s common to set up test vs. control groups, so make sure you are part of this. Remember, everyone wants to test, but almost no one knows how to design a statistical test.
Find a way to make analytics central to as many divisions and senior people as possible. Get in front of as many decision makers as feasible. Never talk about the technical aspects of the analytics, always talk about the downstream resultant (typically financial) metrics. Instead of saying the t-ratio is significant and positive, tell them that net profit can increase by 2.5% next quarter. That will make them put their phones down and listen.
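For the test vs. control point in the list above, one common way to read the results is a two-proportion z-test on response rates. The sketch below is a minimal illustration under stated assumptions: the group sizes and purchase counts are hypothetical, the function name is made up, and a real design would also fix the sample size and significance level before the campaign runs.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    responders_test: int, n_test: int, responders_control: int, n_control: int
) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for test vs. control rates."""
    rate_test = responders_test / n_test
    rate_control = responders_control / n_control
    # Pooled rate under the null hypothesis of no difference.
    pooled = (responders_test + responders_control) / (n_test + n_control)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / n_test + 1 / n_control)
    )
    z = (rate_test - rate_control) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: 10,000 contacts in each randomly assigned group.
z, p_value = two_proportion_z_test(260, 10_000, 200, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

When presenting this to decision makers, the downstream number (the lift in purchases) is what to lead with; the z statistic is only there to show the lift is unlikely to be noise.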
Take a class or read a book (or two) on abnormal psychology
Success in the corporate world depends more on your ability to work with people and get them to do what needs to be done than on your technical skills. This book has been about adding tools but really you need to understand people. Everyone is different, the same things do not work on all people, and people evolve and change over time. Just like kids.
All business emotions come from either fear or greed. Discover the primary motivator of the people above you and the people below you. Generally speaking, lower-level folks are tactic-oriented; they need a list of tasks to complete. As they rise in the corporate ranks they tend to become less tactical and more strategic. This means, generally, lower-level folks are motivated by fear (did they get the job done, was it done correctly, can they be blamed?) and higher-level people are motivated by greed (they run the organization and get a bonus, they get perks, newspaper clippings mention their name). As they reach a very high level they are motivated again by fear because they can be blamed for everything.
So you need to know people enough (especially those under you) so that you understand if they are going through a divorce, having trouble with their kids, drug problems, or just plain crazy. Some people would prefer recognition to a raise, a flexible schedule to an increase in title, one-on-one time with you instead of the forced frivolities of department off-sites. (BTW, not everyone loves bowling or paintball!) So, invest and discover.
Consumer behaviour is predictable enough
What marketing science deals with is quantifying causality. That is, measuring how one variable impacts another variable. This means predicting consumer behaviour.
I like to point out that the weatherman, every day, predicts the weather. Every day it’s wrong. (Maybe it’s right enough, but you decide how often you have made fun of the bad predictions.) Meteorologists have decades of data and use mainframe computers to develop models. The data they deal with are dew points, temperature, wind, pressure, precipitation, etc. That is, they deal with inanimate objects. All of this, and they still can’t get it right!
We marketing science folks typically have only a handful of years of data to work with. We do this on a PC or so, maybe a server. And we deal with irrational animate consumers. We have no chance to be ‘right’.
But the techniques you’ve seen here help and they help to get it right often enough. It’s often enough to move the needle on a corporation’s financial performance. And by the way, how good does the model have to be? I’ve had a boss not use a model because it was not 100% accurate. (Yes, he was an idiot.)
I like to use the analogy of the evolution of the human eye. Millions of years ago our ancestors were blind and at high risk among predators. Eventually some mutations formed and we developed an ‘eye bud’ that allowed not perfect vision but could detect light from dark, could sense shadowy movements ahead, etc. I propose that while this eye bud was nowhere near perfect (not 100%) the insight (get it, sight?) was enough to allow them to make smarter decisions. Its visual acuity would grow and develop over time but at least it could now slightly ‘see’ large creatures coming toward it, it could tell day from night, maybe find food easier, etc. I propose this was enough to survive.
So, aim high. We came out of the mud.
The bar is low. We can only go up from here. Go get ‘em!
Glossary
Average: the most representative measure of central tendency, NOT necessarily the mean.
Censored observation: that observation wherein we do not know its status. Typically the event has not occurred yet or was lost in some way.
Collinearity: a measure of how variables are correlated with each other.
Correlation: a measure of both strength and direction, calculated as the covariance of X and Y divided by the standard deviation of X * the standard deviation of Y.
Covariance: a measure of how two variables vary together, that is, their joint dispersion or spread.
Design of experiments: an inductive way of creating a statistical test using a stimulus taking into account variance, confidence, etc., by randomization and comparison to a control group.
Elastic demand: a place on the demand curve where a percent change in an input variable produces more than that percent change in an output variable.
Elasticity: a metric with no scale or dimension, calculated as the percent change in an output variable given a percent change in an input variable.
Inelastic demand: a place on the demand curve where a percent change in an input variable produces less than that percent change in an output variable.
Lift/gains chart: a visual device to aid in interpreting how a model performs. It compares by deciles the model’s predictive power to random.
Maximum likelihood: an estimation technique (as opposed to ordinary least squares) that finds estimators that maximize the likelihood of observing the given sample.
Mean: a descriptive statistic, a measure of central tendency, the mean is a calculation summing up the value of all the observations and dividing by the number of observations.
Median: the middle observation in an odd number of observations, or the mean of the middle two observations in an even number of observations.
Mode: the number that appears most often.
Ordinary regression: a statistical technique whereby a dependent variable depends on the movement of one or more independent variables (plus an error term).
Oversampling: a sampling technique forcing a particular metric to be overrepresented (larger) in the sample than in simple random sampling. This is done because a simple random sample would produce too few of that particular metric.
Range: a measure of dispersion or spread, calculated as the maximum value less the minimum value.
Reduced form equations: in econometrics, models solved for the endogenous variables, so that each is expressed in terms of the exogenous variables only.
Segmentation: a marketing strategy aimed at dividing the market into sub-markets, wherein each member in each segment is very similar by some measure to each other and very dissimilar to members in all other segments.
Simultaneous equations: a system of more than one dependent variable-type equation, often sharing several independent variables.
Standard deviation: the square root of variance.
Standard error: an estimate of the standard deviation of a sample statistic; for the mean it is calculated as the standard deviation divided by the square root of the number of observations.
Stratifying: a sampling technique choosing observations based on the distribution of another metric. This is done to ensure the sample contains adequate observations of that particular metric.
Variance: a measure of spread, calculated as the summed square of each observation less the mean, divided by the count of observations less one.
Z-score: a metric describing how many standard deviations an observation is from its mean.
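Several of the glossary formulas can be written out directly. The following Python sketch implements them as defined above, using only the standard library; the price and unit figures are hypothetical and were picked so the results are easy to check by hand.

```python
import math
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def variance(values: List[float]) -> float:
    """Summed squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def standard_deviation(values: List[float]) -> float:
    return math.sqrt(variance(values))


def standard_error(values: List[float]) -> float:
    """Standard deviation divided by the square root of n."""
    return standard_deviation(values) / math.sqrt(len(values))


def z_score(observation: float, values: List[float]) -> float:
    return (observation - mean(values)) / standard_deviation(values)


def covariance(xs: List[float], ys: List[float]) -> float:
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs: List[float], ys: List[float]) -> float:
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (standard_deviation(xs) * standard_deviation(ys))


def elasticity(percent_change_output: float, percent_change_input: float) -> float:
    return percent_change_output / percent_change_input


# Hypothetical weekly prices and units sold (illustrative only).
prices = [10, 12, 14, 16, 18]
units = [100, 90, 80, 70, 60]

print("mean price:", mean(prices))
print("variance of price:", variance(prices))
print("covariance:", covariance(prices, units))
print("correlation:", round(correlation(prices, units), 3))
```

In this made-up data units fall by exactly 10 for every 2 added to price, so the correlation comes out at the extreme of perfectly negative.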
Publisher’s note
Every possible effort has been made to ensure that the information contained in this book is accurate at the time of going to press, and the publisher and author cannot accept responsibility for any errors or omissions, however caused. No responsibility for loss or damage occasioned to any person acting, or refraining from action, as a result of the material in this publication can be accepted by the editor, the publisher or the author.
First published in Great Britain and the United States in 2015 by Kogan Page Limited
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licences issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned addresses:
2nd Floor, 45 Gee Street, London EC1V 3RS, United Kingdom, www.koganpage.com
1518 Walnut Street, Suite 1100, Philadelphia PA 19102, USA
4737/23 Ansari Road, Daryaganj, New Delhi 110002, India
© Mike Grigsby, 2015
The right of Mike Grigsby to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
ISBN 9780749474171
E-ISBN 9780749474188
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Grigsby, Mike.
Marketing analytics : a practical guide to real marketing science / Mike Grigsby.
pages cm
ISBN 978-0-7494-7417-1 (paperback) – ISBN 978-0-7494-7418-8 (ebk) 1. Marketing research. 2. Marketing. I. Title.
HF5415.2.G754 2015
658.8’3–dc23
2015016002
Typeset and eBook by Graphicraft Limited, Hong Kong
Print production managed by Jellyfish
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY