Establishing a Data Quality Foundation for a Successful MDM Initiative
Tony Fisher, President & CEO, DataFlux
Peter Harvey, President & CEO, Intellidyn
Hex Nut, Size 1/4-20, ZincPlated, Package 100
HexNut, 1/4-20, Z, 100p
Hex Nut, 1/4"-20 ZINC, 100-count
Smith, Bill
B. Smith
Bill Smith
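The three name variants above refer to one person, yet a plain string comparison treats them as three. Below is a minimal sketch of one way to reduce them to a shared match key, assuming a key of last name plus first initial is acceptable for the purpose. The `name_key` function and its two parsing rules are hypothetical illustrations, not the method used by either presenter.

```python
import re


def name_key(raw: str) -> tuple[str, str]:
    """Reduce a personal name to (LASTNAME, first initial).

    Hypothetical rule set: handles only "Last, First" and "First Last".
    """
    text = raw.strip()
    if "," in text:
        # "Smith, Bill" style: the last name comes before the comma
        last, first = [part.strip() for part in text.split(",", 1)]
    else:
        # "Bill Smith" or "B. Smith" style: the last token is the last name
        tokens = text.split()
        first, last = tokens[0], tokens[-1]
    # Drop punctuation such as the period in "B."
    first = re.sub(r"[^A-Za-z]", "", first)
    last = re.sub(r"[^A-Za-z]", "", last)
    return last.upper(), first[:1].upper()


variants = ["Smith, Bill", "B. Smith", "Bill Smith"]
keys = {name_key(v) for v in variants}
```

All three variants collapse to the single key `("SMITH", "B")`. A key this coarse would also merge "Bob Smith" with "Bill Smith", so in practice it would only be a first blocking step before a stricter comparison.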
Governing Data for Corporate Success
Product Data
Customer Data
Are you ready for MDM?
Pure and simple: The most critical factor to master data management is data quality.
-- David Loshin
MDM Programs . . .
Promote and ensure corporate alignment
Identify data providers and consumers
Encompass People, Policy and Technology
Must be built on a robust data quality platform, finding and fixing at
the source and auditing
Leverage the consistency, standardization and reuse of data assets
Look Familiar?
Stakeholder Perspectives
Business Drivers for MDM
Operational Efficiency
Risk Management
Competitive Advantage
IT Modernization
Data Governance Maturity Model
[Figure: Risk (HIGH to LOW) and Reward (LOW to HIGH) across four maturity stages: UNDISCIPLINED, REACTIVE, PROACTIVE, GOVERNED. Axis: People, Policies, Technology Adoption.]
BPM Integration
CDI
PDM
CRM
ERP
Data Warehouse
SFA
Database Marketing
MDM
Think Locally, Act Locally.
Think Globally, Act Locally.
Think Globally, Act Collectively.
Think Globally, Act Globally.
Copyright © 2007 DataFlux Corporation LLC, Cary, NC, USA. All Rights Reserved.
© 2006 Intellidyn Corporation 2007 Reproduction Prohibited
Data Quality Practices and Challenges
By
Intellidyn Corp.
November 14, 2007
Corporate Snapshot
• Incorporated June 23, 1998; began full-time operations January 2000
• “S” Corporation
• 100% owned by CEO, P. E. Harvey
• Employees: 42
• Profitable since inception
Revenue Mix:
• 56% Modeled Data
• 10% Analytic Services
• 12% Strategic Consulting
• 22% Prospect Data Warehousing
Sampling of Clients across Banking, Lending, Insurance, Non-Profit, Travel and Collections
Fidelity Investments          Ace Mortgage
March of Dimes                Countrywide Lending
Banco Popular                 GMAC Mortgage Corp
US Bank                       Acurian Pharmaceutical
Nationwide Insurance          JP Morgan Chase
AEGON                         Accredited Home Lenders
Allstate                      NovaStar Mortgage Inc.
Fireman’s Fund Insurance      Cannon
Your Man Tours (Travel)       Viking River Cruises
Capital One                   Tritium Card Services
The “Inside Track” on Intellidyn
“Boot Strapped” from:
• A blank sheet of paper
• $40,000 of personal money
• No external funding to date
• Profitable since inception
Information Infrastructure / Analytic capability:
• Equivalent to: Merkle, Harte Hanks, Epsilon
• Surpassing: Trans Union, Donnelley, Knowledgebase, Allant, BeNow, Others
Built by Fortune 50 database/analytic employees to be serviced the way they needed to be serviced
• Access to everything
• “Out Thinking” the client
• “Zero Defect”
• Migrate internally, when ready
What Sets Intellidyn Apart?
We get the client past their data issues and beyond multi-channel campaigning . . .
We are the strategists, leveraging advanced database marketing techniques with our
marketing experience.
Operating from an award winning technology platform
Since 2001, We Have Scaled Exponentially
From:                                 To:
3 clients                             165 clients
1 Master File                         > 5 Master Files
.25 terabyte single master file       > 4 TB (ea.) integrated master files
1 terabyte live storage               > 30 TB live storage, > 75 TB off line
4 models                              > 300 models
4 processor single thread             20 processor SMP multi-threading
5 days to load master files           < 24 hrs without impacting clients
Personal Info
Name: John Doe
Spouse: Jane Doe
Address: 555 Main St, Bethpage, NY 11714
Tel #: (516) 555-1212
Bus Tel #: (631) 555-1212

Purchased Behavior
Purchased $1,000 in electronics in the last 6 months from catalogs, online & retail
Purchased $300 in gifts in the last 12 months
Opted in and clicked on email offers

Reported Demographics
Birth date: Jan 1, 1945
Marital Status: Married
# of Children: 2
Children Age Range: < 18 Years
Gender: Male

Credit Info
Home Value: $500,000
Home Purchase Date: Sept, 1985
Length of Residence: 15 Yr 7 Months
Homeowner Dwelling Size: Single
Household Income: $250,000
Occupation: Senior Management
Credit Limit: $21,000
Highest Purchase: $10,500
Vehicle: 2000 Ford Mustang, Aug-00

Model Scoring: .62 / .12 / .15 / .06
Vertical List: Data Warehouse (6), Visions (15)
Lifetime Value $$$
ITA & PA Scoring: 4 / 10 / 15 / 2 / 5 / 9
[Figure labels: Subprime, Alt A, Insur., Prime, Other, Balance, Response, Conversion, Balance, Channel]
We deliver each client a complete marketing database of the US consumer base, their customers, contacts, campaign history under a
proprietary set of Business Rules
“True” CRM Database
Lifestyle
Credit Data
Credit Bureaus
Property Data
Purchase Behavior
Retail/Catalog Transactions
We integrate national prospecting on an enterprise level, enabling clients to speak in one voice to their customers and prospects
Marketing Database
Database Integration
Address Correction and Validation
Model Scoring
Suppressions
Prospect Screening
Prospect Allocation: zip
Prospect Response & Privacy Requests
Intelligent Consumer Information
• Management
• Analysis
• Marketing Execution
Integrated:
• Data Mining
• Targeting
• Response Analysis
• Statistical Modeling
• Contact Mgmt
• Segmentation
National Data Sources
• Demographic
• Purchase Behavior
• Life Style
• Vertical Lists
Prospects
• Direct Mail
• Local Marketing
• Telemarketing
External Requests/Rules
Profit Optimization
Product Management
Campaign Management
Recycling
An 80MM+ mailer alerted us to “Ghosts”
A “Ghost”: a person other than the one on the database lives at that address
January to April 2007 New DEED Records matched to Experian and Equifax
Deed Records
Columns: Found Equifax (EFAX) by KeyZip4 (address), then by KeyLN (last name, LN)
Rows: Found Experian (EXP) by KeyZip4 (address), then by KeyLN (last name)

                                                           No address in EFAX                 Yes address in EFAX
                                                No LN     Yes LN*    No Total       No LN     Yes LN*   Yes Total Grand Total
No address in EXP, no last name in EXP         70,691         154      70,845       7,372       7,455      14,827      85,672
No address in EXP, last name in EXP*              821          38         859          59          80         139         998
No Total                                       71,512         192      71,704       7,431       7,535      14,966      86,670
Yes address in EXP, no last name in EXP        37,257          60      37,317       1,169         359       1,528      38,845
Yes address in EXP, last name in EXP*          74,850         139      74,989         791         509       1,300      76,289
Yes Total                                     112,107         199     112,306       1,960         868       2,828     115,134
Grand Total                                   183,619         391     184,010       9,391       8,403      17,794     201,804
Type       Jan      Feb      Mar      Apr      % of Total
missing    13,493   13,041   19,044   25,113   39%
ghosts     8,874    7,597    10,634   18,693   35%
Overall Assessment
Investigation Approach:
• Quantification biased
  – The selected counties were not representative of U.S.
• Biggest driver of non-matches is quality of Deed file N&A
  – High percentage of:
    • Missing Names
    • Incorrect/incomplete addresses
• Next driver is lack of matching logic sophistication
  – Soundex, Reference files, Normalization, CASS/NCOA, other
• Client was not comparing “Apples-to-Apples”
  – Matched at the individual level
  – Should be at the household level
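Soundex is one of the matching techniques named in the list above. The sketch below implements the standard American Soundex code so that spelling variants of a last name compare equal. It illustrates the published algorithm only and makes no claim about the matching logic Intellidyn actually runs.

```python
import string

# Consonant groups share a digit; vowels, Y, H and W carry no code.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after a vowel is coded again
    return (result + "000")[:4]


same_code = soundex("Smith") == soundex("Smyth")
```

Here `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which shows why Soundex helps with misspelled names and also why it needs a second check (address, household) to avoid false matches.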
Approach: Replicate what we normally achieve, typically +/- 90% match levels
Investigation Approach:
• We matched Experian Gold to other Master files:
  – Credit bureaus (Experian & TU)
  – Acxiom
  – US 411 Directory
  – NOT property, due to quality
• Matched at:
  1. N&A level
     – Distribution by Dwelling type and Recipient Reliability code
  2. Non-matches matched to aged credit files
  3. Individual, Household levels
  4. Address level
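The levels listed above form a cascade: a record that fails the strictest key is retried against a looser one. A minimal sketch of that idea follows, assuming exact matching on normalized keys. The field names (`first`, `last`, `address`), the `norm` helper, and the sample records are hypothetical, and the aged-credit-file step is left out for brevity.

```python
def norm(text: str) -> str:
    """Uppercase and collapse whitespace (a stand-in for real address standardization)."""
    return " ".join(text.upper().split())


def build_index(reference: list[dict]) -> dict[str, set]:
    """Build one lookup set per match level from the reference file."""
    index = {"individual": set(), "household": set(), "address": set()}
    for r in reference:
        first, last, addr = norm(r["first"]), norm(r["last"]), norm(r["address"])
        index["individual"].add((first, last, addr))
        index["household"].add((last, addr))
        index["address"].add(addr)
    return index


def match_level(record: dict, index: dict[str, set]) -> str:
    """Return the most specific level at which the record matches."""
    first, last, addr = norm(record["first"]), norm(record["last"]), norm(record["address"])
    if (first, last, addr) in index["individual"]:
        return "individual"
    if (last, addr) in index["household"]:
        return "household"
    if addr in index["address"]:
        return "address"
    return "non-match"


reference = [{"first": "John", "last": "Doe", "address": "555 Main St"}]
prospects = [
    {"first": "John", "last": "Doe", "address": "555 Main St"},
    {"first": "Jane", "last": "Doe", "address": "555  main st"},
    {"first": "Bill", "last": "Smith", "address": "555 Main St"},
    {"first": "Bill", "last": "Smith", "address": "1 Elm St"},
]
index = build_index(reference)
levels = [match_level(p, index) for p in prospects]
```

The third prospect matches only at the address level: a different surname at a known address, which is the “Ghost” pattern this investigation set out to measure.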
We achieved a near 80 percent match at the individual N&A level (Experian Gold to other master files)
Investigation Results:
• Approximately 75% of Experian Gold records are at the same address on both credit files (Trans Union and Experian) and Acxiom
• Additional 2.54% matches applying the Non-matches to historical credit files

Person one on Experian Gold matched as follows
                                    Multi-Family   Marginal Multi-Family   PO BOX   Single Family   TOTAL
Matched to Credit or Acxiom file    9.91%          2.03%                   2.36%    61.33%          75.62%
Non Matches                         5.62%          1.27%                   1.94%    15.55%          24.38%
Total                               15.53%         3.30%                   4.29%    76.88%          100.00%

                                    Multi-Family   Marginal Multi-Family   PO BOX   Single Family   TOTAL
Matched 3 Month Aged Credit         0.53%          0.12%                   0.45%    1.43%           2.54%
Non Matches                         17.17%         3.57%                   5.05%    71.67%          97.46%
Total                               17.70%         3.69%                   5.50%    73.10%          100.00%
Match rates exceed 90 percent at the household and address levels, while “Ghosts” become apparent
Investigation Results:
• At the household level we are at 85 percent
                                            Volume        Match Rate   CUM. Match Rate
Matched to Current Credit or Acxiom file    63,880,714    75.62%       75.62%
Matched to aged Credit File                 1,252,208     1.48%        77.10%
HH Matched to Credit File                   6,841,996     8.10%        85.20%
Non Match                                   12,498,879    14.80%       14.80%
Total                                       84,473,797    100.00%      100.00%

• At the household and address levels we are at 90 percent
                                            Volume        Match Rate   CUM. Match Rate
Matched to Current Credit or Acxiom file    63,880,714    75.62%       75.62%
Matched to aged Credit File                 1,252,208     1.48%        77.10%
HH Matched to Credit File                   6,841,996     8.10%        85.20%
Address match to Credit Files               4,252,181     5.03%        90.24%
Non Match                                   8,246,698     9.76%        9.76%
Total                                       84,473,797    100.00%      100.00%
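The rate columns in the tables above follow directly from the volumes. As a worked check, the sketch below recomputes the match rate and cumulative match rate for each step of the cascade from the volumes on the slide; the function name `cumulative_rates` is an illustrative choice.

```python
# Volumes taken from the slide; each step is applied to what the prior steps left unmatched.
steps = [
    ("Matched to Current Credit or Acxiom file", 63_880_714),
    ("Matched to aged Credit File", 1_252_208),
    ("HH Matched to Credit File", 6_841_996),
    ("Address match to Credit Files", 4_252_181),
]
total = 84_473_797


def cumulative_rates(steps, total):
    """Return (label, match rate %, cumulative match rate %) rows plus the non-match volume."""
    running = 0
    rows = []
    for label, volume in steps:
        running += volume
        rows.append((label, round(100 * volume / total, 2), round(100 * running / total, 2)))
    return rows, total - running


rows, non_match = cumulative_rates(steps, total)
non_match_rate = round(100 * non_match / total, 2)
```

The cumulative column comes out as 75.62, 77.10, 85.20 and 90.24 percent, and the residual non-match volume is 8,246,698 (9.76 percent), which agrees with the slide.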
Since 2003, to address this “Ghost” issue Experian created a Recipient Reliability Code (RRC) process
Investigation Results:
• We deploy RRC 1 & 2 in the prospect selection process
• It attempts to solve for the mobility of a person across addresses
• It’s a combination of:
  – Address Quality
  – Mobility Score (Predictive Model of likely to move)
  – Phone confidence score (Connectivity for members at that unit)

RRC Code   RRC Description                    Recipient Contact Points Available   Components
1          Very High (default for mailing)    Postal/Phone                         Name/address/phone where available
2          High (default for mailing)         Postal/Phone                         Name/address/phone where available
3          Moderate                           Postal/Phone                         Name/address/phone where available
4          Low                                Postal/Phone                         Name/address/phone where available
5          Telemarketing                      Phone only                           Name/phone connectivity (No address)
6          End-dated/Address Only             Postal only                          Resident or Occupant mailing (No name)

The top 2 tiers of the RRC model represent over 90% of the US Living units
We continue to assess the impact of limiting RRC to level 1 for younger individuals and possibly 2 for ages 35 - 64
[Chart: General Mobility of Householder by Tenure by Age Range. Series: Owner Occupied (OO), Renter Occupied (RO). Age ranges: 15-24, 25-34, 35-44, 45-64, 65+. Vertical axis: 0% to 60%. Data labels in extraction order: 23%, 15%, 8%, 4%, 2%, 25%, 21%, 11%, 52%, 35%.]
Source: US Census, CPS Mobility Series Table 17 -- March 2004
When we matched the remaining “Ghosts” to the daily 411 file, we located all but 2.43 percent

Sum of N, by EG_DWELL
CREDIT_PRESENCE         AX_PRESENCE   Match411     Marginal multi   Multi-Family   PO BOX   Single Family   Grand Total
Individual              Individual    (blank)      1.06%            5.50%          1.04%    40.45%          48.05%
Individual              (blank)       (blank)      0.32%            1.70%          0.47%    6.62%           9.11%
Historic (Individual)   (blank)       (blank)      0.07%            0.31%          0.26%    0.84%           1.48%
Household               (blank)       (blank)      0.32%            1.55%          0.42%    5.81%           8.10%
Address + Apt           (blank)       (blank)      0.72%            2.53%          0.65%    5.78%           9.68%
Address Only            (blank)       (blank)      0.07%            1.10%          0.00%    0.06%           1.24%
(blank)                 Individual    (blank)      0.65%            2.71%          0.85%    14.26%          18.47%
(blank)                 (blank)       Individual   0.01%            0.02%          0.00%    0.55%           0.59%
(blank)                 (blank)       Household    0.01%            0.01%          0.00%    0.51%           0.53%
(blank)                 (blank)       Address      0.02%            0.03%          0.00%    0.29%           0.34%
(blank)                 (blank)       (blank)      0.04%            0.07%          0.61%    1.71%           2.43%
Grand Total                                        3.30%            15.53%         4.29%    76.88%          100.00%

Approximately 10 percent are movers, who are found during NCOA processing
Why is this such an imperative?
Campaign Performance drives ROI
Model Performance drives Campaign Performance
Match Rates drive Model Accuracy
Data Quality drives Match Rates
We are all in a unique position:
Unit costs declining dramatically
• Processing
• Storage
• Data
Capability increases exponentially
• Processing speed
• On-line Storage
• Data:
  – Volumes
  – Determinicity
  – Rates of flow
  – Scope
Our only limitation is talent!
So, our job is to make “Paradigms out of the Fundamentals”
Data
• Daily Credit
• 411 Directory
  – Daily
  – Cell #’s
• Click Behavior
• Enterprise:
  – Touch Points
Modeling
• Media Physics
• Dynamic:
  – Variables
  – Scoring
Personalization
• Data/Event Driven:
  – Copy
  – Creative
  – Package
  – Channel
Strategic Planning
• Integrating, OVER TIME:
  – Property
  – Credit
  – Transaction
  – Media Spend
  – Market Share
Performance Analytics
• Weekly
• Across Channels
• ROI-Based
• Over time
Database
• TB Level, Refreshed
  – In Real time
  – Historical
  – Behavioral
The Database Marketing Paradigm?
Who? (to Target): Models tell us this
Which? (Channel): We’re getting better at this
When? (to Solicit): Via “Trigger” Events
What? Experiences drive Transactions
The Paradigm lies in transforming “Consumer Experiences” into predictive scores in real time
The Database Marketing Paradigm
To Predict Transactions (as the dependent variables)
• Response
• Conversion
• Satisfaction
• Retention
• Renewal
• Cross sell / Up-sell
The Consumer’s Experience (as the independent variables)
To Marketing Stimulus:
• Direct Marketing
• Interactive
• Broad Market Media
+ Their Interactions:
• VRU Response
• Customer Service
• Wireless Inquiries
• Web site visits
  – Frequencies
  – Pages
  – Searches
• Inbound Call Centers
We’ll model on Response to all types of Stimulus and Visits OVER TIME
The independent variables
Stimulus:
• Direct Marketing
  – Telemarketing
  – Direct mail
  – Fax
• Interactive
  – E-mail
  – Banners & Pop-ups
  – Keywords
  – Blogs
  – Search Engines
• Broad Market Media
  – Radio
  – TV
  – Billboard
  – Outdoor/Cinema
  – Print
Consumer Visits:
• VRU Response
• Customer Service
• Web site visits
  – Frequencies
  – Pages
  – Searches
• Inbound Call Centers
[Figure: X marks record the Frequency of each stimulus and visit type across periods 1 through 6]
WE’LL SHIFT FROM POINT-IN-TIME TO TIME SERIES MODELS
“Customer Experience” Models will have a completely different set of variables
From point in time variables
Demographic
• Number of adults in Household
• Mail Order Donor
• Body Size of newest car
• Level of education
• Market Value Decile
• Property type by detail
• Loan To Value Range
Real Estate
• Past Mortgages
• Mortgage Amount
• LTV
• Home Value
• Interest Rate
• Mortgage Type
• Open Date
Investments
• Annuities
• Bonds
• Certificate Of Deposit
• IRA's/401K’s
• Money Market Funds
• Mutual Funds
• Savings Account
• Stocks
To transaction behavior over time (months 0 through 6, X marks as extracted)
• Response to Broad Market Media: X X X
• Response to Direct mail: X X
• # of VRU Inquiries
• Freq. of Web site visits (by type of visit)
  – Services: X X
  – Product information: X X X X X
  – Balance Inquiry: X
Model gains charts will look similar to today’s
Consumer Experience Model (all columns after Decile are cumulative)
Decile   # of Prospects   # of Responders   Response Rate   % of Responders   % of Prospects   Lift
1        394,592          2,576             0.653%          20.5%             8.8%             234
2        813,758          4,453             0.547%          35.5%             18.1%            196
3        1,250,421        6,026             0.482%          48.0%             27.8%            173
4        1,696,783        7,376             0.435%          58.8%             37.7%            156
5        2,158,901        8,566             0.397%          68.2%             48.0%            142
6        2,631,539        9,703             0.369%          77.3%             58.5%            132
7        3,076,999        10,585            0.344%          84.3%             68.4%            123
8        3,546,327        11,370            0.321%          90.6%             78.8%            115
9        4,022,636        12,051            0.300%          96.0%             89.4%            107
10       4,500,465        12,552            0.279%          100.0%            100.0%           100
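The Cumulative Lift column in the gains table can be derived from the two count columns: it is the cumulative response rate through a decile divided by the overall response rate, times 100. A short sketch that recomputes it from the slide's figures (the function name is an illustrative choice):

```python
# Cumulative counts by decile, copied from the gains table on the slide.
cum_prospects = [394_592, 813_758, 1_250_421, 1_696_783, 2_158_901,
                 2_631_539, 3_076_999, 3_546_327, 4_022_636, 4_500_465]
cum_responders = [2_576, 4_453, 6_026, 7_376, 8_566,
                  9_703, 10_585, 11_370, 12_051, 12_552]


def cumulative_lift(cum_prospects, cum_responders):
    """Lift index per decile: cumulative response rate relative to the overall rate (100 = average)."""
    overall_rate = cum_responders[-1] / cum_prospects[-1]
    return [
        round(100 * (responders / prospects) / overall_rate)
        for prospects, responders in zip(cum_prospects, cum_responders)
    ]


lift = cumulative_lift(cum_prospects, cum_responders)
```

A lift of 234 in the first decile means that mailing only the top-scored 8.8 percent of prospects would yield a response rate about 2.3 times the file average.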
LIFT
With quite different variables (tracked over months 0 through 9):
• Change in balance of all mortgage accounts - Current and 4 mths prior
• # of currently active bankcard accounts - Current and 4 mths prior
• # Personal finance inquiries - Current and 2 mths prior
• # of months since oldest upscale retail account opened
• Ratio of Current and 2 mths prior months since most recent trade opened
• Ratio Between Current and 2 mths prior # of accounts with delinquency of 30 days
• Ratio Between Current and 2 mths prior # Open rev bank trades with hc/cl > 5000
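The variables above compare a current value with the same measure some months earlier, either as a change or as a ratio. A minimal sketch of how such features could be derived from a monthly series follows. The function name, the sample balances and inquiry counts, and the choice to return `None` when the prior value is zero are all assumptions made for illustration.

```python
def change_and_ratio(series, lag):
    """Compare the latest monthly value with the value `lag` months earlier.

    `series` holds one value per month, oldest first, and must have more than `lag` entries.
    Returns (change, ratio); ratio is None when the prior value is zero.
    """
    current = series[-1]
    prior = series[-1 - lag]
    change = current - prior
    ratio = current / prior if prior else None
    return change, ratio


# Hypothetical monthly data for one prospect, oldest month first.
mortgage_balance = [210_000, 209_400, 208_800, 208_200, 250_000]
finance_inquiries = [0, 1, 1, 3]

balance_change_4m, _ = change_and_ratio(mortgage_balance, lag=4)
_, inquiry_ratio_2m = change_and_ratio(finance_inquiries, lag=2)
```

In this made-up example the mortgage balance rose by 40,000 over four months and personal finance inquiries tripled over two, the kind of movement a point-in-time snapshot would not show.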
These models will assign prospects into their Appropriate Consumer Experience decile
Requiring Models for each Customer Experience
[Figure: deciles 1 through 10, with four Consumer Experience Model gains tables, one per customer experience. Each table repeats the gains table from the previous slide.]
We’ll deploy both Consumer Experience and Response/Conversion models together
Who? (to Target): deciles 1 through 10
What (Customer Experience Segment):
CE Segment   Customer Experience (CE)
A            Monthly e-mails of links to special discount programs
B            Auto-Populate Blogs, call backs within 7 days of each purchase
C            Quarterly phone calls within three days after direct mail in-home date
D            Personal Shopper in PM hrs ready with next suggestion
E            Direct Mail Only with Annual Calendar of reminders
F            Auto messages via email and phone of Gift Reminders and suggestions
Which (Channel)
The Paradigms of Analytic-Driven Strategy Development/Execution
Our job is to continuously introduce new paradigms
• Adding new data sources, coupled with unique derivatives and applications
• Accelerating to dynamic, near real time refreshes
• Creating marketing programs that reflect the way the consumer prefers to be marketed to
It’s no longer a technical or data sourcing challenge . . . It’s an innovation challenge!
MDM Lessons Learned