Kirk Haslbeck, Hortonworks · Dan Kernaghan, Pitney Bowes
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop is Lower Cost and More Scalable
[Chart: Cost per terabyte for HDP, Oracle X, Teradata, and Netezza (y-axis 0 to 14,000)]
Cost Drivers – The Big Picture
Insights – Produce more valuable and more holistic insights
Security – Apply security policies in one place instead of repeating them in each silo
Collaborate – Curate feature vectors for our data scientists and promote collaboration
Time – Get models into production faster; human time is still the most costly resource
Storage – Store data in an accessible file system at the lowest cost
Various Data Types
Structured (DB2, Oracle)
First_Name  SSN     Net_Worth
Joe         233-33  100,000
Mark        456-77  200,000
Time-Series (KDB)
[Chart: time-series values between 0 and 40, sampled from 12:05 to 12:20]
Unstructured (File System)
"Best Buy released their earnings this quarter and beat analyst expectations. Earnings per share increased by 0.02"
HDP Stack – Attack the Data with the Right Tool
Limitations of Building a Model on a Traditional Platform
If you need a lot of data to build a good model, what tools can you use?
– Data volumes can rule out desktop tools
– R and Eclipse are limited to the 8 GB of RAM on the desktop machine
Sampling?
– Well… we had better get an even distribution of true and false positives in each sample, but wait: that requires data munging, which brings us back to the question of what tools we can use.
Security concerns?
– Extracting data from its secure resting place pushes it into other environments, often unsecured files or desktops where MATLAB or R can be installed.
Collaboration
– Push processing to the data using modern distributed tooling.
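The sampling point above amounts to stratified sampling with equal allocation: group records by their label and draw the same number from each group. A minimal sketch, assuming in-memory records and a hypothetical `is_fraud` label; `balanced_sample` is an illustrative helper, not part of any product.

```python
import random
from collections import defaultdict


def balanced_sample(records, label_key, per_class, seed=0):
    """Return a sample with exactly `per_class` records from each label value.

    Stratified sampling with equal allocation: records are grouped by their
    label, and the same number is drawn (without replacement) from each group.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record)

    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = []
    for label, group in groups.items():
        if len(group) < per_class:
            raise ValueError(f"label {label!r} has only {len(group)} records")
        sample.extend(rng.sample(group, per_class))
    return sample


# Hypothetical fraud-labelled dataset: 10 positives, 90 negatives.
records = [{"id": i, "is_fraud": i % 10 == 0} for i in range(100)]
sample = balanced_sample(records, "is_fraud", per_class=5)
print(len(sample), sum(r["is_fraud"] for r in sample))  # prints: 10 5
```

On a cluster the same idea can be pushed to the data: PySpark's `DataFrame.sampleBy(col, fractions, seed)` samples each stratum by a given fraction, which yields approximately (not exactly) the requested counts.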
Apache Zeppelin
Web-based notebook for interactive analytics
Use Case
• Data exploration and discovery
• Visualization
• Interactive snippet-at-a-time experience
• “Modern Data Science Studio”
Features
• Ad-hoc experimentation
• Deeply integrated with Spark + Hadoop
• Supports multiple language backends
• Incubating at Apache
Discovery
Gathered all credit card transactions
– Problem: they didn’t make sense
– No identifiable patterns, no log-normal curves
– Gas $45, Chipotle $8.50, Steak dinner $88, Amazon shoes $55
Classification
Outlier Detection: identify abnormal patterns
Example: identify anomalies
Features:
- Time frequency
- Category
- Amount
- Distance
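As an illustration of the Category and Amount features, one simple way to flag abnormal transactions is a per-category modified z-score, 0.6745 × (amount − median) / MAD, with 3.5 as a commonly used cutoff. This is a swapped-in, minimal technique and not necessarily the method used in the talk; it ignores time frequency and distance, and the transaction fields are hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_amount_outliers(transactions, threshold=3.5):
    """Flag transactions whose amount is unusual for their category.

    Scores each amount with the modified z-score 0.6745 * (x - median) / MAD,
    where median and MAD (median absolute deviation) are computed per category.
    """
    amounts_by_category = defaultdict(list)
    for t in transactions:
        amounts_by_category[t["category"]].append(t["amount"])

    stats = {}
    for category, amounts in amounts_by_category.items():
        med = median(amounts)
        mad = median(abs(a - med) for a in amounts)
        stats[category] = (med, mad)

    flagged = []
    for t in transactions:
        med, mad = stats[t["category"]]
        if mad == 0:
            continue  # no spread in this category, so the score is undefined
        score = 0.6745 * (t["amount"] - med) / mad
        if abs(score) > threshold:
            flagged.append(t)
    return flagged


# Hypothetical gas purchases: one amount is far outside the usual range.
gas = [{"category": "gas", "amount": a} for a in [45, 40, 50, 42, 48, 400]]
print([t["amount"] for t in flag_amount_outliers(gas)])  # prints: [400]
```

Median and MAD are used instead of mean and standard deviation because a single extreme purchase would otherwise inflate the baseline it is being compared against.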
Pitney Bowes and Hortonworks: Spatially Enabling the Data Lake
Pitney Bowes Data – Global Coverage
Local datasets for 240 countries
Global coverage built on a legacy of accuracy and precision
Recognized leader for LI data and capabilities
AMER: 764 datasets
EMEA: 3,079 datasets
APAC: 719 datasets
Pitney Bowes Data – Unparalleled Depth
Pitney Bowes | Partner Program Overview | February 14, 2017
Risk of Relying Solely on Public Data
[Figure: property record comparison; recoverable values include July 1997 and $207,000]
Incorrect information for this property:
• Last sale date
• Last sale price
• # of bedrooms
• # of rooms
• Finished basement
• # of spaces (garage)
• Structure type
• Lot width
• Parcel boundary
Easy to Deploy and Use
Pitney Bowes | April 19, 2017
Client Applications
• Spatial Visualization • Reporting • Analytics • Custom Applications
Big Data Ecosystem Tools
Distributed Cluster
• HDFS • Hive • NoSQL Database • Spark
Pitney Bowes Data Products
• Reference Datasets
• Spectrum Data Quality for Big Data
• Spectrum Addressing for Big Data
• Spectrum Spatial for Big Data
• Spectrum Geocoding for Big Data
• Spectrum Routing for Big Data
Enriching Data with a Location Stack
For a given location:
• POI (point of interest; carries attributes)
• Retail (business) footprint polygon
• Building footprint
• Parcel (lot)
• Isochrone (travel-time boundary)
• Demographics, lifestyle attributes, financial and consumer vitality, etc.
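Several layers of the stack above (parcel, building footprint, retail footprint, isochrone) are polygons, so attaching them to a geocoded location is at heart a point-in-polygon test. A minimal ray-casting sketch, assuming planar coordinates and a hypothetical `layers` structure of (polygon, attributes) pairs; `point_in_polygon` and `enrich_location` are illustrative names, and a real spatial data lake would typically use a spatial library and index rather than this linear scan.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray towards +x crosses this edge
                inside = not inside
    return inside


def enrich_location(point, layers):
    """Attach the attributes of the first polygon in each layer that contains point.

    layers maps a layer name to a list of (polygon, attributes) pairs (hypothetical shape).
    """
    x, y = point
    enriched = {}
    for layer, features in layers.items():
        for polygon, attrs in features:
            if point_in_polygon(x, y, polygon):
                enriched[layer] = attrs
                break
    return enriched


# Hypothetical parcel and flood-zone polygons in planar coordinates.
parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
flood_zone = [(20, 20), (30, 20), (30, 30), (20, 30)]
layers = {
    "parcel": [(parcel, {"parcel_id": "P-1"})],
    "flood_zone": [(flood_zone, {"zone": "A"})],
}
print(enrich_location((5, 5), layers))  # prints: {'parcel': {'parcel_id': 'P-1'}}
```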
Hydrating the Spatial Data Lake
[Map examples: Wildfire Risk, Walkability Scores]
Property Data
• 180M+ Property Addresses
• Geocode
• Property Attributes
Risk Data
• Property Boundaries
• Distance to Water
• Flood Risk
• Wildfire Risk
Market Data
• GeoDemographics
• Neighborhood Boundaries
• Zip Code Boundaries
• Points of Interest
Plus
• Transactions
• IoT Sensors
• Social Media
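A derived attribute such as Distance to Water can be illustrated with the haversine great-circle formula between a property's geocode and nearby water features. A minimal sketch, assuming water is represented by sampled (lat, lon) points (real shoreline data would be lines or polygons) and a mean Earth radius of 6371 km; `haversine_km` and `distance_to_water_km` are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of the spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_water_km(property_point, water_points):
    """Distance from a geocoded property to the nearest sampled water point."""
    lat, lon = property_point
    return min(haversine_km(lat, lon, wlat, wlon) for wlat, wlon in water_points)


# Hypothetical water samples one and two degrees of longitude east of the equator origin.
water = [(0.0, 1.0), (0.0, 2.0)]
print(round(distance_to_water_km((0.0, 0.0), water), 1))  # prints: 111.2
```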
Case studies: Drive superior business outcomes and gain a deeper understanding of customers.
Online Mortgage Loan Provider
By consolidating data and running real-time address validation, the provider gained a complete view of its customers, enabling more effective marketing and faster mortgage origination, with loans processed in days rather than weeks.
Financial Services Firm Gains Richer Profiles
Restored missing address data through data standardization, data augmentation, and geocoding, enabling the firm to run targeted multichannel promotions via web and smartphone apps.
Global US-Based Wealth Management Organization
Increased customer lifetime value and delivered an ideal customer experience by optimizing every contact with its mass-affluent customers, achieving a 35% increase in revenues and a 55% improvement in client satisfaction.
Business Challenge: Close loans more quickly and improve the client experience while mitigating the lender’s risk
This lender, unlike most others, relies on wholesale funding to make its loans and uses online applications rather than a system of branches.
Close Loans Faster
The lender found that many specific requirements delayed loan funding and closure, causing clients to abandon the online process. Integrating Pitney Bowes data through the pbKey enabled analysis of loan requests to provide an accurate qualification of the property for a loan, reducing abandonment rates and accelerating revenue.
Mitigating Risk
The accurate and complete attributes provided by the spatial data lake enabled a correct assessment of the risks associated with a loan, leading to more accurate pricing and improved profitability.
Desired Outcomes
▪ Improved real-time and long-term decisions
▪ Access to accurate data for 180M properties in the US
▪ Sharing information with partners (e.g., Fannie Mae)
▪ Complete picture of property, risk, and market
Benefits
▪ Accurate qualification of property for a particular loan type
▪ Faster loan processing and closure
▪ Improved assessment of the risk of a loan on a particular property
Large US Online Loan Provider
Property Analytics Case Study
Five reasons to modernize with Pitney Bowes Big Data SDKs
1. They’re easy
• Simple and intuitive user experience
• Program in SQL to run processes in the Hortonworks Spatial Data Lake
2. They’re powerful
• Take advantage of more data
• Answer questions that were too big before
3. They’re incredibly fast
• Process enormous amounts of data in a fraction of the time
4. They’re practical
• Avoid large capital outlays
• They’ll run in the cloud
5. They’re secure
• Extend and enforce your Hadoop permissions
• Easy to manage and configure