Credit fraud prevention on hwx stack

Post on 19-Feb-2017



Credit Fraud Prevention on a Connected Data Platform

Kirk Haslbeck, Sr. Solution Engineer HWX

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Building a Model

Show of hands, how many have built a “Model”? What are some limitations?

– Conditional-based logic: if/else binary decisions

If you need a lot of data to build a good model, what tools can you use?

– Data volumes can eliminate the possibility of desktop tools

Sampling?

– Well… we had better get an even distribution of true and false positives in each sample, but wait, that requires data munging, which brings us back to what tools we can use.

Security concerns?

– Extracting data from its secure resting place and pushing it into other environments, often unsecured files or desktops where Matlab or R can be installed.

Collaboration

– Push processing to the data using modern distributed tooling.
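The sampling concern on this slide can be illustrated with a stratified sample, which draws the same fraction from the fraud and non-fraud groups so that each sample keeps the original class proportions. This is a small pure-Python sketch for illustration only: the `is_fraud` field name, the fraction, and the seed are assumptions, and on full-volume data the same idea would be run with a distributed tool rather than on a desktop.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_key, fraction, seed=0):
    """Draw the same fraction of records from every label group.

    records   -- list of dicts (hypothetical transaction records)
    label_key -- name of the field holding the class label
    fraction  -- share of each group to keep, 0 < fraction <= 1
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group the records by their label value.
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record)

    # Sample each group separately; keep at least one record per group.
    sample = []
    for rows in groups.values():
        k = max(1, round(len(rows) * fraction))
        sample.extend(rng.sample(rows, k))
    return sample
```

With 10 fraudulent and 990 legitimate records, a fraction of 0.1 yields 1 fraudulent and 99 legitimate records, so the rare class is never sampled away entirely.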


“All models are wrong, some are useful”

George E. P. Box

The most limiting factor is the data. With modern systems we are now able to capture more data and, hopefully, produce better insights.


Credit Card Fraud

Requirement: Detect fraudulent transactions.

Goal: Save the card company money and build trust among card users. Cut down on fraud.

Functional Requirement: Detect fraud in under 2 seconds at the point of sale. Learn, adapt, and make smarter decisions over time.

Design

– Distance: How far can one travel over a period of time before it is fraudulent?
– Category: How can we detect a purchase that a customer wouldn’t likely make?
– Frequency: How can we detect purchasing patterns that do not resemble the card holder’s?

Ideas?

– Whiteboard some conditional logic, egregiousness vs. binary
– Back-test the data
– Build a model per card holder?
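The distance question in the design list can be sketched as a speed check between two consecutive transactions on the same card. This is an illustration, not the demo's code: the function names, the `(lat, lon, epoch_seconds)` tuple layout, and the 900 km/h threshold (roughly airliner cruising speed) are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumed threshold, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed needed to get from prev to curr; each is (lat, lon, epoch_seconds)."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        # Same or out-of-order timestamps: any real distance is impossible.
        return float("inf") if distance > 0 else 0.0
    return distance / hours


def is_distance_suspicious(prev, curr, max_speed_kmh=MAX_SPEED_KMH):
    """Flag the pair when the implied travel speed exceeds the threshold."""
    return implied_speed_kmh(prev, curr) > max_speed_kmh
```

A purchase in New York followed one hour later by a purchase in Los Angeles implies a speed far above the threshold and is flagged; the same pair ten hours apart is not. Returning the speed rather than only a yes/no flag also supports the "egregiousness vs. binary" idea: how far the speed exceeds the threshold can feed a score.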


Rules, Statistics, Machine Learning

Rule-Based Logic

– Great for checking conditions that can be proven 100% accurate. Easy to build, with no reason to over-engineer.
– Example: Spending limit. Card holder limit = $2,000
• If (currentPurchaseAmount + balance > 2,000) then deny transaction

Statistics

– Mean, median, mode, variance, deviation
– Anomaly detection. Outliers. (e.g., women’s retail example)

Machine Learning

– Supervised
– Unsupervised
– Trainable
– Adapts over time
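The spending-limit rule from this slide translates almost directly into code. In this minimal sketch the function name and parameter names are illustrative, and the $2,000 default is the slide's example limit; `Decimal` is used only to avoid floating-point rounding on currency amounts.

```python
from decimal import Decimal


def approve_purchase(current_purchase_amount, balance,
                     card_limit=Decimal("2000")):
    """Return True to approve the transaction, False to deny it.

    Mirrors the slide's rule: deny when
    currentPurchaseAmount + balance > limit.
    """
    return current_purchase_amount + balance <= card_limit
```

A $500 purchase on a $1,400 balance is approved, while a $700 purchase on the same balance is denied. A rule like this needs no training data, which is why it makes a sensible first check before the statistical and machine learning stages.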


Discovery

Gathered all credit card transactions

– Problem: they didn’t make sense
– No identifiable patterns, no log-normal curves
– Gas $45, Chipotle $8.50, steak dinner $88, Amazon shoes $55

Classification


Outlier Detection: identify abnormal patterns

Example: identify anomalies

Features:
- Time frequency
- Category
- Amount
- Distance
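One simple statistical way to flag an abnormal value for the Amount feature is a z-score against the card holder's purchase history in the same category. This is a sketch under assumptions, not the model shown in the demo: the function names, the per-category history layout, and the threshold of 3 standard deviations are illustrative.

```python
import statistics


def amount_zscore(history, amount):
    """Standard deviations between amount and the mean of history.

    Returns None when there is too little history, or no variation,
    to compute a meaningful score.
    """
    if len(history) < 2:
        return None
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        return None
    return (amount - mean) / stdev


def is_amount_outlier(history_by_category, category, amount, threshold=3.0):
    """Flag an amount that is far from the norm for its own category."""
    z = amount_zscore(history_by_category.get(category, []), amount)
    return z is not None and abs(z) > threshold
```

Scoring within a category is the point of the classification step: raw amounts across gas, fast food, and dinners show no pattern, but amounts within one category tend to cluster, so a $300 gas purchase stands out against a history of $40 to $50 fill-ups.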


Demo

Show me the Code!


Next Steps

Limitations of current model

– In an airport, ready to fly out
– Changes to behavior, like just getting a new girlfriend
– What else?

Dependent on the quality of the analyst’s feedback

Tech Overview

– Slider, NiFi, Kafka, Storm, Zeppelin, Spark, HBase


The Future of Data: Modern Data Application

[Diagram: data in motion and data at rest. Labels: Internet of Anything, Storage, Group 1, Group 2, Group 3, Group 4]

Hortonworks’ unique approach to data-in-motion and data-at-rest powers Actionable Intelligence


Actionable Intelligence from Connected Data Platforms

– Data in motion (Hortonworks DataFlow): capturing perishable insights
– Data at rest (Hortonworks Data Platform): ensuring rich, historical insights
– Both are necessary for modern data applications


Journey to Fraud Detection

Proactively identify potential fraudulent transactions to protect the customer and improve customer experience

• Proactively monitor every credit card transaction using machine learning to catch potential fraud
• Customer Service Analyst reviews flagged transactions in real time via a next-generation application running on the connected platform
• HDF controls the real-time flow of data in and out of the connected platform to the various source and destination points

[Diagram labels: Renovate, Innovate; Years of Customer Transaction Data; Real-time ingest of transactions; Complete Customer Profile; Purchase Behavior Insight; Immediate Customer Feedback; Fraud Detection; Improved Experience / Reduced Cost]


Fraud Detection Demo Architecture

[Diagram: data in motion. Labels: Data Acquisition, Data Routing, Queuing, Simple/Complex Real-time Processing, Real Time Decisions, Elastic Compute, Machine Learning, Online Data, Interactive Query, Visualization]


Fraud Detection Demo Architecture

– Distributed Storage: HDFS
– Many Workloads: YARN
– Real-time Serving: HBase
– Machine Learning: Spark
– UI and HTTP PubSub: Jetty and Tomcat
– Real-Time Data Movement: Apache NiFi
– Data Science: Zeppelin
– Resource Allocation: Slider
– Interactive Query: Hive on Tez
– Configuration Management: Ambari
– Authorization: Ranger
– Real Time Processing: Storm
– Inbound Messaging: Kafka
– Governance: Atlas


Machine Learning: Enterprise Data Science at Scale

Use the flow of data and the computing power of the connected platform to enable autonomous machine learning

• Real-time data flows combined with massively parallel computing allow AI to continuously improve
• Enables AI to make decisions in the “Grey Areas”

Build and train AI on full-volume data, not a sample

• Time, effort, accuracy, scale
• Visualize data as it is being manipulated

Deploy the AI model without re-implementing

• Spark models can be plugged into a modern connected platform.


Credit Fraud Analyst Inbox


Hortonworks Data Flow
