Fraud Detection Architecture
Transcript of Fraud Detection Architecture
![Page 1: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/1.jpg)
Real Time Fraud Detection: Patterns and reference architectures
Ted Malaska // PSA
Gwen Shapira // Software Engineer
![Page 2: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/2.jpg)
Overview
• Intro
• Review Problem
• Quick overview of key technology
• High level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch
©2014 Cloudera, Inc. All rights reserved.
![Page 3: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/3.jpg)
Gwen Shapira
• 15 years of moving data
• Formerly consultant
• Now Cloudera Engineer:
  – Sqoop Committer
  – Kafka
  – Flume
• @gwenshap
![Page 4: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/4.jpg)
Hello
• Ted Malaska (PSA at Cloudera)
• Hadoop for ~5 years
• Contributed to
  – HDFS, MapReduce, YARN, HBase, Spark, Avro,
  – Kite, Pig, Navigator, Cloudera Manager, Flume, Kafka, Sqoop, Accumulo
  – And working on a Sentry Patch
• Co-author of O’Reilly Hadoop Application Architectures
• Worked with about 70 companies in 8 countries
• Marvel Fan Boy
• Runner
![Page 5: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/5.jpg)
The Problem
![Page 6: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/6.jpg)
Credit Card Transaction Fraud
![Page 7: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/7.jpg)
Ikea Meat Balls
![Page 8: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/8.jpg)
Coupon Fraud
![Page 9: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/9.jpg)
Video Game Strategy
![Page 10: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/10.jpg)
Health Insurance Fraud
![Page 11: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/11.jpg)
Review of the Problem
• Typical Atomic Card Fraud Detection
• Ikea Meat Ball
• Multi Coupons Combinations
• OP or Negative Video Games Strategies
• Ad Serving
• Health Insurance Fraud
• Kid Coming Home From School
![Page 12: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/12.jpg)
How do we React
• Human Brain at Tennis
  – Muscle Memory
  – Reaction Thought
  – Reflective Meditation
![Page 13: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/13.jpg)
Overview of Key Technologies
![Page 14: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/14.jpg)
Kafka
![Page 15: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/15.jpg)
The Basics
• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster. Nodes are called brokers
![Page 16: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/16.jpg)
Topics, Partitions and Logs
![Page 17: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/17.jpg)
Each partition is a log
![Page 18: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/18.jpg)
Each Broker has many partitions
[Diagram: Partition 0, Partition 1 and Partition 2, each shown three times across the brokers]
![Page 19: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/19.jpg)
Producers load balance between partitions
[Diagram: a Client, plus Partition 0, Partition 1 and Partition 2, each shown three times across the brokers]
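One simple way a producer can balance load is to rotate through the partitions. The sketch below is plain Scala and does not use the Kafka client API; `RoundRobin`, `choose` and `spread` are hypothetical names chosen for illustration, and round-robin is only one possible balancing policy.

```scala
// Hypothetical stand-in for a producer-side partition chooser; not the Kafka client API.
object RoundRobinDemo {
  final class RoundRobin(numPartitions: Int) {
    require(numPartitions > 0, "need at least one partition")
    private var next = 0

    // Returns the partition for the next message and advances the cursor.
    def choose(): Int = {
      val p = next
      next = (next + 1) % numPartitions
      p
    }
  }

  // Sends `count` messages and reports how many landed on each partition.
  def spread(count: Int, numPartitions: Int): Map[Int, Int] = {
    val rr = new RoundRobin(numPartitions)
    Seq.fill(count)(rr.choose()).groupBy(identity).map { case (p, hits) => p -> hits.size }
  }
}
```

With nine messages and three partitions, `RoundRobinDemo.spread(9, 3)` puts three messages on each partition.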
![Page 20: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/20.jpg)
![Page 21: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/21.jpg)
Consumers
[Diagram: a Kafka Cluster holds a Topic with Partition A (File), Partition B (File) and Partition C (File). Consumers in Consumer Group X and Consumer Group Y read the partitions, and each partition carries one offset per group (Offset X, Offset Y).]
• Order is retained within a partition, but not across partitions
• Offsets are kept per consumer group
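To make the per-group offsets concrete, here is a minimal in-memory model of a single partition, assuming each consumer group simply remembers the next position to read. `PartitionLog`, `append` and `poll` are hypothetical names; this is a sketch of the idea, not Kafka's implementation or API.

```scala
object ConsumerOffsetsDemo {
  // One partition modelled as an append-only log of messages.
  final class PartitionLog {
    private val messages = scala.collection.mutable.ArrayBuffer.empty[String]
    // Next offset to read, tracked separately for every consumer group.
    private val offsets =
      scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

    def append(msg: String): Unit = { messages += msg }

    // Returns up to `max` unread messages for `group` in log order
    // and advances only that group's offset.
    def poll(group: String, max: Int): Seq[String] = {
      val from = offsets(group)
      val batch = messages.slice(from, from + max).toList
      offsets(group) = from + batch.size
      batch
    }
  }
}
```

Because the offsets are stored per group, group X and group Y each see every message in the partition, in the same order, at their own pace.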
![Page 22: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/22.jpg)
Flume
![Page 23: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/23.jpg)
Short Intro to Flume
Flume Agent: Sources, Interceptors, Selectors, Channels, Sinks
• Sources: Twitter, logs, JMS, webserver, Kafka
• Interceptors: Mask, re-format, validate…
• Selectors: DR, critical
• Channels: Memory, file, Kafka
• Sinks: HDFS, HBase, Solr
![Page 24: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/24.jpg)
Flume and/or Kafka
[Diagram: Upstream, Flume Source, Interceptor, Selector, Flume Channel, Flume Sink, Downstream, all inside Flume; the diagram carries three "Can Be Kafka" labels]
![Page 25: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/25.jpg)
Interceptors
• Mask fields
• Validate information against external source
• Extract fields
• Modify data format
• Filter or split events
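The masking and filtering roles listed above can be sketched in plain Scala. This does not use Flume's `Interceptor` interface; the `Event` case class, the card-number pattern and the function names are assumptions made for illustration.

```scala
object MaskInterceptorDemo {
  // Hypothetical event shape: a header map plus a text body; not Flume's Event interface.
  final case class Event(headers: Map[String, String], body: String)

  private val cardDigits = "\\d{13,16}".r

  // Masks every match of \d{13,16} in the body, keeping only its last four digits.
  def maskCardNumbers(event: Event): Event = {
    val masked = cardDigits.replaceAllIn(
      event.body,
      m => "*" * (m.matched.length - 4) + m.matched.takeRight(4)
    )
    event.copy(body = masked)
  }

  // Drops events with a blank body (the "filter" role), then masks the rest.
  def intercept(events: Seq[Event]): Seq[Event] =
    events.filter(_.body.trim.nonEmpty).map(maskCardNumbers)
}
```

Running the interceptor before events reach a channel means downstream sinks never see the full card number.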
![Page 26: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/26.jpg)
Spark Streaming
![Page 27: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/27.jpg)
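Spark Streaming Example
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Word count over one-second micro-batches read from a local socket.
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()            // nothing runs until the context is started
    ssc.awaitTermination() // keep the driver alive while batches arrive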
![Page 28: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/28.jpg)
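Spark Streaming Example (batch equivalent)
    import org.apache.spark.{SparkConf, SparkContext}

    // The same word count as a batch job; path is a placeholder for the input location.
    val conf = new SparkConf().setMaster("local[2]").setAppName("BatchWordCount")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(path, 2)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.collect().foreach(println) // RDDs have no print(); collect and print on the driver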
![Page 29: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/29.jpg)
Spark Streaming
[Diagram: three rows labeled Pre-first Batch, First Batch and Second Batch. In each row a Source feeds a Receiver that builds an RDD for the DStream; from the First Batch on, the collected RDDs go through a Single Pass of Filter, Count, Print.]
![Page 30: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/30.jpg)
Spark Streaming
[Diagram: the same three rows (Pre-first Batch, First Batch, Second Batch) with a Single Pass of Filter, Count; the passes are connected to Stateful RDD 1 and Stateful RDD 2.]
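The Stateful RDD labels suggest state that is carried from one micro-batch to the next. A minimal sketch of that idea in plain Scala, assuming the state is a running count per key (Spark Streaming offers `updateStateByKey` for this kind of job; the code below only models the behaviour and `update` and `run` are hypothetical names):

```scala
object StatefulBatchesDemo {
  type State = Map[String, Int]

  // Merges one micro-batch of keys into the running counts carried from earlier batches.
  def update(state: State, batch: Seq[String]): State =
    batch.foldLeft(state)((acc, key) => acc.updated(key, acc.getOrElse(key, 0) + 1))

  // Returns the state as it stands after each batch, in batch order.
  def run(batches: Seq[Seq[String]]): Seq[State] =
    batches.scanLeft(Map.empty[String, Int])(update).tail
}
```

Each batch only touches the keys it contains, while counts for keys seen earlier survive unchanged into the next state.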
![Page 31: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/31.jpg)
Spark Streaming and HBase
[Diagram: the Driver holds the Configs. Each Worker Node runs an Executor whose static space holds the Configs and an HConnection, shared by that Executor's Tasks.]
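Keeping the connection in static space means one connection per executor JVM rather than one per task. A sketch of that pattern in plain Scala, assuming a lazily created singleton; `Connection` is a hypothetical stand-in for an expensive client such as an HBase connection, and `created` exists only to make the reuse visible.

```scala
object StaticConnectionDemo {
  // Hypothetical stand-in for an expensive client such as an HBase connection.
  final class Connection(val config: String)

  object ConnectionHolder {
    private var connection: Option[Connection] = None
    var created = 0 // counts how many times a connection was actually opened

    // Opens the connection on first use; later calls in the same JVM reuse it.
    def get(config: String): Connection = synchronized {
      connection.getOrElse {
        created += 1
        val c = new Connection(config)
        connection = Some(c)
        c
      }
    }
  }

  // Each "task" asks the holder instead of opening its own connection.
  def runTasks(n: Int, config: String): Seq[Connection] =
    (1 to n).map(_ => ConnectionHolder.get(config))
}
```

The `synchronized` block matters because several tasks may run concurrently inside one executor.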
![Page 32: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/32.jpg)
High Level Architecture
![Page 33: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/33.jpg)
Real-Time Event Processing Approach
[Architecture diagram. Components: Clients (Swipe here!), Web App, Flume Agents with a Local Cache, Kafka, HBase / Memory, Spark Streaming, HDFSEventSink and Solr Sink, spread over Hadoop Cluster I and Hadoop Cluster II. Storage: HDFS and Solr. Processing: Hive/Impala, MapReduce, Spark and Search.
Flow labels: Fetching & Updating Profiles; Adjusting NRT Stats; Batch Time Adjustments; Automated & Manual Review of NRT Changes and Counters; Automated & Manual Analytical Adjustments and Pattern Detection.]
![Page 34: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/34.jpg)
NRT Processing
![Page 35: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/35.jpg)
Focus on NRT First
[Architecture diagram repeated from the "Real-Time Event Processing Approach" slide, with the callout "NRT Event Processing with Context".]
![Page 36: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/36.jpg)
Streaming Architecture – NRT Event Processing
[Diagram: Flume Sources feed the Kafka Initial Events Topic. A Flume Source with a Kafka Consumer reads that topic and hands events to a Flume Interceptor that holds the Event Processing Logic, Local Memory and an HBase Client talking to HBase. A Kafka Producer writes to the Kafka Answer Topic.]
Able to respond within 10s of milliseconds
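The pairing of Local Memory with an HBase Client suggests a read-through cache: look in memory first and go to the store only on a miss. A sketch in plain Scala, assuming a per-card profile that counts transactions; `Profile`, `Store` and `Processor` are hypothetical, and `Store` is an in-memory stand-in rather than an HBase client.

```scala
object ProfileCacheDemo {
  final case class Profile(cardId: String, txnCount: Int)

  // Hypothetical backing store standing in for HBase; counts reads so the cache effect is visible.
  final class Store(initial: Map[String, Profile]) {
    private val data = scala.collection.mutable.Map(initial.toSeq: _*)
    var reads = 0
    def get(id: String): Option[Profile] = { reads += 1; data.get(id) }
    def put(p: Profile): Unit = { data(p.cardId) = p }
  }

  final class Processor(store: Store) {
    private val local = scala.collection.mutable.Map.empty[String, Profile]

    // Looks in local memory first, falls back to the store on a miss,
    // then writes the updated profile to both.
    def onEvent(cardId: String): Profile = {
      val current = local.get(cardId).orElse(store.get(cardId)).getOrElse(Profile(cardId, 0))
      val updated = current.copy(txnCount = current.txnCount + 1)
      local(cardId) = updated
      store.put(updated)
      updated
    }
  }
}
```

Only the first event for a card pays for a store read; later events for the same card are served from local memory, which is what keeps the response time low.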
![Page 37: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/37.jpg)
Partitioned NRT Event Processing
[Diagram: the same pipeline as the previous slide (Flume Sources, Kafka Initial Events Topic, Kafka Consumer, Flume Interceptor with Event Processing Logic, Local Memory and HBase Client, Kafka Producer, Kafka Answer Topic, HBase). Each Producer uses a Custom Partitioner to pick Partition A, Partition B or Partition C of the topic.]
Better use of local memory
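A custom partitioner helps local memory when it sends every event for the same key to the same partition, so one processor sees all of that key's events. A minimal sketch, assuming the key is a string such as a card id and the partition is chosen by hashing; `partitionFor` and `assign` are hypothetical names and this is not Kafka's `Partitioner` interface.

```scala
object KeyPartitionerDemo {
  // Maps a key to a partition so the same key always lands on the same partition.
  def partitionFor(key: String, numPartitions: Int): Int = {
    require(numPartitions > 0, "need at least one partition")
    // floorMod keeps the result non-negative even when hashCode is negative.
    Math.floorMod(key.hashCode, numPartitions)
  }

  // Groups the distinct keys by the partition they are assigned to.
  def assign(keys: Seq[String], numPartitions: Int): Map[Int, Set[String]] =
    keys.groupBy(partitionFor(_, numPartitions)).map { case (p, ks) => p -> ks.toSet }
}
```

Since a key never moves between partitions, the profile cached by one consumer is never needed by another.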
![Page 38: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/38.jpg)
Completing the Puzzle
![Page 39: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/39.jpg)
Micro Batching
[Architecture diagram repeated from the "Real-Time Event Processing Approach" slide, with three "Micro Batching" callouts.]
![Page 40: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/40.jpg)
Complex Topologies
1.3: Kafka Direct Connection
[Diagram: Kafka Initial Events Topic, read through a Kafka Direct Connection into Spark Streaming DAG topologies]
• Manages offsets
• Stores offsets in the RDD
• No longer needs HDFS for initial RDD checkpointing
1.2: Kafka Receivers
[Diagram: Kafka Initial Events Topic, read through three Kafka Receivers into Spark Streaming DAG topologies]
• Lets Kafka manage offsets
• Uses HDFS for initial RDD recovery
![Page 41: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/41.jpg)
MicroBatch Bad-Input Handling
[Diagram: DAG topologies connected to four Kafka topics, each drawn as a log with offsets 0 to 13: incoming events topic, bad events topic, resolved events topic, results topic]
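One reading of the four topics is that each micro-batch splits its input: events that parse go on toward the results topic, and events that do not are set aside on the bad events topic to be fixed and fed back through the resolved events topic. A sketch of the split step in plain Scala, assuming a hypothetical "cardId,amountInCents" line format; `Txn`, `parse` and `route` are illustrative names.

```scala
object BadInputRoutingDemo {
  // Hypothetical parsed event: card id and amount in cents.
  final case class Txn(cardId: String, amountCents: Long)

  // Parses "cardId,amount"; returns Left with the raw line when it cannot be parsed.
  def parse(line: String): Either[String, Txn] =
    line.split(",", -1) match {
      case Array(id, amt) if id.nonEmpty =>
        scala.util.Try(amt.trim.toLong).toOption match {
          case Some(cents) if cents >= 0 => Right(Txn(id, cents))
          case _                         => Left(line)
        }
      case _ => Left(line)
    }

  // Splits one micro-batch into (bad events, good events), preserving order within each.
  def route(batch: Seq[String]): (Seq[String], Seq[Txn]) = {
    val parsed = batch.map(parse)
    (parsed.collect { case Left(raw) => raw }, parsed.collect { case Right(txn) => txn })
  }
}
```

Keeping the raw line for bad events means nothing is lost: the batch keeps moving, and the rejected input can be inspected and replayed later.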
![Page 42: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/42.jpg)
Ingestion
[Architecture diagram repeated from the "Real-Time Event Processing Approach" slide, with two "Ingestion" callouts.]
![Page 43: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/43.jpg)
Ingestion
[Diagram: a Kafka Cluster topic with Partition A, Partition B and Partition C is read by three sets of Flume sinks: Flume HDFS Sinks writing to HDFS, Flume Solr Sinks writing to Solr, and Flume HBase Sinks writing to HBase]
![Page 44: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/44.jpg)
Reflective Thoughts
[Architecture diagram repeated from the "Real-Time Event Processing Approach" slide, with the callout "Research and Searching".]
![Page 45: Fraud Detection Architecture](https://reader035.fdocuments.us/reader035/viewer/2022062223/58f9a989760da3da068b6fb2/html5/thumbnails/45.jpg)