Slide 3 Fast Data processing with kafka, rfx and redis

Page 1

Fast Data Processing with RFX
Simplify Fast Data Processing

[email protected]@gmail.com

Page 2

Today’s topic: we will talk about everything inside this red circle.

Page 3

Demo first

https://github.com/rfxlab/pageview-analytics-with-rfx

Page 4

Content at a glance

1. BEAM✲ methodology for agile data warehouses
2. Introduction to Fast Data
3. Problem: “Fast Data in web analytics”
4. Examples of the fast data design pattern (RFX, or Reactive Function X)

4.1. Event data actor
4.2. Event data agent
4.3. Event data collector
4.4. Event data router
4.5. Event data processor
4.6. Event data storage
4.7. Event data query
4.8. Event data reactor

5. Demo of “Fast Data in web analytics” with source code explanation

Page 5

1 - BEAM✲ methodology

Page 6

1 - BEAM✲ methodology for Agile Data Warehouse

BEAM✲ stands for Business Event Analysis & Modelling, and it’s a methodology for gathering business requirements for Agile Data Warehouses and building those warehouses.

It was developed by Lawrence Corr (@LawrenceCorr) and Jim Stagnitto (@JimStag), and published in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema.

Page 7

Example with BEAM✲

Page 8

Goal: model all business events and load them into a database in an agile way
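To make the goal concrete, here is a minimal sketch, assuming a single pageview event. BEAM✲ frames each business event by the 7Ws (who, what, when, where, how many, why, how), and one event then becomes one row of a fact table. The class, the column names and the choice of referrer for “why” and user agent for “how” are hypothetical, for illustration only; they are not a model taken from the book or from RFX.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pageview event described with BEAM✲'s 7Ws.
// The W-to-attribute mapping below is an illustrative assumption.
final class PageviewEvent {
    final String visitorId;  // WHO
    final String eventType;  // WHAT, always "pageview" here
    final long timestamp;    // WHEN, epoch milliseconds
    final String pageUrl;    // WHERE
    final int viewCount;     // HOW MANY, the measure (1 per pageview)
    final String referrer;   // WHY (assumed: the referrer explains the visit)
    final String userAgent;  // HOW (assumed: device / browser)

    PageviewEvent(String visitorId, long timestamp, String pageUrl,
                  String referrer, String userAgent) {
        this.visitorId = visitorId;
        this.eventType = "pageview";
        this.timestamp = timestamp;
        this.pageUrl = pageUrl;
        this.viewCount = 1;
        this.referrer = referrer;
        this.userAgent = userAgent;
    }

    // Flattens the event into one fact-table row (column name -> value), in 7W order.
    Map<String, Object> toFactRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("who_visitor_id", visitorId);
        row.put("what_event_type", eventType);
        row.put("when_timestamp", timestamp);
        row.put("where_page_url", pageUrl);
        row.put("how_many_views", viewCount);
        row.put("why_referrer", referrer);
        row.put("how_user_agent", userAgent);
        return row;
    }
}

class PageviewEventDemo {
    public static void main(String[] args) {
        PageviewEvent e = new PageviewEvent("u1", 1_500_000_000_000L,
                "https://example.com/", "https://google.com/", "Mozilla/5.0");
        System.out.println(e.toFactRow());
    }
}
```

In a star schema, the who/where/how columns would usually move into dimension tables around a pageview fact table; the flat row above only shows which W each attribute answers.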

Page 9

2 - Fast Data

Page 10
Page 11

Introduction to Fast Data

Page 12
Page 13

3 - Problems in Practice

Page 14

Problems

“Fast Data in web analytics”

1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when pageviews are abnormal (simple DDoS attack detection)
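The first two problems reduce to keeping counters per time bucket. This sketch assumes one-minute buckets and in-memory maps; the `pv:` / `uu:` key names only mimic how such counters might be laid out in Redis (`INCR` for pageviews, `SADD` / `SCARD` for unique users) and are not the RFX key schema. For very large audiences, Redis HyperLogLog (`PFADD` / `PFCOUNT`) is a common approximate alternative to exact sets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of per-minute pageview and unique-user counts.
// Key names like "pv:<minute>" are assumptions that mimic Redis-style counters.
final class PageviewStats {
    private final Map<String, Long> pageviewCounts = new HashMap<>();
    private final Map<String, Set<String>> userSets = new HashMap<>();

    private static long minuteOf(long epochMillis) {
        return epochMillis / 60_000L;
    }

    // Records one pageview by userId at the given time.
    void record(String userId, long epochMillis) {
        long minute = minuteOf(epochMillis);
        pageviewCounts.merge("pv:" + minute, 1L, Long::sum);                        // like INCR
        userSets.computeIfAbsent("uu:" + minute, k -> new HashSet<>()).add(userId); // like SADD
    }

    // Pageviews in the minute containing epochMillis.
    long pageviews(long epochMillis) {
        return pageviewCounts.getOrDefault("pv:" + minuteOf(epochMillis), 0L);
    }

    // Distinct users in the minute containing epochMillis (like SCARD).
    int uniqueUsers(long epochMillis) {
        Set<String> users = userSets.get("uu:" + minuteOf(epochMillis));
        return users == null ? 0 : users.size();
    }
}

class PageviewStatsDemo {
    public static void main(String[] args) {
        PageviewStats stats = new PageviewStats();
        stats.record("u1", 1_000L);
        stats.record("u2", 2_000L);
        stats.record("u1", 3_000L);   // same user, same minute
        stats.record("u3", 61_000L);  // next minute
        System.out.println(stats.pageviews(0L) + " " + stats.uniqueUsers(0L)); // prints "3 2"
    }
}
```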

Page 15

4 - Thinking with RFX

Page 16

What is RFX, or Reactive Function X?

● A design pattern for solving big, fast data problems
● A collection of open source tools
● The mission of RFX:

1. Build data products quickly with design patterns
2. Apply BEAM✲ for an agile data pipeline
3. React to critical events in near-real-time

Page 17

RFX framework

What?
● A Java framework built from open source projects:
○ Based on Akka actors ( http://akka.io )
○ Lightweight DAO with Spring JDBC ( https://spring.io )
○ Netty ( http://netty.io ) and Vert.x ( http://vertx.io/ )
○ Common utility classes for Apache { Kafka, Hadoop, Spark }
○ Common utility classes for NoSQL ( Redis ( http://redis.io ), MongoDB )
● An R&D project for fast data processing, started in 11/2013

Why?
● Divide Java code into modules:
○ common infrastructure code ( rfx-stream )
○ business logic code ( check valid data stream )
○ machine learning code ( automation & optimization )
● Focus on best practices and reusability
● Foundation for scalability (system and business)
● Test-driven development for real-time analytics
● Continuous integration & improvement

Page 18

Your business logic here

Page 19

Reactive Function (X) Philosophy

Page 20

Core elements of rfx-stream

Page 21

Core backend modules

rfx-track:
● collecting all events from log agents
rfx-stream:
● processing stream data (PipelineProcessing pattern)
● processing real-time analytics
● processing business logic (with reactive functions)
rfx-cronjob:
● synchronizing real-time data to the report database (by parsing data in Redis and updating the report database)
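The PipelineProcessing pattern named for rfx-stream can be pictured as a chain of small stages, each of which either passes a (possibly transformed) event on or drops it. The following is a minimal sketch under that reading; `Pipeline`, `then` and `process` are hypothetical names, not the rfx-stream API. Infrastructure code owns the loop, while business rules such as “check valid data stream” are plugged in as stages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical pipeline: each stage returns the (possibly new) event, or empty to drop it.
final class Pipeline<T> {
    private final List<Function<T, Optional<T>>> stages = new ArrayList<>();

    Pipeline<T> then(Function<T, Optional<T>> stage) {
        stages.add(stage);
        return this;
    }

    Optional<T> process(T event) {
        Optional<T> current = Optional.of(event);
        for (Function<T, Optional<T>> stage : stages) {
            current = current.flatMap(stage);
            if (!current.isPresent()) {
                break; // a stage dropped the event; skip the remaining stages
            }
        }
        return current;
    }
}

class PipelineDemo {
    // Infrastructure: parse a "key=value&key=value" log line into fields.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : line.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                fields.put(kv[0], kv[1]);
            }
        }
        return fields;
    }

    static Pipeline<Map<String, String>> pageviewPipeline() {
        return new Pipeline<Map<String, String>>()
                // Business logic: only events carrying a URL form a valid data stream.
                .then(e -> e.containsKey("url") ? Optional.of(e) : Optional.empty())
                // Normalization: lower-case the URL on a copy of the event.
                .then(e -> {
                    Map<String, String> copy = new HashMap<>(e);
                    copy.put("url", copy.get("url").toLowerCase(Locale.ROOT));
                    return Optional.of(copy);
                });
    }

    public static void main(String[] args) {
        Pipeline<Map<String, String>> p = pageviewPipeline();
        System.out.println(p.process(parse("uid=u1&url=/Home")).isPresent()); // true
        System.out.println(p.process(parse("uid=u2")).isPresent());           // false
    }
}
```

Keeping each stage a plain function makes the business rules testable on their own, which fits the test-driven goal listed for RFX.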

Page 22

Core frontend modules

rfx-report:
● visualizing data in real time
● monitoring real-time events
rfx-agent:
● tracking user activity: heatmap data, pageviews, ...
● logging user activity to rfx-track (via a network protocol: HTTP, TCP or UDP)
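On the agent side, a pageview only needs to reach rfx-track as a small request. This sketch builds an HTTP GET tracking URL with URL-encoded query parameters; the endpoint `http://localhost:8080/track` and the parameter names `event`, `url` and `uid` are hypothetical. The slides name a JavaScript agent (RFX-track-js), so this Java version only illustrates the shape of the request. It relies on `URLEncoder.encode(String, Charset)`, available since Java 10.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical agent-side helper: turns one tracked event into a GET URL for a collector.
final class TrackingUrl {
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Encode keys and values so ':', '/' and spaces cannot break the query string.
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }
}

class TrackingUrlDemo {
    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("event", "pageview");
        params.put("url", "https://example.com/a b");
        params.put("uid", "u1");
        System.out.println(TrackingUrl.build("http://localhost:8080/track", params));
        // prints http://localhost:8080/track?event=pageview&url=https%3A%2F%2Fexample.com%2Fa+b&uid=u1
    }
}
```

Sending the result is then a plain GET, for example with `java.net.http.HttpClient` on Java 11+.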

Page 23

How to solve problems with RFX?

Page 24

Use Cases in “Fast Data in web analytics”

1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when pageviews are abnormal (simple DDoS attack detection)
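The third use case can be approximated with a sliding-window counter: keep the timestamps of recent pageviews and raise an alert when more than a threshold of them fall inside the window. This is a minimal sketch under stated assumptions: events arrive in timestamp order, the window and threshold are hypothetical tuning values, and the `Consumer<String>` callback stands in for whatever actually sends the email. A fixed threshold is only a crude DDoS signal, in keeping with the “simple” in the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical sliding-window detector: alerts when more than `threshold`
// pageviews arrive within (now - windowMillis, now]. Assumes ordered timestamps.
final class PageviewSpikeDetector {
    private final long windowMillis;
    private final int threshold;
    private final Consumer<String> alert; // stands in for an email sender
    private final Deque<Long> timestamps = new ArrayDeque<>();

    PageviewSpikeDetector(long windowMillis, int threshold, Consumer<String> alert) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
        this.alert = alert;
    }

    // Records one pageview; returns true if it pushed the window over the threshold.
    boolean onPageview(long epochMillis) {
        timestamps.addLast(epochMillis);
        // Evict pageviews that are no longer inside the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= epochMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() > threshold) {
            alert.accept("Abnormal traffic: " + timestamps.size()
                    + " pageviews in the last " + windowMillis + " ms");
            return true;
        }
        return false;
    }
}

class SpikeDemo {
    public static void main(String[] args) {
        PageviewSpikeDetector d = new PageviewSpikeDetector(1_000L, 3,
                msg -> System.out.println("EMAIL: " + msg));
        long[] times = {0L, 100L, 200L, 300L, 2_000L};
        for (long t : times) {
            System.out.println(t + " -> " + d.onPageview(t));
        }
    }
}
```

As written, every pageview past the threshold fires the callback; a real reactor would probably rate-limit alerts so one attack does not produce thousands of emails.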

Page 25

Applying RFX to Pageview Analytics

1.1. Event data actor: a web user
1.2. Event data agent: RFX-track-js
1.3. Event data collector: RFX-track-server
1.4. Event data queue: Apache Kafka
1.5. Event data processor: RFX-stream
1.6. Event data storage: Redis, MySQL
1.7. Event data query: RFX-data-api
1.8. Event data reactor: RFX-reactor
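The eight roles can be wired together in a single process to see how one event flows from collector to reactor. In this minimal sketch a `LinkedBlockingQueue` stands in for Apache Kafka and a `ConcurrentHashMap` stands in for Redis; `PageviewFlow` and its methods are hypothetical and are not the RFX-track-server, RFX-stream, RFX-data-api or RFX-reactor APIs. The actor (a web user) and the agent are simulated by calls to `collect`, and the reactor threshold is an assumed value.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical single-process model of the pageview flow.
final class PageviewFlow {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // event data queue (Kafka stand-in)
    private final Map<String, Long> storage = new ConcurrentHashMap<>();      // event data storage (Redis stand-in)
    private final long alertAt;                                               // hypothetical reactor threshold
    private final Consumer<String> reactor;                                   // event data reactor

    PageviewFlow(long alertAt, Consumer<String> reactor) {
        this.alertAt = alertAt;
        this.reactor = reactor;
    }

    // Event data collector: accepts one pageview (its URL) reported by an agent.
    void collect(String url) {
        queue.offer(url);
    }

    // Event data processor: drains pending events and updates per-URL counters.
    void processPending() {
        String url;
        while ((url = queue.poll()) != null) {
            long count = storage.merge(url, 1L, Long::sum);
            if (count == alertAt) {
                reactor.accept(url + " reached " + count + " pageviews");
            }
        }
    }

    // Event data query: reads a counter, as a data API would.
    long query(String url) {
        return storage.getOrDefault(url, 0L);
    }
}

class PageviewFlowDemo {
    public static void main(String[] args) {
        PageviewFlow flow = new PageviewFlow(2L, System.out::println);
        // Actor + agent: a web user opens pages and the agent reports them.
        flow.collect("/home");
        flow.collect("/about");
        flow.collect("/home");
        flow.processPending();                   // prints "/home reached 2 pageviews"
        System.out.println(flow.query("/home")); // prints 2
    }
}
```

Replacing the stand-ins with a Kafka consumer and a Redis client changes the plumbing but not the shape: each role stays a separate, replaceable step.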

Page 26

Demo and explanation of code and concepts

Page 27

Readings

● http://www.decisionone.co.uk/press/agile-data-warehouse-design-sampler.pdf
● http://www.slideshare.net/votrongdao/agile-data-warehouse-34427798

● Apache Kafka Installation Video | How To Setup Apache Kafka: https://youtu.be/Fg8cTsEk7Gc
● https://www.tutorialspoint.com/apache_kafka/
● https://kafka.apache.org/quickstart

● http://xyu.io/2015/07/13/building-a-faster-etl-pipeline-with-flume-kafka-and-hive/
● http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/
● https://www.oreilly.com/ideas/drivetrain-approach-data-products