NRT Event Processing with Snowplow

Post on 15-Apr-2017

NRT Event Processing

Outline

• Introduction

• Our Snowplow Setup

• Example NRT Use Cases

• Radio Campaign

• Telephony System

Simply Business

• Largest UK business insurance provider

• More than 400,000 policy holders

• Using BML, tech and data to disrupt the business insurance market

Data ’n’ Analytics

• 5 Data Engineers

• 3 Business Intelligence Developers

• 3 Data Analysts

• 1 Data Scientist

• 1 Director of Data Science

• And hiring! :-)

Our Snowplow Setup

Snowplow Setup

Trackers → Collector → Enrichment → Modeling → Storage

• Trackers, collectors and storage are 100% upstream Snowplow

• Enrichment:

• Spark apps that use scala-common-enrich as a library

• We add our own enrichments after the default ones

• We perform NRT identity stitching and sessionization

• Modeling: mix of Spark and SQL jobs

• Storage: Spark apps that use scala-hadoop-shred as a library
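
To make the sessionization step above concrete, here is a minimal sketch in plain Python rather than Spark. It is not the pipeline's actual code: the 30-minute inactivity timeout, the `sessionize` function and the `(user_id, timestamp)` event shape are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed inactivity window; the talk does not state the real value.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Assign a per-user session index to each event.

    `events` is an iterable of (user_id, unix_timestamp) tuples
    (a hypothetical shape). Returns a list of
    (user_id, unix_timestamp, session_index) sorted by user and time.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    result = []
    for user_id in sorted(by_user):
        session_index = 0
        previous_ts = None
        for ts in sorted(by_user[user_id]):
            # A gap longer than the timeout starts a new session.
            if previous_ts is not None and ts - previous_ts > timeout:
                session_index += 1
            result.append((user_id, ts, session_index))
            previous_ts = ts
    return result
```

In a streaming job the same rule would be applied incrementally per user, keeping only the last timestamp and session index as state.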

Why Spark?

• We wanted a near real-time pipeline, but KCL was too rigid:

• Provision, set up and monitor the machines

• Configuration is difficult for complex DAGs

• In contrast, Spark:

• Once set up, the cluster is a PaaS

• Allows streaming, batch, ML and graph workloads

• Allows analysts and data scientists to use Python

Radio Campaign

The Radio Campaign

• We’re running a radio campaign in Birmingham, Manchester and London

• People who get a quote starting from our radio landing pages get a £25 discount

The Banner

• The questionnaire to get quotes can be quite long to complete

• We wanted to reassure our customers that they would get the discount

• We wanted to display a banner at the top of every page of the questionnaire

The Banner

Our Infrastructure

[Diagram: Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, MongoDB, Visitor API, Quoting App (HTTP)]

On average, it takes 2.5s for an event to be available in the Visitor API

Benefits of NRT Snowplow

• Our quoting app does not need to know about marketing, user landing pages, etc.

• Our Mongo table with active sessions’ events becomes a view of our event log

• Can be reused for many other use cases: analytics on read!
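
As an illustration of "analytics on read", the banner decision can be derived from a session's events at request time. The sketch below is hypothetical: the `/radio` URL prefix, the function name and the assumption that the Visitor API returns events as dicts with `event`, `page_urlpath` and `tstamp` keys are not taken from the talk.

```python
# Hypothetical URL prefix shared by the radio landing pages.
RADIO_LANDING_PREFIX = "/radio"


def should_show_radio_banner(session_events, prefix=RADIO_LANDING_PREFIX):
    """Return True if the session's first page view was a radio landing page.

    `session_events` is a list of dicts with assumed keys `event`,
    `page_urlpath` and `tstamp`, in no particular order.
    """
    page_views = [e for e in session_events if e.get("event") == "page_view"]
    if not page_views:
        return False
    # The earliest page view tells us where the visitor started.
    first = min(page_views, key=lambda e: e["tstamp"])
    return first["page_urlpath"].startswith(prefix)
```

Because the rule is computed from the events on read, the quoting app only asks a yes/no question, and a new rule needs no change to what is tracked.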

Telephony System

Telephony System

• We have a call center in Northampton with around 200 consultants

• We used an off-the-shelf telephony system

• It worked well for a long time, but:

• Was not very well integrated with our systems

• Quite rigid, we couldn’t adapt it to all our needs

• We only had daily reports, and they contained only aggregated data

Telephony System

• We decided to replace it with a home-grown, Twilio-based solution

• Components:

• Contact Strategy Manager

• Voice Channel Manager

• Communication is event-based

• We transform those events into Snowplow’s unstructured events

• Spark Streaming app to insert the events into Redshift every 2 minutes
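
The translation into Snowplow's unstructured events might look roughly like the sketch below. The outer envelope schema is Snowplow's standard `unstruct_event` schema; the function name, the vendor `com.example.telephony`, the event name `call_answered` and the payload fields are hypothetical placeholders, not the schemas used in the talk.

```python
import json

# Envelope schema Snowplow uses for unstructured (self-describing) events.
UNSTRUCT_EVENT_SCHEMA = (
    "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"
)


def to_unstructured_event(vendor, name, version, payload):
    """Wrap a telephony event payload as a Snowplow unstructured event."""
    return {
        "schema": UNSTRUCT_EVENT_SCHEMA,
        "data": {
            # Inner schema URI identifies the specific event type.
            "schema": f"iglu:{vendor}/{name}/jsonschema/{version}",
            "data": payload,
        },
    }


# Hypothetical telephony event, for illustration only.
event = to_unstructured_event(
    "com.example.telephony",
    "call_answered",
    "1-0-0",
    {"call_id": "c1", "consultant": "ann"},
)
print(json.dumps(event, indent=2))
```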

The Infrastructure

[Diagram: Contact Strategy Manager, Voice Channel Manager, Event Translator, Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, Kinesis, Spark Stream Shredder, Redshift, Looker]

Events

Example call viewed as a sequence of events:

Benefits of NRT Snowplow

• Event Sourcing is great for reporting and analytics: ensures that data quality remains high

• Team managers now have an NRT view of what teams are doing

• You can aggregate and drill down on the data as appropriate

• Leveraging our data platform: Snowplow pipeline, Redshift & Looker

• Leveraging our existing skills: everyone knows how to use Looker
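
To illustrate why an event log lends itself to both aggregation and drill-down, here is a small sketch that folds call events into talk time per consultant. The event types (`call_answered`, `call_ended`) and the keys `type`, `call_id`, `consultant` and `ts` are hypothetical; the talk does not list the real event names.

```python
from collections import defaultdict


def talk_time_per_consultant(events):
    """Fold call events into total talk time (seconds) per consultant.

    `events` is a list of dicts with assumed keys `type`, `call_id`
    and `ts`; `call_answered` events also carry `consultant`.
    """
    answered = {}  # call_id -> (consultant, answer timestamp)
    totals = defaultdict(int)
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["type"] == "call_answered":
            answered[e["call_id"]] = (e["consultant"], e["ts"])
        elif e["type"] == "call_ended" and e["call_id"] in answered:
            consultant, start = answered.pop(e["call_id"])
            totals[consultant] += e["ts"] - start
    return dict(totals)
```

The raw events stay available, so the same log can answer a different question later (for example, time to answer) without changing the source systems.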

Sum Up

The Infrastructure

[Diagram: Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, MongoDB, Visitor API, Kinesis, Spark Stream Shredder, Redshift, Looker, Applications]

NRT Benefits

• We can dynamically alter the website while the user is still using it

• We can provide insights on live processes

• Multiple uses to improve conversion:

• Instant inclusion/exclusion from remarketing lists

• Abandoned cart emails/calls

• Social proofing (3 more people are also watching…)

• …
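
As one example of these uses, social proofing can be sketched as a count of other recent viewers of the same page. Everything here is an assumption for illustration: the function name, the `(visitor_id, page, timestamp)` tuple shape and the 5-minute window.

```python
def other_viewers(page_views, page, visitor_id, now, window_seconds=300):
    """Count distinct other visitors who viewed `page` recently.

    `page_views` is an iterable of (visitor_id, page, unix_timestamp)
    tuples (a hypothetical shape). The current visitor is excluded.
    """
    recent = {
        v
        for (v, p, ts) in page_views
        if p == page and v != visitor_id and 0 <= now - ts <= window_seconds
    }
    return len(recent)
```

With events available within a few seconds, a count like this can be shown to the visitor while they are still on the page.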

Questions?

@dani_sola

dani.sola@simplybusiness.co.uk