Convertro/AOL@Devcon tlv 2015

Expect the unexpected, or how we adapted to get ourselves out of harm’s way (and lived). Yaniv Ranen, AOL/Convertro

Named leader in report.

Founded in 2009

Acquired by AOL in 2014

Using Big Data stack since 2009

75 people - 30 R&D

5TB of daily data

400M daily data points

>100 data sources

Convertro

Programmer @ IDF’s Computer unit (Mamram) Been in the BI field since 2000

CTO @ Ness Gilon

BI manager @ Kenshoo

Big data engineer @ Convertro (Spend, Insights, Spend recommendations, DS team)

Helping marketers to optimize their spend

Across: Channels | Devices

Online + Offline

Convertro

Collecting the user touch points

Cleansing and enhancing the data

Running an attribution model to decide the event score in the overall path to conversion

Allowing access to our data on the most granular level (Dashboard/data feeds)

Recommend media spend allocation

How do we do that?
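
The slides do not say which attribution model is used. As an illustration only, the sketch below scores the events in one path to conversion with a time-decay model: events closer to the conversion get more credit, and the scores sum to 1. The `TouchPoint` fields, channel names, and the 24-hour half-life are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchPoint:
    channel: str         # illustrative values such as "tv" or "search"
    hours_before: float  # hours between this event and the conversion


def time_decay_scores(path, half_life_hours=24.0):
    """Return one score per touch point; the scores sum to 1.0."""
    if not path:
        return []
    # Credit halves for every half-life that separates event and conversion.
    weights = [0.5 ** (tp.hours_before / half_life_hours) for tp in path]
    total = sum(weights)
    return [weight / total for weight in weights]


path = [
    TouchPoint("tv", 48.0),
    TouchPoint("search", 24.0),
    TouchPoint("email", 0.0),
]
scores = time_decay_scores(path)
```

With these inputs the raw weights are 0.25, 0.5 and 1.0, so the email event receives 4/7 of the credit. Summing such scores per channel is one way to feed a spend-allocation recommendation.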

Don’t ignore pain

Don’t fix something that isn’t painful enough

Don’t fix something just because it’s cool

Fix it right, scalable and monitored

Address scalability debt at early stages

Pain driven development

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; Main Technologies]

Initial architecture - Naive

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; Main Technologies]

4 static servers

Event deduping

“Mother’s Day”

Pain

High demand for 1800Flowers drove lots of requests per hour

Our cycles had almost exceeded 24 hours

Data was lost

The Super Bowl was right around the corner

“Mother’s Day”

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; Main Technologies]

Amazon Autoscale: 12 – 100 servers

Stateless architecture

Simple algorithm

“Bees with Machine Guns” used for warm-up

Event deduping in MR
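
The slide names event deduping in MR without further detail. A minimal sketch of how a map/sort/reduce dedup can work, simulated in plain Python: the key fields (`user_id`, `event_type`, `timestamp`) and the `server` field are assumptions made for illustration, not Convertro's actual event schema.

```python
from itertools import groupby
from operator import itemgetter


def map_event(event):
    # Map phase: emit a key built from the fields assumed to identify
    # one logical event (hypothetical field names).
    key = (event["user_id"], event["event_type"], event["timestamp"])
    yield key, event


def reduce_events(key, events):
    # Reduce phase: all copies of one event arrive together; keep the first.
    yield next(iter(events))


def dedupe(events):
    mapped = [pair for event in events for pair in map_event(event)]
    mapped.sort(key=itemgetter(0))  # stands in for the shuffle/sort phase
    unique = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        unique.extend(reduce_events(key, (event for _, event in group)))
    return unique


events = [
    {"user_id": "u1", "event_type": "click", "timestamp": 100, "server": "a"},
    {"user_id": "u1", "event_type": "click", "timestamp": 100, "server": "b"},
    {"user_id": "u2", "event_type": "view", "timestamp": 105, "server": "a"},
]
unique = dedupe(events)
```

Because the grouping happens in the batch job, the collecting servers need no shared state to detect duplicates, which fits the stateless design listed on the slide.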

Pain

Onboarding process might have problems

The sooner the problems are dealt with, the sooner we begin to gather data for our client

Fast indication if we have tagging problems

Implementation feedback

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; Main Technologies]

Implementation feedback

Pain

Clients have different taxonomies

Lots of development adjustments

Parsing data without code changes

Development scaling

Customization needs

Parsing – CSL

Convertro Source Language

Implementation team can write a script in pseudo English

The script gets pushed to our repository

On every build, Maven runs a parser that generates a Java class from the if statements in that file

A parameter in our settings redirects a client to use this automated class

Changes in parsing are made on the fly by our implementation team, which means fast on-boarding
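
The CSL flow above can be sketched as a tiny rule compiler. The slides do not show CSL's grammar, so the rule syntax below is invented for illustration, and this sketch builds a Python function at runtime, whereas the real pipeline generates a Java class during the Maven build.

```python
import re

# Hypothetical rule syntax, not real CSL:
#   if <field> contains|equals "<text>" then <target> is "<value>"
RULE = re.compile(
    r'^if (\w+) (contains|equals) "([^"]*)" then (\w+) is "([^"]*)"$'
)


def compile_script(script):
    """Turn a pseudo-English script into a parse(event) function."""
    rules = []
    for number, line in enumerate(script.strip().splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        rules.append(match.groups())

    def parse(event):
        result = dict(event)
        for field, op, text, target, value in rules:
            actual = str(event.get(field, ""))
            hit = text in actual if op == "contains" else actual == text
            if hit:
                result[target] = value
                break  # first matching rule wins
        return result

    return parse


script = '''
if url contains "utm_source=google" then channel is "Paid Search"
if referrer equals "facebook.com" then channel is "Social"
'''
parse = compile_script(script)
result = parse({"url": "http://shop.example/?utm_source=google"})
```

Keeping the rules in a script rather than in hand-written parser code is what lets a non-developer team change client-specific parsing without a code change.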

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; CSL; Main Technologies]

Spend

Over 100 possible integrations

Each integration is a different snowflake: different login method, different data stored

Complex matching techniques with existing data

Pain

Lots of spend integrations

Needs to be customizable per client

Data matching problems

Development time on integration tweaks

Spend

Each integration has a scraper/API that extracts a csv file of daily spend and saves it on S3

That’s the only unique code for each integration and it’s relatively small

Conformed code

UI interface for implementation team

Self serve and configurable
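
A minimal sketch of the split described above: a small extractor per integration that yields a CSV of daily spend, and one shared ("conformed") loader driven by configuration. The network names, column names, and in-memory CSV strings are assumptions for illustration; the real extractors would call a scraper or API and save the file to S3.

```python
import csv
import io


def extract_network_a():
    # Stand-in for an integration-specific scraper/API client.
    return "day,campaign,cost\n2015-05-10,brand,120.50\n2015-05-10,generic,80.00\n"


def extract_network_b():
    return "Date,Campaign Name,Spend USD\n2015-05-10,brand,40.25\n"


# Per-integration settings: the kind of thing a self-serve UI could edit.
CONFIG = {
    "network_a": {
        "extract": extract_network_a,
        "columns": {"date": "day", "campaign": "campaign", "spend": "cost"},
    },
    "network_b": {
        "extract": extract_network_b,
        "columns": {"date": "Date", "campaign": "Campaign Name", "spend": "Spend USD"},
    },
}


def load_spend(name):
    """Shared code path: read any integration's CSV into one common shape."""
    cfg = CONFIG[name]
    cols = cfg["columns"]
    reader = csv.DictReader(io.StringIO(cfg["extract"]()))
    return [
        {
            "source": name,
            "date": row[cols["date"]],
            "campaign": row[cols["campaign"]],
            "spend": float(row[cols["spend"]]),
        }
        for row in reader
    ]


all_rows = load_spend("network_a") + load_spend("network_b")
total = sum(row["spend"] for row in all_rows)
```

Adding an integration in this layout means writing one extractor and one config entry; matching and loading stay in the shared code.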

Spend

Pain

Different SLAs for reporting and the dashboard

MPP is a brute-force solution

Dashboard - Write once read many (daily)

Reports - Write many read many (ongoing)

Reporting / Dashboard

Dashboard data is copied daily to a different Vertica environment used for the dashboard

Reports run on the faster changing environment

Reporting / Dashboard

Operational Dash

ETL

Pain

Clients want to map IDs to descriptions

Pre-calculated tables are rebuilt on every change

First-day inconsistency

Mapping

Hydro
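
The slides name Hydro but do not describe how it works. As an assumption only, one common way to avoid rebuilding a pre-calculated table on every mapping change is to store IDs in the table and attach descriptions at read time. The table contents and field names below are hypothetical.

```python
# Pre-calculated rows keyed by ID; built once and not rewritten
# when a client edits a mapping (hypothetical data).
FACTS = [
    {"campaign_id": 17, "conversions": 42},
    {"campaign_id": 23, "conversions": 8},
]

# Client-editable mapping from ID to description.
mapping = {17: "Spring Sale"}


def hydrate(rows, mapping):
    """Attach the current description to each row at read time."""
    for row in rows:
        campaign_id = row["campaign_id"]
        name = mapping.get(campaign_id, f"Unmapped ({campaign_id})")
        yield {**row, "campaign_name": name}


before = list(hydrate(FACTS, mapping))
mapping[23] = "Mother's Day"  # the client adds a mapping later
after = list(hydrate(FACTS, mapping))
```

In this sketch a mapping change is visible on the next read, with no rebuild of the stored rows.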

[Architecture diagram: AZ#1 and AZ#2; stages: Collecting, Processing, Presenting; SFTP / SCP; Org Services; CSL; Hydro; Main Technologies]

Summary

– Pain driven development

– Begin with the naïve approach and study pain areas

– Saving developer time is a major issue for us

– Our business drives us to automate and reduce costs

Questions?

Thanks

(Yes, we are hiring)