The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Post on 03-Aug-2015

The Rise of Digital Audio: Dwelling between BIG Data and Fast Data

Philippe-Alexandre Leroux | Chief Operating Officer

Bogdan Bocse | Solutions Architect

The way we consume music has evolved

Music is part of our lives, just not like before

We can now consume music in many different ways

On Demand | Live Radios | Custom Radios

It’s now interactive, connected and tailored around users…

= New opportunities for publishers & advertisers

So what’s different now?

What does it mean for the industry?

Fewer people are buying CDs

+

Publishers and Artists need new revenue models

+

Advertisers want Digital Audio to be as easy as Display or Video

= Great opportunity for an Ad Tech company to power the Digital Audio Revolution!

Where does AdsWizz fit in?

We are NOT an airline

We power the Digital Audio revolution

Audience Analytics

AdServing

Audio Streaming

SSP

DSP

Real-Time Bidding

Real-Time reports

Supply Intelligence

Content Analysis

Mobile SDKs

Real-Time Ad Insertion

Traffic Forecasting

Some numbers

• 5B+ impressions per month
• 3,500+ broadcast stations
• 10,000 custom stations
• 1,000 podcast shows
• 100+ Amazon nodes
• 1+ million concurrent sessions
• 100 Swizzers
• 7 offices worldwide

Some of the cool brands we work with

How do we use Big Data?*

* It’s not just for showing off

Understand user trends

[Chart: UK Online Listening Media Day. Horizontal axis: time of day, from 0:00 to 23:30 in 30-minute steps. Annotations: Commute, Lunch break, Daily peak.]

Real-time user profiling

RTB is like the stock exchange, but with ads

Traditional “small” data solutions simply don’t work

For every single transaction we collect 20+ data points

Applied to 5+ billion monthly impressions

A database which grows by 1TB per day

Good luck serving close to real-time queries with MySQL
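The figures above can be combined into a rough back-of-envelope estimate. The sketch below is only an illustration: it assumes a 30-day month, decimal terabytes, and that all of the daily growth comes from impression records, none of which is stated on the slides. The "+" on the slide figures also means these inputs are lower bounds.

```python
# Figures quoted on the slides.
MONTHLY_IMPRESSIONS = 5_000_000_000   # "5+ billion monthly impressions"
DATA_POINTS_PER_IMPRESSION = 20       # "20+ data points" per transaction
GROWTH_BYTES_PER_DAY = 10**12         # "1TB per day", assuming 1 TB = 10^12 bytes

# Assumption for illustration only (not from the slides).
DAYS_PER_MONTH = 30


def bytes_per_impression() -> float:
    """Average storage added per impression, if all growth is impression data."""
    return GROWTH_BYTES_PER_DAY * DAYS_PER_MONTH / MONTHLY_IMPRESSIONS


def impressions_per_second() -> float:
    """Average impression rate, ignoring daily peaks."""
    return MONTHLY_IMPRESSIONS / (DAYS_PER_MONTH * 24 * 3600)


def data_points_per_second() -> float:
    """Average number of data points collected per second."""
    return impressions_per_second() * DATA_POINTS_PER_IMPRESSION
```

Under these assumptions the averages come out at roughly 6 KB stored per impression and a little over 1,900 impressions per second, before accounting for peak hours.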

Yeah, yeah, it’s all BIG. What else?

Fast
• Cache-Aside Pattern
• Redis
• Memcached

Complex Query
• Data Warehousing
• Redshift
• Hadoop

Structured Query
• Sorted key-value stores
• HBase
• DynamoDB
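As an illustration of the cache-aside pattern listed under "Fast", here is a minimal sketch. It is not the AdsWizz implementation: an in-process dictionary stands in for Redis or Memcached, and the names `CacheAside`, `load_from_store`, `write_to_store` and `ttl_seconds` are hypothetical. The idea is that the application checks the cache first, falls back to the slower store on a miss, and fills the cache with what it loaded.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Toy cache-aside wrapper; a dict stands in for Redis/Memcached."""

    def __init__(
        self,
        load_from_store: Callable[[str], Any],    # hypothetical slow lookup
        ttl_seconds: float = 60.0,                # hypothetical expiry time
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                       # cache hit
        value = self._load(key)                   # cache miss: read the store
        self._cache[key] = (value, now + self._ttl)
        return value

    def update(self, key: str, value: Any,
               write_to_store: Callable[[str, Any], None]) -> None:
        write_to_store(key, value)                # write to the store first
        self._cache.pop(key, None)                # then invalidate the cached copy
```

Invalidating on write rather than updating the cached value keeps the sketch simple; the next read repopulates the entry from the store.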

Use Case #1: Handling User Profiles

Use Case #2: Distributed Worker

Use Case #3: Distributed Worker + Data Warehouse

Use Case #4: Distributed Worker + Data Warehouse + State Store

An evolving tech stack

Join the ride

We are looking for new Swizzers to join

BIG DATA ENGINEER FOR DATA SCIENCE TEAM

MAD DEVOPS NINJA

INCIDENT MANAGER

SUPER VILLAIN (ÜBER JAVA DEVELOPER)

SENIOR MOBILE DEVELOPER (ANDROID/iOS)

SENIOR QA INTEGRATION ENGINEER

…jobs@adswizz.com

PHP / AngularJS DEVELOPER

SENIOR IT PROJECT MANAGER

Philippe-Alexandre Leroux | Chief Operating Officer | philalex.leroux@adswizz.com

Bogdan Bocșe | Solutions Architect | Bogdan.bocse@adswizz.com

@followadswizz

Backup Slides (on the off-chance 20 minutes are enough)

What’s it called? | What does it mean?

Volumetry | If it’s less than 100 GB, don’t bother calling it Big Data

Atomic Query Size | Are you reading 10 or 10 million records per transaction?

Query Load | Do you expect 5 or 5000 queries per second?

Response Time | Do you expect your data store to answer in 1ms, 10ms or 10s?

Immutability | Once your data is written, does it stay written?

Strict Consistency | Do you need changes to be instantly visible to all readers?

Data Freshness | Do you need the absolute latest data, to the millisecond?

ACID Compliance | If you work with ordering or payments, you want transactions.

Query Accuracy | Is there room for error in the results of your queries?

Persistence/Durability | Should data be stored on a permanent medium (HDD, SSD)?

High Availability | Must the data store stay available through hardware and network failures?

Big
• Cost grows linearly with data size
• No performance degradation with size

Flexible
• On-the-fly queries

Accurate
• Exact computation
• Estimate result
• Strict consistency

Fast
• Fast Reads
• Fast Writes
• Fast Updates

Cost & Complexity

Redshift: Queries at Scale
• Tables have sort keys (like indexes)
• Tables have one distribution key
  • Defines how data is split over nodes
• Tables are split in sorted regions
• Each region has several slices spread across nodes
• Split across several instances
• Each column has its own compression type
• SSD-enabled (200 GB node)
• Results ….
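To make the roles of the distribution key and the sort key more concrete, here is a toy model in Python. It is not how Redshift is implemented and it uses no Redshift API. The slice count and the names `slice_for`, `load` and `range_scan` are assumptions made for the illustration. It shows rows being assigned to slices by hashing the distribution key, kept sorted by the sort key within each slice, and then range-scanned without touching rows outside the requested interval.

```python
import hashlib
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Any, Dict, Iterable, List

NUM_SLICES = 4  # hypothetical; real clusters derive this from node type and count

Row = Dict[str, Any]


def slice_for(dist_value: str, num_slices: int = NUM_SLICES) -> int:
    """Map a distribution-key value to a slice with a stable hash."""
    digest = hashlib.md5(dist_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def load(rows: Iterable[Row], dist_key: str, sort_key: str) -> Dict[int, List[Row]]:
    """Distribute rows over slices, then sort each slice by the sort key."""
    slices: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        slices[slice_for(str(row[dist_key]))].append(row)
    for part in slices.values():
        part.sort(key=lambda r: r[sort_key])
    return slices


def range_scan(slices: Dict[int, List[Row]], sort_key: str,
               low: Any, high: Any) -> List[Row]:
    """Return rows with low <= sort_key <= high, using binary search per slice."""
    out: List[Row] = []
    for part in slices.values():
        keys = [r[sort_key] for r in part]
        out.extend(part[bisect_left(keys, low):bisect_right(keys, high)])
    return out
```

In this model every row with the same distribution-key value lands on the same slice, which is why the choice of key decides how evenly the data and the query work are spread.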

The Results

• The query on the previous slide (it is actually 4-5 A4 pages long)
• Over 39,031,958 rows (100-150 GB)
• Took 4.039s

* The data store stores 3 TB over 12 instances

Ordered-Bucket Sampling

Let’s say we want to sample 20% of events for a specific scenario. We split events into 10 buckets, depending on the hash of their “user id”.

Bucket #1 Bucket #2 Bucket #3 Bucket #4 Bucket #5 Bucket #6 Bucket #7 Bucket #8 Bucket #9 Bucket #10

er1bhUygQoRrPvonNRyw -(hash)> Bucket 3
32m9bGzQQMs7162ObeRt -(hash)> Bucket 7
(…)

Then we sample only those events from Bucket #1 and Bucket #2.
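The bucket scheme described above can be sketched as follows. The slides do not say which hash function is used, so SHA-256 here is an assumption and the bucket numbers it produces will not match the ones shown for the two example user ids. The function names and the `user_id` event field are hypothetical as well.

```python
import hashlib
from typing import Dict, Iterable, List

NUM_BUCKETS = 10  # as on the slide


def bucket_for(user_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a user id to a bucket in 1..num_buckets using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets + 1


def is_sampled(user_id: str, sample_percent: int,
               num_buckets: int = NUM_BUCKETS) -> bool:
    """Keep the lowest-numbered buckets, e.g. 20% of 10 buckets -> #1 and #2."""
    buckets_to_keep = sample_percent * num_buckets // 100
    return bucket_for(user_id, num_buckets) <= buckets_to_keep


def sample_events(events: Iterable[Dict[str, str]],
                  sample_percent: int) -> List[Dict[str, str]]:
    """Filter events, keeping those whose user falls in a sampled bucket."""
    return [e for e in events if is_sampled(e["user_id"], sample_percent)]
```

Because the bucket depends only on the user id, all events of a given user are either kept or dropped together. Since buckets are taken in order, raising the rate from 20% to 30% in this sketch only adds Bucket #3 and keeps every user that was already sampled.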