infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarzać wielkie zbiory...

Transcript of infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarzać wielkie zbiory...

Page 1: infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarzać wielkie zbiory danych.

© 2013 Acxiom Corporation. All Rights Reserved.

Better connections. Better results.

Page 2:

How to effectively process large data sets

Big Data in practice

Mariusz Róg, Team Leader of Engineering, Acxiom Global Service Center, Poland

Page 3:

The talk

• Who we are
• The problem
• The solution

Page 4:

Who are we?

"One of the biggest companies you've never heard of."

Source: http://en.wikipedia.org/wiki/Acxiom

Page 5:

Who do we work for?

• 7 of the top 10 credit card issuers
• 7 of the top 10 retail banks
• 7 of the top 10 retailers
• 6 of the top 10 telecom / media companies
• 7 of the top 10 automotive manufacturers
• 7 of the top 10 U.S. hotels
• 5 of the top 10 technology companies
• 3 of the top 5 brokerage firms
• 3 of the top 5 pharmaceutical manufacturers
• 8 of the top 10 insurance providers
• 7 of the top 10 hotels
• 3 of the top 5 domestic airlines
• 4 of the top 5 gaming companies

The trademarks and registered trademarks on this page are the property of their respective owners. Stats updated as of 7/8/13.

Page 6:

Where are we?

Acxiom provides data, processing, consulting, SMS / digital and / or other services to more than 7,500 recurring clients around the globe in approximately 50 countries and 20 languages.

Sample of countries (map legend: offices located in these markets; services available in these markets):

Argentina, Australia, Bahrain, Bangladesh, Brazil, Canada, Chile, China (including Hong Kong and Taiwan), Colombia, Egypt, France, Germany, India, Indonesia, Israel, Japan, Jordan, Korea, Kuwait, Lebanon, Malaysia, Mexico, Mongolia, New Zealand, Oman, Philippines, Poland, Qatar, Russia, Saudi Arabia, Singapore, South Africa, Thailand, United Arab Emirates, United Kingdom, United States, Venezuela, Vietnam

Note: Acxiom also delivers solutions in many geographies where it does not have a physical presence.

Page 7:

What do we do?

We do data!

Marketing and information management services

SaaS development

Analytics

Customer development services

Technical and marketing consulting

ITO support and consulting

Technology R&D

Big Data management

PII and Web security

PII and healthcare worldwide compliance

Forrester Research named Acxiom one of the largest database marketing services and technology providers in the world.

Page 8:

We transform.

• Formerly Microsoft, aQuantive & Razorfish
• Formerly MySpace, MTV & AOL
• Formerly CFO Amazon, NBC & Electronic Arts
• Formerly Architect of Google Analytics
• Dennis D. Self, CIO, SVP
• Formerly Electronic Arts and HP

Page 9:

The Acxiom Audience operating system

Page 10:

Audience Propensities

Page 11:

Regression Model

Page 12:

The talk

• Who we are
• The problem
• The solution

Page 13:

The flow

[Diagram: Age, f(x,y,z,...)]
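The slides show the scoring function only as f(x,y,z,...) and do not say what form it takes. As a rough sketch, assuming a plain linear regression (an intercept plus one weight per input), scoring a single person could look like the code below. The class name `LinearScore`, the weights, and the input values are all made up for illustration.

```java
final class LinearScore {
    /**
     * Hypothetical linear model: intercept + sum(weights[i] * inputs[i]).
     * The real models behind the slides are not described in the talk.
     */
    public static double score(double intercept, double[] weights, double[] inputs) {
        if (weights.length != inputs.length) {
            throw new IllegalArgumentException("weights and inputs must have the same length");
        }
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -1.0, 2.0}; // made-up coefficients
        double[] person = {40.0, 1.0, 0.0};  // e.g. age = 40 plus two made-up attributes
        System.out.println("score = " + score(1.0, weights, person));
    }
}
```

Each call is cheap; the cost in the talk comes from repeating it for every model and every person.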

Page 14:

The problem

For example:

• 3823 regression models

• 21 dimensions (avg inputs)

• 31 B (avg size)

• 242271350 people (242M)

Page 15:

The BIG problem

3823 x 21 x 31 x 242271350 = 602 958 394 553 550 B

~548 TB

376 cores: 47 nodes @ 8 x 3 GHz and 32 GB

More than a week!
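The arithmetic on this slide can be checked with a few lines of Java. This is only a sketch of the multiplication shown above; the class and constant names are invented here, and it assumes that "TB" on the slide means binary terabytes (2^40 bytes), which is what makes the total come out at roughly 548.

```java
import java.util.Locale;

final class ProblemSize {
    // Figures taken from the slides.
    static final long MODELS = 3823L;         // regression models
    static final long AVG_INPUTS = 21L;       // average inputs (dimensions) per model
    static final long AVG_SIZE_BYTES = 31L;   // average size in bytes
    static final long PEOPLE = 242_271_350L;  // people to score
    static final long NODES = 47L;
    static final long CORES_PER_NODE = 8L;

    /** Total data volume in bytes; the product fits comfortably in a long. */
    public static long totalBytes() {
        return MODELS * AVG_INPUTS * AVG_SIZE_BYTES * PEOPLE;
    }

    public static long totalCores() {
        return NODES * CORES_PER_NODE;
    }

    /** Assumption: the slide's "TB" is 2^40 bytes. */
    public static double toBinaryTerabytes(long bytes) {
        return bytes / (double) (1L << 40);
    }

    public static void main(String[] args) {
        long bytes = totalBytes();
        System.out.println("Total bytes: " + bytes);
        System.out.println(String.format(Locale.ROOT, "Approx. size: %.1f TB", toBinaryTerabytes(bytes)));
        System.out.println("Cores: " + totalCores());
        System.out.println("Bytes per core: " + (bytes / totalCores()));
    }
}
```

Dividing the total by the 376 cores gives about 1.6 x 10^12 bytes per core, which helps explain why a run took more than a week.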

Page 16:

The talk

• Who we are
• The problem
• The solution

Page 17:

3 Steps

• µservices
• virtual infrastructure
• the system

Page 18:

Can’t say much...

Page 19:

The foundation

We needed a communication framework

• It needed to be fast

• It needed to be stable

• It needed to be concurrent

Page 20:

The ØMQ

Page 21:

It is not a message queue or an ESB

Page 22:

Confused?

Page 23:

Authors

iMatix

Real-time financial systems

Page 24:

ØMQ

Source: http://zguide2.zeromq.org/

Page 25:

Confused again?

Page 26:

What is it?

• Ø latency communication framework
• Queue-based framework
• Concurrency framework
• Easy and intuitive API
• LGPL

Example…

Page 27:

Client

import org.zeromq.ZMQ;

private static final ZMQ.Context zmqContext = ZMQ.context(zmqThreadCount);

ZMQ.Socket socket = zmqContext.socket(ZMQ.REQ);
socket.setSendTimeOut(sendTimeout);
// A client connects to a concrete host; the "*" wildcard is only valid for bind().
// "localhost" is a placeholder for the server's host name.
socket.connect("tcp://localhost:5555");

// ZMQ_MSG_FLAGS = 0 means a blocking send/receive
boolean messageSent = socket.send(msg, ZMQ_MSG_FLAGS);
if (!messageSent) {
    LOG.error("Error sending request for {}", this);
}

byte[] responseMsg = socket.recv(ZMQ_MSG_FLAGS);
if (responseMsg == null) {
    LOG.error("Error receiving response for {}", this);
}

Page 28:

Server

import org.zeromq.ZMQ;

private static final ZMQ.Context zmqContext = ZMQ.context(zmqThreadCount);
...
// The server side of request/reply is a REP socket (REQ is the client side).
ZMQ.Socket socket = zmqContext.socket(ZMQ.REP);
socket.setReceiveTimeOut(receiveTimeout);

// One socket can be bound to several endpoints.
socket.bind("tcp://*:5555");
socket.bind("inproc://workers");

while (!Thread.currentThread().isInterrupted()) {
    byte[] receivedBytes = socket.recv(0);
    if (receivedBytes == null) {
        // Timed out or failed: a REP socket cannot reply before it has received a request.
        LOG.error("Error receiving request for {}", this);
        continue;
    }

    boolean messageSent = socket.send(msg, ZMQ_MSG_FLAGS);
    if (!messageSent) {
        LOG.error("Error sending response for {}", this);
    }
}

Page 29:

ØMQ Message

• Atomic
• Can be multipart
• Source/Dest
• Can be routed/proxied/analyzed
• Data agnostic (Google Protobuf preferred)

Page 30:

ØMQ Transport Types

• Unicast
  • TCP ("tcp://localhost:5555")
  • IPC ("ipc://storeandforward")
  • INPROC ("inproc://emailThread")

• Multicast
  • PGM/EPGM ("epgm://192.168.1.1:5555")
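All of the endpoints above share the shape transport://address. The sketch below splits such a string into its two parts and classifies the transport as unicast or multicast, following the grouping on the slide. The `Endpoint` class is a hypothetical helper written for this illustration; it is not part of the ØMQ API and performs no validation of the address itself.

```java
final class Endpoint {
    final String transport; // e.g. "tcp", "ipc", "inproc", "pgm", "epgm"
    final String address;   // everything after "://"

    private Endpoint(String transport, String address) {
        this.transport = transport;
        this.address = address;
    }

    /** Splits "transport://address"; rejects strings without both parts. */
    public static Endpoint parse(String endpoint) {
        int sep = endpoint.indexOf("://");
        if (sep <= 0 || sep + 3 >= endpoint.length()) {
            throw new IllegalArgumentException("Expected transport://address, got: " + endpoint);
        }
        return new Endpoint(endpoint.substring(0, sep), endpoint.substring(sep + 3));
    }

    /** PGM and EPGM are the multicast transports listed on the slide. */
    public boolean isMulticast() {
        return transport.equals("pgm") || transport.equals("epgm");
    }

    public static void main(String[] args) {
        String[] examples = {
            "tcp://localhost:5555",
            "ipc://storeandforward",
            "inproc://emailThread",
            "epgm://192.168.1.1:5555"
        };
        for (String example : examples) {
            Endpoint ep = parse(example);
            String kind = ep.isMulticast() ? "multicast" : "unicast";
            System.out.println(ep.transport + " | " + ep.address + " | " + kind);
        }
    }
}
```

In application code the same string is passed unchanged to bind() or connect(), so switching transport is mostly a matter of changing the endpoint.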

Page 31:

Basic Patterns

source: http://zguide.zeromq.org/

Page 32:

Advanced Patterns

source: http://zguide.zeromq.org/

Page 33:

Growing

ØMQ v 2.2:

zmq - 0MQ lightweight messaging kernel
zmq_bind - accept connections on a socket
zmq_close - close 0MQ socket
zmq_connect - connect a socket
zmq_cpp - interface between 0MQ and C++ applications
zmq_device - start built-in 0MQ device
zmq_pgm - 0MQ reliable multicast transport using PGM
zmq_errno - retrieve value of errno for the calling thread
zmq_getsockopt - get 0MQ socket options
zmq_init - initialise 0MQ context
zmq_inproc - 0MQ local in-process (inter-thread) communication transport
zmq_ipc - 0MQ local inter-process communication transport
zmq_msg_close - release 0MQ message
zmq_msg_copy - copy content of a message to another message
zmq_msg_data - retrieve pointer to message content
zmq_msg_init_data - initialise 0MQ message from a supplied buffer
zmq_msg_init_size - initialise 0MQ message of a specified size
zmq_msg_init - initialise empty 0MQ message
zmq_msg_move - move content of a message to another message
zmq_msg_size - retrieve message content size in bytes
zmq_pgm - 0MQ reliable multicast transport using PGM
zmq_poll - input/output multiplexing
zmq_recv - receive a message from a socket
zmq_send - send a message on a socket
zmq_setsockopt - set 0MQ socket options
zmq_socket - create 0MQ socket
zmq_strerror - get 0MQ error message string
zmq_tcp - 0MQ unicast transport using TCP
zmq_term - terminate 0MQ context
zmq_version - report 0MQ library version

ØMQ v 4.0:

zmq - 0MQ lightweight messaging kernel
zmq_bind - accept incoming connections on a socket
zmq_close - close 0MQ socket
zmq_connect - create outgoing connection from socket
zmq_ctx_destroy - terminate a 0MQ context
zmq_ctx_get - get context options
zmq_ctx_new - create new 0MQ context
zmq_ctx_set - set context options
zmq_ctx_shutdown - shutdown a 0MQ context
zmq_ctx_term - destroy a 0MQ context
zmq_curve_keypair - generate a new CURVE keypair
zmq_curve - secure authentication and confidentiality
zmq_disconnect - disconnect a socket
zmq_pgm - 0MQ reliable multicast transport using PGM
zmq_errno - retrieve value of errno for the calling thread
zmq_getsockopt - get 0MQ socket options
zmq_init - initialise 0MQ context
zmq_inproc - 0MQ local in-process (inter-thread) communication transport
zmq_ipc - 0MQ local inter-process communication transport
zmq_msg_close - release 0MQ message
zmq_msg_copy - copy content of a message to another message
zmq_msg_data - retrieve pointer to message content
zmq_msg_get - get message property
zmq_msg_init_data - initialise 0MQ message from a supplied buffer
zmq_msg_init_size - initialise 0MQ message of a specified size
zmq_msg_init - initialise empty 0MQ message
zmq_msg_more - indicate if there are more message parts to receive
zmq_msg_move - move content of a message to another message
zmq_msg_recv - receive a message part from a socket
zmq_msg_send - send a message part on a socket
zmq_msg_set - set message property
zmq_msg_size - retrieve message content size in bytes
zmq_null - no security or confidentiality
zmq_pgm - 0MQ reliable multicast transport using PGM
zmq_plain - clear-text authentication
zmq_poll - input/output multiplexing
zmq_proxy_steerable - start built-in 0MQ proxy with PAUSE/RESUME/TERMINATE control flow
zmq_proxy - start built-in 0MQ proxy
zmq_recvmsg - receive a message part from a socket
zmq_recv - receive a message part from a socket
zmq_send_const - send a constant-memory message part on a socket
zmq_sendmsg - send a message part on a socket
zmq_send - send a message part on a socket
zmq_setsockopt - set 0MQ socket options
zmq_socket_monitor - register a monitoring callback
zmq_socket - create 0MQ socket
zmq_strerror - get 0MQ error message string
zmq_tcp - 0MQ unicast transport using TCP
zmq_term - terminate 0MQ context
zmq_unbind - stop accepting connections on a socket
zmq_version - report 0MQ library version
zmq_z85_decode - decode a binary key from Z85 printable text
zmq_z85_encode - encode a binary key as Z85 printable text

Page 34:

Cross platform

Haxe, C++, C#, Clojure, CL, Erlang, F#, Felix, Go, Haskell, Java, Lua, Node.js, Objective-C, Perl, PHP, Python, Racket, ooc, Basic, Ada, Tcl, Scala, Ruby, Q

• NetMQ: https://github.com/zeromq/netmq
• JeroMQ: https://github.com/zeromq/jeromq

Page 35:

Visit us at

https://developer.myacxiom.com/

http://acxiom.com/about-acxiom/careers/

Page 36:

QA: [email protected]

Thank You!