Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

javier ramirez @supercoco9 Big data: API analytics with Redis, AWS and Google BigQuery

description

Do you want to monitor an API that could generate a huge amount of information? It is time to start thinking about big data. At teowaki we built an analytics system for our API using Redis as an intermediary and BigQuery as the data store. The result? Immediate queries over all of our system's traffic logs in just seconds. I will talk about the alternatives we evaluated and how we are using BigQuery to solve our problem.

Transcript of Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 1: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

javier ramirez @supercoco9

Big data: API analytics with Redis, AWS and Google BigQuery

Page 2: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

javier ramirez @supercoco9 http://teowaki.com codemotion 2013

Developer services exposed through a REST API

+ an AngularJS web app as a client of the API

Page 3: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013
Page 4: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013
Page 5: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013
Page 6: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

obvious solution:

use a service such as 3scale or Apigee

Page 7: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

thanks! questions?

Page 8: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

1. measure without interfering
2. store the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. bonus: real time

Page 9: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 10: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

data that’s an order of magnitude greater than data you’re accustomed to

Doug Laney, VP Research, Business Analytics and Performance Management at Gartner

Page 11: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.

Edd Dumbill, program chair for the O’Reilly Strata Conference

Page 12: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

big data is doing a full scan of 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds

Javier Ramirez, impressionable teowaki founder

Page 13: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 14: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)

$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q

SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)

$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q

SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
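The two benchmark runs are easier to compare as a ratio. The sketch below is only arithmetic on the figures quoted on this slide; the function and variable names are illustrative and not part of redis-benchmark.

```python
# Throughput figures (requests per second) copied from the benchmark output above.
with_pipelining = {"SET": 552_028, "GET": 707_463, "LPUSH": 767_459, "LPOP": 770_119}
without_pipelining = {"SET": 122_556, "GET": 123_601, "LPUSH": 136_752, "LPOP": 132_424}


def pipelining_speedup(piped, plain):
    """Return, per command, how many times faster the pipelined run was."""
    return {cmd: piped[cmd] / plain[cmd] for cmd in piped}


speedup = pipelining_speedup(with_pipelining, without_pipelining)
for cmd, factor in speedup.items():
    print(f"{cmd}: {factor:.1f}x")
```

On this machine, pipelining 16 commands per round trip (-P 16) gives roughly 4.5x to 5.8x more requests per second, depending on the command.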

Page 15: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

http://redis.io

started in 2009 by Salvatore Sanfilippo @antirez

100 contributors at https://github.com/antirez/redis

Page 16: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 17: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Redis keeps everything in memory, all the time

Page 18: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

what it is used for

Page 19: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 20: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

twitter

Every timeline (800 tweets per user) is in Redis

5000 writes per second avg
300K reads per second

Page 21: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

viacom

Dependency graph between objects. A "powerful" cache

Redis as a queue for background jobs

Activity log and view counters, used as a buffer before saving to MySQL

Lua scripts running on slave nodes to recalculate popularity rankings. The new process takes 1/60th of the time the previous MySQL version took

Page 22: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

nginx + lua + redis

apache + mruby + redis

Page 23: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 24: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 25: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

the obvious solution: store compressed text files in S3

http://aws.amazon.com/s3/

Page 26: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

2013-10-15T03:51:25Z 105 GET /teams/107-tw/links

category_guid=111 application/json 93.96.140.216

application/json

Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

Teowaki Client ES Madrid

TSV text file
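A minimal sketch of how a record like the one above could be written as a gzipped TSV file, using only the Python standard library. The field values are copied from the slide; splitting the last line into "Teowaki Client", "ES" and "Madrid" is an assumption, since the tabs were lost in the transcript, and the helper names are hypothetical.

```python
import csv
import gzip
import io

# One request, with the field values shown on the slide, in the same order.
# Assumption: the last three values were separate tab-delimited fields.
record = [
    "2013-10-15T03:51:25Z", "105", "GET", "/teams/107-tw/links",
    "category_guid=111", "application/json", "93.96.140.216",
    "application/json",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0",
    "Teowaki Client", "ES", "Madrid",
]


def to_gzipped_tsv(rows):
    """Serialize rows as tab-separated lines and gzip the result in memory."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def from_gzipped_tsv(blob):
    """Inverse of to_gzipped_tsv: decompress and split back into rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


blob = to_gzipped_tsv([record])
rows = from_gzipped_tsv(blob)
print(len(rows[0]), "fields;", len(blob), "bytes compressed")
```

The resulting bytes can be written to a .gz file and uploaded as-is; the round trip through from_gzipped_tsv is there only to check that nothing is lost.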

Page 27: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

even better: send the files to Glacier to reduce the cost

http://aws.amazon.com/glacier/

Page 28: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 29: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 30: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 31: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 32: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Hadoop (map/reduce)

http://hadoop.apache.org/

started in 2005 by Doug Cutting and Mike Cafarella

Page 33: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

cassandra

http://cassandra.apache.org/

released in 2008 by facebook.

Page 34: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

other big data solutions:

hadoop+voldemort+kafka

hbase

http://engineering.linkedin.com/projects

http://hbase.apache.org/

Page 35: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 36: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Amazon Redshift

http://aws.amazon.com/redshift/

Page 37: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Our choice: Google BigQuery

http://developers.google.com/bigquery

Page 38: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Based on Dremel

Specifically designed for interactive, real-time queries over petabytes

Page 39: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Columnar storage

Easy to compress

Convenient for queries over long series within a single column
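Both points about columnar storage can be illustrated with a toy example. This is a sketch on made-up rows, not a description of how BigQuery stores data internally: the rows, the column names and the size_in_bytes helper are all hypothetical.

```python
import zlib

# Hypothetical log rows: (method, path, country). Not real teowaki data.
rows = [("GET", f"/teams/{i}/links", "ES") for i in range(1000)]

# Row-oriented layout: each record is stored together.
row_store = ["\t".join(r) for r in rows]

# Column-oriented layout: each column is stored as its own sequence.
column_store = {
    "method": [r[0] for r in rows],
    "path": [r[1] for r in rows],
    "country": [r[2] for r in rows],
}


def size_in_bytes(values):
    """Total UTF-8 size of a sequence of strings."""
    return sum(len(v.encode("utf-8")) for v in values)


# Reading every record touches all the data; reading one column does not.
full_scan = size_in_bytes(row_store)
one_column_scan = size_in_bytes(column_store["country"])

# A column holds values of one kind, so repeated values compress well.
raw_column = "\n".join(column_store["method"]).encode("utf-8")
compressed_column = zlib.compress(raw_column)

print("full scan bytes:", full_scan)
print("one-column scan bytes:", one_column_scan)
print("method column:", len(raw_column), "raw,", len(compressed_column), "compressed")
```

A query that only needs the country column reads a small fraction of the bytes a full scan reads, which is the same effect the pricing slide shows when it compares a full scan with a scan of one column.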

Page 40: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Data analytics as a service

Accepts flat files or hierarchical JSON

Page 41: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

loading data

bq load --nosynchronous_mode --encoding UTF-8 --field_delimiter 'tab' --max_bad_records 100 --source_format CSV api.stats 20131014T11-42-05Z.gz

Page 42: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Page 43: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

BigQuery concepts

projects

datasets

tables

jobs

Page 44: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Screenshot of the console

Page 45: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Almost-standard SQL

select, from, join, where, group by, having, order, limit

avg, count, max, min, sum

+ - * / %

& | ^ << >> ~

=, !=, <>, >, <, >=, <=, IN, IS NULL, BETWEEN

AND, OR, NOT

Page 46: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Assorted functions

current_date, current_time, now, datediff, day, day_of_week, day_of_year, hour, minute, quarter, year, ...

abs, acos, atan, ceil, floor, degrees, log, log2, log10, PI, SQRT, ...

concat, contains, left, length, lower, upper, lpad, rpad, right, substr

Page 47: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

“SQL” specific to analytics

within, flatten, nest

stddev

top, first, last, nth

variance

var_pop, var_samp

covar_pop, covar_samp

quantiles

Page 48: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

window functions

Page 49: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

correlations

Page 50: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

The more photos of Alf, the fewer photos of cats

Page 51: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Things you always wanted to try but were never allowed to

select count(*) from publicdata:samples.wikipedia

where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;

223,163,387

Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
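The 32¢ reported by the console is consistent with the $35 per TB processed quoted on the pricing slide of this talk. A small sketch of that arithmetic follows; treating 1 TB as 1000 GB is an assumption made here, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_TB_PROCESSED = Decimal("35")  # USD, from the pricing slide of this talk
GB_PER_TB = Decimal("1000")  # assumption: decimal terabytes


def query_cost_usd(gb_processed):
    """Cost in USD of a query that processed the given number of gigabytes."""
    cost = Decimal(str(gb_processed)) / GB_PER_TB * PRICE_PER_TB_PROCESSED
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 9.13 GB is the amount of data the console reports for the query above.
print(query_cost_usd("9.13"))
```

The cost depends on the bytes read by the query, not on the number of rows returned, so a regexp over a single column of 330MM rows is billed only for the columns it touches.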

Page 52: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) as cnt, repository_url
FROM github.timeline
WHERE type="WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type="CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25

Page 53: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

traffic by country

Page 54: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

most active user

Page 55: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

10 requests we should cache

Page 56: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

the 5 most-created resources
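The queries behind these four report slides are not in the transcript. As a rough idea of what they could look like, here is a sketch that uses sqlite3 as a local stand-in for BigQuery. The table name, the column names and the sample rows are all hypothetical, and grouping by path is only one possible reading of "resource".

```python
import sqlite3

# sqlite3 stands in for BigQuery so the sketch runs locally.
# Table, columns and rows are hypothetical, not the real api.stats schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (user_id INTEGER, method TEXT, path TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        (1, "GET", "/teams/1/links", "ES"),
        (1, "GET", "/teams/1/links", "ES"),
        (1, "POST", "/teams/1/links", "ES"),
        (1, "POST", "/teams/1/links", "ES"),
        (2, "GET", "/teams/1/links", "GB"),
        (2, "GET", "/teams/2/snippets", "GB"),
        (3, "POST", "/teams/2/snippets", "US"),
    ],
)

queries = {
    "traffic_by_country":
        "SELECT country, COUNT(*) AS hits FROM stats "
        "GROUP BY country ORDER BY hits DESC, country",
    "most_active_user":
        "SELECT user_id, COUNT(*) AS hits FROM stats "
        "GROUP BY user_id ORDER BY hits DESC, user_id LIMIT 1",
    "requests_to_cache":
        "SELECT path, COUNT(*) AS hits FROM stats WHERE method = 'GET' "
        "GROUP BY path ORDER BY hits DESC, path LIMIT 10",
    "most_created_resources":
        "SELECT path, COUNT(*) AS hits FROM stats WHERE method = 'POST' "
        "GROUP BY path ORDER BY hits DESC, path LIMIT 5",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, result_rows in results.items():
    print(name, result_rows)
```

All four are plain GROUP BY and ORDER BY aggregations, which is the kind of query the "almost-standard SQL" slide lists as supported.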

Page 57: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Redis price

2* machines (master/slave) on DigitalOcean

$10 per month

* these same Redis instances are used for all the other processes in the app

Page 58: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

S3 price

$0.095 per GB

a 1.6 MB compressed file holds 300K rows

$0.0001541698 / month

Page 59: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

Glacier price

$0.01 per GB

a 1.6 MB compressed file holds 300K rows

$0.000016 / month

Page 60: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

BigQuery price

$80 per TB stored
300000 rows => $0.007629392 / month

$35 per TB processed
1 full scan = 84 MB
1 count = 0 MB
1 full scan of 1 column = 5.4 MB
10 GB => $0.35 / month

Page 61: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

redis             $10.0000000000
s3 storage        $00.0001541698
s3 transfer       $00.0050000000
glacier transfer  $00.0500000000
glacier storage   $00.0000160000
bigquery storage  $00.0076293920
bigquery queries  $00.3500000000

$10.41 / month
for 330000 records
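The monthly total can be checked by adding up the line items. The sketch below only sums the amounts quoted on this slide; the dictionary keys are labels chosen here for readability.

```python
from decimal import Decimal

# Monthly line items in USD, copied from the summary slide.
monthly_costs = {
    "redis": Decimal("10.0000000000"),
    "s3 storage": Decimal("0.0001541698"),
    "s3 transfer": Decimal("0.0050000000"),
    "glacier transfer": Decimal("0.0500000000"),
    "glacier storage": Decimal("0.0000160000"),
    "bigquery storage": Decimal("0.0076293920"),
    "bigquery queries": Decimal("0.3500000000"),
}

total = sum(monthly_costs.values(), Decimal("0"))
redis_share = monthly_costs["redis"] / total

print(f"${total.quantize(Decimal('0.01'))} / month")
print(f"redis share of the total: {redis_share:.0%}")
```

Almost all of the bill is the two Redis machines, which the earlier slide notes are shared with the rest of the application; storage and queries together come to about 41 cents per month at this volume.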

Page 62: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

1. measure without interfering
2. store the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. bonus: real time

Page 63: Api analytics using Redis and Google Bigquery. Jramirez,teowaki.codemotion2013

If you liked my talk, please thank me by signing up at

http://teowaki.com

It is a site for developers, you can use it for free, and I think it is pretty cool

<3 <3 <3

Javier Ramírez

@supercoco9

codemotion 2013