API analytics using Redis and Google BigQuery. J. Ramirez, teowaki, codemotion 2013
Big data: analytics for your API with Redis, AWS and Google BigQuery
javier ramirez @supercoco9 http://teowaki.com codemotion 2013
Developer services exposed through a REST API
+ an AngularJS web app as the API client
the obvious solution:
use a service such as 3scale or Apigee
thanks! questions?
1. measure without interfering
2. store the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. bonus ball: real time
data that’s an order of magnitude greater than data you’re accustomed to
Doug Laney VP Research, Business Analytics and Performance Management at Gartner
data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.
Edd Dumbill, program chair for the O’Reilly Strata Conference
big data is doing a full scan of 330 million rows, matching them against a regexp, and getting the result (223 million rows) in just 5 seconds
Javier Ramirez, impressionable teowaki founder
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q
SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q
SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
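To make the effect of pipelining in the two benchmark runs easier to read, the throughput ratios can be worked out directly. This is only arithmetic on the figures quoted above; the function name `speedup` is made up for this sketch.

```python
# Requests per second copied from the two redis-benchmark runs above.
PIPELINED = {"SET": 552_028, "GET": 707_463, "LPUSH": 767_459, "LPOP": 770_119}
UNPIPELINED = {"SET": 122_556, "GET": 123_601, "LPUSH": 136_752, "LPOP": 132_424}


def speedup(command: str) -> float:
    """Ratio of pipelined (-P 16) to non-pipelined throughput for one command."""
    return PIPELINED[command] / UNPIPELINED[command]


if __name__ == "__main__":
    for command in PIPELINED:
        print(f"{command}: {speedup(command):.1f}x with -P 16")
```

On these numbers every command lands somewhere between four and six times faster when 16 requests are pipelined per round trip.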
open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
http://redis.io
started in 2009 by Salvatore Sanfilippo @antirez
100 contributors at https://github.com/antirez/redis
Redis keeps everything in memory, all the time
Every timeline (800 tweets per user) is in Redis
5000 writes per second avg
300K reads per second
Viacom
Dependency graph between objects. “Powerful” cache
Redis as a queue for background jobs
Activity log and view counters, used as a buffer before saving to MySQL
Lua scripts running on slave nodes to recompute popularity rankings. The new process takes 1/60th of the time the previous MySQL version took
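The "buffer before saving to MySQL" pattern mentioned above can be sketched roughly as follows. This is not Viacom's or teowaki's code: `ListBuffer` is a hypothetical in-memory stand-in whose two methods merely mimic the behaviour of the Redis LPUSH and RPOP commands, and the `store` dictionary stands in for the slower database.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ListBuffer:
    """In-memory stand-in for a Redis list (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def lpush(self, value: str) -> int:
        # Like Redis LPUSH: insert at the head, return the new length.
        self._items.appendleft(value)
        return len(self._items)

    def rpop(self) -> Optional[str]:
        # Like Redis RPOP: remove from the tail, or None when the list is empty.
        return self._items.pop() if self._items else None


def record_view(buffer: ListBuffer, page: str) -> None:
    """Cheap write performed on the request path."""
    buffer.lpush(page)


def flush(buffer: ListBuffer, store: Dict[str, int], batch_size: int = 100) -> int:
    """Background step: drain up to batch_size events into the slow store."""
    moved = 0
    while moved < batch_size:
        page = buffer.rpop()
        if page is None:
            break
        store[page] = store.get(page, 0) + 1
        moved += 1
    return moved
```

The request path only pays for one fast push, while the expensive writes are batched and done later by a background job.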
nginx + lua + redis
apache + mruby + redis
the obvious solution: store compressed text files in S3
http://aws.amazon.com/s3/
2013-10-15T03:51:25Z 105 GET /teams/107-tw/links
category_guid=111 application/json 93.96.140.216
application/json
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0
Teowaki Client ES Madrid
TSV text file
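A minimal sketch of how one such tab-separated record could be built and gzip-compressed before upload, using only the Python standard library. The way the sample values are grouped into columns here is an assumption (the slide does not name the fields), and the helper names `to_tsv_line` and `compress_lines` are invented for the example.

```python
import gzip
from typing import Sequence


def to_tsv_line(fields: Sequence[str]) -> str:
    """Join fields with tabs; tabs and line breaks inside a field become spaces."""
    cleaned = [
        str(f).replace("\t", " ").replace("\r", " ").replace("\n", " ")
        for f in fields
    ]
    return "\t".join(cleaned) + "\n"


def compress_lines(lines: Sequence[str]) -> bytes:
    """Return gzip-compressed bytes for a batch of TSV lines."""
    return gzip.compress("".join(lines).encode("utf-8"))


# Values taken from the sample record above; the column split is assumed.
RECORD = [
    "2013-10-15T03:51:25Z",
    "105",
    "GET",
    "/teams/107-tw/links",
    "category_guid=111",
    "application/json",
    "93.96.140.216",
]

if __name__ == "__main__":
    payload = compress_lines([to_tsv_line(RECORD)])
    print(len(payload), "bytes compressed")
```

Replacing stray tabs and newlines keeps each request on exactly one line with a fixed number of columns, which matters later when the file is loaded with a tab delimiter.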
even better: send the files to Glacier to cut the cost
http://aws.amazon.com/glacier/
Hadoop (map/reduce)
http://hadoop.apache.org/
started in 2005 by Doug Cutting and Mike Cafarella
cassandra
http://cassandra.apache.org/
released in 2008 by facebook.
other big data solutions:
hadoop+voldemort+kafka
hbase
http://engineering.linkedin.com/projects
http://hbase.apache.org/
Amazon Redshift
http://aws.amazon.com/redshift/
Our choice: Google BigQuery
http://developers.google.com/bigquery
Based on Dremel
Specifically designed for interactive, real-time queries over petabytes
Columnar storage
Easy to compress
Well suited to queries over long series within a single column
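The idea behind columnar storage can be illustrated with a toy example. This is only a sketch of the layout, not how BigQuery or Dremel is implemented; the rows, field names and helper functions are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical request-log rows; field names and values are made up.
ROWS: List[Dict[str, Any]] = [
    {"method": "GET", "path": "/links", "duration_ms": 100},
    {"method": "POST", "path": "/links", "duration_ms": 200},
    {"method": "GET", "path": "/teams", "duration_ms": 300},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_average(columns: Dict[str, List[Any]], name: str) -> float:
    """Aggregate a single column without touching any of the others."""
    values = columns[name]
    return sum(values) / len(values)


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(cols["method"])
    print(column_average(cols, "duration_ms"))
```

Values in one column share a type and often repeat, which is why they compress well, and an aggregate over one column only has to read that column's data.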
Data analytics as a service
Accepts flat files or hierarchical JSON
loading data
bq load --nosynchronous_mode --encoding UTF-8 --field_delimiter 'tab' --max_bad_records 100 --source_format CSV api.stats 20131014T11-42-05Z.gz
BigQuery concepts
projects
datasets
tables
jobs
Console screenshot
Almost-standard SQL
select from join where group by having order limit
avg count max min sum
+ - * / %
& | ^ << >> ~
= != <> > < >= <= IN IS NULL BETWEEN
AND OR NOT
Assorted functions
current_date current_time now datediff day day_of_week day_of_year hour minute quarter year ...
abs acos atan ceil floor degrees log log2 log10 PI SQRT ...
concat contains left length lower upper lpad rpad right substr
Analytics-specific “SQL”
within flatten nest
stddev
top first last nth
variance
var_pop var_samp
covar_pop covar_samp
quantiles
The more Alf photos, the fewer cat photos
Things you always wanted to try but were never allowed to
select count(*) from publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt, repository_url
FROM github.timeline
WHERE type="WatchEvent"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
AND repository_url IN (
  SELECT repository_url
  FROM github.timeline
  WHERE type="CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
  AND repository_fork = "false"
  AND payload_ref_type = "repository"
  GROUP BY repository_url
)
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
most active user
10 requests we should be caching
the 5 most frequently created resources
Redis pricing
2* machines (master/slave) on DigitalOcean
$10 per month
* these same Redis instances are used for all the other processes in the app
S3 pricing
$0.095 per GB
a 1.6 MB compressed file holds 300K rows
$0.0001541698 / month
Glacier pricing
$0.01 per GB
a 1.6 MB compressed file holds 300K rows
$0.000016 / month
BigQuery pricing
$80 per TB stored
300000 rows => $0.007629392 / month
$35 per TB processed
1 full scan = 84 MB
1 count = 0 MB
1 full scan of 1 column = 5.4 MB
10 GB => $0.35 / month
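The per-query arithmetic can be sketched as below, using the $35 per TB processed figure quoted above. Treating 1 TB as 1000 GB is an assumption made here because it reproduces the slide's numbers; the function name `query_cost` is invented for the example.

```python
PRICE_PER_TB_PROCESSED = 35.0  # US dollars, as quoted on the slide
GB_PER_TB = 1000  # assumption: decimal units


def query_cost(gb_processed: float) -> float:
    """Dollar cost of processing the given number of gigabytes."""
    return gb_processed / GB_PER_TB * PRICE_PER_TB_PROCESSED


if __name__ == "__main__":
    # 10 GB of queries per month, as in the estimate above.
    print(f"${query_cost(10):.2f}")
    # The Wikipedia regexp query shown earlier processed 9.13 GB.
    print(f"${query_cost(9.13):.2f}")
```

Under that assumption the 9.13 GB Wikipedia scan comes out at about 32 cents, which matches the cost the console reported for it.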
redis $10.0000000000
s3 storage $00.0001541698
s3 transfer $00.0050000000
glacier transfer $00.0500000000
glacier storage $00.0000160000
bigquery storage $00.0076293920
bigquery queries $00.3500000000
$10.41 / month
for 330000 records
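As a quick check of the total, the line items above can be summed with exact decimal arithmetic. The amounts are copied from the slide; the dictionary and function names are only for this sketch.

```python
from decimal import Decimal
from typing import Dict

# Monthly amounts in US dollars, copied from the breakdown above.
MONTHLY_COSTS: Dict[str, Decimal] = {
    "redis": Decimal("10.0000000000"),
    "s3 storage": Decimal("0.0001541698"),
    "s3 transfer": Decimal("0.0050000000"),
    "glacier transfer": Decimal("0.0500000000"),
    "glacier storage": Decimal("0.0000160000"),
    "bigquery storage": Decimal("0.0076293920"),
    "bigquery queries": Decimal("0.3500000000"),
}


def total(costs: Dict[str, Decimal]) -> Decimal:
    """Sum all line items without floating-point rounding."""
    return sum(costs.values(), Decimal("0"))


if __name__ == "__main__":
    print(total(MONTHLY_COSTS).quantize(Decimal("0.01")))
```

Almost all of the bill is the Redis machines; storage and queries together add well under half a dollar.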
1. measure without interfering
2. store the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. bonus ball: real time
If you liked my presentation, please thank me by signing up at
http://teowaki.com
It's a site for developers, you can use it for free, and I think it's pretty cool
<3 <3 <3
Javier Ramírez
@supercoco9
codemotion 2013