API analytics with Redis and Google Bigquery. NoSQL matters edition
![Page 1: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/1.jpg)
javier ramirez @supercoco9
API Analytics with Redis
and Google Bigquery
![Page 2: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/2.jpg)
![Page 3: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/3.jpg)
![Page 4: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/4.jpg)
![Page 5: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/5.jpg)
javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013
REST API +
AngularJS web as an API client
![Page 6: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/6.jpg)
![Page 7: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/7.jpg)
![Page 8: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/8.jpg)
![Page 9: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/9.jpg)
obvious solution:
use a ready-made service such as 3scale or apigee
![Page 10: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/10.jpg)
1. non intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
![Page 11: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/11.jpg)
![Page 12: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/12.jpg)
data that’s an order of magnitude greater than data you’re accustomed to
Doug Laney, VP Research, Business Analytics and Performance Management at Gartner
![Page 13: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/13.jpg)
data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.
Edd Dumbill, program chair for the O’Reilly Strata Conference
![Page 14: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/14.jpg)
big data is doing a full scan over 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds
Javier Ramirez, impressionable teowaki founder
![Page 15: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/15.jpg)
![Page 16: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/16.jpg)
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q
SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)
$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q
SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
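To put the two benchmark runs side by side, the gain from pipelining can be worked out directly from the figures above. This is only a small sketch in Python; the dictionary and function names are mine, the numbers are the ones on the slide.

```python
# Requests per second taken from the two redis-benchmark runs above.
with_pipelining = {"SET": 552_028, "GET": 707_463, "LPUSH": 767_459, "LPOP": 770_119}
without_pipelining = {"SET": 122_556, "GET": 123_601, "LPUSH": 136_752, "LPOP": 132_424}


def speedup(command: str) -> float:
    """How many times faster the pipelined run (-P 16) was for one command."""
    return with_pipelining[command] / without_pipelining[command]


for command in with_pipelining:
    print(f"{command}: {speedup(command):.1f}x")
```

On this machine, sending 16 commands per round trip gave roughly a 4.5x to 5.8x gain, depending on the command.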
![Page 17: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/17.jpg)
open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
http://redis.io
started in 2009 by Salvatore Sanfilippo @antirez
100 contributors at https://github.com/antirez/redis
![Page 18: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/18.jpg)
what is it used for
![Page 19: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/19.jpg)
Every timeline (800 tweets per user) is stored in redis
5000 writes per second avg
300K reads per second
![Page 20: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/20.jpg)
nginx + lua + redis
apache + mruby + redis
![Page 21: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/21.jpg)
![Page 22: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/22.jpg)
Redis keeps
everything in memory all the time
![Page 23: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/23.jpg)
![Page 24: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/24.jpg)
easy: store GZIPPED files into S3/Glacier
* we are moving to google cloud now
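One way to picture the "store gzipped files" step is a periodic job that drains the buffered request log and writes it out as a compressed, tab-delimited file. The sketch below assumes that shape. A `deque` stands in for the Redis list, the field layout is hypothetical, and the upload itself is left out.

```python
import csv
import gzip
import io
from collections import deque

# Stand-in for the Redis list that buffers one entry per API request.
# The field layout (timestamp, method, path, status) is hypothetical.
buffer = deque([
    ("2013-10-14T11:42:01Z", "GET", "/api/links", "200"),
    ("2013-10-14T11:42:03Z", "POST", "/api/links", "201"),
])


def dump_to_gzip(rows: deque) -> bytes:
    """Drain the buffer into a gzipped, tab-delimited payload."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    while rows:
        writer.writerow(rows.popleft())
    return gzip.compress(text.getvalue().encode("utf-8"))


payload = dump_to_gzip(buffer)
print(len(payload), "bytes ready to upload")
```

Draining the buffer as it is written keeps memory use bounded, which matters when everything lives in RAM.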
![Page 25: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/25.jpg)
![Page 26: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/26.jpg)
Hadoop (map/reduce)
http://hadoop.apache.org/
started in 2005 by Doug Cutting and Mike Cafarella
![Page 27: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/27.jpg)
cassandra
http://cassandra.apache.org/
released in 2008 by facebook.
![Page 28: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/28.jpg)
other big data solutions:
hadoop+voldemort+kafka
hbase
http://engineering.linkedin.com/projects
http://hbase.apache.org/
![Page 29: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/29.jpg)
Amazon Redshift
http://aws.amazon.com/redshift/
![Page 30: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/30.jpg)
Our choice:
google bigquery
Data analysis as a service
http://developers.google.com/bigquery
![Page 31: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/31.jpg)
Based on Dremel
Specifically designed for interactive queries over petabytes of real-time data
![Page 32: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/32.jpg)
Columnar storage
Easy to compress
Convenient for querying long series over a single column
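A toy illustration of the two layouts, in Python. The column names and values are made up, and a real columnar engine does much more than this. The point is only that an aggregate over one column touches one list.

```python
# Three API log rows; the column names are hypothetical.
rows = [
    {"path": "/api/links", "status": 200, "ms": 12},
    {"path": "/api/teams", "status": 200, "ms": 48},
    {"path": "/api/links", "status": 404, "ms": 7},
]

# Row-oriented layout: the values of one record sit together.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout: the values of one column sit together.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

# An aggregate over one column reads a single list and skips the others.
average_ms = sum(column_store["ms"]) / len(column_store["ms"])
print(average_ms)

# Values within a column share a type and often repeat, which is
# what makes a column easy to compress.
print(sorted(column_store["status"]))
```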
![Page 33: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/33.jpg)
loading data
You can feed flat CSV files or nested JSON objects
![Page 34: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/34.jpg)
bq cli
bq load --nosynchronous_mode --encoding UTF-8 --field_delimiter 'tab' --max_bad_records 100 --source_format CSV api.stats 20131014T11-42-05Z.gz
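If the load is scripted, the same command can be assembled from a table name and a dump file. The sketch below is hypothetical glue code, not the talk's tooling. It reuses only the flags shown above and assumes the file name is a UTC timestamp, which the trailing `Z` suggests.

```python
import shlex
from datetime import datetime, timezone


def dump_name(moment: datetime) -> str:
    """File name in the same timestamp style as the example above."""
    return moment.strftime("%Y%m%dT%H-%M-%SZ") + ".gz"


def bq_load_args(table: str, path: str) -> list[str]:
    """Argument list mirroring the flags shown on the slide."""
    return [
        "bq", "load",
        "--nosynchronous_mode",
        "--encoding", "UTF-8",
        "--field_delimiter", "tab",
        "--max_bad_records", "100",
        "--source_format", "CSV",
        table, path,
    ]


name = dump_name(datetime(2013, 10, 14, 11, 42, 5, tzinfo=timezone.utc))
print(shlex.join(bq_load_args("api.stats", name)))
```

Keeping the arguments as a list means they can be handed to `subprocess.run` without going through a shell.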
![Page 35: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/35.jpg)
web console screenshot
![Page 36: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/36.jpg)
almost SQL
select, from, join, where, group by, having, order, limit
avg, count, max, min, sum
+, -, *, /, %
&, |, ^, <<, >>, ~
=, !=, <>, >, <, >=, <=, IN, IS NULL, BETWEEN
AND, OR, NOT
![Page 37: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/37.jpg)
Functions overview
current_date, current_time, now, datediff, day, day_of_week, day_of_year, hour, minute, quarter, year, ...
abs, acos, atan, ceil, floor, degrees, log, log2, log10, PI, SQRT, ...
concat, contains, left, length, lower, upper, lpad, rpad, right, substr
![Page 38: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/38.jpg)
analytics specific extensions
within, flatten, nest
stddev
top, first, last, nth
variance
var_pop, var_samp
covar_pop, covar_samp
quantiles
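The `_pop` and `_samp` suffixes follow the usual statistical convention: population versus sample. Here is a quick illustration with Python's standard library rather than BigQuery itself. The data is made up, and the mapping to the BigQuery functions is by naming convention only.

```python
import statistics

# Hypothetical response times in milliseconds.
samples = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: divide by n (the usual meaning of a *_pop function).
var_pop = statistics.pvariance(samples)

# Sample variance: divide by n - 1 (the usual meaning of a *_samp function).
var_samp = statistics.variance(samples)

# Quartile cut points, similar in spirit to a quantiles aggregate.
quartiles = statistics.quantiles(samples, n=4)

print(var_pop, var_samp, quartiles)
```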
![Page 39: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/39.jpg)
window functions
![Page 40: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/40.jpg)
correlations
![Page 41: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/41.jpg)
Things you always wanted to try but were too scared to
select count(*) from publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
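A side note on the pattern itself. Assuming REGEXP_MATCH looks for a match anywhere in the string, the way Python's `re.search` does, `[0-9]*` also matches titles with no digits at all, because `*` allows zero repetitions. If that assumption holds, the query still scans and tests every title, but the regexp by itself filters nothing out. A sketch with made-up titles:

```python
import re

titles = ["Redis", "1984", "Apollo 11", ""]

# "*" means zero or more, so the pattern can match an empty string
# and a substring search succeeds on every title, digits or not.
zero_or_more = [t for t in titles if re.search(r"[0-9]*", t)]

# "+" requires at least one digit somewhere in the title.
one_or_more = [t for t in titles if re.search(r"[0-9]+", t)]

print(zero_or_more)
print(one_or_more)
```

Swapping `*` for `+` would be the way to count only titles that contain a digit.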
![Page 42: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/42.jpg)
SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) as cnt, repository_url
FROM github.timeline
WHERE type="WatchEvent"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
AND repository_url IN (
  SELECT repository_url
  FROM github.timeline
  WHERE type="CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
  AND repository_fork = "false"
  AND payload_ref_type = "repository"
  GROUP BY repository_url
)
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
![Page 43: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/43.jpg)
country segmented traffic
![Page 44: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/44.jpg)
our most active user
![Page 45: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/45.jpg)
10 requests we should be caching
![Page 46: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/46.jpg)
5 most created resources
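The questions on these last slides (most active user, requests worth caching, most created resources) all share one shape: group, count, order, limit. The queries themselves are not in this transcript, so this is only a rough Python equivalent over hypothetical rows, not the real teowaki schema.

```python
from collections import Counter

# Hypothetical log rows: (method, path, user). Not the real teowaki schema.
requests = [
    ("GET", "/api/links", "ana"),
    ("GET", "/api/links", "bob"),
    ("POST", "/api/links", "ana"),
    ("GET", "/api/teams", "ana"),
    ("GET", "/api/links", "ana"),
]

# Most repeated GET requests: candidates for caching.
cache_candidates = Counter(path for method, path, _ in requests if method == "GET")

# Most active user across all requests.
active_users = Counter(user for _, _, user in requests)

print(cache_candidates.most_common(10))
print(active_users.most_common(1))
```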
![Page 47: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/47.jpg)
redis pricing
2* machines (master/slave) at digital ocean
$10 monthly
* we were already using these instances for a lot of redis use cases
![Page 48: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/48.jpg)
s3 pricing
$0.095 per GB
a gzipped 1.6 MB file stores 300K rows
$0.0001541698 / month
![Page 49: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/49.jpg)
glacier pricing
$0.01 per GB
$0.000016 / month
![Page 50: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/50.jpg)
bigquery pricing
$80 per stored TB
300000 rows => $0.007629392 / month
$35 per processed TB
1 full scan = 84 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
10 GB => $0.35 / month
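The per-query arithmetic can be sketched as follows. It assumes decimal units (1 TB = 1,000,000 MB), which is what makes 10 GB come out at exactly $0.35. The function and constant names are mine.

```python
PRICE_PER_PROCESSED_TB = 35.0  # dollars, from the slide
MB_PER_TB = 1_000_000          # decimal units, an assumption


def query_cost(processed_mb: float) -> float:
    """Dollar cost of a query that processes the given number of MB."""
    return processed_mb / MB_PER_TB * PRICE_PER_PROCESSED_TB


full_scan = query_cost(84)           # every column
single_column = query_cost(5.4)      # one column only
monthly_budget = query_cost(10_000)  # 10 GB per month

print(f"{full_scan:.6f} {single_column:.6f} {monthly_budget:.2f}")
```

Because only the columns a query reads are billed, the single-column scan costs a small fraction of the full scan.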
![Page 51: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/51.jpg)
redis $10.0000000000
s3 storage $00.0001541698
s3 transfer $00.0050000000
glacier transfer $00.0500000000
glacier storage $00.0000160000
bigquery storage $00.0076293920
bigquery queries $00.3500000000
$10.41 / month
for our first 330000 rows
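The total can be checked by summing the line items. The snippet below copies the figures from the slide and uses `Decimal` so the small amounts are not lost to floating-point rounding.

```python
from decimal import Decimal

# Monthly line items copied from the slide, in dollars.
line_items = {
    "redis": Decimal("10.0000000000"),
    "s3 storage": Decimal("0.0001541698"),
    "s3 transfer": Decimal("0.0050000000"),
    "glacier transfer": Decimal("0.0500000000"),
    "glacier storage": Decimal("0.0000160000"),
    "bigquery storage": Decimal("0.0076293920"),
    "bigquery queries": Decimal("0.3500000000"),
}

total = sum(line_items.values())
print(total)
print(round(total, 2))
```

The Redis machines account for about 96% of the bill; storage and queries together stay well under half a dollar.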
![Page 52: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/52.jpg)
1. non intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
![Page 53: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/53.jpg)
![Page 54: API analytics with Redis and Google Bigquery. NoSQL matters edition](https://reader034.fdocuments.us/reader034/viewer/2022042614/554f59b4b4c905524c8b5403/html5/thumbnails/54.jpg)
Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Gràcies! (Thanks!)
Javier Ramírez @supercoco9
nosqlmatters 2013