Counting image views using redis cluster

Counting Image Views using Redis Cluster Seandon Mooy DevOps Engineer @erulabs

Transcript of Counting image views using redis cluster

Page 1: Counting image views using redis cluster

Counting Image Views using Redis Cluster

Seandon Mooy, DevOps Engineer

@erulabs

Page 2: Counting image views using redis cluster

Or… how I stopped map-reducing and learned to love the stream

Page 3: Counting image views using redis cluster
Page 4: Counting image views using redis cluster
Page 5: Counting image views using redis cluster
Page 6: Counting image views using redis cluster

3 Billion!

Page 7: Counting image views using redis cluster
Page 8: Counting image views using redis cluster

Delay!

Page 9: Counting image views using redis cluster

Failures!

Page 10: Counting image views using redis cluster

Failures!

Page 11: Counting image views using redis cluster

Also… I may not be the best zookeeper

Page 12: Counting image views using redis cluster

Challenges with HBase

Roughly 5% of all requests through Thrift were failing… So many tunables!

Page 13: Counting image views using redis cluster

Optimized timeouts, added circuit breakers, etc.

Trickle of working requests during outage means circuit breakers are hard to design…
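
One way to see the problem: a breaker that trips on consecutive failures is reset by every request that gets through. The Python sketch below compares that with a rolling failure-rate breaker. The class name, window size, threshold, and the "every 5th request succeeds" outage pattern are all illustrative assumptions, not the breaker Imgur ran.

```python
from collections import deque


class RateBreaker:
    """Opens when the failure rate over the last `window` calls is too high."""

    def __init__(self, window: int = 100, max_failure_rate: float = 0.5) -> None:
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.max_failure_rate = max_failure_rate

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def is_open(self) -> bool:
        # Wait for a full window so one early failure cannot trip the breaker.
        if len(self.results) < self.results.maxlen:
            return False
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results) > self.max_failure_rate


def consecutive_failures(outcomes: list) -> int:
    """Longest run of failures: what a naive consecutive-failure breaker sees."""
    longest = run = 0
    for ok in outcomes:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest


# Hypothetical outage in which every 5th request still succeeds.
outcomes = [i % 5 == 0 for i in range(100)]
breaker = RateBreaker(window=100, max_failure_rate=0.5)
for ok in outcomes:
    breaker.record(ok)
```

With this pattern the longest failure run stays short even though most requests fail, so a consecutive-failure threshold may never trip while the rate-based breaker opens.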

Page 14: Counting image views using redis cluster

“HBase down == Imgur down”

Downtime == sadtime :(

Page 15: Counting image views using redis cluster

3 Billion!

Page 16: Counting image views using redis cluster

Solution?

Redis Cluster!

Page 17: Counting image views using redis cluster

Fastly

ViewCount V2 - Real time with less complexity!

TCP syslog stream

Page 18: Counting image views using redis cluster

Ingest service

Page 19: Counting image views using redis cluster

Parses syslog lines, reports metrics via statsd
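
A rough Python sketch of what such an ingest step might look like. The log line shape, the regular expression, and the metric names are hypothetical (the slides do not show the Fastly log format); only the statsd counter wire format `name:value|c` is standard. A plain dict stands in for the Redis counters.

```python
import re
from typing import Optional

# Hypothetical line shape:
#   "<134>2017-05-31T12:00:00Z cache-sjc fastly[1]: GET /aB3xYz9.jpg 200"
IMAGE_PATH = re.compile(r"GET /([A-Za-z0-9]+)\.(?:jpg|jpeg|png|gif)\s+(\d{3})")


def parse_view(line: str) -> Optional[str]:
    """Return the image hash for a successful image request, else None."""
    match = IMAGE_PATH.search(line)
    if match is None or match.group(2) != "200":
        return None
    return match.group(1)


def statsd_counter(name: str, value: int = 1) -> bytes:
    """Encode a statsd counter datagram in the form name:value|c."""
    return f"{name}:{value}|c".encode("ascii")


def handle_line(line: str, counts: dict) -> bytes:
    """Count one syslog line and return the metric datagram to emit."""
    image = parse_view(line)
    if image is None:
        return statsd_counter("ingest.skipped")
    counts[image] = counts.get(image, 0) + 1
    return statsd_counter("ingest.views")
```

In a real service the returned datagram would be sent over UDP to the statsd daemon, and the dict update would be replaced by an INCR against the cluster.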

Page 20: Counting image views using redis cluster

Redis 3.2 cluster!

Page 21: Counting image views using redis cluster

HBase Backfill service

Page 22: Counting image views using redis cluster

Internet

API service

Page 23: Counting image views using redis cluster

ViewCount V2 - Results:

Page 24: Counting image views using redis cluster

Request latency: min 1ms, max 16.9ms, median 1.6ms, p95 2.6ms, p99 4.6ms

Codes: 200: 10000

Page 25: Counting image views using redis cluster

Page 26: Counting image views using redis cluster

20 billion commands!

> 400GB in memory!

Page 27: Counting image views using redis cluster

Things to be aware of:

1. Redis Cluster shard maps - redirections, etc.
   Monitor redirections - gracefully restart workers after shard moves.

2. AOF can slow down or fail large “redis-trib.rb” operations.
   Make sure to disable it before and re-enable it after!

3. Not all legacy systems support Redis Cluster, and if they do…
   They might not support it well (PHP-FPM)!

4. Over-memory-capacity behavior?
   Previously we would hard-crash - now we’d LRU-evict old 1-view images.
   Neither is good, but for us, one is much less painful.
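
For point 1, a small Python sketch of the client-side bookkeeping involved. The slot function follows the Redis Cluster rule (CRC16 of the key, or of its `{hash tag}`, modulo 16384); the `SlotMap` class and its redirect counter are hypothetical names for illustration, and only `MOVED` replies are handled here (not `ASK`).

```python
import binascii
from collections import Counter

SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """CRC16 (XMODEM) of the key, or of its {hash tag} if present, mod 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # ignore an empty tag such as "{}"
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


class SlotMap:
    """Client-side shard map that learns from MOVED redirections."""

    def __init__(self) -> None:
        self.owner = {}  # slot number -> "host:port"
        self.redirects = Counter()  # node -> redirections seen

    def on_error(self, message: str) -> bool:
        """Apply an error reply such as 'MOVED 3999 127.0.0.1:6381'.

        Returns True when the reply was a MOVED redirection.
        """
        parts = message.split()
        if len(parts) != 3 or parts[0] != "MOVED":
            return False
        slot, node = int(parts[1]), parts[2]
        self.owner[slot] = node
        self.redirects[node] += 1  # a candidate metric to graph and alert on
        return True
```

A sustained rise in the redirect counter suggests workers are holding a stale shard map, which is the cue for the graceful restart mentioned above.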

Page 28: Counting image views using redis cluster

ViewCount V3?
Approaching the point of minimal gains for man-hours, but what else might be fun?

1. Moving PHP7 off the NodeJS API and directly to Redis Cluster.
   Downsides: dealing with shard maps is complex in a stateless / process-per-request environment!

2. Using Redis 3.2's BITFIELD or hashes (HSET) to save on key storage costs.
   Downsides: complicates the system and worsens “hit-by-a-bus” issues - today keys are just hashes, values are just counts!

3. Dealing with the nature of TCP streams (TCP is not HTTP!).
   One connection to rule them all! - Node’s Cluster module helps, but perhaps Rust or Golang?
   Downsides: vertical scaling is non-obvious on EC2.
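
Idea 2 could look roughly like the Python sketch below, which groups many counters under one hash key. The bucketing rule (drop the last two characters of the image hash), the `views:` prefix, and all names are assumptions for illustration; the in-memory class only mimics the HINCRBY and HGET commands so the sketch runs without a server. Whether this saves memory in practice depends on Redis's hash encoding thresholds, which the slides do not cover.

```python
from collections import defaultdict

SUFFIX_LEN = 2  # assumption: the last 2 characters become the hash field


def bucket_for(image_hash: str) -> tuple:
    """Split an image hash into (hash key, field) for HINCRBY / HGET."""
    return "views:" + image_hash[:-SUFFIX_LEN], image_hash[-SUFFIX_LEN:]


class FakeHashes:
    """In-memory stand-in for the two Redis hash commands used here."""

    def __init__(self) -> None:
        self.data = defaultdict(dict)  # key -> {field: count}

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        self.data[key][field] = self.data[key].get(field, 0) + amount
        return self.data[key][field]

    def hget(self, key: str, field: str) -> int:
        return self.data[key].get(field, 0)


def record_view_bucketed(store: FakeHashes, image_hash: str) -> int:
    """Increment the counter for one image inside its shared bucket."""
    key, field = bucket_for(image_hash)
    return store.hincrby(key, field)
```

The extra indirection is exactly the complexity the slide warns about: a count can no longer be read with a plain GET on the image hash.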

Page 29: Counting image views using redis cluster

ViewCount V2 - Results:

Redis is:

Faster - Imgur response time decreased ~50ms

Page 30: Counting image views using redis cluster

Cheaper - EC2 cost reduced by 75%

Page 31: Counting image views using redis cluster

Simpler - No Java, no MR, no ZK, no third parties, just INCR + GET!
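
To make "just INCR + GET" concrete, here is a minimal Python sketch. It assumes the key is the image hash itself (per "keys are just hashes, values are just counts"); the function names and the in-memory stand-in are hypothetical, written so the example runs without a Redis server. A redis-py client exposes `incr` and `get` methods with the same shape.

```python
from typing import Optional, Protocol


class Counters(Protocol):
    """The two commands the slide names."""

    def incr(self, name: str, amount: int = 1) -> int: ...
    def get(self, name: str) -> Optional[bytes]: ...


class InMemoryCounters:
    """Stand-in so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self.values = {}  # key -> integer count

    def incr(self, name: str, amount: int = 1) -> int:
        self.values[name] = self.values.get(name, 0) + amount
        return self.values[name]

    def get(self, name: str) -> Optional[bytes]:
        value = self.values.get(name)
        return None if value is None else str(value).encode()


def record_view(client: Counters, image_hash: str) -> int:
    """Ingest path: one INCR per view; the key is the image hash itself."""
    return client.incr(image_hash)


def view_count(client: Counters, image_hash: str) -> int:
    """API path: one GET; a missing key is treated as zero views."""
    raw = client.get(image_hash)
    return 0 if raw is None else int(raw)
```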

Page 32: Counting image views using redis cluster

More fun! - I got to talk at RedisConf17!

Page 33: Counting image views using redis cluster

Acknowledgment

Imgur DevOps Team

Imgur Platform Team