Acunu Analytics and Cassandra at Hailo All Your Base 2013

Post on 24-Jan-2015

description

Hailo, the taxi app, has served more than 5 million passengers in 15 cities and has taken fares of $100 million this year. I'm going to talk about how that rapid growth has been powered by a platform based on Cassandra and operational analytics and insights powered by Acunu Analytics. I'll cover some challenges and lessons learned from scaling fast!

Transcript of Acunu Analytics and Cassandra at Hailo All Your Base 2013

ALL YOUR BASE 2013

Acunu Analytics and Cassandra at Hailo

Tim Moreton, CTO at Acunu

David Gardner, Architect at Hailo

Dave

What is Hailo?

Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.

• The world’s highest-rated taxi app – over 11,000 five-star reviews

• Over 500,000 registered passengers

• A Hailo hail is accepted around the world every 4 seconds

• Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation

The history

The story behind Cassandra and Acunu adoption at Hailo

Hailo launched in London in November 2011

• Launched on AWS

• Two PHP/MySQL web apps plus a Java backend

• Mostly built by a team of 3 or 4 backend engineers

• MySQL multi-master for single AZ resilience

• Get/create/update entity

• Analytics

• Text search

Why Cassandra?

• A desire for greater resilience – “become a utility”

Cassandra is designed for high availability

• Plans for international expansion around a single consumer app

Cassandra is good at global replication

• Expected growth

Cassandra scales linearly for both reads and writes

• Prior experience

I had experience with Cassandra and could recommend it

The path to adoption

• Largely unilateral decision by developers – a result of a startup culture

• Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store

• Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London

One year on...

• Further decompose functionality into Go/Java SOA

• Migrating:

Entity databases to Cassandra

Analytics to Acunu

Search into Elasticsearch

Cassandra

We like Cassandra

• Solid design

• HA characteristics

• Easy multi-DC setup

• Simplicity of operation

“Cassandra just works”

Dom W, Senior Engineer

CF = customers

126007613634425612:
  createdTimestamp: 1370465412
  email: dave@cruft.co
  givenName: Dave
  familyName: Gardner
  locale: en_GB
  phone: +447911111111

Considerations for entity storage

• Do not read the entire entity, update one property and then write back a mutation containing every column

• Only mutate columns that have been set

• This avoids read-before-write race conditions
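
The difference between the two write patterns can be sketched with a toy in-memory model. This is an illustration only, not Hailo's code: `ColumnFamily`, `mutate`, `demo` and the example values (`en_US`, `dave@example.com`) are hypothetical, and the model assumes per-column last-write-wins resolution by timestamp, which is how Cassandra reconciles columns.

```python
class ColumnFamily:
    """Toy model: each column keeps (value, timestamp); highest timestamp wins."""

    def __init__(self):
        self.rows = {}

    def mutate(self, row_key, columns, timestamp):
        # Apply a mutation column by column, as Cassandra does.
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            current = row.get(name)
            if current is None or timestamp >= current[1]:
                row[name] = (value, timestamp)

    def read(self, row_key):
        row = self.rows.get(row_key, {})
        return {name: value for name, (value, _) in row.items()}


def demo(write_whole_entity):
    cf = ColumnFamily()
    key = "126007613634425612"
    cf.mutate(key, {"email": "dave@cruft.co", "locale": "en_GB"}, timestamp=1)

    # Two clients read the same snapshot of the entity.
    snapshot_a = cf.read(key)
    snapshot_b = cf.read(key)

    # Client A changes the locale, then client B changes the email.
    if write_whole_entity:
        # Anti-pattern: each client writes back every column it read.
        cf.mutate(key, {**snapshot_a, "locale": "en_US"}, timestamp=2)
        cf.mutate(key, {**snapshot_b, "email": "dave@example.com"}, timestamp=3)
    else:
        # Recommended: each client mutates only the column it set.
        cf.mutate(key, {"locale": "en_US"}, timestamp=2)
        cf.mutate(key, {"email": "dave@example.com"}, timestamp=3)
    return cf.read(key)
```

With `demo(True)`, client B's stale copy of `locale` overwrites client A's change. With `demo(False)`, both changes survive because the two mutations touch different columns.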

CF = stats_db

2013-06-01:
  55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
  a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
  b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
  bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"action":"…

CF = stats_db

LON123456:
  13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
  20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
  2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
  338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…

Considerations for time series storage

• Choose row key carefully, since this partitions the records

• Think about how many records you want in a single row

• Denormalise on write into many indexes/views
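
A minimal sketch of these three points, using a plain dictionary in place of a column family. `TimeSeriesStore`, `record` and `read` are hypothetical names, and the column-naming scheme is an assumption: the slides use time-based UUIDs as column names, while this sketch uses a (timestamp, random id) pair to get the same time ordering without depending on the wall clock.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class TimeSeriesStore:
    """Toy wide-row store: row key -> {time-ordered column name -> payload}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def record(self, driver, when, payload):
        # The column name sorts by event time; the random part keeps it unique.
        column = (when.timestamp(), uuid.uuid4().hex)
        # Denormalise on write: store the same event once per view.
        # The day bucket bounds how many records land in a single row.
        self.rows[when.strftime("%Y-%m-%d")][column] = payload  # view by day
        self.rows[driver][column] = payload                     # view by driver

    def read(self, row_key):
        # Reading one view is a single-row scan in column order.
        row = self.rows.get(row_key, {})
        return [row[column] for column in sorted(row)]
```

Each view is answered from one row, so `read("2013-06-01")` and `read("LON123456")` need no filtering or joins. The cost is one extra write per view for every event.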

Average years of experience per team member: MySQL vs Cassandra (chart)

People who can attempt to query MySQL

People who can attempt to query Cassandra

Acunu Analytics

Analytics

• With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY

• We use Acunu Analytics to give us this ability in real time, for pre-planned query templates

• It is backed by Cassandra and therefore highly available, resilient and globally distributed

• Integration is straightforward

• Events

• NSQ

• Analytics turns events and SQL-like queries into C* operations

• Cassandra stores raw events and intermediate results

• Acunu Dashboards provides real-time visualization

• Alerts

Cubes: count by day, count by hour of day, uniques by hashtag; raw events

1 Define aggregate cubes

CREATE CUBE APPROX TOP(keyword) WHERE browser, time GROUP BY time

2 New events update cubes

3 Rich instant queries over cubes

SELECT TOP(keyword) FROM table WHERE browser = 'chrome' AND time BETWEEN.. GROUP BY d1, d2, ... JOIN ... HAVING.. ORDER BY ..

4 Drilldown to raw events

5 Backfill new cubes using historic data
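
The numbered steps above can be illustrated with a toy cube in Python. This is a sketch of the general pre-aggregation idea, not Acunu's implementation: `KeywordCube` and its methods are hypothetical, the cells live in memory rather than in Cassandra, and the counts are exact where the slide's `APPROX TOP` suggests an approximate structure.

```python
from collections import Counter, defaultdict
from datetime import datetime


class KeywordCube:
    """Toy pre-aggregated cube: keyword counts per (browser, day) cell."""

    def __init__(self):
        self.cells = defaultdict(Counter)  # (browser, day) -> keyword counts
        self.raw_events = []               # kept for drilldown and backfill

    def update(self, event):
        # Step 2: each new event increments its matching cell at write time.
        day = event["time"].date()
        self.cells[(event["browser"], day)][event["keyword"]] += 1
        self.raw_events.append(event)

    def top(self, browser, start_day, end_day, n=3):
        # Step 3: a query merges pre-computed cells and never scans raw events.
        total = Counter()
        for (cell_browser, day), counts in self.cells.items():
            if cell_browser == browser and start_day <= day <= end_day:
                total.update(counts)
        return total.most_common(n)

    def backfill(self):
        # Step 5: a newly defined cube can be rebuilt by replaying raw events.
        rebuilt = KeywordCube()
        for event in self.raw_events:
            rebuilt.update(event)
        return rebuilt
```

The query cost depends on the number of cells in the requested range rather than on the number of events, which is what makes pre-planned query templates cheap to answer in real time.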

AQL

SELECT SUM(accepted), SUM(ignored), SUM(declined), SUM(withdrawn)
FROM Allocations
WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver='LON123456789'
GROUP BY timestamp(day)
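
For readers unfamiliar with the query, the sketch below computes the same shape of result in plain Python over an in-memory list. The event layout (a dict with `driver`, `timestamp` and optional outcome counters) and the function name `allocations_by_day` are assumptions for illustration. Acunu Analytics would answer this from pre-aggregated cubes rather than by scanning events.

```python
from collections import defaultdict
from datetime import datetime, timedelta

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


def allocations_by_day(events, driver, now):
    """Sum each outcome per day for one driver over the week ending at `now`."""
    since = now - timedelta(weeks=1)
    totals = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))
    for event in events:
        # WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver = ...
        if event["driver"] == driver and since <= event["timestamp"] <= now:
            # GROUP BY timestamp(day)
            day = event["timestamp"].date()
            for outcome in OUTCOMES:
                totals[day][outcome] += event.get(outcome, 0)
    return dict(totals)
```

The result maps each day to a dictionary of the four sums, one entry per day on which the driver had any allocation.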

Use Cases

• Infrastructure and application monitoring

• Real-time A/B testing of app layout and incentives

• Real-time geo-view of supply/demand for drivers

• Several more in the pipeline!

Conclusions

We like Cassandra and Acunu

• Solid design

• HA characteristics

• Easy multi-DC setup

• Simplicity of operation

• With Acunu, rich queries again, easier denormalization

Lessons for successful adoption

• Have an advocate, sell the dream

• Learn the fundamentals, get the best out of Cassandra

• Invest in tools to make life easier

• Keep management in the loop, explain the trade-offs

Questions?