Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
![Page 1: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/1.jpg)
A CHANGE OF SEASONS: A big move to Apache Cassandra
Eiti Kimura, IT Coordinator @Movile Brazil
![Page 2: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/2.jpg)
Eiti Kimura
![Page 3: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/3.jpg)
Spreading the word...
![Page 4: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/4.jpg)
Leader in Latin America
Mobile phones, Smartphones and Tablets
Movile is the company behind the apps that make your life easier.
![Page 5: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/5.jpg)
![Page 6: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/6.jpg)
![Page 7: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/7.jpg)
We think mobile...
Movile develops apps across all platforms for smartphones and tablets, not only to make life easier but also to make it more fun.
The company recorded average annual growth of 80% over the last 7 years.
![Page 8: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/8.jpg)
3 use cases that constitute the big move to Apache Cassandra
![Page 9: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/9.jpg)
- Move I -
The Subscription and Billing System, a.k.a. SBS
![Page 10: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/10.jpg)
Subscription and Billing Platform
- it is a service API
- responsible for managing users' subscriptions
- responsible for charging users through carriers
- an engine to renew subscriptions
It cannot stop under any circumstance.
It has to perform very well.
![Page 11: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/11.jpg)
The platform in numbers
88 million subscriptions
66.1M unique users
105M transactions a day
![Page 12: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/12.jpg)
Platform Evolution timeline
2008: Pure relational database times
2009: Apache Cassandra adoption (v0.6)
2011: The data model was entirely remodeled (4 nodes); cluster upgrade from version 0.7 to 1.0
2013: Cluster upgrade from version 1.0 to 1.2; expanded from 4 to 6 nodes
2014: New data index using time series
2015: THE BIG MOVE, migrating complex queries from the relational database
![Page 13: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/13.jpg)
Initial architecture revisited
[Diagram: several API and Engine instances sharing a single DB]
Classical solution using a regular RDBMS
![Page 14: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/14.jpg)
Architecture disadvantages
- single point of failure
- slow response times
- the platform went down often
- hard and expensive to scale
- if you scale your platform and forget to scale the database and other related resources, you'll fail
![Page 15: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/15.jpg)
A new architecture has come
[Diagram: API and Engine instances, an Apache Cassandra cluster, and a DB used for regular SQL queries]
A hybrid solution using an Apache Cassandra cluster plus a relational database to execute complex queries
![Page 16: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/16.jpg)
The benefits of the new solution
- performance problems: solved
- availability problems: solved
- single point of failure: partially solved
- significantly increased read and write throughput
![Page 17: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/17.jpg)
The solution weaknesses
[Diagram: Engine instances running SQL queries against the DB]
![Page 18: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/18.jpg)
![Page 19: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/19.jpg)
The problems
- querying the relational database takes time
- it has side effects: it locks data being updated and inserted
- concurrency causes performance degradation
- it does not scale well
- we still need the relational database to execute complex queries
![Page 20: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/20.jpg)
The complex query...
- queries the subscription table
- selects expired subscriptions
- the subscriptions must be grouped by user
- results must be ordered by priority, criteria, and type of user plan
![Page 21: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/21.jpg)
SQL Server's query
Query parts: projection, filter criteria, aggregation, sort data
SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt
FROM subscription AS s WITH(nolock)
JOIN configuration AS c WITH(nolock)
  ON s.configuration_id = c.configuration_id
WHERE s.enabled = 1
  AND s.timeout_date < GETDATE()
  AND s.related_id IS NULL
  AND c.carrier_id = ?
  AND ( c.enabled = 1 AND
        ( c.renewable = 1 OR c.use_balance_parameter = 1 ) )
GROUP BY s.phone
ORDER BY charge_priority DESC, MAX(user_plan) DESC,
  min_last_renew_attempt
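The four parts of the query (filter, group by user, aggregate, sort) can be sketched in plain Java to show what any replacement has to reproduce once the data no longer lives in a relational database. This is only an illustration under assumptions: the `Subscription` and `RenewalCandidate` classes, the field names, and the sample values are hypothetical, and the join against the configuration table is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RenewalQuerySketch {

    // Hypothetical flattened row: a subscription already joined with its configuration.
    static final class Subscription {
        final String phone; final boolean enabled; final long timeoutDate;
        final Long relatedId; final int chargePriority; final int userPlan;
        final long lastRenewAttempt;

        Subscription(String phone, boolean enabled, long timeoutDate, Long relatedId,
                     int chargePriority, int userPlan, long lastRenewAttempt) {
            this.phone = phone; this.enabled = enabled; this.timeoutDate = timeoutDate;
            this.relatedId = relatedId; this.chargePriority = chargePriority;
            this.userPlan = userPlan; this.lastRenewAttempt = lastRenewAttempt;
        }
    }

    // One output row per phone, holding the aggregated values.
    static final class RenewalCandidate {
        final String phone; final int maxPriority; final int maxPlan;
        final long minLastRenewAttempt;

        RenewalCandidate(String phone, int maxPriority, int maxPlan, long minLastRenewAttempt) {
            this.phone = phone; this.maxPriority = maxPriority; this.maxPlan = maxPlan;
            this.minLastRenewAttempt = minLastRenewAttempt;
        }
    }

    static List<RenewalCandidate> expiredByUser(List<Subscription> rows, long now) {
        // Filter criteria: enabled, expired, and not linked to another subscription.
        Map<String, List<Subscription>> byPhone = rows.stream()
            .filter(s -> s.enabled && s.timeoutDate < now && s.relatedId == null)
            .collect(Collectors.groupingBy(s -> s.phone));

        // Aggregation: collapse each user's subscriptions into one candidate.
        List<RenewalCandidate> result = new ArrayList<>();
        for (Map.Entry<String, List<Subscription>> e : byPhone.entrySet()) {
            List<Subscription> subs = e.getValue();
            result.add(new RenewalCandidate(
                e.getKey(),
                subs.stream().mapToInt(s -> s.chargePriority).max().getAsInt(),
                subs.stream().mapToInt(s -> s.userPlan).max().getAsInt(),
                subs.stream().mapToLong(s -> s.lastRenewAttempt).min().getAsLong()));
        }

        // Sort: priority DESC, plan DESC, then the oldest renew attempt first.
        result.sort(Comparator
            .comparingInt((RenewalCandidate c) -> c.maxPriority).reversed()
            .thenComparing(Comparator.comparingInt((RenewalCandidate c) -> c.maxPlan).reversed())
            .thenComparingLong(c -> c.minLastRenewAttempt));
        return result;
    }

    // Made-up sample rows, for illustration only.
    static List<Subscription> sample() {
        return Arrays.asList(
            new Subscription("5511", true, 100L, null, 1, 1, 50L),
            new Subscription("5511", true, 90L, null, 3, 2, 40L),
            new Subscription("5522", true, 80L, null, 2, 1, 10L),
            new Subscription("5533", true, 5000L, null, 9, 9, 1L),  // not expired yet
            new Subscription("5544", false, 70L, null, 9, 9, 1L));  // disabled
    }

    public static void main(String[] args) {
        for (RenewalCandidate c : expiredByUser(sample(), 1000L)) {
            System.out.println(c.phone + " priority=" + c.maxPriority + " plan=" + c.maxPlan);
        }
    }
}
```

On a single machine this is trivial; the difficulty is doing the same grouping and multi-criteria sort across data spread over many nodes.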
![Page 22: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/22.jpg)
The solution
- Extract data from Apache Cassandra instead of using the relational database
- There is no single point of failure
- Performance improved, but more work querying and filtering data
Main concern: distributed sorting by multiple criteria and data aggregation
- Apache Spark!?
![Page 23: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/23.jpg)
- Databricks used Apache Spark to sort 100 TB of data on 206 machines in 23 minutes
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
![Page 24: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/24.jpg)
Divide-And-Conquer
Preparing for the new solution
Subscription -> Subscription Index
● configuration_id
  ○ phone-number
● each subscription becomes a column (time series)
Using a new table as an index, applying data denormalization!
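The idea of a denormalized, time-ordered index can be pictured with an in-memory analogy. This is a rough sketch, not the actual Cassandra schema: it assumes the index is keyed by `configuration_id` and ordered by timeout timestamp, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SubscriptionIndexSketch {

    // configuration_id -> (timeout timestamp -> phone numbers expiring at that instant).
    // The outer key plays the role of a partition, the sorted inner map the time series.
    private final Map<Integer, TreeMap<Long, List<String>>> index = new HashMap<>();

    // Writing a subscription also writes its index entry (denormalization).
    void put(int configurationId, long timeoutDate, String phone) {
        index.computeIfAbsent(configurationId, k -> new TreeMap<>())
             .computeIfAbsent(timeoutDate, k -> new ArrayList<>())
             .add(phone);
    }

    // Range scan inside one partition: everything that expired strictly before `now`.
    List<String> expired(int configurationId, long now) {
        List<String> phones = new ArrayList<>();
        TreeMap<Long, List<String>> row = index.get(configurationId);
        if (row != null) {
            for (List<String> bucket : row.headMap(now, false).values()) {
                phones.addAll(bucket);
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        SubscriptionIndexSketch idx = new SubscriptionIndexSketch();
        idx.put(1, 100L, "5511");
        idx.put(1, 300L, "5522");
        idx.put(2, 50L, "5533");
        System.out.println(idx.expired(1, 200L));
    }
}
```

Because entries are kept sorted by time, finding expired subscriptions becomes a range read over one key instead of a filtered scan over the whole subscription table.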
![Page 25: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/25.jpg)
Proof of Concept with Apache Spark
Data Extractor
Processor
![Page 26: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/26.jpg)
Preparing Resources Processor
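Java Code Snippet
import java.io.File;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Local Spark context using all available cores
JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App",
    SPARK_HOME, "spark-simple-1.0.jar");
// Get file from resources folder
ClassLoader classLoader = SparkFileJob.class.getClassLoader();
File file = new File(classLoader.getResource("dataset-10MM.json").getFile());
// Load the JSON dataset and expose it to Spark SQL as the "subscription" table
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json(file.getPath());
df.registerTempTable("subscription");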
![Page 27: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/27.jpg)
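Preparing and Executing Query
Spark SQL Query (Java code snippet)
// max_plan is assumed to be MAX(user_plan), mirroring the SQL Server query
String query =
    "SELECT phone, MAX(charge_priority) AS max_priority, MAX(user_plan) AS max_plan "
  + "FROM subscription "
  + "WHERE enabled = 1 "
  + "AND timeout_date < " + System.currentTimeMillis() + " "
  + "AND related_id IS NULL "
  + "AND carrier_id IN (1, 4, 2, 5) "
  + "GROUP BY phone "
  + "ORDER BY max_priority DESC, max_plan DESC";
// Run the query and hand each resulting row to the processor
sqlContext.sql(query)
    .javaRDD()
    .foreach(row -> process(row));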
![Page 28: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/28.jpg)
- We have the DataStax Spark-Cassandra-Connector!
- It allows exposing Cassandra tables as Spark RDDs
- Use Apache Spark coupled to Cassandra
https://github.com/datastax/spark-cassandra-connector
https://github.com/eiti-kimura-movile/spark-cassandra
![Page 29: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/29.jpg)
Next Steps
- upgrade the cluster to version >= 2.1
- cluster read improvement of 50% moving from Thrift to CQL (native protocol v3)
- implement the final solution: Cassandra + Spark
![Page 30: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/30.jpg)
- Move II -
The Kiwi Migration
![Page 31: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/31.jpg)
The Kiwi Platform
- it is a common backend platform for smartphone apps
- provides user and device management
- user event and media tracker
- analytics
- push notifications
High performance and availability required
![Page 32: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/32.jpg)
Kiwi: The beginning
[Diagram components: API (x2), SQS queues (x2), Consumers (x2), DynamoDB, PostgreSQL]
![Page 33: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/33.jpg)
Push notifications
![Page 34: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/34.jpg)
The push notification crusade
[Diagram: PostgreSQL (low read throughput), three Push Publishers, the Apple notification service and the Google notification service]
![Page 35: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/35.jpg)
The problems (déjà vu?)
- single point of failure with PostgreSQL
- high costs paying for 2 storage services
- DynamoDB does not have good read throughput for linear reads
- RDS PostgreSQL tuning limit reached
- low throughput sending notifications
![Page 36: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/36.jpg)
Slowness means frustration
![Page 37: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/37.jpg)
The solution in numbers
- data storage cost
  - Amazon DynamoDB: U$ 4,575.00 / mo
  - PostgreSQL (RDS): U$ 6,250.00 / mo
  - total: U$ 10,825.00 / mo
- read throughput measured
  - Amazon DynamoDB: 1.4k /s (linear, sequential reads)
  - PostgreSQL (RDS): 10k /s
![Page 38: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/38.jpg)
Remodeled solution, the Cassandra way
[Diagram: three Push Publishers, the Apple notification service and the Google notification service]
![Page 39: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/39.jpg)
Data model changes
- Amazon DynamoDB
  - object serialized with Avro
  - a few columns
- Apache Cassandra
  - exploded object
  - more than 80 columns without serialization
![Page 40: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/40.jpg)
Conclusion
Before Migration
AWS DynamoDB + Postgres = U$ 10,825.00/mo
Read Throughput = ~ 12k/s
After Migration
Apache Cassandra (8 nodes c3.2xlarge) = U$ 2,580.00/mo
Read Throughput = ~ 200k/s
Savings of about 76% (roughly 4x cheaper)!!!
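The before and after figures can be checked with a few lines of arithmetic. The sketch below uses only the monthly costs and throughput numbers quoted on the slides; the class and method names are made up for the example.

```java
import java.util.Locale;

public class MigrationSavings {

    // Percentage of the original monthly bill that no longer has to be paid.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // How many times larger `before` is than `after` (or the reverse for throughput gains).
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        double before = 4575.00 + 6250.00; // DynamoDB + PostgreSQL (RDS), U$ per month
        double after = 2580.00;            // 8-node Cassandra cluster, U$ per month

        System.out.println(String.format(Locale.ROOT,
            "cost reduction: %.1f%%", reductionPercent(before, after)));
        System.out.println(String.format(Locale.ROOT,
            "cost ratio: %.1fx", ratio(before, after)));
        System.out.println(String.format(Locale.ROOT,
            "read throughput gain: %.1fx", ratio(200_000, 12_000)));
    }
}
```

The monthly bill drops by roughly three quarters, the old setup cost a little over four times as much, and read throughput grows by well over an order of magnitude.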
![Page 41: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/41.jpg)
- Move III -
Distributing Resources
![Page 42: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/42.jpg)
What kind of resources?
The blacklisted phone numbers
The ported phone numbers database
Text file resources
![Page 43: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/43.jpg)
Messaging platform
- resources are checked before sending messages
- identify the user's carrier
- resources are loaded into memory (RAM)
- servers are off-cloud (hard to upgrade)
Problem: larger resource files for the same amount of memory
![Page 44: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/44.jpg)
Loading everything, RAM story
[Diagram: a Message Publisher loading the blacklist and portability files into 4GB - 6GB of RAM]
- slow JVM responses (GC)
- server memory limit reached
- files continue to grow
- more than 20 instances on different servers loading the same resources
![Page 45: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/45.jpg)
How about a distributed solution?
- the resource files are the same on all of the servers
- RAM does not scale well
- it is an expensive solution
So...
- Why not distribute resources around a ring?
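To make the ring idea concrete, here is a minimal consistent-hashing sketch: each key (for example a phone number) hashes to a token, and the node owning the next token on the ring holds that key. It is a simplification under stated assumptions. The MD5-based token function, the virtual-node count, and the node names are illustrative choices, not Cassandra's actual partitioner, and replication is not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class ResourceRingSketch {

    // Token position on the ring -> node that owns the range ending at that token.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Place a node at several positions (virtual nodes) to smooth the distribution.
    void addNode(String node, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    // A key belongs to the first node at or after its token, wrapping around the ring.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Token = first 8 bytes of an MD5 digest (illustrative only).
    static long token(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) {
                t = (t << 8) | (d[i] & 0xffL);
            }
            return t;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        ResourceRingSketch r = new ResourceRingSketch();
        r.addNode("node-a", 16);
        r.addNode("node-b", 16);
        r.addNode("node-c", 16);
        System.out.println("blacklist entry 5511999990000 lives on "
            + r.ownerOf("5511999990000"));
    }
}
```

Each server then asks the ring for the one entry it needs instead of holding the full blacklist and portability files in its own heap, and adding a node only moves the keys that fall into the new node's token ranges.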
![Page 46: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/46.jpg)
The distributed resources solution
[Diagram: Message Publishers and other platforms spread across three data centers (DC1, DC2, DC3)]
![Page 47: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/47.jpg)
Checking the results
- common information is shared across a Cassandra cluster
- the massive hardware upgrade: solved
- the data is available to other platforms
- it is highly scalable
- easy to accommodate more data
![Page 48: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/48.jpg)
![Page 49: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/49.jpg)
Wrapping up the Moves
- always upgrade to the newest versions
- high throughput and availability make a difference
- costs really, really matter!
- horizontal scalability is great! If your volume of data grows, increase the number of nodes
![Page 50: Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra](https://reader031.fdocuments.us/reader031/viewer/2022030402/58810f8a1a28ab22368b6ee3/html5/thumbnails/50.jpg)
eitikimura eiti-kimura-movile [email protected]