Bigdata meetup dwarak_realtime_score_app

Big data Chennai Meetup

Title: What it takes to build system for 1 M web sockets

Presenter: Dwarakanath R, Principal Architect, 8KMiles


Cloud.8KMiles.com

About the System

An innovative sports media and events company based in the US that brings an authentic source of content and a new world of coverage to untapped sports.

A real-time score and analytics application that focuses on comprehensive coverage of untapped sports such as Running, Elite Fitness, Wrestling, and Gymnastics.

The sole objective of the application is to deliver the most accurate news, match reports, battle formations, technical statistics, real-time scores, real-time odds comparison, historical data, forward analysis, and other diversified services.


Objectives

LIVEresults: Design a real-time scoring system to serve 1M concurrent users.

Sportlytics: Design and engineer a sports analytics and reporting platform for the rapidly growing dataset (ongoing).

Benchmark the applications against request loads across a wide range of sizes.

Set up the score application and analytics in the cloud with HA, scaling, and failover.


Challenges

LIVEresults:

• Scale from 100K users to 1M users without any fuss

• Deliver score updates in near real time, with an interval of no more than 2 seconds

• Benchmark the application against request loads across a wide range of sizes

Sportlytics:

• Ingest 55 TB of varied data from S3 into Amazon Redshift

• Arrive at a cost-effective architecture model that serves everything from simple reports to large, complex analytics


• The basic objective of LIVEresults is to receive messages from Arena servers, store them, and then push those live messages to client devices (browsers, mobile apps)

• LIVEresults has the following contexts

• LSAGR: an API application that exposes API services to receive periodic updates from Arena servers and stores those updates in ElastiCache Redis

• Discovery: a service that gives clients the IP address to use when establishing a socket connection. Discovery also scales Dash nodes based on RAM and TCP connections

• Dash: the core application context that establishes socket / long-polling connections to Dash clients. It pushes score updates to Dash clients whenever a new message arrives from Redis pub-sub.

• LSAGR - The LiveScoring Aggregator (LSG) pushes message data to the API endpoint. All identified message calls follow a common JSON structure. Message calls between LSG and LSAGR are sent at a configurable periodic interval. When score updates are received by the API, they are stored / merged in Redis pub-sub. The updates can be full updates, partial updates, meta information, or any form of match details. The LSAGR nodes do not carry any state, which makes scaling out and scaling down of the nodejs / socket.io servers easier.

• Discovery - The first place where clients request an endpoint to establish a connection. The Discovery service senses the heat map of Dash nodes and intelligently routes each client request to an available slot inside the Dash nodes. It runs on the same servers as LSAGR.

• Dash - The application that pushes updates to the client devices that have established long-polling connections to the nodes.

• The persistence store, Redis, is set up in Pub-Sub mode. The Redis store is used only for temporary storage. All the nodejs / socket.io servers publish / subscribe to the Redis server.
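
The Discovery routing described above could look roughly like the sketch below. It is an illustration only, not the presenter's implementation: `DashNode`, its fields, `pick_dash_node`, and the `min_free_ram_mb` threshold are hypothetical names, and the selection rule (most free connection slots among nodes with enough free RAM) is an assumption about what "senses heat map" means.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DashNode:
    """Hypothetical view of one Dash node as seen by Discovery."""
    ip: str
    open_connections: int  # current TCP / socket connections
    max_connections: int   # benchmarked capacity of the node (assumed)
    free_ram_mb: int       # free memory reported by the node


def pick_dash_node(nodes: List[DashNode], min_free_ram_mb: int = 512) -> Optional[str]:
    """Return the IP of the Dash node with the most free slots, or None."""
    candidates = [
        n for n in nodes
        if n.open_connections < n.max_connections
        and n.free_ram_mb >= min_free_ram_mb
    ]
    if not candidates:
        # No node has room; a real Discovery service would scale out here.
        return None
    best = max(candidates, key=lambda n: n.max_connections - n.open_connections)
    return best.ip
```

Returning `None` when every node is full gives the caller a clear signal to add Dash nodes before handing out another endpoint.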


Bouts / Rooms Concept

[Diagram: Users; four Socket.io / Atmos nodes, each holding a Socket – Bout Mapping (Socket-id – {b1, b2, b3}); Redis Pub / Sub with Topic 1 to Topic 4; ElastiCache Redis as the persistence store]

• Each client has a unique socket connection to the server, identified by the socket.io client program

• The bouts a client subscribes to are passed to the socket.io client program, which informs the server about the client's bout subscriptions

• The server maintains a bout – socket mapping for each client, using the socket.io rooms concept

• When messages are broadcast from Redis to the Node servers, each node identifies the client subscriptions and pushes only the requested bout clock updates.
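
The bout – socket mapping can be illustrated with a small in-memory sketch. It stands in for socket.io rooms and the Redis subscription rather than using either library: `RoomRegistry`, `subscribe`, `disconnect`, and `on_redis_message` are hypothetical names, and the `send` callback is an assumed hook for writing to one socket.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class RoomRegistry:
    """In-memory stand-in for socket.io rooms: one room per bout."""

    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send  # pushes a payload to a single socket
        self._rooms: Dict[str, Set[str]] = defaultdict(set)  # bout id -> socket ids

    def subscribe(self, socket_id: str, bout_id: str) -> None:
        """Record that a socket wants updates for a bout."""
        self._rooms[bout_id].add(socket_id)

    def disconnect(self, socket_id: str) -> None:
        """Drop a socket from every bout it joined."""
        for members in self._rooms.values():
            members.discard(socket_id)

    def on_redis_message(self, bout_id: str, update: dict) -> int:
        """Push an update only to sockets subscribed to this bout."""
        members = self._rooms.get(bout_id, set())
        for socket_id in sorted(members):
            self._send(socket_id, update)
        return len(members)
```

Keying the map by bout rather than by socket means a broadcast touches only the interested sockets, which is the property the rooms concept provides.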


Choosing Instance type


Hosting: Amazon Web Services – US West (Oregon), VPC

Monitoring: AWS CloudWatch

AMI: Amazon Linux HVM image

Scaling: Auto Scaling group and configuration

AWS resources:

• Compute & Store: EC2 EBS store, SSD EBS volumes, Enhanced networking, Multi Availability Zone

• Load Balancing: Elastic Load Balancer

• Caching: ElastiCache Redis 2.8

Infrastructure instance specification:

• LSAGR: Instance type c3.2xlarge (8 vCPU, 15 GB memory, 2 * 80 GB SSD storage, High networking)

• Dash: Instance type r3.2xlarge (8 vCPU, 61 GB memory, 1 * 160 GB SSD storage, High networking)

• Redis Cache Node: m3.2xlarge (8 vCPU, 27.9 GB memory, High networking)

• All contexts require instance types with high network bandwidth. Network throughput is critical because a large volume of data travels in and out every second.

• LSAGR – High CPU / Discovery – High CPU / Dash – High Memory / Redis – High Memory

• Arriving at the number of nodes

• Define payload mean – mean size of live updates from Arena servers

• Anticipated number of concurrent users for an event (based on type of event, previous history, marketing reach, etc.)

• Benchmarked concurrent requests of varied instance types

• Formula: Number of nodes required (minimum) = (concurrent users / benchmarked number) * payload mean

• Example 1: Number of nodes required (minimum) = (1 M / 50000) * 2 KB = 40 nodes

• Example 2: Number of nodes required (minimum) = (500K / 50000) * 3 KB = 30 nodes
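
The sizing formula above can be written as a small helper. This is a sketch under stated assumptions: `min_nodes` and its parameter names are hypothetical, the payload mean is used as a plain multiplier in KB exactly as in the slide's examples, and rounding up to a whole node is an added assumption.

```python
import math


def min_nodes(concurrent_users: int, benchmarked_per_node: int, payload_mean_kb: float) -> int:
    """Minimum node count per the slide's formula, rounded up (assumption)."""
    if benchmarked_per_node <= 0:
        raise ValueError("benchmarked_per_node must be positive")
    return math.ceil(concurrent_users / benchmarked_per_node * payload_mean_kb)


print(min_nodes(1_000_000, 50_000, 2))  # Example 1 -> 40
print(min_nodes(500_000, 50_000, 3))    # Example 2 -> 30
```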


LIVEresults Design Statement – Other notes

• The LIVEresults AWS infrastructure is set up inside a Virtual Private Cloud.

• The EC2 instances use an EBS-backed AMI with the Amazon Linux operating system.

• OS hardening on all EC2 nodes

• LIVEresults nodes are set up with AWS Auto Scaling, where nodes are scaled out and down based on server load. The nodes are set up across multiple Availability Zones to handle zone-level failures.

• Redis is AWS ElastiCache with read replicas (RR) and Multi-AZ.

• LIVEresults nodes will be monitored using the Amazon CloudWatch service, which works in conjunction with AWS Auto Scaling.

• Detailed monitoring will be enabled in CloudWatch for the required AWS resources. When CloudWatch triggers fire, notifications (SES) will be sent to the Flocasts Admin team.

• Write to the AWS team before the start of any event and warm up the servers under Auto Scaling and ELB

• Scale down during event off-times and off-seasons


Key points

• Benchmark varied EC2 instance types (Memory and Compute optimized instances)

• Identify the right instance type by running brute-force, load, and stress tests

• Identify the capacity of the instance type, i.e. how many concurrent users (x) it can handle

• Why Redis (Amazon ElastiCache)

• Redis is now seen as a NoSQL data store rather than a conventional cache system

• As a key-value store, Redis supports the basic and necessary data types such as strings, sets, and lists

• Pub-Sub functionality and support for atomic operations

• Offered as a cache-as-a-service, which takes away all operations and administration burdens (failover, scaling, Multi-AZ)

• Can scale out and scale in based on load requirements

• Amazon Web Services

• VPC – Clear distinction between public and private subnets

• Use building block services

• Flexible Auto Scaling policies

• Redis - Best Practices

• Planning

• Instance type with fast processor for Redis (Single Threaded)

• Use Multi-AZ and Read Replica

• Private Subnets

• Keep close monitoring of cache hits and misses, total connections, and replication lag

• Implementation

• Build intelligence at the code level to detect failures and to manage primary and secondary data sources

• Use ElastiCache libraries, Sorted sets, Auto discovery (use the configuration endpoint), Consistent hashing, Pub-Sub
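
Consistent hashing, listed among the implementation practices, can be illustrated with a generic hash ring. This is a textbook sketch rather than what any ElastiCache client library does internally: `HashRing`, `replicas` (virtual nodes per cache node), and the `cache-a` style node names used with it are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._ring: Dict[int, str] = {}  # ring position -> node name
        self._keys: List[int] = []       # sorted ring positions
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self._replicas):
            h = self._hash(f"{node}#{i}")
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove(self, node: str) -> None:
        """Remove all virtual points that belong to the node."""
        for i in range(self._replicas):
            h = self._hash(f"{node}#{i}")
            del self._ring[h]
            self._keys.remove(h)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

When a cache node is removed, only the keys that node owned move to another node; every other key keeps its placement, which limits cache misses during failover or scaling.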


LIVEresults - Technologies

• Server Library: Async.io, Atmosphere

• Service Discovery: Java custom component

• Containers / Server: Netty, Undertow

• Load Balancer: Amazon Elastic Load Balancer

• Persistence: Amazon ElastiCache Redis

• Hosting: Amazon Web Services

• Monitoring: Amazon CloudWatch, other custom internal tools

• Operating System: Amazon Linux


Sportlytics - Analytics Platform (Ongoing)


• Data Collection

• Sports and Events data – match stats

• Player, Team performances and other meta-data

• Live Streaming stats – number of users, requests, failure / errors

• App usage stats - number of users, requests, failure / errors

• Social feeds - Collect and aggregate social feed data from Twitter, Facebook and in-bound comments

• Marketing campaigns

• Analytics and Reporting

• Arrive at players' and teams' past performance and present rankings data

• Arrive at popularity and a star meter based on social feeds

• Arrive at results of previous events and present history data in the context of current events

• Number of concurrent users from archives, and present stats based on sport, event type, etc. – Apps and Live Streaming

• Predict anticipated users / requests based on past data and marketing campaigns for individual sports

• Recommend players and teams based on the user’s profile and browsing history


Sportlytics - Technologies

• Programming: Java, Scala, Python

• Building block services: Amazon S3, Amazon SQS, Amazon SNS

• Analytics: Amazon Redshift

• Persistence: Amazon RDS Aurora

• Hosting: Amazon Web Services

• Monitoring: Amazon CloudWatch

• Operating System: Amazon Linux