WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF

webRTC infrastructure the secret sauce Dr. Alex Gouaillard, CTO – Temasys Communications

Page 1

WebRTC infrastructure: the secret sauce

Dr. Alex Gouaillard, CTO – Temasys Communications

Page 2

Plan of the presentation

- The reference code
- A little bit better, but still the usual suspects
- The first real pain: Scalability
- Other things to consider

Page 4

The reference code: appRTC

• Scalable
• Maintained
• Reliable ??

[Diagram: Alice (ICE UA), web server, signaling server, STUN, TURN, TURN MNGT; links labeled HTTP / AJAX / REST and GAE Channel]

Page 5

The reference code: appRTC signaling design

[Diagram: Alice (ICE UA), signaling server; links labeled HTTP / AJAX / REST and GAE Channel]

Page 6

The reference code: appRTC signaling design

[Diagram: Alice, two signaling servers, DB; links labeled HTTP / AJAX / REST, GAE API Calls and GAE Channel]

Page 7

The reference code: appRTC signaling design

[Diagram: Alice, two signaling servers, DB; links labeled HTTP / AJAX / REST, GAE API Calls and GAE Channel]

Annotations on the diagram:

• Induce delays
• Just unreliable

Page 8

[Call flow diagram: CALLER, SIG-SERVER, CALLEE]

Phase labels: CONNECTION, HANDSHAKE (SDP O/A), IN CALL, FROZEN, DONE

Messages through SIG-SERVER: CONNECT, ENTER, WELCOME, OFFER, ANSWER, CANDIDATES, BYE

Caller steps:

- Create PC
- Add local stream(s)
- Create offer
- <modify sdp>
- SetLocal(offer)
- Sending offer
- SetRemote(answer)
- addRemoteStream

Callee steps:

- Create PC
- SetRemote(offer)
- addRemoteStream(s)
- Add local stream(s)
- Create answer
- <modify sdp>
- SetLocal(answer)
- Send answer

Candidate exchange (both sides):

- onIceCandidate, <Filter candidates>, Send candidate
- <Filter candidates>, addIceCandidate

State labels:

- PeerConnection (caller): stable, have-local-offer, stable, Close
- PeerConnection (callee): stable, Have-remote-offer, stable, Close
- ICEGathering: new, gathering, complete
- ICEConnection: new, checking, connected, disconnected, failed, Completed, close

© Temasys Communications, pvt, ltd, 2014 Document provided under CC BY-NC 4.0
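The PeerConnection states in the call flow above can be read as a small state machine. The following is a minimal sketch of that reading, not the browser implementation: the event names (`setLocalOffer`, `setRemoteAnswer`, and so on) and the helper functions are hypothetical and only mirror the SetLocal/SetRemote steps listed on the slide.

```typescript
// Signaling states as labeled on the slide (PeerConnection row).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer" | "closed";

// Hypothetical event names standing in for the SetLocal/SetRemote steps.
type SdpEvent = "setLocalOffer" | "setRemoteOffer" | "setLocalAnswer" | "setRemoteAnswer" | "close";

// Allowed transitions, keyed by "<current state>/<event>".
const transitions: Record<string, SignalingState> = {
  "stable/setLocalOffer": "have-local-offer",      // caller: SetLocal(offer)
  "have-local-offer/setRemoteAnswer": "stable",    // caller: SetRemote(answer)
  "stable/setRemoteOffer": "have-remote-offer",    // callee: SetRemote(offer)
  "have-remote-offer/setLocalAnswer": "stable",    // callee: SetLocal(answer)
};

// Apply one event; anything not listed above is rejected as out of order.
function applySdpEvent(state: SignalingState, event: SdpEvent): SignalingState {
  if (event === "close") return "closed";
  const key = state + "/" + event;
  if (!Object.prototype.hasOwnProperty.call(transitions, key)) {
    throw new Error(`"${event}" is not allowed in state "${state}"`);
  }
  return transitions[key];
}

// Run a whole sequence of events starting from "stable".
function runHandshake(events: SdpEvent[]): SignalingState {
  let state: SignalingState = "stable";
  for (const e of events) state = applySdpEvent(state, e);
  return state;
}
```

For example, `runHandshake(["setLocalOffer", "setRemoteAnswer"])` follows the caller column, while an answer that arrives before any offer was set is rejected with an error.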

Page 9

[Call flow diagram from Page 8, repeated with the following annotation]

ORDER & TIMING SENSITIVE
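One common way to cope with this order sensitivity is to hold remote ICE candidates that arrive before the remote description has been set, and apply them afterwards. The sketch below illustrates that idea under stated assumptions: `CandidateBuffer` and its method names are hypothetical, and the `apply` callback stands in for a call such as `addIceCandidate` on a real peer connection.

```typescript
// Buffers remote candidates until the remote description is in place.
class CandidateBuffer<C> {
  private pending: C[] = [];
  private remoteDescriptionSet = false;
  private apply: (candidate: C) => void;

  // `apply` is whatever actually hands the candidate to the peer connection.
  constructor(apply: (candidate: C) => void) {
    this.apply = apply;
  }

  // Called when a candidate arrives from the signaling server.
  onRemoteCandidate(candidate: C): void {
    if (this.remoteDescriptionSet) {
      this.apply(candidate);
    } else {
      this.pending.push(candidate); // too early: keep it for later
    }
  }

  // Called once SetRemote(offer) or SetRemote(answer) has succeeded.
  onRemoteDescriptionSet(): void {
    this.remoteDescriptionSet = true;
    const queued = this.pending;
    this.pending = [];
    for (const candidate of queued) this.apply(candidate); // flush in arrival order
  }
}
```

The design choice here is to preserve arrival order when flushing, so a slow or reordering signaling channel does not change what the ICE agent eventually sees.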

Page 10

The reference code: you can make it reliable

- But then, the handshake is stretched to up to 10 s …

Page 12

Better design: usual architecture

[Diagram: Alice, Node.js server; link labeled Socket.io]

• Much more reliable
• Much faster (< 1 s)
• Huge ecosystem
• Not scalable
• Not maintained => many API/SDK

Page 13

Better design: scalability

• Bandwidth intensive instances in AWS?
• Socket intensive instances in AWS?
• Socket.io 0.9 not CPU scalable
• Horizontal scalability
• Node fails often
• Back-end (B-E) update deployment = downtime

Page 15

Big pain: scalability
CPU scalability

• Bandwidth intensive instances in AWS?
• BIG instance with HVM (c3.8xl)
• Socket.io 0.9 not CPU scalable
• Socket.io 0.9 scales by using a store to replicate the data object to each instance of the signaling server.
• It just replicates the load everywhere: all instances are essentially doing the same work.
• Socket.io 1.x changes the way the store works. The store becomes an event bus, and each instance runs independently, handling only the clients connected to it.
• All instances still listen to all events, however, so two clients connected to different instances can still talk!

[Diagram: core, Redis store]
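The event-bus model described for Socket.io 1.x can be illustrated with a small in-memory sketch. This is not Socket.io or Redis code: `EventBus`, `SignalingInstance`, and their methods are hypothetical names, and the in-process bus merely stands in for a shared store such as Redis.

```typescript
// A message addressed to one client.
interface BusEvent {
  to: string;
  payload: string;
}

// Stand-in for the shared store acting as an event bus.
class EventBus {
  private listeners: Array<(e: BusEvent) => void> = [];

  subscribe(listener: (e: BusEvent) => void): void {
    this.listeners.push(listener);
  }

  // Every subscribed instance sees every event.
  publish(e: BusEvent): void {
    for (const listener of this.listeners) listener(e);
  }
}

// One signaling server instance; it only knows its own clients.
class SignalingInstance {
  private inboxes: Record<string, string[]> = {};
  private bus: EventBus;

  constructor(bus: EventBus) {
    this.bus = bus;
    bus.subscribe((e) => this.deliverIfLocal(e));
  }

  connect(clientId: string): void {
    this.inboxes[clientId] = [];
  }

  // Sending never touches another instance directly, only the bus.
  send(to: string, payload: string): void {
    this.bus.publish({ to: to, payload: payload });
  }

  inbox(clientId: string): string[] {
    return Object.prototype.hasOwnProperty.call(this.inboxes, clientId) ? this.inboxes[clientId] : [];
  }

  // Deliver only when the recipient is connected to this instance.
  private deliverIfLocal(e: BusEvent): void {
    if (Object.prototype.hasOwnProperty.call(this.inboxes, e.to)) {
      this.inboxes[e.to].push(e.payload);
    }
  }
}
```

With two instances on the same bus, a client connected to the first can reach a client connected to the second, while each instance keeps state only for its own connections.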

Page 16

Big pain: scalability
Horizontal scalability

• Horizontal scalability
• Business as usual: reverse proxy load balancing.
• Usual suspect: NGINX

[Diagram: NGINX, Redis store]
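When Socket.io falls back to HTTP long-polling, successive requests from one client generally need to reach the same server instance, so the reverse proxy is usually configured for sticky routing (NGINX offers the `ip_hash` directive for this). The sketch below only illustrates the idea of hashing a client address to a fixed upstream. It is an assumption-laden toy, not NGINX's actual algorithm, and `pickUpstream` and the upstream names are hypothetical.

```typescript
// Deterministically map a client IP to one upstream, so repeated
// requests from the same client land on the same signaling instance.
function pickUpstream(clientIp: string, upstreams: string[]): string {
  if (upstreams.length === 0) {
    throw new Error("no upstreams configured");
  }
  // Simple 32-bit string hash (illustrative only).
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0;
  }
  return upstreams[hash % upstreams.length];
}
```

A call such as `pickUpstream("203.0.113.7", ["node-1:3000", "node-2:3000"])` returns the same entry every time for that address, which is the property sticky routing relies on.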

Page 17

Big pain: scalability
Node zombification

• Node fails often
• Back-end (B-E) update deployment = downtime?

Cluster API is nice, Forever is nice, pm2 is better:

• Keep alive
• Clustering / load balancing
• Log aggregation
• Terminal monitoring
• …

[Diagram: core]
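The "update deployment = downtime?" question is usually answered with a rolling restart: workers are updated one at a time so the rest keep serving. The following is a minimal sketch of that idea with simulated workers. It does not use pm2 or the Node cluster API, and `SignalingWorker` and `rollingUpdate` are hypothetical names.

```typescript
// A simulated worker process behind the load balancer.
interface SignalingWorker {
  id: number;
  up: boolean;
  version: string;
}

// Update workers one by one; `onStep` reports how many workers are
// still serving while one of them is down for the update.
function rollingUpdate(
  workers: SignalingWorker[],
  newVersion: string,
  onStep?: (available: number) => void
): void {
  for (const worker of workers) {
    worker.up = false; // take a single worker out of rotation
    if (onStep) {
      onStep(workers.filter((w) => w.up).length);
    }
    worker.version = newVersion; // deploy the new code
    worker.up = true; // bring it back before touching the next one
  }
}
```

With N workers that all start up, at least N - 1 remain available at every step, which is why a pool of at least two workers is needed to avoid downtime during an update.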

Page 19

Advanced infrastructure: Others

- Session management
- Auth
- Usage
- Payment

Keys to state of the art WebRTC infrastructure

- MCUs
- TURN/STUN