WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC infrastructure: the secret sauce
Dr. Alex Gouaillard, CTO – Temasys Communications
Plan of the presentation
- The reference code
- A little bit better, but still the usual suspects
- The first real pain: Scalability
- Other things to consider
The reference code: appRTC
• Scalable
• Maintained
• Reliable ??
Diagram labels: Alice, Web server, Signaling server, STUN, TURN, TURN MNGT, ICE UA, HTTP / AJAX / REST, GAE Channel.
The reference code: appRTC Signaling Design
Diagram labels: Alice, ICE UA, Signaling server, DB, HTTP / AJAX / REST, GAE API Calls, GAE Channel.
Annotations on the diagram:
• Induce delays
• Just unreliable
Call flow diagram: CALLER, SIG-SERVER, CALLEE
Phase labels: Connection, Handshake (SDP O/A), In call, Frozen, Done.
Signaling messages: CONNECT, ENTER, WELCOME, OFFER, ANSWER, CANDIDATES, BYE.
Caller: Create PC, Add local stream(s), Create offer, <modify sdp>, SetLocal(offer), Send offer; later SetRemote(answer), addRemoteStream.
Callee: Create PC, SetRemote(offer), addRemoteStream(s), Add local stream(s), Create answer, <modify sdp>, SetLocal(answer), Send answer.
Both sides: onIceCandidate, <Filter candidates>, Send candidate; on receipt, <Filter candidates>, addIceCandidate.
PeerConnection states, caller: stable, have-local-offer, stable, close.
PeerConnection states, callee: stable, have-remote-offer, stable, close.
ICEGathering states: new, gathering, complete.
ICEConnection states: new, checking, connected, disconnected, failed, completed, close.
© Temasys Communications, pvt, ltd, 2014 Document provided under CC BY-NC 4.0
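The PeerConnection state labels in the call flow can be read as a small state machine. The following is a minimal sketch of that reading, not appRTC code: the names `SdpAction`, `transitions`, `step` and `run` are made up for illustration, and only the states shown on the slide are modelled (the full WebRTC state machine has more states and transitions).

```typescript
// Signaling states shown on the slide (a subset of what WebRTC defines).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer" | "closed";

// Operations from the call flow that change the signaling state (hypothetical names).
type SdpAction =
  | "setLocal(offer)"
  | "setRemote(offer)"
  | "setLocal(answer)"
  | "setRemote(answer)"
  | "close";

// Allowed transitions in this simplified model; anything missing is an ordering error.
const transitions: Record<SignalingState, Partial<Record<SdpAction, SignalingState>>> = {
  "stable": {
    "setLocal(offer)": "have-local-offer",   // caller side
    "setRemote(offer)": "have-remote-offer", // callee side
    "close": "closed",
  },
  "have-local-offer": { "setRemote(answer)": "stable", "close": "closed" },
  "have-remote-offer": { "setLocal(answer)": "stable", "close": "closed" },
  "closed": {},
};

// Apply one action, throwing if it arrives in a state where it is not allowed.
function step(state: SignalingState, action: SdpAction): SignalingState {
  const next = transitions[state][action];
  if (next === undefined) {
    throw new Error(`"${action}" is not valid in state "${state}"`);
  }
  return next;
}

// Replay a sequence of actions starting from "stable".
function run(actions: SdpAction[]): SignalingState {
  let state: SignalingState = "stable";
  for (const action of actions) {
    state = step(state, action);
  }
  return state;
}
```

In this model the caller path `run(["setLocal(offer)", "setRemote(answer)"])` ends in `"stable"`, while an answer that arrives before any local offer makes `step` throw.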
Call flow annotation: ORDER & TIMING SENSITIVE
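One common way to cope with the ordering problem is to hold remote ICE candidates until the remote description has been applied, since adding a candidate before that point fails. The sketch below illustrates that idea only; it is not taken from the talk. `PeerLike`, `CandidateBuffer` and `demo` are hypothetical names, and `PeerLike` is a stand-in for the two peer connection calls the sketch needs.

```typescript
// Hypothetical stand-in for the two peer connection calls used here.
interface PeerLike {
  setRemoteDescription(sdp: string): void;
  addIceCandidate(candidate: string): void;
}

// Queues remote candidates that arrive before the remote description.
class CandidateBuffer {
  private pc: PeerLike;
  private remoteSet = false;
  private pending: string[] = [];

  constructor(pc: PeerLike) {
    this.pc = pc;
  }

  // Called when the OFFER or ANSWER arrives from the signaling server.
  onRemoteDescription(sdp: string): void {
    this.pc.setRemoteDescription(sdp);
    this.remoteSet = true;
    // Flush everything that arrived too early, in arrival order.
    for (const candidate of this.pending) {
      this.pc.addIceCandidate(candidate);
    }
    this.pending = [];
  }

  // Called for each CANDIDATES message from the signaling server.
  onRemoteCandidate(candidate: string): void {
    if (this.remoteSet) {
      this.pc.addIceCandidate(candidate);
    } else {
      this.pending.push(candidate);
    }
  }
}

// Usage with a fake peer that only records the order of calls.
function demo(): string[] {
  const log: string[] = [];
  const fake: PeerLike = {
    setRemoteDescription: (sdp) => { log.push(`setRemote:${sdp}`); },
    addIceCandidate: (c) => { log.push(`candidate:${c}`); },
  };
  const buffer = new CandidateBuffer(fake);
  buffer.onRemoteCandidate("a"); // arrives before the offer
  buffer.onRemoteCandidate("b");
  buffer.onRemoteDescription("offer");
  buffer.onRemoteCandidate("c"); // arrives after the offer
  return log;
}
```

With an unreliable or reordering signaling channel, a buffer like this keeps the peer connection calls in a valid order without adding round trips.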
The reference code
You can make it reliable
- But then the handshake stretches to as much as 10 s …
Better Design
Usual architecture
Diagram labels: Alice, Node.js server, Socket.io.
• Much more reliable
• Much faster (< 1 s)
• Huge ecosystem
• Not scalable
• Not maintained => many API/SDK
Better Design
Scalability
• Bandwidth-intensive instances in AWS?
• Socket-intensive instances in AWS?
• Socket.io 0.9 not CPU scalable
• Horizontal scalability
• Node fails often
• B-E update deployment = downtime
Big Pain: Scalability
CPU Scalability
• Bandwidth-intensive instances in AWS?
• BIG instance with HVM (c3.8xl)
• Socket.io 0.9 not CPU scalable
• Socket.io 0.9 scales by using a store to replicate the data object to each instance of the signaling server.
• That just replicates the load everywhere: all instances are essentially doing the same work.
• Socket.io 1.x changes the way the store works. The store becomes an event bus, and each instance runs independently, handling only the clients connected to it.
• All instances still listen to all events, however, so two clients connected to different instances can still talk!
Diagram labels: core, Redis store.
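The event-bus model described for Socket.io 1.x can be illustrated with a small in-memory simulation. This is only a sketch of the idea under stated assumptions: it does not use the Socket.io or Redis APIs, and `EventBus`, `SignalingInstance` and `BusEvent` are hypothetical names, with `EventBus` standing in for the shared store.

```typescript
// A message routed between clients (hypothetical shape).
type BusEvent = { from: string; to: string; payload: string };
type Listener = (event: BusEvent) => void;

// In-memory stand-in for the shared store acting as an event bus.
class EventBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Every subscribed instance sees every event.
  publish(event: BusEvent): void {
    for (const listener of this.listeners) {
      listener(event);
    }
  }
}

// One signaling server instance: it only tracks its own clients.
class SignalingInstance {
  readonly delivered: BusEvent[] = [];
  private clients = new Set<string>();
  private bus: EventBus;

  constructor(bus: EventBus) {
    this.bus = bus;
    bus.subscribe((event) => this.onEvent(event));
  }

  connect(clientId: string): void {
    this.clients.add(clientId);
  }

  // A local client sends a message; the instance does not need to know
  // where the recipient is connected.
  send(from: string, to: string, payload: string): void {
    this.bus.publish({ from, to, payload });
  }

  // Deliver only if the recipient is connected to this instance.
  private onEvent(event: BusEvent): void {
    if (this.clients.has(event.to)) {
      this.delivered.push(event);
    }
  }
}
```

For example, with `alice` connected to one instance and `bob` to another on the same bus, `send("alice", "bob", "offer")` on the first instance ends up in the second instance's `delivered` list and nowhere else, so no per-client state is replicated.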
Big Pain: Scalability
Horizontal Scalability
• Horizontal scalability
• Business as usual: reverse proxy load balancing.
• Usual suspect: NGINX
Diagram labels: Redis store, NGINX.
Big Pain: Scalability
Node Zombification
• Node fails often
• B-E update deployment = downtime?
Diagram label: core.
Cluster API is nice, Forever is nice, pm2 is better
• Keep alive
• Clustering / load balancing
• Log aggregation
• Terminal monitoring
• …
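The "keep alive" item boils down to restarting a worker whenever it exits. The sketch below shows that loop in isolation, with the process spawning injected so it can be run without forking anything. It is not how pm2, Forever or the Cluster API are implemented; `WorkerHandle`, `Spawn`, `keepAlive` and the `maxRestarts` cap are all assumptions made for illustration.

```typescript
// Hypothetical handle for a running worker: it can report its own exit.
interface WorkerHandle {
  onExit(callback: () => void): void;
}

// Hypothetical factory that starts a new worker and returns its handle.
type Spawn = () => WorkerHandle;

// Start a worker and restart it each time it exits, up to maxRestarts times.
function keepAlive(spawn: Spawn, maxRestarts: number): { restarts: () => number } {
  let restarts = 0;

  const start = (): void => {
    const worker = spawn();
    worker.onExit(() => {
      // Stop restarting once the cap is reached, to avoid a crash loop.
      if (restarts < maxRestarts) {
        restarts += 1;
        start();
      }
    });
  };

  start();
  return { restarts: () => restarts };
}
```

In a real deployment the `spawn` function would wrap whatever actually launches the signaling server process, and a tool such as pm2 adds the clustering, log aggregation and monitoring listed above on top of this basic loop.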
Advanced infrastructure: Others
- Session management
- Auth
- Usage
- Payment
Keys to state-of-the-art WebRTC infrastructure
- MCUs
- TURN/STUN