Scaling SIP Servers
Sankaran Narayanan, joint work with the CINEMA team
IRT Group Meeting – April 17, 2002
Agenda
• Introduction
• Issues in scaling
• Facets of sipd architecture
• Some results
• Conclusion and future work
Introduction – SIP servers
• SIP signaling: proxy and redirect servers
• Proxies: call routing by contact location; UDP, TCP, or TLS transport; stateful or stateless; programmable scripts
• User location: registrars, backed by an SQL database
What is scale?
• Large call volumes on commodity hardware [Schu0012:Industrial]
• Response times (mean, deviation) and turnaround time
• Goal: a delay budget [SIPstone] of R1 < 500 ms and R2 < 2 s
• Class-5 switches handle > 750K BHCA (busy hour call attempts)
[Figure: message flow REGISTER / 200 OK, then INVITE / 180 / 200 / ACK relayed through the proxy, with the delays R1 and R2 marked.]
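BHCA is easier to compare with per-message server benchmarks once it is converted to a rate. The sketch below spreads the busy-hour count evenly over 3600 seconds (real traffic peaks above this average). The six-messages-per-call factor is an assumption for illustration (INVITE, 180, 200, ACK, BYE and its 200, with the proxy staying in the path), not a figure from the talk.

```python
SECONDS_PER_HOUR = 3600

def calls_per_second(bhca: float) -> float:
    """Average call attempts per second, spreading BHCA evenly over the busy hour."""
    return bhca / SECONDS_PER_HOUR

def messages_per_second(bhca: float, messages_per_call: int = 6) -> float:
    """SIP messages per second at the proxy, assuming a fixed cost per call.

    The default of 6 (INVITE, 180, 200, ACK, BYE, 200) is an illustrative
    assumption; forking, retransmissions and REGISTERs change it.
    """
    return calls_per_second(bhca) * messages_per_call

print(f"{calls_per_second(750_000):.1f} calls/s")        # 208.3 calls/s
print(f"{messages_per_second(750_000):.0f} messages/s")  # 1250 messages/s
```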
Limits to scaling
• Not CPU-bound
• Network I/O blocks: waiting for responses; latency of contact and DNS lookups
• OS resource limits: open files (≤ 1024 on Unix); LWPs (Solaris) vs. user/kernel threads (Linux, Windows)
• Try not to customize and recompile (parts of) the OS, or move the server into the kernel (khttpd, AFPA, …)
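On Unix-like systems the open-file limit is a per-process resource limit with a soft and a hard value. A minimal sketch using Python's standard `resource` module (POSIX only) reads the limit and raises the soft value toward a target; 4096 is an arbitrary illustrative choice, and raising the hard limit itself normally requires administrator privileges.

```python
import resource

def raise_nofile_limit(target: int = 4096) -> tuple:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard                      # already unlimited, nothing to do
    # Unprivileged processes may raise the soft limit only up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_nofile_limit())
```

Raising the soft limit only postpones the problem; the socket-management discussion later addresses not needing one descriptor per transaction in the first place.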
The problem
• Scaling CPU-bound jobs (throughput = 1/delay): hardware (CPU speed, RAM, …), software (better OS, scheduler, …), algorithm (optimize protocol processing)
• Blocking (network, disk I/O) is expensive
• Hypothesis: turn I/O-bound into CPU-bound by reducing blocking; optimized resource usage gives stability at high loads
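The hypothesis can be checked with a back-of-the-envelope model on one CPU: each request needs `cpu` seconds of processing plus `block` seconds waiting on I/O, during which it holds a thread but not the CPU. The model and the 1 ms CPU cost are illustrative assumptions, not sipd measurements; the 10 ms blocking figure is the approximate contact-lookup time quoted under Blocking (2).

```python
def max_throughput(cpu: float, block: float, threads: int = 1) -> float:
    """Idealized upper bound on requests/s on one CPU (no scheduling overhead).

    cpu     -- CPU seconds per request
    block   -- seconds per request spent blocked on I/O (holds a thread, not the CPU)
    threads -- number of worker threads
    """
    per_thread = 1.0 / (cpu + block)   # a thread completes one request every cpu+block s
    cpu_limit = 1.0 / cpu              # the CPU saturates at 1/cpu requests/s
    return min(threads * per_thread, cpu_limit)

# CPU-bound: throughput = 1/delay
print(max_throughput(cpu=0.001, block=0.0))                 # 1000.0
# One thread that blocks 10 ms per request: I/O-bound
print(round(max_throughput(cpu=0.001, block=0.010), 1))     # 90.9
# Enough threads (or removing the blocking) restores the CPU bound
print(max_throughput(cpu=0.001, block=0.010, threads=20))   # 1000.0
```

The model ignores thread-switching costs, which is exactly what grows when the number of threads is raised to hide blocking.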
Facets of sipd architecture
• Blocking
• Process models
• Socket management
• Protocol processing
Blocking
• Sources: mutexes, events (socket, timeout), fread
• A queue builds up, with potentially high variability: a tandem queue system
• Easy to fix: non-blocking calls (event-driven, covered later), or move the queue to a different thread (lazy logger)
Blocking logger: Logger { lock; write; unlock; }
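A minimal sketch of the lazy logger, assuming a Python server: request threads only put the message on a queue, and one background thread performs the blocking write, so no request thread holds a lock while the file is written. `LazyLogger` and its methods are hypothetical names, not sipd's interface.

```python
import io
import queue
import threading

class LazyLogger:
    """Hypothetical lazy logger: callers enqueue, one thread does the blocking I/O."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, sink):
        self._sink = sink                  # any object with a write(str) method
        self._q = queue.Queue()            # unbounded hand-off queue
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def log(self, msg: str) -> None:
        # Returns as soon as the message is queued; the caller never waits on I/O.
        self._q.put(msg)

    def _run(self) -> None:
        while True:
            msg = self._q.get()
            if msg is self._STOP:
                break
            self._sink.write(msg + "\n")   # the only place that blocks on the sink

    def close(self) -> None:
        # The sentinel is queued behind pending messages, so they are written first.
        self._q.put(self._STOP)
        self._writer.join()

buf = io.StringIO()
log = LazyLogger(buf)
log.log("INVITE received")
log.log("180 Ringing sent")
log.close()
print(buf.getvalue(), end="")
```

The queue still grows if messages arrive faster than the sink absorbs them; the gain is only that request threads no longer wait for the write.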
Blocking (2)
• Call routing involves one or more contact lookups, approximately 10 ms per query
• Cache: works well for sipd-style servers
• Fetch-on-demand with replacement is harder; loading the entire database is easy, but long-lived servers need a refresh
• Potentially useful for DNS SRV lookups (?)
[Figure: SQL database feeding a cache with periodic refresh; cached lookups take < 1 ms.]
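A sketch of the "load the entire database, refresh periodically" strategy. `load_all` stands in for a hypothetical bulk query returning every registration as a mapping from address-of-record to contact URIs, and the 60-second interval is an arbitrary default. For brevity the refresh happens lazily on the first lookup after the interval expires; a long-lived server would more likely refresh from a timer thread so that no request pays for the reload.

```python
import time
from typing import Callable, Dict, List, Optional

Contacts = Dict[str, List[str]]   # address-of-record -> contact URIs

class ContactCache:
    """Hypothetical in-memory copy of the registration table with periodic refresh."""

    def __init__(self, load_all: Callable[[], Contacts],
                 refresh_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_all = load_all          # one bulk query instead of one per call
        self._interval = refresh_interval  # seconds between full reloads
        self._clock = clock
        self._refresh()

    def _refresh(self) -> None:
        self._table = dict(self._load_all())   # snapshot of the whole table
        self._loaded_at = self._clock()

    def lookup(self, aor: str) -> Optional[List[str]]:
        # Reload once the snapshot is older than the refresh interval.
        if self._clock() - self._loaded_at >= self._interval:
            self._refresh()
        return self._table.get(aor)

# Usage with a stand-in for the SQL query:
fake_db: Contacts = {"sip:alice@example.com": ["sip:alice@192.0.2.10"]}
cache = ContactCache(lambda: fake_db, refresh_interval=60.0)
print(cache.lookup("sip:alice@example.com"))
```

The cost of this design is staleness: a registration made just after a reload is invisible until the next one.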
REGISTER performance (single-CPU Sun Ultra10)
Response time is constant with the cache (FastSQL).
Process models (1)
• One thread per request does not scale: too many threads over a short timescale
• Stateless proxy: 2–4 threads per transaction
• High load affects throughput
[Figure: incoming requests R1–R4 under the thread-per-request model; plot of throughput vs. load.]
Process models (2)
• Thread pool + queue: less thread overhead, more useful processing
• Overload management: drop requests before responses; drop tail
• Not enough if holding time is high: each request holds (blocks) a thread
[Figure: incoming requests R1–R4 queued for a fixed number of threads; plot of throughput vs. load.]
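The thread-pool model with overload management might look like the following Python sketch. The capacity, message labels and exact eviction rule are assumptions about how "drop requests before responses, drop tail" could be realized, not sipd's code. One plausible rationale for protecting responses is that a response completes a transaction the server has already spent work on, whereas a dropped request over UDP will be retransmitted by its sender.

```python
import threading
from collections import deque
from typing import Callable, List, Optional, Tuple

Item = Tuple[str, bool]   # (message, is_response)

class BoundedSipQueue:
    """Hypothetical bounded queue in front of a fixed-size thread pool.

    Assumed overload policy: a request arriving at a full queue is dropped
    (drop tail); a response arriving at a full queue evicts the newest queued
    request, and is itself dropped only if no request is queued.
    """

    def __init__(self, capacity: int) -> None:
        self._items: deque = deque()
        self._capacity = capacity
        self._cond = threading.Condition()
        self._closed = False
        self.dropped = 0

    def put(self, msg: str, is_response: bool) -> bool:
        with self._cond:
            if len(self._items) >= self._capacity:
                victim = self._newest_request() if is_response else None
                self.dropped += 1
                if victim is None:
                    return False                 # drop the arriving message
                del self._items[victim]          # sacrifice a queued request
            self._items.append((msg, is_response))
            self._cond.notify()
            return True

    def _newest_request(self) -> Optional[int]:
        for i in range(len(self._items) - 1, -1, -1):
            if not self._items[i][1]:
                return i
        return None

    def get(self) -> Optional[Item]:
        """Block until an item is available; return None once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            return self._items.popleft() if self._items else None

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()

def run_pool(q: BoundedSipQueue, handler: Callable[[str, bool], None],
             n_threads: int) -> List[threading.Thread]:
    """Start a fixed number of worker threads that drain q."""
    def worker() -> None:
        while (item := q.get()) is not None:
            handler(*item)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads

q = BoundedSipQueue(capacity=3)
for m, is_resp in [("INVITE-1", False), ("INVITE-2", False), ("200-1", True),
                   ("INVITE-3", False), ("180-2", True)]:
    q.put(m, is_resp)        # INVITE-3 is dropped; 180-2 evicts INVITE-2

handled: List[str] = []
handled_lock = threading.Lock()

def handle(m: str, is_response: bool) -> None:
    with handled_lock:
        handled.append(m)

workers = run_pool(q, handle, n_threads=2)
q.close()                    # workers drain what is queued, then exit
for w in workers:
    w.join()
print(sorted(handled), "dropped:", q.dropped)
```

Each worker still blocks for the full holding time of its request, so this bounds thread overhead but does not help when holding times are long.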
Stateless proxy (Solaris)
Turnaround time is almost constant for the stateless proxy.
• The sudden increase in response time is a client problem
• UDP losses on the Ultra10 at (120 × 6 × 500 × 8) bps ≈ 2.9 Mb/s
Stateless proxy (Linux)
• Request turnaround time breaks down
• Response turnaround time is constant
• Effect of high holding times and thread scheduling
• How to set the queue size? (to investigate)
Queue evolution for sipd
Number of requests (y-axis) waiting in the queue for a free thread on Solaris (left) and Linux (right) over a period of up-time (x-axis).
Process models (3)
• The blocking thread model needs "too many" threads: a stateful transaction stays around for 30 s
• Return the thread to the free pool instead of blocking
• Event-driven architectures: state transitions are triggered by a global event scheduler, e.g. OnIncoming1xx(), OnInviteTimeout(), …
• SIP-CGI: multiple pre-forked processes
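The event-driven alternative can be sketched with a single scheduler that dispatches timer and message events to per-transaction handlers, so a pending INVITE holds a small state object rather than a blocked thread. The Python names `on_incoming_1xx` and `on_invite_timeout` mirror the slide's `OnIncoming1xx()` and `OnInviteTimeout()`; the states, the 30 s timeout and the simulated clock are simplifying assumptions, not sipd's implementation.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

class EventScheduler:
    """Hypothetical global scheduler: one loop dispatches every event in time order."""

    def __init__(self) -> None:
        self.now = 0.0                                   # simulated clock (seconds)
        self._events: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()                    # FIFO tie-break at equal times

    def at(self, delay: float, handler: Callable[[], None]) -> None:
        heapq.heappush(self._events, (self.now + delay, next(self._seq), handler))

    def run(self) -> None:
        while self._events:
            self.now, _, handler = heapq.heappop(self._events)
            handler()                                    # handlers must never block

class InviteTransaction:
    """Per-transaction state object; no thread is parked while it waits."""

    def __init__(self, sched: EventScheduler, timeout: float = 30.0) -> None:
        self.sched = sched
        self.state = "calling"
        sched.at(timeout, self.on_invite_timeout)        # cf. OnInviteTimeout()

    def on_incoming_1xx(self, code: int) -> None:        # cf. OnIncoming1xx()
        if self.state == "calling":
            self.state = "proceeding"

    def on_invite_timeout(self) -> None:
        if self.state == "calling":                      # no provisional response in time
            self.state = "terminated"

sched = EventScheduler()
t1 = InviteTransaction(sched)
t2 = InviteTransaction(sched)
sched.at(0.2, lambda: t1.on_incoming_1xx(180))           # only t1 hears 180 Ringing
sched.run()
print(t1.state, t2.state)                                # proceeding terminated
```

A real server loop would also wait on sockets (for example with select or poll) and use a wall clock; the point is only that thousands of pending transactions cost thousands of small objects instead of thousands of blocked threads.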
Socket management
• Problem: open-socket limit (1024), "liveness" detection, retransmission
• One socket per transaction does not scale
• A global socket, used while the downstream server is alive (soft state), works for UDP
• Hard for TCP/TLS, which need connections; worse for Java servers, which lack select and poll
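A minimal Python sketch of "one global UDP socket plus soft-state liveness": all transactions share a single socket, so the number of open descriptors no longer grows with the number of transactions, and a downstream server counts as alive only while datagrams from it keep arriving. The class, the 30 s time-to-live and the placeholder payloads are assumptions for illustration; the sketch omits retransmission and does not address TCP/TLS, where each downstream connection still needs its own socket.

```python
import socket
import time
from typing import Dict, Tuple

Addr = Tuple[str, int]

class SharedUdpTransport:
    """Hypothetical transport: all transactions share one UDP socket, and
    downstream liveness is soft state that lapses unless refreshed."""

    def __init__(self, bind: Addr = ("127.0.0.1", 0), ttl: float = 30.0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(bind)
        self.ttl = ttl                              # seconds a peer counts as alive
        self._last_heard: Dict[Addr, float] = {}    # peer -> time of last datagram

    def send(self, data: bytes, peer: Addr) -> None:
        self.sock.sendto(data, peer)                # one descriptor for every transaction

    def receive(self, bufsize: int = 65535) -> Tuple[bytes, Addr]:
        data, peer = self.sock.recvfrom(bufsize)
        self._last_heard[peer] = time.monotonic()   # any datagram refreshes liveness
        return data, peer

    def is_alive(self, peer: Addr) -> bool:
        heard = self._last_heard.get(peer)
        return heard is not None and time.monotonic() - heard < self.ttl

# Demo on the loopback interface with a stand-in downstream server.
proxy = SharedUdpTransport()
proxy.sock.settimeout(2.0)
downstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
downstream.bind(("127.0.0.1", 0))
downstream.settimeout(2.0)
down_addr = downstream.getsockname()

alive_before = proxy.is_alive(down_addr)            # never heard from yet
for branch in ("z9hG4bK1", "z9hG4bK2", "z9hG4bK3"):  # three "transactions"
    proxy.send(f"INVITE branch={branch}".encode(), down_addr)
received = [downstream.recvfrom(65535)[0] for _ in range(3)]
downstream.sendto(b"SIP/2.0 100 Trying", proxy.sock.getsockname())
proxy.receive()
print(alive_before, proxy.is_alive(down_addr))      # False True
```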
Optimizing protocol processing
• Not very useful if the CPU is not the bottleneck
• Text protocol: parsing and formatting overheads
• Order of headers matters (Via)
• Other optimizations: parse-on-demand, date formatting, …
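Parse-on-demand can be sketched as a message wrapper that defers splitting the header block until the first header is actually read; a finer-grained version could stop scanning at the first matching line. This is a Python illustration under stated simplifications: the class and method names are hypothetical, and a real parser must also handle compact forms (such as `v` for Via), folded lines and comma-separated header values.

```python
from typing import Dict, List, Optional

class LazySipMessage:
    """Hypothetical parse-on-demand wrapper around a raw SIP message.

    Only the start line is split off eagerly; the header block is indexed the
    first time any header is requested, and the body is never parsed here.
    """

    def __init__(self, raw: str) -> None:
        head, _, self.body = raw.partition("\r\n\r\n")
        self.start_line, _, self._header_block = head.partition("\r\n")
        self._headers: Optional[Dict[str, List[str]]] = None   # built on first use

    def _index(self) -> Dict[str, List[str]]:
        if self._headers is None:
            self._headers = {}
            for line in self._header_block.split("\r\n"):
                if not line:
                    continue
                name, _, value = line.partition(":")
                self._headers.setdefault(name.strip().lower(), []).append(value.strip())
        return self._headers

    def header(self, name: str) -> List[str]:
        """All values of a header (case-insensitive name), in message order."""
        return self._index().get(name.lower(), [])

    def top_via(self) -> Optional[str]:
        # Via order matters: only the first (topmost) value is returned.
        vias = self.header("via")
        return vias[0] if vias else None

raw = ("INVITE sip:bob@example.com SIP/2.0\r\n"
       "Via: SIP/2.0/UDP proxy.example.com;branch=z9hG4bKa\r\n"
       "Via: SIP/2.0/UDP alice-pc.example.com;branch=z9hG4bKb\r\n"
       "Call-ID: 1234@alice-pc.example.com\r\n"
       "Content-Length: 0\r\n\r\n")
msg = LazySipMessage(raw)
print(msg.start_line)   # INVITE sip:bob@example.com SIP/2.0 (headers still unparsed)
print(msg.top_via())    # SIP/2.0/UDP proxy.example.com;branch=z9hG4bKa
```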
Conclusion
• Unlike web servers: can be stateful, less disk I/O, smaller impact of TCP stack behavior, …
• Pros: UDP, stateless routing, load balancing using DNS, …
• Challenges: scaling the state machine, towards 2.5M BHCA (3600 messages/s)
• Event-driven architecture (SEDA?), resource management (file limits, threads), tuning the operating system (scheduler, …)
Future work
• Stateful proxy performance
• Evaluate the event-driven architecture
• Effect of request forking (> 1 contact) on server behavior
• Programmable scripts
• Queue management and overload control
• Other types of servers (conference servers, media servers, etc.)
References
• CINEMA web page. http://www.cs.columbia.edu/IRT/cinema
• H. Schulzrinne. "Industrial strength internet telephony," presentation at the 6th SIP bakeoff, Dec. 2000.
• H. Schulzrinne et al. "SIPstone – Benchmarking SIP server performance," CS technical report, Columbia University.