Understanding VoIP

Post on 11-Jan-2016


Transcript of Understanding VoIP

Understanding VoIP

Dr. Jonathan Rosenberg

Chief Technology Strategist

Skype

What is this course about?

Getting “under the hood” and understanding how VoIP works

An exploration of the protocols and technologies behind VoIP

Conveying an understanding of the various problems that need to be solved for VoIP to work

What this course is not about

A general introduction to telephony

A detailed cookbook or deployment guide to VoIP

A product survey of VoIP and IP telephony products

In particular, Cisco or Skype products are not discussed except in passing

Ground Rules

Ask questions ANY TIME! I will be bored if this is a one-way conversation

No question is too stupid

Laughing at or mocking anyone’s questions is unacceptable

Please ask off-the-wall or exploratory questions: there is a lot that is not in here!

Agenda

Breaking up the problem

Voice and video coding

Voice and video transport

Quality of Service

Signaling

Security

NAT traversal

Non-Agenda

Programming APIs

Emergency services, lawful intercept

Numbering, routing, naming (ENUM, TRIP)

PSTN interworking

Billing, provisioning, OAM

Conferencing, IVR, applications

Breaking Up the Problem

[Figure: two endpoints connected across an IP network, supported by signaling servers, directories/databases, accounting/billing, presence servers, media servers, OAM, and application servers. Protocols labeled: RTP over IP (media between endpoints); SIP, H.323, MGCP, H.248; SIMPLE, XMPP; SIP; LDAP, ENUM; RADIUS, DIAMETER.]

Voice Coding

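Voice Endpoint Model

[Figure: voice endpoint model. On the 2-wire interface side: hybrid echo canceller, nonlinear processing, loss admin, and DTMF/tone detection and generation. Transmit path: silence detection decides speech or no speech; speech goes through speech encoding and the packetizer. Receive path: unpacker, then speech decoding, or comfort noise generation when there is no speech.]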

Codecs

Waveform codecs:

Directly encode speech in an efficient way by exploiting temporal and/or spectral characteristics

Attempt to reproduce the input signal’s waveform by minimizing the error between the input and coded signals

Source codecs / vocoders:

Estimate and efficiently encode a parametric representation of speech

CELP

Minimizes perceptually weighted error, similar to waveform coders

Short-term predictor is the LP (vocal tract) filter

Excitation is obtained from a codebook and a long-term pitch predictor

Closed-loop search is MIPS-intensive
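CELP's short-term predictor is a linear-prediction (LP) filter fitted to each speech frame. The slide does not say how the coefficients are computed; a common choice, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion. A minimal pure-Python sketch (function names are illustrative, and windowing is omitted):

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame (no windowing)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LP coefficients a[0..order-1] such that x[n] is predicted as
    sum(a[i] * x[n - 1 - i]), plus the remaining prediction-error energy."""
    if r[0] == 0:
        return [0.0] * order, 0.0  # silent frame: nothing to predict
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for the order-m predictor.
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


# A first-order process with pole 0.5 has r[k] = 0.5**k, so an order-2 fit
# should put all the weight on the first coefficient.
print(levinson_durbin([1.0, 0.5, 0.25], order=2))  # ([0.5, 0.0], 0.75)
```

In a full CELP coder these coefficients would define the synthesis filter used inside the closed-loop codebook search, which is where most of the MIPS go.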

Codec Comparison

Codec | Sampling | Bitrate | Latency | Comments
G.711 | 8 kHz | 64 kbps | 125 µs | PSTN codec
G.729 | 8 kHz | 8 kbps | 10 ms | CS-ACELP
G.723.1 | 8 kHz | 5.3/6.3 kbps | 37.5 ms |
AMR | 8 kHz | 4.75–12.2 kbps | 25 ms | GSM codec
G.722.1 | 16 kHz | 24/32 kbps | 40 ms | Polycom Siren
AMR-WB | 16 kHz | 6.6–23.85 kbps | 25 ms | GSM wideband; encumbered
SILK | 8, 12, 16, 24 kHz (SWB) | 6–40 kbps | 25 ms | Skype codec

Listen at: http://www.voiceage.com/listeningroom.php

Echo Cancellation

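[Figure: echo canceller placed between the packet network (digital side) and the 2-4-wire hybrid (analog side). The canceller estimates the echo path, subtracts its estimate from the hybrid reflection, and passes the result through a non-linear processor. ERL is marked across the hybrid and ERLE across the canceller.]

This echo canceller cancels ‘local’ echoes from the hybrid reflection

ERL: Echo Return Loss (dB)

ERLE: Echo Return Loss Enhancement (dB)

Also important: double-talk handling and convergence time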

Echo Canceller Specifics

The voice echo path is like an electrical circuit

If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo

The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays)

The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa

Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter)

The non-linear processor is used to ‘clean-up’ any residual echo left over from the canceller
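The slides do not say which adaptation algorithm the canceller uses. A widely used choice, assumed here, is a normalized least-mean-squares (NLMS) adaptive FIR filter: it learns the echo path from the far-end signal and subtracts its echo estimate from the near-end signal. This sketch omits double-talk detection and the non-linear processor; the tap count, step size, and simulated echo path are illustrative.

```python
import random


def nlms_echo_canceller(far_end, near_end, taps=8, mu=0.5, eps=1e-9):
    """Adaptive echo cancellation with an NLMS FIR filter.

    far_end:  samples sent toward the hybrid (the reference signal)
    near_end: samples coming back from the hybrid (echo plus any local speech)
    Returns (filter weights, residual signal sent toward the packet network).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by input power keeps the step size independent of level.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return weights, residual


if __name__ == "__main__":
    rng = random.Random(1)
    echo_path = [0.0, 0.5, 0.3, -0.2]  # hypothetical hybrid impulse response
    far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    near = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
            for n in range(len(far))]
    w, e = nlms_echo_canceller(far, near)
    print([round(v, 3) for v in w[:4]])  # converges toward echo_path
```

Because the filter only models the local hybrid, its length (here 8 taps) plays the role of the "coverage" above; packet-network delay never enters the model.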

Voice Activity Detection

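[Figure: speech magnitude (dB) versus time for two sentences. Speech is detected when the level rises above a signal-to-noise threshold set above the noise floor; a hang-over period, typically fixed at 200 ms, follows each burst. The start of each sentence, before the detector triggers, is lost (front-end speech clipping).]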
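A minimal energy-threshold detector with hang-over, assuming 20 ms frames (so a 200 ms hang-over is 10 frames) and a fixed, already-estimated noise floor. The 6 dB threshold is an illustrative assumption; real detectors also track the noise floor adaptively.

```python
def vad_with_hangover(frame_levels_db, noise_floor_db, snr_threshold_db=6.0, hangover_frames=10):
    """Return one speech (True) / no-speech (False) decision per frame.

    A frame counts as speech when its level exceeds the noise floor by at
    least snr_threshold_db. After speech ends, the detector keeps reporting
    speech for hangover_frames more frames so word endings and short pauses
    are not clipped.
    """
    decisions = []
    remaining = 0
    for level in frame_levels_db:
        if level - noise_floor_db >= snr_threshold_db:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


levels = [-60] * 3 + [-30] * 5 + [-60] * 15  # made-up dB levels, one per 20 ms frame
print(vad_with_hangover(levels, noise_floor_db=-60))
```

Note that the hang-over only protects the tail of a talkspurt; the onset is still clipped because the detector cannot fire before the threshold is crossed.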

Comfort Noise Generation

Silence isn’t golden… it’s annoying

When speech stops, what do you play to the listener?

Simple techniques: play white/pink noise, or replay the last received packet over and over

Fancier technique: the transmitter measures the local “noise environment” and sends a special “comfort noise” (CN) packet as the last packet before silence; the receiver generates noise based on the CN packet

Voice Quality: Mean Opinion Scores

MOS of 4.0 = toll quality

[Figure: test setup. A source sentence such as “Nowadays, a chicken leg is a rare dish” passes through an impairment chain (codec ‘X’ plus channel simulation), and listeners rate what they hear from 1 to 5.]

Rating | Speech quality | Distortion
5 | Excellent | Imperceptible
4 | Good | Just perceptible but not annoying
3 | Fair | Perceptible and slightly annoying
2 | Poor | Annoying but not objectionable
1 | Unsatisfactory | Very annoying and objectionable

Clear Channel MOSs

[Figure: bar chart of clear-channel mean opinion scores on a 1 to 5 scale. G.711 (64 kbit/s PCM) 4.1; G.726 (32 kbit/s ADPCM) 3.8; G.723.1 (6.3 kbit/s MP-MLQ) 3.9; G.729 (8 kbit/s CS-ACELP) 3.9; IS-54 (8 kbit/s North American digital cellular) 3.44.]

MOS Under Varying Conditions (G.729)

Condition | MOS
Average speech level (-20 dBm0) | 3.85
Low input level (-30 dBm0) | 3.54
2 tandem codings | 3.46
3 tandem codings | 2.68
1% frame erasure rate |
5% bit error rate | 3.24
5% FER | 3.02
10% FER |
20% FER |

Video Coding

Key Terms

Term | Description
Frame | An individual picture in the sequence that makes up the video
Frame rate | The number of frames per second of video. 30 is excellent (TV quality)
Resolution | The number of horizontal and vertical pixels. VGA = 640x480
Interlacing | A mechanism for transmitting video by splitting a frame into two fields, one carrying the odd lines and the other the even lines. This is the “i” in 1080i
Progressive | As opposed to interlaced, a method for transmitting video by sending each frame as a whole
HD | High-definition resolutions: 720p is 1280x720 at 60 fps; 1080i is 1920x1080 at 30 fps

Key Concept: Macroblocks

Rectangular block in an image which is a basic unit of compression. Typically 16x16 pixels.

Key Concept: Inter-Frame Prediction

Predict information in the current frame by looking at previous frames, possibly taking motion into account.

Key Concept: Discrete Cosine Transform (DCT)

A technique for representing a macroblock by its component frequencies. Discarding the higher frequencies throws away the finer details without losing the core image.

[Figure: DCT basis patterns, with horizontal frequency increasing along one axis and vertical frequency increasing along the other.]
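A pure-Python sketch of the idea on an 8x8 block (the transform size H.261 and MPEG apply within each 16x16 macroblock): compute an orthonormal 2-D DCT-II, zero the higher-frequency coefficients, and invert. Zeroing by u + v is a deliberate simplification; real encoders quantize the coefficients instead of simply discarding them.

```python
import math

N = 8  # transform size


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II: block[x][y] -> coeffs[u][v]."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2: coeffs[u][v] -> block[x][y]."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def keep_low_frequencies(coeffs, cutoff):
    """Zero every coefficient whose combined frequency index u + v >= cutoff."""
    return [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(N)] for u in range(N)]


def squared_error(a, b):
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(N) for j in range(N))


if __name__ == "__main__":
    ramp = [[100.0 + 10 * x + 5 * y for y in range(N)] for x in range(N)]
    coeffs = dct2(ramp)
    for cutoff in (1, 3, 6, 15):
        approx = idct2(keep_low_frequencies(coeffs, cutoff))
        print(cutoff, round(squared_error(approx, ramp), 3))
```

Because the transform is orthonormal, the reconstruction error equals the energy of the discarded coefficients, so keeping more low frequencies can only reduce it.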

Video Encoder Block Diagram

Key Codec Comparisons

Codec | Timeline | Applications
H.261 | 1990 | ISDN at multiples of 64 kbps
H.263 | 1996 | Early Flash, using the Sorenson Spark implementation. Original RealVideo codec. Required in IMS
H.264/AVC | 2003 | YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real time. Typical VGA 15 fps bitrate = 500 kbps
H.264/SVC | 2007 | “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing
VP7 | 2005 | On2 Technologies codec; Skype; successor to H.263 in Flash

Voice and Video Transport: RTP

RTP: What Is It?

Real-time Transport Protocol, RFC 3550

Product of the AVT working group: proposed standard in 1996 (RFC 1889), full standard in 2004

What does it do?

End-to-end transport of real-time media

Optimized for multicast

Provides sequencing, timing, framing, and loss detection

Provides feedback on reception quality

Provides information on group members

Provides data to correlate audio, video, and other media

Works with any codec, but needs a payload format for each codec

Flexible

RTP: What Isn’t It?

Doesn’t guarantee quality of service: doesn’t reserve network resources and doesn’t guarantee no loss or bounded delay; can work with QoS protocols (RSVP)

Doesn’t provide signaling: other protocols must be used to set up RTP (like SIP or H.323)

Not a specific protocol type: does not run directly on top of IP; runs on top of UDP; no fixed port number

RTP Stack

[Figure: protocol stack with RTP and RTCP on top of UDP, which runs on top of IP.]

Big Picture: RTP, SDP and SIP

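[Figure: two end users, each served by a proxy, connected across an IP network. SIP carrying SDP travels between the users through the proxies, while RTP flows directly between the users.]

Example SDP carried in the SIP messages:

c=IN IP4 123.1.2.3
m=audio 1122 RTP/AVP 0 1
m=video 1130 RTP/AVP 98
a=rtpmap:98 H263/90000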

RTP Components: Data + Control

Data, also called RTP (very confusing)

Usually on an even UDP port (NATs change this; more later)

Provides sequencing, timing, framing, content labeling, and user identification

Control = RTP Control Protocol (RTCP)

Same address as data, but usually one port higher

Provides reception quality, sender statistics, participant information (multicast), and synchronization information

Real-Time Data Transport

Originator breaks the stream into packets (segmentation): application layer framing (ALF)!

Packets are sent; the network may lose, delay, or reorder them

At the receiver, we must reorder, recover, resegment, and resynchronize (clock synchronization!)

[Figure: an RTP source sends RTP packets through the transport system to an RTP sink.]

Source: digitize audio from the mike, silence suppression, echo cancellation, compress audio (G.711: 64 kbps; G.729: 8 kbps; G.723.1: 5.3/6.3 kbps), packetize audio in RTP, send

Sink: receive packets, un-packetize, decompress, comfort noise generation, reorder, recover loss, jitter buffer, D/A conversion to speakers

Jitter Buffer

Packets are delayed by different amounts, but must be played out periodically

Packets may arrive after their designated playout time, which counts as loss

Insert extra delay to compensate

May need to adapt this amount

[Figure: packet arrival times versus playout time.]
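Receivers typically size the jitter buffer from a running jitter estimate. RFC 3550 defines the interarrival jitter reported in RTCP receiver reports as J += (|D| - J)/16, where D is the change in one-way transit time between consecutive packets. The sketch below follows that formula, with send and arrival times in RTP timestamp units (for example, 8 kHz ticks); how a receiver turns the estimate into a playout delay is implementation-specific and not shown.

```python
def interarrival_jitter(send_timestamps, arrival_times):
    """RFC 3550 interarrival jitter estimate, in the same units as the inputs.

    send_timestamps: RTP timestamps of packets, in arrival order
    arrival_times:   arrival times converted to the RTP clock
    """
    jitter = 0.0
    for i in range(1, len(send_timestamps)):
        transit_prev = arrival_times[i - 1] - send_timestamps[i - 1]
        transit = arrival_times[i] - send_timestamps[i]
        d = abs(transit - transit_prev)  # change in one-way transit time
        jitter += (d - jitter) / 16.0    # smoothed, so single spikes count little
    return jitter


# 20 ms packets at 8 kHz are 160 ticks apart; a constant network delay gives zero jitter.
print(interarrival_jitter([0, 160, 320], [100, 260, 420]))
```

The absolute clock offset between sender and receiver cancels out, because only differences of transit times are used.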

RTP Packet Header

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RTP Header Fields

Version: 2

P: indicates padding (for encryption)

X: extension bit

CSRC count (CC): for mixers (later)

M: marker bit, indicates framing. Audio codecs: first packet in a talkspurt. Video: last packet in a frame

Payload Type (PT): indicates the encoding in the RTP packet and allows it to change per packet. Useful for adaptation, DTMF codecs, and silence codecs

SN (sequence number): defines the ordering of packets

Timestamp: when the packet was generated

SSRC: synchronization source identifier

CSRC: list of mixed users (contributing sources)
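The fixed 12-byte header can be packed and parsed with Python's struct module. This sketch covers V, P, X, CC, M, PT, sequence number, timestamp, SSRC, and the CSRC list, but does not build header extensions or padding. The example values are illustrative: PT 0 (G.711 mu-law) with 20 ms packets, so the timestamp advances 160 ticks at 8 kHz while the sequence number advances by one.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Build an RFC 3550 fixed header with no padding and no extension."""
    if len(csrcs) > 15:
        raise ValueError("at most 15 CSRCs fit in the 4-bit CC field")
    byte0 = (RTP_VERSION << 6) | len(csrcs)                  # V=2, P=0, X=0, CC
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)  # M, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(data):
    """Decode the fixed header and CSRC list from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(struct.unpack(f"!{cc}I", data[12:12 + 4 * cc])),
    }


if __name__ == "__main__":
    # Hypothetical starting values; real senders pick them at random.
    seq, ts, ssrc = 1000, 123456, 0x1234ABCD
    for i in range(3):
        hdr = pack_rtp_header(0, seq + i, ts + 160 * i, ssrc, marker=(i == 0))
        print(parse_rtp_header(hdr))
```

The marker is set only on the first packet, matching the audio convention of flagging the start of a talkspurt.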

RTP Timestamp

Tick units depend on the codec:

For speech: 125 microseconds (standard 8 kHz sampling rate)

For video: 90 kHz

For audio: 44.1 kHz (CD rate)

Gaps in TS but not in SN mean silence

Initial value is random, for security

Video: the timestamp represents the time at the beginning of the frame; many packets may have the same timestamp

Speech: time per packet may vary, depending on packetization; 20-100 ms is typical

Payload Formats

Each codec needs a way to be encapsulated in RTP

RFC 3551 (the RTP audio/video profile) defines mechanisms for many common codecs (G.711, G.729, G.723.1, G.722, etc.) and some simple video

More complex codecs have their own payload format documents: MPEG, H.263, and H.261

A payload format defines how to break a frame into packets, and any extra fields needed after the main RTP header

Advanced Topics

DTMF and Tones: RFC 2833

Special codecs for encoding touch tones (DTMF) and other signals

Can send either the waveform (frequency, amplitude) or the actual signal (#, 8, 0)

Compressed RTP: RFC 2508

For dialup links

Don’t send the header, just send an index

Far side uses the index to retrieve the header, and then increments certain fields

Quality of Service

Quality of Service

The problem we are trying to solve is to give “better” service to some at the expense of giving worse service to others. QoS fantasies to the contrary, it’s a zero-sum game.

- Van Jacobson

Quality of Service: So, What’s the Problem?

[Figure: Usability of Voice Circuit as a Function of End-to-End Delay. Utility (0.0 to 1.0) versus end-to-end delay (0 to 800 ms), with regions labeled toll quality, private network VoFR and VoIP technology, early I-phone technology, satellite zone, CB zone, and fax relay/broadcast. Improving I-phone means lower PC delay, lower network latency, and tighter network jitter.]

Delay Budget

Device sample capture

Encode delay (algorithmic delay + processing delay)

Packetization/framing

Move to output queue/queueing delay

Access (up) link transmission

Backbone network transmission

Access (down) link transmission

Input queue to application

Jitter buffer

Decode processing delay

Device playout delay

The queueing and transmission stages in the middle of this list are “the network”

Some Techniques to Improve “Network QoS”

RED: Random Early Drop (or “Detect”)

WFQ: Weighted Fair Queuing

Intserv/RSVP: ReSerVation Protocol

IP Precedence

DiffServ

CRTP: Compressed Real-time Transport Protocol

MCML: Multi-Class Multi-Link PPP

Random Early Detect (RED): This Is Basic Hygiene!

Objectives: keep the average queue size low (good for voice); fairness (bigger streams are punished more); avoid synchronization

Only works with loss-responsive transport protocols

Algorithm: probabilistic dropping of packets

[Figure: drop probability (up to 1) as a function of queue size, with Min and Max thresholds.]
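A simplified sketch of the RED decision: an exponentially weighted moving average of the queue length is compared with the Min and Max thresholds, and the drop probability rises linearly between them, up to max_p, becoming 1 at or above Max. The weight and max_p defaults are illustrative assumptions, and the original algorithm's extra scaling by the number of packets since the last drop is omitted.

```python
import random


class RedQueue:
    """Simplified Random Early Detection drop decision."""

    def __init__(self, min_th, max_th, max_p=0.1, weight=0.002, seed=None):
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight = max_p, weight
        self.avg = 0.0
        self.rng = random.Random(seed)

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, current_queue_len):
        """Update the average with the current queue length, then decide."""
        # Using the average, not the instantaneous size, lets short bursts through.
        self.avg = (1 - self.weight) * self.avg + self.weight * current_queue_len
        return self.rng.random() < self.drop_probability()
```

The early drops only help if the sender backs off in response, which is why the poll that follows matters for voice.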

Poll: Will RED Help Voice? Yes or no?

No:

Voice is not loss responsive

Mixing voice and data in the same queue is bad

Voice queues are usually not congested

Weighted Fair Queueing

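Each flow “sees” a dedicated amount of bandwidth Bj

A packet arriving at time t to an otherwise empty flow queue finishes transmission at time t + size/Bj

[Figure: three flows with shares B1, B2, and B3 of a link with bandwidth B = B1 + B2 + B3.]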

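What’s the Problem?

WFQ is unrealizable because of variable packet sizes and causality

Example: link speed 100 kbps; flow 1 gets 10 kbps and sends 1500-byte packets; flow 2 gets 90 kbps and sends 100-byte packets

In theory, a flow-2 packet takes 800 bits / 90 kbps, about 8.9 ms

Actually, if it arrives just after a 1500-byte flow-1 packet has started transmitting, it must wait for that packet to finish (12,000 bits / 100 kbps = 120 ms) and then be sent itself (8 ms), for 128 ms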

Approximations of WFQ

Many PhDs have been written on approximate, implementable algorithms

Algorithms differ in their delay bound: how much worse than perfect WFQ are they?

Delay bounds are a function of bandwidth, number of queues, and other parameters

Algorithms:

SCFQ: Self-Clocked Fair Queueing

WF2Q: Worst-Case Fair Weighted Fair Queueing

FBFQ: Frame-Based Fair Queueing

PGPS: Packet-by-Packet Generalized Processor Sharing

DRR: Deficit Round Robin

WFQ Voice Configuration

How to pick the allocated bandwidth? Consider G.711 with 30 ms framing (74.6 kbps including IP/UDP/RTP headers)

If Bi = 74.6 kbps, delay is at least 30 ms; if Bi = 149.2 kbps, delay is at least 15 ms

Must set the voice queue bandwidth to at least 2x the actual voice usage to keep delays down!

Unused bandwidth will go to data

Need an accurate WFQ implementation
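The numbers on this slide follow from simple arithmetic: a 30 ms G.711 frame is 240 bytes of payload, plus 40 bytes of IP (20), UDP (8), and RTP (12) headers, giving a 280-byte packet every 30 ms. A flow allocated Bi cannot serve that packet faster than 280 x 8 / Bi seconds. A sketch with illustrative helper names; link-layer overhead is ignored.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no link-layer framing


def packet_size_bytes(codec_bps, frame_ms, header_bytes=HEADER_BYTES):
    """One voice packet: a frame's worth of codec payload plus headers."""
    payload_bytes = codec_bps * frame_ms / 1000 / 8
    return payload_bytes + header_bytes


def ip_bitrate_bps(codec_bps, frame_ms):
    """Bandwidth of the voice flow at the IP layer."""
    return packet_size_bytes(codec_bps, frame_ms) * 8 * 1000 / frame_ms


def min_queue_delay_ms(packet_bytes, allocated_bps):
    """Time for a WFQ share of allocated_bps to serve one packet."""
    return packet_bytes * 8 * 1000 / allocated_bps


if __name__ == "__main__":
    size = packet_size_bytes(64_000, 30)   # 280.0 bytes
    rate = ip_bitrate_bps(64_000, 30)      # about 74,667 bit/s
    for share in (rate, 2 * rate):
        print(round(share), round(min_queue_delay_ms(size, share), 1))
```

Doubling the share halves the per-packet service time, which is the reasoning behind the "at least 2x" rule; the unused half is not wasted because WFQ hands it to data.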

Priority Queueing

Emulates the familiar “elite airport line” experience

Voice and data packets in separate queues

If there are any packets in the voice queue, they are serviced first

[Figure: separate voice and data queues feeding one server.]

Priority Queueing Considerations

Easy to configure: no bandwidth values required

Main problem: data starvation

Need to police the voice queue

Doesn’t work as well when there is other non-voice high-priority traffic (video)

Head-of-line blocking from the data queue

Intserv: Integrated Services

Guaranteed Service (RFC 2212): mathematically provable bounds on end-to-end datagram queuing delay/bandwidth

Controlled Load Service (RFC 2211): approximates the delay/bandwidth QoS of an unloaded network

Describe traffic with a “TSPEC”: r = token bucket rate, b = token bucket depth, p = peak transmission rate, m = minimum (policed) packet size, M = maximum packet size

Describe endpoints with a “FlowSpec”: source/destination IP addresses, ports, protocol

RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms

RSVP Design

Signaling distinct from routing (modularity, deployability, evolvability)

Soft state (robustness, simplicity)

Transparent operation across non-RSVP routers (deployability)

Supports shared and distinct reservations

Applies to unicast and multicast applications

Simplex and receiver-oriented

RSVP protocol

PATH (source to destination): traffic parameters of the source; collects information on network capabilities; detects the current route

RESV (destination to source): receiver-selected Int-Serv service; traffic parameters of the receiver-selected reservation; follows the route detected by PATH; the reservation is actually nailed down in the network

RSVP messages are carried directly over IP; they can also be carried over UDP, but few people do that

[Figure: PATH travels from source to destination; RESV returns from destination to source.]

RSVP: Admission Control

[Figure: RSVP router model. The routing protocol maintains a routing database used for route selection, and switching moves incoming packets to packet schedulers on output interfaces 1 to N. Flow requests arrive via the reservation protocol, which consults admission control (backed by a resource utilization database) and a queuing policy database, then configures the packet schedulers.]

Intserv/RSVP Acceptance

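[Figure: enthusiasm for Intserv/RSVP over time. Initial hype (“Intserv/RSVP will solve the world’s QoS”), then disillusionment (the cool thing to say: “RSVP does not scale”), with vBNS, RSVP over ATM, and transparent transport of RSVP along the way. Real value today is marked separately for ISPs and for the enterprise (RSVP for VoIP in the enterprise).]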

IP Precedence & Diffserv

“Poor man’s” approach to QoS

Set IP Precedence/DSCP higher on voice packets

This puts them in a different queue, isolating them from best-effort traffic

Can be done by the endpoint, the proxy, or in routers through heuristics

Scales better than RSVP: keeps QoS control “local”, pushes work to the edges and boundaries, can provide bulk QoS by customer or network

No admission control: too much high-precedence traffic can still swamp the network

Diffserv Architectural Model

Clouds: regions of relative homogeneity in administrative control, technology, and bandwidth

Within a cloud, QoS is managed by local rules

Hard work is confined to the boundaries of clouds: classification, conditioning/policing

QoS information exchange is limited to boundaries: bilateral, not multilateral, and not necessarily symmetric

[Figure: adjacent clouds labeled “Me”, “Not Me”, “Also Not Me”, and “Far Away”.]

Diffserv Scalability

Fundamental assumptions:

Relatively small number of feasible queuing/scheduling algorithms for high link speeds

Number of individual flows is large

Many different rules, often policy driven

Group packets explicitly by the “per-hop behavior (PHB)” they are to get: queue service, shaping/policing

Nodes in the middle of a cloud only have to deal with traffic aggregates

Diffserv Forwarding via PHBs

PHBs map to DSCPs (Diffserv Code Points)

Values chosen for backward compatibility with the IPv4 TOS byte, including IP Precedence (RFC 2474)

Packets with different DSCPs may be re-ordered

Forwarding resources are partitioned by PHB/DSCP

Assured Forwarding PHB (AF*)

Four independent classes

Within each class, three levels of drop precedence

A congested AF node discards packets with higher drop precedence first

Packets with the lowest drop precedence must be within the subscribed profile

*RFC 2597

Expedited Forwarding PHB (EF*)

Targeted at VoIP and “virtual leased lines”

Roughly equivalent to priority queuing, with a safety measure to prevent starvation

Implications:

No more than 50% of a link can be EF (see RFC 3247 and RFC 3248 for interesting mathematical analyses)

Worst-case jitter at each hop is the maximum of: the number of EF microflows in the aggregate, or a single MTU packet of some other aggregate

*RFC 3246
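Endpoints can set the DSCP themselves. The codepoints below come from the RFCs cited on these slides: EF is 46 (RFC 3246), AFxy is 8x + 2y (RFC 2597), and the class selectors that preserve old IP Precedence values are precedence x 8 (RFC 2474). The DSCP occupies the upper six bits of the former TOS byte, so the value handed to the IP_TOS socket option is shifted left by two. Whether the operating system honors IP_TOS, and whether the network trusts the marking, depends on the platform and network, so treat this as a sketch; the helper names are illustrative.

```python
import socket

EF = 46  # Expedited Forwarding (RFC 3246)


def af(af_class, drop_precedence):
    """Assured Forwarding codepoint AFxy (RFC 2597)."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF classes are 1-4, drop precedences 1-3")
    return 8 * af_class + 2 * drop_precedence


def class_selector(ip_precedence):
    """Class Selector codepoint matching an old IP Precedence value (RFC 2474)."""
    if not 0 <= ip_precedence <= 7:
        raise ValueError("IP Precedence is 0-7")
    return ip_precedence << 3


def tos_byte(dscp):
    """DSCP sits in the upper six bits of the former IPv4 TOS byte."""
    return dscp << 2


def mark_udp_socket(sock, dscp):
    """Ask the OS to stamp outgoing IPv4 packets on sock with dscp."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
```

A voice application would create its RTP socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call mark_udp_socket(sock, EF) before sending; signaling traffic might instead use an AF class.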

Diffserv Traffic Conditioner

Classifier: selects a packet in a traffic stream based on the content of some portion of the packet header

Meter: checks compliance to traffic parameters (e.g. Token Bucket) and passes result to marker and shaper/dropper to trigger particular action for in/out-of-profile packets

Marker: writes/rewrites the DSCP

Shaper: delays some packets so that they comply with the profile

[Figure: packets pass through a classifier and a meter; based on the meter’s result, the marker marks them and the shaper/dropper either shapes or drops them.]
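The meter's token bucket is the same construct as the TSPEC's r (rate) and b (depth) parameters. A minimal sketch with times in seconds and sizes in bytes; what to do with a non-conforming packet (re-mark, delay, or drop) is left to the caller, as in the conditioner diagram. The class and method names are illustrative.

```python
class TokenBucket:
    """Token-bucket meter: rate in bytes/second, depth in bytes."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth      # bucket starts full
        self.last_time = 0.0

    def conforms(self, now, packet_bytes):
        """Return True if the packet is in profile at time now (seconds)."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile; no tokens are consumed
```

A marker built on this might keep a conforming packet's DSCP and re-mark an out-of-profile packet to a higher drop precedence, while a dropper would discard it.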

Diffserv Acceptance

[Figure: enthusiasm for Diffserv over time. Early hype (“Diffserv will solve the world’s QoS”), then open questions (Diffserv engineering? Diffserv SLA? Internet end-to-end SLA?); the real value today is Diffserv design and deployment within a domain.]

Inter-SP Diffserv and end-to-end Internet QoS need further standardisation and commercial arrangements

Mixing Intserv & Diffserv: Aggregation

Host signals with RSVP

Edge or transit domains aggregate reservations and mark packets using DSCP

In transit domains, end-to-end reservations are transferred blindly using another IP protocol number, changed at the edge

Routers detect the egress of a reservation (deaggregation) on transfer from an interior or aggregator interface to an exterior (deaggregating) interface

Aggregate reservation size varies with load

[Figure: edge domains on either side of a backbone.]

RTP Compression

20 ms @ 8 kbit/s yields a 20-byte payload

IP header 20 bytes, UDP header 8 bytes, RTP header 12 bytes: twice the size of the payload!

Header compression: 40 bytes down to 2-4 most of the time

Hop-by-hop: use only on the slow links

Sample Delay Budget (G.711, 64 kbps)

Delay source | Budget (ms)
Device sample capture | 0.1
Encode delay (algorithmic delay + processing delay) | 2.5
Packetization/framing | 10
Move to output queue/queue delay | 0.5
Access (up) link transmission | 30
Backbone network transmission | 5
Access (down) link transmission | 10
Input queue to application | 0.5
Jitter buffer | 35
Decode processing delay | 0.5
Device playout delay | 0.5
Total | 94.6

Sample Delay Budget (G.729, 8 kbps)

Delay source | Budget (ms)
Device sample capture | 0.1
Encode delay (algorithmic delay + processing delay) | 17.5
Packetization/framing | 20
Move to output queue/queue delay | 0.5
Access (up) link transmission | 30
Backbone network transmission | 5
Access (down) link transmission | 10
Input queue to application | 0.5
Jitter buffer | 35
Decode processing delay | 5
Device playout delay | 0.5
Total | 119.1
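Each total is a plain sum of the per-stage budgets. A small sketch that adds up the G.711 column, with values copied from the table above (the dictionary keys are shortened labels):

```python
G711_BUDGET_MS = {
    "device sample capture": 0.1,
    "encode delay": 2.5,
    "packetization/framing": 10,
    "move to output queue": 0.5,
    "access (up) link": 30,
    "backbone network": 5,
    "access (down) link": 10,
    "input queue to application": 0.5,
    "jitter buffer": 35,
    "decode processing": 0.5,
    "device playout": 0.5,
}


def total_delay_ms(budget):
    """Sum the stages, rounding away floating-point noise."""
    return round(sum(budget.values()), 3)


print(total_delay_ms(G711_BUDGET_MS))  # 94.6
```

Applying the same sum to the G.729 rows as listed gives 124.1 ms rather than the 119.1 ms total shown, so one of those entries or the total appears to be off by 5 ms. Either way, the jitter buffer and the access links dominate both budgets.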

Signaling: SIP

SIP is one of Many

ITU H.323: originally for video conferencing; the first standard protocol for VoIP; still in wide usage, but negative growth

MGCP: dumb phones controlled by a smart server; “softswitch” (PSTN emulation) view

Megaco/H.248: standard version of MGCP

Core SIP Functions

Establishment of peer-to-peer sessions

Management of peer-to-peer sessions: keepalives; graceful and non-graceful termination

Rendezvous: forking, search

Policy-based routing, loose routing

Mobility: limited terminal mobility, device mobility

Core SIP Functions (cont.)

Secure user identification

Exchange and management of media session data

User registration

Capability declaration

Capability query

Reliability

SIP Technology Community

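[Figure: SIP (RFC 3261) at the center of related specifications: DNS (RFC 3263), events (RFC 3265), reliable provisional responses (RFC 3262), offer/answer (RFC 3264), RTP, SDP, SIMPLE, SigComp, SIP extensions, ENUM, MIDCOM, STUN, and ROHC.]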

SIP Design Philosophy

Patterned after other successful Internet standards: HTTP

Don’t reinvent the PSTN

General-purpose functionality: do not dictate architectures or services

It needs to work on any IP network

Leverage the best of existing standards: URLs, MIME, RFC 822

Scalability: push state to the edge

Basic Design

Request/response protocol

SIP is a peer protocol: all entities send requests and receive requests

Modelled after HTTP

Each request invokes a method, the main purpose of the request

Messages contain bodies

[Figure: two agents; one sends a request and the other returns a response.]

Transactions

Fundamental unit of messaging exchange: a request, zero or more provisional responses, usually one final response, and maybe an ACK

All signaling is composed of independent transactions

Identified by CSeq: sequence number and method tag

[Figure: first transaction (CSeq: 1): INVITE, 100, 200, ACK. Second transaction (CSeq: 2): BYE, 200.]

Session Independence

The body of the SIP message used to establish a call describes the session

The session could be audio, video, or a game

SIP operation is independent of the type of session

SIP bodies are MIME objects (MIME = Multipurpose Internet Mail Extensions): mechanisms for describing and carrying opaque content, used with HTTP and email

Protocol Components

User Agent: end systems such as hard and soft phones, PSTN gateways, phone adaptors, and media servers; anything that originates or terminates SIP calls

Proxy: SIP server responsible for relaying and processing requests between user agents. Main job: where to send the request next?

Back-to-Back User Agent (B2BUA): SIP server that terminates and re-originates SIP; e.g. SBCs, call agents

SIP Addressing

SIP addresses are URLs

A URL contains several components: scheme (sip), username, hostname, optional port, parameters, headers and body

SIP allows any URI type: tel URIs, http URLs for redirects, mailto URLs; this leverages the vast URI infrastructure

sip:jdrosen@cisco.com:5061;user=host?Subject=foo

The SIP Trapezoid

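[Figure: phones in domains a.com and b.com exchange SIP signaling through each domain’s proxy, while RTP media flows directly between the phones.]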

SIP Methods

INVITE: invites a participant to a session; idempotent; re-INVITEs are used for session modification

BYE: ends a client’s participation in a session

CANCEL: terminates a search

OPTIONS: queries a participant about their media capabilities, and finds them, but doesn’t invite

ACK: for reliability and call acceptance

REGISTER: informs a SIP server about the location of a user

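SIP Architecture

[Figure: a call routed in 14 numbered steps across a.com, sp.com, and b.com, showing requests, responses, and media, with a lookup in a corporate database (Corp DB). The request address changes from 14089023077@a.com to 14089023077@sp.com to 14089023077@b.com as it is routed.]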

SIP Message Syntax

Many header fields from HTTP

Payload contains a media description: SDP (Session Description Protocol)

INVITE sip:+17327654321@example.com SIP/2.0
From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
Subject: Conference Call
To: John Smith <sip:+17327654321@example.com>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: 1997234505.56.78@1.2.3.4
Content-Type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187

v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0

SIP Address Fields

Request-URI: contains the address of the next-hop server; rewritten by proxies based on the result of the location service

To: address of the original called party; contains an optional display name

From: address of the calling party; optional display name


SIP Responses

Look much like requests: headers, bodies

Differ in the top line

Status code: numeric, 100-699; meant for computer processing; protocol behavior is based on the hundreds digit; other digits give extra info

Reason phrase: text phrase for humans; can be anything

Status code classes:
100-199 (1XX): Informational
200-299 (2XX): Success
300-399 (3XX): Redirection
400-499 (4XX): Client Error
500-599 (5XX): Server Error
600-699 (6XX): Global Failure

Two groups: 100-199 are provisional (not reliable); 200-699 are final (definitive)

Examples: 200 OK, 180 Ringing

Example SIP Response

Note that the main difference is the top line

Rules for generating responses:

Call-ID, To, From, and CSeq are mirrored in the response

The branch parameter (in Via) is used as the transaction ID

A tag is added to the To field to identify the dialog

SIP/2.0 200 OK
From: J. Rosenberg <sip:+14082321122@example.com>;tag=76ah
To: John Smith <sip:+17327654321@example.com>;tag=112
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: 1997234505.56.78@1.2.3.4
Content-Type: application/sdp
CSeq: 4711 INVITE
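The response rules above can be sketched in a few lines: copy Via, From, To, Call-ID, and CSeq from the request, and add a tag to To if the request did not carry one. This assumes CRLF line endings, no header folding, and no compact header forms (such as "v" for Via). A real 200 OK to an INVITE would also carry Contact and an SDP answer, both omitted here; the function names are illustrative.

```python
import secrets

MIRRORED_HEADERS = ("via", "from", "to", "call-id", "cseq")


def parse_headers(message):
    """Split a SIP message into its start line and (name, value) headers."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *lines = head.split("\r\n")
    headers = []
    for line in lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers


def build_response(request, code, reason, to_tag=None):
    """Build a bodyless response that mirrors the request's key headers."""
    _, headers = parse_headers(request)
    lines = [f"SIP/2.0 {code} {reason}"]
    for name, value in headers:
        key = name.lower()
        if key not in MIRRORED_HEADERS:
            continue
        # The To tag identifies the dialog; this sketch leaves it off 100 Trying.
        if key == "to" and code > 100 and ";tag=" not in value:
            value = f"{value};tag={to_tag or secrets.token_hex(4)}"
        lines.append(f"{name}: {value}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"
```

Feeding it the INVITE from the SIP message syntax slide with to_tag="112" yields the same Via, From, To, Call-ID, and CSeq lines as the response shown above.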

SIP Transport

SIP messages over UDP, TCP/TLS, or SCTP

Reliability mechanisms defined for UDP

UDP: more widely used, faster, no connection state

TCP preferred these days: NAT, larger SIP messages

Reliability mechanisms depend on the SIP request method: INVITE vs. anything except INVITE

Reason: optimized for phone calls
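Over UDP, a client retransmits a request on a doubling timer until a response arrives. This sketch uses the default values from RFC 3261 (T1 = 500 ms, T2 = 4 s, and a 64 x T1 transaction timeout): INVITE retransmission intervals keep doubling, while non-INVITE intervals are capped at T2. It models only the case where no response ever arrives; the rules for stopping or slowing down after a provisional response, which differ between the two cases, are not shown.

```python
T1 = 0.5   # seconds, RFC 3261 default round-trip estimate
T2 = 4.0   # seconds, cap on non-INVITE retransmission intervals


def retransmission_times(is_invite, t1=T1, t2=T2):
    """Times (seconds after the first send) at which a UDP client
    retransmits a request that never gets any response."""
    timeout = 64 * t1          # Timer B (INVITE) / Timer F (non-INVITE)
    interval = t1              # initial Timer A / Timer E value
    t = interval
    times = []
    while t < timeout:
        times.append(t)
        interval = interval * 2 if is_invite else min(interval * 2, t2)
        t += interval
    return times


print(retransmission_times(True))   # INVITE
print(retransmission_times(False))  # e.g. BYE, REGISTER
```

The INVITE schedule backs off aggressively because a call may legitimately take a long time to be answered, whereas non-INVITE requests expect a prompt answer and keep probing every T2.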

Registrations

REGISTER creates a mapping in the server from one URI to another

REGISTER properties:

UA location in Contact

Registrar identified in the Request-URI

Registered user identified in the To and From fields

Expires header indicates the desired lifetime; can be different for each Contact

Registrations are soft state

REGISTER sip:example.com SIP/2.0
To: sip:89023077@example.com;user=phone
From: sip:89023077@example.com;user=phone
Call-ID: 1997234505.56.78@1.2.3.4
CSeq: 123 REGISTER
Contact: sip:89023077@1.2.3.4
Expires: 3600

Resulting binding: sip:89023077@example.com to sip:89023077@1.2.3.4

Registration Handling

Registrar is the logical function handling REGISTER

Registrar steps: authenticate, authorize, add binding, lower expiration (if needed), and return all currently registered UAs (there can be more than one)

SIP/2.0 200 OK
To: sip:89023077@example.com;user=phone
From: sip:89023077@example.com;user=phone
Call-ID: 1997234505.56.78@1.2.3.4
CSeq: 123 REGISTER
Contact: sip:89023077@1.2.3.4;expires=3600
Contact: sip:89023077@5.6.7.8;expires=524
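The registrar steps can be sketched as a soft-state binding table keyed by address-of-record. Authentication and authorization are omitted, the 3600-second cap is an assumed local policy, and times are plain numbers in seconds. With the assumed timings in the usage below, the second registration returns 524 seconds remaining for the 5.6.7.8 contact and 3600 for 1.2.3.4, the same lifetimes as in the 200 OK above.

```python
class Registrar:
    """Soft-state location service: address-of-record -> contact bindings."""

    def __init__(self, max_expires=3600):
        self.max_expires = max_expires   # assumed local policy
        self.bindings = {}               # aor -> {contact: absolute expiry time}

    def register(self, aor, contact, expires, now):
        """Apply one REGISTER and return every current binding for aor
        with its remaining lifetime in seconds."""
        contacts = self.bindings.setdefault(aor, {})
        # Soft state: bindings that were not refreshed in time simply disappear.
        for c, expiry in list(contacts.items()):
            if expiry <= now:
                del contacts[c]
        if expires == 0:
            contacts.pop(contact, None)  # Expires: 0 removes the binding
        else:
            # The registrar may lower, but never raise, the requested lifetime.
            contacts[contact] = now + min(expires, self.max_expires)
        return {c: int(expiry - now) for c, expiry in contacts.items()}


if __name__ == "__main__":
    reg = Registrar()
    aor = "sip:89023077@example.com"
    reg.register(aor, "sip:89023077@5.6.7.8", 600, now=0)
    print(reg.register(aor, "sip:89023077@1.2.3.4", 3600, now=76))
```

Because nothing is ever deleted explicitly unless the UA asks, a phone that crashes simply ages out of the table, which is the point of soft state.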

Forking

A proxy may have more than one address for a user. This happens when more than one SIP URL is registered for a user, and can also happen based on static routing configuration

In this case, the proxy may fork: forking is when a proxy sends a request to more than one destination at once

The first 200 OK that is received is forwarded upstream; all other unanswered requests are cancelled

[Figure: a proxy receives INVITE 89023077@a.com and forks it as INVITE 89023077@1.2.3.4 and INVITE 89023077@5.6.7.8.]

Routing of Subsequent Requests

The initial SIP request is sent through many proxies

There is no need per se for subsequent requests to go through the proxies

Each proxy can decide whether it wants to receive subsequent requests: it inserts a Record-Route header containing its address

For subsequent requests, users insert a Route header containing the sequence of proxies (and the final user) that should receive the request

[Figure: UA1 sends an INVITE to UA2 through three proxies; a later BYE follows the recorded route.]

Setting up the Session

INVITE contains the Session Description Protocol (SDP) in the body

SDP conveys the desired session from the caller’s perspective:

The session consists of a number of media streams; each stream can be audio, video, text, application, etc.

It also contains information needed about the session: codecs, addresses, and ports

SDP also conveys other information about the session: the time it will take place, who originated the session, the subject of the session, and a URL for more information

SDP originated with multicast sessions on the Mbone: the originator of the INVITE is not the originator of the session

Anatomy of SDP

SDP contains informational headers: version (v), origin (o, a unique ID), information (i)

Time of the session (t)

Followed by a sequence of media streams

Each media stream contains an m line defining port, transport, and codecs

A media stream also contains a c line with address information

v=0
o=user1 53655765 2353687637 IN IP4 128.3.4.5
s=Mbone Audio
i=Discussion of Mbone Engineering Issues
e=mbone@somewhere.com
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP4 1.2.3.4
a=rtpmap:78 G723/8000
m=video 4444 RTP/AVP 86
c=IN IP4 1.2.3.4
a=rtpmap:86 H263/90000

Negotiating the Session

The called party receives the SDP offered by the caller

Each stream can be accepted or rejected

Accepting involves generating an SDP listing the same stream, with the port number and address of the called party and a subset of the codecs from the SDP in the request

Rejecting is indicated by setting the port to zero

The resulting SDP is returned in the 200 OK; media can now be exchanged

v=0
o=user2 16255765 8267374637 IN IP4 4.3.2.1
t=0 0
m=audio 3456 RTP/AVP 0
c=IN IP4 4.3.2.1
m=video 0 RTP/AVP 86
c=IN IP4 4.3.2.1

Audio stream accepted, PCMU only. Video stream rejected.
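The answerer's choices can be sketched as: for each offered m= line, keep the payload types the answerer supports, in the offer's order; if none remain, reject the stream by setting its port to zero. The supported-codec table and local ports are hypothetical inputs, and only m= lines are produced; a complete answer also needs v=, o=, s=, c=, and t= lines, plus rtpmap attributes for dynamic payload types.

```python
def parse_m_lines(sdp):
    """Extract (media, port, proto, [payload types]) from each m= line."""
    streams = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, proto, *formats = line[2:].split()
            streams.append((media, int(port), proto, [int(f) for f in formats]))
    return streams


def answer_m_lines(offered, supported, local_ports):
    """Build answer m= lines: accept with the common codecs, or reject with port 0.

    supported:   media type -> set of payload types we can handle (assumed policy)
    local_ports: media type -> port we will receive on (assumed)
    """
    answer = []
    for media, port, proto, formats in offered:
        common = [pt for pt in formats if pt in supported.get(media, set())]
        if port != 0 and common:
            answer.append(f"m={media} {local_ports[media]} {proto} " + " ".join(map(str, common)))
        else:
            # Rejected: same stream, port set to zero.
            answer.append(f"m={media} 0 {proto} " + " ".join(map(str, formats)))
    return answer


offer = "\n".join(["v=0", "m=audio 3456 RTP/AVP 0 78", "c=IN IP4 1.2.3.4",
                   "m=video 4444 RTP/AVP 86", "c=IN IP4 1.2.3.4"])
print(answer_m_lines(parse_m_lines(offer), {"audio": {0}}, {"audio": 3456}))
```

Given the audio and video streams of the SDP anatomy example and an answerer that only supports PCMU, this yields the two m= lines of the answer above. The parser assumes numeric RTP payload types.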

Changing Session Parameters

Once the call is started, the session can be modified

Possible changes: add a stream, remove a stream, change codecs, change address information

Call hold is basically a session change

Accomplished through a re-INVITE: the same session negotiation as INVITE, except in the middle of the call

A rejected re-INVITE leaves the call still active!

[Figure: INVITE, 200, ACK set up the call; a later re-INVITE, 200, ACK modify it.]

Hanging Up

How to hang up depends on when, and who:

After the call is set up, either party sends a BYE request

From the caller, before the call is accepted: send CANCEL. BYE is bad, since it may not reach the same set of users that got the INVITE. If the call is accepted after the CANCEL, then send BYE

From the callee, before accepting: reject with 486 Busy Here

[Figure: caller (C) and server (S). INVITE, 100; the caller hangs up and sends CANCEL while the callee accepts; 200 OK, 200 OK, ACK, BYE, 200 OK.]

Call Flow for a Basic Call: UA to Proxy to UA

Call setup: 100 Trying (hop by hop), 180 Ringing, 200 OK (acceptance)

Call parameter modification: re-INVITE, the same as the initial INVITE with an updated session description

Termination: BYE method

[Figure: INVITE to the proxy (100 Trying), INVITE from the proxy to the called UA (100 Trying); 180 Ringing and 200 OK relayed back; then ACK, RTP media, BYE, and 200 OK.]

Privacy and Identity

RFC 3325: A Private Extension for Asserted Identity in Trusted Networks

RFC 3323: A Privacy Mechanism for SIP

RFC 4474: SIP Identity

RFC 3325 Asserted Identity

[Figure: inside a trust domain, a proxy authenticates the caller, verifies the identity, and adds a P-Asserted-Identity (PAID) header: INVITE with P-Asserted-Identity: sip:+14089023077@a.com.]

RFC 3323: SIP Privacy

Trust Domain

Anonymous caller sends:
INVITE
Privacy: id
From: anonymous

Inside the trust domain:
INVITE
P-Asserted-Identity: sip:+14089023077@a.com
From: anonymous

Leaving the trust domain:
INVITE
From: anonymous
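The privacy rule at the trust-domain boundary can be sketched as a header filter. This is a minimal sketch, assuming headers are held in a dict keyed by exact header name (real SIP header names are case-insensitive and have compact forms); forward_headers is a hypothetical helper. It applies the RFC 3325 rule that P-Asserted-Identity is removed before forwarding to an untrusted next hop when the Privacy header requests "id".

```python
from typing import Dict, List


def forward_headers(headers: Dict[str, List[str]],
                    next_hop_trusted: bool) -> Dict[str, List[str]]:
    """Return the headers to send on, stripping PAID when privacy demands it."""
    out = {name: list(values) for name, values in headers.items()}  # leave input intact
    # Privacy values are separated by ";" (e.g. "id;critical").
    privacy = {token.strip().lower()
               for value in out.get("Privacy", [])
               for token in value.split(";")}
    if not next_hop_trusted and "id" in privacy:
        out.pop("P-Asserted-Identity", None)
    return out
```

Inside the trust domain the asserted identity travels with the request; at the edge it is dropped, so the far end sees only From: anonymous.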

RFC 4474: SIP Identity

Caller sends:
INVITE
From: sip:joe@example.com

Authentication service authenticates the caller, verifies identity, and signs the request:
INVITE
From: sip:joe@example.com
Identity: asd87f7as66sda8z

Recipient verifies the signature

Only useful for user@domain addresses!

Transfers and Dialog Movement: REFER (RFC 3515)

Figure: Joe, Alice and Bob. Joe, in a call with Alice, sends her a REFER with Refer-To: Bob; Alice then sends an INVITE to Bob carrying Referred-By: Joe.

Third Party Call Control (3pcc): RFC 3725

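Figure: the controller connects A and B in six steps:

(1) the controller sends an INVITE with no SDP to A

(2) A answers 200 with SDP A

(3) the controller sends an INVITE with SDP A to B

(4) B answers 200 with SDP B

(5) the controller sends an ACK with SDP B to A

(6) RTP flows directly between A and B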
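The key point of 3pcc is that the controller never invents SDP; it only shuttles A's offer to B and B's answer back to A. This is a minimal in-memory sketch of that flow, assuming a hypothetical FakeUA class standing in for real user agents (no network, no SIP parsing); the SDP bodies are placeholder strings.

```python
from typing import Optional


class FakeUA:
    """Stand-in for a real user agent: answers INVITEs with its own SDP."""

    def __init__(self, name: str):
        self.name = name
        self.remote_sdp: Optional[str] = None  # SDP learned from the other side

    def invite(self, sdp: Optional[str]) -> str:
        if sdp is not None:
            self.remote_sdp = sdp  # INVITE carried an offer; remember it
        # 200 OK body: an offer if the INVITE had no SDP, otherwise an answer.
        return f"SDP {self.name}"

    def ack(self, sdp: Optional[str]) -> None:
        if sdp is not None:
            self.remote_sdp = sdp  # ACK carried the answer to our offer


def third_party_call(a: FakeUA, b: FakeUA) -> None:
    sdp_a = a.invite(None)   # steps 1-2: INVITE without SDP, 200 carries A's offer
    sdp_b = b.invite(sdp_a)  # steps 3-4: INVITE with A's offer, 200 carries B's answer
    b.ack(None)              # complete B's INVITE transaction
    a.ack(sdp_b)             # step 5: ACK to A carries B's answer
```

After the call, each endpoint holds the other's SDP, so RTP can flow directly between them even though neither ever talked to the other in signaling.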

SIP and Quality of Service

RFC 3312: Integration of Resource Management with SIP

Problem: how to make sure the phone doesn’t ring unless resources are reserved

Solution: SIP does not do resource reservation!

The SIP INVITE tells the far side not to ring

Both sides do regular QoS reservations: RSVP, PDP context activation

UPDATE is used to change state

Figure: INVITE with preconditions; 183 Session Progress; both sides perform QoS reservations; UPDATE with preconditions; 180 Ringing; 200 OK; ACK
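The callee's side of the precondition flow boils down to "do not alert until both reservations are in place". This is a deliberately simplified sketch, assuming a single stream and one boolean per side; RFC 3312 actually tracks current versus desired status per direction and segment, which this hypothetical PreconditionState class does not model.

```python
from dataclasses import dataclass


@dataclass
class PreconditionState:
    """Callee-side view of preconditions, reduced to two flags."""
    local_reserved: bool = False   # our own reservation (e.g. RSVP, PDP context) is done
    remote_reserved: bool = False  # the peer reported its reservation (e.g. in an UPDATE)
    alerted: bool = False          # have we already sent 180 Ringing?

    def on_local_reservation(self) -> str:
        self.local_reserved = True
        return self._maybe_alert()

    def on_update(self, remote_ok: bool) -> str:
        self.remote_reserved = remote_ok
        return self._maybe_alert()

    def _maybe_alert(self) -> str:
        # Ring only once, and only when both sides have their resources.
        if self.local_reserved and self.remote_reserved and not self.alerted:
            self.alerted = True
            return "180 Ringing"
        return ""
```

Whichever event completes second triggers the single 180 Ringing, which is why the phone never rings on a call that cannot get its QoS.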

Security

VoIP Security

The only totally secure system I know of is a rock

- Tony Lauck, circa 1985

But Even Rocks Can Be Insecure...

It Had a Great User Interface

But it had a serious security vulnerability…

VoIP Attacks and Solutions

Free calls, aka toll fraud: user authentication

Impersonation: user authentication, secure caller ID

Learning private information (calling patterns, PIN codes): SIP encryption, media encryption

Steal calls: SIP encryption, media encryption

DoS: ICE, others

SIP User Authentication

Figure: we want each SIP server to authenticate the user it serves; RTP flows between the endpoints

SIP Digest Authentication

Client: Hi, I’d like to SIP REGISTER

Server: 401, OK, try again. Nonce=a7szh1

Client computes Digest = Hash(joe, a7szh1, myPassword)

Client: REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6

Server checks Digest = Hash(joe, a7szh1, myPassword) = z0v88a6

Server: OK, done!
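The "Hash(joe, a7szh1, myPassword)" on the slide is shorthand. In SIP Digest (RFC 3261, reusing RFC 2617) the response also covers the realm, the method and the request URI. This is a minimal sketch of the no-qop variant; the realm "a.com" and URI "sip:a.com" in the usage are assumptions, and digest_response and server_verifies are hypothetical helper names.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 Digest response without qop, as used by SIP."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_verifies(stored_password: str, username: str, realm: str,
                    method: str, uri: str, nonce: str, response: str) -> bool:
    """Server recomputes the digest from its copy of the password and compares."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, response)  # constant-time comparison
```

The password never crosses the wire, but everything else that goes into the hash does, which is exactly what the next slide exploits.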

Offline Dictionary Attack

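Attacker captures: REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6

Attacker computes Hash(joe, a7szh1, word) for every dictionary word:

Aardvark: 9z8v77a
Abacus: lkf88z7
Abate: 8z77x
…
Alligator: z0v88a6

Match: the password is alligator. The attacker can now compute Digest = Hash(joe, a7szh1, alligator) and register as joe (OK, done!)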
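The offline attack needs nothing but the captured fields and a word list. This is a minimal sketch, assuming the attacker sniffed a no-qop Digest exchange (so realm, method, URI, nonce and response are all visible); dictionary_attack is a hypothetical name and the digest function repeats the RFC 2617 computation so the block is self-contained.

```python
import hashlib
from typing import Dict, Iterable, Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(captured: Dict[str, str], words: Iterable[str]) -> Optional[str]:
    """Try every candidate password offline; return the one that matches, if any."""
    for word in words:
        guess = digest_response(captured["username"], captured["realm"], word,
                                captured["method"], captured["uri"], captured["nonce"])
        if guess == captured["response"]:
            return word
    return None
```

No packets are sent to the server while guessing, so rate limiting and lockouts never trigger; only hiding the exchange (for example inside TLS) stops it.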

Solution: Digest over TLS

Figure: the same Digest exchange, Digest = Hash(joe, a7szh1, alligator), now runs inside TLS armor, so an eavesdropper never sees the nonce or the digest

This is how Web security works!

Even Stronger: Mutual TLS for Devices

Figure: a phone (MAC 8x7a6) connects to a.com inside TLS armor

Phone has a certificate which identifies it
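With Python's standard ssl module, the difference between ordinary TLS and mutual TLS is whether the server demands a client certificate. This is a minimal sketch: the certificate and CA file paths are left as parameters because the slide names none, and sip_tls_server_context and phone_tls_client_context are hypothetical helper names.

```python
import ssl
from typing import Optional


def sip_tls_server_context(cafile: Optional[str] = None,
                           certfile: Optional[str] = None,
                           keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server side (e.g. a.com): require every phone to present a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED         # mutual TLS: no client cert, no session
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the server's own certificate
    if cafile:
        ctx.load_verify_locations(cafile)       # CA that issued the phones' certificates
    return ctx


def phone_tls_client_context(cafile: Optional[str] = None,
                             certfile: Optional[str] = None,
                             keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Phone side: verify the server and present the device certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # certificate that identifies the phone
    return ctx
```

A server would then wrap each accepted TCP connection with ctx.wrap_socket(sock, server_side=True), conventionally on port 5061 for SIP over TLS.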

SIP Encryption

We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling.

SIP Encryption: TLS

Figure: SIP servers a.com and b.com connected with mutual TLS authentication; RTP flows between the endpoints

Media Encryption

Countermeasure against: eavesdropping, barge-in, modification

Two useful techniques: IPsec, SRTP

Complications: key management; legal intercept (who has the keys?); firewall and NAT issues (covered later)

Alternative: Secure RTP

Authentication and encryption of RTP and RTCP packets

Figure: SRTP packet format
V, P, X, CC, M, PT, sequence number
timestamp
synchronization source (SSRC) identifier
contributing source (CSRC) identifiers …
RTP extension (optional)
RTP payload
SRTP MKI (0 bytes for voice)
Authentication tag (4 bytes for voice)

Encrypted portion: the RTP payload. Authenticated portion: the RTP header plus the encrypted payload.
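Because SRTP leaves the RTP header in the clear, any box on the path can still read the fields in the figure. This is a minimal sketch of parsing the RTP fixed header (RFC 3550 layout) and splitting an SRTP packet into its authenticated part, MKI and tag; parse_rtp_header and split_srtp are hypothetical names, and the default 4-byte tag and 0-byte MKI follow the voice settings on the slide.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # number of CSRC entries that follow
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return RtpHeader(version=b0 >> 6, padding=bool(b0 & 0x20), extension=bool(b0 & 0x10),
                     csrc_count=cc, marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
                     sequence=seq, timestamp=ts, ssrc=ssrc, csrcs=csrcs)


def split_srtp(packet: bytes, tag_len: int = 4, mki_len: int = 0) -> Tuple[bytes, bytes, bytes]:
    """Return (authenticated portion, MKI, authentication tag); MKI and tag come last."""
    body_end = len(packet) - tag_len - mki_len
    return packet[:body_end], packet[body_end:body_end + mki_len], packet[body_end + mki_len:]
```

Only the payload inside the authenticated portion is encrypted, which is why header compression such as cRTP keeps working over SRTP.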

SRTP Advantages

Provides both privacy via encryption and authentication via a message integrity check

Very little bandwidth overhead

Does not break header compression schemes like cRTP

For very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion

Uses modern strong crypto suites: AES counter mode for encryption and HMAC for message integrity

Disadvantages

Needs key management

End-to-end versus hop-by-hop trust tradeoffs in protecting keys

Yet another security mechanism to ensure is implemented and deployed correctly
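"Very little bandwidth overhead" can be made concrete with a little arithmetic. This is a minimal sketch, assuming IPv4 (20 bytes) + UDP (8) + RTP (12) headers, no link-layer framing, no MKI and the 4-byte tag from the previous slide; voip_packet_bytes and srtp_overhead_percent are hypothetical helpers.

```python
def voip_packet_bytes(codec_bitrate_bps: int, frame_ms: int, tag_bytes: int = 0,
                      ip_udp_rtp_header_bytes: int = 20 + 8 + 12) -> int:
    """Size of one voice packet on the IP layer, optionally with an SRTP tag."""
    payload = codec_bitrate_bps * frame_ms // 1000 // 8  # bits per frame -> bytes
    return ip_udp_rtp_header_bytes + payload + tag_bytes


def srtp_overhead_percent(codec_bitrate_bps: int, frame_ms: int, tag_bytes: int = 4) -> float:
    plain = voip_packet_bytes(codec_bitrate_bps, frame_ms)
    secured = voip_packet_bytes(codec_bitrate_bps, frame_ms, tag_bytes)
    return 100.0 * (secured - plain) / plain
```

For G.711 (64 kbit/s) in 20 ms packets this gives 200 bytes unprotected and 204 bytes with SRTP, a 2% increase; for an 8 kbit/s codec the same 4 bytes on a 60-byte packet is about 6.7%, and with authentication dropped (tag_bytes=0) SRTP adds nothing.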

NAT Traversal

What is NAT?

Network Address Translation (NAT) creates an address binding between an internal private address and an external public address

Modifies IP addresses/ports in packets

Benefits

Avoids network renumbering on change of provider

Allows multiplexing of multiple private addresses into a single public address ($$ savings)

Maintains privacy of internal addresses

Figure: a client behind a NAT

Packet from the client: S: 10.0.1.1:6554 D: 67.22.3.1:80

NAT binding table (internal -> external): 10.0.1.1:6554 -> 1.2.3.4:8877

Packet after the NAT: S: 1.2.3.4:8877 D: 67.22.3.1:80
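The binding table in the figure can be sketched as two dictionaries. This is a minimal sketch, assuming the simplest NAT behavior (one binding per internal address and port, reused for every destination, with inbound packets accepted from anyone once the binding exists); real NATs vary a lot here, which is what makes traversal hard. The Nat class is hypothetical and its first external port, 8877, is chosen only to match the figure.

```python
import itertools
from typing import Dict, Tuple

Addr = Tuple[str, int]  # (IP address, port)


class Nat:
    def __init__(self, public_ip: str, first_port: int = 8877):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next external port to hand out
        self.outbound: Dict[Addr, Addr] = {}       # internal -> external
        self.inbound: Dict[Addr, Addr] = {}        # external -> internal

    def translate_out(self, src: Addr, dst: Addr) -> Tuple[Addr, Addr]:
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        if src not in self.outbound:
            ext = (self.public_ip, next(self._ports))
            self.outbound[src] = ext
            self.inbound[ext] = src
        return self.outbound[src], dst

    def translate_in(self, src: Addr, dst: Addr) -> Tuple[Addr, Addr]:
        """Rewrite the destination of an incoming packet; no binding means drop."""
        if dst not in self.inbound:
            raise LookupError("no binding: packet dropped")
        return src, self.inbound[dst]
```

Translating the figure's packet from 10.0.1.1:6554 to 67.22.3.1:80 gives source 1.2.3.4:8877, and the reply to 1.2.3.4:8877 is mapped back to 10.0.1.1:6554.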

Problem: Getting SIP Through NATs

Figure: a client behind a NAT sends

INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1

The far end sends RTP to 10.0.1.1, a private address it cannot reach
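The root of the problem is visible in the SDP itself: the c= line carries a private address. This is a minimal heuristic sketch using Python's ipaddress module; it looks only at the first IPv4 c= line, and a private address does not prove a NAT is present (nor does a public one prove there is none). sdp_connection_address and sdp_needs_nat_help are hypothetical names.

```python
import ipaddress
from typing import List, Optional


def sdp_connection_address(sdp_lines: List[str]) -> Optional[str]:
    """Return the address from the first 'c=IN IP4 ...' line, if any."""
    for line in sdp_lines:
        if line.startswith("c=IN IP4 "):
            return line.split()[2].split("/")[0]  # drop a multicast "/ttl" suffix
    return None


def sdp_needs_nat_help(sdp_lines: List[str]) -> bool:
    """Heuristic: media advertised on a private (e.g. RFC 1918) address is unreachable."""
    addr = sdp_connection_address(sdp_lines)
    return addr is not None and ipaddress.ip_address(addr).is_private
```

For the INVITE above it returns True (10.0.1.1 is private); for an SDP advertising a public address such as 1.2.3.4 it returns False.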

Solution Space

Application Layer Gateways (ALGs)

Session Border Controllers (SBCs)

Simple Traversal of UDP Through NAT (STUN)

Traversal Using Relay NAT (TURN)

Interactive Connectivity Establishment (ICE)

Application Layer Gateway

Figure: an ALG built into the NAT

Client sends:
INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1

ALG rewrites it to:
INVITE sip:12345@b.com
m=audio 1234 RTP/AVP 0
c=IN IP4 19.1.3.2

Without the rewrite, RTP would be sent to 10.0.1.1

NAT also modifies SIP messages to fix them up!
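The ALG's body rewrite can be sketched as a pass over the SDP lines. This is a minimal sketch, assuming all streams share the address of the first c= line and that a hypothetical allocate_port callback returns the NAT's public port for a private address and port; a real ALG must also rewrite SIP headers such as Via and Contact and fix Content-Length, and it cannot do any of this once the signaling is encrypted or signed.

```python
from typing import Callable, List


def alg_rewrite_sdp(sdp_lines: List[str], public_ip: str,
                    allocate_port: Callable[[str, int], int]) -> List[str]:
    """Replace private c= addresses and m= ports with the NAT's public binding."""
    private_ip = next((l.split()[2] for l in sdp_lines if l.startswith("c=IN IP4 ")), None)
    out = []
    for line in sdp_lines:
        if line.startswith("m=") and private_ip is not None:
            media, port, rest = line[2:].split(" ", 2)  # e.g. "audio", "3456", "RTP/AVP 0"
            if port != "0":  # rejected streams keep port 0
                port = str(allocate_port(private_ip, int(port)))
            line = f"m={media} {port} {rest}"
        elif line.startswith("c=IN IP4 "):
            line = f"c=IN IP4 {public_ip}"
        out.append(line)
    return out
```

With a binding of 10.0.1.1:3456 to port 1234 on public address 19.1.3.2, the client's SDP becomes exactly the rewritten one in the figure.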

ALG Benefits and Drawbacks

Drawbacks

Doesn’t work when security is turned on

Hard to diagnose problems

Requires a network upgrade to support each new app

Frequent implementation problems (lack of expertise)

Incentives mismatched

Benefits

No change to clients or servers

Session Border Controller

Figure: a client behind a NAT; an SBC at 9.8.7.6

Client sends:
INVITE sip:12345@b.com
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1

SBC forwards:
INVITE sip:12345@b.com
m=audio 3225 RTP/AVP 0
c=IN IP4 9.8.7.6

Far end sends RTP to 9.8.7.6

SBC relays RTP back to the source

SBC Benefits and Drawbacks

Drawbacks

Expensive media relaying

Interferes with some SIP extensions

Breaks more advanced SIP security

Benefits

No change to clients or NATs

Works with basic SIP security mechanisms

Easier to diagnose

Simple Traversal of UDP Through NAT (STUN)

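Figure: a client behind a NAT (public address 1.2.3.4); a STUN server at 9.8.7.6

Client asks the STUN server: What is my IP address and port, please?

STUN server: It’s 1.2.3.4:3472

Client sends:
INVITE sip:12345@b.com
m=audio 3472 RTP/AVP 0
c=IN IP4 1.2.3.4

Far end sends RTP to 1.2.3.4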
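"What is my IP address and port?" is a STUN Binding request. The slide uses the original RFC 3489 name; this sketch instead follows the newer RFC 5389 message format (magic cookie 0x2112A442, XOR-MAPPED-ADDRESS), assuming an IPv4 reply. binding_request and parse_xor_mapped_address are hypothetical names, and sending the request over UDP is left out.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def binding_request(txid: Optional[bytes] = None) -> bytes:
    """20-byte header: type, length 0 (no attributes), magic cookie, transaction ID."""
    txid = txid if txid is not None else os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def parse_xor_mapped_address(response: bytes) -> Tuple[str, int]:
    """Walk the attributes of a Binding response and decode the IPv4 mapped address."""
    msg_len = struct.unpack("!H", response[2:4])[0]
    pos, end = 20, 20 + msg_len
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family 0x01 = IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top half
            addr = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
            return str(addr), port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS")
```

A client would send binding_request() with a UDP socket to the STUN server (default port 3478) from the same socket it will use for RTP, so the reply reveals the binding the NAT created for that socket.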

STUN Benefits and Drawbacks

Drawbacks

Doesn’t always work

Benefits

No change to servers or NATs

Works with all SIP security mechanisms

Can support non-VoIP apps (e.g., games)

Traversal Using Relay NAT (TURN)

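Figure: a client behind a NAT (public address 1.2.3.4); a TURN server at 9.8.7.6

Client asks the TURN server: Give me an IP address and port, please?

TURN server: 9.8.7.6:2376

Client sends:
INVITE sip:12345@b.com
m=audio 2376 RTP/AVP 0
c=IN IP4 9.8.7.6

TURN server relays RTP to 1.2.3.4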

TURN Benefits and Drawbacks

Drawbacks

Expensive media relaying

Benefits

No change to servers or NATs

Works with all SIP security mechanisms

Can support non-VoIP apps (e.g., games)

Interactive Connectivity Establishment (ICE)

Hybrid of STUN and TURN

P2P NAT traversal

Widely deployed on the Internet

Popular with application providers

ICE Step 1: Allocation

Before making a call, the client gathers candidates

Each candidate is a potential address for receiving media

Three different types of candidates: host candidates, server reflexive candidates (STUN), relayed candidates (TURN)

Figure: host candidates reside on the agent itself; STUN candidates are addresses residing on a NAT; TURN candidates reside on a TURN server

ICE Step 2: Create Offer

Each candidate is placed into an a=candidate attribute of the offer

Each candidate line has the IP address and port, plus other information needed for ICE

c=IN IP4 192.0.2.3
t=0 0
m=audio 45664 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP 2130706431 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
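The priority field in each a=candidate line is computed, not chosen freely. This is a minimal sketch using the RFC 5245 formula, priority = 2^24 x type preference + 2^8 x local preference + (256 - component ID), with RFC 5245's recommended type preferences and an assumed local preference of 65535 (a single network interface); candidate_priority and candidate_line are hypothetical names.

```python
from typing import Optional

# Recommended type preferences: host > peer reflexive > server reflexive > relayed.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def candidate_line(foundation: int, component_id: int, ip: str, port: int, cand_type: str,
                   raddr: Optional[str] = None, rport: Optional[int] = None) -> str:
    """Render an a=candidate attribute for a UDP candidate."""
    prio = candidate_priority(cand_type, component_id=component_id)
    line = f"a=candidate:{foundation} {component_id} UDP {prio} {ip} {port} typ {cand_type}"
    if raddr is not None:
        line += f" raddr {raddr} rport {rport}"  # base address for reflexive candidates
    return line
```

For component 1 this gives 2130706431 for a host candidate and 1694498815 for a server reflexive one, so direct paths are always tried before NAT-mapped or relayed ones.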

ICE Step 3: Send INVITE

Caller sends a SIP INVITE as normal

No ICE processing by SIP servers

Figure: the caller sends the INVITE via the SIP server

ICE Step 4: Allocation

Called party does exactly the same processing as the caller and obtains its candidates

Recommended: do not ring the phone yet!

Figure: the callee gathers host, STUN and TURN candidates through its NAT

ICE Step 5: Provisional Response

Callee sends a provisional response containing its SDP with candidates

As with the INVITE, no processing by proxies

Phone has still not rung

Figure: the 1xx response travels back through the SIP proxy

ICE Step 6: Verification

Each agent pairs up its candidates (local) with its peer’s (remote) to form candidate pairs

Each agent sends a STUN-based ping on each pair, starting at the highest priority

If a response is received, the check has succeeded and we know media can flow on that pair!

Figure: connectivity checks (numbered 1 to 5) between the two agents, across their NATs and TURN servers
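Pairing and ordering the checks can be sketched with the RFC 5245 pair-priority formula, 2^32 x MIN(G, D) + 2 x MAX(G, D) + (1 if G > D else 0), where G and D are the priorities of the controlling and controlled agents' candidates. This is a simplified sketch: it pairs every local candidate with every remote one (real ICE pairs only matching components and address families, prunes redundant pairs and paces the checks), and stun_check is a hypothetical callback standing in for the actual STUN ping.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, int, int]        # (ip, port, candidate priority)
Pair = Tuple[int, Candidate, Candidate]  # (pair priority, local, remote)


def pair_priority(g: int, d: int) -> int:
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def check_list(local: List[Candidate], remote: List[Candidate], controlling: bool) -> List[Pair]:
    """All local x remote pairs, highest pair priority first."""
    pairs = []
    for l, r in product(local, remote):
        g, d = (l[2], r[2]) if controlling else (r[2], l[2])
        pairs.append((pair_priority(g, d), l, r))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


def first_working_pair(pairs: List[Pair],
                       stun_check: Callable[[Candidate, Candidate], bool]
                       ) -> Optional[Tuple[Candidate, Candidate]]:
    """Run checks in priority order; the first success is a path media can use."""
    for _, l, r in pairs:
        if stun_check(l, r):
            return l, r
    return None
```

Both agents compute the same pair priorities (the formula is symmetric once each knows who is controlling), so they converge on the same ordering without any extra signaling.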

ICE Benefits and Drawbacks

Drawbacks

Requires client changes

Requires the other side to support it

Benefits

Always works

No change to servers or NATs

Works with all SIP security mechanisms

Minimum media relaying

Can support non-VoIP apps (e.g., games)

Built-in anti-DoS

Eliminates ghost rings

That’s it!

Questions?

Glossary

AIN: Advanced Intelligent Network
ADPCM: Adaptive PCM
BGP: Border Gateway Protocol
CALEA: Communications Assistance for Law Enforcement Act
CBR: Constant Bit Rate
CELP: Code Excited Linear Prediction
CODEC: Coder/Decoder
COPS: Common Open Policy Service
CRTP: Compressed RTP
CSRC: Contributing Source
CTI: Computer-Telephony Integration
DSCP: Diffserv Code Point
DSL: Digital Subscriber Line
DSP: Digital Signal Processor
DTMF: Dual Tone Multi-Frequency
ERL: Echo Return Loss
ERLE: ERL Enhancement
HFC: Hybrid Fiber/Coax
IN: Intelligent Network
ISDN: Integrated Services Digital Network
ISUP: ISDN User Part
JTAPI: Java Telephony API
LDAP: Lightweight Directory Access Protocol
MCML: Multi-class Multi-link PPP
MGCP: Media Gateway Control Protocol
MOS: Mean Opinion Score
MPLS: Multi-protocol Label Switching
NLP: Non-linear Processing
NTP: Network Time Protocol
PCM: Pulse Code Modulation
PPP: Point-to-Point Protocol
PHB: Per-hop Behavior
PQ: Priority Queueing
PSTN: Public Switched Telephone Network
QoS: Quality of Service
RED: Random Early Detection (or Drop)
RTCP: RTP Control Protocol
RTP: Real-time Transport Protocol
SCP: Service Control Point
SIP: Session Initiation Protocol
SS7: Signaling System Number 7
SSRC: Synchronization Source
TAPI: Telephony API
TDM: Time Division Multiplexed
TRIP: Telephony Routing over IP
TSPEC: Traffic Specification
WFQ: Weighted Fair Queueing

Thanks

Enjoy Interop!

To contact me: jdrosen@jdrosen.net