Sip


Page 1: Sip

SIP (Session Initiation Protocol) Introduction

SIP (Session Initiation Protocol) is a signaling protocol used to create, manage and terminate sessions in an IP based network. A session could be a simple two-way telephone call or it could be a collaborative multi-media conference session. This makes it possible to implement services like voice-enriched e-commerce, web page click-to-dial or Instant Messaging with buddy lists in an IP based environment. Don't worry if you don't know about these services. You don't need to know them before you learn about SIP.

SIP has been the choice for services related to Voice over IP (VoIP) in the recent past. It is a standard (RFC 3261) put forward by the Internet Engineering Task Force (IETF). SIP is still growing and being modified to take into account all relevant features as the technology expands and evolves. But it should be noted that the job of SIP is limited to only the setup and control of sessions. The details of the data exchange within a session, e.g. the encoding or codec used for audio/video media, are not controlled by SIP and are taken care of by other protocols. An overview of the major SIP functions appears later in this tutorial.

This introduction is meant for beginners. It gives a brief introduction to SIP before one ventures into the long RFC documents. However, if you are a veteran, please go through this short tutorial and suggest modifications.

Here on this site the aim is not to make you an expert in SIP based applications. I doubt whether any site can do that. You have to have hands-on experience to master the aspects related to Internet multimedia or IP telephony. Here I am proposing nothing new. The whole job is to acquaint a newcomer with the facets of the Session Initiation Protocol (SIP) so that a nearly 200 page RFC document does not intimidate you. However, I strongly recommend that you go through RFC 3261 once you have completed this tutorial.

If you need a book that you can use to start with SIP, SIP Demystified is a good option. It starts with standard telephony systems and gradually guides you into the Session Initiation Protocol.

We shall start with a little background history of SIP. If you are in a hurry, you can skip to the functions of SIP.


After going through the online tutorial, I recommend that you go through some of the books that match your needs and interests. You can visit the books section or directly check those available on amazon.com.

A Brief History of SIP

Initially, the traditional switch-based telephone system was the main medium for transmitting messages. However, with the advent of the Internet, the need was felt to build a system that connects people over the IP based network. Different communities put forward different solutions, but the solution presented by the IETF was finally accepted as the most general one. However, the development of SIP in the IETF was not a one-step process.

February 1996

Initial Internet drafts were produced in the form of -

Session Invitation Protocol (SIP) – M. Handley, E. Schooler

Simple Conference Invitation Protocol (SCIP) – H. Schulzrinne

SIP was originally intended to create a mechanism for inviting people to large-scale multipoint conferences on the Internet Multicast Backbone (Mbone). At this stage, IP telephony didn't really exist. The first draft was known as "draft-ietf-mmusic-sip-00". It included only one request type, which was a call setup request. (Wondering what "mmusic" is doing in SIP? Well, it is an acronym for Multiparty Multimedia Session Control. IETF people are not that music crazy after all.)

December 1996

A newer version “draft-ietf-mmusic-sip-01” was proposed as a modification to SIP-0. Still it was yet to take the shape of SIP as we know it now.

January 1999

The IETF published the draft called "draft-ietf-mmusic-sip-12". It contained the six requests that SIP has today.

March 1999

SIP was published as a standard in RFC 2543. It was modified further to produce the more modern version, RFC 3261.


Let's leave the history to get older and concentrate on perhaps the most important part of this tutorial. Let's learn about the functions of SIP.

Functions of SIP

SIP is limited to only the setup, modification and termination of sessions. It serves four major purposes -

SIP allows for the establishment of user location (i.e. translating from a user's name to their current network address).

SIP provides for feature negotiation so that all of the participants in a session can agree on the features to be supported among them.

SIP is a mechanism for call management - for example adding, dropping, or transferring participants.

SIP allows for changing features of a session while it is in progress.

All of the other key functions are done with other protocols.

Yes! This does indeed mean that SIP is not a session description protocol, and that SIP does not do conference control. SIP is not a resource reservation protocol and it has nothing to do with quality of service (QoS). SIP can work in a framework with other protocols to make sure these roles are played out, but SIP does not do them. SIP can function with SOAP, HTTP, XML, VXML, WSDL, UDDI, SDP and others. Everyone has a role to play!

With all that said, SIP is still one of the most important protocols. Better learn about the SIP components.

Components of SIP

Entities interacting in a SIP scenario are called User Agents (UA). User Agents may operate in two fashions -

User Agent Client (UAC) : It generates requests and sends those to servers.

User Agent Server (UAS) : It gets requests, processes those requests and generates responses.

Note: A single UA may function as both.


Clients:

In general we associate the notion of clients to the end users i.e. the applications running on the systems used by people. It may be a softphone application running on your PC or a messaging device in your IP phone. It generates a request when you try to call another person over the network and sends the request to a server (generally a proxy server). We will go through the format of requests and proxy servers in more detail later.

Servers:

Servers are in general part of the network. They possess a predefined set of rules to handle the requests sent by clients. Servers can be of several types -

Proxy Server: These are the most common type of server in a SIP environment. When a request is generated, the exact address of the recipient is not known in advance. So the client sends the request to a proxy server. The server, on behalf of the client (as if giving a proxy for it), forwards the request to another proxy server or to the recipient itself.

Redirect Server: A redirect server redirects the request back to the client, indicating that the client needs to try a different route to get to the recipient. It generally happens when a recipient has moved from its original position either temporarily or permanently.

Registrar: As you might have guessed already, one of the prime jobs of the servers is to detect the location of a user in a network. How do they know the location? If you are thinking that users have to register their locations with a Registrar server, you are absolutely right. Users from time to time refresh their locations by registering (sending a special type of message) with a Registrar server.

Location Server: The addresses registered to a Registrar are stored in a Location Server.

Now that the components are ready, we need the SIP commands to make them work.

Commands of SIP

INVITE : Invites a user to a call.

ACK : Acknowledgement is used to facilitate reliable message exchange for INVITEs.

BYE : Terminates a connection between users.

CANCEL : Terminates a request, or search, for a user. It is used if a client sends an INVITE and then changes its decision to call the recipient.

OPTIONS : Solicits information about a server's capabilities.

REGISTER : Registers a user's current location.

INFO : Used for mid-session signaling.

If you don't realise how the commands exactly work, don't worry. We will discuss the format of some of the above SIP commands in more detail shortly.

It's time to go through a typical SIP session so that you can appreciate what we have learnt so far and what follows in our journey through SIP.

A Typical Example of SIP session

SIP signaling follows the server-client paradigm as used widely in the Internet by protocols like HTTP or SMTP. The following picture presents a typical exchange of requests and responses. Please note that it is only a typical case and doesn't include all possible cases.

If you are unfamiliar with terms like SIP phone or softphone, learn about VoIP phones first.

Before understanding the methods, first you should understand the pictorial diagram. User1 uses his softphone to reach the SIP phone of user2. Server1 and server2 help to set up the session on behalf of the users. This common arrangement of the proxies and the end-users is called the "SIP Trapezoid", as depicted by the dotted line. The messages appear vertically in the order they are sent, i.e. the message on top (INVITE M1) comes first, followed by the others. The direction of the arrows shows the sender and recipient of each message. Each response contains a 3-digit number followed by a name, and each message is labeled by 'M' and a serial number. The 3-digit number is the numerical code of the response, which is easily understood by machines. Human users use the name to identify the message.


Figure : SIP session example with SIP trapezoid

The transaction starts with user1 making an INVITE request for user2. But user1 doesn't know the exact location of user2 in the IP network. So it passes the request to server1. Server1 on behalf of user1 forwards an INVITE request for user2 to server2. It sends a TRYING response to user1 informing that it is trying to reach user2. The response could have been different but we will discuss the other types of responses later. If you are wondering how server1 knows that it has to forward the request to server2, just hold on for a moment. We will discuss that while going through the registration process of SIP.

Receiving INVITE M2 from server1, server2 works in a similar fashion as server1. It forwards an INVITE request to user2 (note: Here server2 knows the location of user2. If it didn't know the location, it would have forwarded it to another proxy server. So an INVITE request may travel through several proxies before reaching the recipient). After forwarding INVITE M3 server2 issues a TRYING response to server1.

The SIP phone, on receiving the INVITE request, starts ringing informing user2 that a call request has come. It sends a RINGING response back to server2 which reaches user1 through server1. So user1 gets a feedback that user2 has received the INVITE request.

User2 at this point has a choice to accept or decline the call. Let's assume that he decides to accept it. As soon as he accepts the call, a 200 OK response is sent by the phone to server2. Retracing the route of the INVITE, it reaches user1. The softphone of user1 sends an ACK message to confirm the setup of the call. This 3-way handshake (INVITE+OK+ACK) is used for reliable call setup. Note that the ACK message does not use the proxies to reach user2, as by now user1 knows the exact location of user2.

Once the connection has been set up, media flows between the two endpoints. Media flow is controlled using protocols different from SIP, e.g. RTP.

When one party in the session decides to disconnect, it (user2 in this case) sends a BYE message to the other party. The other party sends a 200 OK message to confirm the termination of the session.
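The exchange described above can be written down as plain data, which makes it easy to see which messages are requests and which ones bypass the proxies. This is only a sketch of the walkthrough in the text: the tuple layout and the helper names are illustrative, and the status codes 100, 180 and 200 are the standard codes for TRYING, RINGING and OK.

```python
# Message flow of the example session, as (sender, receiver, message).
CALL_FLOW = [
    ("user1", "server1", "INVITE"),
    ("server1", "server2", "INVITE"),
    ("server1", "user1", "100 Trying"),
    ("server2", "user2", "INVITE"),
    ("server2", "server1", "100 Trying"),
    ("user2", "server2", "180 Ringing"),
    ("server2", "server1", "180 Ringing"),
    ("server1", "user1", "180 Ringing"),
    ("user2", "server2", "200 OK"),
    ("server2", "server1", "200 OK"),
    ("server1", "user1", "200 OK"),
    ("user1", "user2", "ACK"),     # sent directly, without the proxies
    ("user2", "user1", "BYE"),     # a new request, also sent directly
    ("user1", "user2", "200 OK"),  # confirms the termination
]


def is_request(message):
    """Responses start with a 3-digit status code, requests with a method name."""
    return not message[:3].isdigit()


def uses_proxy(sender, receiver):
    """True when either end of the hop is one of the two proxy servers."""
    return sender.startswith("server") or receiver.startswith("server")


if __name__ == "__main__":
    for sender, receiver, message in CALL_FLOW:
        kind = "request" if is_request(message) else "response"
        print(f"{sender:8} -> {receiver:8} {message:12} ({kind})")
```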

Was that a bit long? Need a break? Go, get it! You deserve a break after going through such a long SIP session -:) When you get back, we will dive inside a SIP request message.

Request Message Format of SIP

Back already! Well, let's continue.

In the previous SIP session example we have seen that requests are sent by clients to servers. We will now discuss what that request actually contains. The following is the format of INVITE request as sent by user1.

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: user2 <sip:[email protected]>
From: user1 <sip:[email protected]>;tag=1928301774
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 142

---- User1 Message Body Not Shown ----

The first line of the text-encoded message is called Request-Line. It identifies that the message is a request.

Request-Line

Method SP Request-URI SP SIP-Version CRLF

[SP = single space and CRLF = Carriage Return + Line Feed (i.e. the characters inserted when you press the "Enter" or "Return" key of your computer)]

Here the method is INVITE, the Request-URI is "[email protected]" and the SIP version is 2.0.

The following lines are a set of header fields.

Via:


It contains the local address of user1 i.e. pc33.server1.com where it is expecting the responses to come.

Max-Forwards:

It is used to limit the number of hops that this request may take before reaching the recipient. It is decreased by one at each hop. It is necessary to prevent the request from traveling forever in case it is trapped in a loop.

To:

It contains a display name "user2" and a SIP or SIPS URI <[email protected]>

From:

It also contains a display name "user1" and a SIP or SIPS URI <[email protected]>. It also contains a tag which is a pseudo-random sequence inserted by the SIP application. It works as an identifier of the caller in the dialog.

Call-ID:

It is a globally unique identifier of the call, generated as the combination of a pseudo-random string and the softphone's host name or IP address. The Call-ID is unique for a call. A call may contain several dialogs. Each dialog is uniquely identified by a combination of the From tag, the To tag and the Call-ID. If this is confusing, the section on the relation among Call, Dialog, Transaction & Message explains it in more detail.

CSeq:

It contains an integer and a method name. When a transaction starts, the first message is given a random CSeq. After that it is incremented by one with each new request, while a response carries the same CSeq as the request it answers. It is used to detect non-delivery of a message or out-of-order delivery of messages.

Contact:

It contains a SIP or SIPS URI that is a direct route to user1. It contains a username and a fully qualified domain name (FQDN). It may also have an IP address.

The Via field is used to send the response to the request. The Contact field is used to send future requests. That is why the 200 OK response from user2 goes to user1 through the proxies. But when user2 generates a BYE request (a new request and not a response to INVITE), it goes directly to user1, bypassing the proxies.

Content-Type:

It contains a description of the message body (not shown).


Content-Length:

It is an octet (byte) count of the message body.

The header may contain other header fields also. However those fields are optional. Please note that the body of the message is not shown here. The body is used to convey information about the media session written in Session Description Protocol (SDP). You may continue your journey through SIP without worrying about SDP right now. However it doesn't hurt to take a peep.
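The Request-Line and header layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete SIP parser: it ignores folded header lines and compact header names, the function names are made up for this example, and the URI `sip:user2@example.org` is a placeholder because the addresses in the sample message are not shown.

```python
def parse_request(raw):
    """Split a SIP request into (method, request_uri, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request-Line = Method SP Request-URI SP SIP-Version
    method, request_uri, version = lines[0].split(" ")
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return method, request_uri, version, headers, body


def header_value(headers, wanted):
    """Return the first value of a header, matching the name case-insensitively."""
    return next(v for n, v in headers if n.lower() == wanted.lower())


def content_length_ok(headers, body):
    """Content-Length is an octet count, so compare against the encoded body."""
    return int(header_value(headers, "Content-Length")) == len(body.encode("utf-8"))


def next_max_forwards(headers):
    """Value a proxy would forward, or None when no hops are left."""
    current = int(header_value(headers, "Max-Forwards"))
    return current - 1 if current > 0 else None


# Placeholder request: shortened header set, hypothetical URI, 4-octet body.
EXAMPLE = (
    "INVITE sip:user2@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 4\r\n"
    "\r\n"
    "v=0\n"
)

if __name__ == "__main__":
    method, uri, version, headers, body = parse_request(EXAMPLE)
    print(method, uri, version)
    print("Content-Length matches:", content_length_ok(headers, body))
    print("Max-Forwards after one hop:", next_max_forwards(headers))
```

Keeping the headers as a list of pairs rather than a dictionary preserves their order and allows repeated fields, which matters for Via.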

Your SIP request is waiting to get a SIP response message.

Response Message Format of SIP

Here is what the SIP response of user2 will look like.

SIP/2.0 200 OK
Via: SIP/2.0/UDP site4.server2.com;branch=z9hG4bKnashds8;received=192.0.2.3
Via: SIP/2.0/UDP site3.server1.com;branch=z9hG4bK77ef4c2312983.1;received=192.0.2.2
Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds;received=192.0.2.1
To: user2 <sip:[email protected]>;tag=a6c85cf
From: user1 <sip:[email protected]>;tag=1928301774
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 131

---- User2 Message Body Not Shown ----

Status Line

The first line in a response is called the Status-Line.

SIP-Version SP Status-Code SP Reason-Phrase CRLF

[SP = single space and CRLF = Carriage Return + Line Feed (i.e. the characters inserted when you press the "Enter" or "Return" key of your computer)]

Here the SIP version is 2.0, the Status-Code is 200 and the Reason-Phrase is OK.

The header fields that follow the status line are similar to those in a request. I will just mention the differences -

Via:

There is more than one Via field. This is because each element through which the INVITE request has passed has added its identity in a Via field. The three Via fields were added by the softphone of user1, by server1 (the first proxy) and by server2 (the second proxy). The response retraces the path of the INVITE using the Via fields. On its way back, each element removes its own Via field before forwarding the response towards the caller.

To:

Note that the To field now contains a tag. This tag is used to represent the callee in a dialog.

Contact:

It contains the exact address of user2. So user1 doesn't need to use the proxy servers to find user2 in the future.
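A short sketch may help to see how the Status-Line is split and how the Via fields steer the response back to the caller. It assumes the Via hosts are given top-first, as in the 200 OK shown above; the function names are illustrative only.

```python
def parse_status_line(line):
    """Status-Line = SIP-Version SP Status-Code SP Reason-Phrase."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def retrace(via_hosts):
    """Return (receiver, vias_left) for each hop of a response.

    The element named in the top Via receives the response, removes its own
    entry and forwards to the next one. Zero entries left means the response
    has arrived at the original sender of the request.
    """
    remaining = list(via_hosts)
    hops = []
    while remaining:
        receiver = remaining.pop(0)
        hops.append((receiver, len(remaining)))
    return hops


if __name__ == "__main__":
    print(parse_status_line("SIP/2.0 200 OK"))
    vias = ["site4.server2.com", "site3.server1.com", "pc33.server1.com"]
    for receiver, left in retrace(vias):
        print(f"response reaches {receiver}, {left} Via field(s) left")
```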

The 200 OK response above is a 2xx response. However, responses can be different depending on particular situations. The different types of SIP responses are described next.

Response Types of SIP

The first digit of a Status-Code defines the category of the response. So any response between 100 and 199 is termed a "1xx" response, and likewise for the other categories. SIP/2.0 allows six types of response. They are similar to those of HTTP.

1xx: Provisional -- request received, continuing to process the request;

2xx: Success -- the action was successfully received, understood, and accepted;

3xx: Redirection -- further action needs to be taken in order to complete the request;

4xx: Client Error -- the request contains bad syntax or cannot be fulfilled at this server;

5xx: Server Error -- the server failed to fulfill an apparently valid request;

6xx: Global Failure -- the request cannot be fulfilled at any server.

If a response is received having a Status-Code of the form yxx which is not understood by the receiving party, it treats the response as a y00 response, i.e. if a client receives an unknown response 345, it treats that as a 300 response. An unknown 1xx is treated as 183 (Session Progress). So each UA must know how to react to 100, 183, 200, 300, 400, 500 and 600.
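The fallback rule for unknown status codes is simple enough to sketch directly. The set of codes a UA understands (`known_codes` below) is a hypothetical parameter for this example.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}

# The minimum every UA must understand, according to the text above.
BASE_CODES = {100, 183, 200, 300, 400, 500, 600}


def response_class(status_code):
    """The first digit of the status code selects the category."""
    return RESPONSE_CLASSES[status_code // 100]


def effective_code(status_code, known_codes=BASE_CODES):
    """Map an unrecognized code to the code it should be treated as."""
    if status_code in known_codes:
        return status_code
    if status_code // 100 == 1:
        return 183  # unknown provisional responses are treated as 183
    return (status_code // 100) * 100  # unknown yxx is treated as y00


if __name__ == "__main__":
    for code in (345, 199, 200, 486):
        print(code, response_class(code), "->", effective_code(code))
```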

In SIP we talk about calls, dialogs, transactions and messages. Frankly, I was pretty confused initially about how they are related. The next page clarifies their inter-relation.


Relation among Call, Dialog, Transaction & Message

If you are confused about the relation among Call, Dialog, Transaction & Message, you are not alone. I think quite a good number of people get confused regarding the relation in the beginning.

Messages are the individual textual bodies exchanged between a server and a client. There can be two types of messages. Bingo! You already know them ... Requests and Responses.

Transaction occurs between a client and a server and comprises all messages from the first request sent from the client to the server up to a final (non-1xx) response sent from the server to the client. If the request is INVITE and the final response is a non-2xx, the transaction also includes an ACK to the response. The ACK for a 2xx response to an INVITE request is a separate transaction.

Dialog is a peer-to-peer SIP relationship between two UAs that persists for some time. A dialog is identified by a Call-ID, a local tag and a remote tag. A dialog used to be referred to as a 'call leg'.

Call of a callee comprises all the dialogs it is involved in. I think a Call is the same as a Session.
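To make the dialog identifier concrete, here is a small sketch of how each side could build it from the Call-ID and the two tags. For the caller (UAC) the local tag is the From tag and the remote tag is the To tag; for the callee (UAS) it is the other way round. The `DialogId` type and the `role` argument are invented for this example, and `"example-call-id"` is a placeholder value.

```python
from collections import namedtuple

DialogId = namedtuple("DialogId", ["call_id", "local_tag", "remote_tag"])


def dialog_id(call_id, from_tag, to_tag, role):
    """Build the dialog identifier as seen by one side of the dialog.

    role is "uac" for the caller and "uas" for the callee.
    """
    if role == "uac":
        return DialogId(call_id, local_tag=from_tag, remote_tag=to_tag)
    if role == "uas":
        return DialogId(call_id, local_tag=to_tag, remote_tag=from_tag)
    raise ValueError("role must be 'uac' or 'uas'")


if __name__ == "__main__":
    # Tags taken from the example messages earlier in the tutorial.
    print(dialog_id("example-call-id", "1928301774", "a6c85cf", "uac"))
    print(dialog_id("example-call-id", "1928301774", "a6c85cf", "uas"))
```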


The following figure will make the relation clearer.

(RINGING is a 1xx response and OK is a 2xx response.)

A caller may have connections to a number of callees at a time forming a number of dialogs. All these dialogs make a single call.

Well, time to reveal an old secret! If you want to know how server1 knew the location of user2 during the call setup, the section about SIP registration will help you.

Registration in SIP

While going through a typical SIP session you have already seen that the caller doesn't know the address of the callee initially. The proxy servers do the job of finding out the exact location of the recipient. What actually happens is that every user registers its current location with a Registrar server. The application sends a message called REGISTER informing the server of its present location. The Registrar stores this binding (between the user and its present address) in a location server which is used by other proxies to locate the user.

User yy uses the IP 195.31.65.152 as its current location and registers it with the server. This actually helps in user mobility. Say there is a messaging application. You can log in from different computers. As soon as you log in using your username, the application REGISTERs the username with the IP of that computer. The 'Expires' field reflects the duration for which this registration will be valid. So the user has to refresh its registration from time to time.

Please note that the difference between a proxy server and a registration or a location server is often only logical. Physically they may be situated on the same machine.
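The registration idea can be sketched as a tiny in-memory store that plays the roles of Registrar and location server together. This is a toy model under stated assumptions: the class and method names are invented, the lifetime is given in seconds, and the current time is passed in explicitly so that the behaviour is easy to follow.

```python
class LocationService:
    """Toy registrar plus location store: user -> (address, expiry time)."""

    def __init__(self):
        self._bindings = {}

    def register(self, user, address, expires, now):
        """Handle a REGISTER: create or refresh the binding for this user."""
        self._bindings[user] = (address, now + expires)

    def lookup(self, user, now):
        """Return the current address of the user, or None if unknown or expired."""
        binding = self._bindings.get(user)
        if binding is None:
            return None
        address, expiry = binding
        if now >= expiry:
            # The user did not refresh in time, so the binding is dropped.
            del self._bindings[user]
            return None
        return address


if __name__ == "__main__":
    service = LocationService()
    # The one-hour lifetime is an assumed value for illustration.
    service.register("yy", "195.31.65.152", expires=3600, now=0)
    print(service.lookup("yy", now=10))
    print(service.lookup("yy", now=3600))
```

A proxy that needs to forward an INVITE would call `lookup` and, if it gets `None`, would have to hand the request to another server or reject it.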

Wow!! You have completed the whole of the SIP tutorial. Congratulations! I insist that you go through the conclusion. It has important information to move forward in your SIP education.

Conclusion

I hope by now you have got a basic idea of what SIP is and what it does. You should be able to recognize the major components in a SIP scenario and how different messages are exchanged to establish and terminate sessions. But you must remember that it is just the beginning. You should go through RFC 3261. If you are serious about your learning, better get your hands on a book as recommended in the books section.

You should go through the other sections of the site -

Introduction to RTP : RTP manages the real-time transmission of audio/video data in a session.

Introduction to SDP : SDP is used for describing a session, which is needed for establishing and sustaining it.

VoIP : VoIP is the technology to transmit voice over an IP network. It's an emerging area you would like to know about.

I encourage you to go through the resources listed in internet multimedia resources page.


I intend to include some more pages regarding header fields and proxy servers in the near future. So keep coming back. If you have any query or suggestion, and more importantly if you have found any mistakes in the tutorial, please feel free to email me at [email protected].

RTP Introduction

This introduction is meant for beginners. It gives a brief introduction to RTP before one ventures into the long RFC documents. However, if you are a veteran, please go through this short tutorial and suggest modifications.

What is RTP?

Real-time Transport Protocol (RTP) provides end-to-end delivery services for data (such as interactive audio and video) with real-time characteristics.

It was primarily designed to support multiparty multimedia conferences. However it is used for different types of applications which we will go through shortly.

RTP is a standard specified in RFC 1889. More recent versions are RFC 3550, which obsoletes RFC 1889, and its companion audio/video profile, RFC 3551. For an introduction like this we will stick to RFC 1889.

Real Time aspect of RTP

What is meant by real-time?

The class of methods whose correctness depends not only on whether the result is the correct one, but also on the time at which the result is delivered.

To make things simpler, let's take an example. Say you want to listen to a song. When you are downloading it from a site, you don't care whether it is downloaded at a steady rate or not. You just need a reliable download (preferably fast -:)). But what if you want to listen to the song without downloading it? Then you are interested not only in getting the whole data but also in the rate at which you receive it, otherwise the song loses its charm. Here you need a real-time transmission.

Note that the example is given only to show how the time-factor is important in transmission of data. Real-time transmission is more important in multimedia conferences.

RTP gives No Guarantee for timely Delivery


Confused?? I bet you are!

Well, the point is that RTP itself does not provide any mechanism to ensure timely delivery or provide other quality-of-service guarantees. It relies on lower-layer services (e.g. UDP, TCP) to do so. The dependence will be clearer when we discuss the RTP packet structure.

So how come it is called a real-time protocol?

RTP provides suitable functionality for carrying real-time content, e.g., a timestamp and control mechanisms for synchronizing different streams with timing properties. We will discuss those in more detail soon.

Components of RTP

Before going into the detailed structure of RTP, you should know that RTP is basically a combination of two parts -

Real-time Transport Protocol (RTP) : It carries real-time data.

RTP Control Protocol (RTCP) : It monitors the quality of service and conveys information about the participants.

We will go through RTP first and then discuss RTCP. Both play important roles in the transmission. Here you should note that the data and control messages are separated in the forms of RTP and RTCP.

Applications of RTP

The applications in which RTP plays an important role can be classified as follows -

Simple Multicast Audio Conference

Initially the owner of the conference (say the leader of a group) obtains a multicast group address and a pair of ports through some allocation mechanism. One port is used for audio data, and the other is used for control (RTCP) packets. This address and port information is distributed to the intended participants. If privacy is desired, the data and control packets may be encrypted, in which case an encryption key must also be generated and distributed.


Each participant sends the audio data in small chunks (say 20ms) or packets. The structure of the packets will be discussed later.

Each instance of the audio application (i.e. each participant) in the conference periodically multicasts a reception report plus the name of its user on the RTCP (control) port. This helps to monitor quality of transmission and also determine who the present participants are.
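To get a feel for the numbers behind the 20 ms chunks, here is a small arithmetic sketch. It assumes 8 kHz audio and a timestamp clock that ticks once per sample; neither value comes from the text above, and the function names are invented for this example.

```python
def samples_per_packet(sample_rate_hz, packet_ms):
    """Number of audio samples carried by one packet."""
    return sample_rate_hz * packet_ms // 1000


def packets_per_second(packet_ms):
    """How many packets each participant sends per second."""
    return 1000 // packet_ms


def timestamps(first, count, sample_rate_hz, packet_ms):
    """Timestamps of consecutive packets, wrapping around at 32 bits."""
    step = samples_per_packet(sample_rate_hz, packet_ms)
    return [(first + i * step) % 2**32 for i in range(count)]


if __name__ == "__main__":
    print(samples_per_packet(8000, 20), "samples per packet")
    print(packets_per_second(20), "packets per second")
    print(timestamps(1000, 3, 8000, 20))
```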

Audio and Video Conference

If both audio and video media are used in a conference, they are transmitted as separate RTP sessions, and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. The canonical name or CNAME of individual participants is used to match the audio and video sessions. We will cover CNAME when we discuss the functions of RTCP.

The sessions are divided so that a participant may choose only one of them. If there is a lecture going on, you can just listen to the professor without having to see his face -:).

Mixers and Translators

So far, we have assumed that all sites want to receive media data in the same format. However, this may not always be appropriate. Users having connections of different bandwidth, or those working behind a firewall which won't allow the IP packets to pass, will need some extra processing. This is done in the form of mixers and translators. We will discuss them briefly in the next two sections.

Mixer in RTP

It may so happen that all participants in a conference do not have connections of the same bandwidth. So how do they take part simultaneously?

One solution is that all of them use a lower bandwidth. But this leads to reduced-quality audio encoding.

A smarter solution exists in the use of an RTP-level relay called a mixer. A mixer may be placed near the low-bandwidth area. This mixer resynchronizes incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed link. The following figure gives a graphical representation -


The mixer puts its own identification as the source (SSRC) of the packet and puts the contributing sources in CSRC fields. If you don't know about SSRC and CSRC, come back to this paragraph after going through the RTP header structure.

Mixers have other uses too. An example is a video mixer that scales the images of individual people in separate video streams and composites them into one video stream to simulate a group scene.
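The way a mixer labels its output can be sketched as follows. The packet layout (a dictionary with "ssrc" and "samples") is hypothetical, the samples are assumed to be 16-bit linear audio that is already resynchronized, and the step of translating to a lower-bandwidth encoding is left out.

```python
def mix(packets, mixer_ssrc):
    """Combine one packet per source into a single outgoing packet."""
    length = min(len(p["samples"]) for p in packets)
    mixed = []
    for i in range(length):
        total = sum(p["samples"][i] for p in packets)
        # Clamp to the signed 16-bit range so that loud sources do not overflow.
        mixed.append(max(-32768, min(32767, total)))
    return {
        "ssrc": mixer_ssrc,  # the mixer becomes the synchronization source
        "csrc": [p["ssrc"] for p in packets][:15],  # at most 15 contributors
        "samples": mixed,
    }


if __name__ == "__main__":
    incoming = [
        {"ssrc": 11, "samples": [100, 30000]},
        {"ssrc": 22, "samples": [-50, 30000]},
    ]
    print(mix(incoming, mixer_ssrc=99))
```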

Translator in RTP

A problem occurs if one or more participants of a conference are behind a firewall which won't allow an IP packet containing the RTP message to pass. For this situation translators are used.

Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site's internal network. The following picture illustrates it -

Translators, unlike mixers, do not change the SSRC or CSRC fields. If you don't know about SSRC and CSRC, come back to this paragraph after going through the RTP header structure.

Translators can be used for other purposes too, e.g. to connect a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II.

Packet Structure of RTP

The structure of an RTP packet is shown below.

The real-time media that is being transferred forms the 'RTP Payload'. The RTP header contains information related to the payload, e.g. the source, timestamp, encoding type etc. We will go through the header structure in the next section.

However, the RTP packet can't be transferred as it is over the network. For transferring we use a transport protocol called User Datagram Protocol (UDP). We won't discuss the UDP header.

To transfer the UDP packet over the IP network, we need to encapsulate it within an IP packet. We won't discuss the IP header either. To transfer the IP packet over the physical network, even the IP packet is sent within other packets. Those are not shown here.

Header Structure of RTP

The following figure shows the RTP header structure -

version (V): 2 bits

This field identifies the version of RTP. The version defined by RFC 1889 is 2.

padding (P): 1 bit

If the padding bit is set, the packet contains one or more additional padding octets at the end which are not part of the payload. The last octet of the padding contains a count of how many padding octets should be ignored. Padding may be needed by some encryption algorithms with fixed block sizes or for carrying several RTP packets in a lower-layer protocol data unit.

extension (X): 1 bit

If the extension bit is set, the fixed header is followed by exactly one header extension.

CSRC count (CC): 4 bits

The CSRC count contains the number of CSRC identifiers that follow the fixed header.

marker (M): 1 bit

The marker bit is used by specific applications for purposes of their own. We will discuss this in more detail when we study Application Level Framing.

payload type (PT): 7 bits

This field identifies the format (e.g. encoding) of the RTP payload and determines its interpretation by the application. This field is not intended for multiplexing separate media.

sequence number: 16 bits

The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. The initial value of the sequence number is random (unpredictable).

timestamp: 32 bits

The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations.

SSRC: 32 bits

The SSRC field identifies the synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.

CSRC list: 0 to 15 items, 32 bits each

The CSRC list identifies the contributing sources for the payload contained in this packet. The number of identifiers is given by the CC field. If there are more than 15 contributing sources, only 15 may be identified. CSRC identifiers are inserted by mixers, using the SSRC identifiers of contributing sources.
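
The field list above can be turned into a small parser. The following Python sketch assumes the fields appear in the order listed, in network byte order, with V, P, X and CC sharing the first octet and M and PT sharing the second. The names RtpHeader and parse_rtp_header are made up for this example; header extensions and padding are detected but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(data: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list from the start of data."""
    if len(data) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Two single octets, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = b0 & 0x0F                      # low 4 bits: CSRC count
    end = 12 + 4 * cc                   # each CSRC identifier is 32 bits
    if len(data) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = struct.unpack(f"!{cc}I", data[12:end])
    return RtpHeader(
        version=b0 >> 6,                # top 2 bits
        padding=bool(b0 & 0x20),        # 1 bit
        extension=bool(b0 & 0x10),      # 1 bit
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # top bit of the second octet
        payload_type=b1 & 0x7F,         # remaining 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

The payload starts right after the CSRC list, unless the extension bit is set, in which case the header extension comes first.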

Synchronization in RTP

The receiver needs three key pieces of information for synchronization: the synchronization source, the order of the packets and the sampling instant of each packet. It gets these from three header fields, so you must know about the header fields first.

Synchronization Source (SSRC)

The receiver may be receiving data from several sources. To arrange the data properly it needs to identify the source of each packet, which the SSRC field makes possible.

Sequence Number

It is not enough to identify the source; the order is important too. The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. Loss or out-of-order delivery occurs due to network problems.
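
As a rough sketch of how a receiver might use the sequence number, the following Python code compares two consecutive sequence numbers from the same source. Since the field is only 16 bits wide, the counter wraps from 65535 back to 0, so the comparison is done modulo 65536. The function names seq_delta and classify are made up for this example, and a real receiver would need more care (e.g. for very long gaps or a restarted source).

```python
def seq_delta(previous: int, current: int) -> int:
    """Signed distance from previous to current, allowing for 16-bit wraparound."""
    diff = (current - previous) & 0xFFFF
    # Distances of half the number space or more are read as "behind".
    return diff - 0x10000 if diff >= 0x8000 else diff


def classify(previous: int, current: int) -> str:
    """Describe how the current packet relates to the previous one."""
    delta = seq_delta(previous, current)
    if delta == 1:
        return "in order"
    if delta > 1:
        return f"{delta - 1} packet(s) missing"
    if delta == 0:
        return "duplicate"
    return "late or reordered"
```

For example, a packet numbered 0 that arrives right after packet 65535 is in order, not 65535 packets late.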

Timestamp

For media delivery, not just the order of the packets but also the sampling instant of each packet is important. Please go through the following paragraph carefully.

Several consecutive RTP packets may have equal timestamps if they are (logically) generated at once, e.g., belong to the same video frame. Consecutive RTP packets may contain timestamps that are not monotonic if the data is not transmitted in the order it was sampled, as in the case of MPEG interpolated video frames. (The sequence numbers of the packets as transmitted will still be monotonic.) So the sequence number is not enough for synchronization.

You already know that in an audio/video session audio and video data are transmitted using separate channels (if you don't know this, please go through applications of RTP). The receiver matches the video data with the corresponding audio data using the timestamps.
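
A timestamp is a count of clock ticks, not a time in seconds, so the receiver needs the clock rate of the payload format to turn it into media time. The sketch below assumes clock rates of 8000 Hz for audio and 90000 Hz for video; these are common values but they are assumptions here, not something fixed by the text above. The function names are made up for the example.

```python
def timestamp_delta(earlier: int, later: int) -> int:
    """Ticks elapsed between two timestamps, allowing for 32-bit wraparound."""
    return (later - earlier) & 0xFFFFFFFF


def elapsed_seconds(earlier: int, later: int, clock_rate: int) -> float:
    """Media time between two packets of the same stream, in seconds."""
    if clock_rate <= 0:
        raise ValueError("clock rate must be positive")
    return timestamp_delta(earlier, later) / clock_rate


AUDIO_CLOCK = 8000   # Hz, assumed for this example
VIDEO_CLOCK = 90000  # Hz, assumed for this example

# One second of media is a different number of ticks in each stream.
print(elapsed_seconds(0, 8000, AUDIO_CLOCK))
print(elapsed_seconds(0, 90000, VIDEO_CLOCK))
```

This only gives the elapsed time within one stream. Relating the audio and the video stream to one common clock needs additional information that this sketch does not model.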

Application Level Framing in RTP

RTP is a protocol framework that is deliberately not complete. Certain structures are not fixed and can be modified to suit a specific application. RTP is intended to be malleable to provide adequate functionality. This characteristic is known as Application Level Framing and is an important aspect of RTP.

So a profile specification document is needed for each application to specify the way RTP is used, e.g. to define extensions or modifications to RTP that are specific to a particular class of applications. Participants in an RTP session should agree on a common format. Several header fields can be adapted to a specific application.

The extension bit may be set to indicate that the fixed header is followed by exactly one header extension. Extra fields may carry extra information useful for the using application.

The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream. A profile may define additional marker bits or specify that there is no marker bit by changing the number of bits in the payload type field.

A profile also specifies a default static mapping of payload type codes to payload formats.
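
To make the idea of a static mapping concrete, here is a small Python sketch with a few entries. The values are taken from the standard audio/video profile (RFC 3551), which is not covered in this tutorial, and the table is far from complete. The names STATIC_PAYLOAD_TYPES and describe_payload_type are made up for the example.

```python
# payload type code -> (encoding name, media kind, RTP clock rate in Hz)
# A small excerpt from the audio/video profile; not a complete table.
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "audio", 8000),
    3: ("GSM", "audio", 8000),
    8: ("PCMA", "audio", 8000),
    9: ("G722", "audio", 8000),
    18: ("G729", "audio", 8000),
    26: ("JPEG", "video", 90000),
    31: ("H261", "video", 90000),
    34: ("H263", "video", 90000),
}


def describe_payload_type(pt: int) -> str:
    """Return a readable description of a 7-bit payload type code."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    entry = STATIC_PAYLOAD_TYPES.get(pt)
    if entry is None:
        return f"payload type {pt}: not in this table (may be dynamic or unassigned)"
    name, media, rate = entry
    return f"payload type {pt}: {name} ({media}, {rate} Hz clock)"
```

A receiver would look up the PT field of each packet in such a table to decide which decoder to use.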

 

RTCP

What is RTCP?

The RTP control protocol (RTCP) is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets.

Functions of RTCP

It provides feedback on the quality of the data distribution. Different types of packets are used. We will discuss those in the next page.

It carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME. The SSRC may change from time to time but the CNAME remains the same. It is used to identify a participant during the session. RTCP may also carry extra information about the participants, such as an email address.

By having each participant send its control packets to all the others, each can independently observe the number of participants. This number is used to calculate the rate at which the packets are sent. More users in a session means an individual source may send packets less frequently.
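
The effect of the number of participants on the sending rate can be sketched as follows. This is a much simplified version of the real calculation: it assumes that RTCP is allowed 5% of the session bandwidth, that a source waits at least 5 seconds between reports, and that the average control packet is 100 bytes. It ignores the separate treatment of senders and receivers and the random variation of the interval. The function name rtcp_interval and all default values are assumptions for this example.

```python
def rtcp_interval(
    members: int,
    session_bandwidth_bps: float,
    avg_rtcp_packet_bytes: float = 100.0,  # assumed average packet size
    rtcp_fraction: float = 0.05,           # assumed share of bandwidth for RTCP
    minimum_seconds: float = 5.0,          # assumed minimum interval
) -> float:
    """Seconds one participant waits between its control packets (simplified)."""
    if members < 1 or session_bandwidth_bps <= 0:
        raise ValueError("need at least one member and a positive bandwidth")
    # Bandwidth shared by all control traffic, converted from bits to bytes.
    rtcp_bytes_per_second = session_bandwidth_bps * rtcp_fraction / 8
    # All members together must stay within that budget.
    interval = avg_rtcp_packet_bytes * members / rtcp_bytes_per_second
    return max(interval, minimum_seconds)


for n in (10, 100, 1000):
    print(n, "members ->", rtcp_interval(n, 64000), "seconds between reports")
```

With these assumptions the total control traffic stays roughly constant: as the session grows, each participant simply reports less often.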

Types of RTCP packets

SR : Sender report, for transmission and reception statistics from participants that are active senders

RR : Receiver report, for reception statistics from participants that are not active senders

SDES : Source description items, including CNAME

BYE : Indicates end of participation

APP : Application specific functions
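
On the wire, each of these packet types is identified by a number in a short common header. The sketch below assumes the packet type codes 200 to 204 and the 4-byte common header layout defined in the RTP specification; neither is shown in this tutorial. The name parse_rtcp_common_header is made up for the example, and the body of each packet type is not decoded.

```python
import struct

# Packet type codes as assigned in the RTP specification.
RTCP_PACKET_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_common_header(data: bytes) -> dict:
    """Decode the first 4 bytes shared by all RTCP packet types."""
    if len(data) < 4:
        raise ValueError("RTCP common header needs at least 4 bytes")
    b0, pt, length_words = struct.unpack("!BBH", data[:4])
    return {
        "version": b0 >> 6,                  # top 2 bits, same as in RTP
        "padding": bool(b0 & 0x20),          # 1 bit
        "count": b0 & 0x1F,                  # 5 bits; meaning depends on the type
        "packet_type": RTCP_PACKET_TYPES.get(pt, f"unknown ({pt})"),
        # The length field counts 32-bit words minus one.
        "length_bytes": (length_words + 1) * 4,
    }
```

The count field holds, for instance, the number of reception report blocks in an SR or RR packet.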

Conclusion of RTP

You should understand that this is only the tip of the iceberg. If you just needed an introduction, it is OK to stop here. But for bigger things you must go through RFC 1889, and even that is not enough. You have to work on your own to master applications employing RTP.

RFC 1889 has been superseded by RFC 3550. Thanks to John York for pointing it out.

At this point, I strongly recommend that, if you are serious about the subject, you go through some of the books listed in the books section.

If you have any suggestion, correction or query, just mail to