VOIP Paper.doc


You Don't Know Jack About VoIP

ACM Queue vol. 2, no. 6 - September 2004 by Phil Sherburne and Cary Fitzgerald, Cisco

The Communications they are a-changin'.

Telecommunications worldwide has experienced a significant revolution over recent years. The long-held promise of network convergence is occurring at an increasing pace. This convergence of data, voice, and video using IP-based networks is delivering advanced services at lower cost across the spectrum, including residential users, business customers of varying sizes, and service providers.

One of the key technologies driving this convergence is VoIP (voice over IP), which has evolved from what many viewed as experimental to a fundamental technology on which businesses from small to Fortune 500 are running their enterprises. VoIP has moved to a level of reliability and capability such that mainstream users are adopting it at a rapidly increasing pace. For this to happen, a number of technical innovations were required to solve issues such as quality of service and reliability.

This article explores key principles and technology innovations underlying VoIP, and describes the implications of these innovations for software developers.

FROM ANALOG TO VoIP

Telecommunications technology is entering its third wave with VoIP. It began with analog signals carried by the first telephones and evolved into digital networks decades later. Now, with the increasing sophistication of the Internet, VoIP is coming into its own.

Analog Networks

From the invention of the telephone in 1876 to today’s modern communications infrastructure, voice has been carried by analog wave signals. Human speech is an analog wave signal. In the initial telephone networks, speech was converted to electrical wave forms (microphone) and converted back to speech at the other end of the conversation (speaker), traveling the distance between the phones as this analog wave form.

While an obvious leap forward over previous methods of communication, this early technology had severe limitations that included introduction of “noise” in the signal. This noise increases with distance traveled. Although various methods of reducing noise were developed over the years, it remained a noticeable problem (remember the amount of static on long distance calls?). Another significant problem was one of economics. As the demand for communications increased dramatically post–World War II, the need to increase the carrying capacity of a pair of copper wires was significant. This led to the development of digital transmission capabilities in the long distance network.

Digital Networks

The early 1960s saw the introduction of technology that converted speech into digital signals. Specifically, the invention and deployment of T1 lines allowed for transmission of voice at 1.544 megabits per second (Mbps). (The counterpart in Europe and other places outside the United States is E1, at a rate of 2.048 Mbps.) Among other benefits, T1 lines addressed the two primary problems with analog voice transmission: noise and economics. Because the digital signals contained only 0s and 1s, the digital “repeaters” that were used to regenerate the signals over distances could also re-form the signals in a near-perfect rendition of the original. Thus, the impact of distance on the quality of the speech was virtually eliminated. The issue of economics was alleviated since a T1 line carries twenty-four 64-Kbps channels (32 for E1 lines). The mechanism to place multiple calls on the T1 line is known as TDM (time division multiplexing).
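The line rates quoted above follow from simple arithmetic, sketched here in Python. The 8,000 frames per second, the 8-bit samples, and the single T1 framing bit per frame are standard figures that the text does not state; the function names are illustrative only.

```python
# Line-rate arithmetic for TDM carriers (illustrative sketch).
SAMPLE_RATE_HZ = 8_000   # assumed: one sample per channel, 8,000 times per second
BITS_PER_SAMPLE = 8      # assumed: 8-bit samples


def channel_rate_bps() -> int:
    """Bit rate of a single voice channel."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def t1_rate_bps(channels: int = 24, framing_bits: int = 1) -> int:
    """Each T1 frame carries one sample per channel plus a framing bit."""
    bits_per_frame = channels * BITS_PER_SAMPLE + framing_bits
    return bits_per_frame * SAMPLE_RATE_HZ


def e1_rate_bps(timeslots: int = 32) -> int:
    """Each E1 frame is 32 eight-bit timeslots with no extra framing bit."""
    return timeslots * BITS_PER_SAMPLE * SAMPLE_RATE_HZ


print(channel_rate_bps())  # 64000
print(t1_rate_bps())       # 1544000
print(e1_rate_bps())       # 2048000
```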

T1 technology (as well as higher digital transmission rates, including fiber optics) has been deployed extensively over the past 40-plus years. With the exception of the access portion of the network (i.e., lines from homes to the phone company), virtually all voice is carried over digital lines worldwide. Again, this was a great step forward in communications, but it has its own set of limitations. Critical among these is the nature of TDM connections, known as circuit-switched connections. Fundamentally, this means that a call from one end of the circuit-switched connection to the other always follows the same path through the network and consumes the same amount of bandwidth, whether there is useful data to be transmitted or not. For example, during a silent pause on a phone call, 64 Kbps of data are still being transmitted in each direction. From an economic viewpoint, this is obviously inefficient.

VoIP

Since the 1970s there has been an increasing use of packet networks for transmitting data. Today the most obvious use of this technology is the Internet. The nature of packet networks in general and IP (Internet protocol) in particular is that the data to be transmitted is split into small packets that include small amounts of address information added to each packet. These packets are sent out over the network—quite possibly taking different paths through the network, unlike a legacy TDM connection where data is simply data, and routing is established at call setup time. The packets are then reassembled at the destination node.
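As a rough illustration of the split-and-reassemble idea, the following Python sketch breaks a message into small addressed packets and rebuilds it after the packets arrive in a different order. The `Packet` class, the field names, and the explicit sequence number are assumptions made for the example; IP itself leaves ordering to higher layers.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # sender address (hypothetical field)
    dst: str        # destination address (hypothetical field)
    seq: int        # position of this piece in the original data
    payload: bytes


def packetize(data: bytes, src: str, dst: str, size: int = 4) -> list[Packet]:
    """Split data into packets of at most `size` payload bytes."""
    return [Packet(src, dst, i, data[off:off + size])
            for i, off in enumerate(range(0, len(data), size))]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the original data regardless of arrival order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"packets may arrive out of order"
packets = packetize(message, "10.0.0.1", "10.0.0.2")
random.shuffle(packets)  # simulate different paths through the network
assert reassemble(packets) == message
```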

Packet-switched networks have significant advantages over circuit-switched networks. Among these is the ability of packets to take different routes through the network. In the case of network failures (transmission lines being cut, etc.), this allows the data still to reach the destination. In addition, the only bandwidth used is that required for useful data (other than a small amount of control information such as address bits).

People recognized that if a means could be found to use packet technology for the transmission of voice, then the limitations of TDM networks could be overcome. Voice packets could take different routes through the network, and only necessary bandwidth would be used rather than always transmitting even in the face of silence. Even more significantly, both data and voice could be carried on a common, packet-based network. This would simplify management by reducing the number of networks to manage and would lower network facility and hardware costs.

By the early 1990s certain fundamental technologies were developed that allowed for initial efforts in VoIP. The rest of this article outlines those technologies and their implications for software developers.

TECHNOLOGY OVERVIEW

Any discussion of VoIP must begin with a discussion of both bearer and signaling components. Bearer refers to the actual voice being sent over the network. Signaling refers to the information necessary for successful setup and teardown of the call. This includes the dialed digits, off-hook and on-hook information, originating number, etc. The separation of signaling from bearer information began in the circuit-switched digital networks—for example, ISDN. The concepts behind this were leveraged for VoIP.

Bearer Transmission

One difference between data and voice transmission is the sensitivity to delay associated with transmission across the network. Data is far less sensitive to delay than voice is. Anyone who has experienced an international call over satellite will recognize this sensitivity. This is partially solved by the use of RTP (Real-time Transport Protocol) for the transmission of voice.

RTP is the standard protocol designed for realtime-sensitive data transmission. Because of the realtime nature of voice, all VoIP bearer traffic is carried as RTP packets. RTP “rides” on top of the standard UDP (user datagram protocol) and provides information to the endpoints not available in UDP. Specifically, RTP provides packet sequence numbers so endpoints can determine arrival order, and timestamps that help endpoints manage “jitter” (discussed later in this article).
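To make the sequence number and timestamp concrete, here is a minimal Python sketch that packs and unpacks the 12-byte fixed RTP header as laid out in the RTP specification (RFC 3550). It ignores padding, header extensions, and CSRC entries when packing, and the function names are hypothetical.

```python
import struct

RTP_VERSION = 2
# version/flags, marker/payload type, sequence, timestamp, SSRC (network byte order)
HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type: int, seq: int, timestamp: int, ssrc: int,
             payload: bytes, marker: bool = False) -> bytes:
    """Build an RTP packet with no padding, extension, or CSRC list."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = HEADER.pack(byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet: bytes) -> dict:
    """Extract the fixed header fields and the payload."""
    byte0, byte1, seq, timestamp, ssrc = HEADER.unpack_from(packet)
    csrc_count = byte0 & 0x0F
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[HEADER.size + 4 * csrc_count:],
    }


# 20 ms of G.711 at 8,000 samples per second is 160 one-byte samples;
# payload type 0 is the static assignment for G.711 mu-law.
pkt = pack_rtp(payload_type=0, seq=7, timestamp=160, ssrc=0x1234, payload=b"\x55" * 160)
fields = unpack_rtp(pkt)
print(fields["seq"], fields["timestamp"], len(fields["payload"]))  # 7 160 160
```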

Voice coding standards. A number of different voice-encoding algorithms—codecs—are used in VoIP networks. These are standardized as a set of G-series recommendations by the ITU (International Telecommunication Union). Common ones are G.711, which encodes at 64 Kbps, and G.729, which encodes at 8 Kbps. Each of the codecs has different attributes, including compression level, quality, etc.
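The codec rate is not the whole story on an IP network, because every packet also carries headers. The sketch below estimates per-call bandwidth in one direction under assumptions that are not from the article: a 20 ms packetization interval, IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers, and no link-layer overhead.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP, no options


def ip_bandwidth_bps(codec_bps: int, packet_ms: int = 20) -> int:
    """One-way IP bandwidth for a call, excluding link-layer overhead."""
    payload_bytes = codec_bps * packet_ms // 8000   # bits per interval / 8
    packets_per_second = 1000 // packet_ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second


print(ip_bandwidth_bps(64_000))  # G.711: 80000
print(ip_bandwidth_bps(8_000))   # G.729: 24000
```

Under these assumptions the headers add 16 Kbps to either codec, which is a much larger share of a G.729 call than of a G.711 call.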

Considerations for different bearer traffic. Although the discussion so far has focused on voice, in reality, other types of information are transmitted over traditional voice networks. For VoIP to be practical and gain common usage, these types of traffic must also be handled effectively:

• DTMF (dual-tone multi-frequency). This refers to the tones generated by a common touch-tone phone. These are used for not only initiating a phone call but also communicating during a phone call—such as for voice-mail and IVR (interactive voice response) systems. When used for making a phone call, DTMF is part of the signaling information and not transmitted as part of the bearer information. When used mid-call, however, it is transmitted as part of the bearer data.

• Fax. The use of fax machines—although less common today than before the common use of e-mail—remains a critical form of data communication (such as in the legal profession). For broad market acceptance, VoIP networks and equipment must be able to handle traditional fax machines, given the large number deployed worldwide. The issue in handling fax on VoIP networks is that fax transmissions are much more sensitive to packet loss than voice is. Different methods (Fax Passthru and T.38 Fax Relay) have been developed to ensure successful fax transmission over VoIP.
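For the DTMF item above, the following Python sketch shows what "dual-tone" means in practice: each key is the sum of one low-frequency and one high-frequency sine wave. The frequency grid is the standard touch-tone keypad layout; the 50 ms duration, 8,000 Hz sample rate, and function names are assumptions for illustration.

```python
import math

ROW_HZ = (697, 770, 852, 941)        # low-group frequencies, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # high-group frequencies, one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair for a keypad key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_ms: int = 50, rate_hz: int = 8000) -> list[float]:
    """Generate the two summed sine waves as floats in the range [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = rate_hz * duration_ms // 1000
    return [0.5 * (math.sin(2 * math.pi * low * i / rate_hz)
                   + math.sin(2 * math.pi * high * i / rate_hz))
            for i in range(count)]


print(dtmf_frequencies("5"))   # (770, 1336)
print(len(dtmf_samples("5")))  # 400
```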

Signaling Methods

An architectural model has evolved within the VoIP industry (see figure 1). Just as with any reference model, specific products or protocols do not necessarily strictly adhere to the model, but it has proven to be a useful framework for characterizing components and their roles. The model is a PSTN (public switched telephone network) gateway, with a set of interfaces looking into telephone networks and a set of interfaces looking into VoIP networks, but it equally applies to IP phones and other VoIP endpoints.

The heart of the system is the MGC (media gateway controller). An MGC is an “intelligent” endpoint; it interacts with its peers to establish, modify, and destroy connections with its peers within a network. The manipulation of these connections results in various end-user services: call establishment, features such as transfer and park and hold, and call forwarding. The MGC is the component that supervises calls and services from end to end. Often it is implemented as a highly reliable system component, so call-related information must be mirrored across a complex of MGCs.

The MG (media gateway) is responsible for the media interfaces to the PSTN and to the IP network. Typically, an MG is implemented with a complex of DSPs (digital signal processors) to lower system costs, but general-purpose processors are also sometimes used, depending on the application.

An MG is a simple endpoint. It does only what it is told to do. It does not understand signaling to either the PSTN or the IP network. It does not understand services or even calls. It creates, modifies, and destroys connections as instructed by an MGC. These connections can be between the PSTN and the IP network, between PSTN ports, or even between IP-based endpoints. Since an MG does not understand the end-to-end nature of a call, it needs to concern itself only with the connections it is holding up, so the system reliability requirements for an MG can be somewhat relaxed.
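The division of labor between the two components can be sketched as follows. This is a toy Python model, not any real control protocol: the class names, method names, and port labels are all hypothetical. The gateway only tracks connections; the controller is the only part that knows what a call or a transfer is.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MediaGateway:
    """Simple endpoint: creates, modifies, and destroys connections on request."""
    name: str
    connections: dict[int, tuple[str, str]] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def create(self, a: str, b: str) -> int:
        conn_id = next(self._ids)
        self.connections[conn_id] = (a, b)
        return conn_id

    def modify(self, conn_id: int, a: str, b: str) -> None:
        if conn_id not in self.connections:
            raise KeyError(conn_id)
        self.connections[conn_id] = (a, b)

    def destroy(self, conn_id: int) -> None:
        del self.connections[conn_id]


class MediaGatewayController:
    """Intelligent endpoint: maps calls and services onto gateway connections."""

    def __init__(self, gateway: MediaGateway) -> None:
        self.gateway = gateway
        self.calls: dict[str, int] = {}   # call id -> gateway connection id

    def place_call(self, call_id: str, pstn_port: str, ip_endpoint: str) -> None:
        self.calls[call_id] = self.gateway.create(pstn_port, ip_endpoint)

    def transfer(self, call_id: str, new_ip_endpoint: str) -> None:
        conn_id = self.calls[call_id]
        pstn_port, _ = self.gateway.connections[conn_id]
        self.gateway.modify(conn_id, pstn_port, new_ip_endpoint)

    def hang_up(self, call_id: str) -> None:
        self.gateway.destroy(self.calls.pop(call_id))


mgc = MediaGatewayController(MediaGateway("gw-1"))
mgc.place_call("call-42", "pstn/3", "ip/phone-a")
mgc.transfer("call-42", "ip/phone-b")
print(mgc.gateway.connections)  # {1: ('pstn/3', 'ip/phone-b')}
mgc.hang_up("call-42")
```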

MGCs and MGs interact with each other over a control plane, which can be a proprietary interface such as an internal API; standardized protocols have also been developed. Both the ITU and the IETF (Internet Engineering Task Force) saw the need for such a protocol and cooperated to produce the MEGACO (H.248/Media Gateway Control) protocol, a first-of-its-kind cooperation between the two standards bodies. It is published in the IETF as RFC 3525 and in the ITU as H.248. H.248/MEGACO is in an early phase of deployment. Eventually, it is expected to replace an earlier effort to standardize the MGC/MG control plane, known as MGCP (media gateway control protocol). MGCP (IETF RFC 3435) has been deployed in a number of networks and has been adopted in the ITU for application in VoIP over cable.

MGCs interact with their peers using an intelligent signaling protocol. Two intelligent protocols have emerged—from the ITU, H.323, and from the IETF, SIP (session initiation protocol). They share many concepts; both suppose that the endpoints are intelligent—but they also differ in significant ways.

H.323 is derived from Q.931, the protocol used to access ISDN services. VoIP connections in H.323 follow the ISDN model: the same message sequences are used to establish and tear down calls. H.323 has been extended to support a number of services; again, these follow an established model from TDM network architectures. For example, a number of services are described in the H.450 series; these are modeled on the corresponding services from QSIG.

SIP is a methods-based protocol, whose roots are in HTTP. In general, services are not explicitly exposed in the protocol; rather, the designer can use a set of well-defined methods to implement services. So, for example, SIP does not have a transfer primitive per se, but executing a set of SIP transactions will result in the user experiencing a transfer. A significant amount of work is going on in the standards communities with respect to SIP, as well as a significant increase in market adoption of SIP-based equipment. SIP-based equipment is clearly expected to achieve a significant share of installed VoIP equipment over the next few years.
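The HTTP heritage is easiest to see in the text of a request. The Python sketch below formats a bare-bones SIP request with the start line and a handful of headers in the style of the SIP specification (RFC 3261). It is an illustration only, not a compliant stack: the addresses, tag, and branch values are placeholders, and the helper name is hypothetical.

```python
CRLF = "\r\n"


def sip_request(method: str, target: str, caller: str, callee: str,
                call_id: str, cseq: int) -> str:
    """Format a minimal SIP request with an empty body."""
    lines = [
        f"{method} {target} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.org;branch=z9hG4bK-demo",  # placeholder
        "Max-Forwards: 70",
        f"From: <{caller}>;tag=demo-tag",                           # placeholder tag
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    return CRLF.join(lines) + CRLF + CRLF


# The same small set of methods is combined to build services:
# here a call is started with INVITE and ended with BYE.
invite = sip_request("INVITE", "sip:bob@example.com", "sip:alice@example.org",
                     "sip:bob@example.com", "demo-call-1", 1)
bye = sip_request("BYE", "sip:bob@example.com", "sip:alice@example.org",
                  "sip:bob@example.com", "demo-call-1", 2)
print(invite.split(CRLF)[0])  # INVITE sip:bob@example.com SIP/2.0
print(bye.split(CRLF)[0])     # BYE sip:bob@example.com SIP/2.0
```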

A number of exciting new services and concepts are coming out of the VoIP community. We highlight just a couple as follows: the impact of IM (instant messaging) and presence on converged communications; and ENUM, a mechanism for telephone number resolution in VoIP networks.

Instant messaging and presence. A significant number of features in the telephone network are devoted to the concept of increasing the probability that a call will be completed to the right recipient at a time that is acceptable to both the caller and the called party. IM and presence have recently emerged as important business and personal communications tools. Combining IM and presence with VoIP yields some valuable new features. Presence information can be used to determine whether offering a new call to a party is likely to be successful—there is no point in placing the call until the called party is available and willing to take the call. Instant messages can be used as part of the alerting process, which allows both called and calling parties to provide more information to each other about the nature of their communications. The two systems—VoIP and IM/presence—working in concert are more valuable than either one alone. VoIP deployments for these applications are in their very early stages.

ENUM. The best-understood and most widely deployed name resolution system today is the DNS (domain name system). In the DNS, names become more specific from right to left, with the most general part of the name on the right and more specific names written to the left (e.g., www.ietf.org). In the PSTN, telephone numbers are written from left to right, with the most general part of the number on the left and the more specific toward the right (e.g., 1.212.543.6789). ENUM calls for telephone numbers to be written DNS-style, rooted at the domain e164.arpa. So, 1.212.543.6789 becomes 9.8.7.6.3.4.5.2.1.2.1.e164.arpa. Each digit is treated as a subdomain, which allows ENUM to ignore the nuances of country codes, city codes, etc. that vary broadly worldwide. When this address is queried, the DNS can return a specific IP address corresponding to the telephone number, or it can return a rule for rewriting the original number into some other form. For example, rules can be returned to rewrite 1.212.543.6789 as sip:[email protected], sip:[email protected]. ENUM offers the possibility to reuse the worldwide DNS for VoIP. ENUM is a standard set by the IETF as RFC 3761.
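The number-to-domain rewriting described above is mechanical enough to sketch in a few lines of Python. The function name is hypothetical, and the sketch covers only the name construction, not the DNS query or the rewrite rules returned.

```python
def enum_domain(number: str, root: str = "e164.arpa") -> str:
    """Reverse the digits of a telephone number and root them under e164.arpa."""
    digits = [ch for ch in number if ch in "0123456789"]  # drop dots, spaces, '+'
    if not digits:
        raise ValueError("no digits in number")
    return ".".join(reversed(digits)) + "." + root


print(enum_domain("1.212.543.6789"))  # 9.8.7.6.3.4.5.2.1.2.1.e164.arpa
```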

Managing VoIP Quality of Service

Voice quality. The fundamental concern for VoIP QoS (quality of service) is voice quality. Unfortunately, objective measurements for this have been elusive. That said, the major factors that affect voice quality are delay, packet loss, and treatment at the endpoints.

Voice codecs are unevenly tolerant of packet loss, but loss above 2 to 5 percent will have a perceptible effect on quality. Loss is rarely random and is often associated with high jitter (simply defined as the variation in packet arrival times at the destination).
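Both quantities can be estimated at a receiving endpoint from information RTP already provides. The Python sketch below computes a loss percentage from received sequence numbers and a running jitter estimate using the smoothing rule from the RTP specification (each new deviation moves the estimate by one sixteenth of the difference). It assumes no sequence-number wraparound and uses millisecond timestamps for readability; the function names are hypothetical.

```python
def loss_percent(received_seqs: list[int]) -> float:
    """Share of packets missing between the lowest and highest sequence seen."""
    received = set(received_seqs)
    expected = max(received) - min(received) + 1
    return 100.0 * (expected - len(received)) / expected


def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Running jitter estimate: J += (|D| - J) / 16 for each packet pair."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: change in transit time between consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


print(loss_percent([1, 2, 3, 5, 6, 7, 8, 9, 10]))        # 10.0
print(interarrival_jitter([0, 20, 40], [50, 70, 95]))    # 0.3125
```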

When one-way delay through a voice network exceeds about 150 milliseconds, natural conversational communication is strained, so most network deployments attempt to keep the delay well below that threshold. There are a number of components to delay: codecs have an intrinsic delay; it takes time to prepare and route a packet to the IP interface on the phone or gateway; various access networks have intrinsic delays; and transit networks contribute both in terms of routing delay and propagation delay.
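A delay budget is simply the sum of these components compared against the threshold. In the Python sketch below, only the 150 ms threshold comes from the text; every per-component figure is an invented placeholder, not a measurement.

```python
DELAY_BUDGET_MS = 150  # one-way threshold cited in the text


def one_way_delay_ms(components: dict[str, float]) -> float:
    """Total one-way delay as the sum of its contributing components."""
    return sum(components.values())


# Hypothetical figures for illustration only.
example = {
    "codec": 25,
    "packetization and routing to the IP interface": 20,
    "access network": 10,
    "transit network": 40,
    "jitter buffer": 40,
}

total = one_way_delay_ms(example)
print(total, total <= DELAY_BUDGET_MS)  # 135 True
```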

Further, packets are generated at regular intervals, but because of the vagaries of routing across the IP network, they are delivered to the endpoint with a certain amount of jitter. Endpoints have built into the software a “jitter buffer” where packets are buffered and then played out at a constant rate. This, of course, works fine unless the amount of jitter exceeds what can be absorbed by the jitter buffer. Software-based mechanisms exist in the endpoints to automatically adjust buffer sizes, etc. as jitter increases or decreases. Jitter can be a major component of the delay budget.
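The trade-off between buffer depth and late packets can be seen in a small simulation. This Python sketch models a fixed-depth jitter buffer: the first packet is held for `depth_ms`, each later packet has a playout deadline one interval after the previous one, and a packet that arrives after its deadline is counted as late. The function name, the 20 ms interval, and the arrival times are assumptions; real endpoints adapt the depth dynamically, which this sketch does not attempt.

```python
def playout(arrivals: dict[int, float], interval_ms: float = 20,
            depth_ms: float = 40) -> tuple[list[int], list[int]]:
    """Return (played, late) sequence numbers for a fixed-depth jitter buffer.

    arrivals maps sequence number -> arrival time in milliseconds.
    """
    first_seq = min(arrivals)
    base = arrivals[first_seq] + depth_ms   # playout time of the first packet
    played, late = [], []
    for seq in sorted(arrivals):
        deadline = base + (seq - first_seq) * interval_ms
        (played if arrivals[seq] <= deadline else late).append(seq)
    return played, late


arrivals = {0: 100, 1: 125, 2: 170, 3: 161, 4: 260}  # hypothetical arrival times
print(playout(arrivals, depth_ms=40))  # ([0, 1, 2, 3], [4])
print(playout(arrivals, depth_ms=80))  # ([0, 1, 2, 3, 4], [])
```

The deeper buffer absorbs the late packet, but every packet is then played 40 ms later, which is why jitter counts against the delay budget.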

QoS tools. The basic idea for controlling QoS revolves around two aspects: the first is ensuring that the network has enough capacity (bandwidth) to allow for high-quality calls; the second is establishing priorities such that the more realtime-sensitive packets are given higher priority for transit through the network.

To ensure enough capacity, there are mechanisms such as RSVP (Resource Reservation Protocol, RFC 2205), which allows bandwidth to be reserved through the network. Using RSVP, the endpoints or MGCs signal through the network, reserving capacity. This is done in advance of a call being set up.
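The effect of reserving capacity before call setup can be sketched without any protocol detail. The Python model below is not RSVP itself: it only shows the all-or-nothing idea that a call is admitted when every link on its path can spare the bandwidth. The `Link` class, function names, and capacities are hypothetical.

```python
class Link:
    """A network link with a fixed amount of reservable bandwidth."""

    def __init__(self, name: str, capacity_kbps: int) -> None:
        self.name = name
        self.free_kbps = capacity_kbps


def reserve_path(path: list[Link], kbps: int) -> bool:
    """Reserve bandwidth on every link, or on none if any link is short."""
    if any(link.free_kbps < kbps for link in path):
        return False
    for link in path:
        link.free_kbps -= kbps
    return True


def release_path(path: list[Link], kbps: int) -> None:
    """Return previously reserved bandwidth when the call ends."""
    for link in path:
        link.free_kbps += kbps


path = [Link("access", 200), Link("backbone", 100)]
print(reserve_path(path, 80))  # True: both links have room
print(reserve_path(path, 80))  # False: the second link has only 20 Kbps left
```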

For priority management, the mechanism is the packet queueing method used within the MGs and routers. A variety of algorithms are available, with the best choice depending on the customer network and traffic types. Associated with this is the concept of TOS (type of service) bits. Within each packet, three bits in the IP header indicate up to eight levels of precedence. These are used to ensure that higher-priority packets make it through the network first.
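As one concrete reading of the paragraph above, the Python sketch below places a precedence value in the top three bits of the TOS byte and uses it to drive a strict-priority queue. Strict priority is only one of the queueing algorithms the text alludes to, chosen here for brevity; the class and function names are hypothetical.

```python
import heapq
from itertools import count


def tos_byte(precedence: int) -> int:
    """Place a 0-7 precedence value in the top three bits of the TOS byte."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be 0..7")
    return precedence << 5


def precedence_of(tos: int) -> int:
    """Read the three precedence bits back out of a TOS byte."""
    return (tos >> 5) & 0x7


class StrictPriorityQueue:
    """Always dequeue the highest-precedence packet; FIFO within a level."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._order = count()   # tie-breaker that preserves arrival order

    def enqueue(self, tos: int, packet: str) -> None:
        heapq.heappush(self._heap, (-precedence_of(tos), next(self._order), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]


q = StrictPriorityQueue()
q.enqueue(tos_byte(0), "data-1")
q.enqueue(tos_byte(5), "voice-1")
q.enqueue(tos_byte(0), "data-2")
print(q.dequeue(), q.dequeue(), q.dequeue())  # voice-1 data-1 data-2
```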

SOFTWARE CONSIDERATIONS

System and software designers for VoIP equipment and networks face myriad challenges. Common concerns are QoS; security; manageability and operations, reliability, redundancy, and sparing; scalability; deployability, installability, and upgradability; serviceability, capacity management, fault detection, diagnostics, reparability, and metrics; testability and regressions; internationalization; performance and graceful degradation under fault conditions and load; extensibility; interoperability with both IP and legacy systems; modularity; manufacturability and costs; open systems and standards compliance; ease of use for end users; consistent/normalized database use; billing and audits; and feature interactions.

Reliability

Expectations for the reliability of VoIP are as high as those for the traditional voice networks. Although there are different measures of reliability (such as the oft-misused “5 9s”), for our discussion the assumption is that the VoIP system must work all the time (24 x 7 x 365). While there are occasional “maintenance windows,” the expectation is that the system is always operational (think about a 911 call center in a major metropolitan area).

At a high level this means the software designer must “design for failure”—that is, the designer must consider potential failures in three domains:

1. In the network, which might be caused by external events such as power failures or line cuts.

2. In the hardware, including processor, memory failures, etc.

3. In the software, which may be the result of bugs, corrupted data, etc.

Thus, the software designer must include capabilities such as:

• No upgrade downtime. This typically implies some form of duplicated active/standby systems with data synchronization between the active/standby systems.

• Software audits. There should be separate software components that audit the primary system software. This includes validating the internal data structures for accuracy as well as consistency among data structures. Corrective action may include automatic correction of the invalid data.

• Process monitoring. This means having system monitors that ensure the primary system software is operating correctly. This includes techniques such as watchdog timers; that is, having the primary software send a message on a regular basis to the monitoring software indicating proper functioning. Corrective action by the system monitor may range from a process restart to a full failover to a standby system.

• Automatic failover. As a response to certain types of failures, including a full system failure, the system automatically fails over to a standby system.

• Geographic redundancy. This is the ability to have the active and standby systems separated by hundreds of miles.
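The process-monitoring and failover items in the list above can be sketched together. In this Python example the monitored software reports heartbeats, and the monitor escalates from a restart to a failover when heartbeats stop. The class name, the timeout, and the one-restart policy are assumptions; time is passed in explicitly so the behavior is deterministic.

```python
class Watchdog:
    """Escalating monitor: ok -> restart -> failover when heartbeats stop."""

    def __init__(self, timeout_s: float, max_restarts: int = 1) -> None:
        self.timeout_s = timeout_s
        self.max_restarts = max_restarts
        self.last_heartbeat = 0.0
        self.restarts = 0

    def heartbeat(self, now: float) -> None:
        """Called regularly by the primary software to signal proper functioning."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Called regularly by the monitor; returns the corrective action."""
        if now - self.last_heartbeat <= self.timeout_s:
            return "ok"
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.last_heartbeat = now   # give the restarted process a fresh interval
            return "restart"
        return "failover"


wd = Watchdog(timeout_s=5.0)
wd.heartbeat(now=0.0)
print(wd.check(now=3.0))    # ok
print(wd.check(now=6.0))    # restart
print(wd.check(now=12.0))   # failover
```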

Manageability

The need to manage their networks is critical for all customers, whether large or small. In many cases the cost of operating and managing voice systems—whether traditional TDM or VoIP—far outweighs the cost of the equipment. Therefore, the need for effective tools allowing for cost-effective management is important for successful deployments.

Manageability, as used here, covers many different areas, including accurate and flexible billing systems, error reporting and resolution, call tracing, adds/moves/changes, etc. Although VoIP does not create new concerns, it makes some existing manageability tasks more demanding.

Consider the need for call tracing, which typically arises when an end user complains about a dropped call, noisy lines, etc. A system administrator will then typically look at the call traces—the route the call has taken through the network—to identify the source of the trouble. As noted earlier in the context of a traditional TDM circuit-switched network, when a call is set up, the voice takes the same path through the network for the duration of the call. This makes tracing calls through the network reasonably straightforward by collating call detail records, etc.

In a VoIP network, the packets containing the voice may take very different routes through the network, which makes the issue of call tracing and diagnosing of intermittent problems much more challenging. This requires not only good instrumentation on the MGCs, MGs, and routers in the network, but also very sophisticated management tools that provide the correlation and reporting of the information.
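The correlation step can be pictured as grouping per-device records by a shared call identifier and ordering them in time. The Python sketch below assumes every device stamps its records with the same `call_id` and a comparable clock, which is itself part of the instrumentation challenge; the record fields and device names are hypothetical.

```python
from collections import defaultdict


def correlate(records: list[dict]) -> dict[str, list[dict]]:
    """Group records from many devices by call and sort each call by time."""
    traces: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        traces[record["call_id"]].append(record)
    for events in traces.values():
        events.sort(key=lambda r: r["time_ms"])
    return dict(traces)


records = [  # hypothetical records collected from an MGC, an MG, and a router
    {"call_id": "c1", "device": "router-7", "time_ms": 130, "event": "loss 4%"},
    {"call_id": "c1", "device": "mgc-1", "time_ms": 100, "event": "setup"},
    {"call_id": "c2", "device": "mgc-1", "time_ms": 110, "event": "setup"},
    {"call_id": "c1", "device": "mg-3", "time_ms": 120, "event": "connect"},
]

for event in correlate(records)["c1"]:
    print(event["time_ms"], event["device"], event["event"])
```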

THE IMPACT OF VoIP

In recent years, we’ve seen increasing adoption of VoIP networks for customers of varying sizes on a global basis. The cost advantage resulting from convergence and the value of new applications offered by this convergence are the primary drivers of this adoption. With this comes the need for increasingly sophisticated systems and management tools to allow for the extensive adoption and deployment of VoIP.

VoIP’s increasing adoption will have a significant impact on our communications and the products that provide those communications. Therefore, software developers across the industry will increasingly need to be aware of and understand the challenges that come with this latest change in the communications infrastructure.

PHIL SHERBURNE is senior director of the Voice Technology Group, Cisco Systems. He runs the Call Control Division, and his team is responsible for the development and deployment of Call Control technologies including the Cisco Call Manager, BTS 10200, PGW 2200, and the SIP Proxy Server products. Previously, he was general manager for the Packet Telephony Call Control Business Unit responsible for the Softswitch products from Cisco. During his career with Cisco, he has been involved with a number of VoIP products and offerings.

Prior to joining Cisco in 2000, Sherburne spent more than 20 years at AT&T and Lucent Technologies Bell Laboratories, where he was involved in development of both PBX and messaging products. He has a B.Sc. in computer science from the University of Oregon and an M.Sc. in computer science from Ohio State University.

CARY FITZGERALD is senior director of the Voice Technology Group at Cisco Systems. He joined Cisco in 1996 and formed the team that built the first commercial VoIP gateway. He is a key contributor setting Cisco’s VoIP architectural directions. Prior to joining Cisco, FitzGerald was a distinguished member of technical staff at AT&T Bell Laboratories, where he led architecture and design teams for voice-response and voice-mail systems. He has a B.S. in computer science from Purdue University.

Not Your Father's PBX?

ACM Queue vol. 2, no. 6 - September 2004 by James E. Coffman, Avaya

Integrating VoIP into the enterprise could mean the end of telecom business-as-usual.

Perhaps no piece of office equipment is more taken for granted than the common business telephone. The technology behind this basic communication device, however, is in the midst of a major transformation. Businesses are now converging their voice and data networks in order to simplify their network operations and take advantage of the new functional benefits and capabilities that a converged network delivers—from greater productivity and cost savings to enhanced mobility.

Convergence involves much more than simply sending voice packets over an IP (Internet protocol) network. It involves a significant new architecture that introduces advanced IP applications into the framework of an enterprise, with ramifications for communications that will play out over the years to come.

The discussion that follows describes what’s behind some major changes in communication systems design. New systems are evolving to become much more distributed, open, and made up of common, off-the-shelf components. (For an overview of VoIP, read Phil Sherburne and Cary FitzGerald’s “You Don’t Know Jack About VoIP” on page 30 of this issue.)

WHAT IS A PBX?

Most of us are familiar with a PBX (private branch exchange) only as users who pick up the phone at the office to call someone inside or outside our business. In fact, although the most basic function of a PBX is to provide communications (usually voice) to the employees of a business, there is more to it than that. Sitting behind the familiar telephone is a sophisticated system of components that provides the functions necessary for communications.

These functions can be roughly divided into six major groups:

1. Feature operations. These consist of both those functions available to the phone user (placing a call, hold, conference, transfer, etc.) and those functions utilized by the operator of the system to control its use and how it is organized—phone number assignments, etc.

2. Endpoints. These allow users to access the functions of the system. The most common endpoints are telephones, but also included are fax machines, modems, PDAs, telephony applications running on laptop computers, and more. Most of the “non-voice” endpoints send signals to the PBX just as if they were simple analog phones. The media streams they send, however, are not voice-like and often require special handling by the system. For example, modems send special tones that tell the system not to perform echo suppression on the information they transmit. PBXs often have special features to suppress “call waiting” tones, which might be useful to human users but would disrupt data traffic.

3. Gateway interfaces. Gateway interfaces allow users inside the business to talk to the outside world. This requires conversion of both the call signaling (how calls are set up) and the voice stream. If someone in a business wants to call an outside phone, the PBX must signal to the outside system—typically the PSTN (public switched telephone network)—and must convert the voice stream inside the business to a voice stream expected by the PSTN.

4. Switching. Making a call requires that a path be established between the calling and called endpoints. This process is “switching,” and it can be accomplished in a variety of ways. It requires a network to tie the components (endpoints and gateways) together.

5. Media processing. Media processing functions combine and transform the voice streams in a call—to provide conferencing, music on hold, announcements, etc. Media processing is also needed to make sure the connection between two phones results in a path where both users can hear each other.

6. Application interfaces. The PBX also provides a voice network used to deliver voice services beyond simple endpoint-to-endpoint calling, including voice mail, interactive voice response (IVR), and other applications.
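For the media processing function in the list above, conferencing is a useful example of what "combining voice streams" involves. The Python sketch below mixes one frame of 16-bit samples so that each participant hears everyone except themselves, clipping sums that overflow the sample range. The function names and sample values are hypothetical, and real systems add gain control and other processing that is omitted here.

```python
PCM_MIN, PCM_MAX = -32768, 32767   # range of a signed 16-bit sample


def clip(sample: int) -> int:
    """Keep a mixed sample inside the 16-bit range."""
    return max(PCM_MIN, min(PCM_MAX, sample))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each participant, sum everyone else's samples for one frame."""
    totals = [sum(samples) for samples in zip(*frames.values())]
    return {
        name: [clip(total - own) for total, own in zip(totals, samples)]
        for name, samples in frames.items()
    }


frames = {"ann": [1000, 30000], "bob": [2000, 30000], "cat": [-500, 30000]}
print(mix_minus(frames)["ann"])  # [1500, 32767]
```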

Additional Attributes. Several other important attributes of PBXs affect the way they are designed:

• Reliability. Most businesses expect their communication system to be available essentially all of the time. As a result, redundancy of components is built into all large systems.

• Scalability. Most businesses expect to grow over time and don’t want to switch communication systems as they do so. Supporting more users by “adding on” is an aspect of most systems.

• Cost effectiveness at various sizes. To be effective in the marketplace, communication systems must be cost-effective for businesses of all sizes.

• Interoperability. It is important for communications systems to work with systems and devices made by other manufacturers.

Traditional Private Branch Exchanges

YOUR FATHER’S PBX: THE TRADITIONAL ARCHITECTURE

How do you design a system that can provide the capabilities and has the reliability and scalability attributes described in the previous section?

The technology currently available obviously impacts system design. For example, in the early 1980s when PBX systems were developed, computing was an expensive resource. Microprocessors were available but were limited in function and also expensive. Data networking was relatively unknown and often based on circuit-switched models such as X.25.

Most PBXs developed at this time have a common architecture:

1. A control processor to run the software that operates system features. This processor is typically built to support the reliability and scalability expected of PBXs.

2. The communication software that runs on the control processor. This application drives all of the system components and determines the functions that it provides.

3. Endpoints used to access the features and functions of the system. There are two kinds of endpoints: Digital phones provide convenient access to calling functions through buttons that are used to tell the system what the user wants (hold, transfer call, etc.) and through a small display used to show who is calling. These endpoints are usually proprietary to a single manufacturer. All traditional PBX systems also support analog phones that provide basic calling functions. Both digital and analog endpoints connect to interface cards in modules.

4. Modules. Sometimes called shelves, these house the interface cards that provide endpoint or gateway interfaces. An individual interface is generally called a “port.” Ports for digital phones usually provide enough power for the phone to operate even if the power to the office fails, assuming that the module itself has backup power. Interfaces to the PSTN are provided by a variety of interface cards. These interfaces convert from signaling and voice formats expected by the PSTN to those used internally to the PBX.

Modules also provide a certain amount of switching among the interface cards held within. Media processing for conferencing, music sources (for music on hold), and announcements is either built as a card that fits into a module or is built right into the interface cards.

5. Inter-module switching. This allows the interconnection of ports in different modules. Traditional PBX systems accomplish this via circuit switching. In circuit switching a dedicated path is set up between the two ports for the duration of the call. Calls to phones not within the business are switched to an interface in a module that enables connection to the PSTN. Often inter-module switching does not have enough capacity to connect all possible calls simultaneously, and its success depends on the fact that not everyone is on the phone at the same time. When call volume exceeds capability, calls are blocked.
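
The blocking behavior described in item 5 is commonly estimated with the Erlang B formula. The article does not name this formula; the sketch below swaps it in as one standard way to size a circuit-switched path, assuming random (Poisson) call arrivals and blocked calls that are dropped rather than queued. The function name and the traffic figures are illustrative.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Estimate the probability that a new call finds every circuit busy.

    Uses the numerically stable recurrence
        B(0) = 1
        B(n) = A * B(n-1) / (n + A * B(n-1))
    where A is the offered traffic in erlangs.
    """
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


# Illustrative numbers only: 2 erlangs of traffic offered to 2 inter-module paths.
print(erlang_b(2.0, 2))
```

Adding paths lowers the blocking probability for the same offered traffic, which is why a designer can provision fewer paths than ports when not everyone is on the phone at once.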

The components of the system must be networked together for two purposes. First, a voice network is needed to create a voice path between devices. The voice network is created from switching elements within and between the modules. This is usually done via layers of TDM (time division multiplexing) switching, which is a technology that transmits multiple signals simultaneously over a single transmission path.
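
As a rough illustration of the TDM idea, the sketch below interleaves one sample per channel into each frame and then "switches" a frame by reordering its time slots. It is a toy model under simplifying assumptions (equal-length channels, integer samples), not a description of any particular PBX; the function names are hypothetical.

```python
def multiplex(channels):
    """Interleave equal-length sample lists: each frame holds one sample per channel."""
    return [list(frame) for frame in zip(*channels)]


def timeslot_interchange(frame, slot_map):
    """Switch one frame: output slot i carries the sample from input slot slot_map[i]."""
    return [frame[source_slot] for source_slot in slot_map]


# Three channels, two samples each, become two frames of three time slots.
frames = multiplex([[1, 2], [10, 20], [100, 200]])

# Swapping slots 0 and 1 delivers channel 0's sample to slot 1 and vice versa,
# which is how a shared TDM medium can act as a switch.
switched = [timeslot_interchange(frame, [1, 0, 2]) for frame in frames]
```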

Second, a control network is required so the components can communicate with each other to implement system operations. For example, when a user pushes a button on a digital phone, a message indicating the operation requested is sent from the phone to the module it connects to and then to the communication application software. The control network is implemented in a variety of ways, often by stealing some of the TDM timeslots in each module and dedicating them to the transmission of control messages.

The data representation of voice is a 64-kilobits-per-second (Kbps) isochronous stream: 8,000 eight-bit samples per second. Each voice sample is generally encoded in one of two formats: Mu-law (used mainly in North America and Japan) and A-law (used almost everywhere else). This format matches the one used in digital interfaces to the PSTN. Using the same voice representation in a PBX as in digital interfaces to the PSTN reduces the work needed for the PBX to interoperate with the PSTN.
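
The arithmetic behind the 64-Kbps figure, and the idea behind Mu-law encoding, can be sketched as follows. The companding functions use the continuous textbook form of the Mu-law curve with mu = 255; deployed codecs use a segmented 8-bit approximation of this curve, so treat this as an illustration rather than a bit-exact encoder. All names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 8
MU = 255                # companding parameter for the continuous Mu-law curve


def stream_bitrate_bps():
    """Bit rate of one voice channel: 8,000 samples/s * 8 bits = 64,000 bits/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def mu_law_compress(x):
    """Map a sample in [-1, 1] to [-1, 1], giving quiet sounds more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A quiet sample at 1 percent of full scale is mapped to more than 20 percent of the output range, which is why eight bits per sample are enough for telephone-quality speech.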

A module, along with a control processor (often housed in a special slot in a module) and a few interface cards, can provide service to a small number of users.

Systems grow by adding modules and interconnecting them with inter-module switching. The capacity of the modules and of the inter-module switching determines how small or large a system can be economically designed. This is typically based on the amount of switching capacity in these components (see figure 1).

Technology and Architecture

ENABLING TECHNOLOGIES

PBX systems have been present in businesses since the early 1980s with little change and predate data networking and PC technologies. After 15 years of relative stability, however, virtually all PBX vendors are now introducing radical changes to their architecture.

What technologies are enabling this change? Perhaps the most important is the development of packet switching into an IP-based network with the bandwidth, speed (low delay), and reliability to support voice communications. The development of this technology and its use in data networking have both enabled the change and provided a driver for it. Since most business data networks span the breadth of their organization, it became possible and advantageous to offer voice communication throughout an enterprise while using only one network.

Another essential enabler was the creation of inexpensive DSP (digital signal processor) technology. For voice streams to ride an IP network, they must be packetized and perhaps compressed. These operations require digital signal processing. Without the availability of inexpensive DSP technology, IP phones would have been too expensive compared with their traditional counterparts.

Another technology lowering the cost of IP devices was the creation of inexpensive network interface chip sets. Fast and inexpensive microprocessors also allowed more intelligence to be distributed to phone modules and interface boards and enabled the use of IP to distribute these functions.

THE NEW ARCHITECTURE

These technology changes have led to a re-design of PBX components. They are evolving to distribute components farther apart, to incorporate more off-the-shelf components, and to use an IP network to transmit both control information and voice. The common term for a PBX with this new architecture is IP-PBX.

Control Processor and Communication Software

The control processor often is an off-the-shelf server that runs communication application software on a standard operating system (Microsoft Windows, Unix, or Linux). The benefits of moving to commercially available hardware and software are substantial, allowing vendors to lower their development costs.

Endpoints

Digital endpoints become IP phones and connect to the IP network rather than to dedicated interfaces in a module. The phones use the IP network to communicate both control and voice streams.

IP phones put some of the same strain on the data infrastructure as does a PC. Each requires an IP address and generally needs DHCP (dynamic host configuration protocol) service to acquire that address. IP phones often use an FTP server to get a new version of their firmware, and—for security or voice-quality reasons—they may be put into a special VLAN (virtual LAN), etc.

Phones require an IP address so that they can be identified by the communication application. Using standard IP network mechanisms, the phone acquires an IP address and the address of the application server, and “registers” itself. Registration allows the communication application to establish a correspondence between a phone number and the IP address used by the phone. Thus, users still dial familiar phone numbers, and the communication application uses the IP address of the phone to communicate with it.
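
The registration step can be pictured as maintaining a table from phone numbers to current IP addresses. The sketch below is a minimal in-memory model; the class name, method names, extension number, and addresses are all hypothetical, and a real system would also authenticate phones and expire stale entries.

```python
class Registrar:
    """Toy model of the number-to-address table kept by the communication application."""

    def __init__(self):
        self._address_by_number = {}

    def register(self, phone_number, ip_address):
        """Record (or update) the IP address currently used by a phone number."""
        self._address_by_number[phone_number] = ip_address

    def lookup(self, phone_number):
        """Return the registered IP address, or None if the number is unknown."""
        return self._address_by_number.get(phone_number)


registrar = Registrar()
registrar.register("4001", "192.0.2.10")   # phone boots and registers
registrar.register("4001", "192.0.2.77")   # same phone re-registers from a new address
```

Because callers dial the number and the application resolves the address, a phone that re-registers from a new address keeps its number without anyone redialing differently.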

IP phones use the IP network to carry voice streams directly to each other. Unlike the traditional architecture, when two IP devices are talking directly together, they do not use communication system resources to create the voice path.

Gateway Interfaces

Interfaces are still needed to provide access to the PSTN and to analog endpoints such as analog phones and fax machines. These “interfaces” are housed in gateways operating in much the same way as do modules in the traditional architecture, but they convert the signaling and voice streams to IP.

Some manufacturers also provide gateway interface cards allowing customers to continue using existing digital phones, protecting their investment in their existing infrastructure and reducing the cost of migrating to the new system.

Switching Functions

Inter-module switching is done over the IP network where bandwidth can be limited in certain circumstances (across the wide area, for example). Without provisions to limit the number of calls across a limited resource, the voice quality will degrade. This is analogous to “blocking” in circuit-switched networks, but all calls degrade in quality instead of some being blocked. Some systems enforce “call admission control” so that only a limited number of calls are allowed across the limited links, allowing those calls that do get through to maintain optimum voice quality.
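
A simple form of call admission control is a counter per constrained link. The sketch below assumes a fixed per-call bandwidth budget; the class name and the figures (a 256-Kbps link, 80 Kbps per call) are illustrative assumptions, not values from any particular product.

```python
class LinkAdmissionControl:
    """Admit calls across a limited link only while bandwidth remains."""

    def __init__(self, link_kbps, per_call_kbps):
        self.max_calls = link_kbps // per_call_kbps
        self.active_calls = 0

    def try_admit(self):
        """Return True and count the call if it fits; otherwise refuse it."""
        if self.active_calls >= self.max_calls:
            return False
        self.active_calls += 1
        return True

    def release(self):
        """Free the bandwidth held by a finished call."""
        if self.active_calls > 0:
            self.active_calls -= 1


wan_link = LinkAdmissionControl(link_kbps=256, per_call_kbps=80)
results = [wan_link.try_admit() for _ in range(4)]  # the fourth call is refused
```

Refusing the fourth call restores the circuit-switched behavior of blocking one caller instead of degrading everyone already on the link.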

Media Processing

Gateways may also provide the media processing found in traditional modules. Logically, however, this is a separate function, and specialized media processors may be used for this purpose.

System Architecture

Collectively the endpoints, modules, media processors, control processor, and communication software use the IP network to provide the same realtime voice communication functions as provided by a traditional PBX. The new architecture is a client-server approach: the clients are gateways and endpoints, and the server provides the communication application that operates the features. This approach is similar to the way e-mail or Web services are implemented, with a central server providing service to a set of client PCs.

It is important that the IP-PBX be “well-behaved” from a network administration point of view, with common tools and protocols for operation and management.

The other voice applications found in a traditional voice network—voice-mail and IVR—are also migrating to the IP network used for communications (see figure 2).

New Obstacles

CHALLENGES TO THE NEW ARCHITECTURE

As customers migrate to this new converged network architecture, they generally expect to keep all the positive functions and attributes of their traditional PBX while gaining new advantages. The IP-PBX has some challenges in this regard.

System Reliability

Traditional PBXs are highly reliable systems (many manufacturers claim 99.999 percent reliability—about five minutes of outage per year). The traditional PBX architecture achieves this with highly reliable components and with redundancy built in by the manufacturer.
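
The "five minutes per year" figure follows directly from the availability percentage. A quick check, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent):
    """Expected outage per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)   # roughly 5.26 minutes
three_nines = downtime_minutes_per_year(99.9)    # roughly 525.6 minutes
```

Each additional "nine" cuts the allowed outage by a factor of ten, which is why the last few are expensive to achieve.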

A major question is how to provide reliability for the control processor of an IP-PBX. It is a key component because if it fails, all the users of the system (who may number into the tens of thousands) will be without service. One approach to this problem is to rely on the fact that gateways and phones can register to multiple servers. If the server to which they are registered fails, they can register to a backup. Depending on how this is done and the intelligence of the gateway and the phone, the re-registration may or may not affect the active connection (voice path between the devices).
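
The multiple-server approach amounts to walking a priority list until some server accepts the registration. The sketch below is a minimal model of that idea; the function name, the `is_reachable` callback, and the host names are hypothetical stand-ins for whatever probing and registration protocol a real phone or gateway uses.

```python
def register_with_failover(servers, is_reachable):
    """Return the first server, in priority order, that accepts registration.

    `servers` is an ordered list (primary first); `is_reachable` is a callback
    standing in for a real registration attempt. Returns None if all fail.
    """
    for server in servers:
        if is_reachable(server):
            return server
    return None


# Simulate a failed primary: only the backup answers.
servers_up = {"backup.example"}
chosen = register_with_failover(
    ["primary.example", "backup.example"],
    lambda server: server in servers_up,
)
```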

If the main processor and the backup share call control information, then after re-registration the callers can continue their conversation and conference additional parties, etc. This call continuity can be particularly critical in contact center operations where it is important not to disconnect callers “in queue” who are waiting to be answered.

Keep in mind that IP networks can be designed to be highly reliable—with multiple paths from device to device. In many real-world environments, however, there is a single IP link between the control processor and a gateway. If this link fails, then the area served by the gateway will no longer have service. This problem can be addressed by building intelligence into the gateway or into a separate processor so that control functions continue in the event that a primary link is lost.

Voice Quality

Voice is more demanding than traditional data communications (such as e-mail, Web pages, etc.) because of its realtime nature. To ensure voice quality, the following attributes of the IP network must be managed for all possible voice paths:

• Bandwidth. For the expected number of simultaneous IP voice calls

• Round-trip delay. The time it takes a packet to go from one IP device to another and back again

• Jitter. Variability in delay

• Packet loss. The number of packets lost (usually expressed as a percent)
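
The attributes in this list can each be put into numbers. The sketch below makes several assumptions that are not in the article: a 20-ms packetization interval, 40 bytes of IPv4, UDP, and RTP headers per packet (link-layer overhead ignored), and the running jitter estimator defined in the RTP specification (RFC 3550), applied here to times in milliseconds rather than RTP timestamp units. Function names are illustrative.

```python
def call_bandwidth_bps(codec_bps=64000, packet_ms=20, header_bytes=40):
    """One-way IP bandwidth for a single call, including per-packet headers."""
    payload_bytes = codec_bps * packet_ms // 8000
    packets_per_second = 1000 // packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def packet_loss_percent(expected, received):
    """Share of expected packets that never arrived, as a percentage."""
    if expected == 0:
        return 0.0
    return 100.0 * (expected - received) / expected


def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate: J += (|D| - J) / 16 for each consecutive packet pair,
    where D is the change in transit time between the two packets."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_cur = recv_times_ms[i] - send_times_ms[i]
        jitter += (abs(transit_cur - transit_prev) - jitter) / 16.0
    return jitter
```

Under these assumptions a 64-Kbps voice stream occupies 80 Kbps on the IP network, which is one reason bandwidth must be planned for the expected number of simultaneous calls rather than taken from the codec rate alone.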

Techniques for establishing an IP network suitable for voice must be addressed before the new architecture can be adopted. This usually requires incorporating various quality-of-service capabilities into the IP network, as well as additional bandwidth.

The traditional PBX architecture implements echo suppression mechanisms that assume a circuit-switched network. Within an IP network, the delay increases, requiring changes in echo-suppression capabilities. These considerations affect IP endpoints and gateways.

Traditional PBXs carry much more than voice over their “voice” networks. For example, modem traffic, fax, and multiple 64-Kbps channels for video are all found in a large enterprise. The equipment using these streams may not work well if the streams are transformed into packets and back into a continuous stream. The delay and possible packet loss introduced by the data network make it impossible for endpoints to maintain the synchronization they expect from a circuit-switched network. These limitations are being addressed as vendors create encoding and error-correcting techniques suitable for non-voice traffic.

IP Telephone Operations

Traditional PBX interface cards provide power to analog and digital phones. This job now goes to the IP network. There are several ways to deliver power to the endpoints, but the most convenient is to have the data switch in the closet provide “inline” power over the IP network. Standards for doing so have recently been ratified so that data-switching equipment from one vendor can power phones from another vendor. If communication is to be preserved through a power outage, the data switches need to be on uninterruptible power supplies.

One advantage of IP telephones is they can be easily moved from one office to another. One difficulty with moving phones is that 911 services require information on the location of the phone placing an emergency call. Again, standards are emerging that allow the IP-PBX placing the emergency call to query the data network for the identity of the data port to which the phone is connected. That port can be associated with a physical location for emergency responders.

Security

As PBX components “disaggregate” and become attached to an IP network, they also become potential targets for intrusion, denial of service, and other hacking threats. These voice communication system components must be hardened against attacks, like other parts of the network infrastructure. Some vendors offer encryption of voice packets to prevent eavesdropping via tools commonly found on the Internet.

The Future

WHY: BUSINESS DRIVERS

Moving to IP telephony over a converged network offers several important advantages over the traditional PBX approach, leading vendors to insist that IP telephony is the future and that virtually all PBX systems sold in coming years will use this new architecture.

Using the IP network to link IP-PBX components together gives an enterprise substantial flexibility in how a system can be configured. Remote locations can be incorporated into a single enterprise-wide communication system. Remote workers can have the same communications capabilities as those working in a headquarters facility. This can improve the communication capabilities within an enterprise, while lowering the total cost of system implementation and operation.

Software packages such as databases, SNMP (simple network management protocol) development environments, and Web servers are available on standard platforms. Thus, the communication system vendor can more easily integrate these components with the telephony application in an IP environment. This allows operators of the IP-PBX to use familiar tools (Web browsers, SNMP management interfaces, etc.) to operate the system, resulting in lower administrative costs.

Open servers and IP network bandwidth also enable organizations to scale their communication systems to larger sizes. This can increase efficiency, particularly in contact-center implementations. Businesses can link employees in distributed locations to deliver “follow-the-sun” customer service and to take advantage of lower labor costs available in some parts of the world.

It is also possible to use the resiliency of the data network to increase the availability of voice communications. Businesses are moving their critical data and voice communication components to hardened, geographically dispersed “bunkers.” This makes the business more resilient in the face of fires, floods, and other disasters. The architecture of the IP-PBX lends itself naturally to this structure as the control processor can be located at a distance from the endpoints and modules.

Perhaps more important than these network-based advantages is that the communication system is an application on the data network just like the other applications used in business. Thus, this application can be integrated with other business services such as directories, e-mail services, etc. Many systems allow dialing from corporate directories or personal information managers and integration of voice and e-mail.

WHAT’S NEXT?

Now that the market has begun to evolve toward a new PBX architecture, what changes can we expect to see?

First, the difficulties and limits of the new architecture will be overcome. Enterprises expect the new systems to be as reliable and accomplish all the functions of traditional PBX systems. Thus, modem and other nonvoice TDM traffic will move over the IP network as it has moved over traditional voice networks. Standards are being defined and the increasing capacity of DSPs will bring this about in a cost-effective way.

The expectations of reliability for IP-PBXs will drive developments in the reliability and availability of the new architecture. Since an essential component of the new architecture is the IP network, improved diagnostic and network analysis tools will enable the quick diagnosis and repair of network problems impairing voice communications. Since security breaches will be able to disable both voice and data applications, techniques to protect critical business networks from denial-of-service and other attacks will be deployed. IP networks will become more resilient for all applications, not just communications.

Communication systems will take advantage of the new IP-based architecture by scaling larger and reaching farther. Even large enterprises will likely be able to implement a single communication system that ties all their employees together around the world.

Rich collaboration and video communication applications will merge with voice applications—becoming as easy to use and ubiquitous as traditional voice communications. Voice quality will no longer be tied to traditional network bandwidths; video room systems will provide stereo sound so listeners can locate talkers by position, improving audibility and “liveness.”

Audio capabilities will merge into PCs and into other mobile devices. No longer will mobile workers have to carry a “tool belt” of different communication devices.

We can expect such new capabilities to continue to drive the evolution from traditional PBX solutions to new, full-featured IP-PBX models that will change the way businesses communicate, delivering greater productivity, cost savings, and mobility.

Definitions and Acronyms

Isochronous. A communication link characterized by both ends using a common clock source to send a constant bit stream.

IVR (interactive voice response). An automated system that can understand human speech and provide prerecorded information to the caller.

PSTN (public switched telephone network). The worldwide telephone network. The standards for PSTN interfaces are specified by the CCITT (now ITU, International Telecommunication Union).

TDM (time division multiplexing). A multiplexing technique by which a communication medium is divided into discrete time slots. Each time slot can be used as a communication channel between two devices. If multiple devices attach to a TDM medium, then the medium can be used as a switch.

JAMES E. COFFMAN has been with Avaya Labs for more than 20 years, working in a variety of areas in telecommunications such as multimedia communications, VoIP systems development, VoIP standards, Web access to call centers, CTI (computer telephony integration) systems development, and operating systems. He currently is a director in the vertical markets development group responsible for vertical market technology. Previously, he directed MultiVantage (now called Avaya Communication Manager) technical architecture and planning at Avaya Labs. Before that, Coffman was responsible for planning and architecture in bringing IP telephony to the Definity communication platform. Coffman has a B.A. in mathematics from Reed College and an M.S.E.E. and Ph.D. in computer science from the University of Pennsylvania. He holds several patents in the telecommunications area.

VoIP: What is it good for?

ACM Queue vol. 2, no. 6 - September 2004 by SUDHIR R. AHUJA AND J. ROBERT ENSOR, BELL LABS/LUCENT TECHNOLOGIES

If you think VoIP is just an IP version of telecom-as-usual, think again. A host of applications are changing the phone call as we know it.

Growth

VoIP (voice over IP) technology is a rapidly expanding field. More and more VoIP components are being developed, while existing VoIP technology is being deployed at a rapid—and still increasing—pace. This growth is fueled by two goals: decreasing costs and increasing revenues.

Network and service providers see VoIP technology as a means of reducing their cost of offering existing voice-based services and new multimedia services. Service providers also view VoIP infrastructure as an economical base on which to build new revenue-generating services. As deployment of VoIP technology becomes widespread and part of a shared competitive landscape, this second goal will become more important, with service providers working to increase their market bases.

Most current and envisioned VoIP services are so-called converged services, integrating features and functions from multiple existing services. Often, features from conventional voice-based telephony services are combined with those found in data network services. For example, click-to-dial services allow users to control telephone calls from Web browsers running on their personal computers. Converged services may also provide users with new media integration. For example, multimedia conference services allow users to interact with each other through calls in which they exchange both audio and video information (i.e., new versions of videophones).

The growing opportunities for converged telephony-Web services are motivating convergence of telephony and data networks. VoIP services are also driving another network convergence: integration of wireless and wireline networks. More general network convergence seems likely. Because IP networks can be relatively inexpensive, network providers are encouraged to build common IP core networks surrounded by various access networks. These access networks (wireless, wireline, cable, etc.) can share the IP core resources, and thus reduce the costs of providing common services to customers with different access devices.

Many engaging VoIP services are already available, and service providers are planning even more exciting services. Continued deployment of IP networks and IP endpoint devices will enable further development of new services. Also, as the processing capacity of IP endpoints increases—allowing them to deal directly with network access controls, multiple data formats, and transformations—more innovative and convenient services will become possible. This article introduces some noteworthy services that are being deployed today and highlights a few of the interesting future services.

CREATING NEW SERVICES

Conventional telephony services, those available to customers through the public switched telephone network (PSTN), are built upon a highly structured technology base. This base was created and optimized to support voice calls using analog telephones. The base provides application developers with integrated signaling/media transport (in-band signaling) and a limited set of signal handlers and media processors, which are isolated from other networks through their switched circuit connections. Since telephones support very limited signaling mechanisms, invocation and control of PSTN services have been awkward. Some services are invoked by dialing special phone numbers such as 800 or 900 numbers. PSTN services are often invoked and controlled through in-band signaling, which is typically activated through touch tones (DTMF, dual-tone multi-frequency) or voice (IVR, interactive voice response).

Fundamental control and media handling needed by PSTN service providers must be performed by special network elements (signaling control points, service nodes, etc.).

Figure 1 illustrates key components of a call-center service. In this figure, the 800 server is a service node; it is an application server that communicates with the class 5 and 4 switches via SS7 (Signaling System 7) signaling protocols. This server deals only with control messages and not with voice itself. It helps establish the final route for the voice call based on the features it has implemented. For example, it can determine whether a call is routed to a company’s call center or to one of its retail outlets.

Service providers may require control and media processing not supported by network elements. This additional processing must be handled at call endpoints. The flow of information into and from an endpoint is through the voice channel itself, and therefore specialized controls must be built on audio controls (e.g., conversations with human operators, DTMF, or IVR). In figure 1, these endpoint application servers are represented by the call-center IVR server, which terminates voice connections and communicates via in-band signaling using DTMF or voice recognition.

VoIP technology provides richer, more flexible foundations for building communication services. IP networks support independent connections for signaling and media traffic. This decoupling of signal and bearer traffic eliminates interference between the information flows; in-band signaling is not required. Thus, communication with application servers is simplified.

In addition, IP network topology allows any node to act as a server. Therefore, multiple application servers and user endpoints—located in one or several service provider domains—can communicate via IP to participate in service support.

Finally, IP transport is provided by various underlying networks, and different network technologies can support different sets of services. For example, DSL and cable networks provide broadband IP connections that support realtime voice, data, and video services. Hence, these network providers can offer “triple-play” services to their customers.

Implementation

Figure 2 shows how to implement a call-center service using VoIP technology. In the figure, user endpoints are telephones (not IP-based devices) attached to wireline or wireless access networks. An IP backbone interfaces to these specific access networks through border elements (e.g., media gateways). These gateways terminate voice calls for the users; they handle all TDM (time division multiplexing) voice traffic to and from users. The gateways recognize DTMF signals from the users and convert them to SIP (session initiation protocol) messages for the IP-based application servers. In addition, they convert between the users’ TDM voice payload and RTP (realtime transport protocol) media packets, which are used by the media processors. Several IP-based application servers work in concert, coordinating their activities through SIP signaling to provide the call-center service. The softswitch contains a SIP proxy to support this SIP coordination, and it contains media control functions to support coordination of media processing. The application servers may be geographically distributed and separated from endpoints and switches. For example, Web sites can use stored voice or music files to provide announcements. They can act as music-on-hold servers; a single announcement server is not required.
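
The gateway's conversion of TDM voice into RTP packets can be sketched as wrapping each block of samples in the 12-byte fixed RTP header. The example assumes the simplest case (version 2, no padding, no header extension, no contributing sources, marker bit clear) and payload type 0, which the standard RTP audio profile assigns to 64-Kbps Mu-law audio. The function name and the field values passed in are illustrative.

```python
import struct


def build_rtp_packet(sequence, timestamp, ssrc, payload, payload_type=0):
    """Prefix a voice payload with a minimal 12-byte RTP header (network byte order)."""
    first_byte = 2 << 6                 # version 2; padding, extension, CSRC count all zero
    second_byte = payload_type & 0x7F   # marker bit clear
    header = struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


# 20 ms of 8-kHz, 8-bit audio is 160 bytes; the bytes here are a placeholder payload.
packet = build_rtp_packet(sequence=1, timestamp=160, ssrc=0x1234, payload=b"\x00" * 160)
```

The sequence number lets the receiver detect loss and reordering, and the timestamp lets it reconstruct the original sample timing despite network jitter.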

VoIP technology provides a foundation for creating many new converged services through different combinations of components. For example, IVR and Web components can combine—using SIP as a common signaling protocol—to create call-center services with access from Web browsers or IP phones, as well as voice-only telephones. Similarly, IVR servers and SMS (short message service) can combine to create call-center services that include SMS messages. Users will be able to access these call-center services via any of their access mechanisms or even simultaneously use multiple access technologies to provide better service. Alternatively, SMS systems can combine with Web-based information servers to create MMS (multimedia message services) in which messages may contain Web-based information and be retrieved by Web browsers.

NEW CONTROLS AND COORDINATION

Converged services can employ features from one set of services to control aspects of other sets of services. For example, click-to-dial services combine Web-based user interfaces with telephony servers to create Web-controllable phones. These services allow users to select (highlight) phone numbers embedded in Web pages, indicating that these numbers should be called. Such services are built by combining the PSTN, IP networks, and IP-based servers.

Figure 3 shows how a typical click-to-dial service works. When customers use their Web browsers to click on a telephone number within a Web page, their computer sends a message over a packet network to an IP-based click-to-dial server. This server, in turn, uses its connections to the PSTN to make telephone calls to the customer and to the number that customer is dialing. These calls are then bridged into a single call by a PSTN control element.
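
The click-to-dial flow reduces to three steps on the server: call the customer, call the clicked number, and bridge the two legs. The sketch below models only that control flow; the class name and the `place_call` and `bridge` callbacks are hypothetical stand-ins for the server's PSTN connections and the PSTN control element, and the phone numbers are placeholders.

```python
class ClickToDialServer:
    """Toy model of the server-side control flow for a click-to-dial request."""

    def __init__(self, place_call, bridge):
        self._place_call = place_call   # callback: number -> call leg
        self._bridge = bridge           # callback: (leg, leg) -> bridged call

    def handle_click(self, customer_number, target_number):
        """Call the customer, call the clicked number, then join the two legs."""
        customer_leg = self._place_call(customer_number)
        target_leg = self._place_call(target_number)
        return self._bridge(customer_leg, target_leg)


# Stand-in callbacks that simply record what was requested.
dialed = []


def fake_place_call(number):
    dialed.append(number)
    return "leg:" + number


def fake_bridge(leg_a, leg_b):
    return (leg_a, leg_b)


server = ClickToDialServer(fake_place_call, fake_bridge)
bridged = server.handle_click("555-0100", "555-0199")
```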

This example illustrates an important characteristic of VoIP services: they can be built as collections of multiple servers. These servers typically base their coordination on SIP signaling. SIP, however, provides a means only to locate and synchronize the initial interaction among the appropriate servers. Once the servers have rendezvoused through SIP, they must then exchange application-specific signaling through appropriate specialized protocols. In this example, the click-to-dial client and the Web server must communicate through agreed-upon protocols (typically including HTTP) so that Web pages can be transferred to the user. Also, the click-to-dial client and the click-to-dial server must use an agreed-upon protocol to request and control the required telephony functions.

Service coordination and composition become important issues in the development and execution of VoIP services, as multiple application servers are often involved. The industry must develop techniques to coordinate distinct service elements within sessions. One fundamental problem is that service behavior is difficult to describe both formally and conveniently, which makes service coordination labor-intensive. A related problem is that the multiple servers used to create a service might not be in the same network. Therefore, one service provider might not be willing to publish details of its server to another provider. Another difficulty is that services can interfere with one another. For example, if a conference participant temporarily leaves, generating music on hold, this behavior can interfere with or even block continuation of the conference by the remaining participants.

Integration and Sessions

NEW MEDIA INTEGRATION

Many VoIP services are based on integration of multiple media. One such service is multimedia conferencing, which can be implemented by taking advantage of both SIP signaling and IP transport. SIP messages are available for server registration and rendezvous, as well as the controls that are needed to set up, conduct, and end sessions. Additional IP control messages are used to send media-specific commands. For example, service customers can use these commands to select video feeds, change codecs, change multicast groups, etc. IP transport is used to move the data representing the various media to and from servers and among users.

Figure 4 illustrates a conferencing service. Similar in overall structure to the IVR service depicted in figure 2, this system is based on a different set of servers: a multimedia conference server, an audio bridge, a video server, and a data-sharing server. The conferencing server coordinates the activities of the data-specific servers, which manipulate different sets of packet data corresponding to appropriate media. For example, the audio bridge receives encoded voice from all participants and distributes combined voice data back to the participants. As the figure illustrates, uniformity of endpoint devices is not required: each customer can participate in a conference through a different type of endpoint (e.g., cellphone, analog phone, or laptop). The media transmitted to and from each participant depends upon the capabilities of the participant’s endpoint device.
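As a rough illustration of what the audio bridge does with the voice it receives, the sketch below mixes one frame of audio per participant. It assumes the voice has already been decoded to 16-bit linear PCM and that all frames have the same length. It also assumes a "mix-minus" policy, in which each participant hears everyone except themselves. That policy is a common design choice, not something the article specifies. The function name mix_frames is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames):
    """Mix one decoded PCM frame per participant.

    frames maps a participant name to a list of 16-bit samples.
    Returns, for each participant, the sum of all *other* participants'
    samples, clipped to the 16-bit range.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [0] * length
    for samples in frames.values():
        if len(samples) != length:
            raise ValueError("all frames must have the same length")
        for i, s in enumerate(samples):
            total[i] += s

    def clip(v):
        return max(INT16_MIN, min(INT16_MAX, v))

    # Subtract each participant's own samples from the total (mix-minus).
    return {
        name: [clip(total[i] - samples[i]) for i in range(length)]
        for name, samples in frames.items()
    }
```

A real bridge would decode each incoming stream before mixing and re-encode each outgoing mix with a codec the receiving endpoint supports, which is where the per-endpoint differences mentioned above come into play.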

QoS (quality of service) is an important issue for IP-based multimedia services. Many current IP services have been deployed without QoS guarantees from underlying network providers. These services are successful because transport quality is sufficient to meet customer demands. Providers of these services, however, do not have assurances that their services can grow to meet the needs of larger customer bases while also meeting time constraints for the services. For example, IP-based voice and video services are being deployed in enterprises without explicit QoS support. Since the enterprise LANs used for transport have enough bandwidth to allow over-provisioning for realtime voice and video, these services are successful. Timely transport of time-sensitive data to support realtime multimedia conversations across worldwide networks, however, is harder to ensure.

Solving these problems requires adequate transport performance, along with servers in the signaling and media transport paths that can react to messages within realtime constraints. These servers must process both signaling and bearer traffic within time bounds to meet processing needs associated with transcoding, composition, distribution, etc. Currently, servers capable of this processing are economical only for certain functions.

NEW USES OF SESSIONS

SIP sessions can be long-lived, and persistent sessions provide the foundation for some interesting new VoIP services. One example is an enhanced chat-room service, called Telechat, illustrated in figure 5.

In this application, users can interact through voice, video, and data during multimedia conferences. They can also exchange private and public (broadcast) messages. Users can create and access stored data in a shared repository. The data can be imported from other applications, generated during chat sessions, and accessed during or outside of multiparty conferences. Service sessions are not restricted to calls, so they can be long-lived, extending over multiple calls or over other, shorter sessions. These longer sessions can form the basis for persistent state and data storage.

Persistent sessions support long-term interactions and can serve as the rendezvous point for multiple calls. In addition, a persistent session can provide storage for data used in these calls. Hence, a persistent session can act as a direct representation for a long-term group effort. Enhanced chat-room services can be built upon persistent sessions, which can maintain a room state that is stable over the span of several chat sessions. This persistent state creates a context or surrounding environment for a series of chat sessions.
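One way to picture a persistent session is as a small data structure that outlives the individual calls attached to it. The sketch below is only an illustration of that idea. The class name, fields, and methods are hypothetical and are not taken from Telechat.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentSession:
    """Hypothetical room state that survives across individual calls."""

    room: str
    members: set = field(default_factory=set)          # everyone who has joined
    shared_data: dict = field(default_factory=dict)    # data stored in the room
    active_calls: dict = field(default_factory=dict)   # call id -> participants
    call_history: list = field(default_factory=list)   # ids of finished calls

    def start_call(self, call_id, participants):
        """Attach a new call to the session; callers become room members."""
        self.members.update(participants)
        self.active_calls[call_id] = set(participants)

    def end_call(self, call_id):
        """Detach a finished call; room state and stored data are kept."""
        self.active_calls.pop(call_id)
        self.call_history.append(call_id)

    def store(self, key, value):
        """Save data that later calls in the same room can read."""
        self.shared_data[key] = value


session = PersistentSession("design-review")
session.start_call("call-1", {"alice", "bob"})
session.store("notes", "draft agenda")
session.end_call("call-1")
session.start_call("call-2", {"bob", "carol"})
```

When the second call starts, the notes saved during the first call are still in session.shared_data, and the membership includes participants from both calls. The sketch keeps all state in one object on one machine, which sidesteps the placement and ownership questions that a distributed deployment must answer.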

Persistent sessions create new challenges for system designers. Developers must decide where to maintain session state, which can be distributed among network servers and endpoints or restricted to subsets of these elements. Designers must also decide where to store the data associated with the sessions. In Telechat, for example, session state is stored on multiple servers. In a related issue, service providers must decide who owns what data. Billing for the resources needed to store persistent state is also a source of several design decisions. For example, service providers must specify whether a person who joins a long-term session pays for the session or pays for the connection/interaction with the session.

The Future

ONLY THE BEGINNING

VoIP is a disruptive technology that is causing significant change in the way voice communication services are delivered. It is shaping the roadmaps for future telecom networks. This is only the beginning of a more significant move to convergence. As the world moves to a common IP-based data network as its backbone, VoIP is only one of the realtime services offered on such networks, along with many data services. The same network will also support video services from videoconferencing to entertainment video.

More important, these services allow convergence at the control and user levels. A user can initiate a call or TV program from the Web and then send a video from a camera phone to the user’s home Web site. Common Web-based services can be used for provisioning the user’s personal choices. Clearly, this is only the beginning of exciting services offered by full multimedia on IP.

An important architectural change is that all application servers will move out of specific networks and become more access-independent. Networks will become multiservice platforms. To do this effectively, networks have to provide flexible QoS mechanisms and the ability to create virtual networks to match the services being deployed. This is where many of VoIP’s challenges remain to be solved. Specifically, we still need ways to specify the network requirements of a particular application (e.g., multiparty audio-conferencing), and we need to be able to map those requirements to the multiservice network. Finally, we need to be able to provision such services and monitor their execution to guarantee delivery.

Last, but not least, is the challenge of integrating the ever-smarter endpoint and endpoint-based applications with the network-centric view presented earlier. Besides new service interaction issues, this raises many new concerns about ownership of the user’s data, authentication, billing for services, and responsibility for security.

VoIP is here and already leading the way not just to cheaper voice calls but also to a host of new applications. We need to focus on the challenges to enable a host of new multimedia applications.

What Is SIP?

SIP (session initiation protocol) is a text-based protocol for initiating communication sessions between users. These sessions may include calls with conventional telephones, voice, video, and data calls, multimedia conferencing, streaming media services, games, etc. SIP is defined by a collection of Requests for Comments (RFCs) managed by the Internet Engineering Task Force (IETF).

SIP messages are exchanged among two or more peers (IP nodes) for rendezvous and synchronization, thus supporting initiation of interactive communication sessions.
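Because SIP is text-based, a session invitation is simply a block of header lines. The sketch below assembles a minimal INVITE request using the header layout from the core SIP specification (RFC 3261). The function name, the example addresses, and the choice of UDP transport are assumptions made for illustration. The request carries no session description body, so it shows the shape of the message rather than a complete call setup.

```python
import uuid


def build_invite(to_uri, from_uri, contact_uri, via_host, cseq=1):
    """Return the text of a minimal SIP INVITE with an empty body."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # required branch prefix
    tag = uuid.uuid4().hex[:8]                   # caller's dialog tag
    call_id = f"{uuid.uuid4().hex}@{via_host}"   # unique id for this call
    lines = [
        f"INVITE {to_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{contact_uri}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_invite(
    "sip:bob@example.org",
    "sip:alice@example.com",
    "sip:alice@pc1.example.com",
    "pc1.example.com",
))
```

In practice the body would usually carry a description of the media being offered, and the invitee would answer with a response that either accepts or declines the session.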

Once communicating parties have started their session through SIP messages, they are able to conduct the session through session-specific message exchange. These parties may also use SIP for additional session events, such as adding and dropping session members, changing media, and ending sessions.

SIP is fundamentally a protocol for communication among peers. SIP sessions are conducted by two or more communicating parties. These parties may be network endpoints (IP nodes associated with end-user devices) as well as network servers. If one SIP node knows the address of another node, the first may invite the second to join a SIP session. Thus, SIP sessions do not require support from network servers, but network intermediaries typically help endpoints find one another. Users register their network addresses with SIP registrars. Users usually send session invitations to one another through SIP proxies, which use registration information to locate invitees.
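The division of labor between registrars and proxies can be sketched as a lookup table and a function that consults it. The classes below are hypothetical, hold everything in memory, and ignore details such as registration expiry, authentication, and multiple contacts per user.

```python
class Registrar:
    """Hypothetical in-memory registrar: public address -> current network address."""

    def __init__(self):
        self._bindings = {}

    def register(self, public_address, contact_address):
        """Record where a user can currently be reached."""
        self._bindings[public_address] = contact_address

    def lookup(self, public_address):
        """Return the registered contact, or None if the user is not registered."""
        return self._bindings.get(public_address)


class Proxy:
    """Hypothetical proxy that uses registration data to locate invitees."""

    def __init__(self, registrar):
        self._registrar = registrar

    def route_invitation(self, invitee):
        """Return the address to forward an invitation to, or None if unknown."""
        return self._registrar.lookup(invitee)


registrar = Registrar()
registrar.register("sip:bob@example.org", "sip:bob@192.0.2.17")
proxy = Proxy(registrar)
```

With this setup, proxy.route_invitation("sip:bob@example.org") returns the contact that Bob registered, while an invitation for an unregistered user returns None and cannot be forwarded.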

SIP sessions provide an extensible framework for a wide variety of interactions. They do not define—hence, do not constrain—specialized service behavior. Thus, they form the basis for many different communication services. SIP sessions support services typically accessed through packet data networks (e.g., streaming video-on-demand service). They also support conventional telephony services (e.g., conference voice calls).

Because SIP is a framework in which both telephony and nontelephony services have been developed, SIP has encouraged convergence of services. In particular, SIP is encouraging convergence of telephony and Web-based services. These converged services include Web phones, Web-based management of telephony services, and interactive games in which players can talk with one another in conference calls.

Additional information is available from the SIP working group of the IETF at http://www.ietf.org/html.charters/sip-charter.html.

SUDHIR AHUJA is vice president of the Converged Networks and Services Research Laboratory at Bell Labs/Lucent Technologies, where he is leading research in converged networks, services, speech recognition, text-to-speech coding techniques, video-based communication, and novel multimedia applications. He designed and developed the first large-scale multiprocessor at Bell Labs and championed the first Internet-based video conferencing system. His current interests are in the field of communication applications over the Internet.

Ahuja obtained his M.S. and Ph.D. degrees in electrical engineering from Rice University. His undergraduate education was at the Indian Institute of Technology, Bombay, where he received the President’s Gold Medal for outstanding academic performance. He is a Fellow of Bell Labs and has served as chairman for the Multimedia Services and Terminals Committee of the IEEE Society, area editor for the IEEE Communications Committee, and editor for Transactions on Networking, a joint publication of IEEE and ACM.

BOB ENSOR is a technical manager in the Services Infrastructure Research Department at Bell Labs/Lucent Technologies. He leads research and development efforts in next-generation network architectures and components. Earlier, he served as principal researcher in several projects at Bell Labs, including broadband service data centers, multimedia messaging systems, shared virtual worlds for the Internet, and multimedia conferencing systems. Ensor holds several patents and has published numerous papers. He received his Ph.D. in computer science from SUNY at Stony Brook