7. STANDARDS FOR INTERNET TELEPHONY AND MULTIMEDIA SYSTEMS


7.1 Overview on ITU-T Standards

ITU-T stands for the Telecommunication Standardization Sector of the International Telecommunication Union, the headquarters of which is located in Geneva. ITU-T is the dominant de-jure standardization body focusing on worldwide telecommunication standards. The practical work of ITU-T is carried out in 15 Study Groups (SG). The Study Group in charge of multimedia standards is the recently established SG16, and Internet telephony standardization falls within its scope. In addition to ITU-T, ISO and IEC issue worldwide de-jure standards, such as MPEG1, which influence communication systems and devices. Furthermore, an increasing number of de-facto bodies such as IETF, ATM Forum, and DAVIC produce specifications that closely resemble communication standards. Since 1996, a major step forward has been taken as ITU-T has started to strengthen its cooperation with the de-facto bodies. This cooperation is essential for the success of future standards efforts as the telecommunication and computer communication infrastructures merge.

The ITU-T standards for Internet telephony are a collection of diverse standards issued by ITU-T itself and IETF. As a matter of fact, the set of standards is not restricted to telephony only, but covers services and technology for real-time multimedia applications conveyed across a packet switched IP network. A taxonomic map of the relevant standards appears in Appendix 1.

The ITU-T recently approved Recommendations for low bit-rate multimedia communication systems, which include standards for audio coding, video coding, system control, multiplexing as well as media stream synchronization and packetization. The networks involved are PSTN, digital mobile, and Intranet/Internet or IP based WANs (ATM or Frame Relay). The system specifications are contained in the following Recs.:
• H.323 (Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed QoS) [ITU 96d]
• H.324 (Visual Telephone Systems and Equipment for PSTN) [ITU 96c]
• H.310 (Broadband Audiovisual Communication Systems and Terminals)
• H.321 (Adaptation of H.320 Visual Telephone Terminals to B-ISDN Environments)
• H.322 (Visual Telephone Systems and Terminal Equipment for LANs which Provide Guaranteed Quality of Service)

In what follows we focus on the first two standards, which are the most important ones in the short term.

The forthcoming mobile extension of H.324 bears the notation H.324/M. H.323 is applicable not only to low bit-rate connections, but also to higher rates up to hundreds of kbps. The set of Recommendations defines the technical requirements for conversational multimedia communication systems, but the Recommendations also support a non-conversational mode of operation to a limited extent. It is worth noting that the H.324 system is a non-packet based multimedia standard, which is compatible with H.323 on the bit stream level of speech and video, but employs circuit switched network access.

7.2 Speech Codecs for Wireless Multimedia and Internet Telephony

Good speech quality is the most important design goal of most multimedia systems. If the speech quality is not acceptable, the other media content tends to become useless even if its quality is good. In this section we briefly deal with some fundamentals of standard speech codecs, in particular from the QoS point of view. Specifically, the delay problems related to the ITU-T G.723.1 speech codec [ITU 96a] are discussed. This codec is the key component for low bit-rate multimedia applications as well as for Internet telephony, because in March 1997 the VoIP Consortium chose G.723.1 as the default codec for Internet telephony.

7.2.1 Speech Coding Standards

Speech coding is a trade-off between bit-rate, speech quality, delay, complexity and cost. All the current low bit-rate speech codecs, G.723.1 (5.3/6.3 kbps), G.729 (8 kbps) [ITU 96b], GSM (13 kbps), IS-54 (7.95 kbps), IS-95 (9.6 kbps), and PDC (6.7 kbps), belong to the same class of codecs, namely linear prediction analysis-by-synthesis (LPAS). The prevailing algorithms are mainly based on vector excitation coding, in which the excitation is derived from a codebook as the closest estimate of the transmitted vector. The complexity is measured in MIPS of the signal processing needed. A speech codec requiring 30 MIPS is regarded as complex, whereas a speech codec using less than 15 MIPS is a low complexity one.

G.723.1 supports two bit rates, namely 5.3 kbps and 6.3 kbps [Cox 96b]. ITU-T SG16 is currently developing a 4 kbps speech coding standard, with a target deadline of the third quarter of 2000. The objectives are ambitious: speech quality, error resilience, and performance in the presence of background noise must not be worse than those of the 32 kbps ADPCM codec (G.726). The maximum allowable codec delay is 45 ms. Acceptable performance under adverse network conditions is an essential design objective to address the needs of mobile multimedia applications. The technology has not been decided yet, but a likely candidate is some kind of CELP based technology.

7.2.2 Evaluation Methods of Speech Quality

The speech quality is measured using subjective methodology. The Speech Quality Expert Group (SQEG) of ITU-T SG12 is in charge of developing quality testing methods for the subjective tests [ITU 96k], [ITU 96l]. The commonly used test is ACR, absolute category rating. The subjects selected for the test listen to samples of speech for 8-10 seconds and are requested to give their score on a scale of 1-5. The mean of the scores is the mean opinion score (MOS). The robustness of a speech codec under adverse conditions is an important design parameter. The ability to use the codec in multipoint configurations also needs to be considered. In this case, each digital speech signal has to be decoded, summed and re-encoded. This combined tandeming and audio bridging not only impairs the speech quality, but doubles the delay as well. The ability of the speech codec to tolerate 2-3 tandem connections without severe degradation of quality is important, because tandeming is unavoidable in the current telecommunication network. In the future such need will decrease for two reasons. First, terminals are expected to support a range of different speech coding standards, thus avoiding the need for transcoding. Second, digital transcoders with rate adaptation avoid A/D and D/A conversions, which are primary sources of degradation.

Background noise is taken into account by adding speech-correlated noise to the samples. The measure of the noise is called MNRU, modulated noise reference unit. Another form of degradation is quantization distortion, stemming from reduced bandwidth. Its measure is QDU, quantization distortion unit.

Besides the ACR test, the DCR (degradation category rating) test is also used. In DCR the subjects first hear the original uncoded sample and then the coded one. They are requested to rate the degradation relative to the reference sample on the following scale:
5 No perceptible distortion
4 Perceptible, but not annoying
3 Mildly annoying
2 Annoying
1 Very annoying

This test is better than ACR when the quality of the original reference sample is already impaired, because without a reference the subjects can hardly distinguish the effect of background noise on the quality.

A MOS of 4.0 is commonly considered good quality. So-called “toll quality” pertains to MOS values above 3. The need for improving the speech quality is clearly indicated by the recent CTIA user survey. It is likely that rapidly advancing signal and speech processing technologies will allow a MOS of at least 4 with good performance in adverse operating conditions for the next generation standards, even at lower bit-rates than today.
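The scoring described in this section can be sketched in a few lines. The following is only an illustration: the function names and the listener panel are hypothetical, and the labels simply restate the thresholds quoted above (4.0 for good quality, above 3 for toll quality).

```python
def mean_opinion_score(ratings):
    """Average a list of ACR ratings (integers 1..5) into a MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


def describe(mos):
    """Label a MOS using the thresholds quoted in the text."""
    if mos >= 4.0:
        return "good"
    if mos > 3.0:
        return "toll quality"
    return "below toll quality"


# Hypothetical panel of eight listeners rating one coded sample.
panel = [4, 5, 3, 4, 4, 3, 5, 4]
mos = mean_opinion_score(panel)
print(f"MOS = {mos:.2f} ({describe(mos)})")
```

A real test would also control the sample length, the listening conditions and the size of the panel, none of which is modelled here.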

7.2.3 Delays in Speech Communication

Delays in speech impair human interaction and should be kept as small as possible. Therefore, reduction of delays is a prime target in real-time communications. This tends to be difficult due to restrictions in speech coding technology and network conditions. In what follows, some fundamental issues of the overall delay are discussed.

Rec. G.114 [ITU 96a] defines 400 ms as the maximum allowable overall delay. In real life, such a long delay severely impairs human interaction. According to SQEG, users do not get distracted if the one-way delay remains below 200 ms [SQE97], which is a more appropriate design goal than that of G.114. Unlike geostationary (GEO) satellites, LEO and ICO satellite links incur fiber-like delays, which make them more appropriate for real-time applications. The delay of a GEO satellite link is around 260 ms. Given that delays in low bit-rate or Internet telephony applications usually exceed 200 ms, GEO links are inappropriate for such connections.

The residual delay of a speech codec consists of codec delay and transmission delay (see Fig. 14). The codec delay is composed of algorithmic delay, which is implementation independent, and processing delay. The transmission delay and processing delay depend on network conditions and implementation. As the nominal system delay for G.723.1 a figure of 97.5 ms has been given [Cox 96b], which is derived from 37.5 ms of algorithmic delay, 40 ms of processing delay and 20 ms for transmission. Therefore, in subsequent calculations, when the network portion is replaced by something else, e.g. the Internet or a digital mobile network, we have used a figure of 80 ms for the coding delay (37.5 ms + 40 ms, rounded up) and assessed the network transit delay separately.
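The arithmetic behind these figures can be checked with a short sketch. The constants are the values quoted in this section; the function names and the way the verdicts are worded are assumptions made for illustration.

```python
# Delay figures quoted in the text for G.723.1, in milliseconds.
ALGORITHMIC_MS = 37.5
PROCESSING_MS = 40.0
NOMINAL_TRANSMISSION_MS = 20.0

SQEG_GOAL_MS = 200.0   # one-way delay below which users are not distracted
G114_MAX_MS = 400.0    # maximum allowable overall delay per Rec. G.114
GEO_HOP_MS = 260.0     # approximate delay of a GEO satellite link


def overall_delay(network_ms, codec_ms=ALGORITHMIC_MS + PROCESSING_MS):
    """Codec delay (algorithmic + processing) plus network transit delay."""
    return codec_ms + network_ms


def assess(total_ms):
    """Compare a one-way delay with the two limits quoted in the text."""
    if total_ms < SQEG_GOAL_MS:
        return "within the 200 ms design goal"
    if total_ms <= G114_MAX_MS:
        return "above the design goal, within the G.114 maximum"
    return "exceeds the G.114 maximum"


for label, network_ms in [("nominal", NOMINAL_TRANSMISSION_MS),
                          ("GEO satellite hop", GEO_HOP_MS)]:
    total = overall_delay(network_ms)
    print(f"{label}: {total} ms, {assess(total)}")
```

With the nominal 20 ms transmission delay the total is 97.5 ms, whereas a single GEO hop already brings the same codec to 337.5 ms, well above the 200 ms goal.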

[Figure: the speech encoder contributes algorithmic delay τad1 and processing delay τpd1; the network contributes propagation (transmission) delay τtd; the speech decoder contributes processing delay τpd2 and algorithmic delay τad2.]

Overall system delay: τd = (τad1 + τad2 + τpd1 + τpd2) + τtd

Figure 14 Delay introducing entities in digital speech communication

The speech coding scheme G.723.1 was originally designed for low bit-rate videotelephony, specifically in the PSTN. The long delay does not matter in videotelephony, since the coding delay of a video codec tends to be substantially longer than that of a speech codec. Furthermore, the low repetition rate of video frames (5 Hz) equates to a frame interval of 200 ms, which means that the audio has to be delayed in any case to ensure lip synchronization. Therefore, it was felt in SG16 that low delay, which tends to increase complexity and cost, was less important. The possibility of using a G.723.1 codec in networks with inherently long delays such as digital cellular networks or the Internet emerged later on, when SG15, the predecessor of SG16, decided to develop a mobile extension of H.324. As for the Internet, nobody could foresee the option of using G.723.1 as the default codec for Internet telephony at the time when SG15 confirmed the design constraints for G.723.1.

In a digital cellular network, a speech codec needs to cope with a much more adverse operating environment, which inherently introduces long delays due to delay spread equalization, echo cancellation, bit interleaving, and error control. The same goes for the Internet, which is plagued by long residual routing delays. Therefore, the delay margin left for the speech codec is much lower than in the PSTN if the overall residual delay is to be kept within reasonable limits. It is easy to insert additional delay into digital data when needed, but once delay has been introduced, it cannot be taken away.

The competing audio coding scheme G.729, which lost the vote in the VoIP Forum for the default codec of Internet telephony, introduces a coding delay of only 25 ms. The SQEG has recently pointed out that extensive tests have been made for G.729 to evaluate its performance on tandemed connections [Hay 96]. Such test results are not available, at least not in public, for G.723.1. The quality of G.723.1 and G.729 does not otherwise differ significantly under normal conditions, i.e. without any transcoding being present.

The good news from the VoIP decision is that G.723.1 is the default codec of H.324 systems, which facilitates interoperability between Internet telephones and H.324 terminals (no transcoding is needed). Besides incurring additional cost, transcoding tends to introduce additional delay. Therefore, it is highly desirable that interoperability is provided on a peer-to-peer level. In the future, one general purpose signal processor, equipped with a software package to support a range of different speech coding schemes, can be foreseen. Yet transcoding cannot be avoided completely due to the structure of the existing telecommunication network.

In Section 8 we assess the delays in some basic network configurations between G.723.1 voice over packet radio and H.323 Internet telephony as well as between H.324/M and H.323.

7.3 ITU-T H.324 Recommendation [ITU 96d]

H.324 is a system specification for low bit-rate real-time multimedia communication. The prime design objective was to provide the best possible performance for audio, video, and data over low bit-rate connections (read: below 64 kbps). The standard was initially targeted at the PSTN [Lin 96], but does not rule out other low bit-rate networks. For instance, a mobile extension of H.324 is currently being developed in ITU-T SG16. The H.324 standard rests on the existing, widely used H.320 videoconferencing system standard in the sense that interoperability with H.320 systems at the best common mode is an important design goal.

Figure 15 H.324 block diagram [Lin 96]

7.3.1 H.324 Functional Entities and Communication Procedures

The main building blocks and related standards of an H.324 system are depicted in Fig. 15 and the respective protocol architecture in Fig. 16.

The interface to the physical PSTN includes a V.34 modem, and the V.8/V.8 bis DTE-DCE interface protocol is used for establishing the connection between the modems at start-up. The control of H.324 is carried out by the H.245 control protocol [ITU 96h], which is based on logical channels. The logical channels can be opened, i.e. activated, for the media streams needed by an application. For data protocols such as T.120 [ITU 96j] or V.70, a bi-directional pair of logical channels needs to be opened. The T.120 stack supports multipoint operation, which enables an H.324 terminal to join a multipoint videoconference. The H.223 multiplexer mixes the various media streams and the demultiplexer extracts them from the incoming data. Synchronization of the incoming data is carried out with HDLC flags at the beginning and end of each data frame (HDLC is better known as the synchronous link protocol for SS7).
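Flag-based framing of this kind relies on the flag pattern never appearing inside the payload. A minimal sketch of the classic HDLC approach (zero insertion after five consecutive ones) is given below. It illustrates the general HDLC technique only and is not the H.223 frame format; the function names are hypothetical and bits are represented as strings for readability.

```python
FLAG = "01111110"  # HDLC flag: a zero, six ones, a zero


def stuff(bits):
    """Insert a 0 after every run of five consecutive 1 bits."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit, removed again by unstuff()
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff(bits):
    """Remove the 0 that follows every run of five consecutive 1 bits."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:          # this bit is the stuffed 0: drop it
            skip = False
            ones = 0
            continue
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                skip = True
        else:
            ones = 0
    return "".join(out)


def frame(payload):
    """Delimit a stuffed payload with an opening and a closing flag."""
    return FLAG + stuff(payload) + FLAG


payload = "0111111011111111"
assert FLAG not in stuff(payload)          # six ones can no longer occur
assert unstuff(stuff(payload)) == payload  # the receiver recovers the data
print(frame(payload))
```

Because the stuffed payload never contains six consecutive ones, a receiver can search for the flag pattern to find frame boundaries.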

For low bit-rate video coding, a new scheme, H.263, has been developed from H.261, the currently dominating standard for H.320 videoconferencing. Although H.263 has been created for sub 64 kbps connections, it provides better quality than H.261 even at higher bit-rates. In addition to the picture formats FCIF and QCIF, supported by both H.261 and H.263, the latter also supports SQCIF. This is a rather low spatial resolution, and it is debatable whether it will have any practical significance. The speech codec of H.324 systems is G.723.1, which has been discussed in more detail in Section 7.2.3.

The Transport Layer error control is optional for H.324. It will be needed for enhanced interoperability with H.324/M systems. Interoperability on level 0 is always ensured even without support for error recovery, but then the QoS is determined by the BER of the mobile link, i.e. 10^-2 to 10^-3. For speech, the QoS remains above the toll quality threshold even at this high error rate [Hay 96].

[H.324 block diagram labels: G.723.1 equipment; path delay; data protocols (V.14, LAPM etc.); user data applications (T.120 etc.); control protocol H.245 with SRP/LAPM; system control; modem control V.25 ter; H.223 mux/demux layer; PSTN. The scope of Rec. H.324 is marked in the diagram.]

Application layer: Audio apps; Video apps; System control; Data apps
Presentation layer: G.723.1 (audio); H.263, H.261 (video); H.245 Control Protocol (system control); T.120 (data)
Session layer: SRP/LAPM (system control); V.14, LAPM (data)
Transport layer: H.223, optionally the forthcoming ITU-T error control (FEC+ARQ) protocol; H.223; H.223
Network layer: (none)
Link layer: V.8/V.8 bis; V.25 ter; V.8/V.8 bis
Physical layer: V.34 modem

Figure 16 Protocol architecture of an H.324 system

Table 9 Phases of communication between two H.324 terminals

Phase A
Mode: POTS call setup
Function: An ordinary telephone call is set up, with ringing and answering, either automatically with V.25 ter procedures or by the called user.

Phase B
Mode: Analogue voice conversation
Function: The calling user can originate a POTS call first and upgrade the call later on into a multimedia call using the V.8 bis protocol (AVD, alternate voice and data).

Phase C
Mode: Multimedia call setup, establishment of digital communication for DSVD (digital simultaneous voice and data)
Function: A start-up phase of around 10 seconds takes place, during which the initial terminal negotiation and the modem (equalizer) “training” for optimal line equalization take place.

Phase D
Mode: Completion of terminal negotiation
Function: Exchange of terminal capabilities using the H.245 control protocol (duration < 1 sec.).

Phase E
Mode: Multimedia communication mode, users can exchange multimedia information in DSVD mode
Function: All logical channels are available during the call and can be opened or closed under user control.

Phase F
Mode: Termination of the H.324 portion of the call
Function: All logical channels are closed and H.245 messages specify call disconnect, transfer to POTS or another digital mode such as fax or V.34 file transfer.

Phase G
Mode: Call disconnect, switch to POTS or specified digital mode

7.3.2 Call Initialization

The different phases of an H.324 call are presented in Table 9 [Lin 96]. V.8 bis allows the user to alternate between analogue voice (POTS) and multimedia mode during the same call. In phase C the modems communicate at 300 bps, exchanging some basic data on their capabilities. No line equalization is needed at this low speed. The V.34 modulation rate is then increased stepwise in 2.4 kbps increments up to a maximum speed of 28.8 kbps (33.6 kbps for V.34Q), with appropriate parameter adjustments to optimize the modem performance. Unlike packet switched mode, the synchronous operation mode of H.324 introduces negligible propagation delays, but the coin has its flip side as well.

The start-up is the Achilles’ heel of the H.324 system. Terminals with the V.8 protocol only have the disadvantage that users cannot first greet each other over POTS. The called party just hears the V.34 modem tones for 10 seconds or even more without knowing who is calling. In any case, communication is unusable during the start-up and there is no good solution to the problem. One possibility is to train the modems up to half speed and start audiovisual communication first at a reduced quality level. The training is then completed during the multimedia communication phase. The duration of the unusable communication phase is reduced by around 50% at the expense of reduced audiovisual quality at the beginning of the call.

7.3.3 Mobile Extension of H.324 (H.324/M)

The Mobile Extension of H.324 (Annex C to H.324) is under development in ITU-T SG16. The targeted deadline for Annex C is January 1998. At its meeting in Nice in February 1997, the ITU-T SG16 Experts’ Group for Very Low Bit-rate Video Telephony decided that, contrary to the original intention, no link layer protocol will be provided over the air interface. Instead, an error control mechanism will be provided on an end-to-end level as an extended functionality of H.223. Due to this decision, no gateway is needed in the network and no upgrades are needed in current cellular networks. The error control protocol to be developed is likely to be FEC based or a combination of ARQ and FEC error recovery technologies to achieve a net BER of, say, 10^-6. FEC schemes are usually not capable of correcting errors fully reliably. In particular, for some error patterns, which fortunately occur fairly seldom, FEC tends to make decoding errors. A bit interleaver spreads burst errors over the time domain in such a fashion that the probability of multiple errors in the same code block is significantly lower than that of single errors. ARQ can handle the rare remaining patterns without a significant increase in overall latency. In general, however, ARQ tends to increase the size of the receiver buffer and the latency. Therefore, it is usually not feasible for audio and video, in particular in cellular networks with long inherent propagation delay. Under severe conditions, Reed-Solomon (RS), Bose-Chaudhuri-Hocquenghem (BCH), or Rate-Compatible Punctured Convolutional (RCPC) codes are used for channel coding.
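The effect of bit interleaving on burst errors can be illustrated with a simple block interleaver. This is a generic sketch, not the interleaver of any particular cellular standard; the 4 x 7 dimensions (four code blocks of seven bits) and the function names are assumptions chosen to keep the example small.

```python
def interleave(bits, rows, cols):
    """Write the bits row by row (one code block per row), read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the row-by-row order."""
    if len(bits) != rows * cols:
        raise ValueError("length must equal rows * cols")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


def errors_per_block(burst_start, burst_len, rows, cols, interleaved=True):
    """Flip a burst of channel bits and count the errors landing in each code block."""
    sent = [0] * (rows * cols)
    channel = interleave(sent, rows, cols) if interleaved else list(sent)
    for i in range(burst_start, burst_start + burst_len):
        channel[i] ^= 1
    received = deinterleave(channel, rows, cols) if interleaved else channel
    return [sum(received[r * cols:(r + 1) * cols]) for r in range(rows)]


# A burst of four channel errors, with and without interleaving.
print(errors_per_block(8, 4, rows=4, cols=7, interleaved=False))
print(errors_per_block(8, 4, rows=4, cols=7, interleaved=True))
```

Without interleaving the burst concentrates its errors in at most two adjacent code blocks, whereas with interleaving each code block receives at most one error, which a modest FEC code is far more likely to correct. The price is the delay of filling the interleaver, which is one of the delay sources in cellular networks mentioned in Section 7.2.3.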

The SG16 Experts’ Group has decided to include a multilevel error control scheme in H.324, the basic idea of which appears in Fig. 17 [Wim 97]. The scheme is not intended exclusively for mobile networks, but for error-prone environments in general. Depending on the terminal type, error control may be supported from level 1 to 3. All H.324 systems can interwork at level 0, but under degraded network conditions the QoS may remain rather poor.

Level 3: Level 2 + improved error protection of the adaptation layer
Level 2: Level 1 + improved header with multiplex payload length field
Level 1: longer synchronization flag, no HDLC 0-insertion
Level 0: plain H.223

Figure 17 Error control scheme for H.324 in error-prone environments

At level 1, the synchronization is improved by better protection of the sync flags. Level 2 adds error robustness to the header. The algorithm for this error protection is still open. Level 3 adds payload specific error protection to the adaptation layer resting above the H.223 [ITU 96f] multiplexer layer, which means that the multiplexed media streams are treated separately using different algorithms. This approach stems from the fact that the error tolerance of different types of media components varies within a broad range. For instance, audio is fairly error tolerant due to its high degree of redundancy, but even a single error may be fatal for application data. Video falls between these two extremes. The level 3 error protection algorithm has not been decided yet. The likely candidates are RS or RCPC. RS codes are more effective for burst errors, whereas RCPC is better suited for the correction of single error patterns. Burst errors are more common in wireless networks without bit interleaving such as PHS and PACS. The levels are to be included in the H.245 control protocol, and terminals negotiate their error protection capabilities at the beginning of a call. The user may choose the level that offers the best performance. The overall performance is not necessarily improved when a higher level is selected, because the additional error protection is achieved at the expense of increased overhead. This entails less bandwidth for payload and impaired video quality.
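The trade-off between protection and overhead can be made concrete with a toy model. Everything numeric below is hypothetical: the per-level overhead fractions and the frame loss figures are invented for illustration, and the model assumes that a terminal supporting level n also supports all lower levels. It is not the H.245 negotiation procedure.

```python
# Hypothetical share of the link bit-rate spent on protection at each level.
OVERHEAD = {0: 0.00, 1: 0.02, 2: 0.05, 3: 0.15}


def common_levels(max_level_a, max_level_b):
    """Levels usable by both terminals, assuming level n implies levels 0..n."""
    return list(range(0, min(max_level_a, max_level_b) + 1))


def goodput(level, link_bps, frame_loss):
    """Payload bit-rate that arrives intact: link rate minus overhead, minus losses."""
    return link_bps * (1.0 - OVERHEAD[level]) * (1.0 - frame_loss[level])


def best_level(levels, link_bps, frame_loss):
    """Pick the level with the highest goodput for the given channel."""
    return max(levels, key=lambda lvl: goodput(lvl, link_bps, frame_loss))


levels = common_levels(3, 3)
clean_channel = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}       # hypothetical loss rates
noisy_channel = {0: 0.50, 1: 0.30, 2: 0.20, 3: 0.02}   # hypothetical loss rates

print(best_level(levels, 28800, clean_channel))
print(best_level(levels, 28800, noisy_channel))
```

On the error-free channel the unprotected level 0 gives the most payload, while on the noisy channel the overhead of level 3 pays for itself, which is the point made in the text: a higher level is not automatically better.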

The suggested protocol architecture for mobile H.324 systems is depicted in Fig. 18. ITU-T SG16 is developing a new protocol for multi-link operation (H.Multilink), which is scheduled for approval in January 1998.

Application layer: Audio apps; Video apps; System control; Data apps
Presentation layer: G.723.1 (audio); H.263, H.261 (video); H.245 Control Protocol (system control); T.120 (data)
Session layer: SRP/LAPM (system control); V.14, LAPM (data)
Transport layer: H.223, forthcoming ITU-T error control (FEC+ARQ) protocol; H.223; H.223
Network layer: Regional stds
Link layer: Bandwidth-on-demand, e.g. multi-link reservation protocol, V.34 modem
Physical layer: Radio bearers (regional stds)

Figure 18 Protocol architecture of an H.324/M system

The G.723.1 speech codec and the H.263 video codec may need modifications for acceptable performance under adverse operating conditions. The modified versions have to be backwards compatible with the PSTN versions. Interworking between H.324/M and H.323 terminals is depicted in Fig. 21, Section 7.4.5. The H.245 control protocol is common to H.323 and H.324 systems, which makes terminal interoperability easier.

7.4 ITU-T H.323 Recommendation [ITU 96e]

The recently approved ITU-T Recommendation H.323 defines the system requirements for Internet telephony and IP based real-time multimedia communications. The recommendation has obtained wide support from the computer and software industry, with Microsoft, Intel, and IBM in the forefront. Furthermore, IMTC, representing the telecom operators and service providers, is backing H.323. It is likely that H.323 will gain a dominant position as the international standard for Internet telephony. Further development of H.323 is underway in ITU-T Study Group 16. Terminals complying with the new version of H.323 will be backwards compatible with current H.323 terminals. A block diagram of an H.323 terminal is presented in Fig. 19. The standard is flexible in the sense that besides audio, it supports additional media such as still image data and video, up to a full-fledged multimedia system. System control functions and audio are mandatory, the latter with both A-law and mu-law quantization. Unlike the H.324 systems, H.323 is not specifically a low bit-rate system, but if needed, it can also operate across low bit-rate links.

[Block diagram labels: video I/O devices; audio I/O devices; data equipment/applications with T.120 data interface; system control user interface; video codec (H.261, H.263); audio codec (G.711, G.722, G.723.1, G.728, G.729, MPEG1); receive path delay compensation; system control (H.245 control, H.225.0 call control, H.225.0 RAS control); H.225.0 layer; LAN interface. The scope of Recommendation H.323 is marked in the diagram.]

Figure 19 Block diagram of an H.323 terminal [ITU 96e]

The protocol architecture of an H.323 system appears in Fig. 20.

Application layer: Audio apps; Video apps; System control; Data apps
Presentation layer: G.711, G.722, G.723.1, G.728, G.729 (audio); H.263, H.261 (video); RTCP, H.225.0 call signaling channel, H.245 Control Protocol (system control); T.120 (data)
Session layer: RTP; H.225.0
Transport layer: Unreliable transport (UDP); reliable transport (TCP)
Network layer: IP
Link layer: CSMA/CD, Token Ring protocol
Physical layer: Ethernet, Token Ring LAN

Figure 20 Protocol architecture for H.323

7.4.1 Audio

The mandatory speech codec of terminals complying with the H.323 Recommendation is G.711, which is the 64 kbps PCM A-law or mu-law speech coding scheme. The VoIP decision to back G.723.1 as the default codec for Internet telephony implies that the industry has agreed to include at least these two codecs in the forthcoming H.323 terminals. The resolution does not rule out other audio codecs such as G.729, G.722, G.728 and MPEG1 audio, which appear in Fig. 19. The capability set of the H.245 system control includes all these speech codecs. Terminals exchange such information in a negotiation phase prior to sending any payload data.
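The outcome of such a negotiation can be pictured as an intersection of capability sets. The sketch below is a deliberate simplification with hypothetical function names: real H.245 capability exchange uses ASN.1 encoded capability descriptors, which are not modelled here.

```python
MANDATORY_CODEC = "G.711"  # every H.323 terminal must support it


def common_codecs(local_preference, remote_capabilities):
    """Codecs supported by both ends, kept in local preference order."""
    remote = set(remote_capabilities)
    return [codec for codec in local_preference if codec in remote]


def pick_codec(local_preference, remote_capabilities):
    """Choose the most preferred shared codec."""
    shared = common_codecs(local_preference, remote_capabilities)
    if not shared:
        raise ValueError("no common codec; a compliant terminal offers "
                         + MANDATORY_CODEC)
    return shared[0]


# Hypothetical terminals: a low bit-rate Internet phone and a LAN terminal.
internet_phone = ["G.723.1", "G.729", "G.711"]
lan_terminal = ["G.711", "G.722", "G.728"]
h324_gateway = ["G.723.1", "G.711"]

print(pick_codec(internet_phone, lan_terminal))
print(pick_codec(internet_phone, h324_gateway))
```

Because G.711 is mandatory, two compliant terminals always share at least one codec, and a low bit-rate codec is used only when both ends list it.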

ITU-T SG16 has recommended G.729 as the default codec for terminals with audio-only capability in the new version of H.323, which is expected to be approved in January 1998. G.729 is the recently approved ITU-T 8 kbps speech coding scheme [ITU 96c]. It delivers better voice quality and tolerates 2-3 transcoding stages, i.e. recoding after a D/A and A/D conversion, without severe degradation of speech quality. Wireless extensions of G.729 are under development in ITU-T. The victory of G.723.1 (the dual-rate 6.3/5.3 kbps speech codec for low bit-rate videotelephony) in the VoIP vote reflects the choices made by a majority of the industry in regard to their forthcoming Internet telephony products. The benefit of the inferior G.723.1 codec is better interoperability with H.324 and H.324/M terminals. G.729 codecs are not widely deployed yet.

Slight modifications may be needed for G.723.1 when it is used in error-prone environments. In any case, the modified version has to be compatible with the existing G.723.1 version. Concerning the other audio codecs listed in Fig. 19, G.722 is wideband 64 kbps audio, and G.728 is the 16 kbps speech codec used in H.320 videoconference terminals. The additional codecs are recommended to ensure interoperability with various multimedia systems.

Interoperability between H.323 phone terminals on IP networks and digital cellular networks would need support for speech codecs such as GSM, IS-54 or IS-95. This would simplify the gateway design and keep the transcoding delay reasonably low.

7.4.2 H.263 Video [ITU 96i]

Video is optional. If video is supported, H.261 is mandatory, although it is not suitable for low bit-rate operation. The terminals have to be able to both encode and decode QCIF (Quarter Common Intermediate Format) video. H.263 video coding may also be supported, and if it is, QCIF is mandatory and CIF is optional. CIF video (352 x 288 pixels for luminance and 176 x 144 pixels for chrominance) is not feasible on a low bit-rate connection: CIF video with a reasonable motion rendition performance needs a connection of around 100 kbps. ISDN is commonly used for H.320 videoconferencing terminals, which implies that the available bit-rate is 128 kbps for audio, video and data. Therefore, if a LAN is equipped with an ISDN H.323 gateway, CIF is recommended for the respective H.323 multimedia terminals. In practice, most H.323 systems will probably support both H.261 and H.263 to provide interoperability with existing H.320 ISDN videoconferencing systems as well as support for low bit-rate operation.
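The picture dimensions above give a feel for how much compression the video codec has to deliver. The sketch below assumes 8 bits per sample and two chrominance planes at half the luminance resolution in each dimension, and uses a frame rate of 10 frames per second purely as an illustrative assumption; the function names are hypothetical.

```python
CIF = (352, 288)    # luminance width and height in pixels
QCIF = (176, 144)


def raw_bits_per_frame(luma_width, luma_height, bits_per_sample=8):
    """Uncompressed frame size: one luminance plane plus two half-size chrominance planes."""
    luma = luma_width * luma_height
    chroma = 2 * (luma_width // 2) * (luma_height // 2)
    return (luma + chroma) * bits_per_sample


def required_compression(picture_format, frames_per_second, channel_bps):
    """Ratio between the raw video bit-rate and the channel bit-rate."""
    raw_bps = raw_bits_per_frame(*picture_format) * frames_per_second
    return raw_bps / channel_bps


for name, fmt, channel_bps in [("CIF at 100 kbps", CIF, 100_000),
                               ("QCIF at 28.8 kbps", QCIF, 28_800)]:
    ratio = required_compression(fmt, frames_per_second=10, channel_bps=channel_bps)
    print(f"{name}: {raw_bits_per_frame(*fmt)} bits per raw frame, "
          f"compression of about {ratio:.0f}:1 needed")
```

A raw CIF frame holds four times as many samples as a QCIF frame, which is why CIF calls for a connection several times faster than a V.34 modem link at comparable picture quality and frame rate.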

The forthcoming upgrades of H.263, bearing the notation H.263+, are particularly important for error-prone environments affected by packet loss. For instance, scalable video coding enables splitting the coded video into multiple logical channels in such a manner that some loss of data does not significantly impair the video quality. Since low bit-rate operation is necessary over the Internet, the upgraded H.263 will improve the video quality of H.323 systems under adverse network conditions. A wide range of custom source formats will improve the adaptability of decoded video scenes to resizable PC displays and windows. This enhanced flexibility is particularly attractive for wireless terminals, because it will be seen by the user as improved overall quality.

7.4.3 Data

The data channel supports telematic applications such as electronic whiteboards, file transfer, still image transfer, database access, and audiographics conferencing. The various data protocols, such as those of the T.120 series protocol suite, are not part of the H.323 Recommendation.

7.4.4 System, Media Stream and QoS Control

System control for H.323 systems is defined in Rec. H.245, and media stream synchronization and packetization in H.225.0.

All communication between endpoints is controlled by the H.245 control protocol. Logical channels are opened for different media streams by using H.245 procedures, which are transferred between endpoints across an H.245 control channel. The logical channels may be either one-way or bi-directional, which enables different configurations in different transmission directions. For instance, video may be used one-way only, or the spatial resolution of the video may be SQCIF only in the other direction to increase temporal performance. This flexibility is important for utilizing the scarce bandwidth as effectively as possible in accordance with user needs.

H.245 is also needed to solve a problem faced in a mixed IP/circuit switched network (CSN) environment, i.e. how to send DTMF signaling intact through the IP network, which tends to cause excessive delays or tone discontinuities through packet drop-outs. The lengths and levels of the tones of DTMF signals are determined by the ITU-T Q.35 standard, and a DTMF receiver located in the LE of the PSTN discards a skewed or distorted DTMF signal. Furthermore, excessive propagation delays may induce the user to hang up or repeat the touch tone sequence, causing malfunction. Therefore, the VoIP Forum has decided to advocate inclusion of DTMF signals in the PDUs of the next version of H.245.
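The idea of carrying DTMF digits as signaling rather than as in-band audio can be illustrated with a small Python sketch: the digit is transported as a symbol, and a gateway regenerates a clean tone at the PSTN side. The function name `regenerate_tone`, the 8 kHz sample rate and the amplitude scaling are assumptions made for illustration; the sketch does not reproduce any H.245 message format.

```python
import math

# Standard DTMF keypad: each key is a pair of (row, column) frequencies in Hz.
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")
DTMF_FREQS = {
    key: (ROWS[r], COLS[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def regenerate_tone(digit, duration_ms, sample_rate=8000):
    """Synthesize a clean dual tone for a digit received as a signaling event.

    Because only the digit and its duration cross the IP network, packet
    loss cannot distort the tone that the PSTN receiver eventually hears.
    """
    low, high = DTMF_FREQS[digit]
    n_samples = sample_rate * duration_ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n_samples)
    ]
```

A gateway built along these lines would call, for example, `regenerate_tone("5", 50)` for each received digit event and feed the samples to the circuit switched side.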

Different terminal capabilities are included in H.245 by using ASN.1 notation. H.245 is common to H.323 and H.324 systems, which substantially facilitates interoperability. H.323 is intended for non-guaranteed QoS LANs. Therefore, the standard does not offer any QoS guarantees. The delay-intolerant media stream packets (audio and video) are sent over unreliable UDP, which means that lost packets are not retransmitted unless specifically desired. H.245 and RTP also allow the use of TCP within limits determined by the playback buffer size, however, at the expense of additional latency. Endpoints may use FEC-based error control at the upper layers to handle isolated bit errors. Under normal conditions, the audio and video schemes tolerate errors and packet drop-outs without excessive deterioration of quality.

H.225.0 enables synchronization of audio and video with the aid of RTP time stamps. Furthermore, it identifies the coding algorithm used, so that the receiving endpoint can decode the media streams correctly. The RTCP capabilities of H.225.0 serve QoS, session and rate control. The packets of different media types are separated by sending them to different transport addresses of the H.225.0 layer.
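How time stamps allow a receiver to line up audio and video can be sketched as follows. The sketch assumes that each stream provides one reference pair (an RTP time stamp and the sender wall-clock time it corresponds to, as carried in RTCP sender reports) and a known time stamp clock rate; the class name `StreamClock` and the example rates of 8000 Hz for audio and 90000 Hz for video are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class StreamClock:
    """Maps RTP time stamps of one media stream onto the sender's wall clock."""

    clock_rate: int       # time stamp ticks per second
    ref_rtp_ts: int       # RTP time stamp of the reference report
    ref_wallclock: float  # sender wall-clock time (seconds) of that time stamp

    def to_wallclock(self, rtp_ts: int) -> float:
        # RTP time stamps are 32-bit counters, so take a signed 32-bit
        # difference to stay correct across wrap-around.
        delta = (rtp_ts - self.ref_rtp_ts) & 0xFFFFFFFF
        if delta >= 0x80000000:
            delta -= 0x100000000
        return self.ref_wallclock + delta / self.clock_rate


def playout_skew(audio: StreamClock, audio_ts: int,
                 video: StreamClock, video_ts: int) -> float:
    """Seconds by which the video sample is later than the audio sample."""
    return video.to_wallclock(video_ts) - audio.to_wallclock(audio_ts)
```

A receiver could delay whichever stream is ahead by the returned skew before playout, which restores lip synchronization even though the two streams travel in separate packets.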

7.4.5 Wireless Interworking

The architecture for intercommunication between H.323, H.320, H.322, H.324, H.324/I and H.324/M terminals is depicted in Fig. 21. H.324/I is a forthcoming ISDN version of the H.324 standard. The H.323 gateway may support interworking with PSTN, narrowband and broadband ISDN, or a mobile network. In this context, the focus is on narrowband only.

The prime gateway functions consist of conversion functions to adapt H.323 endpoints to H.320 (ISDN), H.321 (B-ISDN), H.324 (PSTN), and possibly forthcoming H.324/M terminals. Furthermore, the gateway interconnects H.323 terminals over the Internet or WAN to H.323 terminals residing in another LAN. Typical conversion functions are:

• bit-stream framing and multiplexing into H.221 (H.320)
• terminal negotiation and system control signaling into H.242 (control protocol of the existing H.320 videoconferencing systems)
• call control signaling (E.164/Internet address)
• DTMF tone conversion
• audio conversion to H.320 terminals (G.723.1 to/from G.728)
• video coding conversion to H.324/M terminals (H.263 to/from H.263M)
• data protocol conversion

Intercommunication between H.324 and H.324/M terminals is ensured at level 0, which means that no error correction is present, i.e. QoS is determined by the raw BER of the mobile link, ranging from 10^-2 to 10^-3. In order to ensure a BER of the order of 10^-6, the H.324 terminal needs to support the transport layer error control mechanism of the H.324/M terminal as well (see Section 7.3.3).
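The practical difference between these BER levels can be illustrated by the probability that a frame arrives without any bit error. The sketch below assumes statistically independent bit errors, which is a simplification (errors on a mobile link tend to come in bursts), and the 1000-bit frame length is an arbitrary example value.

```python
def frame_success_probability(ber, frame_bits):
    """Probability that all bits of a frame arrive intact.

    Assumes independent bit errors, so the result is (1 - BER)^n.
    """
    return (1.0 - ber) ** frame_bits
```

With these assumptions, a 1000-bit frame survives a raw BER of 10^-3 only about 37% of the time, whereas at a BER of 10^-6 about 99.9% of such frames arrive intact.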

Figure 8 Illustration of interworking between H.323 and other H series terminals

7.4.6 H.323 Extensions

The H.323 standard currently addresses terminals interconnected to public packet switched networks over LANs. ITU-T SG16 is also studying the possibility of using H.323 terminals with a PPP-like protocol over a copper subscriber line of the PSTN. In other words, the idea of running low bit-rate multimedia applications over IP is being investigated. There seem to be no technical obstacles to doing so other than protocol latency and bandwidth constraints, which arise because H.323 was not originally intended for low bit-rate applications. If the problems can be solved, this would result in a complete Internet telephony and multimedia communication standard.

[Figure 8 labels: H.323 terminals; gatekeeper; H.323 Multipoint Control Unit (MCU); non-guaranteed QoS LAN; H.323 gateway/IP server; IP network (Intranet/Internet/WAN); guaranteed QoS LAN; H.322 terminal; ISDN; H.320 or H.324/I terminal; PSTN; H.324 terminal; MSC, BSC, BTS; H.324/M terminal; speech terminal; SS7; E1/T1]

7.5 IETF standards for RTP and RTCP (RFC 1889, RFC 1890)

RTP is the protocol used for H.323. It is an end-to-end protocol operating on top of UDP. RTP accomplishes two fundamental tasks in multimedia communications. Since UDP does not support synchronization of different media streams, RTP includes a synchronization mechanism, which time stamps the outgoing packets, enabling the receiver to restore the timing relationship between different media streams. The other main task of RTP is to identify the payload format, such as the audio or video coding. RTCP is an integral part of RTP; it provides QoS-oriented feedback and supports functionality similar to that of the frame-synchronous in-band signaling protocols for videoconferencing and videotelephony in the ISDN.
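Both tasks are visible in the 12-byte fixed RTP header, which carries a payload type, a sequence number and a time stamp. The following Python sketch decodes that fixed header from a received UDP payload; the names `RtpHeader` and `parse_rtp_header` are chosen for illustration, and header extensions and CSRC lists are deliberately left unparsed.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int     # identifies the audio or video coding in use
    sequence_number: int  # lets the receiver detect loss and reordering
    timestamp: int        # sampling instant, used to restore timing
    ssrc: int             # identifies the synchronization source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

A receiver would typically look up `payload_type` to choose the decoder, use `sequence_number` to count lost packets, and pass `timestamp` to its playout buffer.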

Applications can use RTCP to control their operation modes and to adjust session control functions, data rate and other parameter settings. RTP will be important in packet radio applications which contain real-time media such as voice and video. Since RTCP packets are sent at regular intervals, they also enable a third party, i.e. a service provider, to monitor network operation remotely.

At the beginning of an RTP session, the application (e.g. videoconferencing) defines the desired network address and destination port addresses for both RTP and RTCP. Audio and video are carried in separate RTP sessions, each having its own stream of RTCP packets, which monitor and control the received quality of the audio or video. The packets related to a given session report the reception quality to the transmitting entity. For instance, under adverse network conditions, the transmitter may reduce the bit rate to improve transmission quality, i.e. BER.
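A sender-side reaction to such reception reports might look like the sketch below. It assumes that the loss figure arrives as the 8-bit "fraction lost" value of an RTCP receiver report (lost packets divided by expected packets, scaled by 256); the thresholds, step sizes and rate limits are hypothetical tuning values, not taken from any standard.

```python
def adapt_bit_rate(current_bps, fraction_lost_byte,
                   min_bps=8_000, max_bps=128_000,
                   high_loss=0.05, low_loss=0.01):
    """Pick the next sending rate from the loss reported by the receiver.

    Heavy loss triggers a multiplicative back-off, negligible loss allows a
    cautious additive increase, and anything in between keeps the rate.
    """
    loss = fraction_lost_byte / 256.0
    if loss > high_loss:
        new_bps = current_bps * 0.75
    elif loss < low_loss:
        new_bps = current_bps + 2_000
    else:
        new_bps = current_bps
    # Clamp to the range the codec and the access link can support.
    return int(min(max(new_bps, min_bps), max_bps))
```

For example, a 64 kbps sender that receives a report with a fraction lost of 26 (about 10% loss) would drop to 48 kbps under these example settings.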

RTP supports dynamic channel allocation in wireless networks, and the data rate or video coding can be changed on the fly. In the wireless realm, RTP implementations need to be efficient in terms of low additional delay and overhead. Even though RTP is not a mobile-aware protocol, it does not include elements which would burden or retard the transmission. The aim is to keep the number of control packets below 5% of the payload, but in practice a much lower overhead is achievable. RTP and RTCP seem to fit well into wireless environments, since their functionality includes elements that are very useful for operation under adverse network conditions.
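The 5% target translates into a reporting interval that grows with the number of participants. The sketch below is a simplified version of that calculation: the average RTCP packet size of 100 bytes and the 5-second minimum interval are assumed example values, and the rule in the RTP specification itself is more elaborate (it also treats senders and receivers separately and randomizes the interval).

```python
def rtcp_report_interval(session_bw_bps, participants,
                         avg_rtcp_packet_bytes=100,
                         rtcp_fraction=0.05,
                         min_interval_s=5.0):
    """Seconds between RTCP reports of one participant (simplified).

    All participants share the control bandwidth, so the interval scales
    with the group size, and a floor avoids flooding small sessions.
    """
    rtcp_bw_bps = session_bw_bps * rtcp_fraction
    interval_s = participants * avg_rtcp_packet_bytes * 8 / rtcp_bw_bps
    return max(interval_s, min_interval_s)
```

In a two-party 64 kbps call the floor applies and each side reports every 5 seconds, so the control traffic stays far below the 5% budget, which is consistent with the remark that a much lower overhead is achievable in practice.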

7.6 Future Directions of Wireless Multimedia Standardization

7.6.1 Potential H.323 Extensions

ITU-T SG16 and the ETSI TIPHON project are currently investigating the possibility of using H.323 terminals across a PSTN subscriber line in a point-to-point configuration. A new technology is needed to transfer the data between households and ITSPs (Internet Telephony Service Providers), because a V.34 modem connection and the PPP protocol incur far too high latency. As the number of PCs in households is increasing rapidly, there is an apparent need to interconnect them. Low-end routers are already available, and the need for cheap packet switched access is clearly indicated. The new access technology may also emerge as an ADSL or ISDN derivative. According to current knowledge, residential H.323 access is not likely to be available in the short or medium term. Fig. 22 illustrates the block diagram of a suggested H.323/M terminal.

An IP based solution would have some significant advantages over H.324 systems. The market outlook for H.323 is by far the strongest in business communications. If a residential version of H.323 were available, the market would be huge in the long term, which would enable economies of scale. An H.323 terminal would put the versatility of IP functionality at the residential user's disposal, which would enable access across the Internet to both conversational and non-conversational Web applications.

Figure 9 Suggested block diagram of a H.323/M terminal

[Figure 9 labels: Scope of a potential Recommendation H.323/M; video I/O devices; audio I/O devices; data equipment/applications; system control user interface; video codec (H.261, H.263); audio codec (G.711, G.722, G.723.1, G.728, G.729); receive path delay compensation; T.120 data interface; system control (H.245 control, H.225.0 call control, H.225.0 RAS control; RAS = registration, admission and status); H.225.0M layer; air interface multilink reservation protocol]

In a similar fashion, we would like to investigate the possibility of a wireless extension of H.323. We first need to ask whether there is a need for such a standard. Given that H.323 is foreseen to gain a dominant market position in 3-5 years, the answer is yes. There are many business applications, in particular in the medical and utilities fields, which would greatly benefit from a wireless H.323 standard. Furthermore, interoperability with other H.323 terminals would be very simple, with low additional cost overhead. The second question is whether the standard is feasible. The envisioned wireless packet radio is unfit to act as a vehicle for such real-time services [Häm 96]. Therefore, the only possibility is a circuit-switched connection based on multiple link reservation, such as GSM HSCSD. A different error control scheme, which is being developed for the forthcoming H.324 mobile extension, is needed. Correct operation requires protection for IP headers in H.225.0M at the link layer. The routing of IMT 2000 and Teledesic is expected to offer 2-way data transfer capabilities up to ISDN 2B rates. Fig. 23 presents a potential protocol architecture for H.323/M [Ban 97].

Application layer: Audio apps | Video apps | System control | Data apps
Presentation layer: G.723.1, G.728, G.711, G.729, G.722 | H.263, H.261 | RTCP, H.225.0 call signaling channel, H.245 control channel | T.120
Session layer: RTP, H.225.0
Transport layer: Unreliable transport (UDP) | Reliable transport (TCP)
Network layer: IP
Link layer: Error control; bandwidth-on-demand, e.g. multi-link reservation protocol (regional standard)
Physical layer: Air interface (regional standard)

Figure 10 A potential protocol architecture for H.323/M

7.6.2 IETF RSVP Standard

The IETF RSVP standard enables reservation of adequate bandwidth for real-time applications under changing network traffic conditions. During an RSVP session, flow-specific state information is stored in the routers and endpoint hosts [Est 96]. The RSVP protocol enables the network to share the link resources in a controlled manner between multiple RSVP entities. RSVP is under intense development. In particular, admission control and policing issues need much further study. The behavior of non-RSVP users under heavily congested network conditions is unclear, as are the implications for overall network operation. The situation may lead to an explosion of RSVP reservation requests, which in essence will have policing implications. The use of RSVP in packet radio environments does not seem likely in the short and medium term. The current state of the art suggests that, except for voice-over-the-Web, there are no real-time applications that can feasibly be carried over packet radio networks.
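The per-flow state and the admission decision mentioned above can be pictured with a toy model of a single link. This is only an illustration of the bookkeeping a router might do; it does not model RSVP messages, and the class name `LinkReservations` as well as the 75% reservable share are hypothetical choices.

```python
class LinkReservations:
    """Per-link bookkeeping of flow reservations (illustrative only)."""

    def __init__(self, capacity_bps, reservable_fraction=0.75):
        # Part of the link stays available for best-effort traffic, so that
        # users without reservations are not starved completely.
        self.limit_bps = capacity_bps * reservable_fraction
        self.flows = {}  # flow identifier -> reserved bit rate

    def reserved_bps(self):
        return sum(self.flows.values())

    def request(self, flow_id, bps):
        """Admit or refuse a reservation; a repeated request replaces the old one."""
        others = self.reserved_bps() - self.flows.get(flow_id, 0)
        if others + bps > self.limit_bps:
            return False
        self.flows[flow_id] = bps
        return True

    def release(self, flow_id):
        self.flows.pop(flow_id, None)
```

On a 128 kbps link with these settings, a 16 kbps audio flow and a 64 kbps video flow are admitted, while a second 64 kbps video flow is refused until the first one is released.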

If circuit switched wireless multiple links are used as access paths to wireline TCP/IP networks, the RSVP operation is transparent from the mobile network's point of view. This issue is beyond the scope of this investigation and should be addressed in conjunction with other IETF RSVP studies.

7.6.3 MPEG4 Standard

ISO JTC1 SC29 WG11 is working on the next generation audio and video coding scheme, MPEG4. The success of MPEG1 and MPEG2 has encouraged ISO to launch this highly ambitious initiative. MPEG4 will actually be more than an audio and video coding standard; it is a complete multimedia system standard. This implies that a sophisticated syntax and system control will be included in the standard, which enable a range of new functions such as scalability, content-based bit stream editing and manipulation, and combining synthetic and natural coding by incorporating an advanced set of development tools [Rea 96]. For instance, MPEG4 would make possible such exciting properties as increased spatial resolution of the video on a selected picture area (a real zooming feature). The current state of the work suggests that the MPEG4 standard will not be available before the end of 1999. As we have noted before, MPEG4 is not likely to be compatible with H.263L. The MPEG4 functionality is particularly interesting from the point of view of low bit-rate multimedia communications. MPEG4 is expected to offer better quality for both speech and video. Furthermore, it also supports audio, i.e. music, and coding of tonal information other than speech.

Provided that MPEG4 is a success story, it will become a very strong candidate for next generation Internet telephony and multimedia systems. The incompatibility with H.263L is the biggest threat to this scenario, because interoperability of a potentially large base of forthcoming H.323 systems with MPEG4 systems may require enormous investments in gateways that would not be needed with terminals conforming to H.263L.

7.6.4 Global Initiatives for Future Wireless Standards

In the mobile network context, standardization of IMT 2000 is underway in ITU-R TG8/1. Its scope of work makes up only a small fraction of the enormous effort needed. The Japanese are now pushing ITU-T hard to put more effort into IMT 2000. In the USA, the deployment of 2nd generation digital networks has begun fairly recently. Furthermore, in the USA, much of the lower frequency block of IMT 2000 has been allocated to the rapidly expanding PCS1900. Therefore, interest in IMT 2000 among the big US players is low. In the national context, the standards efforts are likely to concentrate on further development of IS-54 and IS-95. In Europe, the evolution is seen as a continuous process from GSM [Rap 95], [Ber 96], [Cox 96a], [Ram 96], [Udd 95]. Phase 2+ of GSM includes the HSCSD and GPRS standards, which are almost finished.

The next phase may include a wideband asymmetric air interface, possibly based on CDMA technologies. This air interface can be seen as a terrestrial counterpart of and contender for the Teledesic high speed 2 Mbps satellite links. Unlike Teledesic, the high speed data services of the terrestrial systems are planned to be provided only in wireless environments with limited coverage [Ber 96]. Therefore, the two systems address partly different markets.