Media processing in the cloud: what, where and how

The communications technology journal since 1924 | 2013 • 5 | April 11, 2013

Page 2: Media processing in the cloud-  what, where and how

The evolution to IP technology, VoLTE and new video services will have a profound impact on the way person-to-person media processing will be performed in the networks of the future. This evolution raises some questions: what processing will be needed, where will it take place and how will it be implemented?

There's a strong argument for regarding telephony as one of the first cloud-based services. Since the invention of the telephone, the industry has evolved significantly and operators have developed a flexible range of services for subscribers provided on a pay-as-you-use basis. Smartphones have brought an enriched experience to users and theoretically they, along with other advanced terminals, could perform much of the media processing traditionally taken care of by networks. However, the constraints posed by bandwidth and battery life, along with the desire to provide new services independent of terminal type, tend to indicate that most media-processing services will remain in the network.

Initially, the services provided by the telephone network were carried out by switchboard operators. Gradually, as computing resources were introduced, control logic processing and media handling became entirely automatic, leading to today's models where cloud-based services are provisioned over a network using shared pools of computing resources, and where users pay for what they consume.

Phones were initially simple devices, consisting of a microphone and a loudspeaker. When routing of calls became automatic, a rotary dial was added. Today, more than one billion smartphones around the world provide a computing platform that is capable of running millions of applications and of providing extensive media processing.

Two of the questions addressed in this article are: what media processing will take place in the communication services of the future, and where will this media processing be provided – will it be handled in a cloud-like manner or will it be pushed out to terminals?

The deployment of generic industry hardware that is capable of running many kinds of applications in a flexible manner is a growing trend within the ICT industry. It follows then that generic computers offering cloud services will also be used to implement future telecommunication networks in operator cloud centers.

The third and final question addressed in this article is: how will media be processed in evolved telecommunications networks – how much generic hardware will be used, and will DSPs on dedicated platforms continue to be the preferred approach?

Bearing in mind that the cloud is not just about technology, this article also describes how cloud principles can be applied to the various business models for communication services.

JOHAN LUNDSTRÖM

BOX A  Terms and abbreviations

AMR – Adaptive Multi-Rate
AMR-WB – AMR wideband
AS – application server
ATM – Asynchronous Transfer Mode
BGF – border gateway function
BSC – Base Station Controller
CAGR – Compound Annual Growth Rate
DSP – digital signal processor
EFR – Enhanced Full Rate
IETF – Internet Engineering Task Force
IMS – IP Multimedia Subsystem
MGC – Media Gateway Controller
MGW – Media Gateway
MSC – mobile switching center
MSC-S – MSC server
M-MGW – Mobile Media Gateway
MMTel AS – multimedia telephony application server
MRF – Media Resource Function
MRS – media resource system
MSS – mobile softswitch
O&M – operations and maintenance
OSS – operations support systems
PCM – pulse-code modulation
PLMN – public land mobile network
PSTN – public switched telephone network
RNC – radio network controller
SBG – Session Border Gateway
SGC – Session Gateway Controller
SGW – Signaling Gateway
SIP – Session Initiation Protocol
TDM – time division multiplexing
TrFO – transcoder free operation
VLR – visitor location register
VoLTE – voice over LTE

Ericsson Review • April 11, 2013

Voice and video in the cloud


Processing and network evolution

The digitalization of voice was one of the first steps in network evolution and electronic media processing. The shift to digital led to lower distortion levels and reduced attenuation of the voice signal, improving its quality.

Digitalization led the way in the development of new approaches for improving voice quality, such as echo cancelling and noise reduction. Without the digitalization of voice, and the development of efficient voice codecs that save bandwidth, such as Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR), mobile telephony would not be the reality it is today.
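Echo cancelling of this kind is typically done with an adaptive filter that models the echo path and subtracts its estimate from the microphone signal. The sketch below is a minimal least-mean-squares (LMS) canceller, purely illustrative and not the algorithm of any particular network product; real cancellers add normalization, double-talk detection and much longer filters.

```python
import math

def lms_echo_canceller(far_end, mic, taps=8, mu=0.05):
    """Subtract an adaptively estimated echo of the far-end signal
    from the microphone signal (classic least-mean-squares update)."""
    w = [0.0] * taps                         # FIR filter weights
    residual = []
    for n in range(len(mic)):
        # The most recent far-end samples feeding the echo path
        x = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_estimate           # what remains after cancellation
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
        residual.append(e)
    return residual

# Toy check: a microphone signal that is a pure echo should cancel out
far = [math.sin(0.3 * n) for n in range(2000)]
residual = lms_echo_canceller(far, [0.5 * s for s in far])
```

As the filter converges, the residual approaches the near-end speech alone, which is what the network forwards to the far end.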

Pulse-code modulation (PCM) is still the most common method of digitally representing analog voice signals over the PSTN and among PLMNs. As networks and devices use and support different codecs and protocols, mobile telephony networks usually need to convert voice – by transcoding – from one format to another.
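Telephony PCM (the G.711 codec) compands the signal logarithmically before quantization, so quiet passages keep more relative precision than loud ones. A continuous-domain sketch of the μ-law companding curve, ignoring the 8-bit quantization step of a real codec:

```python
import math

MU = 255  # companding constant used by G.711 mu-law

def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Invert the companding to recover the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# A quiet sample occupies a far larger share of the companded range
# than its linear value would suggest, preserving low-level detail.
quiet = mulaw_compress(0.01)   # ~0.23
loud = mulaw_compress(0.5)     # ~0.88
```

Transcoding between codecs that use different companding or compression schemes means decoding to linear samples and re-encoding, which is exactly the work an MGW performs.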

Further improvements to voice quality are taking place through the application of new codecs, such as AMR-WB, which supports HD voice, combined with mechanisms such as transcoder free operation (TrFO), based on codec negotiation between the end points involved in a call [1, 2].

Tones, such as dial and busy tones, and announcements, such as faulty service indications, are examples of general network-generated services that users have grown accustomed to over the years. Other services such as conferences, where voice streams from multiple sources are combined, are also network-generated and exemplify the trend towards advanced voice services.
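A network-generated tone is simply a short block of PCM samples pushed onto the bearer. As an illustrative sketch (the 350 Hz + 440 Hz pair is the North American precise dial tone; other regions use different frequencies and cadences), a media server could synthesize one like this:

```python
import math

SAMPLE_RATE = 8000  # Hz, narrowband telephony sampling rate

def make_tone(freqs, duration_s, rate=SAMPLE_RATE):
    """Mix equal-amplitude sine waves into one block of samples."""
    n_samples = int(duration_s * rate)
    scale = 1.0 / len(freqs)  # keep the mix within [-1, 1]
    return [scale * sum(math.sin(2 * math.pi * f * n / rate) for f in freqs)
            for n in range(n_samples)]

# One second of continuous dial tone, ready to be encoded and framed
dial_tone = make_tone([350.0, 440.0], duration_s=1.0)
```

Busy tones and ringback work the same way, with an on/off cadence applied on top of the mixed sinusoids.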

Circuit-switched networks still handle most of today's voice traffic. The architecture of these networks tends to be based on softswitches consisting of Media Gateways (MGWs) and Media Gateway Controllers (MGCs). For mobile softswitches (MSSs), the MGC is integrated in the mobile switching center server (MSC-S). For the most part, echo cancelling, transcoding, and sending of tones and announcements are carried out by MGWs. These gateways also interwork with the PSTN for circuit-switched data and fax, they handle multi-party calls, and they reframe media samples on the borders between 3GPP and IETF networks. In addition to performing media processing, the MGWs also act as a bridge between different bearer technologies, such as between TDM and IP.

As networks evolve, and people's use of them progresses, voice will be handled by the IMS, and communication with video will become a mainstream activity for enterprises and consumers. Media handling in this environment is performed primarily in a logical node called the Media Resource Function (MRF), which uses SIP to communicate with the rest of the network. The MRF provides services such as tones, announcements and conferences, and will support new services developed in response to subscriber demand.

In an all-IP environment, such as IMS, operators no longer have end-to-end control over networks, resulting in greater emphasis on security. For SIP signaling and related media, it is the responsibility of Session Border Gateways (SBGs) to handle security. These SBGs can be implemented as stand-alone boxes, or integrated into other network elements in a layered architecture, which reduces capex and opex. These gateways may also provide limited media-processing capabilities, such as transcoding.

Further development in media processing will be needed to meet the exponential growth in person-to-person video communication.

Consider the media-processing requirements for videoconferencing. Most videoconference services show participants using two primary display modes: voice activated and continuous presence. In voice-activated mode, the stream from the active speaker dominates the available display area, while other participants are shown in smaller windows, or not at all. In continuous-presence mode, all participants are displayed simultaneously. To deliver a videoconference, the network has two choices: it can either collect all video streams from participating users and send all streams to all users; or it can mix the video streams into one preferred format before sending the single, combined stream to participating users. In the all-streams-to-all-users approach, media processing is performed by the participating terminals, whereas the mixing approach relieves the

FIGURE 1  Ericsson media-resource-system architecture – common resources (DSP devices and ATM, TDM and IP ports) that are pooled and dynamically shared by the MGW, BGF, MRF and SGW applications under common resource handling; control elements (MSC-S/MGC, SGC, MMTel AS) and OSS; and a common O&M implementation and interface with a one-node view.


terminal of the need to perform any media processing. The combined approach can save a significant amount of bandwidth in the access network. Yet another way to save bandwidth is to send only the video stream associated with the active speaker to the participants' terminals.

Videoconferencing is just one example of a video-based application. Many new services that will typically be delivered by the cloud, such as recording, storage, announcements and mailboxes, will be implemented later on. Advanced voice and video services may include real-time speech recognition; speech-to-text conversion; automatic language translation; speech-controlled supplementary services; embedded banner advertising; speaker identification; and real-time generation and translation of subtitles in video calls.

The cloud versus the terminal

To ensure good media quality and efficient use of the access network, terminals need to be able to encode and decode digital media. In theory, terminals could provide more or less all the media-processing power needed to deliver services offered by the network. To do this, terminals would, for example, need to:

- support all codecs – so that all potential peers can use the codec best suited to their architecture;
- generate tones and announcements based on error codes received from the network; and
- act as a conference bridge, or support multiple ways of acting as a video client – to ensure interoperability with all potential peers.

But is this approach cost efficient? And is it good for users? The success of a new communication service lies in its rapid adoption by a critical mass of users. New services therefore need to be as terminal-independent as possible, reach as many users as possible and be interoperable from day one.

To maintain interoperability and avoid fragmentation of some types of services, such as video communication, performing media processing in the network is key. Using standardized interfaces between networks helps to ensure interoperability among operators and secures optimal performance and quality. In addition, codec negotiation (including interworking between control protocols), transcoding, reframing and video-mixing services can be used in networks to support interoperability.

As illustrated by the videoconference example, handling media processing in the network, rather than the terminal, can save bandwidth. This expensive resource can also be used more economically if the network is allowed to provide all transcoding processing, leaving terminals free to use the codec that is best suited to their specific architecture.

Terminals that use less bandwidth often require less power. And so, by handing over bandwidth-hungry services – such as voice and video mixing – to the network, power consumption in the terminal can be reduced, extending the recharging interval and improving battery life.

Algorithms for voice and video processing tend to be patented, and terminal manufacturers have to pay royalties to use them. Performing transcoding in the network through pooled instances reduces the number of algorithms needed for terminal media processing, resulting in lower usage fees and reduced overall cost to subscribers.

When all the factors are brought together, it seems the current approach to media processing – performing it in the network – remains the most efficient. As it is likely that the network will continue to be the most practical alternative in the future, it stands to reason that media processing will also remain a cloud-based service.

Cost-driven platform evolution

Requirements for reliability, energy efficiency, redundancy and a low carbon footprint have led to the use of dedicated hardware platforms to build telecommunication network elements – until now. In an operator cloud, a competitive hardware platform not only needs to meet all of these requirements but should be generic enough to support multiple applications and flexible enough to accommodate fluctuating traffic patterns and changing application capacity needs.

To efficiently provide communication services in a network, two different platform types are needed: one for control, such as the MSC-S, and one for media-processing applications, such as the MGW or the MRF.

Today, control applications tend to be built on dedicated, carrier-grade platforms with generic processor architectures, such as x86. Some of these platforms can already run multiple telecom applications and provide many of the benefits offered by operator cloud centers. It is likely that these platforms will develop into telecom cloud centers supporting virtualized software and applications – allowing operators to further reduce their capex and opex investments.

The requirements placed on media-processing platforms are, however, significantly different from those for processing control applications. This is because the amount of processing needed for media is much greater and the requirements for real-time processing and latency are more stringent. In addition to supporting multiple services and adapting to changing traffic profiles automatically, media-resource platforms will need to support TDM interfaces for some time to maintain interaction with legacy systems.

General-purpose processors, such as the x86, have become more cost efficient for handling media; however, their performance compared with DSPs varies significantly depending on the media being processed. A DSP, for example, offers superior performance for voice processing, such as transcoding. But when it comes to certain types of video processing, the performance of a DSP is not significantly better.

It is hard to predict whether the cost-to-performance ratio for DSPs and general-purpose processors will change as new chips are introduced to the market and the types of media-processing services evolve. For the moment, DSPs provide the best performance in comparison to overall cost for services requiring both high channel capacity and density, such as voice in circuit-switched networks.

In the long term, as the need to interface with TDM systems disappears and the volume of voice transcoding consequently shrinks, using generic processors and operator cloud centers for media processing will become a more competitive option.


Sharing resources reduces cost

The concept underlying Ericsson's media-processing platform is based on providing processing capabilities in the network. Such a platform – a media resource system (MRS) – uses DSP resources in a dynamic way, is capable of allocating resources to the different media-processing functions automatically, and can pool user requests among the various DSPs.
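The pooling idea can be reduced to a simple allocator: every application draws channels from one shared set of DSP devices instead of owning dedicated hardware. The sketch below is purely illustrative; the class, the capacity figures and the per-application splits are invented for the example and are not taken from the MRS itself.

```python
class DspPool:
    """Toy allocator: media applications share one pool of DSP channels."""

    def __init__(self, total_channels: int):
        self.free = total_channels
        self.in_use: dict[str, int] = {}   # application -> channels held

    def allocate(self, app: str, channels: int) -> bool:
        """Grant channels if available; the caller must handle refusal."""
        if channels > self.free:
            return False
        self.free -= channels
        self.in_use[app] = self.in_use.get(app, 0) + channels
        return True

    def release(self, app: str, channels: int) -> None:
        """Return channels to the pool, never more than the app holds."""
        held = self.in_use.get(app, 0)
        channels = min(channels, held)
        self.in_use[app] = held - channels
        self.free += channels

pool = DspPool(total_channels=1000)
pool.allocate("MGW", 600)              # circuit-switched voice transcoding
pool.allocate("MRF", 300)              # IMS tones, announcements, conferences
refused = not pool.allocate("BGF", 200)  # pool exhausted: request refused
```

The benefit over static partitioning is that a traffic shift from circuit-switched voice to IMS simply changes who holds the channels, with no hardware reconfiguration.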

The MRS concept provides both media-gateway and signaling-gateway functionality for MSS networks. It contains an MRF for media processing in IMS networks and provides session border functionality for MSS and IMS networks. The session border functionality uses a layered architecture, under which a border gateway function (BGF) in the MRS handles the media plane, while a Session Gateway Controller (SGC) handles the control plane. Figure 1 shows the high-level distributed and integrated architecture of this system.

Networks with Ericsson Mobile MGW (M-MGW) nodes installed can be upgraded to an MRS with support for future media-processing features, as the M-MGW/MRS can be part of both an MSS and an IMS environment. This type of upgrade simply involves a software update.

The MRS can be considered a media cloud platform, as it supports multiple media-processing applications and can dynamically share the available computing resources, as well as external interfaces, among them. Plans to develop the system include the addition of open interfaces that allow specialized external products to provide functionality via the common MRF.

Network scenarios

As illustrated by the example in Figure 2, fixed and mobile network architectures have traditionally been distributed and hierarchical. In such networks, the node closest to the subscriber takes care of voice coding or transcoding to PCM when a call enters the network.

Today's mobile switching solutions allow the control logic – the MSC server nodes – to be centralized to just a few sites, even in fairly large networks. Media, meanwhile, is handled locally to save bandwidth and minimize latency. To ensure hardware resources are used efficiently and a high level of resilience is maintained, MSC-S nodes are often pooled. IP-based bearers used on the interface to the radio network also allow pooling of MGWs, offering similar benefits in terms of efficient resource usage and resilience. Figure 3 shows a simple network where both the media gateways and servers are pooled.

FIGURE 2  Traditional network architecture – coding and decoding at the network edges (BSC and local exchange), with transcoding between the MSC/VLR, the transit exchange and the local exchange.

FIGURE 3  Structure of a modern mobile voice network – BSCs and RNCs connect over IP to pooled media resources (MGWs) and pooled media-control and call-routing resources (MSC-S), which interface with the PLMN, PSTN and IMS.

The introduction of VoLTE and IMS has naturally led to a new network structure, especially in the media plane. The first task that the network needs to take care of is security, and so an SBG makes sure that it is safe to establish a session. Media processing may then be needed in the set-up phase to, for example, produce tones and announcements; such services can be provided by temporarily linking in an MRF. During the call-establishment phase, the control layer determines whether transcoding and reframing are needed. If so, an MRF is linked in, or alternatively a BGF may be able to handle transcoding. Certain services, such as conferencing, may also require additional media processing.

As end-to-end codec negotiation will be more common in IMS networks than it is in circuit-switched networks, the need for media processing will diminish as networks evolve. However, new and advanced processing services will be introduced to handle special cases.
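The control-layer decision reduces to comparing the codec lists the two endpoints offer: if they share a codec, the call can run transcoder-free; if not, an MRF or BGF must be linked in. A schematic sketch of that logic (the codec names and the preference ordering are illustrative only, not a rendering of the SDP offer/answer procedure itself):

```python
def negotiate(caller_offer, callee_offer):
    """Pick the first codec both ends support, in the caller's preference
    order; otherwise report that the network must transcode."""
    for codec in caller_offer:
        if codec in callee_offer:
            return codec, False              # transcoder-free operation
    # No common codec: an MRF or BGF converts between the two preferences
    return (caller_offer[0], callee_offer[0]), True

# Both ends speak AMR-WB: HD voice end to end, no network media processing
hd_case = negotiate(["AMR-WB", "AMR"], ["AMR-WB", "PCMA"])
# Disjoint codec lists: transcoding must be linked in
mismatch = negotiate(["AMR"], ["PCMU"])
```

As terminals converge on common codecs such as AMR-WB, the second branch is taken less and less often, which is why the demand for transcoding shrinks as networks evolve.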

The best network architecture, illustrated in Figure 4, is one in which distributed SBGs or BGFs optimize latency and ensure bandwidth efficiency, while advanced services that are used less often are centralized.

The flexible nature of the MRS supports all network architectures. It is a scalable solution that can be used at the edge of a network or in a centralized way. In cases where an operator wants to avoid overprovisioning to cater for occasional traffic peaks, MRS nodes can be pooled to balance the load throughout the network. This can be achieved even if the nodes are in different geographic locations.
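One simple way to realize such pooling is least-loaded selection across sites: each new request goes to the node with the most spare capacity, regardless of geography. The routine below is a sketch under that assumption; the node names and capacity figures are made up for the example.

```python
def pick_node(nodes):
    """Route a new request to the MRS node with the most spare capacity.

    nodes maps node name -> (active_sessions, capacity)."""
    def headroom(item):
        _, (active, capacity) = item
        return capacity - active
    name, _ = max(nodes.items(), key=headroom)
    return name

pool = {
    "stockholm-mrs": (900, 1000),  # 100 sessions of headroom
    "london-mrs":    (400, 800),   # 400 sessions of headroom
    "madrid-mrs":    (700, 750),   # 50 sessions of headroom
}
target = pick_node(pool)           # "london-mrs" has the most headroom
```

A production scheme would also weigh latency to the chosen site against the bandwidth saved, but the principle of one shared capacity figure per node is the same.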

Changing business models

A significant aspect of cloud computing is the business model. The cloud approach enables enterprises to buy IT services instead of investing in infrastructure. Telecommunication operators provide communication services, such as voice, to consumers and enterprises in much the same way. And it is likely that additional products will be cloud-based [3].

Vendors can provide wholesale cloud services to operators who, in turn, break them up into smaller retail offerings for enterprises and consumers. Ericsson's Device Connection Platform, for example, supports machine-to-machine communication as a cloud service for operators that offer retail cloud services. Other services, such as low-volume media processing, may be provided to operators as cloud services in the future. The sharing of network elements among several operators enables vendors to obtain better economies of scale than individual operators can for certain services.

The what, the where and the how: the answers

Even though terminals are fast becoming advanced computers capable of performing sophisticated media processing, this function is likely to remain a network-based service for reasons of efficiency. Telecommunication platforms are developing into multi-application systems that support both local and geographic spreading of resource pools.

Cloud platforms based on generic processors are likely to be introduced in the control plane first. Whether these platforms will be used for media processing, and when, will depend on: the need for legacy interfaces; the evolution of the cost-to-performance ratio for DSPs; the type of media-processing services that will be required in the future; and the volume of these services.

One of the important aspects of cloud computing is the business model. The market is already showing evidence of increased flexibility when it comes to who will provide communication services. In the future, enterprises will be able to rely on operators to provide communication services instead of buying their own equipment. Operators will, in turn, be able to rely on vendors to provide cloud services, creating an efficient value chain in which each player pays for services based on usage.

FIGURE 4  Architecture of an all-IP and IMS network – security and transcoding on the network edges, with BGFs between the external networks, the Evolved Packet Core and the IP transport network; centralized and pooled media resources in the MRF; and the SGC and AS in the IMS control plane.


Johan Lundström

is a strategy manager for mobile softswitch and media processing solutions within product area Core and IMS at Business Unit Networks. He joined Ericsson in 1991 and since then has worked primarily with mobile core networks. He has held various positions in both R&D and product management, including line management. He holds an M.Sc. in telecommunications and software science from the Helsinki University of Technology, Finland.

References

1. Ericsson Review, 2010, Evolution of the voice interconnect, available at: http://www.ericsson.com/res/thecompany/docs/publications/ericsson_review/2010/evolution_voice_interconnect.pdf

2. Ericsson White Paper, 2011, HD voice – it speaks for itself, available at: http://www.ericsson.com/res/docs/whitepapers/WP-HD-voice.pdf

3. Ericsson White Paper, 2011, Visual communication – why operators should address the enterprise market, available at: http://www.ericsson.com/res/docs/whitepapers/wp-visual-communication.pdf

Acknowledgements

The author gratefully acknowledges the colleagues who have contributed to this article: Patrik Roséen, Mats Alendal, Joakim Haldin, Markku Korpi, Peter Jungner, András Vajda, Kari-Pekka Perttula and Jörg Ewert.


Telefonaktiebolaget LM Ericsson
SE-164 83 Stockholm, Sweden
Phone: +46 10 719 0000
Fax: +46 8 522 915 99

284 23-3186 Uen | ISSN 0014-0171

© Ericsson AB 2013