HD Voice Codecs
7/27/2019 HD Voice Codecs
1/21
http://www.voipsupply.com/hd-voice-codecs
HD voice codecs
What is a codec?
The word codec comes from mashing together the functions of
compressing (co) and decompressing (dec) analog sound into digital bits
for use by computers and networks. There are literally hundreds of audio
codecs -- pieces of computer code -- available today and embedded in
any device that plays sound, from a simple MP3 player to the hottest
smart phones. Some are open source and free while others are
proprietary and/or patented, requiring licensing fees.
Why are there so many different codecs? Over the years, people have
created and optimized codecs for the specific environments they were
going to be used in, so the cellular community built codecs that
optimized the use of radio frequency (RF) bandwidth while others
wanted adaptable bit-rate codecs suitable for a wired broadband
environment that would adjust sound quality depending on how much
bandwidth was available -- compress a little if there's a lot of bandwidth,
crunch harder if there's less.
More recently, developers have been leveraging more efficient computer
processors to develop better codecs. The tradeoff for using more CPU
cycles is, of course, more power required to run them -- not an issue at a
desktop, but definitely a concern for mobile devices.
A number of codecs are ITU (International Telecommunications Union)
standards, formalized for international use and incorporation into
devices. If a codec name starts with a G and a period, such as G.711 or
G.722, it's an ITU standard.
Popular HD voice codecs - G.722, AMR-WB, SILK, iSAC
You can't talk about HD voice codecs without first talking about baseline
analog and digital voice quality. Established way back in 1972, G.711 is
the standard for stock VoIP voice quality and equal to what you get out
of a POTS analog phone call. It captures speech in a range of 3.4 kHz, has
a sampling rate of 8 kHz, and needs 64 kbit/s of bandwidth to deliver a
call.
G.722 is Old School when it comes to HD voice, formalized back in 1988.
It captures sound in a range of 7 kHz and samples audio at a rate of 16
kHz -- double that of G.711. The result is superior quality and clarity far
above a POTS analog phone call. Taking advantage of CPU processing
speeds, G.722 can deliver double the quality of a G.711 phone session in
the same amount of bandwidth -- 64 kbit/s.
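The sampling figures quoted for G.711 and G.722 can be sanity-checked with a little arithmetic. The sketch below assumes the standard sampling rule that a sampler can only represent frequencies up to half its sampling rate, and that G.711 stores 8 bits per sample; the function names are made up for illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sampler can represent: half the sampling rate."""
    return sample_rate_hz / 2


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a stream that stores a fixed number of bits per sample."""
    return sample_rate_hz * bits_per_sample


g711_limit = nyquist_limit_hz(8000)    # 4000.0 Hz ceiling, of which about 3.4 kHz is used
g722_limit = nyquist_limit_hz(16000)   # 8000.0 Hz ceiling, of which about 7 kHz is used
g711_rate = pcm_bit_rate(8000, 8)      # 64000 bit/s, i.e. 64 kbit/s
```

Doubling the sampling rate doubles the ceiling on the audio range, which is why the 16 kHz codec can carry roughly twice the range of the 8 kHz one.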
You'll find G.722 built into pretty much every desktop VoIP handset built today
(2010), regardless of manufacturer or model of phone -- yes, even the
modest-looking $129 list price entry models support G.722. Patents on
G.722 have expired so there are no licensing fees and the processing
requirements are minimal on today's chips. At least one software shop
(D2 Technologies) has implemented G.722 for the Android mobile
operating system. Handset manufacturers who support G.722 include
Aastra, ADTRAN, Allworx, AudioCodes, Avaya, Cisco, Panasonic, Polycom,
Siemens and Snom.
Coming strong out of Europe and the mobile community is AMR-WB, also
known as G.722.2. Mobile operators wanted better sound quality
delivered in less bandwidth, so AMR-WB should deliver G.722-level
quality at around 24 kbit/s. France Telecom and Ericsson have been
leaders in promoting AMR-WB for mobile HD voice -- in part, because
they hold some of the patents in the standard -- and they would like to
see AMR-WB appear in desktop phones and software clients so users can
make end-to-end calls in AMR-WB, rather than having to translate
(transcode) between G.722 and AMR-WB. You'll see more AMR-WB buzz
for desktop handsets later in 2010 and into 2011.
SILK is Skype's "super wideband" voice codec. Optimized for real-time
communications on the Internet, SILK is an adaptive bit-rate codec that
supports multiple sampling rates ranging from 8 kHz narrowband to 24
kHz or more. If you have the CPU cycles and bandwidth of 40 kbit/s, SILK
gives you the best performance possible. On a lower-powered machine
and/or with less available bandwidth, SILK drops down and adjusts to the
conditions involved. Unlike AMR-WB, SILK is available royalty-free. A few
manufacturers, including AudioCodes, have discussed incorporating SILK
into their products.
Finally, Global IP Solution (GIPS) offers a proprietary wideband speech
codec that has been incorporated into a large number of soft clients and
applications, including AIM, Citrix Online, CommuniGate, Gizmo5, Google
Talk, IBM Lotus, NimBuzz, QQ, WebEx, and Yahoo!
The problem with too many different codecs
In order to have a successful HD voice call, both parties (or nearly all in a
conference) need to use the same codec. If both sides are using different
HD codecs, either one side has to be transcoded -- translated -- into the
same codec type or both sides have to shift to a mutually agreeable
codec.
Transcoding already takes place in the VoIP world on a daily basis, with
calls being compressed before being sent out long distance and translations
taking place between the POTS network and VoIP transport. The issues
with transcoding between HD codecs are that it takes more horsepower
(processing cycles) than with vanilla VoIP/POTS networks and nobody is
willing to say the end translation product is as good as a "pure" end-to-
end HD voice call using a single codec.
If both sides can't find a mutually agreeable HD voice codec, they end up
dropping down to the lowest common denominator -- G.711 -- which kills
the primary point of using HD in the first place.
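The fallback behavior described above can be pictured as a simple preference match. This is only a sketch: real endpoints negotiate codecs through their signaling protocol rather than a function call, and the name negotiate_codec and the preference lists are invented for illustration.

```python
def negotiate_codec(caller_prefs, callee_supported, fallback="G.711"):
    """Pick the caller's most preferred codec that the callee also supports.

    If there is no codec in common, drop to the narrowband fallback.
    """
    callee = set(callee_supported)
    for codec in caller_prefs:
        if codec in callee:
            return codec
    return fallback


# Both sides share G.722, so the call stays in HD.
hd_call = negotiate_codec(["AMR-WB", "G.722", "G.711"], ["G.722", "G.711"])

# No HD codec in common: the call lands on G.711.
mismatch = negotiate_codec(["AMR-WB"], ["SILK"])
```

The second case is the "lowest common denominator" outcome: each side supports an HD codec, yet the call is narrowband.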
What is HD Voice?
HD voice is a technology that delivers at least twice the sound range of
a typical voice phone call (i.e. "Plain old phone service" or
POTS to be hip; Public switched telephone network or PSTN if you're
more formal) delivered on a landline through the world's analog circuit-
switched phone network.
Real world benefits from HD voice include:
1. Better comprehension and clarity, especially in long, detailed and/or technical discussions
2. Clarity in understanding acronyms
3. The ability to differentiate between and clearly identify others on a conference call
4. Clarity and easier understanding in multi-national/multi-lingual conversations where you have non-native speakers and native speakers communicating in one or more languages
5. More accurate transcriptions (both human and automated)
In short, everything involving voice is better in HD voice, be it a simple
person-to-person call, a 20-person international conference discussion,
or a speech-to-text process.
The technology of HD voice
Sound frequency is measured in hertz, or Hz. The human ear can typically hear
everything between 20Hz and 20,000Hz. The higher the number, the
higher (squeakier) the sound is until you move past 20,000Hz and into
ultrasound frequencies only a dog can pick up.
A landline phone call captures and delivers sound in a range of 300Hz to
3400Hz, so there's a lot of sound information chopped off on both the low
and high-ends of the scale. For simplicity's sake, a POTS call has a range
of about 3.4 kHz (3400Hz). POTS calls are often called "narrowband" calls
because they have such a restricted range as compared to what the
human ear can actually hear and process.
Since an HD voice call is defined as delivering at least twice the sound
range of a traditional phone call, an HD voice call will have a range of
about 7 kHz -- or more. Wideband voice and HD voice are often used
interchangeably since an HD voice call is "wider" -- it covers a broader
frequency range than a narrowband call.
In order to deliver twice or better sound than a POTS call, the first thing
you need is a phone acoustically built to capture and deliver that extra
information, so both the microphone(s) and the speaker/handset must be
capable of receiving and delivering across a 7 kHz or greater range.
Once sound is captured, it needs to be processed into digital form with a
codec. The G.722 codec (more on codecs later) is generally considered
the baseline for HD voice; it captures and delivers sound between 50Hz
and 7000Hz.
Interestingly, an HD voice call using G.722 can be delivered on the same
amount of bandwidth as its digital POTS equivalent of G.711 -- 64 kbit/s.
If you are currently using G.711 in a VoIP phone system, you can switch to HD
voice without needing more bandwidth.
Finally, two (or more, if it's a conference call) parties need to be able to
talk to each other using the same voice encoding (codec) scheme. Within
an organization/PBX domain, this is pretty easy -- turn on G.722, reboot
the phones, and you're done. Communicating between different HD voice
groups, or "islands" is more difficult because there's some Internet
peering and interconnection issues involved, but service providers are
working out the details to transparently provide HD voice calling.
What phones support HD voice?
A better question might be "What phones don't support HD voice?" All the
major IP telephone handset manufacturers -- Aastra, Allworx, AudioCodes,
Avaya, Cisco, ShoreTel, Panasonic, Polycom, Siemens, and snom --
support G.722 in their current (2010) phone lines going all the way
down to the entry-level (i.e., cheapest) model.
Benefits of HD Voice
Simply put, HD voice makes everything (voice) better. With an HD voice
call delivering twice the sound range of a narrowband one, there's much more
audio information provided for the brain to process, resulting in less
fatigue and better comprehension. Computer-based processes like voice
recognition and speech-to-text also gain from HD voice, with better
accuracy.
Advocates of HD voice use the cliché of a call sounding as clear and
natural as if you are talking to someone in the same room, and there's a
laundry list of reasons for wideband goodness ranging from being able to
understand a three year old (higher voices get clipped) to public safety.
Specific HD Voice benefits for businesses include:
Reduction of fatigue
During a narrowband call, your brain is quietly working to "fill in the
blanks" by interpreting word sounds that have been clipped to fit into a
sound range of 300Hz to 3400Hz. All the information you normally hear
between 20Hz and 300Hz and between 3400Hz and 20,000Hz is gone, so your brain
has to figure out what is being said by using contextual clues.
For short and clear calls, this isn't a big headache, but the longer the
call, the more work your brain ends up doing without you thinking about
it. (Yes, there's a reason why you dread hour-long calls and long for them
to be over).
Better comprehension and clarity
Acronyms are "notorious" for being hard to understand during a
narrowband call, says HD voice expert and Polycom CTO Jeff Rodman. In
addition, similar-sounding words like "sail" and "fail" also cause confusion.
A narrowband call can result in a lot of repetition and additional
explanation -- or people just don't get it the first time through and have
to get clarification via email... or another phone call.
Because HD voice provides more sound information, it's easy to
understand the difference between FEC, FCC, SEC, and FTC on a call. In
professions where accuracy and speed count -- such as medical, legal,
and financial -- HD voice is a clear winner because information is
communicated more accurately the first time around. Technical
conversations are easier because terms can be clearly understood.
As a result, it is rare that speakers are asked to repeat themselves -- an
occurrence that happens all too often in narrowband.
In addition, individual voices -- people -- are highlighted in HD voice,
making it easier to know who is talking during a call.
Conference calls rock!
If there's just one "must have" app for HD voice, it is conferencing. The
combination of reduced fatigue and better comprehension and clarity
enables people to focus on the content of discussions, rather
than struggling to understand what is being said.
Executives at Fortune 500 companies -- the "C-Level" guys -- are starting
to insist upon conducting conference calls in HD voice for the efficiency
it brings. Time is money, and HD voice enables people to focus on the job
at hand and to get it done more quickly.
Improved multi-national/multi-cultural communication
HD voice is a clear winner when it comes to international calls and
another "must have" for businesses regularly doing business with non-
native speakers of another language.
For most of us, it is a challenge to speak another language. There are
accent issues, vocabulary issues, and even tone can be used differently
to communicate nuances. Put all of those factors into a narrowband call
and the ability to clearly communicate between offices in Europe and
Asia becomes much more difficult.
Using HD voice, non-native speakers will be able to more clearly
understand what is being said and be more clearly understood when they
speak. And if everyone is a non-native speaker of the common language
being used during a call, HD voice might make the difference between
communication and confusion.
More accurate transcriptions
HD voice provides much better raw information for both humans and
computers to process when it comes to creating transcriptions. Human
beings can more easily hear what is being said in a recording, saving
time. Any automated speech-to-text process -- ranging from transcription
to emailing phone messages -- benefits.
http://news.techeye.net/mobile/the-two-minute-guide-to-mobile-hd-voice
VoIP bandwidth fundamentals
Bandwidth requirements for Voice over IP can be a tricky beast to tame until you look at the method and factors involved. This guide investigates what bandwidth means for VoIP, how to calculate bandwidth consumption for a VoIP network and how bandwidth can be saved by using voice compression.
Table of contents
1. What about bandwidth for VoIP?
-- An introduction to bandwidth issues for Voice over IP and its different components.
2. Calculating bandwidth consumption for VoIP
-- This section discusses how bandwidth can be calculated for VoIP transmissions and what strategies work best for the majority of situations.
3. How can voice compression save bandwidth?
-- Using voice compression can be one of the best strategies when trying to save bandwidth. This section discusses how these 'savings' can be achieved.
What about bandwidth for VoIP?
Voice over IP (VoIP) is the descriptor for the technology used to carry digitized voice over an IP data network. VoIP requires two classes of protocols: a signaling protocol such as SIP, H.323 or MGCP that is used to set up, disconnect and control the calls and telephony features; and a protocol to carry speech packets. The Real-Time Transport Protocol (RTP) carries speech transmission. RTP is an IETF standard introduced in 1995 when H.323 was standardized. RTP will work with any signaling protocol. It is the commonly used protocol among IP PBX vendors.
An IP phone or softphone generates a voice packet every 10, 20, 30 or 40 ms, depending on the vendor's implementation. The 10 to 40 ms of digitized speech can be uncompressed, compressed and even encrypted. This does not matter to the RTP protocol. As you have already figured out, it takes many packets to carry one word.
The shorter the packet, the shorter the delay
End-to-end (phone-to-phone) delay needs to be limited. The shorter the packet creation delay, the more network delay the VoIP call can tolerate. Shorter packets cause less of a problem if the packet is lost. Short packets require more bandwidth, however, because of increased packet overhead (this is discussed below). Longer packets that contain more speech bytes reduce the bandwidth requirements but produce a longer construction delay and are harder to fix if lost. Many vendors have chosen 20 or 30 ms size packets.
RTP packet format
The RTP header carries the time stamp and sequence number of the digitized speech sample (20 or 30 ms of a word) and identifies the content of each voice packet. The content descriptor defines the compression technique (if there is one) used in the packet. The RTP packet format for VoIP over Ethernet is shown below.
Ethernet Trailer | Digitized Voice | RTP Header | UDP Header | IP Header | Ethernet Header
RTP can be carried on frame relay, ATM, PPP and other networks with only the far right header and left trailer varying by protocol. The digitized voice field, RTP, UDP and IP headers remain the same.
Each of these packets will contain part of a digitized spoken word. The packet rate is 50 packets per second for 20 ms and 33.3 packets per second for 30 ms voice samples. The voice packets are transmitted at these fixed rates. The digitized voice field can contain as few as 10 bytes of compressed voice or as many as 320 bytes of uncompressed voice.
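The packet rates and payload sizes quoted above follow directly from the sample time and the codec bit rate. A minimal sketch, with illustrative function names that are not part of any vendor's API:

```python
def packets_per_second(sample_ms):
    """Packets sent each second when every packet carries sample_ms of speech."""
    return 1000 / sample_ms


def payload_bytes(codec_bps, sample_ms):
    """Bytes of digitized voice carried in one packet."""
    return codec_bps * sample_ms / 1000 / 8


rate_20ms = packets_per_second(20)        # 50.0 packets per second
rate_30ms = packets_per_second(30)        # about 33.3 packets per second
smallest = payload_bytes(8000, 10)        # 10.0 bytes: 8 Kbps compressed voice, 10 ms
largest = payload_bytes(64000, 40)        # 320.0 bytes: 64 Kbps uncompressed voice, 40 ms
```

The two payload results correspond to the 10-byte and 320-byte extremes mentioned in the text.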
The UDP header carries the sending and receiving port numbers for the call. The IP header carries the sending and receiving IP addresses for the call plus other control information. The Ethernet header carries the
LAN MAC addresses of the sending and receiving devices. The Ethernet trailer is used for error detection purposes. The Ethernet header is replaced with a frame relay, ATM or PPP header and trailer when the packet enters a WAN.
'Shipping and handling'
In reality, there is no Voice over IP. It is really voice over RTP, over UDP, over IP and usually over Ethernet. The headers and trailers are required fields for the networks to carry the packets. The header and trailer overhead can be called the shipping and handling cost.
The RTP plus UDP plus IP headers will add on 40 bytes. The Ethernet header and trailer account for another 18 bytes of overhead, for a total of at least 58 bytes of overhead before there are any voice bytes in the packet. These headers, plus the Ethernet header, produce the overhead for shipping the packets. This overhead can range from 20% to 80% of the bandwidth consumed over the LAN and WAN. Many implementations of RTP have no encryption, or the vendor has provided its own encryption facilities. An IP PBX vendor may offer a standardized secure version of RTP (SRTP).
Shorter packets have higher overhead. There are 58 bytes of overhead carrying the voice bytes on Ethernet. As the size of the voice field gets larger with longer packets, the percentage of overhead decreases -- therefore the needed bandwidth decreases. In other words, bigger packets are more
efficient than smaller packets.
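To see how the share of overhead shrinks as the voice field grows, the 40-byte RTP/UDP/IP figure and the 18-byte Ethernet figure from the text can be plugged into a one-line ratio. The function name is illustrative only.

```python
RTP_UDP_IP_BYTES = 40   # RTP + UDP + IP headers, per the text
ETHERNET_BYTES = 18     # Ethernet header and trailer, per the text
OVERHEAD_BYTES = RTP_UDP_IP_BYTES + ETHERNET_BYTES


def overhead_fraction(voice_bytes):
    """Share of each Ethernet packet that is headers rather than voice."""
    return OVERHEAD_BYTES / (OVERHEAD_BYTES + voice_bytes)


small_packet = overhead_fraction(20)    # about 0.74 for 20 bytes of compressed voice
large_packet = overhead_fraction(320)   # about 0.15 for 320 bytes of uncompressed voice
```

A 20-byte voice field is roughly three-quarters overhead, while a 320-byte field is mostly voice.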
Header compression
Cisco has created a header compression technique, now a standard, called RTP header compression. This technique actually compresses the RTP, UDP and IP headers and significantly reduces the RTP, UDP and IP overhead from 40 bytes to between 4 and 6 bytes. The bandwidth consumption for compressed voice packets can be reduced by nearly 60%. This technique has less value for large uncompressed voice packets. The header compression technique is not
recommended for the LAN implementations because there is typically more than enough bandwidth for voice calls. The header compression technique should be considered for the WAN implementations, where bandwidth is limited and much more expensive.
Calculating bandwidth consumption for VoIP
The bandwidth needed for VoIP transmission will depend on a few factors: the compression technology, packet overhead, network protocol used and whether silence suppression is used. This tip investigates the first three considerations. Silence suppression will be covered in a later tip.
There are two primary strategies for improving IP network performance for voice: Allocate more VoIP bandwidth (reduce utilization) or implement QoS.
How much bandwidth to allocate depends on:
Packet size for voice (10 to 320 bytes of digital voice)
CODEC and compression technique (G.711, G.729, G.723.1,
G.722, proprietary)
Header compression (RTP + UDP + IP), which is optional
Layer 2 protocols, such as point-to-point protocol (PPP), Frame
Relay and Ethernet
Silence suppression/voice activity detection
Calculating the bandwidth for a VoIP call is not difficult once you know the method and the factors to include. The chart below, "Calculating one-way voice bandwidth," demonstrates the overhead calculation for 20 and 40 byte compressed voice (G.729) being transmitted over a Frame Relay WAN connection. Twenty bytes of G.729 compressed voice is equal to 20 ms of a word. Forty bytes of G.729 compressed voice is equal to 40 ms of a word.
The results of this method of calculation are contained in the next table, "Packet voice transmission requirements." The table demonstrates these points:
Bandwidth requirements reduce with compression, G.711 vs.
G.729.
Bandwidth requirements reduce when longer packets are used,
thereby reducing overhead.
Even though the voice compression is an 8 to 1 ratio, the
bandwidth reduction is about 3 or 4 to 1. The overhead negates
some of the voice compression bandwidth savings.
Compressing the RTP, UDP and IP headers (cRTP) is most
valuable when the packet also carries compressed voice.
Packet voice transmission requirements
(Bits per second per voice channel)
Codec    Voice bit rate  Sample time  Voice payload  Packets per second  Ethernet (RTP)  PPP or Frame Relay (RTP)  PPP or Frame Relay (cRTP)
G.711    64 Kbps         20 msec      160 bytes      50                  87.2 Kbps       82.4 Kbps                 68.0 Kbps
G.711    64 Kbps         30 msec      240 bytes      33.3                79.4 Kbps       76.2 Kbps                 66.6 Kbps
G.711    64 Kbps         40 msec      320 bytes      25                  75.6 Kbps       73.2 Kbps                 66.0 Kbps
G.729A   8 Kbps          20 msec      20 bytes       50                  31.2 Kbps       26.4 Kbps                 12.0 Kbps
G.729A   8 Kbps          30 msec      30 bytes       33.3                23.4 Kbps       20.2 Kbps                 10.7 Kbps
G.729A   8 Kbps          40 msec      40 bytes       25                  19.6 Kbps       17.2 Kbps                 10.0 Kbps
Note: RTP assumes 40-octets RTP/UDP/IP overhead per packet
Compressed RTP (cRTP) assumes 4-octets RTP/UDP/IP overhead per packet
Ethernet overhead adds 18-octets per packet
PPP/Frame Relay overhead adds 6-octets per packet
This table provided courtesy of Michael Finneran.
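The table values can be approximated from the per-packet overheads listed in the notes. The following is a rough sketch of that calculation, not the table author's own method; the constant and function names are illustrative.

```python
RTP_UDP_IP_BYTES = 40   # uncompressed RTP/UDP/IP headers (from the notes)
CRTP_BYTES = 4          # compressed RTP/UDP/IP headers (from the notes)
ETHERNET_BYTES = 18     # Ethernet header and trailer (from the notes)
PPP_FR_BYTES = 6        # PPP or Frame Relay overhead (from the notes)


def bandwidth_kbps(codec_bps, sample_ms, header_bytes, link_bytes):
    """One-way bandwidth in Kbps for one voice channel."""
    voice_bytes = codec_bps * sample_ms / 8000      # payload bytes per packet
    packets_per_second = 1000 / sample_ms
    packet_bits = (voice_bytes + header_bytes + link_bytes) * 8
    return packet_bits * packets_per_second / 1000


for name, bps in (("G.711", 64000), ("G.729A", 8000)):
    for ms in (20, 30, 40):
        ethernet = bandwidth_kbps(bps, ms, RTP_UDP_IP_BYTES, ETHERNET_BYTES)
        ppp_rtp = bandwidth_kbps(bps, ms, RTP_UDP_IP_BYTES, PPP_FR_BYTES)
        ppp_crtp = bandwidth_kbps(bps, ms, CRTP_BYTES, PPP_FR_BYTES)
        print(f"{name} {ms} ms: {ethernet:.1f} / {ppp_rtp:.1f} / {ppp_crtp:.1f} Kbps")
```

The 20 ms and 40 ms rows should match the table; the 30 ms rows may differ in the last digit depending on how the packet rate of 33.3 is rounded.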
The varying designs of packet size, voice compression choice and header compression make it difficult to determine the bandwidth to calculate for a continuous speech voice call. The IP PBX or IP phone vendor should be able to provide tables like the one above for their products. Many vendors have selected 30 ms for the payload size of their VoIP implementations. A good rule of thumb is to reserve 24 Kbps of IP network bandwidth per call for 8 Kbps (G.729-like) compressed voice. If G.711 is used, then reserve 80 Kbps of bandwidth.
If silence suppression/voice activity detection is used, the bandwidth
consumption may drop 50% -- to 8 Kbps total per VoIP call. But the assumption that everyone will alternate between voice and silence without conflicting with each other is not always realistic. Silence suppression will be discussed in a later tip.
Most enterprise designers do not perform these calculations. The vendor provides the necessary information. The designer does have
some freedom, such as selecting the compression technique for voice payloads and headers, and may be able to vary the packet size.
How can voice compression save bandwidth?
The Public Switched Telephone Network (PSTN) started with the transmission of analog speech. This worked well for decades until the areas under city streets became saturated with copper cables, one copper pair per call. Starting in the 1950s, AT&T Bell Labs developed a technique to carry more voice calls over copper wire. They developed digitized voice technology through which 24 digital calls can be carried on two pairs of copper wire, thereby increasing the carrying capacity of the cables twelvefold. The voice is digitized into streams of 64,000 bps per call. The technology is called a T1 circuit and the bandwidth for the 24 calls is 1.544 Mbps. This worked well for domestic connections. The T1 technology then became the mechanism for long-distance domestic transmission.
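The 1.544 Mbps figure is slightly more than 24 times 64,000 bps. The arithmetic below accounts for the difference with one framing bit per frame, a detail of T1 framing that the text does not mention and that is added here as background; the constant names are illustrative.

```python
CHANNELS = 24             # calls carried on one T1
BITS_PER_SAMPLE = 8       # one 8-bit sample per call in each frame
FRAMES_PER_SECOND = 8000  # one frame per sampling interval (8 kHz sampling)
FRAMING_BITS = 1          # assumption from T1 framing, not stated in the text

per_call_bps = BITS_PER_SAMPLE * FRAMES_PER_SECOND         # 64000
frame_bits = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS     # 193
t1_bps = frame_bits * FRAMES_PER_SECOND                    # 1544000
voice_only_bps = CHANNELS * per_call_bps                   # 1536000
```

The 8,000 bps gap between the two totals is the framing overhead.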
Most of the early voice compression technologies were designed for undersea cables, where bandwidth was limited and expensive. Voice compression technologies were created to reduce this bandwidth
requirement. Voice compression is also used for digital cell calls, operating at about 8 Kbps instead of 64 Kbps. So voice compression is not new.
As the PBX market has moved into an IP-based environment, voice compression has become attractive for WAN transmission. Voice compression can be used on a LAN, but since LANs have so much available bandwidth, it is not commonly applied to the LAN.
The quality of a PSTN voice call provides enough analog bandwidth to understand the speaker in any language. It is also enough bandwidth
for speaker recognition. The analog bandwidth delivered by the PSTN is about 3.4 kHz. This is considered toll quality. Voice compression can reduce the speech quality and may affect speaker recognition, so there is a limit to how much bandwidth reduction is possible before callers complain about voice quality.
The CODEC (COder/DECoder) is the component in an IP phone that digitizes the voice and converts it back into an analog stream of speech. The CODEC is the analog-to-digital-to-analog converter. The CODEC may also perform the voice compression and decompression.
There are several voice digitization standards and some proprietary techniques in use for VoIP transmission. Most vendors support one or more of the following ITU standards and avoid proprietary solutions:
G.711 is the default standard for IP PBX vendors, as well as for
the PSTN. This standard digitizes voice into 64 Kbps. There is no
voice compression.
G.729 is supported by many vendors for compressed voice
operating at 8 Kbps, 8 to 1 compression. With quality just below
that of G.711, it is the second most commonly implemented
standard.
G.723.1 was once the recommended compression standard. It
operates at 6.3 Kbps and 5.3 Kbps. Although this standard further
reduces bandwidth consumption, voice is noticeably poorer than
with G.729, so it is not very popular for VoIP.
G.722 operates at 64 Kbps, but offers high-fidelity speech.
Whereas the three previously described standards deliver an analog
sound range of 3.4 kHz, G.722 delivers 7 kHz. This version of
digitized speech has been announced by several vendors and will
become common in the future.
It is important to note that all of the voice digitization transmission speeds are for voice only. The actual transmission speed required must include the packet protocol overhead.
The quality of a voice call is defined by the Mean Opinion Score (MOS). A score of 4.4 to 4.5 out of a possible 5.0 is considered to be toll quality. Voice compression will affect the MOS. An MOS below 4.0 will usually produce complaints from the callers. Cell phone calls average about 3.8 to 4.0 for the MOS. The following table presents the voice MOS for different standard CODECs:
Standard   Speed      MOS   Sampling delay per phone
G.711      64 Kbps    4.4   0.75 ms
G.729      8 Kbps     4.2   10 ms
G.723.1    6.3 Kbps   4.0   30 ms
G.723.1    5.3 Kbps   3.5   30 ms
This table illustrates two points. First, as the voice is compressed, the voice quality (MOS) decreases. The MOS in the table does not include network impairments such as jitter and packet loss. These impairments will further reduce the voice quality. The VoIP network designer should choose a compression technique with a higher MOS so the network impairments will not reduce the voice quality to an unacceptable level.
Second, voice compression also adds delay to the end-to-end call. The table shows the sampling delay for one phone. This delay is doubled for the two phones of a call. This end-to-end delay needs to be limited. As compression increases, the delay experienced in the IP network needs to decrease, which increases the cost of transmission over the WAN, but not the LAN. The delays shown in the table are the theoretical minimum. The actual delays experienced will probably exceed 30 ms, no matter what compression technology is implemented. This delay will vary by vendor.
The conclusion is that digital voice compression is worth pursuing for VoIP transmission on a WAN, but it comes with some costs in voice quality reduction and increased end-to-end delay.
For more information, view this VoIP over WAN tutorial: http://searchenterprisewan.techtarget.com/guides/VoIP-over-WAN-tutorial
About the author:
Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These have included local area, national and international networks, as well as VoIP and IP convergent networks in the U.S., Canada, Europe, Australia and Asia.
About G.711
ISDN audio telecommunication may in principle be accomplished in many ways, but most regular calls are compressed according to the G.711 recommendation of the CCITT (Comité Consultatif International Téléphonique et Télégraphique, which nowadays has been integrated into ITU). G.711 compresses audio by logarithmic quantization (companding), which reduces the 14 most significant bits of each sample to 8. As the sampling rate is 8 kHz, the transmission rate equals the 64 kbps offered by one ISDN B-channel. There are two variants of G.711: A-law is dominant in Europe, whereas the United States and Japan commonly use u-law.
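To make the logarithmic idea concrete, here is a minimal Python sketch of u-law companding using the continuous textbook formula with mu = 255. It is not the bit-exact segmented encoding defined in G.711, and the function names and the signed -127..127 code range are illustrative assumptions. It also shows the bit-rate arithmetic from the text.

```python
import math

MU = 255  # companding parameter commonly used for u-law


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] using a logarithmic curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Quantize the companded sample to a code in -127..127 (illustrative layout)."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code: int) -> float:
    """Turn a code from encode_8bit back into an approximate sample."""
    return mu_law_expand(code / 127)


# Bit-rate arithmetic from the text: 8 kHz sampling, 8 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
BIT_RATE = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # bits per second

if __name__ == "__main__":
    quiet = 0.01
    print(encode_8bit(quiet), decode_8bit(encode_8bit(quiet)), BIT_RATE)
```

The logarithmic curve spends more of the 8-bit code space on quiet samples, which is why speech survives the reduction in bits reasonably well.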
What is HD voice?
Your standard POTS call captures and delivers sound in an audio range of 300 hertz
to 3400 hertz with standards set back in 1937. The VoIP equivalent of POTS is
G.711 and takes up 64 kbit/s of bandwidth.
The baseline definition for wideband voice, typically called HD voice, is G.722. It
delivers audio in the range of 50 to 7000 hertz, about twice the audio range of a typical
POTS call and G.711. Due to a little data compression on the fly, a G.722 phone
call only takes up 64 kbit/s of bandwidth.
The combination of upper and lower frequency sounds gives a much clearer and
"richer" experience on voice calls. The key marketing-speak phrase used to
describe it is, "Conversations sound as clear and natural as if talking to someone in
the same room."
Additional/complementary buzz-phrases include "a dramatically improved
communications experience" and "Conference calls will be easy to follow and much
less exhausting."
Why is HD voice such a big deal?
The current quality of phone calls sucks compared to FM radio or CDs, and mobile calls suck more. Cellular tech heads started with a 1937-era audio standard and then ran the quality of experience through a data compression blender to cram more calls into radio frequency (RF) spectrum.
Implementing HD voice should make everything revolving around voice (conference calls, IVR, speech-to-text, calls to Mum and the wee ones) a much better experience.
How do you deliver mobile HD voice?
First, forget about the POTS network and all that legacy analogue crap. You need an all-IP network with low latency and enough bandwidth to transport a wideband voice call, so you need the latest hot-rocking 3G and 4G-esque data networks.
France Telecom is delivering HD voice over the latest GSM/HSPA alphabet soup via a soft client, but you can do the same thing on a fast enough WiMAX or LTE network. Qualcomm has done some demos over CDMA, but given the worldwide love of 4G, mobile HD on that tech might be some wishful thinking.
You also need end-user devices (i.e. phones) with a quality microphone to capture 7 kHz of sound, enough CPU horsepower to encode and decode that information on the fly, and a speaker/headphone to deliver the sound to the human ear.
Nokia and SonyEricsson have announced phones that support AMR-WB, the de facto standard of mobile HD voice. You can also do mobile HD voice with a soft client and a
sufficiently powerful smartphone; expect to see HD voice clients for the iPhone being demoed by Global IP Solutions (GIPS) and Fraunhofer using codecs other than AMR-WB.
AMR-WB: what the hell?
AMR-WB (AMR-wideband) is the codec and heir-apparent replacement for AMR used in "standard" GSM calls to provide mobile HD voice. Also called G.722.2, it is designed to provide an HD voice experience in 24 kbit/s, a big deal to the cellular world that wants to conserve both RF and network bandwidth.
But there's no free beer when compared to G.722. AMR-WB requires more CPU cycles and number crunching for efficient compression, which translates to shorter battery life. Further, AMR-WB is a patented codec with intellectual property contributed by France Telecom/Orange, Nokia, Ericsson and VoiceAge.
Alternatives to AMR-WB have been floated, ranging from implementing G.722 to Skype's SILK to Fraunhofer providing an "AAC Enhanced Low Delay" codec based on MPEG.
G.722 has the advantages of being royalty-free and not such a CPU devourer, but it takes up 64 kbit/s. For the cellular RF heads, this is a theoretical show stopper, but since the mobile people are pimping their data networks to support two-way video calling with HD voice, the whole "conserve RF/conserve network bandwidth" argument is crap. Device manufacturers also like the fact that G.722 is a simple piece of code to implement relative to all the different profile flavors for AMR-WB.
Skype wants everyone to use SILK and offers it as royalty-free and open source, but after the skeletons as to who owned what IP after eBay bought Skype, well... It doesn't stop people loading Skype clients on mobile phones and running SILK "natively."
And it works just like normal phone calling, eh? If I have an HD voice phone and my bud does...
Well... not really, not yet.
Carriers and businesses running HD voice currently operate as islands: you can communicate with someone within your network, but if carrier A has HD voice and carrier B has HD voice, you aren't going to be able to connect an HD voice phone call because the higher level SIP/IP connectivity isn't set up if you are using those old-fashioned phone numbers to "dial" another person.
Some sort of HD voice interoperability/interconnection announcement is purportedly going to take place at Mobile World Congress, where a group of mobile carriers have agreed to exchange AMR-WB calls among themselves. If so, this is one of those Key Announcements which will get HD voice moving faster.
HD voice interoperability is not technically hard, since mobile carriers already have ways to exchange MMS and picture mail and all those other multimedia-loaded services via IP; supporting AMR-WB calls is just another data type to exchange via IP. But the politics is another story.
Speaking of ugly, how do calls move between the PSTN, HD voice, AMR-WB, G.722, SILK, and whatever flavor-of-the-day codecs pop up?
Calls need to be transcoded, that is, translated between codecs. For example, France Telecom already has to transcode between mobile HD voice users and its own PSTN connections to the rest of the world. And you have to transcode between AMR-WB
and G.722 (mobile HD voice and broadband HD voice), plus SILK since Skype wants its due for HD voice.
Doug Mohney is Editor-in-Chief of HD Voice News (www.hdvoicenews.com) and is happy to cause heartburn in league with Mike Magee whenever he can.
Read more: http://news.techeye.net/mobile/the-two-minute-guide-to-mobile-hd-voice#ixzz2cO4dkViL