CAN YOU HEAR ME (BETTER) NOW?
by William Flanagan
For at least its first 100 years, the telephone was limited in its audio bandwidth very deliberately. There were technical reasons to cut off the low frequency response well above the frequency of A.C. power lines--and even above the second and third harmonics of 60 Hertz (Hz, cycles per second). Long stretches of phone wire running in parallel with power lines would pick up hum. At the high frequency end, cost was a factor--why make a high fidelity speaker when the human voice conveyed most information at around 1000 Hz?
Early analog multiplexing settled on 4 kHz of carrier bandwidth per channel, some of which was devoted to guard bands and signaling. The usable audio channel came out to be roughly 3,000 Hz, 300 to 3,300 Hz.
When voice was digitized in the 1960s, it happened entirely within "the Phone Company," which used it for transmission between central offices. All the analog telephones and switches remained in place. Early digital electronics were rather primitive by today's standards. Extrapolate Moore's Law backwards from the average CPU now--for 40 years--and see what you get. So the digital channel copied the analog channel bandwidth to define a circuit-switched 64 kbit/s bit stream (the DS-0). The encoding method, still, is Pulse Code Modulation (PCM) as described in ITU Recommendation G.711. We know that in our sleep, right?
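For anyone who doesn't know it in their sleep, the 64 kbit/s figure falls out of simple arithmetic. The sketch below assumes the usual G.711 parameters (8,000 samples per second, 8 bits per sample), which the text above does not spell out; the function and constant names are made up for illustration.

```python
# Assumed G.711 parameters (not stated in the article text):
SAMPLE_RATE_HZ = 8000   # samples per second, enough for audio below 4 kHz
BITS_PER_SAMPLE = 8     # one companded PCM code word per sample


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of an uncompressed PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample


ds0_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(ds0_bps)  # 64000
```

Sampling at 8 kHz can represent audio only below 4 kHz, which is why the digital channel lines up so neatly with the old 4 kHz analog channel.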
What not so many know is that as far back as 1988, the ITU (then CCITT) published a recommendation (G.722) for 7 kHz audio coding. Twenty years after PCM, the capabilities of electronics could step up to the task--still within a DS-0 of course because that's how calls were set up.
The DS-0 was rigid. If you wanted to send some data with the voice call, you would have to alter the coding of voice to reduce the bit rate and make room for a data channel inside the DS-0. OK, we can do that. So G.722 offered a multi-rate voice coder that could drop from 64 kbit/s to 56 or 48 kbit/s, giving up a bit or two of precision per sample in its lower sub-band when needed to accommodate data.
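The trade between voice and data inside a fixed DS-0 can be sketched in a few lines. The mode numbers and audio rates below are the three G.722 operating modes as I understand the Recommendation; the dictionary and function names are hypothetical, chosen only for this illustration.

```python
DS0_BPS = 64_000  # the fixed circuit-switched channel

# G.722 audio bit rate per operating mode (assumed from the Recommendation).
G722_AUDIO_BPS = {1: 64_000, 2: 56_000, 3: 48_000}


def data_channel_bps(mode):
    """Bits per second left over for data once the voice coder takes its share."""
    return DS0_BPS - G722_AUDIO_BPS[mode]


for mode in sorted(G722_AUDIO_BPS):
    print(mode, G722_AUDIO_BPS[mode], data_channel_bps(mode))
```

Whatever the mode, voice plus data always adds up to exactly one DS-0. That is the rigidity VoIP does away with.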
At the start, I said that most of the information in a voice is carried at frequencies around 1 kHz. Most, but not all. The higher frequencies lend realism, clarity, warmth, and other good stuff to the sound. A broader spectrum makes it easier for listeners to recognize voices. You can hear the difference in a demonstration on the web site of AudioCodes, G.711 vs. 7 kHz.
Meanwhile, back at VoIP, what does this have to do with the price of beer? Just that we enjoy huge processing power, huge bandwidth, and freedom from the constraints of the fixed channels in time division multiplexing. We can mix voice and data (and video, and IM, and ...) quite freely. We can encode voice at any bit rate we want.
If desired, there's no reason we can't build Hi Fi audio equipment for telephone calls. In most IP phones, the audio components are capable of frequencies well above 7 kHz just because it's modern technology. Even if the outside world insists on G.711 coding, we can use "high definition audio" within our organizations--processing power to perform transcoding at gateways is cheap (relatively) and lots of vendors know how to do it. Gigabit Ethernet on a LAN doesn't break a sweat handling dozens of HD calls while we all merrily browse, download, and connect thin clients to apps in the cloud.
Oddly, a key driver for HD voice seems to be the increasing use of video conferencing. The old "head in a barrel" telephone sound doesn't fit well with today's real-time video images. The whole experience of "remote presence" is a better package when the sound quality is higher. Apparently, AT&T learned this lesson when test users of the early video telephone perceived an increase in image quality when only the sound had been improved.
If you are planning for LAN capacity to include voice, the impact of HD voice won't be huge. Better encoding methods for 7 kHz voice take about the same number of bits as PCM. When the approximately 40% extra for layers of protocol headers is added, the difference might be negligible. Signaling is the same regardless of sound quality.
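A back-of-the-envelope calculation shows where the header overhead comes from and how little a call weighs on a LAN. This is a rough sketch under assumptions that are mine, not the article's: 20 ms of audio per packet, IPv4, RTP over UDP, an untagged Ethernet frame, and the 7 kHz coder running at 64 kbit/s. Preamble and inter-frame gap are ignored. All names are invented for the example.

```python
RTP_UDP_IP_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4 headers (assumed stack)
ETHERNET_BYTES = 14 + 4          # Ethernet header + frame check sequence
HEADER_BYTES = RTP_UDP_IP_BYTES + ETHERNET_BYTES


def payload_bytes(codec_bps, packet_ms=20):
    """Bytes of coded audio carried in one packet."""
    return codec_bps * packet_ms // 8000


def overhead_fraction(codec_bps, packet_ms=20):
    """Header bytes as a fraction of the audio payload."""
    return HEADER_BYTES / payload_bytes(codec_bps, packet_ms)


def call_bandwidth_bps(codec_bps, packet_ms=20):
    """One-way LAN bit rate of a single call, headers included."""
    packets_per_second = 1000 // packet_ms
    frame_bytes = payload_bytes(codec_bps, packet_ms) + HEADER_BYTES
    return frame_bytes * 8 * packets_per_second


g711 = call_bandwidth_bps(64_000)      # narrowband PCM
hd_voice = call_bandwidth_bps(64_000)  # 7 kHz coder at the same rate (assumption)
print(g711, hd_voice)                  # 87200 87200
print(f"overhead: {overhead_fraction(64_000):.0%}")
print(f"50 calls use {50 * hd_voice / 1e9:.2%} of a gigabit link, one way")
```

With these assumptions the headers add a bit over a third to the payload, in line with the rough 40% figure. An HD call costs the LAN exactly what a G.711 call does whenever the coder runs at the same bit rate.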
If you can't hear me better now, you probably will in the near future.