What does Concorde have to do with Troubleshooting VoIP?

As a once-in-a-lifetime treat, I found myself on the flight deck of Concorde for takeoff. This was pre-9/11, of course.

At the head of the runway, the captain slammed the throttles hard down. Seconds later, we were at 8,000 feet and climbing fast. What a thrill! Out over water, the Captain, first officer and the engineer were thinking about accelerating the aircraft beyond Mach 1, going supersonic.

From a wall of the cockpit covered in old analog instruments, the engineer pointed to four particular dials. These measured the speed of air through each of Concorde’s four monstrous Rolls-Royce Olympus jet engines. “We close these intake baffles to slow the air moving through the engine itself to below the speed of sound,” the engineer explained. Shockwaves at supersonic speeds inside the jet engine could smash the fan blades. So if the baffles did not close properly and slow the air inside the engine, we would be traveling all the way to New York at subsonic speeds. It would take us 7 hours to get there from London, instead of 3 hours, 30 minutes.

As the baffles closed, the needles of these analog instruments moved slowly down below Mach 1.0, except for engine number 3! Would my adventure deflate like a pricked balloon, and would I never in my life break the sound barrier? Not to worry. The engineer picked up his trusty torch, functioning as a small hammer in this case, and gently tapped the glass face of the analog instrument. The needle on the dial hesitated for a split second, and then kicked down below the 1.0 reading. Relief! “Tally-Ho,” said the captain (at least, that’s how I remember it) as he pushed the throttle sticks forward. It was like two puffs of air from a straw pressing into your back. Just an exhilarating feeling: two small silent pulses and we were supersonic. The air speed indicator then climbed assertively to Mach 2.20, maximum speed of Concorde, 1,450 mph.

Time for some champagne!

Concorde at JFK

Do we believe our instruments or not?

If your instruments are giving you the wrong readings, you will make incorrect decisions. We would not have flown supersonic that day if we had relied on the needle of the old analog instrument. You need confidence in your instruments. You can build that confidence through calibration, through verification, or through experience over time.

When troubleshooting a VoIP network, as with any task involving analysis, we first need good data. Many troubleshooting tools and network monitoring packages rely on information given to them by the network elements themselves. Logs can be pulled from infrastructure devices, or data can be exported using SNMP. In addition, a telephony switch or PBX produces a record at the end of each call known as a Call Detail Record, or CDR. Many troubleshooting solutions use these CDRs to give an overview of the relative performance of the network. Typically, the switch gives the reason the call ended, e.g. normal clearing or busy. In the case of a SIP network, you will get SIP error codes, e.g. 403, 503 or 600, each indicating a different type of failure.
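To make the CDR overview concrete, here is a minimal sketch in Python of how a tool might tally calls by release cause. The record layout and the field names `call_id` and `release_cause` are assumptions made up for illustration; real CDR formats differ from one switch vendor to the next.

```python
from collections import Counter

# Hypothetical CDR rows; real field names and values vary by switch vendor.
cdrs = [
    {"call_id": "a1", "release_cause": "normal clearing"},
    {"call_id": "a2", "release_cause": "503"},
    {"call_id": "a3", "release_cause": "busy"},
    {"call_id": "a4", "release_cause": "503"},
    {"call_id": "a5", "release_cause": "403"},
]


def summarize_release_causes(records):
    """Count how many calls ended with each release cause."""
    return Counter(r["release_cause"] for r in records)


def non_normal_ratio(records, normal_causes=("normal clearing",)):
    """Fraction of calls whose release cause is not listed as normal."""
    if not records:
        return 0.0
    non_normal = sum(1 for r in records if r["release_cause"] not in normal_causes)
    return non_normal / len(records)


summary = summarize_release_causes(cdrs)
for cause, count in summary.most_common():
    print(f"{cause}: {count}")
print(f"non-normal ratio: {non_normal_ratio(cdrs):.2f}")
```

A summary like this shows where the failures cluster, but not what led up to them.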

To troubleshoot what led up to the failed call, you need much more detailed information: you need the raw packets, the actual SIP messages themselves. This involves installing taps or, more conveniently, configuring a SPAN port (port mirroring). This functionality is available in most enterprise-class Ethernet switches and routers.
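As an illustration of what the raw SIP messages give you, the sketch below pulls the status code, reason phrase and headers out of one captured SIP response. The function name `parse_sip_response` and the sample message are hypothetical, and the parser is deliberately simplified: it keeps only the last value of a repeated header and ignores folded header lines.

```python
def parse_sip_response(raw: str):
    """Split a raw SIP response into status code, reason phrase and headers."""
    head = raw.split("\r\n\r\n", 1)[0]          # drop any message body
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    if not version.startswith("SIP/"):
        raise ValueError("not a SIP response")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")    # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# Made-up example message using documentation addresses and domains.
raw = (
    "SIP/2.0 503 Service Unavailable\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>;tag=a6c85cf\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

code, reason, headers = parse_sip_response(raw)
print(code, reason, headers["call-id"], headers["cseq"])
```

With the Call-ID and CSeq in hand, every message belonging to the same call can be lined up in order, which is what lets you see the sequence that led to the failure.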

If an IP telephony softswitch, SIP proxy or other network element is stressed by multiple failed calls or SIP registrations, by a denial of service attack or by high call volume, the first things it drops are the diagnostics produced by the device itself, i.e. the logs and the CDRs.

Turning on trace functionality on a device to capture raw packets only exacerbates the problem because it places a huge burden on the CPU of the device.

Dropped calls are not uncommon occurrences on VoIP networks. If the call does not complete gracefully, either with a successful termination or a specific failure Cause Code, a CDR is never generated in the switch. No record of the call exists, let alone the ability to troubleshoot the problem.
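One way to spot these invisible calls is to compare the calls seen on the wire with the calls the switch reported. The sketch below assumes, purely for illustration, that the CDRs record the SIP Call-ID and that the Call-IDs of captured calls have already been extracted; the function name and sample values are hypothetical.

```python
def calls_missing_cdr(captured_call_ids, cdr_call_ids):
    """Return Call-IDs seen in the packet capture that have no matching CDR."""
    return sorted(set(captured_call_ids) - set(cdr_call_ids))


# Made-up identifiers: four calls were captured, but only three produced a CDR.
captured = ["c1@host", "c2@host", "c3@host", "c4@host"]
in_cdrs = ["c1@host", "c2@host", "c4@host"]

missing = calls_missing_cdr(captured, in_cdrs)
print(missing)
```

Any Call-ID in the result is a call the switch has no record of, and the capture is the only place left to investigate it.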

Tools that start with raw data, i.e. packet capture, offer greater levels of confidence than tools relying on SNMP or on metrics derived from the device itself.

Determining voice quality or MOS score from RTCP

RTCP, the control protocol that accompanies RTP, has each device send periodic reports during a call. These reports carry the information needed to compare the number of VoIP audio packets expected with the number actually received, plus a similar reading for jitter. Although RTCP received from the periphery of the network is useful for extending the reach of measurement, a telephone endpoint’s prime function is to make and receive calls, not to make measurements, and it is built down to a very severe price point. The best practice is to use RTCP in conjunction with analysis of the RTP itself, captured from that same network segment or leg of the call.
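For a sense of what analyzing the RTP itself involves, here is a rough sketch that derives packet loss and jitter directly from captured RTP packets, using the interarrival jitter estimator described in RFC 3550. The function name `rtp_stats`, the tuple layout and the sample packets are assumptions for illustration. The sketch takes an 8 kHz clock, does not handle sequence-number wraparound or duplicates, and counts expected packets from the lowest sequence number seen, which is a simplification.

```python
def rtp_stats(packets, clock_rate=8000):
    """Compute loss and jitter from captured RTP packets.

    packets: list of (sequence_number, rtp_timestamp, arrival_seconds),
             in arrival order.
    Returns (expected, received, lost, jitter_ms).
    """
    if not packets:
        return 0, 0, 0, 0.0
    seqs = [p[0] for p in packets]
    expected = max(seqs) - min(seqs) + 1
    received = len(packets)

    jitter = 0.0          # running estimate, in RTP timestamp units
    prev_transit = None
    for _, rtp_ts, arrival in packets:
        # Relative transit time: arrival time minus RTP timestamp, same units.
        transit = arrival * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0   # RFC 3550 smoothing, gain 1/16
        prev_transit = transit

    jitter_ms = jitter / clock_rate * 1000.0
    return expected, received, expected - received, jitter_ms


# Made-up stream of 20 ms packets: sequence 4 is missing, sequence 3 is 4 ms late.
packets = [
    (1, 0, 0.000),
    (2, 160, 0.020),
    (3, 320, 0.044),
    (5, 640, 0.080),
]

expected, received, lost, jitter_ms = rtp_stats(packets)
print(f"expected={expected} received={received} lost={lost} jitter={jitter_ms:.3f} ms")
```

The figures computed this way can be set beside the cumulative packets lost and interarrival jitter fields reported in the endpoint's RTCP receiver reports. Where the two disagree, that is a hint about which instrument to trust.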

‘Laat de slager niet zijn eigen vlees keuren’ as they say in the Netherlands: “Don’t let the butcher approve his own meat.” Or “Don’t let the fox guard your henhouse.”
