Real World Problems Found during SIP Load Testing
How to Ensure Your Contact Center Network is Compliant for your Clients
What’s covered in this No-Fluff Tech Article
Every time we load test a customer’s network, we find some bottleneck that needs to be fixed or opened up. Usually it is a configuration issue, which may be an error or an intentional policy setting. But sometimes it is a software bug, incorrect hardware sizing or dimensioning, or a device that is simply not up to the job. This “No fluff” tech article will guide you through the different categories and network element functions that affect service quality when your network is working flat out. We’ll also dive deeper into the specific issues we found. Many are not what you would expect. So we’ll end with the easy way to turn over the rocks while the river is in full flow and find all the buried bugs and roaches, to make sure your network can deliver at busy hour and especially during high-load times, such as your client’s enrollment seasons, a snow day, or a state emergency.
Many high-end clients require BPO call center network providers to load test their network, to prove and certify that the full provisioned volume is supported and that their services can deliver all features at peak designed-for capacity.
Hammer dialing may achieve an initial purpose. However, when features such as call transfer are used, or the call is transcoded, the utilization of resources on a loaded network increases sharply. It is important to stress the network with call flows that are typical of, or possible in, normal operations.
It’s also vital to be able to report on maximum performance and root cause analysis of any problems or bottlenecks.
What is Load Testing a SIP Network
To save you time: if you are already familiar with load testing, you can skip this section. If you need immediate help getting low-cost, comprehensive load testing done on your call center or network, Request Support.
Load testing of a SIP VoIP network involves simulating a large number of calls and measuring the performance of the network service as the calls propagate through various components, such as the Session Border Controller (SBC), firewall, session manager and PBX. This type of testing is crucial for ensuring that the network can handle the expected number of calls and for identifying any bottlenecks or limitations.
The load testing tool should be configured to simulate the expected call volume. This can typically be done by setting the number of virtual users, the call rate, and the call duration. The load testing tool should also be configured to simulate different types of calls, such as voice, video, and fax, as well as different codecs, such as G.711, G.722, and G.729.
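To make the relationship between call rate, call duration, and load concrete, here is a minimal Python sketch. It is not the configuration format of any particular load testing tool. The class name LoadProfile and the example numbers are hypothetical. The media estimate assumes 20 ms packetization and counts only IPv4, UDP, and RTP headers, so real links will carry somewhat more.

```python
from dataclasses import dataclass

# Nominal codec payload bit rates in kbit/s.
CODEC_PAYLOAD_KBPS = {"G.711": 64.0, "G.722": 64.0, "G.729": 8.0}

# Assumption: 20 ms packetization (50 packets/s) over IPv4/UDP/RTP,
# i.e. 20 + 8 + 12 = 40 header bytes per packet, no layer-2 overhead.
PACKETS_PER_SECOND = 50
HEADER_BYTES_PER_PACKET = 40


@dataclass(frozen=True)
class LoadProfile:
    """Hypothetical description of one load test scenario."""
    call_rate_cps: float    # new calls started per second
    call_duration_s: float  # how long each call is held
    codec: str              # key into CODEC_PAYLOAD_KBPS

    def concurrent_calls(self) -> float:
        # Steady state: calls in progress = arrival rate x hold time.
        return self.call_rate_cps * self.call_duration_s

    def per_call_kbps(self) -> float:
        # One direction of one call: codec payload plus header overhead.
        overhead_kbps = HEADER_BYTES_PER_PACKET * 8 * PACKETS_PER_SECOND / 1000
        return CODEC_PAYLOAD_KBPS[self.codec] + overhead_kbps

    def media_kbps_one_way(self) -> float:
        # Media bandwidth in one direction for all concurrent calls.
        return self.concurrent_calls() * self.per_call_kbps()


profile = LoadProfile(call_rate_cps=2.0, call_duration_s=180.0, codec="G.711")
print(profile.concurrent_calls())    # 360.0
print(profile.media_kbps_one_way())  # 28800.0
```

Even a modest 2 calls per second with three-minute calls means 360 calls in progress at steady state, which is the figure the SBC, firewall, and PBX actually have to sustain.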
Once the load testing tool has been configured, the next step is to run the test outside normal business hours. During the test, the load testing tool will simulate the calls and, importantly, measure the performance of the network as the calls propagate through the various components, for both SIP signaling and media. The test results will typically include metrics such as call setup time, call completion rate, and call quality. These metrics can be used to identify any bottlenecks or limitations in the network and to determine whether the network is capable of handling the expected number of calls.
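As an illustration of how these metrics fall out of raw per-call results, the following Python sketch summarizes a list of call records. The CallRecord fields are hypothetical and do not reflect the export format of any specific tool. Setup time is taken here as the time from sending the INVITE to the call being answered, which is one common definition among several.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical per-call result exported by a load testing tool."""
    invite_sent_s: float         # when the INVITE was sent
    answered_s: Optional[float]  # when the call was answered, None if it never was
    mos: Optional[float]         # measured voice quality, None if no media flowed


def summarize(records: List[CallRecord]) -> dict:
    """Compute completion rate, mean setup time, and worst MOS."""
    answered = [r for r in records if r.answered_s is not None]
    setup_times = [r.answered_s - r.invite_sent_s for r in answered]
    mos_values = [r.mos for r in answered if r.mos is not None]
    return {
        "attempted": len(records),
        "completion_rate": len(answered) / len(records) if records else 0.0,
        "mean_setup_s": mean(setup_times) if setup_times else None,
        "worst_mos": min(mos_values) if mos_values else None,
    }


results = [
    CallRecord(0.0, 0.5, 4.25),
    CallRecord(1.0, 2.5, 3.0),
    CallRecord(2.0, None, None),  # call never completed
    CallRecord(3.0, 4.0, 4.0),
]
print(summarize(results))
```

Reporting the worst MOS alongside the averages matters because short quality dips under load are easy to hide in a mean.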
Real World Use Case Problems
Typical problems found during load testing are not what you would think. Sometimes intentional policies applied within a cloud service provider are not in line with those of their client. Telephony departments are sometimes different from security departments and priorities can work against each other.
Example Key Issues Observed During Load Testing
Here the client’s users access the network through an SBC, which passes calls to a Session Manager and then up to the PBX and IVR, where auto answer terminates the call. This is a very typical call center scenario.
Problem #1: Session Manager not set for TCP port 5670 in the Access List
During this load test, we managed to complete only up to 20 calls to the Session Manager (SM) through the SBC before the SM sent a TCP RST back to the SBC. The SBC was sending traffic on TCP port 5670 at a rate of 2 calls per second (cps). The Session Manager perceived a potential attack and, following its configured security policy, sent this hard reset. This closed the existing TCP socket, so new sockets were created and further messages from existing calls were sent over the new sockets. Ultimately, the SIP layer of the SBC was confused and would not forward ACK messages from existing calls on a different TCP port.
Resolution:
There is an easy workaround. But it does reveal a subtle interoperability issue: unnecessary firewalls in the path of a voice-only circuit, which don’t understand SIP rules, intervene at the TCP layer. The issue was resolved by adding port 5670 to the Access List for the Session Manager and increasing the policy threshold, because the firewall had classified the traffic as a SIP brute-force or TDoS attack and quarantined it.
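A rate-based policy threshold of this kind can be pictured as a sliding-window counter. The Python sketch below is only an illustration of the idea, not the Session Manager’s or firewall’s actual algorithm. The class name and every number in it are assumptions, chosen so that a steady 2 cps load trips the low threshold after 20 calls, similar to the symptom described above.

```python
from collections import deque


class SlidingWindowRateLimit:
    """Allow at most max_events arrivals within any window_s seconds."""

    def __init__(self, max_events: int, window_s: float) -> None:
        self.max_events = max_events
        self.window_s = window_s
        self._events = deque()  # timestamps of accepted arrivals

    def allow(self, now_s: float) -> bool:
        # Forget arrivals that have aged out of the window.
        while self._events and now_s - self._events[0] >= self.window_s:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            return False  # a real device might send a TCP RST at this point
        self._events.append(now_s)
        return True


# Hypothetical policy values; the real ones are not given in this article.
strict = SlidingWindowRateLimit(max_events=20, window_s=60.0)
relaxed = SlidingWindowRateLimit(max_events=150, window_s=60.0)

# 100 call attempts at a steady 2 calls per second.
arrivals = [i * 0.5 for i in range(100)]
print(sum(strict.allow(t) for t in arrivals))   # 20
print(sum(relaxed.allow(t) for t in arrivals))  # 100
```

The point of the sketch is that legitimate load test traffic at a constant rate looks identical to an attack if the threshold was sized for normal trickle traffic rather than provisioned peak volume.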
Problem #2: ALG Issue
Another network we load tested had ALG functionality enabled on the firewall between the SBC and PBX. This “deep inspect” of every SIP message and header altered the SIP in a way that distinctly violated the SIP protocol. During tracing, we saw that many of the INVITEs reaching the PBX contained two INVITEs merged into one SIP INVITE message, with some SDP headers within the INVITE improperly formatted.
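A quick way to spot this kind of damage in captured traffic is to count INVITE request lines inside each message. The Python sketch below shows one possible check and is not how any particular tracing tool works. It assumes each captured message is available as text, and the function names and example addresses are hypothetical.

```python
import re

# A SIP request line looks like "INVITE <request-uri> SIP/2.0" at the start
# of a line. Header lines such as "CSeq: 1 INVITE" do not match.
INVITE_REQUEST_LINE = re.compile(r"^INVITE +\S+ +SIP/2\.0\r?$", re.MULTILINE)


def count_invite_request_lines(raw_message: str) -> int:
    """Count INVITE request lines found in one captured SIP message."""
    return len(INVITE_REQUEST_LINE.findall(raw_message))


def looks_merged(raw_message: str) -> bool:
    """A well-formed message carries exactly one request line."""
    return count_invite_request_lines(raw_message) > 1


clean = (
    "INVITE sip:100@pbx.example.invalid SIP/2.0\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
merged = clean + (
    "INVITE sip:101@pbx.example.invalid SIP/2.0\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
print(looks_merged(clean))   # False
print(looks_merged(merged))  # True
```

Running a check like this on traces taken on both sides of the firewall helps show which device introduced the corruption.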
Resolution:
As you have heard from us many times before, if you must install a firewall or an IAD in the path of a pure voice channel that is secured by a high-end SBC such as an Oracle SBC, please turn off the ALG. An Oracle SBC is the most capable device for securing any SIP network. It more than competently protects against anything that tries to fuzz the SIP protocol. In addition, anything that is not SIP is immediately ignored.
Problem #3: Firewall Over Utilization
During one load test, a firewall in the path reached 88% CPU utilization while approaching the peak call load for the test. However, this happened on only one of the firewalls tested, even though the firewalls at a different data center had exactly the same hardware (VM) and software configuration. As a result, the MOS score dropped briefly from Green (>4.03) through Yellow to Red (<3.09) whenever firewall utilization was above 80%. For the firewall whose CPU did not spike above 80%, MOS was consistently Green.
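The color bands above can be expressed as a small classifier, which makes it easy to line up MOS samples against firewall CPU readings. This Python sketch uses the thresholds quoted in this article (Green above 4.03, Red below 3.09, with 80% CPU as the point of interest). The function names and sample values are hypothetical.

```python
from typing import List, Tuple

GREEN_ABOVE = 4.03     # MOS strictly above this is Green
RED_BELOW = 3.09       # MOS strictly below this is Red
CPU_THRESHOLD = 80.0   # percent CPU utilization of interest


def mos_band(mos: float) -> str:
    """Map a MOS value to the Green / Yellow / Red bands."""
    if mos > GREEN_ABOVE:
        return "Green"
    if mos < RED_BELOW:
        return "Red"
    return "Yellow"


def degraded_under_load(samples: List[Tuple[float, float]]) -> List[Tuple[float, float, str]]:
    """Return (cpu_pct, mos, band) for samples with high CPU and non-Green MOS."""
    return [
        (cpu, mos, mos_band(mos))
        for cpu, mos in samples
        if cpu > CPU_THRESHOLD and mos_band(mos) != "Green"
    ]


# Made-up (cpu_pct, mos) samples for illustration only.
samples = [(55.0, 4.3), (82.0, 3.6), (88.0, 3.0), (70.0, 4.2)]
print(degraded_under_load(samples))
```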
Resolution:
On further analysis, it was found that updates and backups were being auto-run on this firewall at the same time, which might have caused the higher CPU utilization. This makes sense: we do load testing during out-of-hours scheduled maintenance, and that is also when you would upgrade the firewall software and run backups. We also found performance bottlenecks when one of the network elements had diagnostics switched on during the load test.
Conclusions
Many of the problems above are not the typical bottlenecks and glitches you would usually think about. The list is endless. Impairments to full-service quality under maximum provisioned and planned capacity can hit you from any side. Load testing used to be an expensive exercise that meant calling in the big-name IT consultants. This is history.
SIP Load Testing from Teraquant gives you deep diagnostics and reporting on all performance levels (KPIs) and detailed information on voice quality right through to and including the endpoint, along with root cause analysis of bottlenecks and misconfigurations. Teraquant SIP Load Testing is easy to implement in both private and public cloud, scales for the largest of networks, and is low cost.