SIP Register Avalanches Occur When You Least Expect and Require Careful Measures to Recover Service
You’ve been up all night during a major maintenance operation on your network. Maybe you’ve been doing a planned software upgrade. Everything looks good and you’re looking forward to getting some sleep before the day starts. You bring the network back online and into service, allowing the SIP phones to register on your network. The problem is that during the planned outage, all the registrations have expired. So now every phone will seek to register at the same time.
The network is not engineered for this avalanche of registrations, all within the same second. It will overload. There will be another outage and the SIP phones will not be able to register. As soon as you fix the outage and bring service back online, the same avalanche will happen again for the same reason.
Careful metering of SIP registrations needs to be implemented in order to ever recover from this event; otherwise, it will happen over and over again.
SIP allows users on the internet to register with a cloud voice provider, exchanging authentication information in order to use its phone service.
RFC 3261 (SIP: Session Initiation Protocol) is the core SIP specification, and it defines the SIP REGISTER method. A REGISTER declares the address-of-record (AOR) of the user agent to the network, along with the contact address at which it can be reached for incoming phone calls.
These SIP registrations are stateful and expire after a short period, after which they must be renewed. The password credentials do not have to be resubmitted to renew the registration.
Here’s an example of a SIP REGISTER message:
REGISTER sip:teraquant.com SIP/2.0
Via: SIP/2.0/UDP phone1.teraquant.com;branch=z9hG4bK12345
From: <sip:user@teraquant.com>;tag=a7g8r
To: <sip:user@teraquant.com>
Call-ID: abc123def4567@phone_ext.teraquant.com
CSeq: 1 REGISTER
Contact: <sip:user@phone_ext.teraquant.com:5060>
Expires: 3600
The Expires header gives the time in seconds for which this registration remains valid, 3,600 seconds in this example. In practice, user agents typically re-register at about half that time, i.e. after 1,800 seconds in this case.
The problem arises because typical expiry intervals are more often 60 seconds than the 3,600 seconds shown here.
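To put rough numbers on the difference, here is a minimal Python sketch of the steady-state registration load. The fleet size of 100,000 endpoints and the function name `steady_state_rate` are assumptions chosen for illustration, and the sketch assumes every endpoint refreshes at half its expiry interval.

```python
def steady_state_rate(endpoints: int, expires_s: int) -> float:
    """Average REGISTER requests per second across the whole fleet.

    Assumes each endpoint refreshes at half its expiry interval and
    that refresh times are spread evenly.
    """
    refresh_interval_s = expires_s / 2
    return endpoints / refresh_interval_s


# Hypothetical fleet of 100,000 endpoints.
for expires in (3600, 60):
    print(expires, round(steady_state_rate(100_000, expires), 1))
```

With these assumed figures, the 3,600-second expiry gives about 55.6 REGISTER requests per second, while the 60-second expiry gives about 3,333.3 per second.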
What happens if a part of your network goes down or your SIP Registrar server becomes unavailable?
If the outage lasts longer than half the expiry period (typically just 30 seconds!), then when service resumes, all user agent devices will try to register at the same time. They flood the network with SIP REGISTER messages, and this brings down your network again.
The result is a continuing outage: as the service comes back into availability, it is hit by all endpoints seeking to register at once. This overloads the Call Agent, softswitch or SIP Registrar, and the whole avalanche happens again, a continuously repeating wave of failures. It becomes a real challenge to bring your voice service back into operation.
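The size of that first wave can be estimated with a short Python sketch. It assumes refresh times are spread evenly before the outage and that every endpoint that missed a refresh retries the moment service returns. The fleet size, the registrar capacity of 5,000 registrations per second, and both function names are hypothetical.

```python
def burst_on_restore(endpoints: int, refresh_s: float, outage_s: float) -> int:
    """Number of endpoints whose refresh fell due during the outage.

    Assumes refresh times are spread evenly across the refresh interval
    and that every endpoint that missed its refresh retries as soon as
    service returns.
    """
    fraction_due = min(1.0, outage_s / refresh_s)
    return int(endpoints * fraction_due)


def survives(burst: int, registrar_capacity_per_s: int) -> bool:
    """True if the registrar can absorb the burst within one second."""
    return burst <= registrar_capacity_per_s


# Hypothetical figures: 100,000 endpoints refreshing every 30 s,
# a 45 s outage, and a registrar rated at 5,000 registrations/s.
burst = burst_on_restore(endpoints=100_000, refresh_s=30, outage_s=45)
print(burst, survives(burst, registrar_capacity_per_s=5_000))
```

Once the outage is longer than the refresh interval, the whole fleet arrives in one burst, which in this example is twenty times what the assumed registrar can process in a second.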
This cataclysm requires careful treatment in order to recover the network and the service.
SIP Registration offload explained
Firewalls and NAT devices permit only outbound connections, from the trusted side to the untrusted side, and hold open a TCP or UDP socket, or pinhole, so that the response to the outbound-initiated session can pass back through the firewall. For example, SIP signaling leaving an enterprise premises through the NAT device may use source IP address 1.2.3.4 and UDP port 5060 to send a SIP INVITE that initiates an outgoing call.
For inbound phone calls, no such pinhole will be open. So how does a SIP INVITE get in through the firewall to initiate an incoming call? But wait… the endpoint inside the NAT device must already have registered on the network, so that the network knows where to send the SIP INVITE for an incoming call. That initial SIP registration was itself sent as an outbound UDP session. The pinhole it created in the NAT firewall must be kept open to allow inbound SIP packets for an incoming call.
This requires SIP protocol awareness, which is typically absent from a low-cost NAT device such as a cable modem, enterprise IAD or access router.
So Acme Packet (now Oracle SBC) stepped in to pioneer a patented technique called HNT, or Hosted NAT Traversal, which serves to keep the NAT pinhole open. The SBC requires the SIP endpoint behind the NAT device to send SIP REGISTER requests very frequently. In other words, the SBC sets a short expiry interval so that the IP phone re-registers often, thus keeping the pinhole of the amnesiac NAT device open.
This requires very low SIP registration expiry intervals, which would put a heavy load on the Class V switch or SIP Registrar. For example, the Expires header in the 200 OK response to the REGISTER may be set to 60 seconds, so that the endpoint refreshes its SIP registration every 30 seconds. The Acme Packet/Oracle SBC offloads this burden from the Class V switch or SIP Registrar by caching the registrations from the endpoints. The SIP core/switch then needs a re-registration from the SBC only every 3,600 seconds. The SBC caches the state of these SIP registrations, keeping them current (not expired), and this takes a huge load off the Class V switch or SIP Registrar.
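The offload idea can be illustrated with a toy Python model. This is a sketch of the caching principle only, not Oracle's implementation: the class name `RegistrationCache`, its fields, and the rule for when to forward to the core are all assumptions.

```python
class RegistrationCache:
    """Toy model of SBC registration offload.

    Endpoints are given a short expiry (which keeps the NAT pinhole
    open); the core registrar is refreshed only when its longer
    registration is about to lapse.
    """

    def __init__(self, access_expires=60, core_expires=3600):
        self.access_expires = access_expires
        self.core_expires = core_expires
        self.core_valid_until = {}   # AOR -> time the core registration lapses
        self.forwarded = 0           # REGISTERs sent on to the core

    def on_register(self, aor: str, now: float) -> int:
        """Handle one endpoint REGISTER; return the Expires to send back."""
        # Assumed rule: forward to the core if nothing is cached, or if
        # the cached registration would lapse before the expiry we are
        # about to grant the endpoint.
        valid_until = self.core_valid_until.get(aor, 0.0)
        if now + self.access_expires >= valid_until:
            self.forwarded += 1
            self.core_valid_until[aor] = now + self.core_expires
        return self.access_expires


cache = RegistrationCache()
# One endpoint refreshing every 30 s for one hour: 120 REGISTERs at the SBC.
for t in range(0, 3600, 30):
    cache.on_register("sip:user@teraquant.com", now=t)
print(cache.forwarded)
```

In this toy run, only 2 of the 120 REGISTER requests received from the endpoint are passed on to the core; the rest are answered from the cache.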
Techniques to protect against SIP Register avalanche events
Preventing user devices from re-registering until things are under control is not a desirable option, even where it is possible. It doesn’t scale, and it requires control of customers’ IP phones, which is rarely available.
Sophisticated Session Border Controllers like the Oracle SBC can be configured to control these re-registrations, throttling the pace and allowing the SIP registrar to recover gracefully.
Part of the problem is that all SIP re-registrations are due at the same time. Hence the overload occurs within a short time period, knocks out the SIP registrar again and the avalanche continues its destructive path as the SIP registrar is never able to stabilize the situation.
Protection techniques include the following categories

- When the SBC sees a high volume of registrations heading to the SIP Registrar, it can buffer these messages and deliver them to the Registrar at a more leisurely rate. You may be able to set this rate to exactly what the SIP Registrar can process. If incoming registrations exceed the capacity of the buffer, the SBC can send 401 challenges, requiring the user agent to provide its password again. This slows the process down and allows time for the voice service elements to recover.
- The SBC can stagger the times at which endpoints need to re-register, so they are not all trying to register at the same moment and causing the overload. The rate at which registrations reach the Registrar is thus smoothed out and paced.
  - This is done by incrementing the expiry interval in the 200 OK response to the REGISTER by one second for each batch of SIP registrations. Subsequent batches may therefore be required to re-register after 45 or 90 seconds, or longer, instead of after 30 seconds (if the expiry time specifies 60 seconds).
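Both techniques in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SBC configuration: `MeteredQueue`, `staggered_expires`, the drain rate of 200 per second, the buffer size of 1,000 and the batch size of 100 are all hypothetical.

```python
from collections import deque


class MeteredQueue:
    """Bounded buffer that releases REGISTERs at a fixed rate."""

    def __init__(self, rate_per_s: int, capacity: int):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, aor: str) -> str:
        """Queue a REGISTER, or challenge it when the buffer is full."""
        if len(self.buffer) >= self.capacity:
            return "challenge"      # endpoint must retry with credentials
        self.buffer.append(aor)
        return "queued"

    def drain_one_second(self) -> list:
        """Release at most rate_per_s REGISTERs toward the registrar."""
        count = min(self.rate_per_s, len(self.buffer))
        return [self.buffer.popleft() for _ in range(count)]


def staggered_expires(index: int, base_s: int = 60,
                      batch_size: int = 100, step_s: int = 1) -> int:
    """Expires value for the index-th registration processed.

    Each batch gets an expiry step_s longer than the previous batch, so
    the next round of refreshes is spread out instead of synchronized.
    """
    return base_s + (index // batch_size) * step_s


queue = MeteredQueue(rate_per_s=200, capacity=1000)
results = [queue.offer(f"sip:user{i}@example.com") for i in range(1500)]
released = queue.drain_one_second()
expires = [staggered_expires(i) for i in range(len(released))]
print(results.count("queued"), results.count("challenge"),
      len(released), expires[0], expires[-1])
```

Of the 1,500 simultaneous registrations in this run, 1,000 are buffered and 500 are challenged. The first second releases 200 to the registrar, and the second batch of 100 is granted an expiry of 61 seconds rather than 60.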
Expedited treatment of trusted User Agents
These next techniques are about moving trusted User Agents quickly through the Registrar and demoting more complex, less trusted User Agents.
For example, a cached SIP registration represents a User Agent that has already supplied valid user credentials and only needs to renew an existing registration. This can be passed quickly to the registrar. Resources are saved because no state has to be kept for a pending 401 password challenge.
Also, when the network is waiting for an endpoint to answer a 401 challenge with its password credentials, the SBC can keep the state of this SIP registration in memory. When the endpoint then sends its credentials, the SBC can quickly forward them to the registrar, getting this phone registered and out of the way.
In addition, if a registration request receives a 401/407 response (to the first REGISTER request) from the Registrar, the SBC can promote the SIP endpoint to a trusted access-control level, forward the authentication information from the Endpoint without having to reevaluate the source IP address, and get that user agent registered and out of the way quickly.
The idea is that Endpoints which quickly provide correct credentials have their registrations processed quickly. This saves resources and mitigates the avalanche event.
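One way to picture this expedited treatment is a priority queue in which more trusted registrations are forwarded first. The Python sketch below is an illustrative reading of the text, not the SBC's internal logic: the three classes, their ordering, and the names `classify` and `order_registers` are assumptions.

```python
import heapq
from itertools import count

# Lower number = served first. The three classes are an illustrative
# reading of the text, not an SBC configuration.
PRIORITY = {"cached_refresh": 0, "challenge_response": 1, "new": 2}


def classify(aor: str, has_credentials: bool, cache: set, pending: set) -> str:
    """Decide how much trust a REGISTER has earned."""
    if aor in cache:
        return "cached_refresh"         # already authenticated, just renewing
    if has_credentials and aor in pending:
        return "challenge_response"     # answering a challenge we remember
    return "new"                        # first contact, must be challenged


def order_registers(requests, cache: set, pending: set) -> list:
    """Return AORs in the order they would be forwarded to the registrar."""
    heap, seq = [], count()
    for aor, has_credentials in requests:
        kind = classify(aor, has_credentials, cache, pending)
        # seq keeps arrival order stable within a priority class
        heapq.heappush(heap, (PRIORITY[kind], next(seq), aor))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


requests = [("sip:new@example.com", False),
            ("sip:answer@example.com", True),
            ("sip:known@example.com", False)]
ordered = order_registers(requests,
                          cache={"sip:known@example.com"},
                          pending={"sip:answer@example.com"})
print(ordered)
```

Although the cached endpoint arrived last, it is forwarded first, followed by the endpoint answering a remembered challenge, with the unknown endpoint served last.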
session-constraints versus sip-config options: which should I use?
Quite possibly both! But consider what it is you’re protecting.
The sip-config options are typically outward facing, i.e. they offer relief to access side SIP endpoints that are experiencing difficulties in the wake of a SIP registration avalanche.
Speak to Teraquant for advice on how to proceed.
Test to expose any weaknesses and verify your SBC is correctly configured
SIP REG avalanches are a built-in consequence of SIP registrations that expire within a short time. ALGs and CPE NAT demarcation devices are simple, low-cost items. They are the cause of most one-way audio. See another of Teraquant’s highly popular technical articles here.
However, help is at hand. You can sleep well at night once you’ve configured your SBC correctly, and tested it in your lab.
Teraquant can help you with this in a simple test exercise lasting only a few hours. We have a simple process to generate the load needed to simulate as many endpoints as you have. You then trigger the failure/outage scenario, and we test how the correctly configured SBC smooths out the load and gracefully registers all your expired IP phones/SIP endpoints.



