Understanding Media in SIP Session Description Protocol (SDP)

Reading Time: 9 minutes

Are You Struggling with Failed Negotiations or One-Way Audio?

There are many SIP networks where calls either fail entirely or run into problems when, for example, a conference bridge offers a wide range of codecs (some of them unsupported), causing failed negotiations or one-way audio. In addition, the increased use of VPNs, especially in current traditional SD-WAN implementations, causes incorrect mapping of IP addresses and longer latencies. Video and voice codec negotiations fail when there is no common codec among those offered by each endpoint. There may also be a problem with the SIP implementation between two vendors.

This post aims to give a high-level understanding of how the Session Description Protocol (SDP), carried within SIP messages, describes the media capabilities of VoIP phones and networks so that they can negotiate smooth, successful multimedia communications and a better user experience. When SDP in SIP is implemented correctly, it can support voice calls, video conferences, fax over IP and instant messages, as well as the distribution of multimedia. When it is not, successful communication is impossible. For SIP SDP to work, both endpoints must agree on which codec will be used for the call or conference in question.

What is Codec Negotiation?

A codec encodes the analog audio containing our speech (or video) and compresses it to fit the bandwidth available on the IP network. One measure of a codec's sophistication is its ability to carry high-definition or high-fidelity VoIP audio using only a small amount of network bandwidth. For example, the Opus and SILK codecs provide hi-fi quality speech up to the limit of the human ear (20 kHz), or a bit lower than that for older folks like me, at low digital bit rates (as low as 30 kbit/s).

Both endpoints involved in the phone conversation must agree on which codec is going to be used for a particular call in order to ensure interoperability and correct decoding of the audio sent. This process is called codec negotiation, and it takes place while SIP signaling is setting up the call. The originating endpoint offers its codec list as media descriptions in the Session Description Protocol (SDP) body of the SIP INVITE message, and the terminating endpoint then answers with a counter-offer during this negotiation.

If the negotiation fails, the switch or Session Border Controller (SBC) detects the mismatch: there is no common codec that both parties can offer and agree upon. It then generates a SIP 488 “Not Acceptable Here” response. In this case the call fails and setup never completes.
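The core of this negotiation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular switch or SBC implements it: codecs are represented as hypothetical (encoding name, clock rate) tuples, the helper names negotiate_codec and sip_status_for are invented for this example, and real negotiation also has to compare fmtp parameters, packetization time and so on.

```python
from typing import Optional


def negotiate_codec(offered: list[tuple[str, int]],
                    supported: set[tuple[str, int]]) -> Optional[tuple[str, int]]:
    """Return the first offered codec that the answerer also supports.

    The offer is walked in the order listed, as on the m= line. Encoding
    names are compared case-insensitively. None means no common codec.
    """
    wanted = {(name.upper(), rate) for name, rate in supported}
    for name, rate in offered:
        if (name.upper(), rate) in wanted:
            return (name, rate)
    return None


def sip_status_for(offered: list[tuple[str, int]],
                   supported: set[tuple[str, int]]) -> int:
    """Simplified mapping: 200 if a codec matched, otherwise 488 Not Acceptable Here."""
    return 200 if negotiate_codec(offered, supported) is not None else 488


if __name__ == "__main__":
    offer = [("G729", 8000), ("PCMU", 8000)]
    print(negotiate_codec(offer, {("PCMA", 8000), ("PCMU", 8000)}))
    print(sip_status_for([("G729", 8000)], {("PCMA", 8000)}))
```

With the sample offer, the answerer that only supports G.711 falls through G.729 and settles on PCMU; an answerer with nothing in common ends in the 488 case described above.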

Session Description Protocol (SDP)

Understanding and Troubleshooting – SDP in SIP

RFC 4566 (which obsoletes RFC 2327, and has itself since been obsoleted by RFC 8866) defines SDP in complete detail. SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation such as conference calls.

An SDP session description consists of several lines of text in the form <type>=<value>. The session description starts with a v= line (the version number, always zero), and each media-level section starts with an m= line. We will focus on the media section and media attributes in more detail.
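Because every SDP line has the form <type>=<value>, and each m= line opens a new media section, the overall structure is easy to recover. The sketch below is a minimal, hypothetical helper (split_sdp is not from any library) that separates session-level lines from per-media sections; it assumes a well-formed body and only checks the v=0 requirement.

```python
def split_sdp(sdp: str) -> tuple[list[tuple[str, str]], list[list[tuple[str, str]]]]:
    """Split an SDP body into session-level lines and per-media sections.

    Lines before the first m= belong to the session level; each m= line
    starts a new media section that collects the lines following it.
    """
    session: list[tuple[str, str]] = []
    media: list[list[tuple[str, str]]] = []
    for raw in sdp.splitlines():          # handles the CRLF line endings SDP uses
        line = raw.strip()
        if not line:
            continue
        if len(line) < 2 or line[1] != "=":
            raise ValueError(f"malformed SDP line: {raw!r}")
        kind, value = line[0], line[2:]
        if kind == "m":
            media.append([(kind, value)])     # new media-level section
        elif media:
            media[-1].append((kind, value))   # belongs to the current media section
        else:
            session.append((kind, value))     # still at session level
    if not session or session[0] != ("v", "0"):
        raise ValueError("SDP must start with v=0")
    return session, media
```

For example, a body containing v=, o=, s=, c= and t= lines followed by one m=audio line and its a=rtpmap attribute yields five session-level lines and a single media section.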

m=<media> <port> <proto> <fmt> ...

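Where <media> is the media type (such as audio, video or application), <port> is the transport port to which the media is sent, <proto> is the transport protocol, which depends on the connection type, and <fmt> is the media format description.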

For the <proto> subfield, the common values for RTP media are RTP/AVP and RTP/SAVP. RTP/AVP is the RTP Audio/Video Profile carried over UDP, whereas RTP/SAVP indicates Secure RTP (encrypted SRTP media) running over UDP.

The c= (connection) field indicates the IP address to which the other end should send the RTP media.

The <fmt> field indicates the payload type. The details of common payload types are shown in Table 1.1 below. For non-RTP media, the <proto> field can be set to udp, as in the example below, which transports the whiteboard (wb) media type over UDP.

m=application 32416 udp wb

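The m= line therefore identifies the type of payload (audio, video, application and so on) together with the UDP port number on which the media is received; in the example above, the media type is application and the port is 32416.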
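To make the field layout concrete, here is a minimal Python sketch that splits an m= value into <media>, <port>, <proto> and <fmt>, and pulls the address out of a c= value. The MediaLine class and both function names are hypothetical; the sketch assumes well-formed input and simply drops the optional /<number of ports> and multicast /ttl suffixes.

```python
from dataclasses import dataclass


@dataclass
class MediaLine:
    media: str        # e.g. "audio", "video", "application"
    port: int         # transport port the media should be sent to
    proto: str        # e.g. "RTP/AVP", "RTP/SAVP", "udp"
    fmts: list[str]   # RTP payload type numbers, or format names such as "wb"


def parse_m_line(value: str) -> MediaLine:
    """Parse the value of an m= line: <media> <port> <proto> <fmt> ..."""
    parts = value.split()
    if len(parts) < 4:
        raise ValueError(f"m= line needs at least 4 fields: {value!r}")
    media, port, proto, *fmts = parts
    # The port may be written as <port>/<number of ports>; keep the base port only.
    return MediaLine(media, int(port.split("/")[0]), proto, fmts)


def parse_c_line(value: str) -> str:
    """Return the connection address from c=<nettype> <addrtype> <address>."""
    _nettype, _addrtype, address = value.split()
    # Multicast addresses may carry /ttl suffixes; this sketch strips them.
    return address.split("/")[0]
```

Applied to the whiteboard example, parse_m_line("application 32416 udp wb") gives media "application", port 32416, proto "udp" and the single format "wb".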

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]

This attribute provides more details, such as the encoding, clock rate and encoding parameters, of a payload type used in the m= line. Up to one rtpmap attribute can be defined for each media format specified.

a=fmtp:<format> <format specific parameters>

Codec-specific parameters can be included through the fmtp attribute. Thus, we might well have the following:

m=audio 49230 RTP/AVP 18 96 97 98
a=rtpmap:18 G729/8000/1
a=rtpmap:96 L8/8000
a=rtpmap:97 L16/8000
a=rtpmap:98 L16/11025/2
a=fmtp:18 annexb=yes
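The rtpmap and fmtp lines in the example above can be folded into a single table keyed by payload type. The following Python sketch does that; parse_rtpmap_fmtp is a hypothetical helper, it takes the attribute text after "a=", and it assumes, as RFC 4566 specifies for audio, that an omitted channel count means one channel.

```python
def parse_rtpmap_fmtp(attributes: list[str]) -> dict[int, dict]:
    """Build payload type -> codec details from a= attribute values.

    Each item is the text after "a=", e.g. "rtpmap:18 G729/8000/1".
    """
    codecs: dict[int, dict] = {}
    for attr in attributes:
        if attr.startswith("rtpmap:"):
            pt, encoding = attr[len("rtpmap:"):].split(" ", 1)
            name, clock, *params = encoding.split("/")
            codecs.setdefault(int(pt), {}).update(
                name=name,
                clock_rate=int(clock),
                # Audio defaults to one channel when the parameter is omitted.
                channels=int(params[0]) if params else 1,
            )
        elif attr.startswith("fmtp:"):
            pt, params = attr[len("fmtp:"):].split(" ", 1)
            codecs.setdefault(int(pt), {})["fmtp"] = params   # codec-specific options
    return codecs
```

Fed the five attribute lines of the example, payload type 18 resolves to G729 at 8000 Hz with fmtp "annexb=yes", and payload type 98 to two-channel L16 at 11025 Hz.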

During the early stages of RTP development, statically assigned payload types (0–34) were used to bind encodings to payload types. Since the payload type number space is relatively small, it cannot accommodate static assignments for all existing and future encodings, so payload type numbers in the range 96–127 are reserved for dynamically assigning a codec for a particular call. Payload types 35–95 are unassigned and reserved for future use.

So the m= line summarizes all the codecs offered, while the a= lines go into detail and provide the specific codec definitions, including any payload type numbers from 96–127 that are dynamically allocated for that call.

For static payload types, the a=rtpmap attribute may be omitted if the payload details exactly match the static definition in the RTP audio/video profile for that payload type. A dynamic payload type, however, always needs an rtpmap. For example, 16-bit linear PCM stereo audio sampled at 16 kHz, using dynamic RTP/AVP payload type 98 for this stream:

m=audio 49232 RTP/AVP 98
a=rtpmap:98 L16/16000/2
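The static-versus-dynamic rule can be expressed as a small lookup. In this Python sketch, STATIC_AUDIO holds only a handful of the RTP/AVP static assignments from Table 1.1 (an illustrative subset, not the full profile), and resolve_payload is a hypothetical helper: an explicit rtpmap always wins, static types fall back to the profile, and a dynamic type without an rtpmap is treated as an error.

```python
# Illustrative subset of RFC 3551 static audio assignments:
# payload type -> (encoding name, clock rate, channels)
STATIC_AUDIO: dict[int, tuple[str, int, int]] = {
    0: ("PCMU", 8000, 1),
    3: ("GSM", 8000, 1),
    8: ("PCMA", 8000, 1),
    9: ("G722", 8000, 1),
    18: ("G729", 8000, 1),
}


def resolve_payload(pt: int, rtpmap: dict[int, tuple[str, int, int]]) -> tuple[str, int, int]:
    """Return (encoding, clock rate, channels) for a payload type."""
    if pt in rtpmap:              # an explicit a=rtpmap always takes precedence
        return rtpmap[pt]
    if pt in STATIC_AUDIO:        # static type: rtpmap may legitimately be omitted
        return STATIC_AUDIO[pt]
    if 96 <= pt <= 127:           # dynamic type: meaningless without an rtpmap
        raise ValueError(f"dynamic payload type {pt} has no a=rtpmap")
    raise ValueError(f"unknown payload type {pt}")
```

So "m=audio 49232 RTP/AVP 0" with no attributes still resolves to PCMU, while payload type 98 only resolves once its a=rtpmap:98 L16/16000/2 line is present.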

SDP also indicates when fax over IP is being offered as a service for a particular call, describing the capabilities of the fax machine and how to negotiate fax transmission. If you need to troubleshoot T.38, it is necessary to understand these SDP settings, which can be viewed on a SIP network monitor.
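When hunting for a T.38 problem in a trace, a first step is simply to spot the fax offer. The Python sketch below assumes the common form in which T.38 is offered as an m=image line with the udptl protocol and t38 format, followed by a=T38... attributes such as T38FaxVersion and T38MaxBitRate; find_t38 is a hypothetical helper and does not validate the attribute values.

```python
from typing import Optional


def find_t38(sdp: str) -> Optional[dict[str, str]]:
    """Return the a=T38... attributes if the SDP offers T.38 fax, else None.

    Looks for an "m=image <port> udptl t38" section and collects the
    T38 attributes that follow it, up to the next m= line.
    """
    attrs: Optional[dict[str, str]] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if attrs is not None:
                break  # the T.38 section ends at the next media line
            fields = line[2:].split()
            is_t38 = (len(fields) >= 4 and fields[0] == "image"
                      and fields[2].lower() == "udptl" and "t38" in fields[3:])
            attrs = {} if is_t38 else None
        elif attrs is not None and line.startswith("a=T38"):
            name, _, value = line[2:].partition(":")
            attrs[name] = value  # flag-style attributes get an empty value
    return attrs
```

An empty dictionary (rather than None) means a T.38 media line was offered but carried no T38 attributes, which is itself worth checking when a fax call fails.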

RFC 3551 specifies an initial set of “payload types”. Table 1.1 below lists some of the common payload types defined in RFC 3551 and extends that list for easy reference.

Table 1.1  – Common payload types defined in RFC 3551

Payload ID | Encoding Name | Audio/Video | Clock Rate (Hz) | Channels | Reference | Description
0 | PCMU | A | 8000 | 1 | RFC 3551 | ITU-T G.711 PCM μ-law audio, 64 kbit/s
3 | GSM | A | 8000 | 1 | RFC 3551 | European GSM Full Rate audio, 13 kbit/s (GSM 06.10)
4 | G723 | A | 8000 | 1 | RFC 3551 | ITU-T G.723.1 audio
8 | PCMA | A | 8000 | 1 | RFC 3551 | ITU-T G.711 PCM A-law audio, 64 kbit/s
9 | G722 | A | 8000 | 1 | RFC 3551 | ITU-T G.722 audio, 64 kbit/s
10 | L16 | A | 44100 | 2 | RFC 3551 | Linear PCM 16-bit stereo audio, 1411.2 kbit/s, uncompressed
11 | L16 | A | 44100 | 1 | RFC 3551 | Linear PCM 16-bit mono audio, 705.6 kbit/s, uncompressed
12 | QCELP | A | 8000 | 1 | RFC 3551 | Qualcomm Code Excited Linear Prediction
15 | G728 | A | 8000 | 1 | RFC 3551 | ITU-T G.728 audio, 16 kbit/s
18 | G729 | A | 8000 | 1 | RFC 3551 | ITU-T G.729 and G.729a audio, 8 kbit/s; Annex B is implied unless the annexb=no parameter is used
26 | JPEG | V | 90000 | N/A | RFC 2435 | JPEG video
31 | H261 | V | 90000 | N/A | RFC 4587 | ITU-T H.261 video
32 | MPV | V | 90000 | N/A | RFC 2250 | MPEG-1 and MPEG-2 video
33 | MP2T | AV | 90000 | N/A | RFC 2250 | MPEG-2 transport stream
34 | H263 | V | 90000 | N/A | RFC 3551/2190 | ITU-T H.263 video
35–95 | | | | | | Unassigned/Reserved
96–127 | dynamic | | | | RFC 3551 | Payloads defined dynamically during a session

Note the difference between kHz and kilobits per second used here. Kilohertz (kHz) describe the bandwidth of an analog signal, while kilobits per second (kbit/s) describe the bit rate of the digital stream produced when that analog signal is sampled, encoded and compressed for transmission across the network. For example, 20 kHz of analog bandwidth (excellent high-fidelity voice) might be compared with the 3.7 kHz of analog bandwidth transmitted down an analog POTS line or carried in a G.711 narrowband PCM VoIP phone conversation. By contrast, a high-bandwidth codec running at just 20 kbit/s can carry audio with up to 20 kHz of bandwidth (the full range of the human ear), plus background noise suppression and a good deal of forward error correction (FEC), which protects the audio from packet loss across the network. There's a lot of technology squeezed into a codec using 20 kilobits per second.
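The arithmetic behind the two units is short enough to write down. For uncompressed or companded PCM, bit rate is sample rate × bits per sample × channels, and a given sample rate can only represent analog bandwidth up to half that rate (the Nyquist limit). The two helpers below are hypothetical names used only for this illustration; they reproduce the G.711 and L16 figures in Table 1.1.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bit rate of uncompressed (or companded) PCM, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def nyquist_bandwidth_khz(sample_rate_hz: int) -> float:
    """Upper limit, in kHz, of the analog bandwidth a sample rate can represent."""
    return sample_rate_hz / 2 / 1000


if __name__ == "__main__":
    print(pcm_bit_rate_kbps(8000, 8))        # G.711: 8 kHz sampling, 8-bit samples
    print(pcm_bit_rate_kbps(44100, 16, 2))   # L16 stereo, payload type 10
    print(nyquist_bandwidth_khz(8000))       # why narrowband voice stays below 4 kHz
```

The gap between these PCM figures and the roughly 20 to 30 kbit/s quoted above for full-band codecs is exactly the work done by compression.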

Frequently Asked Questions

How does Session Description Protocol (SDP) work?

SDP is used to negotiate the specifics that allow streams of packets to flow between two network sockets, as well as the specifics of the stream itself (i.e., the codec used, etc.). In simple terms, one end (usually the originating side) offers a set of characteristics that it can support (i.e., the network socket it has created for communication, the codecs it can support, etc.). The receiving side then replies with a counter-offer, usually listing one of the offered codecs (which serves as acceptance of that codec) and identifying the socket it has established. Finally, the originating side either accepts or refuses the counter-offer. If it is accepted, the streams are established and the call proceeds; otherwise, the call is torn down with an error.
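The answering side of this exchange can be sketched in Python. This is a deliberately simplified illustration: build_answer is a hypothetical helper, offer_rtpmap must list every offered payload type (static ones included) as "<encoding>/<clock rate>", and a complete answer would also need the session-level lines (v=, o=, s=, t=) and one m= line for every m= line in the offer. It returns only the answer's media section, naming the answerer's own socket and the single codec it accepts.

```python
from typing import Optional


def build_answer(offer_m: str, offer_rtpmap: dict[int, str], supported: set[str],
                 local_ip: str, local_port: int) -> Optional[str]:
    """Answer one offered m= line with the first mutually supported codec.

    Returns the answer's media section, or None when nothing matches
    (in which case the call would be rejected).
    """
    wanted = {s.upper() for s in supported}
    media, _offer_port, proto, *fmts = offer_m.split()
    for fmt in fmts:                                  # walk the offer in its listed order
        encoding = offer_rtpmap.get(int(fmt))
        if encoding is not None and encoding.upper() in wanted:
            lines = [
                f"m={media} {local_port} {proto} {fmt}",  # answerer's own port
                f"c=IN IP4 {local_ip}",                   # where to send RTP to us
                f"a=rtpmap:{fmt} {encoding}",
            ]
            return "\r\n".join(lines) + "\r\n"
    return None
```

Given an offer of "audio 49230 RTP/AVP 18 0 8" and an answerer that supports only PCMU and PCMA, the answer skips G.729 and accepts payload type 0 on the answerer's address and port.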

What is an SDP event?

During call setup, SDP informs both parties whether each endpoint is capable of telephone events, i.e., DTMF carried as RTP events (RFC 2833). Once this has been negotiated and the call is set up, each endpoint knows that the other side supports telephone events/DTMF, and SDP is not needed again unless capabilities have to be renegotiated.
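Telephone-event support shows up in SDP as an a=rtpmap line whose encoding name is telephone-event, typically on a dynamic payload type (101 is a common choice, but any dynamic number may be used). The Python sketch below, built on two hypothetical helpers, checks whether both the offer and the answer advertise it; a real implementation would also check that the answer kept telephone-event on the negotiated m= line.

```python
from typing import Optional


def telephone_event_pt(sdp: str) -> Optional[int]:
    """Return the payload type mapped to telephone-event, if any."""
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            if encoding.split("/")[0].lower() == "telephone-event":
                return int(pt)
    return None


def dtmf_negotiated(offer: str, answer: str) -> bool:
    """True only when both sides advertise telephone-event in their SDP."""
    return telephone_event_pt(offer) is not None and telephone_event_pt(answer) is not None
```

If the answer drops the telephone-event mapping, dtmf_negotiated returns False, which is a common reason DTMF digits never reach an IVR even though the call itself connects.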

What's the difference between SIP and VoIP?

At its simplest, SIP refers to the specific protocol used to negotiate the audio or video conversation between two endpoints flowing over an IP network, while also being used to manage and maintain the ongoing conversation. VoIP, however, is an umbrella term that encapsulates a wider range of related protocols of which SIP is merely a single one. For example, VoIP can also encompass presence-related metadata that relates to the SIP session.

What is codec negotiation?

It is the process by which two endpoints agree on a common audio and/or video codec to use. It is almost always carried out using SDP.

What is an example of a codec?

A codec is an algorithm that transforms an audio or video data stream into a standardized encoded form, and back again at the other end, so that it can be readily transferred as data, whether over a network or by other means, such as a file. Different codecs exist to support differing performance and bandwidth constraints. G.711a and H263 are both examples of specific codecs, with the former being a purely audio codec and the latter purely video. Conversations that require both audio and video will typically negotiate each separately, but codecs do exist that support both simultaneously.


Teraquant has a Solution!

Locating the source of a failure starts with understanding the Session Description Protocol (SDP) and monitoring. If you do business over a VoIP network and rely on SDP/SIP for multimedia communications, we can equip you with the tools and know-how needed to ensure seamless communication every time; isolating problems in minutes, reducing operational costs, and yielding a better user experience for everyone.


About Richard Jobson

Richard is the Founder and President of Teraquant. He has been interested in speech quality assessment and the protocols that drive our telecom networks since 1993, and since 2014 he has worked with researchers in Public Safety Communications to develop better ways to measure the user experience.

Teraquant Reviews

  • "Palladion is actually now owned by Oracle, however there is one last standing VAR of the product with whom we have had great success. They are called Teraquant. The team there knows more about the product and are better suited to support it than Oracle, for sure.”

    Joshua Lesavoy CIO | Nextiva
  • “The support we have received from Teraquant has been great. They have been alongside us throughout and just a real pleasure to work with.”

    Chris Hall IT Director | Onvoy Communications, LLC
  • “We have just finished another round of Oracle Communications SBC & EOM training with Teraquant and, as expected, several folks from different teams reached out to me to express great satisfaction. Even though the basic usage is easy to learn; the training from Teraquant, showed us how to troubleshoot real issues with complex SIP call flows, allowed us to comprehend the tool’s true potential and empowered us to achieve huge productivity & improved service quality.”

    John Zulaloglu Sr. Communication Services Engineer | Intuit
