MCPTT Standards Require Measuring Mouth to Ear (M2E) Latency

The Mission Critical Push-To-Talk (MCPTT) standards call for salient MCPTT KPIs (Key Performance Indicators) that concisely characterize the performance of an MCPTT system as experienced by the user. Other measurements required by these standards include speech intelligibility, PTT access time and voice quality (PESQ/POLQA).

In this article, we will discuss true end-to-end audio latency or mouth-to-ear latency. It is essential to optimize latency because high latency leads to cold speech, people talking over each other and verbally seeking reassurance that the receiving party is still present and being heard.

This leads to unproductive conversations, poor communication and, in first-responder situations, the loss of timely and vital communication needed to save lives. As with improving anything, the first step to optimizing latency is to measure its current value.

Distinguishing Audio Mouth to Ear (M2E) Latency from Packet Latency

Packet latency is not the same as the real latency experienced by users. “Latency” usually refers to packet latency: typically the time from when data leaves a computer, such as a client, until it reaches a server in the cloud. The data travels over a packetized IP network where other users’ data streams compete for bandwidth and router resources. Packets can be lost, congested, jittered and delayed as they pass through these routers, and this impacts voice quality. Therefore, packet latency is the latency most frequently measured.

However, for voice services, the end user’s experience of latency runs from when the speaker makes the first audio utterance to when the listener hears that utterance at their ear. We call this Mouth-to-Ear latency. Complex signal processing components in the devices, operating at the voice systems layer, introduce propagation delay. This delay adds to the packet latency to give the overall audio latency, or mouth-to-ear (M2E) latency, experienced by users.

This audio latency measurement is important for all telephony and unified communications, such as cell phone calls and Zoom and Teams conferences. However, it is even more important for emergency services, first responders and public safety personnel, who keep us all safe. Their extensive use of Mission Critical Push-To-Talk involves the complexities of the MCPTT signaling process, and setting up a call or group conference session introduces significant end-to-end latency.

This is especially so where the voice transfers from legacy FM radio (P25 LMR) to sophisticated cellular 4G-based radio, as with the FirstNet system, in addition to the packet latencies of the LTE IP Evolved Packet Core network.

This article shows why the two quantities give different measurement values. 

Research, Technology and Standards are Available to Help

The Public Safety Communications Research division (PSCR, part of NIST) provides the highest level of technical resource from the federal government for the benefit of first responder engineers and state public safety authorities. PSCR provides research, defines the MCPTT architecture, represents the USA in the 3GPP standards organization for Mission Critical Push-To-Talk and formally defines the measurements and MCPTT KPIs recommended specifically for field testing of wireless voice services for first responders.

These measurements comprehensively define the delays involved in starting an MCPTT session and all the system performance parameters that make up user experience. We will discuss the complexities of MCPTT signaling performance in a later article. This article deals solely with the measurement of audio latency, or mouth-to-ear (M2E) latency, and with the components within the audio path, such as the network and the devices, that make up propagation delay and cause audio latency.

Why is measuring packet latency or network latency not sufficient?

[Figure: audio latency]

As you know, the smartphone is probably the most sophisticated device designed by human beings that is built in large volumes and packed into such a small physical form factor. Here is what happens to a voice signal: it is picked up acoustically by a microphone and converted to a small analog electrical signal. It is then amplified and may even be companded so that the variation in the power of the voice signal, from loud to soft, fits into the dynamic range of the voice network.
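To make the companding step concrete, here is a minimal sketch using the mu-law curve known from ITU-T G.711. This is an assumption chosen for illustration: the article does not say which companding law, if any, a given handset applies. The function names are hypothetical.

```python
import math

# Assumption: mu = 255, the value used by G.711 mu-law. Illustrative only.
MU = 255


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Compress a sample in [-1.0, 1.0]; quiet samples are boosted relative to loud ones."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Invert mu_law_compress, recovering the original sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law_compress(sample), 3))
```

A quiet sample such as 0.01 is mapped to a much larger fraction of full scale than its raw amplitude, which is how the loud-to-soft range is squeezed into the available dynamic range.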

After this initial analog signal leveling and conditioning, the audio is put through an analog-to-digital (A/D) converter and submitted to the highly complex voice encoder. The encoder compresses the speech frequency components and the phonemes ∗, takes out the many pieces of silence and squeezes the result into a signal that requires much less bandwidth than the original audio ∗∗. We can now be efficient with the network’s resources, both in RF bandwidth and across the IP/IMS transport network, which in LTE/4G is called the Evolved Packet Core (EPC).

Before it is transmitted over the RF network, the encoded audio must be packetized and a protocol header added. It is then turned into bitstreams and blocks, which are modulated onto an RF carrier signal and radiated from the smartphone antenna to the base station, or eNodeB, as in the 4G/LTE network used by FirstNet.

The packetized, encoded voice data must then transit the IP network with its attendant packet-only delays. This is followed by the inverse process within the receiving mobile, which depacketizes, decodes and D/A converts the signal before presenting an analog electrical, then acoustic, signal to the receiving person’s ear.

All these processes add significant latency which is perceived by the users in addition to the packet latency across the IP network.
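The difference between the two quantities can be sketched as a simple latency budget. Every number below is a made-up placeholder, not a measurement or a value from any standard, and the `Stage` type and function names are hypothetical. The sketch only shows that a packet-level probe sees the network transit stage, while the user experiences the sum of every stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    delay_ms: float    # placeholder value, for illustration only
    in_network: bool   # True if a packet-level probe would observe this delay


# Assumption: stage names follow the audio path described above; delays are invented.
STAGES = [
    Stage("microphone, amplifier and A/D conversion", 5.0, False),
    Stage("voice encoder", 25.0, False),
    Stage("packetization and RF modulation", 20.0, False),
    Stage("IP network transit", 60.0, True),
    Stage("depacketization", 40.0, False),
    Stage("voice decoder and D/A conversion", 10.0, False),
]


def packet_latency_ms(stages):
    """Latency a packet-only measurement would report."""
    return sum(s.delay_ms for s in stages if s.in_network)


def mouth_to_ear_ms(stages):
    """Latency the user experiences: every stage in the audio path."""
    return sum(s.delay_ms for s in stages)


if __name__ == "__main__":
    print("packet latency (ms):", packet_latency_ms(STAGES))
    print("mouth-to-ear latency (ms):", mouth_to_ear_ms(STAGES))
```

With real per-stage figures substituted, the same two sums show how much of the user-perceived delay a packet-only measurement would miss.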


The process is more complicated when the audio travels via the MCPTT server, including the signaling conversions from legacy FM and P25 to LTE networks.

Measuring True Latency as Experienced by the End User


Audio Mouth to Ear (M2E) latency must be measured from the headset using an audio pulse with a clean, sharp edge on which to measure the time interval, as shown above. A precise voltage point on the generated test audio impulse signal is used as the reference point that starts the latency measurement. The exact same voltage point is used on the receiving device to terminate the latency measurement. Making this measurement with the required accuracy requires analog test equipment with exceptional voltage sensitivity and resolution, and with time synchronization between the two measurement locations.
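The threshold-crossing idea can be sketched in software as follows. This is an illustration, not a description of any particular test instrument. It assumes both captures are already digitized at the same sample rate and share a common, synchronized time origin; the function names and parameters are hypothetical.

```python
from typing import Optional, Sequence


def first_crossing_s(samples: Sequence[float], sample_rate_hz: float,
                     threshold_v: float) -> Optional[float]:
    """Time (s) at which the signal first rises through threshold_v.

    Linear interpolation between the two samples that straddle the
    threshold gives sub-sample resolution on a sharp edge.
    """
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < threshold_v <= cur:
            frac = (threshold_v - prev) / (cur - prev)
            return (i - 1 + frac) / sample_rate_hz
    return None  # the edge never reached the threshold


def mouth_to_ear_latency_s(sent: Sequence[float], received: Sequence[float],
                           sample_rate_hz: float,
                           threshold_v: float) -> Optional[float]:
    """Interval between the same voltage point on the sent and received pulses."""
    t_sent = first_crossing_s(sent, sample_rate_hz, threshold_v)
    t_recv = first_crossing_s(received, sample_rate_hz, threshold_v)
    if t_sent is None or t_recv is None:
        return None
    return t_recv - t_sent
```

Using the same `threshold_v` at both ends mirrors the requirement that the identical voltage point starts and stops the measurement. Any clock offset between the two capture locations would add directly to the result, which is why time synchronization matters.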

Measurements as the Feed to Big Data Analytics and Drilling Down On Root Cause

It is also essential that these measurements are made in real time and are automated, so that thousands of measurements of the same quantity or KPI (e.g. mouth-to-ear (M2E) latency) can be made against the many differing variables and environments in the field.

These differing variables and environments include:

  1. Mobile devices.
  2. Device software releases.
  3. MCPTT mechanisms:
    • Enhanced Push-To-Talk
    • FirstNet push-to-talk
    • Software PTT vs. hardware PTT, etc.
  4. RF signal strength:
    • Designated as Reference Signal Received Power (RSRP) in 4G LTE networks.
  5. Network conditions, such as packet loss, jitter and packet latency in order to understand timeouts for signaling transactions which may require retransmission.
  6. Location updates and hand-offs.
  7. Service or network providers offering FirstNet voice.
  8. MCPTT service topologies:
    • Direct Mode: mobile device to mobile device direct communication, bypassing the cell tower/network.
    • Smartphone to LMR P25 legacy FM radios, etc.
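One way to see why automation is needed is to enumerate the combinations of variables listed above. The sketch below builds a test matrix from a small subset of them. The device names and RSRP values are hypothetical placeholders, and `build_test_matrix` is an illustrative helper rather than part of any test tool.

```python
from itertools import product

# Assumption: a reduced set of variables with placeholder values.
VARIABLES = {
    "device": ["device_a", "device_b"],
    "ptt_mechanism": ["Enhanced Push-To-Talk", "FirstNet push-to-talk"],
    "rsrp_dbm": [-85, -105, -115],
    "topology": ["direct mode", "smartphone to LMR"],
}


def build_test_matrix(variables):
    """Return one dict per combination of variable values."""
    keys = list(variables)
    return [dict(zip(keys, combo))
            for combo in product(*(variables[k] for k in keys))]


if __name__ == "__main__":
    matrix = build_test_matrix(VARIABLES)
    print(len(matrix), "test conditions")
    print(matrix[0])
```

Even this reduced set yields dozens of conditions, each of which needs many repeated measurements, so manual testing quickly becomes impractical.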

Only by making many measurements of the MCPTT KPIs and the top-line measurements that track user experience, against the above network scenarios, can you accumulate sufficient measurements versus variables to identify the bottlenecks and isolate the devices or parts of the network that are causing the problems and delays.
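As a rough illustration of drilling down on root cause, the sketch below groups M2E measurements by one variable and ranks the groups by median latency. The record layout (a dict with an `m2e_ms` field) and the sample values are assumptions made for this example; a real deployment would feed measured data into whatever analytics platform is in use.

```python
from collections import defaultdict
from statistics import median


def summarize(measurements, key):
    """Group M2E latencies by `key` and rank groups, worst median first."""
    groups = defaultdict(list)
    for m in measurements:
        groups[m[key]].append(m["m2e_ms"])
    summary = {
        value: {"count": len(v), "median_ms": median(v), "max_ms": max(v)}
        for value, v in groups.items()
    }
    return dict(sorted(summary.items(),
                       key=lambda kv: kv[1]["median_ms"], reverse=True))


# Invented sample records, for illustration only.
SAMPLE = [
    {"device": "device_a", "m2e_ms": 200},
    {"device": "device_a", "m2e_ms": 220},
    {"device": "device_b", "m2e_ms": 350},
    {"device": "device_b", "m2e_ms": 410},
    {"device": "device_b", "m2e_ms": 390},
]

if __name__ == "__main__":
    for value, stats in summarize(SAMPLE, "device").items():
        print(value, stats)
```

Running the same summary against each variable in turn (device, software release, RSRP band, topology) points to where the largest delays cluster.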

To ensure MCPTT services are fast, responsive and provide clear, intelligible communication, the above testing must be performed both in the lab and in the field. This allows you to control the variables in a repeatable test environment while comparing the results with a representation of an operational deployment.

Drive testing must not only measure RF signal strength but also end-user experience including:

  1. Audio E2E delay.
  2. Speech quality measurements i.e. PESQ/POLQA.
  3. The four MCPTT KPIs mentioned above.

Future articles will cover these measurement techniques in more detail. This article has focused on the importance of making audio latency measurements in addition to packet latency measurements, and has given the reasons for this.

Contact Us for Help to Characterize Your End Users’ Experience

If you would like help implementing such measurements, give us a call. Or if you wish to learn more about the techniques and how to make the measurements, follow us on LinkedIn or subscribe to our “no fluff tech tips”, where we will be covering PESQ/POLQA, intelligibility measurements and the four MCPTT KPIs in more detail.

∗ phonemes: the sound components that make up the syllables of spoken language

∗∗ bandwidth: in terms of kilobits per second (kbps)

Please get in touch as below to schedule your loaner equipment so you can make the measurements in your own environment and discover the difference. If you have any questions, please just send us an email using the Contact Us button below.