Confusing RTP Issue With Endpoints Over VPN

Just providing an update. The ISP had to send out a tech to perform some infrastructure repair on their equipment, which resolved the sudden increase in latency and dropped packets that we experienced last week.
However, that still has not resolved all of the phone calling issues. If a person calls in, or if a user calls another extension, it is 50/50 whether both people will hear each other. If not, then one or the other person will call back, and then both can hear each other. I can’t see anything in the logs that even shows that happening. I am definitely at my wit’s end trying to figure this out.

I’m desperately trying to find anything in the logs that will point me in the right direction. I had to disable debug logging due to storage limitations.

One question I keep forgetting to ask. I’ve seen people on YouTube show this before, but it’s often so quick that it’s hard to see how they did it.

How do I access the Asterisk CLI via a Putty SSH session? Then, how do I view the live logs in the Asterisk CLI for only one extension? Again, I’ve seen the results of that before, but it was never shown how to get there in SSH.

$ sudo -u asterisk /usr/sbin/asterisk -r
…might get you CLI, depending on the user you SSH’d in as.

Probably best to watch all of the SIP logs. Things like “core set debug 3” and “core set verbose 7” and “pjsip set logger on” and “rtp set debug on” are your friends there… but the CLI offers tab-completion, so type in the first word or two then hit Tab to see all the options, which vary based on your Asterisk version. You may be able to run “pjsip set logger host <name/subnet>” or “rtp set debug ip 1.2.3.4”

This reply is to show that I am still working on this issue, and to also keep this thread open.

It is incredibly frustrating when things don’t make sense, and even verbose logs do not show anything wrong. I even entered the Asterisk verbose CLI to watch a call that experienced the random issue, but nothing was shown. In fact I saw “GOT” and “SENT” RTP packets to both endpoints. As described above, the caller hung up and called back immediately and the audio was fine. Those logs were no different than the previous call.

To help clarify some additional information, the phones that are experiencing the bulk of this issue are old Cisco SPA308G models. I have replaced one of them with a Yealink t-43g after converting that extension to PJSIP. So far that user has not experienced this audio issue with external calls, but has experienced it whenever a user on a Cisco phone calls her extension, or vice versa. That’s why I am suspicious of these Cisco phones.

Of course it doesn’t help when the client responds with “well it worked before.” In this case they mean before we replaced their ancient routers, which used a site-to-site IPSEC VPN tunnel, with new next-generation firewalls that use a site-to-site Wireguard VPN tunnel. It seems like the RTP traffic is the only thing experiencing issues across the VPN. All other site-to-site traffic greatly improved after the upgrade.

I don’t know if that helps or not. I don’t want to provide any specific network information due to security concerns (in fact I might eventually delete the log file I attached earlier). I am just at my wit’s end!

It’s difficult to tell the client to replace these Cisco phones with other brands that are much more compatible and are currently supported, especially when “it worked before.”

The recent change I made in the Asterisk SIP settings was to remove the STUN address in the Media Transport field, since that should be blank. I had found connection timeout errors in the log for STUN. Unfortunately that did not help. Yes, I ran a ‘fwconsole restart’ after applying that change.

Just curious, what testing, if any, did you do after converting the extension (and changing the port number on the Cisco), but before replacing the phone, with what results?

A few thoughts if I’ve read everything correctly here…

The Cisco phones only support Chan_SIP (are you sure about this?). Chan_SIP, while still “available,” is “deprecated.”

Only the Cisco phones have the issue… That model phone hasn’t had a firmware update released since 2019 that I can see… “It worked before” well, so did lots of things we don’t use anymore, so we’re here now. Do they still have their computers from 2010?

How much time/labor are you charging them for this troubleshooting caused by old equipment that’s possibly not compatible with new systems??? Sometimes it just doesn’t make sense to save a few dollars on needed new equipment. How many endpoints are we talking about if replacing them is needed? If it’s a dozen, eh… if it’s a couple dozen then ok I can see being a little reluctant.

It’s never a fun conversation with a client to tell them they need to upgrade hardware but sometimes it’s what makes sense.

Stewart,

I did not change the extension to PJSIP while the Cisco was in place. These Cisco phones are not compatible with PJSIP, and are documented to cause problems with Asterisk if you try to use them with PJSIP.

I provisioned and placed the Yealink after unplugging the Cisco and then converting the extension to PJSIP.

Also, in case I forgot to mention this earlier, the only media codec that is enabled in the Asterisk SIP settings is ulaw.

And of course the following adds to the frustration, since I get so close to discovering an answer.

A user gave me the exact time when an issue occurred, which I found in the CDR reports in the web UI. Oddly enough, I can’t find this information when I cat the full log in the terminal using the unique ID.

As mentioned previously, I had to disable the sip and rtp debug log due to space concerns. Here is what I did find in the CDR reports (I removed the CNAM column for security reasons)

Time                    Event         CNUM  ANI  DID  AMA      exten  context         App      channel
Tue, 19 Dec 2023 15:35  CHAN_START    324             DEFAULT  544    from-internal            PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  CHAN_START    544             DEFAULT  s      from-internal            SIP/544-00001a13
Tue, 19 Dec 2023 15:35  ANSWER        544   544       DEFAULT  544    from-internal   AppDial  SIP/544-00001a13
Tue, 19 Dec 2023 15:35  ANSWER        324   324  544  DEFAULT  s      macro-dial-one  Dial     PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  BRIDGE_ENTER  544   544       DEFAULT         from-internal   AppDial  SIP/544-00001a13
Tue, 19 Dec 2023 15:35  BRIDGE_ENTER  324   324  544  DEFAULT  s      macro-dial-one  Dial     PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  BRIDGE_EXIT   324   324  544  DEFAULT  s      macro-dial-one  Dial     PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  BRIDGE_EXIT   544   544       DEFAULT         from-internal   AppDial  SIP/544-00001a13
Tue, 19 Dec 2023 15:35  HANGUP        544   544       DEFAULT         from-internal   AppDial  SIP/544-00001a13
Tue, 19 Dec 2023 15:35  CHAN_END      544   544       DEFAULT         from-internal   AppDial  SIP/544-00001a13
Tue, 19 Dec 2023 15:35  HANGUP        324   324  544  DEFAULT  h      ext-local                PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  CHAN_END      324   324  544  DEFAULT  h      ext-local                PJSIP/324-00000087
Tue, 19 Dec 2023 15:35  LINKEDID_END  324   324  544  DEFAULT  h      ext-local                PJSIP/324-00000087

In this situation, extension 324 (which is the Yealink using PJSIP) could hear extension 544, but extension 544 (which is a Cisco SPA phone on CHAN_SIP) could not hear 324.
Immediately after they both hung up, extension 324 called 544 back and both could hear each other.

Both phones are on the same network in the same VLAN, and both access the PBX across the site-to-site Wireguard VPN.

I wish I had the RTP logs, but again I had to disable the rtp debug because of space concerns.

Missed that this is chan_sip not chan_pjsip…

What is the setting for Reinvite Behavior in your SIP config? (The “safe” key/value in /etc/asterisk/sip* files that you want to see is “canreinvite=no”, which means all your RTP goes through Asterisk. It appears that way from what you say about the RTP logs, but it is hard to confirm without seeing the actual logs.)

And please capture and post some scrubbed output from “sip set debug peer 1.2.3.4” (the IP of your phone.)

Also, here is a potentially helpful one-way audio troubleshooting guide from the phone manufacturer.

The Asterisk default is auto for both sub-options, not yes, although maybe FreePBX doesn’t allow you to set the default.

I thought Cisco and NAT didn’t play together, and I seem to remember they are the ones that need an explicit no.

You have a good point. Later I did change all of the extensions back to “No” after completely rebuilding the Wireguard VPN tunnel to make sure that NAT was disabled. So the NAT setting for all of the CHAN_SIP extensions for this location are set back to “No”.

Thank you for the tip.

The Reinvite Behavior in the Asterisk SIP settings under Chan SIP is set to “No”. The extensions in question are set to “No” for the “Can Reinvite” option in the advanced tab.

Is it possible to enable the rtp debug the same way? For only the IP address of one peer?

Just for reference, here are the commands I ran in the Asterisk CLI to enable sip and rtp debugging for just one extension:

sip set debug peer <extension number>
rtp set debug ip <IP of phone for above extension>

The extension I chose is of a user who has been very helpful with documenting the times that she experiences the audio issues. Let’s hope that this debug logging finally reveals the culprit!

At last! Thank you again for the tip to enable sip and rtp debug logging on one extension. The issue I’ve described finally happened to a user whose extension I had debug logging turned on for. She gave me the times that the problem happened. This log is for a 30-second call where this user called someone else, who could not hear the user, but the user could hear that person. Both hung up and this user called that person back immediately. Both could hear each other then.

Again, these logs are of the first call that had one way audio. I started looking through it but quickly got lost. Hopefully something jumps out for someone in this thread that will finally shed some light.

Please note that I have set an expiration date of two weeks for this pastebin link:
https://pastebin.com/BCGT2VrG

One thing I did notice at the beginning of this log, is that the phone invite is first NAT, then destroyed with a second INVITE as no NAT. I find that curious since I set these CHAN_SIP extensions to “No” for the NAT.

Trying to keep this thread alive. Any thoughts with the debug log I provided?

I compared the debug log of the call I provided with the call immediately afterwards, which was successful. I noticed in the failed call that all of the RTP traffic was being sent TO the endpoint, but nothing was being received FROM it. I also saw the following warning:

RTP forcing Marker bit, because SSRC has changed

This has been difficult to research, but some forums suggested it was because of a reinvite. However, in both the SIP settings and the extension settings, Reinvite is set to “No”.

The RTP logs for the successful call show traffic both TO and FROM the endpoint and Asterisk.

Nothing is obvious yet about why that happened with the RTP traffic.
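
In case it helps anyone comparing a failed call against a good one, the TO/FROM check described above can be roughed out in a few lines of Python. This is only a sketch under assumptions: it assumes the RTP debug lines look roughly like “Got RTP packet from <ip:port>” and “Sent RTP packet to <ip:port>”, which may vary by Asterisk version, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (may differ between Asterisk versions):
#   "Got  RTP packet from    10.0.0.9:23456 (type 00, seq ...)"
#   "Sent RTP packet to      10.0.0.5:12345 (type 00, seq ...)"
RTP_LINE = re.compile(r"\b(Got|Sent)\s+RTP packet (?:from|to)\s+(\S+)", re.IGNORECASE)


def rtp_direction_counts(lines):
    """Count RTP debug lines per remote host, split into 'got' and 'sent'."""
    counts = defaultdict(lambda: {"got": 0, "sent": 0})
    for line in lines:
        match = RTP_LINE.search(line)
        if not match:
            continue  # not an RTP debug line
        direction, address = match.groups()
        # Drop the trailing ":port" so both directions group under one host.
        host = address.rpartition(":")[0] or address
        counts[host][direction.lower()] += 1
    return dict(counts)


def one_way_endpoints(counts):
    """Hosts that Asterisk sent RTP to but never received RTP from."""
    return sorted(
        host for host, c in counts.items() if c["sent"] > 0 and c["got"] == 0
    )
```

To try it, pass an open log file (or any list of lines) to rtp_direction_counts() and then hand the result to one_way_endpoints(). Any host it returns is one where audio only flowed outward from Asterisk during the logged period.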

not sure if this helps but…

SSRC stands for Synchronization Source identifier. It is a unique identifier that differentiates each source of a stream in an RTP session. For instance, in a conversation between two people, each person’s audio stream would have a different SSRC.

SSRC has changed indicates that the SSRC value in the incoming RTP stream has changed. This could happen if the stream’s source has switched – for example, if a call is transferred, or if a new participant joins a conference.

When Asterisk logs “RTP forcing Marker bit, because SSRC has changed,” it’s indicating that it detected a change in the SSRC and, as a result, is setting the Marker bit in the RTP stream. This is likely done to signal to the receiving end (like a VoIP phone or gateway) that there has been a significant change in the stream, prompting it to handle the stream appropriately (like resetting any internal state linked to the previous SSRC).

This can be a normal part of handling RTP streams, especially in scenarios involving call transfers, conferences, or other situations where the source of a stream might change dynamically. However, if this message appears frequently or is associated with issues in call quality or connection stability, it could indicate a problem.
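
For anyone who ends up looking at a raw packet capture, here is a rough sketch of where the SSRC and the Marker bit sit in the 12-byte fixed RTP header (RFC 3550). The helper names are hypothetical, and build_packet() only fabricates packets for experimenting. It is not how Asterisk itself does this.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: 2 single bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,           # top 2 bits, should be 2
        "marker": bool(second & 0x80),   # top bit of the second byte
        "payload_type": second & 0x7F,   # 0 is PCMU (ulaw)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def ssrc_changes(packets):
    """Yield (index, old_ssrc, new_ssrc) whenever the SSRC differs from the previous packet."""
    previous = None
    for index, packet in enumerate(packets):
        ssrc = parse_rtp_header(packet)["ssrc"]
        if previous is not None and ssrc != previous:
            yield index, previous, ssrc
        previous = ssrc


def build_packet(seq, timestamp, ssrc, marker=False, payload_type=0):
    """Fabricate a version-2 RTP packet with 160 bytes of dummy payload (testing only)."""
    second = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", 0x80, second, seq, timestamp, ssrc) + b"\x00" * 160
```

Running ssrc_changes() over the UDP payloads of one direction of a call would show each point where the sender's SSRC switched mid-stream, which is the condition the log message is reporting.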

Thank you for your reply.

This is the random problem that I have been trying to solve. The recent log that I posted does show that RTP traffic is not being received from the endpoint to Asterisk. This happens whether it’s an inbound call, outbound call, or internal call. All CHAN_SIP extensions have NAT set to “No”. So how is the RTP traffic not making it back to FreePBX? This SSRC error was the only thing I saw in the log between the RTP stream debug data that might explain it.

The log you provided does not show the raw SIP messages, only the way that Asterisk is handling them. While helpful, even more helpful is running “sip set debug on” inside the Asterisk console to immediately start flooding lots of multi-line SIP messages at level VERBOSE (and then scrubbing and posting those.)

And if there are messages that Asterisk is not processing, then you’d need an even rawer dump, e.g. tcpdump, mirroring a switch port, firewall/router captures, etc.

One thing the logs do appear to show is that you may be calling out the trunk via direct IPv4 address instead of a string that references the SIP peer. So, you might try changing “1.2.3.4” to “mytrunk”. Although that doesn’t appear to be the case for the internal extension-to-extension calling issues, let’s fix one thing at a time :)

But if the trunk IP is dialed directly, then those settings (saved in sip-*.conf files) might not get applied. See some legacy documentation here.

The under-the-hood Asterisk settings can be seen using “sip show peer 1.2.3.4” for the trunk and “sip show peer 555” for the extensions etc. (and posting that output, especially the DirectMedia lines.)

I did enable sip and rtp debug, but only for this specific extension. I can’t afford to globally enable debug and then wait for this to happen. That resulted in a 4GB text file.

I’m not completely understanding what you are saying by calling the trunk via an IP address. This call log was of the user making an outbound external call, however the same behavior has happened when a user makes an internal call to another user at the same location.