Good afternoon. Over the past month, we’ve been trying to track down a problem with phones, first on a migrated system (fresh build, then a restore of a v15 backup), and then on a system built from scratch (no restore, configured as if it were a brand-new system). BOTH systems soon start having BLF update issues: the phones stop receiving status updates for the BLFs they’re subscribed to. Phones also keep ringing in a ring group after the call has been answered. This happens across multiple phone brands, with at least two firmware versions per brand.
To me, both of these are symptoms of the same problem: signals from the system to the phone are being missed, so the phone holds its last state. The issue seems to get worse as call volume rises, yet the system’s CPU has peaked at no more than 25% as measured on the host it runs on (the dashboard stats show under 5%).
We’re also seeing problems with phones staying registered (again, worse with more traffic). The “Users Online” graph looks like a southwestern skyline (jagged mountains and hills) rather than a nice Oklahoma field (a flat line). At times PJSIP reports a new registration at 600s (our configured maximum registration time), but over the course of the day that registration time drops to 542s, 318s, 142s, and eventually to the minimum of 60s.
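One way to see what registration time Asterisk is actually granting, as opposed to what the phone asked for, is to read the Contact header of the 200 OK sent back for each REGISTER in a trace: per RFC 3261, the registrar’s response lists the current bindings, each with an expires parameter. Below is a minimal Python sketch, assuming you have pasted the raw SIP text of such a response into a string; granted_expires is a hypothetical helper, not part of Asterisk or FreePBX.

```python
import re


def granted_expires(sip_message):
    """Return the expires= values on Contact headers of one SIP message (e.g. a 200 OK to REGISTER)."""
    values = []
    for line in sip_message.splitlines():
        if not line.strip():
            break  # a blank line ends the headers; ignore any body
        name, sep, value = line.partition(":")
        # "m" is the compact form of the Contact header name.
        if not sep or name.strip().lower() not in ("contact", "m"):
            continue
        # Remove <...> URIs so URI parameters are not mistaken for header parameters.
        header_params = re.sub(r"<[^>]*>", "", value)
        values.extend(
            int(v) for v in re.findall(r";\s*expires\s*=\s*(\d+)", header_params, re.IGNORECASE)
        )
    return values
```

For the binding that was just refreshed, the value should be the interval actually granted (600 here if the maximum is being honored); any other bindings listed for the same AOR show their remaining time, so compare only the contact that matches the phone that sent the REGISTER.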
The system runs in Azure. We have other systems there without this issue, and this one ran without problems for months until it started. We have a support ticket open with FPBX, but so far we haven’t been able to work through the issue.
One more idea, after exhausting many others: while looking through the config files manually, I noticed a couple of settings that I did not put there and that I don’t think should be there:
pjsip.transports.conf:
tos=cs3
cos=3
pjsip.endpoint.conf:
(on a per-extension basis)
tos_audio=ef
tos_video=af41
cos_audio=5
cos_video=4
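To see exactly which generated files carry these markings after each Apply Config or module update, a small read-only scan can help. This is a sketch assuming the configs live in /etc/asterisk (the usual Asterisk location) and that the relevant FreePBX-generated files match pjsip*.conf; find_qos_lines is a hypothetical helper name. It only reports lines and changes nothing, since FreePBX regenerates these files anyway.

```python
import re
from pathlib import Path

# QoS options seen in the generated files (longer names listed first for readability).
# Lines starting with ";" are comments in Asterisk configs and will not match.
QOS_OPTION = re.compile(
    r"^\s*(tos_audio|tos_video|cos_audio|cos_video|tos|cos)\s*=\s*([^;\s]+)",
    re.IGNORECASE,
)


def find_qos_lines(config_dir="/etc/asterisk", pattern="pjsip*.conf"):
    """Return (file, line number, option, value) for every active tos/cos line."""
    hits = []
    for path in sorted(Path(config_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                match = QOS_OPTION.match(line)
                if match:
                    hits.append((path.name, lineno, match.group(1).lower(), match.group(2)))
    return hits


if __name__ == "__main__":
    for name, lineno, option, value in find_qos_lines():
        print(f"{name}:{lineno}: {option}={value}")
```

Running it before and after a module update would show whether that update is what introduced the lines.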
Long ago, with chan_sip, I had similar settings (with different values) in the custom fields. We eventually learned that marking packets this way doesn’t work well in the cloud (marked packets were sometimes dropped rather than having the marking ignored), so removing the settings worked better. However, I see no place under SIP Settings to remove them from the config. I have no idea how they got there; if they’re a static element, I wonder whether a module update made them appear, which would explain why the issue is recent.
How can I stop them from being generated, so I can test whether they’re actually part of the problem? I see nowhere they were added in the first place. It seems odd for such settings to be forced into the config when they may not suit the environment the system runs in.
The other thing we discovered is that the system occasionally seems to send a batch of BLF updates in a single NOTIFY message (one message contained 3 BLF updates). Traces of an event where things start to go downhill suggest this may be what knocks the phones out of sync. It looks as if the system sometimes aggregates the updates; does anyone know how to disable that? The aggregation setting on the extension seems to apply only to MWI, not BLF.
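BLF state is typically carried in NOTIFY bodies of type application/dialog-info+xml (RFC 4235). A sketch for checking captured bodies: list the dialog elements in one NOTIFY, and check that the document version keeps increasing across a subscription, since RFC 4235 expects the version to go up by one with each document sent and a phone may discard one it considers stale. Assumptions: the batched NOTIFY is a single dialog-info document with several dialog elements (not a multipart resource-list body), and summarize_dialog_info and out_of_order are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Namespace defined by RFC 4235 for dialog-info documents.
NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}


def summarize_dialog_info(body):
    """Summarize one dialog-info+xml NOTIFY body pasted from a trace."""
    # Encode str input so an XML declaration with encoding="UTF-8" is accepted.
    data = body.encode("utf-8") if isinstance(body, str) else body
    root = ET.fromstring(data)
    dialogs = [
        (d.get("id"), d.findtext("di:state", default="", namespaces=NS).strip())
        for d in root.findall("di:dialog", NS)
    ]
    return {
        "entity": root.get("entity"),
        "version": int(root.get("version")),
        "state": root.get("state"),  # "full" or "partial"
        "dialogs": dialogs,  # more than one entry: several dialogs reported in one NOTIFY
    }


def out_of_order(versions):
    """Indices of NOTIFYs whose version did not increase over the previous one."""
    return [i for i in range(1, len(versions)) if versions[i] <= versions[i - 1]]
```

Feeding it the version of each NOTIFY on one subscription, in capture order, would show whether the point where a phone goes stale lines up with a repeated or backwards version.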
Also, if anyone has seen a similar issue, please let me know.
Seen on Asterisk 16.13.0, 16.11.0, and 16.9.1, using PJSIP with TCP signalling for endpoints and UDP signalling for trunks.
Thanks in advance.