UDP PJSIP cycles Reachable/Unreachable like MAD! TCP PJSIP is like butter. Why?

Ok - This may be my own ignorance, but I really would like to know why this happens, and why it made such a HUGE difference.

Just migrated a customer off of Hosted and onto my shiny-new all-PJSIP FreePBX, and no sooner did I have the phones provisioned (all Polycom 410’s) than they started cycling like MAD - literally seconds between unreachable/reachable, over and over, so much that the console was nothing but that - and only with 10 phones. Switched it to TCP and it went away instantly and has not even peeped for two hours.

Comcast Coax connection using Comcast as a router for all 10 endpoints - PBX is on an SDN Connection that is perfectly stable. I haven’t tried one of their phones away from their office, so I don’t know if they just have a flakey LAN, but if it was flakey, I would expect the TCP-connected phones to cycle a little bit too, but ZERO - it is 100% rock solid!!

Should I be doing TCP from now on? Why the HUGE difference?

I’ve not fully migrated over to PJSIP yet, but I’ve had to make sure that my NAT keep-alive settings on the phone are correct. Specifically, I came across this issue with my personal Yealink T54w, and I put this in the config:

account.1.nat.udp_update_enable = 3
account.1.nat.udp_update_time = 15

I can’t put together why NAT would behave differently with different SIP drivers, both over UDP - just sharing what I’ve done so far to fix it.

Yeah, we have the qualify pretty aggressive (30 seconds) because here in the “Land Of Enchantment” crappy Internet connections are the norm - smooth connections are the real exception. But before, the frequent qualify always worked. On this install it was insane - nothing but unreachable/reachable on the console - and then I switched to TCP and ZERO.
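For anyone wondering what the qualify actually does on the wire: Asterisk sends a SIP OPTIONS request to the registered contact and flags it Unreachable when no reply comes back in time. Here is a rough sketch of what such a probe looks like, built by hand. The function name, the "probe" user and the example hostnames are all made up for illustration; this is not what Asterisk itself emits byte for byte.

```python
import uuid


def build_options(target_host, local_host, local_port, transport="UDP"):
    """Build a minimal SIP OPTIONS request (illustrative only)."""
    # RFC 3261 branch parameters must start with the magic cookie z9hG4bK.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        # The transport in the Via header is the only line that differs
        # between a UDP probe and a TCP probe in this sketch.
        f"Via: SIP/2.0/{transport} {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF and a blank line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_options("phone.example.com", "pbx.example.com", 5060, "TCP"))
```

The request is the same either way. What changes is whether it has to find its way back through a NAT mapping (UDP) or rides on a connection the phone already opened (TCP).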

It’s like magic!

This is likely because one of the firewalls is killing the UDP live sessions. We have also been switching users who are behind cheap Verizon routers that don’t allow you to disable SIP ALG over to TCP.
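If that is what is going on, the cycling is easy to reproduce on paper. Below is a toy simulation, not a measurement: it assumes the router drops its UDP mapping a fixed number of seconds after the last packet the phone sent out, and that the PBX's qualify only gets through while that mapping is still open. The 20-second timeout and 60-second phone interval are invented numbers; the 30-second qualify and 15-second keep-alive are the ones mentioned earlier in this thread.

```python
def qualify_results(nat_timeout, qualify_interval, phone_send_interval, duration):
    """Return [(second, reachable)] for every qualify probe the PBX sends.

    Assumed model: the NAT mapping opens when the phone registers at t=0
    and expires nat_timeout seconds after the phone's last outbound packet.
    """
    last_outbound = 0
    results = []
    for t in range(1, duration + 1):
        # Phone-originated traffic (REGISTER refresh or keep-alive)
        # re-opens or refreshes the mapping.
        if t % phone_send_interval == 0:
            last_outbound = t
        if t % qualify_interval == 0:
            reachable = (t - last_outbound) < nat_timeout
            if reachable:
                # The phone's reply to the probe is outbound traffic too.
                last_outbound = t
            results.append((t, reachable))
    return results


if __name__ == "__main__":
    # Phone only talks every 60 s: every other probe finds the mapping gone.
    print(qualify_results(20, 30, 60, 120))
    # Phone sends a keep-alive every 15 s: the mapping never expires.
    print(qualify_results(20, 30, 15, 120))
```

With the phone sending something every 60 seconds, the probes alternate between unreachable and reachable. With a 15-second keep-alive every probe succeeds. A TCP connection is opened by the phone and carries traffic both ways, so it is tracked differently by the router and (in this model) would not hit the UDP timeout at all.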

I remember reading somewhere the downside of using TCP for signalling… I don’t remember what it was.
Anyway, if you are using Kamailio, it can have the endpoints register via TCP and proxy the connection to the PBX over UDP.

Only downside I remember about Session-Based packets instead of Sessionless Packets is the increased overhead of the signaling and Asterisk having to keep track of the sessions - I wonder if that is even a thing anymore - I am going to switch a couple of other customers tomorrow - I seem to have the problem with Polycom Phones - yet another reason to dump that Garbage!

I will post tomorrow after I switch the other Box over.

Unless you have several hundred phones keeping TCP sessions open, I do not think it will be an issue at all. Use TCP, or better, use TLS which is TCP with encryption.

Alternatively you could fight with your NAT but why bother? :)

Yeah, I have been playing with TLS for the “Oooh - Encrypted!” street-cred - I would guess eventually it will be a requirement just like HTTPS is today.

Endpoint Manager needs to be able to specify the Transport in the Templates - Yes, you can add them to the Basefiles, but it would be so much nicer to have it in the Provisioning template.

I regularly have between 80 and 100+ devices at locations all using PJSIP on Asterisk. I don’t see this problem with my installs and yes, it is all cloud based.

So this sounds like a network problem and TCP is just a band-aid that doesn’t solve the real issue.

It’s not like TCP is an inferior choice. Why would you say it is a band aid?

I have heard some express the opinion that since TCP signaling with Asterisk is in far less production usage than UDP, it is less tested and by extension less certain to be reliable. I’m not sure I accept this, and our rollout of the Sangoma Connect mobile client, where every client uses TCP by default, has not shown any issues related to TCP signaling.

When the Switchvox mobile client was first released we put a lot of time into investigating PJSIP TCP issues and resolving them so it’s smooth sailing these days. These changes were contributed upstream to PJSIP so all users of PJSIP could benefit.

I wasn’t saying that TCP was inferior at all. It would seem there is an issue on the network, moving to TCP doesn’t solve that issue. It just bypasses it.

I think this is one of those cases where PJSIP, being a far better SIP stack, is highlighting issues that weren’t known before. I just had something like this happen at a few hotels that still use FXS-based PBXes. We swapped out the old FXS gateways with new ones, and then the new ones started having lines that would not release and eventually tied up all the ports. Turns out that it was due to poor programming in the PBX systems, which were not releasing the lines properly. The older gateways were apparently releasing the lines after a couple of hours, so no one really noticed.

Luckily these are old-ass Mitel systems, and because of how the Mitel ecosystem works, all of these hotels, despite having different owners, use the same Mitel tech, since it’s a regional thing. So one guy has been misprogramming PBXes for years and we’re just now finding out how many that is.
