Pjsip causes hold drops with Cisco CP-8941 phones

I have a number of Cisco CP-8941 videophones in use on FreePBX 17 with Asterisk 22 and the usecallmanager patch loaded.

As you may or may not know, the current usecallmanager patch now includes a very slightly modified version of chan_sip. And because it is a patch, you have to compile Asterisk from source if you are going to use it.

So if you want to reverse the decision to remove chan_sip from FreePBX 17, you now have 3 ways to do it: use the usecallmanager patch and compile from source, use the chan_sip patch from interlink1 and compile from source, or use the precompiled Asterisk binary with the older version of chan_sip that Sangoma still distributes.

(I just wanted to get that established for the peanut gallery who likes to make the claim that chan_sip is dead, etc.)

Anyway, here’s the problem.

Using the CP-8941 phones (and every other Cisco phone I have tested) with chan_sip, I can put an incoming call to the phone on hold, and everything is fine.

However, I recently discovered that with the CP-8941 phones on chan_pjsip, if I put a call on hold, the phone drops the call after 60 seconds. The PBX still has the call on hold and the caller still hears the hold music, but from the phone’s point of view the call doesn’t exist.

Thinking that this might have been a bug that the usecallmanager patch fixed in chan_sip, I tried this test with older versions of FreePBX that have chan_sip but don’t have the usecallmanager patch, and discovered the same issue - with chan_pjsip the hold is broken, with chan_sip it is not.

This does not happen with other Cisco phone models that run newer Enterprise firmware such as the 7841, nor does it happen with Cisco phones that run 3PCC firmware.

Obviously, this is some kind of firmware bug in the CP-8941 phone. Maybe chan_sip has a workaround specifically for this phone or maybe it’s just something in the architecture of the 2 channel drivers that is different that happens to tickle the bug under chan_pjsip.

There is newer firmware for the CP-8941 than the 9.3 firmware I am using, but the 9.4.x firmware for this phone causes video calls to and from the phone to crash on Asterisk. (I’ll have to test that firmware though, for this hold bug)

Unfortunately, there is a dearth of video deskphones on the market - only 2 or 3 companies are left that make them. Cisco’s current videophone is the CP-8875 that runs Phone OS but MSRP on this is over a thousand dollars a phone.

I have not started the process of gathering traces or any of that yet. I was just wondering if anyone has any ideas or suggestions on why pjsip is broken?

(I also felt this is important to get documented)

Have you checked the RTP Hold Timeout and RTP Keepalive settings? (Asterisk SIP Settings)

If indeed the phone is responsible for the hangups, enabling the RTP Keepalive might remind it that it is on hold and should not hang up.

That was one of the first things I tried playing with and it didn’t make any difference, unfortunately.

Actually, scratch that - as a last-ditch try, I could not resist playing with it one more time, and I just figured this one out. It’s not the RTP Hold Timeout that needed to be changed, it’s the RTP Timeout. Setting just the RTP Hold Timeout higher does nothing. Of course, all this does is move the point at which the phone hangs up the call to whatever you set the RTP Timeout to, unless the user picks the call back up first. I also discovered that the CP-8845 running Enterprise firmware is similarly affected; the CP-8845 running 3PCC firmware is not.

This doesn’t FIX the bug - it just covers it up. And, even though the timeout setting also applies to chan_sip, the setting is not needed for chan_sip as the bug does not appear there. This is a chan_pjsip + Cisco 8941/8845 Enterprise firmware phone bug.

What would be really great is two PCAPs, one with chan_pjsip and one with chan_sip, showing how at the 60 second mark (or whatever it is) on hold the call drops with one but not with the other.

Blindly changing settings like RTP Timeout, etc. is pointless if you don’t know what is actually happening. It might not be an RTP issue at all but a signalling issue. Actually capturing the traffic during a call that replicates the problem will give a lot more insight than guessing.
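
For anyone comparing the two captures, here is a rough Python sketch of the kind of check that could be run on the SIP messages once they are exported from the PCAP as (timestamp, raw text) pairs. The function names and the input format are made up for illustration, and nothing here comes from Asterisk itself. It treats a re-INVITE whose SDP carries a=sendonly, a=inactive or the older c=IN IP4 0.0.0.0 as a hold request, and reports how long after that the first BYE appears. It does not say which side sent the BYE, so that still has to be read off the capture. If it returns None, no BYE followed the hold, which would itself suggest the phone is giving up on the call without signalling it.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical input format: (seconds since capture start, raw SIP message text).
SipMessage = Tuple[float, str]

# SDP direction attributes commonly used to signal hold.
HOLD_ATTRS = ("a=sendonly", "a=inactive")


def first_line(raw: str) -> str:
    """Return the request line or status line of a SIP message."""
    lines = raw.strip().splitlines()
    return lines[0].strip() if lines else ""


def is_hold_offer(raw: str) -> bool:
    """True if this is an INVITE whose SDP looks like a hold request."""
    if not first_line(raw).startswith("INVITE "):
        return False
    lines = [ln.strip().lower() for ln in raw.splitlines()]
    # Older style hold uses a null connection address instead.
    return any(ln in HOLD_ATTRS for ln in lines) or "c=in ip4 0.0.0.0" in lines


def seconds_from_hold_to_bye(messages: Iterable[SipMessage]) -> Optional[float]:
    """Seconds between the hold re-INVITE and the next BYE, or None."""
    hold_at: Optional[float] = None
    for ts, raw in messages:
        line = first_line(raw)
        if line.startswith("INVITE "):
            if is_hold_offer(raw):
                if hold_at is None:
                    hold_at = ts      # call has just gone on hold
            else:
                hold_at = None        # call was set up or resumed
        elif line.startswith("BYE ") and hold_at is not None:
            return ts - hold_at
    return None


if __name__ == "__main__":
    # Made-up trace for illustration only, not a real capture.
    trace = [
        (0.0, "INVITE sip:1001@192.0.2.10 SIP/2.0\r\n\r\nv=0\r\na=sendrecv\r\n"),
        (12.5, "INVITE sip:pbx@192.0.2.1 SIP/2.0\r\n\r\nv=0\r\na=sendonly\r\n"),
        (72.5, "BYE sip:pbx@192.0.2.1 SIP/2.0\r\n\r\n"),
    ]
    print(seconds_from_hold_to_bye(trace))
```

Running the same check over the chan_sip capture and the chan_pjsip capture should make it obvious whether the drop is tied to a fixed interval after the hold request, and from there the messages around that timestamp are the ones worth reading by hand.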


This is correct, however keep in mind that Cisco does not “officially support” the “Enterprise” firmware on these phones with 3rd party PBXes such as Asterisk - and clearly, they know there is an issue because they fixed it in the 3PCC firmware version for the 8845s. And in fact Cisco has a great desire to make these phones incompatible in border conditions such as this since they want to discourage smart guys from configuring Asterisk-based PBXes and using them to shovel the Cisco garbage out of our enterprises.

This is sort of more in the realm of academics than production in any case. Since I can replicate the problem with my test PBXes very easily and I have these phone models, I’ll get a packet capture and post it. But, my expectation is the answer is “Cisco is doing it wrong” and it will boil down to keep using the RTP timeout hack - unless there’s any support for posting a code patch along the lines of:

“if user-flag set in extension.conf of cisco-broken-hold do this otherwise do the normal thing”

in the chan_pjsip driver.