Sangoma s705 and s505 Locking Up after Inactivity

I recently deployed 20 Sangoma phones (7 s705’s and 13 s505’s) at a doctor’s office. I’m running FreePBX 14.0.13.6 as a virtual machine in a Hyper-V environment. All modules are up to date, and I’m running the latest 1.58 (2.0.4.72) firmware on the phones. I have them set up as PJSIP extensions/endpoints. They are also set up to go to screensaver after 30 minutes, with Default Photo selected as the screensaver.

They worked great for an entire week (Mon-Fri), but after sitting idle over the weekend, all but 5 or so of the 20 phones had vertical lines across the screen or a blank screen, and were locked up: no dial tone, nothing. They showed as Unavailable in FreePBX. After rebooting the phones, they came back up, and have worked fine throughout the day today.

I have two other s705’s that I’ve been testing in other FreePBX POCs, and they would also lock up and show vertical lines across the screen after being idle for a period of time. So, this makes 3 separate FreePBX instances with Sangoma phones that have exhibited this behavior. I thought it might be a fluke, so I moved forward with the large order for the doctor’s office.

Has anyone else run into this type of issue?

Does the server have a Static IP?
Also, did you reboot the phones after upgrading firmware?

I have a somewhat similar issue that I have been tracking for a month. We installed 40 Sangoma S505 IP phones at a school. Everything appears fine, but then after a period of time, usually over the weekend, the phones will disconnect from the PBX and will not re-register until they are rebooted. We do not have screen-savers set, so we may be seeing the same kind of lock up.

Initially we were using Sangoma phones because we were dropping them into an existing network. Due to this issue, we have replaced their entire network and the problem still persists. We have set a static IP on the PBX, put the phones on their own VLAN with no other devices (FreePBX is doing DHCP and routing for this vlan), added brand new POE switches, and replaced their router.

Looking for any advice on how to proceed.

Based on the limited info herein, I suspect this is due to an asterisk bug involving excessive MWI notification packets. Both of you, please open a phone hardware support ticket.


I can speak for Kevin here since we work together.

Kevin opened a ticket with Sangoma and received a response already with a link to a recent firmware. He applied it to the endpoints. I suppose time will tell now if the firmware was beneficial.

So, I ran with the new “pre-release” firmware for a few days, but today I ran into serious issues. I had reports that the lines sounded staticky, and calls were sounding garbled or completely dropping. Apparently it was intermittent but continued throughout the whole day. One call would come through fine, and the next might completely drop after 55 seconds or so. Another call would sound garbled and drop out.

I initially wondered if the updated firmware could have been the culprit, so I reverted the phones back to the latest stable release. That didn’t resolve the issue. When I was finally able to get on site to do additional troubleshooting, I discovered that some inbound calls were not making it in at all, as if FreePBX had lost its trunk registration. Also, the phones I tried didn’t have a dial tone.

Things were acting so badly, I rebooted the FreePBX server, and things are seemingly better now. I can still hear faint static in the handset, but calls are making it through.

Does anyone have any ideas about what might be going on?

Looking at the log from the day, here are some of the errors I consistently saw:

WARNING[84782][C-0000250e] ast_expr2.fl: ast_yyerror(): syntax error: syntax error, unexpected ‘=’, expecting $end; Input:
= 1 & 0 = 0

VERBOSE[55690][C-00002457] pbx.c: Executing [s@macro-user-callerid:18] GotoIf(“PJSIP/<trunk…sanitized log>”, “1?report2:macroerror”) in new stack

I would think this could be network related, but the first week they were open for business they didn’t have any of these issues, and nothing has changed on the network. I also checked bandwidth on the firewall a few times during the day, and they were not having bandwidth issues.

This does not bode well for a production environment…

What now?

-Kevin

I’m also seeing this warning:

WARNING[27008][C-000023fd] res_rtp_asterisk.c: RTP Read too short
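If it helps to see how often each of these actually shows up, here is a rough Python sketch that tallies log entries by level and source file. It assumes every entry looks like the samples posted above (level, thread ID in brackets, optional call ID, then the source file and a colon). The function name tally_log_entries is just something made up for this example, and the real log path on your system may differ.

```python
import re
from collections import Counter

# Matches the level, optional call ID, and source file of a log entry shaped
# like the samples in this thread, for example:
#   WARNING[27008][C-000023fd] res_rtp_asterisk.c: RTP Read too short
LOG_ENTRY = re.compile(
    r"\b(?P<level>WARNING|ERROR|NOTICE|VERBOSE)\[\d+\]"
    r"(?:\[C-[0-9a-fA-F]+\])?\s+"
    r"(?P<source>[\w.\-]+):"
)


def tally_log_entries(lines, levels=("WARNING", "ERROR")):
    """Count log entries per (level, source file) for the given levels."""
    counts = Counter()
    for line in lines:
        match = LOG_ENTRY.search(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("source"))] += 1
    return counts


# The lines quoted in this thread, used here as sample input.
sample_log = [
    "WARNING[84782][C-0000250e] ast_expr2.fl: ast_yyerror(): syntax error: "
    "syntax error, unexpected ‘=’, expecting $end; Input:",
    "= 1 & 0 = 0",
    "VERBOSE[55690][C-00002457] pbx.c: Executing [s@macro-user-callerid:18] "
    "GotoIf(“PJSIP/<trunk…sanitized log>”, “1?report2:macroerror”) in new stack",
    "WARNING[27008][C-000023fd] res_rtp_asterisk.c: RTP Read too short",
]

log_counts = tally_log_entries(sample_log)
for (level, source), n in log_counts.most_common():
    print(f"{level:8} {source:24} {n}")
```

To run it against a real file, pass an open file handle instead of the sample list. Continuation lines such as the "= 1 & 0 = 0" line are simply skipped because they carry no level prefix.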

When you say “static” are you really talking about the call cutting in and out? I am guessing that is what is going on, because true static like you would have on an analog line really is not possible. I would tend to think you have a network issue or an issue with the virtual environment. Are your phones on their own VLAN or separate network, or are you sharing with other devices? I always put phones on their own VLAN to avoid these types of issues and only allow phones and the phone server on that VLAN.

I am also not a fan of doing this in Hyper-V. There are timing issues between the hypervisor and the physical hardware. If you want a rock solid environment for little money, just pick up an HP DL360 with a couple of drives in RAID 1 and dual power supplies, or a Dell 420 or 430 with a similar config. The Sangoma appliances are awesome and fully supported, which is an advantage. I also have a couple of small business environments running very well on the Intel NUC. Unless you are planning on spending a lot of money, I would avoid any type of virtualization. Even if you spend the money, you will still be more reliable on dedicated hardware. As I pointed out, the examples I provided are not very expensive. You can pick up that dedicated hardware on eBay for under $400, or get a Sangoma box and be fully supported.

Thankfully my issue is finally resolved! It has actually been resolved for just over a week, but I’ve been so busy, I just now slowed down long enough to provide an update here.

I had initially placed the blame on the phones themselves, but I’ll let Sangoma support put the solution in their own words:

Hi Kevin,
Thank you for providing access. After connecting, this confirms what I suspected. There is a bug with the current FPBX released version of asterisk that can cause 2^n NOTIFY messages to be sent to each extension and over time this can cause this behavior. A SIP network trace indicates that this is what is occurring. Until the new version of asterisk is released, the work-around is to change the MWI Subscription Type to Solicited and the Aggregate MWI setting to Yes on the Advanced tab for each extension. This can cause a small delay in the VM light being illuminated when a VM is left for an extension, but it does avoid this issue.

I can confirm that this did indeed resolve my issue completely, and it’s been smooth sailing for over a week now.
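For anyone who wants to check their own SIP trace for the kind of NOTIFY flood support described, here is a minimal Python sketch. It assumes you have a plain-text dump of SIP messages where each request starts with a standard request line such as "NOTIFY sip:101@192.0.2.10:5060 SIP/2.0". The extensions, addresses, threshold, and function names below are all hypothetical, and the sketch counts every NOTIFY regardless of its Event header, so treat the numbers as a rough indicator only.

```python
import re
from collections import Counter

# Standard SIP request line for a NOTIFY; captures the user part of the URI.
NOTIFY_LINE = re.compile(
    r"^NOTIFY\s+sips?:(?P<user>[^@;\s]+)@\S+\s+SIP/2\.0$"
)


def count_notifies(trace_lines):
    """Count NOTIFY requests per target user (extension) in a text trace."""
    counts = Counter()
    for line in trace_lines:
        match = NOTIFY_LINE.match(line.strip())
        if match:
            counts[match.group("user")] += 1
    return counts


def flag_noisy(counts, threshold):
    """Return the extensions that received more than `threshold` NOTIFYs."""
    return sorted(user for user, n in counts.items() if n > threshold)


# Hypothetical trace excerpt (made-up extensions, documentation-range IPs).
sample_trace = [
    "NOTIFY sip:101@192.0.2.10:5060 SIP/2.0",
    "SIP/2.0 200 OK",
    "NOTIFY sip:101@192.0.2.10:5060 SIP/2.0",
    "OPTIONS sip:101@192.0.2.10:5060 SIP/2.0",
    "NOTIFY sip:102@192.0.2.11:5060 SIP/2.0",
    "NOTIFY sip:101@192.0.2.10:5060 SIP/2.0",
]

notify_counts = count_notifies(sample_trace)
noisy = flag_noisy(notify_counts, threshold=2)
print(notify_counts, noisy)
```

Comparing the per-extension counts before and after changing the MWI settings should give a rough sense of whether the work-around is taking effect.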

-Kevin

