This is not a ‘solve my problem’ post. I do have a problem, but what I’m after is tips on how to debug it. I have the following setup:
- FreePBX 13 on CentOS 6.9, running on a VPS, with ‘extended’ modules such as the Responsive Firewall enabled.
- A site where the phones are, which has a Huawei B525 4G modem/router. The site has no cable or fibre internet available and the ADSL speed is really poor, hence the 4G connection. The downside is that the 4G provider has its own NAT ‘in the air’ and there is no static IP address. I ‘solved’ this with a dynamic DNS hostname that updates frequently. The DDNS hostname is in the Responsive Firewall’s ‘trusted’ zone.
- 3 Yealink T41P phones, all firmware up to date and a Yealink W52P DECT station with one handset, also up to date. All connected through chan_sip.
- The inbound route sends all incoming calls into a timegroup. During business hours on business days, calls go to a queue that rings all phones. Otherwise they go to a message that ends in voicemail.
For the first 8 months or so, this setup worked without a single hiccup. Recently, we’ve run into problems: the phones do not ring on inbound calls. At first it was just a couple of phones; lately it’s been all or nothing. I’ve dived into the full log and saw that quite frequently some, or all, phones fail to qualify and therefore become ‘UNREACHABLE’. I then started to experiment with the qualify intervals and wait times. I changed them for some extensions and not for others, to see whether the modified extensions would escape the ‘fallouts’. This seemed to work at first, but soon that stopped working as well.
I set up the server to send me a tail of the full log, grepped for the word ‘UNREACHABLE’, so I have a clearer view of what is happening. I now see that the phones all become unreachable at approximately the same time every day. This report does not include the extensions for which I switched off qualify, of course, but in practice they don’t respond either. Rebooting the Huawei 4G router usually solves it.
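To make the ‘same time every day’ pattern easier to see than in a raw grep, the UNREACHABLE lines could be counted per hour of the day. The following is only a rough sketch: it assumes the chan_sip notice looks like `Peer '101' is now UNREACHABLE!` and that each log line starts with a bracketed timestamp ending in `HH:MM:SS`; the sample lines and peer numbers are made up, so the patterns should be checked against the real full log.

```python
import re
import sys
from collections import Counter

# Assumption: each log line starts with a bracketed timestamp that ends in
# HH:MM:SS, e.g. "[2018-01-15 14:03:22]" or "[Jan 15 14:03:22]".
TIMESTAMP = re.compile(r"^\[[^\]]*?(\d{2}):(\d{2}):(\d{2})\]")
# Assumption: chan_sip reports a failed qualify with this wording.
PEER_DOWN = re.compile(r"Peer '([^']+)' is now UNREACHABLE")


def unreachable_by_hour(lines):
    """Count UNREACHABLE notices per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        stamp = TIMESTAMP.search(line)
        if stamp and PEER_DOWN.search(line):
            counts[int(stamp.group(1))] += 1
    return counts


# Only runs when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        counts = unreachable_by_hour(log)
    for hour in sorted(counts):
        print(f"{hour:02d}:00  {counts[hour]}")
```

Saved under a name of your choosing and run with the path to the full log as its only argument, this prints one line per hour in which at least one peer dropped out. A sharp peak at one hour would point towards something scheduled (a lease, a session timeout at the 4G provider) rather than random radio trouble.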
I asked the VPS provider to monitor the internet speed of the VPS around the time the phones usually stop responding, and they found nothing unusual. So now I’m kind of stuck. I suspect the 4G internet connection or the Huawei router to be the culprit, or maybe the DDNS system. My main question is: what would be the best approach to REALLY see in detail what’s going on? Are there any logs I can turn on, or monitoring tools you would recommend? Are there any messages in the full log I should keep an eye out for? Any tips would be greatly appreciated, thanks in advance.
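One way the DDNS suspicion could be tested is to log, on the VPS, every moment at which the DDNS hostname starts resolving to a different address, and then compare those timestamps with the UNREACHABLE times. This is a minimal sketch under assumptions: the hostname passed on the command line is whatever the DDNS name really is, and the 60-second interval is an arbitrary choice.

```python
import socket
import sys
import time
from datetime import datetime


def record_change(history, address, now):
    """Append (now, address) to history only if the address changed."""
    if not history or history[-1][1] != address:
        history.append((now, address))
        return True
    return False


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None  # a failed lookup is itself worth recording


def watch(hostname, interval=60):
    """Print a timestamped line each time the resolved address changes."""
    history = []
    while True:
        now = datetime.now()
        if record_change(history, resolve(hostname), now):
            print(now.isoformat(timespec="seconds"), hostname, "->", history[-1][1])
        time.sleep(interval)


# Only runs when a hostname is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])
```

If the address changes (or the lookup fails) shortly before the phones drop out, the DDNS update or the firewall’s view of the trusted host is a likely suspect. If the address stays the same while the phones go unreachable, the 4G link or the router’s NAT table looks more likely. It may also help to compare the logged address with the one `sip show peers` reports for each phone.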