We’ve been having intermittent problems with our FreePBX server (2.4.0beta2.2): we suddenly get a lot of packet loss, and some (but not all) phones stop working. Rebooting the server fixes the problems for a while, but they return after a few days.
What I need to know is what kind of debug information I should be looking for, and generally where I should start in diagnosing this problem.
This is most likely a network issue and not a FreePBX issue.
How do you know you have packet loss? Have you looked at the network interface stats on your server and on your switch?
I am monitoring it with Nagios. Normally I would agree that it is a network issue, but nothing else on the network is having this problem. The last time it happened, I ssh’d into the trixbox server, and it took several seconds to respond to each command.
Just because nothing else on the network is having problems does not mean that the NIC or switch port on your FreePBX server is not having problems. Packet loss is a network problem, and you stated that you were experiencing packet loss.
What do the interface statistics on the FreePBX system indicate? Did you run ifconfig and look at the error and drop counters?
What was the CPU utilization (use top) on the server while you were having these issues?
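If it helps, here is a minimal sketch of pulling out just the counters worth watching. It assumes a Linux host and reads `/proc/net/dev`, which is where net-tools ifconfig gets the errors, dropped, frame, collisions and carrier numbers it prints. The function names and the choice of "problem" counters are my own, not anything from FreePBX or trixbox. Run it a few times a while apart: counters that keep climbing on the PBX's interface point at the cable, NIC, switch port or a duplex mismatch.

```python
"""Sketch: summarize per-interface error/drop counters from Linux /proc/net/dev.

These are the same kernel counters that ifconfig reports as errors,
dropped, frame, collisions and carrier.
"""
import os
from typing import Dict

# Column order after "iface:" in /proc/net/dev: 8 receive fields, then 8 transmit fields.
FIELDS = [
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop",
    "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop",
    "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed",
]

# Counters that usually point at a cabling, NIC or duplex problem when they grow
# (this selection is an assumption made for illustration).
PROBLEM_KEYS = ("rx_errs", "rx_drop", "rx_frame", "tx_errs",
                "tx_drop", "tx_colls", "tx_carrier")


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Return {interface: {counter: value}} parsed from /proc/net/dev text."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # also handles "eth0:123" with no space
        values = [int(v) for v in data.split()]
        stats[name.strip()] = dict(zip(FIELDS, values))
    return stats


def problem_counters(stats: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Keep only interfaces that have at least one non-zero error-type counter."""
    result = {}
    for iface, counters in stats.items():
        bad = {k: counters[k] for k in PROBLEM_KEYS if counters.get(k, 0) > 0}
        if bad:
            result[iface] = bad
    return result


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as f:
        for iface, bad in problem_counters(parse_net_dev(f.read())).items():
            print(iface, bad)
```

Non-zero counters on their own are not conclusive, since they accumulate from boot; what matters is whether they grow while the problem is happening.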
trixbox is a mess, and it is unsupported. If you have the TBMMUNIN package installed, its Python script can go into a race condition.
Since trixbox has not been updated in so long and you are having problems, I would start planning a migration to a current, supported distro.
I don’t know what the CPU utilization was, because when the problem occurs the NRPE checks time out…
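Since the NRPE checks time out exactly when you need the data, one option is to record CPU and load locally and read the log afterwards. Below is a rough sketch of that idea, assuming a Linux 2.6+ kernel (for the iowait and steal fields in `/proc/stat`) and Python 3.6 or newer, which is much newer than what ships on an old trixbox, so it may need a separate interpreter. The script and function names are hypothetical. It reports iowait separately from busy time, because a box that takes seconds to answer ssh can be stuck on disk rather than CPU.

```python
"""Sketch: take one local CPU/load sample, independent of NRPE.

Reads Linux /proc/stat twice, one interval apart, and reports busy and
iowait percentages plus the 1/5/15-minute load averages.
"""
import os
import time
from datetime import datetime
from typing import Dict


def parse_cpu_times(text: str) -> Dict[str, int]:
    """Return idle, iowait and total jiffies from the aggregate 'cpu' line."""
    for line in text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted in user/nice, so those columns are left out.
            v = [int(x) for x in line.split()[1:9]]
            return {"idle": v[3], "iowait": v[4], "total": sum(v)}
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Percent of elapsed CPU time spent busy and waiting on I/O."""
    total = after["total"] - before["total"]
    if total <= 0:
        return {"busy": 0.0, "iowait": 0.0}
    idle = after["idle"] - before["idle"]
    iowait = after["iowait"] - before["iowait"]
    return {
        "busy": 100.0 * (total - idle - iowait) / total,
        "iowait": 100.0 * iowait / total,
    }


def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def sample(interval: float = 1.0) -> str:
    """One timestamped log line: busy%, iowait% and load averages."""
    before = parse_cpu_times(read("/proc/stat"))
    time.sleep(interval)
    after = parse_cpu_times(read("/proc/stat"))
    usage = cpu_usage(before, after)
    load = " ".join(read("/proc/loadavg").split()[:3])
    return (f"{datetime.now().isoformat(timespec='seconds')} "
            f"busy={usage['busy']:.1f}% iowait={usage['iowait']:.1f}% load={load}")


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(sample())
```

Run from cron every minute with its output appended to a local file (for example a hypothetical `/var/log/cpu_sample.log`), this leaves a record of what the box was doing during the outage even if nothing could reach it remotely.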
While I agree that network equipment failure is possible, it is also possible that when the system becomes unresponsive it simply stops answering ICMP requests. I did not get a chance to look at the ifconfig stats, but I’m currently running tcpdump to see if I can catch anything the next time it happens.
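One way to narrow this down from the monitoring box is to ping the PBX and another host on the same switch side by side. Below is a rough sketch, assuming Python 3.7+ and a `ping` that accepts `-c` and prints the usual "N packets transmitted, M received" summary (Linux iputils and BSD/macOS both do). The 5% threshold and the function names are arbitrary choices for illustration. Note that it can only separate "loss on the shared network path" from "loss to this one host"; an unresponsive PBX and a bad NIC both show up as the second case, so the interface counters and the tcpdump capture are still needed to tell those apart.

```python
"""Sketch: compare ICMP loss to the PBX against a reference host on the same switch."""
import re
import subprocess
import sys
from typing import Optional

# Matches both "20 received" (Linux) and "20 packets received" (BSD/macOS).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output: str) -> Optional[float]:
    """Packet loss computed from ping's summary line, or None if it is missing."""
    m = SUMMARY.search(ping_output)
    if not m:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    return 0.0 if sent == 0 else 100.0 * (sent - received) / sent


def ping_loss(host: str, count: int = 20) -> Optional[float]:
    """Run ping against host and return the measured loss percentage."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    return loss_percent(result.stdout)


def diagnose(pbx_loss: float, reference_loss: float, threshold: float = 5.0) -> str:
    """Rough triage; the threshold is an arbitrary illustrative value."""
    if pbx_loss > threshold and reference_loss > threshold:
        return "loss on both hosts: look at the shared network path"
    if pbx_loss > threshold:
        return "loss only to the PBX: suspect its NIC, cable, switch port or host load"
    return "no significant loss"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 compare_loss.py <pbx-address> <reference-address>
    pbx, ref = ping_loss(sys.argv[1]), ping_loss(sys.argv[2])
    if pbx is None or ref is None:
        print("could not parse ping output")
    else:
        print(f"pbx={pbx:.0f}% reference={ref:.0f}%: {diagnose(pbx, ref)}")
```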
I just had a system that was dropping calls last week. The problem turned out to be a bad NIC.