FreePBX in a live environment: calls cutting out


(Eric) #1

I’ve been running FreePBX on an EC2 instance since early April of this year, and it has not been a smooth transition.

Ever since switching over, calls have been cutting out randomly on all extensions, anywhere from two to five times a day.
In light of current events, we don’t even have many people on the phone at once: about 10 at most (we’re a very small company). Before deployment, my test calls didn’t seem to have any issues. However, I didn’t test for very long, only a few minutes at a time at different times of the day.

Complete list of symptoms:

  • Calls cutting out (anywhere from under half a second to 5 seconds)
  • It can happen on all live calls, or on just a couple of them
  • The cut-outs happen at random times. Some days it’s heavier in the morning, other days in the afternoon.
  • Call length doesn’t matter: it can hit a call only a few minutes in, or half an hour or even an hour in if a call runs that long.
  • All extensions are affected. The problem seems to surge across all active calls before dying down, after which everything performs as expected for another couple of hours.
  • Some surges cause very little interference (under half a second); others cause a very noticeable disturbance, from agents and clients both saying “You’re cutting out” to calls dropping outright
  • A surge can be a couple of cuts before things are fine, or 5 or 6 before it finally straightens out
  • Happens for both inbound (IB) and outbound (OB) calls.

Setup of our network:

  • The FreePBX server is on an EC2 instance, connected to an S3 bucket for easy storage of and access to our recordings.
  • FreePBX version is 14.0.13.33.
  • All extensions use PJSIP.
  • Traffic passes through an SD-WAN before it reaches our internal network. There are no firewall rules blocking VoIP traffic between our network and the EC2 instance.
  • We have two routers, one for each of our locations, each with QoS set up. Note that these are home/small-business routers, not commercial ones, and our two ‘locations’ are in the same building, just in different suites.

Here’s what has been tried so far:

  • Set up QoS on the routers and on the SD-WAN to prioritize VoIP calls.
  • Put data on one channel and voice on the other in the SD-WAN so VoIP calls have a dedicated channel.
  • Each of our ISPs provides 500 Mbps, and the monitoring I have set up shows we never use more than 20 Mbps at a time (we only have a few calls plus people checking various map apps on their stations for our clients).
  • Double-checked all connections (I know someone would have asked at some point).
  • Double-checked that the audio codecs on our VoIP server and softphone match (our softphone is currently MicroSIP).
  • The onboard FreePBX firewall is set up to allow traffic from our ISPs.
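To put the 20 Mbps figure in perspective, here is a rough per-call bandwidth estimate. It is a sketch under stated assumptions: G.711 (64 kbit/s) or G.729 (8 kbit/s) with 20 ms packetization, IPv4 + UDP + RTP headers, and Ethernet header plus FCS. It ignores any tunnel overhead the SD-WAN adds, so real figures will be somewhat higher.

```python
# Rough VoIP bandwidth estimate.
# Assumptions: 20 ms ptime, IPv4 (no options), UDP, RTP (no CSRC/extensions),
# Ethernet header + FCS. SD-WAN/VPN tunnel overhead is NOT included.

IP_HEADER = 20     # bytes
UDP_HEADER = 8     # bytes
RTP_HEADER = 12    # bytes
ETH_OVERHEAD = 18  # bytes: Ethernet header (14) + FCS (4)


def per_call_kbps(codec_kbps: float, ptime_ms: float = 20.0) -> float:
    """Bandwidth for ONE direction of ONE call, in kbit/s, including headers."""
    packets_per_sec = 1000.0 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_sec
    frame_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IP_HEADER + ETH_OVERHEAD
    return frame_bytes * 8 * packets_per_sec / 1000


if __name__ == "__main__":
    for name, rate in (("G.711", 64), ("G.729", 8)):
        one_call = per_call_kbps(rate)
        ten_calls_mbps = 10 * one_call / 1000
        print(f"{name}: {one_call:.1f} kbit/s per call, "
              f"{ten_calls_mbps:.2f} Mbit/s for 10 calls (each direction)")
```

Even with G.711, ten simultaneous calls come to well under 1 Mbit/s each way under these assumptions, which suggests raw bandwidth is unlikely to be the bottleneck on a 500 Mbps link.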

Other notes:

  • Whenever a surge happens, I pull up one of the recordings to see whether it breaks up too. It doesn’t; the recording catches everything.
  • I’ve checked the EC2 instance via PuTTY to see whether it’s running high for some reason. During the surges the server runs at 10% CPU or lower with plenty of memory to spare, so I figure it’s not the server or the installation itself.
  • Nothing has jumped out at me in the FreePBX logs during these surges (although I’ll be the first to admit I’m no expert at reading them). I don’t see any uptick in errors, or different errors, when a surge happens versus when there are no issues at all.
  • The changes I’ve been making have brought small improvements. Prioritizing VoIP traffic throughout the network, matching the audio codecs, and switching to less bandwidth-heavy codecs have made the surges less disruptive, but haven’t completely solved the problem. This tells me I might be on the right track.

The last idea I have is to change our network layout so that the routers no longer issue DHCP and only the SD-WAN does. That way ALL of our internet traffic goes straight to the SD-WAN without being filtered by any router or switch, and the routers act only as Wi-Fi hubs, in a sense. However, the SD-WAN administrator told me this may only solve the issue for a couple of days. I’m still planning to make the change.

At this point I’m at a complete loss and am looking for ANY ideas about what might be happening to the FreePBX installation.


#2

The fact that both inbound and outbound voice is affected should make this easier to troubleshoot. Whether the problem is with your network, at Amazon or with your trunking provider, you should see the jitter and/or packet loss on traffic leaving the PBX. Based on your “recordings are good” observation, I’d suspect severe jitter.

Start by capturing all inbound and outbound traffic; see

Ask your agents to note the time when trouble occurs and to report it promptly. Look at the timestamps on the capture files, locate the one with the bad calls, download it to your PC and analyze it with Wireshark. If RTP forwarded from extension to trunk is bad, but not vice versa, the problem is likely with your network or ISP. If the opposite, I’d suspect the trunking provider. If both are bad, either your instance isn’t being scheduled in a timely manner, or some resource in your instance is being exhausted at the time.
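For reference, the "jitter" and "lost packets" figures to compare between the two directions can be computed from per-packet data in the capture. The sketch below follows the interarrival jitter estimator from RFC 3550 (J += (|D| − J)/16) and counts skipped sequence numbers as loss. Assumptions: the packets belong to one direction of one RTP stream, are listed in arrival order, and use an 8000 Hz RTP clock (true for G.711 and G.729). The `RtpPacket` type and function names are illustrative, not part of any tool. Wireshark's own RTP stream analysis reports comparable numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RtpPacket:
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # 32-bit RTP timestamp, in sender clock units
    arrival: float  # arrival time in seconds, taken from the capture


def _signed_diff(a: int, b: int, bits: int) -> int:
    """a - b modulo 2**bits, as a signed value (handles counter wraparound)."""
    mod = 1 << bits
    d = (a - b) % mod
    return d - mod if d >= mod // 2 else d


def rtp_jitter_and_loss(packets: List[RtpPacket],
                        clock_rate: int = 8000) -> Tuple[float, int]:
    """Return (interarrival jitter in ms, lost packet count) for one RTP stream.

    Jitter uses the RFC 3550 running estimate, computed in arrival order.
    Loss is approximate: it counts sequence numbers skipped between consecutive
    arrivals, so heavy reordering would inflate it.
    """
    jitter = 0.0  # kept in RTP timestamp units, as in RFC 3550
    lost = 0
    for prev, cur in zip(packets, packets[1:]):
        step = _signed_diff(cur.seq, prev.seq, 16)
        if step > 1:
            lost += step - 1
        # D = (arrival spacing) - (sender timestamp spacing), both in clock units
        d = ((cur.arrival - prev.arrival) * clock_rate
             - _signed_diff(cur.timestamp, prev.timestamp, 32))
        jitter += (abs(d) - jitter) / 16.0
    return jitter * 1000.0 / clock_rate, lost
```

A steady 20 ms stream yields jitter near zero; packets delayed unevenly in the network push the estimate up, and gaps in the sequence numbers show up as loss.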

If the trouble is at the PBX, try running this script:


to see whether the instance is getting serviced when needed.
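The script referred to above isn't reproduced here. As a rough, hypothetical illustration of the same idea (not that script), the sketch below repeatedly sleeps for a short interval and measures how late each wake-up is. The 20 ms default interval, chosen to match a common RTP packetization time, and the 10 ms threshold are assumptions. Late wake-ups are printed with a wall-clock timestamp so they can be lined up against the times agents report trouble.

```python
import time
from typing import Tuple


def measure_wakeup_lateness(interval_s: float = 0.02,
                            duration_s: float = 60.0,
                            threshold_ms: float = 10.0) -> Tuple[float, int, int]:
    """Sleep for interval_s in a loop and measure how late each wake-up is.

    Returns (worst lateness in ms, wake-ups over threshold, total wake-ups).
    """
    worst_ms = 0.0
    over = 0
    count = 0
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        target = time.monotonic() + interval_s
        time.sleep(interval_s)
        late_ms = max((time.monotonic() - target) * 1000.0, 0.0)
        count += 1
        worst_ms = max(worst_ms, late_ms)
        if late_ms > threshold_ms:
            over += 1
            # Wall-clock time so the event can be matched to agent reports
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} woke {late_ms:.1f} ms late")
    return worst_ms, over, count


if __name__ == "__main__":
    worst, over, total = measure_wakeup_lateness()
    print(f"worst lateness {worst:.1f} ms; {over} of {total} wake-ups over threshold")
```

On an instance that is being serviced promptly, wake-ups should normally land within a millisecond or two of the target; bursts of large overshoots during a trouble window would point at the instance rather than the network.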


#3

Hey,

I have the same problem. Sometimes everything works fine and there are conversations lasting over 2 hours, and sometimes the conversation breaks off after 10 seconds. My configuration is different, though, because I have everything local. Outgoing calls use a dedicated network card because the SIP trunk goes to a One Access 425 supplied by o2 Business Germany, so I have no control over the trunk and only have access to its connection details (IP address, etc.).
I will now monitor this line to see if I can detect problems with jitter or similar.

Regards
Nico


(Eric) #4

A quick update on my situation. I found out that the server was keeping a rotating cache of 10 days’ worth of recordings. I set it to hold only the past two days, since all of our recordings are on the S3 bucket I set up.
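For anyone wanting to do the same, here is a minimal sketch of pruning local recordings older than N days. It assumes recordings live under `/var/spool/asterisk/monitor` (a common FreePBX default; check your own install) and that they have already been copied to S3. It defaults to a dry run that only lists what it would delete; the function name is illustrative.

```python
import time
from pathlib import Path
from typing import List

# Assumption: typical FreePBX call-recording directory; adjust if yours differs.
DEFAULT_RECORDING_DIR = "/var/spool/asterisk/monitor"


def prune_old_recordings(root: str = DEFAULT_RECORDING_DIR,
                         keep_days: float = 2.0,
                         dry_run: bool = True) -> List[Path]:
    """Return files under root older than keep_days; delete them unless dry_run.

    Only run with dry_run=False once the recordings are confirmed to be in S3.
    """
    cutoff = time.time() - keep_days * 86400
    # Collect the list first so we never delete while walking the directory tree
    old_files = [p for p in Path(root).rglob("*")
                 if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in old_files:
            p.unlink()
    return old_files


if __name__ == "__main__":
    for path in prune_old_recordings(dry_run=True):
        print("would delete:", path)
```

The age check uses each file's modification time, which for a finished recording is roughly when the call ended.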

Also, I was able to get a fair number of tcpdump captures during an incident where a call cut out.
My question now is: what exactly should I be looking for in Wireshark?
(I’m no network administrator :P)
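One concrete thing to look for is silence in the packet stream itself. In a healthy 20 ms stream, RTP packets arrive roughly every 0.02 s, so a cut-out of half a second or more should appear as a matching gap in arrival times for one direction of the call. The sketch below flags such gaps. Assumptions: `arrivals` holds sorted arrival times in seconds for one direction of one RTP stream (for example, the time column of that stream exported from Wireshark), and the 0.2 s threshold is an arbitrary starting point. Endpoints that use silence suppression also produce gaps, so treat hits as candidates to check against agent reports.

```python
from typing import List, Sequence, Tuple


def find_dropouts(arrivals: Sequence[float],
                  min_gap_s: float = 0.2) -> List[Tuple[float, float]]:
    """Return (time of last packet before the gap, gap length in s) for each
    gap between consecutive packets that is at least min_gap_s long."""
    return [(prev, cur - prev)
            for prev, cur in zip(arrivals, arrivals[1:])
            if cur - prev >= min_gap_s]


if __name__ == "__main__":
    # Synthetic example: a 20 ms stream with roughly half a second missing.
    times = [i * 0.02 for i in range(50)] + [1.5 + i * 0.02 for i in range(50)]
    for start, gap in find_dropouts(times):
        print(f"gap of {gap * 1000:.0f} ms after t={start:.2f} s")
```

If the gaps show up in packets arriving at the PBX from the extensions but not in the trunk-side stream (or vice versa), that narrows down which leg is losing audio, in line with the comparison suggested earlier in the thread.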


(Eric) #5

One more update, and you guys can tell me what you think. Some recent documentation on running FreePBX on an EC2 instance says not to enable the onboard firewall, as it can cause unpredictable behavior even when configured perfectly. Well, guess what I had enabled?

Disabling it, coupled with my new caching procedure, seems to have drastically reduced the cutting out that clients and agents hear. Since I disabled it, not a word all last week from agents about anyone cutting out, when it used to happen every day, throughout the day. The only remaining cut-outs are for the manager when he’s listening in on live calls, and for now they occur only on his device/headset.

I don’t actually believe this was the root cause, but it seems to have brought the system to a good level of service. I’ll continue to monitor, but the problem seems to have resolved itself for the moment.


(Eric) #6

Last update for this general help question. The problem does appear to be on our network, and I’m looking into getting a dedicated network line brought into our business. Thanks for the suggestions that helped me get to the bottom of it.