Thanks Dicko. However, we’ve experienced the issue on 13.15 as well. So it’s hard to decide what’s safe. Go to current 14 version?
Also, our procs are 24 x Intel® Xeon® CPU E5-2620 and 24 x Intel® Xeon® CPU L5640, and both seem to be supported by CentOS 6.6, so that theory’s out the window.
Personally, I have found that on 13, pretty well anything with cdr-mysql will cause that. Update to ODBC, unload the cdr mysql stuff and maybe . . . . But yes, my machines are quite happy now under 13.18.rc? or 14.6.2 under ProxMox or Vultr (both have been a PITA for a few months).
So we finally had another FRACK. It now appears that the “Serious Network Trouble” error we’ve been having is related. Here’s part of our log:
[2017-11-11 06:03:50] WARNING[8721] chan_sip.c: Unable to cancel schedule ID 0. This is probably a bug (chan_sip.c: do_dialog_unlink_sched_items, line 3266).
[2017-11-11 06:03:50] ERROR[5146] /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/utils.h: Memory Allocation Failure in function ast_str_create at line 655 of /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/strings.h
[2017-11-11 06:03:50] WARNING[5146] chan_sip.c: sip_xmit of 0x7f0428c3af80 (len 139655827686296) to 108.23.78.98:4279 returned -2: Cannot allocate memory
[2017-11-11 06:03:50] ERROR[5146] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data
[2017-11-11 06:03:50] ERROR[5146] astobj2.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f04286eac38 (0)
Did you see the log I attached? Seems to have a lot more info around the FRACKs than previous ones, before BETTER_BACKTRACES was enabled. Do you see anything helpful there?
Particularly here:
[2017-11-11 06:03:50] ERROR[5146] /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/utils.h: Memory Allocation Failure in function ast_str_create at line 655 of /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/strings.h
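For anyone wanting to line these entries up over a longer stretch of log, here is a rough Python sketch that tallies entries by level and source file and pulls out the FRACK and allocation-failure lines. The regex only assumes the `[timestamp] LEVEL[thread] source: message` layout visible in the excerpt above; the function name `summarize` and the three marker strings are my own choices, not anything Asterisk provides.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL[thread] source: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\[(?P<tid>\d+)\]\s+"
    r"(?P<src>\S+):\s+(?P<msg>.*)$"
)

# Substrings taken from the log lines quoted in this thread.
MARKERS = ("FRACK", "Memory Allocation Failure", "Cannot allocate memory")


def summarize(lines):
    """Return (counts per (level, source), list of flagged (ts, thread, msg))."""
    counts = Counter()
    flagged = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        counts[(m["level"], m["src"])] += 1
        if any(marker in m["msg"] for marker in MARKERS):
            flagged.append((m["ts"], m["tid"], m["msg"]))
    return counts, flagged
```

Feeding it the lines of the full log (for example `summarize(open(path))`, with `path` pointing at wherever your log lives) shows whether the allocation failures always come from the same thread just before a FRACK, as they do in the excerpt (thread 5146). One more observation, not a diagnosis: `hex(139655827686296)` is `0x7f0428000398`, so the “len” reported by `sip_xmit` looks more like a heap address than a plausible packet length.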
We recently read that on Proxmox, which is where we have about 10 FreePBX distro VMs, the default processor type “kvm64” is essentially equivalent to a Pentium 4 in its CPU flag set.
We changed all our VMs to the “host” processor type, as we aren’t doing live migrations. We saw a huge drop in CPU load on all processes within the VMs after we did this.
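If you want to see exactly what the guest gained from that switch, a small sketch like the following compares the CPU flag sets from two copies of `/proc/cpuinfo` taken inside the VM (one saved under “kvm64”, one under “host”). It assumes a Linux guest; the function names and the idea of saving a “before” copy are mine, for illustration only.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(before_text, after_text):
    """Return (flags gained, flags lost) going from 'before' to 'after'."""
    before = cpu_flags(before_text)
    after = cpu_flags(after_text)
    return sorted(after - before), sorted(before - after)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text; the default path assumes a Linux guest."""
    with open(path) as fh:
        return fh.read()
```

For example, `flag_diff(saved_kvm64_text, read_cpuinfo())` lists what the guest can now use that it could not before. It also makes it easy to confirm that the one VM with FRACKs sees the same flags as the VMs without them.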
Probably four hours later, all these FRACKs started, but just on one of the 10 VMs.
As I mentioned near or at the top of this post, we have only had this issue on servers that use TCP for SIP and a non-standard port (not 5060). Our UDP servers have never had the issue. That said, from several months of research and feedback from experts, that shouldn’t matter. We strongly prefer TCP and the non-standard port for security and NAT traversal.
EDIT: Also, eight or so of the other VMs are set up the same way as the one that crashed, and on the same Proxmox host.
CORRECTION: the FRACKs started a little over a day after the processor type change. However, none of the other VMs are having this issue.
So ya, we just keep getting these several times a minute. That said, is there something we can look at or monitor that would help identify the cause, since better backtraces isn’t working?
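Since the log shows “Memory Allocation Failure” and “Cannot allocate memory” right before the FRACK, one thing that might be worth watching is the memory footprint of the Asterisk process over time. Below is a minimal sketch, assuming a Linux guest with `/proc` available; it reads the `Vm*` fields from `/proc/<pid>/status` at a fixed interval. The function names, the 60 second default, and how you obtain the PID are all assumptions on my part, not part of FreePBX or Asterisk.

```python
import time


def parse_status(text):
    """Extract the kB-valued Vm* fields from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Vm"):
            parts = value.split()
            if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
                fields[key] = int(parts[0])
    return fields


def sample(pid):
    """Read one snapshot of the Vm* fields for the given PID."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())


def watch(pid, interval=60, samples=None):
    """Print timestamped VmRSS and VmSize; samples=None runs until interrupted."""
    taken = 0
    while samples is None or taken < samples:
        snap = sample(pid)
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              "VmRSS_kB=%s" % snap.get("VmRSS"),
              "VmSize_kB=%s" % snap.get("VmSize"),
              flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Run `watch(pid)` with the PID of the running Asterisk process and redirect the output to a file. If VmRSS or VmSize climbs steadily until the FRACKs begin on the affected VM but stays flat on the healthy ones, that would point toward memory exhaustion in that guest rather than the network. If it stays flat everywhere, that at least rules one thing out.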