
Consistent Asterisk/FreePBX Crash Issue


(Steven Sedory) #62

Thanks Dicko. However, we’ve experienced the issue on 13.15 as well, so it’s hard to decide what’s safe. Should we go to the current 14 version?

Also, our processors are 24 x Intel® Xeon® CPU E5-2620 and 24 x Intel® Xeon® CPU L5640, and both seem to be supported by CentOS 6.6, so that theory’s out the window.


#63

Personally, I have found that with 13, pretty well anything using cdr-mysql will cause that. Update to ODBC, unload the cdr mysql stuff, and maybe . . . But yes, my machines are quite happy now under 13.18.rc? or 14.6.2 under ProxMox or Vultr (both have been a PITA for a few months).


(Andrew Nagy) #64

13.17.2-3.shmz65.1.183 is now live and, as you can see below, has BETTER_BACKTRACES.

The same applies to Asterisk 14 as well.

freepbxdev1*CLI> core show settings

PBX Core settings
-----------------
  Version:                     13.17.2
  Build Options:               DONT_OPTIMIZE, COMPILE_DOUBLE, BETTER_BACKTRACES, OPTIONAL_API

(Steven Sedory) #65

Great news!


(Steven Sedory) #66

We have run the FreePBX update scripts and can confirm that BETTER_BACKTRACES shows up in the build options. Thanks!
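For anyone who wants to script that check across several boxes, here is a minimal sketch. It assumes you have captured the text of `core show settings` (for example via `asterisk -rx "core show settings"`); the helper name `build_options` is made up for illustration, and the sample text is the output pasted earlier in this thread.

```python
def build_options(settings_output):
    """Return the flags on the 'Build Options:' line of `core show settings`."""
    for line in settings_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Build Options":
            # Flags are comma-separated on a single line.
            return [flag.strip() for flag in value.split(",") if flag.strip()]
    return []


# Sample taken from the output posted above in this thread.
sample = """\
PBX Core settings
-----------------
  Version:                     13.17.2
  Build Options:               DONT_OPTIMIZE, COMPILE_DOUBLE, BETTER_BACKTRACES, OPTIONAL_API
"""

print("BETTER_BACKTRACES" in build_options(sample))
```

This only confirms the flag was compiled in; it says nothing about whether symbols can actually be resolved at runtime.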


(Steven Sedory) #67

Are you referring to the “bad magic number” FRACK Error I mentioned at the top of the thread?

If so, our next crash or FRACK, now that we have BETTER_BACKTRACES enabled, should show that, if it is indeed the cause, correct?


#68

No, a predictable Asterisk crash on the second ‘core reload’ (I have never seen a FRACK in Asterisk).


(Steven Sedory) #69

So we finally had another FRACK. It now appears that the “Serious Network Trouble” error we’ve been having is related. Here’s part of our log:

[2017-11-11 06:03:50] WARNING[8721] chan_sip.c: Unable to cancel schedule ID 0. This is probably a bug (chan_sip.c: do_dialog_unlink_sched_items, line 3266).
[2017-11-11 06:03:50] ERROR[5146] /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/utils.h: Memory Allocation Failure in function ast_str_create at line 655 of /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/strings.h
[2017-11-11 06:03:50] WARNING[5146] chan_sip.c: sip_xmit of 0x7f0428c3af80 (len 139655827686296) to 108.23.78.98:4279 returned -2: Cannot allocate memory
[2017-11-11 06:03:50] ERROR[5146] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data
[2017-11-11 06:03:50] ERROR[5146] astobj2.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f04286eac38 (0)

More info on Asterisk bug tracker here: https://issues.asterisk.org/jira/browse/ASTERISK-27321
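To check whether that pattern holds across a whole log rather than one excerpt, something like the following could help. It is only a sketch: it assumes the bracketed number after the level is the logging thread's ID, the function names are invented for illustration, and the sample lines are the ones pasted above.

```python
import re
from collections import Counter

# Accepts both "LEVEL[tid] message" and "LEVEL[tid]: message" layouts.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\] (\w+)\[(\d+)\]:? (.*)$"
)


def parse(lines):
    """Yield (day, time, level, thread_id, message) for each recognised line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.groups()


def fracks_per_day(lines):
    """Count FRACK lines per calendar day."""
    return Counter(day for day, _, _, _, msg in parse(lines) if "FRACK!" in msg)


def frack_thread_context(lines):
    """Return (level, message) for non-FRACK lines logged by threads that also logged a FRACK."""
    entries = list(parse(lines))
    frack_threads = {tid for _, _, _, tid, msg in entries if "FRACK!" in msg}
    return [
        (level, msg)
        for _, _, level, tid, msg in entries
        if tid in frack_threads and "FRACK!" not in msg
    ]


SAMPLE = [
    "[2017-11-11 06:03:50] WARNING[8721] chan_sip.c: Unable to cancel schedule ID 0. This is probably a bug (chan_sip.c: do_dialog_unlink_sched_items, line 3266).",
    "[2017-11-11 06:03:50] ERROR[5146] /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/utils.h: Memory Allocation Failure in function ast_str_create at line 655 of /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/strings.h",
    "[2017-11-11 06:03:50] WARNING[5146] chan_sip.c: sip_xmit of 0x7f0428c3af80 (len 139655827686296) to 108.23.78.98:4279 returned -2: Cannot allocate memory",
    "[2017-11-11 06:03:50] ERROR[5146] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data",
    "[2017-11-11 06:03:50] ERROR[5146] astobj2.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f04286eac38 (0)",
]

print(fracks_per_day(SAMPLE))
for level, msg in frack_thread_context(SAMPLE):
    print(level, msg[:70])
```

To run it on a real log, pass an open file handle instead of `SAMPLE`; the log path depends on your setup.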


(Andrew Nagy) #70

You didn’t upload the backtrace to the asterisk ticket. Please ensure you do this.


(Steven Sedory) #71

So there was no core dump file, as Asterisk didn’t fully crash; it only logged FRACKs. How do I go about getting a backtrace for that?


(Andrew Nagy) #72

You can’t. Since it didn’t crash, it’s not really related to your original issue.


(Steven Sedory) #73

Did you see the log I attached? Seems to have a lot more info around the FRACKs than previous ones, before BETTER_BACKTRACES was enabled. Do you see anything helpful there?

Particularly here:

[2017-11-11 06:03:50] ERROR[5146] /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/utils.h: Memory Allocation Failure in function ast_str_create at line 655 of /builddir/build/BUILD/asterisk-13.17.2/include/asterisk/strings.h


(Steven Sedory) #74

So we’ve been good since November or so.

Yesterday, seemingly out of nowhere, we had 8,000 FRACKs! And today so far, 4,000!

But sadly, there’s still no useful info. This is what the Asterisk CLI is showing every several seconds:

[2018-01-26 09:50:48] ERROR[20467]: astobj2.c:131 INTERNAL_OBJ: FRACK!, Failed assertion bad magic number 0x0 for object 0x3fe16a0 (0)
Got 18 backtrace records
#0: [0x607112] asterisk __ast_assert_failed() (0x60708a+88)
#1: [0x45e2c6] asterisk <unknown>()
#2: [0x45e2f3] asterisk <unknown>()
#3: [0x45f5f2] asterisk <unknown>()
#4: [0x45f829] asterisk __ao2_link() (0x45f7e6+43)
#5: [0x45fc9c] asterisk <unknown>()
#6: [0x45ff3f] asterisk __ao2_callback() (0x45fee0+5F)
#7: [0x7fed4997312d] chan_sip.so <unknown>()
#8: [0x7fed49972e6f] chan_sip.so <unknown>()
#9: [0x4dba68] asterisk ast_cli_command_full() (0x4db7f4+274)
#10: [0x4dbbcc] asterisk ast_cli_command_multiple_full() (0x4dbb34+98)
#11: [0x45512a] asterisk <unknown>()
#12: [0x603d14] asterisk <unknown>()

That said, no crashes yet… but calls seem to take a long time to initiate.

Has anything changed to where we can see what is behind the "unknown"s above?
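One way to chase the unknown frames by hand is to pull their addresses out of the backtrace and feed them to `addr2line`. The sketch below only parses the frames and prints candidate commands; it does not run anything. It assumes the binary lives at `/usr/sbin/asterisk`, that unstripped debug symbols for the exact same build are available, and that main-binary addresses can be looked up directly. Frames inside `chan_sip.so` are listed but skipped for lookup, because their runtime addresses would first need the module's load base subtracted. The function name and the path are illustrative, not something from the distro.

```python
import re

# Frame layout as printed above: "#N: [0xADDR] module symbol() ..."
FRAME_RE = re.compile(r"^#(\d+): \[(0x[0-9a-fA-F]+)\] (\S+) (.+?)\(\)")


def unresolved_frames(backtrace):
    """Return (frame_number, address, module) for frames shown as <unknown>."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and m.group(4).strip() == "<unknown>":
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames


# Frames copied from the CLI output above.
BACKTRACE = """\
#0: [0x607112] asterisk __ast_assert_failed() (0x60708a+88)
#1: [0x45e2c6] asterisk <unknown>()
#2: [0x45e2f3] asterisk <unknown>()
#3: [0x45f5f2] asterisk <unknown>()
#4: [0x45f829] asterisk __ao2_link() (0x45f7e6+43)
#5: [0x45fc9c] asterisk <unknown>()
#6: [0x45ff3f] asterisk __ao2_callback() (0x45fee0+5F)
#7: [0x7fed4997312d] chan_sip.so <unknown>()
#8: [0x7fed49972e6f] chan_sip.so <unknown>()
"""

ASSUMED_BINARY = "/usr/sbin/asterisk"  # assumption: adjust to your install

for number, address, module in unresolved_frames(BACKTRACE):
    if module == "asterisk":
        print(f"#{number}: addr2line -f -e {ASSUMED_BINARY} {address}")
    else:
        print(f"#{number}: {module} {address} (needs load base, skipped)")
```

If `addr2line` answers `??` for every address, the binary it was pointed at most likely has no symbols for that range.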


(Andrew Nagy) #75

No. We have followed all of Digium’s recommendations.


(Steven Sedory) #76

Dang… :confused:


(Fetus) #77

Hi Steven,

Did you update the system recently? Did you change anything in your box?

Regards!


(Steven Sedory) #78

Yes.

We recently read that on Proxmox, which is where we have about 10 FreePBX distro VMs, the default processor type “kvm64” is essentially equivalent to a Pentium 4 in its CPU flag set.

We changed all our VMs to the “host” processor type, as we aren’t doing live migrations. We saw a huge drop in CPU load on all processes within the VMs after we did this.

Probably four hours later, all these FRACKs started, but just on one of the 10 VMs.

As I mentioned near or at the top of this post, we have only had this issue on servers that use TCP for SIP and a non-standard port (not 5060). Our UDP servers have never had the issue. That said, from the several months of research and feedback from experts, that shouldn’t matter. We strongly prefer TCP and the non-standard port for security and NAT traversal.

EDIT: Also, eight or so of the other VMs are set up the same way as the one that crashed, and on the same Proxmox host.

CORRECTION: the FRACKs started a little over a day after the processor type change. However, none of the other VMs are having this issue.
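For comparing what each VM actually sees after a processor type change like the one described above, a rough sketch such as this could be run inside every guest. It reads the `flags` line from `/proc/cpuinfo` on Linux; the flags in `WANTED` are arbitrary examples picked for illustration, not a list of what “kvm64” lacks.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return the wanted flags that the guest does not expose, sorted."""
    return sorted(set(wanted) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    WANTED = ["sse4_2", "aes", "avx"]  # illustrative choice only
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or file unreadable
    print("missing:", missing_flags(text, WANTED))
```

Running the same script on the misbehaving VM and on a healthy one would show whether the two guests really expose the same flag set.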


(Steven Sedory) #79

So ya, we just keep getting these several times a minute. That said, is there something we can look at or monitor that would help identify the cause, since better backtraces isn’t working?


(Andrew Nagy) #80

Better Backtraces IS working. It is compiled into Asterisk as described on their wiki and has been cross referenced by two former Digium employees.


(Steven Sedory) #81

So are we alone in ours not showing proper backtrace information? Or is this true for everyone on the FreePBX distro?