Consistent Asterisk/FreePBX Crash Issue

asterisk

(Steven Sedory) #1

Running FreePBX 13.0.192.16 and Asterisk 13.17.0

Recently used the “warm spare” method to move to a new server (new VM on KVM/proxmox)

The server has about 120 remote extensions, and had no real problems before.

I posted about this crash issue yesterday here, but my hypothesis was off: Media_index.c: Failed to stat

Today we had a ton of users call and say their phones weren’t working. Funny thing is, they show OK with IP address in peers list when running “sip show peers” in cli.

So we did a fwconsole restart, and things started working again.

This crash has happened three times this week already. Sunday morning, yesterday morning, and today.

I started digging through the logs, and these are the errors that may or may not be the cause. I’m hoping someone can give me some insight. Here are some error examples:

These ones show all over the logs, way before, way after, and right around the crash time:

[2017-08-22 09:30:59] ERROR[32499][C-00000028] pbx_functions.c: Function PJSIP_HEADER not registered

These ones yesterday were fairly close to before the crash, but there were none today before the crash:

[2017-08-22 09:38:08] ERROR[1488] netsock2.c: getaddrinfo("2605:e000:6045:3a00:20b:82ff:feac:c151:13312", "(null)", ...): Name or service not known
[2017-08-22 09:38:08] WARNING[1488] chan_sip.c: Could not resolve socket address for '2605:e000:6045:3a00:20b:82ff:feac:c151:13312'
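
One possible reading of the two lines above, offered as an assumption rather than a confirmed diagnosis: the string handed to getaddrinfo looks like an eight-group IPv6 address with a port (13312) appended after a colon and without square brackets, which is not a valid address literal on its own. The Python sketch below only checks that reading against the logged string; `split_unbracketed` is a hypothetical helper and has nothing to do with how chan_sip parses addresses.

```python
import ipaddress


def is_ip_literal(text):
    """Return True if text parses as a plain IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def split_unbracketed(hostport):
    """Hypothetical helper: treat everything after the last colon as a port.

    Returns (address, port) if the remainder is a valid IP literal,
    otherwise None.
    """
    host, _, port = hostport.rpartition(":")
    if not port.isdigit() or not is_ip_literal(host):
        return None
    return ipaddress.ip_address(host), int(port)


# String copied from the netsock2.c error above.
logged = "2605:e000:6045:3a00:20b:82ff:feac:c151:13312"

whole_is_valid = is_ip_literal(logged)   # the full string as logged
parsed = split_unbracketed(logged)       # address + port, if it splits cleanly

if parsed is not None:
    addr, port = parsed
    # Conventional bracketed form for an IPv6 address with a port.
    print(f"[{addr}]:{port}")
```

If the split succeeds while the whole string fails to parse, that is at least consistent with a remote phone advertising an IPv6 contact without brackets; it does not by itself explain the crashes.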

These exist in all three instances:

Line 33144: [2017-08-20 10:34:38] ERROR[30555] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data

And finally, these ones look like the most likely culprit, but didn’t show up in yesterday’s crash (just today’s and Sunday’s):
*Note that the difference between today’s and Sunday’s crashes and yesterday’s is that in the former the peers list showed all endpoints “OK”, though they truly weren’t, while in the latter it showed only about half of them as OK.

[2017-08-23 13:42:49] ERROR[26004] astobj2.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x3de7430 (0)

So all that to say, I hope someone can help us find the root cause of all this.

Again, this server was a fresh v13 FreePBX server that we just “warm spare” copied to from an existing server. The existing was running on an ESXi host, fully updated to …66-21. We fully updated the fresh VM to 66-21 as well before running the backup/restore. The new server is a VM on KVM/proxmox.


(Steven Sedory) #2

Something worth noting:

The other servers in the same environment, which went through the same “warm spare” type of move, show all the same types of errors listed above except the FRACK! error (none of them have any of those), and they have no apparent problems. But those servers only have 20-30 extensions each, whereas the one in question has about 130.

Also, we checked for the same types of errors on a server that we hadn’t moved yet, and it didn’t have any of the FRACK! errors OR the “Serious Network Trouble” errors.


(Steven Sedory) #3

And more. It looks like Asterisk is crashing and restarting often, so the extensions go unreachable for a time. See yesterday’s restart log (I’m assuming each line below marks Asterisk starting up; please correct me if that’s not true):

Line 1: [2017-08-22 03:13:01] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 39686: [2017-08-22 09:22:54] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 46820: [2017-08-22 09:22:59] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 96752: [2017-08-22 09:51:16] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 122259: [2017-08-22 10:30:50] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 131142: [2017-08-22 10:42:12] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 136020: [2017-08-22 10:43:36] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 145626: [2017-08-22 10:55:08] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 209663: [2017-08-22 13:07:10] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 217871: [2017-08-22 13:16:31] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 229884: [2017-08-22 13:43:06] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 235313: [2017-08-22 13:43:48] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 245213: [2017-08-22 13:57:53] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 251307: [2017-08-22 14:00:31] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 273991: [2017-08-22 14:40:22] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
Line 362576: [2017-08-23 00:50:05] Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net on a x86_64 running Linux on 2017-07-27 17:42:42 UTC
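
To see how closely the restarts cluster, the timestamps in a list like the one above can be turned into gaps between consecutive startups. This is a minimal Python sketch, assuming (as in the post) that every "Asterisk ... built by" banner marks a startup; the regular expression and the function names are illustrative, and the sample uses three of the lines quoted above.

```python
import re
from datetime import datetime

# Matches the timestamp of a startup banner, with or without a
# leading "Line NNNN: " prefix added by a search tool.
BANNER = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] Asterisk \S+ built by"
)


def startup_times(lines):
    """Collect the timestamp of every banner line, in file order."""
    times = []
    for line in lines:
        match = BANNER.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds elapsed between each startup and the next one."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


banner_tail = (" Asterisk 13.17.0 built by mockbuild @ jenkins2.schmoozecom.net"
               " on a x86_64 running Linux on 2017-07-27 17:42:42 UTC")
sample = [
    "Line 39686: [2017-08-22 09:22:54]" + banner_tail,
    "Line 46820: [2017-08-22 09:22:59]" + banner_tail,
    "Line 96752: [2017-08-22 09:51:16]" + banner_tail,
]

times = startup_times(sample)
gaps = gaps_in_seconds(times)
for start, gap in zip(times[1:], gaps):
    print(start, f"{gap:.0f}s after the previous startup")
```

Very short gaps (seconds) would suggest a startup that failed and was retried right away, while longer gaps give a rough idea of how long each run lasted.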

(Andrew Nagy) #4

Crash dumps will be in /tmp

https://wiki.freepbx.org/display/SUP/Providing+Great+Debug#ProvidingGreatDebug-Backtraces(Segfaults/CoreDumps/AsteriskCrashing)


(Steven Sedory) #5

Thanks. Made dump file. So create ticket or post here?


(Andrew Nagy) #6

Depends on the issue. Usually they would go directly to Asterisk, not through us (Sangoma can’t do much about Asterisk crashing).

However, if you post it here, use pastebin.freepbx.org, as it’ll be pretty massive.


(Steven Sedory) #7

Thanks Andrew. Unfamiliar with pastebin but will check it out.


(Steven Sedory) #9

As explained, we moved the data by using backup/restore. That said, I’m seeing a lot about pjsip as I search the logs and related errors. The server we backed up had its SIP Driver set to chan_sip only (not the default of both). The new server, a fresh FreePBX distro that we restored onto, has that setting too, as it apparently carried over from the backup/restore. All that to say, do you think that this type of move, with the SIP driver set the way it was, could have broken it?


(Steven Sedory) #10

UPDATE: Had the wrong pastebin URL before. Here is the correct one.

Uploaded backtrace here: http://pastebin.freepbx.org/view/8e5e3ff1

Any help much appreciated.


(Steven Sedory) #11

Updates:

We disabled “ballooning” in proxmox on the VM memory settings two nights ago. All day yesterday, we had no asterisk crashes (but we had one day earlier this week with no crashes too). But we did have 17 FRACK errors still.

Last night, I changed from the proxmox 4 default NIC of virtio (a paravirtualization NIC) to the E1000 Intel one. We have continued to have no crashes so far, and zero FRACK errors too!

Still getting the “Serious Network Trouble” errors though, but less. Total of 124 so far today.
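
For anyone who wants to track these counts per day rather than by eye, here is a rough Python sketch. The three search strings are taken from the errors quoted earlier in this thread; the labels, the function name, and the idea of feeding it the full Asterisk log are assumptions for illustration only.

```python
import re
from collections import Counter

# Substrings copied from the error lines quoted in this thread.
PATTERNS = {
    "frack": "FRACK!",
    "network_trouble": "Serious Network Trouble",
    "pjsip_header": "Function PJSIP_HEADER not registered",
}

# Date portion of the "[YYYY-MM-DD HH:MM:SS]" log prefix.
DATE = re.compile(r"\[(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\]")


def count_by_day(lines):
    """Count matching error lines, keyed by (date, label)."""
    counts = Counter()
    for line in lines:
        match = DATE.search(line)
        if not match:
            continue
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(match.group(1), label)] += 1
    return counts


sample = [
    "[2017-08-22 09:30:59] ERROR[32499][C-00000028] pbx_functions.c: "
    "Function PJSIP_HEADER not registered",
    "[2017-08-20 10:34:38] ERROR[30555] chan_sip.c: Serious Network Trouble; "
    "__sip_xmit returns error for pkt data",
    "[2017-08-23 13:42:49] ERROR[26004] astobj2.c: FRACK!, Failed assertion "
    "bad magic number 0x0 for object 0x3de7430 (0)",
]

counts = count_by_day(sample)
for (day, label), total in sorted(counts.items()):
    print(day, label, total)
```

To run it against a real log, pass an open file object instead of `sample`; the path depends on the install and is not given in this thread.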


(Steven Sedory) #12

So today, out of nowhere, two thousand plus FRACK errors and a bunch of extensions unreachable. fwconsole restart did not work because Asterisk wouldn’t stop, so I had to force stop Asterisk and then run fwconsole restart. This issue is killing me. Anyone’s input is greatly appreciated.


(Andrew Nagy) #13

(Steven Sedory) #14

This is where I have just been led. I am doing some research now, specifically on installs done with the FreePBX 13 distro ISO (not set up from scratch, nor with Asterisk compiled separately).

Which error did you see exactly? The specific one mentioned in the subject?


(Dominic) #15

Yeah, I saw the error in the subject line, googled it, found this thread, and subscribed to see if the problem is systemic and if it gets resolved before I update from distro 6 to 7. I’m not even sure if I have the guest around anymore to see if I can reproduce it though.

I can tell you I’m using CentOS 7 as my hypervisor and I installed the alpha build of the v7 distro from the provided ISO shortly after it was first announced.


(Steven Sedory) #16

So another hint at a compatibility issue between KVM and the FreePBX distro…


(Andrew Nagy) #17

FreePBX Hosting and our cloud solution are on KVM and we do not have these issues.


(Steven Sedory) #18

Thanks Andrew. Are they running the freepbx 13 distro with asterisk 13 precompiled?


(Andrew Nagy) #19

FreePBX Hosting and the Cloud solution use the standard distro with asterisk precompiled. We do not make any modifications to that.


(Steven Sedory) #20

Bummer biscuits (as far as any obvious troubleshooting leads).

Are there any of those KVM VMs that are specifically on or near FreePBX 13.0.192.16 and Asterisk 13.17.0?


(Andrew Nagy) #21

Unfortunately that doesn’t make any difference. Some are, some aren’t.