FreePBX stops processing calls

Hi
I’m having serious issues with a FreePBX installation. It occasionally ceases processing calls.
When it happens, the web interface works and the CLI works.
I can even issue commands, and I can see SIP peers as expected.
Nothing strange is logged.
It might run for weeks and process 150,000 call legs with no issues, or it can hang twice a day with minimal load.

Asterisk 15.5.0 built by mockbuild @ jenkins7 on a x86_64 running Linux on 2018-07-25 22:19:48 UTC
Linux freepbx 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
This is a SangomaOS install.
The system recovers completely with fwconsole restart.
It runs under CentOS KVM, with 8 GB of RAM and 8 cores.
Memory usage is about 1.2 GB, and the load never exceeds 0.5.

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             8
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Westmere E56xx/L56xx/X56xx (IBRS update)
Stepping:              1
CPU MHz:               3324.998
BogoMIPS:              6649.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-7
Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm ibrs ibpb spec_ctrl

I’m also using Asternic FOP2.
A few questions.
I’ve seen this:

If I recall, dahdi_dummy is not a requirement anymore.
“affinity bits on the VM and lock all cores down to a dedicated cpu”
REALLY?

I suspect a deadlock. Core show deadlocks doesn’t seem to be compiled in.
Do I have to do a manual install and compile Asterisk myself with debug flags?
What are my options?

Have you tried logging into asterisk CLI and making a call during a period where freepbx is supposedly not processing calls, to check whether the call is received but not moving forward, or not received at all?

Step 1: Move to 13 or 16 as 15 is no longer supported. Until you exhibit this issue on a supported version nothing will be done really.

Supposedly, sangomaOS (centos7) comes bundled with asterisk15 and I have seen rpm updates coming from them. That is the idea of running an asterisk distro.
Now if sangoma doesn’t update and one needs to do this manually, no prob, but are we sure that this is the case?

@stom Yes, between Oct 2017 and Oct 2018 Asterisk v15 was a “Stable” release. Stable releases of Asterisk have a one-year life span; after that, only security fixes are available for another year. The LTS versions get multiple years of support and bug fixes. v13 and v16 are the current LTS versions.

So if there is a bug in v15 that is causing your issue, it’s not going to be addressed. The answer is going to be “Move to v13 or v16 and replicate. If still an issue open a bug report”. I’m just helping you cut out the extra dance moves you would have to do for this.

If you are running Asterisk 14/SNG7, make sure you have all the latest system updates and then, via the system CLI, run asterisk-version-switch. You will be presented with options for v13, v15 (maybe, and it might say EOL) and v16. Pick v13 or v16 and it will update Asterisk for you in a minute or two.

Once you’ve done that, test your issue again. If you can replicate it, then we can figure out if it is really an Asterisk issue or something else and actually be able to move forward on it.

I’m running Asterisk 15/SNG7.
Pick the Asterisk Version you would like to change to.
Press 1 and the Enter key for Asterisk 13 (LTS) (With Opus and G729 codecs)
Press 2 and the Enter key for Asterisk 15 (With Opus and G729 codecs)
Press 3 and the Enter key for Asterisk 16 (LTS) (With Opus and G729 codecs)
Press 9 and the Enter key to exit and not change your Asterisk Version

I have a feeling the timing is bad.
Asterisk 16 is too new. 13 is a tad old. And 15 is feature EOL.
I can’t replicate the issue easily. It might take weeks to occur. And without core show deadlocks, I have nothing to report either.

So is it safe to move among versions? Can I switch to 16 and see how it goes?
Can I switch back to 13 without reinstalling?
And will FreePBX on top tolerate all this?
(Yes, I have a virtualised environment, but it’s still a lot of hassle doing this with sane backups on a production system.)
And yes, I’m also trying to avoid a few dance moves :slight_smile:

Sure I have. Call appearances turn busy everywhere, incoming calls time out because they are not processed, and nothing is logged on the CLI (but I can issue e.g. core reload successfully). Internal calls fail too.

“Stable” versions are where new features and updates are made. PJSIP was introduced in 12, a Stable release. Asterisk 14 and 15 were both Stable releases due to the amount of changes made to Asterisk over those versions. This broke the old cycle of odd numbers being LTS.

Asterisk 16 is the result of Asterisk 14 and 15’s development. So it being “too new” means it has fixes and updates that the last release of 15 doesn’t. So does 13. These are Long Term Support releases. They get active bug fixes during their release cycle. 13.23.1 was just released a few weeks ago.

I’ve been running 16 since its release and I have had zero issues.

That is strange for sure. Nothing logged on dmesg either?

Dmesg has only the last boot messages, days ago.
I don’t see any system related issues, both at the guest and the kvm host which has an uptime of 61 days with little load and no issues on other guests

I’m tempted to jump ship to 16… This might even solve the race condition that I’m facing (or make it impossible to miss).
We shall see.
I have a maintenance window over the weekend.

The only bad thing is that even if it is fixed, the doubt will remain for weeks :frowning:

System was updated to Asterisk 16.0.
The first day’s run was uneventful, with 10,500 call legs.

I have no means to replicate the issue on purpose, so I guess I have to wait for it to (not) happen. :slight_smile:
Update:
So, an uneventful week passed, with about 30,000 call legs and not a glitch.
No changes in configuration, no updates.
Suddenly today the system stopped processing calls,
with lots of
[2018-11-16 13:15:14] WARNING[28615][C-00000fbb]: channel.c:1124 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/[email protected];2
The first occurrence was at 13:10.
It looks and feels like a deadlock.
freepbx*CLI> core show version
Asterisk 16.0.0 built by mockbuild @ jenkins7 on a x86_64 running Linux on 2018-10-10 17:15:57 UTC
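
To pin down when the warnings started and which channels they hit, a small script can scan the Asterisk log. This is only a sketch: it assumes log lines shaped like the warning quoted above, and the function names and the log path in the usage note are illustrative, not part of any Asterisk tooling.

```python
import re
from collections import Counter

# Matches lines shaped like the warning quoted above, e.g.
# [2018-11-16 13:15:14] WARNING[28615][C-00000fbb]: channel.c:1124
#   __ast_queue_frame: Exceptionally long voice queue length queuing to <channel>
# The [C-...] call ID is treated as optional.
WARNING_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] WARNING\[\d+\]"
    r"(?:\[C-[0-9a-fA-F]+\])?: .*?"
    r"Exceptionally long voice queue length queuing to (?P<chan>\S+)"
)


def summarize(lines):
    """Return (timestamp of first matching warning, Counter of hits per channel)."""
    first_seen = None
    per_channel = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match is None:
            continue  # not a voice-queue warning
        if first_seen is None:
            first_seen = match.group("ts")
        per_channel[match.group("chan")] += 1
    return first_seen, per_channel


def summarize_file(path):
    """Same as summarize(), reading the log from a file (path is an assumption)."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

Calling `summarize_file("/var/log/asterisk/full")` would be the typical use, assuming the full log lives at that path on this install. The first timestamp gives the start of the incident, and the per-channel counts show whether one Local channel is responsible or many.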

One reference is here:
https://issues.asterisk.org/jira/browse/ASTERISK-26956 with no real resolution,
and this one:
{SOLVED} Weird! (Strange Things)
which was put down to DNS, which seems very unlikely here.
First dns was 127.0.0.1 eliminated to be the only one.
(other ns’s are local and with no issues too.)

What is the best approach to enable core show deadlocks?

You have to build asterisk from source.
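
As a rough sketch of what that build involves: in Asterisk, the lock-debugging CLI command (`core show locks`) is only present when the `DEBUG_THREADS` compiler flag is enabled in menuselect. The script below just lists and optionally runs the usual build steps from an unpacked source tree. The function name and source directory are hypothetical, and replacing the packaged Asterisk on a SangomaOS/FreePBX system with a source build is a separate decision this sketch does not address.

```python
import shlex
import subprocess

# Build steps, run from the top of an unpacked Asterisk source tree.
# DEBUG_THREADS enables lock tracking ("core show locks");
# DONT_OPTIMIZE keeps backtraces readable.
BUILD_STEPS = [
    ["./configure"],
    ["make", "menuselect.makeopts"],
    ["menuselect/menuselect",
     "--enable", "DEBUG_THREADS",
     "--enable", "DONT_OPTIMIZE",
     "menuselect.makeopts"],
    ["make"],
    ["make", "install"],
]


def build(source_dir, dry_run=True):
    """Return the planned commands; execute them in source_dir unless dry_run."""
    planned = []
    for step in BUILD_STEPS:
        planned.append(shlex.join(step))
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(step, cwd=source_dir, check=True)
    return planned
```

With the default `dry_run=True`, `build("/usr/src/asterisk-16.0.0")` (a hypothetical path) only returns the command strings, so they can be reviewed before anything is run. Note that `DEBUG_THREADS` adds overhead, so it is usually enabled only while chasing a suspected deadlock.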