Packets not being processed by Asterisk on Debian 12

dotcom · February 2, 2025, 12:54pm

Maybe stupid question: Do you guys also use the combination:

Debian 12.2
Asterisk 18 or 20
FreePBX 17.0.19.23

Or is this too new?

BlazeStudios · February 2, 2025, 2:18pm

How did you install all this?

dotcom · February 2, 2025, 7:26pm

Spun up Debian 12 netinst → Used the FreePBX 17 installation script (GitHub - FreePBX/sng_freepbx_debian_install: FreePBX 17 Installation Script) to make life easier.

PS: Sorry, I’m running Debian 12.9 (not 12.2)

BlazeStudios · February 2, 2025, 7:51pm

And how did you get on Asterisk 18 or 20?

dotcom · February 2, 2025, 8:04pm

Using asterisk-version-switch

dotcom · February 5, 2025, 9:36pm

No other ideas? Pretty lost here what could cause this…
Are you guys also running FPBX 17 on Debian 12?

BlazeStudios · February 5, 2025, 9:50pm

Anyone running FreePBX v17 is running it on Debian 12. These issues don’t exist on the v17 systems I have.

So none of the endpoints can register or make calls either?

jcolp · February 5, 2025, 10:02pm

From an Asterisk perspective, we haven’t gotten any reports directly/via FreePBX/anywhere of things like this.

dotcom · February 6, 2025, 7:46am

I found this similar post in the Asterisk forum… from end of last year:

Here it seems you suggested taking a backtrace, but can you please clarify how I can take one on my FPBX system using ast_coredumper? (without compiling Asterisk?)

dotcom · February 6, 2025, 7:47am

Correct yes… Asterisk suddenly stops processing SIP packets. (they seem stuck in the kernel buffer)

jcolp · February 6, 2025, 9:15am

@kgupta What is the current method for getting a backtrace with FreePBX 17 on Debian?

kgupta · February 6, 2025, 10:36am

We have yet to push the asterisk debug info packages for the Debian platform. We have an internal Jira ticket for this and will try to release the package ASAP.

Best Regards
Kapil

dotcom · March 2, 2025, 8:32pm

Hi again, just wondering if you already have an idea when this will be pushed? Thanks!

dotcom · March 25, 2025, 10:27am

Unfortunately still having the issue here…

When the issue occurred last time, I took a core dump of the asterisk process using:
kill -SIGSEGV $(pidof asterisk)

Afterwards used the following to open the backtrace:
gdb /usr/sbin/asterisk /tmp/coreXXX

When loaded, took a backtrace using:
bt full

Took the output to my good friend ChatGPT and it said the following:

From the provided trace:

The call stack is blocked in a poll() call (__GI___poll) with a timeout=-1. A negative timeout means it will wait indefinitely for an event to occur.

The function calls beneath the poll() indicate involvement of the console input (ast_el_read_char, el_wgetc, el_wgets, el_gets). These functions originate from Asterisk’s use of the libedit library (used for the interactive Asterisk CLI).

Why this causes your issue:

When you run with -c, Asterisk is attached to the terminal. If the SSH session, terminal window, or connection hangs, gets interrupted, or experiences any disruption, it can block Asterisk indefinitely in an internal poll() call waiting for console input, exactly as your stack trace shows.

When I run ps -ef | grep asterisk, I see 2 processes:

root 1697286 1 0 08:28 pts/0 00:00:00 /bin/sh /usr/sbin/safe_asterisk -U asterisk -G asterisk
asterisk 1697288 1697286 14 08:28 pts/0 00:24:50 /usr/sbin/asterisk -f -U asterisk -G asterisk -vvvg -c

“safe_asterisk” seems pretty old, not sure if that’s normal?

I initially installed FPBX / Asterisk using the sng_freepbx_debian_install.sh script…

dotcom · March 25, 2025, 11:07am

PS: According to my GPT friend:

On FreePBX-managed systems, the correct, production-ready method of starting Asterisk (managed by fwconsole) is typically:

/usr/sbin/asterisk -U asterisk -G asterisk

On my system, it currently runs as follows:
/usr/sbin/asterisk -f -U asterisk -G asterisk -vvvg -c

Should I change it? And if so, how?

david55 · March 25, 2025, 12:17pm

Killing with a false reason will just cause confusion.

For a presumed deadlock, you should use the coredumper script, which will, I believe, attach gdb to the running process.

It sounds like our favourite misinformation source has diagnosed the wrong thread. You need to provide the actual all-threads backtrace, not an LLM’s attempt to summarise a single, probably innocent, thread.

dotcom · March 25, 2025, 12:34pm

OK… so I will need to wait until the FPBX devs provide the coredumper script for Debian, right?

david55 · March 25, 2025, 12:49pm

You can use gcore to take a core dump of a running process.

Then you can load it in gdb, and use the “thread apply all bt” command to get backtraces for all the threads.

coredumper will do more, but you would need to look at how it works.
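
As a rough sketch of the gcore route (not the coredumper script itself): the function below dumps a core of the running process and writes the all-threads backtrace to a text file. The function name, the /tmp/asterisk-core prefix, and the .bt.txt suffix are made up for illustration. It assumes gdb (which ships gcore) is installed, that it runs as root, and that the binary is /usr/sbin/asterisk as in the gdb command earlier in the thread. Without debug symbols the frames will likely show function names but little source or line information.

```shell
#!/bin/sh
# Sketch only: take a core of a *running* Asterisk with gcore (the process is
# paused briefly, not killed) and save the backtraces of all threads.
# Hypothetical usage: dump_asterisk_threads "$(pidof asterisk)"
dump_asterisk_threads() {
    pid="$1"                            # PID of asterisk itself, not safe_asterisk
    prefix="${2:-/tmp/asterisk-core}"   # assumed output prefix, change as needed
    core="$prefix.$pid"                 # gcore -o PREFIX PID writes PREFIX.PID

    gcore -o "$prefix" "$pid" || return 1

    # -batch runs the given command(s) and exits instead of opening a prompt.
    gdb -batch -ex "thread apply all bt" /usr/sbin/asterisk "$core" \
        > "$core.bt.txt" 2>&1

    echo "$core.bt.txt"                 # print where the backtrace was written
}
```

The resulting text file is what should be posted, rather than a summary of it.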

dotcom · March 25, 2025, 3:58pm

Analysis: Core Issue Identified (Deadlock in PJSIP / TCP transport):

Multiple threads (especially threads #90, #96, #111, and #117) are blocked, attempting to acquire a lock in the PJSIP (libasteriskpj.so) layer:

Key repeated pattern across threads:

#0  futex_wait
#1  __GI___lll_lock_wait
#2  pthread_mutex_lock
#3  pj_mutex_lock
#4  pj_lock_acquire
#5  grp_lock_acquire
#6  pj_grp_lock_acquire
#7  pj_ioqueue_lock_key
#8  pj_ioqueue_send
#9  pj_activesock_send
#10 tcp_send_msg
...

Specifically, threads seem stuck on sending TCP messages (tcp_send_msg), indicating a lock-up or deadlock within PJSIP’s TCP transport layer.

Does this ring a bell on your side by any chance?

david55 · March 25, 2025, 5:55pm

Whilst it doesn’t ring any bells, it does sound like a deadlock.

What is the source code line number from which the above is called, and what is the version of the source code? The full stack frame for frames 7 through 9 might also be helpful.

I assume this wasn’t built with thread debugging, so there is no easy way of finding what currently owns the lock that isn’t available, but maybe look for a thread that is waiting on something different within tcp_send_msg (although it is possible that a lock was not released before that exited).
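
One way to look for that odd thread out, sketched under the assumption that the “thread apply all bt” output was saved to a text file in gdb's usual layout ("Thread N (...)" header lines followed by "#0", "#1", ... frame lines). The helper name threads_in_frame is hypothetical. It lists every thread whose stack contains a given function together with that thread's innermost frame, so a thread that is inside tcp_send_msg but not sitting in futex_wait stands out.

```shell
#!/bin/sh
# Sketch only: print "<thread id> <innermost function>" for every thread whose
# backtrace mentions the given frame name.
# Hypothetical usage: threads_in_frame tcp_send_msg /tmp/asterisk-core.1234.bt.txt
threads_in_frame() {
    frame="$1"   # function name to search for, e.g. tcp_send_msg
    file="$2"    # saved output of "thread apply all bt"
    awk -v frame="$frame" '
        # Print the previous thread if its stack contained the frame.
        function emit() { if (id != "" && hit) print id, top }

        # A new thread header: report the previous thread, then reset state.
        /^Thread [0-9]+/ { emit(); id = $2; top = ""; hit = 0; next }

        # Innermost frame is either "#0  func (...)" or "#0  0xADDR in func (...)".
        /^#0 /           { top = ($3 == "in") ? $4 : $2 }

        # Any frame line that mentions the function we are looking for.
        /^#[0-9]+ /      { if (index($0, frame)) hit = 1 }

        END              { emit() }
    ' "$file"
}
```

Threads reported with futex_wait on top are the ones queued behind the lock; anything else on top is a candidate for the current holder, though as noted the lock may also have been leaked by a thread that has already left tcp_send_msg.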