Callers randomly being put on hold, callee drops

I’ve been running into a bizarre issue for the last few months, and I can’t seem to track it down.

When someone makes a call (from outside into our phone system, or inside-to-inside), they go through one of these paths:

  • They wait in a queue to talk to someone
  • They talk with someone and are put on hold (parked) to be picked up from another phone
  • They pass through a ring group and talk to someone

In all but the last ‘path’, something strange occasionally happens:
The callee answers the phone (or unparks the caller) and has anywhere from a few to 30 seconds of conversation with the caller. Then the callee’s side suddenly drops as if they had hung up, and the caller immediately starts hearing hold music.

Out of ~20,000 calls per day, this happens to maybe 50 of them.

The caller who hears the hold music doesn’t appear to be in one of our ‘parking groups’, because they never stop hearing it: all our parking groups send the call back to a ring group after ~180 seconds. Instead, the caller stays on hold apparently forever, until they decide to hang up.

Anyway, it’s difficult to troubleshoot because it’s a needle in a haystack, but I got lucky this morning and it happened to me.

In the logs, I see the following:

[2021-04-09 11:50:12] VERBOSE[34465][C-0000a6c6] bridge_channel.c: Channel PJSIP/4130-000255db joined 'simple_bridge' basic-bridge <17c0dee5-f718-4747-832b-0e768cb0b745>
[2021-04-09 11:50:12] VERBOSE[33709][C-0000a6c6] bridge_channel.c: Channel IAX2/CAIT-SD-11497 joined 'simple_bridge' basic-bridge <17c0dee5-f718-4747-832b-0e768cb0b745>
[2021-04-09 11:50:30] VERBOSE[33709][C-0000a6c6] bridge_channel.c: Channel IAX2/CAIT-SD-11497 left 'simple_bridge' basic-bridge <17c0dee5-f718-4747-832b-0e768cb0b745>
[2021-04-09 11:50:30] VERBOSE[33709][C-0000a6c6] parking/parking_bridge.c: Parking 'IAX2/CAIT-SD-11497' in 'parkinglot_2' at space 4151
[2021-04-09 11:50:30] VERBOSE[33709][C-0000a6c6] bridge_channel.c: Channel IAX2/CAIT-SD-11497 joined 'holding_bridge' parking-bridge <32fce9ac-60dd-42a9-8f94-6902649ac8c7>
[2021-04-09 11:50:30] VERBOSE[33709][C-0000a6c6] res_musiconhold.c: Started music on hold, class 'default', on channel 'IAX2/CAIT-SD-11497'

It literally looks like the callee decided to park my call while we were mid-sentence in a conversation.
After waiting on hold for 5 minutes to see whether it would return me to the ring group, I hung up and called back. I asked whether they (or something else) could have bumped the park button on the phone, and was told “absolutely not”. That matches what I hear from every other office.
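Since only ~50 of ~20,000 daily calls are affected, one way to find more of these needles is to scan the verbose log for park events whose channel stays in the parking bridge longer than the ~180-second parking timeout. The Python sketch below is a minimal illustration built on the line formats in the excerpt above. It assumes the full log lives somewhere like /var/log/asterisk/full, and it assumes that a channel leaving the parking bridge is logged as `Channel <name> left 'holding_bridge' parking-bridge <uuid>`, mirroring the `joined` line; that line does not appear in the excerpt. The script name is hypothetical. The `talk_time` field (time from joining the basic bridge to being parked) shows how far into the conversation the park happened.

```python
import re
import sys
from datetime import datetime, timedelta

# Formats taken from the VERBOSE lines in the log excerpt.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] VERBOSE\[\d+\]\[(?P<cid>C-[0-9a-f]+)\] (?P<msg>.*)$"
)
# ASSUMPTION: leaving the parking bridge is logged as
# "Channel X left 'holding_bridge' parking-bridge <uuid>", mirroring the "joined" line.
BRIDGE_RE = re.compile(r"Channel (?P<chan>\S+) (?P<action>joined|left) '\w+' (?P<kind>[\w-]+) <")
PARK_RE = re.compile(r"Parking '(?P<chan>[^']+)' in '(?P<lot>[^']+)' at space (?P<space>\d+)")

PARK_TIMEOUT = timedelta(seconds=180)  # the ~180 s parking-group timeout


def find_stuck_parks(lines, timeout=PARK_TIMEOUT):
    """Return park events whose channel stayed parked longer than `timeout`."""
    joined_at = {}  # channel -> when it last joined a basic bridge
    parked = {}     # channel -> park details, until it leaves the parking bridge
    stuck = []
    last_ts = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        last_ts = ts
        b = BRIDGE_RE.search(m["msg"])
        p = PARK_RE.search(m["msg"])
        if b and b["kind"] == "basic-bridge" and b["action"] == "joined":
            joined_at[b["chan"]] = ts
        elif b and b["kind"] == "parking-bridge" and b["action"] == "left":
            event = parked.pop(b["chan"], None)
            if event and ts - event["parked_at"] > timeout:
                stuck.append({**event, "held": ts - event["parked_at"], "still_parked": False})
        elif p:
            start = joined_at.get(p["chan"])
            parked[p["chan"]] = {
                "channel": p["chan"],
                "call_id": m["cid"],
                "lot": p["lot"],
                "space": int(p["space"]),
                "parked_at": ts,
                "talk_time": ts - start if start else None,
            }
    # Channels still parked when the log ends count too, if they exceeded the timeout.
    for event in parked.values():
        held = last_ts - event["parked_at"]
        if held > timeout:
            stuck.append({**event, "held": held, "still_parked": True})
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 stuck_parks.py /var/log/asterisk/full  (path is an assumption)
    with open(sys.argv[1], errors="replace") as log:
        for e in find_stuck_parks(log):
            print(e["call_id"], e["channel"], e["lot"], e["space"],
                  e["parked_at"], "talked", e["talk_time"], "held", e["held"])
```

Run against the excerpt alone, this reports nothing, because the log ends in the same second the call was parked. With a later `left 'holding_bridge'` line (or enough later log lines) it flags the call once it has been parked for more than 180 seconds.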

The only thing that looks even remotely like an error in the log is:

[2021-04-09 11:49:57] WARNING[33723][C-0000a6c6] taskprocessor.c: The 'stasis/m:cache_pattern:1/channel:all-0000030d' task processor queue reached 500 scheduled tasks again.

This warning appears around 50 times per day, roughly the same as the number of affected calls.
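To check whether the taskprocessor warnings actually line up with the bad calls, rather than just occurring at a similar rate, the sketch below counts the warnings per day and, for every park event, measures how many seconds have passed since the most recent warning. It relies only on the two line formats quoted in this post; the 60-second window is an arbitrary assumption. Ordinary parks also match, so the number is only meaningful when you compare the affected calls against all parks.

```python
import re
from bisect import bisect_right
from collections import Counter
from datetime import datetime

TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
WARN_MARK = "task processor queue reached"  # from the taskprocessor WARNING line
PARK_MARK = "parking_bridge.c: Parking '"    # from the park line in the excerpt


def correlate(lines, window_s=60):
    """Count warnings per day and, for each park, seconds since the latest warning."""
    warnings, parks = [], []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if WARN_MARK in line:
            warnings.append(ts)
        elif PARK_MARK in line:
            parks.append(ts)
    warnings.sort()
    per_day = Counter(w.date() for w in warnings)
    gaps = []  # seconds since the most recent warning, or None if none came before
    for p in parks:
        i = bisect_right(warnings, p)  # number of warnings at or before p
        gaps.append((p - warnings[i - 1]).total_seconds() if i else None)
    near = sum(1 for g in gaps if g is not None and g <= window_s)
    return {"warnings_per_day": per_day, "gaps": gaps, "parks_near_warning": near}
```

For the call captured above, the warning at 11:49:57 precedes the park at 11:50:30 by 33 seconds.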

Anyway, I set max_size to 700 in stasis.conf, but it looks like a ‘core reload’ won’t cut it. I’m guessing I have to fully restart Asterisk this evening, outside production hours.

Any thoughts on whether I’m on the right path?

After changing stasis.conf and restarting Asterisk, the issue is still occurring.
It just happened to me again this morning. I called an office from my internal extension and talked with someone for a few seconds, and they parked my call. Someone else unparked me and managed to get out “Hey, it’s Sue”, then the call cut out completely for a split second and suddenly I was hearing hold music again.

I immediately called back, “Sue” answered and said “#@$#% phon–” and I was hearing hold music again.

I called back a third time and the call went through like normal.

During all that, I managed to gather some basic info: the Asterisk process was at 99% CPU usage, with 8 calls in progress at the time.

This is on a single-socket, 10-core 2.2 GHz CPU with 32 GB of RAM and mirrored SSDs.

There was also an update to the latest FreePBX version in between, and what I’m seeing now (which wasn’t happening before) is that the Asterisk process sits at 100% CPU usage all the time, even when there are no calls.
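To see where that CPU time goes, a per-thread breakdown may help. In top’s default mode, 100% is one core’s worth of CPU time, so on a 10-core machine it may be a single thread spinning rather than the whole box being saturated. The Linux-only sketch below reads utime and stime (fields 14 and 15 in proc(5)) from /proc/<pid>/task/<tid>/stat twice and reports each thread’s share of one core over the interval. The script name in the usage line is hypothetical.

```python
import os
import sys
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def parse_stat(text):
    """Return (comm, utime + stime in clock ticks) from one /proc/.../stat line."""
    head, _, tail = text.rpartition(")")  # comm may contain spaces or ')'
    comm = head.partition("(")[2]
    fields = tail.split()  # fields[0] is field 3 ("state") in proc(5)
    return comm, int(fields[11]) + int(fields[12])  # field 14 utime + field 15 stime


def _snapshot(pid):
    """Map thread id -> (comm, cpu ticks) for every thread of `pid`."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[tid] = parse_stat(f.read())
        except FileNotFoundError:
            pass  # thread exited between listdir() and open()
    return result


def thread_cpu(pid, interval=1.0):
    """Per-thread CPU % over `interval` seconds; 100 means one full core."""
    before = _snapshot(pid)
    start = time.monotonic()
    time.sleep(interval)
    after = _snapshot(pid)
    elapsed = time.monotonic() - start
    usage = []
    for tid, (comm, ticks) in after.items():
        if tid in before:  # skip threads created during the sample
            pct = (ticks - before[tid][1]) / CLK_TCK / elapsed * 100
            usage.append((pct, tid, comm))
    return sorted(usage, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 thread_cpu.py "$(pidof asterisk)"
    for pct, tid, comm in thread_cpu(int(sys.argv[1]))[:10]:
        print(f"{pct:6.1f}%  tid={tid}  {comm}")
```

If one thread accounts for nearly all of the 100% even with no calls up, that thread id can then be matched against a debugger backtrace or Asterisk’s own thread listing.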
