Phones randomly don't ring in a ring group, fail over to queue, queue phones don't ring, calls go to voicemail

darkpixel · January 16, 2020, 7:48pm

We had a perfectly working FreePBX server a few weeks ago, except it was starting to get overloaded.
We purchased a new, larger server, backed everything up, and then restored to the new phone server.

We found out very quickly that a “full backup” apparently doesn’t mean a “full backup”.
A lot of settings weren’t restored properly. One example is incoming routes. We had a handful set to ‘detect faxes’. These were magically flipped into ‘legacy mode’ so faxes were routed incorrectly. We had to manually change ~200 phone lines back to ‘yes’ instead of ‘legacy’…so this problem may be related, but I can’t seem to find it.

When people call in, it goes to an IVR that asks them to press 1 if they are a new customer or 2 if they are an existing customer. The options are simply for tracking purposes on our end.

Pressing 1 sends them to a queue–and for the purposes of this issue I’m ignoring this option.

Pressing 2 sends them to a ring group.
The ring group consists of a handful of Digium D60 phones.
Dialing that ring group from my internal phone always works. It always rings the phones, and someone always answers.

Calling in from a number of different sources (cell phone, home phone, Google Voice, etc…) will sometimes cause the phones to ring, and other times it immediately jumps to the ring group failover destination which is a time condition. If we are ‘in hours’ it goes to the queue I mentioned earlier. If we are ‘outside of hours’ it goes to voicemail.

It seems to be hit-or-miss, but when the phones in the ring group don’t ring, I will sometimes get dumped into the queue, and other times I will get dumped to voicemail.

Looking at the logs, I see:

  == Spawn extension (from-internal, 4001, 1) exited non-zero on 'PJSIP/4002-00005f23'
  == Spawn extension (from-internal, 4001, 1) exited non-zero on 'PJSIP/4003-00005f24'
  == Spawn extension (from-internal, 4001, 1) exited non-zero on 'PJSIP/4010-00005f25'
  == Spawn extension (from-internal, 4001, 1) exited non-zero on 'PJSIP/4020-00005f26'

If I immediately dial those extensions or the ring group on my desk phone, the call connects and the phones ring.

Each office has a firewall that is NOT running SIP ALG. UDP timeouts are set to 300 seconds, and nothing firewall-related has changed in months. It’s the same config that was working before the move to the new server. Each office connects out over the internet to our phone server. There is no VPN or ‘internal network’ involved in the phones communicating. The only phones on the same network as the phone server are the call center’s. The call center users are the ones in the queue I referred to previously.

Rebooting the phones does appear to clear up the issue for a few days, then it comes back…but that still doesn’t explain why the queue will frequently send users directly to voicemail when there are agents signed in and ready to take calls…and calls from other desk phones are able to get through to the queue.

PitzKey · January 16, 2020, 8:40pm

Why didn’t you use bulk handler for that? Takes less than a minute to export, change an entire column, import and apply config. Bam!

Now to the Ring Group issue. Why are you using a Ring Group over a Queue?

To understand exactly why this is happening, please post a full (pastebin link) call trace

https://wiki.freepbx.org/display/SUP/Providing+Great+Debug#ProvidingGreatDebug-AsteriskLogs-PartII

darkpixel · January 16, 2020, 10:56pm

Why didn’t you use bulk handler for that? Takes less than a minute to export, change an entire column, import and apply config. Bam!

Darn it! I forgot the bulk handler supported routes. I wish I had remembered that a few days ago.

Now to the Ring Group issue. Why are you using a Ring Group over a Queue?

We have a large call center that answers calls. They are using a queue. But each individual office has a ring group because we want to ring their phones a few times and then send them to the call center queue if they’re busy. No need to manage agents, service timeouts, or pay for a bunch of queues in iSymphony when a ring group is more than enough.

We have several thousand calls per hour going through the system, so my debug snippet was short. I think they are closed on Monday for MLK day. If so I’ll grab debug info and packet captures–assuming I can reproduce it. We rebooted the phones a few hours ago, so hopefully it’ll reproduce in the next few days.

PitzKey · January 16, 2020, 11:01pm

Did you read the wiki I linked?

It’s described in detail how to find a call trace up to 7 days ago.

darkpixel · January 16, 2020, 11:35pm

Yup–I read it. I know I can get the info from /var/log/asterisk/full and I’m looking into that now–but it’s a bit easier to grab packet captures from a system when there aren’t hundreds of active calls.

PitzKey · January 17, 2020, 9:47am

There’s no need for a packet capture, and no one asked for it here.

dicko · January 17, 2020, 10:06am

you can grep out individual calls from the log files with what’s inside the second bracketed string, after VERBOSE. Add flavor by adding the first bracketed string (escape the bracket characters themselves in your grep expression).
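As a rough sketch of that filter: it assumes the usual layout of /var/log/asterisk/full, where each line reads `[timestamp] VERBOSE[thread][C-xxxxxxxx] ...` and the second bracketed string after VERBOSE is the call ID. `call_trace` is a hypothetical helper name, and the sample lines and the call ID C-0000a1b2 are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every log line that belongs to one call ID.
# grep -F treats the pattern as a fixed string, so the square brackets need
# no escaping; with a plain regex you would write '\[C-0000a1b2\]' instead.
call_trace() {
    local call_id="$1" logfile="$2"
    grep -F "[${call_id}]" "$logfile"
}

# Invented sample lines in the assumed log layout, for demonstration only.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
[2020-01-16 14:02:11] VERBOSE[2201][C-0000a1b2] pbx.c: (sample) first line of the call
[2020-01-16 14:02:11] VERBOSE[2207][C-0000a1b3] pbx.c: (sample) a different call
[2020-01-16 14:02:12] VERBOSE[2201][C-0000a1b2] app_dial.c: (sample) second line of the call
EOF

# Prints only the two lines tagged with C-0000a1b2.
call_trace "C-0000a1b2" "$sample_log"
```

On a real system the second argument would be something like /var/log/asterisk/full or a rotated full.N file, and the output can be piped into less.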

darkpixel · January 17, 2020, 4:46pm

There’s no need for a packet capture, and no one asked for it here.

No worries. I won’t send you a packet capture.

you can grep out individual calls from the log files with what’s inside the second bracketed string

Yup. Are you volunteering to manually search through ~3,000,000 lines of logfiles to find the issue so you can grab the right string to grep?

I’ll wait until I can duplicate the issue again (shouldn’t be more than a few days) and traffic levels are low so tailing the logs isn’t like watching the star wars credits on 64x speed.

lgaetz · January 17, 2020, 4:48pm

This link shared above shows how to effortlessly isolate a single call trace from the full log(s):
https://wiki.freepbx.org/display/SUP/Providing+Great+Debug#ProvidingGreatDebug-AsteriskLogs-PartII

darkpixel · January 17, 2020, 9:05pm

This link shared above shows how to effortlessly isolate a single call trace from the full log(s):

Maybe I’m not explaining myself well enough.
I’ve read the wiki page. I know how to grab a call ID from the logs.

Now…out of several million lines of log files that occurred on the day this happened…find a call ID.

That’s the problem.

I’ll grab a call ID when:

1. The issue is occurring again
2. The system is slow and not flooding my terminal with thousands of log lines per second so I can more easily pick out my call where I can duplicate the issue and its associated Call ID.

dicko · January 17, 2020, 11:11pm

And maybe we are not explaining well enough.
There will be one file per day in /var/log/asterisk/full*

It doesn’t matter how many lines in the files because linux has all the tools you need to filter them appropriately, both programmatically and visually.
For example, if Betty complained about something that happened yesterday at about 2pm with ring group 1027, then the file to explore is full.1

cat /var/log/asterisk/full.1|less

will allow you to scroll backwards and forwards through the day with great alacrity, and searching for anything is as easy as typing “/” and a regex, perhaps “/1027” for a specific endpoint, “/RG-” for ring groups, “/ HH:MM” for a time, etc. The matched expression will be highlighted and the cursor will go to the first match, and given fast scrolling it is really easy to home in.

You can “pipe” the results of one grep into another so to filter a problem with RG 1027 at around 2pm yesterday

cat /var/log/asterisk/full.1|grep "RG-1027"|grep " 1[34]:"|less

would likely expose a suspect call or calls between 1pm and 2:59:59pm, and you can extract the call IDs of the likely suspects, and then

cat /var/log/asterisk/full.1 |grep -E "(CALL_ID1|CALL_ID2)" |less

Now all the log lines for each call, not just the “quick scanned” ones from step 1, become apparent, and “Bob’s your uncle”. There is no need to wait until nothing is happening, and there is no need to strain your eyes: everything moves in as slow a motion as you care to hit the arrow keys. Further, the problem stays open for re-analysis until logrotate ages the event away.
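To string those steps together without copying call IDs by hand, here is a minimal sketch. It assumes each log line looks like `[YYYY-MM-DD HH:MM:SS] VERBOSE[thread][C-xxxxxxxx] ...`. `suspect_call_ids` and `dump_calls` are hypothetical helper names, and the sample log lines are invented.

```shell
#!/usr/bin/env bash
# Step 1 (hypothetical helper): list the unique call IDs of lines that contain
# a search string (e.g. "RG-1027" or "exited non-zero") within an hour range.
suspect_call_ids() {
    local needle="$1" hours="$2" logfile="$3"
    grep -F -- "$needle" "$logfile" \
        | grep -E "^\[[0-9-]+ (${hours}):" \
        | grep -oE '\[C-[0-9a-f]+\]' \
        | tr -d '[]' \
        | sort -u
}

# Step 2 (hypothetical helper): read call IDs on stdin and print every log
# line of those calls. sed re-wraps each ID in brackets; grep -F -f - then
# uses them as fixed-string patterns read from stdin.
dump_calls() {
    local logfile="$1"
    sed 's/.*/[&]/' | grep -F -f - "$logfile"
}

# Invented sample lines in the assumed log layout, for demonstration only.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
[2020-01-16 13:59:58] VERBOSE[2201][C-0000a1b2] app_dial.c: (sample) Called RG-1027
[2020-01-16 14:00:03] VERBOSE[2201][C-0000a1b2] pbx.c: (sample) follow-up line
[2020-01-16 14:00:04] VERBOSE[2207][C-0000a1b3] pbx.c: (sample) unrelated call
[2020-01-16 16:30:00] VERBOSE[2290][C-0000a1c4] app_dial.c: (sample) Called RG-1027
EOF

# Ring group 1027 between 13:00 and 14:59, then the full trace of each hit.
suspect_call_ids "RG-1027" "1[34]" "$sample_log" | dump_calls "$sample_log"
```

Against a real log, replace "$sample_log" with /var/log/asterisk/full.1 and pipe the result into less. The search string can just as well be an error message, which avoids needing to know the ring group in advance.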

sorvani · January 18, 2020, 1:28am

All of those instructions aside, the wiki clearly gives you an example of how to get the information for a single call from the GUI.

dicko · January 18, 2020, 1:43am

Apparently the OP seems to be stuck in the Asterisk CLI (maybe I misunderstand his posts, though), and I thought that needed a correction/reorientation. When he finds ssh/bash (PuTTY?) he will be ‘clear’ and the good old standard/easy debugging recipes will work for him. Finding the gold nugget in three million lines still works as it did in 1849: filter out the big rocks, filter out the medium rocks, filter out the small rocks, shake and bake the resulting sand, that’s where the nugget is . . .

AdHominem · January 18, 2020, 3:23am

As I said in another thread, I would not rely on the backup and restore feature to move from one machine to another.

If you want to set up a different machine, set it up from scratch. I know that it’s time consuming (more so now than ever), but that’s nothing compared to the amount of time that you’ll spend figuring out all the stuff the backup and restore did and didn’t do.

AdHominem · January 18, 2020, 3:24am

The latest version of FreePBX has a bug in bulk import for extensions. The import fails, the extension is not shown in the GUI, but it IS partially created in Asterisk. I would avoid bulk import until it gets fixed.

dicko · January 18, 2020, 3:30am

Post the Bug reference please, I don’t see a problem in my deployments

PitzKey · January 18, 2020, 4:40pm

I don’t recall seeing any recent community posts with such an issue, but could be there is one.

Anyway, OP needed something for inbound routes, not for extensions.

Also, what’s the “latest version” you are referring to? which version of FreePBX? Edge version?

P.S. a lot of times where users had trouble with bulk import, the actual CSV had errors.
