Asterisk quirk observer seeking sage advice

A clean load (amportal restart) tends to work well. After a few changes and the subsequent reloads I end up with the FAX detection code getting skipped. It isn’t that it has anything to do with FAX but seems to be more elusive.

Consider this excerpt from extensions_additional.conf

exten => 8885554320,1,Set(__FROM_DID=${EXTEN}) exten => 8885554320,n,ExecIf($[ "${CALLERID(name)}" = "" ] ?Set(CALLERID(name)=${CALLERID(num)})) exten => 8885554320,n,Set(__CALLINGPRES_SV=${CALLERPRES()}) exten => 8885554320,n,Set(CALLERPRES()=allowed_not_screened) exten => 8885554320,n,Set(FAX_DEST=ext-fax^4320^1) exten => 8885554320,n,Answer exten => 8885554320,n,Wait(4) exten => 8885554320,n(dest-ext),Goto(ext-queues,432099,1)

But then look at what shows in the log:

[Sep 15 13:23:42] VERBOSE[15103] chan_dahdi.c: -- Accepting call from '9115551013' to '8885554320' on channel 0/1, span 2 [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [[email protected]:1] Set("DAHDI/25-1", "CDR(accountcode)=6124604320") in new stack [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [[email protected]:2] ExecIf("DAHDI/25-1", "0?Set(CALLERID(name)="Paul Tuten")") in new stack [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [[email protected]:3] ExecIf("DAHDI/25-1", "0?Set(CALLERID(name)="Peter Mobile")") in new stack [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [[email protected]:4] Set("DAHDI/25-1", "CALLERPRES()=allowed_not_screened") in new stack [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [[email protected]:5] Goto("DAHDI/25-1", "ext-queues,432099,1") in new stack [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Goto (ext-queues,432099,1) [Sep 15 13:23:42] VERBOSE[31905] pbx.c: -- Executing [432[email protected]:1] Macro("DAHDI/25-1", "user-callerid,") in new stack

So, okay, I look at the current dialplan in asterisk:

'8885554320' => 1. Set(__FROM_DID=${EXTEN}) [pbx_config] 2. ExecIf($[ "${CALLERID(name)}" = "" ] ?Set(CALLERID(name)=${CALLERID(num)})) [pbx_config] 3. Set(__CALLINGPRES_SV=${CALLERPRES()}) [pbx_config] 4. Set(CALLERPRES()=allowed_not_screened) [pbx_config] 5. Set(FAX_DEST=ext-fax^4320^1) [pbx_config] 6. Answer() [pbx_config] 7. Wait(4) [pbx_config] [dest-ext] 8. Goto(ext-queues,432099,1) [pbx_config]

I know that the dialplan on this system is pretty big but it certainly isn’t huge. Is there a graceful way to correct the running system without doing a restart? (running 1.6.2.11, fpbx 2.7)

I can get a restart in tonight to get the proper behavior back but I wonder if there’s a way to recover from this or perhaps avoid it all together. Any ideas?

Thanks,

peter

To wrap up this issue – the AMI connection was being closed during the process of building the configuration. A reconnect logic was added to php-asmanager.php to handle this situation. The “why” isn’t known, but this was with asterisk 1.6.2.11.

To add a bit more information, it appeared that during periods of heavy activity during a reload, where many events were hitting the AMI, the connection was being lost as Peter points out. This would result in the php-asmanager.php calls simply returning with no results and since most of the code in FreePBX expects to get an answer, “bad things” would happen.

We have added code in the 2.9 version of this library to detect these situations where the socket unexpectedly closes, and if encountered, try to reconnect and re-issue the command that was attempted. This has been tested to work but we have not yet back-ported it to 2.8 until it gets a bit more testing.

However, the other “solution” that seemed to fix the problem (in as far as it stopped the unexpected disconnects) was disabling events for the manager connection that is opened. Once we did this, we could not get it to fail anymore. This has been back-ported to 2.8 since there is no reason to be bothered with events as we are not doing anything with them.

Between both changes, this issue should be gone. As far as why the AMI socket unexpectedly closes during heavy periods of events, that’s still a mystery and very possibly a bug somewhere in Asterisk.

The call is not in the same context. Note:

exten => 8885554320,1,Set(__FROM_DID=${EXTEN})
exten => 8885554320,n,ExecIf($[ "${CALLERID(name)}" = "" ] ?Set(CALLERID(name)=${CALLERID(num)}))

The above is from your extensions_additional-conf, however, when the call enters the system it goes to another context as you have:

[Sep 15 13:23:42] VERBOSE[15103] chan_dahdi.c:     -- Accepting call from '9115551013' to '8885554320' on channel 0/1, span 2
[Sep 15 13:23:42] VERBOSE[31905] pbx.c:     -- Executing [[email protected]:1] Set("DAHDI/25-1", "CDR(accountcode)=6124604320") in new stack

You have some custom dialplan that sets CDR(accountcode), you probably copied some dialplan from extensions_additional.conf and added it to some _custom.conf file and forgot about it.
When you post examples from your configuration, please include the context.

Here’s my extensions_custom.conf relevant context:

[from-pstn-custom] exten => _+1.,1,Set(DID=${EXTEN:2}) ; Remove +1 from DID exten => _+1.,n,Set(CALLERID(num)=${CALLERID(num):2}) ; Remove +1 from CID exten => _+1.,n,Set(CDR(accountcode)=${DID}) exten => _+1.,n,Goto(from-trunk,${DID},1) ; Go to from-trunk exten => _X.,1,Set(CDR(accountcode)=${EXTEN}) exten => _X.,n,ExecIf($["${CALLERID(number)}" = "6122163551"]?Set(CALLERID(name)="Paul Tuten")) exten => _X.,n,ExecIf($["${CALLERID(number)}" = "6128892293"]?Set(CALLERID(name)="Peter Mobile"))

The dialplan excerpt in the original post is from [ext-did-0002] found in extensions_additional.conf.

I’ve gone over the logs closer to find that the last “properly” handled call was just before a reload. That reload would have added a new Device, User, changed an inbound route for one DID (of the 400 in the system) and added a Conference room definition. (This is a typical Add for this system.)

From that reload until a reload later (there were three reloads in between) the steps for picking up the FAX calls were skipped. This appears to be true for all incoming DIDs, but I’m not sure. When I grabbed the dialplan from asterisk and from disk it was after it had apparently started working again. So my synopsis above isn’t necessarily accurate.

I’ll watch each reload now that I know what to look for in the logs. If the dialplan is being written the same I’ll know this is an asterisk issue. If the dialplan is written without the steps I’ll know it’s a FreePBX issue.

Here is the tell. Reload just when it stopped working as expected:

... [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554319' priority 5 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554320' priority 1 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554320' priority 2 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554320' priority 3 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554320' priority 4 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554320' priority 5 to ext-did-0002 (0x2aaad8031ed0) [Sep 15 10:06:52] VERBOSE[15730] pbx.c: -- Added extension '8885554321' priority 1 to ext-did-0002 (0x2aaad8031ed0) ...

Then the log from the reload where it started working again:

... [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554319' priority 9 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 1 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 2 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 3 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 4 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 5 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 6 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 7 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554320' priority 8 to ext-did-0002 (0x2aaaacb394c0) [Sep 15 15:56:23] VERBOSE[7746] pbx.c: -- Added extension '8885554321' priority 1 to ext-did-0002 (0x2aaaacb394c0) ...

This same pattern can be observed for most of the DIDs.

Thanks Mikael!

Doing misc things – nothing associated with this route being touched – and the extensions_additional.conf has in it:

exten => 8885554319,n(dest-ext),Goto(from-did-direct,4319,1) exten => 8885554320,1,Set(__FROM_DID=${EXTEN}) exten => 8885554320,n,ExecIf($[ "${CALLERID(name)}" = "" ] ?Set(CALLERID(name)=${CALLERID(num)})) exten => 8885554320,n,Set(__CALLINGPRES_SV=${CALLERPRES()}) exten => 8885554320,n,Set(CALLERPRES()=allowed_not_screened) exten => 8885554320,n(dest-ext),Goto(ext-queues,432099,1) exten => 8885554321,1,Set(__FROM_DID=${EXTEN})

I can open this as a bug (and will later today) but what is best next step to get the right info to put in the bug report? I can just include the data here in a report, but it seems like there might be a reload trace or something?

The only thing close I can find that if I last edit a route it will load right. If I edit a conference and reload it builds a wrong config.

What version of FreePBX?
Are you running in Extension mode or in User and Device mode?
In your idle system, do you have xactview installed?

I need a step-by-step instruction how you end up with the error.
You have in the ticket:

Please provide the above so that I get an understanding about your system.

FreePBX 2.7 reasonably current. (Missing the recent disa, infoservices and javassh.)

Running Device and User mode. xactview is installed (idle system is the production pbx only with no active calls – business closed).

change desc of user = Edit Device 4325 ->“Description” to a different name
change settings of user (email addrs) = User 4325 ->
- edit “Display Name”
- edit “Outbound CID”
- edit “FAX Email”
- edit “Voicemail Password”
- edit "Email Address"
change sip secret of device = Edit Device 4325 ->“secret” to a different secret
delete user = User->"Delete User 3908"
delete device = Device->"Delete Device 3908"
change incoming route destination = Point DID for 3908 to Main Menu

In the steps above the employee with ext 4325 left and the number built at 3908 was for the new employee to take their place. It was requested that they keep the same number for the role. The 3808 definition was then removed.

(There are 438 incoming definitions 181 devices and 158 users. The systems handles about 2000 calls per day. I can make the system available, and pay for support, if this is the preferred path.)

I wish I could say if I did “X Y Z” that I could predict it to break, but it doesn’t work like that. It’s as though the FAX configuration is just skipped sometimes for all incoming routes. It doesn’t only break incoming route dialplans with FAX though either. It may not be just FAX that break, but it is the most reported as users answer calls with CNG tones.

If this is a production system, I would say that paid support would be the best approach.

I am a volunteer in this forum, and if I should continue to help you it could take days (weeks) to get to the bottom of this, based on that I live in Sweden and only have an hour or two per day to frequent this forum.

Please click on the link “Get Official Paid Support - Click Here” to get professional support.

I added one user to a Ring Group and the resulting retrieve_conf left me with a broken config. Running retrieve_conf again (after making no changes) resulted in a good config. The diff is attached to the bug report.

I’m all in favor of supplying more information to the forum or the bug, but I’m not overly familiar with debugging php and I’m not exactly sure what you’re looking for in order to find my info helpful.

Is it possible to get a line by line trace with variable evaluation shown in the trace with php?

How much memory does the server have? Type free -m in a shell and post the output here.
Check also your php.ini for the section memory_limit, it should at least be set to: memory_limit = 100M

[[email protected] ~]# free -m total used free shared buffers cached Mem: 3940 3790 149 0 228 2612 -/+ buffers/cache: 949 2990 Swap: 5951 0 5951 [[email protected] ~]# grep memory_limit /etc/php.ini memory_limit = 1024M ; Maximum amount of memory a script may consume [[email protected] ~]#

I installed astmanproxy and pointed freepbx through it thinking that it might be something to do with the reading too much off the AMI stream when doing the early database tunings. This turned out to be a NoOp() as the third appy I tried produced the same, faulty file. The subsequent apply worked fine.

To do my testing I was simply clicking Submit for a User without editing any fields.

Is there a way I can run a trace on retrieve_conf beyond what --debug ? Any other things I can look at to help chase this?

Note: the mem size (VSZ) of retrieve conf never seems to top 200K when I’m watching it run.