Dropping calls see many " B-channel 0/x successfully restarted on span 1"

FreePBX 2.9 Hand Built
Asterisk 1.4.23.1
Sangoma A200/Remora FXO/FXS Analog AFT card

Lately we are seeing calls drop. The call drops are only happening on the weekends when the calls come into the system to a queue and the members of the queue are all external numbers. This system has been running like this for years without any issues.

I see many " B-channel 0/x successfully restarted on span 1" with x being each channel on the PRI. I don’t think this is the cause of the problem.

Telco says the PRI looks good.

I am going to do pri debug this weekend.

Can any give me some pointers on how to better debug this?

Look at the “dropped calls” for hangup cause.

If you have a PRI card (you don’t mention which one) then that is usually the result of a previous d-channel reset, do you see one, if so then usual causes are shared interrupts or loss of sync on the span, this is usually frame slips above an acceptable (very low) level, so debugging at the PRI level will possibly show the d-channel error, it will not see the lower level clocking etc. errors dahdi_tool is a quick visual check for obvious errors there. Check your line build-out and rx/tx settings for sane settings and the hardware is not sharing an interrupt, then you are pretty well left with needing a t-berd.

I posted the card type but here it is again.

lspci | grep Sang
01:01.0 Network controller: Sangoma Technologies Corp. A200/Remora FXO/FXS Analog AFT card

The card is in fact a PRI card so I don’t know why it shows up as FXO/FXS.

As I said this does not happen during the week when the calls are sent to the queue and the agents are on their Polycom phones. It only happens on the weekends when the calls are sent to the queue and the agents are cell phones.

This system has been working for years with the same config, same versions and no changes to software that I am aware of.

Like I said lets get the

That’s why I asked Alan, the A200 is indeed an fxo/fxs device,

An example of one of mine shows

lspci | grep Sang
01:06.0 Network controller: Sangoma Technologies Corp. A104d QUAD T1/E1 AFT card

So maybe check with Sangome there.

lshw|grep -B 3 -A 8 -i sangoma

should identify which one. Are you using the wanrouter/wanpipe setup from sangoma? if so they will be what you see for the interrupts and you will see wanpipen seperated with a comma but nothing else on that line

(but it is in fact an A108 in this case)

At some time in the past this system had a Sangoma analog card with POTS lines. The system was then converted to use a PRI so the analog card was removed and the Sabgoma PRI card installed. This was before I took over maintaining the system.

Single port T-1s show up as the A200 as they both share the same base. This is normal

Here is some of the log info for a dropped call. I don’t see anything indicating other than a normal hangup. Both parties say they did not hang up, the call was dropped.

Any input on debugging this would be appreciated.

[May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.0.74 with 10.10.194.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.0.74 with 10.16.20.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.0.74 with 10.10.200.0 [May 18 18:34:58] DEBUG[13507] chan_sip.c: Stopping retransmission on '[email protected]' of Request 102: Match Found [May 18 18:34:58] DEBUG[13507] chan_sip.c: Allocating new SIP dialog for (No Call-ID) - OPTIONS (No RTP) [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.1.179 with 172.16.0.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.1.179 with 10.10.194.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.1.179 with 10.16.20.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 172.16.1.179 with 10.10.200.0 [May 18 18:34:58] DEBUG[13507] chan_sip.c: Stopping retransmission on '[email protected]' of Request 102: Match Found [May 18 18:34:58] DEBUG[13507] chan_sip.c: Allocating new SIP dialog for (No Call-ID) - OPTIONS (No RTP) [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 10.10.200.199 with 172.16.0.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 10.10.200.199 with 10.10.194.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 10.10.200.199 with 10.16.20.0 [May 18 18:34:58] DEBUG[13507] acl.c: ##### Testing 10.10.200.199 with 10.10.200.0 [May 18 18:34:59] DEBUG[13507] chan_sip.c: Stopping retransmission on '[email protected]' of Request 102: Match Found [May 18 18:34:59] VERBOSE[13503] logger.c: < Protocol Discriminator: Q.931 (8) len=9 [May 18 18:34:59] VERBOSE[13503] logger.c: < Call Ref: len= 2 (reference 7362/0x1CC2) (Terminator) [May 18 18:34:59] VERBOSE[13503] logger.c: < Message type: CONNECT (7) [May 18 18:34:59] VERBOSE[13503] logger.c: < [1e 02 84 82] [May 18 18:34:59] VERBOSE[13503] logger.c: < Progress Indicator (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Public network serving the remote user (4) [May 18 18:34:59] VERBOSE[13503] logger.c: < Ext: 1 Progress Description: Called equipment is non-ISDN. (2) ] [May 18 18:34:59] VERBOSE[13503] logger.c: -- Processing IE 30 (cs0, Progress Indicator) [May 18 18:34:59] VERBOSE[13503] logger.c: q931.c:3620 q931_receive: call 40130 on channel 8 enters state 10 (Active) [May 18 18:34:59] VERBOSE[13503] logger.c: > Protocol Discriminator: Q.931 (8) len=5 [May 18 18:34:59] VERBOSE[13503] logger.c: > Call Ref: len= 2 (reference 7362/0x1CC2) (Originator) [May 18 18:34:59] VERBOSE[13503] logger.c: > Message type: CONNECT ACKNOWLEDGE (15) [May 18 18:34:59] DEBUG[13503] chan_dahdi.c: Echo cancellation already on [May 18 18:34:59] VERBOSE[22759] logger.c: -- Zap/8-1 answered Local/13096605500@from-queue-7392,2 [May 18 18:34:59] DEBUG[22759] rtp.c: Channel 'Local/13096605500@from-queue-7392,2' has no RTP, not doing anything [May 18 18:34:59] DEBUG[22759] chan_dahdi.c: Requested indication 20 on channel Zap/8-1 [May 18 18:34:59] DEBUG[13508] app_queue.c: Device 'Local/13096605500@from-queue' changed to state '2' (In use) [May 18 18:34:59] DEBUG[22751] app_queue.c: Dunno what to do with control type -1 [May 18 18:34:59] VERBOSE[22751] logger.c: -- Local/13096605500@from-queue-7392,1 answered Zap/10-1 [May 18 18:34:59] DEBUG[22751] chan_dahdi.c: Set option TONE VERIFY, mode: MUTECONF(1) on Zap/10-1 [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/15153417889@from-queue-735d,1' [May 18 18:34:59] DEBUG[13508] app_queue.c: Device 'Local/13096605500@from-queue' changed to state '2' (In use) [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/14235524684@from-queue-b32a,1' [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/13093608011@from-queue-ef54,1' [May 18 18:34:59] DEBUG[13508] app_queue.c: Device 'Local/15153417889@from-queue' changed to state '2' (In use) [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/13096575521@from-queue-b838,1' [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/13096574835@from-queue-a86c,1' [May 18 18:34:59] DEBUG[22755] rtp.c: Channel 'Local/13093608011@from-queue-ef54,2' has no RTP, not doing anything [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/13092316816@from-queue-dd57,1' [May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel 'Local/15073810626@from-queue-3b8e,1' [May 18 18:34:59] DEBUG[22760] rtp.c: Channel 'Local/15073810626@from-queue-3b8e,2' has no RTP, not doing anything [May 18 18:34:59] DEBUG[22760] channel.c: Hanging up channel 'Zap/3-1' [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: dahdi_hangup(Zap/3-1) [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Set option AUDIO MODE, value: ON(1) on Zap/3-1 [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Hangup: channel: 3 index = 0, normal = 10, callwait = -1, thirdcall = -1 [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Not yet hungup... Calling hangup once with icause, and clearing call [May 18 18:34:59] VERBOSE[22760] logger.c: NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Call Delivered, peerstate Call Received [May 18 18:34:59] VERBOSE[22760] logger.c: q931.c:3009 q931_disconnect: call 40125 on channel 3 enters state 11 (Disconnect Request) [May 18 18:34:59] VERBOSE[22760] logger.c: > Protocol Discriminator: Q.931 (8) len=9 [May 18 18:34:59] VERBOSE[22760] logger.c: > Call Ref: len= 2 (reference 7357/0x1CBD) (Originator) [May 18 18:34:59] VERBOSE[22760] logger.c: > Message type: DISCONNECT (69) [May 18 18:34:59] VERBOSE[22760] logger.c: > [08 02 81 90] [May 18 18:34:59] VERBOSE[22760] logger.c: > Cause (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) Spare: 0 Location: Private network serving the local user (1) [May 18 18:34:59] VERBOSE[22760] logger.c: > Ext: 1 Cause: Normal Clearing (16), class = Normal Event (1) ] [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: disabled echo cancellation on channel 3 [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Set option TDD MODE, value: OFF(0) on Zap/3-1 [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Updated conferencing on 3, with 0 conference users [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: Set option AUDIO MODE, value: OFF(0) on Zap/3-1 [May 18 18:34:59] DEBUG[22760] chan_dahdi.c: disabled echo cancellation on channel 3 [May 18 18:34:59] VERBOSE[22760] logger.c: -- Hungup 'Zap/3-1' [May 18 18:34:59] DEBUG[22760] app_dial.c: Exiting with DIALSTATUS=CANCEL.

My guess is that that the beginning of the log you post the bridge to the TDM channel is already broken and attempts to reconnect are not successful (the acl are being checked again, the sip bridge succeeds but the TDM leg fails, I notice that this deployment is still using ZAP not DAHDI, perhaps the compatibility is broken in your presumably updated dahdi, you might check that ZAP2DAHDICOMPAT=1 (advanced settings) and retry.

I notice that both zap/3 and zap/8 are both hungup at the same time, co-incidence? I don’t think so, back to debugging the PRI at a low level perhaps.

We are still using ZAP. This is a very old system. I just got verbal approval to replace it but it will be some time before the new system is in.

ZAP2DAHDICOMPAT is currently disabled. I am not sure if this setting was changed recently, I doubt it was changed recently.

What I find odd about this issue is that this problem only seems to happen when the calls come into the queue and are sent out to an offsite agent.

Ext: 1 Cause: Normal Clearing (16),
This was received from your carrier. They are convinced they were told to hang up

To dig deeper we need to turn on pri debug qhich will show you every frame.

Are all these tetsts between the same parties?

extracting the channel debug info

[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/15153417889@fromEmail links icon-queue-735d,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/14235524684@fromEmail links icon-queue-b32a,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/13093608011@fromEmail links icon-queue-ef54,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/13096575521@fromEmail links icon-queue-b838,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/13096574835@fromEmail links icon-queue-a86c,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/13092316816@fromEmail links icon-queue-dd57,1’
[May 18 18:34:59] DEBUG[22751] channel.c: Hanging up channel ‘Local/15073810626@fromEmail links icon-queue-3b8e,1’
[May 18 18:34:59] DEBUG[22760] channel.c: Hanging up channel ‘Zap/3-1’

I see 7 different local channels involved I suggest that that Would indicate that when your PRI decided to reset it’s bchannels
then the whole state of the span when to lalaland and the app_queue got confused, in recovery there is little it could do but hang up all parties.

are you saying that simple dahdi/zap bridged calls can recover correctly and it is only when chan_sip is involved to app_queue does the call get hungup ?

James

These are live calls. The customer calls the support number and is put in a queue for after hours support. All the agents in the after hours support queue as cell phone numbers. The system bridges the customer with the agent’s cell.

Not sure also what you mean “wait for new system”, the software is free to download.

This sounds very business critical, you may want to engage the FreePBX support team.

We are buying the hardware from SchmoozeCom. Dual Xtreme 1000s, HA, Digium R series for PRI fail over. Had to get approval from management which was not hard since it is so mission critical.

Exactly, so when the cell phone bridge goes away and the queue will hangup the bridge, are the other calls also coming in TDM and being dropped? apparently so, If so then again I suggest you have a problem with your span at a low level because app_queue could never ask for a global span reset.

And of course you should post your versions of

libpri
dahdi
asterisk

and
FreePBX,

making sure they are all up to date (or at least nout insanely out of date) would be my first order of business while debugging such a scenario.