FreePBX 2.9 Hand Built
Sangoma A200/Remora FXO/FXS Analog AFT card
Lately we are seeing calls drop. The call drops are only happening on the weekends when the calls come into the system to a queue and the members of the queue are all external numbers. This system has been running like this for years without any issues.
I see many "B-channel 0/x successfully restarted on span 1" messages, with x being each channel on the PRI. I don’t think this is the cause of the problem.
Telco says the PRI looks good.
I am going to do pri debug this weekend.
Can anyone give me some pointers on how to better debug this?
If you have a PRI card (you don’t mention which one), that message is usually the result of a previous D-channel reset. Do you see one? If so, the usual causes are shared interrupts or loss of sync on the span, typically frame slips above an acceptable (very low) level. Debugging at the PRI level may show the D-channel error, but it will not show lower-level errors such as clocking; dahdi_tool is a quick visual check for obvious errors there. Check that your line build-out and rx/tx settings are sane and that the hardware is not sharing an interrupt. After that, you are pretty much left needing a T-BERD.
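One way to check whether the B-channel restarts arrive in bursts (which would point at a span-wide event rather than individual channels) is to tally them from the Asterisk log. The sketch below is only an illustration: it matches the message text quoted earlier in this thread, and it assumes each log line may start with a bracketed timestamp. That layout, and the function name `summarize_restarts`, are assumptions, so adjust them to what your log actually contains.

```python
import re
from collections import Counter

# Matches the restart message quoted in this thread. The leading
# "[timestamp]" is an ASSUMED log layout and is treated as optional.
RESTART_RE = re.compile(
    r"^(?:\[(?P<stamp>[^\]]+)\])?.*?"
    r"B-channel (?P<trunk>\d+)/(?P<chan>\d+) "
    r"successfully restarted on span (?P<span>\d+)"
)


def summarize_restarts(lines):
    """Count B-channel restart messages per span and per timestamp."""
    per_span = Counter()
    per_stamp = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match is None:
            continue  # not a restart message
        per_span[int(match.group("span"))] += 1
        # Lines without a bracketed timestamp are grouped under "unknown".
        per_stamp[match.group("stamp") or "unknown"] += 1
    return per_span, per_stamp
```

Feeding it the open log file, for example `summarize_restarts(open(path))` with `path` pointing at your Asterisk log, returns two counters. Many restarts sharing one timestamp suggest the whole span was reset at once, and those timestamps can then be compared with the times of the dropped calls.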
lspci | grep Sang
01:01.0 Network controller: Sangoma Technologies Corp. A200/Remora FXO/FXS Analog AFT card
The card is in fact a PRI card so I don’t know why it shows up as FXO/FXS.
As I said this does not happen during the week when the calls are sent to the queue and the agents are on their Polycom phones. It only happens on the weekends when the calls are sent to the queue and the agents are cell phones.
This system has been working for years with the same config, same versions and no changes to software that I am aware of.
should identify which one. Are you using the wanrouter/wanpipe setup from Sangoma? If so, those are what you will see for the interrupts: you will see wanpipen entries separated by a comma but nothing else on that line.
At some time in the past this system had a Sangoma analog card with POTS lines. The system was then converted to use a PRI, so the analog card was removed and the Sangoma PRI card installed. This was before I took over maintaining the system.
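The interrupt check described above can also be scripted. This is a rough sketch, assuming the usual `/proc/interrupts` layout in which the device names sit at the end of each line joined by ", ", and assuming the Sangoma ports register under names containing `wanpipe`. The function name `shared_irqs` and its `needle` parameter are made up for illustration.

```python
def shared_irqs(interrupts_text, needle="wanpipe"):
    """Return {irq: [other devices]} for IRQs a `needle` device shares."""
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        tokens = rest.split()
        if not sep or not tokens:
            continue  # header line (no colon) or empty line
        # Walk backwards from the last token: every preceding token that
        # ends with a comma belongs to the same device list.
        devices = [tokens[-1]]
        i = len(tokens) - 2
        while i >= 0 and tokens[i].endswith(","):
            devices.insert(0, tokens[i].rstrip(","))
            i -= 1
        others = [d for d in devices if needle not in d]
        # Report only lines that hold a needle device AND something else.
        if others and len(others) < len(devices):
            shared[irq.strip()] = others
    return shared
```

Calling `shared_irqs(open("/proc/interrupts").read())` on the PBX should return an empty dict when the card has its IRQ to itself. Any entry lists the other devices sharing an interrupt with it.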
My guess is that at the beginning of the log you posted, the bridge to the TDM channel is already broken and attempts to reconnect are not successful (the ACLs are checked again and the SIP bridge succeeds, but the TDM leg fails). I notice that this deployment is still using ZAP, not DAHDI. Perhaps the compatibility is broken in your presumably updated DAHDI; you might check that ZAP2DAHDICOMPAT=1 (Advanced Settings) and retry.
I see 7 different local channels involved. I suggest that would indicate that when your PRI decided to reset its B-channels, the whole state of the span went to la-la land and app_queue got confused. In recovery there is little it could do but hang up all parties.
Are you saying that simple DAHDI/ZAP bridged calls can recover correctly, and it is only when chan_sip is involved with app_queue that the call gets hung up?
These are live calls. The customer calls the support number and is put in a queue for after hours support. All the agents in the after hours support queue are cell phone numbers. The system bridges the customer with the agent’s cell.
Exactly, so when the cell phone bridge goes away, the queue will hang up the bridge. Are the other calls that come in over TDM also being dropped? Apparently so. If so, then again I suggest you have a problem with your span at a low level, because app_queue could never ask for a global span reset.