Debugging PRI issues?

I have an installation that ran pretty well for the last three years and then, after a hardware upgrade, started dropping calls at random. It was originally built on a Dell 2850 2U rack-mount chassis with a 2.8GHz processor and 4GB RAM. The upgrade machine is an HP ProLiant DL380 G7 with an E5620 CPU and 6GB RAM. The operating system is CentOS 4.9 (updates run once a month), kernel-2.6.9-101.plus.c4.1.smp, with Asterisk 1.4.x, DAHDI 2.4.x, FreePBX 2.8.x, and a Sangoma A104 (4-port) PRI interface card.

When I did the upgrade, I built the OS using a standard image I created and built the dialplan from scratch using FreePBX. Our goal was to use this new machine as an edge router to route/connect calls to other internal machines. Once the new system was in place, it ran fine for a day or two and then started dropping calls and rejecting new calls with “all circuits busy” errors. ISDN cause codes were all over the map (16, 17, 21, 28, 31, 34, 44, etc.).

The first thing I did was contact Sangoma, and although it took them a week to react, one of their techs did double duty, working after hours and talking to the telco on a conference call. After a long, painful week of collecting debug information, what we determined is that the system is not sending calling information out. Sangoma suggested that the problem is most likely a bug in asterisk-pri. We couldn’t afford any more downtime, so we moved everything back to the Dell chassis, and it’s worked well now for a couple of months.

I’ve had great luck with Asterisk and FreePBX up until now. I’m comfortable with Linux, prefer the command line for maintenance tasks, compile all my software, build RPMs, etc., but I’m not sure how to proceed with debugging this error! I have watched the logs while a call was being made, seen the Q.931 calling/debug information on the console, and seen nothing go over the circuit using tcpdump. What does one do in this case? Are there any tools I can use to watch libpri? I realize this is like asking you to teach me to be a programmer and a debugger, but I need some solid answers on what’s actually going on before I can formulate the next step.

Help save my sanity! Thanks!!

Anybody?

To follow up on this issue, I had to disable channel restarts.

What was happening was that Asterisk would restart a PRI channel right when a call was coming in, and that busied out (or just broke) the channel. None of this was reported on the console. The net effect was that around 11 channels would be unusable. A developer on asterisk-users gave me the tip.

Edit /etc/asterisk/chan_dahdi.conf and add the following to the [channels] context:

resetinterval=never

It’s been up for months now. Sheesh! So simple :wink:
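For anyone who finds this later, here is roughly how the change sits in the file. This is only a sketch: the comment lines are mine, and the rest of your [channels] section (signalling, groups, channel definitions) stays however FreePBX or your own setup wrote it.

```ini
[channels]
; ... existing settings stay as they are ...

; Turn off the periodic restart of idle PRI B-channels.
; resetinterval takes a number of seconds, or "never" to disable
; restarts entirely. It has to come before the channel definitions
; it should apply to.
resetinterval=never
```

Asterisk has to re-read chan_dahdi.conf before the setting takes effect. In my case a full restart of Asterisk during a quiet period was the safe way to be sure it was applied.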