Random Call Drops - Help!

Hello

I overhauled my VoIP network a few weeks back and rebuilt it on the 64-bit 5.211.65-11 release.

I have been aware of the occasional call drop (even on the old setup), but by the time I’m told about one, the logs are so full of other activity that tracing it back is a nightmare.

My system is primarily Polycom IP500 handsets registering to a primary FreePBX, which has an E1 PRI to a legacy Alcatel 4400 where the majority of our users still are. The Alcatel also handles our outgoing ISDN.

As the call drops are random, I can’t watch anything in real time or replicate the problem, so I rely on what gets reported back. Luckily, a colleague sitting right next to me lost a call to an Alcatel extension yesterday. Looking at the logs, everything seems fine and the call hangs up like a normal call ended from the IP side.

I’ve now been told that one of our heavier IP users has had so many calls cut that he’s taken it to be the norm and never told us!

The only common factor I have is that all the dropped calls seem to be to Alcatel extensions or outside lines. Basically, any call that passes over the PRI link seems to be affected. I’ve never experienced a dropped IP to IP call.

As for circuit timing, the E1 link on the IP side is set to get its clock from the Alcatel, and the Alcatel is set to get its timing from our ISDN via BT.

I ran dahdi_test a few times yesterday and the results vary hugely. Sometimes it stays between 100% and 99.999% through the entire test, and other times it drops way down.

Results of DAHDI Test are:

Opened pseudo dahdi interface, measuring accuracy…

8192 samples in 8188.112 system clock sample intervals (99.953%)
8192 samples in 8191.712 system clock sample intervals (99.996%)
8192 samples in 8195.552 system clock sample intervals (99.957%)
8192 samples in 8188.056 system clock sample intervals (99.952%)
8192 samples in 8191.720 system clock sample intervals (99.997%)
8192 samples in 8191.784 system clock sample intervals (99.997%)
8192 samples in 8188.008 system clock sample intervals (99.951%)
8192 samples in 8191.864 system clock sample intervals (99.998%)
8192 samples in 8191.872 system clock sample intervals (99.998%)
8192 samples in 8191.848 system clock sample intervals (99.998%)
8192 samples in 8191.760 system clock sample intervals (99.997%)
8192 samples in 8191.824 system clock sample intervals (99.998%)
8192 samples in 8191.784 system clock sample intervals (99.997%)
8192 samples in 8191.968 system clock sample intervals (100.000%)
8192 samples in 8196.688 system clock sample intervals (99.943%)
8192 samples in 8187.112 system clock sample intervals (99.940%)
8192 samples in 8191.712 system clock sample intervals (99.996%)
8192 samples in 8191.825 system clock sample intervals (99.998%)
8192 samples in 8191.849 system clock sample intervals (99.998%)
8192 samples in 8192.031 system clock sample intervals (100.000%)
--- Results after 20 passes ---
Best: 100.000% -- Worst: 99.940% -- Average: 99.983238%
Cummulative Accuracy (not per pass): 99.993

Unfortunately I’m not much of an expert and this is starting to get a bit out of hand, so I would appreciate any help or advice.

Thanks

From

man dahdi_test
.
.
Values of 100% and 99.99% Are normally considered a definite pass. Values of 99.98% and 99.97% are probably OK as well.
.
.

You don’t cut it here:-

8192 samples in 8191.968 system clock sample intervals (100.000%)
8192 samples in 8196.688 system clock sample intervals (99.943%)
8192 samples in 8187.112 system clock sample intervals (99.940%)
8192 samples in 8191.712 system clock sample intervals (99.996%)

But without any mention of hardware, OS etc., specific help is unlikely. This is often an interrupt sharing problem or overworked hardware. You can also try:-

pri set debug on span n
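The man page thresholds quoted above can be applied to a whole dahdi_test run rather than checked by eye. The following is a minimal sketch in Python, assuming the per-pass lines look exactly like the ones pasted in this thread. The function name `failing_passes` and the cut-off constant are illustrative, not part of DAHDI.

```python
import re

# Matches the per-pass lines as pasted in this thread, e.g.
# "8192 samples in 8188.112 system clock sample intervals (99.953%)"
PASS_LINE = re.compile(
    r"\d+ samples in [\d.]+ system clock sample intervals \(([\d.]+)%\)"
)

# Cut-off taken from the man page excerpt: 99.97% and above is "probably OK".
PROBABLY_OK = 99.97


def failing_passes(output, threshold=PROBABLY_OK):
    """Return (pass_number, accuracy) for every pass below the threshold."""
    failures = []
    pass_number = 0
    for line in output.splitlines():
        match = PASS_LINE.search(line)
        if not match:
            continue  # skip the banner and summary lines
        pass_number += 1
        accuracy = float(match.group(1))
        if accuracy < threshold:
            failures.append((pass_number, accuracy))
    return failures


sample = """\
8192 samples in 8191.968 system clock sample intervals (100.000%)
8192 samples in 8196.688 system clock sample intervals (99.943%)
8192 samples in 8187.112 system clock sample intervals (99.940%)
8192 samples in 8191.712 system clock sample intervals (99.996%)
"""

for number, accuracy in failing_passes(sample):
    print(f"pass {number}: {accuracy:.3f}% is below {PROBABLY_OK}%")
```

Run against the four lines quoted in the reply, this flags the second and third passes, which are the ones that fall outside the man page's "probably OK" range.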

Hardware is a Dell R520, Xeon E5 1.8GHz, 4GB RAM

E1 PRI is via a Sangoma A101DE

OS is whatever comes with the latest 64-bit 5.211.65-11. It looks like CentOS but is branded Schmooze.

What is the test actually doing? Is it measuring what it gets from the Alcatel, or is the result generated purely within the box itself?

A lot of what I’m reading about dahdi_test suggests that poor results affect voice quality and have nothing to do with dropped calls. I’ve never heard any stuttering, background noise, delays, chopping etc. The audio is just as good as on our old legacy box.

Is this correct or can poor results cause dropped calls?

On looking further into this, I believe it’s possibly due to the card being on a shared interrupt, and the R520 does not allow you to move it onto its own!

Would using a Sangoma UT50 USB VoiceTime improve things?

I’ve just had another two dropped calls reported. I have had PRI debug on since this morning, and on comparing the output it’s identical to a genuinely hung up call from the IP side, so I’m stumped!

I set up a test call before I left last night and when I came in this morning it was still active: 15hrs 29mins with no problems. This simulated the prime user who keeps losing calls, so I’m stumped!

Any suggestions?

I had lots of these last night (however it did not cause my 15hr call to drop)

kernel: wanec1: The H100 slave has lost its framing on the bus!

What do they mean?

Disabled busydetect
Disabled facilityenable
Disabled callprogress
Disabled resetinterval
Set pridialplans to unknown

Replaced the PRA card in the Alcatel and the Alcatel engineer sees no faults

Still dropping calls at random!!

I’m getting desperate now!

Do all your circuit descriptors have exactly the same framing/line coding set on both ends?

Hi dicko

Yep, both ends identical

Sorry to bring this back to the floor, but I’m still suffering random dropped calls. I had one myself today: the phone rang for 10 seconds, and on answering I got a hello from the person at the other end, then it was gone within 3 seconds.

I’m at a standstill now with this project and have only managed 44 handsets out of 590 (590 still being on our Alcatel setup). At this rate I’ll be moving them back.

DAHDI timing has nothing to do with line timing on the physical circuit.

Are you seeing any frame loss, errored seconds, slips etc. on either the Alcatel or the FreePBX box? Do you have access to a T-BERD or similar?

The best way to troubleshoot is the scientific method.

Ask a Question:
Why are my calls dropping?
Do Background Research:
Look at the logs for the date/time of the drop
Construct a Hypothesis:
From reading the history, the call is dropping on the SIP side
Test Your Hypothesis by Doing an Experiment:
Focus on your hypothesis, run calls with SIP debug until the issue pops up…
Analyze Your Data and Draw a Conclusion:
Look at the logs again. Who did what? Do you have enough information? If so, move on; if not, start from the top and refine.
Communicate Your Results:
Come back with more information and get feedback.
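
For the "look at the logs for the date/time of the drop" step, cutting a huge log down to a short window around the reported time makes it much easier to read. Here is a minimal sketch in Python. It assumes each log entry starts with a timestamp in the form "[YYYY-MM-DD HH:MM:SS]", so check that against your own log before relying on it. The function name `lines_near` and the sample lines are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "[2014-03-05 10:15:32] ..." (verify against your log).
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def lines_near(lines, drop_time, window_seconds=60):
    """Return the log lines within window_seconds either side of drop_time."""
    start = drop_time - timedelta(seconds=window_seconds)
    end = drop_time + timedelta(seconds=window_seconds)
    keep = False
    selected = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = start <= stamp <= end
        # Lines without a timestamp (multi-line debug output) inherit the
        # decision made for the last timestamped line.
        if keep:
            selected.append(line)
    return selected


# Hypothetical log lines, only to show the filtering.
sample_log = [
    "[2014-03-05 10:13:00] VERBOSE[1234] hypothetical line well before the drop",
    "[2014-03-05 10:14:45] VERBOSE[1234] hypothetical line just before the drop",
    "    continuation line without a timestamp",
    "[2014-03-05 10:15:20] VERBOSE[1234] hypothetical line just after the drop",
    "[2014-03-05 10:20:00] VERBOSE[1234] hypothetical line well after the drop",
]

drop = datetime(2014, 3, 5, 10, 15, 0)
for line in lines_near(sample_log, drop, window_seconds=30):
    print(line)
```

To use it on a real file, pass the open file object as `lines` and the time the user says the call dropped as `drop_time`, widening the window if the report is vague.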

I see the thread jumping all over the place. Let’s ignore all the DAHDI stuff and focus.

Get a pcap of a dropped call. This may take some time, but it is better to wait for good data than to run with no data.
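
Once a capture of a dropped call exists, the first question is which side sent the BYE and whether it carried a Reason header. The sketch below, in Python, works on the plain text of a single SIP request as copied out of a capture viewer. The helper names are illustrative, and the sample message is a made-up one using documentation addresses, not data from this system.

```python
def parse_sip_request(message):
    """Split a raw SIP request into its method and a dict of headers."""
    lines = message.strip().splitlines()
    method = lines[0].split(" ", 1)[0]
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, headers


def describe_bye(message):
    """Return who ended the call and why, or None if this is not a BYE."""
    method, headers = parse_sip_request(message)
    if method != "BYE":
        return None
    return {
        "call_id": headers.get("call-id"),
        "from": headers.get("from"),
        "reason": headers.get("reason", "(no Reason header)"),
    }


# Hypothetical BYE, for illustration only.
sample_bye = """\
BYE sip:1001@192.0.2.10:5060 SIP/2.0
Via: SIP/2.0/UDP 192.0.2.20:5060;branch=z9hG4bK-example
From: <sip:2002@192.0.2.20>;tag=a1
To: <sip:1001@192.0.2.10>;tag=b2
Call-ID: example-call-id@192.0.2.20
CSeq: 102 BYE
Reason: Q.850;cause=16;text="Normal Clearing"
Content-Length: 0
"""

summary = describe_bye(sample_bye)
print(summary["from"], "ended call", summary["call_id"])
print("Reason:", summary["reason"])
```

The From header of the BYE shows which end hung up. If a Reason header is present, its Q.850 cause can then be compared with what the PRI debug shows for the same call.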