This is scaring the hell out of me

OK, I just did a brand new install of the FreePBX distro, latest version 1.8.2, on brand new server class hardware (rackmount server, Xeon 3470, 4GB DDR3 ECC Registered RAM)

After installation, I ran dahdi_test, and got best results of 100.000 but… worst results of 99.5xx !!!
Repeating the test many times, values of 99.5xx or 99.6xx keep popping up now and then!

Now, following the forums here, I remembered a quote from tonyclewis:

“The worst tells me everything I need. The worst you should be seeing to run a production system on is 99.97x; it should never drop below that. If it does you will have lots of voice quality issues, as anyone who understands how the test works knows.”
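A quick way to apply that rule of thumb to your own runs is to pull the numbers out of the summary line dahdi_test prints when you stop it. Below is a minimal Python 3 sketch. The 99.97 cut-off is the figure quoted above, not an official DAHDI limit, and the regex assumes the `Best: … Worst: … Average: …` layout shown in this thread (with either `--` or `–` separators). `parse_summary` and `timing_ok` are hypothetical helper names.

```python
import re

# Rule of thumb quoted in this thread; not an official DAHDI threshold.
WORST_THRESHOLD = 99.97

# Matches the summary layout seen here, e.g.
# "Best: 99.998 -- Worst: 99.605 -- Average: 99.883638, Difference: 99.996862"
SUMMARY_RE = re.compile(
    r"Best:\s*([\d.]+).*?Worst:\s*([\d.]+).*?Average:\s*([\d.]+)"
)


def parse_summary(text):
    """Return (best, worst, average) from dahdi_test's final summary line."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no dahdi_test summary line found")
    best, worst, average = (float(g) for g in match.groups())
    return best, worst, average


def timing_ok(text, threshold=WORST_THRESHOLD):
    """True if the worst pass is at or above the threshold."""
    _, worst, _ = parse_summary(text)
    return worst >= threshold


if __name__ == "__main__":
    sample = "Best: 99.998 -- Worst: 99.605 -- Average: 99.883638, Difference: 99.996862"
    print(parse_summary(sample), "OK" if timing_ok(sample) else "below threshold")
```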

So here I am, running on dedicated, expensive server hardware, and getting timing results far worse than in a VM!

High Precision Event Timer in BIOS is set to enabled.

What can be wrong, what can I check or change?

If needed, I can provide you with remote access to the box.

Thanks!

You should not let anyone have access to your box other than paid FreePBX support.

It’s great you have that killer box however it does not tell us much.

The first question is: is DAHDI picking up a timing source (such as the USB)?

Here is the output of a DL360 G5 with properly configured timing:

I am thinking maybe Tony made a typo, not sure.

[code][[email protected] ~]# dahdi_test
Opened pseudo dahdi interface, measuring accuracy…
99.998% 99.617% 99.997% 99.997% 99.997% 99.605% 99.997% 99.612%
99.997% 99.996% 99.996% 99.997% 99.607% 99.997% 99.612% 99.996%
99.997% 99.997% 99.997% 99.607% 99.996% 99.612% 99.997% 99.998%
99.996% 99.996% 99.607% 99.997% 99.612% 99.996% 99.997% 99.997%
99.997% 99.607% 99.997% 99.613% 99.997% 99.997% 99.997% 99.997%
99.606%
— Results after 41 passes —
Best: 99.998 – Worst: 99.605 – Average: 99.883638, Difference: 99.996862
[[email protected] ~]#

[/code]

You want to see this when you start DAHDI

[code][[email protected] dahdi]# service dahdi start
Loading DAHDI hardware modules:
wcte12xp: [ OK ]

No hardware timing source found in /proc/dahdi, loading dahdi_dummy
Running dahdi_cfg: [ OK ]
[[email protected] dahdi]#
[[email protected] dahdi]#
[/code]

Most important you want to see this in Asterisk:

[code]localhost*CLI> dahdi show channels
   Chan Extension  Context         Language   MOH Interpret        Blocked    State
 pseudo            default                    default                         In Service
localhost*CLI>
localhost*CLI>
[/code]

Hope this info helps.

It seems like you don’t have precise timing on that HP as well…
Tony unfortunately didn’t make a typo, you can find this info in several other places, and 99.6xx is really bad :frowning:

Anyway, here is the output of the things you requested:

[[email protected] asterisk]# service dahdi start
Loading DAHDI hardware modules:
  wct4xxp:                                                 [  OK  ]
  wcte12xp:                                                [  OK  ]
  wct1xxp:                                                 [  OK  ]
  wcte11xp:                                                [  OK  ]
  wctdm24xxp:                                              [  OK  ]
  wcfxo:                                                   [  OK  ]
  wctdm:                                                   [  OK  ]
  wcb4xxp:                                                 [  OK  ]
  wctc4xxp:                                                [  OK  ]
  xpp_usb:                                                 [  OK  ]

No hardware timing source found in /proc/dahdi, loading dahdi_dummy
Running dahdi_cfg:                                         [  OK  ]

But in Asterisk (1.8.5.0):

mypbx*CLI>dahdi show channels
   Chan Extension  Context         Language   MOH Interpret        Blocked    State

[code][[email protected] asterisk]# dahdi_test
Opened pseudo dahdi interface, measuring accuracy...
100.000% 99.987% 99.615% 99.604% 99.996% 99.994% 99.998% 99.994%
99.992% 99.992% 99.998% 99.986% 99.996% 99.994% 99.997% 99.996%
99.987% 100.000% 99.994% 99.994% 99.996% 99.998% 99.996% 99.991%
99.996% 99.993% 99.982% 99.999% 99.994% 99.997% 99.994% 99.991%
99.996% 99.995% 99.996% 99.994% 99.994% 99.998% 99.995% 99.994%
--- Results after 40 passes ---
Best: 100.000 -- Worst: 99.604 -- Average: 99.975146, Difference: 99.994819
[/code]

That’s odd as I don’t have any trouble with this machine.

You do need a minimal /etc/asterisk/chan_dahdi.conf so that Asterisk loads the pseudo driver.

Just for fun I looked at my primary conference server, it’s an older 1.4 box:

[code]dahdi_test
Opened pseudo dahdi interface, measuring accuracy...
99.925% 99.973% 99.981% 99.983% 99.979% 99.981% 99.982% 99.627%
99.981% 99.978% 99.981% 99.982% 99.916% 99.980% 99.973% 99.981%
99.981% 99.980% 99.980% 99.981% 99.982% 99.981% 99.982% 99.980%
99.922% 99.980% 99.981% 99.983% 99.979% 99.981% 99.981% 99.981%
99.982% 99.978% 99.981% 99.921% 99.980% 99.982% 99.980% 99.981%
99.981% 99.980% 99.982% 99.980% 99.981% 99.981% 99.981% 99.823%
99.980% 99.978% 99.982% 99.982%
--- Results after 52 passes ---
Best: 99.983 -- Worst: 99.627 -- Average: 99.966045, Difference: 99.999320
[/code]

Indeed it is much better, however there is one 99.6. This machine runs large conferences all day long with no issues.
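The summary only reports the single worst pass, so it can be more telling to count how often passes dip below the threshold: the DL360 output earlier in the thread drops to about 99.6 every few passes, while this conference server does so only once. A minimal Python 3 sketch, assuming the per-pass values are the figures followed by `%` (the summary line in these versions has no `%` signs, so it is not counted). The 99.97 default is again the forum rule of thumb, and `pass_values` / `dips_below` are hypothetical names.

```python
import re

# Per-pass figures are printed as e.g. "99.617%"; the summary line has no "%".
PASS_RE = re.compile(r"(\d{2,3}\.\d+)%")


def pass_values(text):
    """Every per-pass accuracy figure in a dahdi_test transcript."""
    return [float(v) for v in PASS_RE.findall(text)]


def dips_below(text, threshold=99.97):
    """Per-pass figures under the threshold, in the order they occurred."""
    return [v for v in pass_values(text) if v < threshold]


if __name__ == "__main__":
    # First row of the DL360 output posted above.
    sample = "99.998% 99.617% 99.997% 99.997% 99.997% 99.605% 99.997% 99.612%"
    dips = dips_below(sample)
    print(f"{len(dips)} of {len(pass_values(sample))} passes below 99.97: {dips}")
```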

Check this out, however -

This box has an old x100p in it for timing:

dahdi_hardware pci:0000:00:0b.0 wcfxo- 1057:5608 Wildcard X100P

The interesting thing is the card is present but the driver is not loaded for it.

Scott

Those are not good results. Remember: just because you buy a great server and spend a ton of money does not mean it will work well with Asterisk and DAHDI. A dahdi_test worst result of anything less than 99.97 will cause issues with system recordings, conference rooms, paging and even MoH.

Here is a printout of the hardware we use with proper dahdi setup.

Opened pseudo dahdi interface, measuring accuracy…
100.000% 99.992% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.997% 99.998% 99.999% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.999% 99.998% 99.998% 99.998%
99.998% 99.997% 99.999% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.997% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.997% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.999% 99.997% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.999% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.999% 99.998%
99.998% 99.997% 99.998% 99.998% 99.998% 99.998% 99.998%
— Results after 95 passes —
Best: 100.000 – Worst: 99.992 – Average: 99.998071, Difference: 99.998070

Here is the printout from one of our thousands of properly set up hosted customers, running in a real shared environment with a custom kernel and real hardware timing from a Sangoma USB timing device. It shows that a “virtual” setup can be done right with proper hardware timing, but it also takes maintaining a custom kernel and a real understanding of how Asterisk, DAHDI and the kernel work together, among other things.

[[email protected] ~]# dahdi_test
Opened pseudo dahdi interface, measuring accuracy…
100.000% 99.992% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.997% 99.998% 99.998% 99.998% 99.998% 99.999% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.999% 99.998% 99.997% 99.998% 99.998% 99.998%
99.998% 99.997% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.999% 99.998% 99.998% 99.999% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.999% 99.997% 99.998% 99.998% 99.998% 99.999% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.999% 99.998% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.999% 99.998% 99.998% 99.998%
99.998% 99.998% 99.998% 99.998% 99.998% 99.998% 99.999% 99.998%
99.998% 99.998% 99.998%
— Results after 115 passes —
Best: 100.000 – Worst: 99.992 – Average: 99.998116, Difference: 99.998116

I’m always suspicious of round numbers. If 99.97 will work, I can’t imagine that 99.9699999999999999999999 would really result in a noticeable difference.
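To put those numbers in perspective, the reported percentage can be turned back into an absolute timing error per pass. This is a sketch under an assumption: that each pass times roughly one second of audio from the pseudo channel and reports 100 minus the relative error in percent. The `pass_ms` default of 1000 ms is that assumption, not something taken from the dahdi_test source, and `pass_error_ms` is a hypothetical name.

```python
def pass_error_ms(percent, pass_ms=1000.0):
    """Absolute timing error implied by one dahdi_test pass, in ms.

    ASSUMPTION: the figure is 100 minus the relative error (in %) of a pass
    covering about pass_ms of audio; 1000 ms is an assumed pass length.
    """
    return (100.0 - percent) / 100.0 * pass_ms


if __name__ == "__main__":
    for pct in (99.998, 99.97, 99.9699, 99.6):
        print(f"{pct:>8}% -> {pass_error_ms(pct):.3f} ms per pass")
```

Under that assumption, 99.97% and 99.9699% differ by about a thousandth of a millisecond, while a 99.6% pass is roughly 4 ms off, more than ten times the 0.3 ms implied by the 99.97 threshold.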

I’ve now run dahdi_test on every machine that I’ve ever used to run Asterisk. Except for one, they’ve all gotten results similar to the one posted at the top of this thread, regardless of whether the machine was virtualized or not. All have used a pseudo timing source.

The sole exception was the Atom-based netbook that was running Asterisk in a virtual environment (and I knew that wouldn’t work - I just tried it for kicks). Even that machine worked pretty well until I tried paging ten extensions at once. Even then it worked, but my voice was out of sync.

I’ve never had any trouble with system recordings, IVR, voicemail, paging, or conferences. None of my machines are running a large business, but I’ve had ten people in a conference room and routinely sent pages to the ten extensions in my home, and experienced no issues.

Tony,

Thank you for confirming my fear.
And congrats on your great results.

Now I would appreciate if you could give me some suggestions of what I can check or do to improve our timing.

Apparently, it’s not using the HPET timer available in modern hardware, and probably not even the USB controller timer?

I’ve also read https://wiki.asterisk.org/wiki/display/AST/Timing+Interfaces :

“res_timing_timerfd uses a timing mechanism provided directly by the Linux kernel. This timing interface is only available on Linux systems using a kernel version at least 2.6.25 and a glibc version at least 2.8. This interface has the benefit of being very efficient”

Unfortunately, we’re still stuck with a 2.6.18 kernel in the FreePBX distro! Wouldn’t it be time to move to CentOS 6 to benefit from this new, very precise timing source?
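A small check for the two minimums quoted from the Asterisk wiki (kernel 2.6.25, glibc 2.8) might look like this minimal Python 3 sketch. It only compares version numbers; it does not prove that `res_timing_timerfd` is built or loaded. `version_tuple` and `timerfd_supported` are hypothetical helper names.

```python
import platform
import re


def version_tuple(text):
    """Turn '2.6.18-238.12.1.el5' into (2, 6, 18); ignores distro suffixes."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def timerfd_supported(kernel_release, glibc_version):
    """Minimums quoted from the Asterisk wiki: kernel >= 2.6.25, glibc >= 2.8."""
    return (version_tuple(kernel_release) >= (2, 6, 25)
            and version_tuple(glibc_version) >= (2, 8))


if __name__ == "__main__":
    lib, libc_ver = platform.libc_ver()
    print("kernel:", platform.release(), "libc:", lib or "unknown", libc_ver)
    if lib == "glibc" and libc_ver:
        print("meets timerfd minimums:",
              timerfd_supported(platform.release(), libc_ver))
```

On a stock 2.6.18 kernel the first comparison fails, since (2, 6, 18) sorts before (2, 6, 25).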

Thanks for your help!

The guys from PBX-in-a-flash, having done tests with the new CentOS 6, confirm that it provides a consistent 99.99999% to 100% DAHDI test result in an ESXi virtual environment! I’m impressed; now that should be enough incentive to upgrade!

Yes, except CentOS 6.0 has lots of issues and is not ready for production PBXs. Just follow the threads on CentOS 6.0. It won’t be ready for a few months. I have an internal build working somewhat with CentOS 6.0, but it will be some time before we even come out with a beta of a CentOS 6.0 distro.

Sorry, but we build production-stable and proven-reliable PBXs with our distro and are not trying to be bleeding edge on things we cannot control, like CentOS. In a short period of time, people have come to rely on our stable, single-release-cycle PBX, knowing they can put it in place and not be chasing problems all day long. If you want to be bleeding edge, there are already plenty of FreePBX-based distros out there, such as PBXiaF.

Also the easy way to solve your issue is to buy a Sangoma UT50 USB dahdi timing device.

The CentOS 6 remark was only a side note; my real question was about solving the timing issue with the current release, of course.
I appreciate putting stability first :slight_smile:

So, besides buying an extra piece of hardware, how can I check which timing source DAHDI is using now? The RTC or the USB UHCI?
E.g., if it’s not using the USB controller, maybe switching to it could be a solution as well?
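One place to look is /proc/dahdi, the directory named in the init script message when it falls back to dahdi_dummy. The Python 3 sketch below just prints the first line of every span file there. It assumes, and this is not confirmed in the thread, that some dahdi_dummy builds put a `(source: …)` tag such as RTC in the span description; if yours does not, the header is printed as-is. `span_headers` and `timing_source` are hypothetical names.

```python
import os
import re

PROC_DAHDI = "/proc/dahdi"  # directory named in the init script output above


def span_headers(proc_dir=PROC_DAHDI):
    """Return {entry: first line} for every span file DAHDI exposes."""
    headers = {}
    if not os.path.isdir(proc_dir):
        return headers
    for name in sorted(os.listdir(proc_dir)):
        path = os.path.join(proc_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as fh:
            headers[name] = fh.readline().strip()
    return headers


def timing_source(header):
    """Extract a '(source: ...)' tag if this build prints one (assumption)."""
    match = re.search(r"\(source:\s*([^)]+)\)", header)
    return match.group(1) if match else None


if __name__ == "__main__":
    headers = span_headers()
    if not headers:
        print("no DAHDI spans registered (is DAHDI loaded?)")
    for name, header in headers.items():
        print(name, header, "->", timing_source(header) or "source not reported")
```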

From the Sangoma documentation:

For best operation use a 2.6.25 or newer Linux kernel. The USB sub-system has changed drastically and earlier kernels are not as stable.

So again, the older CentOS kernel bites us here :frowning:

When I do cat /sys/devices/system/clocksource/clocksource0/current_clocksource I get: “tsc”

cat available_clocksource gives me:
acpi_pm jiffies hpet tsc
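The same sysfs files can be read from a script if you want to log them next to each dahdi_test run. A minimal Python 3 sketch using the paths above; note that this is the kernel’s timekeeping clocksource (as far as I know, what `gettimeofday()`, and therefore dahdi_test’s own measurement, relies on), which is not necessarily the same thing as the tick source dahdi_dummy uses. `read_clocksources` is a hypothetical name.

```python
from pathlib import Path

# sysfs directory quoted in the post above
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, [available, ...]), or None if the files are missing."""
    current = base / "current_clocksource"
    available = base / "available_clocksource"
    if not (current.is_file() and available.is_file()):
        return None
    return current.read_text().strip(), available.read_text().split()


if __name__ == "__main__":
    info = read_clocksources()
    if info is None:
        print("clocksource sysfs entries not found")
    else:
        cur, avail = info
        print("current:", cur)
        print("available:", " ".join(avail))
        if "hpet" in avail and cur != "hpet":
            print("hpet is available but not the current clocksource")
```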

I recommend using a POTS line for 911, if you install a TDM400 board with an FXO it will also solve your problem for about the same price.