FreePBX / Asterisk needs machine reboot after a while

We have the following installation:

Intel server with 2 x quad-core Xeon, 16 GB RAM
1 x Wildcard TE122 Card

CentOS 5.2
Asterisk 1.4.3 (same thing with 1.4.22 or older)
Libpri 1.4.9 (same thing with 1.4.7)
Zaptel 1.4.12.1

The configuration files are created with FreePBX 2.5

There are about 100 Users on this system with 700 calls a day.

Everything runs fine for about 2 weeks.
Then the system starts to behave strangely: busy signaling doesn't work, and agents in queues receive calls while they are already talking.

First I tried a dialplan reload in the Asterisk console.
-> Asterisk still behaved strangely.

Then I stopped the Asterisk server (amportal stop) and stopped Zaptel. Then I started Zaptel and Asterisk again.
-> Asterisk still behaved strangely.

Then I rebooted the complete system.
-> The system comes up and runs fine (for a few weeks).

tia florian

First off, 1.4.3 is WAY, WAY old and many, many bugs have been fixed since then, so upgrade to something newer. Try 1.4.21-x to start.

There are also some watershed version points in Asterisk. For 1.4 it is 1.4.22.

Starting at version 1.4.22 they began supporting DAHDI as the replacement for Zaptel. So again there were some major changes: if you use 1.4.22, then use the newest release; otherwise look for the latest 1.4.21-x version.

You have provided some details but not enough. How about the phones? Manufacturer, type, connection type if they support more than one, etc.

Have you looked at the system for ghost calls? These are calls that the system still thinks are connected but in reality are not. We get them once in a rare while when a remote extension has a network issue during a phone call. The network issue clears up, but only after the remote user has hung up and dialed in again. Meanwhile the phone system does not know any better, and because there is a new call going on it sees traffic to the phone and assumes both calls are now in progress. The extension hangs up (which drops that specific SIP session), but the original session is still going and becomes a ghost.

That’s when we start to see strange things. You can find these calls by looking at the FOP (Flash Operator Panel) at times when you know the users are not on the phone and checking whether a call timer shows something crazy, like a call that has been up for 17 hours (we had one last night at 197 hours).
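If you would rather not eyeball the panel, the same check can be scripted. Here is a rough sketch in Python, assuming you have already collected channel names and their durations as HH:MM:SS strings from somewhere. The channel names, the sample data, and the 4-hour threshold are all made up for illustration; nothing here talks to Asterisk itself.

```python
def parse_duration(text):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_ghost_calls(channels, max_seconds=4 * 3600):
    """Return (name, seconds) for channels up longer than max_seconds, longest first."""
    suspects = []
    for name, duration in channels:
        seconds = parse_duration(duration)
        if seconds > max_seconds:
            suspects.append((name, seconds))
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Hypothetical sample data: (channel name, time the call has been up)
sample = [
    ("SIP/201-0a1b", "00:03:12"),
    ("SIP/305-0c2d", "17:04:55"),
    ("SIP/118-0e3f", "197:20:01"),
]

ghosts = find_ghost_calls(sample)
for name, seconds in ghosts:
    print(name, round(seconds / 3600, 1), "hours")
```

Anything this flags is only a suspect; check that the user really is off the phone before hanging the channel up.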

Besides that, do you have anything else enabled on the system? Like maybe hosting web pages for a local intranet?

I found that setting rtptimeout in sip_general_custom.conf solves the orphaned calls issue.
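For reference, a minimal sketch of what that could look like. The numbers are assumptions for illustration, not values taken from a production system, so tune them for your site. sip.conf also has a separate rtpholdtimeout for calls that are on hold, which should be larger than rtptimeout.

```
; sip_general_custom.conf
; FreePBX includes this file in the [general] section of sip.conf,
; so no section header is needed here.
; Both values below are illustrative assumptions.
rtptimeout=60        ; hang up if no RTP arrives for 60 s on a call that is not on hold
rtpholdtimeout=300   ; hang up if no RTP arrives for 300 s on a call that is on hold
```

The SIP configuration has to be reloaded before the change takes effect.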

Never knew about that one. We’ll see if it makes them go away.

It doesn’t happen that often for us, so it hasn’t been a bother (it’s our own internal system).

I’m sorry, but there was a typo. The Asterisk version we use is 1.4.23.

I assumed the current Asterisk release is 1.4.23, as this is the version you get when you download asterisk-1.4-current from Digium.
Do you think I should use Asterisk 1.4.22.2?

Most of the phones are Snom 320s, plus some 370s. There are also a few Siemens Gigaset DECT phones and a few softphones. All are configured as generic SIP phones.

I see some zombies in the log files.
How do I get rid of them?

The machine is dedicated to telephony, so it’s a clean install with just the stuff needed to get FreePBX running.

What value do you set rtptimeout to?

I think it’s bad hardware. Asterisk and FreePBX only work as well as the hardware they run on. Maybe there are some timeouts, but those would never be the cause of a crash. Try to buy cheaper hardware and try again!

You are right!

I see a lot of zombie calls in the system.
That could be the starting point of the trouble.

What can I do to get rid of them?

To start, use the soft hangup command in the Asterisk CLI to hang them up and clean things up.

Then you need to track down why you have them. I explained why we get them; your situation might be different. I’ve added the change SkykingOH recommended and will see if it fixes our issue, but we only get one every few weeks, so it might be months before I can confirm we no longer get them. You can try it if you are getting them more often.

The trick to picking a time is to make it longer than 98% of people will leave someone on hold, but not too long. Based on some comments I’ve seen, a value shorter than a normal hold time will hang up calls that are still on hold.
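If you have a rough idea of how long people actually stay on hold, that rule of thumb can be turned into a number. Below is a small Python sketch, assuming you have a list of observed hold times in seconds. The sample list, the function name, and the 60-second safety margin are all invented for illustration.

```python
def pick_hold_timeout(hold_seconds, coverage_percent=98, margin_seconds=60):
    """Pick a timeout that covers coverage_percent of observed holds, plus a margin."""
    ordered = sorted(hold_seconds)
    # Nearest-rank percentile, using integer math to avoid float rounding surprises.
    rank = max(1, -(-coverage_percent * len(ordered) // 100))
    return ordered[rank - 1] + margin_seconds


# Hypothetical observed hold times: 10 s, 20 s, ... 500 s (50 samples)
observed = list(range(10, 510, 10))

timeout = pick_hold_timeout(observed)
print("suggested timeout:", timeout, "seconds")
```

With this made-up sample the 98th percentile is 490 seconds, so the suggestion comes out at 550 seconds. The one call that sat on hold longer than that would be cut off, which is the trade-off described above.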

That describes our issues as well. I’m going to try that timeout thing.