Tried to upgrade, messed up system, trying to decide where to go from here

(Edit: Corrected version number and date of Elastix download).

I have a rather sad story to tell. Up until about two weeks ago, our FreePBX and asterisk system worked great. We were running Asterisk 1.2 something or other, FreePBX 2.3, and some version of Centos that was at least a year old. Everything worked great, the few users of the system (family and a couple of friends) loved it, and the sun was shining and the birds were singing.

Then, for whatever reason, we decided to upgrade (“we” means me and the other member of my family that actually owns the box that the system runs on). This system had started out as an Asterisk@Home box and at the time we last upgraded, Trixbox 1.0 had just come out! There really weren’t any other choices back then, so we went with that, but really never used the Trixbox components much. Anyway, we didn’t really want to go with Trixbox again given recent events, so it was a tossup between PBX in a Flash and Elastix. We’d heard good reports about both so it was sort of a coin toss as to which to try first.

The first thing we tried was Elastix, specifically version 1.1-2 as released on July 10, 2008 (keep in mind we downloaded this version less than two weeks ago). And we couldn’t get anything to work. Extensions couldn’t place calls to each other, trunks wouldn’t register, and basically nothing worked. We couldn’t figure out why and batted our heads against a wall for several hours, until we finally gave up. That night, I decided to check into the Elastix IRC channel, where I saw this (or some very similar message) as the channel topic:

There is a fix for the latest bug in elastix, if you are having problems with extensions not registering, etc try http://www.voipcoop.org/viewtopic.php?t=78

It turns out that the problem is that Elastix was shipped ON JULY 10 with bad sip.conf, iax.conf, extensions.conf and extensions.conf.orig files (or at least a bad extensions.conf, depending on which post you believe). We downloaded the ISO file during the first days of AUGUST. Didn’t the Elastix folks think maybe it might be a good idea to fix the MAJOR BUG in their ISO distribution sometime during, say, the month of JULY? Apparently not. I’m told that a new Elastix BETA has come out in the last few days; I’m not sure if the stable version still has the bad files. Heck, even if they’d just put a notice next to the download link (“after downloading please go see this message”), that would have been helpful. At that point I sort of decided that I didn’t much like Elastix and then when I couldn’t get my trunks to work, that more or less signaled the demise of Elastix on that box. We downloaded and installed PBX in a Flash.

I wish I could tell you that after that all was joy and sunshine, but alas, it was not to be. For one thing, the PBX in a Flash installation was anything but - compared to Elastix’s installation time, one could be forgiven for thinking that the “in a Flash” portion of the name was inspired in a moment of high sarcasm. But okay, we didn’t care that much how long it took to install (okay, so maybe we did), we just wanted it to work, and initially extension to extension calling seemed to work okay. The problem came, once again, when we tried to set up trunks. Using the EXACT same configuration we had used on the old system, the trunks wouldn’t work. After playing around with trunk options we discovered that by removing any references to codecs, the trunks started working. Later we’d learn that in Asterisk 1.4, if you have a disallow=all statement in a trunk configuration, it must come BEFORE any allow= statements. It would be really great if the FreePBX trunk module would check for that and spit out a helpful error message if you try to create a trunk and disallow=all comes AFTER any allow= statements.
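To make that ordering rule concrete, here is a minimal sketch of the kind of check I have in mind. It is not FreePBX code; the function name `codec_order_problems` is my own invention, and it assumes the trunk settings are plain `key=value` lines with `;` starting a comment.

```python
def codec_order_problems(trunk_settings):
    """Return one warning per allow= line that appears before a disallow=all."""
    problems = []
    allows_seen = []  # (line number, codec) for allow= lines not yet "covered"
    for lineno, raw in enumerate(trunk_settings.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if "=" not in line:
            continue
        key, value = (part.strip().lower() for part in line.split("=", 1))
        if key == "allow":
            allows_seen.append((lineno, value))
        elif key == "disallow" and value == "all":
            # Any allow= seen so far came BEFORE this disallow=all.
            for allow_line, codec in allows_seen:
                problems.append(
                    f"line {lineno}: disallow=all follows allow={codec} "
                    f"(line {allow_line})"
                )
            allows_seen = []
    return problems


# Hypothetical trunk settings, one in each order.
print(codec_order_problems("allow=ulaw\nallow=gsm\ndisallow=all\n"))
print(codec_order_problems("disallow=all\nallow=ulaw\n"))  # prints []
```

Something along these lines could warn rather than refuse, so the user still gets the final say.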

At that point we realized that we may have bailed on Elastix prematurely; however, we were frankly SO pissed off about having spent all that time trying to get it to work when all along they had known about the MAJOR bug in their distribution, and neither fixed it nor made any mention of it on their download page, that we were going to try to make PBX in a Flash work for us no matter what. And in one way we wanted to like it, because it’s very easy to upgrade - but therein also lies a flaw. Apparently when you use their scripts to upgrade, it fixes permissions on certain files, but also breaks the permissions on other files. So you run their scripts and you have no idea what they are doing; what they are fixing and what (just maybe) they are breaking. But that wasn’t the real problem (at least I don’t think so).

The real problem now is that the system is having trouble completing calls (extension to extension), and we don’t know why. A typical symptom will be that you dial another extension, and you hear dead silence but the other person’s phone rings. They may pick up and answer, but you still hear nothing. It doesn’t happen every time and there seems to be no rhyme nor reason as to why it happens (you can hang up, place the same call again, and often it will work on the second attempt). Another variant is that you call someone, and maybe 20-25 seconds later their phone starts to ring (but again, you never know when that will happen). It’s not excessive CPU usage; the TOP command never shows any process even coming close to pushing the CPU hard. And keep in mind, this is a system that worked just about flawlessly under Asterisk’s 1.2 branch and FreePBX’s 2.3 branch.

So we are trying to decide what to do next. Here are the possible options as I see them:

  • Try to revert to the old system. We have a backup that was made using the following command in a batch file:

tar --exclude /var/spool/mail/admin -czf /backup/fullbackup.tgz /usr /var /etc /bin /dev /home /sbin /lib /root /tftpboot /boot

The trouble is that I’m not enough of a Linux geek to know how to actually expand this file and use it to replace the current system, and I fear I may really screw something up. How do I use the above backup to overwrite what’s now on the hard drive?

  • Give Elastix another chance. I’ve pretty much gotten over their botched files, but my fear here is that if these are issues caused by Asterisk 1.4 (or a newer version of CentOS, or whatever) then simply moving laterally to another distribution won’t fix the problem. The old system worked so wonderfully that everyone is really disappointed in the way this one is working, and I’d hate to prolong the agony while we work through any Elastix issues we might encounter.

  • Try the Elastix 1.2 beta. That should fix the incorrect files issues, but will it fix the other problems? Or did Elastix never have those problems to start with? Plus I’m sure we’d still be dealing with Asterisk 1.4.

  • Try some other distribution. I know there are others that use FreePBX and Asterisk.

  • Try an older/different distribution that still uses Asterisk 1.2. Do such even exist? I can always upgrade FreePBX - I don’t think it has anything to do with these issues.

Or ???

I’m looking for suggestions, where do we go from here?
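For what it’s worth, one cautious thing that can be done with a backup like the one in the first option is to read it instead of unpacking it over the live system. The sketch below only lists and reads files; it never writes anything to disk. The function names are made up for illustration, and it assumes the archive is an ordinary gzip-compressed tar whose member names have no leading slash (GNU tar normally strips it when creating an archive).

```python
import tarfile


def list_backup(archive_path, prefix):
    """Return the member names under `prefix` without extracting anything."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.name.startswith(prefix)]


def read_backup_file(archive_path, member_name):
    """Return the text of one regular file stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        handle = archive.extractfile(member_name)  # KeyError if not present
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode("utf-8", errors="replace")
```

With the backup above, `list_backup("/backup/fullbackup.tgz", "etc/asterisk/")` should show which Asterisk config files were saved, and `read_backup_file` would let me compare an old file against the new one by eye. It does not answer the question of how to restore the whole system.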

I do have a number of ideas though. One is that there is a network config error. Another is that I have had a few upgrades (various ISOs) that went a bit south. In each case I just removed the offending item (extension, trunk, ring group, etc.) and entered it again. Depending on what VoIP provider you are using, a sanity check is registering a softphone (on the same network) to your SIP/IAX provider and making a few calls. If you can't get registered, or you get registered but with one-way audio, you get a few ideas where to look. If you can register a softphone, it's time to look at the NIC config. The easiest way is to use webmin. Another step I would try is to install the FreePBX tarball. And while a pain, trying a fresh install with just two extensions (created fresh) would be another test. Lastly, I wonder if you have two DHCP servers running or something?

As a side note, I use PiaF and am happy with it. If you continue to try it I may be able to give you a hand.

Robert

wiseoldowl,
Sounds like you’ve run into a string of bad luck and circumstances. The Elastix issue I am surprised to hear, as I’ve mentioned the flaw in their setup on a few occasions. However, FreePBX does give you a notification warning of the potential error; that’s why we put it in. And I updated that error message some weeks ago to be more helpful as to the urgency, but their version probably did not have that update yet.

As far as your described extension to extension behavior, if I had to guess, it sounds like you have either DNS issues or issues getting to the external network sometimes. It’s a known issue with Asterisk, in 1.2 as well. It goes off the deep end if it can’t get to DNS and has FQDNs used anywhere (even though the local extensions couldn’t care less about FQDNs).

The trunk configuration issue … it has always been incorrect in Asterisk to put disallow=all without any allow=codec statements following. Older versions of FreePBX used to reverse sort all your trunk settings, so disallow got moved up. That was removed because it created other errors. However, in a proper upgrade scenario, the reverse sorted order is what should be inherited. That said, vendors often supply bad examples and information. I’ve seen it requested before that FreePBX provide some sort of error when an inoperable configuration like you describe is provided. However, I don’t believe the suggestion has been put into the tracker, which means it will be very quickly forgotten. If it did make it in there, then it has been overlooked, because it probably would have been a good feature request for 2.5 that may have been considered. So to keep ideas like that alive, they are going to have to be in the tracker.

cosmicwombat, I’m fairly sure the network is configured properly, and I’m not having problems with any one particular extension, in fact the problem seems to affect all extensions but at various times.

Part of the problem is that over the last year or two we have made several small tweaks to the system to improve performance or minimize problems. It’s quite likely that one or more of these were “lost” in the upgrade. If I were a professional system administrator, I’d probably have a notebook or a text file documenting every change we’ve ever made to the system, and then I could have the joy of looking through pages and pages of notes, trying to figure out what we missed. But then again, part of the whole reason for wanting to upgrade was to start fresh, without having all the baggage of past experimentation. The trick would be to know which tweaks were actually necessary to proper functioning of the system.

Anyway, the likelihood of us continuing to use PiaF seems very low at the moment, unless we find the magic thing that fixes these problems! Thanks for your thoughts, I appreciate all input.

Philippe, that was another thing I didn’t like about the Elastix distribution - they were still using FreePBX 2.3.something rather than the 2.4 branch - of course it is easy enough to upgrade but I think our idea was that we wanted to get the system up and running first. If you posted about this problem I somehow missed it, but that’s not surprising considering the number of other things that have been demanding my attention recently. There is a part of me that feels like this is the sort of project that should be attempted in the long, dark winter months when folks (at least those of us who are not cold weather enthusiasts) have plenty of time to mess with this stuff.

The DNS thing may be a real possibility. I just reconfigured the system to use an alternate DNS server that I know to be reliable, rather than just going to the ISP’s DNS (which can be notoriously UNreliable). If that turns out to fix the issue, I’ll be kicking myself for not thinking of that sooner, because we get bit by the “(local) calls won’t go through” issue every time we lose internet connectivity. You’d think the Asterisk folks would fix that, but then I guess they all have very reliable Internet connections.

Regarding bad trunk configuration examples - even in your Wiki (will the Wiki EVER be opened up so that anyone can fix errors on existing pages?) there are configuration examples that show allows before a disallow=all statement (usually this is where the trunk configuration options are placed in alphabetical order). I actually went through last week and fixed those I could, but on most such pages all I could do was leave a comment.

It’s been a while since I ventured into the tracker system, but I’ll try posting something in there, since clearly it would help many people if an inoperable trunk configuration were flagged as such (although I am a big believer in giving the user the right to override such warnings, just in case they really do know what they are doing. I’ve already run afoul of too-strict error checking; I can’t recall the exact circumstance but I recall that it had something to do with the assumption that a certain character would NEVER be used in a valid dial string or dial plan, except that it could be under certain circumstances. Maybe it was a + character in front of an international-format number, or something like that? I just can’t recall right now).

If I can help in any way,

Robert

wiseoldowl,
If there is a page you can’t edit, let me know which one. I open the wiki to everyone who asks to contribute, but there may be some problematic pages still. And there has not yet been anyone with the experience level who has been able to provide the time to make the changes to Drupal on a test system so that we could have more granular permissions on the site, so that continues to be delayed. (But again, if there is a page you can’t edit, you have my email address or know where to get my attention on the IRC, so you should let me know.)

As far as DNS, the best solution, if it is an option, is to use IP addresses only and no FQDNs, so Asterisk never needs to care about DNS. This would also address the issue if that is the issue (or if the issue ever arises). As far as running into issues like a character you could not use in a form (that has very possibly been fixed in a later release once bugs were reported), you should file bugs, assuming the errors were not fixed in one of the supported releases. (Since you were previously on 2.2 and our typical support is for the current and previous release).
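If it helps with hunting down where names are still in use, here is a rough sketch that scans the text of a config file for settings that hold a hostname rather than an IP address. It is only an illustration: the function name is made up, the list of keys checked (`host`, `outboundproxy`, `externhost`, plus `register` lines) is my assumption about where names usually appear in sip.conf-style files, and it only understands IPv4-style `host:port` values.

```python
import ipaddress

# Assumed list of settings worth checking; extend it for your own configs.
HOST_KEYS = {"host", "outboundproxy", "externhost"}


def hostnames_needing_dns(config_text):
    """Return (line number, key, host) for values that are not IP addresses."""
    found = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()
        value = value.lstrip(">").strip()  # tolerate the 'key => value' form
        if key == "register":
            # register => user:secret@host:port/extension
            value = value.rsplit("@", 1)[-1].split("/", 1)[0]
        elif key not in HOST_KEYS or value.lower() == "dynamic":
            continue
        host = value.split(":", 1)[0]  # drop an optional :port
        if not host:
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            found.append((lineno, key, host))  # not an IP, so DNS is needed
    return found


# Hypothetical config fragment with placeholder names.
sample = "[provider]\nhost=sip.example.com\n[office]\nhost=192.168.1.10\n"
print(hostnames_needing_dns(sample))
```

Anything it reports is a place where Asterisk would have to consult DNS; whether it can be replaced with an IP address depends on the provider.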

In case anyone comes across this thread at this late date, I will just add this. We never were able to resolve the problem of PiaF not completing calls without an excessive delay (or, sometimes, dropping the call altogether) so after a week of putting up with that we went back to Elastix. It has worked pretty well for us but I can’t say we are totally happy with it. One major problem is that they have FreePBX so tightly integrated into Elastix that if you go to “Unembedded FreePBX” and try to upgrade FreePBX to the latest version, it may (or may not) break parts of the Elastix interface - and the Elastix folks aren’t particularly quick to release updates.

The other problem is their forum software sucks. If you go onto their forums to ask a question or leave a comment, better type quickly (or compose it ahead of time in a text editor and paste it in) or your post may be lost because they have a VERY short registration timeout. I’ve lost more than one post because of that and while there is a trick that sometimes will recover the post, it doesn’t always work (and many people might not stumble across it). This has been a problem now for over six months that I know of, and probably since they started their forums.

So my basic perception is this: PiaF is probably a great system for newbies (assuming they have fixed their permissions bugs) but it tries to micro-manage your system to the point that experienced users (people like me who cut their teeth on Asterisk@Home) may resent the loss of control over their system - the very thing that makes it attractive to new users may put off some experienced users. Elastix seems at first to be a more professional system (and has more “bells and whistles” included) and, at least in our experience, works fairly well and doesn’t compete with you on who’s in control, UNLESS you try to upgrade software (like FreePBX or Asterisk) - then things can and do break. IMHO they should either issue more frequent upgrades to Elastix (if only to keep up with Asterisk and FreePBX updates) or they need to just use “unembedded FreePBX” as the primary interface (I truly wish they had an option to do that). They definitely need to fix the timeout on their forum software.

If I had it to do over again, I think I might have waited another three months and then tried AsteriskNOW, but at the time it didn’t include FreePBX. Our system is working well enough under Elastix, but it just frustrates me that we can’t upgrade Asterisk or FreePBX without the worry that it will break some other part of Elastix. As for PiaF, I would hope that they have fixed whatever caused the slowness in connecting internal extensions by now, but what really unsold me on it was the fact that it was overwriting config files that I wanted to manage, and not have overwritten every time I ran their upgrade script.

I still recommend PiaF for rank newbies (as long as you can still get it for free, anyway - I note they’ve started trying to figure out new and creative ways to make money off of it, and while I don’t begrudge them that, I hope that at least the basic PiaF iso’s stay free for the foreseeable future) - you will get a degree of hand-holding with PiaF that you won’t find elsewhere, and maybe that’s all you will ever need. But I think that many people, as they gain more experience with FreePBX and Asterisk, will want to graduate to something with a little more power and perhaps a little less micro-management of their system. Also, experienced Linux geeks (of which I am NOT one) would probably prefer something that lets them tweak their systems a bit without undoing their tweaks with every upgrade.