INBOUND calls placed on hold are dropped when taking off hold

I have been searching high and low, through over a dozen forums and support sites in search of someone else having this issue. As usual, I appear to be the only one. I’ve even tried sifting through the dial plans by hand to see what I could turn up. This issue is a truly odd one. In 6 years of working with fPBX, I’m at a total loss.

I was running a single instance of FreePBX (originally installed as PBX in a Flash Purple 1.8, progressively upgraded to the current 2.10 version [per Module Admin]) with Asterisk 1.8, running on CentOS 5. Aside from the limitations inherent in the combination of fPBX 2.10, Asterisk 1.8 and the outdated CentOS 5, everything worked great.

However, a few weeks ago, I decided it was time to deploy a new server (the old one was a small dual-core PC running as a pilot) to expand to more departments in my organization. So, out came the quad-core beast. I loaded a fresh install of the latest stable FreePBX ISO and chose the Asterisk 11 option (this was after running into miscellaneous failures with the Asterisk 1.8 and 10 builds; Asterisk 11 installed and got running without a single glitch). So, now I have the latest stable fPBX with Asterisk 11 on CentOS 6.4.

I used the backup/restore module in fPBX to back up the old server. I then made manual backups of my custom MOH files and such (because I’ve had issues in the past restoring these, particularly regarding permissions…the backup/restore module sometimes causes more problems than it solves!).

I restored using the module on the new server. After making a few adjustments and addressing a few missing dependencies, everything works great…with one exception:

On inbound calls only (outbound calls are unaffected), when a user places the caller on hold, the caller receives MOH as expected and the user's phone displays hold appropriately. When the user takes the caller off hold, the call is immediately disconnected entirely. This occurs regardless of user device or software (my Yealink, Polycom, and software SIP phones all have this behavior).

Now, here’s where things get difficult: I’ve monitored the Asterisk logs, the FreePBX logs, and the CLI, and nothing appears that would immediately point to the problem. The logs indicate that, upon the call being taken off hold, the next step in the macro is simply to hang up, instead of passing the channel back to the device.

Sadly, I don’t know the structure of the Asterisk dialplan well enough to find the segment that handles this and see whether something is missing or in error. No logs show an error or even a debug message at the time it passes from MOH to Hangup.

I had log files to attach here, or at least I thought I did. I’m writing this from my tablet at home and realize I didn’t actually save the file I pasted the logs into…so I’ll have to post them on Monday when I get in and can reproduce the issue and capture a fresh set of logs. But I assure you, there is nothing even remotely obvious there. I even compared them to logs from the old working server…the log output is the same.

That said, I’m thinking it must be something in MOH, hold functionality, or something similar that has changed in Asterisk 11 and, when I restored from the backup/restore module, I overwrote some one-liner in some damned file that accounted for the difference between the versions.

If anyone has ANY insight into this, no matter how seemingly insignificant it may be, please enlighten me. I’m no dummy…I’ve been troubleshooting virtually non-stop for weeks. I even fresh installed and tried again, only restoring the data I needed (extensions and module configs, only after installing the latest versions of all modules through admin) and manually restored MOH through the module and manually rebuilt voicemail.

While this issue only affects inbound calls to a device (whether internally sourced or inbound from our sip trunk) when trying to take a call off hold, it is significant enough that I can’t deploy…my users would murder me after the first dozen or so dropped calls! I can’t stay on the old server for long, it is being pushed pretty hard and is occasionally having issues because I can’t update a few key components like PHP due to limited Zend support on CentOS 5.

Thanks in advance to anyone who can help. If you need any more information (beyond the logs and such I’ll post on Monday), let me know.

Turn off RPID. The call is getting re-INVITEd off the current media path and Asterisk is failing (just a guess).
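For anyone who wants to try this, here is a minimal sketch of what turning RPID off looks like in chan_sip terms. It assumes the extensions are chan_sip peers and that custom `[general]` settings go in FreePBX's `sip_general_custom.conf`; both are assumptions about this particular install, not something confirmed in the thread.

```ini
; /etc/asterisk/sip_general_custom.conf  (assumed location for custom [general] settings)

; Do not send Remote-Party-ID / P-Asserted-Identity headers to peers
sendrpid=no

; Do not trust Remote-Party-ID headers received from peers
trustrpid=no
```

After saving, run `sip reload` from the Asterisk CLI and retest a hold/unhold on an inbound call. FreePBX may also write a per-extension `sendrpid` line for each device, which would take precedence over the general setting, so check the extension's own settings as well.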

You have to look much deeper than a dialplan debug. You need to turn debugging up and look at the full log very carefully. Also turn off Asterisk debugging and look at the SIP call trace. With a problem this complex, Wireshark helps because it provides a fully decoded SIP trace.
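When reading the SIP trace, the messages to find are the re-INVITEs sent when the phone goes on hold and when it comes back. Hold is normally signalled in the SDP body with `a=sendonly` or `a=inactive` (RFC 3264), or with the older `c=IN IP4 0.0.0.0` form, and the resume is a re-INVITE with `a=sendrecv` or no direction attribute. The sketch below is only an illustration of that rule: `classify_invite` is a hypothetical helper, and the two sample messages are made up and trimmed to a few lines, not captured from a real system.

```python
import re

# SDP direction attributes defined by RFC 3264
_DIRECTION = re.compile(r"^a=(sendonly|recvonly|inactive|sendrecv)\s*$", re.MULTILINE)
# Older RFC 2543 style hold: connection address set to 0.0.0.0
_ZERO_ADDR = re.compile(r"^c=IN IP4 0\.0\.0\.0\s*$", re.MULTILINE)


def classify_invite(sip_message: str) -> str:
    """Classify one SIP message as 'hold', 'active', or 'other'.

    'other' means the message is not an INVITE request at all.
    """
    lines = sip_message.strip().splitlines()
    if not lines or not lines[0].startswith("INVITE "):
        return "other"
    if _ZERO_ADDR.search(sip_message):
        return "hold"
    match = _DIRECTION.search(sip_message)
    # With no direction attribute, SDP defaults to sendrecv
    direction = match.group(1) if match else "sendrecv"
    return "hold" if direction in ("sendonly", "inactive") else "active"


# Hypothetical, heavily trimmed messages for illustration only
HOLD_INVITE = "\r\n".join([
    "INVITE sip:1001@pbx.example.com SIP/2.0",
    "CSeq: 2 INVITE",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "c=IN IP4 192.0.2.10",
    "m=audio 11780 RTP/AVP 0",
    "a=sendonly",
])

RESUME_INVITE = "\r\n".join([
    "INVITE sip:1001@pbx.example.com SIP/2.0",
    "CSeq: 3 INVITE",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "c=IN IP4 192.0.2.10",
    "m=audio 11780 RTP/AVP 0",
    "a=sendrecv",
])

if __name__ == "__main__":
    for name, msg in (("hold", HOLD_INVITE), ("resume", RESUME_INVITE)):
        print(name, "->", classify_invite(msg))
```

In a real trace, whatever Asterisk sends or receives immediately after the resume re-INVITE (a BYE, or an error response such as a 4xx or 5xx) is the part worth posting here.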

Start with the RPID. I could, of course, be wrong.