Aastra Internal phones losing registration and showing no service

I purchased two FreePBX appliances from Sangoma for two sister sites with the exact same phones, etc. I thought it would be easy. Nope, Sangoma sent one with FreePBX V12 and the other with V13.

I set up the V12 unit in a breeze, following the time-tested settings I have on another unit that has been in production for years.

VoIP SIP provider = Flowroute - works fine
Site to Site trunk = IAX2 with internal IPs (VPN) - works fine
Aastra phones (two different models, both behave the same)
Endpoint Manager in FreePBX used to configure phones
Default basefile for Aastra (sip line1 registration period: “60”)
Default Qualify (yes)
Default Qualify frequency (60)

Settings:
chan_sip
NAT = yes
IP Config = Static
External IP = correct
registertimeout = 20
registerattempts = 0
minexpiry = 10
maxexpiry = 3600
defaultexpiry = 120
jitter buffer = disabled
allow SIP guests = no
SRV lookup = disabled
call events = yes
SIP canreinvite = no
SIP trustrpid = yes
SIP NAT = yes
SIP encryption = no
SIP Qualify Frequency = 60
SIP and IAX Qualify = yes

All of these worked for 2 months at the north site. It worked like a charm and I was the IT guy hero. We did the beta test in the less busy location, and the customer was comfortable.

All hell broke loose at the busier south location. My original intent was to restore the appliance from a backup of north and just change a few settings. But they sent a whole different distro on the 2nd appliance, so I said “screw it, it is like 20 minutes of copying info, I will do it by hand.” The worst mistake of my week!

Behavior issues I was having:

At the 60-second registration interval (the basefile default), phones would become “unavailable”, then almost exactly 10 seconds later become available again.

The actual phones would display “No Service”.

The Asterisk Chan SIP Info page would report the phone as unavailable, with no IP address.

Some various items from the log:

[2016-03-02 11:49:04] NOTICE[1832] chan_sip.c: Peer ‘134’ is now UNREACHABLE! Last qualify: 25

… various log entries telling other extensions this bad news … then 10 seconds later:

[2016-03-02 11:49:14] NOTICE[1832] chan_sip.c: Peer ‘134’ is now Reachable. (9ms / 2000ms)

(I believe the phone lost registration)

And then later (a few minutes on, going by the timestamp), on another extension:

[2016-03-02 11:52:44] NOTICE[1832] chan_sip.c: Peer ‘136’ is now Lagged. (5026ms / 2000ms)
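
For anyone wanting to measure these drops instead of refreshing the log by hand, here is a minimal sketch that pairs each UNREACHABLE notice with the next Reachable notice for the same peer and reports the gap. It only assumes the line format quoted above; the function name `outages` and the `SAMPLE` list are made up for illustration, and the regex accepts both straight quotes and the curly quotes the forum put around the peer number.

```python
import re
from datetime import datetime

# Matches chan_sip peer-status notices in the format quoted in this post.
# Quote characters may be straight (') or curly, depending on how the log was pasted.
STATUS_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+NOTICE\[\d+\].*?chan_sip\.c:.*?"
    r"Peer\s+['‘’](?P<peer>[^'‘’]+)['‘’]\s+is now\s+"
    r"(?P<state>UNREACHABLE|Reachable|Lagged)"
)


def outages(lines):
    """Return (peer, went_down, came_back, seconds) for each UNREACHABLE -> Reachable pair."""
    down = {}      # peer -> timestamp of the first unanswered UNREACHABLE notice
    result = []
    for line in lines:
        m = STATUS_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        peer, state = m.group("peer"), m.group("state")
        if state == "UNREACHABLE":
            down.setdefault(peer, ts)
        elif state == "Reachable" and peer in down:
            start = down.pop(peer)
            result.append((peer, start, ts, (ts - start).total_seconds()))
    return result


# The three lines quoted in this post, used as sample input.
SAMPLE = [
    "[2016-03-02 11:49:04] NOTICE[1832] chan_sip.c: Peer ‘134’ is now UNREACHABLE! Last qualify: 25",
    "[2016-03-02 11:49:14] NOTICE[1832] chan_sip.c: Peer ‘134’ is now Reachable. (9ms / 2000ms)",
    "[2016-03-02 11:52:44] NOTICE[1832] chan_sip.c: Peer ‘136’ is now Lagged. (5026ms / 2000ms)",
]
```

To run it against a real log, pass an open file object to `outages()`; on a FreePBX box the log is usually /var/log/asterisk/full, but that path is an assumption and not something confirmed in this thread. Lagged notices are matched but not counted as outages.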

I have changed the network switch to a completely different unit/brand, with no change. I also moved ports, patches, etc. to rule out wiring.

What I have done to fix it:

Changed the registration time in the base file to 180. Aastra said they recommend 120 in their testing with Asterisk, but that did not fix it.

I tried setting qualify to 10 seconds so I could keep refreshing the logs to watch for dropped phones. That made them drop like crazy too, so I moved it back to 60.

So while it is currently working, I am concerned that I spent the last 3 full days “hacking it to work” outside of the default settings sent by FreePBX/Sangoma on their appliance, and outside of what is known to work at 2 other locations.

Thoughts on solutions? Where to dig?

It is almost as if the Aastra phones are not registering in time and FreePBX is booting them out???

I did log into the GUI of the phones and verify the registration settings, and both sites matched.

One note - I did see Aastra phones (earlier versions) were known to become unresponsive if the NTP server lookup failed. I verified all three sites used the same 8.8.8.8 DNS lookup in the phones, and the same 0.us.pool.ntp.org.

When I go into the phone GUI I see DNS set to 8.8.8.8, and when I pinged 0.us.pool.ntp.org, 8.8.8.8 resolved it to a good IP.

However, one of the last things I did late last night was notice that, in FreePBX, Sangoma had set the 1st DNS entry on the appliance to 127.something.something.something, which was not the recommended 127.0.0.1 first entry. I deleted that and changed it to the known working config of 8.8.8.8 as the 1st and 8.8.4.4 as the 2nd. But would Aastra phones try the PBX’s DNS before trying the one configured on the phone? (UPDATE - I was not able to “break” another system by changing this DNS setting, so I think it may be ruled out.)
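
One way to test the DNS theory on the PBX itself is to time a name lookup from the PBX's shell. If the first nameserver on the box does not answer, the system resolver normally has to wait out a timeout before trying the next one, so lookups that usually take milliseconds start taking seconds. The sketch below is only that, a sketch: `time_lookup` is a made-up helper name, and the `resolver` parameter exists so the function can be exercised without touching the network.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.gethostbyname):
    """Resolve hostname once and return (address_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        address = resolver(hostname)
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        address = None
    return address, time.monotonic() - start


# Example use on the PBX (not executed here, since it needs the network):
#   addr, secs = time_lookup("0.us.pool.ntp.org")
#   print(addr, round(secs, 3))
```

On Linux with glibc the default per-nameserver timeout is 5 seconds. That is at least in the same ballpark as the 5026ms Lagged figure in the log above, but tying the two together is speculation on my part, not something this thread proves.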

If you have set up monitoring on the VPN, have there been any drops or congestion?
Internet stability?
A firewall config behaving badly?

There is a known issue that looks like this if you are running an older Asterisk version. What version is running?

Answers:

  1. The phones that are behaving like spoiled brats are all internal, so the VPN is not the issue. There is only a network switch between the brats and the PBX: no firewalls, VPNs, etc.

  2. The Asterisk version is unknown right now (I am not on site); however, it is newer than the one at the working sites, as FreePBX was updated.

  3. I had a wonderful phone call back from Mitel (I really do like their customer service). They are having me upgrade the firmware to a version that level 1 support told me not to use, and then do a Wireshark trace. Specifically, they want firmware 4.1.0.2038. Level 1 told me to use 3.3.1.8106.

They confirmed the “No Service” light I was intermittently seeing is when the phone loses contact with the PBX. I still believe it has/had something to do with bad DNS due to this:

http://pbxinaflash.com/community/threads/aastra-6757i-bouncing-no-service-intermittantly.12138/ (see post 2).

Update. It appears to be the improperly configured DNS that Sangoma put in the installed FreePBX that caused it. All is fixed.

When I put 127.0.0.1 in the FreePBX System Admin > DNS, that actually CAUSED this exact problem. After I removed it, this behavior went away.
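
For anyone checking their own appliance, here is a minimal sketch that lists the nameservers from resolv.conf-style text in order and flags any loopback (127.x.x.x) entries. It assumes the System Admin DNS settings end up in /etc/resolv.conf, which I have not confirmed; the function names are made up for illustration. A loopback entry is not automatically wrong (it is fine if a local DNS service is actually answering there), so treat a hit as something to look at, not as proof.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they appear."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def loopback_nameservers(resolv_conf_text):
    """Return only the nameserver entries that point at a loopback address."""
    flagged = []
    for server in nameservers(resolv_conf_text):
        try:
            if ipaddress.ip_address(server).is_loopback:
                flagged.append(server)
        except ValueError:
            pass  # not a plain IP address; ignore it in this sketch
    return flagged


# Example use (path is an assumption, see above):
#   with open("/etc/resolv.conf") as f:
#       print(loopback_nameservers(f.read()))
```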