Attn: Experts! Help Asterisk fix the "Extensions go down when internet goes down" bug in Asterisk

So, I’ve opened a bug report with Digium regarding the problem that causes Asterisk to freeze up and stop supporting all SIP extensions when the internet goes down.

However, they’re asking me to do things that are well beyond my level of technical ability.

Here’s a link to the report:

https://issues.asterisk.org/jira/browse/ASTERISK-18930?jwupdated=44725&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#issue-tabs

Tony/Skyking OH/Obeliks, etc.: Please help. Log in to the bug tracker and provide Digium with the information that they’ve requested.

If you have SIP trunks or extensions that point to an external domain (or FQDN), this is normal behaviour when the Internet connection goes down.

To solve the problem, look up the IP address corresponding to that domain and pin it in /etc/hosts.

Example: SIP domain = sip.poivy.com

check the corresponding IP with the command "ping sip.poivy.com"
result: 77.72.169.131

Add to /etc/hosts the following line:

77.72.169.131 sip.poivy.com

Note: by default, /etc/hosts takes precedence over DNS resolution (the order is set by the hosts: line in /etc/nsswitch.conf).
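The hosts-file step above can be sketched as a small shell helper. This is only an illustration, not something from the bug report: the function name pin_host is made up, and the optional third argument exists so the helper can be tried on a scratch file before touching /etc/hosts (which needs root).

```shell
#!/bin/sh
# pin_host IP NAME [HOSTS_FILE]  (hypothetical helper name)
# Appends "IP NAME" to the hosts file unless NAME is already listed there.
# HOSTS_FILE defaults to /etc/hosts; editing that file requires root.
pin_host() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Strip comments, then look for NAME among the hostname fields (2..NF).
    if awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$file"; then
        echo "$name is already pinned in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added $ip $name to $file"
    fi
}
```

For the example above you would run, as root, pin_host 77.72.169.131 sip.poivy.com and then confirm with getent hosts sip.poivy.com. Running it twice does not add a duplicate line.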

Yes, I’ve done that.

BUT it won’t work if you use a provider that requires SRV lookup, such as Callcentric. The hosts file can’t handle SRV lookups.

It also creates problems where your provider changes its IP address (as happened a month or so ago when seattle.voip.ms went down and they re-routed traffic to Chicago).

I’ve found a more elegant way to fix it that allows DNS to work until DNS goes down. It involves telling CentOS to use DNS first, and then the hosts file.

But, wouldn’t it be better if Digium just fixed the bug?

It’s not a bug; when the internet goes down, DNS resolution cannot be done.

Another workaround is to make your local DNS server authoritative for the external domain.
To do so, create a master zone with the same name and the same records as the real one.

Use nslookup or dig to check the record values on the real domain.

Or, better:

If the authoritative nameserver (the real one) for that domain lets you do a zone transfer, you can create a local fake slave DNS zone for that domain, so that each time they change a record, your slave gets updated automatically.

For both workarounds, you should use 127.0.0.1 as the primary nameserver in resolv.conf.
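As a rough sketch of that last step, the helper below prints a resolv.conf with 127.0.0.1 listed first and leaves the original file untouched, so the result can be reviewed before it is installed. The function name local_resolver_first is an assumption for illustration only.

```shell
#!/bin/sh
# local_resolver_first [FILE]  (hypothetical helper name)
# Prints FILE (default /etc/resolv.conf) with "nameserver 127.0.0.1" as the
# first entry. Any existing 127.0.0.1 nameserver line is dropped so that it
# is not listed twice. The file itself is not modified.
local_resolver_first() {
    file=${1:-/etc/resolv.conf}
    echo "nameserver 127.0.0.1"
    grep -v '^nameserver[[:space:]]*127\.0\.0\.1[[:space:]]*$' "$file" || true
}
```

Redirect the output to a scratch file, check it, and then copy it over /etc/resolv.conf as root. Keep in mind that some DHCP clients rewrite resolv.conf on lease renewal, so the change may not survive unless that is disabled.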

Tony,

Yes, I am using the latest Distro, which includes Asterisk 1.8.7.1. I’ve also tested on the prior version of the Distro (using Asterisk 1.8.6).

I also have the workaround set-up on my system. The only reason that I found out that the bug still exists is that I didn’t realize that enabling SRV lookup (as required by Callcentric’s set-up instructions) essentially negates the hosts file entry for Callcentric and causes Asterisk to have to do DNS lookups. The other day, my internet went down and all the Aastra phones reported “no service.”

No, it really is a bug. What happens is that chan_sip is blocking, so when it cannot resolve the FQDN it blocks all inbound SIP requests until it can resolve it. They have stated it is fixed in Asterisk 1.8, but I have yet to test it, since all of our systems have workarounds built in for this and it would break those.

Are you using 1.8?

I have two comments/questions:

  1. Can you give a step-by-step explanation of how to make your local nameserver authoritative for the external domain?

  2. I disagree with your contention that this is not a bug. Most programs that utilize DNS do not hang when DNS is absent. Rather, they are programmed to understand that DNS is not present, and to carry on with whatever they can accomplish in the absence of DNS resolution.

By way of example, my router continues to route local traffic, even when DNS services are down. My browser can continue to browse to local HTTP servers, even when DNS and the internet are down.

Asterisk’s behavior in the face of failed DNS resolution, which involves just repeating the attempt to lookup the failed DNS (instead of moving on to another task that doesn’t require it) is, in my view, a bug.

My current workaround is to set all trunks using IP addresses. I believe that the hosts file is too hidden and too easy to forget about.

If, for example, your trunk provider changes its IP address, it is easy to remember that you’re using IP addresses if you programmed them in the trunk settings, because they’ll jump out at you when you examine the trunk settings, and when you ping your provider by name, the IP won’t match.

My future workaround is going to involve changing nsswitch.conf so that CentOS uses DNS first and then the hosts file (instead of the other way around, which is the default).

My theory is that if DNS is down, I don’t really care what IP address Asterisk tries to reach because it won’t get anywhere until internet access is restored. If DNS is up, I’d rather that the system use the current, real DNS entry, than a hosts file that I set-up two years ago (and which may no longer be current).
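A minimal sketch of that nsswitch.conf change, assuming the stock CentOS line reads "hosts: files dns". The helper name dns_first is made up for illustration. It only prints the rewritten file, so nothing changes until you copy the output into place yourself.

```shell
#!/bin/sh
# dns_first [FILE]  (hypothetical helper name)
# Prints FILE (default /etc/nsswitch.conf) with the hosts: line rewritten so
# that DNS is consulted before /etc/hosts. The file itself is not modified.
dns_first() {
    sed 's/^hosts:.*/hosts:      dns files/' "${1:-/etc/nsswitch.conf}"
}
```

One caveat, as far as I understand the resolver: with DNS listed first, each lookup still waits for the DNS timeout to expire before falling back to the hosts file, so this ordering may shorten a stall rather than remove it.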

First of all, find the nameservers for that domain using a whois tool.

E.g.:

Domain Name: POIVY.COM
Registrar: NETWORK SOLUTIONS, LLC.
Whois Server: whois.networksolutions.com
Referral URL: http://www.networksolutions.com/en_US/
Name Server: NS1.FINAREA.CH
Name Server: NS3.FINAREA.CH

Then run dig from the shell:

pilovis@pilovis-laptop:~$ dig calcentric.com ANY

; <<>> DiG 9.7.0-P1 <<>> calcentric.com ANY
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2054
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;calcentric.com. IN ANY

;; ANSWER SECTION:
calcentric.com. 7200 IN MX 1 mail.domainingdepot.com.
calcentric.com. 7200 IN SOA calcentric.com. dns.calcentric.com. 1 10800 3600 86400 3600
calcentric.com. 7048 IN A 72.172.91.210
calcentric.com. 7048 IN A 72.172.91.202
calcentric.com. 7048 IN A 72.172.91.204
calcentric.com. 7048 IN A 72.172.91.206
calcentric.com. 7048 IN A 72.172.91.209
calcentric.com. 7200 IN NS ns4.domainingdepot.com.
calcentric.com. 7200 IN NS ns3.domainingdepot.com.

;; Query time: 220 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Dec 2 08:35:45 2011
;; MSG SIZE rcvd: 224

Then use Webmin to create the master zone:
http://doxfer.webmin.com/Webmin/BINDDNSServer

Or, if you prefer doing it by hand:
https://help.ubuntu.com/8.04/serverguide/C/dns-configuration.html

Pilovis,

I’m sure this all seems obvious to you, but it isn’t to me. I’ve never installed or configured Bind and never used Webmin. While this is a good starting point, it isn’t the step-by-step instruction I’d need. I’m willing to write a set of good instructions on how to install and configure Bind and webmin (as I’ve done with the FreePBX Distro), but I need someone to walk me through how to do it first.

The FreePBX Distro doesn’t include Bind9 (as far as I can tell) or Webmin. So, for a step-by-step set of instructions, I’d want you to start with how to install Bind9, and then how to configure it.

I’m also pretty sure that we’ll need to modify /etc/resolv.conf to change it so that the only listed nameserver is 127.0.0.1. If you wanted to consult your local nameserver and then some external servers, I’m guessing that you’d also want to enable the “strict-order” feature in dnsmasq.conf. I’m not sure, however, because I don’t really understand the relationship between dnsmasq.conf and named.conf - it may be that named.conf overrides dnsmasq.conf.

And again, it’d be soooo much better if Digium just fixed Asterisk! :)

to install bind on Centos:

yum install bind

On Debian:

apt-get install bind9

to install webmin:

su - root
cd /root
wget -O webmin-1.570.tar.gz "http://downloads.sourceforge.net/project/webadmin/webmin/1.570/webmin-1.570.tar.gz?r=http%3A%2F%2Fwww.webmin.com%2F&ts=1322781808&use_mirror=freefr"
tar -zxvf webmin-1.570.tar.gz
cd webmin-1.570
./setup.sh

The setup is self-explanatory; use the default values, except for the password :)
On a Debian box you should have build-essential installed in order to install Webmin (apt-get install build-essential).

Okay, so here’s what I’ve got so far.

  1. Install bind.
  2. nano /etc/resolv.conf
  3. Make the only entry "nameserver 127.0.0.1"
  4. Configure Bind to be authoritative using the page you referenced earlier.

Doesn’t this just do the same thing as modifying the hosts file? Or will making it authoritative allow SRV lookups as well?

Also, what happens when my PBX decides it needs to do a lookup I haven’t configured as authoritative in bind? Can bind be configured to be authoritative for things it knows and caching for the rest?

You have to put the original SRV record in your fake local domain for it to be authoritative for SRV lookups as well.

Command to check SRV records in a domain name:

nslookup -q=srv _sip._udp.callcentric.com

or

nslookup -q=srv _sip._udp.poivy.com

or

nslookup -q=srv _sip._udp.whateverelse.com

nslookup -q=srv _sip._udp.callcentric.com
Server: 192.168.1.1
Address: 192.168.1.1#53

Non-authoritative answer:
_sip._udp.callcentric.com service = 20 0 5080 alpha6.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha7.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha8.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha9.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha1.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha2.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha3.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha4.callcentric.com.
_sip._udp.callcentric.com service = 20 0 5080 alpha5.callcentric.com.

so, you have to put the above records on your fake domain nameserver in this format:

_sip._udp.callcentric.com. 86400 IN SRV 20 0 5080 alpha6.callcentric.com.

Note that you should also have an “A” record for alpha6.callcentric.com, so let’s check again:

command:
dig alpha6.callcentric.com

response:
; <<>> DiG 9.7.0-P1 <<>> alpha6.callcentric.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6657
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;alpha6.callcentric.com. IN A

;; ANSWER SECTION:
alpha6.callcentric.com. 583 IN A 204.11.192.36

;; Query time: 43 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Dec 2 01:37:32 2011
;; MSG SIZE rcvd: 56

so, the record to create is in ANSWER SECTION:
alpha6.callcentric.com. 583 IN A 204.11.192.36
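Putting the two records together, a local zone file for the fake callcentric.com domain might look roughly like the sketch below. Treat it as an assumption-laden outline: the SOA and NS values (localhost., root.localhost., the timers) are placeholders I picked so the zone will load, the helper name print_zone is made up, and only alpha6 is filled in because that is the only target whose address appears above. Every other SRV target needs its own SRV and A lines, taken from your own dig results.

```shell
#!/bin/sh
# print_zone  (hypothetical helper name)
# Prints a minimal zone file for the local copy of callcentric.com.
# Names without a trailing dot are relative to the zone origin, which comes
# from the zone statement in named.conf.
print_zone() {
    cat <<'EOF'
$TTL 86400
@   IN SOA  localhost. root.localhost. (
            1       ; serial (bump on every edit)
            10800   ; refresh
            3600    ; retry
            604800  ; expire
            3600 )  ; negative-cache TTL
@   IN NS   localhost.
alpha6              IN A    204.11.192.36
_sip._udp           IN SRV  20 0 5080 alpha6.callcentric.com.
EOF
}
```

Save the output where your named.conf zone statement expects it (for example print_zone > /var/named/callcentric.com.zone on CentOS), and if named-checkzone is installed, run named-checkzone callcentric.com on that file before reloading BIND.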

Example:

The record you have to put in your domain is:

_sip._udp.example.com. 86400 IN SRV 0 5 5060 sipserver.example.com.

Note: do not forget the trailing dot after each fully qualified domain name.

The format of the record is:

_service._proto.name TTL class SRV priority weight port target
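To save retyping, the nslookup output shown earlier can be converted into that record format with a short awk filter. This is a sketch that assumes the "name service = priority weight port target" layout printed above; the function name srv_to_zone and the default TTL of 86400 are my own choices.

```shell
#!/bin/sh
# srv_to_zone [TTL]  (hypothetical helper name)
# Reads "nslookup -q=srv" output on stdin and prints one zone-file SRV record
# per answer line. TTL defaults to 86400. Non-answer lines are ignored.
srv_to_zone() {
    awk -v ttl="${1:-86400}" '
        $2 == "service" && $3 == "=" {
            # $1 = owner name, $4 $5 $6 = priority weight port, $7 = target
            owner = $1
            if (owner !~ /\.$/) owner = owner "."   # make the owner absolute
            print owner, ttl, "IN", "SRV", $4, $5, $6, $7
        }'
}
```

Usage: nslookup -q=srv _sip._udp.callcentric.com | srv_to_zone and paste the resulting lines into the zone file. Each target still needs a matching A record.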

Q.: Can bind be configured to be authoritative for things it knows and caching for the rest?

A.: Yes, that’s the default configuration of bind.

The best approach would be to set bind to forward all DNS queries to external public DNS servers and resolve the fake domain only when the internet is down.
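For illustration, here is roughly what "authoritative for what it knows, caching for the rest" looks like in named.conf, printed by a small shell helper so it can be reviewed first. The helper name print_named_fragment, the zone file name and the forwarder addresses you pass in are all assumptions, not values from this thread. Note that BIND always answers from a master zone it hosts, so this sketch does not by itself give the "fake domain only when the internet is down" behaviour.

```shell
#!/bin/sh
# print_named_fragment ZONE FORWARDER...  (hypothetical helper name)
# Prints a named.conf fragment: a master zone for ZONE, with every other
# query forwarded to the listed servers and cached locally.
print_named_fragment() {
    zone=$1
    shift
    echo "options {"
    echo "    recursion yes;"
    # "forward first": try the forwarders, then fall back to normal recursion.
    echo "    forward first;"
    printf '    forwarders {'
    for f in "$@"; do printf ' %s;' "$f"; done
    echo " };"
    echo "};"
    echo
    echo "zone \"$zone\" IN {"
    echo "    type master;"
    echo "    file \"$zone.zone\";"
    echo "};"
}
```

Example: print_named_fragment callcentric.com 8.8.8.8 8.8.4.4 (Google's public resolvers, used here only as an example). named.conf allows a single options block, so merge these option lines into the one your distribution already ships instead of pasting a second block.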

I will check if it is possible.

Pilovis,

Yes, that would be ideal. In fact, if the internet is down, I don’t even care if Bind gives the correct answer. When the internet is down, any answer is fine. If we could configure Bind to simply forward when the internet is up, and provide a dummy answer (like 11.11.11.11) when the internet is down, that would completely resolve the problem, for now.