Asterisk goes mad when a DNS problem occurs

I do not know if this issue is related to FreePBX or Asterisk, which is why I have not created a tracker report yet.

After three years of using Asterisk at different locations, we have seen that each time there is a DNS problem, Asterisk goes mad: all extensions appear and disappear in an unpredictable cycle, making the system completely unusable.

This occurs only if a trunk is declared with a domain name: host=host.domain-name.tld

The dirty fix we have found to avoid this is to put the IP address of the host in the hosts file.
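For anyone wanting to script that workaround, here is a minimal sketch. The helper name pin_host and the address 192.0.2.10 are placeholders made up for the example (192.0.2.0/24 is reserved for documentation), and the demo writes to a scratch file rather than the real /etc/hosts, which needs root.

```shell
#!/bin/sh
# Pin a host name to a fixed address in a hosts-format file.
# pin_host is a hypothetical helper name used only for this example.
pin_host() {
    ip="$1"                      # address to pin the name to
    name="$2"                    # trunk host name, as used in host=
    file="${3:-/etc/hosts}"      # hosts file to edit
    # Look for the name on any non-comment line so repeated runs add nothing.
    if awk -v h="$name" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Demo on a scratch file; 192.0.2.10 is a documentation address, not a real provider.
demo=$(mktemp)
pin_host 192.0.2.10 host.domain-name.tld "$demo"
cat "$demo"
rm -f "$demo"
```

The pinned address has to be updated by hand if the provider ever moves the host, which is the price of this workaround.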

Replacing the host domain name with its IP address in the host= declaration does not always work, because some providers expect the domain name in the SIP messages.

Tested using Asterisk 1.4.21.1

This has been a known issue for as far back as I can remember (1999).

Run a local DNS cache box and point everything to it (do not forget to set the forwarding on the local box to a real DNS box).

A search on any Asterisk forum would have told you this.

The local DNS cache works only for the cache lifetime (TTL) declared for the record.

After this period the entry is cleared from the local DNS server's cache. This can be a few minutes or a few hours.

We have a provider here (freephonie) whose root server has been down for two days. In this case the only solution is to have a hosts file.

As Bubba says, this is a long known issue. I don’t think it is Asterisk that causes the problem, but the SIP protocol. I don’t have any problem with machines that do not run outside SIP trunks. Since all of the phones are assigned by IP address, the problem doesn’t occur. I ran Wireshark on my network here once while this was going on. The traffic on my network was so heavy that my Cisco switch started shutting down ports because it thought they were storming.

Yes, the problem only shows up with outside trunks. In our country we have started to use full IP telephony everywhere. That’s why these problems show up quite often, along with call loss or busy calls caused by packet loss on unreliable DSL links.

In this regard it would be very interesting to use SIP over TCP instead of UDP for WAN trunks, but my last try with Asterisk 1.6 beta 6 and Aastra phones was a disaster (not working at all). But that is another story.

I do not think that the problem comes from SIP.
For sure this is a thread lock inside Asterisk. It is well known that the Asterisk engine is not well designed with regard to thread management.

I do not see why a problem on one trunk would trigger extensions to go down.
There is no reason for extensions with IP addresses to go down when an external trunk's DNS server goes down.

If you watch the FOP during such an event, you will see something like a Christmas tree. Extensions appear and disappear every minute.

The cyclic nature of the problem suggests that a thread inside Asterisk is blocked for the timeout period of the DNS request.

I would be curious to try this with FreeSWITCH to see whether it withstands external DNS problems. Unfortunately there is no FreePBX GUI for it yet.

The reason I say this is a SIP issue is that the boxes that only have IAX trunks to the outside world do not exhibit this behaviour. The DNS can fail on them and the inside phone system works fine. However, when SIP loses DNS, my LAN gets stormed with traffic to the point where several ports on my switch shut down due to storm protection. If you have ever run Wireshark on a LAN infected with the Blaster worm, it was like that.

According to the FreeSWITCH developers, Asterisk uses a home-made DNS resolver.

This resolver is synchronous, meaning that a query has to finish before processing continues. This explains why Asterisk locks up during the DNS timeout. Queries must be completed one at a time, which slows everything down and generally makes the server completely unusable.

FreeSWITCH uses the Sofia SIP stack, where DNS queries are asynchronous. This means that DNS problems do not produce a complete standstill as in Asterisk. Only the trunk whose DNS lookup fails cannot be opened.
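To make the blocking point concrete, here is a small shell sketch of the simpler idea of giving a lookup a deadline, so the caller moves on instead of sitting through the full resolver timeout. It is not asynchronous resolution as in Sofia and says nothing about Asterisk internals. It assumes GNU coreutils timeout and glibc getent are installed; lookup_with_deadline and LOOKUP_CMD are names made up for the example.

```shell
#!/bin/sh
# Give a name lookup a deadline so the caller is never stuck for the full
# resolver timeout. LOOKUP_CMD is a hypothetical override used for illustration.
lookup_with_deadline() {
    name="$1"        # host name to resolve
    seconds="$2"     # longest time we are willing to wait
    # timeout(1) kills the lookup and returns status 124 once the deadline passes.
    # LOOKUP_CMD is left unquoted on purpose so "getent hosts" splits into words.
    timeout "$seconds" ${LOOKUP_CMD:-getent hosts} "$name"
}

# Example: wait at most 2 seconds, then carry on whatever the outcome.
if lookup_with_deadline host.domain-name.tld 2; then
    echo "resolved"
else
    echo "lookup failed or timed out, continuing anyway"
fi
```

With a deadline like this, one dead resolver costs a bounded delay per lookup instead of holding up everything queued behind it.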

This confirms the general feeling I have about Asterisk: the software has not been developed with industrial strength, as should be the case for telephony software, whether open source or commercial.

Asterisk certainly works nicely with Digium cards, but as soon as we try to do full IP telephony, it does not work really well. Or at least we need some workarounds to avoid problems.

I think that using IAX is not the best solution.

  1. IAX lacks TCP support, which is important for full IP telephony on DSL WAN networks.

  2. IAX causes some problems with signaling mapping. Even in the SIP world, full IP telephony is far from being standardized regarding signaling (very often we receive congestion instead of busy, or vice versa). Using IAX for provider links makes things worse.

As you know, telephony service cannot suffer interruptions, for professional use as well as for home use: in the first case the company relies on it for sales, and in the second case we rely on it for emergency calls.

A service level of at least 99.98 % (ideally 99.998 %, like most of the POTS providers in our country) should be targeted.
A service level of 99.98 % allows just under 9 minutes of downtime in a 30-day month.
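The arithmetic behind these figures is easy to check from the shell. This is a small sketch assuming a 30-day month (43200 minutes); downtime_minutes is just a name picked for the example.

```shell
#!/bin/sh
# Allowed downtime, in minutes, for a given availability over a 30-day month.
downtime_minutes() {
    # $1 = availability in percent, e.g. 99.98
    # LC_ALL=C keeps the decimal point a period regardless of locale.
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 30 * 24 * 60 }'
}

downtime_minutes 99.98     # prints 8.64
downtime_minutes 99.998    # prints 0.86
```

So 99.98 % leaves about 8.6 minutes per month, and 99.998 % leaves under one minute (roughly 52 seconds).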

Today this level of service cannot be achieved with only one VoIP provider. This shows that signaling is very important: when a trunk fails, Asterisk must get the right signaling to decide whether or not to try the next trunk.

A second big problem we have with all VoIP software is the lack of support for bandwidth reservation. In some cases this can produce missed calls with bad signaling, or even cause problems on active calls.

To my eyes this problem needs to be addressed in future SIP software versions, as it will never be possible to match the service level of POTS without serious bandwidth reservation support.

Olivier ADLER
FRANCE IP

If your provider has DNS down for two days then it’s time to find another provider, as mail and almost every other service uses DNS.

The other solution is to install a full-blown DNS server that does its own root server lookups. It’s not that hard, and then you are in control of your DNS server.

A caching name server uses less CPU time and bandwidth than a regular name server, which is why many people install a caching name server.

  1. Just reject UDP and TCP port 53 with iptables rules in case of an Internet outage.
    e.g:
        iptables -I OUTPUT -p udp -m udp --dport 53 -j REJECT
        iptables -I OUTPUT -p tcp -m tcp --dport 53 -j REJECT
        iptables -I OUTPUT -o lo -p udp -m udp --dport 53 -j ACCEPT
        iptables -I OUTPUT -o lo -p tcp -m tcp --dport 53 -j ACCEPT

  2. Use dnsmasq (DNS proxy) with some tricks:

    /etc/resolv.conf contains only:
        search aaaaaaaaaaaaaa.aa
        nameserver 127.0.0.1

    /etc/resolv.dnsmasq (or any other real recursive DNS servers):
        nameserver 208.67.222.222
        nameserver 208.67.220.220

    /etc/dnsmasq.conf, adapted to your needs, plus:
        resolv-file=/etc/resolv.dnsmasq
        address=/aaaaaaaaaaaaaa.aa/127.0.0.1

If you use a DHCP client or a PPP connection, the original resolv.conf may be changed automatically, so you must overwrite it every time (by script), or maybe use chattr.
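A rough sketch of such a script follows. The function name write_resolv_conf is made up for the example, and the demo writes to a scratch file; on a real box it would run as root against /etc/resolv.conf, for instance from cron or a DHCP/PPP hook.

```shell
#!/bin/sh
# Rewrite a resolv.conf-style file with the search domain and local resolver
# used by the dnsmasq trick. write_resolv_conf is a hypothetical helper name.
write_resolv_conf() {
    target="${1:-/etc/resolv.conf}"   # file to overwrite
    printf 'search %s\nnameserver 127.0.0.1\n' "aaaaaaaaaaaaaa.aa" > "$target"
}

# Demo on a scratch file so the example runs without root.
demo=$(mktemp)
write_resolv_conf "$demo"
cat "$demo"
rm -f "$demo"

# On the real system (as root):
#   write_resolv_conf
#   chattr +i /etc/resolv.conf   # optional: make the file immutable
```

If the file is made immutable with chattr +i, it has to be unlocked with chattr -i before it can be rewritten.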

Results: Asterisk will not slow down

  1. The answer will be "unknown host" immediately.
  2. E.g. try to ping example.com. During an Internet outage, the lookup of example.com will fail, so after 2-5 sec the “search” option causes example.com.aaaaaaaaaaaaaa.aa to be queried, and the answer will be 127.0.0.1.

If the DNS server is reachable, then the correct IP will be resolved.