IAX2 Troubleshooting

I must admit, I just use SIP for most things, but after returning from the OTTS training, I figured I would try working with IAX for a bit. Everything I see related to IAX talks about how it is simpler to work with than SIP; leave it to me to find a case that stumps me.

I have two FreePBX Distros running on a LAN segment here; both are 3.x releases running Asterisk 11, so all the modern goodies.

I have configured static trunks between the two servers, and both machines can ping each other just fine. Unrelated, I also configured a SIP trunk that just worked between the two without any issues. After configuring the IAX trunk between the two PBXs and enabling qualify, one says OK with a couple ms response time, and the other PBX says UNREACHABLE. It’s reachable via normal ping, and if I use SIP it just works, but on IAX it’s unreachable.

This is all in a local LAN segment; there is no firewall, filter, or ACL of any type. I have even disabled iptables as a test to see if that had any impact, but nope.

Config PBX-A:

Trunk Name: demokit

Peer Info:
username=mainpbx
type=friend
trunk=yes
secret=xyzzy
qualify=yes
insecure=port,invite
host=10.0.0.26
context=from-internal
auth=md5
requirecalltoken=no
callcounter=yes

Config PBX-B:

Trunk Name: mainpbx

Peer Info:
username=demokit
type=friend
trunk=yes
secret=xyzzy
qualify=yes
insecure=port,invite
host=10.0.1.67
context=from-internal
auth=md5
requirecalltoken=no
callcounter=yes

The LAN is 10.0.0.0/16 so using both 10.0.0.x and 10.0.1.x is fine for the local network, just in case someone notices the difference in IP addressing.

When on demokit, I see:

Name/Username    Host                 Mask             Port          Status      Description                     
demokit/mainpbx   10.0.0.26       (S)  255.255.255.255  4569 (T)      UNREACHABLE                            

When on mainpbx, I see:

Name/Username    Host                 Mask             Port          Status      Description                     
mainpbx/demokit   10.0.1.67       (S)  255.255.255.255  4569 (T)      OK (1 ms)                                   
1 iax2 peers [1 online, 0 offline, 0 unmonitored]

For the life of me, I can’t figure out why I get UNREACHABLE in one direction. Does anyone here with some IAX experience have any clues on how to debug and chase this one down?

As stated above, I can ping between both servers just fine, and I also went and set up a SIP trunk between mainpbx and demokit as a test; the SIP trunk comes right up, and I get a good qualify on both sides. As soon as I try to use IAX, one side claims it’s unreachable.

Any ideas on how to debug this one are most appreciated, as I am still fighting this, but at a bit of a loss as to why this would fail on a local LAN with no firewall/filters in place…

My guess is that the netmask on one of your boxes is not 255.255.0.0

You did say you don’t have ACLs or firewalls installed, but one thing we can’t be sure of is whether the port is accessible from other machines. Perhaps you can try this:

Install nmap

On the demokit, try:

nmap -sU -p4569 10.0.1.67

If all is good, you should see:

PORT STATE SERVICE
4569/udp open|filtered unknown
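
One caveat: for UDP, nmap reports open|filtered whenever no reply comes back at all, so that result alone does not prove Asterisk is answering on 4569. A more direct check is to send an IAX2 POKE and see whether anything is returned. Below is a minimal sketch, assuming the full-frame layout and the POKE subclass value (0x1E) from RFC 5456 as I understand them; the function names and the call number are made up for illustration, and depending on the call token settings Asterisk may reply with something other than a PONG.

```python
import socket
import struct

IAX_PORT = 4569          # default IAX2 UDP port, as seen in the peer listing
FRAME_TYPE_IAX = 0x06    # IAX control frame type (assumed from RFC 5456)
SUBCLASS_POKE = 0x1E     # POKE subclass (assumed from RFC 5456)


def build_poke(src_call=1, timestamp=0):
    """Build a 12-byte IAX2 full frame carrying a POKE (hypothetical helper)."""
    return struct.pack(
        "!HHIBBBB",
        0x8000 | (src_call & 0x7FFF),  # F bit set + 15-bit source call number
        0,                             # destination call number 0 for a POKE
        timestamp,                     # 32-bit timestamp
        0,                             # outbound sequence number
        0,                             # inbound sequence number
        FRAME_TYPE_IAX,
        SUBCLASS_POKE,
    )


def poke(host, port=IAX_PORT, timeout=2.0):
    """Send one POKE; return the raw reply bytes, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_poke(), (host, port))
        try:
            data, _addr = sock.recvfrom(1024)
            return data
        except socket.timeout:
            return None
```

Running poke("10.0.1.67") from the other PBX and getting any bytes back shows the port is being serviced. None only means nothing came back within the timeout, so treat that as inconclusive too.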

Also, it wouldn’t be a bad idea to restart the box that cannot connect. I’ve had issues like this too, and strange issues sometimes start to work after a reboot…:S

Do a tcpdump port 4569 -s 512 -vvvv

See if you are getting anything from the 10.0.0.26

Also, ping uses ICMP which, as Dicko pointed out, can be forgiving of bad network masks. Since you have two different /24s, you need a 16-bit mask (255.255.0.0) unless you have a router that can route between the networks. Of course you said you have no router, so my vote is the mask.

Nope, netmask is fine, that was one of the first things that came to mind when I started looking at it. Being an internal range, I just allocated a /16, and at the moment I use 10.0.0.xx for static assignments, and 10.0.1.xx for DHCP.

Thanks for the input, and that is correct. I don’t mean to say I don’t have a firewall, but it is at the gateway to the network. Both servers are inside the secure LAN, so to reach each other they shouldn’t need to pass through the firewall, or anything beyond my switch for that matter.

I did try the nmap as you mentioned above, and got the result you described:

Starting Nmap 5.51 ( http://nmap.org ) at 2013-02-09 02:39 EST
Nmap scan report for 10.0.1.67
Host is up.
PORT STATE SERVICE
4569/udp open|filtered unknown

Tried that as well, on both machines, as well as disabling iptables on both machines to make sure nothing was being blocked.

Netmasks are fine, here is one machine:

eth0 Link encap:Ethernet HWaddr 00:22:4D:A0:68:E8
inet addr:10.0.1.67 Bcast:10.0.255.255 Mask:255.255.0.0

Here is the other machine:

eth0 Link encap:Ethernet HWaddr 00:1A:64:98:7E:84
inet addr:10.0.0.26 Bcast:10.0.255.255 Mask:255.255.0.0

As you can see, both machines have a /16 mask, so hopefully that has that issue covered. Granted, I guess I could statically assign an IP in 10.0.0.xx and see what it does. I have a couple of other tests I keep thinking of trying.
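
For anyone who wants to double-check the same thing on their own boxes, here is a small sketch using Python’s ipaddress module. The addresses and masks are the ones from the ifconfig output above; the /24 case is a hypothetical misconfiguration added only for comparison, and same_subnet is just a name I picked.

```python
import ipaddress

# Addresses and masks copied from the ifconfig output above.
demokit = ipaddress.ip_interface("10.0.1.67/255.255.0.0")
mainpbx = ipaddress.ip_interface("10.0.0.26/255.255.0.0")


def same_subnet(a, b):
    """True when both interfaces compute the same network from address + mask."""
    return a.network == b.network


print(demokit.network, mainpbx.network, same_subnet(demokit, mainpbx))

# Hypothetical misconfiguration for comparison: a /24 mask on one box.
bad = ipaddress.ip_interface("10.0.1.67/255.255.255.0")
print(bad.network, same_subnet(bad, mainpbx))

# With the /24 mask, the other PBX is no longer considered on-link.
print(mainpbx.ip in bad.network)
```

With the real /16 masks both machines land in the same network, which is what you want for two hosts talking directly over a switch.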

Now on to the tcpdump, maybe something fishy here, granted I am not totally sure what. As suggested I ran a tcpdump on the 10.0.1.67 machine (demokit) that keeps saying the other machine is unavailable, and here is some of that output:

    10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 14
02:45:17.573802 IP (tos 0xb8, ttl 64, id 43941, offset 0, flags [none], proto UDP (17), length 40)
    10.0.1.67.iax > 10.0.0.26.iax: [bad udp cksum aa29!] UDP, length 12
02:45:17.574004 IP (tos 0xb8, ttl 64, id 52753, offset 0, flags [none], proto UDP (17), length 40)
    10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 12
02:45:18.401167 IP (tos 0xb8, ttl 64, id 43942, offset 0, flags [none], proto UDP (17), length 42)
    10.0.1.67.iax > 10.0.0.26.iax: [bad udp cksum e6d1!] UDP, length 14
02:45:18.401332 IP (tos 0xb8, ttl 64, id 52754, offset 0, flags [none], proto UDP (17), length 40)
    10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 12
02:45:19.402446 IP (tos 0xb8, ttl 64, id 43943, offset 0, flags [none], proto UDP (17), length 42)

Strange on the bad checksums; looks like one is 40 in length, and one is 42. Not quite sure what to make of that yet, but it’s a weird one indeed. Also, why is this just affecting IAX, when SIP runs fine, as does ICMP (ping)?

I then ran the tcpdump on the 10.0.0.26 machine as well, and pretty much the same result, it’s bitching about bad udp checksums as well.

If you have ever run into this one, by all means I am open to suggestions, as I am scratching my head a bit for sure at this point…

Seems the checksum errors are caused by the fact that most modern ethernet cards offload that processing to the card, so when watching with tcpdump you will see checksum errors on outgoing packets, as the capture happens before the hardware has filled that field in. I found that in a posting while looking up the checksum errors, and then I disabled the checksum offload with ethtool, and sure enough tcpdump was happy.
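
To illustrate why an unfinished checksum field trips tcpdump, here is a rough sketch of the UDP checksum calculation over the IPv4 pseudo-header, UDP header and payload (RFC 768). The port is the IAX one from above; the 12-byte payload is a placeholder rather than a real IAX frame, the helper names are mine, and the "partial" value is only a stand-in for whatever the kernel leaves in the field before the card finishes the job.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum of data (zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_len):
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def udp_datagram(src_ip, dst_ip, sport, dport, payload, checksum=None):
    """Build a UDP datagram; compute the checksum unless a raw value is given."""
    length = 8 + len(payload)
    if checksum is None:
        header = struct.pack("!HHHH", sport, dport, length, 0)
        total = ones_complement_sum(
            pseudo_header(src_ip, dst_ip, length) + header + payload)
        checksum = (~total & 0xFFFF) or 0xFFFF   # 0 would mean "no checksum"
    return struct.pack("!HHHH", sport, dport, length, checksum) + payload


def checksum_ok(src_ip, dst_ip, datagram):
    """A valid datagram sums to 0xFFFF together with its pseudo-header."""
    total = ones_complement_sum(
        pseudo_header(src_ip, dst_ip, len(datagram)) + datagram)
    return total == 0xFFFF


src, dst = "10.0.1.67", "10.0.0.26"
finished = udp_datagram(src, dst, 4569, 4569, b"\x00" * 12)
# Stand-in for a packet captured before the NIC completed the checksum.
partial = ones_complement_sum(pseudo_header(src, dst, 20))
unfinished = udp_datagram(src, dst, 4569, 4569, b"\x00" * 12, checksum=partial)
print(checksum_ok(src, dst, finished), checksum_ok(src, dst, unfinished))
```

The packet on the wire is fine in both cases; the difference is only in what the capture sees on the sending machine.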

So looking back at the tcpdump now, it very much looks like both PBXs are talking to each other, as you can see packet flows from both sides. I have run a tcpdump on both servers, and they are seeing the flows from each other.
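
If you want to confirm the two-way flow without eyeballing the capture, a quick sketch like the one below tallies packets per direction from tcpdump’s verbose output. The sample lines are copied from the capture above; the regex and the count_flows name are my own and assume the "src.port > dst.port:" line format shown there.

```python
import re
from collections import Counter

# Matches tcpdump -v detail lines such as:
#   10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 14
FLOW_RE = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\.\S+ > (\d+\.\d+\.\d+\.\d+)\.\S+: ")

SAMPLE = """\
    10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 14
02:45:17.573802 IP (tos 0xb8, ttl 64, id 43941, offset 0, flags [none], proto UDP (17), length 40)
    10.0.1.67.iax > 10.0.0.26.iax: [bad udp cksum aa29!] UDP, length 12
02:45:17.574004 IP (tos 0xb8, ttl 64, id 52753, offset 0, flags [none], proto UDP (17), length 40)
    10.0.0.26.iax > 10.0.1.67.iax: [udp sum ok] UDP, length 12
""".splitlines()


def count_flows(lines):
    """Count packets per (source IP, destination IP) pair."""
    counts = Counter()
    for line in lines:
        match = FLOW_RE.match(line)
        if match:                      # header lines with timestamps are skipped
            counts[(match.group(1), match.group(2))] += 1
    return counts


for (src, dst), n in count_flows(SAMPLE).items():
    print(src, "->", dst, n)
```

Seeing a non-zero count in both directions on both servers is what tells you the packets are actually arriving, regardless of what qualify reports.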

Any other ideas that anyone has run across?

As a test, though you can see from the above the netmasks were OK initially, I moved the demokit back into 10.0.0.x, still leaving the /16 netmask (so I ONLY changed the IP address, nothing else).

Sure enough, the IAX trunk came to life, registered, and went on its way. I was then able to make/receive calls over the trunk, and see the qualify response times.

So at some level, problem solved! Though this does still leave the question: with a proper netmask, why did IAX fail when the two IPs were NOT in the same /24, as anything in the /16 should have been valid? Especially considering the exact same trunk, using SIP, worked perfectly, but using IAX caused a failure.

I am leaning towards some bug in the IAX protocol, not sure exactly what, but I can’t find any other reason why this should have failed, yet all other protocols were fine.

Anyway I wanted to give a follow up on what all I have come up with working on the issue some more today…

I would be very interested in the output of ‘iax2 show peer’ on a non-working peer. That will show the ACLs bound to the peer.

Easy enough to move it back on to the other LAN segment, as I just took and did a static assignment via DHCP. So if I pull that, the sucker should just drop back into 10.0.1.xx on a reboot. I will let you know what it gives me…

OK, now I am totally miffed… I logged in to my DHCP server, removed the static mapping, rebooted demokit, and as expected it jumped right back to the 10.0.1.67 IP address that wasn’t working before.

Well I’ll be damned, it came right up: qualify is good, and as I have the demokit in dynamic mode, it registered with the remote PBX and is working exactly like it should.

So much for getting the info you requested, I will test and work with it some here, and if the problem comes back I will post that output.

Somehow, between changing IPs and coming back, it seems to have magically fixed itself, go figure.

My partner was working with a demo as well, and was having an IAX issue, so if he still is, I will first verify stuff like netmasks and such, and then grab that output.

I’ve got tens of clients on IAX2 and always have SIP trunks configured too, but disabled. I’ve had more than one occasion where a reboot fixed or even broke IAX2, while SIP was always stable.

Really couldn’t troubleshoot much at the time, to avoid angry customers getting angrier…haha

Glad it works for you, mate.

Please refer to this Asterisk issue:
https://issues.asterisk.org/jira/browse/ASTERISK-18827

This should be in the next Asterisk release candidates.