One-way audio on a small percentage of incoming calls

I have a fresh setup of FreePBX 2.8.0.4 on Asterisk 1.6.2.15 which is handling about 130 calls per day. It works perfectly about 95% of the time. However, each day there are a few incoming calls where the outgoing audio stops working after the first few seconds. For example: an outside party calls in, John answers by saying “Hello, this is John” and the outside party hears John say this. Then the outside party asks John a question and John begins to answer but the outside party hears nothing. John can still hear the outside party fine. Eventually, they both hang up and the outside party calls back. The second call proceeds with no problem whatsoever. The problem has never occurred with outgoing calls, nor with internal (extension-to-extension) calls.

The PBX is running CentOS 5.5 with kernel 2.6.18-194.32.1.el5 on a single P4 at 2.2GHz with 1GB RAM.

eth0 is connected to VLAN 1 (Default Data LAN) with static address 10.10.50.5
eth0.2 is connected to VLAN 2 (Voice) with static address 10.10.55.5
eth1 is connected directly to WAN with static address 66.XXX.XXX.205
I currently have routing enabled but the problem also happens with it disabled.

The phones (two Aastra 6753i and 22 Aastra 6731i, all with latest firmware) are on VLAN 2 and they receive DHCP and DNS from dnsmasq which is listening to eth0.2 on the PBX. Every extension is configured in sip_additional.conf with canreinvite=no and nat=yes.

Asterisk SIP Settings (in sip_general_additional.conf) include:

nat=yes
externip=66.XXX.XXX.205
localnet=10.10.50.0/255.255.255.0
localnet=10.10.55.0/255.255.255.0
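
For readers unfamiliar with how externip and localnet interact, here is a minimal sketch of the decision as it is commonly described: a peer whose address falls inside a localnet range is given the local address, and any other peer is given externip. The function name uses_externip is hypothetical and this is not Asterisk's actual code. In this setup eth1 already holds the public address, so the substitution should make no practical difference toward the trunk.

```python
import ipaddress

# localnet entries copied from the settings above (netmask notation is accepted).
LOCALNETS = [
    ipaddress.ip_network("10.10.50.0/255.255.255.0"),
    ipaddress.ip_network("10.10.55.0/255.255.255.0"),
]

def uses_externip(peer_ip: str) -> bool:
    """Hypothetical helper: True if the peer lies outside every localnet range."""
    peer = ipaddress.ip_address(peer_ip)
    return not any(peer in net for net in LOCALNETS)

# A phone on the voice VLAN versus the Bandwidth.com trunk host from the post.
print(uses_externip("10.10.55.20"))     # phone address is an assumed example
print(uses_externip("216.82.224.202"))  # trunk host from sip_additional.conf
```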

Our trunks are from Bandwidth.com (not the SIPStation type) and are defined in sip_additional.conf as:

[from-bandwidth-A]
disallow=all
host=216.82.224.202
fromdomain=216.82.224.202
type=peer
insecure=port,invite
qualify=yes
canreinvite=no
dtmfmode=rfc2833
allow=ulaw
context=from-pstn-e164-us

[from-bandwidth-B]
disallow=all
host=216.82.225.202
fromdomain=216.82.225.202
type=peer
insecure=port,invite
qualify=yes
canreinvite=no
dtmfmode=rfc2833
allow=ulaw
context=from-pstn-e164-us

[to-bandwidth-A]
disallow=all
host=216.82.224.202
type=peer
insecure=port,invite
qualify=yes
canreinvite=no
dtmfmode=rfc2833
allow=ulaw
context=from-trunk-sip-to-bandwidth-A

[to-bandwidth-B]
disallow=all
host=216.82.225.202
type=peer
insecure=port,invite
qualify=yes
canreinvite=no
dtmfmode=rfc2833
allow=ulaw
context=from-trunk-sip-to-bandwidth-B

Primary trunk is shown above; Secondary is same except for IP address.

The codecs enabled in Asterisk are the defaults installed by AsteriskNOW 1.7.1 (ulaw, alaw, gsm). In the Aastra configurations, I’ve got:

sip use basic codecs: 0
sip customized codec: payload=0;ptime=30;silsupp=off,payload=8;ptime=30;silsupp=off

which means that only G711u (8K) and G711a (8K) are enabled, both with silence suppression off. Every phone was purchased new this month.
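
To make the codec line easier to read, here is a small sketch that splits it into its entries and maps the payload numbers to the static RTP payload types from RFC 3551 (0 is PCMU, 8 is PCMA). The function name parse_codec_line and the output layout are assumptions for illustration only, not part of any Aastra tooling.

```python
# Static RTP payload types from RFC 3551 (only the ones relevant to this thread).
PAYLOAD_NAMES = {0: "PCMU (G.711 u-law)", 3: "GSM", 8: "PCMA (G.711 A-law)"}

def parse_codec_line(value: str) -> list:
    """Hypothetical helper: parse 'payload=..;ptime=..;silsupp=..' entries."""
    codecs = []
    for entry in value.split(","):
        # Each entry is a ';'-separated list of key=value pairs.
        fields = dict(item.split("=", 1) for item in entry.split(";"))
        payload = int(fields["payload"])
        codecs.append({
            "payload": payload,
            "name": PAYLOAD_NAMES.get(payload, "unknown"),
            "ptime_ms": int(fields["ptime"]),
            "silence_suppression": fields["silsupp"] == "on",
        })
    return codecs

line = "payload=0;ptime=30;silsupp=off,payload=8;ptime=30;silsupp=off"
for codec in parse_codec_line(line):
    print(codec)
```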

Basically, there is no NAT involved between Asterisk and the phones, and there is no NAT involved between Asterisk and Bandwidth.com, and Asterisk’s SIP Peer status correctly shows “N” under NAT for phones as well as trunks. I have tried various combinations of nat=no and canreinvite=nonat but they have no effect.

As far as I can tell, an incoming call should always be using ulaw (G711u) and the RTP stream should always pass thru Asterisk to the phones with no transcoding and no reinvite.

According to both Webmin and FreePBX System Status, the CPU usage is usually 0-2% and never over 10%. There is never more than 300MB RAM in use (usually 210-220MB). However, I have disabled the Flash Operator Panel just to make absolutely sure that nothing is contending with Asterisk for system resources.

At this point, am I looking at a trunk problem?

I’m having exactly the same problem with almost the same setup (two NICs). I’ve been working with bandwidth.com for months, sending them logs. Did you ever resolve this issue?

Thanks

The eth1 interface on the PBX is connected DIRECTLY to the WAN (Internet). There is NOTHING between Asterisk and Bandwidth.com – no firewall, no proxy, no forwarding, no NAT.

There is NOTHING between the phones and the eth0.2 interface of the PBX – no firewall, no proxy, no forwarding, no NAT.

The trunks are consistently qualifying with <50ms response. The phones are consistently qualifying with <20ms response.

That’s why this is so baffling. The problem calls begin with 2-way audio working, so if it stops working, doesn’t that imply a reinvite occurred? Why would a reinvite occur if canreinvite=no is set on Asterisk AND all extensions AND the trunks?

What UDP port do you have forwarded from your firewall to your server?

Also, do you have any sort of SIP-aware firewall?

I’m also having this issue with VoIP.ms. Every day we get about 100 to 130 calls, and today 3 of them were met with dead silence. I’m using FreePBX 2.9. All modules are up-to-date as of May 19, 2011. It’s really strange because there are no errors in the Asterisk Log, just a “hangup” after a few seconds of silence. Then the customer calls back and the call is fine. The owners of the business are getting really upset over this and are starting to talk about looking for a new phone system. If anyone has the answer to this I would sure appreciate knowing.

VoIP.ms uses NAT. 8 of the phones are at a remote site, and they are using NAT too. Canreinvite is set to “NO” on everything.

The other issue we have which may be related is with HOLD. The 8 remote phones share 2 lines on all 8 phones. When a customer calls and we answer on phone 1, and the customer wants to speak to Paul, we place the customer on hold. We let Paul know about the customer, but when he picks up (using HIS phone) it’s one-way audio. The customer hears him, but he cannot hear the customer. He puts the customer back on hold and walks over to the original phone and he is then able to pick up the call and have duplex audio.

After a lot of playing around, I discovered that if I disabled Music-On-Hold this does not happen. So as a result we do not have MoH deployed anymore. Since we turned that off the hold feature is working well. The only issue we have now is incoming callers not hearing us. This one-way audio problem only happens on incoming calls. Outgoing calls are never an issue.

I’m at the point where I’m scared to adjust the settings because if it messes the system up worse they are going to scrap it immediately. Patience is running very thin at this point. I hope someone can help us. There’s lots of stuff to read, but nothing I’ve read over the past 3 weeks has helped us.

Thank you.

We switched away from bandwidth.com and no longer have the issue.

For us, the problem turned out to be a faulty switch at our ISP’s facility which was causing sporadic spikes of latency on our Internet connection. Each spike was just big enough (~300ms) to corrupt the outbound media stream from our Aastra phones, but the spikes did not occur continually/consistently. Typically, we would have anywhere between 20 and 200 seconds of rock-solid connectivity (latency < 5ms), then a single spike > 250ms, then another 20-200 seconds with no spikes. Typical tools like SolarWinds and “speed test” web sites were useless for finding this kind of latency spiking. We eventually isolated the spikes using:

ping -A -w 200 x.x.x.x

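where x.x.x.x was the address of a box at the ISP facility connected to the same switch as us. Note that the -A option is case-sensitive and specific to Linux (no equivalent in Windows). It selects adaptive mode, in which the interval between packets adapts to the round-trip time; flood mode is the separate -f option. The -w 200 option ends the run after 200 seconds. The -a (lowercase) option is different (audible ping).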
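
For anyone trying to repeat this, the spikes are easy to miss when scrolling through thousands of replies. Below is a minimal sketch that scans saved ping output and lists only the replies at or above a threshold. The function name find_spikes, the 250 ms default, and the file name ping.log are assumptions; the regular expression assumes the usual Linux reply format ("icmp_seq=N ... time=X ms").

```python
import re

# Matches the sequence number and round-trip time in a Linux ping reply line.
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")

def find_spikes(ping_output: str, threshold_ms: float = 250.0) -> list:
    """Hypothetical helper: return (icmp_seq, rtt_ms) for replies >= threshold_ms."""
    spikes = []
    for line in ping_output.splitlines():
        match = RTT_RE.search(line)
        if match and float(match.group(2)) >= threshold_ms:
            spikes.append((int(match.group(1)), float(match.group(2))))
    return spikes

# Made-up sample lines in the usual reply format (192.0.2.1 is a documentation address).
sample = """64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=3.21 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=57 time=302 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=57 time=4.02 ms"""
print(find_spikes(sample))
```

One way to use it would be to redirect the ping command above into a file such as ping.log and pass the file's contents to find_spikes.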

Our problem was resolved by our ISP replacing the bad switch, but I never found an explanation as to why this problem didn’t affect the inbound stream. We did find that Panasonic phones were not vulnerable to this problem, which would imply a shortcoming in Aastra’s RTP/codec implementations, but we never had any problem recording the Aastra outbound streams on the PBX (even when the outside party couldn’t hear it). So a lot of mysteries remain.

During this ordeal, I learned that bandwidth.com doesn’t operate its own media servers. When you get SIP trunking from bandwidth.com, they only handle SIP (signaling). They hand off the audio to various carriers like Level 3 who run media servers. Any given call could go to several possible carriers, due to various factors like load balancing, route optimization, availability, etc. Obviously, this complicates the troubleshooting process somewhat. This also explains why bandwidth.com trunks can’t support T.38; they can only support codecs which ALL of their media carriers support.

I should mention that bandwidth.com Support was very professional, courteous, helpful, and gracious (especially when our ISP service, which was NOT purchased from bandwidth.com, turned out to be the culprit). Overall, we remain satisfied with our bandwidth.com trunks.

How did you find this problem and get your ISP to fix it?