FreePBX in Hyper-V

I am getting choppy calls on a FreePBX system running in Hyper-V 2012 R2 once we reach around 25 calls, which I think is a small number. My VM guest had only one vCPU and 8 GB of RAM, so I tried adding a second vCPU. Adding the second vCPU caused a problem with irqbalance on CentOS 6.8; the solution was to disable irqbalance. The system is running now, but I still get choppy calls, and the Asterisk process goes to around 90% CPU with 20 to 25 calls. Has anyone else had this problem? Could you give some tips on solving performance problems under Hyper-V?
Should I use a Gen1 or Gen2 VM? My LIS is up to date at 4.2.4.1 (the latest). Should I disable the time sync service from LIS?
Thanks,

Definitely not the experience I’ve had using Hyper-V 2012 R2 with similar and greater simultaneous call volumes.
When the chop is happening, is it across the board for all users or just a few?
How much transcoding are you doing? Are you leaving everyone on ulaw?
I assume you have good underlying hardware running the VM system?
All mine are running under Hyper-V as Gen 1 VMs, although I’ve deployed all of them using the FreePBX Distro; not sure if that applies to you. Each has 4 GB of RAM and 1 CPU. Most of the VMs I’ve got are running 20-30 simultaneous calls, and each server has 5-8 instances of some flavor of Asterisk.

Has this always been the behavior you’ve noticed, or has it only recently come to your attention?

Hi Dickson!
When we reach these numbers, all extensions feel the chop (dropped packets? a timing problem?). We didn't notice it before because call volume only started growing a few days ago.
We are testing with alaw because our SIP provider supports both g729 and alaw. About our hardware: it is a Dell R430 with one Xeon E5-2609 processor (6 cores), 16 GB of RAM, and RAID 5 storage. I think it is a good server, don't you? I am considering moving the VM to another Hyper-V host and reformatting this host with VMware or Xen. In another forum, I read that Asterisk with many calls under Hyper-V can have problems with RTP. That could be my scenario.

See what codecs are being used and whether any transcoding is going on.
pjsip show channelstats
or
sip show channels

If you are seeing g729, try not allowing it on the trunk at all.
If you have g722 on the extension side, temporarily force alaw there and check whether it saves much CPU.
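
To get a quick tally rather than reading the channel list by eye, something like the following might help. It is only a sketch: count_codec is a made-up helper name, and it assumes the codec name appears on each channel's line, as it does in the Format column of sip show channels. For PJSIP, swap in the pjsip show channelstats command and check that its output matches the same assumption.

```shell
#!/bin/sh
# Count lines of CLI output (read from stdin) that mention a codec name.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper's exit status clean while still printing "0".
count_codec() {
    grep -c -i -w -- "$1" || true
}

# Only query Asterisk when its CLI is available on this machine.
if command -v asterisk >/dev/null 2>&1; then
    for codec in ulaw alaw g722 g729; do
        printf '%s: ' "$codec"
        asterisk -rx "sip show channels" | count_codec "$codec"
    done
fi
```

Any g729 or g722 lines showing up while the other leg of the call is on alaw or ulaw are the calls being transcoded.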

Other CPU hogs are recording calls (possibly you can do that externally) and listening for DTMF (e.g., can you use the transfer key on the phone instead of *2?).

Yeah, that is plenty of power. Stick with alaw as much as you can. (I run alaw between North America and India at 250 ms, and it works perfectly.)

Couple of things - We have about 70 machines running under Hyper-V with no issues, some with as few as 4 extensions, largest with 75 extensions.

  1. Fixed RAM, not Dynamic - your host only has 16G - how many other VMs are you running? Also, in the past I have had a memory leak on a FreePBX instance, and with Dynamic memory it took down the host (with 128G!!). Under 30 users I allocate 4G, over 30 users I allocate 8G - but fixed, not dynamic.

  2. ALWAYS give Linux VMs at least 2 CPUs in ANY virtualization environment - multi-threaded execution is the expected norm now, and there is ZERO impact to the host.

  3. Transcoding is CPU intensive - unless your bandwidth is limited (you don’t say), don’t even bother with transcoding. An easy way to test this is to turn off everything except ulaw/alaw - transcoding between those two is trivial, while G.729 takes way more effort.

[Off topic:] May I ask how you handle trunking, remote SIP connections, etc.? Do you have a dedicated WAN IP for each VM?

I’ve got a bunch. On about half of my machines I put a public IP right on the box (physical firewalls process the traffic rules) and then run the FreePBX Distro with its firewall and fail2ban. Works like a charm.
I put 2 NICs on each VM, 1 for WAN and 1 for LAN.
Other machines share IPs with NAT, no problems. Hyper-V can handle full failover, no problems.

Hi friends! I dug into my problem over Easter. Happy Easter, and thanks for sharing your experiences.

When the problem started, my VM had 2 vCPUs. I don't know if you have seen this, but when I tried to give CentOS more than one vCPU, the network became unreachable. It is a Red Hat bug with irqbalance; I disabled it, and the 2 vCPUs worked. Back to the problem: we started to notice these audio dropouts. Searching the web, I found some sites blaming more than one CPU in a Linux VM (does that make sense?!), for example the thread “Intermittent brief audio dropouts”. For that reason, last week I removed one vCPU, leaving the VM with only one.

In short, after stress testing over the holiday, I found that the problem starts when Asterisk eats the memory and the system starts using swap. My host has RAID 5, but on SATA disks, so I think the I/O was not good enough. I stopped another VM, rebooted Asterisk to clear the swap, and started my tests again, now with 4 vCPUs. The system has been running fine so far: we have reached 70 channels and are still going up without problems.

Searching for this problem, I found many threads about Asterisk memory leaks, but no solution other than rebooting. Running sync and then echo 3 > /proc/sys/vm/drop_caches does not appear to be a good solution; it seems dangerous if another process is using that cache. So a reboot appears to be less dangerous, but I confess I don't like it. Do you schedule Asterisk reboots, or do you have another solution? Thanks!

Swapping to Hard Disk will DEFINITELY cause audio to drop - the machine just can’t process the packets fast enough - Asterisk uses 20msec packets, which means for every conversation it is processing 50 packets/second - for 70 calls, you are talking 3500 packets/second - introduce swapping and you are toast.
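
The arithmetic above can be sketched in a few lines of shell. The function names are just for illustration, and it counts one packet stream per call the same way the figures above do (a real call has a stream in each direction, so the true load is higher):

```shell
#!/bin/sh
# Packets per second for one RTP stream at a given packetization time (ms).
rtp_pps() {
    echo $(( 1000 / $1 ))
}

# Total packets per second for a number of calls, one stream per call.
total_pps() {
    calls=$1
    ptime_ms=$2
    echo $(( calls * $(rtp_pps "$ptime_ms") ))
}

total_pps 70 20   # 70 calls at 20 ms per packet -> 3500
```

At 3500 packets/second the machine has well under a millisecond per packet, which is why a disk seek for swapped-out memory is enough to break the audio.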

Give it enough RAM so you are never swapping.

Excellent to know. Is there a rule of thumb or basic calc for this?

Once the system is humming, spend a little time in top to see how much it is actually using - but the guidelines above (4G under 30 users, 8G over 30 users, fixed) have never failed me.
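
If you would rather have a quick yes/no on swapping than watch top, a rough check could look like this. It is a sketch only: swap_used_kb is a made-up helper that reads the SwapTotal and SwapFree fields from /proc/meminfo.

```shell
#!/bin/sh
# Print swap in use, in kB, from a /proc/meminfo-style file
# (SwapTotal minus SwapFree). Defaults to the live /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Report only where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    used=$(swap_used_kb)
    if [ "$used" -gt 0 ]; then
        echo "WARNING: ${used} kB of swap in use"
    else
        echo "OK: no swap in use"
    fi
fi
```

Some swap in use is not always a problem by itself; what hurts calls is active swapping. Running vmstat 1 during a busy period and watching the si and so columns shows whether pages are moving in and out right now.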

Found this today - I will copy it here and also make a post - had some VM’s throwing TONS of these errors:

kernel: Clocksource tsc unstable (delta = 131717343 ns). Enable clocksource failover by adding clocksource_failover kernel parameter.

Come to find out it’s a bug (CentOS or Microsoft I don’t know…) that occurs when the VM is Live-Migrated - it actually hung a machine this morning which is why I found out about it - went and looked through all my running machines and it was occurring on 7 of them - the clock “De-Stabilizes” during the migration.

Simple solution - add this to /boot/grub/grub.conf and reboot when you can:

clocksource_failover=acpi_pm

Put it at the end of the kernel line as an option and then reboot.
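
To make "the end of the kernel line" concrete, here is a small sketch that appends the option with sed. The helper name append_kernel_option is made up, and the kernel version and root device in the demo file are invented for illustration; your real grub.conf will have different values. It keeps a .bak copy and skips lines that already have the option.

```shell
#!/bin/sh
# Append a kernel option to every "kernel" line of a GRUB legacy config,
# unless the line already carries it. A .bak copy of the file is kept.
append_kernel_option() {
    conf=$1
    opt=$2
    sed -i.bak "/^[[:space:]]*kernel[[:space:]]/{/$opt/!s/[[:space:]]*\$/ $opt/;}" "$conf"
}

# Demo on a throwaway file, not the real /boot/grub/grub.conf.
# The version string and root device are made up for illustration.
demo=$(mktemp)
cat > "$demo" <<'EOF'
title CentOS (2.6.32-example)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-example ro root=/dev/sda1 quiet
        initrd /initramfs-2.6.32-example.img
EOF
append_kernel_option "$demo" clocksource_failover=acpi_pm
grep kernel "$demo"
```

The line starting with "kernel" is the one that matters: everything after the vmlinuz path is an option, and the new one goes last, separated by a space. On the real system you would run the function as root against /boot/grub/grub.conf (or just edit the file by hand) and then reboot.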

Here is the original article:

Interesting. We have had issues with conferences for years on Hyper-V hosted FreePBX systems: audio sync/lag and participants getting “automuted.” We eventually opened a support case and were ultimately told “Hyper-V is not a supported platform.”

These FreePBX VMs have not been live-migrated ever, but I’m wondering if the issue could be related.

My Hyper-V deployment showed the same clock message but never (knowingly) suffered from any issue because of it. I rebooted the guest OS and the message stopped. For me it was the same message, but on a completely different deployment, a PBiaF install from probably 3 years ago. I’ll have to watch to see if it comes back. I’m not sure when this started, but it would have been relatively recently. Admittedly, I hadn’t looked at the console of the box (which is where the message was showing) in several months, only at remote SSH sessions.

Hello,

I am having the same issue here. It happened after I installed patches on the Windows host OS and did a live migration.

@GSnover I was wondering if you would mind clarifying what you mean by “Put it at the end of the kernel line” please? I am not a Linux Expert.

Thanks
