High CPU utilization

FreerPBXer · April 17, 2014, 5:12pm

I’m watching a system that’s been somewhat problematic right now. It has 15 active channels and the CPU is at 70%.

This is a virtualized system and it has two virtual CPUs assigned. Hyper-V shows 4% proc utilization for the VM.

It’s currently on 5.211.65-9.

One call is a conference. The server also runs FOP2 (current).

We’ve been having intermittent call quality issues, and of late, users have reported ‘phantom’ key entries in conferences (i.e. participants are suddenly muted). Could the CPU utilization have something to do with this? Either way, it seems like the VM is not seeing/using all the processor resources we’ve allocated to it. Any ideas on how to fix/resolve this?

FreerPBXer · April 17, 2014, 5:56pm

Now with zero active calls the CPU shows 51%. Something is hinky.

jfinstrom · April 17, 2014, 6:22pm

Look at the output of the “top” command.

FreerPBXer · April 17, 2014, 6:52pm

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
29301 root      20   0  107m 1292  952 R 99.9  0.0  24810:47 whois
34244 asterisk  20   0 1769m  42m  13m S  5.6  1.1  19:13.23 asterisk
49748 root      20   0 98288 4268 3288 S  0.3  0.1   0:00.12 sshd
49787 root      20   0 15028 1356 1000 R  0.3  0.0   0:00.11 top
49900 root      20   0 96948 3728 2796 S  0.3  0.1   0:00.01 sshd
49902 root      20   0 96948 3728 2796 S  0.3  0.1   0:00.01 sshd

jfinstrom · April 17, 2014, 7:14pm

There you have it. Kill whois.

FreerPBXer · April 17, 2014, 7:57pm

Thanks. I was ready to do that but was waiting to make sure it wasn’t somehow required for FreePBX. I have a cron job set up to run amportal restart every morning at 5:30 AM. As soon as I killed the whois process, the email reporting this morning’s restart was delivered.

Any idea what might have instantiated whois and/or caused it to race?

jfinstrom · April 17, 2014, 8:05pm

Perhaps fail2ban.
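
One way to test the fail2ban theory before killing a runaway process is to look at its parent and how long it has been running. A minimal sketch, assuming a Linux box with the procps version of ps; the helper name proc_lineage is made up for illustration:

```shell
# proc_lineage PID: print a process and each of its ancestors, up to PID 1.
# Columns: pid, parent pid, elapsed run time, full command line.
proc_lineage() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Trailing "=" on each field suppresses the header line
        ps -o pid=,ppid=,etime=,args= -p "$pid" || break
        # Next hop: the parent of the process just printed
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}
```

For the listing above that would be proc_lineage 29301. If a fail2ban process shows up in the chain, that points to fail2ban as the launcher. If the parent has already exited, the process is re-parented to PID 1 and the chain will not show who started it.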

SkykingOH · April 17, 2014, 10:59pm

Why do you need to do an amportal restart every morning? That is not productive.

We have boxes that have been up for years:

new-vg2*CLI> core show uptime
System uptime: 1 year, 4 weeks, 1 day, 19 hours, 18 minutes, 30 seconds
Last reload: 6 hours, 51 minutes, 34 seconds

FreerPBXer · April 18, 2014, 1:23am

We have an ongoing issue with three installs where FreePBX stops communicating with the Digium G100 T1 interface. It always happens overnight, but not every night. The result is no inbound or outbound calls. An amportal restart fixes the problem. It’s been difficult to get support on it quickly, so for this one critical install we set up the cron job to proactively fix it early in the morning.

dicko · April 18, 2014, 1:45am

A technical point: FreePBX does not communicate with the gateway; Asterisk does. Have you called Digium support? They normally reply within hours for their products, and they would be authoritative. Does “sip reload” from the Asterisk CLI also fix your problem?

FreerPBXer · April 18, 2014, 1:56am

The problem has been getting someone on the phone while the problem is happening. As you might expect, users are quite anxious to get the problem fixed when it’s happening. I’ll try sip reload next time it happens. We do have cases open (or did) with Schmooze and Digium. Thanks much for the input. Always appreciated!
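
If sip reload does turn out to be enough, a check that reloads only when the gateway peer has dropped would be gentler than a blind daily restart. This is a sketch, not a tested fix. It assumes chan_sip, that the gateway is defined as a peer with qualify enabled (otherwise the status shows as Unmonitored), and that the peer is named g100, which is a placeholder. The function names are hypothetical.

```shell
# peer_reachable NAME: read "sip show peers" output on stdin and succeed only
# if the line for NAME reports a status of "OK (...)".
peer_reachable() {
    awk -v peer="$1" '
        # First column is "name/username"; compare the part before the slash
        { split($1, id, "/") }
        id[1] == peer && /OK \(/ { found = 1 }
        END { exit !found }
    '
}

# check_gateway NAME: run "sip reload" only when the peer is not reachable.
check_gateway() {
    if asterisk -rx 'sip show peers' | peer_reachable "$1"; then
        echo "$1 reachable, nothing to do"
    else
        echo "$1 not reachable, running sip reload"
        asterisk -rx 'sip reload'
    fi
}
```

Running check_gateway g100 from cron every few minutes would also leave a record of when the link actually drops, which may help the open support cases.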

tampatech24 · April 18, 2014, 8:25pm

Hello! Have you considered migrating to a Rhino card? I’ve had better luck from a T1/PRI perspective using Rhino’s products over Digium.

Also why not consider calling your provider and asking if your T1/PRI switch is doing anything, or logging anything, to identify the symptoms?

Skyking is right, that little uptime indicator is the heartbeat of your system. My general rule is if I can’t keep uptime over 3 months, something’s worth investigating.

Glad you figured out your CPU spike issue.

FreerPBXer · April 18, 2014, 9:07pm

“Hello! Have you considered migrating to a Rhino card?”

No. The G100 is an external device, not a card. The FreePBX server is virtualized, so cards are out. The G100 and G200 connect to the T1 CSU/DSU but present as SIP to Asterisk, so we can connect from the virtualized boxes to the T1 via IP, through the G100. The loss of connectivity is something that started a couple of months ago; it didn’t exist when we first set up the three affected systems.

Agreed, we shouldn’t have to be restarting Asterisk every day. But that’s preferable to 6:00 AM calls from the client because their phone system is ‘down’.

dicko · April 18, 2014, 9:20pm

It has never taken two months for me to get a response from Digium. If you are losing network connectivity, then simple tools like Asterisk’s “sip debug peer . .” and a lower-level “tcpdump -nnvv udp port 5060” should suffice.
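
Since the failure happens overnight, it may be more practical to write the capture to a file than to watch it live. A rough sketch, assuming Linux tcpdump run as root and SIP over UDP on the default port 5060; the gateway address, output path, and function names are placeholders:

```shell
# sip_filter GATEWAY_IP: build a capture filter for SIP signalling
# (UDP port 5060) to and from one host.
sip_filter() {
    printf 'udp port 5060 and host %s' "$1"
}

# capture_sip GATEWAY_IP [OUTFILE]: record matching packets to a pcap file.
# Stop with Ctrl-C, then read it back with "tcpdump -nn -r OUTFILE".
capture_sip() {
    gw=$1
    out=${2:-/tmp/sip-gateway.pcap}
    # -nn: no name or port lookups; -s 0: full packets; -i any: all interfaces
    tcpdump -nn -s 0 -i any -w "$out" "$(sip_filter "$gw")"
}
```

For example, capture_sip 192.0.2.10 would log everything exchanged with a gateway at that address, so the last packets before an outage can be compared with normal traffic.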

SkykingOH · April 18, 2014, 9:41pm

I don’t think sip reload would do it, but dahdi reload might:

asterisk -rx 'dahdi reload'

That is much better than an amportal restart, which is like hitting yourself with a hammer to cure a headache.

dicko · April 18, 2014, 9:54pm

Personally, I don’t think dahdi reload would successfully reconnect a SIP gateway.

SkykingOH · April 18, 2014, 11:46pm

I could have sworn that thing used a Redfone-style Ethernet connection at Layer 2.

As much as I hate to admit it, Dicko appears to be right: it’s a SIP gateway.