High CPU utilization

I’m watching a system that’s been somewhat problematic right now. It has 15 active channels and the CPU is at 70%.

This is a virtualized system and it has two virtual CPUs assigned. Hyper-V shows 4% proc utilization for the VM.

It’s currently on 5.211.65-9.

One call is a conference. The server also runs FOP2 (current).

We’ve been having intermittent call quality issues, and of late, users have reported ‘phantom’ key entries in conferences (i.e. participants are suddenly muted). Could the CPU utilization have something to do with this? Either way, it seems like the VM is not seeing/using all the processor resources we’ve allocated to it. Any ideas on how to fix/resolve this?

Now with zero active calls the CPU shows 51%. Something is hinky.

Look at the output of the “top” command:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29301 root 20 0 107m 1292 952 R 99.9 0.0 24810:47 whois
34244 asterisk 20 0 1769m 42m 13m S 5.6 1.1 19:13.23 asterisk
49748 root 20 0 98288 4268 3288 S 0.3 0.1 0:00.12 sshd
49787 root 20 0 15028 1356 1000 R 0.3 0.0 0:00.11 top
49900 root 20 0 96948 3728 2796 S 0.3 0.1 0:00.01 sshd
49902 root 20 0 96948 3728 2796 S 0.3 0.1 0:00.01 sshd

There you have it. Kill whois.
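The numbers fit a single runaway process. By default top reports %CPU relative to one core, so a single-threaded process pinned at about 100% on a two-vCPU guest accounts for roughly half of the total, which matches the 51% seen with zero calls. A rough sketch of that arithmetic follows; the helper name `overall_pct` is made up for illustration.

```shell
#!/bin/sh
# overall_pct PROC_PCT NCPUS
# Rough share of total CPU capacity used by one process, given top's
# per-core %CPU figure and the number of CPUs the guest sees.
# Hypothetical helper, not part of FreePBX or Asterisk.
overall_pct() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.1f\n", p / n }'
}

# How many CPUs the guest actually sees (two are assigned here):
#   nproc
# The whois process from the top output, on a 2-vCPU guest:
#   overall_pct 99.9 2
```

If `nproc` reports fewer CPUs than were assigned in Hyper-V, that would be a separate problem worth chasing.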

Thanks. Was ready to do that but waiting to make sure it wasn’t somehow required for FreePBX. I have a cron job set up to run amportal restart every morning at 5:30 AM. As soon as I killed the whois process, the email informing me that this morning’s restart had run was delivered.

Any idea what might have instantiated whois and/or caused it to race?

Perhaps fail2ban.
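One way to test that guess, before killing the process, is to see what its parent is. A minimal sketch, assuming a Linux guest with /proc mounted; the helper name `parent_of` is hypothetical, and the fail2ban path in the comment assumes a default install.

```shell
#!/bin/sh
# parent_of PID
# Print the parent PID of PID followed by the parent's command line.
# Reads /proc directly, so it is Linux-only. Hypothetical helper.
parent_of() {
    ppid=$(awk '/^PPid:/ {print $2}' "/proc/$1/status" 2>/dev/null)
    [ -n "$ppid" ] || return 1
    cmd=$(2>/dev/null tr '\0' ' ' < "/proc/$ppid/cmdline")
    printf '%s %s\n' "$ppid" "$cmd"
}

# With the PID from the top output:
#   parent_of 29301
# To list fail2ban actions that call whois (default config path assumed):
#   grep -l whois /etc/fail2ban/action.d/*.conf
```

If the parent turns out to be a fail2ban or mail process, the whois lookup was most likely part of a ban notification that never returned.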

Why do you need to do an amportal restart every morning? That is not productive.

We have boxes that have been up for years:

new-vg2*CLI> core show uptime
System uptime: 1 year, 4 weeks, 1 day, 19 hours, 18 minutes, 30 seconds
Last reload: 6 hours, 51 minutes, 34 seconds

We have an ongoing issue with three installs where FreePBX stops communicating with the Digium G100 T1 interface. It always happens overnight, but not every night. The result is no inbound or outbound calls. An amportal restart fixes the problem. It’s been difficult to get support on it quickly, so for this one critical install we set up the cron job to proactively fix it early in the morning.

A technical point: FreePBX does not communicate with the gateway; Asterisk does. Have you called Digium support? They normally reply within hours for their products, and they would be authoritative. Does “sip reload” from the Asterisk CLI equally fix your problem?

The problem has been getting someone on the phone while the problem is happening. As you might expect, users are quite anxious to get the problem fixed when it’s happening. I’ll try sip reload next time it happens. We do have cases open (or did) with Schmooze and Digium. Thanks much for the input. Always appreciated!
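As an alternative to a blanket daily restart, a cron job could check the gateway peer and reload only when it is actually down. This is a sketch under several assumptions: the gateway is a chan_sip peer with qualify enabled (so that `sip show peer` reports a Status line), the peer name `g100` and the script path are placeholders, and it is not yet confirmed in this thread that “sip reload” clears the fault.

```shell
#!/bin/sh
# peer_is_ok
# Read `sip show peer NAME` output on stdin; succeed only if the Status
# line reports OK. Anything else (UNREACHABLE, UNKNOWN, LAGGED, or no
# Status line at all) counts as not OK.
peer_is_ok() {
    grep -Eq '^[[:space:]]*Status[[:space:]]*:[[:space:]]*OK'
}

# check_gateway PEER
# Run a sip reload only when PEER is not reporting OK, and log that we did.
check_gateway() {
    if asterisk -rx "sip show peer $1" | peer_is_ok; then
        return 0
    fi
    logger -t gw-watchdog "peer $1 not OK, running sip reload"
    asterisk -rx 'sip reload'
}

# Example cron entry (every 5 minutes), assuming this file is saved as
# /usr/local/sbin/gw-watchdog.sh and ends with the line: check_gateway g100
#   */5 * * * * /usr/local/sbin/gw-watchdog.sh
```

The syslog entries also give a timestamp for each outage, which may help when working the open support cases.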

Hello! Have you considered migrating to a Rhino card? I’ve had better luck from a T1/PRI perspective using Rhino’s products over Digium.

Also why not consider calling your provider and asking if your T1/PRI switch is doing anything, or logging anything, to identify the symptoms?

Skyking is right, that little uptime indicator is the heartbeat of your system. My general rule is if I can’t keep uptime over 3 months, something’s worth investigating.

Glad you figured out your CPU spike issue.

“Hello! Have you considered migrating to a Rhino card?”

No. The G100 is an external device, not a card. The FreePBX server is virtualized, so cards are out. The G100 and G200 connect to the T1 CSU/DSU but present as SIP to Asterisk, so we can connect from the virtualized boxes to the T1 via IP, through the G100. The loss of connectivity started a couple of months ago; it didn’t exist when we first set up the three affected systems.

Agreed, we shouldn’t have to be restarting Asterisk every day. But that’s preferable to 6:00 AM calls from the client because their phone system is ‘down’.

It has never taken two months for me to get a response from Digium. If you are losing network connectivity, then simple tools like Asterisk’s “sip debug peer . .” and a lower-level “tcpdump udp -nnvv port 5060” should suffice.

I don’t think sip reload would do it, but dahdi reload might:

asterisk -rx 'dahdi reload'

Much better than an amportal restart, which is much like hitting yourself with a hammer to cure a headache.

Personally I don’t think dahdi reload would successfully reconnect a SIP gateway :wink:

I could have sworn that thing used a Redfone-style Ethernet connection at Layer 2.

As much as I hate to admit it, Dicko appears to be right: it’s a SIP gateway.