FRACK and other serious issues (maybe mixmonitor? maybe taskprocessor?)

Stop FOP and other 3rd party add-ons and see if that fixes your problem.

2 Likes

Me too! When this problem first occurred it caused new calls to return unavailable and no logfiles were updated when calls were received. Outgoing calls failed too (with no log messages).

I rebuilt the system from scratch and although I still get the warnings the system hasn’t failed since. I have just installed FreePBX on a ProLiant 12-core 32GB RAM that is lying around unused so will see if the extra horsepower helps.

Jon
Do you happen to remember if you made any major changes before this started happening? Do you also use Asternic or FOP2? My system ran fine for about a month before these problems started, and though I made a few changes in the days before, the problems happened 2 days after the latest change. I have now reverted all changes from the week prior to the problem starting and the problem continues.

Tony Lewis, we stopped the services but it failed again anyway. Any other things you can suggest we should try? Or what should we document when the problem arises so maybe the info can help debug it?

No Asternic or FOP2

It’s difficult to say what prompted the issue… what happened was…

My system is virtual, running on a VMware ESX installation. I had been gradually adding more “features” to the installation and, mid-week, decided to make a “bare-metal” backup. To do this, I shut the FreePBX guest down from the FreePBX system admin power options menu, then copied the VM folder on the host, creating a duplicate copy. I then powered the original back on again; it started without an issue but the next day began to act very strangely. After a few calls had come into the queue (sequentially, we rarely get more than two simultaneous calls in the queue), new device registrations would fail, incoming calls didn’t ring and the FreePBX logs showed no activity.

I started to “unpick” my activities the previous week but was unable to correct the problem. I finally created a new VM with identical architecture, re-installed FreePBX, activated it, restored the config from the failing system, copied moh, voicemail and some other bits over… since then I have lots of taskprocessor warnings but no failures (although I am nervous).

So - a brief description, see what we have in common. Several queues. Each queue has another queue, set not to answer, as its destination. The second queue in line only has static agents. Two of these queues have 3 statics, the other two have about 12 statics. We are using PJSIP for device connections and chan_sip for trunk connections. We use one trunk for all outgoing calls since our provider supports CLIP masquerading. There are another five trunks used for incoming calls.

We have multiple MOH groups, one for each queue. We have chosen to edit the MOH with Audacity to include “press * if you don’t want to wait” type announcements rather than letting the queue do the announcements. The MOH wavs are 8 kHz and were uploaded to the moh folder with WinSCP, and then the Settings\Music-on-hold module performed format changes creating alaw, sln, sln16, sln48 and g722 versions.

Each of the “front” queues (the ones that receive inbound routes) has an IVR attached which allows the caller to leave the queue and go to voicemail.

All inbound and outbound routes have call recording set to Force and call recording format is wav (lowercase). The call recording location is as per default.

I really don’t know what to say - it would be a help if anyone could say WHAT these taskprocessor metrics are monitoring. If I restart Asterisk and make a test call, as soon as the handsets start ringing, the high water limit is hit and normally exceeded by 200-300%.

Oh, one other thing - the PBX isn’t local to the site - all the handsets (about 18) are registered through the responsive firewall. I have been wondering if these taskprocessor queues have something to do with BLF subscribers at the end of the remote links.

As I said, I have built FreePBX on a test server (a ProLiant G8, 32GB, 16C) which would normally be good for 20,000+ devices. I’ll try to switch over one day this week, with the VM on hot standby, to see if the figures look better - or whether this is a network bottleneck etc.

Oh - something else that I wouldn’t class as “unusual” but maybe it is - neither PJSIP nor chan_sip is set to the default ports - we use UDP ports in the 3000-4000 range for SIP but keep to 10000-20000 for RTP.

Security by obscurity :wink: I know it’s hardly foolproof but it keeps some script kiddies at bay :wink:

I’m not sure, but I may have found something that’s causing the problem - given that I suspect the large taskprocessor queues are due to notifying extensions that a call is ringing… Four of my extensions are showing over 100 watchers for their hint… I only have six handsets which support BLF subscriptions - the rest are all DECT handsets with no BLF functionality! Clearly the persistent hint/subscription watcher information in astdb is in error… I will dig more tonight.

That’s above my current knowledge. I will research a bit and then check my own install.
Thank you.

Jon, we just found my system had hundreds of Conference Rooms that had been automatically created (not by us). Might be from an old bug, might be something else. Can you check your conference room menu and see if you have them too please?

I just had a look, no conference rooms on the system at all mate.

So from what I said earlier, I went to Reports, Info, Subscriptions - the “watchers” column was indicating an impossibly high number of watchers on several hints.

I deleted the persistent subscriptions by:-

Stop Asterisk (core stop gracefully)
SSH onto the server
cd /var/lib/asterisk
sqlite3 astdb.sqlite3

At the sqlite3 > prompt

delete from astdb where key like '/subscription_persistence/%';
.quit

Restart Asterisk (fwconsole start)
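If you want to see what that DELETE actually matches before touching the real database, here is a minimal Python sketch (stdlib sqlite3) run against a throwaway in-memory database with the same single-table layout as astdb. The sample keys and values are made up for illustration - do NOT point anything like this at the live /var/lib/asterisk/astdb.sqlite3 while Asterisk is running:

```python
import sqlite3

# Throwaway in-memory database with astdb's key/value table shape
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE astdb (key VARCHAR(256), value VARCHAR(256), PRIMARY KEY (key))")
db.executemany(
    "INSERT INTO astdb VALUES (?, ?)",
    [
        # Hypothetical stale subscription rows (formats are illustrative only)
        ("/subscription_persistence/201;hint", "stale watcher blob"),
        ("/subscription_persistence/202;hint", "stale watcher blob"),
        # An unrelated row that must survive the cleanup
        ("/CustomDevstate/foo", "NOT_INUSE"),
    ],
)

# The same pattern as the delete run at the sqlite3 prompt above
db.execute("DELETE FROM astdb WHERE key LIKE '/subscription_persistence/%'")

remaining = [k for (k,) in db.execute("SELECT key FROM astdb")]
print(remaining)  # only the non-subscription row is left
```

The `LIKE '/subscription_persistence/%'` prefix match is what keeps the delete scoped to the persistence family and leaves everything else in astdb alone.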

The watchers count for the subscriptions is correct now. I must admit that with a few test calls, I’m not convinced it’s reduced the taskprocessors count but will see what happens tomorrow.

Regards, Jon

OK, you piqued my interest and I had a closer look at the astdb. Given there are no calls in progress, these are a surprise…

/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/Local/[email protected];2 : RINGING
/RG/302/PJSIP/201-0000015d : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/Local/[email protected];2 : RINGING
/RG/401/PJSIP/401-000001ba : RINGING
/RG/401/PJSIP/401-00000339 : RINGING
/RG/401/PJSIP/401-00000348 : RINGING
/RINGGROUP/280/changecid : default
/RINGGROUP/280/fixedcid :
/RINGGROUP/285/changecid : default
/RINGGROUP/285/fixedcid :
/RINGGROUP/380/changecid : default
/RINGGROUP/380/fixedcid :

So, I deleted these entries with the above process but using the following sqlite commands:

delete from astdb where key like '/RG/%';
delete from astdb where key like '/RINGGROUP/%';
.quit

The astdb looks much cleaner now - but I’m not sure the taskprocessor overcommitment has gone away…

Incidentally, if you don’t want to dive right in to running sqlite3, go to the Asterisk CLI and run “database show” (without the quotes)
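If you capture that “database show” output to a file, a quick way to spot a leaking key family (like /RG or /subscription_persistence above) is to tally entries by their top-level prefix. A minimal Python sketch - the sample lines here are invented for illustration; in practice feed it the real output of `asterisk -rx "database show"`:

```python
from collections import Counter

# Made-up sample of "database show" style output (key : value per line)
sample_output = """\
/RG/302/Local/999@from-queue-0001;2          : RINGING
/RG/302/Local/999@from-queue-0002;2          : RINGING
/RINGGROUP/280/changecid                     : default
/CW/201                                      : ENABLED
"""

families = Counter()
for line in sample_output.splitlines():
    if not line.startswith("/"):
        continue  # skip blank lines and the "N results found" trailer
    key = line.split(":", 1)[0].strip()
    families["/" + key.split("/")[1]] += 1  # first path component is the family

print(families.most_common())  # [('/RG', 2), ('/RINGGROUP', 1), ('/CW', 1)]
```

A family whose count climbs steadily while call volume stays flat is a good candidate for the kind of leak described in this thread.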

I just checked the subscriptions and there are 484 hints, and in the watchers column they are all either 0 or 1, nothing over that. But the system is pretty much at a standstill now and using a recreated astdb, so I don’t know if the problem is there or has gone away.

A (hard earned) word of caution to anyone following this thread,

NEVER ever use the sqlite3 interface while Asterisk is running. The code is thread-safe for Asterisk but in no way multi-user safe; you will, sooner or later, set up a lock on the database that is VERY hard to resolve. Asterisk has its inbuilt

database query 
Usage: database query "<SQL Statement>"
       Run a user-specified SQL query on the database. Be careful.

so if you want to mess with the database while Asterisk is running, wrap it up this way while “being careful” (otherwise you will all learn what VACUUM does, and when it can be run safely while Asterisk is running: that is NEVER) :slight_smile:

The underlying problem is still open - what is generating those spurious taskprocessor entries? Until it is found and fixed, then . . . .

1 Like

Hey Dicko - GREAT POINT. I just found out myself (the “hard” way) that this is a big issue… I don’t mean to hijack the thread, but maybe some of these leaking astdb fields are a problem? We get thousands of these, so I had set up a cronjob like this:
#!/bin/sh
SQLITE3=/usr/bin/sqlite3
$SQLITE3 /var/lib/asterisk/astdb.sqlite3 "DELETE FROM astdb WHERE key like '/RG/%';"
$SQLITE3 /var/lib/asterisk/astdb.sqlite3 "DELETE FROM astdb WHERE key like '/BLKVM/%';"
$SQLITE3 /var/lib/asterisk/astdb.sqlite3 "VACUUM;"

BUT I found out that sometimes this causes Asterisk to lose access to its database, and it does so in a way that doesn’t even clearly log that there’s an issue. When it happens the “fix” is to restart Asterisk.

So given that sqlite3 does need to be maintained, and that Asterisk bloats it up with various status entries, is there a recommended way to do this? I guess my delete statements could be moved to “database query” calls… but what about VACUUM? I guess we could periodically shut down, vacuum and restart, but we’ve never had to do that before - I don’t really want to start.

Any thoughts? If I should restart this as a separate thread please let me know.

#!/bin/sh
ASTERISK=`which asterisk`
$ASTERISK -rx "database query \"DELETE FROM astdb WHERE key like '/RG/%'\""
$ASTERISK -rx "database query \"DELETE FROM astdb WHERE key like '/BLKVM/%'\""
#DICKO SAYS NO! $ASTERISK -rx "database query \"VACUUM\""

Thanks,

Mitch

No thoughts on this or the FRACK issue. Out of the literally hundreds of machines, large and small, that I have built, I have never experienced that problem… Some of those machines have hundreds of thousands of rows in astdb; it’s never been a problem.

Nonetheless dicko, there are at least three users on the FreePBX forums all with the same issue in the last week… I realise this is an Asterisk problem not a FreePBX one - but it seems very real to me.

Incidentally, a clean install on a physical (not virtualised) ProLiant G8, 2x6 core, 32GB - and when I say clean, I used the bulk handler to create extensions etc. and configured everything else manually; no backup was restored. On the first test call, which terminated in a queue, taskprocessors shows:-

Processor                                    Processed  In Queue  Max Depth  Low water  High water
subm:ast_bridge_topic_all-cached-0000007d 1 0 1 450 500
subm:ast_channel_topic_all-00000081 6501 0 139 450 500
subm:ast_channel_topic_all-00000087 6500 0 89 450 500
subm:ast_channel_topic_all-cached-0000007e 8459 0 235 450 500
subm:ast_channel_topic_all-cached-0000007f 8458 0 674 450 500
subm:ast_channel_topic_all-cached-00000080 8457 0 86 450 500
subm:ast_device_state_topic-00000002 454 0 6 450 500
subm:ast_device_state_topic-00000004 453 0 18 450 500
subm:ast_device_state_topic-00000086 256 0 12 450 500
subm:ast_parking-0000005e 1 0 1 450 500
subm:ast_presence_state_topic_all-00000005 15 0 9 450 500
subm:ast_security-00000084 1296 0 2 450 500
subm:ast_system-00000071 172 0 4 450 500
subm:ast_system-00000072 171 0 3 450 500
subm:ast_system-0000007c 136 0 1 450 500
subm:ast_system-00000088 129 0 1 450 500
subm:cdr_engine-00000003 8530 0 344 4500 5000
subm:cel_aggregation_topic-00000006 8462 0 5052 2700 3000
subm:endpoint_topic_all-cached-00000008 2172 0 37 450 500
subm:endpoint_topic_all-cached-00000082 2055 0 64 450 500
subm:manager_topic-00000007 10646 0 4487 2700 3000

Both subm:ast_channel_topic_all-cached-0000007f and subm:manager_topic-00000007 look poor - as does subm:cel_aggregation_topic-00000006.

I think sooner or later, most users with queues will start reporting issues. My installation is tiny compared to most, there is no way this hardware should be showing warnings that the system is overstressed…
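For anyone else eyeballing this output: assuming the columns are name, processed, currently in queue, max queue depth seen, low water mark and high water mark (which is what the figures above suggest), a few lines of Python can flag the overcommitted processors automatically. The sample lines are copied from the table above:

```python
# Flag taskprocessors whose max queue depth exceeded their high water mark.
# Column meaning is assumed from the figures in this thread:
# name, processed, in-queue, max-depth, low-water, high-water
sample = """\
subm:cdr_engine-00000003 8530 0 344 4500 5000
subm:cel_aggregation_topic-00000006 8462 0 5052 2700 3000
subm:manager_topic-00000007 10646 0 4487 2700 3000
subm:ast_channel_topic_all-cached-0000007f 8458 0 674 450 500
"""

overcommitted = []
for line in sample.splitlines():
    name, processed, in_queue, max_depth, low, high = line.split()
    if int(max_depth) > int(high):
        pct = 100 * int(max_depth) // int(high)  # peak depth as % of high water
        overcommitted.append((name, pct))

for name, pct in overcommitted:
    print(f"{name}: peaked at {pct}% of high water")
```

On the sample above this reports the cel_aggregation, manager and channel-cached processors (168%, 149% and 134% of high water respectively), which matches the “exceeded by 200-300%” pattern being discussed.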

While you have a legitimate issue with task processors on your system, I don’t think any of the issues are related to astdb.

1 Like

I concur with @tm1000. Yes, there are generated rows in astdb; you can set the high water mark higher, or you can try chan_sip, which will likely alleviate the problem, but I would take these problems to Asterisk / soon-to-be Sangoma :slight_smile: