I am having a problem where millions of tasks are queued up and then the PBX stops responding. Even without registered endpoints or calls it takes hours for the queue to empty itself.
While I migrate to different hardware, I am having to restart asterisk and/or the complete machine several times per day.
Is there a command I could type to force-flush the queue and all other call state data?
thank you.
Does anybody have a suggestion for what to try? We are now having to restart every 10-20 minutes and we only have about 20 active calls.
thanks.
You could run a fwconsole restart command on a cron job to run automatically every x minutes, but that will take everything down. You could do something like this but be more selective and run it on just the Asterisk service.
https://wiki.asterisk.org/wiki/display/AST/Stopping+and+Restarting+Asterisk+From+The+CLI
comtech, that's what we are doing now, but it doesn't fully flush everything, nor does a full machine restart. It helps our problem, but only temporarily.
Thank you for the suggestion though.
Anybody? Is this possible? Once the cache is full, it takes hours to recover. This error state survives hangup requests, fwconsole restarts and machine restarts.
If you know SQL syntax then
rasterisk -x "database query (your sql query)"
allows you to drill down and remove bad keys and values. A hammer solution, with reference to:
https://wiki.asterisk.org/wiki/display/AST/SQLite3+astdb+back-end
is to delete the sqlite3 database. It will be recreated on Asterisk startup, then populated by a FreePBX reload; you will just lose any ad-hoc call forwarding/extension state and any blacklist.
Trouble is with that, what is causing your problem? If you don't fix it, it will just reappear.
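Before reaching for the hammer, it may help to see which key families dominate the astdb. The following is only a rough sketch, not an official tool: it assumes the SQLite back-end keeps everything in a single `astdb` table with `key` and `value` columns and that keys look like `/FAMILY/...`. The function name `family_counts` is made up for illustration. Run it against a copy of the file, not the live database.

```python
import sqlite3
from collections import Counter


def family_counts(db_path):
    """Count astdb keys per top-level family, largest family first.

    Assumes a table named `astdb` with a `key` column whose values
    look like "/FAMILY/rest/of/key".
    """
    con = sqlite3.connect(db_path)
    try:
        keys = [row[0] for row in con.execute("SELECT key FROM astdb")]
    finally:
        con.close()
    # "/AMPUSER/7292/cidnum" -> "AMPUSER"
    counts = Counter(k.strip("/").split("/", 1)[0] for k in keys)
    return counts.most_common()


if __name__ == "__main__":
    import sys
    # Usage: python astdb_families.py /tmp/astdb-copy.sqlite3
    for family, n in family_counts(sys.argv[1]):
        print(f"{n:8d}  {family}")
```

A family with an unexpectedly huge key count would be a candidate for the more selective `database query` clean-up mentioned above.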
Dicko, thanks, maybe I should try that. Can you tell me how to delete the DB completely?
I don't know what happened. It was a new deployment and it worked fine for over a month. Suddenly, on a Friday a week and a half ago, things started getting wonky: lots of queued-up tasks on the taskprocessor, SIP errors popping up on the CLI, then people stopped receiving calls or being able to make new ones. A system reboot solved it until the following Monday. On Monday the same thing happened twice. It got worse and worse by the day. Today we had to do fwconsole restarts every 20-30 minutes.
It's a Xen Server virtual machine with 4 processors and 16GB of RAM for only about 100 registered extensions and about 40 active calls during peak hours.
The only thing I've noticed is that the period it works fine is the longest in the morning. As if some cache is getting filled up and is not clearing fast enough, but clears overnight. In the morning things work fine for a few hours and then all the same problems return.
Yesterday, under the assumption the problem was at a lower level (hardware or system installation), we installed a new machine with the latest FreePBX distro download, then backed up the current machine and restored it into the new installation. When we rebooted, even though it had a different IP and all trunks were disabled, it still showed queued-up activity that had transferred over from the old machine, and it took hours for it to clean itself up. So maybe it is a DB problem? What exactly should I delete to test this? Very few people have forwarding set up and nobody has blacklists.
THANK YOU!
dicko, should I simply delete (or temporarily rename) this?
utils/astdb2bdb /var/lib/asterisk/astdb.sqlite3
(at whichever PATH it is located in my machine)
is that what you meant?
yes, delete /var/lib/asterisk/astdb.sqlite3 to hammer it
will try it first thing in the morning. thank you!
dicko - genius-level user that you are (absolutely NO irony... I've read your posts and your first-hand experience is formidable).
Both amcoit and I have similar problems which suddenly started for no discernible reason. We regularly see a 300-400% breach of the high-water level for some taskprocessors. In some instances (amcoit is here now), after the PBX has been running for a short while it becomes completely unresponsive to SIP traffic. Incoming and outgoing calls get various unavailable signals and NOTHING (literally nothing!) is logged in any of the logfiles.
Our separate posts on the subject are here:-
If only someone could give some idea of what the problematic taskprocessors are actually responsible for, it would help...
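To put a number on how far each taskprocessor overshoots its high-water mark, the output of `core show taskprocessors` can be post-processed. This is a minimal sketch under an assumption: the columns are taken to be Processor, Processed, In Queue, Max Depth, Low water, High water, in that order, so check that against your Asterisk version. The function name and the sample rows are invented for illustration and are not taken from anyone's system in this thread.

```python
def over_high_water(text):
    """Return (name, max_depth, high_water, percent) for every processor
    whose maximum queue depth exceeded its high-water mark, worst first.

    Assumes each data row ends with five integer columns:
    Processed, In Queue, Max Depth, Low water, High water.
    """
    hits = []
    for line in text.splitlines():
        parts = line.rsplit(None, 5)
        if len(parts) != 6:
            continue  # blank line or summary line
        name, *nums = parts
        if not all(n.isdigit() for n in nums):
            continue  # header row
        _processed, _in_queue, max_depth, _low, high = map(int, nums)
        if high and max_depth > high:
            hits.append((name.strip(), max_depth, high,
                         round(100 * max_depth / high)))
    return sorted(hits, key=lambda h: h[3], reverse=True)


# Hypothetical sample, only to show the expected shape of the input.
SAMPLE = """\
Processor                          Processed   In Queue  Max Depth  Low water High water
app_voicemail                             12          0          2        450        500
subm:example-topic-00000001          9000000       1800       2000        450        500
"""

if __name__ == "__main__":
    for name, depth, high, pct in over_high_water(SAMPLE):
        print(f"{name}: max depth {depth} vs high water {high} ({pct}%)")
```

Feeding it the real output (for example saved from `asterisk -rx "core show taskprocessors"`) would at least show which processor names are the repeat offenders, which is a starting point for asking what they do.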
The one commonality I notice with this problem is "virtualization", perhaps Xen and VMware more often than containers or KVM.
One would surmise this to be an Asterisk "problem", not a FreePBX one. As such, I would recompile Asterisk from source against the running kernel, which is not an easy option if you are distro based though. This should eliminate a couple of possible causes.
I'm not 100% certain that amcoit is virtualised... unless I missed it in his post?
And although it's not very scientific testing, I did rebuild my FreePBX installation on a quad-core i7 and still received regular warnings about taskprocessor limits being exceeded once I had restored my original configuration.
I have re-installed (and re-configured manually, subject to copying moh, voicemail etc.) the installation on a new machine. I'm just waiting for an opportunity to switch over to the new install and see if the issue goes away. The new install isn't virtual and is on a very overspecced machine.
Sadly, I lack the ability and knowledge to recompile the source, and this is a production system.
I tend to agree that this is probably an Asterisk thing, not a distro one... I just wish I could understand what the taskprocessors are responsible for so I could narrow down the issue.
Again, you have.
"once I had restored my original configuration."
That points to the sqlite3 database, as the rest of the restore is just dialplan and voicemail, HTML and MySQL tables and spooled files, none of which are likely culprits. If it is the astdb, then the problem remains: what is causing that?
dicko - good call... on the new machine, I didn't restore any backup but reconfigured manually... I'll see what happens.
Dicko,
I tried deleting the SQLite DB... here are the results:
fwconsole stop
mv astdb.sqlite3 astdb.sqlite3.bak
fwconsole start
fwconsole reload
the new astdb.sqlite3 DB was created and populated.
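For anyone repeating the steps above, here is a small sketch that wraps the same sequence (`fwconsole stop`, move the file aside, `fwconsole start`, `fwconsole reload`) so the old database always ends up under a timestamped name instead of being overwritten. The function names, the timestamp suffix and the injectable `run` parameter are my own choices, not anything FreePBX provides. The default path is the one mentioned earlier in the thread.

```python
import shutil
import subprocess
import time
from pathlib import Path


def set_aside(db_path, stamp=None):
    """Rename the astdb file to <name>.<timestamp>.bak and return the new path."""
    src = Path(db_path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.move(str(src), str(dst))
    return dst


def reset_astdb(db_path="/var/lib/asterisk/astdb.sqlite3", run=subprocess.run):
    """Stop the PBX, move the astdb aside, then start and reload.

    `run` is injectable so the sequence can be dry-run or tested
    without touching a real system.
    """
    run(["fwconsole", "stop"], check=True)
    backup = set_aside(db_path)
    run(["fwconsole", "start"], check=True)
    run(["fwconsole", "reload"], check=True)
    return backup


if __name__ == "__main__":
    print("old database kept at", reset_astdb())
```

Keeping the old file makes it easy to put it back if the fresh database turns out to behave worse.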
All extensions immediately registered and were able to make outgoing external calls. BUT nobody was able to receive calls, neither calls coming in from the trunks nor internal ext-to-ext calls. All attempts gave "the person you are calling is unavailable" and were sent to VM. I tried toggling DND from the extension and, though the voice response was correct ("DND activated", "DND deactivated"), nothing changed (it still says "extension unavailable" when called). If I went to the FreePBX extensions page and searched for this extension, the little checkboxes that show the status of DND (and other features) were all unchecked, as if the DND status is not recorded.
Here is the pjsip show endpoint output for the extension I am testing with. This extension, using Zoiper, shows as registered (on Zoiper) and can successfully make outgoing calls.
pjsip show endpoint 7292

Endpoint: <Endpoint/CID.....................................> <State.....> <Channels.>
I/OAuth: <AuthId/UserName...........................................................>
Aor: <Aor............................................> <MaxContact>
Contact: <Aor/ContactUri..........................> <Hash....> <Status> <RTT(ms)..>
Transport: <TransportId........> <Type> <cos> <tos> <BindAddress..................>
Identify: <Identify/Endpoint.........................................................>
Match: <criteria.........................>
Channel: <ChannelId......................................> <State.....> <Time.....>
Exten: <DialedExten...........> CLCID: <ConnectedLineCID.......>
==========================================================================================

Endpoint: 7292/7292 Unavailable 0 of inf
InAuth: 7292-auth/7292
Aor: 7292 2

ParameterName : ParameterValue
===========================================================
100rel : yes
accept_multiple_sdp_answers : false
accountcode :
acl :
aggregate_mwi : true
allow : (g729|ulaw|alaw|gsm)
allow_overlap : true
allow_subscribe : true
allow_transfer : true
aors : 7292
asymmetric_rtp_codec : false
auth : 7292-auth
bind_rtp_to_media_address : false
call_group :
callerid : "MyName" <7292>
callerid_privacy : allowed_not_screened
callerid_tag :
connected_line_method : invite
contact_acl :
context : from-internal
cos_audio : 5
cos_video : 4
device_state_busy_at : 0
direct_media : true
direct_media_glare_mitigation : none
direct_media_method : invite
disable_direct_media_on_nat : false
dtls_ca_file :
dtls_ca_path :
dtls_cert_file :
dtls_cipher :
dtls_fingerprint : SHA-256
dtls_private_key :
dtls_rekey : 0
dtls_setup : active
dtls_verify : No
dtmf_mode : rfc4733
fax_detect : false
fax_detect_timeout : 0
follow_early_media_fork : true
force_avp : false
force_rport : true
from_domain :
from_user :
g726_non_standard : false
ice_support : false
identify_by : username,ip
inband_progress : false
incoming_mwi_mailbox :
language : es_MX
mailboxes : 7292@device
media_address :
media_encryption : no
media_encryption_optimistic : false
media_use_received_transport : false
message_context :
moh_suggest : default
mwi_from_user :
mwi_subscribe_replaces_unsolicited : true
named_call_group :
named_pickup_group :
notify_early_inuse_ringing : false
one_touch_recording : true
outbound_auth :
outbound_proxy :
pickup_group :
record_off_feature : apprecord
record_on_feature : apprecord
refer_blind_progress : true
rewrite_contact : true
rpid_immediate : false
rtcp_mux : false
rtp_engine : asterisk
rtp_ipv6 : false
rtp_keepalive : 0
rtp_symmetric : true
rtp_timeout : 0
rtp_timeout_hold : 0
sdp_owner : -
sdp_session : Asterisk
send_diversion : true
send_pai : true
send_rpid : false
set_var :
srtp_tag_32 : false
sub_min_expiry : 0
subscribe_context :
t38_udptl : false
t38_udptl_ec : none
t38_udptl_ipv6 : false
t38_udptl_maxdatagram : 0
t38_udptl_nat : false
timers : yes
timers_min_se : 90
timers_sess_expires : 1800
tone_zone :
tos_audio : 184
tos_video : 136
transport :
trust_id_inbound : true
trust_id_outbound : false
use_avpf : false
use_ptime : false
user_eq_phone : false
voicemail_extension :
In the case of Queues, even though all static agents were registered (I checked all pjsip endpoints), nobody was receiving any calls. The CLI was flooded with these messages:
[2018-08-22 08:37:28] ERROR[1760][C-0000003f]: res_pjsip_header_funcs.c:454 func_read_header: This function requires a PJSIP channel.
[2018-08-22 08:37:28] WARNING[1783][C-0000003f]: chan_sip.c:22996 func_header_read: This function can only be used on SIP channels.
Which I assume is the system attempting to reach each agent in round robin.
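When the CLI is flooded like this, a quick tally of which source file and function each message comes from can show whether it is really just these two messages or something else is hiding in the noise. The sketch below is a rough helper, with the regular expression written by hand against the two log lines quoted above. Other log formats (for example without the call id) may need adjusting, and the names `LOG_LINE` and `tally` are made up.

```python
import re
from collections import Counter

# Pattern modelled on lines such as:
# [2018-08-22 08:37:28] ERROR[1760][C-0000003f]: file.c:454 func_name: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+)\[(?P<tid>\d+)\]"
    r"(?:\[(?P<call>C-[0-9a-f]+)\])?: "
    r"(?P<file>[\w.]+):(?P<line>\d+) (?P<func>\w+): (?P<msg>.*)$"
)


def tally(lines):
    """Count log lines per (level, source file, function), most frequent first."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[(m["level"], m["file"], m["func"])] += 1
    return counts.most_common()


if __name__ == "__main__":
    import sys
    # Usage: python tally_log.py < /var/log/asterisk/full
    for (level, src, func), n in tally(sys.stdin):
        print(f"{n:8d}  {level:8s} {src} {func}")
```

Lines that do not match the pattern (such as the verbose dialplan trace) are simply skipped.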
Here is the last section of what the CLI shows for a test we did with an external call coming in to a single extension:
Goto (macro-user-callerid,s,16)
-- Executing [s@macro-user-callerid:16] NoOp("PJSIP/amco-cc-MyCarrier-0000000f", "Macro Depth is 3") in new stack
-- Executing [s@macro-user-callerid:17] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?report2:macroerror") in new stack
-- Goto (macro-user-callerid,s,18)
-- Executing [s@macro-user-callerid:18] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?continue") in new stack
-- Goto (macro-user-callerid,s,37)
-- Executing [s@macro-user-callerid:37] Set("PJSIP/amco-cc-MyCarrier-0000000f", "CALLERID(number)=3418653391") in new stack
-- Executing [s@macro-user-callerid:38] Set("PJSIP/amco-cc-MyCarrier-0000000f", "CALLERID(name)=3418653391") in new stack
-- Executing [s@macro-user-callerid:39] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "0?cnum") in new stack
-- Executing [s@macro-user-callerid:40] Set("PJSIP/amco-cc-MyCarrier-0000000f", "CDR(cnam)=3418653391") in new stack
-- Executing [s@macro-user-callerid:41] Set("PJSIP/amco-cc-MyCarrier-0000000f", "CDR(cnum)=3418653391") in new stack
-- Executing [s@macro-user-callerid:42] Set("PJSIP/amco-cc-MyCarrier-0000000f", "CHANNEL(language)=es_MX") in new stack
-- Executing [s@macro-user-callerid:43] GosubIf("PJSIP/amco-cc-MyCarrier-0000000f", "0?app-check-classofservce,s,1()") in new stack
-- Executing [s@macro-vm:2] Set("PJSIP/amco-cc-MyCarrier-0000000f", "VMGAIN=") in new stack
-- Executing [s@macro-vm:3] Macro("PJSIP/amco-cc-MyCarrier-0000000f", "blkvm-check,") in new stack
-- Executing [s@macro-blkvm-check:1] Set("PJSIP/amco-cc-MyCarrier-0000000f", "GOSUB_RETVAL=") in new stack
-- Executing [s@macro-blkvm-check:2] ExecIf("PJSIP/amco-cc-MyCarrier-0000000f", "0?Set(GOSUB_RETVAL=TRUE)") in new stack
-- Executing [s@macro-blkvm-check:3] MacroExit("PJSIP/amco-cc-MyCarrier-0000000f", "") in new stack
-- Executing [s@macro-vm:4] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?vmx,1") in new stack
-- Goto (macro-vm,vmx,1)
-- Executing [vmx@macro-vm:1] Set("PJSIP/amco-cc-MyCarrier-0000000f", "MEXTEN=7292") in new stack
-- Executing [vmx@macro-vm:2] Set("PJSIP/amco-cc-MyCarrier-0000000f", "MMODE=NOANSWER") in new stack
-- Executing [vmx@macro-vm:3] Set("PJSIP/amco-cc-MyCarrier-0000000f", "RETVM=") in new stack
-- Executing [vmx@macro-vm:4] Set("PJSIP/amco-cc-MyCarrier-0000000f", "MODE=unavail") in new stack
-- Executing [vmx@macro-vm:5] Macro("PJSIP/amco-cc-MyCarrier-0000000f", "get-vmcontext,7292") in new stack
-- Executing [s@macro-get-vmcontext:1] Set("PJSIP/amco-cc-MyCarrier-0000000f", "VMCONTEXT=") in new stack
-- Executing [s@macro-get-vmcontext:2] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?200:300") in new stack
-- Goto (macro-get-vmcontext,s,200)
-- Executing [s@macro-get-vmcontext:200] Set("PJSIP/amco-cc-MyCarrier-0000000f", "VMCONTEXT=default") in new stack
-- Executing [vmx@macro-vm:6] Set("PJSIP/amco-cc-MyCarrier-0000000f", "MODE=unavail") in new stack
-- Executing [vmx@macro-vm:7] NoOp("PJSIP/amco-cc-MyCarrier-0000000f", "MODE IS: unavail") in new stack
-- Executing [vmx@macro-vm:8] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?chknomsg") in new stack
-- Goto (macro-vm,vmx,10)
-- Executing [vmx@macro-vm:10] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "0?s-NOANSWER,1") in new stack
-- Executing [vmx@macro-vm:11] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?notdirect") in new stack
-- Goto (macro-vm,vmx,13)
-- Executing [vmx@macro-vm:13] NoOp("PJSIP/amco-cc-MyCarrier-0000000f", "Checking if ext 7292 is enabled: ") in new stack
-- Executing [vmx@macro-vm:14] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?s-NOANSWER,1") in new stack
-- Goto (macro-vm,s-NOANSWER,1)
-- Executing [s-NOANSWER@macro-vm:1] Macro("PJSIP/amco-cc-MyCarrier-0000000f", "get-vmcontext,7292") in new stack
-- Executing [s@macro-get-vmcontext:1] Set("PJSIP/amco-cc-MyCarrier-0000000f", "VMCONTEXT=") in new stack
-- Executing [s@macro-get-vmcontext:2] GotoIf("PJSIP/amco-cc-MyCarrier-0000000f", "1?200:300") in new stack
-- Goto (macro-get-vmcontext,s,200)
-- Executing [s@macro-get-vmcontext:200] Set("PJSIP/amco-cc-MyCarrier-0000000f", "VMCONTEXT=default") in new stack
-- Executing [s-NOANSWER@macro-vm:2] VoiceMail("PJSIP/amco-cc-MyCarrier-0000000f", "7292@default,u") in new stack
-- <PJSIP/amco-cc-MyCarrier-0000000f> Playing 'vm-theperson.g729' (language 'es_MX')
-- <Local/7225@from-queue-00000f08;2>AGI Script attendedtransfer-rec-restart.php completed, returning 0
-- Executing [s@macro-hangupcall:6] Hangup("Local/7225@from-queue-00000f08;2", "") in new stack
== Spawn extension (macro-hangupcall, s, 6) exited non-zero on 'Local/7225@from-queue-00000f08;2' in macro 'hangupcall'
== Spawn extension (ext-local, h, 1) exited non-zero on 'Local/7225@from-queue-00000f08;2'
-- Nobody picked up in 1000 ms
--
-- LazyMembers debugging - Numbusies: 0, Nummems: 17
cos.agi: Starting Class Of Service checks
cos.agi: Detected EXTERNAL Call. Skipping CoS Checks
cos.agi: Starting Class Of Service checks
cos.agi: Detected EXTERNAL Call. Skipping CoS Checks
-- <Local/7238@from-queue-00000f0a;2>AGI Script cos.agi completed, returning 0
any and all help is welcome
thank you!
PS.- Jon, yes, we are virtualized. Xen Server 4 cores, 16GB.
Did you reload FreePBX?
I restarted and reloaded Asterisk and also restarted the machine. No difference, all extensions unavailable. Even weirder, while Asterisk was stopped I put back the previous (theoretically broken) astdb.sqlite3 file and I now have the exact same issue on that one, i.e. all extensions are Unavailable.
I assumed this part of the issue might be related to https://community.asterisk.org/t/reload-endpoint-runtime/72156/5 but I did restart.
Is there anything else you can think of that I should try?
thank you
go back to your last known good backup?
we are doing that as we speak.
Something we just realized is that on the failing system, with all extensions showing as unavailable, if we go one by one and press "Submit" followed by Apply (with no changes made), right after the apply the extension is able to receive phone calls again. Does that tell you anything? Is there any way to do this to all extensions at once?
thanks!
Edit: just to clarify, this is on the system where we deleted astdb and had it recreated by asterisk.