Asterisk locks up, not allowing any calls in or out (even between internal extensions)


(P Smith) #1

Hey, we are currently getting a lock-up where no one can make or receive any calls. This normally happens around the lunch hour.

Normally I am able to get it going again by using “core restart now”.

PBX Version: 15.0.17.24
PBX Distro: 12.7.5-1807-1.sng7
Asterisk Version: 13.22.0

Any help would be great. Thanks.


(Andrew) #2

You’re going to need to post logs - https://wiki.freepbx.org/display/SUP/Providing+Great+Debug#ProvidingGreatDebug-AsteriskLogs-PartII


(P Smith) #3

Hey, not sure if this helps, but it seems to drop the extensions one by one.
This is from the full log in /var/log/asterisk/:

/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[117720] res_pjsip/pjsip_options.c: Contact 228/sip:228@192.168.20.43:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[117720] res_pjsip/pjsip_configuration.c: Endpoint 228 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_options.c: Contact 236/sip:236@192.168.20.87:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_configuration.c: Endpoint 236 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[105197] res_pjsip/pjsip_options.c: Contact 237/sip:237@192.168.20.93:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[105197] res_pjsip/pjsip_configuration.c: Endpoint 237 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[153253] res_pjsip/pjsip_options.c: Contact 243/sip:243@192.168.20.70:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[153253] res_pjsip/pjsip_configuration.c: Endpoint 243 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[95294] res_pjsip/pjsip_options.c: Contact 247/sip:247@192.168.20.60:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[95294] res_pjsip/pjsip_configuration.c: Endpoint 247 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_options.c: Contact 251/sip:251@192.168.20.11:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_configuration.c: Endpoint 251 is now Unreachable
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[153253] res_pjsip/pjsip_options.c: Contact 254/sip:254@192.168.20.81:5060 has been deleted
/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[153253] res_pjsip/pjsip_configuration.c: Endpoint 254 is now Unreachable
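One way to check whether the endpoints go down one by one or all at once is to group the "is now Unreachable" lines by timestamp. Below is a minimal Python sketch, assuming the log format matches the lines pasted above. The function name `group_unreachable` is made up for illustration and is not part of Asterisk or FreePBX.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [2021-03-23 13:00:11] VERBOSE[117720] res_pjsip/pjsip_configuration.c: Endpoint 228 is now Unreachable
UNREACHABLE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"Endpoint (?P<endpoint>\S+) is now Unreachable"
)


def group_unreachable(lines):
    """Return {timestamp: [endpoint, ...]} for every 'Unreachable' log line."""
    by_second = defaultdict(list)
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            by_second[match.group("ts")].append(match.group("endpoint"))
    return dict(by_second)


# Sample taken from the log excerpt above; in practice, pass an open file
# handle for /var/log/asterisk/full instead of this list.
sample = [
    "/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[117720] res_pjsip/pjsip_options.c: Contact 228/sip:228@192.168.20.43:5060 has been deleted",
    "/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[117720] res_pjsip/pjsip_configuration.c: Endpoint 228 is now Unreachable",
    "/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_options.c: Contact 236/sip:236@192.168.20.87:5060 has been deleted",
    "/var/log/asterisk/full:[2021-03-23 13:00:11] VERBOSE[113560] res_pjsip/pjsip_configuration.c: Endpoint 236 is now Unreachable",
]

for timestamp, endpoints in group_unreachable(sample).items():
    print(timestamp, len(endpoints), "endpoints:", ", ".join(endpoints))
```

If many endpoints share the same second, as in the excerpt above, they are dropping together rather than individually.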


#4

Where are the phones and the PBX relative to each other on the internet?


(P Smith) #5

We have some in the same office on the same vlan. 192.168.20.0/24
Some are softphones on a vpn lan. 192.168.21.0/24


#6

And where is the PBX and how are the VLANs constructed?


(P Smith) #7

Server is on the 20 vlan (or what we call “voice vlan”).
All in office desk phones are on this network. Server has access to 20 and 21 vlan.
192.168.20.5/255.255.255.0
Soft phones for people working remotely are on 21 vlan. These have access to all ports on the server.
192.168.21.5/255.255.255.0


#8

If the phones that go down are on the same vlan as the PBX, as they seem to be, I would suspect the common hardware, presumably a switch.


(P Smith) #9

I made this change and it hasn’t dropped since.

Fraser

Mar '19

Hi,

I had this exact same problem with PJSIP and it is related to this Asterisk bug: https://issues.asterisk.org/jira/browse/ASTERISK-27821

Changing all PJSIP extensions’ MWI Subscription Type from Auto to Solicited solves the problem.

Fraser.


(P Smith) #10

Server has since crashed at 4:21 PM. So above was not the solution.


(Dave Burgess) #11

That bug causes resource exhaustion, which this doesn’t really sound like. With all of the extensions losing connection at the same time, I’d be suspecting hardware, especially something like a bad switch or wonky power. If you are using PoE from the switch, for example, I’d be really likely to look in that direction.


(P Smith) #12

I will have a look at the switch logs to see if they are erroring out.
Checked the firewall and didn’t see any switch to the backup firewall or any errors.

These are the only other warnings / errors I have seen around the time of the crash.
[2021-04-06 16:20:55] WARNING[160839][C-000002cd] taskprocessor.c: The ‘subp:ast_channel_topic_all-000004bb’ task processor queue reached 500 scheduled tasks.

[2021-04-06 16:21:29] ERROR[14260] res_pjsip_header_funcs.c: No headers had been previously added to this session.
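To see how often the task processor queue warning shows up, and how close to each crash, the full log can be scanned for it. This is a rough Python sketch, assuming the warning text has the same shape as the line quoted above. The name `find_queue_warnings` is hypothetical, and the pattern accepts both straight and curly quotes because the forum may have altered the quote characters.

```python
import re

# Matches lines such as:
# [2021-04-06 16:20:55] WARNING[160839][C-000002cd] taskprocessor.c:
#   The 'subp:...' task processor queue reached 500 scheduled tasks.
QUEUE_WARNING = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] WARNING\S* "
    r"taskprocessor\.c: The ['‘](?P<name>[^'’]+)['’] "
    r"task processor queue reached (?P<depth>\d+) scheduled tasks"
)


def find_queue_warnings(lines):
    """Return a list of (timestamp, task processor name, queue depth)."""
    hits = []
    for line in lines:
        match = QUEUE_WARNING.search(line)
        if match:
            hits.append(
                (match.group("ts"), match.group("name"), int(match.group("depth")))
            )
    return hits


# Sample taken from the two log lines above; in practice, pass an open
# file handle for /var/log/asterisk/full instead of this list.
sample = [
    "[2021-04-06 16:20:55] WARNING[160839][C-000002cd] taskprocessor.c: The ‘subp:ast_channel_topic_all-000004bb’ task processor queue reached 500 scheduled tasks.",
    "[2021-04-06 16:21:29] ERROR[14260] res_pjsip_header_funcs.c: No headers had been previously added to this session.",
]

for timestamp, name, depth in find_queue_warnings(sample):
    print(timestamp, name, depth)
```

Comparing these timestamps against the times the phones went Unreachable may help tell whether the queue backlog comes before the lock-up or follows it.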


(Dave Burgess) #13

The task processor queue warning is a symptom of the problem you solved with the MWI subscription type.

The res_pjsip_header_funcs error, on the other hand, looks like a TCP failure. The most likely cause of that (that I can think of) would be the firewall resetting the TCP connections, a time-out somewhere, or the phones restarting the connections and the server trying to pick up in the middle of a conversation. All three of those could be caused by a bad switch module or a stateful firewall/switch losing its mind and forgetting who your phones are. Of course, as with most things IP, it could also be squirrels or a coincidence.

The more we talk, the more it sounds like bad infrastructure. I could be completely all wet on that, but I’d really like to suggest starting from the network core and working your way out.


#14

At the beginning you say you are running Asterisk 13 - any reason you are not running version 16? I’m not sure how well pjsip works on the old Asterisk 13.


(P Smith) #15

We have upgraded to V16 now. No crashes yet… knock on wood.