It just did it again. This time
fwconsole restart was able to restart asterisk, so this wasn’t exactly the same behavior. But, it acted the same in that no phone was able to register. No core.* file in /tmp either, but that is to be expected in this scenario I think.
I cranked up several asterisk logging settings while it was down (
core set debug 5,
core set verbose 5,
pjsip set logger on), and then I ran several registration attempts. Nothing showed in the asterisk logs when I attempted to register a phone.
I connected via ADB to the linphone-android app and saw the following logs when it attempted to register:
05-09 16:00:45.538 4948 4948 I Linphone : transaction [0x9ab74e60] channel state changed to [RES_DONE]
05-09 16:00:45.538 4948 4948 I Linphone : channel 0x97923a00: state CONNECTING
05-09 16:00:45.538 4948 4948 I Linphone : transaction [0x9ab74e60] channel state changed to [CONNECTING]
05-09 16:00:45.539 4948 4948 I Linphone : Trying to connect to [TLS://::ffff:192.168.1.5:5061]
05-09 16:00:45.626 4948 4948 I Linphone : Channel [0x97923a00]: Connected at TCP level, now doing TLS handshake with cname=freepbx.server.com
05-09 16:00:45.626 4948 4948 I Linphone : Channel [0x97923a00]: SSL handshake in progress...
05-09 16:01:39.304 4948 4948 E Linphone : Could not receive stream packet: Connection reset by peer
05-09 16:01:39.305 4948 4948 E Linphone : Channel [0x97923a00]: SSL handshake failed : NET - Connection was reset by peer
05-09 16:01:39.305 4948 4948 E Linphone : Cannot connect to [TLS://freepbx.server.com:5061]
05-09 16:01:39.305 4948 4948 I Linphone : channel 0x97923a00: state ERROR
So, the SSL handshake is failing because the server is resetting the connection. But, I don’t see any logs about attempted or failed connections on the server side.
Here’s where it gets weird. When I turned on debugging logs in Asterisk I saw a TON of
Examining AMI event: type messages. They all seem to be related to the last successful call before the system started failing. When I look at the timestamps from my ITSP I see that the call came around 15:27 local time and ended about a minute later.
In my asterisk logs I see a burst of messages around 15:27 related to the incoming call and then nothing until around 15:55 when I noticed the outage and enabled more verbose debugging. Here is a typical AMI event message:
[2018-05-09 15:55:08] DEBUG manager.c: Examining AMI event:
CallerIDName: Endpoint Name
ConnectedLineName: Caller Name Here
So, minutes after the call was supposed to have ended, I’m seeing messages that say the channel is “Up”. Those messages continue until I run
fwconsole restart. Stranger still, the extension listed in the RTCPSent event is not the one that the CDR reports show answered the call.
I’m totally stumped. Any ideas?