It just did it again. This time fwconsole restart
was able to restart asterisk, so this wasn’t exactly the same behavior. But, it acted the same in that no phone was able to register. No core.* file in /tmp either, but that is to be expected in this scenario I think.
I cranked up several asterisk logging settings while it was down (core set debug 5
, core set verbose 5
, pjsip set logger on
), and then I ran several registration attempts. Nothing showed in the asterisk logs when I attempted to register a phone.
I connected via ADB to the linphone-android app and saw the following logs when it attempted to register:
05-09 16:00:45.538 4948 4948 I Linphone : transaction [0x9ab74e60] channel state changed to [RES_DONE]
05-09 16:00:45.538 4948 4948 I Linphone : channel 0x97923a00: state CONNECTING
05-09 16:00:45.538 4948 4948 I Linphone : transaction [0x9ab74e60] channel state changed to [CONNECTING]
05-09 16:00:45.539 4948 4948 I Linphone : Trying to connect to [TLS://::ffff:192.168.1.5:5061]
05-09 16:00:45.626 4948 4948 I Linphone : Channel [0x97923a00]: Connected at TCP level, now doing TLS handshake with cname=freepbx.server.com
05-09 16:00:45.626 4948 4948 I Linphone : Channel [0x97923a00]: SSL handshake in progress...
05-09 16:01:39.304 4948 4948 E Linphone : Could not receive stream packet: Connection reset by peer
05-09 16:01:39.305 4948 4948 E Linphone : Channel [0x97923a00]: SSL handshake failed : NET - Connection was reset by peer
05-09 16:01:39.305 4948 4948 E Linphone : Cannot connect to [TLS://freepbx.server.com:5061]
05-09 16:01:39.305 4948 4948 I Linphone : channel 0x97923a00: state ERROR
So, the SSL handshake is failing because the server is resetting the connection. But, I don’t see any logs about attempted or failed connections on the server side.
Here’s where it gets weird. When I turned on debugging logs in Asterisk I saw a TON of Examining AMI event
: type messages. They all seem to be related to the last successful call before the system started failing. When I look at the timestamps from my ITSP I see that the call came around 15:27 local time and ended about a minute later.
In my asterisk logs I see a burst of messages around 15:27 related to the incoming call and then nothing until around 15:55 when I noticed the outage and enabled more verbose debugging. Here is a typical AMI event message:
[2018-05-09 15:55:08] DEBUG[6398] manager.c: Examining AMI event:
Event: RTCPSent
Privilege: reporting,all
Channel: PJSIP/1101-00000262
ChannelState: 6
ChannelStateDesc: Up
CallerIDNum: 1101
CallerIDName: Endpoint Name
ConnectedLineNum: +15558675309
ConnectedLineName: Caller Name Here
Language: en
AccountCode:
Context: macro-dial
Exten: s
Priority: 1
Uniqueid: 1525904848.613
Linkedid: 1525904847.612
To: 192.168.1.10:7077
From: 192.168.1.5:19783
SSRC: 0x1985c1a0
PT: 201(RR)
ReportCount: 1
Report0SourceSSRC: 0xd0b15fff
Report0FractionLost: 0
Report0CumulativeLost: 0
Report0HighestSequence: 3023
Report0SequenceNumberCycles: 0
Report0IAJitter: 8
Report0LSR: 4102905273
Report0DLSR: 1598.1920
So, minutes after the call was supposed to have ended, I’m seeing messages that say the channel is “Up”. Those messages continue until I run fwconsole restart
. Stranger still, the extension listed in the RTCPSent event is not the one that the CDR reports show answered the call.
I’m totally stumped. Any ideas?