FlowRoute flapping connection

jtsage · October 11, 2020, 8:57pm

Admittedly, I’m not really sure where to start on this one… I am getting several hundred of these in the full log:

[2020-10-11 03:46:39] VERBOSE[21867] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable
[2020-10-11 03:46:39] VERBOSE[21867] res_pjsip/pjsip_options.c: Contact FlowRoute/sip:[email protected] is now Reachable.  RTT: 9.487 msec
[2020-10-11 03:47:42] VERBOSE[30549] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Unreachable
[2020-10-11 03:47:42] VERBOSE[30549] res_pjsip/pjsip_options.c: Contact FlowRoute/sip:[email protected] is now Unreachable.  RTT: 0.000 msec
[2020-10-11 03:53:39] VERBOSE[21867] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable
[2020-10-11 03:53:39] VERBOSE[21867] res_pjsip/pjsip_options.c: Contact FlowRoute/sip:[email protected] is now Reachable.  RTT: 9.289 msec
[2020-10-11 03:55:42] VERBOSE[29640] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Unreachable
[2020-10-11 03:55:42] VERBOSE[29640] res_pjsip/pjsip_options.c: Contact FlowRoute/sip:[email protected] is now Unreachable.  RTT: 0.000 msec
[2020-10-11 03:56:39] VERBOSE[24147] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable

I have had no issues with INCOMING calls: even when the endpoint is marked unreachable, calling that trunk reconnects it immediately.

OUTGOING works, but only when it’s marked reachable.

When I set up the route, I followed this: https://support.flowroute.com/895670-FreePBX-PJSIP-Trunk-Setup

Thanks for any insight.

~j

Stewart1 · October 11, 2020, 10:18pm

Just guessing here, perhaps Flowroute deprioritizes OPTIONS and it sometimes takes longer than 3 seconds to respond. I don’t have this issue with them but had it with another provider.
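One rough way to test the "slow OPTIONS response" guess outside of Asterisk is to send a single hand-built OPTIONS and time the reply. The sketch below is only that: `build_options` and `probe` are hypothetical helper names, the header values are made up for illustration, and nothing in this thread confirms that Flowroute answers OPTIONS sent from a port other than the one Asterisk uses.

```python
import socket
import time
import uuid


def build_options(host, port, local_ip, local_port):
    """Build a minimal SIP OPTIONS request as bytes (illustrative header values)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie prefix
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={tag}",
        f"To: <sip:{host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    # Joining with CRLF leaves the required blank line after the headers.
    return "\r\n".join(lines).encode("ascii")


def probe(host, port=5060, timeout=16.0):
    """Send one OPTIONS over UDP; return the RTT in seconds, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # lets the OS pick the local address and port
        local_ip, local_port = sock.getsockname()
        start = time.monotonic()
        sock.send(build_options(host, port, local_ip, local_port))
        try:
            sock.recv(4096)  # any SIP response counts as "answered"
        except socket.timeout:
            return None
        return time.monotonic() - start
```

Calling something like `probe("ep-us-east-va-01.flowroute.com")` a few dozen times and looking for results above 3 seconds (or `None`) would hint at whether a longer qualify_timeout is likely to help.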

Try putting this in (or adding it to) /etc/asterisk/pjsip.aor_custom_post.conf:

[FlowRoute](+type=aor)
qualify_timeout=16.0

Restart Asterisk and see whether this eliminates or at least greatly reduces the occurrences.

If not, I’m wondering if pjsip is dumb about picking from the many servers that us-east-va.sip.flowroute.com resolves to. If it hits one that’s down, it might keep beating on the same one with successive OPTIONS retries, instead of trying another. Given that your log segment is in the wee morning hours on a Sunday, this might just be their routine maintenance. A current nslookup shows:

> _sip._udp.us-east-va.sip.flowroute.com
Server:  dns.google
Address:  8.8.8.8

Non-authoritative answer:
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 30
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-west-or-02.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 30
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-west-or-01.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 20
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-nj-01.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 10
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-va-01.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 10
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-va-02.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 20
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-nj-04.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 20
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-nj-02.flowroute.com
_sip._udp.us-east-va.sip.flowroute.com  SRV service location:
          priority       = 20
          weight         = 50
          port           = 5060
          svr hostname   = ep-us-east-nj-03.flowroute.com
>

Turning on pjsip logger (or capturing with tcpdump, etc.) would show whether this is the case.
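To make the nslookup listing easier to read, here is a small sketch that groups the same records by SRV priority. Under RFC 2782 a client is supposed to try the lowest priority number first and pick among equal-priority targets by weight (all 50 here, so an even spread). Whether pjsip actually behaves that way is exactly the open question; `group_by_priority` is a hypothetical helper and the data is simply copied from the output above.

```python
from collections import defaultdict

# (priority, weight, port, target) copied from the nslookup output.
SRV_RECORDS = [
    (30, 50, 5060, "ep-us-west-or-02.flowroute.com"),
    (30, 50, 5060, "ep-us-west-or-01.flowroute.com"),
    (20, 50, 5060, "ep-us-east-nj-01.flowroute.com"),
    (10, 50, 5060, "ep-us-east-va-01.flowroute.com"),
    (10, 50, 5060, "ep-us-east-va-02.flowroute.com"),
    (20, 50, 5060, "ep-us-east-nj-04.flowroute.com"),
    (20, 50, 5060, "ep-us-east-nj-02.flowroute.com"),
    (20, 50, 5060, "ep-us-east-nj-03.flowroute.com"),
]


def group_by_priority(records):
    """Return {priority: [targets]} with the lowest (most preferred) priority first."""
    groups = defaultdict(list)
    for priority, _weight, _port, target in records:
        groups[priority].append(target)
    return {p: sorted(groups[p]) for p in sorted(groups)}


for priority, targets in group_by_priority(SRV_RECORDS).items():
    print(priority, targets)
```

So the two va hosts (priority 10) should be tried first, then the four nj hosts, then the two or hosts. If a packet capture shows OPTIONS retries all going to one address while the endpoint is marked Unreachable, that would support the "stuck on a dead server" theory.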

jtsage · October 12, 2020, 12:08am

Giving that bit of config a shot.

Fwiw, I didn’t mean to imply that this was time-specific. Those log lines are still being added, at xx:xx:39 and xx:xx:42 on the dot as it happens (63 and 57 seconds apart; indeed, qualify_frequency is 60 sec, as is the general retry).

Thanks much.

Stewart1 · October 12, 2020, 12:17am

The 3 second difference between 42 and 39 is the default value of qualify_timeout. If the config change works as intended but doesn’t solve the problem, you should now see a 16 second difference.

At the Asterisk command prompt, you can issue
pjsip show aor FlowRoute
and it should show the qualify_timeout value.
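To check the gap from the log rather than by eye, a sketch like the following may help. It pulls the "Endpoint ... is now ..." lines quoted earlier in the thread, and for every Reachable to Unreachable flip reports the gap modulo qualify_frequency (assumed to be 60, as stated above). The function names are hypothetical; the sample lines are the ones from the first post.

```python
import re
from datetime import datetime

# Matches the endpoint state lines quoted earlier in this thread.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"Endpoint (?P<name>\S+) is now (?P<state>Reachable|Unreachable)\s*$"
)


def endpoint_transitions(lines, endpoint="FlowRoute"):
    """Return (timestamp, state) tuples for one endpoint, in log order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("name") == endpoint:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group("state")))
    return events


def timeout_offsets(events, qualify_frequency=60):
    """Gap (mod qualify_frequency) for each Reachable -> Unreachable flip."""
    offsets = []
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        if s0 == "Reachable" and s1 == "Unreachable":
            gap = int((t1 - t0).total_seconds())
            offsets.append(gap % qualify_frequency)
    return offsets


SAMPLE = """\
[2020-10-11 03:46:39] VERBOSE[21867] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable
[2020-10-11 03:47:42] VERBOSE[30549] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Unreachable
[2020-10-11 03:53:39] VERBOSE[21867] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable
[2020-10-11 03:55:42] VERBOSE[29640] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Unreachable
[2020-10-11 03:56:39] VERBOSE[24147] res_pjsip/pjsip_configuration.c: Endpoint FlowRoute is now Reachable
"""

print(timeout_offsets(endpoint_transitions(SAMPLE.splitlines())))  # prints [3, 3]
```

With the default timeout the offsets come out as 3; if the new setting is active and the problem persists, they should come out as 16 instead.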

jtsage · October 12, 2020, 12:21am

Yeah, I think you can ignore this whole thread. I had turned the responsive firewall back “all the way” on, and had a typo on the permit line, I think. For reference, I had the firewall on since setup, but I was debugging a bunch of clients, had run into issues, and dropped the pjsip service back to the Internet zone. Fixed that right about the same time this cropped up.

Gonna let it do its thing overnight and see if anything persists.

Thanks for your time.

EDIT - some lies. Maybe the match field is still wrong:

147.75.60.160/28,34.210.91.112/28,147.75.65.192/28,34.226.36.32/28

At any rate, when I take a look at the “fpbxsmarthosts” iptables chain, only 1 address appears.

34.226.36.32

For the time being, I added those netblocks to the trusted zone under networks. At a glance, it looks like this bug:

https://issues.freepbx.org/browse/FREEPBX-18741
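For anyone comparing the match field against what ends up in iptables, here is a small sketch using Python's standard `ipaddress` module. It parses the four netblocks from the match line above and checks whether a given source address falls inside any of them. `parse_match` and `is_covered` are hypothetical helper names; the test addresses are just examples.

```python
import ipaddress

# The match field value quoted above.
MATCH = "147.75.60.160/28,34.210.91.112/28,147.75.65.192/28,34.226.36.32/28"


def parse_match(value):
    """Parse a comma-separated CIDR list; raises ValueError on a malformed entry."""
    return [ipaddress.ip_network(item.strip()) for item in value.split(",")]


def is_covered(addr, networks):
    """True if addr falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


networks = parse_match(MATCH)
for net in networks:
    print(net, net.num_addresses, "addresses")

print(is_covered("34.226.36.32", networks))   # the one address seen in the chain
print(is_covered("147.75.60.165", networks))  # inside the first /28
```

Each /28 covers 16 addresses, so a chain holding only the single address 34.226.36.32 would leave most of the listed ranges unmatched.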

Stewart1 · October 12, 2020, 12:28am

Unless there is also a bug in the firewall logic, this doesn’t make sense. The OPTIONS request is an outbound packet and AFAIK the firewall blocks nothing going out. The response should appear as an established/related packet and should be accepted regardless of firewall settings.

AFAIK match/permit is only used for routing incoming INVITEs; responses are routed based on Call-ID, and other SIP tags if needed.

jtsage · October 13, 2020, 3:24am

Update: that change seems to have fixed the problem. No log entries since my last post, connection seems stable.

That said, @Stewart1, I totally agree - it should be an outgoing connection and have nothing to do with what I changed, assuming I am thinking about it right. But it works, and I’ll not complain over-much.

Thanks for your help.
