We have a bunch of FreePBX servers running on-premise at different client locations, on a mix of hardware and firewall brands, and we currently have no issues with any of them.
We just built our first server in the cloud using a known cloud hosting provider. Recently, the customer on this server told us that some callers reported hearing a fast busy signal, with the call dropping a few seconds later. Nothing appeared in our FreePBX/Asterisk logs, and I have never heard of FreePBX/Asterisk giving a fast busy. We opened a ticket with our SIP trunk provider, who told us that they see the failed calls and that our server never responded to or accepted the connection.
Based on the above, and on our experience with our on-premise servers, we suspect the issue is with the cloud host. We’ve opened a ticket with them and are working with them, but it’s early and I feel like this might be difficult to diagnose. Our plan is to build a bunch of FreePBX servers in the cloud for our clients, but we need to resolve all issues and have a few clean, issue-free months before we are convinced this is a viable solution.
My question is, has anyone else experienced similar issues with a SIP provider intermittently not getting a successful connection to a cloud server, and if so, what issue was uncovered as the cause?
You will need to trace SIP traffic on your external router. If it doesn’t see traffic from the provider, then it is presumably the VSP’s problem (or perhaps you haven’t purchased enough ‘channels’?). If the traffic is visible there but not at your PBX, then it is your routing problem.
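The triage in that reply can be written as a small decision function. This is a hypothetical sketch: the two booleans stand for “the INVITE appears in a packet capture at that point” (for example, a tcpdump on the edge device and one on the PBX), and the function name and return labels are made up for illustration. The last branch, which the reply doesn’t spell out, is the case where the INVITE does reach the PBX.

```python
def locate_fault(seen_at_edge: bool, seen_at_pbx: bool) -> str:
    """Hypothetical helper: map where an INVITE was captured to the likely fault domain."""
    if not seen_at_edge:
        # Never reached our network edge: provider side, or a channel limit on the trunk.
        return "provider"
    if not seen_at_pbx:
        # Reached the edge but not the PBX: local routing or firewall.
        return "routing"
    # Reached the PBX: look at how the PBX itself handles (or ignores) the request.
    return "pbx"


if __name__ == "__main__":
    print(locate_fault(seen_at_edge=True, seen_at_pbx=True))  # pbx
```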
To my knowledge, Asterisk never replies with a “fast busy” unless you force it to, because you would have to successfully accept the call and then provide appropriate ‘early media’ to synthesize that ‘fast busy’ tone.
A “120” busy (the fast busy, named for its 120 interruptions per minute) is the indication from an intervening switch that the call path to your destination switch is overloaded and that the alternate route to the server is also unavailable. Normally, SIP servers don’t produce these; this sounds like a telco switch not having a route to that server.
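The “120” describes the tone’s cadence, not a SIP code. A minimal sketch of the arithmetic, assuming cadences of 0.5 s on / 0.5 s off for a normal busy and 0.25 s on / 0.25 s off for a fast busy (actual cadences vary by country and carrier):

```python
def interruptions_per_minute(on_s: float, off_s: float) -> float:
    """Tone bursts per minute for a cadence of on_s seconds on, off_s seconds off."""
    return 60.0 / (on_s + off_s)


if __name__ == "__main__":
    print(interruptions_per_minute(0.5, 0.5))    # normal busy: 60.0
    print(interruptions_per_minute(0.25, 0.25))  # fast busy: 120.0
```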
Check with your ITSP and see if there are issues in the PSTN side of their operations. Having the original caller’s phone number would be a big help, although troubleshooting an intermittent problem like this is always a PITA.
I ran a continuous tcpdump on the server for the past 18 hours or so and was also able to see failed calls (i.e. calls that don’t get answered by FreePBX) on the SIP provider side throughout the day. The crazy thing is that I see SIP INVITE packets hitting the server for every single one of the failed calls.
The question now is, if the SIP INVITE packets are hitting the server, why is FreePBX/Asterisk not answering/responding? Most other calls are accepted with no problem: there is a “100 Trying” and then a “200 OK”. For the calls that are failing, I see our SIP provider send the INVITE 6 to 7 times, and there is no “100 Trying” or “200 OK” response from our server.
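Six or seven INVITEs per failed call is consistent with standard SIP retransmission over UDP. Under RFC 3261, an INVITE client transaction retransmits when Timer A fires; Timer A starts at T1 (500 ms by default) and doubles after each retransmission, until Timer B (64*T1, i.e. 32 s) expires. Receiving any provisional response, such as “100 Trying”, stops the retransmissions. A minimal sketch of that schedule, assuming the default T1 and UDP transport (the provider’s actual timers may differ):

```python
def invite_send_times(t1: float = 0.5) -> list[float]:
    """Seconds at which a UAC sends an INVITE over UDP if no response ever arrives.

    Models RFC 3261 Timer A (starts at T1, doubles each time) bounded by
    Timer B (64 * T1), after which the client transaction times out.
    """
    timer_b = 64 * t1
    times = []
    t, interval = 0.0, t1
    while t < timer_b:
        times.append(t)   # original INVITE at t=0, then each retransmission
        t += interval
        interval *= 2     # Timer A doubles after every retransmission
    return times


if __name__ == "__main__":
    sends = invite_send_times()
    print(sends)       # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
    print(len(sends))  # 7
```

That is the original INVITE plus six retransmissions within 32 s, which lines up with what the capture shows when the server never sends even a “100 Trying”.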
One thing to note that may or may not matter - we are using the pjsip driver for this.
I suggest you install sngrep; you will see a very concise view of the transactions in each call and why they are ultimately failing.
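For unattended monitoring over a long capture, the same per-call grouping can also be scripted. This is a hypothetical sketch: it assumes the capture has already been reduced to (Call-ID, source IP, destination IP, SIP start line) records by a separate parsing step, and it flags calls whose INVITEs reached the PBX without the PBX sending any SIP response. The function name and the IP addresses (taken from documentation ranges) are placeholders.

```python
from collections import defaultdict


def unanswered_invites(messages, pbx_ip):
    """Return {Call-ID: INVITE count} for calls the PBX never responded to.

    messages: iterable of (call_id, src_ip, dst_ip, start_line) tuples.
    """
    invites = defaultdict(int)  # Call-ID -> INVITEs received by the PBX
    responded = set()           # Call-IDs where the PBX sent any SIP response
    for call_id, src, dst, start_line in messages:
        if dst == pbx_ip and start_line.startswith("INVITE "):
            invites[call_id] += 1
        elif src == pbx_ip and start_line.startswith("SIP/2.0 "):
            responded.add(call_id)
    return {cid: n for cid, n in invites.items() if cid not in responded}


if __name__ == "__main__":
    pbx = "203.0.113.10"
    provider = "198.51.100.20"
    capture = [
        ("call-ok", provider, pbx, "INVITE sip:+15555550100@203.0.113.10 SIP/2.0"),
        ("call-ok", pbx, provider, "SIP/2.0 100 Trying"),
        ("call-ok", pbx, provider, "SIP/2.0 200 OK"),
    ] + [("call-lost", provider, pbx, "INVITE sip:+15555550101@203.0.113.10 SIP/2.0")] * 7
    print(unanswered_invites(capture, pbx))  # {'call-lost': 7}
```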
Going to try sngrep tonight.
In the meantime, I may have found something that could have been causing the issue, but I need confirmation. In our pcap, I saw many 404 responses to our system’s attempts to register with our SIP provider. Our SIP provider makes a direct IP connection to our server when a call comes in, i.e. no registration is needed in either direction. However, registration was set to the default “Send” option. I changed it to “None”.
We have the Responsive Firewall enabled, although not on the pjsip driver - only on chan_sip since that’s what we’re using for remote extensions (the pjsip port is protected by our cloud host’s firewall since we’ve set up an ACL for the IP addresses of our SIP provider). I also saw that pjsip was set to “Local” in the Core Services list. I toggled “Internet” on when I saw this.
As of now, since these two changes, I’ve seen no failed calls at all. I listed both changes I made for transparency, but I’m assuming it has something to do with enabling the Internet zone for pjsip in the Responsive Firewall. Is this a correct assumption? Or, at the very least, is there a way for me to see what the Responsive Firewall is blocking, so that I can switch back to Local only and monitor whether this really was the cause?
So far still so good.
Is there a Responsive Firewall log that we can monitor?