An apology for yesterday's outage

Please let me first apologize for the impact that yesterday’s outage of Sangoma infrastructure had on your business. We understand the trust you have placed in us to perform reliably, and we did not live up to those expectations yesterday. For that we are truly sorry, and we commit to using this event as a learning experience to improve our service to you.

In summary, yesterday we suffered a major outage of the internal database cluster that powers many of the public-facing services we provide. The outage affected the Sangoma Portal, Support Ticketing systems, FreePBX and PBXact updates, and many other business transactions with Sangoma. While many of Sangoma’s Cloud Services customers were unaffected, the event did create a service disruption for some SIPStation customers, as well as an outage of our FAXStation, VPN and SMS services.

As to what specifically happened: at approximately 2:08PM CST yesterday, one of our production database clusters crashed and reported data corruption as a result of the crash. Consequently, we were unable to simply restore it to a working state. Our internal infrastructure teams worked through the night to restore the data, redeploy the database (in a manner that would not be susceptible to another similar problem), and return the services to normal operation. All services were fully restored by 6AM CST today.

In addition to the outage, there was a lack of effective communication and support for our customers and partners during the event. While there was little definitive information we could provide during the early portion of the outage, we should have done a better job of keeping you informed about its scope and our progress. Some of you have expressed your displeasure with our lack of communication; we heard you, and we commit to doing better in the future. As we continue our investigation into the root cause of this event, we will provide more details over the next few days, along with a plan for corrective actions.

We know that you depend on Sangoma products and services to run your business, and our goal is to exceed your expectations. Yesterday, we fell short of that goal, and for that we apologize. We take the availability and reliability of our services and infrastructure very seriously, and will take steps to prevent this in the future. Thank you for your patience, understanding and continued support.

Just two words here “High Availability”, we all think it’s a good idea :wink:

Just as a quick follow-up, we had a short unintended network outage this afternoon related to the IT team’s work to put additional redundancies in place to avoid this type of issue in the future. The main items impacted by this network outage were the Sangoma Portal, the SIPStation store, and the FAXStation service. It should not have impacted FreePBX mirror servers or SIPStation SIP service. The team quickly identified the network issue, and full service was restored a few minutes later. The disruption was also communicated on https://status.sangoma.com.

Please accept our apologies for the disruptions, and know that we are trying to learn from these issues so that they are less likely to happen in the future.
