Outage Summary - 4th/5th January UTC


#1

KoalaSafe experienced a significant outage in its DNS system from 18:30 UTC on 04/01/2018 to 03:30 UTC on 05/01/2018.

This issue was caused by an unexpected flood of requests. We are still analysing whether it was a normal spike from school holidays or the work of a bad actor.

It was related to last November's outage. We thought we had fixed the underlying issue with how our system scales under these circumstances. We have now identified the issue and fixed and deployed the changes, which eventually rectified the outage. (Last time, we did not deploy these changes, so that we could rectify the outage as quickly as possible.) We will continue to monitor the changes to ensure there are no side effects.

The outage was confirmed in this thread however it seems it was missed by numerous customers and did not convey the severity of the issue, so we will create a separate announcement in future to alleviate this confusion.

We understand the significant impact that outages like this have on families, especially during school holidays, and we apologise for the disruption. We do perform extensive testing on the product, and the stability of the system is our highest priority.

Adam and The KoalaSafe Team


#2

Sorry, Adam, I don’t mean to be rude, but there’s something disingenuous here.

You write in your announcement, “The outage was confirmed in this thread however it seems it was missed by numerous customers and did not convey the severity of the issue, so we will create a separate announcement in future to alleviate this confusion.”

In the linked thread, you wrote, and I’m quoting your post in its entirety: “We are having a slow down on DNS for some clients. Looking into it.”

That is NOT confirming an outage, it’s merely acknowledging that some customers have reported a problem. Acknowledging an outage means writing a post that says something like, “We are down right now, we are working diligently on a solution and we don’t have a timetable for restoration of service. We will keep you apprised of the situation as we are able.”

I encourage you to evaluate your procedures for handling communications with your customers. Much of the frustration and a great many messages on the discussion boards could have been averted with a simple message like the example in the previous paragraph.

Thank you.


#3

I think you have misunderstood the incredible disruption to our lives that this outage has caused. We had no filtered internet for 5 hours! That is 300 minutes without DanTDM, BubbleGuppies, and various YouTube videos demonstrating the latest Minecraft techniques. Now I know how people in third world countries feel. Sure, they also have to deal with no food and limited fresh water, but I'm sure that, after they get done trying to eat enough bugs to survive another day, they will weep for us all. I don't know how the rest of you made it through this dark time, but we were forced to interact with our children (gasp). We had to eat dinner together, we watched the science channel together for a while, and we even played a couple of non-internet-based board games. It was a struggle, but we all pulled through. I hope the problem has been fixed and I never have to repeat this day.


#4

I have to agree with Mark; however, I did spend over an hour trying to troubleshoot, thinking the problem was unique to me. It would be great if, in the future, an email could go out or some kind of announcement could appear in the app, so I would not waste my time trying to troubleshoot.


#6