via The Cobbler’s Children Have No Shoes Dept….
I’ve been meaning to post this autopsy for over a week, but I’ve been down with a chest cold, I’ve been busy, and it’s just plain embarrassing.
On Saturday November 6th, 2014, the easypress.ca domain name expired at the registry, vaporizing most of the easyPress managed WordPress hosting service for a few hours. (easyPress hostnames are typically aliases to *.wp.easypress.ca, which is how you get that really nifty datacenter failover. So when easypress.ca went off the air, so did anything aliased to it.)
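To make the alias dependency concrete, here is a minimal sketch of how you might list which hostnames go dark when a given zone disappears. The alias table and the customer hostnames in it are hypothetical, made up for illustration; a real check would pull CNAME targets from live DNS data rather than a hard-coded dictionary.

```python
# Hypothetical alias table: customer hostname -> CNAME target.
# None of these hostnames come from the real easyPress setup.
ALIASES = {
    "www.example-shop.com": "example-shop.wp.easypress.ca",
    "blog.example.org": "example-blog.wp.easypress.ca",
    "www.unrelated.net": "host.elsewhere.example",
}


def depends_on(target: str, zone: str) -> bool:
    """Return True if a CNAME target sits at or below the given zone."""
    target = target.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    # Match on a label boundary so "noteasypress.ca" does not count.
    return target == zone or target.endswith("." + zone)


def affected_hostnames(aliases: dict, zone: str) -> list:
    """List every hostname whose alias target lives under the zone."""
    return sorted(h for h, t in aliases.items() if depends_on(t, zone))


if __name__ == "__main__":
    for host in affected_hostnames(ALIASES, "easypress.ca"):
        print(host)
```

Anything this returns for a zone is only as available as that zone's registration.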
Once we discovered the issue, we remedied it (i.e. we hastily renewed the domain name), but we missed the cutoff for CIRA’s rootzone updates and had to wait for the next update an hour later. After that, it took another 40 minutes for CIRA to propagate the zone update out to all the .CA roots.
In the meantime, once we realized we’d missed the cutoff, we renamed all of the hostnames for which we were the DNS provider onto the easy.press domain. (This does illustrate another advantage of having easyDNS do the DNS for your easyPress installs: we’re in a better position to fix things when network events do occur.)
So What Happened?
As you may know, when you have an enterprise level domain with easyDNS, your registry fees (a.k.a. domain renewals) are free, bundled with the service. We have easypress.ca loaded in our own system at the enterprise level of service. But we don’t actually “pay ourselves” when easypress.ca services come due (job perk); we bypass payment processing under a nebulous process umbrella called “internal”. It turned out that we also bypassed the part where the system auto-renews .ca domains. This hadn’t been an issue before because the renewal processing involved is (again) peculiar to the way .ca domains work; we have a different auto-renew mechanism enabled on our .com’s, etc.
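This failure mode is easy to reproduce in miniature. The sketch below is not our actual billing code; the `Domain` class, the `internal` flag and both function names are hypothetical. It only shows how a renewal step that lives inside the payment path gets skipped along with the payment, and what it looks like once the two are decoupled.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Domain:
    name: str
    expires: date
    internal: bool  # hypothetical flag: True means billing is bypassed


def renew(domain: Domain) -> None:
    """Push the expiry out by one year (simplified to 365 days)."""
    domain.expires = domain.expires + timedelta(days=365)


def process_coupled(domain: Domain) -> bool:
    """Renewal sits behind billing, so skipping billing skips renewal."""
    if domain.internal:
        return False  # bails out before the renewal ever runs
    # (charging the customer would happen here)
    renew(domain)
    return True


def process_decoupled(domain: Domain) -> bool:
    """Billing is optional; renewal always runs."""
    if not domain.internal:
        pass  # (charging the customer would happen here)
    renew(domain)
    return True


if __name__ == "__main__":
    d = Domain("easypress.ca", date(2014, 11, 6), internal=True)
    print(process_coupled(d), d.expires)
    print(process_decoupled(d), d.expires)
```

In the coupled version the internal domain comes back with its expiry date untouched and nothing complains, which is roughly how this kind of thing goes unnoticed until the registry notices for you.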
What Happens Now
Let’s be clear: this was our screw up, not CIRA’s, but a clear takeaway came out of it.
The realization was that, in general terms, we can’t put mission-critical stuff on .ca domains until the .ca rootzone goes to realtime updates, and CIRA has no plans to do so. This has been a long-standing complaint of ours, especially because, being Canadians, we’d like our nameserver-level failover product, Proactive Nameservers, to be able to work with .ca domains, but it can’t.
It means that one of the key capabilities we rely on to ensure 100% DNS availability, and thus system uptime, isn’t available to us on .ca domains: the ability to change nameservers in realtime.
During the incident we renamed everything onto our newly acquired easy.press domain, and we are looking at leaving it that way going forward, possibly rebranding entirely onto the easy.press domain. We’ve confirmed with the Radix registry that the .press rootzone does indeed update in realtime (as do most others), so our confidence level is higher knowing we can employ all of the methods in our toolkit.
But at the end of the day, this is a big #fail for us. As the DNS company, and speaking as the guy who is literally writing the book on how not to have stuff like this happen to you, it’s embarrassing.
Our sincerest apologies to all easyPress customers who were affected by this.