On Saturday, July 9, between 5:30pm and 6:30pm EDT (21:30 – 22:30 UTC), we had a spike in users reporting response issues on some domains. Service was degraded for the better part of an hour for some domains or (more likely) in some parts of the world. As much as we hate to use the “regional outages” line, that is what it looks like.
Most of the affected domains were either on the legacy system using the legacy nameserver delegation, or on the new system but had not yet switched their nameserver delegation to add the additional anycast constellation on the new platform.
I say “most” because we have one report from a user who is fully on the new system and still experienced issues.
We have had no reports from our Enterprise DNS users (if you are one and you had issues, please do let us know).
We initially believed this to be a general network issue, because we saw what looked like a corresponding spike in “* is down” reports on Twitter and elsewhere for unrelated domains around the same time. But as more data comes in, we are going to suck it up and say it: something weird happened, and we think it was here.
Our response:
1) We are still gathering data, and we have identified ways to enhance our internal monitoring so that we can cross-reference what we see with what external monitoring applications are seeing.
2) We recently made a change in the way we adjust our BGP announcements for individual members of the anycast constellations to optimize response times. Since this has never happened before, and we enacted this change recently, we are suspicious that this may be at the core of the problem and we have rolled that back.
3) We are late with our announcement here, and for that I personally apologize. When the event occurred, many members of the systems group conferred about the incident on Saturday night. We suspected a wider network issue, but we still decided to roll back the new BGP programs just to be on the safe side. It was only after we saw more reports come in from users today that we had to rethink our stance and accept that this was most likely our problem and not a network flap or outage.
We sincerely apologize to everyone who was affected, and for the delay in this posting.
Olivier Dagenais says
Hi,
I recently switched to the new system, but I don’t know if I added the additional anycast constellation. Can you let us know (1) how we tell if we have it and (2) how to add it if we don’t?
Thanks!
– Oli
Mark Jeftovic says
Take a look at your nameserver delegation either in the “nameservers” section of your Domain Overview (under the “Domain Info” tab) or look at your whois record.
If you see something like this:
ns1.easydns.com
ns2.easydns.com
ns3.easydns.org
ns6.easydns.net
remote1.easydns.com
remote2.easydns.com
you’re still using the legacy nameservers.
If we are your registrar, you can go into that aforementioned nameservers link and just click on the “Use easyDNS Nameservers” link, and it will load the form with the relevant nameservers.
You can see which nameservers apply to which level of service here:
https://web.easydns.com/our_nameservers.php
Rule of thumb:
All of the nameservers under the new system are named “dns1”, “dns2”, “dns3” and “dns4”.
Those on the legacy system are named “ns1”, “ns2”, “remote1”, etc.
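If you would rather check a list of nameservers programmatically, the rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not an official tool: the function names and the exact patterns are assumptions made for this example, and the authoritative list is still the nameservers page linked above.

```python
import re

# Patterns assumed from the rule of thumb above, not an official list:
# new-system hosts are named dns1..dns4; legacy hosts are named nsN or remoteN.
NEW_PATTERN = re.compile(r"^dns[1-4]\.")
LEGACY_PATTERN = re.compile(r"^(ns|remote)\d+\.")


def classify_nameserver(hostname):
    """Return 'new', 'legacy' or 'unknown' for a single nameserver hostname."""
    host = hostname.strip().lower()
    if NEW_PATTERN.match(host):
        return "new"
    if LEGACY_PATTERN.match(host):
        return "legacy"
    return "unknown"


def classify_delegation(nameservers):
    """Summarize a whole delegation as 'new', 'legacy', 'mixed' or 'unknown'."""
    kinds = {classify_nameserver(ns) for ns in nameservers}
    kinds.discard("unknown")  # ignore hosts that match neither pattern
    if kinds == {"new"}:
        return "new"
    if kinds == {"legacy"}:
        return "legacy"
    if kinds == {"new", "legacy"}:
        return "mixed"
    return "unknown"
```

Paste in the nameservers from your Domain Overview or whois record. The legacy list shown earlier in this thread comes back as "legacy", while a delegation made up only of dns1 to dns4 hosts comes back as "new".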