Comics mentioned that last night we would get more rest, exactly one second. Hardly worth noticing though, if it weren’t for several computer systems which seemed to have issues with suddenly finding themselves one second out of whack.
The reason is that (in UTC) yesterday was one second longer than usual because of an inserted leap second. There’s something positive about leap seconds, as Wikipedia notes: “Between their adoption in 1972 and June 2012, 25 leap seconds have been scheduled, all positive.”
At the time of this writing (the day after) I cannot reach any of the sites on apple.com. Pure coincidence? Or a Leap Second Attack?
One would think that UNIX-based systems are immune to timing problems because internally Unix timestamps are just the number of seconds since January 1st, 1970 00:00 UTC. As Cédric Luthi pointed out to me, Mac OS X and iOS didn’t even exist in the year 2000, and so they didn’t have to face this problem. They represent dates relative to January 1st, 2001. You can see this if you look closely at NSDate’s methods …sinceReferenceDate.
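To put numbers on the two epochs, here is a minimal Python sketch. The function names are made up for illustration and nothing here is Apple's code. It computes the offset between the Unix epoch and the 2001 reference date, and it shows why a leap second is awkward for Unix-style time: the timestamp arithmetic treats every day as exactly 86,400 seconds, so the inserted second 23:59:60 has no slot of its own.

```python
import calendar
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)  # NSDate's reference date


def reference_offset_seconds():
    """Seconds between the Unix epoch and the 2001 reference date."""
    return int((REFERENCE_DATE - UNIX_EPOCH).total_seconds())


def unix_seconds_across_leap():
    """Unix-timestamp distance from 23:59:59 on June 30, 2012 to midnight."""
    before = calendar.timegm((2012, 6, 30, 23, 59, 59))
    after = calendar.timegm((2012, 7, 1, 0, 0, 0))
    # Two real seconds elapsed (23:59:59 and 23:59:60), but the
    # timestamp arithmetic only accounts for one of them.
    return after - before


def can_represent_leap_second():
    """Python's datetime rejects second=60, like many date libraries."""
    try:
        datetime(2012, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
        return True
    except ValueError:
        return False
```

Because the timestamp cannot count the extra second, the system clock has to be stepped or slewed around it, and that adjustment is where software can trip.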
I sat down this morning with breakfast and the firm intention to watch a WWDC 2012 video. I hadn’t downloaded all the episodes because you can stream them from iTunes or via the website just fine. Usually. But not today.
So – as every engineer would – I started to eliminate variables to see what could be the cause of this. Google, still there. Amazon, ok. Then I remembered one rule that would fix most of the Windows problems I had in the past and very rarely also some OS X problems: when in doubt, reboot.
So reboot I did. Didn’t help.
Next up, maybe my shady ADSL connection could be the culprit. So I logged into my router and cut the connection. But restoring it still didn’t help. Maybe a problem of the DNS system?
lionking:~ oliver$ nslookup developer.apple.com
Server: 192.168.1.254
Address: 192.168.1.254#53

** server can't find developer.apple.com: NXDOMAIN
Ah, we’re getting somewhere. An nslookup should always return a valid IP address when queried like that. I am using the default name servers that my ISP provides, so next I tried forcing some public alternative DNS servers to see if they have better success in resolving the name. OpenDNS and Google DNS. But still no joy.
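The same check can be scripted. The following is a rough Python sketch, not what I actually ran: `lookup` is a hypothetical helper that asks the system resolver for a name and returns an empty list when resolution fails, which is roughly the situation nslookup reported as NXDOMAIN. The `resolver` parameter is an assumption added so the function can be exercised without a network.

```python
import socket


def lookup(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for hostname, or [] if it cannot be resolved."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Raised when the name does not resolve (for example NXDOMAIN).
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in results})
```

Calling `lookup("developer.apple.com")` on a healthy day should return at least one address. Note that this uses whatever name servers the operating system is configured with, so trying an alternative DNS provider still means changing the system or router settings.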
Some people reported that when they had problems connecting, a switch to Google DNS had fixed it for them. But this might have only been the case because name servers differ in how long they keep a cached IP address. If Google had a longer timeout than your ISP, then there would have been a grace period during which Google could still resolve the name.
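The grace-period theory can be illustrated with a toy simulation in Python. Everything here is hypothetical: the class, the host name, the documentation-range address 192.0.2.10 and the two lifetimes. It is also a simplification, since in real DNS the time-to-live comes from the record itself and resolvers mostly differ in when they last fetched it.

```python
class CachingResolver:
    """Toy resolver that remembers answers for a fixed number of seconds."""

    def __init__(self, ttl_seconds, upstream):
        self.ttl = ttl_seconds
        self.upstream = upstream  # callable: name -> address or None
        self.cache = {}           # name -> (address, expires_at)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]       # still fresh, answer from the cache
        address = self.upstream(name)
        if address is None:
            self.cache.pop(name, None)
        else:
            self.cache[name] = (address, now + self.ttl)
        return address


def simulate():
    """Both resolvers cache the name at t=0; the origin fails before t=120."""
    authoritative = {"developer.example": "192.0.2.10"}
    isp = CachingResolver(ttl_seconds=60, upstream=authoritative.get)
    public = CachingResolver(ttl_seconds=3600, upstream=authoritative.get)

    isp.resolve("developer.example", now=0)
    public.resolve("developer.example", now=0)

    del authoritative["developer.example"]  # the outage begins

    return (isp.resolve("developer.example", now=120),
            public.resolve("developer.example", now=120))
```

In this sketch `simulate()` returns `(None, "192.0.2.10")`: the short-lived cache has already expired and fails, while the long-lived one keeps answering until its own entry runs out.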
Another theory I was pursuing was that Apple had become the victim of a distributed denial of service attack. This fit with an observation by Rene Schätzl, who reported that he was getting timeouts on 3 out of 6 Apple name servers. DDoS attacks often work by overwhelming certain parts of a victim’s network infrastructure by bombarding it with millions of requests from all the participating nodes.
Parts of Apple’s web sites are hosted on Akamai, which distributes copies all around the globe to share the load and speed up delivery. This might explain why apple.com was not affected, because most likely this is one of these distributed sites. The developer portal, however, might not be, and thus became a victim of the denial of service, distributed, accidental or otherwise.
Then I turned to Twitter asking my followers if anybody else was experiencing the same issues. And – to my great relief – others were seeing the same problem. This meant that the problem was not localized on my hardware or my own network, but apparently on a grander scale.
Bron Gondwana had asked on Serverfault.com: “Anyone else experiencing high rates of Linux server crashes during a leap second day?”. When I first saw this question tweeted I thought this to be a joke. I assumed that by now all server software vendors would have made their operating systems immune to problems related to the system time.
But apparently not. The question revolves around a bug related to the Network Time Protocol (NTP), and the existing workaround is to disable the NTP daemon (host process) and run several time-fixing scripts. So that is no joke but actually reality, however hard it is to wrap your mind around that.
But the problem apparently was not limited to name resolution.
Ben Chatelain still knew a previous IP address of developer.apple.com, which he pinged, but unsuccessfully.
lionking:~ oliver$ ping 188.8.131.52
PING 184.108.40.206 (220.127.116.11): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
^C
--- 18.104.22.168 ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
Not all of Apple’s servers are hosted equally, however. I could still get into iTunes Connect, though with all the images missing.
Closer inspection of the image URLs showed that Apple is hosting them on a dedicated static asset server at itc.mzstatic.com.
Oh, well. You shouldn’t work on Sunday anyway …
Quick, somebody check at his closest open Apple Store if Apple is still in business. Do we still have a job on Monday?
— Cocoanetics (@Cocoanetics) July 1, 2012
Let me know on Twitter or the comments of this post if you have any new information.
Update: I pinged some of the guys at Apple who I know to be as addicted to Twitter as I am and Michael Jurewitz let us know that the team is on top of it.
— Michael Jurewitz (@Jury) July 1, 2012
Update: 2 hours after this the problem had been resolved. Just in time to permit several overworked iOS developers to avoid an otherwise lazy Sunday.