Our DNA is written in Swift

Leap Second Attack

Comics mentioned that last night we would get more rest, exactly one second. Hardly worth noticing though, if it weren’t for several computer systems which seemed to have issues with suddenly finding themselves one second out of whack.

The reason is that (in UTC) yesterday was one second longer than usual because of an inserted leap second. There’s something positive about leap seconds, as Wikipedia notes: “Between their adoption in 1972 and June 2012, 25 leap seconds have been scheduled, all positive.”
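To make the effect concrete, here is a minimal Python sketch of my own, not taken from any of the systems that had trouble. It assumes the POSIX convention that every day has exactly 86400 seconds, which is why the inserted second 23:59:60 gets no timestamp of its own.

```python
import calendar
from datetime import datetime

# Unix timestamps around the leap second of June 30th, 2012 (UTC).
before_leap = calendar.timegm((2012, 6, 30, 23, 59, 59))
leap_second = calendar.timegm((2012, 6, 30, 23, 59, 60))  # the inserted second
after_leap = calendar.timegm((2012, 7, 1, 0, 0, 0))

# POSIX time counts 86400 seconds per day, so the inserted second
# shares its timestamp with the following midnight.
print(after_leap - before_leap)   # 1, although two real seconds elapsed
print(leap_second == after_leap)  # True

# Python's datetime type cannot represent the leap second at all.
try:
    datetime(2012, 6, 30, 23, 59, 60)
    representable = True
except ValueError:
    representable = False
print(representable)  # False
```

Software that assumes the clock never repeats or stalls is the kind that can trip over a night like this.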

At the time of this writing (the day after) I cannot reach any of the sites on apple.com. Pure coincidence? Or a Leap Second Attack?

One should think that UNIX-based systems are immune to timing problems because internally Unix timestamps are just the number of seconds since January 1st, 1970 00:00 UTC. As Cédric Luthi pointed out to me, Mac OS X and iOS didn’t even exist in the year 2000, so they never had to face the year-2000 problem. They represent dates relative to January 1st, 2001. You can see this if you look closely at NSDate’s …sinceReferenceDate methods.
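The two reference points differ by a fixed number of seconds. The following Python sketch computes that offset and converts between the two scales. The helper names `unix_to_reference` and `reference_to_unix` are made up for illustration and are not Apple API.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
APPLE_REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the two reference points (leap seconds are not counted).
OFFSET = int((APPLE_REFERENCE_DATE - UNIX_EPOCH).total_seconds())


def unix_to_reference(unix_seconds):
    """Hypothetical helper: Unix timestamp -> seconds since Jan 1st, 2001."""
    return unix_seconds - OFFSET


def reference_to_unix(reference_seconds):
    """Hypothetical helper: seconds since Jan 1st, 2001 -> Unix timestamp."""
    return reference_seconds + OFFSET


print(OFFSET)  # 978307200
```

Both scales tick in the same units, so a conversion is nothing more than adding or subtracting this constant.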

I sat down this morning with breakfast and the firm intention to watch a WWDC 2012 video. I hadn’t downloaded all the episodes because you can stream them from iTunes or via the website just fine. Usually. But not today.

So – as every engineer would – I started to eliminate variables to see what could be the cause of this. Google, still there. Amazon, ok. Then I remembered one rule that would fix most of the Windows problems I had in the past and very rarely also some OS X problems: when in doubt, reboot.

So reboot I did. Didn’t help.

Next up, maybe my shady ADSL connection could be the culprit. So I logged into my router and cut the connection. But restoring it still didn’t help. Maybe a DNS problem?

lionking:~ oliver$ nslookup developer.apple.com
Server:		192.168.1.254
Address:	192.168.1.254#53

** server can't find developer.apple.com: NXDOMAIN

Ah, we’re getting somewhere. An nslookup should always return a valid IP address when queried like that. I am using the default name servers that my ISP provides, so next I tried forcing some public alternative DNS servers, OpenDNS and Google DNS, to see if they had better success in resolving the name. But still no joy.
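If you would rather script this check than type nslookup by hand, a rough Python equivalent looks like the sketch below. The function name `resolve` is my own. Note that it asks whatever resolver the operating system is configured to use; picking a specific name server the way nslookup can would need a third-party DNS library, which I am leaving out here.

```python
import socket


def resolve(hostname):
    """Return the sorted, unique IP addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example NXDOMAIN).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    addresses = resolve("developer.apple.com")
    print(addresses if addresses else "name could not be resolved")
```

An empty result corresponds to what nslookup told me above: the resolver had no answer for the name.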

Some people reported that when they had problems connecting, a switch to Google DNS fixed it for them. But this might only have been the case because name servers cache resolved addresses for different lengths of time. If Google had a longer timeout than your ISP, there would have been a grace period during which Google could still resolve the name.

Another theory I was pursuing was that Apple had become the victim of a distributed denial of service attack. This fit with the observation of Rene Schätzl, who reported that he was getting timeouts on 3 out of 6 Apple name servers. DDoS attacks often work by overwhelming certain parts of a victim’s network infrastructure, bombarding it with millions of requests from all the attacking nodes.

Parts of Apple’s web sites are hosted on Akamai, which distributes copies all around the globe to share the load and speed up delivery. This might explain why apple.com was not affected, because most likely this is one of these distributed sites. The developer portal, however, might not be, and thus became a victim of the denial of service, distributed, accidental or otherwise.

Then I turned to Twitter asking my followers if anybody else was experiencing the same issues. And – to my great relief – others were seeing the same problem. This meant that the problem was not localized on my hardware or my own network, but apparently on a grander scale.

Bron Gondwana had asked on Serverfault.com: “Anyone else experiencing high rates of Linux server crashes during a leap second day?”. When I first saw this question tweeted I thought this to be a joke. I assumed that by now all server software vendors would have made their operating systems immune to problems related to the system time.

But apparently not. The question revolves around a bug related to the Network Time Protocol (NTP), and the existing workaround involves disabling the NTP daemon (host process) and running several time-fixing scripts. So that is no joke but actually reality, however hard it is to wrap your mind around that.

But the problem apparently was not limited to name resolution.

Ben Chatelain still knew a previous IP address of developer.apple.com, which he pinged, but unsuccessfully.

lionking:~ oliver$ ping 17.254.2.129
PING 17.254.2.129 (17.254.2.129): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
^C
--- 17.254.2.129 ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss

Not all of Apple’s servers are hosted equally, however. I could still get into iTunes Connect, though with all the images missing.

Closer inspection of the image URLs showed that Apple is hosting them on a dedicated static asset server at itc.mzstatic.com.

Oh, well. You shouldn’t work on Sunday anyway …

Let me know on Twitter or the comments of this post if you have any new information.

Update: I pinged some of the guys at Apple who I know to be as addicted to Twitter as I am and Michael Jurewitz let us know that the team is on top of it.

Update: 2 hours after this the problem had been resolved. Just in time to permit several overworked iOS developers to avoid an otherwise lazy Sunday.


Categories: Apple

2 Comments »

  1. Two items to add to your post…
    1) MobileMe takes its last breath in any sec. (end-of-life June 30, 2012) &
    2) East coast US had a major wind storm/ power outage last night
    (millions of people without power still… worst non-hurricane power outage; compounded with 50-year old record breaking heat putting additional strain on system, due to air conditioning loads)

    References with some more details:

    MobileMe Shutting Down Services – last day June 30, 2012
    http://support.apple.com/kb/HT4597
    – expect future updates to OSX, iOS, iWork Apps, & perhaps iCloud would remove integrated references to MobileMe

    Amazon Virginia Data taken out by hurricane force wind storm (no specifics on other data centers affected)
    http://www.datacenterknowledge.com/archives/2012/06/30/amazon-data-center-loses-power-during-storm/

    East Coast Storm – Major Power Outages & triple-digit heat
    http://in.reuters.com/article/2012/07/01/usa-weather-storm-idINL2E8I100620120701
    Map of power-strained area
    http://www.globalpost.com/dispatch/news/regions/americas/united-states/120630/midwest-thunderstorms-power-outages

    –Brian

  2. Next time before rebooting or examining your network infrastructure you might want to check on http://www.downforeveryoneorjustme.com/ first 🙂

    It’s very useful, I use it every time I run into such a problem.