Parsing an RSS pubDate

Andreas Heller asks:

When parsing an RSS feed I get a date from pubDate, but as NSString and not NSDate. How do I get a date that I can use for sorting?

That’s a problem you face quite often when dealing with dates encoded in XML, be it an RSS feed or any other XML-based file format that you fetch via HTTP GET. Unlike some other languages, where a properly formatted date can be parsed automatically, in Cocoa we have to do this ourselves.

Fortunately there is the NSDateFormatter class which can do it both ways: from date to nicely-formatted string as well as the other way around.

Let’s do it like a Unix pro and get ourselves some test data by grabbing some pubDates from my RSS feed. In Terminal, type:

curl www.drobnik.com/touch/feed/ | grep pubDate

This gets us the pubDates of the 10 latest articles on my blog. We see that WordPress encodes the pubDates in the format “Mon, 03 May 2010 18:54:26 +0000”, which I concede is not very easy to parse. According to the RSS 2.0 spec this is supposed to follow the RFC 822 standard. If anyone asked me, personally I think that the inventor of this date format should be poisoned, then hanged and maybe shot for good measure. Who in his right mind would create a date representation that does not allow for string sorting?

But in this case we are stuck with it, so let’s make the best of it. Fortunately NSDateFormatter can be taught to recognize this weird format. Apple uses the format specifiers defined in Unicode UTR35, which is a handy document to bookmark for reference.

NSString *dateString = @"Mon, 03 May 2010 18:54:26 +0000";
 
NSDateFormatter *df = [[[NSDateFormatter alloc] init] autorelease];
[df setDateFormat:@"EEE, dd MMM yyyy HH:mm:ss Z"]; // MMM = abbreviated month name, as RFC 822 uses
 
// set a fixed English locale so that weekday and month names parse regardless of the user's settings
NSLocale *enLocale = [[[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"] autorelease];
[df setLocale:enLocale];
 
NSDate *date = [df dateFromString:dateString];
 
NSLog(@"'%@' = %@", dateString, date);

UPDATE 3 days after I wrote this article: I woke up early, cold sweat on my forehead, remembering that the above example would not work if your system locale was set to something other than English. NSDateFormatter only recognizes the weekday and month names of the locale it is set to. So I added the two lines setting an English locale. This underlines even more the stupidity of using RFC 822 for pubDate.

This format should cover the vast majority of blogs. Note that the comma after the EEE is necessary and that the pubDate must have all the parts specified, or else dateFromString will return nil. If you find alternative date formats, you could try different date formats in turn until one returns a date.
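The idea of trying several formats in turn could be sketched roughly like this. The helper name DateFromFeedString and the list of candidate formats are made up for illustration, and the sketch assumes ARC (unlike the listings in this article, which use manual reference counting), so treat it as a starting point rather than a finished parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tries several date formats in turn and returns the
// first date that parses, or nil if none of the formats match.
// Assumes ARC; add autorelease calls for manual reference counting.
static NSDate *DateFromFeedString(NSString *string)
{
    // Illustrative candidates; extend this list with whatever your feeds use.
    NSArray *formats = [NSArray arrayWithObjects:
                        @"EEE, dd MMM yyyy HH:mm:ss Z", // RFC 822 with weekday
                        @"dd MMM yyyy HH:mm:ss Z",      // RFC 822 without weekday
                        @"yyyy-MM-dd'T'HH:mm:ssZ",      // ISO 8601 with numeric offset
                        nil];

    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    // Fixed English locale so month and weekday names do not depend on user settings
    formatter.locale = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];

    for (NSString *format in formats)
    {
        formatter.dateFormat = format;
        NSDate *date = [formatter dateFromString:string];
        if (date)
        {
            return date; // first matching format wins
        }
    }

    return nil; // no candidate format matched
}
```

Creating a formatter is fairly expensive, so when parsing a whole feed you would probably want to create it once and reuse it instead of building a new one for every item.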

One thing that might be confusing is that we are using a FORMATTER to parse a date. The format you set specifies how the string is to be interpreted, NOT what the date should look like. Internally NSDates are saved as the number of seconds since a reference date and therefore you cannot specify the “format of an NSDate”.
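Since an NSDate is just a point in time, sorting, which is what the original question was after, needs no formatter at all. Here is a minimal sketch; the helper name DatesSortedNewestFirst is hypothetical and the code assumes ARC and an SDK that supports blocks.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the given NSDate objects sorted newest first.
static NSArray *DatesSortedNewestFirst(NSArray *dates)
{
    return [dates sortedArrayUsingComparator:^NSComparisonResult(NSDate *a, NSDate *b) {
        // Swapping the receiver and the argument reverses the natural
        // (ascending) order of compare:, so newer dates come first.
        return [b compare:a];
    }];
}
```

If you prefer oldest first, [dates sortedArrayUsingSelector:@selector(compare:)] gives you the ascending order directly.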

RFC 822 is a crappy date format for another reason. Generally date/time in XML files is standardized to be represented in ISO 8601 format. The importance can also be inferred from the name of the standard: RFC just stands for Request for Comments, whereas ISO is the International Organization for Standardization. Whom would you rather trust? 🙂

ISO 8601 has some variations, but the one generally seen, applied to the date I mentioned above, is this:

// continuation of previous example
NSDateFormatter* df2 = [[[NSDateFormatter alloc] init] autorelease];
[df2 setDateFormat:@"yyyy-MM-dd'T'HH:mm:ssZ"];
NSString *dateString2 = [df2 stringFromDate:date];
 
NSLog(@"%@", dateString2);

Ah, the beauty of this format! 2010-05-03T20:54:26+0200 is human-readable, and sorting such strings keeps them in chronological order as long as they share the same time zone. The time zone offset makes sure that the date works all around the globe.

Now you see +0200 in here. This comes from me being in the Central European time zone, which currently has a UTC offset of plus 2 hours. The truly international representation of dates is in UTC, where the +0000 can be abbreviated as Z. Because of this it is also called “Zulu Time”, but rest assured that this stems from how pilots spell the letter Z and does not have the least bit of African roots.

So the final example in this article shows how to output Zulu. All examples can be used either way, either for parsing strings into dates (dateFromString) or formatting dates into strings (stringFromDate).

// continuation of previous examples; new names avoid redeclaring df2 and dateString2
NSDateFormatter* df3 = [[[NSDateFormatter alloc] init] autorelease];
[df3 setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss'Z'"];
[df3 setTimeZone:[NSTimeZone timeZoneWithName:@"UTC"]];
NSString *dateString3 = [df3 stringFromDate:date];
 
NSLog(@"%@", dateString3);

Here I’m setting the time zone to UTC (Coordinated Universal Time) and I quoted the Z to make NSDateFormatter output the letter Z instead of treating it as the format specifier for the time zone.
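The same setup can be used in the other direction. Because the quoted Z is only a literal character, the formatter does not learn the time zone from the string, so it has to be set to UTC explicitly or the result would be interpreted in the local time zone. A small sketch, with a hypothetical helper name and assuming ARC:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses a Zulu time string such as "2010-05-03T18:54:26Z".
static NSDate *DateFromZuluString(NSString *string)
{
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    formatter.dateFormat = @"yyyy-MM-dd'T'HH:mm:ss'Z'";

    // The quoted 'Z' carries no time zone information for the parser,
    // so tell the formatter explicitly that the string is in UTC.
    formatter.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];

    return [formatter dateFromString:string];
}
```

Strings that do not end in a literal Z, for example ones with a numeric offset like +0200, will not match this format and come back as nil.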

Despite some successful standardization, we still see a couple of different formats in the wild. NSDateFormatter helps you translate between them via NSDate.


Categories: Recipes

3 Comments

  1. Hello, thanks for this tutorial. I’m parsing a Twitter RSS feed, which contains similar timezones.
    But I have another question: when I’m parsing the feed I have a complete tweet as a string, so the problem I’m having is that I can’t visually distinguish parts of the content, say a URL or a hashtag. I could compare the substrings to filter out hashtags or URLs, but since I can’t, for instance, colorize a certain substring in a UILabel, I’d have to patch different UILabels together.
    Is there an easier way to accomplish this?

    In short I need some parts of a string colored differently than others.

  2. I kind of found a solution with these classes, which use the RegexKit framework:
    http://furbo.org/2008/10/07/fancy-uilabels/

  3. 5 years from this post and still useful! Thank you sooooo much my man.