Beware of NSString Optimizations

There are some scenarios where NSString acts as a class cluster internally to optimize handling of certain strings. One such case bit me today, and so I want to tell you about it.

Class clusters work such that you think you are always dealing with plain instances of NSString, while in reality the framework picks different private subclasses for different tasks. You might have already seen an effect of this while debugging, when the debugger shows something other than NSString as the type of a variable.

Consider the following innocent splitting of tokens:

NSString *text = @"One,,two,three,";
NSArray *components = [text componentsSeparatedByString:@","];
 
for (NSString *oneString in components)
{
	NSLog(@"'%@' = %@ %p", oneString, [oneString class], oneString);
}

What are the class types you expect to see for the individual components? NSString? Wrong.

'One' = __NSCFString 0x688e270
'' = __NSCFConstantString 0x1459cd8
'two' = __NSCFString 0x688e290
'three' = __NSCFString 0x6881890
'' = __NSCFConstantString 0x1459cd8

The NSCFStrings are actually toll-free bridged strings that can act as CFString in Core Foundation land as well as NSString in Objective-C land. But note the NSCFConstantString!

This is the kind of optimization that I fell prey to. Look at the memory addresses! The two empty strings are one and the same object. Who would expect that?

Not only is it a different class than you thought it would be, the framework also folds certain strings into shared instances of special subclasses.

Now for the example where this triggered a face-palm…

It’s a Trap!

In DTCoreText I added a convenience method that creates an HTML string from an attributed string. For this I loop through the paragraphs and add P tags for each.

NSArray *paragraphs = [plainString componentsSeparatedByString:@"\n"];
 
for (NSString *oneParagraph in paragraphs)
{
	NSRange paragraphRange = NSMakeRange(location, [oneParagraph length]);
 
	// skip empty paragraph at end
	if (oneParagraph == [paragraphs lastObject] && !paragraphRange.length)
	{
		continue;
	}
	...
}

Since the attributed string might have a \n at the end, we don’t want to create an empty P. So I thought that it would be smart to compare oneParagraph with the lastObject of the paragraphs array and skip it if it was empty.

BUT, I showed you above that an empty element is actually turned into one shared NSCFConstantString. Since == compares object pointers and not string contents, this means that whenever the last component is empty, every other empty component is identical to it and gets skipped as well. Not what we wanted.
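To make the effect visible, here is a minimal sketch that counts how many components are the very same object as the last one. The function name CountComponentsIdenticalToLast is made up for this illustration. The result depends on the runtime: with the output shown earlier it would report two matches for the sample string, while a runtime without this folding would report only one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration only: counts the components that are
// pointer-identical to the last component of the split string.
static NSUInteger CountComponentsIdenticalToLast(NSString *text, NSString *separator)
{
	NSArray *components = [text componentsSeparatedByString:separator];
	id lastComponent = [components lastObject];
	NSUInteger matches = 0;

	for (NSString *oneString in components)
	{
		// == compares object addresses, not string contents
		if (oneString == lastComponent)
		{
			matches++;
		}
	}

	// the last component always matches itself, so this is at least 1
	return matches;
}
```

Calling CountComponentsIdenticalToLast(@"One,,two,three,", @",") and logging the result is a quick way to check whether a given SDK shares the empty string instance.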

Because of this optimization the lastObject method becomes useless for determining whether oneParagraph is indeed the last one. We have to change the slick fast enumeration to a classical for loop with a counter variable.

for (NSUInteger i = 0; i < [paragraphs count]; i++)
{
	NSString *oneParagraph = [paragraphs objectAtIndex:i];
	NSRange paragraphRange = NSMakeRange(location, [oneParagraph length]);
 
	// skip empty paragraph at the end; compare indexes, not object pointers
	if (i == [paragraphs count] - 1)
	{
		if (!paragraphRange.length)
		{
			continue;
		}
	}
	...
}

It’s a bit clunkier, but now it is safe from this optimization. An alternative would be to keep fast enumeration and increment a counter variable. Unfortunately there is no built-in way of knowing whether the current iteration is the last one of the enumeration, so we would still have to compare our counter with the array count.
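The counter-variable alternative could be sketched like this. The helper ParagraphsSkippingTrailingEmpty is a made-up name and not part of DTCoreText. To keep the sketch self-contained it collects the paragraphs into an array instead of emitting P tags.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration only: splits a plain string into
// paragraphs and drops an empty paragraph only if it is the last one.
static NSArray *ParagraphsSkippingTrailingEmpty(NSString *plainString)
{
	NSArray *paragraphs = [plainString componentsSeparatedByString:@"\n"];
	NSMutableArray *result = [NSMutableArray array];

	// for a non-nil string the array has at least one element
	NSUInteger lastIndex = [paragraphs count] - 1;
	NSUInteger index = 0;

	for (NSString *oneParagraph in paragraphs)
	{
		// decide by position, never by comparing object pointers
		BOOL isLast = (index == lastIndex);
		index++;

		if (isLast && ![oneParagraph length])
		{
			continue;
		}

		[result addObject:oneParagraph];
	}

	return result;
}
```

Empty paragraphs in the middle survive because only the index decides which element counts as the last one.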

New or Improved?

I wonder why this doesn’t confuse more people who are unaware of this optimization. Or is it something that was introduced in the later iOS SDKs? Well, if you run the first code block on the iOS 4.3 simulator you get a slightly different result.

You still get both empty strings pointing to the same instance, but there the shared empty string is a “normal” NSCFString. So Apple apparently introduced the NSCFConstantString in iOS 5 to be able to optimize certain use cases. Having a subclass allows them to override certain expensive operations, for example by hardcoding the hash value.

But apparently this folding of constant strings into single instances was around even before iOS 4 and so we have to be aware of this possibly wreaking havoc with our slick loop logic.



6 Comments »

  1. You can enumerate your paragraphs with enumerateObjectsUsingBlock:. Syntactically it might look a little nicer than having to manually iterate and keep a counter of your own.

  2. Good news: Apple continues to optimize even basics like NSString.

  3. This particular instance might be a new optimization, but Objective-C’s runtime and compilers have tried to optimize away duplicate NSStrings for quite some time now. Like with the following:

    NSString *one = @"One";
    NSString *two = @"Two";
    NSString *oneAgain = @"One";

    NSLog(@"one: class = %@, pointer = %p", [one class], one);
    NSLog(@"two: class = %@, pointer = %p", [two class], two);
    NSLog(@"oneAgain: class = %@, pointer = %p", [oneAgain class], oneAgain);

    Gives the result:

    one: class = __NSCFConstantString, pointer = 0x105f45078
    two: class = __NSCFConstantString, pointer = 0x105f45098
    oneAgain: class = __NSCFConstantString, pointer = 0x105f45078

    where one and oneAgain point to the same NSCFConstantString.

  4. This is nothing new. The empty string optimisation dates from before Apple bought NeXT.
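
The enumerateObjectsUsingBlock: suggestion from the first comment could look roughly like the following sketch. The block receives the index of each element, so no manual counter is needed. The function name ParagraphsSkippingTrailingEmptyUsingBlock is made up for illustration, and the array result stands in for the actual HTML generation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration only: same behavior as a counter
// loop, but the index is supplied by the block enumeration.
static NSArray *ParagraphsSkippingTrailingEmptyUsingBlock(NSString *plainString)
{
	NSArray *paragraphs = [plainString componentsSeparatedByString:@"\n"];
	NSMutableArray *result = [NSMutableArray array];

	// for a non-nil string the array has at least one element
	NSUInteger lastIndex = [paragraphs count] - 1;

	[paragraphs enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
		NSString *oneParagraph = obj;

		// returning from the block acts like continue in a loop
		if (idx == lastIndex && ![oneParagraph length])
		{
			return;
		}

		[result addObject:oneParagraph];
	}];

	return result;
}
```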