
Threadsafe Lazy Property Initialization

I was looking for a safe way to initialize a property on individual instances of an object. “Safe” as in “thread-safe”, because nowadays you never know… with GCD it could just happen that an object is used from multiple threads concurrently.

I felt the need to chronicle here my findings on the different approaches I tried.

In DTCoreText I have a class that represents one line of text. I wanted to protect the method to get the text metrics for this line.

@synchronized

The classical approach is to use the @synchronized keyword and provide any object to synchronize against. Usually you would use self for this purpose, but you don’t have to.

- (void)calculateMetrics
{
	@synchronized(self)
	{
		if (!_didCalculateMetrics)
		{
			// do calc
 
			_didCalculateMetrics = YES;
		}
	}
}

These days, if you show such code to somebody with at least a passing knowledge of Grand Central Dispatch, they will frown and turn their nose up at @synchronized. “Man, @synchronized is so much slower than GCD.”

Fiery Robot did a bit of benchmarking which shows that GCD is indeed marginally faster than @synchronized, probably because of the more light-weight locking that is available through GCD. This is why David Hoerl switched to GCD locking when he updated the project to ARC.

GCD Semaphores

This adds quite a bit more code, as you have to create the locking token (aka semaphore), keep it in an IVAR and release it at the end of the object’s lifetime.

@interface DTCoreTextLayoutLine ()
{
	// semaphore used as lock, created in init, released in dealloc
	dispatch_semaphore_t layoutLock;
}
@end
 
@implementation DTCoreTextLayoutLine
 
- (id)init...
{
	if ((self = [super init]))
	{
		layoutLock = dispatch_semaphore_create(1);
	}
	return self;
}
 
- (void)dealloc
{
	// clean up semaphore
	dispatch_release(layoutLock);
}
 
- (void)calculateMetrics
{
	// wait for lock if active
	dispatch_semaphore_wait(layoutLock, DISPATCH_TIME_FOREVER);
 
	if (!_didCalculateMetrics)
	{
		// do calc
 
		_didCalculateMetrics = YES;
	}
 
	// release lock
	dispatch_semaphore_signal(layoutLock);
}

This achieves the same degree of safety as the @synchronized version, but uses GCD and is – marginally – faster.

dispatch_once … NOT

Still, there was something about this that bugged me. Previously we learned that if you want something to occur exactly once, you use dispatch_once.

- (void)calculateMetrics
{
	static dispatch_once_t _onceToken;
	dispatch_once(&_onceToken, ^{
		// do calc
 
		_didCalculateMetrics = YES;
	});
}

That looks neat, but it does not work, for the simple reason that static variables are essentially global. Creating the dispatch once token inside the method creates only one token for all instances of this class. So it would calculate the text metrics only for the very first layout line.
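The effect of a function-local static is easy to demonstrate without any GCD at all. A small plain-C sketch (names hypothetical) shows that such a flag exists once per process, not once per object:

```c
/* Stand-in for -calculateMetrics: the function-local static plays the role
   of the static dispatch_once_t token. It exists once per process, not once
   per instance, so only the very first caller performs the calculation.
   Returns 1 if the calculation ran, 0 if it was skipped. */
static int calculate_metrics(int instance_id)
{
	static int did_calculate = 0; /* ONE flag shared by ALL instances */

	(void)instance_id; /* id only illustrates which "instance" is calling */

	if (!did_calculate)
	{
		did_calculate = 1;
		return 1; /* first caller: metrics get calculated */
	}

	return 0; /* every later caller, on ANY instance, is skipped */
}
```

Calling `calculate_metrics(1)` then `calculate_metrics(2)` performs the work only for the first call; the second “instance” never gets its metrics.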

So my next attempt was to make the once token a private instance variable. That’s exactly the same as the above example, except that you drop the static keyword and move the token variable into the curly braces after the @implementation line.

I got even cockier than that. I still had a BOOL to keep track of whether the calc had already been done, so that I would not incur the overhead of many unnecessary Objective-C method calls. Since the dispatch once token is essentially an integer itself, you can also use it in an if: it will be 0 when created and -1 once it has fired.

@implementation DTCoreTextLayoutLine
{
	dispatch_once_t _didCalculateMetrics;
}
 
- (void)calculateMetrics
{
	dispatch_once(&_didCalculateMetrics, ^{
		// do calc
	});
}
 
- (CGFloat)ascent
{
	if (!_didCalculateMetrics)
	{
		[self calculateMetrics];
	}
 
	return ascent;
}

This approach looked the most elegant to me, involving the least amount of code. So I patted myself on the back and pushed these changes to GitHub for everybody to marvel at my ingenuity. Even some true experts called it “unconventional”.

BUT …

Unfortunately it’s not as good as it looks. Somebody had the audacity to read the f’in manual (RTFM) and point out this sentence to me:

The predicate must point to a variable stored in global or static scope. The result of using a predicate with automatic or dynamic storage is undefined.

OH NO! This sentence basically means that you cannot safely use a dispatch once token that is an instance variable or even a regular local variable. Instead it has to be either a global outside of your implementation or declared with the static keyword.

The reason for this restriction must lie in the way GCD can access global or static storage in a thread-safe manner that it cannot for other kinds of variables, probably something related to memory barriers that only works reliably in these cases.

The “elegant” method does indeed work, but it falls apart under concurrency.
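Since the token cannot safely live per instance, per-instance lazy initialization has to fall back on a per-instance lock. As a rough plain-C sketch of that pattern (the struct, names and the stand-in value are all hypothetical), doing with a pthread_mutex_t what @synchronized and the semaphore variant do:

```c
#include <pthread.h>

/* Hypothetical plain-C analogue of a layout line: each instance carries
   its own lock and its own "already calculated" flag. */
typedef struct
{
	pthread_mutex_t lock;
	int didCalculateMetrics;
	double ascent;
} Line;

static void line_init(Line *line)
{
	pthread_mutex_init(&line->lock, NULL);
	line->didCalculateMetrics = 0;
	line->ascent = 0.0;
}

/* Thread-safe lazy accessor: the first caller performs the calculation,
   everyone else blocks on the mutex and then sees the cached value. */
static double line_ascent(Line *line)
{
	pthread_mutex_lock(&line->lock);

	if (!line->didCalculateMetrics)
	{
		line->ascent = 12.5; /* stand-in for the real metrics calculation */
		line->didCalculateMetrics = 1;
	}

	pthread_mutex_unlock(&line->lock);

	return line->ascent;
}
```

Every instance gets its own flag and its own lock, which is exactly what a static dispatch_once token cannot give you.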

There’s yet another possibility for synchronizing, dispatch_sync.

dispatch_sync

This is similar to the semaphore locking, because internally dispatch_sync uses semaphores for its synchronization. Still, sync dispatching has a bit of an advantage over manual locking: with a lock you might end up with a deadlock if a lock-protected section tries to wait for the same lock again. (The same applies to dispatch_sync onto the same serial queue from inside one of its blocks, so don’t nest those either.)

 
@implementation DTCoreTextLayoutLine
{
	BOOL _didCalculateMetrics;
	dispatch_queue_t _syncQueue;
}
 
- (id)init...
{
	if ((self = [super init]))
	{
		// get a global queue
		_syncQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
	}
	return self;
}
 
- (void)calculateMetrics
{
	dispatch_sync(_syncQueue, ^{
		if (!_didCalculateMetrics)
		{
			// do calc
 
			_didCalculateMetrics = YES;
		}
	});
}

The previous example uses a global queue (available in low, default and high priority), so you do not need to worry about retaining or releasing it; these queues live as long as your app does.

Again the RTFM contains a suggestion for us:

Although dispatch queues are reference-counted objects, you do not need to retain and release the global concurrent queues. Because they are global to your application, retain and release calls for these queues are ignored. Therefore, you do not need to store references to these queues. You can just call the dispatch_get_global_queue function whenever you need a reference to one of them.

But personally I’d rather set up the reference to the queue I want to use for syncing in an IVAR, because then I can be sure to have the exact same queue available in different methods of the same object. If you are doing some very tight and extensive looping, the overhead of unnecessary function calls can pile up.

UPDATE: Reader Peter correctly pointed out that the global queues are concurrent and thus unsuitable for serializing property access. Instead we have to create our own private serial queue and also destroy it as soon as we no longer need it.

 
@implementation DTCoreTextLayoutLine
{
	BOOL _didCalculateMetrics;
	dispatch_queue_t _syncQueue;
}
 
- (id)init...
{
	if ((self = [super init]))
	{
		// create a private serial queue
		_syncQueue = dispatch_queue_create("MySyncQueue", DISPATCH_QUEUE_SERIAL);
	}
	return self;
}
 
- (void)dealloc
{
	dispatch_release(_syncQueue);
}
 
- (void)calculateMetrics
{
	dispatch_sync(_syncQueue, ^{
		if (!_didCalculateMetrics)
		{
			// do calc
 
			_didCalculateMetrics = YES;
		}
	});
}

Thanks Peter for spotting that!

Conclusion

It is unfortunate that dispatch_once does not work in the context of lazy property initialization. For this scenario we have three options at our disposal: the original @synchronized, locking with semaphores, and sync dispatching.

One option was intentionally omitted here: dispatch_async. The reason is that the above-mentioned benchmark showed it as substantially slower than the other variants, because the system has to copy the blocks to the heap. Sync dispatching can keep the block on the stack and execute it right away.


Categories: Recipes

9 Comments

  1. Hello,

    sometimes I use an approach like the dispatch_sync you mentioned. The only difference with your example is that I usually set up a private serial queue and then dispatch_sync my access or modifier block on this queue.
    In my opinion the advantage, with respect to a concurrent queue, is that you have more control over the data access.
    So the serial queue provides me with the required synchronization safety, while the “dispatch_sync” call allows me to finish the access/modifier block before continuing with the current thread.
    To avoid deadlocks, I atomize my blocks as much as possible, especially avoiding nesting dispatch_sync calls on the same queue (so a set/get operation on the same data structure must be performed in two separate dispatch_sync calls).
    Finally, there is no impact in terms of performance, as each serial queue is executed concurrently inside a global queue.

    Note that this approach is not limited to safe initialization, but can be extended to any object access.

  2. Great post, very helpful in framing the options here. But I’m confused about the dispatch_sync example — by using dispatch_get_global_queue, isn’t the block being sent to a concurrent queue? Docs say “blocks submitted to these global concurrent queues may be executed concurrently with respect to each other.” Shouldn’t this be a serial queue?

  3. I think you are right. That should be a serial queue instead of a global concurrent one. I’ll modify the sample.

  4. How about OSAtomicCompareAndSwap?

  5. how would that work?

  6. Note that dispatch_sync doesn’t guarantee that it will run the block on any particular thread, only that multiple threads will each have exclusive access to the code. So if your “do calc” code makes calls to methods that must be run on, say, the main thread, you could have problems, even if your queue apparently targets the main thread. This is an optimization to allow GCD to not have to jump between threads unnecessarily.

7. I saw some discussion as to why the dispatch_once token has to be static. It boils down to the fact that the history of the memory location used for the token can only be guaranteed if it is static, because those locations are zeroed when the app is started. With general memory there is no guarantee that the location was not in use by another thread a short while ago and, lacking memory barriers to prevent it, that location could still change as hardware reorders things, causing the fast barrier-free test for zero to give unpredictable results.