When you use HTML to represent text that the user can edit, you run into a problem: HTML compresses whitespace. Tabs, newlines, and even runs of multiple spaces all get compressed to a single space. That is, unless you enable the same sort of whitespace handling that PRE tags use.
I was curious how Apple’s own NSHTMLWriter avoids whitespace compression. There I found a creative approach, and I adopted the same technique in my DTHTMLWriter, which allows you to generate HTML from attributed strings.
The technique I am describing here is implemented both in the initWithHTML category methods on Mac and in NSHTMLWriter. So I presume that this is indeed a feature of WebKit.
Consider the following test case: “Many spaces”, with exactly 5 regular space characters between the words. If you type this in a text editor, you don’t want the spaces to be compressed for you. One earlier approach I was using was to replace all regular spaces with non-breaking spaces, aka &nbsp; in HTML-lingo. The problem is that making every space non-breaking interferes with proper line breaking.
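To make the problem concrete, here is a minimal Python sketch of both behaviors. The function names `collapse_whitespace` and `replace_all_spaces` are made up for illustration, and the regular expression is only a rough model of how a browser treats whitespace by default, not WebKit's actual algorithm.

```python
import re

NBSP = "\u00a0"  # the character that &nbsp; decodes to


def collapse_whitespace(text: str) -> str:
    """Rough model of default HTML rendering: any run of spaces,
    tabs or newlines ends up as one regular space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


def replace_all_spaces(text: str) -> str:
    """The earlier approach: turn every regular space into a non-breaking one."""
    return text.replace(" ", NBSP)


sample = "Many" + " " * 5 + "spaces"

collapsed = collapse_whitespace(sample)  # the 5 spaces shrink to 1
naive = replace_all_spaces(sample)       # survives collapsing, but has no
                                         # regular space left to wrap on
```

The second variant keeps all five characters, but since no regular space remains, a renderer has no place left between the two words where it may break the line.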
How Apple does it
Apple has a different method. Instead of replacing all regular spaces with non-breaking ones they only replace every other one. Those converted spaces are then tagged in their own <span> with a class of “Apple-converted-space”. Doing it this way preserves a bit of line-wrapping functionality because text can still line-wrap on the normal spaces. But at the same time a receiving app can restore the length of normal space characters by watching out for spans with this class name. Ingenious!
You can see this in action whenever HTML is copied via the pasteboard, like from Safari, or whenever NSHTMLWriter is used, like for example if you use the setAttributedString method of UITextView.
This is how NSHTMLWriter (which calls itself “Cocoa HTML Writer”) represents the above sample text.
What you see here as a dot is actually a non-breaking space, which Xcode displays this way so that you can tell it apart from regular spaces. This technique is only used for more than two spaces. The first space is outside the span, and the remaining spaces alternate: non-breaking, breaking, non-breaking, etc.
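The encoding rule just described can be sketched in a few lines of Python. This follows the description above literally and has not been checked against real NSHTMLWriter or DTHTMLWriter output; `encode_spaces`, `encode_space_run`, and the `MIN_RUN` threshold are hypothetical names chosen for this example.

```python
import html
import re

SPAN_OPEN = '<span class="Apple-converted-space">'
SPAN_CLOSE = "</span>"
MIN_RUN = 3  # assumption: only runs of more than two spaces are converted


def encode_space_run(length: int) -> str:
    """Encode one run of `length` regular spaces."""
    if length < MIN_RUN:
        return " " * length
    # The first space stays outside the span; the rest alternate
    # non-breaking, breaking, non-breaking, ...
    rest = "".join("&nbsp;" if i % 2 == 0 else " " for i in range(length - 1))
    return " " + SPAN_OPEN + rest + SPAN_CLOSE


def encode_spaces(text: str) -> str:
    """Escape plain text for HTML and wrap long space runs in the span."""
    escaped = html.escape(text, quote=False)  # escaping adds no spaces
    return re.sub(r" +", lambda m: encode_space_run(len(m.group())), escaped)


encoded = encode_spaces("Many" + " " * 5 + "spaces")
# 'Many <span class="Apple-converted-space">&nbsp; &nbsp; </span>spaces'
```

Because every non-breaking space has a regular space next to it, no two regular spaces ever sit side by side, so nothing is left for the renderer to compress.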
Going the other way, when DTHTMLAttributedStringBuilder processes the above HTML, the original spaces are restored. This is how the result displays in the Demo app and its character view. Hex 20 (= Dec 32) is a normal space.
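The reverse direction might look roughly like the following Python sketch. It is not the code DTHTMLAttributedStringBuilder uses; the class `ConvertedSpaceDecoder` and the function `decode_spaces` are invented for this example, and the sketch simply drops all tags and returns plain text.

```python
from html.parser import HTMLParser

NBSP = "\u00a0"


class ConvertedSpaceDecoder(HTMLParser):
    """Collects the text of an HTML fragment and turns non-breaking spaces
    back into regular ones, but only inside Apple-converted-space spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as U+00A0
        self.chunks = []
        self.span_stack = []  # one flag per open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.span_stack.append("Apple-converted-space" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        if any(self.span_stack):
            data = data.replace(NBSP, " ")
        self.chunks.append(data)


def decode_spaces(markup: str) -> str:
    parser = ConvertedSpaceDecoder()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


restored = decode_spaces(
    'Many <span class="Apple-converted-space">&nbsp; &nbsp; </span>spaces'
)
```

Non-breaking spaces outside such a span are left alone, since those were presumably put there on purpose.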
There once was a second special tag that WebKit was using, “Apple-style-span”, and if you google for “Apple-converted-space” you still find many mentions of that. But WebKit developer Ryosuke Niwa concluded a two-year project in August 2011 to remove the need for “Apple-style-span” in WebKit.
However, Apple’s special method for preserving whitespace prevails in WebKit.
By adding support for the “Apple-converted-space” span class, we get properly preserved whitespace in both directions: when the HTML comes from other Apple apps, and when you copy the resulting HTML data to the UIPasteboard so that it can be pasted into other Apple apps.
One might argue that Apple could also have gotten the same functionality by using the appropriate CSS white-space property instead of a proprietary span class. Maybe there are special cases where Apple’s approach works better? We may never know. This is not the place for discussing why they did it.