
Apple-tab-span

An editor has to deal with the user hitting the tab key on an external keyboard and then be able to persist these tabs. Thus the question arose how I would best represent tab characters (\t) in HTML. At first I tried to encode them as &#9; entities, but that caused lots of trouble, since on the parsing end it is difficult to know whether a tab came from this entity or from a literal \t.

I could have told the two apart with a very ugly hack of libxml2 (which powers my DTHTMLParser), but after wasting half a day on this I relented. I previously reported my findings about Apple-converted-space, which is the method NSHTMLWriter uses to preserve multiple spaces.

In this article I am documenting my findings related to how Apple conserves tabs for HTML output.

You might remember me posting about the internals of UITextView, which employs NSHTMLWriter to first encode attributed strings as HTML and then render the attributed text with WebKit. I had used this class (via a reconstructed header) in my MinMaxLineHeightBug sample, which is available in my Radar Samples repository on GitHub. So I quickly grabbed this, added a couple of tabs to the attributed string and inspected the output.

Figure: NSHTMLWriter representing tabs

We can see that each tab character is enclosed in an Apple-tab-span span. Looking up this class in the style block contained in the header reveals that it simply specifies white-space:pre, which preserves all white space.

In contrast to Apple-converted-space, where characters are modified, this doesn’t change the actual content. It just adds one span per tab.

One Span Each

For a while I was pondering why Apple chose to add this span for each and every single tab. If there are multiple tab characters next to each other, wouldn’t it be more efficient to add one span for the entire range?

A quick test in Safari didn’t show anything different between having 3 tabs with 3 spans versus 3 tabs in a single span. The white-space:pre style is protecting ranges of tabs just the same as individual ones.

I had two theories as to the why. Either there was some historic reason for it, maybe that some older browsers didn’t properly deal with multi-character white space. Or was it simple laziness, because then you can do the replacement with a single replace statement on NSMutableString?

Friend of the blog Simon Tiplady pointed me in the right direction by asking the key question:

Grouping tabs with 1 span: what happens if the tabs are longer than the available space on the line? Would it break as expected?

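Bull’s Eye! If we look up white-space:pre we find that pre is defined to never wrap lines. That is not what we want for an editor, where we always want as much text as fits on a line before it breaks. A single span around several tabs would keep the whole run on one line, whereas with one span per tab a line break can still occur between two tabs.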
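To make the two options concrete, here is a minimal sketch of both ways to wrap tabs. The function names WrapEachTab and WrapTabRuns and the two constants are my own hypothetical names for illustration; only the span markup itself comes from this article.

```objc
#import <Foundation/Foundation.h>

// Inline style used for illustration; names below are hypothetical.
static NSString * const kTabSpanOpen = @"<span style=\"white-space:pre;\">";
static NSString * const kTabSpanClose = @"</span>";

// Variant A: every tab gets its own span (one plain string replacement).
static NSString *WrapEachTab(NSString *html)
{
   NSString *replacement = [NSString stringWithFormat:@"%@\t%@",
      kTabSpanOpen, kTabSpanClose];

   return [html stringByReplacingOccurrencesOfString:@"\t"
      withString:replacement];
}

// Variant B: a run of adjacent tabs shares one span.
// This needs a regular expression instead of a plain replacement.
static NSString *WrapTabRuns(NSString *html)
{
   NSRegularExpression *regex = [NSRegularExpression
      regularExpressionWithPattern:@"\\t+" options:0 error:NULL];

   // $0 stands for the entire matched run of tabs
   NSString *replacementTemplate = [NSString stringWithFormat:@"%@$0%@",
      kTabSpanOpen, kTabSpanClose];

   return [regex stringByReplacingMatchesInString:html
      options:0
      range:NSMakeRange(0, [html length])
      withTemplate:replacementTemplate];
}
```

Variant B produces less markup, but the whole run of tabs then sits inside a single white-space:pre span. Variant A is the simpler of the two and keeps each tab in a span of its own.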

Knowing that also greatly simplifies my code.

 
// only touch the string if it contains at least one tab
BOOL hasTab = ([retString rangeOfString:@"\t"].location != NSNotFound);

// add style to header style block for document

if (hasTab)
{
   NSRange range = NSMakeRange(0, [retString length]);

   if (fragment)
   {
      // a fragment has no header, so the style goes inline
      [retString replaceOccurrencesOfString:@"\t"
         withString:@"<span style=\"white-space:pre;\">\t</span>"
         options:0 range:range];
   }
   else
   {
      // a document references the class defined in its style block
      [retString replaceOccurrencesOfString:@"\t"
         withString:@"<span class=\"Apple-tab-span\">\t</span>"
         options:0 range:range];
   }
}

For a fragment (which does not have a header) we have to add the style inline; for documents we have to add the style for the span class to the style block. When reading such HTML back with NSScanner, note that charactersToBeSkipped has to be set to nil, without which NSScanner would automatically skip white space, tabs included.
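The effect of charactersToBeSkipped is easy to miss, so here is a small sketch of it. The helper ScanTextUpToTag is a hypothetical name of mine and not part of DTCoreText; it only demonstrates the documented NSScanner behavior on a piece of text followed by a tag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text NSScanner reads up to the next '<',
// or nil if nothing could be scanned.
static NSString *ScanTextUpToTag(NSString *input, BOOL keepWhitespace)
{
   NSScanner *scanner = [NSScanner scannerWithString:input];

   if (keepWhitespace)
   {
      // by default NSScanner skips white space and newlines before a scan
      scanner.charactersToBeSkipped = nil;
   }

   NSCharacterSet *tagStart = [NSCharacterSet
      characterSetWithCharactersInString:@"<"];

   NSString *text = nil;
   [scanner scanUpToCharactersFromSet:tagStart intoString:&text];

   return text;
}
```

With keepWhitespace set to NO, scanning @"\tfoo<br>" yields only foo, and scanning the content of a tab span such as @"\t</span>" yields nothing at all, so the tab silently disappears. With charactersToBeSkipped set to nil the tab survives in both cases.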

Conclusion

Enclosing tab characters with a white-space-preserving span is a simple and effective method to preserve tabs. We could just as well name our span class differently or always do the style inline. But as a tip of the hat to this elegant solution I am sticking with Apple-tab-span.

This is how tabs will be represented by DTHTMLWriter as of DTCoreText 1.5.3 (coming soon).

