Our DNA is written in Swift

Today’s Hero: CHEN Xian’an

I was having a problem in DTCoreText where the multi-byte sequence making up an Emoji would not get properly encoded by DTHTMLWriter. A quick peek into NSHTMLWriter didn’t bring relief either: Apple does not encode these characters as HTML entities but writes them out as-is.

You can see in this screenshot that I looked at how NSHTMLWriter would encode two Emojis:

NSHTMLWriter Non-Encoding

That’s not good enough for us. We want our output to be safe with any kind of transport encoding because it might end up on a machine that does not support UTF-8 as brilliantly as iOS and OS X do.

My first idea was that I might have to force the font family to be “Apple Color Emoji”, which is the font that CoreText falls back to for displaying Emojis. However, if you don’t have the proper encoding, even setting the font family does not help.

The problem was in NSString+HTML in DTCoreText, which I use in DTHTMLWriter to add HTML entities. I created an issue on GitHub to describe my plight, and while I was still dabbling around with libxml2 looking for an answer there, Xian’an stepped in, fixed the bug and sent a pull request. Brilliant, simply brilliant!

The solution is to use some functions in Core Foundation, more precisely in CFString, that support Unicode characters outside the 16-bit range.

// ... looping through the unichars
if (oneChar<=255)
   [tmpString appendFormat:@"%C", oneChar];
else if (CFStringIsSurrogateHighCharacter(oneChar) && i < [self length]-1)
{
   i++; // step to the low surrogate that completes the pair
   unichar surrogateLowChar = [self characterAtIndex:i];
   UTF32Char u32code = CFStringGetLongCharacterForSurrogatePair(oneChar, surrogateLowChar);
   [tmpString appendFormat:@"&#%lu;", (unsigned long)u32code];
}
else
   [tmpString appendFormat:@"&#%d;", oneChar]; // any other character above 255

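CFStringIsSurrogateHighCharacter checks whether the current unichar is a high surrogate, the first half of a UTF-16 surrogate pair. If it is, CFStringGetLongCharacterForSurrogatePair combines it with the low surrogate that follows and returns the 32-bit code point, which is then written out as a numeric entity.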
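
For the curious, here is a rough sketch in plain C of the arithmetic behind combining a surrogate pair, as defined by UTF-16. This is not Core Foundation's source, and the function names (is_high_surrogate, is_low_surrogate, combine_surrogates, format_entity) are made up for illustration. It should compile as Objective-C too, since it only uses standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* High (lead) surrogates occupy the range 0xD800..0xDBFF. */
bool is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Low (trail) surrogates occupy the range 0xDC00..0xDFFF. */
bool is_low_surrogate(uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Each surrogate carries 10 bits of the code point; the result is
   offset by 0x10000 because pairs only encode values above 0xFFFF.
   The caller is expected to pass a valid high/low pair. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    return (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u)
         + 0x10000u;
}

/* Hypothetical helper: writes the decimal numeric entity for the pair
   into buf and returns the number of characters snprintf reports. */
int format_entity(uint16_t high, uint16_t low, char *buf, size_t size)
{
    unsigned long code = (unsigned long)combine_surrogates(high, low);
    return snprintf(buf, size, "&#%lu;", code);
}
```

As a worked example, the pair 0xD83D 0xDE00 gives (0x3D << 10) + 0x200 + 0x10000 = 0x1F600, so format_entity would write &#128512; for it.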

It's late, I'm tired and annoyed from having to re-tag DTCoreText 1.3.2 three times, but I just had to write this up for the whole world to know ...

Thank you CHEN Xian'an, your mad bug fixing skillz made my day!

Categories: Q&A
