Image Decompression Benchmark: iPad 4 + Mini

It’s been a while since the unveiling of the iPad mini and the new Retina iPad (aka iPad 4). I like to compare image decompression performance between device generations because I believe this tells a more tangible story than any benchmark that leaves you with nothing but an abstract score.

We were told that iPad 4 would be twice as fast as the iPad 3 and that the iPad mini would be at the same performance level as the iPad 2. So we shall compare the results for these devices to see if these statements hold. Also we would like to know if there is any sort of tangible benefit from including armv7s code in our apps.

Originally my image decompression benchmark would compare multiple image sizes, PNG crushed and uncrushed and 10 levels of JPG compression. As devices grow faster with each generation we find that it does not make much sense any more to fuss over accurately measuring the millisecond times for the very small images. There are many factors causing a bit of a fluctuation on those small numbers which would require us to either consider many more samples or to fudge the numbers.

That is why, for this edition of my benchmark update, I’m only going to compare the Retina resolution image as crushed PNG, uncrushed PNG and JPEG 80%, as this is the compression level providing sufficient fidelity for most use cases.

Measured times are in milliseconds and include the three phases: alloc/init of the UIImage, decompressing it and drawing it into a properly sized bitmap context simulating display on screen.
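The measuring approach described above (time several phases as one unit, repeat, and report milliseconds) can be sketched in a few lines. This is only an illustration, not the UIImage/Quartz code behind the numbers in this article: the `time_pipeline_ms` helper, the zlib workload and the 2048×1536 RGBA buffer size are all assumptions chosen to stand in for "init, decompress, draw". Taking the median of several runs is one way to dampen the fluctuation mentioned earlier.

```python
import statistics
import time
import zlib


def time_pipeline_ms(phases, runs=5):
    """Run all phases in order `runs` times; return the median total time in ms.

    `phases` is a list of one-argument callables. Each phase receives the
    previous phase's result (the first phase receives None).
    """
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        value = None
        for phase in phases:
            value = phase(value)
        totals.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(totals)


# Stand-in workload (assumption): a zlib-compressed buffer of zeros the size
# of one 2048x1536 RGBA bitmap. It is NOT a real PNG or JPEG decoder.
compressed = zlib.compress(bytes(2048 * 1536 * 4))

phases = [
    lambda _: compressed,                # "alloc/init": obtain the compressed data
    lambda data: zlib.decompress(data),  # "decompress" into raw bytes
    lambda pixels: bytearray(pixels),    # "draw": copy into a destination buffer
]

print(f"median total: {time_pipeline_ms(phases):.2f} ms")
```

The absolute number printed here says nothing about iOS devices; the point is only the shape of the measurement, with all three phases inside one timed region.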

You can see the iPad mini fitting snugly between the iPad 2 and 3. I’m stressing “between” because there is a measurable performance improvement over the iPad 2: only a little (5% faster) on decompressing JPEGs, but much more noticeable when dealing with PNGs (25% faster uncrushed and 33% faster crushed).

If you refer back to my iPhone 5 benchmark you can see how Apple managed to double the JPEG decompression performance by doubling the floating point registers. No such dramatic CPU improvement can be seen between generations of the iPad. Evidently Apple is focussing on GPU performance tuning there, which has only a very small impact on the sheer number crunching needed to decompress an image loaded from disk.

I put the iPhone 5 onto the chart as well because, just like the mini fits neatly between iPad 2 and 3, the phone is almost equidistant between the iPad 3 and 4. If you just look at the green bar (JPEG decompression) you can see it improve in an almost linear fashion. If Moore’s law were true for Apple too, shouldn’t the rate of improvement be increasing? Oh yes, it might, since the time between iPad 3 and 4 was only half a year.

The iPad 4 doesn’t give us twice the CPU power, but still the improvements are nothing to sneeze at: 30% improvement for uncrushed PNGs, 17% improvement for crushed PNGs and 31% better JPEG decompression. Not half, but almost a third is still remarkable. I outlined in my iPad 3 benchmark why I believed this device to be seriously underpowered for its resolution. In the iPad 4 the CPU has finally caught up with the display resolution.
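To relate these percentages to the "twice as fast" claim, here is a small sketch of the conversion. It assumes the percentages above are reductions in measured time (that reading is an assumption, suggested by "not half, but almost a third"); the function name `speedup_from_time_reduction` is made up for this example.

```python
def speedup_from_time_reduction(reduction_percent: float) -> float:
    """Convert 'takes X% less time' into an 'N times as fast' factor."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Percentages quoted in the article for iPad 4 versus iPad 3,
# interpreted here as reductions in measured time (assumption).
quoted = [("uncrushed PNG", 30), ("crushed PNG", 17), ("JPEG 80%", 31)]

for label, reduction in quoted:
    factor = speedup_from_time_reduction(reduction)
    print(f"{label}: {reduction}% less time = {factor:.2f}x as fast")

# "Twice as fast" would require the measured time to drop by half.
print(f"50% less time = {speedup_from_time_reduction(50):.2f}x as fast")
```

Under that reading, a 31% reduction corresponds to roughly 1.45 times the speed, which is why the result falls short of a doubling even though it is a solid gain.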

My original statement from my very first benchmark article last year still holds. Crushed PNGs are ideal when used for relatively small icons and UI elements. But for any sort of image at display resolution, JPEGs can be as much as twice as fast to get from disk to display. On iPad 4, JPEGs (80%) can be decompressed 49% faster than crushed PNGs of the same size.

The sweet spot remains at 16 ms because there you could get 60 images to display per second. iPad 4 is at 116 ms which is 7 times that amount. Getting there slowly but steadily, just 2 or 3 more iPad generations… or maybe Apple will finally add hardware acceleration to UIImage/CGImage.
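The arithmetic behind the sweet spot is easy to check. The sketch below uses the 116 ms figure from this article and a 60 Hz frame rate; the helper names and the idea of counting "doublings" (each future generation halving the time) are assumptions made for illustration, not a forecast.

```python
import math

FRAME_RATE_HZ = 60
IPAD4_MS = 116.0  # iPad 4 time quoted in the article


def frame_budget_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def doublings_needed(current_ms: float, target_ms: float) -> int:
    """How many times performance must double to get from current to target."""
    return max(0, math.ceil(math.log2(current_ms / target_ms)))


budget = frame_budget_ms()
print(f"frame budget at {FRAME_RATE_HZ} Hz: {budget:.1f} ms")
print(f"iPad 4 is {IPAD4_MS / budget:.2f}x over budget")
print(f"images per second at {IPAD4_MS:.0f} ms each: {1000.0 / IPAD4_MS:.1f}")
print(f"doublings needed: {doublings_needed(IPAD4_MS, budget)}")
```

If each generation really doubled the speed, three generations would be enough to reach the budget; with smaller gains per generation it would take correspondingly longer.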

I almost forgot to mention that I also tested armv7 versus armv7s on iPhone 5 and iPad 4 which are the two devices that currently support this instruction set. But I could not see any sort of difference in the numbers.

Conclusion

Bear in mind that we are only looking at the performance we can see via UIImage and Quartz which for the most part is only looking at the CPU because of the synthetic nature of the drawing. There is no compositing which is GPU-accelerated. There is a technique to decompress images on the GPU via CoreImage, but unfortunately it takes more time to load the image data onto the GPU from disk than it takes to decompress it on the CPU. This technique is only useful in scenarios where you want to decompress the same images often, say in a game where the images are textures.

I’m speculating that these findings about the iPad 4 are also telling us part of the reason why Apple already released the iPad 4 this fall. According to Moore’s law we should be able to expect twice the CPU performance in an iPad released one year after the iPad 3. But the iPad 4 is only a linear improvement and not an exponential one.

Possibly Apple’s silicon engineers are seeing what performance they can achieve with a given architecture and feature size, and Apple is then deciding the release time based on that? If they had released the iPad 4 in spring, the same improvement would have fallen further short of expectations. By releasing it ahead of the general expectations they manage to keep their head start and also get the new power connector into their main tablet product. Also, the iPad 4 is the first tablet where game makers can crank the rendering target resolution up to the display’s native pixel resolution.

I predict we will see the next iPad released as early as next spring because they still have the double register trick up their sleeve. We would have seen a much bigger jump if they had already used this sleight of hand.



4 Comments »

  1. I’m a chemist, and perhaps your wording is meant humorously, but in “Apple’s silicone engineers…” — silicon is the element that is used in chips, “silicone” is used for bathtub caulk, or breast implants…

  2. Moore’s Law: The number of transistors doubles every 24 months. That has nothing to do with CPU performance as those transistors are used for many things on modern SoCs. And it’s 24 months, not 12.

    Regardless, appreciate the benchmarks.