September 12th, 2013
Internet Explorer 11 and Windows Store Apps on Windows 8.1 offload parts of the image decoding pipeline to the graphics hardware, resulting in up to 45% faster image load, up to 40% lower memory consumption, and improved battery life. Images on average account for the most bytes downloaded on the Web today. To improve the performance of loading JPG images, IE11 has streamlined its JPG decoding by moving some steps of the decoding process directly to the GPU where it can be done significantly faster and in parallel.
JPG Image Format
Images account for 61% of bytes downloaded on the Web today, and 47% of image requests are for JPG images. By using hardware more efficiently to decode JPG images, IE11 now loads JPG images up to 45% faster and uses up to 40% less memory over previous IE versions.
To understand these performance improvements, let’s start by looking at how JPG images are typically encoded. The first step of JPG encoding is to convert the bitmap from the RGB color space to YCbCr color space.
RGB defines color in three color components: red, green, and blue. If we break down this image of the Grand Tetons in to its RGB color channels, you can see that there is a high degree of detail in each of the channels. The amount of memory required by IE for an image in the RGB color model can be calculated by multiplying image width x image height x 32 bits.
Image of the Grand Tetons with its Red, Blue, and Green components
YCbCr (commonly referred to as YUV color space) defines a color space in terms of one luma (Y) and two chroma (CbCr) components. The luma component represents the brightness information of a color. The chroma components contain the color differences, with Cb containing the blue difference and Cr containing the red difference. The figure below shows the same image of the Grand Tetons broken down into its Y, Cb, and Cr channels.
Image of the Grand Tetons with its Y, Cb, Cr channels
The next step in the encoding process is to compress the image size using a lossy compression known as chroma subsampling. The chroma channels can be significantly compressed because the human eye is more sensitive to the brightness of an image and less sensitive to color or hue. For example, looking at the Cb and Cr channels, the last two on the right, you can see that there is very little information contained in these channels relative to the Y channel. The level of subsampling is typically expressed using a three part ratio, e.g., 4:2:0. The first part of the ratio represents a horizontal sampling reference, and the second two parts represent the number of chrominance samples in the first and second rows of that sampling reference. Chroma is typically subsampled at one of three levels:
- 4:4:4, where the chroma data isn’t subsampled
- 4:2:2, where the chroma data is downscaled horizontally by 2,
- 4:2:0, where the chroma data is downscaled horizontally and vertically by 2.
A 4:2:0 subsampled YCbCr JPG image can use up to 62.5% less memory than the original RGB bitmap! Most JPG images used on the Web today are already in 4:2:2 or 4:2:0 chroma subsampling modes. Most image processing tools will automatically subsample to 4:2:2 or 4:2:0 when using the ‘Save for Web’ option.
Once the image has been subsampled, the discrete cosine transformation, quantization, and Huffman encoding processes are also applied to get the final encoded JPG image.
More Efficiently Using Hardware in IE11
To decode a JPG image, you run the same encoding steps in reverse. Traditionally, IE had always decoded JPG images into RGB bitmaps by running all of the below decoding steps directly on the CPU. At render time, IE would copy the RGB bitmap to the GPU for rendering.
To more efficiently use the hardware, IE11 now splits the JPG decoding work between the CPU and GPU. IE11 decodes the JPG image into the chroma subsampled YCbCr color space on the CPU, but then does the chroma upsampling and YCbCr to RGB color conversion steps on the GPU at draw time, where it can happen much faster and in parallel. This process frees CPU time to perform other operations, as the CPU is a common bottleneck in modern sites and apps. In addition to the decode time improvements, copying the much smaller YCbCr image to the GPU reduces the amount of memory that is copied and stored on the GPU (a limited resource). Using less CPU and memory also reduces power consumption and increases data locality.
The below CPU chart shows the amount of time it takes to decode and draw a 4:2:0 subsampled YCbCr JPG version of the Grand Tetons image on IE10 on Windows 8. Image decoding took 31.9ms, and total time to draw the image was 81.5ms.
If we load the same JPG image on IE11 on Windows 8.1, you can see that the decode time is now only 17.9ms, which is an improvement of 44%! The total time to draw the image is now 57.5ms, which is 30% faster.
Because most JPG images on the Web today are in YCbCr format, IE11 on Windows 8.1 users will experience these improvements automatically. We recommend that developers ensure their JPG images are compressed in 4:2:2 or 4:2:0 chroma subsampling modes to maximize the performance benefits of hardware accelerating the JPG decoding pipeline.
By using the hardware more efficiently to decode JPG images, IE11 improves the performance of nearly every page you browse, while also improving the power consumption and battery life of your device. Please install the Windows 8.1 Preview from the Windows Store and try IE11. As always, we welcome your feedback, either through the IE11 Send Feedback tool or on Connect.
Jatinder Mann, Internet Explorer Program Manager