A Short Primer in Static Image Compression
by Jason Leung 12/11/2000
Download the MS Word version of this doc by clicking here
Background
Introduction
With the advent of multimedia and the Internet, the storage and transmission of pictures and photographs in the digital domain has attained central importance in defining our ability to communicate in the digital age. However, despite rapid evolution in mass storage capacity, processor speeds, and wired and wireless network communication bandwidth capacity, the size of uncompressed digital images still remains extremely bulky and inefficient for existing digital systems to handle.
The concept of digital image compression has been applied to reduce the size of digital images to a level more suitable for existing digital systems to handle. To reinforce the concept that image compression is necessary for expedient digital storage and delivery of images, the below table illustrates the common image types with their respective physical sizes, sampling rates, uncompressed digital sizes, and transmission times over a 56Kbit bandwidth network:
Multimedia Data |
Size/ Duration |
Bits/Pixel or Bits/Sample |
Uncompressed Size in Bytes |
Transmission Time over 56Kb Network |
A Page of Text |
11" x 8.5" |
Varies |
4-8 KB |
0.5 – 1.1 sec |
Telephone Quality Speech |
10 seconds |
8 bps |
80 KB |
11.1 sec |
Grayscale Image |
512 x 512 pixels |
8 bpp (8 shades of gray) |
262 KB |
37.5 sec |
Color Image |
512 x 512 pixels |
24 bpp (High Color) |
786 KB |
1 min 54.5 sec |
Medical Image |
2048 x 1680 pixels |
12 bpp |
5.16 MB |
11 min 57 sec |
Full Motion Video |
640 x 480 pixels, 1 minute @ 30 frames/sec |
24 bpp (High Color) |
1.66 GB |
2 days 16 hrs |
The above table shows that transmission times for the illustrated date in uncompressed form is unacceptable given the digital age’s "instant" expectations. Given the state of mass storage and network technology, it is clear that the solution to this problem is to use image compression. Assuming that a compression ratio of 32:1 were achievable with acceptable image quality, a color image that normally needs close to 2 minutes to send over a 56Kb network would only need 3.75 seconds to send in compressed format.
General Concepts in Image Compression
The aim of image compression is to reduce the number of bits required to encode an image. This reduction can be accomplished in one of two ways: Irrelevancy Reduction aims to eliminate the details in an image that are invisible to the human eye; Redundancy Reduction aims to remove duplicate information within an image. For the purposes of this discussion, we will assume that the detail in our source images are limited to the resolution of the CCD within the NUCam, and that all information captured by the CCD can be discerned by the human eye. In general, there are three types of redundancies that can be reduced by redundancy reduction:
Current static image compression techniques focus on maximizing the reduction of spatial and spectral redundancies.
Techniques in Image Compression
There exists two main techniques in image compression:
Image Encoder Block Diagram of Lossy, Transform Coding System
Lossy, transform coding image compression systems typically comprise of three basic parts:
Image Compression Algorithms
Graphics Interchange Format (GIF)
The GIF compression format was developed by Compuserve Information Service Inc. in 1987 as a lossless compression algorithm based on the proprietary LZW (Lempel Zev Welch) compression algorithm. The LZW compression algorithm works as follows:
If we assuming the input string of "WABBA@WABB" and an initial LZW dictionary of the table to the right, the compression algorithm will work as follows:
Index |
Entry |
1 |
@ |
2 |
A |
3 |
B |
4 |
O |
5 |
W |
Index |
Entry |
6 |
WA |
7 |
AB |
8 |
BB |
Continuing in this manner, the encoder output sequence is 5 2 3 3 2 1 6 8 with the additional following entries into the LZW dictionary:
The decoding process starts with the same initial LZW dictionary (5 indices) and works similarly to the encoding process, only in reverse.
In the GIF extrapolation of the LZW algorithm, the initial dictionary size is limited to 2^{8} = 256 discrete colors. If the initial source image contains more than the 256 color limit, color information is approximated and compromised in the GIF compressed version of the image. As the number of source colors increases above the 256 limit, the resultant GIF compressed image becomes more and more compromised from the original image.
It is for this reason that GIFs are not suited for compressing photographs of images with non sharp edges (gradual color transitions). The wide range of colors in an image with non sharp edges causes the GIF compression algorithm to compromise the color content of the edge which adversely affects image quality. GIF image compression is most ideally suited for images with limited color content that have hard edges (sharp color changes). Traditionally, GIFs work best with line drawings, cartoons, and Black and White images.
Joint Photographic Experts Group Standard (JPEG)
Discrete Cosine Transform
The JPEG format is a lossy, transform coding image compression format whose transform mechanism is based on the 1974 discovery of the Discrete Cosine Transform (DCT). For the purposes of JPEG compression, the DCT is defined as:
Where A represent the input image and B represents the DCT transformation of input image A. For the above equation, the input image is N2 pixels wide by N1 pixels high, A(i,j) is the intensity of the pixel in row i and column j, and B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix. All DCT multiplications are real. This lowers the number of required multiplications, as compared to the discrete Fourier transform. The DCT input is an 8 by 8 array of integers. This array contains each pixel's gray scale level; 8 bit pixels have levels from 0 to 255. The output array of DCT coefficients contains integers; these can range from -1024 to 1023. For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT. The lower right values represent higher frequencies, and are often small - small enough to be neglected with little visible distortion.
The above explanation is targeted for grayscale JPEG images, but the compression of color images is very similar, and can be regarded as the compression of the RGB color space of color images into three distinct grayscale images.The JPEG standard applies the DCT-to-grayscale-representation. In practical terms, the purpose behind the DCT in JPEG is to transform an RGB image into the luminance, chrominance color space (YcbCr, YUV, etc) as the human eye not as sensitive to high frequency chroma as to high f luminance. The following is the block diagram of the JPEG encoder:
The JEPG encoder divides the image to be encoded into 8 by 8 pixel blocks and applies the Discrete Cosine Transform to each block. The quantizer rounds off the DCT coefficients, and the degree of quantization must selected carefully as there exists a tradeoff between image quality and degree of quantization.
Quantization
A large quantization step size can produce unacceptably large image distortion. The effect is similar to quantizing Fourier series coefficients too coarsely; large distortion results. Yet, finer quantization leads to lower compression ratios. Thus, the quantizer in the JPEG encoding process must balance visual distortion with meaningful compression. Because human eyesight experiences sharp high frequency roll-off, the JPEG encoder uses much higher step sizes for high frequency coefficients, with little noticeable image deterioration.
Quantization Matrix
The quantization matrix is an 8 by 8 matrix of step sizes (also known as quantums) - one element for each DCT coefficient. It is usually symmetric. Step sizes will be small in the upper left (low frequencies), and large in the upper right (high frequencies); a step size of 1 is the most precise. The quantizer divides the DCT coefficient by its corresponding quantum, then rounds to the nearest integer. Large quantums drive small coefficients down to zero. The result: many high frequency coefficients become zero, and therefore easier to code. The low frequency coefficients undergo only minor adjustment
Consolidating Zero Coefficients after Quantization
Because many high frequency DCT coefficients are quantized to zero, the JPEG encoder attempts to contiguously consolidate a maximum number of zero ECT coefficients before applying run-length coding. For each non-zero DCT coefficient, JPEG records the number of zeros that preceed the number, the number of bits needed to represent the number's amplitude, and the amplitude itself. To consolidate the runs of zeros, JPEG processes DCT coefficients in the zigzag pattern shown below:
Entropy Encoding
The number of previous zeros and the bits needed for the current number's amplitude form a numeric pair which is assigned a code word through a variable length code (e.g. Huffman, Shannon-Fano or Arithmetic coding) in the entropy encoder. The entropy encoder serves to achieve additional compression losslessly. The JPEG encoder outputs the code word of the pair, the codeword for the coefficient's amplitude (also from a variable length code). This is repeated for each 8x8 block until the entire image is JPEG encoded.
Visual Performance
It is the break up of the original image into 8x8 blocks that leads to the blocky artifacts visible in highly compressed JPEG images. Notice the blockiness that occurs in a JPEG compressed image as illustrated below:
Wavelet Compression
While JPEG compression achieves admirable compression results through the use of the Discrete Cosine Transform and the quantizer, wavelet functions, pioneered in 1981, promise to replace the DCT in image compression to achieve much higher compression ratios than those possible with JPEG with much less loss. Since localization of signal characteristics in spatial (or time) and frequency domains can be accomplished very efficiently with wavelets, wavelets allow for the simultaneous determination of sharp transitions in the spectrum of the signal and in the position (or time) of their occurrence. This avoids much of the blocky artifacts that occur under JPEG’s DCT based compression scheme.
Brief Wavelet Theory
Wavelets are defined as functions over a finite interval with average value 0. A variety of wavelt functions have been discovered; of which the Daubechies wavelet is most commonly used for wavelet transform techniques in image compression applications:
where j represents scaling factor of the wavelet and the k represents the locations (center) of the wavelt. Both j and k are members of the integer number set. An example of an arbitrary wavelet is given below:
The principle idea of a wavelet transform is the same as that of a Discrete Cosine or Discrete Fourier Transform, i.e. to transform a function into a spatial domain representation by representing the function as a sum of distinct wavelets (basis functions).
In the wavelet transform, the idea is to use one central mother wavelet. Basis functions are then obtained through scaling and translating the mother wave. The wavelet transform of finite signal x(n) of N components is then described as an NxN matrix of the transform’s coefficients. It is because the input image is not blocked into 8x8 blocks like JPEG compression and the fact that the basis functions have variable length through scaling unlike the DCT that wavelets provide for higher image compression with reduced blocking artifacts over DCT methods.
Sub-band Coding
The Wavelet Transform is very closely related to Sub-band Coding Systems and Filter Banks. The key difference between the two is that wavelet transforms are designed to have some regularity properties (many zeros at z = 0 or z = 3.141516 (pi)). Because of the discovery that discrete-time filters can be iterated under these regularity conditions to form continuous time wavelets, the most basic of wavelet decomposition techniques uses sub-band coding systems to decompose the image into sub-bands. The figure below illustrates one such decomposition system which employs a two level octave-band decomposition scheme:
The sub-bands are constructed by cascading a series of filters with a down sampling. At each stage in filter bank, the number of samples is cut in half. The above diagram is actually not complete, the results of sub-band LL must be fed back into the system. The net effect is that the frequency band of the signal (image) is split into separate sub-bands in the frequency domain. Because each stage is successively downsampled by a factor of two, the resultant partition of the image in the frequency domain looks like the following:
For each sub-band depicted above, a version of the image showing edges at different resolutions is stored. To illustrate the visual results, the below simplified four quadrant uniform decomposition wavelet transform is shown on an actual image:
When combined, a close approximation of the original image is restored with the result:
Wavelet Decomposition Methods
Several methods for image decomposition exist, which include uniform decomposition (llustrated above with the bird), octave-band decomposition (illustrated with the sub-band coder diagram), and adaptive (wavelet-packet) decomposition. Octave band decomposition, which is used most frequently, continually decomposes the low frequency data of the image into narrower and narrower bands. Yet, recently, much research has been done on enhanced decomposition schemes. These include:
EZW was introduced in 1993 and has proven to be very effective.
Visual Performance
Compare the same set of images as described in JPEG compression: it can be noted that visual artifacts stemming from lossy wavelet-based compression does not take on the blockiness of JPEG compression. Artifacts under wavelet-based compression are much harder to
characterize; they are in the form of "smearing" of edges and uniform regions when significant coefficients that determine the boundary between regions are ignored.
A common rule of thumb between the comparison of JPEG and wavelet-based compression technologies states that JPEG images tend to be visually useless past compression ratios of 30:1, but that wavelet-based compression gracefully accepts compression ratios well beyond 100:1! To provide further visual proof of the superiority of wavelet-based compression over JPEG, witness the two images below, both compressions of an original at a 70:1 ratio. The image to the left uses DCT based JPEG compression, whereas the image to the right uses wavelet-based compression.
Conclusion
Currently, for photographic-quality images, JPEG is the standard compression format that is widely used to compress such images to sizes manageable by digital systems. However, while the DCT-based JPEG format performs well at moderate compression rates, higher compression rates reveal substantial artifacts that stem from JPEG’s 8x8 block coding scheme.
Research into wavelets has provided wavelet-based compression techniques that result in compression performance on the order of two to three times more effective than that of the JPEG standard. Moreover, these wavelet-based produce artifacts that are much less noticeable than the artifacts produced by JPEG.
By end of year 2000, work on the upcoming JPEG-2000 image compression standard will be complete. Because a vast majority of the 22 candidate algorithms under consideration for JPEG-2000 are wavelet based, it can be safely assumed that JPEG-2000 will most certainly by wavelet based.
Sources
http://www.faqs.org/faqs/jpeg-faq/
http://cm.bell-labs.com/cm/ms/who/jelena/Courses/98_ICASSP/2DWavelet/ImageCoder.html
http://www.icsl.ucla.edu/~ipl/psnr_results.html
http://www.mitre.org/technology/imagery_systems/imagelab/awic.html
http://users.intrepid.net/~hollyoak/wave.htm
http://www.acm.org/crossroads/xrds6-3/sahaimgcoding.html
http://www.faqs.org/faqs/compression-faq/part1/preamble.html
http://www331.jpl.nasa.gov/public/wave.html