This is a picture of a famous mathematician: Emmy Noether compressed in different ways
Introduction When retrieved from the Internet, digital images take a considerable amount of time to download and use a large amount of computer memory. The Haar wavelet transform that we will discuss in this application is one way of compressing digital images so they take less space when stored and transmitted. As we will see later, the word ``wavelet’’ stands for an orthogonal basis of a certain vector space.
The basic idea behind this method of compression is to treat a digital image as an array of numbers i.e., a matrix. Each image consists of a fairly large number of little squares called pixels (picture elements). The matrix corresponding to a digital image assigns a whole number to each pixel. For example, in the case of a 256x256 pixel gray scale image, the image is stored as a 256x256 matrix, with each element of the matrix being a whole number ranging from 0 (for black) to 225 (for white). The JPEG compression technique divides an image into 8x8 blocks and assigns a matrix to each block. One can use some linear algebra techniques to maximize compression of the image and maintain a suitable level of detail
Vector transform using Haar Wavelets Before we explain the transform of a matrix, let us see how the wavelets transform vectors (rows of a matrix). Suppose
is one row of an 8x8 image matrix. In general, if the data string has length equal to 2k, then the transformation process will consist of k steps. In the above case, there will be 3 steps since 8=23.
We perform the following operations on the entries of the vector r:
1. Divide the entries of r into four pairs: (420, 680), (448, 708), (1260, 1410), (1600, 600).
2. Form the average of each of these pairs:
These will form the first four entries of the next step vector r1.
3. Subtract each average from the first entry of the pair to get the numbers:
These will form the last four entries of the next step vector r1.
4. Form the new vector:
Note that the vector r1 can be obtained from r by multiplying r on the right by the matrix:
The first four coefficients of r1 are called the approximation coefficients and the last four entries are called the detail coefficients.
For our next step, we look at the first four entries of r1 as two pairs that we take their averages as in step 1 above. This gives the first two entries: 564, 1470 of the new vector r2. These are our new approximation coefficients. The third and the fourth entries of r2 are obtained by subtracting these averages from the first element of each pair. This results in the new detail coefficients: -14, -130. The last four entries of r2 are the same as the detail coefficients of r1:
Here the vector r2 can be obtained from r1 by multiplying r1 on the right by the matrix:
For the last step, average the first two entries of r2, and as before subtract the answer from the first entry. This results in the following vector:
As before, r3 can be obtained from r1 by multiplying r2 on the wright by the matrix:
As a consequence, one gets r3 immediately from r using the following equation
Note the following:
The fact the W is invertible allows us to retrieve our image from the compressed form using the relation
Suppose that A is the matrix corresponding to a certain image. The Haar transform is carried out by performing the above operations on each row of the matrix A and then by repeating the same operations on the columns of the resulting matrix. The row-transformed matrix is AW. Transforming the columns of AW is obtained by multiplying AW on the left by the matrix WT (the transpose of W). Thus, the Haar transform takes the matrix A and stores it as WTAW. Let S denote the transformed matrix:
Using the properties of inverse matrix, we can retrieve our original matrix:
This allows us to see the original image (decompressing the compressed image).
Let us try an example.
Example Suppose we have an 8x8 image represented by the matrix
The row-transformed matrix is
Transforming the columns of L is obtained as follows
The point of doing Haar wavelet transform is that areas of the original matrix that contain little variation will end up as zero elements in the transformed matrix. A matrix is considered sparse if it has a “high proportion of zero entries”. Sparse matrices take much less memory to store. Since we cannot expect the transformed matrices always to be sparse, we decide on a non-negative threshold value known as ε, and then we let any entry in the transformed matrix whose absolute value is less than ε to be reset to zero. This will leave us with a kind of sparse matrix. If ε is zero, we will not modify any of the elements.
Every time you click on an image to download it from the Internet, the source computer recalls the Haar transformed matrix from its memory. It first sends the overall approximation coefficients and larger detail coefficients and a bit later the smaller detail coefficients. As your computer receives the information, it begins reconstructing in progressively greater detail until the original image is fully reconstructed.
Let us first recall that an nxn square matrix A is called orthogonal if its columns form an orthonormal basis of Rn, that is the columns of A are pairwise orthogonal and the length of each column vector is 1. Equivalently, A is orthogonal if its inverse is equal to its transpose. That latter property makes retrieving the transformed image via the equation
Another powerful property of orthogonal matrices is that they preserve magnitude. In other words, if v is a vector of Rn and A is an orthogonal matrix, then ||Av||=||v||. Here is how it works:
This in turns shows that ||Av||=||v||. Also, the angle is preserved when the transformation is by orthogonal matrices: recall that the cosine of the angle between two vectors u and v is given by:
so, if A is an orthogonal matrix, ψ is the angle between the two vectors Au and Av, then
Since both magnitude and angle are preserved, there is significantly less distortion produced in the rebuilt image when an orthogonal matrix is used. Since the transformation matrix W is the product of three other matrices, one can normalize W by normalizing each of the three matrices. The normalized version of W is
Remark If you look closely at the process we described above, you will notice that the matrix W is nothing but a change of basis for R8. In other words, the columns of W form a new basis (a “very nice” one) of R8. So when you multiply a vector v (written in the standard basis) of R8 by W, what you get is the coordinates of v in this new basis. Some of these coordinates can be “neglected” using our threshold and this what allows the transformed matrix to be stored more easily and transmitted more quickly.
Compression ratio If we choose our threshold value ε to be positive (i.e. greater than zero), then some entries of the transformed matrix will be reset to zero and therefore some detail will be lost when the image is decompressed. The key issue is then to choose ε wisely so that the compression is done effectively with a minimum “damage” to the picture. Note that the compression ratio is defined as the ratio of nonzero entries in the transformed matrix (S=WTAW) to the number of nonzero entries in the compressed matrix obtained from S by applying the threshold ε.