Image Processing and Computer Vision

Study Notes

JPEG (Joint Photographic Experts Group) is a standard for compressing image files.
The main goal of compression is to reduce the file size, and therefore the storage needed for the image.

In JPEG, the image is first converted from the RGB (Red, Green, Blue) color space to the YCbCr (luminance and chrominance) color space.
This is because the human eye is more sensitive to differences in brightness than to differences in color.
The chroma (color) channels can therefore be subsampled to reduce the amount of data with little visible loss.
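
The conversion and subsampling steps above can be sketched as follows. This is a minimal illustration, assuming the full-range JFIF conversion coefficients and 4:2:0 subsampling by averaging 2x2 blocks; the function names and the tiny test image are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (assumed JFIF full-range coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(channel):
    """Average each 2x2 block of a chroma channel (even dimensions assumed)."""
    rows, cols = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# Hypothetical 2x2 RGB image, used only for illustration.
image = [[(255, 0, 0), (250, 5, 5)],
         [(245, 10, 10), (240, 15, 15)]]

ycbcr = [[rgb_to_ycbcr(*px) for px in row] for row in image]
y_plane = [[px[0] for px in row] for row in ycbcr]  # luminance stays at full resolution
cb_plane = subsample_420([[px[1] for px in row] for row in ycbcr])
cr_plane = subsample_420([[px[2] for px in row] for row in ycbcr])
```

Here the luminance plane keeps all four samples, while each chroma plane is reduced to a single averaged sample.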

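Quantization reduces the precision of the DCT coefficients.
Each coefficient is divided by the corresponding entry of a quantization matrix and rounded to the nearest integer.
The entries are larger for high frequencies, so many high-frequency coefficients become zero.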
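
As a rough sketch of this step, the example below quantizes and then restores a small block. The 4x4 coefficient values and step sizes are hypothetical (JPEG itself works on 8x8 blocks), and Python's built-in `round` stands in for the rounding rule.

```python
def quantize(block, q):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[round(c / s) for c, s in zip(brow, qrow)]
            for brow, qrow in zip(block, q)]


def dequantize(levels, q):
    """Multiply each level by its step size (the decoder side)."""
    return [[lv * s for lv, s in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, q)]


# Hypothetical DCT coefficients: large in the top-left, small elsewhere.
coeffs = [[240.0, 31.0, -9.0, 3.0],
          [-26.0, 12.0, 4.0, -2.0],
          [8.0, -5.0, 2.0, 1.0],
          [-3.0, 2.0, -1.0, 0.5]]

# Hypothetical step sizes that grow toward the high frequencies.
q = [[16, 11, 10, 16],
     [12, 12, 14, 19],
     [14, 13, 16, 24],
     [14, 17, 22, 29]]

levels = quantize(coeffs, q)       # most high-frequency levels become 0
restored = dequantize(levels, q)   # close to coeffs, but not identical
```

The difference between `coeffs` and `restored` is the information lost to quantization; it cannot be recovered by the decoder.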

AC (Alternating Current) coefficients are encoded using run-length encoding and Huffman coding.
DC (Direct Current) coefficients are encoded as the difference from the previous block's DC coefficient, and this difference is Huffman coded.
Huffman coding represents frequent symbols with short codes and less frequent symbols with longer codes.
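
The two entropy-coding ideas above can be illustrated with a simplified sketch. It assumes (zero run, value) pairs with `(0, 0)` as the end-of-block marker, and it builds a Huffman table from hypothetical symbol counts; the actual JPEG symbol format and its table handling are more involved.

```python
import heapq


def run_length_encode(ac):
    """Encode AC coefficients as (zero_run, value) pairs, ending with (0, 0)."""
    pairs = []
    run = 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block marker replaces any trailing zeros
    return pairs


def huffman_codes(counts):
    """Build a prefix code: frequent symbols receive shorter bit strings."""
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


pairs = run_length_encode([5, 0, 0, -3, 0, 0, 0])

# Hypothetical symbol counts gathered over many blocks.
counts = {(0, 0): 8, (0, 1): 4, (1, 1): 2, (2, -3): 1}
codes = huffman_codes(counts)
```

With these counts, the most frequent symbol gets a 1-bit code and the two rarest symbols get 3-bit codes.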

Decoding involves inverse quantization, inverse DCT, and color space conversion back to RGB.

Compression ratio is the ratio of the original image size to the compressed image size.
Artifacts (distortions) appear after lossy compression; they become more visible as the compression ratio increases, so image quality drops as the ratio goes up.
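
A short worked example of the compression ratio, using hypothetical sizes: an uncompressed 512x512 RGB image with one byte per channel, and an assumed compressed file of 65,536 bytes.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of original size to compressed size (higher means smaller files)."""
    return original_bytes / compressed_bytes


original = 512 * 512 * 3   # 786,432 bytes for the raw RGB image
compressed = 65536         # hypothetical JPEG file size in bytes
ratio = compression_ratio(original, compressed)  # 12.0, i.e. 12:1
```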

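### Fourier Transform

The 2D DFT is used for image processing. For an M × N image x(i, j) it is given by

$$X(k, L) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} x(i, j)\, e^{-\mathrm{j} 2\pi \left( \frac{ki}{M} + \frac{Lj}{N} \right)}$$

where the leading $\mathrm{j}$ in the exponent is the imaginary unit, distinct from the summation index j.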
The 2D discrete Fourier transform (2D DFT) can also be used for image reconstruction.
The top-left coefficients of the 2D DFT represent low-frequency components, which carry the general shape of the image.
Low-frequency components retain a high percentage of the signal energy, resulting in larger amplitude compared to high-frequency components.

Reconstructing the image from the top-left coefficients (low-frequency components) gives a good approximation of the original.
Reconstructing it from the bottom-right coefficients (high-frequency components) gives a poor approximation, because they carry only fine detail.
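
The comparison above can be tried with a small pure-Python sketch. It uses a naive DFT on a hypothetical 8x8 ramp image; in practice a library FFT would be used. One assumption to note: in an unshifted DFT, index k and index M - k correspond to the same frequency magnitude, so this sketch measures frequency as min(k, M - k) rather than by position in the array alone.

```python
import cmath


def dft2(x, inverse=False):
    """Naive 2D DFT (or inverse DFT) of a list-of-lists image."""
    M, N = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * N for _ in range(M)]
    for k in range(M):
        for L in range(N):
            total = 0j
            for i in range(M):
                for j in range(N):
                    angle = sign * 2 * cmath.pi * (k * i / M + L * j / N)
                    total += x[i][j] * cmath.exp(1j * angle)
            out[k][L] = total / (M * N) if inverse else total
    return out


def reconstruct(X, keep):
    """Zero every coefficient rejected by keep(fk, fl), then invert."""
    M, N = len(X), len(X[0])
    masked = [
        [X[k][L] if keep(min(k, M - k), min(L, N - L)) else 0j
         for L in range(N)]
        for k in range(M)
    ]
    return [[v.real for v in row] for row in dft2(masked, inverse=True)]


def mse(a, b):
    """Mean squared error between two images of equal size."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


# Hypothetical smooth test image: a diagonal brightness ramp.
image = [[float(i + j) for j in range(8)] for i in range(8)]
X = dft2(image)

low = reconstruct(X, lambda fk, fl: max(fk, fl) <= 1)   # 9 low-frequency coefficients
high = reconstruct(X, lambda fk, fl: min(fk, fl) >= 3)  # 9 high-frequency coefficients

mse_low = mse(image, low)
mse_high = mse(image, high)
```

For this smooth image, `mse_low` comes out much smaller than `mse_high`, which matches the observation that low-frequency coefficients hold most of the signal energy.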

$X_d$ is a complex number that encodes the amplitude and phase of one complex sinusoidal component.
The amplitude and phase are calculated from $Re(X_d)$ and $Im(X_d)$: $Amp = \sqrt{Re(X_d)^2 + Im(X_d)^2}$ and $Ph = \operatorname{atan2}(Im(X_d), Re(X_d))$.
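
A minimal sketch of this calculation for a single coefficient, using only the standard library; the function name and the sample value are hypothetical.

```python
import math


def amplitude_phase(xd):
    """Return (amplitude, phase in radians) of one complex DFT coefficient."""
    amp = math.hypot(xd.real, xd.imag)   # sqrt(Re^2 + Im^2)
    ph = math.atan2(xd.imag, xd.real)    # angle in (-pi, pi]
    return amp, ph


amp, ph = amplitude_phase(3 + 4j)  # amplitude is 5.0
```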

Displayed as an image, the phase of the original image does not reveal much visually recognizable information.
The amplitude image appears mostly black because a few very large values dominate the display range.
Normalization with $f(x, y) = \log(1 + Amp(x, y))$ is applied to improve visualization.
Low-frequency components are shifted to the center for better visualization.
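
Both visualization steps can be sketched in a few lines. The helper names and the sample amplitudes are hypothetical, and the shift mirrors what a library routine such as `numpy.fft.fftshift` does, written out here in plain Python.

```python
import math


def log_normalize(amp):
    """Map amplitudes to 0..255 using f = log(1 + Amp), scaled by the peak."""
    logged = [[math.log1p(a) for a in row] for row in amp]
    peak = max(max(row) for row in logged)
    if peak == 0:
        return [[0 for _ in row] for row in logged]
    return [[round(255 * v / peak) for v in row] for row in logged]


def center_shift(spectrum):
    """Move the zero-frequency term from the top-left corner to the center."""
    M, N = len(spectrum), len(spectrum[0])
    return [
        [spectrum[(i - M // 2) % M][(j - N // 2) % N] for j in range(N)]
        for i in range(M)
    ]


# Hypothetical amplitudes: one huge DC term and much smaller values elsewhere.
amp = [[1000000.0, 50.0],
       [20.0, 1.0]]

linear = [[round(255 * a / 1000000.0) for a in row] for row in amp]
display = center_shift(log_normalize(amp))
```

With linear scaling, every value except the DC term rounds to 0 (black), while the log-scaled version keeps the smaller values visible.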

JPEG uses the DCT (Discrete Cosine Transform) instead of the DFT because the DCT concentrates the signal energy in a small number of coefficients.
The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT.
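
The energy-concentration claim can be checked on a small example. The sketch below assumes an orthonormal 1D DCT-II and a naive 1D DFT applied to a hypothetical 8-sample ramp, and compares how much of the total energy lands in the first two coefficients of each transform.

```python
import cmath
import math


def dct_1d(x):
    """Orthonormal DCT-II of a real sequence."""
    N = len(x)
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        ))
    return out


def dft_1d(x):
    """Naive DFT of a sequence."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def energy_in_first(coeffs, count):
    """Fraction of total energy held by the first `count` coefficients."""
    energy = [abs(c) ** 2 for c in coeffs]
    return sum(energy[:count]) / sum(energy)


signal = [float(n) for n in range(8)]  # hypothetical smooth ramp
dct_share = energy_in_first(dct_1d(signal), 2)
dft_share = energy_in_first(dft_1d(signal), 2)
```

For this ramp, `dct_share` is noticeably higher than `dft_share`. A single signal is only an illustration, not a proof, but it shows the behavior the notes describe.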

A low-pass filter in the frequency domain smooths the image by attenuating high-frequency components.
An ideal low-pass filter is defined by a cut-off frequency C and is applied by element-wise multiplication with the transformed image.

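The filter H(u, v) keeps frequencies whose distance from the center of the shifted spectrum is at most the cut-off frequency C and removes the rest: H(u, v) = 1 if D(u, v) ≤ C, and 0 otherwise.
The distance $D(u, v) = \sqrt{u^2 + v^2}$, with u and v measured from the center, determines the filter values.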
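
A minimal sketch of the ideal low-pass mask and its element-wise application, assuming the spectrum has already been shifted so that the zero frequency sits at index (M // 2, N // 2). The function names and the sample spectrum are hypothetical.

```python
import math


def ideal_low_pass(M, N, C):
    """Return an M x N mask with 1 where D(u, v) <= C, measured from the center."""
    cu, cv = M // 2, N // 2
    return [
        [1 if math.hypot(u - cu, v - cv) <= C else 0 for v in range(N)]
        for u in range(M)
    ]


def apply_filter(spectrum, H):
    """Multiply a centered spectrum by the mask, element by element."""
    return [[s * h for s, h in zip(srow, hrow)]
            for srow, hrow in zip(spectrum, H)]


H = ideal_low_pass(5, 5, 1)

# Hypothetical centered spectrum with every coefficient equal to 1 + 1j.
spectrum = [[1 + 1j for _ in range(5)] for _ in range(5)]
filtered = apply_filter(spectrum, H)
```

With C = 1 on a 5x5 grid, only the center and its four direct neighbors survive; applying the inverse transform to `filtered` would give the smoothed image.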

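A high-pass filter in the frequency domain is used to detect edges in the image by attenuating low-frequency components.
The ideal high-pass filter is the complement of the ideal low-pass filter: it equals 1 wherever the low-pass filter equals 0, and 0 wherever the low-pass filter equals 1.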
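
The relationship between the two ideal filters can be sketched as follows, starting from a small hypothetical low-pass mask centered on the zero frequency.

```python
def ideal_high_pass(low_pass_mask):
    """Complement a 0/1 low-pass mask: pass what it blocks, block what it passes."""
    return [[1 - h for h in row] for row in low_pass_mask]


# Hypothetical 3x3 ideal low-pass mask (cut-off 1 around the center).
low_pass = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]

high_pass = ideal_high_pass(low_pass)
```

The resulting mask removes the center (low frequencies, including the average brightness) and keeps the outer coefficients, which is why the filtered image emphasizes edges.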