Frequency Domain and Fourier Transform

One-Line Summary: The Fourier transform decomposes an image into a sum of sinusoidal components at different frequencies and orientations, enabling efficient filtering, compression, and analysis of periodic structures.

Prerequisites: Digital images and pixels, convolution and filtering, complex numbers, basic trigonometry.

What Is the Frequency Domain?

Consider a music equalizer on a stereo system. The raw audio signal is a complex waveform in the time domain, but the equalizer breaks it into separate frequency bands -- bass, midrange, treble -- so you can boost or cut each independently. The Fourier transform does the same thing for images, except now you have a 2D signal. Low frequencies correspond to smooth, slowly changing regions (like a clear sky), while high frequencies correspond to sharp transitions (edges, textures, noise). By converting an image to the frequency domain, you gain the ability to analyze and manipulate these components separately.

Mathematically, the 2D Discrete Fourier Transform (DFT) converts an $M \times N$ image $I [m, n]$ into a complex-valued frequency representation $F [u, v]$ :

$F [u, v] = \sum_{m = 0}^{M - 1} \sum_{n = 0}^{N - 1} I [m, n] \cdot e^{- j 2 π (\frac{u m}{M} + \frac{v n}{N})}$

The inverse DFT recovers the image exactly:

$I [m, n] = \frac{1}{M N} \sum_{u = 0}^{M - 1} \sum_{v = 0}^{N - 1} F [u, v] \cdot e^{j 2 π (\frac{u m}{M} + \frac{v n}{N})}$

Each complex coefficient $F [u, v]$ encodes the magnitude (amplitude of that sinusoidal component) and phase (its spatial shift). The frequency coordinates $(u, v)$ specify cycles per image width and height, respectively.

How It Works

Computing the DFT with FFT

The naive DFT requires $O (M^{2} N^{2})$ operations. The Fast Fourier Transform (Cooley-Tukey, 1965) exploits symmetry and periodicity to reduce this to $O (M N lo g (M N))$ , making practical computation possible. For a 1024x1024 image, this is the difference between ~ $1 0^{12}$ and ~ $2 \times 1 0^{7}$ operations.

import numpy as np
import cv2
 
img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
 
# Compute 2D DFT
dft = np.fft.fft2(img)
 
# Shift zero-frequency to center for visualization
dft_shifted = np.fft.fftshift(dft)
 
# Magnitude spectrum (log scale for visibility)
magnitude = np.log1p(np.abs(dft_shifted))
 
# Phase spectrum
phase = np.angle(dft_shifted)

The Magnitude and Phase Spectra

The magnitude spectrum $∣ F [u, v] ∣$ tells you how much energy is present at each frequency and orientation. A bright spot at coordinates $(u, v)$ in the centered spectrum indicates a strong sinusoidal pattern at that frequency tilted at angle $θ = arctan (v / u)$ .

The phase spectrum $∠ F [u, v]$ encodes the spatial position of each sinusoidal component. A striking fact: the phase carries more structural information than the magnitude. If you swap the magnitude of one image with another but keep the phase, the result resembles the image whose phase was retained.

The Convolution Theorem

Convolution in the spatial domain equals element-wise multiplication in the frequency domain:

$I * K = F^{- 1} {F {I} \cdot F {K}}$

This means a spatial convolution with an $k \times k$ kernel can be performed as: (1) FFT the image, (2) FFT the kernel (zero-padded to image size), (3) multiply point-wise, (4) inverse FFT. For large kernels ( $k > 11$ or so), this is faster than direct spatial convolution.

Frequency-Domain Filtering

Filters are designed as 2D masks applied to the frequency spectrum:

Low-pass filter -- Passes low frequencies, blocks high frequencies. Effect: blurring.

$H_{ideal} (u, v) = {10 if u^{2} + v^{2} \leq D_{0} otherwise$

Ideal low-pass filters cause ringing artifacts (Gibbs phenomenon) because of the sharp cutoff. Butterworth and Gaussian filters provide smoother transitions:

$H_{Butterworth} (u, v) = \frac{1}{1 + ( \frac{u ^{2} + v ^{2}}{D _{0}} ) ^{2 n}}$

$H_{Gaussian} (u, v) = exp (- \frac{u ^{2} + v ^{2}}{2 D _{0}^{2}})$

High-pass filter -- $H_{HP} = 1 - H_{LP}$ . Effect: edge enhancement.

Band-pass / Band-reject -- Isolate or remove specific frequency ranges. Useful for removing periodic noise (e.g., scan lines in digitized film).

# Gaussian low-pass filter in frequency domain
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2
D0 = 30  # cutoff frequency
 
u = np.arange(rows).reshape(-1, 1) - crow
v = np.arange(cols).reshape(1, -1) - ccol
H = np.exp(-(u**2 + v**2) / (2 * D0**2))
 
# Apply filter
filtered_dft = dft_shifted * H
result = np.real(np.fft.ifft2(np.fft.ifftshift(filtered_dft)))

The Discrete Cosine Transform (DCT)

The DCT is a real-valued variant of the DFT widely used in compression. JPEG compression works by:

Splitting the image into 8x8 blocks.
Applying the 2D DCT to each block.
Quantizing the DCT coefficients (lossy step).
Entropy coding the quantized values.

The DCT concentrates most image energy into a few low-frequency coefficients, enabling high compression ratios by discarding small high-frequency coefficients.

Why It Matters

The convolution theorem enables $O (N lo g N)$ filtering for large kernels, which is critical in applications like deconvolution and astronomical image processing.
JPEG, H.264, and HEVC all rely on frequency-domain transforms (DCT) for compression, making the Fourier perspective essential for understanding how images are stored and transmitted.
Frequency analysis can identify and remove periodic noise patterns (e.g., moire, scan lines) that are nearly impossible to address in the spatial domain.
The frequency content of an image determines appropriate sampling rates (connecting back to the Nyquist theorem) and informs the design of anti-aliasing filters.

Key Technical Details

A 1024x1024 FFT takes approximately 2-5 ms on a modern CPU (using FFTW or NumPy's FFT); padding to a power of 2 can speed up computation by 2-3x for non-power-of-2 sizes.
The frequency resolution of the DFT is $Δ f = 1/ N$ cycles per pixel for an $N$ -pixel dimension; longer signals yield finer frequency resolution.
The DFT is periodic with period $M \times N$ , which is why the output wraps around and fftshift is needed for centered visualization.
Phase information is critical for reconstruction; reconstructing from magnitude alone produces a blurred, unrecognizable ghost image.
Zero-padding the image before FFT does not add information but interpolates the frequency spectrum for smoother visualization and avoids circular convolution artifacts.

Common Misconceptions

"The Fourier transform only applies to 1D signals." The 2D DFT decomposes images into planar sinusoids characterized by both frequency and orientation. The concept extends naturally to 3D (video volumes) and higher dimensions.
"High-frequency content is always noise." Edges, textures, and fine details are all high-frequency content. Aggressive low-pass filtering removes legitimate detail along with noise. The challenge in denoising is distinguishing signal from noise in overlapping frequency bands.
"The magnitude spectrum alone characterizes an image." As demonstrated by Oppenheim and Lim (1981), two images with swapped magnitudes but preserved phases look like the phase-donor images, not the magnitude-donor images. Phase dominates structural appearance.

Connections to Other Concepts

convolution-and-filtering.md: The convolution theorem links spatial filtering directly to frequency-domain multiplication. Understanding one domain deepens understanding of the other.
image-pyramids-and-scale-space.md: Gaussian pyramid construction can be understood as successive low-pass filtering in the frequency domain, with each level retaining a progressively narrower band of frequencies.
image-noise-and-denoising.md: Wiener filtering and spectral subtraction are denoising techniques formulated entirely in the frequency domain.
digital-images-and-pixels.md: The Nyquist-Shannon sampling theorem, which governs image acquisition, is a frequency-domain concept.