One-Line Summary: Image interpolation estimates pixel values at non-integer coordinates by combining nearby known samples, enabling image resizing, rotation, warping, and any geometric transformation that maps output pixels to fractional input positions.
Prerequisites: Digital images and pixels, convolution and filtering, basic polynomial math.
What Is Image Interpolation?
Imagine you have a thermometer reading every 10 meters along a hiking trail, and you want to estimate the temperature at a point 7 meters past one sensor. You could take the nearest sensor's reading (nearest-neighbor), average the two flanking sensors (linear interpolation), or fit a smooth curve through several nearby sensors (cubic interpolation). Image interpolation does the same thing in 2D: when a geometric transformation (scaling, rotation, perspective warp) maps an output pixel to a fractional position in the input image, you must estimate the value at that non-integer coordinate from the surrounding integer-grid samples.
Formally, given a discrete image , interpolation constructs a continuous function such that at integer coordinates, and provides reasonable estimates at all real-valued .
The general framework uses an interpolation kernel :
This is a convolution of the discrete samples with the interpolation kernel, and different kernels yield different methods with different quality/speed tradeoffs.
How It Works
Nearest-Neighbor Interpolation
The simplest method: assign the value of the closest integer-coordinate pixel.
Kernel: A box function of width 1 centered at the origin.
Pros: Fastest. Preserves hard edges (no blending). Preserves exact pixel values. Cons: Produces blocky, jagged artifacts (aliasing) on diagonal edges when upscaling. Not suitable for smooth imagery.
Use case: Label maps, segmentation masks, and palette-indexed images where interpolated values would be meaningless.
Bilinear Interpolation
Performs linear interpolation in two directions. For a query point surrounded by four neighbors at integer coordinates:
where , , is the top-left neighbor, and .
Kernel: Triangle (tent) function: .
Pros: Smooth results. Fast (4 multiplications + 3 additions per pixel). Cons: Slight blurring, especially noticeable in text and fine patterns. Only continuous (smooth but with slope discontinuities).
Bicubic Interpolation
Fits a cubic polynomial through 16 surrounding neighbors (4x4 grid). The standard cubic convolution kernel (Keys, 1981):
where (the Catmull-Rom spline, which matches the ideal sinc function through the sample points) or (sharper, often preferred for photographic images).
Pros: continuous. Sharper than bilinear with fewer artifacts. The default for most image editing software. Cons: Slightly negative lobes can introduce small ringing near high-contrast edges. 4x slower than bilinear (16 samples per query).
import cv2
img = cv2.imread("photo.jpg")
# Resize to 2x with different interpolation methods
h, w = img.shape[:2]
new_size = (w * 2, h * 2)
nearest = cv2.resize(img, new_size, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, new_size, interpolation=cv2.INTER_LINEAR)
bicubic = cv2.resize(img, new_size, interpolation=cv2.INTER_CUBIC)
lanczos = cv2.resize(img, new_size, interpolation=cv2.INTER_LANCZOS4)Lanczos Interpolation
Uses a windowed sinc kernel with support over samples (typically , examining 36 neighbors in 2D):
where .
The sinc function is the ideal interpolation kernel (it perfectly reconstructs bandlimited signals), but it has infinite support. Lanczos approximates it with a finite window, providing the best frequency-domain characteristics among practical interpolation methods.
Pros: Sharpest results with minimal aliasing. Best for high-quality photographic resampling. Cons: Slowest (36 samples per query for Lanczos-3). Can produce visible ringing on very sharp edges.
Anti-Aliasing for Downsampling
Interpolation methods described above are designed for upsampling (mapping to positions between existing samples). When downsampling (reducing resolution), you must first apply a low-pass filter to prevent aliasing -- the same principle as the Nyquist theorem.
The proper pipeline for downsampling by factor :
- Apply a Gaussian blur with .
- Resample at the lower resolution using any interpolation method.
OpenCV's INTER_AREA mode handles this automatically for downsampling by averaging over source pixel regions, producing correct results without a separate pre-filter step.
# Correct downsampling (halving resolution)
small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
# Incorrect: using bicubic for downsampling without pre-filtering
# Can introduce aliasing artifacts
small_bad = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_CUBIC)Geometric Transformations
Interpolation is triggered whenever a geometric transformation maps output coordinates to non-integer input coordinates . Common transformations:
| Transformation | DOF | Preserves |
|---|---|---|
| Translation | 2 | Everything |
| Rigid (Euclidean) | 3 | Distances, angles |
| Similarity | 4 | Angles, ratios |
| Affine | 6 | Parallelism |
| Projective (homography) | 8 | Straight lines |
All of these require interpolation at the transformed coordinates. The inverse mapping approach is standard: for each output pixel, compute where it came from in the input, then interpolate.
# Rotation by 30 degrees around center using affine transform
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle=30, scale=1.0)
rotated = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
borderMode=cv2.BORDER_REFLECT)Why It Matters
- Every image resizing operation -- from displaying a thumbnail to preparing training data at a fixed input size for a neural network -- requires interpolation. The choice of method directly affects visual quality and downstream accuracy.
- Data augmentation for deep learning (random crops, rotations, affine transforms) involves repeated interpolation; using bilinear instead of nearest-neighbor can measurably affect training outcomes.
- Panorama stitching, image registration, and optical flow visualization all rely on sub-pixel accurate interpolation to produce seamless results.
- Medical imaging standards (DICOM) specify interpolation requirements for displaying images at non-native resolutions to ensure diagnostic quality is maintained.
Key Technical Details
- Bilinear interpolation on a 1-megapixel image takes approximately 1 ms on a modern CPU; bicubic takes approximately 4 ms; Lanczos-3 approximately 10 ms.
- For CNN input preprocessing, bilinear is the most common choice (used by default in PyTorch's
F.interpolateand TensorFlow'stf.image.resize), balancing quality and speed. - Bicubic with (Catmull-Rom) exactly reproduces third-order polynomial interpolation and achieves approximation order, compared to for bilinear and for nearest-neighbor.
- In GPU texture sampling, bilinear interpolation is hardware-accelerated and essentially free (built into the texture unit), which is why it is the standard for real-time rendering.
- Pillow (PIL) uses Lanczos-3 as its default high-quality resizing filter (
Image.LANCZOS), while OpenCV defaults to bilinear (INTER_LINEAR).
Common Misconceptions
-
"Bicubic is always better than bilinear." For downsampling without pre-filtering, bicubic can actually produce worse results (more aliasing) than
INTER_AREAbecause it does not properly band-limit the signal. The "best" method depends on the operation direction and content type. -
"Interpolation creates new information." Interpolation estimates values between known samples based on assumptions about signal smoothness. It cannot recover information that was lost during the original sampling. True super-resolution requires learned priors about natural image statistics.
-
"Nearest-neighbor is always inferior." For categorical data (segmentation masks, label maps, indexed color), nearest-neighbor is the only correct choice. Bilinear interpolation of a binary mask produces gray values that have no semantic meaning; bilinear on a class-index map creates nonexistent class indices.
Connections to Other Concepts
digital-images-and-pixels.md: Interpolation reconstructs the continuous signal from discrete samples -- the inverse of the sampling process described in image formation.convolution-and-filtering.md: Every interpolation method can be expressed as convolution with an interpolation kernel, connecting resampling theory to filtering theory.image-pyramids-and-scale-space.md: Pyramid construction requires downsampling (with anti-aliasing), and Laplacian pyramid reconstruction requires upsampling -- both are interpolation operations.frequency-domain-and-fourier-transform.md: The ideal interpolation kernel (sinc function) is derived from the sampling theorem in the frequency domain. Practical kernels are finite approximations to the sinc, and their frequency responses determine aliasing behavior.
Further Reading
- Keys, "Cubic Convolution Interpolation for Digital Image Processing" (1981) -- Derives the cubic convolution kernel family and proves the Catmull-Rom variant () achieves third-order accuracy.
- Thevenaz et al., "Interpolation Revisited" (2000) -- Comprehensive comparison of interpolation methods from a signal processing perspective, including B-splines and their computational properties.
- Parker et al., "Comparison of Interpolating Methods for Image Resampling" (1983) -- Early systematic comparison of nearest-neighbor, bilinear, bicubic, and sinc-based methods with visual and quantitative evaluation.
- Szeliski, "Computer Vision: Algorithms and Applications" (2nd ed., 2022) -- Section 3.5 covers resampling, anti-aliasing, and the connection between interpolation and the sampling theorem.