One-Line Summary: Corner detection identifies points where image intensity changes sharply in multiple directions, producing stable landmarks for tracking and matching via methods like Harris and Shi-Tomasi.
Prerequisites: Image gradients, edge detection, convolution and filtering, eigenvalues (linear algebra)
What Is Corner Detection?
Think of standing at a street intersection versus walking along a straight road. On the road, sliding forward or backward looks similar; at the intersection, any direction of movement changes the scene. A corner in an image works the same way: it is a point where shifting a small window in any direction produces a significant change in pixel intensities.
Technically, a corner is a location where the local autocorrelation function of the image has high curvature in all directions. This is captured by analyzing the eigenvalues of a structure tensor computed from image gradients.
How It Works
The Structure Tensor
For a grayscale image $I$, compute the gradient components $I_x$ and $I_y$ (e.g., via Sobel). At each pixel, form the structure tensor (also called the second-moment matrix):

$$M = \sum_{(u,v) \in W} w(u,v) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$

where $W$ is a local window and $w(u,v)$ is typically a Gaussian weighting function with $\sigma = 1$--$2$ pixels.
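As a concrete sketch of this construction (the synthetic test image and the choice $\sigma = 1.5$ are assumptions for illustration), the tensor entries and its two eigenvalues can be computed in closed form, since $M$ is a symmetric $2 \times 2$ matrix at every pixel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

# Hypothetical test image: a bright square on black; its four corners
# should show two large eigenvalues, its edges only one.
img = np.zeros((64, 64), dtype=np.float32)
img[16:48, 16:48] = 1.0

# Gradient components via Sobel derivatives.
Ix = sobel(img, axis=1)  # x-derivative (along columns)
Iy = sobel(img, axis=0)  # y-derivative (along rows)

# Gaussian-weighted entries of the structure tensor M = [[A, C], [C, B]].
sigma = 1.5  # assumed window scale
A = gaussian_filter(Ix * Ix, sigma)
B = gaussian_filter(Iy * Iy, sigma)
C = gaussian_filter(Ix * Iy, sigma)

# Closed-form eigenvalues of the 2x2 symmetric matrix at every pixel.
half_trace = (A + B) / 2.0
disc = np.sqrt(np.maximum(half_trace**2 - (A * B - C * C), 0.0))
lam1 = half_trace + disc  # larger eigenvalue
lam2 = half_trace - disc  # smaller eigenvalue
```

At the square's corners both `lam1` and `lam2` are large; along its edges `lam2` collapses to zero, which is exactly the edge/corner distinction the detectors below exploit.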
Harris Corner Detector
Harris and Stephens (1988) avoid explicit eigenvalue computation by defining a corner response function:

$$R = \det(M) - k\,\mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2$$

where $k$ is an empirical constant, typically $0.04$--$0.06$. The classification rule is:
- $R > 0$ (positive, large): corner -- both eigenvalues are large.
- $R < 0$ (negative, large magnitude): edge -- one eigenvalue dominates.
- $|R| \approx 0$: flat region -- both eigenvalues are small.
After computing $R$ for every pixel, apply non-maximum suppression in a local neighborhood (commonly $3 \times 3$ or $5 \times 5$) to retain only the sharpest corners.
Shi-Tomasi ("Good Features to Track")
Shi and Tomasi (1994) simplified the criterion to:

$$R = \min(\lambda_1, \lambda_2)$$

A point is a corner if $\min(\lambda_1, \lambda_2) > \tau$ for a chosen threshold $\tau$. This directly measures the weaker of the two gradient directions and has been shown experimentally to produce features that are more reliably tracked across frames. OpenCV's goodFeaturesToTrack implements this method.
Code Example
```python
import cv2
import numpy as np

img = cv2.imread("building.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)

# Harris corners
harris = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners_harris = harris > 0.01 * harris.max()

# Shi-Tomasi corners
shi_tomasi = cv2.goodFeaturesToTrack(
    gray, maxCorners=500, qualityLevel=0.01, minDistance=10
)
```

Sub-pixel Refinement
For applications demanding high geometric precision (stereo vision, calibration), detected corners are refined to sub-pixel accuracy using cv2.cornerSubPix, which iteratively solves the system $\sum_p \nabla I(p)\,\nabla I(p)^\top (q - p) = 0$ for the refined corner location $q$ within a search window, using the fact that the image gradient at each nearby point $p$ is orthogonal to the vector from $p$ to the true corner. This typically improves localization from integer-pixel to 0.05--0.1 pixel accuracy.
Why It Matters
- Corners serve as the primary interest points for feature matching, visual odometry, and SLAM pipelines.
- The Shi-Tomasi detector is the default front-end for the KLT (Kanade-Lucas-Tomasi) tracker, one of the most deployed tracking algorithms in history.
- Camera calibration depends on accurate corner detection in checkerboard patterns to estimate intrinsic and extrinsic parameters.
- Corner features are inherently more discriminative than edge pixels because they encode 2D structure, enabling unambiguous correspondence.
Key Technical Details
- Harris response computation costs roughly 15 multiply-accumulate operations per pixel (using Sobel gradients and a Gaussian window).
- The Harris detector is rotation-invariant but not scale-invariant. Multi-scale extensions build Gaussian pyramids and detect at each level.
- Shi-Tomasi requires an explicit eigenvalue decomposition (or equivalent), adding approximately 30% compute over Harris but improving tracking stability.
- On a 640x480 image, goodFeaturesToTrack with 500 corners runs in under 2 ms on a modern CPU.
- The FAST detector (Rosten and Drummond, 2006) tests a Bresenham circle of 16 pixels and runs 5--10x faster than Harris, making it the go-to choice for real-time systems like ORB-SLAM.
Common Misconceptions
- "Harris corners correspond to geometric corners of objects." Not necessarily. Harris detects any point with high gradient variation in two directions -- textured patches, T-junctions, and even some texture patterns qualify.
- "The parameter in Harris does not matter much." Values outside the -- range can dramatically change the number and distribution of detected points. Lower biases toward edges; higher suppresses corners with moderate response.
- "Shi-Tomasi always outperforms Harris." Shi-Tomasi is better for tracking because it directly optimizes the weaker gradient direction, but Harris can be preferred in recognition tasks where the full response distribution matters.
Connections to Other Concepts
- edge-detection.md: Corners are special cases where two or more edges meet; the structure tensor generalizes the gradient magnitude used in edge detectors.
- sift.md: SIFT keypoints use Difference-of-Gaussian extrema rather than Harris, but the underlying principle of seeking high-information locations is the same.
- optical-flow.md: The KLT tracker explicitly requires corners (Shi-Tomasi features) because the linear system for flow estimation is only well-conditioned at points with two large eigenvalues.
- camera-calibration-and-geometry.md: Checkerboard corner detection is the first step in Zhang's calibration method.
Further Reading
- Harris, C. and Stephens, M., "A Combined Corner and Edge Detector" (1988) -- The original Harris detector paper.
- Shi, J. and Tomasi, C., "Good Features to Track" (1994) -- Eigenvalue-based reformulation with tracking experiments.
- Rosten, E. and Drummond, T., "Machine Learning for High-Speed Corner Detection" (2006) -- The FAST corner detector, learned from a decision tree on pixel comparisons.