One-Line Summary: Intersection over Union (IoU) measures the overlap between two bounding boxes as the ratio of their intersection area to their union area, serving as the universal metric for evaluating localization quality in object detection.
Prerequisites: Bounding box representation, object detection fundamentals
What Is Intersection over Union?
Picture two rectangles of transparent colored film laid on a table -- one red (your prediction) and one blue (the ground truth). Where they overlap, you see purple. IoU asks: what fraction of the total colored area is purple? If the rectangles are perfectly aligned, the answer is 1.0 (100% overlap). If they do not touch at all, the answer is 0.0. This single number captures how well a predicted bounding box matches the true object location.
Technically, Intersection over Union (also called the Jaccard index for sets) is defined as:

$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ and $B$ are two bounding box regions. IoU ranges from 0 (no overlap) to 1 (perfect overlap). It is symmetric: $\text{IoU}(A, B) = \text{IoU}(B, A)$.
How It Works
Computing IoU for Axis-Aligned Boxes
Given two boxes defined by their corners:
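- Box A: $(x_1^A, y_1^A, x_2^A, y_2^A)$
- Box B: $(x_1^B, y_1^B, x_2^B, y_2^B)$

Step 1: Compute intersection coordinates:

$$x_1^I = \max(x_1^A, x_1^B), \quad y_1^I = \max(y_1^A, y_1^B), \quad x_2^I = \min(x_2^A, x_2^B), \quad y_2^I = \min(y_2^A, y_2^B)$$

Step 2: Compute intersection area:

$$\text{Area}_I = \max(0,\ x_2^I - x_1^I) \cdot \max(0,\ y_2^I - y_1^I)$$

Step 3: Compute union area:

$$\text{Area}_U = \text{Area}_A + \text{Area}_B - \text{Area}_I$$

Step 4: Compute IoU:

$$\text{IoU} = \frac{\text{Area}_I}{\text{Area}_U}$$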
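
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are `(x1, y1, x2, y2)` tuples with `x1 <= x2` and `y1 <= y2`; the function name `iou` and the choice to return 0 for two zero-area boxes are assumptions of this sketch, not something the article prescribes.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Step 1: corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Step 2: intersection area, clamped to zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Step 3: union area by inclusion-exclusion.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Step 4: the ratio. Two zero-area boxes are scored 0 here (an assumption).
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 in each direction:
# intersection = 25, union = 100 + 100 - 25 = 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

The clamp in Step 2 is what makes disjoint boxes score exactly 0 rather than producing a spurious positive area from two negative side lengths.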
IoU Thresholds in Evaluation
| Threshold | Name | Use Case |
|---|---|---|
| 0.5 | AP50 | Standard PASCAL VOC metric, lenient |
| 0.75 | AP75 | Strict localization quality |
| 0.5:0.95 | AP (COCO primary) | Average over 10 thresholds: 0.50, 0.55, ..., 0.95 |
A detection is a true positive if IoU with a matched ground-truth box exceeds the threshold and the class is correct; otherwise, it is a false positive.
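
As a rough sketch of how a threshold turns IoU values into true and false positives, the snippet below greedily matches detections to ground-truth boxes in descending score order. It assumes a single class, lets each ground-truth box be matched at most once, and uses `>=` for the comparison (conventions differ on whether the comparison is strict). The names `label_detections` and `COCO_THRESHOLDS` are hypothetical, and real evaluation toolkits handle additional cases that are ignored here.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, gt_boxes, threshold=0.5):
    """Return one True (TP) / False (FP) flag per detection, highest score first.

    detections: list of (score, box); gt_boxes: list of boxes; one class assumed.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Best still-unmatched ground-truth box for this detection.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            flags.append(True)
        else:
            flags.append(False)
    return flags


# The ten thresholds behind the COCO-style average: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

gt = [(0, 0, 10, 10)]
dets = [(0.9, (0, 0, 10, 9)), (0.8, (0, 0, 10, 10)), (0.3, (20, 20, 30, 30))]
print(label_detections(dets, gt, 0.5))   # [True, False, False]
print(label_detections(dets, gt, 0.95))  # [False, True, False]
```

In the example, the top-scoring detection has IoU 0.9 with the ground truth: it is a true positive at 0.5 (making the second, perfectly aligned detection a duplicate), but a false positive at 0.95.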
IoU as a Loss Function
Standard IoU loss for bounding box regression:

$$\mathcal{L}_{\text{IoU}} = 1 - \text{IoU}(B_{\text{pred}}, B_{\text{gt}})$$

This has a critical flaw: when boxes do not overlap ($\text{IoU} = 0$), the gradient is zero, providing no learning signal.
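
The zero-gradient problem can be checked numerically. The sketch below takes the loss as one minus IoU and estimates its slope with a central finite difference as a predicted box slides horizontally. The 10x10 boxes, the helper names `shifted` and `numeric_grad`, and the step size are all illustrative assumptions.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)


GT = (0.0, 0.0, 10.0, 10.0)


def shifted(dx):
    """Hypothetical prediction: a 10x10 box whose left edge sits at x = dx."""
    return (dx, 0.0, dx + 10.0, 10.0)


def numeric_grad(dx, eps=1e-3):
    """Central finite-difference slope of the loss with respect to the shift dx."""
    return (iou_loss(shifted(dx + eps), GT) - iou_loss(shifted(dx - eps), GT)) / (2 * eps)


print(numeric_grad(5.0))   # boxes overlap: positive slope, moving left reduces the loss
print(numeric_grad(20.0))  # boxes are disjoint: the loss is flat at 1, so the slope is 0.0
```

Once the boxes separate, the loss stays at 1 no matter how far apart they are, so a gradient-based optimizer gets no hint about which direction to move.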
Generalized IoU (GIoU, 2019)
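Rezatofighi et al. addressed the zero-gradient problem:

$$\text{GIoU}(A, B) = \text{IoU}(A, B) - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad \mathcal{L}_{\text{GIoU}} = 1 - \text{GIoU}(A, B)$$

where $C$ is the smallest enclosing box of $A$ and $B$. GIoU ranges from $-1$ to $1$, providing a gradient even when boxes do not overlap.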
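
A minimal sketch of GIoU for axis-aligned boxes follows: it is IoU minus the fraction of the smallest enclosing box not covered by the union. It assumes `(x1, y1, x2, y2)` tuples with positive width and height; the function name `giou` is an assumption of this sketch.

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    # Intersection and union, as for plain IoU.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter

    # Smallest axis-aligned box C that encloses both inputs.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))

    # IoU minus the share of C that lies outside the union.
    return inter / union - (c_area - union) / c_area


near = giou((0, 0, 1, 1), (2, 0, 3, 1))   # disjoint, gap of 1: 0 - (3 - 2) / 3
far = giou((0, 0, 1, 1), (9, 0, 10, 1))   # disjoint, gap of 8: 0 - (10 - 2) / 10
print(near, far)  # both negative, and `far` is lower than `near`
```

Unlike plain IoU, the value keeps decreasing toward $-1$ as disjoint boxes move apart, which is what gives the corresponding loss a usable gradient.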
Distance-IoU (DIoU) and Complete-IoU (CIoU, 2020)
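DIoU adds a penalty on the normalized distance between box centers:

$$\mathcal{L}_{\text{DIoU}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2}$$

where $\rho(b, b^{gt})$ is the Euclidean distance between box centers and $c$ is the diagonal length of the enclosing box.

CIoU adds an aspect ratio consistency term:

$$\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where $v$ measures aspect ratio consistency and $\alpha$ is a balancing parameter.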
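
The two losses can be sketched together. The article does not spell out the aspect ratio term or its weight, so the code below takes the definitions from the Zheng et al. paper listed under Further Reading: $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$ and $\alpha = \frac{v}{(1 - \text{IoU}) + v}$. The function name, the box format `(x1, y1, x2, y2)`, and the handling of a zero denominator are assumptions of this sketch.

```python
import math


def diou_ciou_loss(pred, gt):
    """Return (diou_loss, ciou_loss) for two boxes with positive width and height."""
    # Plain IoU.
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (pw * ph + gw * gh - inter)

    # rho^2: squared distance between the two box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2

    # c^2: squared diagonal of the smallest enclosing box.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2

    diou_loss = 1.0 - iou + rho2 / c2

    # Aspect ratio term v and its weight alpha (definitions from Zheng et al.).
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    denom = (1.0 - iou) + v
    alpha = v / denom if denom > 0 else 0.0  # identical boxes: no penalty

    return diou_loss, diou_loss + alpha * v


print(diou_ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # same shape: the two losses agree
print(diou_ciou_loss((0, 0, 4, 1), (0, 0, 2, 2)))  # different shape: CIoU is larger
```

When the two boxes share an aspect ratio, $v = 0$ and CIoU reduces to DIoU; the extra term only matters when the shapes differ.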
Why It Matters
- IoU is the standard localization metric used in every major detection benchmark (PASCAL VOC, COCO, Open Images, LVIS).
- COCO's primary metric (AP averaged over IoU 0.5:0.95) incentivizes precise localization, not just approximate overlap.
- IoU-based losses (GIoU, DIoU, CIoU) consistently outperform $L_1$ and $L_2$ box regression losses by 1-3% AP because they directly optimize the evaluation metric.
- IoU thresholds define what counts as a detection, making them among the most consequential hyperparameters in the entire detection pipeline.
Key Technical Details
- Computation cost: IoU between two boxes requires ~10 arithmetic operations. Pairwise IoU for $N$ boxes is $O(N^2)$.
- Scale invariance: IoU is invariant to box scale -- a 50% overlap scores the same for a small box as for a large one.
- GIoU loss improves Faster R-CNN by ~1% AP and YOLOv3 by ~2-3% AP compared to smooth $L_1$ loss.
- CIoU loss further improves over GIoU by ~0.5-1% AP by incorporating center distance and aspect ratio.
- PASCAL VOC uses AP50 (IoU threshold 0.5); COCO uses AP (averaged over 0.5:0.05:0.95), which is much stricter.
- IoU 0.5 vs. 0.75: A detector scoring 50% AP50 might score only 30% AP75, revealing coarse localization.
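
Two of the points above, the quadratic cost of pairwise IoU and scale invariance, can be illustrated with a small pure-Python sketch. The helper names `pairwise_iou` and `scale` and the example boxes are assumptions for illustration; production code would typically vectorize the pairwise computation.

```python
import math


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix with len(boxes_a) rows and len(boxes_b) columns.

    One IoU call per pair, so the cost grows with the product of the list lengths.
    """
    return [[iou(a, b) for b in boxes_b] for a in boxes_a]


def scale(box, s):
    """Multiply every coordinate by s (uniform scaling about the origin)."""
    return tuple(s * v for v in box)


a, b = (0, 0, 10, 10), (5, 0, 15, 10)
small = iou(a, b)
large = iou(scale(a, 10), scale(b, 10))
print(math.isclose(small, large))  # True: scaling both boxes leaves IoU unchanged

matrix = pairwise_iou([a, b], [a, b, (20, 20, 30, 30)])
print(len(matrix), len(matrix[0]))  # 2 3
```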
Common Misconceptions
- "IoU 0.5 means the prediction is 50% correct." IoU 0.5 means 50% of the union area is shared, but the prediction may include significant background or miss part of the object. Visually, IoU 0.5 boxes can look quite misaligned.
- "IoU is always the best matching metric." For very small objects (only a few pixels across), a shift of a few pixels causes a large IoU drop, even though the detection is essentially correct. Some benchmarks use pixel distance for very small objects.
- "L1 or L2 loss on box coordinates is equivalent to IoU." These losses treat each coordinate independently and are not scale-invariant. A 10-pixel error matters much more for a small box than for a large one; IoU captures this naturally.
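
The last two points can be made concrete with a small numeric example. The box sizes (10 and 100 pixels) and the 2-pixel shift below are illustrative values chosen for this sketch, not figures from any benchmark; `l1_error` and `shifted_pair` are hypothetical helper names.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def l1_error(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def shifted_pair(size, shift):
    """A square ground-truth box and the same box shifted horizontally."""
    gt = (0, 0, size, size)
    pred = (shift, 0, size + shift, size)
    return pred, gt


for size in (10, 100):
    pred, gt = shifted_pair(size, shift=2)
    print(size, l1_error(pred, gt), round(iou(pred, gt), 3))
# 10 4 0.667
# 100 4 0.961
```

The L1 error is identical in both cases, while IoU falls to about 0.67 for the small box and stays near 0.96 for the large one. The same numbers show both effects: IoU reflects relative error where a coordinate loss does not, and it penalizes small objects heavily for small absolute shifts.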
Connections to Other Concepts
- non-maximum-suppression.md: Uses IoU to determine which overlapping boxes to suppress.
- r-cnn.md: IoU thresholds determine positive/negative assignment during training (e.g., IoU above an upper threshold for positives, IoU below a lower threshold for negatives).
- detr.md: Uses Generalized IoU in its matching cost and training loss.
- focal-loss.md: Training sample assignment relies on IoU between anchors and ground-truth boxes.
- sliding-window-and-region-proposals.md: Proposal recall is evaluated at specific IoU thresholds.
Further Reading
- Everingham et al., "The PASCAL Visual Object Classes (VOC) Challenge" (2010) -- Established IoU-based AP evaluation for detection.
- Lin et al., "Microsoft COCO: Common Objects in Context" (2014) -- Introduced the averaged AP metric over multiple IoU thresholds.
- Rezatofighi et al., "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression" (2019) -- GIoU.
- Zheng et al., "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression" (2020) -- DIoU and CIoU losses.