One-Line Summary: Anchor-free detectors eliminate predefined anchor boxes by directly predicting object locations as per-pixel classifications (FCOS) or center-point heatmaps (CenterNet), removing a major source of hyperparameter tuning while matching or exceeding anchor-based accuracy.
Prerequisites: Convolutional neural networks, feature pyramid network, bounding box regression, non-maximum suppression, focal loss
What Is Anchor-Free Detection?
Anchor-based detectors like Faster R-CNN and SSD tile thousands of predefined boxes across the image and ask, "Is there an object in this box? If so, how should I adjust the box?" Anchor-free detectors take a fundamentally different approach. Imagine laying a transparent grid over an image: instead of pre-placing boxes, you simply ask each grid point, "Are you inside an object? If so, how far is it to the object's edges?" -- or, even simpler, "Are you the center of an object? If so, how big is it?"
Technically, anchor-free detectors predict object locations without relying on a predefined set of anchor boxes. The two main families are per-pixel prediction (e.g., FCOS, which classifies every feature map location and regresses distances to box edges) and keypoint-based detection (e.g., CenterNet, which detects objects as center-point heatmap peaks and regresses size from those points).
How It Works
FCOS: Fully Convolutional One-Stage Detection (2019)
FCOS treats every location on the feature map as a potential detection point.
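Per-pixel prediction: For a location $(x, y)$ on feature map level $i$ (mapped back to image coordinates), if it falls inside a ground-truth box $B = (x_0, y_0, x_1, y_1)$, FCOS predicts:
- Classification: a $C$-dimensional vector of class scores, where $C$ is the number of classes.
- Regression: 4 distances from the location to the box edges, $(l^*, t^*, r^*, b^*)$, where $l^* = x - x_0$, $t^* = y - y_0$, $r^* = x_1 - x$, $b^* = y_1 - y$.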
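A minimal plain-Python sketch of how the four regression targets can be computed for one feature-map location. It assumes boxes are given as (x0, y0, x1, y1) in image pixels and that cell (i, j) on a level with stride s maps back to the image point (⌊s/2⌋ + j·s, ⌊s/2⌋ + i·s), the mapping described in the FCOS paper; the helper names are hypothetical.

```python
# Sketch: FCOS per-location regression targets (plain Python, no framework).
# Assumption: boxes are (x0, y0, x1, y1) in image pixels.


def location_to_image(i, j, stride):
    """Map feature-map cell (row i, column j) to its image-space point (x, y)."""
    return (stride // 2 + j * stride, stride // 2 + i * stride)


def regression_targets(x, y, box):
    """Return the distances (l, t, r, b) from point (x, y) to the box edges."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def inside_box(targets):
    """The point lies strictly inside the box iff all four distances are positive."""
    return min(targets) > 0


if __name__ == "__main__":
    box = (10, 20, 90, 60)  # hypothetical ground-truth box
    x, y = location_to_image(4, 5, stride=8)
    t = regression_targets(x, y, box)
    print((x, y), t, inside_box(t))  # prints (44, 36) (34, 16, 46, 24) True
```

A location whose distances are not all positive lies outside the box and is not a positive sample for it.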
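Centerness branch: A scalar indicating how close the location is to the object center, computed from the regression targets $l^*, t^*, r^*, b^*$:
$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$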
This down-weights detections from peripheral locations, reducing low-quality predictions. The centerness score is multiplied with the classification score during inference.
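The centerness target follows directly from the four distances. This sketch (helper names hypothetical) computes it and shows how multiplying it into the class score pushes a confident but off-center location below a central one.

```python
import math


def centerness(l, t, r, b):
    """FCOS centerness target; expects the positive distances of an inside location."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def ranking_score(cls_score, ctr):
    """Score used to rank detections: class score times predicted centerness."""
    return cls_score * ctr


if __name__ == "__main__":
    print(centerness(10, 10, 10, 10))            # prints 1.0 (exact center)
    print(round(centerness(34, 16, 46, 24), 3))  # prints 0.702
    # A confident but peripheral location now ranks below a central one.
    print(ranking_score(0.9, centerness(2, 20, 98, 20)) < ranking_score(0.8, 1.0))  # prints True
```

The square root keeps the decay gentler than the raw product of the two ratios.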
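Multi-level assignment: Objects of different sizes are assigned to different FPN levels based on the regression target magnitude. A location on level $P_i$ is kept as a positive sample only if $m_{i-1} \le \max(l^*, t^*, r^*, b^*) \le m_i$, with $m_2, \dots, m_7$ set to $0, 64, 128, 256, 512, \infty$ for levels $P_3$ to $P_7$. Because overlapping objects usually differ considerably in size, this resolves most of the ambiguity; a location that still falls inside several boxes is assigned to the box with the smallest area.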
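Level assignment can be sketched in the same style. The ranges below are the ones reported in the FCOS paper for P3 to P7; treating them as half-open intervals (so a boundary value maps to exactly one level) and the helper names are simplifying assumptions of this sketch.

```python
import math

# Regression ranges (m_{i-1}, m_i] per FPN level as reported in the FCOS paper.
# Treating the ranges as half-open is a simplification of this sketch.
FCOS_RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, math.inf),
}


def assign_level(targets):
    """Return the level whose range contains max(l, t, r, b), or None."""
    m = max(targets)
    for level, (lo, hi) in FCOS_RANGES.items():
        if lo < m <= hi:
            return level
    return None


def assign_box(x, y, boxes, level):
    """Among boxes that make (x, y) positive on this level, pick the smallest one."""
    lo, hi = FCOS_RANGES[level]
    candidates = []
    for x0, y0, x1, y1 in boxes:
        targets = (x - x0, y - y0, x1 - x, y1 - y)
        if min(targets) > 0 and lo < max(targets) <= hi:
            candidates.append((x0, y0, x1, y1))
    # Minimal-area rule for locations that still fall inside several boxes.
    return min(candidates, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), default=None)


if __name__ == "__main__":
    print(assign_level((34, 16, 46, 24)))  # prints P3
    boxes = [(0, 0, 100, 100), (40, 40, 60, 60)]  # hypothetical nested boxes
    print(assign_box(50, 50, boxes, "P3"))  # prints (40, 40, 60, 60)
```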
CenterNet: Objects as Points (2019)
CenterNet (Zhou et al.) models each object as a single point -- its bounding box center.
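- Heatmap prediction: Predict a heatmap $\hat{Y} \in [0, 1]^{\frac{W}{R} \times \frac{H}{R} \times C}$ with one channel per class $c$, where $W \times H$ is the input size, $C$ is the number of classes, and $R$ is the output stride (typically 4). Peaks correspond to object centers.
- Size regression: At each center point, predict the object width and height $\hat{S} = (\hat{w}, \hat{h})$.
- Offset regression: Predict a sub-pixel offset $\hat{O}$ to recover the discretization error $\frac{p}{R} - \tilde{p}$ caused by downsampling, where $p$ is the true center and $\tilde{p} = \lfloor p/R \rfloor$.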
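To make the three CenterNet targets concrete, here is a sketch that derives the heatmap cell, offset and size for a single box. It assumes the box is in input pixels and expresses the size in output-resolution units (box coordinates divided by the stride), which is one common convention rather than the only one; `centernet_targets` is a hypothetical name.

```python
import math


def centernet_targets(box, stride=4):
    """Heatmap cell, sub-pixel offset and (w, h) size target for one box.

    Assumes box = (x0, y0, x1, y1) in input pixels; the size is expressed in
    output-resolution units (input pixels divided by the stride).
    """
    x0, y0, x1, y1 = (v / stride for v in box)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2          # center p / R
    cell = (math.floor(cx), math.floor(cy))        # p~ = floor(p / R), the peak location
    offset = (cx - cell[0], cy - cell[1])          # discretization error in [0, 1)
    size = (x1 - x0, y1 - y0)                      # width and height
    return cell, offset, size


if __name__ == "__main__":
    print(centernet_targets((10, 22, 50, 62)))
    # prints ((7, 10), (0.5, 0.5), (10.0, 10.0))
```

Without the offset head, every decoded center would snap to a cell corner, an error of up to one full stride in input pixels.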
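Training: Ground-truth heatmaps are generated by placing a 2D Gaussian at each object's low-resolution center $\tilde{p} = \lfloor p/R \rfloor$:
$$Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$
where $\sigma_p$ is a standard deviation that grows with the object size. If two Gaussians of the same class overlap, the element-wise maximum is kept. The heatmap loss is a penalty-reduced focal loss ($\alpha = 2$, $\beta = 4$ in the paper) that down-weights negatives close to a true center; the size and offset heads are trained with L1 losses at the center locations.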
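A plain-Python sketch of ground-truth heatmap rendering and the penalty-reduced focal loss for a single class channel. The loss follows the form in the CenterNet paper: positives (where $Y = 1$) contribute $(1 - \hat{Y})^{\alpha} \log \hat{Y}$, negatives contribute $(1 - Y)^{\beta} \hat{Y}^{\alpha} \log(1 - \hat{Y})$, and the negated sum is divided by the number of object centers. Here $\sigma$ is passed in directly rather than derived from the box size, the Gaussian is evaluated over the whole map instead of a truncated window, and the function names are hypothetical.

```python
import math


def draw_gaussian(heatmap, center, sigma):
    """Splat exp(-d^2 / (2 sigma^2)) around an integer center, keeping the max."""
    cx, cy = center
    for y, row in enumerate(heatmap):
        for x in range(len(row)):
            g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            row[x] = max(row[x], g)  # overlapping objects of one class: element-wise max
    return heatmap


def centernet_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss, normalized by the number of centers."""
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p = min(max(p, eps), 1 - eps)  # clamp to keep the logs finite
            if g == 1.0:
                pos_loss += (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # (1 - g)^beta reduces the penalty for negatives near a true center.
                neg_loss += (1 - g) ** beta * p ** alpha * math.log(1 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)


if __name__ == "__main__":
    gt = draw_gaussian([[0.0] * 5 for _ in range(5)], center=(2, 2), sigma=1.0)
    print(gt[2][2])  # prints 1.0
    matched = centernet_focal_loss(gt, gt)
    uniform = centernet_focal_loss([[0.5] * 5 for _ in range(5)], gt)
    print(matched < uniform)  # prints True
```

Only the exact center cell counts as a positive; every other cell is a negative whose penalty shrinks as its Gaussian value approaches 1.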
Inference: Extract peaks from the heatmap with a 3×3 max-pooling operation (a simple form of NMS), keep the top-$K$ peaks (e.g., $K = 100$), and read off the size and offset predictions at those locations. No traditional IoU-based NMS post-processing is needed.
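Peak extraction and decoding can be sketched as follows: a cell is kept if it equals the maximum of its 3×3 neighborhood, which is what comparing the heatmap against its 3×3 max-pooled copy computes, and the top-K survivors are turned into boxes. It assumes sizes are predicted in output-resolution units and offsets in cells; both conventions and the function names are assumptions of this sketch.

```python
def extract_peaks(heatmap, k=100, threshold=0.0):
    """Keep cells equal to their 3x3 neighborhood max (max-pool NMS), then top-K."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            window = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            if v == max(window) and v > threshold:
                peaks.append((v, x, y))
    peaks.sort(reverse=True)  # highest scores first
    return peaks[:k]


def decode(peak, size_map, offset_map, stride=4):
    """Turn (score, x, y) into (score, (x0, y0, x1, y1)) in input pixels."""
    score, x, y = peak
    w, h = size_map[y][x]       # assumed: size in output-resolution units
    dx, dy = offset_map[y][x]   # assumed: sub-pixel offset in cells
    cx, cy = x + dx, y + dy
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return score, tuple(stride * v for v in box)


if __name__ == "__main__":
    hm = [
        [0.1, 0.2, 0.1, 0.0, 0.0],
        [0.2, 0.9, 0.3, 0.0, 0.0],
        [0.1, 0.3, 0.2, 0.4, 0.1],
        [0.0, 0.0, 0.4, 0.7, 0.2],
        [0.0, 0.0, 0.1, 0.2, 0.1],
    ]
    print(extract_peaks(hm))  # prints [(0.9, 1, 1), (0.7, 3, 3)]
    sizes = [[(2.0, 2.0)] * 5 for _ in range(5)]
    offsets = [[(0.5, 0.5)] * 5 for _ in range(5)]
    print(decode((0.9, 1, 1), sizes, offsets))  # prints (0.9, (2.0, 2.0, 10.0, 10.0))
```

Neighboring non-peak cells such as the 0.4 values next to the 0.7 peak are discarded without any IoU computation.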
CornerNet (2018)
An earlier anchor-free approach by Law and Deng that detects objects as pairs of top-left and bottom-right corner keypoints, grouped by an associative embedding. It introduced the idea of keypoint-based detection but required complex corner pooling and grouping.
Why It Matters
- Anchor hyperparameter elimination: Anchor-based detectors require tuning scales, aspect ratios, IoU thresholds for matching, and sampling strategies. Anchor-free methods remove this entire design space.
- FCOS reached 44.7% AP on COCO with a ResNeXt-64x4d-101-FPN backbone plus the paper's training improvements (41.5% AP with ResNet-101-FPN), matching or exceeding Faster R-CNN and RetinaNet without any anchor-related hyperparameters.
- CenterNet achieved 45.1% AP on COCO (Hourglass-104 backbone, multi-scale testing) with a simpler architecture and no IoU-based NMS.
- Anchor-free designs influenced subsequent work: YOLOv8 adopted an anchor-free head, and DETR can be viewed as an anchor-free detector.
Key Technical Details
- FCOS: 44.7% AP on COCO (ResNeXt-64x4d-101-FPN with training improvements; 41.5% AP with ResNet-101-FPN), ~18 FPS on a V100 GPU.
- CenterNet (Hourglass-104): 42.1% AP on COCO at ~7.8 FPS, rising to 45.1% AP with multi-scale testing. With DLA-34 backbone: 37.4% AP at ~52 FPS.
- CenterNet (ResNet-18): 28.1% AP on COCO at ~142 FPS -- suitable for real-time edge deployment.
- Centerness branch in FCOS: Adds roughly 3.6 AP points over the base model in the paper's ResNet-50 ablation (33.5% to 37.1% AP) by suppressing low-quality detections from peripheral locations.
- FCOS positive sample definition: A location is positive if it falls within a ground-truth box AND its largest regression target (the maximum of the four edge distances) lies within the range assigned to that FPN level.
- CenterNet uses no anchors and no IoU-based NMS, relying on heatmap peak extraction instead, which makes it one of the architecturally simplest modern detectors.
Common Misconceptions
- "Anchor-free means no predefined spatial structure." FCOS still uses FPN levels with defined strides and regression ranges. CenterNet uses a fixed output stride. The term "anchor-free" specifically means no predefined bounding box templates.
- "Anchor-free detectors are always faster." The speed depends on the backbone and head complexity. CenterNet with Hourglass-104 is slower than many anchor-based detectors. The benefit is simpler design, not guaranteed speed improvement.
- "CenterNet completely eliminates NMS." CenterNet replaces traditional IoU-based NMS with a simple max-pooling operation on the heatmap, which is a form of local non-maximum suppression. However, it avoids the iterative greedy NMS used by other detectors.
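For contrast, here is a minimal sketch of the iterative greedy IoU-based NMS that FCOS and most anchor-based detectors still run after scoring. It is class-agnostic for brevity (practical pipelines usually apply it per class), detections are assumed to be (score, (x0, y0, x1, y1)) tuples, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Repeatedly keep the highest-scoring box and drop boxes that overlap it."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


if __name__ == "__main__":
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30))]
    print(greedy_nms(dets))  # prints [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```

The sequential loop with pairwise IoU checks is the part CenterNet's single max-pooling pass avoids.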
Connections to Other Concepts
- fast-and-faster-rcnn.md: The anchor-based two-stage paradigm that anchor-free methods seek to simplify.
- feature-pyramid-network.md: Both FCOS and many CenterNet variants use FPN for multi-scale feature extraction.
- focal-loss.md: FCOS uses focal loss for classification; CenterNet uses a modified focal loss for heatmap training.
- detr.md: Another anchor-free approach, but it uses transformers and set-based prediction rather than per-pixel classification.
- non-maximum-suppression.md: FCOS still requires NMS; CenterNet's heatmap peak extraction largely replaces it.
- yolo.md: YOLOv8 adopted anchor-free prediction heads inspired by FCOS.
Further Reading
- Tian et al., "FCOS: Fully Convolutional One-Stage Object Detection" (2019) -- Per-pixel anchor-free detection with centerness.
- Zhou et al., "Objects as Points" (2019) -- CenterNet's keypoint-based detection framework.
- Law and Deng, "CornerNet: Detecting Objects as Paired Keypoints" (2018) -- Pioneering keypoint-based anchor-free detection.
- Yang et al., "RepPoints: Point Set Representation for Object Detection" (2019) -- Represents objects as deformable point sets instead of boxes.
- Zhang et al., "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection" (2020) -- ATSS, showing that sample selection strategy matters more than anchors vs. anchor-free.