Anchor-Free Detection

One-Line Summary: Anchor-free detectors eliminate predefined anchor boxes by directly predicting object locations as per-pixel classifications (FCOS) or center-point heatmaps (CenterNet), removing a major source of hyperparameter tuning while matching or exceeding anchor-based accuracy.

Prerequisites: Convolutional neural networks, feature pyramid networks, bounding box regression, non-maximum suppression, focal loss

What Is Anchor-Free Detection?

Anchor-based detectors like Faster R-CNN and SSD tile thousands of predefined boxes across the image and ask, "Is there an object in this box? If so, how should I adjust the box?" Anchor-free detectors take a fundamentally different approach. Imagine laying a transparent grid over an image: instead of pre-placing boxes, you simply ask each grid point, "Are you inside an object? If so, how far is it to the object's edges?" -- or, even simpler, "Are you the center of an object? If so, how big is it?"

Technically, anchor-free detectors predict object locations without relying on a predefined set of anchor boxes. The two main families are per-pixel prediction (e.g., FCOS, which classifies every feature map location and regresses distances to box edges) and keypoint-based detection (e.g., CenterNet, which detects objects as center-point heatmap peaks and regresses size from those points).

How It Works

FCOS: Fully Convolutional One-Stage Detection (2019)

FCOS treats every location on the feature map as a potential detection point.

Per-pixel prediction: For a location $(x, y)$ on feature map level $P_{l}$ that falls inside a ground-truth box with top-left corner $(x_{0}, y_{0})$ and bottom-right corner $(x_{1}, y_{1})$, FCOS predicts:

Classification: A $C$-dimensional vector of class scores.
Regression: 4 distances from the location to the box edges: $\mathbf{t}^{*} = (l^{*}, t^{*}, r^{*}, b^{*})$, where $l^{*} = x - x_{0}$, $t^{*} = y - y_{0}$, $r^{*} = x_{1} - x$, and $b^{*} = y_{1} - y$.

Centerness branch: A scalar indicating how close the location is to the object center: $\text{centerness}^{*} = \sqrt{\frac{\min(l^{*}, r^{*})}{\max(l^{*}, r^{*})} \times \frac{\min(t^{*}, b^{*})}{\max(t^{*}, b^{*})}}$

The value is 1 at the box center and decays toward 0 at the edges, so it down-weights detections from peripheral locations and reduces low-quality predictions. During inference, the centerness score is multiplied by the classification score.
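The regression targets and centerness can be checked by hand with a minimal sketch like the one below. The function names and the example box are hypothetical, and the centerness uses the square-root form from the FCOS paper; this illustrates the formulas, not the reference implementation.

```python
import math

def fcos_targets(x, y, box):
    """Distances from location (x, y) to the edges of box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)  # (l, t, r, b)

def centerness(l, t, r, b):
    """Square-root centerness; assumes the location is strictly inside the box."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def decode_box(x, y, l, t, r, b):
    """Invert the targets: recover (x0, y0, x1, y1) from a location and its distances."""
    return (x - l, y - t, x + r, y + b)

box = (10.0, 20.0, 110.0, 80.0)                 # hypothetical ground-truth box
center_t = fcos_targets(60.0, 50.0, box)        # location at the box center
edge_t = fcos_targets(20.0, 30.0, box)          # location near the top-left corner

center_score = centerness(*center_t)            # 1.0 at the exact center
edge_score = centerness(*edge_t)                # sqrt((10/90) * (10/50)), about 0.149

# At inference the class score is scaled by centerness before ranking.
cls_score = 0.9                                 # hypothetical class probability
final_edge_score = cls_score * edge_score
```

A location near the corner keeps the same class score but ends up ranked far lower, which is how the centerness branch suppresses off-center boxes before NMS.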

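Multi-level assignment: Objects of different sizes are assigned to different FPN levels according to the magnitude of their regression targets: each level handles only locations whose largest distance $\max(l^{*}, t^{*}, r^{*}, b^{*})$ falls within that level's range. Because overlapping objects usually differ in size, this resolves most of the ambiguity about which box an overlapped location should regress. When a location still falls inside several boxes on the same level, FCOS assigns it to the box with the smallest area.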
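A rough sketch of the level assignment rule follows. It assumes the size ranges reported in the FCOS paper (bounds of 0, 64, 128, 256, 512, and infinity for P3 through P7, in input pixels). The `LEVEL_RANGES` table and the `assign_level` helper are hypothetical names for illustration.

```python
import math

# Assumed per-level bounds on max(l, t, r, b), in input pixels, for P3..P7.
LEVEL_RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, math.inf),
}

def assign_level(l, t, r, b):
    """Return the FPN level on which this location is a positive sample, else None."""
    if min(l, t, r, b) <= 0:
        return None  # the location is not inside the ground-truth box
    largest = max(l, t, r, b)
    for level, (low, high) in LEVEL_RANGES.items():
        if low <= largest <= high:
            # A value exactly on a shared bound goes to the lower level in this sketch.
            return level
    return None  # only reachable if the ranges are edited to leave a gap

small = assign_level(50, 30, 50, 30)    # largest distance 50
medium = assign_level(10, 10, 90, 50)   # largest distance 90
large = assign_level(300, 20, 40, 20)   # largest distance 300
outside = assign_level(-5, 10, 20, 10)  # negative distance: outside the box
```

The same location is therefore a positive sample on at most one level per box, so small and large objects that overlap in the image are handled by different feature maps.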

CenterNet: Objects as Points (2019)

CenterNet (Zhou et al.) models each object as a single point -- its bounding box center.

Heatmap prediction: For each class $c$, predict a heatmap $\hat{Y}_{c} \in [0, 1]^{\frac{H}{R} \times \frac{W}{R}}$, where $R$ is the output stride (typically 4). Peaks correspond to object centers.
Size regression: At each center point, predict the object width and height $(w, h)$.
Offset regression: Predict a sub-pixel offset to compensate for the discretization error introduced by downsampling.
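To make the three outputs concrete, here is a small sketch of how one peak could be turned back into a box. The `decode_center` helper is hypothetical, and it assumes the predicted size is expressed in output-map units; an implementation that regresses size in input pixels would skip the scaling of `w` and `h`.

```python
def decode_center(cx, cy, offset, size, stride=4):
    """Turn one heatmap peak into an (x0, y0, x1, y1) box in input-image pixels.

    cx, cy : integer peak coordinates on the output map
    offset : predicted sub-pixel offset (dx, dy)
    size   : predicted (w, h), assumed here to be in output-map units
    """
    dx, dy = offset
    w, h = size
    x, y = cx + dx, cy + dy  # refined center on the output map
    box = (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
    return tuple(v * stride for v in box)  # scale back to input resolution

# An object centered at input pixel (101, 62) with size 40 x 24 and stride 4:
# 101 / 4 = 25.25 and 62 / 4 = 15.5, so the peak is at (25, 15)
# and the offset head should recover (0.25, 0.5).
decoded = decode_center(25, 15, offset=(0.25, 0.5), size=(10.0, 6.0))
```

Without the offset, the recovered center would snap to (100, 60), an error of up to one stride in each direction.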

Training: Ground-truth heatmaps are generated by placing a 2D Gaussian at each object center: $Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_{x})^{2} + (y - \tilde{p}_{y})^{2}}{2\sigma_{p}^{2}}\right)$

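where $(\tilde{p}_{x}, \tilde{p}_{y})$ is the object center on the downsampled output map and $\sigma_{p}$ is a standard deviation that scales with the object size. The loss is a modified (penalty-reduced) focal loss on the heatmap.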
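The target construction and loss can be sketched in plain Python as follows. Several details are assumptions taken from the CenterNet paper rather than from the text above: overlapping Gaussians of the same class are merged with an element-wise maximum, and the focal loss uses the penalty-reduced form with $\alpha = 2$ and $\beta = 4$, normalized by the number of centers. The standard deviations are passed in directly, since the rule that derives them from object size is not covered here. Function names are hypothetical.

```python
import math

def gaussian_heatmap(height, width, centers, sigmas):
    """Render one class's target heatmap on a height x width output grid.

    centers : integer (px, py) object centers on the output grid
    sigmas  : one standard deviation per center (assumed given)
    """
    heat = [[0.0] * width for _ in range(height)]
    for (px, py), sigma in zip(centers, sigmas):
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                heat[y][x] = max(heat[y][x], g)  # overlapping Gaussians: keep the max
    return heat

def heatmap_focal_loss(pred, target, alpha=2, beta=4, eps=1e-6):
    """Penalty-reduced focal loss, normalized by the number of object centers."""
    total, num_centers = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1 - eps)  # clamp so log() stays finite
            if y == 1.0:
                # Positive location: standard focal term.
                total -= (1 - p) ** alpha * math.log(p)
                num_centers += 1
            else:
                # Negative location: penalty shrinks near a center via (1 - y)^beta.
                total -= (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return total / max(num_centers, 1)

target = gaussian_heatmap(5, 5, centers=[(2, 2)], sigmas=[1.0])
flat_pred = [[0.5] * 5 for _ in range(5)]      # uninformative prediction
loss_flat = heatmap_focal_loss(flat_pred, target)
loss_matched = heatmap_focal_loss(target, target)
```

The $(1 - Y)^{\beta}$ factor is what distinguishes this from ordinary focal loss: pixels next to a true center are penalized less for firing, since a near-miss still yields a usable box.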

Inference: Extract peaks from the heatmap via $3 \times 3$ max pooling (a simple form of NMS), take the top-$K$ peaks (e.g., $K = 100$), and read off the size and offset predictions at those locations. No traditional NMS post-processing is needed.
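The peak extraction step can be illustrated with a minimal pure-Python sketch. A real implementation would use a framework's max-pooling operator on a tensor; the `extract_peaks` helper and the toy heatmap below are hypothetical.

```python
def extract_peaks(heat, k=100):
    """Return up to k (score, x, y) peaks that equal the max of their 3x3 neighborhood."""
    height, width = len(heat), len(heat[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            neighborhood = [
                heat[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, height))
                for nx in range(max(x - 1, 0), min(x + 2, width))
            ]
            # Same effect as keeping locations that 3x3 max pooling leaves unchanged.
            if heat[y][x] == max(neighborhood):
                peaks.append((heat[y][x], x, y))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:k]

heat = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.1],
    [0.1, 0.3, 0.2, 0.4, 0.7],
    [0.0, 0.0, 0.1, 0.3, 0.4],
]
top = extract_peaks(heat, k=2)  # two local maxima: 0.9 at (1, 1) and 0.7 at (4, 2)
```

Flat background regions also pass the local-maximum test with a score near zero, so in practice the top-$K$ cut and a score threshold do the remaining filtering.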

CornerNet (2018)

An earlier anchor-free approach by Law and Deng that detects objects as pairs of top-left and bottom-right corner keypoints, grouped by an associative embedding. It introduced the idea of keypoint-based detection but required complex corner pooling and grouping.

Why It Matters

Anchor hyperparameter elimination: Anchor-based detectors require tuning scales, aspect ratios, IoU thresholds for matching, and sampling strategies. Anchor-free methods remove this entire design space.
FCOS with a ResNeXt-64x4d-101-FPN backbone achieved 44.7% AP on COCO, matching or exceeding Faster R-CNN and RetinaNet without any anchor-related hyperparameters.
CenterNet achieved 45.1% AP on COCO (Hourglass-104 backbone, multi-scale testing) with a simpler architecture and no IoU-based NMS.
Anchor-free designs influenced subsequent work: YOLOv8 adopted an anchor-free head, and DETR can be viewed as an anchor-free detector.

Key Technical Details

FCOS (ResNeXt-64x4d-101-FPN): 44.7% AP on COCO; reported speed of ~18 FPS on a V100 GPU.
CenterNet (Hourglass-104): 45.1% AP on COCO with multi-scale testing, at ~1.4 FPS. With DLA-34 backbone: 37.4% AP at ~52 FPS.
CenterNet (ResNet-18): 28.1% AP on COCO at ~142 FPS -- suitable for real-time edge deployment.
Centerness branch in FCOS: Adds ~2-3% AP over the base model by suppressing low-quality detections from peripheral locations.
FCOS positive sample definition: A location is positive if it falls within a ground-truth box AND the regression targets $(l, t, r, b)$ are within the allowed range for that FPN level.
CenterNet uses no anchors and no IoU-based NMS, relying solely on heatmap peak extraction, which makes it one of the architecturally simplest modern detectors.

Common Misconceptions

"Anchor-free means no predefined spatial structure." FCOS still uses FPN levels with defined stride and regression ranges. CenterNet uses a fixed output stride. The term "anchor-free" specifically means no predefined bounding box templates.
"Anchor-free detectors are always faster." The speed depends on the backbone and head complexity. CenterNet with Hourglass-104 is slower than many anchor-based detectors. The benefit is simpler design, not guaranteed speed improvement.
"CenterNet completely eliminates NMS." CenterNet replaces traditional IoU-based NMS with a simple $3 \times 3$ max-pooling operation on the heatmap, which is a form of local non-maximum suppression. However, it avoids the iterative greedy NMS used by other detectors.

Connections to Other Concepts

fast-and-faster-rcnn.md: The anchor-based two-stage paradigm that anchor-free methods seek to simplify.
feature-pyramid-network.md: FCOS and many CenterNet variants use FPN for multi-scale feature extraction.
focal-loss.md: FCOS uses focal loss for classification; CenterNet uses a modified focal loss for heatmap training.
detr.md: Another anchor-free approach, but one that uses transformers and set-based prediction rather than per-pixel classification.
non-maximum-suppression.md: FCOS still requires NMS; CenterNet's heatmap peak extraction largely replaces it.
yolo.md: YOLOv8 adopted anchor-free prediction heads inspired by FCOS.