One-Line Summary: Anchor-free detectors eliminate predefined anchor boxes by directly predicting object locations as per-pixel classifications (FCOS) or center-point heatmaps (CenterNet), removing a major source of hyperparameter tuning while matching or exceeding anchor-based accuracy.

Prerequisites: Convolutional neural networks, feature pyramid network, bounding box regression, non-maximum suppression, focal loss

What Is Anchor-Free Detection?

Anchor-based detectors like Faster R-CNN and SSD tile thousands of predefined boxes across the image and ask, "Is there an object in this box? If so, how should I adjust the box?" Anchor-free detectors take a fundamentally different approach. Imagine laying a transparent grid over an image: instead of pre-placing boxes, you simply ask each grid point, "Are you inside an object? If so, how far is it to the object's edges?" -- or, even simpler, "Are you the center of an object? If so, how big is it?"

Technically, anchor-free detectors predict object locations without relying on a predefined set of anchor boxes. The two main families are per-pixel prediction (e.g., FCOS, which classifies every feature map location and regresses distances to box edges) and keypoint-based detection (e.g., CenterNet, which detects objects as center-point heatmap peaks and regresses size from those points).

How It Works

FCOS: Fully Convolutional One-Stage Detection (2019)

FCOS treats every location on the feature map as a potential detection point.

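Per-pixel prediction: For a location $(x, y)$ on feature map level $i$, if it falls inside a ground-truth box with top-left corner $(x_0, y_0)$ and bottom-right corner $(x_1, y_1)$, FCOS predicts:

  • Classification: A $C$-dimensional vector of class scores.
  • Regression: 4 distances from the location to the box edges: $(l^*, t^*, r^*, b^*)$ where $l^* = x - x_0$, $t^* = y - y_0$, $r^* = x_1 - x$, $b^* = y_1 - y$.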
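
The regression targets can be illustrated with a short NumPy sketch. It assumes boxes are given as (x0, y0, x1, y1) corners in image coordinates; the function names `fcos_regression_targets` and `decode_box` are hypothetical and chosen only for this example.

```python
import numpy as np

def fcos_regression_targets(points, box):
    """Distances (l, t, r, b) from each location to the edges of one box.

    points: (N, 2) array-like of (x, y) locations in image coordinates.
    box:    (x0, y0, x1, y1), top-left and bottom-right corners (assumed layout).
    Returns the (N, 4) targets and an (N,) boolean mask of locations
    that lie strictly inside the box.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y = points[:, 0], points[:, 1]
    x0, y0, x1, y1 = box
    targets = np.stack([x - x0, y - y0, x1 - x, y1 - y], axis=1)
    inside = targets.min(axis=1) > 0  # all four distances must be positive
    return targets, inside

def decode_box(point, ltrb):
    """Invert the encoding: recover box corners from a location and distances."""
    x, y = point
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)

targets, inside = fcos_regression_targets([[50, 40], [5, 5]], (10, 20, 110, 100))
print(targets[0], inside)  # first location is inside, second is not
```

Decoding is the exact inverse of encoding, so a perfectly regressed location reproduces the ground-truth corners.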

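Centerness branch: A scalar indicating how close the location is to the object center:

$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

where $l^*, t^*, r^*, b^*$ are the distances from the location to the left, top, right, and bottom box edges. The value is 1 at the box center and decays toward 0 at the edges. This down-weights detections from peripheral locations, reducing low-quality predictions. The centerness score is multiplied with the classification score during inference.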
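
A minimal sketch of the centerness computation, assuming the definition from the FCOS paper: the square root of min(l, r)/max(l, r) times min(t, b)/max(t, b), for a location strictly inside the box. The function name `centerness` and the example scores are illustrative only.

```python
import numpy as np

def centerness(ltrb):
    """Centerness for one or more (l, t, r, b) distance tuples.

    Assumes all distances are positive, i.e. the location is inside the box.
    """
    ltrb = np.asarray(ltrb, dtype=np.float64)
    l, t, r, b = ltrb[..., 0], ltrb[..., 1], ltrb[..., 2], ltrb[..., 3]
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)

# A location at the exact center versus one close to the left edge.
center_score = centerness([50, 40, 50, 40])   # equal distances on both axes
edge_score = centerness([10, 40, 90, 40])     # 10 px from the left edge

# At inference the classification score is scaled by centerness,
# so the peripheral location ranks lower even with the same class score.
cls_score = 0.9
print(cls_score * center_score, cls_score * edge_score)
```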

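Multi-level assignment: Objects of different sizes are assigned to different FPN levels based on the regression target magnitude: a location on a given level counts as positive only if its largest distance to the box edges falls within that level's allowed range. Because overlapping objects usually differ in size, they tend to land on different levels, which greatly reduces (though does not fully eliminate) ambiguity when objects overlap.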
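
The level assignment can be sketched as a lookup on the largest regression distance. The ranges below follow the defaults reported in the FCOS paper for levels P3 to P7 (0, 64, 128, 256, 512, infinity); they are an assumption here and are tunable in practice. `LEVEL_RANGES` and `assign_level` are hypothetical names.

```python
# Assumed per-level ranges for max(l, t, r, b), in input-image pixels.
LEVEL_RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, float("inf")),
}

def assign_level(ltrb):
    """Return the first FPN level whose range contains max(l, t, r, b).

    Both range ends are treated as inclusive, so a value that sits exactly
    on a boundary matches two levels; returning the lower one is a
    tie-break chosen for this sketch.
    """
    m = max(ltrb)
    for level, (lo, hi) in LEVEL_RANGES.items():
        if lo <= m <= hi:
            return level
    return None  # only reachable for negative distances (location outside the box)

print(assign_level((40, 20, 60, 60)))     # small object
print(assign_level((100, 50, 200, 30)))   # medium object
print(assign_level((600, 10, 10, 10)))    # very large object
```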

CenterNet: Objects as Points (2019)

CenterNet (Zhou et al.) models each object as a single point -- its bounding box center.

  1. Heatmap prediction: For each class $c$, predict a heatmap $\hat{Y}_c \in [0, 1]^{\frac{W}{R} \times \frac{H}{R}}$ where $W \times H$ is the input size and $R$ is the output stride (typically 4). Peaks correspond to object centers.
  2. Size regression: At each center point, predict the object width and height $(w, h)$.
  3. Offset regression: Predict a sub-pixel offset $(\delta x, \delta y)$ to recover discretization error from downsampling.

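Training: Ground-truth heatmaps are generated by placing a 2D Gaussian at each object center:

$$Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $Y_{xyc}$ is the target for class $c$ at heatmap position $(x, y)$, $(\tilde{p}_x, \tilde{p}_y)$ is the object center on the downsampled heatmap grid, and $\sigma_p$ is proportional to the object size. The loss is a modified focal loss on the heatmap.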
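
The target construction and loss can be sketched in NumPy as follows. This assumes a single class, integer center coordinates on the output grid, and a standard deviation supplied per object (the paper derives it from object size; that step is omitted). The loss follows the penalty-reduced focal loss form from the CenterNet paper with its reported defaults of alpha = 2 and beta = 4. The names `gaussian_heatmap` and `center_focal_loss` are hypothetical.

```python
import numpy as np

def gaussian_heatmap(shape, centers, sigmas):
    """Ground-truth heatmap for one class.

    shape:   (H, W) of the downsampled output grid.
    centers: list of integer (cx, cy) object centers on that grid.
    sigmas:  one standard deviation per object (assumed to be given).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=np.float64)
    for (cx, cy), sigma in zip(centers, sigmas):
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # overlapping Gaussians: element-wise max
    return heatmap

def center_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss, normalized by the number of centers."""
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = target == 1.0  # exact centers; every other cell is a negative
    pos_loss = ((1.0 - pred) ** alpha * np.log(pred))[pos].sum()
    # Negatives near a center are penalized less through (1 - target) ** beta.
    neg_loss = ((1.0 - target) ** beta * pred ** alpha * np.log(1.0 - pred))[~pos].sum()
    num_pos = max(int(pos.sum()), 1)
    return -(pos_loss + neg_loss) / num_pos

target = gaussian_heatmap((8, 8), centers=[(3, 4)], sigmas=[1.0])
print(center_focal_loss(target, target), center_focal_loss(1.0 - target, target))
```

A prediction that matches the target yields a much smaller loss than one that inverts it, although the loss is not exactly zero because soft negatives near the center still contribute a small term.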

Inference: Extract peaks from the heatmap via max pooling (a simple form of NMS), take the top-$K$ peaks (e.g., $K = 100$), and read off the size and offset predictions at those locations. No traditional NMS post-processing is needed.
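
The inference steps can be sketched with plain NumPy. The 3x3 pooling window matches the CenterNet paper; the assumptions that the heatmap is a single-class float array and that predicted sizes are expressed in input-image pixels are made for this example only, as are the names `extract_peaks` and `decode_detection`.

```python
import numpy as np

def extract_peaks(heatmap, k=100):
    """Return up to k (score, x, y) local maxima from a single-class heatmap."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max pooling with stride 1, written as a max over nine shifted views.
    pooled = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    # A cell is a peak if it equals the maximum of its 3x3 neighborhood.
    # Flat background cells also pass this check but sort last by score.
    ys, xs = np.nonzero(heatmap == pooled)
    scores = heatmap[ys, xs]
    order = np.argsort(-scores)[:k]
    return [(float(scores[i]), int(xs[i]), int(ys[i])) for i in order]

def decode_detection(x, y, offset, size, stride=4):
    """Turn a peak, its offset, and its size into box corners.

    Assumes `size` is (w, h) in input-image pixels; implementations differ
    on whether sizes are stored at input or output resolution.
    """
    cx = (x + offset[0]) * stride
    cy = (y + offset[1]) * stride
    w, h = size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

hm = np.zeros((6, 6))
hm[2, 3] = 0.9   # true center
hm[2, 4] = 0.5   # weaker neighbor, suppressed by the pooling check
hm[5, 0] = 0.7   # second object
print(extract_peaks(hm, k=2))
print(decode_detection(10, 5, offset=(0.5, 0.25), size=(20, 10)))
```

The neighbor at (4, 2) never appears in the output because a stronger value sits inside its 3x3 window, which is the local suppression the max-pooling step provides.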

CornerNet (2018)

An earlier anchor-free approach by Law and Deng that detects objects as pairs of top-left and bottom-right corner keypoints, grouped by an associative embedding. It introduced the idea of keypoint-based detection but required complex corner pooling and grouping.

Why It Matters

  1. Anchor hyperparameter elimination: Anchor-based detectors require tuning scales, aspect ratios, IoU thresholds for matching, and sampling strategies. Anchor-free methods remove this entire design space.
  2. FCOS with ResNeXt-64x4d-101-FPN achieved 44.7% AP on COCO (single-model, single-scale testing), matching or exceeding Faster R-CNN and RetinaNet without any anchor-related hyperparameters.
  3. CenterNet achieved 45.1% AP on COCO (Hourglass-104 backbone, multi-scale testing) with a simpler architecture and no IoU-based NMS.
  4. Anchor-free designs influenced subsequent work: YOLOv8 adopted an anchor-free head, and DETR can be viewed as an anchor-free detector.

Key Technical Details

  • FCOS: 44.7% AP on COCO with a ResNeXt-64x4d-101-FPN backbone; the ResNet-101-FPN variant reaches 41.5% AP in the original paper and runs at ~18 FPS on a V100 GPU.
  • CenterNet (Hourglass-104): 45.1% AP on COCO with multi-scale testing at ~1.4 FPS; 42.2% AP at ~7.8 FPS with flip testing. With DLA-34 backbone: 37.4% AP at ~52 FPS.
  • CenterNet (ResNet-18): 28.1% AP on COCO at ~142 FPS -- suitable for real-time edge deployment.
  • Centerness branch in FCOS: Adds ~2-3% AP over the base model by suppressing low-quality detections from peripheral locations.
  • FCOS positive sample definition: A location is positive if it falls within a ground-truth box AND the regression targets are within the allowed range for that FPN level.
  • CenterNet uses no anchors and no IoU-based NMS, relying solely on heatmap peak extraction, making it one of the architecturally simplest modern detectors.

Common Misconceptions

  • "Anchor-free means no predefined spatial structure." FCOS still uses FPN levels with defined stride and regression ranges. CenterNet uses a fixed output stride. The term "anchor-free" specifically means no predefined bounding box templates.
  • "Anchor-free detectors are always faster." The speed depends on the backbone and head complexity. CenterNet with Hourglass-104 is slower than many anchor-based detectors. The benefit is simpler design, not guaranteed speed improvement.
  • "CenterNet completely eliminates NMS." CenterNet replaces traditional IoU-based NMS with a simple max-pooling operation on the heatmap, which is a form of local non-maximum suppression. However, it avoids the iterative greedy NMS used by other detectors.
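
For contrast with heatmap peak extraction, the iterative greedy NMS used by box-based detectors can be sketched as below. This is a generic textbook version, not the implementation of any particular detector; it assumes boxes are (x0, y0, x1, y1) with positive area, and the 0.5 IoU threshold is an illustrative default. `greedy_nms` is a hypothetical name.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, and repeat."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        xx0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_threshold]  # survivors go to the next round
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # the second box overlaps the first and is dropped
```

The loop depends on pairwise box overlaps and runs once per kept detection, whereas the max-pooling check on a heatmap is a single local comparison per cell with no box geometry involved.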

Connections to Other Concepts

  • fast-and-faster-rcnn.md: The anchor-based two-stage paradigm that anchor-free methods seek to simplify.
  • feature-pyramid-network.md: Both FCOS and many CenterNet variants use FPN for multi-scale feature extraction.
  • focal-loss.md: FCOS uses focal loss for classification; CenterNet uses a modified focal loss for heatmap training.
  • detr.md: Another anchor-free approach but uses transformers and set-based prediction rather than per-pixel classification.
  • non-maximum-suppression.md: FCOS still requires NMS; CenterNet's heatmap peak extraction largely replaces it.
  • yolo.md: YOLOv8 adopted anchor-free prediction heads inspired by FCOS.

Further Reading

  • Tian et al., "FCOS: Fully Convolutional One-Stage Object Detection" (2019) -- Per-pixel anchor-free detection with centerness.
  • Zhou et al., "Objects as Points" (2019) -- CenterNet's keypoint-based detection framework.
  • Law and Deng, "CornerNet: Detecting Objects as Paired Keypoints" (2018) -- Pioneering keypoint-based anchor-free detection.
  • Yang et al., "RepPoints: Point Set Representation for Object Detection" (2019) -- Represents objects as deformable point sets instead of boxes.
  • Zhang et al., "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection" (2020) -- ATSS, showing that sample selection strategy matters more than anchors vs. anchor-free.