
Why 640 Beat 768 in My YOLO Training

Higher resolution should mean better detection, right? Not always.

I ran two experiments training YOLOv8s-P2 for four-leaf clover detection, one at 640 and one at 768. I expected the 768 run to win. It didn’t.

The results

| Model | Resolution | mAP@0.5 | Recall |
|---|---|---|---|
| P2 | 640 | 88.8% | 83.7% |
| P2_768 | 768 | 87.1% | 76.4% |

640 won by 1.7 percentage points of mAP@0.5 and 7.3 points of recall. What?

Why higher resolution lost

1. Pretrained weights mismatch

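YOLOv8s was pretrained on 640x640 images. When I trained at 768, the same objects showed up about 1.2x larger in pixels than the pretrained features were tuned for, so the weights had further to adapt. That’s my best explanation for why transfer learning worked better when the resolution matched.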

2. Small dataset + more pixels = overfitting

I only had 1,062 training images. That’s not a lot.

| Resolution | Pixels | Patterns to learn |
|---|---|---|
| 640 | 409,600 | baseline |
| 768 | 589,824 | 44% more |

Each 768 image carries 44% more pixels, so the model had that much more spatial detail to fit from the same 1,062 images. Recipe for overfitting.
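For anyone who wants to check the pixel math, here’s a quick Python sketch. It only counts input pixels for a square image; “patterns to learn” is my shorthand and not something the script measures. The helper name `pixel_count` is made up for this post.

```python
def pixel_count(side: int) -> int:
    """Number of pixels in a square input with the given side length."""
    return side * side


base = pixel_count(640)    # 409,600 pixels
larger = pixel_count(768)  # 589,824 pixels

# Relative increase in pixels per image when going from 640 to 768.
extra_pct = (larger / base - 1) * 100

# Linear scale factor between the two resolutions (1.2x per side).
linear_ratio = 768 / 640

print(f"{base:,} vs {larger:,} pixels ({extra_pct:.0f}% more, {linear_ratio:.1f}x per side)")
```

A 1.2x bump per side sounds small, but it compounds to 1.44x in area, and the training set stays the same size.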

3. Batch size dropped

Memory constraints forced me to reduce batch size:

# 640
batch=16

# 768
batch=12  # OOM otherwise

Smaller batch = noisier gradients = less stable training. Especially hurts on small datasets.
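To put rough numbers on that, here’s a small sketch. It assumes the common rule of thumb that the noise in a mini-batch gradient estimate shrinks roughly with the square root of the batch size, and it ignores any gradient accumulation the trainer might apply on top. The function names are hypothetical helpers, not part of any library.

```python
import math

TRAIN_IMAGES = 1062  # training set size from this post


def steps_per_epoch(n_images: int, batch: int) -> int:
    """Optimizer steps per epoch, counting the final partial batch."""
    return math.ceil(n_images / batch)


def relative_gradient_noise(batch: int, reference_batch: int) -> float:
    """Rule-of-thumb noise ratio: std of the gradient estimate ~ 1/sqrt(batch)."""
    return math.sqrt(reference_batch / batch)


for batch in (16, 12):
    steps = steps_per_epoch(TRAIN_IMAGES, batch)
    noise = relative_gradient_noise(batch, reference_batch=16)
    print(f"batch={batch}: {steps} steps/epoch, ~{noise:.2f}x gradient noise vs batch=16")
```

Under that assumption, batch=12 is only around 15% noisier than batch=16, so I’d treat this as a contributing factor and not the main culprit.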

4. Augmentation already covered scale variance

I was already using:

multi_scale=True   # trains at various scales
scale=0.7          # random resize ±70%
mosaic=1.0         # combines 4 images

These augmentations simulate different resolutions. Going to 768 didn’t add much that the augmentations weren’t already providing.
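Here’s a rough way to see the overlap. This sketch assumes `scale=0.7` means each image is randomly zoomed by a factor between 0.3x and 1.7x, which is how I read the “±70%” comment above. It doesn’t model `multi_scale` or `mosaic` at all, and the helper names are made up for illustration.

```python
def jitter_range(scale: float) -> tuple[float, float]:
    """Assumed zoom range for a +/- scale resize augmentation."""
    return (1.0 - scale, 1.0 + scale)


def covered_by_jitter(base: int, target: int, scale: float) -> bool:
    """True if training at `target` shows objects at a size the
    scale jitter at `base` resolution already produces."""
    low, high = jitter_range(scale)
    ratio = target / base  # how much larger objects appear at `target`
    return low <= ratio <= high


low, high = jitter_range(0.7)
print(f"zoom range at 640: {low:.1f}x to {high:.1f}x")
print("768 already covered:", covered_by_jitter(640, 768, 0.7))
```

Since 768 is only 1.2x of 640, it sits comfortably inside that assumed zoom range. The model at 640 was already seeing clovers at “768-like” sizes some of the time.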

5. P2 head sweet spot

The P2 detection head operates at stride 4, optimized for small objects. At 640, objects fell into P2’s optimal detection range. At 768, the same objects cover about 1.2x more feature-map cells per side, potentially moving them out of P2’s sweet spot.
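A quick sketch of what that looks like on the stride-4 grid. The 24 px object width is a hypothetical example, not a measurement from my dataset, and the sketch assumes images are resized proportionally so objects grow with the input size.

```python
P2_STRIDE = 4  # stride of the P2 head, as described above


def grid_size(imgsz: int, stride: int = P2_STRIDE) -> int:
    """Cells per side of the feature map at the given stride."""
    return imgsz // stride


def cells_spanned(object_px_at_640: float, imgsz: int, stride: int = P2_STRIDE) -> float:
    """Cells an object spans along one side, assuming the object
    scales proportionally when the image is resized to `imgsz`."""
    object_px = object_px_at_640 * imgsz / 640
    return object_px / stride


EXAMPLE_OBJECT_PX = 24  # hypothetical object width at 640

for imgsz in (640, 768):
    grid = grid_size(imgsz)
    cells = cells_spanned(EXAMPLE_OBJECT_PX, imgsz)
    print(f"imgsz={imgsz}: {grid}x{grid} P2 grid, object spans ~{cells:.1f} cells")
```

Whether a shift like that actually hurts depends on the real object size distribution, which is why I’d check the box sizes in the labels before blaming the head.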

When 768 would win

  • Large dataset (10,000+ images)
  • Very tiny objects (<10px at 640)
  • Enough GPU memory to keep batch=16
  • Training from scratch (no pretrained weights)

Takeaway

Higher resolution isn’t automatically better. Match your resolution to:

  • Pretrained model resolution
  • Dataset size
  • Available compute (batch size matters)
  • Actual object sizes in your data

For my clover detector, 640 + P2 + augmentation was the winning combo: 88.8% mAP, up from a 58.4% baseline.