YOLOv12: Attention-Centric Real-Time Object Detectors

Official PyTorch implementation of YOLOv12.

Comparisons with others in terms of latency-accuracy (left) and FLOPs-accuracy (right) trade-offs.

YOLOv12: Attention-Centric Real-Time Object Detectors.
Yunjie Tian, Qixiang Ye, and David Doermann

UPDATES 🔥

2025/02/18: Paper released on arXiv.

Abstract

Improving the network architecture of the YOLO framework has long been a priority, but the work has focused on CNN-based designs despite the proven superiority of attention mechanisms in modeling capability. The reason is that attention-based models have not matched the speed of CNN-based ones. This paper proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of previous CNN-based versions while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters.

Main Results

COCO

Installation

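# Prebuilt FlashAttention 2.7.3 wheel for CUDA 11, PyTorch 2.2, CPython 3.11, Linux x86_64
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
# Python 3.11 matches the cp311 tag of the wheel
conda create -n yolov12 python=3.11
conda activate yolov12
pip install -r requirements.txt
# Install this repository in editable mode
pip install -e .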
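
After the steps above, a quick sanity check can confirm that the environment matches the wheel. The following is a minimal sketch, not part of the repository: the helper names `missing_packages` and `python_matches_wheel` are hypothetical, and the list of import names (`torch`, `flash_attn`, `ultralytics`) is an assumption about what the install should provide.

```python
import importlib.util
import sys

# Top-level import names the README relies on (assumed; flash_attn comes from the wheel).
REQUIRED = ("torch", "flash_attn", "ultralytics")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches_wheel(expected=(3, 11)):
    """The wheel is tagged cp311, so it only installs on CPython 3.11."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print("Python matches cp311 wheel:", python_matches_wheel())
    print("Missing packages:", missing_packages() or "none")
```

Run it inside the activated `yolov12` environment; an empty list of missing packages suggests the install completed.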

Validation

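Pretrained checkpoints: yolov12n, yolov12s, yolov12m, yolov12l, yolov12x

from ultralytics import YOLO

# Pick one scale: yolov12n.pt, yolov12s.pt, yolov12m.pt, yolov12l.pt, or yolov12x.pt
model = YOLO('yolov12n.pt')
# Evaluate on COCO; save_json=True also writes the predictions to a JSON file
model.val(data='coco.yaml', save_json=True)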
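
To validate every scale in one run, the checkpoint name can be generated from the scale letter. This is a sketch under assumptions: `checkpoint_name` and `validate_all` are hypothetical helpers, and the checkpoints are assumed to sit in the working directory (or be resolvable by `YOLO`). `metrics.box.map` is the mAP 50-95 value reported by the Ultralytics detection metrics object.

```python
SCALES = ("n", "s", "m", "l", "x")


def checkpoint_name(scale):
    """Build a concrete checkpoint filename such as yolov12n.pt from a scale letter."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {SCALES}")
    return f"yolov12{scale}.pt"


def validate_all(data="coco.yaml"):
    """Run validation for every scale and collect mAP 50-95 per checkpoint."""
    # Imported lazily so checkpoint_name() works without ultralytics installed.
    from ultralytics import YOLO

    scores = {}
    for scale in SCALES:
        name = checkpoint_name(scale)
        metrics = YOLO(name).val(data=data, save_json=True)
        scores[name] = metrics.box.map  # mAP averaged over IoU 0.50-0.95
    return scores
```

Calling `validate_all()` returns a dictionary keyed by checkpoint filename, which is convenient for comparing against the published results.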

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

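# Train the model. The values below are for the N scale;
# the inline comments list the values to use for the S, M, L, and X scales.
results = model.train(
  data='coco.yaml',
  epochs=600,
  batch=256,
  imgsz=640,
  scale=0.5,  # S: 0.9; M: 0.9; L: 0.9; X: 0.9
  mosaic=1.0,
  mixup=0.0,  # S: 0.05; M: 0.15; L: 0.15; X: 0.2
  copy_paste=0.1,  # S: 0.15; M: 0.4; L: 0.5; X: 0.6
  device="0,1,2,3,4,5,6,7",  # train on eight GPUs
)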

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()
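
The per-scale values given in the inline comments of the training call can be collected into a lookup table so that the right augmentation settings are chosen automatically. This is a minimal sketch: `AUGMENT` and `train_kwargs` are hypothetical names, and it assumes the uncommented values (`scale=0.5`, `mixup=0.0`, `copy_paste=0.1`) belong to the N scale because the example loads `yolov12n.yaml`.

```python
# Augmentation settings per model scale, transcribed from the training example's comments.
AUGMENT = {
    "n": {"scale": 0.5, "mixup": 0.0, "copy_paste": 0.1},
    "s": {"scale": 0.9, "mixup": 0.05, "copy_paste": 0.15},
    "m": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.4},
    "l": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.5},
    "x": {"scale": 0.9, "mixup": 0.2, "copy_paste": 0.6},
}


def train_kwargs(scale, device="0"):
    """Merge the shared training settings with the scale-specific augmentation."""
    shared = {
        "data": "coco.yaml",
        "epochs": 600,
        "batch": 256,
        "imgsz": 640,
        "mosaic": 1.0,
        "device": device,
    }
    return {**shared, **AUGMENT[scale]}


# Usage (requires the ultralytics package from this repository):
#   from ultralytics import YOLO
#   YOLO("yolov12s.yaml").train(**train_kwargs("s", device="0,1,2,3,4,5,6,7"))
```

Keeping the table in one place avoids editing three arguments by hand each time the model scale changes.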

Finetuning

from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
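
The snippet above only loads the pretrained weights. One plausible way to continue from there is to call `train` on the loaded model with a custom dataset config. This is a sketch, not the authors' recipe: the `finetune` helper, the dataset path, and the epoch count are all hypothetical placeholders.

```python
def finetune(weights="yolov12n.pt", data="path/to/custom.yaml", epochs=100, imgsz=640, device="0"):
    """Load pretrained weights and continue training on another dataset."""
    # Imported lazily so this module can be imported without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # start from the pretrained checkpoint, not a .yaml config
    # data, epochs, and device here are illustrative values, not tuned settings.
    return model.train(data=data, epochs=epochs, imgsz=imgsz, device=device)
```

Loading a `.pt` file rather than a `.yaml` config is what distinguishes fine-tuning from training from scratch in this API.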

Prediction

from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or the s/m/l/x checkpoint
results = model.predict('path/to/image.jpg')  # replace with your own image path

Export

from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or the s/m/l/x checkpoint
model.export(format="engine", half=True)  # TensorRT engine in FP16; use format="onnx" for ONNX

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code base is built on ultralytics. Thanks for their excellent work!

Citation

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.xxxxx},
  year={2025}
}

Model	size (pixels)	mAP val 50-95	Speed (ms, T4 TensorRT10)	params (M)	FLOPs (G)
YOLO12n	640	40.6	1.64	2.6	6.5
YOLO12s	640	48.0	2.61	9.3	21.4
YOLO12m	640	52.5	4.86	20.2	67.5
YOLO12l	640	53.7	6.77	26.4	88.9
YOLO12x	640	55.2	11.79	59.1	199.0
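
The table can be turned into a few derived figures to compare scales. The sketch below copies the rows verbatim and computes an approximate throughput (1000 divided by the latency in milliseconds) and accuracy per unit of compute. The helper names are hypothetical, and treating single-image latency as throughput is a simplification that ignores batching.

```python
# (model, mAP 50-95, latency in ms on T4, params in M, FLOPs in G), copied from the table.
ROWS = [
    ("YOLO12n", 40.6, 1.64, 2.6, 6.5),
    ("YOLO12s", 48.0, 2.61, 9.3, 21.4),
    ("YOLO12m", 52.5, 4.86, 20.2, 67.5),
    ("YOLO12l", 53.7, 6.77, 26.4, 88.9),
    ("YOLO12x", 55.2, 11.79, 59.1, 199.0),
]


def throughput_fps(latency_ms):
    """Approximate images per second, assuming one image per forward pass."""
    return 1000.0 / latency_ms


def summarize(rows=ROWS):
    """Return throughput and mAP-per-GFLOP for each model in the table."""
    summary = {}
    for name, map_val, latency_ms, _params_m, flops_g in rows:
        summary[name] = {
            "fps": throughput_fps(latency_ms),
            "map_per_gflop": map_val / flops_g,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize().items():
        print(f"{name}: {stats['fps']:.1f} img/s, {stats['map_per_gflop']:.2f} mAP per GFLOP")
```

Both derived figures fall as the model grows, which reflects the usual trade-off: larger scales gain accuracy at a diminishing return per unit of compute.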
