
YOLOv12: Attention-Centric Real-Time Object Detectors

Official PyTorch implementation of YOLOv12.


Comparisons with others in terms of latency-accuracy (left) and parameter-accuracy (right) trade-offs.

YOLOv12: Attention-Centric Real-Time Object Detectors.\
Yunjie Tian, Qixiang Ye, and David Doermann

UPDATES 🔥

Abstract

Enhancing the network architecture of the YOLO framework has long been crucial, but work has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capability. The reason is that attention-based models have not matched the speed of CNN-based ones. This paper proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of CNN-based models while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in both speed and accuracy. For example, YOLOv12-N achieves $40.4$ mAP with an inference latency of $1.4$ ms on a T4 GPU, outperforming the advanced YOLOv10-N/YOLOv11-N by $1.9/1.0$ mAP while being $x.x\%/x.x\%$ faster. This advantage extends to the other model scales. Furthermore, YOLOv12-S achieves accuracy comparable to RT-DETR-R18/xxxx while running $86\%/xx\%$ faster, using only $35\%/xx\%$ of the computation and $45\%/xx\%$ of the parameters.

Main Results

COCO

Installation

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
conda activate yolov12
pip install -r requirements.txt
pip install -e .
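
After the steps above, a quick sanity check can confirm that the key packages are visible to the interpreter before starting a long run. The sketch below is a minimal example; the import names `torch`, `flash_attn`, and `ultralytics` are assumptions about what the install steps provide, not something stated in this README.

```python
import importlib.util

# Import names are assumptions: adjust them if your environment differs.
EXPECTED_MODULES = ("torch", "flash_attn", "ultralytics")


def check_modules(names=EXPECTED_MODULES):
    """Return {module_name: bool}, True when the top-level module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

A `MISSING` entry for `flash_attn` usually means the downloaded wheel was not installed into the active environment.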

Validation

yolov12n yolov12s yolov12m yolov12l yolov12x

from ultralytics import YOLO

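# Replace {n/s/m/l/x} with a single model scale, e.g. 'yolov12n'.
# Option 1: load the weights from the Hugging Face Hub
model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# Option 2: download the weights and load them from disk
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')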

model.val(data='coco.yaml', batch=128)
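
The call to `model.val(...)` above returns a metrics object. As a rough sketch of how to pull the headline numbers out of it, the helper below assumes the Ultralytics detection metrics layout, where `metrics.box` exposes `map`, `map50`, and `map75`. The function name and the stand-in object are hypothetical and only there so the snippet runs without a GPU or weights.

```python
from types import SimpleNamespace


def summarize_box_metrics(metrics):
    """Collect the main box mAP values from a validation metrics object.

    Assumes `metrics.box` has `map` (mAP50-95), `map50`, and `map75` attributes.
    """
    box = metrics.box
    return {
        "mAP50-95": round(float(box.map), 4),
        "mAP50": round(float(box.map50), 4),
        "mAP75": round(float(box.map75), 4),
    }


# Stand-in object with dummy values (NOT real results), used only to show the call.
fake_metrics = SimpleNamespace(box=SimpleNamespace(map=0.5, map50=0.75, map75=0.25))
summary = summarize_box_metrics(fake_metrics)
```

With a real run, pass the object returned by `model.val(data='coco.yaml', batch=128)` instead of the stand-in.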

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
  data='coco.yaml',
  epochs=600, 
  batch=128, 
  imgsz=640,
  device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()

# Export the model to ONNX format
path = model.export(format="onnx")  # return path to exported model
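
The training call above uses `batch=128` with `device="0,1,2,3"`. Assuming the trainer treats `batch` as a global batch size and splits it evenly across the listed GPUs (an assumption about Ultralytics-style multi-GPU training, not something this README states), the per-GPU batch can be worked out as in this small sketch. The helper name is hypothetical.

```python
def per_device_batch(batch, device):
    """Split a global batch size evenly across a comma-separated device list."""
    device_ids = [d.strip() for d in str(device).split(",") if d.strip()]
    n_devices = max(len(device_ids), 1)
    if batch % n_devices != 0:
        raise ValueError(f"batch={batch} is not a multiple of {n_devices} devices")
    return batch // n_devices


# Settings from the training example: 128 images over 4 GPUs.
images_per_gpu = per_device_batch(128, "0,1,2,3")
```

Under that assumption each GPU processes 32 images per step, which is the number to keep in mind when estimating memory use or scaling down to fewer devices.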

Finetuning


# If you want to finetune the model with pretrained weights, you could load the 
# pretrained weights like below
# model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# or
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
# model = YOLO('yolov12{n/s/m/l/x}.pt')
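
The `{n/s/m/l/x}` placeholder in the comments above stands for a single model scale. A small helper like the following sketch can expand it into a concrete file name and release URL. The URL pattern is copied from this README; the function and constant names are hypothetical.

```python
# Scales matching the checkpoints listed under Validation.
SCALES = ("n", "s", "m", "l", "x")
RELEASE_BASE = "https://github.com/sunsmarterjie/yolov12/releases/download/v1.0"


def weights_filename(scale):
    """Return the checkpoint file name for one scale, e.g. 'yolov12n.pt'."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {SCALES}")
    return f"yolov12{scale}.pt"


def weights_url(scale):
    """Return the release download URL for one scale."""
    return f"{RELEASE_BASE}/{weights_filename(scale)}"
```

For finetuning, the resulting file name can be passed to `YOLO(...)` after downloading, followed by `model.train(...)` with your own dataset YAML.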

Prediction

from ultralytics import YOLO

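# Replace {n/s/m/l/x} with a single model scale, e.g. 'yolov12n'.
# Option 1: load the weights from the Hugging Face Hub
model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# Option 2: download the weights and load them from disk
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')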

model.predict()
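
Prediction returns a list of result objects, one per image. The sketch below shows one way to turn a single result into plain Python records. It assumes the Ultralytics result layout, where `result.boxes` exposes `xyxy`, `conf`, and `cls` tensors and `result.names` maps class ids to labels. The function name is hypothetical.

```python
def detections_to_records(result):
    """Convert one prediction result into a list of plain dictionaries.

    Assumes `result.boxes.xyxy`, `result.boxes.conf`, and `result.boxes.cls`
    support `.tolist()`, and `result.names` maps integer class ids to labels.
    """
    boxes = result.boxes
    records = []
    for xyxy, conf, cls in zip(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()):
        records.append(
            {
                "label": result.names[int(cls)],
                "confidence": float(conf),
                "box_xyxy": [float(v) for v in xyxy],
            }
        )
    return records
```

A typical call would be `detections_to_records(model.predict("path/to/image.jpg")[0])`, after which the records can be filtered by confidence or written out as JSON.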

Export

from ultralytics import YOLO

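# Replace {n/s/m/l/x} with a single model scale, e.g. 'yolov12n'.
# Option 1: load the weights from the Hugging Face Hub
model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# Option 2: download the weights and load them from disk
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')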

model.export(...)

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code is built on ultralytics. Thanks for their excellent work!

Citation

If our code or models help your work, please cite our paper:

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.xxxxx},
  year={2025}
}

| Model | size (pixels) | mAPval 50-95 | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|
| YOLO12n | 640 | xx.x | x.xx | 2.5 | 6.6 |
| YOLO12s | 640 | xx.x | x.xx | 8.9 | 22.0 |
| YOLO12m | 640 | xx.x | x.xx | 19.9 | 69.7 |
| YOLO12l | 640 | xx.x | x.xx | 28.3 | 97.2 |
| YOLO12x | 640 | xx.x | xx.x | 63.2 | 216.5 |