
YOLOv12: Attention-Centric Real-Time Object Detector

Official PyTorch implementation of YOLOv12.

Comparisons with others in terms of latency-accuracy (left) and parameter-accuracy (right) trade-offs.

YOLOv12: Attention-Centric Real-Time Object Detector.
Yunjie Tian, Qixiang Ye, and David Doermann

UPDATES 🔥

2025/02/15: Added a Colab demo, a HuggingFace demo, and a HuggingFace model page. Thanks to SkalskiP and kadirnar!

Abstract

Enhancing the network architecture of the YOLO framework has long been crucial, yet work has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capability. This is because attention-based models have been unable to match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of CNN-based ones while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in both speed and accuracy. For example, YOLOv12-N achieves $40.4$ mAP with an inference latency of $1.4$ ms on a T4 GPU, outperforming the advanced YOLOv10-N/YOLOv11-N by $1.9/1.0$ mAP while being $x.x\%/x.x\%$ faster. This advantage extends to other model scales. Furthermore, YOLOv12-S achieves accuracy comparable to RT-DETR-R18/xxxx while running $86\%/xx\%$ faster, using only $35\%/xx\%$ of the computation and $45\%/xx\%$ of the parameters.

Main Results

COCO

Installation

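# Download the prebuilt FlashAttention 2.7.3 wheel (CUDA 11, PyTorch 2.2, CPython 3.11, Linux x86_64)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
# Create and activate a Python 3.11 environment matching the wheel
conda create -n yolov12 python=3.11
conda activate yolov12
# Install the dependencies, then this package in editable mode
pip install -r requirements.txt
pip install -e .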
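
The wheel filename above encodes the interpreter and platform it was built for, and pip will refuse a wheel whose tags do not match the active environment. The following is a minimal sketch, using only the standard library, of how one might check the Python tag before installing. The helper names `parse_wheel_tags` and `wheel_matches_interpreter` are illustrative and not part of this repository, and the parser assumes the five-field wheel name used here (no optional build tag).

```python
import sys

# Wheel filename taken from the installation commands above.
WHEEL = "flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"


def parse_wheel_tags(filename):
    """Split a wheel filename into its name, version, and compatibility tags.

    Assumes the form name-version-python_tag-abi_tag-platform_tag.whl,
    i.e. no optional build tag (hypothetical helper, for illustration only).
    """
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def wheel_matches_interpreter(filename, version_info=sys.version_info):
    """Return True if the wheel's CPython tag matches the given interpreter version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return parse_wheel_tags(filename)["python_tag"] == expected


if __name__ == "__main__":
    tags = parse_wheel_tags(WHEEL)
    print("wheel targets:", tags["python_tag"], tags["platform_tag"])
    print("matches this interpreter:", wheel_matches_interpreter(WHEEL))
```

Running this inside the activated `yolov12` environment should report a match when the environment was created with `python=3.11` as shown above. It does not check the CUDA or PyTorch versions embedded in the wheel's version string.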

Validation

yolov12n yolov12s yolov12m yolov12l yolov12x

from ultralytics import YOLO

# Replace the {n/s/m/l/x} placeholder with one model scale
model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# or
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')

model.val(data='coco.yaml', batch=128)
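
The `{n/s/m/l/x}` notation in the snippets is a placeholder for a single scale letter, not a literal string. As a small sketch of how the placeholder expands, the helpers below build the checkpoint filename, release URL, and Hub identifier for one scale from the patterns shown in this README. The function names are hypothetical and exist only for illustration.

```python
# Patterns copied from the commands in this README.
RELEASE_URL = "https://github.com/sunsmarterjie/yolov12/releases/download/v1.0"
HUB_OWNER = "sunsmarterjie"
SCALES = ("n", "s", "m", "l", "x")


def weights_filename(scale):
    """Return the checkpoint filename for one scale, e.g. 'yolov12n.pt'."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {SCALES}")
    return f"yolov12{scale}.pt"


def weights_url(scale):
    """Return the release download URL for one scale (illustrative helper)."""
    return f"{RELEASE_URL}/{weights_filename(scale)}"


def hub_id(scale):
    """Return the identifier passed to from_pretrained for one scale."""
    weights_filename(scale)  # reuse the scale validation
    return f"{HUB_OWNER}/yolov12{scale}"


if __name__ == "__main__":
    for s in SCALES:
        print(s, weights_filename(s), weights_url(s), hub_id(s))
```

For example, choosing the `n` scale gives the filename `yolov12n.pt`, which is what would be passed to `YOLO(...)` after downloading it with `wget`.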

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
  data='coco.yaml',
  epochs=600, 
  batch=128, 
  imgsz=640,
  device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()

# Export the model to ONNX format
path = model.export(format="onnx")  # return path to exported model
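
The training call above combines `batch=128` with `device="0,1,2,3"`. When adapting it to a different number of GPUs, it can help to work out the per-GPU share first. The sketch below does that arithmetic, assuming the global batch is split evenly across the listed devices, as is typical for data-parallel training. Whether this code base divides the batch in exactly this way is an assumption here, and `per_device_batch` is a hypothetical helper, not part of the package.

```python
def parse_devices(device):
    """Turn a device string such as '0,1,2,3' into a list of GPU indices."""
    return [int(d) for d in device.split(",") if d.strip()]


def per_device_batch(global_batch, device):
    """Return the per-GPU batch size, assuming an even split across devices."""
    n_devices = len(parse_devices(device))
    if n_devices == 0:
        raise ValueError("no devices listed")
    if global_batch % n_devices != 0:
        raise ValueError(
            f"batch {global_batch} is not divisible by {n_devices} devices"
        )
    return global_batch // n_devices


if __name__ == "__main__":
    # Values from the training example: batch=128 on four GPUs.
    print(per_device_batch(128, "0,1,2,3"))  # prints 32
```

Under that assumption, the four-GPU configuration above places 32 images on each device per step, so a two-GPU run that keeps the same per-device load would use `batch=64`.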

Finetuning


# To finetune from pretrained weights, load them as shown below
# (replace the {n/s/m/l/x} placeholder with one model scale):
# model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# or
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
# model = YOLO('yolov12{n/s/m/l/x}.pt')

Prediction

from ultralytics import YOLO

model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# or
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')

model.predict()

Export

from ultralytics import YOLO

model = YOLO.from_pretrained('sunsmarterjie/yolov12{n/s/m/l/x}')
# or
# wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12{n/s/m/l/x}.pt
model = YOLO('yolov12{n/s/m/l/x}.pt')

model.export(...)

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code base is built on ultralytics. Thanks for their excellent work!

Citation

If our code or models help your work, please cite our paper:

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.xxxxx},
  year={2025}
}

Model	Size (pixels)	mAP val 50-95	Speed T4 TensorRT10 (ms)	Params (M)	FLOPs (B)
YOLO12n	640	xx.x	x.xx	2.5	6.6
YOLO12s	640	xx.x	x.xx	8.9	22.0
YOLO12m	640	xx.x	x.xx	19.9	69.7
YOLO12l	640	xx.x	x.xx	28.3	97.2
YOLO12x	640	xx.x	xx.x	63.2	216.5
