Keine Beschreibung

田运杰 4adbe10cf2 upload pyproject		vor 1 Jahr
assets	7de69b12d9 upload figures	vor 1 Jahr
docker	1c68863560 upload docker	vor 1 Jahr
examples	1c0ef1216a examples	vor 1 Jahr
tests	c4a0a465fc test	vor 1 Jahr
ultralytics	6b4cd3a234 init	vor 1 Jahr
LICENSE	2758ec06b2 Initial commit	vor 1 Jahr
README.md	05505a5889 Update README.md	vor 1 Jahr
pyproject.toml	4adbe10cf2 upload pyproject	vor 1 Jahr
requirements.txt	f610fddfa7 upload requirements	vor 1 Jahr

YOLOv12: Attention-Centric Real-Time Object Detector

Official PyTorch implementation of YOLOv12.

Comparisons with others in terms of latency-accuracy (left) and parameter-accuracy (right) trade-offs.

YOLOv12: Attention-Centric Real-Time Object Detector.\ Yunjie Tian, Qixiang Ye, and David Doermann\

UPDATES 🔥

2025/02/15: Add colab demo, HuggingFace Demo, and HuggingFace Model Page. Thanks to SkalskiP and kadirnar!

Abstract

Enhancing the network architecture of the YOLO framework has been long crucial yet focused on CNN-based improvements, despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of CNN-based ones while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in both speed and accuracy. For example, YOLOv12-N achieves $40.4$ mAP with an inference latency of $1.4$ ms on a T4 GPU, outperforming the advanced YOLOv10-N/YOLOv11-N by $1.9/1.0$ mAP and being $x.x\%/x.x\%$ faster. This advantage extends to other model scales. Furthermore, YOLOv12-S achieves comparable accuracy to RT-DETR-R18/xxxx while running $86\%/xx\%$ faster, using only $35\%/xx\%$ of the computation and $45\%/xx\%$ of the parameters

Main Results

COCO

| Model | size
^{(pixels) | mAP^{val
50-95 | Speed
^{T4 TensorRT10
| params
^{(M) | FLOPs
^{(B) |
| ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| YOLO12n | 640 | xx.x | x.xx | 2.5 | 6.6 |
| YOLO12s | 640 | xx.x | x.xx | 8.9 | 22.0 |
| YOLO12m | 640 | xx.x | x.xx | 19.9 | 69.7 |
| YOLO12l | 640 | xx.x | x.xx | 28.3 | 97.2 |
| YOLO12x | 640 | xx.x | xx.x | 63.2 | 216.5 |}}}}}

Installation

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
source activate yolov12
pip install -r requirements.txt
pip install -e .

Validation

yolov10n yolov10s yolov10m yolov10b yolov10l yolov10x

from ultralytics import YOLO

model = YOLO.from_pretrained('sunsmarterjie/yolov10{n/s/m/b/l/x}')
# or
# wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
model = YOLO('yolov12{n/s/m/b/l/x}.pt')

model.val(data='coco.yaml', batch=128)

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
  data='coco.yaml',
  epochs=600, 
  batch=128, 
  imgsz=640,
  device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()

# Export the model to ONNX format
path = model.export(format="onnx")  # return path to exported model

Finetuning


# If you want to finetune the model with pretrained weights, you could load the 
# pretrained weights like below
# model = YOLO.from_pretrained('sunsmarterjie/yolov10{n/s/m/b/l/x}')
# or
# wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
# model = YOLO('yolov12{n/s/m/b/l/x}.pt')

Prediction

from ultralytics import YOLO

model = YOLO.from_pretrained('sunsmarterjie/yolov10{n/s/m/b/l/x}')
# or
# wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
model = YOLO('yolov12{n/s/m/b/l/x}.pt')

model.predict()

Export

from ultralytics import YOLO

model = YOLO.from_pretrained('sunsmarterjie/yolov10{n/s/m/b/l/x}')
# or
# wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
model = YOLOv10('yolov12{n/s/m/b/l/x}.pt')

model.export(...)

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code base is based on ultralytics. Thanks for their excellent work!

Citation

If our code or models help your work, please cite our paper:

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.xxxxx},
  year={2025}
}

README.md