# YOLOv12: Attention-Centric Real-Time Object Detector

Official PyTorch implementation of YOLOv12.


Comparisons with others in terms of latency-accuracy (left) and parameter-accuracy (right) trade-offs.

**YOLOv12: Attention-Centric Real-Time Object Detector**\
Yunjie Tian, Qixiang Ye, and David Doermann\
arXiv | Open In Colab | Hugging Face Spaces

## UPDATES 🔥

**Abstract** Enhancing the network architecture of the YOLO framework has long been crucial, yet the effort has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capability, because attention-based models have not been able to match the speed of CNN-based ones. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of CNN-based frameworks while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in both speed and accuracy. For example, YOLOv12-N achieves $40.4$ mAP with an inference latency of $1.4$ ms on a T4 GPU, outperforming the advanced YOLOv10-N/YOLOv11-N by $1.9/1.0$ mAP while being $x.x\%/x.x\%$ faster. This advantage extends to the other model scales. Furthermore, YOLOv12-S achieves accuracy comparable to RT-DETR-R18/xxxx while running $86\%/xx\%$ faster and using only $35\%/xx\%$ of the computation and $45\%/xx\%$ of the parameters.

## Main Results

### COCO

| Model   | size<br>(pixels) | mAP<sup>val</sup><br>50-95 | Speed<br>T4 TensorRT10<br>(ms) | params<br>(M) | FLOPs<br>(B) |
| ------- | ---------------- | -------------------------- | ------------------------------ | ------------- | ------------ |
| YOLO12n | 640              | xx.x                       | x.xx                           | 2.5           | 6.6          |
| YOLO12s | 640              | xx.x                       | x.xx                           | 8.9           | 22.0         |
| YOLO12m | 640              | xx.x                       | x.xx                           | 19.9          | 69.7         |
| YOLO12l | 640              | xx.x                       | x.xx                           | 28.3          | 97.2         |
| YOLO12x | 640              | xx.x                       | xx.x                           | 63.2          | 216.5        |
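The speed column reports latency with TensorRT on a T4 GPU. Below is a minimal sketch of how such a latency/accuracy measurement could be reproduced with the standard ultralytics API (assumed unchanged in this fork); `yolov12n.pt` is a hypothetical locally downloaded checkpoint, and a CUDA GPU with TensorRT installed is required.

```python
from ultralytics import YOLO

# export a TensorRT engine (FP16 here; the table's exact precision settings may differ)
model = YOLO('yolov12n.pt')
engine_path = model.export(format="engine", half=True)

# validate the engine on COCO and read accuracy plus per-image timings
trt_model = YOLO(engine_path, task="detect")
metrics = trt_model.val(data='coco.yaml', batch=1)
print(metrics.box.map)  # mAP 50-95, as in the table
print(metrics.speed)    # preprocess/inference/postprocess times in ms per image
```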

## Installation

```bash
# download a prebuilt FlashAttention wheel (CUDA 11, PyTorch 2.2, Python 3.11)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
source activate yolov12
pip install -r requirements.txt
pip install -e .
```
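A quick sanity check after installation (a sketch; it assumes the `yolov12` environment is active and that the FlashAttention wheel downloaded above has been installed into it):

```python
# verify that the core dependencies import and that a GPU is visible
import torch
import flash_attn             # provided by the wheel downloaded above
from ultralytics import YOLO  # installed by `pip install -e .`

print(torch.__version__, torch.cuda.is_available())
print(flash_attn.__version__)
```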

## Validation

yolov12n yolov12s yolov12m yolov12l yolov12x

```python
from ultralytics import YOLO

# load a YOLOv12 checkpoint at the desired scale (n/s/m/l/x)
model = YOLO('yolov12{n/s/m/l/x}.pt')

model.val(data='coco.yaml', batch=128)
```
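`val` returns a metrics object whose headline numbers can be read directly (a sketch using standard ultralytics attribute names, assumed unchanged in this fork):

```python
metrics = model.val(data='coco.yaml', batch=128)
print(metrics.box.map)    # mAP 50-95, the value reported in the COCO table above
print(metrics.box.map50)  # mAP 50
print(metrics.speed)      # per-image preprocess/inference/postprocess times (ms)
```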

## Training

```python
from ultralytics import YOLO

# build a YOLOv12-N model from its config
model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
  data='coco.yaml',
  epochs=600,
  batch=128,
  imgsz=640,
  device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()

# Export the model to ONNX format
path = model.export(format="onnx")  # returns the path to the exported model
```
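After training, the best checkpoint can be reloaded like any other weight file. A short sketch, assuming the default ultralytics run layout (the actual run directory name may differ):

```python
from ultralytics import YOLO

# default ultralytics output location for the best weights of the first run
best = YOLO('runs/detect/train/weights/best.pt')
metrics = best.val(data='coco.yaml')
```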

## Finetuning

```python
# To finetune rather than train from scratch, load a pretrained YOLOv12 checkpoint
# instead of a .yaml config before calling model.train():
# model = YOLO('yolov12{n/s/m/l/x}.pt')
```
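A minimal end-to-end finetuning sketch, assuming a local `yolov12n.pt` checkpoint and a custom dataset described by a hypothetical `my_dataset.yaml` in the standard ultralytics data format:

```python
from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # start from pretrained weights

model.train(
  data='my_dataset.yaml',    # hypothetical custom dataset config
  epochs=100,
  batch=32,
  imgsz=640,
)
```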

## Prediction

```python
from ultralytics import YOLO

model = YOLO('yolov12{n/s/m/l/x}.pt')

model.predict()
```
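Called with a source, `predict` returns a list of `Results` objects; a short sketch of reading them (standard ultralytics Results API, assumed unchanged in this fork):

```python
results = model.predict('path/to/image.jpg', conf=0.25)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf), box.xyxy.tolist())  # class id, confidence, box corners
annotated = results[0].plot()  # annotated image as a BGR numpy array
```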

## Export

```python
from ultralytics import YOLO

model = YOLO('yolov12{n/s/m/l/x}.pt')

model.export(...)  # e.g. format="onnx", as in the training example above
```
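As one downstream usage example, an ONNX export can be run with `onnxruntime` (a sketch; onnxruntime is not listed in this repository's requirements, and `yolov12n.onnx` is a hypothetical export filename):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('yolov12n.onnx', providers=['CPUExecutionProvider'])
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)  # NCHW input at the 640 px size used above
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print([o.shape for o in outputs])  # raw head outputs; post-processing/NMS still required
```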

## Demo

```bash
python app.py
# Please visit http://127.0.0.1:7860
```
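Port 7860 is Gradio's default, so `app.py` presumably launches a Gradio interface. A minimal sketch of such a demo (the actual app.py may differ; `yolov12n.pt` is a hypothetical local checkpoint):

```python
import gradio as gr
from ultralytics import YOLO

model = YOLO('yolov12n.pt')

def detect(image):
    results = model.predict(image)           # PIL input is handled natively by ultralytics
    annotated_bgr = results[0].plot()        # annotated image as a BGR numpy array
    return annotated_bgr[:, :, ::-1].copy()  # convert to RGB for display

demo = gr.Interface(fn=detect, inputs=gr.Image(type="pil"), outputs=gr.Image())
demo.launch()  # serves on http://127.0.0.1:7860 by default
```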

## Acknowledgement

The codebase is built on ultralytics. Thanks for their excellent work!

## Citation

If our code or models help your work, please cite our paper:

```BibTeX
@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.xxxxx},
  year={2025}
}
```