YOLOv12

YOLOv12: Attention-Centric Real-Time Object Detectors

[Yunjie Tian](https://sunsmarterjie.github.io/)¹, [Qixiang Ye](https://people.ucas.ac.cn/~qxye?language=en)², [David Doermann](https://cse.buffalo.edu/~doermann/)¹

¹ University at Buffalo, SUNY; ² University of Chinese Academy of Sciences


Comparison with popular methods in terms of latency-accuracy (left) and FLOPs-accuracy (right) trade-offs



Abstract

Enhancing the network architecture of the YOLO framework has long been crucial, but work has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capability. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of previous CNN-based detectors while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming the advanced YOLOv10-N / YOLOv11-N by 2.1% / 1.2% mAP at comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve on DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters.

Main Results

Turbo (default):

| Model (det) | size (pixels) | mAP<sup>val</sup> 50-95 | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (G) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| YOLO12n | 640 | 40.4 | 1.60 | 2.5 | 6.0 |
| YOLO12s | 640 | 47.6 | 2.42 | 9.1 | 19.4 |
| YOLO12m | 640 | 52.5 | 4.27 | 19.6 | 59.8 |
| YOLO12l | 640 | 53.8 | 5.83 | 26.5 | 82.4 |
| YOLO12x | 640 | 55.4 | 10.38 | 59.3 | 184.6 |

v1.0:

| Model (det) | size (pixels) | mAP<sup>val</sup> 50-95 | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (G) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| YOLO12n | 640 | 40.6 | 1.64 | 2.6 | 6.5 |
| YOLO12s | 640 | 48.0 | 2.61 | 9.3 | 21.4 |
| YOLO12m | 640 | 52.5 | 4.86 | 20.2 | 67.5 |
| YOLO12l | 640 | 53.7 | 6.77 | 26.4 | 88.9 |
| YOLO12x | 640 | 55.2 | 11.79 | 59.1 | 199.0 |
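To make the Turbo versus v1.0 trade-off concrete, the short Python sketch below recomputes, per scale, the change in mAP, the relative latency reduction, and the relative FLOPs reduction from the two detection tables. The numbers are copied from the tables; the helper name `compare` is illustrative only.

```python
# (mAP val 50-95, T4 TensorRT10 latency in ms, FLOPs in G), copied from the detection tables
TURBO = {
    "n": (40.4, 1.60, 6.0),
    "s": (47.6, 2.42, 19.4),
    "m": (52.5, 4.27, 59.8),
    "l": (53.8, 5.83, 82.4),
    "x": (55.4, 10.38, 184.6),
}
V1_0 = {
    "n": (40.6, 1.64, 6.5),
    "s": (48.0, 2.61, 21.4),
    "m": (52.5, 4.86, 67.5),
    "l": (53.7, 6.77, 88.9),
    "x": (55.2, 11.79, 199.0),
}


def compare(scale: str) -> dict:
    """Turbo relative to v1.0 for one model scale (hypothetical helper)."""
    t_map, t_ms, t_flops = TURBO[scale]
    v_map, v_ms, v_flops = V1_0[scale]
    return {
        "delta_map": round(t_map - v_map, 1),                      # mAP points gained (+) or lost (-)
        "latency_cut_pct": round(100 * (1 - t_ms / v_ms), 1),      # % lower latency than v1.0
        "flops_cut_pct": round(100 * (1 - t_flops / v_flops), 1),  # % fewer FLOPs than v1.0
    }


if __name__ == "__main__":
    for s in TURBO:
        print(s, compare(s))
```

Run as a script, it shows Turbo giving up at most 0.4 mAP (at the S scale) while reducing both latency and FLOPs at every scale.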

Instance segmentation:

| Model (seg) | size (pixels) | mAP<sup>box</sup> 50-95 | mAP<sup>mask</sup> 50-95 | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (G) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| YOLOv12n-seg | 640 | 39.9 | 32.8 | 1.84 | 2.8 | 9.9 |
| YOLOv12s-seg | 640 | 47.5 | 38.6 | 2.84 | 9.8 | 33.4 |
| YOLOv12m-seg | 640 | 52.4 | 42.3 | 6.27 | 21.9 | 115.1 |
| YOLOv12l-seg | 640 | 54.0 | 43.2 | 7.61 | 28.8 | 137.7 |
| YOLOv12x-seg | 640 | 55.2 | 44.2 | 15.43 | 64.5 | 308.7 |

Installation

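```bash
# Prebuilt FlashAttention 2.7.3 wheel (CUDA 11, PyTorch 2.2, Python 3.11, Linux x86_64)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
conda activate yolov12
pip install -r requirements.txt
pip install -e .  # install this repository in editable (development) mode
```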
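After installation, a quick way to confirm the environment matches what the commands above target (Python 3.11, plus PyTorch and FlashAttention) is to check the interpreter version and whether the key packages can be found. This is a minimal sketch: the package list (`torch`, `flash_attn`, `ultralytics`) is an assumption based on the install steps, and `find_spec` only checks that a package is importable, not that its CUDA build actually works.

```python
import importlib.util
import sys

# Packages the install steps are expected to provide (assumed list).
REQUIRED = ("torch", "flash_attn", "ultralytics")


def check_environment(packages=REQUIRED, expected=(3, 11)) -> dict:
    """Report whether the Python version matches and which packages are importable."""
    report = {"python_ok": sys.version_info[:2] == expected}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key:12s} {'OK' if ok else 'MISSING'}")
```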

Validation

Pretrained weights: `yolov12n`, `yolov12s`, `yolov12m`, `yolov12l`, `yolov12x`

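```python
from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
model.val(data='coco.yaml', save_json=True)  # save_json=True also saves predictions as COCO-format JSON
```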
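To validate every released scale in one run, the `yolov12{n/s/m/l/x}.pt` placeholder can be expanded into concrete file names. The sketch below assumes the weights sit in the working directory under exactly these names; `weight_file` and `validate_all` are hypothetical helpers, while `model.val()` and the returned `metrics.box.map` (mAP50-95) attribute belong to the Ultralytics validation API.

```python
SCALES = ("n", "s", "m", "l", "x")


def weight_file(scale: str) -> str:
    """Expand the yolov12{n/s/m/l/x}.pt placeholder for one scale."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {SCALES}")
    return f"yolov12{scale}.pt"


def validate_all(data: str = "coco.yaml", scales=SCALES) -> dict:
    """Validate each scale and collect box mAP50-95 (hypothetical helper)."""
    from ultralytics import YOLO  # imported lazily so weight_file() works without it

    results = {}
    for scale in scales:
        metrics = YOLO(weight_file(scale)).val(data=data, save_json=True)
        results[scale] = metrics.box.map  # mAP50-95 on the validation split
    return results
```

Calling `validate_all()` returns one mAP50-95 value per scale, keyed by the scale letter.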

Training

```python
from ultralytics import YOLO

model = YOLO('yolov12n.yaml')  # build a new YOLOv12-N model from its YAML config

# Train the model (values shown are for N; the comments list the other scales)
results = model.train(
    data='coco.yaml',
    epochs=600,
    batch=256,
    imgsz=640,
    scale=0.5,  # N; S:0.9; M:0.9; L:0.9; X:0.9
    mosaic=1.0,
    mixup=0.0,  # N; S:0.05; M:0.15; L:0.15; X:0.2
    copy_paste=0.1,  # N; S:0.15; M:0.4; L:0.5; X:0.6
    device="0,1,2,3",  # four GPUs; adjust to your hardware
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()
```
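The inline comments in the training script encode per-scale augmentation settings. Collecting them in one table makes it harder to train, say, YOLOv12-L with the N-scale values. This is a sketch: `AUG` and `train_kwargs` are hypothetical names, the augmentation values are copied from the script's comments, and the remaining arguments mirror the script (600 epochs, batch 256, four GPUs).

```python
# Per-scale augmentation values copied from the training script's comments.
AUG = {
    "n": {"scale": 0.5, "mixup": 0.0, "copy_paste": 0.1},
    "s": {"scale": 0.9, "mixup": 0.05, "copy_paste": 0.15},
    "m": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.4},
    "l": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.5},
    "x": {"scale": 0.9, "mixup": 0.2, "copy_paste": 0.6},
}


def train_kwargs(size: str, data: str = "coco.yaml", device: str = "0,1,2,3") -> dict:
    """Build model.train() keyword arguments for one scale (hypothetical helper)."""
    if size not in AUG:
        raise ValueError(f"unknown scale {size!r}; expected one of {sorted(AUG)}")
    return {
        "data": data,
        "epochs": 600,
        "batch": 256,
        "imgsz": 640,
        "mosaic": 1.0,
        "device": device,
        **AUG[size],  # scale-specific scale / mixup / copy_paste
    }
```

Usage, following the same pattern as the script: `YOLO('yolov12s.yaml').train(**train_kwargs('s'))`.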

Prediction

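```python
from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
results = model.predict(source='path/to/image.jpg')  # also accepts a video, directory, URL, or webcam index
```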
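`model.predict()` returns a list of `Results` objects, one per image. A common next step is to filter detections by confidence and count them per class. The sketch below keeps that logic free of Ultralytics types so it can be tested on plain lists; `count_detections` is a hypothetical helper and the 0.5 threshold is an arbitrary example. With a real result `r`, the inputs would come from `r.names`, `r.boxes.cls.tolist()`, and `r.boxes.conf.tolist()`.

```python
from collections import Counter


def count_detections(names: dict, classes: list, confs: list, min_conf: float = 0.5) -> dict:
    """Count detections per class name, keeping only those with conf >= min_conf."""
    counts = Counter(
        names[int(c)]  # class ids arrive as floats from r.boxes.cls.tolist()
        for c, p in zip(classes, confs)
        if p >= min_conf
    )
    return dict(counts)


# Example with plain data shaped like one Ultralytics result
names = {0: "person", 2: "car"}
print(count_detections(names, classes=[0.0, 0.0, 2.0, 2.0], confs=[0.91, 0.42, 0.77, 0.66]))
# {'person': 1, 'car': 2}
```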

Export

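```python
from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
model.export(format="engine", half=True)  # FP16 TensorRT engine (needs an NVIDIA GPU and TensorRT); or format="onnx"
```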
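Running the exported ONNX or TensorRT model outside Ultralytics means reproducing its preprocessing yourself. Ultralytics resizes images with a letterbox: scale the image to fit the square input while keeping its aspect ratio, then pad the remainder. The sketch below computes only that geometry, assuming the default 640×640 export size and a centered image; `letterbox_params` and `to_original` are hypothetical helpers, and the actual resize and pad (for example with OpenCV) are left out.

```python
def letterbox_params(height: int, width: int, size: int = 640):
    """Return (scale, (new_h, new_w), (top, bottom, left, right)) for a square letterbox."""
    scale = min(size / height, size / width)       # the longer side fits exactly
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h, pad_w = size - new_h, size - new_w      # total padding per axis
    top, left = pad_h // 2, pad_w // 2             # center the resized image
    return scale, (new_h, new_w), (top, pad_h - top, left, pad_w - left)


def to_original(x: float, y: float, scale: float, top: int, left: int):
    """Map a point from letterboxed model coordinates back to the source image."""
    return (x - left) / scale, (y - top) / scale


scale, (nh, nw), (top, bottom, left, right) = letterbox_params(1080, 1920)
print(nh, nw, top, bottom, left, right)  # 360 640 140 140 0 0
```

Box coordinates predicted on the 640×640 input are mapped back with `to_original`, which undoes the padding offset first and then the scale.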

Demo

```bash
python app.py
# then open http://127.0.0.1:7860 in a browser
```

Acknowledgement

The code is based on ultralytics. Thanks for their excellent work!

Citation

```bibtex
@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.12524},
  year={2025}
}
```