YOLOv12

YOLOv12: Attention-Centric Real-Time Object Detectors

[Yunjie Tian](https://sunsmarterjie.github.io/)¹, [Qixiang Ye](https://people.ucas.ac.cn/~qxye?language=en)², [David Doermann](https://cse.buffalo.edu/~doermann/)¹ ¹ University at Buffalo, SUNY, ² University of Chinese Academy of Sciences.

Comparison with popular methods in terms of latency-accuracy (left) and FLOPs-accuracy (right) trade-offs

Updates

2025/06/17: Use this repo for YOLOv12 instead of ultralytics. Their implementation is inefficient, requires more memory, and trains unstably; these issues are fixed here.
2025/06/04: YOLOv12's instance segmentation models are released, see code.
2025/04/15: Pretrain a YOLOv12 model with LightlyTrain, a novel framework that lets you pretrain any computer vision model on your unlabeled data, with YOLOv12 support. Here is also a Colab tutorial!
2025/03/18: Several users have asked about the heatmap. See this issue.
2025/03/09: YOLOv12-turbo is released: a faster YOLOv12 version.
2025/02/24: Blogs: ultralytics, LearnOpenCV. Thanks to them!
2025/02/22: YOLOv12 TensorRT CPP Inference Repo + Google Colab Notebook.
2025/02/22: Android deploy / TensorRT-YOLO accelerates yolo12. Thanks to them!
2025/02/21: Try yolo12 for classification, oriented bounding boxes, pose estimation, and instance segmentation at ultralytics. Please pay attention to this issue. Thanks to them!
2025/02/20: Any computer or edge device? / ONNX CPP Version. Thanks to them!
2025/02/20: Train a yolov12 model on a custom dataset: Blog and Youtube. / Step-by-step instruction. Thanks to them!
2025/02/19: arXiv version is public. Demo is available (try Demo2 Demo3 if busy).

Abstract

Enhancing the network architecture of the YOLO framework has been crucial for a long time but has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters.

Main Results

Turbo (default):

| Model (det) | size (pixels) | mAP val 50-95 | Speed (ms) T4 TensorRT10 | params (M) | FLOPs (G) |
| :---------- | :-----------: | :-----------: | :----------------------: | :--------: | :-------: |
| YOLO12n | 640 | 40.4 | 1.60 | 2.5 | 6.0 |
| YOLO12s | 640 | 47.6 | 2.42 | 9.1 | 19.4 |
| YOLO12m | 640 | 52.5 | 4.27 | 19.6 | 59.8 |
| YOLO12l | 640 | 53.8 | 5.83 | 26.5 | 82.4 |
| YOLO12x | 640 | 55.4 | 10.38 | 59.3 | 184.6 |

v1.0:

| Model (det) | size (pixels) | mAP val 50-95 | Speed (ms) T4 TensorRT10 | params (M) | FLOPs (G) |
| :---------- | :-----------: | :-----------: | :----------------------: | :--------: | :-------: |
| YOLO12n | 640 | 40.6 | 1.64 | 2.6 | 6.5 |
| YOLO12s | 640 | 48.0 | 2.61 | 9.3 | 21.4 |
| YOLO12m | 640 | 52.5 | 4.86 | 20.2 | 67.5 |
| YOLO12l | 640 | 53.7 | 6.77 | 26.4 | 88.9 |
| YOLO12x | 640 | 55.2 | 11.79 | 59.1 | 199.0 |
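The Turbo and v1.0 tables can be compared directly. The following is a minimal sketch that computes the relative latency and FLOPs reduction of Turbo over v1.0 for each scale. The numbers are copied from the two tables; the helper names `percent_reduction` and `turbo_savings` are illustrative and not part of this repo.

```python
# Values copied from the Turbo and v1.0 tables: (latency ms, params M, FLOPs G).
TURBO = {
    "n": (1.60, 2.5, 6.0),
    "s": (2.42, 9.1, 19.4),
    "m": (4.27, 19.6, 59.8),
    "l": (5.83, 26.5, 82.4),
    "x": (10.38, 59.3, 184.6),
}
V1 = {
    "n": (1.64, 2.6, 6.5),
    "s": (2.61, 9.3, 21.4),
    "m": (4.86, 20.2, 67.5),
    "l": (6.77, 26.4, 88.9),
    "x": (11.79, 59.1, 199.0),
}


def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, in percent."""
    return (old - new) / old * 100.0


def turbo_savings() -> dict:
    """Map each model scale to (latency reduction %, FLOPs reduction %)."""
    return {
        scale: (
            round(percent_reduction(V1[scale][0], TURBO[scale][0]), 1),
            round(percent_reduction(V1[scale][2], TURBO[scale][2]), 1),
        )
        for scale in TURBO
    }


if __name__ == "__main__":
    for scale, (latency, flops) in turbo_savings().items():
        print(f"YOLO12{scale}: latency -{latency}%, FLOPs -{flops}%")
```

This only restates the published table values as ratios; it does not measure anything on your hardware.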

Instance segmentation:

| Model (seg) | size (pixels) | mAP box 50-95 | mAP mask 50-95 | Speed (ms) T4 TensorRT10 | params (M) | FLOPs (B) |
| :---------- | :-----------: | :-----------: | :------------: | :----------------------: | :--------: | :-------: |
| YOLOv12n-seg | 640 | 39.9 | 32.8 | 1.84 | 2.8 | 9.9 |
| YOLOv12s-seg | 640 | 47.5 | 38.6 | 2.84 | 9.8 | 33.4 |
| YOLOv12m-seg | 640 | 52.4 | 42.3 | 6.27 | 21.9 | 115.1 |
| YOLOv12l-seg | 640 | 54.0 | 43.2 | 7.61 | 28.8 | 137.7 |
| YOLOv12x-seg | 640 | 55.2 | 44.2 | 15.43 | 64.5 | 308.7 |

Installation

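# Download the prebuilt FlashAttention 2.7.3 wheel (CUDA 11, PyTorch 2.2, Python 3.11, Linux x86_64)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
# Create and activate a Python 3.11 environment that matches the wheel
conda create -n yolov12 python=3.11
conda activate yolov12
# Install the dependencies, then this repo in editable mode
pip install -r requirements.txt
pip install -e .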
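Because the wheel filename pins Python 3.11 (`cp311`) and PyTorch 2.2 (`torch2.2`), a quick environment check can catch mismatches before training. The sketch below is one possible check, not part of this repo; the function names are hypothetical, and it only uses the standard library plus `torch.__version__` when PyTorch is present.

```python
import importlib.util
import sys

# Expected values are read off the wheel filename in the install commands:
# cp311 -> Python 3.11, torch2.2 -> PyTorch 2.2.
EXPECTED_PYTHON = (3, 11)
EXPECTED_TORCH_PREFIX = "2.2"


def python_matches(version_info=sys.version_info) -> bool:
    """True if the interpreter's major.minor matches the wheel's cp311 tag."""
    return tuple(version_info[:2]) == EXPECTED_PYTHON


def is_installed(package: str) -> bool:
    """True if `package` can be located without importing it."""
    return importlib.util.find_spec(package) is not None


def check_environment() -> dict:
    """Collect a small report; a missing package is reported, not raised."""
    report = {
        "python_ok": python_matches(),
        "torch_installed": is_installed("torch"),
        "flash_attn_installed": is_installed("flash_attn"),
        "torch_ok": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["torch_ok"] = torch.__version__.startswith(EXPECTED_TORCH_PREFIX)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `yolov12` environment; any `False` entry points at the component to reinstall.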

Validation

Checkpoints: yolov12n, yolov12s, yolov12m, yolov12l, yolov12x

from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
model.val(data='coco.yaml', save_json=True)
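With `save_json=True`, validation writes the predictions to a JSON file. The sketch below assumes that file follows the COCO results format (a list of records with `image_id`, `category_id`, `bbox`, and `score`) and counts detections per category above a score threshold. The path `runs/detect/val/predictions.json` is an assumption; use the save directory that `model.val()` prints.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_predictions(records, min_score=0.25):
    """Count detections per category_id, keeping scores >= min_score."""
    counts = Counter(
        rec["category_id"] for rec in records if rec["score"] >= min_score
    )
    return dict(counts)


def load_predictions(path):
    """Read a COCO-style results file: a JSON list of detection records."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    # Hypothetical location; check the save directory printed by model.val().
    path = Path("runs/detect/val/predictions.json")
    if path.exists():
        print(summarize_predictions(load_predictions(path)))
```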

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
  data='coco.yaml',
  epochs=600, 
  batch=256, 
  imgsz=640,
  scale=0.5,  # S:0.9; M:0.9; L:0.9; X:0.9
  mosaic=1.0,
  mixup=0.0,  # S:0.05; M:0.15; L:0.15; X:0.2
  copy_paste=0.1,  # S:0.15; M:0.4; L:0.5; X:0.6
  device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()
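The inline comments in the training call list different `scale`, `mixup`, and `copy_paste` values for the S, M, L, and X models. One way to keep these in one place is a lookup table, sketched below. The values are copied from those comments; the helper `train_kwargs` is hypothetical, and the `yolov12s.yaml` name in the usage note assumes the other scales follow the same naming as `yolov12n.yaml`.

```python
# Augmentation settings per model scale, copied from the inline comments
# in the training example (the uncommented values are the N-scale ones).
AUGMENTATION = {
    "n": {"scale": 0.5, "mixup": 0.0, "copy_paste": 0.1},
    "s": {"scale": 0.9, "mixup": 0.05, "copy_paste": 0.15},
    "m": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.4},
    "l": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.5},
    "x": {"scale": 0.9, "mixup": 0.2, "copy_paste": 0.6},
}


def train_kwargs(size: str, data: str = "coco.yaml") -> dict:
    """Build keyword arguments for model.train() for one model scale."""
    if size not in AUGMENTATION:
        raise ValueError(f"unknown model scale: {size!r}")
    kwargs = {"data": data, "epochs": 600, "batch": 256, "imgsz": 640, "mosaic": 1.0}
    kwargs.update(AUGMENTATION[size])
    return kwargs


# Usage (requires this repo installed and the dataset available; the
# config name is assumed to follow the yolov12n.yaml pattern):
#   from ultralytics import YOLO
#   YOLO("yolov12s.yaml").train(**train_kwargs("s"), device="0,1,2,3")
```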

Prediction

from ultralytics import YOLO

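model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
model.predict(source="path/to/image.jpg")  # image, directory, or video path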
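The prediction call returns a list of result objects, one per image. The following sketch shows one way to flatten a single result into plain Python dictionaries. It assumes the ultralytics `Results` interface (`boxes.xyxy`, `boxes.conf`, `boxes.cls`, and the `names` mapping); the helper name `detections` is illustrative.

```python
def detections(result):
    """Turn one result object into a list of plain dicts.

    Assumes `result.boxes` exposes `xyxy`, `conf`, and `cls` with a
    `.tolist()` method, and `result.names` maps class index to label.
    """
    boxes = result.boxes
    rows = []
    for xyxy, conf, cls in zip(
        boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()
    ):
        rows.append(
            {"label": result.names[int(cls)], "confidence": conf, "xyxy": xyxy}
        )
    return rows


# Usage (requires this repo installed and a checkpoint on disk):
#   from ultralytics import YOLO
#   results = YOLO("yolov12n.pt").predict(source="path/to/image.jpg")
#   for row in detections(results[0]):
#       print(row)
```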

Export

from ultralytics import YOLO

model = YOLO('yolov12n.pt')  # or yolov12s.pt / yolov12m.pt / yolov12l.pt / yolov12x.pt
model.export(format="engine", half=True)  # or format="onnx"

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code is based on ultralytics. Thanks for their excellent work!

Citation

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.12524},
  year={2025}
}
