[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO -->
<br />
<h3 align="center">YoloV8 TensorRT CPP</h3>
<p align="center">
<b>
A C++ Implementation of YoloV8 using TensorRT
</b>
<br/>
Supports object detection, semantic segmentation, and body pose estimation.
</p>
<p align="center">
<a href="https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP">
<img width="70%" src="assets/yolov8.gif" alt="logo">
</a>
<a href="https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP">
<img width="70%" src="assets/yolov8-seg.gif" alt="logo">
</a>
<a href="https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP">
<img width="70%" src="assets/yolov8-pose.gif" alt="logo">
</a>
</p>
## Looking for Maintainers
This project is actively seeking maintainers to help guide its growth and improvement. If you're passionate about this project and interested in contributing, I'd love to hear from you!
Please feel free to reach out via [LinkedIn](https://www.linkedin.com/in/cyrus-behroozi/) to discuss how you can get involved.
### Getting Started
This project demonstrates how to use the TensorRT C++ API to run GPU inference for YoloV8.
It makes use of my other project [tensorrt-cpp-api](https://github.com/cyrusbehr/tensorrt-cpp-api) to run inference behind the scenes, so make sure you are familiar with that project.
### Prerequisites
- Tested and working on Ubuntu 20.04 & 22.04 (Windows is **not** supported at this time)
- Install CUDA, instructions [here](https://developer.nvidia.com/cuda-downloads).
  - Recommended >= 12.0
- Install cuDNN, instructions [here](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#download).
  - Recommended >= 8
- `sudo apt install build-essential`
- `sudo apt install python3-pip`
- `pip3 install cmake`
- Install OpenCV with CUDA support. To compile OpenCV from source, run the `build_opencv.sh` script provided [here](https://github.com/cyrusbehr/tensorrt-cpp-api/blob/ec6a7529a792b2a9b1ab466f2d0e2da5df47543d/scripts/build_opencv.sh).
  - Recommended >= 4.8
- Download TensorRT 10 from [here](https://developer.nvidia.com/tensorrt).
  - Required >= 10.0
  - Extract it, then open `CMakeLists.txt` and replace the `TODO` with the path to your TensorRT installation.
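For reference, the edited line might look similar to the following sketch. The path and variable name here are placeholders; check the actual `TODO` comment in `CMakeLists.txt` for the exact variable the build expects.

```cmake
# Placeholder path -- point this at your extracted TensorRT 10 installation
set(TensorRT_DIR /path/to/TensorRT-10.x)
```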
### Installation
- `git clone https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP --recursive`
  - **Note:** Be sure to use the `--recursive` flag as this repo makes use of git submodules.
### Converting Model from PyTorch to ONNX
- Navigate to the [official YoloV8 repository](https://github.com/ultralytics/ultralytics) and download your desired version of the model (ex. YOLOv8x).
  - The code also supports semantic segmentation models (ex. YOLOv8x-seg) and pose estimation models (ex. YOLOv8x-pose) out of the box.
- `pip3 install ultralytics`
- Navigate to the `scripts/` directory and run the following:
  - ```python3 pytorch2onnx.py --pt_path <path to your pt file>```
- After running this command, you will have an ONNX version of the model.
- **Note**: If converting the model using a different script, be sure that `end2end` is disabled. That flag adds bounding-box decoding and NMS directly to the model, whereas this implementation performs those steps outside the model using good old C++.
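To illustrate the postprocessing that happens outside the model, here is a minimal greedy IoU-based NMS sketch. This is standalone and illustrative only; the struct layout, names, and threshold are not the repo's actual implementation.

```cpp
#include <algorithm>
#include <vector>

// One detection after bbox decoding: corner coordinates plus confidence.
struct Det { float x1, y1, x2, y2, score; };

// Intersection-over-union of two axis-aligned boxes.
static float iou(const Det& a, const Det& b) {
    float ix = std::max(0.f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float iy = std::max(0.f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = ix * iy;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Greedy NMS: keep the highest-scoring box, suppress overlaps above iouThresh.
std::vector<Det> nms(std::vector<Det> dets, float iouThresh) {
    std::sort(dets.begin(), dets.end(),
              [](const Det& a, const Det& b) { return a.score > b.score; });
    std::vector<Det> kept;
    for (const auto& d : dets) {
        bool suppressed = false;
        for (const auto& k : kept)
            if (iou(d, k) > iouThresh) { suppressed = true; break; }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

An `end2end` export would bake this step into the ONNX graph; keeping it in C++ leaves the exported model a pure tensor-in, tensor-out network.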
### Building the Project
- `mkdir build`
- `cd build`
- `cmake ..`
- `make -j`
### Running the Executables
- *Note*: The first time you run any of the executables, it may take quite a long time (5+ minutes) as TensorRT must generate an optimized engine file from the ONNX model. The engine is then saved to disk and loaded on subsequent runs.
- *Note*: The executables all work out of the box with Ultralytics' pretrained object detection, segmentation, and pose estimation models.
- To run the benchmarking script, run: `./benchmark --model /path/to/your/onnx/model.onnx --input /path/to/your/benchmark/image.png`
- To run inference on an image and save the annotated image to disk, run: `./detect_object_image --model /path/to/your/onnx/model.onnx --input /path/to/your/image.jpg`
  - You can use the images in the `images/` directory for testing.
- To run inference using your webcam and display the results in real time, run: `./detect_object_video --model /path/to/your/onnx/model.onnx --input 0`
  - *Note*: You can also specify a pre-built TensorRT engine file directly using the `--trt_model` option.
- For a full list of arguments, run any of the executables without providing any arguments.
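The engine-caching behaviour noted above can be sketched as a build-once pattern. This is illustrative only; the real logic, including how the engine filename encodes the GPU and build options, lives in tensorrt-cpp-api, and `buildEngineFromOnnx` is a hypothetical name.

```cpp
#include <filesystem>
#include <string>

// Build-once cache: the slow ONNX-to-engine conversion happens only on the
// first run; later runs deserialize the engine file straight from disk.
std::string engineFor(const std::string& onnxPath) {
    std::string enginePath = onnxPath + ".engine";  // naming is a placeholder
    if (!std::filesystem::exists(enginePath)) {
        // Slow path, first run only:
        // buildEngineFromOnnx(onnxPath, enginePath);
    }
    return enginePath;  // fast path: load the cached engine
}
```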
### INT8 Inference
Enabling INT8 precision can further speed up inference, at the cost of reduced accuracy due to the smaller dynamic range.
For INT8 precision, you must supply calibration data that is representative of the data the model will see at inference time.
It is advised to use 1K+ calibration images. To enable INT8 inference with the YoloV8 sanity check model, take the following steps:
- Download and extract the COCO validation dataset, or procure data representative of your inference data: `wget http://images.cocodataset.org/zips/val2017.zip`
- Provide the additional command line arguments when running the executables: `--precision INT8 --calibration-data /path/to/your/calibration/data`
  - If you get an "out of memory in function allocate" error, reduce `Options.calibrationBatchSize` so that the entire batch fits in your GPU memory.
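Why calibration data matters: INT8 maps each tensor's observed dynamic range onto just 256 levels, so the scale derived from calibration determines how much precision survives. A toy symmetric-quantization sketch (this is not TensorRT's actual calibrator, just the underlying idea):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric INT8 quantization: map [-maxAbs, maxAbs] onto [-127, 127].
// Assumes a non-empty calibration set containing at least one nonzero value.
float computeScale(const std::vector<float>& calib) {
    float maxAbs = 0.f;
    for (float v : calib) maxAbs = std::max(maxAbs, std::abs(v));
    return maxAbs / 127.f;
}

int8_t quantize(float v, float scale) {
    float q = std::round(v / scale);
    return static_cast<int8_t>(std::clamp(q, -127.f, 127.f));
}

float dequantize(int8_t q, float scale) { return q * scale; }
```

If the calibration set contains rare outliers, `maxAbs` grows and every ordinary value gets squeezed into fewer levels, which is why representative calibration data matters.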
### Benchmarking
- Before running benchmarks, ensure your GPU is otherwise idle.
- Run the executable `benchmark` using the `/images/640_640.jpg` image.
- If you'd like to benchmark each component (`preprocess`, `inference`, `postprocess`), recompile setting the `ENABLE_BENCHMARKS` flag to `ON`: `cmake -DENABLE_BENCHMARKS=ON ..`.
- You can then rerun the `benchmark` executable to see per-component timings.
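Conceptually, per-component benchmarking just wraps each stage in a timer. A minimal `std::chrono` sketch (illustrative; the project's own instrumentation is what `ENABLE_BENCHMARKS` compiles in):

```cpp
#include <chrono>

// Minimal stopwatch for timing the preprocess / inference / postprocess stages.
class Stopwatch {
    using Clock = std::chrono::steady_clock;
    Clock::time_point t0 = Clock::now();
public:
    void reset() { t0 = Clock::now(); }
    double elapsedMs() const {
        return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
    }
};
```

Call `reset()` before a stage, read `elapsedMs()` after it, and average over many iterations to smooth out jitter.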
Benchmarks were run on an NVIDIA GeForce RTX 3080 Laptop GPU and an Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz, using a 640x640 BGR image already in GPU memory. The table below uses FP16 precision.
| Model | Total Time | Preprocess Time | Inference Time | Postprocess Time |
|--------------|------------|-----------------|----------------|------------------|
| yolov8n | 3.613 ms | 0.081 ms | 1.703 ms | 1.829 ms |
| yolov8n-pose | 2.107 ms | 0.091 ms | 1.609 ms | 0.407 ms |
| yolov8n-seg | 15.194 ms | 0.109 ms | 2.732 ms | 12.353 ms |
The following table compares precision modes for yolov8x:

| Model   | Precision | Total Time | Preprocess Time | Inference Time | Postprocess Time |
|---------|-----------|------------|-----------------|----------------|------------------|
| yolov8x | FP32      | 25.819 ms  | 0.103 ms        | 23.763 ms      | 1.953 ms         |
| yolov8x | FP16      | 10.147 ms  | 0.083 ms        | 7.677 ms       | 2.387 ms         |
| yolov8x | INT8      | 7.32 ms    | 0.103 ms        | 4.698 ms       | 2.519 ms         |
TODO: Improve postprocessing time using a CUDA kernel.
### How to debug
- If you have issues creating the TensorRT engine file from the ONNX model, navigate to `libs/tensorrt-cpp-api/src/engine.cpp` and raise the logger severity to `kVERBOSE`, then rebuild and rerun. This will give you more information about where exactly the build process is failing.
### Show your appreciation
If this project was helpful to you, I would appreciate it if you could give it a star. That will encourage me to keep it up to date and resolve issues quickly.
### Contributors
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
<tbody>
<tr>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/z3lx"><img src="https://avatars.githubusercontent.com/u/57017122?v=4?s=100" width="100px;" alt="z3lx"/><br /><sub><b>z3lx</b></sub></a></td>
</tr>
</tbody>
</table>
<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- ALL-CONTRIBUTORS-LIST:END -->