Multi-GPU inference with TensorRT 7.2.1.6 / CUDA 11.1:
For example: GPU0 runs model A, GPU1 runs model B.
1 Keep the two models as two separate engine files; do not hand a single file to two threads to load. In addition, each engine file should be built on the GPU that will run it (see the build sketch after this point), otherwise you will get the warning:
“WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.”
Several GitHub issues explain this case: a serialized engine plan is tuned for the exact device it was built on, so it is not portable across different GPU models.
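To make point 1 concrete, here is a minimal sketch (not from the original post) that builds and serializes one engine per GPU from ONNX files. The file names modelA.onnx / modelA_gpu0.plan and the helper buildEngineOnDevice() are made-up placeholders; error checking is elided:

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>

using namespace nvinfer1;

// Minimal logger required by the TensorRT API.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

// Build and serialize an engine on the GPU that will later run it.
void buildEngineOnDevice(int deviceId, const char* onnxPath, const char* planPath) {
    cudaSetDevice(deviceId);  // the engine is bound to whichever device is current here

    IBuilder* builder = createInferBuilder(gLogger);
    const auto flags =
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flags);
    auto* parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kWARNING));

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 28);  // 256 MiB of build workspace
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // Write the per-GPU plan file to disk.
    IHostMemory* plan = engine->serialize();
    std::ofstream(planPath, std::ios::binary)
        .write(static_cast<const char*>(plan->data()), plan->size());

    // TensorRT 7 objects are released with destroy(), not delete.
    plan->destroy(); engine->destroy(); config->destroy();
    parser->destroy(); network->destroy(); builder->destroy();
}

int main() {
    buildEngineOnDevice(0, "modelA.onnx", "modelA_gpu0.plan");
    buildEngineOnDevice(1, "modelB.onnx", "modelB_gpu1.plan");
}
```

Building on the target GPU lets the builder's kernel auto-tuning measure that exact device, which is why a plan built on a different device model triggers the warning quoted above.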
2 NVIDIA's official guidance on multiple GPUs, from the TensorRT FAQ:
Q: How do I use TensorRT on multiple GPUs?
A: Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, use cudaSetDevice() before calling the builder or deserializing the engine. Each IExecutionContext is bound to the same GPU as the engine from which it was created.
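Following that FAQ advice, below is a hedged sketch of the GPU0/GPU1 setup from the top of this post: one thread per device, each calling cudaSetDevice() before deserializing its own engine. The plan file names and the runOnDevice() helper are illustrative (assumed, not from the original), and the buffer handling is elided:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

using namespace nvinfer1;

// Minimal logger required by the TensorRT API.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

// Each thread owns one GPU: select the device, then deserialize and run on it.
void runOnDevice(int deviceId, const std::string& planPath) {
    cudaSetDevice(deviceId);  // must happen BEFORE deserializeCudaEngine()

    // Read the plan file built on this same GPU.
    std::ifstream in(planPath, std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine =
        runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // ... allocate input/output buffers with cudaMalloc in binding order,
    // copy inputs in, then launch inference, e.g.:
    // context->enqueueV2(bindings, stream, nullptr);

    context->destroy();
    engine->destroy();
    runtime->destroy();
}

int main() {
    std::thread t0(runOnDevice, 0, std::string("modelA_gpu0.plan"));
    std::thread t1(runOnDevice, 1, std::string("modelB_gpu1.plan"));
    t0.join();
    t1.join();
}
```

Because the device is set per thread and each engine/context pair stays on its own thread, the two models run on their own GPUs without interfering with each other.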