I finally managed to draw the structure diagram of a TensorRT engine! It looks roughly like this (only the input portion of the final graph is shown). Take a close look:
You can see that many layers have been fused, for example the `conv1.weight QuantizeLinear_7_quantize_scale_node Conv_9 Relu_11` block, while others, like `MaxPool_12`, remain unfused. Also, some readers may never have seen the `QuantizeLinear` quantization operator before; for now, just think of it as one more layer.
Note that this model's input is Float while its output is Int8. The model was quantized with NVIDIA's official pytorch-quantization tool for PyTorch, exported to ONNX, and then converted by TensorRT 8 into an engine with INT8 precision.
P.S. I'll cover the details of TensorRT quantization in upcoming articles, so no rush.
TensorRT's optimizations
As is well known, TensorRT applies many optimizations to a model: vertical layer fusion (Conv + BN + ReLU), horizontal layer fusion, eliminating concat by writing directly into the destination buffer, and so on:
For more details, you can look back at my earlier article 《内卷成啥了还不知道TensorRT?超详细入门指北,来看看吧!》 (a detailed TensorRT primer) for a refresher.
In short, a model that has been through TensorRT's optimizations is barely recognizable. TensorRT supports many kinds of layer fusion; feed your model in and you'll find that many of its layers have been merged. The point of all this is to optimize memory access, reducing the cost of moving data between layers.
That said, it isn't always problem-free; strange bugs do show up from time to time, so keep an eye out.
We generally can't inspect the fused model with Netron. TensorRT is closed-source, after all, and the engines it generates are far too complex to reverse-engineer by guesswork. Fortunately, TensorRT compensates for being closed-source with a logging interface: if we want to see what the fused model looks like, we only need to enable verbose mode when building the engine.
Inspecting the engine structure with verbose logging
It's simple. Taking NVIDIA's official tool `trtexec` as an example, we only need to add the `--verbose` flag when building:
./trtexec --explicitBatch --onnx=debug.onnx --saveEngine=debug.trt --verbose
This prints a large amount of information during conversion. For instance, from the build options we can see that this model is being built with FP32 and INT8 precision:
[08/25/2021-17:30:04] [I] === Build Options ===
[08/25/2021-17:30:04] [I] Max batch: explicit
[08/25/2021-17:30:04] [I] Workspace: 4096 MiB
[08/25/2021-17:30:04] [I] minTiming: 1
[08/25/2021-17:30:04] [I] avgTiming: 8
[08/25/2021-17:30:04] [I] Precision: FP32 INT8
[08/25/2021-17:30:04] [I] Calibration: Dynamic
[08/25/2021-17:30:04] [I] Refit: Disabled
[08/25/2021-17:30:04] [I] Sparsity: Disabled
[08/25/2021-17:30:04] [I] Safe mode: Disabled
[08/25/2021-17:30:04] [I] Restricted mode: Disabled
[08/25/2021-17:30:04] [I] Save engine: debug_int8.trt
After a long and arcane series of optimization steps, we finally get to see the engine structure of the resulting model:
[V] [TRT] Engine Layer Information:
Layer(Scale): QuantizeLinear_2_quantize_scale_node, Tactic: 0, input[Float(1,3,-17,-18)] -> 255[Int8(1,3,-17,-18)]
Layer(CaskConvolution): conv1.weight QuantizeLinear_7_quantize_scale_node Conv_9 Relu_11, Tactic: 4438325421691896755, 255[Int8(1,3,-17,-18)] -> 267[Int8(1,64,-40,-44)]
Layer(CudaPooling): MaxPool_12, Tactic: -3, 267[Int8(1,64,-40,-44)] -> Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to MaxPool_12, Tactic: 0, Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)] -> 270[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv1.weight QuantizeLinear_20_quantize_scale_node Conv_22 Relu_24, Tactic: 4871133328510103657, 270[Int8(1,64,-21,-24)] -> 284[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv2.weight QuantizeLinear_32_quantize_scale_node Conv_34 Add_42 Relu_43, Tactic: 4871133328510103657, 284[Int8(1,64,-21,-24)], 270[Int8(1,64,-21,-24)] -> 305[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv1.weight QuantizeLinear_51_quantize_scale_node Conv_53 Relu_55, Tactic: 4871133328510103657, 305[Int8(1,64,-21,-24)] -> 319[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv2.weight QuantizeLinear_63_quantize_scale_node Conv_65 Add_73 Relu_74, Tactic: 4871133328510103657, 319[Int8(1,64,-21,-24)], 305[Int8(1,64,-21,-24)] -> 340[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv1.weight QuantizeLinear_82_quantize_scale_node Conv_84 Relu_86, Tactic: 4871133328510103657, 340[Int8(1,64,-21,-24)] -> 354[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv2.weight QuantizeLinear_94_quantize_scale_node Conv_96 Add_104 Relu_105, Tactic: 4871133328510103657, 354[Int8(1,64,-21,-24)], 340[Int8(1,64,-21,-24)] -> 375[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer2.0.conv1.weight QuantizeLinear_113_quantize_scale_node Conv_115 Relu_117, Tactic: -1841683966837205309, 375[Int8(1,64,-21,-24)] -> 389[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.downsample.0.weight QuantizeLinear_136_quantize_scale_node Conv_138, Tactic: -1494157908358500249, 375[Int8(1,64,-21,-24)] -> 415[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.conv2.weight QuantizeLinear_125_quantize_scale_node Conv_127 Add_146 Relu_147, Tactic: -1841683966837205309, 389[Int8(1,128,-52,-37)], 415[Int8(1,128,-52,-37)] -> 423[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv1.weight QuantizeLinear_155_quantize_scale_node Conv_157 Relu_159, Tactic: -1841683966837205309, 423[Int8(1,128,-52,-37)] -> 437[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv2.weight QuantizeLinear_167_quantize_scale_node Conv_169 Add_177 Relu_178, Tactic: -1841683966837205309, 437[Int8(1,128,-52,-37)], 423[Int8(1,128,-52,-37)] -> 458[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv1.weight QuantizeLinear_186_quantize_scale_node Conv_188 Relu_190, Tactic: -1841683966837205309, 458[Int8(1,128,-52,-37)] -> 472[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv2.weight QuantizeLinear_198_quantize_scale_node Conv_200 Add_208 Relu_209, Tactic: -1841683966837205309, 472[Int8(1,128,-52,-37)], 458[Int8(1,128,-52,-37)] -> 493[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv1.weight QuantizeLinear_217_quantize_scale_node Conv_219 Relu_221, Tactic: -1841683966837205309, 493[Int8(1,128,-52,-37)] -> 507[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv2.weight QuantizeLinear_229_quantize_scale_node Conv_231 Add_239 Relu_240, Tactic: -1841683966837205309, 507[Int8(1,128,-52,-37)], 493[Int8(1,128,-52,-37)] -> 528[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer3.0.conv1.weight QuantizeLinear_248_quantize_scale_node Conv_250 Relu_252, Tactic: -8431788508843860955, 528[Int8(1,128,-52,-37)] -> 542[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.downsample.0.weight QuantizeLinear_271_quantize_scale_node Conv_273, Tactic: -5697614955743334137, 528[Int8(1,128,-52,-37)] -> 568[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.conv2.weight QuantizeLinear_260_quantize_scale_node Conv_262 Add_281 Relu_282, Tactic: -496455309852654971, 542[Int8(1,256,-59,-62)], 568[Int8(1,256,-59,-62)] -> 576[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv1.weight QuantizeLinear_290_quantize_scale_node Conv_292 Relu_294, Tactic: -8431788508843860955, 576[Int8(1,256,-59,-62)] -> 590[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv2.weight QuantizeLinear_302_quantize_scale_node Conv_304 Add_312 Relu_313, Tactic: -496455309852654971, 590[Int8(1,256,-59,-62)], 576[Int8(1,256,-59,-62)] -> 611[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv1.weight QuantizeLinear_321_quantize_scale_node Conv_323 Relu_325, Tactic: -8431788508843860955, 611[Int8(1,256,-59,-62)] -> 625[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv2.weight QuantizeLinear_333_quantize_scale_node Conv_335 Add_343 Relu_344, Tactic: -496455309852654971, 625[Int8(1,256,-59,-62)], 611[Int8(1,256,-59,-62)] -> 646[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv1.weight QuantizeLinear_352_quantize_scale_node Conv_354 Relu_356, Tactic: -8431788508843860955, 646[Int8(1,256,-59,-62)] -> 660[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv2.weight QuantizeLinear_364_quantize_scale_node Conv_366 Add_374 Relu_375, Tactic: -496455309852654971, 660[Int8(1,256,-59,-62)], 646[Int8(1,256,-59,-62)] -> 681[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv1.weight QuantizeLinear_383_quantize_scale_node Conv_385 Relu_387, Tactic: -8431788508843860955, 681[Int8(1,256,-59,-62)] -> 695[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv2.weight QuantizeLinear_395_quantize_scale_node Conv_397 Add_405 Relu_406, Tactic: -496455309852654971, 695[Int8(1,256,-59,-62)], 681[Int8(1,256,-59,-62)] -> 716[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv1.weight QuantizeLinear_414_quantize_scale_node Conv_416 Relu_418, Tactic: -8431788508843860955, 716[Int8(1,256,-59,-62)] -> 730[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv2.weight QuantizeLinear_426_quantize_scale_node Conv_428 Add_436 Relu_437, Tactic: -496455309852654971, 730[Int8(1,256,-59,-62)], 716[Int8(1,256,-59,-62)] -> 751[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer4.0.conv1.weight QuantizeLinear_445_quantize_scale_node Conv_447 Relu_449, Tactic: -6371781333659293809, 751[Int8(1,256,-59,-62)] -> 765[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.downsample.0.weight QuantizeLinear_468_quantize_scale_node Conv_470, Tactic: -1494157908358500249, 751[Int8(1,256,-59,-62)] -> 791[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.conv2.weight QuantizeLinear_457_quantize_scale_node Conv_459 Add_478 Relu_479, Tactic: -2328318099174473157, 765[Int8(1,512,-71,-72)], 791[Int8(1,512,-71,-72)] -> 799[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv1.weight QuantizeLinear_487_quantize_scale_node Conv_489 Relu_491, Tactic: -2328318099174473157, 799[Int8(1,512,-71,-72)] -> 813[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv2.weight QuantizeLinear_499_quantize_scale_node Conv_501 Add_509 Relu_510, Tactic: -2328318099174473157, 813[Int8(1,512,-71,-72)], 799[Int8(1,512,-71,-72)] -> 834[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv1.weight QuantizeLinear_518_quantize_scale_node Conv_520 Relu_522, Tactic: -2328318099174473157, 834[Int8(1,512,-71,-72)] -> 848[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv2.weight QuantizeLinear_530_quantize_scale_node Conv_532 Add_540 Relu_541, Tactic: -2328318099174473157, 848[Int8(1,512,-71,-72)], 834[Int8(1,512,-71,-72)] -> 869[Int8(1,512,-71,-72)]
Layer(CaskDeconvolution): deconv_layers.0.weight QuantizeLinear_549_quantize_scale_node ConvTranspose_551, Tactic: -3784829056659735491, 869[Int8(1,512,-71,-72)] -> 881[Int8(1,512,-46,-47)]
Layer(CaskConvolution): deconv_layers.1.weight QuantizeLinear_559_quantize_scale_node Conv_561 Relu_563, Tactic: -496455309852654971, 881[Int8(1,512,-46,-47)] -> 895[Int8(1,256,-46,-47)]
Layer(CaskDeconvolution): deconv_layers.4.weight QuantizeLinear_571_quantize_scale_node ConvTranspose_573, Tactic: -3784829056659735491, 895[Int8(1,256,-46,-47)] -> 907[Int8(1,256,-68,-55)]
Layer(CaskConvolution): deconv_layers.5.weight QuantizeLinear_581_quantize_scale_node Conv_583 Relu_585, Tactic: -8431788508843860955, 907[Int8(1,256,-68,-55)] -> 921[Int8(1,256,-68,-55)]
Layer(CaskDeconvolution): deconv_layers.8.weight QuantizeLinear_593_quantize_scale_node ConvTranspose_595, Tactic: -2621193268472024213, 921[Int8(1,256,-68,-55)] -> 933[Int8(1,256,-29,-32)]
Layer(CaskConvolution): deconv_layers.9.weight QuantizeLinear_603_quantize_scale_node Conv_605 Relu_607, Tactic: -8431788508843860955, 933[Int8(1,256,-29,-32)] -> 947[Int8(1,256,-29,-32)]
Layer(CaskConvolution): hm.0.weight QuantizeLinear_615_quantize_scale_node Conv_617 Relu_618, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 960[Int8(1,64,-29,-32)]
Layer(CaskConvolution): wh.0.weight QuantizeLinear_636_quantize_scale_node Conv_638 Relu_639, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 985[Int8(1,64,-29,-32)]
Layer(CaskConvolution): reg.0.weight QuantizeLinear_657_quantize_scale_node Conv_659 Relu_660, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 1010[Int8(1,64,-29,-32)]
Layer(CaskConvolution): hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628, Tactic: -7185527339793611699, 960[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628, Tactic: 0, Reformatted Output Tensor 0 to hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628[Float(1,2,-29,-32)] -> hm[Float(1,2,-29,-32)]
Layer(CaskConvolution): wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649, Tactic: -7185527339793611699, 985[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649, Tactic: 0, Reformatted Output Tensor 0 to wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649[Float(1,2,-29,-32)] -> wh[Float(1,2,-29,-32)]
Layer(CaskConvolution): reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670, Tactic: -7185527339793611699, 1010[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670, Tactic: 0, Reformatted Output Tensor 0 to reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670[Float(1,2,-29,-32)] -> reg[Float(1,2,-29,-32)]
Can you guess what this model's backbone is?
Without drawing the graph, it's honestly impossible to tell.
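One thing worth noticing before we get to the visualization tool: these `Layer(...)` lines are quite regular, so they're easy to pick apart programmatically. Here's a minimal sketch (my own illustration, not part of any official tool) that pulls the layer type, fused layer name, and tactic ID out of each entry with a regex:

```python
import re
from collections import Counter

# Matches lines like:
#   Layer(CaskConvolution): conv1.weight ... Conv_9 Relu_11, Tactic: 4438..., 255[Int8(...)] -> 267[Int8(...)]
LAYER_RE = re.compile(r"Layer\((\w+)\): (.+?), Tactic: (-?\d+),")

def parse_engine_layers(log_lines):
    """Extract (layer_type, fused_layer_name, tactic) tuples from verbose engine output."""
    layers = []
    for line in log_lines:
        m = LAYER_RE.search(line)
        if m:
            layers.append((m.group(1), m.group(2), int(m.group(3))))
    return layers

# Three lines taken verbatim from the engine dump above:
sample = [
    "Layer(Scale): QuantizeLinear_2_quantize_scale_node, Tactic: 0, input[Float(1,3,-17,-18)] -> 255[Int8(1,3,-17,-18)]",
    "Layer(CaskConvolution): conv1.weight QuantizeLinear_7_quantize_scale_node Conv_9 Relu_11, Tactic: 4438325421691896755, 255[Int8(1,3,-17,-18)] -> 267[Int8(1,64,-40,-44)]",
    "Layer(CudaPooling): MaxPool_12, Tactic: -3, 267[Int8(1,64,-40,-44)] -> Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)]",
]
layers = parse_engine_layers(sample)
print(Counter(layer_type for layer_type, _, _ in layers))
```

This is essentially the kind of parsing the visualizer below does under the hood, just stripped down to a few lines.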
I've been digging through PyTorch's pull request history quite a bit these past few days, and stumbled on a gem: `engine_layer_visualize.py`. The commit is here:
- https://github.com/pytorch/pytorch/pull/66431/files
This is an internal Facebook tool for inspecting engines, open-sourced by jerryzh168. It uses pydot and graphviz to draw the network structure graph; as it turns out, Keras also used these libraries for its model plotting.
Drawing the TensorRT engine graph with pydot and graphviz
Usage is straightforward. First install the dependencies:
pip install pydot
conda install python-graphviz
P.S. Don't ask why it's `pip install` first and `conda install` second; that's the only order that works without errors on my machine. Otherwise I get `[Errno 2] "dot" not found in path`.
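If you hit that error and aren't sure whether the problem is on the pydot side or the graphviz side, a quick stdlib check (my own snippet) tells you whether the `dot` executable is actually on your PATH, which is what pydot searches for:

```python
import shutil

def dot_available():
    """Return True if the graphviz `dot` executable is on PATH (pydot needs it to render)."""
    return shutil.which("dot") is not None

print("dot found on PATH:", dot_available())
```

If this prints `False`, the fix is a graphviz installation issue, not a pydot one.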
Then use the following code:
# (c) Facebook, Inc. and its affiliates. Confidential and proprietary.
import argparse
import re
from typing import NamedTuple, List, Optional
import pydot
"""
log_file is generated by tensorrt verbose logger during building engine.
profile_file is generated by tensorrt profiler.
Currently we support processing multiple logs in one log_file, which
would generate multiple dot graphs. However, multiple engine profiles are not
supported.
Usage:
python torch/fx/experimental/fx2trt/tools/engine_layer_visualize.py --log_file aaa --profile_file bbb
Usage(Facebook):
buck run //caffe2/torch/fx/experimental/fx2trt/tools:engine_layer_visualize -- --log_file aaa --profile_file bbb
"""
parser = argparse.ArgumentParser()
parser.add_argument(
"--log_file",
type=str,
default="",
help="TensorRT VERBOSE logging when building engines.",
)
parser.add_argument(
"--profile_file",
type=str,
default="",
help="TensorRT execution context profiler output.",
)
args = parser.parse_args()
...
The full code is here: https://github.com/pytorch/pytorch/pull/66431/files, so I won't paste it all.
Note that we need to supply `log_file`, i.e. the verbose build output from earlier, while `profile_file` is the profiling output from TensorRT. The easiest way to get the latter is with trtexec, like this:
./trtexec --loadEngine=debug_int8.trt --dumpProfile --shapes=input:1x3x512x512 --exportProfile=debug_profile
This produces profile information like the following, detailing each fused layer's average runtime, total runtime, and share of the total time:
[
{ "count" : 961 }
, { "name" : "QuantizeLinear_2_quantize_scale_node", "timeMs" : 19.9954, "averageMs" : 0.0208069, "percentage" : 0.801597 }
, { "name" : "conv1.weight QuantizeLinear_7_quantize_scale_node Conv_9 Relu_11", "timeMs" : 86.6105, "averageMs" : 0.0901253, "percentage" : 3.47213 }
, { "name" : "MaxPool_12", "timeMs" : 28.0466, "averageMs" : 0.0291848, "percentage" : 1.12436 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to MaxPool_12", "timeMs" : 12.9771, "averageMs" : 0.0135037, "percentage" : 0.520239 }
, { "name" : "layer1.0.conv1.weight QuantizeLinear_20_quantize_scale_node Conv_22 Relu_24", "timeMs" : 28.8356, "averageMs" : 0.0300059, "percentage" : 1.15599 }
, { "name" : "layer1.0.conv2.weight QuantizeLinear_32_quantize_scale_node Conv_34 Add_42 Relu_43", "timeMs" : 31.3897, "averageMs" : 0.0326635, "percentage" : 1.25838 }
, { "name" : "layer1.1.conv1.weight QuantizeLinear_51_quantize_scale_node Conv_53 Relu_55", "timeMs" : 28.788, "averageMs" : 0.0299563, "percentage" : 1.15408 }
, { "name" : "layer1.1.conv2.weight QuantizeLinear_63_quantize_scale_node Conv_65 Add_73 Relu_74", "timeMs" : 31.1857, "averageMs" : 0.0324513, "percentage" : 1.25021 }
, { "name" : "layer1.2.conv1.weight QuantizeLinear_82_quantize_scale_node Conv_84 Relu_86", "timeMs" : 28.7898, "averageMs" : 0.0299581, "percentage" : 1.15415 }
, { "name" : "layer1.2.conv2.weight QuantizeLinear_94_quantize_scale_node Conv_96 Add_104 Relu_105", "timeMs" : 31.1666, "averageMs" : 0.0324314, "percentage" : 1.24944 }
, { "name" : "layer2.0.conv1.weight QuantizeLinear_113_quantize_scale_node Conv_115 Relu_117", "timeMs" : 20.9996, "averageMs" : 0.0218519, "percentage" : 0.841856 }
, { "name" : "layer2.0.downsample.0.weight QuantizeLinear_136_quantize_scale_node Conv_138", "timeMs" : 10.1555, "averageMs" : 0.0105677, "percentage" : 0.407126 }
, { "name" : "layer2.0.conv2.weight QuantizeLinear_125_quantize_scale_node Conv_127 Add_146 Relu_147", "timeMs" : 31.8969, "averageMs" : 0.0331914, "percentage" : 1.27872 }
, { "name" : "layer2.1.conv1.weight QuantizeLinear_155_quantize_scale_node Conv_157 Relu_159", "timeMs" : 30.5402, "averageMs" : 0.0317796, "percentage" : 1.22433 }
, { "name" : "layer2.1.conv2.weight QuantizeLinear_167_quantize_scale_node Conv_169 Add_177 Relu_178", "timeMs" : 32.0256, "averageMs" : 0.0333253, "percentage" : 1.28388 }
, { "name" : "layer2.2.conv1.weight QuantizeLinear_186_quantize_scale_node Conv_188 Relu_190", "timeMs" : 30.5798, "averageMs" : 0.0318208, "percentage" : 1.22591 }
, { "name" : "layer2.2.conv2.weight QuantizeLinear_198_quantize_scale_node Conv_200 Add_208 Relu_209", "timeMs" : 31.813, "averageMs" : 0.0331041, "percentage" : 1.27536 }
, { "name" : "layer2.3.conv1.weight QuantizeLinear_217_quantize_scale_node Conv_219 Relu_221", "timeMs" : 30.6143, "averageMs" : 0.0318568, "percentage" : 1.2273 }
, { "name" : "layer2.3.conv2.weight QuantizeLinear_229_quantize_scale_node Conv_231 Add_239 Relu_240", "timeMs" : 32.123, "averageMs" : 0.0334266, "percentage" : 1.28778 }
, { "name" : "layer3.0.conv1.weight QuantizeLinear_248_quantize_scale_node Conv_250 Relu_252", "timeMs" : 21.1744, "averageMs" : 0.0220337, "percentage" : 0.848863 }
, { "name" : "layer3.0.downsample.0.weight QuantizeLinear_271_quantize_scale_node Conv_273", "timeMs" : 12.0922, "averageMs" : 0.0125829, "percentage" : 0.484765 }
, { "name" : "layer3.0.conv2.weight QuantizeLinear_260_quantize_scale_node Conv_262 Add_281 Relu_282", "timeMs" : 34.8428, "averageMs" : 0.0362568, "percentage" : 1.39682 }
, { "name" : "layer3.1.conv1.weight QuantizeLinear_290_quantize_scale_node Conv_292 Relu_294", "timeMs" : 31.9807, "averageMs" : 0.0332785, "percentage" : 1.28207 }
, { "name" : "layer3.1.conv2.weight QuantizeLinear_302_quantize_scale_node Conv_304 Add_312 Relu_313", "timeMs" : 34.4399, "averageMs" : 0.0358375, "percentage" : 1.38066 }
, { "name" : "layer3.2.conv1.weight QuantizeLinear_321_quantize_scale_node Conv_323 Relu_325", "timeMs" : 31.7602, "averageMs" : 0.0330491, "percentage" : 1.27324 }
, { "name" : "layer3.2.conv2.weight QuantizeLinear_333_quantize_scale_node Conv_335 Add_343 Relu_344", "timeMs" : 35.1158, "averageMs" : 0.0365409, "percentage" : 1.40776 }
, { "name" : "layer3.3.conv1.weight QuantizeLinear_352_quantize_scale_node Conv_354 Relu_356", "timeMs" : 32.027, "averageMs" : 0.0333267, "percentage" : 1.28393 }
, { "name" : "layer3.3.conv2.weight QuantizeLinear_364_quantize_scale_node Conv_366 Add_374 Relu_375", "timeMs" : 34.6465, "averageMs" : 0.0360526, "percentage" : 1.38895 }
, { "name" : "layer3.4.conv1.weight QuantizeLinear_383_quantize_scale_node Conv_385 Relu_387", "timeMs" : 31.7624, "averageMs" : 0.0330514, "percentage" : 1.27332 }
, { "name" : "layer3.4.conv2.weight QuantizeLinear_395_quantize_scale_node Conv_397 Add_405 Relu_406", "timeMs" : 34.3392, "averageMs" : 0.0357328, "percentage" : 1.37663 }
, { "name" : "layer3.5.conv1.weight QuantizeLinear_414_quantize_scale_node Conv_416 Relu_418", "timeMs" : 31.728, "averageMs" : 0.0330156, "percentage" : 1.27195 }
, { "name" : "layer3.5.conv2.weight QuantizeLinear_426_quantize_scale_node Conv_428 Add_436 Relu_437", "timeMs" : 34.2101, "averageMs" : 0.0355985, "percentage" : 1.37145 }
, { "name" : "layer4.0.conv1.weight QuantizeLinear_445_quantize_scale_node Conv_447 Relu_449", "timeMs" : 25.4399, "averageMs" : 0.0264723, "percentage" : 1.01986 }
, { "name" : "layer4.0.downsample.0.weight QuantizeLinear_468_quantize_scale_node Conv_470", "timeMs" : 8.88198, "averageMs" : 0.00924243, "percentage" : 0.35607 }
, { "name" : "layer4.0.conv2.weight QuantizeLinear_457_quantize_scale_node Conv_459 Add_478 Relu_479", "timeMs" : 44.1804, "averageMs" : 0.0459734, "percentage" : 1.77115 }
, { "name" : "layer4.1.conv1.weight QuantizeLinear_487_quantize_scale_node Conv_489 Relu_491", "timeMs" : 44.3623, "averageMs" : 0.0461627, "percentage" : 1.77844 }
, { "name" : "layer4.1.conv2.weight QuantizeLinear_499_quantize_scale_node Conv_501 Add_509 Relu_510", "timeMs" : 44.3341, "averageMs" : 0.0461333, "percentage" : 1.77731 }
, { "name" : "layer4.2.conv1.weight QuantizeLinear_518_quantize_scale_node Conv_520 Relu_522", "timeMs" : 42.4246, "averageMs" : 0.0441463, "percentage" : 1.70076 }
, { "name" : "layer4.2.conv2.weight QuantizeLinear_530_quantize_scale_node Conv_532 Add_540 Relu_541", "timeMs" : 43.7076, "averageMs" : 0.0454813, "percentage" : 1.75219 }
, { "name" : "deconv_layers.0.weight QuantizeLinear_549_quantize_scale_node ConvTranspose_551", "timeMs" : 77.9405, "averageMs" : 0.0811035, "percentage" : 3.12456 }
, { "name" : "deconv_layers.1.weight QuantizeLinear_559_quantize_scale_node Conv_561 Relu_563", "timeMs" : 60.049, "averageMs" : 0.0624859, "percentage" : 2.40731 }
, { "name" : "deconv_layers.4.weight QuantizeLinear_571_quantize_scale_node ConvTranspose_573", "timeMs" : 107.53, "averageMs" : 0.111894, "percentage" : 4.31079 }
, { "name" : "deconv_layers.5.weight QuantizeLinear_581_quantize_scale_node Conv_583 Relu_585", "timeMs" : 80.9985, "averageMs" : 0.0842856, "percentage" : 3.24715 }
, { "name" : "deconv_layers.8.weight QuantizeLinear_593_quantize_scale_node ConvTranspose_595", "timeMs" : 381.204, "averageMs" : 0.396674, "percentage" : 15.2821 }
, { "name" : "deconv_layers.9.weight QuantizeLinear_603_quantize_scale_node Conv_605 Relu_607", "timeMs" : 221.925, "averageMs" : 0.230931, "percentage" : 8.89675 }
, { "name" : "hm.0.weight QuantizeLinear_615_quantize_scale_node Conv_617 Relu_618", "timeMs" : 84.4777, "averageMs" : 0.087906, "percentage" : 3.38663 }
, { "name" : "wh.0.weight QuantizeLinear_636_quantize_scale_node Conv_638 Relu_639", "timeMs" : 85.658, "averageMs" : 0.0891342, "percentage" : 3.43395 }
, { "name" : "reg.0.weight QuantizeLinear_657_quantize_scale_node Conv_659 Relu_660", "timeMs" : 85.4159, "averageMs" : 0.0888823, "percentage" : 3.42424 }
, { "name" : "hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628", "timeMs" : 19.5074, "averageMs" : 0.0202991, "percentage" : 0.782035 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to hm.2.weight QuantizeLinear_626_quantize_scale_node Conv_628", "timeMs" : 6.52869, "averageMs" : 0.00679364, "percentage" : 0.261729 }
, { "name" : "wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649", "timeMs" : 18.7298, "averageMs" : 0.0194899, "percentage" : 0.750862 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to wh.2.weight QuantizeLinear_647_quantize_scale_node Conv_649", "timeMs" : 6.69421, "averageMs" : 0.00696588, "percentage" : 0.268364 }
, { "name" : "reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670", "timeMs" : 18.7625, "averageMs" : 0.0195239, "percentage" : 0.752172 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to reg.2.weight QuantizeLinear_668_quantize_scale_node Conv_670", "timeMs" : 7.04306, "averageMs" : 0.00732889, "percentage" : 0.28235 }
]
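The profile already makes it fairly obvious where the time goes (the deconv layers dominate), but since `--exportProfile` writes plain JSON, it's easy to post-process yourself. A sketch (my own helper, not part of trtexec), run here on a small excerpt of the real profile above:

```python
def top_layers(profile_entries, n=3):
    """Return the n most expensive layers from trtexec --exportProfile output,
    sorted by their percentage of total runtime."""
    # The first entry ({"count": ...}) has no timing fields, so filter it out.
    layers = [e for e in profile_entries if "percentage" in e]
    return sorted(layers, key=lambda e: e["percentage"], reverse=True)[:n]

# A small excerpt of the profile above, for illustration:
profile = [
    {"count": 961},
    {"name": "MaxPool_12", "timeMs": 28.0466, "averageMs": 0.0291848, "percentage": 1.12436},
    {"name": "deconv_layers.8.weight QuantizeLinear_593_quantize_scale_node ConvTranspose_595",
     "timeMs": 381.204, "averageMs": 0.396674, "percentage": 15.2821},
    {"name": "deconv_layers.9.weight QuantizeLinear_603_quantize_scale_node Conv_605 Relu_607",
     "timeMs": 221.925, "averageMs": 0.230931, "percentage": 8.89675},
]
for entry in top_layers(profile, n=2):
    print(f'{entry["percentage"]:6.2f}%  {entry["name"]}')
```

In practice you'd load the real file first, e.g. `profile = json.load(open("debug_profile"))`, and then call `top_layers(profile)` on it.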
Running the script above generates `EngineLayers_0.dot`.
This `.dot` file contains the computation graph information: nodes, edges, and so on.
Finally, render the image with the following code and you're done!
import pydot

# graph_from_dot_file returns a list of graphs; our file contains just one
graphs = pydot.graph_from_dot_file("EngineLayers_0.dot")
graph = graphs[0]
graph.write_png("trt_engine.png")
A quick comparison
Here's a quick comparison between the original model and the built engine:
- Input section:
- Output section:
As for the details of TensorRT model quantization, I'll devote a separate article to them later, so I won't go into them here.
Closing remarks
If the graph you draw comes out looking like this:
Congratulations! Your computer is a one-in-ten-thousand prodigy! The fix is simple: just switch to a different machine (runs away)!