YOLOV1
YOLO (You Only Look Once: Unified, Real-Time Object Detection) was first proposed by Joseph Redmon, Ali Farhadi et al. in 2015. At CVPR 2017 Joseph Redmon and Ali Farhadi presented YOLOV2, and later YOLOV3. YOLO is a standard one-stage object detection algorithm.
Compared with the Faster RCNN and SSD families, it follows more thoroughly the idea of obtaining both the locations and the classes of the targets by direct regression. The core of YOLO is that, from the input image, it simultaneously predicts the positions and classes of multiple bounding boxes, making it a more completely end-to-end detection and recognition method. It reaches a higher detection speed than Faster RCNN and SSD, but its overall detection accuracy is somewhat lower than those two families.
YOLO completes the entire detection process with a single CNN that performs direct regression, with no additional, specially designed stages. SSD, on the one hand, relies on the Anchor mechanism and, on the other, predicts from several different feature maps; Faster RCNN needs an RPN to obtain candidate bounding boxes and a subsequent prediction model to produce the final results. Compared with these two, YOLO has neither the Anchor mechanism nor a multi-scale design: a single convolutional network regresses the positions and classes of multiple bounding boxes directly. It is trained on whole images, which also helps it distinguish targets from background regions.
The core idea of YOLOV1 is to divide the image into an S*S grid, where each cell (region) is responsible for predicting the objects whose ground-truth (GT) centers fall inside it. In the figure above, the cell marked with a red dot contains part of the dog, so that cell is used to detect the dog; likewise, the cell containing the bicycle's center predicts the bicycle.
In practice, each cell predicts B bounding boxes (a hyperparameter, usually 2) together with a confidence score for each box (indicating the probability that the cell contains an object), and each cell also predicts C class probabilities (the likelihood of each object category). The vector predicted for one cell therefore has length 5*B+C, where the 5 stands for (x, y, w, h, c) of each bounding box, c being the confidence; the vector regressed for the whole image has length (5*B+C)*S*S.
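These dimensions are easy to sanity-check numerically; a minimal sketch assuming the standard YOLOV1 settings S = 7, B = 2, C = 20:

```python
# YOLOv1 output dimensions: each of the S*S cells predicts B boxes
# (x, y, w, h, confidence) plus C class probabilities.
S, B, C = 7, 2, 20

per_cell = 5 * B + C          # length of the vector predicted by one cell
total = per_cell * S * S      # length of the full regressed output vector

print(per_cell, total)        # 30 1470
```

With these settings the network's final layer regresses a 1470-dimensional vector for the whole image.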
The bounding box information (x, y, w, h) gives the object's center offset relative to the cell together with its width and height, all normalized; in other words, the predicted values are relative values. The confidence reflects whether the box contains an object and, if so, how accurate the location is, and is defined as

confidence = Pr(Object) * IoU(pred, truth)

where Pr(Object) indicates whether an object falls into the cell, and IoU(pred, truth) is the IoU between the predicted box and the ground truth.
Network structure of YOLOV1
YOLO's network obtains the positions and classes of the targets in the image by direct regression. As the figure above shows, the network is a plain CNN: the original input image passes through several convolutional layers and finally an FC layer, whose output vector has length S*S*(B*5+C). For YOLOV1, S is typically 7 (a 7*7 grid), B is 2, and C is 20, i.e., 20 classes. Through this regression we obtain, for every cell, the confidence of its bounding boxes and whether they contain a target; if they do, we also get the box offsets and width/height, together with the class probability distribution.
Decoding the FC output vector yields the two pictures above. In the first one, although each cell predicts several bounding boxes, only the box with the highest IoU is kept as the detection output. That is, each cell ultimately predicts at most one object, which is in fact a flaw of the YOLO algorithm: an image may contain many small targets, and since each cell predicts only one object, any cell covering several objects at once will perform very poorly on the small ones. This is the main defect of YOLOV1. After merging, screening and filtering the bounding boxes with NMS, we obtain the final detection result.
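The NMS step just mentioned can be sketched as follows (a generic greedy NMS over (x1, y1, x2, y2) boxes, not the exact Darknet implementation):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - box 1 is suppressed by box 0
```

Box 1 overlaps box 0 with IoU about 0.68, so only the higher-scoring box 0 survives, while the distant box 2 is kept.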
YOLO emphasizes small convolutions, i.e., 1*1 and 3*3 kernels (as in GoogleNet), which both reduces the amount of computation and shrinks the model. The network is faster than VGG16 but slightly less accurate.
- YOLOV1 loss function
It contains three kinds of loss: coordinate error, IoU error and classification error, each corresponding to one kind of information predicted per cell. The coordinate error corresponds to the B bounding boxes in S*S*(B*5+C), i.e., the deviation of the predicted box parameters; the IoU error corresponds to the confidence term; and the classification error corresponds to the ground-truth (GT) object class contained in the cell.
These three errors are combined by weighting, and the final loss is a weighted sum of squared errors used for the subsequent training of the network model.
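A minimal numeric sketch of this weighting for a single responsible box (the weights λ_coord = 5 and λ_noobj = 0.5 and the square-root on w, h follow the YOLOV1 paper; the input values and the helper below are purely illustrative, not the Darknet code):

```python
import math

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5

def box_loss(pred, gt):
    """Sum-squared error on (x, y, sqrt(w), sqrt(h)) for a responsible box."""
    x, y, w, h = pred
    gx, gy, gw, gh = gt
    return ((x - gx) ** 2 + (y - gy) ** 2
            + (math.sqrt(w) - math.sqrt(gw)) ** 2
            + (math.sqrt(h) - math.sqrt(gh)) ** 2)

# one responsible box, its confidence, and the class probabilities of its cell
coord = LAMBDA_COORD * box_loss((0.5, 0.5, 0.4, 0.4), (0.6, 0.5, 0.5, 0.4))
conf  = (0.8 - 1.0) ** 2                      # confidence vs. its target
cls   = sum((p - t) ** 2 for p, t in zip([0.7, 0.3], [1.0, 0.0]))
loss  = coord + conf + cls
```

Boxes with no object would enter only through the confidence term, down-weighted by λ_noobj, so background cells do not swamp the gradient.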
- YOLOV1 network training
In practice YOLOV1 is trained with the following techniques.
First, a pretrained model is used to initialize the network parameters; here pretraining is done on the 1000 ImageNet classes. Classification and regression tasks differ in their last few layers: for ImageNet classification the FC output size is 1000, whereas YOLOV1's final FC output has length S*S*(B*5+C), so the last FC layers of the pretrained model are discarded.
Concretely, the first 20 convolutional layers of the pretrained model are reused to initialize YOLO for the subsequent detection training, e.g. on the VOC 20-class dataset. ImageNet inputs are 224*224, while YOLOV1 resizes images to 448*448. If we only reuse convolutional layers this is fine: convolutions are insensitive to the feature-map size and only their kernel parameters (size and number of channels) matter. An FC layer, however, has a parameter count tied to the size of the input image or feature map, so once the image size changes, the FC layer can no longer reuse pretrained weights. Since YOLO's pretraining keeps only the first 20 convolutional layers and drops the FC layers, the image size can be changed while the pretrained model remains usable.
When training the B bounding boxes of a cell, their GT (ground-truth) targets are set to the same values.
- Problems of the YOLOV1 network
Its results fall somewhat short of SSD and Faster RCNN:
- At detection time the input size is fixed and no multi-scale features are used. This contrasts with SSD, which extracts prior boxes and makes predictions on 6 scales, while YOLO is a single convolutional network with no multi-scale feature maps. The features YOLOV1 learns after several downsampling layers are therefore not very fine, which hurts detection quality.
- YOLOV1 performs poorly on small targets. When one cell contains several objects, only one (the highest-IoU one) is predicted and the others are ignored, so missed detections are inevitable.
- The IoU part of YOLOV1's loss does not distinguish the contributions of large-object and small-object IoU errors to the training loss; they are essentially equal. In reality, IoU errors on small objects affect optimization much more and degrade localization accuracy. YOLOV1's loss design thus has few refinements, which became an improvement point for later YOLO versions.
- When an object appears with new, unusual aspect ratios or in other uncommon conditions, YOLOV1 also generalizes poorly.
- YOLOV1 network performance
The figure above compares the accuracy of networks trained at different scales. On the same dataset YOLOV1's mAP (accuracy) is considerably lower, but in detection speed, at the same scale (448*448), YOLO reaches 45 FPS, which is very fast compared with Faster RCNN. It is also much faster than SSD500 (500*500 inputs) and close to SSD300 (300*300 inputs). In other words, YOLOV1 running on larger images matches the detection speed SSD achieves on smaller ones.
YOLOV2
Addressing the problems of YOLOV1, the authors proposed the YOLOV2 algorithm in 2017, and on top of it the YOLO9000 model. The key improvements of YOLOV2 over YOLOV1 are:
- It introduces the Anchor box idea, refining the rather coarse direct-regression approach.
- The output layer uses a convolutional layer instead of YOLOV1's fully connected (FC) layer. A direct benefit is reduced sensitivity to the input image size: the parameter count of an FC layer is tied to the image size, whereas a convolutional layer's is not.
- YOLO9000 is trained jointly on two datasets, ImageNet classification and COCO detection: the detection data teach accurate object locations, and the classification data teach categories. This multi-task setup improves the robustness of the final network.
- Compared with YOLOV1, YOLOV2 improves greatly not only in the number of recognizable object categories but also in accuracy, speed and localization.
YOLOV2 became one of the most representative detection algorithms of its time. The improvements of YOLOV2/YOLO9000:
In the figure above, the backbone is the DarkNet architecture. YOLOV1 used a GoogleNet-style backbone, whose performance is better than VGGNet's. DarkNet resembles VGGNet in using small 3*3 kernels and doubling the number of channels after every pooling step, and it applies Batch Normalization throughout, which makes training more stable, speeds up convergence and regularizes the model.
Because convolutional layers replace the FC layer, the input image size can vary: the network's parameters are independent of the feature-map size, so the image size can be changed for multi-scale training. For the classification model, a high-resolution classifier is adopted.
YOLOV1 used features from only one level, so the features it learned were relatively coarse. YOLOV2 adds a skip connection: at the final prediction stage it uses features of different granularities and fuses them to improve detection performance. It also adopts the Anchor mechanism at prediction time, a core element of Faster RCNN and SSD that brings a gain in model performance.
- Batch Normalization
- V1 already used BN widely, but the localization FC layer used dropout against overfitting.
- V2 drops dropout and applies BN throughout the network to regularize the model, making it more stable and faster to converge.
- High-resolution classifier
- V1 used a 224*224 pretrained model but ran detection on 448*448 images, which inevitably introduces a distribution shift.
- V2 fine-tunes the initial classification network directly at 448*448 resolution, keeping the distributions of the classification and detection models consistent.
- Anchor Boxes
- When predicting bounding-box offsets, convolution replaces the FC layer. In V1 the FC output vector has size S*S*(B*5+C); in V2 a convolution instead produces a feature map of size S*S with (B*5+C) channels, achieving the same effect as V1's FC layer.
- V2 uses 416*416 inputs rather than 448*448. The reason is that objects, especially large ones, tend to occupy the center of the image, so a single cell located at the center should predict them. YOLO downsamples by a factor of 32 through its convolutional layers: a 416*416 image yields a 13*13 feature map, while a 448*448 image yields 14*14, which has no cell exactly at the image center. To guarantee a center cell, the downsampled feature map is fixed at 13*13, and multiplying back by 32 gives the 416 input size. With odd feature-map sides there is always one central cell to predict a centrally located object; with even sides, the four middle cells would have to share that job. This odd-output trick improves overall efficiency.
- With the Anchor mechanism, each cell predicts several proposal boxes instead of the previous B (usually 2). As a result, recall improves markedly while mAP drops only slightly; in the authors' view the small accuracy loss is well worth the large recall gain, showing that Anchor boxes do lift overall model performance, though the accuracy drop still needs further optimization. V2 uses max pooling for downsampling.
- With Anchors, the number of predicted bounding boxes exceeds 1000: e.g. with a 13*13 feature map and 9 boxes per anchor position, 13*13*9 = 1521 boxes are predicted, versus the previous 7*7*2 = 98, and the larger number of candidates also helps performance. The authors hit two issues, however. First, anchor widths and heights are hand-picked priors; although the network adjusts box dimensions during training to reach accurate final positions, the authors wanted to start from the most representative prior dimensions so that accurate predictions are easier to learn. They therefore ran K-means clustering on the ground-truth bounding boxes to find good width/height ratios automatically. Second, standard K-means on bounding boxes measures similarity by Euclidean distance, which makes large boxes contribute more error than small ones, so the authors used the IoU score as the distance metric, making the error independent of box size. The clustering showed that 5 anchor dimensions are a suitable choice, so 5 box shapes are used for localization. The clusters also contained fewer wide, flat boxes and more tall, thin ones, which matches the shape of pedestrians. For more on K-means, see the clustering article.
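The anchor clustering just described can be sketched as a simplified K-means over (w, h) pairs with d = 1 − IoU as the distance, assuming boxes share a common center (toy data below, not the VOC/COCO statistics):

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given only as (w, h), assuming shared centers."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means on box shapes using 1 - IoU as the distance metric."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # assign each box to the center with the smallest 1 - IoU
            j = min(range(k), key=lambda i: 1 - iou_wh(box, centers[i]))
            clusters[j].append(box)
        # recompute each center as the mean (w, h) of its cluster
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

boxes = [(10, 30), (12, 28), (40, 40), (38, 44), (90, 30), (100, 28)]
anchors = kmeans_anchors(boxes, k=3)
```

Because the distance is based on IoU rather than Euclidean distance in (w, h) space, a large box and a small box with the same shape mismatch contribute comparable error, which is exactly the property the authors wanted.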
- Fine-grained features
V1 used a straight top-down network without features at different scales. V2 adds a passthrough layer that connects the shallow feature map (26*26) to the deep one (13*13). Instead of pooling for the downsampling, the 26*26*512 feature map is directly restacked into 13*13*2048 and then concatenated with the deep features, adding fine-grained information. Fusing coarse and fine granularity gains about 1% in performance. This resembles the identity mapping in ResNet.
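The passthrough restacking (26*26*512 → 13*13*2048) is a space-to-depth operation with block size 2; a minimal NumPy sketch of the reshape (illustrative, not the Darknet code):

```python
import numpy as np

def passthrough(x, block=2):
    """Space-to-depth: stack each block x block spatial patch into channels."""
    n, h, w, c = x.shape
    x = x.reshape(n, h // block, block, w // block, block, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h // block, w // block, block * block * c)

x = np.zeros((1, 26, 26, 512), dtype=np.float32)
y = passthrough(x)
print(y.shape)  # (1, 13, 13, 2048)
```

No information is lost: spatial resolution is halved in each direction while the channel count grows fourfold, so the result can be concatenated channel-wise with the 13*13 deep feature map.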
- Multi-Scale Training
Multi-scale training: every few iterations the network's input size is changed, with input sizes drawn from several scales {320, 352, ..., 608}. The same parameters work for all input scales because the V2 architecture contains no FC layer whose size depends on the feature map; the whole network is a stack of convolutions, so its parameter count is independent of the feature-map size. Varying the image size therefore makes the network robust to scale changes, and the same network can run detection at different resolutions. On small images V2 runs even faster, balancing speed and accuracy: with 288*288 inputs it reaches about 90 FPS with mAP on par with Faster RCNN. V2 is therefore well suited to low-end GPUs, high-frame-rate video and multi-stream video scenarios; in low-power and video-processing applications YOLO has a wider range of uses, since it achieves higher real-time speed while keeping accuracy at the level of other deep-learning detectors.
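The multi-scale schedule can be sketched as follows: every few iterations a new input size is drawn from {320, 352, ..., 608}, all multiples of 32 to match the network's total stride (the window length and seeding below are illustrative choices, not the Darknet defaults):

```python
import random

SCALES = list(range(320, 609, 32))   # 320, 352, ..., 608 - all multiples of 32

def pick_scale(iteration, every=10):
    """Draw a new training resolution every `every` iterations
    (seeded per window here only to make the sketch reproducible)."""
    random.seed(iteration // every)
    return random.choice(SCALES)

print(len(SCALES), SCALES[0], SCALES[-1])  # 10 320 608
```

Because every scale is divisible by 32, each resized batch still produces an integer-sized feature map after the network's downsampling.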
- Darknet-19
V1 used GoogleNet as the backbone; for V2 the authors designed a new network as the feature extractor. Darknet draws on prior work: like VGGNet it stacks many 3*3 kernels, doubling the channel count after each Max Pooling step. Borrowing from Network in Network, it uses global average pooling and places 1*1 kernels between the 3*3 kernels to compress features, and batch normalization is applied throughout to regularize the model, speeding up training and making it more stable. Darknet-19 has 19 convolutional layers and 5 pooling layers, requiring 5.58 billion operations, a clear reduction in computation compared with VGGNet, while reaching 72.9% top-1 accuracy on ImageNet classification. Of course, YOLOV2 can also be paired with more advanced backbones such as ResNet or DenseNet, or lighter ones such as MobileNet; which backbone works best can be determined by experimenting and comparing their effect on YOLOV2's performance.
In YOLOV2, for each Anchor-based bounding box we predict 4 coordinate values, 1 confidence value and a probability distribution over C classes. This differs from V1: in V1 the class values belong to a cell, each cell has B bounding boxes, and the predicted vector is (5*B+C); in V2 the class distribution belongs to each bounding box, consistent with the Anchor mechanism, so each box predicts a vector of length (5+C), and B boxes per anchor position predict B*(5+C) values in total. The class prediction thus focuses on individual bounding boxes, precisely because of the Anchor mechanism, whereas YOLOV1's classes belong to cells, i.e., each cell predicts only one object class.
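A quick numeric check of this per-box layout, assuming the typical YOLOV2 settings of a 13*13 map, B = 5 anchors and C = 20 classes:

```python
# YOLOv2 output size per feature-map position: B anchors, each predicting
# (x, y, w, h, confidence) plus C class probabilities.
S, B, C = 13, 5, 20

per_anchor   = 5 + C              # 25 values per anchor box
per_position = B * per_anchor     # 125 output channels at each cell
total_boxes  = S * S * B          # 845 boxes over the whole 13x13 map

print(per_anchor, per_position, total_boxes)  # 25 125 845
```

The detection head is thus a convolution with B*(5+C) output channels, in contrast to V1's single (5*B+C) vector per cell.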
- YOLOV2 network performance
Looking at YOLOV2's improvements over YOLOV1 one by one, almost every addition raises performance. The one dip, shown in the figure above, comes from adding Anchor boxes, where mAP slips slightly from 69.5 to 69.2, but this small drop buys a large gain in recall. With the later skip connections and multi-scale training added, YOLOV2 improves over V1 by a wide margin overall, from 63.4 to 78.6.
Compared with SSD and Faster RCNN, YOLOV2 reaches better detection accuracy at a faster detection speed, which made it the most advanced deep-learning detector of its time.
Likewise, the figure above plots mAP against FPS overall: YOLOV2 achieves a better trade-off, keeping good detection accuracy while remaining fast.
YOLO9000
YOLO9000 is a model built on YOLOV2 that can detect more than 9000 categories; its main contribution is a joint training strategy for classification and detection.
This is made possible by the WordTree structure, which mixes data from detection and recognition datasets so the two tasks can be trained jointly, here on ImageNet and COCO. Classification labels are finer-grained: for a dog, the classification dataset may distinguish finer categories such as Husky or Golden Retriever, while the detection dataset only separates coarse concepts such as cat versus dog. Naively merging the two would leave labels like "dog" and "Husky" coexisting. WordTree instead builds the granularity relations between such labels and fuses the classification and detection datasets. On detection data we regress object locations and also judge categories; on classification data we only classify, but at a finer granularity. WordTree expresses the hierarchy among labels; the structure is represented as a graph (WordNet-style), from which the relations and containment between labels are read off.
During training, an image's label is expanded to its hypernyms: a dog is also a mammal, a canine, perhaps a domestic animal, so these labels all mark the image at once. In other words, one image carries several labels, and the labels need not be mutually independent. For ImageNet classification one big SoftMax completes the task; WordTree instead applies a SoftMax over the co-hyponyms of each concept. The benefit is that performance degrades gracefully on unknown, new objects: seeing a photo of a dog of unknown breed, the high-confidence prediction is "dog", while the breed-level co-hyponyms such as Husky and Golden Retriever all get low confidence. By mixing the COCO detection dataset with the ImageNet classification dataset this way and training detection and classification on the combined data, the authors obtained YOLO9000, a better-performing classifier and detector that can detect and classify 9000 object categories while staying highly real-time; hence YOLO9000 is called the stronger version of YOLOV2.
In the figure above, ImageNet classification uses one large SoftMax over every class, whereas WordTree takes the relations between labels into account and then applies SoftMax within each group of co-hyponyms under the same concept to compute the final classification loss. With the joint training strategy, YOLO9000 can rapidly detect over 9000 categories of objects, with an overall mAP of 19.7%.
YOLOV3
Compared with V1 and V2, YOLOV3 focuses more on the balance of speed and accuracy; it integrates more advanced techniques and notably tackles the small-object detection problem.
- YOLOV3 improvement strategies:
1. The backbone is optimized first, adopting a ResNet-like structure to extract more and better feature representations.
As the figure above shows, a ResNet-style backbone gives better detection results, though a deeper network also costs detection speed; this again trades speed against accuracy.
2. Multi-scale prediction, with an FPN-like structure, improves detection accuracy.
In the lower right of the figure above, V3 extracts features from feature maps at different scales as inputs to the YOLO detection head. Anchors are again designed by clustering, which yields 9 clusters (centroids) that are split evenly across 3 scales, with 3 bounding boxes predicted per scale. At each scale, some convolutional layers further refine the features before the bounding-box information is output. At scale 1 the boxes are output directly after convolution. At scale 2, before outputting boxes, the convolutional output of scale 1 is upsampled and merged with the scale-2 feature map, and the result feeds the subsequent box outputs; its feature map is twice the size of scale 1's. Scale 3 likewise doubles scale 2: its input is the upsampled scale-2 feature map merged with the original feature map, after which convolutions output the final box information. The whole arrangement is an FPN-like structure.
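The cross-scale merge can be sketched shape-wise as nearest-neighbor 2x upsampling of the deeper map followed by channel concatenation with the shallower map (the concrete channel counts below are illustrative):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (N, H, W, C) tensor."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

deep    = np.zeros((1, 13, 13, 256), dtype=np.float32)  # scale-1 features
shallow = np.zeros((1, 26, 26, 512), dtype=np.float32)  # scale-2 features

merged = np.concatenate([upsample2x(deep), shallow], axis=-1)
print(merged.shape)  # (1, 26, 26, 768)
```

Concatenation along channels keeps both the coarse semantic features and the finer spatial detail, which is what lets V3 place boxes for small objects on the larger maps.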
3. A better classifier (binary cross-entropy loss) handles the classification task.
The main reason is that Softmax assigns each bounding box exactly one class, the one with the highest score: it outputs a probability distribution, and the highest value becomes the box's class. When the targets carry overlapping labels, Softmax is unsuited to this multi-label classification problem. In fact, Softmax can be replaced by multiple independent logistic classifiers without losing accuracy.
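Replacing the single softmax with per-class logistic classifiers turns classification into independent binary decisions, so one box can carry several labels at once (e.g. both "woman" and "person"); a minimal sketch with illustrative logits:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.8, -3.0]                 # e.g. classes: person, woman, car

soft = softmax(logits)                    # probabilities forced to sum to 1
multi = [sigmoid(z) for z in logits]      # independent per-class probabilities

print([p > 0.5 for p in multi])           # [True, True, False]
```

Under softmax only one of the two overlapping labels can win, while the independent sigmoids mark both "person" and "woman" as active, which is exactly the multi-label behavior binary cross-entropy trains for.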
- YOLOV3 network performance
The figure above shows that, compared with the YOLOV2 network, V3 achieves better results; since the V3 entry in the figure uses Darknet, its numbers are somewhat below the variants built on ResNet backbones.
Among YOLOV3's own backbone choices, ResNet-152 gives the best overall performance.
On the COCO dataset, the comparison likewise shows YOLOV3 holding a good performance advantage over the other detection networks; its overall detection speed has dropped, but it remains faster than the other detection algorithms.
The YOLOV3 reference source is built on the Darknet framework, which is implemented in C and CUDA, makes efficient use of GPU memory and has few third-party dependencies. It ports easily across platforms and works well on Windows or embedded devices; Darknet is a fine framework for implementing deep networks.
Let us now look at the code structure of YOLOV3, again with Darknet as the V3 backbone.
import tensorflow as tf
from tensorflow.keras import layers, regularizers, models


class DarkNet:

    def __init__(self):
        pass

    def _darknet_conv(self, x, filters, size, strides=1, batch_norm=True):
        if strides == 1:
            padding = 'same'
        else:
            # pad one row (column) of zeros on the top and left of the input
            x = layers.ZeroPadding2D(((1, 0), (1, 0)))(x)  # top left half-padding
            padding = 'valid'
        x = layers.Conv2D(filters, (size, size),
                          strides=strides,
                          padding=padding,
                          use_bias=not batch_norm,
                          kernel_regularizer=regularizers.l2(0.0005))(x)
        if batch_norm:
            x = layers.BatchNormalization()(x)
            x = layers.LeakyReLU(alpha=0.1)(x)
        return x

    def _darknet_residual(self, x, filters):
        prev = x
        x = self._darknet_conv(x, filters // 2, 1)
        x = self._darknet_conv(x, filters, 3)
        x = layers.Add()([prev, x])
        return x

    def _darknet_block(self, x, filters, blocks):
        x = self._darknet_conv(x, filters, 3, strides=2)
        for _ in range(blocks):
            x = self._darknet_residual(x, filters)
        return x

    def build_darknet(self, x, name=None):
        # x = inputs = tf.keras.layers.Input([None, None, 3])
        x = self._darknet_conv(x, 32, 3)
        # 1/2
        x = self._darknet_block(x, 64, 1)
        # 1/4
        x = self._darknet_block(x, 128, 2)
        # 1/8
        x = x1 = self._darknet_block(x, 256, 8)
        # 1/16
        x = x2 = self._darknet_block(x, 512, 8)
        # 1/32
        x3 = self._darknet_block(x, 1024, 4)
        # return tf.keras.Model(inputs, (x_36, x_61, x), name=name)
        return x1, x2, x3

    def build_darknet_tiny(self, x, name=None):
        # x = inputs = tf.keras.layers.Input([None, None, 3])
        x = self._darknet_conv(x, 16, 3)
        x = layers.MaxPool2D(2, 2, 'same')(x)
        x = self._darknet_conv(x, 32, 3)
        x = layers.MaxPool2D(2, 2, 'same')(x)
        x = self._darknet_conv(x, 64, 3)
        x = layers.MaxPool2D(2, 2, 'same')(x)
        x = self._darknet_conv(x, 128, 3)
        x = layers.MaxPool2D(2, 2, 'same')(x)
        x = x_8 = self._darknet_conv(x, 256, 3)  # skip connection
        x = layers.MaxPool2D(2, 2, 'same')(x)
        x = self._darknet_conv(x, 512, 3)
        x = layers.MaxPool2D(2, 1, 'same')(x)
        x = self._darknet_conv(x, 1024, 3)
        # return tf.keras.Model(inputs, (x_8, x), name=name)
        return x_8, x


if __name__ == '__main__':
    # import os
    # os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    darknet = DarkNet()
    x = layers.Input(shape=(500, 600, 3))
    darknet_model = darknet.build_darknet(x)
    model = models.Model(x, darknet_model)
    print(model.summary())
Output:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 500, 600, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 500, 600, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 500, 600, 32) 128 conv2d[0][0]
__________________________________________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 500, 600, 32) 0 batch_normalization[0][0]
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D) (None, 501, 601, 32) 0 leaky_re_lu[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 250, 300, 64) 18432 zero_padding2d[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 250, 300, 64) 256 conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 250, 300, 64) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 250, 300, 32) 2048 leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 250, 300, 32) 128 conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 250, 300, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 250, 300, 64) 18432 leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 250, 300, 64) 256 conv2d_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 250, 300, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
add (Add) (None, 250, 300, 64) 0 leaky_re_lu_1[0][0]
leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 251, 301, 64) 0 add[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 125, 150, 128 73728 zero_padding2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 125, 150, 128 512 conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 125, 150, 128 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 125, 150, 64) 8192 leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 125, 150, 64) 256 conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU) (None, 125, 150, 64) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 125, 150, 128 73728 leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 125, 150, 128 512 conv2d_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU) (None, 125, 150, 128 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 125, 150, 128 0 leaky_re_lu_4[0][0]
leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 125, 150, 64) 8192 add_1[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 125, 150, 64) 256 conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU) (None, 125, 150, 64) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 125, 150, 128 73728 leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 125, 150, 128 512 conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU) (None, 125, 150, 128 0 batch_normalization_8[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 125, 150, 128 0 add_1[0][0]
leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 126, 151, 128 0 add_2[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 62, 75, 256) 294912 zero_padding2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 62, 75, 256) 1024 conv2d_9[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 62, 75, 128) 32768 leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 62, 75, 128) 512 conv2d_10[0][0]
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_10[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 62, 75, 256) 1024 conv2d_11[0][0]
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_11[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 62, 75, 256) 0 leaky_re_lu_9[0][0]
leaky_re_lu_11[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 62, 75, 128) 32768 add_3[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 62, 75, 128) 512 conv2d_12[0][0]
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_12[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_12[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 62, 75, 256) 1024 conv2d_13[0][0]
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_13[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 62, 75, 256) 0 add_3[0][0]
leaky_re_lu_13[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 62, 75, 128) 32768 add_4[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 62, 75, 128) 512 conv2d_14[0][0]
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_14[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_14[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 62, 75, 256) 1024 conv2d_15[0][0]
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_15[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 62, 75, 256) 0 add_4[0][0]
leaky_re_lu_15[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, 62, 75, 128) 32768 add_5[0][0]
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 62, 75, 128) 512 conv2d_16[0][0]
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_16[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_16[0][0]
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 62, 75, 256) 1024 conv2d_17[0][0]
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_17[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 62, 75, 256) 0 add_5[0][0]
leaky_re_lu_17[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 62, 75, 128) 32768 add_6[0][0]
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 62, 75, 128) 512 conv2d_18[0][0]
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_18[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_18[0][0]
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 62, 75, 256) 1024 conv2d_19[0][0]
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_19[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 62, 75, 256) 0 add_6[0][0]
leaky_re_lu_19[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 62, 75, 128) 32768 add_7[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 62, 75, 128) 512 conv2d_20[0][0]
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_20[0][0]
__________________________________________________________________________________________________
conv2d_21 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_20[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 62, 75, 256) 1024 conv2d_21[0][0]
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_21[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 62, 75, 256) 0 add_7[0][0]
leaky_re_lu_21[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 62, 75, 128) 32768 add_8[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 62, 75, 128) 512 conv2d_22[0][0]
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_22[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_22[0][0]
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 62, 75, 256) 1024 conv2d_23[0][0]
__________________________________________________________________________________________________
leaky_re_lu_23 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_23[0][0]
__________________________________________________________________________________________________
add_9 (Add) (None, 62, 75, 256) 0 add_8[0][0]
leaky_re_lu_23[0][0]
__________________________________________________________________________________________________
conv2d_24 (Conv2D) (None, 62, 75, 128) 32768 add_9[0][0]
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 62, 75, 128) 512 conv2d_24[0][0]
__________________________________________________________________________________________________
leaky_re_lu_24 (LeakyReLU) (None, 62, 75, 128) 0 batch_normalization_24[0][0]
__________________________________________________________________________________________________
conv2d_25 (Conv2D) (None, 62, 75, 256) 294912 leaky_re_lu_24[0][0]
__________________________________________________________________________________________________
batch_normalization_25 (BatchNo (None, 62, 75, 256) 1024 conv2d_25[0][0]
__________________________________________________________________________________________________
leaky_re_lu_25 (LeakyReLU) (None, 62, 75, 256) 0 batch_normalization_25[0][0]
__________________________________________________________________________________________________
add_10 (Add) (None, 62, 75, 256) 0 add_9[0][0]
leaky_re_lu_25[0][0]
__________________________________________________________________________________________________
zero_padding2d_3 (ZeroPadding2D (None, 63, 76, 256) 0 add_10[0][0]
__________________________________________________________________________________________________
conv2d_26 (Conv2D) (None, 31, 37, 512) 1179648 zero_padding2d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_26 (BatchNo (None, 31, 37, 512) 2048 conv2d_26[0][0]
__________________________________________________________________________________________________
leaky_re_lu_26 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_26[0][0]
__________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, 31, 37, 256) 131072 leaky_re_lu_26[0][0]
__________________________________________________________________________________________________
batch_normalization_27 (BatchNo (None, 31, 37, 256) 1024 conv2d_27[0][0]
__________________________________________________________________________________________________
leaky_re_lu_27 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_27[0][0]
__________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_27[0][0]
__________________________________________________________________________________________________
batch_normalization_28 (BatchNo (None, 31, 37, 512) 2048 conv2d_28[0][0]
__________________________________________________________________________________________________
leaky_re_lu_28 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_28[0][0]
__________________________________________________________________________________________________
add_11 (Add) (None, 31, 37, 512) 0 leaky_re_lu_26[0][0]
leaky_re_lu_28[0][0]
__________________________________________________________________________________________________
conv2d_29 (Conv2D) (None, 31, 37, 256) 131072 add_11[0][0]
__________________________________________________________________________________________________
batch_normalization_29 (BatchNo (None, 31, 37, 256) 1024 conv2d_29[0][0]
__________________________________________________________________________________________________
leaky_re_lu_29 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_29[0][0]
__________________________________________________________________________________________________
conv2d_30 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_29[0][0]
__________________________________________________________________________________________________
batch_normalization_30 (BatchNo (None, 31, 37, 512) 2048 conv2d_30[0][0]
__________________________________________________________________________________________________
leaky_re_lu_30 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_30[0][0]
__________________________________________________________________________________________________
add_12 (Add) (None, 31, 37, 512) 0 add_11[0][0]
leaky_re_lu_30[0][0]
__________________________________________________________________________________________________
conv2d_31 (Conv2D) (None, 31, 37, 256) 131072 add_12[0][0]
__________________________________________________________________________________________________
batch_normalization_31 (BatchNo (None, 31, 37, 256) 1024 conv2d_31[0][0]
__________________________________________________________________________________________________
leaky_re_lu_31 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_31[0][0]
__________________________________________________________________________________________________
conv2d_32 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_31[0][0]
__________________________________________________________________________________________________
batch_normalization_32 (BatchNo (None, 31, 37, 512) 2048 conv2d_32[0][0]
__________________________________________________________________________________________________
leaky_re_lu_32 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_32[0][0]
__________________________________________________________________________________________________
add_13 (Add) (None, 31, 37, 512) 0 add_12[0][0]
leaky_re_lu_32[0][0]
__________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, 31, 37, 256) 131072 add_13[0][0]
__________________________________________________________________________________________________
batch_normalization_33 (BatchNo (None, 31, 37, 256) 1024 conv2d_33[0][0]
__________________________________________________________________________________________________
leaky_re_lu_33 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_33[0][0]
__________________________________________________________________________________________________
conv2d_34 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_33[0][0]
__________________________________________________________________________________________________
batch_normalization_34 (BatchNo (None, 31, 37, 512) 2048 conv2d_34[0][0]
__________________________________________________________________________________________________
leaky_re_lu_34 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_34[0][0]
__________________________________________________________________________________________________
add_14 (Add) (None, 31, 37, 512) 0 add_13[0][0]
leaky_re_lu_34[0][0]
__________________________________________________________________________________________________
conv2d_35 (Conv2D) (None, 31, 37, 256) 131072 add_14[0][0]
__________________________________________________________________________________________________
batch_normalization_35 (BatchNo (None, 31, 37, 256) 1024 conv2d_35[0][0]
__________________________________________________________________________________________________
leaky_re_lu_35 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_35[0][0]
__________________________________________________________________________________________________
conv2d_36 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_35[0][0]
__________________________________________________________________________________________________
batch_normalization_36 (BatchNo (None, 31, 37, 512) 2048 conv2d_36[0][0]
__________________________________________________________________________________________________
leaky_re_lu_36 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_36[0][0]
__________________________________________________________________________________________________
add_15 (Add) (None, 31, 37, 512) 0 add_14[0][0]
leaky_re_lu_36[0][0]
__________________________________________________________________________________________________
conv2d_37 (Conv2D) (None, 31, 37, 256) 131072 add_15[0][0]
__________________________________________________________________________________________________
batch_normalization_37 (BatchNo (None, 31, 37, 256) 1024 conv2d_37[0][0]
__________________________________________________________________________________________________
leaky_re_lu_37 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_37[0][0]
__________________________________________________________________________________________________
conv2d_38 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_37[0][0]
__________________________________________________________________________________________________
batch_normalization_38 (BatchNo (None, 31, 37, 512) 2048 conv2d_38[0][0]
__________________________________________________________________________________________________
leaky_re_lu_38 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_38[0][0]
__________________________________________________________________________________________________
add_16 (Add) (None, 31, 37, 512) 0 add_15[0][0]
leaky_re_lu_38[0][0]
__________________________________________________________________________________________________
conv2d_39 (Conv2D) (None, 31, 37, 256) 131072 add_16[0][0]
__________________________________________________________________________________________________
batch_normalization_39 (BatchNo (None, 31, 37, 256) 1024 conv2d_39[0][0]
__________________________________________________________________________________________________
leaky_re_lu_39 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_39[0][0]
__________________________________________________________________________________________________
conv2d_40 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_39[0][0]
__________________________________________________________________________________________________
batch_normalization_40 (BatchNo (None, 31, 37, 512) 2048 conv2d_40[0][0]
__________________________________________________________________________________________________
leaky_re_lu_40 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_40[0][0]
__________________________________________________________________________________________________
add_17 (Add) (None, 31, 37, 512) 0 add_16[0][0]
leaky_re_lu_40[0][0]
__________________________________________________________________________________________________
conv2d_41 (Conv2D) (None, 31, 37, 256) 131072 add_17[0][0]
__________________________________________________________________________________________________
batch_normalization_41 (BatchNo (None, 31, 37, 256) 1024 conv2d_41[0][0]
__________________________________________________________________________________________________
leaky_re_lu_41 (LeakyReLU) (None, 31, 37, 256) 0 batch_normalization_41[0][0]
__________________________________________________________________________________________________
conv2d_42 (Conv2D) (None, 31, 37, 512) 1179648 leaky_re_lu_41[0][0]
__________________________________________________________________________________________________
batch_normalization_42 (BatchNo (None, 31, 37, 512) 2048 conv2d_42[0][0]
__________________________________________________________________________________________________
leaky_re_lu_42 (LeakyReLU) (None, 31, 37, 512) 0 batch_normalization_42[0][0]
__________________________________________________________________________________________________
add_18 (Add) (None, 31, 37, 512) 0 add_17[0][0]
leaky_re_lu_42[0][0]
__________________________________________________________________________________________________
zero_padding2d_4 (ZeroPadding2D (None, 32, 38, 512) 0 add_18[0][0]
__________________________________________________________________________________________________
conv2d_43 (Conv2D) (None, 15, 18, 1024) 4718592 zero_padding2d_4[0][0]
__________________________________________________________________________________________________
batch_normalization_43 (BatchNo (None, 15, 18, 1024) 4096 conv2d_43[0][0]
__________________________________________________________________________________________________
leaky_re_lu_43 (LeakyReLU) (None, 15, 18, 1024) 0 batch_normalization_43[0][0]
__________________________________________________________________________________________________
conv2d_44 (Conv2D) (None, 15, 18, 512) 524288 leaky_re_lu_43[0][0]
__________________________________________________________________________________________________
batch_normalization_44 (BatchNo (None, 15, 18, 512) 2048 conv2d_44[0][0]
__________________________________________________________________________________________________
leaky_re_lu_44 (LeakyReLU) (None, 15, 18, 512) 0 batch_normalization_44[0][0]
__________________________________________________________________________________________________
conv2d_45 (Conv2D) (None, 15, 18, 1024) 4718592 leaky_re_lu_44[0][0]
__________________________________________________________________________________________________
batch_normalization_45 (BatchNo (None, 15, 18, 1024) 4096 conv2d_45[0][0]
__________________________________________________________________________________________________
leaky_re_lu_45 (LeakyReLU) (None, 15, 18, 1024) 0 batch_normalization_45[0][0]
__________________________________________________________________________________________________
add_19 (Add) (None, 15, 18, 1024) 0 leaky_re_lu_43[0][0]
leaky_re_lu_45[0][0]
__________________________________________________________________________________________________
conv2d_46 (Conv2D) (None, 15, 18, 512) 524288 add_19[0][0]
__________________________________________________________________________________________________
batch_normalization_46 (BatchNo (None, 15, 18, 512) 2048 conv2d_46[0][0]
__________________________________________________________________________________________________
leaky_re_lu_46 (LeakyReLU) (None, 15, 18, 512) 0 batch_normalization_46[0][0]
__________________________________________________________________________________________________
conv2d_47 (Conv2D) (None, 15, 18, 1024) 4718592 leaky_re_lu_46[0][0]
__________________________________________________________________________________________________
batch_normalization_47 (BatchNo (None, 15, 18, 1024) 4096 conv2d_47[0][0]
__________________________________________________________________________________________________
leaky_re_lu_47 (LeakyReLU) (None, 15, 18, 1024) 0 batch_normalization_47[0][0]
__________________________________________________________________________________________________
add_20 (Add) (None, 15, 18, 1024) 0 add_19[0][0]
leaky_re_lu_47[0][0]
__________________________________________________________________________________________________
conv2d_48 (Conv2D) (None, 15, 18, 512) 524288 add_20[0][0]
__________________________________________________________________________________________________
batch_normalization_48 (BatchNo (None, 15, 18, 512) 2048 conv2d_48[0][0]
__________________________________________________________________________________________________
leaky_re_lu_48 (LeakyReLU) (None, 15, 18, 512) 0 batch_normalization_48[0][0]
__________________________________________________________________________________________________
conv2d_49 (Conv2D) (None, 15, 18, 1024) 4718592 leaky_re_lu_48[0][0]
__________________________________________________________________________________________________
batch_normalization_49 (BatchNo (None, 15, 18, 1024) 4096 conv2d_49[0][0]
__________________________________________________________________________________________________
leaky_re_lu_49 (LeakyReLU) (None, 15, 18, 1024) 0 batch_normalization_49[0][0]
__________________________________________________________________________________________________
add_21 (Add) (None, 15, 18, 1024) 0 add_20[0][0]
leaky_re_lu_49[0][0]
__________________________________________________________________________________________________
conv2d_50 (Conv2D) (None, 15, 18, 512) 524288 add_21[0][0]
__________________________________________________________________________________________________
batch_normalization_50 (BatchNo (None, 15, 18, 512) 2048 conv2d_50[0][0]
__________________________________________________________________________________________________
leaky_re_lu_50 (LeakyReLU) (None, 15, 18, 512) 0 batch_normalization_50[0][0]
__________________________________________________________________________________________________
conv2d_51 (Conv2D) (None, 15, 18, 1024) 4718592 leaky_re_lu_50[0][0]
__________________________________________________________________________________________________
batch_normalization_51 (BatchNo (None, 15, 18, 1024) 4096 conv2d_51[0][0]
__________________________________________________________________________________________________
leaky_re_lu_51 (LeakyReLU) (None, 15, 18, 1024) 0 batch_normalization_51[0][0]
__________________________________________________________________________________________________
add_22 (Add) (None, 15, 18, 1024) 0 add_21[0][0]
leaky_re_lu_51[0][0]
==================================================================================================
Total params: 40,620,640
Trainable params: 40,584,928
Non-trainable params: 35,712
__________________________________________________________________________________________________
None
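As a sanity check on the summary above, the parameter counts follow directly from the layer shapes: a k*k convolution from c_in to c_out channels without a bias has k*k*c_in*c_out weights, and a BatchNormalization layer carries 4 parameters per channel (gamma, beta, moving mean, moving variance — the latter two non-trainable). A small sketch (the helper names are mine, not from the post's code):

```python
# Verify a few parameter counts from the model summary above.
def conv_params(k, c_in, c_out, bias=False):
    # k x k kernel, c_in input channels, c_out filters
    return k * k * c_in * c_out + (c_out if bias else 0)

def bn_params(channels):
    # gamma, beta, moving mean, moving variance
    return 4 * channels

print(conv_params(1, 512, 256))   # conv2d_27 -> 131072
print(conv_params(3, 256, 512))   # conv2d_28 -> 1179648
print(conv_params(3, 512, 1024))  # conv2d_43 -> 4718592
print(bn_params(256))             # batch_normalization_27 -> 1024
print(bn_params(1024))            # batch_normalization_43 -> 4096
```

This also explains the non-trainable total of 35,712: it is exactly the moving mean and moving variance of all the BatchNormalization layers.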
Now let's again feed a real image into the DarkNet network and look at its output.
import tensorflow as tf
from tensorflow.keras import layers, preprocessing, backend, models, optimizers, losses
import numpy as np
from skimage.transform import resize
import cv2
from darknet import DarkNet

if __name__ == '__main__':
    image_shape = (640, 640, 3)
    inputs = layers.Input(shape=image_shape, name='input_images')
    img = preprocessing.image.load_img("/Users/admin/Documents/6565.jpeg", color_mode='rgb')
    print(np.asarray(img).shape)
    img = preprocessing.image.img_to_array(img, dtype=np.uint8)
    img = resize(img, (500, 600, 3))
    # OpenCV displays images in BGR order, so swap the channels before showing
    r, g, b = cv2.split(img)
    img_new = cv2.merge((b, g, r))
    cv2.imshow('img', img_new)
    cv2.waitKey()
    img = tf.reshape(img, shape=(1, 500, 600, 3))
    img = tf.cast(img, dtype=tf.float32)
    darknet = DarkNet()
    x1, x2, x3 = darknet.build_darknet(img, "darknet")
    print(x1)
    print(x2)
    print(x3)
Output:
(500, 600, 3)
tf.Tensor(
[[[[ 0.00702189  0.03710163  0.0196409  ...  0.0163209   0.01883623
     0.02028573]
   ...]]], shape=(1, 62, 75, 256), dtype=float32)
tf.Tensor(
[[[[ 0.02010754  0.17533582  0.07757577 ...  0.06052487  0.05777644
     0.10139249]
   ...]]], shape=(1, 31, 37, 512), dtype=float32)
tf.Tensor(
[[[[-0.01265945  0.06105311  0.02804097 ...  0.12468794  0.00993979
     0.3949808 ]
   ...]]], shape=(1, 15, 18, 1024), dtype=float32)
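The three printed resolutions are exactly what the network's five stride-2 stages predict: each downsampling stage in the summary zero-pads one pixel on the top/left and applies a 3*3 convolution with stride 2 and no padding, which halves the spatial size (with floor division). A quick check (the helper name `stage_out` is mine, not from the original code):

```python
# Trace the spatial size of a 500 x 600 input through the five
# downsampling stages (ZeroPadding2D + 3x3 conv, stride 2, 'valid').
def stage_out(size):
    padded = size + 1              # ZeroPadding2D(((1, 0), (1, 0)))
    return (padded - 3) // 2 + 1   # 3x3 kernel, stride 2, no padding

h, w = 500, 600
for stage in range(1, 6):
    h, w = stage_out(h), stage_out(w)
    print(stage, h, w)
# Stage 3 gives 62 x 75 (x1), stage 4 gives 31 x 37 (x2),
# and stage 5 gives 15 x 18 (x3) -- matching the printed shapes.
```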
From the printed results we can see that x3 is a 15*18 feature map with 1024 channels. After a few more convolutional layers of feature extraction, let's look at how the first detection scale is established.
if isinstance(x3, tuple):
    x, x_skip = x3[0], x3[1]
    # concat with skip connection
    x = darknet._darknet_conv(x, 512, 1)
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, x_skip])
else:
    x = x3
# Continue extracting features
x = darknet._darknet_conv(x, 512, 1)
x = darknet._darknet_conv(x, 512 * 2, 3)
x = darknet._darknet_conv(x, 512, 1)
x = darknet._darknet_conv(x, 512 * 2, 3)
x = darknet._darknet_conv(x, 512, 1)
# First branch point
concat_output = x
x = darknet._darknet_conv(x, 512 * 2, 3)
# 9 anchor clusters across 3 scales
anchor_masks = np.array([[6, 7, 8], [3, 4, 5], [0, 1, 2]])
# 3 anchors per scale
num_anchors = len(anchor_masks[0])
# [batch, h, w, num_anchors * (num_class + 5)]
# No batch normalization or activation here; 91 is the number of classes,
# so the 1*1 convolution brings the channel count to 3 * (91 + 5) = 288
x = darknet._darknet_conv(x, num_anchors * (91 + 5), 1, batch_norm=False)
# [batch, h, w, num_anchors, (num_class + 5)]
# Each bounding box gets its coordinates, width/height, confidence and 91 class scores
x = layers.Lambda(lambda x: tf.reshape(x, (-1, tf.shape(x)[1], tf.shape(x)[2],
                                           num_anchors, 91 + 5)))(x)
print('x-feature', x)
Output:
x-feature tf.Tensor(
[[[[[-2.12986674e-02 -1.28190415e-02 2.42650993e-02 ...
1.88951138e-02 1.15461722e-02 4.10516514e-03]
[-7.00458232e-03 -6.40480444e-02 -4.11605602e-03 ...
-1.90706477e-02 -7.39076827e-03 -2.13008132e-02]
[ 1.48373526e-02 2.99843848e-02 -2.35161907e-03 ...
1.17246658e-02 3.67713533e-03 -2.74285711e-02]]
[[ 1.19262021e-02 -1.46993799e-02 5.35590984e-02 ...
4.29752022e-02 4.74381819e-02 -3.46213300e-03]
[ 1.87750198e-02 -7.29137212e-02 -2.13017017e-02 ...
-6.19563311e-02 -4.59589064e-05 -1.09684207e-02]
[ 2.74974927e-02 4.51491997e-02 5.54233184e-03 ...
3.97420377e-02 1.03670759e-02 -3.50360572e-02]]
[[ 3.08010839e-02 -2.76494306e-02 5.07744811e-02 ...
5.73975183e-02 5.89894168e-02 1.00010457e-02]
[ 3.29134949e-02 -7.68970698e-02 -2.95035299e-02 ...
-6.91443607e-02 -9.03178658e-03 -9.73423943e-03]
[ 3.41692045e-02 5.63440025e-02 -1.01298261e-02 ...
4.07708511e-02 -1.06241880e-03 -4.11202013e-02]]
...
[[ 3.71415019e-02 -2.97103673e-02 2.49126554e-02 ...
5.25119938e-02 8.22339877e-02 1.78629234e-02]
[ 1.99203826e-02 -3.20278443e-02 -9.32650343e-02 ...
-6.59081787e-02 -4.74899914e-03 -1.24815479e-03]
[ 2.49552056e-02 6.95627034e-02 -4.95083481e-02 ...
4.11871448e-02 -3.24234599e-03 -3.45583595e-02]]
[[ 2.72666849e-02 -2.06625983e-02 1.15971528e-02 ...
4.86928970e-02 7.82895312e-02 3.55140828e-02]
[-1.80562376e-03 -3.33333760e-02 -7.57523999e-02 ...
-5.37165552e-02 2.02620588e-02 1.37979332e-02]
[ 1.74374748e-02 4.06187028e-02 -5.19423224e-02 ...
3.70235816e-02 -1.95782110e-02 -5.10013811e-02]]
[[ 5.77433035e-04 2.36083427e-03 7.29786232e-04 ...
1.71147492e-02 4.00328860e-02 1.20161343e-02]
[-2.11647302e-02 -7.13961478e-03 -4.78811935e-02 ...
-3.01137157e-02 4.76393141e-02 2.44592540e-02]
[ 1.69811584e-02 -1.01883989e-03 -2.33765990e-02 ...
1.25630796e-02 -1.02819242e-02 -3.58622633e-02]]]
[[[-1.10443812e-02 -3.97518836e-02 3.01363952e-02 ...
1.91549212e-03 1.22005846e-02 2.44541988e-02]
[ 1.05863065e-02 -7.72392303e-02 -1.22049963e-02 ...
-2.31272057e-02 -1.54958181e-02 -2.62442492e-02]
[ 1.81336086e-02 5.29933721e-02 5.19179460e-03 ...
7.89460726e-04 2.39550695e-02 -1.43806385e-02]]
[[ 4.21044566e-02 -7.19758123e-02 6.48800433e-02 ...
3.11497897e-02 6.79308027e-02 3.03095281e-02]
[ 4.01652753e-02 -7.80624002e-02 -4.19066474e-02 ...
-9.42696854e-02 1.69266611e-02 -1.42076090e-02]
[ 1.06336940e-02 5.82374185e-02 1.74147114e-02 ...
1.64043251e-02 8.21948890e-03 -6.20956253e-03]]
[[ 6.93715289e-02 -8.70447382e-02 5.05565703e-02 ...
5.02983816e-02 8.30220953e-02 4.85016853e-02]
[ 5.95633276e-02 -6.68422505e-02 -5.59888855e-02 ...
-1.10772729e-01 6.89282548e-03 9.51338559e-04]
[ 3.90676036e-03 5.89937456e-02 1.66862123e-02 ...
1.55973276e-02 -1.69991283e-03 -1.78720281e-02]]
...
[[ 7.07603395e-02 -7.71692768e-02 -1.01323351e-02 ...
7.58069083e-02 1.04550324e-01 6.81465715e-02]
[ 3.67756709e-02 -5.27387708e-02 -1.56000599e-01 ...
-1.08684987e-01 3.44326384e-02 1.70609877e-02]
[-1.35356737e-02 9.69064161e-02 -3.80031727e-02 ...
2.69469768e-02 -1.52276689e-02 -5.39844111e-03]]
[[ 6.45810515e-02 -5.63131124e-02 -3.85337472e-02 ...
3.73310447e-02 9.05677006e-02 8.90817791e-02]
[ 7.48985074e-03 -5.72662205e-02 -1.15535840e-01 ...
-1.02188706e-01 6.14462420e-02 4.50671501e-02]
[-1.99525543e-02 7.84753338e-02 -3.74116749e-02 ...
1.82851423e-02 -3.34787518e-02 -4.37803492e-02]]
[[-1.70683172e-02 -1.12030376e-02 -1.44486539e-02 ...
5.33634387e-02 4.62283641e-02 4.11102436e-02]
[-3.59186642e-02 -3.71208005e-02 -7.87584633e-02 ...
-3.97154316e-02 7.98852742e-02 6.14069253e-02]
[ 2.42478121e-02 -2.32372768e-02 -3.60472724e-02 ...
4.28188033e-03 -1.38179241e-02 -4.66152951e-02]]]
[[[-1.32958740e-02 -4.95475829e-02 3.39892246e-02 ...
-7.29576126e-03 1.39791202e-02 2.34869123e-02]
[ 2.13094950e-02 -9.99129117e-02 -1.57876909e-02 ...
-5.18525280e-02 -2.61400267e-02 -2.71103717e-02]
[ 2.83630360e-02 6.01008907e-02 -7.27777090e-03 ...
-9.83146857e-03 4.59993072e-02 -1.40527617e-02]]
[[ 4.07150239e-02 -8.50995481e-02 8.06102604e-02 ...
3.95527221e-02 8.69234279e-02 3.50015983e-02]
[ 6.92836344e-02 -1.12608701e-01 -3.40074413e-02 ...
-1.20482400e-01 2.42354535e-02 -1.87436230e-02]
[ 1.57830790e-02 7.76715353e-02 3.87026346e-03 ...
1.78444646e-02 1.43550970e-02 -1.17646521e-02]]
[[ 7.31113181e-02 -1.17991671e-01 4.95737828e-02 ...
7.20864907e-02 1.01454437e-01 5.67514747e-02]
[ 9.37286466e-02 -8.80911946e-02 -5.65224215e-02 ...
-1.34926200e-01 1.53460065e-02 -1.23040527e-02]
[ 1.84805840e-02 7.86037892e-02 5.05544478e-03 ...
4.10796180e-02 -3.68518289e-04 -4.25781161e-02]]
...
[[ 9.04548764e-02 -1.10353246e-01 -2.21383497e-02 ...
7.09554851e-02 1.09944977e-01 7.76602924e-02]
[ 5.82461879e-02 -8.31968114e-02 -2.06670105e-01 ...
-1.05571590e-01 7.52032101e-02 1.09575912e-02]
[ 1.42870555e-02 1.25893652e-01 -2.98469309e-02 ...
3.48280445e-02 -2.46300697e-02 -2.88325157e-02]]
[[ 7.61369914e-02 -8.99936631e-02 -4.74772304e-02 ...
3.83360982e-02 9.07076597e-02 8.10515806e-02]
[ 2.15916149e-02 -7.67830163e-02 -1.64944768e-01 ...
-1.15896091e-01 9.59205031e-02 4.32182997e-02]
[ 1.68116391e-02 8.79817605e-02 -3.16673741e-02 ...
2.40424946e-02 -5.35408445e-02 -7.24187940e-02]]
[[-1.48314722e-02 -2.96833459e-02 -2.56156735e-02 ...
9.28033143e-02 5.50253950e-02 3.62834856e-02]
[-5.43197468e-02 -4.87328805e-02 -1.25860408e-01 ...
-5.65597974e-02 1.15847975e-01 6.99854195e-02]
[ 5.48458323e-02 -2.06113495e-02 -4.03817855e-02 ...
1.25552900e-02 -4.22744006e-02 -5.89952245e-02]]]
...
[[[-1.20623149e-02 2.61196606e-02 -4.35408466e-02 ...
-6.16324097e-02 -1.68263055e-02 4.68136035e-02]
[-5.67124411e-03 -1.45342827e-01 -8.67251232e-02 ...
-5.69032058e-02 -1.38447434e-02 -7.54084662e-02]
[ 1.02853000e-01 7.38101602e-02 1.93650331e-02 ...
-6.05053008e-02 7.13789389e-02 -6.47684783e-02]]
[[-4.10030484e-02 2.85905600e-03 -3.00528929e-02 ...
-5.35979383e-02 3.19539197e-03 3.25613022e-02]
[ 6.39054403e-02 -2.14769080e-01 -1.12496294e-01 ...
-1.34656668e-01 7.26364851e-02 -2.66989619e-02]
[ 1.29280925e-01 1.70464933e-01 -2.00560223e-02 ...
-2.38300376e-02 3.59551683e-02 -6.01377822e-02]]
[[ 4.06577624e-03 -3.74247953e-02 -1.20954126e-01 ...
4.62541133e-02 1.38009787e-02 1.09562829e-01]
[ 1.05925158e-01 -2.12012216e-01 -1.50621578e-01 ...
-1.98456317e-01 1.34330913e-01 -3.83280814e-02]
[ 1.54474065e-01 1.80665389e-01 -3.71079184e-02 ...
3.55897620e-02 5.87542914e-02 -1.04030527e-01]]
...
[[-4.68370020e-02 -5.71053624e-02 -2.49573573e-01 ...
1.88227922e-01 -2.86086351e-02 2.14172781e-01]
[ 1.20051935e-01 -3.19604158e-01 -3.19904029e-01 ...
-3.41793090e-01 2.84540415e-01 6.17709756e-03]
(truncated tensor output omitted; shape=(1, 15, 18, 3, 96), dtype=float32)