cs.CV: 39 papers in total today
Detection (5 papers)
【1】 Detecting Mitosis against Domain Shift using a Fused Detector and Deep Ensemble Classification Model for MIDOG Challenge
Link: https://arxiv.org/abs/2108.13983
Authors: Jingtang Liang, Cheng Wang, Yujie Cheng, Zheng Wang, Fang Wang, Liyu Huang, Zhibin Yu, Yubo Wang
Affiliations: School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, China; College of Electrical Engineering, Ocean University of China, Qingdao, Shandong
Abstract: Mitotic figure count is an important marker of tumor proliferation and has been shown to be associated with patients' prognosis. Deep learning based mitotic figure detection methods have been used to automatically locate cells in mitosis in hematoxylin & eosin (H&E) stained images. However, model performance deteriorates due to the large variation of color tone and intensity in H&E images. In this work, we propose a two-stage mitotic figure detection framework that fuses a detector with a deep ensemble classification model. To alleviate the impact of color variation in H&E images, we utilize both stain normalization and data augmentation, aiding the model to learn color-irrelevant features. The proposed model obtains an F1 score of 0.7550 on the preliminary testing set released by the MIDOG challenge.
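As a hedged illustration of the stain-normalization step (the abstract does not name the normalizer; Reinhard-style statistics matching is one common choice, and the template image here stands in for a hypothetical reference slide):

    import cv2
    import numpy as np

    def reinhard_normalize(img_rgb, template_rgb):
        # Match per-channel mean/std in LAB space to a template H&E slide.
        src = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
        tgt = cv2.cvtColor(template_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
        for c in range(3):
            s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-6
            t_mu, t_sd = tgt[..., c].mean(), tgt[..., c].std()
            src[..., c] = (src[..., c] - s_mu) / s_sd * t_sd + t_mu
        out = np.clip(src, 0, 255).astype(np.uint8)
        return cv2.cvtColor(out, cv2.COLOR_LAB2RGB)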
【2】 A Novel Dataset for Keypoint Detection of quadruped Animals from Images
Link: https://arxiv.org/abs/2108.13958
Authors: Prianka Banik, Lin Li, Xishuang Dong
Affiliations: Department of Computer Science, Prairie View A&M University; Department of Electrical and Computer Engineering
Abstract: In this paper, we study the problem of localizing a generic set of keypoints across multiple quadruped or four-legged animal species from images. Due to the lack of a large-scale animal keypoint dataset with ground-truth annotations, we developed a novel dataset, AwA Pose, for keypoint detection of quadruped animals from images. Our dataset contains significantly more keypoints per animal and much more diverse animals than existing datasets for animal keypoint detection. We benchmarked the dataset with a state-of-the-art deep learning model on different keypoint detection tasks, including both seen and unseen animal cases. Experimental results show the effectiveness of the dataset. We believe that this dataset will help the computer vision community in the design and evaluation of improved models for the generalized quadruped animal keypoint detection problem.
【3】 Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection
Link: https://arxiv.org/abs/2108.13775
Authors: Baisong Zhang, Weiqing Min, Jing Wang, Sujuan Hou, Qiang Hou, Yuanjie Zheng, Shuqiang Jiang
Remarks: 13 pages, 11 figures
Abstract: Recently, logo detection has received more and more attention for its wide applications in the multimedia field, such as intellectual property protection, product brand management, and logo duration monitoring. Unlike general object detection, logo detection is a challenging task, especially for small logo objects and large aspect ratio logo objects in real-world scenarios. In this paper, we propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA), which addresses these challenges by aggregating semantic information and generating anchor boxes of different aspect ratios. More specifically, our approach mainly consists of a Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA). Considering that the low-level feature maps used to detect small logo objects lack semantic information, we propose the DSFP, which enriches the discriminative semantic features of low-level feature maps and achieves better performance on small logo objects. Furthermore, preset anchor boxes are less efficient for detecting large aspect ratio logo objects. We therefore integrate GA into our method to generate large aspect ratio anchor boxes to mitigate this issue. Extensive experimental results on four benchmarks demonstrate the effectiveness of our proposed DSFP-GA. Moreover, we further conduct visual analysis and ablation studies to illustrate the advantage of our method in detecting small and large aspect ratio logo objects. The code and models can be found at https://github.com/Zhangbaisong/DSFP-GA.
【4】 End-to-End Monocular Vanishing Point Detection Exploiting Lane Annotations
Link: https://arxiv.org/abs/2108.13699
Authors: Hiroto Honda, Motoki Kimura, Takumi Karasawa, Yusuke Uchida
Affiliations: Mobility Technologies
Abstract: Vanishing points (VPs) play a vital role in various computer vision tasks, especially for recognizing 3D scenes from an image. In real-world automobile applications, it is costly to manually obtain the external camera parameters when the camera is attached to the vehicle or the attachment is accidentally perturbed. In this paper we introduce a simple but effective end-to-end vanishing point detection method. By automatically computing the intersections of extrapolated lane-marker annotations, we obtain geometrically consistent VP labels and mitigate the human annotation errors caused by manual VP labeling. With the computed VP labels we train an end-to-end VP detector via heatmap estimation. The VP detector achieves higher accuracy than methods utilizing manual annotation or lane detection, paving the way for accurate online camera calibration.
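The label-generation idea is simple enough to sketch: fit a line to each annotated lane marker and intersect the two lines in homogeneous coordinates (a minimal sketch; the paper may fit and aggregate lanes differently):

    import numpy as np

    def fit_line(points):
        # Total-least-squares fit of a 2D line a*x + b*y + c = 0.
        pts = np.asarray(points, dtype=float)
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)
        a, b = vt[-1]  # normal of the dominant direction
        return np.array([a, b, -(a * centroid[0] + b * centroid[1])])

    def vanishing_point(lane_a, lane_b):
        # The intersection of two homogeneous lines is their cross product.
        p = np.cross(fit_line(lane_a), fit_line(lane_b))
        return p[:2] / p[2]  # p[2] near 0 means near-parallel lanes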
【5】 Fiducial marker recovery and detection from severely truncated data in navigation assisted spine surgery
Link: https://arxiv.org/abs/2108.13844
Authors: Fuxin Fan, Björn Kreher, Holger Keil, Andreas Maier, Yixing Huang
Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Siemens Healthcare GmbH, Forchheim, Germany
Abstract: Fiducial markers are commonly used in navigation assisted minimally invasive spine surgery (MISS) and they help transfer image coordinates into real world coordinates. In practice, these markers might be located outside the field-of-view (FOV), due to the limited detector sizes of C-arm cone-beam computed tomography (CBCT) systems used in intraoperative surgeries. As a consequence, reconstructed markers in CBCT volumes suffer from artifacts and have distorted shapes, which sets an obstacle for navigation. In this work, we propose two fiducial marker detection methods: direct detection from distorted markers (direct method) and detection after marker recovery (recovery method). For direct detection from distorted markers in reconstructed volumes, an efficient automatic marker detection method using two neural networks and a conventional circle detection algorithm is proposed. For marker recovery, a task-specific learning strategy is proposed to recover markers from severely truncated data. Afterwards, a conventional marker detection algorithm is applied for position detection. The two methods are evaluated on simulated and real data, both achieving a marker registration error smaller than 0.2 mm. Our experiments demonstrate that the direct method detects distorted markers accurately and that the recovery method with task-specific learning is highly robust and generalizable across various data sets. In addition, the task-specific learning is able to reconstruct other structures of interest accurately, e.g. ribs for image-guided needle biopsy, from severely truncated data, which empowers CBCT systems with new potential applications.
Classification | Recognition (1 paper)
【1】 Semi-supervised Image Classification with Grad-CAM Consistency
Link: https://arxiv.org/abs/2108.13673
Authors: Juyong Lee, Seunghyuk Cho
Affiliations: This work was based on a project during the class "Deep Learning" at Pohang University of Science and Technology (POSTECH); the authors are members of the Department of Computer Science and Engineering
Remarks: 4 pages, 3 figures
Abstract: Consistency training, which exploits both supervised and unsupervised learning with different augmentations of an image, is an effective method of utilizing unlabeled data in a semi-supervised learning (SSL) manner. Here, we present another version of the method with a Grad-CAM consistency loss, so it can be utilized to train a model with better generalization and adjustability. We show that our method improves the baseline ResNet model by up to 1.44% and by 0.31 $\pm$ 0.59 percentage points on average on the CIFAR-10 dataset. We conducted an ablation study comparing against using only pseudo-labels for consistency training. Also, we argue that our method can adapt to different environments when targeted to different units in the model. The code is available at https://github.com/gimme1dollar/gradcam-consistency-semi-sup
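A minimal sketch of a Grad-CAM consistency loss, assuming the two views are spatially aligned (e.g. photometric augmentations) and a hypothetical target_layer; the paper's exact formulation may differ:

    import torch
    import torch.nn.functional as F

    def grad_cam(model, target_layer, x, class_idx):
        # class_idx: long tensor of shape (B,) holding (pseudo-)labels.
        acts = []
        h = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
        logits = model(x)
        h.remove()
        a = acts[0]
        score = logits.gather(1, class_idx[:, None]).sum()
        # create_graph=True keeps the CAM differentiable for the loss below.
        g, = torch.autograd.grad(score, a, create_graph=True)
        w = g.mean(dim=(2, 3), keepdim=True)  # global-average-pool the gradients
        cam = F.relu((w * a).sum(1, keepdim=True))
        return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)

    def gradcam_consistency(model, layer, view1, view2, pseudo_labels):
        return F.mse_loss(grad_cam(model, layer, view1, pseudo_labels),
                          grad_cam(model, layer, view2, pseudo_labels))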
Segmentation | Semantics (5 papers)
【1】 One-shot domain adaptation for semantic face editing of real world images using StyleALAE
Link: https://arxiv.org/abs/2108.13876
Authors: Ravi Kiran Reddy, Kumar Shubham, Gopalakrishnan Venkatesh, Sriram Gandikota, Sarthak Khoche, Dinesh Babu Jayagopi, Gopalakrishnan Srinivasaraghavan
Affiliations: International Institute of Information Technology Bangalore, India
Remarks: 12 pages, 3 figures
Abstract: Semantic face editing of real world facial images is an important application of generative models. Recently, multiple works have explored possible techniques to generate such modifications using the latent structure of pre-trained GAN models. However, such approaches often require training an encoder network, which is typically a time-consuming and resource intensive process. A possible alternative to such GAN-based architectures is styleALAE, a latent-space based autoencoder that can generate photo-realistic images of high quality. Unfortunately, the reconstructed image in styleALAE does not preserve the identity of the input facial image. This limits the application of styleALAE for semantic face editing of images with known identities. In our work, we use a recent advancement in one-shot domain adaptation to address this problem. Our work ensures that the identity of the reconstructed image is the same as the given input image. We further generate semantic modifications over the reconstructed image by using the latent space of the pre-trained styleALAE model. Results show that our approach can generate semantic modifications on any real world facial image while preserving the identity.
【2】 InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
Link: https://arxiv.org/abs/2108.13865
Authors: Anoop Cherian, Gonçalo Dias Pais, Siddarth Jain, Tim K. Marks, Alan Sullivan
Affiliations: Mitsubishi Electric Research Labs (MERL), Cambridge, MA; Instituto Superior Técnico, University of Lisbon, Portugal
Remarks: Accepted at ICCV 2021
Abstract: In this paper, we present InSeGAN, an unsupervised 3D generative adversarial network (GAN) for segmenting (nearly) identical instances of rigid objects in depth images. Using an analysis-by-synthesis approach, we design a novel GAN architecture to synthesize a multiple-instance depth image with independent control over each instance. InSeGAN takes in a set of code vectors (e.g., random noise vectors), each encoding the 3D pose of an object that is represented by a learned implicit object template. The generator has two distinct modules. The first module, the instance feature generator, uses each encoded pose to transform the implicit template into a feature map representation of each object instance. The second module, the depth image renderer, aggregates all of the single-instance feature maps output by the first module and generates a multiple-instance depth image. A discriminator distinguishes the generated multiple-instance depth images from the distribution of true depth images. To use our model for instance segmentation, we propose an instance pose encoder that learns to take in a generated depth image and reproduce the pose code vectors for all of the object instances. To evaluate our approach, we introduce a new synthetic dataset, "Insta-10", consisting of 100,000 depth images, each with 5 instances of an object from one of 10 classes. Our experiments on Insta-10, as well as on real-world noisy depth images, show that InSeGAN achieves state-of-the-art performance, often outperforming prior methods by large margins.
【3】 Segmentation Fault: A Cheap Defense Against Adversarial Machine Learning
Link: https://arxiv.org/abs/2108.13617
Authors: Doha Al Bared, Mohamed Nassar
Affiliations: American University of Beirut (AUB), Beirut, Lebanon; University of New Haven, West Haven, CT, USA
Abstract: Recently published attacks against deep neural networks (DNNs) have stressed the importance of methodologies and tools to assess the security risks of using this technology in critical systems. Efficient techniques for detecting adversarial machine learning help establish trust and boost the adoption of deep learning in sensitive and security-critical systems. In this paper, we propose a new technique for defending deep neural network classifiers, convolutional ones in particular. Our defense is cheap in the sense that it requires less computation power, at a small cost in detection accuracy. The work builds on a recently published technique called ML-LOO: we replace the costly pixel-by-pixel leave-one-out approach of ML-LOO with a coarse-grained leave-one-out. We evaluate and compare the efficiency of different segmentation algorithms for this task. Our results show that a large gain in efficiency is possible, even though it is penalized by a marginal decrease in detection accuracy.
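A sketch of the coarse-grained leave-one-out idea using SLIC superpixels (predict_fn is a hypothetical classifier returning softmax probabilities); ML-LOO-style detectors then threshold the dispersion of these attribution scores:

    import numpy as np
    from skimage.segmentation import slic

    def segment_loo_scores(predict_fn, image, n_segments=50):
        # Mask out one superpixel at a time instead of one pixel at a time.
        segments = slic(image, n_segments=n_segments)
        base = predict_fn(image[None])[0]
        cls = int(base.argmax())
        scores = []
        for s in np.unique(segments):
            masked = image.copy()
            masked[segments == s] = image.mean(axis=(0, 1))  # mean-fill the segment
            scores.append(base[cls] - predict_fn(masked[None])[0][cls])
        return np.asarray(scores)  # high dispersion can flag adversarial inputs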
【4】 SMAC-Seg: LiDAR Panoptic Segmentation via Sparse Multi-directional Attention Clustering
Link: https://arxiv.org/abs/2108.13588
Authors: Enxu Li, Ryan Razani, Yixuan Xu, Bingbing Liu
Affiliations: Huawei Noah's Ark Lab, Toronto, Canada
Abstract: Panoptic segmentation aims to address semantic and instance segmentation simultaneously in a unified framework. However, an efficient solution for panoptic segmentation in applications like autonomous driving is still an open research problem. In this work, we propose a novel LiDAR-based panoptic system, called SMAC-Seg. We present a learnable sparse multi-directional attention clustering to segment multi-scale foreground instances. SMAC-Seg is a real-time clustering-based approach, which removes the complex proposal network used to segment instances. Most existing clustering-based methods use the difference of the predicted and ground truth center offset as the only loss to supervise the instance centroid regression. However, this loss function only considers the centroid of the current object; its relative position with respect to neighbouring objects is not considered when learning to cluster. Thus, we propose a novel centroid-aware repel loss as an additional term to effectively supervise the network to differentiate each object cluster from its neighbours. Our experimental results show that SMAC-Seg achieves state-of-the-art performance among all real-time deployable networks on both the large-scale public SemanticKITTI and nuScenes panoptic segmentation datasets.
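A rough sketch of a centroid-aware repel term on top of the usual offset regression (hypothetical tensor shapes; the paper's exact loss may differ): pull each point's shifted position toward its ground-truth centroid, and push predicted centroids of different instances apart with a hinge:

    import torch
    import torch.nn.functional as F

    def centroid_aware_loss(pred_offsets, points, inst_ids, margin=1.0):
        # pred_offsets, points: (N, 3); inst_ids: (N,) instance labels.
        centers, pull = [], 0.0
        for i in inst_ids.unique():
            m = inst_ids == i
            shifted = points[m] + pred_offsets[m]
            gt_center = points[m].mean(0)
            pull = pull + F.l1_loss(shifted, gt_center.expand_as(shifted))
            centers.append(shifted.mean(0))
        centers = torch.stack(centers)
        d = torch.cdist(centers, centers)
        d = d + torch.eye(len(centers), device=d.device) * 1e6  # ignore self-pairs
        repel = F.relu(margin - d).mean()  # hinge on too-close neighbours
        return pull / len(centers) + repel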
【5】 Simultaneous Nuclear Instance and Layer Segmentation in Oral Epithelial Dysplasia
Link: https://arxiv.org/abs/2108.13904
Authors: Adam J. Shephard, Simon Graham, R. M. Saad Bashir, Mostafa Jahanifar, Hanya Mahmood, Syed Ali Khurram, Nasir M. Rajpoot
Affiliations: Department of Computer Science, University of Warwick, Coventry, UK; School of Clinical Dentistry, University of Sheffield, Sheffield, UK
Remarks: 10 pages, 3 figures, conference
Abstract: Oral epithelial dysplasia (OED) is a pre-malignant histopathological diagnosis given to lesions of the oral cavity. Predicting OED grade or whether a case will transition to malignancy is critical for early detection and appropriate treatment. OED typically begins in the lower third of the epithelium before progressing upwards with grade severity, thus we have suggested that segmenting intra-epithelial layers, in addition to individual nuclei, may enable researchers to evaluate important layer-specific morphological features for grade/malignancy prediction. We present HoVer-Net+, a deep learning framework to simultaneously segment (and classify) nuclei and (intra-)epithelial layers in H&E stained slides from OED cases. The proposed architecture consists of an encoder branch and four decoder branches for simultaneous instance segmentation of nuclei and semantic segmentation of the epithelial layers. We show that the proposed model achieves state-of-the-art (SOTA) performance in both tasks, with no additional costs when compared to previous SOTA methods for each task. To the best of our knowledge, ours is the first method for simultaneous nuclear instance segmentation and semantic tissue segmentation, with potential for use in computational pathology for other similar simultaneous tasks and for future studies into malignancy prediction.
Zero/Few Shot | Transfer | Domain Adaptation (3 papers)
【1】 SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory
Link: https://arxiv.org/abs/2108.13630
Authors: Zhijie Lin, Zhou Zhao, Haoyuan Li, Jinglin Liu, Meng Zhang, Xingshan Zeng, Xiaofei He
Affiliations: Zhejiang University; Huawei Noah's Ark Lab
Remarks: ACMMM 2021
Abstract: Lip reading, aiming to recognize spoken sentences from a given video of lip movements without relying on the audio stream, has attracted great interest due to its application in many scenarios. Although prior works on lip reading have obtained salient achievements, they are all trained in a non-simultaneous manner where the predictions are generated with access to the full video. To break through this constraint, we study the task of simultaneous lip reading and devise SimulLR, a simultaneous lip reading transducer with attention-guided adaptive memory, from three aspects: (1) To address the challenge of monotonic alignment while considering the syntactic structure of the generated sentences under the simultaneous setting, we build a transducer-based model and design several effective training strategies including CTC pre-training, model warm-up and curriculum learning to promote the training of the lip reading transducer. (2) To learn better spatio-temporal representations for the simultaneous encoder, we construct a truncated 3D convolution and a time-restricted self-attention layer to perform frame-to-frame interaction within a video segment containing a fixed number of frames. (3) The history information is always limited due to storage in real-time scenarios, especially for massive video data. Therefore, we devise a novel attention-guided adaptive memory to organize semantic information of history segments and enhance the visual representations with acceptable computation-aware latency. The experiments show that SimulLR achieves a translation speedup of 9.10$\times$ compared with the state-of-the-art non-simultaneous methods, and also obtains competitive results, which indicates the effectiveness of our proposed methods.
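The time-restricted self-attention can be realized with a simple block-diagonal mask over fixed-size segments (a sketch with an assumed segment length; the paper's exact windowing may differ):

    import torch

    def segment_attention_mask(seq_len, segment_len):
        # True where attention is allowed: frames only attend within
        # their own fixed-size video segment (streaming-friendly).
        idx = torch.arange(seq_len)
        return (idx[:, None] // segment_len) == (idx[None, :] // segment_len)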
【2】 Self-balanced Learning For Domain Generalization
Link: https://arxiv.org/abs/2108.13597
Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
Affiliations: School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea; Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
Abstract: Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics. Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class. However, real-world training data collected with different composition biases often exhibits severe distribution gaps for domain and class, leading to substantial performance degradation. In this paper, we propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by different distributions of the multi-domain source data. The self-balanced scheme is based on an auxiliary reweighting network that iteratively updates the weight of the loss conditioned on domain and class information by leveraging balanced meta data. Experimental results demonstrate the effectiveness of our method, which outperforms state-of-the-art works on domain generalization.
【3】 Iterative Filter Adaptive Network for Single Image Defocus Deblurring
Link: https://arxiv.org/abs/2108.13610
Authors: Junyong Lee, Hyeongseok Son, Jaesung Rim, Sunghyun Cho, Seungyong Lee
Affiliations: POSTECH
Abstract: We propose a novel end-to-end learning-based approach for single image defocus deblurring. The proposed approach is equipped with a novel Iterative Filter Adaptive Network (IFAN) that is specifically designed to handle spatially-varying and large defocus blur. For adaptively handling spatially-varying blur, IFAN predicts pixel-wise deblurring filters, which are applied to defocused features of an input image to generate deblurred features. For effectively managing large blur, IFAN models deblurring filters as stacks of small-sized separable filters. Predicted separable deblurring filters are applied to defocused features using a novel Iterative Adaptive Convolution (IAC) layer. We also propose a training scheme based on defocus disparity estimation and reblurring, which significantly boosts the deblurring quality. We demonstrate that our method achieves state-of-the-art performance both quantitatively and qualitatively on real-world images.
Semi/Weakly/Unsupervised | Active Learning | Uncertainty (3 papers)
【1】 S4-Crowd: Semi-Supervised Learning with Self-Supervised Regularisation for Crowd Counting
Link: https://arxiv.org/abs/2108.13969
Authors: Haoran Duan, Yu Guan
Affiliations: Newcastle University
Abstract: Crowd counting has drawn more attention because of its wide applications in smart cities. Recent works achieved promising performance but relied on the supervised paradigm with expensive crowd annotations. To alleviate the annotation cost, in this work we propose a semi-supervised learning framework, S4-Crowd, which can leverage both unlabeled and labeled data for robust crowd modelling. In the unsupervised pathway, two self-supervised losses are proposed to simulate crowd variations such as scale and illumination, based on which, together with the supervised information, pseudo labels are generated and gradually refined. We also propose a crowd-driven recurrent unit, the Gated-Crowd-Recurrent-Unit (GCRU), which can preserve discriminant crowd information by extracting second-order statistics, yielding pseudo labels of improved quality. A joint loss including both unsupervised and supervised information is proposed, and a dynamic weighting strategy is employed to balance the importance of the unsupervised loss and the supervised loss at different training stages. We conducted extensive experiments on four popular crowd counting datasets in semi-supervised settings. Experimental results suggest the effectiveness of each proposed component in our S4-Crowd framework. Our method also outperforms other state-of-the-art semi-supervised learning approaches on these crowd datasets.
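The dynamic weighting between the supervised and unsupervised losses is commonly done with a ramp-up schedule; a sketch under that assumption (the paper's exact schedule isn't given in the abstract):

    import math

    def unsup_weight(step, max_weight=1.0, rampup_steps=10000):
        # Gaussian ramp-up: near 0 early in training, approaching max_weight later.
        t = min(step, rampup_steps) / rampup_steps
        return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)

    # joint objective at training step k:  L = L_sup + unsup_weight(k) * L_unsup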
【2】 ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datasets
Link: https://arxiv.org/abs/2108.13939
Authors: Vitaliy Kinakh, Olga Taran, Svyatoslav Voloshynovskiy
Affiliations: Department of Computer Science, University of Geneva, Switzerland
Abstract: In this paper, we consider the problem of self-supervised learning for small-scale datasets based on a contrastive loss between multiple views of the data, which demonstrates state-of-the-art performance in classification tasks. Despite the reported results, such factors as the complexity of training requiring complex architectures, the number of views needed from data augmentation, and their impact on classification accuracy are understudied problems. To establish the role of these factors, we consider an architecture of a contrastive loss system such as SimCLR, where the baseline model is replaced by the geometrically invariant "hand-crafted" network ScatNet with a small trainable adapter network, and argue that the number of parameters of the whole system and the number of views can be considerably reduced while practically preserving the same classification accuracy. In addition, we investigate the impact of regularization strategies using pretext task learning based on estimating the parameters of augmentation transforms, such as rotation and jigsaw permutation, for both traditional baseline models and ScatNet-based models. Finally, we demonstrate that the proposed architecture with pretext task learning regularization achieves state-of-the-art classification performance with a smaller number of trainable parameters and a reduced number of views.
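The two ingredients are standard enough to sketch: the SimCLR NT-Xent contrastive loss between two views, plus a pretext head that predicts the applied augmentation (rotation shown here as one example; rot_head is a hypothetical classifier layer):

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, tau=0.5):
        # Normalized-temperature cross entropy over 2N embeddings (SimCLR).
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.t() / tau
        sim.fill_diagonal_(float('-inf'))  # exclude self-similarity
        n = z1.size(0)
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
        return F.cross_entropy(sim, targets.to(z.device))

    def pretext_loss(rot_head, feats, rot_labels):
        # Auxiliary regularizer: predict which of 4 rotations was applied.
        return F.cross_entropy(rot_head(feats), rot_labels)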
【3】 Scene Synthesis via Uncertainty-Driven Attribute Synchronization
Link: https://arxiv.org/abs/2108.13499
Authors: Haitao Yang, Zaiwei Zhang, Siming Yan, Haibin Huang, Chongyang Ma, Yi Zheng, Chandrajit Bajaj, Qixing Huang
Affiliations: The University of Texas at Austin; Kuaishou Technology
Abstract: Developing deep neural networks to generate 3D scenes is a fundamental problem in neural synthesis with immediate applications in architectural CAD, computer graphics, as well as in generating virtual robot training environments. This task is challenging because 3D scenes exhibit diverse patterns, ranging from continuous ones, such as object sizes and the relative poses between pairs of shapes, to discrete patterns, such as occurrence and co-occurrence of objects with symmetrical relationships. This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes. Our method combines the strength of both neural network-based and conventional scene synthesis approaches. We use the parametric prior distributions learned from training data, which provide uncertainties of object attributes and relative attributes, to regularize the outputs of feed-forward neural models. Moreover, instead of merely predicting a scene layout, our approach predicts an over-complete set of attributes. This methodology allows us to utilize the underlying consistency constraints among the predicted attributes to prune infeasible predictions. Experimental results show that our approach outperforms existing methods considerably. The generated 3D scenes interpolate the training data faithfully while preserving both continuous and discrete feature patterns.
GAN | Adversarial | Attacks | Generation (2 papers)
【1】 Automatic digital twin data model generation of building energy systems from piping and instrumentation diagrams
Link: https://arxiv.org/abs/2108.13912
Authors: Florian Stinner, Martin Wiecek, Marc Baranski, Alexander Kümpel, Dirk Müller
Affiliations: RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate
Abstract: Buildings directly and indirectly emit a large share of current CO2 emissions. There is a high potential for CO2 savings through modern control methods in building automation systems (BAS) like model predictive control (MPC). For proper control, MPC needs mathematical models to predict the future behavior of the controlled system. For this purpose, digital twins of the building can be used. However, with current methods in existing buildings, setting up a digital twin is usually labor-intensive. Especially connecting the different components of the technical system into an overall digital twin of the building is time-consuming. Piping and instrumentation diagrams (P&ID) can provide the needed information, but it is necessary to extract the information and provide it in a standardized format to process it further. In this work, we present an approach to recognize symbols and connections in P&IDs of buildings in a completely automated way. There are various standards for the graphical representation of symbols in P&IDs of building energy systems. Therefore, we use different data sources and standards to generate a holistic training data set. We apply algorithms for symbol recognition, line recognition and derivation of connections to the data sets. Furthermore, the result is exported to a format that provides semantics of building energy systems. Symbol, line and connection recognition show good results with an average precision of 93.7%, which can be used in subsequent processes like control generation, (distributed) model predictive control or fault detection. Nevertheless, the approach needs further research.
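A sketch of the line-recognition and connection-derivation steps (the symbol detector is assumed to have already produced symbols, a hypothetical list of dicts with an 'id' and a bounding box x0, y0, x1, y1; the paper's actual algorithms may differ):

    import cv2
    import networkx as nx

    def lines_to_graph(binary_img, symbols):
        lines = cv2.HoughLinesP(binary_img, 1, 3.14159265 / 180, threshold=80,
                                minLineLength=30, maxLineGap=10)
        g = nx.Graph()
        g.add_nodes_from(s['id'] for s in symbols)
        for x1, y1, x2, y2 in ([] if lines is None else lines[:, 0, :]):
            # connect the two symbols whose boxes contain the segment endpoints
            hits = [s['id'] for s in symbols
                    if any(s['x0'] <= x <= s['x1'] and s['y0'] <= y <= s['y1']
                           for x, y in ((x1, y1), (x2, y2)))]
            if len(hits) == 2:
                g.add_edge(hits[0], hits[1])
        return g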
【2】 AIP: Adversarial Iterative Pruning Based on Knowledge Transfer for Convolutional Neural Networks
Link: https://arxiv.org/abs/2108.13591
Authors: Jingfei Chang, Yang Lu, Ping Xue, Yiqun Xu, Zhen Wei
Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology
Remarks: 15 pages, 7 figures
Abstract: With the increase of structural complexity, convolutional neural networks (CNNs) take a fair amount of computation cost. Meanwhile, existing research reveals the salient parameter redundancy in CNNs. Current pruning methods can compress CNNs with little performance drop, but as the pruning ratio increases, the accuracy loss becomes more serious. Moreover, some iterative pruning methods have difficulty accurately identifying and deleting unimportant parameters due to the accuracy drop during pruning. We propose a novel adversarial iterative pruning method (AIP) for CNNs based on knowledge transfer. The original network is regarded as the teacher while the compressed network is the student. We apply attention maps and output features to transfer information from the teacher to the student. Then, a shallow fully-connected network is designed as the discriminator to allow the outputs of the two networks to play an adversarial game, so that the pruned accuracy can be quickly recovered between pruning intervals. Finally, an iterative pruning scheme based on the importance of channels is proposed. We conduct extensive experiments on the image classification tasks CIFAR-10, CIFAR-100, and ILSVRC-2012 to verify that our pruning method can achieve efficient compression of CNNs even without accuracy loss. On ILSVRC-2012, when removing 36.78% of the parameters and 45.55% of the floating-point operations (FLOPs) of ResNet-18, the Top-1 accuracy drop is only 0.66%. Our method is superior to some state-of-the-art pruning schemes in terms of compression rate and accuracy. Moreover, we further demonstrate that AIP generalizes well to the object detection task PASCAL VOC.
Attention (1 paper)
【1】 Attention-based Multi-Reference Learning for Image Super-Resolution
Link: https://arxiv.org/abs/2108.13697
Authors: Marco Pesavento, Marco Volino, Adrian Hilton
Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Abstract: This paper proposes a novel Attention-based Multi-Reference Super-resolution network (AMRSR) that, given a low-resolution image, learns to adaptively transfer the most similar texture from multiple reference images to the super-resolution output whilst maintaining spatial coherence. The use of multiple reference images together with attention-based sampling is demonstrated to achieve significantly improved performance over state-of-the-art reference super-resolution approaches on multiple benchmark datasets. Reference super-resolution approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution reference image. Multi-reference super-resolution extends this approach by providing a more diverse pool of image features to overcome the inherent information deficit whilst maintaining memory efficiency. A novel hierarchical attention-based sampling approach is introduced to learn the similarity between low-resolution image features and multiple reference images based on a perceptual loss. Ablation demonstrates the contribution of both multi-reference and hierarchical attention-based sampling to overall performance. Perceptual and quantitative ground-truth evaluation demonstrates significant improvement in performance even when the reference images deviate significantly from the target image. The project website can be found at https://marcopesavento.github.io/AMRSR/
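At its core, reference-based attention matches low-resolution query features against features from all reference images; a flat (non-hierarchical) sketch, whose quadratic memory cost is exactly what the paper's hierarchical sampling is designed to avoid:

    import torch
    import torch.nn.functional as F

    def reference_attention(lr_feat, ref_feats, tau=0.1):
        # lr_feat: B x C x H x W queries; ref_feats: N x C x Hr x Wr references.
        b, c, h, w = lr_feat.shape
        q = F.normalize(lr_feat.flatten(2).transpose(1, 2), dim=-1)    # B, HW, C
        k = F.normalize(ref_feats.flatten(2).transpose(1, 2), dim=-1)  # N, HrWr, C
        k = k.reshape(1, -1, c).expand(b, -1, -1)                      # B, N*HrWr, C
        attn = torch.softmax(q @ k.transpose(1, 2) / tau, dim=-1)
        out = attn @ k  # softmax-weighted aggregation of reference features
        return out.transpose(1, 2).reshape(b, c, h, w)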
Faces | Crowd Counting (2 papers)
【1】 Super-Resolution Appearance Transfer for 4D Human Performances
Link: https://arxiv.org/abs/2108.13739
Authors: Marco Pesavento, Marco Volino, Adrian Hilton
Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Abstract: A common problem in the 4D reconstruction of people from multi-view video is the quality of the captured dynamic texture appearance, which depends on both the camera resolution and the capture volume. Typically the requirement to frame cameras to capture the volume of a dynamic performance ($>50m^3$) results in the person occupying only a small proportion ($<$10%) of the field of view. Even with ultra high-definition 4k video acquisition this results in sampling the person at less-than standard definition 0.5k video resolution, resulting in low-quality rendering. In this paper we propose a solution to this problem through super-resolution appearance transfer from a static high-resolution appearance capture rig using digital stills cameras ($>8k$) to capture the person in a small volume ($<8m^3$). A pipeline is proposed for super-resolution appearance transfer from high-resolution static capture to dynamic video performance capture to produce super-resolution dynamic textures. This addresses two key problems: colour mapping between different camera systems; and dynamic texture map super-resolution using a learnt model. Comparative evaluation demonstrates a significant qualitative and quantitative improvement in rendering the 4D performance capture with super-resolution dynamic texture appearance. The proposed approach reproduces the high-resolution detail of the static capture whilst maintaining the appearance dynamics of the captured video.
【2】 Spectral Splitting and Aggregation Network for Hyperspectral Face Super-Resolution
Link: https://arxiv.org/abs/2108.13584
Authors: Junjun Jiang, Chenyang Wang, Kui Jiang, Xianming Liu, Jiayi Ma
Remarks: 12 pages, 10 figures
Abstract: High-resolution (HR) hyperspectral face images play an important role in face-related computer vision tasks under uncontrolled conditions, such as low-light environments and spoofing attacks. However, the dense spectral bands of hyperspectral face images come at the cost of a limited number of photons reaching a narrow spectral window on average, which greatly reduces the spatial resolution of hyperspectral face images. In this paper, we investigate how to adapt deep learning techniques to hyperspectral face image super-resolution (HFSR), especially when the training samples are very limited. Benefiting from the number of spectral bands, each of which can be seen as an image, we present a spectral splitting and aggregation network (SSANet) for HFSR with limited training samples. In the shallow layers, we split the hyperspectral image into different spectral groups and take each of them as an individual training sample (in the sense that each group will be fed into the same network). Then, we gradually aggregate the neighboring bands at deeper layers to exploit spectral correlations. By this spectral splitting and aggregation strategy (SSAS), we can divide the original hyperspectral image into multiple samples to support the efficient training of the network and effectively exploit the correlations within the spectrum. To cope with the challenge of the small training sample size (S3) problem, we propose to expand the training samples by a self-representation model and symmetry-induced augmentation. Experiments show that the introduced SSANet can well model the joint correlations of spatial and spectral information. By expanding the training samples, our proposed method can effectively alleviate the S3 problem. The comparison results demonstrate that our proposed method outperforms the state of the art.
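A sketch of the splitting side of SSAS: adjacent bands are grouped, and every group is pushed through the same shallow network, multiplying the effective number of training samples (assumes the band count is divisible by group_size; the paper's grouping may differ):

    import torch

    def split_and_forward(hsi, band_net, group_size=4):
        # hsi: B x C x H x W hyperspectral cube; band_net has shared weights,
        # so each spectral group acts as an extra training sample.
        groups = hsi.split(group_size, dim=1)
        feats = [band_net(g) for g in groups]
        return torch.cat(feats, dim=1)  # neighbouring bands aggregated deeper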
Tracking (2 papers)
【1】 DepthTrack: Unveiling the Power of RGBD Tracking
Link: https://arxiv.org/abs/2108.13962
Authors: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen
Affiliations: Tampere University; Southern University of Science and Technology; University of Birmingham
Remarks: Accepted to ICCV2021
Abstract: RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensors have become popular in many application fields such as robotics. However, the best RGBD trackers are extensions of the state-of-the-art deep RGB trackers. They are trained with RGB data and the depth channel is used as a sidekick for subtleties such as occlusion detection. This can be explained by the fact that there are no sufficiently large RGBD datasets to 1) train deep depth trackers and to 2) challenge RGB trackers with sequences for which the depth cue is essential. This work introduces a new RGBD tracking dataset, DepthTrack, that has twice as many sequences (200) and scene types (40) as the largest existing dataset, and three times more objects (90). In addition, the average length of the sequences (1473), the number of deformable objects (16) and the number of annotated tracking attributes (15) have been increased. Furthermore, by running the SotA RGB and RGBD trackers on DepthTrack, we propose a new RGBD tracking baseline, namely DeT, which reveals that deep RGBD tracking indeed benefits from genuine training data. The code and dataset are available at https://github.com/xiaozai/DeT
【2】 Is First Person Vision Challenging for Object Tracking?
Link: https://arxiv.org/abs/2108.13665
Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni
Affiliations: Machine Learning and Perception Lab, University of Udine, Udine, Italy; Image Processing Laboratory, University of Catania, Catania, Italy
Remarks: IEEE/CVF International Conference on Computer Vision (ICCV) 2021, Visual Object Tracking Challenge VOT2021 workshop. arXiv admin note: text overlap with arXiv:2011.12263
Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in the last years for a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill the gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects and considering a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research efforts should be devoted to this problem so that tracking could benefit FPV tasks.
Pruning | Quantization | Acceleration | Compression (1 paper)
【1】 Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks
Link: https://arxiv.org/abs/2108.13728
Authors: Zhouyang Xie, Yan Fu, Shengzhao Tian, Junlin Zhou, Duanbing Chen
Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China
Abstract: Channel pruning is a promising technique to compress the parameters of deep convolutional neural networks (DCNNs) and to speed up inference. This paper aims to address the long-standing inefficiency of channel pruning. Most channel pruning methods recover the prediction accuracy by re-training the pruned model from the remaining parameters or from random initialization. This re-training process is heavily dependent on the sufficiency of computational resources, training data, and human interference (tuning the training strategy). In this paper, a highly efficient pruning method is proposed to significantly reduce the cost of pruning DCNNs. The main contributions of our method include: 1) pruning compensation, a fast and data-efficient substitute for re-training that minimizes the post-pruning reconstruction loss of features, 2) compensation-aware pruning (CaP), a novel pruning algorithm that removes redundant or less-weighted channels by minimizing the loss of information, and 3) binary structural search with a step constraint to minimize human interference. On benchmarks including CIFAR-10/100 and ImageNet, our method shows competitive pruning performance among the state-of-the-art retraining-based pruning methods and, more importantly, reduces processing time by 95% and data usage by 90%.
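The "pruning compensation" idea, minimizing the post-pruning feature reconstruction loss instead of retraining, can be sketched as a least-squares refit of the following layer on a small calibration batch (hypothetical shapes; the paper's exact procedure may differ):

    import torch

    def compensate(W_next, acts, keep):
        # W_next: (out, in) weights of the next linear/1x1 layer.
        # acts:   (samples, in) activations from a small calibration set.
        # keep:   indices of channels that survive pruning.
        Y = acts @ W_next.t()                    # original outputs to reconstruct
        X = acts[:, keep]                        # activations of surviving channels
        sol = torch.linalg.lstsq(X, Y).solution  # (k, out) least-squares refit
        return sol.t()                           # new (out, k) weights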
Point Clouds | SLAM | Radar | LiDAR | Depth/RGBD (1 paper)
【1】 Automatic labelling of urban point clouds using data fusion
Link: https://arxiv.org/abs/2108.13757
Authors: Daan Bloembergen, Chris Eijgenstein
Affiliations: Chief Technology Office, City of Amsterdam, Amsterdam, The Netherlands
Remarks: 5 pages, 5 figures; code for this paper is available at this https URL
Abstract: In this paper we describe an approach to semi-automatically create a labelled dataset for semantic segmentation of urban street-level point clouds. We use data fusion techniques using public data sources such as elevation data and large-scale topographical maps to automatically label parts of the point cloud, after which only limited human effort is needed to check the results and make amendments where needed. This drastically limits the time needed to create a labelled dataset that is extensive enough to train deep semantic segmentation models. We apply our method to point clouds of the Amsterdam region, and successfully train a RandLA-Net semantic segmentation model on the labelled dataset. These results demonstrate the potential of smart data fusion and semantic segmentation for the future of smart city planning and management.
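Much of the data-fusion labelling reduces to point-in-polygon tests against the large-scale topographic map; a minimal sketch with shapely (the GeoJSON-like map_features structure is an assumption):

    from shapely.geometry import Point, shape

    def label_points(points_xy, map_features):
        polys = [(shape(f['geometry']), f['properties']['class'])
                 for f in map_features]  # e.g. features read from a GeoJSON file
        labels = []
        for x, y in points_xy:
            p = Point(x, y)
            # first containing polygon wins; None is left for manual review
            labels.append(next((c for poly, c in polys if poly.contains(p)), None))
        return labels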
3D | 3D Reconstruction (2 papers)
【1】 Realistic Hands: A Hybrid Model for 3D Hand Reconstruction
Link: https://arxiv.org/abs/2108.13995
Authors: Michael Seeber, Martin R. Oswald, Roi Poranne
Affiliations: ETH Zurich; University of Haifa
Abstract: Estimating 3D hand meshes from RGB images robustly is a highly desirable task, made challenging due to the numerous degrees of freedom and issues such as self-similarity and occlusions. Previous methods generally either use parametric 3D hand models or follow a model-free approach. While the former can be considered more robust, e.g. to occlusions, they are less expressive. We propose a hybrid approach, utilizing a deep neural network and differential-rendering-based optimization to demonstrably achieve the best of both worlds. In addition, we explore Virtual Reality (VR) as an application. Most VR headsets are nowadays equipped with multiple cameras, which we can leverage by extending our method to the egocentric stereo domain. This extension proves to be more resilient to the above mentioned issues. Finally, as a use-case, we show that the improved image-model alignment can be used to acquire the user's hand texture, which leads to a more realistic virtual hand representation.
【2】 LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
Link: https://arxiv.org/abs/2108.13459
Authors: Dominic Roberts, Ara Danielyan, Hang Chu, Mani Golparvar-Fard, David Forsyth
Affiliations: University of Illinois at Urbana-Champaign; Autodesk AI Lab
Remarks: accepted by ICCV 2021
Abstract: Generative models for 3D shapes represented by hierarchies of parts can generate realistic and diverse sets of outputs. However, existing models suffer from the key practical limitation of modelling shapes holistically and thus cannot perform conditional sampling, i.e. they are not able to generate variants on individual parts of generated shapes without modifying the rest of the shape. This is limiting for applications such as 3D CAD design that involve adjusting created shapes at multiple levels of detail. To address this, we introduce LSD-StructureNet, an augmentation to the StructureNet architecture that enables re-generation of parts situated at arbitrary positions in the hierarchies of its outputs. We achieve this by learning individual, probabilistic conditional decoders for each hierarchy depth. We evaluate LSD-StructureNet on the PartNet dataset, the largest dataset of 3D shapes represented by hierarchies of parts. Our results show that, contrary to existing methods, LSD-StructureNet can perform conditional sampling without impacting inference speed or the realism and diversity of its outputs.
Other: Neural Networks | Deep Learning | Models | Modeling (4 papers)
【1】 Deep Learning on Edge TPUs
Link: https://arxiv.org/abs/2108.13732
Authors: Andreas M. Kist
Affiliations: Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
Remarks: 8 pages, 3 figures, 3 tables
Abstract: Computing at the edge is important in remote settings, however, conventional hardware is not optimized for utilizing deep neural networks. The Google Edge TPU is an emerging hardware accelerator that is cost, power and speed efficient, and is available for prototyping and production purposes. Here, I review the Edge TPU platform, the tasks that have been accomplished using the Edge TPU, and the steps necessary to deploy a model to the Edge TPU hardware. The Edge TPU is not only capable of tackling common computer vision tasks, but also surpasses other hardware accelerators, especially when the entire model can be deployed to the Edge TPU. Co-embedding the Edge TPU in cameras allows a seamless analysis of primary data. In summary, the Edge TPU is a maturing system that has proven its usability across multiple tasks.
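The usual deployment path is full-integer quantization to TFLite followed by the Edge TPU compiler; a sketch of the conversion step (the representative images are assumed to be float32 arrays matching the model's input shape):

    import tensorflow as tf

    def to_edgetpu_ready_tflite(keras_model, rep_images):
        # Full-integer quantization is required before the Edge TPU compiler
        # can map operations onto the TPU.
        converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.representative_dataset = lambda: ([img[None]] for img in rep_images)
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.uint8
        converter.inference_output_type = tf.uint8
        return converter.convert()  # then run: edgetpu_compiler model.tflite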
【2】 Module-Power Prediction from PL Measurements using Deep Learning
Link: https://arxiv.org/abs/2108.13640
Authors: Mathis Hoffmann, Johannes Hepp, Bernd Doll, Claudia Buerhop-Lutz, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein
Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany; Materials for Electronics and Energy Technology, FAU; Helmholtz Institut Erlangen-Nürnberg, Erlangen, Germany
Abstract: The individual causes of power loss in photovoltaic modules have been investigated for quite some time. Recently, it has been shown that the power loss of a module is, for example, related to the fraction of inactive areas. While these areas can be easily identified from electroluminescence (EL) images, this is much harder for photoluminescence (PL) images. With this work, we close the gap between power regression from EL and from PL images. We apply a deep convolutional neural network to predict the module power from PL images with a mean absolute error (MAE) of 4.4% or 11.7 WP. Furthermore, we show that regression maps computed from the embeddings of the trained network can be used to compute the localized power loss. Finally, we show that these regression maps can be used to identify inactive regions in PL images as well.
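As a hedged sketch of the regression setup (the tiny architecture and sizes below are ours, not the paper's), predicting a scalar module power from a single-channel PL image with an L1 objective matches the reported MAE metric:

```python
# Toy CNN regressor: PL image in, module power (watts-peak) out.
import torch
import torch.nn as nn

class PowerRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)            # scalar power prediction

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = PowerRegressor()
pl_images = torch.rand(8, 1, 128, 128)          # batch of PL images (placeholder size)
loss = nn.L1Loss()(model(pl_images).squeeze(1), torch.rand(8) * 300)  # L1 = MAE
```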
【3】 Spike time displacement based error backpropagation in convolutional spiking neural networks
Link: https://arxiv.org/abs/2108.13621
Authors: Maryam Mirsadeghi, Majid Shalchian, Saeed Reza Kheradpisheh, Timothée Masquelier
Affiliations: Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran; Department of Computer and Data Sciences, Shahid Beheshti University, Tehran, Iran; CerCo UMR, CNRS, Université Toulouse, France
Abstract: We recently proposed the STiDi-BP algorithm, which avoids backward recursive gradient computation, for training multi-layer spiking neural networks (SNNs) with single-spike-based temporal coding. The algorithm employs a linear approximation to compute the derivative of the spike latency with respect to the membrane potential, and it uses spiking neurons with piecewise linear postsynaptic potential to reduce the computational cost and the complexity of neural processing. In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures. Evaluation results on image classification with two popular benchmarks, the MNIST and Fashion-MNIST datasets, with accuracies of 99.2% and 92.8% respectively, confirm that this algorithm is applicable to deep SNNs. Another issue we consider is the reduction of memory storage and computational cost. To do so, we consider a convolutional SNN (CSNN) with two sets of weights: real-valued weights, which are updated in the backward pass, and their signs (binary weights), which are employed in the feedforward process. We evaluate the binary CSNN on the MNIST and Fashion-MNIST datasets and obtain acceptable performance with a negligible accuracy drop with respect to real-valued weights (about 0.6% and 0.8%, respectively).
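The two-weight-set scheme, real-valued weights updated in the backward pass and their signs used in the forward pass, is commonly implemented with a straight-through estimator; a generic sketch (ours, ignoring the spiking dynamics) follows:

```python
# Binary weights (signs) in the forward pass, gradients applied to the
# underlying real-valued weights via a straight-through estimator.
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)               # binary weights used in the forward pass
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                    # pass gradient straight through

w_real = torch.randn(32, 32, requires_grad=True)
x = torch.randn(4, 32)
y = x @ SignSTE.apply(w_real).t()          # feedforward uses sign(w_real)
y.sum().backward()                         # w_real.grad updates the real weights
```

Only the forward multiplication sees binary weights; the optimizer steps the real-valued copy, which is consistent with the small accuracy drop reported above.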
【4】 The Application of Convolutional Neural Networks for Tomographic Reconstruction of Hyperspectral Images
Link: https://arxiv.org/abs/2108.13458
Authors: Wei-Chih Huang, Mads Svanborg Peters, Mads Juul Ahlebaek, Mads Toudal Frandsen, René Lynge Eriksen, Bjarke Jørgensen
Affiliations: CP3-Origins, University of Southern Denmark, Campusvej, Odense M, Denmark; Newtec Engineering AS, Odense, Denmark; Department of Physics, Chemistry and Pharmacy and Mads Clausen Institute, University of Southern Denmark
Note: 22 pages, 12 figures and 3 tables
Abstract: A novel method, utilizing convolutional neural networks (CNNs), is proposed to reconstruct hyperspectral cubes from computed tomography imaging spectrometer (CTIS) images. Current reconstruction algorithms are usually subject to long reconstruction times and mediocre precision in cases of a large number of spectral channels. The constructed CNNs deliver higher precision and shorter reconstruction time than a standard expectation maximization algorithm. In addition, the network can handle two different types of real-world images at the same time; specifically, ColorChecker and carrot spectral images are considered. This work paves the way toward real-time reconstruction of hyperspectral cubes from CTIS images.
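A toy version of the learned mapping (our stand-in with placeholder sizes; real CTIS sensor images are larger than the scene because of the diffraction orders) is a CNN that emits one output channel per spectral band:

```python
# Toy CNN: single CTIS sensor image in, hyperspectral cube (C bands) out.
import torch
import torch.nn as nn

C = 64  # number of spectral channels in the reconstructed cube (placeholder)
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, C, 3, padding=1),        # one output channel per spectral band
)
ctis_image = torch.rand(1, 1, 128, 128)
cube = net(ctis_image)                     # shape (1, C, 128, 128): (band, y, x)
```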
Other (7 papers)
【1】 Estimation of Air Pollution with Remote Sensing Data: Revealing Greenhouse Gas Emissions from Space
Link: https://arxiv.org/abs/2108.13902
Authors: Linus Scheibenreif, Michael Mommert, Damian Borth
Affiliations: Institute of Computer Science, University of St. Gallen
Note: for the associated codebase, see this https URL
Abstract: Air pollution is a major driver of climate change. Anthropogenic emissions from the burning of fossil fuels for transportation and power generation emit large amounts of problematic air pollutants, including greenhouse gases (GHGs). Despite the importance of limiting GHG emissions to mitigate climate change, detailed information about the spatial and temporal distribution of GHGs and other air pollutants is difficult to obtain. Existing models for surface-level air pollution rely on extensive land-use datasets which are often locally restricted and temporally static. This work proposes a deep learning approach for the prediction of ambient air pollution that only relies on remote sensing data that is globally available and frequently updated. Combining optical satellite imagery with satellite-based atmospheric column density air pollution measurements enables the scaling of air pollution estimates (in this case NO$_2$) to high spatial resolution (up to $\sim$10 m) at arbitrary locations and adds a temporal component to these estimates. The proposed model performs with high accuracy when evaluated against air quality measurements from ground stations (mean absolute error $<6~\mu g/m^3$). Our results enable the identification and temporal monitoring of major sources of air pollution and GHGs.
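One plausible reading of the described data fusion (our sketch, not the authors' architecture) is a two-input regressor: an image backbone for the optical satellite patch, concatenated with the column-density measurement, regressing surface-level NO$_2$:

```python
# Toy two-stream regressor: optical patch + NO2 column density -> surface NO2.
import torch
import torch.nn as nn

class NO2Regressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(16 + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, rgb_patch, column_no2):
        feats = self.backbone(rgb_patch)                   # image features
        return self.head(torch.cat([feats, column_no2], dim=1))

pred = NO2Regressor()(torch.rand(2, 3, 120, 120), torch.rand(2, 1))  # ug/m^3
```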
【2】 PACE: Posthoc Architecture-Agnostic Concept Extractor for Explaining CNNs
Link: https://arxiv.org/abs/2108.13828
Authors: Vidhya Kamakshi, Uday Gupta, Narayanan C. Krishnan
Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
Note: Accepted at the International Joint Conference on Neural Networks (IJCNN 2021)
Abstract: Deep CNNs, though they have achieved state-of-the-art performance in image classification tasks, remain a black box to the humans using them. There is a growing interest in explaining the workings of these deep models to improve their trustworthiness. In this paper, we introduce a Posthoc Architecture-agnostic Concept Extractor (PACE) that automatically extracts smaller sub-regions of the image, called concepts, relevant to the black-box prediction. PACE tightly integrates the faithfulness of the explanatory framework to the black-box model. To the best of our knowledge, this is the first work that automatically extracts class-specific discriminative concepts in a posthoc manner. The PACE framework is used to generate explanations for two different CNN architectures trained to classify the AWA2 and Imagenet-Birds datasets. Extensive human subject experiments are conducted to validate the human interpretability and consistency of the explanations extracted by PACE. The results from these experiments suggest that over 72% of the concepts extracted by PACE are human interpretable.
【3】 Self-Calibrating Neural Radiance Fields
Link: https://arxiv.org/abs/2108.13826
Authors: Yoonwoo Jeong, Seokjun Ahn, Christopher Choy, Animashree Anandkumar, Minsu Cho, Jaesik Park
Affiliations: POSTECH; NVIDIA; Caltech
Note: Accepted at ICCV 2021
Abstract: In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, fourth-order radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, for which we use Neural Radiance Fields (NeRF). We also propose a new geometric loss function, viz. projected ray distance loss, to incorporate geometric consistency for complex non-linear camera models. We validate our approach on standard real image datasets and demonstrate that our model can learn the camera intrinsics and extrinsics (pose) from scratch without COLMAP initialization. We also show that learning accurate camera models in a differentiable manner allows us to improve PSNR over baselines. Our module is an easy-to-use plugin that can be applied to NeRF variants to improve performance. The code and data are currently available at https://github.com/POSTECH-CVLab/SCNeRF
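The camera model lends itself to a compact differentiable sketch (ours, simplified: the generic noise model is omitted), with intrinsics and radial coefficients as learnable parameters through which ray directions stay differentiable:

```python
# Learnable pinhole + fourth-order radial distortion; gradients reach all params.
import torch

fx = torch.tensor(500.0, requires_grad=True)   # focal lengths
fy = torch.tensor(500.0, requires_grad=True)
cx = torch.tensor(320.0, requires_grad=True)   # principal point
cy = torch.tensor(240.0, requires_grad=True)
k = torch.zeros(2, requires_grad=True)         # radial coefficients k1, k2

def pixel_to_ray(u, v):
    x, y = (u - cx) / fx, (v - cy) / fy        # normalized camera coordinates
    r2 = x * x + y * y
    d = 1 + k[0] * r2 + k[1] * r2 * r2         # radial distortion up to r^4
    ray = torch.stack([x * d, y * d, torch.ones_like(x)])
    return ray / ray.norm(dim=0)               # unit ray direction per pixel

ray = pixel_to_ray(torch.tensor(100.0), torch.tensor(80.0))
```

A photometric or ray-distance loss on such rays then backpropagates into fx, fy, cx, cy and k jointly with the scene representation.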
【4】 SemIE: Semantically-aware Image Extrapolation
Link: https://arxiv.org/abs/2108.13702
Authors: Bholeshwar Khurana, Soumya Ranjan Dash, Abhishek Bhatia, Aniruddha Mahapatra, Hrituraj Singh, Kuldeep Kulkarni
Affiliations: IIT Kanpur; Adobe Research India; Triomics
Note: To appear in the International Conference on Computer Vision (ICCV) 2021. Project URL: this https URL
Abstract: We propose a novel semantically-aware paradigm for image extrapolation that enables the addition of new object instances. All previous methods are limited in their extrapolation capability to merely extending the objects already existing in the image. However, our proposed approach focuses not only on (i) extending the objects already present, but also on (ii) adding new objects in the extended region based on the context. To this end, for a given image, we first obtain an object segmentation map using a state-of-the-art semantic segmentation method. The segmentation map thus obtained is fed into a network to compute the extrapolated semantic segmentation and the corresponding panoptic segmentation maps. The input image and the obtained segmentation maps are further utilized to generate the final extrapolated image. We conduct experiments on the Cityscapes and ADE20K-bedroom datasets and show that our method outperforms all baselines in terms of FID and similarity in object co-occurrence statistics.
【5】 Dead Pixel Test Using Effective Receptive Field
Link: https://arxiv.org/abs/2108.13576
Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim
Affiliations: Department of Electrical Engineering, Pohang University of Science and Technology; Graduate School of Artificial Intelligence, Pohang University of Science and Technology
Note: 9 pages, 5 figures
Abstract: Deep neural networks have been used in various fields, but their internal behavior is not well understood. In this study, we discuss two counterintuitive behaviors of convolutional neural networks (CNNs). First, we evaluated the size of the receptive field. Previous studies have attempted to increase or control the size of the receptive field. However, we observed that the size of the receptive field does not predict classification accuracy. The size of the receptive field is inappropriate for representing superiority in performance because it reflects only depth or kernel size and does not reflect other factors such as width or cardinality. Second, using the effective receptive field, we examined the pixels contributing to the output. Intuitively, each pixel is expected to contribute equally to the final output. However, we found that there exist pixels in a partially dead state that contribute little to the output. We reveal that the reason for this lies in the architecture of CNNs and discuss solutions to reduce the phenomenon. Interestingly, for general classification tasks, the existence of dead pixels improves the training of CNNs. However, in tasks that capture small perturbations, dead pixels degrade performance. Therefore, the existence of these dead pixels should be understood and considered in practical applications of CNNs.
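The effective-receptive-field probe behind such a dead-pixel test is standard: backpropagate a unit gradient from the center output activation and inspect the input-gradient magnitudes. A minimal PyTorch version (toy network and threshold of our choosing):

```python
# Effective receptive field: gradient of the center output unit w.r.t. the input.
import torch
import torch.nn as nn

net = nn.Sequential(*[nn.Conv2d(1, 1, 3, padding=1) for _ in range(5)])
x = torch.zeros(1, 1, 64, 64, requires_grad=True)
out = net(x)
grad_seed = torch.zeros_like(out)
grad_seed[0, 0, 32, 32] = 1.0                  # probe the center output unit
out.backward(grad_seed)
erf = x.grad.abs()[0, 0]                       # effective receptive field map
dead = (erf < 1e-12).float().mean()            # fraction of non-contributing pixels
```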
【6】 Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision
Link: https://arxiv.org/abs/2108.13465
Authors: Bo Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, Lu Liu, Yuqing Yang, Dongsheng Li
Affiliations: Microsoft Research Asia; University of Technology Sydney
Note: arXiv preprint
Abstract: The energy consumption of deep learning models is increasing at a breathtaking rate, which raises concerns over its potential negative effects on carbon neutrality in the context of global warming and climate change. With the progress of efficient deep learning techniques, e.g., model compression, researchers can obtain efficient models with fewer parameters and smaller latency. However, most existing efficient deep learning methods do not explicitly consider energy consumption as a key performance indicator. Furthermore, existing methods mostly focus on the inference costs of the resulting efficient models but neglect the notable energy consumption throughout the entire life cycle of the algorithm. In this paper, we present the first large-scale energy consumption benchmark for efficient computer vision models, in which a new metric is proposed to explicitly evaluate the full-cycle energy consumption under different model usage intensities. The benchmark can provide insights for low carbon emission when selecting efficient deep learning algorithms in different model usage scenarios.
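A hedged reading of "full-cycle energy under usage intensity" (our formula, not necessarily the benchmark's exact metric) amortizes training energy over the inferences served, which is why model rankings can flip with usage:

```python
# Full-cycle energy = one-off training cost + per-query cost * usage intensity.
def full_cycle_energy_kwh(train_kwh, infer_kwh_per_query, num_queries):
    return train_kwh + infer_kwh_per_query * num_queries

# A compressed model may cost more to produce but win at high usage intensity:
light = full_cycle_energy_kwh(train_kwh=50.0, infer_kwh_per_query=2e-5, num_queries=10**8)
heavy = full_cycle_energy_kwh(train_kwh=5.0, infer_kwh_per_query=1e-4, num_queries=10**8)
print(light, heavy)  # 2050.0 vs 10005.0: the efficient model wins here
```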
【7】 OARnet: Automated organs-at-risk delineation in Head and Neck CT images
Link: https://arxiv.org/abs/2108.13987
Authors: Mumtaz Hussain Soomro, Hamidreza Nourzadeh, Victor Gabriel Leandro Alves, Wookjin Choi, Jeffrey V. Siebers
Affiliations: University of Virginia Health System, Charlottesville, VA; Thomas Jefferson University Hospital, Philadelphia, PA; Virginia State University, Petersburg, VA
Abstract: A 3D deep learning model (OARnet) is developed and used to delineate 28 H&N OARs on CT images. OARnet utilizes a densely connected network to detect the OAR bounding box, then delineates the OAR within the box. It reuses information from any layer in subsequent layers and uses skip connections to combine information from different dense block levels to progressively improve delineation accuracy. Training uses up to 28 expert manually delineated (MD) OARs from 165 CTs. The Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance (HD95) with respect to MD are assessed for 70 other CTs. Mean, maximum, and root-mean-square dose differences with respect to MD are assessed for 56 of the 70 CTs. OARnet is compared with UaNet, AnatomyNet, and Multi-Atlas Segmentation (MAS). Wilcoxon signed-rank tests using 95% confidence intervals are used to assess significance. The tests show that, compared with UaNet, OARnet improves (p<0.05) the DSC (23/28 OARs) and HD95 (17/28). OARnet outperforms both AnatomyNet and MAS for DSC (28/28) and HD95 (27/28). Compared with UaNet, OARnet improves median DSC by up to 0.05 and HD95 by up to 1.5 mm. Compared with AnatomyNet and MAS, OARnet improves the median (DSC, HD95) by up to (0.08, 2.7 mm) and (0.17, 6.3 mm), respectively. Dosimetrically, OARnet outperforms UaNet (Dmax 7/28; Dmean 10/28), AnatomyNet (Dmax 21/28; Dmean 24/28), and MAS (Dmax 22/28; Dmean 21/28). The DenseNet architecture is optimized using a hybrid approach that performs OAR-specific bounding-box detection followed by feature recognition. Compared with other auto-delineation methods, OARnet is better than or equal to UaNet for all but one geometric (Temporal Lobe L, HD95) and one dosimetric (Eye L, mean dose) endpoint for the 28 H&N OARs, and is better than or equal to both AnatomyNet and MAS for all OARs.
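The geometric endpoints above, DSC and HD95, can be stated concretely; a small NumPy/SciPy sketch for binary masks (brute-force pairwise distances, fine at toy sizes):

```python
# Dice similarity coefficient and 95th-percentile Hausdorff distance for masks.
import numpy as np
from scipy.spatial.distance import cdist

def dsc(a, b):
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(a, b):
    pa, pb = np.argwhere(a), np.argwhere(b)            # voxel coordinates
    d = cdist(pa, pb)
    return max(np.percentile(d.min(1), 95), np.percentile(d.min(0), 95))

a = np.zeros((32, 32), bool); a[8:20, 8:20] = True     # toy "manual" mask
b = np.zeros((32, 32), bool); b[10:22, 9:21] = True    # toy "auto" mask
print(dsc(a, b), hd95(a, b))
```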
Machine translation, for reference only.