cs.CV: 40 papers today
Transformer (1 paper)
【1】 Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models Link: https://arxiv.org/abs/2109.01316
Authors: Zixuan Chen, Junhong Zou, Xiaotao Wang Affiliation: Xiaomi Inc. Abstract: Semantic segmentation is an important task in computer vision, from which some important usage scenarios are derived, such as autonomous driving and scene parsing. Because of the importance of video semantic segmentation, we participated in this competition. In this report, we briefly introduce the solution of team 'BetterThing' for the ICCV2021 Video Scene Parsing in the Wild Challenge. Transformers are used as the backbone for extracting video frame features, and the final result is the aggregation of the outputs of two Transformer models, SWIN and VOLO. This solution achieves 57.3% mIoU, ranking 3rd in the Video Scene Parsing in the Wild Challenge.
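The abstract above says only that the outputs of two models are aggregated and scored by mIoU. A minimal sketch of one common way to do this follows, assuming the aggregation is a weighted average of per-pixel class probabilities followed by an argmax. The function names, the flat pixel layout, and the equal default weights are illustrative assumptions, not the team's actual procedure.

```python
def aggregate_predictions(prob_maps, weights=None):
    """Fuse several models' per-pixel class probabilities and return labels.

    prob_maps: one entry per model; each entry is a list of pixels, and each
    pixel is a list of class probabilities. Weighted averaging is an assumed
    aggregation rule, chosen here only for illustration.
    """
    n_models = len(prob_maps)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_pixels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    labels = []
    for p in range(n_pixels):
        fused = [
            sum(w * model[p][c] for w, model in zip(weights, prob_maps))
            for c in range(n_classes)
        ]
        # Pick the class with the highest fused probability.
        labels.append(max(range(n_classes), key=fused.__getitem__))
    return labels


def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example: two hypothetical models, three pixels, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
model_b = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]
fused_labels = aggregate_predictions([model_a, model_b])
score = mean_iou(fused_labels, [0, 1, 1], n_classes=2)
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one, which is the usual motivation for this kind of fusion.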
Detection (2 papers)
【1】 MitoDet: Simple and robust mitosis detection Link: https://arxiv.org/abs/2109.01485
Authors: Jakob Dexl, Michaela Benz, Volker Bruns, Petr Kuritcyn, Thomas Wittenberg Affiliations: Fraunhofer Institute for Integrated Circuits IIS, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) Abstract: Mitotic figure detection is a challenging task in digital pathology that has a direct impact on therapeutic decisions. While automated methods often achieve acceptable results under laboratory conditions, they frequently fail in the clinical deployment phase. This problem can be mainly attributed to a phenomenon called domain shift. An important source of domain shift is introduced by different microscopes and their camera systems, which noticeably change the color representation of digitized images. In this method description we present our submitted algorithm for the Mitosis Domain Generalization Challenge, which employs a RetinaNet trained with strong data augmentation and achieves an F1 score of 0.7138 on the preliminary test set.
【2】 Multi-centred Strong Augmentation via Contrastive Learning for Unsupervised Lesion Detection and Segmentation Link: https://arxiv.org/abs/2109.01303
Authors: Yu Tian, Fengbei Liu, Guansong Pang, Yuanhong Chen, Yuyuan Liu, Johan W. Verjans, Rajvinder Singh, Gustavo Carneiro Affiliations: Australian Institute for Machine Learning, University of Adelaide Comments: Submitted to IEEE Transactions on Medical Imaging (TMI); under review Abstract: The scarcity of high quality medical image annotations hinders the implementation of accurate clinical applications for detecting and segmenting abnormal lesions. To mitigate this issue, the scientific community is working on the development of unsupervised anomaly detection (UAD) systems that learn from a training set containing only normal (i.e., healthy) images, where abnormal samples (i.e., unhealthy) are detected and segmented based on how much they deviate from the learned distribution of normal samples. One significant challenge faced by UAD methods is how to learn effective low-dimensional image representations that are sensitive enough to detect and segment abnormal lesions of varying size, appearance and shape. To address this challenge, we propose a novel self-supervised UAD pre-training algorithm, named Multi-centred Strong Augmentation via Contrastive Learning (MSACL). MSACL learns representations by separating several types of strong and weak augmentations of normal image samples, where the weak augmentations represent normal images and strong augmentations denote synthetic abnormal images.
To produce such strong augmentations, we introduce MedMix, a novel data augmentation strategy that creates new training images with realistic looking lesions (i.e., anomalies) in normal images. The pre-trained representations from MSACL are generic and can be used to improve the efficacy of different types of off-the-shelf state-of-the-art (SOTA) UAD models. Comprehensive experimental results show that the use of MSACL largely improves these SOTA UAD models on four medical imaging datasets from diverse organs, namely colonoscopy, fundus screening and COVID-19 chest X-ray datasets.
Classification | Recognition (3 papers)
【1】 CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models Link: https://arxiv.org/abs/2109.01401
Authors: Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Chai, Song-Chun Zhu Affiliations: Oregon State University, University of Michigan Comments: Accepted by iScience Cell Press Journal 2021. arXiv admin note: text overlap with arXiv:1909.06907 Abstract: We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single shot response, we pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of machine and human user. To do this, we use Theory of Mind (ToM) which helps us in explicitly modeling human's intention, machine's mind as inferred by the human as well as human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention (or heat map) based explanations. In our work, we show that these attention based explanations are not sufficient for increasing human trust in the underlying CNN model.
In CX-ToM, we instead use counterfactual explanations called fault-lines which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the iterative, conceptual and counterfactual nature of CX-ToM explanations, our framework is practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, demonstrating that our CX-ToM significantly outperforms the state-of-the-art explainable AI models.
【2】 Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition Link: https://arxiv.org/abs/2109.01305
Authors: James Hong, Matthew Fisher, Michaël Gharbi, Kayvon Fatahalian Affiliations: Stanford University, Adobe Research Comments: ICCV 2021 (poster) Abstract: Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose. End-to-end learning circumvents pose, but requires more labels to generalize. We introduce Video Pose Distillation (VPD), a weakly-supervised technique to learn features for new video domains, such as individual sports that challenge pose estimation. Under VPD, a student network learns to extract robust pose features from RGB frames in the sports video, such that, whenever pose is considered reliable, the features match the output of a pretrained teacher pose detector. Our strategy retains the best of both pose and end-to-end worlds, exploiting the rich visual patterns in raw video frames, while learning features that agree with the athletes' pose and motion in the target video domain to avoid over-fitting to patterns unrelated to athletes' motion. VPD features improve performance on few-shot, fine-grained action recognition, retrieval, and detection tasks in four real-world sports video datasets, without requiring additional ground-truth pose annotations.
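The VPD abstract describes a student whose features should match a teacher's pose output only when the pose is considered reliable. A minimal sketch of that idea as a masked regression loss follows. The mean-squared-error form, the boolean reliability mask, and all names are assumptions for illustration; the abstract does not specify the actual loss or the reliability criterion.

```python
def distillation_loss(student_feats, teacher_feats, reliable):
    """Mean squared error between student and teacher features,
    averaged only over frames whose teacher pose is flagged reliable.

    student_feats, teacher_feats: one feature vector (list of floats) per frame.
    reliable: one boolean per frame (hypothetical reliability flag).
    """
    total, count = 0.0, 0
    for s, t, ok in zip(student_feats, teacher_feats, reliable):
        if not ok:
            continue  # unreliable teacher output contributes no supervision
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        count += 1
    return total / count if count else 0.0


# Toy example: the second frame is ignored because its pose is unreliable.
loss = distillation_loss(
    student_feats=[[1.0, 2.0], [0.0, 0.0]],
    teacher_feats=[[1.0, 4.0], [5.0, 5.0]],
    reliable=[True, False],
)
```

Masking the loss this way keeps noisy teacher estimates, for example on motion-blurred frames, from being copied into the student.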
【3】 A Reliable, Self-Adaptive Face Identification Framework via Lyapunov Optimization Link: https://arxiv.org/abs/2109.01212
Authors: Dohyeon Kim, Joongheon Kim, Jae young Bang Affiliations: Naver Webtoon, Seongnam, Republic of Korea; Korea University, Seoul, Republic of Korea; Quandary Peak Research, Los Angeles, CA, USA Comments: This paper was presented at ACM Symposium on Operating Systems Principles (SOSP) Workshop on AI Systems (AISys), Shanghai, China, October 2017 Abstract: Realtime face identification (FID) from a video feed is highly computation-intensive, and may exhaust computation resources if performed on a device with a limited amount of resources (e.g., a mobile device). In general, FID performs better when images are sampled at a higher rate, minimizing false negatives. However, performing it at an overwhelmingly high rate exposes the system to the risk of a queue overflow that hampers the system's reliability. This paper proposes a novel, queue-aware FID framework that adapts the sampling rate to maximize the FID performance while avoiding a queue overflow by implementing the Lyapunov optimization. A preliminary evaluation via a trace-based simulation confirms the effectiveness of the framework.
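The queue-aware rate adaptation in the abstract above can be illustrated with the standard drift-plus-penalty rule from Lyapunov optimization: in each time slot, pick the sampling rate r that maximizes V * U(r) - Q * r, where Q is the current queue backlog, U is a performance utility, and V trades performance against backlog. The sketch below assumes a logarithmic utility, a fixed service rate, and a small discrete rate set; all of these are illustrative choices and not taken from the paper.

```python
import math


def choose_rate(queue, rates, utility, v):
    """Drift-plus-penalty decision: maximize v * utility(r) - queue * r."""
    return max(rates, key=lambda r: v * utility(r) - queue * r)


def simulate(rates, utility, service_rate, v, steps):
    """Run the controller and return (chosen rate, backlog) for each slot."""
    queue = 0.0
    history = []
    for _ in range(steps):
        r = choose_rate(queue, rates, utility, v)
        # Backlog grows by sampled frames and shrinks by processed frames.
        queue = max(queue + r - service_rate, 0.0)
        history.append((r, queue))
    return history


# Hypothetical setup: frames per slot, concave utility, 10 frames served per slot.
history = simulate(
    rates=[1, 5, 10, 20, 30],
    utility=lambda r: math.log(1 + r),
    service_rate=10,
    v=50,
    steps=20,
)
```

With these illustrative numbers the controller starts at the highest rate while the queue is empty, backs off as the backlog grows, and then settles at the rate that matches the service rate, so the backlog stays bounded. A larger V favors sampling performance at the cost of a longer queue.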
Segmentation | Semantics (4 papers)
【1】 Wildfire smoke plume segmentation using geostationary satellite imagery Link: https://arxiv.org/abs/2109.01637
Authors: Jeff Wen, Marshall Burke Affiliation: Stanford University Abstract: Wildfires have increased in frequency and severity over the past two decades, especially in the Western United States. Beyond physical infrastructure damage caused by these wildfire events, researchers have increasingly identified harmful impacts of particulate matter generated by wildfire smoke on respiratory, cardiovascular, and cognitive health. This inference is difficult due to the spatial and temporal uncertainty regarding how much particulate matter is specifically attributable to wildfire smoke. One factor contributing to this challenge is the reliance on manually drawn smoke plume annotations, which are often noisy representations limited to the United States. This work uses deep convolutional neural networks to segment smoke plumes from geostationary satellite imagery. We compare the performance of predicted plume segmentations versus the noisy annotations using causal inference methods to estimate the amount of variation each explains in Environmental Protection Agency (EPA) measured surface-level particulate matter <2.5 µm in diameter ($\textrm{PM}_{2.5}$).
【2】 Segmentation of turbulent computational fluid dynamics simulations with unsupervised ensemble learning Link: https://arxiv.org/abs/2109.01381
Authors: Maarja Bussov, Joonas Nättilä Affiliations: Tartu Observatory, University of Tartu, Tõravere, Estonia; Department of Physics, University of Helsinki, Helsinki, Finland Comments: 15 pages, 8 figures. Accepted to Signal Processing: Image Communication. Code available from a repository: this https URL Abstract: Computer vision and machine learning tools offer an exciting new way for automatically analyzing and categorizing information from complex computer simulations. Here we design an ensemble machine learning framework that can independently and robustly categorize and dissect simulation data output contents of turbulent flow patterns into distinct structure catalogues. The segmentation is performed using an unsupervised clustering algorithm, which segments physical structures by grouping together similar pixels in simulation images. The accuracy and robustness of the resulting segment region boundaries are enhanced by combining information from multiple simultaneously-evaluated clustering operations. The stacking of object segmentation evaluations is performed using image mask combination operations. This statistically-combined ensemble (SCE) of different cluster masks allows us to construct cluster reliability metrics for each pixel and for the associated segments without any prior user input. By comparing the similarity of different cluster occurrences in the ensemble, we can also assess the optimal number of clusters needed to describe the data.
Furthermore, by relying on ensemble-averaged spatial segment region boundaries, the SCE method enables reconstruction of more accurate and robust region of interest (ROI) boundaries for the different image data clusters. We apply the SCE algorithm to 2-dimensional simulation data snapshots of magnetically-dominated fully-kinetic turbulent plasma flows where accurate ROI boundaries are needed for geometrical measurements of intermittent flow structures known as current sheets.
【3】 Access Control Using Spatially Invariant Permutation of Feature Maps for Semantic Segmentation Models Link: https://arxiv.org/abs/2109.01332
Authors: Hiroki Ito, MaungMaung AprilPyone, Hitoshi Kiya Affiliation: Tokyo Metropolitan University, Japan Comments: To appear in 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021) Abstract: In this paper, we propose an access control method that uses the spatially invariant permutation of feature maps with a secret key for protecting semantic segmentation models. Segmentation models are trained and tested by permuting selected feature maps with a secret key. The proposed method not only allows rightful users with the correct key to access a model at full capacity but also degrades the performance for unauthorized users. Conventional access control methods have focused only on image classification tasks, and these methods have never been applied to semantic segmentation tasks. In an experiment, the protected models were demonstrated to allow rightful users to obtain almost the same performance as that of non-protected models while also being robust against access by unauthorized users without a key. In addition, a conventional method with block-wise transformations was also verified to have degraded performance under semantic segmentation models.
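One way to read "spatially invariant permutation of feature maps with a secret key" is a key-seeded shuffle of the channel order, applied identically at every spatial position. The sketch below illustrates that reading only. Seeding Python's `random.Random` with the key, the function names, and the nested-list layout are assumptions for illustration, and a real system would want a cryptographically sound key derivation.

```python
import random


def key_permutation(num_channels, key):
    """Derive a channel permutation deterministically from a secret key."""
    rng = random.Random(key)
    perm = list(range(num_channels))
    rng.shuffle(perm)
    return perm


def permute_channels(feature_maps, perm):
    """Reorder whole channels; each H x W map is moved intact, so the
    transformation is the same at every spatial location."""
    return [feature_maps[i] for i in perm]


# Toy example: four 1x1 feature maps whose value equals the channel index.
feature_maps = [[[float(c)]] for c in range(4)]
perm = key_permutation(4, key=7)
protected = permute_channels(feature_maps, perm)
```

A model trained with one permutation expects its channels in that order, so running it with a permutation derived from a different key scrambles the intermediate features, which is the intended source of degraded performance for users without the key.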
【4】 Automatic Foot Ulcer segmentation Using an Ensemble of Convolutional Neural Networks Link: https://arxiv.org/abs/2109.01408
Authors: Amirreza Mahbod, Rupert Ecker, Isabella Ellinger Affiliations: Institute for Pathophysiology and Allergy Research, Medical University of Vienna, Vienna, Austria; Department of Research and Development, TissueGnostics GmbH, Vienna, Austria Comments: 6 pages, 2 figures Abstract: Foot ulcer is a common complication of diabetes mellitus; it is associated with substantial morbidity and mortality and remains a major risk factor for lower leg amputation. Extracting accurate morphological features from the foot wounds is crucial for proper treatment. Although visual and manual inspection by medical professionals is the common approach to extract the features, this method is subjective and error-prone. Computer-mediated approaches are the alternative solutions to segment the lesions and extract related morphological features. Among various proposed computer-based approaches for image segmentation, deep learning-based methods and more specifically convolutional neural networks (CNN) have shown excellent performances for various image segmentation tasks including medical image segmentation. In this work, we proposed an ensemble approach based on two encoder-decoder-based CNN models, namely LinkNet and UNet, to perform foot ulcer segmentation.
To deal with limited training samples, we used pre-trained weights (EfficientNetB1 for the LinkNet model and EfficientNetB2 for the UNet model) and further pre-training by the Medetec dataset. We also applied a number of morphological-based and colour-based augmentation techniques to train the models. We integrated five-fold cross-validation, test time augmentation and result fusion in our proposed ensemble approach to boost the segmentation performance. Applied on a publicly available foot ulcer segmentation dataset and the MICCAI 2021 Foot Ulcer Segmentation (FUSeg) Challenge, our method achieved state-of-the-art data-based Dice scores of 92.07% and 88.80%, respectively. Our developed method achieved the first rank in the FUSeg challenge leaderboard. The Dockerised guideline, inference codes and saved trained models are publicly available in the published GitHub repository: https://github.com/masih4/Foot_Ulcer_Segmentation
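Two ingredients named in the abstract above, the Dice score and test-time augmentation with result fusion, can be sketched in a few lines. The version below assumes binary masks stored as nested lists, a single horizontal-flip augmentation, and simple averaging as the fusion rule; the authors' actual augmentations and fusion scheme are not specified in the abstract, and `model` is a hypothetical callable that returns per-pixel foreground probabilities.

```python
def dice_score(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    inter = sum(p * t for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total


def hflip(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(model, image, threshold=0.5):
    """Average the prediction on the image and on its mirrored copy
    (flipped back into place), then threshold into a binary mask."""
    probs = model(image)
    probs_from_flip = hflip(model(hflip(image)))
    fused = [
        [(a + b) / 2 for a, b in zip(ra, rb)]
        for ra, rb in zip(probs, probs_from_flip)
    ]
    return [[1 if p >= threshold else 0 for p in row] for row in fused]


# Toy example with a stand-in "model" that returns its input as probabilities.
mask = predict_with_tta(lambda img: img, [[0.9, 0.2], [0.4, 0.7]])
dice = dice_score(mask, [[1, 0], [0, 0]])
```

The same averaging step extends to fusing the outputs of several networks, such as the cross-validation folds mentioned in the abstract, by adding their probability maps to the mean.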
Zero/Few-Shot | Transfer | Domain Adaptation | Adaptive (3 papers)
【1】 Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition Link: https://arxiv.org/abs/2109.01302
Authors: Xiyao Liu, Zhong Ji, Yanwei Pang, Zhongfei Zhang Affiliation: School of Electrical and Information Engineering, Tianjin University Abstract: The domain shift between the source and target domain is the main challenge in Cross-Domain Few-Shot Learning (CD-FSL). However, the target domain is absolutely unknown during the training on the source domain, which results in a lack of directed guidance for target tasks. We observe that since there are similar backgrounds in target domains, self-labeled samples can be applied as prior tasks to transfer knowledge onto target tasks. To this end, we propose a task-expansion-decomposition framework for CD-FSL, called the Self-Taught (ST) approach, which alleviates the problem of non-target guidance by constructing task-oriented metric spaces. Specifically, Weakly Supervised Object Localization (WSOL) and self-supervised technologies are employed to enrich task-oriented samples by exchanging and rotating the discriminative regions, which generates a more abundant task set. Then these tasks are decomposed into several tasks to finish the task of few-shot recognition and rotation classification. It helps to transfer the source knowledge onto the target tasks and focus on discriminative regions. We conduct extensive experiments under the cross-domain setting including 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX. Experimental results demonstrate that the proposed ST approach is applicable to various metric-based models, and provides promising improvements in CD-FSL.
【2】 Information Symmetry Matters: A Modal-Alternating Propagation Network for Few-Shot Learning Link: https://arxiv.org/abs/2109.01295
Authors: Zhong Ji, Zhishen Hou, Xiyao Liu, Yanwei Pang, Jungong Han Abstract: Semantic information provides intra-class consistency and inter-class discriminability beyond visual concepts, which has been employed in Few-Shot Learning (FSL) to achieve further gains. However, semantic information is only available for labeled samples but absent for unlabeled samples, in which the embeddings are rectified unilaterally by guiding the few labeled samples with semantics. Therefore, it is inevitable to bring a cross-modal bias between semantic-guided samples and nonsemantic-guided samples, which results in an information asymmetry problem. To address this problem, we propose a Modal-Alternating Propagation Network (MAP-Net) to supplement the absent semantic information of unlabeled samples, which builds information symmetry among all samples in both visual and semantic modalities. Specifically, the MAP-Net transfers the neighbor information by the graph propagation to generate the pseudo-semantics for unlabeled samples guided by the completed visual relationships and rectify the feature embeddings. In addition, due to the large discrepancy between visual and semantic modalities, we design a Relation Guidance (RG) strategy to guide the visual relation vectors via semantics so that the propagated information is more beneficial.
Extensive experimental results on three semantic-labeled datasets, i.e., Caltech-UCSD-Birds 200-2011, SUN Attribute Database, and Oxford 102 Flower, have demonstrated that our proposed method achieves promising performance and outperforms the state-of-the-art approaches, which indicates the necessity of information symmetry.
【3】 Remote Multilinear Compressive Learning with Adaptive Compression Link: https://arxiv.org/abs/2109.01184
Authors: Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis Affiliations: Department of Computing Sciences, Tampere University, Tampere, Finland; Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark Comments: 2 figures, 6 tables Abstract: Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of an MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially those that require low operating bandwidth and minimal energy consumption such as Internet-of-Things (IoT) applications. Many communication protocols provide support for adaptive data transmission to maximize the throughput and minimize energy consumption. By developing compressive sensing and learning models that can operate with an adaptive compression rate, we can maximize the informational content throughput of the whole application. In this paper, we propose a novel optimization scheme that enables such a feature for MCL models. Our proposal enables practical implementation of adaptive compressive signal acquisition and inference systems. Experimental results demonstrated that the proposed approach can not only significantly reduce the amount of computation required during the training phase of remote learning systems but also improve the informational content throughput via adaptive-rate sensing.
Semi-/Weakly-/Unsupervised | Active Learning | Uncertainty (3 papers)
【1】 UnDeepLIO: Unsupervised Deep Lidar-Inertial Odometry Link: https://arxiv.org/abs/2109.01533
Authors: Yiming Tu, Jin Xie Affiliations: PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology; Jiangsu Key Lab of Image and Video Understanding for Social Security Comments: 14 pages, accepted by ACPR2021 Abstract: Extensive research efforts have been dedicated to deep learning based odometry. Nonetheless, few efforts have been made on unsupervised deep lidar odometry. In this paper, we design a novel framework for unsupervised lidar odometry with the IMU, which is never used in other deep methods. First, a pair of siamese LSTMs are used to obtain the initial pose from the linear acceleration and angular velocity of the IMU. With the initial pose, we perform the rigid transform on the current frame and align it closer to the last frame. Then, we extract vertex and normal features from the transformed point clouds and their normals. Next, two-branch attention modules are proposed to estimate residual rotation and translation from the extracted vertex and normal features, respectively. Finally, our model outputs the sum of initial and residual poses as the final pose. For unsupervised training, we introduce an unsupervised loss function which is employed on the voxelized point clouds. The proposed approach is evaluated on the KITTI odometry estimation benchmark and achieves comparable performance against other state-of-the-art methods.
【2】 Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation Link: https://arxiv.org/abs/2109.01397
Authors: Xiao Gu, Jianxin Yang, Hanxiao Zhang, Jianing Qiu, Frank Po Wen Lo, Yao Guo, Guang-Zhong Yang, Benny Lo Abstract: Accurate estimation of three-dimensional human skeletons from depth images can provide important metrics for healthcare applications, especially for biomechanical gait analysis. However, there exist inherent problems associated with depth images captured from a single view. The collected data is greatly affected by occlusions where only partial surface data can be recorded. Furthermore, depth images of the human body exhibit heterogeneous characteristics with viewpoint changes, and the estimated poses under local coordinate systems are expected to go through equivariant rotations. Most existing pose estimation models are sensitive to both issues. To address this, we propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework built upon a novel rotation-equivariant backbone. Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views. It can generalize well on the real-world data from all the other unseen views. Our approach has shown superior performance on gait analysis on our ICL-Gait dataset compared to other state-of-the-art methods, and it can produce more convincing keypoints on the ITOP dataset than its provided "ground truth".
【3】 Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging Link: https://arxiv.org/abs/2109.01309
Authors: Roshan P Mathews, Mahesh Raveendranatha Panicker, Abhilash R Hareendranathan, Yale Tung Chen, Jacob L Jaremko, Brian Buchanan, Kiran Vishnu Narayan, Kesavadas C, Greeta Mathews Affiliations: Indian Institute of Technology, Palakkad, India; University of Alberta, Alberta, Canada; Hospital Universitario Puerta de Hierro, Majadahonda, Spain; Government Medical College, Thiruvananthapuram, India Comments: 24 pages, submitted to Elsevier Medical Image Analysis for review Abstract: The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans and provide clinicians with fast access to relevant information. The proposed video-summarization technique is a step in this direction that provides clinicians access to relevant key-frames from a given ultrasound scan (such as lung ultrasound) while reducing resource, storage and bandwidth requirements. We propose a new unsupervised reinforcement learning (RL) framework with novel rewards that facilitates unsupervised learning, avoiding tedious and impractical manual labelling, for summarizing ultrasound videos to enhance its utility as a triage tool in the emergency department (ED) and for use in telemedicine. Using an attention ensemble of encoders, the high dimensional image is projected into a low dimensional latent space in terms of: a) reduced distance with a normal or abnormal class (classifier encoder), b) following a topology of landmarks (segmentation encoder), and c) the distance or topology agnostic latent representation (convolutional autoencoders).
The decoder is implemented using a bi-directional long-short term memory (Bi-LSTM) which utilizes the latent space representation from the encoder. Our new paradigm for video summarization is capable of delivering classification labels and segmentation of key landmarks for each of the summarized keyframes. Validation is performed on lung ultrasound (LUS) dataset, that typically represent potential use cases in telemedicine and ED triage acquired from different medical centers across geographies (India, Spain and Canada).
Temporal|Action Recognition|Pose|Video|Motion Estimation (2 papers)
【1】 DeepTracks: Geopositioning Maritime Vehicles in Video Acquired from a Moving Platform Link: https://arxiv.org/abs/2109.01235
Authors: Jianli Wei, Guanyu Xu, Alper Yilmaz Affiliations: Photogrammetric Computer Vision Lab., The Ohio State University, Columbus, OH, USA Abstract: Geopositioning and tracking a moving boat at sea is a very challenging problem, requiring boat detection, matching and estimating its GPS location from imagery with no common features. The problem can be stated as follows: given imagery from a camera mounted on a moving platform with known GPS location as the only valid sensor, we predict the geoposition of a target boat visible in images. Our solution uses recent ML algorithms, the camera-scene geometry and Bayesian filtering. The proposed pipeline first detects and tracks the target boat's location in the image with the strategy of tracking by detection. This image location is then converted to a geoposition in local sea coordinates referenced to the camera GPS location using plane projective geometry. Finally, target boat local coordinates are transformed to global GPS coordinates to estimate the geoposition. To achieve a smooth geotrajectory, we apply an unscented Kalman filter (UKF) which implicitly overcomes small detection errors in the early stages of the pipeline. We tested the performance of our approach using GPS ground truth and show the accuracy and speed of the estimated geopositions. Our code is publicly available at https://github.com/JianliWei1995/AI-Track-at-Sea.
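The two geometric steps named in the DeepTracks abstract (image location to local sea coordinates via plane projective geometry, then local coordinates to GPS) can be illustrated with a minimal sketch. This is not the authors' code: the homography `H` is a hypothetical, already-calibrated 3x3 matrix mapping pixels to east/north metres on the sea plane, the function names are invented for illustration, and the local-to-GPS step assumes a simple flat-earth approximation that is only reasonable for short ranges.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed constant for this sketch)


def pixel_to_plane(H, u, v):
    """Apply a 3x3 homography H (hypothetical calibration) to pixel (u, v).

    Returns (east_m, north_m) on the sea plane, relative to the camera.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w  # divide by the homogeneous coordinate


def local_to_gps(cam_lat_deg, cam_lon_deg, east_m, north_m):
    """Convert east/north offsets in metres into latitude/longitude.

    Flat-earth approximation around the camera position (an assumption).
    """
    dlat = north_m / EARTH_RADIUS_M
    dlon = east_m / (EARTH_RADIUS_M * math.cos(math.radians(cam_lat_deg)))
    return cam_lat_deg + math.degrees(dlat), cam_lon_deg + math.degrees(dlon)
```

In a full pipeline the resulting positions would then be smoothed over time, which the abstract attributes to an unscented Kalman filter; that step is not shown here.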
【2】 Optimal Target Shape for LiDAR Pose Estimation Link: https://arxiv.org/abs/2109.01181
Authors: Jiunn-Kai Huang, William Clark, Jessy W. Grizzle Affiliations: Robotics Institute, University of Michigan Abstract: Targets are essential in problems such as object tracking in cluttered or textureless environments, camera (and multi-sensor) calibration tasks, and simultaneous localization and mapping (SLAM). Target shapes for these tasks typically are symmetric (square, rectangular, or circular) and work well for structured, dense sensor data such as pixel arrays (i.e., images). However, symmetric shapes lead to pose ambiguity when using sparse sensor data such as LiDAR point clouds and suffer from the quantization uncertainty of the LiDAR. This paper introduces the concept of optimizing target shape to remove pose ambiguity for LiDAR point clouds. A target is designed to induce large gradients at edge points under rotation and translation relative to the LiDAR to ameliorate the quantization uncertainty associated with point cloud sparseness. Moreover, given a target shape, we present a means that leverages the target's geometry to estimate the target's vertices while globally estimating the pose. Both the simulation and the experimental results (verified by a motion capture system) confirm that by using the optimal shape and the global solver, we achieve centimeter error in translation and a few degrees in rotation even when a partially illuminated target is placed 30 meters away. All the implementations and datasets are available at https://github.com/UMich-BipedLab/optimal_shape_global_pose_estimation.
Medical (1 paper)
【1】 Studying the Effects of Self-Attention for Medical Image Analysis Link: https://arxiv.org/abs/2109.01486
Authors: Adrit Rao, Jongchan Park, Sanghyun Woo, Joon-Young Lee, Oliver Aalami Affiliations: Palo Alto High School, Lunit Inc., KAIST, Adobe Research, Stanford University Comments: ICCV 2021 CVAMD Abstract: When trained physicians interpret medical images, they understand the clinical importance of visual features. By applying cognitive attention, they apply greater focus onto clinically relevant regions while disregarding unnecessary features. The use of computer vision to automate the classification of medical images is widely studied. However, the standard convolutional neural network (CNN) does not necessarily employ subconscious feature relevancy evaluation techniques similar to the trained medical specialist and evaluates features more generally. Self-attention mechanisms enable CNNs to focus more on semantically important regions or aggregated relevant context with long-range dependencies. By using attention, medical image analysis systems can potentially become more robust by focusing on more important clinical feature regions. In this paper, we provide a comprehensive comparison of various state-of-the-art self-attention mechanisms across multiple medical image analysis tasks. Through both quantitative and qualitative evaluations along with a clinical user-centric survey study, we aim to provide a deeper understanding of the effects of self-attention in medical computer vision tasks.
Autonomous Driving|Vehicles|Lane Detection, etc. (1 paper)
【1】 Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving Link: https://arxiv.org/abs/2109.01510
Authors: Xuanchi Ren, Tao Yang, Li Erran Li, Alexandre Alahi, Qifeng Chen Affiliations: HKUST, Xi’an Jiaotong University, Alexa AI, Amazon, EPFL Comments: Accepted to ICCV 2021 Abstract: Motion prediction of vehicles is critical but challenging due to the uncertainties in complex environments and the limited visibility caused by occlusions and limited sensor ranges. In this paper, we study a new task, safety-aware motion prediction with unseen vehicles for autonomous driving. Unlike the existing trajectory prediction task for seen vehicles, we aim at predicting an occupancy map that indicates the earliest time when each location can be occupied by either seen or unseen vehicles. The ability to predict unseen vehicles is critical for safety in autonomous driving. To tackle this challenging task, we propose a safety-aware deep learning model with three new loss functions to predict the earliest occupancy map. Experiments on the large-scale autonomous driving nuScenes dataset show that our proposed model significantly outperforms the state-of-the-art baselines on the safety-aware motion prediction task. To the best of our knowledge, our approach is the first one that can predict the existence of unseen vehicles in most cases. Project page: https://github.com/xrenaa/Safety-Aware-Motion-Prediction.
Neural Architecture Search (NAS) (1 paper)
【1】 Edge-featured Graph Neural Architecture Search Link: https://arxiv.org/abs/2109.01356
Authors: Shaofei Cai, Liang Li, Xinzhe Han, Zheng-jun Zha, Qingming Huang Affiliations: Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; University of Science and Technology of China, China; Peng Cheng Laboratory, Shenzhen, China Abstract: Graph neural networks (GNNs) have been successfully applied to learning representation on graphs in many relational tasks. Recently, researchers have studied neural architecture search (NAS) to reduce the dependence on human expertise and explore better GNN architectures, but they over-emphasize entity features and ignore latent relation information concealed in the edges. To solve this problem, we incorporate edge features into the graph search space and propose Edge-featured Graph Neural Architecture Search (EGNAS) to find the optimal GNN architecture. Specifically, we design rich entity and edge updating operations to learn high-order representations, which convey more generic message passing mechanisms. Moreover, the architecture topology in our search space allows exploring complex feature dependence of both entities and edges, which can be efficiently optimized by a differentiable search strategy. Experiments on three graph tasks on six datasets show EGNAS can search better GNNs with higher performance than current state-of-the-art human-designed and search-based GNNs.
Attention (1 paper)
【1】 Dual-Camera Super-Resolution with Aligned Attention Modules Link: https://arxiv.org/abs/2109.01349
Authors: Tengfei Wang, Jiaxin Xie, Wenxiu Sun, Qiong Yan, Qifeng Chen Affiliations: HKUST, SenseTime Research and Tetras.AI Comments: ICCV 2021 Abstract: We present a novel approach to reference-based super-resolution (RefSR) with the focus on dual-camera super-resolution (DCSR), which utilizes reference images for high-quality and high-fidelity results. Our proposed method generalizes the standard patch-based feature matching with spatial alignment operations. We further explore dual-camera super-resolution, which is one promising application of RefSR, and build a dataset that consists of 146 image pairs from the main and telephoto cameras in a smartphone. To bridge the domain gaps between real-world images and the training images, we propose a self-supervised domain adaptation strategy for real-world images. Extensive experiments on our dataset and a public benchmark demonstrate clear improvement achieved by our method over the state of the art in both quantitative evaluation and visual comparisons.
Face|Crowd Counting (2 papers)
【1】 Neural Human Deformation Transfer Link: https://arxiv.org/abs/2109.01588
Authors: Jean Basset, Adnane Boukhayma, Stefanie Wuhrer, Franck Multon, Edmond Boyer Affiliations: Grenoble INP (Institute of Engineering Univ Abstract: We consider the problem of human deformation transfer, where the goal is to retarget poses between different characters. Traditional methods that tackle this problem require a clear definition of the pose, and use this definition to transfer poses between characters. In this work, we take a different approach and transform the identity of a character into a new identity without modifying the character's pose. This offers the advantage of not having to define equivalences between 3D human poses, which is not straightforward as poses tend to change depending on the identity of the character performing them, and as their meaning is highly contextual. To achieve the deformation transfer, we propose a neural encoder-decoder architecture where only identity information is encoded and where the decoder is conditioned on the pose. We use pose independent representations, such as isometry-invariant shape characteristics, to represent identity features. Our model uses these features to supervise the prediction of offsets from the deformed pose to the result of the transfer. We show experimentally that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and generalises better to poses not seen during training. We also introduce a fine-tuning step that allows obtaining competitive results for extreme identities and transferring simple clothing.
【2】 3D Human Shape Style Transfer Link: https://arxiv.org/abs/2109.01587
Authors: Joao Regateiro, Edmond Boyer Abstract: We consider the problem of modifying/replacing the shape style of a real moving character with that of an arbitrary static real source character. Traditional solutions follow a pose transfer strategy, from the moving character to the source character shape, that relies on skeletal pose parametrization. In this paper, we explore an alternative approach that transfers the source shape style onto the moving character. The expected benefit is to avoid the inherently difficult pose to shape conversion required with skeletal parametrization applied on real characters. For this purpose, we consider image style transfer techniques and investigate how to adapt them to 3D human shapes. Adaptive Instance Normalisation (AdaIN) and SPADE architectures have been demonstrated to efficiently and accurately transfer the style of an image onto another while preserving the original image structure. AdaIN contributes a module that performs style transfer through the statistics of the subjects, and SPADE contributes a residual block architecture that refines the quality of the style transfer. We demonstrate that these approaches are extendable to the 3D shape domain by proposing a convolutional neural network that applies the same principle of preserving the shape structure (shape pose) while transferring the style of a new subject shape. The generated results are supervised through a discriminator module to evaluate the realism of the shape, whilst enforcing the decoder to synthesise plausible shapes and improve the style transfer for unseen subjects. Our experiments demonstrate an average of approximately 56% qualitative and quantitative improvements over the baseline in shape transfer through optimization-based and learning-based methods.
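For readers unfamiliar with AdaIN, the "style transfer through statistics" idea mentioned in the abstract can be sketched on a single feature channel. This is the standard AdaIN formula from the image style transfer literature, shown on plain Python lists; it is an illustration only and makes no claim about how the paper's 3D network applies it. The function names and the `eps` value are assumptions.

```python
import math


def mean_std(values, eps=1e-5):
    """Return the mean and (eps-stabilised) standard deviation of a channel."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalisation for one channel.

    Normalises the content features, then rescales and shifts them so that
    they carry the style channel's standard deviation and mean.
    """
    mu_c, sd_c = mean_std(content, eps)
    mu_s, sd_s = mean_std(style, eps)
    return [sd_s * (c - mu_c) / sd_c + mu_s for c in content]
```

The output keeps the relative ordering (the "structure") of the content features while its mean and spread match those of the style features.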
Pruning|Quantization|Acceleration|Compression (1 paper)
【1】 Using Topological Framework for the Design of Activation Function and Model Pruning in Deep Neural Networks Link: https://arxiv.org/abs/2109.01572
Authors: Yogesh Kochar, Sunil Kumar Vengalil, Neelam Sinha Affiliations: Samsung India Research Bangalore, Bangalore, India, International Institute of Information Technology Abstract: Success of deep neural networks in diverse tasks across domains of computer vision, speech recognition and natural language processing has necessitated understanding the dynamics of the training process and also the working of trained models. Two independent contributions of this paper are: 1) a novel activation function for faster training convergence, and 2) systematic pruning of filters of models trained irrespective of activation function. We analyze the topological transformation of the space of training samples as it gets transformed by each successive layer during training, by changing the activation function. The impact of changing activation function on the convergence during training is reported for the task of binary classification. A novel activation function aimed at faster convergence for classification tasks is proposed. Here, Betti numbers are used to quantify topological complexity of data. Results of experiments on popular synthetic binary classification datasets with large Betti numbers (>150) using MLPs are reported. Results show that the proposed activation function results in faster convergence requiring fewer epochs by a factor of 1.5 to 2, since Betti numbers reduce faster across layers with the proposed activation function. The proposed methodology was verified on benchmark image datasets: fashion MNIST, CIFAR-10 and cat-vs-dog images, using CNNs. Based on empirical results, we propose a novel method for pruning a trained model. The trained model was pruned by eliminating filters that transform data to a topological space with large Betti numbers. All filters with Betti numbers greater than 300 were removed from each layer without significant reduction in accuracy. This resulted in faster prediction time and reduced memory size of the model.
3D|3D Reconstruction, etc. (1 paper)
【1】 CAP-Net: Correspondence-Aware Point-view Fusion Network for 3D Shape Analysis Link: https://arxiv.org/abs/2109.01291
Authors: Xinwei He, Silin Cheng, Song Bai, Xiang Bai Affiliations: Huazhong University of Science and Technology, Bytedance Research Abstract: Learning 3D representations by fusing point cloud and multi-view data has been proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, in this paper we argue that more discriminative features can be derived by modeling "where to fuse". To investigate this, we propose a novel Correspondence-Aware Point-view Fusion Net (CAP-Net). The core element of CAP-Net is a module named Correspondence-Aware Fusion (CAF) which integrates the local features of the two modalities based on their correspondence scores. We further propose to filter out correspondence scores with low values to obtain salient local correspondences, which reduces redundancy for the fusion process. In our CAP-Net, we utilize the CAF modules to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically in order to obtain more informative features. Comprehensive evaluations on popular 3D shape benchmarks covering 3D object classification and retrieval show the superiority of the proposed framework.
Other Neural Networks|Deep Learning|Models|Modeling (6 papers)
【1】 Representing Shape Collections with Alignment-Aware Linear Models Link: https://arxiv.org/abs/2109.01605
Authors: Romain Loiseau, Tom Monnier, Loïc Landrieu, Mathieu Aubry Affiliations: LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, France; LASTIG, Univ. Gustave Eiffel, ENSG, IGN, F-, Saint-Mande, France Comments: 17 pages, 10 figures. Code and data are available at: this https URL Abstract: In this paper, we revisit the classical representation of 3D point clouds as linear shape models. Our key insight is to leverage deep learning to represent a collection of shapes as affine transformations of low-dimensional linear shape models. Each linear model is characterized by a shape prototype, a low-dimensional shape basis and two neural networks. The networks take as input a point cloud and predict the coordinates of a shape in the linear basis and the affine transformation which best approximates the input. Both linear models and neural networks are learned end-to-end using a single reconstruction loss. The main advantage of our approach is that, in contrast to many recent deep approaches which learn feature-based complex shape representations, our model is explicit and every operation occurs in 3D space. As a result, our linear shape models can be easily visualized and annotated, and failure cases can be visually understood. While our main goal is to introduce a compact and interpretable representation of shape collections, we show it leads to state-of-the-art results for few-shot segmentation.
【2】 Deep Metric Learning for Ground Images Link: https://arxiv.org/abs/2109.01569
Authors: Raaghav Radhakrishnan, Jan Fabian Schmid, Randolf Scholz, Lars Schmidt-Thieme Affiliations: Robert Bosch GmbH, Hildesheim, Germany, University of Hildesheim, Goethe University, Frankfurt am Main, Germany Abstract: Ground texture based localization methods are potential prospects for low-cost, high-accuracy self-localization solutions for robots. These methods estimate the pose of a given query image, i.e. the current observation of the ground from a downward-facing camera, with respect to a set of reference images whose poses are known in the application area. In this work, we deal with the initial localization task, in which we have no prior knowledge about the current robot positioning. In this situation, the localization method would have to consider all available reference images. However, in order to reduce computational effort and the risk of receiving a wrong result, we would like to consider only those reference images that are actually overlapping with the query image. For this purpose, we propose a deep metric learning approach that retrieves the most similar reference images to the query image. In contrast to existing approaches to image retrieval for ground images, our approach achieves significantly better recall performance and improves the localization performance of a state-of-the-art ground texture based localization method.
【3】 Model-Based Parameter Optimization for Ground Texture Based Localization Methods Link: https://arxiv.org/abs/2109.01559
Authors: Jan Fabian Schmid, Stephan F. Simon, Rudolf Mester Affiliations: Robert Bosch GmbH, Hildesheim, Germany; VSI Lab, CS Dept., Goethe University, Frankfurt am Main, Germany; Norwegian Open AI Lab, CS Dept., NTNU Trondheim, Norway Abstract: A promising approach to accurate positioning of robots is ground texture based localization. It is based on the observation that visual features of ground images enable fingerprint-like place recognition. We tackle the issue of efficient parametrization of such methods, deriving a prediction model for localization performance, which requires only a small collection of sample images of an application area. In a first step, we examine whether the model can predict the effects of changing one of the most important parameters of feature-based localization methods: the number of extracted features. We examine two localization methods, and in both cases our evaluation shows that the predictions are sufficiently accurate. Since this model can be used to find suitable values for any parameter, we then present a holistic parameter optimization framework, which finds suitable texture-specific parameter configurations, using only the model to evaluate the considered parameter configurations.
【4】 Deep Learning for Fitness Link: https://arxiv.org/abs/2109.01376
Authors: Mahendran N Affiliations: Indian Institute of Technology Tirupati, Andhra Pradesh, India Comments: 6 pages, 3 figures, 2 tables. Rejected by a TradiCV 2021 Abstract: We present Fitness tutor, an application for maintaining correct posture during workout exercises or yoga. Current work on fitness, which focuses on suggesting food supplements, accessing workouts, and workout wearables, does a great job of improving fitness. Meanwhile, the current situation makes it difficult to monitor trainees' workouts. Inspired by healthcare innovations like robotic surgery, we design a novel application, Fitness tutor, which can guide workouts using pose estimation. Pose estimation can be deployed on the reference image to gather data and guide the user with that data. This allows Fitness tutor to guide workouts (both exercise and yoga) in remote conditions with a single reference posture as an image. We use the PoseNet model in TensorFlow with p5.js to build the skeleton. Fitness tutor is an application of pose estimation models that brings a real-time teaching experience to fitness. Our experiments show that it can leverage the potential of pose estimation models by providing guidance in real time.
【5】 Towards Learning Spatially Discriminative Feature Representations Link: https://arxiv.org/abs/2109.01359
Authors: Chaofei Wang, Jiayu Xiao, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang Affiliations: Department of Automation, Tsinghua University Comments: Accepted by ICCV2021 Abstract: The backbone of a traditional CNN classifier is generally considered as a feature extractor, followed by a linear layer which performs the classification. We propose a novel loss function, termed CAM-loss, to constrain the embedded feature maps with the class activation maps (CAMs) which indicate the spatially discriminative regions of an image for particular categories. CAM-loss drives the backbone to express the features of the target category and suppress the features of non-target categories or background, so as to obtain more discriminative feature representations. It can be simply applied in any CNN architecture with negligible additional parameters and calculations. Experimental results show that CAM-loss is applicable to a variety of network structures and can be combined with mainstream regularization methods to improve the performance of image classification. The strong generalization ability of CAM-loss is validated in the transfer learning and few-shot learning tasks. Based on CAM-loss, we also propose a novel CAAM-CAM matching knowledge distillation method. This method directly uses the CAM generated by the teacher network to supervise the CAAM generated by the student network, which effectively improves the accuracy and convergence rate of the student network.
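As background for the abstract above, a class activation map is conventionally computed as a weighted sum of the final feature maps, using the linear classifier's weights for one class. The sketch below shows only that standard CAM computation on nested Python lists; the CAM-loss itself is not specified in the abstract and is not reproduced here. Function names and the min-max normalisation step are assumptions made for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for a single class.

    feature_maps: list of K maps, each a list of H rows with W floats.
    class_weights: list of K classifier weights for the chosen class.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Min-max scale a map to [0, 1] (a common visualisation choice)."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on constant maps
    return [[(v - lo) / span for v in row] for row in cam]
```

High values in the resulting map mark the spatial locations that contribute most to the chosen class score.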
【6】 Deep Learning Approach for Hyperspectral Image Demosaicking, Spectral Correction and High-resolution RGB Reconstruction Link: https://arxiv.org/abs/2109.01403
Authors: Peichao Li, Michael Ebner, Philip Noonan, Conor Horgan, Anisha Bahl, Sebastien Ourselin, Jonathan Shapey, Tom Vercauteren Affiliations: School of Biomedical Engineering & Imaging Sciences, King’s College London, London, UK; Hypervision Surgical Ltd, London, UK; Department of Neurosurgery, King’s College Hospital NHS Foundation Trust, London, UK Abstract: Hyperspectral imaging is one of the most promising techniques for intraoperative tissue characterisation. Snapshot mosaic cameras, which can capture hyperspectral data in a single exposure, have the potential to make a real-time hyperspectral imaging system for surgical decision-making possible. However, optimal exploitation of the captured data requires solving an ill-posed demosaicking problem and applying additional spectral corrections to recover spatial and spectral information of the image. In this work, we propose a deep learning-based image demosaicking algorithm for snapshot hyperspectral images using supervised learning methods. Due to the lack of publicly available medical images acquired with snapshot mosaic cameras, a synthetic image generation approach is proposed to simulate snapshot images from existing medical image datasets captured by high-resolution, but slow, hyperspectral imaging devices. Image reconstruction is achieved using convolutional neural networks for hyperspectral image super-resolution, followed by cross-talk and leakage correction using a sensor-specific calibration matrix. The resulting demosaicked images are evaluated both quantitatively and qualitatively, showing clear improvements in image quality compared to a baseline demosaicking method using linear interpolation. Moreover, the fast processing time of approximately 45 ms of our algorithm to obtain super-resolved RGB or oxygenation saturation maps per image frame for a state-of-the-art snapshot mosaic camera demonstrates the potential for its seamless integration into real-time surgical hyperspectral imaging applications.
Others (8 papers)
【1】 Instabilities in Plug-and-Play (PnP) algorithms from a learned denoiser Link: https://arxiv.org/abs/2109.01655
Authors: Abinash Nayak Note: arXiv admin note: text overlap with arXiv:2106.07795 Abstract: It is well known that inverse problems are ill-posed and that solving them meaningfully requires regularization. Traditionally, the popular regularization methods are penalized variational approaches. In recent years, these classical approaches have been outclassed by the so-called plug-and-play (PnP) algorithms, which copy proximal gradient minimization processes such as ADMM or FISTA but use a general denoiser in place of the proximal step. However, unlike the traditional proximal gradient methods, the theoretical underpinnings, convergence, and stability results for these PnP algorithms have been insufficient. Hence, the results obtained from these algorithms, though empirically outstanding, cannot always be completely trusted, as they may contain instabilities or hallucinated features arising from the denoiser, especially when a pre-trained learned denoiser is used. In this paper, we show that a PnP algorithm can induce hallucinated features when using a pre-trained deep-learning-based (DnCNN) denoiser. We show that such instabilities are quite different from the instabilities inherent to an ill-posed problem. We also present methods to subdue these instabilities and significantly improve the recoveries. We compare the advantages and disadvantages of a learned denoiser with those of a classical denoiser (here, BM3D), as well as the effectiveness of the FISTA-PnP algorithm versus the ADMM-PnP algorithm. In addition, we provide an algorithm that combines the two denoisers, learned and classical, in a weighted fashion to produce even better results. We conclude with numerical results that validate the developed theories.
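To make the FISTA-PnP iteration mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`fista_pnp`, `moving_average`), the step size, and the three-tap moving average standing in for a denoiser such as DnCNN or BM3D are all assumptions chosen for illustration.

```python
def moving_average(x):
    """Hypothetical stand-in denoiser: 3-tap average with replicated edges."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def fista_pnp(b, forward, adjoint, denoise, step, iters):
    """FISTA-style iteration where the proximal step is replaced by `denoise`.

    b        -- observed data (list of floats)
    forward  -- callable applying the forward operator A
    adjoint  -- callable applying the adjoint A^T
    denoise  -- any denoiser mapping a signal to a signal
    step     -- gradient step size (assumed small enough for the data term)
    """
    x = adjoint(b)          # simple initial guess
    y = list(x)             # extrapolated point
    t = 1.0                 # FISTA momentum parameter
    for _ in range(iters):
        # Gradient of the data-fidelity term 0.5 * ||A y - b||^2 at y.
        residual = [f - bi for f, bi in zip(forward(y), b)]
        grad = adjoint(residual)
        # Gradient step followed by the plugged-in denoiser.
        x_new = denoise([yi - step * gi for yi, gi in zip(y, grad)])
        # Standard FISTA momentum update.
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


def identity(v):
    """Trivial forward operator used only for the toy example."""
    return list(v)


recovered = fista_pnp([1.0, 1.0, 1.0, 1.0], identity, identity, moving_average, 1.0, 5)
```

Because the denoiser is an arbitrary callable, nothing in this loop guarantees convergence, which is the gap in stability results that the abstract points to.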
【2】 Super Neurons Link: https://arxiv.org/abs/2109.01594
Authors: Serkan Kiranyaz, Junaid Malik, Mehmet Yamac, Esin Guldogan, Turker Ince, Moncef Gabbouj Affiliations: Electrical & Electronics Engineering Department, Izmir University of Economics, Turkey Abstract: Operational Neural Networks (ONNs) are new-generation network models that can perform any (non-linear) transformation with a proper combination of "nodal" and "pool" operators. However, they still have a restriction: a single nodal operator is used for all (synaptic) connections of each neuron. The idea behind "generative neurons" was born as a remedy for this restriction: each nodal operator can be "customized" during training in order to maximize the learning performance. Self-Organized ONNs (Self-ONNs) composed of generative neurons can achieve a very high level of diversity even with a compact configuration; however, they still suffer from the last property inherited from CNNs: localized kernel operations, which impose a severe limitation on the information flow between layers. It is therefore desirable for the neurons to gather information from a larger area in the previous layer's maps without increasing the kernel size. For certain applications, it might be even more desirable to learn the kernel locations of each connection during training along with the customized nodal operators, so that both can be optimized simultaneously. This study introduces super (generative) neuron models that can accomplish this without altering the kernel sizes and that enable significant diversity in terms of information flow. The two super neuron models proposed in this study differ in how the kernels are localized: i) kernels randomly localized within a bias range set for each layer, and ii) kernel locations optimized during Back-Propagation (BP) training. An extensive set of comparative evaluations shows that Self-ONNs with super neurons can indeed achieve superior learning and generalization capability without any significant rise in computational complexity.
【3】 Ordinal Pooling Link: https://arxiv.org/abs/2109.01561
Authors: Adrien Deliège, Maxime Istasse, Ashwani Kumar, Christophe De Vleeschouwer, Marc Van Droogenbroeck Affiliations: University of Liège, University of Louvain, University of Sheffield Note: authors' preprint version of a paper published at BMVC Abstract: In the framework of convolutional neural networks, downsampling is often performed with average-pooling, where all the activations are treated equally, or with a max-pooling operation that retains only the element with maximum activation while discarding the others. Both of these operations are restrictive and have previously been shown to be sub-optimal. To address this issue, a novel pooling scheme, named ordinal pooling, is introduced in this work. Ordinal pooling rearranges all the elements of a pooling region in a sequence and assigns a different weight to each element based upon its order in the sequence. These weights are used to compute the pooling operation as a weighted sum of the rearranged elements of the pooling region. They are learned via standard gradient-based training, which allows the network to learn, in a differentiable manner, a behavior anywhere in the spectrum from average-pooling to max-pooling. Our experiments suggest that it is advantageous for networks to perform different types of pooling operations within a pooling layer and that a hybrid behavior between average- and max-pooling is often beneficial. More importantly, they also demonstrate that ordinal pooling leads to consistent improvements in accuracy over average- or max-pooling operations while speeding up training and alleviating the issue of choosing which pooling operations and activation functions to use in the networks. In particular, ordinal pooling mainly helps on lightweight or quantized deep learning architectures, as typically considered for embedded applications, for example.
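The weighted-sum-over-ranked-elements idea can be illustrated with a short plain-Python sketch. The function name `ordinal_pool`, the descending sort order, and the fixed weight vectors are assumptions for illustration; in the paper the weights are learned by gradient-based training rather than set by hand.

```python
def ordinal_pool(region, weights):
    """Pool one region as a weighted sum of its values ranked in descending order.

    region  -- flattened activations of a single pooling window
    weights -- one weight per rank (weights[0] multiplies the largest value)
    """
    if len(region) != len(weights):
        raise ValueError("need exactly one weight per element of the pooling region")
    ranked = sorted(region, reverse=True)  # assumed ordering: largest first
    return sum(w * v for w, v in zip(weights, ranked))


region = [0.2, 0.9, 0.4, 0.5]            # a flattened 2x2 pooling window

max_like = [1.0, 0.0, 0.0, 0.0]          # all weight on rank 0: max-pooling
avg_like = [0.25, 0.25, 0.25, 0.25]      # uniform weights: average-pooling
hybrid = [0.5, 0.5, 0.0, 0.0]            # mean of the two largest activations

pooled_max = ordinal_pool(region, max_like)
pooled_avg = ordinal_pool(region, avg_like)
pooled_hybrid = ordinal_pool(region, hybrid)
```

With these hand-picked weights the same function reproduces max-pooling, average-pooling, and an in-between behavior, which is the spectrum the abstract describes.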
【4】 Ghost Loss to Question the Reliability of Training Data Link: https://arxiv.org/abs/2109.01504
Authors: Adrien Deliège, Anthony Cioppa, Marc Van Droogenbroeck Affiliations: University of Liège Note: authors' preprint version of a paper published in IEEE Access Abstract: Supervised image classification problems rely on training data assumed to have been correctly annotated; this assumption underpins most work in the field of deep learning. As a consequence, during training, a network is forced to match the label provided by the annotator and is not given the flexibility to choose an alternative when it might be able to detect an inconsistency. Therefore, erroneously labeled training images may end up "correctly" classified in classes to which they do not actually belong. This may reduce the performance of the network and thus encourage building more complex networks without even checking the quality of the training data. In this work, we question the reliability of annotated datasets. For that purpose, we introduce the notion of ghost loss, which can be seen as a regular loss that is zeroed out for some predicted values in a deterministic way and that allows the network to choose an alternative to the given label without being penalized. After a proof-of-concept experiment, we use the ghost loss principle to detect confusing images and erroneously labeled images in well-known training datasets (MNIST, Fashion-MNIST, SVHN, CIFAR10), and we provide a new tool, called the sanity matrix, for summarizing these confusions.
【5】 MitoVis: A Visually-guided Interactive Intelligent System for Neuronal Mitochondria Analysis Link: https://arxiv.org/abs/2109.01351
Authors: JunYoung Choi, Hakjun Lee, Suyeon Kim, Seok-Kyu Kwon, Won-Ki Jeong Abstract: Neurons have a polarized structure, including dendrites and axons, and compartment-specific functions can be affected by the mitochondria dwelling there. It is known that the morphology of mitochondria is closely related to the functions of neurons and to neurodegenerative diseases. Even though several deep learning methods have been developed to automatically analyze the morphology of mitochondria, applying existing methods to actual analysis still encounters several difficulties. Since the performance of a pre-trained deep learning model may vary depending on the target data, re-training of the model is often required. Besides, even though deep learning has shown superior performance under constrained setups, there are always errors that need to be corrected by humans in real analysis. To address these issues, we introduce MitoVis, a novel visualization system for end-to-end data processing and interactive analysis of the morphology of neuronal mitochondria. MitoVis enables interactive fine-tuning of a pre-trained neural network model without requiring domain knowledge of machine learning, which allows neuroscientists to easily leverage deep learning in their research. MitoVis also provides novel visual guides and interactive proofreading functions so that users can quickly identify and correct errors in the result with minimal effort. We demonstrate the usefulness and efficacy of the system via a case study conducted by a neuroscientist on a real analysis scenario. The result shows that MitoVis allows up to 15x faster analysis with similar accuracy compared to the fully manual analysis method.
【6】 Spatially varying white balancing for mixed and non-uniform illuminants Link: https://arxiv.org/abs/2109.01350
Authors: Teruaki Akazawa, Yuma Kinoshita, Hitoshi Kiya Affiliations: Tokyo Metropolitan University, Tokyo, Japan Abstract: In this paper, we propose a novel white balance adjustment, called "spatially varying white balancing," for single, mixed, and non-uniform illuminants. By using n diagonal matrices along with a weight, the proposed method can reduce lighting effects on all spatially varying colors in an image under such illumination conditions. In contrast, conventional white balance adjustments do not consider correcting all colors except under a single illuminant. Multi-color balance adjustments can map multiple colors onto their corresponding ground-truth colors, but because they use a non-diagonal matrix, unlike white balancing, they may cause a rank deficiency problem. In an experiment, the effectiveness of the proposed method is shown under mixed and non-uniform illuminants, in comparison with conventional white balancing and multi-color balancing. Moreover, under a single illuminant, the proposed method has almost the same performance as conventional white balancing.
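One way to read "n diagonal matrices along with a weight" is as a per-pixel blend of n per-channel gain vectors. The plain-Python sketch below illustrates that reading only; the function name `blend_white_balance`, the normalization of the weights, and the way the weights are supplied are assumptions, since the abstract does not say how the weights are computed.

```python
def blend_white_balance(pixel, gains, weights):
    """Correct one RGB pixel with a weighted blend of diagonal gain matrices.

    pixel   -- (r, g, b) values of the input pixel
    gains   -- list of n (r, g, b) gain triples, each the diagonal of one matrix
    weights -- n non-negative weights for this pixel (assumed given; normalized here)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return tuple(
        # Blending diagonal matrices reduces to blending their per-channel gains.
        sum((w / total) * g[c] for w, g in zip(weights, gains)) * pixel[c]
        for c in range(3)
    )


# Two hypothetical illuminants: one needs a red boost, the other a blue boost.
gains = [(2.0, 1.0, 1.0), (1.0, 1.0, 2.0)]

only_first = blend_white_balance((0.5, 0.5, 0.5), gains, [1.0, 0.0])
equal_mix = blend_white_balance((0.5, 0.5, 0.5), gains, [1.0, 1.0])
```

When a single weight is non-zero the sketch falls back to ordinary white balancing with one diagonal matrix, consistent with the abstract's remark about the single-illuminant case.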
【7】 roadscene2vec: A Tool for Extracting and Embedding Road Scene-Graphs Link: https://arxiv.org/abs/2109.01183
Authors: Arnav Vaibhav Malawade, Shih-Yuan Yu, Brandon Hsu, Harsimrat Kaeley, Anurag Karra, Mohammad Abdullah Al Faruque Affiliations: Department of Electrical Engineering & Computer Science, University of California, Irvine, CA, USA Abstract: Recently, road scene-graph representations used in conjunction with graph learning techniques have been shown to outperform state-of-the-art deep learning techniques in tasks including action classification, risk assessment, and collision prediction. To enable the exploration of applications of road scene-graph representations, we introduce roadscene2vec: an open-source tool for extracting and embedding road scene-graphs. The goal of roadscene2vec is to enable research into the applications and capabilities of road scene-graphs by providing tools for generating scene-graphs, graph learning models to generate spatio-temporal scene-graph embeddings, and tools for visualizing and analyzing scene-graph-based methodologies. The capabilities of roadscene2vec include (i) customized scene-graph generation from either video clips or data from the CARLA simulator, (ii) multiple configurable spatio-temporal graph embedding models and baseline CNN-based models, (iii) built-in functionality for using graph and sequence embeddings for risk assessment and collision prediction applications, (iv) tools for evaluating transfer learning, and (v) utilities for visualizing scene-graphs and analyzing the explainability of graph learning models. We demonstrate the utility of roadscene2vec for these use cases with experimental results and qualitative evaluations for both graph learning models and CNN-based models. roadscene2vec is available at https://github.com/AICPS/roadscene2vec.
【8】 Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions Link: https://arxiv.org/abs/2109.01246
Authors: Dan M. Kluger, Sherrie Wang, David B. Lobell Affiliations: Department of Statistics, Sequoia Hall, Jane Stanford Way, Stanford University, Stanford, CA, United States of America; Department of Earth System Science and Center on Food Security and the Environment, Encina Abstract: Crop type mapping at the field level is critical for a variety of applications in agricultural monitoring, and satellite imagery is becoming an increasingly abundant and useful raw input from which to create crop type maps. Still, in many regions crop type mapping with satellite data remains constrained by a scarcity of field-level crop labels for training supervised classification models. When training data is not available in one region, classifiers trained in similar regions can be transferred, but shifts in the distribution of crop types, as well as transformations of the features between regions, lead to reduced classification accuracy. We present a methodology that uses aggregate-level crop statistics to correct the classifier by accounting for these two types of shifts. To adjust for shifts in the crop type composition, we present a scheme for properly reweighting the posterior probabilities of each class that are output by the classifier. To adjust for shifts in features, we propose a method to estimate and remove linear shifts in the mean feature vector. We demonstrate that this methodology leads to substantial improvements in overall classification accuracy when using Linear Discriminant Analysis (LDA) to map crop types in Occitanie, France and in Western Province, Kenya. With LDA as the base classifier, our methodology reduced misclassifications in France by 2.8% to 42.2% (mean = 21.9%) across eleven different training departments, and in Kenya by 6.6%, 28.4%, and 42.7% for three training regions. While our methodology was statistically motivated by the LDA classifier, it can be applied to any type of classifier. As an example, we demonstrate its successful application to improve a Random Forest classifier.
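The posterior reweighting step can be sketched with the standard prior-shift correction, in which each class posterior is scaled by the ratio of target-region to training-region class frequencies and then renormalized. This is a generic illustration in plain Python, not necessarily the authors' exact scheme; the function name `reweight_posterior` and the example numbers are assumptions.

```python
def reweight_posterior(posterior, train_prior, target_prior):
    """Adjust class posteriors for a change in class frequencies between regions.

    posterior    -- classifier output p(class | features) under the training region
    train_prior  -- class frequencies in the training region
    target_prior -- class frequencies in the new region (e.g. from aggregate statistics)
    """
    if not (len(posterior) == len(train_prior) == len(target_prior)):
        raise ValueError("all inputs must have one entry per class")
    # Scale each posterior by how much more (or less) common the class is in the target region.
    scaled = [p * t / s for p, s, t in zip(posterior, train_prior, target_prior)]
    total = sum(scaled)
    return [v / total for v in scaled]  # renormalize so the result sums to 1


# Hypothetical two-crop example: the classifier slightly favors crop 0,
# but crop 1 is four times as common as crop 0 in the new region.
adjusted = reweight_posterior([0.6, 0.4], [0.5, 0.5], [0.2, 0.8])
```

In this toy case the adjusted posterior favors the second crop, because the aggregate statistics say it dominates the new region even though the unadjusted classifier output leaned the other way.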