cs.CV: 33 papers today
Transformer (1 paper)
【1】 Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation
Link: https://arxiv.org/abs/2108.11993
Authors: Cam Nguyen, Zuhayr Asad, Yuankai Huo
Affiliations: Department of Computer Science, Vanderbilt University, Nashville, TN, USA
Abstract: Histopathology has played an essential role in cancer diagnosis. With the rapid advances in convolutional neural networks (CNNs), various CNN-based automated pathological image segmentation approaches have been developed for computer-assisted pathological image analysis. In the past few years, Transformer neural networks (Transformers) have emerged as a new deep learning paradigm with the unique merit of capturing global long-range dependencies across the entire image. Such a merit is appealing for exploring spatially heterogeneous pathological images. However, there have been very few, if any, studies that have systematically evaluated current Transformer-based approaches in pathological image segmentation. To assess the performance of Transformer segmentation models on whole slide images (WSIs), we quantitatively evaluated six prevalent Transformer-based models on tumor segmentation, using the widely used PAIP liver histopathological dataset. For a more comprehensive analysis, we also compared the Transformer-based models with six major traditional CNN-based models. The results show that the Transformer-based models exhibit generally superior performance over the CNN-based models. In particular, Segmenter, Swin Transformer and TransUNet, all Transformer-based, came out as the best performers among the twelve evaluated models.
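Tumor segmentation models such as those compared above are typically scored with overlap metrics like the Dice coefficient. A minimal sketch of the metric on binary masks (the set-of-pixels representation and toy masks below are illustrative, not the paper's exact evaluation protocol):

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as sets of (row, col) pixels."""
    if not pred and not target:
        return 1.0  # two empty masks agree perfectly
    inter = len(pred & target)
    return 2.0 * inter / (len(pred) + len(target))

# toy example: two overlapping 4x4 square masks
a = {(r, c) for r in range(2, 6) for c in range(2, 6)}  # 16 pixels
b = {(r, c) for r in range(3, 7) for c in range(3, 7)}  # 16 pixels, 9 overlapping
print(dice_score(a, b))  # 2*9/(16+16) = 0.5625
```

In WSI evaluation this would be computed per tile or per slide and averaged across the test set.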
Detection (9 papers)
【1】 A Pedestrian Detection and Tracking Framework for Autonomous Cars: Efficient Fusion of Camera and LiDAR Data
Link: https://arxiv.org/abs/2108.12375
Authors: Muhammad Mobaidul Islam, Abdullah Al Redwan Newaz, Ali Karimoddini
Affiliations: Department of Electrical and Computer Engineering, North Carolina A&T State University
Abstract: This paper presents a novel method for pedestrian detection and tracking by fusing camera and LiDAR sensor data. To deal with the challenges associated with autonomous driving scenarios, an integrated tracking and detection framework is proposed. The detection phase is performed by converting LiDAR streams to computationally tractable depth images; then a deep neural network is developed to identify pedestrian candidates in both RGB and depth images. To provide accurate information, the detection phase is further enhanced by fusing multi-modal sensor information using a Kalman filter. The tracking phase combines Kalman filter prediction with an optical flow algorithm to track multiple pedestrians in a scene. We evaluate our framework on a real public driving dataset. Experimental results demonstrate that the proposed method achieves significant performance improvement over a baseline method that solely uses image-based pedestrian detection.
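The Kalman-filter fusion step described above can be illustrated with a one-dimensional toy update: each sensor's measurement is weighted by its precision. The paper's filter tracks full pedestrian states; the scalar version and the noise values below are purely illustrative:

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update: fuse state estimate (x, variance p)
    with a measurement z of variance r."""
    k = p / (p + r)  # Kalman gain: trust the measurement more when p >> r
    return x + k * (z - x), (1.0 - k) * p

# fuse a camera and a LiDAR range measurement of the same pedestrian
x, p = 0.0, 1e6                           # vague prior
x, p = kalman_update(x, p, 10.0, r=0.5)   # camera measurement (noisier)
x, p = kalman_update(x, p, 10.4, r=0.05)  # LiDAR measurement (more precise)
print(round(x, 2))  # 10.36: fused estimate pulled toward the precise LiDAR reading
```

Note how the posterior variance `p` shrinks after each fusion step, reflecting the increased confidence from combining modalities.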
【2】 TE-YOLOF: Tiny and efficient YOLOF for blood cell detection
Link: https://arxiv.org/abs/2108.12313
Authors: Fanxin Xu, Xiangkui Li, Hang Yang, Yali Wang, Wei Xiang
Affiliations: College of Electronic and Information, Southwest Minzu University; West China Biomedical Big Data Center
Abstract: Blood cell detection in microscopic images is an essential branch of medical image processing research. Since disease detection based on manual inspection of blood cells is time-consuming and error-prone, detecting blood cells with deep-convolutional-neural-network object detectors is a feasible alternative. In this work, an object detector based on YOLOF is proposed to detect blood cell objects such as red blood cells, white blood cells and platelets. This object detector, called TE-YOLOF (Tiny and Efficient YOLOF), is a one-stage detector that uses a dilated encoder to extract information from single-level feature maps. To increase efficiency and flexibility, the EfficientNet convolutional neural network is used as the backbone of the proposed object detector. Furthermore, depthwise separable convolution is applied to enhance performance and minimize the parameters of the network, and the Mish activation function is employed to increase precision. Extensive experiments on the BCCD dataset prove the effectiveness of the proposed model, which is more efficient than other existing studies for blood cell detection.
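Two ingredients named above are easy to illustrate: the Mish activation and the parameter savings of a depthwise separable convolution over a standard one. The layer sizes below are arbitrary, not TE-YOLOF's actual configuration:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(math.log1p(math.exp(x)))

def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 256, 3
print(conv_params(c_in, c_out, k))                 # 294912
print(depthwise_separable_params(c_in, c_out, k))  # 33920, roughly a 9x reduction
print(mish(0.0))                                   # 0.0
```

The roughly k^2-fold parameter reduction is what lets such detectors stay "tiny" while keeping the receptive field of a k x k kernel.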
【3】 Fast Rule-Based Clutter Detection in Automotive Radar Data
Link: https://arxiv.org/abs/2108.12224
Authors: Johannes Kopp, Dominik Kellner, Aldi Piroli, Klaus Dietmayer
Comments: To be published in IEEE 24th International Conference on Intelligent Transportation Systems (ITSC), Indianapolis, USA, 2021
Abstract: Automotive radar sensors output a lot of unwanted clutter or ghost detections, whose position and velocity do not correspond to any real object in the sensor's field of view. This poses a substantial challenge for environment perception methods like object detection or tracking. Especially problematic are clutter detections that occur in groups or at similar locations in multiple consecutive measurements. In this paper, a new algorithm for identifying such erroneous detections is presented. It is mainly based on the modeling of specific, commonly occurring wave propagation paths that lead to clutter. In particular, the three effects explicitly covered are reflections at the underbody of a car or truck, signals traveling back and forth between the vehicle on which the sensor is mounted and another object, and multipath propagation via specular reflection. The latter often occurs near guardrails, concrete walls or similar reflective surfaces. Each of these effects is described both theoretically and with regard to a method for identifying the corresponding clutter detections. Identification is done by analyzing detections generated from a single sensor measurement only. The final algorithm is evaluated on recordings of real extra-urban traffic. For labeling, a semi-automatic process is employed. The results are promising, both in terms of performance and the very low execution time. Typically, a large part of the clutter is found, while only a small fraction of detections corresponding to real objects is falsely classified by the algorithm.
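One of the modeled effects, multipath via specular reflection, can be sketched as a simple geometric rule: a ghost detection's mirror image about the reflective surface coincides with a real object. The toy check below (horizontal wall, hand-picked tolerance) is only loosely inspired by the paper's rules, not their actual formulation:

```python
def mirror_about_horizontal_line(p, y_wall):
    """Mirror a 2-D point about the horizontal line y = y_wall."""
    return (p[0], 2.0 * y_wall - p[1])

def is_multipath_ghost(candidate, confirmed, y_wall, tol=0.5):
    """Flag a candidate detection whose mirror image about the reflective
    surface coincides with a confirmed real object."""
    mx, my = mirror_about_horizontal_line(candidate, y_wall)
    return any((mx - x) ** 2 + (my - y) ** 2 < tol ** 2 for x, y in confirmed)

real = [(4.0, 3.0)]   # confirmed real object in front of a guardrail along y = 5
ghost = (4.0, 7.0)    # apparent detection "behind" the guardrail
print(is_multipath_ghost(ghost, real, y_wall=5.0))  # True
```

Because only geometry within a single measurement is examined, a rule like this runs in time linear in the number of detections, which is consistent with the low execution times reported.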
【4】 Rethinking the Aligned and Misaligned Features in One-stage Object Detection
Link: https://arxiv.org/abs/2108.12176
Authors: Yang Yang, Min Li, Bo Meng, Junxing Ren, Degang Sun, Zihao Huang
Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; Beijing Institute of Technology, Beijing, China
Abstract: One-stage object detectors rely on a point feature to predict detection results. However, the point feature may lack the information of the whole object, leading to a misalignment between the object and the point feature. Meanwhile, the classification and regression tasks are sensitive to different object regions, yet their features are spatially aligned. In this paper, we propose a simple plug-in operator that generates aligned and disentangled features for each task, respectively, without breaking the fully convolutional manner. By predicting two task-aware point sets located in each task's sensitive regions, this operator disentangles the two tasks along the spatial dimension and aligns the point feature with the object. We also reveal an interesting finding: long-range skip-connections have opposite effects on classification and regression. Based on the object-aligned and task-disentangled operator (OAT), we propose OAT-Net, which explicitly exploits point-set features for more accurate detection results. Extensive experiments on the MS-COCO dataset show that OAT can consistently boost different one-stage detectors by ~2 AP. Notably, OAT-Net achieves 53.7 AP with a Res2Net-101-DCN backbone and shows promising performance gains for small objects.
【5】 Anomaly Detection of Defect using Energy of Point Pattern Features within Random Finite Set Framework
Link: https://arxiv.org/abs/2108.12159
Authors: Ammar Mansoor Kamoona, Amirali Khodadadian Gostar, Alireza Bab-Hadiashar, Reza Hoseinnezhad
Affiliations: Royal Melbourne Institute of Technology
Comments: To be submitted to the TII journal, 17 pages
Abstract: In this paper, we propose an efficient approach for industrial defect detection, modeled as anomaly detection on point pattern data. Most recent works use global features for feature extraction to summarize image content. However, global features are not robust against lighting and viewpoint changes and do not describe the image's geometrical information well enough to be fully utilized in the manufacturing industry. To the best of our knowledge, we are the first to propose using transfer learning of local/point pattern features to overcome these limitations and capture the geometrical information of image regions. We model these local/point pattern features as a random finite set (RFS). In addition, we propose the RFS energy, in contrast to the RFS likelihood, as the anomaly score. The similarity distribution of the point pattern features of normal samples is modeled as a multivariate Gaussian. Parameter learning of the proposed RFS energy does not require any heavy computation. We evaluate the proposed approach on the MVTec AD dataset, a multi-object defect detection dataset. Experimental results show the outstanding performance of our proposed approach compared to state-of-the-art methods, and the proposed RFS energy outperforms the state of the art in few-shot learning settings.
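The spirit of the score can be sketched with a diagonal Gaussian fitted to the features of normal samples, summing standardized squared distances over a sample's point-pattern features. The authors' RFS energy is a different, set-based formulation; this is only an illustrative stand-in with synthetic features:

```python
import random

def fit_diag_gaussian(feats):
    """Per-dimension mean and variance of the normal samples' local features."""
    d, n = len(feats[0]), len(feats)
    mu = [sum(f[i] for f in feats) / n for i in range(d)]
    var = [sum((f[i] - mu[i]) ** 2 for f in feats) / n + 1e-6 for i in range(d)]
    return mu, var

def energy(points, mu, var):
    """Sum of standardized squared distances over a sample's point-pattern
    features; higher energy means a more anomalous sample."""
    return sum(sum((p[i] - mu[i]) ** 2 / var[i] for i in range(len(mu)))
               for p in points)

rng = random.Random(0)
normal = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]  # training features
mu, var = fit_diag_gaussian(normal)
ok = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]   # defect-free sample
bad = [[rng.gauss(5, 1) for _ in range(4)] for _ in range(10)]  # defective sample
print(energy(ok, mu, var) < energy(bad, mu, var))  # True
```

As in the paper, fitting only Gaussian parameters keeps the learning step cheap; scoring is a closed-form evaluation per sample.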
【6】 Densely-Populated Traffic Detection using YOLOv5 and Non-Maximum Suppression Ensembling
Link: https://arxiv.org/abs/2108.12118
Authors: Raian Rahman, Zadid Bin Azad, Md. Bakhtiar Hasan
Affiliations: Department of Computer Science and Engineering, Islamic University of Technology
Comments: 13 pages, 4 figures; conference: International Conference on Big Data, IoT and Machine Learning 2021 (BIM 2021)
Abstract: Vehicular object detection is the heart of any intelligent traffic system and is essential for urban traffic management. R-CNN, Fast R-CNN, Faster R-CNN and YOLO were some of the earlier state-of-the-art models. Region-based CNN methods suffer from high inference time, which makes them unrealistic to use in real time. YOLO, on the other hand, struggles to detect small objects that appear in groups. In this paper, we propose a method that can locate and classify vehicular objects in a given densely crowded image using YOLOv5. This shortcoming of YOLO was addressed by ensembling four different models. Our proposed model performs well on images taken from both the top view and side view of the street, in both day and night. Its performance was measured on the Dhaka AI dataset, which contains densely crowded vehicular images. Our experiments show that our model achieves an mAP@0.5 of 0.458 with an inference time of 0.75 s, outperforming other state-of-the-art models. Hence, the model can be deployed in the street for real-time traffic detection, which can be used for traffic control and data collection.
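Non-maximum suppression ensembling as described can be sketched by pooling the detections of several models and running greedy NMS over the union. The box format, scores and IoU threshold below are illustrative, not the paper's settings:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms_ensemble(model_outputs, iou_thr=0.5):
    """Pool (box, score) detections from several models, then apply greedy NMS."""
    pooled = sorted((d for out in model_outputs for d in out),
                    key=lambda d: d[1], reverse=True)
    keep = []
    for box, score in pooled:
        if all(iou(box, kb) < iou_thr for kb, _ in keep):
            keep.append((box, score))
    return keep

m1 = [((0, 0, 10, 10), 0.9)]                           # model 1's detections
m2 = [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]  # model 2's detections
merged = nms_ensemble([m1, m2])
print(len(merged))  # 2: the overlapping pair collapses to one box, the distant box stays
```

Pooling before suppression lets a detection missed by one model be recovered from another, which helps with the grouped small objects YOLO struggles with.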
【7】 Detection and Continual Learning of Novel Face Presentation Attacks
Link: https://arxiv.org/abs/2108.12081
Authors: Mohammad Rostami, Leonidas Spinoulas, Mohamed Hussein, Joe Mathai, Wael Abd-Almageed
Affiliations: USC Information Sciences Institute, Los Angeles, CA, USA; Alexandria University, Alexandria, Egypt
Abstract: Advances in deep learning, combined with the availability of large datasets, have led to impressive improvements in face presentation attack detection research. However, state-of-the-art face anti-spoofing systems are still vulnerable to novel types of attacks that are never seen during training. Moreover, even if such attacks are correctly detected, these systems lack the ability to adapt to newly encountered attacks. The post-training ability of continually detecting new types of attacks and self-adapting to identify these attack types, after the initial detection phase, is highly appealing. In this paper, we enable a deep neural network to detect anomalies in the observed input data points as potential new types of attacks by suppressing the confidence level of the network outside the training samples' distribution. We then use experience replay to update the model to incorporate knowledge about new types of attacks without forgetting the previously learned attack types. Experimental results demonstrate the effectiveness of the proposed method on two benchmark datasets as well as a newly introduced dataset that exhibits a large variety of attack types.
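Experience replay of past attack samples can be sketched with a reservoir-sampled buffer; mixing its samples into each update batch is what counteracts catastrophic forgetting. The buffer design below is a common pattern, not necessarily the authors' implementation:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer of past attack samples; replaying them alongside
    new data counteracts catastrophic forgetting."""
    def __init__(self, capacity, seed=0):
        self.capacity, self.buffer, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Reservoir sampling: keeps a uniform subsample of everything seen."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k):
        """Draw a mini-batch of stored samples to mix into the current update."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

buf = ReplayBuffer(capacity=100)
for i in range(1000):
    buf.add(("attack_type_A", i))
print(len(buf.buffer))      # 100
print(len(buf.sample(16)))  # 16
```

Reservoir sampling keeps memory bounded while remaining unbiased over the full stream of past attack types.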
【8】 Ultrafast Focus Detection for Automated Microscopy
Link: https://arxiv.org/abs/2108.12050
Authors: Maksim Levental, Ryan Chard, Gregg A. Wildenberg
Affiliations: University of Chicago; Argonne National Laboratory
Abstract: Recent advances in scientific instruments have resulted in a dramatic increase in the volumes and velocities of data being generated in everyday laboratories. Scanning electron microscopy is one such example, where technological advancements are now overwhelming scientists with critical data for montaging, alignment, and image segmentation -- key practices for many scientific domains, including, for example, neuroscience, where they are used to derive the anatomical relationships of the brain. These instruments now necessitate equally advanced computing resources and techniques to realize their full potential. Here we present a fast out-of-focus detection algorithm for electron microscopy images collected serially and demonstrate that it can be used to provide near-real-time quality control for neurology research. Our technique, Multi-scale Histologic Feature Detection, adapts classical computer vision techniques and is based on detecting various fine-grained histologic features. We further exploit the inherent parallelism in the technique by employing GPGPU primitives in order to accelerate characterization. Tests demonstrate near-real-time detection of out-of-focus conditions. We deploy these capabilities as a funcX function and show that they can be applied as data are collected using an automated pipeline. We discuss extensions that enable scaling out to support multi-beam microscopes and integration with existing focus systems for purposes of implementing auto-focus.
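A classical baseline for out-of-focus detection, in the same spirit as (though far simpler than) the multi-scale feature measure above, is the variance of the Laplacian response over an image tile:

```python
import random

def laplacian_variance(img):
    """Classical sharpness measure: variance of the Laplacian response;
    low values indicate an out-of-focus tile."""
    h, w = len(img), len(img[0])
    vals = [img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1] - 4 * img[i][j]
            for i in range(1, h - 1) for j in range(1, w - 1)]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

rng = random.Random(1)
sharp = [[rng.random() for _ in range(16)] for _ in range(16)]  # high-frequency content
flat = [[0.5] * 16 for _ in range(16)]                          # defocused-looking tile
print(laplacian_variance(flat))                              # 0.0
print(laplacian_variance(sharp) > laplacian_variance(flat))  # True
```

Defocus blur suppresses high spatial frequencies, so the Laplacian response collapses on blurry tiles; the per-pixel stencil is also trivially parallelizable, which is what the GPGPU acceleration mentioned above exploits.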
【9】 Anomaly Detection in Medical Imaging -- A Mini Review
Link: https://arxiv.org/abs/2108.11986
Authors: Maximilian E. Tschuchnig, Michael Gadermayr
Affiliations: Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Süd, Puch, Austria
Comments: Conference: iDSC2021
Abstract: The increasing digitization of medical imaging enables machine-learning-based improvements in detecting, visualizing and segmenting lesions, easing the workload for medical experts. However, supervised machine learning requires reliable labelled data, which is often difficult or impossible to collect, or at least time-consuming and thereby costly. Therefore, methods requiring only partly labeled data (semi-supervised) or no labeling at all (unsupervised methods) have been applied more regularly. Anomaly detection is one possible methodology that is able to leverage semi-supervised and unsupervised methods to handle medical imaging tasks like classification and segmentation. This paper uses a semi-exhaustive literature review of relevant anomaly detection papers in medical imaging to cluster the work into applications, highlight important results, establish lessons learned and give further advice on how to approach anomaly detection in medical imaging. The qualitative analysis is based on Google Scholar and four different search terms, resulting in 120 analysed papers. The main results show that current research is mostly motivated by reducing the need for labelled data. Also, the successful and substantial amount of research in the brain MRI domain shows the potential for applications in further domains like OCT and chest X-ray.
Classification | Recognition (2 papers)
【1】 Recognition Awareness: An Application of Latent Cognizance to Open-Set Recognition
Link: https://arxiv.org/abs/2108.12115
Authors: Tatpong Katanyukul, Pisit Nakjai
Affiliations: Computer Engineering, Khon Kaen University, Khon Kaen; Computer Science, Uttaradit Rajabhat University, Uttaradit, Thailand
Comments: 27 pages
Abstract: This study investigates an application of a new probabilistic interpretation of a softmax output to Open-Set Recognition (OSR). Softmax is a mechanism widely used in classification and object recognition. However, a softmax mechanism forces a model to operate under a closed-set paradigm, i.e., to predict an object class out of a set of pre-defined labels. This characteristic contributes to efficacy in classification, but poses a risk of nonsensical predictions in object recognition, which often operates under dynamic and diverse conditions. A foreign object -- an object of any unprepared class -- can be encountered at any time. OSR is intended to address the issue of identifying a foreign object in object recognition. Based on Bayes' theorem and an emphasis on conditioning on the context, softmax inference has been re-interpreted. This re-interpretation has led to a new approach to OSR, called Latent Cognizance (LC). Our investigation employs various scenarios, using the ImageNet 2012 dataset as well as fooling and open-set images. The findings support the LC hypothesis and show its effectiveness on OSR.
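A common baseline that grounds the closed-set-versus-open-set discussion above is to threshold the winning softmax probability and reject low-confidence inputs as foreign. This is the standard rejection rule, not the paper's Latent Cognizance method, which instead re-interprets the softmax itself:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_open_set(logits, threshold=0.7):
    """Reject as 'foreign' when the winning softmax probability is low."""
    p = softmax(logits)
    k = max(range(len(p)), key=lambda i: p[i])
    return k if p[k] >= threshold else -1  # -1 flags an unknown / foreign object

print(predict_open_set([5.0, 0.1, 0.2]))  # 0: confidently a known class
print(predict_open_set([1.0, 0.9, 1.1]))  # -1: ambiguous, rejected as foreign
```

A known weakness of this rule, which motivates work like LC, is that fooling images can produce high softmax confidence on nonsense inputs.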
【2】 Binocular Mutual Learning for Improving Few-shot Classification
Link: https://arxiv.org/abs/2108.12104
Authors: Ziqi Zhou, Xi Qiu, Jiangtao Xie, Jianan Wu, Chi Zhang
Affiliations: Megvii Technology; Dalian University of Technology
Comments: Accepted by ICCV 2021
Abstract: Most few-shot learning methods learn to transfer knowledge from datasets with abundant labeled data (i.e., the base set). From the perspective of the class space of the base set, existing methods either focus on utilizing all classes under a global view via normal pretraining, or pay more attention to an episodic manner that trains meta-tasks within few classes in a local view. However, the interaction of the two views is rarely explored. As the two views capture complementary information, we naturally consider their compatibility for achieving further performance gains. Inspired by the mutual learning paradigm and binocular parallax, we propose a unified framework, namely Binocular Mutual Learning (BML), which achieves the compatibility of the global view and the local view through both intra-view and cross-view modeling. Concretely, the global view learns in the whole class space to capture rich inter-class relationships. Meanwhile, the local view learns in the local class space within each episode, focusing on matching positive pairs correctly. In addition, cross-view mutual interaction further promotes collaborative learning and the implicit exploration of useful knowledge from each other. During meta-testing, binocular embeddings are aggregated together to support decision-making, which greatly improves classification accuracy. Extensive experiments conducted on multiple benchmarks, including cross-domain validation, confirm the effectiveness of our method.
Segmentation | Semantics (3 papers)
【1】 ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
Link: https://arxiv.org/abs/2108.12382
Authors: Zhenchao Jin, Bin Liu, Qi Chu, Nenghai Yu
Affiliations: CAS Key Laboratory of Electromagnetic Space Information, University of Science and Technology of China
Comments: Accepted by ICCV 2021
Abstract: Co-occurring visual patterns make aggregating contextual information a common paradigm for enhancing pixel representations in semantic image segmentation. Existing approaches focus on modeling the context from the perspective of the whole image, i.e., aggregating image-level contextual information. Though impressive, these methods weaken the significance of the pixel representations of the same category, i.e., the semantic-level contextual information. To address this, this paper proposes to augment the pixel representations by aggregating image-level and semantic-level contextual information, respectively. First, an image-level context module is designed to capture the contextual information for each pixel in the whole image. Second, we aggregate the representations of the same category for each pixel, where the category regions are learned under the supervision of the ground-truth segmentation. Third, we compute the similarities between each pixel representation and the image-level and semantic-level contextual information, respectively. Finally, a pixel representation is augmented by weighted aggregation of both the image-level and the semantic-level contextual information, with the similarities as the weights. Integrating the image-level and semantic-level context allows this paper to report state-of-the-art accuracy on four benchmarks, i.e., ADE20K, LIP, COCOStuff and Cityscapes.
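The similarity-weighted aggregation in the last two steps can be sketched for a single pixel. Cosine similarity and the tiny two-dimensional vectors below are illustrative simplifications of the paper's attention-style weighting:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)) + 1e-9
    return num / den

def augment_pixel(f, image_ctx, semantic_ctx):
    """Add image-level and semantic-level context to a pixel representation,
    each weighted by its similarity to that pixel."""
    wi, ws = cosine(f, image_ctx), cosine(f, semantic_ctx)
    return [fv + wi * iv + ws * sv for fv, iv, sv in zip(f, image_ctx, semantic_ctx)]

f = [1.0, 0.0]             # pixel representation
image_ctx = [0.5, 0.5]     # whole-image context vector
semantic_ctx = [1.0, 0.0]  # same-category (semantic-level) context vector
print([round(v, 3) for v in augment_pixel(f, image_ctx, semantic_ctx)])  # [2.354, 0.354]
```

The pixel is pulled most strongly toward the context it already resembles, here the semantic-level vector of its own category.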
【2】 Predicting Stable Configurations for Semantic Placement of Novel Objects
Link: https://arxiv.org/abs/2108.12062
Authors: Chris Paxton, Chris Xie, Tucker Hermans, Dieter Fox
Affiliations: Allen School of Computer Science & Engineering, USA; School of Computing, University of Utah
Abstract: Human environments contain numerous objects configured in a variety of arrangements. Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We break this problem down into two parts: (1) finding physically valid locations for the objects, and (2) determining whether those poses satisfy learned, high-level semantic relationships. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. We train our models purely in simulation, with no fine-tuning needed for use in the real world. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing. Our experiments, through a set of simulated ablations, demonstrate that using a relational classifier alone is not sufficient for reliable planning. We further demonstrate the ability of our planner to generate and execute diverse manipulation plans through a set of real-world experiments with a variety of objects.
【3】 CoCo DistillNet: a Cross-layer Correlation Distillation Network for Pathological Gastric Cancer Segmentation
Link: https://arxiv.org/abs/2108.12173
Authors: Wenxuan Zou, Muyi Sun
Affiliations: School of Automation, Beijing University of Posts and Telecommunications, Beijing, China; Center for Research on Intelligent Perception and Computing, NLPR, CASIA, Beijing, China
Abstract: In recent years, deep convolutional neural networks have made significant advances in pathology image segmentation. However, pathology image segmentation faces a dilemma: higher-performance networks generally require more computational resources and storage. Owing to the inherently high resolution of pathological images, this phenomenon limits the employment of high-accuracy networks in real scenes. To tackle this problem, we propose CoCo DistillNet, a novel Cross-layer Correlation (CoCo) knowledge distillation network for pathological gastric cancer segmentation. Knowledge distillation is a general technique that aims to improve the performance of a compact network through knowledge transfer from a cumbersome network. Concretely, our CoCo DistillNet models the correlations of channel-mixed spatial similarity between different layers and then transfers this knowledge from a pre-trained cumbersome teacher network to an untrained compact student network. In addition, we utilize an adversarial learning strategy to further promote the distillation procedure, called Adversarial Distillation (AD). Furthermore, to stabilize our training procedure, we make use of an unsupervised Paraphraser Module (PM) to boost knowledge paraphrasing in the teacher network. As a result, extensive experiments conducted on the Gastric Cancer Segmentation Dataset demonstrate the prominent ability of CoCo DistillNet, which achieves state-of-the-art performance.
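A rough sketch of cross-layer correlation distillation: pool each layer's feature map over channels, compute pairwise correlations between layers, and penalize the student's deviation from the teacher's correlations. The authors' channel-mixed spatial similarity is modeled here only loosely, on synthetic features:

```python
import math, random

def channel_pooled(feat):
    """Average a (channels x positions) feature list over channels."""
    c = len(feat)
    return [sum(feat[ch][p] for ch in range(c)) / c for p in range(len(feat[0]))]

def correlation(a, b):
    """Pearson correlation between two flattened spatial maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) + 1e-9
    return num / den

def cross_layer_distill_loss(teacher_layers, student_layers):
    """Squared difference between teacher and student cross-layer correlations."""
    t = [channel_pooled(f) for f in teacher_layers]
    s = [channel_pooled(f) for f in student_layers]
    return sum((correlation(t[i], t[j]) - correlation(s[i], s[j])) ** 2
               for i in range(len(t)) for j in range(i + 1, len(t)))

rng = random.Random(0)
make_layer = lambda ch: [[rng.gauss(0, 1) for _ in range(16)] for _ in range(ch)]
teacher = [make_layer(8) for _ in range(3)]  # 3 layers, 8 channels each
student = [make_layer(4) for _ in range(3)]  # compact student, 4 channels per layer
print(cross_layer_distill_loss(teacher, teacher))         # 0.0: identical correlations
print(cross_layer_distill_loss(teacher, student) >= 0.0)  # True
```

Because only correlation structure is matched, the student can have far fewer channels than the teacher, which is the point of distilling for high-resolution pathology inputs.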
Zero/Few-Shot | Transfer | Domain Adaptation (3 papers)
【1】 Continual learning under domain transfer with sparse synaptic bursting
Link: https://arxiv.org/abs/2108.12056
Authors: Shawn L. Beaulieu, Jeff Clune, Nick Cheney
Affiliations: Systems Center; Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
Abstract: Existing machines are functionally specific tools that were made for easy prediction and control. Tomorrow's machines may be closer to biological systems in their mutability, resilience, and autonomy. But first they must be capable of learning and retaining new information without repeated exposure to it. Past efforts to engineer such systems have sought to build or regulate artificial neural networks using task-specific modules with constrained circumstances of application. This has not yet enabled continual learning over long sequences of previously unseen data without corrupting existing knowledge: a problem known as catastrophic forgetting. In this paper, we introduce a system that can learn sequentially over previously unseen datasets (ImageNet, CIFAR-100) with little forgetting over time. This is accomplished by regulating the activity of weights in a convolutional neural network on the basis of inputs, using top-down modulation generated by a second feed-forward neural network. We find that our method learns continually under domain transfer, with sparse bursts of activity in weights that are recycled across tasks, rather than by maintaining task-specific modules. Sparse synaptic bursting is found to balance enhanced and diminished activity in a way that facilitates adaptation to new inputs without corrupting previously acquired functions. This behavior emerges during a prior meta-learning phase in which regulated synapses are selectively disinhibited, or grown, from an initial state of uniform suppression.
【2】 A Tutorial on Learning Disentangled Representations in the Imaging Domain 标题:成像领域解纠缠表示学习教程 链接:https://arxiv.org/abs/2108.12043
作者:Xiao Liu,Pedro Sanchez,Spyridon Thermos,Alison Q. O'Neil,Sotirios A. Tsaftaris 机构:Member, IEEE 备注:This paper follows a tutorial style but also surveys a considerable (200 citations) number of works 摘要:解纠缠表示学习是一种学习通用表示的方法。这可以在没有注释或注释有限的情况下完成。一个好的通用表示可以很容易地使用少量数据为新的目标任务进行微调,甚至可以直接用于未见过的领域并在相应任务中取得显著性能。这种数据和注释需求的缓解为计算机视觉和医疗保健中易于处理且价格合理的应用提供了诱人的前景。最后,解纠缠表示可以提供模型解释性,并可以帮助我们理解变异因素的潜在因果关系,从而提高其对实际部署的适用性。在这篇教程论文中,我们将概述解纠缠表示学习及其构建块和标准,并讨论其在计算机视觉和医学成像中的应用。在结束本教程时,我们将介绍将最新的机器学习进展集成到解纠缠中的已确定的机会,以及剩余的挑战。 摘要:Disentangled representation learning has been proposed as an approach to learning general representations. This can be done in the absence of, or with limited, annotations. A good general representation can be readily fine-tuned for new target tasks using modest amounts of data, or even be used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for tractable and affordable applications in computer vision and healthcare. Finally, disentangled representations can offer model explainability and can help us understand the underlying causal relations of the factors of variation, increasing their suitability for real-world deployment. In this tutorial paper, we will offer an overview of the disentangled representation learning, its building blocks and criteria, and discuss applications in computer vision and medical imaging. We conclude our tutorial by presenting the identified opportunities for the integration of recent machine learning advances into disentanglement, as well as the remaining challenges.
【3】 Learning Cross-modal Contrastive Features for Video Domain Adaptation 标题:用于视频域自适应的跨模态对比特征学习 链接:https://arxiv.org/abs/2108.11974
作者:Donghyun Kim,Yi-Hsuan Tsai,Bingbing Zhuang,Xiang Yu,Stan Sclaroff,Kate Saenko,Manmohan Chandraker 机构:Boston University,NEC Labs America,MIT-IBM Watson AI Lab 备注:Accepted in ICCV'21 摘要:从视频中学习可转移和领域自适应的特征表示对于动作识别等视频相关任务非常重要。现有的视频域自适应方法主要依赖于基于RGB图像空间的对抗性特征对齐。然而,视频数据通常与多模态信息(例如RGB和光流)相关联,因此设计一种更好的方法来考虑跨域自适应设置下的跨模态输入仍然是一个挑战。为此,我们提出了一个统一的视频域自适应框架,该框架同时规范了跨模态和跨域特征表示。具体来说,我们将一个领域中的每一种模式视为一个视图,并利用对比学习技术和适当设计的抽样策略。因此,我们的目标是规范化特征空间,这些特征空间最初缺乏跨模式的连接,或者跨领域的对齐较少。我们在领域自适应动作识别基准数据集(即UCF、HMDB和EPIC Kitchens)上进行了实验,并证明了我们的组件对最先进算法的有效性。 摘要:Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
半弱无监督|主动学习|不确定性(1篇)
【1】 MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving 标题:MultiSiam:自主驾驶的自监督多实例暹罗表示学习 链接:https://arxiv.org/abs/2108.12178
作者:Kai Chen,Lanqing Hong,Hang Xu,Zhenguo Li,Dit-Yan Yeung 机构:Hong Kong University of Science and Technology, Huawei Noah’s Ark Lab 备注:Accepted by ICCV 2021 摘要:多年来,自动驾驶引起了人们的广泛关注,但事实证明它比预期的更难,这可能是因为模型训练中难以收集标记数据。自监督学习(SSL)仅利用未标记数据进行表示学习,可能是提高模型性能的一种很有前途的方法。然而,现有的SSL方法通常依赖于单中心对象保证,这可能不适用于街道场景等多实例数据集。为了缓解这一限制,我们提出了两个需要解决的问题:(1)如何定义跨视图一致性的正样本;(2)如何在多实例情况下度量相似度。我们首先在随机裁剪过程中采用IoU阈值,将全局不一致性转化为局部一致性。然后,我们提出了两种特征对齐方法,使二维特征映射能够进行多实例相似性度量。此外,我们采用具有自注意力的图像内聚类方法进一步挖掘图像内的相似性和平移不变性。实验表明,当在Waymo数据集上进行预训练时,我们称之为多实例暹罗网络(MultiSiam)的方法显著提高了泛化能力,并在自动驾驶基准(包括Cityscapes和BDD100K)上实现了最先进的迁移性能,而现有的SSL方法(如MoCo、MoCo-v2和BYOL)则出现明显的性能下降。通过在大规模自动驾驶数据集SODA10M上进行预训练,MultiSiam超过了ImageNet预训练的MoCo-v2,展示了特定领域预训练的潜力。代码将发布于 https://github.com/KaiChen1998/MultiSiam。 摘要:Autonomous driving has attracted much attention over the years but turns out to be harder than expected, probably due to the difficulty of labeled data collection for model training. Self-supervised learning (SSL), which leverages unlabeled data only for representation learning, might be a promising way to improve model performance. Existing SSL methods, however, usually rely on the single-centric-object guarantee, which may not be applicable for multi-instance datasets such as street scenes. To alleviate this limitation, we raise two issues to solve: (1) how to define positive samples for cross-view consistency and (2) how to measure similarity in multi-instance circumstances. We first adopt an IoU threshold during random cropping to transfer global-inconsistency to local-consistency. Then, we propose two feature alignment methods to enable 2D feature maps for multi-instance similarity measurement. Additionally, we adopt intra-image clustering with self-attention for further mining intra-image similarity and translation-invariance.
Experiments show that, when pre-trained on Waymo dataset, our method called Multi-instance Siamese Network (MultiSiam) remarkably improves generalization ability and achieves state-of-the-art transfer performance on autonomous driving benchmarks, including Cityscapes and BDD100K, while existing SSL counterparts like MoCo, MoCo-v2, and BYOL show significant performance drop. By pre-training on SODA10M, a large-scale autonomous driving dataset, MultiSiam exceeds the ImageNet pre-trained MoCo-v2, demonstrating the potential of domain-specific pre-training. Code will be available at https://github.com/KaiChen1998/MultiSiam.
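摘要中“在随机裁剪过程中采用IoU阈值”的思路可以用如下纯Python草图示意(仅作说明,并非论文实现;`sample_crop_pair` 及其参数均为示意性假设):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def sample_crop_pair(width, height, crop, iou_min, rng, max_tries=100):
    """Rejection-sample two random square crops whose IoU exceeds a threshold,
    so that both augmented views share local content (the local-consistency idea)."""
    for _ in range(max_tries):
        ax, ay = rng.uniform(0, width - crop), rng.uniform(0, height - crop)
        bx, by = rng.uniform(0, width - crop), rng.uniform(0, height - crop)
        box_a = (ax, ay, ax + crop, ay + crop)
        box_b = (bx, by, bx + crop, by + crop)
        if iou(box_a, box_b) >= iou_min:
            return box_a, box_b
    return None
```

这样采出的两个裁剪区域保证有一定重叠,跨视图一致性约束因此只需在重叠的局部内容上成立。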
GAN|对抗|攻击|生成相关(1篇)
【1】 DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis 标题:DAE-GAN:用于文本到图像合成的动态方面感知GAN 链接:https://arxiv.org/abs/2108.12141
作者:Shulan Ruan,Yong Zhang,Kun Zhang,Yanbo Fan,Fan Tang,Qi Liu,Enhong Chen 机构:School of Computer Science and Technology, University of Science and Technology of China, Tencent AI Lab,Hefei University of Technology,Jilin University 备注:10 pages, 6 figures 摘要:文本到图像合成是指从给定的文本描述生成图像,其关键目标在于照片真实感和语义一致性。以前的方法通常通过句子嵌入生成初始图像,然后通过细粒度单词嵌入对其进行细化。尽管取得了重大进展,但文本中包含的“方面”信息(例如,红眼)通常被忽略。这里的“方面”指由若干个词(而非单个词)构成、描述“某事物的特定部分或特征”的信息,对合成图像细节非常有帮助。如何更好地利用文本到图像合成中的方面信息仍然是一个尚未解决的挑战。为了解决这一问题,本文提出了一种动态方面感知GAN(DAE-GAN),它从多粒度(包括句子级、单词级和方面级)全面表示文本信息。此外,受人类学习行为的启发,我们开发了一种新的用于图像细化的方面感知动态重画器(ADR),其中交替使用注意力全局细化(AGR)模块和方面感知局部细化(ALR)模块。 摘要:Text-to-image synthesis refers to generating an image from a given text description, the key goal of which lies in photo realism and semantic consistency. Previous methods usually generate an initial image with sentence embedding and then refine it with fine-grained word embedding. Despite the significant progress, the 'aspect' information (e.g., red eyes) contained in the text, referring to several words rather than a word that depicts 'a particular part or feature of something', is often ignored, which is highly helpful for synthesizing image details. How to make better utilization of aspect information in text-to-image synthesis still remains an unresolved challenge. To address this problem, in this paper, we propose a Dynamic Aspect-awarE GAN (DAE-GAN) that represents text information comprehensively from multiple granularities, including sentence-level, word-level, and aspect-level. Moreover, inspired by human learning behaviors, we develop a novel Aspect-aware Dynamic Re-drawer (ADR) for image refinement, in which an Attended Global Refinement (AGR) module and an Aspect-aware Local Refinement (ALR) module are alternately employed.
AGR utilizes word-level embedding to globally enhance the previously generated image, while ALR dynamically employs aspect-level embedding to refine image details from a local perspective. Finally, a corresponding matching loss function is designed to ensure the text-image semantic consistency at different levels. Extensive experiments on two well-studied and publicly available datasets (i.e., CUB-200 and COCO) demonstrate the superiority and rationality of our method.
人脸|人群计数(1篇)
【1】 DC-GNet: Deep Mesh Relation Capturing Graph Convolution Network for 3D Human Shape Reconstruction 标题:DC-GNet:用于三维人体形状重建的深度网格关系捕获图卷积网络 链接:https://arxiv.org/abs/2108.12384
作者:Shihao Zhou,Mengxi Jiang,Shanshan Cai,Yunqi Lei 机构:Department of Computer Science, School of Informatics, Xiamen University, Xiamen, Fujian Province, China 备注:Accepted by ACM MM'21 (oral) 摘要:在本文中,我们的目标是重建一个完整的三维人体形状从一个单一的图像。以前的顶点级和参数回归方法基于预定义的邻接矩阵重建三维人体形状,以编码节点之间的正关系。三维人体表面的深层拓扑关系没有被仔细利用。此外,在处理真实场景中更多遮挡情况时,大多数现有方法的性能往往会受到域间隙的影响。在这项工作中,我们提出了一种深网格关系捕获图卷积网络,DC-GNet,具有用于三维人体形状重建的形状完成任务。首先,我们提出捕捉网格顶点内的深层关系,其中引入了一种同时编码正关系和负关系的自适应矩阵。其次,我们提出了一个形状完成任务来学习各种遮挡情况的先验知识。我们的方法从更遥远区域的节点之间更微妙的关系编码网格结构。此外,我们的形状完成模块缓解了室外场景中的性能下降问题。在多个基准上的大量实验表明,我们的方法优于以前的三维人体姿势和形状估计方法。 摘要:In this paper, we aim to reconstruct a full 3D human shape from a single image. Previous vertex-level and parameter regression approaches reconstruct 3D human shape based on a pre-defined adjacency matrix to encode positive relations between nodes. The deep topological relations for the surface of the 3D human body are not carefully exploited. Moreover, the performance of most existing approaches often suffer from domain gap when handling more occlusion cases in real-world scenes. In this work, we propose a Deep Mesh Relation Capturing Graph Convolution Network, DC-GNet, with a shape completion task for 3D human shape reconstruction. Firstly, we propose to capture deep relations within mesh vertices, where an adaptive matrix encoding both positive and negative relations is introduced. Secondly, we propose a shape completion task to learn prior about various kinds of occlusion cases. Our approach encodes mesh structure from more subtle relations between nodes in a more distant region. Furthermore, our shape completion module alleviates the performance degradation issue in the outdoor scene. Extensive experiments on several benchmarks show that our approach outperforms the previous 3D human pose and shape estimation approaches.
图像视频检索|Re-id相关(1篇)
【1】 An Automatic Image Content Retrieval Method for better Mobile Device Display User Experiences 标题:一种改善移动设备显示用户体验的自动图像内容检索方法 链接:https://arxiv.org/abs/2108.12068
作者:Alessandro Bruno 机构:Department of Computing and Informatics at Bournemouth University, Fern Barrow, Poole, Dorset, BH,BB, United Kingdom 备注:5 pages, 5 figures 摘要:越来越多的商用手机配备了集成的高分辨率数码相机。这为图像分析提供了一类新的专用应用,如移动视觉搜索、图像裁剪、目标检测、基于内容的图像检索、图像分类。本文提出了一种用于移动设备显示的图像内容检索和分类的移动应用程序,以丰富用户的视觉体验。移动应用程序可以基于图像的内容,通过视觉显著性方法提取一定数量的图像,该视觉显著性方法旨在从感知角度检测给定图像中的最关键区域。首先,使用2D显著性函数的局部极大值从感知角度提取最关键的区域。接下来,使用以图像阈值化显著性图的局部极大值为中心的边界框裁剪显著区域。然后,将每个图像裁剪块输入基于SVM和SIFT描述符的图像分类系统,以检测图像中存在的对象类别。使用ImageNet存储库作为语义类别分类的参考。Android平台用于在客户端-服务器架构上实现移动应用程序。 摘要:A growing number of commercially available mobile phones come with integrated high-resolution digital cameras. That enables a new class of dedicated applications to image analysis such as mobile visual search, image cropping, object detection, content-based image retrieval, image classification. In this paper, a new mobile application for image content retrieval and classification for mobile device display is proposed to enrich the visual experience of users. The mobile application can extract a certain number of images based on the content of an image with visual saliency methods aiming at detecting the most critical regions in a given image from a perceptual viewpoint. First, the most critical areas from a perceptual perspective are extracted using the local maxima of a 2D saliency function. Next, a salient region is cropped using the bounding box centred on the local maxima of the thresholded Saliency Map of the image. Then, each image crop is fed into an Image Classification system based on SVM and SIFT descriptors to detect the class of object present in the image. ImageNet repository was used as the reference for semantic category classification. Android platform was used to implement the mobile application on a client-server architecture.
A mobile client sends the photo taken by the camera to the server, which processes the image and returns the results (image contents such as image crops and related target classes) to the mobile client. The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.
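摘要中“以阈值化显著性图的局部极大值为中心裁剪显著区域”的步骤可以示意如下(纯Python草图,非论文代码;函数名与8邻域定义均为示意性假设):

```python
def local_maxima(saliency, threshold):
    """Return (row, col) positions that exceed `threshold` and are strictly
    greater than all of their 8-connected neighbours in a 2D saliency grid."""
    rows, cols = len(saliency), len(saliency[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = saliency[r][c]
            if v < threshold:
                continue
            neighbours = [saliency[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks

def crop_box(peak, half, rows, cols):
    """Bounding box of fixed half-size centred on a peak, clipped to the image."""
    r, c = peak
    return (max(0, r - half), max(0, c - half),
            min(rows - 1, r + half), min(cols - 1, c + half))
```

每个返回的边界框即一个候选裁剪块,可再送入后续的分类系统。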
超分辨率|去噪|去模糊|去雾(1篇)
【1】 Deep Denoising Method for Side Scan Sonar Images without High-quality Reference Data 标题:无高质量参考数据的侧扫声纳图像深度去噪方法 链接:https://arxiv.org/abs/2108.12083
作者:Xiaoteng Zhou,Changli Yu,Xin Yuan,Citong Luo 机构:School of Ocean Engineering, Harbin Institute of Technology, Weihai, China 摘要:侧扫声纳(SSS)测量的水下图像是自主式水下机器人(AUV)深海探测过程中必不可少的视觉数据。它们可以生动地反映海底的地形,但通常伴随着复杂而严重的噪音。本文提出了一种针对无高质量参考数据的SSS图像的深度去噪方法,该方法使用单幅含噪SSS图像进行自监督去噪。与经典的人工设计滤波器相比,深度去噪方法具有明显的优势。对真实海底SSS图像进行了去噪实验,结果表明,该方法能够有效地降低SSS图像的噪声,同时最大限度地减少图像质量和细节的损失。 摘要:Subsea images measured by the side scan sonars (SSSs) are necessary visual data in the process of deep-sea exploration by using the autonomous underwater vehicles (AUVs). They could vividly reflect the topography of the seabed, but usually accompanied by complex and severe noise. This paper proposes a deep denoising method for SSS images without high-quality reference data, which uses one single noise SSS image to perform self-supervised denoising. Compared with the classical artificially designed filters, the deep denoising method shows obvious advantages. The denoising experiments are performed on the real seabed SSS images, and the results demonstrate that our proposed method could effectively reduce the noise on the SSS image while minimizing the loss of image quality and detail.
3D|3D重建等相关(1篇)
【1】 A Novel Hierarchical Light Field Coding Scheme Based on Hybrid Stacked Multiplicative Layers and Fourier Disparity Layers for Glasses-Free 3D Displays 标题:一种新的基于混合叠加乘层和傅里叶视差层的免眼镜3D显示器分层光场编码方案 链接:https://arxiv.org/abs/2108.12399
作者:Joshitha Ravishankar,Mansi Sharma 机构:Indian Institute of Technology Madras, Chennai , India. 摘要:提出了一种基于低秩乘法层和傅里叶视差层透射模式的光场分层编码方案。该方案利用卷积神经网络对不同扫描顺序的光场视图子集进行优化,识别出乘法层。我们的方法利用了从不同扫描模式子集获得的乘法层中隐藏的低秩结构。通过在Krylov子空间的不同秩上执行低秩近似,可以有效地去除乘法层中的空间冗余。近似层之间的视图内和视图间冗余通过HEVC编码进一步去除。接下来,基于所选择的层次顺序,从近似光场的第一子集构造傅里叶视差层表示。随后的视图子集是通过对傅里叶视差层建模合成的,傅里叶视差层以更高的精度迭代地细化表示。所提出的混合分层表示和编码方案的关键优势在于,它不仅利用了光场中的空间和时间冗余,而且有效地利用了相邻子孔径图像在水平和垂直方向上的内在相似性,如不同预测顺序所指定的。 摘要:This paper presents a novel hierarchical coding scheme for light fields based on transmittance patterns of low-rank multiplicative layers and Fourier disparity layers. The proposed scheme identifies multiplicative layers of light field view subsets optimized using a convolutional neural network for different scanning orders. Our approach exploits the hidden low-rank structure in the multiplicative layers obtained from the subsets of different scanning patterns. The spatial redundancies in the multiplicative layers can be efficiently removed by performing low-rank approximation at different ranks on the Krylov subspace. The intra-view and inter-view redundancies between approximated layers are further removed by HEVC encoding. Next, a Fourier disparity layer representation is constructed from the first subset of the approximated light field based on the chosen hierarchical order. Subsequent view subsets are synthesized by modeling the Fourier disparity layers that iteratively refine the representation with improved accuracy. The critical advantage of the proposed hybrid layered representation and coding scheme is that it utilizes not just spatial and temporal redundancies in light fields but efficiently exploits intrinsic similarities among neighboring sub-aperture images in both horizontal and vertical directions as specified by different prediction orders.
In addition, the scheme is flexible to realize a range of multiple bitrates at the decoder within a single integrated system. The compression performance of the proposed scheme is analyzed on real light fields. We achieved substantial bitrate savings and maintained good light field reconstruction quality.
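方案核心的低秩截断思想可以用一个秩1幂迭代草图说明(纯Python示意;论文实际是在Krylov子空间上对多个秩做低秩近似,并非此简化实现):

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rank1_approx(M, iters=100):
    """Dominant rank-1 approximation of M via power iteration on M^T M.
    Returns sigma * u v^T as a matrix: the closest rank-1 matrix to M,
    which is the simplest instance of low-rank truncation."""
    n = len(M[0])
    v = [1.0] * n
    Mt = transpose(M)
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))      # one power-iteration step on M^T M
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]         # converges to top right-singular vector
    u = matvec(M, v)                      # u = M v, with |u| = top singular value
    return [[u[i] * v[j] for j in range(n)] for i in range(len(M))]
```

保留前k个这样的奇异分量、丢弃其余分量,就是摘要所说“去除乘法层中的空间冗余”的数学机制。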
其他神经网络|深度学习|模型|建模(1篇)
【1】 Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process 标题:基于知识驱动Dirichlet过程的终身无限混合模型 链接:https://arxiv.org/abs/2108.12278
作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted by International Conference on Computer Vision (ICCV 2021) 摘要:最近的终身学习研究提出通过扩展模型混合体来适应不断增加的任务数量。所提出的方法在克服灾难性遗忘方面显示了良好的效果。然而,这些成功模型背后的理论仍然没有得到很好的理解。在本文中,我们通过基于模型生成的数据的概率表示与目标数据集对应的概率表示之间的差异距离来推导风险边界,从而对终身学习模型进行理论分析。受理论分析的启发,我们引入了一种新的终身学习方法,即终身无限混合(LIMix)模型,该模型可以自动扩展其网络结构或选择适当的组件来调整其参数以学习新任务,同时保留其先前学习的信息。我们建议通过Dirichlet过程,通过使用门控机制来合并知识,门控机制计算先前学习并存储在每个组件中的知识与新数据集之间的依赖关系。此外,我们还训练了一个紧凑的学生模型,该模型可以随着时间的推移积累跨域表示并进行快速推断。代码可在 https://github.com/dtuzi123/Lifelong-infinite-mixture-model 获取。 摘要:Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks. The proposed methodology shows promising results in overcoming catastrophic forgetting. However, the theory behind these successful models is still not well understood. In this paper, we perform the theoretical analysis for lifelong learning models by deriving the risk bounds based on the discrepancy distance between the probabilistic representation of data generated by the model and that corresponding to the target dataset. Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model, which can automatically expand its network architectures or choose an appropriate component to adapt its parameters for learning a new task, while preserving its previously learnt information. We propose to incorporate the knowledge by means of Dirichlet processes by using a gating mechanism which computes the dependence between the knowledge learnt previously and stored in each component, and a new set of data. Besides, we train a compact Student model which can accumulate cross-domain representations over time and make quick inferences. The code is available at https://github.com/dtuzi123/Lifelong-infinite-mixture-model.
其他(8篇)
【1】 Stop Throwing Away Discriminators! Re-using Adversaries for Test-Time Training 标题:不要丢弃鉴别器!复用对抗鉴别器进行测试时训练 链接:https://arxiv.org/abs/2108.12280
作者:Gabriele Valvano,Andrea Leo,Sotirios A. Tsaftaris 机构:IMT School for Advanced Studies Lucca, Lucca , LU, Italy, School of Engineering, University of Edinburgh, Edinburgh EH,FB, UK 备注:Accepted at: Domain Adaptation and Representation Transfer (DART) 2021 摘要:由于生成性对抗网络(GAN)能够在不需要成对数据的情况下学习数据分布,因此它已成为许多计算机视觉方法的一个组成部分,包括为医学图像分割而开发的方法。这些方法联合训练分割器和对抗性掩模鉴别器,后者提供数据驱动的形状先验。在推断时,将丢弃鉴别器,并且仅使用分割器来预测测试图像上的标签图。但我们是否应该丢弃鉴别器?在这里,我们认为对抗性鉴别器的生命周期不应在训练后结束。相反,训练稳定的GANs会产生强大的形状先验,我们可以利用它在推理时纠正分割器的错误。为了实现这一点,我们开发了稳定的掩模鉴别器,既不会过拟合,也不会发生灾难性遗忘。在测试时,我们在每个单独的测试实例上微调分割器,直到它满足预先学习到的形状先验。我们的方法实现简单,提高了模型性能。此外,它为在推理中重新使用掩码鉴别器开辟了新的方向。我们在 https://vios-s.github.io/adversarial-test-time-training 发布了实验所用代码。 摘要:Thanks to their ability to learn data distributions without requiring paired data, Generative Adversarial Networks (GANs) have become an integral part of many computer vision methods, including those developed for medical image segmentation. These methods jointly train a segmentor and an adversarial mask discriminator, which provides a data-driven shape prior. At inference, the discriminator is discarded, and only the segmentor is used to predict label maps on test images. But should we discard the discriminator? Here, we argue that the life cycle of adversarial discriminators should not end after training. On the contrary, training stable GANs produces powerful shape priors that we can use to correct segmentor mistakes at inference. To achieve this, we develop stable mask discriminators that do not overfit or catastrophically forget. At test time, we fine-tune the segmentor on each individual test instance until it satisfies the learned shape prior. Our method is simple to implement and increases model performance. Moreover, it opens new directions for re-using mask discriminators at inference. We release the code used for the experiments at https://vios-s.github.io/adversarial-test-time-training.
【2】 TIMo -- A Dataset for Indoor Building Monitoring with a Time-of-Flight Camera 标题:TIMO--一种带飞行时间相机的室内建筑监测数据集 链接:https://arxiv.org/abs/2108.12196
作者:Pascal Schneider,Yuriy Anisimov,Raisul Islam,Bruno Mirbach,Jason Rambach,Frédéric Grandidier,Didier Stricker 摘要:我们介绍了TIMo(飞行时间室内监控),这是一个使用飞行时间(ToF)摄像机捕获的基于视频的室内空间监控数据集。由此产生的深度视频让人们执行一组不同的预定义动作,我们提供了详细的注释。用于人员计数的人员检测和异常检测是两个目标应用。大多数现有的监控视频数据集提供灰度或RGB视频。另一方面,深度信息在这类数据集中仍然很少见,尽管它在计算机视觉的其他研究领域非常流行和普遍。我们的数据集解决了监控视频数据集领域的这一差距。这些记录发生在两个不同的位置,ToF摄像机设置为自上而下或倾斜视角。该数据集可在以下位置公开获取:https://vizta-tof.kl.dfki.de/timo-dataset-overview/. 摘要:We present TIMo (Time-of-flight Indoor Monitoring), a dataset for video-based monitoring of indoor spaces captured using a time-of-flight (ToF) camera. The resulting depth videos feature people performing a set of different predefined actions, for which we provide detailed annotations. Person detection for people counting and anomaly detection are the two targeted applications. Most existing surveillance video datasets provide either grayscale or RGB videos. Depth information, on the other hand, is still a rarity in this class of datasets in spite of being popular and much more common in other research fields within computer vision. Our dataset addresses this gap in the landscape of surveillance video datasets. The recordings took place at two different locations with the ToF camera set up either in a top-down or a tilted perspective on the scene. The dataset is publicly available at https://vizta-tof.kl.dfki.de/timo-dataset-overview/.
【3】 LassoLayer: Nonlinear Feature Selection by Switching One-to-one Links 标题:LassoLayer:一对一链路切换的非线性特征选择 链接:https://arxiv.org/abs/2108.12165
作者:Akihito Sudo,Teng Teck Hou,Masaki Yamaguchi,Yoshinori Tone 机构:Shizuoka University, Japan, ST Engineering Ltd., Singapore, JAVIS CO., LTD., Vietnam. 摘要:随着人们对解决更复杂问题的渴望,特征选择方法变得越来越重要。特征选择方法可分为包装方法、过滤方法和嵌入方法。Lasso作为一种强大的嵌入式特征选择方法,引起了众多研究者的关注。然而,作为一种线性方法,Lasso的适用性受到限制。在这项工作中,我们提出了LassoLayer,它是一对一连接的,并通过L1优化进行训练,从而去掉对预测不必要的单元。对于非线性特征选择,我们构建了LassoMLP:配备LassoLayer作为第一层的网络。因为我们可以在任何网络结构中插入LassoLayer,所以它可以利用神经网络的优势,适用于需要特征选择的任务。我们通过回归和分类任务评估LassoMLP在特征选择中的作用。LassoMLP的输入特征中包含大量噪声因素,这些因素容易导致过拟合。在使用MNIST数据集的实验中,我们确认LassoMLP优于最先进的方法。 摘要:Along with the desire to address more complex problems, feature selection methods have gained in importance. Feature selection methods can be classified into wrapper method, filter method, and embedded method. Being a powerful embedded feature selection method, Lasso has attracted the attention of many researchers. However, as a linear approach, the applicability of Lasso has been limited. In this work, we propose LassoLayer that is one-to-one connected and trained by L1 optimization, which works to drop out unnecessary units for prediction. For nonlinear feature selections, we build LassoMLP: the network equipped with LassoLayer as its first layer. Because we can insert LassoLayer in any network structure, it can harness the strength of neural network suitable for tasks where feature selection is needed. We evaluate LassoMLP in feature selection with regression and classification tasks. LassoMLP receives features including considerable numbers of noisy factors that are harmful for overfitting. In the experiments using MNIST dataset, we confirm that LassoMLP outperforms the state-of-the-art method.
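LassoLayer依靠L1优化把一对一连接的权重置零、从而丢弃不必要的输入单元,其背后的机制可以用L1的近端算子(软阈值)示意(纯Python草图,非论文训练代码;`sparsify` 为示意性命名):

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrinks x toward zero and sets it
    exactly to zero when |x| <= lam. This hard zeroing is why L1
    optimization drops units in a Lasso-style layer."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparsify(weights, lam):
    """Apply the shrinkage element-wise to one-to-one connection weights;
    weights driven to zero correspond to dropped (unselected) features."""
    return [soft_threshold(w, lam) for w in weights]
```

对一对一权重反复施加这种收缩,较小的权重会被精确压为零,对应的输入特征即被“关断”。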
【4】 A Matching Algorithm based on Image Attribute Transfer and Local Features for Underwater Acoustic and Optical Images 标题:一种基于图像属性传递和局部特征的水下声光图像匹配算法 链接:https://arxiv.org/abs/2108.12151
作者:Xiaoteng Zhou,Changli Yu,Xin Yuan,Citong Luo 机构:School of Ocean Engineering, Harbin Institute of Technology, Weihai, China 摘要:在水下视觉研究领域,声纳传感器与光学相机之间的图像匹配一直是一个具有挑战性的问题。由于它们在成像机理上的差异,即声学图像和光学图像的灰度值、纹理、对比度等在局部位置上也是不同的,这使得传统的基于光学图像的匹配方法失效。加上水下数据采集的难度和成本高,进一步影响了声光数据融合技术的研究进程。为了最大限度地利用水下传感器数据,促进多传感器信息融合(MSIF)的发展,本研究采用基于深度学习的图像属性转移方法来解决声光图像匹配问题,其核心是尽可能地消除它们之间的成像差异。同时,引入了先进的局部特征描述子来解决具有挑战性的声光匹配问题。实验结果表明,该方法能有效地对声光图像进行预处理,获得准确的匹配结果。 摘要:In the field of underwater vision research, image matching between the sonar sensors and optical cameras has always been a challenging problem. Due to the difference in the imaging mechanisms between them, the gray values, texture, contrast, etc. of the acoustic and optical images also vary at local locations, which makes traditional matching methods based on optical images invalid. Coupled with the difficulties and high costs of underwater data acquisition, it further affects the research process of acousto-optic data fusion technology. In order to maximize the use of underwater sensor data and promote the development of multi-sensor information fusion (MSIF), this study applies the image attribute transfer method based on deep learning approach to solve the problem of acousto-optic image matching, the core of which is to eliminate the imaging differences between them as much as possible. At the same time, the advanced local feature descriptor is introduced to solve the challenging acousto-optic matching problem. Experimental results show that our proposed method could preprocess acousto-optic images effectively and obtain accurate matching results.
Additionally, the method is based on the combination of image depth semantic layer, and it could indirectly display the local feature matching relationship between original image pair, which provides a new solution to the underwater multi-sensor image matching problem.
【5】 FOVEA: Foveated Image Magnification for Autonomous Navigation 标题:FOVEA:用于自主导航的中央凹式图像放大 链接:https://arxiv.org/abs/2108.12102
作者:Chittesh Thavamani,Mengtian Li,Nicolas Cebron,Deva Ramanan 机构:Carnegie Mellon University, Argo AI 备注:ICCV 2021. Code can be found on the project page at this https URL 摘要:高效处理高分辨率视频流对于许多机器人应用(如自动驾驶)来说是安全关键的。图像下采样是确保满足延迟约束的常用技术。然而,这种朴素的方法极大地限制了目标检测器识别小目标的能力。在本文中,我们提出了一种注意力方法,在保持较小输入画布的同时,弹性地放大某些区域。放大区域是那些被认为具有包含对象的高概率的区域,其信号可以来自数据集范围的先验或根据最近的对象预测计算的帧级先验。通过基于KDE的映射实现放大,将边界框转换为扭曲参数,然后将扭曲参数输入具有反裁剪正则化的图像采样器。然后将扭曲的图像馈入检测器,并应用可微逆映射来获得原始空间中的边界框输出。我们的区域放大使算法能够更好地利用高分辨率输入,而不会产生高分辨率处理的成本。在自动驾驶数据集Argoverse-HD和BDD100K上,我们展示了所提方法在有无微调的情况下均相比标准Faster R-CNN提升了检测AP。 摘要:Efficient processing of high-resolution video streams is safety-critical for many robotics applications such as autonomous driving. Image downsampling is a commonly adopted technique to ensure the latency constraint is met. However, this naive approach greatly restricts an object detector's capability to identify small objects. In this paper, we propose an attentional approach that elastically magnifies certain regions while maintaining a small input canvas. The magnified regions are those that are believed to have a high probability of containing an object, whose signal can come from a dataset-wide prior or frame-level prior computed from recent object predictions. The magnification is implemented by a KDE-based mapping to transform the bounding boxes into warping parameters, which are then fed into an image sampler with anti-cropping regularization. The detector is then fed with the warped image and we apply a differentiable backward mapping to get bounding box outputs in the original space. Our regional magnification allows algorithms to make better use of high-resolution input without incurring the cost of high-resolution processing. On the autonomous driving datasets Argoverse-HD and BDD100K, we show our proposed method boosts the detection AP over standard Faster R-CNN, with and without finetuning.
此外,我们的方法建立在先前最先进的流式检测技术的基础上,在Argoverse-HD上为流式检测AP创造了新纪录(在GTX 1080 Ti GPU上从17.8提升到23.0),这表明它实现了更优的准确率与延迟权衡。 摘要:Additionally, building on top of the previous state-of-the-art in streaming detection, our method sets a new record for streaming AP on Argoverse-HD (from 17.8 to 23.0 on a GTX 1080 Ti GPU), suggesting that it has achieved a superior accuracy-latency tradeoff.
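FOVEA“给显著区域分配更多采样”的放大思想可用一维示意说明(纯Python草图:对归一化显著性密度做逆CDF采样;论文实际在二维上用基于KDE的可微映射实现,此处仅为简化类比):

```python
def warp_coordinates(saliency, n_out):
    """Monotone resampling map driven by saliency: output samples are placed
    by inverting the CDF of the normalized saliency density, so salient
    input bins receive proportionally more output samples."""
    total = float(sum(saliency))
    cdf, acc = [], 0.0
    for s in saliency:
        acc += s / total
        cdf.append(acc)                    # cdf[i] = mass of bins 0..i
    coords = []
    for k in range(n_out):
        t = (k + 0.5) / n_out              # uniform targets in (0, 1)
        i = next((j for j, c in enumerate(cdf) if c >= t), len(cdf) - 1)
        coords.append(i)                   # input bin supplying output sample k
    return coords
```

显著性为4的两个中间区间各获得4个输出样本,而显著性为1的区间各只获得1个,同时映射保持单调,不会打乱空间顺序。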
【6】 Matching Underwater Sonar Images by the Learned Descriptor Based on Style Transfer Method 标题:基于风格转移的学习描述子匹配水下声纳图像 链接:https://arxiv.org/abs/2108.12072
作者:Xiaoteng Zhou,Changli Yu,Xin Yuan,Citong Luo 机构:School of Ocean Engineering, Harbin Institute of Technology, Weihai, China 摘要:本文提出了一种将风格迁移技术与学习描述符相结合的方法来提高水下声纳图像的匹配性能。在水下视觉领域,声纳是目前最有效的远程探测传感器,它在地图绘制和目标搜索任务中具有优异的性能。然而,传统的图像匹配算法都是基于光学图像开发的。为了解决这一矛盾,本文采用风格迁移方法将声纳图像转换为光学风格,同时引入对声纳图像匹配具有良好表达能力的学习描述子。实验表明,该方法显著提高了声纳图像的匹配质量。此外,还为利用风格迁移方法对水下声纳图像进行预处理提供了新的思路。 摘要:This paper proposes a method that combines the style transfer technique and the learned descriptor to enhance the matching performances of underwater sonar images. In the field of underwater vision, sonar is currently the most effective long-distance detection sensor, it has excellent performances in map building and target search tasks. However, the traditional image matching algorithms are all developed based on optical images. In order to solve this contradiction, the style transfer method is used to convert the sonar images into optical styles, and at the same time, the learned descriptor with excellent expressiveness for sonar images matching is introduced. Experiments show that this method significantly enhances the matching quality of sonar images. In addition, it also provides new ideas for the preprocessing of underwater sonar images by using the style transfer approach.
【7】 Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers 标题:Drop-DTW:丢弃离群值的同时对齐序列间的公共信号 链接:https://arxiv.org/abs/2108.11996
作者:Nikita Dvornik,Isma Hadji,Konstantinos G. Derpanis,Animesh Garg,Allan D. Jepson 机构:Samsung AI Centre Toronto, University of Toronto 摘要:在这项工作中,我们考虑含有异常值的信号的序列到序列对齐问题。假设没有异常值,标准动态时间规整(DTW)算法可以高效计算两个(通常)可变长度序列之间的最佳对齐。虽然DTW对信号的时间偏移和膨胀具有鲁棒性,但当序列中任意散布着异常值时,它无法以有意义的方式对齐序列。为了解决这个问题,我们引入了Drop-DTW,这是一种新的算法,可以在序列之间对齐公共信号,同时自动从匹配中剔除异常元素。整个过程以单个高效且完全可微的动态规划实现。在我们的实验中,我们证明了Drop-DTW是一种用于序列检索的鲁棒相似性度量,并证明了它在不同应用中作为训练损失的有效性。通过Drop-DTW,我们解决了教学视频中的时间步骤定位、噪声视频的表征学习以及用于视听检索和定位的跨模态表征学习。在所有应用中,我们都采用弱监督或无监督的方法,并在这些设置下展示最先进的结果。 摘要:In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way in the presence of outliers that can be arbitrarily interspersed in the sequences. To address this problem, we introduce Drop-DTW, a novel algorithm that aligns the common signal between the sequences while automatically dropping the outlier elements from the matching. The entire procedure is implemented as a single dynamic program that is efficient and fully differentiable. In our experiments, we show that Drop-DTW is a robust similarity measure for sequence retrieval and demonstrate its effectiveness as a training loss on diverse applications. With Drop-DTW, we address temporal step localization on instructional videos, representation learning from noisy videos, and cross-modal representation learning for audio-visual retrieval and localization. In all applications, we take a weakly- or unsupervised approach and demonstrate state-of-the-art results under these settings.
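Drop-DTW所扩展的标准DTW动态规划可以用如下纯Python草图示意(仅作说明,非论文代码;Drop-DTW在此递推表上额外引入带有界代价的“丢弃”转移,此处未实现):

```python
def dtw(seq_a, seq_b, cost=lambda a, b: abs(a - b)):
    """Classic DTW distance via dynamic programming: dp[i][j] is the best
    alignment cost of seq_a[:i] with seq_b[:j]. Each cell extends the
    alignment by matching, or by advancing one sequence (repeat-match)."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(dp[i - 1][j - 1],   # match both elements
                       dp[i - 1][j],       # advance seq_a only
                       dp[i][j - 1])       # advance seq_b only
            dp[i][j] = cost(seq_a[i - 1], seq_b[j - 1]) + step
    return dp[n][m]
```

在这一递推中每个元素都必须被匹配,因此异常值会被强行对齐;Drop-DTW的改动正是允许以一定代价跳过(丢弃)某些元素。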
【8】 SynthIA: A Synthetic Inversion Approximation for the Stokes Vector Fusing SDO and Hinode into a Virtual Observatory 标题:SynthIA:融合SDO与Hinode构建虚拟天文台的Stokes矢量合成反演近似 链接:https://arxiv.org/abs/2108.12421
作者:Richard E. L. Higgins,David F. Fouhey,Spiro K. Antiochos,Graham Barnes,Mark C. M. Cheung,J. Todd Hoeksema,KD Leka,Yang Liu,Peter W. Schuck,Tamas I. Gombosi 机构:Mark C.M. Cheung, K. D. Leka, University of Michigan, Department of Electrical Engineering and Computer Science, Ann Arbor, MI, NASA GSFC, Silver Spring, MD, NorthWest Research Associates, Boulder, CO 摘要:NASA的太阳动力学观测站(SDO)和JAXA/NASA的Hinode任务都搭载了用于测量光球磁场的光谱偏振仪器。SDO的日震与磁成像仪(HMI)强调全日面、高观测频率和良好空间分辨率的数据采集,而Hinode的太阳光学望远镜光谱偏振仪(SOT-SP)则以有限的视场和较低的时间观测频率为代价,专注于高空间分辨率和高光谱采样。本文介绍了一个名为SynthIA(合成反演近似)的深度学习系统,它通过汲取两台仪器各自的最佳特性来增强这两项任务。我们使用SynthIA生成了一个新的磁图数据产品SynodeP(合成Hinode管道),它模拟来自更高光谱分辨率的Hinode/SOT-SP管道的磁图,但源自全日面、高观测频率、较低光谱分辨率的SDO/HMI Stokes观测。在留出数据上的结果表明,SynodeP与Hinode/SOT-SP管道反演结果具有良好的一致性,其中包括当前SDO/HMI管道并不提供的磁填充分数。SynodeP还使SDO/HMI数据中存在的24小时振荡幅度有所减小。为了证明SynthIA的通用性,我们展示了以SDO/AIA数据和HMI数据子集作为输入的用法,从而可以在对Hinode/SOT-SP反演的保真度、所用观测数量和时间伪影之间进行权衡。我们讨论了SynthIA可能的推广及其对空间天气建模的意义。这项工作是密歇根大学NASA太阳物理DRIVE科学中心(SOLSTICE)的一部分(NASA资助号80NSSC20K0600E),并将开源。 摘要:Both NASA's Solar Dynamics Observatory (SDO) and the JAXA/NASA Hinode mission include spectropolarimetric instruments designed to measure the photospheric magnetic field. SDO's Helioseismic and Magnetic Imager (HMI) emphasizes full-disk high-cadence and good spatial resolution data acquisition while Hinode's Solar Optical Telescope Spectro-Polarimeter (SOT-SP) focuses on high spatial resolution and spectral sampling at the cost of a limited field of view and slower temporal cadence. This work introduces a deep-learning system named SynthIA (Synthetic Inversion Approximation), that can enhance both missions by capturing the best of each instrument's characteristics. We use SynthIA to produce a new magnetogram data product, SynodeP (Synthetic Hinode Pipeline), that mimics magnetograms from the higher spectral resolution Hinode/SOT-SP pipeline, but is derived from full-disk, high-cadence, and lower spectral-resolution SDO/HMI Stokes observations.
Results on held-out data show that SynodeP has good agreement with the Hinode/SOT-SP pipeline inversions, including magnetic fill fraction, which is not provided by the current SDO/HMI pipeline. SynodeP further shows a reduction in the magnitude of the 24-hour oscillations present in the SDO/HMI data. To demonstrate SynthIA's generality, we show the use of SDO/AIA data and subsets of the HMI data as inputs, which enables trade-offs between fidelity to the Hinode/SOT-SP inversions, number of observations used, and temporal artifacts. We discuss possible generalizations of SynthIA and its implications for space weather modeling. This work is part of the NASA Heliophysics DRIVE Science Center (SOLSTICE) at the University of Michigan under grant NASA 80NSSC20K0600E, and will be open-sourced.
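作为对这种"监督式反演仿真"思路的极简漫画式示意(真实的 SynthIA 是作用于全日面图像的深度网络;这里仅以逐像素岭回归代替,把低光谱分辨率的 Stokes 型输入回归到目标管道输出,数据、维度与噪声水平全部为虚构假设):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_stokes = 1000, 24        # 示意:6 个波长点 x 4 个 Stokes 分量
X = rng.normal(size=(n_pix, n_stokes))           # HMI 式低光谱分辨率输入
w_true = rng.normal(size=n_stokes)
y = X @ w_true + 0.01 * rng.normal(size=n_pix)   # Hinode 管道式目标输出

# 闭式岭回归拟合:w = (X^T X + lam I)^{-1} X^T y
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(n_stokes), X.T @ y)
pred = X @ w                                     # 对目标管道输出的仿真
```

训练完成后,同一映射即可批量作用于新的输入像素,这正是用仿真器代替昂贵反演管道的意义所在。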
机器翻译,仅供参考