cs.AI人工智能,共计57篇
【1】 Coupling Vision and Proprioception for Navigation of Legged Robots 标题:腿式机器人导航中的视觉与本体感觉耦合 链接:https://arxiv.org/abs/2112.02094
作者:Zipeng Fu,Ashish Kumar,Ananye Agarwal,Haozhi Qi,Jitendra Malik,Deepak Pathak 备注:Website and videos at this https URL 摘要:我们利用视觉和本体感觉的互补优势，在腿式机器人中实现点目标导航。腿式系统比轮式机器人能够穿越更复杂的地形，但为了充分利用这一能力，我们需要导航系统中的高级路径规划器了解低级移动策略在不同地形上的行走能力。我们通过使用本体感知反馈来估计行走策略的安全操作极限，并感知视觉可能错过的意外障碍物和地形特性(如地面的平滑度或柔软度)，从而实现这一目标。导航系统使用车载摄像头生成占据地图和相应的成本地图，以实现目标。然后，FMM(快速行进法)规划器生成目标路径。速度命令生成器以此为输入，并结合来自安全顾问的附加约束(意外障碍物和由地形确定的速度限制)，为移动策略生成所需速度。与轮式机器人(LoCoBot)基线和其他高层规划与底层控制相互分离的基线相比，我们显示出了优越的性能。我们还展示了我们的系统在带有机载传感器和计算机的四足机器人上的实际部署。视频在https://navigation-locomotion.github.io/camera-ready 摘要:We exploit the complementary strengths of vision and proprioception to achieve point goal navigation in a legged robot. Legged systems are capable of traversing more complex terrain than wheeled robots, but to fully exploit this capability, we need the high-level path planner in the navigation system to be aware of the walking capabilities of the low-level locomotion policy on varying terrains. We achieve this by using proprioceptive feedback to estimate the safe operating limits of the walking policy, and to sense unexpected obstacles and terrain properties like smoothness or softness of the ground that may be missed by vision. The navigation system uses onboard cameras to generate an occupancy map and a corresponding cost map to reach the goal. The FMM (Fast Marching Method) planner then generates a target path. The velocity command generator takes this as input to generate the desired velocity for the locomotion policy using as input additional constraints, from the safety advisor, of unexpected obstacles and terrain determined speed limits. We show superior performance compared to wheeled robot (LoCoBot) baselines, and other baselines which have disjoint high-level planning and low-level control. We also show the real-world deployment of our system on a quadruped robot with onboard sensors and compute. Videos at https://navigation-locomotion.github.io/camera-ready
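为帮助理解上述摘要中"速度命令生成器"的思路，下面给出一个极简的Python草图:综合规划器建议的速度与来自安全顾问的地形限速和障碍信号，输出给移动策略的期望速度。其中函数名与参数均为示意性假设，并非论文的原始实现。

```python
import numpy as np

def velocity_command(path_dir, v_planner, v_terrain_limit, obstacle_ahead):
    """根据FMM规划方向与安全约束生成期望速度(示意)。
    path_dir: 目标路径的单位方向向量; v_planner: 规划器建议速度(m/s);
    v_terrain_limit: 由本体感知估计的地形限速; obstacle_ahead: 是否感知到意外障碍物。"""
    v = min(v_planner, v_terrain_limit)       # 取两者中较小者作为安全速度
    if obstacle_ahead:                        # 意外障碍物触发停止
        v = 0.0
    return v * np.asarray(path_dir, dtype=float)

# 用法示例: 沿x方向行进, 规划速度1.0 m/s, 松软地面限速0.6 m/s
print(velocity_command([1.0, 0.0], 1.0, 0.6, False))   # -> [0.6 0. ]
```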
【2】 Class-agnostic Reconstruction of Dynamic Objects from Videos 标题:视频中与类无关的动态对象重建 链接:https://arxiv.org/abs/2112.02091
作者:Zhongzheng Ren,Xiaoming Zhao,Alexander G. Schwing 备注:NeurIPS 2021 摘要:我们引入了REDO，一个与类无关的框架来从RGBD或校准视频重建动态对象。与之前的工作相比，我们的问题设置更真实，但更具挑战性，原因有三:1)由于遮挡或相机设置，感兴趣的对象可能永远不会完全可见，但我们的目标是重建完整的形状;2)我们的目标是处理不同的对象动力学，包括刚体运动、非刚体运动和关节运动;3)我们的目标是用一个统一的框架重建不同类别的对象。为了应对这些挑战，我们开发了两个新模块。首先，我们引入一个规范(canonical)4D隐式函数，该函数与聚合的时间视觉线索在像素级对齐。其次，我们开发了一个4D变换模块，该模块捕获对象动态以支持时间传播和聚合。我们在合成RGBD视频数据集SAIL-VOS 3D和DeformingThings4D以及真实世界视频数据3DPW上的大量实验中研究了REDO的效果。我们发现REDO以一定的优势超过了最先进的动态重建方法。在消融研究中，我们验证了每个开发的组件。 摘要:We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings an object of interest may never be entirely visible, but we aim to reconstruct the complete shape; 2) we aim to handle different object dynamics including rigid motion, non-rigid motion, and articulation; 3) we aim to reconstruct different categories of objects with one unified framework. To address these challenges, we develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues. Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation. We study the efficacy of REDO in extensive experiments on synthetic RGBD video datasets SAIL-VOS 3D and DeformingThings4D, and on real-world video data 3DPW. We find REDO outperforms state-of-the-art dynamic reconstruction methods by a margin. In ablation studies we validate each developed component.
【3】 Data-Free Neural Architecture Search via Recursive Label Calibration 标题:基于递归标签校准的无数据神经结构搜索 链接:https://arxiv.org/abs/2112.02086
作者:Zechun Liu,Zhiqiang Shen,Yun Long,Eric Xing,Kwang-Ting Cheng,Chas Leichner 备注:Technical report 摘要:本文旨在探讨在不使用任何原始训练数据、仅给出预训练模型的情况下进行神经结构搜索(NAS)的可行性。在现实场景中，这对于隐私保护、避免偏见等都是重要的设定。为了实现这一点，我们首先通过从预训练的深层神经网络中恢复知识来合成可用数据。然后，我们使用合成数据及其预测的软标签来指导神经结构搜索。我们发现NAS任务需要合成的数据(我们这里针对图像域)具有足够的语义、多样性，并且与自然图像的域间距最小。对于语义，我们提出递归标签校准来产生信息量更大的输出。对于多样性，我们提出了一种区域更新策略，以生成更加多样和语义丰富的合成数据。对于最小域间距，我们使用输入和特征级正则化在潜在空间中模拟原始数据的分布。我们用三种流行的NAS算法来实例化我们提出的框架:DARTS、ProxylessNAS和SPOS。令人惊讶的是，我们的结果表明，使用我们的合成数据进行搜索所发现的架构，其精确度与在原始数据上搜索发现的架构相当甚至更高，并首次得出结论:只要合成方法设计得当，无需访问原始数据(即所谓的自然数据)也可以有效地进行NAS。我们的代码将公开提供。 摘要:This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model without using any original training data. This is an important circumstance for privacy protection, bias avoidance, etc., in real-world scenarios. To achieve this, we start by synthesizing usable data through recovering the knowledge from a pre-trained deep neural network. Then we use the synthesized data and their predicted soft-labels to guide neural architecture search. We identify that the NAS task requires the synthesized data (we target at image domain here) with enough semantics, diversity, and a minimal domain gap from the natural images. For semantics, we propose recursive label calibration to produce more informative outputs. For diversity, we propose a regional update strategy to generate more diverse and semantically-enriched synthetic data. For minimal domain gap, we use input and feature-level regularization to mimic the original data distribution in latent space. We instantiate our proposed framework with three popular NAS algorithms: DARTS, ProxylessNAS and SPOS. Surprisingly, our results demonstrate that the architectures discovered by searching with our synthetic data achieve accuracy that is comparable to, or even higher than, architectures discovered by searching from the original ones, for the first time, deriving the conclusion that NAS can be done effectively with no need of access to the original or called natural data if the synthesis method is well designed. Our code will be publicly available.
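摘要只说明了"递归标签校准"的目的(产生信息量更大的软标签)，未给出具体公式。下面的Python草图按一种可能的理解实现该思路:反复将教师网络的预测向目标类方向校准，得到更平滑的软标签;公式与超参数均为假设，仅作示意。

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def recursive_label_calibration(logits, target, beta=0.5, depth=3):
    """递归地将预测分布与目标类的one-hot混合(示意, 非论文原始公式)。"""
    p = softmax(logits)
    onehot = np.eye(len(p))[target]
    for _ in range(depth):                 # 每次递归向目标类校准一步
        p = beta * onehot + (1.0 - beta) * p
    return p / p.sum()

print(recursive_label_calibration(np.array([2.0, 0.5, 0.1]), target=1))
```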
【4】 Cyberphysical Sequencing for Distributed Asset Management with Broad Traceability 标题:具有广泛可追溯性的分布式资产管理的网络物理测序 链接:https://arxiv.org/abs/2112.02079
作者:Joshua Siegel,Gregory Falco 备注:14 pages, 6 figures 摘要:网络物理系统(CPS)具有复杂的生命周期，涉及多个利益相关者，硬件和软件组件供应链的透明度充其量是不透明的。这引起了利益相关者的担忧，他们可能不相信他们收到的是所要求的。因此，有机会建立一个网络物理确权(titling)过程，提供普遍的可追溯性和基于出处区分系统的能力。如今，RFID标签和条形码满足了其中的一些需求，尽管它们由于与对象或系统的固有特性没有关联而易被篡改。我们建议将网络物理测序作为一种低成本、轻量和普适的方法，为任何资产添加追踪溯源能力，把系统的物理身份与唯一且不变的数字标识符联系起来。CPS测序提供了类似于数字孪生的好处，可以在资产的整个生命周期内用更少的计算和其他资源识别和管理资产的来源和身份。 摘要:Cyber-Physical systems (CPS) have complex lifecycles involving multiple stakeholders, and the transparency of both hardware and software components' supply chain is opaque at best. This raises concerns for stakeholders who may not trust that what they receive is what was requested. There is an opportunity to build a cyberphysical titling process offering universal traceability and the ability to differentiate systems based on provenance. Today, RFID tags and barcodes address some of these needs, though they are easily manipulated due to non-linkage with an object or system's intrinsic characteristics. We propose cyberphysical sequencing as a low-cost, light-weight and pervasive means of adding track-and-trace capabilities to any asset that ties a system's physical identity to a unique and invariant digital identifier. CPS sequencing offers benefits similar to Digital Twins' for identifying and managing the provenance and identity of an asset throughout its life with far fewer computational and other resources.
【5】 Malakai: Music That Adapts to the Shape of Emotions 标题:马拉凯:适应情感形态的音乐 链接:https://arxiv.org/abs/2112.02070
作者:Zack Harris,Liam Atticus Clarke,Pietro Gagliano,Dante Camarena,Manal Siddiqui,Pablo S. Castro 摘要:谷歌Magenta的MusicVAE等ML音乐模型的出现，使我们现在能够从原本复杂的数据集中提取和复制作曲特征。这些模型允许计算作曲家将风格和情绪等抽象变量参数化。通过利用这些模型，并将它们与过去几十年的程序化算法相结合，可以创建一首实时作曲、伴随互动体验的动态歌曲。Malakai是一款帮助不同技能水平的用户创作、聆听、混音和分享此类动态歌曲的工具。使用Malakai，作曲家可以创作一首可供听众与之互动的动态歌曲 摘要:The advent of ML music models such as Google Magenta's MusicVAE now allow us to extract and replicate compositional features from otherwise complex datasets. These models allow computational composers to parameterize abstract variables such as style and mood. By leveraging these models and combining them with procedural algorithms from the last few decades, it is possible to create a dynamic song that composes music in real-time to accompany interactive experiences. Malakai is a tool that helps users of varying skill levels create, listen to, remix and share such dynamic songs. Using Malakai, a Composer can create a dynamic song that can be interacted with by a Listener
【6】 An Analytical Update Rule for General Policy Optimization 标题:一种通用策略优化的解析更新规则 链接:https://arxiv.org/abs/2112.02045
作者:Hepeng Li,Nicholas Clavette,Haibo He 摘要:我们提出了一个独立于参数化函数逼近器的分析策略更新规则。该更新规则适用于具有单调改进保证的一般随机策略。更新规则是使用变分法从封闭形式的信赖域解决方案中推导出来的,遵循一个新的理论结果,该结果收紧了使用信赖域方法进行策略搜索的现有边界。提供了在策略更新规则和值函数方法之间建立连接的说明。基于更新规则的递归形式,自然地导出了一种非策略算法,并且保持了单调的改进保证。此外,当一次由一个代理执行更新时,更新规则立即扩展到多代理系统。 摘要:We present an analytical policy update rule that is independent of parameterized function approximators. The update rule is suitable for general stochastic policies with monotonic improvement guarantee. The update rule is derived from a closed-form trust-region solution using calculus of variation, following a new theoretical result that tightens existing bounds for policy search using trust-region methods. An explanation building a connection between the policy update rule and value-function methods is provided. Based on a recursive form of the update rule, an off-policy algorithm is derived naturally, and the monotonic improvement guarantee remains. Furthermore, the update rule extends immediately to multi-agent systems when updates are performed by one agent at a time.
【7】 Could AI Democratise Education? Socio-Technical Imaginaries of an EdTech Revolution 标题:人工智能能让教育民主化吗?教育技术革命的社会技术想象 链接:https://arxiv.org/abs/2112.02034
作者:Sahan Bulathwela,María Pérez-Ortiz,Catherine Holloway,John Shawe-Taylor 备注:To be presented at Workshop on Machine Learning for the Developing World (ML4D) at the Conference on Neural Information Processing Systems 2021 摘要:据说,教育中的人工智能(AI)有潜力构建更个性化的课程,并在全球范围内实现教育民主化,创造新的教学方式和学习方式的复兴。数以百万计的学生已经开始从这些技术的使用中受益,但全世界还有数以百万计的学生没有受益。如果这一趋势继续下去,人工智能在教育领域的首次应用可能会导致更大的教育不平等,以及当前技术决定论叙事所引发的全球教育资源配置不当。在本文中,我们重点围绕人工智能在教育中的未来进行推测并提出问题,目的是展开紧迫的对话,为技术渗透的新一代教育奠定正确的基础。本文首先综合人工智能如何改变我们的学习和教学方式,特别关注个性化学习伙伴的情况,然后讨论一些社会技术特征,这些特征对于避免这些人工智能系统在全球范围内的危险(并可能确保其成功)至关重要。本文还讨论了将人工智能与免费、参与性和民主资源(如维基百科、开放教育资源和开源工具)结合使用的潜力。我们还强调需要集体设计以人为中心、透明、互动和协作的基于人工智能的算法,为利益相关者提供授权和完全代理,并支持新兴的教学法。最后,我们要问的是,这场教育革命需要什么才能超越任何政治、文化、语言、地理和学习能力障碍,提供平等和授权的教育机会。 摘要:Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first delivery of AI in Education could be greater educational inequality, along with a global misallocation of educational resources motivated by the current technological determinism narrative. In this paper, we focus on speculating and posing questions around the future of AI in Education, with the aim of starting the pressing conversation that would set the right foundations for the new generation of education that is permeated by technology. This paper starts by synthesising how AI might change how we learn and teach, focusing specifically on the case of personalised learning companions, and then move to discuss some socio-technical features that will be crucial for avoiding the perils of these AI systems worldwide (and perhaps ensuring their success). This paper also discusses the potential of using AI together with free, participatory and democratic resources, such as Wikipedia, Open Educational Resources and open-source tools. We also emphasise the need for collectively designing human-centered, transparent, interactive and collaborative AI-based algorithms that empower and give complete agency to stakeholders, as well as support new emerging pedagogies. Finally, we ask what would it take for this educational revolution to provide egalitarian and empowering access to education, beyond any political, cultural, language, geographical and learning ability barriers.
【8】 Practitioner-Centric Approach for Early Incident Detection Using Crowdsourced Data for Emergency Services 标题:使用众包数据进行应急服务的以从业者为中心的早期事件检测方法 链接:https://arxiv.org/abs/2112.02012
作者:Yasas Senarath,Ayan Mukhopadhyay,Sayyed Mohsen Vazirizade,Hemant Purohit,Saideep Nannapaneni,Abhishek Dubey 备注:Accepted at IEEE International Conference on Data Mining (ICDM) 2021 摘要:应急响应在很大程度上取决于事件报告的时间。不幸的是，接收事故报告的传统方法(如在美国拨打911)存在时间延迟。Waze等众包平台为事件的早期识别提供了机会。然而，由于与众包数据流相关的噪声和不确定性的挑战，从众包数据流中检测事件是困难的。此外，仅针对检测精度进行优化可能会损害推理结果的时空定位，从而使此类方法难以用于实际部署。本文以应急响应管理为例，提出了一种新的基于众包数据的以从业者为中心的事件检测问题公式和解决方法。提议的方法CROME(众包多目标事件检测)量化了事件分类的性能指标(如F1分数)与模型从业者的要求(如事件检测的1 km半径)之间的关系。首先，我们展示了如何在卷积神经网络(CNN)架构中，将众包报告、地面实况历史数据和其他相关决定因素(如交通和天气)结合使用，以早期检测紧急事件。然后，我们使用基于帕累托优化的方法来优化CNN的输出，并结合以从业者为中心的参数来平衡检测精度和时空定位。最后，我们使用来自Waze的众包数据和来自美国田纳西州纳什维尔的交通事故报告证明了该方法的适用性。我们的实验表明，该方法在事件检测方面优于现有方法，同时优化了对真实世界部署和可用性的需求。 摘要:Emergency response is highly dependent on the time of incident reporting. Unfortunately, the traditional approach to receiving incident reports (e.g., calling 911 in the USA) has time delays. Crowdsourcing platforms such as Waze provide an opportunity for early identification of incidents. However, detecting incidents from crowdsourced data streams is difficult due to the challenges of noise and uncertainty associated with such data. Further, simply optimizing over detection accuracy can compromise spatial-temporal localization of the inference, thereby making such approaches infeasible for real-world deployment. This paper presents a novel problem formulation and solution approach for practitioner-centered incident detection using crowdsourced data by using emergency response management as a case-study. The proposed approach CROME (Crowdsourced Multi-objective Event Detection) quantifies the relationship between the performance metrics of incident classification (e.g., F1 score) and the requirements of model practitioners (e.g., 1 km. radius for incident detection). First, we show how crowdsourced reports, ground-truth historical data, and other relevant determinants such as traffic and weather can be used together in a Convolutional Neural Network (CNN) architecture for early detection of emergency incidents. Then, we use a Pareto optimization-based approach to optimize the output of the CNN in tandem with practitioner-centric parameters to balance detection accuracy and spatial-temporal localization. Finally, we demonstrate the applicability of this approach using crowdsourced data from Waze and traffic accident reports from Nashville, TN, USA. Our experiments demonstrate that the proposed approach outperforms existing approaches in incident detection while simultaneously optimizing the needs for real-world deployment and usability.
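下面用一个极简的Python草图说明摘要中"基于帕累托优化平衡检测精度与时空定位"的思想:在(F1越高越好，定位半径越小越好)两个目标下筛选非支配解。数据与数值均为虚构示例，并非论文实验结果。

```python
def pareto_front(candidates):
    """返回两目标下的非支配解集合(示意)。candidates: [(名称, F1, 定位半径km), ...]"""
    front = []
    for c in candidates:
        # 若存在另一候选在两个目标上都不差于c(且不是c本身), 则c被支配
        dominated = any(o != c and o[1] >= c[1] and o[2] <= c[2] for o in candidates)
        if not dominated:
            front.append(c)
    return front

models = [("A", 0.82, 2.0), ("B", 0.78, 1.0), ("C", 0.70, 1.5)]
print(pareto_front(models))   # C被B支配, A与B构成帕累托前沿
```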
【9】 A network analysis of decision strategies of human experts in steel manufacturing 标题:钢铁制造业人类专家决策策略的网络分析 链接:https://arxiv.org/abs/2112.01991
作者:Daniel Christopher Merten,Prof. Dr. Marc-Thorsten Hütt,Prof. Dr. Yilmaz Uygun 备注:submitted to Computers & Industrial Engineering, 29 pages, 12 figures, 3 tables 摘要:钢铁生产调度通常由人力专家计划员完成。因此，与全自动调度系统相比，钢铁制造商更喜欢辅助推荐算法。通过建议合适的订单，这些算法可以帮助负责选择和安排生产订单的人类专家计划员。然而，由于钢铁生产活动(campaign)规划缺乏精确的基于规则的程序，很难估计这些算法应有的复杂程度;事实上，它需要广泛的领域知识和直觉，而这些只有通过多年的业务经验才能获得。在这里，我们没有开发新的算法或改进旧的算法，而是引入了一种洗牌辅助网络方法来评估由人类专家建立的选择模式的复杂性。这种技术使我们能够形式化并表示进入活动规划的隐性知识。通过网络分析，我们发现生产订单的选择主要取决于订单的碳含量。令人惊讶的是，锰、硅和钛等微量元素对选择决策的影响小于相关文献的假设。当人类专家需要创建满足某些隐含选择标准的订单组("活动")时，我们的方法可以作为一系列决策支持系统的输入。 摘要:Steel production scheduling is typically accomplished by human expert planners. Hence, instead of fully automated scheduling systems steel manufacturers prefer auxiliary recommendation algorithms. Through the suggestion of suitable orders, these algorithms assist human expert planners who are tasked with the selection and scheduling of production orders. However, it is hard to estimate what degree of complexity these algorithms should have as steel campaign planning lacks precise rule-based procedures; in fact, it requires extensive domain knowledge as well as intuition that can only be acquired by years of business experience. Here, instead of developing new algorithms or improving older ones, we introduce a shuffling-aided network method to assess the complexity of the selection patterns established by a human expert. This technique allows us to formalize and represent the tacit knowledge that enters the campaign planning. As a result of the network analysis, we have discovered that the choice of production orders is primarily determined by the orders' carbon content. Surprisingly, trace elements like manganese, silicon, and titanium have a lesser impact on the selection decision than assumed by the pertinent literature. Our approach can serve as an input to a range of decision-support systems, whenever a human expert needs to create groups of orders ('campaigns') that fulfill certain implicit selection criteria.
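摘要中"洗牌辅助网络方法"的核心是把专家的真实选择与随机零模型作对比。下面的Python草图给出一种最简形式:比较被选订单的平均碳含量与随机抽样下的零分布，以检验"按碳含量选择"是否显著。数据与检验方式均为示意性假设，并非论文原始流程。

```python
import random

def shuffle_significance(carbon, selected, n_shuffles=1000, seed=0):
    """洗牌检验(示意): carbon为各订单碳含量, selected为专家选中的订单下标。"""
    rng = random.Random(seed)
    observed = sum(carbon[i] for i in selected) / len(selected)
    null = []
    for _ in range(n_shuffles):                       # 随机"专家"反复抽样
        picks = rng.sample(range(len(carbon)), len(selected))
        null.append(sum(carbon[i] for i in picks) / len(selected))
    p_value = sum(1 for v in null if v >= observed) / n_shuffles
    return observed, p_value

carbon = [0.10, 0.40, 0.42, 0.39, 0.12, 0.41, 0.11, 0.40]
print(shuffle_significance(carbon, selected=[1, 2, 5, 7]))  # 高碳订单被集中选择
```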
【10】 Survey on English Entity Linking on Wikidata 标题:关于维基数据上英文实体链接的调查 链接:https://arxiv.org/abs/2112.01989
作者:Cedric Möller,Jens Lehmann,Ricardo Usbeck 备注:Disclaimer: Cedric Möller, Jens Lehmann, Ricardo Usbeck, 2021. The definitive, peer reviewed and edited version of this article is published in the Semantic Web Journal, Special issue: Latest Advancements in Linguistic Linked Data, 2021 摘要:Wikidata是一个经常更新的、社区驱动的、多语言的知识图。因此，Wikidata是实体链接的一个有吸引力的基础，最近发表论文的增加就是明证。本次调查主要关注四个主题:(1)存在哪些Wikidata实体链接数据集，它们的使用范围有多广，以及它们是如何构建的?(2)Wikidata的特性对实体链接数据集的设计有影响吗?如果有，如何影响?(3)当前的实体链接方法如何利用Wikidata的特定特性?(4)现有实体链接方法未利用哪些Wikidata特性?这项调查显示，当前Wikidata特定实体链接数据集的注释方案与其他知识图(如DBpedia)的注释方案没有区别。因此，自然适合Wikidata的多语言和时间相关数据集的潜力并没有被释放。此外，我们还表明，大多数实体链接方法对待Wikidata与对待任何其他知识图的方式相同，错失了利用Wikidata特有特性提升质量的机会。几乎所有的方法都使用特定的属性，如标签，有时使用描述，但忽略了超关系结构等特征。因此，仍然有改进的余地，例如，通过包含超关系图嵌入或类型信息。许多方法还包括来自Wikipedia的信息，它很容易与Wikidata结合，并提供Wikidata所缺乏的有价值的文本信息。 摘要:Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.
【11】 Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts 标题:情绪的形状:通过情绪转换识别对话中的多模态情绪 链接:https://arxiv.org/abs/2112.01938
作者:Harsh Agarwal,Keshav Bansal,Abhinav Joshi,Ashutosh Modi 备注:13 pages 摘要:会话中的情感识别是一个重要而活跃的研究课题。最近的工作表明,在ERC任务中使用多种模式(如文本、音频和视频)的好处。在谈话中,参与者倾向于保持特定的情绪状态,除非某些外部刺激引起变化。在一次谈话中,情绪会不断地起伏波动。受这一观察结果的启发,我们提出了一个多模态ERC模型,并用情绪转移成分对其进行了扩充。提出的情感转移组件是模块化的,可以添加到任何现有的多模态ERC模型中(只需稍作修改),以提高情感识别。我们对该模型的不同变体进行了实验,结果表明,包含情绪转移信号有助于该模型优于现有的ERC多模态模型,从而在MOSEI和IEMOCAP数据集上显示出最先进的性能。 摘要:Emotion Recognition in Conversations (ERC) is an important and active research problem. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some external stimuli evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications), to improve emotion recognition. We experiment with different variants of the model, and results show that the inclusion of emotion shift signal helps the model to outperform existing multimodal models for ERC and hence showing the state-of-the-art performance on MOSEI and IEMOCAP datasets.
【12】 Heuristic Search Planning with Deep Neural Networks using Imitation, Attention and Curriculum Learning 标题:基于模仿、注意和课程学习的深度神经网络启发式搜索规划 链接:https://arxiv.org/abs/2112.01918
作者:Leah Chrestien,Tomas Pevny,Antonin Komenda,Stefan Edelkamp 备注:8 pages plus references 摘要:为硬任务规划领域学习一个信息充分的启发式函数是一个难以捉摸的问题。虽然有已知的神经网络结构来表示此类启发式知识,但不清楚学习了哪些具体信息,以及旨在理解结构的技术是否有助于提高启发式知识的质量。本文提出了一个网络模型来学习一个启发式函数,该启发式函数能够通过使用注意机制的最优计划模拟来关联状态空间的遥远部分,从而大大提高了对一个好的启发式函数的学习。为了克服该方法在创建难度越来越大的问题时的局限性,我们演示了课程学习的使用,在训练集中添加新解决的问题实例,这反过来,有助于解决更复杂的问题,远远超过所有现有基线(包括经典规划启发式)的性能。我们证明了它对网格型PDDL域的有效性。 摘要:Learning a well-informed heuristic function for hard task planning domains is an elusive problem. Although there are known neural network architectures to represent such heuristic knowledge, it is not obvious what concrete information is learned and whether techniques aimed at understanding the structure help in improving the quality of the heuristics. This paper presents a network model to learn a heuristic capable of relating distant parts of the state space via optimal plan imitation using the attention mechanism, which drastically improves the learning of a good heuristic function. To counter the limitation of the method in the creation of problems of increasing difficulty, we demonstrate the use of curriculum learning, where newly solved problem instances are added to the training set, which, in turn, helps to solve problems of higher complexities and far exceeds the performances of all existing baselines including classical planning heuristics. We demonstrate its effectiveness for grid-type PDDL domains.
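摘要描述的课程学习过程可概括为:每轮用当前启发式尝试求解，把新解出的实例并入训练集后重训，从而逐步攻克更难的问题。下面是该循环的Python草图，其中model、solve、train的具体形式均为示意性假设。

```python
def curriculum_train(model, problems, solve, train, max_rounds=10):
    """课程学习主循环(示意)。solve(model, p)返回计划或None; train(model, data)返回新模型。"""
    dataset, unsolved = [], list(problems)
    for _ in range(max_rounds):
        solved_now = [(p, solve(model, p)) for p in unsolved]
        solved_now = [(p, plan) for p, plan in solved_now if plan is not None]
        if not solved_now:
            break                                  # 无新进展, 课程结束
        dataset.extend(solved_now)                 # 新解出的实例加入训练集
        model = train(model, dataset)              # 用扩充后的数据重训启发式
        newly = [p for p, _ in solved_now]
        unsolved = [p for p in unsolved if p not in newly]
    return model

# 极简演示: "求解器"只能解出难度不超过当前模型水平的问题, 每轮训练提升水平
final = curriculum_train(model=1, problems=[1, 2, 3, 4],
                         solve=lambda m, p: "plan" if p <= m else None,
                         train=lambda m, d: m + 1)
print(final)   # 逐轮解出难度1,2,3,4的实例, 模型水平最终为5
```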
【13】 Hybrid Digital Twin for process industry using Apros simulation environment 标题:基于Apros仿真环境的流程工业混合数字孪生系统 链接:https://arxiv.org/abs/2112.01903
作者:Mohammad Azangoo,Joonas Salmi,Iivo Yrjölä,Jonathan Bensky,Gerardo Santillan,Nikolaos Papakonstantinou,Seppo Sierla,Valeriy Vyatkin 摘要:制作更新的竣工模型在工艺装置的生命周期中起着重要作用。特别是，数字孪生模型必须精确，以保证系统的效率和可靠性。数据驱动模型可以通过考虑不确定性和生命周期相关变化来模拟子系统的最新行为。本文以一个早期实现的原型为例，提出了流程工厂混合数字孪生模型的逐步构建概念。它将详细说明使用工艺设备的数据驱动模型更新棕地工艺系统的第一原理模型和数字孪生模型的步骤。还将讨论生成竣工混合数字孪生的挑战。借助过程历史数据训练机器学习模型，所实现的数字孪生可以随时间不断改进，这项正在进行的工作也可进一步优化。 摘要:Making an updated and as-built model plays an important role in the life-cycle of a process plant. In particular, Digital Twin models must be precise to guarantee the efficiency and reliability of the systems. Data-driven models can simulate the latest behavior of the sub-systems by considering uncertainties and life-cycle related changes. This paper presents a step-by-step concept for hybrid Digital Twin models of process plants using an early implemented prototype as an example. It will detail the steps for updating the first-principles model and Digital Twin of a brownfield process system using data-driven models of the process equipment. The challenges for generation of an as-built hybrid Digital Twin will also be discussed. With the help of process history data to teach Machine Learning models, the implemented Digital Twin can be continually improved over time and this work in progress can be further optimized.
【14】 The Catalan Language CLUB 标题:加泰罗尼亚语言俱乐部 链接:https://arxiv.org/abs/2112.01894
作者:Carlos Rodriguez-Penagos,Carme Armentano-Oller,Marta Villegas,Maite Melero,Aitor Gonzalez,Ona de Gibert Bonet,Casimiro Carrino Pio 备注:OpenCor Forum 2021. arXiv admin note: text overlap with arXiv:2107.07903 摘要:加泰罗尼亚语言理解基准(CLUB)包含代表不同NLU任务的各种数据集,这些数据集能够准确评估语言模型,遵循通用语言理解评估(GLUE)示例。它是AINA和PlanTL的一部分,这两项公共资助计划旨在增强人工智能时代加泰罗尼亚语言的能力。 摘要:The Catalan Language Understanding Benchmark (CLUB) encompasses various datasets representative of different NLU tasks that enable accurate evaluations of language models, following the General Language Understanding Evaluation (GLUE) example. It is part of AINA and PlanTL, two public funding initiatives to empower the Catalan language in the Artificial Intelligence era.
【15】 Image-to-image Translation as a Unique Source of Knowledge 标题:图像到图像转换作为一种独特的知识来源 链接:https://arxiv.org/abs/2112.01873
作者:Alejandro D. Mousist 摘要:图像到图像(I2I)转换是一种将数据从一个域转换到另一个域的既定方法,但在处理SAR/光学卫星图像等不同域时,目标域中转换图像的可用性以及将多少原始域转换到目标域仍然不够清楚。本文通过使用最新的I2I算法将标记数据集从光学域转换到SAR域,从目标域中传输的特征中学习,并在稍后评估从原始数据集传输了多少数据,从而解决了这一问题。除此之外,还建议将堆叠作为一种结合从不同I2I翻译中学习到的知识并针对单个模型进行评估的方法。 摘要:Image-to-image (I2I) translation is an established way of translating data from one domain to another but the usability of the translated images in the target domain when working with such dissimilar domains as the SAR/optical satellite imagery ones and how much of the origin domain is translated to the target domain is still not clear enough. This article address this by performing translations of labelled datasets from the optical domain to the SAR domain with different I2I algorithms from the state-of-the-art, learning from transferred features in the destination domain and evaluating later how much from the original dataset was transferred. Added to this, stacking is proposed as a way of combining the knowledge learned from the different I2I translations and evaluated against single models.
【16】 Active Inference in Robotics and Artificial Agents: Survey and Challenges 标题:机器人和人工智能体中的主动推理:综述和挑战 链接:https://arxiv.org/abs/2112.01871
作者:Pablo Lanillos,Cristian Meo,Corrado Pezzato,Ajith Anil Meera,Mohamed Baioumy,Wataru Ohata,Alexander Tschantz,Beren Millidge,Martijn Wisse,Christopher L. Buckley,Jun Tani 备注:This manuscript is under review in a IEEE journal 摘要:主动推理是一种数学框架，起源于计算神经科学，是一种关于大脑如何执行动作、感知和学习的理论。最近，它被证明是解决不确定性下状态估计和控制问题的一种很有前途的方法，也为在机器人和一般人工智能体中构建目标驱动行为提供了基础。在这里，我们回顾了用于状态估计、控制、规划和学习的主动推理的最新理论和实现;描述当前的成就，特别关注机器人技术。我们展示了相关的实验，展示了它在适应性、泛化和鲁棒性方面的潜力。此外，我们将此方法与其他框架联系起来，并讨论其预期的好处和挑战:使用变分贝叶斯推理的具有功能生物学合理性的统一框架。 摘要:Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning. Recently, it has been shown to be a promising approach to the problems of state-estimation and control under uncertainty, as well as a foundation for the construction of goal-driven behaviours in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state-estimation, control, planning and learning; describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.
【17】 Discovery of Crime Event Sequences with Constricted Spatio-Temporal Sequential Patterns 标题:具有压缩时空序列模式的犯罪事件序列的发现 链接:https://arxiv.org/abs/2112.01863
作者:Piotr S. Maciąg,Robert Bembenik,Artur Dubrawski 备注:37 pages 摘要:在这篇文章中,我们介绍了一种新型的时空序列模式,称为压缩时空序列(CSTS)模式,并深入分析了它们的特性。我们证明了CSTS模式集是可以在给定数据集中发现的所有时空序列模式的简明表示。为了测量发现的CSTS模式的重要性,我们采用了参与指数测量。我们还提供了CSTS Miner:一种在事件数据中发现所有参与索引强CSTS模式的算法。我们使用两个与犯罪相关的数据集:匹兹堡警察事件记录数据集和波士顿犯罪事件报告数据集对所提出的算法进行了实验评估。在实验中,将CSTS-Miner算法与其他四种最先进的算法:STS-Miner、CSTPM、STBFM和CST-SPMiner进行了比较。实验结果表明,该算法比其他算法发现的模式要少得多。最后,我们提供了所提出的CSTS Miner算法发现的有趣的犯罪相关模式的示例。 摘要:In this article, we introduce a novel type of spatio-temporal sequential patterns called Constricted Spatio-Temporal Sequential (CSTS) patterns and thoroughly analyze their properties. We demonstrate that the set of CSTS patterns is a concise representation of all spatio-temporal sequential patterns that can be discovered in a given dataset. To measure significance of the discovered CSTS patterns we adapt the participation index measure. We also provide CSTS-Miner: an algorithm that discovers all participation index strong CSTS patterns in event data. We experimentally evaluate the proposed algorithms using two crime-related datasets: Pittsburgh Police Incident Blotter Dataset and Boston Crime Incident Reports Dataset. In the experiments, the CSTS-Miner algorithm is compared with the other four state-of-the-art algorithms: STS-Miner, CSTPM, STBFM and CST-SPMiner. As the results of experiments suggest, the proposed algorithm discovers much fewer patterns than the other selected algorithms. Finally, we provide the examples of interesting crime-related patterns discovered by the proposed CSTS-Miner algorithm.
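参与指数(participation index)源自共置模式挖掘，文中将其改造用于衡量CSTS模式的重要性。下面的Python草图按其通常定义实现:对模式涉及的每类事件计算参与比率，再取最小值;数据为虚构示例，具体改造细节以论文为准。

```python
def participation_index(pattern_instances, type_counts):
    """参与指数(示意): 各事件类型参与比率的最小值。
    pattern_instances: 模式的每次出现, 元组位置对应事件类型;
    type_counts: 各事件类型的实例总数。"""
    ratios = []
    for pos, total in enumerate(type_counts):
        distinct = {inst[pos] for inst in pattern_instances}   # 参与模式的不同事件
        ratios.append(len(distinct) / total)
    return min(ratios)

# 事件类型A共4个实例, B共5个; 模式A->B出现3次, 涉及3个不同的A和2个不同的B
print(participation_index([(0, 0), (1, 0), (2, 3)], [4, 5]))  # min(3/4, 2/5) = 0.4
```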
【18】 A Proposal of Automatic Error Correction in Text 标题:一种文本自动纠错的方案 链接:https://arxiv.org/abs/2112.01846
作者:Wulfrano A. Luna-Ramírez,Carlos R. Jaimez-González 备注:None 摘要:可以存储在电子媒体中的大量信息每天都在增长。其中很多主要是通过打字获得的，比如从Web2.0网站获得的大量信息;或者通过光学字符识别软件进行扫描和处理获得的，比如图书馆和政府办公室的文本。这两个过程都会在文本中引入错误，因此很难将数据用于阅读之外的其他目的，即通过其他应用程序(如电子学习、语言学习、电子教程、数据挖掘、信息检索，以及更专业的系统，如助盲(tiflologic)软件，特别是面向盲人的应用，如自动朗读——这类应用要求文本尽可能无错误，以便简化文本到语音的任务)来处理这些文本，等等。本文介绍了一种自动识别和纠正电子文本中拼写错误的应用。该任务由三个阶段组成:a)错误检测;b)候选修正生成;c)纠正——选择最佳候选。该方案基于词性文本分类、词语相似度、词典、统计度量、形态分析和基于n-gram的西班牙语语言模型。 摘要:The great amount of information that can be stored in electronic media is growing daily. Much of it is obtained mainly by typing, such as the huge amount of information obtained from Web 2.0 sites; or by scanning and processing with Optical Character Recognition software, like the texts of libraries and government offices. Both processes introduce errors into texts, so it is difficult to use the data for other purposes than just to read it, i.e. the processing of those texts by other applications like e-learning, learning of languages, electronic tutorials, data mining, information retrieval and even more specialized systems such as tiflologic software, specifically blind people-oriented applications like automatic reading, where the text should be as error-free as possible in order to make the text-to-speech task easier, and so on. In this paper we present an application for automatic recognition and correction of orthographic errors in electronic texts. This task is composed of three stages: a) error detection; b) candidate corrections generation; and c) correction - selection of the best candidate. The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis and an n-gram based language model of Spanish.
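下面的Python草图串起摘要所述的三个阶段:词典查找做错误检测，字符串相似度生成候选，再用二元语言模型选择最佳候选。词表、二元概率与阈值均为虚构的西班牙语小例子，仅作示意，并非论文的原始实现。

```python
import difflib

VOCAB = {"el", "gato", "negro", "come", "pescado"}
BIGRAM = {("el", "gato"): 0.5, ("gato", "negro"): 0.4}   # 假设的二元概率

def correct(tokens):
    """三阶段拼写纠错(示意): 检测 -> 候选生成 -> 基于二元语言模型选优。"""
    out = []
    for w in tokens:
        if w in VOCAB:                                   # 阶段a: 词典内视为正确
            out.append(w)
            continue
        cands = difflib.get_close_matches(w, sorted(VOCAB), n=5, cutoff=0.6)  # 阶段b
        if not cands:
            out.append(w)
            continue
        prev = out[-1] if out else None                  # 阶段c: 依上文概率选优
        out.append(max(cands, key=lambda c: BIGRAM.get((prev, c), 1e-6)))
    return out

print(correct(["el", "gato", "negor"]))   # -> ['el', 'gato', 'negro']
```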
【19】 Combining Sub-Symbolic and Symbolic Methods for Explainability 标题:子符号法和符号法相结合的可解释性研究 链接:https://arxiv.org/abs/2112.01844
作者:Anna Himmelhuber,Stephan Grimm,Sonja Zillner,Mitchell Joblin,Martin Ringsquandl,Thomas Runkler 备注:RuleML RR 2021 摘要:与其他联结主义模型类似,图形神经网络(GNN)在决策过程中缺乏透明度。为了深入了解GNN决策过程,已经开发了许多次符号方法。这些是解释性的第一个重要步骤,但对于非人工智能专家的用户来说,生成的解释通常很难理解。为了克服这个问题,我们引入了一种概念方法,将亚符号和符号方法相结合,用于以人为中心的解释,该方法结合了领域知识和因果关系。我们还引入了保真度的概念,作为评估解释与GNN内部决策过程的接近程度的指标。通过对一个化学数据集和本体的评估,表明了该方法的解释价值和可靠性。 摘要:Similarly to other connectionist models, Graph Neural Networks (GNNs) lack transparency in their decision-making. A number of sub-symbolic approaches have been developed to provide insights into the GNN decision making process. These are first important steps on the way to explainability, but the generated explanations are often hard to understand for users that are not AI experts. To overcome this problem, we introduce a conceptual approach combining sub-symbolic and symbolic methods for human-centric explanations, that incorporate domain knowledge and causality. We furthermore introduce the notion of fidelity as a metric for evaluating how close the explanation is to the GNN's internal decision making process. The evaluation with a chemical dataset and ontology shows the explanatory value and reliability of our method.
【20】 Graph-Guided Deformation for Point Cloud Completion 标题:基于图形引导的点云补全变形算法 链接:https://arxiv.org/abs/2112.01840
作者:Jieqi Shi,Lingyun Xu,Liang Heng,Shaojie Shen 备注:RAL with IROS 2021 摘要:长期以来，点云补全任务一直被视为纯生成任务。在通过编码器获得全局形状编码后，使用网络预先学习的形状先验生成完整的点云。然而，这样的模型会不理想地偏向于先验的平均物体，并且在拟合几何细节方面存在固有局限。本文提出了一种图引导变形网络，它分别将输入数据和中间生成结果视为控制点和支撑点，并用图卷积网络(GCN)引导的优化来建模点云补全任务。我们的主要见解是通过网格变形方法模拟最小二乘拉普拉斯变形过程，这为建模几何细节的变化带来了适应性。通过这种方法，我们也缩小了补全任务和网格变形算法之间的差距。据我们所知，我们是第一个通过以GCN引导变形模仿传统图形学算法来改进点云补全任务的。我们在模拟的室内数据集ShapeNet、室外数据集KITTI和我们自行收集的自动驾驶数据集Pandar40上进行了广泛的实验。结果表明，在三维点云补全任务中，我们的方法优于现有的最新算法。 摘要:For a long time, the point cloud completion task has been regarded as a pure generation task. After obtaining the global shape code through the encoder, a complete point cloud is generated using the shape priorly learnt by the networks. However, such models are undesirably biased towards prior average objects and inherently limited to fit geometry details. In this paper, we propose a Graph-Guided Deformation Network, which respectively regards the input data and intermediate generation as controlling and supporting points, and models the optimization guided by a graph convolutional network (GCN) for the point cloud completion task. Our key insight is to simulate the least square Laplacian deformation process via mesh deformation methods, which brings adaptivity for modeling variation in geometry details. By this means, we also reduce the gap between the completion task and the mesh deformation algorithms. As far as we know, we are the first to refine the point cloud completion task by mimicking traditional graphics algorithms with GCN-guided deformation. We have conducted extensive experiments on both the simulated indoor dataset ShapeNet, outdoor dataset KITTI, and our self-collected autonomous driving dataset Pandar40. The results show that our method outperforms the existing state-of-the-art algorithms in the 3D point cloud completion task.
【21】 Mind Your Clever Neighbours: Unsupervised Person Re-identification via Adaptive Clustering Relationship Modeling 标题:小心你的聪明邻居:通过自适应聚类关系建模实现无监督行人重识别 链接:https://arxiv.org/abs/2112.01839
作者:Lianjie Jia,Chenyang Yu,Xiehao Ye,Tianyu Yan,Yinjie Lei,Pingping Zhang 备注:This work has been accepted by AAAI-2022. Some modifications may be performed for the final version 摘要:无监督行人重识别(Re-ID)因其解决有监督Re-ID模型可扩展性问题的潜力而受到越来越多的关注。大多数现有的无监督方法采用迭代聚类机制，其中网络是基于无监督聚类生成的伪标签进行训练的。然而，聚类错误是不可避免的。为了生成高质量的伪标签并减轻聚类错误的影响，我们提出了一种新的无监督行人重识别聚类关系建模框架。具体而言，在聚类之前，先基于图相关学习(GCL)模块建模未标记图像之间的关系，再将细化后的特征用于聚类，以生成高质量的伪标签。因此，GCL自适应挖掘小批量样本之间的关系，以减少训练时异常聚类的影响。为了更有效地训练网络，我们进一步提出了一种具有选择性记忆库更新策略的选择性对比学习(SCL)方法。大量实验表明，在Market1501、DukeMTMC-reID和MSMT17数据集上，我们的方法比大多数最先进的无监督方法显示出更好的结果。我们将发布用于复现模型的代码。 摘要:Unsupervised person re-identification (Re-ID) attracts increasing attention due to its potential to resolve the scalability problem of supervised Re-ID models. Most existing unsupervised methods adopt an iterative clustering mechanism, where the network was trained based on pseudo labels generated by unsupervised clustering. However, clustering errors are inevitable. To generate high-quality pseudo-labels and mitigate the impact of clustering errors, we propose a novel clustering relationship modeling framework for unsupervised person Re-ID. Specifically, before clustering, the relation between unlabeled images is explored based on a graph correlation learning (GCL) module and the refined features are then used for clustering to generate high-quality pseudo-labels. Thus, GCL adaptively mines the relationship between samples in a mini-batch to reduce the impact of abnormal clustering when training. To train the network more effectively, we further propose a selective contrastive learning (SCL) method with a selective memory bank update policy. Extensive experiments demonstrate that our method shows much better results than most state-of-the-art unsupervised methods on Market1501, DukeMTMC-reID and MSMT17 datasets. We will release the code for model reproduction.
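摘要提到的"带选择性更新策略的记忆库"可以用如下Python草图说明:只用被判定为可靠的聚类的样本特征，以动量方式刷新对应聚类中心，从而减小聚类错误的影响。更新公式与"可靠"判定均为示意性假设，并非论文原始实现。

```python
import numpy as np

def update_memory(memory, feats, labels, reliable, m=0.2):
    """选择性记忆库更新(示意)。memory: (K, D)聚类中心; feats: (N, D)样本特征;
    labels: 伪标签; reliable: 被判定为可靠的聚类下标集合。"""
    for f, y in zip(feats, labels):
        if y not in reliable:                 # 跳过可疑/异常聚类
            continue
        memory[y] = (1 - m) * memory[y] + m * f
        memory[y] /= np.linalg.norm(memory[y]) + 1e-12   # 归一化便于对比学习
    return memory

mem = np.eye(3)                               # 3个聚类中心, 特征维度3
feats = np.random.RandomState(0).randn(4, 3)
print(update_memory(mem, feats, labels=[0, 1, 1, 2], reliable={0, 1}))
```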
【22】 Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer 标题:基于新型一元-成对Transformer的人-物交互高效两阶段检测 链接:https://arxiv.org/abs/2112.01838
作者:Frederic Z. Zhang,Dylan Campbell,Stephen Gould 备注:14 pages, 14 figures and 5 tables 摘要:用于视觉数据的Transformer模型的最新发展已导致识别和检测任务的显著改进。特别是，使用可学习查询代替区域建议已经产生了一类新的单阶段检测模型，由检测Transformer(DETR)牵头。此后，这种单阶段方法的各种变体主导了人-物交互(HOI)检测。然而，这种单阶段HOI检测器的成功在很大程度上归功于Transformer的表现力。我们发现，当配备相同的Transformer时，对应的两阶段检测器性能和内存效率会更高，而训练时间只需一小部分。在这项工作中，我们提出了一元-成对Transformer，这是一种利用HOI一元和成对表示的两阶段检测器。我们观察到，Transformer网络的一元部分和成对部分出现了分工，前者优先提高正例的分数，后者降低负例的分数。我们在HICO-DET和V-COCO数据集上评估了我们的方法，并且显著优于最先进的方法。在推断时，我们使用ResNet50的模型在单个GPU上接近实时性能。 摘要:Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.
【23】 Semantic Segmentation of Legal Documents via Rhetorical Roles 标题:基于修辞角色的法律文本语义切分 链接:https://arxiv.org/abs/2112.01836
作者:Vijit Malik,Rishabh Sanjay,Shouvik Kumar Guha,Shubham Kumar Nigam,Angshuman Hazarika,Arnab Bhattacharya,Ashutosh Modi 备注:16 pages 摘要:法律文档是非结构化的,使用法律术语,并且具有相当长的长度,因此很难通过传统的文本处理技术自动处理。如果文档可以在语义上分割为连贯的信息单元,那么法律文档处理系统将大大受益。本文提出了一个修辞角色(RR)系统,用于将法律文件分割为语义连贯的单元:事实、论点、法规、问题、先例、裁决和比率。在法律专家的帮助下,我们提出了一套13个细粒度的修辞角色标签,并创建了一个新的法律文件语料库,用建议的RR注释。我们开发了一个将文档分割成修辞角色单元的系统。特别是,我们开发了一个基于多任务学习的深度学习模型,将文档修辞角色标签转换作为分割法律文档的辅助任务。我们对各种深度学习模型进行了广泛的实验,以预测文档中的修辞角色,与现有模型相比,该模型表现出了更高的性能。此外,我们将RR应用于预测法律案件的判决,并表明与基于Transformer的模型相比,RR的使用增强了预测。 摘要:Legal documents are unstructured, use legal jargon, and have considerable length, making it difficult to process automatically via conventional text processing techniques. A legal document processing system would benefit substantially if the documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting rhetorical roles in a document, and the proposed model shows superior performance over the existing models. Further, we apply RR for predicting the judgment of legal cases and show that the use of RR enhances the prediction compared to the transformer-based models.
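摘要中的多任务设计可以用一个损失函数草图说明:主任务是逐句预测修辞角色，辅助任务是预测相邻句子间"标签是否发生转移"，后者的监督信号可直接由角色序列导出。以下PyTorch代码为示意性假设，权重alpha与模型输出形状均非论文原设。

```python
import torch
import torch.nn.functional as F

def multitask_loss(role_logits, shift_logits, roles, alpha=0.5):
    """修辞角色分类 + 标签转移辅助任务的联合损失(示意)。
    role_logits: (T, C)逐句角色打分; shift_logits: (T-1, 2)相邻句是否转移; roles: (T,)真实角色。"""
    shift_gold = (roles[1:] != roles[:-1]).long()     # 由角色序列导出转移标签
    loss_role = F.cross_entropy(role_logits, roles)
    loss_shift = F.cross_entropy(shift_logits, shift_gold)
    return loss_role + alpha * loss_shift

T, C = 6, 13                                          # 13个细粒度修辞角色
roles = torch.tensor([0, 0, 2, 2, 2, 5])
print(multitask_loss(torch.randn(T, C), torch.randn(T - 1, 2), roles))
```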
【24】 Table2Vec: Automated Universal Representation Learning to Encode All-round Data DNA for Benchmarkable and Explainable Enterprise Data Science 标题:Table2Vec:为可基准、可解释的企业数据科学编码全方位数据DNA的自动化通用表示学习 链接:https://arxiv.org/abs/2112.01830
作者:Longbing Cao,Chengzhang Zhu 备注:24 pages, 16 figures, 1 table 摘要:企业数据通常涉及多个异构数据源和外部数据,分别记录业务活动、交易、客户统计、状态、行为、与企业的交互和通信,以及其产品、服务、生产、营销和运营的消费和反馈,企业数据科学面临的一个关键挑战是,如何在全方位的企业DNA上实现有效的整体企业数据理解、数据驱动的发现和决策。我们介绍了一种神经编码器Table2Vec,用于自动通用表示学习实体,如来自全方位企业DNA的客户,并具有自动数据特征分析和数据质量增强功能。学习到的通用表示可以作为代表性和基准企业数据基因组,并可用于企业范围和特定领域的学习任务。表2VEC集成了低质量企业数据和下游学习任务的自动化通用表示学习。我们举例说明Table2Vec在复杂异构多关系大表上描述企业中全面的客户数据DNA,以构建通用客户向量表示。每个客户学习到的通用表示法是全面的、有代表性的和基准的,能够支持企业数据科学中企业范围和特定领域的学习目标和任务。Table2Vec显著优于企业分析中常用的现有浅层、推进和深度学习方法。我们进一步讨论了自动化通用企业表示和学习的研究机会、方向和应用,以及用于自动化、通用、全企业和道德机器学习和数据科学的企业数据DNA。 摘要:Enterprise data typically involves multiple heterogeneous data sources and external data that respectively record business activities, transactions, customer demographics, status, behaviors, interactions and communications with the enterprise, and the consumption and feedback of its products, services, production, marketing, operations, and management, etc. A critical challenge in enterprise data science is to enable an effective whole-of-enterprise data understanding and data-driven discovery and decision-making on all-round enterprise DNA. We introduce a neural encoder Table2Vec for automated universal representation learning of entities such as customers from all-round enterprise DNA with automated data characteristics analysis and data quality augmentation. The learned universal representations serve as representative and benchmarkable enterprise data genomes and can be used for enterprise-wide and domain-specific learning tasks. Table2Vec integrates automated universal representation learning on low-quality enterprise data and downstream learning tasks. We illustrate Table2Vec in characterizing all-round customer data DNA in an enterprise on complex heterogeneous multi-relational big tables to build universal customer vector representations. The learned universal representation of each customer is all-round, representative and benchmarkable to support both enterprise-wide and domain-specific learning goals and tasks in enterprise data science. Table2Vec significantly outperforms the existing shallow, boosting and deep learning methods typically used for enterprise analytics. We further discuss the research opportunities, directions and applications of automated universal enterprise representation and learning and the learned enterprise data DNA for automated, all-purpose, whole-of-enterprise and ethical machine learning and data science.
【25】 A Survey: Deep Learning for Hyperspectral Image Classification with Few Labeled Samples 标题:深度学习在少样本高光谱图像分类中的研究进展 链接:https://arxiv.org/abs/2112.01800
作者:Sen Jia,Shuguo Jiang,Zhijie Lin,Nanying Li,Meng Xu,Shiqi Yu 备注:None 摘要:随着深度学习技术的快速发展和计算能力的提高,深度学习在高光谱图像分类领域得到了广泛的应用。一般来说,深度学习模型通常包含许多可训练的参数,需要大量的标记样本才能实现最佳性能。然而,在HSI分类方面,由于手动标记的困难性和耗时性,通常难以获取大量标记样本。因此,许多研究工作致力于在标记样本较少的情况下建立HSI分类的深度学习模型。在这篇文章中,我们专注于这个主题,并对相关文献进行了系统的回顾。具体而言,本文的贡献是双重的。首先,根据学习范式对相关方法的研究进展进行了分类,包括迁移学习、主动学习和Few-Shot学习。其次,采用各种最先进的方法进行了大量实验,并对结果进行了总结,以揭示潜在的研究方向。更重要的是,值得注意的是,尽管深度学习模型(通常需要足够的标记样本)与标记样本较少的HSI场景之间存在巨大差距,但小样本集的问题可以通过深度学习方法和相关技术的融合得到很好的表征,例如转移学习和轻量级模型。对于再现性,本文中评估方法的源代码可在https://github.com/ShuGuoJ/HSI-Classification.git. 摘要:With the rapid development of deep learning technology and improvement in computing capability, deep learning has been widely used in the field of hyperspectral image (HSI) classification. In general, deep learning models often contain many trainable parameters and require a massive number of labeled samples to achieve optimal performance. However, in regard to HSI classification, a large number of labeled samples is generally difficult to acquire due to the difficulty and time-consuming nature of manual labeling. Therefore, many research works focus on building a deep learning model for HSI classification with few labeled samples. In this article, we concentrate on this topic and provide a systematic review of the relevant literature. Specifically, the contributions of this paper are twofold. First, the research progress of related methods is categorized according to the learning paradigm, including transfer learning, active learning and few-shot learning. Second, a number of experiments with various state-of-the-art approaches has been carried out, and the results are summarized to reveal the potential research directions. More importantly, it is notable that although there is a vast gap between deep learning models (that usually need sufficient labeled samples) and the HSI scenario with few labeled samples, the issues of small-sample sets can be well characterized by fusion of deep learning methods and related techniques, such as transfer learning and a lightweight model. For reproducibility, the source codes of the methods assessed in the paper can be found at https://github.com/ShuGuoJ/HSI-Classification.git.
【26】 A Systematic IoU-Related Method: Beyond Simplified Regression for Better Localization 标题:一种系统的IoU相关方法:超越简化回归以实现更好的定位 链接:https://arxiv.org/abs/2112.01793
作者:Hanyang Peng,Shiqi Yu 备注:None 摘要:现代检测器默认使用四变量独立回归的定位损失，例如Smooth-L1损失。然而，这种损失过于简单，因此与最终评估指标——交并比(IoU)不一致。直接使用标准IoU也不可行，因为非重叠框情形下的恒定零平台以及最小值处的非零梯度可能使其无法训练。因此，我们提出了一个系统的方法来解决这些问题。首先，我们提出了一个新的度量，扩展IoU(EIoU):当两个框不重叠时它有良好定义，当重叠时则退化为标准IoU。其次，我们提出了基于EIoU的凸化技术(CT)来构造损失，它可以保证最小值处的梯度为零。第三，我们提出了一种稳定优化技术(SOT)，使分式EIoU损失更稳定、更平滑地逼近最小值。第四，为了充分发挥基于EIoU的损失的能力，我们引入了一个相互关联的IoU预测头来进一步提高定位精度。结合上述贡献，新方法与以ResNet50 FPN为主干的Faster R-CNN结合，相比基线Smooth-L1损失，在VOC2007上带来4.2 mAP的增益，在COCO2017上带来2.3 mAP的增益，且几乎不增加训练和推断计算成本。具体而言，指标越严格，增益越显著:在AP90指标下，VOC2007上提高8.2 mAP，COCO2017上提高5.4 mAP。 摘要:Four-variable-independent-regression localization losses, such as Smooth-$\ell_1$ Loss, are used by default in modern detectors. Nevertheless, this kind of loss is oversimplified so that it is inconsistent with the final evaluation metric, intersection over union (IoU). Directly employing the standard IoU is also not feasible, since the constant-zero plateau in the case of non-overlapping boxes and the non-zero gradient at the minimum may make it not trainable. Accordingly, we propose a systematic method to address these problems. Firstly, we propose a new metric, the extended IoU (EIoU), which is well-defined when two boxes are not overlapping and reduced to the standard IoU when overlapping. Secondly, we present the convexification technique (CT) to construct a loss on the basis of EIoU, which can guarantee the gradient at the minimum to be zero. Thirdly, we propose a steady optimization technique (SOT) to make the fractional EIoU loss approaching the minimum more steadily and smoothly. Fourthly, to fully exploit the capability of the EIoU based loss, we introduce an interrelated IoU-predicting head to further boost localization accuracy. With the proposed contributions, the new method incorporated into Faster R-CNN with ResNet50 FPN as the backbone yields 4.2 mAP gain on VOC2007 and 2.3 mAP gain on COCO2017 over the baseline Smooth-$\ell_1$ Loss, at almost no training and inferencing computational cost. Specifically, the stricter the metric is, the more notable the gain is, improving 8.2 mAP on VOC2007 and 5.4 mAP on COCO2017 at metric $AP_{90}$.
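摘要只给出了EIoU的两条性质(不重叠时有良好定义、重叠时退化为标准IoU)，未给出公式。下面的Python草图按这两条性质构造了一种可能的实现:重叠时返回标准IoU，不重叠时返回随间隙增大而减小的负值，使梯度处处非零。该公式并非论文原始定义，仅作示意。

```python
def interval_overlap(a0, a1, b0, b1):
    """一维区间交长度。"""
    return max(0.0, min(a1, b1) - max(a0, b0))

def extended_iou(b1, b2):
    """EIoU草图(非原文公式)。b: (x1, y1, x2, y2)。"""
    ix = interval_overlap(b1[0], b1[2], b2[0], b2[2])
    iy = interval_overlap(b1[1], b1[3], b2[1], b2[3])
    inter = ix * iy
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    if inter > 0:
        return inter / union                      # 重叠: 与标准IoU一致
    gx = max(b2[0] - b1[2], b1[0] - b2[2], 0.0)   # x方向间隙
    gy = max(b2[1] - b1[3], b1[1] - b2[3], 0.0)   # y方向间隙
    return -(gx + gy + gx * gy) / (union + 1.0)   # 不重叠: 负值, 间隙越大越小

print(extended_iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 重叠: 1/7
print(extended_iou((0, 0, 1, 1), (2, 2, 3, 3)))   # 不重叠: 负值, 两框相切时连续过渡到0
```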
【27】 Detect Faces Efficiently: A Survey and Evaluations 标题:有效的人脸检测:综述与评价 链接:https://arxiv.org/abs/2112.01787
作者:Yuantao Feng,Shiqi Yu,Hanyang Peng,Yan-Ran Li,Jianguo Zhang 摘要:人脸检测是在图像中搜索所有可能的人脸区域，并在有人脸的情况下进行定位。包括人脸识别、表情识别、人脸跟踪和头部姿势估计在内的许多应用都假设人脸在图像中的位置和大小都是已知的。近几十年来，研究人员创造了许多典型而高效的人脸检测器，从Viola-Jones人脸检测器到目前基于CNN的人脸检测器。然而，随着图像和视频的大量增加，人脸的尺度、外观、表情、遮挡和姿势都发生了变化，传统的人脸检测器面临着检测各种“野生”人脸的挑战。深度学习技术的出现给人脸检测带来了显著的突破，同时计算量也大幅增加。本文介绍了有代表性的基于深度学习的方法，并从准确性和效率方面进行了深入和彻底的分析。我们进一步比较和讨论了流行和具有挑战性的数据集及其评估指标。对几种成功的基于深度学习的人脸检测器进行了综合比较，通过FLOPs和延迟两个指标来揭示它们的效率。本文的研究可以指导人们根据不同的应用场合选择合适的人脸检测器，也有助于开发更高效、更准确的人脸检测器。 摘要:Face detection is to search all the possible regions for faces in images and locate the faces if there are any. Many applications including face recognition, facial expression recognition, face tracking and head-pose estimation assume that both the location and the size of faces are known in the image. In recent decades, researchers have created many typical and efficient face detectors from the Viola-Jones face detector to current CNN-based ones. However, with the tremendous increase in images and videos with variations in face scale, appearance, expression, occlusion and pose, traditional face detectors are challenged to detect various "in the wild" faces. The emergence of deep learning techniques brought remarkable breakthroughs to face detection along with the price of a considerable increase in computation. This paper introduces representative deep learning-based methods and presents a deep and thorough analysis in terms of accuracy and efficiency. We further compare and discuss the popular and challenging datasets and their evaluation metrics. A comprehensive comparison of several successful deep learning-based face detectors is conducted to uncover their efficiency using two metrics: FLOPs and latency. The paper can guide to choose appropriate face detectors for different applications and also to develop more efficient and accurate detectors.
【28】 Characterizing Performance Bugs in Deep Learning Systems 标题:深度学习系统中性能缺陷的表征 链接:https://arxiv.org/abs/2112.01771
作者:Junming Cao,Bihuan Chen,Chao Sun,Longjie Hu,Xin Peng 摘要:深度学习(DL)已越来越多地应用于各个领域。编程范式从传统系统向DL系统的转变对工程DL系统提出了独特的挑战。性能是其中一个挑战，DL系统中的性能缺陷(PBs)会导致严重的后果，如过度的资源消耗和财务损失。虽然DL系统中的bug已经得到了广泛的研究，但DL系统中的PBs几乎没有被研究过。为了弥补这一差距，我们提出了第一项综合研究，以描述用TensorFlow和Keras开发的DL系统中PBs的症状、根本原因以及引入和暴露阶段，共从225个StackOverflow帖子中收集了238个PBs。我们的发现为开发高性能DL系统以及检测和定位DL系统中的PBs提供了启示。我们还建立了DL系统中56个PBs的第一个基准，并评估了现有方法解决这些问题的能力。此外，我们还开发了一个静态检查器DeepPerf来检测三种类型的PBs，并在130个GitHub项目中识别了488个新PBs。其中62个和18个分别得到了开发人员的确认和修复。 摘要:Deep learning (DL) has been increasingly applied to a variety of domains. The programming paradigm shift from traditional systems to DL systems poses unique challenges in engineering DL systems. Performance is one of the challenges, and performance bugs (PBs) in DL systems can cause severe consequences such as excessive resource consumption and financial loss. While bugs in DL systems have been extensively investigated, PBs in DL systems have hardly been explored. To bridge this gap, we present the first comprehensive study to characterize symptoms, root causes, and introducing and exposing stages of PBs in DL systems developed in TensorFlow and Keras, with a total of 238 PBs collected from 225 StackOverflow posts. Our findings shed light on the implications on developing high performance DL systems, and detecting and localizing PBs in DL systems. We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them. Moreover, we develop a static checker DeepPerf to detect three types of PBs, and identify 488 new PBs in 130 GitHub projects. 62 and 18 of them have been respectively confirmed and fixed by developers.
【29】 Prescriptive Process Monitoring: Quo Vadis? 标题:规范性过程监控:何去何从? 链接:https://arxiv.org/abs/2112.01769
作者:Kateryna Kubrak,Fredrik Milani,Alexander Nolte,Marlon Dumas 摘要:规定性流程监控方法旨在通过在运行时建议干预措施来优化业务流程,以防止出现负面结果或表现不佳的情况。近年来,人们提出了各种规定性的过程监控方法。本文通过系统文献综述(SLR)研究该领域的现有方法。为了构建该领域,本文提出了一个根据绩效目标、绩效指标、干预类型、建模技术、数据输入和干预策略来描述规范性流程监控方法的框架。SLR为未来的研究提供了挑战和领域的见解,这些挑战和领域可以增强规范性过程监控方法的有用性和适用性。该文件强调需要在现实环境中验证现有和新方法,将干预类型扩展到与时间和成本角度相关的干预类型之外,并设计考虑因果关系和二阶效应的政策。 摘要:Prescriptive process monitoring methods seek to optimize a business process by recommending interventions at runtime to prevent negative outcomes or poorly performing cases. In recent years, various prescriptive process monitoring methods have been proposed. This paper studies existing methods in this field via a Systematic Literature Review (SLR). In order to structure the field, the paper proposes a framework for characterizing prescriptive process monitoring methods according to their performance objective, performance metrics, intervention types, modeling techniques, data inputs, and intervention policies. The SLR provides insights into challenges and areas for future research that could enhance the usefulness and applicability of prescriptive process monitoring methods. The paper highlights the need to validate existing and new methods in real-world settings, to extend the types of interventions beyond those related to the temporal and cost perspectives, and to design policies that take into account causality and second-order effects.
【30】 NeRF-SR: High-Quality Neural Radiance Fields using Super-Sampling 标题:NERF-SR:采用超采样的高质量神经辐射场 链接:https://arxiv.org/abs/2112.01759
作者:Chen Wang,Xian Wu,Yuan-Chen Guo,Song-Hai Zhang,Yu-Wing Tai,Shi-Min Hu 备注:this https URL 摘要:我们提出了NeRF-SR，这是一种用于高分辨率(HR)新视图合成的解决方案，其输入主要为低分辨率(LR)图像。我们的方法建立在神经辐射场(NeRF)的基础上，后者通过多层感知器预测每一点的密度和颜色。虽然NeRF可以以任意比例生成图像，但它难以生成超出观测图像分辨率的结果。我们的关键洞察是NeRF具有局部先验，这意味着3D点的预测可以在附近区域传播并保持准确。我们首先利用超采样策略，在每个图像像素上发射多条光线，在亚像素级别上施加多视角约束。然后，我们证明了NeRF-SR可以通过一个细化网络进一步提高超采样的性能，该网络利用已有的估计深度，从HR参考图像的相关图块中补全细节。实验结果表明，无论是在合成数据集还是在真实数据集上，NeRF-SR都能在HR生成高质量的新视图合成结果。 摘要:We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF) that predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF has a local prior, which means predictions of a 3D point can be propagated in the nearby region and remain accurate. We first exploit it by a super-sampling strategy that shoots multiple rays at each image pixel, which enforces multi-view constraint at a sub-pixel level. Then, we show that NeRF-SR can further boost the performance of super-sampling by a refinement network that leverages the estimated depth at hand to hallucinate details from related patches on an HR reference image. Experiment results demonstrate that NeRF-SR generates high-quality results for novel view synthesis at HR on both synthetic and real-world datasets.
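摘要中的超采样策略可用如下Python草图说明:对每个像素在其内部抖动出多个亚像素位置，对每个位置发射一条光线，渲染结果取平均后与该像素的LR颜色做监督，从而在亚像素级施加多视角约束。坐标约定与采样方式为示意性假设，省略了由相机内参生成光线方向的步骤。

```python
import numpy as np

def subpixel_samples(i, j, n=4, seed=0):
    """对像素(i, j)生成n个亚像素采样点(示意), 每个点对应一条待渲染的光线。"""
    rng = np.random.RandomState(seed)
    offsets = rng.uniform(0.0, 1.0, size=(n, 2))      # 像素内部的随机抖动
    xs = j + offsets[:, 0]                            # 图像平面x坐标
    ys = i + offsets[:, 1]                            # 图像平面y坐标
    return np.stack([xs, ys], axis=1)

samples = subpixel_samples(10, 20, n=4)
print(samples)     # 渲染这4条光线后取颜色平均, 即得该像素的监督信号
```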
【31】 Probing Linguistic Information For Logical Inference In Pre-trained Language Models 标题:在预先训练的语言模型中探索逻辑推理的语言信息 链接:https://arxiv.org/abs/2112.01753
作者:Zeming Chen,Qiyue Gao 备注:Accepted in AAAI 2022 摘要:在预先训练的语言模型方面取得的进展已经在自然语言理解的下游任务上取得了令人印象深刻的成果。最近关于探索预先训练的语言模型的工作揭示了在其语境化表达中编码的广泛的语言特性。然而,目前尚不清楚它们是否编码了对符号推理方法至关重要的语义知识。我们提出了一种在预先训练的语言模型表示中探测逻辑推理的语言信息的方法。我们的探测数据集涵盖了主要符号推理系统所需的语言现象列表。我们发现(i)预先训练的语言模型确实编码了几种类型的语言信息用于推理,但也有一些类型的信息是弱编码的,(ii)语言模型可以通过微调有效地学习缺失的语言信息。总的来说,我们的研究结果提供了语言模型及其训练前程序捕捉逻辑推理的语言信息的哪些方面的见解。此外,我们还展示了语言模型作为支持符号推理方法的语义和背景知识库的潜力。 摘要:Progress in pre-trained language models has led to a surge of impressive results on downstream tasks for natural language understanding. Recent work on probing pre-trained language models uncovered a wide range of linguistic properties encoded in their contextualized representations. However, it is unclear whether they encode semantic knowledge that is crucial to symbolic inference methods. We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference do language models and their pre-training procedures capture. Moreover, we have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
【32】 MaxRay: A Raytracing-based Integrated Sensing and Communication Framework 标题:MaxRay:一种基于光线跟踪的集成传感与通信框架 链接:https://arxiv.org/abs/2112.01751
作者:M. Arnold,M. Bauhofer,S. Mandelli,M. Henninger,F. Schaich,T. Wild,S. ten Brink 备注:Submitted to ICAS2021 摘要:综合传感和通信(ISAC)通过利用通信网络提取环境信息，在人类对通信的需求和提高生产力的需求之间形成共生关系。由于多种传感器已经能够形成对环境的感知，因此需要研究ISAC相比这些感知方式的优势。为此，我们引入了MaxRay，这是一个ISAC框架，允许联合仿真通信、感知以及其他传感方式。针对创建此类传感网络的挑战，我们介绍了传感所需的传播特性以及如何利用这些特性。为了比较不同传感技术的性能，我们分析了在不同领域中使用的四种常用指标，并评估了它们在传感方面的优缺点。我们表明，基于突出度(prominence)的度量适合覆盖大多数算法。此外，我们还强调了对杂波消除算法的需求，并使用两种标准杂波消除技术来检测典型工业场景中的目标。总体而言，我们演示了一个多功能框架，它可以创建自动标注的数据集，用于研究各种各样的任务。 摘要:Integrated Sensing And Communication (ISAC) forms a symbiosis between the human need for communication and the need for increasing productivity, by extracting environmental information leveraging the communication network. As multiple sensory already create a perception of the environment, an investigation into the advantages of ISAC compared to such modalities is required. Therefore, we introduce MaxRay, an ISAC framework allowing to simulate communication, sensing, and additional sensory jointly. Emphasizing the challenges for creating such sensing networks, we introduce the required propagation properties for sensing and how they are leveraged. To compare the performance of the different sensing techniques, we analyze four commonly used metrics used in different fields and evaluate their advantages and disadvantages for sensing. We depict that a metric based on prominence is suitable to cover most algorithms. Further we highlight the requirement of clutter removal algorithms, using two standard clutter removal techniques to detect a target in a typical industrial scenario. In general a versatile framework, allowing to create automatically labeled datasets to investigate a large variety of tasks is demonstrated.
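文中提到"基于突出度(prominence)的度量适合覆盖大多数算法"。下面的Python草图用一条合成的距离像演示这一思路:先用滑动平均估计背景并相减(一种简单的杂波消除)，再用SciPy的find_peaks按突出度检出目标。数据与阈值均为虚构示例，并非论文的原始实验设置。

```python
import numpy as np
from scipy.signal import find_peaks

# 合成距离像(示意): 随机背景杂波 + 位于第60个距离单元的目标回波
rng = np.random.RandomState(1)
profile = 0.3 * rng.rand(128)
profile[60] += 2.0

background = np.convolve(profile, np.ones(15) / 15, mode="same")  # 滑动平均背景估计
cleaned = profile - background                                    # 简单杂波消除

peaks, props = find_peaks(cleaned, prominence=0.5)    # 基于突出度的峰值检测
print(peaks, props["prominences"])                    # 预期检出第60个单元附近的目标
```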
【33】 Single-Shot Black-Box Adversarial Attacks Against Malware Detectors: A Causal Language Model Approach 标题:针对恶意软件检测器的单发黑盒对抗性攻击:一种因果语言模型方法 链接:https://arxiv.org/abs/2112.01724
作者:James Lee Hu,Mohammadreza Ebrahimi,Hsinchun Chen 摘要:基于深度学习(DL)的恶意软件检测器越来越多地被用于网络安全中恶意行为的早期检测。然而,它们对对抗性恶意软件变体的敏感性引起了极大的安全担忧。由防御者生成此类对抗性变体,对于提高基于DL的恶意软件检测器对它们的抵抗力至关重要。这种必要性催生了一个新兴的机器学习研究方向,即对抗性恶意软件示例生成(AMG),其目的是生成能够规避检测、同时保留给定恶意软件恶意功能的对抗性恶意软件变体。在AMG研究中,黑盒方法比白盒方法得到了更多的关注。然而,大多数黑盒AMG方法需要与恶意软件检测器进行大量交互,才能生成对抗性恶意软件示例。鉴于大多数恶意软件检测器强制执行查询限制,这可能导致生成不现实的对抗性示例,它们在实践中由于缺乏隐蔽性而很可能被检测到。在这项研究中,我们展示了一种新的基于DL的因果语言模型,该模型通过将恶意软件可执行文件的内容视为字节序列并训练生成式预训练Transformer(GPT),实现了单次规避(即,仅对恶意软件检测器进行一次查询)。我们提出的方法MalGPT在从VirusTotal获得的真实世界恶意软件数据集上的表现明显优于领先的基准方法,实现了超过24.51%的规避率。MalGPT使网络安全研究人员能够通过模拟大规模真实AMG来开发先进的防御能力。 摘要:Deep Learning (DL)-based malware detectors are increasingly adopted for early detection of malicious behavior in cybersecurity. However, their sensitivity to adversarial malware variants has raised immense security concerns. Generating such adversarial variants by the defender is crucial to improving the resistance of DL-based malware detectors against them. This necessity has given rise to an emerging stream of machine learning research, Adversarial Malware example Generation (AMG), which aims to generate evasive adversarial malware variants that preserve the malicious functionality of a given malware. Within AMG research, black-box methods have gained more attention than white-box methods. However, most black-box AMG methods require numerous interactions with the malware detectors to generate adversarial malware examples. Given that most malware detectors enforce a query limit, this could result in generating non-realistic adversarial examples that are likely to be detected in practice due to lack of stealth. In this study, we show that a novel DL-based causal language model enables single-shot evasion (i.e., with only one query to the malware detector) by treating the content of the malware executable as a byte sequence and training a Generative Pre-Trained Transformer (GPT). Our proposed method, MalGPT, significantly outperformed the leading benchmark methods on a real-world malware dataset obtained from VirusTotal, achieving an evasion rate of over 24.51%. MalGPT enables cybersecurity researchers to develop advanced defense capabilities by emulating large-scale realistic AMG.
【34】 Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents 标题:使用串联的BioMed-Transformers改进长医学文档的尾端标签预测 链接:https://arxiv.org/abs/2112.01718
作者:Vithya Yogarajan,Bernhard Pfahringer,Tony Smith,Jacob Montiel 摘要:多标签学习在考虑标签相关性的同时,从给定标签集中预测未知实例的标签子集。多标签分类的一个已知挑战是标签的长尾分布。许多研究侧重于改进模型的总体预测,因此没有优先考虑尾端标签。改进医学文本多标签分类中的尾端标签预测,有助于更好地了解患者并改善护理。从一个或多个罕见标签中获得的知识可能会影响医疗决策和治疗方案的走向。这项研究提出了多种串联的领域特定语言模型(包括multi-BioMed-Transformers),以实现两个主要目标。首先,在多标签问题上提高不常见标签(特别是长尾标签)的F1分数;第二,处理长医疗文本和多源电子健康记录(EHR),这对于为短输入序列设计的标准Transformer来说是一项具有挑战性的任务。这项研究的一个重要贡献是使用TransformerXL预测医学代码所获得的最新技术(SOTA)结果。在重症监护医疗信息集市(MIMIC-III)数据库上进行了各种实验。结果表明,在整体微观和宏观F1分数以及尾端标签的单个F1分数方面,串联的BioMed-Transformers优于标准Transformer,而对于长输入序列,其训练时间低于现有的基于Transformer的解决方案。 摘要:Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classifications of medical text enables the potential to understand patients better and improve care. The knowledge gained by one or more infrequent labels can impact the course of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.
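下面用PyTorch给出"串联多个编码器做多标签分类"这一思路的极简草图:将各编码器的表示拼接后接一个线性分类头,并使用逐标签的二元交叉熵。结构、维度与演示数据均为假设性示意,并非论文原始模型。

```python
import torch
import torch.nn as nn

class ConcatEncoderClassifier(nn.Module):
    """将多个编码器的输出拼接后做多标签分类的示意模型。"""
    def __init__(self, encoders, hidden_dim, num_labels):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)
        self.classifier = nn.Linear(hidden_dim * len(encoders), num_labels)

    def forward(self, inputs):
        # inputs: 与编码器一一对应的输入列表(如多段文本的张量表示)
        pooled = [enc(x) for enc, x in zip(self.encoders, inputs)]
        concat = torch.cat(pooled, dim=-1)   # 拼接各来源的表示
        return self.classifier(concat)        # 输出各标签的logits

# 演示:用两个小网络冒充两个领域编码器
encs = [nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
        nn.Sequential(nn.Linear(32, 64), nn.ReLU())]
model = ConcatEncoderClassifier(encs, hidden_dim=64, num_labels=10)
logits = model([torch.randn(4, 32), torch.randn(4, 32)])

# 多标签训练使用逐标签的二元交叉熵
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, torch.randint(0, 2, (4, 10)).float())
```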
【35】 Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks 标题:用于遥感任务的自监督材料和纹理表示学习 链接:https://arxiv.org/abs/2112.01715
作者:Peri Akiva,Matthew Purri,Matthew Leotta 摘要:自监督学习的目的是在不使用人工标注标签的情况下学习图像特征表示。它通常被用作获得有用的初始网络权值的先导步骤,这有助于加快收敛速度并提高下游任务的性能。虽然自我监督可以在不使用标签的情况下减少监督学习和非监督学习之间的领域差距,但自我监督目标仍然需要对下游任务产生强烈的归纳偏见,以实现有效的迁移学习。在这项工作中,我们提出了一种基于材料和纹理的自我监督方法,名为MATTER(材料和纹理表示学习),该方法受经典材料和纹理方法的启发。材质和纹理可以有效地描述任何表面,包括其触觉特性、颜色和镜面反射度。通过扩展,材料和纹理的有效表示可以描述与所述材料和纹理密切相关的其他语义类。MATTER利用不变区域上的多时、空间对齐的遥感图像来学习照明不变性和视角不变性,以此作为实现材质和纹理表示一致性的机制。我们表明,我们的自我监督预训练方法可以在无监督和微调设置中实现高达24.22%和6.33%的性能提升,并在变化检测、土地覆盖分类和语义分割任务上加快高达76%的收敛速度。 摘要:Self-supervised learning aims to learn image feature representations without the usage of manually annotated labels. It is often used as a precursor step to obtain useful initial network weights which contribute to faster convergence and superior performance of downstream tasks. While self-supervision allows one to reduce the domain gap between supervised and unsupervised learning without the usage of labels, the self-supervised objective still requires a strong inductive bias to downstream tasks for effective transfer learning. In this work, we present our material and texture based self-supervision method named MATTER (MATerial and TExture Representation Learning), which is inspired by classical material and texture methods. Material and texture can effectively describe any surface, including its tactile properties, color, and specularity. By extension, effective representation of material and texture can describe other semantic classes strongly associated with said material and texture. MATTER leverages multi-temporal, spatially aligned remote sensing imagery over unchanged regions to learn invariance to illumination and viewing angle as a mechanism to achieve consistency of material and texture representation. We show that our self-supervision pre-training method allows for up to 24.22% and 6.33% performance increase in unsupervised and fine-tuned setups, and up to 76% faster convergence on change detection, land cover classification, and semantic segmentation tasks.
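该方法的核心机制可以概括为:对同一未变化区域在不同时相、不同视角下拍摄的影像,约束其材质-纹理表示保持一致。以下是这一一致性损失的假设性草图,编码器与数据接口均为示意,并非论文原始实现。

```python
import torch
import torch.nn.functional as F

def consistency_loss(encoder, img_t1, img_t2):
    """同一未变化区域两个时相影像的表示一致性损失(示意)。"""
    z1 = F.normalize(encoder(img_t1), dim=-1)
    z2 = F.normalize(encoder(img_t2), dim=-1)
    # 表示越一致,余弦相似度越高,损失越小
    return 1.0 - (z1 * z2).sum(dim=-1).mean()

# 演示:用一个线性编码器与随机影像验证接口
enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
loss = consistency_loss(enc, torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64))
```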
【36】 TransCouplet:Transformer based Chinese Couplet Generation 标题:TransCouplet:基于Transformer的汉语对联生成 链接:https://arxiv.org/abs/2112.01707
作者:Kuan-Yu Chiang,Shihao Lin,Joe Chen,Qian Yin,Qizhen Jin 摘要:对联是一种由复杂句法与古代汉语构成的特殊诗歌形式。由于语义和语法规则的复杂性,创作合适的对联是一项艰巨的挑战。本文提出了一种基于Transformer的序列到序列对联生成模型。借助AnchiBERT,该模型能够捕捉对古代汉语的理解。此外,我们结合对联的语法规则评估了字形(Glyph)、拼音(PinYin)和词性(Part-of-Speech)标注,以进一步改进模型。 摘要:Chinese couplet is a special form of poetry composed of complex syntax with ancient Chinese language. Due to the complexity of semantic and grammatical rules, creation of a suitable couplet is a formidable challenge. This paper presents a transformer-based sequence-to-sequence couplet generation model. With the utilization of AnchiBERT, the model is able to capture ancient Chinese language understanding. Moreover, we evaluate the Glyph, PinYin and Part-of-Speech tagging on the couplet grammatical rules to further improve the model.
【37】 Differential Property Prediction: A Machine Learning Approach to Experimental Design in Advanced Manufacturing 标题:微分性能预测:先进制造实验设计的机器学习方法 链接:https://arxiv.org/abs/2112.01687
作者:Loc Truong,WoongJo Choi,Colby Wight,Lizzy Coda,Tegan Emerson,Keerti Kappagantula,Henry Kvinge 摘要:先进的制造技术使生产具有最先进性能的材料成为可能。然而,在许多情况下,这些技术的基于物理的模型的开发落后于它们在实验室中的使用。这意味着设计和运行实验主要通过反复试验来进行。这是次优的,因为实验是成本、时间和劳动密集型的。在这项工作中,我们提出了一个机器学习框架,即差分属性分类(DPC),它使实验者能够利用机器学习无与伦比的模式匹配能力来进行数据驱动的实验设计。DPC获取两个可能的实验参数集,并输出一个预测值,该预测值将产生具有操作员指定的更理想特性的材料。我们使用剪切辅助加工和挤出(ShAPE)这一固相加工技术,在AA7075管材制造工艺和机械性能数据上展示了DPC的成功。我们表明,通过关注实验者在多个候选实验参数之间进行选择的需要,我们可以将从加工参数预测材料特性的具有挑战性的回归任务重新构造为机器学习模型可以获得良好性能的分类任务。 摘要:Advanced manufacturing techniques have enabled the production of materials with state-of-the-art properties. In many cases however, the development of physics-based models of these techniques lags behind their use in the lab. This means that designing and running experiments proceeds largely via trial and error. This is sub-optimal since experiments are cost-, time-, and labor-intensive. In this work we propose a machine learning framework, differential property classification (DPC), which enables an experimenter to leverage machine learning's unparalleled pattern matching capability to pursue data-driven experimental design. DPC takes two possible experiment parameter sets and outputs a prediction of which will produce a material with a more desirable property specified by the operator. We demonstrate the success of DPC on AA7075 tube manufacturing process and mechanical property data using shear assisted processing and extrusion (ShAPE), a solid phase processing technology. We show that by focusing on the experimenter's need to choose between multiple candidate experimental parameters, we can reframe the challenging regression task of predicting material properties from processing parameters, into a classification task on which machine learning models can achieve good performance.
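DPC的输入是两组候选实验参数,输出是"哪一组更可能产生性能更优的材料"。下面给出这种成对比较分类思路的极简草图,其中数据为随机生成、模型为任选的分类器,均为假设性示意。

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# 示意数据:每行是一组工艺参数,prop是对应材料性能(数值越大越好)
rng = np.random.default_rng(0)
params = rng.random((200, 4))
prop = params @ np.array([0.5, -0.2, 0.8, 0.1]) + 0.05 * rng.standard_normal(200)

# 构造成对样本:特征为两组参数的拼接,标签为"第一组是否更优"
i, j = rng.integers(0, 200, 5000), rng.integers(0, 200, 5000)
X_pair = np.hstack([params[i], params[j]])
y_pair = (prop[i] > prop[j]).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_pair, y_pair)
# 预测:给定两组候选参数,判断先做哪一组实验更有希望
print(clf.predict(np.hstack([params[:1], params[1:2]])))
```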
【38】 TransZero: Attribute-guided Transformer for Zero-Shot Learning 标题:TransZero:一种基于属性引导的Zero-Shot学习转换器 链接:https://arxiv.org/abs/2112.01683
作者:Shiming Chen,Ziming Hong,Yang Liu,Guo-Sen Xie,Baigui Sun,Hao Li,Qinmu Peng,Ke Lu,Xinge You 备注:Accepted to AAAI'22 摘要:Zero-Shot学习(Zero-shot learning,ZSL)旨在通过将语义知识从可见类转移到不可见类来识别新类。语义知识是从不同类之间共享的属性描述中学习的,这些属性描述充当了定位对象属性(表示有区别的区域特征)的强先验,从而实现显著的视觉语义交互。尽管一些基于注意的模型试图在单个图像中学习这些区域特征,但是视觉特征的可转移性和区分性属性定位通常被忽略。在本文中,我们提出了一种属性引导Transformer网络,称为TransZero,用于细化视觉特征和学习ZSL中有区别的视觉嵌入表示的属性定位。具体而言,TransZero采用特征增强编码器来缓解ImageNet和ZSL基准之间的跨数据集偏差,并通过减少区域特征之间纠缠的相对几何关系来提高视觉特征的可转移性。为了学习局部增强的视觉特征,TransZero使用视觉语义解码器,在语义属性信息的指导下,定位与给定图像中每个属性最相关的图像区域。然后,利用局部增强的视觉特征和语义向量在视觉语义嵌入网络中进行有效的视觉语义交互。大量的实验表明,TransZero在三个ZSL基准上达到了最新水平。代码可从以下网址获得:https://github.com/shiming-chen/TransZero 。 摘要:Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero takes a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves the new state of the art on three ZSL benchmarks. The codes are available at: https://github.com/shiming-chen/TransZero.
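文中的"属性定位"可以理解为:以语义属性向量作为query,对图像区域特征做注意力,得到与每个属性最相关的局部视觉特征。以下为这一机制的假设性PyTorch草图,维度与数据均为示意,并非论文原始实现。

```python
import torch
import torch.nn.functional as F

def attribute_guided_attention(region_feats, attr_embeds):
    """region_feats: [B, R, D] 区域特征; attr_embeds: [A, D] 属性语义向量(示意)。"""
    B, R, D = region_feats.shape
    # 属性作为query与区域特征做缩放点积注意力
    scores = torch.einsum("ad,brd->bar", attr_embeds, region_feats) / D ** 0.5
    attn = F.softmax(scores, dim=-1)   # 每个属性在各区域上的注意力分布
    # 得到按属性聚合的局部增强视觉特征 [B, A, D]
    return torch.einsum("bar,brd->bad", attn, region_feats)

feats = torch.randn(2, 49, 256)   # 假设7x7的区域网格特征
attrs = torch.randn(85, 256)      # 假设85个属性向量
print(attribute_guided_attention(feats, attrs).shape)   # torch.Size([2, 85, 256])
```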
【39】 An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images 标题:一种从历史地图图像自动生成丰富的链接地理元数据的方法 链接:https://arxiv.org/abs/2112.01671
作者:Zekun Li,Yao-Yi Chiang,Sasan Tavakkol,Basel Shbita,Johannes H. Uhl,Stefan Leyk,Craig A. Knoblock 备注:DOI: 10.1145/3394486.3403381 摘要:历史地图包含在其他地方很难找到的详细地理信息,涵盖很长时间(例如,美国历史地形图为125年)。但是,这些地图通常以扫描图像的形式存在,没有可搜索的元数据。使历史地图可搜索的现有方法依靠繁琐的手工工作(包括众包)来生成元数据(例如地理位置和关键字)。光学字符识别(OCR)软件可以减轻所需的手动工作,但识别结果是单个单词而不是位置短语(例如,"黑色"和"山"与"黑色山")。本文提出了一种端到端的方法来解决查找和索引历史地图图像的实际问题。这种方法自动处理历史地图图像以提取其文本内容,并生成一组链接到大型外部地理空间知识库的元数据。RDF(资源描述框架)格式的链接元数据支持查找和索引历史地图的复杂查询,例如检索覆盖加利福尼亚州1000米以上山峰的所有历史地图。我们已经在一个名为mapKurator的系统中实现了该方法。我们使用来自不同地图样式、比例尺和覆盖范围的多个来源的历史地图对mapKurator进行了评估。我们的结果显示,与最先进的方法相比,有显著的改进。该代码已作为Kartta实验室项目的模块公开提供,网址为https://github.com/kartta-labs/Project. 摘要:Historical maps contain detailed geographic information difficult to find elsewhere covering long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
【40】 The Influence of Data Pre-processing and Post-processing on Long Document Summarization 标题:数据预处理和后处理对长文档摘要的影响 链接:https://arxiv.org/abs/2112.01660
作者:Xinwei Du,Kailun Dong,Yuchen Zhang,Yongsheng Li,Ruei-Yu Tsay 摘要:长文档摘要是自然语言处理领域的一项重要而艰巨的任务。在长文档摘要上的良好性能表明模型对人类语言有很好的理解。目前,大多数研究集中在如何修改Transformer的注意机制,以获得更高的ROUGE分数。对数据预处理和后处理的研究相对较少。在本文中,我们使用了两种预处理方法和一种后处理方法,并分析了这些方法对各种长文档摘要模型的影响。 摘要:Long document summarization is an important and hard task in the field of natural language processing. Good performance on long document summarization reveals that a model has a decent understanding of human language. Currently, most research focuses on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. Studies of data pre-processing and post-processing are relatively few. In this paper, we use two pre-processing methods and a post-processing method, and analyze the effect of these methods on various long document summarization models.
【41】 Multi-modal application: Image Memes Generation 标题:多模态应用:图像模因生成 链接:https://arxiv.org/abs/2112.01651
作者:Zhiyuan Liu,Chuanzheng Sun,Yuxin Jiang,Shiqi Jiang,Mei Ming 摘要:模因是一个有趣的词。互联网模因为我们对世界、媒体和我们自己生活的认知变化提供了独特的见解。如果你在网上冲浪足够长的时间,你会在网上的某个地方看到它。随着社交媒体平台的兴起和便捷的图像传播,图像模因已经声名鹊起。图像模因已经成为一种流行文化,在社交媒体、博客和公开信息的传播中发挥着重要作用。随着人工智能的发展和深度学习的广泛应用,自然语言处理(NLP)和计算机视觉(CV)也可以用来解决生活中的更多问题,包括模因生成。互联网模因通常采用图像的形式,通过结合模因模板(图像)和标题(自然语言句子)创建。在我们的项目中,我们提出了一种端到端编码器-解码器结构的模因生成器。对于给定的输入句子,我们使用模因模板选择模型来确定其表达的情感并选择图像模板,然后由模因字幕生成器生成字幕和模因。代码和模型可在github上获得。 摘要:Meme is an interesting word. Internet memes offer unique insights into the changes in our perception of the world, the media and our own lives. If you surf the Internet for long enough, you will see it somewhere on the Internet. With the rise of social media platforms and convenient image dissemination, Image Meme has gained fame. Image memes have become a kind of pop culture and they play an important role in communication over social media, blogs, and open messages. With the development of artificial intelligence and the widespread use of deep learning, Natural Language Processing (NLP) and Computer Vision (CV) can also be used to solve more problems in life, including meme generation. An Internet meme commonly takes the form of an image and is created by combining a meme template (image) and a caption (natural language sentence). In our project, we propose an end-to-end encoder-decoder architecture meme generator. For a given input sentence, we use the Meme template selection model to determine the emotion it expresses and select the image template. Captions and memes are then generated by the meme caption generator. Code and models are available at github.
【42】 Investigating the usefulness of Quantum Blur 标题:量子模糊的有用性研究 链接:https://arxiv.org/abs/2112.01646
作者:James R. Wootton,Marcel Pfaffhauser 摘要:虽然距离量子计算能够超越传统计算还有几年的时间,但它已经提供了可用于各个领域探索目的的资源。这包括电脑游戏、音乐和艺术中程序生成的某些任务。量子模糊(Quantum Blur)方法曾作为一个原理验证示例被提出,以表明可以利用量子软件的原理来设计程序生成方法。在这里,我们分析了该方法的效果,并将其与传统的模糊效果进行了比较。我们还确定了所观察到的效果是如何源自对量子叠加和纠缠的操纵的。 摘要:Though some years remain before quantum computation can outperform conventional computation, it already provides resources that can be used for exploratory purposes in various fields. This includes certain tasks for procedural generation in computer games, music and art. The Quantum Blur method was introduced as a proof-of-principle example, to show that it can be useful to design methods for procedural generation using the principles of quantum software. Here we analyse the effects of the method and compare it to conventional blur effects. We also determine how the effects seen derive from the manipulation of quantum superposition and entanglement.
【43】 Hamiltonian prior to Disentangle Content and Motion in Image Sequences 标题:用于解缠图像序列中内容与运动的哈密顿先验 链接:https://arxiv.org/abs/2112.01641
作者:Asif Khan,Amos Storkey 备注:Controllable Generative Modeling in Language and Vision Workshop at NeurIPS 2021 摘要:我们提出了一个高维序列数据的深层潜变量模型。我们的模型将潜在空间分解为内容和运动变量。为了对不同的动力学进行建模,我们将运动空间划分为子空间,并为每个子空间引入唯一的哈密顿算符。哈密顿形式提供了可逆的动力学,学习将运动路径约束为保持不变量。运动空间的显式划分将哈密顿量分解为对称群,并带来动力学的长期可分性。这种划分还意味着所学到的表示易于解释和控制。我们展示了我们的模型用于交换两个视频的运动、从给定图像生成各种动作序列和无条件序列生成的实用性。 摘要:We present a deep latent variable model for high dimensional sequential data. Our model factorises the latent space into content and motion variables. To model the diverse dynamics, we split the motion space into subspaces, and introduce a unique Hamiltonian operator for each subspace. The Hamiltonian formulation provides reversible dynamics that learn to constrain the motion path to conserve invariant properties. The explicit split of the motion space decomposes the Hamiltonian into symmetry groups and gives long-term separability of the dynamics. This split also means representations can be learnt that are easy to interpret and control. We demonstrate the utility of our model for swapping the motion of two videos, generating sequences of various actions from a given image and unconditional sequence generation.
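文中"哈密顿公式提供可逆动力学"可借助可分哈密顿量上的蛙跳(leapfrog)积分来直观理解:该积分是辛的、时间可逆的,并近似保持能量不变。下面给出在一个(可学习的)哈密顿函数上做一步蛙跳更新的假设性草图,并以简谐振子为例验证能量近似守恒;这只是说明性示例,并非论文原始实现。

```python
import torch

def leapfrog_step(hamiltonian, q, p, dt=0.1):
    """对 H(q, p) 做一步蛙跳积分;更新可逆、近似保持H不变(示意)。"""
    q = q.requires_grad_(True)
    p = p.requires_grad_(True)
    # 半步动量: p <- p - (dt/2) * dH/dq
    dHdq, = torch.autograd.grad(hamiltonian(q, p).sum(), q, create_graph=True)
    p = p - 0.5 * dt * dHdq
    # 整步位置: q <- q + dt * dH/dp
    dHdp, = torch.autograd.grad(hamiltonian(q, p).sum(), p, create_graph=True)
    q = q + dt * dHdp
    # 再半步动量
    dHdq, = torch.autograd.grad(hamiltonian(q, p).sum(), q, create_graph=True)
    p = p - 0.5 * dt * dHdq
    return q.detach(), p.detach()

# 示例:简谐振子 H = (q^2 + p^2) / 2,能量在蛙跳积分下近似守恒
H = lambda q, p: 0.5 * (q ** 2 + p ** 2).sum(-1)
q, p = torch.tensor([1.0]), torch.tensor([0.0])
for _ in range(3):
    q, p = leapfrog_step(H, q, p)
print(q, p, H(q, p))   # 能量应仍接近0.5
```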
【44】 LongChecker: Improving scientific claim verification by modeling full-abstract context 标题:LongChecker:通过建模摘要全文上下文来改进科学声明验证 链接:https://arxiv.org/abs/2112.01640
作者:David Wadden,Kyle Lo,Lucy Lu Wang,Arman Cohan,Iz Beltagy,Hannaneh Hajishirzi 备注:Preprint - work in progress. 9 pages 摘要:我们介绍了用于科学声明验证的LongChecker系统。给定一条科学声明和一篇包含证据的研究摘要,LongChecker基于声明与摘要的共享编码,以多任务方式预测真实性标签并识别支持性理由。我们在SciFact数据集上进行了实验,发现LongChecker实现了最先进的性能。我们进行分析以了解这一改进的来源,并发现确定声明与报告科学发现的理由之间的关系,通常需要理解理由出现的上下文。通过基于所有可用上下文做出标签决策,LongChecker在需要此类理解的情况下获得了更好的性能。此外,我们还证明了LongChecker能够利用弱监督的域内数据,以促进科学声明验证的Few-Shot域自适应。 摘要:We introduce the LongChecker system for scientific claim verification. Given a scientific claim and an evidence-containing research abstract, LongChecker predicts a veracity label and identifies supporting rationales in a multitask fashion based on a shared encoding of the claim and abstract. We perform experiments on the SciFact dataset, and find that LongChecker achieves state-of-the-art performance. We conduct analysis to understand the source of this improvement, and find that identifying the relationship between a claim and a rationale reporting a scientific finding often requires understanding the context in which the rationale appears. By making labeling decisions based on all available context, LongChecker achieves better performance on cases requiring this type of understanding. In addition, we show that LongChecker is able to leverage weakly-supervised in-domain data to facilitate few-shot domain adaptation for scientific claim verification.
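其多任务结构可示意为:对"声明+摘要"做共享编码,一个头预测真实性标签,另一个头逐句判断是否为支持理由。以下为假设性草图,编码器与句子表示接口均为示意,并非论文原始实现。

```python
import torch
import torch.nn as nn

class ClaimVerifier(nn.Module):
    """共享编码 + 真实性分类头 + 句子级理由选择头(示意)。"""
    def __init__(self, hidden=768, num_labels=3):
        super().__init__()
        self.label_head = nn.Linear(hidden, num_labels)   # SUPPORT / REFUTE / NEI
        self.rationale_head = nn.Linear(hidden, 2)        # 每句是否为支持理由

    def forward(self, cls_repr, sent_reprs):
        # cls_repr: [B, H] 声明+摘要的整体编码; sent_reprs: [B, S, H] 各句编码
        return self.label_head(cls_repr), self.rationale_head(sent_reprs)

model = ClaimVerifier()
label_logits, rationale_logits = model(torch.randn(2, 768), torch.randn(2, 20, 768))
# 多任务损失 = 标签交叉熵 + 句子级理由交叉熵(按权重相加)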
【45】 Evaluator for Emotionally Consistent Chatbots 标题:情感一致聊天机器人的评价器 链接:https://arxiv.org/abs/2112.01616
作者:Chenxiao Liu,Guanzhi Deng,Tao Ji,Difei Tang,Silai Zheng 备注:7 pages, 6 charts, 1 figure 摘要:评估当前序列级或对话级聊天机器人(如移情开放域会话模型)的一个挑战是确定聊天机器人是否以情感一致的方式表现。最近的研究仅从语境连贯性、语言流畅性、回复多样性或对话之间的逻辑自洽性等方面进行评价。这项工作提出训练一个评估器来判断聊天机器人的情感一致性。 摘要:One challenge for evaluating current sequence- or dialogue-level chatbots, such as Empathetic Open-domain Conversation Models, is to determine whether the chatbot performs in an emotionally consistent way. The most recent work only evaluates on the aspects of context coherence, language fluency, response diversity, or logical self-consistency between dialogues. This work proposes training an evaluator to determine the emotional consistency of chatbots.
【46】 Neurosymbolic Systems of Perception & Cognition: The Role of Attention 标题:感知和认知的神经符号系统:注意的作用 链接:https://arxiv.org/abs/2112.01603
作者:Hugo Latapie,Ozkan Kilic,Kristinn R. Thorisson,Pei Wang,Patrick Hammer 摘要:以累积学习为目标的认知架构必须提供必要的信息和控制结构,以允许代理从他们的经验中进行增量和自主学习。这包括管理代理的目标,以及在其感知-认知信息栈中不断地将感官信息与这些目标关联。学习代理的环境越多样化,这些机制就必须越通用和灵活,以处理更广泛的相关模式、任务和目标结构。虽然许多研究人员都同意,不同抽象层次的信息可能在组成、结构和处理机制上有所不同,但研究界对这些差异的细节并不普遍认同。二元处理架构(通常称为System-1和System-2)分别被提出作为低层和高层信息的认知处理模型。我们认为认知并非以这种方式二元划分,任何抽象层次的知识都涉及我们所说的神经符号信息,这意味着高层次和低层次的数据都必须包含符号和亚符号信息。此外,我们认为,高层次和低层次数据抽象处理之间的主要区别因素在很大程度上可以归因于所涉及的注意机制的性质。我们描述了这一观点背后的关键论点,并回顾了文献中的相关证据。 摘要:A cognitive architecture aimed at cumulative learning must provide the necessary information and control structures to allow agents to learn incrementally and autonomously from their experience. This involves managing an agent's goals as well as continuously relating sensory information to these in its perception-cognition information stack. The more varied the environment of a learning agent is, the more general and flexible must be these mechanisms to handle a wider variety of relevant patterns, tasks, and goal structures. While many researchers agree that information at different levels of abstraction likely differs in its makeup and structure and processing mechanisms, agreement on the particulars of such differences is not generally shared in the research community. A binary processing architecture (often referred to as System-1 and System-2) has been proposed as a model of cognitive processing for low- and high-level information, respectively. We posit that cognition is not binary in this way and that knowledge at any level of abstraction involves what we refer to as neurosymbolic information, meaning that data at both high and low levels must contain both symbolic and subsymbolic information. Further, we argue that the main differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms. We describe the key arguments behind this view and review relevant evidence from the literature.
【47】 Online Search With Best-Price and Query-Based Predictions 标题:基于最优价格和基于查询的预测的在线搜索 链接:https://arxiv.org/abs/2112.01592
作者:Spyros Angelopoulos,Shahin Kamali,Dehou Zhang 备注:22 pages, 5 figures 摘要:在在线(时间序列)搜索问题中,玩家会看到一系列在线披露的价格。在问题的标准定义中,对于每个披露的价格,玩家必须在不知道未来价格(除了其极值的上限和下限)的情况下,不可撤销地决定是否接受或拒绝该价格,目标是最小化竞争比,即序列中的最高价格与玩家所选价格之间的最坏情况比率。该问题刻画了在所披露样本存在不确定性时进行决策的若干应用。以前关于这个问题的工作基本上假设了极端情况:要么玩家几乎没有关于输入的信息,要么玩家得到了一些强大的、无错误的建议。在这项工作中,我们研究学习增强算法,其中存在关于输入的、可能有误的预测。具体而言,我们考虑两种不同的设置:预测与序列中的最高价格有关的设置,以及通过对多个二元查询的响应获得预测的设置。对于这两种设置,我们给出了搜索算法最坏情况性能作为预测误差函数的紧的或接近紧的上下界。我们还提供了在证券交易市场数据上的实验结果,证实了理论分析,并解释了我们的技术如何应用于其他学习增强应用。 摘要:In the online (time-series) search problem, a player is presented with a sequence of prices which are revealed in an online manner. In the standard definition of the problem, for each revealed price, the player must decide irrevocably whether to accept or reject it, without knowledge of future prices (other than an upper and a lower bound on their extreme values), and the objective is to minimize the competitive ratio, namely the worst-case ratio between the maximum price in the sequence and the one selected by the player. The problem formulates several applications of decision-making in the face of uncertainty on the revealed samples. Previous work on this problem has largely assumed extreme scenarios in which either the player has almost no information about the input, or the player is provided with some powerful, and error-free advice. In this work, we study learning-augmented algorithms, in which there is a potentially erroneous prediction concerning the input. Specifically, we consider two different settings: the setting in which the prediction is related to the maximum price in the sequence, as well as the setting in which the prediction is obtained as a response to a number of binary queries. For both settings, we provide tight, or near-tight upper and lower bounds on the worst-case performance of search algorithms as a function of the prediction error. We also provide experimental results on data obtained from stock exchange markets that confirm the theoretical analysis, and explain how our techniques can be applied to other learning-augmented applications.
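作为背景:当只知道价格上下界[m, M]时,经典在线搜索的最优策略是接受首个不低于sqrt(mM)的价格;引入对最高价的(可能有误的)预测后,一种常见做法是按信任程度在预测与最坏情形阈值之间插值。以下阈值规则只是说明这一思想的假设性草图,并非论文给出的具体算法。

```python
import math

def threshold_search(prices, m, M, pred=None, trust=0.5):
    """在线搜索:接受首个达到阈值的价格(示意)。
    pred: 对序列最高价的(可能有误的)预测; trust∈[0,1] 表示对预测的信任程度。"""
    worst_case = math.sqrt(m * M)   # 经典最优竞争比对应的保留价格
    if pred is None:
        threshold = worst_case
    else:
        threshold = trust * pred + (1 - trust) * worst_case
    for p in prices:
        if p >= threshold:
            return p
    return prices[-1]   # 序列结束时必须接受最后一个价格

print(threshold_search([30, 55, 42, 70, 48], m=20, M=100, pred=68, trust=0.8))
```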
【48】 InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation 标题:InfoLM:一种评估摘要与Data2Text生成的新指标 链接:https://arxiv.org/abs/2112.01589
作者:Pierre Colombo,Chloe Clavel,Pablo Piantanida 摘要:通过人工注释评估自然语言生成系统的质量非常昂贵。此外,人工注释活动非常耗时,且包含不可重复使用的人工劳动。在实践中,研究人员依靠自动度量作为质量的代理。在过去十年中,引入了许多基于字符串的度量(例如BLEU)。然而,这些度量通常依赖于精确匹配,因此不能可靠地处理同义词。在本文中,我们介绍了InfoLM,这是一系列无需训练的度量,可以被视为基于字符串的度量,借助预训练的掩码语言模型解决上述缺陷。这一系列指标还利用了信息度量,使InfoLM能够适应各种评估标准。通过直接评估,我们证明InfoLM取得了统计上显著的改进,并在摘要和data2text生成的许多配置中获得了超过10个点的相关性增益。 摘要:Assessing the quality of natural language generation systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as a string-based metric that addresses the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures allowing the adaptation of InfoLM to various evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and over 10 points of correlation gains in many configurations on both summarization and data2text generation.
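InfoLM的大致思路是:用预训练掩码语言模型在词表上给出概率分布,再用信息度量比较候选文本与参考文本的分布。以下为高度简化的假设性草图,仅演示"MLM分布+信息度量"的组合方式(此处以KL散度为例),并非论文的精确计算流程。

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def token_distribution(text):
    """将整句输入MLM,取各位置词表分布的平均作为文本的"词袋式"分布(极简近似)。"""
    with torch.no_grad():
        logits = mlm(**tok(text, return_tensors="pt")).logits
    return F.softmax(logits, dim=-1).mean(dim=1).squeeze(0)

def info_metric(candidate, reference):
    p, q = token_distribution(candidate), token_distribution(reference)
    # 以KL散度作为信息度量之一;InfoLM家族支持多种可替换的度量
    return F.kl_div(q.log(), p, reduction="sum").item()

print(info_metric("the cat sat on the mat", "a cat was sitting on the mat"))
```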
【49】 Towards Intrinsic Interactive Reinforcement Learning: A Survey 标题:本征交互式强化学习研究综述 链接:https://arxiv.org/abs/2112.01575
作者:Benjamin Poole,Minwoo Lee 摘要:强化学习(RL)和脑机接口(BCI)是过去十年中不断发展的两个领域。直到最近,这些领域还彼此独立运作。随着人们对人在回路(HITL)应用的兴趣不断增加,RL算法已被调整以考虑人类引导,从而产生了交互式强化学习(IRL)这一子领域。与之相邻的是,BCI应用长期以来一直对从人机交互过程中的神经活动中提取内在反馈感兴趣。通过将BCI集成到IRL框架中,这两种思路使RL与BCI走到了一起:在该框架中,可以利用内在反馈来帮助训练代理。这一交叉领域被称为内在IRL(intrinsic IRL)。为了进一步促进BCI与IRL的深度融合,我们对内在IRL进行了综述,重点介绍其父领域反馈驱动IRL,同时还就有效性、挑战和未来研究方向进行了讨论。 摘要:Reinforcement learning (RL) and brain-computer interfaces (BCI) are two fields that have been growing over the past decade. Until recently, these fields have operated independently of one another. With the rising interest in human-in-the-loop (HITL) applications, RL algorithms have been adapted to account for human guidance giving rise to the sub-field of interactive reinforcement learning (IRL). Adjacently, BCI applications have been long interested in extracting intrinsic feedback from neural activity during human-computer interactions. These two ideas have set RL and BCI on a collision course for one another through the integration of BCI into the IRL framework where intrinsic feedback can be utilized to help train an agent. This intersection has been denoted as intrinsic IRL. To further help facilitate deeper integration of BCI and IRL, we provide a review of intrinsic IRL with an emphasis on its parent field of feedback-driven IRL while also providing discussions concerning the validity, challenges, and future research directions.
【50】 Trajectory Clustering Performance Evaluation: If we know the answer, it's not clustering 标题:轨迹聚类性能评估:如果我们知道答案,那就不是聚类 链接:https://arxiv.org/abs/2112.01570
作者:Mohsen Rezaie,Nicolas Saunier 摘要:智能交通系统(ITS)的进步使得通过自动数据采集可以获得大量的交通数据。这些数据的很大一部分存储为移动车辆和道路使用者的轨迹。在最少的人工监督下自动分析这些数据既可以降低成本,又可以消除分析的主观性。轨迹聚类是一项无监督的任务。在本文中,我们使用来自七个交叉口的轨迹数据对相似性度量、聚类算法和评估度量进行了综合比较。我们还提出了一种基于起点和终点自动生成轨迹参考簇的方法,用于基于标签的评估度量。因此,整个过程在聚类和评估层面上都保持无监督。最后,我们使用一组评估指标来寻找每个交叉口表现最佳的相似性度量和聚类算法。结果表明,没有哪一种距离度量与聚类算法的组合能始终位列前十的聚类配置。 摘要:Advancements in Intelligent Traffic Systems (ITS) have made huge amounts of traffic data available through automatic data collection. A big part of this data is stored as trajectories of moving vehicles and road users. Automatic analysis of this data with minimal human supervision would both lower the costs and eliminate subjectivity of the analysis. Trajectory clustering is an unsupervised task. In this paper, we perform a comprehensive comparison of similarity measures, clustering algorithms and evaluation measures using trajectory data from seven intersections. We also propose a method to automatically generate trajectory reference clusters based on their origin and destination points to be used for label-based evaluation measures. Therefore, the entire procedure remains unsupervised both in clustering and evaluation levels. Finally, we use a combination of evaluation measures to find the top performing similarity measures and clustering algorithms for each intersection. The results show that there is no single combination of distance and clustering algorithm that is always among the top ten clustering setups.
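这类比较实验的流程骨架大致为:计算轨迹两两之间的距离(如对称Hausdorff距离),用支持预计算距离矩阵的算法聚类,再用内部指标评估。以下为假设性草图,轨迹数据为随机生成,仅示意流程(AgglomerativeClustering的metric参数需scikit-learn 1.2及以上版本)。

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# 示意数据:30条轨迹,每条为20个(x, y)点,人为分成3簇
trajs = [rng.random((20, 2)) + (i % 3) for i in range(30)]

def hausdorff(a, b):
    """对称Hausdorff距离:取两个有向距离中的较大者。"""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

n = len(trajs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = hausdorff(trajs[i], trajs[j])

labels = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                 linkage="average").fit_predict(D)
# 内部评估指标:轮廓系数(基于预计算距离)
print(silhouette_score(D, labels, metric="precomputed"))
```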
【51】 Is Approximation Universally Defensive Against Adversarial Attacks in Deep Neural Networks? 标题:深度神经网络中的近似是否普遍防御敌意攻击? 链接:https://arxiv.org/abs/2112.01555
作者:Ayesha Siddique,Khaza Anuarul Hoque 备注:Accepted for publication in DATE 2022 摘要:近似计算以其在提高深度神经网络(DNN)加速器能量效率方面的有效性而闻名,但其代价是轻微的精度损失。最近,据报道,近似组件(如近似乘法器)的不精确性也成功地防御了对DNN模型的对抗性攻击。由于近似误差以掩蔽或未掩蔽的形式穿过DNN各层,这就提出了一个关键的研究问题:近似计算能否始终对DNN中的对抗性攻击提供防御,即它们是否具有普遍的防御能力?为此,我们使用最先进的近似乘法器对不同的近似DNN加速器(AxDNN)进行了广泛的对抗鲁棒性分析。特别是,我们使用MNIST和CIFAR-10数据集评估了十种对抗性攻击对不同AxDNN的影响。我们的结果表明,对AxDNN的对抗性攻击可导致53%的准确度损失,而相同的攻击在精确DNN上可能几乎不造成准确度损失(低至0.06%)。因此,近似计算不能被称为针对对抗性攻击的通用防御策略。 摘要:Approximate computing is known for its effectiveness in improving the energy efficiency of deep neural network (DNN) accelerators at the cost of slight accuracy loss. Very recently, the inexact nature of approximate components, such as approximate multipliers, has also been reported to be successful in defending against adversarial attacks on DNN models. Since the approximation errors traverse through the DNN layers as masked or unmasked, this raises a key research question: can approximate computing always offer a defense against adversarial attacks in DNNs, i.e., are they universally defensive? Towards this, we present an extensive adversarial robustness analysis of different approximate DNN accelerators (AxDNNs) using the state-of-the-art approximate multipliers. In particular, we evaluate the impact of ten adversarial attacks on different AxDNNs using the MNIST and CIFAR-10 datasets. Our results demonstrate that adversarial attacks on AxDNNs can cause 53% accuracy loss whereas the same attack may lead to almost no accuracy loss (as low as 0.06%) in the accurate DNN. Thus, approximate computing cannot be referred to as a universal defense strategy against adversarial attacks.
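文中这类鲁棒性分析的基本流程可用最简单的FGSM攻击来示意:沿损失对输入梯度的符号方向加入微小扰动,再比较扰动前后的准确率;同一流程分别作用于精确DNN与AxDNN即可对比精度损失。以下为假设性草图,模型与数据加载器为示意接口。

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """FGSM:沿损失对输入梯度的符号方向加扰动(示意)。"""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # 假设输入已归一化到[0, 1]

def accuracy_under_attack(model, loader, eps=0.03):
    correct = total = 0
    for x, y in loader:
        x_adv = fgsm_attack(model, x, y, eps)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    # 分别对精确DNN与AxDNN统计该指标,即可比较攻击导致的精度损失
    return correct / total
```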
【52】 Improving mathematical questioning in teacher training 标题:在教师培训中改进数学提问 链接:https://arxiv.org/abs/2112.01537
作者:Debajyoti Datta,Maria Phillips,James P Bywater,Jennifer Chiu,Ginger S. Watson,Laura E. Barnes,Donald E Brown 备注:Accepted to appear at the NeurIPS 2021 Human Centered AI Workshop (HCAI). arXiv admin note: text overlap with arXiv:2112.00985 摘要:高保真、基于人工智能的模拟教室系统使教师能够演练有效的教学策略。然而,以对话为导向的开放式交流,例如向学生讲授比例因子(scale factor),可能难以建模。本文介绍了一个高保真、基于人工智能的课堂模拟器,以帮助教师演练基于研究的数学提问技能。我们采用以人为中心的方法来设计系统,依靠深度学习、不确定性量化和自然语言处理方面的进步,同时承认会话代理在特定教学需求上的局限性。通过在模拟过程中直接使用专家的输入,我们演示了如何实现较高的对话成功率和用户满意度。 摘要:High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factor can be difficult to model. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skills. We take a human centered approach to designing our system, relying on advances in deep learning, uncertainty quantification and natural language processing while acknowledging the limitations of conversational agents for specific pedagogical needs. Using experts' input directly during the simulation, we demonstrate how conversation success rate and high user satisfaction can be achieved.
【53】 Towards Super-Resolution CEST MRI for Visualization of Small Structures 标题:面向小结构可视化的超分辨率CEST MRI 链接:https://arxiv.org/abs/2112.01905
作者:Lukas Folle,Katharina Tkotz,Fasil Gadjimuradov,Lorenz Kapsner,Moritz Fabian,Sebastian Bickelhaupt,David Simon,Arnd Kleyer,Gerhard Krönke,Moritz Zaiß,Armin Nagel,Andreas Maier 摘要:风湿性疾病(如类风湿性关节炎)的发病通常是亚临床的,这对疾病的早期发现构成挑战。然而,解剖结构的特征性变化可以通过MRI或CT等成像技术检测出来。现代成像技术,如化学交换饱和转移(CEST)MRI,通过对体内代谢物的成像,有望进一步改善早期检测。为了成像患者关节中的小结构(通常是疾病引起变化的最早区域之一),CEST MR成像需要高分辨率。然而,由于采集的底层物理限制,目前CEST MR固有分辨率较低。在这项工作中,我们比较了已有的上采样技术和基于神经网络的超分辨率方法。我们证明,神经网络能够比现有方法更好地学习从低分辨率到高分辨率非饱和CEST图像的映射。在测试集上,使用ResNet神经网络可以实现32.29dB(+10%)的PSNR、0.14(-28%)的NRMSE和0.85(+15%)的SSIM,显著改善了基线。这项工作为超分辨率CEST MRI神经网络的前瞻性研究铺平了道路,并有望实现风湿性疾病发病的更早检测。 摘要:The onset of rheumatic diseases such as rheumatoid arthritis is typically subclinical, which results in challenging early detection of the disease. However, characteristic changes in the anatomy can be detected using imaging techniques such as MRI or CT. Modern imaging techniques such as chemical exchange saturation transfer (CEST) MRI drive the hope to improve early detection even further through the imaging of metabolites in the body. To image small structures in the joints of patients, typically one of the first regions where changes due to the disease occur, a high resolution for the CEST MR imaging is necessary. Currently, however, CEST MR suffers from an inherently low resolution due to the underlying physical constraints of the acquisition. In this work we compared established up-sampling techniques to neural network-based super-resolution approaches. We could show that neural networks are able to learn the mapping from low-resolution to high-resolution unsaturated CEST images considerably better than present methods. On the test set, a PSNR of 32.29dB (+10%), an NRMSE of 0.14 (-28%), and an SSIM of 0.85 (+15%) could be achieved using a ResNet neural network, improving the baseline considerably. This work paves the way for the prospective investigation of neural networks for super-resolution CEST MRI and might subsequently lead to an earlier detection of the onset of rheumatic diseases.
【54】 Causal Homotopy 标题:因果同伦 链接:https://arxiv.org/abs/2112.01847
作者:Sridhar Mahadevan 备注:18 pages. arXiv admin note: text overlap with arXiv:2110.15431 摘要:我们利用DAG的偏序集(poset)表示与有限Alexandroff拓扑之间的密切联系,刻画了因果DAG模型之间的同伦等价性。Alexandroff空间产生一个有方向的拓扑空间:拓扑由唯一的最小基定义,该最小基为每个变量x给出一个开集,定义为包含x的所有开集的交集。Alexandroff空间诱导一个(自反、传递的)预序。满足Kolmogorov T0分离准则(开集可区分变量)的Alexandroff空间将预序转化为偏序。我们的方法大体上是从数据中构造偏序集的拓扑表示,然后使用偏序集表示来构建传统的DAG因果模型。我们通过展示该框架如何统一先前提出的不同算法和案例研究来说明它。拓扑在因果发现中起着两个关键作用。首先,数据集上的拓扑可分性约束已在以前的几种方法中用于从观察和干预推断因果结构。其次,用于表示因果结构的各种图模型可以借助诱导偏序集结构的拓扑表示,以统一的方式加以表述。我们证明了利用Alexandroff空间的同伦理论可以显著减少可能的DAG结构的数量,将搜索空间缩小几个数量级。 摘要:We characterize homotopical equivalences between causal DAG models, exploiting the close connections between partially ordered set representations of DAGs (posets) and finite Alexandroff topologies. Alexandroff spaces yield a directional topological space: the topology is defined by a unique minimal basis defined by an open set for each variable x, specified as the intersection of all open sets containing x. Alexandroff spaces induce a (reflexive, transitive) preorder. Alexandroff spaces satisfying the Kolmogorov T0 separation criterion, where open sets distinguish variables, convert the preordering into a partial ordering. Our approach broadly is to construct a topological representation of posets from data, and then use the poset representation to build a conventional DAG causal model. We illustrate our framework by showing how it unifies disparate algorithms and case studies proposed previously. Topology plays two key roles in causal discovery. First, topological separability constraints on datasets have been used in several previous approaches to infer causal structure from observations and interventions. Second, a diverse range of graphical models used to represent causal structures can be represented in a unified way in terms of a topological representation of the induced poset structure. We show that the homotopy theory of Alexandroff spaces can be exploited to significantly efficiently reduce the number of possible DAG structures, reducing the search space by several orders of magnitude.
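"由DAG诱导Alexandroff拓扑"的构造可以直接写出:取可达关系(自反传递闭包)作为预序,再为每个变量取包含它的最小开集(此处约定为其自身及全部可达后继)。以下为基于networkx的假设性草图,仅演示该构造与T0(Kolmogorov)分离性的检查,方向约定为示意性选择。

```python
import networkx as nx

# 示意DAG:x -> y -> z
G = nx.DiGraph([("x", "y"), ("y", "z")])

# 预序 = 可达关系(自反传递闭包)
closure = nx.transitive_closure(G, reflexive=True)

# 每个变量的最小开集:此处约定为其自身及所有可达后继
minimal_open = {v: set(closure.successors(v)) for v in closure.nodes}
print(minimal_open)   # 形如 {'x': {'x','y','z'}, 'y': {'y','z'}, 'z': {'z'}}

# T0(Kolmogorov)检查:不同变量的最小开集必须可区分
t0 = all(minimal_open[u] != minimal_open[v]
         for u in G for v in G if u != v)
print("T0:", t0)
```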
【55】 Engineering AI Tools for Systematic and Scalable Quality Assessment in Magnetic Resonance Imaging 标题:磁共振成像系统化、可扩展性质量评估的工程人工智能工具 链接:https://arxiv.org/abs/2112.01629
作者:Yukai Zou,Ikbeom Jang 备注:6 pages, 2 figures, NeurIPS Data-Centric AI Workshop 2021 (Virtual) 摘要:随着机器学习算法、并行计算和硬件技术的发展,构建大型医学影像数据集的愿望不断增加。因此,汇集来自多个临床和学术机构的数据,以开展大规模临床或转化研究的需求日益增长。磁共振成像(MRI)是一种常用的非侵入性成像方式。然而,构建大型MRI数据存储库面临着与隐私、数据大小、DICOM格式、物流和非标准化图像相关的多重挑战。不仅构建数据存储库很困难,使用存储库中汇集的数据同样具有挑战性,因为不同MRI供应商和成像站点在图像采集、重建和处理管道上存在异构性。这篇立场论文描述了构建大型MRI数据存储库以及在各个方面使用从此类数据存储库下载的数据所面临的挑战。为了帮助应对这些挑战,本文建议引入一个质量评估管道,并给出相应的考虑因素和一般设计原则。 摘要:A desire to achieve large medical imaging datasets keeps increasing as machine learning algorithms, parallel computing, and hardware technology evolve. Accordingly, there is a growing demand in pooling data from multiple clinical and academic institutes to enable large-scale clinical or translational research studies. Magnetic resonance imaging (MRI) is a frequently used, non-invasive imaging modality. However, constructing a big MRI data repository has multiple challenges related to privacy, data size, DICOM format, logistics, and non-standardized images. Not only is building the data repository difficult, but using the pooled data is also challenging, due to heterogeneity in image acquisition, reconstruction, and processing pipelines across MRI vendors and imaging sites. This position paper describes challenges in constructing a large MRI data repository and using data downloaded from such data repositories in various aspects. To help address the challenges, the paper proposes introducing a quality assessment pipeline, with considerations and general design principles.
【56】 Quantifying the uncertainty of neural networks using Monte Carlo dropout for deep learning based quantitative MRI 标题:基于蒙特卡罗Dropout的深度学习定量MRI神经网络不确定性量化 链接:https://arxiv.org/abs/2112.01587
作者:Mehmet Yigit Avci,Ziyu Li,Qiuyun Fan,Susie Huang,Berkin Bilgic,Qiyuan Tian 摘要:Dropout通常在训练阶段用作正则化方法,并用于量化深度学习中的不确定性。我们建议在训练和推理步骤中都使用dropout,并对多次预测取平均,以提高准确性,同时减少并量化不确定性。我们在仅由3个方向扫描获得的分数各向异性(FA)和平均扩散率(MD)图上评估了结果。使用我们的方法,与不使用dropout的网络输出相比,准确度可以显著提高,特别是在训练数据集很小的情况下。此外,生成的置信图可能有助于诊断未见过的病理或伪影。 摘要:Dropout is conventionally used during the training phase as regularization method and for quantifying uncertainty in deep learning. We propose to use dropout during training as well as inference steps, and average multiple predictions to improve the accuracy, while reducing and quantifying the uncertainty. The results are evaluated for fractional anisotropy (FA) and mean diffusivity (MD) maps, which are obtained from only 3-direction scans. With our method, accuracy can be improved significantly compared to network outputs without dropout, especially when the training dataset is small. Moreover, confidence maps are generated which may aid in diagnosis of unseen pathology or artifacts.
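蒙特卡罗dropout的推理做法很直接:推理时让dropout层保持随机丢弃,进行多次前向传播,对预测取均值作为输出、取标准差作为不确定性(置信度)图。以下为假设性PyTorch草图,网络结构为示意,并非论文原始实现。

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=32):
    """推理时仅让Dropout层保持随机丢弃,多次前向取均值与标准差(示意)。"""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()   # 只激活dropout,避免影响BatchNorm等层的统计量
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # 平均预测 与 逐体素不确定性图

# 演示:一个带dropout的小网络
net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))
mean, std = mc_dropout_predict(net, torch.randn(4, 16))
print(mean.shape, std.shape)
```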
【57】 Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images 标题:基于未配准多期CT图像的端到端局灶性肝病检测 链接:https://arxiv.org/abs/2112.01535
作者:Sang-gil Lee,Eunji Kim,Jae Seok Bae,Jung Hoon Kim,Sungroh Yoon 备注:IEEE TETCI. 14 pages, 8 figures, 5 tables 摘要:肝脏局灶性病变(FLLs)的计算机辅助诊断有助于改进工作流程,实现正确诊断;FLL检测是计算机辅助诊断的第一步。尽管最近基于深度学习的方法在检测FLL方面取得了成功,但目前的方法对于评估失调的多相数据还不够稳健。通过在特征空间中引入注意引导的多相位对齐,本研究提出了一个全自动的端到端学习框架,用于从多相位CT(CT)图像中检测FLL。我们的方法由于其完全基于学习的方法而对错位多相图像具有鲁棒性,这降低了模型性能对配准质量的敏感性,并使模型能够在临床实践中独立部署。对一个包含280名患者的大规模数据集的评估证实,我们的方法优于以前的最新方法,并显著降低了使用错位多期CT图像检测FLL的性能下降。该方法的鲁棒性可以提高基于深度学习的计算机辅助检测系统的临床应用。 摘要:The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase alignment in feature space, this study presents a fully automated, end-to-end learning framework for detecting FLLs from multiphase computed tomography (CT) images. Our method is robust to misaligned multiphase images owing to its complete learning-based approach, which reduces the sensitivity of the model's performance to the quality of registration and enables a standalone deployment of the model in clinical practice. Evaluation on a large-scale dataset with 280 patients confirmed that our method outperformed previous state-of-the-art methods and significantly reduced the performance degradation for detecting FLLs using misaligned multiphase CT images. The robustness of the proposed method can enhance the clinical adoption of the deep-learning-based computer-aided detection system.
机器翻译,仅供参考