人工智能学术速递[12.15]

2021-12-17 16:25:23

cs.AI人工智能,共计57篇

【1】 EgoBody: Human Body Shape, Motion and Social Interactions from Head-Mounted Devices 标题:EgoBody:来自头盔设备的人体形状、运动和社会互动 链接:https://arxiv.org/abs/2112.07642

作者:Siwei Zhang,Qianli Ma,Yan Zhang,Zhiyin Qian,Marc Pollefeys,Federica Bogo,Siyu Tang 机构:ETH Zürich, Microsoft 摘要:从第一人称视角理解社会互动对于许多应用至关重要，从辅助机器人到AR/VR。推理交互的第一步是理解人类的姿势和形状。然而，由于缺乏数据，这方面的研究目前受到阻碍。现有数据集在大小、注释、真值采集模态或交互多样性方面都受到限制。我们通过提出EgoBody来解决这个缺点，EgoBody是一个用于复杂3D场景中社会交互的新型大规模数据集。我们使用Microsoft HoloLens2头戴设备记录丰富的以自我为中心的数据流(包括RGB、深度、眼睛注视、头部和手部跟踪)。为了获得准确的3D真值，我们用多Kinect采集装置对头戴设备进行标定，并将富有表现力的SMPL-X身体网格拟合到多视图RGB-D帧，重建相对于场景的3D人体姿势和形状。我们收集了68个序列，跨越不同的社会交互类别，并提出了第一个从自我中心视角进行3D全身姿势和形状估计的基准。我们的数据集和代码将在 https://sanweiliti.github.io/egobody/egobody.html 公开供研究使用。 摘要:Understanding social interactions from first-person views is crucial for many applications, ranging from assistive robotics to AR/VR. A first step for reasoning about interactions is to understand human pose and shape. However, research in this area is currently hindered by the lack of data. Existing datasets are limited in terms of either size, annotations, ground-truth capture modalities or the diversity of interactions. We address this shortcoming by proposing EgoBody, a novel large-scale dataset for social interactions in complex 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, head and hand tracking). To obtain accurate 3D ground-truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human poses and shapes relative to the scene. We collect 68 sequences, spanning diverse sociological interaction categories, and propose the first benchmark for 3D full-body pose and shape estimation from egocentric views. Our dataset and code will be available for research at https://sanweiliti.github.io/egobody/egobody.html.

【2】 How and Why to Manipulate Your Own Agent 标题:如何以及为什么操纵您自己的代理 链接:https://arxiv.org/abs/2112.07640

作者:Yoav Kolumbus,Noam Nisan 机构:†The Hebrew University of Jerusalem 摘要:我们考虑这样的战略环境：若干用户参与重复的在线互动，由后悔最小化代理协助，这些代理以用户的名义重复进行“博弈”。我们研究了代理重复博弈的动力学和平均结果，并将其视为在用户之间诱导出一个元博弈。我们主要关注用户是否可以通过向代理谎报自己的参数来“操纵”自己的代理，从而在这个元博弈中获益。我们正式定义了一般博弈的“用户-代理元博弈”模型，讨论了它在自动代理动力学不同收敛概念下的性质，并分析了在动力学收敛到单一均衡的2x2博弈中对用户诱导出的均衡。 摘要:We consider strategic settings where several users engage in a repeated online interaction, assisted by regret-minimizing agents that repeatedly play a "game" on their behalf. We study the dynamics and average outcomes of the repeated game of the agents, and view it as inducing a meta-game between the users. Our main focus is on whether users can benefit in this meta-game from "manipulating" their own agent by mis-reporting their parameters to it. We formally define this "user-agent meta-game" model for general games, discuss its properties under different notions of convergence of the dynamics of the automated agents and analyze the equilibria induced on the users in 2x2 games in which the dynamics converge to a single equilibrium.
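上文所述“后悔最小化代理以用户名义重复博弈”的动态，可以用一个极简的玩具示例示意（仅为示意草图，收益矩阵与参数均为假设，并非论文的模型）：两个遗憾匹配(regret matching)代理反复进行2x2囚徒困境博弈，由于背叛严格占优，代理会收敛到双方背叛。

```python
import random

# Toy sketch (not the paper's model): two regret-matching agents
# repeatedly play a 2x2 Prisoner's Dilemma.  Action 0 = cooperate,
# action 1 = defect.  PAYOFFS[i] is player i's payoff matrix indexed
# by (player 0's action, player 1's action).
PAYOFFS = [
    [[3, 0], [5, 1]],  # row player
    [[3, 5], [0, 1]],  # column player
]

def regret_matching_strategy(regrets):
    """Mix proportionally to positive regrets; uniform if none are positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [0.5, 0.5]
    return [p / total for p in pos]

def play(rounds=3000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    counts = [[0, 0], [0, 0]]
    for _ in range(rounds):
        strats = [regret_matching_strategy(r) for r in regrets]
        acts = [0 if rng.random() < s[0] else 1 for s in strats]
        for i in (0, 1):
            opp, played = acts[1 - i], acts[i]
            # regret update: payoff each action *would* have earned
            # against the opponent's realized action, minus actual payoff
            if i == 0:
                got = PAYOFFS[0][played][opp]
                for a in (0, 1):
                    regrets[0][a] += PAYOFFS[0][a][opp] - got
            else:
                got = PAYOFFS[1][opp][played]
                for a in (0, 1):
                    regrets[1][a] += PAYOFFS[1][opp][a] - got
            counts[i][played] += 1
    return counts

counts = play()
defect_freq = counts[0][1] / sum(counts[0])
```

在这个草图上，“操纵自己的代理”相当于向 `play` 传入谎报后的收益矩阵，再比较诱导出的元博弈结果。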

【3】 ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs 标题:ISEEQ:基于动态元信息检索和知识图的信息搜索问题生成 链接:https://arxiv.org/abs/2112.07622

作者:Manas Gaur,Kalpa Gunaratna,Vijay Srinivasan,Hongxia Jin 机构:†AI Institute, University of South Carolina, SC, USA, ‡Samsung Research America, Mountain View, CA, USA 备注:Accepted at AAAI 2022, preprint version. Supplementary materials are provided in the paper and alternatively can be found at this https URL 摘要:会话信息搜索(CIS)是会话人工智能中一个相对较新的研究领域，它试图从最终用户那里寻找信息，以理解和满足用户的需求。如果实现，这样的系统在现实世界中具有深远的好处；例如，CIS系统可以帮助临床医生在医疗保健中对患者进行预筛选或分诊。CIS中的一个关键开放子问题在文献中尚未解决，即基于最终用户的简短初始查询生成信息搜索问题(ISQ)。为了解决这个开放性问题，我们提出了信息搜索问题生成器(ISEEQ)，这是一种新的方法，用于在给定与用户查询相关的大型文本语料库的情况下，从一个简短的用户查询生成ISQ。首先，ISEEQ使用知识图来丰富用户查询。其次，ISEEQ使用知识丰富的查询来检索相关的上下文段落，以询问遵循概念流的连贯ISQ。第三，ISEEQ引入了一种新的基于深度生成对抗强化学习的ISQ生成方法。我们证明ISEEQ可以生成高质量的ISQ，以促进CIS代理的发展。ISEEQ在具有来自不同领域用户查询的四个数据集的五个ISQ评估指标上显著优于可比基线。此外，我们认为ISEEQ可以跨域迁移以生成ISQ，因为在不同的域对上训练和测试时它表现出可接受的性能。定性人工评估证实ISEEQ生成的ISQ在质量上与人工生成的问题相当，并且优于最佳可比基线。 摘要:Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users' needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end-user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative-adversarial reinforcement learning-based approach for generating ISQs. 
We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms that ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline.

【4】 Cold Item Integration in Deep Hybrid Recommenders via Tunable Stochastic Gates 标题:基于可调随机门的深度混合推荐器冷项集成 链接:https://arxiv.org/abs/2112.07615

作者:Oren Barkan,Roy Hirsch,Ori Katz,Avi Caciularu,Jonathan Weill,Noam Koenigstein 机构:The Open University, Tel-Aviv University, Technion, Bar-Ilan University, Microsoft 摘要:协同过滤方法中的一个主要挑战是如何为冷项（没有评分的项目）生成推荐，或将冷项集成到现有目录中。多年来，人们提出了各种混合推荐模型，通过利用项目的元数据和内容以及它们的评分或使用模式来解决这个问题。在这项工作中，我们希望重新审视冷启动问题，以提请注意一个被忽视的挑战：整合并平衡（常规）暖项目与完全冷项目的能力。在这种情况下，出现了两个不同的挑战：(1)保持暖项目的高质量性能，同时(2)学习向相关用户推广冷项目。首先，我们表明这两个目标实际上是相互冲突的，它们之间的平衡取决于业务需求和手头的应用。接下来，我们提出了一种新的混合推荐算法，该算法将这两个相互冲突的目标衔接起来，在保持暖项目高准确率的同时有效推广完全冷项目，并在二者之间实现协调平衡。我们在电影、应用程序和文章推荐上证明了所提出算法的有效性，并对冷暖权衡进行了实证分析。 摘要:A major challenge in collaborative filtering methods is how to produce recommendations for cold items (items with no ratings), or integrate cold item into an existing catalog. Over the years, a variety of hybrid recommendation models have been proposed to address this problem by utilizing items' metadata and content along with their ratings or usage patterns. In this work, we wish to revisit the cold start problem in order to draw attention to an overlooked challenge: the ability to integrate and balance between (regular) warm items and completely cold items. In this case, two different challenges arise: (1) preserving high quality performance on warm items, while (2) learning to promote cold items to relevant users. First, we show that these two objectives are in fact conflicting, and the balance between them depends on the business needs and the application at hand. Next, we propose a novel hybrid recommendation algorithm that bridges these two conflicting objectives and enables a harmonized balance between preserving high accuracy for warm items while effectively promoting completely cold items. We demonstrate the effectiveness of the proposed algorithm on movies, apps, and articles recommendations, and provide an empirical analysis of the cold-warm trade-off.
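“用可调门在协同过滤表示与内容表示之间取舍”这一思路可以用如下玩具草图直观示意（注意：论文中的随机门是可学习的，这里仅用一个确定性的标量sigmoid门做插值演示，所有名称与数值均为假设）：

```python
import math

# Illustrative sketch only: a scalar gate g = sigmoid(alpha) blends a
# collaborative-filtering (CF) item vector with a content-based vector,
# so a cold item (no ratings, alpha pushed low) falls back on content.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_item_vector(cf_vec, content_vec, alpha):
    g = sigmoid(alpha)  # g -> 1: trust CF; g -> 0: trust content
    return [g * c + (1.0 - g) * t for c, t in zip(cf_vec, content_vec)]

warm = gated_item_vector([1.0, 0.0], [0.0, 1.0], alpha=10.0)   # warm item: CF dominates
cold = gated_item_vector([0.0, 0.0], [0.0, 1.0], alpha=-10.0)  # cold item: content dominates
```

调节 alpha（论文中由训练得到的随机门决定）即对应摘要所述的冷暖权衡。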

【5】 Semantic Answer Type and Relation Prediction Task (SMART 2021) 标题:语义答案类型和关系预测任务(SMART 2021) 链接:https://arxiv.org/abs/2112.07606

作者:Nandana Mihindukulasooriya,Mohnish Dubey,Alfio Gliozzo,Jens Lehmann,Axel-Cyrille Ngonga Ngomo,Ricardo Usbeck,Gaetano Rossiello,Uttam Kumar 机构:IBM Research AI, University of Bonn and Fraunhofer IAIS, Universität Paderborn, Conversational AI, Fraunhofer IAIS Dresden, IBM Research, USA, Germany 摘要:每年，国际语义网会议都会组织一系列语义网挑战赛，以建立竞赛，推动某些问题领域的最新解决方案。语义答案类型和关系预测任务(SMART)是ISWC 2021语义网挑战之一。在ISWC 2020成功举办SMART 2020之后，今年是挑战的第二年。今年的版本重点关注对知识库问答(KBQA)非常重要的两个子任务：答案类型预测和关系预测。问题类型和答案类型预测可以在知识库问答系统中发挥关键作用，提供有关预期答案的见解，有助于生成正确的查询或对候选答案进行排序。更具体地说，给定一个自然语言问题，第一个任务是使用目标本体(例如DBpedia或Wikidata)预测答案类型。类似地，第二项任务是识别自然语言查询中的关系并将其链接到目标本体中的关系。本文讨论了任务描述、基准数据集和评估指标。有关更多信息，请访问https://smart-task.github.io/2021/. 摘要:Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction Task (SMART) task is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge after a successful SMART 2020 at ISWC 2020. This year's version focuses on two sub-tasks that are very important to Knowledge Base Question Answering (KBQA): Answer Type Prediction and Relation Prediction. Question type and answer type prediction can play a key role in knowledge base question answering systems providing insights about the expected answer that are helpful to generate correct queries or rank the answer candidates. More concretely, given a question in natural language, the first task is to predict the answer type using a target ontology (e.g., DBpedia or Wikidata). Similarly, the second task is to identify relations in the natural language query and link them to the relations in a target ontology. This paper discusses the task descriptions, benchmark datasets, and evaluation metrics. For more information, please visit https://smart-task.github.io/2021/.

【6】 The King is Naked: on the Notion of Robustness for Natural Language Processing 标题:国王是裸体的:关于自然语言处理的健壮性概念 链接:https://arxiv.org/abs/2112.07605

作者:Emanuele La Malfa,Marta Kwiatkowska 机构:Department of Computer Science, University of Oxford 备注:AAAI 2022 main-track (full-paper) 摘要:越来越多的证据表明，最初为图像引入的对抗性稳健性的经典概念已被NLP研究界的大部分人作为事实标准采用。我们表明，在NLP的背景下，这个概念是有问题的，因为它只考虑了一个狭窄的语言现象谱。在本文中，我们主张语义稳健性，它与人类的语言保真度概念更为一致。我们根据语义稳健性预期会在模型中引起的偏差来刻画它。我们使用一个基于模板的生成性测试平台研究了一系列普通的和经过稳健训练的体系结构的语义稳健性。我们用实证证据补充了分析：尽管语义稳健性更难实现，但它可以在经典意义下稳健的模型失效的复杂语言现象上提高性能。 摘要:There is growing evidence that the classical notion of adversarial robustness originally introduced for images has been adopted as a de facto standard by a large part of the NLP research community. We show that this notion is problematic in the context of NLP as it considers a narrow spectrum of linguistic phenomena. In this paper, we argue for semantic robustness, which is better aligned with the human concept of linguistic fidelity. We characterize semantic robustness in terms of biases that it is expected to induce in a model. We study semantic robustness of a range of vanilla and robustly trained architectures using a template-based generative test bed. We complement the analysis with empirical evidence that, despite being harder to implement, semantic robustness can improve performance on complex linguistic phenomena where models robust in the classical sense fail.

【7】 Learning to Deblur and Rotate Motion-Blurred Faces 标题:学习对运动模糊人脸进行去模糊与旋转 链接:https://arxiv.org/abs/2112.07599

作者:Givi Meishvili,Attila Szabó,Simon Jenni,Paolo Favaro 机构: University of Bern, Switzerland, Adobe Research, Huawei, Noah’s Ark Lab, (work was done before joining) 备注:British Machine Vision Conference 2021 摘要:我们针对一项新任务提出了解决方案：从单张运动模糊的人脸图像出发，从新的视点渲染清晰的视频。我们的方法通过在三个大型数据集上联合训练，隐式学习人脸的几何和运动，以处理人脸模糊的复杂性：公开可用的FFHQ和300VW，以及我们构建的新的Bern多视图人脸数据集(BMFD)。前两个数据集提供了种类丰富的人脸，使我们的模型能够更好地泛化。BMFD则允许我们引入多视图约束，这对于从新的摄像机视图合成清晰视频至关重要。它由多名受试者在多个视图下的高帧率同步视频组成，展示了广泛的面部表情。我们使用高帧率视频通过取平均来模拟真实的运动模糊。借助这个数据集，我们训练了一个神经网络，从单张图像和相应的面部注视重建三维视频表示。然后，我们将相对于估计注视方向的摄像机视点和模糊图像一起作为编码器-解码器网络的输入，以生成具有新摄像机视点的清晰帧视频。我们在多视图数据集和VIDTIMIT的测试对象上演示了我们的方法。 摘要:We propose a solution to the novel task of rendering sharp videos from new viewpoints from a single motion-blurred image of a face. Our method handles the complexity of face blur by implicitly learning the geometry and motion of faces through the joint training on three large datasets: FFHQ and 300VW, which are publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we built. The first two datasets provide a large variety of faces and allow our model to generalize better. BMFD instead allows us to introduce multi-view constraints, which are crucial to synthesizing sharp videos from a new camera view. It consists of high frame rate synchronized videos from multiple views of several subjects displaying a wide range of facial expressions. We use the high frame rate videos to simulate realistic motion blur through averaging. Thanks to this dataset, we train a neural network to reconstruct a 3D video representation from a single image and the corresponding face gaze. We then provide a camera viewpoint relative to the estimated gaze and the blurry image as input to an encoder-decoder network to generate a video of sharp frames with a novel camera viewpoint. We demonstrate our approach on test subjects of our multi-view dataset and VIDTIMIT.
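文中“用高帧率视频取平均来模拟真实运动模糊”这一步可以用如下极简示例说明（仅为示意：帧用像素值列表表示，真实实现作用于整幅图像）：

```python
# Minimal sketch of the blur-synthesis step described above: a
# motion-blurred frame is simulated by averaging consecutive sharp
# frames of a high-frame-rate video, pixel by pixel.
def average_frames(frames):
    """frames: list of equal-length pixel lists; returns their mean."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

# three consecutive "sharp frames" of a 3-pixel signal moving over time
sharp = [[0, 100, 200], [10, 110, 210], [20, 120, 220]]
blurred = average_frames(sharp)  # -> [10.0, 110.0, 210.0]
```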

【8】 Rushing and Strolling among Answer Sets -- Navigation Made Easy 标题:在答案集之间奔波漫步--导航变得简单 链接:https://arxiv.org/abs/2112.07596

作者:Johannes K. Fichte,Sarah Alice Gaggl,Dominik Rusovac 摘要:答案集编程(ASP)是一种流行的声明式编程范式,在人工智能中有着广泛的应用。通常,在使用ASP对人工智能问题建模时,尤其是当我们对简单搜索最优解决方案以外的问题感兴趣时,ASP程序的实际解决方案、解决方案之间的差异或解决方案的数量非常重要。例如,当用户打算根据自己的需要确定特定的答案集时,或者需要总数量的分歧解决方案来理解概率应用,例如医学领域中的推理。然后,只有某些特定于问题的手工编码技术可用于浏览ASP程序的解决方案空间,这通常是不够的。在本文中,我们提出了一个正式和通用的框架,用于交互式导航到类似于分面浏览的答案集的期望子集。我们的方法使用户能够以某种可配置的速度有意识地放大或缩小解决方案的子空间,从而探索解决方案空间。我们说明了加权分面导航在计算上是困难的。最后,我们提供了一个实现我们的方法,证明了我们的框架对于不可理解的解空间的可行性。 摘要:Answer set programming (ASP) is a popular declarative programming paradigm with a wide range of applications in artificial intelligence. Oftentimes, when modeling an AI problem with ASP, and in particular when we are interested beyond simple search for optimal solutions, an actual solution, differences between solutions, or number of solutions of the ASP program matter. For example, when a user aims to identify a specific answer set according to her needs, or requires the total number of diverging solutions to comprehend probabilistic applications such as reasoning in medical domains. Then, there are only certain problem specific and handcrafted encoding techniques available to navigate the solution space of ASP programs, which is oftentimes not enough. In this paper, we propose a formal and general framework for interactive navigation towards desired subsets of answer sets analogous to faceted browsing. Our approach enables the user to explore the solution space by consciously zooming in or out of sub-spaces of solutions at a certain configurable pace. We illustrate that weighted faceted navigation is computationally hard. Finally, we provide an implementation of our approach that demonstrates the feasibility of our framework for incomprehensible solution spaces.
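“类似分面浏览的答案集导航”这一核心思想可以用如下假想的Python草图说明（answer set 用原子的集合表示；`facets`、`zoom_in` 等名称均为本文示意所设，并非论文工具的接口，也未涉及论文讨论的加权分面与复杂度问题）：

```python
# Hypothetical sketch of faceted navigation over answer sets: a facet
# is an atom that occurs in some but not all answer sets; "zooming in"
# on a facet keeps only the answer sets containing it.
def facets(answer_sets):
    everywhere = frozenset.intersection(*answer_sets)  # atoms in all sets
    somewhere = frozenset.union(*answer_sets)          # atoms in any set
    return somewhere - everywhere

def zoom_in(answer_sets, atom):
    return [s for s in answer_sets if atom in s]

sets = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"a", "b", "c"})]
fs = facets(sets)            # "a" is in every set, so only b and c are facets
narrowed = zoom_in(sets, "b")
```

每次 zoom_in 都缩小解空间；论文进一步为每个分面赋权，以可配置的“步速”控制缩放幅度。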

【9】 Cooperation for Scalable Supervision of Autonomy in Mixed Traffic 标题:混合交通中自主性可扩展监管的协作 链接:https://arxiv.org/abs/2112.07569

作者:Cameron Hickert,Sirui Li,Cathy Wu 机构:Massachusetts Institute of Technology 备注:14 pages, 7 figures 摘要:自主性的提高为许多领域带来了积极成果的潜力，但很难保证它们的安全部署。这项工作调查了人类如何能够智能地监督代理，以便即使在性能保证难以获得的情况下也能实现某种程度的安全。令人振奋的研究问题是：在安全关键设置中，我们是否可以避免需要一个人随时监控一台机器？本文形式化了这种“扩展监督”问题，并研究了其在自动车辆(AVs)汇入交通这一安全关键场景中的应用。它提出了一种保守的、基于可达性的方法，以减轻AVs人工监督者的负担，从而可以为此设置中的监督需求建立高置信度上界。次序统计量与基于深度强化学习的交通仿真从解析和数值两方面表明，AV编组可使监督时间随AV采用量呈次线性增长。一个关键的结论是，尽管AVs目前存在缺陷，但随着AVs的大规模部署，监督变得更加容易处理。虽然这项工作的重点是AVs，但可扩展的监督框架与更广泛的自主控制挑战相关。 摘要:Improvements in autonomy offer the potential for positive outcomes in a number of domains, yet guaranteeing their safe deployment is difficult. This work investigates how humans can intelligently supervise agents to achieve some level of safety even when performance guarantees are elusive. The motivating research question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The paper formalizes this 'scaling supervision' problem, and investigates its application to the safety-critical context of autonomous vehicles (AVs) merging into traffic. It proposes a conservative, reachability-based method to reduce the burden on the AVs' human supervisors, which allows for the establishment of high-confidence upper bounds on the supervision requirements in this setting. Order statistics and traffic simulations with deep reinforcement learning show analytically and numerically that teaming of AVs enables supervision time sublinear in AV adoption. A key takeaway is that, despite present imperfections of AVs, supervision becomes more tractable as AVs are deployed en masse. While this work focuses on AVs, the scalable supervision framework is relevant to a broader array of autonomous control challenges.

【10】 Modeling Strong and Human-Like Gameplay with KL-Regularized Search 标题:基于KL正则化搜索的强势类人游戏建模 链接:https://arxiv.org/abs/2112.07544

作者:Athul Paul Jacob,David J. Wu,Gabriele Farina,Adam Lerer,Anton Bakhtin,Jacob Andreas,Noam Brown 机构:School of Computer Science, Carnegie Mellon University 摘要:在给定人类行为示例的情况下，我们考虑在多智能体决策问题中构建强大但类人的策略的任务。模仿学习在预测人类行为方面是有效的，但可能无法达到人类专家的水平，而自博弈学习和搜索技术(如AlphaZero)可以带来强大的性能，但可能产生人类难以理解和协调的策略。我们在国际象棋和围棋中表明，通过蒙特卡洛树搜索，并基于与模仿学习策略的KL散度对搜索策略进行正则化，可以得到比模仿策略具有更高人类预测精度且更强的策略。然后，我们介绍了一种新的基于与模仿学习策略KL散度进行正则化的后悔最小化算法，并表明将该算法应用于无通讯外交(no-press Diplomacy)会产生一种策略，该策略在保持与模仿学习相同的人类预测精度的同时明显更强。 摘要:We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.
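KL正则化的单步(bandit)形式有一个常见的封闭解：最大化 E_pi[Q] - λ·KL(pi‖anchor) 的策略为 pi(a) ∝ anchor(a)·exp(Q(a)/λ)，其中 anchor 是模仿学习得到的类人先验。下面的草图仅示意这个封闭形式（并非论文完整的搜索或后悔最小化算法；数值均为假设）：λ 越大越贴近类人先验，λ 越小越偏向搜索认为最强的动作。

```python
import math

# Bandit form of KL-regularized policy improvement:
#   pi(a) proportional to anchor(a) * exp(Q(a) / lam)
# Large lam pulls the policy toward the imitation-learned anchor;
# small lam pulls it toward argmax Q.
def kl_regularized_policy(q_values, anchor, lam):
    weights = [p * math.exp(q / lam) for q, p in zip(q_values, anchor)]
    z = sum(weights)
    return [w / z for w in weights]

anchor = [0.7, 0.2, 0.1]  # human-like prior from imitation learning
q = [0.0, 1.0, 0.0]       # search estimates action 1 as strongest

human_like = kl_regularized_policy(q, anchor, lam=100.0)  # stays near the anchor
strong = kl_regularized_policy(q, anchor, lam=0.05)       # concentrates on argmax Q
```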

【11】 Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning 标题:科学发现与测量成本--强化学习中信息与成本的平衡 链接:https://arxiv.org/abs/2112.07535

作者:Colin Bellinger,Andriy Drozdyuk,Mark Crowley,Isaac Tamblyn 机构: National Research Council of Canada, Carleton University, University of Waterloo, University of Ottawa, Vector Institute for Artificial Intelligence 备注:To appear in: 1st Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE) 摘要:强化学习(RL)在材料设计和自动化化学等科学应用中的应用正在增加。然而，一个主要挑战在于，在科学应用中，测量系统状态通常成本高昂且耗时，而使用RL学习策略则需要在每个时间步后进行测量。在这项工作中，我们以带代价的奖励形式将测量成本显式化，并提出了一个框架，使现成的深度RL算法能够学习一个既选择动作又决定是否在每个时间步测量系统当前状态的策略。通过这种方式，代理学会在信息需求和信息成本之间取得平衡。我们的结果表明，在这种机制下训练时，Dueling DQN和PPO代理可以在最多减少50%状态测量的同时学到最优动作策略，而循环神经网络可以减少50%以上的测量。我们认为这些减少有助于降低将RL应用于现实科学应用的门槛。 摘要:The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50% fewer state measurements, and recurrent neural networks can produce a greater than 50% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.
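“把测量成本显式化为带代价的奖励，并让动作同时携带‘是否测量’的决定”这一思路可以示意为一个环境包装器（概念草图，并非论文实现；类名、接口与示例环境均为假设）：

```python
# Conceptual sketch: wrap an environment so each step carries a
# "measure?" flag.  Skipping the measurement returns the last observed
# state; measuring returns the fresh state but subtracts a fixed cost
# from the reward.
class MeasurementCostWrapper:
    def __init__(self, env, cost):
        self.env = env
        self.cost = cost
        self.last_obs = None

    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action, measure):
        obs, reward = self.env.step(action)
        if measure:
            self.last_obs = obs
            reward -= self.cost
        return self.last_obs, reward

class CounterEnv:
    """Tiny stand-in environment: the state counts steps taken."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += 1
        return self.state, 1.0  # constant reward of 1 per step

env = MeasurementCostWrapper(CounterEnv(), cost=0.3)
env.reset()
obs1, r1 = env.step(0, measure=True)   # pays the cost, observes state 1
obs2, r2 = env.step(0, measure=False)  # free, but still "sees" state 1
```

代理随后需在陈旧观测带来的决策风险与测量代价之间权衡，这正是摘要所述的学习目标。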

【12】 n-CPS: Generalising Cross Pseudo Supervision to n networks for Semi-Supervised Semantic Segmentation 标题:n-CPS:将交叉伪监督推广到n网络进行半监督语义分割 链接:https://arxiv.org/abs/2112.07528

作者:Dominik Filipiak,Piotr Tempczyk,Marek Cygan 机构: AI Clearing, Inc., Semantic Technology Institute, Department of Computer Science, University of Innsbruck, Informatics and Mechanics, University of Warsaw 摘要:我们提出了$n$-CPS——对最新的用于半监督语义分割任务的交叉伪监督(CPS)方法的推广。在$n$-CPS中，有$n$个同时训练的子网络通过独热编码扰动和一致性正则化相互学习。我们还表明，对子网络输出应用集成技术可以显著提高性能。据我们所知，$n$-CPS与CutMix组合的表现优于CPS，并在Pascal VOC 2012(1/16、1/8、1/4和1/2监督比例)和Cityscapes(1/16监督比例)上设定了新的最先进水平。 摘要:We present $n$-CPS - a generalisation of the recent state-of-the-art cross pseudo supervision (CPS) approach for the task of semi-supervised semantic segmentation. In $n$-CPS, there are $n$ simultaneously trained subnetworks that learn from each other through one-hot encoding perturbation and consistency regularisation. We also show that ensembling techniques applied to subnetworks outputs can significantly improve the performance. To the best of our knowledge, $n$-CPS paired with CutMix outperforms CPS and sets the new state-of-the-art for Pascal VOC 2012 (1/16, 1/8, 1/4, and 1/2 supervised regimes) and Cityscapes (1/16 supervised).
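$n$个子网络之间的交叉伪监督损失可以在单个像素上示意如下（玩具草图：真实的$n$-CPS作用于扰动输入与整张分割图，这里仅演示“每个网络以其他网络的独热(argmax)伪标签作交叉熵监督”这一配对结构）：

```python
import math

# Toy sketch of cross pseudo supervision among n networks on one pixel:
# network i is trained with cross-entropy against the one-hot (argmax)
# pseudo label produced by every other network j.
def cross_entropy(probs, label):
    return -math.log(probs[label])

def n_cps_loss(predictions):
    """predictions: one class-probability vector per network."""
    n = len(predictions)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pseudo = predictions[j].index(max(predictions[j]))  # argmax pseudo label
            loss += cross_entropy(predictions[i], pseudo)
    return loss / (n * (n - 1))  # average over the n(n-1) ordered pairs

preds = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]  # three subnetworks, two classes
loss = n_cps_loss(preds)
```

网络间预测一致时该损失小，彼此矛盾时损失大，从而起到一致性正则化的作用。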

【13】 A Style and Semantic Memory Mechanism for Domain Generalization 标题:一种面向领域泛化的风格和语义记忆机制 链接:https://arxiv.org/abs/2112.07517

作者:Yang Chen,Yu Wang,Yingwei Pan,Ting Yao,Xinmei Tian,Tao Mei 机构:† University of Science and Technology of China, Hefei, China, ‡JD AI Research, Beijing, China 备注:ICCV 2021 摘要:主流最先进的领域泛化算法倾向于优先考虑跨领域语义不变性的假设。同时,固有的域内风格不变性通常被低估和搁置。在本文中,我们发现利用域内风格不变性对于提高域泛化的效率也至关重要。我们验证了网络提供关于哪些领域特征是不变的以及在实例之间共享的信息是至关重要的,这样网络可以增强其理解能力并提高其语义辨别能力。相应地,我们还提出了一种新的“陪审团”机制,该机制在学习领域间有用的语义特征共性方面特别有效。我们称为STEAM的完整模型可以解释为一种新的概率图形模型,其实现需要方便地构造两种类型的内存库:语义特征库和样式特征库。实证结果表明,我们提出的框架明显优于最先进的方法。 摘要:Mainstream state-of-the-art domain generalization algorithms tend to prioritize the assumption on semantic invariance across domains. Meanwhile, the inherent intra-domain style invariance is usually underappreciated and put on the shelf. In this paper, we reveal that leveraging intra-domain style invariance is also of pivotal importance in improving the efficiency of domain generalization. We verify that it is critical for the network to be informative on what domain features are invariant and shared among instances, so that the network sharpens its understanding and improves its semantic discriminative ability. Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains. Our complete model called STEAM can be interpreted as a novel probabilistic graphical model, for which the implementation requires convenient constructions of two kinds of memory banks: semantic feature bank and style feature bank. Empirical results show that our proposed framework surpasses the state-of-the-art methods by clear margins.

【14】 Transferrable Contrastive Learning for Visual Domain Adaptation 标题:用于视域自适应的可转移对比学习 链接:https://arxiv.org/abs/2112.07516

作者:Yang Chen,Yingwei Pan,Yu Wang,Ting Yao,Xinmei Tian,Tao Mei 机构:University of Science and Technology of China, Hefei, China, JD AI Research, Beijing, China 备注:ACM Multimedia 2021 摘要:自监督学习(SSL)最近成为特征学习方法中最受欢迎的一种。因此,呼吁域适应方法考虑纳入SSL。直觉是强制执行实例级的特性一致性,这样预测器就可以在域之间保持不变。然而,域适配机制中的大多数现有SSL方法通常被视为独立的辅助组件,使得域适配的签名无人值守。实际上,域间隙消失的最佳区域和SSL所研究的实例级约束可能根本不一致。从这一点出发,我们提出了一种专门针对领域适应的自我监督学习范式,即可转移对比学习(TCL),它将SSL与所需的跨领域转移性一致地联系起来。我们发现,对比学习本质上是一种适用于领域适应的方法,因为它的实例不变性假设可以方便地推广到领域适应任务所青睐的跨领域类级不变性。TCL基于特定的记忆库结构和伪标记策略,通过清晰新颖的对比损失来惩罚源和目标之间的跨域类内域差异。免费午餐是,得益于对比学习的结合,TCL依赖于移动平均密钥编码器,该编码器自然实现了目标数据伪标签的临时集成版本,从而避免了伪标签错误传播,无需额外成本。因此,TCL有效地减少了跨域差距。通过针对单源和多源域适配任务的大量基准测试(Office Home、VisDA-2017、Digits five、PACS和DomainNet),TCL展示了最先进的性能。 摘要:Self-supervised learning (SSL) has recently become the favorite among feature learning methodologies. It is therefore appealing for domain adaptation approaches to consider incorporating SSL. The intuition is to enforce instance-level feature consistency such that the predictor becomes somehow invariant across domains. However, most existing SSL methods in the regime of domain adaptation usually are treated as standalone auxiliary components, leaving the signatures of domain adaptation unattended. Actually, the optimal region where the domain gap vanishes and the instance level constraint that SSL peruses may not coincide at all. From this point, we present a particular paradigm of self-supervised learning tailored for domain adaptation, i.e., Transferrable Contrastive Learning (TCL), which links the SSL and the desired cross-domain transferability congruently. We find contrastive learning intrinsically a suitable candidate for domain adaptation, as its instance invariance assumption can be conveniently promoted to cross-domain class-level invariance favored by domain adaptation tasks. 
Based on particular memory bank constructions and pseudo label strategies, TCL then penalizes cross-domain intra-class domain discrepancy between source and target through a clean and novel contrastive loss. The free lunch is, thanks to the incorporation of contrastive learning, TCL relies on a moving-averaged key encoder that naturally achieves a temporally ensembled version of pseudo labels for target data, which avoids pseudo label error propagation at no extra cost. TCL therefore efficiently reduces cross-domain gaps. Through extensive experiments on benchmarks (Office-Home, VisDA-2017, Digits-five, PACS and DomainNet) for both single-source and multi-source domain adaptation tasks, TCL has demonstrated state-of-the-art performances.
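上文提到的“移动平均key编码器”（momentum key encoder）可以用如下指数滑动平均(EMA)更新示意（极简草图：权重用数值列表表示；真实实现对查询编码器的全部参数做同样的逐元素更新，key编码器本身不接收梯度）：

```python
# Minimal sketch of the moving-averaged ("momentum") key encoder update
# used in contrastive setups such as the one described above: the key
# encoder's weights track an exponential moving average of the query
# encoder's weights.
def momentum_update(key_weights, query_weights, m=0.999):
    return [m * k + (1.0 - m) * q for k, q in zip(key_weights, query_weights)]

key = [0.0, 1.0]
query = [1.0, 1.0]
key = momentum_update(key, query, m=0.9)  # -> [0.1, 1.0]
```

正因为key编码器是查询编码器的时间平均，其为目标数据产生的伪标签天然带有时序集成的效果，这正是摘要所述“无额外成本避免伪标签错误传播”的来源。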

【15】 CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising 标题:Coco-BERT:用对比跨模态匹配和去噪改进视频语言预训练 链接:https://arxiv.org/abs/2112.07515

作者:Jianjie Luo,Yehao Li,Yingwei Pan,Ting Yao,Hongyang Chao,Tao Mei 机构:★ School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China, ♣ The Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, Guangzhou, China, ♠ JD AI Research, Beijing, China 备注:ACM Multimedia 2021 摘要:BERT式结构引发了视觉语言预训练的革命，并在众多视觉语言下游任务中取得了最新成果。现有解决方案主要利用带有掩码令牌的多模态输入来触发基于掩码的代理预训练任务(例如，掩码语言建模和掩码对象/帧预测)。在这项工作中，我们认为这样的掩码输入将不可避免地为跨模态匹配代理任务引入噪声，从而使固有的视觉语言关联未得到充分探索。作为替代方案，我们推导了一种用于视频语言预训练的特定形式的跨模态代理目标，即对比跨模态匹配和去噪(CoCo)。通过将掩码帧/词序列视为原始未掩码序列的噪声增强，CoCo以对比方式在掩码和未掩码输入之间同时进行模态间匹配和模态内去噪，从而增强视频语言关联。我们的CoCo代理目标可以进一步集成到任何用于视频语言预训练的BERT类型编解码器结构中，称为对比跨模态BERT(CoCo-BERT)。我们在TV数据集和新收集的大规模GIF视频数据集(ACTION)上预训练CoCo-BERT。通过在广泛的下游任务(如跨模态检索、视频问答和视频字幕)上的大量实验，我们证明了CoCo-BERT作为预训练结构的优越性。 摘要:BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks. Existing solutions dominantly capitalize on the multi-modal inputs with mask tokens to trigger mask-based proxy pre-training tasks (e.g., masked language modeling and masked object/frame prediction). In this work, we argue that such masked inputs would inevitably introduce noise for cross-modal matching proxy task, and thus leave the inherent vision-language association under-explored. As an alternative, we derive a particular form of cross-modal proxy objective for video-language pre-training, i.e., Contrastive Cross-modal matching and denoising (CoCo). By viewing the masked frame/word sequences as the noisy augmentation of primary unmasked ones, CoCo strengthens video-language association by simultaneously pursuing inter-modal matching and intra-modal denoising between masked and unmasked inputs in a contrastive manner. 
Our CoCo proxy objective can be further integrated into any BERT-type encoder-decoder structure for video-language pre-training, named as Contrastive Cross-modal BERT (CoCo-BERT). We pre-train CoCo-BERT on TV dataset and a newly collected large-scale GIF video dataset (ACTION). Through extensive experiments over a wide range of downstream tasks (e.g., cross-modal retrieval, video question answering, and video captioning), we demonstrate the superiority of CoCo-BERT as a pre-trained structure.

【16】 CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning 标题:核心文本:利用对比关系推理改进场景文本检测 链接:https://arxiv.org/abs/2112.07513

作者:Jingyang Lin,Yingwei Pan,Rongfeng Lai,Xuehang Yang,Hongyang Chao,Ting Yao 机构:∗Sun Yat-sen University, Guangzhou, China, †The Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, Guangzhou, P. R. China, ‡JD AI Research, Beijing, China 备注:ICME 2021 (Oral); Code is publicly available at: this https URL 摘要:在自然场景中定位文本实例被认为是计算机视觉的一个基本挑战。然而，由于真实场景中文本实例的纵横比和比例差异极大，大多数传统的文本检测器都面临着只定位文本实例片段(即子文本)的子文本问题。在这项工作中，我们定量分析了子文本问题，并提出了一个简单而有效的设计，对比关系(CORE)模块，以缓解该问题。CORE首先利用一个普通的关系块来建模所有文本提议(多个文本实例的子文本)之间的关系，并通过实例级子文本区分以对比方式进一步增强关系推理。这种方法自然地学习文本提议的实例感知表示，从而促进场景文本检测。我们将CORE模块集成到Mask R-CNN的两级文本检测器中，并设计了我们的文本检测器CORE-Text。在四个基准上的大量实验证明了CORE-Text的优越性。代码可用：https://github.com/jylins/CORE-Text 摘要:Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. Nevertheless, owing to the extremely varied aspect ratios and scales of text instances in real scenes, most conventional text detectors suffer from the sub-text problem that only localizes the fragments of text instance (i.e., sub-texts). In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module, to mitigate that issue. CORE first leverages a vanilla relation block to model the relations among all text proposals (sub-texts of multiple text instances) and further enhances relational reasoning via instance-level sub-text discrimination in a contrastive manner. Such way naturally learns instance-aware representations of text proposals and thus facilitates scene text detection. We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text. Extensive experiments on four benchmarks demonstrate the superiority of CORE-Text. Code is available: https://github.com/jylins/CORE-Text.

【17】 Adversarial Examples for Extreme Multilabel Text Classification 标题:极端多标签文本分类的对抗性实例 链接:https://arxiv.org/abs/2112.07512

作者:Mohammadreza Qaraei,Rohit Babbar 机构:Aalto University, Helsinki, Finland 摘要:极端多标签文本分类(XMTC)是一个文本分类问题,其中,(i)输出空间非常大,(ii)每个数据点可能有多个正标签,以及(iii)数据遵循强不平衡分布。随着XMTC在推荐系统和web文档自动标注中的应用,XMTC的研究重点已经放在提高预测精度和处理不平衡数据上。然而,基于深度学习的XMTC模型对对抗性示例的鲁棒性在很大程度上还没有得到充分的探索。本文研究了XMTC模型在对抗攻击下的行为。为此,首先,我们在多标签文本分类问题中定义了对抗性攻击。我们将攻击性多标签文本分类器分类为(a)正目标,其中目标正标签应不属于前k个预测标签;和(b)负目标,其中目标负标签应位于前k个预测标签中。然后,通过在APLC XLNet和AttentionXML上的实验,我们表明XMTC模型对积极目标攻击非常脆弱,但对消极目标攻击更具鲁棒性。此外,我们的实验表明,正面目标对抗攻击的成功率具有不平衡分布。更准确地说,tail类极易受到敌对攻击,攻击者可以为其生成与实际数据点高度相似的敌对样本。为了克服这个问题,我们探索了XMTC中重新平衡损失函数的效果,它们不仅提高了尾部类的准确性,而且还提高了这些类对敌对攻击的鲁棒性。我们的实验代码可在https://github.com/xmc-aalto/adv-xmtc 摘要:Extreme Multilabel Text Classification (XMTC) is a text classification problem in which, (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. With applications in recommendation systems and automatic tagging of web-scale documents, the research on XMTC has been focused on improving prediction accuracy and dealing with imbalanced data. However, the robustness of deep learning based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, first, we define adversarial attacks in multilabel text classification problems. We categorize attacking multilabel text classifiers as (a) positive-targeted, where the target positive label should fall out of top-k predicted labels, and (b) negative-targeted, where the target negative label should be among the top-k predicted labels. Then, by experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-targeted attacks but more robust to negative-targeted ones. Furthermore, our experiments show that the success rate of positive-targeted adversarial attacks has an imbalanced distribution. 
More precisely, tail classes are highly vulnerable to adversarial attacks for which an attacker can generate adversarial samples with high similarity to the actual data-points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC where not only do they increase accuracy on tail classes, but they also improve the robustness of these classes against adversarial attacks. The code for our experiments is available at https://github.com/xmc-aalto/adv-xmtc
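The two attack objectives defined in this abstract reduce to simple predicates over a classifier's top-k predictions. A minimal sketch (function names and the toy score vector are our own, not from the paper):

```python
import numpy as np

def topk_labels(scores, k):
    """Indices of the k highest-scoring labels."""
    return set(np.argsort(scores)[-k:])

def positive_attack_succeeded(scores, target_positive, k):
    # Positive-targeted attack: push the true (positive) label OUT of the top-k.
    return target_positive not in topk_labels(scores, k)

def negative_attack_succeeded(scores, target_negative, k):
    # Negative-targeted attack: push a wrong (negative) label INTO the top-k.
    return target_negative in topk_labels(scores, k)

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])  # toy label scores after a perturbation
print(topk_labels(scores, 3))                  # labels 0, 2 and 4
```

An attacker perturbs the input until the relevant predicate flips; the paper's finding is that the positive-targeted predicate is much easier to flip for tail classes.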

【18】 Reconfiguring Shortest Paths in Graphs 标题:重新配置图中的最短路径 链接:https://arxiv.org/abs/2112.07499

作者:Kshitij Gajjar,Agastya Vibhuti Jha,Manish Kumar,Abhiruk Lahiri 机构:National University of Singapore, Singapore, École polytechnique fédérale de Lausanne, Switzerland, Ben-Gurion University of the Negev, Israel, Ariel University, Israel 备注:28 pages, 14 figures. To be presented at AAAI 2022 摘要:在图中重新配置两条最短路径意味着通过一次更改一个顶点来修改一条最短路径到另一条最短路径,以便所有中间路径也是最短路径。这个问题有几个自然的应用,即:(a)改造道路网络,(b)在同步多处理环境下重新路由数据包,(c)海运集装箱配载问题,以及(d)列车编组问题。当建模为图问题时,(a)是最普遍的情况,而(b)、(c)和(d)是对不同图类的限制。我们证明了(a)是难以解决的,即使对于问题的放松变体也是如此。对于(b)、(c)和(d),我们提出了有效的算法来解决相应的问题。我们还将该问题推广到最短路径上每次最多可以同时更改 $k$ 个连续顶点的情形(其中 $k \geq 2$ 为固定整数)。 摘要:Reconfiguring two shortest paths in a graph means modifying one shortest path to the other by changing one vertex at a time so that all the intermediate paths are also shortest paths. This problem has several natural applications, namely: (a) revamping road networks, (b) rerouting data packets in synchronous multiprocessing setting, (c) the shipping container stowage problem, and (d) the train marshalling problem. When modelled as graph problems, (a) is the most general case while (b), (c) and (d) are restrictions to different graph classes. We show that (a) is intractable, even for relaxed variants of the problem. For (b), (c) and (d), we present efficient algorithms to solve the respective problems. We also generalize the problem to when at most $k$ (for a fixed integer $k \geq 2$) contiguous vertices on a shortest path can be changed at a time.
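The validity condition described here (every intermediate path is a shortest path, and consecutive paths differ in exactly one vertex) can be checked directly. A small self-contained sketch (helper names are our own; it relies on all given paths having equal length, which holds because all shortest s-t paths do):

```python
from collections import deque

def dist(adj, s, t):
    """BFS distance from s to t in an unweighted graph given as an adjacency dict."""
    seen, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                q.append(v)
    return None

def is_path(adj, p):
    return all(v in adj[u] for u, v in zip(p, p[1:]))

def valid_reconfiguration(adj, paths):
    """Every listed path is a shortest s-t path, and consecutive
    paths differ in exactly one vertex."""
    s, t = paths[0][0], paths[0][-1]
    d = dist(adj, s, t)
    for p in paths:
        if p[0] != s or p[-1] != t or len(p) - 1 != d or not is_path(adj, p):
            return False
    return all(sum(a != b for a, b in zip(p, q)) == 1
               for p, q in zip(paths, paths[1:]))

# A 2x3 grid (0-1-2 over 3-4-5): two corner-to-corner shortest paths
# that differ in exactly one vertex.
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
print(valid_reconfiguration(adj, [[0, 1, 2, 5], [0, 1, 4, 5]]))  # True
```

The hardness result in the abstract concerns *finding* such a sequence; verifying one, as above, is easy.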

【19】 EABlock: A Declarative Entity Alignment Block for Knowledge Graph Creation Pipelines 标题:EABlock:面向知识图谱创建管道的声明式实体对齐模块 链接:https://arxiv.org/abs/2112.07493

作者:Samaneh Jozashoori,Ahmad Sakor,Enrique Iglesias,Maria-Esther Vidal 机构:TIB Leibniz Information Center for Science and Technology & L3S, Germany; L3S Research Centre, University of Hannover, Germany 摘要:尽管编码了大量丰富而有价值的数据,但现有数据源大多是独立创建的,这对它们的集成是一个重大挑战。映射语言,例如RML和R2RML,有助于对应用元数据和将数据集成到知识图中的过程进行声明性规范。除了表示数据源和统一模式之间的对应关系外,映射规则还可以包括知识提取功能。将映射规则和函数结合起来代表了一种强大的形式主义,可以指定管道,以便将数据透明地集成到知识图中。令人惊讶的是,这些形式并没有得到充分采用,许多知识图是通过执行特殊程序来预处理和集成数据而创建的。在本文中,我们提出了EABlock,一种集成实体对齐(EA)作为RML映射规则一部分的方法。EABlock包括一个功能块,用于根据文本属性执行实体识别,并将识别的实体链接到Wikidata、DBpedia和特定领域的同义词库(例如UMLS)中的相应资源。 摘要:Despite encoding an enormous amount of rich and valuable data, existing data sources are mostly created independently, being a significant challenge to their integration. Mapping languages, e.g., RML and R2RML, facilitate declarative specification of the process of applying meta-data and integrating data into a knowledge graph. Mapping rules can also include knowledge extraction functions in addition to expressing correspondences among data sources and a unified schema. Combining mapping rules and functions represents a powerful formalism to specify pipelines for integrating data into a knowledge graph transparently. Surprisingly, these formalisms are not fully adopted, and many knowledge graphs are created by executing ad-hoc programs to pre-process and integrate data. In this paper, we present EABlock, an approach integrating Entity Alignment (EA) as part of RML mapping rules. EABlock includes a block of functions performing entity recognition on textual attributes and linking the recognized entities to the corresponding resources in Wikidata, DBpedia, and domain-specific thesauri, e.g., UMLS.
EABlock provides agnostic and efficient techniques to evaluate the functions and transfer the mappings to facilitate its application in any RML-compliant engine. We have empirically evaluated EABlock performance, and results indicate that EABlock speeds up knowledge graph creation pipelines that require entity recognition and linking in state-of-the-art RML-compliant engines. EABlock is also publicly available as a tool through a GitHub repository(https://github.com/SDM-TIB/EABlock) and a DOI(https://doi.org/10.5281/zenodo.5779773).

【20】 AI Ethics Principles in Practice: Perspectives of Designers and Developers 标题:实践中的人工智能伦理原则:设计者和开发者的视角 链接:https://arxiv.org/abs/2112.07467

作者:Conrad Sanderson,David Douglas,Qinghua Lu,Emma Schleiger,Jon Whittle,Justine Lacey,Glenn Newnham,Stefan Hajkowicz,Cathy Robinson,David Hansen 机构:Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia 摘要:随着各种已发表的人工智能道德原则的共识逐渐接近,高层原则和可用于设计和开发负责任人工智能系统的实用技术之间仍存在差距。我们考察了澳大利亚国家科学研究机构(CSIRO)的研究人员和工程师的实践和经验,他们参与设计和开发各种用途的人工智能系统。半结构式访谈用于检查参与者的实践如何与澳大利亚政府提出的一套高水平人工智能道德原则相关联并与之保持一致。这些原则包括:隐私保护和安全、可靠性和安全性、透明度和可解释性、公平性、可竞争性、问责制、以人为本的价值观以及人类、社会和环境福祉。研究人员和工程师的见解以及他们在实际应用这些原则时遇到的挑战都得到了检验。最后,提供了一组组织响应,以支持将高级人工智能道德原则付诸实践。 摘要:As consensus across the various published AI ethics principles is approached, a gap remains between high-level principles and practical techniques that can be readily adopted to design and develop responsible AI systems. We examine the practices and experiences of researchers and engineers from Australia's national scientific research agency (CSIRO), who are involved in designing and developing AI systems for a range of purposes. Semi-structured interviews were used to examine how the practices of the participants relate to and align with a set of high-level AI ethics principles that are proposed by the Australian Government. The principles comprise: Privacy Protection & Security, Reliability & Safety, Transparency & Explainability, Fairness, Contestability, Accountability, Human-centred Values, and Human, Social & Environmental Wellbeing. The insights of the researchers and engineers as well as the challenges that arose for them in the practical application of the principles are examined. Finally, a set of organisational responses are provided to support the implementation of high-level AI ethics principles into practice.

【21】 An Interpretive Constrained Linear Model for ResNet and MgNet 标题:ResNet和MgNet的一种解释性约束线性模型 链接:https://arxiv.org/abs/2112.07441

作者:Juncai He,Jinchao Xu,Lian Zhang,Jianqing Zhu 机构:The University of Texas at Austin; Department of Mathematics, The Pennsylvania State University, University Park 备注:26 pages, 2 figures and 11 tables. arXiv admin note: text overlap with arXiv:1911.10428 摘要:我们提出了一种约束线性数据特征映射模型,作为使用卷积神经网络(CNN)进行图像分类的可解释数学模型。从这个观点出发,我们在线性系统的传统迭代方案和ResNet和MgNet类型模型的基本块的体系结构之间建立了详细的联系。利用这些联系,我们提出了一些改进的ResNet模型,与原始模型相比,这些模型具有更少的参数,但可以产生更精确的结果,从而证明了这种约束学习数据特征映射假设的有效性。基于这一假设,我们进一步提出了一种通用的数据特征迭代方案来证明MgNet的合理性。我们还对MgNet进行了系统的数值研究,以展示其在图像分类问题上的成功,并论证其相对于已有网络的优势。 摘要:We propose a constrained linear data-feature-mapping model as an interpretable mathematical model for image classification using a convolutional neural network (CNN). From this viewpoint, we establish detailed connections between the traditional iterative schemes for linear systems and the architectures of the basic blocks of ResNet- and MgNet-type models. Using these connections, we present some modified ResNet models that compared with the original models have fewer parameters and yet can produce more accurate results, thereby demonstrating the validity of this constrained learning data-feature-mapping assumption. Based on this assumption, we further propose a general data-feature iterative scheme to show the rationality of MgNet. We also provide a systematic numerical study on MgNet to show its success in image classification problems and its advantages in comparison with established networks.
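The connection drawn here between iterative schemes for linear systems and residual blocks can be illustrated with the classical Richardson iteration for $Ax=b$, whose update has exactly the skeleton of a residual block $x + F(x)$. A toy sketch (our own illustration, not the paper's code):

```python
import numpy as np

# Richardson iteration for A x = b:  x_{k+1} = x_k + eta * (b - A x_k).
# Structurally this is a residual block x + F(x), with the "residual branch"
# F(x) = eta * (b - A x); ResNet/MgNet blocks replace this fixed linear map
# with learned convolutions.
A = np.diag([1.0, 2.0, 3.0])   # a simple SPD system so the iteration converges
b = np.array([1.0, 1.0, 1.0])
eta = 0.3                      # step size below 2 / lambda_max(A)
x = np.zeros(3)
for _ in range(200):
    x = x + eta * (b - A @ x)  # one "residual block" application
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```

Stacking many such identical blocks corresponds to running the iteration to convergence, which is the viewpoint the paper uses to motivate parameter-sharing variants of ResNet.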

【22】 Multi-Leader Congestion Games with an Adversary 标题:存在对手的多领导者拥塞博弈 链接:https://arxiv.org/abs/2112.07435

作者:Tobias Harks,Mona Henle,Max Klimm,Jannik Matuschke,Anja Schedel 机构:University of Augsburg, Augsburg, Germany, University of Applied Sciences Augsburg, Augsburg, Germany, TU Berlin, Berlin, Germany, KU Leuven, Leuven, Belgium 摘要:我们研究了一个多领导者单追随者拥塞博弈,其中多个用户(领导者)从一组资源中选择一个资源,在观察到已实现的负载后,一个对手(单追随者)攻击负载最大的资源,给相应的领导者造成额外成本。对于领导者之间的战略博弈,我们表明纯纳什均衡可能不存在,因此我们转而考虑近似均衡。作为我们的第一个主要结果,我们证明了 $K$-近似均衡的存在总是可以保证的,其中 $K \approx 1.1974$ 是一个三次多项式方程的唯一解。为此,我们给出了一个计算 $K$-近似均衡的多项式时间组合算法。系数 $K$ 是紧的,即存在一个实例,对任何 $\alpha < K$ 都不存在 $\alpha$-近似均衡。因此 $\alpha = K$ 是使得 $\alpha$-近似均衡的存在性在所考虑的博弈的任何实例中都能得到保证的最小 $\alpha$。其次,我们关注给定固定实例的近似均衡,并展示如何高效地计算最佳近似均衡,即给定实例的所有 $\alpha$-近似均衡中可能达到的最小 $\alpha$。 摘要:We study a multi-leader single-follower congestion game where multiple users (leaders) choose one resource out of a set of resources and, after observing the realized loads, an adversary (single-follower) attacks the resources with maximum loads, causing additional costs for the leaders. For the resulting strategic game among the leaders, we show that pure Nash equilibria may fail to exist and therefore, we consider approximate equilibria instead. As our first main result, we show that the existence of a $K$-approximate equilibrium can always be guaranteed, where $K \approx 1.1974$ is the unique solution of a cubic polynomial equation. To this end, we give a polynomial time combinatorial algorithm which computes a $K$-approximate equilibrium. The factor $K$ is tight, meaning that there is an instance that does not admit an $\alpha$-approximate equilibrium for any $\alpha < K$. Thus, $\alpha = K$ is the smallest possible value of $\alpha$ such that the existence of an $\alpha$-approximate equilibrium can be guaranteed for every instance of the considered games. Second, we focus on approximate equilibria of a given fixed instance and show how to efficiently compute a best approximate equilibrium, that is, the minimum $\alpha$ among all $\alpha$-approximate equilibria of the given instance.
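The notion of an α-approximate equilibrium can be made concrete on a toy instance. The cost model below is our own simplification (each leader pays its resource's load, plus a fixed attack cost if that resource carries the maximum load and is attacked); the paper's actual setting is more general:

```python
from collections import Counter

def leader_costs(choices, attack_cost=1.0):
    """Illustrative cost model (our own simplification of the paper's setting):
    each leader pays the load of its chosen resource, plus a fixed attack cost
    if that resource carries the maximum load and is hence attacked."""
    load = Counter(choices)
    worst = max(load.values())
    return [load[r] + (attack_cost if load[r] == worst else 0.0) for r in choices]

def approximation_factor(choices, resources):
    """Smallest alpha for which `choices` is an alpha-approximate equilibrium:
    the worst ratio of a leader's current cost to its best unilateral deviation."""
    alpha, costs = 1.0, leader_costs(choices)
    for i in range(len(choices)):
        best = min(leader_costs(choices[:i] + [r] + choices[i + 1:])[i]
                   for r in resources)
        alpha = max(alpha, costs[i] / best)
    return alpha

# Two leaders, two resources, both on "a": each pays 2 + 1 = 3, while deviating
# to "b" would lower the deviator's cost to 2, so this profile is a
# 1.5-approximate equilibrium.
print(approximation_factor(["a", "a"], ["a", "b"]))  # 1.5
```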

【23】 Obtaining Calibrated Probabilities with Personalized Ranking Models 标题:利用个性化排序模型获取校准概率 链接:https://arxiv.org/abs/2112.07428

作者:Wonbin Kweon,SeongKu Kang,Hwanjo Yu 机构:Pohang University of Science and Technology, South Korea 备注:AAAI 2022 摘要:对于个性化的排名模型,一个项目被用户首选的良好校准概率具有很大的实用价值。虽然现有的工作在图像分类方面显示了有希望的结果,但对于个性化排序的概率校准还没有太多的探索。在本文中,我们的目标是估计用户选择某个项目的可能性。我们研究了各种参数分布,并提出了两种参数校准方法,即高斯校准和伽马校准。每种方法都可以看作是一种后处理函数,它将预先训练的模型的排名分数映射到经过良好校准的偏好概率,而不会影响推荐性能。我们还设计了无偏经验风险最小化框架,指导校准方法从有偏用户项交互数据集中学习真实偏好概率。对真实数据集的各种个性化排序模型的广泛评估表明,所提出的校准方法和无偏经验风险最小化显著提高了校准性能。 摘要:For personalized ranking models, the well-calibrated probability of an item being preferred by a user has great practical value. While existing work shows promising results in image classification, probability calibration has not been much explored for personalized ranking. In this paper, we aim to estimate the calibrated probability of how likely a user will prefer an item. We investigate various parametric distributions and propose two parametric calibration methods, namely Gaussian calibration and Gamma calibration. Each proposed method can be seen as a post-processing function that maps the ranking scores of pre-trained models to well-calibrated preference probabilities, without affecting the recommendation performance. We also design the unbiased empirical risk minimization framework that guides the calibration methods to learn the true preference probability from the biased user-item interaction dataset. Extensive evaluations with various personalized ranking models on real-world datasets show that both the proposed calibration methods and the unbiased empirical risk minimization significantly improve the calibration performance.
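The idea of a post-processing calibrator mapping ranking scores to preference probabilities can be sketched with Bayes' rule over per-class Gaussian score densities. This is our own simplified reading of "Gaussian calibration", for illustration only; the paper's exact parameterization and its unbiased training objective are not reproduced:

```python
import math

def fit_gaussian(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, max(v, 1e-12)

def gaussian_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def gaussian_calibrator(pos_scores, neg_scores):
    """Map a raw ranking score to a preference probability via Bayes' rule
    over per-class Gaussian score densities (a simplification of the
    paper's method, for illustration only)."""
    (mp, vp), (mn, vn) = fit_gaussian(pos_scores), fit_gaussian(neg_scores)
    prior = len(pos_scores) / (len(pos_scores) + len(neg_scores))
    def calibrate(s):
        p = prior * gaussian_pdf(s, mp, vp)
        n = (1 - prior) * gaussian_pdf(s, mn, vn)
        return p / (p + n)
    return calibrate

# Toy scores of observed positive vs. negative user-item interactions.
cal = gaussian_calibrator(pos_scores=[2.0, 2.5, 3.0], neg_scores=[-1.0, 0.0, 1.0])
print(round(cal(2.5), 3), round(cal(-0.5), 3))  # near 1 vs. near 0
```

Because the mapping is monotone-free post-processing on scores, it leaves the induced ranking of a single model's scores unchanged in spirit, which is the property the abstract emphasizes.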

【24】 Conjugated Discrete Distributions for Distributional Reinforcement Learning 标题:分布强化学习的共轭离散分布 链接:https://arxiv.org/abs/2112.07424

作者:Björn Lindenberg,Jonas Nordqvist,Karl-Olof Lindahl 机构:Department of Mathematics, Linnæus University, Växjö, Sweden 备注:17 pages, 7 figures, conference 摘要:在这项工作中,我们继续建立在有限马尔可夫过程强化学习的最新进展之上。在以前的现有算法中,无论是单参与者算法还是分布式算法,都有一种常见的方法,即要么剪辑奖励,要么对Q函数应用转换方法,以处理实际贴现回报中的大量数量级。我们从理论上证明,如果我们有一个不确定的过程,最成功的方法之一可能不会产生最优策略。作为一种解决方案,我们认为分布强化学习有助于完全纠正这种情况。通过引入共轭分配算子,我们可以在保证理论收敛的情况下处理一大类实际收益的变换。我们提出了一种基于该算子的近似单角色算法,该算法使用 Cramér 距离给出的适当分布度量,直接在不变的奖励上训练代理。为了评估其在随机环境中的性能,我们使用粘性动作在55个Atari 2600游戏套件上训练代理,并与多巴胺框架中的其他著名算法相比,获得最先进的性能。 摘要:In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among previous existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky-actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.
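The Cramér distance used as the training metric here has a closed form for categorical distributions on a common support: the l2 distance between their CDFs (up to the atom spacing, taken as 1 in this sketch):

```python
import numpy as np

def cramer_distance(p, q):
    """Cramér distance between two categorical distributions on the same
    unit-spaced support: the l2 norm of the difference of their CDFs."""
    cdf_gap = np.cumsum(p) - np.cumsum(q)
    return float(np.sqrt(np.sum(cdf_gap ** 2)))

# Two categorical return distributions over the same atoms.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(cramer_distance(p, q))  # ~0.224
print(cramer_distance(p, p))  # 0.0
```

Unlike the KL divergence used in C51-style losses, this metric stays finite and meaningful even when the two distributions have disjoint effective supports, which is one reason it is attractive for distributional RL.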

【25】 You Only Need One Model for Open-domain Question Answering 标题:你只需要一个模型就可以进行开放领域的答疑 链接:https://arxiv.org/abs/2112.07381

作者:Haejun Lee,Akhil Kedia,Jongwon Lee,Ashwin Paranjape,Christopher D. Manning,Kyoung-Gu Woo 机构:♠ Samsung Research, ♣ Stanford University, ♥ Growdle Corporation 备注:preprint 摘要:开放领域问题回答的最新工作涉及使用检索器模型的外部知识库,可以选择使用单独的重排器模型重新排列文章,并使用另一个读者模型生成答案。尽管执行了相关任务,但这些模型具有单独的参数,并且在训练期间耦合性很弱。在这项工作中,我们建议将检索器和重排器转换为硬注意机制,在transformer体系结构中顺序应用,并将生成的计算表示提供给读者。在这种单一的模型架构中,隐藏的表示从检索器到重新排序器再到读取器被逐步细化,这更有效地利用了模型容量,并且当我们以端到端的方式对其进行训练时,也会产生更好的梯度流。我们还提出了一种预训练方法来有效地训练这种体系结构。我们在自然问题和TriviaQA开放数据集上评估了我们的模型,对于固定参数预算,我们的模型的精确匹配分数分别为1.0和0.7,优于先前的最先进模型。 摘要:Recent works for Open-domain Question Answering refer to an external knowledge base using a retriever model, optionally rerank the passages with a separate reranker model and generate an answer using another reader model. Despite performing related tasks, the models have separate parameters and are weakly-coupled during training. In this work, we propose casting the retriever and the reranker as hard-attention mechanisms applied sequentially within the transformer architecture and feeding the resulting computed representations to the reader. In this singular model architecture the hidden representations are progressively refined from the retriever to the reranker to the reader, which is more efficient use of model capacity and also leads to better gradient flow when we train it in an end-to-end manner. We also propose a pre-training methodology to effectively train this architecture. We evaluate our model on Natural Questions and TriviaQA open datasets and for a fixed parameter budget, our model outperforms the previous state-of-the-art model by 1.0 and 0.7 exact match scores.

【26】 Simple and Robust Loss Design for Multi-Label Learning with Missing Labels 标题:具有缺失标签的多标签学习的简单鲁棒损失设计 链接:https://arxiv.org/abs/2112.07368

作者:Youcai Zhang,Yuhao Cheng,Xinyu Huang,Fei Wen,Rui Feng,Yaqian Li,Yandong Guo 机构:Guo 摘要:标签缺失情况下的多标签学习(MLML)是一个具有挑战性的问题。现有的方法主要集中在网络结构或训练方案的设计上,这增加了实现的复杂性。这项工作试图在不增加程序和复杂性的情况下发挥损失函数在MLML中的潜力。为此,我们提出了两种简单而有效的方法,通过鲁棒损失设计,基于模型可以在训练期间以高精度识别缺失标签的观察结果。第一种是一种新颖的鲁棒性负片损失,即希尔损失,它将负样本按山丘形状重新加权,以减轻假阴性的影响。第二种是自步损失校正(SPLC)方法,该方法在缺失标签的近似分布下使用从最大似然准则导出的损失。在大量多标签图像分类数据集上的综合实验表明,我们的方法可以显著提高MLML的性能,并在MLML中达到新的最先进水平。 摘要:Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increase the complexity of implementation. This work seeks to fulfill the potential of loss function in MLML without increasing the procedure and complexity. Toward this end, we propose two simple yet effective methods via robust loss design based on an observation that a model can identify missing labels during training with a high precision. The first is a novel robust loss for negatives, namely the Hill loss, which re-weights negatives in the shape of a hill to alleviate the effect of false negatives. The second is a self-paced loss correction (SPLC) method, which uses a loss derived from the maximum likelihood criterion under an approximate distribution of missing labels. Comprehensive experiments on a vast range of multi-label image classification datasets demonstrate that our methods can remarkably boost the performance of MLML and establish a new state of the art in MLML.
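The "hill-shaped" re-weighting of negatives can be sketched with a toy weight function. The concrete form below is our own illustrative choice (the paper's exact parameterization may differ); the point is the shape: confidently positive-looking "negatives", which are likely missing labels, contribute little to the loss:

```python
import math

def hill_weight(p):
    """A hill-shaped weight for negative labels (an illustrative choice, not
    necessarily the paper's exact form): small for easy negatives (p near 0),
    largest for ambiguous ones, and small again as p approaches 1, where a
    'negative' annotation is likely a missing positive label."""
    return p ** 2 * (1 - p)

def hill_negative_loss(p):
    # Re-weighted cross-entropy term for a label annotated as negative,
    # where p is the model's predicted probability of the label being positive.
    return -hill_weight(p) * math.log(max(1 - p, 1e-12))

for p in (0.1, 0.5, 0.9, 0.99):
    print(p, round(hill_weight(p), 4))  # rises, peaks, then falls again
```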

【27】 Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry 标题:流程工业智能故障诊断中的技术语言监督 链接:https://arxiv.org/abs/2112.07356

作者:Karl Löwenmark,Cees Taal,Stephan Schnabel,Marcus Liwicki,Fredrik Sandin 机构:Embedded Intelligent Systems Laboratory (EISLAB), Luleå University of Technology, Luleå, Sweden, SKF Research & Technology Development, Meidoornkade , AE Houten, P.O. Box , DT Nieuwegein, The Netherlands 摘要:在流程工业中,具有自动故障诊断方法的状态监测系统可帮助人类专家,从而提高维护效率、流程可持续性和工作场所安全性。利用基于数据和机器学习的模型改进自动故障诊断方法是智能故障诊断(IFD)的一个核心方面。IFD中的一个主要挑战是开发具有训练和验证模型所需的准确标签的真实数据集,并将使用标签实验室数据训练的模型传输到异构流程工业环境。然而,在现代状态监测系统中,领域专家编写的故障描述和工作指令越来越数字化,例如在旋转设备监测中。因此,关于故障特征和严重性的领域特定知识作为技术语言注释存在于工业数据集中。此外,自然语言处理的最新进展使使用自然语言注释的弱监督模型优化成为可能,最明显的形式是自然语言监督(NLS)。这为基于工业数据的IFD系统开发技术语言监督(TLS)解决方案创造了一个及时的机会,例如,作为实验室数据预训练的补充,以解决过度拟合和不准确的样本外泛化等问题。我们调查了文献,发现在过去两年中NLS的成熟度有了相当大的提高,促进了自然语言以外的应用;弱监督方法迅速发展;迁移学习是IFD的一个当前趋势,可以从这些发展中受益。最后,我们描述了一个在IFD中集成TLS的框架,该框架受到最近NLS创新的启发。 摘要:In the process industry, condition monitoring systems with automated fault diagnosis methods assist human experts and thereby improve maintenance efficiency, process sustainability, and workplace safety. Improving the automated fault diagnosis methods using data and machine learning-based models is a central aspect of intelligent fault diagnosis (IFD). A major challenge in IFD is to develop realistic datasets with accurate labels needed to train and validate models, and to transfer models trained with labeled lab data to heterogeneous process industry environments. However, fault descriptions and work-orders written by domain experts are increasingly digitized in modern condition monitoring systems, for example in the context of rotating equipment monitoring. Thus, domain-specific knowledge about fault characteristics and severities exists as technical language annotations in industrial datasets. Furthermore, recent advances in natural language processing enable weakly supervised model optimization using natural language annotations, most notably in the form of natural language supervision (NLS). This creates a timely opportunity to develop technical language supervision (TLS) solutions for IFD systems grounded in industrial data, for example as a complement to pre-training with lab data to address problems like overfitting and inaccurate out-of-sample generalisation. We surveyed the literature and identify a considerable improvement in the maturity of NLS over the last two years, facilitating applications beyond natural language; a rapid development of weak supervision methods; and transfer learning as a current trend in IFD which can benefit from these developments. Finally, we describe a framework for integration of TLS in IFD which is inspired by recent NLS innovations.

【28】 Learning to Guide and to Be Guided in the Architect-Builder Problem 标题:在架构师-构建者问题中学习指导和被指导 链接:https://arxiv.org/abs/2112.07342

作者:Barde Paul,Karch Tristan,Nowrouzezahrai Derek,Moulin-Frier Clément,Pal Christopher,Oudeyer Pierre-Yves 机构:Québec AI Institute (Mila), McGill University; Inria, Flowers team, Université de Bordeaux 摘要:我们感兴趣的是学习协调的交互式代理,即$builder$(执行操作但忽略任务目标)和$architect$(引导构建者实现任务目标)。我们定义并探索了一个正式的环境,其中人工智能体配备了一种机制,允许他们在学习任务的同时进化出一个共享的通信协议。实验符号学领域已经显示了人类从先验未知指令中学习意义的熟练程度。因此,我们从中得到启发,提出了架构师-构建者问题(ABP):一种不对称的环境,架构师必须学会引导构建者建造特定的结构。架构师知道目标结构,但不能在环境中操作,只能向构建者发送任意消息。另一方面,构建者可以在环境中行动,但不知道手头的任务,必须学会仅依靠架构师发送的消息来解决它。至关重要的是,消息的含义最初没有定义,也没有在代理之间共享,但必须在整个学习过程中协商。 摘要:We are interested in interactive agents that learn to coordinate, namely, a $builder$ -- which performs actions but ignores the goal of the task -- and an $architect$ which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at the same time evolving a shared communication protocol. The field of Experimental Semiotics has shown the extent of human proficiency at learning from a priori unknown instruction meanings. Therefore, we take inspiration from it and present the Architect-Builder Problem (ABP): an asymmetrical setting in which an architect must learn to guide a builder towards constructing a specific structure. The architect knows the target structure but cannot act in the environment and can only send arbitrary messages to the builder. The builder on the other hand can act in the environment but has no knowledge about the task at hand and must learn to solve it relying only on the messages sent by the architect. Crucially, the meaning of messages is initially not defined nor shared between the agents but must be negotiated throughout learning.
Under these constraints, we propose Architect-Builder Iterated Guiding (ABIG), a solution to the Architect-Builder Problem where the architect leverages a learned model of the builder to guide it while the builder uses self-imitation learning to reinforce its guided behavior. We analyze the key learning mechanisms of ABIG and test it in a 2-dimensional instantiation of the ABP where tasks involve grasping cubes, placing them at a given location, or building various shapes. In this environment, ABIG results in a low-level, high-frequency, guiding communication protocol that not only enables an architect-builder pair to solve the task at hand, but that can also generalize to unseen tasks.

【29】 Multi-Instance Training for Question Answering Across Table and Linked Text 标题:跨表、跨链接文本问答的多实例训练 链接:https://arxiv.org/abs/2112.07337

作者:Vishwajeet Kumar,Saneem Chemmengath,Yash Gupta,Jaydeep Sen,Samarth Bharadwaj,Soumen Chakrabarti 机构:Saneem Chemmengath†, † IBM Research, ‡ IIT Bombay 摘要:使用表中的信息(TableQA)回答自然语言问题是最近相当感兴趣的话题。在许多应用程序中,表不是孤立出现的,而是嵌入或链接到非结构化文本中。通常,最好通过将问题的各个部分与表格单元格内容或非结构化文本范围相匹配,并从任一来源提取答案来回答问题。这导致了HybridQA数据集引入的TextTableQA问题的新空间。现有的基于Transformer的阅读理解(RC)结构的表格表示法无法通过单个系统解决两种表示法的不同模式。由于需要远程监督,对此类系统的训练面临进一步的挑战。为了减少认知负担,训练实例通常只包括问答,后者匹配多个表格行和文本段落。这导致了一个嘈杂的多实例训练机制,不仅涉及表的行,还涉及链接文本的跨度。为了应对这些挑战,我们提出了MITQA,这是一个新的TextTableQA系统,它明确地对表行选择和文本跨度选择的不同但密切相关的概率空间进行建模。我们的实验表明,与最近的基线相比,我们的方法具有优越性。提出的方法目前在HybridQA排行榜(保留测试集)上名列前茅,与之前公布的结果相比,EM和F1成绩绝对提高了21%。 摘要:Answering natural language questions using information from tables (TableQA) is of considerable recent interest. In many applications, tables occur not in isolation, but embedded in, or linked to unstructured text. Often, a question is best answered by matching its parts to either table cell contents or unstructured text spans, and extracting answers from either source. This leads to a new space of TextTableQA problems that was introduced by the HybridQA dataset. Existing adaptations of table representation to transformer-based reading comprehension (RC) architectures fail to tackle the diverse modalities of the two representations through a single system. Training such systems is further challenged by the need for distant supervision. To reduce cognitive burden, training instances usually include just the question and answer, the latter matching multiple table rows and text passages. This leads to a noisy multi-instance training regime involving not only rows of the table, but also spans of linked text. We respond to these challenges by proposing MITQA, a new TextTableQA system that explicitly models the different but closely-related probability spaces of table row selection and text span selection. Our experiments indicate the superiority of our approach compared to recent baselines.
The proposed method is currently at the top of the HybridQA leaderboard with a held-out test set, achieving 21% absolute improvement on both EM and F1 scores over previously published results.

【30】 Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models 标题:面向预训练语言模型的模型不确定性感知知识融合 链接:https://arxiv.org/abs/2112.07327

作者:Lei Li,Yankai Lin,Xuancheng Ren,Guangxiang Zhao,Peng Li,Jie Zhou,Xu Sun 机构:†MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University, §Pattern Recognition Center, WeChat AI, Tencent Inc., China 摘要:随着许多性能良好的微调预训练语言模型(PLM)的大量发布,研究更好的方法重用这些模型至关重要,因为它可以大大降低再训练计算成本和潜在的环境副作用。在本文中,我们探索了一种新的模型重用范式,即PLM的知识融合(KA)。在没有人类注释的情况下,KA旨在将来自不同教师PLM的知识(每个PLM专门处理不同的分类问题)合并到一个通用的学生模型中。为了实现这一点,我们设计了一个模型不确定性感知知识融合(MUKA)框架,该框架使用蒙特卡罗Dropout近似黄金监督信号来识别潜在的合格教师,从而指导学生。实验结果表明,与基准数据集上的基线相比,MUKA实现了实质性的改进。进一步的分析表明,MUKA可以在多教师模型、异质教师甚至跨数据集教师的复杂环境下很好地推广。 摘要:As many fine-tuned pre-trained language models (PLMs) with promising performance are generously released, investigating better ways to reuse these models is vital as it can greatly reduce the retraining computational cost and the potential environmental side-effects. In this paper, we explore a novel model reuse paradigm, Knowledge Amalgamation (KA) for PLMs. Without human annotations available, KA aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we design a Model Uncertainty-aware Knowledge Amalgamation (MUKA) framework, which identifies the potentially adequate teacher using Monte-Carlo Dropout for approximating the golden supervision to guide the student. Experimental results demonstrate that MUKA achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKA can generalize well under several complicated settings with multiple teacher models, heterogeneous teachers, and even cross-dataset teachers.
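The Monte-Carlo Dropout teacher selection described here amounts to scoring each teacher's predictive uncertainty over several stochastic forward passes and trusting the least uncertain one. A simulation sketch (Dirichlet draws stand in for real dropout-perturbed PLM outputs; all names are our own):

```python
import numpy as np

def predictive_entropy(passes):
    """Uncertainty of a teacher from T stochastic forward passes
    (a T x C array of class probabilities): entropy of the mean prediction."""
    mean = passes.mean(axis=0)
    return float(-(mean * np.log(mean + 1e-12)).sum())

rng = np.random.default_rng(0)
# Dirichlet draws stand in for Monte-Carlo-dropout outputs of two teacher
# PLMs on one input: one teacher is confident (peaked), the other diffuse.
confident = rng.dirichlet([50.0, 1.0, 1.0], size=10)
uncertain = rng.dirichlet([2.0, 2.0, 2.0], size=10)
scores = [predictive_entropy(t) for t in (confident, uncertain)]
print("selected teacher:", int(np.argmin(scores)))  # 0, the confident teacher
```

In MUKA this per-input uncertainty estimate is what approximates the "golden supervision": the student is steered toward the teacher that is most reliable for the current example.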

【31】 Kernel-aware Raw Burst Blind Super-Resolution 标题:内核感知的原始突发盲超分辨率 链接:https://arxiv.org/abs/2112.07315

作者:Wenyi Lian,Shanglian Peng 机构:School of Computer Science, Chengdu University of Information Technology, China 摘要:突发超分辨率(SR)提供了从低质量图像恢复丰富细节的可能性。然而,由于低分辨率(LR)图像在实际应用中存在多种复杂且未知的退化,现有的非盲(如双三次)设计网络通常会导致恢复高分辨率(HR)图像的性能严重下降。此外,处理多个未对准的噪声原始输入也是一项挑战。在本文中,我们解决了从现代手持设备获取的原始突发序列重建HR图像的问题。核心思想是一种内核引导策略,它可以通过两个步骤解决突发SR:内核建模和HR恢复。前者根据原始输入估计突发核,而后者根据估计的核预测超分辨率图像。此外,我们还引入了一个核感知的可变形对齐模块,该模块可以在考虑模糊先验的情况下有效地对齐原始图像。对合成数据集和真实数据集的大量实验表明,该方法在突发SR问题上达到了最先进的性能。 摘要:Burst super-resolution (SR) provides a possibility of restoring rich details from low-quality images. However, since low-resolution (LR) images in practical applications have multiple complicated and unknown degradations, existing non-blind (e.g., bicubic) designed networks usually lead to a severe performance drop in recovering high-resolution (HR) images. Moreover, handling multiple misaligned noisy raw inputs is also challenging. In this paper, we address the problem of reconstructing HR images from raw burst sequences acquired from modern handheld devices. The central idea is a kernel-guided strategy which can solve the burst SR with two steps: kernel modeling and HR restoring. The former estimates burst kernels from raw inputs, while the latter predicts the super-resolved image based on the estimated kernels. Furthermore, we introduce a kernel-aware deformable alignment module which can effectively align the raw images with consideration of the blurry priors. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method achieves favorable, state-of-the-art performance on the burst SR problem.

【32】 MMO: Meta Multi-Objectivization for Software Configuration Tuning 标题:MMO:软件配置调优的元多目标化 链接:https://arxiv.org/abs/2112.07303

作者:Tao Chen,Miqing Li 机构:Tao Chen is with the Department of Computer Science, Loughborough University, UK; Miqing Li is with the School of Computer Science, University of Birmingham 备注:12 figures, 5 tables. arXiv admin note: text overlap with arXiv:2106.01331 摘要:软件配置调优对于优化给定的性能目标(例如,最小化延迟)至关重要。然而,由于软件本身复杂的配置环境和昂贵的测量,它取得了相当小的成功,特别是在防止搜索陷入局部最优方面。为了解决这个问题,在本文中我们采取了不同的观点。我们没有专注于改进优化器,而是致力于优化模型的层次,并提出了一个元多目标化(MMO)模型,该模型考虑了一个辅助性能目标(例如,除了延迟之外的吞吐量)。该模型的独特之处在于,我们没有优化辅助性能目标,而是使用它使性能相近但不同的配置之间的可比性降低(即,彼此帕累托非支配),从而防止搜索陷入局部最优。重要的是,我们展示了如何有效地使用MMO模型,而不必担心其权重——这是唯一一个可能影响其有效性的高度敏感参数。 摘要:Software configuration tuning is essential for optimizing a given performance objective (e.g., minimizing latency). Yet, due to the software's intrinsically complex configuration landscape and expensive measurement, there has been rather mild success, particularly in preventing the search from being trapped in local optima. To address this issue, in this paper we take a different perspective. Instead of focusing on improving the optimizer, we work at the level of the optimization model and propose a meta multi-objectivization (MMO) model that considers an auxiliary performance objective (e.g., throughput in addition to latency). What makes this model unique is that we do not optimize the auxiliary performance objective, but rather use it to make similarly-performing yet different configurations less comparable (i.e., Pareto nondominated to each other), thus preventing the search from being trapped in local optima. Importantly, we show how to effectively use the MMO model without worrying about its weight -- the only yet highly sensitive parameter that can affect its effectiveness.
Experiments on 22 cases from 11 real-world software systems/environments confirm that our MMO model with the new normalization performs better than its state-of-the-art single-objective counterparts on 82% cases while achieving up to 2.09x speedup. For 67% of the cases, the new normalization also enables the MMO model to outperform the instance when using it with the normalization used in our prior FSE work under pre-tuned best weights, saving a great amount of resources which would be otherwise necessary to find a good weight. We also demonstrate that the MMO model with the new normalization can consolidate Flash, a recent model-based tuning tool, on 68% of the cases with 1.22x speedup in general.
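The core MMO mechanism, making similarly-performing configurations Pareto-nondominated so that neither is discarded, rests on the standard dominance relation, which takes only a few lines to state (the objective values below are made up for illustration):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): no worse in every objective and
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def incomparable(a, b):
    """Neither configuration dominates the other, so the search keeps both."""
    return not dominates(a, b) and not dominates(b, a)

# (latency, auxiliary objective treated as a cost) for two configurations:
# under the MMO model neither dominates the other, so neither is discarded,
# which is the mechanism that keeps the search from collapsing into a
# single local optimum.
c1, c2 = (3.0, 7.0), (4.0, 5.0)
print(incomparable(c1, c2))       # True
print(dominates((3.0, 5.0), c2))  # True
```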

【33】 MCDS: AI Augmented Workflow Scheduling in Mobile Edge Cloud Computing Systems 标题:MCDS:移动边缘云计算系统中的人工智能增强工作流调度 链接:https://arxiv.org/abs/2112.07269

作者:Shreshth Tuli,Giuliano Casale,Nicholas R. Jennings 机构:The authors are with the Department of Computing; N. R. Jennings is also with Loughborough University 备注:Accepted in IEEE Transactions on Parallel and Distributed Systems (Special Issue on PDC for AI), 2022 摘要:工作流调度是并行分布式计算(PDC)中一个长期研究的问题,其目的是有效地利用计算资源来满足用户的服务需求。最近提出的调度方法利用边缘计算平台的低响应时间来优化应用程序服务质量(QoS)。然而,由于计算异构性、移动设备不断变化的延迟以及工作负载资源需求的易变性,在移动边缘云系统中调度工作流应用程序具有挑战性。为了克服这些困难,有必要但同时也有挑战性地开发一个长远的优化方案,有效地建模QoS目标。在这项工作中,我们提出了MCDS:Monte Carlo学习,使用深度代理模型有效地调度移动边缘云计算系统中的工作流应用程序。MCDS是一种基于人工智能(AI)的调度方法,它使用基于树的搜索策略和基于深度神经网络的代理模型来估计即时行动对调度决策鲁棒优化的长期QoS影响。在物理和模拟边缘云试验台上的实验表明,MCDS在能耗、响应时间、SLA违规和成本方面比最先进的方法分别提高至少6.13%、4.56%、45.09%和30.71%。 摘要:Workflow scheduling is a long-studied problem in parallel and distributed computing (PDC), aiming to efficiently utilize compute resources to meet user's service requirements. Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize application Quality of Service (QoS). However, scheduling workflow applications in mobile edge-cloud systems is challenging due to computational heterogeneity, changing latencies of mobile devices and the volatile nature of workload resource requirements. To overcome these difficulties, it is essential, but at the same time challenging, to develop a long-sighted optimization scheme that efficiently models the QoS objectives. In this work, we propose MCDS: Monte Carlo Learning using Deep Surrogate Models to efficiently schedule workflow applications in mobile edge-cloud computing systems. MCDS is an Artificial Intelligence (AI) based scheduling approach that uses a tree-based search strategy and a deep neural network-based surrogate model to estimate the long-term QoS impact of immediate actions for robust optimization of scheduling decisions.
Experiments on physical and simulated edge-cloud testbeds show that MCDS can improve over the state-of-the-art methods in terms of energy consumption, response time, SLA violations and cost by at least 6.13, 4.56, 45.09 and 30.71 percent respectively.

【34】 Quantifying Multimodality in World Models 标题:世界模型中多模态的量化 链接:https://arxiv.org/abs/2112.07263

作者:Andreas Sedlmeier,Michael Kölle,Robert Müller,Leo Baudrexel,Claudia Linnhoff-Popien 机构:LMU Munich, Munich, Germany 摘要:基于模型的深度强化学习(RL)假设环境的底层过渡动力学模型可用。该模型可用于预测代理人可能采取的行动的未来影响。当没有此类模型可用时,可以学习真实环境的近似值,例如通过使用生成性神经网络,有时也称为世界模型。由于大多数现实世界的环境本质上是随机的,过渡动力学通常是多模态的,因此使用能够反映这种多模态不确定性的建模技术非常重要。为了安全地将这些学习系统部署在现实世界中,特别是在工业环境中,考虑这些不确定性是至关重要的。在这项工作中,我们分析了基于RL的世界模型中的多模态不确定性,并提出了新的检测和量化指标。正确的建模和检测不确定的未来状态奠定了基础,以安全的方式处理危急情况,这是在现实世界中部署RL系统的先决条件。 摘要:Model-based Deep Reinforcement Learning (RL) assumes the availability of a model of an environment's underlying transition dynamics. This model can be used to predict future effects of an agent's possible actions. When no such model is available, it is possible to learn an approximation of the real environment, e.g. by using generative neural networks, sometimes also called World Models. As most real-world environments are stochastic in nature and the transition dynamics are oftentimes multimodal, it is important to use a modelling technique that is able to reflect this multimodal uncertainty. In order to safely deploy such learning systems in the real world, especially in an industrial context, it is paramount to consider these uncertainties. In this work, we analyze existing and propose new metrics for the detection and quantification of multimodal uncertainty in RL based World Models. The correct modelling & detection of uncertain future states lays the foundation for handling critical situations in a safe way, which is a prerequisite for deploying RL systems in real-world settings.

【35】 Margin Calibration for Long-Tailed Visual Recognition 标题:长尾视觉识别中的边缘校正 链接:https://arxiv.org/abs/2112.07225

作者:Yidong Wang,Bowen Zhang,Wenxin Hou,Zhen Wu,Jindong Wang,Takahiro Shinozaki 机构:Tokyo Institute of Technology, Nanjing University, Microsoft Research Asia 备注:Technical report; 9 pages 摘要:视觉识别任务中的长尾类分布对神经网络如何处理头类和尾类之间的有偏预测提出了巨大挑战,即该模型倾向于将尾类分类为头类。虽然现有的研究主要集中在数据重采样和损失函数工程上,但在本文中,我们采用了不同的视角:分类裕度。我们研究了边际与logits(分类分数)之间的关系,并实证观察了有偏边际和有偏logits之间的正相关关系。我们提出MARC,一个简单而有效的边缘校准函数,用于动态校准无偏Logit的有偏边缘。我们通过对常见的长尾基准测试(包括CIFAR-LT、ImageNet LT、Places LT和iNaturalist-LT)进行广泛的实验来验证MARC。实验结果表明,我们的MARC在这些基准测试上取得了良好的结果。此外,MARC非常容易实现,只需三行代码。我们希望这一简单的方法将激励人们重新思考长尾视觉识别中的偏差边际和偏差逻辑。 摘要:The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that our MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement with just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.
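The MARC abstract emphasizes that margin calibration takes only a few lines of code. As a purely illustrative sketch (not the authors' released implementation), a per-class affine correction of the logits could look like the following, where the per-class scale `omega` and shift `beta` stand in for MARC's learned calibration parameters; the values below are invented for demonstration:

```python
import numpy as np

def calibrate_logits(logits, omega, beta):
    """Per-class affine calibration: rescale and shift each class's logit.

    logits: (batch, num_classes) raw classification scores
    omega:  (num_classes,) per-class scales (assumed learned)
    beta:   (num_classes,) per-class offsets (assumed learned)
    """
    return logits * omega + beta

# Toy example: a learned offset lifts an under-predicted tail class (index 2).
logits = np.array([[4.0, 2.0, 1.5]])
omega = np.array([1.0, 1.0, 1.0])
beta = np.array([0.0, 0.0, 3.0])
calibrated = calibrate_logits(logits, omega, beta)
print(calibrated.argmax(axis=1))  # [2] -- the tail class now wins
```

In the actual method the calibration parameters are trained while the backbone stays fixed; this sketch only shows the shape of the correction.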

【36】 Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module 标题:Meta-CPR:用通信模式识别模块推广到看不见的大量Agent 链接:https://arxiv.org/abs/2112.07222

作者:Wei-Cheng Tseng,Wei Wei,Da-Chen Juan,Min Sun 机构: National Tsing Hua University, Google AI Research, Appier Inc., Taiwan 摘要:在强化学习中,设计一种有效的agent之间的通信机制一直是一项具有挑战性的任务,特别是在现实应用中。代理的数量可能会增加,或者环境有时需要与真实场景中不断变化的代理数量进行交互。为此,多代理框架需要在规模和动态方面处理代理的各种场景,以便在实际应用中实用。我们将具有不同数量代理的多代理环境描述为一个多任务问题,并提出了一个元强化学习(meta-RL)框架来解决这个问题。该框架采用元学习通信模式识别(CPR)模块来识别通信行为,并提取有助于训练过程的信息。实验结果表明,所提出的框架(a)可以推广到未知的更大数量的代理,并且(b)允许代理的数量在不同的事件之间变化。消融研究也被提供来解释提议的CPR设计,并表明这种设计是有效的。 摘要:Designing an effective communication mechanism among agents in reinforcement learning has been a challenging task, especially for real-world applications. The number of agents can grow or an environment sometimes needs to interact with a changing number of agents in real-world scenarios. To this end, a multi-agent framework needs to handle various scenarios of agents, in terms of both scales and dynamics, for being practical to real-world applications. We formulate the multi-agent environment with a different number of agents as a multi-tasking problem and propose a meta reinforcement learning (meta-RL) framework to tackle this problem. The proposed framework employs a meta-learned Communication Pattern Recognition (CPR) module to identify communication behavior and extract information that facilitates the training process. Experimental results are poised to demonstrate that the proposed framework (a) generalizes to an unseen larger number of agents and (b) allows the number of agents to change between episodes. The ablation study is also provided to reason the proposed CPR design and show such design is effective.

【37】 A real-time spatiotemporal AI model analyzes skill in open surgical videos 标题:一种实时时空人工智能模型在开放式手术视频中的技能分析 链接:https://arxiv.org/abs/2112.07219

作者:Emmett D. Goodman,Krishna K. Patel,Yilun Zhang,William Locke,Chris J. Kennedy,Rohan Mehrotra,Stephen Ren,Melody Guan,Maren Downing,Hao Wei Chen,Jevin Z. Clark,Gabriel A. Brat,Serena Yeung 机构:Department of Computer Science, Stanford University, Stanford, CA, USA., Department of Biomedical Data Science, Stanford University, Stanford, CA, USA, Department of Electrical Engineering, Stanford University, Stanford, CA, USA 备注:22 pages, 4 main text figures, 7 extended data figures, 4 extended data tables 摘要:开放式手术是全世界外科手术的主要形式。人工智能(AI)有可能优化手术实践并改善患者预后,但相关工作主要集中在微创技术上。我们的工作通过在YouTube上策划迄今为止最大的开放手术视频数据集,克服了现有人工智能模型训练数据的局限性:来自50个国家上传的、涵盖23种手术的1997个视频。利用这个数据集,我们开发了一个多任务人工智能模型,能够实时理解手术行为、手和工具——程序流程和外科医生技能的构建块。我们证明了我们的模型可以推广到不同的手术类型和环境中。为了说明这一普遍性,我们直接应用YouTube训练的模型分析了在学术医疗中心前瞻性收集的开放手术,并确定了与手部运动效率相关的手术技能运动学描述符。我们的开放手术注释视频(AVOS)数据集和训练好的模型将公开,用于外科人工智能的进一步开发。 摘要:Open procedures represent the dominant form of surgery worldwide. Artificial intelligence (AI) has the potential to optimize surgical practice and improve patient outcomes, but efforts have focused primarily on minimally invasive techniques. Our work overcomes existing data limitations for training AI models by curating, from YouTube, the largest dataset of open surgical videos to date: 1997 videos from 23 surgical procedures uploaded from 50 countries. Using this dataset, we developed a multi-task AI model capable of real-time understanding of surgical behaviors, hands, and tools - the building blocks of procedural flow and surgeon skill. We show that our model generalizes across diverse surgery types and environments. Illustrating this generalizability, we directly applied our YouTube-trained model to analyze open surgeries prospectively collected at an academic medical center and identified kinematic descriptors of surgical skill related to efficiency of hand motion. Our Annotated Videos of Open Surgery (AVOS) dataset and trained model will be made available for further development of surgical AI.

【38】 ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval 标题:ACE-BERT:面向电子商务检索的对抗性跨模态增强型BERT 链接:https://arxiv.org/abs/2112.07209

作者:Boxuan Zhang,Chao Wei,Yan Jin,Weiru Zhang 机构:Alibaba Group, Hangzhou, China 摘要:如今,在电子商务平台上,产品以多种形式呈现给客户。在为客户提供吸引人的产品的同时,这些多种模式对于检索系统非常重要。因此,如何同时考虑多个模式以提高检索性能至关重要。由于以下原因,这个问题对我们是一个巨大的挑战:(1)用预先训练好的图像模型(如基于CNN的模型)提取面片特征的方法存在很大的归纳偏差。在电子商务中,很难从产品图像中获取有效的信息。(2) 多模态数据的异构性使得在一个公共子空间中构造包含标题和图像的查询文本和产品的表示具有挑战性。我们提出了一种新的对抗性跨模式增强型BERT(ACE-BERT),用于高效的电子商务检索。具体而言,ACE-BERT利用面片特征和像素特征作为图像表示。因此,转换器结构可以直接应用于原始图像序列。ACE-BERT以预先训练好的增强型BERT为骨干网络,通过添加领域分类器进一步采用对抗式学习,确保不同模态表示的分布一致性,以缩小查询和产品之间的表示差距。实验结果表明,ACE-BERT在检索任务上优于现有的方法。值得注意的是,ACE-BERT已经部署在我们的电子商务搜索引擎中,导致收入增长1.46%。 摘要:Nowadays on E-commerce platforms, products are presented to the customers with multiple modalities. These multiple modalities are significant for a retrieval system while providing attracted products for customers. Therefore, how to take into account those multiple modalities simultaneously to boost the retrieval performance is crucial. This problem is a huge challenge to us due to the following reasons: (1) the way of extracting patch features with the pre-trained image model (e.g., CNN-based model) has much inductive bias. It is difficult to capture the efficient information from the product image in E-commerce. (2) The heterogeneity of multimodal data makes it challenging to construct the representations of query text and product including title and image in a common subspace. We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval. In detail, ACE-BERT leverages the patch features and pixel features as image representation. Thus the Transformer architecture can be applied directly to the raw image sequences. With the pre-trained enhanced BERT as the backbone network, ACE-BERT further adopts adversarial learning by adding a domain classifier to ensure the distribution consistency of different modality representations for the purpose of narrowing down the representation gap between query and product. 
Experimental results demonstrate that ACE-BERT outperforms the state-of-the-art approaches on the retrieval task. It is remarkable that ACE-BERT has already been deployed in our E-commerce search engine, leading to a 1.46% increase in revenue.

【39】 Weakly Supervised High-Fidelity Clothing Model Generation 标题:弱监督高保真服装模型生成 链接:https://arxiv.org/abs/2112.07200

作者:Ruili Feng,Cheng Ma,Chengji Shen,Xin Gao,Zhenjiang Liu,Xiaobo Li,Kairi Ou,Zhengjun Zha 机构:University of Science and Technology of China,Zhejiang University,Alibaba Group 摘要:网络经济的发展引发了在产品服装上生成模特形象、展示新服装和促进销售的需求。然而,昂贵的专有模特图像给该场景中现有的图像虚拟试穿方法带来了挑战,因为大多数方法都需要在大量模特图像和成对的衣服图像上进行训练。在本文中,我们提出了一种廉价但可扩展的弱监督方法,称为深度生成投影(DGP),以解决这一特定场景。该方法的核心是模仿人类预测穿着效果的过程,这是一种基于生活经验的无监督想象,而不是从监督中学习的计算规则。在这里,一个预训练的StyleGAN被用来捕捉穿着的实际经验。实验表明,将衣服和身体的粗略对齐投影到StyleGAN空间可以产生逼真的穿着效果。在真实场景专有模特图像上的实验表明,在生成服装模特图像时,DGP优于几种最先进的监督方法。 摘要:The development of online economics arouses the demand of generating images of models on product clothes, to display new clothes and promote sales. However, the expensive proprietary model images challenge the existing image virtual try-on methods in this scenario, as most of them need to be trained on considerable amounts of model images accompanied with paired clothes images. In this paper, we propose a cheap yet scalable weakly-supervised method called Deep Generative Projection (DGP) to address this specific scenario. Lying in the heart of the proposed method is to imitate the process of human predicting the wearing effect, which is an unsupervised imagination based on life experience rather than computation rules learned from supervisions. Here a pretrained StyleGAN is used to capture the practical experience of wearing. Experiments show that projecting the rough alignment of clothing and body onto the StyleGAN space can yield photo-realistic wearing results. Experiments on real scene proprietary model images demonstrate the superiority of DGP over several state-of-the-art supervised methods when generating clothing model images.

【40】 From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression 标题:从密集到稀疏:对比剪枝以实现更好的预训练语言模型压缩 链接:https://arxiv.org/abs/2112.07198

作者:Runxin Xu,Fuli Luo,Chengyu Wang,Baobao Chang,Jun Huang,Songfang Huang,Fei Huang 机构:Key Laboratory of Computational Linguistics, Peking University, MOE, China, Alibaba Group 备注:Accepted to AAAI 2022 摘要:在预训练和微调范式下,预训练语言模型(PLM)在各种自然语言处理(NLP)任务中取得了巨大成功。plm具有大量的参数,计算量大,资源消耗大。因此,模型修剪被引入到大规模PLM的压缩中。然而,大多数先验方法只考虑任务特定知识对下游任务的影响,而忽略修剪过程中不必要的任务无关知识,这可能导致灾难性遗忘问题,并导致泛化能力差。为了在我们的剪枝模型中保持任务不可知和任务特定的知识,我们在预训练和微调的范式下提出了对比剪枝(CAP)。它被设计为一个通用框架,兼容结构化和非结构化修剪。CAP统一于对比学习,使剪枝模型能够从预先训练的任务不可知知识模型和微调的任务特定知识模型中学习。此外,为了更好地保持修剪模型的性能,快照(即每次修剪迭代中的中间模型)也可以作为修剪的有效监督。我们的大量实验表明,采用CAP始终会产生显著的改进,特别是在非常高的稀疏性场景中。仅保留3%的模型参数(即97%的稀疏性),CAP在QQP和MNLI任务中成功实现了原始BERT性能的99.2%和96.3%。此外,我们的探索性实验表明,CAP修剪后的模型具有更好的泛化能力。 摘要:Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervisions for pruning. 
Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance in QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
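For readers unfamiliar with what "97% sparsity" means mechanically, the following is a generic unstructured magnitude-pruning step of the kind CAP is designed to be compatible with. It is not CAP itself (the contrastive objective is the paper's contribution); the shapes and numbers are illustrative:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries so `sparsity` of them become 0."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # k-th smallest absolute value; everything at or below it is pruned.
    threshold = np.partition(flat, k - 1)[k - 1] if k > 0 else -np.inf
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
pruned, mask = magnitude_prune(w, sparsity=0.97)
print(round(1 - mask.mean(), 2))  # 0.97 -- only 3% of weights survive
```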

【41】 An Adaptive Graph Pre-training Framework for Localized Collaborative Filtering 标题:一种用于局部化协同过滤的自适应图预训练框架 链接:https://arxiv.org/abs/2112.07191

作者:Yiqi Wang,Chaozhuo Li,Zheng Liu,Mingzheng Li,Jiliang Tang,Xing Xie,Lei Chen,Philip S. Yu 机构: Michigan State University, Hong Kong University of Science and Technology 摘要:图神经网络(GNNs)在推荐任务中得到了广泛的应用,并取得了很好的性能。然而,大多数基于GNN的推荐方法在实际应用中都存在数据稀疏的问题。同时,预训练技术在缓解自然语言处理(NLP)和计算机视觉(CV)等领域的数据稀疏性方面取得了巨大成功。因此,图形预训练对于缓解基于GNN的推荐中的数据稀疏性具有巨大的潜力。然而,针对建议的训练前GNN面临着独特的挑战。例如,不同推荐任务中的用户项目交互图具有不同的用户和项目集,并且它们通常呈现不同的属性。因此,NLP和CV中常用的将知识从训练前任务转移到下游任务(如共享所学嵌入或特征提取器)的成功机制并不直接适用于现有的基于GNN的推荐模型。为了应对这些挑战,我们精心设计了一个用于本地化协同过滤(ADAPT)的自适应图预训练框架。它不需要传输用户/项目嵌入,并且能够捕获不同图的公共知识和每个图的唯一性。大量的实验结果证明了自适应算法的有效性和优越性。 摘要:Graph neural networks (GNNs) have been widely applied in the recommendation tasks and have obtained very appealing performance. However, most GNN-based recommendation methods suffer from the problem of data sparsity in practice. Meanwhile, pre-training techniques have achieved great success in mitigating data sparsity in various domains such as natural language processing (NLP) and computer vision (CV). Thus, graph pre-training has the great potential to alleviate data sparsity in GNN-based recommendations. However, pre-training GNNs for recommendations face unique challenges. For example, user-item interaction graphs in different recommendation tasks have distinct sets of users and items, and they often present different properties. Therefore, the successful mechanisms commonly used in NLP and CV to transfer knowledge from pre-training tasks to downstream tasks such as sharing learned embeddings or feature extractors are not directly applicable to existing GNN-based recommendations models. To tackle these challenges, we delicately design an adaptive graph pre-training framework for localized collaborative filtering (ADAPT). It does not require transferring user/item embeddings, and is able to capture both the common knowledge across different graphs and the uniqueness for each graph. Extensive experimental results have demonstrated the effectiveness and superiority of ADAPT.

【42】 On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation 标题:大脑皮层放大率和眼跳作为数据增强生物指标的研究 链接:https://arxiv.org/abs/2112.07173

作者:Binxu Wang,David Mayo,Arturo Deza,Andrei Barbu,Colin Conwell 机构:Dept. of Neurobiology, Harvard Medical School &, Dept. of Neuroscience, Washington University in St Louis, Dept. of Psychology, Harvard University; ,Google Research, Brain Team, BCS & CBMM, MIT; ,CSAIL & CBMM, MIT 备注:14 pages, 6 figures, 2 tables. Published in NeurIPS 2021 Workshop, Shared Visual Representations in Human & Machine Intelligence (SVRHM). For code, see this https URL 摘要:自监督学习是从自然数据中学习有用表示的有效方法。它也被认为是在人类中建立视觉表现的一种可能方法,但具体目标和算法尚不清楚。目前,大多数自监督方法鼓励系统学习同一图像与其他图像不同变换的不变表示。然而,这种转换通常在生物学上是不合理的,并且通常由人为的感知方案组成,例如随机裁剪和颜色抖动。在本文中,我们试图对这些增强进行反向工程,使其在生物学或感知上更具合理性,同时仍能为鼓励稳健表征提供相同的好处。关键的是,我们发现随机裁剪可以被皮质放大所取代,而像扫视一样的图像采样也可以帮助表征学习。这些转变的可行性表明了生物视觉系统实现自我监控的一种潜在方式。此外,它们打破了许多计算机视觉算法中广泛接受的空间一致性处理假设,表明了空间自适应计算在人类和机器中的作用。我们的代码和演示可以在这里找到。 摘要:Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast to those of other images. However, such transformations are generally non-biologically plausible, and often consist of contrived perceptual schemes such as random cropping and color jittering. In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible while still conferring the same benefits for encouraging robust representation. Critically, we find that random cropping can be substituted by cortical magnification, and saccade-like sampling of the image could also assist the representation learning. The feasibility of these transformations suggests a potential way that biological visual systems could implement self-supervision. 
Further, they break the widely accepted spatially-uniform processing assumption used in many computer vision algorithms, suggesting a role for spatially-adaptive computation in humans and machines alike. Our code and demo can be found here.

【43】 Improving Spectral Graph Convolution for Learning Graph-level Representation 标题:改进谱图卷积学习图级表示 链接:https://arxiv.org/abs/2112.07160

作者:Mingqi Yang,Rui Li,Yanming Shen,Heng Qi,Baocai Yin 摘要:从最初理论上定义良好的谱图卷积到随后的基于空间的消息传递模型,空间局部性(顶点域)是大多数图神经网络(GNN)的基本原理。在谱图卷积中,滤波器用多项式近似,其中$k$阶多项式覆盖$k$跳邻居。在消息传递中,聚合中使用的各种邻居定义实际上是对空间位置信息的广泛探索。对于学习节点表示,拓扑距离似乎是必要的,因为它表征了节点之间的基本关系。然而,为了学习整个图的表示,是否仍然需要保持?在这项工作中,我们表明,这样的原则是没有必要的,它阻碍了大多数现有的GNN有效地编码图形结构。通过消除它以及多项式滤波器的限制,由此产生的新体系结构显著提高了学习图表示的性能。我们还研究了图形频谱对信号的影响,并将现有的各种改进解释为不同的频谱平滑技术。它作为一种空间理解,与众所周知的高/低通滤波器频谱理解相比,定量测量频谱对输入信号的影响。更重要的是,它为开发强大的图形表示模型提供了帮助。 摘要:From the original theoretically well-defined spectral graph convolution to the subsequent spatial-based message-passing model, spatial locality (in the vertex domain) acts as a fundamental principle of most graph neural networks (GNNs). In the spectral graph convolution, the filter is approximated by polynomials, where a $k$-order polynomial covers $k$-hop neighbors. In the message-passing, various definitions of neighbors used in aggregations are actually an extensive exploration of the spatial locality information. For learning node representations, the topological distance seems necessary since it characterizes the basic relations between nodes. However, for learning representations of entire graphs, is it still necessary to hold? In this work, we show that such a principle is not necessary; it hinders most existing GNNs from efficiently encoding graph structures. By removing it, as well as the limitation of polynomial filters, the resulting new architecture significantly boosts performance on learning graph representations. We also study the effects of the graph spectrum on signals and interpret various existing improvements as different spectrum smoothing techniques. It serves as a spatial understanding that quantitatively measures the effects of the spectrum on input signals, in comparison to the well-known spectral understanding as high/low-pass filters. More importantly, it sheds light on developing powerful graph representation models.
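The abstract's premise that a $k$-order polynomial spectral filter only reaches $k$-hop neighbors can be checked directly. Below is a minimal numpy illustration (an assumption-laden sketch, not the paper's architecture): the filtered signal is a polynomial in the normalized graph Laplacian, and $L^k$ has nonzero entries only between nodes within $k$ hops.

```python
import numpy as np

def polynomial_filter(adj, signal, coeffs):
    """Apply sum_i coeffs[i] * L^i to `signal`, with L the normalized Laplacian."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    out = np.zeros_like(signal)
    power = np.eye(len(adj))  # L^0
    for c in coeffs:
        out += c * (power @ signal)
        power = power @ lap
    return out

# Path graph 0-1-2-3: a 1-order filter cannot carry information over 3 hops.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
signal = np.array([0.0, 0.0, 0.0, 1.0])  # impulse at node 3
out = polynomial_filter(adj, signal, coeffs=[0.0, 1.0])  # pure 1-hop term
print(out[0])  # node 0 is 3 hops away from the impulse, so it stays 0.0
```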

【44】 Birds Eye View Social Distancing Analysis System 标题:鸟瞰社交距离分析系统 链接:https://arxiv.org/abs/2112.07159

作者:Zhengye Yang,Mingfei Sun,Hongzhe Ye,Zihao Xiong,Gil Zussman,Zoran Kostic 机构:†Department of Electrical Engineering, Columbia University, United States of America 摘要:在COVID-19等呼吸道流行病中,保持社交距离可以降低感染率。交通路口特别适合于监测和评估大都市的社交距离行为。我们提出并评估了一个隐私保护的社交距离分析系统(B-SDA),该系统使用行人穿越十字路口的鸟瞰视频记录。我们设计了视频预处理、目标检测和跟踪的算法,这些算法植根于已知的计算机视觉和深度学习技术,但经过修改以解决检测高架摄像机捕获的非常小的物体/行人的问题。我们提出了一种结合行人分组来检测社交距离违规的方法。B-SDA用于根据大都市地区大流行前和大流行期间的视频比较行人行为。所实现的行人检测性能为63.0% AP50,跟踪性能为47.6% MOTA。大流行期间15.6%的社交距离违规率明显低于大流行前31.4%的基线,表明行人遵守了CDC规定的社交距离建议。该系统适用于在实际应用中部署。 摘要:Social distancing can reduce the infection rates in respiratory pandemics such as COVID-19. Traffic intersections are particularly suitable for monitoring and evaluation of social distancing behavior in metropolises. We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections. We devise algorithms for video pre-processing, object detection and tracking which are rooted in the known computer-vision and deep learning techniques, but modified to address the problem of detecting very small objects/pedestrians captured by a highly elevated camera. We propose a method for incorporating pedestrian grouping for detection of social distancing violations. B-SDA is used to compare pedestrian behavior based on pre-pandemic and pandemic videos in a major metropolitan area. The accomplished pedestrian detection performance is 63.0% AP50 and the tracking performance is 47.6% MOTA. The social distancing violation rate of 15.6% during the pandemic is notably lower than the 31.4% pre-pandemic baseline, indicating that pedestrians followed CDC-prescribed social distancing recommendations. The proposed system is suitable for deployment in real-world applications.
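Once pedestrians are tracked in bird's-eye coordinates, the violation check reduces to pairwise ground-plane distances with same-group pairs excluded. The sketch below is hypothetical: B-SDA infers groups from tracking, whereas here group labels are simply given, and the 1.83 m threshold approximates the 6-foot CDC guideline:

```python
import numpy as np

def count_violations(positions, groups, threshold_m=1.83):
    """positions: (n, 2) ground-plane coordinates in meters; groups: (n,) group ids."""
    violations = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if groups[i] == groups[j]:
                continue  # same social group (e.g. a family) is not a violation
            if np.linalg.norm(positions[i] - positions[j]) < threshold_m:
                violations.append((i, j))
    return violations

positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
groups = np.array([0, 0, 1])  # the two close pedestrians walk together
print(count_violations(positions, groups))  # [] -- the close pair shares a group
```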

【45】 Building on Huang et al. GlossBERT for Word Sense Disambiguation 标题:基于Huang等人GlossBERT的词义消歧 链接:https://arxiv.org/abs/2112.07089

作者:Nikhil Patel,James Hale,Kanika Jindal,Apoorva Sharma,Yichun Yu 机构:University of Southern California 摘要:我们建议处理词义消歧(WSD)问题。在语言中,同一形式的词可以根据上下文而具有不同的含义。虽然人类很容易根据上下文推断出这些单词的含义或释义,但机器却在这项任务中遇到了麻烦。因此,我们打算复现并扩展Huang等人GlossBERT的结果,他们设计了该模型来消除这些单词的歧义(Huang et al., 2019)。具体来说,我们提出了以下增强:数据集调整(alpha超参数)、集成方法,以及用BART和ALBERT替换BERT。以下GitHub存储库包含本报告中使用的所有代码,它扩展了Huang等人提供的代码。 摘要:We propose to take on the problem of Word Sense Disambiguation (WSD). In language, words of the same form can take different meanings depending on context. While humans easily infer the meaning or gloss of such words by their context, machines stumble on this task. As such, we intend to replicate and expand upon the results of Huang et al.'s GlossBERT, a model which they designed to disambiguate these words (Huang et al., 2019). Specifically, we propose the following augmentations: data-set tweaking (alpha hyper-parameter), ensemble methods, and replacement of BERT with BART and ALBERT. The following GitHub repository contains all code used in this report, which extends on the code made available by Huang et al.

【46】 Fuzzy Win-Win: A Novel Approach to Quantify Win-Win Using Fuzzy Logic 标题:模糊双赢:一种用模糊逻辑量化双赢的新方法 链接:https://arxiv.org/abs/2112.07045

作者:Ahmad B. Hassanat,Ghada A. Altarawneh,Ahmad S. Tarawneh 机构:Mutah University, Department of accounting, Karak, Jordan, Department of algorithms and their applications, Eotvos Lorand University, Budapest, Hungary 备注:25 pages, 5 figures 摘要:经典的双赢模式有一个关键缺陷,那就是它不能为各方提供适当数量的胜利,因为各方都认为自己是赢家。事实上,一方可能比另一方赢得更多。该策略不限于单一产品或谈判;它可以应用于生活中的各种情况。本文提出了一种新的衡量双赢局面的方法。该方法利用模糊逻辑建立数学模型,帮助谈判者量化他们的获胜百分比。该模型将在伊朗铀浓缩谈判、伊拉克-约旦石油协议和铁矿石谈判(2005-2009)等现实谈判场景中进行测试。所提出的模型在实践中是一个有用的工具,并且可以很容易地推广到其他领域。 摘要:The classic win-win has a key flaw in that it cannot offer the parties the right amounts of winning because each party believes they are winners. In reality, one party may win more than the other. This strategy is not limited to a single product or negotiation; it may be applied to a variety of situations in life. We present a novel way to measure the win-win situation in this paper. The proposed method employs Fuzzy logic to create a mathematical model that aids negotiators in quantifying their winning percentages. The model is put to the test on real-life negotiation scenarios such as the Iranian uranium enrichment negotiations, the Iraqi-Jordanian oil deal, and the iron ore negotiation (2005-2009). The presented model has been shown to be a useful tool in practice and can be easily generalized to be utilized in other domains as well.
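As a toy illustration of the central idea, grading each party's "degree of winning" on a fuzzy [0, 1] scale instead of the binary win/lose of the classic framing, one could use a simple ramp membership function. The membership shape below is an assumption for demonstration, not the paper's calibrated model:

```python
def winning_degree(achieved, demanded):
    """Fuzzy membership in the 'winner' set, in [0, 1]."""
    ratio = achieved / demanded
    if ratio >= 1.0:
        return 1.0
    if ratio <= 0.0:
        return 0.0
    return ratio  # linear ramp between total loss and full win

# Both parties "win", but to different degrees -- the asymmetry
# that the classic win-win framing hides.
a = winning_degree(achieved=80, demanded=100)
b = winning_degree(achieved=40, demanded=100)
print(a, b)  # 0.8 0.4
```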

【47】 Synapse Compression for Event-Based Convolutional-Neural-Network Accelerators 标题:基于事件的卷积神经网络加速器的突触压缩 链接:https://arxiv.org/abs/2112.07019

作者:Lennart Bamberg,Arash Pourtaherian,Luc Waeijen,Anupam Chahar,Orlando Moreira 备注:Preprint submitted to IEEE Transactions on Parallel and Distributed Systems 摘要:制造可行的神经形态芯片需要新的计算机架构,以实现大脑轻松支持的大规模并行和高效的信息处理。新兴的基于事件的体系结构使这一梦想成为现实。然而,突触连接的大内存需求阻碍了现代卷积神经网络(CNN)在大规模并行、基于事件(尖峰)结构上的应用。这项工作克服了这一障碍,提供了一个轻量级的硬件方案,将突触内存需求压缩数千倍,使复杂的CNN能够在一个小型芯片上执行。12纳米技术中的硅实现表明,该技术仅使系统的实现成本增加2%,尽管与之前发布的最佳技术相比,总内存占用减少了374倍。 摘要:Manufacturing-viable neuromorphic chips require novel computer architectures to achieve the massively parallel and efficient information processing the brain supports so effortlessly. Emerging event-based architectures are making this dream a reality. However, the large memory requirements for synaptic connectivity are a showstopper for the execution of modern convolutional neural networks (CNNs) on massively parallel, event-based (spiking) architectures. This work overcomes this roadblock by contributing a lightweight hardware scheme to compress the synaptic memory requirements by several thousand times, enabling the execution of complex CNNs on a single chip of small form factor. A silicon implementation in a 12-nm technology shows that the technique increases the system's implementation cost by only 2%, despite achieving a total memory-footprint reduction of up to 374x compared to the best previously published technique.

【48】 PantheonRL: A MARL Library for Dynamic Training Interactions 标题:PantheonRL:用于动态训练交互的MARL库 链接:https://arxiv.org/abs/2112.07013

作者:Bidipta Sarkar,Aditi Talati,Andy Shih,Dorsa Sadigh 机构:Department of Computer Science, Stanford University 备注:3 pages, 3 figures. Published in Proceedings of the 36th AAAI Conference on Artificial Intelligence (Demo Track) 2022 摘要:我们介绍了PantheonRL,一个用于动态训练交互(如循环、自适应和即席训练)的多智能体强化学习软件包。我们的软件包是围绕灵活的代理对象设计的,这些对象可以轻松配置以支持不同的训练交互,并处理具有混合奖励和n个代理的完全通用多代理环境。我们的软件包建立在StableBaselines3之上,可直接与现有强大的deep RL算法配合使用。最后,PantheonRL提供了一个直观但功能强大的web用户界面,用于配置实验和启动多个异步作业。我们的软件包可在 https://github.com/Stanford-ILIAD/PantheonRL 获取。 摘要:We present PantheonRL, a multiagent reinforcement learning software package for dynamic training interactions such as round-robin, adaptive, and ad-hoc training. Our package is designed around flexible agent objects that can be easily configured to support different training interactions, and handles fully general multiagent environments with mixed rewards and n agents. Built on top of StableBaselines3, our package works directly with existing powerful deep RL algorithms. Finally, PantheonRL comes with an intuitive yet functional web user interface for configuring experiments and launching multiple asynchronous jobs. Our package can be found at https://github.com/Stanford-ILIAD/PantheonRL.

【49】 Controlled Cue Generation for Play Scripts 标题:游戏脚本的受控线索生成 链接:https://arxiv.org/abs/2112.06953

作者:Alara Dirik,Hilal Donmez,Pinar Yanardag 机构:Bo˘gaziçi University, Istanbul, Turkey 摘要:在本文中,我们使用了一个大规模的剧本数据集,提出了从对话中生成戏剧线索的新任务。使用超过一百万行的对话和线索,我们将线索生成问题作为受控文本生成任务来处理,并展示如何使用以对话/线索鉴别器为条件的语言模型来使用线索来增强对话的影响。此外,我们还探讨了主题关键字和情感在受控文本生成中的应用。大量的定量和定性实验表明,语言模型可以成功地用于在高度专业化的领域(如剧本)中生成合理的、属性受控的文本。有关支持材料,请访问:https://catlab-team.github.io/cuegen. 摘要:In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled texts in highly specialised domains such as play scripts. Supporting materials can be found at: https://catlab-team.github.io/cuegen.

【50】 Branching Strategy Selection Approach Based on Vivification Ratio 标题:基于活化率的分支策略选择方法 链接:https://arxiv.org/abs/2112.06917

作者:Mao Luo,Chu-Min Li,Xinyun Wu,Shuolin Li,Zhipeng Lü 机构:School of Computer Science, Huazhong Univ. of Science and Technology 摘要:两种最有效的分支策略LRB和VSIDS在不同类型的实例上表现不同。通常,LRB在精心构造的实例上更有效,而VSIDS在应用实例上更有效。然而,区分实例的类型是困难的。为了克服这一缺点,我们提出了一种基于活化率(vivification ratio)的分支策略选择方法。这种方法更多地使用LRB分支策略来求解活化率非常低的实例。我们测试了近几年SAT比赛主赛道上的实例。结果表明,该方法具有较强的鲁棒性,显著增加了求解实例的数量。值得一提的是,借助我们的方法,求解器Maple_CM可以解决2020 SAT竞赛基准测试中的16个以上实例。 摘要:The two most effective branching strategies LRB and VSIDS perform differently on different types of instances. Generally, LRB is more effective on crafted instances, while VSIDS is more effective on application ones. However, distinguishing the types of instances is difficult. To overcome this drawback, we propose a branching strategy selection approach based on the vivification ratio. This approach uses the LRB branching strategy more to solve the instances with a very low vivification ratio. We tested the instances from the main track of SAT competitions in recent years. The results show that the proposed approach is robust and it significantly increases the number of solved instances. It is worth mentioning that, with the help of our approach, the solver Maple_CM can solve more than 16 instances for the benchmark from the 2020 SAT competition.
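The selection rule described in the abstract can be sketched in a few lines: prefer LRB when the vivification ratio is very low, VSIDS otherwise. The cutoff value below is a placeholder assumption, not the paper's tuned threshold:

```python
def pick_branching_strategy(vivified_clauses, total_clauses, low_cutoff=0.05):
    """Choose a SAT branching strategy from the vivification ratio.

    low_cutoff is an illustrative placeholder; the paper tunes its own value.
    """
    ratio = vivified_clauses / total_clauses
    return "LRB" if ratio < low_cutoff else "VSIDS"

print(pick_branching_strategy(vivified_clauses=12, total_clauses=1000))   # LRB
print(pick_branching_strategy(vivified_clauses=300, total_clauses=1000))  # VSIDS
```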

【51】 Visualizing Ensemble Predictions of Music Mood 标题:音乐情绪的集成预测可视化 链接:https://arxiv.org/abs/2112.07627

作者:Zelin Ye,Min Chen 机构:University of Oxford, UK 备注:10 pages, 7 figures, submitted to EuroVis 2022 摘要:与其他分类问题(如流派、作曲家或时期)相比,音乐情绪分类一直是一个具有挑战性的问题。解决这一难题的一个方案是使用机器学习模型的集成。在本文中,我们展示了可视化技术可以有效地传达沿时间轴不同音乐段落上最流行的预测及其不确定性,同时能够结合单个ML模型在不同音乐数据上的应用对其进行分析。除了传统的视觉设计,如堆叠线图、ThemeRiver和基于像素的可视化,我们还引入了ThemeRiver的一种新变体,称为"双通量ThemeRiver",它允许观众比堆叠线图和ThemeRiver更容易地观察和测量最流行的预测。测试表明,可视化集成预测在模型开发工作流和使用模型预测注释音乐方面都很有帮助。 摘要:Music mood classification has been a challenging problem in comparison with some other classification problems (e.g., genre, composer, or period). One solution for addressing this challenge is to use an ensemble of machine learning models. In this paper, we show that visualization techniques can effectively convey the popular prediction as well as uncertainty at different music sections along the temporal axis, while enabling the analysis of individual ML models in conjunction with their application to different musical data. In addition to the traditional visual designs, such as stacked line graph, ThemeRiver, and pixel-based visualization, we introduced a new variant of ThemeRiver, called "dual-flux ThemeRiver", which allows viewers to observe and measure the most popular prediction more easily than stacked line graph and ThemeRiver. Testing indicates that visualizing ensemble predictions is helpful both in model-development workflows and for annotating music using model predictions.
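The quantities such a visualization encodes, namely the most popular prediction per music section and the ensemble's disagreement, are straightforward to compute. A minimal sketch with invented mood labels and window data:

```python
from collections import Counter

def popular_and_uncertainty(window_predictions):
    """window_predictions: list of mood labels, one per ensemble member."""
    counts = Counter(window_predictions)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(window_predictions)
    return label, 1.0 - agreement  # uncertainty = share of dissenting models

windows = [
    ["happy", "happy", "calm", "happy"],
    ["sad", "calm", "sad", "angry"],
]
for preds in windows:
    print(popular_and_uncertainty(preds))
# ('happy', 0.25) then ('sad', 0.5)
```

Plotted along the temporal axis, these two per-window values are exactly what a stacked line graph or dual-flux ThemeRiver-style view would show.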

【52】 Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ansätze 标题:利用群等变卷积量子拟设加速量子态学习 链接:https://arxiv.org/abs/2112.07611

作者:Han Zheng,Zimu Li,Junyu Liu,Sergii Strelchuk,Risi Kondor 机构:Department of Statistics, The University of Chicago, Chicago, IL, USA, DAMTP, Center for Mathematical Sciences, University of Cambridge, Cambridge, UK, Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL, USA 备注:16 pages, 12 figures 摘要:我们发展了$S_n$等变量子卷积电路的理论框架,该框架建立在Jordan的置换量子计算(PQC)形式体系之上并对其作了显著推广。我们证明,量子电路是傅里叶空间神经架构的自然选择:与对称群上已知最优的经典快速傅里叶变换(FFT)相比,量子电路在计算$S_n$傅里叶系数的矩阵元素时具有超指数加速。特别地,我们利用Okounkov-Vershik方法证明了Harrow关于$\operatorname{SU}(d)$不可约表示基与$S_n$不可约表示基之间等价性的论断(博士论文,2005年,第160页),并使用Young-Jucys-Murphy(YJM)元构造了$S_n$等变卷积量子交替拟设($S_n$-CQA)。我们证明了$S_n$-CQA是稠密的,因而在每个$S_n$不可约表示块内均可表达,这使其可以作为未来量子机器学习和优化应用的通用模型。从表示论的角度看,我们的方法为证明量子近似优化算法(QAOA)的普适性提供了另一条途径。我们的框架可以自然地应用于具有全局$\operatorname{SU}(d)$对称性的一大类问题。我们通过数值模拟展示了该拟设在矩形晶格和Kagome晶格上寻找$J_1$--$J_2$反铁磁海森堡模型基态符号结构方面的有效性。我们的工作确定了一个特定机器学习问题上的量子优势,并首次将著名的Okounkov-Vershik表示论应用于机器学习和量子物理。 摘要:We develop a theoretical framework for $S_n$-equivariant quantum convolutional circuits, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism. We show that quantum circuits are a natural choice for Fourier space neural architectures, affording a super-exponential speedup in computing the matrix elements of $S_n$-Fourier coefficients compared to the best known classical Fast Fourier Transform (FFT) over the symmetric group. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005, p.160) on the equivalence between $\operatorname{SU}(d)$- and $S_n$-irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ansätze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA are dense, thus expressible within each $S_n$-irrep block, which may serve as a universal model for potential future quantum machine learning and optimization applications. Our method provides another way to prove the universality of the Quantum Approximate Optimization Algorithm (QAOA) from the representation-theoretical point of view. Our framework can be naturally applied to a wide array of problems with global $\operatorname{SU}(d)$ symmetry. We present numerical simulations to showcase the effectiveness of the ansätze in finding the sign structure of the ground state of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work identifies quantum advantage for a specific machine learning problem, and provides the first application of the celebrated Okounkov-Vershik representation theory to machine learning and quantum physics.

【53】 Efficient differentiable quadratic programming layers: an ADMM approach 标题:高效可微二次规划层:ADMM方法 链接:https://arxiv.org/abs/2112.07464

作者:Andrew Butler,Roy Kwon 机构:University of Toronto, Department of Mechanical and Industrial Engineering 摘要:神经网络架构的最新进展使凸优化问题可以作为可微层无缝集成到端到端可训练的神经网络中。然而,将大中型二次规划集成到深度神经网络架构中颇具挑战,因为用内点法精确求解二次规划在变量数量上具有最坏情况三次方复杂度。在本文中,我们提出了一种基于交替方向乘子法(ADMM)的替代网络层架构,能够扩展到变量数量较大的中等规模问题。反向微分通过对修正定点迭代的残差映射进行隐式微分来实现。模拟结果证明了ADMM层的计算优势:对于中等规模的问题,它比OptNet二次规划层快大约一个数量级。此外,与基于KKT最优性条件的展开微分或隐式微分的标准方法相比,我们新的反向传播例程在内存和计算两方面都是高效的。最后,我们以预测与优化一体化范式中的投资组合优化为例进行总结。 摘要:Recent advances in neural-network architecture allow for seamless integration of convex optimization problems as differentiable layers in an end-to-end trainable neural network. Integrating medium and large scale quadratic programs into a deep neural network architecture, however, is challenging as solving quadratic programs exactly by interior-point methods has worst-case cubic complexity in the number of variables. In this paper, we present an alternative network layer architecture based on the alternating direction method of multipliers (ADMM) that is capable of scaling to problems with a moderately large number of variables. Backward differentiation is performed by implicit differentiation of the residual map of a modified fixed-point iteration. Simulated results demonstrate the computational advantage of the ADMM layer, which for medium scaled problems is approximately an order of magnitude faster than the OptNet quadratic programming layer. Furthermore, our novel backward-pass routine is efficient, from both a memory and computation standpoint, in comparison to the standard approach based on unrolled differentiation or implicit differentiation of the KKT optimality conditions. We conclude with examples from portfolio optimization in the integrated prediction and optimization paradigm.
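
For intuition, the forward pass of such a layer can be sketched as a plain ADMM loop for a box-constrained QP. This is a minimal illustration under stated assumptions (the fixed `rho` and iteration count are illustrative), and it omits the paper's implicit backward pass through the fixed-point residual map.

```python
import numpy as np

def admm_box_qp(Q, q, lb, ub, rho=1.0, iters=200):
    """Solve  min 0.5*x'Qx + q'x  s.t.  lb <= x <= ub  with ADMM.

    The (Q + rho*I) factorization is computed once and reused, so each
    iteration costs one pair of triangular solves plus a clip; this low
    per-iteration cost is why an ADMM layer can scale better than an
    interior-point solve. lb/ub may be scalars or arrays (np.clip broadcasts).
    """
    n = Q.shape[0]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    L = np.linalg.cholesky(Q + rho * np.eye(n))  # factor once
    for _ in range(iters):
        # x-update: solve (Q + rho*I) x = -q + rho*(z - u)
        rhs = -q + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = np.clip(x + u, lb, ub)   # z-update: project onto the box
        u = u + x - z                # dual (scaled multiplier) update
    return z

# Tiny example: a separable QP whose unconstrained optimum (1, 4)
# is clipped by the box [0, 2]^2 to (1, 2).
Q = np.diag([2.0, 2.0])
q = np.array([-2.0, -8.0])
x_star = admm_box_qp(Q, q, lb=0.0, ub=2.0)
```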

【54】 Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration 标题:无监督变形图像配准的随机规划者-执行者-批评者 链接:https://arxiv.org/abs/2112.07415

作者:Ziwei Luo,Jing Hu,Xin Wang,Shu Hu,Bin Kong,Youbing Yin,Qi Song,Xi Wu,Siwei Lyu 机构: Chengdu University of Information Technology, China, Keya Medical, Seattle, USA, University at Buffalo, SUNY, USA 备注:Accepted by AAAI'22 摘要:不同形状和非线性形状变化引起的器官大形变对医学图像配准提出了重大挑战。传统的配准方法需要通过特定的形变模型迭代优化目标函数,并进行细致的参数调整,但在配准大形变图像时能力有限。虽然基于深度学习的方法可以学习从输入图像到其相应形变场的复杂映射,但这类方法是基于回归的,容易陷入局部极小值,特别是在涉及大形变时。为此,我们提出了随机规划者-执行者-批评者(Stochastic Planner-Actor-Critic,SPAC),一种执行逐步配准的新型强化学习框架。其核心思想是在每个时间步对运动图像连续施加形变,最终与固定图像对齐。考虑到在传统强化学习(RL)框架中处理高维连续动作和状态空间具有挑战性,我们在标准执行者-批评者(Actor-Critic)模型中引入了低维的新概念"计划"(Plan),以帮助执行者生成可处理的高维动作。整个框架基于无监督训练,并以端到端方式运行。我们在多个二维和三维医学图像数据集上评估了我们的方法,其中一些数据集包含大形变。实验结果表明,我们的方法取得了一致且显著的提升,优于最先进的方法。 摘要:Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods need to iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but which have limited capabilities in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation field, it is regression-based and is prone to be stuck at local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is warping a moving image successively by each time step to finally align to a fixed image. Considering that it is challenging to handle high dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept `Plan' to the standard Actor-Critic model, which is of low dimension and can facilitate the actor to generate a tractable high dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results highlight that our work achieves consistent, significant gains and outperforms state-of-the-art methods.
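
As a toy illustration of the step-wise idea only (not the SPAC method), the sketch below aligns a 1D signal by repeatedly applying the small shift that most reduces the mismatch, accumulating the warp step by step; in SPAC the per-step action would come from a learned stochastic policy guided by a low-dimensional plan rather than greedy search.

```python
import numpy as np

def stepwise_align(moving, fixed, steps=20):
    """Greedily align `moving` to `fixed` one small warp at a time.

    At each step the candidate actions are circular shifts of -1, 0 or
    +1 samples; the one with the lowest squared mismatch is applied, and
    the warps accumulate across steps until the signals are aligned.
    """
    total_shift = 0
    for _ in range(steps):
        errors = {s: float(np.sum((np.roll(moving, s) - fixed) ** 2))
                  for s in (-1, 0, 1)}
        best = min(errors, key=errors.get)
        if best == 0:        # no small shift improves: stop early
            break
        moving = np.roll(moving, best)
        total_shift += best
    return moving, total_shift

# A Gaussian bump as the "fixed image" and a shifted copy as "moving".
grid = np.arange(64)
fixed = np.exp(-0.5 * ((grid - 30) / 3.0) ** 2)
moving = np.roll(fixed, 5)
aligned, shift = stepwise_align(moving, fixed)   # shift accumulates to -5
```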

【55】 A Deep Knowledge Distillation framework for EEG assisted enhancement of single-lead ECG based sleep staging 标题:一种用于EEG辅助增强基于单导联ECG睡眠分期的深度知识蒸馏框架 链接:https://arxiv.org/abs/2112.07252

作者:Vaibhav Joshi,Sricharan Vijayarangan,Preejith SP,Mohanasankar Sivaprakasam 机构:Indian Institute of Technology Madras (IIT-M); Department of Electrical Engineering, Indian Institute of Technology Madras 备注:Accepted for IEEE HI-POCT 2022 摘要:目前,自动睡眠分期研究借助脑电图(EEG)信号完成。最近,基于深度学习(DL)的方法在这一领域取得了重大进展,使自动睡眠分期的准确性接近人类水平。然而,基于EEG的睡眠分期需要繁琐且昂贵的临床设置。此外,安装需要专家参与,并给受试者带来额外不便,这使其不适合即时护理(point-of-care)场景。心电图(ECG)是EEG的一种无侵扰替代方案,更为适用,但毫不意外,其性能与基于EEG的睡眠分期相比仍有差距。自然地,将知识从EEG迁移到ECG,最终提升模型在ECG输入上的性能,将会很有帮助。知识蒸馏(KD)是DL中的一个著名概念,旨在将知识从性能更好但可能更笨重的教师模型迁移到紧凑的学生模型。基于这一概念,我们提出了一种跨模态KD框架,借助在EEG上训练的模型所学到的特征来提升基于ECG的睡眠分期性能。此外,我们还对所提模型的各个组成部分进行了多项实验,以更好地理解蒸馏方法。本研究使用了蒙特利尔睡眠研究档案(MASS)中200名受试者的数据。所提模型在4类和3类睡眠分期中分别使加权F1分数提高了14.3%和13.4%。这证明了KD在4类(W-L-D-R)和3类(W-N-R)分类中提升基于单通道ECG睡眠分期性能的可行性。 摘要:Automatic Sleep Staging study is presently done with the help of Electroencephalogram (EEG) signals. Recently, Deep Learning (DL) based approaches have enabled significant progress in this area, allowing for near-human accuracy in automated sleep staging. However, EEG based sleep staging requires an extensive as well as an expensive clinical setup. Moreover, the requirement of an expert for setup and the added inconvenience to the subject under study renders it unfavourable in a point of care context. Electrocardiogram (ECG), an unobtrusive alternative to EEG, is more suitable, but its performance, unsurprisingly, remains sub-par compared to EEG-based sleep staging. Naturally, it would be helpful to transfer knowledge from EEG to ECG, ultimately enhancing the model's performance on ECG based inputs. Knowledge Distillation (KD) is a renowned concept in DL that looks to transfer knowledge from a better but potentially more cumbersome teacher model to a compact student model. Building on this concept, we propose a cross-modal KD framework to improve ECG-based sleep staging performance with assistance from features learned through models trained on EEG. Additionally, we also conducted multiple experiments on the individual components of the proposed model to get better insight into the distillation approach. Data of 200 subjects from the Montreal Archive of Sleep Studies (MASS) was utilized for our study. The proposed model showed a 14.3% and 13.4% increase in weighted-F1-score in 4-class and 3-class sleep staging, respectively. This demonstrates the viability of KD for performance improvement of single-channel ECG based sleep staging in 4-class (W-L-D-R) and 3-class (W-N-R) classification.
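
A cross-modal KD objective of this kind typically combines a temperature-softened teacher-student cross-entropy with the ordinary label loss. The sketch below follows the classic Hinton-style formulation with illustrative temperature and weighting; the paper's exact loss and architecture may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T - np.max(z / T, axis=-1, keepdims=True)  # stable softmax
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted KD objective: softened cross-entropy against the teacher
    (here, the EEG model) plus ordinary cross-entropy against the
    sleep-stage labels for the student (the ECG model). The T**2 factor
    keeps the soft-loss gradients on a comparable scale."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_ce = -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1))
    p_hard = softmax(student_logits)
    hard_ce = -np.mean(np.log(p_hard[np.arange(len(labels)), labels] + 1e-12))
    return alpha * T**2 * soft_ce + (1.0 - alpha) * hard_ce

# A student that matches the teacher incurs a lower loss than one that
# confidently disagrees with it (4 hypothetical stages: W-L-D-R).
labels = np.array([0])
teacher = np.array([[5.0, 0.0, 0.0, 0.0]])
loss_matched = distillation_loss(teacher, teacher, labels)
loss_off = distillation_loss(np.array([[0.0, 5.0, 0.0, 0.0]]), teacher, labels)
```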

【56】 Quantum Stream Learning 标题:量子流学习 链接:https://arxiv.org/abs/2112.06628

作者:Yongcheng Ding,Xi Chen,Rafael Magdalena-Benedicto,José D. Martín-Guerrero 机构:Department of Physical Chemistry, University of the Basque Country UPV/EHU 备注:7 pages, 3 figures, submitted to the special issue on stream learning, comments are welcomed 摘要:量子力学的奇异性质使得机器学习(ML)在量子领域的应用不同于经典应用。ML可用于知识发现,利用从量子系统中连续提取的信息完成广泛的任务。模型接收流式量子信息用于学习和决策,从而对量子系统产生即时反馈。作为一种流学习方法,我们提出在存在失谐、退相位和弛豫的情况下,对连续测量的量子比特产生的流数据进行深度强化学习。我们还研究了智能体如何通过迁移学习适应另一种量子噪声模式。流学习有助于更好地理解闭环量子控制,这可能为先进量子技术铺平道路。 摘要:The exotic nature of quantum mechanics makes machine learning (ML) be different in the quantum realm compared to classical applications. ML can be used for knowledge discovery using information continuously extracted from a quantum system in a broad range of tasks. The model receives streaming quantum information for learning and decision-making, resulting in instant feedback on the quantum system. As a stream learning approach, we present a deep reinforcement learning on streaming data from a continuously measured qubit at the presence of detuning, dephasing, and relaxation. We also investigate how the agent adapts to another quantum noise pattern by transfer learning. Stream learning provides a better understanding of closed-loop quantum control, which may pave the way for advanced quantum technologies.

【57】 Active Learning for the Optimal Design of Multinomial Classification in Physics 标题:面向物理学中多类别分类最优设计的主动学习 链接:https://arxiv.org/abs/2109.08612

作者:Yongcheng Ding,José D. Martín-Guerrero,Yujing Song,Rafael Magdalena-Benedito,Xi Chen 机构:Department of Physical Chemistry, University of the Basque Country UPV/EHU, Apartado, Bilbao, Spain, ProQuam Co., Ltd., Shanghai, China, IDAL, Electronic Engineering Department, ETSE-UV, University of Valencia 备注:13 pages and 11 figures 摘要:模型训练的最优设计是机器学习中的一个重要课题。主动学习旨在根据估计模型查询不确定性最大的样本交由人工标注,从而获得改进的模型;其额外优点是只需较少的标注样本即可取得良好性能。我们分析了它作为实验设计助手的能力:以最小的保真度损失提取最多的学习信息,或降低实验室标注的总操作成本。我们给出了两个典型应用:qutrit中的量子信息检索和多体物理中的相边界预测。对于一个等价的多类别分类问题,我们在标注样本不足2%的情况下实现了99%的正确率。我们认为,受主动学习启发的物理实验将在不损失准确性的前提下显著节省预算。 摘要:Optimal design for model training is a critical topic in machine learning. Active Learning aims at obtaining improved models by querying samples with maximum uncertainty according to the estimation model for artificially labeling; this has the additional advantage of achieving successful performances with a reduced number of labeled samples. We analyze its capability as an assistant for the design of experiments, extracting maximum information for learning with the minimal cost in fidelity loss, or reducing total operation costs of labeling in the laboratory. We present two typical applications as quantum information retrieval in qutrits and phase boundary prediction in many-body physics. For an equivalent multinomial classification problem, we achieve the correct rate of 99% with less than 2% samples labeled. We reckon that active-learning-inspired physics experiments will remarkably save budget without loss of accuracy.
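
The query rule at the heart of this approach, picking the samples the current model is least certain about, can be sketched as entropy-based uncertainty sampling. This is a generic illustration of the idea, not the authors' exact criterion or experimental pipeline.

```python
import numpy as np

def query_most_uncertain(probs: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the k unlabeled samples whose predicted class
    distribution has the highest entropy, i.e. those the current
    estimation model is least sure about; these are the samples sent
    for manual (or laboratory) labeling."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=-1)
    return np.argsort(entropy)[::-1][:k]

# Three unlabeled samples, three classes (e.g. qutrit basis states):
probs = np.array([[0.98, 0.01, 0.01],   # confident prediction
                  [0.34, 0.33, 0.33],   # near-uniform: most uncertain
                  [0.70, 0.20, 0.10]])
query = query_most_uncertain(probs, k=1)   # -> index 1
```

Iterating this rule, labeling the queried sample, retraining, and querying again, is what lets the model reach high accuracy with only a small fraction of samples labeled.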
