cs.AI Artificial Intelligence, 82 papers in total
【1】 Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning Link: https://arxiv.org/abs/2111.04714
Authors: Kajetan Schweighofer, Markus Hofmarcher, Marius-Constantin Dinu, Philipp Renz, Angela Bitto-Nemling, Vihang Patil, Sepp Hochreiter Affiliations: ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; Dynatrace Research, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI) Comments: Code: this https URL Abstract: In the real world, affecting the environment with a weak policy can be expensive or very risky, which hampers real-world applications of reinforcement learning. Offline Reinforcement Learning (RL) can learn policies from a given dataset without interacting with the environment. However, the dataset is the only source of information for an Offline RL algorithm and determines the performance of the learned policy. We still lack studies on how dataset characteristics influence different Offline RL algorithms. Therefore, we conducted a comprehensive empirical analysis of how dataset characteristics affect the performance of Offline RL algorithms for discrete action environments. A dataset is characterized by two metrics: (1) the average dataset return measured by the Trajectory Quality (TQ) and (2) the coverage measured by the State-Action Coverage (SACo). We found that variants of the off-policy Deep Q-Network family require datasets with high SACo to perform well. Algorithms that constrain the learned policy towards the given dataset perform well for datasets with high TQ or SACo. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
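As a rough illustration of the two dataset metrics named in the abstract above, the sketch below computes an average return and a count-based state-action coverage for a toy dataset. The abstract does not give exact definitions, so the normalisation by a reference count, the function names, and the toy data are all assumptions made for illustration only.

```python
from typing import Hashable, List, Tuple

Transition = Tuple[Hashable, int, float]  # (state, action, reward)
Trajectory = List[Transition]


def average_return(dataset: List[Trajectory]) -> float:
    """Mean undiscounted return over all trajectories (a stand-in for TQ)."""
    returns = [sum(reward for _, _, reward in traj) for traj in dataset]
    return sum(returns) / len(returns)


def state_action_coverage(dataset: List[Trajectory], reference_count: int) -> float:
    """Unique (state, action) pairs divided by an assumed reference count
    (a stand-in for SACo; the reference count is a hypothetical parameter)."""
    unique_pairs = {(s, a) for traj in dataset for s, a, _ in traj}
    return len(unique_pairs) / reference_count


# Toy dataset: two short trajectories in a discrete-action environment.
toy = [
    [("s0", 0, 1.0), ("s1", 1, 0.0)],
    [("s0", 0, 1.0), ("s2", 0, 2.0)],
]
tq_proxy = average_return(toy)
saco_proxy = state_action_coverage(toy, reference_count=6)
print(tq_proxy, saco_proxy)
```

A dataset with a high `tq_proxy` but a low `saco_proxy` would correspond to the regime in which the abstract reports Behavior Cloning doing well.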
【2】 A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters Link: https://arxiv.org/abs/2111.04689
Authors: David J. Biagioni, Xiangyu Zhang, Peter Graf, Devon Sigler, Wesley Jones Abstract: We present a careful comparison of two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding horizon model predictive control (MPC) for operating simulated, price responsive water heaters. Four MPC variants are considered: a one-shot controller with perfect forecasting yielding optimal control; a limited-horizon controller with perfect forecasting; a mean forecasting-based controller; and a two-stage stochastic programming controller using historical scenarios. In all cases, the MPC models for water temperature and electricity price are exact; only water demand is uncertain. For comparison, both ES and PPO learn neural network-based policies by directly interacting with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging, requiring more than 8-hour lookahead for MPC with perfect forecasting to attain the minimum cost. Despite this challenge, both ES and PPO learn good general purpose policies that outperform mean forecast and two-stage stochastic MPC controllers in terms of average cost and are more than two orders of magnitude faster at computing actions. We show that ES in particular can leverage parallelism to learn a policy in under 90 seconds using 1150 CPU cores.
【3】 Reinforcement Learning for Mixed Autonomy Intersections Link: https://arxiv.org/abs/2111.04686
Authors: Zhongxia Yan, Cathy Wu Affiliations: Massachusetts Institute of Technology Abstract: We propose a model-free reinforcement learning method for controlling mixed autonomy traffic in simulated traffic networks with through-traffic-only two-way and four-way intersections. Our method utilizes multi-agent policy decomposition which allows decentralized control based on local observations for an arbitrary number of controlled vehicles. We demonstrate that, even without reward shaping, reinforcement learning learns to coordinate the vehicles to exhibit traffic signal-like behaviors, achieving near-optimal throughput with 33-50% controlled vehicles. With the help of multi-task learning and transfer learning, we show that this behavior generalizes across inflow rates and size of the traffic network. Our code, models, and videos of results are available at https://github.com/ZhongxiaYan/mixed_autonomy_intersections.
【4】 Revisiting Methods for Finding Influential Examples Link: https://arxiv.org/abs/2111.04683
Authors: Karthikeyan K, Anders Søgaard Affiliations: Duke University; University of Copenhagen Abstract: Several instance-based explainability methods for finding influential training examples for test-time decisions have been proposed recently, including Influence Functions, TraceIn, Representer Point Selection, Grad-Dot, and Grad-Cos. Typically these methods are evaluated using LOO influence (Cook's distance) as a gold standard, or using various heuristics. In this paper, we show that all of the above methods are unstable, i.e., extremely sensitive to initialization, ordering of the training data, and batch size. We suggest that this is a natural consequence of how, in the literature, the influence of examples is assumed to be independent of model state and other examples -- and argue it is not. We show that LOO influence and heuristics are, as a result, poor metrics to measure the quality of instance-based explanations, and instead propose to evaluate such explanations by their ability to detect poisoning attacks. Further, we provide a simple, yet effective baseline to improve all of the above methods and show how it leads to very significant improvements on downstream tasks.
【5】 SMU: smooth activation function for deep networks using smoothing maximum technique Link: https://arxiv.org/abs/2111.04682
Authors: Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey Comments: 7 pages Abstract: Deep learning researchers have a keen interest in proposing two novel activation functions which can boost network performance. A good choice of activation function can have significant consequences in improving network performance. A handcrafted activation is the most common choice in neural network models. ReLU is the most common choice in the deep learning community due to its simplicity, though ReLU has some serious drawbacks. In this paper, we propose a novel activation function based on approximation of known activation functions like Leaky ReLU, and we call this function Smooth Maximum Unit (SMU). Replacing ReLU with SMU, we obtain a 6.22% improvement on the CIFAR100 dataset with the ShuffleNet V2 model.
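To make the "smoothing maximum" idea concrete, here is a minimal sketch of one way to smooth Leaky ReLU. It starts from the identity max(a, b) = ((a + b) + |a - b|) / 2 and replaces |z| with the smooth surrogate z * erf(mu * z). The erf-based form, the function names, and the default values of `alpha` and `mu` are assumptions of this sketch; the abstract itself does not state the formula.

```python
import math


def leaky_relu(x: float, alpha: float = 0.25) -> float:
    """Reference non-smooth function: max(x, alpha * x) for 0 <= alpha < 1."""
    return max(x, alpha * x)


def smu(x: float, alpha: float = 0.25, mu: float = 1.0) -> float:
    """Smooth approximation of max(x, alpha * x).

    Uses max(a, b) = ((a + b) + |a - b|) / 2 with a = x, b = alpha * x,
    and approximates |z| by z * erf(mu * z). Larger mu gives a sharper fit.
    """
    z = (1.0 - alpha) * x
    return ((1.0 + alpha) * x + z * math.erf(mu * z)) / 2.0


# As mu grows, the smooth function approaches Leaky ReLU.
for value in (-2.0, 0.0, 2.0):
    print(value, leaky_relu(value), smu(value, mu=1.0), smu(value, mu=1e6))
```

With a very large `mu` the two functions agree to floating-point precision; with a small `mu` the kink at zero is rounded off, which is the property a smooth activation is meant to provide.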
【6】 Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data Link: https://arxiv.org/abs/2111.04665
Authors: Kumud Lakara, Akshat Bhandari, Pratinav Seth, Ujjwal Verma Comments: 6 pages, 3 figures, 4 tables Abstract: Most machine learning models operate under the assumption that the training, testing and deployment data are independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting. Usually, the deployment data is subject to various types of distributional shifts. The magnitude of a model's performance is proportional to this shift in the distribution of the dataset. Thus it becomes necessary to evaluate a model's uncertainty and robustness to distributional shifts to get a realistic estimate of its expected performance on real-world data. Present methods to evaluate uncertainty and a model's robustness are lacking and often fail to paint the full picture. Moreover, most analysis so far has primarily focused on classification tasks. In this paper, we propose more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset. We also present an evaluation of the baseline methods using these metrics.
【7】 Intelligent Reflecting Surfaces for Enhanced NOMA-based Visible Light Communications Link: https://arxiv.org/abs/2111.04646
Authors: Hanaa Abumarshoud, Bassant Selim, Mallik Tatipamula, Harald Haas Abstract: The emerging intelligent reflecting surface (IRS) technology introduces the potential of controlled light propagation in visible light communication (VLC) systems. This concept opens the door for new applications in which the channel itself can be altered to achieve specific key performance indicators. In this paper, for the first time in the open literature, we investigate the role that IRSs can play in enhancing the link reliability in VLC systems employing non-orthogonal multiple access (NOMA). We propose a framework for the joint optimisation of the NOMA and IRS parameters and show that it provides significant enhancements in link reliability. The enhancement is even more pronounced when the VLC channel is subject to blockage and random device orientation.
【8】 DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories Link: https://arxiv.org/abs/2111.04625
Authors: Adnan Siraj Rakin, Md Hafizul Islam Chowdhuryy, Fan Yao, Deliang Fan Affiliations: Department of Electrical, Computer and Energy Engineering, Arizona State University; Department of Electrical and Computer Engineering, University of Central Florida (co-first authors with equal contributions) Abstract: Recent advancements of Deep Neural Networks (DNNs) have seen widespread deployment in multiple security-sensitive domains. The need for resource-intensive training and the use of valuable domain-specific training data have made these models a top intellectual property (IP) for model owners. One of the major threats to DNN privacy is model extraction attacks, where adversaries attempt to steal sensitive information in DNN models. Recent studies show hardware-based side channel attacks can reveal internal knowledge about DNN models (e.g., model architectures). However, to date, existing attacks cannot extract detailed model parameters (e.g., weights/biases). In this work, for the first time, we propose an advanced model extraction attack framework, DeepSteal, that effectively steals DNN weights with the aid of a memory side-channel attack. Our proposed DeepSteal comprises two key stages. Firstly, we develop a new weight bit information extraction method, called HammerLeak, by adopting the rowhammer-based hardware fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored for DNN applications to enable fast and efficient weight stealing. Secondly, we propose a novel substitute model training algorithm with Mean Clustering weight penalty, which leverages the partial leaked bit information effectively and generates a substitute prototype of the target victim model. We evaluate this substitute model extraction method on three popular image datasets (e.g., CIFAR-10/100/GTSRB) and four DNN architectures (e.g., ResNet-18/34/Wide-ResNet/VGG-11). The extracted substitute model has successfully achieved more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, our extracted substitute model could also generate effective adversarial input samples to fool the victim model.
【9】 CoCo Games: Graphical Game-Theoretic Swarm Control for Communication-Aware Coverage Link: https://arxiv.org/abs/2111.04576
Authors: Malintha Fernando, Ransalu Senanayake, Martin Swany Affiliations: Luddy School of Informatics, Computing, and Engineering, Indiana University (Malintha Fernando, Martin Swany); Stanford University (Ransalu Senanayake) Comments: 8 pages, 7 figures Abstract: We present a novel approach to maximize the communication-aware coverage for robots operating over large-scale geographical regions of interest (ROIs). Our approach complements the underlying network topology in neighborhood selection and control, rendering it highly robust in dynamic environments. We formulate the coverage as a multi-stage, cooperative graphical game and employ Variational Inference (VI) to reach the equilibrium. We experimentally validate our approach in a mobile ad-hoc wireless network scenario using Unmanned Aerial Vehicles (UAV) and User Equipment (UE) robots. We show that it can cater to ROIs defined by stationary and moving User Equipment (UE) robots under realistic network conditions.
【10】 Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models Link: https://arxiv.org/abs/2111.04551
Authors: Angel Felipe Magnossão de Paula, Roberto Fray da Silva, Ipek Baris Schlicht Affiliations: Universitat Politècnica de València, Spain; Escola Politécnica da Universidade de São Paulo Comments: 18 pages, presented at IberLEF: this http URL, the best scoring system at EXIST Abstract: The popularity of social media has created problems such as hate speech and sexism. The identification and classification of sexism in social media are very relevant tasks, as they would allow building a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system to use multilingual and monolingual BERT and data points translation and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks shared 2021 (EXIST 2021) task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameters analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) an ensemble model considering all individual models and the best standardized values obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
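The abstract above reports results for ensemble models scored with binary F1. As a minimal sketch of what such a pipeline can look like, the code below averages positive-class probabilities from several classifiers (soft voting) and scores the thresholded predictions. Soft voting, the 0.5 threshold, and the toy probabilities are assumptions for illustration; the abstract does not say which ensemble strategy the authors used.

```python
from typing import List


def soft_vote(prob_lists: List[List[float]]) -> List[float]:
    """Average per-example positive-class probabilities across models."""
    n_models = len(prob_lists)
    return [sum(example_probs) / n_models for example_probs in zip(*prob_lists)]


def f1_binary(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three classifiers on four tweets.
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.4, 0.3, 0.7],
    [0.7, 0.1, 0.5, 0.6],
]
labels = [1, 0, 1, 1]

averaged = soft_vote(model_probs)
preds = [1 if p >= 0.5 else 0 for p in averaged]
score = f1_binary(labels, preds)
print(preds, score)
```

In this toy case the ensemble recovers the fourth tweet that the first model alone would have missed, which is the kind of gain an ensemble is meant to deliver.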
【11】 Improving RNA Secondary Structure Design using Deep Reinforcement Learning Link: https://arxiv.org/abs/2111.04504
Authors: Alexander Whatley, Zhekun Luo, Xiangru Tang Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley Abstract: Rising costs in recent years of developing new drugs and treatments have led to extensive research in optimization techniques in biomolecular design. Currently, the most widely used approach in biomolecular design is directed evolution, which is a greedy hill-climbing algorithm that simulates biological evolution. In this paper, we propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure. In addition to experimenting with the vanilla implementations of each reinforcement learning algorithm from standard libraries, we analyze variants of each algorithm in which we modify the algorithm's reward function and tune the model's hyperparameters. We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches and its ability to search the possible space of RNA sequences. We find that our DQN algorithm performs by far the best in this setting, contrasting with, in which PPO performs the best among all tested algorithms. Our results should be of interest to those in the biomolecular design community and should serve as a baseline for future experiments involving machine learning in molecule design.
【12】 Multi-Airport Delay Prediction with Transformers Link: https://arxiv.org/abs/2111.04494
Authors: Liya Wang, Alex Tien, Jason Chou Affiliations: The MITRE Corporation, McLean, VA, United States Abstract: Airport performance prediction with a reasonable look-ahead time is a challenging task and has been attempted by various prior research. Traffic, demand, weather, and traffic management actions are all critical inputs to any prediction model. In this paper, a novel approach based on Temporal Fusion Transformer (TFT) was proposed to predict departure and arrival delays simultaneously for multiple airports. This approach can capture complex temporal dynamics of the inputs known at the time of prediction and then forecast selected delay metrics up to four hours into the future. When dealing with weather inputs, a self-supervised learning (SSL) model was developed to encode high-dimensional weather data into a much lower-dimensional representation to make the training of TFT more efficient and effective. The initial results show that the TFT-based delay prediction model achieves satisfactory performance measured by smaller prediction errors on a testing dataset. In addition, the interpretability analysis of the model outputs identifies the important input factors for delay prediction. The proposed approach is expected to help air traffic managers or decision makers gain insights about traffic management actions on delay mitigation and, once operationalized, provide enough lead time to plan for predicted performance degradation.
【13】 Identifying the Leading Factors of Significant Weight Gains Using a New Rule Discovery Method Link: https://arxiv.org/abs/2111.04475
Authors: Mina Samizadeh, Jessica C Jones-Smith, Bethany Sheridan, Rahmatollah Beheshti Affiliations: University of Delaware, Delaware, USA; University of Washington, Washington, USA; athenahealth, Inc., Massachusetts, USA Comments: The code for this project is available on: this https URL Abstract: Overweight and obesity remain a major global public health concern and identifying the individualized patterns that increase the risk of future weight gains has a crucial role in preventing obesity and numerous subsequent diseases associated with obesity. In this work, we use a rule discovery method to study this problem, by presenting an approach that offers genuine interpretability and concurrently optimizes the accuracy (being correct often) and support (applying to many samples) of the identified patterns. Specifically, we extend an established subgroup-discovery method to generate the desired rules of type X -> Y and show how top features can be extracted from the X side, functioning as the best predictors of Y. In our obesity problem, X refers to the extracted features from very large and multi-site EHR data, and Y indicates significant weight gains. Using our method, we also extensively compare the differences and inequities in patterns across 22 strata determined by the individual's gender, age, race, insurance type, neighborhood type, and income level. Through an extensive series of experiments, we show new and complementary findings regarding the predictors of future dangerous weight gains.
【14】 Weapon Engagement Zone Maximum Launch Range Estimation Using a Deep Neural Network Link: https://arxiv.org/abs/2111.04474
Authors: Joao P. A. Dantas, Andre N. Costa, Diego Geraldo, Marcos R. O. A. Maximo, Takashi Yoneyama Abstract: This work investigates the use of a Deep Neural Network (DNN) to estimate the Weapon Engagement Zone (WEZ) maximum launch range. The WEZ allows the pilot to identify an airspace in which the available missile has a more significant probability of successfully engaging a particular target, i.e., a hypothetical area surrounding an aircraft in which an adversary is vulnerable to a shot. We propose an approach to determine the WEZ of a given missile using 50,000 simulated launches under varying conditions. These simulations are used to train a DNN that can predict the WEZ when the aircraft finds itself in different firing conditions, with a coefficient of determination of 0.99. It offers an alternative to the procedures of preceding research since it employs a non-discretized model, i.e., it considers all directions of the WEZ at once, which has not been done previously. Additionally, the proposed method uses an experimental design that allows for fewer simulation runs, providing faster model training.
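The abstract above reports model quality as a coefficient of determination (R²). For readers who want the arithmetic, the sketch below computes R² as one minus the ratio of residual to total sum of squares. The launch-range values are made up for illustration and are not taken from the paper.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical simulated maximum launch ranges and model predictions.
simulated = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]
r2 = r_squared(simulated, predicted)
print(r2)
```

Here the residual sum of squares is 4 and the total sum of squares is 500, so R² is 0.992; a value of 0.99 therefore means the model leaves about 1% of the variance in the simulated ranges unexplained.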
【15】 DeSkew-LSH based Code-to-Code Recommendation Engine Link: https://arxiv.org/abs/2111.04473
Authors: Fran Silavong, Sean Moran, Antonios Georgiadis, Rohan Saphal, Robert Otter Abstract: Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code snippets that usefully extend the code currently being written by a developer in their development environment (IDE). Code-to-code recommendation engines hold the promise of increasing developer productivity by reducing context switching from the IDE and increasing code reuse. Existing code-to-code recommendation engines do not scale gracefully to large codebases, exhibiting a linear growth in query time as the code repository increases in size. In addition, existing code-to-code recommendation engines fail to account for the global statistics of code repositories in the ranking function, such as the distribution of code snippet lengths, leading to sub-optimal retrieval results. We address both of these weaknesses with Senatus, a new code-to-code recommendation engine. At the core of Senatus is De-Skew LSH, a new locality sensitive hashing (LSH) algorithm that indexes the data for fast (sub-linear time) retrieval while also counteracting the skewness in the snippet length distribution using novel abstract syntax tree-based feature scoring and selection algorithms. We evaluate Senatus via automatic evaluation and with an expert developer user study and find the recommendations to be of higher quality than competing baselines, while achieving faster search. For example, on the CodeSearchNet dataset we show that Senatus improves performance by 6.7% F1 and is 16x faster in query time compared to Facebook Aroma on the task of code-to-code recommendation.
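To give a feel for how locality sensitive hashing avoids a linear scan, the sketch below implements a generic MinHash index with banding over sets of code features. This is plain textbook MinHash LSH, not the De-Skew LSH algorithm from the paper; the feature strings, the band and row counts, and all function names are hypothetical choices made for this illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def minhash_signature(features: Iterable[str], num_hashes: int) -> Tuple[int, ...]:
    """One minimum hash value per seeded hash function (features must be non-empty)."""
    feats = list(features)
    return tuple(
        min(int(hashlib.md5(f"{seed}:{f}".encode()).hexdigest(), 16) for f in feats)
        for seed in range(num_hashes)
    )


def band_keys(signature: Tuple[int, ...], bands: int, rows: int):
    """Split a signature into `bands` chunks of `rows` values each."""
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]


def build_index(snippets: Dict[str, Set[str]], bands: int = 4, rows: int = 2):
    """Map each band key to the names of snippets that hash into it."""
    index = defaultdict(set)
    for name, feats in snippets.items():
        for key in band_keys(minhash_signature(feats, bands * rows), bands, rows):
            index[key].add(name)
    return index


def query(index, feats: Set[str], bands: int = 4, rows: int = 2) -> Set[str]:
    """Return candidate snippets sharing at least one band with the query."""
    candidates: Set[str] = set()
    for key in band_keys(minhash_signature(feats, bands * rows), bands, rows):
        candidates |= index.get(key, set())
    return candidates


# Hypothetical snippets represented as sets of syntax-derived feature strings.
snippets = {
    "read_file": {"open", "with", "read", "return"},
    "write_file": {"open", "with", "write"},
    "sum_list": {"for", "in", "add", "return"},
}
index = build_index(snippets)
candidates = query(index, {"open", "with", "read", "return"})
print(candidates)
```

Only the buckets matching the query's band keys are inspected, so lookup cost depends on bucket sizes and not on the total number of indexed snippets. A query identical to an indexed snippet is always returned as a candidate; similar snippets are returned with a probability that grows with their Jaccard similarity.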
【16】 Ten Conceptual Dimensions of Context Link: https://arxiv.org/abs/2111.04472
Authors: Hashai Papneja Abstract: This paper attempts to synthesize various conceptualizations of the term "context" as found in computing literature. Ten conceptual dimensions of context thus emerge -- location; user, task, and system characteristics; physical, social, organizational, and cultural environments; time-related aspects, and historical information. Together, the ten dimensions of context provide a comprehensive view of the notion of context, and allow for a more systematic examination of the influence of context and contextual information on human-system or human-AI interactions.
【17】 Flight Demand Forecasting with Transformers Link: https://arxiv.org/abs/2111.04471
Authors: Liya Wang, Amy Mykityshyn, Craig Johnson, Jillian Cheng Affiliations: The MITRE Corporation, McLean, VA, United States; Federal Aviation Administration Comments: arXiv admin note: substantial text overlap with arXiv:2011.04476 Abstract: Transformers have become the de facto standard in the natural language processing (NLP) field. They have also gained momentum in computer vision and other domains. Transformers can enable artificial intelligence (AI) models to dynamically focus on certain parts of their input and thus reason more effectively. Inspired by the success of transformers, we adopted this technique to predict strategic flight departure demand in multiple horizons. This work was conducted in support of a MITRE-developed mobile application, Pacer, which displays predicted departure demand to general aviation (GA) flight operators so they can have better situation awareness of the potential for departure delays during busy periods. Field demonstrations involving Pacer's previously designed rule-based prediction method showed that the prediction accuracy of departure demand still has room for improvement. This research strives to improve prediction accuracy from two key aspects: better data sources and robust forecasting algorithms. We leveraged two data sources, Aviation System Performance Metrics (ASPM) and System Wide Information Management (SWIM), as our input. We then trained forecasting models with temporal fusion transformer (TFT) for five different airports. Case studies show that TFTs can perform better than traditional forecasting methods by large margins, and they can result in better prediction across diverse airports and with better interpretability.
【18】 Improving Peer Assessment with Graph Convolutional Networks Link: https://arxiv.org/abs/2111.04466
Authors: Alireza A. Namanloo, Julie Thorpe, Amirali Salehi-Abari Affiliations: Ontario Tech University, Oshawa, Ontario, Canada Abstract: Peer assessment systems are emerging in many social and multi-agent settings, such as peer grading in large (online) classes, peer review in conferences, peer art evaluation, etc. However, peer assessments might not be as accurate as expert evaluations, thus rendering these systems unreliable. The reliability of peer assessment systems is influenced by various factors such as assessment ability of peers, their strategic assessment behaviors, and the peer assessment setup (e.g., peer evaluating group work or individual work of others). In this work, we first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups, plus capture conflicts of interest and strategic behaviors. Leveraging our peer assessment network model, we introduce a graph convolutional network which can learn assessment patterns and user behaviors to more accurately predict expert evaluations. Our extensive experiments on real and synthetic datasets demonstrate the efficacy of our proposed approach, which outperforms existing peer assessment methods.
【19】 IoT to monitor people flow in areas of public interest 标题:物联网监控公共利益领域的人员流动 链接:https://arxiv.org/abs/2111.04465
作者:Damiano Perri,Marco Simonetti,Alex Bordini,Simone Cimarelli,Osvaldo Gervasi 机构:University of Florence, Dept. of Mathematics and Computer Science, Florence, University of Perugia, Dept. of Mathematics and Computer Science, Perugia, Italy 摘要:我们生活在一个意想不到的历史时期,这一时期突然迫使我们放松了个人之间的任何形式的互动,逐渐迫使我们采取新的方式来遵守安全距离;事实上,目前的情况比以往任何时候都更加表明,能够正确地组织我们的旅行计划,使人们处于安全的条件下,并避免有害的环境是多么重要。这项研究的目的是建立一个系统,在不收集个人或敏感数据的情况下,监测感兴趣的公共场所和设施(博物馆、剧院、电影院等)内的人流。通过物联网工具对人流的微弱监控(即,在没有对被监控对象进行个人识别的情况下进行监控)可能是一个可行的解决方案,可以最大限度地减少排队和过度拥挤。我们的研究始于意大利翁布里亚地区的一项实验,目的是成为自动规划人流的几个答案之一,以使我们的土地更宜居。我们打算展示物联网提供了几乎无限的工具和可能性,从开发基本信息流程到实现真正的门户,使商务人员能够与感兴趣的消费者联系。 摘要:The unexpected historical period we are living through has abruptly pushed us to loosen any sort of interaction between individuals, gradually forcing us to find new ways to comply with safety distances; indeed, the present situation has demonstrated more than ever how critical it is to be able to properly organize our travel plans, put people in safe conditions, and avoid harmful circumstances. The aim of this research is to set up a system to monitor the flow of people inside public places and facilities of interest (museums, theatres, cinemas, etc.) without collecting personal or sensitive data. Weak monitoring of people flows (i.e., monitoring without personal identification of the monitored subjects) through Internet of Things tools might be a viable solution to minimize lineups and overcrowding. Our study, which began as an experiment in the Umbria region of Italy, aims to be one of several answers to automated planning of people's flows in order to make our land more liveable. We intend to show that the Internet of Things gives almost unlimited tools and possibilities, from developing a basic information process to implementing a true portal which enables business people to connect with interested consumers.
【20】 Creating A Coefficient of Change in the Built Environment After a Natural Disaster 标题:创造自然灾害后建筑环境的变化系数 链接:https://arxiv.org/abs/2111.04462
作者:Karla Saldana Ochoa 机构:School of Architecture, SHARE Lab, University of Florida, Inner Rd, Gainesville, FL 摘要:本研究提出了一种新的方法来评估建筑环境中的损伤,使用深度学习工作流对其进行量化。借助自动爬虫,从Google Earth获得了全球50个震中自然灾害前后的航空图像,生成了10000个航空图像数据库,空间分辨率为每像素2米。本研究利用Seg网络算法从两种情况(自然灾害前后)的卫星图像中对建筑环境进行语义分割。对于图像分割,Seg网络是最流行和通用的CNN体系结构之一。使用Seg网络算法进行分割,准确率达到92%。在分割之后,我们比较了两种情况之间的差异(以变化百分比表示)。这种变化系数在数值上代表了城市环境的破坏,以量化建筑环境中的整体破坏。这样一个指数可以让政府估计受影响家庭的数量,或许还可以估计房屋受损的程度。 摘要:This study proposes a novel method to assess damage in the built environment, using a deep learning workflow to quantify it. Thanks to an automated crawler, aerial images of 50 epicenters worldwide, taken before and after a natural disaster, were obtained from Google Earth, generating a database of 10,000 aerial images with a spatial resolution of 2 m per pixel. The study utilizes the Seg-Net algorithm to perform semantic segmentation of the built environment from the satellite images in both instances (before and after the natural disaster). For image segmentation, Seg-Net is one of the most popular and general CNN architectures. The Seg-Net algorithm used reached an accuracy of 92% in the segmentation. After the segmentation, we compared the disparity between both cases, represented as a percentage of change. This coefficient of change numerically represents the damage an urban environment sustained, quantifying the overall damage in the built environment. Such an index can give the government an estimate of the number of affected households and perhaps the extent of housing damage.
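The comparison step in entry 【20】 (a percentage of change between the segmented "before" and "after" images) can be illustrated with a small sketch. The abstract does not give the exact formula, so the definition below (relative loss of built-up pixels, in percent) is an assumption, and the function names and toy masks are hypothetical.

```python
def built_fraction(mask):
    """Fraction of pixels labelled as built environment (1) in a binary mask."""
    total = sum(len(row) for row in mask)
    built = sum(sum(row) for row in mask)
    return built / total


def coefficient_of_change(before, after):
    """Percentage change in built-up area between two segmentation masks.

    Assumed definition (not taken from the paper):
    100 * (before - after) / before, so a positive value means
    built-up area was lost.
    """
    b = built_fraction(before)
    a = built_fraction(after)
    if b == 0:
        raise ValueError("no built-up pixels in the 'before' mask")
    return 100.0 * (b - a) / b


# Toy 4x4 masks standing in for segmentation output (1 = built, 0 = other).
before = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
after = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

print(coefficient_of_change(before, after))  # 25.0
```

Here 8 of 16 pixels are built before and 6 of 16 after, so a quarter of the built-up area is lost.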
【21】 Systematic Review for AI-based Language Learning Tools 标题:基于人工智能的语言学习工具的系统评价 链接:https://arxiv.org/abs/2111.04455
作者:Jin Ha Woo,Heeyoul Choi 机构:Arizona State University, Tempe, AZ, Handong Global University, Pohang, South Korea 备注:10 pages, 6 figures 摘要:个性化学习的日益重视和人工智能(AI)的迅速发展对第二语言习得领域产生了重大影响。尽管随着人工智能在计算机辅助语言学习领域的应用,人们正在开发越来越多的适应性语言学习工具,但人们一直担心信息不足和教师准备不足。为了有效地利用这些工具,教师需要对最近开发的基于人工智能的语言学习工具进行深入的概述。因此,本综述综合了2017年至2020年间开发的人工智能工具的信息。这些工具大多利用机器学习和自然语言处理,用于识别错误、提供反馈和评估语言能力。使用这些工具后,学习者的语言能力和知识有所提高。本综述最后介绍了基于人工智能的语言学习工具未来研究的教学意义和新兴主题。 摘要:The Second Language Acquisition field has been significantly impacted by a greater emphasis on individualized learning and rapid developments in artificial intelligence (AI). Although increasingly adaptive language learning tools are being developed with the application of AI to the Computer Assisted Language Learning field, there have been concerns regarding insufficient information and teacher preparation. To effectively utilize these tools, teachers need an in-depth overview on recently developed AI-based language learning tools. Therefore, this review synthesized information on AI tools that were developed between 2017 and 2020. A majority of these tools utilized machine learning and natural language processing, and were used to identify errors, provide feedback, and assess language abilities. After using these tools, learners demonstrated gains in their language abilities and knowledge. This review concludes by presenting pedagogical implications and emerging themes in the future research of AI-based language learning tools.
【22】 The Problem of Zombie Datasets:A Framework For Deprecating Datasets 标题:僵尸数据集问题:一个弃用数据集的框架 链接:https://arxiv.org/abs/2111.04424
作者:Frances Corry,Hamsini Sridharan,Alexandra Sasha Luccioni,Mike Ananny,Jason Schultz,Kate Crawford 机构:University of Southern California, Mila, Université de Montréal, New York University, Microsoft Research 摘要:当机器学习数据集因法律、道德或技术原因被弃用,但仍被广泛使用时,会发生什么情况?在本文中,我们研究了几个著名的弃用或修订数据集的公众来世,包括ImageNet、8000万张微型图像、MS-Celeb-1M、Duke MTMC、Brainwash和HRT Transgender,以便为更加一致、道德和负责任的数据集弃用提供框架。在先前研究的基础上,我们发现,关于数据集弃用的信息缺乏一致性、透明度和集中来源,因此,这些数据集及其衍生产品继续被论文引用并在网上传播。这些永不消亡的数据集——我们称之为“僵尸数据集”——继续为生产级系统的设计提供信息,引发技术、法律和道德挑战;在这样做的过程中,他们有可能使导致他们所谓的退出的伤害永久化,包括对偏见、歧视和隐私的担忧。基于这一分析,我们提出了一个数据集弃用框架,其中包括风险考虑、影响缓解、申诉机制、时间线、弃用后协议和可由机器学习社区调整和实施的发布检查。在数据表和检查表的基础上,我们进一步提供了两个样本数据集弃用表,并提出了一个集中存储库,用于跟踪哪些数据集已弃用,并可纳入NeurIPS等场馆的发布协议。 摘要:What happens when a machine learning dataset is deprecated for legal, ethical, or technical reasons, but continues to be widely used? In this paper, we examine the public afterlives of several prominent deprecated or redacted datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender, in order to inform a framework for more consistent, ethical, and accountable dataset deprecation. Building on prior research, we find that there is a lack of consistency, transparency, and centralized sourcing of information on the deprecation of datasets, and as such, these datasets and their derivatives continue to be cited in papers and circulate online. These datasets that never die -- which we term "zombie datasets" -- continue to inform the design of production-level systems, causing technical, legal, and ethical challenges; in so doing, they risk perpetuating the harms that prompted their supposed withdrawal, including concerns around bias, discrimination, and privacy. Based on this analysis, we propose a Dataset Deprecation Framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks that can be adapted and implemented by the machine learning community. 
Drawing on work on datasheets and checklists, we further offer two sample dataset deprecation sheets and propose a centralized repository that tracks which datasets have been deprecated and could be incorporated into the publication protocols of venues like NeurIPS.
【23】 A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning 标题:基于物联网传感器算法的智能家居人类活动识别综述:分类、挑战和深度学习带来的机遇 链接:https://arxiv.org/abs/2111.04418
作者:Damien Bouchabou,Sao Mai Nguyen,Christophe Lohr,Benoit Leduc,Ioannis Kanellos 备注:Citation: Bouchabou, D.; Nguyen, S.; Lohr, C.; LeDuc, B.; Kanellos, I. A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning. Sensors. 摘要:物联网(IoT)技术的最新进展和传感器成本的降低鼓励了智能环境的发展,如智能家居。智能家居可以提供家居协助服务,以改善居民的生活质量、自主性和健康,特别是老年人和依赖者。要提供此类服务,智能家居必须能够了解住户的日常活动。识别智能家庭中人类活动的技术每天都在进步。但新的挑战每天都在出现。在本文中,我们介绍了通过环境传感器在智能家庭中识别人类活动的最新算法、工作、挑战和分类。此外,由于智能家庭中的活动识别是一个年轻的领域,我们提出了具体的问题,缺少和需要的贡献。同时也提出方向、研究机会和解决方案,以加速这一领域的进展。 摘要:Recent advances in Internet of Things (IoT) technologies and the reduction in the cost of sensors have encouraged the development of smart environments, such as smart homes. Smart homes can offer home assistance services to improve the quality of life, autonomy and health of their residents, especially for the elderly and dependent. To provide such services, a smart home must be able to understand the daily activities of its residents. Techniques for recognizing human activity in smart homes are advancing daily, but new challenges emerge just as quickly. In this paper, we present recent algorithms, works, challenges and a taxonomy of the field of human activity recognition in smart homes through ambient sensors. Moreover, since activity recognition in smart homes is a young field, we raise specific problems and identify missing and needed contributions. We also propose directions, research opportunities and solutions to accelerate advances in this field.
【24】 Get a Model! Model Hijacking Attack Against Machine Learning Models 标题:找个模特来!针对机器学习模型的模型劫持攻击 链接:https://arxiv.org/abs/2111.04394
作者:Ahmed Salem,Michael Backes,Yang Zhang 机构:CISPA Helmholtz Center for Information Security 备注:To Appear in NDSS 2022 摘要:机器学习(ML)已成为从自动驾驶到认证系统等各种关键应用的基石。然而,随着机器学习模型采用率的提高,出现了多种攻击。其中一类攻击是训练时攻击,对手在机器学习模型训练之前或期间执行攻击。在这项工作中,我们针对基于计算机视觉的机器学习模型提出了一种新的训练时间攻击,即模型劫持攻击。对手的目标是劫持目标模型,以便在模型所有者未注意到的情况下执行与其原始模型不同的任务。模型劫持可能会导致责任和安全风险,因为被劫持的模型所有者可能被诬陷为其模型提供非法或不道德的服务。模型劫持攻击的发起方式与现有数据中毒攻击相同。然而,模型劫持攻击的一个要求是隐蔽性,即用于劫持目标模型的数据样本应与模型的原始训练数据集相似。为此,我们提出了两种不同的劫持攻击模型,即变色龙和反向变色龙,基于一种新的编码器-解码器风格的ML模型,即伪装器。我们的评估表明,我们的两种模型劫持攻击都实现了较高的攻击成功率,模型效用的下降可以忽略不计。 摘要:Machine learning (ML) has established itself as a cornerstone for various critical applications ranging from autonomous driving to authentication systems. However, with this increasing adoption rate of machine learning models, multiple attacks have emerged. One class of such attacks is training time attack, whereby an adversary executes their attack before or during the machine learning model training. In this work, we propose a new training time attack against computer vision based machine learning models, namely model hijacking attack. The adversary aims to hijack a target model to execute a different task than its original one without the model owner noticing. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offering illegal or unethical services. Model hijacking attacks are launched in the same way as existing data poisoning attacks. However, one requirement of the model hijacking attack is to be stealthy, i.e., the data samples used to hijack the target model should look similar to the model's original training dataset. To this end, we propose two different model hijacking attacks, namely Chameleon and Adverse Chameleon, based on a novel encoder-decoder style ML model, namely the Camouflager. Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
【25】 Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation 标题:用于无监督医疗报告生成的知识图自动编码 链接:https://arxiv.org/abs/2111.04318
作者:Fenglin Liu,Chenyu You,Xian Wu,Shen Ge,Sheng Wang,Xu Sun 机构:MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University, School of ECE, Peking University, Paul G. Allen School of Computer Science and Engineering, University of Washington, Department of Electrical Engineering, Yale University 摘要:医学报告生成是指对给定的医学图像自动生成一个长而连贯的报告,它受到了越来越多的研究兴趣。现有的方法主要采用有监督的方式,并且严重依赖于耦合的图像-报告对。然而,在医学领域,建立一个大规模的图像报告配对数据集既耗时又昂贵。为了放松对成对数据的依赖,我们提出了一种无监督的模型知识图自动编码器(KGAE),它在训练中接受独立的图像和报告集。KGAE由预先构造的知识图、知识驱动编码器和知识驱动解码器组成。知识图作为一个共享的潜在空间,架起了视觉域和文本域之间的桥梁;知识驱动编码器将医学图像和报告投影到该潜在空间中的相应坐标,知识驱动解码器根据该空间中的坐标生成医学报告。由于知识驱动的编码器和解码器可以使用独立的图像和报告集进行训练,因此KGAE是无监督的。实验表明,无监督的KGAE在不使用任何图像报告训练对的情况下生成理想的医疗报告。此外,KGAE还可以在半监督和监督环境下工作,并在训练中接受成对的图像和报告。通过进一步微调图像报告对,KGAE在两个数据集上始终优于当前最先进的模型。 摘要:Medical report generation, which aims to automatically generate a long and coherent report of a given medical image, has been receiving growing research interests. Existing approaches mainly adopt a supervised manner and heavily rely on coupled image-report pairs. However, in the medical domain, building a large-scale image-report paired dataset is both time-consuming and expensive. To relax the dependency on paired data, we propose an unsupervised model Knowledge Graph Auto-Encoder (KGAE) which accepts independent sets of images and reports in training. KGAE consists of a pre-constructed knowledge graph, a knowledge-driven encoder and a knowledge-driven decoder. The knowledge graph works as the shared latent space to bridge the visual and textual domains; The knowledge-driven encoder projects medical images and reports to the corresponding coordinates in this latent space and the knowledge-driven decoder generates a medical report given a coordinate in this space. Since the knowledge-driven encoder and decoder can be trained with independent sets of images and reports, KGAE is unsupervised. 
The experiments show that the unsupervised KGAE generates desirable medical reports without using any image-report training pairs. Moreover, KGAE can also work in both semi-supervised and supervised settings, and accept paired images and reports in training. By further fine-tuning with image-report pairs, KGAE consistently outperforms the current state-of-the-art models on two datasets.
【26】 Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning 标题:图健壮性基准:对图机器学习的对抗性健壮性进行基准测试 链接:https://arxiv.org/abs/2111.04314
作者:Qinkai Zheng,Xu Zou,Yuxiao Dong,Yukuo Cen,Da Yin,Jiarong Xu,Yang Yang,Jie Tang 机构:† Department of Computer Science and Technology, Tsinghua University, ‡ Microsoft Research, Redmond ◦ Fudan University ⋄ Zhejiang University 备注:21 pages, 12 figures, NeurIPS 2021 Datasets and Benchmarks Track 摘要:对图的对抗性攻击对图机器学习(GML)模型的鲁棒性构成了重大威胁。自然,袭击者和捍卫者之间的军备竞赛不断升级。然而,在相同和现实的条件下,双方背后的战略往往没有得到公平的比较。为了弥补这一差距,我们提出了图形鲁棒性基准(GRB),旨在为GML模型的对抗性鲁棒性提供可扩展、统一、模块化和可复制的评估。GRB通过1)开发可扩展和多样化的数据集,2)模块化攻击和防御实施,以及3)在细化场景中统一评估协议,使攻击和防御过程标准化。通过利用GRB管道,最终用户可以专注于开发具有自动化数据处理和实验评估的稳健GML模型。为了支持图形对抗性学习的开放性和可复制性研究,GRB还主持了不同场景的公共排行榜。作为起点,我们对基准技术进行了广泛的实验。GRB是开源的,欢迎社区的贡献。数据集、代码、排行榜可在https://cogdl.ai/grb/home. 摘要:Adversarial attacks on graphs have posed a major threat to the robustness of graph machine learning (GML) models. Naturally, there is an ever-escalating arms race between attackers and defenders. However, the strategies behind both sides are often not fairly compared under the same and realistic conditions. To bridge this gap, we present the Graph Robustness Benchmark (GRB) with the goal of providing a scalable, unified, modular, and reproducible evaluation for the adversarial robustness of GML models. GRB standardizes the process of attacks and defenses by 1) developing scalable and diverse datasets, 2) modularizing the attack and defense implementations, and 3) unifying the evaluation protocol in refined scenarios. By leveraging the GRB pipeline, the end-users can focus on the development of robust GML models with automated data processing and experimental evaluations. To support open and reproducible research on graph adversarial learning, GRB also hosts public leaderboards across different scenarios. As a starting point, we conduct extensive experiments to benchmark baseline techniques. GRB is open-source and welcomes contributions from the community. Datasets, codes, leaderboards are available at https://cogdl.ai/grb/home.
【27】 A Relational Model for One-Shot Classification 标题:一种用于一次分类的关系模型 链接:https://arxiv.org/abs/2111.04313
作者:Arturs Polis,Alexander Ilin 机构:Aalto University, Espoo, Finland 备注:Published at ESANN 2021 摘要:我们表明,具有内置关系归纳偏差的深度学习模型可以在不依赖大量数据扩充的情况下为样本有效学习带来好处。所提出的一次性分类模型以局部和成对注意的形式对一对输入进行关系匹配。我们的方法完美地解决了一次性图像分类Omniglot挑战。我们的模型在没有数据扩充的情况下,超过了人类水平的准确性,以及以前的最先进水平。 摘要:We show that a deep learning model with built-in relational inductive bias can bring benefits to sample-efficient learning, without relying on extensive data augmentation. The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention. Our approach solves perfectly the one-shot image classification Omniglot challenge. Our model exceeds human level accuracy, as well as the previous state of the art, with no data augmentation.
【28】 Defense Against Explanation Manipulation 标题:对解释操纵的防御 链接:https://arxiv.org/abs/2111.04303
作者:Ruixiang Tang,Ninghao Liu,Fan Yang,Na Zou,Xia Hu 机构:Texas A&M University, Rice University 摘要:可解释机器学习由于提高了模型的透明度而受到越来越多的关注,这有助于机器学习在实际应用中得到信任。然而,解释方法最近被证明是易受操纵的,我们可以很容易地改变模型的解释,同时保持其预测不变。为了解决这个问题,已经付出了一些努力来使用更稳定的解释方法或改变模型配置。在这项工作中,我们从训练的角度解决了这个问题,并提出了一种新的训练方案,称为解释上的对抗性训练(ATEX),以提高模型的内部解释稳定性,而不管采用何种解释方法。ATEX没有直接指定数据实例的解释值,而是只对模型预测提出要求,从而避免在优化过程中涉及二阶导数。作为进一步讨论,我们还发现解释稳定性与模型的另一个属性密切相关,即暴露于对抗性攻击的风险。通过实验,除了表明ATEX提高了模型对操纵目标解释的鲁棒性外,它还带来了其他好处,包括平滑解释和提高对抗性训练的效果(如果应用于模型)。 摘要:Explainable machine learning attracts increasing attention as it improves transparency of models, which is helpful for machine learning to be trusted in real applications. However, explanation methods have recently been demonstrated to be vulnerable to manipulation, where we can easily change a model's explanation while keeping its prediction constant. To tackle this problem, some efforts have been paid to use more stable explanation methods or to change model configurations. In this work, we tackle the problem from the training perspective, and propose a new training scheme called Adversarial Training on EXplanations (ATEX) to improve the internal explanation stability of a model regardless of the specific explanation method being applied. Instead of directly specifying explanation values over data instances, ATEX only puts requirement on model predictions which avoids involving second-order derivatives in optimization. As a further discussion, we also find that explanation stability is closely related to another property of the model, i.e., the risk of being exposed to adversarial attack. Through experiments, besides showing that ATEX improves model robustness against manipulation targeting explanation, it also brings additional benefits including smoothing explanations and improving the efficacy of adversarial training if applied to the model.
【29】 Deep Unsupervised Active Learning on Learnable Graphs 标题:基于可学习图的深度无监督主动学习 链接:https://arxiv.org/abs/2111.04286
作者:Handong Ma,Changsheng Li,Xinchu Shi,Ye Yuan,Guoren Wang 机构:SCSE, University of Electronic Science and Technology of China, Chengdu, China, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, Meituan 摘要:近年来,深度学习已成功应用于无监督主动学习。然而,目前的方法试图通过自动编码器学习非线性变换,而忽略了样本关系,为无监督主动学习设计更有效的表示学习机制留下了巨大的空间。本文提出了一种基于可学习图的深度无监督主动学习模型ALLG。ALLG得益于学习最佳图结构,以获得更好的样本表示和选择代表性样本。为了使学习到的图结构更加稳定和有效,我们将$k$-最近邻图作为先验,学习一种关系传播图结构。我们还加入了不同层之间的快捷连接,这可以在一定程度上缓解众所周知的过度平滑问题。据我们所知,这是第一次尝试利用图形结构学习进行无监督主动学习。在六个数据集上进行的大量实验证明了我们方法的有效性。 摘要:Recently, deep learning has been successfully applied to unsupervised active learning. However, current methods attempt to learn a nonlinear transformation via an auto-encoder while ignoring the relations among samples, leaving considerable room for designing more effective representation learning mechanisms for unsupervised active learning. In this paper, we propose a novel deep unsupervised Active Learning model via Learnable Graphs, named ALLG. ALLG benefits from learning optimal graph structures to acquire better sample representations and select representative samples. To make the learnt graph structure more stable and effective, we take a $k$-nearest neighbor graph as a prior and learn a relation propagation graph structure. We also incorporate shortcut connections among different layers, which can alleviate the well-known over-smoothing problem to some extent. To the best of our knowledge, this is the first attempt to leverage graph structure learning for unsupervised active learning. Extensive experiments performed on six datasets demonstrate the efficacy of our method.
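Entry 【29】 uses a $k$-nearest neighbor graph as a prior for the learned graph structure. As a rough illustration of what such a prior looks like, the sketch below builds a plain Euclidean $k$-NN graph over a handful of 2-D points. It is not the ALLG model: the distance metric, the function name `knn_graph`, and the toy points are all assumptions made for this example.

```python
import math


def knn_graph(points, k):
    """Return {index: [indices of its k nearest neighbours]} using Euclidean distance."""
    n = len(points)
    graph = {}
    for i in range(n):
        # Distance from point i to every other point, sorted ascending.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Two well-separated clusters of toy points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(points, k=2)

for node, neighbours in graph.items():
    print(node, neighbours)
```

Note that the resulting graph is directed: point 3 lists point 2 as a neighbour because its own cluster has only one other member, while point 2 does not list point 3. A learned graph would typically refine such edges.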
【30】 Batch Reinforcement Learning from Crowds 标题:群体中的批量强化学习 链接:https://arxiv.org/abs/2111.04279
作者:Guoxi Zhang,Hisashi Kashima 机构:Graduate School of Informatics, Kyoto University 备注:16 pages 摘要:批量强化学习的一个缺点是它需要数据中的奖励,因此不适用于没有奖励函数的任务。缺乏奖励的现有设置,如行为克隆,依赖于从人类收集的最佳演示。不幸的是,需要广泛的专业知识来确保最佳性,这阻碍了为复杂任务获取大规模数据。本文通过从偏好中学习奖励函数来解决批量强化学习环境中的奖励不足问题。生成首选项只需要对任务有基本的了解。作为一个心理过程,生成偏好比执行演示更快。因此,可以使用众包从非专家人群中大规模收集偏好。本文解决了从非专家人群收集数据时出现的一个关键挑战:偏好中的噪音。提出了一种新的用于标签可靠性建模的概率模型。此外,该模型利用学习的奖励函数平滑估计。对Atari数据集的评估证明了所提出模型的有效性,随后进行了消融研究,以分析所提出想法的相对重要性。 摘要:A shortcoming of batch reinforcement learning is its requirement for rewards in the data, which makes it inapplicable to tasks without reward functions. Existing settings that lack rewards, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, extensive expertise is required to ensure optimality, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences. Generating preferences only requires a basic understanding of a task. Being a mental process, generating preferences is faster than performing demonstrations. Preferences can therefore be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerges when collecting data from non-expert humans: the noise in preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, followed by an ablation study to analyze the relative importance of the proposed ideas.
【31】 Mimic: An adaptive algorithm for multivariate time series classification 标题:MIMIC:一种适用于多变量时间序列分类的自适应算法 链接:https://arxiv.org/abs/2111.04273
作者:Yuhui Wang,Diane J. Cook 机构:Washington State University 摘要:时间序列数据很有价值,但往往难以理解。在金融、医疗和其他关键应用程序中获得时间序列分类器的信任可能依赖于创建可解释的模型。研究人员此前被迫在缺乏预测能力的可解释方法和缺乏透明度的深度学习方法之间做出选择。在本文中,我们提出了一种新的模拟算法,在引入可解释性的同时保留最强分类器的预测精度。模拟反映了现有多变量时间序列分类器的学习方法,同时生成视觉表示,以增强用户对学习模型的理解。在26个时间序列数据集上的实验支持Mimic可视化和准确地模拟各种时间序列分类器的能力。 摘要:Time series data are valuable but are often inscrutable. Gaining trust in time series classifiers for finance, healthcare, and other critical applications may rely on creating interpretable models. Researchers have previously been forced to decide between interpretable methods that lack predictive power and deep learning methods that lack transparency. In this paper, we propose a novel Mimic algorithm that retains the predictive accuracy of the strongest classifiers while introducing interpretability. Mimic mirrors the learning method of an existing multivariate time series classifier while simultaneously producing a visual representation that enhances user understanding of the learned model. Experiments on 26 time series datasets support Mimic's ability to imitate a variety of time series classifiers visually and accurately.
【32】 Group-Aware Threshold Adaptation for Fair Classification 标题:用于公平分类的组感知阈值自适应 链接:https://arxiv.org/abs/2111.04271
作者:Taeuk Jang,Pengyi Shi,Xiaoqian Wang 机构:School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA, Krannert School of Management, Purdue University, West Lafayette, USA 备注:19 pages, 1 figure 摘要:随着机器学习在不同领域的应用不断扩大和多样化,机器学习中的公平性越来越受到人们的关注。为了缓解不同人口群体之间的模型歧视行为,我们引入了一种新的后处理方法,通过群体感知阈值自适应来优化多个公平性约束。我们建议通过优化根据分类模型输出的概率分布估计的混淆矩阵来学习每个人口统计组的自适应分类阈值。由于我们只需要估计模型输出的概率分布,而不需要分类模型结构,因此我们的后处理模型可以应用于广泛的分类模型,以模型无关的方式提高公平性,并确保隐私。这甚至允许我们对现有的公平性方法进行后处理,以进一步改善准确性和公平性之间的权衡。此外,我们的模型具有较低的计算成本。我们对优化算法的收敛性以及方法的准确性和公平性之间的权衡进行了严格的理论分析。在相同条件下,我们的方法在理论上比现有方法具有更好的近似最优性上界。实验结果表明,我们的方法优于现有的方法,得到的结果最接近理论精度和公平性权衡边界。 摘要:Fairness in machine learning is receiving increasing attention as its applications in different fields continue to expand and diversify. To mitigate discriminatory model behavior across different demographic groups, we introduce a novel post-processing method that optimizes over multiple fairness constraints through group-aware threshold adaptation. We propose to learn adaptive classification thresholds for each demographic group by optimizing the confusion matrix estimated from the probability distribution of a classification model's output. As we only need an estimated probability distribution of the model output instead of the classification model structure, our post-processing model can be applied to a wide range of classification models, improving fairness in a model-agnostic manner while ensuring privacy. This even allows us to post-process existing fairness methods to further improve the trade-off between accuracy and fairness. Moreover, our model has low computational cost. We provide rigorous theoretical analysis of the convergence of our optimization algorithm and of the trade-off between accuracy and fairness of our method. Theoretically, our method achieves a better upper bound on near-optimality than existing methods under the same conditions. 
Experimental results demonstrate that our method outperforms state-of-the-art methods and obtains the result that is closest to the theoretical accuracy-fairness trade-off boundary.
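The core idea in entry 【32】, a separate decision threshold per demographic group applied to a fixed classifier's scores, can be sketched in a few lines. The sketch below is a deliberately simplified stand-in and not the authors' optimization: instead of optimizing an estimated confusion matrix under several fairness constraints, it only picks, for each group, the threshold whose positive-prediction rate is closest to a shared target (a demographic-parity style criterion). The scores, group names, and target rate are made up for illustration.

```python
def positive_rate(scores, threshold):
    """Share of samples predicted positive when using `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


def pick_threshold(scores, target_rate):
    """Pick the observed score whose positive rate is closest to target_rate."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda t: abs(positive_rate(scores, t) - target_rate),
    )


# Hypothetical classifier scores, split by demographic group.
scores_by_group = {
    "A": [0.9, 0.8, 0.7, 0.4, 0.2],
    "B": [0.6, 0.5, 0.3, 0.2, 0.1],
}
target = 0.4  # assumed shared positive-prediction rate

thresholds = {
    group: pick_threshold(scores, target)
    for group, scores in scores_by_group.items()
}
print(thresholds)  # {'A': 0.8, 'B': 0.5}
```

A single global threshold of 0.8 would give group B no positive predictions at all; the per-group thresholds give both groups a positive rate of 0.4. Because only the scores are needed, this kind of post-processing does not touch the underlying model, which matches the model-agnostic point made in the abstract.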
【33】 JaMIE: A Pipeline Japanese Medical Information Extraction System 标题:Jamie:一种流水线日文医疗信息抽取系统 链接:https://arxiv.org/abs/2111.04261
作者:Fei Cheng,Shuntaro Yada,Ribeka Tanaka,Eiji Aramaki,Sadao Kurohashi 机构:Kyoto University, Kyoto, Japan, Nara Institute of Science and Technology, Kyoto, Japan 备注:8 pages 摘要:我们提出了一个开放存取的自然语言处理工具包,用于日语医学信息提取。我们首先提出了一种新的关系注释模式,用于研究日本医疗报告中医疗实体之间的医疗和时间关系。我们通过分别对两种不同类型的报告进行注释,对实际注释场景进行了实验。我们设计了一个包含三个组件的管道系统,用于识别医疗实体、分类实体模式和提取关系。实证结果显示了准确的分析性能,并表明了令人满意的注释质量、针对报告类型的有效注释策略以及最新上下文嵌入模型的优越性。 摘要:We present an open-access natural language processing toolkit for Japanese medical information extraction. We first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with the practical annotation scenarios by separately annotating two different types of reports. We design a pipeline system with three components for recognizing medical entities, classifying entity modalities, and extracting relations. The empirical results show accurate analyzing performance and suggest the satisfactory annotation quality, the effective annotation strategy for targeting report types, and the superiority of the latest contextual embedding models.
【34】 Personalized Benchmarking with the Ludwig Benchmarking Toolkit 标题:使用路德维希基准测试工具包进行个性化基准测试 链接:https://arxiv.org/abs/2111.04260
作者:Avanika Narayan,Piero Molino,Karan Goel,Willie Neiswanger,Christopher Ré 机构:Department of Computer Science, Stanford University 备注:14 pages, 14 figures, 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks 摘要:机器学习模型在各个领域和部署环境中的快速扩散已经引起了各种社区(如行业从业者)的兴起,他们寻求在任务和个人价值目标之间建立模型基准。不幸的是,这些用户无法使用标准基准结果进行价值驱动的比较,因为传统基准评估模型只针对一个目标(如平均准确度),无法促进控制混杂变量(如计算预算)的标准化训练框架,使公平比较变得困难。为了应对这些挑战,我们引入了开源Ludwig Benchmarking Toolkit(LBT),这是一个个性化的基准测试工具包,用于跨一组易于扩展的任务、深度学习模型、数据集和评估指标运行端到端基准研究(从超参数优化到评估)。LBT提供用于控制训练和定制评估的可配置界面、用于消除混杂变量的标准化训练框架,以及对多目标评估的支持。我们演示了如何使用LBT创建个性化基准研究,并对7个模型和9个数据集的文本分类进行大规模比较分析。我们探讨了推理延迟和性能之间的权衡,数据集属性和性能之间的关系,以及预训练对收敛性和鲁棒性的影响,展示了如何使用LBT来满足各种基准测试目标。 摘要:The rapid proliferation of machine learning models across domains and deployment settings has given rise to various communities (e.g. industry practitioners) which seek to benchmark models across tasks and objectives of personal value. Unfortunately, these users cannot use standard benchmark results to perform such value-driven comparisons as traditional benchmarks evaluate models on a single objective (e.g. average accuracy) and fail to facilitate a standardized training framework that controls for confounding variables (e.g. computational budget), making fair comparisons difficult. To address these challenges, we introduce the open-source Ludwig Benchmarking Toolkit (LBT), a personalized benchmarking toolkit for running end-to-end benchmark studies (from hyperparameter optimization to evaluation) across an easily extensible set of tasks, deep learning models, datasets and evaluation metrics. LBT provides a configurable interface for controlling training and customizing evaluation, a standardized training framework for eliminating confounding variables, and support for multi-objective evaluation. We demonstrate how LBT can be used to create personalized benchmark studies with a large-scale comparative analysis for text classification across 7 models and 9 datasets. 
We explore the trade-offs between inference latency and performance, relationships between dataset attributes and performance, and the effects of pretraining on convergence and robustness, showing how LBT can be used to satisfy various benchmarking objectives.
【35】 Trust-aware Control for Intelligent Transportation Systems 标题:智能交通系统中的信任感知控制 链接:https://arxiv.org/abs/2111.04248
作者:Mingxi Cheng,Junyao Zhang,Shahin Nazarian,Jyotirmoy Deshmukh,Paul Bogdan 机构:Department of Electrical and Computer Engineering, University of Southern California 摘要:许多智能交通系统都是多智能体系统,即交通参与者和交通基础设施中的子系统都可以建模为交互智能体。使用基于AI的方法来实现不同代理系统之间的协调,可以在仅包含人工操作车辆的运输系统上提供更大的安全性,还可以提高系统在交通吞吐量、感应范围和支持协作任务方面的效率。然而,自主性的提高使得交通基础设施容易受到受损车辆代理或基础设施的影响。本文提出了一个新的框架,通过将信任权限嵌入到交通基础设施中,使用一种称为主观逻辑的认知逻辑系统地量化代理的可信度。在本文中,我们做出了以下新的贡献:(i)我们提出了一个框架,用于使用代理的量化可信度来实现信任感知的协调和控制。(ii)我们演示了如何使用基于强化学习的方法合成信任感知控制器。(iii)我们全面分析了一个自主交叉口管理(AIM)案例研究,并开发了一个称为AIM trust的信任感知版本,该版本在由受信任和不受信任的代理组成的场景中可以降低事故率。 摘要:Many intelligent transportation systems are multi-agent systems, i.e., both the traffic participants and the subsystems within the transportation infrastructure can be modeled as interacting agents. The use of AI-based methods to achieve coordination among the different agent systems can provide greater safety than transportation systems containing only human-operated vehicles, and can also improve system efficiency in terms of traffic throughput, sensing range, and support for collaborative tasks. However, increased autonomy makes the transportation infrastructure vulnerable to compromised vehicular agents or infrastructure. This paper proposes a new framework that embeds a trust authority into the transportation infrastructure to systematically quantify the trustworthiness of agents using an epistemic logic known as subjective logic. In this paper, we make the following novel contributions: (i) We propose a framework for using the quantified trustworthiness of agents to enable trust-aware coordination and control. (ii) We demonstrate how to synthesize trust-aware controllers using an approach based on reinforcement learning. 
(iii) We comprehensively analyze an autonomous intersection management (AIM) case study and develop a trust-aware version called AIM-Trust that leads to lower accident rates in scenarios consisting of a mixture of trusted and untrusted agents.
【36】 Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning 标题:基于深度主动学习的隐私策略GDPR泄露要求自动检测 链接:https://arxiv.org/abs/2111.04224
作者:Tamjid Al Rahat,Tu Le,Yuan Tian 机构:University of Virginia 摘要:自2018年5月GDPR生效以来,各公司一直致力于数据实践,以遵守该隐私法。特别是,由于隐私政策是用户了解和控制其隐私的重要沟通渠道,许多公司在GDPR实施后更新了其隐私政策。然而,大多数隐私政策都是冗长的,充满了行话,模糊地描述了公司的数据实践和用户权利。因此,尚不清楚它们是否符合GDPR。在本文中,我们创建了一个由1080个网站组成的隐私政策数据集,这些网站标有18项GDPR要求,并开发了一个基于卷积神经网络(CNN)的模型,该模型可以对隐私政策进行分类,准确率为89.2%。我们应用我们的模型来衡量隐私政策的合规性。我们的研究结果表明,即使在GDPR生效后,97%的网站仍然未能遵守GDPR的至少一项要求。 摘要:Since GDPR came into force in May 2018, companies have worked on their data practices to comply with this privacy law. In particular, since the privacy policy is the essential communication channel for users to understand and control their privacy, many companies updated their privacy policies after GDPR was enforced. However, most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights. Therefore, it is unclear if they comply with GDPR. In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR requirements and develop a Convolutional Neural Network (CNN) based model which can classify the privacy policies with an accuracy of 89.2%. We apply our model to perform a measurement on the compliance in the privacy policies. Our results show that even after GDPR went into effect, 97% of websites still fail to comply with at least one requirement of GDPR.
【37】 VizAI : Selecting Accurate Visualizations of Numerical Data 标题:VizAI:选择准确的数字数据可视化 链接:https://arxiv.org/abs/2111.04190
作者:Ritvik Vij,Rohit Raj,Madhur Singhal,Manish Tanwar,Srikanta Bedathur 机构:Department of CSE, IIT Delhi, India 备注:Proc. of the ACM India Joint International Conference on Data Sciences and Management of Data (CODS-COMAD) 2022 (9th ACM IKDD CODS and 27th COMAD) - To Appear 摘要:良好的数据可视化不仅是数据的无失真图形表示,也是揭示数据潜在统计特性的一种方法。尽管它在数据分析的各个阶段都很常用,但选择一个好的可视化通常是一个涉及许多迭代的手动过程。最近,人们有兴趣通过开发可以推荐可视化的模型来减少这种努力,但它们的用途有限,因为它们需要大量的训练样本(数据和可视化对),并且主要关注设计方面,而不是评估所选可视化的有效性。在本文中,我们提出了VizAI,一个生成性判别框架,该框架首先从数据的许多可选可视化生成数据的各种统计特性。它链接到一个判别模型,该模型选择与被可视化数据的真实统计数据最匹配的可视化。VizAI可以在最少的监督下轻松接受训练,并且可以轻松适应不同监督程度的环境。使用众包判断和大量公开可用的可视化存储库,我们证明VizAI优于学习推荐可视化的最先进方法。 摘要:A good data visualization is not only a distortion-free graphical representation of data but also a way to reveal underlying statistical properties of the data. Despite its common use across various stages of data analysis, selecting a good visualization often is a manual process involving many iterations. Recently there has been interest in reducing this effort by developing models that can recommend visualizations, but they are of limited use since they require large training samples (data and visualization pairs) and focus primarily on the design aspects rather than on assessing the effectiveness of the selected visualization. In this paper, we present VizAI, a generative-discriminative framework that first generates various statistical properties of the data from a number of alternative visualizations of the data. It is linked to a discriminative model that selects the visualization that best matches the true statistics of the data being visualized. VizAI can easily be trained with minimal supervision and adapts to settings with varying degrees of supervision easily. Using crowd-sourced judgements and a large repository of publicly available visualizations, we demonstrate that VizAI outperforms the state of the art methods that learn to recommend visualizations.
【38】 A Word on Machine Ethics: A Response to Jiang et al. (2021) 标题:机器伦理学概论:对江等人的回应。(2021年) 链接:https://arxiv.org/abs/2111.04158
作者:Zeerak Talat,Hagen Blix,Josef Valvoda,Maya Indira Ganesh,Ryan Cotterell,Adina Williams 机构:Simon Fraser University, New York University, University of Cambridge, ETH Zürich, Facebook AI Research 备注:11 pages, 2 figures, submitting soon to ACL Rolling Review 摘要:伦理是人类最长久的智力活动之一。近年来,人工智能和自然语言处理领域试图就如何限制与人类互动的学习系统的道德行为进行争论。这方面的一个建议是构建道德模型,该模型可以接收任意文本并输出对所描述情况的道德判断。在这项工作中,我们专注于最近提出的德尔菲模型的单一案例研究,并对该项目提出的道德判断自动化方法提出批评。通过对德尔福的审计,我们研究了适用于任何类似尝试的更广泛的问题。最后,我们讨论了机器伦理如何以透明、民主价值观为中心,通过关注当前和近期的技术使用,并允许直接问责,有效地进行。 摘要:Ethics is one of the longest standing intellectual endeavors of humanity. In recent years, the fields of AI and NLP have attempted to wrangle with how learning systems that interact with humans should be constrained to behave ethically. One proposal in this vein is the construction of morality models that can take in arbitrary text and output a moral judgment about the situation described. In this work, we focus on a single case study of the recently proposed Delphi model and offer a critique of the project's proposed method of automating morality judgments. Through an audit of Delphi, we examine broader issues that would be applicable to any similar attempt. We conclude with a discussion of how machine ethics could usefully proceed, by focusing on current and near-future uses of technology, in a way that centers around transparency, democratic values, and allows for straightforward accountability.
【39】 Learning Finite Linear Temporal Logic Specifications with a Specialized Neural Operator 标题:用专门的神经算子学习有限线性时态逻辑规范 链接:https://arxiv.org/abs/2111.04147
作者:Homer Walke,Daniel Ritter,Carl Trimbach,Michael Littman 机构: Brown University, UC Berkeley 备注:10 pages, 5 figures 摘要:有限线性时态逻辑($\mathsf{LTL}_f$)是建模时态序列的一种强大的形式化表示。我们解决了从系统行为的标记痕迹中学习紧凑的$\mathsf{LTL}_f$公式的问题。我们提出了一种新的神经网络算子,并对其结构Neural$\mathsf{LTL}_f$进行了评估。我们的方法包括一个专门的递归过滤器,设计用于包含$\mathsf{LTL}_f$时态运算符,以学习一个高精度的跟踪分类器。然后,对激活进行离散化,提取由学习权重表示的真值表。此真值表转换为符号形式,并作为学习公式返回。对随机生成的$\mathsf{LTL}_f$公式的实验表明,与现有方法相比,Neural$\mathsf{LTL}_f$可以扩展到更大的公式大小,并且即使在存在噪声的情况下也能保持较高的精度。 摘要:Finite linear temporal logic ($\mathsf{LTL}_f$) is a powerful formal representation for modeling temporal sequences. We address the problem of learning a compact $\mathsf{LTL}_f$ formula from labeled traces of system behavior. We propose a novel neural network operator and evaluate the resulting architecture, Neural$\mathsf{LTL}_f$. Our approach includes a specialized recurrent filter, designed to subsume $\mathsf{LTL}_f$ temporal operators, to learn a highly accurate classifier for traces. Then, it discretizes the activations and extracts the truth table represented by the learned weights. This truth table is converted to symbolic form and returned as the learned formula. Experiments on randomly generated $\mathsf{LTL}_f$ formulas show Neural$\mathsf{LTL}_f$ scales to larger formula sizes than existing approaches and maintains high accuracy even in the presence of noise.
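To make "a formula that classifies traces" concrete, here is a small sketch of how a candidate formula can be checked against a finite trace under the usual finite-trace semantics. It is not the paper's neural operator. The tuple encoding of formulas and the function name `holds` are assumptions made for illustration only.

```python
def holds(formula, trace, i=0):
    """Evaluate a finite-trace LTL formula at position i of a non-empty trace.

    trace   : list of sets of atomic propositions, one set per time step
    formula : nested tuples, e.g. ("until", ("ap", "a"), ("ap", "b"))
    """
    op = formula[0]
    n = len(trace)
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "next":
        # Strong next: false at the last position of a finite trace.
        return i + 1 < n and holds(formula[1], trace, i + 1)
    if op == "eventually":
        return any(holds(formula[1], trace, j) for j in range(i, n))
    if op == "always":
        return all(holds(formula[1], trace, j) for j in range(i, n))
    if op == "until":
        # formula[1] must hold at every step before formula[2] first holds.
        for j in range(i, n):
            if holds(formula[2], trace, j):
                return True
            if not holds(formula[1], trace, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    trace = [{"a"}, {"a"}, {"b"}]
    print(holds(("until", ("ap", "a"), ("ap", "b")), trace))
    print(holds(("always", ("ap", "a")), trace))
```

A learner in this setting searches for a compact formula for which `holds` agrees with the labels of the training traces.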
【40】 Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis 标题:看看这个差异!基于SOBOL灵敏度分析的有效黑盒解释 链接:https://arxiv.org/abs/2111.04138
作者:Thomas Fel,Remi Cadene,Mathieu Chalvidal,Matthieu Cord,David Vigouroux,Thomas Serre 机构:Rémi Cadène, ∗†, Carney Institute for Brain Science, Brown University, USA, Sorbonne Université, CNRS, France, Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France, Institut de Recherche Technologique Saint-Exupery, France 备注:NeurIPS2021 摘要:我们描述了一种基于敏感性分析并使用Sobol指数的新归因方法。除了对图像区域的单个贡献进行建模外,Sobol指数还提供了一种有效的方法,通过方差透镜捕捉图像区域之间的高阶交互作用及其对神经网络预测的贡献。我们描述了一种方法,通过使用微扰模板和有效的估计器来处理图像的高维性,使得这些指数的计算对于高维问题是有效的。重要的是,我们证明了所提出的方法在视觉(和语言模型)的标准基准上取得了良好的分数,同时与其他黑盒方法相比大大减少了计算时间——甚至超过了需要访问内部表示的最新白盒方法的精度。我们的代码免费提供:https://github.com/fel-thomas/Sobol-Attribution-Method 摘要:We describe a novel attribution method which is grounded in Sensitivity Analysis and uses Sobol indices. Beyond modeling the individual contributions of image regions, Sobol indices provide an efficient way to capture higher-order interactions between image regions and their contributions to a neural network's prediction through the lens of variance. We describe an approach that makes the computation of these indices efficient for high-dimensional problems by using perturbation masks coupled with efficient estimators to handle the high dimensionality of images. Importantly, we show that the proposed method leads to favorable scores on standard benchmarks for vision (and language models) while drastically reducing the computing time compared to other black-box methods -- even surpassing the accuracy of state-of-the-art white-box methods which require access to internal representations. Our code is freely available: https://github.com/fel-thomas/Sobol-Attribution-Method
【41】 NeurInt : Learning to Interpolate through Neural ODEs 标题:NeurInt:通过神经微调学习插值 链接:https://arxiv.org/abs/2111.04123
作者:Avinandan Bose,Aniket Das,Yatin Dandi,Piyush Rai 机构:Indian Institute of Technology Kanpur 备注:Accepted (Spotlight paper) at the NeurIPS 2021 Workshop on the Symbiosis of Deep Learning and Differential Equations (DLDE) 摘要:广泛的应用需要学习图像生成模型,其潜在空间有效地捕获数据分布中存在的高级变化因素。模型通过其潜在空间表示这种变化的程度可以通过其在图像之间平滑插值的能力来判断。然而,大多数生成模型在生成图像之前映射一个固定值,导致插值轨迹缺乏平滑度,并且包含质量降低的图像。在这项工作中,我们提出了一种新的生成模型,该模型在插值轨迹上学习灵活的非参数先验知识,条件是一对源图像和目标图像。我们设计了一个框架,使用潜在的二阶神经常微分方程学习两幅给定图像之间的轨迹分布,而不是依赖于确定性插值方法(如潜在空间中的线性或球形插值)。通过重建和对抗损失的混合组合,生成器被训练成将这些轨迹中的采样点映射到真实图像序列,这些图像序列从源图像平滑过渡到目标图像。通过全面的定性和定量实验,我们证明了我们的方法在生成质量更高的图像方面的有效性,以及它能够学习任何一对真实源图像和目标图像在平滑插值轨迹上的不同分布。 摘要:A wide range of applications require learning image generation models whose latent space effectively captures the high-level factors of variation present in the data distribution. The extent to which a model represents such variations through its latent space can be judged by its ability to interpolate between images smoothly. However, most generative models mapping a fixed prior to the generated images lead to interpolation trajectories lacking smoothness and containing images of reduced quality. In this work, we propose a novel generative model that learns a flexible non-parametric prior over interpolation trajectories, conditioned on a pair of source and target images. Instead of relying on deterministic interpolation methods (such as linear or spherical interpolation in latent space), we devise a framework that learns a distribution of trajectories between two given images using Latent Second-Order Neural Ordinary Differential Equations. Through a hybrid combination of reconstruction and adversarial losses, the generator is trained to map the sampled points from these trajectories to sequences of realistic images that smoothly transition from the source to the target image. 
Through comprehensive qualitative and quantitative experiments, we demonstrate our approach's effectiveness in generating images of improved quality as well as its ability to learn a diverse distribution over smooth interpolation trajectories for any pair of real source and target images.
【42】 Automatic Goal Generation using Dynamical Distance Learning 标题:基于动态远程学习的目标自动生成 链接:https://arxiv.org/abs/2111.04120
作者:Bharat Prakash,Nicholas Waytowich,Tinoosh Mohsenin,Tim Oates 机构:University of Maryland, Baltimore County, Baltimore, MD USA, US Army Research Lab, Aberdeen, MD USA 摘要:强化学习(RL)代理可以通过与环境交互来学习解决复杂的顺序决策任务。然而,样本效率仍然是一个重大挑战。在多目标RL领域,agent需要达到多个目标来解决复杂的任务,提高样本效率尤其具有挑战性。另一方面,人类或其他生物制剂以一种更具战略性的方式学习此类任务,遵循一种课程,在该课程中,任务的抽样难度越来越大,以便取得渐进和有效的学习进展。在这项工作中,我们提出了一种基于动态距离函数(DDF)的自动目标生成方法。DDF是一个预测马尔可夫决策过程(MDP)中任意两个状态之间动态距离的函数。通过这一点,我们在适当的难度水平上制定了目标课程,以促进整个训练过程中的有效学习。我们在几个目标条件下的机器人操作和导航任务上评估了这种方法,并与仅使用随机目标抽样的基线方法相比,显示了样本效率的改进。 摘要:Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment. However, sample efficiency remains a major challenge. In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging. On the other hand, humans or other biological agents learn such tasks in a much more strategic way, following a curriculum where tasks are sampled with increasing difficulty level in order to make gradual and efficient learning progress. In this work, we propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion. DDF is a function which predicts the dynamical distance between any two states within a markov decision process (MDP). With this, we generate a curriculum of goals at the appropriate difficulty level to facilitate efficient learning throughout the training process. We evaluate this approach on several goal-conditioned robotic manipulation and navigation tasks, and show improvements in sample efficiency over a baseline method which only uses random goal sampling.
【43】 MetaMIML: Meta Multi-Instance Multi-Label Learning 标题:MetaMIML:元多实例多标签学习 链接:https://arxiv.org/abs/2111.04112
作者:Yuanlin Yang,Guoxian Yu,Jun Wang,Lei Liu,Carlotta Domeniconi,Maozu Guo 机构:College of Computer and Information Sciences, Southwest University, Chongqing, China, School of Software, Shandong University, Jinan, China, Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, China 备注:10 pages, 2 figures 摘要:多实例多标签学习(MIML)为复杂对象(包)建模,每个对象与一组相互关联的标签相关联,并由一组实例组成。当前的MIML解决方案仍然关注单一类型的对象,并假设训练数据是IID分布。但这些对象与其他类型的对象相链接(例如Facebook中与不同用户链接的图片),这些对象也对目标对象的语义进行编码。此外,他们通常需要大量的标记数据进行训练。为了有效挖掘不同类型的相互依赖的MIML对象,我们提出了一种基于网络嵌入和元学习的方法(MetaMIML)。MetaMIML引入了具有网络嵌入的上下文学习器来捕获不同类型对象的语义信息,并引入了任务学习器来提取元知识以快速适应新任务。通过这种方式,MetaMIML可以自然地在数据级别处理MIML对象,但也可以在模型增强级别利用元学习的能力。在基准数据集上的实验表明,MetaMIML的性能明显优于最先进的算法。 摘要:Multi-Instance Multi-Label learning (MIML) models complex objects (bags), each of which is associated with a set of interrelated labels and composed of a set of instances. Current MIML solutions still focus on a single type of object and assume an IID distribution of training data. But these objects are linked with objects of other types (e.g., pictures on Facebook linked with various users), which also encode the semantics of target objects. In addition, they generally need abundant labeled data for training. To effectively mine interdependent MIML objects of different types, we propose a network embedding and meta learning based approach (MetaMIML). MetaMIML introduces the context learner with network embedding to capture semantic information of objects of different types, and the task learner to extract the meta knowledge for fast adaptation to new tasks. In this way, MetaMIML can naturally handle MIML objects at the data level and also exploit the power of meta-learning at the model level. Experiments on benchmark datasets demonstrate that MetaMIML achieves a significantly better performance than state-of-the-art algorithms.
【44】 Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias 标题:潜在混杂因素和选择偏差可能存在的迭代因果发现 链接:https://arxiv.org/abs/2111.04095
作者:Raanan Y. Rohekar,Shami Nisimov,Yaniv Gurwicz,Gal Novik 机构:Intel Labs 备注:35th Conference on Neural Information Processing Systems (NeurIPS 2021). arXiv admin note: text overlap with arXiv:2012.07513 摘要:我们提出了一个完善的算法,称为迭代因果发现(ICD),用于在潜在混杂因素和选择偏差存在的情况下恢复因果图。ICD依赖于因果马尔可夫和忠实性假设,并恢复基础因果图的等价类。它从一个完整的图开始,由单个迭代阶段组成,该阶段通过识别连接节点之间的条件独立性(CI)来逐步细化该图。任何迭代后产生的独立性和因果关系都是正确的,可以随时呈现ICD。本质上,我们将CI条件集的大小与其在图上与测试节点的距离联系起来,并在后续迭代中增加该值。因此,每次迭代都会细化一个由先前迭代恢复的图,该图具有更小的条件集(更高的统计能力),这有助于提高稳定性。我们通过经验证明,与FCI、FCI 和RFCI算法相比,ICD需要更少的CI测试,并学习更准确的因果图。 摘要:We present a sound and complete algorithm, called iterative causal discovery (ICD), for recovering causal graphs in the presence of latent confounders and selection bias. ICD relies on the causal Markov and faithfulness assumptions and recovers the equivalence class of the underlying causal graph. It starts with a complete graph, and consists of a single iterative stage that gradually refines this graph by identifying conditional independence (CI) between connected nodes. Independence and causal relations entailed after any iteration are correct, rendering ICD anytime. Essentially, we tie the size of the CI conditioning set to its distance on the graph from the tested nodes, and increase this value in the successive iteration. Thus, each iteration refines a graph that was recovered by previous iterations having smaller conditioning sets -- a higher statistical power -- which contributes to stability. We demonstrate empirically that ICD requires significantly fewer CI tests and learns more accurate causal graphs compared to FCI, FCI , and RFCI algorithms.
【45】 Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer 标题:主题Transformer:用主题制约的Transformer生成符号音乐 链接:https://arxiv.org/abs/2111.04093
作者:Yi-Jen Shih,Shih-Lun Wu,Frank Zalkow,Meinard Müller,Yi-Hsuan Yang 摘要:基于注意力的Transformer模型已越来越多地用于自动音乐生成。为了使用用户指定的序列来调节此类模型的生成过程,一种流行的方法是将该调节序列作为启动序列,并要求转换器解码器生成延续。然而,这种基于提示的条件作用不能保证条件作用序列会发展,甚至不能在生成的延续中简单地重复自身。在本文中,我们提出了一种替代的条件作用方法,称为基于主题的条件作用,该方法明确训练转换器将条件作用序列视为主题材料,必须在其生成结果中多次显示。这是通过两项主要的技术贡献实现的。首先,我们提出了一种基于深度学习的方法,使用对比表征学习和聚类从训练数据中的音乐片段中自动检索主题材料。其次,我们提出了一种新的门控并行注意模块,用于序列对序列(seq2seq)编码器/解码器体系结构中,以更有效地解释Transformer解码器生成过程中给定的调节主题材料。我们报告了对提议的主题变换器和传统的基于提示的基线变体的客观和主观评估,表明我们的最佳模型可以在一定程度上生成具有给定条件的重复和合理变体的复调流行钢琴音乐。 摘要:Attention-based Transformer models have been increasingly employed for automatic music generation. To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation. However, this prompt-based conditioning cannot guarantee that the conditioning sequence would develop or even simply repeat itself in the generated continuation. In this paper, we propose an alternative conditioning approach, called theme-based conditioning, that explicitly trains the Transformer to treat the conditioning sequence as a thematic material that has to manifest itself multiple times in its generation result. This is achieved with two main technical contributions. First, we propose a deep learning-based approach that uses contrastive representation learning and clustering to automatically retrieve thematic materials from music pieces in the training data. Second, we propose a novel gated parallel attention module to be used in a sequence-to-sequence (seq2seq) encoder/decoder architecture to more effectively account for a given conditioning thematic material in the generation process of the Transformer decoder. 
We report on objective and subjective evaluations of variants of the proposed Theme Transformer and the conventional prompt-based baseline, showing that our best model can generate, to some extent, polyphonic pop piano music with repetition and plausible variations of a given condition.
【46】 Consistency and Consensus Driven for Hesitant Fuzzy Linguistic Decision Making with Pairwise Comparisons 标题:两两比较的犹豫型模糊语言决策的一致性和一致性驱动 链接:https://arxiv.org/abs/2111.04092
作者:Peijia Ren,Zixu Liu,Wei-Guo Zhang,Xilan Wu 机构:a School of Business Administration, South China University of Technology, Guangzhou , b Crop Science Centre, National Institute of Agricultural Botany, Cambridge CB,LE, U.K. 备注:Submitted to Expert Systems with Applications (ISSN: 0957-4174) 摘要:犹豫不决模糊语言偏好关系(HFLPR)是一种在不确定性条件下表达观点的有效方法,因而备受关注。为了增强HFLPR决策理论,本文介绍了一种基于可接受一致性和一致性度量的HFLPR群决策算法,该算法包括:(1)定义犹豫模糊语言几何一致性指数(HFLGCI)提出HFLPR的一致性检查和不一致性改进程序;(2) 根据原始个体HFLPR和整体完美HFLPR之间的相似性衡量群体共识,然后建立共识确保程序,包括确定决策者权重。证明了这两种方法的收敛性和单调性。通过实验进一步研究了定义的HFLGCI的临界值,并进行了对比分析,证明了该算法的有效性。以风险投资引导基金绩效评价为例,说明了该算法的有效性。作为我们工作的一个应用,最终为决策者提供了一个在线决策门户,以利用所提出的算法来解决决策问题。 摘要:Hesitant fuzzy linguistic preference relation (HFLPR) is of interest because it provides an efficient way for opinion expression under uncertainty. For enhancing the theory of decision making with HFLPR, the paper introduces an algorithm for group decision making with HFLPRs based on the acceptable consistency and consensus measurements, which involves (1) defining a hesitant fuzzy linguistic geometric consistency index (HFLGCI) and proposing a procedure for consistency checking and inconsistency improving for HFLPR; (2) measuring the group consensus based on the similarity between the original individual HFLPRs and the overall perfect HFLPR, then establishing a procedure for consensus ensuring including the determination of decision-makers weights. The convergence and monotonicity of the proposed two procedures have been proved. Some experiments are furtherly performed to investigate the critical values of the defined HFLGCI, and comparative analyses are conducted to show the effectiveness of the proposed algorithm. A case concerning the performance evaluation of venture capital guiding funds is given to illustrate the availability of the proposed algorithm. As an application of our work, an online decision-making portal is finally provided for decision-makers to utilize the proposed algorithms to solve decision-making problems.
【47】 Meta Cross-Modal Hashing on Long-Tailed Data 标题:长尾数据的Meta Cross-Modal散列 链接:https://arxiv.org/abs/2111.04086
作者:Runmin Wang,Guoxian Yu,Carlotta Domeniconi,Xiangliang Zhang 机构:Shandong University, Department of Computer Science, George Mason University 备注:10 pages, 4 figures 摘要:由于跨模式散列在减少存储量的同时加快了对大型异构数据的查询速度,因此在多模式数据的近似最近邻搜索中,跨模式散列得到了广泛的研究。大多数散列方法假设训练数据是类平衡的。然而,在实践中,真实世界的数据通常具有长尾分布。在本文中,我们介绍了一种基于元学习的跨模态散列方法(MetaCMH)来处理长尾数据。由于尾类中缺少训练样本,MetaCMH首先从不同模式的数据中学习直接特征,然后引入联想记忆模块来学习尾类样本的记忆特征。然后,它将直接和内存特性结合起来,以获得每个示例的元特性。对于长尾分布的头类样本,直接特征的权重较大,因为有足够的训练数据可以很好地学习它们;而对于稀有类,内存特性的权重更大。最后,MetaCMH使用似然损失函数来保持不同模式下的相似性,并以端到端的方式学习哈希函数。在长尾数据集上的实验表明,MetaCMH的性能明显优于最先进的方法,尤其是在尾类上。 摘要:Due to the advantage of reducing storage while speeding up query time on big heterogeneous data, cross-modal hashing has been extensively studied for approximate nearest neighbor search of multi-modal data. Most hashing methods assume that training data is class-balanced. However, in practice, real world data often have a long-tailed distribution. In this paper, we introduce a meta-learning based cross-modal hashing method (MetaCMH) to handle long-tailed data. Due to the lack of training samples in the tail classes, MetaCMH first learns direct features from data in different modalities, and then introduces an associative memory module to learn the memory features of samples of the tail classes. It then combines the direct and memory features to obtain meta features for each sample. For samples of the head classes of the long tail distribution, the weight of the direct features is larger, because there are enough training data to learn them well; while for rare classes, the weight of the memory features is larger. Finally, MetaCMH uses a likelihood loss function to preserve the similarity in different modalities and learns hash functions in an end-to-end fashion. Experiments on long-tailed datasets show that MetaCMH performs significantly better than state-of-the-art methods, especially on the tail classes.
【48】 Modelling and Optimisation of Resource Usage in an IoT Enabled Smart Campus 标题:物联网智能校园中资源使用的建模和优化 链接:https://arxiv.org/abs/2111.04085
作者:Thanchanok Sutjarittham 机构:School of Electrical Engineering and Telecommunications 备注:Doctoral thesis 摘要:大学校园本质上是一座城市的缩影。它们包括各种设施,如住宅、体育中心、演讲厅、停车位和公共交通站。大学面临着不断提高效率的压力,同时为包括学生、员工和访客在内的各种利益相关者提供更好的体验。尽管如此,传闻证据表明,校园资产没有得到有效利用,这通常是由于缺乏数据收集和分析,从而限制了在资源分配和管理方面做出明智决策的能力。物联网(IoT)技术的进步可以感知和交流来自物理世界的数据,再加上数据分析和人工智能(AI)可以预测使用模式,为组织降低成本和改善用户体验开辟了新的机会。本论文以新南威尔士大学悉尼分校为实验室,通过理论和实验探索这一机遇。 摘要:University campuses are essentially a microcosm of a city. They comprise diverse facilities such as residences, sport centres, lecture theatres, parking spaces, and public transport stops. Universities are under constant pressure to improve efficiencies while offering a better experience to various stakeholders including students, staff, and visitors. Nonetheless, anecdotal evidence indicates that campus assets are not being utilised efficiently, often due to the lack of data collection and analysis, thereby limiting the ability to make informed decisions on the allocation and management of resources. Advances in the Internet of Things (IoT) technologies that can sense and communicate data from the physical world, coupled with data analytics and Artificial Intelligence (AI) that can predict usage patterns, have opened up new opportunities for organisations to lower cost and improve user experience. This thesis explores this opportunity via theory and experimentation using UNSW Sydney as a living laboratory.
【49】 Cross-modal Zero-shot Hashing by Label Attributes Embedding 标题:基于标签属性嵌入的跨模式零射散列 链接:https://arxiv.org/abs/2111.04080
作者:Runmin Wang,Guoxian Yu,Lei Liu,Lizhen Cui,Carlotta Domeniconi,Xiangliang Zhang 备注:7 pages, 2 figures 摘要:跨模态哈希(CMH)是跨模态近似最近邻搜索中最有前途的方法之一。大多数CMH解决方案理想地假设训练集和测试集的标签是相同的。然而,这一假设经常被违反,导致零炮CMH问题。最近解决这个问题的努力集中在使用标签属性将知识从可见的类转移到不可见的类上。然而,属性与多模态数据的特征是分离的。为了减少信息差距,我们引入了一种称为LAEH(零炮跨模式散列标签属性嵌入)的方法。LAEH首先通过word2vec模型获取标签的初始语义属性向量,然后使用转换网络将其转换为公共子空间。接下来,它利用散列向量和特征相似矩阵来指导不同模式的特征提取网络。同时,LAEH利用属性相似度作为标签相似度的补充,对标签嵌入和公共子空间进行了修正。实验表明,LAEH的性能优于相关的代表性零炮和跨模态散列方法。 摘要:Cross-modal hashing (CMH) is one of the most promising methods in cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume the labels of training and testing set are identical. However, the assumption is often violated, causing a zero-shot CMH problem. Recent efforts to address this issue focus on transferring knowledge from the seen classes to the unseen ones using label attributes. However, the attributes are isolated from the features of multi-modal data. To reduce the information gap, we introduce an approach called LAEH (Label Attributes Embedding for zero-shot cross-modal Hashing). LAEH first gets the initial semantic attribute vectors of labels by word2vec model and then uses a transformation network to transform them into a common subspace. Next, it leverages the hash vectors and the feature similarity matrix to guide the feature extraction network of different modalities. At the same time, LAEH uses the attribute similarity as the supplement of label similarity to rectify the label embedding and common subspace. Experiments show that LAEH outperforms related representative zero-shot and cross-modal hashing methods.
【50】 Open-Set Crowdsourcing using Multiple-Source Transfer Learning 标题:基于多源迁移学习的开集众包 链接:https://arxiv.org/abs/2111.04073
作者:Guangyang Han,Guoxian Yu,Lei Liu,Lizhen Cui,Carlotta Domeniconi,Xiangliang Zhang 机构:College of Computer and Information Sciences, Southwest University, China, School of Software, Shandong University, China, Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, China 备注:8 pages, 1 figure 摘要:我们提出并定义了一个新的众包场景,开放式众包,我们只知道一个不熟悉的众包项目的一般主题,而不知道它的标签空间,即一组可能的标签。这仍然是一个任务注释问题,但对任务和标签空间的不熟悉妨碍了任务和工作人员的建模以及真理推理。我们提出了一个直观的解决方案OSCrowd。首先,OSCrowd将群组主题相关数据集集成到一个大的源域中,以促进部分迁移学习,从而近似这些任务的标签空间推断。接下来,它根据类别相关性为每个源域分配权重。在此之后,它使用多源开放集迁移学习来建模群组任务并分配可能的注释。迁移学习给出的标签空间和注释将用于指导和规范人群工作者的注释。我们在一个在线场景中验证了OSCrowd,并证明OSCrowd解决了开放集众包问题,比相关的众包解决方案工作得更好。 摘要:We raise and define a new crowdsourcing scenario, open set crowdsourcing, where we only know the general theme of an unfamiliar crowdsourcing project, and we don't know its label space, that is, the set of possible labels. This is still a task annotation problem, but the unfamiliarity with the tasks and the label space hampers the modelling of the task and of workers, and also the truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates crowd theme related datasets into a large source domain to facilitate partial transfer learning to approximate the label space inference of these tasks. Next, it assigns weights to each source domain based on category correlation. After this, it uses multiple-source open set transfer learning to model crowd tasks and assign possible annotations. The label space and annotations given by transfer learning will be used to guide and standardize crowd workers' annotations. We validate OSCrowd in an online scenario, and demonstrate that OSCrowd solves the open set crowdsourcing problem and works better than related crowdsourcing solutions.
【51】 DVS: Deep Visibility Series and its Application in Construction Cost Index Forecasting 标题:DVS:深能见度序列及其在工程造价指数预测中的应用 链接:https://arxiv.org/abs/2111.04071
作者:Tianxiang Zhan,Yuanpeng He,Hanwen Li,Fuyuan Xiao 摘要:时间序列预测一直是科学研究的热点。随着人工智能的发展,新的时间序列预测方法通过仿生研究和对以往方法的改进,取得了更好的预测效果和预测性能。在以往的研究中,可视图(VG)算法常用于时间序列预测,但其预测效果不如人工神经网络(ANN)、卷积神经网络(CNN)和长短时记忆网络(LSTM)等深度学习预测方法。VG算法包含了丰富的网络信息,但以往的研究没有有效地利用网络信息进行预测,导致预测误差较大。为了解决这一问题,本文通过对VG的仿生设计和对以往研究的扩展,提出了深度可视系列(DVS)模块,这是首次将VG与仿生设计和深度网络相结合。将生物视觉仿生设计应用于VG,使DVS时间序列获得了较高的预测精度,为时间序列预测做出了贡献。同时,本文将DVS预测方法应用到工程造价指标预测中,具有一定的现实意义。 摘要:Time series forecasting has always been a hot spot in scientific research. With the development of artificial intelligence, new time series forecasting methods have obtained better forecasting effects and forecasting performance through bionic research and improvements to the past methods. Visibility Graph (VG) algorithm is often used for time series prediction in previous research, but the prediction effect is not as good as deep learning prediction methods such as Artificial Neural Network (ANN), Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM) prediction. The VG algorithm contains a wealth of network information, but previous studies did not effectively use the network information to make predictions, resulting in relatively large prediction errors. In order to solve this problem, this paper proposes the Deep Visibility Series (DVS) module through the bionic design of VG and the expansion of the past research, which is the first time to combine VG with bionic design and deep network. By applying the bionic design of biological vision to VG, the time series of DVS has obtained superior forecast accuracy, which has made a contribution to time series forecasting. At the same time, this paper applies the DVS forecasting method to the construction cost index forecast, which has practical significance.
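For readers unfamiliar with the Visibility Graph (VG) that DVS builds on, the following is a minimal sketch of the standard natural visibility construction: two samples are linked when the straight line between them passes strictly above every sample in between. It illustrates the underlying graph only and is not the DVS module. The function name `natural_visibility_graph` is hypothetical, and samples are assumed to be equally spaced in time.

```python
def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of a time series.

    Nodes are sample indices. Samples a < b are connected when every
    intermediate sample c lies strictly below the line joining
    (a, series[a]) and (b, series[b]).
    """
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    # A dip in the middle leaves the two outer samples mutually visible.
    print(sorted(natural_visibility_graph([3.0, 1.0, 2.0])))
    # A peak in the middle blocks the view between the outer samples.
    print(sorted(natural_visibility_graph([1.0, 3.0, 1.0])))
```

This brute-force version checks every pair and is cubic in the series length in the worst case, which is adequate for short index series such as monthly cost data.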
【52】 Crowdsourcing with Meta-Workers: A New Way to Save the Budget 标题:元员工众包:节约预算的新途径 链接:https://arxiv.org/abs/2111.04068
作者:Guangyang Han,Guoxian Yu,Lizhen Cui,Carlotta Domeniconi,Xiangliang Zhang 备注:11 pages, 6 figures 摘要:由于互联网工作者的不可靠性,很难令人满意地完成众包项目,特别是在任务多且预算有限的情况下。近年来,元学习为Few-Shot学习带来了新的活力,使得仅使用少量训练样本就可以获得性能良好的分类器成为可能。这里我们介绍元工作者(meta-worker)的概念,这是一种通过元学习训练的机器注释器,用于非常适合人工智能的任务类型(例如图像分类)。与普通人群工作者不同,元工作者可以是可靠的、稳定的,更重要的是,不知疲倦的、自由的。我们首先对未标记的数据进行聚类,并要求群组工作人员重复标注聚类中心附近的实例;然后,我们利用带注释的数据和元训练数据集,使用不同的元学习算法构建元工作者集群。随后,元工作者被要求对剩余的众包任务进行注释。Jensen-Shannon分歧用于衡量元工作者提供的注释之间的分歧,这决定是否应邀请群组工作者对同一任务进行进一步注释。最后,我们对元工作者的偏好进行建模,并通过加权多数投票计算共识注释。我们的实证研究证实,通过结合机器和人类智能,我们可以以比最先进的任务分配方法更低的预算完成众包项目,同时实现更高或可比的质量。 摘要:Due to the unreliability of Internet workers, it's difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are multiple and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with a fair performance using only a few training samples. Here we introduce the concept of meta-worker, a machine annotator trained by meta learning for types of tasks (e.g., image classification) that are well-fit for AI. Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free. We first cluster unlabeled data and ask crowd workers to repeatedly annotate the instances near the cluster centers; we then leverage the annotated data and meta-training datasets to build a cluster of meta-workers using different meta learning algorithms. Subsequently, meta-workers are asked to annotate the remaining crowdsourced tasks. The Jensen-Shannon divergence is used to measure the disagreement among the annotations provided by the meta-workers, which determines whether or not crowd workers should be invited for further annotation of the same task. Finally, we model meta-workers' preferences and compute the consensus annotation by weighted majority voting.
Our empirical study confirms that, by combining machine and human intelligence, we can accomplish a crowdsourcing project with a lower budget than state-of-the-art task assignment methods, while achieving a superior or comparable quality.
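The disagreement check and the final aggregation described in this abstract can be sketched as follows. This assumes each meta-worker outputs a probability vector over the labels, uses the equal-weight generalized Jensen-Shannon divergence (entropy of the mean minus mean of the entropies), and treats the escalation threshold and the per-worker weights as given. The function names and the threshold value are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(distributions):
    """Equal-weight Jensen-Shannon divergence among several distributions."""
    k = len(distributions)
    mixture = [sum(column) / k for column in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / k


def weighted_majority_vote(labels, weights):
    """Return the label with the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Three hypothetical meta-workers predicting over two labels.
    predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    THRESHOLD = 0.1  # assumed value; would need tuning in practice
    if js_divergence(predictions) > THRESHOLD:
        print("high disagreement: ask crowd workers to annotate this task")
    print(weighted_majority_vote(["cat", "dog", "dog"], [0.9, 0.3, 0.4]))
```

The divergence is zero when all meta-workers agree exactly and reaches one bit when two workers put all their mass on different labels, so a single threshold gives a simple rule for when to spend budget on human annotation.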
【53】 Coordinated Proximal Policy Optimization 标题:协调近邻策略优化 链接:https://arxiv.org/abs/2111.04051
作者:Zifan Wu,Chao Yu,Deheng Ye,Junge Zhang,Haiyin Piao,Hankz Hankui Zhuo 机构:School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China, Tencent AI Lab, Shenzhen, China, Institute of Automation, Chinese Academy of Science, Beijing, China, School of Electronic and Information 摘要:我们提出了协调近端策略优化(CoPPO),一种将原始近端策略优化(PPO)扩展到多代理设置的算法。关键思想在于多个代理在策略更新过程中协调调整步长。我们证明了当优化一个有理论基础的联合目标时,政策改进的单调性,并基于一组近似推导出一个简化的优化目标。然后,我们解释了CoPPO中的这一目标可以实现代理之间的动态信用分配,从而缓解代理策略同时更新期间的高差异问题。最后,我们证明了在典型的多智能体环境下,包括合作矩阵游戏和星际争霸II微观管理任务,CoPPO的表现优于几个强基线,并与最新的多智能体PPO方法(即MAPPO)具有竞争力。 摘要:We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the policy update process among multiple agents. We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations. We then interpret that such an objective in CoPPO can achieve dynamic credit assignment among agents, thereby alleviating the high variance issue during the concurrent update of agent policies. Finally, we demonstrate that CoPPO outperforms several strong baselines and is competitive with the latest multi-agent PPO method (i.e. MAPPO) under typical multi-agent settings, including cooperative matrix games and the StarCraft II micromanagement tasks.
【54】 V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects 标题:V-MAO:关节物体多臂操作的产生式建模 链接:https://arxiv.org/abs/2111.03987
作者:Xingyu Liu,Kris M. Kitani 机构:Robotics Institute, Carnegie Mellon University 备注:CoRL 2021 摘要:操纵铰接对象通常需要多个机器人手臂。让多个机器人手臂协同完成关节对象上的操纵任务是一项挑战。在本文中,我们提出了V-MAO,一个学习关节对象多臂操作的框架。我们的框架包括一个变分生成模型,学习每个机器人手臂在物体刚性部分上的接触点分布。训练信号是通过与仿真环境的交互获得的,仿真环境通过规划和关节对象的以对象为中心的控制的新形式实现。我们在定制的MuJoCo仿真环境中部署了我们的框架,并证明我们的框架在六个不同的对象和两个不同的机器人上实现了较高的成功率。我们还表明,生成建模可以有效地学习关节对象上的接触点分布。 摘要:Manipulating articulated objects requires multiple robot arms in general. It is challenging to enable multiple robot arms to collaboratively complete manipulation tasks on articulated objects. In this paper, we present V-MAO, a framework for learning multi-arm manipulation of articulated objects. Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm. The training signal is obtained from interaction with the simulation environment which is enabled by planning and a novel formulation of object-centric control for articulated objects. We deploy our framework in a customized MuJoCo simulation environment and demonstrate that our framework achieves a high success rate on six different objects and two different robots. We also show that generative modeling can effectively learn the contact point distribution on articulated objects.
【55】 Profitable Trade-Off Between Memory and Performance In Multi-Domain Chatbot Architectures 标题:多域聊天机器人体系结构中内存和性能之间的有利可图的权衡 链接:https://arxiv.org/abs/2111.03963
作者:D Emre Tasar,Sukru Ozan,M Fatih Akca,Oguzhan Olmez,Semih Gulum,Secilay Kutay,Ceren Belhan 备注:in Turkish language. ICADA 21 1st International Conference on Artificial Intelligence and Data Science Nov 26-Nov 28 2021 Izmir Katip Celebi University Izmir, Turkey 摘要:文本分类问题是自然语言处理领域中一个非常广泛的研究领域。简而言之,文本分类问题是确定给定文本属于先前确定的类别中的哪一个。在过去的研究中,已经在这一领域进行了成功的研究。在这项研究中,使用了Transformer的双向编码器表示(BERT),这是自然语言处理领域中解决分类问题的常用方法。通过在聊天机器人体系结构中使用单个模型来解决分类问题,其目的是减轻服务器上的负载,该负载将由用于解决多个分类问题的多个模型创建。在这一点上,在估计单个BERT模型(为多个主题的分类而创建)期间应用掩蔽方法,在基于问题的基础上提供模型的估计。为了使问题复杂化,采用不同的方法划分了三个相互覆盖不同领域的独立数据集,并以这种方式包括了在领域方面非常接近的分类问题。以这种方式使用的数据集由五个分类问题组成,共有154个类。一个包含所有分类问题的BERT模型和专门为这些问题训练的其他BERT模型在性能和它们在服务器上占用的空间方面相互比较。 摘要:Text classification problem is a very broad field of study in the field of natural language processing. In short, the text classification problem is to determine which of the previously determined classes the given text belongs to. Successful studies have been carried out in this field in the past studies. In the study, Bidirectional Encoder Representations from Transformers (BERT), which is a frequently preferred method for solving the classification problem in the field of natural language processing, is used. By solving classification problems through a single model to be used in a chatbot architecture, it is aimed to alleviate the load on the server that will be created by more than one model used for solving more than one classification problem. At this point, with the masking method applied during the estimation of a single BERT model, which was created for classification in more than one subject, the estimation of the model was provided on a problem-based basis.
Three separate data sets covering different fields from each other are divided by various methods in order to complicate the problem, and classification problems that are very close to each other in terms of field are also included in this way. The dataset used in this way consists of five classification problems with 154 classes. A BERT model containing all classification problems and other BERT models trained specifically for the problems were compared with each other in terms of performance and the space they occupied on the server.
【56】 Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods 标题:策略梯度方法的时间离散化-不变安全动作重复 链接:https://arxiv.org/abs/2111.03941
作者:Seohong Park,Jaekyeom Kim,Gunhee Kim 机构:Seoul National University 摘要:在强化学习中,连续时间通常由时间尺度$\delta$离散,由此产生的性能对其高度敏感。在这项工作中,我们寻求为策略梯度(PG)方法找到一个$\delta$不变的算法,该算法无论$\delta$的值是多少,都能很好地执行。我们首先确定了导致PG方法在$\delta \to 0$时失败的根本原因,证明了在一定的随机性假设下,PG估计量的方差在随机环境中可以发散到无穷大。虽然可以使用持续动作或动作重复来保持$\delta$-不变性,但以前的动作重复方法无法立即对随机环境中的意外情况做出反应。因此,我们提出了一种新的$\delta$不变方法,称为安全动作重复(SAR),适用于任何现有的PG算法。SAR可以通过对动作重复期间的状态变化作出自适应反应来处理环境的随机性。我们的经验表明,我们的方法不仅是$\delta$不变的,而且对随机性也具有鲁棒性,在八个具有确定性和随机设置的MuJoCo环境中,优于以前的$\delta$不变方法。我们的代码可在 https://vision.snu.ac.kr/projects/sar 获取。 摘要:In reinforcement learning, continuous time is often discretized by a time scale $\delta$, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a $\delta$-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of $\delta$. We first identify the underlying reasons that cause PG methods to fail as $\delta \to 0$, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to have $\delta$-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel $\delta$-invariant method named Safe Action Repetition (SAR) applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition. We empirically show that our method is not only $\delta$-invariant but also robust to stochasticity, outperforming previous $\delta$-invariant approaches on eight MuJoCo environments with both deterministic and stochastic settings. Our code is available at https://vision.snu.ac.kr/projects/sar.
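One way to read "adaptively reacting to changes in states during action repetition" is to stop repeating an action once the state has drifted too far from where the repetition started. The sketch below illustrates that reading only. The distance-based stopping rule, the `radius` parameter, the function names, and the toy 1-D dynamics are all assumptions for the example and are not taken from the abstract.

```python
def make_step(delta):
    """Toy 1-D dynamics: the state moves by action * delta per environment step.

    `delta` plays the role of the discretization time scale. The dynamics and
    reward are hypothetical and exist only to exercise the loop below.
    """
    def step(state, action):
        next_state = state + action * delta
        reward = -abs(next_state) * delta  # reward scaled by the step duration
        return next_state, reward
    return step


def repeat_action_safely(step, state, action, radius, max_repeats=10_000):
    """Repeat `action` until the state leaves a ball of `radius` around the start.

    Because the stopping rule depends on state change rather than on a fixed
    repeat count, the elapsed physical time is roughly independent of delta.
    """
    start = state
    total_reward = 0.0
    repeats = 0
    while repeats < max_repeats:
        state, reward = step(state, action)
        total_reward += reward
        repeats += 1
        if abs(state - start) > radius:
            break
    return state, total_reward, repeats


for delta in (0.25, 0.125):
    _, _, n = repeat_action_safely(make_step(delta), 0.0, 1.0, radius=1.0)
    print(delta, n, n * delta)
```

In this toy setting the action is held for 5 steps at a time scale of 0.25 and for 9 steps at 0.125, so the elapsed time (1.25 versus 1.125) stays close even though the step count nearly doubles. A fixed repeat count would instead halve the elapsed time.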
【57】 Convolutional Gated MLP: Combining Convolutions & gMLP 标题:卷积门控MLP:结合卷积和gMLP 链接:https://arxiv.org/abs/2111.03940
作者:A. Rajagopal,V. Nirmala 备注:Conference 摘要:据我们所知,这是第一篇将卷积引入选通多层感知器的论文,并促成了这种新型深度学习体系结构的实现。谷歌大脑于2021年5月推出了gMLP。微软于2021年3月在Vision Transformer中引入了卷积。受gMLP和CvT的启发,我们在gMLP中引入了卷积层。CvT结合了卷积和注意力的力量。我们的实现结合了卷积学习和空间选通MLP的优点。此外,本文还可视化了CgMLP是如何学习的。可视化显示CgMLP如何从汽车轮廓等特征中学习。虽然注意力是深度学习最近许多进展的基础,但gMLP提出了一种不使用注意力计算的方法。在基于Transformer的方法中,需要使用大量的训练数据来学习大量的注意矩阵。在gMLP中,通过使用较小的数据集进行迁移学习,新任务的精细调整可能具有挑战性。我们实现了CgMLP,并将其与CIFAR数据集上的gMLP进行了比较。实验结果探索了CgMLP的泛化能力,而gMLP倾向于大幅过度拟合训练数据。总之,本文提出了一种新的深度学习架构,并首次在文献中通过可视化展示了CgMLP的学习机制。 摘要:To the best of our knowledge, this is the first paper to introduce Convolutions to Gated MultiLayer Perceptron and contributes an implementation of this novel Deep Learning architecture. Google Brain introduced the gMLP in May 2021. Microsoft introduced Convolutions in Vision Transformer in Mar 2021. Inspired by both gMLP and CvT, we introduce convolutional layers in gMLP. CvT combined the power of Convolutions and Attention. Our implementation combines the best of Convolutional learning along with spatial gated MLP. Further, the paper visualizes how CgMLP learns. Visualizations show how CgMLP learns from features such as the outline of a car. While Attention was the basis of much of the recent progress in Deep Learning, gMLP proposed an approach that doesn't use Attention computation. In Transformer-based approaches, a large number of Attention matrices need to be learnt using vast amounts of training data. In gMLP, fine-tuning for new tasks by transfer learning with smaller datasets can be challenging. We implement CgMLP and compare it with gMLP on the CIFAR dataset. Experimental results explore the generalization power of CgMLP, while gMLP tends to drastically overfit the training data. To summarize, the paper contributes a novel Deep Learning architecture and demonstrates the learning mechanism of CgMLP through visualizations, for the first time in literature.
【58】 Transformer Based Bengali Chatbot Using General Knowledge Dataset 标题:使用常识数据集的基于Transformer的孟加拉聊天机器人 链接:https://arxiv.org/abs/2111.03937
作者:Abu Kaisar Mohammad Masum,Sheikh Abujar,Sharmin Akter,Nushrat Jahan Ria,Syed Akhter Hossain 机构:Daffodil International University, Independent International University, University of Liberal Arts Bangladesh, Dhaka, Bangladesh 备注:None 摘要:人工智能聊天机器人在从经过训练的数据集中学习后提供了令人印象深刻的响应。在这十年中,大多数研究工作表明,深层神经模型优于任何其他模型。RNN模型通常用于确定与序列相关的问题,如问题和答案。这种方法在seq2seq学习中让每个人都很熟悉。在seq2seq模型机制中,它具有编码器和解码器。编码器嵌入任何输入序列,解码器嵌入输出序列。为了增强seq2seq模型的性能,在编码器和解码器中添加了注意机制。在此之后,transformer模型作为一个高性能模型介绍了自己,该模型具有解决序列相关困境的多重注意机制。与基于RNN的模型相比,该模型减少了训练时间,并且实现了序列转导的最新性能。在本研究中,我们应用了基于孟加拉通用知识问答(QA)数据集的孟加拉通用知识聊天机器人transformer模型。在应用的质量保证数据上,其得分为85.0 BLEU。为了检查Transformer模型性能的比较,我们对seq2seq模型进行了训练,并将注意力集中在我们的数据集上,该数据集得分为23.5 BLEU。 摘要:An AI chatbot provides an impressive response after learning from the trained dataset. In this decade, most of the research work demonstrates that deep neural models superior to any other model. RNN model regularly used for determining the sequence-related problem like a question and it answers. This approach acquainted with everyone as seq2seq learning. In a seq2seq model mechanism, it has encoder and decoder. The encoder embedded any input sequence, and the decoder embedded output sequence. For reinforcing the seq2seq model performance, attention mechanism added into the encoder and decoder. After that, the transformer model has introduced itself as a high-performance model with multiple attention mechanism for solving the sequence-related dilemma. This model reduces training time compared with RNN based model and also achieved state-of-the-art performance for sequence transduction. In this research, we applied the transformer model for Bengali general knowledge chatbot based on the Bengali general knowledge Question Answer (QA) dataset. It scores 85.0 BLEU on the applied QA data. To check the comparison of the transformer model performance, we trained the seq2seq model with attention on our dataset that scores 23.5 BLEU.
【59】 Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits 标题:非平稳对决过程中最优高效的动态后悔算法 链接:https://arxiv.org/abs/2111.03917
作者:Shubham Gupta,Aadirupa Saha 机构:Indian Institute of Science 摘要:研究了非平稳或时变偏好下$K$臂对决老虎机(Dueling Bandits)的动态遗憾最小化问题。这是一个在线学习设置,在该设置中,代理在每轮中选择一对项目,并仅观察这对项目的相对二进制“赢-输”反馈,该反馈从该轮的基本偏好矩阵中取样。我们首先研究了对抗性偏好序列的静态遗憾最小化问题,设计了一个高概率遗憾为$O(\sqrt{KT})$的高效算法。接下来,我们使用类似的算法思想,在两种非平稳性的概念下,提出了一种有效且可证明最优的动态遗憾最小化算法。特别是,我们建立了$\tilde{O}(\sqrt{SKT})$和$\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$动态遗憾保证,$S$是基本偏好关系中的“有效开关”总数,$V_T$是“连续变化”非平稳性的度量。尽管非平稳环境在现实世界系统中具有实用性,但在这项工作之前,尚未研究这些问题的复杂性。我们通过证明在上述两种非平稳性概念下匹配的下界保证来证明我们算法的最优性。最后,我们通过大量的模拟验证了我们的结果,并将我们的算法的有效性与最先进的基线进行了比较。 摘要:We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary `win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high probability regret. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarities. In particular, we establish $\tilde{O}(\sqrt{SKT})$ and $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, $S$ being the total number of `effective-switches' in the underlying preference relations and $V_T$ being a measure of `continuous-variation' non-stationarity. The complexity of these problems has not been studied prior to this work despite the practicability of non-stationary environments in real world systems. We justify the optimality of our algorithms by proving matching lower bound guarantees under both the above-mentioned notions of non-stationarities. Finally, we corroborate our results with extensive simulations and compare the efficacy of our algorithms over state-of-the-art baselines.
【60】 Robust Deep Reinforcement Learning for Quadcopter Control 标题:用于四轴飞行器控制的鲁棒深度强化学习 链接:https://arxiv.org/abs/2111.03915
作者:Aditya M. Deshpande,Ali A. Minai,Manish Kumar 机构:University of Cincinnati, Clifton Ave., Cincinnati, Ohio 备注:6 pages; 3 Figures; Accepted in this https URL 摘要:深度强化学习(RL)使得使用神经网络作为函数逼近器来解决复杂的机器人问题成为可能。然而,在静态环境中训练的策略在从一个环境转移到另一个环境时会受到泛化的影响。在这项工作中,我们使用鲁棒马尔可夫决策过程(RMDP)来训练无人机控制策略,它结合了鲁棒控制和RL的思想。它选择悲观优化来处理从一个环境到另一个环境的策略转移之间的潜在差距。训练后的控制策略在四旋翼机位置控制任务上进行了测试。RL特工在MuJoCo模拟器中接受训练。在测试过程中,使用不同的环境参数(在训练过程中看不到)来验证训练策略从一个环境转移到另一个环境的鲁棒性。在这些环境中,鲁棒策略的性能优于标准代理,这表明增加的鲁棒性增加了通用性,并且可以适应非平稳环境。代码:https://github.com/adipandas/gym_multirotor 摘要:Deep reinforcement learning (RL) has made it possible to solve complex robotics problems using neural networks as function approximators. However, the policies trained on stationary environments suffer in terms of generalization when transferred from one environment to another. In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy, which combines ideas from Robust Control and RL. It opts for pessimistic optimization to handle potential gaps between policy transfer from one environment to another. The trained control policy is tested on the task of quadcopter positional control. RL agents were trained in a MuJoCo simulator. During testing, different environment parameters (unseen during the training) were used to validate the robustness of the trained policy for transfer from one environment to another. The robust policy outperformed the standard agents in these environments, suggesting that the added robustness increases generality and can adapt to non-stationary environments. Codes: https://github.com/adipandas/gym_multirotor
【61】 TND-NAS: Towards Non-differentiable Objectives in Progressive Differentiable NAS Framework 标题:TND-NAS:渐进式可区分NAS框架中的不可区分目标 链接:https://arxiv.org/abs/2111.03892
作者:Bo Lyu,Shiping Wen,Zheng Yan,Kaibo Shi,Ke Li,Tingwen Huang 摘要:与早期基于EA、RL的神经结构搜索(NAS)方法相比,可微结构搜索(Differentiable architecture search)能够提高搜索效率,已逐渐成为神经结构搜索(Neural Architecture Search,NAS)领域的主流研究课题。最近的可微NAS还旨在进一步提高搜索效率,减少GPU内存消耗,并解决“深度差距”问题。然而,这些方法不再能够处理不可微的目标,更不用说多目标,例如性能、鲁棒性、效率和其他指标。我们提出了一个面向不可微目标的端到端体系结构搜索框架TND-NAS,该框架具有可微NAS框架的高效性和多目标NAS(MNAS)中不可微度量之间的兼容性。在可微NAS框架下,随着搜索空间的不断放松,TND-NAS在离散空间中优化了体系结构参数($\alpha$),同时采用将超级网络逐步缩小$\alpha$的搜索策略。我们的代表性实验以两个目标(参数、精度)为例,在CIFAR10(1.09M/3.3%、2.4M/2.95%、9.57M/2.54%)和CIFAR100(2.46M/18.3%、5.46/16.73%、12.88/15.20%)数据集上实现了一系列高性能的紧凑体系结构。有利的是,在现实场景(资源受限、平台专用)下,TND-NAS可以方便地获得帕累托最优解决方案。 摘要:Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) for its capability to improve efficiency compared with the early NAS (EA-based, RL-based) methods. Recent differentiable NAS also aims at further improving search efficiency, reducing the GPU-memory consumption, and addressing the "depth gap" issue. However, these methods are no longer capable of tackling the non-differentiable objectives, let alone multi-objectives, e.g., performance, robustness, efficiency, and other metrics. We propose an end-to-end architecture search framework towards non-differentiable objectives, TND-NAS, with the merits of the high efficiency in differentiable NAS framework and the compatibility among non-differentiable metrics in Multi-objective NAS (MNAS). Under differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS has the architecture parameters ($\alpha$) been optimized in discrete space, while resorting to the search policy of progressively shrinking the supernetwork by $\alpha$. Our representative experiment takes two objectives (Parameters, Accuracy) as an example, we achieve a series of high-performance compact architectures on CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46/16.73%, 12.88/15.20%) datasets.
Favorably, under real-world scenarios (resource-constrained, platform-specialized), the Pareto-optimal solutions can be conveniently reached by TND-NAS.
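To make "Pareto-optimal" concrete for the two objectives in this abstract, the sketch below filters candidate architectures by (parameter count, error) and then picks the best one under a size budget. It is an illustration only: the labels `small`, `medium`, and `large` are hypothetical, reading the reported percentages as test error is an assumption, and none of this reflects how TND-NAS itself searches.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (params, error); lower is better for both."""
    front = []
    for name, params, err in candidates:
        dominated = any(
            p <= params and e <= err and (p < params or e < err)
            for _, p, e in candidates
        )
        if not dominated:
            front.append((name, params, err))
    return front


def pick_under_budget(candidates, max_params):
    """Lowest-error Pareto-optimal candidate within a parameter budget, or None."""
    feasible = [c for c in pareto_front(candidates) if c[1] <= max_params]
    return min(feasible, key=lambda c: c[2]) if feasible else None


# CIFAR10 figures from the abstract: (label, parameters in millions, percentage).
# The labels are hypothetical; the percentage is assumed to be test error.
reported = [("small", 1.09, 3.3), ("medium", 2.4, 2.95), ("large", 9.57, 2.54)]
```

The three reported CIFAR10 architectures are mutually non-dominated, since each larger model has lower error, so the choice among them depends only on the deployment budget passed to `pick_under_budget`.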
【62】 What augmentations are sensitive to hyper-parameters and why? 标题:哪些增量对超参数敏感?为什么? 链接:https://arxiv.org/abs/2111.03861
作者:Ch Muhammad Awais,Imad Eddine Ibrahim Bekkouch 机构:Machine Learning and, Knowledge Representation Lab, Innopolis University, Innopolis, Russia, BEKKOUCH Imad Eddine Ibrahim, Sorbonne Center for Artificial, Intelligence - SCAI, Sorbonne University, Paris, France 备注:10 pages, 17 figures 摘要:我们对数据集进行增强,以提高预测质量,并使最终模型更能适应噪声数据和域漂移。但问题仍然存在,这些增强如何在不同的超参数下运行?在本研究中,我们通过对机器学习模型应用不同增强时超参数的影响进行局部代理(LIME)解释,评估了增强对模型超参数的敏感性及其一致性和影响。我们使用线性回归系数对每个增强进行加权。我们的研究证明,有些增强对超参数高度敏感,而另一些增强则更具弹性、更可靠。 摘要:We apply augmentations to our dataset to enhance the quality of our predictions and make our final models more resilient to noisy data and domain drifts. Yet the question remains, how are these augmentations going to perform with different hyper-parameters? In this study we evaluate the sensitivity of augmentations with regard to the model's hyper-parameters along with their consistency and influence by performing a Local Surrogate (LIME) interpretation on the impact of hyper-parameters when different augmentations are applied to a machine learning model. We have utilized Linear regression coefficients for weighing each augmentation. Our research has proved that there are some augmentations which are highly sensitive to hyper-parameters and others which are more resilient and reliable.
【63】 SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines 标题:SIG-VC:一种说话人信息制导的人机零发声转换系统 链接:https://arxiv.org/abs/2111.03811
作者:Zhang Haozhe,Cai Zexin,Qin Xiaoyi,Li Ming 机构:Duke Kunshan University, China, School of Computer Science, Wuhan University 摘要:如今,随着越来越多的系统在传统的语音转换(VC)任务中取得了良好的性能,人们的注意力逐渐转向极端条件下的VC任务。在本文中,我们提出了一种新的Zero-Shot语音转换方法。我们的目标是获得说话人内容分离的中间表示,以便更好地去除说话人信息,获得纯内容信息。因此,我们提出的框架包含一个模块,该模块从源说话人的声学特征中去除说话人信息。此外,系统还增加了说话人信息控制,以保持语音克隆性能。通过主观和客观指标对所提出的系统进行评估。结果表明,我们提出的系统显著减少了Zero-Shot语音转换中的折衷问题,同时对说话人验证系统具有较高的欺骗能力。 摘要:Nowadays, as more and more systems achieve good performance in traditional voice conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice conversion. We aim to obtain intermediate representations for speaker-content disentanglement of speech to better remove speaker information and get pure content information. Accordingly, our proposed framework contains a module that removes the speaker information from the acoustic feature of the source speaker. Moreover, speaker information control is added to our system to maintain the voice cloning performance. The proposed system is evaluated by subjective and objective metrics. Results show that our proposed system significantly reduces the trade-off problem in zero-shot voice conversion, while it also manages to have high spoofing power to the speaker verification system.
【64】 Development of collective behavior in newborn artificial agents 标题:新生儿人工智能体中集体行为的发展 链接:https://arxiv.org/abs/2111.03796
作者:Donsuk Lee,Samantha M. W. Wood,Justin N. Wood 机构:Informatics Department, Indiana University, United States, Center for the Integrated Study of Animal Behavior, Indiana University, United States, Cognitive Science Program, Indiana University, United States 摘要:集体行为在动物王国中广泛存在。然而,迄今为止,集体行为的发展和机制基础尚未正式确立。什么样的学习机制推动了新生动物集体行为的发展?在这里,我们使用深层强化学习和好奇心驱动学习——这两种学习机制深深植根于心理学和神经科学研究——来构建发展集体行为的新生人工智能体。像新生动物一样,我们的代理人从自然环境中的原始感官输入中学习集体行为。我们的代理人也学习集体行为,没有外部奖励,只使用内在动机(好奇心)来推动学习。具体来说,当我们在自然视觉环境中与团队成员一起培养人工智能体时,智能体会自发地发展自我运动、物体识别和对团队成员的偏好,从而快速学习集体行为所需的所有核心技能。这项工作弥合了高维感官输入和集体行动之间的鸿沟,从而形成了一个集体动物行为的像素到行动模型。更一般地说,我们证明了两种通用的学习机制——深度强化学习和好奇心驱动学习——足以从无监督的自然经验中学习集体行为。 摘要:Collective behavior is widespread across the animal kingdom. To date, however, the developmental and mechanistic foundations of collective behavior have not been formally established. What learning mechanisms drive the development of collective behavior in newborn animals? Here, we used deep reinforcement learning and curiosity-driven learning -- two learning mechanisms deeply rooted in psychological and neuroscientific research -- to build newborn artificial agents that develop collective behavior. Like newborn animals, our agents learn collective behavior from raw sensory inputs in naturalistic environments. Our agents also learn collective behavior without external rewards, using only intrinsic motivation (curiosity) to drive learning. Specifically, when we raise our artificial agents in natural visual environments with groupmates, the agents spontaneously develop ego-motion, object recognition, and a preference for groupmates, rapidly learning all of the core skills required for collective behavior. This work bridges the divide between high-dimensional sensory inputs and collective action, resulting in a pixels-to-actions model of collective animal behavior. 
More generally, we show that two generic learning mechanisms -- deep reinforcement learning and curiosity-driven learning -- are sufficient to learn collective behavior from unsupervised natural experience.
【65】 Generation of microbial colonies dataset with deep learning style transfer 标题:基于深度学习风格迁移的微生物菌落数据集生成 链接:https://arxiv.org/abs/2111.03789
作者:Jarosław Pawłowski,Sylwia Majchrowska,Tomasz Golan 机构:NeuroSYS, Rybacka ,-, Wrocław, Poland, Wroclaw University of Science and Technology, Wybrzeże S. Wyspiańskiego ,-, Wrocław, Poland 备注:11 pages, 9 figures, 2 tables 摘要:我们引入了一种有效的策略来生成培养皿(Petri dish)微生物图像的合成数据集,该数据集可用于训练深度学习模型。开发的生成器采用传统的计算机视觉算法和用于数据增强的神经风格迁移方法。我们表明,该方法能够合成具有真实感的图像数据集,用于训练能够定位、分割和分类五种不同微生物物种的神经网络模型。与收集和标注一整套大规模真实图像相比,我们的方法获得有用数据集所需的资源要少得多。我们证明,仅从100幅真实图像开始,我们就可以生成数据来训练一个检测器,其结果与在大几十倍的真实数据集上训练的同一检测器相当。我们证明了该方法在微生物检测和分割中的有效性,但我们希望它具有通用性和灵活性,并且也可以应用于其他科学和工业领域,以检测各种物体。 摘要:We introduce an effective strategy to generate a synthetic dataset of microbiological images of Petri dishes that can be used to train deep learning models. The developed generator employs traditional computer vision algorithms together with a neural style transfer method for data augmentation. We show that the method is able to synthesize a dataset of realistic looking images that can be used to train a neural network model capable of localising, segmenting, and classifying five different microbial species. Our method requires significantly fewer resources to obtain a useful dataset than collecting and labeling a whole large set of real images with annotations. We show that starting with only 100 real images, we can generate data to train a detector that achieves comparable results to the same detector but trained on a real, several dozen times bigger dataset. We prove the usefulness of the method in microbe detection and segmentation, but we expect that it is general and flexible and can also be applicable in other domains of science and industry to detect various objects.
【66】 d3rlpy: An Offline Deep Reinforcement Learning Library 标题:d3rlpy:一个离线深度强化学习库 链接:https://arxiv.org/abs/2111.03788
作者:Takuma Seno,Michita Imai 机构:Keio University, Sony AI 备注:Accepted at Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2021 摘要:在本文中,我们介绍了d3rlpy,一个用于Python的开源离线深度强化学习(RL)库。d3rlpy通过用户友好的API支持许多离线深度RL算法以及在线算法。为了协助深度RL研究和开发项目,d3rlpy提供了实用且独特的功能,如数据收集、导出部署策略、预处理和后处理、分布式Q函数、多步骤学习和方便的命令行界面。此外,d3rlpy还提供了一种新颖的图形界面,使用户能够在不编写程序的情况下训练离线RL算法。最后,使用D4RL数据集对实现的算法进行了基准测试,以确保实现质量。d3rlpy源代码可以在GitHub上找到:https://github.com/takuseno/d3rlpy 摘要:In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a number of offline deep RL algorithms as well as online algorithms via a user-friendly API. To assist deep RL research and development projects, d3rlpy provides practical and unique features such as data collection, exporting policies for deployment, preprocessing and postprocessing, distributional Q-functions, multi-step learning and a convenient command-line interface. Furthermore, d3rlpy additionally provides a novel graphical interface that enables users to train offline RL algorithms without coding programs. Lastly, the implemented algorithms are benchmarked with D4RL datasets to ensure the implementation quality. The d3rlpy source code can be found on GitHub: https://github.com/takuseno/d3rlpy
【67】 Confidence Composition for Monitors of Verification Assumptions 标题:验证假设监视器的置信度合成 链接:https://arxiv.org/abs/2111.03782
作者:Ivan Ruchkin,Matthew Cleaveland,Radoslav Ivanov,Pengyuan Lu,Taylor Carpenter,Oleg Sokolsky,Insup Lee 机构:University of Pennsylvania, Philadelphia, Pennsylvania 摘要:在某些假设下,使用神经网络控制器对网络物理系统进行闭环验证提供了强大的安全保证。然而,很难确定这些保证是否在运行时适用,因为可能会违反验证假设。为了预测验证系统中的安全违规行为,我们提出了一个三步框架,用于监控验证假设的可信度。首先,我们用假设上的命题逻辑公式表示验证安全性的充分条件。其次,我们建立校准的置信度监控器,评估每个假设成立的概率。第三,我们通过使用适用于逻辑公式的组合函数组合假设监控器来获得验证保证的置信度。我们的框架为成分监测器的校准和保守性提供了理论界限。在两个案例研究中,我们证明了复合监测器比其成分有所改善,并成功预测了安全违规行为。 摘要:Closed-loop verification of cyber-physical systems with neural network controllers offers strong safety guarantees under certain assumptions. It is, however, difficult to determine whether these guarantees apply at run time because verification assumptions may be violated. To predict safety violations in a verified system, we propose a three-step framework for monitoring the confidence in verification assumptions. First, we represent the sufficient condition for verified safety with a propositional logical formula over assumptions. Second, we build calibrated confidence monitors that evaluate the probability that each assumption holds. Third, we obtain the confidence in the verification guarantees by composing the assumption monitors using a composition function suitable for the logical formula. Our framework provides theoretical bounds on the calibration and conservatism of compositional monitors. In two case studies, we demonstrate that the composed monitors improve over their constituents and successfully predict safety violations.
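The third step, composing per-assumption confidences according to a propositional formula, can be sketched as below. This is a minimal illustration under a strong assumption that is not stated in the abstract: the assumptions are treated as statistically independent and each appears at most once in the formula, so conjunction becomes a product. The tuple encoding of formulas, the function name `compose`, and the assumption names are hypothetical, and the paper's own composition functions may differ.

```python
def compose(formula, confidence):
    """Estimate the probability that `formula` holds from per-assumption confidences.

    `formula` is either an assumption name (str) or a tuple
    ("and" | "or" | "not", sub_formula, ...). Independence of the
    sub-formulas is assumed throughout; this is an illustrative choice.
    """
    if isinstance(formula, str):
        return confidence[formula]
    op, *args = formula
    values = [compose(arg, confidence) for arg in args]
    if op == "not":
        return 1.0 - values[0]
    if op == "and":
        result = 1.0
        for value in values:
            result *= value
        return result
    if op == "or":
        none_hold = 1.0
        for value in values:
            none_hold *= 1.0 - value
        return 1.0 - none_hold
    raise ValueError(f"unknown operator: {op}")


# Hypothetical sufficient condition: a1 must hold, and at least one of a2, a3.
safety_condition = ("and", "a1", ("or", "a2", "a3"))
monitor_outputs = {"a1": 0.5, "a2": 0.5, "a3": 0.5}
print(compose(safety_condition, monitor_outputs))
```

When independence cannot be justified, a more conservative composition is available: for a conjunction of two assumptions, max(0, p + q - 1) is a lower bound on the joint probability that holds regardless of how the assumptions are correlated.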
【68】 Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation 标题:时空图学习与基于模型估计的异步协同定位 链接:https://arxiv.org/abs/2111.03751
作者:Peng Gao,Brian Reily,Rui Guo,Hongsheng Lu,Qingzhao Zhu,Hao Zhang 摘要:协作定位是一组机器人(如连接的车辆)从多个角度协作估计目标位置并进行可靠协作的基本能力。为了实现协作定位,必须解决四个关键挑战,包括建模观察对象之间的复杂关系、融合来自任意数量协作机器人的观察结果、量化定位不确定性以及解决机器人通信延迟问题。在本文中,我们介绍了一种新的方法,该方法集成了不确定性感知时空图学习和基于模型的状态估计,用于一组机器人协作定位对象。具体而言,我们引入了一种新的不确定性感知图学习模型,该模型学习时空图来表示每个机器人随时间观察到的对象的历史运动,并提供对象定位中的不确定性。此外,我们还提出了一种新的集成学习和基于模型的状态估计方法,该方法融合了从任意数量的机器人获得的异步观测数据,用于协作定位。我们在仿真和真实机器人的两个协作对象定位场景中评估了我们的方法。实验结果表明,我们的方法在异步协作定位方面优于以往的方法,并取得了最新的性能。 摘要:Collaborative localization is an essential capability for a team of robots such as connected vehicles to collaboratively estimate object locations from multiple perspectives with reliant cooperation. To enable collaborative localization, four key challenges must be addressed, including modeling complex relationships between observed objects, fusing observations from an arbitrary number of collaborating robots, quantifying localization uncertainty, and addressing latency of robot communications. In this paper, we introduce a novel approach that integrates uncertainty-aware spatiotemporal graph learning and model-based state estimation for a team of robots to collaboratively localize objects. Specifically, we introduce a new uncertainty-aware graph learning model that learns spatiotemporal graphs to represent historical motions of the objects observed by each robot over time and provides uncertainties in object localization. Moreover, we propose a novel method for integrated learning and model-based state estimation, which fuses asynchronous observations obtained from an arbitrary number of robots for collaborative localization. We evaluate our approach in two collaborative object localization scenarios in simulations and on real robots. Experimental results show that our approach outperforms previous methods and achieves state-of-the-art performance on asynchronous collaborative localization.
【69】 An Algorithmic Theory of Metacognition in Minds and Machines 标题:心理与机器元认知的算法理论 链接:https://arxiv.org/abs/2111.03745
作者:Rylan Schaeffer 机构:Department of Computer Science, Stanford University 摘要:人类有时会选择自己认为是次优或错误的行为,即使在没有额外信息的情况下也是如此。这怎么可能?我们提出了一种元认知算法理论,该理论基于强化学习(RL)中基于价值的RL和基于策略的RL之间的权衡。对于认知(神经)科学界来说,我们的理论回答了一个悬而未决的问题:为什么信息可以用于错误检测,而不能用于行动选择。对于机器学习社区,我们提出的理论在演员-评论家代理中创建了演员和评论家之间的新型交互,并注意到RL和贝叶斯优化之间的新型联系。我们把我们提出的代理称为元认知演员批评家(MAC)。最后,我们展示了如何通过实现深度MAC在机器中创建元认知,并展示了它可以在没有外部信息或延迟的情况下检测(某些)自己的次优行为。 摘要:Humans sometimes choose actions that they themselves can identify as sub-optimal, or wrong, even in the absence of additional information. How is this possible? We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning (RL) between value-based RL and policy-based RL. To the cognitive (neuro)science community, our theory answers the outstanding question of why information can be used for error detection but not for action selection. To the machine learning community, our proposed theory creates a novel interaction between the Actor and Critic in Actor-Critic agents and notes a novel connection between RL and Bayesian Optimization. We call our proposed agent the Metacognitive Actor Critic (MAC). We conclude with showing how to create metacognition in machines by implementing a deep MAC and showing that it can detect (some of) its own suboptimal actions without external information or delay.
【70】 Increasing Data Diversity with Iterative Sampling to Improve Performance 标题:通过迭代采样提高数据多样性以提高性能 链接:https://arxiv.org/abs/2111.03743
作者:Devrim Cavusoglu,Ogulcan Eryuksel,Sinan Altinuc 机构:OBSS AI 备注:5 pages, 2 (6) figures, to be published in 1st NeurIPS Data-Centric AI Workshop 摘要:作为以数据为中心的人工智能竞赛的一部分,我们提出了一种以数据为中心的方法,通过迭代采样提高训练样本的多样性。该方法本身强烈依赖于增强样本的保真度和增强方法的多样性。此外,我们还通过为困难类引入更多样本,特别是提供更接近边缘情况的样本,进一步提高了性能,这些样本可能会被现有模型误分类。 摘要:As a part of the Data-Centric AI Competition, we propose a data-centric approach to improve the diversity of the training samples by iterative sampling. The method itself relies strongly on the fidelity of augmented samples and the diversity of the augmentation methods. Moreover, we improve the performance further by introducing more samples for the difficult classes, in particular samples close to edge cases that the model at hand may misclassify.
【71】 Shared Model of Sense-making for Human-Machine Collaboration 标题:人机协作的共享意义生成模型 链接:https://arxiv.org/abs/2111.03728
作者:Gheorghe Tecuci,Dorin Marcu,Louis Kaiser,Mihai Boicu 机构:Learning Agents Center, School of Computing, George Mason University, Fairfax, VA , USA 备注:Presented at AAAI FSS-21: Artificial Intelligence in Government and Public Sector, Washington, DC, USA 摘要:我们提出了一个感知模型,它极大地促进了智能分析师和基于知识的代理之间的协作。这是一个基于证据科学和假设生成和检验的科学方法的通用模型,其中生成解释观察的有意义假设,然后发现相关证据,并根据发现的证据检验假设。我们说明了该模型如何使分析员能够直接指导代理人了解涉及可能生产武器(如化学战剂)的情况,以及代理人如何越来越有能力从该领域了解其他情况(例如,可能生产离心浓缩铀或隐形战斗机)。 摘要:We present a model of sense-making that greatly facilitates the collaboration between an intelligent analyst and a knowledge-based agent. It is a general model grounded in the science of evidence and the scientific method of hypothesis generation and testing, where sense-making hypotheses that explain an observation are generated, relevant evidence is then discovered, and the hypotheses are tested based on the discovered evidence. We illustrate how the model enables an analyst to directly instruct the agent to understand situations involving the possible production of weapons (e.g., chemical warfare agents) and how the agent becomes increasingly more competent in understanding other situations from that domain (e.g., possible production of centrifuge-enriched uranium or of stealth fighter aircraft).
【72】 Reconstructing Training Data from Diverse ML Models by Ensemble Inversion 标题:基于集成反演的不同ML模型训练数据重构 链接:https://arxiv.org/abs/2111.03702
作者:Qian Wang,Daniel Kurz 备注:9 pages, 8 figures, WACV 2022 摘要:模型反演(MI)是指对手滥用对经过训练的机器学习(ML)模型的访问权,试图推断其原始训练数据的敏感信息,引起了越来越多的研究关注。在MI期间,受攻击训练模型(MUA)通常被冻结并用于指导生成器(如生成性对抗网络(GAN))的训练,以重建该模型原始训练数据的分布。这可能会导致原始训练样本泄漏,如果成功,如果训练数据包含个人识别信息(PII),则数据集受试者的隐私将受到威胁。因此,深入研究MI技术的潜力对于相应防御技术的发展至关重要。基于单一模型的高质量训练数据重建具有挑战性。然而,现有的MI文献并未探讨联合瞄准多个模型,这可能会为对手提供额外的信息和不同的视角。我们提出了集合反演技术,该技术通过训练受集合(或集合)约束的生成器来估计原始训练数据的分布。与单个ML模型的MI相比,使用数据集实体的可区分特征生成的样本的质量显著提高。我们在没有任何数据集的情况下获得了高质量的结果,并展示了如何利用与假定训练数据相似的辅助数据集来改进结果。深入研究了集合中模型多样性的影响,并利用附加约束来鼓励对重建样本进行精确预测和高激活,从而更准确地重建训练图像。 摘要:Model Inversion (MI), in which an adversary abuses access to a trained Machine Learning (ML) model attempting to infer sensitive information about its original training data, has attracted increasing research attention. During MI, the trained model under attack (MUA) is usually frozen and used to guide the training of a generator, such as a Generative Adversarial Network (GAN), to reconstruct the distribution of the original training data of that model. This might cause leakage of original training samples, and if successful, the privacy of dataset subjects will be at risk if the training data contains Personally Identifiable Information (PII). Therefore, an in-depth investigation of the potentials of MI techniques is crucial for the development of corresponding defense techniques. High-quality reconstruction of training data based on a single model is challenging. However, existing MI literature does not explore targeting multiple models jointly, which may provide additional information and diverse perspectives to the adversary. We propose the ensemble inversion technique that estimates the distribution of original training data by training a generator constrained by an ensemble (or set) of trained models with shared subjects or entities. This technique leads to noticeable improvements of the quality of the generated samples with distinguishable features of the dataset entities compared to MI of a single ML model. 
We achieve high quality results without any dataset and show how utilizing an auxiliary dataset that's similar to the presumed training data improves the results. The impact of model diversity in the ensemble is thoroughly investigated and additional constraints are utilized to encourage sharp predictions and high activations for the reconstructed samples, leading to more accurate reconstruction of training images.
【73】 A space of goals: the cognitive geometry of informationally bounded agents 标题:目标空间:信息受限主体的认知几何 链接:https://arxiv.org/abs/2111.03699
作者:Karen Archer,Nicola Catenacci Volpi,Franziska Bröker,Daniel Polani 机构:Adaptive Systems Group, Department of Computer Science, University of Hertfordshire, Hatfield, United Kingdom, Gatsby Computational Neuroscience Unit, University College London, London, United 备注:Includes supplementary material, 5 figures in the main document, 1 figure in the supplementary material 摘要:传统上,科学家将欧几里德几何视为先验和客观的。然而,当我们站在智能体的立场上时,选择最佳路径的问题也应该考虑智能体的能力、它的具身性,特别是它的认知努力。在本文中,我们通过将信息处理成本与相应的空间距离相结合,从世界中各状态之间移动的角度来考虑几何。随着信息成本变得越来越重要,由此产生的几何与给定世界的原始几何越来越不同。我们通过将这种“认知几何”投射到二维和三维空间来将其可视化,显示出明显的扭曲,反映出认知策略和信息节约策略以及枢纽状态的出现。传统的基于成本的几何与由额外信息成本引起的几何之间的相似性,促使我们将传统的测地线(最廉价路径)概念推广为infodesics概念。关键的是,infodesics的概念近似于通常的几何特性,即沿测地线从起点移动到目标,不仅目标,而且所有中间点都以从起点出发的最佳成本被访问。 摘要:Traditionally, Euclidean geometry is treated by scientists as a priori and objective. However, when we take the position of an agent, the problem of selecting a best route should also factor in the abilities of the agent, its embodiment and particularly its cognitive effort. In this paper we consider geometry in terms of travel between states within a world by incorporating information processing costs with the appropriate spatial distances. This induces a geometry that increasingly differs from the original geometry of the given world, as information costs become increasingly important. We visualize this "cognitive geometry" by projecting it onto 2- and 3-dimensional spaces showing distinct distortions reflecting the emergence of epistemic and information-saving strategies as well as pivot states. The analogies between traditional cost-based geometries and those induced by additional informational costs invite a generalization of the traditional notion of geodesics as cheapest routes towards the notion of infodesics. Crucially, the concept of infodesics approximates the usual geometric property that, travelling from a start to a goal along a geodesic, not only the goal, but all intermediate points are equally visited at optimal cost from the start.
【74】 AI and Blackness: Towards moving beyond bias and representation 标题:人工智能与黑色:走向超越偏见和表征 链接:https://arxiv.org/abs/2111.03687
作者:Christopher L. Dancy,P. Khalil Saucier 备注:10 pages, 3 figures, 2 tables 摘要:在本文中,我们认为人工智能伦理必须超越基于种族的代表性和偏见的概念,转向那些探索影响这些系统的设计、开发和部署的更深层次关系的概念。最近关于人工智能系统中偏见的伦理考虑的许多讨论都集中在种族偏见上。我们主张AI中的反黑度需要更多地检查本体论空间,这为人工智能系统的设计、开发和部署提供了基础。我们从设计、开发和部署人工智能系统的社会文化背景的角度来研究这一争论意味着什么,并将重点放在与反黑人种族主义(反黑人)的交叉点上。为了将这些多个视角结合在一起,并展示一个反黑的例子,我们讨论了审计现有开源语义网络(ConceptNet)的结果。我们利用这一讨论进一步分析了人工智能系统设计、开发和部署中的反黑问题,并提出了在试图对抗人工智能系统中的反黑问题时可能提出的问题。 摘要:In this paper, we argue that AI ethics must move beyond the concepts of race-based representation and bias, and towards those that probe the deeper relations that impact how these systems are designed, developed, and deployed. Many recent discussions on ethical considerations of bias in AI systems have centered on racial bias. We contend that antiblackness in AI requires more of an examination of the ontological space that provides a foundation for the design, development, and deployment of AI systems. We examine what this contention means from the perspective of the sociocultural context in which AI systems are designed, developed, and deployed and focus on intersections with anti-Black racism (antiblackness). To bring these multiple perspectives together and show an example of antiblackness in the face of attempts at de-biasing, we discuss results from auditing an existing open-source semantic network (ConceptNet). We use this discussion to further contextualize antiblackness in design, development, and deployment of AI systems and suggest questions one may ask when attempting to combat antiblackness in AI systems.
【75】 Human Activity Recognition using Attribute-Based Neural Networks and Context Information 标题:基于属性神经网络和上下文信息的人体活动识别 链接:https://arxiv.org/abs/2111.04564
作者:Stefan Lüdtke,Fernando Moya Rueda,Waqas Ahmed,Gernot A. Fink,Thomas Kirste 机构:Institute of Visual & Analytic Computing, University of Rostock, Germany, Department of Computer Science, TU Dortmund University, Germany 备注:3rd International Workshop on Deep Learning for Human Activity Recognition 摘要:我们从手工工作过程中的可穿戴传感器数据中考虑人类活动识别(HAR),如仓库订单拣选。这种结构化域通常可以划分为不同的过程步骤,例如打包或运输。每个过程步骤在活动类别上可能具有不同的先验分布,例如站立或行走,以及不同的系统动力学。在这里,我们展示了如何将这些上下文信息系统地集成到基于深度神经网络的HAR系统中。具体而言,我们提出了一种混合体系结构,该结构结合了从原始传感器数据估计高级运动描述符、属性的深层神经网络和从估计属性和(可选)上下文信息预测活动类的浅层分类器,如当前执行的过程步骤。我们的经验表明,与最先进的方法相比,我们提出的体系结构提高了HAR性能。此外,我们还表明,当包含有关流程步骤的信息时,HAR性能可以进一步提高,即使该信息仅部分正确。 摘要:We consider human activity recognition (HAR) from wearable sensor data in manual-work processes, like warehouse order-picking. Such structured domains can often be partitioned into distinct process steps, e.g., packaging or transporting. Each process step can have a different prior distribution over activity classes, e.g., standing or walking, and different system dynamics. Here, we show how such context information can be integrated systematically into a deep neural network-based HAR system. Specifically, we propose a hybrid architecture that combines a deep neural network-that estimates high-level movement descriptors, attributes, from the raw-sensor data-and a shallow classifier, which predicts activity classes from the estimated attributes and (optional) context information, like the currently executed process step. We empirically show that our proposed architecture increases HAR performance, compared to state-of-the-art methods. Additionally, we show that HAR performance can be further increased when information about process steps is incorporated, even when that information is only partially correct.
【76】 Representation Learning via Quantum Neural Tangent Kernels 标题:基于量子神经正切核的表示学习 链接:https://arxiv.org/abs/2111.04225
作者:Junyu Liu,Francesco Tacchino,Jennifer R. Glick,Liang Jiang,Antonio Mezzacapo 机构:Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL, USA, Chicago Quantum Exchange, Chicago, IL, USA, Kadanoff Center for Theoretical Physics, The University of Chicago, Chicago, IL, USA 备注:40=11+29 pages, many figures 摘要:变分量子电路用于量子机器学习和变分量子模拟任务。设计好的变分电路或预测它们在给定的学习或优化任务中的表现仍不清楚。在这里,我们讨论这些问题,用神经正切核理论分析变分量子电路。我们定义了量子神经正切核,并推导了优化和学习任务中相关损失函数的动力学方程。我们解析地求解冻结极限或惰性训练区域中的动力学,其中变分角变化缓慢,线性扰动足够好。我们将分析扩展到动态环境,包括变分角度的二次修正。然后我们考虑混合量子经典体系结构,并定义了混合核的大宽度极限,表明混合量子经典神经网络可以近似高斯。本文给出的结果表明,对用于量子机器学习和优化问题的变分量子电路的训练动力学的分析理解是可能的。这些分析结果得到了量子机器学习实验数值模拟的支持。 摘要:Variational quantum circuits are used in quantum machine learning and variational quantum simulation tasks. Designing good variational circuits or predicting how well they perform for given learning or optimization tasks is still unclear. Here we discuss these problems, analyzing variational quantum circuits using the theory of neural tangent kernels. We define quantum neural tangent kernels, and derive dynamical equations for their associated loss function in optimization and learning tasks. We analytically solve the dynamics in the frozen limit, or lazy training regime, where variational angles change slowly and a linear perturbation is good enough. We extend the analysis to a dynamical setting, including quadratic corrections in the variational angles. We then consider hybrid quantum-classical architecture and define a large width limit for hybrid kernels, showing that a hybrid quantum-classical neural network can be approximately Gaussian. The results presented here show limits for which analytical understandings of the training dynamics for variational quantum circuits, used for quantum machine learning and optimization problems, are possible. These analytical results are supported by numerical simulations of quantum machine learning experiments.
【77】 Dense Representative Tooth Landmark/axis Detection Network on 3D Model 标题:基于三维模型的密集代表性牙齿标志点/轴线检测网络 链接:https://arxiv.org/abs/2111.04212
作者:Guangshun Wei,Zhiming Cui,Jie Zhu,Lei Yang,Yuanfeng Zhou,Pradeep Singh,Min Gu,Wenping Wang 备注:11 pages, 27 figures 摘要:人工智能(AI)技术越来越多地用于数字正畸,但其中一个挑战是自动准确地检测牙齿的标志点和轴线。这部分是因为它们的复杂几何定义,部分是由于单个牙齿和不同类型牙齿之间的巨大差异。因此,我们提出了一种使用由专业牙医标注的数据集的深度学习方法,用于牙齿模型上的牙齿标志点/轴线检测,这对于正畸治疗至关重要。我们的方法不仅可以提取以点(如牙尖)形式存在的牙齿标志点,还可以提取测量牙齿角度和倾斜度的轴线。该网络以三维牙齿模型作为输入,预测各种类型的牙齿标志点和轴线。具体来说,我们将标志点和轴线编码为牙齿模型表面上定义的密集场。这种设计选择和一组添加的组件使得所提出的网络更适合从给定的三维牙齿模型中提取稀疏的标志点。在一组由经验丰富的牙医准备的牙科模型上对所提出的方法进行了广泛的评估。结果表明,该方法能够生成高精度的牙齿标志点。我们的方法通过与最先进的方法以及消融研究的比较进行了检验和论证。 摘要:Artificial intelligence (AI) technology is increasingly used for digital orthodontics, but one of the challenges is to automatically and accurately detect tooth landmarks and axes. This is partly because of their sophisticated geometric definitions, and partly due to large variations among individual teeth and across different types of teeth. As such, we propose a deep learning approach, with a dataset labeled by professional dentists, for tooth landmark/axis detection on tooth models, which is crucial for orthodontic treatments. Our method can extract not only tooth landmarks in the form of points (e.g. cusps), but also axes that measure the tooth angulation and inclination. The proposed network takes as input a 3D tooth model and predicts various types of tooth landmarks and axes. Specifically, we encode the landmarks and axes as dense fields defined on the surface of the tooth model. This design choice and a set of added components make the proposed network more suitable for extracting sparse landmarks from a given 3D tooth model. Extensive evaluation of the proposed method was conducted on a set of dental models prepared by experienced dentists. Results show that our method can produce tooth landmarks with high accuracy. Our method was examined and justified via comparison with the state-of-the-art methods as well as the ablation studies.
【78】 On the Limits of Design: What Are the Conceptual Constraints on Designing Artificial Intelligence for Social Good? 标题:论设计的极限:为社会公益而设计人工智能的概念约束是什么? 链接:https://arxiv.org/abs/2111.04165
作者:Jakob Mokander 机构:Oxford Internet Institute, University of Oxford 摘要:人工智能(AI)可以通过帮助降低成本、提高效率和为复杂问题提供新的解决方案,为社会带来巨大的好处。以Floridi关于如何设计“信息圈”的概念为起点,在本章中我考虑这样一个问题:设计的极限是什么,即为社会公益而设计AI存在哪些概念约束?本章的主要论点是,虽然设计是塑造技术和社会的有用概念工具,但设计未来社会的集体努力受到内部和外部因素的制约。通过援引哈丁关于“公地悲剧”的思想实验,讨论了设计的内部约束。此外,哈耶克对“cosmos”和“taxis”的经典区分被用来界定设计的外部约束。最后,提出了五个设计原则,旨在帮助决策者管理设计的内部和外部约束。设计未来社会的成功方法需要考虑复杂系统的涌现特性,为偶然性和社会技术共同进化留出空间。 摘要:Artificial intelligence (AI) can bring substantial benefits to society by helping to reduce costs, increase efficiency and enable new solutions to complex problems. Using Floridi's notion of how to design the 'infosphere' as a starting point, in this chapter I consider the question: what are the limits of design, i.e. what are the conceptual constraints on designing AI for social good? The main argument of this chapter is that while design is a useful conceptual tool to shape technologies and societies, collective efforts towards designing future societies are constrained by both internal and external factors. Internal constraints on design are discussed by evoking Hardin's thought experiment regarding 'the Tragedy of the Commons'. Further, Hayek's classical distinction between 'cosmos' and 'taxis' is used to demarcate external constraints on design. Finally, five design principles are presented which are aimed at helping policymakers manage the internal and external constraints on design. A successful approach to designing future societies needs to account for the emergent properties of complex systems by allowing space for serendipity and socio-technological coevolution.
【79】 The Three-Dimensional Structural Configuration of the Central Retinal Vessel Trunk and Branches as a Glaucoma Biomarker 标题:作为青光眼生物标志物的视网膜中央血管干和分支的三维结构形态 链接:https://arxiv.org/abs/2111.03997
作者:Satish K. Panda,Haris Cheong,Tin A. Tun,Thanadet Chuangsuwanich,Aiste Kadziauskiene,Vijayalakshmi Senthil,Ramaswami Krishnadas,Martin L. Buist,Shamira Perera,Ching-Yu Cheng,Tin Aung,Alexandre H. Thiery,Michael J. A. Girard 机构:Ophthalmic Engineering & Innovation Laboratory (OEIL), Singapore Eye Research Institute, Singapore National, Department of Biomedical Engineering, National University of Singapore, Singapore 摘要:目的:评估视网膜中央血管主干及其分支(CRVT&B)的三维结构是否可以作为青光眼的诊断标志。方法:我们训练了一个深度学习网络,从视神经头(ONH)光学相干断层扫描(OCT)容积的B扫描中自动分割CRVT&B。随后,使用从OCT体积中提取的CRVT&B的结构配置,使用两种不同的方法进行青光眼诊断。在第一种方法中,我们旨在仅使用3D CNN和CRVT&B的3D结构进行诊断。在第二种方法中,我们将CRVT&B的3D结构正交投影到三个平面上以获得2D图像,然后使用2D CNN进行诊断。使用Dice系数评估分割准确度,而使用受试者工作特征曲线下的面积(AUC)评估诊断准确度。CRVT&B的诊断性能也与视网膜神经纤维层(RNFL)厚度进行了比较。结果:我们的分割网络能够从OCT扫描中有效地分割视网膜血管。在测试集上,我们获得了0.81±0.07的Dice系数。3D和2D诊断网络能够区分青光眼和非青光眼受试者,准确率分别为82.7%和83.3%。CRVT&B的相应AUC分别为0.89和0.90,高于仅使用RNFL厚度获得的AUC。结论:我们的工作表明CRVT&B的诊断能力优于金标准青光眼参数,即RNFL厚度。我们的研究还表明,视网膜的主要血管形成了一个骨架,其结构可能代表了青光眼发展和进展过程中典型观察到的主要ONH结构变化。 摘要:Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Method: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography (OCT) volume of the optic nerve head (ONH). Subsequently, two different approaches were used for glaucoma diagnosis using the structural configuration of the CRVT&B as extracted from the OCT volumes. In the first approach, we aimed to provide a diagnosis using only 3D CNN and the 3D structure of the CRVT&B. For the second approach, we projected the 3D structure of the CRVT&B orthographically onto three planes to obtain 2D images, and then a 2D CNN was used for diagnosis. The segmentation accuracy was evaluated using the Dice coefficient, whereas the diagnostic accuracy was assessed using the area under the receiver operating characteristic curves (AUC). The diagnostic performance of the CRVT&B was also compared with that of retinal nerve fiber layer (RNFL) thickness. Results: Our segmentation network was able to efficiently segment retinal blood vessels from OCT scans. On a test set, we achieved a Dice coefficient of 0.81 ± 0.07. The 3D and 2D diagnostic networks were able to differentiate glaucoma from non-glaucoma subjects with accuracies of 82.7% and 83.3%, respectively. The corresponding AUCs for CRVT&B were 0.89 and 0.90, higher than those obtained with RNFL thickness alone. Conclusions: Our work demonstrated that the diagnostic power of the CRVT&B is superior to that of a gold-standard glaucoma parameter, i.e., RNFL thickness. Our work also suggested that the major retinal blood vessels form a skeleton -- the configuration of which may be representative of major ONH structural changes as typically observed with the development and progression of glaucoma.
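Editor's note: the abstract above scores segmentation with the Dice coefficient. As a rough illustration only (not the authors' code), the standard definition Dice = 2|A∩B| / (|A| + |B|) can be sketched for two binary masks as follows. The function name `dice_coefficient`, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are all assumptions made for this sketch.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; real pipelines usually operate on
    array or tensor masks rather than Python lists.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    pred_b = [bool(p) for p in pred]
    target_b = [bool(t) for t in target]
    # Count voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_b, target_b) if p and t)
    total = sum(pred_b) + sum(target_b)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(round(dice_coefficient(predicted, reference), 4))
```

A reported figure such as 0.81 ± 0.07 would typically be the mean and standard deviation of this per-scan score over the test set, although the abstract does not spell out the aggregation.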
【80】 Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach 标题:用于投资组合管理的可解释深度强化学习:一种实证方法 链接:https://arxiv.org/abs/2111.03995
作者:Mao Guan,Xiao-Yang Liu 机构:Computer Science, Columbia University, New York City, New York, Electrical Engineering, Columbia University 摘要:深度强化学习(DRL)在项目组合管理任务中得到了广泛的研究。然而,由于深层神经网络的黑箱性质,理解基于DRL的交易策略是一个挑战。在本文中,我们提出了一个实证方法来解释投资组合管理任务中DRL代理的策略。首先,我们使用后见之明的线性模型作为参考模型,该模型通过假设在预见中知道实际股票收益来确定最佳投资组合权重。特别是,我们使用后见之明的线性模型的系数作为参考特征权重。其次,对于DRL代理,我们使用积分梯度来定义特征权重,即线性回归模型下奖励和特征之间的系数。第三,研究了单步预测和多步预测两种情况下的预测能力。特别是,我们通过计算DRL代理的特征权重和参考特征权重之间的线性相关性来量化预测能力,对于机器学习方法也是如此。最后,我们评估了2009年1月1日至2021年1月9日期间道琼斯30只成份股的投资组合管理任务。我们的方法经验表明,DRL代理比机器学习方法具有更强的多步预测能力。 摘要:Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks. In this paper, we propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights by assuming knowing actual stock returns in foresight. In particular, we use the coefficients of a linear model in hindsight as the reference feature weights. Secondly, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients between reward and features under a linear regression model. Thirdly, we study the prediction power in two cases, single-step prediction and multi-step prediction. In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for machine learning methods. Finally, we evaluate a portfolio management task on Dow Jones 30 constituent stocks during 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than machine learning methods.
【81】 Multimodal PET/CT Tumour Segmentation and Prediction of Progression-Free Survival using a Full-Scale UNet with Attention 标题:多模式PET/CT肿瘤分割及全尺寸有注意UNET对无进展生存期的预测 链接:https://arxiv.org/abs/2111.03848
作者:Emmanuelle Bourigault,Daniel R. McGowan,Abolfazl Mehranian,Bartłomiej W. Papież 机构:Department of Engineering, Doctoral Training Centre, University of Oxford, Oxford, UK, Department of Oncology, University of Oxford, Oxford, UK, GE Healthcare, Oxford, UK 备注:13 pages, 3 figures, 2 tables. To appear in Head and Neck Tumor Segmentation in PET/CT: The HECKTOR Challenge, Valentin Oreiller et al., Medical Image Analysis, 2021, HECKTOR 2021, Lecture Notes in Computer Science, Springer 摘要:头颈部(H&N)肿瘤的分割和患者预后的预测对于患者的疾病诊断和治疗监测至关重要。目前,由于缺乏具有高质量注释的大型多中心、多模态数据,稳健深度学习模型的发展受到阻碍。MICCAI 2021头颈部肿瘤(HECKTOR)分割和结果预测挑战创建了一个平台,用于比较氟脱氧葡萄糖(FDG)-PET和计算机断层扫描图像上主要总靶体积的分割方法,以及预测H&N口咽癌的无进展生存率。对于分割任务,我们提出了一种基于编码器-解码器架构的新网络,该网络具有完整的帧间和帧内跳跃连接,以充分利用全尺度下的低层和高层语义。此外,我们使用条件随机场作为后处理步骤来细化预测的分割图。我们训练了多个用于肿瘤体积分割的神经网络,这些分割被集成在一起,在交叉验证中平均Dice相似系数为0.75,在挑战测试数据集上平均Dice相似系数为0.76。为了预测患者无进展生存任务,我们提出了结合临床、放射和深度学习特征的Cox比例风险回归。我们的生存预测模型在交叉验证中的一致性指数为0.82,在挑战测试数据集上的一致性指数为0.62。 摘要:Segmentation of head and neck (H&N) tumours and prediction of patient outcome are crucial for patient's disease diagnosis and treatment monitoring. Current developments of robust deep learning models are hindered by the lack of large multi-centre, multi-modal data with quality annotations. The MICCAI 2021 HEad and neCK TumOR (HECKTOR) segmentation and outcome prediction challenge creates a platform for comparing segmentation methods of the primary gross target volume on fluoro-deoxyglucose (FDG)-PET and Computed Tomography images and prediction of progression-free survival in H&N oropharyngeal cancer. For the segmentation task, we proposed a new network based on an encoder-decoder architecture with full inter- and intra-skip connections to take advantage of low-level and high-level semantics at full scales. Additionally, we used Conditional Random Fields as a post-processing step to refine the predicted segmentation maps. We trained multiple neural networks for tumor volume segmentation, and these segmentations were ensembled achieving an average Dice Similarity Coefficient of 0.75 in cross-validation, and 0.76 on the challenge testing data set. For prediction of patient progression free survival task, we propose a Cox proportional hazard regression combining clinical, radiomic, and deep learning features. Our survival prediction model achieved a concordance index of 0.82 in cross-validation, and 0.62 on the challenge testing data set.
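Editor's note: the survival results above are reported as a concordance index. As an illustrative sketch only, the following computes Harrell's concordance index, a common definition that the abstract does not confirm the challenge used. A pair of patients is comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that patient was also assigned the higher predicted risk, and ties in risk count as 0.5. The function name, argument layout, and example values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 if progression was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up toy data, not taken from the paper.
    print(concordance_index([2, 4, 6, 8], [1, 0, 1, 1], [0.9, 0.1, 0.5, 0.7]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which helps put the gap between the cross-validation score (0.82) and the test score (0.62) in context.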
【82】 Artifact- and content-specific quality assessment for MRI with image rulers 标题:基于图像标尺的MRI伪影和内容特异性质量评估 链接:https://arxiv.org/abs/2111.03780
作者:Ke Lei,John M. Pauly,Shreyas S. Vasanawala 机构:Department of Electrical Engineering, Stanford University, Stanford, CA, USA, Department of Radiology, Stanford University, Stanford, CA, USA 摘要:在临床实践中,磁共振图像通常在扫描后很长一段时间才被放射科医生看到。如果图像质量不足,要么患者必须返回进行额外扫描,要么进行次优解释。自动图像质量评估(IQA)将实现实时修复。现有用于MRI的IQA只给出一般质量分数,不知道低质量扫描的原因和解决方案。此外,放射科医生的图像质量要求因扫描类型和诊断任务而异。因此,相同的分数可能对不同的扫描有不同的影响。我们提出了一个多任务CNN模型框架,该模型通过标定标签进行训练,并通过图像标尺进行推理。通过人工输入校准的标签遵循定义明确且高效的标签任务。图像标尺处理不同的质量标准,并提供解释CNN原始分数的具体方法。该模型支持评估MRI中两种最常见的伪影:噪声和运动。它的准确度达到90%左右,比之前检查过的最佳方法高6%,比噪声评估方面的人类专家高3%。我们的实验表明,标签校准、图像标尺和多任务训练提高了模型的性能和通用性。 摘要:In clinical practice MR images are often first seen by radiologists long after the scan. If image quality is inadequate either patients have to return for an additional scan, or a suboptimal interpretation is rendered. An automatic image quality assessment (IQA) would enable real-time remediation. Existing IQA works for MRI give only a general quality score, agnostic to the cause of and solution to low-quality scans. Furthermore, radiologists' image quality requirements vary with the scan type and diagnostic task. Therefore, the same score may have different implications for different scans. We propose a framework with multi-task CNN model trained with calibrated labels and inferenced with image rulers. Labels calibrated by human inputs follow a well-defined and efficient labeling task. Image rulers address varying quality standards and provide a concrete way of interpreting raw scores from the CNN. The model supports assessments of two of the most common artifacts in MRI: noise and motion. It achieves accuracies of around 90%, 6% better than the best previous method examined, and 3% better than human experts on noise assessment. Our experiments show that label calibration, image rulers, and multi-task training improve the model's performance and generalizability.