Finance / Speech / Audio Processing Academic Digest [5.19]

2021-05-20 11:39:57

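Visit www.arxivdaily.com for digests with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and posting features.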

q-fin (Finance): 8 papers

cs.SD (Speech): 7 papers

eess.AS (Audio Processing): 8 papers

1. q-fin (Finance):

【1】 Deep Graph Convolutional Reinforcement Learning for Financial Portfolio Management -- DeepPocket

Authors: Farzan Soleymani, Eric Paquet
Affiliations: National Research Council, Montreal Road, Ottawa, ON, Canada
Link: https://arxiv.org/abs/2105.08664
Abstract: Portfolio management aims at maximizing the return on investment while minimizing risk by continuously reallocating the assets forming the portfolio. These assets are not independent but correlated during a short time period. A graph convolutional reinforcement learning framework called DeepPocket is proposed whose objective is to exploit the time-varying interrelations between financial instruments. These interrelations are represented by a graph whose nodes correspond to the financial instruments while the edges correspond to a pair-wise correlation function between assets. DeepPocket consists of a restricted, stacked autoencoder for feature extraction, a convolutional network to collect underlying local information shared among financial instruments, and an actor-critic reinforcement learning agent. The actor-critic structure contains two convolutional networks in which the actor learns and enforces an investment policy which is, in turn, evaluated by the critic in order to determine the best course of action by constantly reallocating the various portfolio assets to optimize the expected return on investment. The agent is initially trained offline with online stochastic batching on historical data. As new data become available, it is trained online with a passive concept drift approach to handle unexpected changes in their distributions. DeepPocket is evaluated against five real-life datasets over three distinct investment periods, including during the Covid-19 crisis, and clearly outperformed market indexes.
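
To make the graph construction in this abstract concrete (nodes are instruments, edges carry a pairwise correlation), here is a minimal sketch. It is not the authors' implementation: the function names, the toy return series, and the use of Pearson correlation are assumptions made for illustration, since the abstract does not say which correlation function DeepPocket uses.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_graph(returns):
    """Build a dict-of-dicts adjacency map: graph[a][b] is the edge weight."""
    names = list(returns)
    return {
        a: {b: pearson(returns[a], returns[b]) for b in names if b != a}
        for a in names
    }

# Hypothetical daily returns for three instruments (illustrative values only).
returns = {
    "A": [0.01, -0.02, 0.015, 0.005],
    "B": [0.02, -0.04, 0.03, 0.01],       # moves exactly with A
    "C": [-0.01, 0.02, -0.015, -0.005],   # moves exactly against A
}
graph = correlation_graph(returns)
```

In a time-varying setting, the same construction would be recomputed over a rolling window so that the edge weights follow the changing correlations.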

【2】 Liquidity Stress Testing in Asset Management -- Part 2. Modeling the Asset Liquidity Risk

Authors: Thierry Roncalli, Amina Cherief, Fatma Karray-Meziou, Margaux Regnault
Affiliations: Quantitative Research, Amundi Asset Management, Paris; Risk Management, Statistics & Economics, ENSAE, Paris
Notes: 86 pages, 36 figures, 44 tables
Link: https://arxiv.org/abs/2105.08377
Abstract: This article is part of a comprehensive research project on liquidity risk in asset management, which can be divided into three dimensions. The first dimension covers liability liquidity risk (or funding liquidity) modeling, the second dimension focuses on asset liquidity risk (or market liquidity) modeling, and the third dimension considers the asset-liability management of the liquidity gap risk (or asset-liability matching). The purpose of this research is to propose a methodological and practical framework in order to perform liquidity stress testing programs, which comply with regulatory guidelines (ESMA, 2019, 2020) and are useful for fund managers. The review of the academic literature and professional research studies shows that there is a lack of standardized and analytical models. The aim of this research project is then to fill the gap with the goal of developing mathematical and statistical approaches, and providing appropriate answers. In this second article focused on asset liquidity risk modeling, we propose a market impact model to estimate transaction costs. After presenting a toy model that helps to understand the main concepts of asset liquidity, we consider a two-regime model, which is based on the power-law property of price impact. Then, we define several asset liquidity measures such as liquidity cost, liquidation ratio and shortfall or time to liquidation in order to assess the different dimensions of asset liquidity. Finally, we apply this asset liquidity framework to stocks and bonds and discuss the issues of calibrating the transaction cost model.

【3】 A Quantile Approach to Asset Pricing Models

Authors: Tjeerd de Vries
Link: https://arxiv.org/abs/2105.08208
Abstract: This paper develops a new way of analyzing the performance of asset pricing models. I show that many classical asset pricing bounds, such as the Hansen and Jagannathan (1991) (HJ) bound, can be improved upon by looking at derivative contracts. The resulting bound is found to be much tighter than the HJ bound in empirical data. A direct implication is that the SDF process is more volatile than previously assumed and poses new challenges to consumption-based asset pricing models. A central ingredient of this new bound is the risk-neutral quantile function. Two additional applications consider the use of this function: (i) as a predictor of Value-at-Risk; (ii) as a forward-looking measure of crash risk. Both applications underscore the importance of analyzing quantiles of the data, instead of the more prevalent variance and equity premia.
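
For readers unfamiliar with the benchmark this abstract refers to, the classical HJ bound can be written as follows. This is the standard textbook statement, in notation chosen here (M for the stochastic discount factor with E[M] > 0, R^e for an excess return); the paper's own quantile-based bound is not given in the abstract and is not reproduced.

```latex
% Pricing condition for an excess return, expanded with the covariance identity
\[
0 = \mathbb{E}[M R^e] = \mathbb{E}[M]\,\mathbb{E}[R^e] + \operatorname{Cov}(M, R^e)
\]
% Cauchy-Schwarz: |Cov(M, R^e)| <= sigma(M) sigma(R^e), hence the HJ bound
\[
\frac{\sigma(M)}{\mathbb{E}[M]} \;\ge\; \frac{\lvert \mathbb{E}[R^e] \rvert}{\sigma(R^e)}
\]
```

The right-hand side is the absolute Sharpe ratio of the excess return, so a high observed Sharpe ratio forces a volatile SDF; a tighter bound, as claimed in the abstract, forces it to be more volatile still.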

【4】 Optimal Portfolio with Power Utility of Absolute and Relative Wealth

Authors: Andrey Sarantsev
Notes: Merton's problem; stochastic optimization; portfolio theory; wealth process
Link: https://arxiv.org/abs/2105.08139
Abstract: Portfolio managers often evaluate performance relative to a benchmark, usually taken to be the Standard & Poor's 500 stock index fund. This relative portfolio wealth is defined as the absolute portfolio wealth divided by wealth from investing in the benchmark (including reinvested dividends). The classic Merton problem for portfolio optimization considers absolute portfolio wealth. We combine absolute and relative wealth in our new utility function. We also consider the case of multiple benchmarks. To both absolute and relative wealth, we apply power utility functions, possibly with different exponents. We obtain an explicit solution and compare it to the classic Merton solution. We apply our results to the Capital Asset Pricing Model setting.
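
As a point of reference for the "classic Merton solution" mentioned in the abstract, the sketch below computes the well-known constant risky-asset fraction for power (CRRA) utility of absolute wealth, with one risky asset following geometric Brownian motion. The function name and the numbers are illustrative assumptions; the paper's combined absolute/relative-wealth solution is not stated in the abstract and is not implemented here.

```python
def merton_fraction(mu, r, sigma, gamma):
    """Classic Merton fraction of wealth held in the risky asset.

    mu: expected return of the risky asset, r: risk-free rate,
    sigma: volatility of the risky asset, gamma: relative risk aversion
    (power utility U(w) = w**(1 - gamma) / (1 - gamma), gamma > 0).
    """
    if sigma <= 0 or gamma <= 0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r) / (gamma * sigma ** 2)

# Hypothetical market parameters, chosen only for illustration.
fraction = merton_fraction(mu=0.08, r=0.02, sigma=0.20, gamma=3.0)
```

With these inputs the investor holds half of wealth in the risky asset; doubling the risk aversion halves that fraction.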

【5】 Adaptive Complementary Ensemble EMD and Energy-Frequency Spectra of Cryptocurrency Prices

Authors: Tim Leung, Theodore Zhao
Notes: 20 pages, 8 figures
Link: https://arxiv.org/abs/2105.08133
Abstract: We study the price dynamics of cryptocurrencies using adaptive complementary ensemble empirical mode decomposition (ACE-EMD) and Hilbert spectral analysis. This is a multiscale noise-assisted approach that decomposes any time series into a number of intrinsic mode functions, along with the corresponding instantaneous amplitudes and instantaneous frequencies. The decomposition is adaptive to the time-varying volatility of each cryptocurrency price evolution. Different combinations of modes allow us to reconstruct the time series using components of different timescales. We then apply Hilbert spectral analysis to define and compute the instantaneous energy-frequency spectrum of each cryptocurrency to illustrate the properties of various timescales embedded in the original time series.

【6】 AI and Shared Prosperity

Authors: Katya Klinova, Anton Korinek
Affiliations: Partnership on AI, San Francisco, California, USA; University of Virginia, Charlottesville, Virginia, USA
Link: https://arxiv.org/abs/2105.08475
Abstract: Future advances in AI that automate away human labor may have stark implications for labor markets and inequality. This paper proposes a framework to analyze the effects of specific types of AI systems on the labor market, based on how much labor demand they will create versus displace, while taking into account that productivity gains also make society wealthier and thereby contribute to additional labor demand. This analysis enables ethically-minded companies creating or deploying AI systems as well as researchers and policymakers to take into account the effects of their actions on labor markets and inequality, and therefore to steer progress in AI in a direction that advances shared prosperity and an inclusive economic future for all of humanity.

【7】 Optimal Lockdown Policies driven by Socioeconomic Costs

Authors: Elena Gubar, Laura Policardo, Edgar J. Sanchez Carrera, Vladislav Taynitskiy
Link: https://arxiv.org/abs/2105.08349
Abstract: In this research paper we modify a classical SIR model to better adapt to the dynamics of COVID-19, that is, we propose the heterogeneous SQAIRD model where COVID-19 spreads over a population of economic agents, namely: the elderly, adults and young people. We then compute and simulate an optimal control problem faced by a Government, where its objective is to minimize the costs generated by the pandemic using as control a compulsory quarantine measure (that is, a lockdown). We first analyze the problem from a theoretical perspective, claiming that different lockdown policies (total lockdown, no lockdown or partial lockdown) may be justified by different cost (concave or convex) structures of the economies. We then focus on a particular cost structure (convex costs) and we simulate a targeted optimal policy vs. a uniform optimal policy, by dividing the whole population into three demographic groups (young, adults and old). We also simulate the dynamics of the pandemic with no policy implemented. Simulations highlighted the fact that: a) a policy of lockdown is always better than the laissez-faire policy, because it limits the costs that the pandemic generates in an uncontrolled situation; b) a targeted policy based on the age of the individuals outperforms a uniform policy in terms of the costs that it generates, the targeted policy being less costly and equally effective in controlling the pandemic.
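
The comparison between lockdown and laissez-faire can be illustrated with the classical SIR model that the paper starts from. The sketch below is not the SQAIRD model and solves no optimal control problem: it assumes a constant lockdown intensity that scales the contact rate by (1 - lockdown), a single homogeneous population, and hypothetical parameter values.

```python
def simulate_sir(beta, gamma, lockdown, s0=0.99, i0=0.01, days=200, dt=0.1):
    """Forward-Euler simulation of a basic SIR model with population shares.

    beta: contact rate, gamma: recovery rate,
    lockdown: assumed constant intensity in [0, 1] that scales beta.
    Returns final (s, i, r) and the peak infected share over the horizon.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    effective_beta = beta * (1.0 - lockdown)
    for _ in range(round(days / dt)):
        new_infections = effective_beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return s, i, r, peak

# Hypothetical parameters: basic reproduction number beta / gamma = 3.
s_free, i_free, r_free, peak_free = simulate_sir(beta=0.3, gamma=0.1, lockdown=0.0)
s_lock, i_lock, r_lock, peak_lock = simulate_sir(beta=0.3, gamma=0.1, lockdown=0.5)
```

A cost-based analysis like the one in the paper would additionally weigh the economic cost of the lockdown itself against the cost of infections, which this sketch leaves out.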

【8】 BBE: Simulating the Microstructural Dynamics of an In-Play Betting Exchange via Agent-Based Modelling

Authors: Dave Cliff
Affiliations: Department of Computer Science, University of Bristol, Bristol, U.K.
Notes: 47 pages, 9 figures, 120 references
Link: https://arxiv.org/abs/2105.08310
Abstract: I describe the rationale for, and design of, an agent-based simulation model of a contemporary online sports-betting exchange: such exchanges, closely related to the exchange mechanisms at the heart of major financial markets, have revolutionized the gambling industry in the past 20 years, but gathering sufficiently large quantities of rich and temporally high-resolution data from real exchanges - i.e., the sort of data that is needed in large quantities for Deep Learning - is often very expensive, and sometimes simply impossible; this creates a need for a plausibly realistic synthetic data generator, which is what this simulation now provides. The simulator, named the "Bristol Betting Exchange" (BBE), is intended as a common platform, a data-source and experimental test-bed, for researchers studying the application of AI and machine learning (ML) techniques to issues arising in betting exchanges; and, as far as I have been able to determine, BBE is the first of its kind: a free open-source agent-based simulation model consisting not only of a sports-betting exchange, but also a minimal simulation model of racetrack sporting events (e.g., horse-races or car-races) about which bets may be made, and a population of simulated bettors who each form their own private evaluation of odds and place bets on the exchange before and - crucially - during the race itself (i.e., so-called "in-play" betting) and whose betting opinions change second-by-second as each race event unfolds. BBE is offered as a proof-of-concept system that enables the generation of large high-resolution data-sets for automated discovery or improvement of profitable strategies for betting on sporting events via the application of AI/ML and advanced data analytics techniques. This paper offers an extensive survey of relevant literature and explains the motivation and design of BBE, and presents brief illustrative results.

2. cs.SD (Speech):

【1】 Federated Learning With Highly Imbalanced Audio Data

Authors: Marc C. Green, Mark D. Plumbley
Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey
Link: https://arxiv.org/abs/2105.08550
Abstract: Federated learning (FL) is a privacy-preserving machine learning method that has been proposed to allow training of models using data from many different clients, without these clients having to transfer all their data to a central server. There has as yet been relatively little consideration of FL or other privacy-preserving methods in audio. In this paper, we investigate using FL for a sound event detection task using audio from the FSD50K dataset. Audio is split into clients based on uploader metadata. This results in highly imbalanced subsets of data between clients, noted as a key issue in FL scenarios. A series of models is trained using 'high-volume' clients that contribute 100 audio clips or more, testing the effects of varying FL parameters, followed by an additional model trained using all clients with no minimum audio contribution. It is shown that FL models trained using the high-volume clients can perform similarly to a centrally-trained model, though there is much more noise in results than would typically be expected for a centrally-trained model. The FL model trained using all clients has a considerably reduced performance compared to the centrally-trained model.
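
To show why client imbalance matters in this setting, here is a minimal sketch of one server aggregation step. The abstract does not name the aggregation rule used in the paper; the sample-size-weighted average below (in the style of the widely used FedAvg scheme) is an assumption, as are the function name and the toy numbers.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of training examples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

# Two hypothetical clients: one holds 900 clips, the other only 100.
global_weights = federated_average(
    client_weights=[[1.0, 0.0], [0.0, 1.0]],
    client_sizes=[900, 100],
)
```

Under this rule the large client dominates the global model, which is one way a long tail of small clients can end up contributing little more than noise.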

【2】 Relative Positional Encoding for Transformers with Linear Complexity

Authors: Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard
Affiliations: National Taiwan University, Taiwan; Inria - DIENS - PSL Research University
Notes: Accepted to ICML 2021 (long talk). 23 pages
Link: https://arxiv.org/abs/2105.08399
Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.

【3】 Point-based Acoustic Scattering for Interactive Sound Propagation via Surface Encoding

Authors: Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha
Affiliations: University of Maryland, College Park
Notes: IJCAI 2021 main track paper
Link: https://arxiv.org/abs/2105.08177
Abstract: We present a novel geometric deep learning method to compute the acoustic scattering properties of geometric objects. Our learning algorithm uses a point cloud representation of objects to compute the scattering properties and integrates them with ray tracing for interactive sound propagation in dynamic scenes. We use discrete Laplacian-based surface encoders and approximate the neighborhood of each point using a shared multi-layer perceptron. We show that our formulation is permutation invariant and present a neural network that computes the scattering function using spherical harmonics. Our approach can handle objects with arbitrary topologies and deforming models, and takes less than 1ms per object on a commodity GPU. We analyze the accuracy, perform validation on thousands of unseen 3D objects, and highlight the benefits over other point-based geometric deep learning methods. To the best of our knowledge, this is the first real-time learning algorithm that can approximate the acoustic scattering properties of arbitrary objects with high accuracy.

【4】 Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Authors: Vivek Jayaram, John Thickstun
Affiliations: Department of Computer Science, University of Washington
Notes: 16 pages, 8 figures, to appear in ICML 2021
Link: https://arxiv.org/abs/2105.08164
Abstract: This paper introduces an alternative approach to sampling from autoregressive models. Autoregressive models are typically sampled sequentially, according to the transition dynamics defined by the model. Instead, we propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global log-likelihood of the sequence. This approach parallelizes the sampling process and generalizes to conditional sampling. Using an autoregressive model as a Bayesian prior, we can steer the output of a generative model using a conditional likelihood or constraints. We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.
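
The sampling idea in this abstract (start from white noise, then update every position of the sequence at once along the gradient of the global log-likelihood, plus noise) can be sketched on a toy model. The example below assumes a linear AR(1) model with unit-variance Gaussian innovations, which is far simpler than the neural autoregressive models in the paper; the function names, step size, and iteration count are illustrative assumptions, and the plain unadjusted Langevin update stands in for whatever refinements the authors use.

```python
import math
import random

def grad_log_lik(x, a):
    """Gradient of sum_t -(x[t] - a * x[t-1])**2 / 2, taking x[-1] as 0."""
    n = len(x)
    g = [0.0] * n
    for t in range(n):
        prev = x[t - 1] if t > 0 else 0.0
        g[t] -= x[t] - a * prev                # x[t] as the predicted value
        if t + 1 < n:
            g[t] += a * (x[t + 1] - a * x[t])  # x[t] as the conditioning value
    return g

def langevin_sample(n, a, step, n_steps, seed=0):
    """Unadjusted Langevin dynamics over the whole sequence simultaneously."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white-noise initialization
    for _ in range(n_steps):
        g = grad_log_lik(x, a)
        x = [
            xi + 0.5 * step * gi + math.sqrt(step) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x

sequence = langevin_sample(n=8, a=0.5, step=0.05, n_steps=500)
```

Each update touches all positions using only the current sequence, so the per-step work can run in parallel across positions, unlike ancestral sampling, which must generate one position after another.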

【5】 MUSER: MUltimodal Stress Detection using Emotion Recognition as an Auxiliary Task

Authors: Yiqun Yao, Michalis Papakostas, Mihai Burzo, Mohamed Abouelenien, Rada Mihalcea
Affiliations: Computer Science and Engineering, University of Michigan; Mechanical Engineering, University of Michigan; Computer and Information Science, University of Michigan
Notes: NAACL 2021 accepted
Link: https://arxiv.org/abs/2105.08146
Abstract: The capability to automatically detect human stress can benefit artificial intelligent agents involved in affective computing and human-computer interaction. Stress and emotion are both human affective states, and stress has proven to have important implications on the regulation and expression of emotion. Although a series of methods have been established for multimodal stress detection, limited steps have been taken to explore the underlying inter-dependence between stress and emotion. In this work, we investigate the value of emotion recognition as an auxiliary task to improve stress detection. We propose MUSER -- a transformer-based model architecture and a novel multi-task learning algorithm with speed-based dynamic sampling strategy. Evaluations on the Multimodal Stressed Emotion (MuSE) dataset show that our model is effective for stress detection with both internal and external auxiliary tasks, and achieves state-of-the-art results.

【6】 Handling Structural Mismatches in Real-time Opera Tracking

Authors: Charles Brazier, Gerhard Widmer
Affiliations: Institute of Computational Perception and LIT AI Lab, Linz Institute of Technology, Johannes Kepler University, Linz, Austria
Notes: 5 pages, 1 figure, In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2020), Dublin, Ireland
Link: https://arxiv.org/abs/2105.08531
Abstract: Algorithms for reliable real-time score following in live opera promise a lot of useful applications such as automatic subtitles display, or real-time video cutting in live streaming. Until now, such systems were based on the strong assumption that an opera performance follows the structure of the score linearly. However, this is rarely the case in practice, because of different opera versions and directors' cutting choices. In this paper, we propose a two-level solution to this problem. We introduce a real-time-capable, high-resolution (HR) tracker that can handle jumps or repetitions at specific locations provided to it. We then combine this with an additional low-resolution (LR) tracker that can handle all sorts of mismatches that can occur at any time, with some imprecision, and can re-direct the HR tracker if the latter is 'lost' in the score. We show that the combination of the two improves tracking robustness in the presence of strong structural mismatches.

【7】 Deep Correlation Analysis for Audio-EEG Decoding

Authors: Jaswanth Reddy Katthi, Sriram Ganapathy
Affiliations: Learning and Extraction of Acoustic Patterns (LEAP) laboratory
Notes: Accepted to IEEE Transactions on Neural Systems and Rehabilitation Engineering. Supported in part by grants from the Department of Atomic Energy.
Link: https://arxiv.org/abs/2105.08492
Abstract: Electroencephalography (EEG), which is one of the easiest modes of recording brain activations in a non-invasive manner, is often distorted due to recording artifacts, which adversely impact the stimulus-response analysis. The most prominent techniques thus far attempt to improve the stimulus-response correlations using linear methods. In this paper, we propose a neural network based correlation analysis framework that significantly improves over the linear methods for auditory stimuli. A deep model is proposed for intra-subject audio-EEG analysis based on directly optimizing the correlation loss. Further, a neural network model with a shared encoder architecture is proposed for improving the inter-subject stimulus-response correlations. These models attempt to suppress the EEG artifacts while preserving the components related to the stimulus. Several experiments are performed using EEG recordings from subjects listening to speech and music stimuli. In these experiments, we show that the deep models improve the Pearson correlation significantly over the linear methods (average absolute improvements of 7.4% in speech tasks and 29.3% in music tasks). We also analyze the impact of several model parameters on the stimulus-response correlation.
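
"Directly optimizing the correlation loss" can be read as using the negative Pearson correlation between two learned representations as the training objective. The sketch below shows only that loss on plain Python lists; the function name and the small stabilizing constant are assumptions, and the paper's network architectures are not reproduced.

```python
import math

def neg_pearson_loss(pred, target, eps=1e-12):
    """Negative Pearson correlation: minimizing it maximizes correlation.

    eps is an assumed small constant that guards against division by zero
    when one of the inputs is constant.
    """
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in target)
    return -cov / (math.sqrt(vp * vt) + eps)

# Toy signals: a perfectly aligned pair and a perfectly opposed pair.
loss_aligned = neg_pearson_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
loss_opposed = neg_pearson_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the loss is invariant to the scale and offset of either input, a model trained with it is free to rescale its outputs, which is one reason correlation is a natural fit for comparing audio features with EEG responses.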

3. eess.AS (Audio Processing):

【1】 Handling Structural Mismatches in Real-time Opera Tracking

Authors: Charles Brazier, Gerhard Widmer
Affiliations: Institute of Computational Perception and LIT AI Lab, Linz Institute of Technology, Johannes Kepler University, Linz, Austria
Notes: 5 pages, 1 figure, In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2020), Dublin, Ireland
Link: https://arxiv.org/abs/2105.08531
Abstract: Algorithms for reliable real-time score following in live opera promise a lot of useful applications such as automatic subtitles display, or real-time video cutting in live streaming. Until now, such systems were based on the strong assumption that an opera performance follows the structure of the score linearly. However, this is rarely the case in practice, because of different opera versions and directors' cutting choices. In this paper, we propose a two-level solution to this problem. We introduce a real-time-capable, high-resolution (HR) tracker that can handle jumps or repetitions at specific locations provided to it. We then combine this with an additional low-resolution (LR) tracker that can handle all sorts of mismatches that can occur at any time, with some imprecision, and can re-direct the HR tracker if the latter is 'lost' in the score. We show that the combination of the two improves tracking robustness in the presence of strong structural mismatches.

【2】 Deep Correlation Analysis for Audio-EEG Decoding

Authors: Jaswanth Reddy Katthi, Sriram Ganapathy
Affiliations: Learning and Extraction of Acoustic Patterns (LEAP) laboratory
Notes: Accepted to IEEE Transactions on Neural Systems and Rehabilitation Engineering. Supported in part by grants from the Department of Atomic Energy.
Link: https://arxiv.org/abs/2105.08492
Abstract: Electroencephalography (EEG), which is one of the easiest modes of recording brain activations in a non-invasive manner, is often distorted due to recording artifacts, which adversely impact the stimulus-response analysis. The most prominent techniques thus far attempt to improve the stimulus-response correlations using linear methods. In this paper, we propose a neural network based correlation analysis framework that significantly improves over the linear methods for auditory stimuli. A deep model is proposed for intra-subject audio-EEG analysis based on directly optimizing the correlation loss. Further, a neural network model with a shared encoder architecture is proposed for improving the inter-subject stimulus-response correlations. These models attempt to suppress the EEG artifacts while preserving the components related to the stimulus. Several experiments are performed using EEG recordings from subjects listening to speech and music stimuli. In these experiments, we show that the deep models improve the Pearson correlation significantly over the linear methods (average absolute improvements of 7.4% in speech tasks and 29.3% in music tasks). We also analyze the impact of several model parameters on the stimulus-response correlation.

【3】 A time-domain nearfield frequency-invariant beamforming method

Authors: Fei Ma, Thushara D. Abhayapala, Prasanga N. Samarasinghe
Affiliations: Research School of Engineering, The Australian National University
Link: https://arxiv.org/abs/2105.08219
Abstract: Most existing beamforming methods are frequency-domain methods, and are designed for enhancing a farfield target source over a narrow frequency band. They have found diverse applications and are still under active development. However, they struggle to achieve desired performance if the target source is in the nearfield with a broadband output. This paper proposes a time-domain nearfield frequency-invariant beamforming method. The time-domain implementation makes the beamformer output suitable for further use by real-time applications, the nearfield focusing enables the beamforming method to suppress an interference even if it is in the same direction as the target source, and the frequency-invariant beampattern makes the beamforming method suitable for enhancing the target source over a broad frequency band. These three features together make the beamforming method suitable for real-time broadband nearfield source enhancement, such as speech enhancement in room environments. The beamformer design process is separated from the sound field measurement process, such that a designed beamformer applies to sensor arrays with various structures. The beamformer design process is further simplified by decomposing it into several independent parts. Simulation results confirm the performance of the proposed beamforming method.
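
The claim that nearfield focusing can separate two sources lying in the same direction follows from spherical-wave geometry: the pattern of arrival delays across the array depends on source distance as well as direction. The sketch below computes those delays for a generic delay-and-sum style alignment. It is not the paper's frequency-invariant design; the array layout, the source positions, and the speed of sound are assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def nearfield_delays(source, sensors, c=SPEED_OF_SOUND):
    """Arrival delay at each sensor relative to the closest sensor (seconds).

    source and sensors are (x, y) positions in metres; a point source
    radiating spherical waves is assumed.
    """
    distances = [math.dist(source, s) for s in sensors]
    nearest = min(distances)
    return [(d - nearest) / c for d in distances]

# Hypothetical three-element linear array with 10 cm spacing.
sensors = [(-0.1, 0.0), (0.0, 0.0), (0.1, 0.0)]

# Two sources in the same (broadside) direction at different distances.
delays_near = nearfield_delays((0.0, 0.2), sensors)
delays_far = nearfield_delays((0.0, 2.0), sensors)
```

The closer source produces a visibly more curved delay profile across the array, while the distant one approaches the flat profile of a plane wave; a beamformer aligned to one profile therefore attenuates the other.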

【4】 Federated Learning With Highly Imbalanced Audio Data

Authors: Marc C. Green, Mark D. Plumbley Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey Link: https://arxiv.org/abs/2105.08550 Abstract: Federated learning (FL) is a privacy-preserving machine learning method that has been proposed to allow training of models using data from many different clients, without these clients having to transfer all their data to a central server. There has as yet been relatively little consideration of FL or other privacy-preserving methods in audio. In this paper, we investigate using FL for a sound event detection task using audio from the FSD50K dataset. Audio is split into clients based on uploader metadata. This results in highly imbalanced subsets of data between clients, noted as a key issue in FL scenarios. A series of models is trained using 'high-volume' clients that contribute 100 audio clips or more, testing the effects of varying FL parameters, followed by an additional model trained using all clients with no minimum audio contribution. It is shown that FL models trained using the high-volume clients can perform similarly to a centrally-trained model, though there is much more noise in results than would typically be expected for a centrally-trained model. The FL model trained using all clients has a considerably reduced performance compared to the centrally-trained model.
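
The client split described in this abstract can be illustrated with a small sketch. It groups clips by uploader, keeps the clients that contribute 100 clips or more, and combines client parameters with a sample-count-weighted average. The weighted average is the standard federated averaging rule and is an assumption here: the abstract does not say which aggregation rule the authors use. The function names and the `(uploader, clip_id)` input format are hypothetical.

```python
from collections import defaultdict


def split_by_uploader(clips):
    """Group (uploader, clip_id) pairs into one clip list per client."""
    clients = defaultdict(list)
    for uploader, clip_id in clips:
        clients[uploader].append(clip_id)
    return dict(clients)


def high_volume(clients, min_clips=100):
    """Keep only clients that contribute at least `min_clips` clips."""
    return {u: c for u, c in clients.items() if len(c) >= min_clips}


def fedavg(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count.

    Assumption: plain federated averaging, not necessarily the paper's rule.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

With sample-count weighting, a client holding three times as much data pulls the average three times as hard, which is one reason an imbalanced split can make the aggregated model behave differently from a centrally trained one.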

【5】 Relative Positional Encoding for Transformers with Linear Complexity

Authors: Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard Affiliations: National Taiwan University, Taiwan; Inria - DIENS - PSL Research University Comments: Accepted to ICML 2021 (long talk). 23 pages Link: https://arxiv.org/abs/2105.08399 Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.

【6】 Point-based Acoustic Scattering for Interactive Sound Propagation via Surface Encoding

Authors: Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha Affiliations: University of Maryland, College Park Comments: IJCAI 2021 main track paper Link: https://arxiv.org/abs/2105.08177 Abstract: We present a novel geometric deep learning method to compute the acoustic scattering properties of geometric objects. Our learning algorithm uses a point cloud representation of objects to compute the scattering properties and integrates them with ray tracing for interactive sound propagation in dynamic scenes. We use discrete Laplacian-based surface encoders and approximate the neighborhood of each point using a shared multi-layer perceptron. We show that our formulation is permutation invariant and present a neural network that computes the scattering function using spherical harmonics. Our approach can handle objects with arbitrary topologies and deforming models, and takes less than 1ms per object on a commodity GPU. We have analyzed the accuracy, performed validation on thousands of unseen 3D objects, and highlighted the benefits over other point-based geometric deep learning methods. To the best of our knowledge, this is the first real-time learning algorithm that can approximate the acoustic scattering properties of arbitrary objects with high accuracy.

【7】 Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Authors: Vivek Jayaram, John Thickstun Affiliations: Department of Computer Science, University of Washington Comments: 16 pages, 8 figures, to appear in ICML 2021 Link: https://arxiv.org/abs/2105.08164 Abstract: This paper introduces an alternative approach to sampling from autoregressive models. Autoregressive models are typically sampled sequentially, according to the transition dynamics defined by the model. Instead, we propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global log-likelihood of the sequence. This approach parallelizes the sampling process and generalizes to conditional sampling. Using an autoregressive model as a Bayesian prior, we can steer the output of a generative model using a conditional likelihood or constraints. We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.
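
The idea of starting from white noise and following Langevin dynamics on the global log-likelihood can be sketched on a toy model. The example below uses a Gaussian AR(1) process as a stand-in for a learned autoregressive model and applies the plain unadjusted Langevin update x ← x + η ∇log p(x) + √(2η) ε to every position at once. The AR(1) model, the step size, the step count, and all function names are assumptions for illustration; this is not the paper's exact procedure.

```python
import math
import random


def ar1_log_likelihood(x, a=0.9, sigma=1.0):
    """Global log-likelihood (up to a constant) of x under a toy AR(1) model."""
    ll = -0.5 * x[0] ** 2 / sigma ** 2
    for t in range(1, len(x)):
        ll -= 0.5 * (x[t] - a * x[t - 1]) ** 2 / sigma ** 2
    return ll


def ar1_grad(x, a=0.9, sigma=1.0):
    """Gradient of ar1_log_likelihood with respect to every position of x."""
    g = [0.0] * len(x)
    g[0] = -x[0] / sigma ** 2
    for t in range(1, len(x)):
        r = (x[t] - a * x[t - 1]) / sigma ** 2  # scaled residual at step t
        g[t] -= r          # residual depends on x[t] directly
        g[t - 1] += a * r  # and on x[t-1] through the prediction a * x[t-1]
    return g


def langevin_sample(n, steps=2000, eta=0.01, a=0.9, sigma=1.0, seed=0):
    """Unadjusted Langevin sampling: all n positions are updated together."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white-noise initialization
    noise_scale = math.sqrt(2.0 * eta)
    for _ in range(steps):
        g = ar1_grad(x, a, sigma)
        x = [xi + eta * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x
```

Each update touches every position using only the current sequence, so one step can be computed in parallel across positions, whereas ancestral sampling must produce x[t] before x[t+1].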

【8】 MUSER: MUltimodal Stress Detection using Emotion Recognition as an Auxiliary Task

Authors: Yiqun Yao, Michalis Papakostas, Mihai Burzo, Mohamed Abouelenien, Rada Mihalcea Affiliations: Computer Science and Engineering, University of Michigan; Mechanical Engineering, University of Michigan; Computer and Information Science, University of Michigan Comments: NAACL 2021 accepted Link: https://arxiv.org/abs/2105.08146 Abstract: The capability to automatically detect human stress can benefit artificial intelligent agents involved in affective computing and human-computer interaction. Stress and emotion are both human affective states, and stress has proven to have important implications for the regulation and expression of emotion. Although a series of methods have been established for multimodal stress detection, limited steps have been taken to explore the underlying inter-dependence between stress and emotion. In this work, we investigate the value of emotion recognition as an auxiliary task to improve stress detection. We propose MUSER -- a transformer-based model architecture and a novel multi-task learning algorithm with speed-based dynamic sampling strategy. Evaluations on the Multimodal Stressed Emotion (MuSE) dataset show that our model is effective for stress detection with both internal and external auxiliary tasks, and achieves state-of-the-art results.
