Finance / Speech / Audio Processing Academic Digest [8.20]

2021-08-24 16:36:52


q-fin (Finance): 3 papers

cs.SD (Speech): 2 papers

eess.AS (Audio Processing): 5 papers

1. q-fin (Finance):

【1】 Discriminating modelling approaches for Point in Time Economic Scenario Generation Link: https://arxiv.org/abs/2108.08818

Authors: Rui Wang Affiliation: Department of Mathematics (D-MATH), ETH Zürich. Supervisors: Prof. Dr. Patrick Cheridito, Mr. Binghuan Lin (UBS) Comments: 49 pages, 20 figures Abstract: We introduce the notion of Point in Time Economic Scenario Generation (PiT ESG) with a clear mathematical problem formulation to unify and compare economic scenario generation approaches conditional on forward looking market data. Such PiT ESGs should provide quicker and more flexible reactions to sudden economic changes than traditional ESGs calibrated solely to long periods of historical data. We specifically take as economic variable the S&P500 Index with the VIX Index as forward looking market data to compare the nonparametric filtered historical simulation, GARCH model with joint likelihood estimation (parametric), Restricted Boltzmann Machine and the conditional Variational Autoencoder (Generative Networks) for their suitability as PiT ESG. Our evaluation consists of statistical tests for model fit and benchmarking the out of sample forecasting quality with a strategy backtest using model output as stop loss criterion. We find that both Generative Networks outperform the nonparametric and classic parametric model in our tests, but that the CVAE seems to be particularly well suited for our purposes: yielding more robust performance and being computationally lighter.

【2】 Regional disparities in Social Mobility of India Link: https://arxiv.org/abs/2108.08816

Authors: Anuradha Singh Affiliation: Economics and Finance Department, BITS Pilani Campus Abstract: Rapid rise in income inequality in India is a serious concern. While the emphasis is on inclusive growth, it seems difficult to tackle the problem without looking at the intricacies of the problem. The Social Mobility Index is an important tool that focuses on bringing long-term equality by identifying priority policy areas in the country. The PCA technique is employed in computation of the index. Overall, the Union Territory of Delhi ranks first, with the highest social mobility and the least social mobility is in Chhattisgarh. In addition, health and education access, quality and equity are key priority areas that can help improve social mobility in India. Thus, we conclude that human capital is of great importance in promoting social mobility and development in the present times.
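
The abstract above only states that PCA is used to compute the index, so the following is a minimal sketch of one common way to build a PCA-based composite index, not the paper's actual procedure. The function name `pca_index`, the choice of the first principal component as the weight vector, and the toy indicator matrix are all assumptions made for illustration.

```python
import numpy as np

def pca_index(indicators: np.ndarray) -> np.ndarray:
    """Score each row (region) by its projection on the first principal component.

    Assumption: every column is an indicator where a higher value is better.
    """
    # Standardize each indicator so that no single unit dominates the index.
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    # The rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[0]
    # The sign of a principal direction is arbitrary; orient it so that
    # higher indicator values give a higher index.
    if weights.sum() < 0:
        weights = -weights
    return z @ weights

# Hypothetical data: 4 regions x 3 indicators (not taken from the paper).
toy = np.array([
    [0.9, 0.8, 0.7],
    [0.6, 0.7, 0.5],
    [0.4, 0.3, 0.4],
    [0.1, 0.2, 0.1],
])
scores = pca_index(toy)
ranking = np.argsort(-scores)  # region indices from highest to lowest index
print(scores, ranking)
```

Ranking regions by `scores` then gives an ordering analogous to the one reported in the abstract, under the assumptions stated above.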

【3】 Risk Preferences in Time Lotteries Link: https://arxiv.org/abs/2108.08366

Authors: Yonatan Berman, Mark Kirstein Abstract: An important but understudied question in economics is how people choose when facing uncertainty in the timing of events. Here we study preferences over time lotteries, in which the payment amount is certain but the payment time is uncertain. Expected discounted utility theory (EDUT) predicts decision makers to be risk-seeking over time lotteries. We explore a normative model of growth-optimality, in which decision makers maximise the long-term growth rate of their wealth. Revisiting experimental evidence on time lotteries, we find that growth-optimality accords better with the evidence than EDUT. We outline future experiments to scrutinise further the plausibility of growth-optimality.
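
The EDUT prediction mentioned in the abstract can be illustrated numerically. The sketch below assumes exponential discounting and linear utility in the payment, and the payment, rate, and delays are hypothetical values chosen for illustration; none of them come from the paper. Because the discount factor exp(-r t) is convex in t, Jensen's inequality makes a lottery over payment times worth at least as much as a sure payment at the mean time.

```python
import math

def discounted_utility(payment: float, delay: float, rate: float) -> float:
    """Utility of a payment received after `delay`, under exponential discounting."""
    return payment * math.exp(-rate * delay)

def expected_discounted_utility(payment, delays, probs, rate):
    """Expected discounted utility of a time lottery over the given delays."""
    return sum(p * discounted_utility(payment, t, rate)
               for t, p in zip(delays, probs))

# Hypothetical numbers: the same payment, either at a sure delay of 5
# or at a delay of 1 or 9 with equal probability (mean delay is also 5).
payment, rate = 100.0, 0.05
sure = discounted_utility(payment, 5.0, rate)
risky = expected_discounted_utility(payment, [1.0, 9.0], [0.5, 0.5], rate)

print(f"sure delay:   {sure:.2f}")
print(f"time lottery: {risky:.2f}")
print("EDUT prefers the lottery:", risky > sure)
```

The comparison covers only the EDUT side of the abstract; the growth-optimal model is not reproduced here.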

2. cs.SD (Speech):

【1】 Integrating Dialog History into End-to-End Spoken Language Understanding Systems Link: https://arxiv.org/abs/2108.08405

Authors: Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury Affiliation: IBM Research AI, Yorktown Heights, NY, USA Comments: Interspeech 2021 Abstract: End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings, and use them as an additional input to the SLU model along with the speech features for the current utterance. We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus. We observe significant improvements: 8% for dialog action and 30% for caller intent recognition tasks, in comparison to a competitive context independent end-to-end baseline system.

【2】 More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations Link: https://arxiv.org/abs/2108.08745

Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines Affiliations: School of Computer Science, University College Dublin, Ireland; Insight Centre for Data Analytics, Ireland; School of EECS, Queen Mary University of London, UK; The Alan Turing Institute, UK Comments: Published in 2021 13th International Conference on Quality of Multimedia Experience (QoMEX) Abstract: Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment metrics. In this paper, we propose two multi-task models to tackle the problems above. In the first model, we first learn a feature representation with a degradation classifier on a large dataset. Then we perform MOS prediction and degradation classification simultaneously on a small dataset annotated with MOS. In the second approach, the initial stage consists of learning features with a deep clustering-based unsupervised feature representation on the large dataset. Next, we perform MOS prediction and cluster label classification simultaneously on a small dataset. The results show that the deep clustering-based model outperforms the degradation classifier-based model and the 3 baselines (autoencoder features, P.563, and SRMRnorm) on TCD-VoIP. This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.

3. eess.AS (Audio Processing):

【1】 More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations Link: https://arxiv.org/abs/2108.08745

Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines Affiliations: School of Computer Science, University College Dublin, Ireland; Insight Centre for Data Analytics, Ireland; School of EECS, Queen Mary University of London, UK; The Alan Turing Institute, UK Comments: Published in 2021 13th International Conference on Quality of Multimedia Experience (QoMEX) Abstract: Cross-listed; see entry 【2】 under cs.SD above.

【2】 Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel Link: https://arxiv.org/abs/2108.08663

Authors: Jin Li, Nan Yan, Lan Wang Affiliations: CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems Abstract: Speech Emotion Recognition (SER) in a single language has achieved remarkable results through deep learning approaches in the last decade. However, cross-lingual SER remains a challenge in real-world applications due to a great difference between the source and target domain distributions. To address this issue, we propose an Unsupervised Cross-Lingual Neural Network with Pseudo Multilabel (UCNNPM) that is trained to learn the emotion similarities between source domain features inside an external memory adjusted to identify emotion in cross-lingual databases. UCNNPM introduces a novel approach that leverages external memory to store source domain features and generates pseudo multilabel for each target domain data by computing the similarities between the external memory and the target domain features. We evaluate our approach on multiple different languages of speech emotion databases. Experimental results show our proposed approach significantly improves the weighted accuracy (WA) across multiple low-resource languages on Urdu, Skropus, ShEMO, and EMO-DB corpus.
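
The pseudo multilabel idea in the abstract (store source features in a memory, then label each target sample from its similarity to that memory) can be sketched roughly as follows. The abstract does not say which similarity measure or selection rule UCNNPM uses, so the cosine similarity, the top-k rule, the function name `pseudo_multilabel`, and the toy features below are all assumptions for illustration rather than the authors' method.

```python
import numpy as np

def pseudo_multilabel(target, memory, memory_labels, num_classes, k=2):
    """Build a multi-hot pseudo label for each target feature vector.

    Assumption: a class is switched on when it labels one of the k memory
    entries most similar (by cosine similarity) to the target feature.
    """
    # Normalize rows so that the dot product equals cosine similarity.
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    similarity = t @ m.T                       # shape: (n_target, n_memory)
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    labels = np.zeros((target.shape[0], num_classes))
    for i, neighbours in enumerate(top_k):
        labels[i, memory_labels[neighbours]] = 1.0
    return labels

# Hypothetical 2-D features: memory entries 0 and 1 belong to class 0,
# entries 2 and 3 to class 1.
memory = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
memory_labels = np.array([0, 0, 1, 1])
target = np.array([[1.0, 0.05], [0.05, 1.0]])

labels = pseudo_multilabel(target, memory, memory_labels, num_classes=2, k=2)
print(labels)
```

With a larger `k`, a target feature lying between classes picks up several labels at once, which is the sense in which the pseudo label is a multilabel in this sketch.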

【3】 ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition Link: https://arxiv.org/abs/2108.08470

Authors: Xia Gong, Yuxiang Zhu, Haidi Zhu, Haoran Wei Affiliations: School of Music, Shandong University of Technology, Zibo, China; Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China; No., High School (Baoshan) of East China Normal University Abstract: Musical instruments recognition is a widely used application for music information retrieval. As most of previous musical instruments recognition dataset focus on western musical instruments, it is difficult for researcher to study and evaluate the area of traditional Chinese musical instrument recognition. This paper propose a traditional Chinese music dataset for training model and performance evaluation, named ChMusic. This dataset is free and publicly available, 11 traditional Chinese musical instruments and 55 traditional Chinese music excerpts are recorded in this dataset. Then an evaluation standard is proposed based on ChMusic dataset. With this standard, researchers can compare their results following the same rule, and results from different researchers will become comparable.

【4】 Integrating Dialog History into End-to-End Spoken Language Understanding Systems Link: https://arxiv.org/abs/2108.08405

Authors: Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury Affiliation: IBM Research AI, Yorktown Heights, NY, USA Comments: Interspeech 2021 Abstract: Cross-listed; see entry 【1】 under cs.SD above.

【5】 An analytic physically motivated model of the mammalian cochlea Link: https://arxiv.org/abs/2012.15750

Authors: Samiya A Alkhairy, Christopher A Shera Affiliations: Massachusetts Institute of Technology, Cambridge, MA, USA; University of Southern California, Los Angeles, CA, USA Abstract: We develop an analytic model of the mammalian cochlea. We use a mixed physical-phenomenological approach by utilizing existing work on the physics of classical box-representations of the cochlea, and behavior of recent data-derived wavenumber estimates. Spatial variation is incorporated through a single independent variable that combines space and frequency. We arrive at closed-form expressions for the organ of Corti velocity, its impedance, the pressure difference across the organ of Corti, and its wavenumber. We perform model tests using real and imaginary parts of chinchilla data from multiple locations and for multiple variables. The model also predicts impedances that are qualitatively consistent with current literature. For implementation, the model can leverage existing efforts for both filter bank and filter cascade models that target improved algorithmic or analog circuit efficiencies. The simplicity of the cochlear model, its small number of model constants, its ability to capture the variation of tuning, its closed-form expressions for physically-interrelated variables, and the form of these expressions that allows for easily determining one variable from another make the model appropriate for analytic and digital auditory filter implementations as discussed here, as well as for extracting macromechanical insights regarding how the cochlea works.
