金融/语音/音频处理学术速递[7.19]

2021-07-27 11:02:21

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!

q-fin金融,共计6篇

cs.SD语音,共计5篇

eess.AS音频处理,共计5篇

1.q-fin金融:

【1】 Key features of administrative responsibility 标题:行政责任的主要特征

作者:Vladimir Zhavoronkov,Valeri Lipunov,Mattia Masolletti 机构:Associate Professor, Russian University of Transport, Moscow, Russia, Lecturer, NUST University, Rome, Italy 备注:Volume: 6 pages. Key words: administrative responsibility. JEL codes: K-1; K-4; K-10 链接:https://arxiv.org/abs/2107.07816 摘要:本文从多个方面探讨了法律责任本身及其类型。作者运用法律分析方法以及一致性和完整性原则,着重指出了行政责任中的矛盾及其解释上的法律空白。 摘要:The article examines both the legal responsibility itself and its types, and in various aspects. The authors apply legal analysis, as well as the principles of consistency and integrity. The contradictions of administrative responsibility, as well as legal gaps in its interpretation, are highlighted.

【2】 Predicting Daily Trading Volume via Various Hidden States 标题:基于各种隐含状态的日交易量预测

作者:Shaojun Ma,Pengcheng Li 机构:Department of Mathematics, Georgia Institute of Technology, Atlanta, GA, USA, Global Head of Trading Alpha Research, Invesco Ltd 链接:https://arxiv.org/abs/2107.07678 摘要:日内成交量的预测在alpha研究中起着重要的作用。该课题已有滚动平均法(RM)和基于两状态的Kalman滤波等现有方法。我们在Kalman滤波框架下将两状态扩展为多个状态,以提高预测精度。具体来说,对于不同的股票,我们利用交叉验证,通过最小化成交量的均方误差来确定最佳状态数。我们通过一系列对比实验和数值分析证明了该方法的有效性。 摘要:Predicting intraday trading volume plays an important role in trading alpha research. Existing methods such as rolling means (RM) and a two-state Kalman Filtering method have been presented on this topic. We extend the two states to various states in the Kalman Filter framework to improve the accuracy of prediction. Specifically, for different stocks we utilize cross validation and determine the best number of states by minimizing the mean squared error of the trading volume. We demonstrate the effectiveness of our method through a series of comparison experiments and numerical analysis.
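摘要提到在Kalman滤波框架下预测成交量。下面用纯Python给出一维Kalman滤波的极简示意:状态假设为随机游走、仅含单一隐状态,过程噪声方差 q 与观测噪声方差 r 的取值均为虚构,并非论文中的多状态模型。

```python
import random

def kalman_filter_1d(observations, q=0.1, r=1.0):
    """Minimal 1-D Kalman filter: random-walk hidden level, noisy observations.

    q: process-noise variance, r: observation-noise variance.
    Returns the one-step-ahead prediction made before seeing each observation.
    """
    x, p = observations[0], 1.0  # initial state estimate and its variance
    predictions = []
    for z in observations:
        # Predict step: random-walk dynamics, so x stays and variance grows by q
        p = p + q
        predictions.append(x)
        # Update step: fold in the new observation z
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
    return predictions

# Toy usage: noisy "volumes" fluctuating around a level of 100
random.seed(0)
volumes = [100 + random.gauss(0, 5) for _ in range(200)]
preds = kalman_filter_1d(volumes)
mse = sum((p - v) ** 2 for p, v in zip(preds, volumes)) / len(volumes)
```

实际应用中可如摘要所述,对每只股票用交叉验证、以成交量均方误差最小为准则选择状态数。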

【3】 Modelling risk for commodities in Brazil: An application to live cattle spot and futures prices 标题:巴西商品风险模型:活牛现货和期货价格的应用

作者:R. G. Alcoforado,W. Bernardino,A. D. Egídio dos Reis,J. A. C. Santos 机构:Universidade de Lisboa, Universidade Federal de Pernambuco, Universidade do Algarve 链接:https://arxiv.org/abs/2107.07556 摘要:这项研究分析了巴西Boi Gordo指数(BGI)的一系列活牛现货和期货价格。其目的是建立一个最能刻画这种商品行为的模型,以更准确地估计期货价格。创建的数据库包含2,010条发生期货合约交易的每日条目,以及2006年12月1日至2015年4月30日期间BGI在市场上的现货销售记录。需要衡量此类风险的最重要原因之一是设定损失限额。为了识别价格行为的模式、改善未来交易的结果,投资者必须在更长的时期内分析资产价值的波动。文献研究表明,此前没有其他研究使用这种方法对这种商品进行过全面分析。畜牧业在巴西是一项大生意:2017年,这一行业的产值达5,232.5亿巴西雷亚尔(约1,305亿美元),当年农业综合企业贡献了巴西国内生产总值的22%。利用所提出的风险建模技术,经济主体可以就投资者力所能及的哪些选择能产生更有效的风险管理做出最佳决策。该方法基于Holt-Winters指数平滑算法、自回归求和移动平均(ARIMA)、带外生输入的ARIMA、广义自回归条件异方差(GARCH)模型和广义自回归移动平均(GARMA)模型。更具体地说,共应用了5种不同的方法、比较了12种不同的模型,以刻画和预测BGI商品的行为。结果表明,c(2,1)阶、无截距的GARMA模型是最优的。 摘要:This study analysed a series of live cattle spot and futures prices from the Boi Gordo Index (BGI) in Brazil. The objective was to develop a model that best portrays this commodity's behaviour to estimate futures prices more accurately. The database created contained 2,010 daily entries in which trade in futures contracts occurred, as well as BGI spot sales in the market, from 1 December 2006 to 30 April 2015. One of the most important reasons why this type of risk needs to be measured is to set loss limits. To identify patterns in price behaviour in order to improve future transactions' results, investors must analyse fluctuations in assets' value for longer periods. Bibliographic research revealed that no other study has conducted a comprehensive analysis of this commodity using this approach. Cattle ranching is big business in Brazil given that in 2017, this sector moved 523.25 billion Brazilian reals (about 130.5 billion United States dollars). In that year, agribusiness contributed 22% of Brazil's total gross domestic product. 
Using the proposed risk modelling technique, economic agents can make the best decision about which options within these investors' reach produce more effective risk management. The methodology was based on Holt-Winters exponential smoothing algorithm, autoregressive integrated moving average (ARIMA), ARIMA with exogenous inputs, generalised autoregressive conditionally heteroskedastic and generalised autoregressive moving average (GARMA) models. More specifically, 5 different methods were applied that allowed a comparison of 12 different models as ways to portray and predict the BGI commodity's behaviour. The results show that GARMA with order c(2,1) and without intercept is the best model.
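摘要比较了Holt-Winters、ARIMA、GARMA等模型。下面给出Holt双参数指数平滑(Holt-Winters家族中仅含水平与趋势项、无季节项的简化形式)的纯Python示意,平滑参数与价格序列均为虚构,仅演示该类方法的结构:

```python
def holt_linear(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's double exponential smoothing (level + trend), a building
    block of the Holt-Winters family compared in the study."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        last_level = level
        # Update the level toward the new observation, and the trend
        # toward the observed change in level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    # h-step-ahead forecasts extrapolate the last level along the trend
    return [level + (h + 1) * trend for h in range(horizon)]

# Toy usage: a perfectly linear "price" series is forecast exactly
prices = [100 + 0.5 * t for t in range(30)]
forecast = holt_linear(prices, horizon=3)
```

对严格线性的序列,水平与趋势的估计保持精确,三步预测为 115.0、115.5、116.0。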

【4】 Public Health, Technology, and Human Rights: Lessons from Digital Contact Tracing 标题:公共卫生、技术和人权:来自数字接触者追踪的教训

作者:Maria Carnovale,Khahlil Louisy 备注:23 Pages 链接:https://arxiv.org/abs/2107.07552 摘要:为了减轻人工接触追踪过程的低效率,在SARS-CoV-2全球大流行期间开发了数字接触追踪和暴露通知系统,作为公共利益技术使用。这些工具的有效实施需要多个因素的协调,包括地方法规和政策以及对政府和公共卫生官员的信任。还应认真考虑尽量减少与公共卫生领域已被证明有效的现有流程之间的潜在冲突。本文详细介绍了爱尔兰、瓜亚基尔、海地和菲律宾的四个独特案例,强调了坚持科学有效性、必要性、时限性和相称性原则的重要性。 摘要:To mitigate inefficiencies in manual contact tracing processes, Digital Contact Tracing and Exposure Notifications Systems were developed for use as public-interest technologies during the SARS-CoV-2 global pandemic. Effective implementation of these tools requires alignment across several factors, including local regulations and policies and trust in government and public health officials. Careful consideration should also be made to minimize any potential conflicts with existing processes in public health that have demonstrated effectiveness. Four unique cases, of Ireland, Guayaquil, Haiti, and the Philippines, detailed in this paper will highlight the importance of upholding the principles of Scientific Validity, Necessity, Time Boundedness, and Proportionality.

【5】 Predicting Drought and Subsidence Risks in France 标题:预测法国的干旱和沉降风险

作者:Arthur Charpentier,Molly James,Hani Ali 机构:a Universit´e du Qu´ebec a Montr´eal (UQAM), Montr´eal (Qu´ebec), Canada, b EURo Institut d’Actuariat (EURIA), Universit´e de Brest, France, c Willis Re, Paris, France 链接:https://arxiv.org/abs/2107.07668 摘要:干旱事件的经济后果日益重要,但部分由于其潜在机制的复杂性,这些后果往往难以把握。在本文中,我们将研究干旱的后果之一,即沉降风险(更具体地说,粘土收缩引起的沉降),几十年来,法国一直强制为其提供保险。我们利用从几家保险公司(约占家庭保险市场的四分之一)获得的过去二十年的数据,提出了一些统计模型,为保险公司预测这类干旱事件的发生频率和强度,结果表明气候变化很可能会对这一风险产生重大的经济后果。但是,即使我们使用比标准回归型模型更先进的模型(这里是随机森林,用以捕捉非线性和交叉效应),即便所有地球物理和气候信息都可用,仍然很难预测沉降索赔的经济成本。 摘要:The economic consequences of drought episodes are increasingly important, although they are often difficult to apprehend in part because of the complexity of the underlying mechanisms. In this article, we will study one of the consequences of drought, namely the risk of subsidence (or more specifically clay shrinkage induced subsidence), for which insurance has been mandatory in France for several decades. Using data obtained from several insurers, representing about a quarter of the household insurance market, over the past twenty years, we propose some statistical models to predict the frequency but also the intensity of these droughts, for insurers, showing that climate change will probably have major economic consequences on this risk. But even if we use more advanced models than standard regression-type models (here random forests to capture non-linearity and cross effects), it is still difficult to predict the economic cost of subsidence claims, even if all geophysical and climatic information is available.
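摘要将索赔的发生频率与强度分开建模。下面给出保险定价中最基础的"频率×强度"期望成本示意(纯Python;保单数、索赔率与平均索赔额均为虚构数字,并非论文的随机森林模型):

```python
def expected_subsidence_cost(n_policies, claim_rate, mean_severity):
    """Collective risk model in its simplest form:
    expected annual cost = exposure * claim frequency * mean claim severity."""
    return n_policies * claim_rate * mean_severity

# Hypothetical portfolio: 10,000 insured homes, a 0.2% annual subsidence
# claim rate, and a mean claim of 15,000 (currency units)
cost = expected_subsidence_cost(10_000, 0.002, 15_000.0)
# → 300000.0
```

频率模型与强度模型各自的预测误差会在这一乘积中叠加,这也是摘要所说"即便信息齐全仍难以预测经济成本"的一个直观原因。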

【6】 Supporting the robust ordinal regression approach to multiple criteria decision aiding with a set of representative value functions 标题:支持多准则决策的稳健有序回归方法,并辅之以一组代表性的值函数

作者:Sally Giuseppe Arcidiacono,Salvatore Corrente,Salvatore Greco 链接:https://arxiv.org/abs/2107.07553 摘要:本文提出了一种新的方法,用一族代表值函数来表示稳健序数回归法的结果。对任意两个备选方案$a$和$b$,这族函数满足以下两个条件:1)如果对于所有兼容值函数,$a$的评估值不低于$b$,并且对于至少一个值函数,$a$的评估值更好,则对于所有代表值函数,$a$的评估值大于$b$的评估值;2)如果存在一个兼容值函数给出$a$大于$b$的评估值,而另一个兼容值函数给出$a$小于$b$的评估值,则也至少存在一个代表值函数给出$a$更好的评估值,以及另一个代表值函数给出$a$小于$b$的评估值。这族代表值函数旨在让决策者(DM)更清晰地了解兼容值函数所蕴含的偏好,以支持多准则决策辅助建设性方法中的讨论。 摘要:In this paper we propose a new methodology to represent the results of the robust ordinal regression approach by means of a family of representative value functions for which, taken two alternatives $a$ and $b$, the following two conditions are satisfied: 1) if for all compatible value functions $a$ is evaluated not worse than $b$ and for at least one value function $a$ has a better evaluation, then the evaluation of $a$ is greater than the evaluation of $b$ for all representative value functions; 2) if there exists one compatible value function giving $a$ an evaluation greater than $b$ and another compatible value function giving $a$ an evaluation smaller than $b$, then there is also at least one representative value function giving a better evaluation to $a$ and another representative value function giving $a$ an evaluation smaller than $b$. This family of representative value functions intends to provide the Decision Maker (DM) a clearer idea of the preferences obtained by the compatible value functions, with the aim to support the discussion in the constructive approach of Multiple Criteria Decision Aiding.
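摘要中的两个条件可以在玩具例子上机械地验证。下面的纯Python示意把每个值函数表示为对备选方案的打分字典(备选方案与打分均为虚构),仅演示条件1的检查:

```python
def necessarily_preferred(a, b, value_functions):
    """Premise of condition 1: no compatible value function rates a below b,
    and at least one rates a strictly above b."""
    scores = [(u[a], u[b]) for u in value_functions]
    return all(ua >= ub for ua, ub in scores) and any(ua > ub for ua, ub in scores)

def condition1_holds(a, b, compatible, representative):
    """If a is necessarily preferred to b among the compatible value
    functions, every representative value function must rate a strictly
    above b; otherwise the condition is vacuously satisfied."""
    if not necessarily_preferred(a, b, compatible):
        return True
    return all(u[a] > u[b] for u in representative)

# Toy example: alternatives 'a', 'b' scored by three compatible and two
# representative value functions (all numbers made up)
compatible = [{'a': 0.9, 'b': 0.5}, {'a': 0.7, 'b': 0.7}, {'a': 0.8, 'b': 0.6}]
representative = [{'a': 0.85, 'b': 0.55}, {'a': 0.75, 'b': 0.65}]
ok = condition1_holds('a', 'b', compatible, representative)
```

若某个"代表"函数给$a$的打分不高于$b$(例如 {'a': 0.5, 'b': 0.6}),检查即失败,说明它不满足条件1。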

2.cs.SD语音:

【1】 Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach 标题:基于学习不忘方法的自动音频字幕的持续学习

作者:Jan Berg,Konstantinos Drossos 机构:Audio Research Group, Tampere University, Finland 链接:https://arxiv.org/abs/2107.08028 摘要:自动音频字幕(AAC)是为一般音频信号的内容自动创建文本描述(即字幕)的任务。大多数AAC方法都是在现有数据集上进行优化和/或评估。考虑到AAC数据集所含信息有限,AAC方法很可能只学习到所用数据集中包含的信息。在本文中,我们提出了首个利用持续学习方法使AAC方法不断适应新信息的方案。在我们的场景中,一个预先优化的AAC方法被应用于一些未见过的一般音频信号,并可在给定新的参考字幕时更新其参数以适应新信息。我们使用一个免费可用的预先优化AAC方法和两个免费可用的AAC数据集来评估我们的方法。我们将所提方法与三种场景进行了比较:其中两种是在一个数据集上训练并在另一个数据集上评估,第三种是在一个数据集上训练并在另一个数据集上微调。结果表明,该方法在蒸馏新知识和不遗忘已有知识之间取得了良好的平衡。 摘要:Automated audio captioning (AAC) is the task of automatically creating textual descriptions (i.e. captions) for the contents of a general audio signal. Most AAC methods are using existing datasets to optimize and/or evaluate upon. Given the limited information held by the AAC datasets, it is very likely that AAC methods learn only the information contained in the utilized datasets. In this paper we present a first approach for continuously adapting an AAC method to new information, using a continual learning method. In our scenario, a pre-optimized AAC method is used for some unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption. We evaluate our method using a freely available, pre-optimized AAC method and two freely available AAC datasets. We compare our proposed method with three scenarios, two of training on one of the datasets and evaluating on the other and a third of training on one dataset and fine-tuning on the other. Obtained results show that our method achieves a good balance between distilling new knowledge and not forgetting the previous one.
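标题中的 Learning Without Forgetting 方法通常把新任务损失与一个蒸馏项相加,后者约束新模型的输出分布接近冻结的旧模型。下面给出带温度软化的蒸馏损失的纯Python示意(温度、logits与权重 lam 均为虚构取值,并非论文的具体实现):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between the softened old-model and new-model
    distributions: the 'not forgetting' term of Learning Without Forgetting."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))

def lwf_objective(task_loss, old_logits, new_logits, lam=1.0):
    """Total LwF objective: new-task loss plus weighted distillation term."""
    return task_loss + lam * distillation_loss(old_logits, new_logits)

# Identical logits minimize the distillation term (it equals the
# entropy of the old distribution); drifted logits increase it
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
drifted = distillation_loss([2.0, 0.5, -1.0], [0.0, 2.0, -1.0])
```

蒸馏项越小,说明新模型在旧数据分布上的行为偏离越少,这正是"不遗忘"的量化方式。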

【2】 Controlled AutoEncoders to Generate Faces from Voices 标题:受控自动编码器,可从语音生成人脸

作者:Hao Liang,Lulan Yu,Guikang Xu,Bhiksha Raj,Rita Singh 机构:Carnegie Mellon University, Pittsburgh PA, USA 链接:https://arxiv.org/abs/2107.07988 摘要:以往的多项研究表明,人声特征与面部特征之间存在着很强的相关性。然而,现有的方法只是简单地从声音生成人脸,而没有探索导致这些观察到的相关性的特征集合。为了探索这一点,可以将问题重新表述为:“为了被感知为源语音的发起者,目标人脸需要改变多少?”基于这一视角,本文提出了一种目标人脸变形框架,使面部特征由学习到的语音-人脸相关性隐式引导。我们的框架包括一个引导式自动编码器,可将一张脸转换为另一张脸,由一个称为选通控制器的独特模型条件化组件控制,该控制器根据输入语音录音修改重建的人脸。我们通过人类受试者评测和人脸检索,在VoxCelab和VGGFace数据集上对该框架进行了评估。各项实验证明了所提模型的有效性。 摘要:Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation in this paper. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller which modifies the reconstructed face based on input voice recordings. We evaluate the framework on VoxCelab and VGGFace datasets through human subjects and face retrieval. Various experiments demonstrate the effectiveness of our proposed model.
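摘要描述的选通控制器可以理解为:由语音嵌入产生逐维门控值,决定人脸嵌入各维被修改的幅度。下面是一个高度简化的纯Python示意(线性门控、向量维度与权重均为虚构,并非论文的网络结构):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_morph(face_embedding, voice_embedding, gate_weights):
    """Illustrative gating controller: the voice embedding produces a
    per-dimension gate in (0, 1) deciding how strongly each face
    dimension is pulled toward a voice-derived target value."""
    morphed = []
    for i, f in enumerate(face_embedding):
        # Gate for dimension i, driven by the voice embedding (toy linear map)
        g = sigmoid(sum(w * v for w, v in zip(gate_weights[i], voice_embedding)))
        target = voice_embedding[i % len(voice_embedding)]
        morphed.append((1 - g) * f + g * target)  # convex blend per dimension
    return morphed

# Toy usage with made-up embeddings and gate weights
face = [0.2, -0.4, 0.9]
voice = [0.5, -0.1]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = gated_morph(face, voice, weights)
```

门控输出在 (0, 1) 内,因此每一维的结果严格落在原人脸值与语音目标值之间:门控越接近 1,人脸被语音"拉动"得越多。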

【3】 A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation 标题:一种用于教师发声评价的多模态机器学习框架

作者:Hang Li,Yu Kang,Yang Hao,Wenbiao Ding,Zhongqin Wu,Zitao Liu 机构:TAL Education Group, Beijing, China 备注:AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021 链接:https://arxiv.org/abs/2107.07956 摘要:语音表达质量是评价教师教学热情的重要指标之一,而教学热情已被普遍认为与整体课程质量密切相关。然而,现有的语音表达评价主要采用人工评分的方式,面临主观性和耗时两大核心挑战。在本文中,我们提出了一种新的机器学习方法,利用成对比较和多模态正交融合算法,从流畅度和热情两方面对教师语音表达生成大规模的客观评价结果。我们从真实教育场景中收集了两个数据集,实验结果证明了算法的有效性。为了鼓励可复现的结果,我们在 https://github.com/tal-ai/ML4VocalDelivery.git 公开了我们的代码。 摘要:The quality of vocal delivery is one of the key indicators for evaluating teacher enthusiasm, which has been widely accepted to be connected to the overall course qualities. However, existing evaluation for vocal delivery is mainly conducted with manual ratings, which faces two core challenges: subjectivity and time consumption. In this paper, we present a novel machine learning approach that utilizes pairwise comparisons and a multimodal orthogonal fusing algorithm to generate large-scale objective evaluation results of the teacher vocal delivery in terms of fluency and passion. We collect two datasets from real-world education scenarios and the experiment results demonstrate the effectiveness of our algorithm. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/ML4VocalDelivery.git.
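摘要利用成对比较生成客观评价。把成对胜负转化为逐项分数的一种常见做法是Bradley-Terry模型;摘要未说明论文是否采用该模型,下面仅给出其极简MM迭代的纯Python示意(比较数据为虚构):

```python
def bradley_terry(n_items, comparisons, iterations=100):
    """Fit Bradley-Terry scores from pairwise outcomes (winner, loser)
    via simple MM updates; a higher score means the item was judged
    better more often."""
    scores = [1.0] * n_items
    for _ in range(iterations):
        wins = [0] * n_items
        denom = [0.0] * n_items
        for w, l in comparisons:
            wins[w] += 1
            pair = 1.0 / (scores[w] + scores[l])
            denom[w] += pair
            denom[l] += pair
        # MM update, guarding items that never appeared in a comparison
        scores = [wins[i] / denom[i] if denom[i] else scores[i]
                  for i in range(n_items)]
        total = sum(scores)
        scores = [s * n_items / total for s in scores]  # normalize scale
    return scores

# Toy judgements: teacher 0 wins every pairwise fluency comparison,
# teacher 1 beats teacher 2, teacher 2 never wins
comparisons = [(0, 1), (0, 1), (0, 2), (1, 2), (0, 2), (1, 2)]
scores = bradley_terry(3, comparisons)
```

得到的分数给出与胜负记录一致的排序,可作为大规模客观评分的基础。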

【4】 Recognizing bird species in diverse soundscapes under weak supervision 标题:弱监督下不同声景中鸟类的识别

作者:Christof Henkel,Pascal Pfeiffer,Philipp Singer 机构:NVIDIA, Munich, Germany, RWTH Aachen University, Aachen, Germany, H,O.ai, Mountain View, USA, All authors contributed equally. 备注:All authors contributed equally, 8 pages, 4 figures, submitted to CEUR-WS 链接:https://arxiv.org/abs/2107.07728 摘要:我们提出了一种用于复杂多样声景中鸟类发声的稳健分类方法,在BirdCLEF2021挑战赛中获得第二名。我们说明了如何通过高效的建模与训练流程并辅以新颖的数据增强方法,充分利用预训练卷积神经网络。由此,我们改进了从弱标注众包数据到自动记录单元所采集生产数据的泛化能力。据此,我们说明了如何朝着准确的鸟类种群自动评估迈进,这将使人工标注无法实现的全球生物多样性大规模监测成为可能。 摘要:We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF2021 challenge. We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods. Thereby, we improve the generalization of weakly labeled crowd-sourced data to productive data collected by autonomous recording units. As such, we illustrate how to progress towards an accurate automated assessment of avian population which would enable global biodiversity monitoring at scale, impossible by manual annotation.

【5】 Multi-task Learning with Cross Attention for Keyword Spotting 标题:基于交叉注意的关键词识别多任务学习

作者:Takuya Higuchi,Anmol Gupta,Chandra Dhir 机构:Apple, Department of Computer Science, The University of Hong Kong 备注:Submitted to ASRU 2021 链接:https://arxiv.org/abs/2107.07634 摘要:关键词检出(keyword spotting, KWS)是语音应用中的一项重要技术,它使用户能够通过说出关键词短语来激活设备。虽然音素分类器可以利用为自动语音识别(ASR)准备的大量转录数据并用于KWS,但训练准则(音素识别)与目标任务(KWS)之间存在不匹配。最近,多任务学习被应用到KWS中,以同时利用ASR和KWS训练数据。在这种方法中,声学模型的输出被分成两个分支对应两个任务:一个是用ASR数据训练的音素转录,另一个是用KWS数据训练的关键词分类。本文在多任务学习框架中引入了一种交叉注意解码器。与简单切分输出层的传统多任务学习方法不同,交叉注意解码器通过在编码器输出和可训练查询序列之间执行交叉注意来汇总来自音素编码器的信息,以预测KWS任务的置信度得分。在KWS任务上的实验结果表明,所提方法比传统的分支切分多任务学习和双向长短时记忆(BLSTM)解码器平均高出12%。 摘要:Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, an output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with the simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach outperformed the conventional multi-task learning with split branches and a bi-directional long short-term memory decoder by 12% on average.
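摘要中的交叉注意解码器让可训练查询序列对编码器输出做注意并汇总信息。下面给出单头、无可学习投影矩阵的缩放点积交叉注意的纯Python示意(查询与编码器输出均为虚构的小向量):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention: each query attends over the
    encoder outputs (used here as both keys and values) and returns a
    summary vector (toy version: single head, no learned projections)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        attn = softmax(scores)
        # Attention-weighted sum of the value vectors
        outputs.append([sum(a * v[j] for a, v in zip(attn, values))
                        for j in range(d)])
    return outputs

# One trainable query summarizing three 2-dim encoder outputs
encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
summary = cross_attention([[1.0, 0.0]], encoder_out, encoder_out)
```

输出是编码器输出向量的凸组合;论文中此类汇总向量再经一层映射得到KWS置信度得分。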

3.eess.AS音频处理:

【1】 Multi-task Learning with Cross Attention for Keyword Spotting 标题:基于交叉注意的关键词识别多任务学习

作者:Takuya Higuchi,Anmol Gupta,Chandra Dhir 机构:Apple, Department of Computer Science, The University of Hong Kong 备注:Submitted to ASRU 2021 链接:https://arxiv.org/abs/2107.07634 摘要:关键词检出(keyword spotting, KWS)是语音应用中的一项重要技术,它使用户能够通过说出关键词短语来激活设备。虽然音素分类器可以利用为自动语音识别(ASR)准备的大量转录数据并用于KWS,但训练准则(音素识别)与目标任务(KWS)之间存在不匹配。最近,多任务学习被应用到KWS中,以同时利用ASR和KWS训练数据。在这种方法中,声学模型的输出被分成两个分支对应两个任务:一个是用ASR数据训练的音素转录,另一个是用KWS数据训练的关键词分类。本文在多任务学习框架中引入了一种交叉注意解码器。与简单切分输出层的传统多任务学习方法不同,交叉注意解码器通过在编码器输出和可训练查询序列之间执行交叉注意来汇总来自音素编码器的信息,以预测KWS任务的置信度得分。在KWS任务上的实验结果表明,所提方法比传统的分支切分多任务学习和双向长短时记忆(BLSTM)解码器平均高出12%。 摘要:Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, an output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with the simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach outperformed the conventional multi-task learning with split branches and a bi-directional long short-term memory decoder by 12% on average.

【2】 Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach 标题:基于学习不忘方法的自动音频字幕的持续学习

作者:Jan Berg,Konstantinos Drossos 机构:Audio Research Group, Tampere University, Finland 链接:https://arxiv.org/abs/2107.08028 摘要:自动音频字幕(AAC)是为一般音频信号的内容自动创建文本描述(即字幕)的任务。大多数AAC方法都是在现有数据集上进行优化和/或评估。考虑到AAC数据集所含信息有限,AAC方法很可能只学习到所用数据集中包含的信息。在本文中,我们提出了首个利用持续学习方法使AAC方法不断适应新信息的方案。在我们的场景中,一个预先优化的AAC方法被应用于一些未见过的一般音频信号,并可在给定新的参考字幕时更新其参数以适应新信息。我们使用一个免费可用的预先优化AAC方法和两个免费可用的AAC数据集来评估我们的方法。我们将所提方法与三种场景进行了比较:其中两种是在一个数据集上训练并在另一个数据集上评估,第三种是在一个数据集上训练并在另一个数据集上微调。结果表明,该方法在蒸馏新知识和不遗忘已有知识之间取得了良好的平衡。 摘要:Automated audio captioning (AAC) is the task of automatically creating textual descriptions (i.e. captions) for the contents of a general audio signal. Most AAC methods are using existing datasets to optimize and/or evaluate upon. Given the limited information held by the AAC datasets, it is very likely that AAC methods learn only the information contained in the utilized datasets. In this paper we present a first approach for continuously adapting an AAC method to new information, using a continual learning method. In our scenario, a pre-optimized AAC method is used for some unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption. We evaluate our method using a freely available, pre-optimized AAC method and two freely available AAC datasets. We compare our proposed method with three scenarios, two of training on one of the datasets and evaluating on the other and a third of training on one dataset and fine-tuning on the other. Obtained results show that our method achieves a good balance between distilling new knowledge and not forgetting the previous one.

【3】 Controlled AutoEncoders to Generate Faces from Voices 标题:受控自动编码器,可从语音生成人脸

作者:Hao Liang,Lulan Yu,Guikang Xu,Bhiksha Raj,Rita Singh 机构:Carnegie Mellon University, Pittsburgh PA, USA 链接:https://arxiv.org/abs/2107.07988 摘要:以往的多项研究表明,人声特征与面部特征之间存在着很强的相关性。然而,现有的方法只是简单地从声音生成人脸,而没有探索导致这些观察到的相关性的特征集合。为了探索这一点,可以将问题重新表述为:“为了被感知为源语音的发起者,目标人脸需要改变多少?”基于这一视角,本文提出了一种目标人脸变形框架,使面部特征由学习到的语音-人脸相关性隐式引导。我们的框架包括一个引导式自动编码器,可将一张脸转换为另一张脸,由一个称为选通控制器的独特模型条件化组件控制,该控制器根据输入语音录音修改重建的人脸。我们通过人类受试者评测和人脸检索,在VoxCelab和VGGFace数据集上对该框架进行了评估。各项实验证明了所提模型的有效性。 摘要:Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation in this paper. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller which modifies the reconstructed face based on input voice recordings. We evaluate the framework on VoxCelab and VGGFace datasets through human subjects and face retrieval. Various experiments demonstrate the effectiveness of our proposed model.

【4】 A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation 标题:一种用于教师发声评价的多模态机器学习框架

作者:Hang Li,Yu Kang,Yang Hao,Wenbiao Ding,Zhongqin Wu,Zitao Liu 机构:TAL Education Group, Beijing, China 备注:AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021 链接:https://arxiv.org/abs/2107.07956 摘要:语音表达质量是评价教师教学热情的重要指标之一,而教学热情已被普遍认为与整体课程质量密切相关。然而,现有的语音表达评价主要采用人工评分的方式,面临主观性和耗时两大核心挑战。在本文中,我们提出了一种新的机器学习方法,利用成对比较和多模态正交融合算法,从流畅度和热情两方面对教师语音表达生成大规模的客观评价结果。我们从真实教育场景中收集了两个数据集,实验结果证明了算法的有效性。为了鼓励可复现的结果,我们在 https://github.com/tal-ai/ML4VocalDelivery.git 公开了我们的代码。 摘要:The quality of vocal delivery is one of the key indicators for evaluating teacher enthusiasm, which has been widely accepted to be connected to the overall course qualities. However, existing evaluation for vocal delivery is mainly conducted with manual ratings, which faces two core challenges: subjectivity and time consumption. In this paper, we present a novel machine learning approach that utilizes pairwise comparisons and a multimodal orthogonal fusing algorithm to generate large-scale objective evaluation results of the teacher vocal delivery in terms of fluency and passion. We collect two datasets from real-world education scenarios and the experiment results demonstrate the effectiveness of our algorithm. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/ML4VocalDelivery.git.

【5】 Recognizing bird species in diverse soundscapes under weak supervision 标题:弱监督下不同声景中鸟类的识别

作者:Christof Henkel,Pascal Pfeiffer,Philipp Singer 机构:NVIDIA, Munich, Germany, RWTH Aachen University, Aachen, Germany, H,O.ai, Mountain View, USA, All authors contributed equally. 备注:All authors contributed equally, 8 pages, 4 figures, submitted to CEUR-WS 链接:https://arxiv.org/abs/2107.07728 摘要:我们提出了一种用于复杂多样声景中鸟类发声的稳健分类方法,在BirdCLEF2021挑战赛中获得第二名。我们说明了如何通过高效的建模与训练流程并辅以新颖的数据增强方法,充分利用预训练卷积神经网络。由此,我们改进了从弱标注众包数据到自动记录单元所采集生产数据的泛化能力。据此,我们说明了如何朝着准确的鸟类种群自动评估迈进,这将使人工标注无法实现的全球生物多样性大规模监测成为可能。 摘要:We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF2021 challenge. We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods. Thereby, we improve the generalization of weakly labeled crowd-sourced data to productive data collected by autonomous recording units. As such, we illustrate how to progress towards an accurate automated assessment of avian population which would enable global biodiversity monitoring at scale, impossible by manual annotation.
