Finance / Speech / Audio Processing Academic Digest [8.24]

2021-08-25 16:10:41

More at arxivdaily.com, which covers CS, physics, mathematics, economics, statistics, finance, biology and electrical engineering, with search and bookmarking features.

q-fin (Finance): 11 papers

cs.SD (Speech): 6 papers

eess.AS (Audio Processing): 5 papers

1. q-fin (Finance):

【1】 Effect of Share Capital on Financial Growth of Non-Financial Firms Listed at the Nairobi Securities Exchange Link: https://arxiv.org/abs/2108.10244

Author: David Haritone Shikumo Abstract: Purpose: A significant number of the non-financial firms listed at the Nairobi Securities Exchange (NSE) have been experiencing declining financial performance, which deters investors from investing in such firms. Lenders are also unwilling to lend to such firms. As a result, the firms struggle to raise funds for their operations. Prudent financing decisions can lead to financial growth of the firm. The purpose of this study is to assess the effect of share capital on the financial growth of non-financial firms listed at the Nairobi Securities Exchange. Financial firms were excluded because of their specific sector characteristics and stringent regulatory framework. The study is guided by Market Timing Theory and the Theory of Growth of the Firm. Methodology: An explanatory research design was adopted. The target population comprised 45 non-financial firms listed at the NSE over the ten years from 2008 to 2017. The study conducted both descriptive statistics analysis and panel data analysis. Findings: The results indicate that share capital explains 32.73% and 11.62% of the variation in financial growth as measured by growth in earnings per share and growth in market capitalization, respectively. Share capital positively and significantly influences financial growth as measured by both growth in earnings per share and growth in market capitalization.
Implications: The study recommends that non-financial firms use equity financing to raise capital for major expansions, asset growth or acquisitions that may require heavy funding. In this way, firms will be assured of improved performance as well as high financial growth. The study also recommends substantial firm financing through equity. Value: Equity financing is important to any firm if the proceeds are invested in projects that eventually bring growth to the firm.

【2】 Gender Differences in the Cost of Corrections in Group Work Link: https://arxiv.org/abs/2108.10109

Author: Yuki Takahashi Notes: This draft is still preliminary; future versions can differ substantially from this version Abstract: Corrections among colleagues are an integral part of group work, but people may take corrections as personal criticism, especially corrections by women. I study whether people dislike collaborating with someone who corrects them, and more so when that person is a woman. People, including those with high productivity, are less willing to collaborate with a person who has corrected them, even if the correction improves group performance. Yet people respond to corrections by women as negatively as to corrections by men. These findings suggest that although women do not face a higher hurdle, correcting colleagues is costly and reduces group efficiency.

【3】 Minimizing ruin probability under dependencies for insurance pricing Link: https://arxiv.org/abs/2108.10075

Authors: Ragnar Levy Gudmundarson, Manuel Guerra, Alexandra Bugalho de Moura Affiliations: Edinburgh Business School, Heriot-Watt University; ISEG - School of Economics and Management, Universidade de Lisboa; REM - Research in Economics and Mathematics Abstract: In this work, the ruin probability of the Lundberg risk process is used as a criterion for determining the optimal security loading of premia in the presence of price-sensitive demand for insurance. Both single and aggregated claim processes are considered, and the independent and dependent cases are analyzed. For the single-risk case, we show that the optimal loading does not depend on the initial reserve. In the multiple-risk case, we account for arbitrary dependency structures between different risks and for dependencies between the probabilities of a client acquiring policies for different risks. In this case, the optimal loadings depend on the initial reserve. In all cases, the loadings minimizing the ruin probability do not coincide with the loadings maximizing the expected profit.
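For reference, the classical Lundberg (Cramér-Lundberg) risk process has a closed-form ruin probability in one textbook case. That case assumes compound Poisson claims with exponentially distributed sizes of mean mu and a premium rate c = (1 + theta) * lambda * mu, where theta is the security loading. The Python sketch below rests on exactly those assumptions. It does not model the price-sensitive demand or the dependence structures studied in the paper, and the function name is hypothetical.

```python
import math

def ruin_probability_exponential(u: float, theta: float, mean_claim: float) -> float:
    """Infinite-horizon ruin probability psi(u) of the classical Lundberg process
    with exponential claims of mean `mean_claim` and premium rate
    c = (1 + theta) * lam * mean_claim (theta = security loading).

    psi(u) = exp(-R * u) / (1 + theta), with adjustment coefficient
    R = theta / ((1 + theta) * mean_claim); the Poisson rate lam cancels out.
    """
    if theta <= 0:
        return 1.0  # without a positive loading, ruin is certain
    r = theta / ((1.0 + theta) * mean_claim)
    return math.exp(-r * u) / (1.0 + theta)

# Ruin probability for a few loadings at initial reserve u = 10 (demand ignored).
for theta in (0.05, 0.1, 0.2, 0.4):
    print(theta, round(ruin_probability_exponential(10.0, theta, 1.0), 4))
```

In this demand-free baseline, the ruin probability simply falls as the loading rises. The paper's optimization instead trades loading off against how many clients buy at a given price.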

【4】 Previsão dos preços de abertura, mínima e máxima de índices de mercados financeiros usando a associação de redes neurais LSTM (Forecasting the opening, minimum and maximum prices of financial market indices using an association of LSTM neural networks) Link: https://arxiv.org/abs/2108.10065

Authors: Gabriel de Oliveira Guedes Nogueira, Marcel Otoboni de Lima Affiliation: Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo Notes: in Portuguese Abstract: To make good investment decisions, it is vitally important for an investor to know how to analyze financial time series well. Within this context, studies forecasting the values and trends of stock prices have become more relevant. Currently, there are different approaches to the task. The two main ones are the historical analysis of stock prices and technical indicators, and sentiment analysis of news, blogs and tweets about the market. Some of the most used statistical and artificial intelligence techniques are genetic algorithms, Support Vector Machines (SVM) and different architectures of artificial neural networks. This work proposes an improvement of a model based on the association of three distinct LSTM neural networks, each acting in parallel to predict the opening, minimum and maximum prices of stock exchange indices on the day following the analysis. The dataset is composed of historical data from more than 10 indices from the world's largest stock exchanges. The results demonstrate that the model is able to predict trends and stock prices with reasonable accuracy.

【5】 Continuous-time Portfolio Optimization for Absolute Return Funds Link: https://arxiv.org/abs/2108.09985

Author: Masashi Ieda Affiliation: Department of Business Economics, School of Management, Tokyo University of Science, Fujimi, Chiyoda-ku, Tokyo, Japan Abstract: This paper investigates a continuous-time portfolio optimization problem with the following features: (i) a no-short-selling constraint; (ii) a leverage constraint, that is, an upper limit on the sum of portfolio weights; and (iii) a performance criterion based on the lower mean square error between the investor's wealth and a predetermined target wealth level. Since the target level is defined by a deterministic function independent of market indices, it corresponds to the criterion of absolute return funds. The model is formulated in a stochastic control framework with explicit boundary conditions. The corresponding Hamilton-Jacobi-Bellman equation is solved numerically using the kernel-based collocation method. However, a straightforward implementation does not yield a stable and acceptable investment strategy; thus, some techniques to address this shortcoming are proposed. Applying the proposed methodology yields two numerical results: one uses artificial data, and the other uses empirical data from Japanese organizations. The first result has two implications: how to stabilize the numerical solution, and a technique to circumvent the plummeting achievement rate close to the terminal time. The second result implies that leverage is inevitable to achieve the target level in the setting discussed in this paper.

【6】 Welfare Effects of the Labor Income Tax Changes on Married Couples: A Sufficient Statistics Approach Link: https://arxiv.org/abs/2108.09981

Author: Egor Malkov Affiliation: University of Minnesota and the Federal Reserve Bank of Minneapolis Notes: 60 pages Abstract: This paper develops a framework for assessing the welfare effects of labor income tax changes on married couples. I build a static model of couples' labor supply that features both intensive and extensive margins and derive a tractable expression that delivers a transparent understanding of how labor supply responses, policy parameters, and the income distribution affect the reform-induced welfare gains. Using this formula, I conduct a comparative welfare analysis of four tax reforms implemented in the United States over the last four decades, namely the Tax Reform Act of 1986, the Omnibus Budget Reconciliation Act of 1993, the Economic Growth and Tax Relief Reconciliation Act of 2001, and the Tax Cuts and Jobs Act of 2017. I find that these reforms created welfare gains ranging from -0.16 to 0.62 percent of aggregate labor income. A sizable part of the gains is generated by the labor force participation responses of women. Although three of the reforms resulted in aggregate welfare gains, I show that each reform created both winners and losers. Furthermore, I uncover two patterns in the relationship between welfare gains and couples' labor income. In particular, the reforms of 1986 and 2017 display a monotonically increasing relationship, while the other two reforms demonstrate a U-shaped pattern.
Finally, I characterize the bias in welfare gains resulting from the assumption of a linear tax function. I consider a reform that changes tax progressivity and show that the linearization bias is given by the ratio between the tax progressivity parameter and the inverse elasticity of taxable income. Quantitatively, this means that linearization overestimates the welfare effects of the U.S. tax reforms by 3.6-18.1%.

【7】 Community Detection in Cryptocurrencies with Potential Applications to Portfolio Diversification Link: https://arxiv.org/abs/2108.09763

Authors: J. Gavin, M. Crane Affiliation: School of Computing, Dublin City University, Glasnevin Notes: 14 pages, 8 figures Abstract: In this paper, the cross-correlations of cryptocurrency returns are analysed. The paper examines one year's worth of data for 146 cryptocurrencies, from 1 January 2019 to 31 December 2019. The cross-correlations of these returns are first analysed by comparing the eigenvalues and eigenvector components of the cross-correlation matrix C with Random Matrix Theory (RMT) assumptions. Results show that C deviates from these assumptions, indicating that C contains genuine information about the correlations between the different cryptocurrencies. From here, the Louvain community detection method is applied as a clustering mechanism, and 15 community groupings are detected. Finally, PCA is performed on the standardised returns of each of these clusters to create a portfolio of cryptocurrencies for investment. This method selects a portfolio that contains a number of high-value coins when compared against their market ranking in the same year. To assess the continuity of the initial results, the method is also applied to a smaller dataset of the top 50 cryptocurrencies across three time periods of T = 125 days, which produces similar results. The results obtained in this paper show that these methods could be useful for constructing a portfolio of optimally performing cryptocurrencies.
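The RMT comparison is commonly done with the Marchenko-Pastur law. For N uncorrelated, unit-variance series of length T (with T >= N), the eigenvalues of the sample correlation matrix lie asymptotically within [(1 - sqrt(N/T))^2, (1 + sqrt(N/T))^2]. Eigenvalues above the upper edge are the ones usually read as genuine correlation structure. The NumPy sketch below illustrates that standard check. It is not the authors' code, and the function names are hypothetical.

```python
import numpy as np

def marchenko_pastur_bounds(n_assets: int, n_obs: int) -> tuple:
    """Edges (lambda_min, lambda_max) of the Marchenko-Pastur eigenvalue density
    for the correlation matrix of n_assets independent series of length n_obs."""
    ratio = n_assets / n_obs  # N / T; the law is usually stated for T >= N
    return (1 - np.sqrt(ratio)) ** 2, (1 + np.sqrt(ratio)) ** 2

def eigenvalues_above_noise(returns: np.ndarray) -> np.ndarray:
    """Eigenvalues of the cross-correlation matrix C that exceed the RMT upper edge.
    `returns` has shape (T, N): T observations for each of N assets."""
    t, n = returns.shape
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0)  # standardised returns
    c = z.T @ z / t                                              # N x N correlation matrix
    eig = np.linalg.eigvalsh(c)                                  # real, ascending order
    _, upper = marchenko_pastur_bounds(n, t)
    return eig[eig > upper]
```

If the one-year sample has about 365 daily returns, then N/T = 146/365 = 0.4. That equals the ratio for the robustness check with N = 50 and T = 125, so both settings share the same noise band.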

【8】 Fragmentation, Price Formation, and Cross-Impact in Bitcoin Markets Link: https://arxiv.org/abs/2108.09750

Authors: Jakob Albers, Mihai Cucuringu, Sam Howison, Alexander Y. Shestopaloff Affiliations: Department of Statistics, University of Oxford; Department of Statistics and Mathematical Institute, University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK; School of Mathematical Sciences, Queen Mary University of London Notes: 62 pages, 34 figures, 24 tables Abstract: In light of micro-scale inefficiencies induced by the high degree of fragmentation of the Bitcoin trading landscape, we utilize a granular dataset comprising order book and trades data from the most liquid Bitcoin markets in order to understand the price formation process at sub-1-second time scales. To achieve this goal, we construct a set of features that encapsulate relevant microstructural information over short lookback windows. These features are subsequently leveraged, first to generate a leader-lagger network that quantifies how markets impact one another, and then to train linear models capable of explaining between 10% and 37% of the total variation in 500 ms future returns (depending on which market is the prediction target). The results are then compared with those of various PnL calculations that take trading realities, such as transaction costs, into account. The PnL calculations are based on natural taker strategies (meaning they employ market orders) that we associate with each model.
Our findings emphasize the role of a market's fee regime in determining its propensity to be a leader or a lagger, as well as the profitability of our taker strategy. Taking our analysis further, we also derive a natural maker strategy (i.e., one that uses only passive limit orders). Because maker strategies are difficult to backtest, we test it in a real-world live trading experiment in which we turned over 1.5 million USD in notional volume. Lending additional confidence to our models, and by extension to the features they are based on, the results indicate a significant improvement over a naive benchmark strategy, which we also deploy in a live trading environment with real capital for the sake of comparison.

【9】 The changing dynamics of HIV/AIDS during the Covid-19 pandemic in the Rohingya refugee camps in Bangladesh a call for action Link: https://arxiv.org/abs/2108.09690

Authors: Muhammad Anwar Hossain, Iryna Zablotska-Manos Affiliations: Assistant Professor, Department of Sociology, Begum Rokeya University, Rangpur, Bangladesh; College of Public Health Medicine and Veterinary Sciences, James Cook University, Townsville, Australia; Westmead Clinical School, University of Sydney, Westmead Notes: 10 pages, no figures Abstract: The COVID-19 pandemic has affected every country's health services and plunged refugees into the most desperate conditions. The plight of Rohingya refugees is among the harshest. The pandemic has severely affected their existing HIV/STI prevention and management services and further increased the risk of violence and onward HIV transmission within the camps. In this commentary, we discuss the context and the changing dynamics of HIV/AIDS during COVID-19 among the Rohingya refugee community in Bangladesh. What we currently observe is the worst crisis in the Rohingya refugee camps thus far. Firstly, because they are displaced, Rohingya refugees have increased vulnerability to HIV, as well as to STIs and other poor health outcomes. Secondly, for the same reason, they have inadequate access to HIV testing, treatment and care. This is not only because of their refugee status but also because of the host country's poor capacity to provide services. Thirdly, a host of complex economic, socio-cultural and behavioural factors worsen their already dire access to HIV testing, treatment and care.
And finally, the advent of the COVID-19 pandemic has changed priorities in all societies, including the refugee camps. In the context of the unfolding COVID-19 crisis, more emphasis is placed on COVID-19 than on other health issues. This worsens the already dire state of HIV detection, management and prevention among Rohingya refugees. Despite the common crisis experienced by most countries around the world, the international community has an obligation to work together to improve the lives, livelihoods and health of those who are most vulnerable. Rohingya refugees are among them.

【10】 Multivariate self-exciting jump processes with applications to financial data Link: https://arxiv.org/abs/2108.10176

Authors: Heidar Eyjolfsson, Dag Tjøstheim Abstract: The paper discusses multivariate self- and cross-exciting processes. We define a class of multivariate point processes via their corresponding stochastic intensity processes, which are driven by stochastic jumps. Essentially, there is a jump in an intensity process whenever the corresponding point process records an event. An attribute of our modelling class is that not only is a jump recorded at each instance, but also its magnitude. This allows large jumps to influence the intensity to a larger degree than smaller jumps. We give conditions that guarantee that the process is stable, in the sense that it does not explode, and provide a detailed discussion of when the subclass of linear models is stable. Finally, we fit our model to financial time series data from the S&P 500 and Nikkei 225 indices. We conclude that a nonlinear variant from our modelling class fits the data best. This supports the observation that in times of crisis (high intensity) jumps tend to arrive in clusters, whereas there are typically longer times between jumps when the markets are calmer. We moreover observe more variability in jump sizes when the intensity is high than when it is low.
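The idea that both the occurrence and the size of a jump feed back into the intensity can be illustrated with a linear, marked, exponential-kernel (Hawkes-type) specification. In this sketch each past event adds alpha[i][j] * |size| * exp(-beta * elapsed time) to component i. This form is an assumption chosen for clarity. The paper's own class, including the nonlinear variant it finds fits best, may be specified differently, and `Event` and `intensity` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    time: float       # event time
    component: int    # which point process fired (e.g. 0 = S&P 500, 1 = Nikkei 225)
    size: float       # recorded jump magnitude

def intensity(t, events, mu, alpha, beta):
    """Intensities of all components at time t under the assumed linear model
        lambda_i(t) = mu[i] + sum_{t_k < t} alpha[i][j_k] * |size_k| * exp(-beta * (t - t_k)),
    so larger jumps raise future intensity more than smaller ones (self-excitation
    on the diagonal of alpha, cross-excitation off the diagonal)."""
    lam = list(mu)
    for ev in events:
        if ev.time < t:
            decay = math.exp(-beta * (t - ev.time))
            for i in range(len(mu)):
                lam[i] += alpha[i][ev.component] * abs(ev.size) * decay
    return lam
```

For this linear exponential-kernel form, a standard sufficient condition for non-explosion when jump sizes are independent of the past is that the spectral radius of E|size| * alpha / beta is below 1. The paper derives its own stability conditions for its model class.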

【11】 Adaptive Gradient Descent Methods for Computing Implied Volatility Link: https://arxiv.org/abs/2108.07035

Authors: Yixiao Lu, Yihong Wang, Tinggan Yang Affiliation: Department of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai, China Notes: 6 pages, 1 table, 4 figures Abstract: In this paper, a new numerical method based on adaptive gradient descent optimizers is provided for computing the implied volatility from the Black-Scholes (B-S) option pricing model. It is shown that the new method is more accurate than the closed-form approximation. Compared with the Newton-Raphson method, the new method achieves a reliable rate of convergence and tends to be less sensitive to the starting point.
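As an illustration of the general idea, the sketch below recovers implied volatility by minimizing the squared pricing error (C(sigma) - price)^2 with the Adam update rule. Adam is one common adaptive gradient optimizer. Which optimizer, loss and hyperparameters the paper actually uses is not stated in the abstract, so the values here (learning rate 0.01, starting point 0.5, a European call without dividends) are assumptions.

```python
import math

def _norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _d1(s, k, t, r, sigma):
    return (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = _d1(s, k, t, r, sigma)
    d2 = d1 - sigma * math.sqrt(t)
    return s * _norm_cdf(d1) - k * math.exp(-r * t) * _norm_cdf(d2)

def bs_vega(s, k, t, r, sigma):
    """Derivative of the call price with respect to sigma."""
    d1 = _d1(s, k, t, r, sigma)
    return s * math.sqrt(t) * math.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi)

def implied_vol_adam(price, s, k, t, r, sigma0=0.5, lr=0.01, beta1=0.9,
                     beta2=0.999, eps=1e-8, tol=1e-8, max_iter=5000):
    """Recover sigma by minimising (bs_call(sigma) - price)**2 with Adam updates."""
    sigma, m, v = sigma0, 0.0, 0.0
    for step in range(1, max_iter + 1):
        err = bs_call(s, k, t, r, sigma) - price
        if abs(err) < tol:
            break
        grad = 2.0 * err * bs_vega(s, k, t, r, sigma)  # d(loss)/d(sigma)
        m = beta1 * m + (1 - beta1) * grad             # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2        # running mean of squared gradients
        m_hat = m / (1 - beta1 ** step)                # bias-corrected estimates
        v_hat = v / (1 - beta2 ** step)
        sigma = max(sigma - lr * m_hat / (math.sqrt(v_hat) + eps), 1e-4)  # keep sigma > 0
    return sigma
```

Newton-Raphson would instead iterate sigma -= err / vega. That converges quickly near the root but can take huge steps where vega is small, for example deep in or out of the money. Adam's normalized step size is one plausible reason such optimizers are less sensitive to the starting point.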

2. cs.SD (Speech):

【1】 General Theory of Music by Icosahedron 2: Analysis of musical pieces by the exceptional musical icosahedra Link: https://arxiv.org/abs/2108.10294

Author: Yusuke Imai Affiliation: Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, Japan Notes: 33 pages, 51 figures Abstract: We propose a new approach to the analysis of musical pieces using the exceptional musical icosahedra, in which all the major/minor triads are represented by golden triangles and golden gnomons. First, we introduce the concept of the golden neighborhood, which characterizes the golden triangles/gnomons that neighbor a given golden triangle or gnomon. Then, we investigate the relation between the exceptional musical icosahedra and neo-Riemannian theory, and find that the golden neighborhoods and the icosahedral symmetry relate any major/minor triad to any major/minor triad. Second, we show how the exceptional musical icosahedra can be applied to analyzing harmonies constructed from four or more tones. We introduce two concepts, golden decomposition and golden singular. A golden decomposition is a decomposition of a given harmony into harmonies that constitute it, each represented by a golden figure (a golden triangle, a golden gnomon, or a golden rectangle). A harmony is golden singular if and only if it has no golden decomposition. We show results of the golden analysis (analysis by golden decomposition) of the tertian seventh chords and the mystic chords.
While the dominant seventh chord is golden singular in the type 1* and type 4* exceptional musical icosahedra, the half-diminished seventh chord is golden singular in the type 2* and type 3* exceptional musical icosahedra. Last, we apply the golden analysis to the famous Prelude in C major by Johann Sebastian Bach (BWV 846). We found that 7 combinations of golden figures on the type 2* or type 3* exceptional musical icosahedron dually represent all the measures of BWV 846.

【2】 Automatic Speech Recognition using limited vocabulary: A survey Link: https://arxiv.org/abs/2108.10254

Authors: Jean Louis K. E. Fendji, Diane M. Tala, Blaise O. Yenke, Marcellin Atemkeng Affiliation: Department of Mathematics and Computer Science, University of Ngaoundere, Ngaoundere, Cameroon Notes: 20 pages, 9 figures, 6 tables, submitted to IEEE ACCESS for possible publication Abstract: Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces or computing devices that can support speech processing. But the bulk of applications are based on well-resourced languages that overshadow under-resourced ones. Yet ASR represents an undeniable means of promoting such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on recognizing a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems, as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently provides a path to follow when designing an ASR system with a limited vocabulary. Although the emphasis is on limited vocabulary, most of the tools and techniques reported in this survey apply to ASR systems in general.
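One classic technique for small-vocabulary, isolated-word recognition is template matching with dynamic time warping (DTW). The sketch below is a generic illustration and is not attributed to the survey. It assumes each utterance has already been converted to a sequence of feature vectors (for example MFCC frames), omits feature extraction, and uses hypothetical function names.

```python
import math

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean frame distance
            # Best of the three allowed predecessor cells (advance a, advance b, or both).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the vocabulary word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

Because DTW absorbs differences in speaking rate, one or a few recorded templates per word can suffice. This is why the approach suits the small vocabularies and scarce data typical of under-resourced languages.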

【3】 Tracked 3D Ultrasound and Deep Neural Network-based Thyroid Segmentation reduce Interobserver Variability in Thyroid Volumetry Link: https://arxiv.org/abs/2108.10118

Authors: Markus Krönke, Christine Eilers, Desislava Dimova, Melanie Köhler, Gabriel Buschner, Lilit Mirzojan, Lemonia Konstantinidou, Marcus R. Makowski, James Nagarajah, Nassir Navab, Wolfgang Weber, Thomas Wendler Affiliations: Department of Nuclear Medicine, School of Medicine, Technical University of Munich; Department of Radiology, School of Medicine, Technical University of Munich, Munich; Chair for Computer Aided Medical Procedures and Augmented Reality, Department of … Notes: 7 figures, 19 pages, under review Abstract: Background: Thyroid volumetry is crucial in the diagnosis, treatment and monitoring of thyroid diseases. However, conventional thyroid volumetry with 2D ultrasound is highly operator-dependent. This study compares 2D ultrasound with tracked 3D ultrasound combined with automatic thyroid segmentation based on a deep neural network. The comparison covers inter- and intraobserver variability, time and accuracy. MRI served as the volume reference. Methods: 28 healthy volunteers were scanned with 2D and 3D ultrasound as well as by MRI. Three physicians (MD 1, 2, 3) with different levels of experience (6, 4 and 1 years) performed three 2D ultrasound and three tracked 3D ultrasound scans on each volunteer. In the 2D scans, the thyroid lobe volumes were calculated with the ellipsoid formula. A convolutional deep neural network (CNN) segmented the 3D thyroid lobes automatically.
On MRI (T1 VIBE sequence), the thyroid was manually segmented by an experienced medical doctor. Results: The CNN was trained to a Dice score of 0.94. The interobserver variability comparing two MDs showed mean differences (2D vs. 3D) of 0.58 ml vs. 0.52 ml (MD1 vs. 2), -1.33 ml vs. -0.17 ml (MD1 vs. 3) and -1.89 ml vs. -0.70 ml (MD2 vs. 3). Paired-samples t-tests showed significant differences in two comparisons for 2D and none for 3D. Intraobserver variability was similar for 2D and 3D ultrasound. Comparing ultrasound volumes with MRI volumes by paired-samples t-tests showed a significant difference for the 2D volumetry of all MDs and no significant difference for 3D ultrasound. Acquisition time was significantly shorter for 3D ultrasound. Conclusion: Tracked 3D ultrasound combined with CNN segmentation significantly reduces interobserver variability in thyroid volumetry and increases the accuracy of the measurements, with shorter acquisition times.
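Two quantities in this abstract are simple to compute. The first is the ellipsoid formula for 2D lobe volumetry, V = pi/6 * length * width * depth (pi/6 is about 0.524; with diameters in cm the result is in ml). The second is the Dice score, 2|A and B| / (|A| + |B|), used to rate the CNN segmentation. The sketch below shows both. The lobe dimensions and voxel masks in the test are made-up values, the function names are hypothetical, and the CNN itself is not shown.

```python
import math

def ellipsoid_volume_ml(length_cm, width_cm, depth_cm):
    """Thyroid lobe volume from three orthogonal 2D-ultrasound diameters,
    V = pi/6 * L * W * D (cm^3, i.e. ml)."""
    return math.pi / 6.0 * length_cm * width_cm * depth_cm

def dice_score(mask_a, mask_b):
    """Dice overlap 2|A & B| / (|A| + |B|) between two collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))
```

Usage: ellipsoid_volume_ml(5.0, 2.0, 2.0) gives about 10.47 ml. A Dice score of 0.94, as reported for the CNN, means the predicted and reference masks overlap almost completely.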

【4】 Subject Envelope based Multitype Reconstruction Algorithm of Speech Samples of Parkinson's Disease Link: https://arxiv.org/abs/2108.09922

Authors: Yongming Li, Chengyu Liu, Pin Wang, Hehua Zhang, Anhai Wei Affiliations: School of Microcommunication Engineering, Chongqing University, Chongqing, P.R. China; Department of Medical Engineering, Daping Hospital, Army Medical University (Third Military Medical University), Chongqing, China Notes: 11 pages, 6 tables Abstract: The risk posed by Parkinson's disease (PD) is extremely serious, and PD speech recognition is currently an effective method of diagnosis. However, data collection is influenced by the disease stage, the corpus and other factors. As a result, the ability of each sample within one subject to reflect the PD status varies. No sample is totally useless, and no sample is 100% perfect. This characteristic means that it is not suitable simply to remove some samples or keep some samples. Instead, sample transformation is needed to obtain high-quality new samples. Unfortunately, existing PD speech recognition methods focus mainly on feature learning and classifier design rather than sample learning, and few methods consider sample transformation. To solve this problem, this paper proposes a PD speech sample transformation algorithm based on multitype reconstruction operators. The algorithm is divided into four major steps. Three types of reconstruction operators are designed in the algorithm: types A, B and C.
For the type A operator, the original dataset is directly reconstructed by designing a linear transformation to obtain the first new dataset. The type B operator clusters and linearly transforms the dataset to obtain the second new dataset. The third operator, the type C operator, reconstructs the dataset by clustering and convolution to obtain the third new dataset. Finally, the base classifier is trained on the three new datasets, and the classification results are then fused by decision weighting. In the experimental section, two representative PD speech datasets are used for verification. The results show that the proposed algorithm is effective. Compared with other algorithms, the proposed algorithm achieves apparent improvements in classification accuracy.
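The final fusion step can be sketched as weighted averaging of per-dataset classifier outputs. This assumes one base-classifier output (a probability of PD) for each of the three reconstructed datasets. The abstract does not say how the decision weights are computed, so using something like each classifier's validation accuracy is an assumption, and the function name is hypothetical.

```python
def weighted_decision_fusion(probabilities, weights):
    """Fuse per-classifier P(PD) for one sample using normalised decision weights.

    probabilities: one P(PD) per base-classifier output, e.g. from models trained
                   on the type A, B and C reconstructed datasets.
    weights:       non-negative weight per output (assumed, e.g. validation accuracy).
    Returns (fused probability, predicted label: 1 = PD, 0 = healthy).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    fused = sum(w * p for w, p in zip(weights, probabilities)) / total
    return fused, int(fused >= 0.5)
```

For example, outputs of 0.9, 0.4 and 0.6 with weights 0.5, 0.25 and 0.25 fuse to 0.7, giving a PD prediction even though one classifier disagrees.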

【5】 Using growth transform dynamical systems for spatio-temporal data sonification. Link: https://arxiv.org/abs/2108.09537

Authors: Oindrila Chatterjee, Shantanu Chakrabartty Affiliation: Washington University in St. Louis, Missouri, USA. Comments: This article was submitted to PLoS One in March 2021 and is currently under peer review. Abstract: Sonification, or encoding information in meaningful audio signatures, has several advantages in augmenting or replacing traditional visualization methods for human-in-the-loop decision-making. Standard sonification methods reported in the literature either (i) use only a subset of the variables, or (ii) first solve a learning task on the data and then map the output to an audio waveform, which the end user relies on to make a decision. This paper presents a novel framework for sonifying high-dimensional data using a complex growth transform dynamical system model in which the learning (or, more generally, optimization) and sonification processes are integrated. Our algorithm takes as input the data and the optimization parameters underlying the learning or prediction task and combines them with psychoacoustic parameters defined by the user. As a result, the framework outputs binaural audio signatures that not only encode some statistical properties of the high-dimensional data but also reveal the underlying complexity of the optimization/learning process. Along with extensive experiments on synthetic datasets, we demonstrate the framework on sonifying electroencephalogram (EEG) data, with the potential to detect epileptic seizures in pediatric patients.
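The growth-transform model itself is not specified in the abstract, so it is not reproduced here. For orientation, the following minimal Python sketch shows the simpler parameter-mapping style of sonification that the abstract contrasts against: each data value becomes a short tone whose pitch rises with the value and whose stereo position moves from left to right. The function `sonify`, its frequency range and the plain stereo panning (standing in for true binaural rendering) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sonify(
    values: Sequence[float],
    sample_rate: int = 8000,
    note_seconds: float = 0.1,
    f_low: float = 220.0,
    f_high: float = 880.0,
) -> List[Tuple[float, float]]:
    """Parameter-mapping sonification: one short stereo sine tone per value.

    Larger values get a higher pitch and are panned further to the right.
    Returns (left, right) sample pairs in [-1, 1].
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant data: avoid division by zero
    samples_per_note = int(sample_rate * note_seconds)
    frames: List[Tuple[float, float]] = []
    for v in values:
        x = (v - lo) / span                  # normalise to [0, 1]
        freq = f_low + x * (f_high - f_low)  # value -> pitch
        left = math.cos(x * math.pi / 2)     # constant-power panning
        right = math.sin(x * math.pi / 2)
        for k in range(samples_per_note):
            s = math.sin(2 * math.pi * freq * k / sample_rate)
            frames.append((left * s, right * s))
    return frames


frames = sonify([0.0, 1.0, 0.5])
print(len(frames))  # 3 notes x 800 samples = 2400
```

To listen to the result, the (left, right) pairs can be scaled to 16-bit integers and written to a two-channel file with Python's standard `wave` module.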

【6】 Using Large Pre-Trained Models with Cross-Modal Attention for Multi-Modal Emotion Recognition. Link: https://arxiv.org/abs/2108.09669

Author: Krishna D N Affiliation: Freshworks Inc. Comments: 5 Pages, REJECTED FROM INTERSPEECH 2021 Abstract: Recently, self-supervised pre-training has shown significant improvements in many areas of machine learning, including speech and NLP. We propose using large self-supervised pre-trained models for both the audio and text modalities, with cross-modal attention, for multimodal emotion recognition. We use Wav2Vec2.0 [1] as the audio encoder base for robust speech feature extraction and the BERT model [2] as the text encoder base for better contextual representation of text. These high-capacity models, trained on large amounts of unlabeled data, contain rich feature representations and improve downstream task performance. We use the cross-modal attention [3] mechanism to learn the alignment between the audio and text representations from the self-supervised models. Cross-modal attention also helps extract interactive information between audio and text features. We obtain utterance-level feature representations from frame-level features using statistics pooling for both the audio and text modalities and combine them using early fusion. Our experiments show that the proposed approach obtains a 1.88% absolute improvement in accuracy over the previous state-of-the-art method [3] on the IEMOCAP dataset [35]. We also conduct unimodal experiments for the audio and text modalities and compare them with the previous best methods.
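The abstract names the building blocks (cross-modal attention, statistics pooling and early fusion) without giving their exact form. Below is a minimal, dependency-free Python sketch under stated assumptions: single-head scaled dot-product attention with no learned projections, audio frames as queries and text tokens as keys/values (the abstract does not say which direction is used), mean-and-standard-deviation pooling, and concatenation as the early-fusion step. Tiny hand-written matrices stand in for Wav2Vec2.0 and BERT outputs; nothing here is the paper's code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention across modalities.

    Each query frame (here: audio) attends over every key/value frame of the
    other modality (here: text); no learned projections are applied.
    """
    scale = math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def statistics_pooling(frames: Matrix) -> List[float]:
    """Frame level -> utterance level: per-dimension mean, then per-dimension std."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[c] for f in frames) / n for c in range(dims)]
    stds = [math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / n)
            for c in range(dims)]
    return means + stds


# Toy stand-ins for encoder outputs (real ones would come from Wav2Vec2.0 / BERT).
audio_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0]]

audio_attended = cross_modal_attention(audio_frames, text_tokens, text_tokens)
# Early fusion: concatenate the pooled vectors of both modalities.
fused = statistics_pooling(audio_attended) + statistics_pooling(text_tokens)
print(len(fused))  # 2 dims x (mean + std) x 2 modalities = 8
```

Because the third audio frame is equally similar to both text tokens, its attention weights are 0.5 each, so its attended vector is [0.5, 0.5]. In a real model, queries, keys and values would first pass through learned linear projections, usually with several heads.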

3. eess.AS Audio Processing:

【1】 Using Large Pre-Trained Models with Cross-Modal Attention for Multi-Modal Emotion Recognition. Link: https://arxiv.org/abs/2108.09669

Author: Krishna D N Affiliation: Freshworks Inc. Comments: 5 Pages, REJECTED FROM INTERSPEECH 2021 Abstract: Recently, self-supervised pre-training has shown significant improvements in many areas of machine learning, including speech and NLP. We propose using large self-supervised pre-trained models for both the audio and text modalities, with cross-modal attention, for multimodal emotion recognition. We use Wav2Vec2.0 [1] as the audio encoder base for robust speech feature extraction and the BERT model [2] as the text encoder base for better contextual representation of text. These high-capacity models, trained on large amounts of unlabeled data, contain rich feature representations and improve downstream task performance. We use the cross-modal attention [3] mechanism to learn the alignment between the audio and text representations from the self-supervised models. Cross-modal attention also helps extract interactive information between audio and text features. We obtain utterance-level feature representations from frame-level features using statistics pooling for both the audio and text modalities and combine them using early fusion. Our experiments show that the proposed approach obtains a 1.88% absolute improvement in accuracy over the previous state-of-the-art method [3] on the IEMOCAP dataset [35]. We also conduct unimodal experiments for the audio and text modalities and compare them with the previous best methods.

【2】 General Theory of Music by Icosahedron 2: Analysis of musical pieces by the exceptional musical icosahedra. Link: https://arxiv.org/abs/2108.10294

Author: Yusuke Imai Affiliation: Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, Japan Comments: 33 pages, 51 figures Abstract: We propose a new approach to analyzing musical pieces using the exceptional musical icosahedra, in which all the major/minor triads are represented by golden triangles and golden gnomons. First, we introduce the concept of the golden neighborhood, which characterizes the golden triangles/gnomons that neighbor a given golden triangle or gnomon. We then investigate the relation between the exceptional musical icosahedra and neo-Riemannian theory, and find that the golden neighborhoods and the icosahedral symmetry relate any major/minor triad to any major/minor triad. Second, we show how the exceptional musical icosahedra can be applied to analyzing harmonies constructed from four or more tones. We introduce two concepts: golden decomposition and golden singular. A golden decomposition is a decomposition of a given harmony into constituent harmonies, each represented by a golden figure (a golden triangle, a golden gnomon, or a golden rectangle). A harmony is golden singular if and only if it has no golden decomposition. We show the results of the golden analysis (analysis by golden decomposition) of the tertian seventh chords and the mystic chords. While the dominant seventh chord is golden singular in the type 1⋆ and type 4⋆ exceptional musical icosahedra, the half-diminished seventh chord is golden singular in the type 2⋆ and type 3⋆ exceptional musical icosahedra. Last, we apply the golden analysis to the famous Prelude in C major by Johann Sebastian Bach (BWV 846). We found that 7 combinations of golden figures on the type 2⋆ or type 3⋆ exceptional musical icosahedron dually represent all the measures of BWV 846.
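The abstract relates the exceptional musical icosahedra to neo-Riemannian theory. As background only, the following Python sketch implements the three standard neo-Riemannian operations on major/minor triads, with pitch classes numbered from C = 0: P (parallel), R (relative) and L (leading-tone exchange). It does not model the icosahedra, golden figures or golden decompositions, whose constructions are not given in the abstract; the helper names are illustrative.

```python
from collections import deque
from typing import FrozenSet, Set, Tuple

Triad = Tuple[int, str]  # (root pitch class 0-11 with C = 0, "maj" or "min")
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_classes(t: Triad) -> FrozenSet[int]:
    root, quality = t
    third = 4 if quality == "maj" else 3
    return frozenset({root, (root + third) % 12, (root + 7) % 12})


def P(t: Triad) -> Triad:  # parallel: C major <-> C minor
    root, q = t
    return (root, "min" if q == "maj" else "maj")


def R(t: Triad) -> Triad:  # relative: C major <-> A minor
    root, q = t
    return ((root + 9) % 12, "min") if q == "maj" else ((root + 3) % 12, "maj")


def L(t: Triad) -> Triad:  # leading-tone exchange: C major <-> E minor
    root, q = t
    return ((root + 4) % 12, "min") if q == "maj" else ((root + 8) % 12, "maj")


def reachable(start: Triad) -> Set[Triad]:
    """Breadth-first search over the P, L and R moves."""
    seen = {start}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for op in (P, L, R):
            nxt = op(t)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def name(t: Triad) -> str:
    return NAMES[t[0]] + ("" if t[1] == "maj" else "m")


c_major = (0, "maj")
print(name(R(c_major)), name(L(c_major)), name(P(c_major)))  # Am Em Cm
print(len(reachable(c_major)))  # 24
```

Each operation keeps two common tones and is its own inverse, and a breadth-first search from C major reaches all 24 major and minor triads, a standard fact about the group generated by P, L and R.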

【3】 Automatic Speech Recognition using limited vocabulary: A survey. Link: https://arxiv.org/abs/2108.10254

Authors: Jean Louis K. E. Fendji, Diane M. Tala, Blaise O. Yenke, Marcellin Atemkeng Affiliation: Department of Mathematics and Computer Science, University of Ngaoundere, Ngaoundere, Cameroon Comments: 20 pages, 9 figures, 6 tables, submitted to IEEE ACCESS for possible publication Abstract: Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces and computing devices that support speech processing. However, the bulk of applications target well-resourced languages, which overshadow under-resourced ones. Yet ASR is an undeniable means of promoting such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system for under-resourced languages is to start with a limited vocabulary. ASR with a limited vocabulary is a subset of the speech recognition problem that focuses on recognizing a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems, as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work therefore offers a roadmap for designing ASR systems with a limited vocabulary. Although the emphasis is on limited vocabulary, most of the tools and techniques reported in this survey apply to ASR systems in general.
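As a concrete example of a classic limited-vocabulary technique (chosen here for illustration, not as a summary of the survey), isolated-word recognition can be done by comparing an utterance against stored word templates with dynamic time warping (DTW). The minimal Python sketch below assumes each utterance is already a sequence of feature vectors; in practice these would be acoustic features such as MFCC frames, whereas here they are toy one-dimensional values. The names `dtw_distance` and `recognize` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence

Frames = Sequence[Sequence[float]]


def dtw_distance(a: Frames, b: Frames) -> float:
    """Dynamic time warping cost between two feature-vector sequences,
    using Euclidean frame distance and match/insert/delete steps."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance: Frames, templates: Dict[str, List[Frames]]) -> str:
    """Pick the vocabulary word whose closest stored template has the lowest DTW cost."""
    return min(
        templates,
        key=lambda word: min(dtw_distance(utterance, t) for t in templates[word]),
    )


# Toy 1-D "feature" sequences; a real system would use e.g. MFCC frames.
templates = {
    "yes": [[[0.0], [1.0], [2.0]]],
    "no": [[[2.0], [1.0], [0.0]]],
}
slow_yes = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(recognize(slow_yes, templates))  # yes
```

DTW absorbs differences in speaking rate: the slowly spoken "yes" aligns with its template at zero cost. The quadratic cost of each comparison is one reason template matching suits small vocabularies.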

【4】 Tracked 3D Ultrasound and Deep Neural Network-based Thyroid Segmentation reduce Interobserver Variability in Thyroid Volumetry. Link: https://arxiv.org/abs/2108.10118

Authors: Markus Krönke, Christine Eilers, Desislava Dimova, Melanie Köhler, Gabriel Buschner, Lilit Mirzojan, Lemonia Konstantinidou, Marcus R. Makowski, James Nagarajah, Nassir Navab, Wolfgang Weber, Thomas Wendler Affiliations: Department of Nuclear Medicine, School of Medicine, Technical University of Munich; Department of Radiology, School of Medicine, Technical University of Munich, Munich; Chair for Computer Aided Medical Procedures and Augmented Reality, Department of Comments: 7 figures, 19 pages, under review Abstract: Background: Thyroid volumetry is crucial in the diagnosis, treatment and monitoring of thyroid diseases. However, conventional thyroid volumetry with 2D ultrasound is highly operator-dependent. This study compares 2D ultrasound and tracked 3D ultrasound with automatic thyroid segmentation based on a deep neural network in terms of inter- and intraobserver variability, time and accuracy. MRI served as the volume reference. Methods: 28 healthy volunteers were scanned with 2D and 3D ultrasound as well as MRI. Three physicians (MD 1, 2, 3) with different levels of experience (6, 4 and 1 years) performed three 2D ultrasound and three tracked 3D ultrasound scans on each volunteer. In the 2D scans, the thyroid lobe volumes were calculated with the ellipsoid formula. A convolutional deep neural network (CNN) segmented the 3D thyroid lobes automatically. On MRI (T1 VIBE sequence), the thyroid was manually segmented by an experienced medical doctor. Results: The CNN was trained to a Dice score of 0.94. Interobserver variability between pairs of MDs showed mean differences of 0.58 ml (2D) and 0.52 ml (3D) for MD1 vs. MD2, -1.33 ml and -0.17 ml for MD1 vs. MD3, and -1.89 ml and -0.70 ml for MD2 vs. MD3. Paired samples t-tests showed significant differences in two comparisons for 2D and none for 3D. Intraobserver variability was similar for 2D and 3D ultrasound. Comparing ultrasound volumes with MRI volumes by paired samples t-tests showed a significant difference for the 2D volumetry of all MDs and no significant difference for 3D ultrasound. Acquisition time was significantly shorter for 3D ultrasound. Conclusion: Tracked 3D ultrasound combined with CNN segmentation significantly reduces interobserver variability in thyroid volumetry and increases measurement accuracy with shorter acquisition times.
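The abstract relies on three quantitative tools without defining them: the ellipsoid formula for 2D lobe volumes, the Dice score for segmentation quality, and the paired samples t-test. The minimal standard-library Python sketch below shows textbook versions of each, assuming lobe length, width and depth in centimetres (so that cm³ equals ml) and segmentations given as sets of voxel indices. The study may use a different ellipsoid correction factor than π/6 ≈ 0.524, and the function names are illustrative.

```python
import math
import statistics
from typing import Hashable, Sequence, Set


def ellipsoid_volume_ml(length_cm: float, width_cm: float, depth_cm: float) -> float:
    """Ellipsoid formula for one thyroid lobe: V = pi/6 * L * W * D (cm^3 == ml)."""
    return math.pi / 6 * length_cm * width_cm * depth_cm


def dice_score(seg_a: Set[Hashable], seg_b: Set[Hashable]) -> float:
    """Dice coefficient 2|A n B| / (|A| + |B|) for two voxel sets."""
    if not seg_a and not seg_b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(seg_a & seg_b) / (len(seg_a) + len(seg_b))


def paired_t_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Paired samples t statistic: mean(d) / (sd(d) / sqrt(n)) with d = x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


print(round(ellipsoid_volume_ml(5.0, 2.0, 1.5), 2))  # 7.85
print(dice_score({(0, 0, 0), (0, 0, 1)}, {(0, 0, 1), (0, 0, 2)}))  # 0.5
```

The sketch returns only the t statistic; the p-value comes from a t distribution with n - 1 degrees of freedom, which `scipy.stats.ttest_rel` computes in one call.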

【5】 Subject Envelope based Multitype Reconstruction Algorithm of Speech Samples of Parkinson's Disease. Link: https://arxiv.org/abs/2108.09922

Authors: Yongming Li, Chengyu Liu, Pin Wang, Hehua Zhang, Anhai Wei Affiliations: School of Microcommunication Engineering, Chongqing University, Chongqing, P.R. China; Department of Medical Engineering, Daping Hospital, Army Medical University (Third Military Medical University), Chongqing, China Comments: 11 pages, 6 tables Abstract: The risk posed by Parkinson's disease (PD) is extremely serious, and PD speech recognition is currently an effective method of diagnosis. However, because the disease stage, the corpus and other factors influence data collection, the samples within one subject vary in their ability to reflect PD status. No sample is completely useless, and no sample is 100% perfect, so it is not suitable simply to remove some samples or keep others. Instead, sample transformation must be considered to obtain new, high-quality samples. Unfortunately, existing PD speech recognition methods focus mainly on feature learning and classifier design rather than sample learning, and few methods consider sample transformation. To solve this problem, this paper proposes a PD speech sample transformation algorithm based on multitype reconstruction operators. The algorithm is divided into four major steps, and three types of reconstruction operators are designed: types A, B and C. The type A operator directly reconstructs the original dataset through a designed linear transformation to obtain the first new dataset. The type B operator applies clustering and a linear transformation to the dataset to obtain the second new dataset. The type C operator reconstructs the dataset by clustering and convolution to obtain the third new dataset. Finally, the base classifier is trained on the three new datasets, and the classification results are fused by decision weighting. In the experimental section, two representative PD speech datasets are used for verification. The results show that the proposed algorithm is effective and achieves clear improvements in classification accuracy compared with other algorithms.
