时间衰减流的改进算法

作者：Vladimir Braverman,Harry Lang,Enayat Ullah,Samson Zhou

摘要：在数据流的时间衰减模型中，基础数据集的元素在按时间顺序获得的情况下，越晚获得的元素更重要。处理大型数据集的常用方法是去维持 emph {coreset}，这是处理数据的简洁摘要，即允许近似恢复预定查询。我们提供了一个通用框架，它采用任何离线核心集，并为多项式时间衰减函数提供时间衰减核心集。

我们还考虑了k-中值聚类的指数时间衰减模型，其中我们提供了利用在线设施定位算法的常数因子近似算法。我们的算法存储O（klog（hΔ） h）点，其中h是衰减函数的半衰期，Δ是数据集的纵横比。我们的技术也扩展到k-means聚类和M-estimators。

原文标题：Improved Algorithms for Time Decay Streams

原文摘要：In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a emph{coreset}, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions.

We also consider the exponential time decay model fork-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm storesO(klog(hΔ) h)points wherehis the half-life of the decay function andΔis the aspect ratio of the dataset. Our techniques extend tok-means clustering andM-estimators as well.

地址：https://arxiv.org/abs/1907.07574

时序数据库CTSDB

0 人点赞