在流式模型和分布式模型中实现最优矩估计

2019-07-18 16:55:03 浏览数 (2)

作者:Rajesh Jayaram,David P. Woodruff

摘要:数据流模型中最古老的问题之一是近似第p个矩∥X∥pp=Σni= 1 | Xi | pof基础向量X∈Rn,它表示为poly(n)更新的序列。坐标。特别感兴趣的是当p∈(0,2)。虽然当允许正和负更新时,已知这个问题的紧密空间界限(ε-2logn)位,但令人惊讶的是,当所有空间复杂性都存在差距时更新是正的。具体来说,上限是O(ε-2logn)位,而下限只是Ω(ε-2 logn)位。最近,假设得到了O~(ε-2 logn)位的上界。更新以随机顺序到达。

我们证明了forp∈(0,1),不需要随机顺序假设。即,我们给出了用于估计∥X∥pp的O~(ε-2 logn)位的最坏情况流的上界。我们的技术还给出了估计流中经验熵的新上界。另一方面,我们证明了forp∈(1,2),在自然协调器和黑板通信拓扑中,有一个O~(ε-2)位最大值 - 基于随机舍入方案的通信上界。我们的协议还产生了重击者和近似矩阵乘积的协议。我们将结果推广到任意通信拓扑G,获得一个O~(ε2logd)最大通信上界,直径是直径有趣的是,我们的上界排除了基于自然通信复杂性的方法,用于证明流式算法的μ(ε-2logn)比特下限为p∈(1,2)。特别是,任何这样的下界都必须来自拓扑结构。大直径。

原文标题:Towards Optimal Moment Estimation in Streaming and Distributed Models

原文摘要:One of the oldest problems in the data stream model is to approximate thep-th moment∥X∥pp=∑ni=1|Xi|pof an underlying vectorX∈Rn, which is presented as a sequence of poly(n)updates to its coordinates. Of particular interest is whenp∈(0,2]. Although a tight space bound ofΘ(ϵ−2logn)bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity when all updates are positive. Specifically, the upper bound isO(ϵ−2logn)bits, while the lower bound is onlyΩ(ϵ−2 logn)bits. Recently, an upper bound ofO~(ϵ−2 logn)bits was obtained assuming that the updates arrive in a random order.

We show that forp∈(0,1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams ofO~(ϵ−2 logn)bits for estimating∥X∥pp. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that forp∈(1,2], in the natural coordinator and blackboard communication topologies, there is anO~(ϵ−2)bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologiesG, obtaining anO~(ϵ2logd)max-communication upper bound, wheredis the diameter ofG. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving anΩ(ϵ−2logn)bit lower bound forp∈(1,2]for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.

地址:https://arxiv.org/abs/1907.05816

0 人点赞