Sweet Snippet 之 方差计算

2020-02-20 11:27:12 浏览数 (1)

方差计算的简单实现

在概率统计中,方差用于衡量一组数据的离散程度,相关的计算公式如下(总体方差):

μ=1N∑i=1Nxiσ2=1N∑i=1N(xi−μ)2 begin{aligned} &mu = frac{1}{N}sum_{i = 1}^{N}x_i \ &sigma^2 = frac{1}{N}sum_{i = 1}^{N}(x_i - mu)^2 end{aligned} ​μ=N1​i=1∑N​xi​σ2=N1​i=1∑N​(xi​−μ)2​

其中 μmuμ 为数据的平均值, 而 σ2sigma^2σ2 即是(总体)方差.

相应的实现代码如下:

代码语言:javascript复制
-- Lua
function average(values, count)
    local sum = 0
    
    for i = 1, count do
        sum = sum   values[i]
    end
    
    return sum / count
end

function variance(values, count)
    local average = average(values, count)
    local variance = 0
    
    for i = 1, count do
        local delta = values[i] - average 
        variance = variance   delta * delta
    end
    
    return variance / count
end

通常我们需要在获取新样本数据时更新方差,简单的方法就是按照上述公式重新计算一遍,我们可以通过计算数据子集方差的方式来模拟这个过程:

代码语言:javascript复制
-- Lua
function variance_list(values)
    local ret = {}
    
    for i = 1, #values do
        ret[i] = variance(values, i)
    end
    
    return ret
end

更好的一种方式是通过递推来计算数据子集的方差,这需要对方差的计算公式做一些变形:

σ2=1N∑i=1N(xi−μ)2 ⟹ σ2=1N∑i=1N(xi2 μ2−2xiμ) ⟹ σ2=1N(∑i=1Nxi2 ∑i=1Nμ2−∑i=1N2xiμ) ⟹ σ2=1N(∑i=1Nxi2 Nμ2−2μ∑i=1Nxi) ⟹ σ2=1N(∑i=1Nxi2 Nμ2−2Nμ2) ⟹ σ2=1N(∑i=1Nxi2−Nμ2) ⟹ σ2=1N(∑i=1Nxi2−N(∑i=1NxiN)2) ⟹ σ2=1N(∑i=1Nxi2−(∑i=1Nxi)2N) begin{aligned} &sigma^2 = frac{1}{N}sum_{i = 1}^{N}(x_i - mu)^2 implies \ &sigma^2 = frac{1}{N}sum_{i = 1}^{N}(x_i^2 mu^2 - 2x_imu) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 sum_{i = 1}^{N}mu^2 - sum_{i = 1}^{N}2x_imu) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 Nmu^2 - 2musum_{i = 1}^{N}x_i) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 Nmu^2 - 2Nmu^2) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 - Nmu^2) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 - N(frac{sum_{i=1}^{N}x_i}{N})^2) implies \ &sigma^2 = frac{1}{N}(sum_{i = 1}^{N}x_i^2 - frac{(sum_{i=1}^{N}x_i)^2}{N}) end{aligned} ​σ2=N1​i=1∑N​(xi​−μ)2⟹σ2=N1​i=1∑N​(xi2​ μ2−2xi​μ)⟹σ2=N1​(i=1∑N​xi2​ i=1∑N​μ2−i=1∑N​2xi​μ)⟹σ2=N1​(i=1∑N​xi2​ Nμ2−2μi=1∑N​xi​)⟹σ2=N1​(i=1∑N​xi2​ Nμ2−2Nμ2)⟹σ2=N1​(i=1∑N​xi2​−Nμ2)⟹σ2=N1​(i=1∑N​xi2​−N(N∑i=1N​xi​​)2)⟹σ2=N1​(i=1∑N​xi2​−N(∑i=1N​xi​)2​)​

基于此,我们就可以递推的计算数据子集的方差了,相关的计算复杂度则降低了一个数量级(O(n2) ⟹ O(n)O(n^2) implies O(n)O(n2)⟹O(n)):

代码语言:javascript复制
-- lua
function variance_list_recurrence(values)
    local ret = {}
    
    local pre_square_sum = 0
    local pre_sum = 0
    
    for i = 1, #values do
        local val = values[i]
        
        pre_square_sum = pre_square_sum   val * val
        pre_sum = pre_sum   val
        
        ret[i] = (pre_square_sum - (pre_sum * pre_sum / i)) / i
    end
    
    return ret
end
sum

0 人点赞