Interval Estimation in Statistics

Statistics has two main branches: descriptive statistics and inferential statistics. Interval estimation is one of the most important topics in inferential statistics.

Three Types of Interval Estimates

Confidence Intervals

The confidence interval is the most commonly used interval estimate.

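It estimates a population parameter (such as the mean, standard deviation, or proportion) from a sample. Its uncertainty comes from sampling error: different samples can yield different estimates of the target parameter.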

The figure below illustrates how to interpret it:

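A 95% confidence interval means the following: draw 100 samples from the same population, with the population mean as the target. The 100 different samples give 100 different confidence intervals, and on average about 95 of those intervals contain the population parameter (here, the mean).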
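The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch, not a definitive implementation: it assumes normally distributed data with a known standard deviation (so a z-interval applies), and the function name `coverage_rate` and all parameter values are made up for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=30, repeats=2000, conf=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5           # assumption: sigma is known
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        center = mean(sample)
        if center - half_width <= mu <= center + half_width:
            hits += 1
    return hits / repeats

# The result should be close to 0.95; the exact value depends on the seed.
print(coverage_rate())
```

Each individual interval either contains the true mean or it does not; the 95% describes the long-run share of intervals that do.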

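When working with confidence intervals, keep two points in mind:
1. As the sample size grows, sampling error shrinks and the confidence interval narrows. In the limit, the sample is the whole population, there is no sampling error, and the interval collapses onto the sample statistic, which is then the population parameter itself.
2. A confidence interval only gives an approximate range for the population parameter. It says nothing about how individual values are distributed.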
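The first point can be made concrete with the half-width of a z-interval for the mean, z·σ/√n. The sketch below assumes a known standard deviation; `ci_half_width` is a hypothetical helper name and the numbers are illustrative.

```python
from statistics import NormalDist

def ci_half_width(sigma, n, conf=0.95):
    """Half-width of a z-based confidence interval for the mean (known sigma)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * sigma / n ** 0.5

# Quadrupling the sample size halves the half-width.
for n in (25, 100, 400, 1600):
    print(n, round(ci_half_width(10.0, n), 3))
```

Because the width scales with 1/√n, precision gets progressively more expensive: each halving of the interval costs four times as many observations.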

Prediction Intervals

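A prediction interval uses a model (for example, a linear model) to obtain a predicted value for a data point and then gives a range around that predicted value.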

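A prediction interval is generally wider than the corresponding confidence interval (for a prediction, the confidence interval can be read as the interval for the mean of the predicted value). The confidence interval only accounts for sampling error, whereas the prediction interval must also account for the uncertainty of the prediction itself.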
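For the simplest possible model, one that predicts every new value with the sample mean, the two intervals differ only by a factor under the square root. The sketch below is illustrative: it uses a normal quantile as a large-sample stand-in for the t quantile, the helper name `mean_intervals` is hypothetical, and the data values are made up.

```python
from statistics import NormalDist, mean, stdev

def mean_intervals(sample, conf=0.95):
    """Return (confidence interval for the mean, prediction interval for one new value)."""
    n = len(sample)
    center = mean(sample)
    s = stdev(sample)
    # Assumption: normal quantile; a t quantile is more accurate for small n.
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    ci_half = z * s / n ** 0.5            # sampling error of the mean only
    pi_half = z * s * (1 + 1 / n) ** 0.5  # adds the scatter of a single new observation
    return (center - ci_half, center + ci_half), (center - pi_half, center + pi_half)

data = [1210, 1185, 1302, 1250, 1190, 1275, 1230, 1260, 1215, 1245]  # made-up values
ci, pi = mean_intervals(data)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

As n grows, the confidence interval shrinks toward zero width, while the prediction interval approaches z·s on each side, because the spread of individual values does not go away with more data.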

Tolerance Intervals

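A tolerance interval builds on the confidence interval by adding one more parameter: the proportion of the population that the interval must contain.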

In the figure above, at a 95% confidence level, at least 95% of light bulb lifetimes fall within the interval (1060, 1435).

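Tolerance intervals are generally used when the interval has to meet a strict requirement, and that requirement is met by adjusting the population proportion parameter.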
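For normally distributed data, a two-sided tolerance interval is usually written as mean ± k·s, where the factor k depends on the sample size, the required population proportion, and the confidence level. The sketch below is one possible way to approximate k, not the method used in the example above: it combines Howe's approximation for k with the Wilson-Hilferty approximation for the chi-square quantile so that only the standard library is needed. The function names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (lower tail)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3

def tolerance_factor(n, coverage=0.95, conf=0.95):
    """Approximate two-sided normal tolerance factor k (Howe's approximation)."""
    dof = n - 1
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    chi2_low = chi2_quantile_wh(1 - conf, dof)  # lower-tail chi-square quantile
    return (dof * (1 + 1 / n) * z * z / chi2_low) ** 0.5

# k shrinks toward the plain normal quantile (about 1.96) as n grows.
for n in (10, 30, 100, 1000):
    print(n, round(tolerance_factor(n), 3))
```

Raising either the coverage proportion or the confidence level increases k and therefore widens the interval, which is how the proportion parameter trades off against the width requirement.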

Comparison of the Three Intervals

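A confidence interval reflects sampling error.
A prediction interval reflects sampling error and prediction error.
A tolerance interval reflects sampling error and the uncertainty in the population proportion.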

Statistical Significance

A hypothesis test uses sample data to choose between two mutually exclusive hypotheses: the null hypothesis and the alternative hypothesis.

A test result is statistically significant if and only if the sample data are so unusual under the null hypothesis (no difference) that we can reject the null hypothesis for the population.

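Whether the data are "unusual enough" depends on three things:
- The null hypothesis is assumed true: the distribution plot is centered on the null hypothesis value.
- The significance level: how far from the null hypothesis value we draw the critical line.
- The sample data: whether they fall outside the critical line.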

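The significance level is sometimes called the error rate. The reason is as follows: suppose α = 0.05 and the null hypothesis is true. Then there is a 0.05 probability that the sample falls far enough from the null hypothesis value for us to reject the null hypothesis, which gives a wrong conclusion. This kind of error does not mean the experiment was done incorrectly. It comes from unusual random sampling error, in other words, plain luck.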
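The "error rate" reading of α can be checked by simulating many studies in which the null hypothesis really is true and counting how often a test rejects it. This is a minimal sketch under stated assumptions: a two-sided z-test with known σ = 1, and a hypothetical function name `false_positive_rate`.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(alpha=0.05, n=25, repeats=4000, seed=1):
    """Share of two-sided z-tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(repeats):
        # The null is true by construction: population mean 0, sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / repeats

# The result should be close to alpha; the exact value depends on the seed.
print(false_positive_rate())
```

Every rejection counted here is a false positive caused purely by sampling variation, since nothing in the simulated experiment is wrong.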

Three Metrics

The following three quantities are commonly used to assess statistical significance:

P value
Significance level (also called α)
Confidence level (1 − α)

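Here is an example. First, set the significance level to 0.05. If the result is statistically significant, then:
1. The P value is less than 0.05. (The P value is computed assuming the null hypothesis holds. A sufficiently small P value means the data lie far enough from the null hypothesis to reject it.)
2. The confidence interval does not contain the null hypothesis value.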
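The agreement between the two criteria can be seen in a small worked case. The sketch below assumes a two-sided z-test for a mean with known σ, where the (1 − α) confidence interval and the α-level test are built from the same critical value. The helper name `z_test` and all numbers are illustrative.

```python
from statistics import NormalDist

def z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided z-test with known sigma: returns (p_value, confidence_interval)."""
    std_err = sigma / n ** 0.5
    z = (sample_mean - null_mean) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (sample_mean - z_crit * std_err, sample_mean + z_crit * std_err)
    return p_value, ci

# Hypothetical study: null mean 100, sigma 10, n = 25 (standard error 2).
for observed in (101.0, 103.0, 105.0):
    p, (low, high) = z_test(observed, null_mean=100.0, sigma=10.0, n=25)
    significant = p < 0.05
    null_outside_ci = not (low <= 100.0 <= high)
    print(observed, round(p, 4), significant, null_outside_ci)
```

In each row the last two columns agree: the P value drops below 0.05 exactly when the 95% interval no longer covers the null value.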

P Values

How Do You Interpret P Values?

In technical terms, a P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.

For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies due to random sampling error.

P values address only one question: how likely are your data, assuming a true null hypothesis? They do not measure support for the alternative hypothesis. This limitation leads us into the next section, which covers a very common misinterpretation of P values.

P Values Are NOT the Probability of Making a Mistake

Incorrect interpretations of P values are very common. The most common mistake is to interpret a P value as the probability of making a mistake by rejecting a true null hypothesis (a Type I error).

There are several reasons why P values can’t be the error rate.

First, P values are calculated based on the assumptions that the null is true for the population and that the difference in the sample is caused entirely by random chance. Consequently, P values can’t tell you the probability that the null is true or false because it is 100% true from the perspective of the calculations.

Second, while a low P value indicates that your data are unlikely assuming a true null, it can’t evaluate which of two competing cases is more likely:

The null is true but your sample was unusual.
The null is false.

Determining which case is more likely requires subject area knowledge and replicate studies.

Let’s go back to the vaccine study and compare the correct and incorrect way to interpret the P value of 0.04:

Correct: Assuming that the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies due to random sampling error.
Incorrect: If you reject the null hypothesis, there’s a 4% chance that you’re making a mistake.

To see a graphical representation of how hypothesis tests work, see my post: Understanding Hypothesis Tests: Significance Levels and P Values.

What Is the True Error Rate?

Think that this interpretation difference is simply a matter of semantics, and only important to picky statisticians? Think again. It’s important to you.

If a P value is not the error rate, what the heck is the error rate? (Can you guess which way this is heading now?)

Sellke et al.* have estimated the error rates associated with different P values. While the precise error rate depends on various assumptions, the table below summarizes them for middle-of-the-road assumptions.

P value	Probability of incorrectly rejecting a true null hypothesis
0.05	At least 23% (and typically close to 50%)
0.01	At least 7% (and typically close to 15%)

Do the higher error rates in this table surprise you? Unfortunately, the common misinterpretation of P values as the error rate creates the illusion of substantially more evidence against the null hypothesis than is justified. As you can see, if you base a decision on a single study with a P value near 0.05, the difference observed in the sample may not exist at the population level. That can be costly!

