Duke@Coursera Data Analysis and Statistical Inference, Unit 4: Inference for Numerical Variables

2019-04-10 16:48:22

inference for numerical variables

一、hypothesis testing for paired data

hypotheses for paired means:
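The formulas for this heading did not survive in the text. As a hedged reconstruction using the usual convention for paired data (the symbol $\mu_{diff}$, the population mean of the within-pair differences, is an assumption about the original notation), the hypotheses can be written as:

```latex
H_0: \mu_{diff} = 0 \quad \text{(on average, no difference within pairs)}
\qquad
H_A: \mu_{diff} \neq 0 \quad \text{(on average, some difference within pairs)}
```

The alternative may also be one-sided ($\mu_{diff} > 0$ or $\mu_{diff} < 0$), depending on the research question.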

二、confidence intervals for paired data

estimating the difference between paired means:
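The interval itself is missing from the text above. A minimal sketch, assuming the standard form "point estimate ± critical value × SE" applied to the within-pair differences, with a normal critical value (reasonable only for large samples; small samples would use a t critical value instead). The function name `paired_mean_ci` and its arguments are hypothetical, chosen for illustration:

```python
import math
from statistics import NormalDist, mean, stdev


def paired_mean_ci(before, after, confidence=0.95):
    """Confidence interval for the mean of the paired differences (after - before).

    Assumes a large number of pairs, so a normal critical value z* is used.
    """
    if len(before) != len(after):
        raise ValueError("paired data require samples of equal length")
    # Reduce the two paired samples to a single sample of differences.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    x_bar_diff = mean(diffs)
    # Standard error of the mean difference: s_diff / sqrt(n).
    se = stdev(diffs) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar_diff - z_star * se, x_bar_diff + z_star * se
```

Calling `paired_mean_ci(before, after)` on two equal-length lists of measurements returns the lower and upper bounds. If the interval excludes 0, the data suggest a difference between the paired means at the chosen confidence level.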

三、comparing independent means

Conditions for inference for comparing two independent means:

1. Independence:

✓ within groups: sampled observations must be independent

‣ random sample/assignment

‣ if sampling without replacement, n < 10% of population

✓ between groups: the two groups must be independent of each other (non-paired)

2. Sample size/skew: Each sample size must be at least 30 (n1 ≥ 30 and n2 ≥ 30), larger if the population distributions are very skewed.

testing for a difference between independent means

‣ null hypothesis: no difference

‣ alternative hypothesis: some difference

‣ same conditions and SE as the confidence interval
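The test and the interval share one standard error, whose formula does not appear in the surviving text. A minimal sketch, assuming the usual SE for two independent means, sqrt(s1²/n1 + s2²/n2), and large samples (n1 ≥ 30 and n2 ≥ 30) so that a normal approximation applies. The function name `compare_independent_means` is hypothetical:

```python
import math
from statistics import NormalDist, mean, variance


def compare_independent_means(sample1, sample2, confidence=0.95):
    """Two-sided test of 'no difference' and a confidence interval for mu1 - mu2.

    Assumes two independent, large samples, so normal-based inference is used.
    """
    n1, n2 = len(sample1), len(sample2)
    point_estimate = mean(sample1) - mean(sample2)
    # The same standard error is used for the test and for the interval.
    se = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    # Test statistic under the null hypothesis mu1 - mu2 = 0.
    z = (point_estimate - 0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (point_estimate - z_star * se, point_estimate + z_star * se)
    return z, p_value, ci
```

A small p-value and an interval that excludes 0 point to the same conclusion: the data provide evidence of some difference between the two population means.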

summary

四、bootstrapping

‣ An alternative approach to constructing confidence intervals is bootstrapping.

‣ This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.

‣ In this case the (im)possible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.

bootstrapping scheme

(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample

(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap sample

(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
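The three steps above can be sketched as follows. This is an illustrative implementation, not the course's own code: the function names, the default of 10,000 repetitions, and the percentile method for turning the bootstrap distribution into an interval are all assumptions.

```python
import random


def bootstrap_distribution(sample, statistic, n_boot=10_000, seed=None):
    """Return a list of bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)
    sample = list(sample)
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):  # step (3): repeat many times
        # Step (1): resample with replacement, same size as the original sample.
        resample = rng.choices(sample, k=n)
        # Step (2): compute the statistic on this bootstrap sample.
        boot_stats.append(statistic(resample))
    return boot_stats


def percentile_interval(boot_stats, confidence=0.95):
    """Percentile-method interval: the middle `confidence` share of the distribution."""
    ordered = sorted(boot_stats)
    alpha = 1 - confidence
    lo_idx = max(0, round(alpha / 2 * len(ordered)))
    hi_idx = min(len(ordered) - 1, round((1 - alpha / 2) * len(ordered)) - 1)
    return ordered[lo_idx], ordered[hi_idx]
```

For example, `percentile_interval(bootstrap_distribution(data, statistics.median))` gives a rough 95% interval for the population median, a statistic for which no simple CLT-based formula is available.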

bootstrapping limitations

‣ The conditions are not as rigid as those for CLT-based methods.

‣ However, if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.

‣ A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased.

bootstrap vs. sampling distribution

‣ The sampling distribution is created by sampling (with replacement) from the population.

‣ The bootstrap distribution is created by sampling (with replacement) from the sample.

‣ Both are distributions of sample statistics.

五、t distribution

‣ when n is small and σ is unknown (almost always the case), use the t distribution to address the uncertainty of the standard error estimate

‣ bell shaped, but with thicker tails than the normal

‣ observations more likely to fall beyond 2 SDs from the mean

‣ extra thick tails helpful for mitigating the effect of a less reliable estimate for the standard error of the sampling distribution

‣ always centered at 0 (like the standard normal)

‣ has one parameter: degrees of freedom (df) - determines thickness of tails

‣ remember, the normal distribution has two parameters: mean and SD
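The effect of df on tail thickness can be checked numerically. The sketch below is an illustration using only the standard library: it integrates the t and standard normal densities with Simpson's rule and compares the probability of falling beyond ±2 on the standardized scale. The helper names and the cutoff of 2 are choices made for this example.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def tail_beyond(pdf, limit=2.0, steps=2000):
    """Two-sided tail probability P(|X| > limit), via Simpson's rule on [-limit, limit]."""
    h = 2 * limit / steps  # steps must be even for Simpson's rule
    total = pdf(-limit) + pdf(limit)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(-limit + i * h)
    central_area = total * h / 3
    return 1 - central_area


if __name__ == "__main__":
    print("normal:", tail_beyond(normal_pdf))
    for df in (2, 5, 30):
        print("t, df =", df, ":", tail_beyond(lambda x, d=df: t_pdf(x, d)))
```

The tail probability shrinks toward the normal value as df grows, which is why the t distribution matters most for small samples.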

t statistic
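The formula for this heading is missing from the text. As a hedged reconstruction, the t statistic has the same "(estimate minus null value) over standard error" form as a Z score. For a single sample mean it is commonly written as below, where $\bar{x}$, $s$, $n$ and $\mu_0$ (the null value) are the usual symbols and not necessarily the original notation:

```latex
T = \frac{\text{point estimate} - \text{null value}}{SE}
\qquad\text{e.g.}\qquad
T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1
```

The p-value is then found from the t distribution with the stated degrees of freedom rather than from the standard normal.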

六、inference for a small sample mean

七、inference for comparing two small sample means

八、comparing more than two means
