inference for numerical variables
I. hypothesis testing for paired data
hypotheses for paired means:
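A minimal sketch of the usual setup, assuming the standard formulation for paired data: the null hypothesis is that the mean of the paired differences is zero (H0: μ_diff = 0) and the alternative is that it is not (HA: μ_diff ≠ 0). The function name `paired_z_test` and its arguments are hypothetical, and the normal approximation assumes a reasonably large number of pairs.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_z_test(before, after, null_value=0.0):
    """Two-sided test of H0: mu_diff = null_value for paired observations.

    Illustrative only: assumes enough pairs for a normal approximation.
    """
    if len(before) != len(after):
        raise ValueError("paired data need two samples of equal length")
    # Analyze one difference per pair rather than the two groups separately.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    x_bar_diff = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    z = (x_bar_diff - null_value) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return x_bar_diff, se, z, p_value
```

The key design point is that pairing reduces the problem to a single sample of differences, so the rest of the test looks like a one-sample test on a mean.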
II. confidence intervals for paired data
estimating the difference between paired means:
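As a hedged illustration, the interval is commonly built as point estimate ± critical value × SE, where the point estimate is the mean of the paired differences. The sketch below assumes a normal critical value (large number of pairs); `paired_mean_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_mean_ci(diffs, confidence=0.95):
    """Normal-approximation interval for the mean of paired differences."""
    n = len(diffs)
    point_estimate = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    # Critical value z*, roughly 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * se
    return point_estimate - margin, point_estimate + margin
```

If the resulting interval does not contain 0, the data suggest a difference between the paired means at the corresponding significance level.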
III. comparing independent means
Conditions for inference for comparing two independent means:
1. Independence:
✓ within groups: sampled observations must be independent
‣ random sample/assignment
‣ if sampling without replacement, n < 10% of population
✓ between groups: the two groups must be independent of each other (non-paired)
2. Sample size/skew: Each sample size must be at least 30 (n1 ≥ 30 and n2 ≥ 30), larger if the population distributions are very skewed.
testing for a difference between independent means
‣ null hypothesis: no difference
‣ alternative hypothesis: some difference
‣ same conditions and SE as the confidence interval
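The test and the interval can be sketched together, since they share one standard error. This assumes the usual formula SE = sqrt(s1²/n1 + s2²/n2) and a normal approximation (both samples of at least 30); `two_sample_z` is a hypothetical name, not something defined in these notes.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(sample1, sample2, confidence=0.95):
    """Test H0: mu1 - mu2 = 0 and build an interval for mu1 - mu2.

    Illustrative only: assumes independent groups and large samples.
    """
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)  # point estimate
    # The same SE is used for both the test and the interval.
    se = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    z = diff / se  # null value is 0 (no difference)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_star * se, diff + z_star * se)
    return diff, se, z, p_value, ci
```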
summary
IV. bootstrapping
‣ An alternative approach to constructing confidence intervals is bootstrapping.
‣ This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.
‣ In this case the (im)possible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.
bootstrapping scheme
(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap sample
(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
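The three steps above can be sketched in a few lines of Python. The function names, the default of 10,000 resamples, and the percentile method used to turn the bootstrap distribution into an interval are all assumptions made for illustration; the notes themselves do not specify how the interval is read off.

```python
import random
from statistics import median


def bootstrap_distribution(sample, statistic, n_boot=10_000, seed=None):
    """Return a list of bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # Step 1: resample with replacement, same size as the original.
        boot_sample = rng.choices(sample, k=n)
        # Step 2: compute the statistic on this bootstrap sample.
        boot_stats.append(statistic(boot_sample))
    # Step 3: the collected statistics form the bootstrap distribution.
    return boot_stats


def percentile_interval(boot_stats, confidence=0.95):
    """Percentile-method interval (an assumed choice, one of several)."""
    ordered = sorted(boot_stats)
    alpha = 1 - confidence
    lower_idx = round(alpha / 2 * len(ordered))
    upper_idx = round((1 - alpha / 2) * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]
```

For example, `percentile_interval(bootstrap_distribution(data, median))` would give an approximate 95% interval for the population median, where `data` is whatever sample is at hand.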
bootstrapping limitations
‣ Conditions are not as rigid as for CLT-based methods.
‣ However, if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.
‣ A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased.
bootstrap vs. sampling distribution
‣ Sampling distribution: created using sampling (with replacement) from the population.
‣ Bootstrap distribution: created using sampling (with replacement) from the sample.
‣ Both are distributions of sample statistics.
V. t distribution
‣ when n is small and σ is unknown (almost always the case), use the t distribution to address the uncertainty of the standard error estimate
‣ bell shaped but thicker tails than the normal
‣ observations more likely to fall beyond 2 SDs from the mean
‣ extra thick tails helpful for mitigating the effect of a less reliable estimate for the standard error of the sampling distribution
‣ always centered at 0 (like the standard normal)
‣ has one parameter: degrees of freedom (df) - determines thickness of tails
‣ remember, the normal distribution has two parameters: mean and SD
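To make the thicker tails concrete, the sketch below compares the probability of landing beyond ±2 under a t distribution with that under the standard normal. It uses only the standard library: the t density is written out from its textbook formula and integrated numerically with Simpson's rule. The helper names `t_pdf` and `prob_beyond` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def prob_beyond(cutoff, df, steps=2000):
    """P(|T| > cutoff), via Simpson's rule on [-cutoff, cutoff].

    `steps` must be even for Simpson's rule.
    """
    h = 2 * cutoff / steps
    total = t_pdf(-cutoff, df) + t_pdf(cutoff, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-cutoff + i * h, df)
    inside = total * h / 3  # probability of falling within the cutoffs
    return 1 - inside


normal_tail = 2 * (1 - NormalDist().cdf(2))
for df in (2, 5, 30):
    print(f"df={df}: P(|T| > 2) = {prob_beyond(2, df):.4f}")
print(f"normal: P(|Z| > 2) = {normal_tail:.4f}")
```

The tail probability shrinks toward the normal value as df grows, which is the sense in which df controls the thickness of the tails.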
t statistic
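The formula itself is not reproduced in these notes. As a sketch of the form it usually takes for inference on a single mean (an assumption about the intended content, with s the sample standard deviation and n the sample size):

```latex
T = \frac{\text{point estimate} - \text{null value}}{SE},
\qquad SE = \frac{s}{\sqrt{n}},
\qquad df = n - 1
```

The calculation mirrors the Z statistic; only the reference distribution changes, from the standard normal to the t distribution with the stated degrees of freedom.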
VI. inference for a small sample mean
VII. inference for comparing two small sample means
VIII. comparing more than two means