Full text: https://tecdat.cn/?p=33341
The NHEFS survey was designed to investigate the relationships between clinical, nutritional, and behavioural factors assessed in the first National Health and Nutrition Examination Survey (NHANES I) and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. For more information see http://www.cdc.gov/nchs/nhanes/nhefs/nhefs.htm. This question will involve using this data to estimate the average causal effect of smoking cessation on weight gain.
(a) Individuals were classified as treated if they reported being smokers at baseline in 1971-75 and having quit smoking in the 1982 survey. The latter implies that the individuals included in our study did not die and were not otherwise lost to follow-up between baseline and 1982 (otherwise they would not have been able to respond to the survey). That is, we selected individuals into our study conditional on an event (responding to the 1982 survey) that occurred after the start of smoking cessation. If smoking cessation affects the probability of selection into the study, we might have selection bias (Hernan and Robins, 2014, Chapter 12, page 11). Would a randomized experiment of smoking cessation have this problem? How could a randomized experiment of smoking cessation be designed? What is the major difference between the latter randomized experiment and this study (NHEFS survey)?
(b) Should a statistician be concerned that using the NHEFS data to compare weight loss in the group of subjects that quit smoking versus those that did not quit smoking is biased? If yes, state why you think the comparison might be biased; otherwise state why the comparison is unbiased.
(c) Use R to estimate the propensity score for each subject in the study. Use the variables sex, race, age, education.code, smokeintensity, smokeyrs, exercise, active, and wt71 as covariates. After calculating the propensity score, use the Match function in R to match subjects on the propensity score. Does the balance between the two groups improve after matching? Hand in your R code and output.
(d) Estimate the effect of smoking cessation on weight loss using propensity score matching. Did the propensity score reduce the bias in estimating the treatment effect? What assumption can be made to conclude that smoking cessation causes weight loss? Do you think this assumption is valid? Briefly explain. Hand in your R code and output.
prop.model <- glm(qsmk ~ sex + race + age + education.code + smokeintensity + smokeyrs + exercise + active + wt71, family = binomial(), data = nhefshwdat)
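We fit a generalized linear model (logistic regression) to the full sample, with quitting smoking (qsmk) as the response and the nine variables sex, race, age, education, and so on as covariates, and then estimate each subject's probability of quitting.
The fitted model expresses the log-odds of the binary outcome qsmk as a linear function of the covariates.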
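To make the link between the linear predictor and the estimated probability concrete, here is a minimal sketch on simulated data. The data frame `sim` and its columns are hypothetical stand-ins, not the NHEFS variables; only `glm`, `predict`, and `plogis` from base R are used.

```r
# Simulated stand-in for the NHEFS data; all names below are hypothetical.
set.seed(1)
n <- 500
sim <- data.frame(age = rnorm(n, 45, 10), sex = rbinom(n, 1, 0.5))
# The true log-odds of quitting depend linearly on the covariates.
sim$qsmk <- rbinom(n, 1, plogis(-3 + 0.05 * sim$age + 0.3 * sim$sex))

fit <- glm(qsmk ~ age + sex, family = binomial(), data = sim)

# The fitted model is linear on the log-odds (link) scale ...
eta <- predict(fit, type = "link")
# ... and the propensity score is the inverse-logit of that linear predictor.
ps <- predict(fit, type = "response")

stopifnot(isTRUE(all.equal(ps, plogis(eta))))
range(ps)  # estimated probabilities lie strictly between 0 and 1
```

The same relationship holds for `prop.model`: its fitted values are the inverse-logit of a linear combination of the covariates.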
# Predicted probability of quitting for each subject, from the model above.
ps <- predict(prop.model, type = "response")
# Probability of the treatment actually received: P(quit) for quitters, 1 - P(quit) for non-quitters.
nhefshwdat$p.qsmk.obs <- ifelse(nhefshwdat$qsmk == 0, 1 - ps, ps)
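X <- prop.model$fitted.values  # estimated propensity scores (fitted probabilities of quitting)
Y <- nhefshwdat$wt82_71        # outcome: weight change from 1971 to 1982
Tr <- nhefshwdat$qsmk          # treatment indicator: 1 = quit smoking, 0 = did not quit
library(Matching)              # load the Matching package
# For each quitter, find the one non-quitter (M = 1) with the closest propensity score
# and compare their weight changes. By default Match estimates the ATT and matches
# with replacement, so a control may be reused; pass replace = FALSE to make each control unique.
rr <- Match(Y = Y, Tr = Tr, X = X, M = 1)
summary(rr)                    # print the estimated effect, its standard error, and the matched sample sizes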
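Part (c) also asks whether balance improves after matching. The Matching package provides MatchBalance for that check on a Match object. As a package-free illustration of the idea, the sketch below uses simulated data and a hand-rolled nearest-neighbour match (not the Matching implementation) to compare a covariate's standardized mean difference before and after matching. All variable names are hypothetical.

```r
# Simulated data; variable names are hypothetical stand-ins.
set.seed(2)
n <- 1000
age <- rnorm(n, 45, 10)
treat <- rbinom(n, 1, plogis(-4 + 0.08 * age))  # older subjects quit more often
ps <- fitted(glm(treat ~ age, family = binomial()))

# Standardized mean difference between two samples of a covariate.
smd <- function(x1, x0) {
  (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2)
}

# Hand-rolled 1:1 nearest-neighbour matching on the propensity score,
# with replacement (a control may serve several treated subjects).
treated  <- which(treat == 1)
controls <- which(treat == 0)
matched  <- sapply(treated, function(i) {
  controls[which.min(abs(ps[controls] - ps[i]))]
})

smd_before <- smd(age[treated], age[controls])  # all treated vs all controls
smd_after  <- smd(age[treated], age[matched])   # treated vs their matched controls

c(before = smd_before, after = smd_after)
```

A standardized mean difference closer to zero after matching indicates that the matched groups are more alike on that covariate. The same comparison would be repeated for every covariate in the propensity model.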