所以结论是有问题!我依然还是不推荐用这个包做亚组分析哈~
下面我的一些探索过程。
Publish
包有一个subgroupAnalysis
函数也可以实现亚组分析。我在之前的推文中说这个函数有一些问题,所以不推荐使用。
今天来探索下它的问题。还是用之前的数据集,这里就不对这个数据集做介绍了,大家可以翻看之前的推文。
- R语言亚组分析及森林图绘制
- R语言亚组分析1行代码实现!
#rm(list = ls())
library(survival)
#str(colon)
suppressMessages(library(tidyverse))
df <- colon %>%
mutate(rx=as.numeric(rx)) %>%
filter(etype == 1, !rx == 2) %>% #rx %in% c("Obs","Lev 5FU"),
select(time, status,rx, sex, age,obstruct,perfor,adhere,differ,extent,surg,node4) %>%
mutate(sex=factor(sex, levels=c(0,1),labels=c("female","male")),
age=ifelse(age >65,">65","<=65"),
age=factor(age, levels=c(">65","<=65")),
obstruct=factor(obstruct, levels=c(0,1),labels=c("No","Yes")),
perfor=factor(perfor, levels=c(0,1),labels=c("No","Yes")),
adhere=factor(adhere, levels=c(0,1),labels=c("No","Yes")),
differ=factor(differ, levels=c(1,2,3),labels=c("well","moderate","poor")),
extent=factor(extent, levels=c(1,2,3,4),
labels=c("submucosa","muscle","serosa","contiguous")),
surg=factor(surg, levels=c(0,1),labels=c("short","long")),
node4=factor(node4, levels=c(0,1),labels=c("No","Yes")),
rx=ifelse(rx==3,0,1),
rx=factor(rx,levels=c(0,1))
)
str(df)
## 'data.frame': 619 obs. of 12 variables:
## $ time : num 968 3087 542 245 523 ...
## $ status : num 1 0 1 1 1 1 0 0 0 1 ...
## $ rx : Factor w/ 2 levels "0","1": 1 1 2 1 2 1 2 1 1 2 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 1 1 2 1 2 1 2 2 ...
## $ age : Factor w/ 2 levels ">65","<=65": 2 2 1 1 1 2 2 1 2 2 ...
## $ obstruct: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ perfor : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ adhere : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
## $ differ : Factor w/ 3 levels "well","moderate",..: 2 2 2 2 2 2 2 2 3 2 ...
## $ extent : Factor w/ 4 levels "submucosa","muscle",..: 3 3 2 3 3 3 3 3 3 3 ...
## $ surg : Factor w/ 2 levels "short","long": 1 1 1 2 2 1 1 2 2 1 ...
## $ node4 : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 1 1 1 1 ...
进行亚组分析:
代码语言:javascript复制library(Publish)
## Loading required package: prodlim
fit <- coxph(Surv(time, status) ~ rx, data = df)
res_publish <- subgroupAnalysis(fit,
data = df,
treatment = "rx",
subgroups = c("sex","age","obstruct","perfor","adhere","differ","extent","surg","node4")
)
publish(res_publish)
## subgroups level sample_0 sample_1 event_0 event_1 time_0 time_1 HazardRatio pinteraction subgroup Lower confint
## sex female 163 149 74 81 243849 195096 1.321339e 00 0.0283 NA 0.964 0.964- 1.81
## sex male 141 166 45 96 250006 208495 2.237907e 00 0.0283 NA 1.570 1.570- 3.19
## age >65 114 110 40 63 186392 135830 1.961300e 00 0.3145 NA 1.319 1.319- 2.92
## age <=65 190 205 79 114 307463 267761 1.526851e 00 0.3145 NA 1.146 1.146- 2.03
## obstruct No 250 252 98 140 412753 329934 1.635528e 00 0.7511 NA 1.263 1.263- 2.12
## obstruct Yes 54 63 21 37 81102 73657 1.800281e 00 0.7511 NA 1.054 1.054- 3.08
## perfor No 296 306 116 170 481322 394990 1.642233e 00 0.4283 NA 1.297 1.297- 2.08
## perfor Yes 8 9 3 7 12533 8601 2.815180e 00 0.4283 NA 0.728 0.728-10.89
## adhere No 265 268 100 148 439661 352840 1.688957e 00 0.7564 NA 1.310 1.310- 2.18
## adhere Yes 39 47 19 29 54194 50751 1.527858e 00 0.7564 NA 0.857 0.857- 2.73
## differ well 29 27 9 17 54084 33478 2.690891e 00 0.4025 NA 1.199 1.199- 6.04
## differ moderate 215 229 81 124 359531 311017 1.652095e 00 0.4025 NA 1.248 1.248- 2.19
## differ poor 54 52 28 33 68676 49502 1.413040e 00 0.4025 NA 0.854 0.854- 2.34
## extent submucosa 10 8 3 0 19733 17689 1.900719e-07 0.1000 NA 0.000 0.000- Inf
## extent muscle 32 38 6 15 63100 61693 2.441617e 00 0.1000 NA 0.947 0.947- 6.29
## extent serosa 251 249 104 148 397919 307963 1.679296e 00 0.1000 NA 1.306 1.306- 2.16
## extent contiguous 11 20 6 14 13103 16246 1.547962e 00 0.1000 NA 0.595 0.595- 4.03
## surg short 228 224 82 123 379802 292677 1.825926e 00 0.1850 NA 1.381 1.381- 2.41
## surg long 76 91 37 54 114053 110914 1.297101e 00 0.1850 NA 0.853 0.853- 1.97
## node4 No 225 228 70 114 400644 331484 1.832787e 00 0.3383 NA 1.361 1.361- 2.47
## node4 Yes 79 87 49 63 93211 72107 1.451102e 00 0.3383 NA 0.998 0.998- 2.11
结果给出了HR、HR的可信区间、P-for-interaction。
我们探索下它的HR、HR的可信区间、P-for-interaction是怎么计算的。
P-for-interaction
首先是P-for-interaction的计算,通过翻看它的原代码可知,p-for-interaction 是通过似然比检验计算的,方法如下:
以sex
为例
fit0 <- coxph(Surv(time, status) ~ rx*sex, data = df)
fit1 <- coxph(Surv(time, status) ~ rx sex, data = df)
anova(fit0,fit1,test = "chisp")
## Analysis of Deviance Table
## Cox model: response is Surv(time, status)
## Model 1: ~ rx * sex
## Model 2: ~ rx sex
## loglik Chisq Df Pr(>|Chi|)
## 1 -1795.1
## 2 -1797.5 4.8122 1 0.02826 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
计算出来刚好是0.0283,和上面的结果是一样的,这个结果是没有问题的。虽然和jstable
计算的有误差,但是结果都是正确的。
HR及可信区间
我们以obstruct
这个变量为例,先手动计算一下它的HR和可信区间。
对数据取子集然后计算即可:
代码语言:javascript复制fit0 <- coxph(Surv(time, status) ~ rx, data = df[df["sex"] == "male",])
summary(fit0)
## Call:
## coxph(formula = Surv(time, status) ~ rx, data = df[df["sex"] ==
## "male", ])
##
## n= 307, number of events= 141
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rx1 0.8274 2.2873 0.1812 4.567 4.95e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rx1 2.287 0.4372 1.604 3.262
##
## Concordance= 0.598 (se = 0.02 )
## Likelihood ratio test= 22.45 on 1 df, p=2e-06
## Wald test = 20.85 on 1 df, p=5e-06
## Score (logrank) test = 22.05 on 1 df, p=3e-06
fit1 <- coxph(Surv(time, status) ~ rx, data = df[df["sex"] == "female",])
summary(fit1)
## Call:
## coxph(formula = Surv(time, status) ~ rx, data = df[df["sex"] ==
## "female", ])
##
## n= 312, number of events= 155
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rx1 0.2745 1.3159 0.1608 1.707 0.0878 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rx1 1.316 0.7599 0.9601 1.804
##
## Concordance= 0.539 (se = 0.02 )
## Likelihood ratio test= 2.92 on 1 df, p=0.09
## Wald test = 2.91 on 1 df, p=0.09
## Score (logrank) test = 2.93 on 1 df, p=0.09
当sex=male
时,计算出来的HR=2.287,可信区间为1.604~3.262, 当sex=female
时,计算出来的HR=1.3159,可信区间为0.9601~1.804,
这个结果和我们之前用jstable
以及purrr
手动计算出来的结果都是一样的,但是和上面subgroupAnalysis
的结果不一样。
翻看源码得知,它的HR和可信区间是使用下面的代码计算的:
代码语言:javascript复制ff1 <- update.formula(object$formula, paste("~ . ", var, "*", treatment))
ff2 <- update.formula(object$formula, paste("~ . ", var, " ", treatment))
fit1 <- do.call("coxph", list(formula = ff1, data = datt,
fit2 <- do.call("coxph", list(formula = ff2, data = datt,
pinteraction <- anova(fit1, fit2)[4][2, ]
## 省略
rt <- suppressMessages(data.table::setDT(summary(regressionTable(fit1),
print = FALSE)$rawTable)[, tail(.SD, length)])
rt <- rt[, `:=`(Variable, NULL)]
也就是说,对于这个sex
变量,它的male和female的HR及可信区间是通过以下方式计算出来的:
fit3 <- coxph(Surv(time, status) ~ rx sex rx:sex, data = df)
summary(regressionTable(fit3),print = FALSE)
## Variable Units HazardRatio CI.95 p-value
## 1 rx(0): sex(male vs female) 0.61 [0.42;0.89] 0.00991
## 2 rx(1): sex(male vs female) 1.04 [0.77;1.40] 0.79501
## 3 sex(female): rx(1 vs 0) 1.32 [0.96;1.81] 0.08318
## 4 sex(male): rx(1 vs 0) 2.24 [1.57;3.19] < 0.001
此时的sex=female
的HR是1.32,可信区间为0.96~1.81,这个结果还是和我们自己计算的是一样的;sex=male
是的HR是2.24,可信区间为1.57~3.19。这个结果和上面subgroupAnalysis
的结果是一样的,但是和我们自己计算的差距就比较大了。
但是很明显是有问题的,因为它没分亚组,而且我也不太懂它的公式为什么这么复杂,也有可能是regressionTable
进行了一些计算。
限于个人水平,难免出错,欢迎各位老师批评指正。