Publish做亚组分析有问题吗?

2023-10-10 12:50:33 浏览数 (2)

所以结论是有问题!我依然还是不推荐用这个包做亚组分析哈~

下面我的一些探索过程。

Publish包有一个subgroupAnalysis函数也可以实现亚组分析。我在之前的推文中说这个函数有一些问题,所以不推荐使用。

今天来探索下它的问题。还是用之前的数据集,这里就不对这个数据集做介绍了,大家可以翻看之前的推文。

  • R语言亚组分析及森林图绘制
  • R语言亚组分析1行代码实现!
代码语言:javascript复制
#rm(list = ls())
library(survival)
#str(colon)
suppressMessages(library(tidyverse))

df <- colon %>% 
  mutate(rx=as.numeric(rx)) %>% 
  filter(etype == 1, !rx == 2) %>%  #rx %in% c("Obs","Lev 5FU"), 
  select(time, status,rx, sex, age,obstruct,perfor,adhere,differ,extent,surg,node4) %>% 
  mutate(sex=factor(sex, levels=c(0,1),labels=c("female","male")),
         age=ifelse(age >65,">65","<=65"),
         age=factor(age, levels=c(">65","<=65")),
         obstruct=factor(obstruct, levels=c(0,1),labels=c("No","Yes")),
         perfor=factor(perfor, levels=c(0,1),labels=c("No","Yes")),
         adhere=factor(adhere, levels=c(0,1),labels=c("No","Yes")),
         differ=factor(differ, levels=c(1,2,3),labels=c("well","moderate","poor")),
         extent=factor(extent, levels=c(1,2,3,4),
                       labels=c("submucosa","muscle","serosa","contiguous")),
         surg=factor(surg, levels=c(0,1),labels=c("short","long")),
         node4=factor(node4, levels=c(0,1),labels=c("No","Yes")),
         rx=ifelse(rx==3,0,1),
         rx=factor(rx,levels=c(0,1))
         )

str(df)
## 'data.frame': 619 obs. of  12 variables:
##  $ time    : num  968 3087 542 245 523 ...
##  $ status  : num  1 0 1 1 1 1 0 0 0 1 ...
##  $ rx      : Factor w/ 2 levels "0","1": 1 1 2 1 2 1 2 1 1 2 ...
##  $ sex     : Factor w/ 2 levels "female","male": 2 2 1 1 2 1 2 1 2 2 ...
##  $ age     : Factor w/ 2 levels ">65","<=65": 2 2 1 1 1 2 2 1 2 2 ...
##  $ obstruct: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ perfor  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ adhere  : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
##  $ differ  : Factor w/ 3 levels "well","moderate",..: 2 2 2 2 2 2 2 2 3 2 ...
##  $ extent  : Factor w/ 4 levels "submucosa","muscle",..: 3 3 2 3 3 3 3 3 3 3 ...
##  $ surg    : Factor w/ 2 levels "short","long": 1 1 1 2 2 1 1 2 2 1 ...
##  $ node4   : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 1 1 1 1 ...

进行亚组分析:

代码语言:javascript复制
library(Publish)
## Loading required package: prodlim

fit <- coxph(Surv(time, status) ~ rx, data = df)
res_publish <- subgroupAnalysis(fit,
                        data = df,
                        treatment = "rx",
                        subgroups = c("sex","age","obstruct","perfor","adhere","differ","extent","surg","node4")
                        )
publish(res_publish)
##  subgroups      level sample_0 sample_1 event_0 event_1 time_0 time_1  HazardRatio pinteraction subgroup Lower     confint 
##        sex     female      163      149      74      81 243849 195096 1.321339e 00       0.0283       NA 0.964 0.964- 1.81 
##        sex       male      141      166      45      96 250006 208495 2.237907e 00       0.0283       NA 1.570 1.570- 3.19 
##        age        >65      114      110      40      63 186392 135830 1.961300e 00       0.3145       NA 1.319 1.319- 2.92 
##        age       <=65      190      205      79     114 307463 267761 1.526851e 00       0.3145       NA 1.146 1.146- 2.03 
##   obstruct         No      250      252      98     140 412753 329934 1.635528e 00       0.7511       NA 1.263 1.263- 2.12 
##   obstruct        Yes       54       63      21      37  81102  73657 1.800281e 00       0.7511       NA 1.054 1.054- 3.08 
##     perfor         No      296      306     116     170 481322 394990 1.642233e 00       0.4283       NA 1.297 1.297- 2.08 
##     perfor        Yes        8        9       3       7  12533   8601 2.815180e 00       0.4283       NA 0.728 0.728-10.89 
##     adhere         No      265      268     100     148 439661 352840 1.688957e 00       0.7564       NA 1.310 1.310- 2.18 
##     adhere        Yes       39       47      19      29  54194  50751 1.527858e 00       0.7564       NA 0.857 0.857- 2.73 
##     differ       well       29       27       9      17  54084  33478 2.690891e 00       0.4025       NA 1.199 1.199- 6.04 
##     differ   moderate      215      229      81     124 359531 311017 1.652095e 00       0.4025       NA 1.248 1.248- 2.19 
##     differ       poor       54       52      28      33  68676  49502 1.413040e 00       0.4025       NA 0.854 0.854- 2.34 
##     extent  submucosa       10        8       3       0  19733  17689 1.900719e-07       0.1000       NA 0.000 0.000-  Inf 
##     extent     muscle       32       38       6      15  63100  61693 2.441617e 00       0.1000       NA 0.947 0.947- 6.29 
##     extent     serosa      251      249     104     148 397919 307963 1.679296e 00       0.1000       NA 1.306 1.306- 2.16 
##     extent contiguous       11       20       6      14  13103  16246 1.547962e 00       0.1000       NA 0.595 0.595- 4.03 
##       surg      short      228      224      82     123 379802 292677 1.825926e 00       0.1850       NA 1.381 1.381- 2.41 
##       surg       long       76       91      37      54 114053 110914 1.297101e 00       0.1850       NA 0.853 0.853- 1.97 
##      node4         No      225      228      70     114 400644 331484 1.832787e 00       0.3383       NA 1.361 1.361- 2.47 
##      node4        Yes       79       87      49      63  93211  72107 1.451102e 00       0.3383       NA 0.998 0.998- 2.11

结果给出了HR、HR的可信区间、P-for-interaction。

我们探索下它的HR、HR的可信区间、P-for-interaction是怎么计算的。

P-for-interaction

首先是P-for-interaction的计算,通过翻看它的原代码可知,p-for-interaction 是通过似然比检验计算的,方法如下:

sex为例

代码语言:javascript复制
fit0 <- coxph(Surv(time, status) ~ rx*sex, data = df)
fit1 <- coxph(Surv(time, status) ~ rx sex, data = df)

anova(fit0,fit1,test = "chisp")
## Analysis of Deviance Table
##  Cox model: response is  Surv(time, status)
##  Model 1: ~ rx * sex
##  Model 2: ~ rx   sex
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -1795.1                       
## 2 -1797.5 4.8122  1    0.02826 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

计算出来刚好是0.0283,和上面的结果是一样的,这个结果是没有问题的。虽然和jstable计算的有误差,但是结果都是正确的。

HR及可信区间

我们以obstruct这个变量为例,先手动计算一下它的HR和可信区间。

对数据取子集然后计算即可:

代码语言:javascript复制
fit0 <- coxph(Surv(time, status) ~ rx, data = df[df["sex"] == "male",])
summary(fit0)
## Call:
## coxph(formula = Surv(time, status) ~ rx, data = df[df["sex"] == 
##     "male", ])
## 
##   n= 307, number of events= 141 
## 
##       coef exp(coef) se(coef)     z Pr(>|z|)    
## rx1 0.8274    2.2873   0.1812 4.567 4.95e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## rx1     2.287     0.4372     1.604     3.262
## 
## Concordance= 0.598  (se = 0.02 )
## Likelihood ratio test= 22.45  on 1 df,   p=2e-06
## Wald test            = 20.85  on 1 df,   p=5e-06
## Score (logrank) test = 22.05  on 1 df,   p=3e-06

fit1 <- coxph(Surv(time, status) ~ rx, data = df[df["sex"] == "female",])
summary(fit1)
## Call:
## coxph(formula = Surv(time, status) ~ rx, data = df[df["sex"] == 
##     "female", ])
## 
##   n= 312, number of events= 155 
## 
##       coef exp(coef) se(coef)     z Pr(>|z|)  
## rx1 0.2745    1.3159   0.1608 1.707   0.0878 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## rx1     1.316     0.7599    0.9601     1.804
## 
## Concordance= 0.539  (se = 0.02 )
## Likelihood ratio test= 2.92  on 1 df,   p=0.09
## Wald test            = 2.91  on 1 df,   p=0.09
## Score (logrank) test = 2.93  on 1 df,   p=0.09

sex=male时,计算出来的HR=2.287,可信区间为1.604~3.262, 当sex=female时,计算出来的HR=1.3159,可信区间为0.9601~1.804,

这个结果和我们之前用jstable以及purrr手动计算出来的结果都是一样的,但是和上面subgroupAnalysis的结果不一样。

翻看源码得知,它的HR和可信区间是使用下面的代码计算的:

代码语言:javascript复制
ff1 <- update.formula(object$formula, paste("~ .  ", var, "*", treatment))
ff2 <- update.formula(object$formula, paste("~ .  ", var, " ", treatment))
fit1 <- do.call("coxph", list(formula = ff1, data = datt, 
fit2 <- do.call("coxph", list(formula = ff2, data = datt, 
pinteraction <- anova(fit1, fit2)[4][2, ]
## 省略
rt <- suppressMessages(data.table::setDT(summary(regressionTable(fit1), 
                                                   print = FALSE)$rawTable)[, tail(.SD, length)])
rt <- rt[, `:=`(Variable, NULL)]

也就是说,对于这个sex变量,它的male和female的HR及可信区间是通过以下方式计算出来的:

代码语言:javascript复制
fit3 <- coxph(Surv(time, status) ~ rx   sex   rx:sex, data = df)

summary(regressionTable(fit3),print = FALSE)
##                     Variable Units HazardRatio       CI.95   p-value
## 1 rx(0): sex(male vs female)              0.61 [0.42;0.89]   0.00991
## 2 rx(1): sex(male vs female)              1.04 [0.77;1.40]   0.79501
## 3    sex(female): rx(1 vs 0)              1.32 [0.96;1.81]   0.08318
## 4      sex(male): rx(1 vs 0)              2.24 [1.57;3.19]   < 0.001

此时的sex=female的HR是1.32,可信区间为0.96~1.81,这个结果还是和我们自己计算的是一样的;sex=male是的HR是2.24,可信区间为1.57~3.19。这个结果和上面subgroupAnalysis的结果是一样的,但是和我们自己计算的差距就比较大了。

但是很明显是有问题的,因为它没分亚组,而且我也不太懂它的公式为什么这么复杂,也有可能是regressionTable进行了一些计算。

限于个人水平,难免出错,欢迎各位老师批评指正。

0 人点赞