同时也给出来了反向鉴定关键单细胞亚群的流程,就是发表在2021年Nature Biotechnology上的Scissor算法,它们的结果非常一致,说明了算法的可靠性,而且类似的算法还有一个发表在NAR的一篇算法文章《scAB detects multiresolution cell states with clinical significance by integrating single-cell genomics and bulk sequencing data》,DOI10.1093/nar/gkac1109
因为我不是做算法开发的生信工程师,所以我没办法去给大家推理这两个软件背后的统计学原理或者代码底层架构,但是我用实例给出来了算法和运行和理解,而且我们可以从另外一个侧面来做同样的反向鉴定关键单细胞亚群,未必就不比这些发表在好的杂志上面的算法效果差!
首先载入前面的批量生存分析结果以及单细胞对象
代码语言:javascript复制rm(list=ls())
library(Seurat)
library(preprocessCore)
library(survival)
library(survminer)
library(ggstatsplot)
library(remotes)
# remotes::install_github("carmonalab/UCell")
library(UCell)
load( file = '../01-tcga_luad_from_xena/batch_cox_results.Rdata')
cox_results = as.data.frame(cox_results)
cox_results=cox_results[order(cox_results$HR,decreasing = T),]
load('scRNA_for_scAB_Scissor.Rdata')
scRNA
table(scRNA$celltype)
p1=DimPlot(scRNA, reduction ="umap", group.by="celltype",label = T)
p1
cox_results=cox_results[rownames(cox_results)%in% rownames(scRNA),]
cox_markers=list(
pos = head(rownames(cox_results),100),
neg = tail(rownames(cox_results),100)
)
拿统计学显著的基因列表去单细胞打分
我们这里采取平平无奇的AddModuleScore_UCell函数即可:
代码语言:javascript复制sc_dataset <- AddModuleScore_UCell(scRNA,
features = cox_markers)
signature.names <- paste0(names(cox_markers), "_UCell")
options(repr.plot.width=6, repr.plot.height=4)
colnames(sc_dataset@meta.data)
VlnPlot(sc_dataset, features = signature.names,
#group.by = "celltype",
stack=TRUE ) NoLegend()
FeaturePlot(sc_dataset,'pos_UCell')
table(sc_dataset$pos_UCell> 0)
fivenum(sc_dataset$pos_UCell)
b1=table(sc_dataset$pos_UCell> fivenum(sc_dataset$pos_UCell)[4],
sc_dataset$celltype)
gplots::balloonplot(b1)
phe=sc_dataset@meta.data
很容易看到,那些风险基因在cycle亚群是打分很高,如下所示:
风险基因在cycle亚群是打分很高
查看Scissor算法和打分结果一致性;
代码语言:javascript复制load(file = 'output_of_Scissor.Rdata')
sc_dataset$pos_UCell = phe$pos_UCell
FeaturePlot(sc_dataset,'pos_UCell',split.by ='scissor' )
sc_dataset$neg_UCell = phe$neg_UCell
FeaturePlot(sc_dataset,'neg_UCell',split.by ='scissor' )
table(sc_dataset$celltype)
table(sc_dataset$scissor)
b1=table(sc_dataset$pos_UCell> fivenum(sc_dataset$pos_UCell)[4],
sc_dataset$scissor)
gplots::balloonplot(b1)
boxplot(sc_dataset$pos_UCell ~ sc_dataset$scissor)
很容易看到,Scissor算法判定的阳性细胞就是富集了那些生存分析的风险基因的细胞亚群。
Scissor算法判定的阳性细胞就是富集了那些生存分析的风险基因的细胞亚群
差异分析和生存分析结果交集
代码语言:javascript复制#Idents(sc_dataset)= sc_dataset$scissor
deg_Scissor_markers <- FindMarkers(sc_dataset,
ident.1 = "Scissor ",
group.by = 'scissor',
logfc.threshold = 0.25 )
deg_Scissor_markers <- deg_Scissor_markers[which(deg_Scissor_markers$p_val_adj<0.05),]
head(deg_Scissor_markers)
cox_markers$up = rownames(deg_Scissor_markers)[deg_Scissor_markers$avg_log2FC>0]
cox_markers$down = rownames(deg_Scissor_markers)[deg_Scissor_markers$avg_log2FC < 0]
require("VennDiagram")
grid.newpage()
venn.plot <- venn.diagram(cox_markers , NULL,
fill=c("red", "blue",'green','black'),
alpha=c(0.5,0.5,0.5,0.5), cex = 2, cat.fontface=4,
category.names= names(cox_markers),
main="venn.diagram")
grid.draw(venn.plot)
pdf('Scissor_select.venn.plot.pdf')
grid.draw(venn.plot)
dev.off()
也是非常漂亮的结果啦,Scissor算法判定的阳性细胞的高表达量基因绝大部分都是生存分析的风险基因啦。
非常漂亮的结果