最近学徒在分享单细胞数据集实战的时候,发现一个简单的各个单细胞亚群特异性标记基因热图都绘制的很奇怪,感觉大家仅仅是跑标准代码,而不能花时间去研发和提高,做出有自己特色的分析。比如热图首先基因不能显示出顺序的区块效果,其次也不设置一下自己的喜欢的配色。
我们以大家熟知的pbmc3k数据集为例。大家先安装这个数据集对应的包,并且对它进行降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释 ,而且每个亚群找高表达量基因,都存储为Rdata文件。标准代码是:
代码语言:javascript复制library(SeuratData) #加载seurat数据集
getOption('timeout')
options(timeout=10000)
#InstallData("pbmc3k")
data("pbmc3k")
sce <- pbmc3k.final
library(Seurat)
table(Idents(sce))
DimPlot(sce,label = T)
sce.markers <- FindAllMarkers(object = sce, only.pos = TRUE,
min.pct = 0.25,
thresh.use = 0.25)
library(dplyr)
top3 <- sce.markers %>% group_by(cluster) %>% top_n(3, avg_log2FC)
DoHeatmap(sce ,top3$gene,size=3)
会得到如下所示的热图:
默认热图
如果是针对上面的FindAllMarkers定位到的各个单细胞亚群各自特异性基因,
代码语言:javascript复制library(dplyr)
top3 <- sce.markers %>% group_by(cluster) %>% top_n(3, avg_log2FC)
sce.all <- ScaleData(sce,features = top3$gene)
library(paletteer)
color <- c(paletteer_d("awtools::bpalette"),
paletteer_d("awtools::a_palette"),
paletteer_d("awtools::mpalette"))
unique(sce.all$celltype)
ord = c('Naive CD4 T' ,'Memory CD4 T', 'CD8 T', 'NK',
'CD14 Mono', 'FCGR3A Mono' ,'DC', 'B','Platelet')
sce.all$celltype = factor(sce.all$celltype ,levels = ord)
ll = split(top10$gene,top10$cluster)
ll = ll[ord]
rmg=names(table(unlist(ll))[table(unlist(ll))>1])
ll = lapply(ll, function(x) x[!x %in% rmg])
library(ggplot2)
DoHeatmap(sce.all,
features = unlist(ll),
group.by = "celltype",
assay = 'RNA',
group.colors = color,label = F)
scale_fill_gradientn(colors = c("white","grey","firebrick3"))
ggsave(filename = "marker_pheatmap.pdf",units = "cm",width = 36,height = 42)
效果如下所示:
改进的热图
如果没有使用FindAllMarkers函数,而是 速度上吊打FindAllMarkers的单细胞亚群特异性高表达基因查询算法 :
代码语言:javascript复制
library(dplyr)
top_10 <- unique(as.character(apply(marker_cosg$names,2,head,10)))
sce.all <- ScaleData(sce,features = top_10 )
table(sce.all$celltype)
library(paletteer)
color <- c(paletteer_d("awtools::bpalette"),
paletteer_d("awtools::a_palette"),
paletteer_d("awtools::mpalette"))
ord = names(marker_cosg$names)
sce.all$celltype = factor(sce.all$celltype ,levels = ord)
table(sce.all$celltype)
ll = as.list(marker_cosg$names)
ll = ll[ord]
rmg=names(table(unlist(ll))[table(unlist(ll))>1])
ll = lapply(ll, function(x) x[!x %in% rmg])
DoHeatmap(sce.all,
features = unlist(ll),
group.by = "celltype",
assay = 'RNA',
group.colors = color,label = T)
scale_fill_gradientn(colors = c("white","grey","firebrick3"))
ggsave(filename = "marker_pheatmap_by_celltype.pdf",units = "cm",width = 36,height = 42)
其实原理很简单,就是针对单细胞亚群进行排序,而且必须要跟各个亚群特异性的高表达基因的顺序对应哦。