单细胞数据分析里面最基础的就是降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释 ,这个大家基本上问题不大了,使用seurat标准流程即可,不过它默认出图并不好看,详见以前我们做的投票:可视化单细胞亚群的标记基因的5个方法,下面的5个基础函数相信大家都是已经烂熟于心了:
- VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
- FeaturePlot(pbmc, features = c("MS4A1", "CD79A"))
- RidgePlot(pbmc, features = c("MS4A1", "CD79A"), ncol = 1)
- DotPlot(pbmc, features = unique(features)) RotatedAxis()
- DoHeatmap(subset(pbmc, downsample = 100), features = features, size = 3)
最近,郑州大学第一附属医院的史阳同学无私的分享了他对这些基础函数的改造,颜值说不上巅峰,但打败基础函数是没有问题的, 同时也算是抛砖引玉吧,希望广大生信技能树粉丝们都投稿分享自己的创作,投稿请发邮件到 jmzeng1314@163.com
仍然是以大家熟知的pbmc3k数据集为例
大家先安装这个数据集对应的包,并且对它进行降维聚类分群,参考前面的例子:人人都能学会的单细胞聚类分群注释 ,而且每个亚群找高表达量基因,都存储为Rdata文件。
代码语言:javascript复制# 0.安装R包 ----
# devtools::install_github('caleblareau/BuenColors')
# utils::install.packages(pkgs = "ggstatsplot")
# InstallData("pbmc3k")
# 1.加载R包和测试数据 ----
rm(list = ls())
library(SeuratData) #加载seurat数据集
getOption('timeout')
options(timeout = 10000)
data("pbmc3k")
sce <- pbmc3k.final
library(Seurat)
DimPlot(sce,reduction = "umap",label = TRUE)
unique(Idents(sce))
sce$celltype = Idents(sce)
# 2.单细胞分析基本流程 ----
# 主要是拿到 tSNE和Umap的坐标,因为默认的 pbmc3k.final 是没有的
sce <- NormalizeData(sce, normalization.method = "LogNormalize",
scale.factor = 1e4)
sce <- FindVariableFeatures(sce,
selection.method = "vst", nfeatures = 2000)
sce <- ScaleData(sce)
sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce))
sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.8)
set.seed(123)
sce <- RunTSNE(object = sce, dims = 1:15, do.fast = TRUE)
sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE)
DimPlot(object = sce, reduction = "umap",label = TRUE)
# 3.定义分组信息 ----
Idents(sce) = sce$celltype
# 4.寻找各个单细胞亚群的特征性高表达量基因----
sce.markers <- FindAllMarkers(object = sce,
only.pos = TRUE,
min.pct = 0.25,
thresh.use = 0.25)
save(sce, sce.markers, file = 'tmp.Rdata')
接下来改造 单细胞亚群比例图
先放一下改造效果图:
代码如下所示:
代码语言:javascript复制source('PropPlot.R')
table(sce$seurat_clusters, sce$celltype)
PropPlot(object = sce, groupBy = 'celltype')
PropPlot(object = sce, groupBy = 'seurat_clusters')
所有的细节都在 PropPlot.R 文件里面的 PropPlot 函数, 是自定义的,内容如下所示:
代码语言:javascript复制PropPlot <- function(object, groupBy){
# (1)获取绘图数据
plot_data = object@meta.data %>%
dplyr::select(orig.ident, {{groupBy}}) %>%
dplyr::rename(group = as.name(groupBy))
# (2)绘图
figure = ggbarstats(data = plot_data,
x = group, y = orig.ident,
package = 'ggsci',
palette = 'category20c_d3',
results.subtitle = FALSE,
bf.message = FALSE,
proportion.test = FALSE,
label.args = list(size = 2,
fill = 'white',
alpha = 0.85,
family = 'Arial',
fontface = 'bold'),
perc.k = 2,
title = '',
xlab = '',
legend.title = 'Seurat Cluster',
ggtheme = ggpubr::theme_pubclean())
theme(axis.ticks.x = element_blank(),
axis.ticks.y = element_line(color = 'black', lineend = 'round'),
legend.position = 'right',
axis.text.x = element_text(size = 15, color = 'black', family = 'Arial'),
axis.text.y = element_text(size = 15, color = 'black', family = 'Arial'),
legend.text = element_text(family = 'Arial', size = 10, color = 'black'),
legend.title = element_text(family = 'Arial', size = 13, color = 'black'))
# (3)去除柱子下面的样本量标识:
gginnards::delete_layers(x = figure, match_type = 'GeomText')
}
可以看到两个命名体系差不多:
代码语言:javascript复制> table(sce$celltype,sce$seurat_clusters)
0 1 2 3 4 5 6 7 8
Naive CD4 T 163 505 0 0 29 0 0 0 0
Memory CD4 T 444 20 0 0 19 0 0 0 0
CD14 Mono 0 0 466 0 0 14 0 0 0
B 1 0 0 342 1 0 0 0 0
CD8 T 1 0 0 0 264 0 6 0 0
FCGR3A Mono 0 0 0 0 0 162 0 0 0
NK 0 0 0 0 2 0 153 0 0
DC 0 0 0 0 0 0 0 32 0
Platelet 0 0 1 0 0 0 0 0 13
所以,使用两种情况看比例,也差不多。
如果你确实觉得我的教程对你的科研课题有帮助,让你茅塞顿开,或者说你的课题大量使用我的技能,烦请日后在发表自己的成果的时候,加上一个简短的致谢,如下所示:
代码语言:javascript复制We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.
十年后我环游世界各地的高校以及科研院所(当然包括中国大陆)的时候,如果有这样的情谊,我会优先见你。