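This chapter introduces the standard processing workflow for single-cell data in SCP. It is suitable for single-sample data, multi-sample data without batch effects, and other exploratory analyses.
- Main function: Standard_SCP;
- SCP version: 0.5.3; Seurat version: v4.4.0;
The Standard_SCP function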
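Standard_SCP is the standard processing workflow for single-cell data. It is modeled mainly on the Seurat standard workflow and covers normalization, highly variable feature (HVF) detection, linear and nonlinear dimension reduction, and cell clustering.
The workflow has the following features:
- Simplified parameters: the direct arguments are the main parameters of each step, and the remaining parameters can be passed in through a list. See the Standard_SCP function documentation [1] for details;
- Automation: for example, it automatically checks the data type, decides whether each step needs to be run, estimates the intrinsic dimension of the linear reduction space, and reorders the cluster labels;
- Combined analysis with multiple linear (pca, ica, nmf, mds, glmpca) or nonlinear (umap, tsne, dm, phate, pacmap, trimap, largevis, fr) dimension reduction methods;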
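As a rough illustration of the first point, the call below sets two direct arguments and passes one extra option through a list. This is a sketch under assumptions: the argument names `nHVF`, `cluster_resolution`, and `nonlinear_reduction_params`, as well as the `min.dist` entry forwarded to the UMAP step, are not confirmed anywhere in this chapter, so check them against the Standard_SCP function documentation [1] for your installed version. The prefix `Tuned` is an arbitrary illustrative name.

```r
library(SCP)
data("pancreas_sub")

# Assumed argument names; verify with ?Standard_SCP before relying on them.
pancreas_tuned <- Standard_SCP(
  srt = pancreas_sub,
  prefix = "Tuned",                   # illustrative prefix, keeps results separate
  nHVF = 3000,                        # assumed: number of highly variable features
  cluster_resolution = 1,             # assumed: clustering resolution
  nonlinear_reduction_params = list(  # assumed: extra options for the UMAP step
    min.dist = 0.5
  )
)
```

Using a separate prefix means a tuned run does not overwrite the results of a default run on the same object.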
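Standard workflow example
The example below uses downsampled single-cell data from the pancreatic epithelium of mouse embryos at day 15.5 (E15.5). Run ?pancreas_sub in R to view information about this example dataset.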
library(SCP)
library(Seurat)
data("pancreas_sub")
pancreas_sub
#> An object of class Seurat
#> 47874 features across 1000 samples within 3 assays
#> Active assay: RNA (15958 features, 3467 variable features)
#> 2 other assays present: spliced, unspliced
#> 2 dimensional reductions calculated: PCA, UMAP
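With default parameters, Standard_SCP uses 2000 HVFs, chooses PCA as the linear dimension reduction method, estimates the intrinsic dimension with intrinsicDimension::maxLikGlobalDimEst, and then performs UMAP nonlinear dimension reduction and cell clustering: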
pancreas_sub <- Standard_SCP(srt = pancreas_sub)
#> [2023-10-27 06:36:02] Start Standard_SCP
#> [2023-10-27 06:36:02] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:03] Finished checking.
#> [2023-10-27 06:36:03] Perform ScaleData on the data...
#> [2023-10-27 06:36:03] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:04] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:04] Reorder clusters...
#> [2023-10-27 06:36:05] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:12] Standard_SCP done
#> Elapsed time: 9.52 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "Standardclusters"))
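To give a feel for the intrinsic-dimension estimate mentioned above, the sketch below calls intrinsicDimension::maxLikGlobalDimEst on synthetic data: points with 5 latent dimensions linearly embedded in 20 dimensions. The synthetic matrix and the neighbor count `k = 20` are assumptions chosen for illustration only; how Standard_SCP applies the estimator internally is not shown in this chapter.

```r
library(intrinsicDimension)

set.seed(1)
# 500 points that vary along only 5 latent directions ...
latent <- matrix(rnorm(500 * 5), ncol = 5)
# ... linearly embedded into a 20-dimensional space.
projection <- matrix(rnorm(5 * 20), nrow = 5)
embedded <- latent %*% projection

# Maximum-likelihood estimate of the global intrinsic dimension,
# based on distances to each point's k nearest neighbors.
est <- maxLikGlobalDimEst(data = embedded, k = 20)
est$dim.est  # a single numeric estimate
```

In the same spirit, an estimate taken on a PCA embedding suggests how many components are worth keeping for the downstream steps.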
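The returned Seurat object contains the processed data matrices. With default parameters the assay used for the analysis is RNA, so the modified data are mainly in pancreas_sub[["RNA"]]. The graphs and reductions produced during the analysis are added as well; by default, nonlinear dimension reduction returns the cell embedding coordinates in both 2D and 3D space. Cell clusters are added to meta.data. All new graphs, reductions, and clusters are named with the prefix Standard by default. Names of intermediate reductions carry both the linear (lowercase) and nonlinear (uppercase) method names, while the final reduction keeps only the nonlinear method name: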
Graphs(pancreas_sub)
#> [1] "Standardpca_KNN" "Standardpca_SNN"
Reductions(pancreas_sub)
#> [1] "PCA" "UMAP" "Standardpca"
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"
#> [7] "StandardUMAP3D"
colnames(pancreas_sub@meta.data)
#> [1] "orig.ident" "nCount_RNA"
#> [3] "nFeature_RNA" "S_score"
#> [5] "G2M_score" "nCount_spliced"
#> [7] "nFeature_spliced" "nCount_unspliced"
#> [9] "nFeature_unspliced" "CellType"
#> [11] "SubCellType" "Phase"
#> [13] "Standardpca_SNN_res.0.6" "ident"
#> [15] "Standardpcaclusters" "Standardclusters"
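The naming pattern visible in the output above can be written out in a few lines of base R. `reduction_names` is a hypothetical helper introduced here only for illustration; it is not part of SCP, and it simply reassembles the names from the prefix, the lowercase linear method, and the uppercase nonlinear method.

```r
# Hypothetical helper (not an SCP function): rebuild the reduction names
# from their parts, following the pattern seen in Reductions(pancreas_sub).
reduction_names <- function(prefix, linear, nonlinear, dims = c(2, 3)) {
  suffix <- paste0(toupper(nonlinear), dims, "D")  # e.g. "UMAP2D", "UMAP3D"
  c(
    paste0(prefix, linear),          # linear reduction
    paste0(prefix, linear, suffix),  # intermediate: linear + nonlinear
    paste0(prefix, suffix)           # final: nonlinear only
  )
}

reduction_names("Standard", "pca", "umap")
# "Standardpca" "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D" "StandardUMAP3D"
```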
In addition, CellDimPlot by default plots the reduction returned by DefaultReduction, which is updated each time Standard_SCP is run.
names(pancreas_sub@reductions)
#> [1] "PCA" "UMAP" "Standardpca"
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"
#> [7] "StandardUMAP3D"
DefaultReduction(pancreas_sub)
#> [1] "StandardUMAP2D"
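You can also switch the assay and change the prefix as needed, so that earlier results are not overwritten. Note that specifying an assay changes the default assay of the Seurat object. Because the rest of this chapter continues to use RNA rather than unspliced, the default assay has to be changed back: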
pancreas_sub <- Standard_SCP(srt = pancreas_sub, assay = "unspliced", prefix = "unspliced")
#> [2023-10-27 06:36:13] Start Standard_SCP
#> [2023-10-27 06:36:13] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:14] Finished checking.
#> [2023-10-27 06:36:14] Perform ScaleData on the data...
#> [2023-10-27 06:36:14] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:15] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:15] Reorder clusters...
#> [2023-10-27 06:36:15] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:27] Standard_SCP done
#> Elapsed time: 14.59 secs
DefaultAssay(pancreas_sub)
#> [1] "unspliced"
DefaultAssay(pancreas_sub) <- "RNA"
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "unsplicedclusters"))
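One way to avoid hard-coding the assay name when switching back is to record the default assay before the call and restore it afterwards. This is a minimal sketch that uses only the functions already shown above; the variable name `original_assay` is illustrative.

```r
library(SCP)
library(Seurat)
data("pancreas_sub")

original_assay <- DefaultAssay(pancreas_sub)  # remember the current default

pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, assay = "unspliced", prefix = "unspliced"
)

DefaultAssay(pancreas_sub) <- original_assay  # restore it after the run
```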
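It is common to adjust manually which linear dimensions are used, for example computing 50 PCs and using the first 30 PCs for nonlinear dimension reduction and clustering: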
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "PC30",
  linear_reduction = "pca",
  linear_reduction_dims = 50,
  linear_reduction_dims_use = 1:30
)
#> [2023-10-27 06:36:29] Start Standard_SCP
#> [2023-10-27 06:36:29] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:30] Finished checking.
#> [2023-10-27 06:36:30] Perform ScaleData on the data...
#> [2023-10-27 06:36:31] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:31] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:32] Reorder clusters...
#> [2023-10-27 06:36:32] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:40] Standard_SCP done
#> Elapsed time: 10.53 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "PC30clusters"))
If the Seurat object already contains a linear dimension reduction result, it can be specified directly to skip that part of the computation:
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "SKIP",
  linear_reduction = "Standardpca"
)
#> [2023-10-27 06:36:41] Start Standard_SCP
#> [2023-10-27 06:36:41] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:42] Finished checking.
#> [2023-10-27 06:36:42] Perform ScaleData on the data...
#> [2023-10-27 06:36:42] Perform linear dimension reduction (Standardpca) on the data...
#> [2023-10-27 06:36:43] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:43] Reorder clusters...
#> [2023-10-27 06:36:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:54] Standard_SCP done
#> Elapsed time: 13.11 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "SKIPclusters"))
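The choice of linear and nonlinear dimension reduction methods directly affects both the embedding and the cell clustering. Standard_SCP can run several method combinations in a single call. To avoid computing too many combinations, the examples below use the following two sets:
1. Different linear dimension reduction methods + umap: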
linear_reductions <- c("pca", "ica", "nmf", "mds", "glmpca")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = linear_reductions,
  nonlinear_reduction = "umap"
)
#> [2023-10-27 06:36:55] Start Standard_SCP
#> [2023-10-27 06:36:55] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:56] Finished checking.
#> [2023-10-27 06:36:56] Perform ScaleData on the data...
#> [2023-10-27 06:36:56] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:58] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:58] Reorder clusters...
#> [2023-10-27 06:36:58] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:06] Perform linear dimension reduction (ica) on the data...
#> [2023-10-27 06:37:09] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:09] Reorder clusters...
#> [2023-10-27 06:37:09] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:16] Perform linear dimension reduction (nmf) on the data...
#> [2023-10-27 06:37:30] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:30] Reorder clusters...
#> [2023-10-27 06:37:31] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:39] Perform linear dimension reduction (mds) on the data...
#> [2023-10-27 06:37:42] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:42] Reorder clusters...
#> [2023-10-27 06:37:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:57] Perform linear dimension reduction (glmpca) on the data...
#> [2023-10-27 06:40:20] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:20] Reorder clusters...
#> [2023-10-27 06:40:21] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:32] Standard_SCP done
#> Elapsed time: 3.61 mins
plist1 <- lapply(linear_reductions, function(lr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standard", lr, "UMAP2D"),
    xlab = "", ylab = "", title = lr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist1)
2. pca + different nonlinear dimension reduction methods:
nonlinear_reductions <- c("umap", "tsne", "dm", "phate", "pacmap", "trimap", "largevis", "fr")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = "pca",
  nonlinear_reduction = nonlinear_reductions
)
#> [2023-10-27 06:40:33] Start Standard_SCP
#> [2023-10-27 06:40:33] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:40:35] Finished checking.
#> [2023-10-27 06:40:35] Perform ScaleData on the data...
#> [2023-10-27 06:40:35] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:40:37] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:37] Reorder clusters...
#> [2023-10-27 06:40:38] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:51] Perform nonlinear dimension reduction (tsne) on the data...
#> [2023-10-27 06:41:36] Perform nonlinear dimension reduction (dm) on the data...
#> [2023-10-27 06:41:38] Perform nonlinear dimension reduction (phate) on the data...
#> [2023-10-27 06:42:06] Perform nonlinear dimension reduction (pacmap) on the data...
#> [2023-10-27 06:42:23] Perform nonlinear dimension reduction (trimap) on the data...
#> [2023-10-27 06:42:49] Perform nonlinear dimension reduction (largevis) on the data...
#> [2023-10-27 06:47:58] Perform nonlinear dimension reduction (fr) on the data...
#> [2023-10-27 06:48:02] Standard_SCP done
#> Elapsed time: 7.49 mins
plist2 <- lapply(nonlinear_reductions, function(nr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standardpca", toupper(nr), "2D"),
    xlab = "", ylab = "", title = nr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist2)
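To see why the two examples above deliberately avoid the full grid, the base R sketch below enumerates every linear and nonlinear pairing. The `reduction` column follows the naming pattern used in the plotting code above; treating it as the name a full-grid run would produce is an assumption, since only the two partial runs were executed in this chapter.

```r
linear_reductions <- c("pca", "ica", "nmf", "mds", "glmpca")
nonlinear_reductions <- c("umap", "tsne", "dm", "phate", "pacmap", "trimap", "largevis", "fr")

# Every linear x nonlinear pairing (the first column varies fastest).
combos <- expand.grid(
  linear = linear_reductions,
  nonlinear = nonlinear_reductions,
  stringsAsFactors = FALSE
)

# Assumed 2D reduction name for each pairing, following the pattern above.
combos$reduction <- paste0("Standard", combos$linear, toupper(combos$nonlinear), "2D")

nrow(combos)  # 40 pairings in the full grid
head(combos$reduction, 3)
```

The two runs shown above cover 12 of these 40 pairings (5 with umap plus 8 with pca, with pca + umap counted once), which keeps the total run time to a few minutes.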
References
[1] Standard_SCP function documentation: https://zhanghao-njmu.github.io/SCP/reference/Standard_SCP.html