SCP, an End-to-End Single-Cell Pipeline: The Standard Workflow

2023-10-30 15:32:17

Prune the superfluous, like a tree in late autumn; lead with the novel, like a flower in early spring.

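This chapter covers SCP's standard processing workflow for single-cell data. It is suitable for single-sample data, multi-sample data without batch effects, and other exploratory analyses.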

  • Main function: Standard_SCP
  • SCP version: 0.5.3; Seurat version: v4.4.0

The Standard_SCP function

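Standard_SCP is the standard processing workflow for single-cell data. It is modeled mainly on the standard Seurat workflow and includes normalization, highly variable feature (HVF) detection, linear and nonlinear dimension reduction, and cell clustering.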

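The workflow has the following features:

  1. Simplified parameters: the direct arguments are the main parameters of each step, and the remaining parameters can be passed in as a list. See the Standard_SCP function documentation [1] for details.
  2. Automation: for example, it automatically checks the data type, decides whether each step needs to run, estimates the intrinsic dimension of the linear reduction space, and orders the cluster numbers.
  3. Combined analysis with multiple linear (pca, ica, nmf, mds, glmpca) and nonlinear (umap, tsne, dm, phate, pacmap, trimap, largevis, fr) dimension reduction methods.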
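
As a rough sketch of the first point, the call below sets one direct argument and passes step-specific options through a list. The argument names nHVF and nonlinear_reduction_params, and the inner UMAP options n.neighbors and min.dist, are recalled from the package documentation rather than taken from this chapter, so treat them as assumptions and confirm them in the Standard_SCP documentation [1]. The values and the "Tuned" prefix are arbitrary.

```r
# Arguments for a tuned run. All values are illustrative only.
# nHVF, nonlinear_reduction_params, n.neighbors and min.dist are assumed
# names; verify them against the Standard_SCP documentation before use.
scp_args <- list(
  prefix = "Tuned",                  # keeps results apart from "Standard"
  nHVF = 3000,                       # direct argument: number of HVFs
  nonlinear_reduction = "umap",
  nonlinear_reduction_params = list( # extra options passed in as a list
    n.neighbors = 50,
    min.dist = 0.5
  )
)

# Run only when SCP is installed, so the sketch is harmless elsewhere.
if (requireNamespace("SCP", quietly = TRUE)) {
  data("pancreas_sub", package = "SCP")
  pancreas_tuned <- do.call(
    SCP::Standard_SCP,
    c(list(srt = pancreas_sub), scp_args)
  )
}
```

Building the arguments as a list first makes it easy to store or compare several parameter sets before running them.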

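Standard workflow example

The example below uses down-sampled single-cell data from mouse embryonic day 15.5 (E15.5) pancreatic epithelium. Run ?pancreas_sub in R to view information about this dataset.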

library(SCP)
library(Seurat)
data("pancreas_sub")
pancreas_sub
#> An object of class Seurat 
#> 47874 features across 1000 samples within 3 assays 
#> Active assay: RNA (15958 features, 3467 variable features)
#>  2 other assays present: spliced, unspliced
#>  2 dimensional reductions calculated: PCA, UMAP

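With default parameters, Standard_SCP uses 2000 HVFs, selects PCA for linear dimension reduction, estimates the intrinsic dimension with intrinsicDimension::maxLikGlobalDimEst, and then performs UMAP nonlinear dimension reduction and cell clustering: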

pancreas_sub <- Standard_SCP(srt = pancreas_sub)
#> [2023-10-27 06:36:02] Start Standard_SCP
#> [2023-10-27 06:36:02] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:03] Finished checking.
#> [2023-10-27 06:36:03] Perform ScaleData on the data...
#> [2023-10-27 06:36:03] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:04] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:04] Reorder clusters...
#> [2023-10-27 06:36:05] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:12] Standard_SCP done
#> Elapsed time: 9.52 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "Standardclusters"))
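
The default run above relies on intrinsicDimension::maxLikGlobalDimEst to decide how many linear dimensions to use. To get a feel for that estimator outside SCP, the sketch below applies it to simulated data whose true dimension is known to be 3. The toy data and the choice k = 20 are assumptions made for illustration; how SCP calls the estimator internally is not shown in this chapter.

```r
set.seed(11)

# Simulate 500 points that lie on a 3-dimensional linear subspace of a
# 10-dimensional space, plus a little noise (toy data, not single-cell data).
n <- 500
latent <- matrix(rnorm(n * 3), ncol = 3)
loadings <- matrix(rnorm(3 * 10), nrow = 3)
X <- latent %*% loadings + matrix(rnorm(n * 10, sd = 0.01), ncol = 10)

if (requireNamespace("intrinsicDimension", quietly = TRUE)) {
  # k is the number of nearest neighbours used by the estimator;
  # 20 is an arbitrary choice for this illustration.
  est <- intrinsicDimension::maxLikGlobalDimEst(X, k = 20)
  print(est$dim.est) # compare against the known dimension of 3
}
```

In a real analysis the input would be the cell-by-PC embedding matrix rather than simulated data.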

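The returned Seurat object contains the processed data matrices. With default parameters the analysis uses the RNA assay, so the modified data is mainly in pancreas_sub[["RNA"]]. The graphs and reductions produced during the analysis are added as well; by default, nonlinear dimension reduction returns embedding coordinates for the cells in both 2D and 3D. Cell clusters are added to meta.data. All new graphs, reductions, and clusters are named with the prefix Standard by default. Intermediate reductions carry the names of both the linear (lowercase) and nonlinear (uppercase) methods, while the final reduction keeps only the nonlinear method name: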

Graphs(pancreas_sub)
#> [1] "Standardpca_KNN" "Standardpca_SNN"
Reductions(pancreas_sub)
#> [1] "PCA"               "UMAP"              "Standardpca"      
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"   
#> [7] "StandardUMAP3D"
colnames(pancreas_sub@meta.data)
#>  [1] "orig.ident"              "nCount_RNA"             
#>  [3] "nFeature_RNA"            "S_score"                
#>  [5] "G2M_score"               "nCount_spliced"         
#>  [7] "nFeature_spliced"        "nCount_unspliced"       
#>  [9] "nFeature_unspliced"      "CellType"               
#> [11] "SubCellType"             "Phase"                  
#> [13] "Standardpca_SNN_res.0.6" "ident"                  
#> [15] "Standardpcaclusters"     "Standardclusters"
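
The naming pattern visible in this output can be reproduced with plain string concatenation, which is handy when building reduction names programmatically for plotting. The helper reduction_name below is hypothetical and not part of SCP; it only mimics the names printed above.

```r
# Hypothetical helper (not an SCP function) that mimics the observed names:
# prefix + optional linear method (lowercase) + nonlinear method (uppercase)
# + number of dimensions + "D".
reduction_name <- function(prefix, nonlinear, dims, linear = NULL) {
  paste0(
    prefix,
    if (is.null(linear)) "" else linear,
    toupper(nonlinear),
    dims, "D"
  )
}

reduction_name("Standard", "umap", 2, linear = "pca") # "StandardpcaUMAP2D"
reduction_name("Standard", "umap", 3)                 # "StandardUMAP3D"

# Cluster columns and graphs follow the same prefix convention.
paste0("Standard", "clusters")       # "Standardclusters"
paste0("Standard", "pca", "_SNN")    # "Standardpca_SNN"
```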

In addition, CellDimPlot by default plots the reduction returned by DefaultReduction, which is updated each time Standard_SCP runs.

names(pancreas_sub@reductions)
#> [1] "PCA"               "UMAP"              "Standardpca"      
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"   
#> [7] "StandardUMAP3D"
DefaultReduction(pancreas_sub)
#> [1] "StandardUMAP2D"

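You can also switch the assay and change the prefix as needed so that earlier results are not overwritten. Note that specifying assay changes the default assay of the Seurat object. Because the rest of this chapter continues with RNA rather than unspliced, the default assay has to be switched back: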

pancreas_sub <- Standard_SCP(srt = pancreas_sub, assay = "unspliced", prefix = "unspliced")
#> [2023-10-27 06:36:13] Start Standard_SCP
#> [2023-10-27 06:36:13] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:14] Finished checking.
#> [2023-10-27 06:36:14] Perform ScaleData on the data...
#> [2023-10-27 06:36:14] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:15] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:15] Reorder clusters...
#> [2023-10-27 06:36:15] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:27] Standard_SCP done
#> Elapsed time: 14.59 secs
DefaultAssay(pancreas_sub)
#> [1] "unspliced"
DefaultAssay(pancreas_sub) <- "RNA"
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "unsplicedclusters"))

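The number of linear reduction dimensions to use is often adjusted by hand. For example, compute 50 PCs and use the first 30 for nonlinear dimension reduction and clustering: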

pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "PC30",
  linear_reduction = "pca",
  linear_reduction_dims = 50,
  linear_reduction_dims_use = 1:30
)
#> [2023-10-27 06:36:29] Start Standard_SCP
#> [2023-10-27 06:36:29] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:30] Finished checking.
#> [2023-10-27 06:36:30] Perform ScaleData on the data...
#> [2023-10-27 06:36:31] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:31] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:32] Reorder clusters...
#> [2023-10-27 06:36:32] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:40] Standard_SCP done
#> Elapsed time: 10.53 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "PC30clusters"))

If the Seurat object already contains a linear reduction result, you can specify it by name to skip that computation:

pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "SKIP",
  linear_reduction = "Standardpca"
)
#> [2023-10-27 06:36:41] Start Standard_SCP
#> [2023-10-27 06:36:41] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:42] Finished checking.
#> [2023-10-27 06:36:42] Perform ScaleData on the data...
#> [2023-10-27 06:36:42] Perform linear dimension reduction (Standardpca) on the data...
#> [2023-10-27 06:36:43] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:43] Reorder clusters...
#> [2023-10-27 06:36:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:54] Standard_SCP done
#> Elapsed time: 13.11 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "SKIPclusters"))

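The choice of linear and nonlinear dimension reduction methods directly affects the embedding and the cell clustering. Standard_SCP can run several method combinations in a single call. To avoid computing too many combinations, the examples below use these two sets:

1. Different linear reduction methods + umap: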

linear_reductions <- c("pca", "ica", "nmf", "mds", "glmpca")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = linear_reductions,
  nonlinear_reduction = "umap"
)
#> [2023-10-27 06:36:55] Start Standard_SCP
#> [2023-10-27 06:36:55] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:56] Finished checking.
#> [2023-10-27 06:36:56] Perform ScaleData on the data...
#> [2023-10-27 06:36:56] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:58] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:58] Reorder clusters...
#> [2023-10-27 06:36:58] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:06] Perform linear dimension reduction (ica) on the data...
#> [2023-10-27 06:37:09] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:09] Reorder clusters...
#> [2023-10-27 06:37:09] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:16] Perform linear dimension reduction (nmf) on the data...
#> [2023-10-27 06:37:30] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:30] Reorder clusters...
#> [2023-10-27 06:37:31] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:39] Perform linear dimension reduction (mds) on the data...
#> [2023-10-27 06:37:42] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:42] Reorder clusters...
#> [2023-10-27 06:37:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:57] Perform linear dimension reduction (glmpca) on the data...
#> [2023-10-27 06:40:20] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:20] Reorder clusters...
#> [2023-10-27 06:40:21] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:32] Standard_SCP done
#> Elapsed time: 3.61 mins
plist1 <- lapply(linear_reductions, function(lr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standard", lr, "UMAP2D"),
    xlab = "", ylab = "", title = lr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist1)

2. pca + different nonlinear reduction methods:

nonlinear_reductions <- c("umap", "tsne", "dm", "phate", "pacmap", "trimap", "largevis", "fr")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = "pca",
  nonlinear_reduction = nonlinear_reductions
)
#> [2023-10-27 06:40:33] Start Standard_SCP
#> [2023-10-27 06:40:33] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:40:35] Finished checking.
#> [2023-10-27 06:40:35] Perform ScaleData on the data...
#> [2023-10-27 06:40:35] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:40:37] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:37] Reorder clusters...
#> [2023-10-27 06:40:38] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:51] Perform nonlinear dimension reduction (tsne) on the data...
#> [2023-10-27 06:41:36] Perform nonlinear dimension reduction (dm) on the data...
#> [2023-10-27 06:41:38] Perform nonlinear dimension reduction (phate) on the data...
#> [2023-10-27 06:42:06] Perform nonlinear dimension reduction (pacmap) on the data...
#> [2023-10-27 06:42:23] Perform nonlinear dimension reduction (trimap) on the data...
#> [2023-10-27 06:42:49] Perform nonlinear dimension reduction (largevis) on the data...
#> [2023-10-27 06:47:58] Perform nonlinear dimension reduction (fr) on the data...
#> [2023-10-27 06:48:02] Standard_SCP done
#> Elapsed time: 7.49 mins
plist2 <- lapply(nonlinear_reductions, function(nr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standardpca", toupper(nr), "2D"),
    xlab = "", ylab = "", title = nr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist2)
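
To see why the two examples above stop short of the full set of combinations, the base R sketch below enumerates every pairing of the methods listed in this chapter. It assumes that passing both vectors to a single call would cross them, and it builds names with the same pattern used in the plotting code above; nothing here calls SCP.

```r
linear_reductions <- c("pca", "ica", "nmf", "mds", "glmpca")
nonlinear_reductions <- c(
  "umap", "tsne", "dm", "phate", "pacmap", "trimap", "largevis", "fr"
)

# Every linear x nonlinear pairing.
combos <- expand.grid(
  linear = linear_reductions,
  nonlinear = nonlinear_reductions,
  stringsAsFactors = FALSE
)
nrow(combos) # 40 pairings in the full grid

# Intermediate 2D reduction names under the default "Standard" prefix.
combos$reduction <- paste0(
  "Standard", combos$linear, toupper(combos$nonlinear), "2D"
)
head(combos$reduction, 3)
```

The two runs above cover 12 of these 40 pairings (5 with umap plus 8 with pca, with pca + umap counted once). The logged timings show that glmpca and largevis are the slow steps, so restricting the combinations keeps the runtime manageable.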

References

[1] Standard_SCP function documentation: https://zhanghao-njmu.github.io/SCP/reference/Standard_SCP.html
