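Learning Seurat 1, integrating data from multiple models: https://cloud.tencent.com/developer/article/2130078
Learning Seurat 2, integrative analysis of scRNA-seq data: https://cloud.tencent.com/developer/article/2131431
Learning Seurat 3, integrative analysis of scRNA-seq data with annotated datasets: https://cloud.tencent.com/developer/article/2133583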
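In this section, we present a slightly modified workflow for integrating scRNA-seq datasets. Instead of using canonical correlation analysis (CCA) to identify anchors, we use reciprocal PCA (RPCA). When determining anchors between any two datasets with RPCA, we project each dataset into the other's PCA space and constrain the anchors by the same mutual neighborhood requirement. The commands for the two workflows are largely identical, but the two methods suit different contexts.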
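The idea of projecting each dataset into the other's PCA space and keeping only mutual neighbors can be sketched in base R. This is a toy illustration on random matrices, not Seurat's actual implementation; the helper names knn_index and find_anchors, the data, and the choice of 5 PCs and k = 5 are all assumptions made for the example.

```r
# Toy sketch of reciprocal-PCA anchor finding in base R (NOT Seurat's implementation).
set.seed(42)

# Two hypothetical datasets: rows are cells, columns are shared features.
n_a <- 60; n_b <- 50; n_feat <- 20
a <- matrix(rnorm(n_a * n_feat), nrow = n_a)
b <- matrix(rnorm(n_b * n_feat), nrow = n_b) + 0.5  # constant shift mimics a batch effect

# Center each dataset by its own feature means, then run PCA on each one separately.
a_c <- scale(a, center = TRUE, scale = FALSE)
b_c <- scale(b, center = TRUE, scale = FALSE)
pca_a <- prcomp(a_c, center = FALSE)
pca_b <- prcomp(b_c, center = FALSE)

# For every row of `query`, return the row indices of its k nearest rows in `ref`.
knn_index <- function(query, ref, k) {
  nq <- nrow(query)
  d <- as.matrix(dist(rbind(query, ref)))[seq_len(nq), nq + seq_len(nrow(ref)), drop = FALSE]
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  matrix(nn, ncol = k, byrow = TRUE)  # one row per query cell, also when k = 1
}

# Hypothetical helper: a pair (cell_a, cell_b) is an anchor if each cell is among
# the other's k nearest neighbors, searched in the *other* dataset's PCA space.
find_anchors <- function(a_c, b_c, pca_a, pca_b, n_pcs, k) {
  rot_a <- pca_a$rotation[, seq_len(n_pcs), drop = FALSE]
  rot_b <- pca_b$rotation[, seq_len(n_pcs), drop = FALSE]
  # B cells projected into A's PCA space, searched against A cells
  b_to_a <- knn_index(b_c %*% rot_a, a_c %*% rot_a, k)
  # A cells projected into B's PCA space, searched against B cells
  a_to_b <- knn_index(a_c %*% rot_b, b_c %*% rot_b, k)
  pairs <- expand.grid(cell_a = seq_len(nrow(a_c)), cell_b = seq_len(nrow(b_c)))
  keep <- mapply(function(i, j) j %in% a_to_b[i, ] && i %in% b_to_a[j, ],
                 pairs$cell_a, pairs$cell_b)
  pairs[keep, , drop = FALSE]
}

anchors <- find_anchors(a_c, b_c, pca_a, pca_b, n_pcs = 5, k = 5)
head(anchors)
```

Each row of anchors pairs one cell from the first toy dataset with one from the second. Because both directions of the neighbor search must agree, a cell with no close counterpart in the other dataset tends to receive no anchor.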
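By identifying shared sources of variation between datasets, CCA is well suited to identifying anchors when cell types are conserved but gene expression differs substantially across experiments. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration may also lead to overcorrection, especially when a large proportion of cells do not overlap across datasets.
RPCA-based integration runs significantly faster and is also a more conservative approach: cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA for integrative analyses, particularly when many cells in one dataset have no matching type in the other, or when there are many datasets or cells to integrate.
Below, we demonstrate reciprocal PCA by aligning the same stimulated and resting datasets first analyzed in our introduction to scRNA-seq integration. While the list of commands is nearly identical, this workflow requires users to run principal component analysis (PCA) on each dataset individually before integration. Users should also set the reduction argument to "rpca" when running FindIntegrationAnchors().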
library(Seurat)
library(SeuratData)
# install dataset
InstallData("ifnb")
# load dataset
ifnb <- LoadData("ifnb")
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration, then run PCA on
# each dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})
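Performing integration
We then identify anchors using the FindIntegrationAnchors() function, which takes a list of Seurat objects as input, and use these anchors to integrate the two datasets with IntegrateData().
# unlike the first article in this series, reduction is set to "rpca" here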
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca")
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
Now we can run a single integrated analysis on all cells!
# specify that we will perform downstream analysis on the corrected data; note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2
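Modifying the strength of integration
The results show that RPCA-based integration is more conservative: in this case, it does not perfectly align one subset of cells (naive and memory T cells) across experiments. You can increase the strength of alignment by increasing the k.anchor parameter, which defaults to 5. Increasing this parameter to 20 will help align these populations.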
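To see why a larger neighbor count strengthens alignment, the following base R sketch counts mutual nearest-neighbor pairs between two toy point clouds for several values of k. It is only an illustration of the mutual-neighbor idea: the embeddings are random, the helper count_mutual_pairs is hypothetical, and Seurat's anchor filtering and scoring steps are not reproduced.

```r
set.seed(1)

# Hypothetical 2-D embeddings of two datasets in a shared space (toy data).
emb_a <- matrix(rnorm(80 * 2), ncol = 2)
emb_b <- matrix(rnorm(70 * 2, mean = 1), ncol = 2)

# Cross-dataset Euclidean distances: rows are A cells, columns are B cells.
n_a <- nrow(emb_a)
cross <- as.matrix(dist(rbind(emb_a, emb_b)))[seq_len(n_a), n_a + seq_len(nrow(emb_b))]

# Count pairs (i, j) where j is among i's k nearest B cells AND
# i is among j's k nearest A cells.
count_mutual_pairs <- function(cross, k) {
  rank_from_a <- t(apply(cross, 1, rank, ties.method = "first"))  # rank of each B cell per A cell
  rank_from_b <- apply(cross, 2, rank, ties.method = "first")     # rank of each A cell per B cell
  sum(rank_from_a <= k & rank_from_b <= k)
}

ks <- c(5, 10, 20)
n_pairs <- sapply(ks, function(k) count_mutual_pairs(cross, k))
names(n_pairs) <- paste0("k=", ks)
print(n_pairs)
```

Any pair that is mutual at a small k remains mutual at a larger k, so the count can only grow. More candidate pairs means more cells are pulled toward the other dataset, which is the sense in which raising k.anchor makes the integration less conservative.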
# k.anchor = 20: use 20 neighbors (k) when picking anchors
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca",
    k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 13999
## Number of edges: 589767
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9094
## Number of communities: 15
## Elapsed time: 4 seconds
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2
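Now that the datasets have been integrated, you can follow the steps in the earlier articles introducing scRNA-seq integration to identify cell types and cell-type-specific responses.
Performing integration on datasets normalized with SCTransform
As an additional example, we repeat the analysis above, but normalize the datasets with SCTransform. We may optionally set the method parameter to glmGamPoi (which requires the glmGamPoi package to be installed) to enable faster estimation of regression parameters in SCTransform().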
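Because glmGamPoi is optional, one way to keep a script portable is to check for the package before choosing the method. The sketch below assumes that "poisson" is the fallback method accepted by SCTransform() (I believe it is the default of the underlying sctransform::vst(), but treat that as an assumption); the variable name sct_method is hypothetical.

```r
# Choose the SCTransform() estimation method based on what is installed.
sct_method <- if (requireNamespace("glmGamPoi", quietly = TRUE)) {
  "glmGamPoi"
} else {
  # Assumption: "poisson" is the standard (slower) method used when glmGamPoi is absent.
  message("glmGamPoi not found; falling back to \"poisson\". ",
          "It can be installed from Bioconductor with BiocManager::install(\"glmGamPoi\").")
  "poisson"
}
print(sct_method)

# Intended use (requires Seurat and the ifnb.list object built in this article):
# ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = sct_method)
```

The check itself runs without Seurat installed, so it can sit at the top of a script and fail early with a clear message rather than partway through normalization.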
ifnb <- LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
    anchor.features = features, dims = 1:30, reduction = "rpca", k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT", dims = 1:30)
immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)
# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2
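Summary
Between the log-normalized and SCT-normalized workflows, I currently lean toward SCT: the extra 1,000 genes (3,000 integration features instead of 2,000) carry a lot of additional information. To be safe, though, you can run both workflows and compare the results.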
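One simple way to compare the two workflows is to cross-tabulate the cluster labels each one assigns to the same cells. The sketch below uses made-up labels for six hypothetical cells; with real data the two vectors might come from the cluster assignments of each integrated object, which would require running FindNeighbors() and FindClusters() on the SCT object as well.

```r
# Hypothetical cluster labels for the same six cells from the two workflows.
clusters_log <- c(c1 = "0", c2 = "0", c3 = "1", c4 = "1", c5 = "2", c6 = "2")
clusters_sct <- c(c1 = "0", c2 = "0", c3 = "1", c4 = "2", c5 = "2", c6 = "2")

# Compare only cells present in both results, matched by cell name.
shared <- intersect(names(clusters_log), names(clusters_sct))
tab <- table(log = clusters_log[shared], sct = clusters_sct[shared])
print(tab)

# For each log-normalized cluster, the fraction of its cells that land in its
# best-matching SCT cluster (1 means the cluster is fully preserved).
best_match <- apply(tab, 1, max) / rowSums(tab)
print(best_match)
```

Clusters with a low best-match fraction are the ones the two normalizations disagree on, and are the natural place to start inspecting marker genes.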