精准的文献复现—seurat对象添加细胞亚群meta信息

❝我又继续探索了一下上上周推文复现的science推文数据集，这周的推文就继续写一下。 ❞

文章标题：

CXCL9:SPP1 macrophage polarity identifies a network of cellular programs that control human cancers.

Science:

数据集：GSE234933

整个推文中需要注意的地方有三点：

作者给出的数据是多个rds格式文件压缩在一块的，解压后循环读取文件并合并成seurat对象
作者给出的细胞亚群信息可以后续添加到metadata信息中，以便之后直接用其细胞命名
检查分群情况的时候，因为已经添加了细胞亚群信息，但是由于作者前期过滤了一部分细胞，最后只有187,399cells，所以需要去除NA部分。

代码语言：javascript复制

# install.packages('R.utils')
rm(list=ls())
options(stringsAsFactors = F) 
library(Seurat)
library(ggplot2)
library(clustree)
library(cowplot)
library(dplyr)
library(data.table)
library(stringr)
#install.packages("textshape")
library(textshape)
#setwd("../../")
# 获取所有rds文件的列表
file_list <- list.files("./GSE234933_raw/rds/", pattern = ".rds")

# 创建一个空的列表来存储Seurat对象
seurat_list <- list()

# 循环读取每个rds文件的数据并创建Seurat对象
for (file in file_list) {
  # 拼接文件路径
  data.path <- paste0("./GSE234933_raw/rds/", file)
  # 读取RDS文件数据
  seurat_data <- readRDS(data.path)
  # 创建Seurat对象，并指定项目名称为文件名（去除后缀）
  sample_name <- tools::file_path_sans_ext(basename(file))
  seurat_obj <- CreateSeuratObject(counts = seurat_data,
                                   project = sample_name,
                                   min.features = 200,
                                   min.cells = 3)
  # 将Seurat对象添加到列表中
  seurat_list <- append(seurat_list, seurat_obj)
}

# 提取下划线前面的部分
sample_names <- sub("_.*", "", file_list)
# 合并Seurat对象，将所有Seurat对象合并到一个对象中
seurat_combined <- merge(seurat_list[[1]],
                         y = seurat_list[-1])
# 打印合并后的Seurat对象
print(seurat_combined)
seurat_combined@assays$RNA@counts[1:10, 1:2]
##存储数据所用时间会比较长####
saveRDS(seurat_combined, file = "seurat2.rds")

读取数据集给出的细胞亚群信息

代码语言：javascript复制

###读取数据集给出的细胞亚群信息
cell<-fread("./GSE234933_raw/GSE234933_MGH_HNSCC_cell_annotation.txt.gz")
colnames(cell)
meta2 <- cell %>% column_to_rownames("sample_barcode")
###两种方法可以添加细胞亚群信息，文章中给出的subclustering那列
# sce_obj <- CreateSeuratObject(counts = seurat_combined@assays$RNA$counts,
#                               meta.data =meta2,
#                               project = sample_name,
#                               min.features = 200,
#                               min.cells = 3)
sce3 <- AddMetaData(object = seurat_combined, metadata = meta2)
head(sce3@meta.data)

降维分群及harmony

代码语言：javascript复制

###########
###降维分群及harmony
dir.create("2-harmony2")
getwd()
setwd("2-harmony2")

# sce.all=readRDS("../1-QC/sce.all_qc.rds")
sce=sce3 
sce
sce <- NormalizeData(sce, 
                     normalization.method = "LogNormalize",
                     scale.factor = 1e4) 
sce <- FindVariableFeatures(sce)
sce <- ScaleData(sce)
sce <- RunPCA(sce, features = VariableFeatures(object = sce))
#install.packages("harmony")
library(harmony)
seuratObj <- RunHarmony(sce, "orig.ident")
names(seuratObj@reductions)
seuratObj <- RunUMAP(seuratObj,  dims = 1:15, 
                     reduction = "harmony")
DimPlot(seuratObj,reduction = "umap",label=T ) 

sce=seuratObj
sce <- FindNeighbors(sce, reduction = "harmony",
                     dims = 1:15) 
sce.all=sce

sce.all=FindClusters(sce.all, #graph.name = "CCA_snn",
                     resolution = 0.05, algorithm = 1)

sel.clust = "RNA_snn_res.0.05"
sce.all <- SetIdent(sce.all, value = sel.clust)
table(sce.all@active.ident) 
saveRDS(sce.all, "sce.all_int0.05.rds")

检查分群情况

代码语言：javascript复制

####################
#####检查分群情况
DimPlot(sce.all, reduction = "umap", group.by = "seurat_clusters",label = T)
colnames(sce.all@meta.data)
DimPlot(sce.all, reduction = "umap", group.by = "subclustering",label = T)

#####去除NA部分的亚群#######
table(sce.all$subclustering)
###给亚群中<NA>赋值0
sce.all$subclustering = sce.all$subclustering %>%
  replace(x = ., list =is.na(.), values =0)
sce.all$celltype<-ifelse(sce.all$subclustering=="Endothelial cells","Endothelial cells",
                              ifelse(sce.all$subclustering=="Fibroblasts","Fibroblasts",
                                     ifelse(sce.all$subclustering=="Lymphocytes","Lymphocytes",
                                            ifelse(sce.all$subclustering=="Myeloid cells","Myeloid cells",
                                                   ifelse(sce.all$subclustering=="Tumor cells","Tumor cells","NA")))))
table(sce.all$celltype)
sce <- sce.all[, !(sce.all$celltype %in% c('NA'))]
DimPlot(sce, reduction = "umap", group.by = "celltype",label=T) 
ggsave('umap_by_RNA_snn_res.0.05_paper.pdf',width =7,height = 6)

「左图是文章中的图，右图是我加入meta信息后重新降维分群得到的，复现可行性很高。」

「这样得出的结果去如果继续复现后续分析就会更方便一点，虽然上次的推文也进行了细胞亚群细分，但是不同的人对亚群细分的认知还是存在差异的。」

存储 list meta 对象数据

0 人点赞