Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 6

2022-08-28 18:03:58

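Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 1: https://cloud.tencent.com/developer/article/2055573

Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 2: https://cloud.tencent.com/developer/article/2072069

Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 3: https://cloud.tencent.com/developer/article/2078159

Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 4: https://cloud.tencent.com/developer/article/2078348

Single-cell code walkthrough: scRNA-seq and chromatin accessibility analysis of gynecologic cancers, part 5: https://cloud.tencent.com/developer/article/2084580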

Code walkthrough

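# Part 4: SingleR cell typing
###########################################################

# SingleR labeling of celltypes
##########################################################################
## Start annotating cell types for the single cells.
## SingleR assigns labels by comparing each cell against annotated reference
## datasets; human and mouse references are distributed through the celldex package.
## Packages used in this part (reloading is harmless if earlier parts attached them)
library(Seurat)
library(SingleR)
## Convert the Seurat object into an S4 SingleCellExperiment object
rna.sce <- as.SingleCellExperiment(rna)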
# Set reference paths:
# Wang et al. GSE111976
ref.data.counts <- readRDS("./GSE111976_ct_endo_10x.rds")
meta <- read.csv("./GSE111976_summary_10x_day_donor_ctype.csv")
rownames(meta) <- meta$X

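# Sanity check: count the metadata rows whose name matches the count-matrix column
# at the same position; the result should equal ncol(ref.data.counts)
length(which(rownames(meta) == colnames(ref.data.counts)))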
ref.data.endo <- CreateSeuratObject(ref.data.counts,meta.data = meta)
Idents(ref.data.endo) <- "cell_type"
ref.data.endo <- NormalizeData(ref.data.endo)

# SingleR annotation
#####################################################
## Load the reference datasets
# Read in reference datasets for SingleR annotation

# 1) Wang et al. GSE111976 scRNA-seq endometrium reference (the Seurat object built above)
ref.data.endo <- as.SingleCellExperiment(ref.data.endo)

# 2) Human Primary Cell Atlas Data (microarray)
ref.data.HPCA <- readRDS("./HPCA_celldex.rds")
# 3) BluePrint Encode (bulk RNA-seq)
ref.data.BED <- readRDS("./BluePrintEncode_celldex.rds")



# 2) Single-cell level label transfer: 
## SingleR: a tool for assigning cell types to single cells
predictions.HPCA.sc <- SingleR(test=rna.sce, assay.type.test="logcounts", assay.type.ref="logcounts",
                               ref=ref.data.HPCA, labels=ref.data.HPCA$label.main)
predictions.BED.sc <- SingleR(test=rna.sce, assay.type.test="logcounts", assay.type.ref="logcounts",
                              ref=ref.data.BED, labels=ref.data.BED$label.main)
# Use de.method wilcox with scRNA-seq reference b/c the reference data is more sparse
predictions.endo.sc <- SingleR(test=rna.sce, assay.type.test="logcounts", assay.type.ref="logcounts",
                               ref=ref.data.endo, labels=ref.data.endo$cell_type,de.method = "wilcox")


rna$SingleR.HPCA <- predictions.HPCA.sc$pruned.labels
rna$SingleR.BED <- predictions.BED.sc$pruned.labels
rna$SingleR.endo <- predictions.endo.sc$pruned.labels


# Save Seurat object 
#date <- Sys.Date()
saveRDS(rna,paste0(SAMPLE.ID,"_scRNA_processed.rds"))

# ###########################################################################################################
# # Starting cells, PostQC cells, doublets, Post doublet/QC cells, Cluster #
output.meta <- data.frame(StartingNumCells=length(colnames(counts.init)),
                          nMADLogCounts =2,
                          nMADLogFeatures = 2,
                          nMADLog1pMito =2,
                          PostQCNumCells=PostQCNumCells,
                          ExpectedDoubletFraction=doublet.rate,
                          ObservedDoubletFraction=length(doublets)/length(colnames(counts.init)),
                          PostDoubletNumCells=length(colnames(rna)),
                          NumClusters=length(levels(Idents(rna))),
                          DoubletFinderpK = pK.1,
                          MinNumCounts=min(rna$nCount_RNA),
                          MaxNumCounts= max(rna$nCount_RNA),
                          MedianNumbCounts = median(rna$nCount_RNA),
                          MinNumFeats=min(rna$nFeature_RNA),
                          MaxNumFeats= max(rna$nFeature_RNA),
                          MedianNumbFeats = median(rna$nFeature_RNA),
                          stringsAsFactors=FALSE)
output <- as.data.frame(t(output.meta))
colnames(output) <- SAMPLE.ID
xlsx::write.xlsx(output, "scRNA_pipeline_summary.xlsx",
                 row.names = TRUE, col.names = TRUE)




#########################################################################################################
# END OF SCRIPT
## SingleR project page: https://github.com/dviraran/SingleR
#########################################################################################################
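
The script above stores `pruned.labels` rather than `labels` in the Seurat metadata. SingleR sets a pruned label to `NA` when it considers the assignment unreliable, so it is worth checking how many cells each reference loses before using those columns. Below is a minimal sketch of such a check on a toy data frame that stands in for a SingleR result. The object `toy.pred`, the helper `summarize_labels`, and all values are hypothetical and only mimic the `labels` and `pruned.labels` columns of the real output.

```r
# Toy stand-in for a SingleR result (invented values, not real predictions).
# The real result is a DataFrame that includes `labels` and `pruned.labels`.
toy.pred <- data.frame(
  labels        = c("T_cells", "T_cells", "Fibroblasts", "Epithelial_cells", "Fibroblasts"),
  pruned.labels = c("T_cells", NA, "Fibroblasts", "Epithelial_cells", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-label cell counts before and after pruning.
summarize_labels <- function(pred) {
  all.labels <- sort(unique(pred$labels))
  kept <- !is.na(pred$pruned.labels)
  data.frame(
    label    = all.labels,
    assigned = as.integer(table(factor(pred$labels, levels = all.labels))),
    kept     = as.integer(table(factor(pred$labels[kept], levels = all.labels))),
    stringsAsFactors = FALSE
  )
}

label.summary <- summarize_labels(toy.pred)

# Fraction of cells whose label was pruned to NA
pruned.fraction <- mean(is.na(toy.pred$pruned.labels))

print(label.summary)
print(pruned.fraction)
```

In the pipeline itself the same helper could be applied to `predictions.HPCA.sc`, `predictions.BED.sc`, and `predictions.endo.sc` after converting each with `as.data.frame()`; a reference that prunes a large fraction of cells is probably a poor match for the sample.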

Summary

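Reference-based annotation is a quick way to assign cell types, and it works well when large, well-annotated reference datasets are available. For plants, however, annotation remains a problem for every species other than the model plant Arabidopsis, so I think combining single-cell data with bulk transcriptome data is the better approach. I plan to cover this myself and will add it when I reproduce other analyses. The authors' single-cell code also feels somewhat redundant; the same task could be done with considerably less code.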
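
The idea of annotating cells against bulk transcriptome profiles can be illustrated with a small toy example: each cell receives the label of the bulk sample whose expression profile has the highest Spearman correlation with it. This is a deliberately simplified sketch of correlation-based annotation, not SingleR's full algorithm, which also selects marker genes and fine-tunes the scores. The gene names, tissue labels, expression values, and the helper `annotate_by_correlation` are all invented for illustration.

```r
# Toy bulk reference: rows are genes, columns are labelled bulk samples.
bulk.ref <- matrix(
  c(9, 1, 1,
    8, 2, 1,
    1, 9, 2,
    2, 8, 1,
    1, 1, 9,
    1, 2, 8),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("root", "leaf", "stem"))
)

# Toy single-cell expression: rows are the same genes, columns are cells.
cells <- matrix(
  c(7, 0, 1,
    6, 1, 0,
    0, 8, 1,
    1, 6, 0,
    0, 1, 7,
    1, 0, 9),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("cell1", "cell2", "cell3"))
)

# Hypothetical helper: label each cell with the best-correlated bulk profile.
annotate_by_correlation <- function(test, ref) {
  shared <- intersect(rownames(test), rownames(ref))
  # cor() correlates the columns of x with the columns of y,
  # giving a cells x reference-labels score matrix.
  scores <- cor(test[shared, , drop = FALSE],
                ref[shared, , drop = FALSE],
                method = "spearman")
  data.frame(
    cell  = rownames(scores),
    label = colnames(scores)[max.col(scores, ties.method = "first")],
    score = unname(apply(scores, 1, max)),
    stringsAsFactors = FALSE
  )
}

toy.annotation <- annotate_by_correlation(cells, bulk.ref)
print(toy.annotation)
```

For real data, a log-transformed bulk expression matrix with one label per column can be passed to `SingleR()` through its `ref` and `labels` arguments, in the same way the script uses the HPCA and BluePrint/ENCODE references.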
