单细胞数据复现-肺癌文章代码复现1:https://cloud.tencent.com/developer/article/1992648
单细胞数据复现-肺癌文章代码复现2:https://cloud.tencent.com/developer/article/1995619
通过第一部分的数据基本处理,目前是已经得到了三种不同的细胞分群结果。
这一部分主要是对的epi细胞分群进行的细分,也是代码很长,我准备分成两个部分进行拆解
library、color及数据的加载
首先是按照复现1同样的参数加载包及颜色等变量。
代码语言:javascript复制// ### load libraries
library(ggplot2)
library(Seurat)
library(dplyr)
library(reticulate)
library(sctransform)
library(cowplot)
library(viridis)
library(tidyr)
library(magrittr)
library(reshape2)
library(readxl)
library(stringr)
library(cowplot)
library(scales)
library(tibble)
library(gplots)
library(RColorBrewer)
theme_set(theme_cowplot())
#color scheme
use_colors <- c(
Tumor = "brown2",
Normal = "deepskyblue2",
G1 = "#46ACC8",
G2M = "#E58601",
S = "#B40F20",
Epithelial = "seagreen",
Immune = "darkgoldenrod2",
Stromal = "steelblue",
p018 = "#E2D200",
p019 = "#46ACC8",
p023 = "#E58601",
p024 = "#B40F20",
p027 = "#0B775E",
p028 = "#E1BD6D",
p029 = "#35274A",
p030 = "#F2300F",
p031 = "#7294D4",
p032 = "#5B1A18",
p033 = "#9C964A",
p034 = "#FD6467",
AT1 = "#2B8CBE",
AT2 = "#045A8D",
Club = "#006D2C",
DifferentiatingCiliated = "#31A354",
Ciliated = "#74C476",
Neuroendocrine = "#8856A7",
lepidic = "#0B775E",
acinar = "#74A089",
`mucinuous (papillary)` = "#E2D200",
`(micro)papillary` = "#CEAB07",
solid = "#B40F20",
sarcomatoid = "#5B1A18",
CNN = "chartreuse4",
CNA = "orange")
#load data
epi_anno <- readRDS("seurat_objects/epi_anno.RDS")
读取亚群细分的水平变量
下面为了对epi这个分群进行细分,加入了几种水平变量,也是为了后面的亚群在细分。
代码语言:javascript复制epi_anno@meta.data$cell_type_epi <- factor(epi_anno@meta.data$cell_type_epi, levels = c("AT2",
"AT1",
"Club",
"Ciliated",
"Neuroendocrine",
"Tumor"))
#add inferCNV clone scores (for inferCNV, use seurat object epi_anno and the following code: https://github.com/bischofp/single_cell_lung_adenocarcinoma, computation time ~12h)
##加入scna读数,对epi的分群结果进行评估,并将结果整合到meta.data中
##read.delim函数读取带分隔符的文本文件
scna_scores <- read.delim("infercnv_clone_scores_nsclc.tsv") %>% filter(tissue_type == "Tumor") %>% filter(!is.na(cna_clone))
rownames(scna_scores) <- scna_scores$cell_id
scna_scores <- scna_scores %>% select(cna_clone) %>% mutate(cna_clone = as.character(cna_clone))
epi_anno <- AddMetaData(epi_anno, scna_scores)
scna_data <- FetchData(epi_anno, c("tissue_type", "cna_clone"))
scna_data <- scna_data %>% mutate(cna_clone = ifelse(is.na(cna_clone), "CNN", cna_clone))
epi_anno <- AddMetaData(epi_anno, scna_data)
features的可视化
由于seurat的更新,有些函数发生了一些变化,所以在test代码的时候,我也是进行了修改,是比较适用于目前的seurat4的版本的。
这部分作者选取了自己感兴趣的基因,来通过不同的group.by的方式进行分群的可视化。
代码语言:javascript复制// #some plots
##更改地方subsetdata改为subseet,subset.name及后面的accept.values,主要原因是软件升级后函数更改
DotPlot(subset(epi_anno, subset = cluster_type == "Normal"), features = c("ABCA3", "SFTPC", "AGER", "PDPN", "KRT5", "TRP63", "NGFR", "SCGB1A1", "MUC5B", "FOXJ1", "TMEM190", "CHGA", "CALCA"), group.by = "cell_type_epi")
coord_flip()
scale_color_viridis()
##ggsave2("DotPlot_markergenes_epi_cell_type.emf", path = "../results", width = 11, height = 8, units = "cm")
ggsave2("Fig2B.png", path = "../results", width = 11, height = 8, units = "cm")
这里应该是需要调节字体大小的,但是我懒得调了,因为我只是学思路,也不是搬运换图的代码,后期自己需要的话,我在改。
这里是根据的celltype进行区分,来看分群的情况。
代码语言:javascript复制DimPlot(epi_anno, group.by = "cell_type_epi", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_celltype.png", path = "../results", width = 15, height = 15, units = "cm")
我怀疑我的图片有背景,有可能是前面的参数丢了一个,
跟上面的代码也是一样的,主要是更改了group.by的参数,改成病人来源的id。
代码语言:javascript复制DimPlot(epi_anno, group.by = "patient_id", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_patients.png", path = "../results", width = 15, height = 15, units = "cm")
下面的是组织来源的样本亚群可视化结果。
代码语言:javascript复制DimPlot(epi_anno, group.by = "tissue_type", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_tissuetype.png", path = "../results", width = 15, height = 15, units = "cm")
对每个类型的count值进行可视化
代码语言:javascript复制##首先是将这三个不同来源的分组合并到counts矩阵中
epi_cell_counts <- FetchData(epi_anno, vars = c("tissue_type", "cell_type_epi", "cna_clone", "cluster_type")) %>%
mutate(tissue_type = factor(tissue_type, levels = c("Tumor", "Normal")))
##根据不同的fill填充变量来对count值进行可视化
ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cell_type_epi))
geom_bar(position = "fill")
scale_fill_manual(values = use_colors)
scale_y_reverse()
coord_flip()
ggsave2("Fig2A_barplot.pdf", path = "../results", width = 20, height = 5, units = "cm")
ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cna_clone))
geom_bar(position = "fill")
scale_fill_manual(values = use_colors)
scale_y_reverse()
coord_flip()
ggsave2("SuppFig4_barplot_cna_clone.pdf", path = "../results", width = 20, height = 5, units = "cm")
ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cluster_type))
geom_bar(position = "fill")
scale_fill_manual(values = c("cyan3", "darkorange2"))
coord_flip()
ggsave2("SuppFig4_barplot_cluster_type.pdf", path = "../results", width = 20, height = 5, units = "cm")
##也是跟上面类似,是选用不同的分组方式进行亚群的展示,我觉得这个放到上面那一类有可能相对好一点
DimPlot(epi_anno, group.by = "cna_clone", cols = use_colors, pt.size = 0.5)
ggsave2("SuppFig4_umap_cna_clone.png", path = "../results", width = 15, height = 15, units = "cm")
DimPlot(epi_anno, group.by = "cluster_type", cols = c("cyan3", "darkorange2"), pt.size = 0.5)
ggsave2("SuppFig4_umap_cluster_type.png", path = "../results", width = 15, height = 15, units = "cm")
这张图的height值应该改的大一些,比如height = 10左右,应该就能把全部的值放到一个画布上了。
这张图也是同样的问题,所有的元素没有到一张画布上,还是需要更改height值。
或者跟这张图一样,把多余的图例去除,应该就能把全部的元素放到一张图上了。
总结
作者通过epi的亚群进行分析,选择其中的几种分群方式,然后定了level的程度,对不同分组来源的进行可视化,这个基本上做亚群细分的那部分的必要要做的,也是我后面的课题当中需要做的,可以发现由于颜色设置的选取较多,导致绘图的时候是有一些元素不在画布上的,因此需要自己后续手动调整,但是前面保存了文件,就可以后面不断的调试。如果大家也是跟到这一步的话,比较建议把今天的rdata保存下来,明天直接加载。
发现在这部分我们可以用到的就是亚群细分的levels的读取的函数,还有对这个亚群的矩阵打分的函数,以及后面的根据不同的分组来源进行umap图可视化的过程,其实可以加入正常和肿瘤样本的分组,对结果更加直观化。