单细胞数据复现-肺癌文章代码复现3

2022-05-08 21:15:51 浏览数 (1)

单细胞数据复现-肺癌文章代码复现1:https://cloud.tencent.com/developer/article/1992648

单细胞数据复现-肺癌文章代码复现2:https://cloud.tencent.com/developer/article/1995619

通过第一部分的数据基本处理,目前是已经得到了三种不同的细胞分群结果。

这一部分主要是对的epi细胞分群进行的细分,也是代码很长,我准备分成两个部分进行拆解

library、color及数据的加载

首先是按照复现1同样的参数加载包及颜色等变量。

代码语言:javascript复制
// ### load libraries
library(ggplot2)
library(Seurat)
library(dplyr)
library(reticulate)
library(sctransform)
library(cowplot)
library(viridis)
library(tidyr)
library(magrittr)
library(reshape2)
library(readxl)
library(stringr)
library(cowplot)
library(scales)
library(tibble)
library(gplots)
library(RColorBrewer)

theme_set(theme_cowplot())

#color scheme
use_colors <- c(
  Tumor = "brown2",
  Normal = "deepskyblue2",
  G1 = "#46ACC8",
  G2M = "#E58601",
  S = "#B40F20",
  Epithelial = "seagreen",
  Immune = "darkgoldenrod2",
  Stromal = "steelblue",
  p018 = "#E2D200",
  p019 = "#46ACC8",
  p023 = "#E58601",
  p024 = "#B40F20",
  p027 = "#0B775E",
  p028 = "#E1BD6D",
  p029 = "#35274A",
  p030 = "#F2300F",
  p031 = "#7294D4",
  p032 = "#5B1A18",
  p033 = "#9C964A",
  p034 = "#FD6467",
  AT1 = "#2B8CBE",
  AT2 = "#045A8D",
  Club = "#006D2C",
  DifferentiatingCiliated = "#31A354",
  Ciliated = "#74C476",
  Neuroendocrine = "#8856A7",
  lepidic = "#0B775E",
  acinar = "#74A089",
  `mucinuous (papillary)` = "#E2D200",
  `(micro)papillary` = "#CEAB07",
  solid = "#B40F20",
  sarcomatoid = "#5B1A18",
  CNN = "chartreuse4",
  CNA = "orange")

#load data
epi_anno <- readRDS("seurat_objects/epi_anno.RDS")

读取亚群细分的水平变量

下面为了对epi这个分群进行细分,加入了几种水平变量,也是为了后面的亚群在细分。

代码语言:javascript复制
epi_anno@meta.data$cell_type_epi <- factor(epi_anno@meta.data$cell_type_epi, levels = c("AT2",
                                                                                        "AT1",
                                                                                        "Club",
                                                                                        "Ciliated",
                                                                                        "Neuroendocrine",
                                                                                        "Tumor"))

#add inferCNV clone scores (for inferCNV, use seurat object epi_anno and the following code: https://github.com/bischofp/single_cell_lung_adenocarcinoma, computation time ~12h)
##加入scna读数,对epi的分群结果进行评估,并将结果整合到meta.data中
##read.delim函数读取带分隔符的文本文件 
scna_scores <- read.delim("infercnv_clone_scores_nsclc.tsv") %>% filter(tissue_type == "Tumor") %>% filter(!is.na(cna_clone))
rownames(scna_scores) <- scna_scores$cell_id
scna_scores <- scna_scores %>% select(cna_clone) %>% mutate(cna_clone = as.character(cna_clone))
epi_anno <- AddMetaData(epi_anno, scna_scores)

scna_data <- FetchData(epi_anno, c("tissue_type", "cna_clone"))
scna_data <- scna_data %>% mutate(cna_clone = ifelse(is.na(cna_clone), "CNN", cna_clone))
epi_anno <- AddMetaData(epi_anno, scna_data)

features的可视化

由于seurat的更新,有些函数发生了一些变化,所以在test代码的时候,我也是进行了修改,是比较适用于目前的seurat4的版本的。

这部分作者选取了自己感兴趣的基因,来通过不同的group.by的方式进行分群的可视化。

代码语言:javascript复制
// #some plots
##更改地方subsetdata改为subseet,subset.name及后面的accept.values,主要原因是软件升级后函数更改
DotPlot(subset(epi_anno, subset = cluster_type == "Normal"), features = c("ABCA3", "SFTPC", "AGER", "PDPN",  "KRT5", "TRP63", "NGFR", "SCGB1A1", "MUC5B", "FOXJ1", "TMEM190", "CHGA", "CALCA"), group.by = "cell_type_epi")   
  coord_flip()   
  scale_color_viridis()
##ggsave2("DotPlot_markergenes_epi_cell_type.emf", path = "../results", width = 11, height = 8, units = "cm")
ggsave2("Fig2B.png", path = "../results", width = 11, height = 8, units = "cm")

这里应该是需要调节字体大小的,但是我懒得调了,因为我只是学思路,也不是搬运换图的代码,后期自己需要的话,我在改。

Fig2B.pngFig2B.png

这里是根据的celltype进行区分,来看分群的情况。

代码语言:javascript复制
DimPlot(epi_anno, group.by = "cell_type_epi", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_celltype.png", path = "../results", width = 15, height = 15, units = "cm")

我怀疑我的图片有背景,有可能是前面的参数丢了一个,

Fig2A_celltype.pngFig2A_celltype.png

跟上面的代码也是一样的,主要是更改了group.by的参数,改成病人来源的id。

代码语言:javascript复制
DimPlot(epi_anno, group.by = "patient_id", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_patients.png", path = "../results", width = 15, height = 15, units = "cm")

下面的是组织来源的样本亚群可视化结果。

代码语言:javascript复制
DimPlot(epi_anno, group.by = "tissue_type", cols = use_colors, pt.size = 0.5)
ggsave2("Fig2A_tissuetype.png", path = "../results", width = 15, height = 15, units = "cm")
Fig2A_tissuetype.pngFig2A_tissuetype.png

对每个类型的count值进行可视化

代码语言:javascript复制
##首先是将这三个不同来源的分组合并到counts矩阵中
epi_cell_counts <- FetchData(epi_anno, vars = c("tissue_type", "cell_type_epi", "cna_clone", "cluster_type")) %>%
  mutate(tissue_type = factor(tissue_type, levels = c("Tumor", "Normal")))
##根据不同的fill填充变量来对count值进行可视化
ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cell_type_epi))  
  geom_bar(position = "fill")  
  scale_fill_manual(values = use_colors)  
  scale_y_reverse()  
  coord_flip()
ggsave2("Fig2A_barplot.pdf", path = "../results", width = 20, height = 5, units = "cm")

ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cna_clone))  
  geom_bar(position = "fill")  
  scale_fill_manual(values = use_colors)  
  scale_y_reverse()  
  coord_flip()
ggsave2("SuppFig4_barplot_cna_clone.pdf", path = "../results", width = 20, height = 5, units = "cm")

ggplot(data = epi_cell_counts, aes(x = tissue_type, fill = cluster_type))  
  geom_bar(position = "fill")  
  scale_fill_manual(values = c("cyan3", "darkorange2"))  
  coord_flip()
ggsave2("SuppFig4_barplot_cluster_type.pdf", path = "../results", width = 20, height = 5, units = "cm")
##也是跟上面类似,是选用不同的分组方式进行亚群的展示,我觉得这个放到上面那一类有可能相对好一点
DimPlot(epi_anno, group.by = "cna_clone", cols = use_colors, pt.size = 0.5)
ggsave2("SuppFig4_umap_cna_clone.png", path = "../results", width = 15, height = 15, units = "cm")

DimPlot(epi_anno, group.by = "cluster_type", cols = c("cyan3", "darkorange2"), pt.size = 0.5)
ggsave2("SuppFig4_umap_cluster_type.png", path = "../results", width = 15, height = 15, units = "cm")

这张图的height值应该改的大一些,比如height = 10左右,应该就能把全部的值放到一个画布上了。

Fig2A_barplot.pdfFig2A_barplot.pdf

这张图也是同样的问题,所有的元素没有到一张画布上,还是需要更改height值。

SuppFig4_barplot_cluster_type.pdfSuppFig4_barplot_cluster_type.pdf

或者跟这张图一样,把多余的图例去除,应该就能把全部的元素放到一张图上了。

SuppFig4_barplot_cluster_type.pdfSuppFig4_barplot_cluster_type.pdf
SuppFig4_umap_cna_clone.pngSuppFig4_umap_cna_clone.png
SuppFig4_umap_cluster_type.pngSuppFig4_umap_cluster_type.png

总结

作者通过epi的亚群进行分析,选择其中的几种分群方式,然后定了level的程度,对不同分组来源的进行可视化,这个基本上做亚群细分的那部分的必要要做的,也是我后面的课题当中需要做的,可以发现由于颜色设置的选取较多,导致绘图的时候是有一些元素不在画布上的,因此需要自己后续手动调整,但是前面保存了文件,就可以后面不断的调试。如果大家也是跟到这一步的话,比较建议把今天的rdata保存下来,明天直接加载。

发现在这部分我们可以用到的就是亚群细分的levels的读取的函数,还有对这个亚群的矩阵打分的函数,以及后面的根据不同的分组来源进行umap图可视化的过程,其实可以加入正常和肿瘤样本的分组,对结果更加直观化。

0 人点赞