分享是一种态度
我们今天继续探索这3个gene signatures,首先看它在不同clusters的细胞之间的表达分布。
代码语言:javascript复制clust_avg_prognosis <- matrix(NA, nrow = length(unique(HSMM_allepith_clustering$Cluster)), ncol = ncol(prognosis_sig))
#列明是clusters1-5
rownames(clust_avg_prognosis) <- paste("clust", c(1:length(unique(HSMM_allepith_clustering$Cluster))), sep = "")
#行名是mammaprint、zenawerb、artega和Cluster
colnames(clust_avg_prognosis) <- colnames(prognosis_sig)
#每个clusters的三个gene signatures的平均值
for (c in 1:length(unique(HSMM_allepith_clustering$Cluster))) {
clust_avg_prognosis[c,] <- apply(prognosis_sig[which(HSMM_allepith_clustering$Cluster == c),], 2, mean)}
prognosis_sig <- as.data.frame(prognosis_sig)
all.equal(rownames(prognosis_sig), colnames(HSMM_allepith_clustering))
prognosis_sig$Cluster <- as.numeric(HSMM_allepith_clustering$Cluster)
prognosis_melt <- melt(prognosis_sig, id.vars = c("Cluster"))
prognosis_melt$value <- as.numeric(prognosis_melt$value)
prognosis_melt <- prognosis_melt %>%
filter(Cluster %in% c(2,3,4))
#画fig4b
p <- ggplot(prognosis_melt, aes(factor(Cluster), value, fill = factor(Cluster)))
scale_fill_manual(values = c("1" = "#ee204d", "2" = "#17806d", "3" = "#b2ec5d", "4" = "#cda4de", "5" = "#1974d2"))
p geom_violin(adjust = .5) facet_wrap(~variable) stat_summary(fun.y = mean, geom = "point", shape = 22, size = 3)
代码语言:javascript复制#画figS16
venn(list("PS" = mammaprint, "MBS" = zenawerb, "RTS" = artega))
可以看到这3个gene signatures没有重叠的基因,并且它们来源不同,但这3个 gene signatures均在clusters 2亚群中都高表达。
今天这一部分,作者主要是想跟临床预后联系到一起,于是选取了3个有代表性的数据集gene signature,对868个上皮细胞的5个clusters进行探索,结果提示clusters2具有与其他clusters不同的特征,接着对clusters2进行进一步探索。在之前的第五篇笔记tSNE中,已经用differentialGeneTest对每一个上皮细胞cluster与全部的上皮细胞进行差异分析后,以FDR=0.1为阈值,选取前10个有显著差异变化的基因在METABRIC 数据库中进行生存分析,作者在补充材料里已经描述得很详细。
结果显示高表达clusters2的肿瘤与OS显著负相关相关性。相反,但是在三个gene signatures中的预后则无统计学意义。此外,clusters1、3、4和5的基因无法预测临床结果。这些发现揭示了单细胞测序分析能揭示与临床预后相关细胞状态的能力,但这些细胞状态还没有在肿瘤真实世界样本中没被发现。
我们继续探索。在fig3a中,我们已经对868个上皮细胞进行tSNE,接着我们将这个tSNE结果,进行进一步注释可并可视化。
代码语言:javascript复制## 可视化normal signatures,figS10b
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "assignments_normsig_both", cell_size = 3)
代码语言:javascript复制#可视化TNBCtype4 Lehman signatures
#去掉immunomodulatory和mesenchymal_stem_like上皮细胞
lehman_avg_both_epithelial_new <- lehman_avg_both_epithelial[-which(rownames(lehman_avg_both_epithelial) %in% c("immunomodulatory", "mesenchymal_stem_like")),]
#给868个上皮细胞进行分类(basal_like_1,basal_like_2,mesenchymal,luminal_ar)
assignments_lehman_both_epithelial_new <- apply(lehman_avg_both_epithelial_new, 2, function(x){rownames(lehman_avg_both_epithelial_new)[which.max(x)]})
all.equal(colnames(HSMM_allepith_clustering), names(assignments_lehman_both_epithelial_new))
pData(HSMM_allepith_clustering)$assignments_lehman_both_new <- assignments_lehman_both_epithelial_new
#画figS10c
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "assignments_lehman_both_new", cell_size = 3)
代码语言:javascript复制#可视化mammaprint signature (PS),figS10d
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "mammaprint_avg_exprs", cell_size = 3)
scale_color_continuous(low = "yellow", high = "blue")
然后,通过研究在cluster2中与其他上皮细胞群的差异表达的基因中富集的通路变化,来确定cluster2亚群背后的功能机制。我们发现,最富集的通路与鞘糖脂的生物合成和溶酶体的周转有关,这两种途径影响了同样富集到的先天免疫系统的细胞因子途径。接着可视化每个样本的鞘糖脂的生物合成和先天免疫系统的细胞因子两条途径的热图。
糖鞘糖脂最近被认为是乳腺癌中许多促进肿瘤特性的介质,包括改变的生长因子信号、EMT和干样行为。该途径中的多个关键基因在cluster2亚群中高表达,包括糖脂转移蛋白基因GLTP和鞘脂生物合成关键亚单位基因SPTLC1。鞘磷脂也被认为是天然免疫系统和获得性免疫系统的重要调节剂,并与多种上皮组织炎症相关癌变有关。
代码语言:javascript复制#path1是糖鞘糖脂的关键基因
path1 <- c("ST3GAL4", "ST3GAL6", "ST8SIA1", "FUT1", "FUT2", "FUT3", "FUT4", "FUT6", "FUT9", "GLTP", "SPTLC1", "KDSR", "SPTLC2", "CERK", "ARSA")
idx_path1 <- match(path1, rownames(HSMM_allepith_clustering))
path1是先天免疫系统的细胞因子途径的关键基因
path2 <- c("CCL20", "CCL22", "CCL4", "CCR6", "IL11", "IL12RB1", "IL6R", "CSF2RA", "BMP7", "GLMN", "GPI", "INHBA", "TNF", "TNFSF15", "GHR", "LEPR", "TLR1", "TLR2", "TLR5", "TLR7", "TLR10", "F11R")
idx_path2 <- match(path2, rownames(HSMM_allepith_clustering))
path3 <- c("ERBB2", "AKT1", "AKT3", "PIK3R3", "PIK3R4", "RPS6KB2", "TRIB3", "BTK", "GRB10", "GRB2", "ILK", "PAK1", "PRKCZ", "CSNK2A1",
"IRS1", "IRAK1", "MYD88", "MAP2K1", "MAPK8", "MAPK1", "PTPN11", "EIF4E", "EIF4EBP1", "EIF4G1", "FKBP1A", "PDK1", "RHEB", "RPS6KB1")
idx_path3 <- match(path3, rownames(HSMM_allepith_clustering))
all_idx_paths <- c(idx_path1, idx_path2)
#命名
names_path <- c(rep("glycosphigolipid metabolism", length(idx_path1)), rep("innate immunity", length(idx_path2)))
#上色
anno_cols_path <- c("glycosphigolipid metabolism" = "#ff9baa", "innate immunity" = "#17806d")
ha_path_rows <- HeatmapAnnotation(df = data.frame(pathway = names_path),
annotation_legend_param = list(pathway = list(ncol = 1, title = "pathway", title_position = "topcenter")),
which = "row", col = list("pathway" = anno_cols_path), annotation_width = unit(3, "mm"))
#分离每个cluster的矩阵,并提取关键基因。
mat_epith_allpats <- exprs(HSMM_allepith_clustering)
mats_epith_patient <- list()
pds_epith_patient <- list()
mats_epith_patient_clusters <- list()
for (i in 1:length(patients_now)) {
pds_epith_patient[[i]] <- pData(HSMM_allepith_clustering)[which(pData(HSMM_allepith_clustering)$patient == patients_now[i]), ]
mats_epith_patient[[i]] <- mat_epith_allpats[all_idx_paths, which(pData(HSMM_allepith_clustering)$patient == patients_now[i])]
mats_epith_patient_clusters[[i]] <- list()
for (c in 1:length(unique(HSMM_allepith_clustering$Cluster))) {
mats_epith_patient_clusters[[i]][[c]] <- mats_epith_patient[[i]][, which(pds_epith_patient[[i]]$Cluster == c)]
}
names(mats_epith_patient_clusters[[i]]) <- paste0("clust", c(1:5))
}
names(mats_epith_patient) <- patients_now
names(pds_epith_patient) <- patients_now
# 接下来就是画图了。
ht_paths <-
ha_path_rows
Heatmap(mats_epith_patient_clusters[[1]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
show_column_dend = TRUE, show_column_names = FALSE,
name = "cluster1", column_title = "cluster1",
row_names_gp = gpar(fontsize = 9, col = anno_cols_path),
split = names_path)
Heatmap(mats_epith_patient_clusters[[1]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[1]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[1]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[1]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[2]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
show_column_dend = TRUE, show_column_names = FALSE,
name = "cluster1_2", column_title = "cluster1",
row_names_gp = gpar(fontsize = 9, col = anno_cols_path),
split = names_path)
Heatmap(mats_epith_patient_clusters[[2]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2_2", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[2]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3_2", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[2]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4_2", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[2]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5_2", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[3]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster1_3", column_title = "cluster1",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[3]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2_3", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[3]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3_3", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[3]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4_3", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[3]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5_3", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[4]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster1_4", column_title = "cluster1",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[4]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2_4", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[4]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3_4", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[4]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4_4", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[4]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5_4", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[5]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster1_5", column_title = "cluster1",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[5]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2_5", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[5]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3_5", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[5]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4_5", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[5]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5_5", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[6]][[1]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster1_6", column_title = "cluster1",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[6]][[2]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster2_6", column_title = "cluster2",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[6]][[3]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster3_6", column_title = "cluster3",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[6]][[4]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster4_6", column_title = "cluster4",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
Heatmap(mats_epith_patient_clusters[[6]][[5]],
col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
cluster_rows = TRUE, show_row_dend = FALSE,
name = "cluster5_6", column_title = "cluster5",
show_row_names = FALSE,
show_column_dend = TRUE, show_column_names = FALSE)
画得有点丑,画图参数得调整需要参考ComplexHeatmap包的说明书,继续美化一下。
到了这里,这篇文献的单细胞部分就学习完了,花了一个月的时间断断续续把这个系列共7篇笔记写完。作者把这1000多个细胞不断分群,不断探索,最后集中在了上皮细胞群,进一步细分,然后结合临床预后情况,继续探索。这是很标准的单细胞分析流程。当然,文章中也加了一些免疫组化的实验方法,加以验证。还有一部分外显子测序,这又是另外一个需要学习的领域了,在单细胞学习的路上,感恩得到了“单细胞天地”公众号团队的答疑解惑。2021年,希望能学到更多的东西,写更多的笔记,相信一切的努力都不会白白付出。
往期回顾
多组学分析肺结核队列的记忆T细胞状态
单细胞混样测序至少可以区分性别
构建seurat对象之初就应该是把基因名字转换好
OSCA单细胞数据分析笔记9—Clustering
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
- 生信爆款入门-2021第4期
- 数据挖掘(GEO,TCGA,单细胞)2021第4期
- 明码标价之共享96线程384G内存服务器