GEO数据挖掘—3
富集分析
(一)GO富集分析(用差异基因做富集)
输入数据
代码语言:javascript复制#(1)输入数据
gene_up = deg$ENTREZID[deg$change == 'up']
gene_down = deg$ENTREZID[deg$change == 'down']
gene_diff = c(gene_up,gene_down) #得到了差异基因
代码语言:javascript复制#(2)富集
#以下步骤耗时很长,设置了存在即跳过
f = paste0(gse_number,"_GO.Rdata")
if(!file.exists(f)){
ego <- enrichGO(gene = gene_diff,
OrgDb= org.Hs.eg.db,
ont = "ALL",
readable = TRUE)
ego_BP <- enrichGO(gene = gene_diff,
OrgDb= org.Hs.eg.db,
ont = "BP",
readable = TRUE)
#ont参数:One of "BP", "MF", and "CC" subontologies, or "ALL" for all three.
save(ego,ego_BP,file = f)
}
代码语言:javascript复制#(3)可视化
#条带图
barplot(ego)
barplot(ego, split = "ONTOLOGY", font.size = 10,
showCategory = 5)
facet_grid(ONTOLOGY ~ ., space = "free_y",scales = "free_y")
#气泡图
dotplot(ego)
dotplot(ego, split = "ONTOLOGY", font.size = 10,
showCategory = 5)
facet_grid(ONTOLOGY ~ ., space = "free_y",scales = "free_y")
代码语言:javascript复制#(4)展示top通路的共同基因,要放大看。
#gl 用于设置下图的颜色
gl = deg$logFC
names(gl)=deg$ENTREZID
#Gene-Concept Network,要放大看
cnetplot(ego,
#layout = "star",
color.params = list(foldChange = gl),
showCategory = 3)
(二)KEGG富集分析
上调、下调、差异、所有基因
代码语言:javascript复制#(1)输入数据
gene_up = deg[deg$change == 'up','ENTREZID']
gene_down = deg[deg$change == 'down','ENTREZID']
gene_diff = c(gene_up,gene_down)
代码语言:javascript复制#(2)对上调/下调/所有差异基因进行富集分析
f2 = paste0(gse_number,"_KEGG.Rdata")
if(!file.exists(f2)){
kk.up <- enrichKEGG(gene = gene_up,
organism = 'hsa')
kk.down <- enrichKEGG(gene = gene_down,
organism = 'hsa')
kk.diff <- enrichKEGG(gene = gene_diff,
organism = 'hsa')
save(kk.diff,kk.down,kk.up,file = f2)
}
load(f2)
代码语言:javascript复制#(3)看看富集到了吗?https://mp.weixin.qq.com/s/NglawJgVgrMJ0QfD-YRBQg
table(kk.diff@result$p.adjust<0.05)
table(kk.up@result$p.adjust<0.05)
table(kk.down@result$p.adjust<0.05)
代码语言:javascript复制#(4)双向图
# 富集分析所有图表默认都是用p.adjust,富集不到可以退而求其次用p值,在文中说明即可
down_kegg <- kk.down@result %>%
filter(pvalue<0.05) %>% #筛选行
mutate(group=-1) #新增列
up_kegg <- kk.up@result %>%
filter(pvalue<0.05) %>%
mutate(group=1)
source("kegg_plot_function.R")
g_kegg <- kegg_plot(up_kegg,down_kegg)
g_kegg
标准流程的后续
问题数据和常见错误分析
复杂数据及其分析
1.多分组数据:示例GSE474
2.多数据联系分析:例如GSE83521_ and_ GSE89143
批次效应