表达芯片数据分析4——复杂数据及其分析（多分组数据）

多分组数据

示例：GSE474

练习：GSE106191

一般有一个对照组，多个实验组或者两两差异比较。

title: "tinyarray简化常规芯片分析流程"

output: html_document

editor_options:

chunk_output_type: console

代码语言：{r setup, include=FALSE}复制

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(fig.width = 10,fig.height = 7,collapse = TRUE)
knitr::opts_chunk$set(message = FALSE,warning = FALSE)

需要R包版本2.3.1及以上

代码语言：text复制

rm(list = ls())
#打破下载时间的限制,改前60秒，改后10w秒
options(timeout = 100000) 
options(scipen = 20)#不要以科学计数法表示

library(tinyarray)
packageVersion("tinyarray")
library(stringr)
gse = "GSE7305"
geo = geo_download(gse)##替代标准流程的两种代码；
exp = geo$exp
range(exp)
exp = log2(exp 1)
boxplot(exp,las = 2)
pd = geo$pd
gpl_number = geo$gpl
# 分组信息
k = str_detect(pd$title,"Normal");table(k)
Group = ifelse(k,"Normal","Disease")
Group = factor(Group,levels = c("Normal","Disease"))
# 探针注释
find_anno(geo$gpl)
ids <- AnnoProbe::idmap('GPL570')
head(ids)
#差异分析和它的可视化
dcp = get_deg_all(exp,Group,ids,entriz = F)
table(dcp$deg$change)
head(dcp$deg)
dcp$plots
library(ggplot2)
ggsave("deg.png",width = 15,height = 5)

代码语言：text复制

#富集分析
deg = get_deg(exp,Group,ids)
genes = deg$ENTREZID[deg$change!="stable"]
head(genes)
#有可能因为网络问题报错
g = quick_enrich(genes,destdir = tempdir())
names(g)
g[[1]][1:4,1:4]
library(patchwork)
g[[3]] g[[4]]
ggsave("enrich.png",width = 12,height = 7)

多分组数据

title: "GSE474"

output: html_document

editor_options:

chunk_output_type: console

代码语言：{r setup, include=FALSE}复制

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(fig.width = 10,fig.height = 7,collapse = TRUE)
knitr::opts_chunk$set(message = FALSE,warning = FALSE)

1.获取数据

R包需要自己安装哦。如果不会安装，建议先学习R语言基础，不要直接上手实战。另外，学习本篇需要建立在tinyarray基本使用会了的基础上，不会的话先看复杂分析这里的第一个文件夹。

代码语言：text复制

rm(list = ls())
#打破下载时间的限制,改前60秒，改后10w秒
options(timeout = 100000) 
options(scipen = 20)#不要以科学计数法表示

library(tinyarray)
packageVersion("tinyarray")
library(stringr)
gse = "GSE474"
geo = geo_download(gse)
exp = geo$exp
range(exp)
exp = log2(exp 1)
boxplot(exp,las = 2)
pd = geo$pd
gpl_number = geo$gpl

2.生成分组向量与探针注释

注意仍然是对照组放在levels的第一个位置，三分组一般是两两比较，后面差异分析时看清楚谁比谁就可以了。

代码语言：text复制

Group=str_split(pd$title,"-",simplify = T)[,3]
Group=factor(Group,levels = c("NonObese","Obese","MObese"))
gpl_number
find_anno(gpl_number)
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# 
# BiocManager::install("hgu133a.db")
library(hgu133a.db);ids <- toTable(hgu133aSYMBOL)
head(ids)

3.差异分析

代码参考自limma的pdf文档,如果使用tinyarray简化，这段代码就不需要了，已经写进了函数里。以下是原始代码，可以选择性学习一下

代码语言：text复制

library(limma)
design=model.matrix(~0 Group)
colnames(design)=levels(Group)
px = levels(Group)
px
contrast.matrix <- makeContrasts(paste0(px[2],"-",px[1]), 
                                 paste0(px[3],"-",px[2]), 
                                 paste0(px[3],"-",px[1]), 
                                 levels=design)
fit <- lmFit(exp, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
#用lapply分别提取出差异分析结果表格
deg = lapply(1:ncol(design), function(i){
  topTable(fit2,coef = i,number = Inf)
})
names(deg) = colnames(contrast.matrix)
names(deg)
head(deg[[1]])

此时得到的deg就是三次差异分析的结果表格组成的列表。Obese-NonObese的意思就是Obese为处理，NoObese为对照的差异分析结果。

之后就分别提取出结果画图即可。

4.tinyarray的简化操作

多分组的数据，get_deg_all仍然可以帮你简化操作，目前是三分组就两两差异分析，四个或五个分组的数据是后面几个组与第一个组差异分析，暂不支持其他的做法和更多的分组。

你可以自行对矩阵和临床信息取子集，两两差异分析。

代码语言：text复制

dcp = get_deg_all(exp,Group,ids,logFC_cutoff = 0.585,entriz = F)
dcp$plots
ggplot2::ggsave("deg.png",width = 15,height = 10)

富集分析

富集分析的输入数据是差异基因名字，只要你提供一组基因名就能做，要分别做三次还是取交集或合集一起做都可以，没有标准答案。

代码语言：text复制

deg = get_deg(exp,Group,ids,logFC_cutoff = 0.585)
g1 = deg$`Obese-NonObese`$ENTREZID[deg$`Obese-NonObese`$change!="stable"]
g2 = deg$`MObese-NonObese`$ENTREZID[deg$`MObese-NonObese`$change!="stable"]
g3 = deg$`MObese-Obese`$ENTREZID[deg$`MObese-Obese`$change!="stable"]

#T交集，F合集
if(F){
  genes = intersect_all(g1,g2,g3)
}else{
  genes = union_all(g1,g2,g3)
}

head(genes)
#有可能因为网络问题报错
g = quick_enrich(genes,destdir = tempdir())
names(g)
g[[1]][1:4,1:4]
library(patchwork)
g[[3]] g[[4]]
ggplot2::ggsave("enrich.png",width = 12,height = 7)

r语言

0 人点赞