Learning data analysis from Genes|Genomes|Genetics: transcriptome differential expression analysis with the R package edgeR

2023-01-06 20:09:25

Paper

Sex-Specific Co-expression Networks and Sex-Biased Gene Expression in the Salmonid Brook Charr Salvelinus fontinalis

Data and code availability

https://github.com/bensutherland/sfon_wgcna

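The repository also contains the WGCNA code. The paper describes its methods and results in considerable detail, so you can work through the WGCNA code with the paper open alongside it.

Today's post starts with the differential expression analysis code.

The raw count file provided with the paper contains more than 100 samples, which is a fairly large dataset. Here I use only 20 of them.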

Read the expression file

# Read the count table (one row per transcript, one column per sample)
library(readr)
my.counts<-read_csv("data/20220623/edgeR_counts.csv")
head(my.counts)
dim(my.counts)

Round the data

library(tidyverse)
# Move the transcript IDs into the row names, then round the counts to integers
my.counts.round<- my.counts %>% 
  column_to_rownames("transcript.id") %>% 
  round()
dim(my.counts.round)
head(my.counts.round)

Filter the data

I did not fully understand the filtering criterion used here.

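library(edgeR)
edger.counts <- DGEList(counts = my.counts.round)
# CPM cutoff equivalent to 10 reads in the smallest library
min.reads.mapping.per.transcript <- 10
cpm.filt <- min.reads.mapping.per.transcript / min(edger.counts$samples$lib.size) * 1000000
cpm.filt
# Minimum number of samples that must exceed the cutoff
min.ind <- 5

# Keep transcripts whose CPM exceeds the cutoff in at least min.ind samples
keep <- rowSums(cpm(edger.counts)>cpm.filt) >= min.ind
table(keep)
# Subset and recompute library sizes from the retained transcripts
filtered.counts <- edger.counts[keep, , keep.lib.sizes=FALSE]
filtered.counts %>% class()
dim(filtered.counts)

# Compute TMM normalization factors
filtered.counts <- calcNormFactors(filtered.counts, method = c("TMM"))
filtered.counts$samples

# Estimate dispersions (no design matrix is supplied at this point)
filtered.counts<-estimateDisp(filtered.counts)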
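The filtering rule in the code above can be unpacked with a little arithmetic. The sketch below is only an illustration: `toy.counts` and `toy.lib.size` are made-up values (not data from the paper), and CPM is computed by hand in base R rather than with edgeR's `cpm()`. The cutoff is the CPM value that corresponds to 10 reads in the smallest library, and a transcript is kept only if it exceeds that cutoff in at least `min.ind` samples.

```r
# Hypothetical counts: rows are transcripts, columns are samples
toy.counts <- matrix(
  c(50, 40, 60, 55, 45, 70,   # t1: expressed in most samples
     0,  1,  0,  2,  0,  1,   # t2: barely detected
    30,  0,  0, 25,  0,  0),  # t3: expressed in two samples only
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("t", 1:3), paste0("s", 1:6))
)

# Hypothetical total mapped reads per sample (the matrix shows only 3 of many transcripts)
toy.lib.size <- c(2e6, 1e6, 4e6, 2e6, 5e6, 1e6)

# Counts per million: divide each column by its library size, scale by 1e6
toy.cpm <- sweep(toy.counts, 2, toy.lib.size, "/") * 1e6

# CPM that corresponds to 10 reads in the smallest library (1e6 reads here)
toy.cpm.filt <- 10 / min(toy.lib.size) * 1e6

# Keep transcripts above the cutoff in at least 5 samples
toy.min.ind <- 5
toy.keep <- rowSums(toy.cpm > toy.cpm.filt) >= toy.min.ind
toy.keep
```

In this toy example only `t1` is kept. Note that its 45 reads in sample `s5` do not pass, because in a library of 5 million reads that is only 9 CPM. Expressing the cutoff in CPM is what makes the same threshold comparable across libraries of different depth.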

Combine the data with the sample information

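new.group.info<-read_csv("data/20220623/edgeR_group_info.csv")

# Check that the sample order in the count data matches the sample sheet
identical(filtered.counts$samples %>% rownames(),
          new.group.info$file.name)
new.group.info$sex<-factor(new.group.info$sex,
                           levels = c("F","M"))
levels(new.group.info$sex)
# Store sex as the group; otherwise every sample stays in the default group 1
# and model.matrix() fails (this assumes the identical() check above returned TRUE)
filtered.counts$samples$group <- new.group.info$sex
design <- model.matrix(~filtered.counts$samples$group)
design
colnames(design)[2] <- "sex"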
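To see what the design matrix encodes, here is a minimal base R sketch with six made-up sex labels (`toy.sex` is hypothetical, not the paper's sample sheet). Because "F" is the first factor level it acts as the reference, so the second column is an indicator for male samples, and its coefficient is the male versus female log fold change.

```r
# Hypothetical sex labels; "F" is the first (reference) level
toy.sex <- factor(c("F", "M", "F", "M", "M", "F"), levels = c("F", "M"))

# Intercept column plus one indicator column for the non-reference level
toy.design <- model.matrix(~ toy.sex)
colnames(toy.design)

# Rename the second column, as in the code above
colnames(toy.design)[2] <- "sex"

# 1 for male samples, 0 for female samples
toy.design[, "sex"]
```

With this coding, a positive logFC for the "sex" coefficient means higher expression in males than in females.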

Differential expression analysis

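fit <- glmFit(y = filtered.counts, design = design)
# By default glmLRT() tests the last coefficient of the design, here "sex"
lrt <- glmLRT(fit)

# n is set larger than the number of transcripts so that every row is returned
result <- topTags(lrt, n = 1000000) 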
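The table returned by `topTags()` contains both a raw `PValue` column and an `FDR` column, which by default holds Benjamini-Hochberg adjusted p-values. The sketch below uses five made-up p-values (`toy.p` is hypothetical) and base R's `p.adjust()` to show how the adjustment changes the number of transcripts passing a 0.05 cutoff. It illustrates the adjustment itself and is not part of the original analysis.

```r
# Hypothetical raw p-values for five transcripts
toy.p <- c(0.001, 0.01, 0.035, 0.045, 0.2)

# Benjamini-Hochberg adjustment
toy.fdr <- p.adjust(toy.p, method = "BH")

# Compare raw and adjusted values side by side
data.frame(PValue = toy.p, FDR = toy.fdr)

# Number of transcripts passing each cutoff
sum(toy.p < 0.05)
sum(toy.fdr < 0.05)
```

Here four transcripts pass the raw p-value cutoff but only two pass after adjustment, so the choice between `PValue` and `FDR` can noticeably change the size of the resulting gene list.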

Volcano plot

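# Label each transcript using the raw p-value and a log2 fold change threshold of 2
result$table %>% 
  mutate(change = case_when(
    PValue < 0.05 & logFC > 2 ~ "UP",
    PValue < 0.05 & logFC < -2 ~ "DOWN",
    TRUE ~ "NOT"
  )) -> DEG

# Count transcripts in each category
table(DEG$change)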

library(ggplot2)
ggplot(data=DEG,aes(x=logFC,
                   y=-log10(PValue),
                   color=change)) +
  geom_point(alpha=0.8,size=3) +
  labs(x="log2 fold change") + ylab("-log10 pvalue") +
  #ggtitle(this_title) +
  theme_bw(base_size = 20) +
  #theme(plot.title = element_text(size=15,hjust=0.5),) +
  scale_color_manual(values=c('#a121f0','#bebebe','#ffad21')) -> p1 

# Add dashed lines at the fold change thresholds
p1 +
  geom_vline(xintercept = 2,lty="dashed") +
  geom_vline(xintercept = -2,lty="dashed") -> p2

library(patchwork)
pdf(file = "edger_deg.pdf",
    width = 9.4,height = 4,family = "serif")
# Place the two plots side by side with a shared legend
p1 + p2 +
  plot_layout(guides = "collect")
dev.off()
