qqboxplot: combining the Q-Q plot and the box plot

2022-03-29 14:07:34

Introduction

qqboxplot is an extension to ggplot that draws q-q boxplots.

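Background

A box plot (also called a box-and-whisker plot) is a statistical chart that shows how a set of data is spread out. It is mainly used to summarize the distribution of the raw data, and it also makes it easy to compare the distributions of several groups. A Q-Q plot, short for quantile-quantile plot, is a probability plot that compares two probability distributions by plotting their quantiles against each other.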
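
To make the quantile comparison concrete, here is a minimal base R sketch of the coordinates behind a normal Q-Q plot. The sample `x` and the seed are made up for illustration; this is not code from the qqboxplot package.

```r
# Illustrative sketch: the point coordinates of a normal Q-Q plot.
set.seed(1)                   # arbitrary seed, only for reproducibility
x <- rnorm(200, mean = 2)     # hypothetical sample

# Plotting positions; stats::qqnorm() uses ppoints() for the same purpose.
p <- ppoints(length(x))

theoretical <- qnorm(p)       # quantiles of the reference (normal) distribution
observed    <- sort(x)        # empirical quantiles of the sample

# qqnorm() returns the same pairs, in the original order of x.
qq <- qqnorm(x, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)
```

If the sample follows the reference distribution up to location and scale, the pairs `(theoretical, observed)` fall close to a straight line; curvature at either end points to tails that are heavier or lighter than the reference.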

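The package introduced today, qqboxplot, combines these two kinds of plot: it folds the tail information of a Q-Q plot into a traditional box plot and displays confidence intervals for the tails. The q-q boxplot is also more reliable on large datasets.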

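Installing the R package

# BiocManager::install() also installs CRAN packages;
# install.packages("qqboxplot") is an alternative, since the package is published on CRAN.
BiocManager::install("qqboxplot")
library(qqboxplot)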

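Visualization walkthrough

01

Comparing the box plot, the q-q plot, and the q-q boxplot

This example uses a random sample of genes from one autism patient and one control patient; the expression counts are log-transformed before plotting.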

library(ggplot2)
library(dplyr)
# Use one text size for all titles
eltext <- 12
# q-q boxplot
qqbox <- expression_data %>% 
  ggplot(aes(specimen, log_count)) + geom_qqboxplot(varwidth = TRUE, notch = TRUE) +
  ylab('logged normalized expression') + ggtitle("c) q-q boxplot") +
  theme(plot.title=element_text(size=eltext, face="plain", hjust=0.5), axis.title.x = element_text(size=eltext), axis.title.y = element_text(size=eltext),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid.major = element_line(colour = "grey70"),
        panel.grid.minor = element_line(colour="grey80"))

# Regular box plot
box <- expression_data %>%
  ggplot(aes(specimen, log_count)) + geom_boxplot(varwidth = TRUE, notch = TRUE) +
  ylab('logged normalized expression') + ggtitle('a) boxplot') +
  theme(plot.title=element_text(size=eltext, face="plain", hjust=0.5), axis.title.x = element_text(size=eltext), axis.title.y = element_text(size=eltext),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid.major = element_line(colour = "grey70"),
        panel.grid.minor = element_line(colour="grey80"))

# Point shapes shown in the colour legend (one per specimen)
override.shape <- c(16, 17)
# q-q plot
qq <- expression_data %>%
  ggplot(aes(sample=log_count)) + geom_qq(aes(color=specimen, shape=specimen)) +
  xlab('theoretical normal distribution') +
  ylab('logged normalized expression') + ggtitle('b) q-q plot') +
  labs(color="specimen") +
  guides(color = guide_legend(override.aes = list(shape=override.shape)), shape="none") +
  theme(plot.title=element_text(size=eltext, face="plain", hjust=0.5), axis.title.x = element_text(size=eltext), axis.title.y = element_text(size=eltext),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid.major = element_line(colour = "grey70"),
        panel.grid.minor = element_line(colour="grey80"),
        legend.position = c(0.8, 0.2))

library(gridExtra)
# Arrange the three plots side by side
gridExtra::grid.arrange(box, qq, qqbox, ncol=3)
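
The plots above all pass `notch = TRUE` and `varwidth = TRUE`. As a rough illustration of what a notch encodes, the base R sketch below computes the conventional notch interval, median ± 1.58 × IQR / sqrt(n), and checks it against `boxplot.stats()`. The sample is simulated, and ggplot2 may compute the quartiles slightly differently from the hinges used here, so treat the numbers as an approximation of what is drawn.

```r
# Illustrative sketch: the numbers behind a notched box plot (base R only).
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(500)                   # hypothetical sample

five  <- fivenum(x)               # min, lower hinge, median, upper hinge, max
iqr   <- five[4] - five[2]        # hinge-based height of the box
notch <- five[3] + c(-1.58, 1.58) * iqr / sqrt(length(x))

# boxplot.stats() reports the same interval in its `conf` element.
bs <- boxplot.stats(x)
all.equal(notch, bs$conf)

# With varwidth = TRUE, box widths scale with sqrt(n) of each group,
# e.g. relative widths for groups of 500 and 1000 observations:
sqrt(c(500, 1000)) / sqrt(1000)
```

Non-overlapping notches between two groups are usually read as a hint that the medians differ; the q-q boxplot keeps this part of the box plot and changes only how the tails are drawn.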

02

Examples

Simulated data

# Store the simulated samples so the plots below can reuse them
simulated_data <- tibble(y=c(rnorm(1000, mean=2), rt(1000, 16), rt(500, 4), 
                   rt(1000, 8), rt(1000, 32)),
        group=c(rep("normal, mean=2", 1000), 
                rep("t distribution, df=16", 1000), 
                rep("t distribution, df=4", 500), 
                rep("t distribution, df=8", 1000), 
                rep("t distribution, df=32", 1000)))
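
The simulated groups differ mainly in how heavy their tails are. A small base R sketch (not part of the original post) makes that explicit by comparing upper-tail quantiles of the standard normal with those of the t distributions used above; the probabilities in `probs` are chosen arbitrarily for illustration.

```r
# Illustrative sketch: upper-tail quantiles of the distributions being simulated.
probs <- c(0.95, 0.99, 0.999)     # arbitrary tail probabilities

tails <- rbind(
  normal = qnorm(probs),          # standard normal (the reference shape)
  t_df32 = qt(probs, df = 32),
  t_df16 = qt(probs, df = 16),
  t_df8  = qt(probs, df = 8),
  t_df4  = qt(probs, df = 4)
)
colnames(tails) <- paste0("p=", probs)

# Each column increases from top to bottom: fewer degrees of freedom, heavier tails.
round(tails, 2)
```

This ordering is the reason the plots below arrange the groups as normal, df=32, df=16, df=8, df=4: the departure from the normal reference should grow from left to right.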

Box plots of the simulated data:

simulated_data %>%
  ggplot(aes(factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")), y=y)) +
  geom_boxplot(notch=TRUE, varwidth = TRUE) +
  xlab(NULL) +
  ylab(NULL) +
  theme(axis.text.x = element_text(angle = 23, size = 15), axis.title.y = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))

A Q-Q plot of the same data:

# Point shapes shown in the colour legend (one per group)
override.shape <- c(16, 17, 15, 3, 7)

simulated_data %>% ggplot(aes(sample=y, color=factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")),
                              shape=factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")))) +
  geom_qq() + geom_qq_line() + labs(color="distribution") +
  xlab("Normal Distribution") +
  ylab("Simulated Datasets") +
  guides(color = guide_legend(override.aes = list(shape=override.shape)), shape="none") +
  theme(axis.title.x = element_text(size=15), axis.title.y = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))

A q-q boxplot of the same data, with the normal distribution as the reference:

simulated_data %>%
  ggplot(aes(factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")), y=y)) +
  geom_qqboxplot(notch=TRUE, varwidth = TRUE, reference_dist="norm") +
  xlab("reference: normal distribution") +
  ylab(NULL) +
  guides(color="none") +
  theme(axis.text.x = element_text(angle = 23, size = 15), axis.title.y = element_text(size=15),
        axis.title.x = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))

Plots using example datasets that ship with qqboxplot

# Reference sample: male labor force participation, ages 15-24, in 2008
comparison_data <- indicators %>% filter(year==2008 & `Series Code`=="SL.TLF.ACTI.1524.MA.NE.ZS")

indicators %>%
  # Replace the long series names with shorter titles
  mutate(`Series Name`= ifelse(
    `Series Name`=="Labor force participation rate for ages 15-24, male (%) (national estimate)", 
    "Male ages 15-24", 
    "Female ages 15-24")) %>%
  ggplot(aes(as.factor(year), y=indicator)) +
  geom_qqboxplot(notch=TRUE, varwidth = TRUE, compdata=comparison_data$indicator) +
  xlab("Year") +
  ylab("Country level labor force\nparticipation rate (%)") +
  facet_wrap(~factor(`Series Name`, levels = c("Male ages 15-24", "Female ages 15-24"))) +
  theme(strip.text = element_text(size=12), axis.text.x = element_text(size = 15), axis.title.x = element_text(size=15),
        axis.title.y = element_text(size=12),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))
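
In the call above, `compdata` is given an observed sample rather than a named theoretical distribution, so the reference appears to be empirical. The base R sketch below illustrates that general idea of comparing a group's quantiles with those of a reference sample. It is not qqboxplot's internal algorithm, and `reference`, `sample_a`, and `probs` are hypothetical names and values chosen for illustration.

```r
# Illustrative sketch: quantiles of one group compared with an empirical reference.
set.seed(1)                                    # arbitrary seed, only for reproducibility
reference <- rnorm(300, mean = 50, sd = 10)    # hypothetical comparison sample
sample_a  <- rnorm(150, mean = 50, sd = 15)    # hypothetical group with a wider spread

probs <- seq(0.05, 0.95, by = 0.05)            # arbitrary grid of probabilities
q_ref <- quantile(reference, probs)            # empirical quantiles of the reference
q_a   <- quantile(sample_a, probs)             # empirical quantiles of the group

# Difference between the group and the reference at each probability level.
deviation <- q_a - q_ref
round(deviation, 1)
```

Deviations that grow toward the lowest and highest probabilities suggest that the group's tails differ from those of the reference sample, which is the kind of information a q-q boxplot is meant to expose.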

# Spike counts by stimulus orientation, region V1, with a normal reference
spike_data %>% filter(region=="V1") %>%
  ggplot(aes(factor(orientation), nspikes)) +
  geom_qqboxplot(notch=TRUE, varwidth = TRUE, reference_dist="norm") +
  xlab("orientation") +
  ylab("spike count") +
  theme(axis.text.x = element_text(size = 15), axis.text.y = element_text(size=14), axis.title.x = element_text(size=15),
        axis.title.y = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))

The same data shown as a bean plot

spike_data %>% filter(region=="V1") %>%
  ggplot() +
  geom_violin(aes(x=factor(orientation),y=nspikes),fill='grey',trim=FALSE, draw_quantiles = c(.25, .5, .75)) +
  # One short horizontal tick per observation, centred on its group
  geom_segment(aes(
    x=match(factor(orientation),levels(factor(orientation)))-0.1,
    xend=match(factor(orientation),levels(factor(orientation)))+0.1,
    y=nspikes,yend=nspikes),
    col='black'
  ) +
  xlab("orientation") +
  ylab("spike count") +
  theme(axis.text.x = element_text(size = 15), axis.text.y = element_text(size=14), axis.title.x = element_text(size=15),
        axis.title.y = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))

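Summary

qqboxplot is a very good attempt at combining the box plot with the Q-Q plot. Because it is a ggplot extension, its functions follow conventions most readers already know, so it is quick to pick up.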