Introduction
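When many time series need to be displayed and compared, plotting them as line charts can be challenging. A more convenient way to plot such datasets is the horizon plot, which compresses the data while still retaining all of the information.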
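To make the idea of "compressing without losing information" concrete, here is a minimal base R sketch of how a horizon plot folds a series into bands. It is only an illustration of the general technique, not ggHoriPlot's internal code; the toy values, the choice of six bands, and the variable names are all assumptions made for this example.

```r
# Toy series (hypothetical values chosen for illustration)
y <- c(-5, -2, 0, 3, 7, 10)

# Origin at the midpoint of the data range, six bands in total (assumed)
origin      <- sum(range(y)) / 2
band_height <- diff(range(y)) / 6

# Signed distance of every value from the origin
d <- y - origin

# Band index counted away from the origin (1 = closest band)
band <- pmax(ceiling(abs(d) / band_height), 1)

# Height of the value inside its own band (between 0 and band_height)
folded <- abs(d) - (band - 1) * band_height

# Each value is now described by a side, a band (drawn as a color),
# and a small folded height, so the plot needs only one band of vertical space
data.frame(y, side = ifelse(d >= 0, "above", "below"), band, folded)
```

Because the side, band, and folded height together allow the original value to be rebuilt, nothing is lost by stacking the bands on top of each other.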
Background
ggHoriPlot makes it easy to build horizon plots in ggplot2.
Installing the R packages
# Install ggthemes; the other packages loaded below must also be installed
BiocManager::install("ggthemes")
library(ggHoriPlot)
library(tidyverse)
library(patchwork)
library(ggthemes)
Visualization walkthrough
01
Basic plotting
A horizon plot is built within the ggplot2 framework by adding a layer with geom_horizon().
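# Example data: an oscillating series whose amplitude grows with x
x <- 1:300
y <- x * sin(0.1 * x)
dat_tab <- tibble(x = x,
                  xend = x + 0.9999,
                  y = y)

# Basic horizon plot with default settings
a <- dat_tab %>%
  ggplot() +
  geom_horizon(aes(x = x, y = y))
a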
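The default ggplot2 fill colors may not be the best palette for a horizon plot. The scale_fill_hcl() function can be used to pick a more suitable color scheme. Its default palette colors low values red and high values blue.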
# Same plot with a cleaner theme and the default scale_fill_hcl() palette
a <- dat_tab %>%
  ggplot() +
  geom_horizon(aes(x = x, y = y)) +
  theme_few() +
  scale_fill_hcl()
a
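Changing the origin
The examples above use the default settings, which place the origin of the horizon plot at the midpoint of the data range. In ggHoriPlot the origin can be changed by passing the desired value to the origin argument of geom_horizon(). For example, to use the median as the origin: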
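# Horizon plot with the median as the origin
a <- dat_tab %>%
  ggplot() +
  geom_horizon(
    aes(x = x,
        y = y,
        fill = ..Cutpoints..),
    origin = 'median'
  ) +
  theme_few() +
  scale_fill_hcl()

# Cut points and colors used to draw the individual layers
cutpoints <- tibble(
  cuts = c(96.20134, 190.4594, 284.7174, -92.31478, -186.57283, -280.83089),
  names = c('ypos1', 'ypos2', 'ypos3', 'yneg1', 'yneg2', 'yneg3'),
  color = c("#D7E2D4", "#36ABA9", "#324DA0", "#F6DE90", "#E78200", "#A51122")
) %>%
  mutate(names = factor(names, rev(names))) %>%
  arrange(names)

# Median of y, used as the origin
me <- median(dat_tab$y, na.rm = TRUE)

# Note: plotAllLayers() is not defined in this article; it is assumed to be
# a helper function provided elsewhere.
b <- plotAllLayers(dat_tab, me, cutpoints$cuts, cutpoints$color)

# Stack the layer-by-layer view above the horizon plot (patchwork)
b/a + plot_layout(guides = 'collect', heights = c(6, 1))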
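To see why the choice of origin matters, the short base R sketch below compares the midpoint of the range with the median for the same example series. It only uses the formulas that already appear in this article (sum(range(y))/2 and median(y)); the variable names are chosen for illustration.

```r
# Rebuild the example series used throughout this article
x <- 1:300
y <- x * sin(0.1 * x)

# Default origin: midpoint of the data range
midpoint <- sum(range(y, na.rm = TRUE)) / 2

# Alternative origin: median of the data
med <- median(y, na.rm = TRUE)

# The two candidate origins side by side
c(midpoint = midpoint, median = med)

# Share of observations that fall above each origin
c(above_midpoint = mean(y > midpoint), above_median = mean(y > med))
```

With the median as origin, roughly half of the observations end up on each side, whereas the midpoint depends only on the two most extreme values.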
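Changing the horizon scale
Besides the origin, ggHoriPlot also lets you customize the horizon scale, that is, how many cuts are made and where they fall. The default number of cuts is 6, as in all the examples above, but it can be set to any other integer, for example 5: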
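# Horizon plot with 5 cuts around the midpoint of the range
a <- dat_tab %>%
  ggplot() +
  geom_horizon(
    aes(x = x,
        # xend = xend,
        y = y,
        fill = ..Cutpoints..),
    horizonscale = 5,
    origin = 'midpoint'
  ) +
  theme_few() +
  scale_fill_hcl()

# Cut points and colors used to draw the individual layers
cutpoints <- tibble(
  cuts = c(97.33383, 210.44349, -128.88551, -241.99518, -355.10485),
  names = c('ypos1', 'ypos2', 'yneg1', 'yneg2', 'yneg3'),
  color = c("#69BBAB", "#324DA0", "#FEFDBE", "#EB9C00", "#A51122")
) %>%
  mutate(names = factor(names, rev(names))) %>%
  arrange(names)

# Midpoint of the range of y, used as the origin
mid <- sum(range(dat_tab$y, na.rm = TRUE))/2

# Note: plotAllLayers() is not defined in this article; it is assumed to be
# a helper function provided elsewhere.
b <- plotAllLayers(dat_tab, mid, cutpoints$cuts, cutpoints$color)

# Stack the layer-by-layer view above the horizon plot (patchwork)
b/a + plot_layout(guides = 'collect', heights = c(5, 1))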
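The cut values listed above appear to be consistent with evenly spaced steps of (data range) / (number of cuts) on either side of the origin. The base R sketch below reproduces that spacing for the 5-cut, midpoint-origin case. This is an inference from the numbers in this article, not a statement about how ggHoriPlot computes its cut points internally; the split into two positive and three negative cuts is copied from the example rather than derived.

```r
# Rebuild the example series used throughout this article
x <- 1:300
y <- x * sin(0.1 * x)

n_cuts <- 5                          # same value as horizonscale above
origin <- sum(range(y)) / 2          # midpoint of the data range
step   <- diff(range(y)) / n_cuts    # assumed spacing between neighboring cuts

# Two cuts above and three below the origin, as in the example above
pos_cuts <- origin + step * 1:2
neg_cuts <- origin - step * 1:3

round(c(pos_cuts, neg_cuts), 5)
```

If the assumption holds, the printed values should be close to the cuts column of the cutpoints table above.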
Faceting
ggHoriPlot also works with faceting, which is especially useful when several time series need to be compared:
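# Second series, longer and shifted upward by 100
x <- 1:400
y <- x * sin(0.2 * x) + 100
dat_tab_bis <- tibble(x = x,
                      xend = x + 0.9999,
                      y = y)

# Label both series and stack them into one table
tab_tot <- mutate(dat_tab, type = 'A') %>%
  bind_rows(mutate(dat_tab_bis, type = 'B'))

# One horizon plot per series, each with its own y scale
tab_tot %>%
  ggplot() +
  geom_horizon(aes(x = x, y = y)) +
  facet_wrap(~type, ncol = 1, scales = 'free_y') +
  theme_few() +
  scale_fill_hcl()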
02
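Applied examples
Peak time of day for activities
Sports and leisure activities take place at different times of day. A horizon plot can compress this time-series data into an informative, easy-to-read chart: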
# Load the example dataset
utils::data(sports_time)

sports_time %>% ggplot() +
  geom_horizon(aes(time/60, p), origin = 'min', horizonscale = 4) +
  facet_wrap(~activity, ncol = 1, strip.position = 'right') +
  scale_fill_hcl(palette = 'Peach', reverse = TRUE) +
  theme_few() +
  theme(
    panel.spacing.y = unit(0, "lines"),
    strip.text.y = element_text(angle = 0),
    legend.position = 'none',
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.y = element_blank(),
    panel.border = element_blank()
  ) +
  scale_x_continuous(
    name = 'Time',
    breaks = seq(from = 3, to = 27, by = 3),
    # Show hours as zero-padded clock times, wrapping around after 24
    labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
  ggtitle('Peak time of day for sports and leisure')
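The axis labels in the plot above come from a small formatting function that turns hours into clock times and wraps values past 24 back to the start of the day. The sketch below runs that same expression on the axis breaks in plain base R; the name hour_label is introduced here only for illustration.

```r
# Format hours as zero-padded clock times, wrapping around after 24
hour_label <- function(x) sprintf("%02d:00", as.integer(x %% 24))

# Same breaks as in scale_x_continuous() above
breaks <- seq(from = 3, to = 27, by = 3)
hour_label(breaks)
# "03:00" "06:00" "09:00" "12:00" "15:00" "18:00" "21:00" "00:00" "03:00"
```

The modulo step is what lets the axis run past midnight (24 becomes 00:00 and 27 becomes 03:00).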
COVID-19 cases in Asia
# Load the example dataset
utils::data(COVID)

COVID %>%
  ggplot() +
  geom_horizon(aes(date_mine,
                   y), origin = 'min', horizonscale = 4) +
  scale_fill_hcl(palette = 'BluGrn', reverse = TRUE) +
  facet_grid(countriesAndTerritories~.) +
  theme_few() +
  theme(
    panel.spacing.y = unit(0, "lines"),
    strip.text.y = element_text(size = 7, angle = 0, hjust = 0),
    legend.position = 'none',
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.y = element_blank(),
    panel.border = element_blank()
  ) +
  scale_x_date(expand = c(0,0), date_breaks = "1 month", date_labels = "%b") +
  ggtitle('Cumulative number for 14 days of COVID-19 cases per 100,000',
          'in Asia, 2020') +
  xlab('Date')
Simple repeat content
Horizon plots can also be used for genomic data, such as the simple repeat content of the human genome.
# Load the example dataset
utils::data(rmsk)

# Keep only values inside the 1.5 * IQR fences.
# Note: despite its name, `outlier` is TRUE for values that are NOT outliers.
cutpoint_tab <- rmsk %>%
  ungroup() %>%
  mutate(
    outlier = between(
      p_repeat,
      quantile(p_repeat, 0.25, na.rm = TRUE) - 1.5*IQR(p_repeat, na.rm = TRUE),
      quantile(p_repeat, 0.75, na.rm = TRUE) + 1.5*IQR(p_repeat, na.rm = TRUE))) %>%
  filter(outlier)

# Origin and cut points computed from the outlier-free values
ori <- sum(range(cutpoint_tab$p_repeat, na.rm = TRUE))/2
sca <- seq(range(cutpoint_tab$p_repeat)[1],
           range(cutpoint_tab$p_repeat)[2],
           length.out = 6)

rmsk %>%
  ggplot() +
  geom_horizon(aes(x = bin, xend = bin_2, y = p_repeat, fill = ..Cutpoints..),
               origin = ori, horizonscale = sca) +
  facet_grid(genoName~., switch = 'y') +
  theme_few() +
  theme(
    panel.spacing.y = unit(0, "lines"),
    strip.text.y.left = element_text(size = 7, angle = 0, hjust = 1),
    legend.position = c(0.85, 0.4),
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.y = element_blank(),
    panel.border = element_blank()
  ) +
  scale_x_continuous(expand = c(0,0)) +
  scale_fill_hcl() +
  ggtitle('Simple repeat content along the human genome',
          'in 100 kb windows') +
  xlab('Position') +
  guides(fill = guide_legend(title = "% of repeats"))
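The example above derives its origin and cut points only from values that lie within 1.5 times the interquartile range of the quartiles, so that a few extreme windows do not stretch the color scale. The base R sketch below shows the same filtering and the same origin and cut-point formulas on a made-up vector; the numbers and variable names are hypothetical and chosen only for illustration.

```r
# Toy data with one extreme value (hypothetical)
p <- c(1, 2, 3, 4, 5, 100)

# Lower and upper fences: quartiles extended by 1.5 * IQR
lower <- quantile(p, 0.25, na.rm = TRUE) - 1.5 * IQR(p, na.rm = TRUE)
upper <- quantile(p, 0.75, na.rm = TRUE) + 1.5 * IQR(p, na.rm = TRUE)

# Keep only the values inside the fences
kept <- p[p >= lower & p <= upper]

# Origin and six evenly spaced cut points from the filtered values,
# mirroring `ori` and `sca` in the example above
ori <- sum(range(kept)) / 2
sca <- seq(min(kept), max(kept), length.out = 6)

list(kept = kept, origin = ori, cuts = sca)
```

Here the value 100 falls outside the fences and is ignored when the scale is computed, although in the plot above all rows of rmsk are still drawn.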
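Editor's summary
As a relatively new R package, ggHoriPlot is well suited to drawing horizon plots: basic figures take little code, and because it builds on ggplot2 it can present large datasets in clear, attractive plots.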