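Cluster analysis is well-worn territory: "birds of a feather flock together" pretty much sums up the idea.
More than the clustering itself, I enjoy visualizing its results. I have covered plenty of approaches before, but the one in this post is well worth a strong recommendation.
Loading the data and R packages
To stay consistent with earlier posts, I'll use the same dataset as before.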
data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))
# Take a quick look at the structure of the data
dim(nutrient)
## [1] 27 5
str(nutrient)
## 'data.frame': 27 obs. of 5 variables:
## $ energy : int 340 245 420 375 180 115 170 160 265 300 ...
## $ protein: int 20 21 15 19 22 20 25 26 20 18 ...
## $ fat : int 28 17 39 32 10 3 7 5 20 25 ...
## $ calcium: int 9 9 7 9 17 8 12 14 9 9 ...
## $ iron : num 2.6 2.7 2 2.6 3.7 1.4 1.5 5.9 2.6 2.3 ...
Next comes the R package this post is about: dendextend. Conveniently, it works with the pipe operator!
suppressPackageStartupMessages(library(tidyverse))
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.15.2
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags:
## https://stackoverflow.com/questions/tagged/dendextend
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
## Attaching package: 'dendextend'
## The following object is masked from 'package:stats':
##
## cutree
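Running the cluster analysis
Building a dendrogram object with a pipeline feels quite neat:
dend <- nutrient %>%
  dist() %>% # compute the distance matrix (Euclidean by default)
  hclust() %>% # hierarchical clustering (complete linkage by default)
  as.dendrogram() # convert the result to a dendrogram object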
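The pipeline above is shorthand for nested function calls. As a minimal sketch (assuming the flexclust and dendextend packages are installed; `dend_nested` is a name made up for this illustration), the two forms can be compared and the resulting object inspected with `nleaves()` from dendextend:

```r
library(magrittr)    # provides %>%
library(dendextend)

data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))

# Pipe form, as in the post
dend <- nutrient %>% dist() %>% hclust() %>% as.dendrogram()

# Equivalent nested form (dend_nested is a hypothetical name)
dend_nested <- as.dendrogram(hclust(dist(nutrient)))

# Both forms should give the same leaf labels in the same order
identical(labels(dend), labels(dend_nested))

# Basic checks on the dendrogram object
class(dend)          # class of the object
nleaves(dend)        # number of leaves, one per row of nutrient
head(labels(dend))   # first few leaf labels, in plotting order
```

Reading the pipe left to right mirrors the order in which the steps actually run, which is the main reason to prefer it over the nested form.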
Plotting
Calling plot directly already produces a figure, and it allows some styling as well:
plot(dend)
But dendextend, the package introduced today, is more convenient and friendlier to work with, and it supports piping!
dend %>%
  # branch settings
  set("branches_col", "grey") %>%
  set("branches_lwd", 3) %>%
  # label settings
  set("labels_col", "orange") %>%
  set("labels_cex", 0.8) %>%
  # leaf point settings
  set("leaves_pch", 19) %>%
  set("leaves_cex", 0.7) %>%
  set("leaves_col", "red") %>%
  # draw the plot
  plot()
Look how smooth and readable that is!
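One detail worth keeping in mind: `set()` returns a modified copy rather than changing the dendrogram in place, so the result has to be assigned if you want to reuse it. A minimal sketch, assuming flexclust and dendextend are installed (`dend_styled` is a name made up for this illustration), using `labels_colors()` to read back what was stored:

```r
library(magrittr)    # provides %>%
library(dendextend)

data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))
dend <- nutrient %>% dist() %>% hclust() %>% as.dendrogram()

# set() returns a modified copy; dend itself is left unchanged
dend_styled <- dend %>%
  set("labels_col", "orange") %>%
  set("labels_cex", 0.8)

# Inspect the label colors stored on each object
labels_colors(dend)                  # the original, for comparison
unique(labels_colors(dend_styled))   # the styled copy

# The styled copy can be plotted later, as many times as needed
plot(dend_styled)
```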
Further styling
Finer-grained styling is supported too, such as adding rectangles around clusters or coloring by group:
par(mar = c(1, 1, 1, 7))
dend %>%
  # color the labels and branches of the 3 clusters differently
  set("labels_col", value = c("skyblue", "orange", "grey"), k = 3) %>%
  set("branches_k_color", value = c("skyblue", "orange", "grey"), k = 3) %>%
  # draw horizontally
  plot(horiz = TRUE, axes = FALSE)
# add a dashed vertical line at height 350
abline(v = 350, lty = 2)
# add a shaded rectangle around the cluster selected by x = 1
rect.dendrogram(dend, k = 3, lty = 5, lwd = 0, x = 1, horiz = TRUE, col = rgb(0.1, 0.2, 0.4, 0.1))
Pretty neat, isn't it?
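The three colors above correspond to cutting the tree into k = 3 clusters. To see which foods land in which cluster, the same cut can be made explicitly with `cutree()` (the dendextend version, which works on dendrogram objects). This is a minimal sketch assuming flexclust and dendextend are installed; `groups` is a name made up here, and the cluster numbers are not guaranteed to follow the left-to-right order of the colors in the plot:

```r
library(magrittr)    # provides %>%
library(dendextend)

data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))
dend <- nutrient %>% dist() %>% hclust() %>% as.dendrogram()

# Cut the tree into 3 clusters; the result is a named integer vector
groups <- cutree(dend, k = 3)

# Cluster sizes
table(groups)

# Foods assigned to each cluster
split(names(groups), groups)
```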
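Adding colored bars
Just like the dendrograms in WGCNA, you can add colored bars underneath the tree.
For example, group by the protein column: foods with protein of 20 or more are shown in red, and those below 20 in green.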
tmp <- ifelse(nutrient$protein < 20, "green", "red")
par(mar = c(10, 1, 1, 1))
dend %>%
  set("labels_col", value = c("skyblue", "orange", "grey"), k = 3) %>%
  set("branches_k_color", value = c("skyblue", "orange", "grey"), k = 3) %>%
  set("leaves_pch", 19) %>%
  set("nodes_cex", 0.7) %>%
  plot(axes = FALSE)
# add the color bar, labeled with the variable it encodes
colored_bars(colors = tmp, dend = dend, rowLabels = "protein")
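`colored_bars()` can also draw several bars at once when `colors` has one column per bar. The sketch below assumes flexclust and dendextend are installed; `bar_colors` is a name made up for this illustration, and the fat cutoff of 20 is an arbitrary value chosen only to have a second bar to show:

```r
library(magrittr)    # provides %>%
library(dendextend)

data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))
dend <- nutrient %>% dist() %>% hclust() %>% as.dendrogram()

# One column of colors per bar, in the original row order of nutrient.
# The fat cutoff of 20 is an arbitrary value chosen for this sketch.
bar_colors <- cbind(
  protein = ifelse(nutrient$protein < 20, "green", "red"),
  fat     = ifelse(nutrient$fat < 20, "skyblue", "orange")
)

par(mar = c(12, 4, 1, 1))
plot(dend)
# By default (sort_by_labels_order = TRUE) colored_bars() reorders the
# colors from data order to the leaf order of the dendrogram
colored_bars(colors = bar_colors, dend = dend, rowLabels = colnames(bar_colors))
```

Passing the colors in the original row order and letting `colored_bars()` reorder them avoids having to sort anything by hand.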
Tanglegram
# Prepare two dendrogram objects built with different linkage methods
d1 <- nutrient %>% dist() %>% hclust(method = "average") %>% as.dendrogram()
d2 <- nutrient %>% dist() %>% hclust(method = "complete") %>% as.dendrogram()
# Customize each dendrogram and put both into a list
dl <- dendlist(
  d1 %>%
    set("labels_col", value = c("skyblue", "orange", "grey"), k = 3) %>%
    set("branches_lty", 1) %>%
    set("branches_k_color", value = c("skyblue", "orange", "grey"), k = 3),
  d2 %>%
    set("labels_col", value = c("skyblue", "orange", "grey"), k = 3) %>%
    set("branches_lty", 1) %>%
    set("branches_k_color", value = c("skyblue", "orange", "grey"), k = 3)
)
# Draw the two trees side by side with tanglegram
tanglegram(dl,
  common_subtrees_color_lines = FALSE, highlight_distinct_edges = TRUE, highlight_branches_lwd = FALSE,
  margin_inner = 7,
  lwd = 2
)
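How tangled the connecting lines look depends on how the branches happen to be rotated. As a rough sketch (assuming flexclust and dendextend are installed; `e_before`, `e_after` and `dl_untangled` are names made up for this illustration), dendextend's `entanglement()` gives a score between 0 (labels perfectly aligned) and 1 (fully tangled), and `untangle()` searches heuristically for a rotation with fewer crossings:

```r
library(magrittr)    # provides %>%
library(dendextend)

data(nutrient, package = "flexclust")
row.names(nutrient) <- tolower(row.names(nutrient))

d1 <- nutrient %>% dist() %>% hclust(method = "average") %>% as.dendrogram()
d2 <- nutrient %>% dist() %>% hclust(method = "complete") %>% as.dendrogram()
dl <- dendlist(d1, d2)

# Entanglement score of the two trees as they are
e_before <- entanglement(dl)

# Rotate branches to try to reduce crossings (a heuristic search)
dl_untangled <- untangle(dl, method = "step2side")
e_after <- entanglement(dl_untangled)

# Compare the two scores
c(before = e_before, after = e_after)

# Plot the rotated pair
tanglegram(dl_untangled, margin_inner = 7, lwd = 2)
```

Rotating branches changes only the drawing order of the leaves, not the clustering itself, so the comparison between the two linkage methods stays the same.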
Very powerful. I learned something new again today!