106-R可视化30-底层绘图系统grid学习之重头创建ggplot对象之一

参考：
- 《R绘图系统》Paul Murrell
- 4.7 Building New Graphical Elements | Mastering Software Development in R (bookdown.org)[1]

前言

虽然ggplot2 和它的朋友们[[xx-R可视化30-ggplot又一拓展包之ggforce]], [[xx-R可视化xx-用ggalt体验ggplot新版DLC（拓展）]] 给我们提供了大量绘图的选项。比如通过操纵 geom_** 和stat_** 函数。

主要有两种场景，我们可能会需要自己创建一个绘图对象：

有时候，ggplot 并不能满足我们的一些绘图需求，比如：

上面提到的ggalt 与ggforce 这两个包出发点就是基于此的。

亦或是，有时候我们的绘图直接调用ggplot 调整比较麻烦，比如你的工作流对图片的格式，以及输出，比如统计学计算都有一定的要求，此时也可以手撕一个ggplot 对象；比如 [[58-R可视化7-用ggpubr与ggstatsplot快速出高颜值图]] 这两个包就是基于此开发出了快速绘制高颜值ggplot 的方法。

从某种程度来说，自定义一个绘图对象，有点类似于函数——同样都是把框架碓好，通过传入对象与设定的参数，达到可重复操作的目的。

在[[91-R可视化23-底层绘图系统grid学习之grob对象]] 与[[96-R可视化25-底层绘图系统grid学习之viewports]] 中，我们分别学习了操作viewport与创建底层grob 对象，以及组合、编辑、合并等操作。

现在，基于现有的ggplot 功力以及还没出新手村的grid 实力，正好我们才[[101-R可视化29-底层绘图系统grid学习之使用grid作图]] 昨晚，让我们趁热打铁。

1.1-重头创建ggplot 对象

这一部分会涉及到一些R 的面向对象编程。预警哦。

ggplot 对象的重头创建，主要有两个步骤：

ggproto() 方法从父类Geom创建ggproto 类，并个性化类中的方法和属性，以控制输出的图像；
通过上一步创建的类，创建一个geom_* 方法，该方法就类似于我们在一般ggplot 作图时使用的方法了，如geom_point；

1-创建ggproto 类

有如下几个设置的内容：

代码语言：javascript复制

GeomNEW <- ggproto("GeomNEW", Geom,
        required_aes = <a character vector of required aesthetics>,
        default_aes = aes(<default values for certain aesthetics>),
        draw_key = <a function used to draw the key in the legend>,
        draw_panel = function(data, panel_scales, coord) {
                ## Function that returns a grid grob that will 
                ## be plotted (this is where the real work occurs)
        }
)

这里从父类Geom 继承，并需要指定四个参数：

required_aes 设置图像必须需要的元素，比如我们想要画散点图，那么x, y 是必不可少的；
default_aes 设置默认的元素及其参数，比如shape, color, size 这些；
draw_key 用来配置图例的标记，可以直接调用ggplot 内部的函数draw_key_*，如abline, blank, crossbar 等等，基本和geom_** 对应；
draw_panel 用来绘制具体的图，其会接受三个参数，data 也就是ggplot 中指定的绘图数据框，panel_scales 则是一个包括x 与y 的scale 信息的列表，而coord 则是一个描述图象坐标系统的对象。

下面是一个示例：

代码语言：javascript复制

library(grid)
GeomMyPoint <- ggproto("GeomMyPoint", Geom,
                       required_aes = c("x", "y"),
                       default_aes = aes(shape = 1),
                       draw_key = draw_key_point,
                       draw_panel = function(data, panel_scales, coord) {
                         ## Transform the data first
                         coords <- coord$transform(data, panel_scales)
                         ## Let's print out the structure of the 'coords' and other object
                         sink("tmp.txt")
                         str(data)
                         print("data")
                         str(panel_scales)
                         print("panel_scales")
                         str(coord)
                         print("coord")
                         str(coords)
                         print("coords")
                         sink()
                         ## Construct a grid grob
                         pointsGrob(
                           x = coords$x,
                           y = coords$y,
                           pch = coords$shape
                         )
                       })

在draw_panel 方法中，通过coord 将数据转换为对应坐标轴的数值，目测这里ggplot 也提供了接口，我们后面可以直接在构建grid 对象时直接调用转换后结果中的内容，比如x,y 等aesthetics。

这里在创建draw_panel 方法的同时，同时保存这几个对象的结构，便于我们理解这个过程。

简单查看一下这个类：

代码语言：javascript复制

> str(GeomMyPoint)
Classes 'GeomMyPoint', 'Geom', 'ggproto', 'gg' <ggproto object: Class GeomMyPoint, Geom, gg>
    aesthetics: function
    default_aes: uneval
    draw_group: function
    draw_key: function
    draw_layer: function
    draw_panel: function
    extra_params: na.rm
    handle_na: function
    non_missing_aes: 
    optional_aes: 
    parameters: function
    required_aes: x y
    setup_data: function
    setup_params: function
    use_defaults: function
    super:  <ggproto object: Class Geom, gg> 
> class(GeomMyPoint)
[1] "GeomMyPoint" "Geom"        "ggproto"     "gg"

**2.1-创建一个`geom_*` 方法**

接下来我们就要就要借助ggplot2::layer 方法创建我们个性化的图层。

这里摘一下help 说明中的内容：

★A layer is a combination of data, stat and geom with a potential position adjustment. Usually layers are created using geom_* or stat_* calls but it can also be created directly using this function. ”

默认值为：

代码语言：javascript复制

layer(
  geom = NULL,
  stat = NULL,
  data = NULL,
  mapping = NULL,
  position = NULL,
  params = list(),
  inherit.aes = TRUE,
  check.aes = TRUE,
  check.param = TRUE,
  show.legend = NA,
  key_glyph = NULL,
  layer_class = Layer
)

按我个人理解，这个layer 方法创建的对象和我们平时利用geom_* or stat_* 创建的对象是一致的。

只不过，限定了这个layer 方法必须具体指定geom mapping data stat, position 这些参数：

代码语言：javascript复制

> tmp <- geom_point()
> tmp
geom_point: na.rm = FALSE
stat_identity: na.rm = FALSE
position_identity 
> tmp2 <- ggplot2::layer()
错误: Attempted to create layer with no geom.
Run `rlang::last_error()` to see where the error occurred.

而geom_* or stat_* 创建的对象相当于替我们配置了这些参数的默认值。

这里我们将ggplot2::layer 外面再包装一层函数，设置好默认值：

代码语言：javascript复制

geom_mypoint <- function(mapping = NULL, data = NULL, stat = "identity",
                         position = "identity", na.rm = FALSE, 
                         show.legend = NA, inherit.aes = TRUE, ...) {
  ggplot2::layer(
    geom = GeomMyPoint, mapping = mapping,  
    data = data, stat = stat, position = position, 
    show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

接下来我们就可以利用这个创建的自定义的geom_** 方法进绘图了：

代码语言：javascript复制

ggplot(data = iris, aes(Sepal.Length, Sepal.Width))   geom_mypoint()

怎么样，是不是挺丑的。

但是！这可是我们自己创建的ggplot 对象呢！

代码语言：javascript复制

> p <- ggplot(data = iris, aes(Sepal.Length, Sepal.Width))   geom_mypoint()
> class(p)
[1] "gg"     "ggplot"

这也就代表着我们可以调用ggplot 的方法来处理他：

代码语言：javascript复制

p   ggtitle("my ggplot object")

不过，即便我指定了color 参数，依旧没有变化：

代码语言：javascript复制

p <- ggplot(data = iris, aes(Sepal.Length, Sepal.Width),
            color = "red")   geom_mypoint()
p   ggtitle("my ggplot object")

我唯一能想到的方法，就是第一步创建类的时候直接处理 pointsGrob 对象：

代码语言：javascript复制

pointsGrob(
                           x = coords$x,
                           y = coords$y,
                           pch = coords$shape,
                           gp = gpar(color = "blue")
                         )

可即便如此，依旧没有任何变化。

后来思考了一下，应该是通过required_aes 来控制吧，发现还是没用。

哦，原来是参数写错了：

代码语言：javascript复制

                         ## Construct a grid grob
pointsGrob(
                           x = coords$x,
                           y = coords$y,
                           pch = coords$shape,
                           gp = grid::gpar(col = "blue")
                         )

不过，我可不可以制作出根据类别一列区分颜色的效果呢？

3-查看draw_panel 函数相关的几个对象

先前我们通过sink 捕获了几个对象：

代码语言：javascript复制

library(grid)
GeomMyPoint <- ggproto("GeomMyPoint", Geom,
                       required_aes = c("x", "y"),
                       default_aes = aes(shape = 1),
                       draw_key = draw_key_point,
                       draw_panel = function(data, panel_scales, coord) {
                         ## Transform the data first
                         coords <- coord$transform(data, panel_scales)
                         ## Let's print out the structure of the 'coords' and other object
                         sink("tmp.txt")
                         str(data)
                         print("data")
                         str(panel_scales)
                         print("panel_scales")
                         str(coord)
                         print("coord")
                         str(coords)
                         print("coords")
                         sink()
                         ## Construct a grid grob
                         pointsGrob(
                           x = coords$x,
                           y = coords$y,
                           pch = coords$shape
                         )
                       })

来查看一下。

data

代码语言：javascript复制

'data.frame': 150 obs. of  5 variables:
 $ x    : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ y    : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ group: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ shape: num  1 1 1 1 1 1 1 1 1 1 ...
[1] "data"

虽然我们在创建grob 对象的时候使用了shape，不过底下的三个是哪里来的呢？

panel_scales

这个包括x 与y 的scale 信息的列表对象，真的是非常的复杂：

coord

代码语言：javascript复制

Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
    aspect: function
    backtransform_range: function
    clip: on
    default: TRUE
    distance: function
    expand: TRUE
    is_free: function
    is_linear: function
    labels: function
    limits: list
    modify_scales: function
    range: function
    render_axis_h: function
    render_axis_v: function
    render_bg: function
    render_fg: function
    setup_data: function
    setup_layout: function
    setup_panel_guides: function
    setup_panel_params: function
    setup_params: function
    train_panel_guides: function
    transform: function
    super:  <ggproto object: Class CoordCartesian, Coord, gg> 
[1] "coord"

你和coords 仅仅差了一个字母，怎么就复杂了这么多捏？

这个coords 就是根据坐标转换后的data 了：

代码语言：javascript复制

'data.frame': 150 obs. of  5 variables:
 $ x    : num  0.247 0.197 0.146 0.121 0.222 ...
 $ y    : num  0.614 0.424 0.5 0.462 0.652 ...
 $ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ group: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ shape: num  1 1 1 1 1 1 1 1 1 1 ...
[1] "coords"

我们可以对比一下这个data 数据框：

代码语言：javascript复制

'data.frame': 150 obs. of  5 variables:
 $ x    : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ y    : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ group: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ shape: num  1 1 1 1 1 1 1 1 1 1 ...
[1] "data"

虽然coords 是data 经过了某种转换，可是我并没有发现其中的规律，比如x 与y 对应位置上的元素并非是等比的，不过也有可能打乱了顺序。

4-根据数据大小自动变化透明度

主要还是增强draw_panel 方法：

代码语言：javascript复制

GeomAutoTransparent <- ggproto("GeomAutoTransparent", Geom,
        required_aes = c("x", "y"),
        default_aes = aes(shape = 19),
        draw_key = draw_key_point,
        draw_panel = function(data, panel_scales, coord) {
                ## Transform the data first
                coords <- coord$transform(data, panel_scales)
                
                ## Compute the alpha transparency factor based on the
                ## number of data points being plotted
                n <- nrow(data)
                if(n > 100 && n <= 200)
                        coords$alpha <- 0.3
                else if(n > 200)
                        coords$alpha <- 0.15
                else
                        coords$alpha <- 1
                ## Construct a grid grob
                grid::pointsGrob(
                        x = coords$x,
                        y = coords$y,
                        pch = coords$shape,
                        gp = grid::gpar(alpha = coords$alpha)
                )
        })

这样我们的geom 对象就会根据nrow 来判断，选择合适的透明度：

代码语言：javascript复制

geom_transparent <- function(mapping = NULL, data = NULL, stat = "identity",
                         position = "identity", na.rm = FALSE, 
                         show.legend = NA, inherit.aes = TRUE, ...) {
        ggplot2::layer(
                geom = GeomAutoTransparent, mapping = mapping,  
                data = data, stat = stat, position = position, 
                show.legend = show.legend, inherit.aes = inherit.aes,
                params = list(na.rm = na.rm, ...)
        )
}

ggplot(data = worldcup, aes(Time, Shots))   geom_transparent()

不过需要注意的是，这样的方法虽然在透明度上没有什么问题，不过如果是在其他分类时，比如说按照列对shape 进行区分，则在分面是可能会有一些问题。

这个主要是每次计算，geom 都会根据分类获得的subgroup data 进行计算，虽然这样帮我们抽象了分组的方法，但是却可能造成分面与原先数据的差异，如果有这点的考虑的话，建议在设计data 时，就新增用于图形属性绘制的一列。

不过缺点就是需要用户自行调整了。这也是某种意义上的tradeoff吧。

可问题是，我前面也提到了，我们自己创建的这个geom_** 对象，根本没有办法接收诸如color, fill, alpha 在内的属性。这又改如何把data 中的对应列给到图像呢？

只能先在此处停顿一下了。

下一部分我们继续介绍剩下的内容。

参考资料

[1]

4.7 Building New Graphical Elements | Mastering Software Development in R (bookdown.org): https://bookdown.org/rdpeng/RProgDA/building-new-graphical-elements.html

data draw grid layer panel

0 人点赞

106-R可视化30-底层绘图系统grid学习之重头创建ggplot对象之一

前言

1.1-重头创建ggplot 对象

1-创建ggproto 类

2.1-创建一个geom_* 方法

3-查看draw_panel 函数相关的几个对象

4-根据数据大小自动变化透明度

参考资料

**2.1-创建一个`geom_*` 方法**