dplyr中的across操作

2022-11-15 11:00:55 浏览数 (1)

dplyr中的across函数取代了之前的xx_if/xx_at/xx_all,用法更加灵活,初学时觉得不如xx_if/xx_at/xx_all简单易懂,用习惯后真是利器!

主要是介绍across函数的用法,这是dplyr1.0才出来的一个函数,大大简化了代码

可用于对多列做同一个操作。

  • 一般用法
    • 陷阱
  • across其他连用
  • 和filter()连用

一般用法

代码语言:javascript复制
library(dplyr, warn.conflicts = FALSE)

across()有两个基本参数:

  • .cols:选择你想操作的列
  • .fn:你想进行的操作,可以使一个函数或者多个函数组成的列表

可以替代_if(),at_(),all_()

代码语言:javascript复制
starwars %>% 
  summarise(across(where(is.character), n_distinct))
## # A tibble: 1 × 8
##    name hair_color skin_color eye_color   sex gender homeworld species
##   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
## 1    87         13         31        15     5      3        49      38

可以直接写列名:

代码语言:javascript复制
starwars %>% 
  group_by(species) %>% 
  filter(n() > 1) %>% 
  summarise(across(c(sex, gender, homeworld), n_distinct))
## # A tibble: 9 × 4
##   species    sex gender homeworld
##   <chr>    <int>  <int>     <int>
## 1 Droid        1      2         3
## 2 Gungan       1      1         1
## 3 Human        2      2        16
## 4 Kaminoan     2      2         1
## 5 Mirialan     1      1         1
## 6 Twi'lek      2      2         1
## 7 Wookiee      1      1         1
## 8 Zabrak       1      1         2
## 9 <NA>         1      1         3

也可以和where函数连用,省时省力:

代码语言:javascript复制
starwars %>% 
  group_by(homeworld) %>% 
  filter(n() > 1) %>% 
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
## # A tibble: 10 × 4
##    homeworld height  mass birth_year
##    <chr>      <dbl> <dbl>      <dbl>
##  1 Alderaan    176.  64         43  
##  2 Corellia    175   78.5       25  
##  3 Coruscant   174.  50         91  
##  4 Kamino      208.  83.1       31.5
##  5 Kashyyyk    231  124        200  
##  6 Mirial      168   53.1       49  
##  7 Naboo       175.  64.2       55  
##  8 Ryloth      179   55         48  
##  9 Tatooine    170.  85.4       54.6
## 10 <NA>        139.  82        334.

如果没有缺失值,可以直接写mean,

代码语言:javascript复制
library(tidyr)
starwars %>% drop_na() %>% 
  group_by(homeworld) %>% 
  filter(n() > 1) %>% 
  summarise(across(where(is.numeric), mean))
## # A tibble: 4 × 4
##   homeworld height  mass birth_year
##   <chr>      <dbl> <dbl>      <dbl>
## 1 Corellia    175   78.5       25  
## 2 Mirial      168   53.1       49  
## 3 Naboo       177   62         60  
## 4 Tatooine    181.  96         37.6

acorss支持多个函数同时使用,只要放入列表中即可:

代码语言:javascript复制
min_max <- list(
  min = ~min(.x, na.rm = TRUE), 
  max = ~max(.x, na.rm = TRUE)
)
starwars %>% summarise(across(where(is.numeric), min_max))
## # A tibble: 1 × 6
##   height_min height_max mass_min mass_max birth_year_min birth_year_max
##        <int>      <int>    <dbl>    <dbl>          <dbl>          <dbl>
## 1         66        264       15     1358              8            896
starwars %>% summarise(across(c(height, mass, birth_year), min_max))
## # A tibble: 1 × 6
##   height_min height_max mass_min mass_max birth_year_min birth_year_max
##        <int>      <int>    <dbl>    <dbl>          <dbl>          <dbl>
## 1         66        264       15     1358              8            896

当然也是支持glue的:

代码语言:javascript复制
starwars %>% summarise(across(where(is.numeric), min_max, .names = "{.fn}.{.col}"))
## # A tibble: 1 × 6
##   min.height max.height min.mass max.mass min.birth_year max.birth_year
##        <int>      <int>    <dbl>    <dbl>          <dbl>          <dbl>
## 1         66        264       15     1358              8            896
代码语言:javascript复制
starwars %>% summarise(across(c(height, mass, birth_year), min_max, .names = "{.fn}.{.col}"))
## # A tibble: 1 × 6
##   min.height max.height min.mass max.mass min.birth_year max.birth_year
##        <int>      <int>    <dbl>    <dbl>          <dbl>          <dbl>
## 1         66        264       15     1358              8            896

分开写也是可以的:

代码语言:javascript复制
starwars %>% summarise(
  across(c(height, mass, birth_year), ~min(.x, na.rm = TRUE), .names = "min_{.col}"),
  across(c(height, mass, birth_year), ~max(.x, na.rm = TRUE), .names = "max_{.col}")
)
## # A tibble: 1 × 6
##   min_height min_mass min_birth_year max_height max_mass max_birth_year
##        <int>    <dbl>          <dbl>      <int>    <dbl>          <dbl>
## 1         66       15              8        264     1358            896

这种情况不能使用where(is.numeric),因为第2个across会使用新创建的列(“min_height”, “min_mass” and “min_birth_year”)。

可以放在tibble里解决:

代码语言:javascript复制
starwars %>% summarise(
  tibble(
    across(where(is.numeric), ~min(.x, na.rm = TRUE), .names = "min_{.col}"),
    across(where(is.numeric), ~max(.x, na.rm = TRUE), .names = "max_{.col}")  
  )
)
## # A tibble: 1 × 6
##   min_height min_mass min_birth_year max_height max_mass max_birth_year
##        <int>    <dbl>          <dbl>      <int>    <dbl>          <dbl>
## 1         66       15              8        264     1358            896

陷阱

在使用where(is.numeric),要注意下面这种情况:

代码语言:javascript复制
df <- data.frame(x = c(1, 2, 3), y = c(1, 4, 9))

df %>% 
  summarise(n = n(), across(where(is.numeric), sd))
##    n x        y
## 1 NA 1 4.041452

n这里是3,是一个常数,所以它的sd变成了NA,可以通过换一下顺序解决:

代码语言:javascript复制
df %>% summarise(across(where(is.numeric), sd),
                 n = n()
                 )
##   x        y n
## 1 1 4.041452 3

或者通过下面两种方法解决:

代码语言:javascript复制
df %>% 
  summarise(n = n(), across(where(is.numeric) & !n, sd))
##   n x        y
## 1 3 1 4.041452

df %>% 
  summarise(
    tibble(n = n(), across(where(is.numeric), sd))
  )
##   n x        y
## 1 3 1 4.041452

across其他连用

还可以和group_by()/count()/distinct()连用。

和filter()连用

across()不能直接和filter()连用,和filter()连用的是if_any()if_all()

  • if_any():任何一列满足条件即可
  • if_all():所有列都要满足条件
代码语言:javascript复制
starwars %>% 
  filter(if_any(everything(), ~ !is.na(.x)))
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
代码语言:javascript复制
starwars %>% 
  filter(if_all(everything(), ~ !is.na(.x)))
## # A tibble: 29 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  3 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  4 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  5 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  6 Biggs D…    183    84 black      light      brown           24   male  mascu…
##  7 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
##  8 Anakin …    188    84 blond      fair       blue            41.9 male  mascu…
##  9 Chewbac…    228   112 brown      unknown    blue           200   male  mascu…
## 10 Han Solo    180    80 brown      fair       brown           29   male  mascu…
## # … with 19 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

0 人点赞