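Today we continue with the msleep dataset to explore column selection in dplyr, and finish with a few examples of row selection.
Selecting by slice
- Select a range of columns, from one column through another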
msleep %>%
select(name:order)
# A tibble: 83 x 4
# name genus vore order
# <chr> <chr> <chr> <chr>
# 1 Cheetah Acinonyx carni Carnivora
# 2 Owl monkey Aotus omni Primates
# 3 Mountain beaver Aplodontia herbi Rodentia
# 4 Greater short-tailed shrew Blarina omni Soricomorpha
# 5 Cow Bos herbi Artiodactyla
# 6 Three-toed sloth Bradypus herbi Pilosa
# 7 Northern fur seal Callorhinus carni Carnivora
# 8 Vesper mouse Calomys NA Rodentia
# 9 Dog Canis carni Carnivora
#10 Roe deer Capreolus herbi Artiodactyla
# … with 73 more rows
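Since `name:order` is resolved by column position, the same range can also be written with indices. The following is a minimal sketch, assuming msleep is loaded from the ggplot2 package (the post does not show where it is loaded) and that `name` through `order` are the first four columns, as the output above suggests:

```r
library(dplyr)

# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

# Select the range by name and by position
by_name <- msleep %>% select(name:order)
by_position <- msleep %>% select(1:4)

# Compare the column names returned by the two calls
identical(names(by_name), names(by_position))
```

Selecting by name is usually the safer choice, because it keeps working if columns are later reordered.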
- Drop a range of columns
Drop the columns from sleep_total through awake
msleep %>% select(-(sleep_total:awake))
# A tibble: 83 x 7
# name genus vore order conservation brainwt bodywt
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 Cheetah Acinonyx carni Carnivora lc NA 50
# 2 Owl monkey Aotus omni Primates NA 0.0155 0.48
# 3 Mountain beaver Aplodont… herbi Rodentia nt NA 1.35
# 4 Greater short-tailed… Blarina omni Soricomor… lc 0.00029 0.019
# 5 Cow Bos herbi Artiodact… domesticated 0.423 600
# 6 Three-toed sloth Bradypus herbi Pilosa NA NA 3.85
# 7 Northern fur seal Callorhi… carni Carnivora vu NA 20.5
# 8 Vesper mouse Calomys NA Rodentia NA NA 0.045
# 9 Dog Canis carni Carnivora domesticated 0.07 14
#10 Roe deer Capreolus herbi Artiodact… lc 0.0982 14.8
# … with 73 more rows
- Drop the columns from sleep_total through awake, but keep sleep_rem. Note that sleep_rem is added back as the last column.
msleep %>% select(-(sleep_total:awake),sleep_rem)
# A tibble: 83 x 8
# name genus vore order conservation brainwt bodywt sleep_rem
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 Cheetah Acinon… carni Carnivo… lc NA 50 NA
# 2 Owl monkey Aotus omni Primates NA 0.0155 0.48 1.8
# 3 Mountain beaver Aplodo… herbi Rodentia nt NA 1.35 2.4
# 4 Greater short-… Blarina omni Soricom… lc 0.00029 0.019 2.3
# 5 Cow Bos herbi Artioda… domesticated 0.423 600 0.7
# 6 Three-toed slo… Bradyp… herbi Pilosa NA NA 3.85 2.2
# 7 Northern fur s… Callor… carni Carnivo… vu NA 20.5 1.4
# 8 Vesper mouse Calomys NA Rodentia NA NA 0.045 NA
# 9 Dog Canis carni Carnivo… domesticated 0.07 14 2.9
#10 Roe deer Capreo… herbi Artioda… lc 0.0982 14.8 NA
# … with 73 more rows
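Selecting by pattern matching
❝select() syntax: select(data, ...) data: a data frame; ...: variable names or functions ❞
The examples so far mostly used variable names; now let's look at a few that use functions.
- Select columns whose names start with sleep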
msleep %>% select(name,starts_with('sleep'))
# A tibble: 83 x 4
# name sleep_total sleep_rem sleep_cycle
# <chr> <dbl> <dbl> <dbl>
# 1 Cheetah 12.1 NA NA
# 2 Owl monkey 17 1.8 NA
# 3 Mountain beaver 14.4 2.4 NA
# 4 Greater short-tailed shrew 14.9 2.3 0.133
# 5 Cow 4 0.7 0.667
# 6 Three-toed sloth 14.4 2.2 0.767
# 7 Northern fur seal 8.7 1.4 0.383
# 8 Vesper mouse 7 NA NA
# 9 Dog 10.1 2.9 0.333
#10 Roe deer 3 NA NA
# … with 73 more rows
Similar helper functions include
Function | Description |
---|---|
starts_with() | Starts with a prefix |
ends_with() | Ends with a suffix |
contains() | Contains a literal string |
matches() | Matches a regular expression |
num_range() | Numerical range like x01, x02, x03. |
one_of() | Variables in character vector. |
everything() | All variables. |
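To make a few of these helpers concrete, here is a minimal sketch. It assumes msleep comes from ggplot2; the `toy` data frame and the `wanted` vector are hypothetical and exist only for illustration. It also uses any_of() rather than one_of(), since the tidyselect documentation marks one_of() as superseded by any_of() and all_of().

```r
library(dplyr)

# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

# ends_with(): columns whose names end with the suffix "wt"
wt_cols <- msleep %>% select(name, ends_with("wt")) %>% names()

# any_of(): select from a character vector, silently skipping
# names that do not exist in the data
wanted <- c("name", "vore", "not_a_column")  # hypothetical vector
any_cols <- msleep %>% select(any_of(wanted)) %>% names()

# num_range(): hypothetical toy data, because msleep has no
# numbered columns; width = 2 pads the numbers to x01, x02
toy <- tibble(x01 = 1, x02 = 2, x03 = 3, y = 4)
range_cols <- toy %>% select(num_range("x", 1:2, width = 2)) %>% names()

wt_cols
any_cols
range_cols
```

all_of() behaves like any_of() but raises an error when a requested name is missing, which is useful when a missing column should not pass silently.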
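Let's look at a few more examples.
- Select columns whose names match the regular expression o.+er
Here . matches any character and + means one or more.
msleep %>% select(matches('o.+er'))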
# A tibble: 83 x 2
# order conservation
# <chr> <chr>
# 1 Carnivora lc
# 2 Primates NA
# 3 Rodentia nt
# 4 Soricomorpha lc
# 5 Artiodactyla domesticated
# 6 Pilosa NA
# 7 Carnivora vu
# 8 Rodentia NA
# 9 Carnivora domesticated
#10 Artiodactyla lc
# … with 73 more rows
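Before passing a regular expression such as `o.+er` to matches(), it can help to check which column names it hits. A minimal sketch in base R, assuming msleep comes from ggplot2; note that matches() ignores case by default while grepl() does not, which makes no difference here because all msleep column names are lowercase:

```r
# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

cols <- names(msleep)

# Keep the names that contain an "o", then one or more
# characters, then "er"
matched <- cols[grepl("o.+er", cols)]
matched
```

This is only a way to test a pattern; inside a pipeline, select(matches(...)) remains the more direct form.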
- Select columns whose names contain the string serv
msleep %>% select(contains('serv'))
# A tibble: 83 x 1
# conservation
# <chr>
# 1 lc
# 2 NA
# 3 nt
# 4 lc
# 5 domesticated
# 6 NA
# 7 vu
# 8 NA
# 9 domesticated
#10 lc
# … with 73 more rows
- Select all columns and reorder them
Move the awake column to the first position
msleep %>% select(awake,everything())
# A tibble: 83 x 11
# awake name genus vore order conservation sleep_total sleep_rem sleep_cycle
# <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 11.9 Chee… Acin… carni Carn… lc 12.1 NA NA
# 2 7 Owl … Aotus omni Prim… NA 17 1.8 NA
# 3 9.6 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA
# 4 9.1 Grea… Blar… omni Sori… lc 14.9 2.3 0.133
# 5 20 Cow Bos herbi Arti… domesticated 4 0.7 0.667
# 6 9.6 Thre… Brad… herbi Pilo… NA 14.4 2.2 0.767
# 7 15.3 Nort… Call… carni Carn… vu 8.7 1.4 0.383
# 8 17 Vesp… Calo… NA Rode… NA 7 NA NA
# 9 13.9 Dog Canis carni Carn… domesticated 10.1 2.9 0.333
#10 21 Roe … Capr… herbi Arti… lc 3 NA NA
# … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
- Select the numeric columns
msleep %>%
select_if(is.numeric) %>%
glimpse
Observations: 83
Variables: 6
$ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1, 3.0, 5.…
$ sleep_rem <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6, 0.8, 0…
$ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.3833333, NA…
$ awake <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9, 21.0, 18…
$ brainwt <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.07000, 0.…
$ bodywt <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.490, 0.045…
Similar predicates include is.character and is.factor.
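In dplyr 1.0.0 and later, the same predicate-based selection can be written inside select() with where(), and the scoped select_if() is marked as superseded. A minimal sketch, assuming msleep comes from ggplot2 and a dplyr version that supports where():

```r
library(dplyr)

# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

# Numeric columns via where(), compared with the select_if() form
numeric_where <- msleep %>% select(where(is.numeric)) %>% names()
numeric_if <- msleep %>% select_if(is.numeric) %>% names()
identical(numeric_where, numeric_if)

# Character columns, using a different predicate
character_cols <- msleep %>% select(where(is.character)) %>% names()

# A predicate can be combined with ordinary column names
name_and_numeric <- msleep %>% select(name, where(is.numeric))
```

One advantage of where() is that it composes with the other helpers in a single select() call.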
A few row-selection examples
- Randomly sample 5 rows
msleep %>% sample_n(5)
# A tibble: 5 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Star… Cond… omni Sori… lc 10.3 2.2 NA 13.7
# 2 Donk… Equus herbi Peri… domesticated 3.1 0.4 NA 20.9
# 3 Musk… Sunc… NA Sori… NA 12.8 2 0.183 11.2
# 4 Pig Sus omni Arti… domesticated 9.1 2.4 0.5 14.9
# 5 Hous… Mus herbi Rode… nt 12.5 1.4 0.183 11.5
# … with 2 more variables: brainwt <dbl>, bodywt <dbl>
- Randomly sample 10% of the rows
msleep %>% sample_frac(0.1)
# A tibble: 8 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Big … Epte… inse… Chir… lc 19.7 3.9 0.117 4.3
# 2 East… Tami… herbi Rode… NA 15.8 NA NA 8.2
# 3 Braz… Tapi… herbi Peri… vu 4.4 1 0.9 19.6
# 4 Pilo… Glob… carni Ceta… cd 2.7 0.1 NA 21.4
# 5 Musk… Sunc… NA Sori… NA 12.8 2 0.183 11.2
# 6 Chim… Pan omni Prim… NA 9.7 1.4 1.42 14.3
# 7 Slow… Nyct… carni Prim… NA 11 NA NA 13
# 8 Red … Vulp… carni Carn… NA 9.8 2.4 0.35 14.2
# … with 2 more variables: brainwt <dbl>, bodywt <dbl>
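Because the rows are drawn at random, each run returns a different sample. A minimal sketch of making the draw reproducible with set.seed(), using slice_sample(), which dplyr 1.0.0 introduced as the successor to sample_n() and sample_frac(). It assumes msleep comes from ggplot2, and the seed value 42 is arbitrary:

```r
library(dplyr)

# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

# Fix the random seed so the same rows are drawn each time
set.seed(42)  # arbitrary seed, chosen only for illustration
five_rows <- msleep %>% slice_sample(n = 5)

# Resetting the seed reproduces the same sample
set.seed(42)
five_again <- msleep %>% slice_sample(n = 5)
identical(five_rows, five_again)

# Sample a proportion of the rows instead of a fixed count
tenth <- msleep %>% slice_sample(prop = 0.1)
nrow(tenth)
```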
- Remove duplicate observations
No rows are exact duplicates, so all 83 rows are returned.
msleep %>% distinct()
# A tibble: 83 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Chee… Acin… carni Carn… lc 12.1 NA NA 11.9
# 2 Owl … Aotus omni Prim… NA 17 1.8 NA 7
# 3 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
# 4 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
# 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
# 6 Thre… Brad… herbi Pilo… NA 14.4 2.2 0.767 9.6
# 7 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3
# 8 Vesp… Calo… NA Rode… NA 7 NA NA 17
# 9 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9
#10 Roe … Capr… herbi Arti… lc 3 NA NA 21
# … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
- Remove observations with duplicate sleep_total values
Setting .keep_all = TRUE keeps all the other variables.
msleep %>% distinct(sleep_total,.keep_all = TRUE)
# A tibble: 65 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Chee… Acin… carni Carn… lc 12.1 NA NA 11.9
# 2 Owl … Aotus omni Prim… NA 17 1.8 NA 7
# 3 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
# 4 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
# 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
# 6 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3
# 7 Vesp… Calo… NA Rode… NA 7 NA NA 17
# 8 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9
# 9 Roe … Capr… herbi Arti… lc 3 NA NA 21
#10 Goat Capri herbi Arti… lc 5.3 0.6 NA 18.7
# … with 55 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
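To see what .keep_all changes, compare the call without it: distinct() then returns only the column it was given. A minimal sketch, assuming msleep comes from ggplot2; the base R comparison with duplicated() is added for illustration and is not part of the original examples:

```r
library(dplyr)

# Assumption: msleep is the dataset shipped with ggplot2
data(msleep, package = "ggplot2")

# Without .keep_all, only the sleep_total column is returned
only_key <- msleep %>% distinct(sleep_total)
ncol(only_key)

# The number of rows equals the number of distinct values
n_distinct(msleep$sleep_total) == nrow(only_key)

# With .keep_all = TRUE, the first row for each value is kept
with_all <- msleep %>% distinct(sleep_total, .keep_all = TRUE)

# Base R sketch of the same idea: drop rows whose sleep_total
# has already appeared earlier in the data
base_version <- msleep[!duplicated(msleep$sleep_total), ]
nrow(base_version) == nrow(with_all)
```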