Topic 1: Working with strings using the stringr package

2024-04-18 10:52:28

1. Working with strings

Distinguishing concepts

rm(list = ls())
if(!require(stringr))install.packages('stringr')
library(stringr)

x <- "The birch canoe slid on the smooth planks."
x
## [1] "The birch canoe slid on the smooth planks."

1. Getting the length of a string

str_length(x) # number of characters in the string
## [1] 42

length(x) # number of elements in the vector
## [1] 1
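The difference between the two calls above is easier to see with a vector of several strings. The following is a minimal sketch; the vector `words` is made up for illustration and is not part of the original tutorial.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."
words <- c("The", "birch", "canoe")  # hypothetical three-element vector

length(x)          # 1: x is a single string
str_length(x)      # 42: characters in that one string
length(words)      # 3: number of elements
str_length(words)  # 3 5 5: characters in each element
```

In short, `length()` counts elements, while `str_length()` is vectorised and counts characters per element.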

2. Splitting strings

str_split(x," ") # split x on spaces; returns a list with a single element
## [[1]]
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth"  "planks."

class(str_split(x," "))
## [1] "list"

x2 = str_split(x," ")[[1]];x2
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth"  "planks."

y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ") # split y on spaces; returns a list with three elements (several strings are split at once)
## [[1]]
## [1] "jimmy" "150"
##
## [[2]]
## [1] "nicker" "140"
##
## [[3]]
## [1] "tony" "152"

str_split(y," ",simplify = TRUE) # simplify to a character matrix; reshape it further as needed
##      [,1]     [,2]
## [1,] "jimmy"  "150"
## [2,] "nicker" "140"
## [3,] "tony"   "152"
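One possible follow-up to the simplified matrix is to turn it into a data frame with a numeric column. This is only a sketch: the column names `name` and `score` are assumptions chosen for illustration, since the original does not say what the numbers represent.

```r
library(stringr)

y <- c("jimmy 150", "nicker 140", "tony 152")
m <- str_split(y, " ", simplify = TRUE)  # 3 x 2 character matrix

# `name` and `score` are hypothetical column names
df <- data.frame(
  name  = m[, 1],
  score = as.numeric(m[, 2]),  # matrix cells are character, so convert explicitly
  stringsAsFactors = FALSE
)

mean(df$score)  # arithmetic works once the column is numeric
```

The explicit `as.numeric()` matters because every cell of the matrix returned by `str_split()` is a character string, including the ones that look like numbers.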

3. Extracting substrings by position

str_sub(x,5,9) # characters 5 through 9
## [1] "birch"

str_sub(x,5,-2) # negative positions count from the end of the string
## [1] "birch canoe slid on the smooth planks"
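A few more positional extractions, as a small sketch. It reuses `x` and the `y` vector from the splitting section; the variable names `first3`, `last7` and `scores` are made up here.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."
y <- c("jimmy 150", "nicker 140", "tony 152")

first3 <- str_sub(x, 1, 3)    # "The"
last7  <- str_sub(x, -7, -1)  # "planks." (both ends counted from the end)
scores <- str_sub(y, -3, -1)  # "150" "140" "152" (vectorised over y)
```

Counting from the end is convenient when the prefix varies in length but the suffix does not, as with the names in `y`.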

4. Detecting characters (returns a logical vector of the same length as the input)

str_detect(x2,"h") # does the element contain "h"?
## [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE

str_starts(x2,"T") # does the element start with "T"?
## [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

str_ends(x2,"e") # does the element end with "e"?
## [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
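Because the result is a logical vector as long as the input, it can be used directly for subsetting and counting. A minimal sketch, with `x2` rebuilt so the block runs on its own (`has_h` is a name introduced here):

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."
x2 <- str_split(x, " ")[[1]]

has_h <- str_detect(x2, "h")

x2[has_h]            # "The" "birch" "the" "smooth"
sum(has_h)           # 4 elements contain "h"
which(has_h)         # 1 2 6 7: their positions
str_subset(x2, "h")  # same as x2[has_h], in one call
```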

5. Replacing within strings

x2
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth"  "planks."

str_replace(x2,"o","A") # only the first match in each string is replaced
## [1] "The"     "birch"   "canAe"   "slid"    "An"      "the"     "smAoth"  "planks."

str_replace_all(x2,"o","A") # replace every match
## [1] "The"     "birch"   "canAe"   "slid"    "An"      "the"     "smAAth"  "planks."
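Two details about replacement that the calls above do not show: matching is case-sensitive by default, and `str_replace_all()` accepts a named vector to perform several replacements in one call. The sketch below illustrates both; the replacement characters are arbitrary choices for the example.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."
x2 <- str_split(x, " ")[[1]]

# Case-sensitive by default: "The" is left untouched
lower_only <- str_replace_all(x2, "t", "_")

# regex(..., ignore_case = TRUE) also matches the capital "T"
any_case <- str_replace_all(x2, regex("t", ignore_case = TRUE), "_")

# Named vector: names are patterns, values are replacements
multi <- str_replace_all(x2, c("o" = "A", "e" = "E"))
```

Here `lower_only` changes only "the" and "smooth", `any_case` also changes "The", and `multi` replaces every "o" with "A" and every "e" with "E".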

6. Removing characters

x
## [1] "The birch canoe slid on the smooth planks."

str_remove(x," ") # remove only the first match
## [1] "Thebirch canoe slid on the smooth planks."

str_remove_all(x," ") # remove every match
## [1] "Thebirchcanoeslidonthesmoothplanks."

#### Regular expressions are worth learning for string processing
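One reason regular expressions matter here: stringr treats every pattern as a regular expression by default, so a character such as `.` is a wildcard rather than a literal period. The following sketch shows the effect on `x`; the variable names are made up for the example.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# "." matches any character, so the first character "T" is removed
wildcard <- str_remove(x, ".")

# Two ways to target the literal period instead
escaped <- str_remove(x, "\\.")       # escape it inside the regex
literal <- str_remove(x, fixed("."))  # or switch off regex matching

# A character class removes any of the listed characters
no_vowels <- str_remove_all(x, "[aeiou]")
```

`wildcard` is "he birch canoe slid on the smooth planks.", while `escaped` and `literal` both drop only the final period.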

Exercise

# Read the table file, take the title column, extract every "Control" / "Vemurafenib", and convert to lowercase
library(rio)
a <- import("group.csv")
title <- a$title;title
## [1] "A375 cells 24h Control rep1"     "A375 cells 24h Control rep2"    
## [3] "A375 cells 24h Control rep3"     "A375 cells 24h Vemurafenib rep1"
## [5] "A375 cells 24h Vemurafenib rep2" "A375 cells 24h Vemurafenib rep3"

# The target is always the fourth word, so split on spaces into a matrix and take column 4
a <- str_split(title," ",simplify = TRUE);a
##      [,1]   [,2]    [,3]  [,4]          [,5]  
## [1,] "A375" "cells" "24h" "Control"     "rep1"
## [2,] "A375" "cells" "24h" "Control"     "rep2"
## [3,] "A375" "cells" "24h" "Control"     "rep3"
## [4,] "A375" "cells" "24h" "Vemurafenib" "rep1"
## [5,] "A375" "cells" "24h" "Vemurafenib" "rep2"
## [6,] "A375" "cells" "24h" "Vemurafenib" "rep3"

a <- a[,4];a
## [1] "Control"     "Control"     "Control"     "Vemurafenib" "Vemurafenib" "Vemurafenib"

b <- tolower(a);b # convert uppercase to lowercase
## [1] "control"     "control"     "control"     "vemurafenib" "vemurafenib" "vemurafenib"

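Alternatively, skip the split and extract by position with str_sub(title, 16, -6): the group name starts at character 16 (character 15 is the space before it) and ends at the sixth character from the end.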
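The positional alternative can be sketched as follows. Note that the start position used here is 16, because character 15 of each title is the space before the group name; starting at 15 would keep that leading space. The `title` vector is typed in from the output shown earlier so the block runs without `group.csv`. The `str_extract()` and `word()` variants are additional options that are not in the original.

```r
library(stringr)

title <- c("A375 cells 24h Control rep1", "A375 cells 24h Control rep2",
           "A375 cells 24h Control rep3", "A375 cells 24h Vemurafenib rep1",
           "A375 cells 24h Vemurafenib rep2", "A375 cells 24h Vemurafenib rep3")

# By position: character 16 through the sixth character from the end
by_position <- str_sub(title, 16, -6)

# By pattern: the first match of either group name
by_pattern <- str_extract(title, "Control|Vemurafenib")

# By word: the fourth space-separated word
by_word <- word(title, 4)

group <- tolower(by_position)
```

The positional version only works because every title shares the same 15-character prefix and a 5-character suffix; the pattern and word versions do not depend on that.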

Source: 生信技能树
