1. Working with strings
Key distinctions
rm(list = ls())
if(!require(stringr))install.packages('stringr')
library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
## [1] "The birch canoe slid on the smooth planks."
1. Getting the length of a string
str_length(x) # number of characters in the string
## [1] 42
length(x) # number of elements in the vector
## [1] 1
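The difference between the two functions is easier to see on a vector with more than one element. A minimal sketch, assuming a made-up vector `fruits` that is not part of the original example:

```r
library(stringr)

# `fruits` is a hypothetical example vector, chosen only for illustration
fruits <- c("apple", "fig", "banana")

n_chars <- str_length(fruits)  # characters in each element
n_elems <- length(fruits)      # number of elements in the vector

n_chars
n_elems
```

`str_length()` returns one value per element, while `length()` always returns a single number for the whole vector.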
2. Splitting strings
str_split(x," ") # split x on spaces; the result is a list with a single element
## [[1]]
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
class(str_split(x," "))
## [1] "list"
x2 = str_split(x," ")[[1]];x2
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ") # split y on spaces; the result is a list of three elements (several strings split at once)
## [[1]]
## [1] "jimmy" "150"
##
## [[2]]
## [1] "nicker" "140"
##
## [[3]]
## [1] "tony" "152"
str_split(y," ",simplify = TRUE) # simplify to a character matrix; adjust it afterwards as needed
##      [,1]     [,2]
## [1,] "jimmy"  "150"
## [2,] "nicker" "140"
## [3,] "tony"   "152"
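One possible follow-up adjustment of the simplified matrix is to turn it into a data frame with a numeric column. This is only a sketch: the column names `name` and `score` are assumptions, since the original does not say what the two fields mean.

```r
library(stringr)

y <- c("jimmy 150", "nicker 140", "tony 152")
m <- str_split(y, " ", simplify = TRUE)  # 3 x 2 character matrix

# Column names `name` and `score` are hypothetical labels for illustration
df <- data.frame(
  name  = m[, 1],
  score = as.numeric(m[, 2]),  # the matrix holds text, so convert explicitly
  stringsAsFactors = FALSE
)
df
```

The explicit `as.numeric()` matters because every cell of the matrix returned by `str_split()` is a character string.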
3. Extracting substrings by position
str_sub(x,5,9) # characters 5 through 9
## [1] "birch"
str_sub(x,5,-2) # negative positions count from the end of the string
## [1] "birch canoe slid on the smooth planks"
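Negative positions can be used for the start as well as the end, and `str_sub()` is applied to every element when it is given a vector. A short sketch reusing `x` and `x2` from above (the names `last_word` and `prefixes` are introduced here for illustration):

```r
library(stringr)

x  <- "The birch canoe slid on the smooth planks."
x2 <- str_split(x, " ")[[1]]

last_word <- str_sub(x, -7, -1)  # the final seven characters of x
prefixes  <- str_sub(x2, 1, 2)   # first two characters of each element of x2

last_word
prefixes
```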
4. Detecting characters: returns a logical vector of the same length as the input
str_detect(x2,"h")
## [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE
str_starts(x2,"T")
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
str_ends(x2,"e")
## [1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
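Because the result has the same length as the input, it can be used directly to subset or count elements. A minimal sketch (the variable names `has_h`, `with_h`, and `n_with_h` are illustrative only):

```r
library(stringr)

x2 <- c("The", "birch", "canoe", "slid", "on", "the", "smooth", "planks.")

has_h    <- str_detect(x2, "h")
with_h   <- x2[has_h]   # keep only the elements that contain "h"
n_with_h <- sum(has_h)  # TRUE counts as 1, so this counts the matches

with_h
n_with_h
```

`str_subset(x2, "h")` from stringr should give the same elements as `x2[has_h]` in a single call.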
5. Replacing within strings
x2
## [1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
str_replace(x2,"o","A") # only the first match in each string is replaced
## [1] "The" "birch" "canAe" "slid" "An" "the" "smAoth" "planks."
str_replace_all(x2,"o","A") # every match is replaced
## [1] "The" "birch" "canAe" "slid" "An" "the" "smAAth" "planks."
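One thing to watch for: the pattern argument is interpreted as a regular expression by default, so characters such as `.` have a special meaning. A small sketch of the difference, using the string `"planks."` from the example (the variable names are illustrative):

```r
library(stringr)

s <- "planks."

# "." is a regex wildcard, so the first character of the string is replaced
as_regex <- str_replace(s, ".", "!")

# fixed() asks stringr to match the literal dot instead
as_literal <- str_replace(s, fixed("."), "!")

as_regex
as_literal
```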
6. Removing characters
x
## [1] "The birch canoe slid on the smooth planks."
str_remove(x," ") # removes only the first match
## [1] "Thebirch canoe slid on the smooth planks."
str_remove_all(x," ") # removes every match
## [1] "Thebirchcanoeslidonthesmoothplanks."
#### Regular expressions are worth learning for string processing
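As a taste of what regular expressions add, the same removal functions accept patterns that describe a whole class of characters. A hedged sketch, with patterns chosen here purely for illustration:

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# Character class: any one of the lowercase vowels
no_vowels <- str_remove_all(x, "[aeiou]")

# \\s matches a whitespace character; + means "one or more"
no_space <- str_remove_all(x, "\\s+")

no_vowels
no_space
```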
Exercise
# Read the table file, extract the title column, pull out every "Control" / "Vemurafenib" value, and convert it to lowercase
library(rio)
a <- import("group.csv")
title <- a$title;title
## [1] "A375 cells 24h Control rep1" "A375 cells 24h Control rep2"
## [3] "A375 cells 24h Control rep3" "A375 cells 24h Vemurafenib rep1"
## [5] "A375 cells 24h Vemurafenib rep2" "A375 cells 24h Vemurafenib rep3"
# The value to extract is always the fourth word, so split on spaces into a matrix and take the fourth column
a <- str_split(title," ",simplify = TRUE);a
##      [,1]   [,2]    [,3]  [,4]          [,5]
## [1,] "A375" "cells" "24h" "Control"     "rep1"
## [2,] "A375" "cells" "24h" "Control"     "rep2"
## [3,] "A375" "cells" "24h" "Control"     "rep3"
## [4,] "A375" "cells" "24h" "Vemurafenib" "rep1"
## [5,] "A375" "cells" "24h" "Vemurafenib" "rep2"
## [6,] "A375" "cells" "24h" "Vemurafenib" "rep3"
a <- a[,4];a
## [1] "Control" "Control" "Control" "Vemurafenib" "Vemurafenib" "Vemurafenib"
b <- tolower(a);b # convert uppercase to lowercase
## [1] "control" "control" "control" "vemurafenib" "vemurafenib" "vemurafenib"
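Alternatively, skip the split and take characters 16 through -6 directly with str_sub() (position 15 is the space before the group name).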
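The position-based alternative can be sketched as follows. Counting characters, the group name starts at position 16 (position 15 is the space before it) and ends six characters from the end, just before `" rep1"`. To keep the sketch self-contained, `title` is typed in from the printed output above instead of being read from group.csv, and the name `group` is introduced here for illustration.

```r
library(stringr)

# Typed in from the printed output above; the original reads it from group.csv
title <- c("A375 cells 24h Control rep1",
           "A375 cells 24h Control rep2",
           "A375 cells 24h Control rep3",
           "A375 cells 24h Vemurafenib rep1",
           "A375 cells 24h Vemurafenib rep2",
           "A375 cells 24h Vemurafenib rep3")

# Start at character 16, stop at the 6th character from the end
group <- tolower(str_sub(title, 16, -6))
group
```

This only works because every title has the same fixed-width prefix (`"A375 cells 24h "`) and a suffix of the same width; splitting on spaces is the more robust choice when those widths can vary.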
Adapted from 生信技能树.