R Language: String Manipulation

Reposted article, 2019-06-01 22:28. Series: Learning R, Basics

Common string operations include splitting, concatenating, replacing, and extracting.

Splitting

strsplit

strsplit returns a list, with one element for each input string.

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

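x: a character vector; every element is split
split: a character vector giving the pattern to split on. It is treated as a regular expression by default; with fixed = TRUE it is matched as literal text, which is also faster.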

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
strsplit(x, "e")

Output:

$`as`
[1] "asf" "f"

$qu
[1] "qw"  "rty"

[[3]]
[1] "yuiop["

[[4]]
[1] "b"

[[5]]
[1] "stuff.blah.y" "ch"
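Because split is a regular expression by default, splitting on a metacharacter such as "." needs care. The following is a minimal sketch in base R; the variable names are illustrative only.

```r
x <- "a.b.c"

# As a regex, "." matches every character, so only empty pieces remain
regex_parts <- strsplit(x, ".")[[1]]

# fixed = TRUE treats "." as a literal dot
literal_parts <- strsplit(x, ".", fixed = TRUE)[[1]]

# Putting the dot in a character class is an equivalent regex solution
class_parts <- strsplit(x, "[.]")[[1]]

print(regex_parts)
print(literal_parts)
print(class_parts)
```

Using [[1]] pulls the single result out of the list that strsplit returns.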

str_split

The str_split function in the stringr package does the same job as base R's strsplit and also returns a list by default.

str_split(string, pattern, n = Inf, simplify = FALSE)

string: a character vector; every element is split
pattern: the pattern to split on (a regular expression by default)

library(stringr)
fruits <- c( "apples and oranges and pears and bananas","pineapples and mangos and guavas")
str_split(fruits, " and ")

Output:

[[1]]
[1] "apples"  "oranges" "pears"   "bananas"

[[2]]
[1] "pineapples" "mangos"     "guavas"
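The n and simplify arguments in the signature above are not exercised by the example. As a rough sketch (assuming stringr is installed; the variable names are made up for illustration), n caps the number of pieces and simplify = TRUE returns a character matrix instead of a list:

```r
library(stringr)

fruits <- c("apples and oranges and pears and bananas",
            "pineapples and mangos and guavas")

# n = 2: split at the first " and " only; the rest stays in one piece
first_split <- str_split(fruits, " and ", n = 2)

# simplify = TRUE: one row per input string, short rows padded with ""
as_matrix <- str_split(fruits, " and ", simplify = TRUE)

print(first_split[[1]])
print(dim(as_matrix))
```

The matrix form is convenient when every string is expected to yield the same number of pieces.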

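Concatenation

paste and paste0

paste and paste0 differ only in the default separator: paste inserts a space (sep = " ") between the pieces, while paste0 inserts nothing.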

paste (..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)

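...: one or more R objects, which are converted to character vectors. Single strings are joined into one string; vectors are joined element by element, with shorter vectors recycled (see the examples).
sep: the separator placed between the pieces
collapse: an optional string used to join the resulting elements into a single string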

paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
# Result
[1] "1st"  "2nd"  "3rd"  "4th"  "5th"  "6th"  "7th"  "8th"  "9th"  "10th" "11th" "12th"
paste(1:12, c("st", "nd", "rd", rep("th", 9)))
# Result
[1] "1 st"  "2 nd"  "3 rd"  "4 th"  "5 th"  "6 th"  "7 th"  "8 th"  "9 th"  "10 th" "11 th" "12 th"

# The shorter vector c("st", "nd") is recycled to length 12
paste(1:12, c("st", "nd"))
# Result
[1] "1 st"  "2 nd"  "3 st"  "4 nd"  "5 st"  "6 nd"  "7 st"  "8 nd"  "9 st"  "10 nd" "11 st" "12 nd"
paste0(1:12, c("st", "nd"))
# Result
[1] "1st"  "2nd"  "3st"  "4nd"  "5st"  "6nd"  "7st"  "8nd"  "9st"  "10nd" "11st" "12nd"

paste("I","love","you")
# Result
[1] "I love you"
paste0("I","love","you")
# Result
[1] "Iloveyou"
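The examples above rely on the default separators. A short base R sketch of a custom sep and of collapse (the variable names are illustrative):

```r
date_parts <- c("2019", "06", "01")

# sep goes between the arguments of a single call
joined_args <- paste("a", "b", sep = "-")

# collapse joins the elements of the result into one string
joined_vector <- paste(date_parts, collapse = "-")

# Both steps together: build "x1", "x2", "x3", then collapse them
labels <- paste0("x", 1:3, collapse = ", ")

print(joined_args)    # "a-b"
print(joined_vector)  # "2019-06-01"
print(labels)         # "x1, x2, x3"
```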

str_c

str_c(..., sep = "", collapse = NULL)

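str_c behaves like paste0: its default separator is the empty string.

str_c(1:12, c("st", "nd", "rd", rep("th", 9)))
# Result
[1] "1st"  "2nd"  "3rd"  "4th"  "5th"  "6th"  "7th"  "8th"  "9th"  "10th" "11th" "12th"
# Note: stringr 1.5.0 and later recycle only length-1 vectors, so the next call
# raises an error there; the result shown comes from older versions.
str_c(1:12, c("st", "nd"))
# Result
[1] "1st"  "2nd"  "3st"  "4nd"  "5st"  "6nd"  "7st"  "8nd"  "9st"  "10nd" "11st" "12nd"
str_c("I","love","you")
# Result
[1] "Iloveyou"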
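One place where str_c and paste0 differ is missing values. A small sketch, assuming stringr is installed (the variable names are illustrative):

```r
library(stringr)

ids <- c("a", NA)

# paste0 converts NA to the text "NA"
base_result <- paste0("id_", ids)

# str_c propagates the missing value instead
stringr_result <- str_c("id_", ids)

print(base_result)     # "id_a"  "id_NA"
print(stringr_result)  # "id_a"  NA
```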

Replacement

chartr

chartr(old, new, x)

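x: a character vector
old: the characters to be replaced. It must not be longer than new. Translation is character by character: each character in old is replaced by the character at the same position in new, so the length of the string never changes.
new: the replacement characters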

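chartr(old = "a",new = "c",c("a123","a15","a23"))
# Result
[1] "c123" "c15"  "c23"
chartr(old = "a12345",new = "c6789101456",c("a123","a15","a23"))
# Result
[1] "c678" "c61"  "c78"
# Take "a15": "a" is the 1st character of old, so it becomes the 1st character of new ("c");
# "1" is the 2nd character of old, so it becomes the 2nd character of new ("6");
# "5" is the 6th character of old, so it becomes the 6th character of new ("1").
# Hence "a15" becomes "c61".
chartr(old = "a1",new = "c4",c("a123","a15","a23"))
# Result
[1] "c423" "c45"  "c23"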
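Since chartr maps characters position by position and applies all mappings at once, it can swap characters in a single call. A brief base R sketch (the inputs are invented for illustration):

```r
# "a" and "b" are exchanged simultaneously, not one after the other
swapped <- chartr("ab", "ba", "abba")

# The same idea with four characters: A<->T and C<->G
complement <- chartr("ACGT", "TGCA", "GATTACA")

print(swapped)     # "baab"
print(complement)  # "CTAATGT"
```

Doing the same with two consecutive gsub calls would not work, because the second call would undo the first.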

sub

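sub replaces text in a string, but it does not modify the original string, so assign the result to a variable if you want to keep it. Also, sub replaces only the first match in each string.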

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

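pattern: a string containing a regular expression
replacement: the value that replaces the matched part
x: a character vector, or an R object that can be coerced to character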

str <- "Now is the time               "
sub(" +$", " 12:00", str) # regex: replace the trailing spaces of str with " 12:00"
# Result
[1] "Now is the time 12:00"
# We only called sub here and did not save the result; sub does not modify the original string.
print(str)
[1] "Now is the time               "

sub("Now","what",str)
# Result
[1] "what is the time               "

sub(pattern = "nd",replacement = "ND",c("andbndcnd","sndendfund"))
# Result: each element contains several "nd", but only the first one is replaced.
[1] "aNDbndcnd"  "sNDendfund"

gsub

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

gsub is used the same way as sub, except that it replaces every match rather than only the first.

gsub(pattern = "nd",replacement = "ND",c("andbndcnd","sndendfund"))
# Result
[1] "aNDbNDcND"  "sNDeNDfuND"
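Two arguments of sub and gsub are easy to overlook: fixed = TRUE for literal matching, and backreferences such as \\1 in the replacement. A minimal base R sketch with invented inputs:

```r
# As a regex, "." matches every character
all_replaced <- gsub(".", "-", "a.b.c")

# fixed = TRUE replaces only literal dots
dots_replaced <- gsub(".", "-", "a.b.c", fixed = TRUE)

# \\1 and \\2 refer to the first and second parenthesised groups
swapped <- sub("([0-9]+)-([0-9]+)", "\\2-\\1", "10-20")

print(all_replaced)   # "-----"
print(dots_replaced)  # "a-b-c"
print(swapped)        # "20-10"
```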

substr and substring

In their assignment form, these two functions replace part of a string, and they modify the variable itself.

substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value

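x, text: a character vector
start, first: integer; index of the first character to replace
stop, last: integer; index of the last character to replace (last defaults to 1000000L, which in practice means the end of the string)
value: the replacement string, recycled if necessary when its length differs from that of the vector being modified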

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
substr(shopping_list,1,3) <- "AAA"
# Result (value of shopping_list)
[1] "AAAles x4"    "AAA of flour" "AAA of sugar" "AAAk x2"

# substr requires stop; use substring when the end index is omitted
substring(shopping_list,1) <- "AAA"
# Result
[1] "AAAles x4"    "AAA of flour" "AAA of sugar" "AAAk x2"

# The strings keep their original lengths, so the replacement is truncated
substr(shopping_list,1,20) <- "yesterday once more"
# Result
[1] "yesterday"    "yesterday on" "yesterday on" "yesterd"

substring(shopping_list,1) <- "yesterday once more"
# Result
[1] "yesterday"    "yesterday on" "yesterday on" "yesterd"
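The truncation seen above follows from one rule: the assignment form of substr never changes the length of the string. A short base R sketch with made-up values:

```r
x <- "abcdef"
substr(x, 2, 3) <- "ZZZZ"   # the slot holds two characters, so only "ZZ" is used

y <- "abcdef"
substr(y, 2, 4) <- "Q"      # only one character supplied, so only position 2 changes

print(x)  # "aZZdef"
print(y)  # "aQcdef"
```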

str_replace and str_replace_all

str_replace and str_replace_all come from the stringr package.

str_replace(string, pattern, replacement) # like sub: replaces only the first match
str_replace_all(string, pattern, replacement) # like gsub: replaces every match

fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "-") # regex: replace a lowercase a, e, i, o or u with "-"
# Result
[1] "-ne apple"     "tw- pears"     "thr-e bananas"
str_replace_all(fruits, "[aeiou]", "-")
# Result
[1] "-n- -ppl-"     "tw- p--rs"     "thr-- b-n-n-s"

str_sub

From the stringr package.

str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_sub(shopping_list,1,3) <- "AAA"
# Result (value of shopping_list)
[1] "AAAles x4"    "AAA of flour" "AAA of sugar" "AAAk x2"
# With end omitted, everything from position 1 to the end is replaced
str_sub(shopping_list,1) <- "AAA"
# Result
[1] "AAA" "AAA" "AAA" "AAA"
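Unlike the assignment form of substr, the assignment form of str_sub can change the length of the string, and it accepts negative indices that count from the end. A small sketch, assuming stringr is installed (values are illustrative):

```r
library(stringr)

x <- "abcdef"
str_sub(x, 1, 2) <- "12345"   # two characters replaced by five

y <- "abcdef"
str_sub(y, -2, -1) <- ""      # the last two characters are removed

print(x)  # "12345cdef"
print(y)  # "abcd"
```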

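Extraction

substr and substring

substr(x, start, stop)
substring(text, first, last = 1000000L)

substr("abcdef", 2, 4)
# Result
[1] "bcd"
# substring recycles its arguments; substr would return only "a" here,
# because its result always has the same length as x
substring("abcdef", 1:6, 1:6)
# Result
[1] "a" "b" "c" "d" "e" "f"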

str_extract and str_extract_all

From the stringr package. str_extract returns the first match in each string; str_extract_all returns all matches as a list.

str_extract(string, pattern)
str_extract_all(string, pattern, simplify = FALSE)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "[a-z]+")
# Result
[1] "apples" "bag"    "bag"    "milk"
str_extract_all(shopping_list, "[a-z]+")
# Result
[[1]]
[1] "apples" "x"

[[2]]
[1] "bag"   "of"    "flour"

[[3]]
[1] "bag"   "of"    "sugar"

[[4]]
[1] "milk" "x"
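Two details are worth seeing in code: str_extract returns NA where nothing matches, and simplify = TRUE turns the list from str_extract_all into a matrix. A sketch assuming stringr is installed (the variable names are illustrative):

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# Strings without a digit give NA
first_digit <- str_extract(shopping_list, "\\d")

# One row per string, padded with "" where a string has fewer matches
word_matrix <- str_extract_all(shopping_list, "[a-z]+", simplify = TRUE)

print(first_digit)  # "4" NA  NA  "2"
print(dim(word_matrix))
```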

str_sub

From the stringr package.

str_sub(string, start = 1L, end = -1L)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_sub(shopping_list,1,5)
# Result
[1] "apple" "bag o" "bag o" "milk "
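The default end = -1L in the signature hints that str_sub accepts negative positions, counted from the end of the string. A brief sketch, assuming stringr is installed (the variable names are made up):

```r
library(stringr)

last_three <- str_sub("abcdef", -3)                 # from third-last to the end
inner      <- str_sub("abcdef", 2, -2)              # drop first and last character
halves     <- str_sub("abcdef", c(1, 4), c(3, 6))   # start and end are vectorised

print(last_three)  # "def"
print(inner)       # "bcde"
print(halves)      # "abc" "def"
```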

Measuring string length

nchar

nchar(x, type = "chars", allowNA = FALSE, keepNA = NA) # takes a character vector and returns a vector with the length of each string
nzchar(x, keepNA = FALSE) # quick test of whether each element is a non-empty string

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
nchar(shopping_list)
# Result
[1]  9 12 12  7
nzchar(shopping_list)
# Result
[1] TRUE TRUE TRUE TRUE
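Empty strings and missing values are where nchar and nzchar behave differently under their defaults. A minimal base R sketch (the vector is invented for illustration):

```r
x <- c("abc", "", NA)

# With the default keepNA = NA and type = "chars", NA stays NA
lengths_chars <- nchar(x)

# With the default keepNA = FALSE, nzchar reports TRUE for NA
non_empty <- nzchar(x)

print(lengths_chars)  # 3  0 NA
print(non_empty)      # TRUE FALSE TRUE
```

Passing keepNA = TRUE to nzchar would return NA for the missing element instead.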

str_count

str_count(string, pattern = "")

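With the default pattern = "", str_count counts the characters in each string. Given a pattern, it counts how many times the pattern matches in each string (it does not return match positions).

str_count(shopping_list)
# Result
[1]  9 12 12  7
str_count(shopping_list, "a")
# Result: 0 where the string contains no match
[1] 1 1 2 0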
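The pattern passed to str_count is a regular expression unless it is wrapped in fixed(). A short sketch, assuming stringr is installed (inputs are illustrative):

```r
library(stringr)

# Count occurrences of a multi-character pattern
an_count <- str_count("banana", "an")

x <- c("a.", "...", ".a.a")

# As a regex, "." matches any character
any_char <- str_count(x, ".")

# fixed(".") counts literal dots only
literal_dots <- str_count(x, fixed("."))

print(an_count)      # 2
print(any_char)      # 2 3 4
print(literal_dots)  # 1 3 2
```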

str_length

From the stringr package.

str_length(string)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_length(shopping_list)
# Result
[1]  9 12 12  7

String matching

grep

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

pattern: a string containing a regular expression (or, when fixed = TRUE, a literal string)
x: the character vector to search, or an R object that can be coerced to character
value: when value = FALSE, the function returns the indices of the matching elements; when value = TRUE, it returns the matching elements themselves

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
grep("apple",shopping_list)
# Result
[1] 1
grep("apple",shopping_list,value = TRUE)
# Result
[1] "apples x4"

grepl

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

grepl is used in much the same way as grep, but it returns a logical vector (TRUE or FALSE for each element).

grepl("apple",shopping_list)
# Result
[1]  TRUE FALSE FALSE FALSE
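The logical vector from grepl is handy for subsetting, and grep's ignore.case and invert arguments cover two common variations. A minimal base R sketch (the variable names are illustrative):

```r
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# Keep the elements that start with "bag"
bags <- shopping_list[grepl("^bag", shopping_list)]

# Case-insensitive match, returning indices
apple_index <- grep("APPLE", shopping_list, ignore.case = TRUE)

# invert = TRUE returns the elements that do NOT match
not_bags <- grep("bag", shopping_list, invert = TRUE, value = TRUE)

print(bags)         # "bag of flour" "bag of sugar"
print(apple_index)  # 1
print(not_bags)     # "apples x4" "milk x2"
```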

str_subset

str_subset(string, pattern, negate = FALSE)

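string: the character vector to search
pattern: a string containing a regular expression
negate: when negate = FALSE, the function returns the strings that match; when negate = TRUE, it returns the strings that do not match pattern

fruit <- c("apple", "banana", "pear", "pinapple")
str_subset(fruit, "a") # all strings that contain "a"
# Result
[1] "apple"    "banana"   "pear"     "pinapple"
str_subset(fruit, "^p", negate = TRUE) # all strings that do not start with "p"
# Result
[1] "apple"  "banana"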

str_which

str_which(string, pattern, negate = FALSE)

str_which(fruit, "a") # indices of the strings that contain "a"
# Result
[1] 1 2 3 4
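str_subset and str_which correspond to grep with value = TRUE and value = FALSE, and both can be expressed through str_detect. A sketch of that relationship, assuming stringr is installed (the variable names are made up):

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

starts_with_p <- str_detect(fruit, "^p")   # logical vector, like grepl
p_index       <- str_which(fruit, "^p")    # indices, like grep
p_values      <- str_subset(fruit, "^p")   # values, like grep(value = TRUE)

print(starts_with_p)  # FALSE FALSE  TRUE  TRUE
print(p_index)        # 3 4
print(p_values)       # "pear" "pinapple"
```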

Sorting

str_sort

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en",
  numeric = FALSE, ...)

x: the character vector to sort
decreasing: logical, default FALSE (ascending order); TRUE sorts in descending order
na_last: where NA values go: TRUE puts them at the end, FALSE at the beginning, and NA drops them
numeric: if TRUE, runs of digits are sorted as numbers rather than character by character

x <- c("100a10", "100a5", "2b", "2a")
str_sort(x)
# Result
[1] "100a10" "100a5"  "2a"     "2b"
str_sort(x, numeric = TRUE)
# Result
[1] "2a"     "2b"     "100a5"  "100a10"
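The decreasing and na_last arguments described above can be sketched as follows, assuming stringr is installed (the input vector is invented for illustration):

```r
library(stringr)

x <- c("b", NA, "a")

ascending  <- str_sort(x)                     # NA goes last by default
descending <- str_sort(x, decreasing = TRUE)
na_first   <- str_sort(x, na_last = FALSE)    # NA goes first
na_dropped <- str_sort(x, na_last = NA)       # NA is removed

print(ascending)   # "a" "b" NA
print(na_first)    # NA  "a" "b"
print(na_dropped)  # "a" "b"
```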
