August 9, 2024
String manipulation is a crucial skill in R programming. Basic operations like concatenation, substring extraction, and character counting form the foundation for more complex text processing tasks. These functions allow you to combine, slice, and analyze strings efficiently.
Building on these basics, you'll learn about case manipulation, string splitting, and pattern substitution. These operations are essential for data cleaning and text analysis, enabling you to standardize, parse, and transform string data in various ways.
- [paste()](https://www.fiveableKeyTerm:paste()) function combines multiple strings into a single string
  - Set the separator with the `sep` parameter (`paste("Hello", "World", sep = "-")` returns `"Hello-World"`)
- [substr()](https://www.fiveableKeyTerm:substr()) extracts or replaces substrings within a character vector
  - `substr("Example", 1, 3)` returns `"Exa"`
- [nchar()](https://www.fiveableKeyTerm:nchar()) counts the number of characters in a string
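The three basic operations above can be tried together in a short script. This is a minimal sketch using only base R; the variable names (`greeting`, `hyphenated`, `prefix`, `word`, `n`) are illustrative and not part of the original guide.

```r
# Combine strings; paste() separates the pieces with a space by default
greeting   <- paste("Hello", "World")              # "Hello World"
hyphenated <- paste("Hello", "World", sep = "-")   # "Hello-World"

# Extract characters 1 through 3
prefix <- substr("Example", 1, 3)                  # "Exa"

# Replace part of a string using the assignment form of substr()
word <- "Example"
substr(word, 1, 2) <- "ex"                         # word is now "example"

# Count characters
n <- nchar("Example")                              # 7
```

The assignment form `substr(x, start, stop) <- value` is what the "replaces" part of the description refers to: it overwrites characters in place without changing the length of the string.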
- [toupper()](https://www.fiveableKeyTerm:toupper()) converts all lowercase letters in a string to uppercase
  - `toupper("Hello")` returns `"HELLO"`
- [tolower()](https://www.fiveableKeyTerm:tolower()) converts all uppercase letters in a string to lowercase
  - The counterpart of `toupper()`, useful for case-insensitive comparisons
  - `tolower("WORLD")` returns `"world"`
- [strsplit()](https://www.fiveableKeyTerm:strsplit()) divides a string into substrings based on a specified delimiter
  - The delimiter can be a regular expression (`strsplit("a,b;c", "[,;]")` splits on both commas and semicolons)
- [gsub()](https://www.fiveableKeyTerm:gsub()) function performs global string substitution
  - `gsub("[0-9]", "", "Hello123")` removes all digits
- [grepl()](https://www.fiveableKeyTerm:grepl()) function checks if a pattern exists in a string
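A short base R sketch of the case, splitting, substitution, and detection functions listed above. The variable names are illustrative only, and the inputs reuse the examples from the list where possible.

```r
# Normalize case before comparing, so "WORLD" and "World" are treated as equal
same  <- tolower("WORLD") == tolower("World")           # TRUE
shout <- toupper("Hello")                               # "HELLO"

# strsplit() returns a list with one element per input string,
# so [[1]] pulls out the pieces for a single string
parts <- strsplit("a,b;c", "[,;]")[[1]]                 # "a" "b" "c"

# gsub() replaces every match; sub() would replace only the first one
letters_only <- gsub("[0-9]", "", "Hello123")           # "Hello"

# grepl() returns one logical value per element of the input vector
has_digit <- grepl("[0-9]", c("Hello123", "Hello"))     # TRUE FALSE
```

Because `strsplit()` always returns a list, forgetting the `[[1]]` is a common source of confusion when only one string is being split.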
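- [grep()](https://www.fiveableKeyTerm:grep()) function searches for pattern matches in a character vector
  - Make matching case-insensitive with the `ignore.case = TRUE` parameter
- Character classes match any one character from a set
  - `[0-9]` for digits, `[a-z]` for lowercase letters
  - `\d` for digits, `\w` for word characters
- Quantifiers control how many times the preceding element may repeat
  - `*` matches zero or more occurrences
  - `+` matches one or more occurrences
  - `?` matches zero or one occurrence
  - `{n}` matches exactly n occurrences
- Anchors match positions rather than characters
  - `^` matches the start of a line
  - `$` matches the end of a line
  - `\b` matches a word boundary
- The stringr package provides counterparts to the base R functions
  - `str_length()` for counting characters (similar to `nchar()`)
  - `str_c()` for concatenating strings (similar to `paste()`)
  - `str_sub()` for extracting substrings (similar to `substr()`)
  - `str_detect()` for checking if a pattern exists in a string
  - `str_extract()` for extracting matched patterns from strings
  - `str_replace()` for replacing matched patterns in strings
  - Function names consistently use the `str_` prefix
  - Functions work with the pipe operator (`%>%`)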
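The pattern-matching pieces above can be sketched as follows. The `fruits` vector and the other variable names are made up for illustration. The stringr calls run only if the package happens to be installed, which is an assumption about your setup rather than a requirement of the base R part.

```r
fruits <- c("Apple", "banana", "cherry pie", "apple tart 42")

# grep() returns indices of matching elements; value = TRUE returns the elements
idx  <- grep("apple", fruits, ignore.case = TRUE)                # 1 4
hits <- grep("apple", fruits, ignore.case = TRUE, value = TRUE)  # "Apple" "apple tart 42"

# Character classes (backslashes are doubled inside R string literals)
digit_bracket  <- grepl("[0-9]", fruits)   # FALSE FALSE FALSE TRUE
digit_shortcut <- grepl("\\d", fruits)     # same result as above

# Quantifiers
star  <- grepl("^a*$", c("", "a", "aaa"))                      # TRUE TRUE TRUE
plus  <- grepl("^a+$", c("", "a", "aaa"))                      # FALSE TRUE TRUE
maybe <- grepl("^colou?r$", c("color", "colour", "colouur"))   # TRUE TRUE FALSE
exact <- grepl("^[0-9]{3}$", c("12", "123", "1234"))           # FALSE TRUE FALSE

# Anchors
starts <- grepl("^ban", fruits)                            # FALSE TRUE FALSE FALSE
ends   <- grepl("pie$", fruits)                            # FALSE FALSE TRUE FALSE
whole  <- grepl("\\bpie\\b", c("cherry pie", "magpies"))   # TRUE FALSE

# stringr counterparts, run only if the package is installed
if (requireNamespace("stringr", quietly = TRUE)) {
  library(stringr)
  print(str_length("Example"))                    # 7
  print(str_c("Hello", "World", sep = "-"))       # "Hello-World"
  print(str_sub("Example", 1, 3))                 # "Exa"
  print(str_detect(fruits, "[0-9]"))              # FALSE FALSE FALSE TRUE
  print(str_extract("apple tart 42", "[0-9]+"))   # "42"
  print(str_replace("Hello123", "[0-9]", ""))     # "Hello23" (first match only)
  # The pipe passes the left-hand value as the first argument of the next call
  print("Hello123" %>% str_replace_all("[0-9]", "") %>% str_to_upper())  # "HELLO"
}
```

Note that `str_replace()` changes only the first match, so it behaves like base R's `sub()`; `str_replace_all()` is the closer equivalent of `gsub()`.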