paste() / nchar() / substr() (String Operations)
In R, paste() / paste0() (concatenation), nchar() (character count), and substr() (substring extraction and replacement) are among the most frequently used string functions. All of them are vectorized over character vectors, so multiple strings can be processed at once without a loop. This page covers their syntax and behavior, along with the closely related substring().
Syntax
# -----------------------------------------------
# paste(): concatenate strings
# -----------------------------------------------
# sep specifies the separator (default is " ", a single space)
paste(..., sep = " ")
# collapse joins all elements of the result into one string
paste(..., sep = " ", collapse = NULL)
# paste0() is shorthand for sep = "" (no separator)
paste0(..., collapse = NULL)

# -----------------------------------------------
# nchar(): return the number of characters
# -----------------------------------------------
nchar(x)                  # Returns the number of characters in string x
nchar(x, type = "chars")  # type = "chars": character count (default)
nchar(x, type = "bytes")  # type = "bytes": byte count
nchar(x, allowNA = TRUE)  # Returns NA for invalid multibyte strings instead of raising an error (default is FALSE)

# -----------------------------------------------
# substr(): extract or replace a substring
# -----------------------------------------------
substr(x, start, stop)           # Returns characters from position start to stop in x
substr(x, start, stop) <- value  # Overwrites the specified range of x with value (modifies x)
# substring() takes first and last; last defaults to 1000000L,
# so in practice it extracts to the end of the string
substring(x, first, last = 1000000L)
Function Options
| Function / Argument | Description |
|---|---|
| paste(..., sep = " ") | Concatenates multiple strings (or vectors) element by element, separated by sep. |
| paste(..., collapse = "sep") | Collapses all elements of the result into a single string, joined by the collapse separator. |
| paste0(...) | Shorthand for paste() with sep = "". Concatenates without any separator. |
| nchar(x) | Returns the number of characters as an integer. Vectorized over character vectors. |
| nchar(x, type = "bytes") | Returns the byte count rather than the character count. The two values differ for multi-byte characters. |
| substr(x, start, stop) | Returns the substring from position start to stop (1-based indexing, both ends inclusive). |
| substr(x, start, stop) <- value | Overwrites the specified range with value. If value is shorter than the range, only the first nchar(value) characters of the range are replaced; if it is longer, the excess is discarded. The length of the string never changes. |
| substring(x, first) | Returns from position first to the end of the string. Unlike substr(), whose stop argument is required, substring() has a default for last (1000000L). |
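The replacement and extraction rules in the table can be checked with a small sketch. The string "ABCDEFGH" and the variable names below are illustrative only and do not appear elsewhere on this page.

```r
# Illustrative string (hypothetical data, not from the sample script)
code <- "ABCDEFGH"

# value shorter than the range: only nchar(value) characters are overwritten
short <- code
substr(short, 1, 4) <- "xy"
print(short)      # "xyCDEFGH"

# value longer than the range: the extra characters of value are dropped
long <- code
substr(long, 1, 2) <- "wxyz"
print(long)       # "wxCDEFGH"

# a stop position beyond the end of the string is simply clipped
tail_part <- substr(code, 6, 100)
print(tail_part)  # "FGH"

# substring() is vectorized over first and last as well as over x
parts <- substring(code, 1:3, 3:5)
print(parts)      # "ABC" "BCD" "CDE"
```

In every case the result has the same number of characters as the original string, which is why substr<- is suited to fixed-width edits rather than general search and replace.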
Sample Code
psychopass_string_basic.R
# psychopass_string_basic.R — sample for paste() / nchar() / substr()
# Uses PSYCHO-PASS character data
# to explore string concatenation, character counts, and substring operations
# -----------------------------------------------
# Concatenation with paste()
# -----------------------------------------------
# Define character names and titles as separate vectors
names <- c("Tsunemori Akane", "Ginoza Nobuchika", "Kogami Shinya", "Kagari Shusei", "Shimomura Mika")
titles <- c("Inspector", "Inspector", "Enforcer", "Enforcer", "Inspector")
cat("--- paste(): concatenate name and title (sep = \": \") ---\n")
labels <- paste(names, titles, sep = ": ")
print(labels)
cat("\n")
# -----------------------------------------------
# No-separator concatenation with paste0()
# -----------------------------------------------
cat("--- paste0(): build a department name ---\n")
bureau <- "PublicSafetyBureau"
division <- "CriminalInvestigation"
unit <- "Div1"
full_unit <- paste0(bureau, "-", division, "-", unit)
cat("Department:", full_unit, "\n\n")
# -----------------------------------------------
# Collapsing a vector into one string with collapse
# -----------------------------------------------
cat("--- collapse: join all elements into one string ---\n")
all_names <- paste(names, collapse = " / ")
cat("All characters:", all_names, "\n\n")
# -----------------------------------------------
# Getting character counts with nchar()
# -----------------------------------------------
cat("--- nchar(): get character count for each name ---\n")
char_counts <- nchar(names)
for (i in seq_along(names)) {
  cat(sprintf(" %s -> %d characters\n", names[i], char_counts[i]))
}
cat("\n")
# -----------------------------------------------
# Comparing type = "chars" vs type = "bytes" with nchar()
# -----------------------------------------------
cat("--- nchar() type comparison ---\n")
sample_name <- "Tsunemori Akane"
chars_count <- nchar(sample_name, type = "chars")
bytes_count <- nchar(sample_name, type = "bytes")
cat(sprintf(" \"%s\" chars: %d bytes: %d\n", sample_name, chars_count, bytes_count))
cat("\n")
# -----------------------------------------------
# Extracting a substring with substr()
# -----------------------------------------------
cat("--- substr(): extract the first name (last word) ---\n")
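for (name in names) {
  # Locate the first space, then take everything after it with substr()
  space_pos <- as.integer(regexpr(" ", name, fixed = TRUE))
  first_name <- substr(name, space_pos + 1, nchar(name))
  cat(sprintf(" %s -> given name: %s\n", name, first_name))
}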
cat("\n")
# -----------------------------------------------
# Replacing a range with substr()
# -----------------------------------------------
cat("--- substr(): replace a portion of the string ---\n")
code_name <- "Inspector: Tsunemori Akane"
cat("Original:", code_name, "\n")
substr(code_name, 1, 9) <- "Enforcer "
cat("Replaced:", code_name, "\n\n")
# -----------------------------------------------
# Extracting to the end with substring()
# -----------------------------------------------
cat("--- substring(): omit stop to extract to end ---\n")
record <- "CC: Tsunemori Akane: 67.4"
extracted <- substring(record, 5)
cat("Original:", record, "\n")
cat("From position 5:", extracted, "\n\n")
# -----------------------------------------------
# Practical example: filter by name length
# -----------------------------------------------
cat("--- Filter characters with 12 or more characters in their name ---\n")
long_names <- names[nchar(names) >= 12]
cat(paste(long_names, collapse = ", "), "\n")
Rscript psychopass_string_basic.R
--- paste(): concatenate name and title (sep = ": ") ---
[1] "Tsunemori Akane: Inspector"  "Ginoza Nobuchika: Inspector"
[3] "Kogami Shinya: Enforcer"     "Kagari Shusei: Enforcer"
[5] "Shimomura Mika: Inspector"

--- paste0(): build a department name ---
Department: PublicSafetyBureau-CriminalInvestigation-Div1

--- collapse: join all elements into one string ---
All characters: Tsunemori Akane / Ginoza Nobuchika / Kogami Shinya / Kagari Shusei / Shimomura Mika

--- nchar(): get character count for each name ---
 Tsunemori Akane -> 15 characters
 Ginoza Nobuchika -> 16 characters
 Kogami Shinya -> 13 characters
 Kagari Shusei -> 13 characters
 Shimomura Mika -> 14 characters

--- nchar() type comparison ---
 "Tsunemori Akane" chars: 15 bytes: 15

--- substr(): extract the first name (last word) ---
 Tsunemori Akane -> given name: Akane
 Ginoza Nobuchika -> given name: Nobuchika
 Kogami Shinya -> given name: Shinya
 Kagari Shusei -> given name: Shusei
 Shimomura Mika -> given name: Mika

--- substr(): replace a portion of the string ---
Original: Inspector: Tsunemori Akane
Replaced: Enforcer : Tsunemori Akane

--- substring(): omit stop to extract to end ---
Original: CC: Tsunemori Akane: 67.4
From position 5: Tsunemori Akane: 67.4

--- Filter characters with 12 or more characters in their name ---
Tsunemori Akane, Ginoza Nobuchika, Kogami Shinya, Kagari Shusei, Shimomura Mika
Common Mistakes
1: nchar() returns character count, not byte count
The default for nchar() is type = "chars", which counts each character as one unit even for multi-byte characters. For ASCII strings, character count and byte count are the same. When byte count is needed, specify type = "bytes".
string_nchar_bytes.R
name_ascii <- "Akane"
# type = "chars" (default): counts characters
cat(sprintf("chars: %d\n", nchar(name_ascii, type = "chars")))
# type = "bytes": counts bytes (same for ASCII; differs for multi-byte)
cat(sprintf("bytes: %d\n", nchar(name_ascii, type = "bytes")))
chars: 5
bytes: 5
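The ASCII example cannot show the two counts diverging, so here is a minimal sketch with a multi-byte string. It assumes the string is stored as UTF-8, which is what R produces for \u escapes. The two-kanji string and the variable names are illustrative and not part of the sample script.

```r
# Two kanji written as Unicode escapes, so the script file itself stays ASCII.
# Strings built from \u escapes are stored as UTF-8 (assumption stated above).
name_kanji <- "\u5e38\u5b88"

n_chars <- nchar(name_kanji, type = "chars")  # counts characters
n_bytes <- nchar(name_kanji, type = "bytes")  # counts bytes of the stored encoding

cat(sprintf("chars: %d\n", n_chars))  # chars: 2
cat(sprintf("bytes: %d\n", n_bytes))  # bytes: 6 (3 bytes per kanji in UTF-8)
```

This difference matters when a length limit is defined in bytes, for example a fixed-size database column, rather than in characters.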
2: The default sep in paste() is a space, not an empty string
The default value of sep in paste() is " " (a single space). If you omit sep, a space is inserted between the elements even when you intended none. Use paste0() or paste(..., sep = "") to concatenate without a separator. The collapse argument, available in both paste() and paste0(), folds a vector into a single string.
string_paste_sep.R
bureau <- "PublicSafety"
division <- "Criminal"
unit <- "Div1"
# paste() default sep is " " (space)
cat(paste(bureau, division, unit), "\n")
# Use paste0() for no-separator concatenation
cat(paste0(bureau, division, unit), "\n")
# Explicitly specifying sep = "" gives the same result
cat(paste(bureau, division, unit, sep = ""), "\n")
# collapse folds a vector into a single string
names <- c("Akane", "Nobuchika", "Shinya")
cat(paste(names, collapse = " / "), "\n")
PublicSafety Criminal Div1
PublicSafetyCriminalDiv1
PublicSafetyCriminalDiv1
Akane / Nobuchika / Shinya
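When sep and collapse are given together, sep is applied first, element by element, and collapse then joins the resulting vector. The sketch below illustrates that order and the recycling of shorter arguments; the ids vector and the "unit" prefix are made up for illustration.

```r
# Hypothetical identifiers, used only to illustrate sep vs. collapse
ids <- c("A", "B", "C")

# sep joins the arguments element by element; the length-1 "unit" is recycled
tagged <- paste("unit", ids, sep = "-")
print(tagged)    # "unit-A" "unit-B" "unit-C"

# collapse then folds that vector into a single string
one_line <- paste("unit", ids, sep = "-", collapse = ", ")
print(one_line)  # "unit-A, unit-B, unit-C"

# paste0() accepts collapse too; numbers are converted to character
compact <- paste0(ids, 1:3, collapse = "")
print(compact)   # "A1B2C3"
```

A quick way to tell the two apart: sep never changes how many strings come back, while a non-NULL collapse always returns exactly one.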
Overview
R's paste() concatenates multiple strings using the separator specified by sep, and collapses an entire vector into one string using collapse. paste0() is shorthand for sep = "", convenient for no-separator concatenation. nchar() returns the character count (or the byte count with type = "bytes") and is vectorized, so the counts for all elements of a character vector can be obtained at once. substr(x, start, stop) uses 1-based indexing to extract a substring; used on the left-hand side of an assignment, it overwrites the specified range without changing the length of the string. substr() requires stop, so to extract from a position to the end of the string, use substring(), whose last argument has a large default value. For string search and replacement, see grep() / sub() / gsub(). For formatted output, see sprintf().
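As a closing sketch, the functions can be combined on a whole vector at once, with no loop. The fixed-width record layout below (3-character role code, 3-digit id, then the given name) is a hypothetical format invented for this example.

```r
# Hypothetical fixed-width records: role (cols 1-3), id (cols 4-6), name (col 7 onward)
records <- c("INS001Akane", "ENF002Shinya", "ENF003Shusei")

role  <- substr(records, 1, 3)   # "INS" "ENF" "ENF"
id    <- substr(records, 4, 6)   # "001" "002" "003"
given <- substring(records, 7)   # "Akane" "Shinya" "Shusei"

# nchar() and paste0() operate element by element on the extracted vectors
summary_lines <- paste0(given, " (", role, "-", id, ", ", nchar(given), " chars)")
print(summary_lines)
# "Akane (INS-001, 5 chars)" "Shinya (ENF-002, 6 chars)" "Shusei (ENF-003, 6 chars)"
```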