R: Split the Elements of a Character Vector

strsplit {base}

R Documentation

Split the Elements of a Character Vector

Description

Split the elements of a character vector x into substrings according to the presence of substring split within them.

Usage

strsplit(x, split, extended = TRUE, fixed = FALSE, perl = FALSE)

Arguments

`x`	character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
`split`	character vector (or object which can be coerced to such) containing regular expression(s) (unless `fixed = TRUE`) to use as “split”. If empty matches occur, in particular if `split` has length 0, `x` is split into single characters. If `split` has length greater than 1, it is recycled along `x`.
`extended`	logical. If `TRUE`, extended regular expression matching is used, and if `FALSE` basic regular expressions are used.
`fixed`	logical. If `TRUE`, match `split` exactly; otherwise use regular expressions.
`perl`	logical. Should perl-compatible regexps be used? Has priority over `extended`.
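
As a minimal sketch of how a `split` vector of length greater than 1 is recycled along `x`, the following splits each element on its own separator. The names `x`, `seps` and `parts` are illustrative only, and `fixed = TRUE` is used so that the separators are taken literally rather than as regular expressions.

```r
# Illustrative input: two strings that use different separators
x <- c("a-b-c", "d e f")
seps <- c("-", " ")

# split is recycled along x: x[1] is split on "-", x[2] on " "
parts <- strsplit(x, seps, fixed = TRUE)

parts[[1]]  # "a" "b" "c"
parts[[2]]  # "d" "e" "f"
```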

Details

Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.

Note that splitting into single characters can be done via split=character(0) or split=""; the two are equivalent as of R 1.9.0.
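
The equivalence described above can be checked directly. This is a small sketch with illustrative variable names (`s`, `by_null`, `by_empty_vec`, `by_empty_str`); it assumes a version of R in which the two forms are equivalent.

```r
s <- "abc"

# NULL is coerced to character(0), so all three calls split into single characters
by_null      <- strsplit(s, NULL)
by_empty_vec <- strsplit(s, character(0))
by_empty_str <- strsplit(s, "")

by_null[[1]]  # "a" "b" "c"
identical(by_null, by_empty_vec) && identical(by_null, by_empty_str)
```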

A missing value of split does not split the corresponding element(s) of x at all.
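
A short sketch of the missing-value case, using illustrative names `x` and `parts`: because `split` is recycled along `x`, an `NA` in the second position leaves the second element of `x` unsplit.

```r
x <- c("a b", "c d")

# The first element is split on a space; the second split value is missing,
# so x[2] is returned as a single string
parts <- strsplit(x, c(" ", NA))

parts[[1]]  # "a" "b"
parts[[2]]  # "c d"
```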

Value

A list of length length(x), the i-th element of which contains the vector of splits of x[i].
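
To illustrate the shape of the return value, the sketch below splits a named character vector and inspects the result. The names `x` and `parts` are illustrative; the observation that names of `x` are carried over to the list is based on observed behaviour rather than on the text above.

```r
x <- c(first = "a,b", second = "c")
parts <- strsplit(x, ",", fixed = TRUE)

length(parts) == length(x)  # one list element per element of x
names(parts)                # names of x are carried over to the list
sapply(parts, length)       # number of pieces obtained from each element
```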

Warning

The standard regular expression code has been reported to be very slow when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl=TRUE seems much faster and more reliable for such uses.

The perl = TRUE option is only implemented for single-byte and UTF-8 encodings, and will warn if used in a non-UTF-8 multibyte locale.
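
As a hedged illustration of the first point, a very long string can be split with perl = TRUE as sketched below. The test string and the names `long` and `pieces` are made up for this example, and no timing claim is made here; actual speed depends on the R version and platform.

```r
# Build an artificial string of roughly 100,000 characters
long <- paste(rep("word", 20000), collapse = " ")

# Split on runs of whitespace using the Perl-compatible engine
pieces <- strsplit(long, "\\s+", perl = TRUE)[[1]]

length(pieces)  # 20000
```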

Examples

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x,"e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "\\."))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
        sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home(),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

[Package base version 2.2.1 Index]