sedit {Hmisc}

R Documentation

Character String Editing and Miscellaneous Character Handling Functions

Description

This suite of functions was written to implement many of the features of the UNIX sed program entirely within S-PLUS (function sedit). The substring.location function returns the first and last character positions that a substring occupies in a larger string. The substring2<- function does the opposite of the built-in function substring: it replaces part of a string rather than extracting it. It is named substring2 because S-Plus 5.x has a built-in function substring that does not handle multiple replacements in a single string.

replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", provided that the "ANYTHING" passes an optional user-specified test function. The "yyyy" string is searched for from right to left so that balanced parentheses and similar constructs are handled. numeric.string and all.digits are two examples of test functions; they check, respectively, whether each of a vector of strings is a legal numeric value or contains only the digits 0-9.

When the pattern is "*$" or "^*" (the from argument of sedit with wild.literal=FALSE, or the old argument of replace.substring.wild), or when replace.substring.wild is called with front=TRUE or back=TRUE, the largest substring satisfying test is edited.

substring2 is just a copy of substring so that substring2<- will work.
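
The "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb" behaviour described above can be sketched in base R. This is only an illustration under stated assumptions, not the Hmisc implementation: the helper names find_all_sketch and wild_replace_sketch are hypothetical, the prefix is taken at its leftmost occurrence, the suffix at its rightmost occurrence after the prefix, and the "^"/"$" anchors and the front/back arguments are not handled.

```r
# Hypothetical helper (not part of Hmisc): all start positions of s in text
find_all_sketch <- function(text, s) {
  n <- nchar(s)
  starts <- seq_len(max(0, nchar(text) - n + 1))
  starts[substring(text, starts, starts + n - 1) == s]
}

# Hypothetical sketch of "pre*post" -> "newpre*newpost" on a single string
wild_replace_sketch <- function(text, old, new, test = NULL) {
  star <- as.integer(regexpr("*", old, fixed = TRUE))
  if (star < 0) stop("old must contain a '*'")
  pre  <- substr(old, 1, star - 1)
  post <- substr(old, star + 1, nchar(old))

  # Prefix: leftmost occurrence (an empty prefix matches at the start)
  if (nchar(pre) == 0) {
    pre_start <- 1
  } else {
    hits <- find_all_sketch(text, pre)
    if (length(hits) == 0) return(text)
    pre_start <- min(hits)
  }
  pre_end <- pre_start + nchar(pre) - 1

  # Suffix: rightmost occurrence after the prefix (right-to-left search)
  if (nchar(post) == 0) {
    post_start <- nchar(text) + 1
  } else {
    hits <- find_all_sketch(text, post)
    hits <- hits[hits > pre_end]
    if (length(hits) == 0) return(text)
    post_start <- max(hits)
  }

  # The wild-card text must pass the optional test function
  middle <- substr(text, pre_end + 1, post_start - 1)
  if (!is.null(test) && !isTRUE(test(middle))) return(text)

  # Carry the wild-card text over only if new also contains "*"
  new_star <- as.integer(regexpr("*", new, fixed = TRUE))
  if (new_star < 0) {
    replacement <- new
  } else {
    replacement <- paste0(substr(new, 1, new_star - 1), middle,
                          substr(new, new_star + 1, nchar(new)))
  }
  paste0(substr(text, 1, pre_start - 1), replacement,
         substr(text, post_start + nchar(post), nchar(text)))
}

wild_replace_sketch("this is a cat", "this*cat", "that*dog")
wild_replace_sketch("f(a)(b)", "(*)", "[*]")  # closing ")" taken from the right
```

Searching for the suffix from the right is what lets the second call span both parenthesised groups instead of stopping at the first ")".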

Usage

sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto   # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last=1e6)
substring2(text, first, last) <- value

Arguments

`text`	a vector of character strings for `sedit, substring2, substring2<-` or a single character string for `substring.location, replace.substring.wild`.
`from`	a vector of character strings to translate from, for `sedit`. Each element may contain a single asterisk wild card, which matches any sequence of characters (subject to the `test` function, if any). An element of `from` may begin with `"^"` to force the match to begin at the beginning of `text`, and an element of `from` may end with `"$"` to force the match to end at the end of `text`.
`to`	a vector of character strings to translate to, for `sedit`. If the corresponding element of `from` contains a `"*"`, the element of `to` may also contain a `"*"`. Only single asterisks are allowed. If `to` is not the same length as `from`, the `rep` function is used to make it the same length.
`string`	a single character string, for `substring.location`, `numeric.string`, `all.digits`
`first`	a vector of integers specifying the first position to replace for `substring2<-`. `first` may also be a vector of character strings that are passed to `sedit` to use as patterns for replacing substrings with `value` (`setto` in the S-Plus form). See one of the last examples below.
`last`	a vector of integers specifying the ending positions of the character substrings to be replaced. The default is to go to the end of the string. When `first` is character, `last` must be omitted.
`setto`	a character string or vector of character strings used as replacements in the S-Plus-only `substring<-` form; `substring2<-` takes the replacements as `value`
`old`	a character string to translate from for `replace.substring.wild`. May be `"*$"` or `"^*"` or any string containing a single `"*"` but not beginning with `"^"` or ending with `"$"`.
`new`	a character string to translate to for `replace.substring.wild`
`test`	a function of a vector of character strings returning a logical vector whose elements are `TRUE` or `FALSE` according to whether that string element qualifies as the wild card string for `sedit, replace.substring.wild`
`wild.literal`	set to `TRUE` to treat asterisks literally rather than as wild cards, and to not look for `"^"` or `"$"` in `from`
`restrict`	a vector of two integers for `substring.location` which specifies a range to which the search for matches should be restricted
`front`	specifying `front=TRUE` and `old="*"` is the same as specifying `old="^*"`
`back`	specifying `back=TRUE` and `old="*"` is the same as specifying `old="*$"`
`value`	a character vector
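
To make the contract of the `test` argument concrete, here is a minimal base-R sketch of two test functions in the spirit of numeric.string and all.digits. The names numeric_string_sketch and all_digits_sketch are hypothetical stand-ins, and the bodies are assumptions about reasonable behaviour, not the Hmisc source.

```r
# Hypothetical stand-in for a "legal numeric" test:
# TRUE where as.numeric() can parse the string
numeric_string_sketch <- function(string) {
  !is.na(suppressWarnings(as.numeric(string)))
}

# Hypothetical stand-in for a "digits only" test:
# TRUE where the string consists solely of the characters 0-9
all_digits_sketch <- function(string) {
  grepl("^[0-9]+$", string)
}

numeric_string_sketch(c("1.2", "abc"))
all_digits_sketch(c("23", "2x"))

# A user-written test can be any function that takes character strings
# and returns TRUE or FALSE for each one
qualify_sketch <- function(x) x == " 1.5 " | x == " 2.5 "
qualify_sketch(" 1.5 ")
```

Any function with this shape (character input, logical output) could be passed as `test`; the wild-card text is edited only when the function returns TRUE for it.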

Value

sedit returns a vector of character strings the same length as text. substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches. replace.substring.wild returns a single character string. numeric.string and all.digits return a single logical value.
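
The shape of the substring.location return value can be illustrated with a small base-R sketch. The helper locate_sketch is hypothetical, and its handling of restrict (keep only matches lying entirely inside the range) is an assumption; the Hmisc function may treat the range differently.

```r
# Hypothetical sketch: find every occurrence of string in text and return
# a list with components first and last
locate_sketch <- function(text, string, restrict = c(1, nchar(text))) {
  n <- nchar(string)
  starts <- seq_len(max(0, nchar(text) - n + 1))
  first <- starts[substring(text, starts, starts + n - 1) == string]
  last <- first + n - 1
  # Assumption: a match counts only if it lies entirely inside restrict
  keep <- first >= restrict[1] & last <= restrict[2]
  list(first = first[keep], last = last[keep])
}

str(locate_sketch("abcdefgabc", "ab"))
str(locate_sketch("abcdefgabc", "ab", restrict = c(3, 999)))
```

Each position in first pairs with the position at the same index in last, so the i-th match occupies characters first[i] through last[i].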

Side Effects

substring2<- modifies its first argument
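
In R, a replacement function such as substring2<- appears to modify its first argument because the interpreter rewrites `f(x, ...) <- value` as an assignment of the function's result back to x. The sketch below shows this with a hypothetical replacement function `mid_sketch<-` written in base R; it is not the Hmisc code, only an illustration of the mechanism.

```r
# Hypothetical replacement function: replace characters first..last of x
# with value (the replacement may differ in length from the removed text)
`mid_sketch<-` <- function(x, first, last = nchar(x), value) {
  paste0(substr(x, 1, first - 1), value, substr(x, last + 1, nchar(x)))
}

x <- "this string"
mid_sketch(x, 3, 4) <- "IS"   # evaluated as x <- `mid_sketch<-`(x, 3, 4, value = "IS")
x

# The explicit form gives the same result without touching any variable
y <- `mid_sketch<-`("this string", 3, 4, value = "IS")
identical(x, y)
```

The function itself returns a new string; the apparent side effect comes from the assignment that R performs on the caller's variable.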

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
f.harrell@vanderbilt.edu

Examples

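library(Hmisc)  # provides sedit, substring2, substring.location, etc.

# Replace by position: characters 3-4, then everything from position 7 on
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x

# Locate a substring, optionally restricting the search range
substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))

# Wild-card replacement in a single string
replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')

# Wild-card replacement subject to a test on the wild-card text
qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
                       'lost*million', test=numeric.string)

# sedit applies the from/to pairs to each element of x
x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))

# "*$" and "^*" edit the largest trailing or leading substring passing test
sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)

# The "f" after the wild card is searched for from right to left
replace.substring.wild("abcdefabcdef", "d*f", "xy")

# Character patterns given as first are passed to sedit
x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x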

[Package Hmisc version 3.0-10 Index]