sedit {Hmisc} | R Documentation |
This suite of functions was written to implement many of the features of the UNIX sed program entirely within S-PLUS (function sedit).
The substring.location function returns the first and last position numbers that a substring occupies in a larger string. The substring2<- function does the opposite of the built-in function substring: it replaces part of a string instead of extracting it. It is named substring2 because S-Plus 5.x has a built-in function substring, but that function does not handle multiple replacements in a single string.
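To make the idea of "first and last position numbers" concrete, here is a minimal base R sketch. It is not the Hmisc implementation; the helper name locate_fixed and the choice to return empty vectors when nothing matches are assumptions made for illustration.

```r
# Hypothetical helper (not part of Hmisc): locate every non-overlapping,
# literal occurrence of `string` inside the single string `text`.
locate_fixed <- function(text, string) {
  m <- gregexpr(string, text, fixed = TRUE)[[1]]
  if (m[1] == -1L) {
    # Assumed convention for "no match": empty position vectors
    return(list(first = integer(0), last = integer(0)))
  }
  first <- as.integer(m)                          # start of each match
  last  <- first + attr(m, "match.length") - 1L   # end of each match
  list(first = first, last = last)
}

pos <- locate_fixed("abcdefgabc", "ab")
pos$first   # start positions of "ab"
pos$last    # end positions of "ab"
```

The result has the same shape as the list described for substring.location (components first and last), which makes it easy to feed the positions into substr or a replacement step.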
replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", if the "ANYTHING" passes an optional user-specified test function. Here, the "yyyy" string is searched for from right to left to handle balancing parentheses, etc. numeric.string and all.digits are two examples of test functions; they check, respectively, whether each of a vector of strings is a legal numeric value or contains only the digits 0-9. For the case where old="*$" or "^*", or for replace.substring.wild with the same values of old or with front=TRUE or back=TRUE, sedit (if wild.literal=FALSE) and replace.substring.wild will edit the largest substring satisfying test.
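The "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb" rule, including the right-to-left search for the trailing string, can be sketched in base R as follows. This is an illustration under stated assumptions, not the Hmisc code: wild_edit is a hypothetical helper, the prefix and suffix are passed as separate non-empty arguments instead of a single pattern containing "*", and text is returned unchanged when there is no match or the test fails.

```r
# Hypothetical helper (not part of Hmisc). old_pre/old_post are the literal
# text before and after the wild card; new_pre/new_post replace them.
wild_edit <- function(text, old_pre, old_post, new_pre, new_post, test = NULL) {
  start <- as.integer(regexpr(old_pre, text, fixed = TRUE))  # first prefix match
  if (start < 0) return(text)
  mid_from <- start + nchar(old_pre)            # first character of ANYTHING
  ends <- gregexpr(old_post, text, fixed = TRUE)[[1]]
  ends <- ends[ends >= mid_from]                # suffix must follow the prefix
  if (length(ends) == 0) return(text)
  end <- max(ends)                              # rightmost suffix match
  middle <- substr(text, mid_from, end - 1)     # the ANYTHING part
  if (!is.null(test) && !test(middle)) return(text)
  paste0(substr(text, 1, start - 1), new_pre, middle, new_post,
         substr(text, end + nchar(old_post), nchar(text)))
}

wild_edit("this is a cat", "this", "cat", "that", "dog")
# Taking the rightmost ")" keeps nested parentheses together
wild_edit("f(a(b)c)d", "(", ")", "[", "]")
```

Choosing the rightmost occurrence of the trailing string is what lets the wild card span an inner, balanced pair of delimiters.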
substring2 is just a copy of substring so that substring2<- will work.
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto   # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last=1e6)
substring2(text, first, last) <- value
text |
a vector of character strings for sedit, substring2, substring2<-,
or a single character string for substring.location,
replace.substring.wild.
|
from |
a vector of character strings to translate from, for sedit.
Each element may contain a single asterisk wild card, which matches any
sequence of characters (subject to the test function, if any) in place of the "*".
An element of from may begin with "^" to force the match to
begin at the beginning of text, and an element of from may end with
"$" to force the match to end at the end of text.
|
to |
a vector of character strings to translate to, for sedit.
If a corresponding element in from had an "*", the element
in to may also have an "*". Only single asterisks are allowed.
If to is not the same length as from, the rep function
is used to make it the same length.
|
string |
a single character string, for substring.location, numeric.string,
all.digits
|
first |
a vector of integers specifying the first position to replace for
substring2<-. first may also be a vector of character strings
that are passed to sedit to use as patterns for replacing
substrings with setto. See one of the last examples below.
|
last |
a vector of integers specifying the ending positions of the character
substrings to be replaced. The default is to go to the end of
the string. When first is character, last must be
omitted.
|
setto |
a character string or vector of character strings used as replacements,
in substring2<-
|
old |
a character string to translate from for replace.substring.wild.
May be "*$" or "^*" or any string containing a single "*" but
not beginning with "^" or ending with "$".
|
new |
a character string to translate to for replace.substring.wild
|
test |
a function of a vector of character strings returning a logical vector
whose elements are TRUE or FALSE according
to whether that string element qualifies as the wild card string for
sedit, replace.substring.wild
|
wild.literal |
set to TRUE to not treat asterisks as wild cards and to not look for
"^" or "$" in old
|
restrict |
a vector of two integers for substring.location which specifies a
range to which the search for matches should be restricted
|
front |
specifying front=TRUE and old="*" is the same as specifying old="^*"
|
back |
specifying back=TRUE and old="*" is the same as specifying old="*$"
|
value |
a character vector |
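As a sketch of what a test function can look like, the two helpers below take a character vector and return a logical vector of the same length. They are base R analogues written for illustration; the names is_all_digits and looks_numeric are hypothetical, and they are not claimed to match the Hmisc definitions of all.digits and numeric.string.

```r
# Hypothetical test function: TRUE where a string consists only of 0-9
is_all_digits <- function(s) grepl("^[0-9]+$", s)

# Hypothetical test function: TRUE where as.numeric() can parse the string.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
looks_numeric <- function(s) !is.na(suppressWarnings(as.numeric(s)))

is_all_digits(c("123", "12a", ""))
looks_numeric(c("1.5", "abc", "1e3"))
```

Any function with this shape (character vector in, logical vector out) fits the description of the test argument given above.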
sedit returns a vector of character strings the same length as text.
substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches.
replace.substring.wild returns a single character string.
numeric.string and all.digits return a single logical value.
substring2<- modifies its first argument.
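The phrase "modifies its first argument" refers to R's replacement-function mechanism: a function whose name ends in <- and whose last argument is named value can be called on the left side of an assignment. The sketch below shows the mechanism with a hypothetical position-based replacer, replace_span<-, written in base R. It is not the Hmisc definition of substring2<-, and its default for last (the end of the string) is an assumption.

```r
# Hypothetical replacement function (not part of Hmisc): replace characters
# first..last of x with value; value may be shorter or longer than the span.
`replace_span<-` <- function(x, first, last = nchar(x), value) {
  paste0(substr(x, 1, first - 1), value, substr(x, last + 1, nchar(x)))
}

x <- "this string"
replace_span(x, 3, 4) <- "IS"   # R rewrites this as x <- `replace_span<-`(x, 3, 4, value = "IS")
x
replace_span(x, 7) <- ""        # drop everything from position 7 onward
x
```

Because R reassigns the result to x, the caller sees x change in place even though the function itself only returns a new string.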
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
f.harrell@vanderbilt.edu
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x

substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))

replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')

qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
                       'lost*million', test=numeric.string)

x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))

sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)

replace.substring.wild("abcdefabcdef", "d*f", "xy")

x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x