findInterval {base} | R Documentation |
Find the indices of x
in vec
, where vec
must be
sorted (non-decreasingly); i.e., if i <- findInterval(x,v)
,
we have v[i[j]] <= x[j] < v[i[j] + 1]
where v[0] := - Inf,
v[N+1] := + Inf, and N <- length(vec)
.
At the two boundaries, the returned index may differ by 1, depending
on the optional arguments rightmost.closed
and all.inside
.
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)
x |
numeric. |
vec |
numeric, sorted (weakly) increasingly, of length N ,
say. |
rightmost.closed |
logical; if true, the rightmost interval,
vec[N-1] .. vec[N] is treated as closed, see below. |
all.inside |
logical; if true, the returned indices are coerced into {1,...,N-1}, i.e., 0 is mapped to 1 and N to N-1. |
The function findInterval
finds the index of one vector x
in
another, vec
, where the latter must be non-decreasing. Where
this is trivial, equivalent to apply( outer(x, vec, ">="), 1, sum)
,
as a matter of fact, the internal algorithm uses interval search
ensuring O(n * log(N)) complexity where
n <- length(x)
(and N <- length(vec)
). For (almost)
sorted x
, it will be even faster, basically O(n).
This is the same computation as for the empirical distribution
function, and indeed, findInterval(t, sort(X))
is
identical to n * Fn(t;
X[1],..,X[n]) where Fn is the empirical distribution
function of X[1],..,X[n].
When rightmost.closed = TRUE
, the result
for x[j] = vec[N]
( = max(vec)), is N - 1
as for
all other values in the last interval.
vector of length length(x)
with values in 0:N
(and
NA
) where N <- length(vec)
, or values coerced to
1:(N-1)
iff all.inside = TRUE
(equivalently coercing all
x values inside the intervals). Note that NA
s are
propagated from x
, and Inf
values are allowed in
both x
and vec
.
Martin Maechler
approx(*, method = "constant")
which is a
generalization of findInterval()
, ecdf
for
computing the empirical distribution function which is (up to a factor
of n) also basically the same as findInterval(.).
N <- 100 X <- sort(round(rt(N, df=2), 2)) tt <- c(-100, seq(-2,2, len=201), +100) it <- findInterval(tt, X) tt[it < 1 | it >= N] # only first and last are outside range(X)