aggregate {stats} | R Documentation |
Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
aggregate(x, ...)

## Default S3 method:
aggregate(x, ...)

## S3 method for class 'data.frame':
aggregate(x, by, FUN, ...)

## S3 method for class 'ts':
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1,
          ts.eps = getOption("ts.eps"), ...)
x | an R object. |
by | a list of grouping elements, each as long as the variables in x. Names for the grouping variables are provided if they are not given. The elements of the list will be coerced to factors (if they are not already factors). |
FUN | a scalar function to compute the summary statistics which can be applied to all data subsets. |
nfrequency | new number of observations per unit of time; must be a divisor of the frequency of x. |
ndeltat | new fraction of the sampling period between successive observations; must be a divisor of the sampling interval of x. |
ts.eps | tolerance used to decide if nfrequency is a sub-multiple of the original frequency. |
... | further arguments passed to or used by methods. |
aggregate is a generic function with methods for data frames and time series.
The default method aggregate.default uses the time series method if x is a time series, and otherwise coerces x to a data frame and calls the data frame method.
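As an illustration of the dispatch just described, here is a minimal sketch. The matrix m and the grouping vector g are hypothetical toy data, not part of this help page. Because m is neither a data frame nor a time series, the call should go through the default method and come back as a data frame.

```r
## Hypothetical toy data: a numeric matrix and a grouping vector.
m <- cbind(a = 1:4, b = 5:8)
g <- c("u", "u", "v", "v")

## 'm' is neither a data frame nor a time series, so the default method
## coerces it to a data frame before aggregating.
res <- aggregate(m, list(g = g), sum)

is.data.frame(res)   # TRUE
res$a                # 3 7   (1 + 2 and 3 + 4)
res$b                # 11 15 (5 + 6 and 7 + 8)
```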
aggregate.data.frame is the data frame method. If x is not a data frame, it is coerced to one. Then, each of the variables (columns) in x is split into subsets of cases (rows) of identical combinations of the components of by, and FUN is applied to each such subset with further arguments in ... passed to it. (I.e., tapply(VAR, by, FUN, ..., simplify = FALSE) is done for each variable VAR in x, conveniently wrapped into one call to lapply().)
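The per-variable step can be sketched by hand with the built-in state.x77 and state.region data. This is only an illustration of the tapply()/lapply() idea described above, not the method's actual source. The names per_column, west_income and agg_west_income are made up for the example.

```r
## Work on a data frame copy of the built-in 'state.x77' matrix.
x  <- as.data.frame(state.x77)
by <- list(Region = state.region)

## Apply tapply() to every column of x, one column (VAR) at a time.
per_column <- lapply(x, function(VAR) tapply(VAR, by, mean, simplify = FALSE))

## Each element is a one-dimensional list array indexed by region.
west_income <- per_column$Income[["West"]]

## The corresponding cell in the result of aggregate().
agg <- aggregate(x, by, mean)
agg_west_income <- agg$Income[agg$Region == "West"]

all.equal(west_income, agg_west_income)
```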
Empty subsets are removed, and the result is reformatted into a data frame containing the variables in by and x. The ones arising from by contain the unique combinations of grouping values used for determining the subsets, and the ones arising from x the corresponding summary statistics for the subset of the respective variables in x.
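To see the removal of empty subsets in practice, the following sketch compares the number of rows in the result with the number of non-empty region/frost combinations. It uses the built-in state.x77 and state.region data. The names cold, n_possible and n_nonempty are introduced only for this illustration.

```r
## Grouping by region and by whether a state has more than 130 frost days.
cold <- state.x77[, "Frost"] > 130
agg  <- aggregate(state.x77, list(Region = state.region, Cold = cold), mean)

## Number of conceivable combinations: regions times the two values of 'cold'.
n_possible <- nlevels(state.region) * 2

## Number of combinations that actually occur in the data.
n_nonempty <- sum(table(state.region, cold) > 0)

## The result has one row per non-empty combination only.
nrow(agg) == n_nonempty
nrow(agg) <  n_possible
```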
aggregate.ts is the time series method. If x is not a time series, it is coerced to one. Then, the variables in x are split into appropriate blocks of length frequency(x) / nfrequency, and FUN is applied to each such block, with further (named) arguments in ... passed to it. The result returned is a time series with frequency nfrequency holding the aggregated values.
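The block structure can be checked on a small series. The quarterly series x below is hypothetical toy data chosen so that the sums are easy to verify by hand. The by-hand version with matrix() is only a sketch of the idea, not how the method is implemented.

```r
## Hypothetical quarterly series covering two full years.
x <- ts(1:8, frequency = 4, start = c(2000, 1))

## Annual totals: blocks of length frequency(x) / nfrequency = 4 / 1 = 4.
annual <- aggregate(x, nfrequency = 1, FUN = sum)

## The same blocks written out by hand: one column per block.
blocks  <- matrix(as.numeric(x), nrow = frequency(x) / 1)
by_hand <- apply(blocks, 2, sum)

## Half-yearly totals: blocks of length 4 / 2 = 2.
half_yearly <- aggregate(x, nfrequency = 2, FUN = sum)

as.numeric(annual)        # 10 26
as.numeric(half_yearly)   # 3 7 11 15
```

Here nfrequency = 1 and nfrequency = 2 both divide the original frequency of 4, as the argument description requires.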
Kurt Hornik
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to.
aggregate(state.x77, list(Region = state.region), mean)

## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
          list(Region = state.region,
               Cold = state.x77[,"Frost"] > 130),
          mean)
## (Note that no state in 'South' is THAT cold.)

## Compute the average annual approval ratings for American presidents.
aggregate(presidents, nf = 1, FUN = mean)
## Give the summer less weight.
aggregate(presidents, nf = 1, FUN = weighted.mean, w = c(1, 1, 0.5, 1))