R: Robust Fitting of Linear Models

rlm {MASS}

R Documentation

Robust Fitting of Linear Models

Description

Fit a linear model by robust regression using an M estimator.

Usage

rlm(x, ...)

## S3 method for class 'formula':
rlm(formula, data, weights, ..., subset, na.action,
    method = c("M", "MM", "model.frame"),
    wt.method = c("inv.var", "case"),
    model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)

## Default S3 method:
rlm(x, y, weights, ..., w = rep(1, nrow(x)),
    init, psi = psi.huber, scale.est, k2 = 1.345,
    method = c("M", "MM"), wt.method = c("inv.var", "case"),
    maxit = 20, acc = 1e-4, test.vec = "resid", lqs.control = NULL)

psi.huber(u, k = 1.345, deriv = 0)
psi.hampel(u, a = 2, b = 4, c = 8, deriv = 0)
psi.bisquare(u, c = 4.685, deriv = 0)

Arguments

`formula`	a formula of the form `y ~ x1 + x2 + ...`.
`data`	data frame from which variables specified in `formula` are preferentially to be taken.
`weights`	prior weights for each case.
`subset`	An index vector specifying the cases to be used in fitting.
`na.action`	A function to specify the action to be taken if `NA`s are found. The default action is for the procedure to fail. An alternative is `na.omit`, which leads to omission of cases with missing values on any required variable.
`x`	a matrix or data frame containing the explanatory variables.
`y`	the response: a vector of length the number of rows of `x`.
`method`	currently either M-estimation or find the model frame. MM estimation is M-estimation with Tukey's biweight initialized by a specific S-estimator. See the details section.
`wt.method`	are the weights case weights (giving the relative importance of case, so a weight of 2 means there are two of these) or the inverse of the variances, so a weight of two means this error is twice as variable?
`model`	should the model frame be returned in the object?
`x.ret`	should the model matrix be returned in the object?
`y.ret`	should the response be returned in the object?
`contrasts`	optional contrast specifications: se `lm`.
`w`	(optional) initial down-weighting for each case.
`init`	(optional) initial values for the coefficients OR a method to find initial values OR the result of a fit with a `coef` component. Known methods are `"ls"` (the default) for an initial least-squares fit using weights `w*weights`, and `"lts"` for an unweighted least-trimmed squares fit with 200 samples.
`psi`	the psi function is specified by this argument. It must give (possibly by name) a function `g(x, ..., deriv)` that for `deriv=0` returns psi(x)/x and for `deriv=1` returns psi'(x). Tuning constants will be passed in via `...`.
`scale.est`	method of scale estimation: re-scaled MAD of the residuals or Huber's proposal 2 (which can be selected by either `"Huber"` or `"proposal 2"`).
`k2`	tuning constant used for Huber proposal 2 scale estimation.
`maxit`	the limit on the number of IWLS iterations.
`acc`	the accuracy for the stopping criterion.
`test.vec`	the stopping criterion is based on changes in this vector.
`...`	additional arguments to be passed to `rlm.default` or to the `psi` function.
`lqs.control`	An optional list of control values for `lqs`.
`u`	numeric vector of evalation points.
`k,a,b,c`	tuning constants
`deriv`	`0` or `1`: compute values of the psi function or of its first derivative.

Details

Fitting is done by iterated re-weighted least squares (IWLS).

Psi functions are supplied for the Huber, Hampel and Tukey bisquare proposals as psi.huber, psi.hampel and psi.bisquare. Huber's corresponds to a convex optimization problem and gives a unique solution (up to collinearity). The other two will have multiple local minima, and a good starting point is desirable.

Selecting method = "MM" selects a specific set of options which ensures that the estimator has a high breakdown point. The initial set of coefficients and the final scale are selected by an S-estimator with k0 = 1.548; this gives (for n >> p) breakdown point 0.5. The final estimator is an M-estimator with Tukey's biweight and fixed scale that will inherit this breakdown point provided c > k0; this is true for the default value of c that corresponds to 95% relative efficiency at the normal.

Value

An object of class "rlm" inheriting from "lm". The additional components not in an lm object are

`s`	the robust scale estimate used
`w`	the weights used in the IWLS process
`psi`	the psi function with parameters substituted
`conv`	the convergence criteria at each iteration
`converged`	did the IWLS converge?

References

P. J. Huber (1981) Robust Statistics. Wiley.

F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986) Robust Statistics: The Approach based on Influence Functions. Wiley.

A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth & Brooks/Cole.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

data(stackloss)
summary(rlm(stack.loss ~ ., stackloss))
rlm(stack.loss ~ ., stackloss, psi = psi.hampel, init = "lts")
rlm(stack.loss ~ ., stackloss, psi = psi.bisquare)

[Package MASS version 7.2-23 Index]