rcorr {Hmisc}                R Documentation

Matrix of Correlations and Generalized Spearman Rank Correlation

Description

rcorr computes a matrix of Pearson's r or Spearman's rho rank correlation coefficients for all possible pairs of columns of a matrix. Missing values are deleted in pairs rather than deleting all rows of x that have any missing values. Ranks are computed using efficient algorithms (see reference 2, Press et al.), using midranks for ties.
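The difference between pairwise and row-wise deletion can be illustrated with base R's cor, which offers both through its use argument. This is only a sketch of the idea on made-up data; it does not call rcorr.

```r
# Made-up data: column z has one missing value.
m <- cbind(x = c(-2, -1, 0, 1, 2),
           v = c(1, 2, 3, 5, 4),
           z = c(1, 2, 3, 4, NA))

# Pairwise deletion: each pair of columns uses every row in which both
# columns are observed, so the x-v correlation uses all 5 rows.
r_pairwise <- cor(m, use = "pairwise.complete.obs")

# Row-wise deletion: row 5 is dropped for every pair because z is NA there,
# so the x-v correlation uses only 4 rows.
r_listwise <- cor(m, use = "complete.obs")

r_pairwise["x", "v"]
r_listwise["x", "v"]
```

The two x-v correlations differ because row-wise deletion discards an observation that is complete for x and v.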

spearman2 computes the square of Spearman's rho rank correlation and a generalization of it in which x can relate non-monotonically to y. This is done by computing the Spearman multiple rho-squared between (rank(x), rank(x)^2) and y. When x is categorical, a different kind of Spearman correlation, the one used in the Kruskal-Wallis test, is computed (and spearman2 can do the Kruskal-Wallis test). This is done by computing the ordinary multiple R^2 between k-1 dummy variables and rank(y), where x has k categories.

x can also be a formula, in which case each predictor is correlated separately with y, using the non-missing observations for that predictor. print and plot methods allow one to easily print or plot the results of spearman2(formula). The adjusted rho^2 is also computed, using the same formula as for the ordinary adjusted R^2. The F test uses the unadjusted R^2. For plot, a dot chart is drawn which by default shows, in sorted order, the adjusted rho^2.
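The quantities described above can be approximated in base R with lm. The following is a minimal sketch under stated assumptions, not Hmisc's implementation: the helper names adj_r2, rho2_sketch and rho2_categorical_sketch are hypothetical, and the adjustment is assumed to be the usual 1 - (1 - R^2)(n - 1)/(n - k - 1), with k the number of regression terms.

```r
# Hypothetical helpers; a sketch only, not the code used by spearman2.
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Numeric x: regress rank(y) on rank(x) and, for p = 2, also rank(x)^2.
rho2_sketch <- function(x, y, p = 1) {
  rx <- rank(x)                      # rank() uses midranks for ties
  ry <- rank(y)
  fit <- if (p == 1) lm(ry ~ rx) else lm(ry ~ rx + I(rx^2))
  r2 <- summary(fit)$r.squared
  c(rho2 = r2, adjusted = adj_r2(r2, length(y), p))
}

# Categorical x: ordinary R^2 between the k-1 dummy variables and rank(y).
rho2_categorical_sketch <- function(g, y) {
  g <- factor(g)
  r2 <- summary(lm(rank(y) ~ g))$r.squared
  c(rho2 = r2, adjusted = adj_r2(r2, length(y), nlevels(g) - 1))
}

x <- c(-2, -1, 0, 1, 2)
y <- c(4, 1, 0, 1, 4)                # U-shaped in x
linear    <- rho2_sketch(x, y, p = 1)
quadratic <- rho2_sketch(x, y, p = 2)

g  <- rep(c("a", "b", "c"), each = 4)
yg <- c(1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11)
categorical <- rho2_categorical_sketch(g, yg)
```

For the U-shaped data the ordinary rho^2 is essentially zero while the quadratic version is large, which is the non-monotonic case that p=2 is meant to detect. In the categorical case, (n - 1) times the unadjusted R^2 equals the statistic reported by base R's kruskal.test.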

spearman computes Spearman's rho on non-missing values of two variables. spearman.test is a simple version of spearman2.default.
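As a rough illustration of what "Spearman's rho on non-missing values" means, the sketch below drops every pair in which either value is NA and then takes the Pearson correlation of the midranks. The helper name spearman_sketch is hypothetical, and this is not the source of Hmisc's spearman.

```r
# Hypothetical helper; a sketch only.
spearman_sketch <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)        # keep pairs observed on both variables
  cor(rank(x[ok]), rank(y[ok]))      # Pearson r of the midranks
}

x <- c(3, 1, 4, 1, 5, NA, 2)
y <- c(2, 7, 1, 8, NA, 3, 6)
rho <- spearman_sketch(x, y)
rho
```

The result agrees with base R's cor(x, y, method = "spearman", use = "complete.obs").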

Usage

rcorr(x, y, type=c("pearson","spearman"))

## S3 method for class 'rcorr':
print(x, ...)

spearman2(x, ...)

## Default S3 method:
spearman2(x, y, p=1, minlev=0, exclude.imputed=TRUE, ...)

## S3 method for class 'formula':
spearman2(x, p=1, 
          data, subset, na.action, minlev=0, exclude.imputed=TRUE, ...)

## S3 method for class 'spearman2.formula':
print(x, ...)

## S3 method for class 'spearman2.formula':
plot(x, what=c('Adjusted rho2','rho2','P'),
     sort.=TRUE, main, xlab, ...)

spearman(x, y)

spearman.test(x, y, p=1)

Arguments

x: a numeric matrix with at least 5 rows and at least 2 columns (if y is absent). For spearman2, the first argument may be a vector of any type, including character or factor. It may also be a formula, in which case spearman2.formula is invoked and each predictor on the right-hand side of the formula is correlated separately with the response variable. For print, x is an object produced by rcorr or spearman2. For plot, x is a result returned by spearman2. For spearman and spearman.test, x is a numeric vector, as is y.
type: specifies the type of correlations to compute. Spearman correlations are the Pearson linear correlations computed on the ranks of non-missing elements, using midranks for ties.
y: a numeric vector or matrix which will be concatenated to x. If y is omitted for rcorr, x must be a matrix.
p: for numeric variables, specifies the order of the Spearman rho^2 to use. The default is p=1 to compute the ordinary rho^2. Use p=2 to compute the quadratic rank generalization to allow non-monotonicity. p is ignored for categorical predictors.
data, subset, na.action: the usual options for models. The default for na.action is to retain all values, NA or not, so that NAs can be deleted in only a pairwise fashion.
minlev: minimum relative frequency that a level of a categorical predictor should have before it is pooled with other categories (see combine.levels) in spearman2. The default, minlev=0, causes no pooling.
exclude.imputed: set to FALSE to include imputed values (created by impute) in the calculations.
what: specifies which statistic to plot: 'Adjusted rho2' (the default), 'rho2', or 'P'.
sort.: set sort.=FALSE to suppress sorting variables by the statistic being plotted.
main: main title for plot. The default title shows the name of the response variable.
xlab: x-axis label. The default is constructed from what.
...: other arguments that are passed to dotchart2.
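The effect of minlev can be pictured with a small base R sketch that pools levels whose relative frequency falls below a threshold. The helper pool_rare_levels and the pooled label "OTHER" are assumptions made for illustration; see combine.levels for the behavior spearman2 actually relies on.

```r
# Hypothetical helper; the pooled label is an assumption of this sketch.
pool_rare_levels <- function(g, minlev = 0, pooled = "OTHER") {
  g <- factor(g)
  freq <- prop.table(table(g))                  # relative frequency per level
  rare <- names(freq)[as.vector(freq) < minlev]
  lev <- levels(g)
  lev[lev %in% rare] <- pooled                  # duplicated labels merge levels
  levels(g) <- lev
  g
}

g <- c(rep("a", 10), rep("b", 8), "c", "d")
table(pool_rare_levels(g, minlev = 0.1))        # c and d (5% each) are pooled
table(pool_rare_levels(g, minlev = 0))          # no pooling
```

With minlev = 0 no level can fall below the threshold, which matches the documented default of no pooling.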

Details

Uses midranks in case of ties, as described by Hollander and Wolfe. P-values are approximated by using the t distribution.
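Both points can be sketched in base R. The t approximation shown is the standard one for a correlation r based on n pairs, t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom; that rcorr uses exactly this form is an assumption here, and the helper name p_from_r is hypothetical.

```r
# Midranks: tied values share the average of the ranks they occupy.
rank(c(10, 20, 20, 30))   # 1.0 2.5 2.5 4.0

# Two-sided P-value from the t approximation (assumed standard form).
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 1.9, 3.5, 3.9, 6.1, 5.8)
r <- cor(x, y)
p <- p_from_r(r, length(x))
p
```

For Pearson's r this reproduces the P-value from base R's cor.test(x, y).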

Value

rcorr returns a list with elements r, the matrix of correlations; n, the matrix of the number of observations used in analyzing each pair of variables; and P, the asymptotic P-values. Pairs with fewer than 2 non-missing values have the r values set to NA. The diagonal elements of n are the number of non-NAs for the single variable corresponding to that row and column.

spearman2.default (the function that is called for a single x, i.e., when there is no formula) returns a vector of statistics for the variable. spearman2.formula returns a matrix with rows corresponding to predictors.
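The structure of the n element can be mimicked in base R by counting, for each pair of columns, the rows in which both are observed. This is a sketch of the counting rule described above on made-up data, not a call to rcorr.

```r
m <- cbind(x = c(-2, -1, 0, 1, 2),
           y = c(4, 1, 0, 1, 4),
           z = c(1, 2, 3, 4, NA))

obs <- (!is.na(m)) * 1        # 1 where a value is present, 0 where it is NA
n_pairs <- crossprod(obs)     # t(obs) %*% obs: jointly observed rows per pair
n_pairs
```

The off-diagonal entries involving z are 4 because one row is lost to the NA, and the diagonal entry for z is its own count of non-NA values.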

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu

References

Hollander M. and Wolfe D.A. (1973). Nonparametric Statistical Methods. New York: Wiley.

Press W.H., Flannery B.P., Teukolsky S.A. and Vetterling W.T. (1988). Numerical Recipes in C. Cambridge: Cambridge University Press.

See Also

hoeffd, cor, combine.levels, varclus, dotchart2, impute

Examples

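library(Hmisc)

# y = x^2 is non-monotonic in x; z has one missing value
x <- c(-2, -1, 0, 1, 2)
y <- c(4,   1, 0, 1, 4)
z <- c(1,   2, 3, 4, NA)
v <- c(1,   2, 3, 4, 5)
rcorr(cbind(x,y,z,v))    # NA in z is deleted pairwise

spearman2(x, y)          # ordinary rho^2 (default p=1)
# Quadratic generalization (p=2) for each predictor of z, shown as a dot chart
plot(spearman2(z ~ x + y + v, p=2))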

[Package Hmisc version 3.0-10 Index]