Overview {Hmisc}    R Documentation

Overview of Hmisc Library


The Hmisc library contains many functions useful for data analysis, high-level graphics, utility operations, computing sample size and power, translating SAS datasets into S, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of S objects to LaTeX code, recoding variables, and bootstrap repeated measures analysis. Most of these functions were written by F Harrell, but a few were collected from statlib and from s-news; other authors are indicated below. This collection of functions includes all of Harrell's submissions to statlib other than the functions in the Design and display libraries. A few of the functions do not have "Help" documentation.

To make Hmisc load silently, issue options(Hverbose=FALSE) before library(Hmisc).
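
As a minimal sketch of that load sequence (the requireNamespace guard is an addition here, only so the snippet also runs on a system where Hmisc is not installed; whether the Hverbose option is still consulted may depend on the Hmisc version):

```r
# Set the option first, so it is already in place when the package is attached.
options(Hverbose = FALSE)

# Guard (an assumption of this sketch, not part of the documented recipe):
# attach Hmisc only if it is actually installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
}
```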


Function Name Purpose
abs.error.pred Computes various indexes of predictive accuracy based
on absolute errors, for linear models
all.is.numeric Check if character strings are legal numerics
approxExtrap Linear extrapolation
aregImpute Multiple imputation based on additive regression,
bootstrapping, and predictive mean matching
areg.boot Nonparametrically estimate transformations for both
sides of a multiple additive regression, and
bootstrap these estimates and R^2
ballocation Optimum sample allocations in 2-sample proportion test
binconf Exact confidence limits for a proportion and more accurate
(narrower!) score stat.-based Wilson interval
(Rollin Brant, mod. FEH)
bootkm Bootstrap Kaplan-Meier survival or quantile estimates
bpower Approximate power of 2-sided test for 2 proportions
Includes bpower.sim for exact power by simulation
bpplot Box-Percentile plot
(Jeffrey Banfield, umsfjban@bill.oscs.montana.edu)
bsamsize Sample size requirements for test of 2 proportions
bystats Statistics on a single variable by levels of >=1 factors
bystats2 2-way statistics
calltree Calling tree of functions
(David Lubinsky, david@hoqax.att.com)
character.table Shows numeric equivalents of all Latin characters
Useful for putting many special chars. in graph titles
(Pierre Joyet, pierre.joyet@bluewin.ch)
ciapower Power of Cox interaction test
cleanup.import More compactly store variables in a data frame, and clean up
problem data when, e.g., an Excel spreadsheet had a
non-numeric value in a numeric column
combine.levels Combine infrequent levels of a categorical variable
comment Attach a comment attribute to an object:
comment(fit) <- 'Used old data'
comment(fit) (prints comment)
confbar Draws confidence bars on an existing plot using multiple
confidence levels distinguished using color or gray scale
contents Print the contents (variables, labels, etc.) of a data frame
cpower Power of Cox 2-sample test allowing for noncompliance
Cs Vector of character strings from list of unquoted names
csv.get Enhanced importing of comma separated files labels
cut2 Like cut with better endpoint label construction and allows
construction of quantile groups or groups with given n
datadensity Snapshot graph of distributions of all variables in
a data frame. For continuous variables uses scat1d.
dataRep Quantify representation of new observations in a database
ddmmmyy SAS "date7" output format for a chron object
deff Kish design effect and intra-cluster correlation
describe Function to describe different classes of objects.
Invoke by saying describe(object). It calls one of the following:
describe.data.frame Describe all variables in a data frame (generalization
describe.default Describe a variable (generalization of SAS UNIVARIATE)
do Assists with batch analyses
dot.chart Dot chart for one or two classification variables
Dotplot Enhancement of Trellis dotplot allowing for matrix
x-var., auto generation of Key function, superposition
drawPlot Simple mouse-driven drawing program, including a function
for fitting Bezier curves
ecdf Empirical cumulative distribution function plot
eip Edit an object "in-place" (may be dangerous!), e.g.
eip(sqrt) will replace the builtin sqrt function
errbar Plot with error bars (Charles Geyer, U. Chi., mod FEH)
event.chart Plot general event charts (Jack Lee, jjlee@mdanderson.org,
Ken Hess, Joel Dubin; Am Statistician 54:63-70,2000)
event.history Event history chart with time-dependent cov. status
(Joel Dubin, joel.dubin@yale.edu)
find.matches Find matches (with tolerances) between columns of 2 matrices
first.word Find the first word in an S expression (R Heiberger)
fit.mult.impute Fit most regression models over multiple transcan imputations,
compute imputation-adjusted variances and avg. betas
format.df Format a matrix or data frame with much user control
(R Heiberger and FE Harrell)
ftupwr Power of 2-sample binomial test using Fleiss, Tytun, Ury
ftuss Sample size for 2-sample binomial test using Fleiss, Tytun, Ury
(Both by Dan Heitjan, dheitjan@biostats.hmc.psu.edu)
gbayes Bayesian posterior and predictive distributions when both
the prior and the likelihood are Gaussian
getHdata Fetch and list datasets on our web site
gs.slide Sets nice defaults for graph sheets for S-Plus 2000 for
copying graphs into Microsoft applications
hdquantile Harrell-Davis nonparametric quantile estimator with s.e.
histbackback Back-to-back histograms (Pat Burns, Salomon Smith
Barney, London, pburns@dorado.sbi.com)
hist.data.frame Matrix of histograms for all numeric vars. in data frame
Use hist.data.frame(data.frame.name)
histSpike Add high-resolution spike histograms or density estimates
to an existing plot
hoeffd Hoeffding's D test (omnibus test of independence of X and Y)
impute Impute missing data (generic method)
interaction More flexible version of builtin function
is.present Tests for non-blank character values or non-NA numeric values
james.stein James-Stein shrinkage estimates of cell means from raw data
labcurve Optimally label a set of curves that have been drawn on
an existing plot, on the basis of gaps between curves.
Also position legends automatically at emptiest rectangle.
label Set or fetch a label for an S-object
Lag Lag a vector, padding on the left with NA or ''
latex Convert an S object to LaTeX (R Heiberger & FE Harrell)
ldBands Lan-DeMets bands for group sequential tests
list.tree Pretty-print the structure of any data object
(Alan Zaslavsky, zaslavsk@hcp.med.harvard.edu)
Load Enhancement of load
mask 8-bit logical representation of a short integer value
(Rick Becker)
matchCases Match each case on one continuous variable
matxv Fast matrix * vector, handling intercept(s) and NAs
mem mem() types quick summary of memory used during session
mgp.axis Version of axis() that uses appropriate mgp from
mgp.axis.labels and gets around bug in axis(2, ...)
that causes it to assume las=1
mgp.axis.labels Used by survplot and plot in Design library (and other
functions in the future) so that different spacing
between tick marks and axis tick mark labels may be
specified for x- and y-axes. ps.slide, win.slide,
gs.slide set up nice defaults for mgp.axis.labels.
Otherwise use mgp.axis.labels('default') to set defaults.
Users can set values manually using
mgp.axis.labels(x,y) where x and y are 2nd value of
par('mgp') to use. Use mgp.axis.labels(type=w) to
retrieve values, where w='x', 'y', 'x and y', 'xy',
to get 3 mgp values (first 3 types) or 2 mgp.axis.labels.
minor.tick Add minor tick marks to an existing plot
mtitle Add outer titles and subtitles to a multiple plot layout
nomiss Return a matrix after excluding any row with an NA
panel.bpplot Panel function for trellis bwplot - box-percentile plots
panel.plsmo Panel function for trellis xyplot - uses plsmo
pc1 Compute first prin. component and get coefficients on
original scale of variables
plotCorrPrecision Plot precision of estimate of correlation coefficient
plsmo Plot smoothed x vs. y with labeling and exclusion of NAs
Also allows a grouping variable and plots unsmoothed data
popower Power and sample size calculations for ordinal responses
(two treatments, proportional odds model)
prn prn(expression) does print(expression) but titles the
output with 'expression'. Do prn(expression,txt) to add
a heading ('txt') before the 'expression' title
p.sunflowers Sunflower plots (Andreas Ruckstuhl, Werner Stahel,
Martin Maechler, Tim Hesterberg)
ps.slide Set up postscript() using nice defaults for different types
of graphics media
pstamp Stamp a plot with date in lower right corner (pstamp())
Add ,pwd=T and/or ,time=T to add current directory
name or time
Put additional text for label as first argument, e.g.
pstamp('Figure 1') will draw 'Figure 1 date'
putKey Different way to use key()
putKeyEmpty Put key at most empty part of existing plot
rcorr Pearson or Spearman correlation matrix with pairwise deletion
of missing data
rcorr.cens Somers' Dyx rank correlation with censored data
rcorrp.cens Assess difference in concordance for paired predictors
rcspline.eval Evaluate restricted cubic spline design matrix
rcspline.plot Plot spline fit with nonparametric smooth and grouped estimates
rcspline.restate Restate restricted cubic spline in unrestricted form, and
create TeX expression to print the fitted function
recode Recodes variables
reShape Reshape a matrix into 3 vectors, reshape serial data
rm.boot Bootstrap spline fit to repeated measurements model,
with simultaneous confidence region - least
squares using spline function in time
rMultinom Generate multinomial random variables with varying prob.
samplesize.bin Sample size for 2-sample binomial problem
(Rick Chappell, chappell@stat.wisc.edu)
sas.get Convert SAS dataset to S data frame
sasxport.get Enhanced importing of SAS transport dataset in R
Save Enhancement of save
scat1d Add 1-dimensional scatterplot to an axis of an existing plot
(like bar-codes, FEH/Martin Maechler,
maechler@stat.math.ethz.ch/Jens Oehlschlaegel-Akiyoshi,
score.binary Construct a score from a series of binary variables or
sedit A set of character handling functions written entirely
in S. sedit() does much of what the UNIX sed
program does. Other functions included are
substring.location, substring<-, replace.string.wild,
and functions to check if a string is numeric or
contains only the digits 0-9
setpdf Adobe PDF graphics setup for including graphics in books
and reports with nice defaults, minimal wasted space
setps Postscript graphics setup for including graphics in books
and reports with nice defaults, minimal wasted space
Internally uses psfig function by
Antonio Possolo (antonio@atc.boeing.com).
setps works with Ghostscript to convert .ps to .pdf
setTrellis Set Trellis graphics to use blank conditioning panel strips,
line thickness 1 for dot plot reference lines:
setTrellis(); 3 optional arguments
show.col Show colors corresponding to col=0,1,...,99
show.pch Show all plotting characters specified by pch=.
Just type show.pch() to draw the table on the
current device.
showPsfrag Use LaTeX to compile, and dvips and ghostview to
display a postscript graphic containing psfrag strings
solvet Version of solve with argument tol passed to qr
somers2 Somers' rank correlation and c-index for binary y
spearman Spearman rank correlation coefficient spearman(x,y)
spearman.test Spearman 1 d.f. and 2 d.f. rank correlation test
spearman2 Spearman multiple d.f. rho^2, adjusted rho^2, Wilcoxon-Kruskal-
Wallis test, for multiple predictors
spower Simulate power of 2-sample test for survival under
complex conditions
Also contains the Gompertz2,Weibull2,Lognorm2 functions.
spss.get Enhanced importing of SPSS files using read.spss function
src src(name) = source("name.s") with memory
store store an object permanently (easy interface to assign function)
strmatch Shortest unique identifier match
(Terry Therneau, therneau@mayo.edu)
subset More easily subset a data frame
substi Substitute one var for another when observations NA
summarize Generate a data frame containing stratified summary
statistics. Useful for passing to trellis.
summary.formula General table making and plotting functions for summarizing
symbol.freq X-Y Frequency plot with circles' area prop. to frequency
sys Execute unix() or dos() depending on what's running
tex Enclose a string with the correct syntax for using
with the LaTeX psfrag package, for postscript graphics
transace ace() packaged for easily automatically transforming all
variables in a matrix
transcan Automatic transformation and imputation of NAs for a
series of predictor variables
trap.rule Area under curve defined by arbitrary x and y vectors,
using trapezoidal rule
trellis.strip.blank To make the strip titles in trellis more visible, you can
make the backgrounds blank by saying trellis.strip.blank().
Use before opening the graphics device.
t.test.cluster 2-sample t-test for cluster-randomized observations
uncbind Form individual variables from a matrix
upData Update a data frame (change names, labels, remove vars, etc.)
units Set or fetch "units" attribute - units of measurement for var.
varclus Graph hierarchical clustering of variables using squared
Pearson or Spearman correlations or Hoeffding D as similarities
Also includes the naclus function for examining similarities in
patterns of missing values across variables.
xy.group Compute mean x vs. function of y by groups of x
xYplot Like trellis xyplot but supports error bars and multiple
response variables that are connected as separate lines
win.slide Setup win.graph or win.printer using nice defaults for
num.denom.setup Set of functions for obtaining weighted estimates
zoom Zoom in on any graphical display
(Bill Dunlap, bill@statsci.com)
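
Two of the entries above describe computations that are easy to state concretely: the score-based Wilson interval mentioned under binconf, and the trapezoidal rule mentioned under trap.rule. The following is a base-R sketch of the underlying formulas only. It is not the Hmisc code, and the names wilson_interval and trap_area are hypothetical helpers introduced for illustration.

```r
# Base-R sketches of two computations named in the table above.
# These are NOT the Hmisc implementations; function names are hypothetical.

# Wilson score interval for a proportion: x successes out of n trials.
wilson_interval <- function(x, n, alpha = 0.05) {
  z      <- qnorm(1 - alpha / 2)          # normal critical value
  p      <- x / n                         # observed proportion
  denom  <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = center - half, upper = center + half)
}

# Area under the piecewise-linear curve through (x, y), trapezoidal rule.
trap_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  k <- length(y)
  sum(diff(x) * (y[-k] + y[-1]) / 2)
}

ci   <- wilson_interval(x = 7, n = 20)
area <- trap_area(x = 0:4, y = (0:4)^2)   # 22; the exact integral of x^2 is 64/3

print(ci)
print(area)
```

The Wilson limits can be cross-checked against prop.test(7, 20, correct = FALSE)$conf.int in the stats package, which uses the same score interval when the continuity correction is turned off.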

Copyright Notice

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

In short: You may use it any way you like, as long as you don't charge money for it, remove this notice, or hold anyone liable for its results. Also, please acknowledge the source and communicate changes to the author.

If this software is used in work presented for publication, kindly reference it using, for example:
Harrell FE (2004): Hmisc S function library. Programs available from http://biostat.mc.vanderbilt.edu/s/Hmisc.
Be sure to reference S-Plus or R itself and other libraries used.


This work was supported by grants from the Agency for Health Care Policy and Research (US Public Health Service) and the Robert Wood Johnson Foundation.


Frank E Harrell Jr
Professor of Biostatistics
Chair, Department of Biostatistics
Vanderbilt University School of Medicine
Nashville, Tennessee


See Alzola CF, Harrell FE (2004): An Introduction to S and the Hmisc and Design Libraries at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf for extensive documentation and examples for the Hmisc package.

[Package Hmisc version 3.0-10 Index]