R: Some diagnostics for a fitted gam model

gam.check {mgcv}

R Documentation

Some diagnostics for a fitted gam model

Description

Takes a fitted gam object produced by gam() and produces some diagnostic information about the fitting procedure and results.

Usage

gam.check(b)

Arguments

`b`	a fitted `gam` object as produced by `gam()`.

Details

This function plots 4 standard diagnostic plots, and some other convergence diagnostics. Output differs depending on whether the underlying fitting method was mgcv or another method (see gam.method).

For mgcv based fits , the first plot shows the GCV or UBRE score against model degrees of freedom, given the final estimates of the relative smoothing parameters for the model. This is a slice through the GCV/UBRE score function that passes through the minimum found during fitting. Although not conclusive (except in the single smoothing parameter case), a lack of multiple local minima on this plot is suggestive of a lack of multiple local minima in the GCV/UBRE function and is therefore a good thing. Multiple local minima on this plot indicates that the GCV/UBRE function may have multiple local minima, but in a multiple smoothing parameter case this is not conclusive - multiple local minima on one slice through a function do not necessarily imply that the function has multiple local minima. A `good' plot here is a smooth curve with only one local minimum (which is therefore its global minimum).

The location of the minimum used for the fitted model is also marked on the first plot. Sometimes this location may be a local minimum that is not the global minimum on the plot. There is a legitimate reason for this to happen, and it does not always indicate problems. Smoothing parameter selection is based on applying GCV/UBRE to the approximating linear model produced by the GLM IRLS fitting method employed in gam.fit(). It is sometimes possible for these approximating models to develop `phantom' minima in their GCV/UBRE scores. These minima usually imply a big change in model parameters, and have the characteristic that the minimia will not be present in the GCV/UBRE score of the approximating model that would result from actually applying this parameter change. In other words, these are spurious minima in regions of parameter space well beyond those for which the weighted least squares problem can be expected to represent the real underlying likelihood well. Such minima can lead to convergence problems. To help ensure convergence even in the presence of phantom minima, gam.fit switches to a cautious optimization mode after a user controlled number of iterations of the IRLS algorithm (see gam.control). In the presence of local minima in the GCV/UBRE score, this method selects the minimum that leads to the smallest change in model estimated degrees of freedom. This approach is usually sufficient to deal with phantom minima. Setting trace to TRUE in gam.control will allow you to check exactly what is happening.

If the location of the point indicating the minimum is not on the curve showing the GCV/UBRE function then there are numerical problems with the estimation of the effective degrees of freedom: this usually reflects problems with the relative scaling of covariates that are arguments of a single smooth. In this circumstance reported estimated degrees of freedom can not be trusted, although the fitted model and term estimates are likely to be quite acceptable.

If the fit method is based on magic or gam.fit2 then there is no global search and the problems with phantom local minima are much reduced. The first plot in this case will simply be a normal QQ plot of the standardized residuals.

The other 3 plots are two residual plots and plot of fitted values against original data.

The function also prints out information about the convergence of the GCV minimization algorithm, indicating how many iterations were required to minimise the GCV/UBRE score. A message is printed if the minimization terminated by failing to improve the score with a steepest descent step: otherwise minimization terminated by meeting convergence criteria. The mean absolute gradient or RMS gradient of the GCV/UBRE function at the minimum is given. An indication of whether or not the Hessian of the GCV/UBRE function is positive definite is given. If some smoothing parameters are not well defined (effectively zero, or infinite) then it may not be, although this is not usually a problem. If the fit method is mgcv, a message is printed if the second guess smoothing parameters did not improve on the first guess - this is primarily there for the developer. For other fit methods the estimated rank of the model is printed.

The ideal results from this function have a smooth, single minima GCV/UBRE plot (where appropriate), good residual plots, and convergence to small gradients with a positive definite Hessian. However, failure to meet some of these criteria is often acceptable, and the information provided is primarily of use in diagnosing suspected problems. High gradients at convergence are a clear indication of problems, however.

Fuller data can be extracted from mgcv.conv part of the gam object.

Author(s)

Simon N. Wood simon.wood@r-project.org

References

Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

http://www.stats.gla.ac.uk/~simon/

Examples

library(mgcv)
set.seed(0)
n<-200
sig<-2
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
y <- 2 * sin(pi * x0)
y <- y + exp(2 * x1) - 3.75887
y <- y+0.2*x2^11*(10*(1-x2))^6+10*(10*x2)^3*(1-x2)^10-1.396
e <- rnorm(n, 0, sig)
y <- y + e
b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3))
plot(b,pages=1)
gam.check(b)

[Package mgcv version 1.3-12 Index]