R: Find Similar Classes in Two-way Contingency Tables

matchClasses {e1071}

R Documentation

Find Similar Classes in Two-way Contingency Tables

Description

Try to find a mapping between the two groupings, such that as many cases as possible are in one of the matched pairs.

Usage

matchClasses(tab, method="rowmax", iter=1, maxexact=9, verbose=TRUE)
compareMatchedClasses(x, y, method="rowmax", iter=1,
                      maxexact=9, verbose=FALSE)

Arguments

`tab`	Two-way contingency table of class memberships
`method`	One of `"rowmax"`, `"greedy"` or `"exact"`.
`iter`	Number of iterations used in greedy search.
`verbose`	If `TRUE`, display some status messages during computation.
`maxexact`	Maximum number of variables for which all possible permutations are computed.
`x, y`	Vectors or matrices with class memberships.

Details

If method="rowmax", then each class defining a row in the contingency table is mapped to the column of the correspoding row maximum. Hence, some columns may be mapped to more than one row (while each row is mapped to a single column).

If method="greedy" or method="exact", then the contingency table must be a square matrix and a unique mapping is computed. This corresponds to a permutation of columns and rows, such that sum of the main diagonal, i.e., the trace of the matrix, gets as large as possible. For both methods, first all pairs where row and columns maxima correspond and are bigger than the sum of all other elements in the corresponding columns and rows together are located and fixed (this is a necessary condition for maximal trace).

If method="exact", then for the remaining rows and columns, all possible permutations are computed and the optimum is returned. This can get computationally infeasible very fast. If more than maxexact rows and columns remain after applying the necessary condition, then method is reset to "greedy". If method="greedy", then a greedy heuristic is tried iter times. Repeatedly a row is picked at random and matched to the free column with the maximum value.

compareMatchedClasses() computes the contingency table for each combination of columns from x and y and applies matchClasses to that table. The columns of the table are permuted accordingly and then the table is passed to classAgreement. The resulting agreement coefficients (diag, kappa, ...) are returned. The return value of compareMatchedClasses() is a list containing a matrix for each coefficient; with element (k,l) corresponding to the k-th column of x and l-th column of y. If y is missing, then the columns of x are compared with each other.

Author(s)

Friedrich Leisch

Examples

## a stupid example with no class correlations:
g1 <- sample(1:5, size=1000, replace=TRUE)
g2 <- sample(1:5, size=1000, replace=TRUE)
tab <- table(g1, g2)
matchClasses(tab, "exact")

## let pairs (g1=1,g2=4) and (g1=3,g2=1) agree better
k <- sample(1:1000, size=200)
g1[k] <- 1
g2[k] <- 4

k <- sample(1:1000, size=200)
g1[k] <- 3
g2[k] <- 1

tab <- table(g1, g2)
matchClasses(tab, "exact")

## get agreement coefficients:
compareMatchedClasses(g1, g2, method="exact")

[Package e1071 version 1.5-13 Index]