R: Two Sample Tests

TwoSampleTests {fBasics}

R Documentation

Two Sample Tests

Description

A collection and description of functions for two-sample statistical tests. The functions test for distributional equivalence, for differences in location, variance and scale, and for correlation.

Distributional Equivalence:

`ks2Test`	The two-sample Kolmogorov–Smirnov test.

Difference in Locations:

`tTest`	The t test,
`kw2Test`	the Kruskal–Wallis test.

Difference in Variance:

`varfTest`	The variance F test,
`bartlett2Test`	the Bartlett test,
`fligner2Test`	the Fligner–Killeen test.

Difference in Scale:

`ansariTest`	The Ansari–Bradley test,
`moodTest`	the Mood test.

Correlations:

`pearsonTest`	Pearson's coefficient,
`kendallTest`	Kendall's tau,
`spearmanTest`	Spearman's rho.

Test Distributions:

`[dpq]ansariw`	Distribution of the Ansari W statistic.
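To illustrate what the exact null distribution of the Ansari W statistic looks like, here is a brute-force sketch in base R. It is not the AS 93 algorithm used by `dansariw`; it simply enumerates every way of assigning m of the N = m + n ranks to the first sample, so it is only feasible for small m and n. The helper name `ansari_exact` is hypothetical.

```r
# Exact null distribution of the Ansari-Bradley W statistic by full
# enumeration (hypothetical helper; only practical for small m and n).
ansari_exact <- function(m, n) {
  N <- m + n
  r <- seq_len(N)
  score <- pmin(r, N - r + 1)                  # scores 1, 2, ..., 2, 1
  idx <- combn(N, m)                           # all rank sets for sample x
  w <- colSums(matrix(score[idx], nrow = m))   # W for each rank set
  tab <- table(w)
  data.frame(w = as.numeric(names(tab)),
             prob = as.numeric(tab) / ncol(idx))
}

d <- ansari_exact(m = 4, n = 5)
# Add cumulative probabilities, i.e. the distribution function of W
cbind(d, cum = cumsum(d$prob))
```

For m = 4 and n = 5 the scores are 1, 2, 3, 4, 5, 4, 3, 2, 1, so W ranges from 6 to 16 with mean 4 · 25 / 9.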

Usage

ks2Test(x, y, title = NULL, description = NULL)

tTest(x, y, title = NULL, description = NULL) 
kw2Test(x, y, title = NULL, description = NULL)

varfTest(x, y, title = NULL, description = NULL)
bartlett2Test(x, y, title = NULL, description = NULL)
fligner2Test(x, y, title = NULL, description = NULL)

ansariTest(x, y, title = NULL, description = NULL)
moodTest(x, y, title = NULL, description = NULL)

pearsonTest(x, y, title = NULL, description = NULL)
kendallTest(x, y, title = NULL, description = NULL)
spearmanTest(x, y, title = NULL, description = NULL)

dansariw(x = NULL, m, n = m)
pansariw(q = NULL, m, n = m)
qansariw(p, m, n = m)

Arguments

`description`	optional description string, or a vector of character strings.
`m, n`	[*ansariw] - the number of observations in the first and the second sample, respectively.
`p`	[qansariw] - a numeric vector of probabilities.
`q`	[pansariw] - a numeric vector of quantiles.
`title`	an optional title string; if not specified, the name of the input data is deparsed.
`x, y`	numeric vectors of data values. [bartlett2Test][fligner2Test][kw2Test] - here `x` can be a list, where each element is either a vector or an object of class `timeSeries`; `y` is only used in the two-sample situation, where `x` and `y` are two vectors or objects of class `timeSeries`. [dansariw] - a numeric vector of quantiles.

Details

The tests may be of interest for many financial and economic applications, especially for the comparison of two time series. The tests are grouped according to their purpose.

Distributional Equivalence:

The function ks2Test performs a two-sample Kolmogorov–Smirnov test of the null hypothesis that the two data samples x and y come from the same distribution. This common distribution need not be normal; in fact, it is not specified at all.
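The two-sample Kolmogorov–Smirnov statistic D is the largest vertical distance between the two empirical distribution functions. The following sketch computes it directly in base R and compares it with `stats::ks.test`. It assumes, without checking the fBasics source, that ks2Test reports the same D; the helper `ks_stat` is hypothetical.

```r
set.seed(4711)
x <- rnorm(50)
y <- rnorm(50, mean = 0.5)

# D = sup_t |F_x(t) - F_y(t)|; both ECDFs are step functions that only
# jump at observed values, so evaluating there is sufficient.
ks_stat <- function(x, y) {
  z <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

D <- ks_stat(x, y)
D
ks.test(x, y)$statistic   # reference value from R's stats package
```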

Differences in Location:

The function tTest tests whether the means of two unpaired samples are equal. Two variants are computed: one assuming equal variances (the pooled, or Student, t test) and one allowing unequal variances (Welch's t test).
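The two variants differ only in how the standard error and the degrees of freedom are formed. This sketch computes both t statistics and the Welch–Satterthwaite degrees of freedom by hand in base R; the variable names are illustrative, and the comparison target is `stats::t.test`, not fBasics' tTest itself.

```r
set.seed(4711)
x <- rnorm(30)
y <- rnorm(40, sd = 2)
nx <- length(x); ny <- length(y)

# Equal variances: pooled variance estimate, nx + ny - 2 degrees of freedom
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
t_pooled <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# Unequal variances (Welch): separate variances, Welch-Satterthwaite df
se2 <- var(x) / nx + var(y) / ny
t_welch <- (mean(x) - mean(y)) / sqrt(se2)
df_welch <- se2^2 / ((var(x) / nx)^2 / (nx - 1) + (var(y) / ny)^2 / (ny - 1))

c(pooled = t_pooled, welch = t_welch, welch.df = df_welch)
```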

The function kw2Test performs a Kruskal–Wallis rank sum test of the null hypothesis that the central tendencies or medians of the two samples are the same. The alternative is that they differ. Note that it is not assumed that the two samples are drawn from the same distribution. Note also that the test assumes that the variables under consideration have underlying continuous distributions.
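The Kruskal–Wallis statistic H ranks the pooled sample and measures how far each group's rank sum deviates from what equal medians would imply. A minimal sketch of the no-ties formula, H = 12 / (N(N + 1)) Σ R_i² / n_i - 3(N + 1), follows; the helper `kw_stat` is hypothetical and the check is against `stats::kruskal.test`.

```r
set.seed(4711)
x <- rnorm(30)
y <- rnorm(25, mean = 1)

# Kruskal-Wallis H without tie correction (continuous data has no ties)
kw_stat <- function(groups) {
  n <- lengths(groups)
  N <- sum(n)
  r <- rank(unlist(groups))            # ranks in the pooled sample
  g <- rep(seq_along(groups), n)       # group label for each observation
  R <- tapply(r, g, sum)               # rank sum per group
  12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
}

H <- kw_stat(list(x, y))
H
kruskal.test(list(x, y))$statistic     # reference value
```

Under the null hypothesis H is approximately chi-squared with k - 1 = 1 degree of freedom for two samples.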

Differences in Variances:

The function varfTest compares the variances of two normal samples by performing an F test. The null hypothesis is that the ratio of the variances of the populations from which the samples were drawn is equal to one.
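The F statistic is simply the ratio of the two sample variances, referred to an F distribution with (n_x - 1, n_y - 1) degrees of freedom. The sketch below computes the statistic and a two-sided p-value in base R and checks them against `stats::var.test`; whether varfTest uses exactly this two-sided convention is an assumption.

```r
set.seed(4711)
x <- rnorm(20, sd = 1)
y <- rnorm(25, sd = 1.5)

F_stat <- var(x) / var(y)              # ratio of sample variances
df1 <- length(x) - 1
df2 <- length(y) - 1

# Two-sided p-value for H0: sigma_x^2 / sigma_y^2 = 1
p <- 2 * min(pf(F_stat, df1, df2), pf(F_stat, df1, df2, lower.tail = FALSE))
c(F = F_stat, p.value = p)
```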

The function bartlett2Test performs Bartlett's test of the null hypothesis that the variances in each of the samples are the same. Equality of variances across samples is also called homogeneity of variances. Note that Bartlett's test is sensitive to departures from normality: if the samples come from non-normal distributions, Bartlett's test may simply be testing for non-normality. The Levene test (not yet implemented) is an alternative to the Bartlett test that is less sensitive to departures from normality.
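Bartlett's statistic compares the log of the pooled variance with the logs of the individual sample variances, divided by a small-sample correction factor. A minimal sketch for k samples, checked against `stats::bartlett.test`, follows; `bartlett_stat` is a hypothetical helper.

```r
set.seed(4711)
x <- rnorm(20)
y <- rnorm(30, sd = 2)

bartlett_stat <- function(groups) {
  k  <- length(groups)
  nu <- lengths(groups) - 1              # degrees of freedom per sample
  v  <- vapply(groups, var, numeric(1))  # sample variances
  nu_tot <- sum(nu)
  v_pool <- sum(nu * v) / nu_tot         # pooled variance
  num <- nu_tot * log(v_pool) - sum(nu * log(v))
  den <- 1 + (sum(1 / nu) - 1 / nu_tot) / (3 * (k - 1))
  num / den                              # approx. chi-squared, k - 1 df
}

K2 <- bartlett_stat(list(x, y))
c(K2 = K2, p.value = pchisq(K2, df = 1, lower.tail = FALSE))
```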

The function fligner2Test performs the Fligner–Killeen test of the null hypothesis that the variances in each of the two samples are the same.
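The Fligner–Killeen test is rank based: each sample is centred at its median, the absolute deviations are ranked in the pooled sample, and the ranks are turned into normal scores. This sketch follows that recipe (as used by `stats::fligner.test`, which we assume fligner2Test mirrors) with the hypothetical helper `fligner_stat`; a heavy-tailed t sample is used to show that normality is not required.

```r
set.seed(4711)
x <- rnorm(20)
y <- rt(30, df = 3)

fligner_stat <- function(groups) {
  n <- lengths(groups)
  g <- factor(rep(seq_along(groups), n))
  z <- unlist(groups)
  med <- as.vector(tapply(z, g, median))
  z <- z - med[as.integer(g)]                     # centre each sample at its median
  N <- length(z)
  a <- qnorm((1 + rank(abs(z)) / (N + 1)) / 2)    # normal scores of |z|
  s <- sum(tapply(a, g, sum)^2 / tapply(a, g, length))
  as.numeric((s - N * mean(a)^2) / var(a))        # approx. chi-squared, k - 1 df
}

fligner_stat(list(x, y))
```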

Differences in Scale:

The function ansariTest performs the Ansari–Bradley two-sample test for a difference in scale parameters. Note that we have completely reimplemented this test based on the statistics and p-values computed by algorithm AS 93. For any sizes of the samples x and y, the test returns the exact p-value together with its asymptotic approximation. Exact p-values are therefore not restricted to samples with fewer than 50 observations, as they are by default for the function ansari.test in R's stats package. The distribution of the test statistic is available through the functions dansariw, pansariw, and qansariw.
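The Ansari–Bradley statistic W assigns the scores 1, 2, 3, ... from both ends of the pooled, ordered sample toward the middle and sums the scores of the x observations. A more dispersed x sample therefore collects small scores. The sketch below computes W in base R; `ansari_W` is a hypothetical helper. With n = 60 here, `stats::ansari.test` would by default fall back to its normal approximation, which is the limitation described above.

```r
set.seed(4711)
x <- rnorm(40)
y <- rnorm(60, sd = 2)

ansari_W <- function(x, y) {
  N <- length(x) + length(y)
  r <- rank(c(x, y))                 # ranks in the pooled sample
  score <- pmin(r, N - r + 1)        # 1, 2, ... counted from both ends
  sum(score[seq_along(x)])           # W = sum of the scores of sample x
}

ansari_W(x, y)
```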

The function moodTest is another two-sample test for a difference in scale parameters. The underlying model is that the two samples are drawn from f(x-l) and f((x-l)/s)/s, respectively, where l is a common location parameter and s is a scale parameter. The null hypothesis is s = 1.
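Mood's test measures how far the ranks of the x sample lie from the centre of the pooled ranks. The statistic T = Σ (r_i - (N + 1)/2)² over the x sample is standardized with its null mean m(N² - 1)/12 and variance mn(N + 1)(N² - 4)/180. A minimal sketch for data without ties follows; `mood_z` is a hypothetical helper checked against `stats::mood.test`.

```r
set.seed(4711)
x <- rnorm(30)
y <- rnorm(35, sd = 2)

mood_z <- function(x, y) {
  m <- length(x); n <- length(y); N <- m + n
  r <- rank(c(x, y))[seq_along(x)]           # ranks of x in the pooled sample
  Tm <- sum((r - (N + 1) / 2)^2)             # squared deviations from the centre
  E <- m * (N^2 - 1) / 12                    # mean of Tm under H0 (s = 1)
  V <- m * n * (N + 1) * (N^2 - 4) / 180     # variance of Tm under H0, no ties
  (Tm - E) / sqrt(V)                         # approx. standard normal
}

mood_z(x, y)
```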

Correlations:

The function pearsonTest tests for association between paired samples, using Pearson's product moment correlation coefficient.

The function kendallTest performs Kendall's tau test.

The function spearmanTest performs Spearman's rho test.
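For Pearson's coefficient r the test statistic is t = r √((n - 2)/(1 - r²)) on n - 2 degrees of freedom, while Kendall's tau and Spearman's rho are rank correlations. The sketch below computes these quantities in base R for one simulated paired sample; it illustrates the underlying statistics and does not call the fBasics functions.

```r
set.seed(4711)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
n <- length(x)

r <- cor(x, y)                                  # Pearson's r
t_stat <- r * sqrt((n - 2) / (1 - r^2))         # t statistic, n - 2 df
p_val <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

tau <- cor(x, y, method = "kendall")            # Kendall's tau
rho <- cor(x, y, method = "spearman")           # Spearman's rho

c(pearson = r, t = t_stat, p.value = p_val, kendall = tau, spearman = rho)
```

Spearman's rho is Pearson's r computed on the ranks, i.e. cor(rank(x), rank(y)).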

Value

In contrast to R's tests, which return S3 objects of class "htest", the classical tests presented here return an S4 object of class "fHTEST" and produce a different output report. The object contains the following slots:

`@call`	the function call.
`@data`	the data as specified by the input argument(s).
`@test`	a list whose elements contain the results from the statistical test. The information provided is similar to a list object of class "htest".
`@title`	a character string with the name of the test. This can be overwritten by specifying a user-defined input argument.
`@description`	a character string with an optional user-defined description. By default, the current date on which the test was applied is returned.

The `@test` list may contain the following elements:

`statistic`	the value(s) of the test statistic.
`p.value`	the p-value(s) of the test.
`parameters`	a numeric value or vector of parameters.
`estimate`	a numeric value or vector of sample estimates.
`conf.int`	a numeric two-row vector or matrix of 95% confidence intervals.
`method`	a character string indicating what type of test was performed.
`data.name`	a character string giving the name(s) of the data.

Note

Some of the test implementations are taken from R's ctest package (now part of the stats package).

Author(s)

R-core team for the tests from R's ctest package,
Diethelm Wuertz for the Rmetrics R-port.

References

Conover, W.J. (1971); Practical Nonparametric Statistics, John Wiley & Sons, New York.

Durbin, J. (1961); Some Methods of Constructing Exact Tests, Biometrika 48, 41–55.

Durbin, J. (1973); Distribution Theory Based on the Sample Distribution Function, SIAM, Philadelphia.

Lehmann, E.L. (1986); Testing Statistical Hypotheses, John Wiley & Sons, New York.

Moore, D.S. (1986); Tests of the Chi-Squared Type, in: D'Agostino, R.B. and Stephens, M.A., eds., Goodness-of-Fit Techniques, Marcel Dekker, New York.

Examples

## SOURCE("fBasics.15E-TwoSampleTests")

## x, y - two independent standard normal samples of size 50
   xmpBasics("\nStart: Create two Samples > ")
   set.seed(1953)   # make the example reproducible
   x = rnorm(50)
   y = rnorm(50)
  
## ks2Test - 
   xmpBasics("\nNext: Distributional Tests > ")
   ks2Test(x, y)
   
## tTest | kw2Test - 
   xmpBasics("\nNext: Location Tests > ")
   tTest(x, y)
   kw2Test(x, y)
   
## varfTest, bartlett2Test | fligner2Test -
   xmpBasics("\nNext: Variance Tests > ")
   varfTest(x, y)
   bartlett2Test(x, y)
   fligner2Test(x, y)

## ansariTest | moodTest -
   xmpBasics("\nNext: Scale Tests > ")
   ansariTest(x, y)
   moodTest(x, y)
   
## pearsonTest | kendallTest | spearmanTest -
   xmpBasics("\nNext: Correlation Tests > ")
   pearsonTest(x, y)
   kendallTest(x, y)
   spearmanTest(x, y)

[Package fBasics version 221.10065 Index]