We generated three samples, with \(n=200\) observations each, from
2-dimensional Gaussian distributions with mean vectors \(\mu_1 = (0, \frac{\sqrt{3}}{3})\), \(\mu_2 = (-\frac{1}{2}, -\frac{\sqrt{3}}{6})\) and \(\mu_3 = (\frac{1}{2}, -\frac{\sqrt{3}}{6})\), and the identity matrix as
covariance matrix. In this situation, the generated samples are well
separated, following different Gaussian distributions, i.e. \(X_1 \sim N_2(\mu_1, I)\), \(X_2 \sim N_2(\mu_2, I)\) and \(X_3 \sim N_2(\mu_3, I)\). The vector y indicates the group membership of each observation.
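library(mvtnorm)
library(QuadratiK)
# Three groups of 200 observations each
sizes <- rep(200, 3)
# eps scales the distance between the group means
eps <- 1
set.seed(2468)
# One bivariate Gaussian sample per mean vector (identity covariance by default)
x1 <- rmvnorm(sizes[1], mean = c(0, sqrt(3) * eps / 3))
x2 <- rmvnorm(sizes[2], mean = c(-eps / 2, -sqrt(3) * eps / 6))
x3 <- rmvnorm(sizes[3], mean = c(eps / 2, -sqrt(3) * eps / 6))
# Stack the samples and record the group labels
x <- rbind(x1, x2, x3)
y <- as.factor(rep(c(1, 2, 3), times = sizes))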
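As a quick check of this design, the three mean vectors are the vertices of an equilateral triangle with side eps centred at the origin, so eps is the distance between any two group means. The following base R sketch verifies this numerically; the object names mu, pairwise and centroid are introduced here for illustration only.

```r
eps <- 1

# Mean vectors of the three groups, one per row
mu <- rbind(
  mu1 = c(0, sqrt(3) * eps / 3),
  mu2 = c(-eps / 2, -sqrt(3) * eps / 6),
  mu3 = c(eps / 2, -sqrt(3) * eps / 6)
)

# Euclidean distances between each pair of means
pairwise <- as.vector(dist(mu))

# Centre of the triangle formed by the means
centroid <- colMeans(mu)

all.equal(pairwise, rep(eps, 3))  # TRUE
all.equal(c(0, 0), centroid)      # TRUE
```

Changing eps therefore rescales the separation between the groups without moving their common centre.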
Recall that the computed test statistics correspond to the omnibus tests.
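The output shown next is of the kind printed for a kb.test object. A minimal sketch of a call that produces such an object is given below. It assumes the default arguments of kb.test (in particular, that h is left unspecified), rebuilds the data compactly so that the chunk runs on its own, and uses the name k_test for illustration. The printed numbers depend on the random seed and on the resampling, so they need not coincide exactly with those reported here.

```r
library(mvtnorm)
library(QuadratiK)

# Rebuild the three samples (same means, sizes and seed as above)
set.seed(2468)
mu <- list(c(0, sqrt(3) / 3), c(-1 / 2, -sqrt(3) / 6), c(1 / 2, -sqrt(3) / 6))
x <- do.call(rbind, lapply(mu, function(m) rmvnorm(200, mean = m)))
y <- as.factor(rep(1:3, each = 200))

# k-sample test: x holds the pooled observations, y the group labels.
# h is not supplied, so the tuning parameter is selected internally.
k_test <- kb.test(x = x, y = y)
show(k_test)
```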
##
## Kernel-based quadratic distance k-sample test
## U-statistic
## --------------------------------------------
## H0 is rejected: TRUE TRUE
## Test Statistic: 11.844 38.6817
## Critical value (CV): 0.5623288 1.836868
## CV method: subsampling
## Selected tuning parameter h: 1.5
When the \(k\)-sample test is performed, the summary method on the kb.test
object returns the results of the tests together with the standard
descriptive statistics for each variable, computed overall and with
respect to the provided groups.
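A sketch of how such a summary might be requested is given below. It is an assumption-laden example: the object names k_test and summary_ktest are illustrative, and the component name summary_tables is our reading of the package documentation and should be checked against the installed version of QuadratiK.

```r
library(mvtnorm)
library(QuadratiK)

# Rebuild the three samples (same means, sizes and seed as above)
set.seed(2468)
mu <- list(c(0, sqrt(3) / 3), c(-1 / 2, -sqrt(3) / 6), c(1 / 2, -sqrt(3) / 6))
x <- do.call(rbind, lapply(mu, function(m) rmvnorm(200, mean = m)))
y <- as.factor(rep(1:3, each = 200))

k_test <- kb.test(x = x, y = y)

# summary() reports the test results and computes descriptive statistics
summary_ktest <- summary(k_test)

# Assumed component: one table of descriptive statistics per variable
summary_ktest$summary_tables
```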
##
## Kernel-based quadratic distance k-sample test
## Test_Statistic Critical_Value Reject_H0
## 1 11.8440 0.5623288 TRUE
## 2 38.6817 1.8368685 TRUE
## [[1]]
## Group 1 Group 2 Group 3 Overall
## mean -0.005959147 -0.5370127 0.5442058 0.0004113282
## sd 0.997319811 0.9583059 1.0374834 1.0900980006
## median -0.028244038 -0.5477108 0.5297478 -0.0239486027
## IQR 1.478884929 1.4105832 1.4234532 1.5377418198
## min -2.860006689 -3.1869808 -2.2119189 -3.1869807848
## max 2.151784802 2.0647648 3.1580700 3.1580700259
##
## [[2]]
## Group 1 Group 2 Group 3 Overall
## mean 0.4935364 -0.4042219 -0.2461729 -0.05228613
## sd 1.0449582 1.0411639 1.0474989 1.11391575
## median 0.5281635 -0.4325995 -0.2950922 -0.09520111
## IQR 1.4001089 1.4662111 1.2867345 1.48444495
## min -2.6448703 -2.8786352 -3.4932849 -3.49328492
## max 3.0792766 2.6788424 2.8290722 3.07927659
If a value of \(h\) is not provided, kb.test automatically selects one by
calling the function select_h.
For a more accurate search of the tuning parameter, the function select_h
can be used directly. This function needs the inputs x and y, as the
function kb.test does for the \(k\)-sample problem.
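A direct call might look like the following sketch. It assumes that the skewness family of alternatives is requested through alternative = "skewness", that the other arguments are left at their defaults, and that the selected value is stored in the component h_sel; the name h_k is illustrative. The search repeats the test over a grid of \(h\) values and alternatives, so it can take some time.

```r
library(mvtnorm)
library(QuadratiK)

# Rebuild the three samples (same means, sizes and seed as above)
set.seed(2468)
mu <- list(c(0, sqrt(3) / 3), c(-1 / 2, -sqrt(3) / 6), c(1 / 2, -sqrt(3) / 6))
x <- do.call(rbind, lapply(mu, function(m) rmvnorm(200, mean = m)))
y <- as.factor(rep(1:3, each = 200))

# Search for h against skewness alternatives; defaults assumed elsewhere
h_k <- select_h(x = x, y = y, alternative = "skewness")

# Assumed component holding the selected tuning parameter
h_k$h_sel
```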
The figure generated by the select_h function for the \(k\)-sample data set
displays the obtained power versus the considered \(h\), for each value of the skewness alternative
\(\delta\) considered.
As can be seen from the figure, when the alternative distribution \(F_\delta\) with \(\delta=0.2\) is considered, no value of \(h\) achieves power greater than or equal to 0.5. The second value, \(\delta=0.3\), is then taken into account, and \(h=1.6\) is chosen as the optimal value since it is the smallest value with power greater than 0.5. The figure also indicates a set of values of \(h\) with high power.
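The selection rule just described can be sketched in base R as follows. This is not the code used by select_h: the helper pick_h is hypothetical, and the power values are invented purely to mimic the situation in the figure (no \(h\) reaches 0.5 for \(\delta=0.2\), while \(h=1.6\) is the first to do so for \(\delta=0.3\)).

```r
# Hypothetical power estimates (invented for illustration, not package output):
# rows are candidate values of h, columns are alternatives delta.
h_grid <- c(0.4, 0.8, 1.2, 1.6, 2.0)
power <- cbind(
  "0.2" = c(0.10, 0.18, 0.26, 0.34, 0.40),
  "0.3" = c(0.22, 0.35, 0.46, 0.58, 0.66)
)

# Scan the alternatives in increasing order of delta and return the smallest h
# whose power reaches the threshold; NA if no (delta, h) pair qualifies.
pick_h <- function(h_grid, power, threshold = 0.5) {
  for (j in seq_len(ncol(power))) {
    ok <- which(power[, j] >= threshold)
    if (length(ok) > 0) {
      return(list(delta = as.numeric(colnames(power)[j]), h = h_grid[min(ok)]))
    }
  }
  list(delta = NA_real_, h = NA_real_)
}

sel <- pick_h(h_grid, power)
sel$delta  # 0.3
sel$h      # 1.6
```

Stopping at the first alternative for which the threshold is reached favours the value of \(h\) that detects the smallest departure from the null hypothesis.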