% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ff_function.R
\name{fasano.franceschini.test}
\alias{fasano.franceschini.test}
\title{Fasano-Franceschini Test}
\usage{
fasano.franceschini.test(
  S1,
  S2,
  nPermute = 100,
  threads = 1,
  seed = NULL,
  p.conf.level = 0.95,
  verbose = TRUE,
  method = c("r", "b")
)
}
\arguments{
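\item{S1}{A \code{matrix} or \code{data.frame} containing the first sample,
with one observation per row and one dimension per column.}

\item{S2}{A \code{matrix} or \code{data.frame} containing the second sample,
with the same number of columns as \code{S1}.}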

\item{nPermute}{A nonnegative \code{integer} setting the number of permuted
samples to generate when estimating the permutation test p-value. Default is
\code{100}. If set to \code{0}, only the test statistic is computed.}

\item{threads}{A positive \code{integer} or \code{"auto"} setting the number
of threads used for performing the permutation test. If set to \code{"auto"},
the number of threads is determined by \code{RcppParallel::defaultNumThreads()}.
Default is \code{1}.}

\item{seed}{An optional integer to seed the PRNG used for the permutation test.
A seed must be passed to reproducibly compute p-values.}

\item{p.conf.level}{Confidence level for the confidence interval of the
permutation test p-value. Default is \code{0.95}.}

\item{verbose}{A \code{logical} indicating whether to display a progress bar.
Default is \code{TRUE}. The progress bar is only available when
\code{threads = 1}.}

\item{method}{An optional \code{character} indicating which method to use to
compute the test statistic. The two methods are \code{'r'} (range tree) and
\code{'b'} (brute force). Both methods return the same results but may vary in
computation speed. If this argument is not passed, the sample sizes and dimension
of the data are used to infer which method is likely faster. See the Details
section for more information.}
}
\value{
A list with class \code{htest} containing the following components:
  \item{statistic}{The value of the test statistic \emph{D}.}
  \item{estimate}{The values of the difference statistics \emph{D1} and \emph{D2}.}
  \item{p.value}{The permutation test p-value.}
  \item{conf.int}{A binomial confidence interval for the p-value.}
  \item{method}{A character string indicating what type of test was performed.}
  \item{data.name}{A character string giving the names of the data.}
}
\description{
Performs a two-sample multidimensional Kolmogorov-Smirnov test as described
by Fasano and Franceschini (1987). This test evaluates the null hypothesis
that two i.i.d. random samples were drawn from the same underlying
probability distribution. The data can be of any dimension, and can be of
any type (continuous, discrete, or mixed).
}
\details{
The test statistic can be computed using two different methods.
Both methods return identical results, but have different time complexities:
\itemize{
  \item Range tree method: This method has a time complexity of
  \emph{O(n*log(n)^(d-1))}, where \emph{n} is the size of the larger sample
  and \emph{d} is the dimension of the data.
  \item Brute force method: This method has a time complexity of \emph{O(d*n^2)}.
}
The range tree method tends to be faster for low dimensional data or large
sample sizes, while the brute force method tends to be faster for high
dimensional data or small sample sizes. When \code{method} is not passed,
the sample sizes and dimension of the data are used to infer which method will
likely be faster. However, as the geometry of the samples can greatly influence
computation time, the method inferred to be faster may not actually be faster. To
perform more comprehensive benchmarking for a specific dataset, \code{nPermute}
can be set equal to \code{0}, which bypasses the permutation test and only
computes the test statistic.

The p-value for the test is computed empirically using a permutation test. As
it is almost always infeasible to compute the exact permutation test p-value,
a Monte Carlo approximation is made instead. This estimate is a binomial
proportion, and thus a confidence interval can be computed for it.
The confidence interval is obtained using the procedure given in Clopper and
Pearson (1934).
}
\examples{
set.seed(0)

# create 2-D samples
S1 <- data.frame(x = rnorm(n = 20, mean = 0, sd = 1),
                 y = rnorm(n = 20, mean = 1, sd = 2))
S2 <- data.frame(x = rnorm(n = 40, mean = 0, sd = 1),
                 y = rnorm(n = 40, mean = 1, sd = 2))

# perform test
fasano.franceschini.test(S1, S2)
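
# a minimal sketch of inspecting the result: the test returns an htest object,
# so the components listed in the Value section can be extracted by name
# ('res' is an illustrative variable name, not part of the package)
res <- fasano.franceschini.test(S1, S2, seed = 1, verbose = FALSE)
res$statistic  # test statistic D
res$estimate   # difference statistics D1 and D2
res$conf.int   # confidence interval for the permutation test p-value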

# perform test with more permutations
fasano.franceschini.test(S1, S2, nPermute = 150)

# set seed for reproducible p-value
fasano.franceschini.test(S1, S2, seed = 0)$p.value
fasano.franceschini.test(S1, S2, seed = 0)$p.value

# change confidence level for p-value confidence interval
fasano.franceschini.test(S1, S2, p.conf.level = 0.99)
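
# a sketch of the Clopper-Pearson interval described in Details, using
# stats::binom.test(), which returns a Clopper-Pearson interval; the counts
# below are hypothetical and the package's internal computation may differ
n.exceed <- 7    # hypothetical count of permuted statistics at least as large as D
n.perm <- 100    # hypothetical number of permutations
ci <- binom.test(n.exceed, n.perm, conf.level = 0.95)$conf.int
ci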

# perform test using range tree method
fasano.franceschini.test(S1, S2, method = 'r')

# perform test using brute force method
fasano.franceschini.test(S1, S2, method = 'b')
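
# a sketch of benchmarking the two methods on the statistic alone: with
# nPermute = 0 the permutation test is skipped, so the timings reflect only
# the computation of D (variable names here are illustrative)
t.r <- system.time(
  D.r <- fasano.franceschini.test(S1, S2, nPermute = 0, verbose = FALSE,
                                  method = 'r')$statistic
)
t.b <- system.time(
  D.b <- fasano.franceschini.test(S1, S2, nPermute = 0, verbose = FALSE,
                                  method = 'b')$statistic
)
all.equal(D.r, D.b)  # both methods are documented to return the same statistic
c(range.tree = t.r[["elapsed"]], brute.force = t.b[["elapsed"]])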

# perform test using multiple threads to speed up p-value computation
\dontrun{
fasano.franceschini.test(S1, S2, threads = 2)
}

}
\references{
\itemize{
  \item Fasano, G. & Franceschini, A. (1987). A multidimensional version of the
  Kolmogorov-Smirnov test. \emph{Monthly Notices of the Royal Astronomical Society},
  225:155-170. \doi{10.1093/mnras/225.1.155}.
  \item Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial
  limits illustrated in the case of the binomial. \emph{Biometrika}, 26:404-413.
  \doi{10.2307/2331986}.
}
}
