% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/uhclust.R
\name{uhclust}
\alias{uhclust}
\title{U-statistic based significance hierarchical clustering}
\usage{
uhclust(md = NULL, data = NULL, alpha = 0.05, rep = 15, plot = TRUE)
}
\arguments{
\item{md}{Matrix of squared Euclidean distances between all data points.}

\item{data}{Data matrix. Each row represents an observation.}

\item{alpha}{Significance level.}

\item{rep}{Number of times to repeat the optimization procedure. Larger values help in problems with
multiple optima.}

\item{plot}{Logical. If \code{TRUE}, a p-value annotated dendrogram is plotted.}
}
\value{
Returns an object of class \code{hclust} with three additional attribute arrays:\describe{
\item{Pvalues}{P-values from \code{uclust} for the final data partition at each node of the dendrogram. This
array is in the same order as \code{height} and contains values only for the tests that were performed.}
\item{alpha}{Bonferroni-corrected significance levels for \code{uclust} for the data partitions at each node
of the dendrogram. This array is in the same order as \code{height} and contains values only for the tests that were performed.}
\item{groups}{Final group assignments.}
}
}
\description{
Hierarchical clustering method that partitions the data only when these partitions are statistically significant.
}
\details{
This is the significance hierarchical clustering procedure of Valk and Cybis (2018). The data are
repeatedly partitioned into two subgroups, through the function \code{uclust}, according to a hierarchical scheme.
The procedure stops when the resulting subgroups are homogeneous or have fewer than 3 elements.
This function is intended for high-dimensional, small-sample-size settings.

Either \code{data} or \code{md} should be provided.
If data are entered directly, the Bn statistic is computed from squared Euclidean distances.
If a distance matrix is entered, it must consist of squared Euclidean distances; otherwise the test results are
invalid.
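For reference, the squared Euclidean distance between two observations \eqn{x_i} and \eqn{x_j} measured on \eqn{p} variables is
\deqn{d(x_i, x_j) = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2}{d(x_i, x_j) = sum_k (x_ik - x_jk)^2}
so a matrix of ordinary Euclidean distances has to be squared element-wise before it is passed as \code{md}.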

The variance of \code{bn} is estimated through resampling, so p-values may vary slightly between runs.

For more details, see Cybis, Gabriela B., Marcio Valk, and Sílvia RC Lopes. "Clustering and classification problems in genetics through U-statistics."
Journal of Statistical Computation and Simulation 88.10 (2018),
and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).

See also \code{is_homo}, \code{uclust} and \code{Utest_class}.
}
\examples{

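# Homogeneous Gaussian dataset: 50 observations, 2000 variables
x <- matrix(rnorm(100000), nrow = 50)
res <- uhclust(data = x)

# Heterogeneous dataset: shift the mean of rows 1-30, then shift rows 1-10 further
x[1:30, ] <- x[1:30, ] + 0.7
x[1:10, ] <- x[1:10, ] + 0.4
res <- uhclust(data = x)
res$groups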
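
# A minimal sketch of the md interface (an illustration, not taken from the
# package sources): the same data can be supplied as a distance matrix.
# Assumption: md is expected as a full symmetric matrix. dist() returns
# ordinary Euclidean distances, so they are squared element-wise here,
# as the test requires squared Euclidean distances.
md <- as.matrix(dist(x))^2
res_md <- uhclust(md = md, plot = FALSE)
res_md$groups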

}
