% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/UNPaC_Copula.R
\name{UNPaC_Copula}
\alias{UNPaC_Copula}
\title{Unimodal Non-Parametric Cluster (UNPaC) Significance Test}
\usage{
UNPaC_Copula(x, cluster, cluster.fun, nsim = 100,
  var_selection = FALSE, gamma = 0.1, p.adjust = "fdr", k = 2,
  rho = 0.02, cov = "glasso", center = TRUE, scale = FALSE)
}
\arguments{
\item{x}{a dataset with n observations (rows) and p features (columns)}

\item{cluster}{cluster labels generated by the clustering method, one for each observation (row) of \code{x}}

\item{cluster.fun}{function used to cluster the data. The function should return a list containing a component named \code{cluster}.
Examples include \code{\link[stats]{kmeans}} and \code{\link[cluster]{pam}}.}

\item{nsim}{a numeric value specifying the number of unimodal reference distributions used for testing
(default=100)}

\item{var_selection}{should the dimension be reduced using the feature filtering procedure? See Details. (default=FALSE)}

\item{gamma}{p-value threshold for the feature filtering procedure. See Details. Not used if \code{var_selection=FALSE}. (default=0.10)}

\item{p.adjust}{p-value adjustment method for additional feature filtering. See \code{\link[stats]{p.adjust}}
for options. No adjustment is made if \code{p.adjust="none"}. (default="fdr")}

\item{k}{integer value specifying the number of clusters to test (default=2)}

\item{rho}{a regularization parameter for the graphical lasso. See the documentation for \code{lambda} in
\code{\link[huge]{huge}}.
Not used if \code{cov="est"} or \code{cov="banded"}. (default=0.02)}

\item{cov}{method used to approximate the covariance structure. Options are \code{"glasso"}
(see \code{\link[huge]{huge}}), \code{"banded"} (see \code{\link[PDSCE]{band.chol.cv}}) and
\code{"est"} (sample covariance). See Details. (default="glasso")}

\item{center}{should the data be centered so that each feature has mean zero prior to clustering
(default=TRUE)}

\item{scale}{should the data be scaled so that each feature has variance one prior to clustering
(default=FALSE)}
}
\value{
The function returns a list with the following components:
\itemize{
\item{\code{selected_features}}: {a vector of integers indicating the features retained by the feature filtering process.}
\item{\code{sim_CI}}: {vector containing the cluster indices for each generated unimodal reference distribution}
\item{\code{pvalue_emp}}: {the empirical p-value: the proportion of times the cluster index from the reference
data is smaller than the cluster index from the observed data}
\item{\code{pvalue_norm}}: {the normalized p-value: the simulated p-value based on comparison to a standard normal distribution }
}
}
\description{
The UNPaC test assesses the significance of clusters by comparing the cluster index (CI) from the data to the CI from ortho-unimodal reference data generated using a Gaussian copula.
This method is similar to the method described in Helgeson and Bair (2016), except that a Gaussian copula approach is used to account for correlation between features.
The CI is defined as the within-cluster sum of squares about the cluster means, summed over clusters, divided by the total sum of squares.
Smaller values of the CI indicate stronger clustering.
}
\details{
There are three options for the covariance matrix used to generate the Gaussian
copula: the sample covariance, \code{cov="est"}, which should be used if n>p; the graphical lasso,
\code{cov="glasso"}, which should be used if n<p; and a k-banded covariance, \code{cov="banded"}, which can be used if n<p and
features farther apart in the ordering can be assumed to have weaker covariance. The graphical lasso is implemented using the \code{\link[huge]{huge}} function.
When \code{cov="banded"}, the covariance matrix is estimated using the k-banded Cholesky factor approach of Rothman, Levina, and Zhu (2010),
with the banding parameter (unrelated to the argument \code{k}) selected by cross-validation. See the documentation for \code{\link[PDSCE]{band.chol.cv}}.

In high-dimensional (n<p) settings, a dimension reduction step (\code{var_selection=TRUE}) can be applied, which selects features
based on an F-test for differences in means across clusters. Features with a p-value less than the threshold
\code{gamma} are retained. For stricter filtering, a p-value adjustment procedure (such as \code{p.adjust="fdr"})
can be used. If no features are retained, the p-value for the cluster significance test is reported as 1.
}
\examples{
# K-means example
test1 <- matrix(rnorm(100*50), nrow=100, ncol=50)
test1[1:30,1:50] <- rnorm(30*50, 2)
test.data<-scale(test1,scale=FALSE,center=TRUE)
cluster<-kmeans(test.data,2)$cluster
UNPaCResults <- UNPaC_Copula(test.data,cluster,kmeans, nsim=100,cov="est")
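# The CI and the two p-values can be approximated by hand. This is a minimal
# sketch for the k-means example above: within.ss, total.ss and ci.obs are
# illustrative names, and the normalized p-value below assumes the observed CI
# is standardized by the mean and sd of the reference CIs, so the values may
# differ from those reported by UNPaC_Copula.
# CI: within-cluster sum of squares about the cluster means / total sum of squares
within.ss <- sum(sapply(split(as.data.frame(test.data), cluster),
  function(g) sum(scale(g, center = TRUE, scale = FALSE)^2)))
total.ss <- sum(scale(test.data, center = TRUE, scale = FALSE)^2)
ci.obs <- within.ss / total.ss
# Empirical p-value: proportion of reference CIs smaller than the observed CI
mean(UNPaCResults$sim_CI < ci.obs)
# Normalized p-value (assumed form): standard normal lower tail of the standardized CI
pnorm((ci.obs - mean(UNPaCResults$sim_CI)) / sd(UNPaCResults$sim_CI))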

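# A minimal sketch of the F-test feature filter described in Details, applied
# to the k-means example above. It assumes a one-way ANOVA F-test per feature;
# f.pvalues, adj.pvalues and kept are illustrative names, not outputs of
# UNPaC_Copula, and 0.1 plays the role of gamma.
f.pvalues <- apply(test.data, 2, function(feature)
  anova(lm(feature ~ factor(cluster)))[["Pr(>F)"]][1])
# Adjust the p-values (as with p.adjust="fdr") and keep features below the threshold
adj.pvalues <- p.adjust(f.pvalues, method = "fdr")
kept <- which(adj.pvalues < 0.1)
length(kept)
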
# Hierarchical clustering example
 \donttest{
test <- matrix(nrow=1200, ncol=75)
theta <- rep(NA, 1200)
theta[1:500] <- runif(500, 0, pi)
theta[501:1200] <- runif(700, pi, 2*pi)
test[1:500,seq(from=2,to=50,by=2)] <- -2+5*sin(theta[1:500])
test[501:1200,seq(from=2,to=50,by=2)] <- 5*sin(theta[501:1200])
test[1:500,seq(from=1,to=49,by=2)] <- 5+5*cos(theta[1:500])
test[501:1200,seq(from=1,to=49,by=2)] <- 5*cos(theta[501:1200])
test[,1:50] <- test[,1:50] + rnorm(50*1200, 0, 0.2)
test[,51:75] <- rnorm(25*1200, 0, 1)
test.data<-scale(test,center=TRUE,scale=FALSE)
# Defining clustering function
hclustFunction<-function(x,k){
 D<-dist(x)
 xn.hc <- hclust(D, method="single")
 list(cluster=cutree(xn.hc, k))}

cluster=hclustFunction(test.data,2)$cluster
UNPaCResults <- UNPaC_Copula(test.data,cluster,hclustFunction, nsim=100,cov="est")
}
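
# High-dimensional (n<p) sketch: with fewer observations than features, Details
# suggests cov="glasso", optionally combined with the feature filter. The
# simulated dataset and the names hd.data, hd.cluster and hdResults are
# illustrative only.
 \donttest{
hd.data <- matrix(rnorm(40*200), nrow=40, ncol=200)
hd.data[1:20, 1:10] <- hd.data[1:20, 1:10] + 3
hd.data <- scale(hd.data, center=TRUE, scale=FALSE)
hd.cluster <- kmeans(hd.data, 2)$cluster
hdResults <- UNPaC_Copula(hd.data, hd.cluster, kmeans, nsim=100,
  var_selection=TRUE, gamma=0.1, p.adjust="fdr", cov="glasso", rho=0.02)
# Features retained by the filter
hdResults$selected_features
}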
}
\references{
\itemize{
    \item Helgeson, E. and Bair, E. (2016). \dQuote{Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution.}
    arXiv preprint arXiv:1610.01424.
    \item Rothman, A. J., Levina, E., and Zhu, J. (2010). \dQuote{A new approach to Cholesky-based covariance regularization in
    high dimensions.} Biometrika 97(3): 539-550.
}
}
\author{
Erika S. Helgeson, David Vock, Eric Bair
}
