% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{adproclus}
\alias{adproclus}
\title{Additive profile clustering}
\usage{
adproclus(
  data,
  nclusters,
  start_allocation = NULL,
  nrandomstart = 3,
  nsemirandomstart = 3,
  algorithm = "ALS1",
  save_all_starts = FALSE,
  seed = NULL
)
}
\arguments{
\item{data}{Object-by-variable data matrix of class \code{matrix} or
\code{data.frame}.}

\item{nclusters}{Number of clusters to be used. Must be a positive integer.}

\item{start_allocation}{Optional binary membership matrix (objects by
clusters) used as the starting allocation for the first run. Default is
\code{NULL}.}

\item{nrandomstart}{Number of random starts (see \code{\link{get_random}}).
Can be zero. Increasing the number of starts improves the chance of finding
a good solution, at the cost of longer computation time.
Some research suggests 500 starts as a useful reference.}

\item{nsemirandomstart}{Number of semi-random starts
(see \code{\link{get_semirandom}}). Can be zero. Increasing the number of
starts improves the chance of finding a good solution, at the cost of longer
computation time. Some research suggests 500 starts as a useful reference.}

\item{algorithm}{Character string "\code{ALS1}" (default) or "\code{ALS2}",
denoting the type of alternating least squares algorithm. Can be
abbreviated as "1" or "2".}

\item{save_all_starts}{Logical. If \code{TRUE}, the results of all algorithm
starts are returned. By default, only the best solution is retained.}

\item{seed}{Integer. Seed for the random number generator.
Default is \code{NULL}, in which case results are not reproducible.}
}
\value{
\code{adproclus()} returns a list with the following
  components, which describe the best model (from the multiple starts):
  \describe{
  \item{\code{model}}{matrix. The obtained overlapping clustering model
  \strong{M} of the same size as \code{data}.}
  \item{\code{A}}{matrix. The membership matrix \strong{A} of the clustering
  model. Clusters are sorted by size.}
  \item{\code{P}}{matrix. The profile matrix
  \strong{P} of the clustering model.}
  \item{\code{sse}}{numeric. The
  residual sum of squares of the clustering model, which is minimized by the
  ALS algorithm.}
  \item{\code{totvar}}{numeric. The total sum of squares
  of \code{data}.}
  \item{\code{explvar}}{numeric. The proportion of variance
  in \code{data} that is accounted for by the clustering model.}
  \item{\code{iterations}}{numeric. The number of algorithm iterations
  until convergence for the start that produced this model.}
  \item{\code{timer_one_run}}{numeric. The time (in seconds) taken by the
  start that produced this model.}
  \item{\code{initial_start}}{list. Contains the initial
  membership matrix and the type of start that was used
  to obtain the clustering solution (as returned by \code{\link{get_random}}
  or \code{\link{get_semirandom}}).}
  \item{\code{runs}}{list. Each element represents one model obtained from
  one of the multiple starts.
  Each element contains all of the above information for the
  respective start.}
  \item{\code{parameters}}{list. Contains the parameters used for the
  model.}
  \item{\code{timer}}{numeric. The amount of time (in seconds) the complete
  algorithm ran for.}}
}
\description{
Perform additive profile clustering (ADPROCLUS) on object-by-variable data.
Creates a model that assigns the objects to overlapping clusters which are
characterized in terms of the variables by the so-called profiles.
}
\details{
This function uses Mirkin's (1987, 1990) Additive Profile Clustering
(ADPROCLUS) method to obtain an unrestricted overlapping clustering
model of the object-by-variable data provided in \code{data}.

The ADPROCLUS model approximates an \eqn{I \times J} object by
variable data matrix \eqn{X} by an \eqn{I \times J} model matrix
\eqn{M} that can be decomposed into an \eqn{I \times K} binary
cluster membership matrix \eqn{A} and a \eqn{K \times J}
real-valued cluster profile matrix \eqn{P}, with \eqn{K}
indicating the number of overlapping clusters.
Given a number of clusters \eqn{K}, the aim of an ADPROCLUS analysis is
therefore to estimate a model matrix \eqn{M = AP} that reconstructs the
data matrix \eqn{X} as closely as possible in the least squares sense,
that is, by minimizing the sum of squared residuals between \eqn{X} and
\eqn{M}. For a detailed illustration of the
ADPROCLUS model and associated loss function, see Wilderjans et al. (2011).
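
The decomposition \eqn{M = AP} can be illustrated with a small toy example
in base R. The matrices below are made up for exposition and do not come from
any dataset. Because \eqn{A} is binary, each row of \eqn{M} is the sum of the
profiles of the clusters the object belongs to. An object in two clusters
therefore receives the sum of both profiles:
\preformatted{
# Toy example: I = 3 objects, K = 2 clusters, J = 2 variables
A <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)  # binary memberships
P <- matrix(c(2, 0,
              0, 3), nrow = 2, byrow = TRUE)  # real-valued profiles
M <- tcrossprod(A, t(P))  # same as the matrix product of A and P
M
# Object 3 belongs to both clusters: M[3, ] equals P[1, ] + P[2, ]
}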

The alternating least squares algorithms ("\code{ALS1}" and "\code{ALS2}")
that can be used for minimization of the loss function were proposed by
Depril et al. (2008). In "\code{ALS2}", starting from an initial random or
semi-random estimate of \eqn{A} (see \code{\link{get_random}} and
\code{\link{get_semirandom}}), \eqn{A} and \eqn{P}
are alternately re-estimated conditionally upon each other until convergence.
The "\code{ALS1}" algorithm differs from "\code{ALS2}" in that the rows of
\eqn{A} are updated one at a time. The conditionally optimal \eqn{P} is
recalculated after each row update, rather than only once the whole of
\eqn{A} has been updated. For a discussion and comparison of
the different algorithms, see Depril et al. (2008).
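
The conditional updates of "\code{ALS2}" can be sketched in base R as
follows. This is a simplified illustration, not the implementation used by
this package. The helper \code{als2_step()} is hypothetical. It updates each
row of \eqn{A} by scoring all \eqn{2^K} binary membership patterns against the
current \eqn{P}, and then computes \eqn{P} by ordinary least squares given
\eqn{A}. It assumes \eqn{K \ge 2} and an invertible \eqn{A'A}, which means no
empty or duplicated clusters:
\preformatted{
# Hypothetical helper, not exported by this package: one ALS2 iteration
# for a numeric data matrix X (I x J) and current profiles P (K x J), K >= 2
als2_step <- function(X, P) {
  K <- nrow(P)
  # All 2^K candidate binary membership rows
  patterns <- as.matrix(expand.grid(rep(list(0:1), K)))
  # Reconstruction implied by each candidate pattern (patterns times P)
  fits <- tcrossprod(patterns, t(P))
  # Update A given P: each object takes its best-fitting pattern
  A <- t(apply(X, 1, function(x) {
    loss <- rowSums(sweep(fits, 2, x)^2)
    patterns[which.min(loss), ]
  }))
  # Update P given A: least squares solution of crossprod(A) P = crossprod(A, X)
  P <- solve(crossprod(A), crossprod(A, X))
  # Residual sum of squares of the updated model M = AP
  list(A = A, P = P, sse = sum((X - tcrossprod(A, t(P)))^2))
}
}
In this sketch, every object is compared with all \eqn{2^K} patterns. The cost
of the membership update therefore doubles with each additional cluster, which
is consistent with the warning about computation time below.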

\strong{Warning:} Computation time increases exponentially with the
number of clusters, \eqn{K}. We recommend determining the computation time
of a single start for each specific dataset and \eqn{K} before increasing the
number of starts.
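
One way to follow this recommendation is to run a single random start and read
the per-start time from the returned \code{timer_one_run} component. The last
line is only a rough extrapolation. It assumes that every start takes about as
long as this one, which need not hold because starts can converge after
different numbers of iterations:
\preformatted{
library(adproclus)
# A single random start (no semi-random starts) at the K of interest
single <- adproclus(stackloss, nclusters = 3,
  nrandomstart = 1, nsemirandomstart = 0, seed = 1
)
single$timer_one_run  # seconds needed by this one start
# Rough budget, e.g. for nrandomstart = 100 and nsemirandomstart = 100
200 * single$timer_one_run
}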
}
\examples{
# Loading a test dataset into the global environment
x <- stackloss

# Quick clustering with K = 2 clusters
clust <- adproclus(data = x, nclusters = 2)
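
# Checking the components described under Value: the model matrix should
# equal the product of A and P (tcrossprod(A, t(P)) computes A times P),
# and sse should equal the residual sum of squares of the data around it.
# Both comparisons are expected to return TRUE.
fit <- adproclus(stackloss, nclusters = 2, seed = 1)
all.equal(fit$model, tcrossprod(fit$A, t(fit$P)), check.attributes = FALSE)
all.equal(fit$sse, sum((as.matrix(stackloss) - fit$model)^2))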

# Clustering with K = 3 clusters,
# using the ALS2 algorithm,
# with 2 random and 2 semi-random starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, algorithm = "ALS2"
)

# Saving the results of all starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, save_all_starts = TRUE
)
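
# Comparing the residual sum of squares across all saved starts (each
# element of runs holds the results of one start). Setting seed makes
# the multi-start run reproducible. The returned solution is expected
# to be the start with the lowest sse.
fit_all <- adproclus(stackloss, 3,
  nrandomstart = 2, nsemirandomstart = 2, save_all_starts = TRUE, seed = 1
)
run_sse <- sapply(fit_all$runs, function(run) run$sse)
run_sse
all.equal(fit_all$sse, min(run_sse))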

# Clustering with a rational start: the starting membership matrix is
# derived from a user-defined profile matrix (here the first 4 rows of the data)
start <- get_rational(x, x[1:4, ])$A
clust <- adproclus(x, 4, start_allocation = start)

}
\references{
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D.
  (2011). ADPROCLUS: a graphical user interface for fitting additive profile
  clustering models to object by variable data matrices. \emph{Behavior
  Research Methods, 43}(1), 56-65.

  Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive
  clustering of rectangular data tables. \emph{Computational Statistics and
  Data Analysis, 52,} 4923-4938.

  Mirkin, B. G. (1987). The method of principal clusters. \emph{Automation
  and Remote Control}, 10:131-143.

  Mirkin, B. G. (1990). A sequential fitting procedure for linear data
  analysis models. \emph{Journal of Classification}, 7(2):167-195.
}
\seealso{
\describe{
  \item{\code{\link{adproclus_low_dim}}}{for low dimensional ADPROCLUS}
  \item{\code{\link{get_random}}}{for generating random starts}
  \item{\code{\link{get_semirandom}}}{for generating semi-random starts}
  \item{\code{\link{get_rational}}}{for generating rational starts}
}
}
