% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ombc_lcwm.R
\name{ombc_lcwm}
\alias{ombc_lcwm}
\title{Sequentially identify outliers while fitting a linear cluster-weighted
model.}
\usage{
ombc_lcwm(
  xy,
  x,
  y_formula,
  comp_num,
  max_out,
  gross_outs = rep(FALSE, nrow(x)),
  init_scheme = c("update", "reinit", "reuse"),
  mnames = "VVV",
  nmax = 1000,
  atol = 1e-08,
  init_z = NULL,
  init_method = c("hc", "kmpp"),
  init_scaling = TRUE,
  kmpp_seed = 123,
  verbose = TRUE,
  dd_weight = 0.5
)
}
\arguments{
\item{xy}{\code{data.frame} containing covariates and response.}

\item{x}{Covariate data only.}

\item{y_formula}{Regression formula.}

\item{comp_num}{Number of mixture components.}

\item{max_out}{Maximum number of outliers.}

\item{gross_outs}{Logical vector identifying gross outliers.}

\item{init_scheme}{Which initialisation scheme to use.}

\item{mnames}{Model names for \code{mixture::gpcm}.}

\item{nmax}{Maximum number of iterations for \code{flexCWM::cwm}.}

\item{atol}{EM convergence threshold for \code{flexCWM::cwm}.}

\item{init_z}{Initial component assignment probability matrix.}

\item{init_method}{Method used to initialise each mixture model.}

\item{init_scaling}{Logical value controlling whether the data should be
scaled for initialisation.}

\item{kmpp_seed}{Optional seed for k-means++ initialisation.}

\item{verbose}{Whether the iteration count is printed.}

\item{dd_weight}{A value between \code{0} and \code{1} which controls the weighting of
the response and covariate dissimilarities when aggregating.}
}
\value{
\code{ombc_lcwm} returns an object of class "outliermbc_lcwm", which is
essentially a list with the following elements:
\describe{
\item{\code{labels}}{Vector of mixture component labels with outliers denoted by
0.}
\item{\code{outlier_bool}}{Logical vector indicating if an observation has been
classified as an outlier.}
\item{\code{outlier_num}}{Number of observations classified as outliers.}
\item{\code{outlier_rank}}{Order in which observations are removed from the data
set. Observations which were provisionally removed,
including those that were eventually not classified
as outliers, are ranked from \code{1} to \code{max_out}. All
gross outliers have rank \code{1}. If there are
\code{gross_num} gross outliers, then the observations
removed during the main algorithm itself will be
numbered from \code{gross_num + 1} to \code{max_out}.
Observations that were never removed have rank \code{0}.}
\item{\code{gross_outs}}{Logical vector identifying the gross outliers. This is
identical to the \code{gross_outs} vector passed to this
function as an argument.}
\item{\code{lcwm}}{Output from \code{flexCWM::cwm} fitted to the non-outlier
observations.}
\item{\code{loglike}}{Vector of log-likelihood values for each iteration.}
\item{\code{removal_dens}}{Vector of mixture densities for the removed
observations. These are the lowest mixture densities
at each iteration.}
\item{\code{distrib_diff_vec}}{Vector of aggregated cross-component
dissimilarity values for each iteration.}
\item{\code{distrib_diff_mat}}{Matrix of component-specific dissimilarity values
for each iteration.}
\item{\code{distrib_diff_arr}}{Array of component-specific response and
covariate dissimilarity values for each
iteration.}
\item{\code{call}}{Argument values used in this function call.}
\item{\code{version}}{Version of \code{outlierMBC} used in this function call.}
\item{\code{conv_status}}{Logical vector indicating which iterations' mixture
models reached convergence during model-fitting.}
}
}
\description{
This function performs model-based clustering, clusterwise regression, and
outlier identification. It does so by iteratively fitting a linear
cluster-weighted model and removing the observation that is least likely
under the model. Its procedure is summarised below:
\enumerate{
\item Fit a linear cluster-weighted model to the data.
\item Compute a dissimilarity between the theoretical and observed distributions
of the scaled squared sample Mahalanobis distances for each mixture
component.
\item Compute a dissimilarity between the theoretical and observed distributions
of the scaled squared studentised residuals for each mixture component.
\item Aggregate these two dissimilarities to obtain one dissimilarity value
for each component.
\item Aggregate across the components to obtain a single dissimilarity value.
\item Remove the observation with the lowest mixture density.
\item Repeat Steps 1-6 until \code{max_out} observations have been removed.
\item Identify the number of outliers which minimised the aggregated
dissimilarity, remove only those observations, and fit a linear
cluster-weighted model to the remaining data.
}
}
\examples{
gross_lcwm_k3n1000o10 <- find_gross(lcwm_k3n1000o10, 20)

ombc_lcwm_k3n1000o10 <- ombc_lcwm(
  xy = lcwm_k3n1000o10[, c("X1", "Y")],
  x = lcwm_k3n1000o10$X1,
  y_formula = Y ~ X1,
  comp_num = 3,
  max_out = 20,
  mnames = "V",
  gross_outs = gross_lcwm_k3n1000o10$gross_bool
)
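
# A minimal sketch of inspecting the fitted object. It uses only elements
# listed in the Value section; the exact output depends on the fit.
ombc_lcwm_k3n1000o10$outlier_num
table(ombc_lcwm_k3n1000o10$labels)
which(ombc_lcwm_k3n1000o10$outlier_bool)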
}
