% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/microaggregation.R
\docType{methods}
\name{microaggregation}
\alias{microaggregation}
\title{Microaggregation}
\usage{
microaggregation(
  obj,
  variables = NULL,
  aggr = 3,
  strata_variables = NULL,
  method = "mdav",
  weights = NULL,
  nc = 8,
  clustermethod = "clara",
  measure = "mean",
  trim = 0,
  varsort = 1,
  transf = "log"
)
}
\arguments{
\item{obj}{either an object of class \code{\link{sdcMicroObj-class}} or a \code{data.frame}}

\item{variables}{variables to microaggregate. If \code{NULL}: for objects of class
sdcMicroObj, all numerical key variables are chosen by default; for
\code{data.frame}s, all columns are chosen by default.}

\item{aggr}{aggregation level (default=3)}

\item{strata_variables}{for \code{data.frames}, by-variables for applying microaggregation only
within strata defined by the variables. For \code{\link{sdcMicroObj-class}}-objects, the
stratification-variable defined in slot \code{@strataVar} is used. This slot can be changed any
time using \code{strataVar<-}.}

\item{method}{pca, rmd, onedims, single, simple, clustpca, pppca,
clustpppca, mdav, clustmcdpca, influence, mcdpca}

\item{weights}{sampling weights. If obj is of class sdcMicroObj the vector
of sampling weights is chosen automatically. If weights are supplied, a weighted
version of the aggregation measure is chosen automatically, e.g. weighted
median or weighted mean.}

\item{nc}{number of clusters, if the chosen method performs cluster analysis}

\item{clustermethod}{clustering method, if the chosen method performs cluster analysis}

\item{measure}{aggregation statistic, mean, median, trim, onestep (default=mean)}

\item{trim}{trimming percentage, if measure=trim}

\item{varsort}{variable for sorting, if method=single}

\item{transf}{transformation for data x}
}
\value{
If \sQuote{obj} is of class \code{\link{sdcMicroObj-class}}, the corresponding
slots (such as manipNumVars, risk and utility) are filled. If \sQuote{obj} is
of class \dQuote{data.frame}, an object of class \dQuote{micro} with the following elements is returned:
\describe{
\item{\code{x}: }{original data}
\item{\code{mx}: }{the microaggregated dataset}
\item{\code{method}: }{method}
\item{\code{aggr}: }{aggregation level}
\item{\code{measure}: }{aggregation statistic used}}
}
\description{
Function to perform various methods of microaggregation.
}
\details{
The \dQuote{official} definition of microaggregation can be found at
\url{https://research.cbs.nl/casc/glossary.htm}:

Records are grouped based on a proximity measure of variables of interest,
and the same small groups of records are used in calculating aggregates for
those variables. The aggregates are released instead of the individual
record values.

The recommended method is \dQuote{rmd} which forms the proximity using
multivariate distances based on robust methods. It is an extension of the
well-known method \dQuote{mdav}.  However, when computational speed is
important, method \dQuote{mdav} is the preferable choice.

Although very different concepts can be used for the proximity measure, the
aggregation itself is usually done with the arithmetic mean.
Other measures of location can nevertheless be used for aggregation,
especially when the group size for aggregation is larger than 3.
Because the median appears unsuitable for microaggregation owing to its
high robustness, one of the other included measures can be chosen instead. If
data from a complex sample survey are microaggregated, the corresponding
sampling weights should be supplied so that the values are aggregated by
either the weighted arithmetic mean or the weighted median.

This function also contains a method with which the data can be clustered
using a variety of clustering algorithms. Clustering observations
before applying microaggregation might be useful. Note that the data are
automatically standardised before clustering.

The usage of clustering method \sQuote{Mclust} requires package mclust02,
which must be loaded first. The package is not loaded automatically, since
the package is not under GPL but comes with a different licence.

Some projection methods for microaggregation are also included. The
robust versions \sQuote{pppca} and \sQuote{clustpppca} (which clusters first)
are fast implementations and almost always provide the best results.

Univariate statistics are preserved best with the individual ranking method
(called \sQuote{onedims} here, although it is often named
\sQuote{individual ranking}), but multivariate statistics are strongly
affected.
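
As a rough illustration of individual ranking for a single variable, the
following base R sketch sorts the values, forms consecutive groups of size
\code{k} and replaces each value by its group mean. The function
\code{individual_ranking()} is hypothetical and is not the implementation
used by this package; folding an undersized last group into the previous
one is likewise an assumption of the sketch.

\preformatted{
# Hypothetical helper, for illustration only (not part of the package)
individual_ranking <- function(x, k = 3) {
  stopifnot(length(x) > 0)
  n <- length(x)
  ord <- order(x)                  # positions of the values in sorted order
  grp <- ceiling(seq_len(n) / k)   # consecutive groups of size k
  last <- max(grp)
  # Sketch assumption: fold an undersized last group into the previous one
  if (last > 1 && sum(grp == last) < k) grp[grp == last] <- last - 1
  out <- numeric(n)
  out[ord] <- ave(x[ord], grp, FUN = mean)  # group means, original order
  out
}

individual_ranking(c(10, 1, 7, 3, 8, 2, 9), k = 3)
# 8.5 2.0 8.5 2.0 8.5 2.0 8.5
}

Applying such a step to each variable separately keeps the mean of every
variable unchanged, whereas the relationships between variables are not
taken into account.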

With method \sQuote{simple} one can apply microaggregation directly to the
(unsorted) data. It is useful as a benchmark for comparison with other
methods, i.e. it answers the question of how much sorting the data before
aggregation improves the result.
}
\note{
If only one variable is specified, \code{\link{mafast}} is applied and argument \code{method} is ignored.
Parameter \code{measure} is ignored for methods \code{mdav} and \code{rmd}.
}
\examples{
data(testdata)
# donttest is used because the CPU time of these examples exceeds 2.5 times
# the elapsed time, owing to data.table and multicore computation.
\donttest{
m <- microaggregation(
  obj = testdata[1:100, c("expend", "income", "savings")],
  method = "mdav",
  aggr = 4
)
summary(m)

## for objects of class sdcMicroObj:
## no stratification because `@strataVar` is `NULL`
data(testdata2)
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = c("urbrur", "roof", "walls", "water", "electcon", "sex"),
  numVars = c("expend", "income", "savings"),
  w = "sampling_weight"
)
sdc <- microaggregation(
  obj = sdc,
  variables = c("expend", "income")
)

## with stratification using variable `"relat"`
strataVar(sdc) <- "relat"
sdc <- microaggregation(
  obj = sdc,
  variables = "savings"
)
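
## Sketch: method "simple" as a benchmark for "mdav" (see Details).
## The loss function below (mean absolute difference between original and
## microaggregated values, with columns scaled by their standard deviation)
## is an illustrative assumption and not a utility measure of the package.
## It also assumes that x and mx hold the same rows and columns in the
## same order.
dat <- testdata[1:100, c("expend", "income", "savings")]
m_simple <- microaggregation(obj = dat, method = "simple", aggr = 4)
m_mdav <- microaggregation(obj = dat, method = "mdav", aggr = 4)
loss <- function(m) {
  x <- as.matrix(m$x)
  mx <- as.matrix(m$mx)
  # scale each column by the standard deviation of the original data
  mean(sweep(abs(x - mx), 2, apply(x, 2, sd), "/"))
}
c(simple = loss(m_simple), mdav = loss(m_mdav))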
}
}
\references{
Templ, M. and Meindl, B., \emph{Robust Statistics Meets SDC: New Disclosure
Risk Measures for Continuous Microdata Masking}, Lecture Notes in Computer
Science, Privacy in Statistical Databases, vol. 5262, pp. 113-126, 2008.

Templ, M. \emph{Statistical Disclosure Control for Microdata Using the
R-Package sdcMicro}, Transactions on Data Privacy, vol. 1, number 2, pp.
67-85, 2008.  \url{http://www.tdp.cat/issues/abs.a004a08.php}

Templ, M. \emph{New Developments in Statistical Disclosure Control and
Imputation: Robust Statistics Applied to Official Statistics},
Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280,
264 pages.

Templ, M. \emph{Statistical Disclosure Control for Microdata: Methods and Applications in R},
Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4.
\doi{10.1007/978-3-319-50272-4}

Templ, M. and Meindl, B. and Kowarik, A.: \emph{Statistical Disclosure Control for
Micro-Data Using the R Package sdcMicro}, Journal of Statistical Software,
67 (4), 1--36, 2015.
}
\seealso{
\code{\link{summary.micro}}, \code{\link{plotMicro}},
\code{\link{valTable}}
}
\author{
Matthias Templ, Bernhard Meindl

For method \dQuote{mdav}: This work is being supported by the International
Household Survey Network (IHSN) and funded by a DGF Grant provided by the
World Bank to the PARIS21 Secretariat at the Organisation for Economic
Co-operation and Development (OECD).  This work builds on previous work
which is elsewhere acknowledged.

Author for the integration of the code for mdav in R: Alexander Kowarik.
}
\keyword{manip}
