% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bestNormalize.R
\name{bestNormalize}
\alias{bestNormalize}
\alias{predict.bestNormalize}
\alias{print.bestNormalize}
\alias{tidy.bestNormalize}
\title{Calculate and perform best normalizing transformation}
\usage{
bestNormalize(
  x,
  standardize = TRUE,
  allow_orderNorm = TRUE,
  allow_lambert_s = FALSE,
  allow_lambert_h = FALSE,
  allow_exp = TRUE,
  out_of_sample = TRUE,
  cluster = NULL,
  k = 10,
  r = 5,
  loo = FALSE,
  warn = FALSE,
  quiet = FALSE,
  tr_opts = list(),
  new_transforms = list(),
  norm_stat_fn = NULL,
  ...
)

\method{predict}{bestNormalize}(object, newdata = NULL, inverse = FALSE, ...)

\method{print}{bestNormalize}(x, ...)

\method{tidy}{bestNormalize}(x)
}
\arguments{
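\item{x}{A numeric vector to normalize (for \code{bestNormalize}), or a
\code{bestNormalize} object (for the \code{print} and \code{tidy} methods).}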

\item{standardize}{If TRUE, the transformed values are also centered and
scaled, such that the transformation attempts a standard normal. This will
not change the normality statistic.}

\item{allow_orderNorm}{Set to FALSE if orderNorm should not be applied.}

\item{allow_lambert_s}{Set to TRUE if the lambertW of type "s" should be
applied (see details). Expect about 2-3x elapsed computing time if TRUE.}

\item{allow_lambert_h}{Set to TRUE if the lambertW of type "h" should be
applied (see details). Expect about 2-3x elapsed computing time if TRUE.}

\item{allow_exp}{Set to FALSE if the exponential transformation should not be
applied (it can sometimes cause errors with heavy right skew)}

\item{out_of_sample}{if FALSE, quickly estimates in-sample performance
instead of using cross-validation}

\item{cluster}{a cluster object created with \code{makeCluster} from the
\code{parallel} package}

\item{k}{number of folds}

\item{r}{number of repeats}

\item{loo}{should leave-one-out CV be used instead of repeated CV? (see
details)}

\item{warn}{Should bestNormalize warn when a method doesn't work?}

\item{quiet}{If TRUE, the progress bar for cross-validation is not
displayed.}

\item{tr_opts}{a list (of lists), specifying options to be passed to each
transformation (see details)}

\item{new_transforms}{a named list of new transformation functions and their
predict methods (see details)}

\item{norm_stat_fn}{if specified, a function used to calculate the normality
statistic (default is the Pearson chi-squared statistic divided by its d.f.)}

\item{...}{additional arguments.}

\item{object}{an object of class 'bestNormalize'}

\item{newdata}{a vector of data to be (reverse) transformed}

\item{inverse}{if TRUE, performs reverse transformation}
}
\value{
A list of class \code{bestNormalize} with elements

  \item{x.t}{transformed original data} \item{x}{original data}
  \item{norm_stats}{Pearson's P / degrees of freedom}
  \item{method}{out-of-sample or in-sample, number of folds + repeats}
  \item{chosen_transform}{the chosen transformation (of appropriate class)}
  \item{other_transforms}{the other transformations (of appropriate class)}
  \item{oos_preds}{Out-of-sample predictions (if loo == TRUE) or
  normalization stats}

  The \code{predict} function returns the numeric value of the transformation
  performed on new data, and allows for the inverse transformation as well.
}
\description{
Performs a suite of normalizing transformations, and selects the
  best one on the basis of the Pearson P test statistic for normality. The
  transformation that has the lowest P (calculated on the transformed data)
  is selected. See details for more information.
}
\details{
\code{bestNormalize} estimates the optimal normalizing
  transformation. This transformation can be performed on new data, and
  inverted, via the \code{predict} function.

This function currently estimates the Yeo-Johnson transformation,
  the Box-Cox transformation (if the data are positive), the log_10(x+a)
  transformation, the square-root (x+a) transformation, and the arcsinh
  transformation. By default, a is set to max(0, -min(x) + eps). If
  \code{allow_orderNorm = TRUE} and \code{out_of_sample = FALSE}, the ordered
  quantile normalization technique will likely be chosen, since it
  essentially forces the data to follow a normal distribution. More
  information on the orderNorm technique can be found in the package vignette
  or via \code{?orderNorm}.


  Repeated cross-validation is used by default to estimate the out-of-sample
  performance of each transformation if \code{out_of_sample = TRUE}. While
  this can take some time, users can speed it up by creating a cluster via
  the \code{parallel} package's \code{makeCluster} function, and passing this
  cluster to \code{bestNormalize} via the \code{cluster} argument. For best
  performance, we recommend setting the number of workers in the cluster to
  the number of repeats \code{r}. Care should be taken to account for the
  number of observations per fold; with too small a number, the estimated
  normality statistic could be inaccurate, or at least suffer from high
  variability.


  As of version 1.3, users can also use leave-one-out cross-validation for
  each method by setting \code{loo} to \code{TRUE}. This will take a lot of
  time for larger vectors, but it yields the most accurate estimate of
  normalization efficacy. Note that if this method is selected, arguments
  \code{k, r} are ignored. This method will still work in parallel with the
  \code{cluster} argument.


  Note that the Lambert transformation of type "h" can be enabled by setting
  \code{allow_lambert_h = TRUE}; however, this can take significantly longer
  to run.

  Use \code{tr_opts} to set options for each transformation. For instance, if
  you want to override the default selection of \code{a} for \code{log_x},
  set \code{tr_opts$log_x = list(a = 1)}.

  See the package's vignette for how to use custom functions with
  bestNormalize. It only requires creating an S3 class and predict method for
  the new transformation and loading them into the environment; the new
  custom function (and its predict method) can then be passed to
  bestNormalize via \code{new_transforms}.
}
\examples{

x <- rgamma(100, 1, 1)

\dontrun{
# With Repeated CV
BN_obj <- bestNormalize(x)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
}
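
\dontrun{
# With repeated CV run in parallel, as described in the details.
# A sketch only: the worker count (2) and the reduced r and k are
# illustrative assumptions, not recommended settings.
cl <- parallel::makeCluster(2)
BN_par <- bestNormalize(x, cluster = cl, r = 2, k = 5, quiet = TRUE)
parallel::stopCluster(cl)  # always release the workers when done
BN_par
}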


\dontrun{
# With leave-one-out CV
BN_obj <- bestNormalize(x, loo = TRUE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
}
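
\dontrun{
# With a custom normality statistic passed via norm_stat_fn.
# A sketch only: the transformation with the lowest statistic is selected,
# so the Shapiro-Wilk W (higher means more normal) is flipped to 1 - W.
# sw_stat is a hypothetical helper name, not part of the package.
sw_stat <- function(z) 1 - unname(stats::shapiro.test(z)$statistic)
BN_sw <- bestNormalize(x, norm_stat_fn = sw_stat, out_of_sample = FALSE)
BN_sw$norm_stats
}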

# Without CV
BN_obj <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
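
# Overriding a transformation option with tr_opts, as described in the
# details. A sketch only: a = 1 is an illustrative shift for log_x, not a
# recommended value.
BN_opts <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE,
                         tr_opts = list(log_x = list(a = 1)))
BN_opts$norm_stats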

}
\seealso{
\code{\link[bestNormalize]{boxcox}}, \code{\link{orderNorm}},
  \code{\link{yeojohnson}}
}
