% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/HARF.R
\name{HARF}
\alias{HARF}
\alias{HARF.default}
\alias{HARF.formula}
\title{High Agreement Random Forest}
\usage{
\method{HARF}{formula}(formula, data, ...)

\method{HARF}{default}(x, nfolds = 10, agreementLevel = 0.7, ntrees = 500,
  classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{nfolds}{Number of folds for the cross voting scheme.}

\item{agreementLevel}{Real number between 0.5 and 1. An instance is identified as
noise when the classification confidences that the random forest assigns to the
classes other than the instance's actual class add up to at least
\code{agreementLevel}. Sluban et al. (2010) report the best performance for
values between 0.7 and 0.8.}

\item{ntrees}{Number of trees for the random forest.}

\item{classColumn}{Positive integer indicating the column that contains the class
labels (a factor). By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes for
   removed instances (i.e. their row number with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes for
   repaired/relabelled instances (i.e. their row number with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character string with additional
   information not covered by the previous items.
}
}
\description{
Ensemble-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
Using an \code{nfolds}-fold cross-validation scheme, instances are
identified as noise and removed when a random forest assigns little confidence to
the instance's actual label (namely, less than 1-\code{agreementLevel}). The value of
\code{agreementLevel} tunes the precision and recall of the filter; the best
trade-off is reported for values between 0.7 and 0.8 (Sluban et al., 2010).
}
\examples{
# The next example is not run in order to save time
\dontrun{
data(iris)
# Fix a seed, since the filter relies on a random partition of the data
set.seed(1)
out <- HARF(Species~., data = iris, ntrees = 100)
print(out)
identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])
}
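# A minimal sketch of the decision rule described in the Details section.
# The confidences below are made-up illustrative values (an assumption),
# not output produced by HARF or by a fitted random forest.
conf <- matrix(c(0.9, 0.1, 0.0,
                 0.2, 0.5, 0.3,
                 0.1, 0.1, 0.8),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("setosa", "versicolor", "virginica")))
labels <- factor(c("setosa", "setosa", "virginica"), levels = colnames(conf))
agreementLevel <- 0.7
# Confidence assigned to the actual label of each instance
ownConf <- conf[cbind(seq_len(nrow(conf)), as.integer(labels))]
# Flag an instance when the remaining classes add up to at least agreementLevel
isNoise <- (1 - ownConf) >= agreementLevel
which(isNoise)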
}
\references{
Sluban B., Gamberger D., Lavrac N. (2010, August): Advances in Class
Noise Detection. In \emph{ECAI} (pp. 1105-1106).
}

