% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/CoBC.R
\name{coBC}
\alias{coBC}
\title{General Interface for the Co-Training by Committee (CoBC) Model}
\usage{
coBC(learner, N = 3, perc.full = 0.7, u = 100, max.iter = 50)
}
\arguments{
\item{learner}{A model specification from the \pkg{parsnip} package used to train the supervised
base classifiers on a set of instances. The model must support probability predictions in classification mode.}

\item{N}{The number of classifiers used as committee members. All these classifiers
are trained using the model given in the \code{learner} argument. Default is 3.}

\item{perc.full}{A number between 0 and 1. If the proportion
of newly labeled examples reaches this value, the self-labeling process is stopped.
Default is 0.7.}

\item{u}{Number of unlabeled instances in the pool. Default is 100.}

\item{max.iter}{Maximum number of iterations to execute in the self-labeling process.
Default is 50.}
}
\value{
When the model is fitted, a list object of class "coBC" containing:
\describe{
\item{model}{The final \code{N} base classifiers trained using the enlarged labeled set.}
\item{model.index}{List of \code{N} vectors of indexes of the training instances
used by each classifier. These indexes are relative to the \code{y} argument.}
\item{instances.index}{The indexes of all training instances used to
train the \code{N} models. These indexes include the initial labeled instances
and the newly labeled instances. These indexes are relative to the \code{y} argument.}
\item{model.index.map}{List of \code{N} vectors with the same information as \code{model.index},
but with indexes relative to the \code{instances.index} vector.}
\item{classes}{The levels of the \code{y} factor in classification.}
\item{pred}{The function provided in the \code{pred} argument.}
\item{pred.pars}{The list provided in the \code{pred.pars} argument.}
}
}
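\section{Sketch: relating model.index and model.index.map}{
The relationship between \code{model.index}, \code{instances.index} and \code{model.index.map}
can be illustrated with the base R sketch below. It assumes that each element of
\code{model.index.map} is obtained by locating the entries of the corresponding
\code{model.index} vector inside \code{instances.index}. The helper \code{map_to_instances}
and the toy index vectors are hypothetical and are not part of the package.
\preformatted{
# Hypothetical helper (not part of SSLR): re-express each classifier's
# indexes (relative to y) as positions inside instances.index.
map_to_instances <- function(model.index, instances.index) {
  lapply(model.index, function(idx) match(idx, instances.index))
}

# Toy example with N = 3 classifiers
instances.index <- c(2, 5, 7, 9, 12)
model.index <- list(c(2, 7), c(5, 9, 12), c(2, 12))
map_to_instances(model.index, instances.index)
}
Using \code{match} keeps repeated indexes (for example, from bootstrap sampling) mapped to
the same position in \code{instances.index}.
}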
\description{
Co-Training by Committee (CoBC) is a semi-supervised learning algorithm
with a co-training style. This algorithm trains \code{N} classifiers with the learning
scheme defined in the \code{learner} argument using a reduced set of labeled examples.
In each iteration, an unlabeled example is labeled for a classifier if the most confident
classifications assigned by the other \code{N-1} classifiers agree on the proposed label.
Candidate unlabeled examples are selected randomly from a pool of size \code{u}.
The final prediction is the average of the estimates of the \code{N} classifiers.
}
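\section{Sketch: committee agreement step}{
The labeling rule described above can be sketched in base R as follows. This is an
illustrative interpretation, not the SSLR implementation. It assumes that the
class-probability predictions of the \code{N} classifiers on the pool are available as a
list of matrices (one row per pool instance, one column per class), treats the committee
of classifier \code{i} as the other \code{N-1} members, keeps only the instances on which
all committee members predict the same class, and ranks them by the committee's averaged
probability. The function \code{committee_select} and its argument \code{n.select} are hypothetical.
\preformatted{
# Hypothetical sketch (not the SSLR implementation).
# probs: list of N matrices (pool instances x classes) of class probabilities.
# i: index of the classifier whose labeled set is being enlarged.
# n.select: maximum number of pool instances to label for classifier i.
committee_select <- function(probs, i, n.select = 1) {
  committee <- probs[-i]                       # the other N-1 members
  n.pool <- nrow(committee[[1]])
  # Class predicted by each committee member (columns) for each instance (rows)
  votes <- vapply(committee,
                  function(p) max.col(p, ties.method = "first"),
                  integer(n.pool))
  votes <- matrix(votes, nrow = n.pool)        # keep matrix shape when n.pool == 1
  agree <- apply(votes, 1, function(v) all(v == v[1]))
  # Average committee probabilities; confidence = highest averaged probability
  avg <- Reduce("+", committee) / length(committee)
  conf <- apply(avg, 1, max)
  candidates <- which(agree)
  if (length(candidates) == 0) {
    return(data.frame(index = integer(0), label = integer(0),
                      confidence = numeric(0)))
  }
  best <- candidates[order(conf[candidates], decreasing = TRUE)]
  best <- head(best, n.select)
  data.frame(index = best, label = votes[best, 1], confidence = conf[best])
}
}
Requiring unanimous agreement before ranking by confidence is one possible reading of
"agree"; a majority vote over the committee would be an equally plausible variant.
}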
\details{
For regression tasks, labeling data is computationally very expensive and can be slow.
This method trains an ensemble of diverse classifiers. To promote initial diversity,
the classifiers are trained on the reduced set of labeled examples by bagging, that is,
each classifier is trained on a bootstrap sample of the labeled set.
The self-labeling process stops when either of the following criteria is met:
the algorithm reaches the number of iterations defined in the \code{max.iter}
parameter, or the proportion of the unlabeled set defined in the \code{perc.full} parameter
has been moved to the enlarged labeled sets of the classifiers.
}
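\section{Sketch: bagging initialization and stopping rule}{
The two mechanisms described in the details, bagging-based initialization and the
stopping rule, can be sketched in base R. This is a simplified illustration, not the
SSLR implementation: the helpers \code{bagging_indexes} and \code{should_stop} are
hypothetical, and the stopping rule assumes that \code{perc.full} is compared with the
fraction of the initially unlabeled instances that have been labeled so far.
\preformatted{
# Hypothetical helpers (not part of SSLR).

# Bagging: each of the N classifiers starts from a bootstrap sample
# (drawn with replacement) of the labeled instance indexes.
bagging_indexes <- function(labeled, N) {
  lapply(seq_len(N), function(k) {
    labeled[sample.int(length(labeled), length(labeled), replace = TRUE)]
  })
}

# Stopping rule: stop when max.iter iterations have been run, or when the
# fraction of the unlabeled set moved to the labeled sets reaches perc.full.
should_stop <- function(iter, max.iter, n.newly.labeled, n.unlabeled, perc.full) {
  iter >= max.iter || n.newly.labeled >= perc.full * n.unlabeled
}

set.seed(1)
labeled <- c(3, 8, 15, 21, 42)
bagging_indexes(labeled, N = 3)
should_stop(iter = 2, max.iter = 50, n.newly.labeled = 80,
            n.unlabeled = 100, perc.full = 0.7)
}
Indexing with \code{sample.int} avoids the behavior of \code{sample} when given a single
number, which would otherwise sample from \code{1:x} instead of returning that index.
}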
\examples{
library(tidyverse)
library(tidymodels)
library(caret)
library(SSLR)

data(wine)

set.seed(1)
train.index <- createDataPartition(wine$Wine, p = .7, list = FALSE)
train <- wine[ train.index,]
test  <- wine[-train.index,]

cls <- which(colnames(wine) == "Wine")

#Keep 20\% of the training instances labeled; the rest become unlabeled (NA)
labeled.index <- createDataPartition(train$Wine, p = .2, list = FALSE)
train[-labeled.index,cls] <- NA

#We need a parsnip model that provides probability predictions
#https://tidymodels.github.io/parsnip/articles/articles/Models.html
#It must use mode = "classification"

#For example, with Random Forest
rf <- rand_forest(trees = 100, mode = "classification") \%>\%
  set_engine("randomForest")

m <- coBC(learner = rf, N = 3,
          perc.full = 0.7,
          u = 100,
          max.iter = 3) \%>\% fit(Wine ~ ., data = train)

#Accuracy and kappa
predict(m, test) \%>\%
  bind_cols(test) \%>\%
  metrics(truth = Wine, estimate = .pred_class)

}
\references{
Avrim Blum and Tom Mitchell.\cr
\emph{Combining labeled and unlabeled data with co-training.}\cr
In Eleventh Annual Conference on Computational Learning Theory, COLT '98, pages 92-100, New York, NY, USA, 1998. ACM.
ISBN 1-58113-057-0. doi: 10.1145/279943.279962.\cr\cr
Mohamed Farouk Abdel-Hady, Friedhelm Schwenker and Günther Palm.\cr
\emph{Semi-supervised Learning for Regression with Co-training by Committee.}\cr
Institute of Neural Information Processing, University of Ulm, D-89069 Ulm, Germany.
}
