% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/RePrInDT.R
\name{RePrInDT}
\alias{RePrInDT}
\title{Repeated \code{\link{PrInDT}} for specified percentage combinations}
\usage{
RePrInDT(datain, classname, ctestv=NA, N, plarge, psmall, conf.level=0.95,
       thres=0.5, stratvers=0, strat=NA, seedl=TRUE, minsplit=NA, minbucket=NA,
       valdat=datain)
}
\arguments{
\item{datain}{Input data frame with class factor variable 'classname' and the\cr
influential variables, which need to be factors or numeric (transform logical and character variables to factors)}

\item{classname}{Name of class variable (character)}

\item{ctestv}{Vector of character strings of forbidden split results\cr
(see function \code{\link{PrInDT}} for details).\cr
If no restrictions exist, the default = NA is used.}

\item{N}{Number of repetitions (integer > 0)}

\item{plarge}{Vector of undersampling percentages of larger class (numerical, > 0 and <= 1)}

\item{psmall}{Vector of undersampling percentages of smaller class (numerical, > 0 and <= 1)}

\item{conf.level}{(1 - significance level) in function \code{ctree} (numerical, > 0 and <= 1);\cr
default = 0.95}

\item{thres}{Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5}

\item{stratvers}{Version of stratification;\cr
= 0: none (default),\cr
= 1: stratification according to the percentages of the values of the factor variable 'strat',\cr
> 1: stratification with minimum number 'stratvers' of observations per value of 'strat'}

\item{strat}{Name of one (!) stratification variable for undersampling (character);\cr
default = NA (no stratification)}

\item{seedl}{Should the seed for random numbers be set (TRUE / FALSE)?\cr
default = TRUE}

\item{minsplit}{Minimum number of elements in a node for it to be split;\cr
default = 20}

\item{minbucket}{Minimum number of elements in a node;\cr
default = 7}

\item{valdat}{Validation data; default = datain}
}
\value{
\describe{
\item{treesb}{best trees for the different percentage combinations; refer to an individual tree as \code{treesb[[k]]}, k = 1, ..., length(plarge)*length(psmall)}
\item{acc1st}{accuracies of best trees on full sample}
\item{acc3en}{accuracies of ensemble of 3 best trees on full sample}
\item{simp_m}{mean of permutation losses for the predictors}
}
}
\description{
The function \code{\link{PrInDT}} is called repeatedly according to all combinations of the percentages specified in the vectors 'plarge' and 
'psmall'.\cr
The relationship between the two-class factor variable 'classname' and all other factor and numerical variables
in the data frame 'datain' is optimally modeled by means of 'N' repetitions of undersampling.\cr 
The optimization criterion is the balanced accuracy on the validation sample 'valdat' (default = full input sample 'datain').\cr
The trees generated from undersampling can be restricted by rejecting 
unacceptable trees which include split results specified in the character strings of the vector 'ctestv'.\cr
The probability threshold 'thres' for the prediction of the smaller class may be specified (default = 0.5).\cr
Undersampling may be stratified in two ways by the feature 'strat'.\cr
The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.\cr

\strong{Reference}\cr Weihs, C., Buschfeld, S. 2021c. Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, 
and ranking of predictors in ensembles. arXiv:2108.05129
}
\details{
Standard output can be produced by means of \code{print(name)} or just \code{name}, as well as \code{plot(name)}, where 'name' is the output 
object of the function.\cr
The plot function produces a series of several plots. If you use the R GUI on Windows, you might want to call \code{windows(record=TRUE)} before 
\code{plot(name)} to record the whole series of plots. In RStudio this functionality is provided automatically.
}
\examples{
datastrat <- PrInDT::data_zero
data <- na.omit(datastrat) # cleaned full data: no NAs
# interpretation restrictions (split exclusions)
ctestv <- rbind('ETH == {C2a, C1a}', 'MLU == {1, 3}')
N <- 51  # no. of repetitions
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
psmall <- c(0.95,1)     # percentages of the small class
plarge <- c(0.09,0.1)  # percentages of the large class
outRe <- RePrInDT(data,"real",ctestv,N,plarge,psmall,conf.level) 
outRe
plot(outRe)
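# A hedged sketch, not part of the original example: it only enumerates the
# percentage combinations for which PrInDT is called and shows how a single
# best tree might be inspected. The helper name combs is hypothetical, and
# the order of the combinations inside treesb is an assumption.
combs <- expand.grid(plarge = plarge, psmall = psmall)
nrow(combs)  # equals length(plarge) * length(psmall)
# Best tree of the first combination, assuming the components listed in the
# Value section are accessible by name on the output object.
outRe$treesb[[1]]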

}
