% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ordinalNetCV.R
\name{ordinalNetCV}
\alias{ordinalNetCV}
\title{Uses K-fold cross validation to obtain out-of-sample log-likelihood and
misclassification rates. Lambda is tuned within each cross validation fold.}
\usage{
ordinalNetCV(x, y, lambdaVals = NULL, folds = NULL, nFolds = 5,
  nFoldsCV = 5, tuneMethod = c("cvLoglik", "cvMisclass", "aic", "bic"),
  printProgress = TRUE, warn = TRUE, ...)
}
\arguments{
\item{x}{Covariate matrix.}

\item{y}{Response variable. Can be a factor, ordered factor, or a matrix
where each row is a multinomial vector of counts. A weighted fit can be obtained
using the matrix option, since the row sums are essentially observation weights.
Non-integer matrix entries are allowed.}

\item{lambdaVals}{An optional user-specified lambda sequence (vector). If \code{NULL},
a sequence will be generated using the model fit to the full training data.
This default sequence is based on \code{nLambda} and \code{lambdaMinRatio},
which can be passed as additional arguments (otherwise \code{ordinalNet} default
values are used). The maximum lambda is the smallest value that sets all penalized
coefficients to zero, and the minimum lambda is the maximum value multiplied
by the factor \code{lambdaMinRatio}.}

\item{folds}{An optional list, where each element is a vector of row indices
corresponding to a different cross validation fold. Indices correspond to rows
of the \code{x} matrix. Each index number should be used in exactly one fold.
If \code{NULL}, the data will be randomly divided into equally-sized partitions.
It is recommended to call \code{set.seed} before calling \code{ordinalNetCV}
for reproducibility.}

\item{nFolds}{Number of cross validation folds. Only used if \code{folds=NULL}.}

\item{nFoldsCV}{Number of cross validation folds used to tune lambda for each
training set (i.e. within each training fold). Only used if \code{tuneMethod} is
"cvLoglik" or "cvMisclass".}

\item{tuneMethod}{Method used to tune lambda for each training set (i.e. within
each training fold). The "cvLoglik" and "cvMisclass" methods use K-fold cross validation
with \code{nFoldsCV} folds. "cvLoglik" chooses lambda with the best average
out-of-sample log-likelihood. "cvMisclass" chooses lambda with the best
average misclassification rate. The "aic" and "bic" methods are less computationally
intensive because they do not require cross validation to select lambda.
Note that for the methods that require cross validation, the fold splits are
determined randomly and cannot be specified by the user. The \code{set.seed()}
function should be called prior to \code{ordinalNetCV} for reproducibility.}

\item{printProgress}{Logical. If \code{TRUE}, the fitting progress is printed
to the terminal.}

\item{warn}{Logical. If \code{TRUE}, the following warning message is displayed
when fitting a cumulative probability model with \code{nonparallelTerms=TRUE}
(i.e. nonparallel or semi-parallel model).
"Warning message: For out-of-sample data, the cumulative probability model
with nonparallelTerms=TRUE may predict cumulative probabilities that are not
monotone increasing."
The warning is displayed by default, but the user may wish to disable it.}

\item{...}{Other arguments (besides \code{x}, \code{y}, \code{lambdaVals}, and \code{warn})
passed to \code{ordinalNet}.}
}
\value{
A list containing the following:
\describe{
  \item{loglik}{Vector of out-of-sample log-likelihood values. Each value
  corresponds to a different fold.}
  \item{misclass}{Vector of out-of-sample misclassification rates. Each value
  corresponds to a different fold.}
  \item{bestLambdaIndex}{The index of the value within the lambda sequence
  selected for each fold by the tuning method.}
  \item{lambdaVals}{The sequence of lambda values used for all cross validation folds.}
  \item{folds}{A list containing the index numbers of each fold.}
  \item{fit}{An object of class "\code{ordinalNetFit}", resulting from fitting
  \code{ordinalNet} to the entire dataset.}
}
}
\description{
The data are divided into \eqn{K} folds. \code{ordinalNet} is fit \eqn{K} times, each time
leaving out one fold as a test set. For each of the \eqn{K} model fits, lambda
can be tuned by K-fold cross validation (within each fold), or by AIC or BIC,
which do not require cross validation. If cross validation is used, the user
has the option to select the lambda value with either the best average out-of-sample
log-likelihood or the best average misclassification rate. Once the model is tuned,
the out-of-sample log-likelihood and misclassification rate are obtained from
the held-out test set.
}
\details{
\itemize{
  \item The fold partition splits can be passed by the user via the \code{folds}
  argument. By default, the data are randomly divided into equally-sized partitions.
  Note that if lambda is tuned by cross validation, the fold splits are
  determined randomly and cannot be specified by the user. The \code{set.seed}
  function should be called prior to \code{ordinalNetCV} for reproducibility.
  \item A sequence of lambda values can be passed by the user via the
  \code{lambdaVals} argument. By default, the sequence is generated by first
  fitting the model to the full data set (this sequence is determined by the
  \code{nLambda} and \code{lambdaMinRatio} arguments of \code{ordinalNet}).
  \item The \code{standardize} argument of \code{ordinalNet} can be modified through
  the additional arguments (...). If \code{standardize=TRUE}, then the data are scaled
  within each cross validation fold. If \code{standardize=TRUE} and lambda is tuned by
  cross validation, then the data are also scaled within each tuning sub-fold.
  This is done because scaling is part of the statistical procedure and should
  be repeated each time the procedure is applied.
}
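
The arguments described above can be combined in a single call. The following
is a minimal sketch rather than a recommended configuration. It assumes that
\code{x} and \code{y} are defined as in the Examples section, and the lambda
values shown are arbitrary illustrative choices.
\preformatted{
# User-specified lambda sequence, BIC tuning (no inner cross validation),
# and standardization switched off through the additional arguments (...)
cv <- ordinalNetCV(x, y, lambdaVals = c(0.1, 0.05, 0.01),
                   tuneMethod = "bic", standardize = FALSE)
cv$lambdaVals       # lambda sequence shared by all folds
cv$bestLambdaIndex  # index into lambdaVals selected for each fold
}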
}
\examples{
\dontrun{
# Simulate x as independent standard normal
# Simulate y|x from a parallel cumulative logit (proportional odds) model
set.seed(1)
n <- 50
intercepts <- c(-1, 1)
beta <- c(1, 1, 0, 0, 0)
ncat <- length(intercepts) + 1  # number of response categories
p <- length(beta)  # number of covariates
x <- matrix(rnorm(n*p), ncol=p)  # n x p covariate matrix
eta <- c(x \%*\% beta) + matrix(intercepts, nrow=n, ncol=ncat-1, byrow=TRUE)
invlogit <- function(x) 1 / (1+exp(-x))
cumprob <- t(apply(eta, 1, invlogit))
prob <- cbind(cumprob, 1) - cbind(0, cumprob)
yint <- apply(prob, 1, function(p) sample(1:ncat, size=1, prob=p))
y <- as.factor(yint)

# Evaluate out-of-sample performance of the cumulative logit model
# when lambda is tuned by cross validation (best average out-of-sample log-likelihood)
cv <- ordinalNetCV(x, y, tuneMethod="cvLoglik")
cv$loglik
cv$misclass
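
# A further sketch (an illustration, not a recommended workflow): fix the outer
# test splits explicitly through the folds argument and tune lambda by AIC,
# which avoids the inner cross validation. The fold construction is one possible
# base R approach; the names foldId and cvAIC are purely illustrative.
foldId <- sample(rep(1:5, length.out=n))  # random fold label for each row of x
folds <- unname(split(seq_len(n), foldId))  # list of row-index vectors, one per fold
cvAIC <- ordinalNetCV(x, y, folds=folds, tuneMethod="aic")
mean(cvAIC$loglik)  # average out-of-sample log-likelihood across folds
mean(cvAIC$misclass)  # average out-of-sample misclassification rate across folds
cvAIC$lambdaVals[cvAIC$bestLambdaIndex]  # lambda value selected in each fold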
}

}
