% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pred_validate.R
\name{pred_validate}
\alias{pred_validate}
\title{Validate an existing prediction model}
\usage{
pred_validate(
  x,
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL,
  time_horizon = NULL,
  cal_plot = TRUE,
  ...
)
}
\arguments{
\item{x}{an object of class "\code{predinfo}" produced by calling
\code{\link{pred_input_info}}.}

\item{new_data}{data.frame upon which the prediction model should be
evaluated.}

\item{binary_outcome}{Character variable giving the name of the column in
\code{new_data} that represents the observed outcomes. Only relevant for
\code{x$model_type}="logistic"; leave as \code{NULL} otherwise.}

\item{survival_time}{Character variable giving the name of the column in
\code{new_data} that represents the observed survival times. Only relevant
for \code{x$model_type}="survival"; leave as \code{NULL} otherwise.}

\item{event_indicator}{Character variable giving the name of the column in
\code{new_data} that represents the observed survival indicator (1 for
event, 0 for censoring). Only relevant for \code{x$model_type}="survival";
leave as \code{NULL} otherwise.}

\item{time_horizon}{for survival models, an integer giving the time horizon
(post baseline) at which a prediction is required. Currently, this must
match a time in \code{x$cum_hazard}.}

\item{cal_plot}{logical indicating whether a flexible calibration plot
should be produced (\code{TRUE}) or not (\code{FALSE}).}

\item{...}{further plotting arguments for the calibration plot. See Details
below.}
}
\value{
\code{\link{pred_validate}} returns an object of class
"\code{predvalidate}", with child classes per \code{model_type}. This is a
list of performance measures (calibration, discrimination and overall
accuracy), estimated by applying the existing prediction model to
\code{new_data}. For logistic regression models, this will include
calibration intercept, calibration slope, area under the ROC curve,
R-squared, and Brier score. For survival models, this will include
observed:expected ratio (if \code{cum_hazard} is provided to \code{x}),
calibration slope, and Harrell's C-statistic. Optionally, a flexible
calibration plot is also produced, along with a histogram of the predicted
risk distribution.
}
\description{
Validate an existing prediction model by calculating its predictive
performance in a new (validation) dataset.
}
\details{
This function takes an existing prediction model formatted according
to \code{\link{pred_input_info}}, and calculates measures of predictive
performance on new data (e.g., within an external validation study). The
information about the existing prediction model should first be inputted by
calling \code{\link{pred_input_info}}, before passing the resulting object
to \code{pred_validate}.

\code{new_data} should be a data.frame, where each row is an observation
(e.g. patient) and each column is a variable. The columns need to include
(as a minimum) all of the
predictor variables that are included in the existing prediction model
(i.e., each of the variable names supplied to
\code{\link{pred_input_info}}, through the \code{model_info} parameter,
must match the name of a variable in \code{new_data}).

Any factor variables within \code{new_data} must be converted to dummy
(0/1) variables before calling this function. \code{\link{dummy_vars}} can
help with this. See \code{\link{pred_predict}} for examples.

\code{binary_outcome}, \code{survival_time} and \code{event_indicator} are
used to specify the outcome variable(s) within \code{new_data} (use
\code{binary_outcome} if \code{x$model_type} = "logistic", or use
\code{survival_time} and \code{event_indicator} if \code{x$model_type} =
"survival").

In the case of validating a logistic regression model, this function
assesses the predictive performance of the predicted risks against an
observed binary outcome. Various metrics of calibration (agreement between
the observed risk and the predicted risks, across the full risk range) and
discrimination (ability of the model to distinguish between those who
develop the outcome and those who do not) are calculated. For calibration,
calibration-in-the-large (CITL) and the calibration slope are estimated. CITL
is estimated by fitting a logistic regression model to the observed binary
outcomes, with the linear predictor of the model as an offset. For
calibration slope, a logistic regression model is fit to the observed
binary outcome with the linear predictor from the model as the only
covariate. For discrimination, the function estimates the area under the
receiver operating characteristic curve (AUC). Other metrics are also
calculated to assess overall accuracy (Brier score, Cox-Snell R-squared).

In the case of validating a survival prediction model, this function
assesses the predictive performance of the linear predictor and
(optionally) the predicted event probabilities at a fixed time horizon
against an observed time-to-event outcome. Various metrics of calibration
and discrimination are calculated. For calibration, the
observed-to-expected ratio at the specified \code{time_horizon} (if
predicted risks are available through specification of \code{x$cum_hazard})
and calibration slope are produced. For discrimination, Harrell's
C-statistic is calculated.

For both model types, a flexible calibration plot can be produced (for
survival models, this requires the cumulative baseline hazard to be
available in the \code{predinfo} object, \code{x$cum_hazard}). Use
\code{cal_plot} to indicate whether the calibration plot should be produced
(\code{TRUE}) or not (\code{FALSE}). The calibration plot is produced by
regressing the observed outcomes against a cubic spline of the logit of
predicted risks (for a logistic model) or the complementary log-log of the
predicted risks (for a survival model). A histogram of the predicted risk
distribution is displayed on the top x-axis. The plot can be modified by
passing \code{xlab}, \code{ylab}, \code{xlim}, and \code{ylim} through
\code{...}.
}
\examples{
#Example 1 - multiple existing models, with outcome specified; uses
#            an example dataset within the package
model1 <- pred_input_info(model_type = "logistic",
                          model_info = SYNPM$Existing_logistic_models)
pred_validate(x = model1,
              new_data = SYNPM$ValidationData,
              binary_outcome = "Y",
              cal_plot = FALSE)
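
#Illustrative sketch (base R only; not the package's internal code) of how
#calibration-in-the-large (CITL) and the calibration slope can be estimated
#as described in Details. The simulated data and the names lp_sim, y_sim,
#citl_fit and slope_fit are assumptions made purely for illustration
set.seed(1234)
lp_sim <- rnorm(500, mean = -1, sd = 1)  #linear predictor of an existing model
y_sim <- rbinom(500, size = 1, prob = plogis(lp_sim))  #simulated outcomes
#CITL: intercept-only logistic model with the linear predictor as an offset
citl_fit <- glm(y_sim ~ offset(lp_sim), family = binomial(link = "logit"))
#Calibration slope: logistic model with the linear predictor as only covariate
slope_fit <- glm(y_sim ~ lp_sim, family = binomial(link = "logit"))
#Outcomes were simulated from lp_sim, so these estimates should lie close to
#0 (CITL) and 1 (slope), up to sampling variability
coef(citl_fit)[["(Intercept)"]]
coef(slope_fit)[["lp_sim"]]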

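#Illustrative sketch (base R only; not the package's internal code) of two of
#the measures described in Details: the Brier score and the AUC. The
#simulated data and the names p_demo, y_demo, n1 and n0 are assumptions made
#purely for illustration
set.seed(4321)
p_demo <- plogis(rnorm(500, mean = -1, sd = 1))  #predicted risks
y_demo <- rbinom(500, size = 1, prob = p_demo)   #simulated observed outcomes
#Brier score: mean squared difference between outcomes and predicted risks
mean((y_demo - p_demo)^2)
#AUC via the rank-sum (Mann-Whitney) formulation
n1 <- sum(y_demo == 1)
n0 <- sum(y_demo == 0)
(sum(rank(p_demo)[y_demo == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)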
}
\seealso{
\code{\link{pred_input_info}}
}
