% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stepwise-selection.R
\name{stepwise_model_selection}
\alias{stepwise_model_selection}
\title{Select and fit a model using stepwise regression}
\usage{
stepwise_model_selection(
  survey_design,
  outcome_variable,
  predictor_variables,
  model_type = "binary-logistic",
  max_iterations = 100L,
  alpha_enter = 0.5,
  alpha_remain = 0.5
)
}
\arguments{
\item{survey_design}{A survey design object created with the \code{survey} package.}

\item{outcome_variable}{The name of an outcome variable to use as the dependent variable.}

\item{predictor_variables}{A character vector (or list) of names of variables to consider as predictors for the model.}

\item{model_type}{A character string describing the type of model to fit.
\code{'binary-logistic'} for a binary logistic regression,
\code{'ordinal-logistic'} for an ordinal logistic regression (cumulative proportional-odds),
\code{'normal'} for the typical model which assumes residuals follow a Normal distribution.}

\item{max_iterations}{Maximum number of iterations to try adding new variables to the model.}

\item{alpha_enter}{The maximum p-value allowed for a variable to be added to the model.
Large values such as 0.5 or greater are recommended to reduce the bias
of estimates from the selected model.}

\item{alpha_remain}{The maximum p-value allowed for a variable to remain in the model.
Large values such as 0.5 or greater are recommended to reduce the bias
of estimates from the selected model.}
}
\value{
An object of class \code{\link[survey]{svyglm}} representing
a regression model fit using the 'survey' package.
}
\description{
A regression model is selected by iteratively adding and removing variables based on the p-value from a
likelihood ratio test. At each stage, a single variable is added to the model if
the p-value of the likelihood ratio test from adding the variable is below \code{alpha_enter}
and its p-value is less than that of all other variables not already in the model.
Next, of the variables already in the model, the variable with the largest p-value
is dropped if its p-value is greater than \code{alpha_remain}. This iterative process
continues until the maximum number of iterations is reached,
all variables have been added to the model, or no variable outside the model
has a likelihood ratio test p-value below \code{alpha_enter}. \cr

Stepwise model selection generally invalidates inferential statistics
such as p-values, standard errors, or confidence intervals and leads to
overestimation of the size of coefficients for variables included in the selected model.
This bias increases as the value of \code{alpha_enter} or \code{alpha_remain} decreases.
The use of stepwise model selection should be limited to
reducing a large list of candidate variables for nonresponse adjustment.
}
\details{
See Lumley and Scott (2017) for details of how regression models are fit to survey data.
For overall tests of variables, a Rao-Scott Likelihood Ratio Test is conducted
(see section 4 of Lumley and Scott (2017) for statistical details)
using the function \code{regTermTest(method = "LRT", lrt.approximation = "saddlepoint")}
from the 'survey' package.

See Sauerbrei et al. (2020) for a discussion of statistical issues with using stepwise model selection.
}
\examples{
library(survey)

# Load example data and prepare it for analysis
data(involvement_survey_str2s, package = 'nrba')

involvement_survey <- svydesign(
  data = involvement_survey_str2s,
  ids = ~ SCHOOL_ID + UNIQUE_ID,
  fpc = ~ N_SCHOOLS_IN_DISTRICT + N_STUDENTS_IN_SCHOOL,
  strata = ~ SCHOOL_DISTRICT,
  weights = ~ BASE_WEIGHT
)

involvement_survey <- involvement_survey |>
  transform(WHETHER_PARENT_AGREES = factor(WHETHER_PARENT_AGREES))

# Fit a regression model using stepwise selection
selected_model <- stepwise_model_selection(
  survey_design = involvement_survey,
  outcome_variable = "WHETHER_PARENT_AGREES",
  predictor_variables = c("STUDENT_RACE", "STUDENT_DISABILITY_CATEGORY"),
  model_type = "binary-logistic",
  max_iterations = 100,
  alpha_enter = 0.5,
  alpha_remain = 0.5
)
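
# A minimal sketch of inspecting the result, assuming (per the Value section)
# that the returned object is an svyglm fit, so the usual formula() and
# summary() methods apply. As noted in the Description, p-values reported
# after stepwise selection should not be used for inference.
formula(selected_model)
summary(selected_model)

# Illustration only, not necessarily the internal code of this function:
# the Rao-Scott likelihood ratio test named in Details, applied to one
# candidate variable. The object illustrative_model and the choice of
# quasibinomial() as the family are assumptions made for this sketch.
illustrative_model <- svyglm(
  formula = WHETHER_PARENT_AGREES ~ STUDENT_RACE + STUDENT_DISABILITY_CATEGORY,
  design = involvement_survey,
  family = quasibinomial()
)

regTermTest(
  illustrative_model,
  test.terms = ~ STUDENT_RACE,
  method = "LRT",
  lrt.approximation = "saddlepoint"
)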
}
\references{
\itemize{
\item Lumley, T., & Scott, A. (2017). Fitting Regression Models to Survey Data. Statistical Science, 32(2), 265-278. https://doi.org/10.1214/16-STS605
\item Sauerbrei, W., Perperoglou, A., Schmid, M. et al. (2020). State of the art in selection of variables and functional forms in multivariable analysis - outstanding issues. Diagnostic and Prognostic Research, 4, 3. https://doi.org/10.1186/s41512-020-00074-3
}
}
