% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/generate_constraints.R
\name{generate_constraints}
\alias{generate_constraints}
\title{Generate constraints to encourage covariate balance}
\usage{
generate_constraints(
  balance_formulas,
  z,
  data,
  default_rhs = NULL,
  weight_by_size = 0,
  denom_variance = "treated",
  autogen_missing = 50
)
}
\arguments{
\item{balance_formulas}{a list of formulas where the left hand side represents
the covariate to be balanced, and the terms on the right hand side represent
the groups within which the covariate should be balanced. More information can
be found in the details below.}

\item{z}{a treatment indicator vector with \code{i}th entry equal to 0 if
unit \code{i} is a control and equal to 1 if unit \code{i} is treated.}

\item{data}{a data frame containing the relevant covariates in the columns. The number
of rows should equal the length of \code{z}.}

\item{default_rhs}{the right hand side to use for entries of \code{balance_formulas}
that are character strings naming a covariate rather than formulas. Each such
covariate will be balanced according to \code{default_rhs}.}

\item{weight_by_size}{numeric between 0 and 1 stating how to adjust constraints
for the size of the groups they represent. Default is 0, meaning imbalance
within groups is viewed in absolute terms, not relative to the group size.
The program may thus prioritize
balancing the covariate in larger groups compared to smaller groups. A value
of 1 means that imbalance will be measured relative to the group's size, not
in absolute terms, implying that it is equally important to balance in every group.}

\item{denom_variance}{character stating what variance to use in the standardization:
either the default \code{"treated"}, meaning the standardization will use the
treated variance (across all strata), or \code{"pooled"}, meaning
the standardization will use the average of the treated and control variances.}

\item{autogen_missing}{whether to automatically generate missingness constraints
and how heavily to prioritize them. Should be a numeric
or \code{NULL}. \code{NULL} indicates that
constraints to balance the rate of missingness (denoted by \code{NA}s
in \code{data}) should not be automatically generated; this is not
recommended unless the user has already accounted for missing values.
Otherwise, \code{autogen_missing} should be a numeric stating how heavily
to prioritize generated missingness constraints over covariate constraints.
The default is 50.}
}
\value{
A list with two named components:
\describe{
  \item{\code{X}}{a matrix with constraints as columns and the same number of rows as the inputs.
  The column names describe the constraints, including the covariate
  name and the factor and level to which each constraint pertains.}
  \item{\code{importances}}{a named vector with names corresponding to the constraint names
  and values corresponding to how heavily that constraint should be prioritized,
  based on the information provided through \code{balance_formulas}, \code{weight_by_size},
  and \code{autogen_missing}.}
  }
}
\description{
This function generates constraints that encourage covariate balance as specified.
The main inputs are formula-like objects, where the left hand side indicates
the covariate to be balanced and the right hand side indicates the
groups within which to balance. The constraints are
weighted and standardized by \code{\link{stand}()} to be used in \code{\link{optimize_controls}()}. Missingness
indicators can also be added and weighted for any covariate that has \code{NA} values.
}
\section{Details}{

  The \code{balance_formulas} argument accepts formula-like entries beyond what
  \code{R} itself treats as a \code{formula}. They are also interpreted differently
  from standard model formulas, as explained below:

\describe{
\item{Left hand side}{The left hand side of the formula contains the covariate
  to be balanced. It can also be the sum of multiple covariates, in which case
  each term will be balanced individually according to the right hand side. Additionally,
  '.' on the left hand side will designate that all covariates in \code{data}
  should be balanced according to the designated or default right hand side
  (as usual, terms may be subtracted to remove them).}
\item{Right hand side}{The right hand side should be the sum of factor, character,
  or boolean variables. The covariate of the left hand side will be balanced within
  each level of each term on the right hand side. The right hand side can also
  contain '.', meaning the covariate will be balanced across all levels of all
  categorical, character, or boolean variables found in \code{data} (as usual,
  terms may be subtracted to remove them). In the most common case, the user
  will have one term on the right hand side consisting of the strata within
  which balance is desired.}
\item{Coefficients}{The formulas can contain coefficients specifying how much
  to weight a certain set of constraints. Coefficients of the left hand side terms will
  weight all constraints generated for that covariate, and coefficients of the
  right hand side will weight the constraints generated for each level of that
  term.}
\item{Intercept}{The intercept term, 1, is automatically included on the right
  hand side of the formula, and designates that the covariate of the left hand side
  will be balanced across all control units. You may enter a different numeric > 0
  that will signify how much to weight the constraint, or you may enter "- 1" or "+ 0"
  to remove the intercept and its associated constraint, as per usual.}}
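
  As an illustrative sketch of the rules above, a \code{balance_formulas} list
  might combine these forms as follows. The covariate names \code{age},
  \code{race}, and \code{bmi} and the grouping variable \code{strata} are
  placeholders chosen for illustration, and the comments give the reading
  implied by the descriptions above rather than a guarantee of the exact
  constraints generated:

\preformatted{
list(
  # age balanced overall (implicit intercept) and within each level of strata
  age ~ strata,
  # constraints for race weighted by 2; intercept removed, so within strata only
  2 * race ~ 0 + strata,
  # every covariate in data except bmi; within-strata constraints weighted by 3
  . - bmi ~ 1 + 3 * strata,
  # character entry: balanced according to default_rhs
  'bmi'
)
}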
}

\examples{
data('nh0506')

# Create strata
age_cat <- cut(nh0506$age,
               breaks = c(19, 39, 50, 85),
               labels = c('< 40 years', '40 - 50 years', '> 50 years'))
strata <- age_cat : nh0506$sex

# Balance age, race, education, poverty ratio, and bmi both across and within the levels of strata
constraints <- generate_constraints(
                 balance_formulas = list(age + race + education + povertyr + bmi ~ 1 + strata),
                 z = nh0506$z,
                 data = nh0506)

# Balance age and race both across and within the levels of strata,
# with balance for race being prioritized twice as much as for age,
# and balance across strata prioritized twice as much as within.
# Balance education across and within strata,
# with balance within strata prioritized twice as much as across.
# Balance poverty ratio and bmi only within the levels of strata,
# as specified in the default_rhs argument
constraints <- generate_constraints(
                 balance_formulas = list(age + 2 * race ~ 2 + strata,
                                         education ~ 1 + 2 * strata,
                                         'povertyr',
                                         'bmi'),
                 z = nh0506$z,
                 data = nh0506,
                 default_rhs = '0 + strata')
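
# A further sketch (not part of the original examples), assuming the nh0506
# data and the strata factor created above. It uses only the documented
# arguments to measure imbalance relative to group size, standardize by the
# pooled variance, and weight generated missingness constraints by 100
# (an arbitrary illustrative value).
constraints <- generate_constraints(
                 balance_formulas = list(age + race ~ 1 + strata),
                 z = nh0506$z,
                 data = nh0506,
                 weight_by_size = 1,
                 denom_variance = 'pooled',
                 autogen_missing = 100)

# Inspect the two documented components of the returned list
dim(constraints$X)
head(constraints$importances)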

}
