% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/estimatr_lm_robust.R
\name{lm_robust}
\alias{lm_robust}
\title{Ordinary Least Squares with Robust Standard Errors}
\usage{
lm_robust(formula, data, weights, subset, clusters, se_type = NULL,
  ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE)
}
\arguments{
\item{formula}{an object of class formula, as in \code{\link{lm}}}

\item{data}{A \code{data.frame}}

\item{weights}{The bare (unquoted) name of the weights variable in the
supplied data.}

\item{subset}{An optional bare (unquoted) expression specifying a subset
of observations to be used.}

\item{clusters}{An optional bare (unquoted) name of the variable that
corresponds to the clusters in the data.}

\item{se_type}{The sort of standard error sought. If \code{clusters} is
not specified, the options are "HC0", "HC1" (or "stata", the equivalent),
"HC2" (default), "HC3", or "classical". If \code{clusters} is specified,
the options are "CR0", "CR2" (default), or "stata".}

\item{ci}{logical. Whether to compute and return p-values and confidence
intervals, TRUE by default.}

\item{alpha}{The significance level, 0.05 by default.}

\item{return_vcov}{logical. Whether to return the variance-covariance
matrix for later usage, TRUE by default.}

\item{try_cholesky}{logical. Whether to try using a Cholesky
decomposition to solve least squares instead of a QR decomposition,
FALSE by default. Using a Cholesky decomposition may result in speed gains,
but should only be used if users are sure their model is full-rank (i.e.,
there is no perfect multicollinearity).}
}
\value{
An object of class \code{"lm_robust"}.

The post-estimation functions \code{summary} and \code{\link{tidy}}
return results in a \code{data.frame}. To get useful data out of the
returned object, you can use these data frames, use the resulting list
directly, or use the generic accessor functions \code{coef}, \code{vcov},
\code{confint}, and \code{predict}. Marginal effects and their uncertainty
can be obtained by passing this object to
\code{\link[margins]{margins}} from the \pkg{margins} package.

Users who want to print the results in TeX or HTML can use the
\code{\link[texreg]{extract}} function from the \pkg{texreg} package.

An object of class \code{"lm_robust"} is a list containing at least the
following components:
  \item{coefficients}{the estimated coefficients}
  \item{se}{the estimated standard errors}
  \item{df}{the estimated degrees of freedom}
  \item{p}{the p-values from a two-sided t-test using \code{coefficients}, \code{se}, and \code{df}}
  \item{ci_lower}{the lower bound of the \code{100 * (1 - alpha)} percent confidence interval}
  \item{ci_upper}{the upper bound of the \code{100 * (1 - alpha)} percent confidence interval}
  \item{coefficient_name}{a character vector of coefficient names}
  \item{alpha}{the significance level specified by the user}
  \item{res_var}{the residual variance}
  \item{N}{the number of observations used}
  \item{k}{the number of columns in the design matrix (includes linearly dependent columns!)}
  \item{rank}{the rank of the fitted model}
  \item{vcov}{the fitted variance covariance matrix}
  \item{r.squared}{The \eqn{R^2},
  \deqn{R^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - y^*)^2},}{R^2 = 1 - Sum(e[i]^2) / Sum((y[i] - y^*)^2),}
  where \eqn{y^*} is the mean of \eqn{y_i}{y[i]} if there is an intercept
  and zero otherwise, and \eqn{e_i}{e[i]} is the \eqn{i}th residual.}
  \item{adj.r.squared}{The \eqn{R^2} penalized for the number of estimated parameters, \code{rank}}
  \item{weighted}{whether or not weights were applied}
  \item{call}{the original function call}
We also return \code{terms} and \code{contrasts}, used by \code{predict}.
}
\description{
This function fits a linear model, provides a variety of
options for robust standard errors, and conducts coefficient tests.
}
\details{
This function performs linear regression and provides a variety of standard
errors. It takes a formula and data much in the same way as \code{\link{lm}}
does, and all auxiliary variables, such as clusters and weights, can be
passed either as quoted names of columns, as bare column names, or
as a self-contained vector. Examples of usage can be seen below and in the
\href{http://estimatr.declaredesign.org/articles/getting-started.html}{Getting Started vignette}.

The technical notes in
\href{http://estimatr.declaredesign.org/articles/technical-notes.html}{this vignette}
specify the exact estimators used by this function.
The default variance estimators have been chosen largely in accordance with the
procedures in
\href{https://github.com/acoppock/Green-Lab-SOP/blob/master/Green_Lab_SOP.pdf}{this manual}.
The default for the case without clusters is the HC2 estimator, and the
default with clusters is the analogous CR2 estimator. Users can easily
replicate Stata standard errors in the clustered or non-clustered case by
setting \code{se_type = "stata"}.

The function estimates the coefficients and standard errors in C++, using
the \pkg{RcppEigen} package. By default, we estimate the coefficients
using a column-pivoting QR decomposition from the Eigen C++ library.
Users can instead set \code{try_cholesky = TRUE} to use a Cholesky
decomposition, which will likely be faster, but that algorithm does not
reliably detect linear dependencies in the model and may fail silently
if they exist.
}
\examples{
library(fabricatr)
dat <- fabricate(
  N = 40,
  y = rpois(N, lambda = 4),
  x = rnorm(N),
  z = rbinom(N, 1, prob = 0.4)
)

# Default variance estimator is HC2 robust standard errors
lmro <- lm_robust(y ~ x + z, data = dat)
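
# A base-R sketch of HC2, for intuition only (assumes the unweighted,
# unclustered case): each squared residual is rescaled by 1 / (1 - h_ii),
# where h_ii is the leverage of observation i
fit_ols <- lm(y ~ x + z, data = dat)
X <- model.matrix(fit_ols)
e <- resid(fit_ols)
h <- hatvalues(fit_ols)
bread <- solve(crossprod(X))                 # inverse of t(X) X
meat <- crossprod(X, X * (e^2 / (1 - h)))    # t(X) diag(e^2 / (1 - h)) X
se_hc2 <- sqrt(diag(bread %*% meat %*% bread))
# Should agree with lmro$se up to numerical error
all.equal(unname(se_hc2), unname(lmro$se))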

# Can tidy() the results into a data.frame
tidy(lmro)
# Can use summary() to get more statistics
summary(lmro)
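# The R^2 follows the formula documented under Value; for this unweighted
# model with an intercept it should match the one from summary.lm()
all.equal(unname(lmro$r.squared), summary(lm(y ~ x + z, data = dat))$r.squared)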
# Can also get coefficients three ways
lmro$coefficients
coef(lmro)
tidy(lmro)$coefficients
# Can also get confidence intervals from the object or, at a new level, with confint()
lmro$ci_lower
confint(lmro, level = 0.8)
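# A check on the stored bounds, assuming t-based intervals with the
# reported df: ci_lower should equal coefficients - qt(1 - alpha / 2, df) * se
t_crit <- qt(1 - lmro$alpha / 2, df = lmro$df)
all.equal(
  unname(lmro$coefficients - t_crit * lmro$se),
  unname(lmro$ci_lower)
)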

# Can recover classical standard errors
lmclassic <- lm_robust(y ~ x + z, data = dat, se_type = "classical")
tidy(lmclassic)
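# Classical standard errors should match those from lm() itself
se_lm <- sqrt(diag(vcov(lm(y ~ x + z, data = dat))))
all.equal(unname(se_lm), unname(lmclassic$se))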

# Can easily match Stata's robust standard errors
lmstata <- lm_robust(y ~ x + z, data = dat, se_type = "stata")
tidy(lmstata)
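# A base-R sketch of HC1, the estimator Stata reports for robust SEs,
# for intuition only: the HC0 sandwich scaled by N / (N - k)
fit_ols <- lm(y ~ x + z, data = dat)
X <- model.matrix(fit_ols)
e <- resid(fit_ols)
N <- nrow(X)
k <- ncol(X)
bread <- solve(crossprod(X))
meat_hc0 <- crossprod(X, X * e^2)            # t(X) diag(e^2) X
se_hc1 <- sqrt(diag(N / (N - k) * bread %*% meat_hc0 %*% bread))
all.equal(unname(se_hc1), unname(lmstata$se))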

# Easy to specify clusters for cluster-robust inference
dat$clusterID <- sample(1:10, size = 40, replace = TRUE)

lmclust <- lm_robust(y ~ x + z, data = dat, clusters = clusterID)
tidy(lmclust)

# Can also match Stata's clustered standard errors
lm_robust(
  y ~ x + z,
  data = dat,
  clusters = clusterID,
  se_type = "stata"
)
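
# A base-R sketch of the Stata clustered (CR1) correction, assuming the
# usual Stata scaling G (N - 1) / ((G - 1) (N - k)) with G clusters
fit_ols <- lm(y ~ x + z, data = dat)
X <- model.matrix(fit_ols)
e <- resid(fit_ols)
N <- nrow(X)
k <- ncol(X)
G <- length(unique(dat$clusterID))
scores <- rowsum(X * e, dat$clusterID)  # summed scores, one row per cluster
bread <- solve(crossprod(X))
meat_cl <- crossprod(scores)            # sum of outer products of cluster scores
adj <- G * (N - 1) / ((G - 1) * (N - k))
se_cr1 <- sqrt(diag(adj * bread %*% meat_cl %*% bread))
lmclust_stata <- lm_robust(y ~ x + z, data = dat, clusters = clusterID,
                           se_type = "stata")
all.equal(unname(se_cr1), unname(lmclust_stata$se))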

# Works just as lm() does with functions in the formula
dat$blockID <- rep(c("A", "B", "C", "D"), each = 10)

lm_robust(y ~ x + z + factor(blockID), data = dat)

# Weights are also easily specified
dat$w <- runif(40)

lm_robust(
  y ~ x + z,
  data = dat,
  weights = w,
  clusters = clusterID
)

# Subsetting works just as in `lm()`
lm_robust(y ~ x, data = dat, subset = z == 1)

# One can also choose to set the significance level for different CIs
lm_robust(y ~ x + z, data = dat, alpha = 0.1)
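
# try_cholesky = TRUE may be faster; on a full-rank design such as this
# one, the coefficients should match the default QR-based fit
lmchol <- lm_robust(y ~ x + z, data = dat, try_cholesky = TRUE)
all.equal(coef(lmchol), coef(lmro))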

\dontrun{
  # Can also use 'margins' package if you have it installed to get
  # marginal effects
  library(margins)
  lmrout <- lm_robust(y ~ x + z, data = dat)
  summary(margins(lmrout))

  # Can output results using 'texreg'
  library(texreg)
  texregobj <- extract(lmrout)
}

}
\references{
Abadie, Alberto, Susan Athey, Guido W Imbens, and Jeffrey Wooldridge. 2017. "When Should You Adjust Standard Errors for Clustering?" arXiv Pre-Print. \url{https://arxiv.org/abs/1710.02926v2}.

Bell, Robert M, and Daniel F McCaffrey. 2002. "Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples." Survey Methodology 28 (2): 169-82.

MacKinnon, James, and Halbert White. 1985. "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics 29 (3): 305-25. \url{https://doi.org/10.1016/0304-4076(85)90158-7}.

Pustejovsky, James E, and Elizabeth Tipton. 2016. "Small Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models." Journal of Business & Economic Statistics. Taylor & Francis. \url{https://doi.org/10.1080/07350015.2016.1247004}.

Samii, Cyrus, and Peter M Aronow. 2012. "On Equivalencies Between Design-Based and Regression-Based Variance Estimators for Randomized Experiments." Statistics and Probability Letters 82 (2). \url{https://doi.org/10.1016/j.spl.2011.10.024}.
}
