% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/emfx.R
\name{emfx}
\alias{emfx}
\title{Post-estimation treatment effects for an ETWFE regression.}
\usage{
emfx(
  object,
  type = c("simple", "group", "calendar", "event"),
  by_xvar = "auto",
  collapse = "auto",
  post_only = TRUE,
  ...
)
}
\arguments{
\item{object}{An `etwfe` model object.}

\item{type}{Character. The desired type of post-estimation aggregation. One
of "simple" (the default), "group", "calendar", or "event".}

\item{by_xvar}{Logical. Should the results account for heterogeneous
treatment effects? Only relevant if the preceding `etwfe` call included a
specified `xvar` argument, i.e. an interacted categorical covariate. The
default behaviour ("auto") is to automatically estimate heterogeneous
treatment effects for each level of `xvar` if these are detected as part
of the underlying `etwfe` model object. Users can override by setting to
either FALSE or TRUE. See the section on Heterogeneous treatment effects
below.}

\item{collapse}{Logical. Collapse the data by (period by cohort) groups
before calculating marginal effects? This trades off a loss in estimate
accuracy (typically around the 1st or 2nd significant decimal point) for a
substantial improvement in estimation time for large datasets. The default
behaviour ("auto") is to automatically collapse if the original dataset
has more than 500,000 rows. Users can override by setting to either FALSE or
TRUE. Note that collapsing by group is only valid if the preceding `etwfe`
call was run with "ivar = NULL" (the default). See the section on
Performance tips below.}

\item{post_only}{Logical. Only keep post-treatment effects. All
pre-treatment effects will be zero as a mechanical result of ETWFE's
estimation setup, so the default is to drop these nuisance rows from the
dataset. But you may want to keep them for presentation reasons (e.g.,
plotting an event-study); though be warned that this is strictly
performative. This argument will only be evaluated if `type = "event"`.}

\item{...}{Additional arguments passed to
[`marginaleffects::slopes`]. For example, you can pass `vcov =
FALSE` to dramatically speed up estimation times of the main marginal
effects (but at the cost of not getting any information about standard
errors; see Performance tips below). Another potentially useful
application is testing whether heterogeneous treatment effects (i.e. the
levels of any `xvar` covariate) are equal by invoking the `hypothesis`
argument, e.g. `hypothesis = "b1 = b2"`.}
}
\value{
A `slopes` object from the `marginaleffects` package.
}
\description{
Post-estimation treatment effects for an ETWFE regression.
}
\section{Performance tips}{
 

  Under most situations, `etwfe` should complete very quickly. For its part,
  `emfx` is quite performant too and should take a few seconds or less for 
  datasets under 100k rows. However, `emfx`'s computation time does tend to
  scale non-linearly with the size of the original data, as well as the
  number of interactions from the underlying `etwfe` model. Without getting
  too deep into the weeds, the numerical delta method used to recover the
  ATEs of interest has to estimate two prediction models for *each*
  coefficient in the model and then compute their standard errors. So, it's
  a potentially expensive operation that can push the computation time for
  large datasets (more than 1 million rows) up to several minutes or longer.
  
  Fortunately, there are two complementary strategies that you can use to
  speed things up. The first is to turn off the most expensive part of the
  whole procedure---standard error calculation---by calling `emfx(..., vcov
  = FALSE)`. Doing so should bring the estimation time back down to a few
  seconds or less, even for datasets in excess of a million rows. While the
  loss of standard errors might not be an acceptable trade-off for projects
  where statistical inference is critical, the good news is that this first
  strategy can still be combined with our second strategy. It turns out that
  collapsing the data by groups prior to estimating the marginal effects can
  yield substantial speed gains of its own. Users can do this by calling
  `emfx(..., collapse = TRUE)`. While the effect here is not as
  dramatic as the first strategy, our second strategy does have the virtue
  of retaining information about the standard errors. The trade-off this
  time, however, is that collapsing our data does lead to a loss in accuracy
  for our estimated parameters. On the other hand, testing suggests that
  this loss in accuracy tends to be relatively minor, with results
  equivalent up to the 1st or 2nd significant decimal place (or even
  better).
  
  Summarizing, here's a quick plan of attack for you to try if you are
  worried about the estimation time for large datasets and models:
  
  0. Estimate `mod = etwfe(...)` as per usual.
  
  1. Run `emfx(mod, vcov = FALSE, ...)`. 
  
  2. Run `emfx(mod, vcov = FALSE, collapse = TRUE, ...)`. 
  
  3. Compare the point estimates from steps 1 and 2. If they are similar
  enough to your satisfaction, get the approximate standard errors by
  running `emfx(mod, collapse = TRUE, ...)`.
}

\section{Heterogeneous treatment effects}{


  Specifying `etwfe(..., xvar = <xvar>)` will generate interaction effects
  for all levels of `<xvar>` as part of the main regression model. The
  reason that this is useful (as opposed to a regular, non-interacted
  covariate in the formula RHS) is that it allows us to estimate
  heterogeneous treatment effects as part of the larger ETWFE framework.
  Specifically, we can recover heterogeneous treatment effects for each
  level of `<xvar>` by passing the resulting `etwfe` model object on to 
  `emfx()`.
  
  For example, imagine that we have a categorical variable called "age" in
  our dataset, with two distinct levels "adult" and "child". Running
  `emfx(etwfe(..., xvar = age))` will tell us how the efficacy of treatment 
  varies across adults and children. We can then also leverage the in-built 
  hypothesis testing infrastructure of `marginaleffects` to test whether
  the treatment effect is statistically different across these two age
  groups; see Examples below. Note the same principles carry over to 
  categorical variables with more than two levels, or even continuous variables
  (although continuous variables are not as well supported yet).
}

\examples{
# We'll use the mpdta dataset from the did package (which you'll need to
# install separately).

# install.packages("did")
data("mpdta", package = "did")

#
# Basic example
#

# The basic ETWFE workflow involves two steps:

# 1) Estimate the main regression model with etwfe().

mod = etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none)
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered by county)
    )
mod

# 2) Recover the treatment effects of interest with emfx().

emfx(mod)                 # simple average treatment effect (default)
emfx(mod, type = "event") # dynamic treatment effect a la an event study
# Etc. Other aggregation types are "group" and "calendar"
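
# A sketch of the remaining options documented above, reusing the same
# model. The descriptions of each aggregation are our reading of the type
# names, not output copied from a run.

emfx(mod, type = "group")    # effects aggregated by treatment cohort
emfx(mod, type = "calendar") # effects aggregated by calendar period

# Keep the pre-treatment rows of the event study, e.g. for plotting. These
# are zero by construction, so they are for presentation purposes only.

emfx(mod, type = "event", post_only = FALSE)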


#
# Heterogeneous treatment effects
#

# Example where we estimate heterogeneous treatment effects for counties 
# within the 8 US Great Lakes states (versus all other counties).

gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27,
        "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55)

mpdta$gls = substr(mpdta$countyreal, 1, 2) \%in\% gls

hmod = etwfe(
   lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, 
   vcov = ~countyreal,
   xvar = gls           ## <= het. TEs by gls
   )

# Heterogeneous ATEs (could also specify "event", etc.) 

emfx(hmod)

# To test whether the ATEs across these two groups (non-GLS vs GLS) are 
# statistically different, simply pass an appropriate "hypothesis" argument.

emfx(hmod, hypothesis = "b1 = b2")
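
# If you only want the overall ATE from this model, the "auto" detection of
# xvar can be overridden (see the by_xvar argument above). This is a sketch
# of the override, assuming the hmod object estimated earlier.

emfx(hmod, by_xvar = FALSE)

# The heterogeneous effects can also be combined with other aggregation
# types, e.g. an event study for each level of gls.

emfx(hmod, type = "event")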


#
# Nonlinear model (distribution / link) families
#

# Poisson example

mpdta$emp = exp(mpdta$lemp)

etwfe(
   emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, 
   vcov = ~countyreal,
   family = "poisson"   ## <= family arg for nonlinear options
   ) |>
   emfx("event")
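

#
# Performance tips
#

# A sketch of the plan of attack from the Performance tips section, reusing
# the basic model from above. The mpdta dataset is small, so this only
# illustrates the calls; any speed gains would show up on much larger
# datasets.

# 1) Point estimates only, skipping the standard errors.

emfx(mod, vcov = FALSE)

# 2) Same again, but collapsing the data first (valid here because the
# etwfe call used the default ivar = NULL).

emfx(mod, vcov = FALSE, collapse = TRUE)

# 3) If the point estimates from 1) and 2) are close enough for your
# purposes, get approximate standard errors from the collapsed data.

emfx(mod, collapse = TRUE)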

}
\seealso{
[marginaleffects::slopes()]
}
