
<!-- README.md is generated from README.Rmd. Please edit that file -->

# fixes <a><img src="man/figures/logo.png" align="right" height="138" /></a>

<!-- badges: start -->

[![R-CMD-check](https://github.com/yo5uke/fixes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/yo5uke/fixes/actions/workflows/R-CMD-check.yaml)
[![CRAN
status](https://www.r-pkg.org/badges/version/fixes)](https://CRAN.R-project.org/package=fixes)
<!-- badges: end -->

## Overview

> **Note**  
> By default, the `fixes` package assumes time is a regularly spaced
> numeric variable (e.g., year = 1995, 1996, …).  
> However, if your time variable is irregular or non-numeric (e.g.,
> `Date` type), you can enable `time_transform = TRUE` to automatically
> convert it to a sequential index within each unit.  
> You can also specify unit-specific treatment timing by setting
> `staggered = TRUE`.

The `fixes` package estimates and plots event studies, a method commonly
used to assess the parallel trends assumption in two-way fixed effects
(TWFE) difference-in-differences (DID) analysis.

The package includes two main functions:

1.  `run_es()`: Accepts a data frame, generates lead and lag variables,
    and performs event study analysis. The function returns the results
    as a tidy data frame. Supports options for fixed effects,
    covariates, clustered standard errors, and staggered treatment
    timing.
2.  `plot_es()`: Creates plots using `ggplot2` based on the data frame
    generated by `run_es()`. Users can choose between a plot with
    `geom_ribbon()` or `geom_errorbar()` to visualize the results.

## Installation

You can install the package like so:

``` r
# install.packages("pak")
pak::pak("fixes")
```

or

``` r
install.packages("fixes")
```

To install the development version, install from the GitHub repository:

``` r
pak::pak("yo5uke/fixes")
```

## How to use

First, load the library.

``` r
library(fixes)
```

### Data frame

The `run_es()` function is designed to work with panel data.  
The data frame must include the following variables:

1.  A unit identifier (e.g., individual, firm, region)
2.  A treatment indicator variable (0/1 or TRUE/FALSE)
3.  A time variable (numeric or `Date`)
4.  An outcome variable (continuous)

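Before calling `run_es()`, it can help to confirm that a data frame has
this shape. The following is a minimal base R sketch: `check_panel()` is
a hypothetical helper written for illustration (it is not part of
`fixes`), and the toy data frame contains made-up values.

``` r
# Hypothetical helper (not part of fixes): basic sanity checks for panel data
check_panel <- function(data, unit, time, treatment) {
  stopifnot(all(c(unit, time, treatment) %in% names(data)))
  # Each unit-time pair should appear at most once
  unique_rows <- !any(duplicated(data[, c(unit, time)]))
  # Treatment should be coded 0/1; logical TRUE/FALSE also matches 1/0
  binary_treat <- all(data[[treatment]] %in% c(0, 1))
  c(unique_unit_time = unique_rows, binary_treatment = binary_treat)
}

# Toy panel: 2 units observed over 3 periods (made-up values)
toy <- data.frame(
  id     = rep(1:2, each = 3),
  period = rep(1:3, times = 2),
  treat  = rep(c(1, 0), each = 3),
  y      = c(0.5, 0.7, 1.9, 0.4, 0.6, 0.8)
)

# Both checks should be TRUE for a well-formed panel
check_panel(toy, unit = "id", time = "period", treatment = "treat")
```
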
In addition, if you use `staggered = TRUE`, you must provide a variable
that indicates **unit-specific treatment timing** (e.g., the year
treatment started for each unit).

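To make unit-specific timing concrete, the sketch below computes time
relative to treatment for a toy staggered panel in base R. The data and
the `rel_time` column are illustrative assumptions, and this is not a
description of how `run_es()` works internally.

``` r
# Toy staggered panel (made-up values): unit 1 is treated in 2003, unit 2 in 2005
toy_stagg <- data.frame(
  id           = rep(1:2, each = 6),
  year         = rep(2001:2006, times = 2),
  year_treated = rep(c(2003, 2005), each = 6)
)

# Relative time: 0 in each unit's own treatment year, negative before, positive after
toy_stagg$rel_time <- toy_stagg$year - toy_stagg$year_treated

# The same calendar year maps to different relative times across units
with(toy_stagg, table(id, rel_time))
```
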
------------------------------------------------------------------------

To get started, you can use example datasets from the `fixest` package:

- `fixest::base_did`: A simulated panel dataset for a basic
  difference-in-differences setting with a single treatment date.
- `fixest::base_stagg`: A built-in dataset designed for analyzing
  staggered adoption of treatment.

These datasets already contain the necessary structure and can be used
directly with `run_es()`.

``` r
# Load example data
df1 <- fixest::base_did      # Basic DID example
df2 <- fixest::base_stagg    # Staggered treatment example
```

|          y |         x1 |  id | period | post | treat |
|-----------:|-----------:|----:|-------:|-----:|------:|
|  2.8753063 |  0.5365377 |   1 |      1 |    0 |     1 |
|  1.8606527 | -3.0431894 |   1 |      2 |    0 |     1 |
|  0.0941652 |  5.5768439 |   1 |      3 |    0 |     1 |
|  3.7814749 | -2.8300587 |   1 |      4 |    0 |     1 |
| -2.5581996 | -5.0443544 |   1 |      5 |    0 |     1 |
|  1.7287324 | -0.6363849 |   1 |      6 |    1 |     1 |

|  | id | year | year_treated | time_to_treatment | treated | treatment_effect_true | x1 | y |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 90 | 1 | 2 | -1 | 1 | 0 | -1.0947021 | 0.0172297 |
| 3 | 89 | 1 | 3 | -2 | 1 | 0 | -3.7100676 | -4.5808453 |
| 4 | 88 | 1 | 4 | -3 | 1 | 0 | 2.5274402 | 2.7381717 |
| 5 | 87 | 1 | 5 | -4 | 1 | 0 | -0.7204263 | -0.6510307 |
| 6 | 86 | 1 | 6 | -5 | 1 | 0 | -3.6711678 | -5.3338166 |
| 7 | 85 | 1 | 7 | -6 | 1 | 0 | -0.3152137 | 0.4956263 |

### `run_es()`

`run_es()` takes 16 arguments, including required variables and optional
specifications like fixed effects, clustering, covariates, staggered
treatment timing, and weights.

| Argument | Description |
|----|----|
| `data` | Data frame to be used. |
| `outcome` | Outcome variable. Can be specified as a raw variable or a transformation (e.g., `log(y)`). Provide it unquoted. |
| `treatment` | Dummy variable indicating the treated units. Provide it unquoted. Accepts both `0/1` and `TRUE/FALSE`. |
| `time` | Time variable. Provide it unquoted. |
| `staggered` | Logical. If `TRUE`, allows for unit-specific treatment timing (staggered adoption). Default is `FALSE`. |
| `timing` | The time at which the treatment occurs. If `staggered = FALSE`, this should be a scalar (e.g., `2005`). If `staggered = TRUE`, provide a variable (column) indicating the treatment time for each unit. |
| `lead_range` | Number of pre-treatment periods to include (e.g., 3 = `lead3`, `lead2`, `lead1`). Default is `NULL`, which automatically uses the maximum available lead range. Set to a number to restrict the range manually. |
| `lag_range` | Number of post-treatment periods to include (e.g., 2 = `lag0` (the treatment period), `lag1`, `lag2`). Default is `NULL`, which automatically uses the maximum available lag range. Set to a number to restrict the range manually. |
| `covariates` | Additional covariates to include in the regression. **Must be a one-sided formula** (e.g., `~ x1 + x2`). |
| `fe` | Fixed effects to control for unobserved heterogeneity. **Must be a one-sided formula** (e.g., `~ id + year`). |
| `cluster` | Specifies clustering for standard errors. Can be a **character vector** (e.g., `c("id", "year")`) or a **formula** (e.g., `~ id + year`, `~ id^year`). |
| `weights` | Optional weights to be used in the regression. Provide as a one-sided formula (e.g., `~ weight`). |
| `baseline` | Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. **Must be within the specified lead/lag range.** |
| `interval` | Time interval between observations (e.g., `1` for yearly data, `5` for 5-year intervals). |
| `time_transform` | Logical. If `TRUE`, converts the `time` variable into a sequential index (1, 2, 3, …) within each unit. Useful when time is irregular, such as with `Date` values or unbalanced panels (e.g., missing years or monthly observations). Default is `FALSE`. |
| `unit` | Required if `time_transform = TRUE`. Specifies the panel unit identifier (e.g., `firm_id`). |

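As a rough illustration of how `timing`, `interval`, `lead_range`,
`lag_range`, and `baseline` relate to one another, the base R sketch
below builds relative-time dummies by hand for a toy series observed
every 5 years. The dummy names (`lead2`, `lag0`, ...) follow the naming
used in the table above; the data and the construction itself are
assumptions for illustration and may differ from what `run_es()` does
internally. In an actual event study, these dummies would also be
interacted with the treatment indicator.

``` r
# Toy data: periods observed every 5 years, treatment occurring in 2005 (made-up)
time     <- c(1995, 2000, 2005, 2010, 2015)
timing   <- 2005
interval <- 5

# Relative time measured in units of `interval`: -2, -1, 0, 1, 2
rel_time <- (time - timing) / interval

lead_range <- 2   # keep lead2, lead1
lag_range  <- 2   # keep lag0, lag1, lag2
baseline   <- -1  # reference period, omitted from the regression

# One 0/1 column per relative period inside the window
periods <- seq(-lead_range, lag_range)
dummies <- sapply(periods, function(k) as.integer(rel_time == k))
colnames(dummies) <- ifelse(periods < 0,
                            paste0("lead", -periods),
                            paste0("lag", periods))

# Drop the baseline column so the remaining coefficients are relative to it
dummies <- dummies[, periods != baseline, drop = FALSE]
colnames(dummies)  # lead2, lag0, lag1, lag2 (lead1 is the omitted baseline)
```
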
------------------------------------------------------------------------

#### Example: Without Covariates

``` r
event_study <- run_es(
  data       = df1, 
  outcome    = y, 
  treatment  = treat, 
  time       = period, 
  timing     = 6, 
  lead_range = 5, 
  lag_range  = 4, 
  fe         = ~ id + period, 
  cluster    = ~ id, 
  baseline   = -1, 
  interval   = 1
)
```

***Note:*** The `fe` argument must be specified as a one-sided formula
(e.g., `~ firm_id + year`).  
The `cluster` argument can be specified either as a one-sided formula
(e.g., `~ state_id`) or as a character vector (e.g.,
`c("firm_id", "year")`).

The `run_es()` function returns a tidy data frame that includes
estimated event-study coefficients, confidence intervals, relative
timing values, and an indicator for the omitted baseline period.  
Estimation is performed using fast and flexible fixed effects
regression.

#### Example: With Covariates

If your dataset includes additional covariates, you can include them in
the regression by specifying a one-sided formula using the `covariates`
argument, as shown below.

``` r
event_study <- run_es(
  data       = df1, 
  outcome    = y, 
  treatment  = treat, 
  time       = period, 
  timing     = 6, 
  lead_range = 5, 
  lag_range  = 4, 
  covariates = ~ x1,  # x1 is the covariate available in fixest::base_did
  fe         = ~ id + period, 
  cluster    = ~ id, 
  baseline   = -1, 
  interval   = 1
)
```

``` r
# Example using Date-type time variable and time_transform
df_alt <- df1 |>
  dplyr::mutate(
    year = rep(2001:2010, times = 108),  # 108 units × 10 periods
    date = as.Date(paste0(year, "-01-01"))
  )

event_study_alt <- run_es(
  data           = df_alt,
  outcome        = y,
  treatment      = treat,
  time           = date,
  timing         = 6,  # 6th time point within each unit (treatment starts in period 6)
  lead_range     = 3,
  lag_range      = 3,
  fe             = ~ id + period,
  cluster        = ~ id,
  baseline       = -1,
  time_transform = TRUE,
  unit           = id
)
```

> **Note:**  
> When `time_transform = TRUE`, the `timing` argument must be specified
> using the transformed index (e.g., `timing = 19` for the 19th time
> point within each unit).  
> Support for specifying the original time values (e.g., a specific
> `Date`) directly as `timing` is planned for a future update.  
> Currently, `time_transform = TRUE` cannot be combined with
> `staggered = TRUE`. This combination is not yet supported, but may be
> implemented in a future release.

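Because `timing` currently has to be given on the transformed scale, it
may be convenient to work out which index a given date maps to. The
sketch below does this in base R for a toy panel with irregularly spaced
dates. It mirrors the description of `time_transform` (a sequential
index within each unit) but is an independent illustration, not the
package's own code; the data and the `time_index` column are
hypothetical.

``` r
# Toy panel with irregularly spaced dates (made-up values), same dates for every unit
toy_dates <- data.frame(
  id   = rep(1:2, each = 4),
  date = rep(as.Date(c("2020-01-15", "2020-03-01", "2020-07-20", "2021-02-10")),
             times = 2)
)

# Sequential index (1, 2, 3, ...) of each date within its unit
toy_dates$time_index <- ave(
  as.numeric(toy_dates$date), toy_dates$id,
  FUN = function(d) rank(d, ties.method = "first")
)

# Look up the index corresponding to the treatment date;
# here the result is 3, the value one would pass as `timing`
treatment_date <- as.Date("2020-07-20")
unique(toy_dates$time_index[toy_dates$date == treatment_date])
```
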
------------------------------------------------------------------------

You can use this result to create custom plots, or take advantage of the
built-in `plot_es()` function to visualize the estimates and confidence
intervals with minimal code.

### `plot_es()`

The `plot_es()` function creates a plot based on `ggplot2`.

`plot_es()` has 12 arguments.

| Argument | Description |
|----|----|
| `data` | Data frame created by `run_es()` |
| `type` | The type of confidence interval visualization: `"ribbon"` (default) or `"errorbar"` |
| `vline_val` | The x-intercept for the vertical reference line (default: `0`) |
| `vline_color` | Color for the vertical reference line (default: `"#000"`) |
| `hline_val` | The y-intercept for the horizontal reference line (default: `0`) |
| `hline_color` | Color for the horizontal reference line (default: `"#000"`) |
| `linewidth` | The width of the lines for the plot (default: `1`) |
| `pointsize` | The size of the points for the estimates (default: `2`) |
| `alpha` | The transparency level for ribbons (default: `0.2`) |
| `barwidth` | The width of the error bars (default: `0.2`) |
| `color` | The color for the lines and points (default: `"#B25D91FF"`) |
| `fill` | The fill color for ribbons (default: `"#B25D91FF"`) |

If the defaults suit your needs, simply pass the data frame returned by
`run_es()` to get a complete plot.

``` r
plot_es(event_study)
```

![](README_files/figure-gfm/unnamed-chunk-6-1.png)<!-- -->

``` r
plot_es(event_study, type = "errorbar")
```

![](README_files/figure-gfm/unnamed-chunk-7-1.png)<!-- -->

``` r
plot_es(event_study, type = "errorbar", vline_val = -.5)
```

![](README_files/figure-gfm/unnamed-chunk-8-1.png)<!-- -->

Because the result is a `ggplot2` object, you can customize it further
by adding standard `ggplot2` layers and scales.

``` r
plot_es(event_study, type = "errorbar") + 
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) + 
  ggplot2::ggtitle("Result of Event Study")
```

![](README_files/figure-gfm/unnamed-chunk-9-1.png)<!-- -->

## Planned Features

- Support for `staggered = TRUE` with `time_transform = TRUE`
  - Enable automatic alignment of treatment dates with transformed time
    indices, allowing analysis with irregular time variables (e.g.,
    `Date`) in staggered adoption settings.
- Allow `timing` to accept original time values (e.g., specific `Date`s)
  - Instead of manually calculating the time index (e.g.,
    `timing = 19`), users will be able to specify a `Date` or other
    original time value directly. This will simplify workflow when
    `time_transform = TRUE`.

## Debugging

If you find an issue, please report it on the GitHub Issues page.
