% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ATE_internal.R
\name{ATE_internal}
\alias{ATE_internal}
\title{Estimating the Average Treatment Effect (ATE) in an internal target population using multi-source data}
\usage{
ATE_internal(
  X,
  Y,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)
}
\arguments{
\item{X}{Data frame (or matrix) containing the covariates of the multi-source data, with \eqn{n} rows and \eqn{p} columns. Character variables are converted to factors.}

\item{Y}{Vector of length \eqn{n} containing the outcome.}

\item{S}{Vector of length \eqn{n} containing the source indicator. If \code{S} is a factor, its level order is preserved; otherwise, it is converted to a factor with the default level order. This order is carried over to the outputs and plots.}

\item{A}{Vector of length \eqn{n} containing the binary treatment (1 for treated and 0 for untreated).}

\item{cross_fitting}{Logical specifying whether sample splitting and cross-fitting should be used.}

\item{replications}{Integer specifying the number of sample splitting and cross-fitting replications to perform, if \code{cross_fitting = TRUE}. The default is \code{10L}.}

\item{source_model}{Character string specifying the (penalized) multinomial logistic regression used to estimate the source model. Two options are available: "\code{MN.glmnet}" (default) and "\code{MN.nnet}", which use \pkg{glmnet} and \pkg{nnet}, respectively.}

\item{source_model_args}{List specifying the arguments for the source model (in \pkg{glmnet} or \pkg{nnet}).}

\item{treatment_model_type}{Character string specifying how the treatment model is estimated. Options include "\code{separate}" (default) and "\code{joint}". If "\code{separate}", the treatment model (i.e., \eqn{P(A=1|X, S=s)}) is estimated by regressing \eqn{A} on \eqn{X} within each specific internal population \eqn{S=s}. If "\code{joint}", the treatment model is estimated by regressing \eqn{A} on \eqn{X} and \eqn{S} using the multi-source population.}

\item{treatment_model_args}{List specifying the arguments for the treatment model (in \pkg{SuperLearner}).}

\item{outcome_model_args}{List specifying the arguments for the outcome model (in \pkg{SuperLearner}).}

\item{show_progress}{Logical specifying whether to print a progress bar for the cross-fit replicates completed, if \code{cross_fitting = TRUE}.}
}
\value{
An object of class "ATE_internal". This object is a list with the following elements:
  \item{df_dif}{A data frame containing the treatment effect (mean difference) estimates for the internal populations.}
  \item{df_A0}{A data frame containing the potential outcome mean estimates under A = 0 for the internal populations.}
  \item{df_A1}{A data frame containing the potential outcome mean estimates under A = 1 for the internal populations.}
  \item{fit_outcome}{Fitted outcome model.}
  \item{fit_source}{Fitted source model.}
  \item{fit_treatment}{Fitted treatment model(s).}
}
\description{
Doubly robust and efficient estimator of the ATE in each internal target population using multi-source data.
}
\details{
\strong{Data structure:}

The multi-source dataset consists of the outcome \code{Y}, source \code{S}, treatment \code{A}, and covariates \code{X} (\eqn{n \times p}) from the internal populations. The data sources can be trials, observational studies, or a combination of both.

\strong{Estimation of nuisance parameters:}

The following models are fit:
\itemize{
\item Propensity score model: \eqn{\eta_a(X)=P(A=a|X)}. We perform the decomposition \eqn{P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X)} and estimate \eqn{P(A=1|X, S=s)} (i.e., the treatment model) and \eqn{q_s(X)=P(S=s|X)} (i.e., the source model).
\item Outcome model: \eqn{\mu_a(X)=E(Y|X, A=a)}.
}
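
Because \eqn{A} is binary and \eqn{\sum_{s} q_s(X)=1}, the propensity score under \eqn{A=0} follows directly from the treatment and source models for \eqn{A=1}; as a short worked step,
\deqn{
 \eta_0(X)=\sum_{s} \big\{1-P(A=1|X, S=s)\big\} q_s(X) = 1-\sum_{s} P(A=1|X, S=s)\, q_s(X) = 1-\eta_1(X).
}
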
The treatment and outcome models are estimated with \pkg{SuperLearner}; the source model is estimated with \pkg{glmnet} or \pkg{nnet}, as specified by \code{source_model}.

\strong{ATE estimation:}

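For each internal population \eqn{S=s} and treatment level \eqn{a \in \{0, 1\}}, the potential outcome mean under \eqn{A=a} in population \eqn{S=s} is estimated by
\deqn{
 \dfrac{\widehat \kappa}{n}\sum\limits_{i=1}^{n} \Bigg[ I(S_i = s) \widehat \mu_a(X_i)
 +I(A_i = a) \dfrac{\widehat q_{s}(X_i)}{\widehat \eta_a(X_i)}  \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg],
}
where \eqn{\widehat \kappa=\{n^{-1} \sum_{i=1}^n I(S_i=s)\}^{-1}}. The ATE in population \eqn{S=s} is estimated by the difference between the estimates for \eqn{a=1} and \eqn{a=0}; these quantities are returned in \code{df_dif}, \code{df_A1}, and \code{df_A0}. The estimator is doubly robust, i.e., it is consistent if either the outcome model or both the treatment and source models are correctly specified, and it is non-parametrically efficient.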
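
As a worked step for the normalizing constant, write \eqn{n_s=\sum_{i=1}^n I(S_i=s)} for the number of observations from population \eqn{S=s} (notation used only here). Then \eqn{\widehat \kappa/n=1/n_s}, so the estimator can equivalently be written as
\deqn{
 \dfrac{1}{n_s}\sum\limits_{i=1}^{n} I(S_i = s) \widehat \mu_a(X_i)
 +\dfrac{1}{n_s}\sum\limits_{i=1}^{n} I(A_i = a) \dfrac{\widehat q_{s}(X_i)}{\widehat \eta_a(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\}.
}
The first term averages the outcome-model predictions over population \eqn{S=s} only, while the second (augmentation) term uses the residuals of all observations with \eqn{A=a}, from every source, reweighted by \eqn{\widehat q_s(X)/\widehat \eta_a(X)}.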

To achieve non-parametric efficiency and asymptotic normality, the estimator requires that \eqn{||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q_s(X) -q_s(X)||\big\}=o_p(n^{-1/2})}, where \eqn{||\cdot||} denotes the \eqn{L_2(P)} norm.
Sample splitting and cross-fitting (\code{cross_fitting = TRUE}) can additionally be used to avoid Donsker class conditions on the nuisance model estimators.
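
For example, the rate condition on the nuisance models holds if each of \eqn{||\widehat \mu_a(X) -\mu_a(X)||}, \eqn{||\widehat \eta_a(X) -\eta_a(X)||}, and \eqn{||\widehat q_s(X) -q_s(X)||} is \eqn{o_p(n^{-1/4})}, since then
\deqn{
 o_p(n^{-1/4})\big\{o_p(n^{-1/4})+o_p(n^{-1/4})\big\}=o_p(n^{-1/4})\, o_p(n^{-1/4})=o_p(n^{-1/2}).
}
This allows flexible learners that converge more slowly than the parametric \eqn{n^{-1/2}} rate.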

Even when a data source is a randomized trial, estimating the propensity score (rather than plugging in the known randomization probabilities) is recommended for optimal efficiency.
}
\examples{
\donttest{
ai <- ATE_internal(
  X = dat_multisource[, 1:10],
  Y = dat_multisource$Y,
  S = dat_multisource$S,
  A = dat_multisource$A,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)
}
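
## --- Illustrative sketch only; NOT the internal implementation of ATE_internal() ---
## A minimal base-R version of the estimator formula in Details, assuming
## two sources, one covariate, and simple glm() fits as stand-ins for the
## SuperLearner / glmnet nuisance models. All object names below
## (sim, q, eta1, phi_hat, ate_sketch, ...) are hypothetical.
set.seed(1)
n <- 2000
x <- rnorm(n)
s <- rbinom(n, 1, plogis(0.5 * x)) + 1          # source indicator in {1, 2}
a <- rbinom(n, 1, plogis(0.3 * x + 0.2 * s))    # binary treatment
y <- 1 + x + a * (1 + 0.5 * x) + rnorm(n)       # outcome
sim <- data.frame(x = x, s = s, a = a, y = y)

## Source model q_s(X) = P(S = s | X); with two sources a logistic model suffices
fit_s <- glm(as.numeric(s == 2) ~ x, family = binomial(), data = sim)
q2 <- predict(fit_s, type = "response")
q <- cbind(1 - q2, q2)                          # columns: q_1(X), q_2(X)

## Treatment model P(A = 1 | X, S = s), fitted separately within each source
p1 <- sapply(1:2, function(k) {
  fit_a <- glm(a ~ x, family = binomial(), data = sim[sim$s == k, ])
  predict(fit_a, newdata = sim, type = "response")
})
eta1 <- rowSums(p1 * q)                         # eta_1(X) = sum_s P(A=1|X,S=s) q_s(X)

## Outcome model mu_a(X) = E(Y | X, A = a), not conditioning on S
fit_y <- glm(y ~ x * a, family = gaussian(), data = sim)
mu1 <- predict(fit_y, newdata = transform(sim, a = 1))
mu0 <- predict(fit_y, newdata = transform(sim, a = 0))

## Potential outcome mean estimator from Details, using kappa_hat / n = 1 / n_s
phi_hat <- function(target, trt) {
  mu  <- if (trt == 1) mu1 else mu0
  eta <- if (trt == 1) eta1 else 1 - eta1
  n_s <- sum(sim$s == target)
  sum((sim$s == target) * mu +
        (sim$a == trt) * q[, target] / eta * (sim$y - mu)) / n_s
}

## ATE in each source: difference of the potential outcome mean estimates
ate_sketch <- sapply(1:2, function(k) phi_hat(k, 1) - phi_hat(k, 0))
ate_sketch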

}
\references{
Robertson, S.E., Steingrimsson, J.A., Joyce, N.R., Stuart, E.A., & Dahabreh, I.J. (2021). \emph{Center-specific causal inference with multicenter trials: Reinterpreting trial evidence in the context of each participating center}. arXiv preprint arXiv:2104.05905.
}
