% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method_nn.R
\name{method_nn}
\alias{method_nn}
\title{Mass Imputation Using Nearest Neighbours Matching Method}
\usage{
method_nn(
  y_nons,
  X_nons,
  X_rand,
  svydesign,
  weights = NULL,
  family_outcome = NULL,
  start_outcome = NULL,
  vars_selection = FALSE,
  pop_totals = NULL,
  pop_size = NULL,
  control_outcome = control_out(),
  control_inference = control_inf(),
  verbose = FALSE,
  se = TRUE
)
}
\arguments{
\item{y_nons}{target variable from the non-probability sample}

\item{X_nons}{a \code{model.matrix} with auxiliary variables from the non-probability sample}

\item{X_rand}{a \code{model.matrix} with auxiliary variables from the probability sample}

\item{svydesign}{a \code{svydesign} object for the probability sample}

\item{weights}{case / frequency weights from the non-probability sample}

\item{family_outcome}{a placeholder (not used in \code{method_nn})}

\item{start_outcome}{a placeholder (not used in \code{method_nn})}

\item{vars_selection}{whether variable selection should be conducted}

\item{pop_totals}{a placeholder (not used in \code{method_nn})}

\item{pop_size}{population size from the \code{nonprob} function}

\item{control_outcome}{controls passed by the \code{control_out} function}

\item{control_inference}{controls passed by the \code{control_inf} function}

\item{verbose}{parameter passed from the main \code{nonprob} function}

\item{se}{whether standard errors should be calculated}
}
\value{
an object of class \code{nonprob_method}, which is a \code{list} with the following entries:

\describe{
\item{model_fitted}{\code{RANN::nn2} object}
\item{y_nons_pred}{predicted values for the non-probability sample (query to itself)}
\item{y_rand_pred}{predicted values for the probability sample}
\item{coefficients}{coefficients for the model (if available)}
\item{svydesign}{an updated \code{survey.design2} object (a new column \code{y_hat_MI} is added)}
\item{y_mi_hat}{estimated population mean for the target variable}
\item{vars_selection}{whether variable selection was performed (not implemented, for further development)}
\item{var_prob}{variance for the probability sample component (if available)}
\item{var_nonprob}{variance for the non-probability sample component}
\item{var_tot}{total variance; if possible, it equals \code{var_prob + var_nonprob}, otherwise it is a single scalar}
\item{model}{model type (character \code{"nn"})}
\item{family}{placeholder for the \verb{NN approach} information}
}
}
\description{
Mass imputation using the nearest neighbours approach, as described in Yang et al. (2021).
The implementation is currently based on the \link[RANN:nn2]{RANN::nn2} function and thus uses the
Euclidean distance to match units from \eqn{S_A} (non-probability) to \eqn{S_B} (probability).
The mean is estimated using the \eqn{S_B} sample.
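In formula form, and using notation introduced here only for illustration (\eqn{d_i} are the design weights of \eqn{S_B} and \eqn{y_{i(l)}} is the target value of the \eqn{l}-th nearest neighbour of unit \eqn{i \in S_B} among the units of \eqn{S_A}), the point estimator can be sketched as
\deqn{
\hat{\mu} = \frac{1}{N}\sum_{i \in S_B} d_i y_i^*, \qquad y_i^* = \frac{1}{k}\sum_{l=1}^{k} y_{i(l)}.
}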
}
\details{
Analytical variance

The variance of the mean is estimated using the following two components:

(a) non-probability part (\eqn{S_A} with size \eqn{n_A}; denoted as \code{var_nonprob} in the result)

This may be estimated using

\deqn{
\hat{V}_1 = \frac{1}{N^2}\sum_{i \in S_A}\frac{1-\hat{\pi}_B(\boldsymbol{x}_i)}{\hat{\pi}_B(\boldsymbol{x}_i)}\hat{\sigma}^2(\boldsymbol{x}_i),
}

where \eqn{\hat{\pi}_B(\boldsymbol{x}_i)} is an estimator of the propensity score, which
we currently set to the constant \eqn{n_A/N}, and \eqn{\hat{\sigma}^2(\boldsymbol{x}_i)} is
estimated as the average of \eqn{(y_i - y_i^*)^2}.
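As a worked illustration, with the constant estimate \eqn{\hat{\pi}_B(\boldsymbol{x}_i) = n_A/N} the weight \eqn{(1-\hat{\pi}_B)/\hat{\pi}_B} equals \eqn{(N-n_A)/n_A} for every unit. If, in addition, a single value \eqn{\hat{\sigma}^2} is used for all units (our reading of the averaging described above, assumed here rather than documented), the formula simplifies to
\deqn{
\hat{V}_1 = \frac{1}{N^2}\,\frac{N-n_A}{n_A}\sum_{i \in S_A}\hat{\sigma}^2 = \frac{N-n_A}{N^2}\,\hat{\sigma}^2.
}
For hypothetical values \eqn{N=1000}, \eqn{n_A=100} and \eqn{\hat{\sigma}^2=4}, this gives \eqn{\hat{V}_1 = 900 \cdot 4 / 10^6 = 0.0036}.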

Chlebicki et al. (2025, Algorithm 2) proposed a non-parametric mini-bootstrap estimator
which is not assumed to be consistent but has good finite population properties.
This bootstrap can be applied by setting \code{control_inference = control_inf(nn_exact_se = TRUE)}
and can be summarized as follows:
\enumerate{
\item Sample \eqn{n_A} units from \eqn{S_A} with replacement to create \eqn{S_A'} (if pseudo-weights are present, inclusion probabilities should be proportional to their inverses).
\item Match units from \eqn{S_B} to \eqn{S_A'} to obtain predictions \eqn{y_i^* = k^{-1}\sum_{l=1}^{k} y_{i(l)}}, where \eqn{y_{i(l)}} is the target value of the \eqn{l}-th nearest neighbour of unit \eqn{i \in S_B} in \eqn{S_A'}.
\item Estimate \eqn{\hat{\mu}=\frac{1}{N} \sum_{i \in S_B} d_i y_i^*}, where \eqn{d_i} are the design weights of \eqn{S_B}.
\item Repeat steps 1-3 \eqn{M} times (we set \eqn{M=50} in our simulations; this value is hard-coded).
\item Estimate \eqn{\hat{V}_1} as the variance of the \eqn{M} bootstrap estimates \eqn{\hat{\mu}} and save it as \code{var_nonprob}.
}
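Step 5 can be written explicitly as the sample variance of the \eqn{M} bootstrap replicates \eqn{\hat{\mu}^{(1)},\ldots,\hat{\mu}^{(M)}}; the \eqn{M-1} denominator below is an assumption (it matches R's \code{var}) rather than a documented detail of the implementation:
\deqn{
\hat{V}_1 = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\mu}^{(m)} - \bar{\mu}\right)^2,
\qquad \bar{\mu} = \frac{1}{M}\sum_{m=1}^{M}\hat{\mu}^{(m)}.
}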

(b) probability part (\eqn{S_B} with size \eqn{n_B}; denoted as \code{var_prob} in the result)

This part relies on the \pkg{survey} package, and the variance is estimated using the following
equation:

\deqn{
\hat{V}_2=\frac{1}{N^2} \sum_{i=1}^{n_B} \sum_{j=1}^{n_B} \frac{\pi_{i j}-\pi_i \pi_j}{\pi_{i j}}
\frac{y_i^*}{\pi_i} \frac{y_j^*}{\pi_j},
}

where \eqn{y^*_i} and \eqn{y_j^*} are values imputed as the average of the target values of the
\eqn{k} nearest neighbours, i.e. \eqn{y_i^* = k^{-1}\sum_{l=1}^{k} y_{i(l)}}. Note that \eqn{\hat{V}_2} can in principle be estimated in various ways, depending on the type of design and on whether the population size is known.
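As a special case for illustration (not the general designs handled by the \pkg{survey} package), under simple random sampling without replacement \eqn{\pi_i = n_B/N}, \eqn{\pi_{ii} = \pi_i} and \eqn{\pi_{ij} = \frac{n_B(n_B-1)}{N(N-1)}} for \eqn{i \neq j}, and \eqn{\hat{V}_2} reduces to the familiar
\deqn{
\hat{V}_2 = \left(1 - \frac{n_B}{N}\right)\frac{s_{y^*}^2}{n_B}, \qquad
s_{y^*}^2 = \frac{1}{n_B - 1}\sum_{i \in S_B}\left(y_i^* - \bar{y}^*\right)^2,
}
where \eqn{\bar{y}^*} is the sample mean of the imputed values in \eqn{S_B}.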
}
\examples{

library(survey)  # svydesign() comes from the survey package
data(admin)
data(jvs)
jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight,
                     strata = ~ size + nace + region, data = jvs)

res_nn <- method_nn(y_nons = admin$single_shift,
                    X_nons = model.matrix(~ region + private + nace + size, admin),
                    X_rand = model.matrix(~ region + private + nace + size, jvs),
                    svydesign = jvs_svy)

res_nn

}
\references{
Yang, S., Kim, J. K., & Hwang, Y. (2021). Integration of data from probability surveys and
big found data for finite population inference using mass imputation.
Survey Methodology, 47(1), 29-58.

Chlebicki, P., Chrostowski, Ł., & Beręsewicz, M. (2025). Data integration of non-probability
and probability samples with predictive mean matching. arXiv preprint arXiv:2403.13750.
}
