% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/variance_function.R
\name{varDT}
\alias{varDT}
\alias{var_srs}
\title{Variance approximation with the Deville-Tillé (2005) formula}
\usage{
varDT(y = NULL, pik, x = NULL, strata = NULL, w = NULL,
  precalc = NULL, id = NULL)

var_srs(y, pik, strata = NULL, w = NULL, precalc = NULL)
}
\arguments{
\item{y}{A (sparse) numerical matrix of the variable(s) of interest. The variance
of the estimator of the total is estimated for each column.}

\item{pik}{A numerical vector of first-order inclusion probabilities.}

\item{x}{An optional (sparse) numerical matrix of balancing variable(s).}

\item{strata}{An optional categorical vector (factor or character) when
variance estimation is to be conducted within strata.}

\item{w}{An optional numerical vector of row weights (see Details).}

\item{precalc}{A list of pre-calculated results (see Details).}

\item{id}{A vector of identifiers of the units used in the calculation.
Useful when \code{precalc} is not \code{NULL}, in order to check that the ordering of the
\code{y} data matrix matches the one used at the pre-calculation step.}
}
\value{
\itemize{ \item if \code{y} is not \code{NULL} (calculation step): 
  the estimated variances, as a numerical vector whose length is the number of 
  columns of \code{y}. \item if \code{y} is \code{NULL} (pre-calculation step): a list 
  containing pre-calculated data.}
}
\description{
\code{varDT} estimates the variance of the estimator of a total
  in the case of a balanced sampling design with equal or unequal probabilities 
  using the Deville-Tillé (2005) formula. Without balancing variables, it falls back 
  to Deville's (1993) classical approximation. Without balancing variables and 
  with equal probabilities, it falls back to the classical Horvitz-Thompson 
  variance estimator for the total in the case of simple random sampling. 
  Stratification is natively supported.
  
  \code{var_srs} is a convenience wrapper for the (stratified) simple random
  sampling case.
}
\details{
\code{varDT} aims at being the workhorse of most variance estimation conducted
  with the \code{gustave} package. It may be used to estimate the variance
  of the estimator of a total in the case of (stratified) simple random sampling, 
  (stratified) unequal probability sampling and (stratified) balanced sampling. 
  The native integration of stratification based on \code{Matrix::TsparseMatrix} allows 
  for significant performance gains compared to higher-level vectorizations
  (\code{*apply} especially).
  
  Several time-consuming operations (e.g. collinearity check, matrix
  inversion) can be pre-calculated in order to speed up the estimation at
  execution time. This is determined by the values of the parameters \code{y}
  and \code{precalc}: \itemize{ \item if \code{y} is not \code{NULL} and
  \code{precalc} is \code{NULL}: on-the-fly calculation (no pre-calculation). 
  \item if \code{y} is \code{NULL} and \code{precalc} is \code{NULL}:
  pre-calculation, whose results are returned as a list of pre-calculated data. 
  \item if \code{y} is not \code{NULL} and \code{precalc} is not \code{NULL}:
  calculation using the list of pre-calculated data. }
  
  \code{w} is a row weight used at the final summation step. It is useful
  when \code{varDT} or \code{var_srs} is used on the second stage of a 
  two-stage sampling design, applying the Rao (1975) formula.
}
\section{Difference with \code{varest} from package \code{sampling}}{

  
  \code{varDT} differs from \code{sampling::varest} in several ways: 
  \itemize{ \item The formula implemented in \code{varDT} is more general and
  encompasses balanced sampling. \item Even in its reduced
  form (without balancing variables), the formula implemented in \code{varDT}
  slightly differs from the one implemented in \code{sampling::varest}.
  Caron (1998, pp. 178-179) compares the two estimators
  (\code{sampling::varest} implements V_2, \code{varDT} implements V_1). 
  \item \code{varDT} introduces several optimizations: \itemize{ \item
  matrixwise operations make it possible to estimate the variance of several
  variables of interest at once. \item \code{Matrix::TsparseMatrix} capability and the native
  integration of stratification yield significant performance gains. \item
  the ability to pre-calculate some time-consuming operations speeds up the
  estimation at execution time. } \item \code{varDT} does not natively
  implement the calibration estimator (i.e. the sampling variance estimator
  that takes into account the effect of calibration). In the context of the
  \code{gustave} package, \code{\link{res_cal}} should be called before 
  \code{varDT} in order to achieve the same result.}
}

\examples{
library(sampling)
set.seed(1)

# Simple random sampling case
N <- 1000
n <- 100
y <- rnorm(N)[as.logical(srswor(n, N))]
pik <- rep(n/N, n)
varDT(y, pik)
sampling::varest(y, pik = pik)
N^2 * (1 - n/N) * var(y) / n
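
# Illustrative addition (not part of the original examples): under simple
# random sampling, the var_srs wrapper can be called with the same arguments.
# It is expected to return the same estimate as varDT(y, pik) above.
var_srs(y, pik)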

# Unequal probability sampling case
N <- 1000
n <- 100
pik <- runif(N)
s <- as.logical(UPsystematic(pik))
y <- rnorm(N)[s]
pik <- pik[s]
varDT(y, pik)
varest(y, pik = pik)
# The small difference is expected (see the section on the difference
# with varest from package sampling).

# Balanced sampling case
N <- 1000
n <- 100
pik <- runif(N)
x <- matrix(rnorm(N*3), ncol = 3)
s <- as.logical(samplecube(x, pik))
y <- rnorm(N)[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)

# Balanced sampling case (variable of interest
# among the balancing variables)
N <- 1000
n <- 100
pik <- runif(N)
y <- rnorm(N)
x <- cbind(matrix(rnorm(N*3), ncol = 3), y)
s <- as.logical(samplecube(x, pik))
y <- y[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)
# As expected, the estimated variance is (numerically) zero: the total of
# the variable of interest is perfectly estimated.

# strata argument
n <- 100
H <- 2
pik <- runif(n)
y <- rnorm(n)
strata <- letters[sample.int(H, n, replace = TRUE)]
all.equal(
 varDT(y, pik, strata = strata),
 varDT(y[strata == "a"], pik[strata == "a"]) + varDT(y[strata == "b"], pik[strata == "b"])
)
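
# w argument (illustrative sketch, not part of the original examples).
# Assumption: w only multiplies each unit's contribution at the final
# summation step, so unit weights should leave the estimate unchanged.
w <- rep(1, n)
all.equal(
 varDT(y, pik, w = w),
 varDT(y, pik)
)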

# precalc argument
n <- 1000
H <- 50
pik <- runif(n)
y <- rnorm(n)
strata <- sample.int(H, n, replace = TRUE)
precalc <- varDT(y = NULL, pik, strata = strata)
identical(
 varDT(y, precalc = precalc),
 varDT(y, pik, strata = strata)
)
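
# Several variables of interest at once (illustrative sketch, not part of
# the original examples). The name Y is introduced here for illustration:
# a matrix with one column per variable of interest.
n <- 100
pik <- runif(n)
Y <- matrix(rnorm(n * 3), ncol = 3)
varDT(Y, pik)
# One estimate per column is returned; each should agree with the
# corresponding single-variable call.
all.equal(
 unname(varDT(Y, pik)),
 unname(sapply(1:3, function(j) varDT(Y[, j], pik)))
)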

}
\references{
Caron, N. (1998), "Le logiciel Poulpe : aspects méthodologiques", \emph{Actes 
  des Journées de méthodologie statistique}, \url{http://jms-insee.fr/jms1998s03_1/}
  
  Deville, J.-C. (1993), \emph{Estimation de la variance pour les enquêtes en
  deux phases}, Manuscript, INSEE, Paris.
  
  Deville, J.-C., Tillé, Y. (2005), "Variance approximation under balanced
  sampling", \emph{Journal of Statistical Planning and Inference}, 128(2),
  569-591.
  
  Rao, J.N.K. (1975), "Unbiased variance estimation for multistage designs",
  \emph{Sankhya}, C, no. 37.
}
\seealso{
\code{\link{res_cal}}
}
\author{
Martin Chevalier
}
