% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/define_variance_wrapper.R
\name{define_variance_wrapper}
\alias{define_variance_wrapper}
\title{Define a variance estimation wrapper}
\usage{
define_variance_wrapper(variance_function, reference_id, default = list(stat =
  "total", alpha = 0.05), objects_to_include = NULL,
  objects_to_include_from = parent.frame())
}
\arguments{
\item{variance_function}{An R function that takes a data matrix as input, 
possibly along with other arguments (e.g. parameters affecting the 
estimation of variance), and returns a numeric vector of estimated variances 
(or a list whose first element is a numeric vector of estimated variances).}

\item{reference_id}{A vector containing the ids of all the responding units 
of the survey. It is compared with \code{default$id} to check whether some 
observations are missing in the survey file. Observations are reordered 
according to \code{reference_id}.}

\item{default}{A named list specifying the default values for: \itemize{
  \item \code{id}: the name of the default identifying variable in the survey 
  file. It can also be an unevaluated expression (enclosed in \code{substitute()}) to be 
  evaluated within the survey file.
  \item \code{weight}: the name of the default weight variable in the survey file. 
  It can also be an unevaluated expression (enclosed in \code{substitute()}) to be 
  evaluated within the survey file.
  \item \code{stat}: the name of the default statistic to compute when none is specified. 
  It is set to \code{"total"} by default.
  \item \code{alpha}: the default threshold for confidence interval derivation. 
  It is set to \code{0.05} by default.
}}

\item{objects_to_include}{A character vector giving the names of 
additional R objects to include within the variance wrapper. These objects 
are used to carry out the variance estimation.}

\item{objects_to_include_from}{The environment to which the additional R 
objects belong.}
}
\value{
An R function that simplifies variance estimation based on the provided 
variance function. Its parameters are:
  \itemize{
   \item \code{data}: the survey data where the interest variables are stored
   \item \code{...}: one or more calls to a linearization wrapper (see examples
   and \code{\link[=linearization_wrapper_standard]{standard linearization wrappers}})
   \item \code{where}: a logical vector indicating a domain on which the variance
   estimation is conducted
   \item \code{by}: a qualitative variable whose levels are used to define domains
   on which the variance estimation is conducted
   \item \code{stat}: a character vector of length 1 indicating the linearization
   wrapper to use when none is specified. Its default value is taken from
   \code{default$stat} in \code{define_variance_wrapper}
   \item \code{alpha}: a numeric vector of length 1 indicating the threshold
   for confidence interval derivation. Its default value is taken from
   \code{default$alpha} in \code{define_variance_wrapper}
   \item \code{id}: a character vector of length 1 containing the name of
   the identifying variable in the survey file. It can also be an 
   unevaluated expression (using \code{substitute()}) to be evaluated within
   the survey file. Its default value is taken from 
   \code{default$id} in \code{define_variance_wrapper}
   \item \code{envir}: an environment containing a binding to \code{data}
 }
}
\description{
Given a variance estimation \emph{function} (specific to a 
  survey), \code{define_variance_wrapper} defines a variance estimation 
  \emph{wrapper} that is easier to use (e.g. automatic domain estimation, 
  linearization).
}
\details{
Defining variance estimation wrappers is the \strong{key feature} of
  the \code{gustave} package.
  
  Analytical variance estimation is often difficult for non-specialists to 
  carry out, owing to the complexity of the underlying sampling and 
  estimation methodology. As a result, \emph{variance estimation functions} 
  tend to be intricate and are most often used only by the sampling expert 
  who actually wrote them. A \emph{variance estimation wrapper} is an 
  intermediate function that is "wrapped around" the (complex) variance 
  estimation function in order to provide the non-specialist with 
  user-friendly features: \itemize{
    \item checks for consistency between the provided dataset and the 
    survey characteristics
    \item factor discretization
    \item domain estimation
    \item linearization of complex statistics (see 
    \code{\link[=linearization_wrapper_standard]{standard linearization wrappers}})
  }
  
  \code{define_variance_wrapper} allows the sampling expert to define a 
  variance estimation wrapper around a given variance estimation function and
  set its default parameters. The produced variance estimation wrapper is 
  stand-alone in the sense that it can contain the additional data needed 
  to carry out the variance estimation (see the \code{objects_to_include} 
  and \code{objects_to_include_from} parameters).
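  
  As a rough illustration of what "stand-alone" means here, the following 
  base R sketch copies named objects into the environment of a function, so 
  that the function keeps working once the originals are removed. The helper 
  \code{embed_objects} is hypothetical and is not part of \code{gustave}; 
  the actual implementation of \code{define_variance_wrapper} may differ.
  \preformatted{
# Hypothetical helper (illustration only, not part of gustave)
embed_objects <- function(f, objects, from = parent.frame()) {
  force(from)
  # New environment sitting between f and its original environment
  env <- new.env(parent = environment(f))
  # Copy each requested object into that environment
  for (name in objects) assign(name, get(name, envir = from), envir = env)
  environment(f) <- env
  f
}

offset <- 10
add_offset <- embed_objects(function(y) y + offset, "offset")
rm(offset)
add_offset(1)  # still works: offset is found in the embedded environment
}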
}
\examples{
### Example from the Information and communication technologies (ICT) survey

# The subset of the (simulated) ICT survey has the following features: 
# - stratified one-stage sampling design of 650 firms;
# - 612 responding firms, non-response correction through reweighting 
# in homogeneous response groups based on economic sub-sector and turnover;
# - calibration on margins (number of firms and turnover broken down
# by economic sub-sector).

# Step 1: Definition of a variance function

variance_function <- function(y){
  
  # Calibration
  y <- rescal(y, x = x)
  
  # Non-response
  y <- add0(y, rownames = ict_sample$firm_id)
  var_nr <- var_pois(y, pik = ict_sample$response_prob_est, w = ict_sample$w_sample)
  
  # Sampling
  y <- y / ict_sample$response_prob_est
  var_sampling <- var_srs(y, pik = 1 / ict_sample$w_sample, strata = ict_sample$division)
  
  var_sampling + var_nr
  
}

# With x the calibration variables matrix
x <- as.matrix(ict_survey[
  order(ict_survey$firm_id), 
  c(paste0("N_", 58:63), paste0("turnover_", 58:63))
])

# Test of the variance function
y <- as.matrix(ict_survey$speed_quanti)
rownames(y) <- ict_survey$firm_id
variance_function(y)
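
# A variance function may also return a list whose first element is the
# numeric vector of estimated variances. The toy function below is a
# hypothetical sketch (not part of gustave): it assumes simple random
# sampling without replacement of nrow(y) units out of n_pop.
toy_variance_function <- function(y, n_pop = 100){
  n <- nrow(y)
  # Variance of the estimated total of each column of y
  v <- n_pop^2 * (1 - n / n_pop) * apply(y, 2, var) / n
  list(var = v, n = n)
}
toy_y <- matrix(c(1, 2, 3, 4, 2, 4, 6, 8), ncol = 2)
toy_variance_function(toy_y)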

# Step 2: Definition of a variance wrapper

variance_wrapper <- define_variance_wrapper(
  variance_function = variance_function,
  reference_id = ict_survey$firm_id,
  default = list(id = "firm_id", weight = "w_calib"),
  objects_to_include = c("x", "ict_sample")
)

# The objects "x" and "ict_sample" are embedded
# within the function variance_wrapper
ls(environment(variance_wrapper))
# Note: variance_wrapper is a closure
# (http://adv-r.had.co.nz/Functional-programming.html#closures)
# As a consequence, the variance wrapper will work even if 
# x is removed from globalenv()
rm(x)

# Step 3: Features of the variance wrapper

# Better display of results
variance_wrapper(ict_survey, speed_quanti)

# Mean linearization
variance_wrapper(ict_survey, mean(speed_quanti))
# Ratio linearization
variance_wrapper(ict_survey, ratio(turnover, employees))

# Discretization of qualitative variables
variance_wrapper(ict_survey, speed_quali)
# On-the-fly recoding
variance_wrapper(ict_survey, speed_quali == "Between 2 and 10 Mbs")

# 1-domain estimation
variance_wrapper(ict_survey, speed_quanti, where = division == "58")
# Multiple domains estimation
variance_wrapper(ict_survey, speed_quanti, by = division)

# Multiple variables at a time
variance_wrapper(ict_survey, speed_quanti, big_data)
variance_wrapper(ict_survey, speed_quanti, mean(big_data))
# Flexible syntax for where and by arguments
# (similar to the aes() function in ggplot2)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100)
)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100, where = division == "61")
)
variance_wrapper(ict_survey, where = division == "58", 
  mean(speed_quanti), mean(big_data * 100, where = NULL)
)
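
# Overriding the default statistic and threshold for a single call
# (a sketch assuming, as described in the Value section, that stat names
# a linearization wrapper such as "mean" and that alpha is the threshold
# for confidence interval derivation)
variance_wrapper(ict_survey, speed_quanti, stat = "mean")
variance_wrapper(ict_survey, speed_quanti, alpha = 0.1)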

}
\seealso{
\code{\link[=linearization_wrapper_standard]{standard linearization wrappers}}, \code{\link{varDT}}
}
\author{
Martin Chevalier
}
