% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lsmi_cv.R
\name{lsmi_cv}
\alias{lsmi_cv}
\title{Cross-validation to Select an Optimal Combination of n.seed and n.wave}
\usage{
lsmi_cv(net, n.seeds, n.wave, seeds = NULL, B = 100, prob = 0.95,
  cl = 1, param = c("mu"), method = c("percentile", "basic"),
  proxyRep = 19, proxySize = 30)
}
\arguments{
\item{net}{a network object that is a list containing:
\describe{
  \item{\code{degree}}{the degree sequence of the network, which is
     an \code{integer} vector of length \eqn{n};}
  \item{\code{edges}}{the edgelist, which is a two-column
     matrix, where each row is an edge of the network;}
  \item{\code{n}}{the network order (i.e., number of nodes in the network).}
}
The network object can be simulated by \code{\link{random_network}},
selected from the networks available in \code{\link{artificial_networks}},
converted from an \code{igraph} object with \code{\link{igraph_to_network}},
etc.}

\item{n.seeds}{an integer vector of candidate numbers of seeds for snowball
sampling (cf. a single integer \code{n.seed} in \code{\link{lsmi}}). Only
values \code{n.seeds <= n} are retained. If \code{seeds} is
specified, only values \code{n.seeds < length(unique(seeds))} are retained,
and the value \code{length(unique(seeds))} is added to them.}

\item{n.wave}{an integer defining the maximal number of waves (order of the
neighborhood) to be recorded around each seed in the LSMI; all numbers of waves
\code{1:n.wave} are considered in the cross-validation. For example,
\code{n.wave = 1} corresponds to an LSMI with the seeds and their first neighbors.
Note that the algorithm allows for multiple inclusions.}

\item{seeds}{a vector of numeric IDs of pre-specified seeds. If specified,
LSMIs are constructed around each such seed.}

\item{B}{a positive integer, the number of bootstrap replications to perform.
Default is 100.}

\item{prob}{confidence level for the intervals. Default is 0.95
(i.e., 95\% confidence).}

\item{cl}{parameter to specify a computer cluster for bootstrapping, passed to
the package \code{parallel} (default is \code{1}, meaning no cluster is used).
Possible values are:
\itemize{
  \item cluster object (list) produced by \link[parallel]{makeCluster}.
  In this case, a new cluster is neither started nor stopped;
  \item \code{NULL}. In this case, the function will attempt to detect
  available cores (see \link[parallel]{detectCores}) and, if there are
  multiple cores (\eqn{>1}), a cluster will be started with
  \link[parallel]{makeCluster}. If started, the cluster will be stopped
  after computations are finished;
  \item positive integer defining the number of cores to start a cluster.
  If \code{cl = 1}, no attempt to create a cluster will be made.
  If \code{cl > 1}, a cluster will be started (using \link[parallel]{makeCluster})
  and stopped afterwards (using \link[parallel]{stopCluster}).
}}

\item{param}{the parameter of interest for which to run a cross-validation
and select the optimal \code{n.seed} and \code{n.wave}. Currently, the only
supported value is \code{"mu"} (the network mean degree).}

\item{method}{method for calculating the bootstrap intervals. Default is
\code{"percentile"} (see Details).}

\item{proxyRep}{The number of times to repeat proxy sampling. Default is 19.}

\item{proxySize}{The size of the proxy sample. Default is 30.}
}
\value{
A list consisting of:
\item{bci}{A numeric vector of length 2 with the bootstrap confidence interval
(lower bound, upper bound) for the parameter of interest. This interval is
obtained by bootstrapping node degrees in an LSMI with the optimal combination
of \code{n.seed} and \code{n.wave}
(the combination is reported in \code{best_combination}).}
\item{estimate}{Point estimate of the parameter of interest
(based on the LSMI with \code{n.seed} seeds and \code{n.wave} waves
reported in \code{best_combination}).}
\item{best_combination}{An integer vector of length 2 containing the optimal
\code{n.seed} and \code{n.wave} selected via cross-validation.}
\item{seeds}{A vector of numeric IDs of the seeds that were used
in the LSMI with the optimal combination of \code{n.seed} and \code{n.wave}.}
}
\description{
From the candidate numbers of seeds in \code{n.seeds} and the candidate numbers
of waves \code{1:n.wave} around each seed, the function selects a single
\code{n.seed} and a single \code{n.wave} (the optimal seed-wave combination) that
produce a labeled snowball with multiple inclusions (LSMI) sample with the desired
bootstrap confidence interval for a parameter of interest. Here, `desired'
means that the interval (and the corresponding seed-wave combination) is selected
as having the best coverage, i.e., the coverage closest to the specified level
\code{prob}, based on a cross-validation procedure with proxy estimates of the parameter.
See Algorithm 2 by \insertCite{gel_etal_2017;textual}{snowboot} and Details
below.
}
\details{
Currently, the bootstrap intervals can be calculated with two alternative
methods: \code{"percentile"} or \code{"basic"}. The \code{"percentile"}
intervals correspond to Efron's \eqn{100\cdot}\code{prob}\% intervals
\insertCite{@see @efron_1979, also Equation 5.18 by @davison_hinkley_1997 and Equation 3 by @gel_etal_2017, @chen_etal_2018_snowboot}{snowboot}:
\deqn{(\theta^*_{[B\alpha/2]}, \theta^*_{[B(1-\alpha/2)]}),}
where \eqn{\theta^*_{[B\alpha/2]}} and \eqn{\theta^*_{[B(1-\alpha/2)]}}
are empirical quantiles of the bootstrap distribution with \code{B} bootstrap
replications for parameter \eqn{\theta}
(in general, \eqn{\theta} can be \eqn{f(k)} or \eqn{\mu}; \code{lsmi_cv}
currently supports only the mean degree \eqn{\mu}),
and \eqn{\alpha = 1 -} \code{prob}.

The \code{"basic"} method produces intervals
\insertCite{@see Equation 5.6 by @davison_hinkley_1997}{snowboot}:
\deqn{(2\hat{\theta} - \theta^*_{[B(1-\alpha/2)]}, 2\hat{\theta} - \theta^*_{[B\alpha/2]}),}
where \eqn{\hat{\theta}} is the sample estimate of the parameter.
Note that this method can lead to negative confidence bounds, especially
when \eqn{\hat{\theta}} is close to 0.
}
\examples{
net <- artificial_networks[[1]]
a <- lsmi_cv(net, n.seeds = c(10, 20, 30), n.wave = 5, B = 100)
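\donttest{
# Sketch of passing an existing cluster via the argument cl, assuming at
# least 2 cores are available. Per the description of cl, lsmi_cv()
# neither starts nor stops a cluster supplied this way, so it is
# stopped explicitly afterwards.
cl <- parallel::makeCluster(2)
b <- lsmi_cv(artificial_networks[[1]], n.seeds = c(10, 20), n.wave = 3,
             B = 100, cl = cl)
parallel::stopCluster(cl)
b$best_combination  # selected n.seed and n.wave
b$bci               # bootstrap interval for the mean degree
}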

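# A minimal base-R sketch of the two interval types described in Details.
# It does not call snowboot internals: theta_star is a hypothetical
# stand-in for B = 100 bootstrap replicates of the mean degree, theta_hat
# is a hypothetical sample estimate, and quantile() with its default
# type = 7 is an assumed choice of empirical quantile.
set.seed(1)
theta_hat <- 4
theta_star <- rnorm(100, mean = theta_hat, sd = 0.5)
alpha <- 1 - 0.95
q <- quantile(theta_star, c(alpha / 2, 1 - alpha / 2), names = FALSE)
percentile_ci <- q                  # (lower quantile, upper quantile)
basic_ci <- 2 * theta_hat - rev(q)  # (2*theta_hat - upper, 2*theta_hat - lower)
percentile_ci
basic_ci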
}
\references{
\insertAllCited{}
}
\seealso{
\code{\link{lsmi}}, \code{\link{lsmi_union}}, \code{\link{boot_dd}}, \code{\link{boot_ci}}
}
