% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/p_linreg_yerrors.R
\name{p_linreg_yerrors}
\alias{p_linreg_yerrors}
\title{Linear Regression of Y vs. Covariates with Y Measured in Pools and
(Potentially) Subject to Additive Normal Errors}
\usage{
p_linreg_yerrors(g, ytilde, x = NULL, errors = "processing",
  estimate_var = TRUE, start_nonvar_var = c(0.01, 1),
  lower_nonvar_var = c(-Inf, 1e-04), upper_nonvar_var = c(Inf, Inf),
  nlminb_list = list(control = list(trace = 1, eval.max = 500, iter.max =
  500)), hessian_list = list(method.args = list(r = 4)))
}
\arguments{
\item{g}{Numeric vector with pool sizes, i.e. number of members in each pool.}

\item{ytilde}{Numeric vector (or list of numeric vectors, if some pools have
replicates) with poolwise sum \code{Ytilde} values.}

\item{x}{Numeric matrix with poolwise \strong{\code{X}} values (if any), with
one row for each pool. Can be a vector if there is only 1 covariate.}

\item{errors}{Character string specifying the errors that \code{Y} is subject
to. Choices are \code{"neither"}, \code{"processing"} for processing error
only, \code{"measurement"} for measurement error only, and \code{"both"}.}

\item{estimate_var}{Logical value for whether to return variance-covariance
matrix for parameter estimates.}

\item{start_nonvar_var}{Numeric vector of length 2 specifying starting value
for non-variance terms and variance terms, respectively.}

\item{lower_nonvar_var}{Numeric vector of length 2 specifying lower bound for
non-variance terms and variance terms, respectively.}

\item{upper_nonvar_var}{Numeric vector of length 2 specifying upper bound for
non-variance terms and variance terms, respectively.}

\item{nlminb_list}{List of arguments to pass to \code{\link[stats]{nlminb}}
for log-likelihood maximization.}

\item{hessian_list}{List of arguments to pass to
\code{\link[numDeriv]{hessian}}.}
}
\value{
List containing:
\enumerate{
\item Numeric vector of parameter estimates.
\item Variance-covariance matrix (if \code{estimate_var = TRUE}).
\item Returned \code{\link[stats]{nlminb}} object from maximizing the
log-likelihood function.
\item Akaike information criterion (AIC).
}
}
\description{
Assumes outcome given covariates is a normal-errors linear regression. Pooled
outcome measurements can be assumed precise or subject to additive normal
processing error and/or measurement error. Replicates are supported.
}
\details{
The individual-level model of interest for Y|\strong{X} is:

Y = beta_0 + \strong{beta_x}^T \strong{X} + e, e ~ N(0, sigsq)

The implied model for summed Y*|\strong{X*} in a pool with g members is:

Y* = g beta_0 + \strong{beta_x}^T \strong{X*} + e*, e* ~ N(0, g sigsq)

The assay targets Ybar, the mean Y value for each pool, from which the sum Y*
can be calculated as Y* = g Ybar. But the Ybar's may be subject to processing
error and/or measurement error. Suppose Ybartilde is the imprecise version of
Ybar from the assay. If both errors are present, the assumed error structure
is:

Ybartilde = Ybar + e_p I(g > 1) + e_m, e_p ~ N(0, sigsq_p),
e_m ~ N(0, sigsq_m)

with the processing error e_p and measurement error e_m assumed independent
of each other. This motivates a maximum likelihood analysis for estimating
\strong{theta} = (beta_0, \strong{beta_x}^T)^T based on observed
(Ytilde*, \strong{X}*) values, where Ytilde* = g Ytildebar.
}
\examples{
# Load dataset containing data frame with (g, X1*, X2*, Y*, Ytilde*) values
# for 500 pools each of size 1, 2, and 3, and list of Ytilde values where 20
# of the single-specimen pools have replicates. Ytilde values are affected by
# processing error and measurement error; true parameter values are
# beta_0 = 0.25, beta_x1 = 0.5, beta_x2 = 0.25, sigsq = 1.
data(dat_p_linreg_yerrors)
dat <- dat_p_linreg_yerrors$dat
reps <- dat_p_linreg_yerrors$reps

# Fit Ytilde* vs. (X1*, X2*) ignoring errors in Ytilde (leads to loss of
# precision and overestimated sigsq, but no bias).
fit.naive <- p_linreg_yerrors(
  g = dat$g,
  y = dat$y,
  x = dat[, c("x1", "x2")],
  errors = "neither"
)
fit.naive$theta.hat

# Account for errors in Ytilde*, without using replicates
fit.corrected.noreps <- p_linreg_yerrors(
  g = dat$g,
  y = dat$ytilde,
  x = dat[, c("x1", "x2")],
  errors = "both"
)
fit.corrected.noreps$theta.hat

# Account for errors in Ytilde*, incorporating the 20 replicates
fit.corrected.reps <- p_linreg_yerrors(
  g = dat$g,
  y = reps,
  x = dat[, c("x1", "x2")],
  errors = "both"
)
fit.corrected.reps$theta.hat

# In this trial, incorporating replicates resulted in much better estimates
# of sigsq (truly 1), sigsq_p (truly 0.4), and sigsq_m (truly = 0.2) but very
# similar regression coefficient estimates.
fit.corrected.noreps$theta.hat
fit.corrected.reps$theta.hat


}
\references{
Schisterman, E.F., Vexler, A., Mumford, S.L. and Perkins, N.J. (2010) "Hybrid
pooled-unpooled design for cost-efficient measurement of biomarkers."
\emph{Stat. Med.} \strong{29}(5): 597--613.
}
