% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/matrixNAneighbourImpute.R
\name{matrixNAneighbourImpute}
\alias{matrixNAneighbourImpute}
\title{Imputation of NA-values based on non-NA replicates}
\usage{
matrixNAneighbourImpute(
  dat,
  gr,
  imputMethod = "mode2",
  retnNA = TRUE,
  avSdH = c(0.1, 0.5),
  NAneigLst = NULL,
  plotHist = c("hist", "mode"),
  xLab = NULL,
  xLim = NULL,
  yLab = NULL,
  yLim = NULL,
  tit = NULL,
  figImputDetail = TRUE,
  seedNo = NULL,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
}
\arguments{
\item{dat}{(matrix or data.frame) main data (may contain \code{NA})}

\item{gr}{(character or factor) grouping of columns of 'dat', replicate association}

\item{imputMethod}{(character) choose the imputation method (may be 'mode2' (default), 'mode1', 'datQuant', 'modeAdopt' or 'informed')}

\item{retnNA}{(logical) decide whether only the \code{NA}-substituted data should be returned, or whether a list with \code{$data}, \code{$nNA}, \code{$NAneighbour} and \code{$randParam} should be returned}

\item{avSdH}{(numeric, length=2) population characteristics 'high' (mean and sd) for >1 \code{NA}-neighbours (per line)}

\item{NAneigLst}{(list) option for repeated rounds of imputations: list of \code{NA}-neighbour values can be furnished for slightly faster processing}

\item{plotHist}{(character or logical) decide if a supplemental figure with a histogram should be drawn; the details 'hist', 'quant' (display quantile of original data) and 'mode' (display mode of original data) can be chosen explicitly}

\item{xLab}{(character) label on x-axis on plot}

\item{xLim}{(numeric, length=2) custom x-axis limits}

\item{yLab}{(character) label on y-axis on plot}

\item{yLim}{(numeric, length=2) custom y-axis limits}

\item{tit}{(character) title on plot}

\item{figImputDetail}{(logical) display details about data (number of NAs) and imputation in graph (min number of NA-neighbours per protein and group, quantile to model, mean and sd of imputed)}

\item{seedNo}{(integer) seed-value for normal random values}

\item{silent}{(logical) suppress messages}

\item{callFrom}{(character) allow easier tracking of messages produced}

\item{debug}{(logical) supplemental messages for debugging}
}
\value{
This function returns a list with \code{$data} .. matrix of data where \code{NA} are replaced by imputed values, \code{$nNA} .. number of \code{NA} by group, \code{$randParam} .. parameters used for making random data
}
\description{
It is assumed that \code{NA}-values appear in data when quantitation values are very low (as is the case, e.g., in quantitative shotgun proteomics).
Here, the concept of (technical) replicates is used to investigate what kind of values appear in the other replicates next to \code{NA}-values for the same line/protein.
Groups of replicate samples are defined via argument \code{gr}, which describes the columns of \code{dat}.
Then, each line is inspected to gather \code{NA}-neighbour values (i.e. those values where \code{NA}s and regular measures are observed at the same time within a group).
For example, consider a line containing a set of 4 replicates for a given group. If 2 of them are \code{NA}-values, the remaining 2 non-\code{NA}-values will be considered as \code{NA}-neighbours.
Ultimately, the aim is to replace all \code{NA}-values by values drawn from a normal distribution resembling their respective \code{NA}-neighbours.
}
\details{
By default a histogram is plotted showing the initial, imputed and final distributions, to check the global hypothesis that \code{NA}-values arose
from very low measurements and to appreciate the impact of the imputed values on the overall final distribution.


There are a number of experimental settings where low measurements may be reported as \code{NA}.
Sometimes an arbitrarily defined baseline (as 'zero') may cause values found below it to be reported as \code{NA} or as 0 (in the case of MaxQuant).
In quantitative proteomics (DDA-mode) the presence of numerous high-abundance peptides means that a number of less
intense MS-peaks don't get identified properly and will then be reported as \code{NA} in the respective samples,
while the same peptides may be correctly identified and quantified in other (replicate) samples.
So, if a given protein/peptide gets properly quantified in some replicate samples but reported as \code{NA} in other replicate samples,
one may speculate that values similar to those of the successful quantifications may have occurred.
Thus, imputation of \code{NA}-values may be done on the basis of \code{NA}-neighbours.



When extracting \code{NA}-neighbours, a slightly more focussed approach is checked, too, the 2-\code{NA}-neighbours: if a set of replicates for a given protein
contains at least 2 non-\code{NA}-values (instead of just one), it will be considered as a (min) 2-\code{NA}-neighbour as well as a regular \code{NA}-neighbour.
If >300 of these (min) 2-\code{NA}-neighbours are found, they will be used instead of the regular \code{NA}-neighbours.
For creating a collection of normal random values one may directly use the mode of the \code{NA}-neighbours (or 2-\code{NA}-neighbours, if >300 such values are available).
To do so, the first value of argument \code{avSdH} must be set to \code{NA}. Otherwise, the first value of \code{avSdH} will be used as quantile of all data to define the mean
for the imputed data (i.e. as \code{quantile(dat, avSdH[1], na.rm=TRUE)}). The sd for generating normal random values will be taken from the sd of all \code{NA}-neighbours (or 2-\code{NA}-neighbours)
multiplied by the second value in argument \code{avSdH} (or \code{avSdH}, if >300 2-\code{NA}-neighbours), since the sd of the \code{NA}-neighbours is usually quite high.
In extremely rare cases it may happen that no \code{NA}-neighbours are found (i.e. wherever \code{NA}s occur, all replicates of the group are \code{NA}).
Then, this function replaces \code{NA}-values based on the normal random values obtained as described above.
}
\examples{
set.seed(2013)
datT6 <- matrix(round(rnorm(300)+3,1), ncol=6, dimnames=list(paste("li",1:50,sep=""),
  letters[19:24]))
datT6 <- datT6 +matrix(rep(1:nrow(datT6), ncol(datT6)), ncol=ncol(datT6))
datT6[6:7, c(1,3,6)] <- NA
datT6[which(datT6 < 11 & datT6 > 10.5)] <- NA
datT6[which(datT6 < 6 & datT6 > 5)] <- NA
datT6[which(datT6 < 4.6 & datT6 > 4)] <- NA
datT6b <- matrixNAneighbourImpute(datT6, gr=gl(2,3))
head(datT6b$data)
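
## Illustrative sketch only (base R, NOT the internal code of this function):
## how NA-neighbours and the parameters for the normal random values could be
## derived by hand, following the text in Description and Details.
## The objects 'toy', 'grToy', 'naNeigh', 'imputMean', 'imputSd' and 'randVal'
## are hypothetical names introduced for this illustration.
toy <- matrix(c(NA, 5.1, 5.3,  8.0, 8.2, 8.1,
                6.0, 6.2, 6.1,  NA, NA, 7.4,
                NA, NA, NA,    9.0, 9.1, NA), nrow=3, byrow=TRUE)
grToy <- gl(2, 3)                          # 2 groups of 3 replicate columns
naNeigh <- unlist(lapply(levels(grToy), function(g) {
  sub <- toy[, which(grToy == g), drop=FALSE]
  nNA <- rowSums(is.na(sub))
  mixed <- nNA > 0 & nNA < ncol(sub)       # lines with both NA and measured values
  x <- sub[mixed, , drop=FALSE]
  x[!is.na(x)]                             # the measured values next to NAs
}))
sort(naNeigh)                              # line 3 of group 1 is all NA and contributes nothing
## mean as quantile of all data, sd as scaled sd of the NA-neighbours (see Details)
avSdHToy <- c(0.1, 0.5)
imputMean <- quantile(toy, avSdHToy[1], na.rm=TRUE)
imputSd <- sd(naNeigh) * avSdHToy[2]
set.seed(2013)
randVal <- rnorm(sum(is.na(toy)), mean=imputMean, sd=imputSd)
length(randVal) == sum(is.na(toy))         # one random value per NA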
}
\seealso{
this function gets used by \code{\link{testRobustToNAimputation}}; estimation of mode \code{\link[wrMisc]{stableMode}}; detection of NAs \code{\link[stats]{na.fail}}
}
