\name{mzpart}
\alias{mzpart}

\title{
Divisive partitioning of raw LC-HRMS measurements
}

\description{
Divisive recursive partitioning of LC-HRMS measurements.
Preparatory step for \code{\link[enviPick]{mzclust}} and \code{\link[enviPick]{mzpick}};
alternative to \code{\link[enviPick]{mzagglom}}.
Requires an MSlist initialized by \code{\link[enviPick]{readMSdata}} as input.
}

\usage{
	mzpart(MSlist, dmzgap = 10, drtgap = 500, ppm = TRUE, 
	minpeak = 4, peaklimit = 2500, cutfrac = 0.1, drtsmall = 50, 
	progbar = FALSE, stoppoints = 2e+05)
}

\arguments{  
	\item{MSlist}{MSlist generated by \code{\link[enviPick]{readMSdata}}}
	\item{dmzgap}{m/z gap width for partitioning}
	\item{drtgap}{RT gap width for partitioning}
	\item{ppm}{\code{dmzgap} given in ppm (TRUE) or as absolute value (FALSE)?}
	\item{minpeak}{Minimum number of measurements in a partition}
	\item{peaklimit}{Maximum number of measurements in a partition}
	\item{cutfrac}{Fraction of low density measurements to be discarded}
	\item{drtsmall}{RT tolerance used to estimate density}
	\item{progbar}{For debugging, otherwise ignore; see section Warning}
	\item{stoppoints}{For debugging, otherwise ignore; see section Warning}
}



\details{
This function recursively searches for gaps in retention time (RT) and m/z in the LC-HRMS measurements and thereby partitions (and re-sorts) the matrix contained in MSlist[[4]].
If neither partitioning by RT nor by m/z yields a partition of <= \code{peaklimit} measurements, a fraction \code{cutfrac} of the
lowest-density measurements is discarded and the partitioning procedure is resumed. Measurement-wise density is based on a Gaussian kernel density estimate
scaled to \code{dmzgap} and \code{drtsmall}, i.e., to the local neighbourhood of each measurement.
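
As a rough illustration of \code{ppm = TRUE}, and assuming the usual definition of parts per million (the exact internal conversion is not stated here), a relative gap width corresponds to an absolute m/z width of
\deqn{\Delta m/z = m/z \times \mathrm{dmzgap} / 10^6}{delta(m/z) = m/z * dmzgap / 1e6}
so that \code{dmzgap = 10} at m/z 500 would amount to 0.005 m/z units.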

Partitioning is necessary to speed up the clustering procedure of \code{\link[enviPick]{mzclust}}. Hence, there is a trade-off: 
large values of \code{peaklimit} lead to faster execution of
\code{\link[enviPick]{mzpart}} but to slower computation of \code{\link[enviPick]{mzclust}}, and vice versa. 
}

\section{Note}{
Do not set \code{minpeak} larger than its counterpart in \code{\link[enviPick]{mzclust}} or \code{\link[enviPick]{mzpick}}. 
Alternatively, use \code{\link[enviPick]{enviPickwrap}} to adjust all function arguments in one place.
}

\value{
	Returns the input MSlist, with the following entries updated:

\item{Parameters}{MSlist[[2]]: saves the parameter settings.}
\item{Scans}{MSlist[[4]]: matrix with raw measurements and tags resorted for partitions.}
\item{Partition_Index}{MSlist[[5]]: Index assigning partitions to sections of the raw measurements in MSlist[[4]]; required for fast (random) access.}

}

\author{Martin Loos}

\section{Warning}{
Despite optimized code, this function may run for an intolerably long time or run out of memory if (a) the parameters are set wrongly, (b) the .mzML/.mzXML file was not centroided or
(c) the underlying data is inadequate for this peak picker. 
With regard to (a), do not assume gaps to be larger than those actually present. Instead, use \code{\link[enviPick]{plotMSlist}} to inspect the 
data contained in MSlist after import with \code{\link[enviPick]{readMSdata}};
set \code{progbar=TRUE} to monitor where the function fails. Once settled, set \code{progbar=FALSE} for faster execution.

To avoid running out of memory, \code{stoppoints} sets the maximum number of measurements that can be handled in the routines that delete
those of lowest density (in cases where \code{peaklimit} cannot be reached by partitioning with \code{dmzgap} and \code{drtgap} alone). 
If the number of measurements exceeds \code{stoppoints}, execution aborts. 
}

\seealso{\code{\link[enviPick]{mzclust}}}
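
\examples{
## A minimal sketch of the workflow described above, not a tested recipe.
## Assumptions: the file path is hypothetical and must be replaced by a
## centroided .mzML/.mzXML file of your own; the argument values are simply
## the defaults listed under Usage, not tuned recommendations.
\dontrun{
library(enviPick)

# Import the raw measurements (hypothetical file path)
MSlist <- readMSdata("path/to/centroided_file.mzXML")

# Inspect the data first to judge realistic gap widths (see section Warning)
plotMSlist(MSlist)

# Partition; progbar = TRUE while settings are still being explored
MSlist <- mzpart(MSlist, dmzgap = 10, drtgap = 500, ppm = TRUE,
	minpeak = 4, peaklimit = 2500, cutfrac = 0.1, drtsmall = 50,
	progbar = TRUE, stoppoints = 2e+05)

# Partition index described under Value
head(MSlist[[5]])
}
}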








