% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/expand.R
\name{expand}
\alias{expand}
\title{Expand an existing classification tree.}
\usage{
expand(tree, x, clades = "0", refine = "Viterbi", iterations = 50,
  nstart = 20, minK = 2, maxK = 2, minscore = 0.9, probs = 0.5,
  retry = TRUE, resize = TRUE, maxsize = max(sapply(x, length)),
  recursive = TRUE, cores = 1, quiet = TRUE, ...)
}
\arguments{
\item{tree}{an object of class \code{"insect"}.}

\item{x}{an object of class \code{"DNAbin"} representing a list of
DNA sequences to be used as the training data for the tree-learning process.
All sequences should be from the same genetic region of interest
and be globally alignable (i.e. without unjustified end-gaps).
The sequences must have "names" attributes that include
taxonomic ID numbers corresponding with those in the taxonomy
database \code{db} (separated from the sequence ID by a "|" character).
For example: "AF296347|30962", "AF296346|8022", "AF296345|8017", etc.
See \code{\link{searchGB}} for more details on creating the reference
sequence database.}

\item{clades}{a vector of character strings giving the binary indices
matching the labels of the nodes that are to be expanded.
Defaults to "0", meaning all subclades are expanded.
See below for further details on clade indexing.}

\item{refine}{character string giving the iterative model refinement
method to be used in the partitioning process. Valid options are
\code{"Viterbi"} (Viterbi training; the default option) and
\code{"BaumWelch"} (a modified version of the Expectation-Maximization
algorithm).}

\item{iterations}{integer giving the maximum number of training-classification
iterations to be used in the splitting process.
Note that this is not necessarily the same as the number of Viterbi training
or Baum-Welch iterations to be used in model training, which can be set
using the argument \code{"maxiter"} (eventually passed to
\code{\link[aphid]{train}} via the dots argument "...").}

\item{nstart}{integer. The number of random starting sets to be chosen
for initial k-means assignment of sequences to groups. Defaults to 20.}

\item{minK}{integer. The minimum number of furcations allowed at each inner
node of the tree. Defaults to 2 (all inner nodes are bifurcating).}

\item{maxK}{integer. The maximum number of furcations allowed at each inner
node of the tree. Defaults to 2 (all inner nodes are bifurcating).}

\item{minscore}{numeric between 0 and 1. The minimum acceptable value
for the \emph{n}th percentile of Akaike weights (where \emph{n} is
the value given in \code{"probs"}) for the node to be split and the
recursion process to continue.
At any given node, if the \emph{n}th percentile of Akaike weights
falls below this threshold, the recursion process for the node will
terminate. As an example, if \code{minscore = 0.9} and
\code{probs = 0.5} (the default settings), and after generating two
candidate PHMMs to occupy the candidate subnodes the median
of Akaike weights is 0.89, the splitting process will
terminate and the function will simply return the unsplit root node.}

\item{probs}{numeric between 0 and 1. The percentile of Akaike weights
to test against the minimum score threshold given in \code{"minscore"}.}

\item{retry}{logical indicating whether failure to split a node based on
the criteria outlined in \code{minscore} and \code{probs} should prompt a
second attempt with different initial groupings. These groupings are based on
maximum k-mer frequencies rather than k-means division, which can give
suboptimal groupings when the cluster sizes are different (due to
the up-weighting of larger clusters in the k-means algorithm).}

\item{resize}{logical indicating whether the models should be free to
change size during the training process or if the number of modules
should be fixed. Defaults to TRUE. Only applicable if
\code{refine = "Viterbi"}.}

\item{maxsize}{integer giving the upper bound on the number of modules
in the PHMMs. Defaults to the length of the longest sequence in \code{x}.
If NULL, no maximum size is enforced.}

\item{recursive}{logical indicating whether the splitting process
should continue recursively until the discrimination criteria
are not met (TRUE; default), or whether a single split should
take place at each of the nodes specified in \code{clades}.}

\item{cores}{integer giving the number of CPUs to use
when training the models (only applicable if
\code{refine = 'Viterbi'}). Defaults to 1.
This argument may alternatively be a 'cluster' object,
in which case it is the user's responsibility to close the socket
connection at the conclusion of the operation,
e.g. by running \code{parallel::stopCluster(cores)}.
The string 'autodetect' is also accepted, in which case the maximum
number of cores to use is one less than the total number of cores
available.}

\item{quiet}{logical indicating whether feedback printed to the console
should be suppressed. Note that the output can be verbose when
\code{quiet = FALSE}.}

\item{...}{further arguments to be passed on to \code{\link[aphid]{train}}.}
}
\value{
an object of class \code{"insect"}.
}
\description{
This function is used to grow an existing classification tree, typically
  using more relaxed parameter settings than those used when the tree was
  created, or when fine-scale control over the tree-learning operation
  is required.
  Note that the same reference sequence database used to
  build the original tree is required.
}
\details{
The clade indexing system used here is based on character strings,
  where "0" refers to the root node,
  "01" is the first child node, "02" is the second child node,
  "011" is the first child node of the first child node, etc.
  Note that this means each node cannot have more than 9 child nodes.
}
\examples{
\donttest{
  data(whales)
  data(whale_taxonomy)
  ## split the first node
  tree <- learn(whales, db = whale_taxonomy, recursive = FALSE, quiet = FALSE)
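  ## Hedged sketch, not part of the original examples: the same kind of call
  ## with a more relaxed splitting threshold and a single, non-recursive split.
  ## The values 0.8 and 0.25 are illustrative assumptions, not recommendations,
  ## and the clade index simply mirrors the call that follows. The result is
  ## stored under a separate (hypothetical) name so that 'tree' is unchanged.
  relaxed <- expand(tree, whales, clades = "1", minscore = 0.8, probs = 0.25,
                    recursive = FALSE, quiet = TRUE)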
  ## expand only the first clade
  tree <- expand(tree, whales, clades = "1", quiet = TRUE)
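  ## Hedged illustration of the index strings described in Details, using
  ## base R only: dropping the last character of a node index gives the
  ## index of its parent, and the number of characters minus one gives the
  ## depth below the root. The variable names here are hypothetical.
  node <- "011"
  parent <- substr(node, 1, nchar(node) - 1)
  depth <- nchar(node) - 1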
 }
}
\seealso{
\code{\link{learn}}.
}
\author{
Shaun Wilkinson
}
