% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/frame.R
\name{h2o.group_by}
\alias{h2o.group_by}
\title{Group and Apply by Column}
\usage{
h2o.group_by(data, by, ..., gb.control = list(na.methods = NULL, col.names =
  NULL))
}
\arguments{
\item{data}{an H2OFrame object.}

\item{by}{a list of column names.}

\item{gb.control}{a list specifying how to handle \code{NA} values in the dataset as well as how
to name output columns. The \code{NA} handling is specified using the \code{na.methods} element
and the output column names using the \code{col.names} element. See \code{Details:} for more help.}

\item{\dots}{any supported aggregate function. See \code{Details:} for more help.}
}
\value{
Returns a new H2OFrame object with columns equivalent to the number of
        groups created
}
\description{
Performs a group by and apply similar to ddply.
}
\details{
In the case of \code{na.methods} within \code{gb.control}, there are three possible settings.
\code{"all"} will include \code{NAs} in computation of functions. \code{"rm"} will completely
remove all \code{NA} fields. \code{"ignore"} will remove \code{NAs} from the numerator but keep
the rows for computational purposes. If a list shorter than the number of column groups is
supplied, the list will be padded by \code{"ignore"}.

Note that to specify a list of column names in the \code{gb.control} list, you must add the
\code{col.names} argument. Similar to \code{na.methods}, \code{col.names} will pad the list with
the default column names if its length is less than the number of column groups supplied.

Supported functions include \code{nrow}. This function is required and accepts a string for the
name of the generated column. Other supported aggregate functions accept \code{col} and \code{na}
arguments for specifying columns and the handling of NAs (\code{"all"}, \code{"ignore"}, and
\code{"rm"}). Each of the following is calculated on each column specified in \code{col}, for
each group of a GroupBy object:
\itemize{
  \item \code{max} calculates the maximum;
  \item \code{mean} calculates the mean;
  \item \code{min} calculates the minimum;
  \item \code{mode} calculates the mode;
  \item \code{sd} calculates the standard deviation;
  \item \code{ss} calculates the sum of squares;
  \item \code{sum} calculates the sum;
  \item \code{var} calculates the variance.
}
If an aggregate is provided without a value (for example, as
\code{max} in \code{sum(col="X1", na="all").mean(col="X5", na="all").max()}), then it is assumed
that the aggregation should apply to all columns except the GroupBy columns. However, operations
will not be performed on String columns; those columns are skipped. Note again that
\code{nrow} is required and cannot be empty.
}
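\examples{
\dontrun{
# A minimal sketch, not an official example. It assumes a local H2O cluster
# can be started with h2o.init() and uses R's built-in iris data purely for
# illustration; the frame name iris.hex and the choice of columns are
# hypothetical. The aggregate call forms (nrow("..."), mean("...")) are an
# assumption and may differ between h2o releases.
library(h2o)
h2o.init()

# Copy the local data frame into the cluster as an H2OFrame.
iris.hex <- as.h2o(iris)

# Group by Species. nrow is the required aggregate; mean is one of the
# optional aggregates listed in Details. NA fields are removed via the
# na.methods entry of gb.control.
by.species <- h2o.group_by(data = iris.hex, by = "Species",
                           nrow("Species"), mean("Sepal.Length"),
                           gb.control = list(na.methods = "rm"))
print(by.species)
}
}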

