% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rows_distinct.R
\name{rows_distinct}
\alias{rows_distinct}
\title{Verify that row data are distinct}
\usage{
rows_distinct(
  x,
  columns = NULL,
  preconditions = NULL,
  actions = NULL,
  brief = NULL,
  active = TRUE
)
}
\arguments{
\item{x}{A data frame, tibble, or an agent object of class \code{ptblank_agent}.}

\item{columns}{The column (or a set of columns, provided as a character
vector) to which this validation should be applied.}

\item{preconditions}{Expressions used for mutating the input table before
proceeding with the validation. This is ideally supplied as a one-sided R
formula using a leading \code{~}. In the formula representation, \code{tbl} serves
as the input data table to be transformed (e.g.,
\code{~ tbl \%>\% dplyr::mutate(col = col + 10)}). A series of expressions can be
used by enclosing the set of statements with \code{{ }} but note that the \code{tbl}
object must be ultimately returned.}

\item{actions}{A list containing threshold levels so that the validation step
can react accordingly when exceeding the set levels. This is to be created
with the \code{\link[=action_levels]{action_levels()}} helper function.}

\item{brief}{An optional, text-based description for the validation step.}

\item{active}{A logical value indicating whether the validation step should
be active. If the step function is working with an agent, \code{FALSE} will make
the validation step inactive (still reporting its presence and keeping
indexes for the steps unchanged). If the step function will be operating
directly on data, then any step with \code{active = FALSE} will simply pass the
data through with no validation whatsoever. The default for this is \code{TRUE}.}
}
\value{
Either a \code{ptblank_agent} object or a table object, depending on what
was passed to \code{x}.
}
\description{
The \code{rows_distinct()} validation step function checks whether row values
(optionally constrained to a selection of specified \code{columns}) are, when
taken as a complete unit, distinct from all other units in the table. This
function can be used directly on a data table or with an \emph{agent} object
(technically, a \code{ptblank_agent} object). This validation step will operate
over the number of test units that is equal to the number of rows in the
table (after any \code{preconditions} have been applied).
}
\details{
We can specify the constraining column names in quotes, in \code{vars()}, and with
the following \strong{tidyselect} helper functions: \code{starts_with()},
\code{ends_with()}, \code{contains()}, \code{matches()}, and \code{everything()}.

Having table \code{preconditions} means \strong{pointblank} will mutate the table just
before interrogation. The mutation is isolated to the validation steps
produced by this validation step function. Using \strong{dplyr} code is suggested
here since the statements can be translated to SQL if necessary. The code is
to be supplied as a one-sided \strong{R} formula (using a leading \code{~}). In the
formula representation, the obligatory \code{tbl} variable will serve as the input
data table to be transformed (e.g.,
\code{~ tbl \%>\% dplyr::mutate(col_a = col_b + 10)}). A series of expressions can be
used by enclosing the set of statements with \code{{ }} but note that the \code{tbl}
variable must be ultimately returned.

Often, we will want to specify \code{actions} for the validation. This argument,
present in every validation step function, takes a specially-crafted list
object that is best produced by the \code{\link[=action_levels]{action_levels()}} function. Read that
function's documentation for the lowdown on how to create reactions to
above-threshold failure levels in validation. The basic gist is that you'll
want at least a single threshold level (specified as either the fraction of
test units failed, or an absolute value), often using the \code{warn_at} argument.
This is especially true when \code{x} is a table object because, otherwise,
nothing happens. For the \verb{col_vals_*()}-type functions, using
\code{action_levels(warn_at = 0.25)} or \code{action_levels(stop_at = 0.25)} are good
choices depending on the situation (the first produces a warning when a
quarter of the total test units fails, the other \code{stop()}s at the same
threshold level).

Want to describe this validation step in some detail? Keep in mind that this
is only useful if \code{x} is an \emph{agent}. If that's the case, \code{brief} the agent
with some text that fits. Don't worry if you don't want to do it. The
\emph{autobrief} protocol kicks in when \code{brief = NULL} and a simple brief will
then be automatically generated.
}
\section{Function ID}{

2-15
}

\examples{
# Create a simple table with three
# columns of numerical values
tbl <-
  dplyr::tibble(
    a = c(5, 7, 6, 5, 8, 7),
    b = c(7, 1, 0, 0, 8, 3),
    c = c(1, 1, 1, 3, 3, 3)
  )

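# Illustrative sketch in base R (not
# the pointblank implementation) of
# what distinct rows means: flag every
# row whose combination of values
# occurs more than once in a small,
# made-up data frame
dup_df <-
  data.frame(
    a = c(5, 7, 5),
    b = c(7, 1, 7)
  )

is_dup <-
  duplicated(dup_df) |
  duplicated(dup_df, fromLast = TRUE)

# The first and third rows share the
# same values, so only those two are
# flagged
is_dup
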
# Validate that when considering only
# data in columns `a` and `b`, there
# are no duplicate rows (i.e., all
# rows are distinct)
agent <-
  create_agent(tbl = tbl) \%>\%
  rows_distinct(vars(a, b)) \%>\%
  interrogate()

# Determine if this validation
# step has passed by using
# `all_passed()`
all_passed(agent)
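
# Sketch of using the step directly on
# the table, assuming pointblank is
# loaded and `tbl` is the table created
# above; the data is passed through and
# a warning is expected only if a
# quarter or more of the rows fail
# (no `a`/`b` pair repeats here)
tbl_checked <-
  tbl \%>\%
  rows_distinct(
    vars(a, b),
    actions = action_levels(warn_at = 0.25)
  )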

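# Sketch of using `preconditions`,
# assuming the `tbl`-based formula
# described in the details: keep only
# the rows where `c` is 1 and then
# check that the values in `a` are
# distinct (the remaining values of
# `a` are 5, 7, and 6)
agent_pre <-
  create_agent(tbl = tbl) \%>\%
  rows_distinct(
    vars(a),
    preconditions = ~ tbl \%>\% dplyr::filter(c == 1)
  ) \%>\%
  interrogate()

all_passed(agent_pre)
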
}
\seealso{
Other Validation Step Functions: 
\code{\link{col_exists}()},
\code{\link{col_is_character}()},
\code{\link{col_is_date}()},
\code{\link{col_is_factor}()},
\code{\link{col_is_integer}()},
\code{\link{col_is_logical}()},
\code{\link{col_is_numeric}()},
\code{\link{col_is_posix}()},
\code{\link{col_schema_match}()},
\code{\link{col_vals_between}()},
\code{\link{col_vals_equal}()},
\code{\link{col_vals_gte}()},
\code{\link{col_vals_gt}()},
\code{\link{col_vals_in_set}()},
\code{\link{col_vals_lte}()},
\code{\link{col_vals_lt}()},
\code{\link{col_vals_not_between}()},
\code{\link{col_vals_not_equal}()},
\code{\link{col_vals_not_in_set}()},
\code{\link{col_vals_not_null}()},
\code{\link{col_vals_null}()},
\code{\link{col_vals_regex}()},
\code{\link{conjointly}()}
}
\concept{Validation Step Functions}
