% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tbl_svysummary.R
\name{tbl_svysummary}
\alias{tbl_svysummary}
\title{Create a table of summary statistics from a survey object}
\usage{
tbl_svysummary(
  data,
  by = NULL,
  label = NULL,
  statistic = NULL,
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = NULL,
  missing_text = NULL,
  sort = NULL,
  percent = NULL,
  include = everything()
)
}
\arguments{
\item{data}{A survey object created with \code{survey::svydesign()}}

\item{by}{A column name (quoted or unquoted) in \code{data}.
Summary statistics will be calculated separately for each level of the \code{by}
variable (e.g. \code{by = trt}). If \code{NULL}, summary statistics
are calculated using all observations. To stratify a table by two or more
variables, use \code{tbl_strata()}}

\item{label}{List of formulas specifying variable labels,
e.g. \code{list(age ~ "Age", stage ~ "Path T Stage")}.  If a
variable's label is not specified here, the label attribute
(\code{attr(data$age, "label")}) is used.  If the
label attribute is \code{NULL}, the variable name will be used.}

\item{statistic}{List of formulas specifying types of summary statistics to
display for each variable.  The default is
\code{list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}\%)")}.
See below for details.}

\item{digits}{List of formulas specifying the number of decimal
places to round summary statistics. If not specified,
\code{tbl_svysummary()} guesses an appropriate number of decimals to round statistics.
When multiple statistics are displayed for a single variable, supply a vector
rather than an integer.  For example, if the
statistic being calculated is \code{"{mean} ({sd})"} and you want the mean rounded
to 1 decimal place and the SD to 2, use \code{digits = list(age ~ c(1, 2))}. You
may also pass a styling function: \code{digits = age ~ style_sigfig}}

\item{type}{List of formulas specifying variable types. Accepted values
are \code{c("continuous", "continuous2", "categorical", "dichotomous")},
e.g. \code{type = list(age ~ "continuous", female ~ "dichotomous")}.
If the type is not specified for a variable, the function
will default to an appropriate summary type. See below for details.}

\item{value}{List of formulas specifying the value to display for dichotomous
variables. gtsummary selectors, e.g. \code{all_dichotomous()}, cannot be used
with this argument. See below for details.}

\item{missing}{Indicates whether to include counts of \code{NA} values in the table.
Allowed values are \code{"no"} (never display NA counts),
\code{"ifany"} (display NA counts only if there are any NA values), and \code{"always"}
(include an NA count row for all variables). Default is \code{"ifany"}.}

\item{missing_text}{String to display for count of missing observations.
Default is \code{"Unknown"}.}

\item{sort}{List of formulas specifying the type of sorting to perform for
categorical data. Options are \code{"frequency"}, where results are sorted in
descending order of frequency, and \code{"alphanumeric"},
e.g. \code{sort = list(everything() ~ "frequency")}}

\item{percent}{Indicates the type of percentage to return. Must be one of
\code{"column"}, \code{"row"}, or \code{"cell"}. Default is \code{"column"}.}

\item{include}{Variables to include in the summary table. Default is \code{everything()}}
}
\value{
A \code{tbl_svysummary} object
}
\description{
The \code{tbl_svysummary()} function calculates descriptive statistics for
continuous, categorical, and dichotomous variables, taking into account survey weights and design.
It is similar to \code{\link[=tbl_summary]{tbl_summary()}}.
}
\section{statistic argument}{

The statistic argument specifies the statistics presented in the table. The
input is a list of formulas that specify the statistics to report. For example,
\code{statistic = list(age ~ "{mean} ({sd})")} would report the mean and
standard deviation for age; \code{statistic = list(all_continuous() ~ "{mean} ({sd})")}
would report the mean and standard deviation for all continuous variables.
A statistic name that appears between curly brackets
will be replaced with the numeric statistic (see \link[glue:glue]{glue::glue}).

For categorical variables, the following statistics are available to display.
\itemize{
\item \code{{n}} frequency
\item \code{{N}} denominator, or cohort size
\item \code{{p}} formatted percentage
\item \code{{n_unweighted}} unweighted frequency
\item \code{{N_unweighted}} unweighted denominator
\item \code{{p_unweighted}} unweighted formatted percentage
}
For continuous variables, the following statistics are available to display.
\itemize{
\item \code{{median}} median
\item \code{{mean}} mean
\item \code{{sd}} standard deviation
\item \code{{var}} variance
\item \code{{min}} minimum
\item \code{{max}} maximum
\item \verb{\{p##\}} any integer percentile, where \verb{##} is an integer from 0 to 100
\item \code{{sum}} sum
}

Unlike \code{\link[=tbl_summary]{tbl_summary()}}, \code{tbl_svysummary()} does not accept a custom function as a statistic.

For both categorical and continuous variables, statistics on the number of
missing and non-missing observations and their proportions are available to
display.
\itemize{
\item \code{{N_obs}} total number of observations
\item \code{{N_miss}} number of missing observations
\item \code{{N_nonmiss}} number of non-missing observations
\item \code{{p_miss}} percentage of observations missing
\item \code{{p_nonmiss}} percentage of observations not missing
\item \code{{N_obs_unweighted}} unweighted total number of observations
\item \code{{N_miss_unweighted}} unweighted number of missing observations
\item \code{{N_nonmiss_unweighted}} unweighted number of non-missing observations
\item \code{{p_miss_unweighted}} unweighted percentage of observations missing
\item \code{{p_nonmiss_unweighted}} unweighted percentage of observations not missing
}

Note that for categorical variables, \code{{N_obs}}, \code{{N_miss}} and \code{{N_nonmiss}} refer
to the total number of observations, the number missing, and the number non-missing
in the denominator, not at each level of the categorical variable.
}

\section{Example Output}{

\if{html}{Example 1}

\if{html}{\figure{tbl_svysummary_ex1.png}{options: width=31\%}}

\if{html}{Example 2}

\if{html}{\figure{tbl_svysummary_ex2.png}{options: width=36\%}}
}

\section{type argument}{

The \code{tbl_svysummary()} function has four summary types:
\itemize{
\item \code{"continuous"} summaries are shown on a \emph{single row}. Most numeric
variables default to summary type continuous.
\item \code{"continuous2"} summaries are shown on \emph{2 or more rows}
\item \code{"categorical"} \emph{multi-line} summaries of nominal data. Character variables,
factor variables, and numeric variables with fewer than 10 unique levels default to
type categorical. To change a numeric variable to continuous that
defaulted to categorical, use \code{type = list(varname ~ "continuous")}
\item \code{"dichotomous"} categorical variables that are displayed on a \emph{single row},
rather than one row per level of the variable.
Variables coded as \code{TRUE}/\code{FALSE}, \code{0}/\code{1}, or \code{yes}/\code{no} are assumed to be dichotomous,
and the \code{TRUE}, \code{1}, and \code{yes} rows are displayed.
Otherwise, the value to display must be specified in the \code{value}
argument, e.g. \code{value = list(varname ~ "level to show")}
}
}

\section{select helpers}{

\href{https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html#select_helpers}{Select helpers}
from the \{tidyselect\} and \{gtsummary\} packages are available to
modify default behavior for groups of variables.
For example, by default continuous variables are reported with the median
and IQR.  To change all continuous variables to mean and standard deviation use
\code{statistic = list(all_continuous() ~ "{mean} ({sd})")}.

All columns with class logical are displayed as dichotomous variables showing
the proportion of events that are \code{TRUE} on a single row. To show both rows
(i.e. a row for \code{TRUE} and a row for \code{FALSE}) use
\code{type = list(where(is.logical) ~ "categorical")}.

The select helpers are available for use in any argument that accepts a list
of formulas (e.g. \code{statistic}, \code{type}, \code{digits}, \code{value}, \code{sort}).

Read more on the \link{syntax} used throughout the package.
}

\examples{
\dontshow{if (broom.helpers::.assert_package("survey", pkg_search = "gtsummary", boolean = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
# Example 1 ----------------------------------
# A simple weighted dataset
tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) \%>\%
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age))

# Example 2 ----------------------------------
# A dataset with a complex design
data(api, package = "survey")
tbl_svysummary_ex2 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) \%>\%
  tbl_svysummary(by = "both", include = c(api00, stype))
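
# Example 3 ----------------------------------
# A hedged sketch, not part of the original examples: it illustrates the
# statistic, label, and missing arguments described above. The label text
# is illustrative only. The apiclus1 data was loaded in Example 2.
tbl_svysummary_ex3 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) \%>\%
  tbl_svysummary(
    by = stype,
    include = c(api00, both),
    label = list(api00 ~ "API score (2000)"),
    statistic = list(
      # weighted mean and SD instead of the default median and quartiles
      all_continuous() ~ "{mean} ({sd})",
      # weighted count and percentage, followed by the unweighted count
      all_categorical() ~ "{n} ({p}\%), unweighted n = {n_unweighted}"
    ),
    missing = "no"
  )

# Example 4 ----------------------------------
# A hedged sketch of the type and value arguments: each two-level variable
# is shown on a single row, reporting the level named in value.
tbl_svysummary_ex4 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) \%>\%
  tbl_svysummary(
    include = c(Sex, Age),
    type = list(Sex ~ "dichotomous", Age ~ "dichotomous"),
    value = list(Sex ~ "Female", Age ~ "Child")
  )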
\dontshow{\}) # examplesIf}
}
\seealso{
Review \link[=syntax]{list, formula, and selector syntax} used throughout gtsummary

Other tbl_svysummary tools: 
\code{\link{add_n.tbl_summary}()},
\code{\link{add_overall}()},
\code{\link{add_p.tbl_svysummary}()},
\code{\link{add_q}()},
\code{\link{add_stat_label}()},
\code{\link{modify}},
\code{\link{separate_p_footnotes}()},
\code{\link{tbl_merge}()},
\code{\link{tbl_split}()},
\code{\link{tbl_stack}()},
\code{\link{tbl_strata}()}
}
\author{
Joseph Larmarange
}
\concept{tbl_svysummary tools}
