% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/lm_tidiers.R
\name{lm_tidiers}
\alias{augment.lm}
\alias{glance.lm}
\alias{lm_tidiers}
\alias{tidy.lm}
\title{Tidying methods for a linear model}
\usage{
\method{tidy}{lm}(x, conf.int = FALSE, conf.level = 0.95,
  exponentiate = FALSE, quick = FALSE, ...)

\method{augment}{lm}(x, data = stats::model.frame(x), newdata, type.predict,
  type.residuals, ...)

\method{glance}{lm}(x, ...)
}
\arguments{
\item{x}{lm object}

\item{conf.int}{whether to include a confidence interval}

\item{conf.level}{confidence level of the interval, used only if
\code{conf.int=TRUE}}

\item{exponentiate}{whether to exponentiate the coefficient estimates
and confidence intervals (typical for logistic regression)}

\item{quick}{whether to compute a smaller and faster version, containing
only the \code{term} and \code{estimate} columns.}

\item{...}{extra arguments (not used)}

\item{data}{Original data; defaults to extracting it from the model
with \code{stats::model.frame}}

\item{newdata}{If provided, performs predictions on the new data}

\item{type.predict}{Type of prediction to compute for a GLM; passed on to
\code{\link{predict.glm}}}

\item{type.residuals}{Type of residuals to compute for a GLM; passed on to
  \code{\link{residuals.glm}}}
}
\value{
All tidying methods return a \code{data.frame} without rownames.
The structure depends on the method chosen.

\code{tidy.lm} returns one row for each coefficient, with five columns:
  \item{term}{The term in the linear model being estimated and tested}
  \item{estimate}{The estimated coefficient}
  \item{std.error}{The standard error from the linear model}
  \item{statistic}{t-statistic}
  \item{p.value}{two-sided p-value}

If the linear model is an "mlm" object (multiple linear model), there is an
additional column:
  \item{response}{Which response column the coefficients correspond to
  (typically Y1, Y2, etc.)}

If \code{conf.int=TRUE}, it also includes columns for \code{conf.low} and
\code{conf.high}, computed with \code{\link{confint}}.

When \code{newdata} is not supplied, \code{augment.lm} returns
one row for each observation, with seven columns added to the original
data:
  \item{.hat}{Diagonal of the hat matrix}
  \item{.sigma}{Estimate of residual standard deviation when
    corresponding observation is dropped from model}
  \item{.cooksd}{Cook's distance, \code{\link{cooks.distance}}}
  \item{.fitted}{Fitted values of model}
  \item{.se.fit}{Standard errors of fitted values}
  \item{.resid}{Residuals}
  \item{.std.resid}{Standardised residuals}

(Some unusual "lm" objects, such as "rlm" from MASS, may omit
\code{.cooksd} and \code{.std.resid}.)

When \code{newdata} is supplied, \code{augment.lm} returns one row for each
observation, with three columns added to the new data:
  \item{.fitted}{Fitted values of model}
  \item{.se.fit}{Standard errors of fitted values}
  \item{.resid}{Residuals of fitted values on the new data}

\code{glance.lm} returns a one-row data.frame with the columns
  \item{r.squared}{The proportion of variance explained by the model}
  \item{adj.r.squared}{r.squared adjusted based on the degrees of freedom}
  \item{sigma}{The square root of the estimated residual variance}
  \item{statistic}{F-statistic}
  \item{p.value}{p-value from the F test, describing whether the full
  regression is significant}
  \item{df}{Degrees of freedom used by the coefficients}
  \item{logLik}{the data's log-likelihood under the model}
  \item{AIC}{the Akaike Information Criterion}
  \item{BIC}{the Bayesian Information Criterion}
  \item{deviance}{deviance}
  \item{df.residual}{residual degrees of freedom}
}
\description{
These methods tidy the coefficients of a linear model into a summary,
augment the original data with information on the fitted values and
residuals, and construct a one-row glance of the model's statistics.
}
\details{
If you have missing values in your model data, you may need to refit
the model with \code{na.action = na.exclude}.

If \code{conf.int=TRUE}, the confidence interval is computed with
the \code{\link{confint}} function.

While \code{tidy} is supported for "mlm" objects, \code{augment} and
\code{glance} are not.

When the modeling was performed with \code{na.action = "na.omit"}
(as is the typical default), rows with NA in the initial data are omitted
entirely from the augmented data frame. When the modeling was performed
with \code{na.action = "na.exclude"}, one should provide the original data
as a second argument, at which point the augmented data will contain those
rows (typically with NAs in place of the new columns). If the original data
is not provided to \code{augment} and \code{na.action = "na.exclude"}, a
warning is raised and the incomplete rows are dropped.
}
\examples{
library(ggplot2)
library(dplyr)

mod <- lm(mpg ~ wt + qsec, data = mtcars)

tidy(mod)
glance(mod)
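
# A short sketch of the optional tidy() arguments described above:
# conf.int = TRUE should add conf.low and conf.high columns (here at an
# illustrative 90 percent level), and quick = TRUE should return only the
# term and estimate columns.
tidy(mod, conf.int = TRUE, conf.level = 0.9)
tidy(mod, quick = TRUE)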

# coefficient plot (bars span estimate +/- one standard error)
d <- tidy(mod) \%>\% mutate(low = estimate - std.error,
                          high = estimate + std.error)
ggplot(d, aes(estimate, term, xmin = low, xmax = high, height = 0)) +
     geom_point() +
     geom_vline(xintercept = 0) +
     geom_errorbarh()

head(augment(mod))
head(augment(mod, mtcars))

# predict on new data
newdata <- mtcars \%>\% head(6) \%>\% mutate(wt = wt + 1)
augment(mod, newdata = newdata)

au <- augment(mod, data = mtcars)

plot(mod, which = 1)
qplot(.fitted, .resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au,
  colour = factor(cyl))
qplot(mpg, .std.resid, data = au, colour = factor(cyl))

plot(mod, which = 2)
ggplot(au, aes(sample = .std.resid)) +
    stat_qq() +
    geom_abline()

plot(mod, which = 3)
qplot(.fitted, sqrt(abs(.std.resid)), data = au) + geom_smooth(se = FALSE)

plot(mod, which = 4)
qplot(seq_along(.cooksd), .cooksd, data = au)

plot(mod, which = 5)
qplot(.hat, .std.resid, data = au) + geom_smooth(se = FALSE)
ggplot(au, aes(.hat, .std.resid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() + geom_smooth(se = FALSE)

qplot(.hat, .std.resid, data = au, size = .cooksd) +
  geom_smooth(se = FALSE, size = 0.5)

plot(mod, which = 6)
ggplot(au, aes(.hat, .cooksd)) +
  geom_vline(xintercept = 0, colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()
qplot(.hat, .cooksd, size = .cooksd / .hat, data = au) + scale_size_area()

# column-wise models
a <- matrix(rnorm(20), nrow = 10)
b <- a + rnorm(length(a))
result <- lm(b ~ a)
tidy(result)
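
# missing values: a minimal sketch of the na.exclude behaviour described
# in Details. mtcars_na and mod_na are hypothetical names used only for
# this illustration; one response value is set to NA before fitting.
mtcars_na <- mtcars
mtcars_na$mpg[1] <- NA
mod_na <- lm(mpg ~ wt + qsec, data = mtcars_na, na.action = na.exclude)
# supplying the original data as the second argument should keep the
# incomplete row, typically with NA in the added columns
head(augment(mod_na, mtcars_na))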
}
\seealso{
\code{\link{summary.lm}}

\code{\link{na.action}}
}

