% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ale_core.R, R/ale_package.R
\docType{package}
\name{ale}
\alias{ale}
\alias{ale-package}
\title{ale_core.R}
\usage{
ale(
  test_data,
  model,
  x_cols = NULL,
  output = c("plot", "data"),
  pred_fun = function(object, newdata) {
     stats::predict(object = object, newdata =
    newdata, type = predict_type)
 },
  predict_type = "response",
  x_intervals = 100,
  boot_it = 0,
  seed = 0,
  boot_alpha = 0.05,
  boot_centre = "median",
  relative_y = "median",
  y_type = NULL,
  plot_alpha = 0.05,
  ale_xs = NULL,
  ale_ns = NULL
)
}
\arguments{
\item{test_data}{dataframe. Dataset from which to create predictions for the ALE.
Normally, this should be a test dataset, not the dataset on which the model
was trained.}

\item{model}{model object. Model for which ALE should be calculated. Must
contain a \code{terms} element, from which the name of the outcome variable
is automatically determined.}

\item{x_cols}{character. Vector of column names from \code{test_data} for which
one-way ALE data is to be calculated (that is, simple ALE without interactions).
Must be provided if \code{ixn} is FALSE (default).}

\item{output}{character in c('plot', 'data'). Vector of types of results to return. 'plot' will return
an ALE plot; 'data' will return the source ALE data; both together will return both.}

\item{pred_fun, predict_type}{function, character. \code{pred_fun} is a function that
returns a vector of predicted values of type \code{predict_type} from \code{model} on \code{test_data}.
See details.}

\item{x_intervals}{non-negative integer. Maximum number of intervals on the x-axis
for the ALE data for each column in \code{x_cols}. The algorithm may generate
fewer intervals than requested if the values of a given x variable do not
support that many intervals.}

\item{boot_it}{non-negative integer. Number of bootstrap iterations for the
ALE values. If \code{boot_it == 0} (default), then ALE will be calculated on the entire dataset
with no bootstrapping.}

\item{seed}{integer. Random seed. Supply the same value between runs to ensure that
identical random ALE data is generated each time.}

\item{boot_alpha}{numeric from 0 to 1. Alpha for the percentile-based bootstrap
confidence intervals; each interval runs from the \code{boot_alpha / 2} quantile
to the \code{1 - boot_alpha / 2} quantile. For example,
if \code{boot_alpha == 0.05} (default), the intervals will run from the 2.5th to the 97.5th
percentile.}

\item{boot_centre}{character in c('median', 'mean'). When bootstrapping, the
main estimate for \code{ale_y} is considered to be \code{boot_centre}. Regardless of the
value specified here, both the median and mean will be available.}

\item{relative_y}{character in c('median', 'mean', 'zero'). The ale_y values will
be adjusted relative to this value. 'median' is the default. 'zero' will maintain the
default of \code{ALEPlot::ALEPlot}, which is not shifted.}

\item{y_type}{character. Datatype of the y (outcome) variable according to the
types returned by the \code{var_type} function (see that function for options). If not
provided, this will be automatically determined.}

\item{plot_alpha}{numeric from 0 to 1. Alpha for "confidence interval" range
for printing bands around the median for single-variable plots.
The band range will be the median value of y ± \code{plot_alpha}.}

\item{ale_xs, ale_ns}{list of ale_x and ale_n vectors. If provided, these vectors will be used to
set the intervals of the ALE x axis for each variable. By default (NULL), the
function automatically calculates the ale_x intervals. \code{ale_xs} is normally used
in advanced analyses where the ale_x intervals from a previous analysis are
reused for subsequent analyses (for example, for full model bootstrapping;
see the \code{model_bootstrap} function).}
}
\value{
list of ALE data tibbles and plots. The list is named by the x variables.
Within each list element, the data or plot is returned as requested in
the \code{output} argument.
}
\description{
Core functions for the ale package

Accumulated Local Effects (ALE) were initially developed as a model-agnostic
approach for global explanations of the results of black-box machine learning
algorithms. ALE has two primary advantages over other approaches like PDP
and SHAP: its values are not affected by the presence of interactions among
variables in a model and its computation is relatively rapid. This package
rewrites the original \code{ALEPlot} code for calculating ALE data and it
completely reimplements the plotting of ALE values. In addition, future
versions hope to extend the original ALE concept beyond global explanations
with ALE-based measures that can be used for statistical inference
as well as an ALE-based approach for local explanations.
}
\details{
Create and return ALE data and plots

This is the central function that manages the creation of ALE data and plots
for one-way ALE. For two-way interactions, see \code{ale_ixn}. This function calls
\code{ale_core}, which manages the ALE data and plot creation in detail. For details, see
the introductory vignette for this package or the details and examples below.

The calculation of ALE requires modifying several values of the original
\code{test_data}. Thus, \code{ale} needs direct access to a \code{predict} function that works on
\code{model}. By default, \code{ale} uses a generic default \code{predict} function of the form
\code{predict(model_object, new_data)} with the default prediction type of 'response'.
If, however, the desired prediction values are not generated with that format,
the user must specify what they want. Most of the time, they only need to change
the prediction type to some other value by setting the \code{predict_type} argument
(e.g., to 'prob' to generate classification probabilities). But if the desired
predictions need a different function signature, then the user must create a
custom prediction function and pass it to \code{pred_fun}. See an example below.

For binary prediction models, be sure to set \code{predict_type} to whatever
type returns probabilities (from 0 to 1).
}
\examples{
# Use the diamonds dataset from the ggplot2 package
diamonds <- ggplot2::diamonds
set.seed(0)
diamonds_sample <- diamonds[sample(nrow(diamonds), 1000), ]

# Split the dataset into training and test sets
# https://stackoverflow.com/a/54892459/2449926
set.seed(0)
train_test_split <- sample(
  c(TRUE, FALSE), nrow(diamonds_sample), replace = TRUE, prob = c(0.8, 0.2)
)
diamonds_train <- diamonds_sample[train_test_split, ]
diamonds_test <- diamonds_sample[!train_test_split, ]


# Create a GAM model with flexible curves to predict diamond price
# Smooth all numeric variables and include all other variables
# Build model on training data, not on the full dataset.
gam_diamonds <- mgcv::gam(
  price ~ s(carat) + s(depth) + s(table) + s(x) + s(y) + s(z) +
    cut + color + clarity,
  data = diamonds_train
)
summary(gam_diamonds)


# Simple ALE without bootstrapping
ale_gam_diamonds <- ale(diamonds_test, gam_diamonds)


\donttest{
# Plot the ALE data
# Skip .common_data when iterating through the data for plotting
ale_gam_diamonds[setdiff(names(ale_gam_diamonds), '.common_data')] |>
  purrr::map(\(.x) .x$plot) |>  # extract plots as a list
  gridExtra::grid.arrange(grobs = _, ncol = 2)
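
# Restrict the ALE to selected columns and return only the data
# A sketch, not package-verified output: it assumes each list element keeps
# its data under $data, by analogy with the $plot element extracted above
ale_gam_subset <- ale(
  diamonds_test, gam_diamonds,
  x_cols = c("carat", "clarity"),
  output = "data"
)
ale_gam_subset$carat$data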

# Bootstrapped ALE
# This can be slow, since bootstrapping runs the algorithm boot_it times

# Create ALE with 100 bootstrap samples
ale_gam_diamonds_boot <- ale(diamonds_test, gam_diamonds, boot_it = 100)

# Bootstrapping produces confidence intervals
# Skip .common_data when iterating through the data for plotting
ale_gam_diamonds_boot[setdiff(names(ale_gam_diamonds_boot), '.common_data')] |>
  purrr::map(\(.x) .x$plot) |>  # extract plots as a list
  gridExtra::grid.arrange(grobs = _, ncol = 2)
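
# Custom prediction function passed through pred_fun
# A sketch: custom_predict is a hypothetical name; it mirrors the default
# signature function(object, newdata) shown in the usage section and
# requests predictions on the response scale from the GAM
custom_predict <- function(object, newdata) {
  as.numeric(stats::predict(object, newdata = newdata, type = "response"))
}
ale_gam_diamonds_custom <- ale(
  diamonds_test, gam_diamonds,
  pred_fun = custom_predict
)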
}



}
\seealso{
Useful links:
\itemize{
  \item \url{https://github.com/Tripartio/ale}
  \item Report bugs at \url{https://github.com/Tripartio/ale/issues}
}

}
\author{
Chitu Okoli \email{Chitu.Okoli@skema.edu}
}
