% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tsbalancing.R
\name{build_balancing_problem}
\alias{build_balancing_problem}
\title{Build the elements of balancing problems.}
\usage{
build_balancing_problem(
  in_ts,
  problem_specs_df,
  in_ts_name = deparse1(substitute(in_ts)),
  ts_freq = stats::frequency(in_ts),
  periods = gs.time2str(in_ts),
  n_per = nrow(as.matrix(in_ts)),
  specs_df_name = deparse1(substitute(problem_specs_df)),
  temporal_grp_periodicity = 1,
  alter_pos = 1,
  alter_neg = 1,
  alter_mix = 1,
  lower_bound = -Inf,
  upper_bound = Inf,
  validation_only = FALSE
)
}
\arguments{
\item{in_ts}{(mandatory)

Time series (object of class "ts" or "mts") that contains the time series data to be reconciled.
They are the balancing problems' input data (initial solutions).}

\item{problem_specs_df}{(mandatory)

Balancing problem specifications data frame (object of class "data.frame"). Using a sparse format inspired by the
SAS/OR\eqn{^\circledR}{®} LP procedure’s \emph{sparse data input format} (SAS Institute 2015), it contains only the relevant
information such as the nonzero coefficients of the balancing constraints as well as the non-default alterability
coefficients and lower/upper bounds (i.e., values that would take precedence over those defined with arguments \code{alter_pos},
\code{alter_neg}, \code{alter_mix}, \code{alter_temporal}, \code{lower_bound} and \code{upper_bound}).

The information is provided using four mandatory variables (\code{type}, \code{col}, \code{row} and \code{coef}) and one optional variable
(\code{timeVal}). An observation (a row) in the problem specs data frame either defines a label for one of the seven types of the
balancing problem elements with columns \code{type} and \code{row} (see \emph{Label definition records} below) or specifies coefficients
(numerical values) for those balancing problem elements with variables \code{col}, \code{row}, \code{coef} and \code{timeVal} (see \emph{Information
specification records} below).
\itemize{
\item \strong{Label definition records} (\code{type} is not missing (is not \code{NA}))
\itemize{
\item \code{type} (chr): reserved keyword identifying the type of problem element being defined:
\itemize{
\item \code{EQ}: equality (\eqn{=}) balancing constraint
\item \code{LE}: less than or equal (\eqn{\le}{<=}) balancing constraint
\item \code{GE}: greater than or equal (\eqn{\ge}{>=}) balancing constraint
\item \code{lowerBd}: period value lower bound
\item \code{upperBd}: period value upper bound
\item \code{alter}: period values alterability coefficient
\item \code{alterTmp}: temporal total alterability coefficient
}
\item \code{row} (chr): label to be associated with the problem element (\emph{\code{type} keyword})
\item \emph{all other variables are irrelevant and should contain missing data (\code{NA} values)} \cr \cr
}
\item \strong{Information specification records} (\code{type} is missing (is \code{NA}))
\itemize{
\item \code{type} (chr): not applicable (\code{NA})
\item \code{col} (chr): series name or reserved word \verb{_rhs_} to specify a balancing constraint right-hand side (RHS) value.
\item \code{row} (chr): problem element label.
\item \code{coef} (num): problem element value:
\itemize{
\item balancing constraint series coefficient or RHS value
\item series period value lower or upper bound
\item series period value or temporal total alterability coefficient
}
\item \code{timeVal} (num): optional time value to restrict the application of series bounds or alterability coefficients
to a specific time period (or temporal group). It corresponds to the time value, as returned by \code{stats::time()}, of a given
input time series (argument \code{in_ts}) period (observation) and is conceptually equivalent to \eqn{year + (period - 1) / frequency}.
}
}
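
As a minimal sketch of how \code{timeVal} values relate to periods (base R only; the series \code{x} is a hypothetical
quarterly example, not package data), \code{stats::time()} returns \eqn{year + (period - 1) / frequency} for each observation:

\preformatted{
# Hypothetical quarterly series starting in 2019 Q1 (illustration only).
x <- ts(1:6, start = c(2019, 1), frequency = 4)

# Time values that could be used in column `timeVal`.
time_vals <- as.numeric(stats::time(x))
time_vals

# Same values computed as year + (period - 1) / frequency.
manual_vals <- floor(time_vals) + (as.numeric(stats::cycle(x)) - 1) / stats::frequency(x)
all.equal(time_vals, manual_vals)
}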

Note that empty strings (\code{""} or \code{''}) for character variables are interpreted as missing (\code{NA}) by the function. Variable
\code{row} identifies the elements of the balancing problem and is the key variable that makes the link between both types of
records. The same label (\code{row}) cannot be associated with more than one type of problem element (\code{type}) and multiple labels
(\code{row}) cannot be defined for the same given type of problem element (\code{type}), except for balancing constraints (values
\code{"EQ"}, \code{"LE"} and \code{"GE"} of column \code{type}). User-friendly features of the problem specs data frame include:
\itemize{
\item The order of the observations (rows) is not important.
\item Character values (variables \code{type}, \code{row} and \code{col}) are not case sensitive (e.g., strings \code{"Constraint 1"} and
\code{"CONSTRAINT 1"} for \code{row} would be considered as the same problem element label), except when \code{col} is used to specify a
series name (a column of the input time series object) where \strong{case sensitivity is enforced}.
\item The variable names of the problem specs data frame are also not case sensitive (e.g., \code{type}, \code{Type} or \code{TYPE} are all
valid) and \code{time_val} is an accepted variable name (instead of \code{timeVal}).
}

Finally, the following table lists valid aliases for the \emph{\code{type} keywords} (type of problem element):\tabular{cl}{
   \strong{Keyword} \tab \strong{Aliases} \cr
   \code{EQ} \tab \code{==}, \code{=} \cr
   \code{LE} \tab \code{<=}, \code{<} \cr
   \code{GE} \tab \code{>=}, \code{>} \cr
   \code{lowerBd} \tab \code{lowerBound}, \code{lowerBnd}, + \emph{same terms with '_', '.' or ' ' between words} \cr
   \code{upperBd} \tab \code{upperBound}, \code{upperBnd}, + \emph{same terms with '_', '.' or ' ' between words} \cr
   \code{alterTmp} \tab \code{alterTemporal}, \code{alterTemp}, + \emph{same terms with '_', '.' or ' ' between words} \cr
}
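
The two record types can be sketched with a small, hypothetical problem specs data frame. The series names (\code{A}, \code{B},
\code{Total}) and the labels are illustrative assumptions: the sketch defines one equality constraint, \eqn{A + B - Total = 0},
and a nondated alterability coefficient of 0 (binding) for series \code{Total}.

\preformatted{
# Hypothetical specs (illustration only; base R, no package function is called).
my_specs_sketch <- data.frame(
  type    = c("EQ", NA, NA, NA, NA, "alter", NA),
  col     = c(NA, "A", "B", "Total", "_rhs_", NA, "Total"),
  row     = c(rep("Sum constraint", 5), rep("Alterability", 2)),
  coef    = c(NA, 1, 1, -1, 0, NA, 0),
  timeVal = NA_real_
)

# Label definition records (`type` is not NA).
my_specs_sketch[!is.na(my_specs_sketch$type), c("type", "row")]

# Information specification records (`type` is NA).
my_specs_sketch[is.na(my_specs_sketch$type), c("col", "row", "coef", "timeVal")]
}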


Reviewing the \strong{Examples} should help conceptualize the balancing problem specifications data frame.}

\item{in_ts_name}{(optional)

String containing the value of argument \code{in_ts}.

\strong{Default value} is \code{in_ts_name = deparse1(substitute(in_ts))}.}

\item{ts_freq}{(optional)

Frequency of the time series object (argument \code{in_ts}).

\strong{Default value} is \code{ts_freq = stats::frequency(in_ts)}.}

\item{periods}{(optional)

Character vector describing the time series object (argument \code{in_ts}) periods.

\strong{Default value} is \code{periods = gs.time2str(in_ts)}.}

\item{n_per}{(optional)

Number of periods of the time series object (argument \code{in_ts}).

\strong{Default value} is \code{n_per = nrow(as.matrix(in_ts))}.}

\item{specs_df_name}{(optional)

String containing the value of argument \code{problem_specs_df}.

\strong{Default value} is \code{specs_df_name = deparse1(substitute(problem_specs_df))}.}

\item{temporal_grp_periodicity}{(optional)

Positive integer defining the number of periods in temporal groups for which the totals should be preserved.
E.g., specify \code{temporal_grp_periodicity = 3} with a monthly time series for quarterly total preservation and
\code{temporal_grp_periodicity = 12} (or \code{temporal_grp_periodicity = frequency(in_ts)}) for annual total preservation.
Specifying \code{temporal_grp_periodicity = 1} (\emph{default}) corresponds to period-by-period processing without
temporal total preservation.

\strong{Default value} is \code{temporal_grp_periodicity = 1} (period-by-period processing without temporal total preservation).}

\item{alter_pos}{(optional)

Nonnegative real number specifying the default alterability coefficient associated with the values of time series with \strong{positive}
coefficients in all balancing constraints in which they are involved (e.g., component series in aggregation table raking problems).
Alterability coefficients provided in the problem specification data frame (argument \code{problem_specs_df}) override this value.

\strong{Default value} is \code{alter_pos = 1.0} (nonbinding values).}

\item{alter_neg}{(optional)

Nonnegative real number specifying the default alterability coefficient associated with the values of time series with \strong{negative}
coefficients in all balancing constraints in which they are involved (e.g., marginal totals in aggregation table raking problems).
Alterability coefficients provided in the problem specification data frame (argument \code{problem_specs_df}) override this value.

\strong{Default value} is \code{alter_neg = 1.0} (nonbinding values).}

\item{alter_mix}{(optional)

Nonnegative real number specifying the default alterability coefficient associated with the values of time series with a mix of
\strong{positive and negative} coefficients in the balancing constraints in which they are involved. Alterability coefficients provided
in the problem specification data frame (argument \code{problem_specs_df}) override this value.

\strong{Default value} is \code{alter_mix = 1.0} (nonbinding values).}

\item{lower_bound}{(optional)

Real number specifying the default lower bound for the time series values. Lower bounds provided in the problem specification
data frame (argument \ifelse{latex}{\code{problem _specs_df}}{\code{problem_specs_df}}) override this value.

\strong{Default value} is \code{lower_bound = -Inf} (unbounded).}

\item{upper_bound}{(optional)

Real number specifying the default upper bound for the time series values. Upper bounds provided in the problem specification
data frame (argument \ifelse{latex}{\code{problem _specs_df}}{\code{problem_specs_df}}) override this value.

\strong{Default value} is \code{upper_bound = Inf} (unbounded).}

\item{validation_only}{(optional)

Logical argument specifying whether the function should only perform input data validation or not. When
\code{validation_only = TRUE}, the specified \emph{balancing constraints} and \emph{period value (lower and upper) bounds} constraints
are validated against the input time series data, allowing for discrepancies up to the value specified with argument
\code{validation_tol}. Otherwise, when \code{validation_only = FALSE} (default), the input data are first reconciled and the
resulting (output) data are then validated.

\strong{Default value} is \code{validation_only = FALSE}.}
}
\value{
A list with the elements of the balancing problems (excluding the temporal totals info):
\itemize{
\item \code{labels_df}: cleaned-up version of the \emph{label definition records} from \code{problem_specs_df}
(\code{type} is not missing (is not \code{NA})); extra columns:
\itemize{
\item \code{type.lc} : \code{tolower(type)}
\item \code{row.lc}  : \code{tolower(row)}
\item \code{con.flag}: \code{type.lc \%in\% c("eq", "le", "ge")}
}
\item \code{coefs_df} : cleaned-up version of the information specification records from \code{problem_specs_df}
(\code{type} is missing (is \code{NA})); extra columns:
\itemize{
\item \code{row.lc}  : \code{tolower(row)}
\item \code{con.flag}: \code{labels_df$con.flag} allocated through \code{row.lc}
}
\item \code{values_ts}: reduced version of \code{in_ts} with only the relevant series (see vector \code{ser_names})
\item \code{lb}       : lower bound info (\code{type.lc = "lowerbd"}) for the relevant series; list object with the
following elements:
\itemize{
\item \code{coefs_ts}       : lower bound values for series and period
\item \code{nondated_coefs} : vector of nondated lower bounds from \code{problem_specs_df} (\code{timeVal} is \code{NA})
\item \code{nondated_id_vec}: vector of \code{ser_names} id's associated to vector \code{nondated_coefs}
\item \code{dated_id_vec}   : vector of \code{ser_names} id's associated to dated lower bounds from
\code{problem_specs_df} (\code{timeVal} is not \code{NA})
}
\item \code{ub}       : \code{lb} equivalent for upper bounds (\code{type.lc = "upperbd"})
\item \code{alter}    : \code{lb} equivalent for period value alterability coefficients (\code{type.lc = "alter"})
\item \code{altertmp} : \code{lb} equivalent for temporal total alterability coefficients (\code{type.lc = "altertmp"})
\item \code{ser_names}: vector of the relevant series names (set of series involved in the balancing constraints)
\item \code{pos_ser}  : vector of series names that have only positive nonzero coefficients across all balancing constraints
\item \code{neg_ser}  : vector of series names that have only negative nonzero coefficients across all balancing constraints
\item \code{mix_ser}  : vector of series names that have both positive and negative nonzero coefficients across all balancing
constraints
\item \code{A1}, \code{op1}, \code{b1}: balancing constraint elements for problems involving a single period (e.g., each period of an
incomplete temporal group)
\item \code{A2}, \code{op2}, \code{b2}: balancing constraint elements for problems involving \code{temporal_grp_periodicity} periods (e.g., the set
of periods of a complete temporal group)
}
}
\description{
\if{html,text}{(\emph{version française: 
\url{https://StatCan.github.io/gensol-gseries/fr/reference/build_balancing_problem.html}})}

This function is used internally by \code{\link[=tsbalancing]{tsbalancing()}} to build the elements of the balancing problems.
It can also be useful to derive the indirect series associated to equality balancing constraints manually
(outside of the \code{\link[=tsbalancing]{tsbalancing()}} context).
}
\details{
See \code{\link[=tsbalancing]{tsbalancing()}} for a detailed description of \emph{time series balancing} problems.

Any missing (\code{NA}) value found in the input time series object (argument \code{in_ts}) is replaced with 0 in \code{values_ts}
and triggers a warning message.

The returned elements of the balancing problems do not include the implicit temporal totals (i.e., elements \code{A2}, \code{op2}
and \code{b2} only contain the balancing constraints).

Multi-period balancing problem elements \code{A2}, \code{op2} and \code{b2} (when \code{temporal_grp_periodicity > 1}) are constructed
\emph{column by column} (in "column-major order"), corresponding to the default behaviour of R for converting objects of class
"matrix" into vectors. I.e., the balancing constraints conceptually correspond to:
\itemize{
\item \verb{A1 \%*\% values_ts[t, ] op1 b1} for problems involving a single period (\code{t})
\item \verb{A2 \%*\% as.vector(values_ts[t1:t2, ]) op2 b2} for problems involving \code{temporal_grp_periodicity} periods (\code{t1:t2}).
}
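
The column-major convention can be illustrated with base R alone. The following is a sketch under stated assumptions, not
the function's actual construction: \code{A1_hyp}, \code{V} and \code{A2_hyp} are hypothetical objects, and building the
multi-period matrix with \code{kronecker()} is only one way to obtain rows ordered by constraint and then by period.

\preformatted{
# Hypothetical single-period constraint x + y - tot = 0 (illustration only).
A1_hyp <- matrix(c(1, 1, -1), nrow = 1, dimnames = list(NULL, c("x", "y", "tot")))

# Hypothetical values for 3 periods (rows) and 3 series (columns); the last
# period is deliberately unbalanced.
V <- cbind(x = c(1, 2, 3), y = c(10, 20, 30), tot = c(11, 22, 30))
n_per_hyp <- nrow(V)

# Column-major vectorization: all periods of `x`, then `y`, then `tot`.
as.vector(V)

# Assumed multi-period counterpart of `A1_hyp` for column-major data points.
A2_hyp <- kronecker(A1_hyp, diag(n_per_hyp))

# Period-by-period and multi-period evaluations give the same discrepancies.
single <- as.vector(V \%*\% t(A1_hyp))
multi <- as.vector(A2_hyp \%*\% as.vector(V))
all.equal(single, multi)
}

In this sketch, the data point of series \code{j} for period \code{t} sits at position \code{(j - 1) * n_per_hyp + t} of
\code{as.vector(V)}, which is why a series-level logical flag is expanded with \code{rep(..., each = n_per)} in the \strong{Examples}.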

Notes:
\itemize{
\item Argument \code{alter_temporal} has not been applied yet at this point and \code{altertmp$coefs_ts} only contains the
coefficients specified in the problem specs data frame (argument \code{problem_specs_df}). I.e., \code{altertmp$coefs_ts} contains
missing (\code{NA}) values except for the temporal total alterability coefficients included in (specified with) \code{problem_specs_df}.
This is done in order to simplify the identification of the first non missing (non \code{NA}) temporal total alterability
coefficient of each complete temporal group (to occur later, when applicable, inside \code{\link[=tsbalancing]{tsbalancing()}}).
\item Argument validation is not performed here; it is (bluntly) assumed that the function is called by \code{\link[=tsbalancing]{tsbalancing()}}
where a thorough validation of the arguments is done.
}
}
\examples{
######################################################################################
#        Indirect series derivation framework with `tsbalancing()` metadata
######################################################################################
#
# It is assumed (agreed) that...
#
# a) All balancing constraints are equality constraints (`type = EQ`).
# b) All constraints have only one nonbinding (free) series: the series to be derived
#    (i.e., all series have an alter. coef of 0 except the series to be derived).
# c) Each constraint derives a different (new) series.
# d) Constraints are the same for all periods (i.e., no "dated" alter. coefs 
#    specified with column `timeVal`).
######################################################################################


# Derive the 5 marginal totals of a 2 x 3 two-dimensional data cube using `tsbalancing()` 
# metadata (data cube aggregation constraints respect the above assumptions).


# Build the balancing problem specs through the (simpler) raking metadata.
my_specs <- rkMeta_to_blSpecs(
  data.frame(series = c("A1", "A2", "A3",
                        "B1", "B2", "B3"),
             total1 = c(rep("totA", 3),
                        rep("totB", 3)),
             total2 = rep(c("tot1", "tot2", "tot3"), 2)),
  alterSeries = 0,  # binding (fixed) component series
  alterTotal1 = 1,  # nonbinding (free) marginal totals (to be derived)
  alterTotal2 = 1)  # nonbinding (free) marginal totals (to be derived)
my_specs

# 6 periods (quarters) of data with marginal totals set to zero (0): they MUST exist
# in the input data AND contain valid (non missing) data.
my_ts <- ts(data.frame(A1 = c(12, 10, 12,  9, 15,  7),
                       B1 = c(20, 21, 15, 17, 19, 18),
                       A2 = c(14,  9,  8,  9, 11, 10),
                       B2 = c(20, 29, 20, 24, 21, 17),
                       A3 = c(13, 15, 17, 14, 16, 12),
                       B3 = c(24, 20, 30, 23, 21, 19),
                       tot1 = rep(0, 6),
                       tot2 = rep(0, 6),
                       tot3 = rep(0, 6),
                       totA = rep(0, 6),
                       totB = rep(0, 6)),
            start = 2019, frequency = 4)

# Get the balancing problem elements.
n_per <- nrow(my_ts)
p <- build_balancing_problem(my_ts, my_specs, 
                             temporal_grp_periodicity = n_per)

# `A2`, `op2` and `b2` define 30 constraints (5 marginal totals X 6 periods) 
# involving a total of 66 time series data points (11 series X 6 periods) of which 
# 36 belong to the 6 component series and 30 belong to the 5 marginal totals.
dim(p$A2)

# Get the names of the marginal totals (series with a nonzero alter. coef), in the order 
# in which the corresponding constraints appear in the specs (constraints specification 
# order).
tmp <- p$coefs_df$col[p$coefs_df$con.flag]
tot_names <- tmp[tmp \%in\% p$ser_names[p$alter$nondated_id_vec[p$alter$nondated_coefs != 0]]]

# Define logical flags identifying the marginal total columns:
# - `tot_col_logi1`: for single-period elements (of length 11 = number of series)
# - `tot_col_logi2`: for multi-period elements (of length 66 = number of data points), 
#                    in "column-major order" (the `A2` matrix element construction order)
tot_col_logi1 <- p$ser_names \%in\% tot_names
tot_col_logi2 <- rep(tot_col_logi1, each = n_per)

# Order of the marginal totals to be derived based on
# ... the input data columns ("mts" object `my_ts`)
p$ser_names[tot_col_logi1]
# ... the constraints specification (data frame `my_specs`)
tot_names


# Calculate the 5 marginal totals for all 6 periods
# Note: the following calculation allows for general linear equality constraints, i.e.,
#       a) nonzero right-hand side (RHS) constraint values (`b2`) and 
#       b) nonzero constraint coefs other than 1 for the component series and -1 for 
#          the derived series.
my_ts[, tot_names] <- {
  (
    # Constraints RHS.
    p$b2 - 

    # Sums of the components ("weighted" by the constraint coefficients).
    p$A2[, !tot_col_logi2, drop = FALSE] \%*\% as.vector(p$values_ts[, !tot_col_logi1])
  ) /

  # Derived series constraint coefficients: `t()` allows for a "row-major order" search 
  # in matrix `A2` (i.e., according to the constraints specification order).
  # Note: `diag(p$A2[, tot_col_logi2])` would work if `p$ser_names[tot_col_logi1]` and 
  #       `tot_names` were identical (same totals order); however, the following search 
  #       in "row-major order" will always work (and is necessary in the current case).
  t(p$A2[, tot_col_logi2])[t(p$A2[, tot_col_logi2]) != 0]
}
my_ts
}
\seealso{
\code{\link[=tsbalancing]{tsbalancing()}} \code{\link[=build_raking_problem]{build_raking_problem()}}
}
