% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pca_function.R
\name{determine_factors}
\alias{determine_factors}
\title{Determine the Optimal Number of Factors via an Information Criterion}
\usage{
determine_factors(returns, max_m, bandwidth = silverman(returns))
}
\arguments{
\item{returns}{A numeric matrix of asset returns with dimensions \eqn{T \times p}.}

\item{max_m}{Integer. The maximum number of factors to consider.}

\item{bandwidth}{Numeric. Kernel bandwidth for local PCA. Default is Silverman's rule of thumb.}
}
\value{
A list with:
\itemize{
  \item \code{optimal_m}: Integer. The optimal number of factors.
  \item \code{IC_values}: Numeric vector of IC values for each candidate \eqn{m}.
}
}
\description{
This function selects the number of factors for a local principal component
analysis (PCA) model of asset returns. It computes a BIC-type information criterion (IC) for each candidate
number of factors, based on the sum of squared residuals (SSR) from the PCA reconstruction and a
penalty term that increases with the number of factors. The optimal number of factors is the
one that minimizes the IC. The procedure is available either as a stand-alone
function or as a method of the \code{TVMVP} R6 class.
}
\details{
Two usage styles:

\preformatted{
# Function interface
determine_factors(returns, max_m = 5)

# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()
}
   
When using the method form, if \code{max_m} or \code{bandwidth} is omitted,
it defaults to the value stored in the object. Results are cached and
can be retrieved with \code{get_optimal_m()} and \code{get_IC_values()}.
   
For each candidate number of factors \eqn{m} (from 1 to \code{max_m}), the function:

\enumerate{
  \item Performs a local PCA on the returns at each time point \eqn{r = 1,\dots,T} using \eqn{m} factors.
  \item Computes a reconstruction of the returns and the corresponding residuals:
        \deqn{\text{Residual}_r = R_r - F_r \Lambda_r,}
        where \eqn{R_r} is the return at time \eqn{r}, and \eqn{F_r} and \eqn{\Lambda_r} are the local factors and loadings, respectively.
  \item Computes the average sum of squared residuals (SSR) as:
        \deqn{V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.}
  \item Adds a penalty term that increases with \eqn{m}:
        \deqn{\text{Penalty}(m) = m \times \frac{p + T \times \text{bandwidth}}{pT \times \text{bandwidth}} \log\left(\frac{pT \times \text{bandwidth}}{p + T \times \text{bandwidth}}\right).}
  \item Combines both terms into the information criterion:
        \deqn{\text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).}
}

The optimal number of factors is then chosen as the value of \eqn{m} that minimizes \eqn{\text{IC}(m)}.
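
As a rough illustration of steps 2 to 5, the sketch below evaluates the criterion in base R. It is not the package implementation: ordinary PCA on the full sample (via \code{svd()}) stands in for the kernel-weighted local PCA, the returns are column-centered first, and the names \code{ic_sketch} and \code{h} (a fixed bandwidth) are hypothetical.

\preformatted{
# Illustrative sketch only: global PCA replaces the local PCA of step 1.
# ic_sketch and h are hypothetical names, not part of the package.
ic_sketch <- function(returns, max_m, h) {
  p <- ncol(returns)
  n_obs <- nrow(returns)  # T in the formulas above
  # Assumption: center each column before the decomposition
  X <- scale(returns, center = TRUE, scale = FALSE)
  sv <- svd(X)
  ic <- numeric(max_m)
  for (m in seq_len(max_m)) {
    idx <- seq_len(m)
    # Rank-m reconstruction: scale the first m left singular vectors by
    # their singular values, then multiply by the transposed loadings
    scaled_u <- sweep(sv$u[, idx, drop = FALSE], 2, sv$d[idx], "*")
    fitted <- tcrossprod(scaled_u, sv$v[, idx, drop = FALSE])
    # V(m): average squared residual over all T x p entries
    V_m <- sum((X - fitted)^2) / (p * n_obs)
    # Penalty(m) = m * ratio * log(1 / ratio)
    ratio <- (p + n_obs * h) / (p * n_obs * h)
    penalty <- m * ratio * log(1 / ratio)
    ic[m] <- log(V_m) + penalty
  }
  list(optimal_m = which.min(ic), IC_values = ic)
}
}

Calling \code{ic_sketch(returns, max_m = 5, h = 0.1)} returns a list with the same two elements as the documented value; the bandwidth 0.1 is an arbitrary choice made for illustration.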
}
\section{References}{
  
Su, L., & Wang, X. (2017). On time-varying factor models: Estimation and testing. \emph{Journal of Econometrics}, 198(1), 84--101.
}

\examples{
set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)

# Function usage
result <- determine_factors(returns, max_m = 5)
print(result$optimal_m)
print(result$IC_values)

# R6 usage
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()

}
