% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pcss.core.R
\name{pcss.core}
\alias{pcss.core}
\title{Principal Component Scoring to Generate Core Collections}
\usage{
pcss.core(
  data,
  names,
  quantitative,
  qualitative,
  eigen.threshold = NULL,
  size = 0.2,
  var.threshold = 0.75
)
}
\arguments{
\item{data}{The data as a data frame object. The data frame should have
one row per individual, with a column of individual names and multiple
columns of trait/character data.}

\item{names}{Name of the column with the individual/genotype names, as a
character string.}

\item{quantitative}{Names of the columns with the quantitative traits, as a
character vector. Use \code{NULL} if there are no quantitative traits.}

\item{qualitative}{Names of the columns with the qualitative traits, as a
character vector. Use \code{NULL} if there are no qualitative traits.}

\item{eigen.threshold}{The lower limit of the eigenvalue of the factors to be
included in the estimation. If \code{NULL} (default), the average of all the
eigenvalues is used.}

\item{size}{The desired size of the core set as a proportion of the entire
collection.}

\item{var.threshold}{The desired proportion of the total variability (GSS)
to be retained in the core set.}
}
\value{
A list of class \code{pcss.core} with the following components.
  \item{details}{The details of the core set generation process.}
  \item{raw.out}{The original output of the \code{\link[FactoMineR]{PCA}},
  \code{\link[FactoMineR]{MCA}} or \code{\link[FactoMineR]{FAMD}} function
  of \code{\link[FactoMineR]{FactoMineR}}.}
  \item{eigen}{A data frame with the eigenvalues and their individual and
  cumulative percentages of variance.}
  \item{eigen.threshold}{The threshold eigenvalue used.}
  \item{rotation}{A matrix of rotation values or loadings.}
  \item{scores}{A matrix of scores from PCA, MCA or FAMD.}
  \item{variability.ret}{A data frame of individuals/genotypes ordered by
  variability retained.}
  \item{cores.info}{A data frame of core set size and percentage
  variability retained according to the method used.}
}
\description{
Generate a core collection with the Principal Component Scoring Strategy (PCSS)
\insertCite{hamon_proposed_1990,noirot_principal_1996,noirot_method_2003}{rpcss}
using qualitative and/or quantitative trait data. \loadmathjax
}
\details{
A core collection is constituted from an entire collection of \mjseqn{N}
genotypes using quantitative data on \mjseqn{J} traits with the Principal
Component Scoring Strategy (PCSS)
\insertCite{hamon_proposed_1990,noirot_principal_1996,noirot_method_2003}{rpcss}
as follows:

\enumerate{

\item Principal Component Analysis (PCA) is performed on the standardized
genotype \mjseqn{\times} trait data. This removes the multicollinearity
between the traits by generating \mjseqn{J} standardized and mutually
independent (uncorrelated) variables, also called factors or principal
components.

\item Considering only a subset of \mjseqn{K} factors, the Generalized Sum of
 Squares (GSS) of the \mjseqn{N} individuals in the \mjseqn{K}-dimensional
factorial space is \mjseqn{N \times K}, because each standardized factor
contributes a sum of squares of \mjseqn{N}.

\mjseqn{K} can be the number of factors for which the eigenvalue
\mjseqn{\lambda} is greater than a threshold value such as 1 (Kaiser-Guttman
criterion) or the average of all the eigenvalues.

\item The contribution (\mjseqn{P_{i}}) of the \mjseqn{i}th genotype to GSS,
or total variability, is calculated as below.

\mjsdeqn{P_{i} = \sum_{j = 1}^{K} x_{ij}^{2}}

where \mjseqn{x_{ij}} is the standardized component score or coordinate of
the \mjseqn{i}th genotype on the \mjseqn{j}th principal component.

\item For each genotype, its relative contribution (\mjseqn{CR_{i}}) to GSS
or total variability is computed as below.

\mjsdeqn{CR_{i} = \frac{P_{i}}{N \times K}}

\item The genotypes are sorted in descending order of magnitude of their
contribution to GSS and then the cumulative contribution of successive
genotypes to GSS is computed.

\item The core collection can then be selected by three different methods.

 \enumerate{

 \item Selection of a fixed proportion, percentage or number of the
 top-ranked accessions.

 \item Selection of the top accessions that contribute up to a fixed
 percentage of the GSS.

 \item Fitting a logistic regression model of the following form to the
 cumulative contribution of successive genotypes to GSS
 \insertCite{balakrishnan_method_2000}{rpcss}.

 \mjsdeqn{\frac{y}{A-y} = e^{a + bn}}

 The above equation can be reparameterized as below.

 \mjsdeqn{\log_{e} \left ( {\frac{y}{A-y}} \right ) = a + bn}

 where \mjseqn{a} and \mjseqn{b} are the intercept and regression
 coefficient, respectively; \mjseqn{y} is the cumulative contribution of
 successive genotypes to GSS (in percent); \mjseqn{n} is the rank of the
 genotype when sorted according to its contribution to GSS; and \mjseqn{A}
 is the asymptote of the curve (\mjseqn{A = 100}).

 The rate of increase in the cumulative contribution of successive genotypes
 to GSS can be computed by the following equation to find the point of
 inflection, where the rate of increase starts declining.

 \mjsdeqn{\frac{\mathrm{d} y}{\mathrm{d} n} = \frac{by(A-y)}{A}}

 The accessions ranked up to the peak of this rate, i.e. the inflection
 point of the curve, are selected to constitute the core collection.

 }

}
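
As a worked illustration of steps 2 to 4 with hypothetical numbers: for
\mjseqn{N = 100} genotypes and \mjseqn{K = 3} retained factors, the GSS is
\mjseqn{100 \times 3 = 300}, so a genotype with \mjseqn{P_{i} = 6} has
\mjseqn{CR_{i} = 6/300 = 0.02}, i.e. it accounts for 2\% of the total
variability in the retained factorial space.

For method 3, the rate of increase is proportional to \mjseqn{y(A-y)}, so it
peaks where

\mjsdeqn{\frac{\mathrm{d}}{\mathrm{d} y} \left ( y(A-y) \right ) = A - 2y = 0}

that is, at \mjseqn{y = A/2}. Substituting \mjseqn{y = A/2} into the
reparameterized model gives \mjseqn{\log_{e}(1) = 0 = a + bn}, so for the
fitted curve the inflection point lies at \mjseqn{n = -a/b} (with
\mjseqn{b > 0}).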

Similarly, for qualitative traits, standardized and independent variables or
factors can be obtained by Correspondence Analysis (CA) on the complete
disjunctive table of the genotype \mjseqn{\times} trait data, or more
specifically, by Multiple Correspondence Analysis (MCA). In \code{rpcss},
the approach has also been extended to data sets having both quantitative
and qualitative traits by using Factor Analysis of Mixed Data (FAMD) to
obtain the standardized and independent factors.

In \code{rpcss}, PCA, MCA and FAMD are implemented via the
\code{\link[FactoMineR]{FactoMineR}} package
\insertCite{le_FactoMineR_2008,husson_Exploratory_2017}{rpcss}.
}
\examples{

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Prepare example data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

suppressPackageStartupMessages(library(EvaluateCore))

# Get data from EvaluateCore

data("cassava_EC", package = "EvaluateCore")
data <- cbind(Genotypes = rownames(cassava_EC), cassava_EC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")
rownames(data) <- NULL

# Convert qualitative data columns to factor
data[, qual] <- lapply(data[, qual], as.factor)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out1 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = NULL, eigen.threshold = NULL, size = 0.2,
                  var.threshold = 0.75)

out1
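
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Illustrative sketch: PCSS steps computed by hand (quantitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A minimal sketch of the steps in Details, NOT the internal code of
# pcss.core(); variable names are illustrative and results may differ in
# detail from out1.

# Step 1: PCA on the standardized traits ($eig holds all eigenvalues)
pca.full <- FactoMineR::PCA(data[, quant], scale.unit = TRUE, graph = FALSE)
eig <- pca.full$eig[, 1]
# Step 2: keep the K factors above the average eigenvalue (the default
# used when eigen.threshold = NULL)
K <- sum(eig > mean(eig))
pca <- FactoMineR::PCA(data[, quant], scale.unit = TRUE, ncp = K,
                       graph = FALSE)
N <- nrow(data)
# Standardize scores to unit variance, so each factor has a sum of squares
# of N and the GSS in the K factorial space is N x K
x <- sweep(pca$ind$coord[, 1:K, drop = FALSE], 2, sqrt(eig[1:K]), "/")
# Steps 3 and 4: absolute (P) and relative (CR) contributions to GSS
P <- rowSums(x^2)
CR <- P / (N * K)
# Step 5: sort in descending order and accumulate (in percent)
ord <- order(CR, decreasing = TRUE)
cum.CR <- cumsum(CR[ord]) * 100
# Method 1: top 20 percent of genotypes (rounding up is an assumption)
core1 <- data$Genotypes[ord][seq_len(ceiling(0.2 * N))]
# Method 2: smallest top set whose cumulative contribution reaches 75 percent
core2 <- data$Genotypes[ord][seq_len(which(cum.CR >= 75)[1])]
length(core1)
length(core2)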

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out2 <- pcss.core(data = data, names = "Genotypes", quantitative = NULL,
                  qualitative = qual, eigen.threshold = NULL,
                  size = 0.2, var.threshold = 0.75)

out2
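
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Illustrative sketch: choice of K for qualitative data (MCA)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A minimal sketch, NOT the internal code of pcss.core(). MCA eigenvalues
# are at most 1, so the Kaiser-Guttman criterion (eigenvalue > 1) is not
# useful here, whereas the average eigenvalue still retains some factors.

mca <- FactoMineR::MCA(data[, qual], graph = FALSE)
eig.mca <- mca$eig[, 1]
sum(eig.mca > 1)              # Kaiser-Guttman criterion
sum(eig.mca > mean(eig.mca))  # average eigenvalue criterion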

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative and qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out3 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = qual, eigen.threshold = NULL)

out3
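
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Illustrative sketch: logistic fit (method 3) on FAMD scores
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A minimal sketch, NOT the internal code of pcss.core(). As an assumption,
# log(y/(A - y)) = a + bn is fitted by ordinary least squares on the logit
# scale (dropping the last genotype, where y = A), so the core size found
# here need not match out3. Variable names are illustrative.

famd.full <- FactoMineR::FAMD(data[, c(quant, qual)], graph = FALSE)
eig.famd <- famd.full$eig[, 1]
K.famd <- sum(eig.famd > mean(eig.famd))  # average eigenvalue threshold
famd <- FactoMineR::FAMD(data[, c(quant, qual)], ncp = K.famd,
                         graph = FALSE)
N.famd <- nrow(data)
# Standardized scores, relative contributions and their cumulative sum
z <- sweep(famd$ind$coord[, 1:K.famd, drop = FALSE], 2,
           sqrt(eig.famd[1:K.famd]), "/")
CR.famd <- rowSums(z^2) / (N.famd * K.famd)
A <- 100
y <- cumsum(sort(CR.famd, decreasing = TRUE)) * A
n <- seq_along(y)
# Linearized logistic model; y = A is excluded because its logit is infinite
fit.df <- data.frame(n = n, y = y)[y < A - 1e-8, ]
fit <- lm(log(y / (A - y)) ~ n, data = fit.df)
a <- unname(coef(fit)[1])
b <- unname(coef(fit)[2])
# Rate of increase of the fitted curve; its peak is the inflection point
y.fit <- A / (1 + exp(-(a + b * n)))
rate <- b * y.fit * (A - y.fit) / A
core.n <- which.max(rate)  # number of top genotypes in the core
core.n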


}
\references{
\insertAllCited{}
}
\seealso{
\code{\link[FactoMineR]{PCA}}, \code{\link[FactoMineR]{MCA}} and
  \code{\link[FactoMineR]{FAMD}}
}
