% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/kernel_functions.R
\name{Jaccard}
\alias{Jaccard}
\alias{Intersect}
\title{Kernels for sets}
\usage{
Jaccard(X, elements = LETTERS, comp = "sum", coeff = NULL)

Intersect(
  X,
  elements = LETTERS,
  comp = "sum",
  coeff = NULL,
  feat_space = FALSE
)
}
\arguments{
\item{X}{Matrix (class "character") or data.frame (class "character", or columns = "factor").
The elements in X are assumed to be categorical in nature.}

\item{elements}{All potential elements (symbols) that can appear in the sets. If there are
some elements that are not of interest, they can be excluded so they are not
taken into account by these kernels. (Defaults: LETTERS).}

\item{comp}{When \emph{D}>1, this argument indicates how the variables
of the dataset are combined. Options are: "sum", "mean" and "weighted". (Defaults: "sum")
\itemize{
  \item "sum" gives the same importance to all variables, and returns an
  unnormalized kernel matrix.
  \item "mean" gives the same importance to all variables, and returns a
  normalized kernel matrix (all its elements range between 0 and 1).
  \item "weighted" weights each variable according to the `coeff` parameter, and returns a
  normalized kernel matrix.
}}

\item{coeff}{(optional) A vector of weights with length \emph{D}.}

\item{feat_space}{(Not available for the Jaccard kernel.) If FALSE, only the
kernel matrix is returned. Otherwise, the feature space is returned too. (Defaults: FALSE).}
}
\value{
Kernel matrix (dimension: \emph{NxN}), or a list with the kernel matrix and the
feature space.
}
\description{
`Intersect()` and `Jaccard()` compute the kernel functions of the same name,
which are useful for set data. Their input is a matrix or data.frame with
dimension \emph{NxD}, where \emph{N}>1, \emph{D}>0. Samples should be in the
rows and features in the columns. When there is a single feature,
`Jaccard()` returns 1 if the elements of the set are exactly the same in
two given samples, and 0 if the two sets share no elements (see Details).
In the multivariate case (\emph{D}>1), the results of the \emph{D} features
(for both `Intersect()` and `Jaccard()`) are combined with a sum, a mean, or a
weighted mean.
}
\details{
Let \eqn{A,B} be two sets. Then, the Intersect
kernel is defined as:

\deqn{K_{Intersect}(A,B)=|A \cap B| }

And the Jaccard kernel is defined as:

\deqn{K_{Jaccard}(A,B)=|A \cap B| / |A \cup B|}
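
For example, if \eqn{A} contains the elements a, b and c, and \eqn{B} contains
the elements b, c and d, the two sets share two elements (b and c) out of four
distinct elements overall, so that:

\deqn{K_{Intersect}(A,B)=|A \cap B|=2}

\deqn{K_{Jaccard}(A,B)=|A \cap B| / |A \cup B|=2/4=0.5}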

This specific implementation of the Intersect and Jaccard kernels expects
the set members (elements) to be single-character symbols (length=1). When the
set data is multivariate (\emph{D}>1 columns, each one containing a set feature),
the elements of the \emph{D} sets should come from the same domain (universe).
For instance, a dataset with two variables in which the elements of the first
are colors c("green","black","white","red") and those of the second are names
c("Anna","Elsa","Maria") is not allowed. In that case, the set factors should be recoded
to colors c("g","b","w","r") and names c("A","E","M") and, if necessary, `Intersect()`
(or `Jaccard()`) should be called separately for each variable.
}
\examples{
# Sets data
## Generating a dataset with sets containing uppercase letters
random_set <- function(x) paste(sort(sample(LETTERS, x, FALSE)), collapse = "")
max_setsize <- 4
setsdata <- matrix(replicate(20, random_set(sample(2:max_setsize, 1))), nrow = 4, ncol = 5)

## Computing the Intersect kernel:
Intersect(setsdata, elements = LETTERS, comp = "sum")
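
## Checking both kernel definitions by hand with base R. This is only a
## sketch that does not depend on the package internals; set_a, set_b,
## k_intersect and k_jaccard are illustrative names.
set_a <- strsplit("ABC", "")[[1]]
set_b <- strsplit("BCD", "")[[1]]
k_intersect <- length(intersect(set_a, set_b)) # size of the intersection
k_jaccard <- k_intersect / length(union(set_a, set_b)) # intersection over union
k_intersect
k_jaccard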

## Computing the Jaccard kernel weighting the variables:
coeffs <- c(0.1, 0.15, 0.15, 0.4, 0.20)
Jaccard(setsdata, elements = LETTERS, comp = "weighted", coeff = coeffs)
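
## Recoding sets with long labels into single-character symbols, as described
## in Details. This is a base R sketch; color_key, color_sets and recode_set
## are illustrative names and not part of the package.
color_key <- c(green = "g", black = "b", white = "w", red = "r")
color_sets <- list(c("green", "red"), c("black", "white", "red"))
recode_set <- function(s) paste(sort(unname(color_key[s])), collapse = "")
recoded <- vapply(color_sets, recode_set, character(1))
recoded
## The recoded sets could then be passed as a one-column matrix, with
## `elements` set to the recoded symbols, e.g. elements = c("g","b","w","r").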
}
\references{
Bouchard, M., Jousselme, A. L., and Doré, P. E. (2013).
A proof for the positive definiteness of the Jaccard index matrix.
International Journal of Approximate Reasoning, 54(5), 615-626.

Ruiz, F., Angulo, C., and Agell, N. (2008).
Intersection and Signed-Intersection Kernels for Intervals.
Frontiers in Artificial Intelligence and Applications, 184, 262-270.
doi: 10.3233/978-1-58603-925-7-262.
}
