% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/semFit.R
\name{SEMrun}
\alias{SEMrun}
\title{Fit a graph as a Structural Equation Model (SEM)}
\usage{
SEMrun(
  graph,
  data,
  group = NULL,
  fit = 0,
  algo = "lavaan",
  start = NULL,
  limit = 100,
  ...
)
}
\arguments{
\item{graph}{An igraph object.}

\item{data}{A matrix with rows corresponding to subjects, and
columns to graph nodes (variables).}

\item{group}{A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. If \code{NULL} (default), group influence will
not be considered.}

\item{fit}{A numeric value indicating the SEM fitting mode.
If \code{fit = 0} (default), no group effect is considered.
If \code{fit = 1}, a "common" model is used to evaluate group effects
on graph nodes.
If \code{fit = 2}, a two-group model is used to evaluate group effects
on graph edges.}

\item{algo}{MLE method used for SEM fitting. If \code{algo = "lavaan"}
(default), the SEM will be fitted using the NLMINB solver from
\code{lavaan} R package, with standard errors derived from the expected
Fisher information matrix. If \code{algo = "ricf"}, the model is fitted
via residual iterative conditional fitting (RICF; Drton et al. 2009).
If \code{algo = "cggm"}, model fitting is based on constrained Gaussian
Graphical Modeling (GGM) and de-sparsified glasso estimator
(Williams, 2020).}

\item{start}{Starting value of SEM parameters for \code{algo = "lavaan"}.
If start is \code{NULL} (default), the algorithm will determine the
starting values. If start is a numeric value, it will be used as a
scaling factor for the edge weights in the graph object (graph attribute
\code{E(graph)$weight}).
For instance, a scaling factor is useful when weights have fixed values
(e.g., 1 for activated, -1 for repressed, and 0 for unchanged interaction).
Fixed values may compromise model fitting, and scaling them is a safe
option to avoid this problem. As a rule of thumb, in our experience,
\code{start = 0.1} generally performs well with {-1, 0, 1} weights.}

\item{limit}{An integer value corresponding to the network size
(i.e., number of nodes). Beyond this limit, execution under
\code{algo = "lavaan"} will be redirected to \code{algo = "ricf"} if
\code{fit} is either 0 or 1, or to \code{algo = "cggm"} if \code{fit = 2}.
This redirection is necessary to reduce the computational demand of
standard error estimation by lavaan. Increasing this number will
enforce lavaan execution when \code{algo = "lavaan"}.}

\item{...}{Currently ignored.}
}
\value{
A list of 5 objects:
\enumerate{
\item "fit", the fitted SEM as a lavaan, ricf, or ggmncv object,
depending on the MLE method specified by the \code{algo} argument;
\item "gest" or "dest", a data.frame of node-specific
("gest") or edge-specific ("dest") group effect estimates and P-values;
\item "model", SEM model as a string if \code{algo = "lavaan"},
and \code{NULL} otherwise;
\item "graph", the induced subgraph of the input network mapped
on data variables. Graph edges (i.e., direct effects) with P-value < 0.05
will be highlighted in red (beta > 0) or blue (beta < 0). If a group
vector is given, nodes with significant group effect (P-value < 0.05)
will be red-shaded (beta > 0) or lightblue-shaded (beta < 0);
\item "dataXY", the subset of input data mapped onto graph nodes, plus
group in the first column (if no group is specified, this column will
take NA values).
}
}
\description{
\code{SEMrun()} converts a (directed, undirected, or mixed)
graph to a SEM and fits it. If a binary group variable (i.e., case/control)
is present, node-level or edge-level perturbation is evaluated.
This function can handle loop-containing models, although multiple
links between the same two nodes (including self-loops and mutual
interactions) and bows (i.e., a directed and a bidirected link between
two nodes) are not allowed.
}
\details{
SEMrun maps data onto the input graph and converts it into a
SEM. Directed connections (X -> Y) are interpreted as direct causal
effects, while undirected, mutual, and bidirected connections are
converted into model covariances. SEMrun output contains different sets
of parameter estimates. Beta coefficients (i.e., direct effects) are
estimated from directed interactions and residual covariances (psi
coefficients) from bidirected, undirected, or mutual interactions.
If a group variable is given, exogenous group effects on nodes (gamma
coefficients) will be estimated. This will also lead to the estimation
of a set of aggregated group effects, if \code{algo = "ricf"} (see
\code{\link[SEMgraph]{SEMgsa}}).

By default, the model is fitted through the \code{\link[lavaan]{lavaan}}
function via Maximum Likelihood Estimation (estimator = "ML", the default
estimator in \code{\link[lavaan]{lavOptions}}), and P-values for
parameter sets are computed by conventional z-test (= estimate/SE).

In case of high dimensionality (n.variables >> n.subjects), the sample
covariance matrix may not be positive definite, in which case parameter
estimation is not possible. If this happens, covariance matrix regularization
is enabled using the James-Stein-type shrinkage estimator implemented
in the function \code{\link[corpcor]{pcor.shrink}} of the corpcor R package.

Argument \code{fit} determines how group influence is evaluated in the
model, as absent (\code{fit = 0}), node perturbation (\code{fit = 1}),
or edge perturbation (\code{fit = 2}). When \code{fit = 1}, the group
is modeled as an exogenous variable, influencing all the other graph
nodes. When \code{fit = 2}, SEMrun estimates the differences
of the beta and/or psi coefficients (network edges) between groups.
This is equivalent to fitting separate models for cases and controls,
as opposed to one common model perturbed by the exogenous group effect.
Once fitted, the two models are compared to assess significant
edge (i.e., direct effect) differences (d = beta1 - beta0).
P-values for parameter sets are computed by z-test (= d/SE), through
\code{\link[lavaan]{lavaan}}.

As an alternative to standard P-value
calculation, SEMrun may use either RICF (randomization P-values) or
constrained GGM (de-sparsified P-values) methods. These algorithms are
much faster than \code{\link[lavaan]{lavaan}} for large input graphs.
}
\examples{

#### Model fitting (no group effect)

sem0 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc), algo = "lavaan")
summary(sem0$fit)
head(parameterEstimates(sem0$fit))
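
# Sketch of the z-test described in Details. This is an illustration
# built on the columns returned by lavaan's parameterEstimates(), not
# SEMrun's internal code: z = estimate/SE, with a two-sided normal P-value.
# The names pe, free, z, and p are introduced here for illustration only.
pe <- parameterEstimates(sem0$fit)
free <- !is.na(pe$se) & pe$se > 0     # keep parameters with a usable SE
z <- pe$est[free] / pe$se[free]
p <- 2 * pnorm(-abs(z))
head(cbind(pe[free, c("lhs", "op", "rhs", "pvalue")], z = z, p.manual = p))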

sem0 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc), algo = "ricf")
summary(sem0$fit)
head(sem0$fit$parameterEstimates)

sem0 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc), algo = "cggm")
summary(sem0$fit)
head(sem0$fit$parameterEstimates)
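
# Sketch of the 'limit' redirection. The value 5 is a deliberately small,
# hypothetical limit; assuming the sachs graph has more than 5 nodes, the
# lavaan request should be handed over to RICF, as described under 'limit'.
# The object name sem0r is introduced here for illustration only.
sem0r <- SEMrun(graph = sachs$graph, data = log(sachs$pkc),
                algo = "lavaan", limit = 5)
class(sem0r$fit)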

# Graphs
gplot(sem0$graph, main = "edge differences")
plot(sem0$graph, layout = layout.circle, main = "edge differences")


#### Model fitting (common model, group effect on nodes)

sem1 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc),
               group = sachs$group,
               fit = 1)

# Fitting summaries
summary(sem1$fit)
print(sem1$gest)
head(parameterEstimates(sem1$fit))

# Graphs
gplot(sem1$graph, main = "node differences")
plot(sem1$graph, layout = layout.circle, main = "node differences")


#### Two-group model fitting (group effect on edges)

sem2 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc),
               group = sachs$group,
               fit = 2)

# Summaries
summary(sem2$fit)
print(sem2$dest)
head(parameterEstimates(sem2$fit))

# Graphs
gplot(sem2$graph, main = "Between group edge differences")
plot(sem2$graph, layout = layout.circle, main = "Between group edge differences")
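

#### Covariance shrinkage in high dimensions (illustrative sketch)

# A minimal sketch, not SEMrun's internal code: it illustrates why
# shrinkage helps when n.variables >> n.subjects. The simulated matrix X
# is hypothetical. cov.shrink() is used instead of pcor.shrink() (both
# are in the corpcor package) because it returns a covariance matrix,
# whose eigenvalues can be compared with those of the sample covariance.
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)  # 20 subjects, 50 variables
S <- cov(X)                                        # rank-deficient sample covariance
S.shrunk <- corpcor::cov.shrink(X, verbose = FALSE)

# Smallest eigenvalue: (numerically) zero for S, strictly positive after shrinkage
min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
min(eigen(S.shrunk, symmetric = TRUE, only.values = TRUE)$values)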

\donttest{

# Fitting and visualization of a large pathway:

g <- kegg.pathways[["MAPK signaling pathway"]]
G <- properties(g)[[1]]; summary(G)

library(huge)
als.npn <- huge.npn(alsData$exprs)

g1 <- SEMrun(G, als.npn, alsData$group, algo = "cggm")$graph
g2 <- SEMrun(g1, als.npn, alsData$group, fit = 2, algo = "cggm")$graph

# extract the subgraph with between group node and edge differences
g2 <- g2 - E(g2)[-which(E(g2)$color != "gray50")]
g <- properties(g2)[[1]]

# plot graph
library(org.Hs.eg.db)
V(g)$label <- mapIds(org.Hs.eg.db, V(g)$name, 'SYMBOL', 'ENTREZID')
E(g)$color <- E(g2)$color[E(g2) \%in\% E(g)]
gplot(g, l = "fdp", main = "node and edge group differences")

}

}
\references{
Pearl J (1998). Graphs, Causality, and Structural Equation Models.
Sociological Methods & Research, 27(2): 226-284.
<https://doi.org/10.1177/0049124198027002004>

Rosseel Y (2012). lavaan: An R Package for Structural Equation
Modeling. Journal of Statistical Software, 48(2): 1-36.
<https://www.jstatsoft.org/v48/i02/>

Pepe D, Grassi M (2014). Investigating perturbed pathway modules
from gene expression data via Structural Equation Models. BMC
Bioinformatics, 15: 132.
<https://doi.org/10.1186/1471-2105-15-132>

Drton M, Eichler M, Richardson TS (2009). Computing Maximum Likelihood
Estimates in Recursive Linear Models with Correlated Errors.
Journal of Machine Learning Research, 10(Oct): 2329-2348.
<https://www.jmlr.org/papers/volume10/drton09a/drton09a.pdf>

Larson JL, Owen AB (2015). Moment based gene set tests. BMC
Bioinformatics, 16: 132. <https://doi.org/10.1186/s12859-015-0571-7>

Palluzzi F, Grassi M (2021). SEMgraph: An R Package for Causal Network
Analysis of High-Throughput Data with Structural Equation Models.
<arXiv:2103.08332>

Williams D (2020). GGMncv: Gaussian Graphical Models with Non-Convex
Penalties. R package version 1.1.0.
<https://CRAN.R-project.org/package=GGMncv/>
}
\seealso{
See \code{\link[ggm]{fitAncestralGraph}} for RICF algorithm
details, \code{\link[flip]{flip}} for randomization P-values,
\code{\link[GGMncv]{constrained}} for constrained GGM, and
\code{\link[GGMncv]{inference}} for de-sparsified P-values.
}
\author{
Mario Grassi \email{mario.grassi@unipv.it}
}
