% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/EG_selection.R
\name{EG_selection}
\alias{EG_selection}
\title{Selection of survey sites maximizing uniformity in environmental space
considering geographic structure}
\usage{
EG_selection(master, n_blocks, guess_distances = TRUE, initial_distance = NULL,
             increase = NULL, max_n_samplings = 1, replicates = 10,
             use_preselected_sites = TRUE, select_point = "E_centroid",
             cluster_method = "hierarchical", median_distance_filter = NULL,
             sample_for_distance = 250, set_seed = 1,
             verbose = TRUE, force = FALSE)
}
\arguments{
\item{master}{master_matrix object derived from the function
\code{\link{prepare_master_matrix}} or master_selection object derived from
functions \code{\link{random_selection}}, \code{\link{uniformG_selection}},
or \code{\link{uniformE_selection}}.}

\item{n_blocks}{(numeric) number of blocks to be selected to be used as the
base for further explorations.}

\item{guess_distances}{(logical) whether or not to use internal algorithm
to automatically select \code{initial_distance} and \code{increase}. Default
= TRUE. If FALSE, \code{initial_distance} and \code{increase} must be
defined.}

\item{initial_distance}{(numeric) Euclidean distance to be used for a first
process of thinning and detection of remaining blocks. See details in
\code{\link{point_thinning}}. Default = NULL.}

\item{increase}{(numeric) initial value to be added to or subtracted from
\code{initial_distance} until reaching the number of blocks defined in
\code{n_blocks}. Default = NULL.}

\item{max_n_samplings}{(numeric) maximum number of samples to be chosen
after performing all thinning \code{replicates}. Default = 1.}

\item{replicates}{(numeric) number of thinning replicates performed to
select blocks uniformly. Default = 10.}

\item{use_preselected_sites}{(logical) whether to use sites that have been
defined as part of the selected sites prior to any selection. The object in
\code{master} must contain the preselected site(s) in an element named
"preselected_sites" for this argument to be effective. Default = TRUE.
See details for more information on the approach used.}

\item{select_point}{(character) how or which point will be selected for each
block or cluster. Three options are available: "random", "E_centroid", and
"G_centroid". E_ or G_ centroid indicates that the point(s) closest to the
respective centroid will be selected. Default = "E_centroid".}

\item{cluster_method}{(character) name of the method to be used for detecting
geographic clusters of points inside each block. Options are "hierarchical"
and "k-means"; default = "hierarchical". See details in
\code{\link{find_clusters}}.}

\item{median_distance_filter}{(character) optional filter, based on the
median distance among sites, used to choose among sets of sampling sites.
The default, NULL, does not apply such a filter. Options are:
"max" and "min". See details.}

\item{sample_for_distance}{(numeric) sample to be considered when measuring
the geographic distances among points in blocks created in environmental
space. The distances measured are then used to test whether points are
distributed uniformly or not in the geography. Default = 250.}

\item{set_seed}{(numeric) integer value to specify an initial seed.
Default = 1.}

\item{verbose}{(logical) whether or not to print messages about the process.
Default = TRUE.}

\item{force}{(logical) whether to replace an existing set of sites selected
with this method in \code{master}. Default = FALSE.}
}
\value{
A \code{\link{master_selection}} object (S3) with a special element called
selected_sites_EG containing one or more sets of selected sites depending on
\code{max_n_samplings} and \code{median_distance_filter}.
}
\description{
Selection of sites to be sampled in a survey, with the goal of
maximizing uniformity of points in the environment, but considering
geographic patterns of data. Sets of points that are environmentally similar
but have a disjoint pattern in geography are selected twice (two survey
sites are placed so that they represent the two largest geographic clusters).
}
\details{
Two important steps are needed before using this function: 1) exploring data
in environmental and geographic spaces, and 2) performing a regionalization
of the environmental space. Exploring the data can be done using the function
\code{\link{explore_data_EG}}. This step is optional but strongly
recommended, as important decisions that need to be taken depend on the
configuration of the data in the two spaces. A regionalization of the
environmental space of the region of interest helps in defining important
parts of your region that should be considered to select sites. This can be done
using the function \code{\link{make_blocks}}. Later, the regions created in
environmental space will be used for selecting one or more sampling sites per
block depending on the geographic pattern of such environmental combinations.

The process of survey-site selection with this function is the most complex
among all functions in this package. The complexity derives from the aim of
the function, which is to select sites that appropriately sample the
environmental combinations in the region of interest (environmental space),
but considering the geographic patterns of such environmental regions
(geographic space).

In this approach, the first step is to select candidate blocks (from the
ones obtained with \code{\link{make_blocks}}) that are uniformly distributed
in environmental space. The geographic configuration of points in such
blocks is explored to detect whether they are clustered (i.e., similar
environmental conditions are present in distant places in the region of
interest). For blocks with points that present one cluster in geography,
only one survey site is selected, and for those with multiple clusters in
geographic space, two survey sites are selected considering the two largest
clusters.

If \code{use_preselected_sites} is TRUE and such sites are included as an
element in the object in \code{master}, the approach for selecting sites in
environmental space considering geographic patterns is slightly different.
User-preselected sites will always be part of the sites selected. Other points
are selected based on an algorithm that searches for sites that are uniformly
distributed in environmental space but at a distance from preselected sites
that helps in maintaining uniformity among environmental blocks selected.
Note that preselected sites are not processed; therefore, the uniformity of
blocks representing such points cannot be guaranteed.

As multiple sets could result from selection, the argument
\code{median_distance_filter} can be used to select the set of sites with
the maximum ("max") or minimum ("min") median distance among selected sites.
Option "max" will increase the geographic distance among sampling sites,
which could be desirable if the goal is to cover the region of interest more
broadly. The other option, "min", could be used in cases when the goal is to
reduce resources and time needed to sample such sites.
}
\examples{
\donttest{
# Data
data("m_matrix", package = "biosurvey")

# Making blocks for analysis
m_blocks <- make_blocks(m_matrix, variable_1 = "PC1", variable_2 = "PC2",
                        n_cols = 10, n_rows = 10, block_type = "equal_area")

# Checking column names
colnames(m_blocks$data_matrix)

# Selecting sites uniformly in E and G spaces
EG_sel <- EG_selection(master = m_blocks, n_blocks = 10,
                       initial_distance = 1.5, increase = 0.1,
                       replicates = 1, max_n_samplings = 1,
                       select_point = "E_centroid",
                       cluster_method = "hierarchical",
                       sample_for_distance = 100)

head(EG_sel$selected_sites_EG[[1]])
dim(EG_sel$selected_sites_EG[[1]])
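
# A hedged sketch, not part of the original example: asking for up to three
# candidate sets and keeping the one with the largest median geographic
# distance among sites, as described in details. The values used for
# replicates and max_n_samplings are illustrative assumptions, and running
# time may increase with more replicates. Reuses m_blocks created above.
EG_sel_max <- EG_selection(master = m_blocks, n_blocks = 10,
                           initial_distance = 1.5, increase = 0.1,
                           replicates = 5, max_n_samplings = 3,
                           select_point = "E_centroid",
                           cluster_method = "hierarchical",
                           median_distance_filter = "max",
                           sample_for_distance = 100)

# Number of sets of sites kept after applying the filter
length(EG_sel_max$selected_sites_EG)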
}
}
\seealso{
\code{\link{uniformG_selection}}, \code{\link{uniformE_selection}},
\code{\link{random_selection}}, \code{\link{make_blocks}},
\code{\link{plot_sites_EG}}
}
