% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/prepsources.R
\name{prepsources}
\alias{prepsources}
\title{Filter and aggregate the raw source dataset}
\usage{
prepsources(
  data,
  month = 1:12,
  year,
  long_min = -180,
  long_max = 180,
  lat_min = -90,
  lat_max = 90,
  split_by = NULL,
  prop_random = 0,
  random_level = "source",
  col_source_value = "source_value",
  col_source_ID = "source_ID",
  col_lat = "lat",
  col_long = "long",
  col_elev = "elev",
  col_month = "month",
  col_year = "year"
)
}
\arguments{
\item{data}{A \emph{dataframe} containing raw isotopic measurements of sources}

\item{month}{A \emph{numeric vector} indicating the months to select from.
Should be a vector of whole numbers between 1 and 12. The default,
\code{1:12}, selects all months.}

\item{year}{A \emph{numeric vector} indicating the years to select from.
Should be a vector of whole numbers. The default is to select all years
available.}

\item{long_min}{A \emph{numeric} indicating the minimum longitude to select
from. Should be a number between -180 and 180 (default = -180).}

\item{long_max}{A \emph{numeric} indicating the maximum longitude to select
from. Should be a number between -180 and 180 (default = 180).}

\item{lat_min}{A \emph{numeric} indicating the minimum latitude to select
from. Should be a number between -90 and 90 (default = -90).}

\item{lat_max}{A \emph{numeric} indicating the maximum latitude to select
from. Should be a number between -90 and 90 (default = 90).}

\item{split_by}{A \emph{string} indicating whether data should be aggregated
per location (\code{split_by = NULL}, the default), per location:month
combination (\code{split_by = "month"}), or per location:year combination
(\code{split_by = "year"}).}

\item{prop_random}{A \emph{numeric} indicating the proportion of observations
or sampling locations (depending on the argument for \code{random_level})
that will be kept. If \code{prop_random} is greater than 0, then the
function will return a list containing two dataframes: one containing the
selected data, called \code{selected_data}, and one containing the
remaining data, called \code{remaining_data}.}

\item{random_level}{A \emph{string} indicating the level at which random draws
are performed. The two possibilities are \code{"obs"}, which indicates
that observations are randomly drawn independently of their location,
or \code{"source"} (default), which indicates that observations are randomly
drawn at the level of sampling locations.}

\item{col_source_value}{A \emph{string} indicating the column containing the
isotopic measurements}

\item{col_source_ID}{A \emph{string} indicating the column containing the ID of
each sampling location}

\item{col_lat}{A \emph{string} indicating the column containing the latitude
of each sampling location}

\item{col_long}{A \emph{string} indicating the column containing the longitude
of each sampling location}

\item{col_elev}{A \emph{string} indicating the column containing the elevation
of each sampling location}

\item{col_month}{A \emph{string} indicating the column containing the month of
sampling}

\item{col_year}{A \emph{string} indicating the column containing the year of
sampling}
}
\value{
This function returns a \emph{dataframe} containing the filtered data
aggregated by sampling location, or a \emph{list} of two dataframes (see the
argument \code{prop_random} above). For each sampling location, the sample
estimates of the mean and variance are computed.
}
\description{
This function prepares the available dataset to be used for creating the
isoscape (e.g. \link{GNIPDataDE}). It allows the data to be trimmed by month,
year and location, and the selected data to be aggregated per location,
location:month combination or location:year combination. The function can
also be used to randomly exclude some observations.
}
\details{
This function aggregates the data as required for the IsoriX workflow. Three
aggregation schemes are possible for now. The simplest one, used as
default, aggregates the data so as to obtain a single row per sampling
location. Datasets prepared in this way can be readily fitted with the
function \link{isofit} to build an isoscape. It is also possible to
aggregate data in a different way in order to build sub-isoscapes
representing temporal variation in isotope composition, or in order to
produce isoscapes weighted by the amount of precipitation (for isoscapes on
precipitation data only). The two possible options are to either split the
data from each location by month or to split them by year. This is set with
the \code{split_by} argument of the function. Datasets prepared in this way
should be fitted with the function \link{isomultifit}.

The function also allows the user to filter the sampling locations based on
time (years and/or months) and space (locations given in geographic
coordinates, i.e. longitude and latitude). This makes it possible to compute
tailored isoscapes that match, e.g., the time of sampling, and to speed up
the model fit by cropping the data to a certain area. The dataframe produced
by this function can be used as input to fit the isoscape (see \link{isofit}
and \link{isomultifit}).
}
\examples{
## Create a processed dataset for Germany
GNIPDataDEagg <- prepsources(data = GNIPDataDE)

head(GNIPDataDEagg)
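
## Illustrative base-R sketch (NOT the internal code of prepsources): the
## default aggregation boils down to one row per sampling location holding
## the mean and the sample variance of the measurements. The toy data frame,
## its values and the output column names are made up for illustration only.
toy <- data.frame(
  source_ID = c("A", "A", "A", "B", "B"),
  source_value = c(-50, -60, -70, -40, -44)
)
toy_mean <- tapply(toy$source_value, toy$source_ID, mean)
toy_var <- tapply(toy$source_value, toy$source_ID, var)
toy_agg <- data.frame(
  source_ID = names(toy_mean),
  mean_value = as.numeric(toy_mean),
  var_value = as.numeric(toy_var),
  n_value = as.numeric(table(toy$source_ID))
)

toy_agg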

## Create a processed dataset for Germany per month
GNIPDataDEmonthly <- prepsources(
  data = GNIPDataDE,
  split_by = "month"
)

head(GNIPDataDEmonthly)
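
## Illustrative base-R sketch (NOT the internal code of prepsources):
## splitting by month corresponds to one row per location:month combination
## instead of one row per location. The toy data are made up for illustration.
toy_m <- data.frame(
  source_ID = c("A", "A", "A", "B", "B", "B"),
  month = c(1, 1, 2, 1, 2, 2),
  source_value = c(-50, -54, -70, -40, -44, -48)
)
toy_m_agg <- aggregate(
  source_value ~ source_ID + month,
  data = toy_m,
  FUN = mean
)

toy_m_agg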

## Create a processed dataset for Germany per year
GNIPDataDEyearly <- prepsources(
  data = GNIPDataDE,
  split_by = "year"
)

head(GNIPDataDEyearly)

## Create isoscape-dataset for warm months in Germany between 1995 and 1996
GNIPDataDEwarm <- prepsources(
  data = GNIPDataDE,
  month = 5:8,
  year = 1995:1996
)

head(GNIPDataDEwarm)


## Create a dataset with 90\% of obs
GNIPDataDE90pct <- prepsources(
  data = GNIPDataDE,
  prop_random = 0.9,
  random_level = "obs"
)

lapply(GNIPDataDE90pct, head) # show beginning of both datasets

## Create a dataset with half the sampling locations
GNIPDataDE50pctsources <- prepsources(
  data = GNIPDataDE,
  prop_random = 0.5,
  random_level = "source"
)

lapply(GNIPDataDE50pctsources, head)
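
## Illustrative base-R sketch (an assumption about the idea, NOT the internal
## code of prepsources): drawing at the level of sampling locations means that
## whole locations are sampled, so all rows of a given location end up in the
## same subset. The toy data and object names are made up for illustration.
toy_obs <- data.frame(
  source_ID = rep(c("A", "B", "C", "D"), each = 3),
  source_value = seq(-80, -25, by = 5)
)
set.seed(1)
toy_IDs <- unique(toy_obs$source_ID)
toy_kept_IDs <- sample(toy_IDs, size = round(0.5 * length(toy_IDs)))
toy_is_kept <- is.element(toy_obs$source_ID, toy_kept_IDs)
toy_split <- list(
  selected_data = toy_obs[toy_is_kept, ],
  remaining_data = toy_obs[!toy_is_kept, ]
)

lapply(toy_split, nrow)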


## Create a dataset with half the sampling locations split per month
GNIPDataDE50pctsourcesMonthly <- prepsources(
  data = GNIPDataDE,
  split_by = "month",
  prop_random = 0.5,
  random_level = "source"
)

lapply(GNIPDataDE50pctsourcesMonthly, head)

}
