% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/timeAverage.R
\name{timeAverage}
\alias{timeAverage}
\title{Function to calculate time averages for data frames}
\usage{
timeAverage(
  mydata,
  avg.time = "day",
  data.thresh = 0,
  statistic = "mean",
  type = "default",
  percentile = NA,
  start.date = NA,
  end.date = NA,
  interval = NA,
  vector.ws = FALSE,
  fill = FALSE,
  ...
)
}
\arguments{
\item{mydata}{A data frame containing a \code{date} field, which can
be class \code{POSIXct} or \code{Date}.}

\item{avg.time}{This defines the time period to average to. Can be
  \dQuote{sec}, \dQuote{min}, \dQuote{hour}, \dQuote{day},
  \dQuote{DSTday}, \dQuote{week}, \dQuote{month}, \dQuote{quarter}
  or \dQuote{year}. For greater flexibility, any of these options
  can be preceded by a number and a space. For example, a
  2-month average would be \code{avg.time = "2 month"}. In
  addition, \code{avg.time} can equal \dQuote{season}, in which
  case 3-month seasonal values are calculated with spring defined
  as March, April, May and so on.

  Note that \code{avg.time} can be \emph{less} than the time
  interval of the original series, in which case the series is
  expanded to the new time interval. This is useful, for example,
  for calculating a 15-minute time series from an hourly one where
  an hourly value is repeated for each new 15-minute period. Note
  that when expanding data in this way it is necessary to ensure
  that the time interval of the original series is an exact
  multiple of \code{avg.time} e.g. hour to 10 minutes, day to
  hour. Also, the input time series must have consistent time gaps
  between successive intervals so that \code{timeAverage} can work
  out how much \sQuote{padding} to apply. To pad-out data in this
  way choose \code{fill = TRUE}.}

\item{data.thresh}{The data capture threshold to use (\%). A value
of zero means that all available data will be used in a
particular period regardless of the number of values
available. Conversely, a value of 100 will mean that all data
will need to be present for the average to be calculated, else
it is recorded as \code{NA}. See also \code{interval},
\code{start.date} and \code{end.date} to see whether it is
advisable to set these other options.}

\item{statistic}{The statistic to apply when aggregating the data;
default is the mean. Can be one of \dQuote{mean}, \dQuote{max},
\dQuote{min}, \dQuote{median}, \dQuote{frequency}, \dQuote{sd},
\dQuote{percentile} or \dQuote{data.cap}. Note that \dQuote{sd} is
the standard deviation, \dQuote{frequency} is the number
(frequency) of valid records in the period and \dQuote{data.cap}
is the percentage data capture. \dQuote{percentile} calculates the
percentile at a level (\%) between 0 and 100, which is set using
the \code{percentile} option (see below). Not used if
\code{avg.time = "default"}.}

\item{type}{\code{type} allows \code{timeAverage} to be applied to
cases where there are groups of data that need to be split and
the function applied to each group. The most common example is
data with multiple sites identified with a column representing
site name e.g. \code{type = "site"}. More generally, \code{type}
should be used where the date repeats for a particular grouping
variable. However, if type is not supplied the data will still
be averaged but the grouping variables (character or factor)
will be dropped.}

\item{percentile}{The percentile level in \% used when
\code{statistic = "percentile"}. The default is 95.}

\item{start.date}{A string giving a start date to use. This is
sometimes useful if a time series starts between obvious
intervals. For example, for a 1-minute time series that starts
\dQuote{2009-11-29 12:07:00} that needs to be averaged up to
15-minute means, the intervals would be \dQuote{2009-11-29
12:07:00}, \dQuote{2009-11-29 12:22:00} etc. Often, however, it
is better to round down to a more obvious start point e.g.
\dQuote{2009-11-29 12:00:00} such that the sequence is then
\dQuote{2009-11-29 12:00:00}, \dQuote{2009-11-29 12:15:00}
\ldots{} \code{start.date} is therefore used to force this type
of sequence.}

\item{end.date}{A string giving an end date to use. This is
sometimes useful to make sure a time series extends to a known
end point and is useful when \code{data.thresh} > 0 but the
input time series does not extend up to the final full interval.
For example, if a time series ends sometime in October but
annual means are required with a data capture of >75\% then it
is necessary to extend the time series up until the end of the
year. Input in the format yyyy-mm-dd HH:MM.}

\item{interval}{The \code{timeAverage} function tries to determine
  the interval of the original time series (e.g. hourly) by
  calculating the most common interval between time steps. The
  interval is needed for calculations where \code{data.thresh}
  > 0. For the vast majority of regular time series this works
  fine. However, for data with very poor data capture or irregular
  time series the automatic detection may not work. Also, for time
  series such as monthly time series where there is a variable
  difference in time between months users should specify the time
  interval explicitly e.g. \code{interval = "month"}. Users can
  also supply a time interval to \emph{force} on the time series.
  See \code{avg.time} for the format.

  This option can sometimes be useful with \code{start.date} and
  \code{end.date} to ensure full periods are considered e.g. a
  full year when \code{avg.time = "year"}.}

\item{vector.ws}{Should vector averaging be carried out on wind
speed if available? The default is \code{FALSE} and scalar
averages are calculated. Vector averaging of the wind speed is
carried out on the u and v wind components. For example,
consider the average of two hours where the wind direction and
speed of the first hour is 0 degrees and 2 m/s and 180 degrees
and 2 m/s for the second hour. The scalar average of the wind
speed is simply the arithmetic average = 2 m/s and the vector
average is 0 m/s. Vector-averaged wind speeds are never higher
than scalar-averaged values, and are lower whenever the wind
direction varies within the averaging period.}

\item{fill}{When time series are expanded, i.e. when \code{avg.time}
is shorter than the interval of the original time series, data
are \sQuote{padded out} with \code{NA}. To \sQuote{pad out} the
additional data with the first row in each original time
interval, choose \code{fill = TRUE}.}

\item{...}{Additional arguments for other functions calling
\code{timeAverage}.}
}
\value{
Returns a data frame with date in class \code{POSIXct}.
}
\description{
Function to flexibly aggregate or expand data frames by different
time periods, calculating vector-averaged wind direction where
appropriate. The averaged periods can also take account of data
capture rates.
}
\details{
This function calculates time averages for a data frame. It also
treats wind direction correctly through vector-averaging. For
example, the average of 350 degrees and 10 degrees is either 0 or
360 - not 180. The calculations therefore average the wind
components.

When a data capture threshold is set through \code{data.thresh} it
is necessary for \code{timeAverage} to know what the original time
interval of the input time series is. The function will try to
calculate this interval based on the most common time gap (and
will print the assumed time gap to the screen). This works fine
most of the time but there are occasions where it may not e.g.
when very few data exist in a data frame or the data are monthly
(i.e. non-regular time interval between months). In this case the
user can explicitly specify the interval through \code{interval}
in the same format as \code{avg.time} e.g. \code{interval =
"month"}. It may also be useful to set \code{start.date} and
\code{end.date} if the time series do not span the entire period
of interest. For example, if a time series ended in October and
annual means are required, setting \code{end.date} to the end of
the year will ensure that the whole period is covered and that
\code{data.thresh} is correctly calculated. The same also goes for
a time series that starts later in the year where
\code{start.date} should be set to the beginning of the year.

\code{timeAverage} should be useful in many circumstances where it
is necessary to work with data recorded at different time
resolutions, for example hourly air pollution data and 15-minute
meteorological data. To merge the two data sets,
\code{timeAverage} can be used to
make the meteorological data 1-hour means first. Alternatively,
\code{timeAverage} can be used to expand the hourly data to 15
minute data - see example below.

For the research community \code{timeAverage} should be useful for
dealing with outputs from instruments that record over a range of
time periods.

It is also very useful for plotting data using
\code{\link{timePlot}}. Often the data are too dense to see
patterns, and averaging over a longer period helps with
interpretation.
}
\examples{

## daily average values
daily <- timeAverage(mydata, avg.time = "day")

## daily average values ensuring at least 75 \% data capture
## i.e. at least 18 valid hours
\dontrun{daily <- timeAverage(mydata, avg.time = "day", data.thresh = 75)}
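
## annual means with at least 75 \% data capture where the series
## stops part-way through its final year. A sketch only: the end
## date below is hypothetical and should be changed to the end of
## the final year in your own data, and interval = "hour" assumes
## hourly input data
\dontrun{
annual <- timeAverage(
  mydata,
  avg.time = "year",
  data.thresh = 75,
  interval = "hour",
  end.date = "2005-12-31 23:00"
)
}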

## 2-weekly averages
\dontrun{fortnight <- timeAverage(mydata, avg.time = "2 week")}
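
## statistics other than the mean are requested through 'statistic';
## the object names below are for illustration only
\dontrun{
## monthly 95th percentile values
p95 <- timeAverage(mydata, avg.time = "month",
                   statistic = "percentile", percentile = 95)

## monthly maximum values
monthly.max <- timeAverage(mydata, avg.time = "month", statistic = "max")

## daily means with vector-averaged wind speed
daily.vec <- timeAverage(mydata, avg.time = "day", vector.ws = TRUE)
}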

## make a 15-minute time series from an hourly one
\dontrun{
min15 <-  timeAverage(mydata, avg.time = "15 min", fill = TRUE)
}
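
## 15-minute means from a 1-minute series that starts between
## obvious intervals. 'min1' is a hypothetical data frame (not
## supplied with the package) with a 'date' column starting at
## 2009-11-29 12:07:00; start.date forces intervals of 12:00, 12:15, ...
\dontrun{
min15.aligned <- timeAverage(min1, avg.time = "15 min",
                             start.date = "2009-11-29 12:00:00")
}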

# average by grouping variable
\dontrun{
dat <- importAURN(c("kc1", "my1"), year = 2011:2013)
timeAverage(dat, avg.time = "year", type = "site")

# can also retain site code
timeAverage(dat, avg.time = "year", type = c("site", "code"))

# or just average all the data, dropping site/code
timeAverage(dat, avg.time = "year")
}
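
## vector averaging illustrated with base R only. This is a sketch of
## the idea described in Details, not the internal code of
## timeAverage; all object names are for illustration
deg2rad <- pi / 180

## wind direction: average 350 and 10 degrees through the u and v
## components rather than arithmetically (which would give 180)
wd <- c(350, 10)
u <- mean(sin(wd * deg2rad))
v <- mean(cos(wd * deg2rad))
wd.mean <- atan2(u, v) / deg2rad
## atan2 returns -180 to 180, so shift negative angles into 0 to 360
if (wd.mean < 0) wd.mean <- wd.mean + 360
wd.mean # within rounding error of 0 (equivalently 360)

## wind speed: 2 m/s from 0 degrees, then 2 m/s from 180 degrees
ws <- c(2, 2)
wd2 <- c(0, 180)
ws.scalar <- mean(ws)
ws.vector <- sqrt(mean(ws * sin(wd2 * deg2rad))^2 +
                  mean(ws * cos(wd2 * deg2rad))^2)
c(scalar = ws.scalar, vector = ws.vector) # 2 and effectively 0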
}
\seealso{
See \code{\link{timePlot}} that plots time series data
  and uses \code{timeAverage} to aggregate data where necessary.
}
\author{
David Carslaw
}
\keyword{methods}
