% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/getData.R
\name{getData}
\alias{getData}
\title{Read Data to a Data Frame}
\usage{
getData(
  data,
  varnames = NULL,
  drop = FALSE,
  dropUnusedLevels = TRUE,
  omittedLevels = TRUE,
  defaultConditions = TRUE,
  formula = NULL,
  recode = NULL,
  includeNaLabel = FALSE,
  addAttributes = FALSE,
  returnJKreplicates = TRUE
)
}
\arguments{
\item{data}{an \code{edsurvey.data.frame} or
a \code{light.edsurvey.data.frame}}

\item{varnames}{a character vector of variable names that will be returned.
When both \code{varnames} and
a \code{formula} are specified, variables associated with both are
returned. Set to \code{NULL} by default.}

\item{drop}{a logical value. When set to the default value of \code{FALSE},
a single returned column is still represented as a
\code{data.frame} and is not converted to a vector.}

\item{dropUnusedLevels}{a logical value. When set to the default value of
\code{TRUE}, drops unused levels of all factor
variables.}

\item{omittedLevels}{a logical value. When set to the default value of
\code{TRUE}, drops those levels of all factor variables
that are specified in an \code{edsurvey.data.frame}. Use
\code{print} on an \code{edsurvey.data.frame} to see
the omitted levels. The omitted levels can also be
adjusted with \code{setAttributes}; see Examples.}

\item{defaultConditions}{a logical value. When set to the default value of
\code{TRUE}, uses the default conditions stored in
an \code{edsurvey.data.frame} to subset the data. Use
\code{print} on an \code{edsurvey.data.frame} to
see the default conditions.}

\item{formula}{a \ifelse{latex}{\code{formula}}{\code{\link[stats]{formula}}}.
When included, \code{getData} returns data associated with
all variables of the \code{formula}. When both \code{varnames} and a
formula are specified, the variables associated with both are
returned. Set to \code{NULL} by default.}

\item{recode}{a list of lists to recode variables. Defaults to \code{NULL}.
Can be set as
\code{recode = list(var1 = list(from = c("a", "b", "c"), to = "d"))}.
See Examples.}

\item{includeNaLabel}{a logical value indicating whether \code{NA} (missing) values are
returned as literal \code{NA} values or as factor levels
coded as \code{NA}. Defaults to \code{FALSE}.}

\item{addAttributes}{a logical value set to \code{TRUE} to get a
\code{data.frame} that can be used in calls to
other functions that usually would take an
\code{edsurvey.data.frame}. This \code{data.frame} is also called a
\code{light.edsurvey.data.frame}.
See the Description section in \code{\link{edsurvey.data.frame}} for
more information on \code{light.edsurvey.data.frame}.}

\item{returnJKreplicates}{a logical value indicating if JK replicate weights
should be returned. Defaults to \code{TRUE}.}
}
\value{
When \code{addAttributes} is \code{FALSE}, \code{getData} returns a
\code{data.frame} containing data associated with the requested
variables. When \code{addAttributes} is \code{TRUE}, \code{getData} returns a
\code{light.edsurvey.data.frame}.
}
\description{
Reads selected columns into a \code{data.frame} or a
\code{light.edsurvey.data.frame}. For an \code{edsurvey.data.frame},
the data are stored on disk.
}
\details{
By default, an \code{edsurvey.data.frame} does not have data read
into memory until \code{getData} is called and returns a data frame.
This structure allows \code{EdSurvey} to have a minimal memory footprint.
To keep the footprint small, limit \code{varnames} to only
the necessary variables.

There are two methods of attaching survey attributes to a \code{data.frame}
to make it usable by the functions in the \code{EdSurvey} package (e.g., \code{lm.sdf}):
(a) setting the \code{addAttributes} argument to \code{TRUE} in the call to \code{getData}
or (b) appending the attributes to the data frame with \code{rebindAttributes}.

When \code{getData} is called, it returns a data frame. Setting the
\code{addAttributes} argument to \code{TRUE} adds the survey attributes and
changes the resultant \code{data.frame} to a \code{light.edsurvey.data.frame}.

Alternatively, a \code{data.frame} can be coerced into a \code{light.edsurvey.data.frame}
using \code{rebindAttributes}. See Examples in the \code{\link{rebindAttributes}} documentation.

If both \code{formula} and \code{varnames} are populated, the
variables from both are included.

See the vignette titled
\href{https://www.air.org/sites/default/files/EdSurvey-getData.pdf}{\emph{Using the \code{getData} Function in EdSurvey}}
for long-form documentation on this function.
}
\examples{
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# get two variables, without weights
df <- getData(data=sdf, varnames=c("dsex", "b017451"))
table(df)
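
# a minimal sketch of the drop argument: when a single variable is requested
# with drop=TRUE, the column is expected to come back as a vector rather
# than a data.frame (dsexVec is an illustrative name)
dsexVec <- getData(data=sdf, varnames="dsex", drop=TRUE)
is.data.frame(dsexVec)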

# example of using recode
df2 <- getData(data=sdf, varnames=c("dsex", "t088301"),
               recode=list(t088301=list(from=c("Yes, available","Yes, I have access"),
                                        to=c("Yes")),
                           t088301=list(from=c("No, have no access"),
                                        to=c("No"))))
table(df2)
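
# a sketch of the formula argument: variables named in the formula are
# returned, and any varnames supplied alongside it are returned too
# (df3 is an illustrative name)
df3 <- getData(data=sdf, formula=composite ~ dsex + b017451, varnames="origwt")
colnames(df3)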

# when readNAEP is called on a data file, it appends a default 
# condition to the edsurvey.data.frame. You can see these conditions
# by printing the sdf
sdf

# As per the default condition specified, getData restricts the data to only
# the Reporting Sample. This behavior can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), defaultConditions = FALSE)
table(df2)

# similarly, the default behavior of omitting certain levels specified
# in the edsurvey.data.frame can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), omittedLevels = FALSE)
table(df2)

# omittedLevels can also be edited with setAttributes()
# here, the omitted level "Multiple" is removed from the list
sdfIncludeMultiple <- setAttributes(sdf, "omittedLevels", c(NA, "Omitted"))
# check that it was set
getAttributes(sdfIncludeMultiple, "omittedLevels")
# note that omittedLevels is still TRUE, so NA and "Omitted" are still removed
dfIncludeMultiple <- getData(data=sdfIncludeMultiple, varnames=c("dsex", "b017451"))
table(dfIncludeMultiple)

# the variable "c052601" is from the school-level data file; merging is handled automatically.
# returns a light.edsurvey.data.frame using addAttributes=TRUE argument
gddat <- getData(data=sdf, 
                 varnames=c("composite", "dsex", "b017451","c052601"),
                 addAttributes = TRUE)
class(gddat)
# look at the first few lines
head(gddat)
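
# a sketch of returnJKreplicates: by default the JK replicate weights are
# returned; setting returnJKreplicates=FALSE is expected to leave them out
# when a weight such as "origwt" is requested (dfNoJK is an illustrative name)
dfNoJK <- getData(data=sdf, varnames=c("dsex", "origwt"), returnJKreplicates=FALSE)
colnames(dfNoJK)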

# get a selection of variables, recode using ifelse, and reappend attributes
# with rebindAttributes so that it can be used with EdSurvey analysis functions
df0 <- getData(sdf, c("composite", "dsex", "b017451", "origwt"))
df0$sex <- ifelse(df0$dsex=="Male", "boy", "girl")
df0 <- rebindAttributes(df0, sdf)
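# a sketch of passing the result to an analysis function (lm.sdf is the
# example named in Details); weightVar="origwt" assumes the weight requested
# above is the one wanted, and lm0 is an illustrative name
lm0 <- lm.sdf(composite ~ sex, data=df0, weightVar="origwt")
summary(lm0)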

\dontrun{
# getting all the data can use up all the memory and is generally a bad idea
df0 <- getData(sdf, varnames=colnames(sdf),
               omittedLevels=FALSE, defaultConditions=FALSE)
}
}
\seealso{
\code{\link{rebindAttributes}}, \code{\link{subset.edsurvey.data.frame}}
}
\author{
Tom Fink, Paul Bailey, and Ahmad Emad
}
