\name{cat.analysis}
\alias{cat.analysis}
\title{Categorical Data Analysis for Probability Survey Data}
\description{
  This function organizes input and output for analysis of categorical data
  generated by a probability survey.  Input can be supplied either as an object
  of class psurvey.analysis (see the documentation for function
  psurvey.analysis) or through the other arguments to this function.
}
\usage{
cat.analysis(sites=NULL, subpop=NULL, design=NULL, data.cat=NULL,
   N.cluster=NULL, popsize=NULL, stage1size=NULL, popcorrect=FALSE,
   sizeweight=FALSE, unitsize=NULL, vartype="Local", conf=95, psurvey.obj=NULL)
}
\arguments{
  \item{sites}{a data frame consisting of two variables: the first variable is 
    site IDs, and the second variable is a logical vector indicating which
    sites to use in the analysis.  If psurvey.obj is not provided, then
    this argument is required.  The default is NULL.}
  \item{subpop}{a data frame describing sets of populations and subpopulations 
    for which estimates will be calculated.  The first variable is site  
    IDs.  Each subsequent variable identifies a Type of population, where
    the variable name is used to identify Type.  A Type variable
    identifies each site with one of the subpopulations of that Type.  If
    psurvey.obj is not provided, then this argument is required.  The
    default is NULL.}
  \item{design}{a data frame consisting of design variables.  If psurvey.obj is
    not provided, then this argument is required.  The default is NULL.
    Variables should be named as follows:\cr
       siteID = site IDs\cr
       wgt = final adjusted weights, which are either the weights for a
         single-stage sample or the stage two weights for a two-stage sample\cr
       xcoord = x-coordinates for location, which are either the x-coordinates
         for a single-stage sample or the stage two x-coordinates for a
         two-stage sample\cr
       ycoord = y-coordinates for location, which are either the y-coordinates
         for a single-stage sample or the stage two y-coordinates for a
         two-stage sample\cr
       stratum = the stratum codes\cr
       cluster = the stage one sampling unit (primary sampling unit or cluster)
         codes\cr
       wgt1 = final adjusted stage one weights\cr
       xcoord1 = the stage one x-coordinates for location\cr
       ycoord1 = the stage one y-coordinates for location\cr
       support = support values - the value one (1) for a site from a 
         finite resource or the measure of the sampling unit associated 
         with a site from an extensive resource, which is required for 
         calculation of finite and continuous population correction 
         factors\cr
       swgt = size-weights, which are either the size-weights for a
         single-stage sample or the stage two size-weights for a two-stage
         sample\cr
       swgt1 = stage one size-weights}
  \item{data.cat}{a data frame of categorical response variables.  The first 
    variable is site IDs.  Subsequent variables are response variables.
    Missing values (NA) are allowed.  If psurvey.obj is not provided, then
    this argument is required.  The default is NULL.}
  \item{N.cluster}{the number of stage one sampling units in the resource, which 
    is required for calculation of finite and continuous population 
    correction factors for a two-stage sample.  For a stratified sample 
    this variable must be a vector containing a value for each stratum and
    must have the names attribute set to identify the stratum codes.  The
    default is NULL.}
  \item{popsize}{the known size of the resource - the total number of sampling 
    units of a finite resource or the measure of an extensive resource,
    which is used to adjust estimators for the known size of a resource.
    This argument also is required for calculation of finite and
    continuous population correction factors for a single-stage sample.   
    The argument must be in the form of a list containing an element for   
    each population Type in the subpop data frame, where NULL is a valid   
    choice for a population Type.  The list must be named using the column  
    names for the population Types in subpop. If a population Type doesn't  
    contain subpopulations, then each element of the list is either a  
    single value for an unstratified sample or a vector containing a value  
    for each stratum for a stratified sample, where elements of the vector
    are named using the stratum codes.  If a population Type contains 
    subpopulations, then each element of the list is a list containing an 
    element for each subpopulation, where the list is named using the 
    subpopulation names.  The element for each subpopulation will be 
    either a single value for an unstratified sample or a named vector of 
    values for a stratified sample.  The default is NULL.\cr\cr
    Example popsize for a stratified sample:\cr
       popsize = list("Pop 1"=c("Stratum 1"=750, "Stratum 2"=500,
          "Stratum 3"=250),\cr "Pop 2"=list("SubPop 1"=c("Stratum 1"=350,
          "Stratum 2"=250, "Stratum 3"=150),\cr "SubPop 2"=c("Stratum 1"=250,
          "Stratum 2"=150, "Stratum 3"=100),\cr "SubPop 3"=c("Stratum 1"=150,
          "Stratum 2"=150, "Stratum 3"=75)),\cr "Pop 3"=NULL)\cr\cr
    Example popsize for an unstratified sample:\cr
       popsize = list("Pop 1"=1500, "Pop 2"=list("SubPop 1"=750,
          "SubPop 2"=500, "SubPop 3"=375), "Pop 3"=NULL)\cr}
  \item{stage1size}{the known size of the stage one sampling units of a 
    two-stage sample, which is required for calculation of finite and  
    continuous population correction factors for a two-stage sample and 
    must have the names attribute set to identify the stage one sampling 
    unit codes.  For a stratified sample, the names attribute must be set
    to identify both stratum codes and stage one sampling unit codes using
    a convention where the two codes are separated by the & symbol, e.g.,
    "Stratum 1&Cluster 1".  The default is NULL.}
  \item{popcorrect}{a logical value that indicates whether finite or continuous 
    population correction factors should be employed during variance 
    estimation, where TRUE = use the correction factors and FALSE = do not 
    use the correction factors.  The default is FALSE.}
  \item{sizeweight}{a logical value that indicates whether size-weights should 
    be used in the analysis, where TRUE = use the size-weights and FALSE = 
    do not use the size-weights.  The default is FALSE.}
  \item{unitsize}{the known sum of the size-weights of the resource.  The 
    argument must be in the form of a list containing an element for each  
    population Type in the subpop data frame, where NULL is a valid choice  
    for a population Type.  The list must be named using the column  
    names for population Types in subpop.  If a population Type doesn't  
    contain subpopulations, then each element of the list is either a  
    single value for an unstratified sample or a vector containing a value  
    for each stratum for a stratified sample, where elements of the vector
    are named using the stratum codes.  If a population Type contains 
    subpopulations, then each element of the list is a list containing an 
    element for each subpopulation, where the list is named using the 
    subpopulation names.  The element for each subpopulation will be 
    either a single value for an unstratified sample or a named vector of 
    values for a stratified sample.  The default is NULL.}
  \item{vartype}{the choice of variance estimator, where "Local" = local mean
    estimator and "SRS" = simple random sampling (SRS) estimator.  The default
    is "Local".}
  \item{conf}{the confidence level, expressed as a percentage.  The default is
    95\%.}
  \item{psurvey.obj}{a list of class psurvey.analysis that was produced by the
    function psurvey.analysis.  Depending on input to that function,
    some elements of the list may be NULL.  The default is NULL.}
}
\value{
  A data frame of population estimates for all combinations of subpopulation
  Types, subpopulations within Types, response variables, and categories within
  each response variable.  Estimates are calculated for proportion and size of
  the population.  Standard error estimates and confidence interval estimates
  also are calculated.
}
\references{
  Diaz-Ramos, S., D.L. Stevens, Jr., and A.R. Olsen. (1996).  \emph{EMAP
  Statistical Methods Manual.} EPA/620/R-96/XXX.  Corvallis, OR: U.S.
  Environmental Protection Agency, Office of Research and Development, National
  Health Effects and Environmental Research Laboratory, Western Ecology
  Division.
}
\author{
Tony Olsen \email{Olsen.Tony@epa.gov}\cr
Tom Kincaid \email{Kincaid.Tom@epa.gov}
}
\seealso{
\code{\link{category.est}} 
}
\examples{
# Categorical variable example for two resource classes
mysiteID <- paste("Site", 1:100, sep="")
mysites <- data.frame(siteID=mysiteID, Active=rep(TRUE, 100))
mysubpop <- data.frame(siteID=mysiteID, All.Sites=rep("All Sites", 100),
   Resource.Class=rep(c("Good","Poor"), c(55,45)))
mydesign <- data.frame(siteID=mysiteID, wgt=runif(100, 10, 100),
   xcoord=runif(100), ycoord=runif(100), stratum=rep(c("Stratum1",
   "Stratum2"), 50))
mydata.cat <- data.frame(siteID=mysiteID, CatVar=rep(c("north", "south",
   "east", "west"), 25))
mypopsize <- list(All.Sites=c(Stratum1=3500, Stratum2=2000),
   Resource.Class=list(Good=c(Stratum1=2500, Stratum2=1500),
   Poor=c(Stratum1=1000, Stratum2=500)))
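
# A minimal sketch (an addition, not part of the original example) of the
# unstratified form of popsize described in the arguments section.  Object
# names beginning with "ex" and the sizes are illustrative assumptions.
exsubpop <- data.frame(siteID=paste("Site", 1:100, sep=""),
   All.Sites=rep("All Sites", 100),
   Resource.Class=rep(c("Good","Poor"), c(55,45)))
# One value per population Type, or a named list of values per subpopulation
expopsize <- list(All.Sites=5500, Resource.Class=list(Good=4000, Poor=1500))
# The list names should match the Type column names in subpop
stopifnot(identical(names(expopsize), names(exsubpop)[-1]))
# The nested names should match the subpopulations of that Type
stopifnot(setequal(names(expopsize$Resource.Class),
   unique(as.character(exsubpop$Resource.Class))))
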
cat.analysis(sites=mysites, subpop=mysubpop, design=mydesign,
   data.cat=mydata.cat, popsize=mypopsize)
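
# A hedged sketch of the naming conventions for N.cluster and stage1size in a
# stratified two-stage sample.  The stratum codes, cluster codes, and sizes
# are made-up values for illustration only, and the objects are not passed to
# cat.analysis here.
exstrata <- c("Stratum1", "Stratum2")
exclusters <- c("Cluster1", "Cluster2", "Cluster3")
# N.cluster: one value per stratum, named by stratum code
exN.cluster <- c(Stratum1=20, Stratum2=15)
# stage1size: one value per stratum and cluster pair, with the two codes
# joined by the & symbol
exstage1size <- c(40, 35, 50, 30, 45, 25)
names(exstage1size) <- paste(rep(exstrata, each=3), rep(exclusters, 2),
   sep="&")
exstage1size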

# Exclude the sites in category "south" from the analysis by flagging them
# as inactive in the sites data frame
mysites <- data.frame(siteID=mysiteID, Active=rep(c(TRUE, FALSE, TRUE,
   TRUE), 25))
cat.analysis(sites=mysites, subpop=mysubpop, design=mydesign,
   data.cat=mydata.cat, popsize=mypopsize)
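
# A hedged sketch of an equivalent way to build the logical vector above
# directly from the category values instead of relying on the order of the
# sites.  The names excat and exsites are illustrative assumptions.
excat <- rep(c("north", "south", "east", "west"), 25)
exsites <- data.frame(siteID=paste("Site", 1:100, sep=""),
   Active=excat != "south")
# Sites in category "south" are flagged FALSE, so 75 sites remain active
sum(exsites$Active)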
}
\keyword{survey}
\keyword{univar}
