% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lma_dict.R
\name{lma_dict}
\alias{lma_dict}
\title{English Function Word Category and Special Character Lists}
\usage{
lma_dict(..., as.regex = TRUE, as.function = FALSE)
}
\arguments{
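\item{...}{Numbers or letters corresponding to category names: ppron, ipron, article,
adverb, conj, prep, auxverb, negate, quant, interrog, number, interjection, or special.
Names may be abbreviated, as in the examples (e.g., \code{neg} for negate or \code{aux} for auxverb).}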

\item{as.regex}{Logical: if \code{FALSE}, lists are returned without regular expressions.}

\item{as.function}{Logical or a function: if specified and \code{as.regex} is \code{TRUE}, the selected dictionary
will be collapsed to a regex string (terms separated by \code{|}), and a function for matching characters to that
string will be returned. The regex string is passed to the matching function (\code{\link{grepl}} by default)
as a 'pattern' argument, with the first argument of the returned function being passed as an 'x' argument.
See examples.}
}
\value{
A list with a vector of terms for each category, or (when \code{as.function} is \code{TRUE} or a function)
a function which accepts an initial "terms" argument (a character vector), and any additional arguments accepted
by the function entered as \code{as.function} (\code{\link{grepl}} by default).
}
\description{
Returns a list of function words based on the Linguistic Inquiry and Word Count 2015 dictionary
(in terms of category names -- words were selected independently), or a list of special characters and patterns.
}
\note{
The \code{special} category is not returned unless specifically requested. It is a list of regular expression
strings attempting to capture special things like ellipses and emojis, or sets of special characters (those outside
of the Basic Latin range; \code{[^\\u0020-\\u007F]}), which can be used for character conversions.
If \code{special} is part of the returned list, \code{as.regex} is set to \code{TRUE}.

The \code{special} list is always used by both \code{\link{lma_dtm}} and \code{\link{lma_termcat}}. When creating a
dtm, \code{special} is used to clean the original input (so that, by default, the punctuation involved in ellipses
and emojis is treated as different -- as ellipses and emojis rather than as periods and parens and colons and such).
When categorizing a dtm, the input dictionary is passed through the special lists to be sure the terms in the dtm
match up with the dictionary (so, for example, ": (" would be replaced with "repfrown" in both the text and dictionary).
}
\examples{
# return the full dictionary (excluding special)
lma_dict()

# return the standard 7 LSM categories
lma_dict(1:7)

# return just a few categories without regular expressions
lma_dict(neg, ppron, aux, as.regex = FALSE)

# return special specifically
lma_dict(special)

# returning a function
is.ppron <- lma_dict(ppron, as.function = TRUE)
is.ppron(c("i", "am", "you", "were"))

in.lsmcat <- lma_dict(1:7, as.function = TRUE)
in.lsmcat(c("a", "frog", "for", "me"))
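
## a rough base-R sketch of the as.function behavior described above:
## terms are collapsed with "|" and handed to a matching function as `pattern`,
## with the first argument of the returned function passed as `x`.
## This is an illustration under those assumptions, not the actual lma_dict code;
## `sketch.terms`, `make.matcher.sketch`, and `is.pron.sketch` are hypothetical names.
sketch.terms <- c("^i$", "^you$", "^we$")
make.matcher.sketch <- function(terms, fun = grepl) {
  pattern <- paste(terms, collapse = "|")
  function(x, ...) fun(pattern = pattern, x = x, ...)
}
is.pron.sketch <- make.matcher.sketch(sketch.terms)
is.pron.sketch(c("i", "am", "you", "were"))

## additional arguments are forwarded to the matching function (here, gsub)
mark.pron.sketch <- make.matcher.sketch(sketch.terms, gsub)
mark.pron.sketch(c("i", "am", "you", "were"), replacement = "PRON")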

## use as a stopword filter
is.stopword <- lma_dict(as.function = TRUE)
dtm <- lma_dtm("Most of these words might not be all that relevant.")
dtm[, !is.stopword(colnames(dtm))]

## use to replace special characters
clean <- lma_dict(special, as.function = gsub)
clean(c(
  "\u201Ccurly quotes\u201D", "na\u00EFve", "typographer\u2019s apostrophe",
  "en\u2013dash", "em\u2014dash"
))
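
## a rough base-R sketch of flagging text with characters outside the
## Basic Latin range, using the character class mentioned in the note.
## This only illustrates that pattern and is not how lma_dict builds or
## applies its special lists; `has.special.sketch` is a hypothetical name.
has.special.sketch <- function(x) grepl("[^\u0020-\u007F]", x)
has.special.sketch(c("na\u00EFve", "naive"))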
}
\seealso{
To score texts with these categories, use \code{\link{lma_termcat}}.
}
