% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helpers.R
\name{ngrams}
\alias{ngrams}
\title{N-grams and their frequencies.}
\usage{
ngrams(x, n = 1, borders = c("", ""), rm = "", as.table = T)
}
\arguments{
\item{x}{[character vector] Words to be cut into n-grams.}

\item{n}{[integer] The length of n-grams to look for. Defaults to \code{1}.}

\item{borders}{[character] Characters to prepend and append to every word. Must be a vector of exactly two character strings. Defaults to \code{c("","")}.}

\item{rm}{[character] Characters to be removed from \code{x} before cutting into n-grams. May be a regular expression, e.g. "[-\\|]" will capture the default symbol for linguistic zeros as well as the default segment separators. An empty string means that nothing is removed. Defaults to an empty string.}

\item{as.table}{[logical] Return the result as a table? Defaults to \code{TRUE}.}
}
\value{
[table or list] Table with counts of n-grams if \code{as.table} is \code{TRUE}; otherwise the n-grams themselves, as a list.
}
\description{
Find n-grams of a specified length and return them as a list, or their counts as a table.
}
\details{
Data processed with \code{\link{soundcorrs}} are generally expected to be segmented and aligned, and it is recommended that both segmentation and alignment be performed manually. This is a laborious process, but it is feasible when segments represent morphemes or phonemes. Should segments represent n-grams, however, a fully manual approach would be very time-consuming and error-prone. This function cuts words into n-grams automatically.
}
\examples{
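dataset <- loadSampleDataset("data-capitals")
# Bigrams, returned as a table of counts (the default)
ngrams(dataset$data[,"ALIGNED.German"], n=2)
# Trigrams, returned as a list rather than a table
ngrams(dataset$data[,"ALIGNED.German"], n=3, as.table=FALSE)
# 4-grams, after removing linguistic zeros and segment separators
ngrams(dataset$data[,"ALIGNED.German"], n=4, rm="[-\\\\|]", as.table=FALSE)
# 5-grams, with every word wrapped in the border strings first
ngrams(dataset$data[,"ALIGNED.German"], n=5, borders=c(">","<"), rm="[-\\\\|]", as.table=FALSE)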
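
# A minimal base-R sketch of how the arguments above could fit together.
# It is an illustration only and NOT the implementation used by ngrams():
# the helper name ngrams_sketch is hypothetical, and treating every single
# character as one unit is an assumption made for this sketch.
ngrams_sketch <- function(x, n = 1, borders = c("", ""), rm = "") {
  # Remove unwanted characters; an empty pattern means nothing is removed.
  if (nzchar(rm)) x <- gsub(rm, "", x)
  # Prepend and append the border strings to every word.
  x <- paste0(borders[1], x, borders[2])
  # Slide a window of width n over each word, one character at a time.
  lapply(x, function(w) {
    # Words shorter than n contain no n-grams.
    if (nchar(w) < n) return(character(0))
    starts <- seq_len(nchar(w) - n + 1)
    substring(w, starts, starts + n - 1)
  })
}
# Counting the pooled n-grams mirrors the idea of as.table=TRUE.
table(unlist(ngrams_sketch(c("ab-c", "abd"), n = 2, rm = "-")))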
}
