% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/keyword_extract.R
\name{keyword_extract}
\alias{keyword_extract}
\title{Extract keywords from raw text}
\usage{
keyword_extract(dt, id = "id", text, dict, n_max = 4, n_min = 1)
}
\arguments{
\item{dt}{A data.frame containing at least two columns: the document ID and the raw text to extract keywords from.}

\item{id}{A character string giving the name of the document ID column. Defaults to \code{"id"}.}

\item{text}{A character string giving the name of the column that holds the raw text for extraction.}

\item{dict}{A data.table with two columns, namely "id" and "keyword" (set as key).
This should be created with the \code{\link[akc]{make_dict}} function.}

\item{n_max}{The maximum number of words in an n-gram. This must be an integer greater than or equal to 1.
Defaults to 4.}

\item{n_min}{The minimum number of words in an n-gram. This must be an integer greater than or equal to 1,
and less than or equal to \code{n_max}. Defaults to 1.}
}
\value{
A data.frame (tibble) with two columns, namely the document ID and the extracted keyword.
}
\description{
When only raw text is available (e.g. abstracts or full articles) and no keywords are supplied,
keywords can be extracted from the text first. The minimum input is a data.frame with a document ID
column and a raw text column, together with a user-defined dictionary. The dictionary can be built with
the \code{\link[akc]{make_dict}} function from a character vector of vocabulary terms.
}
\details{
Keyword extraction in \pkg{akc} proceeds in three steps. First, the raw text is split
into independent clauses at the punctuation marks \code{[,;!?.]}. Second, n-grams are extracted
from each clause. Finally, only the n-grams that appear in the user-defined dictionary
(created with \code{make_dict}) are kept as keywords. The range of \emph{n} can be set with
\code{n_min} and \code{n_max}.

This function can take some time when the number of documents is large, so it is advisable to time a
small test run with \code{system.time} first. The implementation is already optimized with
\pkg{data.table}, however, and performs well on large data sets.
}
\examples{

 library(akc)
 library(dplyr)

  my_dict <- bibli_data_table \%>\%
    keyword_clean(id = "id", keyword = "keyword") \%>\%
    pull(keyword) \%>\%
    make_dict()
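
 \donttest{
  # Timing sketch for the advice in Details: time a small run before
  # processing the full data. The subset size of 100 rows is an arbitrary
  # choice made for this illustration, not a recommendation from the package.
  system.time(
    keyword_extract(head(bibli_data_table, 100),
                    id = "id", text = "abstract", dict = my_dict)
  )
 }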

 \donttest{
  bibli_data_table \%>\%
    keyword_extract(id = "id", text = "abstract", dict = my_dict) \%>\%
    keyword_merge(keyword = "keyword")
 }
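
  # A minimal base-R sketch of the three steps described in Details
  # (clause splitting, n-gram building, dictionary lookup). It is meant only
  # as an illustration and is NOT the internal implementation of akc.
  # The names toy_text, toy_dict, toy_ngrams and toy_keywords are hypothetical.
  toy_text <- "deep learning is popular, and text mining is useful."
  toy_dict <- c("deep learning", "text mining", "learning")

  # Step 1: split the text into clauses at the punctuation marks [,;!?.]
  clauses <- trimws(unlist(strsplit(toy_text, "[,;!?.]")))
  clauses <- clauses[nzchar(clauses)]

  # Step 2: build every n-gram of n_min to n_max words inside one clause
  toy_ngrams <- function(clause, n_min = 1, n_max = 4) {
    words <- strsplit(clause, " +")[[1]]
    words <- words[nzchar(words)]
    out <- character(0)
    for (n in seq(n_min, n_max)) {
      if (n > length(words)) break
      starts <- seq_len(length(words) - n + 1)
      grams <- vapply(
        starts,
        function(i) paste(words[i:(i + n - 1)], collapse = " "),
        character(1)
      )
      out <- c(out, grams)
    }
    out
  }

  # Step 3: keep only the n-grams that are present in the dictionary
  candidates <- unlist(lapply(clauses, toy_ngrams))
  toy_keywords <- intersect(candidates, toy_dict)
  toy_keywords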
}
\seealso{
\code{\link[akc]{make_dict}}
}
