% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seg_file.R
\name{seg_file}
\alias{seg_file}
\title{Convenient Tool to Segment Chinese Texts}
\usage{
seg_file(..., from = "dir", folder = NULL, mycutter = DEFAULT_cutter,
  enc = "auto", myfun1 = NULL, myfun2 = NULL, special = "",
  ext = "txt")
}
\arguments{
\item{...}{names of folders, files, or a mixture of the two. It can also be a character 
vector of texts to be processed when \code{from} is set to "v"; see below.}

\item{from}{should be either "dir" or "v". 
If your inputs are filenames, it should be "dir" (default). 
If the input is a character vector of texts, it should be "v". However, if it is set to "v", 
make sure that no element of the vector is identical to the name of a file in your working
directory; if one is, an error will be raised. 
This check is needed because, in that case, the function 
\code{segment} would take the input as a file to read.}

\item{folder}{a length 1 character indicating the folder in which to put the segmented texts. 
Set it to \code{NULL} if you want the result to be returned as a character vector rather than written 
to disk. Otherwise, it should be a valid directory path, and each segmented 
text will be written into a .txt/.rtf file. If the specified folder does not exist, the function 
will try to create it.}

\item{mycutter}{the jiebaR cutter used to segment text. If none is supplied, a default cutter is used. See Details.}

\item{enc}{the file encoding used to read files. If files have different encodings or you do not 
know their encodings, set it to "auto" (default) to let encodings be detected automatically.}

\item{myfun1}{a function used to modify each text after being read by \code{scancn} 
and before being segmented.}

\item{myfun2}{a function used to modify each text after it is segmented.}

\item{special}{a length 1 character or regular expression to be passed to \code{dir_or_file} 
to specify the pattern that filenames must match. The default is to read all files.}

\item{ext}{the extension of written files. Should be "txt", "rtf" or "". If it is not one of the 
three, it is set to "". This is only used when your input is a text vector rather than 
filenames and you want to write the result to disk.}
}
\value{
a character vector in which each element is a segmented text, with words separated by " ". 
If \code{folder} is a folder name, the result will be written to disk and 
nothing is returned.
}
\description{
The function first collects filenames or text vectors, then 
calls \code{\link[jiebaR]{segment}} to segment the texts. During 
this process, it allows users to apply additional modifications. 
By default, file encoding is detected automatically. 
After segmenting, the segmented words that belong to a text are pasted 
together into a single character string with words separated by " ".
The segmented result is either returned or written 
to disk.
}
\details{
Users should provide their jiebaR cutter via \code{mycutter}. Otherwise, the function 
uses \code{DEFAULT_cutter}, which is created when the package is loaded. 
The \code{DEFAULT_cutter} is simply \code{worker(write=FALSE)}. 
See \code{\link[jiebaR]{worker}}.

As long as 
you have not manually created another variable called "DEFAULT_cutter", 
you can directly use \code{jiebaR::new_user_word(DEFAULT_cutter...)} 
to add new words. Whether or not you manually create an object 
called "DEFAULT_cutter", the original DEFAULT_cutter loaded with the package, which is 
used by default by functions in this package, will not be removed.
So, whenever you want to use this default value, either leave 
\code{mycutter} unset or 
set it to \code{mycutter = chinese.misc::DEFAULT_cutter}.

The encoding used for writing files (if \code{folder} is not NULL) depends on the encoding 
of each file the function reads. If it is "GB18030", "GBK", "GB2312" or another encoding 
that has "GB" in its name (case insensitive), 
the file will be written in "GB18030"; otherwise, it will be written in "UTF-8".
}
\examples{
require(jiebaR)
# No Chinese word is allowed, so we use English here.
x <- c("drink a bottle of milk", 
  "drink a cup of coffee", 
  "DRINK SOME WATER")
seg_file(x, from = "v", myfun1 = tolower)
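
# The lines below are an illustrative sketch, not part of the original examples.
# They show how a user-supplied cutter could be passed instead of DEFAULT_cutter.
# The name my_cutter is hypothetical; worker(write = FALSE) mirrors how the
# Details section describes DEFAULT_cutter.
my_cutter <- worker(write = FALSE)
seg_file(x, from = "v", mycutter = my_cutter, myfun1 = tolower)

# myfun2 is applied after segmentation. toupper is used here only as a simple
# stand-in for whatever post-processing you may need.
seg_file(x, from = "v", myfun2 = toupper)

# To request the package default cutter explicitly, as described in Details:
seg_file(x, from = "v", mycutter = chinese.misc::DEFAULT_cutter)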
}
