% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/LncFinder.R
\name{make_referFreq}
\alias{make_referFreq}
\title{Make Frequencies File for Log.Dist, Euc.Dist, and hexamer score}
\usage{
make_referFreq(
  cds.seq,
  lncRNA.seq,
  k = 6,
  step = 1,
  alphabet = c("a", "c", "g", "t"),
  on.orf = TRUE,
  ignore.illegal = TRUE
)
}
\arguments{
\item{cds.seq}{Coding sequences (mRNA without UTRs). Can be a FASTA file loaded
by \code{\link[seqinr]{seqinr-package}}.}

\item{lncRNA.seq}{Long non-coding RNA sequences. Can be a FASTA file loaded by
\code{\link[seqinr]{seqinr-package}}.}

\item{k}{An integer that indicates the sliding window size. (Default: \code{6})}

\item{step}{An integer that indicates the step size of the sliding window. (Default: \code{1})}

\item{alphabet}{A vector of single characters that specifies the distinct characters
of the sequences. (Default: \code{alphabet = c("a", "c", "g", "t")})}

\item{on.orf}{Logical. Incomplete CDs can lead to a false shift and
inaccurate hexamer frequencies. When \code{on.orf = TRUE}, the frequencies
will be calculated on the longest ORF. Setting this parameter to \code{TRUE}
is strongly recommended when mRNAs are used as CDs. Only available when
\code{alphabet = c("a", "c", "g", "t")}. (Default: \code{TRUE})}

\item{ignore.illegal}{Logical. If \code{TRUE}, the sequences with non-nucleotide
characters (nucleotide characters: "a", "c", "g", "t") will be ignored when
calculating the frequencies. Only available when \code{alphabet = c("a", "c", "g", "t")}.
(Default: \code{TRUE})}
}
\value{
Returns a list that contains the frequencies of protein-coding sequences
and non-coding sequences.
}
\description{
This function calculates the frequencies of lncRNAs and CDs.
The resulting frequencies file can be used to calculate Logarithm-Distance (\code{\link{compute_LogDistance}}),
Euclidean-Distance (\code{\link{compute_EucDistance}}), and hexamer score (\code{\link{compute_hexamerScore}}).

NOTE: To make a frequencies file for building
a new LncFinder classifier with function \code{\link{extract_features}},
please refer to function \code{\link{make_frequencies}}.
}
\details{
This function makes the frequencies file required for the computation of
Logarithm-Distance (\code{\link{compute_LogDistance}}), Euclidean-Distance
(\code{\link{compute_EucDistance}}),
and hexamer score (\code{\link{compute_hexamerScore}}).

To achieve high accuracy, mRNAs should not be treated as CDs and assigned
to parameter \code{cds.seq}. However, the CDs of some species may be insufficient
for calculating frequencies. In that case, mRNAs can be used as CDs with parameter
\code{on.orf = TRUE}, and the frequencies will be calculated on the ORF region.
If \code{on.orf = TRUE}, users can set \code{step = 3} to simulate the translation process.
}
\section{References}{

Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li.
LncFinder: an integrated platform for long non-coding RNA identification utilizing
sequence intrinsic composition, structural information, and physicochemical property.
\emph{Briefings in Bioinformatics}, 2019, 20(6):2009-2027.
}

\examples{
\dontrun{
Seqs <- seqinr::read.fasta(file =
"http://www.ncbi.nlm.nih.gov/WebSub/html/help/sample_files/nucleotide-sample.txt")

referFreq <- make_referFreq(cds.seq = Seqs, lncRNA.seq = Seqs, k = 6, step = 1,
                            alphabet = c("a", "c", "g", "t"), on.orf = TRUE,
                            ignore.illegal = TRUE)
}
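
# Illustrative sketch only, written in base R. It is NOT the internal code
# of LncFinder. It shows how a sliding window of size k, moved by step,
# turns one sequence into k-mer frequencies, which is the idea behind the
# arguments k and step described above.
# count_kmers is a hypothetical helper introduced for this example, and
# toy.seq is made-up data.
count_kmers <- function(seq.string, k = 6, step = 1) {
  stopifnot(nchar(seq.string) >= k)
  # Start position of every window
  starts <- seq(1, nchar(seq.string) - k + 1, by = step)
  # Extract the k-mer under each window
  kmers <- substring(seq.string, starts, starts + k - 1)
  # Relative frequency of each distinct k-mer
  table(kmers) / length(kmers)
}

toy.seq <- "atggcatggca"
# step = 1: overlapping windows
count_kmers(toy.seq, k = 3, step = 1)
# step = 3: non-overlapping windows, read codon by codon from position 1
count_kmers(toy.seq, k = 3, step = 3)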

}
\seealso{
\code{\link{make_frequencies}},
         \code{\link{compute_LogDistance}},
         \code{\link{compute_EucDistance}},
         \code{\link{compute_hexamerScore}}.
}
\author{
HAN Siyu
}
