% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/external.bold.data.summarize.R
\name{bold.data.summarize}
\alias{bold.data.summarize}
\title{Generate specific summaries from the downloaded BCDM data}
\usage{
bold.data.summarize(
  bold_df,
  summary_type = c("concise_summary", "detailed_taxon_counts", "barcode_summary",
    "data_completeness"),
  primer_f = NULL,
  primer_r = NULL,
  rem_na_bin = FALSE
)
}
\arguments{
\item{bold_df}{The data frame retrieved by the \code{\link[=bold.fetch]{bold.fetch()}} function.}

\item{summary_type}{A character string specifying the type of summary required ('concise_summary', 'detailed_taxon_counts', 'barcode_summary', 'data_completeness', 'all').}

\item{primer_f}{A character string specifying the forward primer. Default value is NULL.}

\item{primer_r}{A character string specifying the reverse primer. Default value is NULL.}

\item{rem_na_bin}{A logical value specifying whether records with NA BINs should be removed from the BCDM data frame. Default value is FALSE.}
}
\value{
An output list containing:
\itemize{
\item A data frame containing the summary selected by \code{summary_type}.
\item A bar chart, in addition to the data frame, when \code{summary_type = "data_completeness"}.
}
}
\description{
This function generates different types of data summaries for BCDM data downloaded via the \code{bold.fetch()} function.
}
\details{
\code{bold.data.summarize} provides different types of data summaries for the downloaded BCDM dataset. Current options include:
\itemize{
\item concise_summary = A high-level overview of the downloaded data, including total records and counts of unique BINs, countries, institutes etc.
\item data_completeness = A data profile that includes information on missing data and the proportion of complete cases for each field in the BCDM data, along with data-type-specific insights such as distribution, average and median values for numeric data. Also provides a bar chart visualizing the missing data and total records.
\item detailed_taxon_counts = Taxonomy-focused counts of total records with and without BINs, unique countries and institutes.
\item barcode_summary = BIN-focused summary of nucleotide base pair length, number of ambiguous base pairs (if present), and presence of primer sequences (forward and/or reverse) in the sequence, along with the processid, country and institute associated with the BIN.
}

Setting \code{rem_na_bin = TRUE} removes all records that do not have a BIN (please note that this can sometimes result in empty data frames because of large amounts of missing data). For the \code{barcode_summary}, the forward or reverse primer also needs to be specified (\code{primer_f}, \code{primer_r}). Details on all or specific fields can be checked using \code{bold.field.info()}.

\emph{Note:} To generate the \code{barcode_summary}, users must install and load the \code{Biostrings} package before running this function. For the data in the \code{nuc_basecount} column of the \code{barcode_summary}, please refer to \code{bold.field.info()} for details.
}
\examples{
\dontrun{
bold_data.ids <- bold.public.search(taxonomy = list("Oreochromis"))

# Fetch the data using the ids.
# 1. An API key must be obtained from BOLD support before using `bold.fetch()`.
# 2. Use the `bold.apikey()` function to set the API key in the global environment.

bold.apikey('apikey')

bold.data <- bold.fetch(get_by = "processid",
                        identifiers = bold_data.ids$processid)

#1. Generate a concise summary of the data

test.data.summary.concise <- bold.data.summarize(bold_df = bold.data,
                                                 summary_type = "concise_summary")
# Result
test.data.summary.concise$concise_summary


#2. Generate a detailed taxon counts summary

test.data.summary <- bold.data.summarize(bold_df = bold.data,
                                         summary_type = "detailed_taxon_counts")

# Result
test.data.summary$detailed_taxon_counts


#3. Generate data completeness profile

test.data.summary.completeness <- bold.data.summarize(bold_df = bold.data,
                                                      summary_type = "data_completeness")

# Results
# Summary
test.data.summary.completeness$completeness_summary

# Plot
test.data.summary.completeness$completeness_plot


#4. Barcode summary (forward primer LCO1490)

# The `Biostrings` package must be installed and loaded first
library(Biostrings)

test.data.summary.barcode <- bold.data.summarize(bold_df = bold.data,
                                                 summary_type = "barcode_summary",
                                                 primer_f = 'GGTCAACAAATCATAAAGATATTGG')

# Results
test.data.summary.barcode$barcode_summary
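

#5. Barcode summary restricted to records that have a BIN

# A minimal sketch, assuming the same `bold.data` object fetched above and that
# `Biostrings` is already loaded as noted for the barcode summary.
# `rem_na_bin = TRUE` removes records without a BIN before summarizing, so the
# result may be an empty data frame when many BINs are missing.
# The object name `test.data.summary.barcode.bin` is illustrative only.

test.data.summary.barcode.bin <- bold.data.summarize(bold_df = bold.data,
                                                     summary_type = "barcode_summary",
                                                     primer_f = 'GGTCAACAAATCATAAAGATATTGG',
                                                     rem_na_bin = TRUE)

# Results
test.data.summary.barcode.bin$barcode_summary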
}

}
